My Big Numbers talk at Festivaletteratura

September 14th, 2017

Last weekend, I gave a talk on big numbers, as well as a Q&A about quantum computing, at Festivaletteratura: one of the main European literary festivals, held every year in beautiful and historic Mantua, Italy.  (For those who didn’t know, as I didn’t: this is the city where Virgil was born, and where Romeo gets banished in Romeo and Juliet.  Its layout hasn’t substantially changed since the Middle Ages.)

I don’t know how much big numbers or quantum computing have to do with literature, but I relished the challenge of explaining these things to an audience that was not merely “popular” but humanistically rather than scientifically inclined.  In this case, there was not only a math barrier, but also a language barrier, as the festival was mostly in Italian and only some of the attendees knew English, to varying degrees.  The quantum computing session was live-translated into Italian (the challenge faced by the translator in not mangling this material provided a lot of free humor), but the big numbers talk wasn’t.  What’s more, the talk was held outdoors, on the steps of a cathedral, with tons of background noise, including a bell that loudly chimed halfway through the talk.  So if my own words weren’t simple and clear, forget it.

Anyway, in the rest of this post, I’ll share a writeup of my big numbers talk.  The talk has substantial overlap with my “classic” Who Can Name The Bigger Number? essay from 1999.  While I don’t mean to supersede or displace that essay, the truth is that I think and write somewhat differently than I did as a teenager (whuda thunk?), and I wanted to give Scott2017 a crack at material that Scott1999 has been over already.  If nothing else, the new version is more up-to-date and less self-indulgent, and it includes points (for example, the relation between ordinal generalizations of the Busy Beaver function and the axioms of set theory) that I didn’t understand back in 1999.

For regular readers of this blog, I don’t know how much will be new here.  But if you’re one of those people who keeps introducing themselves at social events by saying “I really love your blog, Scott, even though I don’t understand anything that’s in it”—something that’s always a bit awkward for me, because, uh, thanks, I guess, but what am I supposed to say next?—then this lecture is for you.  I hope you’ll read it and understand it.

Thanks so much to Festivaletteratura organizer Matteo Polettini for inviting me, and to Fabrizio Illuminati for moderating the Q&A.  I had a wonderful time in Mantua, although I confess there’s something about being Italian that I don’t understand.  Namely: how do you derive any pleasure from international travel, if anywhere you go, the pizza, pasta, bread, cheese, ice cream, coffee, architecture, scenery, historical sights, and pretty much everything else all fall short of what you’re used to?


Big Numbers

by Scott Aaronson
Sept. 9, 2017

My four-year-old daughter sometimes comes to me and says something like: “daddy, I think I finally figured out what the biggest number is!  Is it a million million million million million million million million thousand thousand thousand hundred hundred hundred hundred twenty eighty ninety eighty thirty a million?”

So I reply, “I’m not even sure exactly what number you named—but whatever it is, why not that number plus one?”

“Oh yeah,” she says.  “So is that the biggest number?”

Of course there’s no biggest number, but it’s natural to wonder what are the biggest numbers we can name in a reasonable amount of time.  Can I have two volunteers from the audience—ideally, two kids who like math?

[Two kids eventually come up.  I draw a line down the middle of the blackboard, and place one kid on each side of it, each with a piece of chalk.]

So the game is, you each have ten seconds to write down the biggest number you can.  You can’t write anything like “the other person’s number plus 1,” and you also can’t write infinity—it has to be finite.  But other than that, you can write basically anything you want, as long as I’m able to understand exactly what number you’ve named.  [These instructions are translated into Italian for the kids.]

Are you ready?  On your mark, get set, GO!

[The kid on the left writes something like: 999999999

While the kid on the right writes something like: 11111111111111111

Looking at these, I comment:]

9 is bigger than 1, but 1 is a bit faster to write, and as you can see that makes the difference here!  OK, let’s give our volunteers a round of applause.

[I didn’t plant the kids, but if I had, I couldn’t have designed a better jumping-off point.]


I’ve been fascinated by how to name huge numbers since I was a kid myself.  When I was a teenager, I even wrote an essay on the subject, called Who Can Name the Bigger Number?  That essay might still get more views than any of the research I’ve done in all the years since!  I don’t know whether to be happy or sad about that.

I think the reason the essay remains so popular, is that it shows up on Google whenever someone types something like “what is the biggest number?”  Some of you might know that Google itself was named after the huge number called a googol: 10^100, or 1 followed by a hundred zeroes.

Of course, a googol isn’t even close to the biggest number we can name.  For starters, there’s a googolplex, which is 1 followed by a googol zeroes.  Then there’s a googolplexplex, which is 1 followed by a googolplex zeroes, and a googolplexplexplex, and so on.  But one of the most basic lessons you’ll learn in this talk is that, when it comes to naming big numbers, whenever you find yourself just repeating the same operation over and over and over, it’s time to step back, and look for something new to do that transcends everything you were doing previously.  (Applications to everyday life left as exercises for the listener.)

One of the first people to think about systems for naming huge numbers was Archimedes, who was Greek but lived in what’s now Italy (specifically Syracuse, Sicily) in the 200s BC.  Archimedes wrote a sort of pop-science article—possibly history’s first pop-science article—called The Sand-Reckoner.  In this remarkable piece, which was addressed to the King of Syracuse, Archimedes sets out to calculate an upper bound on the number of grains of sand needed to fill the entire universe, or at least the universe as known in antiquity.  He thereby seeks to refute people who use “the number of sand grains” as a shorthand for uncountability and unknowability.

Of course, Archimedes was just guessing about the size of the universe, though he did use the best astronomy available in his time—namely, the work of Aristarchus, who anticipated Copernicus.  Besides estimates for the size of the universe and of a sand grain, the other thing Archimedes needed was a way to name arbitrarily large numbers.  Since he didn’t have Arabic numerals or scientific notation, his system was basically just to compose the word “myriad” (which means 10,000) into bigger and bigger chunks: a “myriad myriad” gets its own name, a “myriad myriad myriad” gets another, and so on.  Using this system, Archimedes estimated that ~10^63 sand grains would suffice to fill the universe.  Ancient Hindu mathematicians were able to name similarly large numbers using similar notations.  In some sense, the next really fundamental advances in naming big numbers wouldn’t occur until the 20th century.

We’ll come to those advances, but before we do, I’d like to discuss another question that motivated Archimedes’ essay: namely, what are the biggest numbers relevant to the physical world?

For starters, how many atoms are in a human body?  Anyone have a guess?  About 10^28.  (If you remember from high-school chemistry that a “mole” is 6×10^23, this is not hard to ballpark.)
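
(If you want to see roughly where that estimate comes from, here is a quick back-of-the-envelope in Python; treating the body as 70 kilograms of water is my own simplifying assumption, not part of the talk.)

```python
# Rough sketch: treat the body as ~70 kg of water and count atoms.
avogadro = 6.02e23          # molecules per mole
body_mass_g = 70_000        # ~70 kg body (assumption)
molar_mass_water_g = 18     # grams per mole of H2O

water_molecules = body_mass_g / molar_mass_water_g * avogadro   # ~2.3e27
atoms = 3 * water_molecules                                      # 3 atoms per H2O molecule
print(f"{atoms:.1e}")   # ~7e27, i.e. on the order of 10^28
```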

How many stars are in our galaxy?  Estimates vary, but let’s say a few hundred billion.

How many stars are in the entire observable universe?  Something like 10^23.

How many subatomic particles are in the observable universe?  No one knows for sure—for one thing, because we don’t know what the dark matter is made of—but 10^90 is a reasonable estimate.

Some of you might be wondering: but for all anyone knows, couldn’t the universe be infinite?  Couldn’t it have infinitely many stars and particles?  The answer to that is interesting: indeed, no one knows whether space goes on forever or curves back on itself, like the surface of the earth.  But because of the dark energy, discovered in 1998, it seems likely that even if space is infinite, we can only ever see a finite part of it.  The dark energy is a force that pushes the galaxies apart.  The further away they are from us, the faster they’re receding—with galaxies far enough away from us receding faster than light.

Right now, we can see the light from galaxies that are up to about 45 billion light-years away.  (Why 45 billion light-years, you ask, if the universe itself is “only” 13.8 billion years old?  Well, when the galaxies emitted the light, they were a lot closer to us than they are now!  The universe expanded in the meantime.)  If, as seems likely, the dark energy has the form of a cosmological constant, then there’s a somewhat further horizon, such that it’s not just that the galaxies beyond that can’t be seen by us right now—it’s that they can never be seen.


In practice, many big numbers come from the phenomenon of exponential growth.  Here’s a graph showing the three functions n, n^2, and 2^n:

The difference is, n and even n^2 grow in a more-or-less manageable way, but 2^n just shoots up off the screen.  The shooting-up has real-life consequences—indeed, more important consequences than just about any other mathematical fact one can think of.
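
(If you’d rather see the gap numerically than on a graph, a few lines of Python make the point; this little table is my own illustration, not one of the talk’s figures.)

```python
# n grows linearly, n^2 polynomially, 2^n exponentially
for n in (10, 20, 50, 100):
    print(n, n**2, 2**n)
# 10     100   1024
# 20     400   1048576
# 50    2500   1125899906842624
# 100  10000   1267650600228229401496703205376
```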

The current human population is about 7.5 billion (when I was a kid, it was more like 5 billion).  Right now, the population is doubling about once every 64 years.  If it continues to double at that rate, and humans don’t colonize other worlds, then you can calculate that, less than 3000 years from now, the entire earth, all the way down to the core, will be made of human flesh.  I hope the people use deodorant!
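
(Here’s a sketch of that calculation in Python; the Earth mass, the 70-kilogram person, and the assumption that the 64-year doubling time stays constant are all, of course, rough.)

```python
import math

earth_mass_kg = 6.0e24        # rough mass of the Earth (assumption)
person_mass_kg = 70.0         # rough mass of one person (assumption)
population = 7.5e9
doubling_time_years = 64

# How many doublings until the total human mass equals the Earth's mass?
doublings = math.log2(earth_mass_kg / (person_mass_kg * population))
print(doublings * doubling_time_years)   # ~2800 years, comfortably under 3000
```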

Nuclear chain reactions are a second example of exponential growth: one uranium or plutonium nucleus fissions and emits neutrons that cause, let’s say, two other nuclei to fission, which then cause four nuclei to fission, then 8, 16, 32, and so on, until boom, you’ve got your nuclear weapon (or your nuclear reactor, if you do something to slow the process down).  A third example is compound interest, as with your bank account, or for that matter an entire country’s GDP.  A fourth example is Moore’s Law, which is the thing that said that the number of components in a microprocessor doubled every 18 months (with other metrics, like memory, processing speed, etc., on similar exponential trajectories).  Here at Festivaletteratura, there’s a “Hack Space,” where you can see state-of-the-art Olivetti personal computers from around 1980: huge desk-sized machines with maybe 16K of usable RAM.  Moore’s Law is the thing that took us from those (and the even bigger, weaker computers before them) to the smartphone that’s in your pocket.

However, a general rule is that any time we encounter exponential growth in our observed universe, it can’t last for long.  It will stop, if not before, then when it runs out of whatever resource it needs to continue: for example, food or land in the case of people, or fuel in the case of a nuclear reaction.  OK, but what about Moore’s Law: what physical constraint will stop it?

By some definitions, Moore’s Law has already stopped: computers aren’t getting that much faster in terms of clock speed; they’re mostly just getting more and more parallel, with more and more cores on a chip.  And it’s easy to see why: the speed of light is finite, which means the speed of a computer will always be limited by the size of its components.  And transistors are now just 15 nanometers across; a couple orders of magnitude smaller and you’ll be dealing with individual atoms.  And unless we leap really far into science fiction, it’s hard to imagine building a transistor smaller than one atom across!

OK, but what if we do leap really far into science fiction?  Forget about engineering difficulties: is there any fundamental principle of physics that prevents us from making components smaller and smaller, and thereby making our computers faster and faster, without limit?

While no one has tested this directly, it appears from current physics that there is a fundamental limit to speed, and that it’s about 10^43 operations per second, or one operation per Planck time.  Likewise, it appears that there’s a fundamental limit to the density with which information can be stored, and that it’s about 10^69 bits per square meter, or one bit per Planck area. (Surprisingly, the latter limit scales only with the surface area of a region, not with its volume.)

What would happen if you tried to build a faster computer than that, or a denser hard drive?  The answer is: cycling through that many different states per second, or storing that many bits, would involve concentrating so much energy in so small a region, that the region would exceed what’s called its Schwarzschild radius.  If you don’t know what that means, it’s just a fancy way of saying that your computer would collapse to a black hole.  I’ve always liked that as Nature’s way of telling you not to do something!

Note that, on the modern view, a black hole itself is not only the densest possible object allowed by physics, but also the most efficient possible hard drive, storing ~10^69 bits per square meter of its event horizon—though the bits are not so easy to retrieve! It’s also, in a certain sense, the fastest possible computer, since it really does cycle through 10^43 states per second—though it might not be computing anything that anyone would care about.

We can also combine these fundamental limits on computer speed and storage capacity, with the limits that I mentioned earlier on the size of the observable universe, which come from the cosmological constant.  If we do so, we get an upper bound of ~10^122 on the number of bits that can ever be involved in any computation in our world, no matter how large: if we tried to do a bigger computation than that, the far parts of it would be receding away from us faster than the speed of light.  In some sense, this 10^122 is the most fundamental number that sets the scale of our universe: on the current conception of physics, everything you’ve ever seen or done, or will see or will do, can be represented by a sequence of at most 10^122 ones and zeroes.
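
(One crude way to see where a number like 10^122 comes from, under my own back-of-the-envelope assumptions of a cosmological horizon radius of order 10^26 meters and a Planck length of about 1.6×10^-35 meters, is simply to count Planck areas on the horizon:

$$ \left(\frac{R_{\text{horizon}}}{\ell_{\text{Planck}}}\right)^2 \approx \left(\frac{1.6\times 10^{26}\text{ m}}{1.6\times 10^{-35}\text{ m}}\right)^2 = 10^{122}. $$

This is only meant to convey the order of magnitude, not a careful derivation.)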


Having said that, in math, computer science, and many other fields (including physics itself), many of us meet bigger numbers than 10^122 dozens of times before breakfast! How so? Mostly because we choose to ask, not about the number of things that are, but about the number of possible ways they could be—not about the size of ordinary 3-dimensional space, but the sizes of abstract spaces of possible configurations. And the latter are subject to exponential growth, continuing way beyond 10^122.

As an example, let’s ask: how many different novels could possibly be written (say, at most 400 pages long, with a normal-size font, yadda yadda)? Well, we could get a lower bound on the number just by walking around here at Festivaletteratura, but the number that could be written certainly far exceeds the number that have been written or ever will be. This was the subject of Jorge Luis Borges’ famous story The Library of Babel, which imagined an immense library containing every book that could possibly be written up to a certain length. Of course, the vast majority of the books are filled with meaningless nonsense, but among their number one can find all the great works of literature, books predicting the future of humanity in perfect detail, books predicting the future except with a single error, etc. etc. etc.

To get more quantitative, let’s simply ask: how many different ways are there to fill the first page of a novel?  Let’s go ahead and assume that the page is filled with intelligible (or at least grammatical) English text, rather than arbitrary sequences of symbols, at a standard font size and page size.  In that case, using standard estimates for the entropy (i.e., compressibility) of English, I estimated this morning that there are maybe ~10^700 possibilities.  So, forget about the rest of the novel: there are astronomically more possible first pages than could fit in the observable universe!
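
(One rough way to land in that ballpark, using assumptions of my own rather than the exact ones behind the figure above: Shannon-style estimates put the entropy of English at around 1 bit per character, and a full page holds roughly 2,500 characters, giving something like

$$ 2^{2500} \approx 10^{750} $$

possible first pages, the same order-of-magnitude neighborhood as 10^700.)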

We could likewise ask: how many chess games could be played?  I’ve seen estimates from 10^40 up to 10^120, depending on whether we count only “sensible” games or also “absurd” ones (though in all cases, with a limit on the length of the game as might occur in a real competition). For Go, by contrast, which is played on a larger board (19×19 rather than 8×8), the estimates for the number of possible games seem to start at 10^800 and only increase from there. This difference in magnitudes has something to do with why Go is a “harder” game than chess, why computers were able to beat the world chess champion already in 1997, but the world Go champion not until last year.

Or we could ask: given a thousand cities, how many routes are there for a salesman that visit each city exactly once? We write the answer as 1000!, pronounced “1000 factorial,” which just means 1000×999×998×…×2×1: there are 1000 choices for the first city, then 999 for the second city, 998 for the third, and so on.  This number is about 4×10^2567.  So again, more possible routes than atoms in the visible universe, yadda yadda.

But suppose the salesman is interested only in the shortest route that visits each city, given the distance between every city and every other.  We could then ask: to find that shortest route, would a computer need to search exhaustively through all 1000! possibilities—or, maybe not all 1000!, maybe it could be a bit more clever than that, but at any rate, a number that grew exponentially with the number of cities n?  Or could there be an algorithm that zeroed in on the shortest route dramatically faster: say, using a number of steps that grew only linearly or quadratically with the number of cities?
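
(To make the contrast vivid, here’s a little Python sketch, my own illustration rather than anything from the talk: brute force handles 9 cities with ease, while the factorial count tells you how hopeless the same approach is for 1000 cities.)

```python
import itertools, math, random

def tour_length(order, dist):
    return sum(dist[order[i]][order[(i + 1) % len(order)]] for i in range(len(order)))

def brute_force_tsp(dist):
    """Try every tour that starts at city 0: (n-1)! of them."""
    n = len(dist)
    best = None
    for perm in itertools.permutations(range(1, n)):
        order = (0,) + perm
        length = tour_length(order, dist)
        if best is None or length < best[0]:
            best = (length, order)
    return best

n = 9
points = [(random.random(), random.random()) for _ in range(n)]
dist = [[math.dist(p, q) for q in points] for p in points]
print(brute_force_tsp(dist))              # fine for 9 cities: 8! = 40,320 tours
print(len(str(math.factorial(1000))))     # 2568 digits: 1000! ~ 4 x 10^2567 tours
```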

This, modulo a few details, is one of the most famous unsolved problems in all of math and science.  You may have heard of it; it’s called P versus NP.  P (Polynomial-Time) is the class of problems that an ordinary digital computer can solve in a “reasonable” amount of time, where we define “reasonable” to mean, growing at most like the size of the problem (for example, the number of cities) raised to some fixed power.  NP (Nondeterministic Polynomial-Time) is the class for which a computer can at least recognize a solution in polynomial-time.  If P=NP, it would mean that for every combinatorial problem of this sort, for which a computer could recognize a valid solution—Sudoku puzzles, scheduling airline flights, fitting boxes into the trunk of a car, etc. etc.—there would be an algorithm that cut through the combinatorial explosion of possible solutions, and zeroed in on the best one.  If P≠NP, it would mean that at least some problems of this kind required astronomical time, regardless of how cleverly we programmed our computers.

Most of us believe that P≠NP—indeed, I like to say that if we were physicists, we would’ve simply declared P≠NP a “law of nature,” and given ourselves Nobel Prizes for the discovery of the law!  And if it turned out that P=NP, we’d just give ourselves more Nobel Prizes for the law’s overthrow.  But because we’re mathematicians and computer scientists, we call it a “conjecture.”

Another famous example of an NP problem is: I give you (say) a 2000-digit number, and I ask you to find its prime factors.  Multiplying two 1000-digit numbers is easy, at least for a computer, but factoring the product back into primes seems astronomically hard—at least, with our present-day computers running any known algorithm.  Why does anyone care?  Well, you might know that, any time you order something online—in fact, every time you see a little padlock icon in your web browser—your personal information, like (say) your credit card number, is being protected by a cryptographic code that depends on the belief that factoring huge numbers is hard, or a few closely-related beliefs.  If P=NP, then those beliefs would be false, and indeed all cryptography that depends on hard math problems would be breakable in “reasonable” amounts of time.
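
(A toy demonstration of the asymmetry, in Python; the primes here are just the first ones a naive search finds above a million, chosen purely for illustration.)

```python
import math

def is_prime(n):
    return n > 1 and all(n % d for d in range(2, math.isqrt(n) + 1))

def next_prime(n):
    while not is_prime(n):
        n += 1
    return n

p, q = next_prime(10**6), next_prime(2 * 10**6)   # two ~7-digit primes
N = p * q                                          # multiplying: essentially instantaneous

def trial_division(n):
    """Naive factoring: tolerable at 13 digits, utterly hopeless at 2000 digits."""
    for d in range(2, math.isqrt(n) + 1):
        if n % d == 0:
            return d, n // d

print(N, trial_division(N))   # recovering p and q already takes visibly longer
```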

In the special case of factoring, though—and of the other number theory problems that underlie modern cryptography—it wouldn’t even take anything as shocking as P=NP for them to fall.  Actually, that provides a good segue into another case where exponentials, and numbers vastly larger than 10122, regularly arise in the real world: quantum mechanics.

Some of you might have heard that quantum mechanics is complicated or hard.  But I can let you in on a secret, which is that it’s incredibly simple once you take the physics out of it!  Indeed, I think of quantum mechanics as not exactly even “physics,” but more like an operating system that the rest of physics runs on as application programs.  It’s a certain generalization of the rules of probability.  In one sentence, the central thing quantum mechanics says is that, to fully describe a physical system, you have to assign a number called an “amplitude” to every possible configuration that the system could be found in.  These amplitudes are used to calculate the probabilities that the system will be found in one configuration or another if you look at it.  But the amplitudes aren’t themselves probabilities: rather than just going from 0 to 1, they can be positive or negative or even complex numbers.

For us, the key point is that, if we have a system with (say) a thousand interacting particles, then the rules of quantum mechanics say we need at least 2^1000 amplitudes to describe it—which is way more than we could write down on pieces of paper filling the entire observable universe!  In some sense, chemists and physicists have known about this immensity since 1926.  But they knew it mainly as a practical problem: if you’re trying to simulate quantum mechanics on a conventional computer, then as far as we know, the resources needed to do so increase exponentially with the number of particles being simulated.  Only in the 1980s did a few physicists, such as Richard Feynman and David Deutsch, suggest “turning the lemon into lemonade,” and building computers that themselves would exploit the exponential growth of amplitudes.  Supposing we built such a computer, what would it be good for?  At the time, the only obvious application was simulating quantum mechanics itself!  And that’s probably still the most important application today.

In 1994, though, a guy named Peter Shor made a discovery that dramatically increased the level of interest in quantum computers.  That discovery was that a quantum computer, if built, could factor an n-digit number using a number of steps that grows only like about n2, rather than exponentially with n.  The upshot is that, if and when practical quantum computers are built, they’ll be able to break almost all the cryptography that’s currently used to secure the Internet.

(Right now, only small quantum computers have been built; the record for using Shor’s algorithm is still to factor 21 into 3×7 with high statistical confidence!  But Google is planning within the next year or so to build a chip with 49 quantum bits, or qubits, and other groups around the world are pursuing parallel efforts.  Almost certainly, 49 qubits still won’t be enough to do anything useful, including codebreaking, but it might be enough to do something classically hard, in the sense of taking at least ~2^49, or 563 trillion, steps to simulate classically.)

I should stress, though, that for other NP problems—including breaking various other cryptographic codes, and solving the Traveling Salesman Problem, Sudoku, and the other combinatorial problems mentioned earlier—we don’t know any quantum algorithm analogous to Shor’s factoring algorithm.  For these problems, we generally think that a quantum computer could solve them in roughly the square root of the number of steps that would be needed classically, because of another famous quantum algorithm called Grover’s algorithm.  But getting an exponential quantum speedup for these problems would, at the least, require an additional breakthrough.  No one has proved that such a breakthrough in quantum algorithms is impossible: indeed, no one has proved that it’s impossible even for classical algorithms; that’s the P vs. NP question!  But most of us regard it as unlikely.

If we’re right, then the upshot is that quantum computers are not magic bullets: they might yield dramatic speedups for certain special problems (like factoring), but they won’t tame the curse of exponentiality, cut through to the optimal solution, every time we encounter a Library-of-Babel-like profusion of possibilities.  For (say) the Traveling Salesman Problem with a thousand cities, even a quantum computer—which is the most powerful kind of computer rooted in known laws of physics—might, for all we know, take longer than the age of the universe to find the shortest route.


The truth is, though, the biggest numbers that show up in math are way bigger than anything we’ve discussed until now: bigger than 10^122, or even

$$ 2^{10^{122}}, $$

which is a rough estimate for the number of quantum-mechanical amplitudes needed to describe our observable universe.

For starters, there’s Skewes’ number, which the mathematician G. H. Hardy once called “the largest number which has ever served any definite purpose in mathematics.”  Let π(x) be the number of prime numbers up to x: for example, π(10)=4, since we have 2, 3, 5, and 7.  Then there’s a certain estimate for π(x) called li(x).  It’s known that li(x) overestimates π(x) for an enormous range of x’s (up to trillions and beyond)—but then at some point, it crosses over and starts underestimating π(x) (then overestimates again, then underestimates, and so on).  Skewes’ number is an upper bound on the location of the first such crossover point.  In 1955, Skewes proved that the first crossover must happen before

$$ x = 10^{10^{10^{964}}}. $$

Note that this bound has since been substantially improved, to 1.4×10^316.  But no matter: there are numbers vastly bigger even than Skewes’ original estimate, which have since shown up in Ramsey theory and other parts of logic and combinatorics to take Skewes’ number’s place.

Alas, I won’t have time here to delve into specific (beautiful) examples of such numbers, such as Graham’s number.  So in lieu of that, let me just tell you about the sorts of processes, going far beyond exponentiation, that tend to yield such numbers.

The starting point is to remember a sequence of operations we all learn about in elementary school, and then ask why the sequence suddenly and inexplicably stops.

As long as we’re only talking about positive integers, “multiplication” just means “repeated addition.”  For example, 5×3 means 5 added to itself 3 times, or 5+5+5.

Likewise, “exponentiation” just means “repeated multiplication.”  For example, 5^3 means 5×5×5.

But what’s repeated exponentiation?  For that we introduce a new operation, which we call tetration, and write like so: ³5 means 5 raised to itself 3 times, or

$$ ^{3} 5 = 5^{5^5} = 5^{3125} \approx 1.9 \times 10^{2184}. $$

But we can keep going. Let x pentated to the y, or xPy, mean x tetrated to itself y times.  Let x sextated to the y, or xSy, mean x pentated to itself y times, and so on.

Then we can define the Ackermann function, invented by the mathematician Wilhelm Ackermann in 1928, which cuts across all these operations to get more rapid growth than we could with any one of them alone.  In terms of the operations above, we can give a slightly nonstandard, but perfectly serviceable, definition of the Ackermann function as follows:

A(1) is 1+1=2.

A(2) is 2×2=4.

A(3) is 3 to the 3rd power, or 3^3=27.

Not very impressive so far!  But wait…

A(4) is 4 tetrated to the 4, or

$$ ^{4}4 = 4^{4^{4^4}} = 4^{4^{256}} = BIG $$

A(5) is 5 pentated to the 5, which I won’t even try to simplify.  A(6) is 6 sextated to the 6.  And so on.
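
(Here’s a compact way to write this slightly nonstandard definition in Python, offered as a sketch; hyper(k, x, y) is the k-th operation in the addition, multiplication, exponentiation, tetration, ... sequence.)

```python
def hyper(k, x, y):
    """k=1: addition, k=2: multiplication, k=3: exponentiation, k=4: tetration, ..."""
    if k == 1:
        return x + y
    if y == 1:
        return x
    return hyper(k - 1, x, hyper(k, x, y - 1))

def A(n):
    """The talk's Ackermann-style function: n combined with itself by the n-th operation."""
    return hyper(n, n, n)

print(A(1), A(2), A(3))   # 2, 4, 27
# Don't try A(4): it equals 4**(4**256), whose exponent alone has 155 digits.
```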

More than just a curiosity, the Ackermann function actually shows up sometimes in math and theoretical computer science.  For example, the inverse Ackermann function—a function α such that α(A(n))=n, which therefore grows as slowly as the Ackermann function grows quickly, and which is at most 4 for any n that would ever arise in the physical universe—sometimes appears in the running times of real-world algorithms.

In the meantime, though, the Ackermann function also has a more immediate application.  Next time you find yourself in a biggest-number contest, like the one with which we opened this talk, you can just write A(1000), or even A(A(1000)) (after specifying that A means the Ackermann function above).  You’ll win—period—unless your opponent has also heard of something Ackermann-like or beyond.


OK, but Ackermann is very far from the end of the story.  If we want to go incomprehensibly beyond it, the starting point is the so-called “Berry Paradox”, which was first described by Bertrand Russell, though he said he learned it from a librarian named Berry.  The Berry Paradox asks us to imagine leaping past exponentials, the Ackermann function, and every other particular system for naming huge numbers.  Instead, why not just go straight for a single gambit that seems to beat everything else:

The biggest number that can be specified using a hundred English words or fewer

Why is this called a paradox?  Well, do any of you see the problem here?

Right: if the above made sense, then we could just as well have written

Twice the biggest number that can be specified using a hundred English words or fewer

But we just specified that number—one that, by definition, takes more than a hundred words to specify—using far fewer than a hundred words!  Whoa.  What gives?

Most logicians would say the resolution of this paradox is simply that the concept of “specifying a number with English words” isn’t precisely defined, so phrases like the ones above don’t actually name definite numbers.  And how do we know that the concept isn’t precisely defined?  Why, because if it was, then it would lead to paradoxes like the Berry Paradox!

So if we want to escape the jaws of logical contradiction, then in this gambit, we ought to replace English by a clear, logical language: one that can be used to specify numbers in a completely unambiguous way.  Like … oh, I know!  Why not write:

The biggest number that can be specified using a computer program that’s at most 1000 bytes long

To make this work, there are just two issues we need to get out of the way.  First, what does it mean to “specify” a number using a computer program?  There are different things it could mean, but for concreteness, let’s say a computer program specifies a number N if, when you run it (with no input), the program runs for exactly N steps and then stops.  A program that runs forever doesn’t specify any number.

The second issue is, which programming language do we have in mind: BASIC? C? Python?  The answer is that it won’t much matter!  The Church-Turing Thesis, one of the foundational ideas of computer science, implies that every “reasonable” programming language can emulate every other one.  So the story here can be repeated with just about any programming language of your choice.  For concreteness, though, we’ll pick one of the first and simplest programming languages, namely “Turing machine”—the language invented by Alan Turing all the way back in 1936!

In the Turing machine language, we imagine a one-dimensional tape divided into squares, extending infinitely in both directions, and with all squares initially containing a “0.”  There’s also a tape head with n “internal states,” moving back and forth on the tape.  Each internal state contains an instruction, and the only allowed instructions are: write a “0” in the current square, write a “1” in the current square, move one square left on the tape, move one square right on the tape, jump to a different internal state, halt, and do any of the previous conditional on whether the current square contains a “0” or a “1.”

Using Turing machines, in 1962 the mathematician Tibor Radó invented the so-called Busy Beaver function, or BB(n), which allowed naming by far the largest numbers anyone had yet named.  BB(n) is defined as follows: consider all Turing machines with n internal states.  Some of those machines run forever, when started on an all-0 input tape.  Discard them.  Among the ones that eventually halt, there must be some machine that runs for a maximum number of steps before halting.  However many steps that is, that’s what we call BB(n), the nth Busy Beaver number.
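
(To make the model concrete, here’s a minimal Turing-machine simulator in Python, run on a 2-state machine that achieves the maximum; it halts after exactly 6 steps, matching the value BB(2)=6 listed below.)

```python
def run(delta, max_steps=10**6):
    """Simulate a Turing machine on an all-0 tape; return (steps taken, 1s written)."""
    tape, head, state, steps = {}, 0, 'A', 0
    while state != 'HALT' and steps < max_steps:
        write, move, state = delta[(state, tape.get(head, 0))]
        tape[head] = write
        head += 1 if move == 'R' else -1
        steps += 1
    return steps, sum(tape.values())

# A 2-state, 2-symbol "champion": (write, move, next state) for each (state, symbol)
champion2 = {
    ('A', 0): (1, 'R', 'B'),
    ('A', 1): (1, 'L', 'B'),
    ('B', 0): (1, 'L', 'A'),
    ('B', 1): (1, 'R', 'HALT'),
}
print(run(champion2))   # (6, 4): six steps before halting, four 1s left on the tape
```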

The first few values of the Busy Beaver function have actually been calculated, so let’s see them.

BB(1) is 1.  For a 1-state Turing machine on an all-0 tape, the choices are limited: either you halt in the very first step, or else you run forever.

BB(2) is 6, as isn’t too hard to verify by trying things out with pen and paper.

BB(3) is 21: that determination was already a research paper.

BB(4) is 107 (another research paper).

Much like with the Ackermann function, not very impressive yet!  But wait:

BB(5) is not yet known, but it’s known to be at least 47,176,870.

BB(6) is at least 7.4×10^36,534.

BB(7) is at least

$$ 10^{10^{10^{10^{18,000,000}}}}. $$

Clearly we’re dealing with a monster here, but can we understand just how terrifying of a monster?  Well, call a sequence f(1), f(2), … computable, if there’s some computer program that takes n as input, runs for a finite time, then halts with f(n) as its output.  To illustrate, f(n)=n^2, f(n)=2^n, and even the Ackermann function that we saw before are all computable.

But I claim that the Busy Beaver function grows faster than any computable function.  Since this talk should have at least some math in it, let’s see a proof of that claim.

Maybe the nicest way to see it is this: suppose, to the contrary, that there were a computable function f that grew at least as fast as the Busy Beaver function.  Then by using that f, we could take the Berry Paradox from before, and turn it into an actual contradiction in mathematics!  So for example, suppose the program to compute f were a thousand bytes long.  Then we could write another program, not much longer than a thousand bytes, to run for (say) 2×f(1000000) steps: that program would just need to include a subroutine for f, plus a little extra code to feed that subroutine the input 1000000, and then to run for 2×f(1000000) steps.  But by assumption, f(1000000) is at least the maximum number of steps that any program up to a million bytes long can run for—even though we just wrote a program, less than a million bytes long, that ran for more steps!  This gives us our contradiction.  The only possible conclusion is that the function f, and the program to compute it, couldn’t have existed in the first place.

(As an alternative, rather than arguing by contradiction, one could simply start with any computable function f, and then build programs that compute f(n) for various “hardwired” values of n, in order to show that BB(n) must grow at least as rapidly as f(n).  Or, for yet a third proof, one can argue that, if any upper bound on the BB function were computable, then one could use that to solve the halting problem, which Turing famously showed to be uncomputable in 1936.)

In some sense, it’s not so surprising that the BB function should grow uncomputably quickly—because if it were computable, then huge swathes of mathematical truth would be laid bare to us.  For example, suppose we wanted to know the truth or falsehood of the Goldbach Conjecture, which says that every even number 4 or greater can be written as a sum of two prime numbers.  Then we’d just need to write a program that checked each even number one by one, and halted if and only if it found one that wasn’t a sum of two primes.  Suppose that program corresponded to a Turing machine with N states.  Then by definition, if it halted at all, it would have to halt after at most BB(N) steps.  But that means that, if we knew BB(N)—or even any upper bound on BB(N)—then we could find out whether our program halts, by simply running it for the requisite number of steps and seeing.  In that way we’d learn the truth or falsehood of Goldbach’s Conjecture—and similarly for the Riemann Hypothesis, and every other famous unproved mathematical conjecture (there are a lot of them) that can be phrased in terms of a computer program never halting.

(Here, admittedly, I’m using “we could find” in an extremely theoretical sense.  Even if someone handed you an N-state Turing machine that ran for BB(N) steps, the number BB(N) would be so hyper-mega-astronomical that, in practice, you could probably never distinguish the machine from one that simply ran forever.  So the aforementioned “strategy” for proving Goldbach’s Conjecture, or the Riemann Hypothesis would probably never yield fruit before the heat death of the universe, even though in principle it would reduce the task to a “mere finite calculation.”)
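
(For concreteness, here’s the kind of Goldbach-checking program the argument has in mind, written as a Python sketch rather than a Turing machine; compiled down to some N-state machine, any upper bound on BB(N) would, in principle, bound how long we’d have to run it.)

```python
import itertools, math

def is_prime(n):
    return n > 1 and all(n % d for d in range(2, math.isqrt(n) + 1))

def is_sum_of_two_primes(n):
    return any(is_prime(p) and is_prime(n - p) for p in range(2, n // 2 + 1))

# Halts if and only if Goldbach's Conjecture is false
for n in itertools.count(4, 2):
    if not is_sum_of_two_primes(n):
        print("Counterexample:", n)
        break
```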

OK, you wanna know something else wild about the Busy Beaver function?  In 2016, my former student Adam Yedidia and I wrote a paper where we proved that BB(8000)—i.e., the 8000th Busy Beaver number—can’t be determined using the usual axioms for mathematics, which are called Zermelo-Fraenkel (ZF) set theory.  Nor can BB(8001) or any larger Busy Beaver number.

To be sure, BB(8000) has some definite value: there are finitely many 8000-state Turing machines, and each one either halts or runs forever, and among the ones that halt, there’s some maximum number of steps that any of them runs for.  What we showed is that math, if it limits itself to the currently-accepted axioms, can never prove the value of BB(8000), even in principle.

The way we did that was by explicitly constructing an 8000-state Turing machine, which (in effect) enumerates all the consequences of the ZF axioms one after the next, and halts if and only if it ever finds a contradiction—that is, a proof of 0=1.  Presumably set theory is actually consistent, and therefore our program runs forever.  But if you proved the program ran forever, you’d also be proving the consistency of set theory.  And has anyone heard of any obstacle to doing that?  Of course, Gödel’s Incompleteness Theorem!  Because of Gödel, if set theory is consistent (well, technically, also arithmetically sound), then it can’t prove our program either halts or runs forever.  But that means set theory can’t determine BB(8000) either—because if it could do that, then it could also determine the behavior of our program.

To be clear, it was long understood that there’s some computer program that halts if and only if set theory is inconsistent—and therefore, that the axioms of set theory can determine at most k values of the Busy Beaver function, for some positive integer k.  “All” Adam and I did was to prove the first explicit upper bound, k≤8000, which required a lot of optimizations and software engineering to get the number of states down to something reasonable (our initial estimate was more like k≤1,000,000).  More recently, Stefan O’Rear has improved our bound—most recently, he says, to k≤1000, meaning that, at least by the lights of ZF set theory, fewer than a thousand values of the BB function can ever be known.

Meanwhile, let me remind you that, at present, only four values of the function are known!  Could the value of BB(100) already be independent of set theory?  What about BB(10)?  BB(5)?  Just how early in the sequence do you leap off into Platonic hyperspace?  I don’t know the answer to that question but would love to.


Ah, you ask, but is there any number sequence that grows so fast, it blows even the Busy Beavers out of the water?  There is!

Imagine a magic box into which you could feed in any positive integer n, and it would instantly spit out BB(n), the nth Busy Beaver number.  Computer scientists call such a box an “oracle.”  Even though the BB function is uncomputable, it still makes mathematical sense to imagine a Turing machine that’s enhanced by the magical ability to access a BB oracle any time it wants: call this a “super Turing machine.”  Then let SBB(n), or the nth super Busy Beaver number, be the maximum number of steps that any n-state super Turing machine makes before halting, if given no input.

By simply repeating the reasoning for the ordinary BB function, one can show that, not only does SBB(n) grow faster than any computable function, it grows faster than any function computable by super Turing machines (for example, BB(n), BB(BB(n)), etc.).

Let a super duper Turing machine be a Turing machine with access to an oracle for the super Busy Beaver numbers.  Then you can use super duper Turing machines to define a super duper Busy Beaver function, which you can use in turn to define super duper pooper Turing machines, and so on!

Let “level-1 BB” be the ordinary BB function, let “level-2 BB” be the super BB function, let “level-3 BB” be the super duper BB function, and so on.  Then clearly we can go to “level-k BB,” for any positive integer k.

But we need not stop even there!  We can then go to level-ω BB.  What’s ω?  Mathematicians would say it’s the “first infinite ordinal”—the ordinals being a system where you can pass from any set of numbers you can possibly name (even an infinite set), to the next number larger than all of them.  More concretely, the level-ω Busy Beaver function is simply the Busy Beaver function for Turing machines that are able, whenever they want, to call an oracle to compute the level-k Busy Beaver function, for any positive integer k of their choice.

But why stop there?  We can then go to level-(ω+1) BB, which is just the Busy Beaver function for Turing machines that are able to call the level-ω Busy Beaver function as an oracle.  And thence to level-(ω+2) BB, level-(ω+3) BB, etc., defined analogously.  But then we can transcend that entire sequence and go to level-ω·2 BB, which involves Turing machines that can call level-(ω+k) BB as an oracle for any positive integer k.  In the same way, we can pass to level-ω·3 BB, level-ω·4 BB, etc., until we transcend that entire sequence and pass to level-ω^2 BB, which can call any of the previous ones as oracles.  Then we have level-ω^3 BB, level-ω^4 BB, etc., until we transcend that whole sequence with level-ω^ω BB.  But we’re still not done!  For why not pass to level

$$ \omega^{\omega^{\omega}} $$,

level

$$ \omega^{\omega^{\omega^{\omega}}} $$,

etc., until we reach level

$$ \left. \omega^{\omega^{\omega^{.^{.^{.}}}}}\right\} _{\omega\text{ times}} $$?

(This last ordinal is also called ε₀.)  And mathematicians know how to keep going even to way, way bigger ordinals than ε₀, which give rise to ever more rapidly-growing Busy Beaver sequences.  Ordinals achieve something that on its face seems paradoxical, which is to systematize the concept of transcendence.

So then just how far can you push this?  Alas, ultimately the answer depends on which axioms you assume for mathematics.  The issue is this: once you get to sufficiently enormous ordinals, you need some systematic way to specify them, say by using computer programs.  But then the question becomes which ordinals you can “prove to exist,” by giving a computer program together with a proof that the program does what it’s supposed to do.  The more powerful the axiom system, the bigger the ordinals you can prove to exist in this way—but every axiom system will run out of gas at some point, only to be transcended, in Gödelian fashion, by a yet more powerful system that can name yet larger ordinals.

So for example, if we use Peano arithmetic—invented by the Italian mathematician Giuseppe Peano—then Gentzen proved in the 1930s that we can name any ordinals below ε₀, but not ε₀ itself or anything beyond it.  If we use ZF set theory, then we can name vastly bigger ordinals, but once again we’ll eventually run out of steam.

(Technical remark: some people have claimed that we can transcend this entire process by passing from first-order to second-order logic.  But I fundamentally disagree, because with second-order logic, which number you’ve named could depend on the model of set theory, and therefore be impossible to pin down.  With the ordinal Busy Beaver numbers, by contrast, the number you’ve named might be breathtakingly hopeless ever to compute—but provided the notations have been fixed, and the ordinals you refer to actually exist, at least we know there is a unique positive integer that you’re talking about.)

Anyway, the upshot of all of this is that, if you try to hold a name-the-biggest-number contest between two actual professionals who are trying to win, it will (alas) degenerate into an argument about the axioms of set theory.  For the stronger the set theory you’re allowed to assume consistent, the bigger the ordinals you can name, therefore the faster-growing the BB functions you can define, therefore the bigger the actual numbers.

So, yes, in the end the biggest-number contest just becomes another Gödelian morass, but one can get surprisingly far before that happens.


In the meantime, our universe seems to limit us to at most 10^122 choices that could ever be made, or experiences that could ever be had, by any one observer.  Or fewer, if you believe that you won’t live until the heat death of the universe in some post-Singularity computer cloud, but for at most about 10^2 years.  Meanwhile, the survival of the human race might hinge on people’s ability to understand much smaller numbers than 10^122: for example, a billion, a trillion, and other numbers that characterize the exponential growth of our civilization and the limits that we’re now running up against.

On a happier note, though, if our goal is to make math engaging to young people, or to build bridges between the quantitative and literary worlds, the way this festival is doing, it seems to me that it wouldn’t hurt to let people know about the vastness that’s out there.  Thanks for your attention.

GapP, Oracles, and Quantum Supremacy

September 1st, 2017

Let me start with a few quick announcements before the main entrée:

First, the website haspvsnpbeensolved.com is now live!  Thanks so much to my friend Adam Chalmers for setting it up.  Please try it out on your favorite P vs. NP solution paper—I think you’ll be impressed by how well our secret validation algorithm performs.

Second, some readers might enjoy a YouTube video of me lecturing about the computability theory of closed timelike curves, from the Workshop on Computational Complexity and High Energy Physics at the University of Maryland a month ago.  Other videos from the workshop—including of talks by John Preskill, Daniel Harlow, Stephen Jordan, and other names known around Shtetl-Optimized, and of a panel discussion in which I participated—are worth checking out as well.  Thanks so much to Stephen for organizing such a great workshop!

Third, thanks to everyone who’s emailed to ask whether I’m holding up OK with Hurricane Harvey, and whether I know how to swim (I do).  As it happens, I haven’t been in Texas for two months—I spent most of the summer visiting NYU and doing other travel, and this year, Dana and I are doing an early sabbatical at Tel Aviv University.  However, I understand from friends that Austin, being several hours’ drive further inland, got nothing compared to what Houston did, and that UT is open on schedule for the fall semester.  Hopefully our house is still standing as well!  Our thoughts go to all those affected by the disaster in Houston.  Eventually, the Earth’s rapidly destabilizing climate almost certainly means that Austin will be threatened as well by “500-year events” happening every year or two, as for that matter will a large portion of the earth’s surface.  For now, though, Austin lives to be weird another day.


GapP, Oracles, and Quantum Supremacy

by Scott Aaronson

Stuart Kurtz 60th Birthday Conference, Columbia, South Carolina

August 20, 2017

It’s great to be here, to celebrate the life and work of Stuart Kurtz, which could never be … eclipsed … by anything.

I wanted to say something about work in structural complexity and counting complexity and oracles that Stuart was involved with “back in the day,” and how that work plays a major role in issues that concern us right now in quantum computing.  A major goal for the next few years is the unfortunately-named Quantum Supremacy.  What this means is to get a clear quantum speedup, for some task: not necessarily a useful task, but something that we can be as confident as possible is classically hard.  For example, consider the 49-qubit superconducting chip that Google is planning to fabricate within the next year or so.  This won’t yet be good enough for running Shor’s algorithm, to factor numbers of any interesting size, but it hopefully will be good enough to sample from a probability distribution over n-bit strings—in this case, 49-bit strings—that’s hard to sample from classically, taking somewhere on the order of 2^49 steps.

Furthermore, the evidence that that sort of thing is indeed classically hard, might actually be stronger than the evidence that factoring is classically hard.  As I like to say, a fast classical factoring algorithm would “merely” collapse the world’s electronic commerce—as far as we know, it wouldn’t collapse the polynomial hierarchy!  By contrast, a fast classical algorithm to simulate quantum sampling would collapse the polynomial hierarchy, assuming the simulation is exact.  Let me first go over the argument for that, and then explain some of the more recent things we’ve learned.

Our starting point will be two fundamental complexity classes, #P and GapP.

#P is the class of all nonnegative integer functions f, for which there exists a nondeterministic polynomial-time Turing machine M such that f(x) equals the number of accepting paths of M(x).  Less formally, #P is the class of problems that boil down to summing up an exponential number of nonnegative terms, each of which is efficiently computable individually.

GapP—introduced by Fenner, Fortnow, and Kurtz in 1992—can be defined as the set {f-g : f,g∈#P}; that is, the closure of #P under subtraction.  Equivalently, GapP is the class of problems that boil down to summing up an exponential number of terms, each of which is efficiently computable individually, but which could be either positive or negative, and which can therefore cancel each other out.  As you can see, GapP is a class that in some sense anticipates quantum computing!
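
(Here’s a toy illustration of the two definitions, with an arbitrary 4-bit predicate standing in for the efficiently computable terms; it’s only meant to show the role of cancellation, and nothing about it is specific to the Fenner-Fortnow-Kurtz paper.)

```python
import itertools

def phi(z):
    """A toy polynomial-time predicate on 4 bits."""
    a, b, c, d = z
    return (a or b) and (not c or d)

assignments = list(itertools.product([0, 1], repeat=4))

# #P-style: a sum of exponentially many nonnegative terms
count = sum(1 for z in assignments if phi(z))

# GapP-style: the same kind of sum, but with +1/-1 terms that can cancel
gap = sum(+1 if phi(z) else -1 for z in assignments)

print(count, gap)   # gap = (# accepting) - (# rejecting)
```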

For our purposes, the most important difference between #P and GapP is that #P functions can at least be multiplicatively approximated in the class BPP^NP, by using Stockmeyer’s technique of approximating counting with universal hash functions.  By contrast, even if you just want to approximate a GapP function to within (say) a factor of 2—or for that matter, just decide whether a GapP function is positive or negative—it’s not hard to see that that’s already a #P-hard problem.  For, supposing we had an oracle to solve this problem, we could then shift the sum this way and that by adding positive and negative dummy terms, and use binary search, to zero in on the sum’s exact value in polynomial time.

It’s also not hard to see that a quantum computation can encode an arbitrary GapP function in one of its amplitudes.  Indeed, let s:{0,1}^n→{1,-1} be any Boolean function that’s given by a polynomial-size circuit.  Then consider the quantum circuit below.

When we run this circuit, the probability that we see the all-0 string as output is

$$ \left( \frac{1}{2^n} \sum_{z\in \{0,1\}^n} s(z) \right)^2 = \frac{1}{4^n} \sum_{z,w\in \{0,1\}^n} s(z) s(w) $$

which is clearly in GapP, and clearly #P-hard even to approximate to within a multiplicative factor.
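
(Here’s a brute-force check of that formula for a toy choice of s, assuming the circuit in question is the usual Hadamard layer, ±1 phase, Hadamard layer construction, since the figure isn’t reproduced here; the particular s is an arbitrary example of mine.)

```python
import itertools
import numpy as np

n = 3

def s(z):
    """A toy s: {0,1}^n -> {+1,-1}; any polynomial-size circuit would do."""
    return -1 if all(z) else +1

# The formula: Pr[all-0 outcome] = ((1/2^n) * sum_z s(z))^2
prob_formula = (sum(s(z) for z in itertools.product([0, 1], repeat=n)) / 2**n) ** 2

# Direct state-vector simulation: Hadamards, then the +/-1 phases, then Hadamards
H1 = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
H = H1
for _ in range(n - 1):
    H = np.kron(H, H1)
state = np.zeros(2**n)
state[0] = 1.0
phases = np.array([s(tuple(int(b) for b in format(i, f"0{n}b"))) for i in range(2**n)])
state = H @ (phases * (H @ state))
prob_sim = state[0] ** 2

print(prob_formula, prob_sim)   # both ~0.5625 for this s and n = 3
```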

By contrast, suppose we had a probabilistic polynomial-time classical algorithm, call it M, to sample the output distribution of the above quantum circuit.  Then we could rewrite the above probability as Pr_r[M(r) outputs 0…0], where r consists of the classical random bits used by M.  This is again an exponentially large sum, with one term for each possible r value—but now it’s a sum of nonnegative terms (probabilities), which is therefore approximable in BPP^NP.

We can state the upshot as follows.  Let ExactSampBPP be the class of sampling problems—that is, families of probability distributions {D_x}_x, one for each input x∈{0,1}^n—for which there exists a polynomial-time randomized algorithm that outputs a sample exactly from D_x, in time polynomial in |x|.  Let ExactSampBQP be the same thing except that we allow a polynomial-time quantum algorithm.  Then we have that, if ExactSampBPP = ExactSampBQP, then squared sums of both positive and negative terms could efficiently be rewritten as sums of nonnegative terms only—and hence P^#P = BPP^NP.  This, in turn, would collapse the polynomial hierarchy to the third level, by Toda’s Theorem that PH ⊆ P^#P, together with the result BPP^NP ⊆ Σ₃.  To summarize:

Theorem 1.  Quantum computers can efficiently solve exact sampling problems that are classically hard unless the polynomial hierarchy collapses.

(In fact, the argument works not only if the classical algorithm exactly samples D_x, but if it samples from any distribution in which the probabilities are multiplicatively close to D_x’s.  If we really only care about exact sampling, then we can strengthen the conclusion to get that PH collapses to the second level.)

This sort of reasoning was implicit in several early works, including those of Fenner et al. and Terhal and DiVincenzo.  It was made fully explicit in my paper with Alex Arkhipov on BosonSampling in 2011, and in the independent work of Bremner, Jozsa, and Shepherd on the IQP model.  These works actually showed something stronger, which is that we get a collapse of PH, not merely from a fast classical algorithm to simulate arbitrary quantum systems, but from fast classical algorithms to simulate various special quantum systems.  In the case of BosonSampling, that special system is a collection of identical, non-interacting photons passing through a network of beamsplitters, then being measured at the very end to count the number of photons in each mode.  In the case of IQP, the special system is a collection of qubits that are prepared, subjected to some commuting Hamiltonians acting on various subsets of the qubits, and then measured.  These special systems don’t seem to be capable of universal quantum computation (or for that matter, even universal classical computation!)—and correspondingly, many of them seem easier to realize in the lab than a full universal quantum computer.

From an experimental standpoint, though, all these results are unsatisfactory, because they all talk only about the classical hardness of exact (or very nearly exact) sampling—and indeed, the arguments are based around the hardness of estimating just a single, exponentially-small amplitude.  But any real experiment will have tons of noise and inaccuracy, so it seems only fair to let the classical simulation be subject to serious noise and inaccuracy as well—but as soon as we do, the previous argument collapses.

Thus, from the very beginning, Alex Arkhipov and I took it as our “real” goal to show, under some reasonable assumption, that there’s a distribution D that a polynomial-time quantum algorithm can sample from, but such that no polynomial-time classical algorithm can sample from any distribution that’s even ε-close to D in variation distance.  Indeed, this goal is what led us to BosonSampling in the first place: we knew that we needed amplitudes that were not only #P-hard but “robustly” #P-hard; we knew that the permanent of an n×n matrix (at least over finite fields) was the canonical example of a “robustly” #P-hard function; and finally, we knew that systems of identical non-interacting bosons, such as photons, gave rise to amplitudes that were permanents in an extremely natural way.  The fact that photons actually exist in the physical world, and that our friends with quantum optics labs like to do experiments with them, was just a nice bonus!

A bit more formally, let ApproxSampBPP be the class of sampling problems for which there exists a classical algorithm that, given an input x∈{0,1}^n and a parameter ε>0, samples a distribution that’s at most ε away from D_x in variation distance, in time polynomial in n and 1/ε.  Let ApproxSampBQP be the same except that we allow a quantum algorithm.  Then the “dream” result that we’d love to prove—both then and now—is the following.

Strong Quantum Supremacy Conjecture.  If ApproxSampBPP = ApproxSampBQP, then the polynomial hierarchy collapses.

Unfortunately, Alex and I were only able to prove this conjecture assuming a further hypothesis, about the permanents of i.i.d. Gaussian matrices.

Theorem 2 (A.-Arkhipov).  Given an n×n matrix X of independent complex Gaussian entries, each of mean 0 and variance 1, assume it’s a #P-hard problem to approximate |Per(X)|^2 to within ±ε⋅n!, with probability at least 1-δ over the choice of X, in time polynomial in n, 1/ε, and 1/δ.  Then the Strong Quantum Supremacy Conjecture holds.  Indeed, more than that: in such a case, even a fast approximate classical simulation of BosonSampling, in particular, would imply P^{#P}=BPP^{NP} and hence a collapse of PH.

Alas, after some months of effort, we were unable to prove the needed #P-hardness result for Gaussian permanents, and it remains an outstanding open problem—there’s not even a consensus as to whether it should be true or false.  Note that there is a famous polynomial-time classical algorithm to approximate the permanents of nonnegative matrices, due to Jerrum, Sinclair, and Vigoda, but that algorithm breaks down for matrices with negative or complex entries.  This is once again the power of cancellations, the difference between #P and GapP.
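Just to make the object in question concrete, here’s a small Python sketch (mine, not from the paper) that draws a tiny i.i.d. complex Gaussian matrix and computes |Per(X)|^2 exactly via Ryser’s formula, in exponential time of course; the open question is whether the approximate version, on random inputs like this one, is #P-hard or admits something faster:

    import itertools
    import numpy as np

    def permanent_ryser(A):
        """Exact permanent via Ryser's formula, in O(2^n poly(n)) time."""
        n = A.shape[0]
        total = 0j
        for r in range(1, n + 1):
            for S in itertools.combinations(range(n), r):
                row_sums = A[:, list(S)].sum(axis=1)   # sum over j in S of A[i,j], for each row i
                total += (-1) ** r * np.prod(row_sums)
        return (-1) ** n * total

    # An n-by-n matrix of i.i.d. complex Gaussians with mean 0 and variance 1
    n = 6
    rng = np.random.default_rng(0)
    X = (rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))) / np.sqrt(2)

    # |Per(X)|^2: the quantity whose approximation (to within +-eps*n!) Theorem 2 conjectures to be #P-hard
    print(abs(permanent_ryser(X)) ** 2)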

Frustratingly, we were able to prove that computing the exact permanents of i.i.d. Gaussian matrices is #P-hard; and we also know that approximating the permanents of arbitrary matrices is #P-hard—it’s only when approximation and random inputs appear in the same problem that we no longer have the tools to prove #P-hardness.

In the meantime, one can also ask a meta-question.  How hard should it be to prove the Strong Quantum Supremacy Conjecture?  Were we right to look at slightly exotic objects, like the permanents of Gaussian matrices?  Or could Strong Quantum Supremacy have a “pure, abstract complexity theory proof”?

Well, one way to formalize that question is to ask whether Strong Quantum Supremacy has a relativizing proof, a proof that holds in the presence of an arbitrary oracle.  Alex and I explicitly raised that as an open problem in our BosonSampling paper.

Note that “weak” quantum supremacy—i.e., the statement that ExactSampBPP = ExactSampBQP collapses the polynomial hierarchy—has a relativizing proof, namely the proof that I sketched earlier.  All the ingredients that we used—Toda’s Theorem, Stockmeyer approximate counting, simple manipulations of quantum circuits—were relativizing ingredients.  By contrast, all the way back in 1998, Fortnow and Rogers proved the following.

Theorem 3 (Fortnow and Rogers).  There exists an oracle relative to which P=BQP and yet PH is infinite.

In other words, if you want to prove that P=BQP collapses the polynomial hierarchy, the proof can’t be relativizing.  This theorem was subsequently generalized in a paper by Fenner, Fortnow, Kurtz, and Li, which used concepts like “generic oracles” that seem powerful but that I don’t understand.

The trouble is, Fortnow and Rogers’s construction was extremely tailored to making P=BQP.  It didn’t even make PromiseBPP=PromiseBQP (that is, it allowed that quantum computers might still be stronger than classical ones for promise problems), let alone did it collapse quantum with classical for sampling problems.

We can organize the various quantum/classical collapse possibilities as follows:

ExactSampBPP = ExactSampBQP

ApproxSampBPP = ApproxSampBQP   ⇔   FBPP = FBQP

PromiseBPP = PromiseBQP

BPP = BQP

Here FBPP is the class of relation problems solvable in randomized polynomial time—that is, problems where given an input x∈{0,1}^n and a parameter ε>0, the goal is to produce any output in a certain set Sx, with success probability at least 1-ε, in time polynomial in n and 1/ε.  FBQP is the same thing except for quantum polynomial time.

The equivalence between the two equalities ApproxSampBPP = ApproxSampBQP and FBPP=FBQP is not obvious, and was the main result in my 2011 paper The Equivalence of Sampling and Searching.  While it’s easy to see that ApproxSampBPP = ApproxSampBQP implies FBPP=FBQP, the opposite direction requires us to take an arbitrary sampling problem S, and define a relation problem RS that has “essentially the same difficulty” as S (in the sense that RS has an efficient classical algorithm iff S does, RS has an efficient quantum algorithm iff S does, etc).  This, in turn, we do using Kolmogorov complexity: basically, RS asks us to output a tuple of samples that have large probabilities according to the requisite probability distribution from the sampling problem; and that also, conditioned on that, are close to algorithmically random.  The key observation is that, if a probabilistic Turing machine of fixed size can solve that relation problem for arbitrarily large inputs, then it must be doing so by sampling from a probability distribution close in variation distance to D—since any other approach would lead to outputs that were algorithmically compressible.

Be that as it may, staring at the chain of implications above, a natural question is which equalities in the chain collapse the polynomial hierarchy in a relativizing way, and which equalities collapse PH (if they do) only for deeper, non-relativizing reasons.

This is one of the questions that Lijie Chen and I took up, and settled, in our paper Complexity-Theoretic Foundations of Quantum Supremacy Experiments, which was presented at this summer’s Computational Complexity Conference (CCC) in Riga.  The “main” results in our paper—or at least, the results that the physicists care about—were about how confident we can be in the classical hardness of simulating quantum sampling experiments with random circuits, such as the experiments that the Google group will hopefully be able to do with its 49-qubit device in the near future.  This involved coming up with a new hardness assumption tailored to those sorts of experiments, giving a reduction from that assumption, and studying how far existing algorithms come toward breaking it (tl;dr: not very far).

But our paper also had what I think of as a “back end,” containing results mainly of interest to complexity theorists, about what kinds of quantum supremacy theorems we can and can’t hope for in principle.  When I’m giving talks about our paper to physicists, I never have time to get to this back end—it’s always just “blah, blah, we also did some stuff involving structural complexity and oracles.”  But given that a large fraction of all the people on earth who enjoy those things are probably right here in this room, in the rest of this talk, I’d like to tell you about what was in the back end.

The first thing there was the following result.

Theorem 4 (A.-Chen).  There exists an oracle relative to which ApproxSampBPP = ApproxSampBQP and yet PH is infinite. In other words, any proof of the Strong Quantum Supremacy Conjecture will require non-relativizing techniques.

Theorem 4 represents a substantial generalization of Fortnow and Rogers’s Theorem 3, in that it makes quantum and classical equivalent not only for promise problems, but even for approximate sampling problems.  There’s also a sense in which Theorem 4 is the best possible: as we already saw, there are no oracles relative to which ExactSampBPP = ExactSampBQP and yet PH is infinite, because the opposite conclusion relativizes.

So how did we prove Theorem 4?  Well, we learned at this workshop that Stuart Kurtz pioneered the development of principled ways to prove oracle results just like this one, with multiple “nearly conflicting” requirements.  But, because we didn’t know that at the time, we basically just plunged in and built the oracle we wanted by hand!

In more detail, you can think of our oracle construction as proceeding in three steps.

  1. We throw in an oracle for a PSPACE-complete problem.  This collapses ApproxSampBPP with ApproxSampBQP, which is what we want.  Unfortunately, it also collapses the polynomial hierarchy down to P, which is not what we want!
  2. So then we need to add in a second part of the oracle that makes PH infinite again.  From Håstad’s seminal work in the 1980s until recently, even if we just wanted any oracle that makes PH infinite, without doing anything else at the same time, we only knew how to achieve that with quite special oracles.  But in their 2015 breakthrough, Rossman, Servedio, and Tan have shown that even a random oracle makes PH infinite with probability 1.  So for simplicity, we might as well take this second part of the oracle to be random.  The “only” problem is that, along with making PH infinite, a random oracle will also re-separate ApproxSampBPP and ApproxSampBQP (and for that matter, even ExactSampBPP and ExactSampBQP)—for example, because of the Fourier sampling task performed by the quantum circuit I showed you earlier!  So we once again seem back where we started.
    (To ward off confusion: ever since Fortnow and Rogers posed the problem in 1998, it remains frustratingly open whether BPP and BQP can be separated by a random oracle—that’s a problem that I and others have worked on, making partial progress that makes a query complexity separation look unlikely without definitively ruling one out.  But separating the sampling versions of BPP and BQP by a random oracle is much, much easier.)
  3. So, finally, we need to take the random oracle that makes PH infinite, and “scatter its bits around randomly” in such a way that a PH machine can still find the bits, but an ApproxSampBQP machine can’t.  In other words: given our initial random oracle A, we can make a new oracle B such that B(y,r)=(1,A(y)) if r is equal to a single randomly-chosen “password” r_y, depending on the query y, and B(y,r)=(0,0) otherwise.  In that case, it takes just one more existential quantifier to guess the password r_y, so PH can do it, but a quantum algorithm is stuck, basically because the linearity of quantum mechanics makes the algorithm not very sensitive to tiny random changes to the oracle string (i.e., the same reason why Grover’s algorithm can’t be arbitrarily sped up).  Incidentally, the reason why the password r_y needs to depend on the query y is that otherwise the input x to the quantum algorithm could hardcode a password, and thereby reveal exponentially many bits of the random oracle A.
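Here’s a toy Python sketch of the Step 3 oracle (my own illustration; the real construction fixes all the randomness in advance and pads everything to the appropriate polynomial lengths):

    import secrets
    from functools import lru_cache

    R_BITS = 64   # stand-in for poly(n)-bit passwords; exponentially many possibilities

    @lru_cache(maxsize=None)
    def A(y):
        """The underlying random oracle: one uniformly random bit per query y, sampled lazily."""
        return secrets.randbits(1)

    @lru_cache(maxsize=None)
    def password(y):
        """The hidden password r_y, chosen uniformly at random for each query y."""
        return secrets.randbits(R_BITS)

    def B(y, r):
        """The 'scattered' oracle: reveals A(y) only when the password for y is supplied."""
        return (1, A(y)) if r == password(y) else (0, 0)

    # A PH machine can existentially guess r_y and thereby recover A(y); a polynomial-time quantum
    # algorithm querying B almost never finds a password, so A's bits stay effectively hidden from it.
    print(B("some query", 12345))                     # almost certainly (0, 0)
    print(B("some query", password("some query")))    # (1, A("some query"))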

We should now check: why does the above oracle “only” collapse ApproxSampBPP and ApproxSampBQP?  Why doesn’t it also collapse ExactSampBPP and ExactSampBQP—as we know that it can’t, by our previous argument?  The answer is: because a quantum algorithm does have an exponentially small probability of correctly guessing a given password ry.  And that’s enough to make the distribution sampled by the quantum algorithm differ, by 1/exp(n) in variation distance, from the distribution sampled by any efficient classical simulation of the algorithm—an error that doesn’t matter for approximate sampling, but does matter for exact sampling.

Anyway, it’s then just like seven pages of formalizing the above intuitions and you’re done!

OK, since there seems to be time, I’d like to tell you about one more result from the back end of my and Lijie’s paper.

If we can work relative to whatever oracle A we like, then it’s easy to get quantum supremacy, and indeed BPP^A≠BQP^A.  We can, for example, use Simon’s problem, or Shor’s period-finding problem, or Forrelation, or other choices of black-box problems that admit huge, provable quantum speedups.  In the unrelativized world, by contrast, it’s clear that we have to make some complexity assumption for quantum supremacy—even if we just want ExactSampBPP ≠ ExactSampBQP.  For if (say) P=P^{#P}, then ExactSampBPP and ExactSampBQP would collapse as well.

Lijie and I were wondering: what happens if we try to “interpolate” between the relativized and unrelativized worlds?  More specifically, what happens if our algorithms are allowed to query a black box, but we’re promised that whatever’s inside the black box is efficiently computable (i.e., has a small circuit)?  How hard is it to separate BPP from BQP, or ApproxSampBPP from ApproxSampBQP, relative to an oracle A that’s constrained to lie in P/poly?

Here, we’ll start with a beautiful observation that’s implicit in 2004 work by Servedio and Gortler, as well as 2012 work by Mark Zhandry.  In our formulation, this observation is as follows:

Theorem 5.  Suppose there exist cryptographic one-way functions (even just against classical adversaries).  Then there exists an oracle A∈P/poly such that BPP^A≠BQP^A.

While we still need to make a computational hardness assumption here, to separate quantum from classical computing, the surprise is that the assumption is so much weaker than what we’re used to.  We don’t need to assume the hardness of factoring or discrete log—or for that matter, of any “structured” problem that could be a basis for, e.g., public-key cryptography.  Just a one-way function that’s hard to invert, that’s all!

The intuition here is really simple.  Suppose there’s a one-way function; then it’s well-known, by the HILL and GGM Theorems of classical cryptography, that we can bootstrap it to get a cryptographic pseudorandom function family.  This is a family of polynomial-time computable functions f_s:{0,1}^n→{0,1}^n, parameterized by a secret seed s, such that f_s can’t be distinguished from a truly random function f by any polynomial-time algorithm that’s given oracle access to the function and that doesn’t know s.  Then, as our efficiently computable oracle A that separates quantum from classical computing, we take an ensemble of functions like

g_{s,r}(x) = f_s(x mod r),

where r is an exponentially large integer that serves as a “hidden period,” and s and r are both secrets stored by the oracle that are inaccessible to the algorithm that queries it.
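To make the construction concrete, here’s a toy Python sketch (mine), with HMAC-SHA256 standing in for the pseudorandom function f_s; that’s just a heuristic stand-in chosen for brevity, not the GGM-style construction that the argument actually assumes:

    import hashlib
    import hmac
    import secrets

    N = 256                                        # input/output length in bits
    s = secrets.token_bytes(32)                    # secret PRF seed, stored inside the oracle
    r = secrets.randbits(N - 1) | (1 << (N - 2))   # hidden period: exponentially large, below 2^(N-1)

    def f_s(x):
        """Stand-in pseudorandom function f_s : {0,1}^N -> {0,1}^N (here, HMAC-SHA256)."""
        return hmac.new(s, x.to_bytes(N // 8, "big"), hashlib.sha256).digest()

    def g(x):
        """The efficiently computable oracle g_{s,r}(x) = f_s(x mod r) that the algorithms query."""
        return f_s(x % r)

    # g is r-periodic, so Shor's period-finding algorithm recovers r with poly(N) quantum queries to g;
    # if f_s really is pseudorandom, no polynomial-time classical algorithm can (see the argument below).
    x = secrets.randbits(N - 1)
    assert g(x) == g(x + r)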

The reasoning is now as follows: certainly there’s an efficient quantum algorithm to find r, or to solve some decision problem involving r, which we can use to define a language that’s in BQP^A but not in BPP^A.  That algorithm is just Shor’s period-finding algorithm!  (Technically, Shor’s algorithm needs certain assumptions on the starting function f_s to work—e.g., it couldn’t be a constant function—but if those assumptions aren’t satisfied, then f_s wasn’t pseudorandom anyway.)  On the other hand, suppose there were an efficient classical algorithm to find the period r.  In that case, we have a dilemma on our hands: would the classical algorithm still have worked, had we replaced f_s by a truly random function?  If so, then the classical algorithm would violate well-known lower bounds on the classical query complexity of period-finding.  But if not, then by working on pseudorandom functions but not on truly random functions, the algorithm would be distinguishing the two—so f_s wouldn’t have been a cryptographic pseudorandom function at all, contrary to assumption!

This all caused Lijie and me to wonder whether Theorem 5 could be strengthened even further, so that it wouldn’t use any complexity assumption at all.  In other words, why couldn’t we just prove unconditionally that there’s an oracle A∈P/poly such that BPP^A≠BQP^A?  By comparison, it’s not hard to see that we can unconditionally construct an oracle A∈P/poly such that P^A≠NP^A.

Alas, with the following theorem, we were able to explain why BPP vs. BQP (and even ApproxSampBPP vs. ApproxSampBQP) is different from P vs. NP in this respect, and why some computational assumption is still needed to separate quantum from classical, even if we’re working relative to an efficiently computable oracle.

Theorem 6 (A.-Chen).  Suppose that, in the real world, ApproxSampBPP = ApproxSampBQP and NP⊆BPP (granted, these are big assumptions!).  Then ApproxSampBPP^A = ApproxSampBQP^A for all oracles A∈P/poly.

Taking the contrapositive, this is saying that you can’t separate ApproxSampBPP from ApproxSampBQP relative to an efficiently computable oracle, without separating some complexity classes in the real world.  This contrasts not only with P vs. NP, but even with ExactSampBPP vs. ExactSampBQP, which can be separated unconditionally relative to efficiently computable oracles.

The proof of Theorem 6 is intuitive and appealing.  Not surprisingly, we’re going to heavily exploit the assumptions ApproxSampBPP = ApproxSampBQP and NP⊆BPP.  Let Q be a polynomial-time quantum algorithm that queries an oracle A∈P/poly.  Then we need to simulate Q—and in particular, sample close to the same probability distribution over outputs—using a polynomial-time classical algorithm that queries A.

Let

$$ \sum_{x,w} \alpha_{x,w} \left|x,w\right\rangle $$

be the state of Q immediately before its first query to the oracle A, where x is the input to be submitted to the oracle.  Then our first task is to get a bunch of samples from the probability distribution D={|α_{x,w}|^2}_{x,w}, or something close to D in variation distance.  But this is easy to do, using the assumption ApproxSampBPP = ApproxSampBQP.

Let x1,…,xk be our samples from D, marginalized to the x part.  Then next, our classical algorithm queries A on each of x1,…,xk, getting responses A(x1),…,A(xk).  The next step is to search for a function f∈P/poly—or more specifically, a function of whatever fixed polynomial size is relevant—that agrees with A on the sample data, i.e. such that f(xi)=A(xi) for all i∈[k].  This is where we’ll use the assumption NP⊆BPP (together, of course, with the fact that at least one such f exists, namely A itself!), to make the task of finding f efficient.  We’ll also appeal to a fundamental fact about the sample complexity of PAC-learning.  The fact is that, if we find a polynomial-size circuit f that agrees with A on a bunch of sample points drawn independently from a distribution, then f will probably agree with A on most further points drawn from the same distribution as well.
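As a sanity check on that last fact, here’s a tiny runnable Python toy (my own, with threshold functions standing in for polynomial-size circuits): any hypothesis consistent with a couple hundred labeled samples from a distribution D already agrees with the target on most of D’s mass.

    import random

    random.seed(1)
    N = 10**6
    TARGET = 412_345
    def A(x): return int(x >= TARGET)      # the "oracle": an unknown threshold function
    def D(): return random.randrange(N)    # the query distribution (uniform, for simplicity)

    # Labeled samples from D
    train = [(x, A(x)) for x in (D() for _ in range(200))]

    # "Find any consistent hypothesis": pick a threshold t that agrees with every sample
    lo = max((x + 1 for x, y in train if y == 0), default=0)
    hi = min((x for x, y in train if y == 1), default=N)
    t = lo                                 # any t in [lo, hi] is consistent with the data
    def f(x): return int(x >= t)

    # The Occam/PAC phenomenon: f agrees with A on almost all fresh samples from D
    test = [D() for _ in range(100_000)]
    print(sum(f(x) == A(x) for x in test) / len(test))   # typically around 0.99 or higher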

So, OK, we then have a pretty good “mock oracle,” f, that we can substitute for the real oracle on the first query that Q makes.  Of course f and A won’t perfectly agree, but the small fraction of disagreements won’t matter much, again because of the linearity of quantum mechanics (i.e., the same thing that prevents us from speeding up Grover’s algorithm arbitrarily).  So we can basically simulate Q’s first query, and now our classical simulation is good to go until Q’s second query!  But now you can see where this is going: we iterate the same approach, and reuse the same assumptions ApproxSampBPP = ApproxSampBQP and NP⊆BPP, to find a new “mock oracle” that lets us simulate Q’s second query, and so on until all of Q’s queries have been simulated.

OK, I’ll stop there.  I don’t have a clever conclusion or anything.  Thank you.

HTTPS / Kurtz / eclipse / Charlottesville / Blum / P vs. NP

August 25th, 2017

This post has a grab bag of topics, unified only by the fact that I can no longer put off blogging about them. So if something doesn’t interest you, just scroll down till you find something that does.


Great news, everyone: following a few reader complaints about the matter, the scottaaronson.com domain now supports https—and even automatically redirects to it! I’m so proud that Shtetl-Optimized has finally entered the technological universe of 1994. Thanks so much to heroic reader Martin Dehnel-Wild for setting this up for me.

Update 26/08/2017: Comments should now be working again; comments are now coming through to the moderated view in the blog’s control panel, so if they don’t show up immediately it might just be awaiting moderation. Thanks for your patience.


Last weekend, I was in Columbia, South Carolina, for a workshop to honor the 60th birthday of Stuart Kurtz, theoretical computer scientist at the University of Chicago.  I gave a talk about how work Kurtz was involved in from the 1990s—for example, on defining the complexity class GapP, and constructing oracles that satisfy conflicting requirements simultaneously—plays a major role in modern research on quantum computational supremacy: as an example, my recent paper with Lijie Chen.  (Except, what a terrible week to be discussing the paths to supremacy!  I promise there are no tiki torches involved, only much weaker photon sources.)

Coincidentally, I don’t know if you read anything about this on social media, but there was this total solar eclipse that passed right over Columbia at the end of the conference.

I’d always wondered why some people travel to remote corners of the earth to catch these.  So the sky gets dark for two minutes, and then it gets light again, in a way that’s been completely understood and predictable for centuries?

Having seen it, I can now tell you the deal, if you missed it and prefer to read about it here rather than 10^500 other places online.  At risk of stating the obvious: it’s not the dark sky; it’s the sun’s corona visible around the moon.  Ironically, it’s only when the sun’s blotted out that you can actually look at the sun, at all the weird stuff going on around its disk.

OK, but totality is “only” to eclipses as orgasms are to sex.  There’s also the whole social experience of standing around outside with friends for an hour as the moon gradually takes a bigger bite out of the sun, staring up from time to time with eclipse-glasses to check its progress—and then everyone breaking into applause as the sky finally goes mostly dark, and you can look at the corona with the naked eye.  And then, if you like, standing around for another hour as the moon gradually exits the other way.  (If you’re outside the path of totality, this standing around and checking with eclipse-glasses is the whole experience.)

One cool thing is that, a little before and after totality, shadows on the ground have little crescents in them, as if the eclipse is imprinting its “logo” all over the earth.

For me, the biggest lesson the eclipse drove home was the logarithmic nature of perceived brightness (see also Scott Alexander’s story).  Like, the sun can be more than 90% occluded, and yet it’s barely a shade darker outside.  And you can still only look up with glasses so dark that they blot out everything except the sliver of sun, which still looks pretty much like the normal sun if you catch it out of the corner of your unaided eye.  Only during totality, and a few minutes before and after, is the darkening obvious.


Another topic at the workshop, unsurprisingly, was the ongoing darkening of the United States.  If it wasn’t obvious from my blog’s name, and if saying so explicitly will make any difference for anything, let the record state:

Shtetl-Optimized condemns Nazis, as well as anyone who knowingly marches with Nazis or defends them as “fine people.”

For a year, this blog has consistently described the now-president as a thug, liar, traitor, bully, sexual predator, madman, racist, and fraud, and has urged decent people everywhere to fight him by every peaceful and legal means available.  But if there’s some form of condemnation that I accidentally missed, then after Charlottesville, and Trump’s unhinged quasi-defenses of violent neo-Nazis, and defenses of his previous defenses, etc.—please consider Shtetl-Optimized to have condemned Trump that way also.

At least Charlottesville seems to have set local decisionmakers on an unstoppable course toward removing the country’s remaining Confederate statues—something I strongly supported back in May, before it had become the fully thermonuclear issue that it is now.  In an overnight operation, UT Austin has taken down its statues of Robert E. Lee, Albert Johnston, John Reagan, and Stephen Hogg.  (I confess, the postmaster general of the Confederacy wouldn’t have been my #1 priority for removal.  And, genuine question: what did Texas governor Stephen Hogg do that was so awful for his time, besides naming his daughter Ima Hogg?)


A final thing to talk about—yeah, we can’t avoid it—is Norbert Blum’s claimed proof of P≠NP.  I suppose I should be gratified that, after my last post, there were commenters who said, “OK, but enough about gender politics—what about P vs. NP?”  Here’s what I wrote on Tuesday the 15th:

To everyone who keeps asking me about the “new” P≠NP proof: I’d again bet $200,000 that the paper won’t stand, except that the last time I tried that, it didn’t achieve its purpose, which was to get people to stop asking me about it. So: please stop asking, and if the thing hasn’t been refuted by the end of the week, you can come back and tell me I was a closed-minded fool.

Many people misunderstood me to be saying that I’d again bet $200,000, even though the sentence said the exact opposite.  Maybe I should’ve said: I’m searching in vain for the right way to teach the nerd world to get less excited about these claims, to have the same reaction that the experts do, which is ‘oh boy, not another one’—which doesn’t mean that you know the error, or even that there is an error, but just means that you know the history.

Speaking of which, some friends and I recently had an awesome idea.  Just today, I registered the domain name haspvsnpbeensolved.com.  I’d like to set this up with a form that lets you type in the URL of a paper claiming to resolve the P vs. NP problem.  The site will then take 30 seconds or so to process the paper—with a status bar, progress updates, etc.—before finally rendering a verdict about the paper’s correctness.  Do any readers volunteer to help me create this?  Don’t worry, I’ll supply the secret algorithm to decide correctness, and will personally vouch for that algorithm for as long as the site remains live.

I have nothing bad to say about Norbert Blum, who made important contributions including the 3n circuit size lower bound for an explicit Boolean function—something that stood until very recently as the world record—and whose P≠NP paper was lucidly written, passing many of the most obvious checks.  And I received a bit of criticism for my “dismissive” stance.  Apparently, some right-wing former string theorist who I no longer read, whose name rhymes with Mubos Lotl, even accused me of being a conformist left-wing ideologue, driven to ignore Blum’s proof by an irrational conviction that any P≠NP proof will necessarily be so difficult that it will need to “await the Second Coming of Christ.”  Luca Trevisan’s reaction to that is worth quoting:

I agree with [Mubos Lotl] that the second coming of Jesus Christ is not a necessary condition for a correct proof that P is different from NP. I am keeping an open mind as to whether it is a sufficient condition.

On reflection, though, Mubos has a point: all of us, including me, should keep an open mind.  Maybe P≠NP (or P=NP!) is vastly easier to prove than most experts think, and is susceptible to a “fool’s mate.”

That being the case, it’s only intellectual honesty that compels me to report that, by about Friday of last week—i.e., exactly on my predicted schedule—a clear consensus had developed among experts that Blum’s P≠NP proof was irreparably flawed, and the consensus has stood since that time.

I’ve often wished that, even just for an hour or two, I could be free from this terrifying burden that I’ve carried around since childhood: the burden of having the right instincts about virtually everything.  Trust me, this “gift” is a lot less useful than it sounds, especially when reality so often contradicts what’s popular or expedient to say.

The background to Blum’s attempt, the counterexample that shows the proof has to fail somewhere, and the specifics of what appears to go wrong have already been covered at length elsewhere: see especially Luca’s post, Dick Lipton’s post, John Baez’s post, and the CS Theory StackExchange thread.

Very briefly, though: Blum claims to generalize some of the most celebrated complexity results of the 1980s—namely, superpolynomial lower bounds on the sizes of monotone circuits, which consist entirely of Boolean AND and OR gates—so that they also work for general (non-monotone) circuits, consisting of AND, OR, and NOT gates.  Everyone agrees that, if this succeeded, it would imply P≠NP.

Alas, another big discovery from the 1980s was that there are monotone Boolean functions (like Perfect Matching) that require superpolynomial-size monotone circuits, even though they have polynomial-size non-monotone circuits.  Why is that such a bummer?  Because it means our techniques for proving monotone circuit lower bounds can’t possibly work in as much generality as one might’ve naïvely hoped: if they did, they’d imply not merely that P doesn’t contain NP, but also that P doesn’t contain itself.

Blum was aware of all this, and gave arguments as to why his approach evades the Matching counterexample.  The trouble is, there’s another counterexample, which Blum doesn’t address, called Tardos’s function.  This is a weird creature: it’s obtained by starting with a graph invariant called the Lovász theta function, then looking at a polynomial-time approximation scheme for the theta function, and finally rounding the output of that PTAS to get a monotone function.  But whatever: in constructing this function, Tardos achieved her goal, which was to produce a monotone function that all known lower bound techniques for monotone circuits work perfectly fine for, but which is nevertheless in P (i.e., has polynomial-size non-monotone circuits).  In particular, if Blum’s proof worked, then it would also work for Tardos’s function, and that gives us a contradiction.

Of course, this merely tells us that Blum’s proof must have one or more mistakes; it doesn’t pinpoint where they are.  But the latter question has now been addressed as well.  On CS StackExchange, an anonymous commenter who goes variously by “idolvon” and “vloodin” provides a detailed analysis of the proof of Blum’s crucial Theorem 6.  I haven’t gone through every step myself, and there might be more to say about the matter than “vloodin” has, but several experts who are at once smarter, more knowledgeable, more cautious, and more publicity-shy than me have confirmed for me that vloodin correctly identified the erroneous region.

To those who wonder what gave me the confidence to call this immediately, without working through the details: besides the Cassandra-like burden that I was born with, I can explain something that might be helpful.  When Razborov achieved his superpolynomial monotone lower bounds in the 1980s, there was a brief surge of excitement: how far away could a P≠NP proof possibly be?  But then people, including Razborov himself, understood much more deeply what was going on—an understanding that was reflected in the theorems they proved, but also wasn’t completely captured by those theorems.

What was going on was this: monotone circuits are an interesting and nontrivial computational model.  Indeed for certain Boolean functions, such as the “slice functions,” they’re every bit as powerful as general circuits.  However, insofar as it’s possible to prove superpolynomial lower bounds on monotone circuit size, it’s possible only because monotone circuits are ridiculously less expressive than general Boolean circuits for the problems in question.  E.g., it’s possible only because monotone circuits aren’t expressing pseudorandom functions, and therefore aren’t engaging the natural proofs barrier or most of the other terrifying beasts that we’re up against.

So what can we say about the prospect that a minor tweak to the monotone circuit lower bound techniques from the 1980s would yield P≠NP?  If, like Mubos Lotl, you took the view that discrete math and theory of computation are just a mess of disconnected, random statements, then such a prospect would seem as likely to you as not.  But if you’re armed with the understanding above, then this possibility is a lot like the possibility that the OPERA experiment discovered superluminal neutrinos: no, not a logical impossibility, but something that’s safe to bet against at 10,000:1 odds.

During the discussion of Deolalikar’s earlier P≠NP claim, I once compared betting against a proof that all sorts of people are calling “formidable,” “solid,” etc., to standing in front of a huge pendulum—behind the furthest point that it reached the last time—even as it swings toward your face.  Just as certain physics teachers stake their lives on the conservation of energy, so I’m willing to stake my academic reputation, again and again, on the conservation of circuit-lower-bound difficulty.  And here I am, alive to tell the tale.

What I believe II (ft. Sarah Constantin and Stacey Jeffery)

August 15th, 2017

Unrelated Update: To everyone who keeps asking me about the “new” P≠NP proof: I’d again bet $200,000 that the paper won’t stand, except that the last time I tried that, it didn’t achieve its purpose, which was to get people to stop asking me about it. So: please stop asking, and if the thing hasn’t been refuted by the end of the week, you can come back and tell me I was a closed-minded fool.


In my post “The Kolmogorov Option,” I tried to step back from current controversies, and use history to reflect on the broader question of how nerds should behave when their penchant for speaking unpopular truths collides head-on with their desire to be kind and decent and charitable, and to be judged as such by their culture.  I was gratified to get positive feedback about this approach from men and women all over the ideological spectrum.

However, a few people who I like and respect accused me of “dogwhistling.” They warned, in particular, that if I wouldn’t just come out and say what I thought about the James Damore Google memo thing, then people would assume the very worst—even though, of course, my friends themselves knew better.

So in this post, I’ll come out and say what I think.  But first, I’ll do something even better: I’ll hand the podium over to two friends, Sarah Constantin and Stacey Jeffery, both of whom were kind enough to email me detailed thoughts in response to my Kolmogorov post.


Sarah Constantin completed her PhD in math at Yale. I don’t think I’ve met her in person yet, but we have a huge number of mutual friends in the so-called “rationalist community.”  Whenever Sarah emails me about something I’ve written, I pay extremely close attention, because I have yet to read a single thing by her that wasn’t full of insight and good sense.  I strongly urge anyone who likes her beautiful essay below to check out her blog, which is called Otium.

Sarah Constantin’s Commentary:

I’ve had a women-in-STEM essay brewing in me for years, but I’ve been reluctant to actually write publicly on the topic for fear of stirring up a firestorm of controversy.  On the other hand, we seem to be at a cultural inflection point on the issue, especially in the wake of the leaked Google memo, and other people are already scared to speak out, so I think it’s past time for me to put my name on the line, and Scott has graciously provided me a platform to do so.

I’m a woman in tech myself. I’m a data scientist doing machine learning for drug discovery at Recursion Pharmaceuticals, and before that I was a data scientist at Palantir. Before that I was a woman in math — I got my PhD from Yale, studying applied harmonic analysis. I’ve been in this world all my adult life, and I obviously don’t believe my gender makes me unfit to do the work.

I’m also not under any misapprehension that I’m some sort of exception. I’ve been mentored by Ingrid Daubechies and Maryam Mirzakhani (the first female Fields Medalist, who died tragically young last month).  I’ve been lucky enough to work with women who are far, far better than me.  There are a lot of remarkable women in math and computer science — women just aren’t the majority in those fields. But “not the majority” doesn’t mean “rare” or “unknown.”

I even think diversity programs can be worthwhile. I went to the Institute for Advanced Study’s Women and Math Program, which would be an excellent graduate summer school even if it weren’t all-female, and taught at its sister program for high school girls, which likewise is a great math camp independent of the gender angle. There’s a certain magic, if you’re in a male-dominated field, of once in a while being in a room full of women doing math, and I hope that everybody gets to have that experience once.

But (you knew the “but” was coming), I think the Google memo was largely correct, and the way people conventionally talk about women in tech is wrong.

Let’s look at some of his claims. From the beginning of the memo:

  • Google’s political bias has equated the freedom from offense with psychological safety, but shaming into silence is the antithesis of psychological safety.
  • This silencing has created an ideological echo chamber where some ideas are too sacred to be honestly discussed.
  • The lack of discussion fosters the most extreme and authoritarian elements of this ideology.
  • Extreme: all disparities in representation are due to oppression
  • Authoritarian: we should discriminate to correct for this oppression

Okay, so there’s a pervasive assumption that any deviation from 50% representation of women in technical jobs is a.) due to oppression, and b.) ought to be corrected by differential hiring practices. I think it is basically true that people widely believe this, and that people can lose their jobs for openly contradicting it (as James Damore, the author of the memo, did).  I have heard people I work with advocating hiring quotas for women (i.e. explicitly earmarking a number of jobs for women candidates only).  It’s not a strawman.

Then, Damore disagrees with this assumption:

  • Differences in distributions of traits between men and women may in part explain why we don’t have 50% representation of women in tech and leadership. Discrimination to reach equal representation is unfair, divisive, and bad for business.

Again, I agree with Damore. Note that this doesn’t mean that I must believe that sexism against women isn’t real and important (I’ve heard enough horror stories to be confident that some work environments are toxic to women).  It doesn’t even mean that I must be certain that the different rates of men and women in technical fields are due to genetics.  I’m very far from certain, and I’m not an expert in psychology. I don’t think I can do justice to the science in this post, so I’m not going to cover the research literature.

But I do think it’s irresponsible to assume a priori that there are no innate sex differences that might explain what we see.  It’s an empirical matter, and a topic for research, not dogma.

Moreover, I think discrimination on the basis of sex to reach equal representation is unfair and unproductive.  It’s unfair, because it’s not meritocratic.  You’re not choosing the best human for the job regardless of gender.

I think women might actually benefit from companies giving genuine meritocracy a chance. “Blind” auditions (in which the evaluator doesn’t see the performer) gave women a better chance of landing orchestra jobs; apparently, orchestras were prejudiced against female musicians, and the blinding canceled out that prejudice. Google’s own research has actually shown that the single best predictor of work performance is a work sample — testing candidates with a small project similar to what they’d do on the job. Work samples are easy to anonymize to reduce gender bias, and they’re more effective than traditional interviews, where split-second first impressions usually decide who gets hired, but don’t correlate at all with job performance. A number of tech companies have switched to work samples as part of their interview process.  I used work samples myself when I was hiring for a startup, just because they seemed more accurate at predicting who’d be good at the job; entirely without intending to, I got a 50% gender ratio.  If you want to reduce gender bias in tech, it’s worth at least considering blinded hiring via work samples.

Moreover, thinking about “representation” in science and technology reflects underlying assumptions that I think are quite dangerous.

You expect interest groups to squabble over who gets a piece of the federal budget. In politics, people will band together in blocs, and try to get the biggest piece of the spoils they can.  “Women should get such-and-such a percent of tech jobs” sounds precisely like this kind of politicking; women are assumed to be a unified bloc who will vote together, and the focus is on what size chunk they can negotiate for themselves. If a tech job (or a university position) were a cushy sinecure, a ticket to privilege, and nothing more, you might reasonably ask “how come some people get more goodies than others? Isn’t meritocracy just an excuse to restrict the goodies to your preferred group?”

Again, this is not a strawman. Here’s one Vox response to the memo, whose author states explicitly that she believes women are a unified bloc:

The manifesto’s sleight-of-hand delineation between “women, on average” and the actual living, breathing women who have had to work alongside this guy failed to reassure many of those women — and failed to reassure me. That’s because the manifesto’s author overestimated the extent to which women are willing to be turned against their own gender.

Speaking for myself, it doesn’t matter to me how soothingly a man coos that I’m not like most women, when those coos are accompanied by misogyny against most women. I am a woman. I do not stop being one during the parts of the day when I am practicing my craft. There can be no realistic chance of individual comfort for me in an environment where others in my demographic categories (or, really, any protected demographic categories) are subjected to skepticism and condescension.

She can’t be comfortable unless everybody in any protected demographic category — note that this is a legal, governmental category — is given the benefit of the doubt?  That’s a pretty collectivist commitment!

Or, look at Piper Harron, an assistant professor in math who blogged on the American Mathematical Society’s website that universities should simply “stop hiring white cis men”, and explicitly says “If you are on a hiring committee, and you are looking at applicants and you see a stellar white male applicant, think long and hard about whether your department needs another white man. You are not hiring a researching robot who will output papers from a dark closet. You are hiring an educator, a role model, a spokesperson, an advisor, a committee person … There is no objectivity. There is no meritocracy.”

Piper Harron reflects an extreme, of course, but she’s explicitly saying, on America’s major communication channel for and by mathematicians, that whether you get to work in math should not be based on whether you’re actually good at math. For her, it’s all politics.  Life itself is political, and therefore a zero-sum power struggle between groups.  

But most of us, male or female, didn’t fall in love with science and technology for that. Science is the mission to explore and understand our universe. Technology is the project of expanding human power to shape that universe. What we do towards those goals will live longer than any “protected demographic category”, any nation, any civilization.  We know how the Babylonians mapped the stars.

Women deserve an equal chance at a berth on the journey of exploration not because they form a political bloc but because some of them are discoverers and can contribute to the human mission.

Maybe, in a world corrupted by rent-seeking, the majority of well-paying jobs have some element of unearned privilege; perhaps almost all of us got at least part of our salaries by indirectly expropriating someone who had as good a right to it as us.

But that’s not a good thing, and that’s not what we hope for science and engineering to be, and I truly believe that this is not the inevitable fate of the human race — that we can only squabble over scraps, and never create.  

I’ve seen creation, and I’ve seen discovery. I know they’re real.

I care a lot more about whether my company achieves its goal of curing 100 rare diseases in 10 years than about the demographic makeup of our team.  We have an actual mission; we are trying to do something beyond collecting spoils.  

Do I rely on brilliant work by other women every day? I do. My respect for myself and my female colleagues is not incompatible with primarily caring about the mission.

Am I “turning against my own gender” because I see women as individuals first? I don’t think so. We’re half the human race, for Pete’s sake! We’re diverse. We disagree. We’re human.

When you think of “women-in-STEM” as a talking point on a political agenda, you mention Ada Lovelace and Grace Hopper in passing, and move on to talking about quotas.  When you think of women as individuals, you start to notice how many genuinely foundational advances were made by women — just in my own field of machine learning, Adele Cutler co-invented random forests, Corinna Cortes co-invented support vector machines, and Fei-Fei Li created the famous ImageNet benchmark dataset that started a revolution in image recognition.

As a child, my favorite book was Carl Sagan’s Contact, a novel about Ellie Arroway, an astronomer loosely based on his wife Ann Druyan. The name is not an accident; like the title character in Sinclair Lewis’ Arrowsmith, Ellie is a truth-seeking scientist who battles corruption, anti-intellectualism, and blind prejudice.  Sexism is one of the challenges she faces, but the essence of her life is about wonder and curiosity. She’s what I’ve always tried to become.

I hope that, in seeking to encourage the world’s Ellies in science and technology, we remember why we’re doing that in the first place. I hope we remember humans are explorers.


Now let’s hear from another friend who wrote to me recently, and who has a slightly different take.  Stacey Jeffery is a quantum computing theorist at one of my favorite research centers, CWI in Amsterdam.  She completed her PhD at University of Waterloo, and has done wonderful work on quantum query complexity and other topics close to my heart.  When I was being viciously attacked in the comment-171 affair, Stacey was one of the first people to send me a note of support, and I’ve never forgotten it.

Stacey Jeffery’s Commentary

I don’t think Google was right to fire Damore. This makes me a minority among people with whom I have discussed this issue.  Hopefully some people come out in the comments in support of the other position, so it’s not just me presenting that view, but the main argument I encountered was that what he said just sounded way too sexist for Google to put up with.  I agree with part of that, it did sound sexist to me.  In fact it also sounded racist to me. But that’s not because he necessarily said anything actually sexist or actually racist, but because he said the kinds of things that you usually only hear from sexist people, and in particular, the kind of sexist people who are also racist.  I’m very unlikely to try to pursue further interaction with a person who says these kinds of things for those reasons, but I think firing him for what he said between the lines sets a very bad precedent.  It seems to me he was fired for associating himself with the wrong ideas, and it does feel a bit like certain subjects are not up for rational discussion.  If Google wants an open environment, where employees can feel safe discussing company policy, I don’t think this contributes to that.  If they want their employees, and the world, to think that they aim for diversity because it’s the most rational course of action to achieve their overall objectives, rather than because it serves some secret agenda, like maintaining a PC public image, then I don’t think they’ve served that cause either.  Personally, this irritates me the most, because I feel they have damaged the image for a cause I feel strongly about.

My position is independent of the validity of Damore’s attempt at scientific argument, which is outside my area of expertise.  I personally don’t think it’s very productive for non-social-scientists to take authoritative positions on social science issues, especially ones that appear to be controversial within the field (but I say this as a layperson).  This may include some of the other commentary in this blog post, which I have not yet read, and might even extend to Scott’s decision to comment on this issue at all (but this bridge was crossed in the previous blog post).  However, I think one of the reasons that many of us do this is that the burden of solving the problem of too few women in STEM is often placed on us.  Some people in STEM feel they are blamed for not being welcoming enough to women (in fact, in my specific field, it’s my experience that the majority of people are very sympathetic).  Many scientific funding applications even ask applicants how they plan to address the issue of diversity, as if they should be the ones to come up with a solution for this difficult problem that nobody knows the answer to, and is not even within their expertise.  So it’s not surprising when these same people start to think about and form opinions on these social science issues.  Obviously, we working in STEM have valuable insight into how we might encourage women to pursue STEM careers, and we should be pushed to think about this, but we don’t have all the answers (and maybe we should remember that the next time we consider authoring an authoritative memo on the subject).


Scott’s Mansplaining Commentary

I’m incredibly grateful to Sarah and Stacey for sharing their views.  Now it’s time for me to mansplain my own thoughts in light of what they said.  Let me start with a seven-point creed.

1. I believe that science and engineering, both in academia and in industry, benefit enormously from contributions from people of every ethnic background and gender identity.  This sort of university-president-style banality shouldn’t even need to be said, but in a world where the President of the US criticizes neo-Nazis only under extreme pressure from his own party, I suppose it does.

2. I believe that there’s no noticeable difference in average ability between men and women in STEM fields—or if there’s some small disparity, for all I know the advantage goes to women. I have enough Sheldon Cooper in me that, if this hadn’t been my experience, I’d probably let it slip that it hadn’t been, but it has been.  When I taught 6.045 (undergrad computability and complexity) at MIT, women were only 20% or so of the students, but for whatever reasons they were wildly overrepresented among the top students.

3. I believe that women in STEM face obstacles that men don’t.  These range from the sheer awkwardness of sometimes being the only woman in a room full of guys, to challenges related to pregnancy and childcare, to actual belittlement and harassment.  Note that, even if men in STEM fields are no more sexist on average than men in other fields—or are less sexist, as one might expect from their generally socially liberal views and attitudes—the mere fact of the gender imbalance means that women in STEM will have many more opportunities to be exposed to whatever sexists there are.  This puts a special burden on us to create a welcoming environment for women.

4. Given that we know that gender gaps in interest and inclination appear early in life, I believe in doing anything we can to encourage girls’ interest in STEM fields.  Trust me, my four-year-old daughter Lily wishes I didn’t believe so fervently in working with her every day on her math skills.

5. I believe that gender diversity is valuable in itself.  It’s just nicer, for men and women alike, to have a work environment with many people of both sexes—especially if (as is often the case in STEM) so much of our lives revolves around our work.  I think that affirmative action for women, women-only scholarships and conferences, and other current efforts to improve gender diversity can all be defended and supported on that ground alone.

6. I believe that John Stuart Mill’s The Subjection of Women is one of the masterpieces of history, possibly the highest pinnacle that moral philosophy has ever reached.  Everyone should read it carefully and reflect on it if they haven’t already.

7. I believe it’s a tragedy that the current holder of the US presidency is a confessed sexual predator, who’s full of contempt not merely for feminism, but for essentially every worthwhile human value. I believe those of us on the “pro-Enlightenment side” now face the historic burden of banding together to stop this thug by every legal and peaceful means available. I believe that, whenever the “good guys” tear each other down in internecine warfare—e.g. “nerds vs. feminists”—it represents a wasted opportunity and an unearned victory for the enemies of progress.

OK, now for the part that might blow some people’s minds.  I hold that every single belief above is compatible with what James Damore wrote in his now-infamous memo—at least, if we’re talking about the actual words in it.  In some cases, Damore even makes the above points himself.  In particular, there’s nothing in what he wrote about female Googlers being less qualified on average than male Googlers, or being too neurotic to code, or anything like that: the question at hand is just why there are fewer women in these positions, and that in turn becomes a question about why there are fewer women earlier in the CS pipeline.  Reasonable people need not agree about the answers to those questions, or regard them as known or obvious, to see that the failure to make this one elementary distinction, between quality and quantity, already condemns 95% of Damore’s attackers as not having read or understood what he wrote.

Let that be the measure of just how terrifyingly efficient the social-media outrage machine has become at twisting its victims’ words to fit a clickbait narrative—a phenomenon with which I happen to be personally acquainted.  Strikingly, it seems not to make the slightest difference if (as in this case) the original source text is easily available to everyone.

Still, while most coverage of Damore’s memo was depressing in its monotonous incomprehension, dissent was by no means confined to the right-wingers eager to recruit Damore to their side.  Peter Singer—the legendary leftist moral philosopher, and someone whose fearlessness and consistency I’ve always admired whether I’ve agreed with him or not—wrote a powerful condemnation of Google’s decision to fire Damore.  Scott Alexander was brilliant as usual in picking apart bad arguments.  Megan McArdle drew on her experiences to illustrate some of Damore’s contentions.  Steven Pinker tweeted that Damore’s firing “makes [the] job of anti-Trumpists harder.”

Like Peter Singer, and also like Sarah Constantin and Stacey Jeffery above, I have no plans to take any position on biological differences in male and female inclinations and cognitive styles, and what role (if any) such differences might play in 80% of Google engineers being male—or, for that matter, what role they might play in 80% of graduating veterinarians now being female, or other striking gender gaps.  I decline to take a position not only because I’m not an expert, but also because, as Singer says, doing so isn’t necessary to reach the right verdict about Damore’s firing.  It suffices to note that the basic thesis being discussed—namely, that natural selection doesn’t stop at the neck, and that it’s perfectly plausible that it acted differently on women and men in ways that might help explain many of the population-level differences that we see today—can also be found in, for example, The Blank Slate by Steven Pinker, and other mainstream works by some of the greatest thinkers alive.

And therefore I say: if James Damore deserves to be fired from Google, for treating evolutionary psychology as potentially relevant to social issues, then Steven Pinker deserves to be fired from Harvard for the same offense.

Yes, I realize that an employee of a private company is different from a tenured professor.  But I don’t see why it’s relevant here.  For if someone really believes that mooting the hypothesis of an evolutionary reason for average differences in cognitive styles between men and women is enough by itself to create a hostile environment for women—well then, why should tenure be a bar to firing, any more than it is in cases of sexual harassment?

But the reductio needn’t stop there.  It seems to me that, if Damore deserves to be fired, then so do the 56% of Googlers who said in a poll that they opposed his firing.  For isn’t that 56% just as responsible for maintaining a hostile environment as Damore himself was? (And how would Google find out which employees opposed the firing? Well, if there’s any company on earth that could…)  Furthermore, after those 56% of Googlers are fired, any of the remaining 44% who think the 56% shouldn’t have been fired should be fired as well!  And so on iteratively, until only an ideologically reliable core remains, which might or might not be the empty set.

OK, but while the wider implications of Damore’s firing have frightened and depressed me all week, as I said, I depart from Damore on the question of affirmative action and other diversity policies.  Fundamentally, what I want is a sort of negotiated agreement or bargain, between STEM nerds and the wider culture in which they live.  The agreement would work like this: STEM nerds do everything they can to foster diversity, including by creating environments that are welcoming for women, and by supporting affirmative action, women-only scholarships and conferences, and other diversity policies.  The STEM nerds also agree never to talk in public about possible cognitive-science explanations for gender disparities in which careers people choose, or overlapping bell curves, or anything else potentially inflammatory.  In return, just two things:

  1. Male STEM nerds don’t regularly get libelled as misogynist monsters, who must be scaring all the women away with their inherently gross, icky, creepy, discriminatory brogrammer maleness.
  2. The fields beloved by STEM nerds are suffered to continue to exist, rather than getting destroyed and rebuilt along explicitly ideological lines, as already happened with many humanities and social science fields.

So in summary, neither side advances its theories about the causes of gender gaps; both sides simply agree that there are more interesting topics to explore.  In concrete terms, the social-justice side gets to retain 100% of what it has now, or maybe even expand it.  And all it has to offer in exchange is “R-E-S-P-E-C-T”!  Like, don’t smear and shame male nerds as a class, or nerdy disciplines themselves, for gender gaps that the male nerds would be as happy as anybody to see eradicated.

The trouble is that, fueled by outrage-fests on social media, I think the social-justice side is currently failing to uphold its end of this imagined bargain.  Nearly every day the sun rises to yet another thinkpiece about the toxic “bro culture” of Silicon Valley: a culture so uniquely and incorrigibly misogynist, it seems, that it still intentionally keeps women out, even after law and biology and most other white-collar fields have achieved or exceeded gender parity, their own “bro cultures” notwithstanding.  The trouble with this slander against male STEM nerds, besides its fundamental falsity (which Scott Alexander documented), is that it puts the male nerds into an impossible position.  For how can they refute the slander without talking about other possible explanations for fields like CS being 80% male, which is the very thing we all know they’re not supposed to talk about?

In Europe, in the Middle Ages, the Church would sometimes enjoy forcing the local Jews into “disputations” about whose religion was the true one.  At these events, a popular tactic on the Church’s side was to make statements that the Jews couldn’t possibly answer without blaspheming the name of Christ—which, of course, could lead to the Jews’ expulsion or execution if they dared it.

Maybe I have weird moral intuitions, but it’s hard for me to imagine a more contemptible act of intellectual treason, than deliberately trapping your opponents between surrender and blasphemy.  I’d actually rather have someone force me into one or the other, than make me choose, and thereby make me responsible for whichever choice I made.  So I believe the social-justice left would do well to forswear this trapping tactic forever.

Ironically, I suspect that in the long term, doing so would benefit no entity more than the social-justice left itself.  If I had to steelman, in one sentence, the argument that in the space of one year propelled the “alt-right” from obscurity in dark and hateful corners of the Internet, to the improbable and ghastly ascent of Donald Trump and his white-nationalist brigade to the most powerful office on earth, the argument would be this:

If the elites, the technocrats, the “Cathedral”-dwellers, were willing to lie to the masses about humans being blank slates—and they obviously were—then why shouldn’t we assume that they also lied to us about healthcare and free trade and guns and climate change and everything else?

We progressives deluded ourselves that we could permanently shame our enemies into silence, on pain of sexism, racism, xenophobia, and other blasphemies.  But the “victories” won that way were hollow and illusory, and the crumbling of the illusion brings us to where we are now: with a vindictive, delusional madman in the White House who has a non-negligible chance of starting a nuclear war this week.

The Enlightenment was a specific historical period in 18th-century Europe.  But the term can also be used much more broadly, to refer to every trend in human history that’s other than horrible.  Seen that way, the Enlightenment encompasses the scientific revolution, the abolition of slavery, the decline of all forms of violence, the spread of democracy and literacy, and the liberation of women from domestic drudgery to careers of their own choosing.  The invention of Google, which made the entire world’s knowledge just a search bar away, is now also a permanent part of the story of the Enlightenment.

I fantasize that, within my lifetime, the Enlightenment will expand further to tolerate a diversity of cognitive styles—including people on the Asperger’s and autism spectrum, with their penchant for speaking uncomfortable truths—as well as a diversity of natural abilities and inclinations.  Society might or might not get the “demographically correct” percentage of Ellie Arroways—Ellie might decide to become a doctor or musician rather than an astronomer, and that’s fine too—but most important, it will nurture all the Ellie Arroways that it gets, all the misfits and explorers of every background.  I wonder whether, while disagreeing on exactly what’s meant by it, all parties to this debate could agree that diversity represents a next frontier for the Enlightenment.


Comment Policy: Any comment, from any side, that attacks people rather than propositions will be deleted.  I don’t care if the comment also makes useful points: if it contains a single ad hominem, it’s out.

As it happens, I’m at a quantum supremacy workshop in Bristol, UK right now—yeah, yeah, I’m a closet supremacist after all, hur hur—so I probably won’t participate in the comments until later.

The Kolmogorov option

August 8th, 2017

Andrey Nikolaevich Kolmogorov was one of the giants of 20th-century mathematics.  I’ve always found it amazing that the same man was responsible both for establishing the foundations of classical probability theory in the 1930s, and also for co-inventing the theory of algorithmic randomness (a.k.a. Kolmogorov complexity) in the 1960s, which challenged the classical foundations, by holding that it is possible after all to talk about the entropy of an individual object, without reference to any ensemble from which the object was drawn.  Incredibly, going strong into his eighties, Kolmogorov then pioneered the study of “sophistication,” which amends Kolmogorov complexity to assign low values both to “simple” objects and “random” ones, and high values only to a third category of objects, which are “neither simple nor random.”  So, Kolmogorov was at the vanguard of the revolution, counter-revolution, and counter-counter-revolution.
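
If you want a rough feel for the "simple versus random" distinction (without pretending to compute Kolmogorov complexity itself, which is famously uncomputable), here's a toy sketch in Python that uses off-the-shelf compression as a crude stand-in: a patterned string compresses to almost nothing, while a random string barely compresses at all.  The function name is mine, and compressed length is only a computable upper bound on the true complexity.

```python
import os
import zlib

def compressed_size(data: bytes) -> int:
    # Length after zlib compression: a crude, computable upper bound
    # on the descriptive complexity of the data.
    return len(zlib.compress(data, 9))

simple = b"0" * 100_000        # a "simple" object: generated by one short rule
random_ = os.urandom(100_000)  # a "random" object: no pattern to exploit

print(compressed_size(simple))   # a few hundred bytes
print(compressed_size(random_))  # close to 100,000 bytes
```

Sophistication asks for something subtler: roughly, the size of the best "model" half of a two-part description, which is designed to come out low for both of the strings above.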

But that doesn’t even scratch the surface of his accomplishments: he made fundamental contributions to topology and dynamical systems, and together with Vladimir Arnold, solved Hilbert’s thirteenth problem, showing that any multivariate continuous function can be written as a composition of continuous functions of two variables.  He mentored an awe-inspiring list of young mathematicians, whose names (besides Arnold) include Dobrushin, Dynkin, Gelfand, Martin-Löf, Sinai, and in theoretical computer science, our own Leonid Levin.  If that wasn’t enough, during World War II Kolmogorov applied his mathematical gifts to artillery problems, helping to protect Moscow from German bombardment.

Kolmogorov was private in his personal and political life, which might have had something to do with being gay, at a time and place when that was in no way widely accepted.  From what I’ve read—for example, in Gessen’s biography of Perelman—Kolmogorov seems to have been generally a model of integrity and decency.  He established schools for mathematically gifted children, which became jewels of the Soviet Union; one still reads about them with awe.  And at a time when Soviet mathematics was convulsed by antisemitism—with students of Jewish descent excluded from the top math programs for made-up reasons, sent instead to remote trade schools—Kolmogorov quietly protected Jewish researchers.

OK, but all this leaves a question.  Kolmogorov was a leading and admired Soviet scientist all through the era of Stalin’s purges, the Gulag, the KGB, the murders and disappearances and forced confessions, the show trials, the rewritings of history, the allies suddenly denounced as traitors, the tragicomedy of Lysenkoism.  Anyone as intelligent, individualistic, and morally sensitive as Kolmogorov would obviously have seen through the lies of his government, and been horrified by its brutality.  So then why did he utter nary a word in public against what was happening?

As far as I can tell, the answer is simply: because Kolmogorov knew better than to pick fights he couldn’t win.  He judged that he could best serve the cause of truth by building up an enclosed little bubble of truth, and protecting that bubble from interference by the Soviet system, and even making the bubble useful to the system wherever he could—rather than futilely struggling to reform the system, and simply making martyrs of himself and all his students for his trouble.

There’s a saying of Kolmogorov, which associates wisdom with keeping your mouth shut:

“Every mathematician believes that he is ahead of the others. The reason none state this belief in public is because they are intelligent people.”

There’s also a story that Kolmogorov loved to tell about himself, which presents math as a sort of refuge from the arbitrariness of the world: he said that he once studied to become a historian, but was put off by the fact that historians demanded ten different proofs for the same proposition, whereas in math, a single proof suffices.

There was also a dark side to political quietism.  In 1936, Kolmogorov joined other mathematicians in testifying against his former mentor in the so-called Luzin affair.  By many accounts, he did this because the police blackmailed him, by threatening to reveal his homosexual relationship with Pavel Aleksandrov.  On the other hand, while he was never foolish enough to take on Lysenko directly, Kolmogorov did publish a paper in 1940 courageously supporting Mendelian genetics.


It seems likely that in every culture, there have been truths, which moreover everyone knows to be true on some level, but which are so corrosive to the culture’s moral self-conception that one can’t assert them, or even entertain them seriously, without (in the best case) being ostracized for the rest of one’s life.  In the USSR, those truths were the ones that undermined the entire communist project: for example, that humans are not blank slates; that Mendelian genetics is right; that Soviet collectivized agriculture was a humanitarian disaster.  In our own culture, those truths are—well, you didn’t expect me to say them, did you? 🙂

I’ve long been fascinated by the psychology of unspeakable truths.  Like, for any halfway perceptive person in the USSR, there must have been an incredible temptation to make a name for yourself as a daring truth-teller: so much low-hanging fruit!  So much to say that’s correct and important, and that best of all, hardly anyone else is saying!

But then one would think better of it.  It’s not as if, when you speak a forbidden truth, your colleagues and superiors will thank you for correcting their misconceptions.  Indeed, it’s not as if they didn’t already know, on some level, whatever you imagined yourself telling them.  In fact it’s often because they fear you might be right that the authorities see no choice but to make an example of you, lest the heresy spread more widely.  One corollary is that the more reasonably and cogently you make your case, the more you force the authorities’ hand.

But what’s the inner psychology of the authorities?  For some, it probably really is as cynical as the preceding paragraph makes it sound.  But for most, I doubt that.  I think that most authorities simply internalize the ruling ideology so deeply that they equate dissent with sin.  So in particular, the better you can ground your case in empirical facts, the craftier and more conniving a deceiver you become in their eyes, and hence the more virtuous they are for punishing you.  Someone who’s arrived at that point is completely insulated from argument: absent some crisis that makes them reevaluate their entire life, there’s no sense in even trying.  The question of whether or not your arguments have merit won’t even get entered upon, nor will the authority ever be able to repeat back your arguments in a form you’d recognize—for even repeating the arguments correctly could invite accusations of secretly agreeing with them.  Instead, the sole subject of interest will be you: who you think you are, what your motivations were to utter something so divisive and hateful.  And you have as good a chance of convincing authorities of your benign motivations as you’d have of convincing the Inquisition that, sure, you’re a heretic, but the good kind of heretic, the kind who rejects the divinity of Jesus but believes in niceness and tolerance and helping people.  To an Inquisitor, “good heretic” doesn’t parse any better than “round square,” and the very utterance of such a phrase is an invitation to mockery.  If the Inquisition had had Twitter, its favorite sentence would be “I can’t even.”

If it means anything to be a lover of truth, it means that anytime society finds itself stuck in one of these naked-emperor equilibriums—i.e., an equilibrium with certain facts known to nearly everyone, but severe punishments for anyone who tries to make those facts common knowledge—you hope that eventually society climbs its way out.  But crucially, you can hope this while also realizing that, if you tried singlehandedly to change the equilibrium, it wouldn’t achieve anything good for the cause of truth.  If iconoclasts simply throw themselves against a ruling ideology one by one, they can be picked off as easily as tribesmen charging a tank with spears, and each kill will only embolden the tank-gunners still further.  The charging tribesmen don’t even have the assurance that, if truth ultimately does prevail, then they’ll be honored as martyrs: they might instead end up like Ted Nelson babbling about hypertext in 1960, or H.C. Pocklington yammering about polynomial-time algorithms in 1917, nearly forgotten by history for being too far ahead of their time.

Does this mean that, like Winston Smith, the iconoclast simply must accept that 2+2=5, and that a boot will stamp on a human face forever?  No, not at all.  Instead the iconoclast can choose what I think of as the Kolmogorov option.  This is where you build up fortresses of truth in places the ideological authorities don’t particularly understand or care about, like pure math, or butterfly taxonomy, or irregular verbs.  You avoid a direct assault on any beliefs your culture considers necessary for it to operate.  You even seek out common ground with the local enforcers of orthodoxy.  Best of all is a shared enemy, and a way your knowledge and skills might be useful against that enemy.  For Kolmogorov, the shared enemy was the Nazis; for someone today, an excellent choice might be Trump, who’s rightly despised by many intellectual factions that spend most of their time despising each other.  Meanwhile, you wait for a moment when, because of social tectonic shifts beyond your control, the ruling ideology has become fragile enough that truth-tellers acting in concert really can bring it down.  You accept that this moment of reckoning might never arrive, or not in your lifetime.  But even if so, you could still be honored by future generations for building your local pocket of truth, and for not giving falsehood any more aid or comfort than was necessary for your survival.


When it comes to the amount of flak one takes for defending controversial views in public under one’s own name, I defer to almost no one.  For anyone tempted, based on this post, to call me a conformist or coward: how many times have you been denounced online, and from how many different corners of the ideological spectrum?  How many people have demanded your firing?   How many death threats have you received?  How many threatened lawsuits?  How many comments that simply say “kill yourself kike” or similar?  Answer and we can talk about cowardice.

But, yes, there are places even I won’t go, hills I won’t die on.  Broadly speaking:

  • My Law is that, as a scientist, I’ll hold discovering and disseminating the truth to be a central duty of my life, one that overrides almost every other value.  I’ll constantly urge myself to share what I see as the truth, even if it’s wildly unpopular, or makes me look weird, or is otherwise damaging to me.
  • The Amendment to the Law is that I’ll go to great lengths not to hurt anyone else’s feelings: for example, by propagating negative stereotypes, or by saying anything that might discourage any enthusiastic person from entering science.  And if I don’t understand what is or isn’t hurtful, then I’ll defer to the leading intellectuals in my culture to tell me.  This Amendment often overrides the Law, causing me to bite my tongue.
  • The Amendment to the Amendment is that, when pushed, I’ll stand by what I care about—such as free scientific inquiry, liberal Enlightenment norms, humor, clarity, and the survival of the planet and of family and friends and colleagues and nerdy misfits wherever they might be found.  So if someone puts me in a situation where there’s no way to protect what I care about without speaking a truth that hurts someone’s feelings, then I might speak the truth, feelings be damned.  (Even then, though, I’ll try to minimize collateral damage.)

When I see social media ablaze with this or that popular falsehood, I sometimes feel the “Galileo urge” washing over me.  I think: I’m a tenured professor with a semi-popular blog.  How can I look myself in the mirror, if I won’t use my platform and relative job safety to declare to the world, “and yet it moves”?

But then I remember that even Galileo weighed his options and tried hard to be prudent.  In his mind, the Dialogue Concerning the Two Chief World Systems actually represented a compromise (!).  Galileo never declared outright that the earth orbits the sun.  Instead, he put the Copernican doctrine, as a “possible view,” into the mouth of his character Salviati—only to have Simplicio “refute” Salviati, by the final dialogue, with the argument that faith always trumps reason, and that human beings are pathetically unequipped to deduce the plan of God from mere surface appearances.  Then, when that fig-leaf turned out not to be wide enough to fool the Church, Galileo quickly capitulated.  He repented of his error, and agreed never to defend the Copernican heresy again.  And he didn’t, at least not publicly.

Some have called Galileo a coward for that.  But the great David Hilbert held a different view.  Hilbert said that science, unlike religion, has no need for martyrs, because it’s based on facts that can’t be denied indefinitely.  Given that, Hilbert considered Galileo’s response to be precisely correct: in effect Galileo told the Inquisitors, hey, you’re the ones with the torture rack.  Just tell me which way you want it.  I can have the earth orbiting Mars and Venus in figure-eights by tomorrow if you decree it so.

Three hundred years later, Andrey Kolmogorov would say to the Soviet authorities, in so many words: hey, you’re the ones with the Gulag and secret police.  Consider me at your service.  I’ll even help you stop Hitler’s ideology from taking over the world—you’re 100% right about that one, I’ll give you that.  Now as for your own wondrous ideology: just tell me the dogma of the week, and I’ll try to make sure Soviet mathematics presents no threat to it.

There’s a quiet dignity to Kolmogorov’s (and Galileo’s) approach: a dignity that I suspect will be alien to many, but recognizable to those in the business of science.


Comment Policy: I welcome discussion about the responses of Galileo, Kolmogorov, and other historical figures to official ideologies that they didn’t believe in; and about the meta-question of how a truth-valuing person ought to behave when living under such ideologies.  In the hopes of maintaining a civil discussion, any comments that mention current hot-button ideological disputes will be ruthlessly deleted.

Is “information is physical” contentful?

July 20th, 2017

“Information is physical.”

This slogan seems to have originated around 1991 with Rolf Landauer.  It’s ricocheted around quantum information for the entire time I’ve been in the field, incanted in funding agency reports and popular articles and at the beginnings and ends of talks.

But what the hell does it mean?

There are many things it’s taken to mean, in my experience, that don’t make a lot of sense when you think about them—or else they’re vacuously true, or purely a matter of perspective, or not faithful readings of the slogan’s words.

For example, some people seem to use the slogan to mean something more like its converse: “physics is informational.”  That is, the laws of physics are ultimately not about mass or energy or pressure, but about bits and computations on them.  As I’ve often said, my problem with that view is less its audacity than its timidity!  It’s like, what would the universe have to do in order not to be informational in this sense?  “Information” is just a name we give to whatever picks out one element from a set of possibilities, with the “amount” of information given by the log of the set’s cardinality (and with suitable generalizations to infinite sets, nonuniform probability distributions, yadda yadda).  So, as long as the laws of physics take the form of telling us that some observations or configurations of the world are possible and others are not, or of giving us probabilities for each configuration, no duh they’re about information!
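
As a concrete sketch of that definition (the function names here are mine, not anything standard):

```python
import math

def info_uniform(num_possibilities: int) -> float:
    # Bits needed to pick out one element from a set of equally likely possibilities.
    return math.log2(num_possibilities)

def info_shannon(probs) -> float:
    # The nonuniform generalization: Shannon entropy, in bits.
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(info_uniform(1024))          # 10.0 bits: a set of 1024 possibilities
print(info_shannon([0.5, 0.5]))    # 1.0 bit: a fair coin
print(info_shannon([0.99, 0.01]))  # ~0.08 bits: a heavily biased coin
```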

Other people use “information is physical” to pour scorn on the idea that “information” could mean anything without some actual physical instantiation of the abstract 0’s and 1’s, such as voltage differences in a loop of wire.  Here I certainly agree with the tautology that in order to exist physically—that is, be embodied in the physical world—a piece of information (like a song, video, or computer program) does need to be embodied in the physical world.  But my inner Platonist slumps in his armchair when people go on to assert that, for example, it’s meaningless to discuss the first prime number larger than 10^(10^125), because according to post-1998 cosmology, one couldn’t fit its digits inside the observable universe.

If the cosmologists revise their models next week, will this prime suddenly burst into existence, with all the mathematical properties that one could’ve predicted for it on general grounds—only to fade back into the netherworld if the cosmologists revise their models again?  Why would anyone want to use language in such a tortured way?

Yes, brains, computers, yellow books, and so on that encode mathematical knowledge comprise only a tiny sliver of the physical world.  But it’s equally true that the physical world we observe comprises only a tiny sliver of mathematical possibility-space.

Still other people use “information is physical” simply to express their enthusiasm for the modern merger of physical and information sciences, as exemplified by quantum computing.  Far be it from me to temper that enthusiasm: rock on, dudes!

Yet others use “information is physical” to mean that the rules governing information processing and transmission in the physical world aren’t knowable a priori, but can only be learned from physics.  This is clearest in the case of quantum information, which has its own internal logic that generalizes the logic of classical information.  But in some sense, we didn’t need quantum mechanics to tell us this!  Of course the laws of physics have ultimate jurisdiction over whatever occurs in the physical world, information processing included.

My biggest beef, with all these unpackings of the “information is physical” slogan, is that none of them really engage with any of the deep truths that we’ve learned about physics.  That is, we could’ve had more-or-less the same debates about any of them, even in a hypothetical world where the laws of physics were completely different.


So then what should we mean by “information is physical”?  In the rest of this post, I’d like to propose an answer to that question.

We get closer to the meat of the slogan if we consider some actual physical phenomena, say in quantum mechanics.  The double-slit experiment will do fine.

Recall: you shoot photons, one by one, at a screen with two slits, then examine the probability distribution over where the photons end up on a second screen.  You ask: does that distribution contain alternating “light” and “dark” regions, the signature of interference between positive and negative amplitudes?  And the answer, predicted by the math and confirmed by experiment, is: yes, but only if the information about which slit the photon went through failed to get recorded anywhere else in the universe, other than the photon location itself.
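
Here's a minimal numerical sketch of that statement: not a simulation of an actual double-slit apparatus, just a two-qubit toy model in which the photon's path is one qubit and the "record" is another, so you can watch the interference terms disappear once the record exists.  (All names are mine.)

```python
import numpy as np

ket0 = np.array([1, 0], dtype=complex)   # "went through slit 0"
ket1 = np.array([0, 1], dtype=complex)   # "went through slit 1"
plus = (ket0 + ket1) / np.sqrt(2)        # equal superposition of the two paths

# Case 1: nothing records the path.  The photon's density matrix has nonzero
# off-diagonal entries -- the interference terms.
rho_no_record = np.outer(plus, plus.conj())

# Case 2: a second system (detector, stray atom, the mind of God...) records
# the path: joint state (|0>|0> + |1>|1>)/sqrt(2).  Trace out the record.
joint = (np.kron(ket0, ket0) + np.kron(ket1, ket1)) / np.sqrt(2)
rho_joint = np.outer(joint, joint.conj()).reshape(2, 2, 2, 2)
rho_with_record = np.trace(rho_joint, axis1=1, axis2=3)

print(np.round(rho_no_record, 3))    # off-diagonals 0.5: interference survives
print(np.round(rho_with_record, 3))  # off-diagonals 0: interference gone
```

The off-diagonal entries of the photon's reduced density matrix are what produce the light-and-dark fringes; once a perfectly correlated record exists anywhere, tracing it out leaves no off-diagonal terms and hence no fringes.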

Here a skeptic interjects: but that has to be wrong!  The criterion for where a physical particle lands on a physical screen can’t possibly depend on anything as airy as whether “information” got “recorded” or not.  For what counts as “information,” anyway?  As an extreme example: what if God, unbeknownst to us mortals, took divine note of which slit the photon went through?  Would that destroy the interference pattern?  If so, then every time we do the experiment, are we collecting data about the existence or nonexistence of an all-knowing God?

It seems to me that the answer is: insofar as the mind of God can be modeled as a tensor factor in Hilbert space, yes, we are.  And crucially, if quantum mechanics is universally true, then the mind of God would have to be such a tensor factor, in order for its state to play any role in the prediction of observed phenomena.

To say this another way: it’s obvious and unexceptionable that, by observing a physical system, you can often learn something about what information must be in it.  For example, you need never have heard of DNA to deduce that chickens must somehow contain information about making more chickens.  What’s much more surprising is that, in quantum mechanics, you can often deduce things about what information can’t be present, anywhere in the physical world—because if such information existed, even a billion light-years away, it would necessarily have a physical effect that you don’t see.

Another famous example here concerns identical particles.  You may have heard the slogan that “if you’ve seen one electron, you’ve seen them all”: that is, apart from position, momentum, and spin, every two electrons have exactly the same mass, same charge, same every other property, including even any properties yet to be discovered.  Again the skeptic interjects: but that has to be wrong.  Logically, you could only ever confirm that two electrons were different, by observing a difference in their behavior.  Even if the electrons had behaved identically for a billion years, you couldn’t rule out the possibility that they were actually different, for example because of tiny nametags (“Hi, I’m Emily the Electron!” “Hi, I’m Ernie!”) that had no effect on any experiment you’d thought to perform, but were visible to God.

You can probably guess where this is going.  Quantum mechanics says that, no, you can verify that two particles are perfectly identical by doing an experiment where you swap them and see what happens.  If the particles are identical in all respects, then you’ll see quantum interference between the swapped and un-swapped states.  If they aren’t, you won’t.  The kind of interference you’ll see is different for fermions (like electrons) than for bosons (like photons), but the basic principle is the same in both cases.  Once again, quantum mechanics lets you verify that a specific type of information—in this case, information that distinguishes one particle from another—was not present anywhere in the physical world, because if it were, it would’ve destroyed an interference effect that you in fact saw.
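
A minimal amplitude calculation of an experiment of exactly this kind (the Hong-Ou-Mandel effect for photons, in a toy model of a 50/50 beamsplitter) shows the three cases side by side; the variable names are mine.

```python
import numpy as np

# 50/50 beamsplitter: single-particle amplitude to go from input i to output j.
bs = np.array([[1, 1j],
               [1j, 1]]) / np.sqrt(2)

# Two particles enter, one in each input port.  Amplitude that one particle ends
# up in each output port ("coincidence"): two histories, "both go straight" and
# "both swap", which are indistinguishable if the particles are identical.
straight = bs[0, 0] * bs[1, 1]
swapped  = bs[0, 1] * bs[1, 0]

p_distinguishable = abs(straight)**2 + abs(swapped)**2  # probabilities add: 0.5
p_bosons          = abs(straight + swapped)**2          # amplitudes cancel: 0.0 (photons bunch)
p_fermions        = abs(straight - swapped)**2          # exchange sign flips: 1.0 (never bunch)

print(p_distinguishable, p_bosons, p_fermions)
```

For distinguishable particles the two histories' probabilities simply add; for identical bosons the coincidence amplitudes cancel, so both photons always exit the same port; for identical fermions the exchange sign makes them reinforce, so the particles never exit the same port.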

This, I think, already provides a meatier sense in which “information is physical” than any of the senses discussed previously.


But we haven’t gotten to the filet mignon yet.  The late, great Jacob Bekenstein will forever be associated with the discovery that information, wherever and whenever it occurs in the physical world, takes up a minimum amount of space.  The most precise form of this statement, called the covariant entropy bound, was worked out in detail by Raphael Bousso.  Here I’ll be discussing a looser version of the bound, which holds in “non-pathological” cases, and which states that a bounded physical system can store at most A/(4 ln 2) bits of information, where A is the area in Planck units of any surface that encloses the system—so, about 10^69 bits per square meter.  (Actually it’s 10^69 qubits per square meter, but because of Holevo’s theorem, an upper bound on the number of qubits is also an upper bound on the number of classical bits that can be reliably stored in a system and then retrieved later.)

You might have heard of the famous way Nature enforces this bound.  Namely, if you tried to create a hard drive that stored more than 10^69 bits per square meter of surface area, the hard drive would necessarily collapse to a black hole.  And from that point on, the information storage capacity would scale “only” with the area of the black hole’s event horizon—a black hole itself being the densest possible hard drive allowed by physics.
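
Where does the ~10^69 come from?  Just from the constants of nature.  Here's the one-line check, using rounded values of G, ħ, and c and nothing else:

```python
import math

G    = 6.674e-11   # Newton's constant, m^3 kg^-1 s^-2
hbar = 1.055e-34   # reduced Planck constant, J*s
c    = 2.998e8     # speed of light, m/s

planck_area = G * hbar / c**3                     # (Planck length)^2, in m^2
bits_per_m2 = 1 / (4 * math.log(2) * planck_area)

print(f"{planck_area:.2e} m^2")  # ~2.6e-70 m^2
print(f"{bits_per_m2:.2e}")      # ~1.4e69 bits per square meter
```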

Let’s hear once more from our skeptic.  “Nonsense!  Matter can take up space.  Energy can take up space.  But information?  Bah!  That’s just a category mistake.  For a proof, suppose God took one of your black holes, with a 1-square-meter event horizon, which already had its supposed maximum of ~10^69 bits of information.  And suppose She then created a bunch of new fundamental fields, which didn’t interact with gravity, electromagnetism, or any of the other fields that we know from observation, but which had the effect of encoding 10^300 new bits in the region of the black hole.  Presto!  An unlimited amount of additional information, exactly where Bekenstein said it couldn’t exist.”

We’d like to pinpoint what’s wrong with the skeptic’s argument—and do so in a self-contained, non-question-begging way, a way that doesn’t pull any rabbits out of hats, other than the general principles of relativity and quantum mechanics.  I was confused myself about how to do this, until a month ago, when Daniel Harlow helped set me straight (any remaining howlers in my exposition are 100% mine, not his).

I believe the logic goes like this:

  1. Relativity—even just Galilean relativity—demands that, in flat space, the laws of physics must have the same form for all inertial observers (i.e., all observers who move through space at constant speed).
  2. Anything in the physical world that varies in space—say, a field that encodes different bits of information at different locations—also varies in time, from the perspective of an observer who moves through the field at a constant speed.
  3. Combining 1 and 2, we conclude that anything that can vary in space can also vary in time.  Or to say it better, there’s only one kind of varying: varying in spacetime.
  4. More strongly, special relativity tells us that there’s a specific numerical conversion factor between units of space and units of time: namely the speed of light, c.  Loosely speaking, this means that if we know the rate at which a field varies across space, we can also calculate the rate at which it varies across time, and vice versa.
  5. Anything that varies across time carries energy.  Why?  Because this is essentially the definition of energy in quantum mechanics!  Up to a constant multiple (namely, Planck’s constant), energy is the expected speed of rotation of the global phase of the wavefunction, when you apply your Hamiltonian.  If the global phase rotates at the slowest possible speed, then we take the energy to be zero, and say you’re in a vacuum state.  If it rotates at the next highest speed, we say you’re in a first excited state, and so on.  Indeed, assuming a time-independent Hamiltonian, the evolution of any quantum system can be fully described by simply decomposing the wavefunction into a superposition of energy eigenstates, then tracking the phase of each eigenstate’s amplitude as it loops around and around the unit circle.  No energy means no looping around means nothing ever changes.  (For a toy numerical illustration of this point, see the sketch just after this list.)
  6. Combining 3 and 5, any field that varies across space carries energy.
  7. More strongly, combining 4 and 5, if we know how quickly a field varies across space, we can lower-bound how much energy it has to contain.
  8. In general relativity, anything that carries energy couples to the gravitational field.  This means that anything that carries energy necessarily has an observable effect: if nothing else, its effect on the warping of spacetime.  (This is dramatically illustrated by dark matter, which is currently observable via its spacetime warping effect and nothing else.)
  9. Combining 6 and 8, any field that varies across space couples to the gravitational field.
  10. More strongly, combining 7 and 8, if we know how quickly a field varies across space, then we can lower-bound by how much it has to warp spacetime.  This is so because of another famous (and distinctive) feature of gravity: namely, the fact that it’s universally attractive, so all the warping contributions add up.
  11. But in GR, spacetime can only be warped by so much before we create a black hole: this is the famous Schwarzschild bound.
  12. Combining 10 and 11, the information contained in a physical field can only vary so quickly across space, before it causes spacetime to collapse to a black hole.
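
As promised in step 5, here's a toy numerical sketch of the "no energy means nothing ever changes" point: a two-level system whose Hamiltonian is diagonal, so time evolution just multiplies each energy eigenstate's amplitude by a phase.  An energy eigenstate's observables are frozen, while a superposition of two different energies has observables that oscillate.  (Units with ħ = 1; the names are mine.)

```python
import numpy as np

energies = np.array([0.0, 1.0])                     # two energy levels, hbar = 1
sigma_x = np.array([[0, 1], [1, 0]], dtype=complex)

def evolve(state, t):
    # The Hamiltonian is diagonal, so the propagator is just per-eigenstate phases.
    return np.exp(-1j * energies * t) * state

def expect_x(state):
    return np.real(np.vdot(state, sigma_x @ state))

eigenstate = np.array([0, 1], dtype=complex)               # a single energy eigenstate
superpos   = np.array([1, 1], dtype=complex) / np.sqrt(2)  # superposition of two energies

for t in np.linspace(0, 2 * np.pi, 5):
    # Eigenstate: only its global phase turns, so <sigma_x> stays 0.
    # Superposition: the relative phase turns, so <sigma_x> oscillates as cos(t).
    print(round(t, 2), round(expect_x(evolve(eigenstate, t)), 3),
          round(expect_x(evolve(superpos, t)), 3))
```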

Summarizing where we’ve gotten, we could say: any information that’s spatially localized at all, can only be localized so precisely.  In our world, the more densely you try to pack 1’s and 0’s, the more energy you need, therefore the more you warp spacetime, until all you’ve gotten for your trouble is a black hole.  Furthermore, if we rewrote the above conceptual argument in math—keeping track of all the G’s, c’s, h’s, and so on—we could derive a quantitative bound on how much information there can be in a bounded region of space.  And if we were careful enough, that bound would be precisely the holographic entropy bound, which says that the number of (qu)bits is at most A/(4 ln 2), where A is the area of a bounding surface in Planck units.

Let’s pause to point out some interesting features of this argument.

Firstly, we pretty much needed the whole kitchen sink of basic physical principles: special relativity (both the equivalence of inertial frames and the finiteness of the speed of light), quantum mechanics (in the form of the universal relation between energy and frequency), and finally general relativity and gravity.  All three of the fundamental constants G, c, and h made appearances, which is why all three show up in the detailed statement of the holographic bound.

But secondly, gravity only appeared from step 8 onwards.  Up till then, everything could be said solely in the language of quantum field theory: that is, quantum mechanics plus special relativity.  The result would be the so-called Bekenstein bound, which upper-bounds the number of bits in any spatial region by the product of the region’s radius and its energy content.  I learned that there’s an interesting history here: Bekenstein originally deduced this bound using ingenious thought experiments involving black holes.  Only later did people realize that the Bekenstein bound can be derived purely within QFT (see here and here for example)—in contrast to the holographic bound, which really is a statement about quantum gravity.  (An early hint of this was that, while the holographic bound involves Newton’s gravitational constant G, the Bekenstein bound doesn’t.)
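
To make the contrast concrete: in its standard form, the Bekenstein bound says the number of bits in a region of radius R and energy E is at most 2πRE/(ħc ln 2).  Here is what that gives for a 1-kilogram, 1-meter-radius system, counting the rest energy (the function name and the example numbers are mine):

```python
import math

hbar = 1.055e-34   # J*s
c    = 2.998e8     # m/s

def bekenstein_bits(radius_m: float, energy_J: float) -> float:
    # Upper bound on the bits storable in a sphere of the given radius and total energy.
    return 2 * math.pi * radius_m * energy_J / (hbar * c * math.log(2))

energy = 1.0 * c**2                                # rest energy of 1 kg, ~9e16 J
print(f"{bekenstein_bits(1.0, energy):.1e} bits")  # ~2.6e43 bits
```

Note that G never appears in the formula, which is the early hint mentioned above that this bound lives in QFT rather than in quantum gravity.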

Thirdly, speaking of QFT, some readers might be struck by the fact that at no point in our 12-step program did we ever seem to need QFT machinery.  Which is fortunate, because if we had needed it, I wouldn’t have been able to explain any of this!  But here I have to confess that I cheated slightly.  Recall step 4, which said that “if you know the rate at which a field varies across space, you can calculate the rate at which it varies across time.”  It turns out that, in order to give that sentence a definite meaning, one uses the fact that in QFT, space and time derivatives in the Hamiltonian need to be related by a factor of c, since otherwise the Hamiltonian wouldn’t be Lorentz-invariant.

Fourthly, eagle-eyed readers might notice a loophole in the argument.  Namely, we never upper-bounded how much information God could add to the world, via fields that are constant across all of spacetime.  For example, there’s nothing to stop Her from creating a new scalar field that takes the same value everywhere in the universe—with that value, in suitable units, encoding 10^50000 separate divine thoughts in its binary expansion.  But OK, being constant, such a field would interact with nothing and affect no observations—so Occam’s Razor itches to slice it off, by rewriting the laws of physics in a simpler form where that field is absent.  If you like, such a field would at most be a comment in the source code of the universe: it could be as long as the Great Programmer wanted it to be, but would have no observable effect on those of us living inside the program’s execution.


Of course, even before relativity and quantum mechanics, information had already been playing a surprisingly fleshy role in physics, through its appearance as entropy in 19th-century thermodynamics.  Which leads to another puzzle.  To a computer scientist, the concept of entropy, as the log of the number of microstates compatible with a given macrostate, seems clear enough, as does the intuition for why it should increase monotonically with time.  Or at least, to whatever extent we’re confused about these matters, we’re no more confused than the physicists are!

But then why should this information-theoretic concept be so closely connected to tangible quantities like temperature, and pressure, and energy?  From the mere assumption that a black hole has a nonzero entropy—that is, that it takes many bits to describe—how could Bekenstein and Hawking have possibly deduced that it also has a nonzero temperature?  Or: if you put your finger into a tub of hot water, does the heat that you feel somehow reflect how many bits are needed to describe the water’s microstate?

Once again our skeptic pipes up: “but surely God could stuff as many additional bits as She wanted into the microstate of the hot water—for example, in degrees of freedom that are still unknown to physics—without the new bits having any effect on the water’s temperature.”

But we should’ve learned by now to doubt this sort of argument.  There’s no general principle, in our universe, saying that you can hide as many bits as you want in a physical object, without those bits influencing the object’s observable properties.  On the contrary, in case after case, our laws of physics seem to be intolerant of “wallflower bits,” which hide in a corner without talking to anyone.  If a bit is there, the laws of physics want it to affect other nearby bits and be affected by them in turn.

In the case of thermodynamics, the assumption that does all the real work here is that of equidistribution.  That is, whatever degrees of freedom might be available to your thermal system, your gas in a box or whatever, we assume that they’re all already “as randomized as they could possibly be,” subject to a few observed properties like temperature and volume and pressure.  (At least, we assume that in classical thermodynamics.  Non-equilibrium thermodynamics is a whole different can of worms, worms that don’t stay in equilibrium.)  Crucially, we assume this despite the fact that we might not even know all the relevant degrees of freedom.

Why is this assumption justified?  “Because experiment bears it out,” the physics teacher explains—but we can do better.  The assumption is justified because, as long as the degrees of freedom that we’re talking about all interact with each other, they’ve already had plenty of time to equilibrate.  And conversely, if a degree of freedom doesn’t interact with the stuff we’re observing—or with anything that interacts with the stuff we’re observing, etc.—well then, who cares about it anyway?

But now, because the microscopic laws of physics have the fundamental property of reversibility—that is, they never destroy information—a new bit has to go somewhere, and it can’t overwrite degrees of freedom that are already fully randomized.  This is why, if you pump more bits of information into a tub of hot water, while keeping it at the same volume, the new bits have nowhere to go except into pushing up the energy.  Now, there are often ways to push up the energy other than by raising the temperature—the concept of specific heat, in chemistry, is precisely about this—but if you need to stuff more bits into a substance, at the cost of raising its energy, certainly one of the obvious ways to do it is to describe a greater range of possible speeds for the water molecules.  So since that can happen, by equidistribution it typically does happen, which means that the molecules move faster on average, and your finger feels the water get hotter.
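
Here's a toy version of that bookkeeping: N quantum harmonic oscillators sharing q quanta of energy, standing in for the water's degrees of freedom (the model and the names are mine, chosen only because its microstates are easy to count).  The number of microstates is the binomial coefficient C(q+N-1, N-1), so the number of bits the system can hold grows as you pump in energy:

```python
import math

def entropy_bits(num_oscillators: int, energy_quanta: int) -> float:
    # log2 of the number of ways to distribute the quanta among the oscillators.
    microstates = math.comb(energy_quanta + num_oscillators - 1, num_oscillators - 1)
    return math.log2(microstates)

N = 100
for q in [100, 200, 400, 800]:
    print(q, round(entropy_bits(N, q), 1))
# More energy -> more accessible microstates -> more bits the system can hold,
# which is why cramming extra bits in at fixed volume pushes the energy (and
# typically the temperature) up.
```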


In summary, our laws of physics are structured in such a way that even pure information often has “nowhere to hide”: if the bits are there at all in the abstract machinery of the world, then they’re forced to pipe up and have a measurable effect.  And this is not a tautology, but comes about only because of nontrivial facts about special and general relativity, quantum mechanics, quantum field theory, and thermodynamics.  And this is what I think people should mean when they say “information is physical.”

Anyway, if this was all obvious to you, I apologize for having wasted your time!  But in my defense, it was never explained to me quite this way, nor was it sorted out in my head until recently—even though it seems like one of the most basic and general things one can possibly say about physics.


Endnotes. Thanks again to Daniel Harlow, not only for explaining the logic of the holographic bound to me but for several suggestions that improved this post.

Some readers might suspect circularity in the arguments we’ve made: are we merely saying that “any information that has observable physical consequences, has observable physical consequences”?  No, it’s more than that.  In all the examples I discussed, the magic was that we inserted certain information into our abstract mathematical description of the world, taking no care to ensure that the information’s presence would have any observable consequences whatsoever.  But then the principles of quantum mechanics, quantum gravity, or thermodynamics forced the information to be detectable in very specific ways (namely, via the destruction of quantum interference, the warping of spacetime, or the generation of heat respectively).

Three things

July 17th, 2017

I was shocked and horrified to learn of the loss of Maryam Mirzakhani at age 40, after a battle with cancer (see here or here).  Mirzakhani was a renowned mathematician at Stanford and the world’s first and so far only female Fields Medalist.  I never had the privilege of meeting her, but everything I’ve read about her fills me with admiration.  I wish to offer condolences to her friends and family, including her husband Jan Vondrák, also a professor at Stanford and a member of the CS theory community.


In other depressing news, discussion continues to rage on social media about “The Uninhabitable Earth,” the New York magazine article by David Wallace-Wells arguing that the dangers of climate change have been systematically understated even by climate scientists; that sea level rise is the least of the problems; and that if we stay the current course, much of the earth’s landmass has a good chance of being uninhabitable by the year 2100.  In an unusual turn of events, the Wallace-Wells piece has been getting slammed by climate scientists, including Michael Mann (see here and also this interview)—people who are usually in the news to refute the claims of deniers.

Some of the critics’ arguments seem cogent to me: for example, that Wallace-Wells misunderstood some satellite data, and more broadly, that the piece misleadingly presents its scenario as overwhelmingly probable by 2100 if we do nothing, rather than as “only” 10% likely or whatever—i.e., a mere Trump-becoming-president level of risk.  Other objections to the article impressed me less: for example, that doom-and-gloom is a bad way to motivate people about climate change; that the masses need a more optimistic takeaway.  That obviously has no bearing on the truth of what’s going to happen—but even if we did agree to entertain such arguments, well, it’s not as if mainstream messaging on climate change has been an unmitigated success.  What if everyone should be sweating-in-the-night terrified?

As far as I understand it, the question of the plausibility of Wallace-Wells’s catastrophe scenario mostly just comes down to a single scientific unknown: namely, will the melting permafrost belch huge amounts of methane into the atmosphere?  If it does, then “Armageddon” is probably a fair description of what awaits us in the next century, and if not, not.  Alas, our understanding of permafrost doesn’t seem especially reliable, and it strikes me that models of such feedbacks have a long history of erring on the side of conservatism (for example, researchers were astonished by how quickly glaciers and ice shelves fell apart).

So, while I wish the article had been written with more caveats, I submit that runaway warming scenarios deserve more attention rather than less.  And we should be putting discussion of those scenarios in exactly the broader context that Wallace-Wells does: namely, that of the Permian-Triassic extinction event, the Fermi paradox, and the conditions for a technological civilization to survive past its infancy.

Certainly we spend much more time on risks to civilization (e.g., nuclear terrorism, bioengineered pandemics) that strike me as less probable than this one.  And certainly this tail, in the distribution of possible outcomes, deserves at least as much attention as its more popular opposite, the tail where climate change turns out not to be much of a problem at all.  For the grim truth about climate change is that history won’t end in 2100: only the projections do.  And the mere addition of 50 more years could easily suffice to turn a tail risk into a body risk.

Of course, that the worst will happen is a clear prediction of reverse Hollywoodism theory—besides being the “natural, default” prediction for a computer scientist used to worst-case analysis.  This is one prediction that I hope turns out to be as wrong as possible.


OK, now for something to cheer us all up.  Yesterday the group of Misha Lukin, at Harvard, put a paper on the arXiv reporting the creation of a 51-qubit quantum simulator using cold atoms.  The paper doesn’t directly address the question of quantum supremacy, or indeed of performance comparisons between the new device and classical simulations at all.  But this is clearly a big step forward, while the world waits for the fully-programmable 50-qubit superconducting QCs that have been promised by the groups at Google and IBM.

Indeed, this strikes me as the most exciting news in experimental quantum information since last month, when Jian-Wei Pan’s group in Shanghai reported the first transmission of entangled photons from a satellite to earth—thereby allowing violations of the Bell inequality over 1200 kilometers, teleportation of a qubit from earth to space, and other major firsts.  These are breakthroughs that we knew were in the works ever since the Chinese government launched the QUESS satellite devoted to quantum communications.  I should’ve blogged about them in June.  Then again, regular readers of Shtetl-Optimized, familiar as they already are with the universal reach of quantum mechanics and with the general state of quantum information technology, shouldn’t find anything here that fundamentally surprises them, should they?

Amsterdam art museums plagiarizing my blog?

July 12th, 2017

This past week I had the pleasure of attending COLT (Conference on Learning Theory) 2017 in Amsterdam, and of giving an invited talk on “PAC-Learning and Reconstruction of Quantum States.”  You can see the PowerPoint slides here; videos were also made, but don’t seem to be available yet.

This was my first COLT, but almost certainly not the last.  I learned lots of cool new tidbits, from the expressive power of small-depth neural networks, to a modern theoretical computer science definition of “non-discriminatory” (namely, your learning algorithm’s output should be independent of protected categories like race, sex, etc. after conditioning on the truth you’re trying to predict), to the inapproximability of VC dimension (assuming the Exponential Time Hypothesis).  You can see the full schedule here.  Thanks so much to the PC chairs, Ohad Shamir and Satyen Kale, for inviting me and for putting on a great conference.

And one more thing: I’m not normally big on art museums, but Amsterdam turns out to have two in close proximity to each other—the Rijksmuseum and the Stedelijk—each containing something that Shtetl-Optimized readers might recognize.

 

Photo credits: Ronald de Wolf and Marijn Heule.

ITCS’2018

July 5th, 2017

My friend Anna Karlin, who chairs the ITCS program committee this year, asked me to post the following announcement, and I’m happy to oblige her.  I’ve enjoyed ITCS every time I’ve attended, and was even involved in the statement that led to ITCS’s creation, although I don’t take direct responsibility for the content of this ad. –SA


The ITCS 2018 Call For Papers is now available!

ITCS is a conference that stands apart from all others. For a decade now, it has been celebrating the vibrancy and unity of our field of Theoretical Computer Science. See this blog post for a detailed discussion of what makes ITCS so cool and the brief description of ITCS’17 at the end of this post.

ITCS seeks to promote research that carries a strong conceptual message (e.g., introducing a new concept, model or understanding, opening a new line of inquiry within traditional or interdisciplinary areas, introducing new mathematical techniques and methodologies, or new applications of known techniques). ITCS welcomes both conceptual and technical contributions whose contents will advance and inspire the greater theory community.

This year, ITCS will be held at MIT in Cambridge, MA from January 11-14, 2018.

The submission deadline is September 8, 2017, with notification of decisions by October 30, 2017.

Authors should strive to make their papers accessible not only to experts in their subarea, but also to the theory community at large. The committee will place a premium on writing that conveys clearly and in the simplest possible way what the paper is accomplishing.

Ten-page versions of accepted papers will be published in an electronic proceedings of the conference. However, the alternative of publishing a one page abstract with a link to a full PDF will also be available (to accommodate subsequent publication in journals that would not consider results that have been published in preliminary form in a conference proceedings).

You can find all the details in the official Call For Papers.

On last year’s ITCS (by the PC Chair Christos Papadimitriou)

This past ITCS (2017) was by all accounts the most successful ever.  We had 170+ submissions and 61 papers, including 5 “invited papers”, and 90+ registrants, all new records.  There was a voluntary poster session for authors to get a chance to go through their work in more detail, and the famous Graduating Bits event, where the younger ones get their 5 minutes to show off their accomplishments and personality.

The spirit of the conference was invigorating, heartwarming, and great fun.  I believe none of the twelve sessions had fewer than 70 attendees — no parallelism, of course — while the now famous last session was among the best attended and went one hour overtime due to the excitement of discussion (compare with the last large conference that you attended).

Alex Halderman testifying before the Senate Intelligence Committee

June 21st, 2017

This morning, my childhood best friend Alex Halderman testified before the US Senate about the proven ease of hacking electronic voting machines without leaving any record, the certainty that Russia has the technical capability to hack American elections, and the urgency of three commonsense (and cheap) countermeasures:

  1. a paper trail for every vote cast in every state,
  2. routine statistical sampling of the paper trail—enough to determine whether large-scale tampering occurred, and
  3. cybersecurity audits to instill general best practices (such as firewalling election systems).

You can watch Alex on C-SPAN here—his testimony begins at 2:16:13, and is followed by the Q&A period.  You can also read Alex’s prepared testimony here, as well as his accompanying Washington Post editorial (joint with Justin Talbot-Zorn).

Alex’s testimony—its civic, nonpartisan nature, right down to Alex’s flourish of approvingly quoting President Trump in support of paper ballots—reflects a moving optimism that, even in these dark times for democracy, Congress can be prodded into doing the right thing merely because it’s clearly, overwhelmingly in the national interest.  I wish I could say I shared that optimism.  Nevertheless, when called to testify, what can one do but act on the assumption that such optimism is justified?  Here’s hoping that Alex’s urgent message is heard and acted on.