Archive for the ‘Metaphysical Spouting’ Category

The Complete Idiot’s Guide to the Independence of the Continuum Hypothesis: Part 1 of <=Aleph_0

Saturday, October 31st, 2020

A global pandemic, apocalyptic fires, and the possible descent of the US into violent anarchy three days from now can do strange things to the soul.

Bertrand Russell—and if he’d done nothing else in his long life, I’d love him forever for it—once wrote that “in adolescence, I hated life and was continually on the verge of suicide, from which, however, I was restrained by the desire to know more mathematics.” This summer, unable to bear the bleakness of 2020, I obsessively read up on the celebrated proof of the unsolvability of the Continuum Hypothesis (CH) from the standard foundation of mathematics, the Zermelo-Fraenkel axioms of set theory. (In this post, I’ll typically refer to “ZFC,” which means Zermelo-Fraenkel plus the famous Axiom of Choice.)

For those tuning in from home, the Continuum Hypothesis was formulated by Georg Cantor, shortly after his epochal discovery that there are different orders of infinity: so for example, the infinity of real numbers (denoted C for continuum, or \( 2^{\aleph_0} \)) is strictly greater than the infinity of integers (denoted ℵ0, or “Aleph-zero”). CH is simply the statement that there’s no infinity intermediate between ℵ0 and C: that anything greater than the first is at least the second. Cantor tried in vain for decades to prove or disprove CH; the quest is believed to have contributed to his mental breakdown. When David Hilbert presented his famous list of 23 unsolved math problems in 1900, CH was at the very top.

Halfway between Hilbert’s speech and today, the question of CH was finally “answered,” with the solution earning the only Fields Medal that’s ever been awarded for work in set theory and logic. But unlike with any previous yes-or-no question in the history of mathematics, the answer was that there provably is no answer from the accepted axioms of set theory! You can either have intermediate infinities or not; neither possibility can create a contradiction. And if you do have intermediate infinities, it’s up to you how many: 1, 5, 17, ∞, etc.

The easier half, the consistency of CH with set theory, was proved by incompleteness dude Kurt Gödel in 1940; the harder half, the consistency of not(CH), by Paul Cohen in 1963. Cohen’s work introduced the method of forcing, which was so fruitful in proving set-theoretic questions unsolvable that it quickly took over the whole subject of set theory. Learning Gödel and Cohen’s proofs had been a dream of mine since teenagerhood, but one I constantly put off.

This time around I started with Cohen’s retrospective essay, as well as Timothy Chow’s Forcing for Dummies and A Beginner’s Guide to Forcing. I worked through Cohen’s own Set Theory and the Continuum Hypothesis, and Ken Kunen’s Set Theory: An Introduction to Independence Proofs, and Dana Scott’s 1967 paper reformulating Cohen’s proof. I emailed questions to Timothy Chow, who was ridiculously generous with his time. When Tim and I couldn’t answer something, we tried Bob Solovay (one of the world’s great set theorists, who later worked in computational complexity and quantum computing), or Andreas Blass or Asaf Karagila. At some point mathematician and friend-of-the-blog Greg Kuperberg joined my quest for understanding. I thank all of them, but needless to say take sole responsibility for all the errors that surely remain in these posts.

On the one hand, the proof of the independence of CH would seem to stand with general relativity, the wheel, and the chocolate bar as a triumph of the human intellect. It represents a culmination of Cantor’s quest to know the basic rules of infinity—all the more amazing if the answer turns out to be that, in some sense, we can’t know them.

On the other hand, perhaps no other scientific discovery of equally broad interest remains so sparsely popularized, not even (say) quantum field theory or the proof of Fermat’s Last Theorem. I found barely any attempts to explain how forcing works to non-set-theorists, let alone to non-mathematicians. One notable exception was Timothy Chow’s Beginner’s Guide to Forcing, mentioned earlier—but Chow himself, near the beginning of his essay, calls forcing an “open exposition problem,” and admits that he hasn’t solved it. My modest goal, in this post and the following ones, is to make a further advance on the exposition problem.

OK, but why a doofus computer scientist like me? Why not, y’know, an actual expert? I won’t put forward my ignorance as a qualification, although I have often found that the better I learn a topic, the more completely I forget what initially confused me, and so the less able I become to explain things to beginners.

Still, there is one thing I know well that turns out to be intimately related to Cohen’s forcing method, and that made me feel like I had a small “in” for this subject. This is the construction of oracles in computational complexity theory. In CS, we like to construct hypothetical universes where P=NP or P≠NP, or P≠BQP, or the polynomial hierarchy is infinite, etc. To do so, we, by fiat, insert a new function—an oracle—into the universe of computational problems, carefully chosen to make the desired statement hold. Often the oracle needs to satisfy an infinite list of conditions, so we handle them one by one, taking care that when we satisfy a new condition we don’t invalidate the previous conditions.
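
To give a flavor of the stage-by-stage part, here is a minimal toy sketch in Python (the requirements are invented for illustration and aren’t taken from any actual oracle separation). The object under construction is an infinite binary string, of which we only ever hold a finite prefix; each requirement is met by extending the prefix, and since a met requirement depends only on bits that are now frozen, it stays met forever.

```python
# Toy "finite extension" construction, loosely in the spirit of oracle
# constructions in complexity theory (and, analogously, of meeting forcing
# conditions one at a time). The requirements are purely illustrative.

def meet(i, prefix):
    """Hypothetical requirement R_i: 'somewhere past the current prefix there
    is a 1 followed by i zeros.' Extend the prefix so this becomes true."""
    return prefix + [1] + [0] * i

prefix = []                  # finite approximation to the infinite oracle string
for i in range(5):           # the real construction runs through all requirements
    prefix = meet(i, prefix)

# R_0 through R_4 are now satisfied, and since each one only looked at bits
# that are permanently fixed, later extensions can never un-satisfy them.
print(prefix)
```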

All this, I kept reading, is profoundly analogous to what the set theorists do when they create a mathematical universe where the Axiom of Choice is true but CH is false, or vice versa, or any of a thousand more exotic possibilities. They insert new sets into their models of set theory, sets that are carefully constructed to “force” infinite lists of conditions to hold. In fact, some of the exact same people—such as Solovay—who helped pioneer forcing in the 1960s, later went on to pioneer oracles in computational complexity. We’ll say more about this connection in a future post.

How Could It Be?

How do you study a well-defined math problem, and return the answer that, as far as the accepted axioms of math can say, there is no answer? I mean: even supposing it’s true that there’s no answer, how do you prove such a thing?

Arguably, not even Gödel’s Incompleteness Theorem achieved such a feat. Recall, the Incompleteness Theorem says loosely that, for every formal system F that could possibly serve as a useful foundation for mathematics, there exist statements even of elementary arithmetic that are true but unprovable in F—and Con(F), a statement that encodes F’s own consistency, is an example of one. But the very statement that Con(F) is unprovable is equivalent to Con(F)’s being true (since an inconsistent system could prove anything, including Con(F)). In other words, if the Incompleteness Theorem as applied to F holds any interest, then that’s only because F is, in fact, consistent; it’s just that resources beyond F are needed to prove this.

Yes, there’s a “self-hating theory,” F+Not(Con(F)), which believes in its own inconsistency. And yes, by Gödel, this self-hating theory is consistent if F itself is. This means that it has a model—involving “nonstandard integers,” formal artifacts that effectively promise a proof of F’s inconsistency without ever actually delivering it. We’ll have much, much more to say about models later on, but for now, they’re just collections of objects, along with relationships between the objects, that satisfy all the axioms of a theory (thus, a model of the axioms of group theory is simply … any group!).

In any case, though, the self-hating theory F+Not(Con(F)) can’t be arithmetically sound: I mean, just look at it! It’s either unsound because F is consistent, or else it’s unsound because F is inconsistent. In general, this is one of the most fundamental points in logic: consistency does not imply soundness. If I believe that the moon is made of cheese, that might be consistent with all my other beliefs about the moon (for example, that Neil Armstrong ate delicious chunks of it), but that doesn’t mean my belief is true. Like the classic conspiracy theorist, who thinks that any apparent evidence against their hypothesis was planted by George Soros or the CIA, I might simply believe a self-consistent collection of absurdities. Consistency is purely a syntactic condition—it just means that I can never prove both a statement and its opposite—but soundness goes further, asserting that whatever I can prove is actually the case, a relationship between what’s inside my head and what’s outside it.

So again, assuming we had any business using F in the first place, the Incompleteness Theorem gives us two consistent ways to extend F (by adding Con(F) or by adding Not(Con(F))), but only one sound way (by adding Con(F)). But the independence of CH from the ZFC axioms of set theory is of a fundamentally different kind. It will give us models of ZFC+CH, and models of ZFC+Not(CH), that are both at least somewhat plausible as “sketches of mathematical reality”—and that both even have defenders. The question of which is right, or whether it’s possible to decide at all, will be punted to the future: to the discovery (or not) of some intuitively compelling foundation for mathematics that, as Gödel hoped, answers the question by going beyond ZFC.

Four Levels to Unpack

While experts might consider this too obvious to spell out, Gödel’s and Cohen’s analyses of CH aren’t so much about infinity, as they are about our ability to reason about infinity using finite sequences of symbols. The game is about building self-contained mathematical universes to order—universes where all the accepted axioms about infinite sets hold true, and yet that, in some cases, seem to mock what those axioms were supposed to mean, by containing vastly fewer objects than the mathematical universe was “meant” to have.

In understanding these proofs, the central hurdle, I think, is that there are at least four different “levels of description” that need to be kept in mind simultaneously.

At the first level, Gödel’s and Cohen’s proofs, like all mathematical proofs, are finite sequences of symbols. Not only that, they’re proofs that can be formalized in elementary arithmetic (!). In other words, even though they’re about the axioms of set theory, they don’t themselves require those axioms. Again, this is possible because, at the end of the day, Gödel’s and Cohen’s proofs won’t be talking about infinite sets, but “only” about finite sequences of symbols that make statements about infinite sets.

At the second level, the proofs are making an “unbounded” but perfectly clear claim. They’re claiming that, if someone showed you a proof of either CH or Not(CH), from the ZFC axioms of set theory, then no matter how long the proof or what its details, you could convert it into a proof that ZFC itself was inconsistent. In symbols, they’re proving the “relative consistency statements”

Con(ZFC) ⇒ Con(ZFC+CH),
Con(ZFC) ⇒ Con(ZFC+Not(CH)),

and they’re proving these as theorems of elementary arithmetic. (Note that there’s no hope of proving Con(ZFC+CH) or Con(ZFC+Not(CH)) outright within ZFC, since by Gödel, ZFC can’t even prove its own consistency.)

This translation is completely explicit; the independence proofs even yield algorithms to convert proofs of inconsistencies in ZFC+CH or ZFC+Not(CH), supposing that they existed, into proofs of inconsistencies in ZFC itself.

Having said that, as Cohen himself often pointed out, thinking about the independence proofs in terms of algorithms to manipulate sequences of symbols is hopeless: to have any chance of understanding these proofs, let alone coming up with them, at some point you need to think about what the symbols refer to.

This brings us to the third level: the symbols refer to models of set theory, which could also be called “mathematical universes.” Crucially, we always can and often will take these models to be only countably infinite: that is, to contain an infinity of sets, but “merely” ℵ0 of them, the infinity of integers or of finite strings, and no more.

The fourth level of description is from within the models themselves: each model imagines itself to have an uncountable infinity of sets. As far as the model’s concerned, it comprises the entire mathematical universe, even though “looking in from outside,” we can see that that’s not true. In particular, each model of ZFC thinks it has uncountably many sets, many themselves of uncountable cardinality, even if “from the outside” the model is countable.

Say what? The models are mistaken about something as basic as their own size, about how many sets they have? Yes. The models will be like The Matrix (the movie, not the mathematical object), or The Truman Show. They’re self-contained little universes whose inhabitants can never discover that they’re living a lie—that they’re missing sets that we, from the outside, know to exist. The poor denizens of the Matrix will never even be able to learn that their universe—what they mistakenly think of as the universe—is secretly countable! And no Morpheus will ever arrive to enlighten them, although—and this is crucial to Cohen’s proof in particular—the inhabitants will be able to reason more-or-less intelligibly about what would happen if a Morpheus did arrive.

The Löwenheim-Skolem Theorem, from the early 1920s, says that any countable list of first-order axioms that has any model at all (i.e., that’s consistent), must have a model with at most countably many elements. And ZFC is a countable list of first-order axioms, so Löwenheim-Skolem applies to it—even though ZFC implies the existence of an uncountable infinity of sets! Before taking the plunge, we’ll need to not merely grudgingly accept but love and internalize this “paradox,” because pretty much the entire proof of the independence of CH is built on top of it.

Incidentally, once we realize that it’s possible to build self-consistent yet “fake” mathematical universes, we can ask the question that, incredibly, the Matrix movies never ask. Namely, how do we know that our own, larger universe isn’t similarly a lie? The answer is that we don’t! As an example—I hope you’re sitting down for this—even though Cantor proved that there are uncountably many real numbers, that only means there are uncountably many reals for us. We can’t rule out the possibility that God, looking down on our universe, would see countably many reals.

Cantor’s Proof Revisited

To back up: the whole story of CH starts, of course, with Cantor’s epochal discovery of the different orders of infinity, that for example, there are more subsets of positive integers (or equivalently real numbers, or equivalently infinite binary sequences) than there are positive integers. The devout Cantor thought his discovery illuminated the nature of God; it’s never been entirely obvious to me that he was wrong.

Recall how Cantor’s proof works: we suppose by contradiction that we have an enumeration of all infinite binary sequences: for example,

s(0) = 00000000…
s(1) = 01010101…
s(2) = 11001010…
s(3) = 10000000…

We then produce a new infinite binary sequence that’s not on the list, by going down the diagonal and flipping each bit, which in the example above would produce 1011…
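
Here’s the same step as a few lines of Python, applied just to the finite prefixes listed above (the real argument, of course, runs down the whole infinite diagonal):

```python
# Diagonalize against the enumerated sequences: flip the i-th bit of the i-th
# sequence, so the result differs from s(i) in position i for every i.
s = ["00000000",
     "01010101",
     "11001010",
     "10000000"]

diag = "".join("1" if s[i][i] == "0" else "0" for i in range(len(s)))
print(diag)   # prints "1011": the start of a sequence missing from the list
```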

But look more carefully. What Cantor really shows is only that, within our mathematical universe, there can’t be an enumeration of all the reals of our universe. For if there were, we could use it to define a new real that was in the universe but not in the enumeration. The proof doesn’t rule out the possibility that God could enumerate the reals of our universe! It only shows that, if so, there would need to be additional, heavenly reals that were missing from even God’s enumeration (for example, the one produced by diagonalizing against that enumeration).

Which reals could possibly be “missing” from our universe? Every real you can name—42, π, √e, even uncomputable reals like Chaitin’s Ω—has to be there, right? Yes, and there’s the rub: every real you can name. Each name is a finite string of symbols, so whatever your naming system, you can only ever name countably many reals, leaving 100% of the reals nameless.
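
If it helps, here’s a sketch of why any naming system can only ever name countably many things: a name is a finite string over some finite alphabet, and those strings can be listed one after another, first by length and then alphabetically. (The alphabet below is an arbitrary stand-in for whatever symbols your naming system uses.)

```python
from itertools import count, product

ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789_+-*/^()."  # any finite alphabet works

def all_names():
    """Enumerate every finite string over ALPHABET: length 1, then 2, then 3, ..."""
    for length in count(1):
        for chars in product(ALPHABET, repeat=length):
            yield "".join(chars)

names = all_names()
print([next(names) for _ in range(5)])   # ['a', 'b', 'c', 'd', 'e']
```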

Or did you think of only the rationals or algebraic numbers as forming a countable dust of discrete points, with numbers like π and e filling in the solid “continuum” between them? If so, then I hope you’re sitting down for this: every real number you’ve ever heard of belongs to the countable dust! The entire concept of “the continuum” is only needed for reals that don’t have names and never will.

From ℵ0 Feet

Gödel and Cohen’s achievement was to show that, without creating any contradictions in set theory, we can adjust the size of this elusive “continuum,” putting more reals into it or fewer. How does one even begin to prove such a statement?

From a distance of ℵ0 feet, Gödel proves the consistency of CH by building minimalist mathematical universes: one where “the only sets that exist, are the ones required to exist by the ZFC axioms.” (These universes can, however, differ from each other in how “tall” they are: that is, in how many ordinals they have, and hence how many sets overall. More about that in a future post!) Gödel proves that, if the axioms of set theory are consistent—that is, if they describe any universes at all—then they also describe these minimalist universes. He then proves that, in any of these minimalist universes, from the standpoint of someone within that universe, there are exactly ℵ1 real numbers, and hence CH holds.

At an equally stratospheric level, Cohen proves the consistency of not(CH) by building … well, non-minimalist mathematical universes! A simple way is to start with Gödel’s minimalist universe—or rather, an even more minimalist universe than his, one that’s been cut down to have only countably many sets—and then to stick in a bunch of new real numbers that weren’t in that universe before. We choose the new real numbers to ensure two things: first, that we still have a model of ZFC; and second, that CH is false in the resulting universe. The details of how to do that will, of course, concern us later.

My Biggest Confusion

In subsequent posts, I’ll say more about the character of the ZFC axioms and how one builds models of them to order. Just as a teaser, though, to conclude this post I’d like to clear up a fundamental misconception I had about this subject, from roughly the age of 16 until a couple months ago.

I thought: the way Gödel proves the consistency of CH, must be by examining all the sets in his minimalist universe, and checking that each one has either at most ℵ0 elements or else at least C of them. Likewise, the way Cohen proves the consistency of not(CH), must be by “forcing in” some extra sets, which have more than ℵ0 elements but fewer than C elements.

Except, it turns out that’s not how it works. Firstly, to prove CH in his universe, Gödel is not going to check each set to make sure it doesn’t have intermediate cardinality; instead, he’s simply going to count all the reals to make sure that there are only ℵ1 of them—where ℵ1 is the next infinite cardinality after ℵ0. This will imply that C=ℵ1, which is another way to state CH.

More importantly, to build a universe where CH is false, Cohen is going to start with a universe where C=ℵ1, like Gödel’s universe, and then add in more reals: say, ℵ2 of them. The ℵ1 “original” reals will then supply our set of intermediate cardinality between the ℵ0 integers and the ℵ2 “new” reals.
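
In symbols—using \(L\) as shorthand for Gödel’s minimalist universe, a standard name these posts haven’t officially introduced yet:

\[
\text{in } L:\ \ 2^{\aleph_0} = \aleph_1 \ \ (\text{so CH holds}); \qquad \text{in Cohen’s extension}:\ \ 2^{\aleph_0} = \aleph_2 \ \ (\text{so CH fails, with } \aleph_1 \text{ strictly in between}).
\]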

Looking back, the core of my confusion was this. I had thought: I can visualize what ℵ0 means; that’s just the infinity of integers. I can also visualize what \( C=2^{\aleph_0} \) means; that’s the infinity of points on a line. Those, therefore, are the two bedrocks of clarity in this discussion. By contrast, I can’t visualize a set of intermediate cardinality between ℵ0 and C. The intermediate infinity, being weird and ghostlike, is the one that shouldn’t exist unless we deliberately “force” it to.

Turns out I had things backwards. For starters, I can’t visualize the uncountable infinity of real numbers. I might think I’m visualizing the real line—it’s solid, it’s black, it’s got little points everywhere—but how can I be sure that I’m not merely visualizing the ℵ0 rationals, or (say) the computable or definable reals, which include all the ones that arise in ordinary math?

The continuum C is not at all the bedrock of clarity that I’d thought it was. Unlike its junior partner ℵ0, the continuum is adjustable, changeable—and we will change it when we build different models of ZFC. What’s (relatively) more “fixed” in this game is something that I, like many non-experts, had always given short shrift to: Cantor’s sequence of Alephs ℵ0, ℵ1, ℵ2, etc.

Cantor, who was a very great man, didn’t merely discover that C>ℵ0; he also discovered that the infinite cardinalities form a well-ordered sequence, with no infinite descending chains. Thus, after ℵ0, there’s a next greater infinity that we call ℵ1; after ℵ1 comes ℵ2; after the entire infinite sequence ℵ0,ℵ1,ℵ2,ℵ3,… comes ℵω; after ℵω comes ℵω+1; and so on. These infinities will always be there in any universe of set theory, and always in the same order.

Our job, as engineers of the mathematical universe, will include pegging the continuum C to one of the Alephs. If we stick in a bare minimum of reals, we’ll get C=ℵ1, if we stick in more we can get C=ℵ2 or C=ℵ3, etc. We can’t make C equal to ℵ0—that’s Cantor’s Theorem—and we also can’t make C equal to ℵω, by an important theorem of König that we’ll discuss later (yes, this is an umlaut-heavy field). But it will turn out that we can make C equal to just about any other Aleph: in particular, to any infinity other than ℵ0 that’s not the supremum of a countable list of smaller infinities.
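
For reference, the relevant consequence of König’s theorem can be stated in one line (this is the standard statement, quoted here ahead of the promised later discussion):

\[
\mathrm{cf}\!\left(2^{\aleph_0}\right) \;>\; \aleph_0 ,
\]

so \(C = 2^{\aleph_0}\) can’t equal \(\aleph_\omega\), which by definition is the supremum of the countable list \(\aleph_0, \aleph_1, \aleph_2, \ldots\)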

In some sense, this is the whole journey that we need to undertake in this subject: from seeing the cardinality of the continuum as a metaphysical mystery, which we might contemplate by staring really hard at a black line on white paper, to seeing the cardinality of the continuum as an engineering problem.

Stay tuned! Next installment coming after the civilizational Singularity in three days, assuming there’s still power and Internet and food and so forth.

Oh, and happy Halloween. Ghostly sets of intermediate cardinality … spoooooky!

My second podcast with Lex Fridman

Monday, October 12th, 2020

Here it is—enjoy! (I strongly recommend listening at 2x speed.)

We recorded it a month ago—outdoors (for obvious covid reasons), on a covered balcony in Austin, as it drizzled all around us. Topics included:

  • Whether the universe is a simulation
  • Eugene Goostman, GPT-3, the Turing Test, and consciousness
  • Why I disagree with Integrated Information Theory
  • Why I disagree with Penrose’s ideas about physics and the mind
  • Intro to complexity theory, including P, NP, PSPACE, BQP, and SZK
  • The US’s catastrophic failure on covid
  • The importance of the election
  • My objections to cancel culture
  • The role of love in my life (!)

Thanks so much to Lex for his characteristically probing questions, apologies as always for my verbal tics, and here’s our first podcast for those who missed that one.

My video interview with Lex Fridman at MIT about philosophy and quantum computing

Monday, February 17th, 2020

Here it is (about 90 minutes; I recommend the 1.5x speed).

I had buried this as an addendum to my previous post on the quantum supremacy lecture tour, but then decided that a steely-eyed assessment of what’s likely to have more or less interest for this blog’s readers probably militated in favor of a separate post.

Thanks so much to Lex for arranging the interview and for his questions!

“Quantum Computing and the Meaning of Life”

Wednesday, March 13th, 2019

Manolis Kellis is a computational biologist at MIT, known as one of the leaders in applying big data to genomics and gene regulatory networks. Throughout my 9 years at MIT, Manolis was one of my best friends there, even though our research styles and interests might seem distant. He and I were in the same PECASE class; see if you can spot us both in this photo (in the rows behind America’s last sentient president). My and Manolis’s families also became close after we both got married and had kids. We still keep in touch.

Today Manolis will be celebrating his 42nd birthday, with a symposium on the meaning of life (!). He asked his friends and colleagues to contribute talks and videos reflecting on that weighty topic.

Here’s a 15-minute video interview that Manolis and I recorded last night, where he asks me to pontificate about the implications of quantum mechanics for consciousness and free will and whether the universe is a computer simulation—and also about, uh, how to balance blogging with work and family.

Also, here’s a 2-minute birthday video that I made for Manolis before I really understood what he wanted. Unlike the first video, this one has no academic content, but it does involve me wearing a cowboy hat and swinging a makeshift “lasso.”

Happy birthday Manolis!

Interpretive cards (MWI, Bohm, Copenhagen: collect ’em all)

Saturday, February 3rd, 2018

I’ve been way too distracted by actual research lately from my primary career as a nerd blogger—that’s what happens when you’re on sabbatical.  But now I’m sick, and in no condition to be thinking about research.  And this morning, in a thread that had turned to my views on the interpretation of quantum mechanics called “QBism,” regular commenter Atreat asked me the following pointed question:

Scott, what is your preferred interpretation of QM? I don’t think I’ve ever seen you put your cards on the table and lay out clearly what interpretation(s) you think are closest to the truth. I don’t think your ghost paper qualifies as an answer, BTW. I’ve heard you say you have deep skepticism about objective collapse theories and yet these would seemingly be right up your philosophical alley so to speak. If you had to bet on which interpretation was closest to the truth, which one would you go with?

Many people have asked me some variant of the same thing.  As it happens, I’d been toying since the summer with a huge post about my views on each major interpretation, but I never quite got it into a form I wanted.  By contrast, it took me only an hour to write out a reply to Atreat, and in the age of social media and attention spans measured in attoseconds, many readers will probably prefer that short reply to the huge post anyway.  So then I figured, why not promote it to a full post and be done with it?  So without further ado:


Dear Atreat,

It’s no coincidence that you haven’t seen me put my cards on the table with a favored interpretation of QM!

There are interpretations (like the “transactional interpretation”) that make no sense whatsoever to me.

There are “interpretations” like dynamical collapse that aren’t interpretations at all, but proposals for new physical theories.  By all means, let’s test QM on larger and larger systems, among other reasons because it could tell us that some such theory is true or—vastly more likely, I think—place new limits on it! (People are trying.)

Then there’s the deBroglie-Bohm theory, which does lay its cards on the table in a very interesting way, by proposing a specific evolution rule for hidden variables (chosen to match the predictions of QM), but which thereby opens itself up to the charge of non-uniqueness: why that rule, as opposed to a thousand other rules that someone could write down?  And if they all lead to the same predictions, then how could anyone ever know which rule was right?

And then there are dozens of interpretations that seem to differ from one of the “main” interpretations (Many-Worlds, Copenhagen, Bohm) mostly just in the verbal patter.

As for Copenhagen, I’ve described it as “shut-up and calculate except without ever shutting up about it”!  I regard Bohr’s writings on the subject as barely comprehensible, and Copenhagen as less of an interpretation than a self-conscious anti-interpretation: a studied refusal to offer any account of the actual constituents of the world, and—most of all—an insistence that if you insist on such an account, then that just proves that you cling naïvely to a classical worldview, and haven’t grasped the enormity of the quantum revolution.

But the basic split between Many-Worlds and Copenhagen (or better: between Many-Worlds and “shut-up-and-calculate” / “QM needs no interpretation” / etc.), I regard as coming from two fundamentally different conceptions of what a scientific theory is supposed to do for you.  Is it supposed to posit an objective state for the universe, or be only a tool that you use to organize your experiences?

Also, are the ultimate equations that govern the universe “real,” while tables and chairs are “unreal” (in the sense of being no more than fuzzy approximate descriptions of certain solutions to the equations)?  Or are the tables and chairs “real,” while the equations are “unreal” (in the sense of being tools invented by humans to predict the behavior of tables and chairs and whatever else, while extraterrestrials might use other tools)?  Which level of reality do you care about / want to load with positive affect, and which level do you want to denigrate?

This is not like picking a race horse, in the sense that there might be no future discovery or event that will tell us who was closer to the truth.  I regard it as conceivable that superintelligent AIs will still argue about the interpretation of QM … or maybe that God and the angels argue about it now.

Indeed, about the only thing I can think of that might definitively settle the debate, would be the discovery of an even deeper level of description than QM—but such a discovery would “settle” the debate only by completely changing the terms of it.

I will say this, however, in favor of Many-Worlds: it’s clearly and unequivocally the best interpretation of QM, as long as we leave ourselves out of the picture!  I.e., as long as we say that the goal of physics is to give the simplest, cleanest possible mathematical description of the world that somewhere contains something that seems to correspond to observation, and we’re willing to shunt as much metaphysical weirdness as needed to those who worry themselves about details like “wait, so are we postulating the physical existence of a continuum of slightly different variants of me, or just an astronomically large finite number?” (Incidentally, Max Tegmark’s “mathematical multiverse” does even better than MWI by this standard.  Tegmark is the one waiting for you all the way at the bottom of the slippery slope of always preferring Occam’s Razor over trying to account for the specificity of the observed world.)  It’s no coincidence, I don’t think, that MWI is so popular among those who are also eliminativists about consciousness.

When I taught my undergrad Intro to Quantum Information course last spring—for which lecture notes are coming soon, by the way!—it was striking how often I needed to resort to an MWI-like way of speaking when students got confused about measurement and decoherence. (“So then we apply this unitary transformation U that entangles the system and environment, and we compute a partial trace over the environment qubits, and we see that it’s as if the system has been measured, though of course we could in principle reverse this by applying U⁻¹ … oh shoot, have I just conceded MWI?”)

On the other hand, when (at the TAs’ insistence) we put an optional ungraded question on the final exam that asked students their favorite interpretation of QM, we found that there was no correlation whatsoever between interpretation and final exam score—except that students who said they didn’t believe any interpretation at all, or that the question was meaningless or didn’t matter, scored noticeably higher than everyone else.

Anyway, as I said, MWI is the best interpretation if we leave ourselves out of the picture.  But you object: “OK, and what if we don’t leave ourselves out of the picture?  If we dig deep enough on the interpretation of QM, aren’t we ultimately also asking about the ‘hard problem of consciousness,’ much as some people try to deny that? So for example, what would it be like to be maintained in a coherent superposition of thinking two different thoughts A and B, and then to get measured in the |A⟩+|B⟩, |A⟩-|B⟩ basis?  Would it even be like anything?  Or is there something about our consciousness that depends on decoherence, irreversibility, full participation in the arrow of time, not living in an enclosed little unitary box like AdS/CFT—something that we’d necessarily destroy if we tried to set up a large-scale interference experiment on our own brains, or any other conscious entities?  If so, then wouldn’t that point to a strange sort of reconciliation of Many-Worlds with Copenhagen—where as soon as we had a superposition involving different subjective experiences, for that very reason its being a superposition would be forevermore devoid of empirical consequences, and we could treat it as just a classical probability distribution?”

I’m not sure, but The Ghost in the Quantum Turing Machine will probably have to stand as my last word (or rather, last many words) on those questions for the time being.

Is “information is physical” contentful?

Thursday, July 20th, 2017

“Information is physical.”

This slogan seems to have originated around 1991 with Rolf Landauer.  It’s ricocheted around quantum information for the entire time I’ve been in the field, incanted in funding agency reports and popular articles and at the beginnings and ends of talks.

But what the hell does it mean?

There are many things it’s taken to mean, in my experience, that don’t make a lot of sense when you think about them—or else they’re vacuously true, or purely a matter of perspective, or not faithful readings of the slogan’s words.

For example, some people seem to use the slogan to mean something more like its converse: “physics is informational.”  That is, the laws of physics are ultimately not about mass or energy or pressure, but about bits and computations on them.  As I’ve often said, my problem with that view is less its audacity than its timidity!  It’s like, what would the universe have to do in order not to be informational in this sense?  “Information” is just a name we give to whatever picks out one element from a set of possibilities, with the “amount” of information given by the log of the set’s cardinality (and with suitable generalizations to infinite sets, nonuniform probability distributions, yadda yadda).  So, as long as the laws of physics take the form of telling us that some observations or configurations of the world are possible and others are not, or of giving us probabilities for each configuration, no duh they’re about information!
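
In symbols, for a set \(S\) of equally likely possibilities and, more generally, a probability distribution \(p\):

\[
I \;=\; \log_2 |S| \ \text{bits}, \qquad H(p) \;=\; -\sum_x p(x)\,\log_2 p(x),
\]

where the Shannon entropy \(H\) reduces to \(\log_2 |S|\) when \(p\) is uniform over \(S\).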

Other people use “information is physical” to pour scorn on the idea that “information” could mean anything without some actual physical instantiation of the abstract 0’s and 1’s, such as voltage differences in a loop of wire.  Here I certainly agree with the tautology that, in order to exist physically, a piece of information (like a song, video, or computer program) does need to be embodied in the physical world.  But my inner Platonist slumps in his armchair when people go on to assert that, for example, it’s meaningless to discuss the first prime number larger than 10^(10^125), because according to post-1998 cosmology, one couldn’t fit its digits inside the observable universe.

If the cosmologists revise their models next week, will this prime suddenly burst into existence, with all the mathematical properties that one could’ve predicted for it on general grounds—only to fade back into the netherworld if the cosmologists revise their models again?  Why would anyone want to use language in such a tortured way?

Yes, brains, computers, yellow books, and so on that encode mathematical knowledge comprise only a tiny sliver of the physical world.  But it’s equally true that the physical world we observe comprises only a tiny sliver of mathematical possibility-space.

Still other people use “information is physical” simply to express their enthusiasm for the modern merger of physical and information sciences, as exemplified by quantum computing.  Far be it from me to temper that enthusiasm: rock on, dudes!

Yet others use “information is physical” to mean that the rules governing information processing and transmission in the physical world aren’t knowable a priori, but can only be learned from physics.  This is clearest in the case of quantum information, which has its own internal logic that generalizes the logic of classical information.  But in some sense, we didn’t need quantum mechanics to tell us this!  Of course the laws of physics have ultimate jurisdiction over whatever occurs in the physical world, information processing included.

My biggest beef, with all these unpackings of the “information is physical” slogan, is that none of them really engage with any of the deep truths that we’ve learned about physics.  That is, we could’ve had more-or-less the same debates about any of them, even in a hypothetical world where the laws of physics were completely different.


So then what should we mean by “information is physical”?  In the rest of this post, I’d like to propose an answer to that question.

We get closer to the meat of the slogan if we consider some actual physical phenomena, say in quantum mechanics.  The double-slit experiment will do fine.

Recall: you shoot photons, one by one, at a screen with two slits, then examine the probability distribution over where the photons end up on a second screen.  You ask: does that distribution contain alternating “light” and “dark” regions, the signature of interference between positive and negative amplitudes?  And the answer, predicted by the math and confirmed by experiment, is: yes, but only if the information about which slit the photon went through failed to get recorded anywhere else in the universe, other than the photon location itself.

Here a skeptic interjects: but that has to be wrong!  The criterion for where a physical particle lands on a physical screen can’t possibly depend on anything as airy as whether “information” got “recorded” or not.  For what counts as “information,” anyway?  As an extreme example: what if God, unbeknownst to us mortals, took divine note of which slit the photon went through?  Would that destroy the interference pattern?  If so, then every time we do the experiment, are we collecting data about the existence or nonexistence of an all-knowing God?

It seems to me that the answer is: insofar as the mind of God can be modeled as a tensor factor in Hilbert space, yes, we are.  And crucially, if quantum mechanics is universally true, then the mind of God would have to be such a tensor factor, in order for its state to play any role in the prediction of observed phenomena.

To say this another way: it’s obvious and unexceptionable that, by observing a physical system, you can often learn something about what information must be in it.  For example, you need never have heard of DNA to deduce that chickens must somehow contain information about making more chickens.  What’s much more surprising is that, in quantum mechanics, you can often deduce things about what information can’t be present, anywhere in the physical world—because if such information existed, even a billion light-years away, it would necessarily have a physical effect that you don’t see.
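
For concreteness, here’s a toy numerical version of the double-slit rule above (the wave packets and phases are made up; only the structure matters). The interference term gets multiplied by the overlap \(\langle e_R | e_L \rangle\) of whatever environment states “recorded” the slit, so a perfect record kills the fringes:

```python
import numpy as np

# Screen amplitude = sum of two wave packets, one per slit (toy parameters).
x = np.linspace(-10, 10, 2001)
psi_L = np.exp(-(x - 1.5)**2) * np.exp(+3j * x)
psi_R = np.exp(-(x + 1.5)**2) * np.exp(-3j * x)

def screen_distribution(overlap):
    """overlap = <e_R|e_L> for the environment states that recorded the slit:
    1 means no record anywhere, 0 means a perfect which-path record."""
    p = (np.abs(psi_L)**2 + np.abs(psi_R)**2
         + 2 * np.real(np.conj(psi_L) * psi_R * overlap))
    return p / np.trapz(p, x)

with_fringes    = screen_distribution(overlap=1.0)   # interference pattern
without_fringes = screen_distribution(overlap=0.0)   # fringes washed out
```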

Another famous example here concerns identical particles.  You may have heard the slogan that “if you’ve seen one electron, you’ve seen them all”: that is, apart from position, momentum, and spin, every two electrons have exactly the same mass, same charge, same every other property, including even any properties yet to be discovered.  Again the skeptic interjects: but that has to be wrong.  Logically, you could only ever confirm that two electrons were different, by observing a difference in their behavior.  Even if the electrons had behaved identically for a billion years, you couldn’t rule out the possibility that they were actually different, for example because of tiny nametags (“Hi, I’m Emily the Electron!” “Hi, I’m Ernie!”) that had no effect on any experiment you’d thought to perform, but were visible to God.

You can probably guess where this is going.  Quantum mechanics says that, no, you can verify that two particles are perfectly identical by doing an experiment where you swap them and see what happens.  If the particles are identical in all respects, then you’ll see quantum interference between the swapped and un-swapped states.  If they aren’t, you won’t.  The kind of interference you’ll see is different for fermions (like electrons) than for bosons (like photons), but the basic principle is the same in both cases.  Once again, quantum mechanics lets you verify that a specific type of information—in this case, information that distinguishes one particle from another—was not present anywhere in the physical world, because if it were, it would’ve destroyed an interference effect that you in fact saw.
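
Here’s the simplest numerical instance of that swap experiment I know of, under the toy assumption of a 50/50 beamsplitter with one particle entering each input port (for photons this is the Hong–Ou–Mandel setup): the swapped and un-swapped processes interfere with a + sign for bosons, a − sign for fermions, and not at all if the particles secretly carry nametags.

```python
import numpy as np

# 50/50 beamsplitter: U[i, j] = amplitude for a particle entering input port j
# to exit output port i.  One particle enters each input port.
U = np.array([[1,  1],
              [1, -1]]) / np.sqrt(2)

unswapped = U[0, 0] * U[1, 1]   # particle 1 -> output 0, particle 2 -> output 1
swapped   = U[0, 1] * U[1, 0]   # the two particles exchange roles

# Probability that one particle exits each output port (a "coincidence"):
p_nametags = abs(unswapped)**2 + abs(swapped)**2   # 0.5  (no interference)
p_bosons   = abs(unswapped + swapped)**2           # 0.0  (bunching: the HOM dip)
p_fermions = abs(unswapped - swapped)**2           # 1.0  (antibunching)
print(p_nametags, p_bosons, p_fermions)
```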

This, I think, already provides a meatier sense in which “information is physical” than any of the senses discussed previously.


But we haven’t gotten to the filet mignon yet.  The late, great Jacob Bekenstein will forever be associated with the discovery that information, wherever and whenever it occurs in the physical world, takes up a minimum amount of space.  The most precise form of this statement, called the covariant entropy bound, was worked out in detail by Raphael Bousso.  Here I’ll be discussing a looser version of the bound, which holds in “non-pathological” cases, and which states that a bounded physical system can store at most A/(4 ln 2) bits of information, where A is the area in Planck units of any surface that encloses the system—so, about 10^69 bits per square meter.  (Actually it’s 10^69 qubits per square meter, but because of Holevo’s theorem, an upper bound on the number of qubits is also an upper bound on the number of classical bits that can be reliably stored in a system and then retrieved later.)
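
As a sanity check on that figure, here’s the one-line arithmetic, with the (approximate) Planck length plugged in by hand:

```python
import math

l_P = 1.616e-35                       # Planck length in meters (approximate)
planck_areas_per_m2 = 1.0 / l_P**2    # ~3.8e69 Planck areas in a square meter
bits_per_m2 = planck_areas_per_m2 / (4 * math.log(2))
print(f"{bits_per_m2:.2e}")           # ~1.4e+69 bits per square meter
```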

You might have heard of the famous way Nature enforces this bound.  Namely, if you tried to create a hard drive that stored more than 10^69 bits per square meter of surface area, the hard drive would necessarily collapse to a black hole.  And from that point on, the information storage capacity would scale “only” with the area of the black hole’s event horizon—a black hole itself being the densest possible hard drive allowed by physics.

Let’s hear once more from our skeptic.  “Nonsense!  Matter can take up space.  Energy can take up space.  But information?  Bah!  That’s just a category mistake.  For a proof, suppose God took one of your black holes, with a 1-square-meter event horizon, which already had its supposed maximum of ~10^69 bits of information.  And suppose She then created a bunch of new fundamental fields, which didn’t interact with gravity, electromagnetism, or any of the other fields that we know from observation, but which had the effect of encoding 10^300 new bits in the region of the black hole.  Presto!  An unlimited amount of additional information, exactly where Bekenstein said it couldn’t exist.”

We’d like to pinpoint what’s wrong with the skeptic’s argument—and do so in a self-contained, non-question-begging way, a way that doesn’t pull any rabbits out of hats, other than the general principles of relativity and quantum mechanics.  I was confused myself about how to do this, until a month ago, when Daniel Harlow helped set me straight (any remaining howlers in my exposition are 100% mine, not his).

I believe the logic goes like this:

  1. Relativity—even just Galilean relativity—demands that, in flat space, the laws of physics must have the same form for all inertial observers (i.e., all observers who move through space at constant speed).
  2. Anything in the physical world that varies in space—say, a field that encodes different bits of information at different locations—also varies in time, from the perspective of an observer who moves through the field at a constant speed.
  3. Combining 1 and 2, we conclude that anything that can vary in space can also vary in time.  Or to say it better, there’s only one kind of varying: varying in spacetime.
  4. More strongly, special relativity tells us that there’s a specific numerical conversion factor between units of space and units of time: namely the speed of light, c.  Loosely speaking, this means that if we know the rate at which a field varies across space, we can also calculate the rate at which it varies across time, and vice versa.
  5. Anything that varies across time carries energy.  Why?  Because this is essentially the definition of energy in quantum mechanics!  Up to a constant multiple (namely, Planck’s constant), energy is the expected speed of rotation of the global phase of the wavefunction, when you apply your Hamiltonian.  If the global phase rotates at the slowest possible speed, then we take the energy to be zero, and say you’re in a vacuum state.  If it rotates at the next highest speed, we say you’re in a first excited state, and so on.  Indeed, assuming a time-independent Hamiltonian, the evolution of any quantum system can be fully described by simply decomposing the wavefunction into a superposition of energy eigenstates, then tracking the phase of each eigenstate’s amplitude as it loops around and around the unit circle (see the displayed formula just after this list).  No energy means no looping around means nothing ever changes.
  6. Combining 3 and 5, any field that varies across space carries energy.
  7. More strongly, combining 4 and 5, if we know how quickly a field varies across space, we can lower-bound how much energy it has to contain.
  8. In general relativity, anything that carries energy couples to the gravitational field.  This means that anything that carries energy necessarily has an observable effect: if nothing else, its effect on the warping of spacetime.  (This is dramatically illustrated by dark matter, which is currently observable via its spacetime warping effect and nothing else.)
  9. Combining 6 and 8, any field that varies across space couples to the gravitational field.
  10. More strongly, combining 7 and 8, if we know how quickly a field varies across space, then we can lower-bound by how much it has to warp spacetime.  This is so because of another famous (and distinctive) feature of gravity: namely, the fact that it’s universally attractive, so all the warping contributions add up.
  11. But in GR, spacetime can only be warped by so much before we create a black hole: this is the famous Schwarzschild bound.
  12. Combining 10 and 11, the information contained in a physical field can only vary so quickly across space, before it causes spacetime to collapse to a black hole.
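
To spell out step 5 in symbols (a standard fact, not anything specific to this argument): for a time-independent Hamiltonian \(H\) with eigenstates \(H|n\rangle = E_n |n\rangle\),

\[
|\psi(t)\rangle \;=\; \sum_n c_n\, e^{-i E_n t/\hbar}\, |n\rangle ,
\]

so the only thing that ever happens is that each amplitude’s phase loops around the unit circle at a rate set by its energy; if only the lowest energy appears, the state changes by nothing but an unobservable global phase.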

Summarizing where we’ve gotten, we could say: any information that’s spatially localized at all, can only be localized so precisely.  In our world, the more densely you try to pack 1’s and 0’s, the more energy you need, therefore the more you warp spacetime, until all you’ve gotten for your trouble is a black hole.  Furthermore, if we rewrote the above conceptual argument in math—keeping track of all the G’s, c’s, h’s, and so on—we could derive a quantitative bound on how much information there can be in a bounded region of space.  And if we were careful enough, that bound would be precisely the holographic entropy bound, which says that the number of (qu)bits is at most A/(4 ln 2), where A is the area of a bounding surface in Planck units.

Let’s pause to point out some interesting features of this argument.

Firstly, we pretty much needed the whole kitchen sink of basic physical principles: special relativity (both the equivalence of inertial frames and the finiteness of the speed of light), quantum mechanics (in the form of the universal relation between energy and frequency), and finally general relativity and gravity.  All three of the fundamental constants G, c, and h made appearances, which is why all three show up in the detailed statement of the holographic bound.

But secondly, gravity only appeared from step 8 onwards.  Up till then, everything could be said solely in the language of quantum field theory: that is, quantum mechanics plus special relativity.  The result would be the so-called Bekenstein bound, which upper-bounds the number of bits in any spatial region by the product of the region’s radius and its energy content.  I learned that there’s an interesting history here: Bekenstein originally deduced this bound using ingenious thought experiments involving black holes.  Only later did people realize that the Bekenstein bound can be derived purely within QFT (see here and here for example)—in contrast to the holographic bound, which really is a statement about quantum gravity.  (An early hint of this was that, while the holographic bound involves Newton’s gravitational constant G, the Bekenstein bound doesn’t.)
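
For reference, the standard form of the Bekenstein bound (asserted here, not derived) for a system of energy E that fits inside a sphere of radius R is

\[
\text{number of bits} \;\le\; \frac{2\pi E R}{\hbar c \ln 2},
\]

which indeed involves ħ and c but not Newton’s G.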

Thirdly, speaking of QFT, some readers might be struck by the fact that at no point in our 12-step program did we ever seem to need QFT machinery.  Which is fortunate, because if we had needed it, I wouldn’t have been able to explain any of this!  But here I have to confess that I cheated slightly.  Recall step 4, which said that “if you know the rate at which a field varies across space, you can calculate the rate at which it varies across time.”  It turns out that, in order to give that sentence a definite meaning, one uses the fact that in QFT, space and time derivatives in the Hamiltonian need to be related by a factor of c, since otherwise the Hamiltonian wouldn’t be Lorentz-invariant.

Fourthly, eagle-eyed readers might notice a loophole in the argument.  Namely, we never upper-bounded how much information God could add to the world, via fields that are constant across all of spacetime.  For example, there’s nothing to stop Her from creating a new scalar field that takes the same value everywhere in the universe—with that value, in suitable units, encoding 10^50000 separate divine thoughts in its binary expansion.  But OK, being constant, such a field would interact with nothing and affect no observations—so Occam’s Razor itches to slice it off, by rewriting the laws of physics in a simpler form where that field is absent.  If you like, such a field would at most be a comment in the source code of the universe: it could be as long as the Great Programmer wanted it to be, but would have no observable effect on those of us living inside the program’s execution.


Of course, even before relativity and quantum mechanics, information had already been playing a surprisingly fleshy role in physics, through its appearance as entropy in 19th-century thermodynamics.  Which leads to another puzzle.  To a computer scientist, the concept of entropy, as the log of the number of microstates compatible with a given macrostate, seems clear enough, as does the intuition for why it should increase monotonically with time.  Or at least, to whatever extent we’re confused about these matters, we’re no more confused than the physicists are!
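
In symbols, that’s just Boltzmann’s formula, with Boltzmann’s constant supplying the conversion factor between bits and the physicists’ units of entropy:

\[
S \;=\; k_B \ln \Omega \;=\; (k_B \ln 2) \times (\text{number of bits}),
\]

where \(\Omega\) is the number of microstates compatible with the macrostate.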

But then why should this information-theoretic concept be so closely connected to tangible quantities like temperature, and pressure, and energy?  From the mere assumption that a black hole has a nonzero entropy—that is, that it takes many bits to describe—how could Bekenstein and Hawking have possibly deduced that it also has a nonzero temperature?  Or: if you put your finger into a tub of hot water, does the heat that you feel somehow reflect how many bits are needed to describe the water’s microstate?

Once again our skeptic pipes up: “but surely God could stuff as many additional bits as She wanted into the microstate of the hot water—for example, in degrees of freedom that are still unknown to physics—without the new bits having any effect on the water’s temperature.”

But we should’ve learned by now to doubt this sort of argument.  There’s no general principle, in our universe, saying that you can hide as many bits as you want in a physical object, without those bits influencing the object’s observable properties.  On the contrary, in case after case, our laws of physics seem to be intolerant of “wallflower bits,” which hide in a corner without talking to anyone.  If a bit is there, the laws of physics want it to affect other nearby bits and be affected by them in turn.

In the case of thermodynamics, the assumption that does all the real work here is that of equidistribution.  That is, whatever degrees of freedom might be available to your thermal system, your gas in a box or whatever, we assume that they’re all already “as randomized as they could possibly be,” subject to a few observed properties like temperature and volume and pressure.  (At least, we assume that in classical thermodynamics.  Non-equilibrium thermodynamics is a whole different can of worms, worms that don’t stay in equilibrium.)  Crucially, we assume this despite the fact that we might not even know all the relevant degrees of freedom.

Why is this assumption justified?  “Because experiment bears it out,” the physics teacher explains—but we can do better.  The assumption is justified because, as long as the degrees of freedom that we’re talking about all interact with each other, they’ve already had plenty of time to equilibrate.  And conversely, if a degree of freedom doesn’t interact with the stuff we’re observing—or with anything that interacts with the stuff we’re observing, etc.—well then, who cares about it anyway?

But now, because the microscopic laws of physics have the fundamental property of reversibility—that is, they never destroy information—a new bit has to go somewhere, and it can’t overwrite degrees of freedom that are already fully randomized.  This is why, if you pump more bits of information into a tub of hot water, while keeping it at the same volume, the new bits have nowhere to go except into pushing up the energy.  Now, there are often ways to push up the energy other than by raising the temperature—the concept of specific heat, in chemistry, is precisely about this—but if you need to stuff more bits into a substance, at the cost of raising its energy, certainly one of the obvious ways to do it is to describe a greater range of possible speeds for the water molecules.  So since that can happen, by equidistribution it typically does happen, which means that the molecules move faster on average, and your finger feels the water get hotter.
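
To attach one number to this (a quantitative footnote that isn’t in the argument above, but is consistent with it): at fixed volume, adding one bit’s worth of entropy to a system in equilibrium at temperature T costs energy

\[
\Delta E \;=\; T\,\Delta S \;=\; k_B T \ln 2 ,
\]

the same \(k_B T \ln 2\) that appears in Landauer’s principle for the minimum cost of erasing a bit.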


In summary, our laws of physics are structured in such a way that even pure information often has “nowhere to hide”: if the bits are there at all in the abstract machinery of the world, then they’re forced to pipe up and have a measurable effect.  And this is not a tautology, but comes about only because of nontrivial facts about special and general relativity, quantum mechanics, quantum field theory, and thermodynamics.  And this is what I think people should mean when they say “information is physical.”

Anyway, if this was all obvious to you, I apologize for having wasted your time!  But in my defense, it was never explained to me quite this way, nor was it sorted out in my head until recently—even though it seems like one of the most basic and general things one can possibly say about physics.


Endnotes. Thanks again to Daniel Harlow, not only for explaining the logic of the holographic bound to me but for several suggestions that improved this post.

Some readers might suspect circularity in the arguments we’ve made: are we merely saying that “any information that has observable physical consequences, has observable physical consequences”?  No, it’s more than that.  In all the examples I discussed, the magic was that we inserted certain information into our abstract mathematical description of the world, taking no care to ensure that the information’s presence would have any observable consequences whatsoever.  But then the principles of quantum mechanics, quantum gravity, or thermodynamics forced the information to be detectable in very specific ways (namely, via the destruction of quantum interference, the warping of spacetime, or the generation of heat respectively).

Higher-level causation exists (but I wish it didn’t)

Sunday, June 4th, 2017

Unrelated Update (June 6): It looks like the issues we’ve had with commenting have finally been fixed! Thanks so much to Christie Wright and others at WordPress Concierge Services for handling this. Let me know if you still have problems. In the meantime, I also stopped asking for commenters’ email addresses (many commenters filled that field with nonsense anyway).  Oops, that ended up being a terrible idea, because it made commenting impossible!  Back to how it was before.


Update (June 5): Erik Hoel was kind enough to write a 5-page response to this post (Word .docx format), and to give me permission to share it here.  I might respond to various parts of it later.  For now, though, I’ll simply say that I stand by what I wrote, and that requiring the macro-distribution to arise by marginalizing the micro-distribution still seems like the correct choice to me (and is what’s assumed in, e.g., the proof of the data processing inequality).  But I invite readers to read my post along with Erik’s response, form their own opinions, and share them in the comments section.


This past Thursday, Natalie Wolchover—a math/science writer whose work has typically been outstanding—published a piece in Quanta magazine entitled “A Theory of Reality as More Than the Sum of Its Parts.”  The piece deals with recent work by Erik Hoel and his collaborators, including Giulio Tononi (Hoel’s adviser, and the founder of integrated information theory, previously critiqued on this blog).  Commenter Jim Cross asked me to expand on my thoughts about causal emergence in a blog post, so: your post, monsieur.

In their new work, Hoel and others claim to make the amazing discovery that scientific reductionism is false—or, more precisely, that there can exist “causal information” in macroscopic systems, information relevant for predicting the systems’ future behavior, that’s not reducible to causal information about the systems’ microscopic building blocks.  For more about what we’ll be discussing, see Hoel’s FQXi essay “Agent Above, Atom Below,” or better yet, his paper in Entropy, When the Map Is Better Than the Territory.  Here’s the abstract of the Entropy paper:

The causal structure of any system can be analyzed at a multitude of spatial and temporal scales. It has long been thought that while higher scale (macro) descriptions may be useful to observers, they are at best a compressed description and at worse leave out critical information and causal relationships. However, recent research applying information theory to causal analysis has shown that the causal structure of some systems can actually come into focus and be more informative at a macroscale. That is, a macroscale description of a system (a map) can be more informative than a fully detailed microscale description of the system (the territory). This has been called “causal emergence.” While causal emergence may at first seem counterintuitive, this paper grounds the phenomenon in a classic concept from information theory: Shannon’s discovery of the channel capacity. I argue that systems have a particular causal capacity, and that different descriptions of those systems take advantage of that capacity to various degrees. For some systems, only macroscale descriptions use the full causal capacity. These macroscales can either be coarse-grains, or may leave variables and states out of the model (exogenous, or “black boxed”) in various ways, which can improve the efficacy and informativeness via the same mathematical principles of how error-correcting codes take advantage of an information channel’s capacity. The causal capacity of a system can approach the channel capacity as more and different kinds of macroscales are considered. Ultimately, this provides a general framework for understanding how the causal structure of some systems cannot be fully captured by even the most detailed microscale description.

Anyway, Wolchover’s popular article quoted various researchers praising the theory of causal emergence, as well as a single inexplicably curmudgeonly skeptic—some guy who sounded like he was so off his game (or maybe just bored with debates about ‘reductionism’ versus ‘emergence’?) that he couldn’t even be bothered to engage the details of what he was supposed to be commenting on.

Hoel’s ideas do not impress Scott Aaronson, a theoretical computer scientist at the University of Texas, Austin. He says causal emergence isn’t radical in its basic premise. After reading Hoel’s recent essay for the Foundational Questions Institute, “Agent Above, Atom Below” (the one that featured Romeo and Juliet), Aaronson said, “It was hard for me to find anything in the essay that the world’s most orthodox reductionist would disagree with. Yes, of course you want to pass to higher abstraction layers in order to make predictions, and to tell causal stories that are predictively useful — and the essay explains some of the reasons why.”

After the Quanta piece came out, Sean Carroll tweeted approvingly about the above paragraph, calling me a “voice of reason [yes, Sean; have I ever not been?], slapping down the idea that emergent higher levels have spooky causal powers.”  Then Sean, in turn, was criticized for that remark by Hoel and others.

Hoel in particular raised a reasonable-sounding question.  Namely, in my “curmudgeon paragraph” from Wolchover’s article, I claimed that the notion of “causal emergence,” or causality at the macro-scale, says nothing fundamentally new.  Instead it simply reiterates the usual worldview of science, according to which

  1. the universe is ultimately made of quantum fields evolving by some Hamiltonian, but
  2. if someone asks (say) “why has air travel in the US gotten so terrible?”, a useful answer is going to talk about politics or psychology or economics or history rather than the movements of quarks and leptons.

But then, Hoel asks, if there’s nothing here for the world’s most orthodox reductionist to disagree with, then how do we find Carroll and other reductionists … err, disagreeing?

I think this dilemma is actually not hard to resolve.  Faced with a claim about “causation at higher levels,” what reductionists disagree with is not the object-level claim that such causation exists (I scratched my nose because it itched, not because of the Standard Model of elementary particles).  Rather, they disagree with the meta-level claim that there’s anything shocking about such causation, anything that poses a special difficulty for the reductionist worldview that physics has held for centuries.  I.e., they consider it true both that

  1. my nose is made of subatomic particles, and its behavior is in principle fully determined (at least probabilistically) by the quantum state of those particles together with the laws governing them, and
  2. my nose itched.

At least if we leave the hard problem of consciousness out of it—that’s a separate debate—there seems to be no reason to imagine a contradiction between 1 and 2 that needs to be resolved, but “only” a vast network of intervening mechanisms to be elucidated.  So, this is how it is that reductionists can find anti-reductionist claims to be both wrong and vacuously correct at the same time.

(Incidentally, yes, quantum entanglement provides an obvious sense in which “the whole is more than the sum of its parts,” but even in quantum mechanics, the whole isn’t more than the density matrix, which is still a huge array of numbers evolving by an equation, just different numbers than one would’ve thought a priori.  For that reason, it’s not obvious what relevance, if any, QM has to reductionism versus anti-reductionism.  In any case, QM is not what Hoel invokes in his causal emergence theory.)

From reading the philosophical parts of Hoel’s papers, it was clear to me that some remarks like the above might help ward off the forehead-banging confusions that these discussions inevitably provoke.  So standard-issue crustiness is what I offered Natalie Wolchover when she asked me, not having time on short notice to go through the technical arguments.

But of course this still leaves the question: what is in the mathematical part of Hoel’s Entropy paper?  What exactly is it that the advocates of causal emergence claim provides a new argument against reductionism?


To answer that question, yesterday I (finally) read the Entropy paper all the way through.

Much like Tononi’s integrated information theory was built around a numerical measure called Φ, causal emergence is built around a different numerical quantity, this one supposed to measure the amount of “causal information” at a particular scale.  The measure is called effective information or EI, and it’s basically the mutual information between a system’s initial state s_I and its final state s_F, assuming a uniform distribution over s_I.  Much like with Φ in IIT, computations of this EI are then used as the basis for wide-ranging philosophical claims—even though EI, like Φ, has aspects that could be criticized as arbitrary, and as not obviously connected with what we’re trying to understand.

Once again like with Φ, one of those assumptions is that of a uniform distribution over one of the variables, s_I, whose relatedness we’re trying to measure.  In my IIT post, I remarked on that assumption, but I didn’t harp on it, since I didn’t see that it did serious harm, and in any case my central objection to Φ would hold regardless of which distribution we chose.  With causal emergence, by contrast, this uniformity assumption turns out to be the key to everything.

For here is the argument from the Entropy paper, for the existence of macroscopic causality that’s not reducible to causality in the underlying components.  Suppose I have a system with 8 possible states (called “microstates”), which I label 1 through 8.  And suppose the system evolves as follows: if it starts out in states 1 through 7, then it goes to state 1.  If, on the other hand, it starts in state 8, then it stays in state 8.  In such a case, it seems reasonable to “coarse-grain” the system, by lumping together initial states 1 through 7 into a single “macrostate,” call it A, and letting the initial state 8 comprise a second macrostate, call it B.

We now ask: how much information does knowing the system’s initial state tell you about its final state?  If we’re talking about microstates, and we let the system start out in a uniform distribution over microstates 1 through 8, then 7/8 of the time the system goes to state 1.  So there’s just not much information about the final state to be predicted—specifically, only 7/8×log2(8/7) + 1/8×log2(8) ≈ 0.54 bits of entropy—which, in this case, is also the mutual information between the initial and final microstates.  If, on the other hand, we’re talking about macrostates, and we let the system start in a uniform distribution over macrostates A and B, then A goes to A and B goes to B.  So knowing the initial macrostate gives us 1 full bit of information about the final state, which is more than the ~0.54 bits that looking at the microstate gave us!  Ergo reductionism is false.

Once the argument is spelled out, it’s clear that the entire thing boils down to, how shall I put this, a normalization issue.  That is: we insist on the uniform distribution over microstates when calculating microscopic EI, and we also insist on the uniform distribution over macrostates when calculating macroscopic EI, and we ignore the fact that the uniform distribution over microstates gives rise to a non-uniform distribution over macrostates, because some macrostates can be formed in more ways than others.  If we fixed this, demanding that the two distributions be compatible with each other, we’d immediately find that, surprise, knowing the complete initial microstate of a system always gives you at least as much power to predict the system’s future as knowing a macroscopic approximation to that state.  (How could it not?  For given the microstate, we could in principle compute the macroscopic approximation for ourselves, but not vice versa.)
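For concreteness, here’s a minimal numerical check of the example above (my own throwaway sketch, not code from the Entropy paper): it computes the mutual information between initial and final states for the 8-state system, once at the microscale with the uniform distribution, once at the macroscale with the uniform distribution over {A, B}, and once at the macroscale with the distribution you actually get by coarse-graining the uniform microdistribution.

```python
# Throwaway check of the 8-state example; not code from the paper.
import math

def mutual_information(p_in, transition):
    """Mutual information (in bits) between initial and final states,
    where transition[i][j] = Pr[final = j | initial = i]."""
    n_out = len(transition[0])
    p_out = [sum(p_in[i] * transition[i][j] for i in range(len(p_in)))
             for j in range(n_out)]
    mi = 0.0
    for i, pi in enumerate(p_in):
        for j in range(n_out):
            p_ij = pi * transition[i][j]
            if p_ij > 0:
                mi += p_ij * math.log2(p_ij / (pi * p_out[j]))
    return mi

# Microscale: states 1..7 all go to state 1; state 8 goes to state 8.
micro = [[1 if j == 0 else 0 for j in range(8)] for _ in range(7)]
micro.append([1 if j == 7 else 0 for j in range(8)])
print(mutual_information([1/8] * 8, micro))        # ~0.54 bits: the paper's micro EI

# Macroscale: A = {1,...,7}, B = {8}; A -> A and B -> B.
macro = [[1, 0], [0, 1]]
print(mutual_information([1/2, 1/2], macro))       # 1 bit: the paper's macro EI
print(mutual_information([7/8, 1/8], macro))       # ~0.54 bits: macro EI with the
                                                   # coarse-grained distribution
```

With the compatible (coarse-grained) distribution, the macroscale number never exceeds the microscale one, exactly as the data processing inequality demands.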

The closest the paper comes to acknowledging the problem—i.e., that it’s all just a normalization trick—seems to be the following paragraph in the discussion section:

Another possible objection to causal emergence is that it is not natural but rather enforced upon a system via an experimenter’s application of an intervention distribution, that is, from using macro-interventions.  For formalization purposes, it is the experimenter who is the source of the intervention distribution, which reveals a causal structure that already exists.  Additionally, nature itself may intervene upon a system with statistical regularities, just like an intervention distribution.  Some of these naturally occurring input distributions may have a viable interpretation as a macroscale causal model (such as being equal to Hmax [the maximum entropy] at some particular macroscale).  In this sense, some systems may function over their inputs and outputs at a microscale or macroscale, depending on their own causal capacity and the probability distribution of some natural source of driving input.

As far as I understand it, this paragraph is saying that, for all we know, something could give rise to a uniform distribution over macrostates, so therefore that’s a valid thing to look at, even if it’s not what we get by taking a uniform distribution over microstates and then coarse-graining it.  Well, OK, but unknown interventions could give rise to many other distributions over macrostates as well.  In any case, if we’re directly comparing causal information at the microscale against causal information at the macroscale, it still seems reasonable to me to demand that in the comparison, the macro-distribution arise by coarse-graining the micro one.  But in that case, the entire argument collapses.


Despite everything I said above, the real purpose of this post is to announce that I’ve changed my mind.  I now believe that, while Hoel’s argument might be unsatisfactory, the conclusion is fundamentally correct: scientific reductionism is false.  There is higher-level causation in our universe, and it’s 100% genuine, not just a verbal sleight-of-hand.  In particular, there are causal forces that can only be understood in terms of human desires and goals, and not in terms of subatomic particles blindly bouncing around.

So what caused such a dramatic conversion?

By 2015, after decades of research and diplomacy and activism and struggle, 196 nations had finally agreed to limit their carbon dioxide emissions—every nation on earth besides Syria and Nicaragua, and Nicaragua only because it thought the agreement didn’t go far enough.  The human race had thereby started to carve out some sort of future for itself, one in which the oceans might rise slowly enough that we could adapt, and maybe buy enough time until new technologies were invented that changed the outlook.  Of course the Paris agreement fell far short of what was needed, but it was a start, something to build on in the coming decades.  Even in the US, long the hotbed of intransigence and denial on this issue, 69% of the public supported joining the Paris agreement, compared to a mere 13% who opposed.  Clean energy was getting cheaper by the year.  Most of the US’s largest corporations, including Google, Microsoft, Apple, Intel, Mars, PG&E, and ExxonMobil—ExxonMobil, for godsakes—vocally supported staying in the agreement and working to cut their own carbon footprints.  All in all, there was reason to be cautiously optimistic that children born today wouldn’t live to curse their parents for having brought them into a world so close to collapse.

In order to unravel all this, in order to steer the heavy ship of destiny off the path toward averting the crisis and toward the path of existential despair, a huge number of unlikely events would need to happen in succession, as if propelled by some evil supernatural force.

Like what?  I dunno, maybe a fascist demagogue would take over the United States on a campaign based on willful cruelty, on digging up and burning dirty fuels just because and even if it made zero economic sense, just for the fun of sticking it to liberals, or because of the urgent need to save the US coal industry, which employs fewer people than Arby’s.  Such a demagogue would have no chance of getting elected, you say?

So let’s suppose he’s up against a historically unpopular opponent.  Let’s suppose that even then, he still loses the popular vote, but somehow ekes out an Electoral College win.  Maybe he gets crucial help in winning the election from a hostile foreign power—and for some reason, pro-American nationalists are totally OK with that, even cheer it.  Even then, we’d still probably need a string of additional absurd coincidences.  Like, I dunno, maybe the fascist’s opponent has an aide who used to be married to a guy who likes sending lewd photos to minors, and investigating that guy leads the FBI to some emails that ultimately turn out to mean nothing whatsoever, but that the media hyperventilate about precisely in time to cause just enough people to vote to bring the fascist to power, thereby bringing about the end of the world.  Something like that.

It’s kind of like, you know that thing where the small population in Europe that produced Einstein and von Neumann and Erdös and Ulam and Tarski and von Karman and Polya was systematically exterminated (along with millions of other innocents) soon after it started producing such people, and the world still hasn’t fully recovered?  How many things needed to go wrong for that to happen?  Obviously you needed Hitler to be born, and to survive the trenches and assassination plots; and Hindenburg to make the fateful decision to give Hitler power.  But beyond that, the world had to sleep as Germany rebuilt its military; every last country had to turn away refugees; the UK had to shut down Jewish immigration to Palestine at exactly the right time; newspapers had to bury the story; government record-keeping had to have advanced just to the point that rounding up millions for mass murder was (barely) logistically possible; and finally, the war had to continue long enough for nearly every European country to have just enough time to ship its Jews to their deaths, before the Allies showed up to liberate mostly the ashes.

In my view, these simply aren’t the sort of outcomes that you expect from atoms blindly interacting according to the laws of physics.  These are, instead, the signatures of higher-level causation—and specifically, of a teleological force that operates in our universe to make it distinctively cruel and horrible.

Admittedly, I don’t claim to know the exact mechanism of the higher-level causation.  Maybe, as the physicist Yakir Aharonov has advocated, our universe has not only a special, low-entropy initial state at the Big Bang, but also a “postselected final state,” toward which the outcomes of quantum measurements get mysteriously “pulled”—an effect that might show up in experiments as ever-so-slight deviations from the Born rule.  And because of the postselected final state, even if the human race naïvely had only (say) a one-in-a-thousand chance of killing itself off, even if the paths to its destruction all involved some improbable absurdity, like an orange clown showing up from nowhere—nevertheless, the orange clown would show up.  Alternatively, maybe the higher-level causation unfolds through subtle correlations in the universe’s initial state, along the lines I sketched in my 2013 essay The Ghost in the Quantum Turing Machine.  Or maybe Erik Hoel is right after all, and it all comes down to normalization: if we looked at the uniform distribution over macrostates rather than over microstates, we’d discover that orange clowns destroying the world predominated.  Whatever the details, though, I think it can no longer be doubted that we live, not in the coldly impersonal universe that physics posited for centuries, but instead in a tragicomically evil one.

I call my theory reverse Hollywoodism, because it holds that the real world has the inverse of the typical Hollywood movie’s narrative arc.  Again and again, what we observe is that the forces of good have every possible advantage, from money to knowledge to overwhelming numerical superiority.  Yet somehow good still fumbles.  Somehow a string of improbable coincidences, or a black swan or an orange Hitler, show up at the last moment to let horribleness eke out a last-minute victory, as if the world itself had been rooting for horribleness all along.  That’s our universe.

I’m fine if you don’t believe this theory: maybe you’re congenitally more optimistic than I am (in which case, more power to you); maybe the full weight of our universe’s freakish awfulness doesn’t bear down on you as it does on me.  But I hope you’ll concede that, if nothing else, this theory is a genuinely non-reductionist one.

Your yearly dose of is-the-universe-a-simulation

Wednesday, March 22nd, 2017

Yesterday Ryan Mandelbaum, at Gizmodo, posted a decidedly tongue-in-cheek piece about whether or not the universe is a computer simulation.  (The piece was filed under the category “LOL.”)

The immediate impetus for Mandelbaum’s piece was a blog post by Sabine Hossenfelder, a physicist who will likely be familiar to regulars here in the nerdosphere.  In her post, Sabine vents about the simulation speculations of philosophers like Nick Bostrom.  She writes:

Proclaiming that “the programmer did it” doesn’t only not explain anything – it teleports us back to the age of mythology. The simulation hypothesis annoys me because it intrudes on the terrain of physicists. It’s a bold claim about the laws of nature that however doesn’t pay any attention to what we know about the laws of nature.

After hammering home that point, Sabine goes further, and says that the simulation hypothesis is almost ruled out, by (for example) the fact that our universe is Lorentz-invariant, and a simulation of our world by a discrete lattice of bits won’t reproduce Lorentz-invariance or other continuous symmetries.

In writing his post, Ryan Mandelbaum interviewed two people: Sabine and me.

I basically told Ryan that I agree with Sabine insofar as she argues that the simulation hypothesis is lazy—that it doesn’t pay its rent by doing real explanatory work, doesn’t even engage much with any of the deep things we’ve learned about the physical world—and disagree insofar as she argues that the simulation hypothesis faces some special difficulty because of Lorentz-invariance or other continuous phenomena in known physics.  In short: blame it for being unfalsifiable rather than for being falsified!

Indeed, to whatever extent we believe the Bekenstein bound—and even more pointedly, to whatever extent we think the AdS/CFT correspondence says something about reality—we believe that in quantum gravity, any bounded physical system (with a short-wavelength cutoff, yada yada) lives in a Hilbert space of a finite number of qubits, perhaps ~10^69 qubits per square meter of surface area.  And as a corollary, if the cosmological constant is indeed constant (so that galaxies more than ~20 billion light years away are receding from us faster than light), then our entire observable universe can be described as a system of ~10^122 qubits.  The qubits would in some sense be the fundamental reality, from which Lorentz-invariant spacetime and all the rest would need to be recovered as low-energy effective descriptions.  (I hasten to add: there’s of course nothing special about qubits here, any more than there is about bits in classical computation, compared to some other unit of information—nothing that says the Hilbert space dimension has to be a power of 2 or anything silly like that.)  Anyway, this would mean that our observable universe could be simulated by a quantum computer—or even for that matter by a classical computer, to high precision, using a mere ~2^(10^122) time steps.
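If you want to see roughly where those numbers come from, here’s the back-of-the-envelope version (mine, with a conveniently round horizon radius assumed; don’t take the last digit seriously):

```python
# Back-of-the-envelope only.  Holographic bound: at most A / (4 * l_P^2) nats of
# entropy on a surface of area A; divide by ln(2) to get bits/qubits.  The horizon
# radius below (~1.6e26 m, roughly the de Sitter horizon) is an assumed round number.
import math

l_P = 1.616e-35                                  # Planck length, meters
per_square_meter = 1.0 / (4 * l_P**2 * math.log(2))
print(f"{per_square_meter:.1e} qubits per m^2")  # ~1.4e69, i.e. ~10^69

R = 1.6e26                                       # assumed horizon radius, meters
A = 4 * math.pi * R**2
total = A / (4 * l_P**2 * math.log(2))
print(f"{total:.1e} qubits in the observable universe")  # ~4e122, i.e. ~10^122
```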

Sabine might respond that AdS/CFT and other quantum gravity ideas are mere theoretical speculations, not solid and established like special relativity.  But crucially, if you believe that the observable universe couldn’t be simulated by a computer even in principle—that it has no mapping to any system of bits or qubits—then at some point the speculative shoe shifts to the other foot.  The question becomes: do you reject the Church-Turing Thesis?  Or, what amounts to the same thing: do you believe, like Roger Penrose, that it’s possible to build devices in nature that solve the halting problem or other uncomputable problems?  If so, how?  But if not, then how exactly does the universe avoid being computational, in the broad sense of the term?

I’d write more, but by coincidence, right now I’m at an It from Qubit meeting at Stanford, where everyone is talking about how to map quantum theories of gravity to quantum circuits acting on finite sets of qubits, and the questions in quantum circuit complexity that are thereby raised.  It’s tremendously exciting—the mixture of attendees is among the most stimulating I’ve ever encountered, from Lenny Susskind and Don Page and Daniel Harlow to Umesh Vazirani and Dorit Aharonov and Mario Szegedy to Google’s Sergey Brin.  But it should surprise no one that, amid all the discussion of computation and fundamental physics, the question of whether the universe “really” “is” a simulation has barely come up.  Why would it, when there are so many more fruitful things to ask?  All I can say with confidence is that, if our world is a simulation, then whoever is simulating it (God, or a bored teenager in the metaverse) seems to have a clear preference for the 2-norm over the 1-norm, and for the complex numbers over the reals.

State

Sunday, January 1st, 2017

Happy New Year, everyone!  I tripped over a well-concealed hole and sprained my ankle while carrying my daughter across the grass at Austin’s New Years festival, so am now ringing in 2017 lying in bed immobilized, which somehow seems appropriate.  At least Lily is fine, and at least being bedridden gives me ample opportunity to blog.


Another year, another annual Edge question, with its opportunity for hundreds of scientists and intellectuals (including yours truly) to pontificate, often about why their own field of study is the source of the most important insights and challenges facing humanity.  This year’s question was:

What scientific term or concept ought to be more widely known?

With the example given of Richard Dawkins’s “meme,” which jumped into the general vernacular, becoming a meme itself.

My entry, about the notion of “state” (yeah, I tried to focus on the basics), is here.

This year’s question presented a particular challenge, which scientists writing for a broad audience might not have faced for generations.  Namely: to what extent, if any, should your writing acknowledge the dark shadow of recent events?  Does the Putinization of the United States render your little pet debates and hobbyhorses irrelevant?  Or is the most defiant thing you can do to ignore the unfolding catastrophe, to continue building your intellectual sandcastle even as the tidal wave of populist hatred nears?

In any case, the instructions from Edge were clear: ignore politics.  Focus on the eternal.  But people interpreted that injunction differently.

One of my first ideas was to write about the Second Law of Thermodynamics, and to muse about how one of humanity’s tragic flaws is to take for granted the gargantuan effort needed to create and maintain even little temporary pockets of order.  Again and again, people imagine that, if their local pocket of order isn’t working how they want, then they should smash it to pieces, since while admittedly that might make things even worse, there’s also at least 50/50 odds that they’ll magically improve.  In reasoning thus, people fail to appreciate just how exponentially more numerous are the paths downhill, into barbarism and chaos, than are the few paths further up.  So thrashing about randomly, with no knowledge or understanding, is statistically certain to make things worse: on this point thermodynamics, common sense, and human history are all in total agreement.  The implications of these musings for the present would be left as exercises for the reader.

Anyway, I was then pleased when, in a case of convergent evolution, my friend and hero Steven Pinker wrote exactly that essay, so I didn’t need to.

There are many other essays that are worth a read, some of which allude to recent events but the majority of which don’t.

Let me now discuss some disagreements I had with a few of the essays.

  • Donald Hoffman on the holographic principle.  For the point he wanted to make, about the mismatch between our intuitions and the physical world, it seems to me that Hoffman could’ve picked pretty much anything in physics, from Galileo and Newton onward.  What’s new about holography?
  • Jerry Coyne on determinism.  Coyne, who’s written many things I admire, here offers his version of an old argument that I tear my hair out every time I read.  There’s no free will, Coyne says, and therefore we should treat criminals more lightly, e.g. by eschewing harsh punishments in favor of rehabilitation.  Following tradition, Coyne never engages the obvious reply, which is: “sorry, to whom were you addressing that argument?  To me, the jailer?  To the judge?  The jury?  Voters?  Were you addressing us as moral agents, for whom the concept of ‘should’ is relevant?  Then why shouldn’t we address the criminals the same way?”
  • Michael Gazzaniga on “The Schnitt.”  Yes, it’s possible that things like the hard problem of consciousness, or the measurement problem in quantum mechanics, will never have a satisfactory resolution.  But even if so, building a complicated verbal edifice whose sole purpose is to tell people not even to look for a solution, to be satisfied with two “non-overlapping magisteria” and a lack of any explanation for how to reconcile them, never struck me as a substantive contribution to knowledge.  It wasn’t when Niels Bohr did it, and it’s not when someone today does it either.
  • I had a related quibble with Amanda Gefter’s piece on “enactivism”: the view she takes as her starting point, that “physics proves there’s no third-person view of the world,” is controversial to put it mildly among those who know the relevant physics.  (And even if we granted that view, surely a third-person perspective exists for the quasi-Newtonian world in which we evolved, and that’s relevant for the cognitive science questions Gefter then discusses.)
  • Thomas Bass on information pathology.  Bass obliquely discusses the propaganda, conspiracy theories, social-media echo chambers, and unchallenged lies that helped fuel Trump’s rise.  He then locates the source of the problem in Shannon’s information theory (!), which told us how to quantify information, but failed to address questions about the information’s meaning or relevance.  To me, this is almost exactly like blaming arithmetic because it only tells you how to add numbers, without caring whether they’re numbers of rescued orphans or numbers of bombs.  Arithmetic is fine; the problem is with us.
  • In his piece on “number sense,” Keith Devlin argues that the teaching of “rigid, rule-based” math has been rendered obsolete by computers, leaving only the need to teach high-level conceptual understanding.  I partly agree and partly disagree, with the disagreement coming from firsthand knowledge of just how badly that lofty idea gets beaten to mush once it filters down to the grade-school level.  I would say that the basic function of math education is to teach clarity of thought: does this statement hold for all positive integers, or not?  Not how do you feel about it, but does it hold?  If it holds, can you prove it?  What other statements would it follow from?  If it doesn’t hold, can you give a counterexample?  (Incidentally, there are plenty of questions of this type for which humans still outperform the best available software!)  Admittedly, pencil-and-paper arithmetic is both boring and useless—but if you never mastered anything like it, then you certainly wouldn’t be ready for the concept of an algorithm, or for asking higher-level questions about algorithms.
  • Daniel Hook on PT-symmetric quantum mechanics.  As far as I understand, PT-symmetric Hamiltonians are equivalent to ordinary Hermitian ones under similarity transformations.  So this is a mathematical trick, perhaps a useful one—but it’s extremely misleading to talk about it as if it were a new physical theory that differed from quantum mechanics.
  • Jared Diamond extols the virtues of common sense, of which there are indeed many—but alas, his example is that if a mathematical proof leads to a conclusion that your common sense tells you is wrong, then you shouldn’t waste time looking for the exact mistake.  Sometimes that’s good advice, but it’s pretty terrible applied to Goodstein’s Theorem, the muddy children puzzle, the strategy-stealing argument for Go, or anything else that genuinely is shocking until your common sense expands to accommodate it.  Math, like science in general, is a constant dialogue between formal methods and common sense, where sometimes it’s one that needs to get with the program and sometimes it’s the other.
  • Hans Halvorson on matter.  I take issue with Halvorson’s claim that quantum mechanics had to be discarded in favor of quantum field theory, because QM was inconsistent with special relativity.  It seems much better to say: the thing that conflicts with special relativity, and that quantum field theory superseded, was a particular application of quantum mechanics, involving wavefunctions of N particles moving around in a non-relativistic space.  The general principles of QM—unit vectors in complex Hilbert space, unitary evolution, the Born rule, etc.—survived the transition to QFT without the slightest change.

 

The No-Cloning Theorem and the Human Condition: My After-Dinner Talk at QCRYPT

Monday, September 19th, 2016

The following are the after-dinner remarks that I delivered at QCRYPT’2016, the premier quantum cryptography conference, on Thursday Sep. 15 in Washington DC.  You could compare to my after-dinner remarks at QIP’2006 to see how much I’ve “”matured”” since then. Thanks so much to Yi-Kai Liu and the other organizers for inviting me and for putting on a really fantastic conference.


It’s wonderful to be here at QCRYPT among so many friends—this is the first significant conference I’ve attended since I moved from MIT to Texas. I do, however, need to register a complaint with the organizers, which is: why wasn’t I allowed to bring my concealed firearm to the conference? You know, down in Texas, we don’t look too kindly on you academic elitists in Washington DC telling us what to do, who we can and can’t shoot and so forth. Don’t mess with Texas! As you might’ve heard, many of us Texans even support a big, beautiful, physical wall being built along our border with Mexico. Personally, though, I don’t think the wall proposal goes far enough. Forget about illegal immigration and smuggling: I don’t even want Americans and Mexicans to be able to win the CHSH game with probability exceeding 3/4. Do any of you know what kind of wall could prevent that? Maybe a metaphysical wall.

OK, but that’s not what I wanted to talk about. When Yi-Kai asked me to give an after-dinner talk, I wasn’t sure whether to try to say something actually relevant to quantum cryptography or just make jokes. So I’ll do something in between: I’ll tell you about research directions in quantum cryptography that are also jokes.

The subject of this talk is a deep theorem that stands as one of the crowning achievements of our field. I refer, of course, to the No-Cloning Theorem. Almost everything we’re talking about at this conference, from QKD onwards, is based in some way on quantum states being unclonable. If you read Stephen Wiesner’s paper from 1968, which founded quantum cryptography, the No-Cloning Theorem already played a central role—although Wiesner didn’t call it that. By the way, here’s my #1 piece of research advice to the students in the audience: if you want to become immortal, just find some fact that everyone already knows and give it a name!

I’d like to pose the question: why should our universe be governed by physical laws that make the No-Cloning Theorem true? I mean, it’s possible that there’s some other reason for our universe to be quantum-mechanical, and No-Cloning is just a byproduct of that. No-Cloning would then be like the armpit of quantum mechanics: not there because it does anything useful, but just because there’s gotta be something under your arms.

OK, but No-Cloning feels really fundamental. One of my early memories is when I was 5 years old or so, and utterly transfixed by my dad’s home fax machine, one of those crappy 1980s fax machines with wax paper. I kept thinking about it: is it really true that a piece of paper gets transmaterialized, sent through a wire, and reconstituted at the other location? Could I have been that wrong about how the universe works? Until finally I got it—and once you get it, it’s hard even to recapture your original confusion, because it becomes so obvious that the world is made not of stuff but of copyable bits of information. “Information wants to be free!”

The No-Cloning Theorem represents nothing less than a partial return to the view of the world that I had before I was five. It says that quantum information doesn’t want to be free: it wants to be private. There is, it turns out, a kind of information that’s tied to a particular place, or set of places. It can be moved around, or even teleported, but it can’t be copied the way a fax machine copies bits.

So I think it’s worth at least entertaining the possibility that we don’t have No-Cloning because of quantum mechanics; we have quantum mechanics because of No-Cloning—or because quantum mechanics is the simplest, most elegant theory that has unclonability as a core principle. But if so, that just pushes the question back to: why should unclonability be a core principle of physics?


Quantum Key Distribution

A first suggestion about this question came from Gilles Brassard, who’s here. Years ago, I attended a talk by Gilles in which he speculated that the laws of quantum mechanics are what they are because Quantum Key Distribution (QKD) has to be possible, while bit commitment has to be impossible. If true, that would be awesome for the people at this conference. It would mean that, far from being this exotic competitor to RSA and Diffie-Hellman that’s distance-limited and bandwidth-limited and has a tiny market share right now, QKD would be the entire reason why the universe is as it is! Or maybe what this really amounts to is an appeal to the Anthropic Principle. Like, if QKD hadn’t been possible, then we wouldn’t be here at QCRYPT to talk about it.


Quantum Money

But maybe we should search more broadly for the reasons why our laws of physics satisfy a No-Cloning Theorem. Wiesner’s paper sort of hinted at QKD, but the main thing it had was a scheme for unforgeable quantum money. This is one of the most direct uses imaginable for the No-Cloning Theorem: to store economic value in something that it’s physically impossible to copy. So maybe that’s the reason for No-Cloning: because God wanted us to have e-commerce, and didn’t want us to have to bother with blockchains (and certainly not with credit card numbers).

The central difficulty with quantum money is: how do you authenticate a bill as genuine? (OK, fine, there’s also the difficulty of how to keep a bill coherent in your wallet for more than a microsecond or whatever. But we’ll leave that for the engineers.)

In Wiesner’s original scheme, he solved the authentication problem by saying that, whenever you want to verify a quantum bill, you bring it back to the bank that printed it. The bank then looks up the bill’s classical serial number in a giant database, which tells the bank in which basis to measure each of the bill’s qubits.
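If it helps to see that verification loop spelled out, here’s a toy classical simulation (mine, not anything from Wiesner’s paper): each qubit is modeled by the (basis, bit) pair the bank stores, a wrong-basis measurement returns a coin flip, and a measure-and-resend counterfeiter passes each qubit check with probability 3/4, so a forged n-qubit bill survives with probability (3/4)^n.

```python
# Toy classical simulation of Wiesner-style verification; not from Wiesner's paper.
# A "qubit" is just the (basis, bit) pair the bank records; measuring it in the
# wrong basis gives a uniformly random outcome.
import random

def mint(n):
    return [(random.choice("+x"), random.randint(0, 1)) for _ in range(n)]

def measure(qubit, basis):
    true_basis, bit = qubit
    return bit if basis == true_basis else random.randint(0, 1)

def counterfeit(bill):
    # measure-and-resend attack: guess a basis for each qubit, re-prepare accordingly
    guesses = [random.choice("+x") for _ in bill]
    return [(g, measure(q, g)) for q, g in zip(bill, guesses)]

def bank_accepts(bill, record):
    return all(measure(q, basis) == bit for q, (basis, bit) in zip(bill, record))

n, trials = 20, 20000
record = mint(n)                      # the bank's secret database entry
print(bank_accepts(record, record))   # the genuine bill always passes
fakes_passed = sum(bank_accepts(counterfeit(record), record) for _ in range(trials))
print(fakes_passed / trials, 0.75 ** n)   # both ~0.003
```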

With this system, you can actually get information-theoretic security against counterfeiting. OK, but the fact that you have to bring a bill to the bank to be verified negates much of the advantage of quantum money in the first place. If you’re going to keep involving a bank, then why not just use a credit card?

That’s why over the past decade, some of us have been working on public-key quantum money: that is, quantum money that anyone can verify. For this kind of quantum money, it’s easy to see that the No-Cloning Theorem is no longer enough: you also need some cryptographic assumption. But OK, we can consider that. In recent years, we’ve achieved glory by proposing a huge variety of public-key quantum money schemes—and we’ve achieved even greater glory by breaking almost all of them!

After a while, there were basically two schemes left standing: one based on knot theory by Ed Farhi, Peter Shor, et al. That one has been proven to be secure under the assumption that it can’t be broken. The second scheme, which Paul Christiano and I proposed in 2012, is based on hidden subspaces encoded by multivariate polynomials. For our scheme, Paul and I were able to do better than Farhi et al.: we gave a security reduction. That is, we proved that our quantum money scheme is secure, unless there’s a polynomial-time quantum algorithm to find hidden subspaces encoded by low-degree multivariate polynomials (yadda yadda, you can look up the details) with much greater success probability than we thought possible.

Today, the situation is that my and Paul’s security proof remains completely valid, but meanwhile, our money is completely insecure! Our reduction means the opposite of what we thought it did. There is a break of our quantum money scheme, and as a consequence, there’s also a quantum algorithm to find large subspaces hidden by low-degree polynomials with much better success probability than we’d thought. What happened was that first, some French algebraic cryptanalysts—Faugere, Pena, I can’t pronounce their names—used Gröbner bases to break the noiseless version of the scheme, in classical polynomial time. So I thought, phew! At least I had acceded when Paul insisted that we also include a noisy version of the scheme. But later, Paul noticed that there’s a quantum reduction from the problem of breaking our noisy scheme to the problem of breaking the noiseless one, so the former is broken as well.

I’m choosing to spin this positively: “we used quantum money to discover a striking new quantum algorithm for finding subspaces hidden by low-degree polynomials. Err, yes, that’s exactly what we did.”

But, bottom line, until we manage to invent a better public-key quantum money scheme, or otherwise sort this out, I don’t think we’re entitled to claim that God put unclonability into our universe in order for quantum money to be possible.


Copy-Protected Quantum Software

So if not money, then what about its cousin, copy-protected software—could that be why No-Cloning holds? By copy-protected quantum software, I just mean a quantum state that, if you feed it into your quantum computer, lets you evaluate some Boolean function on any input of your choice, but that doesn’t let you efficiently prepare more states that let the same function be evaluated. I think this is important as one of the preeminent evil applications of quantum information. Why should nuclear physicists and genetic engineers get a monopoly on the evil stuff?

OK, but is copy-protected quantum software even possible? The first worry you might have is that, yeah, maybe it’s possible, but then every time you wanted to run the quantum program, you’d have to make a measurement that destroyed it. So then you’d have to go back and buy a new copy of the program for the next run, and so on. Of course, to the software company, this would presumably be a feature rather than a bug!

But as it turns out, there’s a fact many of you know—sometimes called the “Gentle Measurement Lemma,” other times the “Almost As Good As New Lemma”—which says that, as long as the outcome of your measurement on a quantum state could be predicted almost with certainty given knowledge of the state, the measurement can be implemented in such a way that it hardly damages the state at all. This tells us that, if quantum money, copy-protected quantum software, and the other things we’re talking about are possible at all, then they can also be made reusable. I summarize the principle as: “if rockets, then space shuttles.”
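For the record, the quantitative version runs roughly as follows (the constants differ between statements of the lemma, so take the big-O as a hedge): if a two-outcome measurement accepts the state \( \rho \) with probability at least \( 1-\varepsilon \), then the measurement can be implemented so that the post-measurement state \( \tilde{\rho} \) satisfies \( \| \tilde{\rho} - \rho \|_{\mathrm{tr}} \le O(\sqrt{\varepsilon}) \).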

Much like with quantum money, one can show that, relative to a suitable oracle, it’s possible to quantumly copy-protect any efficiently computable function—or rather, any function that’s hard to learn from its input/output behavior. Indeed, the implementation can be not only copy-protected but also obfuscated, so that the user learns nothing besides the input/output behavior. As Bill Fefferman pointed out in his talk this morning, the No-Cloning Theorem lets us bypass Barak et al.’s famous result on the impossibility of obfuscation, because their impossibility proof assumed the ability to copy the obfuscated program.

Of course, what we really care about is whether quantum copy-protection is possible in the real world, with no oracle. I was able to give candidate implementations of quantum copy-protection for extremely special functions, like one that just checks the validity of a password. In the general case—that is, for arbitrary programs—Paul Christiano has a beautiful proposal for how to do it, which builds on our hidden-subspace money scheme. Unfortunately, since our money scheme is currently in the shop being repaired, it’s probably premature to think about the security of the much more complicated copy-protection scheme! But these are wonderful open problems, and I encourage any of you to come and scoop us. Once we know whether uncopyable quantum software is possible at all, we could then debate whether it’s the “reason” for our universe to have unclonability as a core principle.


Unclonable Proofs and Advice

Along the same lines, I can’t resist mentioning some favorite research directions, which some enterprising student here could totally turn into a talk at next year’s QCRYPT.

Firstly, what can we say about clonable versus unclonable quantum proofs—that is, QMA witness states? In other words: for which problems in QMA can we ensure that there’s an accepting witness that lets you efficiently create as many additional accepting witnesses as you want? (I mean, besides the QCMA problems, the ones that have short classical witnesses?) For which problems in QMA can we ensure that there’s an accepting witness that doesn’t let you efficiently create any additional accepting witnesses? I do have a few observations about these questions—ask me if you’re interested—but on the whole, I believe almost anything one can ask about them remains open.

Admittedly, it’s not clear how much use an unclonable proof would be. Like, imagine a quantum state that encoded a proof of the Riemann Hypothesis, and which you would keep in your bedroom, in a glass orb on your nightstand or something. And whenever you felt your doubts about the Riemann Hypothesis resurfacing, you’d take the state out of its orb and measure it again to reassure yourself of RH’s truth. You’d be like, “my preciousssss!” And no one else could copy your state and thereby gain the same Riemann-faith-restoring powers that you had. I dunno, I probably won’t hawk this application in a DARPA grant.

Similarly, one can ask about clonable versus unclonable quantum advice states—that is, initial states that are given to you to boost your computational power beyond that of an ordinary quantum computer. And that’s also a fascinating open problem.

OK, but maybe none of this quite gets at why our universe has unclonability. And this is an after-dinner talk, so do you want me to get to the really crazy stuff? Yes?


Self-Referential Paradoxes

OK! What if unclonability is our universe’s way around the paradoxes of self-reference, like the unsolvability of the halting problem and Gödel’s Incompleteness Theorem? Allow me to explain what I mean.

In kindergarten or wherever, we all learn Turing’s proof that there’s no computer program to solve the halting problem. But what isn’t usually stressed is that that proof actually does more than advertised. If someone hands you a program that they claim solves the halting problem, Turing doesn’t merely tell you that that person is wrong—rather, he shows you exactly how to expose the person as a jackass, by constructing an example input on which their program fails. All you do is, you take their claimed halt-decider, modify it in some simple way, and then feed the result back to the halt-decider as input. You thereby create a situation where, if your program halts given its own code as input, then it must run forever, and if it runs forever then it halts. “WHOOOOSH!” [head-exploding gesture]
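In code, the construction is about five lines (a sketch of the standard diagonal argument, nothing more; `claimed_halts` stands in for the program they handed you, which of course can’t actually exist):

```python
# Sketch of Turing's diagonal construction.  `claimed_halts(source, x)` stands in
# for the program someone hands you, which allegedly returns True iff the program
# with the given source code halts on input x.
def diagonal(source):
    if claimed_halts(source, source):
        while True:     # they said "halts", so loop forever out of spite
            pass
    return              # they said "runs forever", so halt immediately

# Feed diagonal its own source code: whatever claimed_halts(d, d) answers,
# diagonal(d) does the opposite, so you've exhibited an input on which their
# decider fails.
```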

OK, but now imagine that the program someone hands you, which they claim solves the halting problem, is a quantum program. That is, it’s a quantum state, which you measure in some basis depending on the program you’re interested in, in order to decide whether that program halts. Well, the truth is, this quantum program still can’t work to solve the halting problem. After all, there’s some classical program that simulates the quantum one, albeit less efficiently, and we already know that the classical program can’t work.

But now consider the question: how would you actually produce an example input on which this quantum program failed to solve the halting problem? Like, suppose the program worked on every input you tried. Then ultimately, to produce a counterexample, you might need to follow Turing’s proof and make a copy of the claimed quantum halt-decider. But then, of course, you’d run up against the No-Cloning Theorem!

So we seem to arrive at the conclusion that, while of course there’s no quantum program to solve the halting problem, there might be a quantum program for which no one could explicitly refute that it solved the halting problem, by giving a counterexample.

I was pretty excited about this observation for a day or two, until I noticed the following. Let’s suppose your quantum program that allegedly solves the halting problem has n qubits. Then it’s possible to prove that the program can’t possibly be used to compute more than, say, 2n bits of Chaitin’s constant Ω, which is the probability that a random program halts. OK, but if we had an actual oracle for the halting problem, we could use it to compute as many bits of Ω as we wanted. So, suppose I treated my quantum program as if it were an oracle for the halting problem, and I used it to compute the first 2n bits of Ω. Then I would know that, assuming the truth of quantum mechanics, the program must have made a mistake somewhere. There would still be something weird, which is that I wouldn’t know on which input my program had made an error—I would just know that it must’ve erred somewhere! With a bit of cleverness, one can narrow things down to two inputs, such that the quantum halt-decider must have erred on at least one of them. But I don’t know whether it’s possible to go further, and concentrate the wrongness on a single query.

We can play a similar game with other famous applications of self-reference. For example, suppose we use a quantum state to encode a system of axioms. Then that system of axioms will still be subject to Gödel’s Incompleteness Theorem (which I guess I believe despite the umlaut). If it’s consistent, it won’t be able to prove all the true statements of arithmetic. But we might never be able to produce an explicit example of a true statement that the axioms don’t prove. To do so we’d have to clone the state encoding the axioms and thereby violate No-Cloning.


Personal Identity

But since I’m a bit drunk, I should confess that all this stuff about Gödel and self-reference is just a warmup to what I really wanted to talk about, which is whether the No-Cloning Theorem might have anything to do with the mysteries of personal identity and “free will.” I first encountered this idea in Roger Penrose’s book, The Emperor’s New Mind. But I want to stress that I’m not talking here about the possibility that the brain is a quantum computer—much less about the possibility that it’s a quantum-gravitational hypercomputer that uses microtubules to solve the halting problem! I might be drunk, but I’m not that drunk. I also think that the Penrose-Lucas argument, based on Gödel’s Theorem, for why the brain has to work that way is fundamentally flawed.

But here I’m talking about something different. See, I have a lot of friends in the Singularity / Friendly AI movement. And I talk to them whenever I pass through the Bay Area, which is where they congregate. And many of them express great confidence that before too long—maybe in 20 or 30 years, maybe in 100 years—we’ll be able to upload ourselves to computers and live forever on the Internet (as opposed to just living 70% of our lives on the Internet, like we do today).

This would have lots of advantages. For example, any time you were about to do something dangerous, you’d just make a backup copy of yourself first. If you were struggling with a conference deadline, you’d spawn 100 temporary copies of yourself. If you wanted to visit Mars or Jupiter, you’d just email yourself there. If Trump became president, you’d not run yourself for 8 years (or maybe 80 or 800 years). And so on.

Admittedly, some awkward questions arise. For example, let’s say the hardware runs three copies of your code and takes a majority vote, just for error-correcting purposes. Does that bring three copies of you into existence, or only one copy? Or let’s say your code is run homomorphically encrypted, with the only decryption key stored in another galaxy. Does that count? Or you email yourself to Mars. If you want to make sure that you’ll wake up on Mars, is it important that you delete the copy of your code that remains on earth? Does it matter whether anyone runs the code or not? And what exactly counts as “running” it? Or my favorite one: could someone threaten you by saying, “look, I have a copy of your code, and if you don’t do what I say, I’m going to make a thousand copies of it and subject them all to horrible tortures?”

The issue, in all these cases, is that in a world where there could be millions of copies of your code running on different substrates in different locations—or things where it’s not even clear whether they count as a copy or not—we don’t have a principled way to take as input a description of the state of the universe, and then identify where in the universe you are—or even a probability distribution over places where you could be. And yet you seem to need such a way in order to make predictions and decisions.

A few years ago, I wrote this gigantic, post-tenure essay called The Ghost in the Quantum Turing Machine, where I tried to make the point that we don’t know at what level of granularity a brain would need to be simulated in order to duplicate someone’s subjective identity. Maybe you’d only need to go down to the level of neurons and synapses. But if you needed to go all the way down to the molecular level, then the No-Cloning Theorem would immediately throw a wrench into most of the paradoxes of personal identity that we discussed earlier.

For it would mean that there were some microscopic yet essential details about each of us that were fundamentally uncopyable, localized to a particular part of space. We would all, in effect, be quantumly copy-protected software. Each of us would have a core of unpredictability—not merely probabilistic unpredictability, like that of a quantum random number generator, but genuine unpredictability—that an external model of us would fail to capture completely. Of course, by having futuristic nanorobots scan our brains and so forth, it would be possible in principle to make extremely realistic copies of us. But those copies necessarily wouldn’t capture quite everything. And, one can speculate, maybe not enough for your subjective experience to “transfer over.”

Maybe the most striking aspect of this picture is that sure, you could teleport yourself to Mars—but to do so you’d need to use quantum teleportation, and as we all know, quantum teleportation necessarily destroys the original copy of the teleported state. So we’d avert this metaphysical crisis about what to do with the copy that remained on Earth.

Look—I don’t know if any of you are like me, and have ever gotten depressed by reflecting that all of your life experiences, all your joys and sorrows and loves and losses, every itch and flick of your finger, could in principle be encoded by a huge but finite string of bits, and therefore by a single positive integer. (Really? No one else gets depressed about that?) It’s kind of like: given that this integer has existed since before there was a universe, and will continue to exist after the universe has degenerated into a thin gruel of radiation, what’s the point of even going through the motions? You know?

But the No-Cloning Theorem raises the possibility that at least this integer is really your integer. At least it’s something that no one else knows, and no one else could know in principle, even with futuristic brain-scanning technology: you’ll always be able to surprise the world with a new digit. I don’t know if that’s true or not, but if it were true, then it seems like the sort of thing that would be worthy of elevating unclonability to a fundamental principle of the universe.

So as you enjoy your dinner and dessert at this historic Mayflower Hotel, I ask you to reflect on the following. People can photograph this event, they can video it, they can type up transcripts, in principle they could even record everything that happens down to the millimeter level, and post it on the Internet for posterity. But they’re not gonna get the quantum states. There’s something about this evening, like about every evening, that will vanish forever, so please savor it while it lasts. Thank you.


Update (Sep. 20): Unbeknownst to me, Marc Kaplan did video the event and put it up on YouTube! Click here to watch. Thanks very much to Marc! I hope you enjoy, even though of course, the video can’t precisely clone the experience of having been there.

[Note: The part where I raise my middle finger is an inside joke—one of the speakers during the technical sessions inadvertently did the same while making a point, causing great mirth in the audience.]