Is “information is physical” contentful?

July 20th, 2017

“Information is physical.”

This slogan seems to have originated around 1991 with Rolf Landauer.  It’s ricocheted around quantum information for the entire time I’ve been in the field, incanted in funding agency reports and popular articles and at the beginnings and ends of talks.

But what the hell does it mean?

There are many things it’s taken to mean, in my experience, that don’t make a lot of sense when you think about them—or else they’re vacuously true, or purely a matter of perspective, or not faithful readings of the slogan’s words.

For example, some people seem to use the slogan to mean something more like its converse: “physics is informational.”  That is, the laws of physics are ultimately not about mass or energy or pressure, but about bits and computations on them.  As I’ve often said, my problem with that view is less its audacity than its timidity!  It’s like, what would the universe have to do in order not to be informational in this sense?  “Information” is just a name we give to whatever picks out one element from a set of possibilities, with the “amount” of information given by the log of the set’s cardinality (and with suitable generalizations to infinite sets, nonuniform probability distributions, yadda yadda).  So, as long as the laws of physics take the form of telling us that some observations or configurations of the world are possible and others are not, or of giving us probabilities for each configuration, no duh they’re about information!
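For concreteness, here’s a few lines of Python spelling out the counting notion of information just described (the four-element set and the probability distribution are made-up examples, nothing more):

```python
import math

# Information needed to pick out one element from a finite set:
# log2 of the set's cardinality.
possibilities = ["up", "down", "left", "right"]
bits_uniform = math.log2(len(possibilities))   # 2.0 bits

# Generalization to a nonuniform distribution: Shannon entropy,
# H(p) = -sum_i p_i log2(p_i), which reduces to log2|S| when p is uniform.
def shannon_entropy(p):
    return -sum(q * math.log2(q) for q in p if q > 0)

p = [0.5, 0.25, 0.125, 0.125]      # made-up example distribution
print(bits_uniform)                # 2.0
print(shannon_entropy(p))          # 1.75
```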

Other people use “information is physical” to pour scorn on the idea that “information” could mean anything without some actual physical instantiation of the abstract 0’s and 1’s, such as voltage differences in a loop of wire.  Here I certainly agree with the tautology that in order to exist physically—that is, be embodied in the physical world—a piece of information (like a song, video, or computer program) does need to be embodied in the physical world.  But my inner Platonist slumps in his armchair when people go on to assert that, for example, it’s meaningless to discuss the first prime number larger than 10^(10^125), because according to post-1998 cosmology, one couldn’t fit its digits inside the observable universe.

If the cosmologists revise their models next week, will this prime suddenly burst into existence, with all the mathematical properties that one could’ve predicted for it on general grounds—only to fade back into the netherworld if the cosmologists revise their models again?  Why would anyone want to use language in such a tortured way?

Yes, brains, computers, yellow books, and so on that encode mathematical knowledge comprise only a tiny sliver of the physical world.  But it’s equally true that the physical world we observe comprises only a tiny sliver of mathematical possibility-space.

Still other people use “information is physical” simply to express their enthusiasm for the modern merger of physical and information sciences, as exemplified by quantum computing.  Far be it from me to temper that enthusiasm: rock on, dudes!

Yet others use “information is physical” to mean that the rules governing information processing and transmission in the physical world aren’t knowable a priori, but can only be learned from physics.  This is clearest in the case of quantum information, which has its own internal logic that generalizes the logic of classical information.  But in some sense, we didn’t need quantum mechanics to tell us this!  Of course the laws of physics have ultimate jurisdiction over whatever occurs in the physical world, information processing included.

My biggest beef, with all these unpackings of the “information is physical” slogan, is that none of them really engage with any of the deep truths that we’ve learned about physics.  That is, we could’ve had more-or-less the same debates about any of them, even in a hypothetical world where the laws of physics were completely different.


So then what should we mean by “information is physical”?  In the rest of this post, I’d like to propose an answer to that question.

We get closer to the meat of the slogan if we consider some actual physical phenomena, say in quantum mechanics.  The double-slit experiment will do fine.

Recall: you shoot photons, one by one, at a screen with two slits, then examine the probability distribution over where the photons end up on a second screen.  You ask: does that distribution contain alternating “light” and “dark” regions, the signature of interference between positive and negative amplitudes?  And the answer, predicted by the math and confirmed by experiment, is: yes, but only if the information about which slit the photon went through failed to get recorded anywhere else in the universe, other than the photon location itself.
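For concreteness, here’s a toy numpy sketch of the two cases (the Gaussian envelopes, phases, and slit positions are all made up, so treat this as a cartoon rather than a real optics calculation).  If the which-slit information gets recorded in an orthogonal state of anything else, the cross term in the screen distribution vanishes, exactly as if we’d classically mixed the two slits:

```python
import numpy as np

x = np.linspace(-20, 20, 4001)            # positions on the detection screen

def slit_amplitude(center, k):
    # Toy amplitude for a photon arriving from one slit: a broad Gaussian
    # envelope times a position-dependent phase (all parameters made up).
    return np.exp(-(x - center) ** 2 / 200.0) * np.exp(1j * k * x)

psi1 = slit_amplitude(-1.0, +0.8)         # "went through slit 1"
psi2 = slit_amplitude(+1.0, -0.8)         # "went through slit 2"

# Case 1: no which-path record anywhere in the universe.
# The amplitudes add, and the cross term produces fringes.
p_no_record = np.abs(psi1 + psi2) ** 2

# Case 2: which-path info perfectly recorded somewhere else, i.e. the joint
# state is |psi1>|E1> + |psi2>|E2> with <E1|E2> = 0.  Tracing out the record
# kills the cross term, leaving an incoherent sum of single-slit patterns.
p_recorded = np.abs(psi1) ** 2 + np.abs(psi2) ** 2

def fringe_contrast(p, window=5.0):
    # (max - min)/(max + min) over the central part of the screen.
    c = np.abs(x) < window
    return (p[c].max() - p[c].min()) / (p[c].max() + p[c].min())

print(round(fringe_contrast(p_no_record), 3))   # ~1.0: bright and dark fringes
print(round(fringe_contrast(p_recorded), 3))    # ~0.1: just the smooth envelope
```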

Here a skeptic interjects: but that has to be wrong!  The criterion for where a physical particle lands on a physical screen can’t possibly depend on anything as airy as whether “information” got “recorded” or not.  For what counts as “information,” anyway?  As an extreme example: what if God, unbeknownst to us mortals, took divine note of which slit the photon went through?  Would that destroy the interference pattern?  If so, then every time we do the experiment, are we collecting data about the existence or nonexistence of an all-knowing God?

It seems to me that the answer is: insofar as the mind of God can be modeled as a tensor factor in Hilbert space, yes, we are.  And crucially, if quantum mechanics is universally true, then the mind of God would have to be such a tensor factor, in order for its state to play any role in the prediction of observed phenomena.

To say this another way: it’s obvious and unexceptionable that, by observing a physical system, you can often learn something about what information must be in it.  For example, you need never have heard of DNA to deduce that chickens must somehow contain information about making more chickens.  What’s much more surprising is that, in quantum mechanics, you can often deduce things about what information can’t be present, anywhere in the physical world—because if such information existed, even a billion light-years away, it would necessarily have a physical effect that you don’t see.

Another famous example here concerns identical particles.  You may have heard the slogan that “if you’ve seen one electron, you’ve seen them all”: that is, apart from position, momentum, and spin, every two electrons have exactly the same mass, same charge, same every other property, including even any properties yet to be discovered.  Again the skeptic interjects: but that has to be wrong.  Logically, you could only ever confirm that two electrons were different, by observing a difference in their behavior.  Even if the electrons had behaved identically for a billion years, you couldn’t rule out the possibility that they were actually different, for example because of tiny nametags (“Hi, I’m Emily the Electron!” “Hi, I’m Ernie!”) that had no effect on any experiment you’d thought to perform, but were visible to God.

You can probably guess where this is going.  Quantum mechanics says that, no, you can verify that two particles are perfectly identical by doing an experiment where you swap them and see what happens.  If the particles are identical in all respects, then you’ll see quantum interference between the swapped and un-swapped states.  If they aren’t, you won’t.  The kind of interference you’ll see is different for fermions (like electrons) than for bosons (like photons), but the basic principle is the same in both cases.  Once again, quantum mechanics lets you verify that a specific type of information—in this case, information that distinguishes one particle from another—was not present anywhere in the physical world, because if it were, it would’ve destroyed an interference effect that you in fact saw.
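To make the swap test concrete, here’s a hedged toy calculation of the boson case, in the style of a Hong-Ou-Mandel experiment.  The beamsplitter setup, and especially the “tag overlap” knob, are illustrative inventions of mine, not anything forced by the argument above: two photons enter the two ports of a 50/50 beamsplitter, and we ask for the probability that they exit from different ports.

```python
import numpy as np

# 50/50 beamsplitter, standard convention: reflection picks up a factor of i.
t = 1 / np.sqrt(2)        # transmission amplitude
r = 1j / np.sqrt(2)       # reflection amplitude

# "Coincidence" outcome: the two photons exit from different ports.  Two
# histories lead there: both photons transmitted, or both reflected (the
# second is just the swapped version of the first).
amp_both_transmitted = t * t      # = +1/2
amp_both_reflected   = r * r      # = -1/2

def coincidence_probability(tag_overlap):
    """tag_overlap = inner product between whatever hidden internal 'nametag'
    states the two photons might carry (1.0 means truly identical).
    This knob is an illustrative invention, not standard optics notation."""
    w = abs(tag_overlap) ** 2
    # Identical component: the swapped history interferes, amplitudes add.
    p_identical = abs(amp_both_transmitted + amp_both_reflected) ** 2          # = 0
    # Distinguishable component: no interference, probabilities add.
    p_distinct = abs(amp_both_transmitted) ** 2 + abs(amp_both_reflected) ** 2  # = 1/2
    return w * p_identical + (1 - w) * p_distinct

print(coincidence_probability(1.0))   # 0.0 -- perfectly identical bosons never coincide
print(coincidence_probability(0.0))   # 0.5 -- secretly 'nametagged' photons coincide half the time
# (For identical fermions the swapped history enters with a minus sign,
#  and the two particles would exit different ports every single time.)
```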

This, I think, already provides a meatier sense in which “information is physical” than any of the senses discussed previously.


But we haven’t gotten to the filet mignon yet.  The late, great Jacob Bekenstein will forever be associated with the discovery that information, wherever and whenever it occurs in the physical world, takes up a minimum amount of space.  The most precise form of this statement, called the covariant entropy bound, was worked out in detail by Raphael Bousso.  Here I’ll be discussing a looser version of the bound, which holds in “non-pathological” cases, and which states that a bounded physical system can store at most A/(4 ln 2) bits of information, where A is the area in Planck units of any surface that encloses the system—so, about 10^69 bits per square meter.  (Actually it’s 10^69 qubits per square meter, but because of Holevo’s theorem, an upper bound on the number of qubits is also an upper bound on the number of classical bits that can be reliably stored in a system and then retrieved later.)

You might have heard of the famous way Nature enforces this bound.  Namely, if you tried to create a hard drive that stored more than 10^69 bits per square meter of surface area, the hard drive would necessarily collapse to a black hole.  And from that point on, the information storage capacity would scale “only” with the area of the black hole’s event horizon—a black hole itself being the densest possible hard drive allowed by physics.
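If you want to check the ~10^69 figure, it’s just arithmetic on the Planck length; here it is in Python, with rounded CODATA-style values for the constants:

```python
import math

hbar = 1.054571817e-34   # reduced Planck constant, J*s
G    = 6.67430e-11       # Newton's constant, m^3 kg^-1 s^-2
c    = 2.99792458e8      # speed of light, m/s

planck_length = math.sqrt(hbar * G / c**3)     # ~1.6e-35 m
planck_area   = planck_length**2               # ~2.6e-70 m^2

area = 1.0                                     # a 1 m^2 enclosing surface
bits = area / planck_area / (4 * math.log(2))  # holographic bound: A/(4 ln 2) bits

print(f"Planck length: {planck_length:.3e} m")
print(f"Max bits behind 1 m^2 of surface: {bits:.2e}")   # ~1.4e69
```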

Let’s hear once more from our skeptic.  “Nonsense!  Matter can take up space.  Energy can take up space.  But information?  Bah!  That’s just a category mistake.  For a proof, suppose God took one of your black holes, with a 1-square-meter event horizon, which already had its supposed maximum of ~10^69 bits of information.  And suppose She then created a bunch of new fundamental fields, which didn’t interact with gravity, electromagnetism, or any of the other fields that we know from observation, but which had the effect of encoding 10^300 new bits in the region of the black hole.  Presto!  An unlimited amount of additional information, exactly where Bekenstein said it couldn’t exist.”

We’d like to pinpoint what’s wrong with the skeptic’s argument—and do so in a self-contained, non-question-begging way, a way that doesn’t pull any rabbits out of hats, other than the general principles of relativity and quantum mechanics.  I was confused myself about how to do this, until a month ago, when Daniel Harlow helped set me straight (any remaining howlers in my exposition are 100% mine, not his).

I believe the logic goes like this:

  1. Relativity—even just Galilean relativity—demands that, in flat space, the laws of physics must have the same form for all inertial observers (i.e., all observers who move through space at constant speed).
  2. Anything in the physical world that varies in space—say, a field that encodes different bits of information at different locations—also varies in time, from the perspective of an observer who moves through the field at a constant speed.
  3. Combining 1 and 2, we conclude that anything that can vary in space can also vary in time.  Or to say it better, there’s only one kind of varying: varying in spacetime.
  4. More strongly, special relativity tells us that there’s a specific numerical conversion factor between units of space and units of time: namely the speed of light, c.  Loosely speaking, this means that if we know the rate at which a field varies across space, we can also calculate the rate at which it varies across time, and vice versa.
  5. Anything that varies across time carries energy.  Why?  Because this is essentially the definition of energy in quantum mechanics!  Up to a constant multiple (namely, Planck’s constant), energy is the expected speed of rotation of the global phase of the wavefunction, when you apply your Hamiltonian.  If the global phase rotates at the slowest possible speed, then we take the energy to be zero, and say you’re in a vacuum state.  If it rotates at the next highest speed, we say you’re in a first excited state, and so on.  Indeed, assuming a time-independent Hamiltonian, the evolution of any quantum system can be fully described by simply decomposing the wavefunction into a superposition of energy eigenstates, then tracking the phase of each eigenstate’s amplitude as it loops around and around the unit circle.  No energy means no looping around means nothing ever changes.  (A small numerical illustration of this point appears right after this list.)
  6. Combining 3 and 5, any field that varies across space carries energy.
  7. More strongly, combining 4 and 5, if we know how quickly a field varies across space, we can lower-bound how much energy it has to contain.
  8. In general relativity, anything that carries energy couples to the gravitational field.  This means that anything that carries energy necessarily has an observable effect: if nothing else, its effect on the warping of spacetime.  (This is dramatically illustrated by dark matter, which is currently observable via its spacetime warping effect and nothing else.)
  9. Combining 6 and 8, any field that varies across space couples to the gravitational field.
  10. More strongly, combining 7 and 8, if we know how quickly a field varies across space, then we can lower-bound by how much it has to warp spacetime.  This is so because of another famous (and distinctive) feature of gravity: namely, the fact that it’s universally attractive, so all the warping contributions add up.
  11. But in GR, spacetime can only be warped by so much before we create a black hole: this is the famous Schwarzschild bound.
  12. Combining 10 and 11, the information contained in a physical field can only vary so quickly across space, before it causes spacetime to collapse to a black hole.
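Here’s the small numerical illustration promised in step 5, with a toy two-level Hamiltonian and observable that are of course made up for the purpose: a state of definite energy only accumulates an unobservable global phase, so nothing measurable about it ever changes, whereas a superposition of two different energies has observables that oscillate at a rate set by the energy difference.

```python
import numpy as np

hbar = 1.0
E0, E1 = 0.0, 2.0                       # made-up energy levels
H = np.diag([E0, E1])                   # time-independent Hamiltonian
X = np.array([[0, 1], [1, 0]])          # some observable (sigma_x)

def expectation(psi0, t):
    # Evolve by decomposing into energy eigenstates and rotating each phase,
    # exactly as described in step 5.
    phases = np.exp(-1j * np.diag(H) * t / hbar)
    psi_t = phases * psi0
    return np.real(np.conj(psi_t) @ X @ psi_t)

eigenstate    = np.array([1.0, 0.0])                  # definite energy E0
superposition = np.array([1.0, 1.0]) / np.sqrt(2)     # mixes E0 and E1

for t in [0.0, 1.0, 2.0, 3.0]:
    print(t, round(expectation(eigenstate, t), 3), round(expectation(superposition, t), 3))
# The eigenstate's column never changes; the superposition's column oscillates
# at angular frequency (E1 - E0)/hbar.
```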

Summarizing where we’ve gotten, we could say: any information that’s spatially localized at all, can only be localized so precisely.  In our world, the more densely you try to pack 1’s and 0’s, the more energy you need, therefore the more you warp spacetime, until all you’ve gotten for your trouble is a black hole.  Furthermore, if we rewrote the above conceptual argument in math—keeping track of all the G’s, c’s, h’s, and so on—we could derive a quantitative bound on how much information there can be in a bounded region of space.  And if we were careful enough, that bound would be precisely the holographic entropy bound, which says that the number of (qu)bits is at most A/(4 ln 2), where A is the area of a bounding surface in Planck units.
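For those who want at least the shape of that math, here’s a crude order-of-magnitude version in LaTeX, with every factor of order one dropped (so don’t expect the precise 1/(4 ln 2) to pop out):

```latex
% Crude, order-of-magnitude version of steps 1--12 (all factors of order one dropped).
% Steps 4, 5, 7: one bit confined to a region of radius R must be carried by a field
% feature varying on length scale at most R, hence on time scale at most R/c, hence
% carrying energy at least of order
\[
  E_{\mathrm{bit}} \;\gtrsim\; \frac{\hbar c}{R},
  \qquad\text{so } N \text{ bits cost }\; E \;\gtrsim\; \frac{N \hbar c}{R}.
\]
% Steps 8, 10, 11: to avoid forming a black hole, the Schwarzschild condition demands
\[
  E \;\lesssim\; \frac{R c^{4}}{G}.
\]
% Step 12: combining the two,
\[
  N \;\lesssim\; \frac{R^{2} c^{3}}{G \hbar} \;=\; \frac{R^{2}}{\ell_{P}^{2}} \;\sim\; \frac{A}{\ell_{P}^{2}},
  \qquad \ell_{P} = \sqrt{\frac{G \hbar}{c^{3}}},
\]
% i.e. the number of bits scales with the enclosing area in Planck units, which is the
% holographic bound up to the constant 1/(4 ln 2) that a careful treatment supplies.
```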

Let’s pause to point out some interesting features of this argument.

Firstly, we pretty much needed the whole kitchen sink of basic physical principles: special relativity (both the equivalence of inertial frames and the finiteness of the speed of light), quantum mechanics (in the form of the universal relation between energy and frequency), and finally general relativity and gravity.  All three of the fundamental constants G, c, and h made appearances, which is why all three show up in the detailed statement of the holographic bound.

But secondly, gravity only appeared from step 8 onwards.  Up till then, everything could be said solely in the language of quantum field theory: that is, quantum mechanics plus special relativity.  The result would be the so-called Bekenstein bound, which upper-bounds the number of bits in any spatial region by the product of the region’s radius and its energy content.  I learned that there’s an interesting history here: Bekenstein originally deduced this bound using ingenious thought experiments involving black holes.  Only later did people realize that the Bekenstein bound can be derived purely within QFT (see here and here for example)—in contrast to the holographic bound, which really is a statement about quantum gravity.  (An early hint of this was that, while the holographic bound involves Newton’s gravitational constant G, the Bekenstein bound doesn’t.)
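For reference, here’s the textbook statement of the Bekenstein bound, in which, as promised, Newton’s G makes no appearance:

```latex
% The Bekenstein bound, for a system of total energy E that fits inside a sphere of radius R:
\[
  S \;\le\; \frac{2 \pi k_{B} R E}{\hbar c},
  \qquad\text{equivalently}\qquad
  \#\,\text{bits} \;\le\; \frac{2 \pi R E}{\hbar c \, \ln 2}.
\]
% Only hbar and c appear; G enters only once gravity does, in the holographic (area) bound.
```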

Thirdly, speaking of QFT, some readers might be struck by the fact that at no point in our 12-step program did we ever seem to need QFT machinery.  Which is fortunate, because if we had needed it, I wouldn’t have been able to explain any of this!  But here I have to confess that I cheated slightly.  Recall step 4, which said that “if you know the rate at which a field varies across space, you can calculate the rate at which it varies across time.”  It turns out that, in order to give that sentence a definite meaning, one uses the fact that in QFT, space and time derivatives in the Hamiltonian need to be related by a factor of c, since otherwise the Hamiltonian wouldn’t be Lorentz-invariant.

Fourthly, eagle-eyed readers might notice a loophole in the argument.  Namely, we never upper-bounded how much information God could add to the world, via fields that are constant across all of spacetime.  For example, there’s nothing to stop Her from creating a new scalar field that takes the same value everywhere in the universe—with that value, in suitable units, encoding 10^50,000 separate divine thoughts in its binary expansion.  But OK, being constant, such a field would interact with nothing and affect no observations—so Occam’s Razor itches to slice it off, by rewriting the laws of physics in a simpler form where that field is absent.  If you like, such a field would at most be a comment in the source code of the universe: it could be as long as the Great Programmer wanted it to be, but would have no observable effect on those of us living inside the program’s execution.


Of course, even before relativity and quantum mechanics, information had already been playing a surprisingly fleshy role in physics, through its appearance as entropy in 19th-century thermodynamics.  Which leads to another puzzle.  To a computer scientist, the concept of entropy, as the log of the number of microstates compatible with a given macrostate, seems clear enough, as does the intuition for why it should increase monotonically with time.  Or at least, to whatever extent we’re confused about these matters, we’re no more confused than the physicists are!

But then why should this information-theoretic concept be so closely connected to tangible quantities like temperature, and pressure, and energy?  From the mere assumption that a black hole has a nonzero entropy—that is, that it takes many bits to describe—how could Bekenstein and Hawking have possibly deduced that it also has a nonzero temperature?  Or: if you put your finger into a tub of hot water, does the heat that you feel somehow reflect how many bits are needed to describe the water’s microstate?

Once again our skeptic pipes up: “but surely God could stuff as many additional bits as She wanted into the microstate of the hot water—for example, in degrees of freedom that are still unknown to physics—without the new bits having any effect on the water’s temperature.”

But we should’ve learned by now to doubt this sort of argument.  There’s no general principle, in our universe, saying that you can hide as many bits as you want in a physical object, without those bits influencing the object’s observable properties.  On the contrary, in case after case, our laws of physics seem to be intolerant of “wallflower bits,” which hide in a corner without talking to anyone.  If a bit is there, the laws of physics want it to affect other nearby bits and be affected by them in turn.

In the case of thermodynamics, the assumption that does all the real work here is that of equidistribution.  That is, whatever degrees of freedom might be available to your thermal system, your gas in a box or whatever, we assume that they’re all already “as randomized as they could possibly be,” subject to a few observed properties like temperature and volume and pressure.  (At least, we assume that in classical thermodynamics.  Non-equilibrium thermodynamics is a whole different can of worms, worms that don’t stay in equilibrium.)  Crucially, we assume this despite the fact that we might not even know all the relevant degrees of freedom.

Why is this assumption justified?  “Because experiment bears it out,” the physics teacher explains—but we can do better.  The assumption is justified because, as long as the degrees of freedom that we’re talking about all interact with each other, they’ve already had plenty of time to equilibrate.  And conversely, if a degree of freedom doesn’t interact with the stuff we’re observing—or with anything that interacts with the stuff we’re observing, etc.—well then, who cares about it anyway?

But now, because the microscopic laws of physics have the fundamental property of reversibility—that is, they never destroy information—a new bit has to go somewhere, and it can’t overwrite degrees of freedom that are already fully randomized.  So this is essentially why, if you pump more bits of information into a tub of hot water, while keeping it at the same volume, physicists are confident that the new bits will have nowhere to go except into pushing up the energy—which, in this case, means describing a greater range of possible speeds for the water molecules.  So then the molecules move faster on average, and your finger feels the water get hotter.
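As a sanity check on that claim, the relevant arithmetic is basically one line: by dE = T dS, each extra bit crammed into a system held at temperature T costs about k_B T ln 2 of energy, the same quantity that shows up in Landauer’s principle.  (The 10^25 below is just a made-up number of extra bits for illustration.)

```python
import math

k_B = 1.380649e-23        # Boltzmann constant, J/K
T = 310.0                 # a tub of hot water, roughly 37 C, in kelvins

energy_per_bit = k_B * T * math.log(2)     # from dE = T dS, with dS = k_B ln 2 per bit
print(f"{energy_per_bit:.2e} J per bit")   # ~3e-21 J

# So pumping in, say, 10^25 extra bits at fixed volume means adding ~3e4 J,
# which has to show up as faster-moving molecules, i.e. hotter water.
extra_bits = 1e25
print(f"{extra_bits * energy_per_bit:.1e} J of extra energy")
```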


In summary, our laws of physics are structured in such a way that even pure information often has “nowhere to hide”: if the bits are there at all in the abstract machinery of the world, then they’re forced to pipe up and have a measurable effect.  And this is not a tautology, but comes about only because of nontrivial facts about special and general relativity, quantum mechanics, quantum field theory, and thermodynamics.  And this is what I think people should mean when they say “information is physical.”

Anyway, if this was all obvious to you, I apologize for having wasted your time!  But in my defense, it was never explained to me quite this way, nor was it sorted out in my head until recently—even though it seems like one of the most basic and general things one can possibly say about physics.


Endnotes. Thanks again to Daniel Harlow, not only for explaining the logic of the holographic bound to me but for several suggestions that improved this post.

Some readers might suspect circularity in the arguments we’ve made: are we merely saying that “any information that has observable physical consequences, has observable physical consequences”?  No, it’s more than that.  In all the examples I discussed, the magic was that we inserted certain information into our abstract mathematical description of the world, taking no care to ensure that the information’s presence would have any observable consequences whatsoever.  But then the principles of quantum mechanics, quantum gravity, or thermodynamics forced the information to be detectable in very specific ways (namely, via the destruction of quantum interference, the warping of spacetime, or the generation of heat respectively).

Three things

July 17th, 2017

I was shocked and horrified to learn of the loss of Maryam Mirzakhani at age 40, after a battle with cancer (see here or here).  Mirzakhani was a renowned mathematician at Stanford and the world’s first and so far only female Fields Medalist.  I never had the privilege of meeting her, but everything I’ve read about her fills me with admiration.  I wish to offer condolences to her friends and family, including her husband Jan Vondrák, also a professor at Stanford and a member of the CS theory community.


In other depressing news, discussion continues to rage on social media about “The Uninhabitable Earth,” the New York magazine article by David Wallace-Wells arguing that the dangers of climate change have been systematically understated even by climate scientists; that sea level rise is the least of the problems; and that if we stay the current course, much of the earth’s landmass has a good chance of being uninhabitable by the year 2100.  In an unusual turn of events, the Wallace-Wells piece has been getting slammed by climate scientists, including Michael Mann (see here and also this interview)—people who are usually in the news to refute the claims of deniers.

Some of the critics’ arguments seem cogent to me: for example, that Wallace-Wells misunderstood some satellite data, and more broadly, that the piece misleadingly presents its scenario as overwhelmingly probable by 2100 if we do nothing, rather than as “only” 10% likely or whatever—i.e., a mere Trump-becoming-president level of risk.  Other objections to the article impressed me less: for example, that doom-and-gloom is a bad way to motivate people about climate change; that the masses need a more optimistic takeaway.  That obviously has no bearing on the truth of what’s going to happen—but even if we did agree to entertain such arguments, well, it’s not as if mainstream messaging on climate change has been an unmitigated success.  What if everyone should be sweating-in-the-night terrified?

As far as I understand it, the question of the plausibility of Wallace-Wells’s catastrophe scenario mostly just comes down to a single scientific unknown: namely, will the melting permafrost belch huge amounts of methane into the atmosphere?  If it does, then “Armageddon” is probably a fair description of what awaits us in the next century, and if not, not.  Alas, our understanding of permafrost doesn’t seem especially reliable, and it strikes me that models of such feedbacks have a long history of erring on the side of conservatism (for example, researchers were astonished by how quickly glaciers and ice shelves fell apart).

So, while I wish the article was written with more caveats, I submit that runaway warming scenarios deserve more attention rather than less.  And we should be putting discussion of those scenarios in exactly the broader context that Wallace-Wells does: namely, that of the Permian-Triassic extinction event, the Fermi paradox, and the conditions for a technological civilization to survive past its infancy.

Certainly we spend much more time on risks to civilization (e.g., nuclear terrorism, bioengineered pandemics) that strike me as less probable than this one.  And certainly this tail, in the distribution of possible outcomes, deserves at least as much attention as its more popular opposite, the tail where climate change turns out not to be much of a problem at all.  For the grim truth about climate change is that history won’t end in 2100: only the projections do.  And the mere addition of 50 more years could easily suffice to turn a tail risk into a body risk.

Of course, that the worst will happen is a clear prediction of reverse Hollywoodism theory—besides being the “natural, default” prediction for a computer scientist used to worst-case analysis.  This is one prediction that I hope turns out to be as wrong as possible.


OK, now for something to cheer us all up.  Yesterday the group of Misha Lukin, at Harvard, put a paper on the arXiv reporting the creation of a 51-qubit quantum simulator using cold atoms.  The paper doesn’t directly address the question of quantum supremacy, or indeed of performance comparisons between the new device and classical simulations at all.  But this is clearly a big step forward, while the world waits for the fully-programmable 50-qubit superconducting QCs that have been promised by the groups at Google and IBM.

Indeed, this strikes me as the most exciting news in experimental quantum information since last month, when Jian-Wei Pan’s group in Shanghai reported the first transmission of entangled photons from a satellite to earth—thereby allowing violations of the Bell inequality over 1200 kilometers, teleportation of a qubit from earth to space, and other major firsts.  These are breakthroughs that we knew were in the works ever since the Chinese government launched the QUESS satellite devoted to quantum communications.  I should’ve blogged about them in June.  Then again, regular readers of Shtetl-Optimized, familiar as they already are with the universal reach of quantum mechanics and with the general state of quantum information technology, shouldn’t find anything here that fundamentally surprises them, should they?

Amsterdam art museums plagiarizing my blog?

July 12th, 2017

This past week I had the pleasure of attending COLT (Conference on Learning Theory) 2017 in Amsterdam, and of giving an invited talk on “PAC-Learning and Reconstruction of Quantum States.”  You can see the PowerPoint slides here; videos were also made, but don’t seem to be available yet.

This was my first COLT, but almost certainly not the last.  I learned lots of cool new tidbits, from the expressive power of small-depth neural networks, to a modern theoretical computer science definition of “non-discriminatory” (namely, your learning algorithm’s output should be independent of protected categories like race, sex, etc. after conditioning on the truth you’re trying to predict), to the inapproximability of VC dimension (assuming the Exponential Time Hypothesis).  You can see the full schedule here.  Thanks so much to the PC chairs, Ohad Shamir and Satyen Kale, for inviting me and for putting on a great conference.

And one more thing: I’m not normally big on art museums, but Amsterdam turns out to have two in close proximity to each other—the Rijksmuseum and the Stedelijk—each containing something that Shtetl-Optimized readers might recognize.

Photo credits: Ronald de Wolf and Marijn Heule.

ITCS’2018

July 5th, 2017

My friend Anna Karlin, who chairs the ITCS program committee this year, asked me to post the following announcement, and I’m happy to oblige her.  I’ve enjoyed ITCS every time I’ve attended, and was even involved in the statement that led to ITCS’s creation, although I don’t take direct responsibility for the content of this ad. –SA


The ITCS 2018 Call For Papers is now available!

ITCS is a conference that stands apart from all others. For a decade now, it has been celebrating the vibrancy and unity of our field of Theoretical Computer Science. See this blog post for a detailed discussion of what makes ITCS so cool and the brief description of ITCS’17 at the end of this post.

ITCS seeks to promote research that carries a strong conceptual message  (e.g., introducing a new concept, model or understanding, opening a new line of inquiry within traditional or interdisciplinary areas, introducing new mathematical techniques and methodologies, or new applications of known techniques). ITCS welcomes both conceptual and technical contributions whose contents will advance and inspire the greater theory community.

This year, ITCS will be held at MIT in Cambridge, MA from January 11-14, 2018.

The submission deadline is September 8, 2017, with notification of decisions by October 30, 2017.

 Authors should strive to make their papers accessible not only to experts in their subarea, but also to the theory community at large. The committee will place a premium on writing that conveys clearly and in the simplest possible way what the paper is accomplishing.

Ten-page versions of accepted papers will be published in an electronic proceedings of the conference. However, the alternative of publishing a one page abstract with a link to a full PDF will also be available (to accommodate subsequent publication in journals that would not consider results that have been published in preliminary form in a conference proceedings).

You can find all the details in the official Call For Papers.

 On last year’s ITCS (by the PC Chair Christos Papadimitriou)

 This past ITCS (2017) was by all accounts the most successful ever.  We had 170+ submissions and 61 papers, including 5 “invited papers”, and 90+ registrants, all new records.  There was a voluntary poster session for authors to get a chance to go through more detail, and the famous Graduating Bits event, where the younger ones get their 5 minutes to show off their accomplishment and personality.

The spirit of the conference was invigorating, heartwarming, and great fun.  I believe none of the twelve sessions had fewer than 70 attendees — no parallelism, of course — while the now famous last session was among the best attended and went one hour overtime due to the excitement of discussion (compare with the last large conference that you attended).

Alex Halderman testifying before the Senate Intelligence Committee

June 21st, 2017

This morning, my childhood best friend Alex Halderman testified before the US Senate about the proven ease of hacking electronic voting machines without leaving any record, the certainty that Russia has the technical capability to hack American elections, and the urgency of three commonsense (and cheap) countermeasures:

  1. a paper trail for every vote cast in every state,
  2. routine statistical sampling of the paper trail—enough to determine whether large-scale tampering occurred (a quick back-of-envelope on how little sampling that takes appears just after this list), and
  3. cybersecurity audits to instill general best practices (such as firewalling election systems).
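Here’s the promised back-of-envelope on item 2—my own toy calculation, not anything from Alex’s testimony: if tampering had to touch at least a fraction f of the paper ballots to change an outcome, then examining roughly 3/f randomly-chosen ballots suffices to catch it with ~95% confidence, no matter how many millions of ballots were cast.

```python
import math

def sample_size(tampered_fraction, confidence=0.95):
    # Probability of missing every tampered ballot after n uniform random draws
    # (with replacement, a conservative approximation) is (1 - f)^n.
    # Solve (1 - f)^n <= 1 - confidence for n.
    return math.ceil(math.log(1 - confidence) / math.log(1 - tampered_fraction))

for f in [0.05, 0.01, 0.005]:
    print(f"to detect tampering of {f:.1%} of ballots: sample ~{sample_size(f)} ballots")
# ~59, ~299, ~598 -- independent of the total number of ballots cast.
```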

You can watch Alex on C-SPAN here—his testimony begins at 2:16:13, and is followed by the Q&A period.  You can also read Alex’s prepared testimony here, as well as his accompanying Washington Post editorial (joint with Justin Talbot-Zorn).

Alex’s testimony—its civic, nonpartisan nature, right down to Alex’s flourish of approvingly quoting President Trump in support of paper ballots—reflects a moving optimism that, even in these dark times for democracy, Congress can be prodded into doing the right thing merely because it’s clearly, overwhelmingly in the national interest.  I wish I could say I shared that optimism.  Nevertheless, when called to testify, what can one do but act on the assumption that such optimism is justified?  Here’s hoping that Alex’s urgent message is heard and acted on.

Higher-level causation exists (but I wish it didn’t)

June 4th, 2017

Unrelated Update (June 6): It looks like the issues we’ve had with commenting have finally been fixed! Thanks so much to Christie Wright and others at WordPress Concierge Services for handling this. Let me know if you still have problems. In the meantime, I also stopped asking for commenters’ email addresses (many commenters filled that field with nonsense anyway).  Oops, that ended up being a terrible idea, because it made commenting impossible!  Back to how it was before.


Update (June 5): Erik Hoel was kind enough to write a 5-page response to this post (Word .docx format), and to give me permission to share it here.  I might respond to various parts of it later.  For now, though, I’ll simply say that I stand by what I wrote, and that requiring the macro-distribution to arise by marginalizing the micro-distribution still seems like the correct choice to me (and is what’s assumed in, e.g., the proof of the data processing inequality).  But I invite readers to read my post along with Erik’s response, form their own opinions, and share them in the comments section.


This past Thursday, Natalie Wolchover—a math/science writer whose work has typically been outstanding—published a piece in Quanta magazine entitled “A Theory of Reality as More Than the Sum of Its Parts.”  The piece deals with recent work by Erik Hoel and his collaborators, including Giulio Tononi (Hoel’s adviser, and the founder of integrated information theory, previously critiqued on this blog).  Commenter Jim Cross asked me to expand on my thoughts about causal emergence in a blog post, so: your post, monsieur.

In their new work, Hoel and others claim to make the amazing discovery that scientific reductionism is false—or, more precisely, that there can exist “causal information” in macroscopic systems, information relevant for predicting the systems’ future behavior, that’s not reducible to causal information about the systems’ microscopic building blocks.  For more about what we’ll be discussing, see Hoel’s FQXi essay “Agent Above, Atom Below,” or better yet, his paper in Entropy, When the Map Is Better Than the Territory.  Here’s the abstract of the Entropy paper:

The causal structure of any system can be analyzed at a multitude of spatial and temporal scales. It has long been thought that while higher scale (macro) descriptions may be useful to observers, they are at best a compressed description and at worse leave out critical information and causal relationships. However, recent research applying information theory to causal analysis has shown that the causal structure of some systems can actually come into focus and be more informative at a macroscale. That is, a macroscale description of a system (a map) can be more informative than a fully detailed microscale description of the system (the territory). This has been called “causal emergence.” While causal emergence may at first seem counterintuitive, this paper grounds the phenomenon in a classic concept from information theory: Shannon’s discovery of the channel capacity. I argue that systems have a particular causal capacity, and that different descriptions of those systems take advantage of that capacity to various degrees. For some systems, only macroscale descriptions use the full causal capacity. These macroscales can either be coarse-grains, or may leave variables and states out of the model (exogenous, or “black boxed”) in various ways, which can improve the efficacy and informativeness via the same mathematical principles of how error-correcting codes take advantage of an information channel’s capacity. The causal capacity of a system can approach the channel capacity as more and different kinds of macroscales are considered. Ultimately, this provides a general framework for understanding how the causal structure of some systems cannot be fully captured by even the most detailed microscale description.

Anyway, Wolchover’s popular article quoted various researchers praising the theory of causal emergence, as well as a single inexplicably curmudgeonly skeptic—some guy who sounded like he was so off his game (or maybe just bored with debates about ‘reductionism’ versus ’emergence’?), that he couldn’t even be bothered to engage the details of what he was supposed to be commenting on.

Hoel’s ideas do not impress Scott Aaronson, a theoretical computer scientist at the University of Texas, Austin. He says causal emergence isn’t radical in its basic premise. After reading Hoel’s recent essay for the Foundational Questions Institute, “Agent Above, Atom Below” (the one that featured Romeo and Juliet), Aaronson said, “It was hard for me to find anything in the essay that the world’s most orthodox reductionist would disagree with. Yes, of course you want to pass to higher abstraction layers in order to make predictions, and to tell causal stories that are predictively useful — and the essay explains some of the reasons why.”

After the Quanta piece came out, Sean Carroll tweeted approvingly about the above paragraph, calling me a “voice of reason [yes, Sean; have I ever not been?], slapping down the idea that emergent higher levels have spooky causal powers.”  Then Sean, in turn, was criticized for that remark by Hoel and others.

Hoel in particular raised a reasonable-sounding question.  Namely, in my “curmudgeon paragraph” from Wolchover’s article, I claimed that the notion of “causal emergence,” or causality at the macro-scale, says nothing fundamentally new.  Instead it simply reiterates the usual worldview of science, according to which

  1. the universe is ultimately made of quantum fields evolving by some Hamiltonian, but
  2. if someone asks (say) “why has air travel in the US gotten so terrible?”, a useful answer is going to talk about politics or psychology or economics or history rather than the movements of quarks and leptons.

But then, Hoel asks, if there’s nothing here for the world’s most orthodox reductionist to disagree with, then how do we find Carroll and other reductionists … err, disagreeing?

I think this dilemma is actually not hard to resolve.  Faced with a claim about “causation at higher levels,” what reductionists disagree with is not the object-level claim that such causation exists (I scratched my nose because it itched, not because of the Standard Model of elementary particles).  Rather, they disagree with the meta-level claim that there’s anything shocking about such causation, anything that poses a special difficulty for the reductionist worldview that physics has held for centuries.  I.e., they consider it true both that

  1. my nose is made of subatomic particles, and its behavior is in principle fully determined (at least probabilistically) by the quantum state of those particles together with the laws governing them, and
  2. my nose itched.

At least if we leave the hard problem of consciousness out of it—that’s a separate debate—there seems to be no reason to imagine a contradiction between 1 and 2 that needs to be resolved, but “only” a vast network of intervening mechanisms to be elucidated.  So, this is how it is that reductionists can find anti-reductionist claims to be both wrong and vacuously correct at the same time.

(Incidentally, yes, quantum entanglement provides an obvious sense in which “the whole is more than the sum of its parts,” but even in quantum mechanics, the whole isn’t more than the density matrix, which is still a huge array of numbers evolving by an equation, just different numbers than one would’ve thought a priori.  For that reason, it’s not obvious what relevance, if any, QM has to reductionism versus anti-reductionism.  In any case, QM is not what Hoel invokes in his causal emergence theory.)

From reading the philosophical parts of Hoel’s papers, it was clear to me that some remarks like the above might help ward off the forehead-banging confusions that these discussions inevitably provoke.  So standard-issue crustiness is what I offered Natalie Wolchover when she asked me, not having time on short notice to go through the technical arguments.

But of course this still leaves the question: what is in the mathematical part of Hoel’s Entropy paper?  What exactly is it that the advocates of causal emergence claim provides a new argument against reductionism?


To answer that question, yesterday I (finally) read the Entropy paper all the way through.

Much like Tononi’s integrated information theory was built around a numerical measure called Φ, causal emergence is built around a different numerical quantity, this one supposed to measure the amount of “causal information” at a particular scale.  The measure is called effective information or EI, and it’s basically the mutual information between a system’s initial state s_I and its final state s_F, assuming a uniform distribution over s_I.  Much like with Φ in IIT, computations of this EI are then used as the basis for wide-ranging philosophical claims—even though EI, like Φ, has aspects that could be criticized as arbitrary, and as not obviously connected with what we’re trying to understand.

Once again like with Φ, one of those assumptions is that of a uniform distribution over one of the variables, s_I, whose relatedness we’re trying to measure.  In my IIT post, I remarked on that assumption, but I didn’t harp on it, since I didn’t see that it did serious harm, and in any case my central objection to Φ would hold regardless of which distribution we chose.  With causal emergence, by contrast, this uniformity assumption turns out to be the key to everything.

For here is the argument from the Entropy paper, for the existence of macroscopic causality that’s not reducible to causality in the underlying components.  Suppose I have a system with 8 possible states (called “microstates”), which I label 1 through 8.  And suppose the system evolves as follows: if it starts out in states 1 through 7, then it goes to state 1.  If, on the other hand, it starts in state 8, then it stays in state 8.  In such a case, it seems reasonable to “coarse-grain” the system, by lumping together initial states 1 through 7 into a single “macrostate,” call it A, and letting the initial state 8 comprise a second macrostate, call it B.

We now ask: how much information does knowing the system’s initial state tell you about its final state?  If we’re talking about microstates, and we let the system start out in a uniform distribution over microstates 1 through 8, then 7/8 of the time the system goes to state 1.  So there’s just not much information about the final state to be predicted—specifically, only 7/8×log2(8/7) + 1/8×log2(8) ≈ 0.54 bits of entropy—which, in this case, is also the mutual information between the initial and final microstates.  If, on the other hand, we’re talking about macrostates, and we let the system start in a uniform distribution over macrostates A and B, then A goes to A and B goes to B.  So knowing the initial macrostate gives us 1 full bit of information about the final state, which is more than the ~0.54 bits that looking at the microstate gave us!  Ergo reductionism is false.

Once the argument is spelled out, it’s clear that the entire thing boils down to, how shall I put this, a normalization issue.  That is: we insist on the uniform distribution over microstates when calculating microscopic EI, and we also insist on the uniform distribution over macrostates when calculating macroscopic EI, and we ignore the fact that the uniform distribution over microstates gives rise to a non-uniform distribution over macrostates, because some macrostates can be formed in more ways than others.  If we fixed this, demanding that the two distributions be compatible with each other, we’d immediately find that, surprise, knowing the complete initial microstate of a system always gives you at least as much power to predict the system’s future as knowing a macroscopic approximation to that state.  (How could it not?  For given the microstate, we could in principle compute the macroscopic approximation for ourselves, but not vice versa.)
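Since the whole dispute turns on a computation you can do in a few lines, here it is in Python—my own sketch of the calculation as I’ve described it above, not code from the paper: EI at the microscale, EI at the macroscale under the uniform-over-macrostates convention, and EI at the macroscale using the distribution you actually get by coarse-graining the uniform micro-distribution.

```python
import math
from collections import defaultdict

def mutual_information(input_dist, transition):
    """I(s_I; s_F) for a deterministic transition map and a given input distribution."""
    output_dist = defaultdict(float)
    for state, p in input_dist.items():
        output_dist[transition[state]] += p
    # Deterministic dynamics: I(s_I; s_F) = H(s_F).
    return -sum(p * math.log2(p) for p in output_dist.values() if p > 0)

# Microscale: states 1..8; states 1 through 7 all go to 1, state 8 stays at 8.
micro_transition = {s: (1 if s <= 7 else 8) for s in range(1, 9)}
micro_uniform = {s: 1 / 8 for s in range(1, 9)}

# Macroscale: A = {1,...,7}, B = {8}; A goes to A, B goes to B.
macro_transition = {"A": "A", "B": "B"}
macro_uniform = {"A": 1 / 2, "B": 1 / 2}          # the uniform-over-macrostates convention
macro_marginal = {"A": 7 / 8, "B": 1 / 8}         # what uniform-over-microstates induces

print(mutual_information(micro_uniform, micro_transition))   # ~0.544 bits
print(mutual_information(macro_uniform, macro_transition))   # 1.0 bit  ("causal emergence")
print(mutual_information(macro_marginal, macro_transition))  # ~0.544 bits: no emergence left
```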

The closest the paper comes to acknowledging the problem—i.e., that it’s all just a normalization trick—seems to be the following paragraph in the discussion section:

Another possible objection to causal emergence is that it is not natural but rather enforced upon a system via an experimenter’s application of an intervention distribution, that is, from using macro-interventions.  For formalization purposes, it is the experimenter who is the source of the intervention distribution, which reveals a causal structure that already exists.  Additionally, nature itself may intervene upon a system with statistical regularities, just like an intervention distribution.  Some of these naturally occurring input distributions may have a viable interpretation as a macroscale causal model (such as being equal to Hmax [the maximum entropy] at some particular macroscale).  In this sense, some systems may function over their inputs and outputs at a microscale or macroscale, depending on their own causal capacity and the probability distribution of some natural source of driving input.

As far as I understand it, this paragraph is saying that, for all we know, something could give rise to a uniform distribution over macrostates, so therefore that’s a valid thing to look at, even if it’s not what we get by taking a uniform distribution over microstates and then coarse-graining it.  Well, OK, but unknown interventions could give rise to many other distributions over macrostates as well.  In any case, if we’re directly comparing causal information at the microscale against causal information at the macroscale, it still seems reasonable to me to demand that in the comparison, the macro-distribution arise by coarse-graining the micro one.  But in that case, the entire argument collapses.


Despite everything I said above, the real purpose of this post is to announce that I’ve changed my mind.  I now believe that, while Hoel’s argument might be unsatisfactory, the conclusion is fundamentally correct: scientific reductionism is false.  There is higher-level causation in our universe, and it’s 100% genuine, not just a verbal sleight-of-hand.  In particular, there are causal forces that can only be understood in terms of human desires and goals, and not in terms of subatomic particles blindly bouncing around.

So what caused such a dramatic conversion?

By 2015, after decades of research and diplomacy and activism and struggle, 196 nations had finally agreed to limit their carbon dioxide emissions—every nation on earth besides Syria and Nicaragua, and Nicaragua only because it thought the agreement didn’t go far enough.  The human race had thereby started to carve out some sort of future for itself, one in which the oceans might rise slowly enough that we could adapt, and maybe buy enough time until new technologies were invented that changed the outlook.  Of course the Paris agreement fell far short of what was needed, but it was a start, something to build on in the coming decades.  Even in the US, long the hotbed of intransigence and denial on this issue, 69% of the public supported joining the Paris agreement, compared to a mere 13% who opposed.  Clean energy was getting cheaper by the year.  Most of the US’s largest corporations, including Google, Microsoft, Apple, Intel, Mars, PG&E, and ExxonMobil—ExxonMobil, for godsakes—vocally supported staying in the agreement and working to cut their own carbon footprints.  All in all, there was reason to be cautiously optimistic that children born today wouldn’t live to curse their parents for having brought them into a world so close to collapse.

In order to unravel all this, in order to steer the heavy ship of destiny off the path toward averting the crisis and toward the path of existential despair, a huge number of unlikely events would need to happen in succession, as if propelled by some evil supernatural force.

Like what?  I dunno, maybe a fascist demagogue would take over the United States on a campaign based on willful cruelty, on digging up and burning dirty fuels just because and even if it made zero economic sense, just for the fun of sticking it to liberals, or because of the urgent need to save the US coal industry, which employs fewer people than Arby’s.  Such a demagogue would have no chance of getting elected, you say?

So let’s suppose he’s up against a historically unpopular opponent.  Let’s suppose that even then, he still loses the popular vote, but somehow ekes out an Electoral College win.  Maybe he gets crucial help in winning the election from a hostile foreign power—and for some reason, pro-American nationalists are totally OK with that, even cheer it.  Even then, we’d still probably need a string of additional absurd coincidences.  Like, I dunno, maybe the fascist’s opponent has an aide who used to be married to a guy who likes sending lewd photos to minors, and investigating that guy leads the FBI to some emails that ultimately turn out to mean nothing whatsoever, but that the media hyperventilate about precisely in time to cause just enough people to vote to bring the fascist to power, thereby bringing about the end of the world.  Something like that.

It’s kind of like, you know that thing where the small population in Europe that produced Einstein and von Neumann and Erdös and Ulam and Tarski and von Karman and Polya was systematically exterminated (along with millions of other innocents) soon after it started producing such people, and the world still hasn’t fully recovered?  How many things needed to go wrong for that to happen?  Obviously you needed Hitler to be born, and to survive the trenches and assassination plots; and Hindenburg to make the fateful decision to give Hitler power.  But beyond that, the world had to sleep as Germany rebuilt its military; every last country had to turn away refugees; the UK had to shut down Jewish immigration to Palestine at exactly the right time; newspapers had to bury the story; government record-keeping had to have advanced just to the point that rounding up millions for mass murder was (barely) logistically possible; and finally, the war had to continue long enough for nearly every European country to have just enough time to ship its Jews to their deaths, before the Allies showed up to liberate mostly the ashes.

In my view, these simply aren’t the sort of outcomes that you expect from atoms blindly interacting according to the laws of physics.  These are, instead, the signatures of higher-level causation—and specifically, of a teleological force that operates in our universe to make it distinctively cruel and horrible.

Admittedly, I don’t claim to know the exact mechanism of the higher-level causation.  Maybe, as the physicist Yakir Aharonov has advocated, our universe has not only a special, low-entropy initial state at the Big Bang, but also a “postselected final state,” toward which the outcomes of quantum measurements get mysteriously “pulled”—an effect that might show up in experiments as ever-so-slight deviations from the Born rule.  And because of the postselected final state, even if the human race naïvely had only (say) a one-in-a-thousand chance of killing itself off, even if the paths to its destruction all involved some improbable absurdity, like an orange clown showing up from nowhere—nevertheless, the orange clown would show up.  Alternatively, maybe the higher-level causation unfolds through subtle correlations in the universe’s initial state, along the lines I sketched in my 2013 essay The Ghost in the Quantum Turing Machine.  Or maybe Erik Hoel is right after all, and it all comes down to normalization: if we looked at the uniform distribution over macrostates rather than over microstates, we’d discover that orange clowns destroying the world predominated.  Whatever the details, though, I think it can no longer be doubted that we live, not in the coldly impersonal universe that physics posited for centuries, but instead in a tragicomically evil one.

I call my theory reverse Hollywoodism, because it holds that the real world has the inverse of the typical Hollywood movie’s narrative arc.  Again and again, what we observe is that the forces of good have every possible advantage, from money to knowledge to overwhelming numerical superiority.  Yet somehow good still fumbles.  Somehow a string of improbable coincidences, or a black swan or an orange Hitler, show up at the last moment to let horribleness eke out a last-minute victory, as if the world itself had been rooting for horribleness all along.  That’s our universe.

I’m fine if you don’t believe this theory: maybe you’re congenitally more optimistic than I am (in which case, more power to you); maybe the full weight of our universe’s freakish awfulness doesn’t bear down on you as it does on me.  But I hope you’ll concede that, if nothing else, this theory is a genuinely non-reductionist one.

The Social Justice Warriors are right

May 29th, 2017

As you might know, I haven’t been exactly the world’s most consistent fan of the Social Justice movement, nor has it been the most consistent fan of me.

I cringe when I read about yet another conservative college lecture shut down by mob violence; or student protesters demanding the firing of a professor for trying gently to argue and reason with them; or an editor forced from his position for writing a (progressive) defense of “cultural appropriation”—a practice that I take to have been ubiquitous for all of recorded history, and without which there wouldn’t be any culture at all.  I cringe not only because I know that I was in the crosshairs once before and could easily be again, but also because, it seems to me, the Social Justice scalp-hunters are so astoundingly oblivious to the misdirection of their energies, to the power of their message for losing elections and neutering the progressive cause, to the massive gift their every absurdity provides to the world’s Fox Newses and Breitbarts and Trumps.

Yet there’s at least one issue where it seems to me that the Social Justice Warriors are 100% right, and their opponents 100% wrong. This is the moral imperative to take down every monument to Confederate “war heroes,” and to rename every street and school and college named after individuals whose primary contribution to the world was to defend chattel slavery.  As a now-Southerner, I have a greater personal stake here than I did before: UT Austin just recently removed its statue of Jefferson Davis, while keeping up its statue of Robert E. Lee.  My kids will likely attend what until very recently was called Robert E. Lee Elementary—this summer renamed Russell Lee Elementary.  (My suggestion, that the school be called T. D. Lee Parity Violation Elementary, was sadly never considered.)

So I was gratified that last week, New Orleans finally took down its monuments to slavers.  Mayor Mitch Landrieu’s speech, setting out the reasons for the removal, is worth reading.

I used to have little patience for “merely symbolic” issues: would that offensive statues and flags were the worst problems!  But it now seems to me that the fight over Confederate symbols is just a thinly-veiled proxy for the biggest moral question that’s faced the United States through its history, and also the most urgent question facing it in 2017.  Namely: Did the Union actually win the Civil War? Were the anti-Enlightenment forces—the slavers, the worshippers of blood and land and race and hierarchy—truly defeated? Do those forces acknowledge the finality and the rightness of their defeat?

For those who say that, sure, slavery was bad and all, but we need to keep statues to slavers up so as not to “erase history,” we need only change the example. Would we similarly defend statues of Hitler, Himmler, and Goebbels, looming over Berlin in heroic poses?  Yes, let Germans reflect somberly and often on this aspect of their heritage—but not by hoisting a swastika over City Hall.

For those who say the Civil War wasn’t “really” about slavery, I reply: this is the canonical example of a “Mount Stupid” belief, the sort of thing you can say only if you’ve learned enough to be wrong but not enough to be unwrong.  In 1861, the Confederate ringleaders themselves loudly proclaimed to future generations that, indeed, their desire to preserve slavery was their overriding reason to secede. Here’s CSA Vice-President Alexander Stephens, in his famous Cornerstone Speech:

Our new government is founded upon exactly the opposite ideas; its foundations are laid, its cornerstone rests, upon the great truth that the negro is not equal to the white man; that slavery, subordination to the superior race, is his natural and normal condition. This, our new government, is the first, in the history of the world, based upon this great physical, philosophical, and moral truth.

Here’s Texas’ Declaration of Secession:

We hold as undeniable truths that the governments of the various States, and of the confederacy itself, were established exclusively by the white race, for themselves and their posterity; that the African race had no agency in their establishment; that they were rightfully held and regarded as an inferior and dependent race, and in that condition only could their existence in this country be rendered beneficial or tolerable. That in this free government all white men are and of right ought to be entitled to equal civil and political rights; that the servitude of the African race, as existing in these States, is mutually beneficial to both bond and free, and is abundantly authorized and justified by the experience of mankind, and the revealed will of the Almighty Creator, as recognized by all Christian nations; while the destruction of the existing relations between the two races, as advocated by our sectional enemies, would bring inevitable calamities upon both and desolation upon the fifteen slave-holding states.

It was only when defeat looked inevitable that the slavers started changing their story, claiming that their real grievance was never about slavery per se, but only “states’ rights” (states’ right to do what, exactly?). So again, why should we take the slavers’ rationalizations any more seriously than we take the postwar epiphanies of jailed Nazis that actually, they’d never felt any personal animus toward Jews, that the Final Solution was just the world’s biggest bureaucratic mishap?  Of course there’s a difference: when the Allies occupied Germany, they insisted on de-Nazification.  They didn’t suffer streets to be named after Hitler. And today, incredibly, fascism and white nationalism are greater threats here in the US than they are in Germany.  One reads about the historic irony of some American Jews, who are eligible for German citizenship because of grandparents expelled from there, now seeking to move there because they’re terrified about Trump.

By contrast, after a brief Reconstruction, the United States lost its will to continue de-Confederatizing the South.  The leaders were left free to write book after book whitewashing their cause, even to hold political office again.  And probably not by coincidence, we then got nearly a hundred years of Jim Crow—and still today, a half-century after the civil rights movement, southern governors and legislatures that do everything in their power to disenfranchise black voters.

For those who ask: but wasn’t Robert E. Lee a great general who was admired by millions? Didn’t he fight bravely for a cause he believed in?  Maybe it’s just me, but I’m allergic to granting undue respect to history’s villains just because they managed to amass power and get others to go along with them.  I remember reading once in some magazine that, yes, Genghis Khan might have raped thousands and murdered millions, but since DNA tests suggest that ~1% of humanity is now descended from him, we should also celebrate Khan’s positive contribution to “peopling the world.” Likewise, Hegel and Marx and Freud and Heidegger might have been wrong in nearly everything they said, sometimes with horrific consequences, but their ideas still need to be studied reverently, because of the number of other intellectuals who took them seriously.  As I reject those special pleas, so I reject the analogous ones for Jefferson Davis, Alexander Stephens, and Robert E. Lee, who as far as I can tell, should all (along with the rest of the Confederate leadership) have been sentenced for treason.

This has nothing to do with judging the past by standards of the present. By all means, build statues to Washington and Jefferson even though they held slaves, to Lincoln even though he called blacks inferior even while he freed them, to Churchill even though he fought the independence of India.  But don’t look for moral complexity where there isn’t any.  Don’t celebrate people who were terrible even for their own time, whose public life was devoted entirely to what we now know to be evil.

And if, after the last Confederate general comes down, the public spaces are too empty, fill them with monuments to Alan Turing, Marian Rejewski, Bertrand Russell, Hypatia of Alexandria, Emmy Noether, Lise Meitner, Mark Twain, Srinivasa Ramanujan, Frederick Douglass, Vasili Arkhipov, Stanislav Petrov, Raoul Wallenberg, even the inventors of saltwater taffy or Gatorade or the intermittent windshield wiper.  There are, I think, enough people who added value to the world to fill every city square and street sign.

Yet more errors in papers

May 24th, 2017

Following up on my posts PostBQP Postscripts and More Wrong Things I Said In Papers, it felt like time for another post in which I publicly flog myself for mistakes in my research papers.  [Warning: The rest of this post is kinda, sorta technical.  Read at your own risk.]


(1) In my 2006 paper “Oracles are subtle but not malicious,” I claimed to show that if PP is contained in BQP/qpoly, then the counting hierarchy collapses to QMA (Theorem 5).  But on further reflection, I only know how to show a collapse of the counting hierarchy under the stronger assumption that PP is in BQP/poly.  If PP is in BQP/qpoly, then certainly P^#P = PP = QMA, but I don’t know how to collapse any further levels of the counting hierarchy.  The issue is this: in QMA, we can indeed nondeterministically guess an (amplified) quantum advice state for a BQP/qpoly algorithm.  We can then verify that the advice state works to solve PP problems, by using (for example) the interactive protocol for the permanent, or some other #P-complete problem.  But having done that, how do we then unravel the higher levels of the counting hierarchy?  For example, how do we simulate PP^PP in PP^BQP = PP?  We don’t have any mechanism to pass the quantum advice up to the oracle PP machine, since queries to a PP oracle are by definition classical strings.  We could try to use tools from my later paper with Andy Drucker, passing a classical description of the quantum advice up to the oracle and then using the description to reconstruct the advice for ourselves.  But doing so just doesn’t seem to give us a complexity class that’s low for PP, which is what would be needed to unravel the counting hierarchy.  I still think this result might be recoverable, but a new idea is needed.
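To spell out where things now stand, here is my own two-line summary in LaTeX, just restating the paragraph above (CH denotes the counting hierarchy, whose levels are P, PP, PP^PP, and so on):

```latex
% Still provable (sketch): nondeterministically guess the quantum advice in QMA,
% then verify it against a #P-complete problem via the interactive protocol.
\mathsf{PP} \subseteq \mathsf{BQP}/\mathsf{qpoly}
  \;\Longrightarrow\;
  \mathsf{P}^{\#\mathsf{P}} = \mathsf{PP} = \mathsf{QMA}.

% The missing step for the next level of CH: a PP oracle machine accepts only
% classical queries, so there is no way to hand it the quantum advice.
\mathsf{PP}^{\mathsf{PP}}
  \;\stackrel{?}{\subseteq}\;
  \mathsf{PP}^{\mathsf{BQP}} = \mathsf{PP}.
```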


(2) In my 2008 algebrization paper with Avi Wigderson, one of the most surprising things we showed was a general connection between communication complexity lower bounds and algebraic query complexity lower bounds.  Specifically, given a Boolean oracle A:{0,1}^n→{0,1}, let ~A be a low-degree extension of A over a finite field F (that is, ~A(x)=A(x) whenever x∈{0,1}^n).  Then suppose we have an algorithm that’s able to learn some property of A, by making k black-box queries to ~A.  We observed that, in such a case, if Alice is given the top half of the truth table of A, and Bob is given the bottom half of the truth table, then there’s necessarily a communication protocol by which Alice and Bob can learn the same property of A, by exchanging at most O(kn log|F|) bits.  This connection is extremely model-independent: a randomized query algorithm gives rise to a randomized communication protocol, a quantum query algorithm gives rise to a quantum communication protocol, etc. etc.  The upshot is that, if you want to lower-bound the number of queries that an algorithm needs to make to the algebraic extension oracle ~A, in order to learn something about A, then it suffices to prove a suitable communication complexity lower bound.  And the latter, unlike algebraic query complexity, is a well-studied subject with countless results that one can take off the shelf.  We illustrated how one could use this connection to prove, for example, that there exists an oracle A such that NP^A ⊄ BQP^~A, for any low-degree extension ~A of A—a separation that we didn’t and don’t know how to prove any other way. Likewise, there exists an oracle B such that BQP^B ⊄ BPP^~B for any low-degree extension ~B of B.
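To make the O(kn log|F|) accounting concrete, here is a toy sketch of how a single query gets simulated.  This is my own illustration, not code from the paper, and it covers only the simplest case, where ~A is the multilinear extension of A over a prime field F_p and the query algorithm runs on Alice’s side: Alice sends the query point x ∈ F_p^n (n field elements), Bob replies with his half of the defining sum (one field element), and Alice adds in her own half locally.

```python
import itertools
import math
import random

# Toy parameters (for illustration only): a small prime field F_p and a random
# Boolean function A : {0,1}^n -> {0,1}, stored as a truth table.
p = 101
n = 4
A = {z: random.randint(0, 1) for z in itertools.product((0, 1), repeat=n)}

# Alice holds the half of the truth table with z_1 = 0, Bob the half with z_1 = 1.
alice = {z: v for z, v in A.items() if z[0] == 0}
bob = {z: v for z, v in A.items() if z[0] == 1}

def delta(z, x):
    """The multilinear polynomial over F_p that equals 1 at the Boolean point z
    and 0 at every other Boolean point, evaluated at an arbitrary x in F_p^n."""
    out = 1
    for zi, xi in zip(z, x):
        out = out * (xi if zi == 1 else (1 - xi)) % p
    return out

def partial_sum(table, x):
    """One party's local contribution to the multilinear extension ~A(x)."""
    return sum(v * delta(z, x) for z, v in table.items()) % p

def simulated_query(x):
    """Simulate one query to ~A(x): in the actual protocol Alice would send x
    (n field elements) and Bob would reply with partial_sum(bob, x) (one field
    element); here we just compute both sides and count the bits exchanged."""
    bits = (n + 1) * math.ceil(math.log2(p))
    value = (partial_sum(alice, x) + partial_sum(bob, x)) % p
    return value, bits

# Sanity check: on Boolean inputs, ~A agrees with A.
for z in A:
    assert simulated_query(z)[0] == A[z]

x = tuple(random.randrange(p) for _ in range(n))
value, bits = simulated_query(x)
print(f"~A({x}) = {value}, at a cost of {bits} bits of communication")
```

Each query thus costs (n+1)·⌈log2 p⌉ = O(n log|F|) bits, and, roughly speaking, the same accounting carries over to randomized and quantum query algorithms (with qubits in place of bits in the quantum case), which is why the connection is so model-independent.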

The trouble is, our “proof sketches” for these separations (in Theorem 5.11) are inadequate, even for “sketches.”  They can often be fixed, but only by appealing to special properties of the communication complexity separations in question, properties that don’t necessarily hold for an arbitrary communication separation between two arbitrary models.

The issue is this: while it’s true, as we claimed, that a communication complexity lower bound implies an algebraic query complexity lower bound, it’s not true in general that a communication complexity upper bound implies an algebraic query complexity upper bound!  So, from a communication separation between models C and D, we certainly obtain a query complexity problem that’s not in D^~A, but then the problem might not be in C^A.  What tripped us up was that, in the cases we had in mind (e.g. Disjointness), it’s obvious that the query problem is in C^A.  In other cases, however, such as Raz’s separation between quantum and randomized communication complexity, it probably isn’t even true.  In the latter case, to recover the desired conclusion about algebraic query complexity (namely, the existence of an oracle B such that BQP^B ⊄ BPP^~B), what seems to be needed is to start from a later quantum vs. classical communication complexity separation due to Klartag and Regev, and then convert their communication problem into a query problem using a recent approach by myself and Shalev Ben-David (see Section 4).  Unfortunately, my and Shalev’s approach only tells us nonconstructively that there exists a query problem with the desired separation, with no upper bound on the gate complexity of the quantum algorithm.  So strictly speaking, I still don’t know how to get a separation between the relativized complexity classes BQP^B and BPP^~B defined in terms of Turing machines.

In any case, I of course should have realized this issue with the algebrization paper the moment Shalev and I encountered the same issue when writing our later paper.  Let me acknowledge Shalev, as well as Robin Kothari, for helping to spur my realization of the issue.


In case it wasn’t clear, the mistakes I’ve detailed here have no effect on the main results of the papers in question (e.g., the existence of an oracle relative to which PP has linear-sized circuits; the existence and pervasiveness of the algebrization barrier).  The effect is entirely on various “bonus” results—results that, because they’re bonus, were gone over much less carefully by authors and reviewers alike.

Nevertheless, I’ve always felt like in science, the louder you are about your own mistakes, the better.  Hence this post.

Unsong of unsongs

May 20th, 2017

On Wednesday, Scott Alexander finally completed his sprawling serial novel Unsong, after a year and a half of weekly updates—incredibly, in his spare time while also working as a full-time resident in psychiatry, and also regularly updating Slate Star Codex, which I consider to be the world’s best blog.  I was honored to attend a party in Austin (mirroring parties in San Francisco, Boston, Tel Aviv, and elsewhere) to celebrate Alexander’s release of the last chapter—depending on your definition, possibly the first “fan event” I’ve ever attended.

Like many other nerds I’ve met, I’d been following Unsong almost since the beginning—with its mix of Talmudic erudition, CS humor, puns, and even a shout-out to Quantum Computing Since Democritus (which shows up as Ben Aharon’s Gematria Since Adam), how could I not be?  I now count Unsong as one of my favorite works of fiction, and Scott Alexander alongside Rebecca Newberger Goldstein among my favorite contemporary novelists.  The goal of this post is simply to prod readers of my blog who don’t yet know Unsong into reading it: if you’ve ever liked anything here on Shtetl-Optimized, then I predict you’ll like Unsong, and probably more.

[WARNING: SPOILERS FOLLOW]

Though not trivial to summarize, Unsong is about a world where the ideas of religion and mysticism—all of them, more or less, although with a special focus on kabbalistic Judaism—turn out to be true.  In 1968, the Apollo 8 mission leads not to an orbit of the Moon, as planned, but instead to cracking an invisible crystal sphere that had surrounded the Earth for millennia.  Down through the crack rush angels, devils, and other supernatural forces.  Life on Earth becomes increasingly strange: on the one hand, many technologies stop working; on the other, people can now gain magical powers by speaking various names of God.  A worldwide industry arises to discover new names of God by brute-force search through sequences of syllables.  And a powerful agency, the eponymous UNSONG (United Nations Subcommittee on Names of God), is formed to enforce kabbalistic copyright law, hunting down and punishing anyone who speaks divine names without paying licensing fees to the theonomic corporations.

As the story progresses, we learn that eons ago, there was an epic battle in Heaven between Good and Evil, and Evil had the upper hand.  But just as all seemed lost, an autistic angel named Uriel reprogrammed the universe to run on math and science rather than on God’s love, as a last-ditch strategy to prevent Satan’s forces from invading the sublunary realm.  Molecular biology, the clockwork regularity of physical laws, false evidence for a huge and mindless cosmos—all these were retconned into the world’s underpinnings.  Uriel did still need to be occasionally involved, but less as a loving god than as an overworked sysadmin: for example, he descended to Mount Sinai to warn humans never to boil goats in their mothers’ milk, because he discovered that doing so (like the other proscribed activities in the Torah, Uriel’s readme file) triggered bugs in the patchwork of code that was holding the universe together.  Now that the sky has cracked, Uriel is forced to issue increasingly desperate patches, and even those will only buy a few decades until his math-and-science-based world stops working entirely, with Satan again triumphant.

Anyway, that’s a tiny part of the setup.  Through 72 chapters and 22 interludes, there’s world-building and philosophical debates and long kabbalistic digressions.  There are battle sequences (the most striking involves the Lubavitcher Rebbe riding atop a divinely-animated Statue of Liberty like a golem).  There’s wordplay and inside jokes—holy of holies, are there those—including, notoriously, a sequence of cringe-inducing puns involving whales.  But in this story, wordplay isn’t just there for the hell of it: Scott Alexander has built an entire fictional universe that runs on wordplay—one where battles between the great masters, the equivalent of the light-saber fights in Star Wars, are conducted by rearranging letters in the sky to give them new meanings.  Scott A. famously claims he’s bad at math (though if you read anything he’s written on statistics or logic puzzles, it’s clear he undersells himself).  One could read Unsong as Alexander’s book-length answer to the question: what could it mean for the world to be law-governed but not mathematical?  What if the Book of Nature were written in English, or Hebrew, or other human languages, and if the Newtons and Einsteins were those who were most adept with words?

I should confess that for me, the experience of reading Unsong was colored by the knowledge that, in his years of brilliant and prolific writing, lighting up the blogosphere like a comet, the greatest risk Scott Alexander ever took (by his own account) was to defend me.  It’s like, imagine that in Elizabethan England, you were placed in the stocks and jeered at by thousands for advocating some unpopular loser cause—like, I dunno, anti-cat-burning or something.  And imagine that, when it counted, your most eloquent supporter was a then-obscure poet from Stratford-upon-Avon.  You’d be grateful to the poet, of course; you might even become a regular reader of his work, even if it wasn’t good.  But if the same poet went on to write Hamlet or Macbeth?  It might almost be enough for you to volunteer to be scorned and pilloried all over again, just for the honor of having the Bard divert a rivulet of his creative rapids to protesting on your behalf.

Yes, a tiny part of me had a self-absorbed child’s reaction to Unsong: “could Amanda Marcotte have written this?  could Arthur Chu?  who better to have in your camp: the ideologues du jour of Twitter and Metafilter, Salon.com and RationalWiki?  Or a lone creative genius, someone who can conjure whole worlds into being, as though graced himself with the Shem haMephorash of which he writes?”  Then of course I’d catch myself, and think: no, if you want to be in Scott Alexander’s camp, then the only way to do it is to be in nobody’s camp.  If two years ago it was morally justified to defend me, then the reasons why have nothing to do with the literary gifts of any of my defenders.  And conversely, the least we can do for Unsong is to judge it by what’s on the page, rather than as a soldier in some army fielded by the Gray Tribe.

So in that spirit, let me explain some of what’s wrong with Unsong.  That it’s a first novel sometimes shows.  It’s brilliant on world-building and arguments and historical tidbits and jokes, epic on puns, and uneven on character and narrative flow.  The story jumps around spasmodically in time, so much so that I needed a timeline to keep track of what was happening.  Subplots that are still open beget additional subplots ad headacheum, like a string of unmatched left-parentheses.  Even more disorienting, the novel changes its mind partway through about its narrative core.  Initially, the reader is given a clear sense that this is going to be a story about a young Bay Area kabbalist named Aaron Smith-Teller, his not-quite-girlfriend Ana, and their struggle for supernatural fair-use rights.  Soon, though, Aaron and Ana become almost side characters, their battle against UNSONG just one subplot among many, as the focus shifts to the decades-long war between the Comet King, a messianic figure come to rescue humanity, and Thamiel, the Prince of Hell.  For the Comet King, even saving the earth from impending doom is too paltry a goal to hold his interest much.  A strict utilitarian and fan of Peter Singer, the Comet King has a single-minded passion: to destroy Hell itself, and thereby rescue the billions of souls who are trapped there for eternity.

Anyway, unlike the Comet King, and unlike a certain other Scott A., I have merely human powers to marshal my time.  I also have two kids and a stack of unwritten papers.  So let me end this post now.  If the post causes just one person to read Unsong who otherwise wouldn’t have, it will be as if I’ve nerdified the entire world.

My broken blog

May 8th, 2017

I wanted to let people know I’m well-aware that Shtetl-Optimized has been suffering from the following problems lately:

  • Commenters are presented with the logins (handle, email address, and URL) of random other commenters, rather than with their own default login data.  In particular, this means that email addresses are leaking, and that when you comment, you should not (for the time being) enter your real email address if that’s information that you’d wanted to share only with me.  Another thing it means is that, when I try to comment, I’m not logged in as “Scott,” so even I have to enter my login data manually every time I comment.
  • Comments (including my own comments!) take about an hour to show up after I’ve approved them.
  • New blog posts also take a while to show up.

Since all three of these problems started happening around the same time, I assume they’re related.  But I don’t even know where to start in trying to solve them (Googling for “WordPress” plus descriptions of these bugs was unhelpful).  Would anyone like to help out?  If you earn my trust, I’ll even temporarily give you administrative privileges on this blog so you can poke around yourself.
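For whatever it’s worth, and this is purely a guess from the symptoms rather than a diagnosis: seeing other commenters’ prefilled login data, together with hour-long delays before approved comments and new posts appear, is the classic signature of a page cache (a WordPress caching plugin, or a caching proxy in front of the site) serving stale, personalized pages to everyone.  A zero-risk first check, which needs no server access at all, is just to look at the response headers for caching fingerprints; the URL and the particular header names below are only illustrative:

```python
import requests  # third-party HTTP library, assumed installed

# Purely a diagnostic guess: look for evidence of a page/proxy cache in the
# response headers of the blog's front page (URL is illustrative).
url = "https://www.scottaaronson.com/blog/"
resp = requests.get(url)
print("status:", resp.status_code)

# Standard caching headers plus a couple of common nonstandard ones; a nonzero
# Age, or an "X-Cache: HIT", would suggest responses are being cached and replayed.
for header in ("Cache-Control", "Expires", "Age", "Via", "X-Cache", "X-Cache-Status"):
    print(f"{header}: {resp.headers.get(header, '(absent)')}")
```

If a cache does turn out to be the culprit, then the usual knob to look for (I believe, without knowing what’s actually installed here) is the option to skip caching for logged-in users and for visitors who have comment cookies.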

Thanks so much, and hope to return to your regularly scheduled programming shortly…