CCC’s Declaration of Independence

June 6th, 2014

Recently, the participants of the Conference on Computational Complexity (CCC)—the latest iteration of which I’ll be speaking at next week in Vancouver—voted to declare their independence from the IEEE, and to become a solo, researcher-organized conference.  See this open letter for the reasons why (basically, IEEE charged a huge overhead, didn’t allow open access to the proceedings, and increased rather than decreased the administrative burden on the organizers).  As a former member of the CCC Steering Committee, I’m in violent agreement with this move, and only wish we’d managed to do it sooner.

Now, Dieter van Melkebeek (the current Steering Committee chair) is asking complexity theorists to sign a public Letter of Support, to make it crystal-clear that the community is behind the move to independence.  And Jeff Kinne has asked me to advertise the letter on my blog.  So, if you’re a complexity theorist who agrees with the move, please go there and sign (it already has 111 signatures, but could use more).

Meanwhile, I wish to express my profound gratitude to Dieter, Jeff, and the other Steering Committee members for their efforts toward independence.  The only thing I might’ve done differently would be to add a little more … I dunno, pizzazz to the documents explaining the reasons for the move.  Like:

When in the Course of human events, it becomes necessary for a conference to dissolve the organizational bands that have connected it with the IEEE, and to assume among the powers of the earth, the separate and equal station to which the Laws of Mathematics and the CCC Charter entitle it, a decent respect to the opinions of theorist-kind requires that the participants should declare the causes which impel them to the separation.

We hold these truths to be self-evident (but still in need of proof), that P and NP are created unequal, that one-way functions exist, that the polynomial hierarchy is infinite…

Giulio Tononi and Me: A Phi-nal Exchange

May 30th, 2014

You might recall that last week I wrote a post criticizing Integrated Information Theory (IIT), and its apparent implication that a simple Reed-Solomon decoding circuit would, if scaled to a large enough size, bring into being a consciousness vastly exceeding our own.  On Wednesday Giulio Tononi, the creator of IIT, was kind enough to send me a fascinating 14-page rebuttal, and to give me permission to share it here:

Why Scott should stare at a blank wall and reconsider (or, the conscious grid)

If you’re interested in this subject at all, then I strongly recommend reading Giulio’s response before continuing further.  But for those who want the tl;dr: Giulio, not one to battle strawmen, first restates my own argument against IIT with crystal clarity.  And while he has some minor quibbles (e.g., apparently my calculations of Φ didn’t use the most recent, “3.0” version of IIT), he wisely sets those aside in order to focus on the core question: according to IIT, are all sorts of simple expander graphs conscious?

There, he doesn’t “bite the bullet” so much as devour a bullet hoagie with mustard.  He affirms that, yes, according to IIT, a large network of XOR gates arranged in a simple expander graph is conscious.  Indeed, he goes further, and says that the “expander” part is superfluous: even a network of XOR gates arranged in a 2D square grid is conscious.  In my language, Giulio is simply pointing out here that a √n×√n square grid has decent expansion: good enough to produce a Φ-value of about √n, if not the information-theoretic maximum of n (or n/2, etc.) that an expander graph could achieve.  And apparently, by Giulio’s lights, Φ=√n is sufficient for consciousness!

While Giulio never mentions this, it’s interesting to observe that logic gates arranged in a 1-dimensional line would produce a tiny Φ-value (Φ=O(1)).  So even by IIT standards, such a linear array would not be conscious.  Yet the jump from a line to a two-dimensional grid is enough to light the spark of Mind.

Personally, I give Giulio enormous credit for having the intellectual courage to follow his theory wherever it leads.  When the critics point out, “if your theory were true, then the Moon would be made of peanut butter,” he doesn’t try to wiggle out of the prediction, but proudly replies, “yes, chunky peanut butter—and you forgot to add that the Earth is made of Nutella!”

Yet even as we admire Giulio’s honesty and consistency, his stance might also prompt us, gently, to take another look at this peanut-butter-moon theory, and at what grounds we had for believing it in the first place.  In his response essay, Giulio offers four arguments (by my count) for accepting IIT despite, or even because of, its conscious-grid prediction: one “negative” argument and three “positive” ones.  Alas, while your Φ-lage may vary, I didn’t find any of the four arguments persuasive.  In the rest of this post, I’ll go through them one by one and explain why.

I. The Copernicus-of-Consciousness Argument

Like many commenters on my last post, Giulio heavily criticizes my appeal to “common sense” in rejecting IIT.  Sure, he says, I might find it “obvious” that a huge Vandermonde matrix, or its physical instantiation, isn’t conscious.  But didn’t people also find it “obvious” for millennia that the Sun orbits the Earth?  Isn’t the entire point of science to challenge common sense?  Clearly, then, the test of a theory of consciousness is not how well it upholds “common sense,” but how well it fits the facts.

The above position sounds pretty convincing: who could dispute that observable facts trump personal intuitions?  The trouble is, what are the observable facts when it comes to consciousness?  The anti-common-sense view gets all its force by pretending that we’re in a relatively late stage of research—namely, the stage of taking an agreed-upon scientific definition of consciousness, and applying it to test our intuitions—rather than in an extremely early stage, of agreeing on what the word “consciousness” is even supposed to mean.

Since I think this point is extremely important—and of general interest, beyond just IIT—I’ll expand on it with some analogies.

Suppose I told you that, in my opinion, the ε-δ definition of continuous functions—the one you learn in calculus class—failed to capture the true meaning of continuity.  Suppose I told you that I had a new, better definition of continuity—and amazingly, when I tried out my definition on some examples, it turned out that ⌊x⌋ (the floor function) was continuous, whereas x² had discontinuities, though only at 17.5 and 42.

You would probably ask what I was smoking, and whether you could have some.  But why?  Why shouldn’t the study of continuity produce counterintuitive results?  After all, even the standard definition of continuity leads to some famously weird results, like that x sin(1/x) is a continuous function, even though sin(1/x) is discontinuous.  And it’s not as if the standard definition is God-given: people had been using words like “continuous” for centuries before Bolzano, Weierstrass, et al. formalized the ε-δ definition, a definition that millions of calculus students still find far from intuitive.  So why shouldn’t there be a different, better definition of “continuous,” and why shouldn’t it reveal that a step function is continuous while a parabola is not?

In my view, the way out of this conceptual jungle is to realize that, before any formal definitions, any ε’s and δ’s, we start with an intuition for what we’re trying to capture by the word “continuous.”  And if we press hard enough on what that intuition involves, we’ll find that it largely consists of various “paradigm-cases.”  A continuous function, we’d say, is a function like 3x, or x², or sin(x), while a discontinuity is the kind of thing that the function 1/x has at x=0, or that ⌊x⌋ has at every integer point.  Crucially, we use the paradigm-cases to guide our choice of a formal definition—not vice versa!  It’s true that, once we have a formal definition, we can then apply it to “exotic” cases like x sin(1/x), and we might be surprised by the results.  But the paradigm-cases are different.  If, for example, our definition told us that x² was discontinuous, that wouldn’t be a “surprise”; it would just be evidence that we’d picked a bad definition.  The definition failed at the only task for which it could have succeeded: namely, that of capturing what we meant.

Some people might say that this is all well and good in pure math, but empirical science has no need for squishy intuitions and paradigm-cases.  Nothing could be further from the truth.  Suppose, again, that I told you that physicists since Kelvin had gotten the definition of temperature all wrong, and that I had a new, better definition.  And, when I built a Scott-thermometer that measures true temperatures, it delivered the shocking result that boiling water is actually colder than ice.  You’d probably tell me where to shove my Scott-thermometer.  But wait: how do you know that I’m not the Copernicus of heat, and that future generations won’t celebrate my breakthrough while scoffing at your small-mindedness?

I’d say there’s an excellent answer: because what we mean by heat is “whatever it is that boiling water has more of than ice” (along with dozens of other paradigm-cases).  And because, if you use a thermometer to check whether boiling water is hotter than ice, then the term for what you’re doing is calibrating your thermometer.  When the clock strikes 13, it’s time to fix the clock, and when the thermometer says boiling water’s colder than ice, it’s time to replace the thermometer—or if needed, even the entire theory on which the thermometer is based.

Ah, you say, but doesn’t modern physics define heat in a completely different, non-intuitive way, in terms of molecular motion?  Yes, and that turned out to be a superb definition—not only because it was precise, explanatory, and applicable to cases far beyond our everyday experience, but crucially, because it matched common sense on the paradigm-cases.  If it hadn’t given sensible results for boiling water and ice, then the only possible conclusion would be that, whatever new quantity physicists had defined, they shouldn’t call it “temperature,” or claim that their quantity measured the amount of “heat.”  They should call their new thing something else.

The implications for the consciousness debate are obvious.  When we consider whether to accept IIT’s equation of integrated information with consciousness, we don’t start with any agreed-upon, independent notion of consciousness against which the new notion can be compared.  The main things we start with, in my view, are certain paradigm-cases that gesture toward what we mean:

  • You are conscious (though not when anesthetized).
  • (Most) other people appear to be conscious, judging from their behavior.
  • Many animals appear to be conscious, though probably to a lesser degree than humans (and the degree of consciousness in each particular species is far from obvious).
  • A rock is not conscious.  A wall is not conscious.  A Reed-Solomon code is not conscious.  Microsoft Word is not conscious (though a Word macro that passed the Turing test conceivably would be).

Fetuses, coma patients, fish, and hypothetical AIs are the x sin(1/x)’s of consciousness: they’re the tougher cases, the ones where we might actually need a formal definition to adjudicate the truth.

Now, given a proposed formal definition for an intuitive concept, how can we check whether the definition is talking about the same thing we were trying to get at before?  Well, we can check whether the definition at least agrees that parabolas are continuous while step functions are not, that boiling water is hot while ice is cold, and that we’re conscious while Reed-Solomon decoders are not.  If so, then the definition might be picking out the same thing that we meant, or were trying to mean, pre-theoretically (though we still can’t be certain).  If not, then the definition is certainly talking about something else.

What else can we do?

II. The Axiom Argument

According to Giulio, there is something else we can do, besides relying on paradigm-cases.  That something else, in his words, is to lay down “postulates about how the physical world should be organized to support the essential properties of experience,” then use those postulates to derive a consciousness-measuring quantity.

OK, so what are IIT’s postulates?  Here’s how Giulio states the five postulates leading to Φ in his response essay (he “derives” these from earlier “phenomenological axioms,” which you can find in the essay):

  1. A system of mechanisms exists intrinsically if it can make a difference to itself, by affecting the probability of its past and future states, i.e. it has causal power (existence).
  2. It is composed of submechanisms each with their own causal power (composition).
  3. It generates a conceptual structure that is the specific way it is, as specified by each mechanism’s concept — this is how each mechanism affects the probability of the system’s past and future states (information).
  4. The conceptual structure is unified — it cannot be decomposed into independent components (integration).
  5. The conceptual structure is singular — there can be no superposition of multiple conceptual structures over the same mechanisms and intervals of time.

From my standpoint, these postulates have three problems.  First, I don’t really understand them.  Second, insofar as I do understand them, I don’t necessarily accept their truth.  And third, insofar as I do accept their truth, I don’t see how they lead to Φ.

To elaborate a bit:

I don’t really understand the postulates.  I realize that the postulates are explicated further in the many papers on IIT.  Unfortunately, while it’s possible that I missed something, in all of the papers that I read, the definitions never seemed to “bottom out” in mathematical notions that I understood, like functions mapping finite sets to other finite sets.  What, for example, is a “mechanism”?  What’s a “system of mechanisms”?  What’s “causal power”?  What’s a “conceptual structure,” and what does it mean for it to be “unified”?  Alas, it doesn’t help to define these notions in terms of other notions that I also don’t understand.  And yes, I agree that all these notions can be given fully rigorous definitions, but there could be many different ways to do so, and the devil could lie in the details.  In any case, because (as I said) it’s entirely possible that the failure is mine, I place much less weight on this point than I do on the two points to follow.

I don’t necessarily accept the postulates’ truth.  Is consciousness a “unified conceptual structure”?  Is it “singular”?  Maybe.  I don’t know.  It sounds plausible.  But at any rate, I’m far less confident about any of these postulates—whatever one means by them!—than I am about my own “postulate,” which is that you and I are conscious while my toaster is not.  Note that my postulate, though not phenomenological, does have the merit of constraining candidate theories of consciousness in an unambiguous way.

I don’t see how the postulates lead to Φ.  Even if one accepts the postulates, how does one deduce that the “amount of consciousness” should be measured by Φ, rather than by some other quantity?  None of the papers I read—including the ones Giulio linked to in his response essay—contained anything that looked to me like a derivation of Φ.  Instead, there was general discussion of the postulates, and then Φ just sort of appeared at some point.  Furthermore, given the many idiosyncrasies of Φ—the minimization over all bipartite (why just bipartite? why not tripartite?) decompositions of the system, the need for normalization (or something else in version 3.0) to deal with highly-unbalanced partitions—it would be quite a surprise were it possible to derive its specific form from postulates of such generality.

I was going to argue for that conclusion in more detail, when I realized that Giulio had kindly done the work for me already.  Recall that Giulio chided me for not using the “latest, 2014, version 3.0” edition of Φ in my previous post.  Well, if the postulates uniquely determined the form of Φ, then what’s with all these upgrades?  Or has Φ’s definition been changing from year to year because the postulates themselves have been changing?  If the latter, then maybe one should wait for the situation to stabilize before trying to form an opinion of the postulates’ meaningfulness, truth, and completeness?

III. The Ironic Empirical Argument

Or maybe not.  Despite all the problems noted above with the IIT postulates, Giulio argues in his essay that there’s a good reason to accept them: namely, they explain various empirical facts from neuroscience, and lead to confirmed predictions.  In his words:

[A] theory’s postulates must be able to explain, in a principled and parsimonious way, at least those many facts about consciousness and the brain that are reasonably established and non-controversial.  For example, we know that our own consciousness depends on certain brain structures (the cortex) and not others (the cerebellum), that it vanishes during certain periods of sleep (dreamless sleep) and reappears during others (dreams), that it vanishes during certain epileptic seizures, and so on.  Clearly, a theory of consciousness must be able to provide an adequate account for such seemingly disparate but largely uncontroversial facts.  Such empirical facts, and not intuitions, should be its primary test…

[I]n some cases we already have some suggestive evidence [of the truth of the IIT postulates' predictions].  One example is the cerebellum, which has 69 billion neurons or so — more than four times the 16 billion neurons of the cerebral cortex — and is as complicated a piece of biological machinery as any.  Though we do not understand exactly how it works (perhaps even less than we understand the cerebral cortex), its connectivity definitely suggests that the cerebellum is ill suited to information integration, since it lacks lateral connections among its basic modules.  And indeed, though the cerebellum is heavily connected to the cerebral cortex, removing it hardly affects our consciousness, whereas removing the cortex eliminates it.

I hope I’m not alone in noticing the irony of this move.  But just in case, let me spell it out: Giulio has stated, as “largely uncontroversial facts,” that certain brain regions (the cerebellum) and certain states (dreamless sleep) are not associated with our consciousness.  He then views it as a victory for IIT, if those regions and states turn out to have lower information integration than the regions and states that he does take to be associated with our consciousness.

But how does Giulio know that the cerebellum isn’t conscious?  Even if it doesn’t produce “our” consciousness, maybe the cerebellum has its own consciousness, just as rich as the cortex’s but separate from it.  Maybe removing the cerebellum destroys that other consciousness, unbeknownst to “us.”  Likewise, maybe “dreamless” sleep brings about its own form of consciousness, one that (unlike dreams) we never, ever remember in the morning.

Giulio might take the implausibility of those ideas as obvious, or at least as “largely uncontroversial” among neuroscientists.  But here’s the problem with that: he just told us that a 2D square grid is conscious!  He told us that we must not rely on “commonsense intuition,” or on any popular consensus, to say that if a square mesh of wires is just sitting there XORing some input bits, doing nothing at all that we’d want to call intelligent, then it’s probably safe to conclude that the mesh isn’t conscious.  So then why shouldn’t he say the same for the cerebellum, or for the brain in dreamless sleep?  By Giulio’s own rules (the ones he used for the mesh), we have no a-priori clue whether those systems are conscious or not—so even if IIT predicts that they’re not conscious, that can’t be counted as any sort of success for IIT.

For me, the point is even stronger: I, personally, would be a million times more inclined to ascribe consciousness to the human cerebellum, or to dreamless sleep, than I would to the mesh of XOR gates.  For it’s not hard to imagine neuroscientists of the future discovering “hidden forms of intelligence” in the cerebellum, and all but impossible to imagine them doing the same for the mesh.  But even if you put those examples on the same footing, still the take-home message seems clear: you can’t count it as a “success” for IIT if it predicts that the cerebellum is unconscious, while at the same time denying that it’s a “failure” for IIT if it predicts that a square mesh of XOR gates is conscious.  If the unconsciousness of the cerebellum can be considered an “empirical fact,” safe enough for theories of consciousness to be judged against it, then surely the unconsciousness of the mesh can also be considered such a fact.

IV. The Phenomenology Argument

I now come to, for me, the strangest and most surprising part of Giulio’s response.  Despite his earlier claim that IIT need not dovetail with “commonsense intuition” about which systems are conscious—that it can defy intuition—at some point, Giulio valiantly tries to reprogram our intuition, to make us feel why a 2D grid could be conscious.  As best I can understand, the argument seems to be that, when we stare at a blank 2D screen, we form a rich experience in our heads, and that richness must be mirrored by a corresponding “intrinsic” richness in 2D space itself:

[I]f one thinks a bit about it, the experience of empty 2D visual space is not at all empty, but contains a remarkable amount of structure.  In fact, when we stare at the blank screen, quite a lot is immediately available to us without any effort whatsoever.  Thus, we are aware of all the possible locations in space (“points”): the various locations are right “there”, in front of us.  We are aware of their relative positions: a point may be left or right of another, above or below, and so on, for every position, without us having to order them.  And we are aware of the relative distances among points: quite clearly, two points may be close or far, and this is the case for every position.  Because we are aware of all of this immediately, without any need to calculate anything, and quite regularly, since 2D space pervades most of our experiences, we tend to take for granted the vast set of relationship[s] that make up 2D space.

And yet, says IIT, given that our experience of the blank screen definitely exists, and it is precisely the way it is — it is 2D visual space, with all its relational properties — there must be physical mechanisms that specify such phenomenological relationships through their causal power … One may also see that the causal relationships that make up 2D space obtain whether the elements are on or off.  And finally, one may see that such a 2D grid is necessary not so much to represent space from the extrinsic perspective of an observer, but to create it, from its own intrinsic perspective.

Now, it would be child’s-play to criticize the above line of argument for conflating our consciousness of the screen with the alleged consciousness of the screen itself.  To wit:  Just because it feels like something to see a wall, doesn’t mean it feels like something to be a wall.  You can smell a rose, and the rose can smell good, but that doesn’t mean the rose can smell you.

However, I actually prefer a different tack in criticizing Giulio’s “wall argument.”  Suppose I accepted that my mental image of the relationships between certain entities was relevant to assessing whether those entities had their own mental life, independent of me or any other observer.  For example, suppose I believed that, if my experience of 2D space is rich and structured, then that’s evidence that 2D space is rich and structured enough to be conscious.

Then my question is this: why shouldn’t the same be true of 1D space?  After all, my experience of staring at a rope is also rich and structured, no less than my experience of staring at a wall.  I perceive some points on the rope as being toward the left, others as being toward the right, and some points as being between two other points.  In fact, the rope even has a structure—namely, a natural total ordering on its points—that the wall lacks.  So why does IIT cruelly deny subjective experience to a row of logic gates strung along a rope, reserving it only for a mesh of logic gates pasted to a wall?

And yes, I know the answer: because the logic gates on the rope aren’t “integrated” enough.  But who’s to say that the gates in the 2D mesh are integrated enough?  As I mentioned before, their Φ-value grows only as the square root of the number of gates, so that the ratio of integrated information to total information tends to 0 as the number of gates increases.  And besides, aren’t what Giulio calls “the facts of phenomenology” the real arbiters here, and isn’t my perception of the rope’s structure a phenomenological fact?  When you cut a rope, does it not split?  When you prick it, does it not fray?

Conclusion

At this point, I fear we’re at a philosophical impasse.  Having learned that, according to IIT,

  1. a square grid of XOR gates is conscious, and your experience of staring at a blank wall provides evidence for that,
  2. by contrast, a linear array of XOR gates is not conscious, your experience of staring at a rope notwithstanding,
  3. the human cerebellum is also not conscious (even though a grid of XOR gates is), and
  4. unlike with the XOR gates, we don’t need a theory to tell us the cerebellum is unconscious, but can simply accept it as “reasonably established” and “largely uncontroversial,”

I personally feel completely safe in saying that this is not the theory of consciousness for me.  But I’ve also learned that other people, even after understanding the above, still don’t reject IIT.  And you know what?  Bully for them.  On reflection, I firmly believe that a two-state solution is possible, in which we simply adopt different words for the different things that we mean by “consciousness”—like, say, consciousness_Real for my kind and consciousness_WTF for the IIT kind.  OK, OK, just kidding!  How about “paradigm-case consciousness” for the one and “IIT consciousness” for the other.


Completely unrelated announcement: Some of you might enjoy this Nature News piece by Amanda Gefter, about black holes and computational complexity.

Quantifying the Rise and Fall of Complexity in Closed Systems: The Coffee Automaton

May 27th, 2014

Update (June 3): A few days after we posted this paper online, Brent Werness, a postdoc in probability theory at the University of Washington, discovered a serious error in the “experimental” part of the paper.  Happily, Brent is now collaborating with us on producing a new version of the paper that fixes the error, which we hope to have available within a few months (and which will replace the version currently on the arXiv).

To make a long story short: while the overall idea, of measuring “apparent complexity” by the compressed file size of a coarse-grained image, is fine, the “interacting coffee automaton” that we study in the paper is not an example where the apparent complexity becomes large at intermediate times.  That fact can be deduced as a corollary of a result of Liggett from 2009 about the “symmetric exclusion process,” and can be seen as a far-reaching generalization of a result that we prove in our paper’s appendix: namely, that in the non-interacting coffee automaton (our “control case”), the apparent complexity after t time steps is upper-bounded by O(log(nt)).  As it turns out, we were more right than we knew to worry about large-deviation bounds giving complete mathematical control over what happens when the cream spills into the coffee, thereby preventing the apparent complexity from ever becoming large!

But what about our numerical results, which showed a small but unmistakable complexity bump for the interacting automaton (figure 10(a) in the paper)?  It now appears that the complexity bump we saw in our data is likely to be explainable by an incomplete removal of what we called “border pixel artifacts”: that is, “spurious” complexity that arises merely from the fact that, at the border between cream and coffee, we need to round the fraction of cream up or down to the nearest integer to produce a grayscale.  In the paper, we devoted a whole section (Section 6) to border pixel artifacts and the need to deal with them: something sufficiently non-obvious that in the comments of this post, you can find people arguing with me that it’s a non-issue.  Well, it now appears that we erred by underestimating the severity of border pixel artifacts, and that a better procedure to get rid of them would also eliminate the complexity bump for the interacting automaton.

Once again, this error has no effect on either the general idea of complexity rising and then falling in closed thermodynamic systems, or our proposal for how to quantify that rise and fall—the two aspects of the paper that have generated the most interest.  But we made a bad choice of model system with which to illustrate those ideas.  Had I looked more carefully at the data, I could’ve noticed the problem before we posted, and I take responsibility for my failure to do so.

The good news is that ultimately, I think the truth only makes our story more interesting.  For it turns out that apparent complexity, as we define it, is not something that’s trivial to achieve by just setting loose a bunch of randomly-walking particles, which bump into each other but are otherwise completely independent.  If you want “complexity” along the approach to thermal equilibrium, you need to work a bit harder for it.  One promising idea, which we’re now exploring, is to consider a cream tendril whose tip takes a random walk through the coffee, leaving a trail of cream in its wake.  Using results in probability theory—closely related, or so I’m told, to the results for which Wendelin Werner won his Fields Medal!—it may even be possible to prove analytically that the apparent complexity becomes large in thermodynamic systems with this sort of behavior, much as one can prove that the complexity doesn’t become large in our original coffee automaton.

So, if you’re interested in this topic, stay tuned for the updated version of our paper.  In the meantime, I wish to express our deepest imaginable gratitude to Brent Werness for telling us all this.


Good news!  After nearly three years of procrastination, fellow blogger Sean Carroll, former MIT undergraduate Lauren Ouellette, and yours truly finally finished a paper with the above title (coming soon to an arXiv near you).  PowerPoint slides are also available (as usual, you’re on your own if you can’t open them—sorry!).

For the background and context of this paper, please see my old post “The First Law of Complexodynamics,” which discussed Sean’s problem of defining a “complextropy” measure that first increases and then decreases in closed thermodynamic systems, in contrast to entropy (which increases monotonically).  In this exploratory paper, we basically do five things:

  1. We survey several candidate “complextropy” measures: their strengths, weaknesses, and relations to one another.
  2. We propose a model system for studying such measures: a probabilistic cellular automaton that models a cup of coffee into which cream has just been poured.
  3. We report the results of numerical experiments with one of the measures, which we call “apparent complexity” (basically, the gzip file size of a smeared-out image of the coffee cup; a toy sketch of the measure appears just after this list).  The results confirm that the apparent complexity does indeed increase, reach a maximum, then turn around and decrease as the coffee and cream mix.
  4. We discuss a technical issue that one needs to overcome (the so-called “border pixels” problem) before one can do meaningful experiments in this area, and offer a solution.
  5. We raise the open problem of proving analytically that the apparent complexity ever becomes large for the coffee automaton.  To underscore this problem’s difficulty, we prove that the apparent complexity doesn’t become large in a simplified version of the coffee automaton.
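To make item 3 concrete, here’s a minimal sketch of the “apparent complexity” measure described above: coarse-grain a binary cream/coffee array into blocks, turn each block into a grayscale value, and report the gzip-compressed size of the result.  The grid size, block size, and helper names are illustrative choices of mine rather than the paper’s actual parameters, and the sketch makes no attempt to handle the border-pixel issue discussed in the update at the top of this post.

```python
# A toy sketch of "apparent complexity": gzip size of a coarse-grained
# (grayscale) image of a 0/1 cream/coffee configuration.  Grid and block
# sizes here are illustrative, not the parameters used in the paper.
import zlib

def coarse_grain(grid, block):
    """Average each block x block patch of a 2D 0/1 array into a 0-255 gray value.
    Assumes the grid dimensions are divisible by the block size."""
    coarse = []
    for r in range(0, len(grid), block):
        row = []
        for c in range(0, len(grid[0]), block):
            patch = [grid[i][j] for i in range(r, r + block)
                                for j in range(c, c + block)]
            row.append(int(round(255 * sum(patch) / len(patch))))
        coarse.append(row)
    return coarse

def apparent_complexity(grid, block=8):
    """Compressed size (in bytes) of the coarse-grained image."""
    coarse = coarse_grain(grid, block)
    raw = bytes(v for row in coarse for v in row)
    return len(zlib.compress(raw, 9))

# Toy demo: a fully separated cream/coffee configuration.
n = 128
separated = [[1 if i < n // 2 else 0 for j in range(n)] for i in range(n)]
print(apparent_complexity(separated))
```

In the paper’s actual experiments, one runs the coffee automaton and tracks this number over time; the claim under investigation is that it starts small, rises as tendrils of cream form, and falls again as the cup equilibrates.  (Getting the coarse-graining right, so that rounding at the cream/coffee boundary doesn’t inject spurious complexity, is exactly the “border pixels” issue of item 4.)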

Anyway, here’s the abstract:

In contrast to entropy, which increases monotonically, the “complexity” or “interestingness” of closed systems seems intuitively to increase at first and then decrease as equilibrium is approached. For example, our universe lacked complex structures at the Big Bang and will also lack them after black holes evaporate and particles are dispersed. This paper makes an initial attempt to quantify this pattern. As a model system, we use a simple, two-dimensional cellular automaton that simulates the mixing of two liquids (“coffee” and “cream”). A plausible complexity measure is then the Kolmogorov complexity of a coarse-grained approximation of the automaton’s state, which we dub the “apparent complexity.” We study this complexity measure, and show analytically that it never becomes large when the liquid particles are non-interacting. By contrast, when the particles do interact, we give numerical evidence that the complexity reaches a maximum comparable to the “coffee cup’s” horizontal dimension. We raise the problem of proving this behavior analytically.

Questions and comments more than welcome.


In unrelated news, Shafi Goldwasser has asked me to announce that the Call for Papers for the 2015 Innovations in Theoretical Computer Science (ITCS) conference is now available.

Why I Am Not An Integrated Information Theorist (or, The Unconscious Expander)

May 21st, 2014

Happy birthday to me!

Recently, lots of people have been asking me what I think about IIT—no, not the Indian Institutes of Technology, but Integrated Information Theory, a widely-discussed “mathematical theory of consciousness” developed over the past decade by the neuroscientist Giulio Tononi.  One of the askers was Max Tegmark, who’s enthusiastically adopted IIT as a plank in his radical mathematizing platform (see his paper “Consciousness as a State of Matter”).  When, in the comment thread about Max’s Mathematical Universe Hypothesis, I expressed doubts about IIT, Max challenged me to back up my doubts with a quantitative calculation.

So, this is the post that I promised to Max and all the others, about why I don’t believe IIT.  And yes, it will contain that quantitative calculation.

But first, what is IIT?  The central ideas of IIT, as I understand them, are:

(1) to propose a quantitative measure, called Φ, of the amount of “integrated information” in a physical system (i.e. information that can’t be localized in the system’s individual parts), and then

(2) to hypothesize that a physical system is “conscious” if and only if it has a large value of Φ—and indeed, that a system is more conscious the larger its Φ value.

I’ll return later to the precise definition of Φ—but basically, it’s obtained by minimizing, over all subdivisions of your physical system into two parts A and B, some measure of the mutual information between A’s outputs and B’s inputs and vice versa.  Now, one immediate consequence of any definition like this is that all sorts of simple physical systems (a thermostat, a photodiode, etc.) will turn out to have small but nonzero Φ values.  To his credit, Tononi cheerfully accepts the panpsychist implication: yes, he says, it really does mean that thermostats and photodiodes have small but nonzero levels of consciousness.  On the other hand, for the theory to work, it had better be the case that Φ is small for “intuitively unconscious” systems, and only large for “intuitively conscious” systems.  As I’ll explain later, this strikes me as a crucial point on which IIT fails.

The literature on IIT is too big to do it justice in a blog post.  Strikingly, in addition to the “primary” literature, there’s now even a “secondary” literature, which treats IIT as a sort of established base on which to build further speculations about consciousness.  Besides the Tegmark paper linked to above, see for example this paper by Maguire et al., and associated popular article.  (Ironically, Maguire et al. use IIT to argue for the Penrose-like view that consciousness might have uncomputable aspects—a use diametrically opposed to Tegmark’s.)

Anyway, if you want to read a popular article about IIT, there are loads of them: see here for the New York Times’s, here for Scientific American‘s, here for IEEE Spectrum‘s, and here for the New Yorker‘s.  Unfortunately, none of those articles will tell you the meat (i.e., the definition of integrated information); for that you need technical papers, like this or this by Tononi, or this by Seth et al.  IIT is also described in Christof Koch’s memoir Consciousness: Confessions of a Romantic Reductionist, which I read and enjoyed; as well as Tononi’s Phi: A Voyage from the Brain to the Soul, which I haven’t yet read.  (Koch, one of the world’s best-known thinkers and writers about consciousness, has also become an evangelist for IIT.)

So, I want to explain why I don’t think IIT solves even the problem that it “plausibly could have” solved.  But before I can do that, I need to do some philosophical ground-clearing.  Broadly speaking, what is it that a “mathematical theory of consciousness” is supposed to do?  What questions should it answer, and how should we judge whether it’s succeeded?

The most obvious thing a consciousness theory could do is to explain why consciousness exists: that is, to solve what David Chalmers calls the “Hard Problem,” by telling us how a clump of neurons is able to give rise to the taste of strawberries, the redness of red … you know, all that ineffable first-persony stuff.  Alas, there’s a strong argument—one that I, personally, find completely convincing—why that’s too much to ask of any scientific theory.  Namely, no matter what the third-person facts were, one could always imagine a universe consistent with those facts in which no one “really” experienced anything.  So for example, if someone claims that integrated information “explains” why consciousness exists—nope, sorry!  I’ve just conjured into my imagination beings whose Φ-values are a thousand, nay a trillion times larger than humans’, yet who are also philosophical zombies: entities that there’s nothing that it’s like to be.  Granted, maybe such zombies can’t exist in the actual world: maybe, if you tried to create one, God would notice its large Φ-value and generously bequeath it a soul.  But if so, then that’s a further fact about our world, a fact that manifestly couldn’t be deduced from the properties of Φ alone.  Notice that the details of Φ are completely irrelevant to the argument.

Faced with this point, many scientifically-minded people start yelling and throwing things.  They say that “zombies” and so forth are empty metaphysics, and that our only hope of learning about consciousness is to engage with actual facts about the brain.  And that’s a perfectly reasonable position!  As far as I’m concerned, you absolutely have the option of dismissing Chalmers’ Hard Problem as a navel-gazing distraction from the real work of neuroscience.  The one thing you can’t do is have it both ways: that is, you can’t say both that the Hard Problem is meaningless, and that progress in neuroscience will soon solve the problem if it hasn’t already.  You can’t maintain simultaneously that

(a) once you account for someone’s observed behavior and the details of their brain organization, there’s nothing further about consciousness to be explained, and

(b) remarkably, the XYZ theory of consciousness can explain the “nothing further” (e.g., by reducing it to integrated information processing), or might be on the verge of doing so.

As obvious as this sounds, it seems to me that large swaths of consciousness-theorizing can just be summarily rejected for trying to have their brain and eat it in precisely the above way.

Fortunately, I think IIT survives the above observations.  For we can easily interpret IIT as trying to do something more “modest” than solve the Hard Problem, although still staggeringly audacious.  Namely, we can say that IIT “merely” aims to tell us which physical systems are associated with consciousness and which aren’t, purely in terms of the systems’ physical organization.  The test of such a theory is whether it can produce results agreeing with “commonsense intuition”: for example, whether it can affirm, from first principles, that (most) humans are conscious; that dogs and horses are also conscious but less so; that rocks, livers, bacteria colonies, and existing digital computers are not conscious (or are hardly conscious); and that a room full of people has no “mega-consciousness” over and above the consciousnesses of the individuals.

The reason it’s so important that the theory uphold “common sense” on these test cases is that, given the experimental inaccessibility of consciousness, this is basically the only test available to us.  If the theory gets the test cases “wrong” (i.e., gives results diverging from common sense), it’s not clear that there’s anything else for the theory to get “right.”  Of course, supposing we had a theory that got the test cases right, we could then have a field day with the less-obvious cases, programming our computers to tell us exactly how much consciousness is present in octopi, fetuses, brain-damaged patients, and hypothetical AI bots.

In my opinion, how to construct a theory that tells us which physical systems are conscious and which aren’t—giving answers that agree with “common sense” whenever the latter renders a verdict—is one of the deepest, most fascinating problems in all of science.  Since I don’t know a standard name for the problem, I hereby call it the Pretty-Hard Problem of Consciousness.  Unlike with the Hard Hard Problem, I don’t know of any philosophical reason why the Pretty-Hard Problem should be inherently unsolvable; but on the other hand, humans seem nowhere close to solving it (if we had solved it, then we could reduce the abortion, animal rights, and strong AI debates to “gentlemen, let us calculate!”).

Now, I regard IIT as a serious, honorable attempt to grapple with the Pretty-Hard Problem of Consciousness: something concrete enough to move the discussion forward.  But I also regard IIT as a failed attempt on the problem.  And I wish people would recognize its failure, learn from it, and move on.

In my view, IIT fails to solve the Pretty-Hard Problem because it unavoidably predicts vast amounts of consciousness in physical systems that no sane person would regard as particularly “conscious” at all: indeed, systems that do nothing but apply a low-density parity-check code, or other simple transformations of their input data.  Moreover, IIT predicts not merely that these systems are “slightly” conscious (which would be fine), but that they can be unboundedly more conscious than humans are.

To justify that claim, I first need to define Φ.  Strikingly, despite the large literature about Φ, I had a hard time finding a clear mathematical definition of it—one that not only listed formulas but fully defined the structures that the formulas were talking about.  Complicating matters further, there are several competing definitions of Φ in the literature, including Φ_DM (discrete memoryless), Φ_E (empirical), and Φ_AR (autoregressive), which apply in different contexts (e.g., some take time evolution into account and others don’t).  Nevertheless, I think I can define Φ in a way that will make sense to theoretical computer scientists.  And crucially, the broad point I want to make about Φ won’t depend much on the details of its formalization anyway.

We consider a discrete system in a state x=(x_1,…,x_n)∈S^n, where S is a finite alphabet (the simplest case is S={0,1}).  We imagine that the system evolves via an “updating function” f:S^n→S^n. Then the question that interests us is whether the x_i’s can be partitioned into two sets A and B, of roughly comparable size, such that the updates to the variables in A don’t depend very much on the variables in B and vice versa.  If such a partition exists, then we say that the computation of f does not involve “global integration of information,” which on Tononi’s theory is a defining aspect of consciousness.

More formally, given a partition (A,B) of {1,…,n}, let us write an input y=(y_1,…,y_n)∈S^n to f in the form (y_A,y_B), where y_A consists of the y variables in A and y_B consists of the y variables in B.  Then we can think of f as mapping an input pair (y_A,y_B) to an output pair (z_A,z_B).  Now, we define the “effective information” EI(A→B) as H(z_B | A random, y_B=x_B).  Or in words, EI(A→B) is the Shannon entropy of the output variables in B, if the input variables in A are drawn uniformly at random, while the input variables in B are fixed to their values in x.  It’s a measure of the dependence of B on A in the computation of f(x).  Similarly, we define

EI(B→A) := H(z_A | B random, y_A=x_A).

We then consider the sum

Φ(A,B) := EI(A→B) + EI(B→A).

Intuitively, we’d like the integrated information Φ=Φ(f,x) to be the minimum of Φ(A,B), over all 2^n-2 possible partitions of {1,…,n} into nonempty sets A and B.  The idea is that Φ should be large, if and only if it’s not possible to partition the variables into two sets A and B, in such a way that not much information flows from A to B or vice versa when f(x) is computed.

However, no sooner do we propose this than we notice a technical problem.  What if A is much larger than B, or vice versa?  As an extreme case, what if A={1,…,n-1} and B={n}?  In that case, we’ll have Φ(A,B) ≤ 2 log₂|S|, but only for the boring reason that there’s hardly any entropy in B as a whole, to either influence A or be influenced by it.  For this reason, Tononi proposes a fix where we normalize each Φ(A,B) by dividing it by min{|A|,|B|}.  He then defines the integrated information Φ to be Φ(A,B), for whichever partition (A,B) minimizes the ratio Φ(A,B) / min{|A|,|B|}.  (Unless I missed it, Tononi never specifies what we should do if there are multiple (A,B)’s that all achieve the same minimum of Φ(A,B) / min{|A|,|B|}.  I’ll return to that point later, along with other idiosyncrasies of the normalization procedure.)
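For concreteness, here’s a tiny brute-force sketch of the Φ just defined (the simplified, entropy-based version above, not the “3.0” version), with all helper names my own.  It enumerates every bipartition, so it’s only usable for toy systems, and it breaks ties between bipartitions arbitrarily (the ambiguity just noted).

```python
# Brute-force sketch of the simplified Phi defined above (not IIT 3.0).
# Exponential in n, so only for toy systems; ties between bipartitions
# are broken arbitrarily (the ambiguity noted in the text).
import itertools
from collections import Counter
from math import log2

def entropy(counts):
    """Shannon entropy (in bits) of an empirical distribution."""
    total = sum(counts.values())
    return -sum(c / total * log2(c / total) for c in counts.values())

def effective_information(f, x, A, B, alphabet):
    """EI(A->B): entropy of the outputs in B when the inputs in A are uniformly
    random and the inputs in B are clamped to their values in x."""
    outputs = Counter()
    for assignment in itertools.product(alphabet, repeat=len(A)):
        y = list(x)
        for idx, val in zip(A, assignment):
            y[idx] = val
        z = f(tuple(y))
        outputs[tuple(z[i] for i in B)] += 1
    return entropy(outputs)

def phi(f, x, alphabet=(0, 1)):
    """Minimize EI(A->B)+EI(B->A) over bipartitions, normalized by min{|A|,|B|};
    return the *unnormalized* value of the minimizing bipartition."""
    n = len(x)
    best = None
    for r in range(1, n // 2 + 1):
        for A in itertools.combinations(range(n), r):
            B = tuple(i for i in range(n) if i not in A)
            val = (effective_information(f, x, A, B, alphabet) +
                   effective_information(f, x, B, A, alphabet))
            norm = val / min(len(A), len(B))
            if best is None or norm < best[0]:
                best = (norm, val)
    return best[1]

# Toy example: four bits, each output the XOR of two neighboring inputs.
f = lambda y: (y[0] ^ y[1], y[1] ^ y[2], y[2] ^ y[3], y[3] ^ y[0])
print(phi(f, (0, 1, 1, 0)))
```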

Tononi gives some simple examples of the computation of Φ, showing that it is indeed larger for systems that are more “richly interconnected” in an intuitive sense.  He speculates, plausibly, that Φ is quite large for (some reasonable model of) the interconnection network of the human brain—and probably larger for the brain than for typical electronic devices (which tend to be highly modular in design, thereby decreasing their Φ), or, let’s say, than for other organs like the pancreas.  Ambitiously, he even speculates at length about how a large value of Φ might be connected to the phenomenology of consciousness.

To be sure, empirical work in integrated information theory has been hampered by three difficulties.  The first difficulty is that we don’t know the detailed interconnection network of the human brain.  The second difficulty is that it’s not even clear what we should define that network to be: for example, as a crude first attempt, should we assign a Boolean variable to each neuron, which equals 1 if the neuron is currently firing and 0 if it’s not firing, and let f be the function that updates those variables over a timescale of, say, a millisecond?  What other variables do we need—firing rates, internal states of the neurons, neurotransmitter levels?  Is choosing many of these variables uniformly at random (for the purpose of calculating Φ) really a reasonable way to “randomize” the variables, and if not, what other prescription should we use?

The third and final difficulty is that, even if we knew exactly what we meant by “the f and x corresponding to the human brain,” and even if we had complete knowledge of that f and x, computing Φ(f,x) could still be computationally intractable.  For recall that the definition of Φ involved minimizing a quantity over all the exponentially-many possible bipartitions of {1,…,n}.  While it’s not directly relevant to my arguments in this post, I leave it as a challenge for interested readers to pin down the computational complexity of approximating Φ to some reasonable precision, assuming that f is specified by a polynomial-size Boolean circuit, or alternatively, by an NC⁰ function (i.e., a function each of whose outputs depends on only a constant number of the inputs).  (Presumably Φ will be #P-hard to calculate exactly, but only because calculating entropy exactly is a #P-hard problem—that’s not interesting.)

I conjecture that approximating Φ is an NP-hard problem, even for restricted families of f’s like NC⁰ circuits—which invites the amusing thought that God, or Nature, would need to solve an NP-hard problem just to decide whether or not to imbue a given physical system with consciousness!  (Alas, if you wanted to exploit this as a practical approach for solving NP-complete problems such as 3SAT, you’d need to do a rather drastic experiment on your own brain—an experiment whose result would be to render you unconscious if your 3SAT instance was satisfiable, or conscious if it was unsatisfiable!  In neither case would you be able to communicate the outcome of the experiment to anyone else, nor would you have any recollection of the outcome after the experiment was finished.)  In the other direction, it would also be interesting to upper-bound the complexity of approximating Φ.  Because of the need to estimate the entropies of distributions (even given a bipartition (A,B)), I don’t know that this problem is in NP—the best I can observe is that it’s in AM.

In any case, my own reason for rejecting IIT has nothing to do with any of the “merely practical” issues above: neither the difficulty of defining f and x, nor the difficulty of learning them, nor the difficulty of calculating Φ(f,x).  My reason is much more basic, striking directly at the hypothesized link between “integrated information” and consciousness.  Specifically, I claim the following:

Yes, it might be a decent rule of thumb that, if you want to know which brain regions (for example) are associated with consciousness, you should start by looking for regions with lots of information integration.  And yes, it’s even possible, for all I know, that having a large Φ-value is one necessary condition among many for a physical system to be conscious.  However, having a large Φ-value is certainly not a sufficient condition for consciousness, or even for the appearance of consciousness.  As a consequence, Φ can’t possibly capture the essence of what makes a physical system conscious, or even of what makes a system look conscious to external observers.

The demonstration of this claim is embarrassingly simple.  Let S=F_p, where p is some prime sufficiently larger than n, and let V be an n×n Vandermonde matrix over F_p—that is, a matrix whose (i,j) entry equals i^(j-1) (mod p).  Then let f:S^n→S^n be the update function defined by f(x)=Vx.  Now, for p large enough, the Vandermonde matrix is well-known to have the property that every submatrix is full-rank (i.e., “every submatrix preserves all the information that it’s possible to preserve about the part of x that it acts on”).  And this implies that, regardless of which bipartition (A,B) of {1,…,n} we choose, we’ll get

EI(A→B) = EI(B→A) = min{|A|,|B|} log₂ p,

and hence

Φ(A,B) = EI(A→B) + EI(B→A) = 2 min{|A|,|B|} log₂ p,

or after normalizing,

Φ(A,B) / min{|A|,|B|} = 2 log₂ p.

Or in words: the normalized information integration has the same value—namely, the maximum value!—for every possible bipartition.  Now, I’d like to proceed from here to a determination of Φ itself, but I’m prevented from doing so by the ambiguity in the definition of Φ that I noted earlier.  Namely, since every bipartition (A,B) minimizes the normalized value Φ(A,B) / min{|A|,|B|}, in theory I ought to be able to pick any of them for the purpose of calculating Φ.  But the unnormalized value Φ(A,B), which gives the final Φ, can vary greatly, across bipartitions: from 2 log₂ p (if min{|A|,|B|}=1) all the way up to n log₂ p (if min{|A|,|B|}=n/2).  So at this point, Φ is simply undefined.
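If you’d like to check this numerically for small n, here’s a sketch.  It uses the fact that for the linear map f(x)=Vx over F_p, with y_A uniform and y_B clamped, z_B is uniformly distributed over a coset of the column space of the submatrix V_{B,A} (rows in B, columns in A), so EI(A→B) = rank(V_{B,A})·log₂ p.  The rank helper is mine; for small n and a large prime p, the every-submatrix-full-rank property should hold.

```python
# Numerical check (small n) that every bipartition of the Vandermonde system
# saturates Phi(A,B) = 2*min{|A|,|B|}*log2(p).  Assumes EI(A->B) equals the
# rank of the submatrix V_{B,A} (rows in B, columns in A) times log2(p).
import itertools
from math import log2

def rank_mod_p(M, p):
    """Rank of an integer matrix over F_p, by Gaussian elimination (p prime)."""
    M = [row[:] for row in M]
    rank = 0
    for c in range(len(M[0]) if M else 0):
        pivot = next((r for r in range(rank, len(M)) if M[r][c] % p), None)
        if pivot is None:
            continue
        M[rank], M[pivot] = M[pivot], M[rank]
        inv = pow(M[rank][c], p - 2, p)            # modular inverse
        M[rank] = [v * inv % p for v in M[rank]]
        for r in range(len(M)):
            if r != rank and M[r][c] % p:
                M[r] = [(a - M[r][c] * b) % p for a, b in zip(M[r], M[rank])]
        rank += 1
    return rank

n, p = 6, 10**9 + 7                                 # p prime, much larger than n
# Row i (i = 1..n), column j (j = 1..n): entry i^(j-1) mod p.
V = [[pow(i, j, p) for j in range(n)] for i in range(1, n + 1)]

ok = True
for r in range(1, n // 2 + 1):
    for A in itertools.combinations(range(n), r):
        B = [j for j in range(n) if j not in A]
        ei_ab = rank_mod_p([[V[i][j] for j in A] for i in B], p) * log2(p)
        ei_ba = rank_mod_p([[V[i][j] for j in B] for i in A], p) * log2(p)
        ok = ok and abs(ei_ab + ei_ba - 2 * min(len(A), len(B)) * log2(p)) < 1e-6
print("every bipartition saturates 2*min{|A|,|B|}*log2(p):", ok)
```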

On the other hand, I can solve this problem, and make Φ well-defined, by an ironic little hack.  The hack is to replace the Vandermonde matrix V by an n×n matrix W, which consists of the first n/2 rows of the Vandermonde matrix each repeated twice (assume for simplicity that n is a multiple of 4).  As before, we let f(x)=Wx.  Then if we set A={1,…,n/2} and B={n/2+1,…,n}, we can achieve

EI(A→B) = EI(B→A) = (n/4) log₂ p,

Φ(A,B) = EI(A→B) + EI(B→A) = (n/2) log₂ p,

and hence

Φ(A,B) / min{|A|,|B|} = log₂ p.

In this case, I claim that the above is the unique bipartition that minimizes the normalized integrated information Φ(A,B) / min{|A|,|B|}, up to trivial reorderings of the rows.  To prove this claim: if |A|=|B|=n/2, then clearly we minimize Φ(A,B) by maximizing the number of repeated rows in A and the number of repeated rows in B, exactly as we did above.  Thus, assume |A|≤|B| (the case |B|≤|A| is analogous).  Then clearly

EI(B→A) ≥ (|A|/2) log₂ p,

while

EI(A→B) ≥ min{|A|, |B|/2} log₂ p.

So if we let |A|=cn and |B|=(1-c)n for some c∈(0,1/2], then

Φ(A,B) ≥ [c/2 + min{c, (1-c)/2}] n log₂ p,

and

Φ(A,B) / min{|A|,|B|} = Φ(A,B) / |A| ≥ [1/2 + min{1, 1/(2c) - 1/2}] log₂ p.

But the above lower bound is uniquely minimized when c=1/2, where it equals log₂ p (exactly the value achieved by the bipartition chosen above).  Hence the normalized integrated information is minimized essentially uniquely by setting A={1,…,n/2} and B={n/2+1,…,n}, and we get

Φ = Φ(A,B) = (n/2) log₂ p,

which is quite a large value (only a factor of 2 less than the trivial upper bound of n log₂ p).
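Here’s the matching sanity check for the repeated-rows system, under the same rank-based reading of EI for linear maps over F_p.  The rank helper is repeated from the previous sketch so this snippet runs on its own; n and p are small illustrative values of my choosing.

```python
# Sanity check for f(x) = Wx, where W is the first n/2 Vandermonde rows each
# repeated twice: scan all bipartitions, pick the one minimizing the normalized
# Phi(A,B)/min{|A|,|B|}, and compare its unnormalized value to (n/2)*log2(p).
import itertools
from math import log2

def rank_mod_p(M, p):
    """Rank of an integer matrix over F_p (same helper as in the sketch above)."""
    M = [row[:] for row in M]
    rank = 0
    for c in range(len(M[0]) if M else 0):
        pivot = next((r for r in range(rank, len(M)) if M[r][c] % p), None)
        if pivot is None:
            continue
        M[rank], M[pivot] = M[pivot], M[rank]
        inv = pow(M[rank][c], p - 2, p)
        M[rank] = [v * inv % p for v in M[rank]]
        for r in range(len(M)):
            if r != rank and M[r][c] % p:
                M[r] = [(a - M[r][c] * b) % p for a, b in zip(M[r], M[rank])]
        rank += 1
    return rank

n, p = 8, 10**9 + 7                        # n a multiple of 4, p a large prime
vrow = lambda i: [pow(i, j, p) for j in range(n)]
W = [vrow(i // 2 + 1) for i in range(n)]   # rows 1..n/2 of V, each repeated twice

def ei(A, B):
    """Rank-based EI(A->B) for the linear map W, in bits."""
    return rank_mod_p([[W[i][j] for j in A] for i in B], p) * log2(p)

best = None
for r in range(1, n // 2 + 1):
    for A in itertools.combinations(range(n), r):
        B = [j for j in range(n) if j not in A]
        val = ei(A, B) + ei(B, A)
        norm = val / min(len(A), len(B))
        if best is None or norm < best[0]:
            best = (norm, val)
print("Phi found:", best[1], " predicted (n/2)*log2(p):", (n / 2) * log2(p))
```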

Now, why did I call the switch from V to W an “ironic little hack”?  Because, in order to ensure a large value of Φ, I decreased—by a factor of 2, in fact—the amount of “information integration” that was intuitively happening in my system!  I did that in order to decrease the normalized value Φ(A,B) / min{|A|,|B|} for the particular bipartition (A,B) that I cared about, thereby ensuring that that (A,B) would be chosen over all the other bipartitions, thereby increasing the final, unnormalized value Φ(A,B) that Tononi’s prescription tells me to return.  I hope I’m not alone in fearing that this illustrates a disturbing non-robustness in the definition of Φ.

But let’s leave that issue aside; maybe it can be ameliorated by fiddling with the definition.  The broader point is this: I’ve shown that my system—the system that simply applies the matrix W to an input vector x—has an enormous amount of integrated information Φ.  Indeed, this system’s Φ equals half of its entire information content.  So for example, if n were 10^14 or so—something that wouldn’t be hard to arrange with existing computers—then this system’s Φ would exceed any plausible upper bound on the integrated information content of the human brain.

And yet this Vandermonde system doesn’t even come close to doing anything that we’d want to call intelligent, let alone conscious!  When you apply the Vandermonde matrix to a vector, all you’re really doing is mapping the list of coefficients of a degree-(n-1) polynomial over F_p, to the values of the polynomial on the n points 1,2,…,n.  Now, evaluating a polynomial on a set of points turns out to be an excellent way to achieve “integrated information,” with every subset of outputs as correlated with every subset of inputs as it could possibly be.  In fact, that’s precisely why polynomials are used so heavily in error-correcting codes, such as the Reed-Solomon code, employed (among many other places) in CD’s and DVD’s.  But that doesn’t imply that every time you start up your DVD player you’re lighting the fire of consciousness.  It doesn’t even hint at such a thing.  All it tells us is that you can have integrated information without consciousness (or even intelligence)—just like you can have computation without consciousness, and unpredictability without consciousness, and electricity without consciousness.

It might be objected that, in defining my “Vandermonde system,” I was too abstract and mathematical.  I said that the system maps the input vector x to the output vector Wx, but I didn’t say anything about how it did so.  To perform a computation—even a computation as simple as a matrix-vector multiply—won’t we need a physical network of wires, logic gates, and so forth?  And in any realistic such network, won’t each logic gate be directly connected to at most a few other gates, rather than to billions of them?  And if we define the integrated information Φ, not directly in terms of the inputs and outputs of the function f(x)=Wx, but in terms of all the actual logic gates involved in computing f, isn’t it possible or even likely that Φ will go back down?

This is a good objection, but I don’t think it can rescue IIT.  For we can achieve the same qualitative effect that I illustrated with the Vandermonde matrix—the same “global information integration,” in which every large set of outputs depends heavily on every large set of inputs—even using much “sparser” computations, ones where each individual output depends on only a few of the inputs.  This is precisely the idea behind low-density parity check (LDPC) codes, which have had a major impact on coding theory over the past two decades.  Of course, one would need to muck around a bit to construct a physical system based on LDPC codes whose integrated information Φ was provably large, and for which there were no wildly-unbalanced bipartitions that achieved lower Φ(A,B)/min{|A|,|B|} values than the balanced bipartitions one cared about.  But I feel safe in asserting that this could be done, similarly to how I did it with the Vandermonde matrix.

More generally, we can achieve pretty good information integration by hooking together logic gates according to any bipartite expander graph: that is, any graph with n vertices on each side, such that every k vertices on the left side are connected to at least min{(1+ε)k,n} vertices on the right side, for some constant ε>0.  And it’s well-known how to create expander graphs whose degree (i.e., the number of edges incident to each vertex, or the number of wires coming out of each logic gate) is a constant, such as 3.  One can do so either by plunking down edges at random, or (less trivially) by explicit constructions from algebra or combinatorics.  And as indicated in the title of this post, I feel 100% confident in saying that the so-constructed expander graphs are not conscious!  The brain might be an expander, but not every expander is a brain.
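For what it’s worth, here is a toy Python sketch of the kind of sparse Boolean network the last two paragraphs describe: each output bit is the XOR of three randomly chosen input bits. A random sparse wiring like this is an expander only with high probability; the sketch certifies nothing and carries out none of the Φ analysis. It is only meant to show how little machinery such a network needs.

```python
# Toy sketch: a sparse XOR network in which each output bit depends on only 3 inputs,
# wired at random.  Random sparse bipartite graphs of this kind are expanders with high
# probability; this sketch does not certify expansion and is purely illustrative.
import random

n = 1000
wiring = [random.sample(range(n), 3) for _ in range(n)]   # 3 input wires per output gate

def apply_network(x):
    """Map an n-bit input to an n-bit output; output i is the XOR of its 3 chosen inputs."""
    return [x[a] ^ x[b] ^ x[c] for (a, b, c) in wiring]

x = [random.randrange(2) for _ in range(n)]
y = apply_network(x)
print(sum(y), "of", n, "output bits are 1")
```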

Before winding down this post, I can’t resist telling you that the concept of integrated information (though it wasn’t called that) played an interesting role in computational complexity in the 1970s.  As I understand the history, Leslie Valiant conjectured that Boolean functions f:{0,1}^n→{0,1}^n with a high degree of “information integration” (such as discrete analogues of the Fourier transform) might be good candidates for proving circuit lower bounds, which in turn might be baby steps toward P≠NP.  More strongly, Valiant conjectured that the property of information integration, all by itself, implied that such functions had to be at least somewhat computationally complex—i.e., that they couldn’t be computed by circuits of size O(n), or even that they required circuits of size Ω(n log n).  Alas, that hope was refuted by Valiant’s later discovery of linear-size superconcentrators.  Just as information integration doesn’t suffice for intelligence or consciousness, so Valiant learned that information integration doesn’t suffice for circuit lower bounds either.

As humans, we seem to have the intuition that global integration of information is such a powerful property that no “simple” or “mundane” computational process could possibly achieve it.  But our intuition is wrong.  If it were right, then we wouldn’t have linear-size superconcentrators or LDPC codes.

I should mention that I had the privilege of briefly speaking with Giulio Tononi (as well as his collaborator, Christof Koch) this winter at an FQXi conference in Puerto Rico.  At that time, I challenged Tononi with a much cruder, handwavier version of some of the same points that I made above.  Tononi’s response, as best as I can reconstruct it, was that it’s wrong to approach IIT like a mathematician; instead one needs to start “from the inside,” with the phenomenology of consciousness, and only then try to build general theories that can be tested against counterexamples.  This response perplexed me: of course you can start from phenomenology, or from anything else you like, when constructing your theory of consciousness.  However, once your theory has been constructed, surely it’s then fair game for others to try to refute it with counterexamples?  And surely the theory should be judged, like anything else in science or philosophy, by how well it withstands such attacks?

But let me end on a positive note.  In my opinion, the fact that Integrated Information Theory is wrong—demonstrably wrong, for reasons that go to its core—puts it in something like the top 2% of all mathematical theories of consciousness ever proposed.  Almost all competing theories of consciousness, it seems to me, have been so vague, fluffy, and malleable that they can only aspire to wrongness.

[Endnote: See also this related post, by the philosopher Eric Schwitzgebel: Why Tononi Should Think That the United States Is Conscious.  While the discussion is much more informal, and the proposed counterexample more debatable, the basic objection to IIT is the same.]


Update (5/22): Here are a few clarifications of this post that might be helpful.

(1) The stuff about zombies and the Hard Problem was simply meant as motivation and background for what I called the “Pretty-Hard Problem of Consciousness”—the problem that I take IIT to be addressing.  You can disagree with the zombie stuff without it having any effect on my arguments about IIT.

(2) I wasn’t arguing in this post that dualism is true, or that consciousness is irreducibly mysterious, or that there could never be any convincing theory that told us how much consciousness was present in a physical system.  All I was arguing was that, at any rate, IIT is not such a theory.

(3) Yes, it’s true that my demonstration of IIT’s falsehood assumes—as an axiom, if you like—that while we might not know exactly what we mean by “consciousness,” at any rate we’re talking about something that humans have to a greater extent than DVD players.  If you reject that axiom, then I’d simply want to define a new word for a certain quality that non-anesthetized humans seem to have and that DVD players seem not to, and clarify that that other quality is the one I’m interested in.

(4) For my counterexample, the reason I chose the Vandermonde matrix is not merely that it’s invertible, but that all of its submatrices are full-rank.  This is the property that’s relevant for producing a large value of the integrated information Φ; by contrast, note that the identity matrix is invertible, but produces a system with Φ=0.  (As another note, if we work over a large enough field, then a random matrix will have this same property with high probability—but I wanted an explicit example, and while the Vandermonde is far from the only one, it’s one of the simplest.)  (A small numerical illustration of this contrast appears after point (6) below.)

(5) The n×n Vandermonde matrix only does what I want if we work over (say) a prime field F_p with p>>n elements.  Thus, it’s natural to wonder whether similar examples exist where the basic system variables are bits, rather than elements of F_p.  The answer is yes. One way to get such examples is using the low-density parity check codes that I mention in the post.  Another common way to get Boolean examples, and which is also used in practice in error-correcting codes, is to start with the Vandermonde matrix (a.k.a. the Reed-Solomon code), and then combine it with an additional component that encodes the elements of F_p as strings of bits in some way.  Of course, you then need to check that doing this doesn’t harm the properties of the original Vandermonde matrix that you cared about (e.g., the “information integration”) too much, which causes some additional complication.

(6) Finally, it might be objected that my counterexamples ignored the issue of dynamics and “feedback loops”: they all consisted of unidirectional processes, which map inputs to outputs and then halt.  However, this can be fixed by the simple expedient of iterating the process over and over!  I.e., first map x to Wx, then map Wx to W^2x, and so on.  The integrated information should then be the same as in the unidirectional case.
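Here is the small illustration promised under point (4), with arbitrarily chosen parameters and nothing beyond Gaussian elimination mod p. It probes the Reed-Solomon flavor of the property: any k distinct evaluation points yield an invertible k×k Vandermonde submatrix mod p, whereas a randomly chosen square submatrix of the identity matrix is usually far from full rank. (This checks only a special family of submatrices, not the full claim in point (4).)

```python
# Illustrative contrast for point (4): Vandermonde-style submatrices are invertible mod p
# (any k distinct evaluation points determine a degree-(k-1) polynomial), while random
# square submatrices of the identity matrix are usually rank-deficient.
import random

def rank_mod_p(M, p):
    """Rank of an integer matrix over F_p, computed by Gaussian elimination."""
    M = [[x % p for x in row] for row in M]
    rank = 0
    for col in range(len(M[0])):
        pivot = next((r for r in range(rank, len(M)) if M[r][col]), None)
        if pivot is None:
            continue
        M[rank], M[pivot] = M[pivot], M[rank]
        inv = pow(M[rank][col], p - 2, p)      # modular inverse (p is prime)
        M[rank] = [(v * inv) % p for v in M[rank]]
        for r in range(len(M)):
            if r != rank and M[r][col]:
                f = M[r][col]
                M[r] = [(a - f * b) % p for a, b in zip(M[r], M[rank])]
        rank += 1
    return rank

p, N, k = 10007, 12, 6                                      # arbitrary illustrative parameters
G = [[pow(i, j, p) for j in range(k)] for i in range(N)]    # N evaluation points, k coefficients

for _ in range(3):
    pts = random.sample(range(N), k)                        # any k distinct evaluation points...
    print("Vandermonde submatrix rank:", rank_mod_p([G[i] for i in pts], p))   # always k
    rows, cols = random.sample(range(N), k), random.sample(range(N), k)
    sub_I = [[1 if r == c else 0 for c in cols] for r in rows]                 # ...vs. the identity
    print("identity submatrix rank:   ", rank_mod_p(sub_I, p))                 # usually < k
```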


Update (5/24): See a very interesting comment by David Chalmers.

The NEW Ten Most Annoying Questions in Quantum Computing

May 13th, 2014

Eight years ago, I put up a post entitled The Ten Most Annoying Questions in Quantum Computing.  One of the ten wasn’t a real question—it was simply a request for readers to submit questions—so let’s call it nine.  I’m delighted to say that, of the nine questions, six have by now been completely settled—most recently, my question about the parallel-repeated value of the CHSH game, which Andris Ambainis pointed out to me last week can be answered using a 2008 result of Barak et al. combined with a 2013 result of Dinur and Steurer.

To be clear, the demise of so many problems is exactly the outcome I wanted. In picking problems, my goal wasn’t to shock and awe with difficulty—as if to say “this is how smart I am, that whatever stumps me will also stump everyone else for decades.” Nor was it to showcase my bottomless profundity, by proffering questions so vague, multipartite, and open-ended that no matter what progress was made, I could always reply “ah, but you still haven’t addressed the real question!” Nor, finally, was my goal to list the biggest research directions for the entire field, the stuff everyone already knows about (“is there a polynomial-time quantum algorithm for graph isomorphism?”). My interest was exclusively in “little” questions, in weird puzzles that looked (at least at the time) like there was no deep obstruction to just killing them one by one, whichever way their answers turned out. What made them annoying was that they hadn’t succumbed already.

So, now that two-thirds of my problems have met the fate they deserved, at Andris’s suggestion I’m presenting a new list of Ten Most Annoying Questions in Quantum Computing—a list that starts with the three still-unanswered questions from the old list, and then adds seven more.

But we’ll get to that shortly. First, let’s review the six questions that have been answered.


CLOSED, NO-LONGER ANNOYING QUESTIONS IN QUANTUM COMPUTING

1. Given an n-qubit pure state, is there always a way to apply Hadamard gates to some subset of the qubits, so as to make all 2^n computational basis states have nonzero amplitudes?  Positive answer by Ashley Montanaro and Dan Shepherd, posted to this blog in 2006.

3. Can any QMA(2) (QMA with two unentangled yes-provers) protocol be amplified to exponentially small error probability?  Positive answer by Aram Harrow and Ashley Montanaro, from a FOCS’2010 paper.

4. If a unitary operation U can be applied in polynomial time, then can some square root of U also be applied in polynomial time?  Positive answer by Lana Sheridan, Dmitri Maslov, and Michele Mosca, from a 2008 paper.

5. Suppose Alice and Bob are playing n parallel CHSH games, with no communication or entanglement. Is the probability that they’ll win all n games at most p^n, for some p bounded below 0.853?

OK, let me relay what Andris Ambainis told me about this question, with Andris’s kind permission. First of all, we’ve known for a while that the optimal success probability is not the (3/4)^n that Alice and Bob could trivially achieve by just playing all n games separately (a brute-force check of the single-game value 3/4 appears after this list). I observed in 2006 that, by correlating their strategies between pairs of games in a clever way, Alice and Bob can win with probability (√10 / 4)^n ~ 0.79^n. And Barak et al. showed in 2008 that they can win with probability ((1+√5)/4)^n ~ 0.81^n. (Unfortunately, I don’t know the actual strategy that achieves the latter bound!  Barak et al. say they’ll describe it in the full version of their paper, but the full version hasn’t yet appeared.)

Anyway, Dinur-Steurer 2013 gave a general recipe to prove that the value of a repeated projection game is at most α^n, where α is some constant that depends on the game in question. When Andris followed their recipe for the CHSH game, he obtained the result α=(1+√5)/4—thereby showing that Barak et al.’s strategy, whatever it is, is precisely optimal! Andris also observes that, for any two-prover game G, the Dinur-Steurer bound α(G) is always strictly less than the entangled value ω*(G), unless the classical and entangled values are the same for one copy of the game (i.e., unless ω(G)=ω*(G)). This implies that parallel repetition can never completely eliminate a quantum advantage.

6. Forget about an oracle relative to which BQP is not in PH (the Polynomial Hierarchy). Forget about an oracle relative to which BQP is not in AM (Arthur-Merlin). Is there an oracle relative to which BQP is not in SZK (Statistical Zero-Knowledge)?  Positive answer by me, posted to this blog in 2006.  See also my BQP vs. PH paper for a different proof.

9. Is there an n-qubit pure state that can be prepared by a circuit of size n^3, and that can’t be distinguished from the maximally mixed state by any circuit of size n^2?  A positive answer follows from this 2009 paper by Richard Low—thanks very much to Fernando Brandao for bringing that to my attention a few months ago.
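Since the discussion under question 5 keeps returning to the single-game value 3/4, here is the brute-force check promised there, as a few lines of Python: Alice gets x, Bob gets y, they output bits a and b with no communication, and they win iff a⊕b = x∧y. Shared randomness is just a convex combination of deterministic strategies, so enumerating all 16 deterministic strategies settles the classical value.

```python
# Brute-force check of the single-game classical CHSH value (3/4), the base of the
# trivial (3/4)^n parallel strategy discussed under question 5 above.
from itertools import product

best = 0.0
for a0, a1, b0, b1 in product(range(2), repeat=4):     # Alice's answers to x=0,1; Bob's to y=0,1
    alice, bob = (a0, a1), (b0, b1)
    wins = sum((alice[x] ^ bob[y]) == (x & y) for x, y in product(range(2), repeat=2))
    best = max(best, wins / 4.0)

print("classical CHSH value =", best)                  # -> 0.75
```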


OK, now on to:

THE NEW TEN MOST ANNOYING QUESTIONS IN QUANTUM COMPUTING

1. Can we get any upper bound whatsoever on the complexity class QMIP—i.e., quantum multi-prover interactive proofs with unlimited prior entanglement? (Since I asked this question in 2006, Ito and Vidick achieved the breakthrough lower bound NEXP⊆QMIP, but there’s been basically no progress on the upper bound side.)

2. Given any n-qubit unitary operation U, does there exist an oracle relative to which U can be (approximately) applied in polynomial time? (Since 2006, my interest in this question has only increased. See this paper by me and Greg Kuperberg for background and related results.)

3. How many mutually unbiased bases are there in non-prime-power dimensions?

4. Since Chris Fuchs was so thrilled by my including one of his favorite questions on my earlier list (question #3 above), let me add another of his favorites: do SIC-POVMs exist in arbitrary finite dimensions?

5. Is there a Boolean function f:{0,1}^n→{0,1} whose bounded-error quantum query complexity is strictly greater than n/2?  (Thanks to Shelby Kimmel for this question!  Note that this paper by van Dam shows that the bounded-error quantum query complexity never exceeds n/2+O(√n), while this paper by Ambainis et al. shows that it’s at least n/2-O(√n) for almost all Boolean functions f.)

6. Is there a “universal disentangler”: that is, a superoperator S that takes n^O(1) qubits as input; that produces a 2n-qubit bipartite state (with n qubits on each side) as output; whose output S(ρ) is always close in variation distance to a separable state; and that given an appropriate input state, can produce as output an approximation to any desired separable state?  (See here for background about this problem, originally posed by John Watrous. Note that if such an S existed and were computationally efficient, it would imply QMA=QMA(2).)

7. Suppose we have explicit descriptions of n two-outcome POVM measurements—say, as d×d Hermitian matrices E_1,…,E_n—and are also given k=(log(nd))^O(1) copies of an unknown quantum state ρ in d dimensions.  Is there a way to measure the copies so as to estimate the n expectation values Tr(E_1ρ),…,Tr(E_nρ), each to constant additive error?  (A forthcoming paper of mine on private-key quantum money will contain some background and related results.)

8. Is there a collection of 1- and 2-qubit gates that generates a group of unitary matrices that is (a) not universal for quantum computation, (b) not just conjugate to permuted diagonal matrices or one-qubit gates plus swaps, and (c) not conjugate to a subgroup of the Clifford group?

9. Given a partial Boolean function f:S→{0,1} with S⊆{0,1}^n, is the bounded-error quantum query complexity of f always polynomially related to the smallest degree of any polynomial p:{0,1}^n→R such that (a) p(x)∈[0,1] for all x∈{0,1}^n, and (b) |p(x)-f(x)|≤1/3 for all x∈S?

10. Is there a quantum finite automaton that reads in an infinite sequence of i.i.d. coin flips, and whose limiting probability of being found in an “accept” state is at least 2/3 if the coin is fair and at most 1/3 if the coin is unfair?  (See this paper by me and Andy Drucker for background and related results.)

The Quest for Randomness

April 22nd, 2014

So, I’ve written an article of that title for the wonderful American Scientist magazine—or rather, Part I of such an article.  This part explains the basics of Kolmogorov complexity and algorithmic information theory: how, under reasonable assumptions, these ideas can be used in principle to “certify” that a string of numbers was really produced randomly—something that one might’ve imagined impossible a priori.  Unfortunately, the article also explains why this fact is of limited use in practice: because Kolmogorov complexity is uncomputable!  Readers who already know this material won’t find much that’s new here, but I hope those who don’t will enjoy the piece.

Part II, to appear in the next issue, will be all about quantum entanglement and Bell’s Theorem, and their very recent use in striking protocols for generating so-called “Einstein-certified random numbers”—something of much more immediate practical interest.

Thanks so much to Fenella Saunders of American Scientist for commissioning these articles, and my apologies to her and any interested readers for the 4.5 years (!) it took me to get off my rear end (or rather, onto it) to write these things.


Update (4/28): Kate Becker of NOVA has published an article about “whether information is fundamental to reality,” which includes some quotes from me. Enjoy!

Is There Anything Beyond Quantum Computing?

April 11th, 2014

So I’ve written an article about the above question for PBS’s website—a sort of tl;dr version of my 2005 survey paper NP-Complete Problems and Physical Reality, but updated with new material about the simulation of quantum field theories and about AdS/CFT.  Go over there, read the article (it’s free), then come back here to talk about it if you like.  Thanks so much to Kate Becker for commissioning the article.

In other news, there’s a profile of me at MIT News (called “The Complexonaut”) that some people might find amusing.

Oh, and anyone who thinks the main reason to care about quantum computing is that, if our civilization ever manages to surmount the profound scientific and technological obstacles to building a scalable quantum computer, then that little padlock icon on your web browser would no longer represent ironclad security?  Ha ha.  Yeah, it turns out that, besides factoring integers, you can also break OpenSSL by (for example) exploiting a memory bug in C.  The main reason to care about quantum computing is, and has always been, science.

Waiting for BQP Fever

April 1st, 2014

Update (April 5): By now, three or four people have written in asking for my reaction to the preprint “Computational solution to quantum foundational problems” by Arkady Bolotin.  (See here for the inevitable Slashdot discussion, entitled “P vs. NP Problem Linked to the Quantum Nature of the Universe.”)  It gives me no pleasure to respond to this sort of thing—it would be far better to let papers this gobsmackingly uninformed about the relevant issues fade away in quiet obscurity—but since that no longer seems to be possible in the age of social media, my brief response is here.


(note: sorry, no April Fools post, just a post that happens to have gone up on April Fools)

This weekend, Dana and I celebrated our third anniversary by going out to your typical sappy romantic movie: Particle Fever, a documentary about the Large Hadron Collider.  As it turns out, the movie was spectacularly good; anyone who reads this blog should go see it.  Or, to offer even higher praise:

If watching Particle Fever doesn’t cause you to feel in your bones the value of fundamental science—the thrill of discovery, unmotivated by any application—then you are not truly human.  You are a barnyard animal who happens to walk on its hind legs.

Indeed, I regard Particle Fever as one of the finest advertisements for science itself ever created.  It’s effective precisely because it doesn’t try to tell you why science is important (except for one scene, where an economist asks a physicist after a public talk about the “return on investment” of the LHC, and is given the standard correct answer, about “what was the return on investment of radio waves when they were first discovered?”).  Instead, the movie simply shows you the lives of particle physicists, of people who take for granted the urgency of knowing the truth about the basic constituents of reality.  And in showing you the scientists’ quest, it makes you feel as they feel.  Incidentally, the movie also shows footage of Congressmen ridiculing the uselessness of the Superconducting Supercollider, during the debates that led to the SSC’s cancellation.  So, gently, implicitly, you’re invited to choose: whose side are you on?

I do have a few, not quite criticisms of the movie, but points that any viewer should bear in mind while watching it.

First, it’s important not to come away with the impression that Particle Fever shows “what science is usually like.”  Sure, there are plenty of scenes that any scientist would find familiar: sleep-deprived postdocs; boisterous theorists correcting each other’s statements over Chinese food; a harried lab manager walking to the office oblivious to traffic.  On the other hand, the decades-long quest to find the Higgs boson, the agonizing drought of new data before the one big money shot, the need for an entire field to coalesce around a single machine, the whole careers hitched to specific speculative scenarios that this one machine could favor or disfavor—all of that is a profoundly abnormal situation in the history of science.  Particle physics didn’t used to be that way, and other parts of science are not that way today.  Of course, the fact that particle physics became that way makes it unusually suited for a suspenseful movie—a fact that the creators of Particle Fever understood perfectly and exploited to the hilt.

Second, the movie frames the importance of the Higgs search as follows: if the Higgs boson turned out to be relatively light, like 115 GeV, then that would favor supersymmetry, and hence an “elegant, orderly universe.”  If, on the other hand, the Higgs turned out to be relatively heavy, like 140 GeV, then that would favor anthropic multiverse scenarios (and hence a “messy, random universe”).  So the fact that the Higgs ended up being 125 GeV means the universe is coyly refusing to tell us whether it’s orderly or random, and more research is needed.

In my view, it’s entirely appropriate for a movie like this one to relate its subject matter to big, metaphysical questions, to the kinds of questions anyone can get curious about (in contrast to, say, “what is the mechanism of electroweak symmetry breaking?”) and that the scientists themselves talk about anyway.  But caution is needed here.  My lay understanding, which might be wrong, is as follows: while it’s true that a lighter Higgs would tend to favor supersymmetric models, the only way to argue that a heavier Higgs would “favor the multiverse,” is if you believe that a multiverse is automatically favored by a lack of better explanations.  More broadly, I wish the film had made clearer that the explanation for (some) apparent “fine-tunings” in the Standard Model might be neither supersymmetry, nor the multiverse, nor “it’s just an inexplicable accident,” but simply some other explanation that no one has thought of yet, but that would emerge from a better understanding of quantum field theory.  As one example, on reading up on the subject after watching the film, I was surprised to learn that a very conservative-sounding idea—that of “asymptotically safe gravity”—was used in 2009 to predict the Higgs mass right on the nose, at 126.3 ± 2.2 GeV.  Of course, it’s possible that this was just a lucky guess (there were, after all, lots of Higgs mass predictions).  But as an outsider, I’d love to understand why possibilities like this don’t seem to get discussed more (there might, of course, be perfectly good reasons that I don’t know).

Third, for understandable dramatic reasons, the movie focuses almost entirely on the “younger generation,” from postdocs working on ATLAS and CMS detectors, to theorists like Nima Arkani-Hamed who are excited about the LHC because of its ability to test scenarios like supersymmetry.  From the movie’s perspective, the creation of the Standard Model itself, in the 60s and 70s, might as well be ancient history.  Indeed, when Peter Higgs finally appears near the end of the film, it’s as if Isaac Newton has walked onstage.  At several points, I found myself wishing that some of the original architects of the Standard Model, like Steven Weinberg or Sheldon Glashow, had been interviewed to provide their perspectives.  After all, their model is really the one that’s been vindicated at the LHC, not (so far) any of the newer ideas like supersymmetry or large extra dimensions.

OK, but let me come to the main point of this post.  I confess that my overwhelming emotion on watching Particle Fever was one of regret—regret that my own field, quantum computing, has never managed to make the case for itself the way particle physics and cosmology have, in terms of the human urge to explore the unknown.

See, from my perspective, there’s a lot to envy about the high-energy physicists.  Most importantly, they don’t perceive any need to justify what they do in terms of practical applications.  Sure, they happily point to “spinoffs,” like the fact that the Web was invented at CERN.  But any time they try to justify what they do, the unstated message is that if you don’t see the inherent value of understanding the universe, then the problem lies with you.

Now, no marketing consultant would ever in a trillion years endorse such an out-of-touch, elitist sales pitch.  But the remarkable fact is that the message has more-or-less worked.  While the cancellation of the SSC was a setback, the high-energy physicists did succeed in persuading the world to pony up the $11 billion needed to build the LHC, and to gain the information that the mass of the Higgs boson is about 125 GeV.

Now contrast that with quantum computing.  To hear the media tell it, a quantum computer would be a powerful new gizmo, sort of like existing computers except faster.  (Why would it be faster?  Something to do with trying both 0 and 1 at the same time.)  The reasons to build quantum computers are things that could make any buzzword-spouting dullard nod in recognition: cracking uncrackable encryption, finding bugs in aviation software, sifting through massive data sets, maybe even curing cancer, predicting the weather, or finding aliens.  And all of this could be yours in a few short years—or some say it’s even commercially available today.  So, if you check back in a few years and it’s still not on store shelves, probably it went the way of flying cars or moving sidewalks: another technological marvel that just failed to materialize for some reason.

Foolishly, shortsightedly, many academics in quantum computing have played along with this stunted vision of their field—because saying this sort of thing is the easiest way to get funding, because everyone else says the same stuff, and because after you’ve repeated something on enough grant applications you start to believe it yourself.  All in all, then, it’s just easier to go along with the “gizmo vision” of quantum computing than to ask pointed questions like:

What happens when it turns out that some of the most-hyped applications of quantum computers (e.g., optimization, machine learning, and Big Data) were based on wildly inflated hopes—that there simply isn’t much quantum speedup to be had for typical problems of that kind, that yes, quantum algorithms exist, but they aren’t much faster than the best classical randomized algorithms?  What happens when it turns out that the real applications of quantum computing—like breaking RSA and simulating quantum systems—are nice, but not important enough by themselves to justify the cost?  (E.g., when the imminent risk of a quantum computer simply causes people to switch from RSA to other cryptographic codes?  Or when the large polynomial overheads of quantum simulation algorithms limit their usefulness?)  Finally, what happens when it turns out that the promises of useful quantum computers in 5-10 years were wildly unrealistic?

I’ll tell you: when this happens, the spigots of funding that once flowed freely will dry up, and the techno-journalists and pointy-haired bosses who once sang our praises will turn to the next craze.  And they’re unlikely to be impressed when we protest, “no, look, the reasons we told you before for why you should support quantum computing were never the real reasons!  and the real reasons remain as valid as ever!”

In my view, we as a community have failed to make the honest case for quantum computing—the case based on basic science—because we’ve underestimated the public.  We’ve falsely believed that people would never support us if we told them the truth: that while the potential applications are wonderful cherries on the sundae, they’re not and have never been the main reason to build a quantum computer.  The main reason is that we want to make absolutely manifest what quantum mechanics says about the nature of reality.  We want to lift the enormity of Hilbert space out of the textbooks, and rub its full, linear, unmodified truth in the face of anyone who denies it.  Or if it isn’t the truth, then we want to discover what is the truth.

Many people would say it’s impossible to make the latter pitch, that funders and laypeople would never understand it or buy it.  But there’s an $11-billion, 17-mile ring under Geneva that speaks against their cynicism.

Anyway, let me end this “movie review” with an anecdote.  The other day a respected colleague of mine—someone who doesn’t normally follow such matters—asked me what I thought about D-Wave.  After I’d given my usual spiel, he smiled and said:

“See Scott, but you could imagine scientists of the 1400s saying the same things about Columbus!  He had no plan that could survive academic scrutiny.  He raised money under the false belief that he could reach India by sailing due west.  And he didn’t understand what he’d found even after he’d found it.  Yet for all that, it was Columbus, and not some academic critic on the sidelines, who discovered the new world.”

With this one analogy, my colleague had eloquently summarized the case for D-Wave, a case often leveled against me much more verbosely.  But I had an answer.

“I accept your analogy!” I replied.  “But to me, Columbus and the other conquerors of the Americas weren’t heroes to be admired or emulated.  Motivated by gold and spices rather than knowledge, they spread disease, killed and enslaved millions in one of history’s greatest holocausts, and burned the priceless records of the Maya and Inca civilizations so that the world would never even understand what was lost.  I submit that, had it been undertaken by curious and careful scientists—or at least people with a scientific mindset—rather than by swashbucklers funded by greedy kings, the European exploration and colonization of the Americas could have been incalculably less tragic.”

The trouble is, when I say things like that, people just laugh at me knowingly.  There he goes again, the pie-in-the-sky complexity theorist, who has no idea what it takes to get anything done in the real world.  What an amusingly contrary perspective he has.

And that, in the end, is why I think Particle Fever is such an important movie.  Through the stories of the people who built the LHC, you’ll see how it really is possible to reach a new continent without the promise of gold or the allure of lies.

This review of Max Tegmark’s book also occurs infinitely often in the decimal expansion of π

March 22nd, 2014

Two months ago, commenter rrtucci asked me what I thought about Max Tegmark and his “Mathematical Universe Hypothesis”: the idea, which Tegmark defends in his recent book Our Mathematical Universe, that physical and mathematical existence are the same thing, and that what we call “the physical world” is simply one more mathematical structure, alongside the dodecahedron and so forth.  I replied as follows:

…I find Max a fascinating person, a wonderful conference organizer, someone who’s always been extremely nice to me personally, and an absolute master at finding common ground with his intellectual opponents—I’m trying to learn from him, and hope someday to become 10^-122 as good.  I can also say that, like various other commentators (e.g., Peter Woit), I personally find the “Mathematical Universe Hypothesis” to be devoid of content.

After Peter Woit found that comment and highlighted it on his own blog, my comments section was graced by none other than Tegmark himself, who wrote:

Thanks Scott for your all to [sic] kind words!  I very much look forward to hearing what you think about what I actually say in the book once you’ve had a chance to read it!  I’m happy to give you a hardcopy (which can double as door-stop) – just let me know.

With this reply, Max illustrated perfectly why I’ve been trying to learn from him, and how far I fall short.  Where I would’ve said “yo dumbass, why don’t you read my book before spouting off?,” Tegmark gracefully, diplomatically shamed me into reading his book.

So, now that I’ve done so, what do I think?  Briefly, I think it’s a superb piece of popular science writing—stuffed to the gills with thought-provoking arguments, entertaining anecdotes, and fascinating facts.  I think everyone interested in math, science, or philosophy should buy the book and read it.  And I still think the MUH is basically devoid of content, as it stands.

Let me start with what makes the book so good.  First and foremost, the personal touch.  Tegmark deftly conveys the excitement of being involved in the analysis of the cosmic microwave background fluctuations—of actually getting detailed numerical data about the origin of the universe.  (The book came out just a few months before last week’s bombshell announcement of B-modes in the CMB data; presumably the next edition will have an update about that.)  And Tegmark doesn’t just give you arguments for the Many-Worlds Interpretation of quantum mechanics; he tells you how he came to believe it.  He writes of being a beginning PhD student at Berkeley, living at International House (and dating an Australian exchange student who he met his first day at IHouse), who became obsessed with solving the quantum measurement problem, and who therefore headed to the physics library, where he was awestruck by reading the original Many-Worlds articles of Hugh Everett and Bryce deWitt.  As it happens, every single part of the last sentence also describes me (!!!)—except that the Australian exchange student who I met my first day at IHouse lost interest in me when she decided that I was too nerdy.  And also, I eventually decided that the MWI left me pretty much as confused about the measurement problem as before, whereas Tegmark remains a wholehearted Many-Worlder.

The other thing I loved about Tegmark’s book was its almost comical concreteness.  He doesn’t just metaphorically write about “knobs” for adjusting the constants of physics: he shows you a picture of a box with the knobs on it.  He also shows a “letter” that lists not only his street address, zip code, town, state, and country, but also his planet, Hubble volume, post-inflationary bubble, quantum branch, and mathematical structure.  Probably my favorite figure was the one labeled “What Dark Matter Looks Like / What Dark Energy Looks Like,” which showed two blank boxes.

Sometimes Tegmark seems to subtly subvert the conventions of popular-science writing.  For example, in the first chapter, he includes a table that categorizes each of the book’s remaining chapters as “Mainstream,” “Controversial,” or “Extremely Controversial.”  And whenever you’re reading the text and cringing at a crucial factual point that was left out, chances are good you’ll find a footnote at the bottom of the page explaining that point.  I hope both of these conventions become de rigueur for all future pop-science books, but I’m not counting on it.

The book has what Tegmark himself describes as a “Dr. Jekyll / Mr. Hyde” structure, with the first (“Dr. Jekyll”) half of the book relaying more-or-less accepted discoveries in physics and cosmology, and the second (“Mr. Hyde”) half focusing on Tegmark’s own Mathematical Universe Hypothesis (MUH).  Let’s accept that both halves are enjoyable reads, and that the first half contains lots of wonderful science.  Is there anything worth saying about the truth or falsehood of the MUH?

In my view, the MUH gestures toward two points that are both correct and important—neither of them new, but both well worth repeating in a pop-science book.  The first is that the laws of physics aren’t “suggestions,” which the particles can obey when they feel like it but ignore when Uri Geller picks up a spoon.  In that respect, they’re completely unlike human laws, and the fact that we use the same word for both is unfortunate.  Nor are the laws merely observed correlations, as in “scientists find link between yogurt and weight loss.”  The links of fundamental physics are ironclad: the world “obeys” them in much the same sense that a computer obeys its code, or the positive integers obey the rules of arithmetic.  Of course we don’t yet know the complete program describing the state evolution of the universe, but everything learned since Galileo leads one to expect that such a program exists.  (According to quantum mechanics, the program describing our observed reality is a probabilistic one, but for me, that fact by itself does nothing to change its lawlike character.  After all, if you know the initial state, Hamiltonian, and measurement basis, then quantum mechanics gives you a perfect algorithm to calculate the probabilities.)
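To spell out that last parenthetical, here is a minimal Python/NumPy sketch, with a made-up two-level Hamiltonian chosen purely for illustration: evolve the initial state under exp(-iHt) and read off the Born-rule probabilities in the chosen measurement basis.

```python
# Minimal sketch of the parenthetical above: given an initial state, a Hamiltonian, and a
# measurement basis, quantum mechanics mechanically yields the outcome probabilities.
# The Hamiltonian and evolution time below are arbitrary illustrative choices.
import numpy as np
from scipy.linalg import expm

psi0 = np.array([1.0, 0.0], dtype=complex)        # initial state |0>
H = np.array([[0.0, 1.0],
              [1.0, 0.0]], dtype=complex)         # toy Hamiltonian (Pauli X)
t = 0.3

psi_t = expm(-1j * H * t) @ psi0                  # Schrodinger evolution
basis = np.eye(2, dtype=complex)                  # measure in the computational basis

probs = np.abs(basis.conj().T @ psi_t) ** 2       # Born rule: |<k|psi(t)>|^2
print(probs, "sum =", probs.sum())                # [cos^2(t), sin^2(t)], summing to 1
```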

The second true and important nugget in the MUH is that the laws are “mathematical.”  By itself, I’d say that’s a vacuous statement, since anything that can be described at all can be described mathematically.  (As a degenerate case, a “mathematical description of reality” could simply be a gargantuan string of bits, listing everything that will ever happen at every point in spacetime.)  The nontrivial part is that, at least if we ignore boundary conditions and the details of our local environment (which maybe we shouldn’t!), the laws of nature are expressible as simple, elegant math—and moreover, the same structures (complex numbers, group representations, Riemannian manifolds…) that mathematicians find important for internal reasons, again and again turn out to play a crucial role in physics.  It didn’t have to be that way, but it is.

Putting the two points together, it seems fair to say that the physical world is “isomorphic to” a mathematical structure—and moreover, a structure whose time evolution obeys simple, elegant laws.   All of this I find unobjectionable: if you believe it, it doesn’t make you a Tegmarkian; it makes you ready for freshman science class.

But Tegmark goes further.  He doesn’t say that the universe is “isomorphic” to a mathematical structure; he says that it is that structure, that its physical and mathematical existence are the same thing.  Furthermore, he says that every mathematical structure “exists” in the same sense that “ours” does; we simply find ourselves in one of the structures capable of intelligent life (which shouldn’t surprise us).  Thus, for Tegmark, the answer to Stephen Hawking’s famous question—”What is it that breathes fire into the equations and gives them a universe to describe?”—is that every consistent set of equations has fire breathed into it.  Or rather, every mathematical structure of at most countable cardinality whose relations are definable by some computer program.  (Tegmark allows that structures that aren’t computably definable, like the set of real numbers, might not have fire breathed into them.)

Anyway, the ensemble of all (computable?) mathematical structures, constituting the totality of existence, is what Tegmark calls the “Level IV multiverse.”  In his nomenclature, our universe consists of anything from which we can receive signals; anything that exists but that we can’t receive signals from is part of a “multiverse” rather than our universe.  The “Level I multiverse” is just the entirety of our spacetime, including faraway regions from which we can never receive a signal due to the dark energy.  The Level II multiverse consists of the infinitely many other “bubbles” (i.e., “local Big Bangs”), with different values of the constants of physics, that would, in eternal inflation cosmologies, have generically formed out of the same inflating substance that gave rise to our Big Bang.  The Level III multiverse is Everett’s many worlds.  Thus, for Tegmark, the Level IV multiverse is a sort of natural culmination of earlier multiverse theorizing.  (Some people might call it a reductio ad absurdum, but Tegmark is nothing if not a bullet-swallower.)

Now, why should you believe in any of these multiverses?  Or better: what does it buy you to believe in them?

As Tegmark correctly points out, none of the multiverses are “theories,” but they might be implications of theories that we have other good reasons to accept.  In particular, it seems crazy to believe that the Big Bang created space only up to the furthest point from which light can reach the earth, and no further.  So, do you believe that space extends further than our cosmological horizon?  Then boom! you believe in the Level I multiverse, according to Tegmark’s definition of it.

Likewise, do you believe there was a period of inflation in the first ~10^-32 seconds after the Big Bang?  Inflation has made several confirmed predictions (e.g., about the “fractal” nature of the CMB perturbations), and if last week’s announcement of B-modes in the CMB is independently verified, that will pretty much clinch the case for inflation.  But Alan Guth, Andrei Linde, and others have argued that, if you accept inflation, then it seems hard to prevent patches of the inflating substance from continuing to inflate forever, and thereby giving rise to infinitely many “other” Big Bangs.  Furthermore, if you accept string theory, then the six extra dimensions should generically curl up differently in each of those Big Bangs, giving rise to different apparent values of the constants of physics.  So then boom! with those assumptions, you’re sold on the Level II multiverse as well.  Finally, of course, there are people (like David Deutsch, Eliezer Yudkowsky, and Tegmark himself) who think that quantum mechanics forces you to accept the Level III multiverse of Everett.  Better yet, Tegmark claims that these multiverses are “falsifiable.”  For example, if inflation turns out to be wrong, then the Level II multiverse is dead, while if quantum mechanics is wrong, then the Level III one is dead.

Admittedly, the Level IV multiverse is a tougher sell, even by the standards of the last two paragraphs.  If you believe physical existence to be the same thing as mathematical existence, what puzzles does that help to explain?  What novel predictions does it make?  Forging fearlessly ahead, Tegmark argues that the MUH helps to “explain” why our universe has so many mathematical regularities in the first place.  And it “predicts” that more mathematical regularities will be discovered, and that everything discovered by science will be mathematically describable.  But what about the existence of other mathematical universes?  If, Tegmark says (on page 354), our qualitative laws of physics turn out to allow a narrow range of numerical constants that permit life, whereas other possible qualitative laws have no range of numerical constants that permit life, then that would be evidence for the existence of a mathematical multiverse.  For if our qualitative laws were the only ones into which fire had been breathed, then why would they just so happen to have a narrow but nonempty range of life-permitting constants?

I suppose I’m not alone in finding this totally unpersuasive.  When most scientists say they want “predictions,” they have in mind something meatier than “predict the universe will continue to be describable by mathematics.”  (How would we know if we found something that wasn’t mathematically describable?  Could we even describe such a thing with English words, in order to write papers about it?)  They also have in mind something meatier than “predict that the laws of physics will be compatible with the existence of intelligent observers, but if you changed them a little, then they’d stop being compatible.”  (The first part of that prediction is solid enough, but the second part might depend entirely on what we mean by a “little change” or even an “intelligent observer.”)

What’s worse is that Tegmark’s rules appear to let him have it both ways.  To whatever extent the laws of physics turn out to be “as simple and elegant as anyone could hope for,” Tegmark can say: “you see?  that’s evidence for the mathematical character of our universe, and hence for the MUH!”  But to whatever extent the laws turn out not to be so elegant, to be weird or arbitrary, he can say: “see?  that’s evidence that our laws were selected more-or-less randomly among all possible laws compatible with the existence of intelligent life—just as the MUH predicted!”

Still, maybe the MUH could be sharpened to the point where it did make definite predictions?  As Tegmark acknowledges, the central difficulty with doing so is that no one has any idea what measure to use over the space of mathematical objects (or even computably-describable objects).  This becomes clear if we ask a simple question like: what fraction of the mathematical multiverse consists of worlds that contain nothing but a single three-dimensional cube?

We could try to answer such a question using the universal prior: that is, we could make a list of all self-delimiting computer programs, then count the total weight of programs that generate a single cube and then halt, where each n-bit program gets assigned 1/2^n weight.  Sure, the resulting fraction would be uncomputable, but at least we’d have defined it.  Except wait … which programming language should we use?  (The constant factors could actually matter here!)  Worse yet, what exactly counts as a “cube”?  Does it have to have faces, or are vertices and edges enough?  How should we interpret the string of 1's and 0's output by the program, in order to know whether it describes a cube or not?  (Also, how do we decide whether two programs describe the “same” cube?  And if they do, does that mean they’re describing the same universe, or two different universes that happen to be identical?)
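Purely to pin down the counting rule being gestured at here, not to suggest it could actually be carried out, here is a toy Python sketch. The interpreter and the cube-recognizer are hypothetical stand-ins that a caller would have to supply; a real universal prior would need a prefix-free (self-delimiting) language so that the 2^-n weights sum to at most 1, and it is uncomputable in any case, since halting is undecidable.

```python
# Toy version of the weighting rule described above.  `interpreter` and `is_cube` are
# hypothetical stand-ins: a real universal prior needs a prefix-free (self-delimiting)
# programming language so the 2^-n weights sum to at most 1, and cannot actually be
# computed, because halting is undecidable.
from itertools import product

def cube_fraction(interpreter, is_cube, max_len=16):
    """Total 2^-n weight of programs (up to max_len bits) whose output `is_cube` accepts.

    `interpreter(program)` should return the program's output, or None if the program
    fails to halt within whatever time bound the caller imposes.
    """
    total = 0.0
    for n in range(1, max_len + 1):
        for bits in product("01", repeat=n):
            out = interpreter("".join(bits))
            if out is not None and is_cube(out):
                total += 2.0 ** (-n)          # each n-bit program gets weight 1/2^n
    return total

# Stand-in semantics, purely to show the plumbing (not a real interpreter):
print(cube_fraction(interpreter=lambda prog: None, is_cube=lambda out: False))   # -> 0.0
```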

These problems are simply more-dramatic versions of the “standard” measure problem in inflationary cosmology, which asks how to make statistical predictions in a multiverse where everything that can happen will happen, and will happen an infinite number of times.  The measure problem is sometimes discussed as if it were a technical issue: something to acknowledge but then set to the side, in the hope that someone will eventually come along with some clever counting rule that solves it.  To my mind, however, the problem goes deeper: it’s a sign that, although we might have started out in physics, we’ve now stumbled into metaphysics.

Some cosmologists would strongly protest that view.  Most of them would agree with me that Tegmark’s Level IV multiverse is metaphysics, but they’d insist that the Level I, Level II, and perhaps Level III multiverses were perfectly within the scope of scientific inquiry: they either exist or don’t exist, and the fact that we get confused about the measure problem is our issue, not nature’s.

My response can be summed up in a question: why not ride this slippery slope all the way to the bottom?  Thinkers like Nick Bostrom and Robin Hanson have pointed out that, in the far future, we might expect that computer-simulated worlds (as in The Matrix) will vastly outnumber the “real” world.  So then, why shouldn’t we predict that we’re much more likely to live in a computer simulation than we are in one of the “original” worlds doing the simulating?  And as a logical next step, why shouldn’t we do physics by trying to calculate a probability measure over different kinds of simulated worlds: for example, those run by benevolent simulators versus evil ones?  (For our world, my own money’s on “evil.”)

But why stop there?  As Tegmark points out, what does it matter if a computer simulation is actually run or not?  Indeed, why shouldn’t you say something like the following: assuming that π is a normal number, your entire life history must be encoded infinitely many times in π’s decimal expansion.  Therefore, you’re infinitely more likely to be one of your infinitely many doppelgängers “living in the digits of π” than you are to be the “real” you, of whom there’s only one!  (Of course, you might also be living in the digits of e or √2, possibilities that also merit reflection.)

At this point, of course, you’re all the way at the bottom of the slope, in Mathematical Universe Land, where Tegmark is eagerly waiting for you.  But you still have no idea how to calculate a measure over mathematical objects: for example, how to say whether you’re more likely to be living in the first 10^(10^120) digits of π, or the first 10^(10^120) digits of e.  And as a consequence, you still don’t know how to use the MUH to constrain your expectations for what you’re going to see next.

Now, notice that these different ways down the slippery slope all have a common structure:

  1. We borrow an idea from science that’s real and important and profound: for example, the possible infinite size and duration of our universe, or inflationary cosmology, or the linearity of quantum mechanics, or the likelihood of π being a normal number, or the possibility of computer-simulated universes.
  2. We then run with that idea until we smack right into a measure problem, and lose the ability to make useful predictions.

Many people want to frame the multiverse debates as “science versus pseudoscience,” or “science versus science fiction,” or (as I did before) “physics versus metaphysics.”  But actually, I don’t think any of those dichotomies get to the nub of the matter.  All of the multiverses I’ve mentioned—certainly the inflationary and Everett multiverses, but even the computer-simuverse and the π-verse—have their origins in legitimate scientific questions and in genuinely-great achievements of science.  However, they then extrapolate those achievements in a direction that hasn’t yet led to anything impressive.  Or at least, not to anything that we couldn’t have gotten without the ontological commitments that led to the multiverse and its measure problem.

What is it, in general, that makes a scientific theory impressive?  I’d say that the answer is simple: connecting elegant math to actual facts of experience.

When Einstein said, the perihelion of Mercury precesses at 43 seconds of arc per century because gravity is the curvature of spacetime—that was impressive.

When Dirac said, you should see a positron because this equation in quantum field theory is a quadratic with both positive and negative solutions (and then the positron was found)—that was impressive.

When Darwin said, there must be equal numbers of males and females in all these different animal species because any other ratio would fail to be an equilibrium—that was impressive.

When people say that multiverse theorizing “isn’t science,” I think what they mean is that it’s failed, so far, to be impressive science in the above sense.  It hasn’t yet produced any satisfying clicks of understanding, much less dramatically-confirmed predictions.  Yes, Steven Weinberg kind-of, sort-of used “multiverse” reasoning to predict—correctly—that the cosmological constant should be nonzero.  But as far as I can tell, he could just as well have dispensed with the “multiverse” part, and said: “I see no physical reason why the cosmological constant should be zero, rather than having some small nonzero value still consistent with the formation of stars and galaxies.”

At this, many multiverse proponents would protest: “look, Einstein, Dirac, and Darwin are setting a pretty high bar!  Those guys were smart but also lucky, and it’s unrealistic to expect that scientists will always be so lucky.  For many aspects of the world, there might not be an elegant theoretical explanation—or any explanation at all better than, ‘well, if it were much different, then we probably wouldn’t be here talking about it.’  So, are you saying we should ignore where the evidence leads us, just because of some a-priori prejudice in favor of mathematical elegance?”

In a sense, yes, I am saying that.  Here’s an analogy: suppose an aspiring filmmaker said, “I want my films to capture the reality of human experience, not some Hollywood myth.  So, in most of my movies nothing much will happen at all.  If something does happen—say, a major character dies—it won’t be after some interesting, character-forming struggle, but meaninglessly, in a way totally unrelated to the rest of the film.  Like maybe they get hit by a bus.  Then some other random stuff will happen, and then the movie will end.”

Such a filmmaker, I’d say, would have a perfect plan for creating boring, arthouse movies that nobody wants to watch.  Dramatic, character-forming struggles against the odds might not be the norm of human experience, but they are the central ingredient of entertaining cinema—so if you want to create an entertaining movie, then you have to postselect on those parts of human experience that do involve dramatic struggles.  In the same way, I claim that elegant mathematical explanations for observed facts are the central ingredient of great science.  Not everything in the universe might have such an explanation, but if one wants to create great science, one has to postselect on the things that do.

(Note that there’s an irony here: the same unsatisfyingness, the same lack of explanatory oomph, that make something a “lousy movie” to those with a scientific mindset, can easily make it a great movie to those without such a mindset.  The hunger for nontrivial mathematical explanations is a hunger one has to acquire!)

Some readers might argue: “but weren’t quantum mechanics, chaos theory, and Gödel’s theorem scientifically important precisely because they said that certain phenomena—the exact timing of a radioactive decay, next month’s weather, the bits of Chaitin’s Ω—were unpredictable and unexplainable in fundamental ways?”  To me, these are the exceptions that prove the rule.  Quantum mechanics, chaos, and Gödel’s theorem were great science not because they declared certain facts unexplainable, but because they explained why those facts (and not other facts) had no explanations of certain kinds.  Even more to the point, they gave definite rules to help figure out what would and wouldn’t be explainable in their respective domains: is this state an eigenstate of the operator you’re measuring?  is the Lyapunov exponent positive?  is there a proof of independence from PA or ZFC?

So, what would be the analogue of the above for the multiverse?  Is there any Level II or IV multiverse hypothesis that says: sure, the mass of the electron might be a cosmic accident, with at best an anthropic explanation, but the mass of the Higgs boson is almost certainly not such an accident?  Or that the sum or difference of the two masses is not an accident?  (And no, it doesn’t count to affirm as “non-accidental” things that we already have non-anthropic explanations for.)  If such a hypothesis exists, tell me in the comments!  As far as I know, all Level II and IV multiverse hypotheses are still at the stage where basically anything that isn’t already explained might vary across universes and be anthropically selected.  And that, to my mind, makes them very different in character from quantum mechanics, chaos, or Gödel’s theorem.

In summary, here’s what I feel is a reasonable position to take right now, regarding all four of Tegmark’s multiverse levels (not to mention the computer-simuverse, which I humbly propose as Level 3.5):

Yes, these multiverses are a perfectly fine thing to speculate about: sure they’re unobservable, but so are plenty of other entities that science has forced us to accept.  There are even natural reasons, within physics and cosmology, that could lead a person to speculate about each of these multiverse levels.  So if you want to speculate, knock yourself out!  If, however, you want me to accept the results as more than speculation—if you want me to put them on the bookshelf next to Darwin and Einstein—then you’ll need to do more than argue that other stuff I already believe logically entails a multiverse (which I’ve never been sure about), or point to facts that are currently unexplained as evidence that we need a multiverse to explain their unexplainability, or claim as triumphs for your hypothesis things that don’t really need the hypothesis at all, or describe implausible hypothetical scenarios that could confirm or falsify the hypothesis.  Rather, you’ll need to use your multiverse hypothesis—and your proposed solution to the resulting measure problem—to do something new that impresses me.

The Scientific Case for P≠NP

March 7th, 2014

Out there in the wider world—OK, OK, among Luboš Motl, and a few others who comment on this blog—there appears to be a widespread opinion that P≠NP is just “a fashionable dogma of the so-called experts,” something that’s no more likely to be true than false.  The doubters can even point to at least one accomplished complexity theorist, Dick Lipton, who publicly advocates agnosticism about whether P=NP.

Of course, not all the doubters reach their doubts the same way.  For Lipton, the thinking is probably something like: as scientists, we should be rigorously open-minded, and constantly question even the most fundamental hypotheses of our field.  For the outsiders, the thinking is more like: computer scientists are just not very smart—certainly not as smart as real scientists—so the fact that they consider something a “fundamental hypothesis” provides no information of value.

Consider, for example, this comment of Ignacio Mosqueira:

If there is no proof that means that there is no reason a-priori to prefer your arguments over those [of] Lubos. Expertise is not enough.  And the fact that Lubos is difficult to deal with doesn’t change that.

In my response, I wondered how broadly Ignacio would apply the principle “if there’s no proof, then there’s no reason to prefer any argument over any other one.”  For example, would he agree with the guy interviewed on Jon Stewart’s show who earnestly explained that, since there’s no proof that turning on the LHC will destroy the world, but also no proof that it won’t destroy the world, the only rational inference is that there’s a 50% chance it will destroy the world?  (John Oliver’s deadpan response was classic: “I’m … not sure that’s how probability works…”)

In a lengthy reply, Luboš bites this bullet with relish and mustard.  In physics, he agrees, or even in “continuous mathematics that is more physics-wise,” it’s possible to have justified beliefs even without proof.  For example, he admits to a 99.9% probability that the Riemann hypothesis is true.  But, he goes on, “partial evidence in discrete mathematics just cannot exist.”  Discrete math and computer science, you see, are so arbitrary, manmade, and haphazard that every question is independent of every other; no amount of experience can give anyone any idea which way the next question will go.

No, I’m not kidding.  That’s his argument.

I couldn’t help wondering: what about number theory?  Aren’t the positive integers a “discrete” structure?  And isn’t the Riemann Hypothesis fundamentally about the distribution of primes?  Or does the Riemann Hypothesis get counted as an “honorary physics-wise continuous problem” because it can also be stated analytically?  But then what about Goldbach’s Conjecture?  Is Luboš 50/50 on that one too?  Better yet, what about continuous, analytic problems that are closely related to P vs. NP?  For example, Valiant’s Conjecture says you can’t linearly embed the permanent of an n×n matrix as the determinant of an m×m matrix, unless m≥exp(n).  Mulmuley and others have connected this “continuous cousin” of P≠NP to issues in algebraic geometry, representation theory, and even quantum groups and Langlands duality.  So, does that make it kosher?  The more I thought about the proposed distinction, the less sense it made to me.

But enough of this.  In the rest of this post, I want to explain why the odds that you should assign to P≠NP are more like 99% than they are like 50%.  This post supersedes my 2006 post on the same topic, which I hereby retire.  While that post was mostly OK as far as it went, I now feel like I can do a much better job articulating the central point.  (And also, I made the serious mistake in 2006 of striving for literary eloquence and tongue-in-cheek humor.  That works great for readers who already know the issues inside-and-out, and just want to be amused.  Alas, it doesn’t work so well for readers who don’t know the issues, are extremely literal-minded, and just want ammunition to prove their starting assumption that I’m a doofus who doesn’t understand the basics of his own field.)

So, OK, why should you believe P≠NP?  Here’s why:

Because, like any other successful scientific hypothesis, the P≠NP hypothesis has passed severe tests that it had no good reason to pass were it false.

What kind of tests am I talking about?

By now, tens of thousands of problems have been proved to be NP-complete.  They range in character from theorem proving to graph coloring to airline scheduling to bin packing to protein folding to auction pricing to VLSI design to minimizing soap films to winning at Super Mario Bros.  Meanwhile, another cluster of tens of thousands of problems has been proved to lie in P (or BPP).  Those range from primality to matching to linear and semidefinite programming to edit distance to polynomial factoring to hundreds of approximation tasks.  Like the NP-complete problems, many of the P and BPP problems are also related to each other by a rich network of reductions.  (For example, countless other problems are in P “because” linear and semidefinite programming are.)

So, if we were to draw a map of the complexity class NP according to current knowledge, what would it look like?  There’d be a huge, growing component of NP-complete problems, all connected to each other by an intricate network of reductions.  There’d be a second huge component of P problems, many of them again connected by reductions.  Then, much like with the map of the continental US, there’d be a sparser population in the middle: stuff like factoring, graph isomorphism, and Unique Games that for various reasons has thus far resisted assimilation onto either of the coasts.

Of course, to prove P=NP, it would suffice to find a single link—that is, a single polynomial-time equivalence—between any of the tens of thousands of problems on the P coast, and any of the tens of thousands on the NP-complete one.  In half a century, this hasn’t happened: even as they’ve both ballooned exponentially, the two giant regions have remained defiantly separate from each other.  But that’s not even the main point.  The main point is that, as people explore these two regions, again and again there are “close calls”: places where, if a single parameter had worked out differently, the two regions would have come together in a cataclysmic collision.  Yet every single time, it’s just a fake-out.  Again and again the two regions “touch,” and their border even traces out weird and jagged shapes.  But even in those border zones, not a single problem ever crosses from one region to the other.  It’s as if they’re kept on their respective sides by an invisible electric fence.

As an example, consider the Set Cover problem: i.e., the problem, given a collection of subsets S1,…,Sm⊆{1,…,n}, of finding as few subsets as possible whose union equals the whole set.  Chvatal showed in 1979 that a greedy algorithm can produce, in polynomial time, a collection of sets whose size is at most ln(n) times larger than the optimum size.  This raises an obvious question: can you do better?  What about 0.9ln(n)?  Alas, building on a long sequence of prior works in PCP theory, it was recently shown that, if you could find a covering set at most (1-ε)ln(n) times larger than the optimum one, then you’d be solving an NP-complete problem, and P would equal NP.  Notice that, conversely, if the hardness result worked for ln(n) or anything above, then we’d also get P=NP.  So, why do the algorithm and the hardness result “happen to meet” at exactly ln(n), with neither one venturing the tiniest bit beyond?  Well, we might say, ln(n) is where the invisible electric fence is for this problem.
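
To make that ln(n) guarantee concrete, here is a minimal Python sketch of the greedy heuristic Chvatal analyzed (the function name and the toy instance are mine, purely for illustration): at every step, grab whichever subset covers the most still-uncovered elements.

    def greedy_set_cover(universe, subsets):
        """Greedy Set Cover: repeatedly pick the subset that covers the most
        still-uncovered elements.  Chvatal (1979) showed the cover returned is
        at most about ln(n) times larger than the optimal one."""
        uncovered = set(universe)
        cover = []
        while uncovered:
            best = max(subsets, key=lambda s: len(uncovered & s))
            if not uncovered & best:
                raise ValueError("the given subsets don't cover the universe")
            cover.append(best)
            uncovered -= best
        return cover

    # Toy instance: cover {1,...,6} using as few of these five subsets as possible.
    print(greedy_set_cover(range(1, 7),
                           [{1, 2, 3}, {2, 4}, {3, 4, 5}, {5, 6}, {1, 6}]))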

Want another example?  OK then, consider the “Boolean Max-k-CSP” problem: that is, the problem of setting n bits so as to satisfy the maximum number of constraints, where each constraint can involve an arbitrary Boolean function on any k of the bits.  The best known approximation algorithm, based on semidefinite programming, is guaranteed to satisfy at least a 2k/2^k fraction of the constraints.  Can you guess where this is going?  Recently, Siu On Chan showed that it’s NP-hard to satisfy even slightly more than a 2k/2^k fraction of constraints: if you can, then P=NP.  In this case the invisible electric fence sends off its shocks at 2k/2^k.
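
For readers who want the problem itself pinned down, here is a tiny brute-force Max-k-CSP solver in Python (my own illustrative sketch, not the semidefinite-programming algorithm or Chan’s reduction): it simply tries all 2^n assignments, which is exactly the kind of exhaustive search that the P≠NP hypothesis says cannot always be avoided.

    from itertools import product

    def max_kcsp_brute_force(n, constraints):
        """Exact Max-k-CSP by exhaustive search over all 2^n assignments.
        Each constraint is a pair (indices, predicate): 'indices' names k of the
        n bits, and 'predicate' is an arbitrary Boolean function of those bits."""
        best_value, best_assignment = -1, None
        for bits in product([0, 1], repeat=n):
            satisfied = sum(1 for idx, pred in constraints
                            if pred(*(bits[i] for i in idx)))
            if satisfied > best_value:
                best_value, best_assignment = satisfied, bits
        return best_value, best_assignment

    # Toy instance with n=3 bits and k=2 (hypothetical): three 2-bit constraints.
    constraints = [((0, 1), lambda a, b: a != b),        # x0 XOR x1
                   ((1, 2), lambda b, c: b == 1 and c == 0),
                   ((0, 2), lambda a, c: a == c)]
    print(max_kcsp_brute_force(3, constraints))          # prints (3, (0, 1, 0))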

I could multiply such examples endlessly—or at least, Dana (my source for such matters) could do so.  But there are also dozens of “weird coincidences” that involve running times rather than approximation ratios; and that strongly suggest, not only that P≠NP, but that problems like 3SAT should require c^n time for some constant c.  For a recent example—not even a particularly important one, but one that’s fresh in my memory—consider this paper by myself, Dana, and Russell Impagliazzo.  A first thing we do in that paper is to give an approximation algorithm for a family of two-prover games called “free games.”  Our algorithm runs in quasipolynomial time:  specifically, n^{O(log n)}.  A second thing we do is show how to reduce the NP-complete 3SAT problem to free games of size ~2^{O(√n)}.

Composing those two results, you get an algorithm for 3SAT whose overall running time is roughly

$$ 2^{O( \sqrt{n} \log 2^{\sqrt{n}}) } = 2^{O(n)}. $$

Of course, this doesn’t improve on the trivial “try all possible solutions” algorithm.  But notice that, if our approximation algorithm for free games had been slightly faster—say, n^{O(log log n)}—then we could’ve used it to solve 3SAT in $$ 2^{O(\sqrt{n} \log n)} $$ time.  Conversely, if our reduction from 3SAT had produced free games of size (say) $$ 2^{O(n^{1/3})} $$ rather than 2^{O(√n)}, then we could’ve used that to solve 3SAT in $$ 2^{O(n^{2/3})} $$ time.
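
Spelling out that arithmetic (writing N for the size of the free game produced by the reduction, so that our algorithm runs in time N^{O(log N)}): with N = 2^{O(√n)} we get

$$ N^{O(\log N)} = \left(2^{O(\sqrt{n})}\right)^{O(\sqrt{n})} = 2^{O(n)}, $$

whereas an N^{O(log log N)}-time algorithm would instead give

$$ N^{O(\log \log N)} = \left(2^{O(\sqrt{n})}\right)^{O(\log n)} = 2^{O(\sqrt{n} \log n)}, $$

and a reduction producing games of size N = 2^{O(n^{1/3})} would give

$$ N^{O(\log N)} = \left(2^{O(n^{1/3})}\right)^{O(n^{1/3})} = 2^{O(n^{2/3})}. $$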

I should stress that these two results have completely different proofs: the approximation algorithm for free games “doesn’t know or care” about the existence of the reduction, nor does the reduction know or care about the algorithm.  Yet somehow, their respective parameters “conspire” so that 3SAT still needs c^n time.  And you see the same sort of thing over and over, no matter which problem domain you’re interested in.  These ubiquitous “coincidences” would be immediately explained if 3SAT actually did require c^n time—i.e., if it had a “hard core” for which brute-force search was unavoidable, no matter which way you sliced things up.  If that’s not true—i.e., if 3SAT has a subexponential algorithm—then we’re left with unexplained “spooky action at a distance.”  How do the algorithms and the reductions manage to coordinate with each other, every single time, to avoid spilling the subexponential secret?

Notice that, contrary to Luboš’s loud claims, there’s no “symmetry” between P=NP and P≠NP in these arguments.  Lower bound proofs are much harder to come across than either algorithms or reductions, and there’s not really a mystery about why: it’s hard to prove a negative!  (Especially when you’re up against known mathematical barriers, including relativization, algebrization, and natural proofs.)  In other words, even under the assumption that lower bound proofs exist, we now understand a lot about why the existing mathematical tools can’t deliver them, or can only do so for much easier problems.  Nor can I think of any example of a “spooky numerical coincidence” between two unrelated-seeming results, which would’ve yielded a proof of P≠NP had some parameters worked out differently.  P=NP and P≠NP can look like “symmetric” possibilities only if your symmetry is unbroken by knowledge.

Imagine a pond with small yellow frogs on one end, and large green frogs on the other.  After observing the frogs for decades, herpetologists conjecture that the populations represent two distinct species with different evolutionary histories, and are not interfertile.  Everyone realizes that to disprove this hypothesis, all it would take would be a single example of a green/yellow hybrid.  Since (for some reason) the herpetologists really care about this question, they undertake a huge program of breeding experiments, putting thousands of yellow female frogs next to green male frogs (and vice versa) during mating season, with candlelight, soft music, etc.  Nothing.

As this green vs. yellow frog conundrum grows in fame, other communities start investigating it as well: geneticists, ecologists, amateur nature-lovers, commercial animal breeders, ambitious teenagers on the science-fair circuit, and even some extralusionary physicists hoping to show up their dimwitted friends in biology.  These other communities try out hundreds of exotic breeding strategies that the herpetologists hadn’t considered, and contribute many useful insights.  They also manage to breed a larger, greener, but still yellow frog—something that, while it’s not a “true” hybrid, does have important practical applications for the frog-leg industry.  But in the end, no one has any success getting green and yellow frogs to mate.

Then one day, someone exclaims: “aha!  I just found a huge, previously-unexplored part of the pond where green and yellow frogs live together!  And what’s more, in this part, the small yellow frogs are bigger and greener than normal, and the large green frogs are smaller and yellower!”

This is exciting: the previously-sharp boundary separating green from yellow has been blurred!  Maybe the chasm can be crossed after all!

Alas, further investigation reveals that, even in the new part of the pond, the two frog populations still stay completely separate.  The smaller, yellower frogs there will mate with other small yellow frogs (even from faraway parts of the pond that they’d never ordinarily visit), but never, ever with the larger, greener frogs even from their own part.  And vice versa.  The result?  A discovery that could have falsified the original hypothesis has instead strengthened it—and precisely because it could’ve falsified it but didn’t.

Now imagine the above story repeated a few dozen more times—with more parts of the pond, a neighboring pond, sexually-precocious tadpoles, etc.  Oh, and I forgot to say this before, but imagine that doing a DNA analysis, to prove once and for all that the green and yellow frogs had separate lineages, is extraordinarily difficult.  But the geneticists know why it’s so difficult, and the reasons have more to do with the limits of their sequencing machines and with certain peculiarities of frog DNA, than with anything about these specific frogs.  In fact, the geneticists did get the sequencing machines to work for the easier cases of turtles and snakes—and in those cases, their results usually dovetailed well with earlier guesses based on behavior.  So for example, where reddish turtles and bluish turtles had never been observed interbreeding, the reason really did turn out to be that they came from separate species.  There were some surprises, of course, but nothing even remotely as shocking as seeing the green and yellow frogs suddenly getting it on.

Now, even after all this, someone could saunter over to the pond and say: “ha, what a bunch of morons!  I’ve never even seen a frog or heard one croak, but I know that you haven’t proved anything!  For all you know, the green and yellow frogs will start going at it tomorrow.  And don’t even tell me about ‘the weight of evidence,’ blah blah blah.  Biology is a scummy mud-discipline.  It has no ideas or principles; it’s just a random assortment of unrelated facts.  If the frogs started mating tomorrow, that would just be another brute, arbitrary fact, no more surprising or unsurprising than if they didn’t start mating tomorrow.  You jokers promote the ideology that green and yellow frogs are separate species, not because the evidence warrants it, but just because it’s a convenient way to cover up your own embarrassing failure to get them to mate.  I could probably breed them myself in ten minutes, but I have better things to do.”

At this, a few onlookers might nod appreciatively and say: “y’know, that guy might be an asshole, but let’s give him credit: he’s unafraid to speak truth to competence.”

Even among the herpetologists, a few might beat their breasts and announce: “Who’s to say he isn’t right?  I mean, what do we really know?  How do we know there even is a pond, or that these so-called ‘frogs’ aren’t secretly giraffes?  I, at least, have some small measure of wisdom, in that I know that I know nothing.”

What I want you to notice is how scientifically worthless all of these comments are.  If you wanted to do actual research on the frogs, then regardless of which sympathies you started with, you’d have no choice but to ignore the naysayers, and proceed as if the yellow and green frogs were different species.  Sure, you’d have in the back of your mind that they might be the same; you’d be ready to adjust your views if new evidence came in.  But for now, the theory that there’s just one species, divided into two subgroups that happen never to mate despite living in the same habitat, fails miserably at making contact with any of the facts that have been learned.  It leaves too much unexplained; in fact it explains nothing.

For all that, you might ask, don’t the naysayers occasionally turn out to be right?  Of course they do!  But if they were right more than occasionally, then science wouldn’t be possible.  We would still be in caves, beating our breasts and asking how we can know that frogs aren’t secretly giraffes.

So, that’s what I think about P and NP.  Do I expect this post to convince everyone?  No—but to tell you the truth, I don’t want it to.  I want it to convince most people, but I also want a few to continue speculating that P=NP.

Why, despite everything I’ve said, do I want maybe-P=NP-ism not to die out entirely?  Because alongside the P=NP carpers, I also often hear from a second group of carpers.  This second group says that P and NP are so obviously, self-evidently unequal that the quest to separate them with mathematical rigor is quixotic and absurd.  Theoretical computer scientists should quit wasting their time struggling to understand truths that don’t need to be understood, but only accepted, and do something useful for the world.  (A natural generalization of this view, I guess, is that all basic science should end.)  So, what I really want is for the two opposing groups of naysayers to keep each other in check, so that those who feel impelled to do so can get on with the fascinating quest to understand the ultimate limits of computation.


Update (March 8): At least eight readers have by now emailed me, or left comments, asking why I’m wasting so much time and energy arguing with Luboš Motl.  Isn’t it obvious that, ever since he stopped doing research around 2006 (if not earlier), this guy has completely lost his marbles?  That he’ll never, ever change his mind about anything?

Yes.  In fact, I’ve noticed repeatedly that, even when Luboš is wrong about a straightforward factual matter, he never really admits error: he just switches, without skipping a beat, to some other way to attack his interlocutor.  (To give a small example: watch how he reacts to being told that graph isomorphism is neither known nor believed to be NP-complete.  Caught making a freshman-level error about the field he’s attacking, he simply rants about how graph isomorphism is just as “representative” and “important” as NP-complete problems anyway, since no discrete math question is ever more or less “important” than any other; they’re all equally contrived and arbitrary.  At the Luboš casino, you lose even when you win!  The only thing you can do is stop playing and walk away.)

Anyway, my goal here was never to convince Luboš.  I was writing, not for him, but for my other readers: especially for those genuinely unfamiliar with these interesting issues, or intimidated by Luboš’s air of certainty.  I felt like I owed it to them to set out, clearly and forcefully, certain facts that all complexity theorists have encountered in their research, but that we hardly ever bother to articulate.  If you’ve never studied physics, then yes, it sounds crazy that there would be quadrillions of invisible neutrinos coursing through your body.  And if you’ve never studied computer science, it sounds crazy that there would be an “invisible electric fence,” again and again just barely separating what the state-of-the-art approximation algorithms can handle from what the state-of-the-art PCP tools can prove is NP-complete.  But there it is, and I wanted everyone else at least to see what the experts see, so that their personal judgments about the likelihood of P=NP could be informed by seeing it.

Luboš’s response to my post disappointed me (yes, really!).  I expected it to be nasty and unhinged, and so it was.  What I didn’t expect was that it would be so intellectually lightweight.  Confronted with the total untenability of his foot-stomping distinction between “continuous math” (where you can have justified beliefs without proof) and “discrete math” (where you can’t), and with exactly the sorts of “detailed, confirmed predictions” of the P≠NP hypothesis that he’d declared impossible, Luboš’s response was simply to repeat his original misconceptions, but louder.

And that brings me, I confess, to a second reason for my engagement with Luboš.  Several times, I’ve heard people express sentiments like:

Yes, of course Luboš is a raging jerk and a social retard.  But if you can just get past that, he’s so sharp and intellectually honest!  No matter how many people he needlessly offends, he always tells it like it is.

I want the nerd world to see—in as stark a situation as possible—that the above is not correct.  Luboš is wrong much of the time, and he’s intellectually dishonest.

At one point in his post, Luboš actually compares computer scientists who find P≠NP a plausible working hypothesis to his even greater nemesis: the “climate cataclysmic crackpots.”  (Strangely, he forgot to compare us to feminists, Communists, Muslim terrorists, or loop quantum gravity theorists.)  Even though the P versus NP and global warming issues might not seem closely linked, part of me is thrilled that Luboš has connected them as he has.  If, after seeing this ex-physicist’s “thought process” laid bare on the P versus NP problem—how his arrogance and incuriosity lead him to stake out a laughably-absurd position; how his vanity then causes him to double down after his errors are exposed—if, after seeing this, a single person is led to question Lubošian epistemology more generally, then my efforts will not have been in vain.

Anyway, now that I’ve finally unmasked Luboš—certainly to my own satisfaction, and I hope to that of most scientifically-literate readers—I’m done with this.  The physicist John Baez is rumored to have said: “It’s not easy to ignore Luboš, but it’s ALWAYS worth the effort.”  It took me eight years, but I finally see the multiple layers of profundity hidden in that snark.

And thus I make the following announcement:

For the next three years, I, Scott Aaronson, will not respond to anything Luboš says, nor will I allow him to comment on this blog.

In March 2017, I’ll reassess my Luboš policy.  Whether I relent will depend on a variety of factors—including whether Luboš has gotten the professional help he needs (from a winged pig, perhaps?) and changed his behavior; but also, how much my own quality of life has improved in the meantime.


Another Update (3/11): There’s some further thoughtful discussion of this post over on Reddit.


Another Update (3/13): Check out my MathOverflow question directly inspired by the comments on this post.


Yet Another Update (3/17): Dick Lipton and Ken Regan now have a response up to this post. My own response is coming soon in their comment section. For now, check out an excellent comment by Timothy Gowers, which begins “I firmly believe that P≠NP,” then plays devil’s-advocate by exploring the possibility that I referred to in this comment thread as P being ‘severed in two,’ and finally returns to reasons for believing that P≠NP after all.