## Archive for the ‘Complexity’ Category

### A Physically Universal Cellular Automaton

Thursday, June 26th, 2014

It’s been understood for decades that, if you take a simple discrete rule—say, a cellular automaton like Conway’s Game of Life—and iterate it over and over, you can very easily get the capacity for universal computation.  In other words, your cellular automaton becomes able to implement any desired sequence of AND, OR, and NOT gates, store and retrieve bits in a memory, and even (in principle) run Windows or Linux, albeit probably veerrryyy sloowwllyyy, using a complicated contraption of thousands or millions of cells to represent each bit of the desired computation.  If I’m not mistaken, a guy named Wolfram even wrote an entire 1200-page-long book about this phenomenon (see here for my 2002 review).

But suppose we want more than mere computational universality.  Suppose we want “physical” universality: that is, the ability to implement any transformation whatsoever on any finite region of the cellular automaton’s state, by suitably initializing the complement of that region.  So for example, suppose that, given some 1000×1000 square of cells, we’d like to replace every “0” cell within that square by a “1” cell, and vice versa.  Then physical universality would mean that we could do that, eventually, by some “machine” we could build outside the 1000×1000 square of interest.

You might wonder: are we really asking for more here than just ordinary computational universality?  Indeed we are.  To see this, consider Conway’s famous Game of Life.  Even though Life has been proved to be computationally universal, it’s not physically universal in the above sense.  The reason is simply that Life’s evolution rule is not time-reversible.  So if, for example, there were a lone “1” cell deep inside the 1000×1000 square, surrounded by a sea of “0” cells, then that “1” cell would immediately disappear without a trace, and no amount of machinery outside the square could possibly detect that it was ever there.

Furthermore, even cellular automata that are both time-reversible and computationally universal could fail to be physically universal.  Suppose, for example, that our CA allowed for the construction of “impenetrable walls,” through which no signal could pass.  And suppose that our 1000×1000 region contained a hollow box built out of these impenetrable walls.  Then, by definition, no amount of machinery that we built outside the region could ever detect whether there was a particle bouncing around inside the box.

So, in summary, we now face a genuinely new question:

Does there exist a physically universal cellular automaton, or not?

This question had sort of vaguely bounced around in my head (and probably other people’s) for years.  But as far as I know, it was first asked, clearly and explicitly, in a lovely 2010 preprint by Dominik Janzing.

Today, I’m proud to report that Luke Schaeffer, a first-year PhD student in my group, has answered Janzing’s question in the affirmative, by constructing the first cellular automaton (again, to the best of our knowledge) that’s been proved to be physically universal.  Click here for Luke’s beautifully-written preprint about his construction, and click here for a webpage that he’s prepared, explaining the details of the construction using color figures and videos.  Even if you don’t have time to get into the nitty-gritty, the videos on the webpage should give you a sense for the intricacy of what he accomplished.

Very briefly, Luke first defines a reversible, two-dimensional CA involving particles that move diagonally across a square lattice, in one of four possible directions (northeast, northwest, southeast, or southwest).  The number of particles is always conserved.  The only interesting behavior occurs when three of the particles “collide” in a single 2×2 square, and Luke gives rules (symmetric under rotations and reflections) that specify what happens then.

Given these rules, it’s possible to prove that any configuration whatsoever of finitely many particles will “diffuse,” after not too many time steps, into four unchanging clouds of particles, which thereafter simply move away from each other in the four diagonal directions for all eternity.  This has the interesting consequence that Luke’s CA, when initialized with finitely many particles, cannot be capable of universal computation in Turing’s sense.  In other words, there’s no way, using n initial particles confined to an n×n box, to set up a computation that continues to do something interesting after 2n or 22^n time steps, let alone forever. On the other hand, using finitely many particles, one can also prove that the CA can perform universal computation in the Boolean circuit sense.  In other words, we can implement AND, OR, and NOT gates, and by chaining them together, can compute any Boolean function that we like on any fixed number of input bits (with the number of input bits generally much smaller than the number of particles).  And this “circuit universality,” rather than Turing-machine universality, is all that’s implied anyway by physical universality in Janzing’s sense.  (As a side note, the distinction between circuit and Turing-machine universality seems to deserve much more attention than it usually gets.)

Anyway, while the “diffusion into four clouds” aspect of Luke’s CA might seem annoying, it turns out to be extremely useful for proving physical universality.  For it has the consequence that, no matter what the initial state was inside the square we cared about, that state will before too long be encoded into the states of four clouds headed away from the square.  So then, “all” we need to do is engineer some additional clouds of particles, initially outside the square, that

1. intercept the four escaping clouds,
2. “decode” the contents of those clouds into a flat sequence of bits,
3. apply an arbitrary Boolean circuit to that bit sequence, and then
4. convert the output bits of the Boolean circuit into new clouds of particles converging back onto the square.

So, well … that’s exactly what Luke did.  And just in case there’s any doubt about the correctness of the end result, Luke actually implemented his construction in the cellular-automaton simulator Golly, where you can try it out yourself (he explains how on his webpage).

So far, of course, I’ve skirted past the obvious question of “why.”  Who cares that we now know that there exists a physically-universal CA?  Apart from the sheer intrinsic coolness, a second reason is that I’ve been interested for years in how to make finer (but still computer-sciencey) distinctions, among various “candidate laws of physics,” than simply saying that some laws are computationally universal and others aren’t, or some are easy to simulate on a standard Turing machine and others hard.  For ironically, the very pervasiveness of computational universality (the thing Wolfram goes on and on about) makes it of limited usefulness in distinguishing physical laws: almost any sufficiently-interesting set of laws will turn out to be computationally universal, at least in the circuit sense if not the Turing-machine one!

On the other hand, many of these laws will be computationally universal only because of extremely convoluted constructions, which fall apart if even the tiniest error is introduced.  And in other cases, we’ll be able to build a universal computer, all right, but that computer will be relatively impotent to obtain interesting input about its physical environment, or to make its output affect the gross features of the CA’s physical state.  If you like, we’ll have a recipe for creating a universe full of ivory-tower, eggheaded nerds, who can search for counterexamples to Goldbach’s Conjecture but can’t build a shelter to protect themselves from a hail of “1” bits, or even learn whether such a hail is present or not, or decide which other part of the CA to travel to.

As I see it, Janzing’s notion of physical universality is directly addressing this “egghead” problem, by asking whether we can build not merely a universal computer but a particularly powerful kind of robot: one that can effect a completely arbitrary transformation (given enough time, of course) on any part of its physical environment.  And the answer turns out to be that, at least in a weird CA consisting of clouds of diagonally-moving particles, we can indeed do that.  The question of whether we can also achieve physical universality in more natural CAs remains open (and in his Future Work section, Luke discusses several ways of formalizing what we mean by “more natural”).

As Luke mentions in his introduction, there’s at least a loose connection here to David Deutsch’s recent notion of constructor theory (see also this followup paper by Deutsch and Chiara Marletto).  Basically, Deutsch and Marletto want to reconstruct all of physics taking what can and can’t be constructed (i.e., what kinds of transformations are possible) as the most primitive concept, rather than (as in ordinary physics) what will or won’t happen (i.e., how the universe’s state evolves with time).  The hope is that, once physics was reconstructed in this way, we could then (for example) state and answer the question of whether or not scalable quantum computers can be built as a principled question of physics, rather than as a “mere” question of engineering.

Now, regardless of what you think about these audacious goals, or about Deutsch and Marletto’s progress (or lack of progress?) so far toward achieving them, it’s certainly a worthwhile project to study what sorts of machines can and can’t be constructed, as a matter of principle, both in the real physical world and in other, hypothetical worlds that capture various aspects of our world.  Indeed, one could say that that’s what many of us in quantum information and theoretical computer science have been trying to do for decades!  However, Janzing’s “physical universality” problem hints at a different way to approach the project: starting with some far-reaching desire (say, to be able to implement any transformation whatsoever on any finite region), can we engineer laws of physics that make that desire possible?  If so, then how close can we make those laws to “our” laws?

Luke has now taken a first stab at answering these questions.  Whether his result ends up merely being a fun, recreational “terminal branch” on the tree of science, or a trunk leading to something more, probably just depends on how interested people get.  I have no doubt that our laws of physics permit the creation of additional papers on this topic, but whether they do or don’t is (as far as I can see) merely a question of contingency and human will, not a constructor-theoretic question.

### CCC’s Declaration of Independence

Friday, June 6th, 2014

Recently, the participants of the Conference on Computational Complexity (CCC)—the latest iteration of which I’ll be speaking at next week in Vancouver—voted to declare their independence from the IEEE, and to become a solo, researcher-organized conference.  See this open letter for the reasons why (basically, IEEE charged a huge overhead, didn’t allow open access to the proceedings, and increased rather than decreased the administrative burden on the organizers).  As a former member of the CCC Steering Committee, I’m in violent agreement with this move, and only wish we’d managed to do it sooner.

Now, Dieter van Melkebeek (the current Steering Committee chair) is asking complexity theorists to sign a public Letter of Support, to make it crystal-clear that the community is behind the move to independence.  And Jeff Kinne has asked me to advertise the letter on my blog.  So, if you’re a complexity theorist who agrees with the move, please go there and sign (it already has 111 signatures, but could use more).

Meanwhile, I wish to express my profound gratitude to Dieter, Jeff, and the other Steering Committee members for their efforts toward independence.  The only thing I might’ve done differently would be to add a little more … I dunno, pizzazz to the documents explaining the reasons for the move.  Like:

When in the Course of human events, it becomes necessary for a conference to dissolve the organizational bands that have connected it with the IEEE, and to assume among the powers of the earth, the separate and equal station to which the Laws of Mathematics and the CCC Charter entitle it, a decent respect to the opinions of theorist-kind requires that the participants should declare the causes which impel them to the separation.

We hold these truths to be self-evident (but still in need of proof), that P and NP are created unequal, that one-way functions exist, that the polynomial hierarchy is infinite…

### Quantifying the Rise and Fall of Complexity in Closed Systems: The Coffee Automaton

Tuesday, May 27th, 2014

Update (June 3): A few days after we posted this paper online, Brent Werness, a postdoc in probability theory at the University of Washington, discovered a serious error in the “experimental” part of the paper.  Happily, Brent is now collaborating with us on producing a new version of the paper that fixes the error, which we hope to have available within a few months (and which will replace the version currently on the arXiv).

To make a long story short: while the overall idea, of measuring “apparent complexity” by the compressed file size of a coarse-grained image, is fine, the “interacting coffee automaton” that we study in the paper is not an example where the apparent complexity becomes large at intermediate times.  That fact can be deduced as a corollary of a result of Liggett from 2009 about the “symmetric exclusion process,” and can be seen as a far-reaching generalization of a result that we prove in our paper’s appendix: namely, that in the non-interacting coffee automaton (our “control case”), the apparent complexity after t time steps is upper-bounded by O(log(nt)).  As it turns out, we were more right than we knew to worry about large-deviation bounds giving complete mathematical control over what happens when the cream spills into the coffee, thereby preventing the apparent complexity from ever becoming large!

But what about our numerical results, which showed a small but unmistakable complexity bump for the interacting automaton (figure 10(a) in the paper)?  It now appears that the complexity bump we saw in our data is likely to be explainable by an incomplete removal of what we called “border pixel artifacts”: that is, “spurious” complexity that arises merely from the fact that, at the border between cream and coffee, we need to round the fraction of cream up or down to the nearest integer to produce a grayscale.  In the paper, we devoted a whole section (Section 6) to border pixel artifacts and the need to deal with them: something sufficiently non-obvious that in the comments of this post, you can find people arguing with me that it’s a non-issue.  Well, it now appears that we erred by underestimating the severity of border pixel artifacts, and that a better procedure to get rid of them would also eliminate the complexity bump for the interacting automaton.

Once again, this error has no effect on either the general idea of complexity rising and then falling in closed thermodynamic systems, or our proposal for how to quantify that rise and fall—the two aspects of the paper that have generated the most interest.  But we made a bad choice of model system with which to illustrate those ideas.  Had I looked more carefully at the data, I could’ve noticed the problem before we posted, and I take responsibility for my failure to do so.

The good news is that ultimately, I think the truth only makes our story more interesting.  For it turns out that apparent complexity, as we define it, is not something that’s trivial to achieve by just setting loose a bunch of randomly-walking particles, which bump into each other but are otherwise completely independent.  If you want “complexity” along the approach to thermal equilibrium, you need to work a bit harder for it.  One promising idea, which we’re now exploring, is to consider a cream tendril whose tip takes a random walk through the coffee, leaving a trail of cream in its wake.  Using results in probability theory—closely related, or so I’m told, to the results for which Wendelin Werner won his Fields Medal!—it may even be possible to prove analytically that the apparent complexity becomes large in thermodynamic systems with this sort of behavior, much as one can prove that the complexity doesn’t become large in our original coffee automaton.

So, if you’re interested in this topic, stay tuned for the updated version of our paper.  In the meantime, I wish to express our deepest imaginable gratitude to Brent Werness for telling us all this.

Good news!  After nearly three years of procrastination, fellow blogger Sean Carroll, former MIT undergraduate Lauren Ouellette, and yours truly finally finished a paper with the above title (coming soon to an arXiv near you).  PowerPoint slides are also available (as usual, you’re on your own if you can’t open them—sorry!).

For the background and context of this paper, please see my old post “The First Law of Complexodynamics,” which discussed Sean’s problem of defining a “complextropy” measure that first increases and then decreases in closed thermodynamic systems, in contrast to entropy (which increases monotonically).  In this exploratory paper, we basically do five things:

1. We survey several candidate “complextropy” measures: their strengths, weaknesses, and relations to one another.
2. We propose a model system for studying such measures: a probabilistic cellular automaton that models a cup of coffee into which cream has just been poured.
3. We report the results of numerical experiments with one of the measures, which we call “apparent complexity” (basically, the gzip file size of a smeared-out image of the coffee cup).  The results confirm that the apparent complexity does indeed increase, reach a maximum, then turn around and decrease as the coffee and cream mix.
4. We discuss a technical issue that one needs to overcome (the so-called “border pixels” problem) before one can do meaningful experiments in this area, and offer a solution.
5. We raise the open problem of proving analytically that the apparent complexity ever becomes large for the coffee automaton.  To underscore this problem’s difficulty, we prove that the apparent complexity doesn’t become large in a simplified version of the coffee automaton.

Anyway, here’s the abstract:

In contrast to entropy, which increases monotonically, the “complexity” or “interestingness” of closed systems seems intuitively to increase at first and then decrease as equilibrium is approached. For example, our universe lacked complex structures at the Big Bang and will also lack them after black holes evaporate and particles are dispersed. This paper makes an initial attempt to quantify this pattern. As a model system, we use a simple, two-dimensional cellular automaton that simulates the mixing of two liquids (“coffee” and “cream”). A plausible complexity measure is then the Kolmogorov complexity of a coarse-grained approximation of the automaton’s state, which we dub the “apparent complexity.” We study this complexity measure, and show analytically that it never becomes large when the liquid particles are non-interacting. By contrast, when the particles do interact, we give numerical evidence that the complexity reaches a maximum comparable to the “coffee cup’s” horizontal dimension. We raise the problem of proving this behavior analytically.

Questions and comments more than welcome.

In unrelated news, Shafi Goldwasser has asked me to announce that the Call for Papers for the 2015 Innovations in Theoretical Computer Science (ITCS) conference is now available.

### Why I Am Not An Integrated Information Theorist (or, The Unconscious Expander)

Wednesday, May 21st, 2014

Happy birthday to me!

Recently, lots of people have been asking me what I think about IIT—no, not the Indian Institutes of Technology, but Integrated Information Theory, a widely-discussed “mathematical theory of consciousness” developed over the past decade by the neuroscientist Giulio Tononi.  One of the askers was Max Tegmark, who’s enthusiastically adopted IIT as a plank in his radical mathematizing platform (see his paper “Consciousness as a State of Matter”).  When, in the comment thread about Max’s Mathematical Universe Hypothesis, I expressed doubts about IIT, Max challenged me to back up my doubts with a quantitative calculation.

So, this is the post that I promised to Max and all the others, about why I don’t believe IIT.  And yes, it will contain that quantitative calculation.

But first, what is IIT?  The central ideas of IIT, as I understand them, are:

(1) to propose a quantitative measure, called Φ, of the amount of “integrated information” in a physical system (i.e. information that can’t be localized in the system’s individual parts), and then

(2) to hypothesize that a physical system is “conscious” if and only if it has a large value of Φ—and indeed, that a system is more conscious the larger its Φ value.

I’ll return later to the precise definition of Φ—but basically, it’s obtained by minimizing, over all subdivisions of your physical system into two parts A and B, some measure of the mutual information between A’s outputs and B’s inputs and vice versa.  Now, one immediate consequence of any definition like this is that all sorts of simple physical systems (a thermostat, a photodiode, etc.) will turn out to have small but nonzero Φ values.  To his credit, Tononi cheerfully accepts the panpsychist implication: yes, he says, it really does mean that thermostats and photodiodes have small but nonzero levels of consciousness.  On the other hand, for the theory to work, it had better be the case that Φ is small for “intuitively unconscious” systems, and only large for “intuitively conscious” systems.  As I’ll explain later, this strikes me as a crucial point on which IIT fails.

The literature on IIT is too big to do it justice in a blog post.  Strikingly, in addition to the “primary” literature, there’s now even a “secondary” literature, which treats IIT as a sort of established base on which to build further speculations about consciousness.  Besides the Tegmark paper linked to above, see for example this paper by Maguire et al., and associated popular article.  (Ironically, Maguire et al. use IIT to argue for the Penrose-like view that consciousness might have uncomputable aspects—a use diametrically opposed to Tegmark’s.)

Anyway, if you want to read a popular article about IIT, there are loads of them: see here for the New York Times’s, here for Scientific American‘s, here for IEEE Spectrum‘s, and here for the New Yorker‘s.  Unfortunately, none of those articles will tell you the meat (i.e., the definition of integrated information); for that you need technical papers, like this or this by Tononi, or this by Seth et al.  IIT is also described in Christof Koch’s memoir Consciousness: Confessions of a Romantic Reductionist, which I read and enjoyed; as well as Tononi’s Phi: A Voyage from the Brain to the Soul, which I haven’t yet read.  (Koch, one of the world’s best-known thinkers and writers about consciousness, has also become an evangelist for IIT.)

So, I want to explain why I don’t think IIT solves even the problem that it “plausibly could have” solved.  But before I can do that, I need to do some philosophical ground-clearing.  Broadly speaking, what is it that a “mathematical theory of consciousness” is supposed to do?  What questions should it answer, and how should we judge whether it’s succeeded?

The most obvious thing a consciousness theory could do is to explain why consciousness exists: that is, to solve what David Chalmers calls the “Hard Problem,” by telling us how a clump of neurons is able to give rise to the taste of strawberries, the redness of red … you know, all that ineffable first-persony stuff.  Alas, there’s a strong argument—one that I, personally, find completely convincing—why that’s too much to ask of any scientific theory.  Namely, no matter what the third-person facts were, one could always imagine a universe consistent with those facts in which no one “really” experienced anything.  So for example, if someone claims that integrated information “explains” why consciousness exists—nope, sorry!  I’ve just conjured into my imagination beings whose Φ-values are a thousand, nay a trillion times larger than humans’, yet who are also philosophical zombies: entities that there’s nothing that it’s like to be.  Granted, maybe such zombies can’t exist in the actual world: maybe, if you tried to create one, God would notice its large Φ-value and generously bequeath it a soul.  But if so, then that’s a further fact about our world, a fact that manifestly couldn’t be deduced from the properties of Φ alone.  Notice that the details of Φ are completely irrelevant to the argument.

Faced with this point, many scientifically-minded people start yelling and throwing things.  They say that “zombies” and so forth are empty metaphysics, and that our only hope of learning about consciousness is to engage with actual facts about the brain.  And that’s a perfectly reasonable position!  As far as I’m concerned, you absolutely have the option of dismissing Chalmers’ Hard Problem as a navel-gazing distraction from the real work of neuroscience.  The one thing you can’t do is have it both ways: that is, you can’t say both that the Hard Problem is meaningless, and that progress in neuroscience will soon solve the problem if it hasn’t already.  You can’t maintain simultaneously that

(a) once you account for someone’s observed behavior and the details of their brain organization, there’s nothing further about consciousness to be explained, and

(b) remarkably, the XYZ theory of consciousness can explain the “nothing further” (e.g., by reducing it to integrated information processing), or might be on the verge of doing so.

As obvious as this sounds, it seems to me that large swaths of consciousness-theorizing can just be summarily rejected for trying to have their brain and eat it in precisely the above way.

Fortunately, I think IIT survives the above observations.  For we can easily interpret IIT as trying to do something more “modest” than solve the Hard Problem, although still staggeringly audacious.  Namely, we can say that IIT “merely” aims to tell us which physical systems are associated with consciousness and which aren’t, purely in terms of the systems’ physical organization.  The test of such a theory is whether it can produce results agreeing with “commonsense intuition”: for example, whether it can affirm, from first principles, that (most) humans are conscious; that dogs and horses are also conscious but less so; that rocks, livers, bacteria colonies, and existing digital computers are not conscious (or are hardly conscious); and that a room full of people has no “mega-consciousness” over and above the consciousnesses of the individuals.

The reason it’s so important that the theory uphold “common sense” on these test cases is that, given the experimental inaccessibility of consciousness, this is basically the only test available to us.  If the theory gets the test cases “wrong” (i.e., gives results diverging from common sense), it’s not clear that there’s anything else for the theory to get “right.”  Of course, supposing we had a theory that got the test cases right, we could then have a field day with the less-obvious cases, programming our computers to tell us exactly how much consciousness is present in octopi, fetuses, brain-damaged patients, and hypothetical AI bots.

In my opinion, how to construct a theory that tells us which physical systems are conscious and which aren’t—giving answers that agree with “common sense” whenever the latter renders a verdict—is one of the deepest, most fascinating problems in all of science.  Since I don’t know a standard name for the problem, I hereby call it the Pretty-Hard Problem of Consciousness.  Unlike with the Hard Hard Problem, I don’t know of any philosophical reason why the Pretty-Hard Problem should be inherently unsolvable; but on the other hand, humans seem nowhere close to solving it (if we had solved it, then we could reduce the abortion, animal rights, and strong AI debates to “gentlemen, let us calculate!”).

Now, I regard IIT as a serious, honorable attempt to grapple with the Pretty-Hard Problem of Consciousness: something concrete enough to move the discussion forward.  But I also regard IIT as a failed attempt on the problem.  And I wish people would recognize its failure, learn from it, and move on.

In my view, IIT fails to solve the Pretty-Hard Problem because it unavoidably predicts vast amounts of consciousness in physical systems that no sane person would regard as particularly “conscious” at all: indeed, systems that do nothing but apply a low-density parity-check code, or other simple transformations of their input data.  Moreover, IIT predicts not merely that these systems are “slightly” conscious (which would be fine), but that they can be unboundedly more conscious than humans are.

To justify that claim, I first need to define Φ.  Strikingly, despite the large literature about Φ, I had a hard time finding a clear mathematical definition of it—one that not only listed formulas but fully defined the structures that the formulas were talking about.  Complicating matters further, there are several competing definitions of Φ in the literature, including ΦDM (discrete memoryless), ΦE (empirical), and ΦAR (autoregressive), which apply in different contexts (e.g., some take time evolution into account and others don’t).  Nevertheless, I think I can define Φ in a way that will make sense to theoretical computer scientists.  And crucially, the broad point I want to make about Φ won’t depend much on the details of its formalization anyway.

We consider a discrete system in a state x=(x1,…,xn)∈Sn, where S is a finite alphabet (the simplest case is S={0,1}).  We imagine that the system evolves via an “updating function” f:Sn→Sn. Then the question that interests us is whether the xi‘s can be partitioned into two sets A and B, of roughly comparable size, such that the updates to the variables in A don’t depend very much on the variables in B and vice versa.  If such a partition exists, then we say that the computation of f does not involve “global integration of information,” which on Tononi’s theory is a defining aspect of consciousness.

More formally, given a partition (A,B) of {1,…,n}, let us write an input y=(y1,…,yn)∈Sn to f in the form (yA,yB), where yA consists of the y variables in A and yB consists of the y variables in B.  Then we can think of f as mapping an input pair (yA,yB) to an output pair (zA,zB).  Now, we define the “effective information” EI(A→B) as H(zB | A random, yB=xB).  Or in words, EI(A→B) is the Shannon entropy of the output variables in B, if the input variables in A are drawn uniformly at random, while the input variables in B are fixed to their values in x.  It’s a measure of the dependence of B on A in the computation of f(x).  Similarly, we define

EI(B→A) := H(zA | B random, yA=xA).

We then consider the sum

Φ(A,B) := EI(A→B) + EI(B→A).

Intuitively, we’d like the integrated information Φ=Φ(f,x) be the minimum of Φ(A,B), over all 2n-2 possible partitions of {1,…,n} into nonempty sets A and B.  The idea is that Φ should be large, if and only if it’s not possible to partition the variables into two sets A and B, in such a way that not much information flows from A to B or vice versa when f(x) is computed.

However, no sooner do we propose this than we notice a technical problem.  What if A is much larger than B, or vice versa?  As an extreme case, what if A={1,…,n-1} and B={n}?  In that case, we’ll have Φ(A,B)≤2log2|S|, but only for the boring reason that there’s hardly any entropy in B as a whole, to either influence A or be influenced by it.  For this reason, Tononi proposes a fix where we normalize each Φ(A,B) by dividing it by min{|A|,|B|}.  He then defines the integrated information Φ to be Φ(A,B), for whichever partition (A,B) minimizes the ratio Φ(A,B) / min{|A|,|B|}.  (Unless I missed it, Tononi never specifies what we should do if there are multiple (A,B)’s that all achieve the same minimum of Φ(A,B) / min{|A|,|B|}.  I’ll return to that point later, along with other idiosyncrasies of the normalization procedure.)

Tononi gives some simple examples of the computation of Φ, showing that it is indeed larger for systems that are more “richly interconnected” in an intuitive sense.  He speculates, plausibly, that Φ is quite large for (some reasonable model of) the interconnection network of the human brain—and probably larger for the brain than for typical electronic devices (which tend to be highly modular in design, thereby decreasing their Φ), or, let’s say, than for other organs like the pancreas.  Ambitiously, he even speculates at length about how a large value of Φ might be connected to the phenomenology of consciousness.

To be sure, empirical work in integrated information theory has been hampered by three difficulties.  The first difficulty is that we don’t know the detailed interconnection network of the human brain.  The second difficulty is that it’s not even clear what we should define that network to be: for example, as a crude first attempt, should we assign a Boolean variable to each neuron, which equals 1 if the neuron is currently firing and 0 if it’s not firing, and let f be the function that updates those variables over a timescale of, say, a millisecond?  What other variables do we need—firing rates, internal states of the neurons, neurotransmitter levels?  Is choosing many of these variables uniformly at random (for the purpose of calculating Φ) really a reasonable way to “randomize” the variables, and if not, what other prescription should we use?

The third and final difficulty is that, even if we knew exactly what we meant by “the f and x corresponding to the human brain,” and even if we had complete knowledge of that f and x, computing Φ(f,x) could still be computationally intractable.  For recall that the definition of Φ involved minimizing a quantity over all the exponentially-many possible bipartitions of {1,…,n}.  While it’s not directly relevant to my arguments in this post, I leave it as a challenge for interested readers to pin down the computational complexity of approximating Φ to some reasonable precision, assuming that f is specified by a polynomial-size Boolean circuit, or alternatively, by an NC0 function (i.e., a function each of whose outputs depends on only a constant number of the inputs).  (Presumably Φ will be #P-hard to calculate exactly, but only because calculating entropy exactly is a #P-hard problem—that’s not interesting.)

I conjecture that approximating Φ is an NP-hard problem, even for restricted families of f’s like NC0 circuits—which invites the amusing thought that God, or Nature, would need to solve an NP-hard problem just to decide whether or not to imbue a given physical system with consciousness!  (Alas, if you wanted to exploit this as a practical approach for solving NP-complete problems such as 3SAT, you’d need to do a rather drastic experiment on your own brain—an experiment whose result would be to render you unconscious if your 3SAT instance was satisfiable, or conscious if it was unsatisfiable!  In neither case would you be able to communicate the outcome of the experiment to anyone else, nor would you have any recollection of the outcome after the experiment was finished.)  In the other direction, it would also be interesting to upper-bound the complexity of approximating Φ.  Because of the need to estimate the entropies of distributions (even given a bipartition (A,B)), I don’t know that this problem is in NP—the best I can observe is that it’s in AM.

In any case, my own reason for rejecting IIT has nothing to do with any of the “merely practical” issues above: neither the difficulty of defining f and x, nor the difficulty of learning them, nor the difficulty of calculating Φ(f,x).  My reason is much more basic, striking directly at the hypothesized link between “integrated information” and consciousness.  Specifically, I claim the following:

Yes, it might be a decent rule of thumb that, if you want to know which brain regions (for example) are associated with consciousness, you should start by looking for regions with lots of information integration.  And yes, it’s even possible, for all I know, that having a large Φ-value is one necessary condition among many for a physical system to be conscious.  However, having a large Φ-value is certainly not a sufficient condition for consciousness, or even for the appearance of consciousness.  As a consequence, Φ can’t possibly capture the essence of what makes a physical system conscious, or even of what makes a system look conscious to external observers.

The demonstration of this claim is embarrassingly simple.  Let S=Fp, where p is some prime sufficiently larger than n, and let V be an n×n Vandermonde matrix over Fp—that is, a matrix whose (i,j) entry equals ij-1 (mod p).  Then let f:Sn→Sn be the update function defined by f(x)=Vx.  Now, for p large enough, the Vandermonde matrix is well-known to have the property that every submatrix is full-rank (i.e., “every submatrix preserves all the information that it’s possible to preserve about the part of x that it acts on”).  And this implies that, regardless of which bipartition (A,B) of {1,…,n} we choose, we’ll get

EI(A→B) = EI(B→A) = min{|A|,|B|} log2p,

and hence

Φ(A,B) = EI(A→B) + EI(B→A) = 2 min{|A|,|B|} log2p,

or after normalizing,

Φ(A,B) / min{|A|,|B|} = 2 log2p.

Or in words: the normalized information integration has the same value—namely, the maximum value!—for every possible bipartition.  Now, I’d like to proceed from here to a determination of Φ itself, but I’m prevented from doing so by the ambiguity in the definition of Φ that I noted earlier.  Namely, since every bipartition (A,B) minimizes the normalized value Φ(A,B) / min{|A|,|B|}, in theory I ought to be able to pick any of them for the purpose of calculating Φ.  But the unnormalized value Φ(A,B), which gives the final Φ, can vary greatly, across bipartitions: from 2 log2p (if min{|A|,|B|}=1) all the way up to n log2p (if min{|A|,|B|}=n/2).  So at this point, Φ is simply undefined.

On the other hand, I can solve this problem, and make Φ well-defined, by an ironic little hack.  The hack is to replace the Vandermonde matrix V by an n×n matrix W, which consists of the first n/2 rows of the Vandermonde matrix each repeated twice (assume for simplicity that n is a multiple of 4).  As before, we let f(x)=Wx.  Then if we set A={1,…,n/2} and B={n/2+1,…,n}, we can achieve

EI(A→B) = EI(B→A) = (n/4) log2p,

Φ(A,B) = EI(A→B) + EI(B→A) = (n/2) log2p,

and hence

Φ(A,B) / min{|A|,|B|} = log2p.

In this case, I claim that the above is the unique bipartition that minimizes the normalized integrated information Φ(A,B) / min{|A|,|B|}, up to trivial reorderings of the rows.  To prove this claim: if |A|=|B|=n/2, then clearly we minimize Φ(A,B) by maximizing the number of repeated rows in A and the number of repeated rows in B, exactly as we did above.  Thus, assume |A|≤|B| (the case |B|≤|A| is analogous).  Then clearly

EI(B→A) ≥ |A|/2,

while

EI(A→B) ≥ min{|A|, |B|/2}.

So if we let |A|=cn and |B|=(1-c)n for some c∈(0,1/2], then

Φ(A,B) ≥ [c/2 + min{c, (1-c)/2}] n,

and

Φ(A,B) / min{|A|,|B|} = Φ(A,B) / |A| = 1/2 + min{1, 1/(2c) – 1/2}.

But the above expression is uniquely minimized when c=1/2.  Hence the normalized integrated information is minimized essentially uniquely by setting A={1,…,n/2} and B={n/2+1,…,n}, and we get

Φ = Φ(A,B) = (n/2) log2p,

which is quite a large value (only a factor of 2 less than the trivial upper bound of n log2p).

Now, why did I call the switch from V to W an “ironic little hack”?  Because, in order to ensure a large value of Φ, I decreased—by a factor of 2, in fact—the amount of “information integration” that was intuitively happening in my system!  I did that in order to decrease the normalized value Φ(A,B) / min{|A|,|B|} for the particular bipartition (A,B) that I cared about, thereby ensuring that that (A,B) would be chosen over all the other bipartitions, thereby increasing the final, unnormalized value Φ(A,B) that Tononi’s prescription tells me to return.  I hope I’m not alone in fearing that this illustrates a disturbing non-robustness in the definition of Φ.

But let’s leave that issue aside; maybe it can be ameliorated by fiddling with the definition.  The broader point is this: I’ve shown that my system—the system that simply applies the matrix W to an input vector x—has an enormous amount of integrated information Φ.  Indeed, this system’s Φ equals half of its entire information content.  So for example, if n were 1014 or so—something that wouldn’t be hard to arrange with existing computers—then this system’s Φ would exceed any plausible upper bound on the integrated information content of the human brain.

And yet this Vandermonde system doesn’t even come close to doing anything that we’d want to call intelligent, let alone conscious!  When you apply the Vandermonde matrix to a vector, all you’re really doing is mapping the list of coefficients of a degree-(n-1) polynomial over Fp, to the values of the polynomial on the n points 0,1,…,n-1.  Now, evaluating a polynomial on a set of points turns out to be an excellent way to achieve “integrated information,” with every subset of outputs as correlated with every subset of inputs as it could possibly be.  In fact, that’s precisely why polynomials are used so heavily in error-correcting codes, such as the Reed-Solomon code, employed (among many other places) in CD’s and DVD’s.  But that doesn’t imply that every time you start up your DVD player you’re lighting the fire of consciousness.  It doesn’t even hint at such a thing.  All it tells us is that you can have integrated information without consciousness (or even intelligence)—just like you can have computation without consciousness, and unpredictability without consciousness, and electricity without consciousness.

It might be objected that, in defining my “Vandermonde system,” I was too abstract and mathematical.  I said that the system maps the input vector x to the output vector Wx, but I didn’t say anything about how it did so.  To perform a computation—even a computation as simple as a matrix-vector multiply—won’t we need a physical network of wires, logic gates, and so forth?  And in any realistic such network, won’t each logic gate be directly connected to at most a few other gates, rather than to billions of them?  And if we define the integrated information Φ, not directly in terms of the inputs and outputs of the function f(x)=Wx, but in terms of all the actual logic gates involved in computing f, isn’t it possible or even likely that Φ will go back down?

This is a good objection, but I don’t think it can rescue IIT.  For we can achieve the same qualitative effect that I illustrated with the Vandermonde matrix—the same “global information integration,” in which every large set of outputs depends heavily on every large set of inputs—even using much “sparser” computations, ones where each individual output depends on only a few of the inputs.  This is precisely the idea behind low-density parity check (LDPC) codes, which have had a major impact on coding theory over the past two decades.  Of course, one would need to muck around a bit to construct a physical system based on LDPC codes whose integrated information Φ was provably large, and for which there were no wildly-unbalanced bipartitions that achieved lower Φ(A,B)/min{|A|,|B|} values than the balanced bipartitions one cared about.  But I feel safe in asserting that this could be done, similarly to how I did it with the Vandermonde matrix.

More generally, we can achieve pretty good information integration by hooking together logic gates according to any bipartite expander graph: that is, any graph with n vertices on each side, such that every k vertices on the left side are connected to at least min{(1+ε)k,n} vertices on the right side, for some constant ε>0.  And it’s well-known how to create expander graphs whose degree (i.e., the number of edges incident to each vertex, or the number of wires coming out of each logic gate) is a constant, such as 3.  One can do so either by plunking down edges at random, or (less trivially) by explicit constructions from algebra or combinatorics.  And as indicated in the title of this post, I feel 100% confident in saying that the so-constructed expander graphs are not conscious!  The brain might be an expander, but not every expander is a brain.

Before winding down this post, I can’t resist telling you that the concept of integrated information (though it wasn’t called that) played an interesting role in computational complexity in the 1970s.  As I understand the history, Leslie Valiant conjectured that Boolean functions f:{0,1}n→{0,1}n with a high degree of “information integration” (such as discrete analogues of the Fourier transform) might be good candidates for proving circuit lower bounds, which in turn might be baby steps toward P≠NP.  More strongly, Valiant conjectured that the property of information integration, all by itself, implied that such functions had to be at least somewhat computationally complex—i.e., that they couldn’t be computed by circuits of size O(n), or even required circuits of size Ω(n log n).  Alas, that hope was refuted by Valiant’s later discovery of linear-size superconcentrators.  Just as information integration doesn’t suffice for intelligence or consciousness, so Valiant learned that information integration doesn’t suffice for circuit lower bounds either.

As humans, we seem to have the intuition that global integration of information is such a powerful property that no “simple” or “mundane” computational process could possibly achieve it.  But our intuition is wrong.  If it were right, then we wouldn’t have linear-size superconcentrators or LDPC codes.

I should mention that I had the privilege of briefly speaking with Giulio Tononi (as well as his collaborator, Christof Koch) this winter at an FQXi conference in Puerto Rico.  At that time, I challenged Tononi with a much cruder, handwavier version of some of the same points that I made above.  Tononi’s response, as best as I can reconstruct it, was that it’s wrong to approach IIT like a mathematician; instead one needs to start “from the inside,” with the phenomenology of consciousness, and only then try to build general theories that can be tested against counterexamples.  This response perplexed me: of course you can start from phenomenology, or from anything else you like, when constructing your theory of consciousness.  However, once your theory has been constructed, surely it’s then fair game for others to try to refute it with counterexamples?  And surely the theory should be judged, like anything else in science or philosophy, by how well it withstands such attacks?

But let me end on a positive note.  In my opinion, the fact that Integrated Information Theory is wrong—demonstrably wrong, for reasons that go to its core—puts it in something like the top 2% of all mathematical theories of consciousness ever proposed.  Almost all competing theories of consciousness, it seems to me, have been so vague, fluffy, and malleable that they can only aspire to wrongness.

[Endnote: See also this related post, by the philosopher Eric Schwetzgebel: Why Tononi Should Think That the United States Is Conscious.  While the discussion is much more informal, and the proposed counterexample more debatable, the basic objection to IIT is the same.]

Update (5/22): Here are a few clarifications of this post that might be helpful.

(1) The stuff about zombies and the Hard Problem was simply meant as motivation and background for what I called the “Pretty-Hard Problem of Consciousness”—the problem that I take IIT to be addressing.  You can disagree with the zombie stuff without it having any effect on my arguments about IIT.

(2) I wasn’t arguing in this post that dualism is true, or that consciousness is irreducibly mysterious, or that there could never be any convincing theory that told us how much consciousness was present in a physical system.  All I was arguing was that, at any rate, IIT is not such a theory.

(3) Yes, it’s true that my demonstration of IIT’s falsehood assumes—as an axiom, if you like—that while we might not know exactly what we mean by “consciousness,” at any rate we’re talking about something that humans have to a greater extent than DVD players.  If you reject that axiom, then I’d simply want to define a new word for a certain quality that non-anesthetized humans seem to have and that DVD players seem not to, and clarify that that other quality is the one I’m interested in.

(4) For my counterexample, the reason I chose the Vandermonde matrix is not merely that it’s invertible, but that all of its submatrices are full-rank.  This is the property that’s relevant for producing a large value of the integrated information Φ; by contrast, note that the identity matrix is invertible, but produces a system with Φ=0.  (As another note, if we work over a large enough field, then a random matrix will have this same property with high probability—but I wanted an explicit example, and while the Vandermonde is far from the only one, it’s one of the simplest.)

(5) The n×n Vandermonde matrix only does what I want if we work over (say) a prime field Fp with p>>n elements.  Thus, it’s natural to wonder whether similar examples exist where the basic system variables are bits, rather than elements of Fp.  The answer is yes. One way to get such examples is using the low-density parity check codes that I mention in the post.  Another common way to get Boolean examples, and which is also used in practice in error-correcting codes, is to start with the Vandermonde matrix (a.k.a. the Reed-Solomon code), and then combine it with an additional component that encodes the elements of Fp as strings of bits in some way.  Of course, you then need to check that doing this doesn’t harm the properties of the original Vandermonde matrix that you cared about (e.g., the “information integration”) too much, which causes some additional complication.

(6) Finally, it might be objected that my counterexamples ignored the issue of dynamics and “feedback loops”: they all consisted of unidirectional processes, which map inputs to outputs and then halt.  However, this can be fixed by the simple expedient of iterating the process over and over!  I.e., first map x to Wx, then map Wx to W2x, and so on.  The integrated information should then be the same as in the unidirectional case.

Update (5/24): See a very interesting comment by David Chalmers.

### The NEW Ten Most Annoying Questions in Quantum Computing

Tuesday, May 13th, 2014

Eight years ago, I put up a post entitled The Ten Most Annoying Questions in Quantum Computing.  One of the ten wasn’t a real question—it was simply a request for readers to submit questions—so let’s call it nine.  I’m delighted to say that, of the nine questions, six have by now been completely settled—most recently, my question about the parallel-repeated value of the CHSH game, which Andris Ambainis pointed out to me last week can be answered using a 2008 result of Barak et al. combined with a 2013 result of Dinur and Steurer.

To be clear, the demise of so many problems is exactly the outcome I wanted. In picking problems, my goal wasn’t to shock and awe with difficulty—as if to say “this is how smart I am, that whatever stumps me will also stump everyone else for decades.” Nor was it to showcase my bottomless profundity, by proffering questions so vague, multipartite, and open-ended that no matter what progress was made, I could always reply “ah, but you still haven’t addressed the real question!” Nor, finally, was my goal to list the biggest research directions for the entire field, the stuff everyone already knows about (“is there a polynomial-time quantum algorithm for graph isomorphism?”). My interest was exclusively in “little” questions, in weird puzzles that looked (at least at the time) like there was no deep obstruction to just killing them one by one, whichever way their answers turned out. What made them annoying was that they hadn’t succumbed already.

So, now that two-thirds of my problems have met the fate they deserved, at Andris’s suggestion I’m presenting a new list of Ten Most Annoying Questions in Quantum Computing—a list that starts with the three still-unanswered questions from the old list, and then adds seven more.

But we’ll get to that shortly. First, let’s review the six questions that have been answered.

CLOSED, NO-LONGER ANNOYING QUESTIONS IN QUANTUM COMPUTING

1. Given an n-qubit pure state, is there always a way to apply Hadamard gates to some subset of the qubits, so as to make all 2n computational basis states have nonzero amplitudes?  Positive answer by Ashley Montanaro and Dan Shepherd, posted to this blog in 2006.

3. Can any QMA(2) (QMA with two unentangled yes-provers) protocol be amplified to exponentially small error probability?  Positive answer by Aram Harrow and Ashley Montanaro, from a FOCS’2010 paper.

4. If a unitary operation U can be applied in polynomial time, then can some square root of U also be applied in polynomial time?  Positive answer by Lana Sheridan, Dmitri Maslov, and Michele Mosca, from a 2008 paper.

5. Suppose Alice and Bob are playing n parallel CHSH games, with no communication or entanglement. Is the probability that they’ll win all n games at most pn, for some p bounded below 0.853?

OK, let me relay what Andris Ambainis told me about this question, with Andris’s kind permission. First of all, we’ve known for a while that the optimal success probability is not the (3/4)n that Alice and Bob could trivially achieve by just playing all n games separately. I observed in 2006 that, by correlating their strategies between pairs of games in a clever way, Alice and Bob can win with probability (√10 / 4)n ~ 0.79n. And Barak et al. showed in 2008 that they can win with probability ((1+√5)/4)n ~ 0.81n. (Unfortunately, I don’t know the actual strategy that achieves the latter bound!  Barak et al. say they’ll describe it in the full version of their paper, but the full version hasn’t yet appeared.)

Anyway, Dinur-Steurer 2013 gave a general recipe to prove that the value of a repeated projection game is at most αn, where α is some constant that depends on the game in question. When Andris followed their recipe for the CHSH game, he obtained the result α=(1+√5)/4—thereby showing that Barak et al.’s strategy, whatever it is, is precisely optimal! Andris also observes that, for any two-prover game G, the Dinur-Steurer bound α(G) is always strictly less than the entangled value ω*(G), unless the classical and entangled values are the same for one copy of the game (i.e., unless ω(G)=ω*(G)). This implies that parallel repetition can never completely eliminate a quantum advantage.

6. Forget about an oracle relative to which BQP is not in PH (the Polynomial Hierarchy). Forget about an oracle relative to which BQP is not in AM (Arthur-Merlin). Is there an oracle relative to which BQP is not in SZK (Statistical Zero-Knowledge)?  Positive answer by me, posted to this blog in 2006.  See also my BQP vs. PH paper for a different proof.

9. Is there an n-qubit pure state that can be prepared by a circuit of size n3, and that can’t be distinguished from the maximally mixed state by any circuit of size n2?  A positive answer follows from this 2009 paper by Richard Low—thanks very much to Fernando Brandao for bringing that to my attention a few months ago.

OK, now on to:

THE NEW TEN MOST ANNOYING QUESTIONS IN QUANTUM COMPUTING

1. Can we get any upper bound whatsoever on the complexity class QMIP—i.e., quantum multi-prover interactive proofs with unlimited prior entanglement? (Since I asked this question in 2006, Ito and Vidick achieved the breakthrough lower bound NEXP⊆QMIP, but there’s been basically no progress on the upper bound side.)

2. Given any n-qubit unitary operation U, does there exist an oracle relative to which U can be (approximately) applied in polynomial time? (Since 2006, my interest in this question has only increased. See this paper by me and Greg Kuperberg for background and related results.)

3. How many mutually unbiased bases are there in non-prime-power dimensions?

4. Since Chris Fuchs was so thrilled by my including one of his favorite questions on my earlier list (question #3 above), let me add another of his favorites: do SIC-POVMs exist in arbitrary finite dimensions?

5. Is there a Boolean function f:{0,1}n→{0,1} whose bounded-error quantum query complexity is strictly greater than n/2?  (Thanks to Shelby Kimmel for this question!  Note that this paper by van Dam shows that the bounded-error quantum query complexity never exceeds n/2+O(√n), while this paper by Ambainis et al. shows that it’s at least n/2-O(√n) for almost all Boolean functions f.)

6. Is there a “universal disentangler”: that is, a superoperator S that takes nO(1) qubits as input; that produces a 2n-qubit bipartite state (with n qubits on each side) as output; whose output S(ρ) is always close in variation distance to a separable state; and that given an appropriate input state, can produce as output an approximation to any desired separable state?  (See here for background about this problem, originally posed by John Watrous. Note that if such an S existed and were computationally efficient, it would imply QMA=QMA(2).)

7. Suppose we have explicit descriptions of n two-outcome POVM measurements—say, as d×d Hermitian matrices E1,…,En—and are also given k=(log(nd))O(1) copies of an unknown quantum state ρ in d dimensions.  Is there a way to measure the copies so as to estimate the n expectation values Tr(E1ρ),…,Tr(Enρ), each to constant additive error?  (A forthcoming paper of mine on private-key quantum money will contain some background and related results.)

8. Is there a collection of 1- and 2-qubit gates that generates a group of unitary matrices that is (a) not universal for quantum computation, (b) not just conjugate to permuted diagonal matrices or one-qubit gates plus swaps, and (c) not conjugate to a subgroup of the Clifford group?

9. Given a partial Boolean function f:S→{0,1} with S⊆{0,1}n, is the bounded-error quantum query complexity of f always polynomially related to the smallest degree of any polynomial p:{0,1}n→R such that (a) p(x)∈[0,1] for all x∈{0,1}n, and (b) |p(x)-f(x)|≤1/3 for all x∈S?

10. Is there a quantum finite automaton that reads in an infinite sequence of i.i.d. coin flips, and whose limiting probability of being found in an “accept” state is at least 2/3 if the coin is fair and at most 1/3 if the coin is unfair?  (See this paper by me and Andy Drucker for background and related results.)

### The Quest for Randomness

Tuesday, April 22nd, 2014

So, I’ve written an article of that title for the wonderful American Scientist magazine—or rather, Part I of such an article.  This part explains the basics of Kolmogorov complexity and algorithmic information theory: how, under reasonable assumptions, these ideas can be used in principle to “certify” that a string of numbers was really produced randomly—something that one might’ve imagined impossible a priori.  Unfortunately, the article also explains why this fact is of limited use in practice: because Kolmogorov complexity is uncomputable!  Readers who already know this material won’t find much that’s new here, but I hope those who don’t will enjoy the piece.

Part II, to appear in the next issue, will be all about quantum entanglement and Bell’s Theorem, and their very recent use in striking protocols for generating so-called “Einstein-certified random numbers”—something of much more immediate practical interest.

Thanks so much to Fenella Saunders of American Scientist for commissioning these articles, and my apologies to her and any interested readers for the 4.5 years (!) it took me to get off my rear end (or rather, onto it) to write these things.

Update (4/28): Kate Becker of NOVA has published an article about “whether information is fundamental to reality,” which includes some quotes from me. Enjoy!

### Is There Anything Beyond Quantum Computing?

Friday, April 11th, 2014

So I’ve written an article about the above question for PBS’s website—a sort of tl;dr version of my 2005 survey paper NP-Complete Problems and Physical Reality, but updated with new material about the simulation of quantum field theories and about AdS/CFT.  Go over there, read the article (it’s free), then come back here to talk about it if you like.  Thanks so much to Kate Becker for commissioning the article.

In other news, there’s a profile of me at MIT News (called “The Complexonaut”) that some people might find amusing.

Oh, and anyone who thinks the main reason to care about quantum computing is that, if our civilization ever manages to surmount the profound scientific and technological obstacles to building a scalable quantum computer, then that little padlock icon on your web browser would no longer represent ironclad security?  Ha ha.  Yeah, it turns out that, besides factoring integers, you can also break OpenSSL by (for example) exploiting a memory bug in C.  The main reason to care about quantum computing is, and has always been, science.

### The Scientific Case for P≠NP

Friday, March 7th, 2014

Out there in the wider world—OK, OK, among Luboš Motl, and a few others who comment on this blog—there appears to be a widespread opinion that P≠NP is just “a fashionable dogma of the so-called experts,” something that’s no more likely to be true than false.  The doubters can even point to at least one accomplished complexity theorist, Dick Lipton, who publicly advocates agnosticism about whether P=NP.

Of course, not all the doubters reach their doubts the same way.  For Lipton, the thinking is probably something like: as scientists, we should be rigorously open-minded, and constantly question even the most fundamental hypotheses of our field.  For the outsiders, the thinking is more like: computer scientists are just not very smart—certainly not as smart as real scientists—so the fact that they consider something a “fundamental hypothesis” provides no information of value.

Consider, for example, this comment of Ignacio Mosqueira:

If there is no proof that means that there is no reason a-priori to prefer your arguments over those [of] Lubos. Expertise is not enough.  And the fact that Lubos is difficult to deal with doesn’t change that.

In my response, I wondered how broadly Ignacio would apply the principle “if there’s no proof, then there’s no reason to prefer any argument over any other one.”  For example, would he agree with the guy interviewed on Jon Stewart who earnestly explained that, since there’s no proof that turning on the LHC will destroy the world, but also no proof that it won’t destroy the world, the only rational inference is that there’s a 50% chance it will destroy the world?  (John Oliver’s deadpan response was classic: “I’m … not sure that’s how probability works…”)

In a lengthy reply, Luboš bites this bullet with relish and mustard.  In physics, he agrees, or even in “continuous mathematics that is more physics-wise,” it’s possible to have justified beliefs even without proof.  For example, he admits to a 99.9% probability that the Riemann hypothesis is true.  But, he goes on, “partial evidence in discrete mathematics just cannot exist.”  Discrete math and computer science, you see, are so arbitrary, manmade, and haphazard that every question is independent of every other; no amount of experience can give anyone any idea which way the next question will go.

No, I’m not kidding.  That’s his argument.

I couldn’t help wondering: what about number theory?  Aren’t the positive integers a “discrete” structure?  And isn’t the Riemann Hypothesis fundamentally about the distribution of primes?  Or does the Riemann Hypothesis get counted as an “honorary physics-wise continuous problem” because it can also be stated analytically?  But then what about Goldbach’s Conjecture?  Is Luboš 50/50 on that one too?  Better yet, what about continuous, analytic problems that are closely related to P vs. NP?  For example, Valiant’s Conjecture says you can’t linearly embed the permanent of an n×n matrix as the determinant of an m×m matrix, unless m≥exp(n).  Mulmuley and others have connected this “continuous cousin” of P≠NP to issues in algebraic geometry, representation theory, and even quantum groups and Langlands duality.  So, does that make it kosher?  The more I thought about the proposed distinction, the less sense it made to me.

But enough of this.  In the rest of this post, I want to explain why the odds that you should assign to P≠NP are more like 99% than they are like 50%.  This post supersedes my 2006 post on the same topic, which I hereby retire.  While that post was mostly OK as far as it went, I now feel like I can do a much better job articulating the central point.  (And also, I made the serious mistake in 2006 of striving for literary eloquence and tongue-in-cheek humor.  That works great for readers who already know the issues inside-and-out, and just want to be amused.  Alas, it doesn’t work so well for readers who don’t know the issues, are extremely literal-minded, and just want ammunition to prove their starting assumption that I’m a doofus who doesn’t understand the basics of his own field.)

So, OK, why should you believe P≠NP?  Here’s why:

Because, like any other successful scientific hypothesis, the P≠NP hypothesis has passed severe tests that it had no good reason to pass were it false.

What kind of tests am I talking about?

By now, tens of thousands of problems have been proved to be NP-complete.  They range in character from theorem proving to graph coloring to airline scheduling to bin packing to protein folding to auction pricing to VLSI design to minimizing soap films to winning at Super Mario Bros.  Meanwhile, another cluster of tens of thousands of problems has been proved to lie in P (or BPP).  Those range from primality to matching to linear and semidefinite programming to edit distance to polynomial factoring to hundreds of approximation tasks.  Like the NP-complete problems, many of the P and BPP problems are also related to each other by a rich network of reductions.  (For example, countless other problems are in P “because” linear and semidefinite programming are.)

So, if we were to draw a map of the complexity class NP  according to current knowledge, what would it look like?  There’d be a huge, growing component of NP-complete problems, all connected to each other by an intricate network of reductions.  There’d be a second huge component of P problems, many of them again connected by reductions.  Then, much like with the map of the continental US, there’d be a sparser population in the middle: stuff like factoring, graph isomorphism, and Unique Games that for various reasons has thus far resisted assimilation onto either of the coasts.

Of course, to prove P=NP, it would suffice to find a single link—that is, a single polynomial-time equivalence—between any of the tens of thousands of problems on the P coast, and any of the tens of thousands on the NP-complete one.  In half a century, this hasn’t happened: even as they’ve both ballooned exponentially, the two giant regions have remained defiantly separate from each other.  But that’s not even the main point.  The main point is that, as people explore these two regions, again and again there are “close calls”: places where, if a single parameter had worked out differently, the two regions would have come together in a cataclysmic collision.  Yet every single time, it’s just a fake-out.  Again and again the two regions “touch,” and their border even traces out weird and jagged shapes.  But even in those border zones, not a single problem ever crosses from one region to the other.  It’s as if they’re kept on their respective sides by an invisible electric fence.

As an example, consider the Set Cover problem: i.e., the problem, given a collection of subsets S1,…,Sm⊆{1,…,n}, of finding as few subsets as possible whose union equals the whole set.  Chvatal showed in 1979 that a greedy algorithm can produce, in polynomial time, a collection of sets whose size is at most ln(n) times larger than the optimum size.  This raises an obvious question: can you do better?  What about 0.9ln(n)?  Alas, building on a long sequence of prior works in PCP theory, it was recently shown that, if you could find a covering set at most (1-ε)ln(n) times larger than the optimum one, then you’d be solving an NP-complete problem, and P would equal NP.  Notice that, conversely, if the hardness result worked for ln(n) or anything above, then we’d also get P=NP.  So, why do the algorithm and the hardness result “happen to meet” at exactly ln(n), with neither one venturing the tiniest bit beyond?  Well, we might say, ln(n) is where the invisible electric fence is for this problem.

Want another example?  OK then, consider the “Boolean Max-k-CSP” problem: that is, the problem of setting n bits so as to satisfy the maximum number of constraints, where each constraint can involve an arbitrary Boolean function on any k of the bits.  The best known approximation algorithm, based on semidefinite programming, is guaranteed to satisfy at least a 2k/2k fraction of the constraints.  Can you guess where this is going?  Recently, Siu On Chan showed that it’s NP-hard to satisfy even slightly more than a 2k/2k fraction of constraints: if you can, then P=NP.  In this case the invisible electric fence sends off its shocks at 2k/2k.

I could multiply such examples endlessly—or at least, Dana (my source for such matters) could do so.  But there are also dozens of “weird coincidences” that involve running times rather than approximation ratios; and that strongly suggest, not only that P≠NP, but that problems like 3SAT should require cn time for some constant c.  For a recent example—not even a particularly important one, but one that’s fresh in my memory—consider this paper by myself, Dana, and Russell Impagliazzo.  A first thing we do in that paper is to give an approximation algorithm for a family of two-prover games called “free games.”  Our algorithm runs in quasipolynomial time:  specifically, nO(log(n)).  A second thing we do is show how to reduce the NP-complete 3SAT problem to free games of size ~2O(√n).

Composing those two results, you get an algorithm for 3SAT whose overall running time is roughly

$$2^{O( \sqrt{n} \log 2^{\sqrt{n}}) } = 2^{O(n)}.$$

Of course, this doesn’t improve on the trivial “try all possible solutions” algorithm.  But notice that, if our approximation algorithm for free games had been slightly faster—say, nO(log log(n))—then we could’ve used it to solve 3SAT in $$2^{O(\sqrt{n} \log n)}$$ time.  Conversely, if our reduction from 3SAT had produced free games of size (say) $$2^{O(n^{1/3})}$$ rather than 2O(√n), then we could’ve used that to solve 3SAT in $$2^{O(n^{2/3})}$$ time.

I should stress that these two results have completely different proofs: the approximation algorithm for free games “doesn’t know or care” about the existence of the reduction, nor does the reduction know or care about the algorithm.  Yet somehow, their respective parameters “conspire” so that 3SAT still needs cn time.  And you see the same sort of thing over and over, no matter which problem domain you’re interested in.  These ubiquitous “coincidences” would be immediately explained if 3SAT actually did require cn time—i.e., if it had a “hard core” for which brute-force search was unavoidable, no matter which way you sliced things up.  If that’s not true—i.e., if 3SAT has a subexponential algorithm—then we’re left with unexplained “spooky action at a distance.”  How do the algorithms and the reductions manage to coordinate with each other, every single time, to avoid spilling the subexponential secret?

Notice that, contrary to Luboš’s loud claims, there’s no “symmetry” between P=NP and P≠NP in these arguments.  Lower bound proofs are much harder to come across than either algorithms or reductions, and there’s not really a mystery about why: it’s hard to prove a negative!  (Especially when you’re up against known mathematical barriers, including relativization, algebrization, and natural proofs.)  In other words, even under the assumption that lower bound proofs exist, we now understand a lot about why the existing mathematical tools can’t deliver them, or can only do so for much easier problems.  Nor can I think of any example of a “spooky numerical coincidence” between two unrelated-seeming results, which would’ve yielded a proof of P≠NP had some parameters worked out differently.  P=NP and P≠NP can look like “symmetric” possibilities only if your symmetry is unbroken by knowledge.

Imagine a pond with small yellow frogs on one end, and large green frogs on the other.  After observing the frogs for decades, herpetologists conjecture that the populations represent two distinct species with different evolutionary histories, and are not interfertile.  Everyone realizes that to disprove this hypothesis, all it would take would be a single example of a green/yellow hybrid.  Since (for some reason) the herpetologists really care about this question, they undertake a huge program of breeding experiments, putting thousands of yellow female frogs next to green male frogs (and vice versa) during mating season, with candlelight, soft music, etc.  Nothing.

As this green vs. yellow frog conundrum grows in fame, other communities start investigating it as well: geneticists, ecologists, amateur nature-lovers, commercial animal breeders, ambitious teenagers on the science-fair circuit, and even some extralusionary physicists hoping to show up their dimwitted friends in biology.  These other communities try out hundreds of exotic breeding strategies that the herpetologists hadn’t considered, and contribute many useful insights.  They also manage to breed a larger, greener, but still yellow frog—something that, while it’s not a “true” hybrid, does have important practical applications for the frog-leg industry.  But in the end, no one has any success getting green and yellow frogs to mate.

Then one day, someone exclaims: “aha!  I just found a huge, previously-unexplored part of the pond where green and yellow frogs live together!  And what’s more, in this part, the small yellow frogs are bigger and greener than normal, and the large green frogs are smaller and yellower!”

This is exciting: the previously-sharp boundary separating green from yellow has been blurred!  Maybe the chasm can be crossed after all!

Alas, further investigation reveals that, even in the new part of the pond, the two frog populations still stay completely separate.  The smaller, yellower frogs there will mate with other small yellow frogs (even from faraway parts of the pond that they’d never ordinarily visit), but never, ever with the larger, greener frogs even from their own part.  And vice versa.  The result?  A discovery that could have falsified the original hypothesis has instead strengthened it—and precisely because it could’ve falsified it but didn’t.

Now imagine the above story repeated a few dozen more times—with more parts of the pond, a neighboring pond, sexually-precocious tadpoles, etc.  Oh, and I forgot to say this before, but imagine that doing a DNA analysis, to prove once and for all that the green and yellow frogs had separate lineages, is extraordinarily difficult.  But the geneticists know why it’s so difficult, and the reasons have more to do with the limits of their sequencing machines and with certain peculiarities of frog DNA, than with anything about these specific frogs.  In fact, the geneticists did get the sequencing machines to work for the easier cases of turtles and snakes—and in those cases, their results usually dovetailed well with earlier guesses based on behavior.  So for example, where reddish turtles and bluish turtles had never been observed interbreeding, the reason really did turn out to be that they came from separate species.  There were some surprises, of course, but nothing even remotely as shocking as seeing the green and yellow frogs suddenly getting it on.

Now, even after all this, someone could saunter over to the pond and say: “ha, what a bunch of morons!  I’ve never even seen a frog or heard one croak, but I know that you haven’t proved anything!  For all you know, the green and yellow frogs will start going at it tomorrow.  And don’t even tell me about ‘the weight of evidence,’ blah blah blah.  Biology is a scummy mud-discipline.  It has no ideas or principles; it’s just a random assortment of unrelated facts.  If the frogs started mating tomorrow, that would just be another brute, arbitrary fact, no more surprising or unsurprising than if they didn’t start mating tomorrow.  You jokers promote the ideology that green and yellow frogs are separate species, not because the evidence warrants it, but just because it’s a convenient way to cover up your own embarrassing failure to get them to mate.  I could probably breed them myself in ten minutes, but I have better things to do.”

At this, a few onlookers might nod appreciatively and say: “y’know, that guy might be an asshole, but let’s give him credit: he’s unafraid to speak truth to competence.”

Even among the herpetologists, a few might beat their breasts and announce: “Who’s to say he isn’t right?  I mean, what do we really know?  How do we know there even is a pond, or that these so-called ‘frogs’ aren’t secretly giraffes?  I, at least, have some small measure of wisdom, in that I know that I know nothing.”

What I want you to notice is how scientifically worthless all of these comments are.  If you wanted to do actual research on the frogs, then regardless of which sympathies you started with, you’d have no choice but to ignore the naysayers, and proceed as if the yellow and green frogs were different species.  Sure, you’d have in the back of your mind that they might be the same; you’d be ready to adjust your views if new evidence came in.  But for now, the theory that there’s just one species, divided into two subgroups that happen never to mate despite living in the same habitat, fails miserably at making contact with any of the facts that have been learned.  It leaves too much unexplained; in fact it explains nothing.

For all that, you might ask, don’t the naysayers occasionally turn out to be right?  Of course they do!  But if they were right more than occasionally, then science wouldn’t be possible.  We would still be in caves, beating our breasts and asking how we can know that frogs aren’t secretly giraffes.

So, that’s what I think about P and NP.  Do I expect this post to convince everyone?  No—but to tell you the truth, I don’t want it to.  I want it to convince most people, but I also want a few to continue speculating that P=NP.

Why, despite everything I’ve said, do I want maybe-P=NP-ism not to die out entirely?  Because alongside the P=NP carpers, I also often hear from a second group of carpers.  This second group says that P and NP are so obviously, self-evidently unequal that the quest to separate them with mathematical rigor is quixotic and absurd.  Theoretical computer scientists should quit wasting their time struggling to understand truths that don’t need to be understood, but only accepted, and do something useful for the world.  (A natural generalization of this view, I guess, is that all basic science should end.)  So, what I really want is for the two opposing groups of naysayers to keep each other in check, so that those who feel impelled to do so can get on with the fascinating quest to understand the ultimate limits of computation.

Update (March 8): At least eight readers have by now emailed me, or left comments, asking why I’m wasting so much time and energy arguing with Luboš Motl.  Isn’t it obvious that, ever since he stopped doing research around 2006 (if not earlier), this guy has completely lost his marbles?  That he’ll never, ever change his mind about anything?

Yes.  In fact, I’ve noticed repeatedly that, even when Luboš is wrong about a straightforward factual matter, he never really admits error: he just switches, without skipping a beat, to some other way to attack his interlocutor.  (To give a small example: watch how he reacts to being told that graph isomorphism is neither known nor believed to be NP-complete.  Caught making a freshman-level error about the field he’s attacking, he simply rants about how graph isomorphism is just as “representative” and “important” as NP-complete problems anyway, since no discrete math question is ever more or less “important” than any other; they’re all equally contrived and arbitrary.  At the Luboš casino, you lose even when you win!  The only thing you can do is stop playing and walk away.)

Anyway, my goal here was never to convince Luboš.  I was writing, not for him, but for my other readers: especially for those genuinely unfamiliar with these interesting issues, or intimidated by Luboš’s air of certainty.  I felt like I owed it to them to set out, clearly and forcefully, certain facts that all complexity theorists have encountered in their research, but that we hardly ever bother to articulate.  If you’ve never studied physics, then yes, it sounds crazy that there would be quadrillions of invisible neutrinos coursing through your body.  And if you’ve never studied computer science, it sounds crazy that there would be an “invisible electric fence,” again and again just barely separating what the state-of-the-art approximation algorithms can handle from what the state-of-the-art PCP tools can prove is NP-complete.  But there it is, and I wanted everyone else at least to see what the experts see, so that their personal judgments about the likelihood of P=NP could be informed by seeing it.

Luboš’s response to my post disappointed me (yes, really!).  I expected it to be nasty and unhinged, and so it was.  What I didn’t expect was that it would be so intellectually lightweight.  Confronted with the total untenability of his foot-stomping distinction between “continuous math” (where you can have justified beliefs without proof) and “discrete math” (where you can’t), and with exactly the sorts of “detailed, confirmed predictions” of the P≠NP hypothesis that he’d declared impossible, Luboš’s response was simply to repeat his original misconceptions, but louder.

And that brings me, I confess, to a second reason for my engagement with Luboš.  Several times, I’ve heard people express sentiments like:

Yes, of course Luboš is a raging jerk and a social retard.  But if you can just get past that, he’s so sharp and intellectually honest!  No matter how many people he needlessly offends, he always tells it like it is.

I want the nerd world to see—in as stark a situation as possible—that the above is not correct.  Luboš is wrong much of the time, and he’s intellectually dishonest.

At one point in his post, Luboš actually compares computer scientists who find P≠NP a plausible working hypothesis to his even greater nemesis: the “climate cataclysmic crackpots.”  (Strangely, he forgot to compare us to feminists, Communists, Muslim terrorists, or loop quantum gravity theorists.)  Even though the P versus NP and global warming issues might not seem closely linked, part of me is thrilled that Luboš has connected them as he has.  If, after seeing this ex-physicist’s “thought process” laid bare on the P versus NP problem—how his arrogance and incuriosity lead him to stake out a laughably-absurd position; how his vanity then causes him to double down after his errors are exposed—if, after seeing this, a single person is led to question Lubošian epistemology more generally, then my efforts will not have been in vain.

Anyway, now that I’ve finally unmasked Luboš—certainly to my own satisfaction, and I hope to that of most scientifically-literate readers—I’m done with this.  The physicist John Baez is rumored to have said: “It’s not easy to ignore Luboš, but it’s ALWAYS worth the effort.”  It took me eight years, but I finally see the multiple layers of profundity hidden in that snark.

And thus I make the following announcement:

For the next three years, I, Scott Aaronson, will not respond to anything Luboš says, nor will I allow him to comment on this blog.

In March 2017, I’ll reassess my Luboš policy.  Whether I relent will depend on a variety of factors—including whether Luboš has gotten the professional help he needs (from a winged pig, perhaps?) and changed his behavior; but also, how much my own quality of life has improved in the meantime.

Another Update (3/11): There’s some further thoughtful discussion of this post over on Reddit.

Another Update (3/13): Check out my MathOverflow question directly inspired by the comments on this post.

Yet Another Update (3/17): Dick Lipton and Ken Regan now have a response up to this post. My own response is coming soon in their comment section. For now, check out an excellent comment by Timothy Gowers, which begins “I firmly believe that P≠NP,” then plays devil’s-advocate by exploring the possibility that in this comment thread I called P being ‘severed in two,’ then finally returns to reasons for believing that P≠NP after all.

### Recent papers by Susskind and Tao illustrate the long reach of computation

Sunday, March 2nd, 2014

Most of the time, I’m a crabby, cantankerous ogre, whose only real passion in life is using this blog to shoot down the wrong ideas of others.  But alas, try as I might to maintain my reputation as a pure bundle of seething negativity, sometimes events transpire that pierce my crusty exterior.  Maybe it’s because I’m in Berkeley now, visiting the new Simons Institute for Theory of Computing during its special semester on Hamiltonian complexity.  And it’s tough to keep up my acerbic East Coast skepticism of everything new in the face of all this friggin’ sunshine.  (Speaking of which, if you’re in the Bay Area and wanted to meet me, this week’s the week!  Email me.)  Or maybe it’s watching Lily running around, her face wide with wonder.  If she’s so excited by her discovery of (say) a toilet plunger or some lint on the floor, what right do I have not to be excited by actual scientific progress?

Which brings me to the third reason for my relatively-sunny disposition: two long and fascinating recent papers on the arXiv.  What these papers have in common is that they use concepts from theoretical computer science in unexpected ways, while trying to address open problems at the heart of “traditional, continuous” physics and math.  One paper uses quantum circuit complexity to help understand black holes; the other uses fault-tolerant universal computation to help understand the Navier-Stokes equations.

Recently, our always-pleasant string-theorist friend Luboš Motl described computational complexity theorists as “extraordinarily naïve” (earlier, he also called us “deluded” and “bigoted”).  Why?  Because we’re obsessed with “arbitrary, manmade” concepts like the set of problems solvable in polynomial time, and especially because we assume things we haven’t yet proved such as P≠NP.  (Jokes about throwing stones from a glass house—or a stringy house—are left as exercises for the reader.)  The two papers that I want to discuss today reflect a different perspective: one that regards computation as no more “arbitrary” than other central concepts of mathematics, and indeed, as something that shows up even in contexts that seem incredibly remote from it, from the AdS/CFT correspondence to turbulent fluid flow.

Our first paper is Computational Complexity and Black Hole Horizons, by Lenny Susskind.  As readers of this blog might recall, last year Daniel Harlow and Patrick Hayden made a striking connection between computational complexity and the black-hole “firewall” question, by giving complexity-theoretic evidence that performing the measurement of Hawking radiation required for the AMPS experiment would require an exponentially-long quantum computation.  In his new work, Susskind makes a different, and in some ways even stranger, connection between complexity and firewalls.  Specifically, given an n-qubit pure state |ψ〉, recall that the quantum circuit complexity of |ψ〉 is the minimum number of 2-qubit gates needed to prepare |ψ〉 starting from the all-|0〉 state.  Then for reasons related to black holes and firewalls, Susskind wants to use the quantum circuit complexity of |ψ〉 as an intrinsic clock, to measure how long |ψ〉 has been evolving for.  Last week, I had the pleasure of visiting Stanford, where Lenny spent several hours explaining this stuff to me.  I still don’t fully understand it, but since it’s arguable that no one (including Lenny himself) does, let me give it a shot.

My approach will be to divide into two questions.  The first question is: why, in general (i.e., forgetting about black holes), might one want to use quantum circuit complexity as a clock?  Here the answer is: because unlike most other clocks, this one should continue to tick for an exponentially long time!

Consider some standard, classical thermodynamic system, like a box filled with gas, with the gas all initially concentrated in one corner.  Over time, the gas will diffuse across the box, in accord with the Second Law, until it completely equilibrates.  Furthermore, if we know the laws of physics, then we can calculate exactly how fast this diffusion will happen.  But this implies that we can use the box as a clock!  To do so, we’d simply have to measure how diffused the gas was, then work backwards to determine how much time had elapsed since the gas started diffusing.

But notice that this “clock” only works until the gas reaches equilibrium—i.e., is equally spread across the box.  Once the gas gets to equilibrium, which it does in a reasonably short time, it just stays there (at least until the next Poincaré recurrence time).  So, if you see the box in equilibrium, there’s no measurement you could make—or certainly no “practical” measurement—that would tell you how long it’s been there.  Indeed, if we model the collisions between gas particles (and between gas particles and the walls of the box) as random events, then something even stronger is true.  Namely, the probability distribution over all possible configurations of the gas particles will quickly converge to an equilibrium distribution.  And if you all you knew was that the particles were in the equilibrium distribution, then there’s no property of their distribution that you could point to—not even an abstract, unmeasurable property—such that knowing that property would tell you how long the gas had been in equilibrium.

Interestingly, something very different happens if we consider a quantum pure state, in complete isolation from its environment.  If you have some quantum particles in a perfectly-isolating box, and you start them out in a “simple” state (say, with all particles unentangled and in a corner), then they too will appear to diffuse, with their wavefunctions spreading out and getting entangled with each other, until the system reaches “equilibrium.”  After that, there will once again be no “simple” measurement you can make—say, of the density of particles in some particular location—that will give you any idea of how long the box has been in equilibrium.  On the other hand, the laws of unitary evolution assure us that the quantum state is still evolving, rotating serenely through Hilbert space, just like it was before equilibration!  Indeed, in principle you could even measure that the evolution was still happening, but to do so, you’d need to perform an absurdly precise and complicated measurement—one that basically inverted the entire unitary transformation that had been applied since the particles started diffusing.

Lenny now asks the question: if the quantum state of the particles continues to evolve even after “equilibration,” then what physical quantity can we point to as continuing to increase?  By the argument above, it can’t be anything simple that physicists are used to talking about, like coarse-grained entropy.  Indeed, the most obvious candidate that springs to mind, for a quantity that should keep increasing even after equilibration, is just the quantum circuit complexity of the state!  If there’s no “magic shortcut” to simulating this system—that is, if the fastest way to learn the quantum state at time T is just to run the evolution equations forward for T time steps—then the quantum circuit complexity will continue to increase linearly with T, long after equilibration.  Eventually, the complexity will “max out” at ~cn, where n is the number of particles, simply because (neglecting small multiplicative terms) the dimension of the Hilbert space is always an upper bound on the circuit complexity.  After even longer amounts of time—like ~cc^n—the circuit complexity will dip back down (sometimes even to 0), as the quantum state undergoes recurrences.  But both of those effects only occur on timescales ridiculously longer than anything normally relevant to physics or everyday life.

Admittedly, given the current status of complexity theory, there’s little hope of proving unconditionally that the quantum circuit complexity continues to rise until it becomes exponential, when some time-independent Hamiltonian is continuously applied to the all-|0〉 state.  (If we could prove such a statement, then presumably we could also prove PSPACE⊄BQP/poly.)  But maybe we could prove such a statement modulo a reasonable conjecture.  And we do have suggestive weaker results.  In particular (and as I just learned this Friday), in 2012 Brandão, Harrow, and Horodecki, building on earlier work due to Low, showed that, if you apply S>>n random two-qubit gates to n qubits initially in the all-|0〉 state, then with high probability, not only do you get a state with large circuit complexity, you get a state that can’t even be distinguished from the maximally mixed state by any quantum circuit with at most ~S1/6 gates.

OK, now on to the second question: what does any of this have to do with black holes?  The connection Lenny wants to make involves the AdS/CFT correspondence, the “duality” between two completely different-looking theories that’s been the rage in string theory since the late 1990s.  On one side of the ring is AdS (Anti de Sitter), a quantum-gravitational theory in D spacetime dimensions—one where black holes can form and evaporate, etc., but on the other hand, the entire universe is surrounded by a reflecting boundary a finite distance away, to help keep everything nice and unitary.  On the other side is CFT (Conformal Field Theory): an “ordinary” quantum field theory, with no gravity, that lives only on the (D-1)-dimensional “boundary” of the AdS space, and not in its interior “bulk.”  The claim of AdS/CFT is that despite how different they look, these two theories are “equivalent,” in the sense that any calculation in one theory can be transformed to a calculation in the other theory that yields the same answer.  Moreover, we get mileage this way, since a calculation that’s hard on the AdS side is often easy on the CFT side and vice versa.

As an example, suppose we’re interested in what happens inside a black hole—say, because we want to investigate the AMPS firewall paradox.  Now, figuring out what happens inside a black hole (or even on or near the event horizon) is a notoriously hard problem in quantum gravity; that’s why people have been arguing about firewalls for the past two years, and about the black hole information problem for the past forty!  But what if we could put our black hole in an AdS box?  Then using AdS/CFT, couldn’t we translate questions about the black-hole interior to questions about the CFT on the boundary, which don’t involve gravity and which would therefore hopefully be easier to answer?

In fact people have tried to do that—but frustratingly, they haven’t been able to use the CFT calculations to answer even the grossest, most basic questions about what someone falling into the black hole would actually experience.  (For example, would that person hit a “firewall” and die immediately at the horizon, or would she continue smoothly through, only dying close to the singularity?)  Lenny’s paper explores a possible reason for this failure.  It turns out that the way AdS/CFT works, the closer to the black hole’s event horizon you want to know what happens, the longer you need to time-evolve the quantum state of the CFT to find out.  In particular, if you want to know what’s going on at distance ε from the event horizon, then you need to run the CFT for an amount of time that grows like log(1/ε).  And what if you want to know what’s going on inside the black hole?  In line with the holographic principle, it turns out that you can express an observable inside the horizon by an integral over the entire AdS space outside the horizon.  Now, that integral will include a part where the distance ε from the event horizon goes to 0—so log(1/ε), and hence the complexity of the CFT calculation that you have to do, diverges to infinity.  For some kinds of calculations, the ε→0 part of the integral isn’t very important, and can be neglected at the cost of only a small error.  For other kinds of calculations, unfortunately—and in particular, for the kind that would tell you whether or not there’s a firewall—the ε→0 part is extremely important, and it makes the CFT calculation hopelessly intractable.

Note that yes, we even need to continue the integration for ε much smaller than the Planck length—i.e., for so-called “transplanckian” distances!  As Lenny puts it, however:

For most of this vast sub-planckian range of scales we don’t expect that the operational meaning has anything to do with meter sticks … It has more to do with large times than small distances.

One could give this transplanckian blowup in computational complexity a pessimistic spin: darn, so it’s probably hopeless to use AdS/CFT to prove once and for all that there are no firewalls!  But there’s also a more positive interpretation: the interior of a black hole is “protected from meddling” by a thick armor of computational complexity.  To explain this requires a digression about firewalls.

The original firewall paradox of AMPS could be phrased as follows: if you performed a certain weird, complicated measurement on the Hawking radiation emitted from a “sufficiently old” black hole, then the expected results of that measurement would be incompatible with also seeing a smooth, Einsteinian spacetime if you later jumped into the black hole to see what was there.  (Technically, because you’d violate the monogamy of entanglement.)  If what awaited you behind the event horizon wasn’t a “classical” black hole interior with a singularity in the middle, but an immediate breakdown of spacetime, then one says you would’ve “hit a firewall.”

Yes, it seems preposterous that “firewalls” would exist: at the least, it would fly in the face of everything people thought they understood for decades about general relativity and quantum field theory.  But crucially—and here I have to disagree with Stephen Hawking—one can’t “solve” this problem by simply repeating the physical absurdities of firewalls, or by constructing scenarios where firewalls “self-evidently” don’t arise.  Instead, as I see it, solving the problem means giving an account of what actually happens when you do the AMPS experiment, or of what goes wrong when you try to do it.

On this last question, it seems to me that Susskind and Juan Maldacena achieved a major advance in their much-discussed ER=EPR paper last year.  Namely, they presented a picture where, sure, a firewall arises (at least temporarily) if you do the AMPS experiment—but no firewall arises if you don’t do the experiment!  In other words, doing the experiment sends a nonlocal signal to the interior of the black hole (though you do have to jump into the black hole to receive the signal, so causality outside the black hole is still preserved).  Now, how is it possible for your measurement of the Hawking radiation to send an instantaneous signal to the black hole interior, which might be light-years away from you when you measure?  On Susskind and Maldacena’s account, it’s possible because the entanglement between the Hawking radiation and the degrees of freedom still in the black hole, can be interpreted as creating wormholes between the two.  Under ordinary conditions, these wormholes (like most wormholes in general relativity) are “non-traversable”: they “pinch off” if you try to send signals through them, so they can’t be used for faster-than-light communication.  However, if you did the AMPS experiment, then the wormholes would become traversable, and could carry a firewall (or an innocuous happy-birthday message, or whatever) from the Hawking radiation to the black hole interior.  (Incidentally, ER stands for Einstein and Rosen, who wrote a famous paper on wormholes, while EPR stands for Einstein, Podolsky, and Rosen, who wrote a famous paper on entanglement.  “ER=EPR” is Susskind and Maldacena’s shorthand for their proposed connection between wormholes and entanglement.)

Anyway, these heady ideas raise an obvious question: how hard would it be to do the AMPS experiment?  Is sending a nonlocal signal to the interior of a black hole via that experiment actually a realistic possibility?  In their work a year ago on computational complexity and firewalls, Harlow and Hayden already addressed that question, though from a different perspective than Susskind.  In particular, Harlow and Hayden gave strong evidence that carrying out the AMPS experiment would require solving a problem believed to be exponentially hard even for a quantum computer: specifically, a complete problem for QSZK (Quantum Statistical Zero Knowledge).  In followup work (not yet written up, though see my talk at KITP and my PowerPoint slides), I showed that the Harlow-Hayden problem is actually at least as hard as inverting one-way functions, which is even stronger evidence for hardness.

All of this suggests that, even supposing we could surround an astrophysical black hole with a giant array of perfect photodetectors, wait ~1069 years for the black hole to (mostly) evaporate, then route the Hawking photons into a super-powerful, fault-tolerant quantum computer, doing the AMPS experiment (and hence, creating traversable wormholes to the black hole interior) still wouldn’t be a realistic prospect, even if the equations formally allow it!  There’s no way to sugarcoat this: computational complexity limitations seem to be the only thing protecting the geometry of spacetime from nefarious experimenters.

Anyway, Susskind takes that amazing observation of Harlow and Hayden as a starting point, but then goes off on a different tack.  For one thing, he isn’t focused on the AMPS experiment (the one involving monogamy of entanglement) specifically: he just wants to know how hard it is to do any experiment (possibly a different one) that would send nonlocal signals to the black hole interior.  For another, unlike Harlow and Hayden, Susskind isn’t trying to show that such an experiment would be exponentially hard.  Instead, he’s content if the experiment is “merely” polynomially hard—but in the same sense that (say) unscrambling an egg, or recovering a burned book from the smoke and ash, are polynomially hard.  In other words, Susskind only wants to argue that creating a traversable wormhole would be “thermodynamics-complete.”  A third, related, difference is that Susskind considers an extremely special model scenario: namely, the AdS/CFT description of something called the “thermofield double state.”  (This state involves two entangled black holes in otherwise-separated spacetimes; according to ER=EPR, we can think of those black holes as being connected by a wormhole.)  While I don’t yet understand this point, apparently the thermofield double state is much more favorable for firewall production than a “realistic” spacetime—and in particular, the Harlow-Hayden argument doesn’t apply to it.  Susskind wants to show that even so (i.e., despite how “easy” we’ve made it), sending a signal through the wormhole connecting the two black holes of the thermofield double state would still require solving a thermodynamics-complete problem.

So that’s the setup.  What new insights does Lenny get?  This, finally, is where we circle back to the view of quantum circuit complexity as a clock.  Briefly, Lenny finds that the quantum state getting more and more complicated in the CFT description—i.e., its quantum circuit complexity going up and up—directly corresponds to the wormhole getting longer and longer in the AdS description.  (Indeed, the length of the wormhole increases linearly with time, growing like the circuit complexity divided by the total number of qubits.)  And the wormhole getting longer and longer is what makes it non-traversable—i.e., what makes it impossible to send a signal through.

Why has quantum circuit complexity made a sudden appearance here?  Because in the CFT description, the circuit complexity continuing to increase is the only thing that’s obviously “happening”!  From a conventional physics standpoint, the quantum state of the CFT very quickly reaches equilibrium and then just stays there.  If you measured some “conventional” physical observable—say, the energy density at a particular point—then it wouldn’t look like the CFT state was continuing to evolve at all.  And yet we know that the CFT state is evolving, for two extremely different reasons.  Firstly, because (as we discussed early on in this post) unitary evolution is still happening, so presumably the state’s quantum circuit complexity is continuing to increase.  And secondly, because in the dual AdS description, the wormhole is continuing to get longer!

From this connection, at least three striking conclusions follow:

1. That even when nothing else seems to be happening in a physical system (i.e., it seems to have equilibrated), the fact that the system’s quantum circuit complexity keeps increasing can be “physically relevant” all by itself.  We know that it’s physically relevant, because in the AdS dual description, it corresponds to the wormhole getting longer!
2. That even in the special case of the thermofield double state, the geometry of spacetime continues to be protected by an “armor” of computational complexity.  Suppose that Alice, in one half of the thermofield double state, wants to send a message to Bob in the other half (which Bob can retrieve by jumping into his black hole).  In order to get her message through, Alice needs to prevent the wormhole connecting her black hole to Bob’s from stretching uncontrollably—since as long as it stretches, the wormhole remains non-traversable.  But in the CFT picture, stopping the wormhole from stretching corresponds to stopping the quantum circuit complexity from increasing!  And that, in turn, suggests that Alice would need to act on the radiation outside her black hole in an incredibly complicated and finely-tuned way.  For “generically,” the circuit complexity of an n-qubit state should just continue to increase, the longer you run unitary evolution for, until it hits its exp(n) maximum.  To prevent that from happening would essentially require “freezing” or “inverting” the unitary evolution applied by nature—but that’s the sort of thing that we expect to be thermodynamics-complete.  (How exactly do Alice’s actions in the “bulk” affect the evolution of the CFT state?  That’s an excellent question that I don’t understand AdS/CFT well enough to answer.  All I know is that the answer involves something that Lenny calls “precursor operators.”)
3. The third and final conclusion is that there can be a physically-relevant difference between pseudorandom n-qubit pure states and “truly” random states—even though, by the definition of pseudorandom, such a difference can’t be detected by any small quantum circuit!  Once again, the way to see the difference is using AdS/CFT.  It’s easy to show, by a counting argument, that almost all n-qubit pure states have nearly-maximal quantum circuit complexity.  But if the circuit complexity is already maximal, that means in particular that it’s not increasing!  Lenny argues that this corresponds to the wormhole between the two black holes no longer stretching.  But if the wormhole is no longer stretching, then it’s “vulnerable to firewalls” (i.e., to messages going through!).  It had previously been argued that random CFT states almost always correspond to black holes with firewalls—and since the CFT states formed by realistic physical processes will look “indistinguishable from random states,” black holes that form under realistic conditions should generically have firewalls as well.  But Lenny rejects this argument, on the ground that the CFT states that arise in realistic situations are not random pure states.  And what distinguishes them from random states?  Simply that they have non-maximal (and increasing) quantum circuit complexity!

I’ll leave you with a question of my own about this complexity / black hole connection: one that I’m unsure how to think about, but that perhaps interests me more than any other here.  My question is: could you ever learn the answer to an otherwise-intractable computational problem by jumping into a black hole?  Of course, you’d have to really want the answer—so much so that you wouldn’t mind dying moments after learning it, or not being able to share it with anyone else!  But never mind that.  What I have in mind is first applying some polynomial-size quantum circuit to the Hawking radiation, then jumping into the black hole to see what nonlocal effect (if any) the circuit had on the interior.  The fact that the mapping between interior and exterior states is so complicated suggests that there might be complexity-theoretic mileage to be had this way, but I don’t know what.  (It’s also possible that you can get a computational speedup in special cases like the thermofield double state, but that a Harlow-Hayden-like obstruction prevents you from getting one with real astrophysical black holes.  I.e., that for real black holes, you’ll just see a smooth, boring, Einsteinian black hole interior no matter what polynomial-size quantum circuit you applied to the Hawking radiation.)

If you’re still here, the second paper I want to discuss today is Finite-time blowup for an averaged three-dimensional Navier-Stokes equation by Terry Tao.  (See also the excellent Quanta article by Erica Klarreich.)  I’ll have much, much less to say about this paper than I did about Susskind’s, but that’s not because it’s less interesting: it’s only because I understand the issues even less well.

Navier-Stokes existence and smoothness is one of the seven Clay Millennium Problems (alongside P vs. NP, the Riemann Hypothesis, etc).  The problem asks whether the standard, classical differential equations for three-dimensional fluid flow are well-behaved, in the sense of not “blowing up” (e.g., concentrating infinite energy on a single point) after a finite amount of time.

Expanding on ideas from his earlier blog posts and papers about Navier-Stokes (see here for the gentlest of them), Tao argues that the Navier-Stokes problem is closely related to the question of whether or not it’s possible to “build a fault-tolerant universal computer out of water.”  Why?  Well, it’s not the computational universality per se that matters, but if you could use fluid flow to construct general enough computing elements—resistors, capacitors, transistors, etc.—then you could use those elements to recursively shift the energy in a given region into a region half the size, and from there to a region a quarter the size, and so on, faster and faster, until you got infinite energy density after a finite amount of time.

Strikingly, building on an earlier construction by Katz and Pavlovic, Tao shows that this is actually possible for an “averaged” version of the Navier-Stokes equations!  So at the least, any proof of existence and smoothness for the real Navier-Stokes equations will need to “notice” the difference between the real and averaged versions.  In his paper, though, Tao hints at the possibility (or dare one say likelihood?) that the truth might go the other way.  That is, maybe the “universal computer” construction can be ported from the averaged Navier-Stokes equations to the real ones.  In that case, we’d have blowup in finite time for the real equations, and a negative solution to the Navier-Stokes existence and smoothness problem.  Of course, such a result wouldn’t imply that real, physical water was in any danger of “blowing up”!  It would simply mean that the discrete nature of water (i.e., the fact that it’s made of H2O molecules, rather than being infinitely divisible) was essential to understanding its stability given arbitrary initial conditions.

So, what are the prospects for such a blowup result?  Let me quote from Tao’s paper:

Once enough logic gates of ideal fluid are constructed, it seems that the main difficulties in executing the above program [to prove a blowup result for the “real” Navier-Stokes equations] are of a “software engineering” nature, and would be in principle achievable, even if the details could be extremely complicated in practice.  The main mathematical difficulty in executing this “fluid computing” program would thus be to arrive at (and rigorously certify) a design for logical gates of inviscid fluid that has some good noise tolerance properties.  In this regard, ideas from quantum computing (which faces a unitarity constraint somewhat analogous to the energy conservation constraint for ideal fluids, albeit with the key difference of having a linear evolution rather than a nonlinear one) may prove to be useful.

One minor point that I’d love to understand is, what happens in two dimensions?  Existence and smoothness are known to hold for the 2-dimensional analogues of the Navier-Stokes equations.  If they also held for the averaged 2-dimensional equations, then it would follow that Tao’s “universal computer” must be making essential use of the third dimension. How?  If I knew the answer to that, then I’d feel for the first time like I had some visual crutch for understanding why 3-dimensional fluid flow is so complicated, even though 2-dimensional fluid flow isn’t.

I see that, in blog comments here and here, Tao says that the crucial difference between the 2- and 3-dimensional Navier-Stokes equations arises from the different scaling behavior of the dissipation term: basically, you can ignore it in 3 or more dimensions, but you can’t ignore it in 2.  But maybe there’s a more doofus-friendly explanation, which would start with some 3-dimensional fluid logic gate, and then explain why the gate has no natural 2-dimensional analogue, or why dissipation causes its analogue to fail.

Obviously, there’s much more to say about both papers (especially the second…) than I said in this post, and many people more knowledgeable than I am to say those things.  But that’s what the comments section is for.  Right now I’m going outside to enjoy the California sunshine.

### More “tweets”

Friday, January 31st, 2014

Update (Feb. 4): After Luke Muelhauser of MIRI interviewed me about “philosophical progress,” Luke asked me for other people to interview about philosophy and theoretical computer science. I suggested my friend and colleague Ronald de Wolf of the University of Amsterdam, and I’m delighted that Luke took me up on it. Here’s the resulting interview, which focuses mostly on quantum computing (with a little Kolmogorov complexity and Occam’s Razor thrown in). I read the interview with admiration (and hoping to learn some tips): Ronald tackles each question with more clarity, precision, and especially levelheadedness than I would.

Another Update: Jeff Kinne asked me to post a link to a forum about the future of the Conference on Computational Complexity (CCC)—and in particular, whether it should continue to be affiliated with the IEEE. Any readers who have ever had any involvement with the CCC conference are encouraged to participate. You can read all about what the issues are in a manifesto written by Dieter van Melkebeek.

Yet Another Update: Some people might be interested in my response to Geordie Rose’s response to the Shin et al. paper about a classical model for the D-Wave machine.

“How ‘Quantum’ is the D-Wave Machine?” by Shin, Smith, Smolin, Vazirani goo.gl/JkLg0l – was previous skepticism too GENEROUS to D-Wave?

D-Wave not of broad enough interest? OK then, try “AM with Multiple Merlins” by Dana Moshkovitz, Russell Impagliazzo, and me goo.gl/ziSUz9

“Remarks on the Physical Church-Turing Thesis” – my talk at the FQXi conference in Vieques, Puerto Rico is now on YouTube goo.gl/kAd9TZ

Cool new SciCast site (scicast.org) lets you place bets on P vs NP, Unique Games Conjecture, etc. But glitches remain to be ironed out