PostBQP Postscripts: A Confession of Mathematical Errors
Sunday, November 30th, 2014tl;dr: This post reveals two errors in one of my most-cited papers, and also explains how to fix them. Thanks to Piotr Achinger, Michael Cohen, Greg Kuperberg, Ciaran Lee, Ryan O’Donnell, Julian Rosen, Will Sawin, Cem Say, and others for their contributions to this post.
If you look at my Wikipedia page, apparently one of the two things in the world that I’m “known for” (along with algebrization) is “quantum Turing with postselection.” By this, Wikipedia means my 2004 definition of the complexity class PostBQP—that is, the class of decision problems solvable in bounded-error quantum polynomial time, assuming the ability to postselect (or condition) on certain measurement outcomes—and my proof that PostBQP coincides with the classical complexity PP (that is, the class of decision problems expressible in terms of whether the number of inputs that cause a given polynomial-time Turing machine to accept does or doesn’t exceed some threshold).
To explain this a bit: even without quantum mechanics, it’s pretty obvious that, if you could “postselect” on exponentially-unlikely events, then you’d get huge, unrealistic amounts of computational power. For example (and apologies in advance for the macabre imagery), you could “solve” NP-complete problems in polynomial time by simply guessing a random solution, then checking whether the solution is right, and shooting yourself if it happened to be wrong! Conditioned on still being alive (and if you like, appealing to the “anthropic principle”), you must find yourself having guessed a valid solution—assuming, of course, that there were any valid solutions to be found. If there weren’t any, then you’d seem to be out of luck! (Exercise for the reader: generalize this “algorithm,” so that it still works even if you don’t know in advance whether your NP-complete problem instance has any valid solutions.)
So with the PostBQP=PP theorem, the surprise was not that postselection gives you lots of computational power, but rather that postselection combined with quantum mechanics gives you much more power even than postselection by itself (or quantum mechanics by itself, for that matter). Since PPP=P#P, the class PP basically captures the full difficulty of #P-complete counting problems—that is, not just solving an NP-complete problem, but counting how many solutions it has. It’s not obvious that a quantum computer with postselection can solve counting problems, but that’s what the theorem shows. That, in turn, has implications for other things: for example, I showed it can be used to prove classical facts about PP, like the fact that PP is closed under intersection (the Beigel-Reingold-Spielman Theorem), in a straightforward way; and it’s also used to show the hardness of quantum sampling problems, in the work of Bremner-Jozsa-Shepherd as well as my BosonSampling work with Arkhipov.
I’m diffident about being “known for” something so simple; once I had asked the question, the proof of PostBQP=PP took me all of an hour to work out. Yet PostBQP ended up being a hundred times more influential for quantum computing theory than things on which I expended a thousand times more effort. So on balance, I guess I’m happy to call PostBQP my own.
That’s why today’s post comes with a special sense of intellectual responsibility. Within the last month, it’s come to my attention that there are at least two embarrassing oversights in my PostBQP paper from a decade ago, one of them concerning the very definition of PostBQP. I hasten to clarify: once one fixes up the definition, the PostBQP=PP theorem remains perfectly valid, and all the applications of PostBQP that I mentioned above—for example, to reproving Beigel-Reingold-Spielman, and to the hardness of quantum sampling problems—go through just fine. But if you think I have nothing to be embarrassed about: well, read on.
The definitional subtlety came clearly to my attention a few weeks ago, when I was lecturing about PostBQP in my 6.845 Quantum Complexity Theory graduate class. I defined PostBQP as the class of languages L⊆{0,1}* for which there exists a polynomial-time quantum Turing machine M such that, for all inputs x∈{0,1}*,
- M(x) “succeeds” (determined, say, by measuring its first output qubit in the {|0>,|1>} basis) with nonzero probability.
- If x∈L, then conditioned on M(x) succeeding, M(x) “accepts” (determined, say, by measuring its second output qubit in the {|0>,|1>} basis) with probability at least 2/3.
- If x∉L, then conditioned on M(x) succeeding, M(x) accepts with probability at most 1/3.
I then had to reassure the students that PostBQP, so defined, was a “robust” class: that is, that the definition doesn’t depend on stupid things like which set of quantum gates we allow. I argued that, even though we’re postselecting on exponentially-unlikely events, it’s still OK, because the Solovay-Kitaev Theorem lets us approximate any desired unitary to within exponentially-small error, with only a polynomial increase in the size of our quantum circuit. (Here we actually need the full power of the Solovay-Kitaev Theorem, in contrast to ordinary BQP, where we only need part of the power.)
A student in the class, Michael Cohen, immediately jumped in with a difficulty: what if M(x) succeeded, not with exponentially-small probability, but with doubly-exponentially-small probability—say, exp(-2n)? In that case, one could no longer use the Solovay-Kitaev Theorem to show the irrelevance of the gate set. It would no longer even be clear that PostBQP⊆PP, since the PP simulation might not be able to keep track of such tiny probabilities.
Thinking on my feet, I replied that we could presumably choose a set of gates—for example, gates involving rational numbers only—for which doubly-exponentially-small probabilities would never arise. Or if all else failed, we could simply add to the definition of PostBQP that M(x) had to “succeed” with probability at least 1/exp(n): after all, that was the only situation I ever cared about anyway, and the only one that ever arose in the applications of PostBQP.
But the question still gnawed at me: was there a problem with my original, unamended definition of PostBQP? If we weren’t careful in choosing our gate set, could we have cancellations that produced doubly-exponentially-small probabilities? I promised I’d think about it more.
By a funny coincidence, just a couple weeks later, Ciaran Lee, a student at Oxford, emailed me the exact same question. So on a train ride from Princeton to Boston, I decided to think about it for real. It wasn’t hard to show that, if the gates involved square roots of rational numbers only—for example, if we’re dealing with the Hadamard and Toffoli gates, or the cos(π/8) and CNOT gates, or other standard gate sets—then every measurement outcome has at least 1/exp(n) probability, so there’s no problem with the definition of PostBQP. But I didn’t know what might happen with stranger gate sets.
As is my wont these days—when parenting, teaching, and so forth leave me with almost no time to concentrate on math—I posted the problem to MathOverflow. Almost immediately, I got incisive responses. First, Piotr Achinger pointed out that, if we allow arbitrary gates, then it’s easy to get massive cancellations. In more detail, let {an} be extremely-rapidly growing sequence of integers, say with an+1 > exp(an). Then define
$$ \alpha = \sum_{n=1}^{\infty} 0.1^{a_n}. $$
If we write out α in decimal notation, it will consist of mostly 0’s, but with 1’s spaced further and further apart, like so: 0.1101000000000001000…. Now consider a gate set that involves α as well as 0.1 and -0.1 as matrix entries. Given n qubits, it’s not hard to see that we can set up an interference experiment in which one of the paths leading to a given outcome E has amplitude α, and the other paths have amplitudes $$ -(0.1^{a_1}), -(0.1^{a_2}), \ldots, -(0.1^{a_k}), $$ where k is the largest integer such that ak≤n. In that case, the total amplitude of E will be about $$0.1^{a_{k+1}},$$ which for most values of n is doubly-exponentially small in n. Of course, by simply choosing a faster-growing sequence {an}, we can cause an even more severe cancellation.
Furthermore, by modifying the above construction to involve two crazy transcendental numbers α and β, I claim that we can set up a PostBQP computation such that deciding what happens is arbitrarily harder than PP (though still computable)—say, outside of exponential space, or even triple-exponential space. Moreover, we can do this despite the fact that the first n digits of α and β remain computable in O(n) time. The details are left as an exercise for the interested reader.
Yet even though we can engineer massive cancellations with crazy gates, I still conjectured that nothing would go wrong with “normal” gates: for example, gates involving algebraic amplitudes only. More formally, I conjectured that any finite set A=(a1,…,ak) of algebraic numbers is “tame,” in the sense that, if p is any degree-n polynomial with integer coefficients at most exp(n) in absolute value, then p(a1,…,ak)≠0 implies |p(a1,…,ak)|≥1/exp(n). And indeed, Julian Rosen on MathOverflow found an elegant proof of this fact. I’ll let you read it over there if you’re interested, but briefly, it interprets the amplitude we want as one particular Archimedean valuation of a certain element of a number field, and then lower-bounds the amplitude by considering the product of all Archimedean and non-Archimedean valuations (the latter of which involves the p-adic numbers). Since this was a bit heavy-duty for me, I was grateful when Will Sawin reformulated the proof in linear-algebraic terms that I understood.
And then came the embarrassing part. A few days ago, I was chatting with Greg Kuperberg, the renowned mathematician and author of our climate-change parable. I thought he’d be interested in this PostBQP progress, so I mentioned it to him. Delicately, Greg let me know that he had recently proved the exact same results, for the exact same reason (namely, fixing the definition of PostBQP), for the latest revision of his paper How Hard Is It to Approximate the Jones Polynomial?. Moreover, he actually wrote to me in June to tell me about this! At the time, however, I regarded it as “pointless mathematical hairsplitting” (who cares about these low-level gate-set issues anyway?). So I didn’t pay it any attention—and then I’d completely forgotten about Greg’s work when the question resurfaced a few months later. This is truly a just punishment for looking down on “mathematical hairsplitting,” and not a lesson I’ll soon forget.
Anyway, Greg’s paper provides yet a third proof that the algebraic numbers are tame, this one using Galois conjugates (though it turns out that, from a sufficiently refined perspective, Greg’s proof is equivalent to the other two).
There remains one obvious open problem here, one that I noted in the MathOverflow post and in which Greg is also extremely interested. Namely, we now know that it’s possible to screw up PostBQP using gates with amplitudes that are crazy transcendental numbers (closely related to the Liouville numbers). And we also know that, if the gates have algebraic amplitudes, then everything is fine: all events have at least 1/exp(n) probability. But what if the gates have not-so-crazy transcendental amplitudes, like 1/e, or (a bit more realistically) cos(2)? I conjecture that everything is still fine, but the proof techniques that worked for the algebraic case seem useless here.
Stepping back, how great are the consequences of all this for our understanding of PostBQP? Fortunately, I claim that they’re not that great, for the following reason. As Adleman, DeMarrais, and Huang already noted in 1997—in the same paper that proved BQP⊆PP—we can screw up the definition even of BQP, let alone PostBQP, using a bizarre enough gate set. For example, suppose we had a gate G that mapped |0> to x|0>+y|1>, where y was a real number whose binary expansion encoded the halting problem (for example, y might equal Chaitin’s Ω). Then by applying G more and more times, we could learn more and more bits of y, and thereby solve an uncomputable problem in the limit n→∞.
Faced with this observation, most quantum computing experts would say something like: “OK, but this is silly! It has no physical relevance, since we’ll never come across a magical gate like G—if only we did! And at any rate, it has nothing to do with quantum computing specifically: even classically, one could imagine a coin that landed heads with probability equal to Chaitin’s Ω. Therefore, the right way to deal with this is simply to define BQP in such a way as to disallow such absurd gates.” And indeed, that is what’s done today—usually without even remarking on it.
Now, it turns out that even gates that are “perfectly safe” for defining BQP, can turn “unsafe” when it comes to defining PostBQP. To screw up the definition of PostBQP, it’s not necessary that a gate involve uncomputable (or extremely hard-to-compute) amplitudes: the amplitudes could all be easily computable, but they could still be “unsafe” because of massive cancellations, as in the example above involving α. But one could think of this as a difference of degree, rather than of kind. It’s still true that there’s a large set of gates, including virtually all the gates anyone has ever cared about in practice (Toffoli, Hadamard, π/8, etc. etc.), that are perfectly safe for defining the complexity class; it’s just that the set is slightly smaller than it was for BQP.
The other issue with the PostBQP=PP paper was discovered by Ryan O’Donnell and Cem Say. In Proposition 3 of the paper, I claim that PostBQP = BQPPostBQP||,classical, where the latter is the class of problems solvable by a BQP machine that’s allowed to make poly(n) parallel, classical queries to a PostBQP oracle. As Ryan pointed out to me, nothing in my brief argument for this depended on quantum mechanics, so it would equally well show that PostBPP = BPPPostBPP||, where PostBPP (also known as BPPpath) is the classical analogue of PostBQP, and BPPPostBPP|| is the class of problems solvable by a BPP machine that can make poly(n) parallel queries to a PostBPP oracle. But BPPPostBPP|| clearly contains BPPNP||, which in turn contains AM—so we would get AM in PostBPP, and therefore AM in PostBQP=PP. But Vereshchagin gave an oracle relative to which AM is not contained in PP. Since there was no nonrelativizing ingredient anywhere in my argument, the only possible conclusion is that my argument was wrong. (This, incidentally, provides a nice illustration of the value of oracle results.)
In retrospect, it’s easy to pinpoint what went wrong. If we try to simulate BPPPostBPP|| in PostBPP, our random bits will be playing a dual role: in choosing the queries to be submitted to the PostBPP oracle, and in providing the “raw material for postselection,” in computing the responses to those queries. But in PostBPP, we only get to postselect once. When we do, the two sets of random bits that we’d wanted to keep separate will get hopelessly mixed up, with the postselection acting on the “BPP” random bits, not just on the “PostBPP” ones.
How can we fix this problem? Well, when defining the class BQPPostBQP||,classical, suppose we require the queries to the PostBQP oracle to be not only “classical,” but deterministic: that is, they have to be generated in advance by a P machine, and can’t depend on any random bits whatsoever. And suppose we define BPPPostBPP||,classical similarly. In that case, it’s not hard to see that the equalities BQPPostBQP||,classical = PostBQP and BPPPostBPP||,classical = PostBPP both go through. You don’t actually care about this, do you? But Ryan O’Donnell and Cem Say did, and that’s good enough for me.
I wish I could say that these are the only cases of mistakes recently being found in decade-old papers of mine, but alas, such is not the case. In the near future, my student Adam Bouland, MIT undergrad Mitchell Lee, and Singapore’s Joe Fitzsimons will post to the arXiv a paper that grew out of an error in my 2005 paper Quantum Computing and Hidden Variables. In that paper, I introduced a hypothetical generalization of the quantum computing model, in which one gets to see the entire trajectory of a hidden variable, rather than just a single measurement outcome. I showed that this generalization would let us solve problems somewhat beyond what we think we can do with a “standard” quantum computer. In particular, we could solve the collision problem in O(1) queries, efficiently solve Graph Isomorphism (and all other problems in the Statistical Zero-Knowledge class), and search an N-element list in only ~N1/3 steps, rather than the ~N1/2 steps of Grover’s search algorithm. That part of the paper remains fine!
On the other hand, at the end of the paper, I also gave a brief argument to show that, even in the hidden-variable model, ~N1/3 steps are required to search an N-element list. But Mitchell Lee and Adam Bouland discovered that that argument is wrong: it fails to account for all the possible ways that an algorithm could exploit the correlations between the hidden variable’s values at different moments in time. (I’ve previously discussed this error in other blog posts, as well as in the latest edition of Quantum Computing Since Democritus.)
If we suitably restrict the hidden-variable theory, then we can correctly prove a lower bound of ~N1/4, or even (with strong enough assumptions) ~N1/3; and we do that in the forthcoming paper. Even with no restrictions, as far as we know an ~N1/3 lower bound for search with hidden variables remains true. But it now looks like proving it will require a major advance in our understanding of hidden-variable theories: for example, a proof that the “Schrödinger theory” is robust to small perturbations, which I’d given as the main open problem in my 2005 paper.
As if that weren’t enough, in my 2003 paper Quantum Certificate Complexity, I claimed (as a side remark) that one could get a recursive Boolean function f with an asymptotic gap between the block sensitivity bs(f) and the randomized certificate complexity RC(f). However, two and a half years ago, Avishay Tal discovered that this didn’t work, because block sensitivity doesn’t behave nicely under composition. (In assuming it did, I was propagating an error introduced earlier by Wegener and Zádori.) More broadly, Avishay showed that there is no recursively-defined Boolean function with an asymptotic gap between bs(f) and RC(f). On the other hand, if we just want some Boolean function with an asymptotic gap between bs(f) and RC(f), then Raghav Kulkarni observed that we can use a non-recursive function introduced by Xiaoming Sun, which yields bs(f)≈N3/7 and RC(f)≈N4/7. This is actually a larger separation than the one I’d wrongly claimed.
Now that I’ve come clean about all these things, hopefully the healing can begin at last.