## PostBQP Postscripts: A Confession of Mathematical Errors

**tl;dr: This post reveals two errors in one of my most-cited papers, and also explains how to fix them. Thanks to Piotr Achinger, Michael Cohen, Greg Kuperberg, Ciaran Lee, Ryan O’Donnell, Julian Rosen, Will Sawin, Cem Say, and others for their contributions to this post.**

If you look at my Wikipedia page, apparently one of the two things in the world that I’m “known for” (along with algebrization) is “quantum Turing with postselection.” By this, Wikipedia means my 2004 definition of the complexity class PostBQP—that is, the class of decision problems solvable in bounded-error quantum polynomial time, assuming the ability to *postselect* (or condition) on certain measurement outcomes—and my proof that PostBQP coincides with the classical complexity PP (that is, the class of decision problems expressible in terms of whether the number of inputs that cause a given polynomial-time Turing machine to accept does or doesn’t exceed some threshold).

To explain this a bit: even without quantum mechanics, it’s pretty obvious that, if you could “postselect” on exponentially-unlikely events, then you’d get huge, unrealistic amounts of computational power. For example (and apologies in advance for the macabre imagery), you could “solve” NP-complete problems in polynomial time by simply guessing a random solution, then checking whether the solution is right, and shooting yourself if it happened to be wrong! *Conditioned* on still being alive (and if you like, appealing to the “anthropic principle”), you must find yourself having guessed a valid solution—assuming, of course, that there were any valid solutions to be found. If there weren’t any, then you’d seem to be out of luck! (Exercise for the reader: generalize this “algorithm,” so that it still works even if you don’t know in advance whether your NP-complete problem instance has any valid solutions.)

So with the PostBQP=PP theorem, the surprise was not that postselection gives you lots of computational power, but rather that postselection *combined* with quantum mechanics gives you much more power even than postselection by itself (or quantum mechanics by itself, for that matter). Since P^{PP}=P^{#P}, the class PP basically captures the full difficulty of #P-complete counting problems—that is, not just solving an NP-complete problem, but counting how many solutions it has. It’s not obvious that a quantum computer with postselection can solve counting problems, but that’s what the theorem shows. That, in turn, has implications for other things: for example, I showed it can be used to prove classical facts about PP, like the fact that PP is closed under intersection (the Beigel-Reingold-Spielman Theorem), in a straightforward way; and it’s also used to show the hardness of quantum sampling problems, in the work of Bremner-Jozsa-Shepherd as well as my BosonSampling work with Arkhipov.

I’m diffident about being “known for” something so simple; once I had asked the question, the proof of PostBQP=PP took me all of an hour to work out. Yet PostBQP ended up being a hundred times more influential for quantum computing theory than things on which I expended a thousand times more effort. So on balance, I guess I’m happy to call PostBQP my own.

That’s why today’s post comes with a special sense of intellectual responsibility. Within the last month, it’s come to my attention that there are at least *two* embarrassing oversights in my PostBQP paper from a decade ago, one of them concerning the very *definition* of PostBQP. I hasten to clarify: once one fixes up the definition, the PostBQP=PP theorem remains perfectly valid, and all the applications of PostBQP that I mentioned above—for example, to reproving Beigel-Reingold-Spielman, and to the hardness of quantum sampling problems—go through just fine. But if you think I have nothing to be embarrassed about: well, read on.

The definitional subtlety came clearly to my attention a few weeks ago, when I was lecturing about PostBQP in my 6.845 Quantum Complexity Theory graduate class. I defined PostBQP as the class of languages L⊆{0,1}* for which there exists a polynomial-time quantum Turing machine M such that, for all inputs x∈{0,1}*,

- M(x) “succeeds” (determined, say, by measuring its first output qubit in the {|0>,|1>} basis) with nonzero probability.
- If x∈L, then conditioned on M(x) succeeding, M(x) “accepts” (determined, say, by measuring its second output qubit in the {|0>,|1>} basis) with probability at least 2/3.
- If x∉L, then conditioned on M(x) succeeding, M(x) accepts with probability at most 1/3.

I then had to reassure the students that PostBQP, so defined, was a “robust” class: that is, that the definition doesn’t depend on stupid things like which set of quantum gates we allow. I argued that, even though we’re postselecting on exponentially-unlikely events, it’s still OK, because the Solovay-Kitaev Theorem lets us approximate any desired unitary to within exponentially-small error, with only a polynomial increase in the size of our quantum circuit. (Here we actually need the full power of the Solovay-Kitaev Theorem, in contrast to ordinary BQP, where we only need part of the power.)

A student in the class, Michael Cohen, immediately jumped in with a difficulty: what if M(x) succeeded, not with exponentially-small probability, but with *doubly*-exponentially-small probability—say, exp(-2^{n})? In that case, one could no longer use the Solovay-Kitaev Theorem to show the irrelevance of the gate set. It would no longer even be clear that PostBQP⊆PP, since the PP simulation might not be able to keep track of such tiny probabilities.

Thinking on my feet, I replied that we could presumably choose a set of gates—for example, gates involving rational numbers only—for which doubly-exponentially-small probabilities would never arise. Or if all else failed, we could simply add to the definition of PostBQP that M(x) had to “succeed” with probability at least 1/exp(n): after all, that was the only situation I ever cared about anyway, and the only one that ever arose in the applications of PostBQP.

But the question still gnawed at me: was there a problem with my original, unamended definition of PostBQP? If we weren’t careful in choosing our gate set, *could* we have cancellations that produced doubly-exponentially-small probabilities? I promised I’d think about it more.

By a funny coincidence, just a couple weeks later, Ciaran Lee, a student at Oxford, emailed me the exact same question. So on a train ride from Princeton to Boston, I decided to think about it for real. It wasn’t hard to show that, if the gates involved square roots of rational numbers only—for example, if we’re dealing with the Hadamard and Toffoli gates, or the cos(π/8) and CNOT gates, or other standard gate sets—then every measurement outcome has at least 1/exp(n) probability, so there’s no problem with the definition of PostBQP. But I didn’t know what might happen with stranger gate sets.

As is my wont these days—when parenting, teaching, and so forth leave me with almost no time to concentrate on math—I posted the problem to MathOverflow. Almost immediately, I got incisive responses. First, Piotr Achinger pointed out that, if we allow arbitrary gates, then it’s easy to get massive cancellations. In more detail, let {a_{n}} be extremely-rapidly growing sequence of integers, say with a_{n+1} > exp(a_{n}). Then define

$$ \alpha = \sum_{n=1}^{\infty} 0.1^{a_n}. $$

If we write out α in decimal notation, it will consist of mostly 0’s, but with 1’s spaced further and further apart, like so: 0.1101000000000001000…. Now consider a gate set that involves α as well as 0.1 and -0.1 as matrix entries. Given n qubits, it’s not hard to see that we can set up an interference experiment in which one of the paths leading to a given outcome E has amplitude α, and the other paths have amplitudes $$ -(0.1^{a_1}), -(0.1^{a_2}), \ldots, -(0.1^{a_k}), $$ where k is the largest integer such that a_{k}≤n. In that case, the total amplitude of E will be about $$0.1^{a_{k+1}},$$ which for most values of n is doubly-exponentially small in n. Of course, by simply choosing a faster-growing sequence {a_{n}}, we can cause an even more severe cancellation.

Furthermore, by modifying the above construction to involve *two* crazy transcendental numbers α and β, I claim that we can set up a PostBQP computation such that deciding what happens is arbitrarily harder than PP (though still computable)—say, outside of exponential space, or even triple-exponential space. Moreover, we can do this despite the fact that the first n digits of α and β remain computable in O(n) time. The details are left as an exercise for the interested reader.

Yet even though we can engineer massive cancellations with crazy gates, I still conjectured that nothing would go wrong with “normal” gates: for example, gates involving algebraic amplitudes only. More formally, I conjectured that any finite set A=(a_{1},…,a_{k}) of algebraic numbers is “tame,” in the sense that, if p is any degree-n polynomial with integer coefficients at most exp(n) in absolute value, then p(a_{1},…,a_{k})≠0 implies |p(a_{1},…,a_{k})|≥1/exp(n). And indeed, Julian Rosen on MathOverflow found an elegant proof of this fact. I’ll let you read it over there if you’re interested, but briefly, it interprets the amplitude we want as one particular Archimedean valuation of a certain element of a number field, and then lower-bounds the amplitude by considering the product of *all* Archimedean and non-Archimedean valuations (the latter of which involves the p-adic numbers). Since this was a bit heavy-duty for me, I was grateful when Will Sawin reformulated the proof in linear-algebraic terms that I understood.

And then came the embarrassing part. A few days ago, I was chatting with Greg Kuperberg, the renowned mathematician and author of our climate-change parable. I thought he’d be interested in this PostBQP progress, so I mentioned it to him. Delicately, Greg let me know that he had recently proved the exact same results, for the exact same reason (namely, fixing the definition of PostBQP), for the latest revision of his paper How Hard Is It to Approximate the Jones Polynomial?. Moreover, he actually wrote to me in June to tell me about this! At the time, however, I regarded it as “pointless mathematical hairsplitting” (who cares about these low-level gate-set issues anyway?). So I didn’t pay it any attention—and then I’d completely forgotten about Greg’s work when the question resurfaced a few months later. This is truly a just punishment for looking down on “mathematical hairsplitting,” and not a lesson I’ll soon forget.

Anyway, Greg’s paper provides yet a third proof that the algebraic numbers are tame, this one using Galois conjugates (though it turns out that, from a sufficiently refined perspective, Greg’s proof is equivalent to the other two).

There remains one obvious open problem here, one that I noted in the MathOverflow post and in which Greg is also extremely interested. Namely, we now know that it’s possible to screw up PostBQP using gates with amplitudes that are crazy transcendental numbers (closely related to the Liouville numbers). And we also know that, if the gates have algebraic amplitudes, then everything is fine: all events have at least 1/exp(n) probability. But what if the gates have *not*-so-crazy transcendental amplitudes, like 1/e, or (a bit more realistically) cos(2)? I conjecture that everything is still fine, but the proof techniques that worked for the algebraic case seem useless here.

Stepping back, how great are the consequences of all this for our understanding of PostBQP? Fortunately, I claim that they’re not that great, for the following reason. As Adleman, DeMarrais, and Huang already noted in 1997—in the same paper that proved BQP⊆PP—we can screw up the definition even of BQP, let alone PostBQP, using a bizarre enough gate set. For example, suppose we had a gate G that mapped |0> to x|0>+y|1>, where y was a real number whose binary expansion encoded the halting problem (for example, y might equal Chaitin’s Ω). Then by applying G more and more times, we could learn more and more bits of y, and thereby solve an uncomputable problem in the limit n→∞.

Faced with this observation, most quantum computing experts would say something like: “OK, but this is silly! It has no physical relevance, since we’ll never come across a magical gate like G—if only we did! And at any rate, it has nothing to do with quantum computing specifically: even classically, one could imagine a coin that landed heads with probability equal to Chaitin’s Ω. Therefore, the right way to deal with this is simply to define BQP in such a way as to disallow such absurd gates.” And indeed, that *is* what’s done today—usually without even remarking on it.

Now, it turns out that even gates that are “perfectly safe” for defining BQP, can turn “unsafe” when it comes to defining PostBQP. To screw up the definition of PostBQP, it’s not necessary that a gate involve uncomputable (or extremely hard-to-compute) amplitudes: the amplitudes could all be easily computable, but they could still be “unsafe” because of massive cancellations, as in the example above involving α. But one could think of this as a difference of degree, rather than of kind. It’s still true that there’s a large set of gates, including virtually all the gates anyone has ever cared about in practice (Toffoli, Hadamard, π/8, etc. etc.), that are perfectly safe for defining the complexity class; it’s just that the set is slightly smaller than it was for BQP.

The other issue with the PostBQP=PP paper was discovered by Ryan O’Donnell and Cem Say. In Proposition 3 of the paper, I claim that PostBQP = BQP^{PostBQP}_{||,classical}, where the latter is the class of problems solvable by a BQP machine that’s allowed to make poly(n) *parallel, classical* queries to a PostBQP oracle. As Ryan pointed out to me, nothing in my brief argument for this depended on quantum mechanics, so it would equally well show that PostBPP = BPP^{PostBPP}_{||}, where PostBPP (also known as BPP_{path}) is the *classical* analogue of PostBQP, and BPP^{PostBPP}_{||} is the class of problems solvable by a BPP machine that can make poly(n) parallel queries to a PostBPP oracle. But BPP^{PostBPP}_{||} clearly contains BPP^{NP}_{||}, which in turn contains AM—so we would get AM in PostBPP, and therefore AM in PostBQP=PP. But Vereshchagin gave an oracle relative to which AM is *not* contained in PP. Since there was no nonrelativizing ingredient anywhere in my argument, the only possible conclusion is that my argument was wrong. (This, incidentally, provides a nice illustration of the value of oracle results.)

In retrospect, it’s easy to pinpoint what went wrong. If we try to simulate BPP^{PostBPP}_{||} in PostBPP, our random bits will be playing a dual role: in choosing the queries to be submitted to the PostBPP oracle, *and* in providing the “raw material for postselection,” in computing the responses to those queries. But in PostBPP, we only get to postselect once. When we do, the two sets of random bits that we’d wanted to keep separate will get hopelessly mixed up, with the postselection acting on the “BPP” random bits, not just on the “PostBPP” ones.

How can we fix this problem? Well, when defining the class BQP^{PostBQP}_{||,classical}, suppose we require the queries to the PostBQP oracle to be not only “classical,” but *deterministic*: that is, they have to be generated in advance by a P machine, and can’t depend on any random bits whatsoever. And suppose we define BPP^{PostBPP}_{||,classical} similarly. In that case, it’s not hard to see that the equalities BQP^{PostBQP}_{||,classical} = PostBQP and BPP^{PostBPP}_{||,classical} = PostBPP both go through. You don’t actually care about this, do you? But Ryan O’Donnell and Cem Say did, and that’s good enough for me.

I wish I could say that these are the only cases of mistakes recently being found in decade-old papers of mine, but alas, such is not the case. In the near future, my student Adam Bouland, MIT undergrad Mitchell Lee, and Singapore’s Joe Fitzsimons will post to the arXiv a paper that grew out of an error in my 2005 paper Quantum Computing and Hidden Variables. In that paper, I introduced a hypothetical generalization of the quantum computing model, in which one gets to see the entire trajectory of a hidden variable, rather than just a single measurement outcome. I showed that this generalization would let us solve problems *somewhat* beyond what we think we can do with a “standard” quantum computer. In particular, we could solve the collision problem in O(1) queries, efficiently solve Graph Isomorphism (and all other problems in the Statistical Zero-Knowledge class), and search an N-element list in only ~N^{1/3} steps, rather than the ~N^{1/2} steps of Grover’s search algorithm. *That* part of the paper remains fine!

On the other hand, at the end of the paper, I also gave a brief argument to show that, even in the hidden-variable model, ~N^{1/3} steps are *required* to search an N-element list. But Mitchell Lee and Adam Bouland discovered that that argument is wrong: it fails to account for all the possible ways that an algorithm could exploit the correlations between the hidden variable’s values at different moments in time. (I’ve previously discussed this error in other blog posts, as well as in the latest edition of *Quantum Computing Since Democritus*.)

If we suitably restrict the hidden-variable theory, then we can *correctly* prove a lower bound of ~N^{1/4}, or even (with strong enough assumptions) ~N^{1/3}; and we do that in the forthcoming paper. Even with no restrictions, as far as we know an ~N^{1/3} lower bound for search with hidden variables remains *true*. But it now looks like proving it will require a major advance in our understanding of hidden-variable theories: for example, a proof that the “Schrödinger theory” is robust to small perturbations, which I’d given as the main open problem in my 2005 paper.

As if that weren’t enough, in my 2003 paper Quantum Certificate Complexity, I claimed (as a side remark) that one could get a recursive Boolean function f with an asymptotic gap between the block sensitivity bs(f) and the randomized certificate complexity RC(f). However, two and a half years ago, Avishay Tal discovered that this didn’t work, because block sensitivity doesn’t behave nicely under composition. (In assuming it did, I was propagating an error introduced earlier by Wegener and Zádori.) More broadly, Avishay showed that there is *no* recursively-defined Boolean function with an asymptotic gap between bs(f) and RC(f). On the other hand, if we just want *some* Boolean function with an asymptotic gap between bs(f) and RC(f), then Raghav Kulkarni observed that we can use a non-recursive function introduced by Xiaoming Sun, which yields bs(f)≈N^{3/7} and RC(f)≈N^{4/7}. This is actually a larger separation than the one I’d wrongly claimed.

Now that I’ve come clean about all these things, hopefully the healing can begin at last.

Comment #1 November 30th, 2014 at 11:09 pm

Thanks for this post. Minor point: following the link from Ryan O’Donnell’s homepage it seems that Cem Say is a 48 year-old professor in Turkey and not his student.

Comment #2 December 1st, 2014 at 12:16 am

According to http://www.cmpe.boun.edu.tr/~say/, Cem Say is not a student of Ryan.

Comment #3 December 1st, 2014 at 12:57 am

Hi Scott, thanks for the prompt acknowledgment! It was too bad Cem and I didn’t get to show AM in PP. I have to request one more correction here, though: Cem Say is not my student; far from it, he’s a full professor at Boğaziçi University. [Feel free to delete this comment if you update the main text.]

-Ryan

Comment #4 December 1st, 2014 at 12:59 am

BTW, it’s “Achinger”, not “Achtinger”.

Comment #5 December 1st, 2014 at 12:59 am

I should confess to my own embarrassing error, which is evident in the first version of my paper and which is one of the things that I needed to fix when I worked on it this summer: I didn’t appreciate the need for the full strength of the Solovay-Kitaev theorem in the definition of PostBQP, and the related refinement of the Adleman-DeMarrais-Huang condition on computability of matrix gate entries.

The issue is that the definition of BQP is stable if you can approximate one gate set A with another set of gate set B with polynomial overhead and polynomial precision. You can think of this as the baby Solovay-Kitaev theorem; it has a simple, follow-your-nose proof. What Solovay and Kitaev actually show is that you can approximate A with B with polynomial overhead and

exponentialprecision, or equivalently polylog overhead and polynomial precision. For much of complexity theory, this is overkill — so why did they prove it? One reason is that polylogarithmic overhead is a very natural level of equivalence for computing in practice. (For instance, much of computer engineering in practice is close to the abstract RAM machine model. Different mathematical variations of RAM machines are equivalent up to polylog, not polynomial, overhead.) I suspect that that was Kitaev’s first motivation in raising the question.It then turned out to be a case where the stars align magically at the level of pure mathematics. The full theorem, which seemed better than strictly necessary, is important for the robustness of PostBQP, since in that complexity class you compare exponentially small probabilities and quantum amplitudes.

The related Adleman-DeMarrais-Huang condition is that the matrix entries for each gate should have an FPTAS (fully polynomial-time approximation scheme). If so, then you can not only approximate each gate in A by a circuit of gates in B a la Solovay-Kitaev, you can also know what you’re approximating. For PostBQP, this needs to be strengthened to an FPTEAS (fully polynomial-time, exponentially precise approximation scheme). In other words, if you have an FPTEAS, you can generate the digits of each number in polynomial time.

Open question: The proof of the Solovay-Kitaev theorem requires that the inverse of every gate is also available as a gate. (Because it uses near-cancellations of commutators to create multiscale approximation.) What if the gate set is not closed under inverses? Then I think it is known that the required exponential-strength approximating words or circuits exist, but it seems to be an open problem to come up with an efficient algorithm to find them. (The fact that Solovay-Kitaev is an algorithmic construction is what makes it important and original.)

This open problem is related to what made me the most upset for a little while at my oversight. My paper also treats the Tutte polynomial, where you get a generating set of gates that is not closed under inverses. So for a little while, that result looked wrecked. Fortunately there are Tutte polynomial gadgets that yield exponentially precise, approximate inverses.

Comment #6 December 1st, 2014 at 7:20 am

HDB #1, Ilya #2, Ryan #3, Michał #4: <forehead slap> thanks, fixed! I was, of course, just trying to further justify the “embarrassing myself” tag on this post. 😀

Comment #7 December 1st, 2014 at 8:19 am

Scott, Greg:

Concerning the result on “tame” algebraic numbers, I think a result in the same vein (is it the same?) was also proved, or more precisely used, in a paper of H. Lenstra “Finding small degree factors of lacunary polynomials” (see Proposition 2.2, and he refers for this to P. Voutier, “An effective lower bound for the height of algebraic numbers”, the name of which appears relevant indeed).

He actually proves a little more, by referring to an absolute height function.

Comment #8 December 1st, 2014 at 8:45 am

B. #7: Thanks!! I have to run right now, but on a quick look, that proposition of Lenstra/Voutier does indeed look very closely related, and probably implies what we want (once you unpack the definition of the height function, etc.) In general, it wouldn’t surprise me at all if this sort of thing was independently discovered many times.

Comment #9 December 1st, 2014 at 9:15 am

I can’t follow any of the specifics at all, but it makes me wonder if there could be more systematic ways/processes to help validate and manage such complex and subtle mathematical claims.

The essence of science is to test theories, but pure math is obviously very difficult to test. Ideally one would like to translate any claim into a program that can try to find counter-examples, but I don’t think it’s feasible most of the time given the exponential nature of many of the problems.

Trying to always come up with an alternate proof for each claim could help (relying on different tools), especially considering that the very act of looking for extra equivalences is often the source of further breakthrough and better understanding.

Maybe maintaining a sort of online explicit dependency tree between proofs would be useful (so that when the authors of one node find a flaw, the authors of child nodes are automatically alerted).

At the very least, it’s impressive to see the power of online forums like MathOverflow (crowd sourcing) 🙂

Comment #10 December 1st, 2014 at 9:38 am

That’s what I call overnight fame:) Better at 48 than never!

By the way, if you wish to see some unconditional proofs about models where postselection is really more powerful than its cousin nondeterminism, see:

Abuzer Yakaryılmaz, A. C. Cem Say, “Proving the power of postselection,” Fundamenta Informaticae, Vol. 123, No. 1, pp. 107-134, 2013. (http://arxiv.org/abs/1111.3125)

This may also be one of the first papers where the “restarting” view of postselection was explicitly used, I guess.

Thank you, Scott!

Comment #11 December 1st, 2014 at 10:00 am

Fred #9: On the whole, I’d say the process works extraordinarily well in math; I only wish that the error-correcting processes in other spheres (e.g. politics) worked 1% as well. 🙂 As you can see, errors aren’t always found right away, but if enough people care about something (often, because they’re trying to adapt it for their own work), then errors will inevitably come to light, and essentially always be acknowledged when they are.

Comment #12 December 1st, 2014 at 1:22 pm

There is now a titanium standard of correctness of proofs of theorems: You can encode a theorem and proof in a formal proof-checking language and have it verified by computer. This has actually been done for at least three modern results of mathematics: The four-color theorem, the sphere-packing conjecture in 3 dimensions, and the Feit-Thompson theorem that odd-order groups are solvable.

Except for lack of training, and except that it would maybe triple the total work of the result (getting the theorem, writing the paper, etc.), there is nothing stopping Scott or anyone from doing this with one of Scott’s papers, or many people’s papers. However, it wouldn’t have caught the first “mistake”, and maybe not the second one either. Scott’s theorem about PostBQP is correct as stated; what we found is a mistake in the

interpretationof the theorem.The way that I wrote it in the paper is that for each gate set Γ, you get a complexity class PostBQP-Γ. It is clear that if Γ is FPTEAS, then you get PostBQP-Γ ⊇ PostBQP. (I suppose that that always holds even if Γ isn’t FPTEAS, using the quantum fault tolerance theorem! I didn’t think of that in my paper. If you’re keeping a hairsplitting notebook, Scott, that surely belongs there.) To get the reverse inclusion, PostBQP-Γ ⊆ PostBQP, you need that Γ is FPTEAS and tame. Anyway, technically speaking, Scott’s paper claims nothing rigorous about PostBQP-Γ; it is only a correctly stated and proved theorem about vanilla PostBQP.

The worst that the paper has is a peripheral mistake in the definition of PostBQP rather than in the proof of the theorem. He says, “By a result of Shi, we can assume WLOG that the circuit is composed only of Hadamard and Toffoli gates.” So, this is clearly incomplete reasoning. Maybe this would have been caught rather than skirted if someone did try to code this paper as a formal proof; but more because of the human reason that you have to go over everything with a fine-toothed comb when you do formal proofs.

Comment #13 December 1st, 2014 at 1:54 pm

Greg, would it really only triple the workload for these kind of theorems? I had the impression that the overhead was much worse.

Comment #14 December 1st, 2014 at 2:30 pm

Joe #13 – That depends on how much of the learning curve for this sort of work is counted as continuing overhead, and how much is counted as prior training.

I think that Georges Gonthier and Tom Hales have really demonstrated that it is not out of reach; that it is not 20 times or 100 times as much work as the original result. However, certainly in the case of Hales’ theorem (he encoded his own theorem!), the result is not completely satisfying. Yes, the computer checked “the” proof — it certainly checked a complete proof, which was the purpose of the project. However, many of the preparatory lemmas are obfuscated imitations, computer-generated as well as computer-checked, of proofs for human consumption. It is similar to (indeed related to) the experience of Mathematica/Maple/etc. finding an integral with a 10-page calculation, when an able human can do it with a half-page calculation.

Indeed Scott’s paper is sufficiently simple and conceptual that it’s not clear what you would learn from formalizing it. I suspect that the journey would be the only reward. You might learn nothing new about Scott’s theorem, but you could make a very interesting formalized library for quantum and classical complexity theory.

Comment #15 December 1st, 2014 at 10:34 pm

Greg: Can you recommend a good introduction to the area? I’ve always been taken with the idea, but I had previously thought the overhead would be too high.

Comment #16 December 1st, 2014 at 11:18 pm

Joe – The main thing that I know about it is that one of the most successful proof systems is called “Coq”. It means “rooster” in French. I’m not sure why they didn’t translate the name to English for English-speaking audiences.

One other thing that I know is that my late uncle, Andrzej Trybulec, created another such system called “Mizar”. It was his life’s work.

Comment #17 December 2nd, 2014 at 12:50 am

This kind of thing is why I’d like to get the “formally verified complexity theory” ball rolling. In complexity arguments, there are so many details to keep track of, and any small mistake could throw off an argument, that even a small amount of computer assistance could be valuable in preventing small mistakes from spreading. Some algebraic topologists (Voevodsky in particular) learnt this lesson the hard way, and Homotopy Type Theory is the happy result, a formal system capable of greatly simplifying (and verifying) proofs in algebraic topology. It remains to be seen whether someone will invent a formal system that both simplifies complexity theoretic proofs and makes them easier to verify automatically.

Comment #18 December 2nd, 2014 at 2:15 am

As a Canadian, I approve of these apologies.

Comment #19 December 2nd, 2014 at 8:17 pm

Everyone should do a post like this!

The issue of postselection “contaminating” the random choice of inputs to the problem reminds me of the issues discussed in

http://arxiv.org/abs/0908.3023

They too wrestle with the notion of deterministic inputs, e.g. with:

On a different note, I wonder if you found so many bugs in your simple and famous papers not because they are uniquely wrong, but because they are simple and famous. I suspect that in my own papers, the # of bugs found is really a function of how comprehensible they are and how many people have looked at them.

Comment #20 December 3rd, 2014 at 12:27 am

Francisco –

In complexity arguments, there are so many details to keep track of, and any small mistake could throw off an argument, that even a small amount of computer assistance could be valuable in preventing small mistakes from spreading.That’s not my experience. Many of the earlier arguments in complexity theory, which are already quite important, are based on a simple, clear, nearly inevitable idea. Newer methods based on hashing and coding theory are more complicated, but my impression is that you build up some amount of machinery and then you see the writing on the wall.

I have not seen much in this area that has the feel of delicate estimates as you see in number theory, for example. Nor the fairly intricate algorithmic proofs as you sometimes see in geometric topology, for example. But certainly specific algorithms in CS, as opposed to complexity theory, can be quite delicate or intricate.

Some algebraic topologists (Voevodsky in particular) learnt this lesson the hard way, and Homotopy Type Theory is the happy result, a formal system capable of greatly simplifying (and verifying) proofs in algebraic topology.More power to Voevodsky for entering the field of formalized proofs, but I am not sure what it really does simplify in algebraic topology. Algebraic topology might well benefit from a stronger connection to computer science in general, since it includes some serious complexity-theoretic and algorithmic questions. Whether it specifically needs formalized proofs is less clear to me.

As far as I know, the marquee formalized proofs by Gonthier and Hales don’t simplify anything. Some of the proofs were simplified and improved before they could be formalized, but that’s not the same thing. The formalized proofs do certify the results, which is very important for a different reason, to clear the air of controversy.

It remains to be seen whether someone will invent a formal system that both simplifies complexity theoretic proofs and makes them easier to verify automatically.I haven’t done anything for formalized proofs in complexity theory, but I do have an old project to build an expert system of complexity class relations, that I called Complexity Zoology.

Comment #21 December 3rd, 2014 at 2:47 am

I appreciate Scott’s honesty. However, given that the original theorem was wrong, what will it take now for us to be able to believe that PostBQP=PP?

It seems to me that the result is basically in limbo. The author says that the mistake was fixed, but of course he would say that. There isn’t a huge incentive for others to dig into the details to verify the claim. Seeing that it took so long for the community to find this mistake, how much longer do we have to wait to get confidence in this? I feel like I have to verify the result myself, but lack the time, or the interest in reading poorly written proofs. As far as I can tell from this blog post, there is not even any proof currently written up for me to consider. Maybe I am too skeptical?

Comment #22 December 3rd, 2014 at 4:45 am

Scott: Just to make sure I understand this right. You propose to choose special gates where the probabilities or amplitudes of the output encode some information. Are you saying that then a quantum circuit with post-selection can extract the information from the low order digits of the amplitudes effectively, but a quantum circuit without post-selection or a randomized circuit with post-selection cannot?

Comment #23 December 3rd, 2014 at 8:02 am

Joe #21: I think you misunderstand the situation. I gave a proof in the paper for PP⊆PostBQP (which isn’t long, by the way, just 1 page). The proof is completely valid. Indeed, it even works with the bizarre, pathological gate sets I’m talking about in this post.

Now, I also included a sentence or two to explain why the converse direction, PostBQP⊆PP, was “obvious.” And it

isobvious—as long as the gate set that you used to define PostBQP isn’t too pathological. At the time, I didn’t care about the issue of pathological gate sets, but eventually other peopledidcare, and they rightly forced me to care as well. In particular, the treatment of this issue in Greg’s paper on the Jones polynomial is fully rigorous.Interestingly, if you look at the other three examples of errors/oversights in this post,

every one of themhas a common structure. Namely, for every statement that I recognized as central and requiring a nontrivial proof, I gave a proof that’s withstood the test of time (as proofs tend to do 🙂 ). When problems arose, they were always in auxiliary, side statements—things I felt were “obvious” and could be dealt with in a brief, handwavy remark or a 3-sentence proof sketch. (And in most of the cases, including the PostBQP one, the real issue was a wrong or ambiguousdefinition; the handwavy argument given is fine as long as the definition is fixed.)I’ve found this to be the situation for other researchers as well: for example, a paper from the early 90s wrongly asserted that every 2-prover game has perfect parallel repetition—not because the authors gave an incorrect proof of the parallel repetition theorem, but because at the time, they considered it so obvious that it didn’t even

requireproof.Lessons about which parts of published math/TCS papers to trust and which to be skeptical about—assuming, of course, that you don’t have time to verify everything yourself—are left as exercises for the reader. 🙂

Comment #24 December 3rd, 2014 at 8:12 am

jonas #22: No, it’s more subtle than that. There exist gates that encode nontrivial information, in such a way that even a BQP circuit (i.e., a quantum circuit with no postselection) can efficiently extract it. The gate G from the post provides an example of that. There are also classical analogues of those gates: for example, a coin that lands heads with probability equal to Chaitin’s Ω. A BPP machine can efficiently extract uncomputable information from those coins as well.

However, there

alsoexist other quantum gates—for example, those that involve amplitudes like α (i.e., sums of super-duper-rapidly decreasing sequences of rational numbers)—such that a BQP circuit couldnotefficiently extract any useful information from them, but a PostBQP circuitcoulddo so, by using a combination of quantum interference and postselection.In order to get an analogous phenomenon classically, I believe you would need coins such that the probabilities of their landing heads

themselvesdecreased super-duper-rapidly as a function of the input size n. I don’t see how you could get it with coins that have constant probabilities of landing heads. But, of course, I might be mistaken about that.Comment #25 December 3rd, 2014 at 9:06 am

Should not the tenure committee be reconvened? (there is ordinarily a five year statute of limitations on such decisions)

Comment #26 December 3rd, 2014 at 9:38 am

Slipper.Mystery #25: LOL! As it happens, all the oversights discussed in this post were committed before MIT

hiredme, let alone before they tenured me. But yes, if they actually tenured me because they thought I’d proved PostBQP_{G}=PP for arbitrary gate sets G, not merely for all the gate sets G that people normally care about, then I hereby invite the committee to revisit their decision.Comment #27 December 3rd, 2014 at 10:42 am

Some people are here are clearly reading too much in to Scott’s style of self-criticism. It has been reasonably standard in some papers to quantum computation solely in terms of one gate set, the Hadamard and Toffoli gates. If you read his paper that way, it has no mistake at all, only a limitation that he didn’t properly consider. In particular, in answer to Joe’s question, “What will it take now for us to be able to believe that PostBQP=PP”, the answer is that it only takes Scott’s paper. The issue is not whether PostBQP=PP, the issue is whether standard PostBQP equals its other flavors, namely PostBQP-Γ for other gate sets Γ. (I think that I still cannot do subscripts in comments in WordPress, even though Scott can!?)

That said, if this is the page for hair-splitting, I do not completely agree with Scott’s comment #23.

(1) I think that gate sets such as those produced by the Jones polynomial are certainly among those that people care about. That’s why this gate set issue came up in my paper. Of course the result turns out to be true in that case, but only using the extra lemma that such a gate set Γ is tame. So, the question is whether Scott’s tenure at MIT should be revoked because of an extra lemma that people in MO proved in 15 minutes. I wish! Then maybe we can hire him at UC Davis.

(2) Scott’s original paper cites a paper of Shi to argue that the gate set doesn’t matter, implicitly as Scott now explains only for one side of the relation, namely PostBQP⊇PP. The citation should really just have been for Solovay-Kitaev; as such, his paper only establishes that PostBQP-Γ⊇PP when Γ is FPTEAS. Actually what this step really proves is PostBQP-Γ⊇PostBQP. The only way that I know how to prove that PostBQP-Γ⊇PostBQP if Γ is not FPTEAS is to use the quantum fault tolerance theorem to coerce Γ to a nearby gate set Γ’ that is FPTEAS or even rational. (Or, maybe better, to use short words in Γ to bring it close to Hadamard-Toffoli, and then coerce it to that.)

(3) If people really want to split hairs, question: Is it the case that PostBQP-Γ⊆PostBQP if and only if Γ is both tame and FPTEAS?

Comment #28 December 3rd, 2014 at 11:50 am

Greg #27:

Is it the case that PostBQP-Γ⊆PostBQP if and only if Γ is both tame and FPTEAS?

Nice question! So, we agree that if Γ is both tame and FPTEAS, then PostBQP

_{Γ}⊆PostBQP; the only question concerns the converse direction?As a first observation, we can’t hope for an

unconditionalproof of the converse direction. As an example, suppose Γ was FPTEAS but non-tame, and suppose a difficulty arose because, even though we could compute the first n bits of a transition amplitude α in O(n) time, we could arrange a postselected interference experiment whose outcome depended on the 2^{n}th bit of α. But suppose the k^{th}bit of αwascomputable in polylog(k)space, for all k. Then we might still be able to simulate PostBQP_{Γ}in PostBQP, but it would depend on whether P=PSPACE.So we should revise your conjecture to something like: “PostBQP

_{Γ}⊆ PostBQP if and only if all transition amplitudes in Γ have the property that, in poly(n) time, we can compute all the bits of those amplitudes that can ever actually matter for estimating a conditional probability to constant additive error.” Here, of course, the problem is to refine the second part of the “if and only if” so as to make it non-tautological. 🙂Comment #29 December 3rd, 2014 at 12:31 pm

Of course you’re right that the converse direction is all that is left.

Your rebuttal here may indeed render a full “if and and only if” to be optional hairsplitting even by my standards. Maybe the question should be modified somehow, but I would have to think more how.

I have to admit that the question I like a lot better is whether Solovay-Kitaev is still true without gate inverses. That is also splitting hairs, but in this case artistically, in my opinion.

Comment #30 December 3rd, 2014 at 4:34 pm

Talking about magical gates, quantum machines are able to make much better use of them, compared to classical machines: It’s known that polynomial-time two-way probabilistic finite automata cannot recognize any nonregular language, even when equipped with such magic coins. Add one qubit of quantum memory to that model, and you can recognize uncountably many languages.

Details in: http://arxiv.org/abs/1411.7647

Comment #31 December 3rd, 2014 at 5:50 pm

Cem – Cool! It’s this kind of thing that convinces me that “bad” computer science can be good mathematics.

Comment #32 December 3rd, 2014 at 11:20 pm

Is the power of PostBQP reduced when constrained to some choices of otherwise universal quantum gates (e.g. Hadamard and Toffoli) or does the choice of gates merely hinder a PP simulation, without providing additional computational power?

The goal was to show that a problem in PostBQP is also in PP, right?

Comment #33 December 4th, 2014 at 2:17 am

Greg Kuperberg #16: The story is that Coq was intentionally named a bad word in english as revenge for something else, I unfortunately forget what, being a bad word in french.

Comment #34 December 4th, 2014 at 8:21 am

Job #32: With a sufficiently bizarre gate, you

really canget additional computational power. This can be seen even without postselection, and even classically. Suppose you had a coin whose probability p of landing heads encoded the bits of the halting problem. Then by flipping the coin more and more times, you could (slowly!) learn more and more bits of p, and thereby solve any instance of the halting problem in a finite amount of time.In some sense, bringing in quantum mechanics plus postselection “amplifies” the above issue slightly, by allowing it to arise even in situations where it

wouldn’tarise without quantum interference or without postselection. And yes, you can thereby increase your computational power to any (computable) bound you want above PP, such as EXPSPACE or EXPEXPEXP. But the gates for which this happens are still a bizarre, measure-0 set. (Note to self: rigorously prove that the set is measure 0…)As I said in the post, the usual approach to these issues, in the “standard” cases of BPP and BQP, is just to define the complexity classes so as to disallow bizarre gates! This is justified by the facts that (a) we’ll presumably never come across gates like these in physical reality, and (b) crucially,

even if we did, noise and imprecision would inevitably prevent us from reading out more than a small finite number of bits about the gates. So as Greg suggested, If I’d given the issue any thought, I simply would have made a similar stipulation in my original definition of PostBQP.Comment #35 December 4th, 2014 at 11:51 am

Chutzpah starting off with an example that will kill off most students that don’t understand the problem before they even get to the corrections and then leaving off with a beginning of healing statement…at the end ! A very classical hedge.

Comment #36 December 4th, 2014 at 11:52 am

Ok. But in that case, can you be sure that if you define BQP with only your favourite gate set, you’re not missing anything you could compute with a larger but still realistic non-magical gate set?

Comment #37 December 4th, 2014 at 1:02 pm

Scott, gates that give you all of EXPEXPEXP are measure zero, but gates that give you access to uncomputable problems are full measure. A random gate is a random oracle.

Comment #38 December 4th, 2014 at 2:02 pm

jonas #36: The key observation here is that, as Greg mentioned, PostBQP

_{G}=PP for any gate set G that has the two properties(1) it’s “tame” (i.e., it doesn’t produce doubly-exponentially-small cancellations), and

(2) the first n bits of any of the transition amplitudes are computable in poly(n) time.

And I think it’s reasonable to assume that both of these properties hold in real life: if you think of a gate as a rotation that you turn on for some prescribed length of time and then turn off, then you’ll never control it to such precision that you could produce doubly-exponentially-small cancellations. And you’ll

certainlyneed to be able to efficiently compute the bits of the rotation angle, since otherwise, how were you applying the rotation in the first place?(And also, you can only control a constant number of bits of the rotation angle anyway; the remaining bits are uncontrollable noise. Ultimately, you need the Fault-Tolerance Theorem to assure you that this lack of control poses no problem of principle for scalable quantum computation.)

The only way I see that the assumptions could be violated, would be if Nature happened to provide us with “magical” fundamental physical processes (e.g., particle decays?), which occurred with amplitudes that were non-tame or super-hard-to-compute complex numbers. But to me, that speculation belongs in the same class as the speculations that (e.g.) there are magical quantum states left over from the Big Bang that would let us solve the group membership problem (even though those states would’ve taken exponential time to prepare), or that there are magical unitary n-qubit unitary transformations that physics can apply and that would let us solve NP-complete problems. In other words, these speculations all share the properties that

(1) they don’t formally violate any rules of quantum physics, and

(2) if they existed, they

really couldincrease our computational power beyond BQP, but(3) there seems to be no reason at present to take them more seriously than the Easter Bunny.

Comment #39 December 4th, 2014 at 2:13 pm

Douglas #37:

A random gate is a random oracle.

Yes, that’s a fair and important point. But observe that the random gates you discuss are already trouble for BQP, let alone for PostBQP! And their classical analogues are trouble for BPP: that is, almost every coin bias p lets you learn log(n) bits of a random oracle in poly(n) time, and thereby “solve an uncomputable problem.”

In all these cases, the argument I’d want to make is that we don’t get to control our gates to that much precision anyway—i.e., we only ever learn O(1) bits from them (say, the value of the fine-structure constant?) that we didn’t write down at the outset.

In any case, let me state my conjecture more precisely: the gates that are unsafe for PostBQP,

despite being safe for BQP,form a measure-0 subset of the set of all gates.Comment #40 December 4th, 2014 at 4:14 pm

Given that PostBQP is defined as “BQP with post-selection”, shouldn’t

post-selection be constrained to the set of quantum states reachable by all

universal quantum gates?

That is, in order to post-select on a given quantum state, you’d need to show

that any universal set of quantum gates can produce the state that will be

selected against.

Comment #41 December 4th, 2014 at 4:30 pm

Job #40: There are

nostates reachable from the all-0 state byeveryuniversal set of quantum gates, except for the all-0 state itself. For we can always get a universal set by taking a different universal set and perturbing it by some crazy transcendental amount. (Part of) what makes BQP a robust class is that every state is arbitrarilyapproximableby gates from any universal set, and approximation is the only thing that matters—not only in practice, but for any “normal” complexity-theoretic application.So your proposed condition for PostBQP is

waytoo strong. (And if we changed it to every set of states reachable bysomeuniversal set of states, then we’d get a vacuous condition, since that set is the set of all states.)Comment #42 December 4th, 2014 at 4:51 pm

Scott,

That could be addressed that by introducing a maximum error constant e.

E.g. a quantum state is post-selectable if it can be reached by the action of any set of universal quantum gates with error deviation at most e.

This seems more or less analogous to your suggestion of requiring that M(x) succeed with probability no less than 1/exp(n).

Comment #43 December 4th, 2014 at 5:40 pm

Job #42: Then once again, as long as e≥1/exp(n), your set of “post-selectable states” will simply be the set of all states.

Comment #44 December 4th, 2014 at 5:53 pm

Can you explain a little bit what is meant by and what the purpose of defining PostBQP in a way “consistent with what is practically achievable” if PostBQP is anyways an unrealistic class? The relevance of “PostBQP=PP” to people who actually want to build machines is a kind of metaphor for the power of quantum mechanics, right? It’s a little unclear to me how practical assumptions could factor in to theoretical results like that PP is closed under intersection.

Comment #45 December 4th, 2014 at 7:02 pm

Scott, let me try again:

A state is post-selectable if its probability of being measured is non-zero within the minimum precision required for universal computation.Admittedly this is what you mentioned in your blog post, but i managed to throw “universal computation” in there which makes it more legit.

Comment #46 December 4th, 2014 at 7:49 pm

Sam Hopkins #44: Well, there’s practical and then there’s practical. 🙂 To expand on what I mean, let’s say your goal is to prove PP closed under intersection. In that case, all you need is that PP=PostBQP

_{G}forsomegate set G—sinceeveryPostBQP_{G}is clearly closed under intersection. So you can just take G to be the set of gates with rational amplitudes, and be done with it. For another example, let’s say your goal is to show that exact BosonSampling, or simulating constant-depth quantum circuits or stabilizer circuits with magic initial states, are classically intractable unless PH collapses. In that case, all you need is the PP⊆PostBQP direction, which doesn’t depend on the gate set at all; the PostBQP⊆PP direction is irrelevant. More generally, I can’t think of any “application” of PostBQP=PP to anyotherissue in quantum computing theory, for which it really matters that PostBQP_{G}⊆PP fails for pathological gate sets G. (Though Greg would probably argue that making sure PostBQP_{G}=PP works for the somewhat-exotic gate sets G that arise in topological QC provided an example.)Comment #47 December 4th, 2014 at 10:04 pm

Scott #23:

Thank you for explaining. That makes more sense. I apologize for my attitude. I should have read your post more closely.

“Interestingly, if you look at the other three examples of errors/oversights in this post, every one of them has a common structure. Namely, for every statement that I recognized as central and requiring a nontrivial proof, I gave a proof that’s withstood the test of time (as proofs tend to do 🙂 ). When problems arose, they were always in auxiliary, side statements—things I felt were “obvious” and could be dealt with in a brief, handwavy remark or a 3-sentence proof sketch…

“I’ve found this to be the situation for other researchers as well: for example, a paper from the early 90s wrongly asserted that every 2-prover game has perfect parallel repetition—not because the authors gave an incorrect proof of the parallel repetition theorem, but because at the time, they considered it so obvious that it didn’t even require proof.”

So why didn’t you learn that lesson?

Now I’d better apologize again. 🙂 But seriously, this is a lesson that CS researchers need to learn. Mistakes like this can conceivably be very bad, and set back entire fields. Imagine if instead of giving an incorrect proof of parallel repetition they had simply stated that it is trivial or obvious. How many graduate students would that have discouraged from discovering the truth? I still find that a large fraction of the statements in papers that are called “trivial” or “obvious” are actually false. Often the statements can be fixed, but just as often they can’t be.

(I think this is partly because statements that authors call “trivial” or “obvious” draw additional scrutiny. Also, authors who still use those words are on average more careless writers and careless researchers.)

Comment #48 December 5th, 2014 at 8:02 am

Joe #47: You yourself apologize, but then go on to say the same sort of things you said the first time—so I guess we’re all in the same boat here! 🙂

I think you’re ignoring a massive selection effect: 99% of what gets handwaved as trivial in math/TCS papers,

really isas trivial and unproblematic as the authors said it was! But of course, that’s not the 99% that later prompts blog posts like one.Furthermore, trying to justify every trivial claim would make papers unreadably and unwrite-ably long—try earnestly to do so, and what you get is not a research paper but

Principia Mathematica(where by volume II, Russell and Whitehead had finally developed enough machinery to prove 1+1=2).And when you write:

Imagine if instead of giving an incorrect proof of parallel repetition they had simply stated that it is trivial or obvious. How many graduate students would that have discouraged from discovering the truth?

My understanding is that the latter

iswhat happened with the Fortnow-Rompel-Sipser paper. They published a paper with a side-note saying something like, “obviously, you can then just do parallel repetition to decrease the error.” Later, they asked a student to fill in that detail; and when the student couldn’t do it, it led Fortnow to find a counterexample, and an erratum to be published. But even if they hadn’t asked a student to do it, given the level of interest in PCP at that time (and continuing till today), I’mcertainthat the problem would’ve been detected by someone else before long.More generally, I would say: errors are inevitable (both straightforward technical errors and the even-more-insidious “errors” of definition and interpretation), and detecting and correcting them is a completely normal part of scientific progress—and that’s as true in math and TCS as it is in empirical sciences. By themselves, errors are not a sign of any sort of pathology in a field—in fact,

noerrors would make me worried that the field is stagnating or not advancing fast enough. Of course, if errors are so pervasive that they make progress impossible, or (especially) if they’re brought to light but then not acknowledged by the authors, thenthat’sa sign of pathology.Comment #49 December 5th, 2014 at 9:37 am

Scott #48

One on hand you have errors in proofs – at least those can be found and addressed, cause there is a tentative proof. As you say I don’t think these would stop progress, assuming there’s enough interest in the area (if not, who cares?). And the ultimate test of any theory is when it’s finally grounded in a real world application/experiment (otherwise, who cares?).

What’s probably more insidious and harmful to progress is when there are no known proof, but the majority of the field assumes one way or another, and it turns out they’re wrong.

I’m still fascinated by this story:

http://www.wired.com/2013/05/twin-primes/all/

Kudos to the people who decide to go against the mainstream opinion and dedicate years of their live on it.

Science is about questioning authority after all…

Comment #50 December 5th, 2014 at 10:14 am

Although when it’s about going from theory to practice, it often comes down to “models”, and it’s no longer as simple as verifying a “proof”, it’s about understanding the trade-offs and the assumptions, which easily get swept under the rug when money is at stake. Then smart people get blamed.

http://archive.wired.com/techbiz/it/magazine/17-03/wp_quant?currentPage=all

Comment #51 December 5th, 2014 at 10:15 am

Fred #49: The story of Yitang Zhang is an inspiration to all of us, but I don’t see how it’s an example of anything we’re talking about here. As I understand it, no one claimed to have any impossibility proof showing that the GPY sieving method could never yield bounded gaps between the primes; it’s just that various experts tried to do it and failed. Through a feat of superhuman perseverance, Zhang succeeded. And once he did, his accomplishment was

immediatelyrecognized and celebrated by the math community.The examples discussed in this post are all things that I assumed without careful proof, not for “ideological” reasons, but because

I didn’t even notice there was a nontrivial issue there.Crucially, this meant that, as soon as the issue was brought to my attention, I got it—it at most took the exchange of one or two emails. (The one exception, of course, being Greg Kuperberg’s gchat with me about PostBQP, when I was simply too tired and distracted to concentrate on what he was saying—and whatever the issue was, he said he’d resolved it anyway, so what was the problem? 😉 )At the other extreme, there are cases like P≠NP, where a whole research community believes something without proof (or at least, conducts their research

as ifthey believe it), but not because they haven’t given it much thought—rather, precisely because they’ve given it anenormous amountof thought, and the hypothesis has survived a decades-long gauntlet of criticism, argument, and attempted refutation. (Which then makes it exasperating to experts when someone comes along with the notion thatthey, though not the experts, get to bypass the gauntlet, and have their own untested beliefs about the question treated on an equal footing—since, after all, nothing has been proved.)In my view, neither of the two situations above—the unexamined assumption that’s acknowledged to be unfounded as soon as it’s brought to light, or the examined assumption that’s provisionally held precisely because of how

muchit’s been examined—is dangerous to the Popperian process by which science advances. On the contrary, both of these sorts of things are normalpartsof the Popperian process. What’s dangerous is when an assumption is clung to forbadreasons, like authority, superstition, dogma, or the fear of offending a powerful interest group; and when arguments against the assumption aren’t even allowed to be entertained.Comment #52 December 5th, 2014 at 11:17 am

Scott #51

You’re right – going against the “honest” main stream opinion of the field is one thing (that opinion is often built on empirical evidence), and going against a dogma is quite another (Yitang Zhang’s achievement was immediately recognized, while Galileo ended up in jail).

I was thinking that the general public often doesn’t realize the amount of courage and sacrifice involved in doing science.

There’s a personal “risk” involved in every scientific pursuit (and not all of them require the same commitment – the Particle Fever documentary made it clear that some scientists “bet” their entire career on one theory.. being wrong still helps science in general, but it’s a bitter reward).

For every “against-all-odds” genius who is successful (Zhang, Wiles, Perelman), how many others turn out to be wrong?

Also, lots of endeavors require huge teams and huge budgets, those have to be more pragmatic in terms of likelihood of success (vs one guy doing math alone on his free time).

Another related topic is the idea that scientific progress is somewhat inevitable (because of its very nature) – it’s often observed that many people converge towards the same solution at the same time, that specific individual breakthrough don’t matter as much as we may think – e.g. if Einstein had been hit by a bus as a toddler, certainly someone else would eventually had come up with the same theories. I’m not quite sure it’s true though.

Comment #53 December 5th, 2014 at 12:12 pm

Fred – “Science is about questioning authority after all…”

This point really gets overplayed. Science is about objective analysis of the facts, which

mayinvolve questioning authority when authority happens to be wrong. Science is not a fetish of debate for the sake of debate, nor a fetish of contradicting authority just for its own sake either.“For every “against-all-odds” genius who is successful (Zhang, Wiles, Perelman), how many others turn out to be wrong?”

Actually, Wiles and Perelman certainly didn’t work against all odds. They may have worked alone, but they were well-connected, well-trained insiders. You can call them geniuses if you like because they certainly are very talented; but the word “genius” is also overplayed.

“If Einstein had been hit by a bus as a toddler, certainly someone else would eventually had come up with the same theories.”

That’s absolutely true, because with the exception of general relativity, everything that Einstein discovered was long overdue. The reason that his miracle year was possible was that he was excellent at reading the writing on the wall, at a moment when the physics community was strangely stubborn. Even general relativity would have been discovered within a few decades; it was indeed inevitable.

However, the point is moot. If Einstein hadn’t done these things, they would have been done by another Einstein. You can dismiss almost anyone’s creative achievement this way. “Yes, Michalengelo, your ceiling painting is great; but pshaw, if you hadn’t done it, someone else would eventually.” That’s a maddening reaction.

Anyway, this is all peripheral to the real point of Scott’s post. It’s too meta. Scott isn’t really talking about methods of doing science. His real point is that you need a tame set of gates to properly define the complexity class PostBQP.

Comment #54 December 5th, 2014 at 4:17 pm

Greg #53

You’re right, sorry.

I guess that when I read

“As you can see, errors aren’t always found right away, but if enough people care about something (often, because they’re trying to adapt it for their own work), then errors will inevitably come to light, and essentially always be acknowledged when they are.”

my mind kinda drifted “up” from errors in proofs to the proofs themselves, etc.

Comment #55 December 6th, 2014 at 1:59 am

Kuperberg #20,

As an algebraic topology grad student, I would also love to see more contact with theoretical CS, particularly in the direction of proving certain problems are hard.

You responded (in 1999!) to the topology mailing list explaining Anick’s result that rational homotopy groups of finite complexes are #P-hard. My impression (I would love to be corrected) is that almost no knowledge has been gained since then!

In particular, I know no lower bounds on the complexity of stable homotopy theory.

(1) In contrast with Anick’s result in the unstable setting, the rational stable homotopy groups of a finite complex are just its rational cohomology groups, which are easily computed.

(2) No one was able to answer my question on mathoverflow:

http://mathoverflow.net/questions/125745/computational-complexity-of-topological-k-theory

I do not know the complexity of even the simplest generalized cohomology theory.

(3) There is an intuition that the stable homotopy groups of finite complexes are built in v_i periodic layers (where i=0,1,2,…). One is supposed to be able to compute the (v_i)th layer for any fixed i, but the complexity should grow rapidly with i. Is there any substantiation behind this intuition?

(4) There is an intuition that pieces of the stable homotopy groups of spheres are strongly related to the group cohomology of the Morava stabilizer group. Are there established lower bounds on the complexities of purely algebraic problems, like computing group cohomology or Ext groups? I would imagine so…

Comment #56 December 6th, 2014 at 10:49 am

Does anybody here thinks that the problem of the gate selection is not only in the mathematical definition of the classes but actually tells us that the mathematical-physical assumption about the connection between a 1:1 correspondence between quantum states and Hilbert space operators ( or gates ) is not accurate ?

How can we claim things like Scott did in #34 “we’ll presumably never come across gates like these in physical reality” with no real theoretical justification?

Comment #57 December 6th, 2014 at 10:59 am

Itai #56: The theoretical justification is that we can only implement gates to finite accuracy.

Comment #58 December 6th, 2014 at 1:53 pm

Jeremy Hahn – These are all great questions, of course, but I don’t know how much I can add to them. I only have the sense that there is a lot of unexplored territory with upper bounds on complexity in algebraic topology, which may be as fruitful to explore as lower bounds. For example, it’s a theorem that it’s recursive to compute homotopy groups of spheres (even unstable cases). Presumably if you step through the proof, you can get some explicit upper bound, not just that it’s recursive. For example, is it primitive recursive?

With questions like group cohomology, it is very easy to get into non-recursive territory with finitely presented groups. There is the well-known result of Novikov that whether a presented group is trivial is halting-complete. Of course you can compute the first cohomology directly from the group presentation, but I don’t know about the second cohomology, which doesn’t even have to have finite rank.

Surely the difficulty of computing group cohomology depends on how the group is described. If it is a finite group described by its group multiplication table, then the bar resolution gives you an upper bound on the complexity of the group cohomology. This upper bound isn’t considered very good, but at least it’s something. In fact, that’s a great question in its own right: How hard is it to compute the group cohomology of a finite group?

I don’t know much about the proof of Brown’s theorem that homotopy groups between simply connected spaces are recursive. (Which of course generalizes the fact that homotopy groups of spheres are recursive.) I do remember that it involved finite approximations of some kind.

Comment #59 December 6th, 2014 at 2:27 pm

There is actually a semi-realistic interpretation of the issue that Adleman-DeMarrais-Huang raised with gates whose matrices aren’t computable, or at least have very high computational complexity. You can look at this way: What kind of computations could you do if someone else handed you incredibly precise quantum gates that are also rich with information? If these gates are encoded with quantum fault tolerance, then that’s actually possible. (Since, by the amazing power of the Solovay-Kitaev theorem and the fault tolerance theorem, you can get n digits of precision with polynomial overhead.)

Their answer is that you can read the digits back out again and learn whatever information they carry. Once you have this conclusion, it’s only barely about quantum computation. It’s like buying a classical computer not for its RAM and its CPU, but because it has incredibly valuable ROM. Hell, the ROM could be so great that you might not even care about the CPU for any purpose other than to read the ROM. Certainly in the BQP-gate-set scenario, it’s a desperate struggle: Even though you can design these gates with polynomial overhead, it takes exponential work to read out the digits. The information in the digits has to be more valuable than that to be worth it.

So, does that make gates like this “physical”? Yes and no. They are physical in the sense that once quantum computers exist, people will be able to make fancy gates. (Moreover, this issue is the same with classical stochastic gates, which are already realistic.) On the other hand, the idea that quantum mechanics will just hand you special quantum gates with exotic information in the matrix coefficients reminds me of the controversial plot line of Sagan’s book Contact, in which God places a special message for humanity in the digits of π in base 11.

Comment #60 December 6th, 2014 at 3:54 pm

Jeremy: My impression (I would love to be corrected) is that almost no knowledge has been gained since then!

There is certainly progress in 1) understanding computational aspects of topology and 2) applying and relating ideas and notions from algebraic topology to computation and to complexity (but not necessarily through computational complexity).

On 1) there is much progress on the computational complexity of questions about embedding. And also about the computational compelxity of telling if a knot is knotted or not. (See, for example, https://gilkalai.wordpress.com/2012/04/10/greg-kuperberg-it-is-in-np-to-tell-if-a-knot-is-knotted-under-grh/ )

2) is too large area to briefly describe.

Comment #61 December 8th, 2014 at 8:44 am

Scott #57

“The theoretical justification is that we can only implement gates to finite accuracy.”

Is this accuracy related to de-coherence creeping in, and amplitude superposition is lost?

Or is this something even more fundamental, like an analogy to a classical coin flip that would have a theoretical arbitrary randomness associated with it, e.g. prob(head) = 0.1 * PI, but in practice the accuracy of the realization is always going to be limited?

Comment #62 December 8th, 2014 at 9:02 am

Fred #61: The latter. (Though see the expanded, better treatment of this issue by Greg #59: he explains how, by using fault-tolerance, it

wouldbe possible to overcome the issue, but what you’d get at the end of the day is a computer that’s valuable because of the information its designer hardwired into its transition probabilities, not because of anything it does.)Comment #63 December 10th, 2014 at 3:04 pm

Scott #48:

“I think you’re ignoring a massive selection effect: 99% of what gets handwaved as trivial in math/TCS papers, really is as trivial and unproblematic as the authors said it was! But of course, that’s not the 99% that later prompts blog posts like one.”

We will have to disagree here. I think the percentage is well below 99%. This is based on experience, not blog posts.

“Furthermore, trying to justify every trivial claim would make papers unreadably and unwrite-ably long—try earnestly to do so, and what you get is not a research paper but Principia Mathematica (where by volume II, Russell and Whitehead had finally developed enough machinery to prove 1+1=2).”

Again, I can’t agree with this hyperbole. Nobody in TCS is arguing about 1+1=2. They are making statements, calling them trivial, and skipping the proofs. Most of the statements have one-page proofs that would fit fine in an appendix. If it really requires a 1000-page proof, then it probably isn’t trivial and less definitive language should be used: “We suspect that this statement can be verified by a straightforward but lengthy calculation, but have not checked the details.”

“My understanding is that the latter is what happened with the Fortnow-Rompel-Sipser paper. They published a paper with a side-note saying something like, ‘obviously, you can then just do parallel repetition to decrease the error.’ …”

This kind of claim can be truly bad. If the result is true, then they’ve preemptively claimed credit for it, without actually proving the result. In this way, they discourage others from working on it in either direction, because there’s less incentive. If the result is wrong, then they spread errors through the literature. In the worst cases, you end up with subfields where outsiders can’t judge the status or progress, with knowledge tied up in a spaghetti of claims, published and unpublished. Again, this protects the insiders from competition.

Comment #64 December 10th, 2014 at 4:52 pm

Scott –

Furthermore, trying to justify every trivial claim would make papers unreadably and unwrite-ably long—try earnestly to do so, and what you get is not a research paper but Principia Mathematica (where by volume II, Russell and Whitehead had finally developed enough machinery to prove 1+1=2).I wouldn’t say it quite like that. I think that the recent victories of formal proof systems is that Principia was just as much an honorable failure, indeed a prescient effort, as the Babbage calculating engine. They just didn’t have computers, that’s all.

What is true is that rigorous coding is not easy; and not the same thing as a human proof even if it is derived from that; and is really someone else’s effort. Even if a computer scientist publishes an algorithm, everyone understands that implementation can fairly be someone else’s business. It’s absurd to impose some absolute standard that every published algorithm should include an implementation, or any other specific element.

Likewise, it’s absurd to demand an absolute standard of completeness for a human proof. No matter what details you put in a proof, there is always someone else to demand more details. Until you reach the limit of formalized proof so that no one would want to read it. Indeed, good writing can make the problem “worse”. When scientists write well and write carefully, as Scott certainly does for the most part, it tends to pull in readers, some of whom who have trouble understanding the material. (Just as many “bad” doctors are very good doctors and many “bad” roads are very good roads, that simply create >1 unit of demand for every unit of achievement.) The only way to play it safe is to write impenetrable papers that no one enjoys reading.

Comment #65 December 10th, 2014 at 6:18 pm

Heck, even if a paper has abbreviated explanations or secondary omissions, that too can sometimes be taken as someone else’s problem. I didn’t mind filling in this omission in Scott’s paper in my own paper. (It is an inevitable fill-in that is probably equivalent to how anyone else would do it.) In fact, I enjoyed it. Of course you always want your paper to be complete, but this kind of thing happens. People have filled in omissions and corrections in some of my papers, and maybe Scott will too one day. It’s completely standard reciprocation, or you could just say cooperation.

Comment #66 December 11th, 2014 at 9:18 am

It reminds me that whenever I read Knuth, the stuff that gives me the most problem is when he says things like “it is clearly obvious that…”.

Comment #67 December 15th, 2014 at 5:04 pm

Scott,

Suppose we have a not-quite-quantum machine which, given a quantum circuit, processes a random execution branch and returns the result along with a boolean indicating whether the execution path destructively interferes.

What’s the computational power of this machine? How much does the interference boolean buy us?

Comment #68 December 16th, 2014 at 9:30 pm

Let me try to answer my own question.

My understanding is that the interference bit wouldn’t be sufficiently useful to solve Simon’s problem efficiently.

In Simon’s algorithm, if the secret value has a single zero bit, then the vast majority of execution paths should interfere destructively. In this case the interference bit would almost always be true and we’d just have that plus a non-orthogonal value of y, which is not much.

I find this interesting because the problem of determining whether an execution path destructively interferes does not seem to be an easy problem either, so this type of not-quite-quantum machine might as well be somewhere between BQP and BPP.

In trying to narrow down the computational advantage of a quantum computer relative to a probabilistic classical machine, i’ve gone through the following stages of decreasing naivety:

1. It’s magical because Quantum Physics is not well understood.

2. It’s because of entanglement. Entaglement is weird and must have weird computational power.

3. It’s because a QC is implicitly keeping track of 2^n possibilities with varying probabilities (riiight).

4. It’s because of the negative amplitudes and the phase kickback trick (nope).

5. It’s because of interference, it’s difficult to determine which results will never be measured (maybe?).

As i understand it, it’s not any of the above. A QC is faster than a classical machine because it can, not just identify, but avoid execution paths that have the property of destructive interference. Can we say that a QC avoids non-symmetrical execution paths?

What if we define a machine that avoids execution paths based on different criteria? For example, let M be a machine that avoids execution paths for which a valid though different yet isomorphic execution path exists. Or maybe M avoids execution paths based on sub-isomorphism?

How many distinct classes of universal machines can we define based on execution-path avoidance? Are any of these more powerful than BQP?

Comment #69 December 18th, 2014 at 5:32 pm

Joe @63:

No, Principia Mathematica didn’t have a 1000-page proof of 1+1=2. It had 1000 1-page proofs of the machinery needed to prove 1+1=2 in only 1 page. So you see, I trust, why appending a proof of each trivial fact you use in a paper isn’t a workable plan.

Job @68:

From my understanding of the papers Scott alluded to in his original post, a machine that allows the efficient reproduction of quantum statistics and follows a definite execution path will be strictly more powerful than a quantum computer.

Comment #70 July 29th, 2016 at 5:49 pm

[…] years ago, I wrote a blog post entitled PostBQP Postscripts, owning up to not one but four substantive mathematical errors that I’d made over the years […]