Archive for the ‘Complexity’ Category

A query complexity breakthrough

Wednesday, June 17th, 2015

Update (June 26): See this just-released paper, which independently obtains a couple of the same results as the Ambainis et al. paper, but in a different way (using the original Göös et al. function, rather than modifications of it).


Lots of people have accused me of overusing the word “breakthrough” on this blog. So I ask them: what word should I use when a paper comes out that solves not one, not two, but three of the open problems I’ve cared about most for literally half of my life, since I was 17 years old?

Yesterday morning, Andris Ambainis, Kaspars Balodis, Aleksandrs Belovs, Troy Lee, Miklos Santha, and Juris Smotrovs posted a preprint to ECCC in which they give:

(1) A total Boolean function f with roughly a fourth-power separation between its deterministic and bounded-error quantum query complexities (i.e., with D(f)~Q(f)4). This refutes the conjecture, which people have been making since Beals et al.’s seminal work in 1998, that the biggest possible gap is quadratic.

(2) A total Boolean function f with a quadratic separation between its deterministic and randomized query complexities (with D(f)~R0(f)2). This refutes a conjecture of Saks and Wigderson from 1986, that the best possible gap is R0(f)~D(f)0.753 (from the recursive AND/OR tree), and shows that the known relation D(f)=O(R0(f)2) is close to tight.

(3) The first total Boolean function f with any asymptotic gap between its zero-error and bounded-error randomized query complexities (in particular, with R0(f)~R(f)3/2).

(There are also other new separations—for example, involving exact quantum query complexity and approximate degree as a real polynomial. But the above three are the most spectacular to me.)

In updates to this post (coming soon), I’ll try my best to explain to general readers what D(f), R(f), and so forth are (see here for the classic survey of these measures), and I’ll also discuss how Ambainis et al. designed the strange functions f that achieve the separations (though their paper already does a good job of explaining it). For now, I’ll just write the stuff that’s easier to write.

I’m at the Federated Computing Research Conference in Portland, Oregon right now, where yesterday I gave my STOC talk (click here for the PowerPoint slides) about the largest possible separations between R(f) and Q(f) for partial Boolean functions f. (That paper is also joint work with Andris Ambainis, who has his fingers in many pies, or his queries in many oracles, or something.) Anyway, when I did a practice run of my talk on Monday night, I commented that, of course, for total Boolean functions f (those not involving a promise), the largest known gap between R(f) and Q(f) is quadratic, and is achieved when f is the OR function because of Grover’s algorithm.

Then, Tuesday morning, an hour before I was to give my talk, I saw the Ambainis et al. bombshell, which made that comment obsolete. So, being notoriously bad at keeping my mouth shut, I mentioned to my audience that, while it was great that they came all the way to Portland to learn what was new in theoretical computer science, if they wanted real news in the subfield I was talking about, they could stop listening to me and check their laptops.

(Having said that, I have had a wonderful time at FCRC, and have learned lots of other interesting things—I can do another addendum to the post about FCRC highlights if people want me to.)

Anyway, within the tiny world of query complexity—i.e., the world where I cut my teeth and spent much of my career—the Ambainis et al. paper is sufficiently revolutionary that I feel the need to say what it doesn’t do.

First, the paper does not give a better-than-quadratic gap between R(f) and Q(f) (i.e., between bounded-error randomized and quantum query complexities). The quantum algorithms that compute their functions f are still “just” variants of the old standbys, Grover’s algorithm and amplitude amplification. What’s new is that the authors have found functions where you can get the quadratic, Grover speedup between R(f) and Q(f), while also getting asymptotic gaps between D(f) and R(f), and between R0(f) and R(f). So, putting it together, you get superquadratic gaps between D(f) and Q(f), and between R0(f) and Q(f). But it remains at least a plausible conjecture that R(f)=O(Q(f)2) for all total Boolean functions f—i.e., if you insist on a “fair comparison,” then the largest known quantum speedup for total Boolean functions remains the Grover one.

Second, as far as I can tell (I might be mistaken) (I’m not), the paper doesn’t give new separations involving certificate complexity or block sensitivity (e.g., between D(f) and bs(f)). So for example, it remains open whether D(f)=O(bs(f)2), and whether C(f)=O(bs(f)α) for some α<2. (Update: Avishay Tal, in the comments, informs me that the latter conjecture was falsified by Gilmer, Saks, and Srinivasan in 2013. Wow, I’m really out of it!)

In the end, achieving these separations didn’t require any sophisticated new mathematical machinery—just finding the right functions, something that could’ve been done back in 1998, had anyone been clever enough. So, where did these bizarre functions f come from? Ambainis et al. directly adapted them from a great recent communication complexity paper by Mika Göös, Toniann Pitassi, and Thomas Watson. But the Göös et al. paper itself could’ve been written much earlier. It’s yet another example of something I’ve seen again and again in this business, how there’s no substitute for just playing around with a bunch of examples.

The highest compliment one researcher can pay another is, “I wish I’d found that myself.” And I do, of course, but having missed it, I’m thrilled that at least I get to be alive for it and blog about it. Huge congratulations to the authors!


Addendum: What’s this about?

OK, so let’s say you have a Boolean function f:{0,1}n→{0,1}, mapping n input bits to 1 output bit. Some examples are the OR function, which outputs 1 if any of the n input bits are 1, and the MAJORITY function, which outputs 1 if the majority of them are.

Query complexity is the study of how many input bits you need to read in order to learn the value of the output bit. So for example, in evaluating the OR function, if you found a single input bit that was 1, you could stop right there: you’d know that the output was 1, without even needing to look at the remaining bits. In the worst case, however, if the input consisted of all 0s, you’d have to look at all of them before you could be totally sure the output was 0. So we say that the OR function has a deterministic query complexity of n.

In this game, we don’t care about any other resources used by an algorithm, like memory or running time: just how many bits of the input it looks at! There are many reasons why, but the simplest is that, unlike with memory or running time, for many functions we can actually figure out how many input bits need to be looked at, without needing to solve anything like P vs. NP. (But note that this can already be nontrivial! For algorithms can often cleverly avoid looking at all the bits, for example by looking at some and then deciding which ones to look at next based on which values they see.)

In general, given a deterministic algorithm A and an n-bit input string x, let DA,x (an integer from 0 to n) be the number of bits of x that A examines when you run it. Then let DA be the maximum of DA,x over all n-bit strings x. Then D(f), or the deterministic query complexity of f, is the minimum of DA, over all algorithms A that correctly evaluate f(x) on every input x.

For example, D(OR) and D(MAJORITY) are both n: in the worst case, you need to read everything. For a more interesting example, consider the 3-bit Boolean function

f(x,y,z) = (not(x) and y) or (x and z).

This function has D(f)=2, even though it depends on all 3 of the input bits. (Do you see why?) In general, even if f depends on n input bits, D(f) could be as small as log2n.

The bounded-error randomized query complexity, or R(f), is like D(f), except that now we allow the algorithm to make random choices of which input bit to query, and for each input x, the algorithm only needs to compute f(x) with probability 2/3. (Here the choice of 2/3 is arbitrary; if you wanted the right answer with some larger constant probability, say 99.9%, you could just repeat the algorithm a constant number of times and take a majority vote.) The zero-error randomized query complexity, or R0(f), is the variant where the algorithm is allowed to make random choices, but at the end of the day, needs to output the correct f(x) with probability 1.

To illustrate these concepts, consider the three-bit majority function, MAJ(x,y,z). We have D(MAJ)=3, since if a deterministic algorithm queried one bit and got a 0 and queried a second bit and got a 1 (as can happen), it would have no choice but to query the third bit. But for any possible setting of x, y, and z, if we choose which bits to query randomly, there’s at least a 1/3 chance that the first two queries will return either two 0s or two 1s—at which point we can stop, with no need to query the third bit. Hence R0(MAJ)≤(1/3)2+(2/3)3=8/3 (in fact it equals 8/3, although we haven’t quite shown that). Meanwhile, R(MAJ), as we defined it, is only 1, since if you just need a 2/3 probability of being correct, you can simply pick x, y, or z at random and output it!

The bounded-error quantum query complexity, or Q(f), is the minimum number of queries made by a quantum algorithm for f, which, again, has to output the right answer with probability at least 2/3 for every input x. Here a quantum algorithm makes a “query” by feeding a superposition of basis states, each of the form |i,a,w〉, to a “black box,” which maps each basis state to |i, a XOR xi, w〉, where i is the index of the input bit xi to be queried, a is a 1-qubit “answer register” into which xi is reversibly written, and w is a “workspace” that doesn’t participate in the query. In between two queries, the algorithm can apply any unitary transformation it wants to the superposition of |i,a,w〉’s, as long as it doesn’t depend on x. Finally, some designated qubit is measured to decide whether the algorithm accepts or rejects.

As an example, consider the 2-bit XOR function, XOR(x,y). We have D(XOR)=R0(XOR)=R(XOR)=2, since until you’ve queried both bits, you’ve learned nothing about their XOR. By contrast, Q(XOR)=1, because of the famous Deutsch-Jozsa algorithm.

It’s clear that

0 ≤ Q(f) ≤ R(f) ≤ R0(f) ≤ D(f) ≤ n,

since a quantum algorithm can simulate a randomized one and a randomized one can simulate a deterministic one.

A central question for the field, since these measures were studied in the 1980s or so, has been how far apart these measures can get from each other. If you allow partial Boolean functions—meaning that only some n-bit strings, not all of them, are “valid inputs” for which the algorithm needs to return a definite answer—then it’s easy to get enormous separations between any two of the measures (indeed, even bigger than exponential), as for example in my recent paper with Andris.

For total functions, by contrast, it’s been known for a long time that these measures can differ by at most polynomial factors:

D(f) = O(R(f)3) (Nisan)

D(f) = O(R0(f)2) (folklore, I think)

R0(f) = O(R2 log(n)) (Midrijanis)

D(f) = O(Q(f)6) (Beals et al. 1998)

OK, so what were the largest known gaps? For D versus R0 (as well as D versus R), the largest known gap since 1986 has come from the “recursive AND/OR tree”: that is, an OR of two ANDs of two ORs of two ANDs of … forming a complete binary tree of depth d, with the n=2d input variables comprising the leaves. For this function, we have D(f)=n, whereas Saks and Wigderson showed that R0(f)=Θ(n0.753) (and later, Santha showed that R(f)=Θ(n0.753) as well).

For D versus Q, the largest gap has been for the OR function: we have D(OR)=n (as mentioned earlier), but Q(OR)=Θ(√n) because of Grover’s algorithm. Finally, for R0 versus R, no asymptotic gap has been known for any total function. (This is a problem that I clearly remember working on back in 2000, when I was an undergrad. I even wrote a computer program, the Boolean Function Wizard, partly to search for separations between R0 versus R. Alas, while I did find one or two functions with separations, I was unable to conclude anything from them about asymptotics.)

So, how did Ambainis et al. achieve bigger gaps for each of these? I’ll try to have an explanation written by the time my flight from Portland to Boston has landed tonight. But if you can’t wait for that, or you prefer it straight from the horse’s mouth, read their paper!


Addendum 2: The Actual Functions

As I mentioned before, the starting point for everything Ambainis et al. do is a certain Boolean function g recently constructed by Göös, Pitassi, and Watson (henceforth GPW), for different purposes than the ones that concern Ambainis et al. We think of the inputs to g as divided into nm “cells,” which are arranged in a rectangular grid with m columns and n rows. Each cell contains a bit that’s either 0 or 1 (its “label), as well as a pointer to another cell (consisting of ~log2(nm) bits). The pointer can also be “null” (i.e., can point nowhere). We’ll imagine that a query of a cell gives you everything: the label and all the bits of the pointer. This could increase the query complexity of an algorithm, but only by a log(n) factor, which we won’t worry about.

Let X be a setting of all the labels and pointers in the grid. Then the question we ask about X is the following:

    Does there exist a “marked column”: that is, a column where all n of the labels are 1, and which has exactly one non-null pointer, which begins a chain of pointers of length m-1, which visits exactly one “0” cell in each column other than the marked column, and then terminates at a null pointer?

If such a marked column exists, then we set g(X)=1; otherwise we set g(X)=0. Crucially, notice that if a marked column exists, then it’s unique, since the chain of pointers “zeroes out” all m-1 of the other columns, and prevents them from being marked.

This g already leads to a new query complexity separation, one that refutes a strengthened form of the Saks-Wigderson conjecture. For it’s not hard to see that D(g)=Ω(mn): indeed, any deterministic algorithm must query almost all of the cells. A variant of this is proved in the paper, but the basic idea is that an adversary can answer all queries with giant fields of ‘1’ labels and null pointers—until a given column is almost completed, at which point the adversary fills in the last cell with a ‘0’ label and a pointer to the last ‘0’ cell that it filled in. The algorithm just can’t catch a break; it will need to fill in m-1 columns before it knows where the marked one is (if a marked column exists at all).

By contrast, it’s possible to show that, if n=m, then R(g) is about O(n4/3). I had an argument for R(g)=O((n+m)log(m)) in an earlier version of this post, but the argument was wrong; I thank Alexander Belov for catching the error. I’ll post the R(g)=O(n4/3) argument once I understand it.

To get the other separations—for example, total Boolean functions for which D~R02, D~Q4, R0~Q3, R0~R3/2, and R~approxdeg4—Ambainis et al. need to add various “enhancements” to the basic GPW function g defined above. There are three enhancements, which can either be added individually or combined, depending on one’s needs.

1. Instead of just a single marked column, we can define g(X) to be 1 if and only if there are k marked columns, which point to each other in a cycle, and which also point to a trail of m-k ‘0’ cells, showing that none of the other columns contain all ‘1’ cells. This can help a bounded-error randomized algorithm—which can quickly find one of the all-1 columns using random sampling—while not much helping a zero-error randomized algorithm.

2. Instead of a linear chain of pointers showing that all the non-marked columns contain a ‘0’ cell, for g(X) to be 1 we can demand a complete binary tree of pointers, originating at a marked column and fanning out to all the unmarked columns in only log(m) layers. This can substantially help a quantum algorithm, which can’t follow a pointer trail any faster than a classical algorithm can; but which, given a complete binary tree, can “fan out” and run Grover’s algorithm on all the leaves in only the square root of the number of queries that would be needed classically. Meanwhile, however, putting the pointers in a tree doesn’t much help deterministic or randomized algorithms.

3. In addition to pointers “fanning out” from a marked column to all of the unmarked columns, we can demand that in every unmarked column, some ‘0’ cell contains a back-pointer, which leads back to a marked column. These back-pointers can help a randomized or quantum algorithm find a marked column faster, while not much helping a deterministic algorithm.

Unless I’m mistaken, the situation is this:

With no enhancements, you can get D~R2 and something like D~R03/2 (although I still don’t understand how you get the latter with no enhancements; the paper mentions it without proof Andris has kindly supplied a proof here).

With only the cycle enhancement, you can get R0~R3/2.

With only the binary tree enhancement, you can get R~approxdeg4.

With only the back-pointer enhancement, you can get D~R02.

With the cycle enhancement and the binary-tree enhancement, you can get R0~Q3.

With the back-pointer enhancement and the binary-tree enhancement, you can get D~Q4.

It’s an interesting question whether there are separations that require both the cycle enhancement and the back-pointer enhancement; Ambainis et al. don’t give any examples.

And here’s another interesting question not mentioned in the paper. Using the binary-tree enhancement, Ambainis et al. achieve a fourth-power separation between bounded-error randomized query complexity and approximate degree as a real polynomial—i.e., quadratically better than any separation that was known before. Their proof of this involves cleverly constructing a low-degree polynomial by summing a bunch of low-degree polynomials derived from quantum algorithms (one for each possible marked row). As a result, their final, summed polynomial does not itself correspond to a quantum algorithm, meaning that they don’t get a fourth-power separation between R and Q (which would’ve been even more spectacular than what they do get). On the other hand, purely from the existence of a function with R~approxdeg4, we can deduce that that function has either

(i) a super-quadratic gap between R and Q (refuting my conjecture that the Grover speedup is the best possible quantum speedup for total Boolean functions), or
(ii) a quadratic gap between quantum query complexity and approximate degree—substantially improving over the gap found by Ambainis in 2003.

I conjecture that the truth is (ii); it would be great to have a proof or disproof of this.

NSA in P/poly: The Power of Precomputation

Friday, May 22nd, 2015

Even after the Snowden revelations, there remained at least one big mystery about what the NSA was doing and how.  The NSA’s classified 2013 budget request mentioned, as a priority item, “groundbreaking cryptanalytic capabilities to defeat adversarial cryptography and exploit internet traffic.”  There was a requested increase, of several hundred million dollars, for “cryptanalytic IT services” and “cryptanalysis and exploitation services program C” (whatever that was).  And a classified presentation slide showed encrypted data being passed to a high-performance computing system called “TURMOIL,” and decrypts coming out.  But whatever was going on inside TURMOIL seemed to be secret even within NSA; someone at Snowden’s level wouldn’t have had access to the details.

So, what was (or is) inside the NSA’s cryptanalytic black box?  A quantum computer?  Maybe even one that they bought from D-Wave?  (Rimshot.)  A fast classical factoring algorithm?  A proof of P=NP?  Commentators on the Internet rushed to suggest each of these far-reaching possibilities.  Some of us tried to pour cold water on these speculations—pointing out that one could envision many scenarios that were a little more prosaic, a little more tied to the details of how public-key crypto is actually used in the real world.  Were we just naïve?

This week, a new bombshell 14-author paper (see also the website) advances an exceedingly plausible hypothesis about what may have been the NSA’s greatest cryptanalytic secret of recent years.  One of the authors is J. Alex Halderman of the University of Michigan, my best friend since junior high school, who I’ve blogged about before.  Because of that, I had some advance knowledge of this scoop, and found myself having to do what regular Shtetl-Optimized readers will know is the single hardest thing in the world for me: bite my tongue and not say anything.  Until now, that is.

Besides Alex, the other authors are David Adrian, Karthikeyan Bhargavan, Zakir Durumeric, Pierrick Gaudry, Matthew Green, Nadia Heninger, Drew Springall, Emmanuel Thomé, Luke Valenta, Benjamin VanderSloot, Eric Wustrow, Santiago Zanella-Béguelink, and Paul Zimmermann (two of these, Green and Heninger, have previously turned up on Shtetl-Optimized).

These authors study vulnerabilities in Diffie-Hellman key exchange, the “original” (but still widely-used) public-key cryptosystem, the one that predates even RSA.  Diffie-Hellman is the thing where Alice and Bob first agree on a huge prime number p and a number g, then Alice picks a secret a and sends Bob ga (mod p), and Bob picks a secret b and sends Alice gb (mod p), and then Alice and Bob can both compute (ga)b=(gb)a=gab (mod p), but an eavesdropper who’s listening in only knows p, g, ga (mod p), and gb (mod p), and one can plausibly conjecture that it’s hard from those things alone to get gab (mod p).  So then Alice and Bob share a secret unknown to the eavesdropper, which they didn’t before, and they can use that secret to start doing cryptography.

As far as anyone knows today, the best way to break Diffie-Hellman is simply by calculating discrete logarithms: that is, solving the problem of recovering a given only g and h=ga (mod p).  At least on a classical computer, the fastest known algorithm for discrete logarithms (over fields of prime order) is the number field sieve (NFS).  Under plausible conjectures about the distribution of “smooth” numbers, NFS uses time that grows like exp((1.923+o(1))(log p)1/3(log log p)2/3), where the exp and logs are base e (and yes, even the lower-order stuff like (log log p)2/3 makes a big difference in practice).  Of course, once you know the running time of the best-known algorithm, you can then try to choose a key size (that is, a value of log(p)) that’s out of reach for that algorithm on the computing hardware of today.

(Note that the recent breakthrough of Antoine Joux, solving discrete log in quasipolynomial time in fields of small characteristic, also relied heavily on sieving ideas.  But there are no improvements from this yet for the “original” discrete log problem, over prime fields.)

But there’s one crucial further fact, which has been understood for at least a decade by theoretical cryptographers, but somehow was slow to filter out to the people who deploy practical cryptosystems.  The further fact is that in NFS, you can arrange things so that almost all the discrete-logging effort depends only on the prime number p, and not at all on the specific numbers g and h for which you’re trying to take the discrete log.  After this initial “precomputation” step, you then have a massive database that you can use to speed up the “descent” step: the step of solving ga=h (mod p), for any (g,h) pair that you want.

It’s a little like the complexity class P/poly, where a single, hard-to-compute “advice string” unlocks exponentially many inputs once you have it.  (Or a bit more precisely, one could say that NFS reveals that exponentiation modulo a prime number is sort of a trapdoor one-way function, except that the trapdoor information is subexponential-size, and given the trapdoor, inverting the function is still subexponential-time, but a milder subexponential than before.)

The kicker is that, in practice, a large percentage of all clients and servers that use Diffie-Hellman key exchange use the same few prime numbers p.  This means that, if you wanted to decrypt a large fraction of all the traffic encrypted with Diffie-Hellman, you wouldn’t need to do NFS over and over: you could just do it for a few p‘s and cache the results.  This fact can singlehandedly change the outlook for breaking Diffie-Hellman.

The story is different depending on the key size, log(p).  In the 1990s, the US government insisted on “export-grade” cryptography for products sold overseas (what a quaint concept!), which meant that the key size was restricted to 512 bits.  For 512-bit keys, Adrian et al. were able to implement NFS and use it to do the precomputation step in about 7 days on a cluster with a few thousand cores.  After this initial precomputation step (which produced 2.5GB of data), doing the descent, to find the discrete log for a specific (g,h) pair, took only about 90 seconds on a 24-core machine.

OK, but no one still uses 512-bit keys, do they?  The first part of Adrian et al.’s paper demonstrates that, because of implementation issues, even today you can force many servers to “downgrade” to the 512-bit, export-grade keys—and then, having done so, you can stall for time for 90 seconds as you figure out the session key, and then do a man-in-the-middle attack and take over and impersonate the server.  It’s an impressive example of the sort of game computer security researchers have been playing for a long time—but it’s really just a warmup to the main act.

As you’d expect, many servers today are configured more intelligently, and will only agree to 1024-bit keys.  But even there, Adrian et al. found that a large fraction of servers rely on just a single 1024-bit prime (!), and many of the ones that don’t rely on just a few other primes.  Adrian et al. estimate that, for a single 1024-bit prime, doing the NFS precomputation would take about 45 million years using a single core—or to put it more ominously, 1 year using 45 million cores.  If you built special-purpose hardware, that could go down by almost two orders of magnitude, putting the monetary cost at a few hundred million dollars, completely within the reach of a sufficiently determined nation-state.  Once the precomputation was done, and the terabytes of output stored in a data center somewhere, computing a particular discrete log would then take about 30 days using 1 core, or mere minutes using a supercomputer.  Once again, none of this is assuming any algorithmic advances beyond what’s publicly known.  (Of course, it’s possible that the NSA also has some algorithmic advances; even modest ones could obviate the need for special-purpose hardware.)

While writing this post, I did my own back-of-the-envelope, and got that using NFS, calculating a 1024-bit discrete log should be about 7.5 million times harder than calculating a 512-bit discrete log.  So, extrapolating from the 7 days it took Adrian et al. to do it for 512 bits, this suggests that it might’ve taken them about 143,840 years to calculate 1024-bit discrete logs with the few thousand cores they had, or 1 year if they had 143,840 times as many cores (since almost all this stuff is extremely parallelizable).  Adrian et al. mention optimizations that they expect would improve this by a factor of 3, giving us about 100 million core-years, very similar to Adrian et al.’s estimate of 45 million core-years (the lower-order terms in the running time of NFS might account for some of the remaining discrepancy).

Adrian et al. mount a detailed argument in their paper that all of the details about NSA’s “groundbreaking cryptanalytic capabilities” that we learned from the Snowden documents match what would be true if the NSA were doing something like the above.  The way Alex put it to me is that, sure, the NSA might not have been doing this, but if not, then he would like to understand why not—for it would’ve been completely feasible within the cryptanalytic budget they had, and the NSA would’ve known that, and it would’ve been a very good codebreaking value for the money.

Now that we know about this weakness of Diffie-Hellman key exchange, what can be done?

The most obvious solution—but a good one!—is just to use longer keys.  For decades, when applied cryptographers would announce some attack like this, theorists like me would say with exasperation: “dude, why don’t you fix all these problems in one stroke by just, like, increasing the key sizes by a factor of 10?  when it’s an exponential against a polynomial, we all know the exponential will win eventually, so why not just go out to where it does?”  The applied cryptographers explain to us, with equal exasperation in their voices, that there are all sorts of reasons why not, from efficiency to (maybe the biggest thing) backwards-compatibility.  You can’t unilaterally demand 2048-bit keys, if millions of your customers are using browsers that only understand 1024-bit keys.  On the other hand, given the new revelations, it looks like there really will be a big push to migrate to larger key sizes, as the theorists would’ve suggested from their ivory towers.

A second, equally-obvious solution is to stop relying so much on the same few prime numbers in Diffie-Hellman key exchange.  (Note that the reason RSA isn’t vulnerable to this particular attack is that it inherently requires a different composite number N for each public key.)  In practice, generating a new huge random prime number tends to be expensive—taking, say, a few minutes—which is why people so often rely on “standard” primes.  At the least, we could use libraries of millions of “safe” primes, from which a prime for a given session is chosen randomly.

A third solution is to migrate to elliptic-curve cryptography (ECC), which as far as anyone knows today, is much less vulnerable to descent attacks than the original Diffie-Hellman scheme.  Alas, there’s been a lot of understandable distrust of ECC after the DUAL_EC_DBRG scandal, in which it came out that the NSA backdoored some of NIST’s elliptic-curve-based pseudorandom generators by choosing particular parameters that it knew how handle.  But maybe the right lesson to draw is mod-p groups and elliptic-curve groups both seem to be pretty good for cryptography, but the mod-p groups are less good if everyone is using the same few prime numbers p (and those primes are “within nation-state range”), and the elliptic-curve groups are less good if everyone is using the same few parameters.  (A lot of these things do seem pretty predictable with hindsight, but how many did you predict?)

Many people will use this paper to ask political questions, like: hasn’t the NSA’s codebreaking mission once again usurped its mission to ensure the nation’s information security?  Doesn’t the 512-bit vulnerability that many Diffie-Hellman implementations still face, as a holdover from the 1990s export rules, illustrate why encryption should never be deliberately weakened for purposes of “national security”?  How can we get over the issue of backwards-compatibility, and get everyone using strong crypto?  People absolutely should be asking such questions.

But for readers of this blog, there’s one question that probably looms even larger than those of freedom versus security, openness versus secrecy, etc.: namely, the question of theory versus practice.  Which “side” should be said to have “won” this round?  Some will say: those useless theoretical cryptographers, they didn’t even know how their coveted Diffie-Hellman system could be broken in the real world!  The theoretical cryptographers might reply: of course we knew about the ability to do precomputation with NFS!  This wasn’t some NSA secret; it’s something we discussed openly for years.  And if someone told us how Diffie-Hellman was actually being used (with much of the world relying on the same few primes), we could’ve immediately spotted the potential for such an attack.  To which others might reply: then why didn’t you spot it?

Perhaps the right lesson to draw is how silly such debates really are.  In the end, piecing this story together took a team that was willing to do everything from learning some fairly difficult number theory to coding up simulations to poring over the Snowden documents for clues about the NSA’s budget.  Clear thought doesn’t respect the boundaries between disciplines, or between theory and practice.

(Thanks very much to Nadia Heninger and Neal Koblitz for reading this post and correcting a few errors in it.  For more about this, see Bruce Schneier’s post or Matt Green’s post.)

Five announcements

Saturday, May 16th, 2015

1. Sanjeev Arora sent me a heads-up that there’s a discussion about the future of the STOC conference  at the Windows on Theory blog—in particular, about the idea of turning STOC into a longer “CS theory festival.”  If you have opinions about this, don’t miss the chance to make your voice heard.

2. Back in January, I blogged about a new quantum optimization algorithm by Farhi, Goldstone, and Gutmann, which was notable for being, as far as anyone could tell, the first quantum algorithm to achieve a provably better approximation ratio than the best-known classical algorithm for an NP-hard optimization problem.  Today, I report that a fearsome list of authors—Boaz Barak, Ankur Moitra, Ryan O’Donnell, Prasad Raghavendra, Oded Regev, David Steurer, Luca Trevisan, Aravindan Vijayaraghavan, David Witmer, and John Wright—has put out an eagerly-awaited paper that gives a classical algorithm for the same problem, with better performance than the quantum algorithm’s.  (They write that this “improves both qualitatively and quantitatively” on Farhi et al.’s work; I assume “qualitatively” refers to the fact that the new algorithm is classical.)  What happened, apparently, is that after I blogged (with enthusiasm) about the Farhi et al. result, a bunch of classical complexity theorists read my post and decided independently that they could match or beat the quantum algorithm’s performance classically; then they found out about each other and decided to merge their efforts.  I’m proud to say that this isn’t the first example of this blog catalyzing actual research progress, though it’s probably the best example so far.  [Update: Luca Trevisan now has a great post explaining what happened in much more detail, entitled “How Many Theoreticians Does It Take to Approximate Max 3Lin?”]

Another update: Farhi et al. have posted a new version of their paper, in which they can almost match the performance of the classical algorithm using their quantum algorithm.

3. Jennifer Ouellette has a wonderful article in Quanta magazine about recent progress in AdS/MERA (i.e., “the emergence of spacetime from entanglement”), centered around the ideas of Brian Swingle.  This is one of the main things that I’d love to understand better right now—if I succeed even partially, you’ll know because I’ll write a blog post trying to explain it to others.  See also this blog post by Sean Carroll (about this paper by Ning Bao et al.), and this paper by Pastawski, Yoshida, Harlow, and Preskill, which explicitly mines the AdS/CFT correspondence for examples of quantum error-correcting codes.

4. Celebrity rationalist Julia Galef, who I had the great honor of meeting recently, has a podcast interview with Sean Carroll about why Carroll accepts the many-worlds interpretation.  (Or if, like me, you prefer the written word to the spoken one, click here for a full transcript.)  Unfortunately, Sean is given the opportunity at the end of the interview to recommend one science book to his listeners—just one!—but he squanders it by plugging some weird, self-indulgent thing called Quantum Computing Since Democritus.  Julia also has a YouTube video about what she learned from the interview, but I haven’t yet watched it (is there a transcript?).

5. I came across an insightful if meandering essay about nerd culture by Meredith L. Patterson.  In particular, noticing how the term “nerd” has been co-opted by normal, socially-skilled people, who’ve quickly set about remaking nerd social norms to make them identical to the rest of the world’s norms, Patterson coins the term “weird-nerd” to describe people like herself, who are still nerds in the original sense and who don’t see nerd culture as something horribly, irreparably broken.  As she writes: “We’ll start to feel less defensive when we get some indication — any indication — that our critics understand what parts of our culture we don’t want to lose and why we don’t want to lose them.”  (But is this the start of a linguistic treadmill?  Will we eventually need to talk about weird-weird-nerds, etc.?)

Two papers

Tuesday, April 21st, 2015

Just to get myself back into the habit of blogging:

For those of you who don’t read Lance’s and Bill’s blog, there was a pretty significant breakthrough in complexity theory announced last week.  (And yes, I’m now spending one of the two or so uses of the word “breakthrough” that I allow myself per year—wait, did I just spend the second one with this sentence?)  Ben Rossman (a former MIT PhD student whose thesis committee I was honored to serve on), Rocco Servedio, and Li-Yang Tan have now shown that the polynomial hierarchy is infinite relative to a random oracle, thereby solving the main open problem from Johan Håstad’s 1986 PhD thesis.  While it feels silly even to mention it, the best previous result in this direction was to separate PNP from Σ2P relative to a random oracle, which I did in my Counterexample to the Generalized Linial-Nisan Conjecture paper.  In some sense Rossman et al. infinitely improve on that (using completely different techniques).  Proving their result boils down to proving a new lower bound on the sizes of constant-depth circuits.  Basically, they need to show that, for every k, there are problems that can be solved by small circuits with k layers of AND, OR, and NOT gates, but for which the answer can’t even be guessed, noticeably better than chance, by any small circuit with only k-1 layers of AND, OR, and NOT gates.  They achieve that using a new generalization of the method of random restrictions.  Congratulations to Ben, Rocco, and Li-Yang!

Meanwhile, if you want to know what I’ve been doing for the last couple months, one answer is contained in this 68-page labor of love preprint by me and my superb PhD students Daniel Grier and Luke Schaeffer.  There we give a full classification of all possible sets of classical reversible gates acting on bits (like the Fredkin, Toffoli, and CNOT gates), as well as a linear-time algorithm to decide whether one reversible gate generates another one (previously, that problem wasn’t even known to be decidable).  We thereby completely answer a question that basically no one was asking, although I don’t understand why not.

Memrefuting

Wednesday, February 11th, 2015

(in which I bring this blog back to the “safe, uncontroversial” territory of arguing with people who think they can solve NP-complete problems in polynomial time)

A few people have asked my opinion about “memcomputing”: a computing paradigm that’s being advertised, by its developers, as a way to solve NP-complete problems in polynomial time.  According to the paper Memcomputing NP-complete problems in polynomial time using polynomial resources and collective states, memcomputing “is based on the brain-like notion that one can process and store information within the same units (memprocessors) by means of their mutual interactions.”  The authors are explicit that, in their view, this idea allows the Subset Sum problem to be solved with polynomial resources, by exploring all 2n possible subsets in parallel, and that this refutes the Extended Church-Turing Thesis.  They’ve actually built ‘memcomputers’ that solve small instances of Subset Sum, and they hope to scale them up, though they mention hardware limitations that have made doing so difficult—more about that later.

A bunch of people (on Hacker News, Reddit, and elsewhere) tried to explain the problems with the Subset Sum claim when the above preprint was posted to the arXiv last year.  However, an overlapping set of authors has now simply repeated the claim, unmodified, in a feature article in this month’s Scientific American.  Unfortunately the SciAm article is behind a paywall, but here’s the relevant passage:

Memcomputing really shows advantages when applied to one of the most difficult types of problems we know of in computer science: calculating all the properties of a large series of integers. This is the kind of challenge a computer faces when trying to decipher complex codes. For instance, give the computer 100 integers and then ask it to find at least one subset that adds up to zero. The computer would have to check all possible subsets and then sum all numbers in each subset. It would plow through each possible combination, one by one, which is an exponentially huge increase in processing time. If checking 10 integers took one second, 100 integers would take 1027 seconds—millions of trillions of years … [in contrast,] a memcomputer can calculate all subsets and sums in just one step, in true parallel fashion, because it does not have to shuttle them back and forth to a processor (or several processors) in a series of sequential steps. The single-step approach would take just a single second.

For those tuning in from home: in the Subset Sum problem, we’re given n integers a1,…,an, and we want to know whether there exists a subset of them that sums to a target integer k.  (To avoid trivializing the problem, either k should be nonzero or else the subset should be required to be nonempty, a mistake in the passage quoted above.)

To solve Subset Sum in polynomial time, the basic idea of “memcomputing” is to generate waves at frequencies that encode the sums of all possible subsets of ai‘s, and then measure the resulting signal to see if there’s a frequency there that corresponds to k.

Alas, there’s a clear scalability problem that seems to me to completely kill this proposal, as a practical way of solving NP-complete problems.  The problem is that the signal being measured is (in principle!) a sum of waves of exponentially many different frequencies.  By measuring this wave and taking a Fourier transform, one will not be able to make out the individual frequencies until one has monitored the signal for an exponential amount of time.  There are actually two issues here:

(1) Even if there were just a single frequency, measuring the frequency to exponential precision will take exponential time. This can be easily seen by contemplating even a moderately large n.  Thus, suppose n=1000.  Then we would need to measure a frequency to a precision of one part in ~21000. If the lowest frequency were (say) 1Hz, then we would be trying to distinguish frequencies that differ by far less than the Planck scale.  But distinguishing frequencies that close would require so much energy that one would exceed the Schwarzschild limit and create a black hole!  The alternative is to make the lowest frequency slower than the lifetime of the universe, causing an exponential blowup in the amount of time we need to run the experiment.

(2) Because there are exponentially many frequencies, the amplitude of each frequency will get attenuated by an exponential amount.  Again, suppose that n=1000, so that we’re talking about attenuation by a ~2-1000 factor.  Then given any amount of input radiation that could be gathered in physical universe, the expected amount of amplitude on each frequency would correspond to a microscopically small fraction of 1 photon — so again, it would take exponential time for us to notice any radiation at all on the frequency that interests us (unless we used an insensitive test that was liable to confuse that frequency with many other nearby frequencies).

What do the authors have to say about these issues?  Here are the key passages from the above-mentioned paper:

all frequencies involved in the collective state (1) are dampened by the factor 2-n.  In the case of the ideal machine, i.e., a noiseless machine, this would not represent an issue because no information is lost.  On the contrary, when noise is accounted for, the exponential factor represents the hardest limitation of the experimentally fabricated machine, which we reiterate is a technological limit for this particular realization of a memcomputing machine but not for all of them …

In conclusion we have demonstrated experimentally a deterministic memcomputing machine that is able to solve an NP-complete problem in polynomial time (actually in one step) using only polynomial resources.  The actual machine we built clearly suffers from technological limitations due to unavoidable noise that impair [sic] the scalability.  This issue can, however, be overcome in other UMMs [universal memcomputing machines] using other ways to encode such information.

The trouble is that no other way to encode such information is ever mentioned.  And that’s not an accident: as explained above, when n becomes even moderately large, this is no longer a hardware issue; it’s a fundamental physics issue.

It’s important to realize that the idea of solving NP-complete problems in polynomial time using an analog device is far from new: computer scientists discussed such ideas extensively in the 1960s and 1970s.  Indeed, the whole point of my NP-complete Problems and Physical Reality paper was to survey the history of such attempts, and (hopefully!) to serve as a prophylactic against people making more such attempts without understanding the history.  For computer scientists ultimately came to realize that all proposals along these lines simply “smuggle the exponentiality” somewhere that isn’t being explicitly considered, exactly like all proposals for perpetual-motion machines smuggle the entropy increase somewhere that isn’t being explicitly considered.  The problem isn’t a practical one; it’s one of principle.  And I find it unfortunate that the recent memcomputing papers show no awareness of this story.

(Incidentally, quantum computing is interesting precisely because, out of all “post-Extended-Church-Turing” computing proposals, it’s the only one for which we can’t articulate a clear physical reason why it won’t scale, analogous to the reasons given above for memcomputing.  With quantum computing the tables are turned, with the skeptics forced to handwave about present-day practicalities, while the proponents wield the sharp steel of accepted physical law.  But as readers of this blog well know, quantum computing doesn’t seem to promise the polynomial-time solution of NP-complete problems, only of more specialized problems.)

Quantum Machine Learning Algorithms: Read the Fine Print

Monday, February 2nd, 2015

So, I’ve written a 4-page essay of that title, which examines the recent spate of quantum algorithms for clustering, classification, support vector machines, and other “Big Data” problems that grew out of a 2008 breakthrough on solving linear systems by Harrow, Hassidim, and Lloyd, as well as the challenges in applying these algorithms to get genuine exponential speedups over the best classical algorithms.  An edited version of the essay will be published as a Commentary in Nature Physics.  Thanks so much to Iulia Georgescu at Nature for suggesting that I write this.

Update (April 4, 2015): The piece has now been published.

Quantum computing news items (by reader request)

Monday, January 12th, 2015

Within the last couple months, there was a major milestone in the quest to build a scalable quantum computer, and also a major milestone in the quest to figure out what you would do with a quantum computer if you had one.  As I’ve admitted many times, neither of those two quests is really the reason why I got into quantum computing—I’m one of the people who would still want to study this field, even if there were no serious prospect either of building a quantum computer or of doing anything useful with it for a thousand years—but for some reason that I don’t fully understand, both of those goals do seem to excite other people.

So, OK, the experimental breakthrough was the Martinis group’s use of quantum error-correction with superconducting qubits, to preserve a logical bit for several times longer than the underlying physical qubits survived for.  Shortly before this came out, I heard Krysta Svore give a talk at Yale in which she argued that preserving a logical qubit for longer than the physical qubits was the next experimental milestone (the fourth, out of seven she listed) along the way to a scalable, fault-tolerant quantum computer.  Well, it looks like that milestone may have been crossed.  (update: I’ve since learned from Graeme Smith, in the comments section, that the milestone crossed should really be considered the “3.5th,” since even though quantum error-correction was used, the information that was being protected was classical.  I also learned from commenter Jacob that the seven milestones Krysta listed came from a Science paper by Schoelkopf and Devorret.  She cited the paper; the forgetfulness was entirely mine.)

In more detail, the Martinis group used a linear array of 9 qubits: 5 data qubits interleaved with 4 measurement qubits. The authors describe this setup as a “precursor” to Kitaev’s surface code (which would involve a 2-dimensional array).  They report that, after 8 cycles of error detection and correction, they were able to suppress the effective error rate compared to the physical qubits by a factor of 8.5.  They also use quantum state tomography to verify that their qubits were indeed in entangled states as they did this.

Of course, this is not yet a demonstration of any nontrivial fault-tolerant computation, let alone of scaling such a computation up to where it’s hard to simulate with a classical computer.  But it pretty clearly lies along the “critical path” to that.

As I blogged back in September, Google recently hired Martinis’s group away from UC Santa Barbara, where they’ll work on superconducting quantum annealing, as a step along the way to full universal QC.  As I mentioned then, the Martinis group’s “Xmon” qubits have maybe 10,000 times the coherence times of D-Wave’s qubits, at least when you measure coherence in the usual ways.  The fact that Martinis et al. are carefully doing quantum state tomography and demonstrating beneficial error-correction before scaling up are further indications of the differences between their approach and D-Wave’s.  Of course, even if you do everything right, there’s still no guarantee that you’ll outperform a classical computer anytime soon: it might simply be that the things you can do in the near future (e.g., quantum annealing for NP-complete problems) are not things where you’re going to outperform the best classical algorithms.  But it’s certainly worth watching closely.

Meanwhile, the quantum algorithms breakthrough came in a paper last month by an extremely well-known trio down the Infinite Corridor from me: Farhi, Goldstone, and Gutmann.  In slightly earlier work, Farhi et al. proposed a new quantum algorithm for NP-hard optimization problems.  Their algorithm badly needs a name; right now they’re just calling it the “QAOA,” or Quantum Approximate Optimization Algorithm.  But here’s what you need to know: their new algorithm is different from their famous adiabatic algorithm, although it does become equivalent to the adiabatic algorithm in a certain infinite limit.  Rather than staying in the ground state of some Hamiltonian, the QAOA simply

  1. starts with a uniform superposition over all n-bit strings,
  2. applies a set of unitary transformations, one for each variable and constraint of the NP-hard instance,
  3. repeats the set some number of times p (the case p=1 is already interesting), and then
  4. measures the state in the computational basis to see what solution was obtained.

The unitary transformations have adjustable real parameters, and a big part of the game is figuring out how to set the parameters to get a good solution.

The original, hyper-ambitious goal of the QAOA was to solve the Unique Games problem in quantum polynomial time—thereby disproving the Unique Games Conjecture (which I previously blogged about here), unless NP⊆BQP.  It hasn’t yet succeeded at that goal.  In their earlier work, Farhi et al. managed to show that the QAOA solves the MAX-CUT problem on 3-regular graphs with approximation ratio 0.6924, which is better than random guessing, but not as good as the best-known classical algorithms (Goemans-Williamson, or for the degree-3 case, Halperin-Livnat-Zwick), let alone better than those algorithms (which is what would be needed to refute the UGC).

In their new work, Farhi et al. apply the QAOA to a different problem: the poetically-named MAX E3LIN2.  Here you’re given a collection of linear equations mod 2 in n Boolean variables, where each equation involves exactly 3 variables, and each variable appears in at most D equations.  The goal is to satisfy as many of the equations as possible, assuming that they’re not all satisfiable (if they were then the problem would be trivial).  If you just guess a solution randomly, you’ll satisfy a 1/2 fraction of the equations.  Håstad gave a polynomial-time classical algorithm that satisfies a 1/2+c/D fraction of the maximum number of satisfiable equations, for some constant c.  This remains the best approximation ratio that we know how to achieve classically.  Meanwhile, Trevisan showed that if there’s a polynomial-time classical algorithm that satisfies a 1/2+c/√D fraction of the max number of satisfiable equations, for a sufficiently large constant c, then P=NP.

OK, so what do Farhi et al. do?  They show that the QAOA, with suitably tuned parameters, is able to satisfy a 1/2+c/D3/4 fraction of the total number of equations in polynomial time, for some constant c.  (In particular, this implies that a 1/2+c/D3/4 fraction of the equations are satisfiable—assuming, as Farhi et al. do, that two equations directly contradicting each other, like x+y+z=0 and x+y+z=1, never appear in the same instance.)

Now, the above is a bigger fraction than the best-known classical algorithm satisfies!  (And not only that, but here the fraction is of the total number of equations, rather than the number of satisfiable equations.)  Farhi et al. also show that, if the constraint hypergraph doesn’t contain any small cycles, then QAOA can satisfy a 1/2+c/√D fraction of the equations in polynomial time, which is essentially the best possible unless NP⊆BQP.

The importance of this result is not that anyone cares about the MAX E3LIN2 problem for its own sake.  Rather it’s that, as far as I know, this is the first time that a quantum algorithm has been proved to achieve a better approximation ratio for a natural NP-hard optimization problem than the best known classical algorithm achieves.  People have discussed that as a hypothetical possibility for 20 years, but (again, unless I’m missing something) we never had a good example until now.  The big question now is whether the 1/2+c/D3/4 performance can be matched classically, or whether there truly is an NP-intermediate region of this optimization problem where quantum outperforms classical.  (The third possibility, that doing as well as the quantum algorithm is already NP-hard, is one that I won’t even speculate about.  For, as Boaz Barak rightly points out in the comments section, the quantum algorithm is still being analyzed only in the regime where solutions are combinatorially guaranteed to exist—and that regime can’t possibly be NP-hard, unless NP=coNP.)

[Above, I corrected some errors that appeared in the original version of this post—thanks to Ed Farhi and to the commenters for bringing them to my attention.]


Update (Feb. 3, 2015): Boaz Barak has left the following comment:

in a work with Ankur Moitra, Oded Regev, David Stuerer and Aravindan Vijayaraghavan we were able to match (in fact exceed) the guarantees of the Farhi et al paper via a classical efficient algorithm. (Namely satisfy 1/2 + C/√D fraction of the equations). p.s. we hope to post this on the arxiv soon

Quantum Complexity Theory Student Project Showcase 3

Friday, December 26th, 2014

Merry Christmas (belatedly)!  This year Quanta Claus has brought us eight fascinating final project reports from students in my 6.845 Quantum Complexity Theory class, covering everything from interactive proofs to query and communication complexity to quantum algorithms to quantum gates (and one project even includes a web-based demo you can try!).  Continuing in the tradition of the two previous showcases, I’m sharing the reports here; some of these works might also be posted to the arXiv and/or submitted to journals.  Thanks so much to the students who volunteered to participate in the showcase, and to all the students for making this such a great class.

PostBQP Postscripts: A Confession of Mathematical Errors

Sunday, November 30th, 2014

tl;dr: This post reveals two errors in one of my most-cited papers, and also explains how to fix them.  Thanks to Piotr Achinger, Michael Cohen, Greg Kuperberg, Ciaran Lee, Ryan O’Donnell, Julian Rosen, Will Sawin, Cem Say, and others for their contributions to this post.


If you look at my Wikipedia page, apparently one of the two things in the world that I’m “known for” (along with algebrization) is “quantum Turing with postselection.”  By this, Wikipedia means my 2004 definition of the complexity class PostBQP—that is, the class of decision problems solvable in bounded-error quantum polynomial time, assuming the ability to postselect (or condition) on certain measurement outcomes—and my proof that PostBQP coincides with the classical complexity PP (that is, the class of decision problems expressible in terms of whether the number of inputs that cause a given polynomial-time Turing machine to accept does or doesn’t exceed some threshold).

To explain this a bit: even without quantum mechanics, it’s pretty obvious that, if you could “postselect” on exponentially-unlikely events, then you’d get huge, unrealistic amounts of computational power.  For example (and apologies in advance for the macabre imagery), you could “solve” NP-complete problems in polynomial time by simply guessing a random solution, then checking whether the solution is right, and shooting yourself if it happened to be wrong!  Conditioned on still being alive (and if you like, appealing to the “anthropic principle”), you must find yourself having guessed a valid solution—assuming, of course, that there were any valid solutions to be found.  If there weren’t any, then you’d seem to be out of luck!  (Exercise for the reader: generalize this “algorithm,” so that it still works even if you don’t know in advance whether your NP-complete problem instance has any valid solutions.)

So with the PostBQP=PP theorem, the surprise was not that postselection gives you lots of computational power, but rather that postselection combined with quantum mechanics gives you much more power even than postselection by itself (or quantum mechanics by itself, for that matter).  Since PPP=P#P, the class PP basically captures the full difficulty of #P-complete counting problems—that is, not just solving an NP-complete problem, but counting how many solutions it has.  It’s not obvious that a quantum computer with postselection can solve counting problems, but that’s what the theorem shows.  That, in turn, has implications for other things: for example, I showed it can be used to prove classical facts about PP, like the fact that PP is closed under intersection (the Beigel-Reingold-Spielman Theorem), in a straightforward way; and it’s also used to show the hardness of quantum sampling problems, in the work of Bremner-Jozsa-Shepherd as well as my BosonSampling work with Arkhipov.

I’m diffident about being “known for” something so simple; once I had asked the question, the proof of PostBQP=PP took me all of an hour to work out.  Yet PostBQP ended up being a hundred times more influential for quantum computing theory than things on which I expended a thousand times more effort.  So on balance, I guess I’m happy to call PostBQP my own.

That’s why today’s post comes with a special sense of intellectual responsibility.  Within the last month, it’s come to my attention that there are at least two embarrassing oversights in my PostBQP paper from a decade ago, one of them concerning the very definition of PostBQP.  I hasten to clarify: once one fixes up the definition, the PostBQP=PP theorem remains perfectly valid, and all the applications of PostBQP that I mentioned above—for example, to reproving Beigel-Reingold-Spielman, and to the hardness of quantum sampling problems—go through just fine.  But if you think I have nothing to be embarrassed about: well, read on.


The definitional subtlety came clearly to my attention a few weeks ago, when I was lecturing about PostBQP in my 6.845 Quantum Complexity Theory graduate class.  I defined PostBQP as the class of languages L⊆{0,1}* for which there exists a polynomial-time quantum Turing machine M such that, for all inputs x∈{0,1}*,

  • M(x) “succeeds” (determined, say, by measuring its first output qubit in the {|0>,|1>} basis) with nonzero probability.
  • If x∈L, then conditioned on M(x) succeeding, M(x) “accepts” (determined, say, by measuring its second output qubit in the {|0>,|1>} basis) with probability at least 2/3.
  • If x∉L, then conditioned on M(x) succeeding, M(x) accepts with probability at most 1/3.

I then had to reassure the students that PostBQP, so defined, was a “robust” class: that is, that the definition doesn’t depend on stupid things like which set of quantum gates we allow. I argued that, even though we’re postselecting on exponentially-unlikely events, it’s still OK, because the Solovay-Kitaev Theorem lets us approximate any desired unitary to within exponentially-small error, with only a polynomial increase in the size of our quantum circuit. (Here we actually need the full power of the Solovay-Kitaev Theorem, in contrast to ordinary BQP, where we only need part of the power.)

A student in the class, Michael Cohen, immediately jumped in with a difficulty: what if M(x) succeeded, not with exponentially-small probability, but with doubly-exponentially-small probability—say, exp(-2n)?  In that case, one could no longer use the Solovay-Kitaev Theorem to show the irrelevance of the gate set.  It would no longer even be clear that PostBQP⊆PP, since the PP simulation might not be able to keep track of such tiny probabilities.

Thinking on my feet, I replied that we could presumably choose a set of gates—for example, gates involving rational numbers only—for which doubly-exponentially-small probabilities would never arise.  Or if all else failed, we could simply add to the definition of PostBQP that M(x) had to “succeed” with probability at least 1/exp(n): after all, that was the only situation I ever cared about anyway, and the only one that ever arose in the applications of PostBQP.

But the question still gnawed at me: was there a problem with my original, unamended definition of PostBQP?  If we weren’t careful in choosing our gate set, could we have cancellations that produced doubly-exponentially-small probabilities?  I promised I’d think about it more.

By a funny coincidence, just a couple weeks later, Ciaran Lee, a student at Oxford, emailed me the exact same question.  So on a train ride from Princeton to Boston, I decided to think about it for real.  It wasn’t hard to show that, if the gates involved square roots of rational numbers only—for example, if we’re dealing with the Hadamard and Toffoli gates, or the cos(π/8) and CNOT gates, or other standard gate sets—then every measurement outcome has at least 1/exp(n) probability, so there’s no problem with the definition of PostBQP.  But I didn’t know what might happen with stranger gate sets.

As is my wont these days—when parenting, teaching, and so forth leave me with almost no time to concentrate on math—I posted the problem to MathOverflow.  Almost immediately, I got incisive responses.  First, Piotr Achinger pointed out that, if we allow arbitrary gates, then it’s easy to get massive cancellations.  In more detail, let {an} be extremely-rapidly growing sequence of integers, say with an+1 > exp(an).  Then define

$$ \alpha = \sum_{n=1}^{\infty} 0.1^{a_n}. $$

If we write out α in decimal notation, it will consist of mostly 0’s, but with 1’s spaced further and further apart, like so: 0.1101000000000001000….  Now consider a gate set that involves α as well as 0.1 and -0.1 as matrix entries.  Given n qubits, it’s not hard to see that we can set up an interference experiment in which one of the paths leading to a given outcome E has amplitude α, and the other paths have amplitudes $$ -(0.1^{a_1}), -(0.1^{a_2}), \ldots, -(0.1^{a_k}), $$ where k is the largest integer such that ak≤n. In that case, the total amplitude of E will be about $$0.1^{a_{k+1}},$$ which for most values of n is doubly-exponentially small in n. Of course, by simply choosing a faster-growing sequence {an}, we can cause an even more severe cancellation.

Furthermore, by modifying the above construction to involve two crazy transcendental numbers α and β, I claim that we can set up a PostBQP computation such that deciding what happens is arbitrarily harder than PP (though still computable)—say, outside of exponential space, or even triple-exponential space. Moreover, we can do this despite the fact that the first n digits of α and β remain computable in O(n) time. The details are left as an exercise for the interested reader.

Yet even though we can engineer massive cancellations with crazy gates, I still conjectured that nothing would go wrong with “normal” gates: for example, gates involving algebraic amplitudes only. More formally, I conjectured that any finite set A=(a1,…,ak) of algebraic numbers is “tame,” in the sense that, if p is any degree-n polynomial with integer coefficients at most exp(n) in absolute value, then p(a1,…,ak)≠0 implies |p(a1,…,ak)|≥1/exp(n). And indeed, Julian Rosen on MathOverflow found an elegant proof of this fact. I’ll let you read it over there if you’re interested, but briefly, it interprets the amplitude we want as one particular Archimedean valuation of a certain element of a number field, and then lower-bounds the amplitude by considering the product of all Archimedean and non-Archimedean valuations (the latter of which involves the p-adic numbers). Since this was a bit heavy-duty for me, I was grateful when Will Sawin reformulated the proof in linear-algebraic terms that I understood.

And then came the embarrassing part. A few days ago, I was chatting with Greg Kuperberg, the renowned mathematician and author of our climate-change parable. I thought he’d be interested in this PostBQP progress, so I mentioned it to him. Delicately, Greg let me know that he had recently proved the exact same results, for the exact same reason (namely, fixing the definition of PostBQP), for the latest revision of his paper How Hard Is It to Approximate the Jones Polynomial?. Moreover, he actually wrote to me in June to tell me about this! At the time, however, I regarded it as “pointless mathematical hairsplitting” (who cares about these low-level gate-set issues anyway?). So I didn’t pay it any attention—and then I’d completely forgotten about Greg’s work when the question resurfaced a few months later. This is truly a just punishment for looking down on “mathematical hairsplitting,” and not a lesson I’ll soon forget.

Anyway, Greg’s paper provides yet a third proof that the algebraic numbers are tame, this one using Galois conjugates (though it turns out that, from a sufficiently refined perspective, Greg’s proof is equivalent to the other two).

There remains one obvious open problem here, one that I noted in the MathOverflow post and in which Greg is also extremely interested. Namely, we now know that it’s possible to screw up PostBQP using gates with amplitudes that are crazy transcendental numbers (closely related to the Liouville numbers). And we also know that, if the gates have algebraic amplitudes, then everything is fine: all events have at least 1/exp(n) probability. But what if the gates have not-so-crazy transcendental amplitudes, like 1/e, or (a bit more realistically) cos(2)?  I conjecture that everything is still fine, but the proof techniques that worked for the algebraic case seem useless here.

Stepping back, how great are the consequences of all this for our understanding of PostBQP? Fortunately, I claim that they’re not that great, for the following reason. As Adleman, DeMarrais, and Huang already noted in 1997—in the same paper that proved BQP⊆PP—we can screw up the definition even of BQP, let alone PostBQP, using a bizarre enough gate set. For example, suppose we had a gate G that mapped |0> to x|0>+y|1>, where y was a real number whose binary expansion encoded the halting problem (for example, y might equal Chaitin’s Ω).  Then by applying G more and more times, we could learn more and more bits of y, and thereby solve an uncomputable problem in the limit n→∞.

Faced with this observation, most quantum computing experts would say something like: “OK, but this is silly! It has no physical relevance, since we’ll never come across a magical gate like G—if only we did! And at any rate, it has nothing to do with quantum computing specifically: even classically, one could imagine a coin that landed heads with probability equal to Chaitin’s Ω. Therefore, the right way to deal with this is simply to define BQP in such a way as to disallow such absurd gates.” And indeed, that is what’s done today—usually without even remarking on it.

Now, it turns out that even gates that are “perfectly safe” for defining BQP, can turn “unsafe” when it comes to defining PostBQP. To screw up the definition of PostBQP, it’s not necessary that a gate involve uncomputable (or extremely hard-to-compute) amplitudes: the amplitudes could all be easily computable, but they could still be “unsafe” because of massive cancellations, as in the example above involving α. But one could think of this as a difference of degree, rather than of kind. It’s still true that there’s a large set of gates, including virtually all the gates anyone has ever cared about in practice (Toffoli, Hadamard, π/8, etc. etc.), that are perfectly safe for defining the complexity class; it’s just that the set is slightly smaller than it was for BQP.


The other issue with the PostBQP=PP paper was discovered by Ryan O’Donnell and Cem Say.  In Proposition 3 of the paper, I claim that PostBQP = BQPPostBQP||,classical, where the latter is the class of problems solvable by a BQP machine that’s allowed to make poly(n) parallel, classical queries to a PostBQP oracle.  As Ryan pointed out to me, nothing in my brief argument for this depended on quantum mechanics, so it would equally well show that PostBPP = BPPPostBPP||, where PostBPP (also known as BPPpath) is the classical analogue of PostBQP, and BPPPostBPP|| is the class of problems solvable by a BPP machine that can make poly(n) parallel queries to a PostBPP oracle.  But BPPPostBPP|| clearly contains BPPNP||, which in turn contains AM—so we would get AM in PostBPP, and therefore AM in PostBQP=PP.  But Vereshchagin gave an oracle relative to which AM is not contained in PP.  Since there was no nonrelativizing ingredient anywhere in my argument, the only possible conclusion is that my argument was wrong.  (This, incidentally, provides a nice illustration of the value of oracle results.)

In retrospect, it’s easy to pinpoint what went wrong.  If we try to simulate BPPPostBPP|| in PostBPP, our random bits will be playing a dual role: in choosing the queries to be submitted to the PostBPP oracle, and in providing the “raw material for postselection,” in computing the responses to those queries.  But in PostBPP, we only get to postselect once.  When we do, the two sets of random bits that we’d wanted to keep separate will get hopelessly mixed up, with the postselection acting on the “BPP” random bits, not just on the “PostBPP” ones.

How can we fix this problem?  Well, when defining the class BQPPostBQP||,classical, suppose we require the queries to the PostBQP oracle to be not only “classical,” but deterministic: that is, they have to be generated in advance by a P machine, and can’t depend on any random bits whatsoever.  And suppose we define BPPPostBPP||,classical similarly.  In that case, it’s not hard to see that the equalities BQPPostBQP||,classical = PostBQP and BPPPostBPP||,classical = PostBPP both go through.  You don’t actually care about this, do you?  But Ryan O’Donnell and Cem Say did, and that’s good enough for me.


I wish I could say that these are the only cases of mistakes recently being found in decade-old papers of mine, but alas, such is not the case.  In the near future, my student Adam Bouland, MIT undergrad Mitchell Lee, and Singapore’s Joe Fitzsimons will post to the arXiv a paper that grew out of an error in my 2005 paper Quantum Computing and Hidden Variables. In that paper, I introduced a hypothetical generalization of the quantum computing model, in which one gets to see the entire trajectory of a hidden variable, rather than just a single measurement outcome. I showed that this generalization would let us solve problems somewhat beyond what we think we can do with a “standard” quantum computer. In particular, we could solve the collision problem in O(1) queries, efficiently solve Graph Isomorphism (and all other problems in the Statistical Zero-Knowledge class), and search an N-element list in only ~N1/3 steps, rather than the ~N1/2 steps of Grover’s search algorithm. That part of the paper remains fine!

On the other hand, at the end of the paper, I also gave a brief argument to show that, even in the hidden-variable model, ~N1/3 steps are required to search an N-element list. But Mitchell Lee and Adam Bouland discovered that that argument is wrong: it fails to account for all the possible ways that an algorithm could exploit the correlations between the hidden variable’s values at different moments in time. (I’ve previously discussed this error in other blog posts, as well as in the latest edition of Quantum Computing Since Democritus.)

If we suitably restrict the hidden-variable theory, then we can correctly prove a lower bound of ~N1/4, or even (with strong enough assumptions) ~N1/3; and we do that in the forthcoming paper. Even with no restrictions, as far as we know an ~N1/3 lower bound for search with hidden variables remains true. But it now looks like proving it will require a major advance in our understanding of hidden-variable theories: for example, a proof that the “Schrödinger theory” is robust to small perturbations, which I’d given as the main open problem in my 2005 paper.

As if that weren’t enough, in my 2003 paper Quantum Certificate Complexity, I claimed (as a side remark) that one could get a recursive Boolean function f with an asymptotic gap between the block sensitivity bs(f) and the randomized certificate complexity RC(f). However, two and a half years ago, Avishay Tal discovered that this didn’t work, because block sensitivity doesn’t behave nicely under composition.  (In assuming it did, I was propagating an error introduced earlier by Wegener and Zádori.)  More broadly, Avishay showed that there is no recursively-defined Boolean function with an asymptotic gap between bs(f) and RC(f). On the other hand, if we just want some Boolean function with an asymptotic gap between bs(f) and RC(f), then Raghav Kulkarni observed that we can use a non-recursive function introduced by Xiaoming Sun, which yields bs(f)≈N3/7 and RC(f)≈N4/7. This is actually a larger separation than the one I’d wrongly claimed.

Now that I’ve come clean about all these things, hopefully the healing can begin at last.

Lens of Computation on the Sciences

Tuesday, November 25th, 2014

This weekend, the Institute for Advanced Study in Princeton hosted a workshop on the “Lens of Computation in the Sciences,” which was organized by Avi Wigderson, and was meant to showcase theoretical computer science’s imperialistic ambitions to transform every other field.  I was proud to speak at the workshop, representing CS theory’s designs on physics.  But videos of all four of the talks are now available, and all are worth checking out:

Unfortunately, the videos were slow to buffer when I last tried it.  While you’re waiting, you could also check my PowerPoint slides, though they overlap considerably with my previous talks.  (As always, if you can’t read PowerPoint, then go ask another reader of this blog to convert the file into a format you like.)

Thanks so much to Avi, and everyone else at IAS, for organizing an awesome workshop!