### The Collision Lower Bound After 12 Years

Sunday, July 7th, 2013Streaming video is now available for the talks at the QStart conference, a couple weeks ago at Hebrew University in Jerusalem. If you’re the sort of person who likes watching quantum information talks, then check out the excellent ones by Ray Laflamme, John Martinis, Umesh Vazirani, Thomas Vidick, Jacob Bekenstein, and many others.

My own contribution—the first “backwards-facing, crusty, retrospective” talk I’ve ever given—was called The Collision Lower Bound After 12 Years (click here for the slides—and to answer the inevitable question, no, I have no idea how to open PowerPoint files in your favorite free-range, organic computing platform). Briefly, the collision lower bound is the theorem that even a quantum computer needs at least ~n^{1/3} steps to find a duplicate in a long list of random numbers between 1 and n, even assuming the list is long enough that there are many, many duplicates to be found. (Moreover, ~n^{1/3} steps are known to *suffice*, by the BHT algorithm, a clever adaptation of Grover’s search algorithm. Also, for simplicity a “step” means a single access to the list, though of course a quantum algorithm can access multiple list elements in superposition and it still counts as one step.)

By comparison, for classical algorithms, ~√n steps are necessary and sufficient to find a collision, by the famous Birthday Paradox. So, just like for Grover’s search problem, a quantum computer could give you a modest speedup over classical for the collision problem, but *only* a modest one. The reason this is interesting is that, because of the abundance of collisions to be found, the collision problem has a great deal more structure than Grover’s search problem (though it has less structure than Shor’s period-finding problem, where there famously *is* an exponential quantum speedup).

One “obvious” motivation for the collision problem is that it models the problem of breaking collision-resistant hash functions (like SHA-256) in cryptography. In particular, if there *were* a superfast (e.g., log(n)-time) quantum algorithm for the collision problem, then there could be no CRHFs secure against quantum attack. So the fact that there’s no such algorithm at least opens up the possibility of quantum-secure CRHFs. However, there are many other motivations. For example, the collision lower bound rules out the most “simpleminded” approach to a polynomial-time quantum algorithm for the Graph Isomorphism problem (though, I hasten to add, it says nothing about more sophisticated approaches). The collision problem is also closely related to Statistical Zero Knowledge (SZK) proof protocols, so that the collision lower bound leads to an oracle relative to which SZK is not in BQP.

Probably the most bizarre motivation to other people, but for some reason the most important one to me back in 2001, is that the collision problem is closely related to the problem of sampling the *entire trajectories of hidden variables*, in hidden-variable theories such as Bohmian mechanics. The collision lower bound provides strong evidence that this trajectory-sampling problem is hard even for a quantum computer—intuitively because a QC can’t keep track of the *correlations* between the hidden-variable positions at different times. The way I like to put it is that if, at the moment of your death, your entire life history flashed before you in an instant (and if a suitable hidden-variable theory were true, and if you’d performed an appropriate quantum interference experiment on your own brain during your life), then you really *could* solve the collision problem in only O(1) steps. Interestingly, you *still* might not be able to solve NP-complete problems—I don’t know! But you could at least do *something* that we think is hard for a quantum computer.

I proved the first collision lower bound in 2001 (actually, a week or so after the 9/11 attacks), after four months of sleepless nights and failed attempts. (Well actually, I only got the weaker lower bound of ~n^{1/5}; the ~n^{1/3} was a subsequent improvement due to Yaoyun Shi. Before ~n^{1/5}, no one could even rule out that a quantum computer could solve the collision problem with a *constant* number of steps (!!), independent of n—say, 4 steps.) It was the first thing I’d proved of any significance, and probably the most important thing I did while in grad school. I knew it was one of the favorite problems of my adviser, Umesh Vazirani, so I didn’t even tell Umesh I was working on it until I’d already spent the whole summer on it. I figured he’d think I was nuts.

**Bonus Proof Explanation!**

The technique that ultimately worked was the polynomial method, which was introduced to quantum computing four years prior in a seminal paper of Beals et al. In this technique, you first suppose by contradiction that a quantum algorithm exists to solve your problem that makes very few accesses to the input bits—say, T. Then you write out the quantum algorithm’s acceptance probability (e.g., the probability that the algorithm outputs “yes, I found what I was looking for”) as a multivariate polynomial p in the input bits. It’s not hard to prove that p has degree at most 2T, since the amplitudes in the quantum algorithm can be written as degree-T polynomials (each input access increases the degree by at most 1, and unitary transformations in between input accesses don’t increase the degree at all); then squaring the amplitudes to get probabilities doubles the degree. (This is the only part of the method that uses anything specific to quantum mechanics!)

Next, you choose some parameter k related to the problem of interest, and you let q(k) be the expectation of p(X) over all inputs X with the parameter equal to k. For example, with the collision problem, it turns out that the “right” choice to make is to set k=1 if each number appears exactly once in your input list, k=2 if each number appears exactly twice, k=3 if each number appears exactly three times, and so on. Then—here comes the “magic” part—you show that *q(k) itself is a univariate polynomial in k, again of degree at most 2T*. This magical step is called “symmetrization”; it can be traced at least as far back as the famous 1969 book *Perceptrons* by Marvin Minsky and Seymour Papert. In the case of the collision problem, I still have no explanation, 12 years later, for why symmetrization works: all I can say is that you do the calculation, and you cancel lots of things from both the numerator and the denominator, and what comes out at the end is a low-degree polynomial in k. (It’s precisely because I would never have predicted such a “zany coincidence,” that I had to stumble around in the dark for 4 months before I finally discovered by chance that the polynomial method worked.)

Anyway, after applying symmetrization, you’re left with a low-degree univariate polynomial q with some very interesting properties: for example, you need 0≤q(k)≤1 for positive integers k, since then q(k) represents an averaged *probability* that your quantum algorithm does something. You also need q(1) to be close to 0, since if k=1 then there no collisions to be found, and you need q(2) to be close to 1, since if k=2 then there are lots of collisions and you’d like your algorithm to find one. But now, you can appeal to a theorem of A. A. Markov from the 1890s, which implies that no low-degree polynomial exists with those properties! Hence your original efficient quantum algorithm can’t have existed either: indeed, you get a quantitative lower bound (a tight one, if you’re careful) on the number of input accesses your algorithm must have made. And that, modulo some nasty technicalities (e.g., what if k doesn’t evenly divide the size of your list?), is how the collision lower bound works.

So, in the first half of my QStart talk, I explain the collision lower bound and its original motivations (and a little about the proof, but no more than what I said above). Then in the second half, I survey lots of extensions and applications between 2002 and the present, as well as the many remaining open problems. For example, I discuss the tight lower bound of Ambainis et al. for the “index erasure” problem, Belovs’s proof of the element distinctness lower bound using the adversary method, and my and Ambainis’s generalization of the collision lower bound to arbitrary symmetric problems. I also talk about Mark Zhandry’s recent breakthrough (sorry, am I not allowed to use that word?) showing that the GGM construction of pseudorandom functions is secure against quantum adversaries, and how Zhandry’s result can be seen—in retrospect, anyway—as yet another application of the collision lower bound.

Probably of the most general interest, I discuss how Daniel Harlow and Patrick Hayden invoked the collision lower bound in their striking recent paper on the AMPS black hole “firewall” paradox. In particular they argued that, in order to uncover the apparent violation of local quantum field theory at the heart of the paradox, an observer falling into a black hole would probably need to solve a QSZK-complete computational problem. And of course, the collision lower bound furnishes our main piece of evidence that QSZK-complete problems really should require exponential time even for quantum computers. So, Harlow and Hayden argue, the black hole would already have evaporated before the observer had even made a dent in the requisite computation.

Now, the Harlow-Hayden paper, and the AMPS paradox more generally, really deserve posts of their own—just as soon as I learn enough to decide what I think about them. For now, I’ll simply say that, regardless of how convinced you are by Harlow and Hayden’s argument (and, a bit like with my free-will essay, it’s not clear how convinced the authors themselves are!), it’s one of the most ambitious syntheses of computational complexity and physics I’ve ever seen. You can disagree with it, but to read the paper (or watch the talk, streaming video from Strings’2013 here) is to experience the thrill of seeing black hole physics related to complexity theory by authors who really know both.

(In my own talk on the collision lower bound, the short segment about Harlow-Hayden generated more questions and discussion than the rest of the talk combined—with *me* being challenged to defend their argument, even with Patrick Hayden right there in the audience! I remarked later that that portion of the talk was itself a black hole for audience interest.)

In totally unrelated news, *Quantum Computing Since Democritus* made Scientific American’s list of best summer books! I can’t think of a more appropriate honor, since if there’s *any* phrase that captures what QCSD is all about, “sizzling summer beach read” would be it. Apparently there will even be an online poll soon, where y’all can go and vote for QCSD as your favorite. Vote early and often, and from multiple IP addresses!