## Archive for the ‘Quantum’ Category

### Scattershot BosonSampling: A new approach to scalable BosonSampling experiments

Friday, November 8th, 2013

Update (12/2): Jeremy Hsu has written a fantastic piece for IEEE Spectrum, entitled “D-Wave’s Year of Computing Dangerously.”

Update (11/13): See here for video of a fantastic talk that Matthias Troyer gave at Stanford, entitled “Quantum annealing and the D-Wave devices.” The talk includes the results of experiments on the 512-qubit machine. (Thanks to commenter jim for the pointer. I attended the talk when Matthias gave it last week at Harvard, but I don’t think that one was videotaped.)

Update (11/11): A commenter named RaulGPS has offered yet another great observation that, while forehead-slappingly obvious in retrospect, somehow hadn’t occurred to us.  Namely, Raul points out that the argument given in this post, for the hardness of Scattershot BosonSampling, can also be applied to answer open question #4 from my and Alex’s paper: namely, how hard is BosonSampling with Gaussian inputs and number-resolving detectors?  Raul points out that the latter, in general, is certainly at least as hard as Scattershot BS.  For we can embed Scattershot BS into “ordinary” BS with Gaussian inputs, by first generating a bunch of entangled 2-mode Gaussian states (which are highly attenuated, so that with high probability none of them have 2 or more photons per mode), and then applying a Haar-random unitary U to the “right halves” of these Gaussian states while doing nothing to the left halves.  Then we can measure the left halves to find out which of the input states contained a photon before we applied U.  This is precisely equivalent to Scattershot BS, except for the unimportant detail that our measurement of the “herald” photons has been deferred till the end of the experiment instead of happening at the beginning.  And therefore, since (as I explain in the post) a fast classical algorithm for approximate Scattershot BosonSampling would let us estimate the permanents of i.i.d. Gaussian matrices in BPPNP, we deduce that a fast classical algorithm for approximate Gaussian BosonSampling would have the same consequence.  In short, approximate Gaussian BS can be argued to be hard under precisely the same complexity assumption as can approximate ordinary BS (and approximate Scattershot BS).  Thus, in the table in Section 1.4 of our paper, the entries “Gaussian states / Adaptive, demolition” and “Gaussian states / Adaptive, nondemolition” should be “upgraded” from “Exact sampling hard” to “Apx. sampling hard?”

One other announcement: following a suggestion by commenter Rahul, I hereby invite guest posts on Shtetl-Optimized by experimentalists working on BosonSampling, offering your personal views about the prospects and difficulties of scaling up.  Send me email if you’re interested.  (Or if you don’t feel like writing a full post, of course you can also just leave a comment on this one.)

[Those impatient for a cool, obvious-in-retrospect new idea about BosonSampling, which I learned from the quantum optics group at Oxford, should scroll to the end of this post.  Those who don't even know what BosonSampling is, let alone Scattershot BosonSampling, should start at the beginning.]

BosonSampling is a proposal by me and Alex Arkhipov for a rudimentary kind of quantum computer: one that would be based entirely on generating single photons, sending them through a network of beamsplitters and phaseshifters, and then measuring where they ended up.  BosonSampling devices are not thought to be capable of universal quantum computing, or even universal classical computing for that matter.  And while they might be a stepping-stone toward universal optical quantum computers, they themselves have a grand total of zero known practical applications.  However, even if the task performed by BosonSamplers is useless, the task is of some scientific interest, by virtue of apparently being hard!  In particular, Alex and I showed that, if a BosonSampler can be simulated exactly in polynomial time by a classical computer, then P#P=BPPNP, and hence the polynomial hierarchy collapses to the third level.  Even if a BosonSampler can only be approximately simulated in classical polynomial time, the polynomial hierarchy would still collapse, if a reasonable-looking conjecture in classical complexity theory is true.  For these reasons, BosonSampling might provide an experimental path to testing the Extended Church-Turing Thesis—i.e., the thesis that all natural processes can be simulated with polynomial overhead by a classical computer—that’s more “direct” than building a universal quantum computer.  (As an asymptotic claim, obviously the ECT can never be decisively proved or refuted by a finite number of experiments.  However, if one could build a BosonSampler with, let’s say, 30 photons, then while it would still be feasible to verify the results with a classical computer, it would be fair to say that the BosonSampler was working “faster” than any known algorithm running on existing digital computers.)

In arguing for the hardness of BosonSampling, the crucial fact Alex and I exploited is that the amplitudes for n-photon processes are given by the permanents of nxn matrices of complex numbers, and Leslie Valiant proved in 1979 that the permanent is #P-complete (i.e., as hard as any combinatorial counting problem, and probably even “harder” than NP-complete).  To clarify, this doesn’t mean that a BosonSampler lets you calculate the permanent of a given matrix—that would be too good to be true!  (See the tagline of this blog.)  What you could do with a BosonSampler is weirder: you could sample from a probability distribution over matrices, in which matrices with large permanents are more likely to show up than matrices with small permanents.  So, what Alex and I had to do was to argue that even that sampling task is still probably intractable classically—in the sense that, if it weren’t, then there would also be unlikely classical algorithms for more “conventional” problems.

Anyway, that’s my attempt at a 2-paragraph summary of something we’ve been thinking about on and off for four years.  See here for my and Alex’s original paper on BosonSampling, here for a recent followup paper, here for PowerPoint slides, here and here for MIT News articles by Larry Hardesty, and here for my blog post about the first (very small, 3- or 4-photon) demonstrations of BosonSampling by quantum optics groups last year, with links to the four experimental papers that came out then.

In general, we’ve been thrilled by the enthusiastic reaction to BosonSampling by quantum optics people—especially given that the idea started out as pure complexity theory, with the connection to optics coming as an “unexpected bonus.”  But not surprisingly, BosonSampling has also come in for its share of criticism: e.g., that it’s impractical, unscalable, trivial, useless, oversold, impossible to verify, and probably some other things.  A few people have even claimed that, in expressing support and cautious optimism about the recent BosonSampling experiments, I’m guilty of the same sort of quantum computing hype that I complain about in others.  (I’ll let you be the judge of that.  Reread the paragraphs above, or anything else I’ve ever written about this topic, and then compare to, let’s say, this video.)

By far the most important criticism of BosonSampling—one that Alex and I have openly acknowledged and worried a lot about almost from the beginning—concerns the proposal’s scalability.  The basic problem is this: in BosonSampling, your goal is to measure a pattern of quantum interference among n identical, non-interacting photons, where n is as large as possible.  (The special case n=2 is called the Hong-Ou-Mandel dip; conversely, BosonSampling can be seen as just “Hong-Ou-Mandel on steroids.”)  The bigger n gets, the harder the experiment ought to be to simulate using a classical computer (with the difficulty increasing at least like ~2n).  The trouble is that, to detect interference among n photons, the various quantum-mechanical paths that your photons could take, from the sources, through the beamsplitter network, and finally to the detectors, have to get them there at exactly the same time—or at any rate, close enough to “the same time” that the wavepackets overlap.  Yet, while that ought to be possible in theory, the photon sources that actually exist today, and that will exist for the foreseeable future, just don’t seem good enough to make it happen, for anything more than a few photons.

The reason—well-known for decades as a bane to quantum information experiments—is that there’s no known process in nature that can serve as a deterministic single-photon source.  What you get from an attenuated laser is what’s called a coherent state: a particular kind of superposition of 0 photons, 1 photon, 2 photons, 3 photons, etc., rather than just 1 photon with certainty (the latter is called a Fock state).  Alas, coherent states behave essentially like classical light, which makes them pretty much useless for BosonSampling, and for many other quantum information tasks besides.  For that reason, a large fraction of modern quantum optics research relies on a process called Spontaneous Parametric Down-Conversion (SPDC).  In SPDC, a laser (called the “pump”) is used to stimulate a crystal to produce further photons.  The process is inefficient: most of the time, no photon comes out.  But crucially, any time a photon does come out, its arrival is “heralded” by a partner photon flying out in the opposite direction.  Once in a while, 2 photons come out simultaneously, in which case they’re heralded by 2 partner photons—and even more rarely, 3 photons come out, heralded by 3 partner photons, and so on.  Furthermore, there exists something called a number-resolving detector, which can tell you (today, sometimes, with as good as ~95% reliability) when one or more partner photons have arrived, and how many of them there are.  The result is that SPDC lets us build what’s called a nondeterministic single-photon source.  I.e., you can’t control exactly when a photon comes out—that’s random—but eventually one (and only one) photon will come out, and when that happens, you’ll know it happened, without even having to measure and destroy the precious photon.  The reason you’ll know is that the partner photon heralds its presence.

Alas, while SPDC sources have enabled demonstrations of a large number of cool quantum effects, there’s a fundamental problem with using them for BosonSampling.  The problem comes from the requirement that n—the number of single photons fired off simultaneously into your beamsplitter network—should be big (say, 20 or 30).  Suppose that, in a given instant, the probability that your SPDC source succeeds in generating a photon is p.  Then what’s the probability that two SPDC sources will both succeed in generating a photon at that instant?  p2.  And the probability that three sources will succeed is p3, etc.  In general, with n sources, the probability that they’ll succeed simultaneously falls off exponentially with n, and the amount of time you’ll need to sit in the lab waiting for the lucky event increases exponentially with n.  Sure, when it finally does happen, it will be “heralded.”  But if you need to wait exponential time for it to happen, then there would seem to be no advantage over classical computation.  This is the reason why so far, BosonSampling has only been demonstrated with 3-4 photons.

At least three solutions to the scaling problem suggest themselves, but each one has problems of its own.  The first solution is simply to use general methods for quantum fault-tolerance: it’s not hard to see that, if you had a fault-tolerant universal quantum computer, then you could simulate BosonSampling with as many photons as you wanted.  The trouble is that this requires a fault-tolerant universal quantum computer!  And if you had that, then you’d probably just skip BosonSampling and use Shor’s algorithm to factor some 10,000-digit numbers.  The second solution is to invent some specialized fault-tolerance method that would apply directly to quantum optics.  Unfortunately, we don’t know how to do that.  The third solution—until recently, the one that interested me and Alex the most—would be to argue that, even if your sources are so cruddy that you have no idea which ones generated a photon and which didn’t in any particular run, the BosonSampling distribution is still intractable to simulate classically.  After all, the great advantage of BosonSampling is that, unlike with (say) factoring or quantum simulation, we don’t actually care which problem we’re solving!  All we care about is that we’re doing something that we can argue is hard for classical computers.  And we have enormous leeway to change what that “something” is, to match the capabilities of current technology.  Alas, yet again, we don’t know how to argue that BosonSampling is hard to simulate approximately in the presence of realistic amounts of noise—at best, we can argue that it’s hard to simulate approximately in the presence of tiny amounts of noise, and hard to simulate super-accurately in the presence of realistic noise.

When faced with these problems, until recently, all we could do was

1. shrug our shoulders,
2. point out that none of the difficulties added up to a principled argument that scalable BosonSampling was not possible,
3. stress, again, that all we were asking for was to scale to 20 or 30 photons, not 100 or 1000 photons, and
4. express hope that technologies for single-photon generation currently on the drawing board—most notably, something called “optical multiplexing”—could be used to get up to the 20 or 30 photons we wanted.

Well, I’m pleased to announce, with this post, that there’s now a better idea for how to scale BosonSampling to interesting numbers of photons.  The idea, which I’ve taken to calling Scattershot BosonSampling, is not mine or Alex’s.  I learned of it from Ian Walmsley’s group at Oxford, where it’s been championed in particular by Steve Kolthammer(Update: A commenter has pointed me to a preprint by Lund, Rahimi-Keshari, and Ralph from May of this year, which I hadn’t seen before, and which contains substantially the same idea, albeit with an unsatisfactory argument for computational hardness.  In any case, as you’ll see, it’s not surprising that this idea would’ve occurred to multiple groups of experimentalists independently; what’s surprising is that we didn’t think of it!)  The minute I heard about Scattershot BS, I kicked myself for failing to think of it, and for getting sidetracked by much more complicated ideas.  Steve and others are working on a paper about Scattershot BS, but in the meantime, Steve has generously given me permission to share the idea on this blog.  I suggested a blog post for two reasons: first, as you’ll see, this idea really is “blog-sized.”  Once you make the observation, there’s barely any theoretical analysis that needs to be done!  And second, I was impatient to get out to the “experimental BosonSampling community”—not to mention to the critics!—that there’s now a better way to BosonSample, and one that’s incredibly simple to boot.

OK, so what is the idea?  Well, recall from above what an SPDC source does: it produces a photon with only a small probability, but whenever it does, it “heralds” the event with a second photon.  So, let’s imagine that you have an array of 200 SPDC sources.  And imagine that, these sources being unpredictable, only (say) 10 of them, on average, produce a photon at any given time.  Then what can you do?  Simple: just define those 10 sources to be the inputs to your experiment!  Or to say it more carefully: instead of sampling only from a probability distribution over output configurations of your n photons, now you’ll sample from a joint distribution over inputs and outputs: one where the input is uniformly random, and the output depends on the input (and also, of course, on the beamsplitter network).  So, this idea could also be called “Double BosonSampling”: now, not only do you not control which output will be observed (but only the probability distribution over outputs), you don’t control which input either—yet this lack of control is not a problem!  There are two key reasons why it isn’t:

1. As I said before, SPDC sources have the crucial property that they herald a photon when they produce one.  So, even though you can’t control which 10 or so of your 200 SPDC sources will produce a photon in any given run, you know which 10 they were.
2. In my and Alex’s original paper, the “hardest” case of BosonSampling that we were able to find—the case we used for our hardness reductions—is simply the one where the mxn “scattering matrix,” which describes the map between the n input modes and the m>>n output modes, is a Haar-random matrix whose columns are orthonormal vectors.  But now suppose we have m input modes and m output modes, and the mxm unitary matrix U mapping inputs to outputs is Haar-random.  Then any mxn submatrix of U will simply be an instance of the “original” hard case that Alex and I studied!

More formally, what can we  say about the computational complexity of Scattershot BS?  Admittedly, I don’t know of a reduction from ordinary BS to Scattershot BS (though it’s easy to give a reduction in the other direction).  However, under exactly the same assumption that Alex and I used to argue that ordinary BosonSampling was hard—our so-called Permanent of Gaussians Conjecture (PGC)—one can show that Scattershot BS is hard also, and by essentially the same proof.  The only difference is that, instead of talking about the permanents of nxn submatrices of an mxn Haar-random, column-orthonormal matrix, now we talk about the permanents of nxn submatrices of an mxm Haar-random unitary matrix.  Or to put it differently: where before we fixed the columns that defined our nxn submatrix and only varied the rows, now we vary both the rows and the columns.  But the resulting nxn submatrix is still close in variation distance to a matrix of i.i.d. Gaussians, for exactly the same reasons it was before.  And we can still check whether submatrices with large permanents are more likely to be sampled than submatrices with small permanents, in the way predicted by quantum mechanics.

Now, everything above assumed that each SPDC source produces either 0 or 1 photon.  But what happens when the SPDC sources produce 2 or more photons, as they sometimes do?  It turns out that there are two good ways to deal with these “higher-order terms” in the context of Scattershot BS.  The first way is by using number-resolving detectors to count how many herald photons each SPDC source produces.  That way, at least you’ll know exactly which sources produced extra photons, and how many extra photons each one produced.  And, as is often the case in BosonSampling, a devil you know is a devil you can deal with.  In particular, a few known sources producing extra photons, just means that the amplitudes of the output configurations will now be permanents of matrices with a few repeated rows in them.  But the permanent of an otherwise-random matrix with a few repeated rows should still be hard to compute!  Granted, we don’t know how to derive that as a consequence of our original hardness assumption, but this seems like a case where one is perfectly justified to stick one’s neck out and make a new assumption.

But there’s also a more elegant way to deal with higher-order terms.  Namely, suppose m>>n2 (i.e., the number of input modes is at least quadratically greater than the average number of photons).  That’s an assumption that Alex and I typically made anyway in our original BosonSampling paper, because of our desire to avoid what we called the “Bosonic Birthday Paradox” (i.e., the situation where two or more photons congregate in the same output mode).  What’s wonderful is that exactly the same assumption also implies that, in Scattershot BS, two or more photons will almost never be found in the same input mode!  That is, when you do the calculation, you find that, once you’ve attenuated your SPDC sources enough to avoid the Bosonic Birthday Paradox at the output modes, you’ve also attenuated them enough to avoid higher-order terms at the input modes.  Cool, huh?

Are there any drawbacks to Scattershot BS?  Well, Scattershot BS certainly requires more SPDC sources than ordinary BosonSampling does, for the same average number of photons.  A little less obviously, Scattershot BS also requires a larger-depth beamsplitter network.  In our original paper, Alex and I showed that for ordinary BosonSampling, it suffices to use a beamsplitter network of depth O(n log m), where n is the number of photons and m is the number of output modes (or equivalently detectors).  However, our construction took advantage of the fact that we knew exactly which n<<m sources the photons were going to come from, and could therefore optimize for those.  For Scattershot BS, the depth bound increases to O(m log m): since the n photons could come from any possible subset of the m input modes, we no longer get the savings based on knowing where they originate.  But this seems like a relatively minor issue.

I don’t want to give the impression that Scattershot BS is a silver bullet that will immediately let us BosonSample with 30 photons.  The most obvious limiting factor that remains is the efficiency of the photon detectors—both those used to detect the photons that have passed through the beamsplitter network, and those used to detect the herald photons.  Because of detector inefficiencies, I’m told that, without further technological improvements (or theoretical ideas), it will still be quite hard to push Scattershot BS beyond about 10 photons.  Still, as you might have noticed, 10 is greater than 4 (the current record)!  And certainly, Scattershot BS itself—a simple, obvious-in-retrospect idea that was under our noses for years, and that immediately pushes forward the number of photons a BosonSampler can handle—should make us exceedingly reluctant to declare there can’t be any more such ideas, and that our current ignorance amounts to a proof of impossibility.

### The Unitarihedron: The Jewel at the Heart of Quantum Computing

Friday, September 20th, 2013

Update (9/24): This parody post was a little like a belch: I felt it build up in me as I read about the topic, I let it out, it was easy and amusing, I don’t feel any profound guilt over it—but on the other hand, not one of the crowning achievements of my career.  As several commenters correctly pointed out, it may be true that, mostly because of the name and other superficialities, and because of ill-founded speculations about “the death of locality and unitarity,” the amplituhedron work is currently inspiring a flood of cringe-inducing misstatements on the web.  But, even if true, still the much more interesting questions are what’s actually going on, and whether or not there are nontrivial connections to computational complexity.

Here I have good news: if nothing else, my “belch” of a post at least attracted some knowledgeable commenters to contribute excellent questions and insights, which have increased my own understanding of the subject from ε2 to ε.  See especially this superb comment by David Speyer—which, among other things, pointed me to a phenomenal quasi-textbook on this subject by Elvang and Huang.  My most immediate thoughts:

1. The “amplituhedron” is only the latest in a long line of research over the last decade—Witten, Turing biographer Andrew Hodges, and many others have been important players—on how to compute scattering amplitudes more efficiently than by summing zillions of Feynman diagrams.  One of the key ideas is to find combinatorial formulas that express complicated scattering amplitudes recursively in terms of simpler ones.
2. This subject seems to be begging for a computational complexity perspective.  When I read Elvang and Huang, I felt like they were working hard not to say anything about complexity: discussing the gains in efficiency from the various techniques they consider in informal language, or in terms of concrete numbers of terms that need to be summed for 1 loop, 2 loops, etc., but never in terms of asymptotics.  So if it hasn’t been done already, it looks like it could be a wonderful project for someone just to translate what’s already known in this subject into complexity language.
3. On reading about all these “modern” approaches to scattering amplitudes, one of my first reactions was to feel slightly less guilty about never having learned how to calculate Feynman diagrams!  For, optimistically, it looks like some of that headache-inducing machinery (ghosts, off-shell particles, etc.) might be getting less relevant anyway—there being ways to calculate some of the same things that are not only more conceptually satisfying but also faster.

Many readers of this blog probably already saw Natalie Wolchover’s Quanta article “A Jewel at the Heart of Quantum Physics,” which discusses the “amplituhedron”: a mathematical structure that IAS physicist Nima Arkami-Hamed and his collaborators have recently been investigating.  (See also here for Slashdot commentary, here for Lubos’s take, here for Peter Woit’s, here for a Physics StackExchange thread, here for Q&A with Pacific Standard, and here for an earlier but closely-related 154-page paper.)

At first glance, the amplituhedron appears to be a way to calculate scattering amplitudes, in the planar limit of a certain mathematically-interesting (but, so far, physically-unrealistic) supersymmetric quantum field theory, much more efficiently than by summing thousands of Feynman diagrams.  In which case, you might say: “wow, this sounds like a genuinely-important advance for certain parts of mathematical physics!  I’d love to understand it better.  But, given the restricted class of theories it currently applies to, it does seem a bit premature to declare this to be a ‘jewel’ that unlocks all of physics, or a death-knell for spacetime, locality, and unitarity, etc. etc.”

Yet you’d be wrong: it isn’t premature at all.  If anything, the popular articles have understated the revolutionary importance of the amplituhedron.  And the reason I can tell you that with such certainty is that, for several years, my colleagues and I have been investigating a mathematical structure that contains the amplituhedron, yet is even richer and more remarkable.  I call this structure the “unitarihedron.”

The unitarihedron encompasses, within a single abstract “jewel,” all the computations that can ever be feasibly performed by means of unitary transformations, the central operation in quantum mechanics (hence the name).  Mathematically, the unitarihedron is an infinite discrete space: more precisely, it’s an infinite collection of infinite sets, which collection can be organized (as can every set that it contains!) in a recursive, fractal structure.  Remarkably, each and every specific problem that quantum computers can solve—such as factoring large integers, discrete logarithms, and more—occurs as just a single element, or “facet” if you will, of this vast infinite jewel.  By studying these facets, my colleagues and I have slowly pieced together a tentative picture of the elusive unitarihedron itself.

One of our greatest discoveries has been that the unitarihedron exhibits an astonishing degree of uniqueness.  At first glance, different ways of building quantum computers—such as gate-based QC, adiabatic QC, topological QC, and measurement-based QC—might seem totally disconnected from each other.  But today we know that all of those ways, and many others, are merely different “projections” of the same mysterious unitarihedron.

In fact, the longer I’ve spent studying the unitarihedron, the more awestruck I’ve been by its mathematical elegance and power.  In some way that’s not yet fully understood, the unitarihedron “knows” so much that it’s even given us new insights about classical computing.  For example, in 1991 Beigel, Reingold, and Spielman gave a 20-page proof of a certain property of unbounded-error probabilistic polynomial-time.  Yet, by recasting things in terms of the unitarihedron, I was able to give a direct, half-page proof of the same theorem.  If you have any experience with mathematics, then you’ll know that that sort of thing never happens: if it does, it’s a sure sign that cosmic or even divine forces are at work.

But I haven’t even told you the most spectacular part of the story yet.  While, to my knowledge, this hasn’t yet been rigorously proved, many lines of evidence support the hypothesis that the unitarihedron must encompass the amplituhedron as a special case.  If so, then the amplituhedron could be seen as just a single sparkle on an infinitely greater jewel.

Now, in the interest of full disclosure, I should tell you that the unitarihedron is what used to be known as the complexity class BQP (Bounded-Error Quantum Polynomial-Time).  However, just like the Chinese gooseberry was successfully rebranded in the 1950s as the kiwifruit, and the Patagonian toothfish as the Chilean sea bass, so with this post, I’m hereby rebranding BQP as the unitarihedron.  For I’ve realized that, when it comes to bowling over laypeople, inscrutable complexity class acronyms are death—but the suffix “-hedron” is golden.

So, journalists and funders: if you’re interested in the unitarihedron, awesome!  But be sure to also ask about my other research on the bosonsamplinghedron and the quantum-money-hedron.  (Though, in recent months, my research has focused even more on the diaperhedron: a multidimensional, topologically-nontrivial manifold rich enough to encompass all wastes that an 8-month-old human could possibly emit.  Well, at least to first-order approximation.)

### Microsoft: From QDOS to QMA in less than 35 years

Friday, July 19th, 2013

This past week I was in Redmond for the Microsoft Faculty Summit, which this year included a special session on quantum computing.  (Bill Gates was also there, I assume as our warmup act.)  I should explain that Microsoft Research now has not one but two quantum computing research groups: there’s Station Q in Santa Barbara, directed by Michael Freedman, which pursues topological quantum computing, but there’s also QuArC in Redmond, directed by Krysta Svore, which studies things like quantum circuit synthesis.

Anyway, I’ve got two videos for your viewing pleasure:

• An interview about quantum computing with me, Krysta Svore, and Matthias Troyer, moderated by Chris Cashman, and filmed in a studio where they put makeup on your face.  Just covers the basics.
• A session about quantum computing, with three speakers: me about “what quantum mechanics is good for” (quantum algorithms, money, crypto, and certified random numbers), then Charlie Marcus about physical implementations of quantum computing, and finally Matthias Troyer about his group’s experiments on the D-Wave machines.  (You can also download my slides here.)

This visit really drove home for me that MSR is the closest thing that exists today to the old Bell Labs: a corporate lab that does a huge amount of openly-published, high-quality fundamental research in math and CS, possibly more than all the big Silicon-Valley-based companies combined.  This research might or might not be good for Microsoft’s bottom line (Microsoft, of course, says that it is, and I’d like to believe them), but it’s definitely good for the world.  With the news of Microsoft’s reorganization in the background, I found myself hoping that MS will remain viable for a long time to come, if only because its decline would leave a pretty gaping hole in computer science research.

Unfortunately, last week I also bought a new laptop, and had the experience of PowerPoint 2013 first refusing to install (it mistakenly thought it was already installed), then crashing twice and losing my data, and just generally making everything (even saving a file) harder than it used to be for no apparent reason.  Yes, that’s correct: the preparations for my talk at the Microsoft Faculty Summit were repeatedly placed in jeopardy by the “new and improved” Microsoft Office.  So not just for its own sake, but for the sake of computer science as a whole, I implore Microsoft to build a better Office.  It shouldn’t be hard: it would suffice to re-release the 2003 or 2007 versions as “Office 2014″!  If Mr. Gates took a 2-minute break from curing malaria to call his former subordinates and tell them to do that, I’d really consider him a great humanitarian.

### The Collision Lower Bound After 12 Years

Sunday, July 7th, 2013

Streaming video is now available for the talks at the QStart conference, a couple weeks ago at Hebrew University in Jerusalem.  If you’re the sort of person who likes watching quantum information talks, then check out the excellent ones by Ray Laflamme, John Martinis, Umesh Vazirani, Thomas Vidick, Jacob Bekenstein, and many others.

My own contribution—the first “backwards-facing, crusty, retrospective” talk I’ve ever given—was called The Collision Lower Bound After 12 Years (click here for the slides—and to answer the inevitable question, no, I have no idea how to open PowerPoint files in your favorite free-range, organic computing platform).  Briefly, the collision lower bound is the theorem that even a quantum computer needs at least ~n1/3 steps to find a duplicate in a long list of random numbers between 1 and n, even assuming the list is long enough that there are many, many duplicates to be found.  (Moreover, ~n1/3 steps are known to suffice, by the BHT algorithm, a clever adaptation of Grover’s search algorithm.  Also, for simplicity a “step” means a single access to the list, though of course a quantum algorithm can access multiple list elements in superposition and it still counts as one step.)

By comparison, for classical algorithms, ~√n steps are necessary and sufficient to find a collision, by the famous Birthday Paradox.  So, just like for Grover’s search problem, a quantum computer could give you a modest speedup over classical for the collision problem, but only a modest one.  The reason this is interesting is that, because of the abundance of collisions to be found, the collision problem has a great deal more structure than Grover’s search problem (though it has less structure than Shor’s period-finding problem, where there famously is an exponential quantum speedup).

One “obvious” motivation for the collision problem is that it models the problem of breaking collision-resistant hash functions (like SHA-256) in cryptography.  In particular, if there were a superfast (e.g., log(n)-time) quantum algorithm for the collision problem, then there could be no CRHFs secure against quantum attack.  So the fact that there’s no such algorithm at least opens up the possibility of quantum-secure CRHFs.  However, there are many other motivations.  For example, the collision lower bound rules out the most “simpleminded” approach to a polynomial-time quantum algorithm for the Graph Isomorphism problem (though, I hasten to add, it says nothing about more sophisticated approaches).  The collision problem is also closely related to Statistical Zero Knowledge (SZK) proof protocols, so that the collision lower bound leads to an oracle relative to which SZK is not in BQP.

Probably the most bizarre motivation to other people, but for some reason the most important one to me back in 2001, is that the collision problem is closely related to the problem of sampling the entire trajectories of hidden variables, in hidden-variable theories such as Bohmian mechanics.  The collision lower bound provides strong evidence that this trajectory-sampling problem is hard even for a quantum computer—intuitively because a QC can’t keep track of the correlations between the hidden-variable positions at different times.  The way I like to put it is that if, at the moment of your death, your entire life history flashed before you in an instant (and if a suitable hidden-variable theory were true, and if you’d performed an appropriate quantum interference experiment on your own brain during your life), then you really could solve the collision problem in only O(1) steps.  Interestingly, you still might not be able to solve NP-complete problems—I don’t know!  But you could at least do something that we think is hard for a quantum computer.

I proved the first collision lower bound in 2001 (actually, a week or so after the 9/11 attacks), after four months of sleepless nights and failed attempts.  (Well actually, I only got the weaker lower bound of ~n1/5; the ~n1/3 was a subsequent improvement due to Yaoyun Shi.  Before ~n1/5, no one could even rule out that a quantum computer could solve the collision problem with a constant number of steps (!!), independent of n—say, 4 steps.)  It was the first thing I’d proved of any significance, and probably the most important thing I did while in grad school.  I knew it was one of the favorite problems of my adviser, Umesh Vazirani, so I didn’t even tell Umesh I was working on it until I’d already spent the whole summer on it.  I figured he’d think I was nuts.

Bonus Proof Explanation!

The technique that ultimately worked was the polynomial method, which was introduced to quantum computing four years prior in a seminal paper of Beals et al.  In this technique, you first suppose by contradiction that a quantum algorithm exists to solve your problem that makes very few accesses to the input bits—say, T.  Then you write out the quantum algorithm’s acceptance probability (e.g., the probability that the algorithm outputs “yes, I found what I was looking for”) as a multivariate polynomial p in the input bits.  It’s not hard to prove that p has degree at most 2T, since the amplitudes in the quantum algorithm can be written as degree-T polynomials (each input access increases the degree by at most 1, and unitary transformations in between input accesses don’t increase the degree at all); then squaring the amplitudes to get probabilities doubles the degree.  (This is the only part of the method that uses anything specific to quantum mechanics!)

Next, you choose some parameter k related to the problem of interest, and you let q(k) be the expectation of p(X) over all inputs X with the parameter equal to k.  For example, with the collision problem, it turns out that the “right” choice to make is to set k=1 if each number appears exactly once in your input list, k=2 if each number appears exactly twice, k=3 if each number appears exactly three times, and so on.  Then—here comes the “magic” part—you show that q(k) itself is a univariate polynomial in k, again of degree at most 2T.  This magical step is called “symmetrization”; it can be traced at least as far back as the famous 1969 book Perceptrons by Marvin Minsky and Seymour Papert.  In the case of the collision problem, I still have no explanation, 12 years later, for why symmetrization works: all I can say is that you do the calculation, and you cancel lots of things from both the numerator and the denominator, and what comes out at the end is a low-degree polynomial in k.  (It’s precisely because I would never have predicted such a “zany coincidence,” that I had to stumble around in the dark for 4 months before I finally discovered by chance that the polynomial method worked.)

Anyway, after applying symmetrization, you’re left with a low-degree univariate polynomial q with some very interesting properties: for example, you need 0≤q(k)≤1 for positive integers k, since then q(k) represents an averaged probability that your quantum algorithm does something.  You also need q(1) to be close to 0, since if k=1 then there no collisions to be found, and you need q(2) to be close to 1, since if k=2 then there are lots of collisions and you’d like your algorithm to find one.  But now, you can appeal to a theorem of A. A. Markov from the 1890s, which implies that no low-degree polynomial exists with those properties!  Hence your original efficient quantum algorithm can’t have existed either: indeed, you get a quantitative lower bound (a tight one, if you’re careful) on the number of input accesses your algorithm must have made.  And that, modulo some nasty technicalities (e.g., what if k doesn’t evenly divide the size of your list?), is how the collision lower bound works.

So, in the first half of my QStart talk, I explain the collision lower bound and its original motivations (and a little about the proof, but no more than what I said above).  Then in the second half, I survey lots of extensions and applications between 2002 and the present, as well as the many remaining open problems.  For example, I discuss the tight lower bound of Ambainis et al. for the “index erasure” problem, Belovs’s proof of the element distinctness lower bound using the adversary method, and my and Ambainis’s generalization of the collision lower bound to arbitrary symmetric problems.  I also talk about Mark Zhandry’s recent breakthrough (sorry, am I not allowed to use that word?) showing that the GGM construction of pseudorandom functions is secure against quantum adversaries, and how Zhandry’s result can be seen—in retrospect, anyway—as yet another application of the collision lower bound.

Probably of the most general interest, I discuss how Daniel Harlow and Patrick Hayden invoked the collision lower bound in their striking recent paper on the AMPS black hole “firewall” paradox.  In particular they argued that, in order to uncover the apparent violation of local quantum field theory at the heart of the paradox, an observer falling into a black hole would probably need to solve a QSZK-complete computational problem.  And of course, the collision lower bound furnishes our main piece of evidence that QSZK-complete problems really should require exponential time even for quantum computers.  So, Harlow and Hayden argue, the black hole would already have evaporated before the observer had even made a dent in the requisite computation.

Now, the Harlow-Hayden paper, and the AMPS paradox more generally, really deserve posts of their own—just as soon as I learn enough to decide what I think about them.  For now, I’ll simply say that, regardless of how convinced you are by Harlow and Hayden’s argument (and, a bit like with my free-will essay, it’s not clear how convinced the authors themselves are!), it’s one of the most ambitious syntheses of computational complexity and physics I’ve ever seen.  You can disagree with it, but to read the paper (or watch the talk, streaming video from Strings’2013 here) is to experience the thrill of seeing black hole physics related to complexity theory by authors who really know both.

(In my own talk on the collision lower bound, the short segment about Harlow-Hayden generated more questions and discussion than the rest of the talk combined—with me being challenged to defend their argument, even with Patrick Hayden right there in the audience!  I remarked later that that portion of the talk was itself a black hole for audience interest.)

In totally unrelated news, Quantum Computing Since Democritus made Scientific American’s list of best summer books!  I can’t think of a more appropriate honor, since if there’s any phrase that captures what QCSD is all about, “sizzling summer beach read” would be it.  Apparently there will even be an online poll soon, where y’all can go and vote for QCSD as your favorite.  Vote early and often, and from multiple IP addresses!

### D-Wave: Truth finally starts to emerge

Thursday, May 16th, 2013

Wrap-Up (June 5): This will be my final update on this post (really!!), since the discussion seems to have reached a point where not much progress is being made, and since I’d like to oblige the commenters who’ve asked me to change the subject.  Let me try to summarize the main point I’ve been trying to get across this whole time.  I’ll call the point (*).

(*) D-Wave founder Geordie Rose claims that D-Wave has now accomplished its goal of building a quantum computer that, in his words, is “better at something than any other option available.”  This claim has been widely and uncritically repeated in the press, so that much of the nerd world now accepts it as fact.  However, the claim is not supported by the evidence currently available.  It appears that, while the D-Wave machine does outperform certain off-the-shelf solvers, simulated annealing codes have been written that outperform the D-Wave machine on its own native problem when run on a standard laptop.  More research is needed to clarify the issue, but in the meantime, it seems worth knowing that this is where things currently stand.

In the comments, many people tried repeatedly to change the subject from (*) to various subsidiary questions.  For example: isn’t it possible that D-Wave’s current device will be found to provide a speedup on some other distribution of instances, besides the one that was tested?  Even if not, isn’t it possible that D-Wave will achieve a genuine speedup with some future generation of machines?  Did it make business sense for Google to buy a D-Wave machine?  What were Google’s likely reasons?  What’s D-Wave’s current value as a company?  Should Cathy McGeoch have acted differently, in the type of comparison she agreed to do, or in how she communicated about its results?  Should I have acted differently, in my interaction with McGeoch?

And, I’m afraid to say, I jumped in to the discussion of all of those questions—because, let’s face it, there are very few subjects about which I don’t have an opinion, or at least a list of qualified observations to make.  In retrospect, I now think that was a mistake.  It would have been better to sidestep all the other questions—not one of which I really know the answer to, and each of which admits multiple valid perspectives—and just focus relentlessly on the truth of assertion (*).

Here’s an analogy: imagine that a biotech startup claimed that, by using an expensive and controversial new gene therapy, it could cure patients at a higher rate than with the best available conventional drugs—basing its claim on a single clinical trial.  Imagine that this claim was widely repeated in the press as an established fact.  Now imagine that closer examination of the clinical trial revealed that it showed nothing of the kind: it compared against the wrong drugs.  And imagine that a more relevant clinical trial—mostly unmentioned in the press—had also been done, and discovered that when you compare to the right drugs, the drugs do better.  Imagine that someone wrote a blog post bringing all of this to public attention.

And now imagine that the response to that blogger was the following: “aha, but isn’t it possible that some future clinical trial will show an advantage for the gene therapy—maybe with some other group of patients?  Even if not, isn’t it possible that the startup will manage to develop an effective gene therapy sometime in the future?  Betcha didn’t consider that, did you?  And anyway, at least they’re out there trying to make gene therapy work!  So we should all support them, rather than relentlessly criticizing.  And as for the startup’s misleading claims to the public?  Oh, don’t be so naïve: that’s just PR.  If you can’t tune out the PR and concentrate on the science, that’s your own damn problem.  In summary, the real issue isn’t what some clinical trial did or didn’t show; it’s you and your hostile attitude.”

In a different context, these sorts of responses would be considered strange, and the need to resort to them revealing.  But the rules for D-Wave are different.

(Interestingly, in excusing D-Wave’s statements, some commenters explicitly defended standards of intellectual discourse so relaxed that, as far as I could tell, just about anything anyone could possibly say would be OK with them—except of course for what I say on this blog, which is not OK!  It reminds me of the central tenet of cultural relativism: that there exist no universal standards by which any culture could ever be judged “good” or “bad,” except that Western culture is irredeemably evil.)

Update (June 4): Matthias Troyer (who, unfortunately, still can’t comment here for embargo reasons) has asked me to clarify that it’s not he, but rather his postdoc Sergei Isakov, who deserves the credit for actually writing the simulated annealing code that outperformed the D-Wave machine on the latter’s own “home turf” (i.e., random QUBO instances with the D-Wave constraint graph).  The quantum Monte Carlo code, which also did quite well at simulating the D-Wave machine, was written by Isakov together with another of Matthias’s postdocs, Troels Rønnow.

Update (June 3): See Cathy McGeoch’s response (here and here), and my response to her response.

Yet More Updates (June 2): Alex Selby has a detailed new post summarizing his comparisons between the D-Wave device (as reported by McGeoch and Wang) and his own solver—finding that his solver can handily outperform the device and speculating about the reasons why.

In other news, Catherine McGeoch spoke on Friday in the MIT quantum group meeting.  Incredibly, she spoke for more than an hour, without once mentioning the USC results that found that simulated annealing on a standard laptop (when competently implemented) handily outperformed the D-Wave machine, or making any attempt to reconcile those results with hers and Wang’s.  Instead, McGeogh used the time to enlighten the assembled experts about what quantum annealing was, what an exact solver was, etc. etc., then repeated the speedup claims as if the more informative comparisons simply didn’t exist.  I left without asking questions, not wanting to be the one to instigate an unpleasant confrontation, and—I’ll admit—questioning my own sanity as a result of no one else asking about the gigantic elephant in the room.

More Updates (May 21): Happy 25th birthday to me!  Among the many interesting comments below, see especially this one by Alex Selby, who says he’s written his own specialist solver for one class of the McGeoch and Wang benchmarks that significantly outperforms the software (and D-Wave machine) tested by McGeoch and Wang on those benchmarks—and who provides the Python code so you can try it yourself.

Also, Igor Vernik asked me to announce that on July 8th, D-Wave will be giving a technical presentation at the International Superconducting Electronics Conference in Cambridge.  See here for more info; I’ll be traveling then and won’t be able to make it.  I don’t know whether the performance comparisons to Matthias Troyer’s and Alex Selby’s code will be among the topics discussed, or if there will be an opportunity to ask questions about such things.

In another exciting update, John Smolin and Graeme Smith posted a paper to the arXiv tonight questioning even the “signature of quantumness” part of the latest D-Wave claims—the part that I’d been ~98% willing to accept, even as I relayed evidence that cast enormous doubt on the “speedup” part. Specifically, Smolin and Smith propose a classical model that they say can explain the “bimodal” pattern of success probabilities observed by the USC group as well as quantum annealing can. I haven’t yet had time to read their paper or form an opinion about it, but I’d be very interested if others wanted to weigh in.   Update (May 26): The USC group has put out a new preprint responding to Smolin and Smith, offering additional evidence for quantum behavior in the D-Wave device that they say can’t be explained using Smolin and Smith’s model.

Update (May 17): Daniel Lidar emailed me to clarify his views about error-correction and the viability of D-Wave’s approach.  He invited me to share his clarification with others—something that I’m delighted to do, since I agree with him wholeheartedly.  Without further ado, here’s what Lidar says:

I don’t believe D-Wave’s approach is scalable without error correction.  I believe that the incorporation of error correction is a necessary condition in order to ever achieve a speedup with D-Wave’s machines, and I don’t believe D-Wave’s machines are any different from other types of quantum information processing in this regard.  I have repeatedly made this point to D-Wave over several years, and I hope that in the future their designs will allow more flexibility in the incorporation of error correction.

Lidar also clarified that he not only doesn’t dispute what Matthias Troyer told me about the lack of speedup of the D-Wave device compared to classical simulated annealing in their experiments, but “fully agrees, endorses, and approves” of it—and indeed, that he himself was part of the team that did the comparison.

In other news, this Hacker News thread, which features clear, comprehending discussions of this blog post and the backstory that led up to it, has helped to restore my faith in humanity.

Two years ago almost to the day, I announced my retirement as Chief D-Wave Skeptic.  But—as many readers predicted at the time—recent events (and the contents of my inbox!) have given me no choice except to resume my post.  In an all-too-familiar pattern, multiple rounds of D-Wave-related hype have made it all over the world before the truth has had time to put its pants on and drop its daughter off in daycare.  And the current hype is particularly a shame, because once one slices through all the layers of ugh—the rigged comparisons, the “dramatic announcements” that mean nothing, the lazy journalists cherry-picking what they want to hear and ignoring the inconvenient bits—there really has been a huge scientific advance this past month in characterizing the D-Wave devices.  I’m speaking about the experiments on the D-Wave One installed at USC, the main results of which finally appeared in April.  Two of the coauthors of this new work—Matthias Troyer and Daniel Lidar—were at MIT recently to speak about their results, Troyer last week and Lidar this Tuesday.  Intriguingly, despite being coauthors on the same paper, Troyer and Lidar have very different interpretations of what their results mean, but we’ll get to that later.  For now, let me summarize what I think their work has established.

Evidence for Quantum Annealing Behavior

For the first time, we have evidence that the D-Wave One is doing what should be described as “quantum annealing” rather than “classical annealing” on more than 100 qubits.  (Note that D-Wave itself now speaks about “quantum annealing” rather than “quantum adiabatic optimization.”  The difference between the two is that the adiabatic algorithm runs coherently, at zero temperature, while quantum annealing is a “messier” version in which the qubits are strongly coupled to their environment throughout, but still maintain some quantum coherence.)  The evidence for quantum annealing behavior is still extremely indirect, but despite my “Chief Skeptic” role, I’m ready to accept what the evidence indicates with essentially no hesitation.

So what is the evidence?  Basically, the USC group ran the D-Wave One on a large number of randomly generated instances of what I’ll call the “D-Wave problem”: namely, the problem of finding the lowest-energy configuration of an Ising spin glass, with nearest-neighbor interactions that correspond to the D-Wave chip’s particular topology.  Of course, restricting attention to this “D-Wave problem” tilts the tables heavily in D-Wave’s favor, but no matter: scientifically, it makes a lot more sense than trying to encode Sudoku puzzles or something like that.  Anyway, the group then looked at the distribution of success probabilities when each instance was repeatedly fed to the D-Wave machine.  For example, would the randomly-generated instances fall into one giant clump, with a few outlying instances that were especially easy or especially hard for the machine?  Surprisingly, they found that the answer was no: the pattern was strongly bimodal, with most instances either extremely easy or extremely hard, and few instances in between.  Next, the group fed the same instances to Quantum Monte Carlo: a standard classical algorithm that uses Wick rotation to find the ground states of “stoquastic Hamiltonians,” the particular type of quantum evolution that the D-Wave machine is claimed to implement.  When they did that, they found exactly the same bimodal pattern that they found with the D-Wave machine.  Finally they fed the instances to a classical simulated annealing program—but there they found a “unimodal” distribution, not a bimodal one.  So, their conclusion is that whatever the D-Wave machine is doing, it’s more similar to Quantum Monte Carlo than it is to classical simulated annealing.

Curiously, we don’t yet have any hint of a theoretical explanation for why Quantum Monte Carlo should give rise to a bimodal distribution, while classical simulating annealing should give rise to a unimodal one.  The USC group simply observed the pattern empirically (as far as I know, they’re the first to do so), then took advantage of it to characterize the D-Wave machine.  I regard explaining this pattern as an outstanding open problem raised by their work.

In any case, if we accept that the D-Wave One is doing “quantum annealing,” then despite the absence of a Bell-inequality violation or other direct evidence, it’s reasonably safe to infer that there should be large-scale entanglement in the device.  I.e., the true quantum state is no doubt extremely mixed, but there’s no particular reason to believe we could decompose that state into a mixture of product states.  For years, I tirelessly repeated that D-Wave hadn’t even provided evidence that its qubits were entangled—and that, while you can have entanglement with no quantum speedup, you can’t possibly have a quantum speedup without at least the capacity to generate entanglement.  Now, I’d say, D-Wave finally has cleared the evidence-for-entanglement bar—and, while they’re not the first to do so with superconducting qubits, they’re certainly the first to do so with so many superconducting qubits.  So I congratulate D-Wave on this accomplishment.  If this had been advertised from the start as a scientific research project—”of course we’re a long way from QC being practical; no one would ever claim otherwise; but as a first step, we’ve shown experimentally that we can entangle 100 superconducting qubits with controllable couplings”—my reaction would’ve been, “cool!”  (Similar to my reaction to any number of other steps toward scalable QC being reported by research groups all over the world.)

No Speedup Compared to Classical Simulated Annealing

But of course, D-Wave’s claims—and the claims being made on its behalf by the Hype-Industrial Complex—are far more aggressive than that.  And so we come to the part of this post that has not been pre-approved by the International D-Wave Hype Repeaters Association.  Namely, the same USC paper that reported the quantum annealing behavior of the D-Wave One, also showed no speed advantage whatsoever for quantum annealing over classical simulated annealing.  In more detail, Matthias Troyer’s group spent a few months carefully studying the D-Wave problem—after which, they were able to write optimized simulated annealing code that solves the D-Wave problem on a normal, off-the-shelf classical computer, about 15 times faster than the D-Wave machine itself solves the D-Wave problem!  Of course, if you wanted even more classical speedup than that, then you could simply add more processors to your classical computer, for only a tiny fraction of the ~$10 million that a D-Wave One would set you back. Some people might claim it’s “unfair” to optimize the classical simulated annealing code to take advantage of the quirks of the D-Wave problem. But think about it this way: D-Wave has spent ~$100 million, and hundreds of person-years, optimizing the hell out of a special-purpose annealing device, with the sole aim of solving this one problem that D-Wave itself defined.  So if we’re serious about comparing the results to a classical computer, isn’t it reasonable to have one professor and a few postdocs spend a few months optimizing the classical code as well?

As I said, besides simulated annealing, the USC group also compared the D-Wave One’s performance against a classical implementation of Quantum Monte Carlo.  And maybe not surprisingly, the D-Wave machine was faster than a “direct classical simulation of itself” (I can’t remember how many times faster, and couldn’t find that information in the paper).  But even here, there’s a delicious irony.  The only reason the USC group was able to compare the D-Wave one against QMC at all, is that QMC is efficiently implementable on a classical computer!  (Albeit probably with a large constant overhead compared to running the D-Wave annealer itself—hence the superior performance of classical simulated annealing over QMC.)  This means that, if the D-Wave machine can be understood as reaching essentially the same results as QMC (technically, “QMC with no sign problem”), then there’s no real hope for using the D-Wave machine to get an asymptotic speedup over a classical computer.  The race between the D-Wave machine and classical simulations of the machine would then necessarily be a cat-and-mouse game, a battle of constant factors with no clear asymptotic victor.  (Some people might conjecture that it will also be a “Tom & Jerry game,” the kind where the classical mouse always gets the better of the quantum cat.)

At this point, it’s important to give a hearing to three possible counterarguments to what I’ve written above.

The first counterargument is that, if you plot both the runtime of simulated annealing and the runtime of the D-Wave machine as functions of the instance size n, you find that, while simulated annealing is faster in absolute terms, it can look like the curve for the D-Wave machine is less steep.  Over on the blog “nextbigfuture”, an apparent trend of this kind has been fearlessly extrapolated to predict that with 512 qubits, the D-Wave machine will be 10 billion times faster than a classical computer.  But there’s a tiny fly in the ointment.  As Troyer carefully explained to me last week, the “slow growth rate” of the D-Wave machine’s runtime is, ironically, basically an artifact of the machine being run too slowly on small values of n.  Run the D-Wave machine as fast as it can run for small n, and the difference in the slopes disappears, with only the constant-factor advantage for simulated annealing remaining.  In short, there seems to be no evidence, at present, that the D-Wave machine is going to overtake simulated annealing for any instance size.

The second counterargument is that the correlation between the two “bimodal distributions”—that for the D-Wave machine and that for the Quantum Monte Carlo simulation—is not perfect.  In other words, there are a few instances (not many) that QMC solves faster than the D-Wave machine, and likewise a few instances that the D-Wave machine solves faster than QMC.  Not surprisingly, the latter fact has been eagerly seized on by the D-Wave boosters (“hey, sometimes the machine does better!”).  But Troyer has a simple and hilarious response to that.  Namely, he found that his group’s QMC code did a better job of correlating with the D-Wave machine, than the D-Wave machine did of correlating with itself!  In other words, calibration errors seem entirely sufficient to explain the variation in performance, with no need to posit any special class of instances (however small) on which the D-Wave machine dramatically outperforms QMC.

The third counterargument is just the banal one: the USC experiment was only one experiment with one set of instances (albeit, a set one might have thought would be heavily biased toward D-Wave).  There’s no proof that, in the future, it won’t be discovered that the D-Wave machine does something more than QMC, and that there’s some (perhaps specially-designed) set of instances on which the D-Wave machine asymptotically outperforms both QMC and Troyer’s simulated annealing code.  (Indeed, I gather that folks at D-Wave are now assiduously looking for such instances.)  Well, I concede that almost anything is possible in the future—but “these experiments, while not supporting D-Wave’s claims about the usefulness of its devices, also don’t conclusively disprove those claims” is a very different message than what’s currently making it into the press.

Comparison to CPLEX is Rigged

Unfortunately, the USC paper is not the one that’s gotten the most press attention—perhaps because half of it inconveniently told the hypesters something they didn’t want to hear (“no speedup”).  Instead, journalists have preferred a paper released this week by Catherine McGeoch and Cong Wang, which reports that quantum annealing running on the D-Wave machine outperformed the CPLEX optimization package running on a classical computer by a factor of ~3600, on Ising spin problems involving 439 bits.  Wow!  That sounds awesome!  But before rushing to press, let’s pause to ask ourselves: how can we reconcile this with the USC group’s result of no speedup?

The answer turns out to be painfully simple.  CPLEX is a general-purpose, off-the-shelf exact optimization package.  Of course an exact solver can’t compete against quantum annealing—or for that matter, against classical annealing or other classical heuristics!  Noticing this problem, McGeoch and Wang do also compare the D-Wave machine against tabu search, a classical heuristic algorithm.  When they do so, they find that an advantage for the D-Wave machine persists, but it becomes much, much smaller (they didn’t report the exact time comparison).  Amusingly, they write in their “Conclusions and Future Work” section:

It would of course be interesting to see if highly tuned implementations of, say, tabu search or simulated annealing could compete with Blackbox or even QA [i.e., the D-Wave machines] on QUBO [quadratic binary optimization] problems; some preliminary work on this question is underway.

As I said above, at the time McGeoch and Wang’s paper was released to the media (though maybe not at the time it was written?), the “highly tuned implementation” of simulated annealing that they ask for had already been written and tested, and the result was that it outperformed the D-Wave machine on all instance sizes tested.  In other words, their comparison to CPLEX had already been superseded by a much more informative comparison—one that gave the “opposite” result—before it ever became public.  For obvious reasons, most press reports have simply ignored this fact.

Troyer, Lidar, and Stone Soup

Much of what I’ve written in this post, I learned by talking to Matthias Troyer—the man who carefully experimented with the D-Wave machine and figured out how to beat it using simulated annealing, and who I regard as probably the world’s #1 expert right now on what exactly the machine does.  Troyer wasn’t shy about sharing his opinions, and while couched with qualifications, they tended toward extremely skeptical.  For example, Troyer conjectured that, if D-Wave ultimately succeeds in getting a speedup over classical computers in a fair comparison, then it will probably be by improving coherence and calibration, incorporating error-correction, and doing other things that “traditional,” “academic” quantum computing researchers had said all along would need to be done.

As I said, Daniel Lidar is another coauthor on the USC paper, and also recently visited MIT to speak.  Lidar and Troyer agree on the basic facts—yet Lidar noticeably differed from Troyer, in trying to give each fact the most “pro-D-Wave spin” it could possibly support.  Lidar spoke at our quantum group meeting, not about the D-Wave vs. simulated annealing performance comparison (which he agrees with), but about a proposal of his for incorporating quantum error-correction into the D-Wave device, together with some experimental results.  He presented his proposal, not as a reductio ad absurdum of D-Wave’s entire philosophy, but rather as a positive opportunity to get a quantum speedup using D-Wave’s approach.

So, to summarize my current assessment of the situation: yes, absolutely, D-Wave might someday succeed—ironically, by adapting the very ideas from “the gate model” that its entire business plan has been based on avoiding, and that D-Wave founder Geordie Rose has loudly denigrated for D-Wave’s entire history!  If that’s what happens, then I predict that science writers, and blogs like “nextbigfuture,” will announce from megaphones that D-Wave has been vindicated at last, while its narrow-minded, theorem-obsessed, ivory-tower academic naysayers now have egg all over their faces.  No one will care that the path to success—through quantum error-correction and so on—actually proved the academic critics right, and that D-Wave’s “vindication” was precisely like that of the deliciousness of stone soup in the old folktale.  As for myself, I’ll probably bang my head on my desk until I sustain so much brain damage that I no longer care either.  But at least I’ll still have tenure, and the world will have quantum computers.

The Messiah’s Quantum Annealer

Over the past few days, I’ve explained the above to at least six different journalists who asked.  And I’ve repeatedly gotten a striking response: “What you say makes sense—but then why are all these prestigious people and companies investing in D-Wave?  Why did Bo Ewald, a prominent Silicon Valley insider, recently join D-Wave as president of its US operations?  Why the deal with Lockheed Martin?  Why the huge deal with NASA and Google, just announced today?  What’s your reaction to all this news?”

My reaction, I confess, is simple.  I don’t care—I actually told them this—if the former Pope Benedict has ended his retirement to become D-Wave’s new marketing director.  I don’t care if the Messiah has come to Earth on a flaming chariot, not to usher in an age of peace but simply to spend $10 million on D-Wave’s new Vesuvius chip. And if you imagine that I’ll ever care about such things, then you obviously don’t know much about me. I’ll tell you what: if peer pressure is where it’s at, then come to me with the news that Umesh Vazirani, or Greg Kuperberg, or Matthias Troyer is now convinced, based on the latest evidence, that D-Wave’s chip asymptotically outperforms simulated annealing in a fair comparison, and does so because of quantum effects. Any one such scientist’s considered opinion would mean more to me than 500,000 business deals. The Argument from Consequences Let me end this post with an argument that several of my friends in physics have explicitly made to me—not in the exact words below but in similar ones. “Look, Scott, let the investors, government bureaucrats, and gullible laypeople believe whatever they want—and let D-Wave keep telling them whatever’s necessary to stay in business. It’s unsportsmanlike and uncollegial of you to hold D-Wave’s scientists accountable for whatever wild claims their company’s PR department might make. After all, we’re in this game too! Our universities put out all sorts of overhyped press releases, but we don’t complain because we know that it’s done for our benefit. Besides, you’d doubtless be trumpeting the same misleading claims, if you were in D-Wave’s shoes and needed the cash infusions to survive. Anyway, who really cares whether there’s a quantum speedup yet or no quantum speedup? At least D-Wave is out there trying to build a scalable quantum computer, and getting millions of dollars from Jeff Bezos, Lockheed, Google, the CIA, etc. etc. to do so—resources more of which would be directed our way if we showed a more cooperative attitude! If we care about scalable QCs ever getting built, then the wise course is to celebrate what D-Wave has done—they just demonstrated quantum annealing on 100 qubits, for crying out loud! So let’s all be grownups here, focus on the science, and ignore the marketing buzz as so much meaningless noise—just like a tennis player might ignore his opponent’s trash-talking (‘your mother is a whore,’ etc.) and focus on the game.” I get this argument: really, I do. I even concede that there’s something to be said for it. But let me now offer a contrary argument for the reader’s consideration. Suppose that, unlike in the “stone soup” scenario I outlined above, it eventually becomes clear that quantum annealing can be made to work on thousands of qubits, but that it’s a dead end as far as getting a quantum speedup is concerned. Suppose the evidence piles up that simulated annealing on a conventional computer will continue to beat quantum annealing, if even the slightest effort is put into optimizing the classical annealing code. If that happens, then I predict that the very same people now hyping D-Wave will turn around and—without the slightest acknowledgment of error on their part—declare that the entire field of quantum computing has now been unmasked as a mirage, a scam, and a chimera. The same pointy-haired bosses who now flock toward quantum computing, will flock away from it just as quickly and as uncomprehendingly. Academic QC programs will be decimated, despite the slow but genuine progress that they’d been making the entire time in a “parallel universe” from D-Wave. People’s contempt for academia is such that, while a D-Wave success would be trumpeted as its alone, a D-Wave failure would be blamed on the entire QC community. When it comes down to it, that’s the reason why I care about this matter enough to have served as “Chief D-Wave Skeptic” from 2007 to 2011, and enough to resume my post today. As I’ve said many times, I really, genuinely hope that D-Wave succeeds at building a QC that achieves an unambiguous speedup! I even hope the academic QC community will contribute to D-Wave’s success, by doing careful independent studies like the USC group did, and by coming up with proposals like Lidar’s for how D-Wave could move forward. On the other hand, in the strange, unlikely event that D-Wave doesn’t succeed, I’d like people to know that many of us in the QC community were doing what academics are supposed to do, which is to be skeptical and not leave obvious questions unasked. I’d like them to know that some of us simply tried to understand and describe what we saw in front of us—changing our opinions repeatedly as new evidence came in, but disregarding “meta-arguments” like my physicist friends’ above. The reason I can joke about how easy it is to bribe me is that it’s actually kind of hard. ### “Closer to Truth” Wednesday, May 1st, 2013 Two years ago, when I attended the FQXi conference on a ship from Norway to Denmark, I (along with many other conference participants) was interviewed by Robert Lawrence Kuhn, who produces a late-night TV program called “Closer to Truth.” I’m pleased to announce (hat tip: Sean Carroll) that four videos from my interview are finally available online: • Is the Universe a Computer? • (like a politician, I steer the question toward “what kind of computer is the universe?,” then start talking about P vs. NP, quantum computing, and the holographic principle) • What Does Quantum Theory Mean? • (here I mostly talk about the idea of computational intractability as a principle of physics) • Quantum Computing Mysteries • (basics of quantum mechanics and quantum computing) • Setting Time Aright (about the differences between time and space, the P vs. PSPACE problem, and computing with closed timelike curves) (No, I didn’t choose the titles!) For regular readers of this blog, there’s probably nothing new in these videos, but for those who are “just tuning in,” they provide an extremely simple and concise introduction to what I care about and why. I’m pretty happy with how they came out. Once you’re finished with me (or maybe even before then…), click here for the full list of interviewees, which includes David Albert, Raphael Bousso, Sean Carroll, David Deutsch, Rebecca Goldstein, Seth Lloyd, Marvin Minsky, Roger Penrose, Lenny Susskind, Steven Weinberg, and many, many others who might be of interest to Shtetl-Optimized readers. ### Superiority of the Latke: The Unexpected Convergence of Quantum Mechanics and Common Sense Friday, April 26th, 2013 Back in February, I gave a talk with the above title at the Annual MIT Latke-Hamentaschen Debate. I’m pleased to announce that streaming video of my talk is now available! (My segment starts about 10 minutes into the video, and lasts for 10 minutes.) You can also download my PowerPoint slides here. Out of hundreds of talks I’ve given in my life, on five continents, this is the single talk of which I’m the proudest. Of course, before you form an opinion about the issue at hand, you should also check out the contributions of my fellow debaters. On the sadly-mistaken hamentasch side, my favorite presentation was that of mathematician Arthur Mattuck, which starts in at 56 minutes and lasts for a full half hour (!! – the allotted time was only 8 minutes). Mattuck relates the shapes of latkes and hamentaschen to the famous Kakeya problem in measure theory—though strangely, his final conclusions seem to provide no support whatsoever for the hamentaschen, even on Mattuck’s own terms. Finally, what if you’re a reader for whom the very words “latke” and “hamentaschen” are just as incomprehensible as the title of this blog? OK, here are some Cliff Notes: • Latkes are fried potato pancakes, traditionally eaten by Jews on Hannukah. • Hamentaschen are triangular fruit-filled cookies, traditionally eaten by Jews on Purim. • Beginning at the University of Chicago in 1946, many universities around the world have held farcical annual “debates” between faculty members (both Jewish and non-Jewish) about which of those two foods is better. (The reason I say “farcical” is simply that, as I explain in my talk, the truth has always been overwhelmingly on one side.) The debaters have invoked everything from feminist theory to particle physics to bolster their case. Thanks very much to Dean of Admissions Stu Schmill for moderating, and to MIT Hillel for organizing the debate. Update: Luboš has a new blog post announcing that he finally found a chapter in Quantum Computing Since Democritus that he likes! Woohoo! Whether coincidentally or not, the chapter he likes makes exactly the same points about quantum mechanics that I also make in my pro-latke presentation. ### QStart conference in Jerusalem, June 24-27 Sunday, April 14th, 2013 Friend-of-the-blog Dorit Aharonov asked me to advertise the QStart Conference, which will be held at Hebrew University of Jerusalem June 24-27 of this year, to celebrate the opening of Hebrew University’s new Quantum Information Science Center. Speakers include Yakir Aharonov, Jacob Bekenstein, Hans Briegel, Ed Farhi, Patrick Hayden, Ray Laflamme, Elon Lindenstrauss, Alex Lubotzky, John Martinis, Barbara Terhal, Umesh Vazirani, Stephanie Wehner, Andrew Yao … and me, your humble blogger (who will actually be there with Lily, on her first trip abroad—or for that matter, beyond the Boston metropolitan area). Dorit tells me that the conference should be of interest to mathematicians, physicists, chemists, philosophers, and computer scientists; that registration is open now; and that student travel support is available. Oh, and if you’re one of the people who think quantum computing is bunk? As displayed on the poster above, leading QC skeptic Gil Kalai is a co-organizer of the conference. ### “So You Think Quantum Computing Is Bunk?” Friday, April 12th, 2013 On Wednesday, I gave a fun talk with that title down the street at Microsoft Research New England. Disappointingly, no one in the audience did seem to think quantum computing was bunk (or if they did, they didn’t speak up): I was basically preaching to the choir. My PowerPoint slides are here. There’s also a streaming video here, but watch it at your own risk—my stuttering and other nerdy mannerisms seemed particularly bad, at least in the short initial segment that I listened to. I really need media training. Anyway, thanks very much to Boaz Barak for inviting me. ### Quantum Computing Since Democritus: The Buzz Intensifies Thursday, March 21st, 2013 Update (March 22): The Kindle edition of Quantum Computing Since Democritus is now available, for the low price of$15.40!  (Not factorial.)  Click here to get it from amazon.com, or here to get it from amazon.co.uk.  And let me know how it looks (I haven’t seen it yet).  Another Update: Just saw the Kindle edition, and the figures and formulas came out great!  It’s a product I stand behind with pride.

In the meantime, I regret to say that the marketing for this book is getting crasser and more exploitative by the day.

It seems like wherever I go these days, all anyone wants to talk about is Quantum Computing Since Democritus—the sprawling new book by Scott Aaronson, published by Cambridge University Press and available for order now.  Among leading figures in quantum information science—many of them well-known to Shtetl-Optimized readers—the book is garnering the sort of hyperbolic praise that would make Shakespeare or Tolstoy blush:

“I laughed, I cried, I fell off my chair – and that was just reading the chapter on Computational Complexity.  Aaronson is a tornado of intellectual activity: he rips our brains from their intellectual foundations; twists them through a tour of physics, mathematics, computer science, and philosophy; stuffs them full of facts and theorems; tickles them until they cry ‘Uncle’; and then drops them, quivering, back into our skulls.  Aaronson raises deep questions of how the physical universe is put together and why it is put together the way it is.  While we read his lucid explanations we can believe – at least while we hold the book in our hands – that we understand the answers, too.” –Seth Lloyd

“Scott Aaronson has written a beautiful and highly original synthesis of what we know about some of the most fundamental questions in science: What is information? What does it mean to compute? What is the nature of mind and of free will?” –Michael Nielsen

“Not since Richard Feynman’s Lectures on Physics has there been a set of lecture notes as brilliant and as entertaining.  Aaronson leads the reader on a wild romp through the most important intellectual achievements in computing and physics, weaving these seemingly disparate fields into a captivating narrative for our modern age of information.  Aaronson wildly runs through the fields of physics and computers, showing us how they are connected, how to understand our computational universe, and what questions exist on the borders of these fields that we still don’t understand.   This book is a poem disguised as a set of lecture notes.  The lectures are on computing and physics, complexity theory and mathematical logic and quantum physics.  The poem is made up of proofs, jokes, stories, and revelations, synthesizing the two towering fields of computer science and physics into a coherent tapestry of sheer intellectual awesomeness.” –Dave Bacon

After months of overhearing people saying things like the above—in the halls of MIT, the checkout line at Trader Joe’s, the bathroom, anywhere—I finally had to ask in annoyance: “is all this buzz justified?  I mean, I’m sure the book is as deep, hilarious, and worldview-changing as everyone says it is.  But, after all, it’s based off lecture notes that have long been available for free on the web.  And Aaronson, being the magnanimous, open-access-loving saint that he is, has no plans to remove the online notes, even though he could really use the royalties from book sales to feed his growing family.  Nor does Cambridge University Press object to his principled decision.”

“No, you don’t understand,” they told me.  “Word on the street has it that the book is extensively updated for 2013—that it’s packed with new discussions of things like algebrization, lattice-based cryptography, the QIP=PSPACE theorem, the ‘quantum time travel controversy,’ BosonSampling, black-hole firewalls, and even the Australian models episode.  They say it took years of painstaking work, by Aaronson and his student Alex Arkhipov, to get the notes into book form: fixing mistakes, clarifying difficult points, smoothing out rough edges, all while leaving intact the original’s inimitable humor.  I even heard Aaronson reveals he’s changed his mind about certain things since 2006.  How could you not want such a labor of love on your bookshelf?”

Exasperated, I finally exclaimed: “But the book isn’t even out yet in North America!  Amazon.com says it won’t ship until April 30.”

“Sure,” one gas-station attendant replied to me, “but the secret is, it’s available now from Amazon.co.uk.  Personally, I couldn’t wait a month, so I ordered it shipped to me from across the pond.  But if you’re a less hardcore quantum complexity theory fan, and you live in North America, you can also preorder the book from Amazon.com, and they’ll send it to you when it arrives.”

Much as the hype still grated, I had to admit that I’d run out of counterarguments, so I looked into ordering a copy for myself.