## Archive for the ‘Quantum’ Category

### Yet more errors in papers

Wednesday, May 24th, 2017

Following up on my posts PostBQP Postscripts and More Wrong Things I Said In Papers, it felt like time for another post in which I publicly flog myself for mistakes in my research papers.  [Warning: The rest of this post is kinda, sorta technical.  Read at your own risk.]

(1) In my 2006 paper “Oracles are subtle but not malicious,” I claimed to show that if PP is contained in BQP/qpoly, then the counting hierarchy collapses to QMA (Theorem 5).  But on further reflection, I only know how to show a collapse of the counting hierarchy under the stronger assumption that PP is in BQP/poly.  If PP is in BQP/qpoly, then certainly P#P=PP=QMA, but I don’t know how to collapse any further levels of the counting hierarchy.  The issue is this: in QMA, we can indeed nondeterministically guess an (amplified) quantum advice state for a BQP/qpoly algorithm.  We can then verify that the advice state works to solve PP problems, by using (for example) the interactive protocol for the permanent, or some other #P-complete problem.  But having done that, how do we then unravel the higher levels of the counting hierarchy?  For example, how do we simulate PPPP in PPBQP=PP?  We don’t have any mechanism to pass the quantum advice up to the oracle PP machine, since queries to a PP oracle are by definition classical strings.  We could try to use tools from my later paper with Andy Drucker, passing a classical description of the quantum advice up to the oracle and then using the description to reconstruct the advice for ourselves.  But doing so just doesn’t seem to give us a complexity class that’s low for PP, which is what would be needed to unravel the counting hierarchy.  I still think this result might be recoverable, but a new idea is needed.

(2) In my 2008 algebrization paper with Avi Wigderson, one of the most surprising things we showed was a general connection between communication complexity lower bounds and algebraic query complexity lower bounds.  Specifically, given a Boolean oracle A:{0,1}n→{0,1}, let ~A be a low-degree extension of A over a finite field F (that is, ~A(x)=A(x) whenever x∈{0,1}n).  Then suppose we have an algorithm that’s able to learn some property of A, by making k black-box queries to ~A.  We observed that, in such a case, if Alice is given the top half of the truth table of A, and Bob is given the bottom half of the truth table, then there’s necessarily a communication protocol by which Alice and Bob can learn the same property of A, by exchanging at most O(kn log|F|) bits.  This connection is extremely model-independent: a randomized query algorithm gives rise to a randomized communication protocol, a quantum query algorithm gives rise to a quantum communication protocol, etc. etc.  The upshot is that, if you want to lower-bound the number of queries that an algorithm needs to make to the algebraic extension oracle ~A, in order to learn something about A, then it suffices to prove a suitable communication complexity lower bound.  And the latter, unlike algebraic query complexity, is a well-studied subject with countless results that one can take off the shelf.  We illustrated how one could use this connection to prove, for example, that there exists an oracle A such that NPA ⊄ BQP~A, for any low-degree extension ~A of A—a separation that we didn’t and don’t know how to prove any other way. Likewise, there exists an oracle B such that BQPB ⊄ BPP~B for any low-degree extension ~B of B.

The trouble is, our “proof sketches” for these separations (in Theorem 5.11) are inadequate, even for “sketches.”  They can often be fixed, but only by appealing to special properties of the communication complexity separations in question, properties that don’t necessarily hold for an arbitrary communication separation between two arbitrary models.

The issue is this: while it’s true, as we claimed, that a communication complexity lower bound implies an algebraic query complexity lower bound, it’s not true in general that a communication complexity upper bound implies an algebraic query complexity upper bound!  So, from a communication separation between models C and D, we certainly obtain a query complexity problem that’s not in D~A, but then the problem might not be in CA.  What tripped us up was that, in the cases we had in mind (e.g. Disjointness), it’s obvious that the query problem is in CA.  In other cases, however, such as Raz’s separation between quantum and randomized communication complexity, it probably isn’t even true.  In the latter case, to recover the desired conclusion about algebraic query complexity (namely, the existence of an oracle B such that BQPB ⊄ BPP~B), what seems to be needed is to start from a later quantum vs. classical communication complexity separation due to Klartag and Regev, and then convert their communication problem into a query problem using a recent approach by myself and Shalev Ben-David (see Section 4).  Unfortunately, my and Shalev’s approach only tells us nonconstructively that there exists a query problem with the desired separation, with no upper bound on the gate complexity of the quantum algorithm.  So strictly speaking, I still don’t know how to get a separation between the relativized complexity classes BQPB and BPP~B defined in terms of Turing machines.

In any case, I of course should have realized this issue with the algebrization paper the moment Shalev and I encountered the same issue when writing our later paper.  Let me acknowledge Shalev, as well as Robin Kothari, for helping to spur my realization of the issue.

In case it wasn’t clear, the mistakes I’ve detailed here have no effect on the main results of the papers in question (e.g., the existence of an oracle relative to which PP has linear-sized circuits; the existence and pervasiveness of the algebrization barrier).  The effect is entirely on various “bonus” results—results that, because they’re bonus, were gone over much less carefully by authors and reviewers alike.

Nevertheless, I’ve always felt like in science, the louder you are about your own mistakes, the better.  Hence this post.

### This Week’s BS

Friday, May 5th, 2017

First, a group in Shanghai, led by Chaoyang Lu and Jianwei Pan, has reported in Nature Photonics that they can do BosonSampling with a coincidence rate that’s higher than in previous experiments by a factor of several thousand.  This, in particular, lets them do BosonSampling with 5 photons.  Now, 5 might not sound like that much, especially since the group in Bristol previously did 6-photon BosonSampling.  But to make their experiment work, the Bristol group needed to start its photons in the initial state |3,3〉: that is, two modes with 3 photons each.  This gives rise to matrices with repeated rows, whose permanents are much easier to calculate than the permanents of arbitrary matrices.  By contrast, the Shangai group starts its photons in the “true BosonSampling initial state” |1,1,1,1,1〉: that is, five modes with 1 photon each.  That’s the kind of initial state we ultimately want.

The second piece of news is that on Monday, a group at Bristol—overlapping with the group we mentioned before—submitted a preprint to the arXiv with the provocative title “No imminent quantum supremacy by boson sampling.”  In this paper, they give numerical evidence that BosonSampling, with n photons and m modes, can be approximately simulated by a classical computer in “merely” about n2n time (that is, the time needed to calculate a single n×n permanent), as opposed to the roughly mn time that one would need if one had to calculate permanents corresponding to all the possible outcomes of the experiment.  As a consequence of that, they argue that achieving quantum supremacy via BosonSampling would probably require at least ~50 photons—which would in turn require a “step change” in technology, as they put it.

I completely agree with the Bristol group’s view of the asymptotics.  In fact, Alex Arkhipov and I ourselves repeatedly told experimentalists, in our papers and talks about BosonSampling (the question came up often…), that the classical complexity of the problem should only be taken to scale like 2n, rather than like mn.  Despite not having a general proof that the problem could actually be solved in ~2n time in the worst case, we said that for two main reasons:

1. Even under the most optimistic assumptions, our hardness reductions, from Gaussian permanent estimation and so forth, only yielded ~2n hardness, not ~mn hardness.  (Hardness reductions giving us important clues about the real world?  Whuda thunk??)
2. If our BosonSampling matrix is Haar-random—or otherwise not too skewed to produce outcomes with huge probabilities—then it’s not hard to see that we can do approximate BosonSampling in O(n2n) time classically, by using rejection sampling.

Indeed, Alex and I insisted on these points despite some pushback from experimentalists, who were understandably hoping that they could get to quantum supremacy just by upping m, the number of modes, without needing to do anything heroic with n, the number of photons!  So I’m happy to see that a more careful analysis supports the guess that Alex and I made.

On the other hand, what does this mean for the number of photons needed for “quantum supremacy”: is it 20? 30? 50?  I confess that that sort of question interests me much less, since it all depends on the details of how you define the comparison (are we comparing against ENIAC? a laptop? a server farm? how many cores? etc etc).  As I’ve often said, my real hope with quantum supremacy is to see a quantum advantage that’s so overwhelming—so duh-obvious to the naked eye—that we don’t have to squint or argue about the definitions.

### Insert D-Wave Post Here

Thursday, March 16th, 2017

In the two months since I last blogged, the US has continued its descent into madness.  Yet even while so many certainties have proven ephemeral as the morning dew—the US’s autonomy from Russia, the sanity of our nuclear chain of command, the outcome of our Civil War, the constraints on rulers that supposedly set us apart from the world’s dictator-run hellholes—I’ve learned that certain facts of life remain constant.

The moon still waxes and wanes.  Electrons remain bound to their nuclei.  P≠NP proofs still fill my inbox.  Squirrels still gather acorns.  And—of course!—people continue to claim big quantum speedups using D-Wave devices, and those claims still require careful scrutiny.

With that preamble, I hereby offer you eight quantum computing news items.

Cathy McGeoch Episode II: The Selby Comparison

On January 17, a group from D-Wave—including Cathy McGeoch, who now works directly for D-Wave—put out a preprint claiming a factor-of-2500 speedup for the D-Wave machine (the new, 2000-qubit one) compared to the best classical algorithms.  Notably, they wrote that the speedup persisted when they compared against simulated annealing, quantum Monte Carlo, and even the so-called Hamze-de Freitas-Selby (HFS) algorithm, which was often the classical victor in previous performance comparisons against the D-Wave machine.

Reading this, I was happy to see how far the discussion has advanced since 2013, when McGeoch and Cong Wang reported a factor-of-3600 speedup for the D-Wave machine, but then it turned out that they’d compared only against classical exact solvers rather than heuristics—a choice for which they were heavily criticized on this blog and elsewhere.  (And indeed, that particular speedup disappeared once the classical computer’s shackles were removed.)

So, when people asked me this January about the new speedup claim—the one even against the HFS algorithm—I replied that, even though we’ve by now been around this carousel several times, I felt like the ball was now firmly in the D-Wave skeptics’ court, to reproduce the observed performance classically.  And if, after a year or so, no one could, that would be a good time to start taking seriously that a D-Wave speedup might finally be here to stay—and to move on to the next question, of whether this speedup had anything to do with quantum computation, or only with the building of a piece of special-purpose optimization hardware.

A&M: Annealing and Matching

As it happened, it only took one month.  On March 2, Salvatore Mandrà, Helmut Katzgraber, and Creighton Thomas put up a response preprint, pointing out that the instances studied by the D-Wave group in their most recent comparison are actually reducible to the minimum-weight perfect matching problem—and for that reason, are solvable in polynomial time on a classical computer.   Much of Mandrà et al.’s paper just consists of graphs, wherein they plot the running times of the D-Wave machine and of a classical heuristic on the relevant instances—clearly all different flavors of exponential—and then Edmonds’ matching algorithm from the 1960s, which breaks away from the pack into polynomiality.

But let me bend over backwards to tell you the full story.  Last week, I had the privilege of visiting Texas A&M to give a talk.  While there, I got to meet Helmut Katzgraber, a condensed-matter physicist who’s one of the world experts on quantum annealing experiments, to talk to him about their new response paper.  Helmut was clear in his prediction that, with only small modifications to the instances considered, one could see similar performance by the D-Wave machine while avoiding the reduction to perfect matching.  With those future modifications, it’s possible that one really might see a D-Wave speedup that survived serious attempts by skeptics to make it go away.

But Helmut was equally clear in saying that, even in such a case, he sees no evidence at present that the speedup would be asymptotic or quantum-computational in nature.  In other words, he thinks the existing data is well explained by the observation that we’re comparing D-Wave against classical algorithms for Ising spin minimization problems on Chimera graphs, and D-Wave has heroically engineered an expensive piece of hardware specifically for Ising spin minimization problems on Chimera graphs and basically nothing else.  If so, then the prediction would be that such speedups as can be found are unlikely to extend either to more “practical” optimization problems—which need to be embedded into the Chimera graph with considerable losses—or to better scaling behavior on large instances.  (As usual, as long as the comparison is against the best classical algorithms, and as long as we grant the classical algorithm the same non-quantum advantages that the D-Wave machine enjoys, such as classical parallelism—as Rønnow et al advocated.)

Incidentally, my visit to Texas A&M was partly an “apology tour.”  When I announced on this blog that I was moving from MIT to UT Austin, I talked about the challenge and excitement of setting up a quantum computing research center in a place that currently had little quantum computing for hundreds of miles around.  This thoughtless remark inexcusably left out not only my friends at Louisiana State (like Jon Dowling and Mark Wilde), but even closer to home, Katzgraber and the others at Texas A&M.  I felt terrible about this for months.  So it gives me special satisfaction to have the opportunity to call out Katzgraber’s new work in this post.  In football, UT and A&M were longtime arch-rivals, but when it comes to the appropriate level of skepticism to apply to quantum supremacy claims, the Texas Republic seems remarkably unified.

When 15 MilliKelvin is Toasty

In other D-Wave-related scientific news, on Monday night Tameem Albash, Victor Martin-Mayer, and Itay Hen put out a preprint arguing that, in order for quantum annealing to have any real chance of yielding a speedup over classical optimization methods, the temperature of the annealer should decrease at least like 1/log(n), where n is the instance size, and more likely like 1/nβ (i.e., as an inverse power law).

If this is correct, then cold as the D-Wave machine is, at 0.015 degrees or whatever above absolute zero, it still wouldn’t be cold enough to see a scalable speedup, at least not without quantum fault-tolerance, something that D-Wave has so far eschewed.  With no error-correction, any constant temperature that’s above zero would cause dangerous level-crossings up to excited states when the instances get large enough.  Only a temperature that actually converged to zero as the problems got larger would suffice.

Over the last few years, I’ve heard many experts make this exact same point in conversation, but this is the first time I’ve seen the argument spelled out in a paper, with explicit calculations (modulo assumptions) of the rate at which the temperature would need to go to zero for uncorrected quantum annealing to be a viable path to a speedup.  I lack the expertise to evaluate the calculations myself, but any experts who’d like to share their insight in the comments section are “warmly” (har har) invited.

“Their Current Numbers Are Still To Be Checked”

As some of you will have seen, The Economist now has a sprawling 10-page cover story about quantum computing and other quantum technologies.  I had some contact with the author while the story was in the works.

The piece covers a lot of ground and contains many true statements.  It could be much worse.

But I take issue with two things.

First, The Economist claims: “What is notable about the effort [to build scalable QCs] now is that the challenges are no longer scientific but have become matters of engineering.”  As John Preskill and others pointed out, this is pretty far from true, at least if we interpret the claim in the way most engineers and businesspeople would.

Yes, we know the rules of quantum mechanics, and the theory of quantum fault-tolerance, and a few promising applications; and the basic building blocks of QC have already been demonstrated in several platforms.  But if (let’s say) someone were to pony up \$100 billion, asking only for a universal quantum computer as soon as possible, I think the rational thing to do would be to spend initially on a frenzy of basic research: should we bet on superconducting qubits, trapped ions, nonabelian anyons, photonics, a combination thereof, or something else?  (Even that is far from settled.)  Can we invent better error-correcting codes and magic state distillation schemes, in order to push the resource requirements for universal QC down by three or four orders of magnitude?  Which decoherence mechanisms will be relevant when we try to do this stuff at scale?  And of course, which new quantum algorithms can we discover, and which new cryptographic codes resistant to quantum attack?

The second statement I take issue with is this:

“For years experts questioned whether the [D-Wave] devices were actually exploiting quantum mechanics and whether they worked better than traditional computers.  Those questions have since been conclusively answered—yes, and sometimes”

1. depends on what you mean by “exploit” (yes, there are quantum tunneling effects, but do they help you solve problems faster?), and
2. no, the evidence remains weak to nonexistent that the D-Wave machine solves anything faster than a traditional computer—certainly if, by “traditional computer,” we mean a device that gets all the advantages of the D-Wave machine (e.g., classical parallelism, hardware heroically specialized to the one type of problem we’re testing on), but no quantum effects.

Shortly afterward, when discussing the race to achieve “quantum supremacy” (i.e., a clear quantum computing speedup for some task, not necessarily a useful one), the Economist piece hedges: “D-Wave has hinted it has already [achieved quantum supremacy], but has made similar claims in the past; their current numbers are still to be checked.”

To me, “their current numbers are still to be checked” deserves its place alongside “mistakes were made” among the great understatements of the English language—perhaps a fitting honor for The Economist.

Defeat Device

Some of you might also have seen that D-Wave announced a deal with Volkswagen, to use D-Wave machines for traffic flow.  I had some advance warning of this deal, when reporters called asking me to comment on it.  At least in the materials I saw, no evidence is discussed that the D-Wave machine actually solves whatever problem VW is interested in faster than it could be solved with a classical computer.  Indeed, in a pattern we’ve seen repeatedly for the past decade, the question of such evidence is never even directly confronted or acknowledged.

So I guess I’ll say the same thing here that I said to the journalists.  Namely, until there’s a paper or some other technical information, obviously there’s not much I can say about this D-Wave/Volkswagen collaboration.  But it would be astonishing if quantum supremacy were to be achieved on an application problem of interest to a carmaker, even as scientists struggle to achieve that milestone on contrived and artificial benchmarks, even as the milestone seems repeatedly to elude D-Wave itself on contrived and artificial benchmarks.  In the previous such partnerships—such as that with Lockheed Martin—we can reasonably guess that no convincing evidence for quantum supremacy was found, because if it had been, it would’ve been trumpeted from the rooftops.

Anyway, I confess that I couldn’t resist adding a tiny snark—something about how, if these claims of amazing performance were found not to withstand an examination of the details, it would not be the first time in Volkswagen’s recent history.

Farewell to a Visionary Leader—One Who Was Trash-Talking Critics on Social Media A Decade Before President Trump

This isn’t really news, but since it happened since my last D-Wave post, I figured I should share.  Apparently D-Wave’s outspoken and inimitable founder, Geordie Rose, left D-Wave to form a machine-learning startup (see D-Wave’s leadership page, where Rose is absent).  I wish Geordie the best with his new venture.

Martinis Visits UT Austin

On Feb. 22, we were privileged to have John Martinis of Google visit UT Austin for a day and give the physics colloquium.  Martinis concentrated on the quest to achieve quantum supremacy, in the near future, using sampling problems inspired by theoretical proposals such as BosonSampling and IQP, but tailored to Google’s architecture.  He elaborated on Google’s plan to build a 49-qubit device within the next few years: basically, a 7×7 square array of superconducting qubits with controllable nearest-neighbor couplings.  To a layperson, 49 qubits might sound unimpressive compared to D-Wave’s 2000—but the point is that these qubits will hopefully maintain coherence times thousands of times longer than the D-Wave qubits, and will also support arbitrary quantum computations (rather than only annealing).  Obviously I don’t know whether Google will succeed in its announced plan, but if it does, I’m very optimistic about a convincing quantum supremacy demonstration being possible with this sort of device.

Perhaps most memorably, Martinis unveiled some spectacular data, which showed near-perfect agreement between Google’s earlier 9-qubit quantum computer and the theoretical predictions for a simulation of the Hofstadter butterfly (incidentally invented by Douglas Hofstadter, of Gödel, Escher, Bach fame, when he was still a physics graduate student).  My colleague Andrew Potter explained to me that the Hofstadter butterfly can’t be used to show quantum supremacy, because it’s mathematically equivalent to a system of non-interacting fermions, and can therefore be simulated in classical polynomial time.  But it’s certainly an impressive calibration test for Google’s device.

2000 Qubits Are Easy, 50 Qubits Are Hard

Just like the Google group, IBM has also publicly set itself the ambitious goal of building a 50-qubit superconducting quantum computer in the near future (i.e., the next few years).  Here in Austin, IBM held a quantum computing session at South by Southwest, so I went—my first exposure of any kind to SXSW.  There were 10 or 15 people in the audience; the purpose of the presentation was to walk through the use of the IBM Quantum Experience in designing 5-qubit quantum circuits and submitting them first to a simulator and then to IBM’s actual superconducting device.  (To the end user, of course, the real machine differs from the simulation only in that with the former, you can see the exact effects of decoherence.)  Afterward, I chatted with the presenters, who were extremely friendly and knowledgeable, and relieved (they said) that I found nothing substantial to criticize in their summary of quantum computing.

Hope everyone had a great Pi Day and Ides of March.

### “THE TALK”: My quantum computing cartoon with Zach Weinersmith

Wednesday, December 14th, 2016

OK, here’s the big entrée that I promised you yesterday:

“THE TALK”: My joint cartoon about quantum comgputing with Zach Weinersmith of SMBC Comics.

In case you’re wondering how this came about: after our mutual friend Sean Carroll introduced me and Zach for a different reason, the idea of a joint quantum computing comic just seemed too good to pass up.  The basic premise—“The Talk”—was all Zach.  I dutifully drafted some dialogue for him, which he then improved and illustrated.  I.e., he did almost all the work (despite having a newborn competing for his attention!).  Still, it was an honor for me to collaborate with one of the great visual artists of our time, and I hope you like the result.  Beyond that, I’ll let the work speak for itself.

### Quantum computing news (98% Trump-free)

Thursday, November 24th, 2016

(1) Apparently Microsoft has decided to make a major investment in building topological quantum computers, which will include hiring Charles Marcus and Matthias Troyer among others.  See here for their blog post, and here for the New York Times piece.  In the race to implement QC among the established corporate labs, Microsoft thus joins the Martinis group at Google, as well as the IBM group at T. J. Watson—though both Google and IBM are focused on superconducting qubits, rather than the more exotic nonabelian anyon approach that Microsoft has long favored and is now doubling down on.  I don’t really know more about this new initiative beyond what’s in the articles, but I know many of the people involved, they’re some of the most serious in the business, and Microsoft intensifying its commitment to QC can only be good for the field.  I wish the new effort every success, despite being personally agnostic between superconducting qubits, trapped ions, photonics, nonabelian anyons, and other QC hardware proposals—whichever one gets there first is fine with me!

(2) For me, though, perhaps the most exciting QC development of the last month was a new preprint by my longtime friend Dorit Aharonov and her colleague Yosi Atia, entitled Fast-Forwarding of Hamiltonians and Exponentially Precise Measurements.  In this work, Dorit and Yosi wield the clarifying sword of computational complexity at one of the most historically confusing issues in quantum mechanics: namely, the so-called “time-energy uncertainty principle” (TEUP).

The TEUP says that, just as position and momentum are conjugate in quantum mechanics, so too are energy and time—with greater precision in energy corresponding to lesser precision in time and vice versa.  The trouble is, it was never 100% clear what the TEUP even meant—after all, time isn’t even an observable in quantum mechanics, just an external parameter—and, to whatever extent the TEUP did have a definite meaning, it wasn’t clear that it was true.  Indeed, as Dorit and Yosi’s paper discusses in detail, in 1961 Dorit’s uncle Yakir Aharonov, together with David Bohm, gave a counterexample to a natural interpretation of the TEUP.  But, despite this and other counterexamples, the general feeling among physicists—who, after all, are physicists!—seems to have been that some corrected version of the TEUP should hold “in all reasonable circumstances.”

But, OK, what do we mean by a “reasonable circumstance”?  This is where Dorit and Yosi come in.   In the new work, they present a compelling case that the TEUP should really be formulated as a tradeoff between the precision of energy measurements and circuit complexity (that is, the minimum number of gates needed to implement the energy measurement)—and in that amended form, the TEUP holds for exactly those Hamiltonians H that can’t be “computationally fast-forwarded.”  In other words, it holds whenever applying the unitary transformation e-iHt requires close to t computation steps, when there’s no magical shortcut that lets you simulate t steps of time evolution with only (say) log(t) steps.  And, just as the physicists handwavingly thought, that should indeed hold for “generic” Hamiltonians H (assuming BQP≠PSPACE), although it’s possible to use Shor’s algorithm, for finding the order of an element in a multiplicative group, to devise a counterexample to it.

Anyway, there’s lots of other stuff in the paper, including a connection to the stuff Lenny Susskind and I have been doing about the “generic” growth of circuit complexity, in the CFT dual of an expanding wormhole (where we also needed to assume BQP≠PSPACE and closely related complexity separations, for much the same reasons).  Congratulations to Dorit and Yosi for once again illustrating the long reach of computational complexity in physics, and for giving me a reason to be happy this month!

(3) As many of you will have seen, my former MIT colleagues, Lior Eldar and Peter Shor, recently posted an arXiv preprint claiming a bombshell result: namely, a polynomial-time quantum algorithm to solve a variant of the Closest Vector Problem in lattices.  Their claimed algorithm wouldn’t yet break lattice-based cryptography, but if the approximation factors could be improved, it would be well on the way to doing so.  This has been one of the most tempting targets for quantum algorithms research for more than twenty years—ever since Shor’s “original” algorithm laid waste to RSA, Diffie-Hellman, elliptic-curve cryptography, and more in a world with scalable quantum computers, leaving lattice-based cryptography as one of the few public-key crypto proposals still standing.

Unfortunately, Lior tells me that Oded Regev has discovered a flaw in the algorithm, which he and Peter don’t currently know how to fix.  So for now, they’re withdrawing the paper (because of the Thanksgiving holiday, the withdrawal won’t take effect on the arXiv until Monday).  It’s still a worthy attempt on a great problem—here’s hoping that they or someone else manage to, as Lior put it to me, “make the algorithm great again.”

### My 5-minute quantum computing talk at the White House

Tuesday, October 25th, 2016

(OK, technically it was in the Eisenhower Executive Office Building, which is not exactly the White House itself, but is adjacent to the West Wing in the White House complex.  And President Obama wasn’t there—maybe, like Justin Trudeau, he already knows everything about quantum computing?  But lots of people from the Office of Science and Technology Policy were!  And some of us talked with Valerie Jarrett, Obama’s adviser, when she passed us on her way to the West Wing.

The occasion was a Quantum Information Science policy workshop that OSTP held, and which the White House explicitly gave us permission to discuss on social media.  Indeed, John Preskill already tweeted photos from the event.  Besides me and Preskill, others in attendance included Umesh Vazirani, Seth Lloyd, Yaoyun Shi, Rob Schoelkopf, Krysta Svore, Hartmut Neven, Stephen Jordan…

I don’t know whether this is the first time that the polynomial hierarchy, or the notion of variation distance, were ever invoked in a speech at the White House.  But in any case, I was proud to receive a box of Hershey Kisses bearing the presidential seal.  I thought of not eating them, but then I got hungry, and realized that I can simply refill the box later if desired.

For regular readers of Shtetl-Optimized, my talk won’t have all that much that’s new, but in any case it’s short.

Incidentally, during the workshop, a guy from OSTP told me that, when he and others at the White House were asked to prepare materials about quantum computing, posts on Shtetl-Optimized (such as Shor I’ll Do It) were a huge help.  Honored though I was to have “served my country,” I winced, thinking about all the puerile doofosities I might’ve self-censored had I had any idea who might read them.  I didn’t dare ask whether anyone at the White House also reads the comment sections!

Thanks so much to all the other participants and to the organizers for a great workshop.  –SA)

Quantum Supremacy

by Scott Aaronson (UT Austin)

October 18, 2016

Thank you; it’s great to be here.  There are lots of directions that excite me enormously right now in quantum computing theory, which is what I work on.  For example, there’s the use of quantum computing to get new insight into classical computation, into condensed matter physics, and recently, even into the black hole information problem.

But since I have five minutes, I wanted to talk here about one particular direction—one that, like nothing else that I know of, bridges theory and experiment in the service of what we hope will be a spectacular result in the near future.  This direction is what’s known as “Quantum Supremacy”—John [Preskill], did you help popularize that term?  [John nods yes]—although some people have been backing away from the term recently, because of the campaign of one of the possible future occupants of this here complex.

But what quantum supremacy means to me, is demonstrating a quantum speedup for some task as confidently as possible.  Notice that I didn’t say a useful task!  I like to say that for me, the #1 application of quantum computing—more than codebreaking, machine learning, or even quantum simulation—is just disproving the people who say quantum computing is impossible!  So, quantum supremacy targets that application.

What is important for quantum supremacy is that we solve a clearly defined problem, with some relationship between inputs and outputs that’s independent of whatever hardware we’re using to solve the problem.  That’s part of why it doesn’t cut it to point to some complicated, hard-to-simulate molecule and say “aha!  quantum supremacy!”

One discovery, which I and others stumbled on 7 or 8 years ago, is that quantum supremacy seems to become much easier to demonstrate if we switch from problems with a single valid output to sampling problems: that is, problems of sampling exactly or approximately from some specified probability distribution.

Doing this has two advantages.  First, we no longer need a full, fault-tolerant quantum computer—in fact, very rudimentary types of quantum hardware appear to suffice.  Second, we can design sampling problems for which we can arguably be more confident that they really are hard for a classical computer, than we are that (say) factoring is classically hard.  I like to say that a fast classical factoring algorithm might collapse the world’s electronic commerce, but as far as we know, it wouldn’t collapse the polynomial hierarchy!  But with sampling problems, at least with exact sampling, we can often show the latter implication, which is about the best evidence you can possibly get for such a problem being hard in the present state of mathematics.

One example of these sampling tasks that we think are classically hard is BosonSampling, which Alex Arkhipov and I proposed in 2011.  BosonSampling uses a bunch of identical photons that are sent through a network of beamsplitters, then measured to count the number of photons in each output mode.  Over the past few years, this proposal has been experimentally demonstrated by quantum optics groups around the world, with the current record being a 6-photon demonstration by the O’Brien group in Bristol, UK.  A second example is the IQP (“Instantaneous Quantum Polynomial-Time”) or Commuting Hamiltonians model of Bremner, Jozsa, and Shepherd.

A third example—no doubt the simplest—is just to sample from the output distribution of a random quantum circuit, let’s say on a 2D square lattice of qubits with nearest-neighbor interactions.  Notably, this last task is one that the Martinis group at Google is working toward achieving right now, with 40-50 qubits.  They say that they’ll achieve it in as little as one or two years, which translated from experimental jargon, means maybe five years?  But not infinity years.

The challenges on the experimental side are clear: get enough qubits with long enough coherence times to achieve this.  But there are also some huge theoretical challenges remaining.

A first is, can we still solve classically hard sampling problems even in the presence of realistic experimental imperfections?  Arkhipov and I already thought about that problem—in particular, about sampling from a distribution that’s merely close in variation distance to the BosonSampling one—and got results that admittedly weren’t as satisfactory as the results for exact sampling.  But I’m delighted to say that, just within the last month or two, there have been some excellent new papers on the arXiv that tackle exactly this question, with both positive and negative results.

A second theoretical challenge is, how do we verify the results of a quantum supremacy experiment?  Note that, as far as we know today, verification could itself require classical exponential time.  But that’s not the showstopper that some people think, since we could target the “sweet spot” of 40-50 qubits, where classical verification is difficult (and in particular, clearly “costlier” than running the experiment itself), but also far from impossible with cluster computing resources.

If I have any policy advice, it’s this: recognize that a clear demonstration of quantum supremacy is at least as big a deal as (say) the discovery of the Higgs boson.  After this scientific milestone is achieved, I predict that the whole discussion of commercial applications of quantum computing will shift to a new plane, much like the Manhattan Project shifted to a new plane after Fermi built his pile under the Chicago stadium in 1942.  In other words: at this point, the most “applied” thing to do might be to set applications aside temporarily, and just achieve this quantum supremacy milestone—i.e., build the quantum computing Fermi pile—and thereby show the world that quantum computing speedups are a reality.  Thank you.

### Avi Wigderson’s “Permanent” Impact on Me

Wednesday, October 12th, 2016

The following is the lightly-edited transcript of a talk that I gave a week ago, on Wednesday October 5, at Avi Wigderson’s 60th birthday conference at the Institute for Advanced Study in Princeton.  Videos of all the talks (including mine) are now available here.

Thanks so much to Sanjeev Arora, Boaz Barak, Ran Raz, Peter Sarnak, and Amir Shpilka for organizing the conference and for inviting me to speak; to all the other participants and speakers for a great conference; and of course to Avi himself for being Avi. –SA

I apologize that I wasn’t able to prepare slides for today’s talk. But the good news is that, ever since I moved to Texas two months ago, I now carry concealed chalk everywhere I go. [Pull chalk out of pocket]

My history with Avi goes back literally half my life. I spent a semester with him at Hebrew University, and then a year with him as a postdoc here at IAS. Avi has played a bigger role in my career than just about anyone—he helped me professionally, he helped me intellectually, and once I dated and then married an Israeli theoretical computer scientist (who was also a postdoc in Avi’s group), Avi even helped me learn Hebrew. Just today, Avi taught me the Hebrew word for the permanent of a matrix. It turns out that it’s [throaty noises] pehhrmahnent.

But it all started with a talk that Avi gave in Princeton in 2000, which I attended as a prospective graduate student. That talk was about the following function of an n×n matrix A∈Rn×n, the permanent:

$$\text{Per}(A) = \sum_{\sigma \in S_n} \prod_{i=1}^n a_{i,\sigma(i)}.$$

Avi contrasted that function with a superficially similar function, the determinant:

$$\text{Det}(A) = \sum_{\sigma \in S_n} (-1)^{\text{sgn}(\sigma)} \prod_{i=1}^n a_{i,\sigma(i)}.$$

In this talk, I want to share a few of the amazing things Avi said about these two functions, and how the things he said then reverberated through my entire career.

Firstly, like we all learn in kindergarten or whatever, the determinant is computable in polynomial time, for example by using Gaussian elimination. By contrast, Valiant proved in 1979 that computing the permanent is #P-complete—which means, at least as hard as any combinatorial counting problem, a class believed to be even harder than NP-complete.

So, despite differing from each other only by some innocent-looking -1 factors, which the determinant has but the permanent lacks, these two functions effectively engage different regions of mathematics. The determinant is linear-algebraic and geometric; it’s the product of the eigenvalues and the volume of the parallelipiped defined by the row vectors. But the permanent is much more stubbornly combinatorial. It’s not quite as pervasive in mathematics as the determinant is, though it does show up. As an example, if you have a bipartite graph G, then the permanent of G’s adjacency matrix counts the number of perfect matchings in G.

When n=2, computing the permanent doesn’t look too different from computing the determinant: indeed, we have

$$\text{Per}\left( \begin{array} [c]{cc}% a & b\\ c & d \end{array} \right) =\text{Det}\left( \begin{array} [c]{cc}% a & -b\\ c & d \end{array} \right).$$

But as n gets larger, the fact that the permanent is #P-complete means that it must get exponentially harder to compute than the determinant, unless basic complexity classes collapse. And indeed, to this day, the fastest known algorithm to compute an n×n permanent, Ryser’s algorithm, takes O(n2n) time, which is only modestly better than the brute-force algorithm that just sums all n! terms.

Yet interestingly, not everything about the permanent is hard. So for example, if A is nonnegative, then in 1997, Jerrum, Sinclair, and Vigoda famously gave a polynomial-time randomized algorithm to approximate Per(A) to within a 1+ε multiplicative factor, for ε>0 as small as you like. As an even simpler example, if A is nonnegative and you just want to know whether its permanent is zero or nonzero, that’s equivalent to deciding whether a bipartite graph has at least one perfect matching. And we all know that that can be done in polynomial time.

OK, but the usual algorithm from the textbooks that puts the matching problem in the class P is already a slightly nontrivial one—maybe first grade rather than kindergarten! It involves repeatedly using depth-first search to construct augmenting paths, then modifying the graph, etc. etc.

Sixteen years ago in Princeton, the first thing Avi said that blew my mind is that there’s a vastly simpler polynomial-time algorithm to decide whether a bipartite graph has a perfect matching—or equivalently, to decide whether a nonnegative matrix has a zero or nonzero permanent. This algorithm is not quite as efficient as the textbook one, but it makes up for it by being more magical.

So here’s what you do: you start with the 0/1 adjacency matrix of your graph. I’ll do a 2×2 example, since that’s all I’ll be able to compute on the fly:

$$\left( \begin{array} [c]{cc}% 1 & 1\\ 0 & 1 \end{array} \right).$$

Then you normalize each row so it sums to 1. In the above example, this would give

$$\left( \begin{array} [c]{cc}% \frac{1}{2} & \frac{1}{2} \\ 0 & 1 \end{array} \right).$$

Next you normalize each column so it sums to 1:

$$\left( \begin{array} [c]{cc}% 1 & \frac{1}{3} \\ 0 & \frac{2}{3} \end{array} \right).$$

OK, but now the problem is that the rows are no longer normalized, so you normalize them again:

$$\left( \begin{array} [c]{cc}% \frac{3}{4} & \frac{1}{4} \\ 0 & 1 \end{array} \right).$$

Then you normalize the columns again, and so on. You repeat something like n3log(n) times. If, after that time, all the row sums and column sums have become within ±1/n of 1, then you conclude that the permanent was nonzero and the graph had a perfect matching. Otherwise, the permanent was zero and the graph had no perfect matching.

What gives? Well, it’s a nice exercise to prove why this works. I’ll just give you a sketch: first, when you multiply any row or column of a matrix by a scalar, you multiply the permanent by that same scalar. By using that fact, together with the arithmetic-geometric mean inequality, it’s possible to prove that, in every iteration but the first, when you rebalance all the rows or all the columns to sum to 1, you can’t be decreasing the permanent. The permanent increases monotonically.

Second, if the permanent is nonzero, then after the first iteration it’s at least 1/nn, simply because we started with a matrix of 0’s and 1’s.

Third, the permanent is always at most the product of the row sums or the product of the column sums, which after the first iteration is 1.

Fourth, at any iteration where there’s some row sum or column sum that’s far from 1, rescaling must not only increase the permanent, but increase it by an appreciable amount—like, a 1+1/n2 factor or so.

Putting these four observations together, we find that if the permanent is nonzero, then our scaling procedure must terminate after a polynomial number of steps, with every row sum and every column sum close to 1—since otherwise, the permanent would just keep on increasing past its upper bound of 1.

But a converse statement is also true. Suppose the matrix can be scaled so that every row sum and every column sum gets within ±1/n of 1. Then the rescaled entries define a flow through the bipartite graph, with roughly the same amount of flow through each of the 2n vertices. But if such a flow exists, then Hall’s Theorem tells you that there must be a perfect matching (and hence the permanent must be nonzero)—since if a matching didn’t exist, then there would necessarily be a bottleneck preventing the flow.

Together with Nati Linial and Alex Samorodnitsky, Avi generalized this to show that scaling the rows and columns gives you a polynomial-time algorithm to approximate the permanent of a nonnegative matrix. This basically follows from the so-called Egorychev-Falikman Theorem, which says that the permanent of a doubly stochastic matrix is at least n!/nn. The approximation ratio that you get this way, ~en, isn’t nearly as good as Jerrum-Sinclair-Vigoda’s, but the advantage is that the algorithm is deterministic (and also ridiculously simple).

For myself, though, I just filed away this idea of matrix scaling for whenever I might need it. It didn’t take long. A year after Avi’s lecture, when I was a beginning grad student at Berkeley, I was obsessing about the foundations of quantum mechanics. Specifically, I was obsessing about the fact that, when you measure a quantum state, the rules of quantum mechanics tell you how to calculate the probability that you’ll see a particular outcome. But the rules are silent about so-called multiple-time or transition probabilities. In other words: suppose we adopt Everett’s Many-Worlds view, according to which quantum mechanics needs to be applied consistently to every system, regardless of scale, so in particular, the state of the entire universe (including us) is a quantum superposition state. We perceive just one branch, but there are also these other branches where we made different choices or where different things happened to us, etc.

OK, fine: for me, that’s not the troubling part! The troubling part is that quantum mechanics rejects as meaningless questions like the following: given that you’re in this branch of the superposition at time t1, what’s the probability that you’ll be in that branch at time t2, after some unitary transformation is applied? Orthodox quantum mechanics would say: well, either someone measured you at time t1, in which case their act of measuring collapsed the superposition and created a whole new situation. Or else no one measured at t1—but in that case, your state at time t1 was the superposition state, full stop. It’s sheer metaphysics to imagine a “real you” that jumps around from one branch of the superposition to another, having a sequence of definite experiences.

Granted, in practice, branches of the universe’s superposition that split from each other tend never to rejoin, for the same thermodynamic reasons why eggs tend never to unscramble themselves. And as long as the history of the Everettian multiverse has the structure of a tree, we can sensibly define transition probabilities. But if, with some technology of the remote future, we were able to do quantum interference experiments on human brains (or other conscious entities), the rules of quantum mechanics would no longer predict what those beings should see—not even probabilistically.

I was interested in the question: suppose we just wanted to postulate transition probabilities, with the transitions taking place in some fixed orthogonal basis. What would be a mathematically reasonable way to do that? And it occurred to me that one thing you could do is the following. Suppose for simplicity that you have a pure quantum state, which is just a unit vector of n complex numbers called amplitudes:

$$\left( \begin{array} [c]{c}% \alpha_{1}\\ \vdots\\ \alpha_{n}% \end{array} \right)$$

Then the first rule of quantum mechanics says that you can apply any unitary transformation U (that is, any norm-preserving linear transformation) to map this state to a new one:

$$\left( \begin{array} [c]{c}% \beta_{1}\\ \vdots\\ \beta_{n}% \end{array} \right) =\left( \begin{array} [c]{ccc}% u_{11} & \cdots & u_{1n}\\ \vdots & \ddots & \vdots\\ u_{n1} & \cdots & u_{nn}% \end{array} \right) \left( \begin{array} [c]{c}% \alpha_{1}\\ \vdots\\ \alpha_{n}% \end{array} \right).$$

The second rule of quantum mechanics, the famous Born Rule, says that if you measure in the standard basis before applying U, then the probability that you’ll find youself in state i equals |αi|2. Likewise, if you measure in the standard basis after applying U, the probability that you’ll find youself in state j equals |βj|2.

OK, but what’s the probability that you’re in state i at the initial time, and then state j at the final time? These joint probabilities, call them pij, had better add up to |αi|2 and |βj|2, if we sum the rows and columns respectively. And ideally, they should be “derived” in some way from the unitary U—so that for example, if uij=0 then pij=0 as well.

So here’s something you could do: start by replacing each uij by its absolute value, to get a nonnegative matrix. Then, normalize the ith row so that it sums to |αi|2, for each i. Then normalize the jth column so that it sums to |βj|2, for each j. Then normalize the rows, then the columns, and keep iterating until hopefully you end up with all the rows and columns having the right sums.

So the first question I faced was, does this process converge? And I remembered what Avi taught me about the permanent. In this case, because of the nonuniform row and column scalings, the permanent no longer works as a progress measure, but there’s something else that does work. Namely, as a first step, we can use the Max-Flow/Min-Cut Theorem to show that there exists a nonnegative matrix F=(fij) such that fij=0 whenever uij=0, and also

$$\sum_j f_{ij} = \left|\alpha_i\right|^2 \forall i,\ \ \ \ \ \sum_i f_{ij} = \left|\beta_j\right|^2 \forall j.$$

Then, letting M=(mij) be our current rescaled matrix (so that initially mij:=|uij|), we use

$$\prod_{i,j : u_{ij}\ne 0} m_{ij}^{f_{ij}}$$

as our progress measure. By using the nonnegativity of the Kullback-Leibler divergence, one can prove that this quantity never decreases. So then, just like with 0/1 matrices and the permanent, we get eventual convergence, and indeed convergence after a number of iterations that’s polynomial in n.

I was pretty stoked about this until I went to the library, and discovered that Erwin Schrödinger had proposed the same matrix scaling process in 1931! And Masao Nagasawa and others then rigorously analyzed it. OK, but their motivations were somewhat different, and for some reason they never talked about finite-dimensional matrices, only infinite-dimensional ones.

I can’t resist telling you my favorite open problem about this matrix scaling process: namely, is it stable under small perturbations? In other words, if I change one of the αi‘s or uij‘s by some small ε, then do the final pij‘s also change by at most some small δ? To clarify, several people have shown me how to prove that the mapping to the pij‘s is continuous. But for computer science applications, one needs something stronger: namely that when the matrix M, and the row and column scalings, actually arise from a unitary matrix in the way above, we get strong uniform continuity, with a 1/nO(1) change to the inputs producing only a 1/nO(1) change to the outputs (and hopefully even better than that).

The more general idea that I was groping toward or reinventing here is called a hidden-variable theory, of which the most famous example is Bohmian mechanics. Again, though, Bohmian mechanics has the defect that it’s only formulated for some exotic state space that the physicists care about for some reason—a space involving pointlike objects called “particles” that move around in 3 Euclidean dimensions (why 3? why not 17?).

Anyway, this whole thing led me to wonder: under the Schrödinger scaling process, or something like it, what’s the computational complexity of sampling an entire history of the hidden variable through a quantum computation? (“If, at the moment of your death, your whole life history flashes before you in an instant, what can you then efficiently compute?”)

Clearly the complexity is at least BQP (i.e., quantum polynomial time), because even sampling where the hidden variable is at a single time is equivalent to sampling the output distribution of a quantum computer. But could the complexity be even more than BQP, because of the correlations between the hidden variable values at different times? I noticed that, indeed, sampling a hidden variable history would let you do some crazy-seeming things, like solve the Graph Isomorphism problem in polynomial time (OK, fine, that seemed more impressive at the time than it does after Babai’s breakthrough), or find collisions in arbitrary cryptographic hash functions, or more generally, solve any problem in the complexity class SZK (Statistical Zero Knowledge).

But you might ask: what evidence do we have that any these problems are hard even for garden-variety quantum computers? As many of you know, it’s widely conjectured today that NP⊄BQP—i.e., that quantum computers can’t solve NP-complete problems in polynomial time. And in the “black box” setting, where all you know how to do is query candidate solutions to your NP-complete problem to check whether they’re valid, it’s been proven that quantum computers don’t give you an exponential speedup: the best they can give is the square-root speedup of Grover’s algorithm.

But for these SZK problems, like finding collisions in hash functions, who the hell knows? So, this is the line of thought that led me to probably the most important thing I did in grad school, the so-called quantum lower bound for collision-finding. That result says that, if (again) your hash function is only accessible as a black box, then a quantum computer can provide at most a polynomial speedup over a classical computer for finding collisions in it (in this case, it turns out, at most a two-thirds power speedup). There are several reasons you might care about that, such as showing that one of the basic building blocks of modern cryptography could still be secure in a world with quantum computers, or proving an oracle separation between SZK and BQP. But my original motivation was just to understand how transition probabilities would change quantum computation.

The permanent has also shown up in a much more direct way in my work on quantum computation. If we go back to Avi’s lecture from 2000, a second thing he said that blew my mind was that apparently, or so he had heard, even the fundamental particles of the universe know something about the determinant and the permanent. In particular, he said, fermions—the matter particles, like the quarks and electrons in this stage—have transition amplitudes that are determinants of matrices. Meanwhile, bosons—the force-carrying particles, like the photons coming from the ceiling that let you see this talk—have transition amplitudes that are permanents of matrices.

Or as Steven Weinberg, one of the great physicists on earth, memorably put it in the first edition of his recent quantum mechanics textbook: “in the case of bosons, it is also a determinant, except without minus signs.” I’ve had the pleasure of getting to know Weinberg at Austin, so recently I asked him about that line. He told me that of course he knew that the determinant without minus signs is called a permanent, but he thought no one else would know! As far as he knew, the permanent was just some esoteric function used by a few quantum field theorists who needed to calculate boson amplitudes.

Briefly, the reason why the permanent and determinant turn up here is the following: whenever you have n particles that are identical, to calculate the amplitude for them to do something, you need to sum over all n! possible permutations of the particles. Furthermore, each contribution to the sum is a product of n complex numbers, one uij for each particle that hops from i to j. But there’s a difference: when you swap two identical bosons, nothing happens, and that’s why bosons give rise to the permanent (of an n×n complex matrix, if there are n bosons). By contrast, when you swap two identical fermions, the amplitude for that state of the universe gets multiplied by -1, and that’s why fermions give rise to the determinant.

Anyway, Avi ended his talk with a quip about how unfair it seemed to the bosons that they should have to work so much harder than the fermions just to calculate where they should be!

And then that one joke of Avi—that way of looking at things—rattled around in my head for a decade, like a song I couldn’t get rid of. It raised the question: wait a minute, bosons—particles that occur in Nature—are governed by a #P-complete function? Does that mean we could actually use bosons to solve #P-complete problems in polynomial time? That seems ridiculous, like the kind of nonsense I’m fighting every few weeks on my blog! As I said before, most of us don’t even expect quantum computers to be able to solve NP-complete problems in polynomial time, let alone #P-complete ones.

As it happens, Troyansky and Tishby had already taken up that puzzle in 1996. (Indeed Avi, being the social butterfly and hub node of our field that he is, had learned about the role of permaments and determinants in quantum mechanics from them.) What Troyansky and Tishby said was, it’s true that if you have a system of n identical, non-interacting bosons, their transition amplitudes are given by permanents of n×n matrices. OK, but amplitudes in quantum mechanics are not directly observable. They’re just what you use to calculate the probability that you’ll see this or that measurement outcome. But if you try to encode a hard instance of a #P-complete problem into a bosonic system, the relevant amplitudes will in general be exponentially small. And that means that, if you want a decent estimate of the permanent, you’ll need to repeat the experiment an exponential number of times. So OK, they said, nice try, but this doesn’t give you a computational advantage after all in calculating the permanent compared to classical brute force.

In our 2011 work on BosonSampling, my student Alex Arkhipov and I reopened the question. We said, not so fast. It’s true that bosons don’t seem to help you in estimating the permanent of a specific matrix of your choice. But what if your goal was just to sample a random n×n matrix A∈Cn×n, in a way that’s somehow biased toward matrices with larger permanents? Now, why would that be your goal? I have no idea! But this sampling is something that a bosonic system would easily let you do.

So, what Arkhipov and I proved was that this gives rise to a class of probability distributions that can be sampled in quantum polynomial time (indeed, by a very rudimentary type of quantum computer), but that can’t be sampled in classical polynomial time unless the polynomial hierarchy collapses to the third level. And even though you’re not solving a #P-complete problem, the #P-completeness of the permanent still plays a crucial role in explaining why the sampling problem is hard. (Basically, one proves that the probabilities are #P-hard even to approximate, but that if there were a fast classical sampling algorithm, then the probabilities could be approximated in the class BPPNP. So if a fast classical sampling algorithm existed, then P#P would equal BPPNP, which would collapse the polynomial hierarchy by Toda’s Theorem.)

When we started on this, Arkhipov and I thought about it as just pure complexity theory—conceptually clarifying what role the #P-completeness of the permanent plays in physics. But then at some point it occurred to us: bosons (such as photons) actually exist, and experimentalists in quantum optics like to play with them, so maybe they could demonstrate some of this stuff in the lab. And as it turned out, the quantum optics people were looking for something to do at the time, and they ate it up.

Over the past five years, a trend has arisen in experimental physics that goes by the name “Quantum Supremacy,” although some people are now backing away from the name because of Trump. The idea is: without yet having a universal quantum computer, can we use the hardware that we’re able to build today to demonstrate the reality of a quantum-computational speedup as clearly as possible? Not necessarily for a useful problem, but just for some problem? Of course, no experiment can prove that something is scaling polynomially rather than exponentially, since that’s an asymptotic statement. But an experiment could certainly raise the stakes for the people who deny such a statement—for example, by solving something a trillion times faster than we know how to solve it otherwise, using methods for which we don’t know a reason for them not to scale.

I like to say that for me, the #1 application of quantum computing, more than breaking RSA or even simulating physics and chemistry, is simply disproving the people who say that quantum computing is impossible! So, quantum supremacy targets that application.

Experimental BosonSampling has become a major part of the race to demonstrate quantum supremacy. By now, at least a half-dozen groups around the world have reported small-scale implementations—the record, so far, being an experiment at Bristol that used 6 photons, and experimentally confirmed that, yes, their transition amplitudes are given by permanents of 6×6 complex matrices. The challenge now is to build single-photon sources that are good enough that you could scale up to (let’s say) 30 photons, which is where you’d really start seeing a quantum advantage over the best known classical algorithms. And again, this whole quest really started with Avi’s joke.

A year after my and Arkhipov’s work, I noticed that one also can run the connection between quantum optics and the permanent in the “reverse” direction. In other words: with BosonSampling, we used the famous theorem of Valiant, that the permanent is #P-complete, to help us argue that bosons can solve hard sampling problems. But if we know by some other means that quantum optics lets us encode #P-complete problems, then we can use that to give an independent, “quantum” proof that the permanent is #P-complete in the first place! As it happens, there is another way to see why quantum optics lets us encode #P-complete problems. Namely, we can use celebrated work by Knill, Laflamme, and Milburn (KLM) from 2001, which showed how to perform universal quantum computation using quantum optics with the one additional resource of “feed-forward measurements.” With minor modifications, the construction by KLM also lets us encode a #P-complete problem into a bosonic amplitude, which we know is a permanent—thereby proving that the permanent is #P-complete, in what I personally regard as a much more intuitive way than Valiant’s original approach based on cycle covers. This illustrates a theme that we’ve seen over and over in the last 13 years or so, which is the use of quantum methods and arguments to gain insight even about classical computation.

Admittedly, I wasn’t proving anything here in classical complexity theory that wasn’t already known, just giving a different proof for an old result! Extremely recently, however, my students Daniel Grier and Luke Schaeffer have extended my argument based on quantum optics, to show that computing the permanent of a unitary or orthogonal matrix is #P-complete. (Indeed, even over finite fields of characteristic k, computing the permanent of an orthogonal matrix is a ModkP-complete problem, as long as k is not 2 or 3—which turns out to be the tight answer.) This is not a result that we previously knew by any means, whether quantum or classical.

I can’t resist telling you the biggest theoretical open problem that arose from my and Arkhipov’s work. We would like to say: even if you had a polynomial-time algorithm that sampled a probability distribution that was merely close, in variation distance, to the BosonSampling distribution, that would already imply a collapse of the polynomial hierarchy. But we’re only able to prove that assuming a certain problem is #P-complete, which no one has been able to prove #P-complete. That problem is the following:

Given an n×n matrix A, each of whose entries is an i.i.d. complex Gaussian with mean 0 and variance 1 (that is, drawn from N(0,1)C), estimate |Per(A)|2, to within additive error ±ε·n!, with probability at least 1-δ over the choice of A. Do this in time polynomial in n, 1/ε, and 1/δ.

Note that, if you care about exactly computing the permanent of a Gaussian random matrix, or about approximating the permanent of an arbitrary matrix, we know how to prove both of those problems #P-complete. The difficulty “only” arises when we combine approximation and average-case in the same problem.

At the moment, we don’t even know something more basic, which is: what’s the distribution over |Per(A)|2, when A is an n×n matrix of i.i.d. N(0,1)C Gaussians? Based on numerical evidence, we conjecture that the distribution converges to lognormal as n gets large. By using the interpretation of the determinant as the volume of a parallelipiped, we can prove that the distribution over |Det(A)|2 converges to lognormal. And the distribution over |Per(A)|2 looks almost the same when you plot it. But not surprisingly, the permanent is harder to analyze.

This brings me to my final vignette. Why would anyone even suspect that approximating the permanent of a Gaussian random matrix would be a #P-hard problem? Well, because if you look at the permanent of an n×n matrix over a large enough finite field, say Fp, that function famously has the property of random self-reducibility. This means: the ability to calculate such a permanent in polynomial time, on 90% all matrices in Fpn×n, or even for that matter on only 1% of them, implies the ability to calculate it in polynomial time on every such matrix.

The reason for this is simply that the permanent is a low-degree polynomial, and low-degree polynomials have extremely useful error-correcting properties. In particular, if you can compute such a polynomial on any large fraction of points, then you can do noisy polynomial interpolation (e.g., the Berlekamp-Welch algorithm, or list decoding), in order to get the value of the polynomial on an arbitrary point.

I don’t specifically remember Avi talking about the random self-reducibility of the permanent in his 2000 lecture, but he obviously would have talked about it! And it was really knowing about the random self-reducibility of the permanent, and how powerful it was, that let me and Alex Arkhipov to the study of BosonSampling in the first place.

In complexity theory, the random self-reducibility of the permanent is hugely important because it was sort of the spark for some of our most convincing examples of non-relativizing results—that is, results that fail relative to a suitable oracle. The most famous such result is that #P, and for that matter even PSPACE, admit interactive protocols (the IP=PSPACE theorem). In the 1970s, Baker, Gill, and Solovay pointed out that non-relativizing methods would be needed to resolve P vs. NP and many of the other great problems of the field.

In 2007, Avi and I wrote our only joint paper so far. In that paper, we decided to take a closer look at the non-relativizing results based on interactive proofs. We said: while it’s true that these results don’t relativize—that is, there are oracles relative to which they fail—nevertheless, these results hold relative to all oracles that themselves encode low-degree polynomials over finite fields (such as the permanent). So, introducing a term, Avi and I said that results like IP=PSPACE algebrize.

But then we also showed that, if you want to prove P≠NP—or for that matter, even prove circuit lower bounds that go “slightly” beyond what’s already known (such as NEXPP/poly)—you’ll need techniques that are not only non-relativizing, but also non-algebrizing. So in some sense, the properties of the permanent that are used (for example) in proving that it has an interactive protocol, just “aren’t prying the black box open wide enough.”

I have a more recent result, from 2011 or so, that I never got around to finishing a paper about. In this newer work, I decided to take another look at the question: what is it about the permanent that actually fails to relativize? And I prove the following result: relative to an arbitrary oracle A, the class #P has complete problems that are both random self-reducible and downward self-reducible (that is, reducible to smaller instances of the same problem). So, contrary to what certainly I and maybe others had often thought, it’s not the random self-reducibility of the permanent that’s the crucial thing about it. What’s important, instead, is a further property that the permanent has, of being self-checkable and self-correctible.

In other words: given (say) a noisy circuit for the permanent, it’s not just that you can correct that circuit to compute whichever low-degree polynomial it was close to computing. Rather, it’s that you can confirm that the polynomial is in fact the permanent, and nothing else.

I like the way Ketan Mulmuley thinks about this phenomenon in his Geometric Complexity Theory, which is a speculative, audacious program to try to prove that the permanent is harder than the determinant, and to tackle the other great separation questions of complexity theory (including P vs. NP), by using algebraic geometry and representation theory. Mulmuley says: the permanent is a polynomial in the entries of an n×n matrix that not only satisfies certain symmetries (e.g., under interchanging rows or columns), but is uniquely characterized by those symmetries. In other words, if you find a polynomial that passes certain tests—for example, if it behaves in the right way under rescaling and interchanging rows and columns—then that polynomial must be the permanent, or a scalar multiple of the permanent. Similarly, if you find a polynomial that passes the usual interactive proof for the permanent, that polynomial must be the permanent. I think this goes a long way toward explaining why the permanent is so special: it’s not just any hard-to-compute, low-degree polynomial; it’s one that you can recognize when you come across it.

I’ve now told you about the eventual impact of one single survey talk that Avi gave 16 years ago—not even a particularly major or important one. So you can only imagine what Avi’s impact must have been on all of us, if you integrate over all the talks he’s given and papers he’s written and young people he’s mentored and connections he’s made his entire career. May that impact be permanent.

### The No-Cloning Theorem and the Human Condition: My After-Dinner Talk at QCRYPT

Monday, September 19th, 2016

The following are the after-dinner remarks that I delivered at QCRYPT’2016, the premier quantum cryptography conference, on Thursday Sep. 15 in Washington DC.  You could compare to my after-dinner remarks at QIP’2006 to see how much I’ve “”matured”” since then. Thanks so much to Yi-Kai Liu and the other organizers for inviting me and for putting on a really fantastic conference.

It’s wonderful to be here at QCRYPT among so many friends—this is the first significant conference I’ve attended since I moved from MIT to Texas. I do, however, need to register a complaint with the organizers, which is: why wasn’t I allowed to bring my concealed firearm to the conference? You know, down in Texas, we don’t look too kindly on you academic elitists in Washington DC telling us what to do, who we can and can’t shoot and so forth. Don’t mess with Texas! As you might’ve heard, many of us Texans even support a big, beautiful, physical wall being built along our border with Mexico. Personally, though, I don’t think the wall proposal goes far enough. Forget about illegal immigration and smuggling: I don’t even want Americans and Mexicans to be able to win the CHSH game with probability exceeding 3/4. Do any of you know what kind of wall could prevent that? Maybe a metaphysical wall.

OK, but that’s not what I wanted to talk about. When Yi-Kai asked me to give an after-dinner talk, I wasn’t sure whether to try to say something actually relevant to quantum cryptography or just make jokes. So I’ll do something in between: I’ll tell you about research directions in quantum cryptography that are also jokes.

The subject of this talk is a deep theorem that stands as one of the crowning achievements of our field. I refer, of course, to the No-Cloning Theorem. Almost everything we’re talking about at this conference, from QKD onwards, is based in some way on quantum states being unclonable. If you read Stephen Wiesner’s paper from 1968, which founded quantum cryptography, the No-Cloning Theorem already played a central role—although Wiesner didn’t call it that. By the way, here’s my #1 piece of research advice to the students in the audience: if you want to become immortal, just find some fact that everyone already knows and give it a name!

I’d like to pose the question: why should our universe be governed by physical laws that make the No-Cloning Theorem true? I mean, it’s possible that there’s some other reason for our universe to be quantum-mechanical, and No-Cloning is just a byproduct of that. No-Cloning would then be like the armpit of quantum mechanics: not there because it does anything useful, but just because there’s gotta be something under your arms.

OK, but No-Cloning feels really fundamental. One of my early memories is when I was 5 years old or so, and utterly transfixed by my dad’s home fax machine, one of those crappy 1980s fax machines with wax paper. I kept thinking about it: is it really true that a piece of paper gets transmaterialized, sent through a wire, and reconstituted at the other location? Could I have been that wrong about how the universe works? Until finally I got it—and once you get it, it’s hard even to recapture your original confusion, because it becomes so obvious that the world is made not of stuff but of copyable bits of information. “Information wants to be free!”

The No-Cloning Theorem represents nothing less than a partial return to the view of the world that I had before I was five. It says that quantum information doesn’t want to be free: it wants to be private. There is, it turns out, a kind of information that’s tied to a particular place, or set of places. It can be moved around, or even teleported, but it can’t be copied the way a fax machine copies bits.

So I think it’s worth at least entertaining the possibility that we don’t have No-Cloning because of quantum mechanics; we have quantum mechanics because of No-Cloning—or because quantum mechanics is the simplest, most elegant theory that has unclonability as a core principle. But if so, that just pushes the question back to: why should unclonability be a core principle of physics?

Quantum Key Distribution

A first suggestion about this question came from Gilles Brassard, who’s here. Years ago, I attended a talk by Gilles in which he speculated that the laws of quantum mechanics are what they are because Quantum Key Distribution (QKD) has to be possible, while bit commitment has to be impossible. If true, that would be awesome for the people at this conference. It would mean that, far from being this exotic competitor to RSA and Diffie-Hellman that’s distance-limited and bandwidth-limited and has a tiny market share right now, QKD would be the entire reason why the universe is as it is! Or maybe what this really amounts to is an appeal to the Anthropic Principle. Like, if QKD hadn’t been possible, then we wouldn’t be here at QCRYPT to talk about it.

Quantum Money

But maybe we should search more broadly for the reasons why our laws of physics satisfy a No-Cloning Theorem. Wiesner’s paper sort of hinted at QKD, but the main thing it had was a scheme for unforgeable quantum money. This is one of the most direct uses imaginable for the No-Cloning Theorem: to store economic value in something that it’s physically impossible to copy. So maybe that’s the reason for No-Cloning: because God wanted us to have e-commerce, and didn’t want us to have to bother with blockchains (and certainly not with credit card numbers).

The central difficulty with quantum money is: how do you authenticate a bill as genuine? (OK, fine, there’s also the dificulty of how to keep a bill coherent in your wallet for more than a microsecond or whatever. But we’ll leave that for the engineers.)

In Wiesner’s original scheme, he solved the authentication problem by saying that, whenever you want to verify a quantum bill, you bring it back to the bank that printed it. The bank then looks up the bill’s classical serial number in a giant database, which tells the bank in which basis to measure each of the bill’s qubits.

With this system, you can actually get information-theoretic security against counterfeiting. OK, but the fact that you have to bring a bill to the bank to be verified negates much of the advantage of quantum money in the first place. If you’re going to keep involving a bank, then why not just use a credit card?

That’s why over the past decade, some of us have been working on public-key quantum money: that is, quantum money that anyone can verify. For this kind of quantum money, it’s easy to see that the No-Cloning Theorem is no longer enough: you also need some cryptographic assumption. But OK, we can consider that. In recent years, we’ve achieved glory by proposing a huge variety of public-key quantum money schemes—and we’ve achieved even greater glory by breaking almost all of them!

After a while, there were basically two schemes left standing: one based on knot theory by Ed Farhi, Peter Shor, et al. That one has been proven to be secure under the assumption that it can’t be broken. The second scheme, which Paul Christiano and I proposed in 2012, is based on hidden subspaces encoded by multivariate polynomials. For our scheme, Paul and I were able to do better than Farhi et al.: we gave a security reduction. That is, we proved that our quantum money scheme is secure, unless there’s a polynomial-time quantum algorithm to find hidden subspaces encoded by low-degree multivariate polynomials (yadda yadda, you can look up the details) with much greater success probability than we thought possible.

Today, the situation is that my and Paul’s security proof remains completely valid, but meanwhile, our money is completely insecure! Our reduction means the opposite of what we thought it did. There is a break of our quantum money scheme, and as a consequence, there’s also a quantum algorithm to find large subspaces hidden by low-degree polynomials with much better success probability than we’d thought. What happened was that first, some French algebraic cryptanalysts—Faugere, Pena, I can’t pronounce their names—used Gröbner bases to break the noiseless version of scheme, in classical polynomial time. So I thought, phew! At least I had acceded when Paul insisted that we also include a noisy version of the scheme. But later, Paul noticed that there’s a quantum reduction from the problem of breaking our noisy scheme to the problem of breaking the noiseless one, so the former is broken as well.

I’m choosing to spin this positively: “we used quantum money to discover a striking new quantum algorithm for finding subspaces hidden by low-degree polynomials. Err, yes, that’s exactly what we did.”

But, bottom line, until we manage to invent a better public-key quantum money scheme, or otherwise sort this out, I don’t think we’re entitled to claim that God put unclonability into our universe in order for quantum money to be possible.

Copy-Protected Quantum Software

So if not money, then what about its cousin, copy-protected software—could that be why No-Cloning holds? By copy-protected quantum software, I just mean a quantum state that, if you feed it into your quantum computer, lets you evaluate some Boolean function on any input of your choice, but that doesn’t let you efficiently prepare more states that let the same function be evaluated. I think this is important as one of the preeminent evil applications of quantum information. Why should nuclear physicists and genetic engineers get a monopoly on the evil stuff?

OK, but is copy-protected quantum software even possible? The first worry you might have is that, yeah, maybe it’s possible, but then every time you wanted to run the quantum program, you’d have to make a measurement that destroyed it. So then you’d have to go back and buy a new copy of the program for the next run, and so on. Of course, to the software company, this would presumably be a feature rather than a bug!

But as it turns out, there’s a fact many of you know—sometimes called the “Gentle Measurement Lemma,” other times the “Almost As Good As New Lemma”—which says that, as long as the outcome of your measurement on a quantum state could be predicted almost with certainty given knowledge of the state, the measurement can be implemented in such a way that it hardly damages the state at all. This tells us that, if quantum money, copy-protected quantum software, and the other things we’re talking about are possible at all, then they can also be made reusable. I summarize the principle as: “if rockets, then space shuttles.”

Much like with quantum money, one can show that, relative to a suitable oracle, it’s possible to quantumly copy-protect any efficiently computable function—or rather, any function that’s hard to learn from its input/output behavior. Indeed, the implementation can be not only copy-protected but also obfuscated, so that the user learns nothing besides the input/output behavior. As Bill Fefferman pointed out in his talk this morning, the No-Cloning Theorem lets us bypass Barak et al.’s famous result on the impossibility of obfuscation, because their impossibility proof assumed the ability to copy the obfuscated program.

Of course, what we really care about is whether quantum copy-protection is possible in the real world, with no oracle. I was able to give candidate implementations of quantum copy-protection for extremely special functions, like one that just checks the validity of a password. In the general case—that is, for arbitrary programs—Paul Christiano has a beautiful proposal for how to do it, which builds on our hidden-subspace money scheme. Unfortunately, since our money scheme is currently in the shop being repaired, it’s probably premature to think about the security of the much more complicated copy-protection scheme! But these are wonderful open problems, and I encourage any of you to come and scoop us. Once we know whether uncopyable quantum software is possible at all, we could then debate whether it’s the “reason” for our universe to have unclonability as a core principle.

Along the same lines, I can’t resist mentioning some favorite research directions, which some enterprising student here could totally turn into a talk at next year’s QCRYPT.

Firstly, what can we say about clonable versus unclonable quantum proofs—that is, QMA witness states? In other words: for which problems in QMA can we ensure that there’s an accepting witness that lets you efficiently create as many additional accepting witnesses as you want? (I mean, besides the QCMA problems, the ones that have short classical witnesses?) For which problems in QMA can we ensure that there’s an accepting witness that doesn’t let you efficiently create any additional accepting witnesses? I do have a few observations about these questions—ask me if you’re interested—but on the whole, I believe almost anything one can ask about them remains open.

Admittedly, it’s not clear how much use an unclonable proof would be. Like, imagine a quantum state that encoded a proof of the Riemann Hypothesis, and which you would keep in your bedroom, in a glass orb on your nightstand or something. And whenever you felt your doubts about the Riemann Hypothesis resurfacing, you’d take the state out of its orb and measure it again to reassure yourself of RH’s truth. You’d be like, “my preciousssss!” And no one else could copy your state and thereby gain the same Riemann-faith-restoring powers that you had. I dunno, I probably won’t hawk this application in a DARPA grant.

Similarly, one can ask about clonable versus unclonable quantum advice states—that is, initial states that are given to you to boost your computational power beyond that of an ordinary quantum computer. And that’s also a fascinating open problem.

OK, but maybe none of this quite gets at why our universe has unclonability. And this is an after-dinner talk, so do you want me to get to the really crazy stuff? Yes?

OK! What if unclonability is our universe’s way around the paradoxes of self-reference, like the unsolvability of the halting problem and Gödel’s Incompleteness Theorem? Allow me to explain what I mean.

In kindergarten or wherever, we all learn Turing’s proof that there’s no computer program to solve the halting problem. But what isn’t usually stressed is that that proof actually does more than advertised. If someone hands you a program that they claim solves the halting problem, Turing doesn’t merely tell you that that person is wrong—rather, he shows you exactly how to expose the person as a jackass, by constructing an example input on which their program fails. All you do is, you take their claimed halt-decider, modify it in some simple way, and then feed the result back to the halt-decider as input. You thereby create a situation where, if your program halts given its own code as input, then it must run forever, and if it runs forever then it halts. “WHOOOOSH!” [head-exploding gesture]

OK, but now imagine that the program someone hands you, which they claim solves the halting problem, is a quantum program. That is, it’s a quantum state, which you measure in some basis depending on the program you’re interested in, in order to decide whether that program halts. Well, the truth is, this quantum program still can’t work to solve the halting problem. After all, there’s some classical program that simulates the quantum one, albeit less efficiently, and we already know that the classical program can’t work.

But now consider the question: how would you actually produce an example input on which this quantum program failed to solve the halting problem? Like, suppose the program worked on every input you tried. Then ultimately, to produce a counterexample, you might need to follow Turing’s proof and make a copy of the claimed quantum halt-decider. But then, of course, you’d run up against the No-Cloning Theorem!

So we seem to arrive at the conclusion that, while of course there’s no quantum program to solve the halting problem, there might be a quantum program for which no one could explicitly refute that it solved the halting problem, by giving a counterexample.

I was pretty excited about this observation for a day or two, until I noticed the following. Let’s suppose your quantum program that allegedly solves the halting problem has n qubits. Then it’s possible to prove that the program can’t possibly be used to compute more than, say, 2n bits of Chaitin’s constant Ω, which is the probability that a random program halts. OK, but if we had an actual oracle for the halting problem, we could use it to compute as many bits of Ω as we wanted. So, suppose I treated my quantum program as if it were an oracle for the halting problem, and I used it to compute the first 2n bits of Ω. Then I would know that, assuming the truth of quantum mechanics, the program must have made a mistake somewhere. There would still be something weird, which is that I wouldn’t know on which input my program had made an error—I would just know that it must’ve erred somewhere! With a bit of cleverness, one can narrow things down to two inputs, such that the quantum halt-decider must have erred on at least one of them. But I don’t know whether it’s possible to go further, and concentrate the wrongness on a single query.

We can play a similar game with other famous applications of self-reference. For example, suppose we use a quantum state to encode a system of axioms. Then that system of axioms will still be subject to Gödel’s Incompleteness Theorem (which I guess I believe despite the umlaut). If it’s consistent, it won’t be able to prove all the true statements of arithmetic. But we might never be able to produce an explicit example of a true statement that the axioms don’t prove. To do so we’d have to clone the state encoding the axioms and thereby violate No-Cloning.

Personal Identity

But since I’m a bit drunk, I should confess that all this stuff about Gödel and self-reference is just a warmup to what I really wanted to talk about, which is whether the No-Cloning Theorem might have anything to do with the mysteries of personal identity and “free will.” I first encountered this idea in Roger Penrose’s book, The Emperor’s New Mind. But I want to stress that I’m not talking here about the possibility that the brain is a quantum computer—much less about the possibility that it’s a quantum-gravitational hypercomputer that uses microtubules to solve the halting problem! I might be drunk, but I’m not that drunk. I also think that the Penrose-Lucas argument, based on Gödel’s Theorem, for why the brain has to work that way is fundamentally flawed.

But here I’m talking about something different. See, I have a lot of friends in the Singularity / Friendly AI movement. And I talk to them whenever I pass through the Bay Area, which is where they congregate. And many of them express great confidence that before too long—maybe in 20 or 30 years, maybe in 100 years—we’ll be able to upload ourselves to computers and live forever on the Internet (as opposed to just living 70% of our lives on the Internet, like we do today).

This would have lots of advantages. For example, any time you were about to do something dangerous, you’d just make a backup copy of yourself first. If you were struggling with a conference deadline, you’d spawn 100 temporary copies of yourself. If you wanted to visit Mars or Jupiter, you’d just email yourself there. If Trump became president, you’d not run yourself for 8 years (or maybe 80 or 800 years). And so on.

Admittedly, some awkward questions arise. For example, let’s say the hardware runs three copies of your code and takes a majority vote, just for error-correcting purposes. Does that bring three copies of you into existence, or only one copy? Or let’s say your code is run homomorphically encrypted, with the only decryption key stored in another galaxy. Does that count? Or you email yourself to Mars. If you want to make sure that you’ll wake up on Mars, is it important that you delete the copy of your code that remains on earth? Does it matter whether anyone runs the code or not? And what exactly counts as “running” it? Or my favorite one: could someone threaten you by saying, “look, I have a copy of your code, and if you don’t do what I say, I’m going to make a thousand copies of it and subject them all to horrible tortures?”

The issue, in all these cases, is that in a world where there could be millions of copies of your code running on different substrates in different locations—or things where it’s not even clear whether they count as a copy or not—we don’t have a principled way to take as input a description of the state of the universe, and then identify where in the universe you are—or even a probability distribution over places where you could be. And yet you seem to need such a way in order to make predictions and decisions.

A few years ago, I wrote this gigantic, post-tenure essay called The Ghost in the Quantum Turing Machine, where I tried to make the point that we don’t know at what level of granularity a brain would need to be simulated in order to duplicate someone’s subjective identity. Maybe you’d only need to go down to the level of neurons and synapses. But if you needed to go all the way down to the molecular level, then the No-Cloning Theorem would immediately throw a wrench into most of the paradoxes of personal identity that we discussed earlier.

For it would mean that there were some microscopic yet essential details about each of us that were fundamentally uncopyable, localized to a particular part of space. We would all, in effect, be quantumly copy-protected software. Each of us would have a core of unpredictability—not merely probabilistic unpredictability, like that of a quantum random number generator, but genuine unpredictability—that an external model of us would fail to capture completely. Of course, by having futuristic nanorobots scan our brains and so forth, it would be possible in principle to make extremely realistic copies of us. But those copies necessarily wouldn’t capture quite everything. And, one can speculate, maybe not enough for your subjective experience to “transfer over.”

Maybe the most striking aspect of this picture is that sure, you could teleport yourself to Mars—but to do so you’d need to use quantum teleportation, and as we all know, quantum teleportation necessarily destroys the original copy of the teleported state. So we’d avert this metaphysical crisis about what to do with the copy that remained on Earth.

Look—I don’t know if any of you are like me, and have ever gotten depressed by reflecting that all of your life experiences, all your joys and sorrows and loves and losses, every itch and flick of your finger, could in principle be encoded by a huge but finite string of bits, and therefore by a single positive integer. (Really? No one else gets depressed about that?) It’s kind of like: given that this integer has existed since before there was a universe, and will continue to exist after the universe has degenerated into a thin gruel of radiation, what’s the point of even going through the motions? You know?

But the No-Cloning Theorem raises the possibility that at least this integer is really your integer. At least it’s something that no one else knows, and no one else could know in principle, even with futuristic brain-scanning technology: you’ll always be able to surprise the world with a new digit. I don’t know if that’s true or not, but if it were true, then it seems like the sort of thing that would be worthy of elevating unclonability to a fundamental principle of the universe.

So as you enjoy your dinner and dessert at this historic Mayflower Hotel, I ask you to reflect on the following. People can photograph this event, they can video it, they can type up transcripts, in principle they could even record everything that happens down to the millimeter level, and post it on the Internet for posterity. But they’re not gonna get the quantum states. There’s something about this evening, like about every evening, that will vanish forever, so please savor it while it lasts. Thank you.

Update (Sep. 20): Unbeknownst to me, Marc Kaplan did video the event and put it up on YouTube! Click here to watch. Thanks very much to Marc! I hope you enjoy, even though of course, the video can’t precisely clone the experience of having been there.

[Note: The part where I raise my middle finger is an inside joke—one of the speakers during the technical sessions inadvertently did the same while making a point, causing great mirth in the audience.]

### More Wrong Things I Said in Papers

Friday, July 29th, 2016

Two years ago, I wrote a blog post entitled PostBQP Postscripts, owning up to not one but four substantive mathematical errors that I’d made over the years in my published papers, and which my students and colleagues later brought to my sheepish attention.  Fortunately, none of these errors affected the papers’ main messages; they just added interesting new twists to the story.  Even so, I remember feeling at the time like undergoing this public repentance was soul-cleansing intellectual hygiene.  I also felt like writing one big “post of shame” was easier than writing a bunch of separate errata and submitting them to journals, while also reaching a wider audience (and, therefore, doing an even better soul-cleansing job).

So I resolved that, anytime I’d saved up enough errata, I’d do another sackcloth-and-ashes post.  Which brings us to today.  Without further ado:

I. Quantum Money Falling Down

My and Paul Christiano’s explicit public-key quantum money scheme—the one based on low-degree polynomials—has now been fully broken.  To clarify, our abstract hidden-subspace scheme—the one that uses a classical black-box to test membership in the subspaces—remains totally fine.  Indeed, we unconditionally proved the security of the black-box scheme, and our security proof stands.  In the paper, though, we also stuck our necks out further, and conjectured that you could instantiate the black box, by publishing random low-degree polynomials that vanish on the subspaces you want to hide.  While I considered this superfluous, at Paul’s insistence, we also recommended adding completely-random “noise polynomials” for extra security.

Our scheme was broken in two stages.  First, in 2014, Pena et al. broke the noiseless version of our scheme, using Gröbner-basis methods, over fields of characteristic greater than 2.  Over F2—the field we happened to use in our scheme—Pena et al. couldn’t quite prove that their attack worked, but they gave numerical evidence that at least it finds the subspaces in nO(log n) time.  Note that nothing in Pena et al.’s attack is specific to quantum money: indeed, their attack consists of a purely classical algorithm, which efficiently solves the general classical problem of recovering large subspaces from polynomials that hide them.

At that point, at least the noisy version of our scheme—the one Paul had insisted we include—was still standing!  Indeed, the Gröbner-basis attack seemed to break down entirely when some of the polynomials were random garbage.

Later, though, Paul and Or Sattath realized that a quantum trick—basically, the single-copy tomography of Farhi et al.—can identify which polynomials are the noisy ones, provided we’re given a legitimate quantum money state to start with.  As a consequence, the problem of breaking the noisy scheme can be reduced to the problem of breaking the noiseless scheme—i.e., the problem that Pena et al. already essentially solved.

As bad as this sounds, it has an interesting positive consequence.  In our paper, Paul and I had actually given a security reduction for our money scheme based on low-degree polynomials.  In particular, we showed that there’s no polynomial-time quantum algorithm to counterfeit our money states, unless there’s a polynomial-time quantum algorithm that finds a basis for a subspace S≤F2n of dimension n/2 with Ω(2-n/2) success probability, given a collection of low-degree polynomials p1,…,pm and q1,…,qm (m=O(n)) most of which vanish on S and its dual subspace respectively, but that are otherwise random.  So, running our reduction backwards, the only possible conclusion from the break is that there is such a quantum algorithm!  Yet we would’ve had no idea how to find that quantum algorithm without going through quantum money—nor do we know a classical algorithm for the problem, or even a quantum algorithm with Ω(1) success probability.

In the meantime, the problem of designing a public-key quantum money scheme, with good cryptographic evidence for its security, remains open.  It’s plausible that there’s some other, more secure way to instantiate my and Paul’s hidden subspace scheme, for example using lattices.  And even before we’ve found such a way, we can use indistinguishability obfuscation as a stopgap.  We could also seek cryptographic evidence for the security of other kinds of public-key quantum money, like Farhi et al.’s based on knot invariants.

A paper about all this is on our to-do stack. In the meantime, for further details, see Lecture 9 in my Barbados lecture notes.

II. A De-Merlinization Mistake

In my 2006 paper QMA/qpoly ⊆ PSPACE/poly: De-Merlinizing Quantum Protocols, the technical core of the complexity result was a new quantum information lemma that I called the “Quantum OR Bound” (Lemma 14 in the paper).

Basically, the Quantum OR Bound says that, if we have an unknown quantum state ρ, as well as a collection of measurements M1,…,Mn that we might want to make on ρ, then we can distinguish the case that (a) every Mi rejects ρ with overwhelming probability, from the case that (b) at least one Mi accepts ρ with high probability.  And we can do this despite having only one copy of ρ, and despite the fact that earlier measurements might corrupt ρ, thereby compromising the later measurements.  The intuition is simply that, if the earlier measurements corrupted ρ substantially, that could only be because some of them had a decent probability of accepting ρ, meaning that at any rate, we’re not in case (a).

I’ve since reused the Quantum OR Bound for other problems—most notably, a proof that private-key quantum money requires either a computational assumption or a huge database maintained by the bank (see Theorem 8.3.1 in my Barbados lecture notes).

Alas, Aram Harrow and Ashley Montanaro recently discovered that my proof of the Quantum OR Bound is wrong.  It’s wrong because I neglected the possibility of “Zeno-like behavior,” in which repeated measurements on a quantum state would gradually shift the state far away from its starting point, without ever having a significant probability of rejecting the state.  For some reason, I assumed without any adequate argument that choosing the measurements at random, rather than in a predetermined order, would solve that problem.

Now, I might actually be right that randomizing the measurements is enough to solve the Zeno problem!  That remains a plausible conjecture, which Harrow and Montanaro could neither confirm nor refute.  In the meantime, though, Harrow and Montanaro were able to recover my QMA/qpoly⊆PSPACE/poly theorem, and all the other conclusions known to follow from the Quantum OR Bound (including some new ones that they discover), by designing a new measurement procedure whose soundness they can prove.

Their new procedure is based on an elegant, obvious-in-retrospect idea that somehow never occurred to me.  Namely, instead of just applying Mi‘s to ρ, one can first put a control qubit into an equal superposition of the |0〉 and |1〉 states, and then apply Mi‘s conditioned on the control qubit being in the |1〉 state.  While doing this, one can periodically measure the control qubit in the {|+〉,|-〉} basis, in order to check directly whether applying the Mi‘s has substantially corrupted ρ.  (If it hasn’t, one will always get the outcome |+〉; if it has, one might get |-〉.)  Substantial corruption, if detected, then tells us that some Mi‘s must have had non-negligible probabilities of accepting ρ.

III. Almost As Good As True

One lemma that I’ve used even more than the Quantum OR Bound is what I’ve called the “Almost As Good As New Lemma,” and what others in the field have called the “Gentle Measurement Lemma.”

I claimed a proof of the AAGANL in my 2004 paper Limitations of Quantum Advice and One-Way Communication (Lemma 2.2 there), and have used the lemma in like half a dozen later papers.  Alas, when I lectured at Barbados, Sasha Razborov and others discovered that my proof of the AAGANL was missing a crucial step!  More concretely, the proof I gave there works for pure states but not for mixed states.  For mixed states, the trouble is that I take a purification of the mixed state—something that always exists mathematically—but then illegally assume that the measurement I’m analyzing acts on the particular purification I’ve conjured up.

Fortunately, one can easily fix this problem by decomposing the state ρ into a mixture of pure states, then applying my earlier argument to each pure state separately, and finally using Cauchy-Schwarz (or just the convexity of the square-root function) to recombine the results.  Moreover, this is exactly what other people’s proofs of the Gentle Measurement Lemma did do, though I’d never noticed it before Barbados—I just idly wondered why those other proofs took twice as long as mine to do the same work!  For a correct proof, see Lemma 1.3.1 in the Barbados lecture notes.

IV. Oracle Woes

In my 2010 paper BQP and the Polynomial Hierarchy, I claimed to construct oracles A relative to which BQP⊄BPPpath and BQP⊄SZK, even while making only partial progress toward the big prize, which would’ve been an oracle relative to which BQP⊄PH.  Not only that: I claimed to show that any problem with a property called “almost k-wise independence”—one example being the Forrelation (or Fourier Checking) problem that I introduced in that paper—was neither in BPPpath nor in SZK.  But I showed that Forrelation is in BQP, thus yielding the separations.

Alas, this past spring Lijie Chen, who was my superb visiting student from Tsinghua University, realized that my proofs of these particular separations were wrong.  Not only that, they were wrong because I implicitly substituted a ratio of expectations for an expectation of ratios (!).  Again, it might still be true that almost k-wise independent problems can be neither in BPPpath nor in SZK: that remains an interesting conjecture, which Lijie was unable to resolve one way or the other.  (On the other hand, I showed here that almost k-wise independent problems can be in PH.)

But never fear!  In a recent arXiv preprint, Lijie has supplied correct proofs for the BQP⊄BPPpath and BQP⊄SZK oracle separations—using the same Forrelation problem that I studied, but additional properties of Forrelation besides its almost k-wise independence.  Lijie notes that my proofs, had they worked, would also have yielded an oracle relative to which BQP⊄AM, which would’ve been a spectacular result, nontrivial progress toward BQP⊄PH.  His proofs, by contrast, apply only to worst-case decision problems rather than problems of distinguishing two probability distributions, and therefore don’t imply anything about BQP vs. AM.  Anyway, there’s other cool stuff in his paper too.

V. We Needed More Coffee

This is one I’ve already written about on this blog, but just in case anyone missed it … in my, Sean Carroll, and Lauren Ouellette’s original draft paper on the coffee automaton, the specific rule we discuss doesn’t generate any significant amount of complexity (in the sense of coarse-grained entropy).  We wrongly thought it did, because of a misinterpretation of our simulation data.  But as Brent Werness brought to our attention, not only does a corrected simulation not show any complexity bump, one can rigorously prove there’s no complexity bump.  And we could’ve realized all this from the beginning, by reflecting that pure random diffusion (e.g., what cream does in coffee when you don’t stir it with a spoon) doesn’t actually produce interesting tendril patterns.

On the other hand, Brent proposed a different rule—one that involves “shearing” whole regions of cream and coffee across each other—that does generate significant complexity, basically because of all the long-range correlations it induces.  And not only do we clearly see this in simulations, but the growth of complexity can be rigorously proven!  Anyway, we have a long-delayed revision of the paper that will explain all this in more detail, with Brent as well as MIT student Varun Mohan now added as coauthors.

If any of my colleagues feel inspired to write up their own “litanies of mathematical error,” they’re welcome to do so in the comments!  Just remember: you don’t earn any epistemic virtue points unless the errors you reveal actually embarrass you.  No humblebragging about how you once left out a minus sign in your paper that won the Fields Medal.

### The Complexity of Quantum States and Transformations: From Quantum Money to Black Holes

Sunday, July 17th, 2016

On February 21-25, I taught a weeklong mini-course at the Bellairs Research Institute in Barbados, where I tried to tell an integrated story about everything from quantum proof and advice complexity classes to quantum money to AdS/CFT and the firewall problem—all through the unifying lens of quantum circuit complexity.  After a long effort—on the part of me, the scribes, the guest lecturers, and the organizers—the 111-page lecture notes are finally available, right here.

Here’s the summary:

This mini-course will introduce participants to an exciting frontier for quantum computing theory: namely, questions involving the computational complexity of preparing a certain quantum state or applying a certain unitary transformation. Traditionally, such questions were considered in the context of the Nonabelian Hidden Subgroup Problem and quantum interactive proof systems, but they are much broader than that. One important application is the problem of “public-key quantum money” – that is, quantum states that can be authenticated by anyone, but only created or copied by a central bank – as well as related problems such as copy-protected quantum software. A second, very recent application involves the black-hole information paradox, where physicists realized that for certain conceptual puzzles in quantum gravity, they needed to know whether certain states and operations had exponential quantum circuit complexity. These two applications (quantum money and quantum gravity) even turn out to have connections to each other! A recurring theme of the course will be the quest to relate these novel problems to more traditional computational problems, so that one can say, for example, “this quantum money is hard to counterfeit if that cryptosystem is secure,” or “this state is hard to prepare if PSPACE is not in PP/poly.” Numerous open problems and research directions will be suggested, many requiring only minimal quantum background. Some previous exposure to quantum computing and information will be assumed, but a brief review will be provided.

If you still haven’t decided whether to tackle this thing: it’s basically a quantum complexity theory textbook (well, a textbook for certain themes within quantum complexity theory) that I’ve written and put on the Internet for free.  It has explanations of lots of published results both old and new, but also some results of mine (e.g., about private-key quantum money, firewalls, and AdS/CFT) that I shamefully haven’t yet written up as papers, and that therefore aren’t currently available anywhere else.  If you’re interested in certain specific topics—for example, only quantum money, or only firewalls—you should be able to skip around in the notes without too much difficulty.

Thanks so much to Denis Therien for organizing the mini-course, Anil Ada for managing the scribe notes effort, my PhD students Adam Bouland and Luke Schaeffer for their special guest lecture (the last one), and finally, the course attendees for their constant questions and interruptions, and (of course) for scribing.

And in case you were wondering: yes, I’ll do absolutely anything for science, even if it means teaching a weeklong course in Barbados!  Lest you consider this a pure island boondoggle, please know that I spent probably 12-14 hours per day either lecturing (in two 3-hour installments) or preparing for the lectures, with little sleep and just occasional dips in the ocean.

And now I’m headed to the Perimeter Institute for their It from Qubit summer school, not at all unrelated to my Barbados lectures.  This time, though, it’s thankfully other people’s turns to lecture…