Archive for the ‘Announcements’ Category

G. Phi. Fo. Fum.

Wednesday, November 4th, 2015

Update (Dec. 14): The long wait is over!  Here’s Laci’s paper on the arXiv.  So far, I’ve read it only deeply enough to note that it contains the following sentence:

A group G ≤ S(Ω) defines the category of G-isomorphisms of strings on the domain Ω; the natural notation for this category, the central object of study in this paper, would seem to be “G-Strings.”

With that, I believe Laci himself has outshone even reddit’s attempt to mine his breakthrough result for juvenile humor.

See also a nice Quanta article about Laci’s algorithm by Erica Klarreich.  There’s only one statement in the article that I disagree with: namely that, if graph isomorphism were inherently quasipolynomial time, then it would be the first natural example of such a problem.  We know other natural problems, like approximating free games and socially-optimal Nash equilibria, that are solvable in nO(log n) time but that can’t be in P unless 3SAT is solvable in ~exp(√n) time.

Update (Nov. 17): Video of Laci’s first talk is now available.

Breaking News (Nov. 12): Jeremy Kun has written up a phenomenal summary of Babai’s first lecture.  I haven’t carefully studied all of it, and in any case, there are many missing details to be filled in later (Babai told Kun that the preprint will be available “soon, soon!”).  But from the summary, four points stood out to me:

1. Babai actually claims a quasipolynomial-time algorithm for an interestingly more general problem than graph isomorphism, called string isomorphism.  This was already in the abstract, but googling didn’t reveal what string isomorphism was.  So, OK, here’s what it is: you’re given two strings x and y over some finite alphabet, as well as the generators of a group G of permutations of the string indices.  The problem is to determine whether you can transform x to y by applying a permutation in G.  Or even more generally: given a string x, find a full set of generators for the subgroup of G that fixes x.  See Kun’s post for the straightforward reductions from GI to these group-theoretic problems.
2. As was hinted in the abstract, in Babai’s analysis of his algorithm, there’s one step that relies on a statement whose only known proof depends on the Classification of Finite Simple Groups.  (Thus, it’s not the algorithm itself requires iterating through all the sporadic simple groups or anything like that; this only shows up in the correctness proof.)  This is not the first-ever computer-science application of the Classification of Finite Simple Groups (indeed, Babai himself has some previous ones), but it’s certainly the most dramatic.
3. In previous work on GI, the Johnson graph emerged over and over as a forehead-bangingly hard case that caused numerous algorithms to fail.  In the new work, it looks like Babai’s central technical innovation is to show that, in some sense, the Johnson graph is the only obstruction to taking the divide-and-conquer approaches that people that had tried before, and making them run in quasipolynomial time.  I.e., in each step of the recursion, either you can find a Johnson graph on a large fraction of the vertices and handle it specially, or else you can do something that works whenever there’s not a Johnson graph on a large fraction of the vertices.  Babai calls this “split-or-Johnson.”
4. Babai stressed that in some sense, his new algorithm is the culmination of a program laid out by Eugene Luks in 1982.  Now, the Classification of Finite Simple Groups was also more-or-less completed in the early 1980s.  To my mind, this raises a fascinating socio-mathematical question: which aspects of the new work, if any, could not have been done in the early 80s, possibly by Babai or Luks themselves?  what is it that needed another 30 years?  If the answer turns out to be “nothing,” then to me that’s an astounding illustration of the role of the individual in mathematical progress.  As in: Laci was nice enough to take a third-of-a-century break between his and Luks’ work in the early 80s, and the “natural next step” in their program … and still no one managed to use that break to beat him to the next step!

Earlier today, I was tipped off to what might be the theoretical computer science result of the decade.  My source asked me not to break the news on this blog—but since other theory bloggers (and twitterers) are now covering the story, I guess the graph is out of the Babai.

According to the University of Chicago’s theory seminar calendar, on Tuesday of next week (November 10), the legendary Laszlo Babai will be giving a talk about a new algorithm that solves the graph isomorphism problem in quasipolynomial time.  The previous fastest algorithm to decide whether two n-vertex graphs G and H are isomorphic—by Babai and Luks, back in 1983—ran in exp(√(n log n)) time.  If we credit the announcement, Babai has now gotten that down to exp(polylog(n)), putting one of the central problems of computer science “just barely above P.”  (For years, I’ve answered questions on this blog about the status of graph isomorphism—would I bet that it’s in BQP? in coNP? etc.—by saying that, as far as I and many others are concerned, it might as well just be in P.  Of course I’m happy to reaffirm that conjecture tonight.)

Next week, I assume, Laci will lecture to a packed house; then the experts will race to unpack the details.  Until then, we probably need to sit tight; I don’t know any more than what’s in the abstract.  For now, I’m delighted if commenters want to share general thoughts or questions about graph isomorphism (and I’ll try to answer what I can), but I won’t allow uninformed speculations or rumors about the details of the new result—not until Laci has had a chance to speak.

Update (Nov. 5): While we all wait with bated breath for more details, you can amuse yourself with the talk I gave at Laci’s 60th birthday conference five years ago.

Also, a comment of mine that I should probably promote to the main post:

Dana points out to me that non-native English speakers might not get the staggeringly clever pun in this post’s title (hey, it was the best I could do on a deadline).

So, alright, fee fi fo fum is what the approaching giant bellows in Jack and the Beanstalk. It means something big is on the horizon. Also, G is a graph, and Phi is an isomorphism.

Update (Nov. 12): So, Laci gave his talk. Video was made but does not appear to be available yet. However, Gabriel Gaster, who was in attendance, graciously live-tweeted everything. Here’s a Storify of the live-tweets. (What’s a “Storify”?)

Six announcements

Monday, September 21st, 2015
1. I did a podcast interview with Julia Galef for her series “Rationally Speaking.”  See also here for the transcript (which I read rather than having to listen to myself stutter).  The interview is all about Aumann’s Theorem, and whether rational people can agree to disagree.  It covers a lot of the same ground as my recent post on the same topic, except with less technical detail about agreement theory and more … well, agreement.  At Julia’s suggestion, we’re planning to do a follow-up podcast about the particular intractability of online disagreements.  I feel confident that we’ll solve that problem once and for all.  (Update: Also check out this YouTube video, where Julia offers additional thoughts about what we discussed.)
2. When Julia asked me to recommend a book at the end of the interview, I picked probably my favorite contemporary novel: The Mind-Body Problem by Rebecca Newberger Goldstein.  Embarrassingly, I hadn’t realized that Rebecca had already been on Julia’s show twice as a guest!  Anyway, one of the thrills of my life over the last year has been to get to know Rebecca a little, as well as her husband, who’s some guy named Steve Pinker.  Like, they both live right here in Boston!  You can talk to them!  I was especially pleased two weeks ago to learn that Rebecca won the National Humanities Medal—as I told Julia, Rebecca Goldstein getting a medal at the White House is the sort of thing I imagine happening in my ideal fantasy world, making it a pleasant surprise that it happened in this one.  Huge congratulations to Rebecca!
3. The NSA has released probably its most explicit public statement so far about its plans to move to quantum-resistant cryptography.  For more see Bruce Schneier’s Crypto-Gram.  Hat tip for this item goes to reader Ole Aamot, one of the only people I’ve ever encountered whose name alphabetically precedes mine.
4. Last Tuesday, I got to hear Ayaan Hirsi Ali speak at MIT about her new book, Heretic, and then spend almost an hour talking to students who had come to argue with her.  I found her clear, articulate, and courageous (as I guess one has to be in her line of work, even with armed cops on either side of the lecture hall).  After the shameful decision of Brandeis in caving in to pressure and cancelling Hirsi Ali’s commencement speech, I thought it spoke well of MIT that they let her speak at all.  The bar shouldn’t be that low, but it is.
5. From far away on the political spectrum, I also heard Noam Chomsky talk last week (my first time hearing him live), about the current state of linguistics.  Much of the talk, it struck me, could have been given in the 1950s with essentially zero change (and I suspect Chomsky would agree), though a few parts of it were newer, such as the speculation that human languages have many of the features they do in order to minimize the amount of computation that the speaker needs to perform.  The talk was full of declarations that there had been no useful work whatsoever on various questions (e.g., about the evolutionary function of language), that they were total mysteries and would perhaps remain total mysteries forever.
6. Many of you have surely heard by now that Terry Tao solved the Erdös Discrepancy Problem, by showing that for every infinite sequence of heads and tails and every positive integer C, there’s a positive integer k such that, if you look at the subsequence formed by every kth flip, there comes a point where the heads outnumber tails or vice versa by at least C.  This resolves a problem that’s been open for more than 80 years.  For more details, see this post by Timothy Gowers.  Notably, Tao’s proof builds, in part, on a recent Polymath collaborative online effort.  It was a big deal last year when Konev and Lisitsa used a SAT-solver to prove that there’s always a subsequence with discrepancy at least 3; Tao’s result now improves on that bound by ∞.

6-photon BosonSampling

Wednesday, August 19th, 2015

The news is more-or-less what the title says!

In Science, a group led by Anthony Laing at Bristol has now reported BosonSampling with 6 photons, beating their own previous record of 5 photons, as well as the earlier record of 4 photons achieved a few years ago by the Walmsley group at Oxford (as well as the 3-photon experiments done by groups around the world).  I only learned the big news from a commenter on this blog, after the paper was already published (protip: if you’ve pushed forward the BosonSampling frontier, feel free to shoot me an email about it).

As several people explain in the comments, the main advance in the paper is arguably not increasing the number of photons, but rather the fact that the device is completely reconfigurable: you can try hundreds of different unitary transformations with the same chip.  In addition, the 3-photon results have an unprecedentedly high fidelity (about 95%).

The 6-photon results are, of course, consistent with quantum mechanics: the transition amplitudes are indeed given by permanents of 6×6 complex matrices.  Key sentence:

After collecting 15 sixfold coincidence events, a confidence of P = 0.998 was determined that these are drawn from a quantum (not classical) distribution.

No one said scaling BosonSampling would be easy: I’m guessing that it took weeks of data-gathering to get those 15 coincidence events.  Scaling up further will probably require improvements to the sources.

There’s also a caveat: their initial state consisted of 2 modes with 3 photons each, as opposed to what we really want, which is 6 modes with 1 photon each.  (Likewise, in the Walmsley group’s 4-photon experiments, the initial state consisted of 2 modes with 2 photons each.)  If the number of modes stayed 2 forever, then the output distributions would remain easy to sample with a classical computer no matter how many photons we had, since we’d then get permanents of matrices with only 2 distinct rows.  So “scaling up” needs to mean increasing not only the number of photons, but also the number of sources.

Nevertheless, this is an obvious step forward, and it came sooner than I expected.  Huge congratulations to the authors on their accomplishment!

But you might ask: given that 6×6 permanents are still pretty easy for a classical computer (the more so when the matrices have only 2 distinct rows), why should anyone care?  Well, the new result has major implications for what I’ve always regarded as the central goal of quantum computing research, much more important than breaking RSA or Grover search or even quantum simulation: namely, getting Gil Kalai to admit he was wrong.  Gil is on record, repeatedly, on this blog as well as his (see for example here), as saying that he doesn’t think BosonSampling will ever be possible even with 7 or 8 photons.  I don’t know whether the 6-photon result is giving him second thoughts (or sixth thoughts?) about that prediction.

Jacob Bekenstein (1947-2015)

Tuesday, August 18th, 2015

Today I learned the sad news that Jacob Bekenstein, one of the great theoretical physicists of our time, passed away at the too-early age of 68.

Everyone knows what a big deal it was when Stephen Hawking discovered in 1975 that black holes radiate.  Bekenstein was the guy who, as a grad student in Princeton in the early 1970s, was already raving about black holes having nonzero entropy and temperature, and satisfying the Second Law of Thermodynamics—something just about everyone, including Hawking, considered nuts at the time.  It was, as I understand it, Hawking’s failed attempt to prove Bekenstein wrong that led to Hawking’s discovery of the Hawking radiation, and thence to the modern picture of black holes.

In the decades since, Bekenstein continued to prove ingenious physical inequalities, often using thought experiments involving black holes.  The most famous of these, the Bekenstein bound, says that the number of bits that can be stored in any bounded physical system is finite, and is upper-bounded by ~2.6×1043 MR, where M is the system’s mass in kilograms and R is its radius in meters.  (This bound is saturated by black holes, and only by black holes, which therefore emerge as the most compact possible storage medium—though probably not the best for retrieval!)  Bekenstein’s lectures were models of clarity and rigor: at conferences full of audacious speculations, he stood out to my non-expert eyes as someone who was simply trying to follow chains of logic from accepted physical principles, however mind-bogglingly far those chains led but no further.

I first met Bekenstein in 2003, when I was a grad student spending a semester at Hebrew University in Jerusalem.  I was struck by the kindness he showed a 21-year-old nobody, who wasn’t even a physics student, coming to bother him.  Not only did he listen patiently to my blather about applying computational complexity to physics, he said that of course physics should ultimately aim to understand everything as the output of some computer program, that he too was thinking in terms of computation when he studied black-hole entropy.  I remember pondering the fact that the greatest reductionist I’d ever met was wearing a yarmulke—and then scolding myself for wasting precious brain-cycles on such a trivial thought when there was science to discuss.  I met Bekenstein maybe four or five more times on visits to Israel, most recently a year and a half ago, when we shared walks to and from the hotel at a firewall workshop at the Weizmann Institute.  He was unfailingly warm, modest, and generous—totally devoid of the egotism that I’ve heard can occasionally afflict people of his stature.  Now, much like with the qubits hitting the event horizon, the information that comprised Jacob Bekenstein might seem to be gone, but it remains woven into the cosmos.

Introducing some British people to P vs. NP

Wednesday, July 22nd, 2015

Here’s a 5-minute interview that I did with The Naked Scientists (a radio show syndicated by the BBC, and no, I’m not sure why it’s called that), explaining the P vs. NP problem.  For readers of this blog, there won’t be anything new here, but, well … you might enjoy the rest of the hour-long programme [sic], which also includes segments about a few other Clay Millennium problems (the Riemann Hypothesis, Navier-Stokes, and Poincaré), as well as a segment about fat fish that live in caves and gorge themselves on food on the rare occasions when it becomes available, and which turn out to share a gene variant with humans with similar tendencies.

Quantum query complexity: the other shoe drops

Tuesday, June 30th, 2015

Two weeks ago I blogged about a breakthrough in query complexity: namely, the refutation by Ambainis et al. of a whole slew of conjectures that had stood for decades (and that I mostly believed, and that had helped draw me into theoretical computer science as a teenager) about the largest possible gaps between various complexity measures for total Boolean functions. Specifically, Ambainis et al. built on a recent example of Göös, Pitassi, and Watson to construct bizarre Boolean functions f with, among other things, near-quadratic gaps between D(f) and R0(f) (where D is deterministic query complexity and R0 is zero-error randomized query complexity), near-1.5th-power gaps between R0(f) and R(f) (where R is bounded-error randomized query complexity), and near-4th-power gaps between D(f) and Q(f) (where Q is bounded-error quantum query complexity). See my previous post for more about the definitions of these concepts and the significance of the results (and note also that Mukhopadhyay and Sanyal independently obtained weaker results).

Because my mental world was in such upheaval, in that earlier post I took pains to point out one thing that Ambainis et al. hadn’t done: namely, they still hadn’t shown any super-quadratic separation between R(f) and Q(f), for any total Boolean function f. (Recall that a total Boolean function, f:{0,1}n→{0,1}, is one that’s defined for all 2n possible input strings x∈{0,1}n. Meanwhile, a partial Boolean function is one where there’s some promise on x: for example, that x encodes a periodic sequence. When you phrase them in the query complexity model, Shor’s algorithm and other quantum algorithms achieving exponential speedups work only for partial functions, not for total ones. Indeed, a famous result of Beals et al. from 1998 says that D(f)=O(Q(f)6) for all total functions f.)

So, clinging to a slender reed of sanity, I said it “remains at least a plausible conjecture” that, if you insist on a fair comparison—i.e., bounded-error quantum versus bounded-error randomized—then the biggest speedup quantum algorithms can ever give you over classical ones, for total Boolean functions, is the square-root speedup that Grover’s algorithm easily achieves for the n-bit OR function.

Today, I can proudly report that my PhD student, Shalev Ben-David, has refuted that conjecture as well.  Building on the Göös et al. and Ambainis et al. work, but adding a new twist to it, Shalev has constructed a total Boolean function f such that R(f) grows roughly like Q(f)2.5 (yes, that’s Q(f) to the 2.5th power). Furthermore, if a conjecture that Ambainis and I made in our recent “Forrelation” paper is correct—namely, that a problem called “k-fold Forrelation” has randomized query complexity roughly Ω(n1-1/k)—then one would get nearly a cubic gap between R(f) and Q(f).

The reason I found this question so interesting is that it seemed obvious to me that, to produce a super-quadratic separation between R and Q, one would need a fundamentally new kind of quantum algorithm: one that was unlike Simon’s and Shor’s algorithms in that it worked for total functions, but also unlike Grover’s algorithm in that it didn’t hit some impassable barrier at the square root of the classical running time.

Flummoxing my expectations once again, Shalev produced the super-quadratic separation, but not by designing any new quantum algorithm. Instead, he cleverly engineered a Boolean function for which you can use a combination of Grover’s algorithm and the Forrelation algorithm (or any other quantum algorithm that gives a huge speedup for some partial Boolean function—Forrelation is just the maximal example), to get an overall speedup that’s a little more than quadratic, while still keeping your Boolean function total. I’ll let you read Shalev’s short paper for the details, but briefly, it once again uses the Göös et al. / Ambainis et al. trick of defining a Boolean function that equals 1 if and only if the input string contains some hidden substructure, and the hidden substructure also contains a pointer to a “certificate” that lets you quickly verify that the hidden substructure was indeed there. You can use a super-fast algorithm—let’s say, a quantum algorithm designed for partial functions—to find the hidden substructure assuming it’s there. If you don’t find it, you can simply output 0. But if you do find it (or think you found it), then you can use the certificate, together with Grover’s algorithm, to confirm that you weren’t somehow misled, and that the substructure really was there. This checking step ensures that the function remains total.

Are there further separations to be found this way? Almost certainly! Indeed, Shalev, Robin Kothari, and I have already found some more things (as well as different/simpler proofs of known separations), though nothing quite as exciting as the above.

Update (July 1): Ronald de Wolf points out in the comments that this “trust-but-verify” trick, for designing total Boolean functions with unexpectedly low quantum query complexities, was also used in a recent paper by himself and Ambainis (while Ashley Montanaro points out that a similar trick was used even earlier, in a different context, by Le Gall).  What’s surprising, you might say, is that it took as long as it did for people to realize how many applications this trick has.

Update (July 2): In conversation with Robin Kothari and Cedric Lin, I realized that Shalev’s superquadratic separation between R and Q, combined with a recent result of Lin and Lin, resolves another open problem that had bothered me since 2001 or so. Given a Boolean function f, define the “projective quantum query complexity,” or P(f), to be the minimum number of queries made by a bounded-error quantum algorithm, in which the answer register gets immediately measured after each query. This is a model of quantum algorithms that’s powerful enough to capture (for example) Simon’s and Shor’s algorithms, but not Grover’s algorithm. Indeed, one might wonder whether there’s any total Boolean function for which P(f) is asymptotically smaller than R(f)—that’s the question I wondered about around 2001, and that I discussed with Elham Kashefi. Now, by using an argument based on the “Vaidman bomb,” Lin and Lin recently proved the fascinating result that P(f)=O(Q(f)2) for all functions f, partial or total. But, combining with Shalev’s result that there exists a total f for which R(f)=Ω(Q(f)2.5), we get that there’s a total f for which R(f)=Ω(P(f)1.25). In the other direction, the best I know is that P(f)=Ω(bs(f)) and therefore R(f)=O(P(f)3).

Five announcements

Saturday, May 16th, 2015

1. Sanjeev Arora sent me a heads-up that there’s a discussion about the future of the STOC conference  at the Windows on Theory blog—in particular, about the idea of turning STOC into a longer “CS theory festival.”  If you have opinions about this, don’t miss the chance to make your voice heard.

2. Back in January, I blogged about a new quantum optimization algorithm by Farhi, Goldstone, and Gutmann, which was notable for being, as far as anyone could tell, the first quantum algorithm to achieve a provably better approximation ratio than the best-known classical algorithm for an NP-hard optimization problem.  Today, I report that a fearsome list of authors—Boaz Barak, Ankur Moitra, Ryan O’Donnell, Prasad Raghavendra, Oded Regev, David Steurer, Luca Trevisan, Aravindan Vijayaraghavan, David Witmer, and John Wright—has put out an eagerly-awaited paper that gives a classical algorithm for the same problem, with better performance than the quantum algorithm’s.  (They write that this “improves both qualitatively and quantitatively” on Farhi et al.’s work; I assume “qualitatively” refers to the fact that the new algorithm is classical.)  What happened, apparently, is that after I blogged (with enthusiasm) about the Farhi et al. result, a bunch of classical complexity theorists read my post and decided independently that they could match or beat the quantum algorithm’s performance classically; then they found out about each other and decided to merge their efforts.  I’m proud to say that this isn’t the first example of this blog catalyzing actual research progress, though it’s probably the best example so far.  [Update: Luca Trevisan now has a great post explaining what happened in much more detail, entitled “How Many Theoreticians Does It Take to Approximate Max 3Lin?”]

Another update: Farhi et al. have posted a new version of their paper, in which they can almost match the performance of the classical algorithm using their quantum algorithm.

3. Jennifer Ouellette has a wonderful article in Quanta magazine about recent progress in AdS/MERA (i.e., “the emergence of spacetime from entanglement”), centered around the ideas of Brian Swingle.  This is one of the main things that I’d love to understand better right now—if I succeed even partially, you’ll know because I’ll write a blog post trying to explain it to others.  See also this blog post by Sean Carroll (about this paper by Ning Bao et al.), and this paper by Pastawski, Yoshida, Harlow, and Preskill, which explicitly mines the AdS/CFT correspondence for examples of quantum error-correcting codes.

4. Celebrity rationalist Julia Galef, who I had the great honor of meeting recently, has a podcast interview with Sean Carroll about why Carroll accepts the many-worlds interpretation.  (Or if, like me, you prefer the written word to the spoken one, click here for a full transcript.)  Unfortunately, Sean is given the opportunity at the end of the interview to recommend one science book to his listeners—just one!—but he squanders it by plugging some weird, self-indulgent thing called Quantum Computing Since Democritus.  Julia also has a YouTube video about what she learned from the interview, but I haven’t yet watched it (is there a transcript?).

5. I came across an insightful if meandering essay about nerd culture by Meredith L. Patterson.  In particular, noticing how the term “nerd” has been co-opted by normal, socially-skilled people, who’ve quickly set about remaking nerd social norms to make them identical to the rest of the world’s norms, Patterson coins the term “weird-nerd” to describe people like herself, who are still nerds in the original sense and who don’t see nerd culture as something horribly, irreparably broken.  As she writes: “We’ll start to feel less defensive when we get some indication — any indication — that our critics understand what parts of our culture we don’t want to lose and why we don’t want to lose them.”  (But is this the start of a linguistic treadmill?  Will we eventually need to talk about weird-weird-nerds, etc.?)

Two papers

Tuesday, April 21st, 2015

Just to get myself back into the habit of blogging:

For those of you who don’t read Lance’s and Bill’s blog, there was a pretty significant breakthrough in complexity theory announced last week.  (And yes, I’m now spending one of the two or so uses of the word “breakthrough” that I allow myself per year—wait, did I just spend the second one with this sentence?)  Ben Rossman (a former MIT PhD student whose thesis committee I was honored to serve on), Rocco Servedio, and Li-Yang Tan have now shown that the polynomial hierarchy is infinite relative to a random oracle, thereby solving the main open problem from Johan Håstad’s 1986 PhD thesis.  While it feels silly even to mention it, the best previous result in this direction was to separate PNP from Σ2P relative to a random oracle, which I did in my Counterexample to the Generalized Linial-Nisan Conjecture paper.  In some sense Rossman et al. infinitely improve on that (using completely different techniques).  Proving their result boils down to proving a new lower bound on the sizes of constant-depth circuits.  Basically, they need to show that, for every k, there are problems that can be solved by small circuits with k layers of AND, OR, and NOT gates, but for which the answer can’t even be guessed, noticeably better than chance, by any small circuit with only k-1 layers of AND, OR, and NOT gates.  They achieve that using a new generalization of the method of random restrictions.  Congratulations to Ben, Rocco, and Li-Yang!

Meanwhile, if you want to know what I’ve been doing for the last couple months, one answer is contained in this 68-page labor of love preprint by me and my superb PhD students Daniel Grier and Luke Schaeffer.  There we give a full classification of all possible sets of classical reversible gates acting on bits (like the Fredkin, Toffoli, and CNOT gates), as well as a linear-time algorithm to decide whether one reversible gate generates another one (previously, that problem wasn’t even known to be decidable).  We thereby completely answer a question that basically no one was asking, although I don’t understand why not.

The Turing movie

Tuesday, December 16th, 2014

Last week I finally saw The Imitation Game, the movie with Benedict Cumberbatch as Alan Turing.

OK, so for those who haven’t yet seen it: should you?  Here’s my one paragraph summary: imagine that you told the story of Alan Turing—one of the greatest triumphs and tragedies of human history, needing no embellishment whatsoever—to someone who only sort-of understood it, and who filled in the gaps with weird fabrications and Hollywood clichés.  And imagine that person retold the story to a second person, who understood even less, and that that person retold it to a third, who understood least of all, but who was charged with making the movie that would bring Turing’s story before the largest audience it’s ever had.  And yet, imagine that enough of the enormity of the original story made it through this noisy channel, that the final product was still pretty good.  (Except, imagine how much better it could’ve been!)

The fabrications were especially frustrating to me, because we know it’s possible to bring Alan Turing’s story to life in a way that fully honors the true science and history.  We know that, because Hugh Whitemore’s 1986 play Breaking the Code did it.  The producers of The Imitation Game would’ve done better just to junk their script, and remake Breaking the Code into a Hollywood blockbuster.  (Note that there is a 1996 BBC adaptation of Breaking the Code, with Derek Jacobi as Turing.)

Anyway, the movie focuses mostly on Turing’s codebreaking work at Bletchley Park, but also jumps around in time to his childhood at Sherborne School, and to his arrest for “homosexual indecency” and its aftermath.  Turing’s two world-changing papers—On Computable Numbers and Computing Machinery and Intelligence—are both mentioned, though strangely, his paper about computing zeroes of the Riemann zeta function is entirely overlooked.

• The boastful, trash-talking, humor-impaired badass-nerd of the movie seems a lot closer to The Big Bang Theory‘s Sheldon Cooper, or to some other Hollywood concept of “why smart people are so annoying,” than to the historical Alan Turing.  (At least in Sheldon’s case, the archetype is used for laughs, not drama or veracity.)  As portrayed in the definitive biography (Andrew Hodges’ Alan Turing: The Enigma), Turing was eccentric, sure, and fiercely individualistic (e.g., holding up his pants with pieces of string), but he didn’t get off on insulting the intelligence of the people around him.
• In the movie, Turing is pretty much singlehandedly responsible for designing, building, and operating the Bombes (the codebreaking machines), which he does over the strenuous objections of his superiors.  This, of course, is absurd: Bletchley employed about 10,000 people at its height.  Turing may have been the single most important cog in the operation, but he was still a cog.  And by November 1942, the operation was already running smoothly enough that Turing could set sail for the US (in waters that were now much safer, thanks to Bletchley!), to consult on other cryptographic projects at Bell Labs.
• But perhaps the movie’s zaniest conceit is that Turing was also in charge of deciding what to do with Bletchley’s intelligence (!).  In the movie, it falls to him, not the military, to decide which ship convoys will be saved, and which sacrificed to prevent spilling Bletchley’s secret.  If that had any historicity to it, it would surely be the most military and political power ever entrusted to a mathematician (update: see the comments section for potential counterexamples).
• It’s true that Turing (along with three other codebreakers) wrote a letter directly to Winston Churchill, pleading for more funding for Bletchley Park—and that Churchill saw the letter, and ordered “Action this day! Make sure they have all they want on extreme priority.”  However, the letter was not a power play to elevate Turing over Hugh Alexander and his other colleagues: in fact, Alexander co-signed the letter.  More broadly, the fierce infighting between Turing and everyone else at Bletchley Park, central to the movie’s plot, seems to have been almost entirely invented for dramatic purposes.
• The movie actually deserves a lot of credit for getting right that the major technical problem of Bletchley Park was how to get the Bombes to search through keys fast enough—and that speeding things up is where Turing made a central contribution.  As a result, The Imitation Game might be the first Hollywood movie ever made whose plot revolves around computational efficiency.  (Counterexamples, anyone?)  Unfortunately, the movie presents Turing’s great insight as being that one can speed up the search by guessing common phrases, like “HEIL HITLER,” that are likely to be in the plaintext.  That was, I believe, obvious to everyone from the beginning.
• Turing never built a computer in his own home, and he never named a computer “Christopher,” after his childhood crush Christopher Morcom.  (On the other hand, Christopher Morcom existed, and his early death from tuberculosis really did devastate Turing, sending him into morbid-yet-prescient ruminations about whether a mind could exist separately from a brain.)
• I found it ironic that The Imitation Game, produced in 2014, is far more squeamish about on-screen homosexuality than Breaking the Code, produced in 1986.  Turing talks about being gay (which is an improvement over 2001’s Enigma, which made Turing straight!), but is never shown embracing another man.  However, the more important problem is that the movie botches the story of the burglary of Turing’s house (i.e., the event that led to Turing’s arrest and conviction for homosexual indecency), omitting the role of Turing’s own naiveté in revealing his homosexuality to the police, and substituting some cloak-and-dagger spy stuff.  Once again, Breaking the Code handled this perfectly.
• In one scene, Euler is pronounced “Yooler.”

For more, see an excellent piece in Slate, How Accurate Is The Imitation Game?.  And for other science bloggers’ reactions, see this review by Christos Papadimitriou (which I thought was extremely kind, though it focuses more on Turing himself than on the movie), this reaction by Peter Woit, which largely echoes mine, and this by Clifford Johnson.

PostBQP Postscripts: A Confession of Mathematical Errors

Sunday, November 30th, 2014

tl;dr: This post reveals two errors in one of my most-cited papers, and also explains how to fix them.  Thanks to Piotr Achinger, Michael Cohen, Greg Kuperberg, Ciaran Lee, Ryan O’Donnell, Julian Rosen, Will Sawin, Cem Say, and others for their contributions to this post.

If you look at my Wikipedia page, apparently one of the two things in the world that I’m “known for” (along with algebrization) is “quantum Turing with postselection.”  By this, Wikipedia means my 2004 definition of the complexity class PostBQP—that is, the class of decision problems solvable in bounded-error quantum polynomial time, assuming the ability to postselect (or condition) on certain measurement outcomes—and my proof that PostBQP coincides with the classical complexity PP (that is, the class of decision problems expressible in terms of whether the number of inputs that cause a given polynomial-time Turing machine to accept does or doesn’t exceed some threshold).

To explain this a bit: even without quantum mechanics, it’s pretty obvious that, if you could “postselect” on exponentially-unlikely events, then you’d get huge, unrealistic amounts of computational power.  For example (and apologies in advance for the macabre imagery), you could “solve” NP-complete problems in polynomial time by simply guessing a random solution, then checking whether the solution is right, and shooting yourself if it happened to be wrong!  Conditioned on still being alive (and if you like, appealing to the “anthropic principle”), you must find yourself having guessed a valid solution—assuming, of course, that there were any valid solutions to be found.  If there weren’t any, then you’d seem to be out of luck!  (Exercise for the reader: generalize this “algorithm,” so that it still works even if you don’t know in advance whether your NP-complete problem instance has any valid solutions.)

So with the PostBQP=PP theorem, the surprise was not that postselection gives you lots of computational power, but rather that postselection combined with quantum mechanics gives you much more power even than postselection by itself (or quantum mechanics by itself, for that matter).  Since PPP=P#P, the class PP basically captures the full difficulty of #P-complete counting problems—that is, not just solving an NP-complete problem, but counting how many solutions it has.  It’s not obvious that a quantum computer with postselection can solve counting problems, but that’s what the theorem shows.  That, in turn, has implications for other things: for example, I showed it can be used to prove classical facts about PP, like the fact that PP is closed under intersection (the Beigel-Reingold-Spielman Theorem), in a straightforward way; and it’s also used to show the hardness of quantum sampling problems, in the work of Bremner-Jozsa-Shepherd as well as my BosonSampling work with Arkhipov.

I’m diffident about being “known for” something so simple; once I had asked the question, the proof of PostBQP=PP took me all of an hour to work out.  Yet PostBQP ended up being a hundred times more influential for quantum computing theory than things on which I expended a thousand times more effort.  So on balance, I guess I’m happy to call PostBQP my own.

That’s why today’s post comes with a special sense of intellectual responsibility.  Within the last month, it’s come to my attention that there are at least two embarrassing oversights in my PostBQP paper from a decade ago, one of them concerning the very definition of PostBQP.  I hasten to clarify: once one fixes up the definition, the PostBQP=PP theorem remains perfectly valid, and all the applications of PostBQP that I mentioned above—for example, to reproving Beigel-Reingold-Spielman, and to the hardness of quantum sampling problems—go through just fine.  But if you think I have nothing to be embarrassed about: well, read on.

The definitional subtlety came clearly to my attention a few weeks ago, when I was lecturing about PostBQP in my 6.845 Quantum Complexity Theory graduate class.  I defined PostBQP as the class of languages L⊆{0,1}* for which there exists a polynomial-time quantum Turing machine M such that, for all inputs x∈{0,1}*,

• M(x) “succeeds” (determined, say, by measuring its first output qubit in the {|0>,|1>} basis) with nonzero probability.
• If x∈L, then conditioned on M(x) succeeding, M(x) “accepts” (determined, say, by measuring its second output qubit in the {|0>,|1>} basis) with probability at least 2/3.
• If x∉L, then conditioned on M(x) succeeding, M(x) accepts with probability at most 1/3.

I then had to reassure the students that PostBQP, so defined, was a “robust” class: that is, that the definition doesn’t depend on stupid things like which set of quantum gates we allow. I argued that, even though we’re postselecting on exponentially-unlikely events, it’s still OK, because the Solovay-Kitaev Theorem lets us approximate any desired unitary to within exponentially-small error, with only a polynomial increase in the size of our quantum circuit. (Here we actually need the full power of the Solovay-Kitaev Theorem, in contrast to ordinary BQP, where we only need part of the power.)

A student in the class, Michael Cohen, immediately jumped in with a difficulty: what if M(x) succeeded, not with exponentially-small probability, but with doubly-exponentially-small probability—say, exp(-2n)?  In that case, one could no longer use the Solovay-Kitaev Theorem to show the irrelevance of the gate set.  It would no longer even be clear that PostBQP⊆PP, since the PP simulation might not be able to keep track of such tiny probabilities.

Thinking on my feet, I replied that we could presumably choose a set of gates—for example, gates involving rational numbers only—for which doubly-exponentially-small probabilities would never arise.  Or if all else failed, we could simply add to the definition of PostBQP that M(x) had to “succeed” with probability at least 1/exp(n): after all, that was the only situation I ever cared about anyway, and the only one that ever arose in the applications of PostBQP.

But the question still gnawed at me: was there a problem with my original, unamended definition of PostBQP?  If we weren’t careful in choosing our gate set, could we have cancellations that produced doubly-exponentially-small probabilities?  I promised I’d think about it more.

By a funny coincidence, just a couple weeks later, Ciaran Lee, a student at Oxford, emailed me the exact same question.  So on a train ride from Princeton to Boston, I decided to think about it for real.  It wasn’t hard to show that, if the gates involved square roots of rational numbers only—for example, if we’re dealing with the Hadamard and Toffoli gates, or the cos(π/8) and CNOT gates, or other standard gate sets—then every measurement outcome has at least 1/exp(n) probability, so there’s no problem with the definition of PostBQP.  But I didn’t know what might happen with stranger gate sets.

As is my wont these days—when parenting, teaching, and so forth leave me with almost no time to concentrate on math—I posted the problem to MathOverflow.  Almost immediately, I got incisive responses.  First, Piotr Achinger pointed out that, if we allow arbitrary gates, then it’s easy to get massive cancellations.  In more detail, let {an} be extremely-rapidly growing sequence of integers, say with an+1 > exp(an).  Then define

$$\alpha = \sum_{n=1}^{\infty} 0.1^{a_n}.$$

If we write out α in decimal notation, it will consist of mostly 0’s, but with 1’s spaced further and further apart, like so: 0.1101000000000001000….  Now consider a gate set that involves α as well as 0.1 and -0.1 as matrix entries.  Given n qubits, it’s not hard to see that we can set up an interference experiment in which one of the paths leading to a given outcome E has amplitude α, and the other paths have amplitudes $$-(0.1^{a_1}), -(0.1^{a_2}), \ldots, -(0.1^{a_k}),$$ where k is the largest integer such that ak≤n. In that case, the total amplitude of E will be about $$0.1^{a_{k+1}},$$ which for most values of n is doubly-exponentially small in n. Of course, by simply choosing a faster-growing sequence {an}, we can cause an even more severe cancellation.

Furthermore, by modifying the above construction to involve two crazy transcendental numbers α and β, I claim that we can set up a PostBQP computation such that deciding what happens is arbitrarily harder than PP (though still computable)—say, outside of exponential space, or even triple-exponential space. Moreover, we can do this despite the fact that the first n digits of α and β remain computable in O(n) time. The details are left as an exercise for the interested reader.

Yet even though we can engineer massive cancellations with crazy gates, I still conjectured that nothing would go wrong with “normal” gates: for example, gates involving algebraic amplitudes only. More formally, I conjectured that any finite set A=(a1,…,ak) of algebraic numbers is “tame,” in the sense that, if p is any degree-n polynomial with integer coefficients at most exp(n) in absolute value, then p(a1,…,ak)≠0 implies |p(a1,…,ak)|≥1/exp(n). And indeed, Julian Rosen on MathOverflow found an elegant proof of this fact. I’ll let you read it over there if you’re interested, but briefly, it interprets the amplitude we want as one particular Archimedean valuation of a certain element of a number field, and then lower-bounds the amplitude by considering the product of all Archimedean and non-Archimedean valuations (the latter of which involves the p-adic numbers). Since this was a bit heavy-duty for me, I was grateful when Will Sawin reformulated the proof in linear-algebraic terms that I understood.

And then came the embarrassing part. A few days ago, I was chatting with Greg Kuperberg, the renowned mathematician and author of our climate-change parable. I thought he’d be interested in this PostBQP progress, so I mentioned it to him. Delicately, Greg let me know that he had recently proved the exact same results, for the exact same reason (namely, fixing the definition of PostBQP), for the latest revision of his paper How Hard Is It to Approximate the Jones Polynomial?. Moreover, he actually wrote to me in June to tell me about this! At the time, however, I regarded it as “pointless mathematical hairsplitting” (who cares about these low-level gate-set issues anyway?). So I didn’t pay it any attention—and then I’d completely forgotten about Greg’s work when the question resurfaced a few months later. This is truly a just punishment for looking down on “mathematical hairsplitting,” and not a lesson I’ll soon forget.

Anyway, Greg’s paper provides yet a third proof that the algebraic numbers are tame, this one using Galois conjugates (though it turns out that, from a sufficiently refined perspective, Greg’s proof is equivalent to the other two).

There remains one obvious open problem here, one that I noted in the MathOverflow post and in which Greg is also extremely interested. Namely, we now know that it’s possible to screw up PostBQP using gates with amplitudes that are crazy transcendental numbers (closely related to the Liouville numbers). And we also know that, if the gates have algebraic amplitudes, then everything is fine: all events have at least 1/exp(n) probability. But what if the gates have not-so-crazy transcendental amplitudes, like 1/e, or (a bit more realistically) cos(2)?  I conjecture that everything is still fine, but the proof techniques that worked for the algebraic case seem useless here.

Stepping back, how great are the consequences of all this for our understanding of PostBQP? Fortunately, I claim that they’re not that great, for the following reason. As Adleman, DeMarrais, and Huang already noted in 1997—in the same paper that proved BQP⊆PP—we can screw up the definition even of BQP, let alone PostBQP, using a bizarre enough gate set. For example, suppose we had a gate G that mapped |0> to x|0>+y|1>, where y was a real number whose binary expansion encoded the halting problem (for example, y might equal Chaitin’s Ω).  Then by applying G more and more times, we could learn more and more bits of y, and thereby solve an uncomputable problem in the limit n→∞.

Faced with this observation, most quantum computing experts would say something like: “OK, but this is silly! It has no physical relevance, since we’ll never come across a magical gate like G—if only we did! And at any rate, it has nothing to do with quantum computing specifically: even classically, one could imagine a coin that landed heads with probability equal to Chaitin’s Ω. Therefore, the right way to deal with this is simply to define BQP in such a way as to disallow such absurd gates.” And indeed, that is what’s done today—usually without even remarking on it.

Now, it turns out that even gates that are “perfectly safe” for defining BQP, can turn “unsafe” when it comes to defining PostBQP. To screw up the definition of PostBQP, it’s not necessary that a gate involve uncomputable (or extremely hard-to-compute) amplitudes: the amplitudes could all be easily computable, but they could still be “unsafe” because of massive cancellations, as in the example above involving α. But one could think of this as a difference of degree, rather than of kind. It’s still true that there’s a large set of gates, including virtually all the gates anyone has ever cared about in practice (Toffoli, Hadamard, π/8, etc. etc.), that are perfectly safe for defining the complexity class; it’s just that the set is slightly smaller than it was for BQP.

The other issue with the PostBQP=PP paper was discovered by Ryan O’Donnell and Cem Say.  In Proposition 3 of the paper, I claim that PostBQP = BQPPostBQP||,classical, where the latter is the class of problems solvable by a BQP machine that’s allowed to make poly(n) parallel, classical queries to a PostBQP oracle.  As Ryan pointed out to me, nothing in my brief argument for this depended on quantum mechanics, so it would equally well show that PostBPP = BPPPostBPP||, where PostBPP (also known as BPPpath) is the classical analogue of PostBQP, and BPPPostBPP|| is the class of problems solvable by a BPP machine that can make poly(n) parallel queries to a PostBPP oracle.  But BPPPostBPP|| clearly contains BPPNP||, which in turn contains AM—so we would get AM in PostBPP, and therefore AM in PostBQP=PP.  But Vereshchagin gave an oracle relative to which AM is not contained in PP.  Since there was no nonrelativizing ingredient anywhere in my argument, the only possible conclusion is that my argument was wrong.  (This, incidentally, provides a nice illustration of the value of oracle results.)

In retrospect, it’s easy to pinpoint what went wrong.  If we try to simulate BPPPostBPP|| in PostBPP, our random bits will be playing a dual role: in choosing the queries to be submitted to the PostBPP oracle, and in providing the “raw material for postselection,” in computing the responses to those queries.  But in PostBPP, we only get to postselect once.  When we do, the two sets of random bits that we’d wanted to keep separate will get hopelessly mixed up, with the postselection acting on the “BPP” random bits, not just on the “PostBPP” ones.

How can we fix this problem?  Well, when defining the class BQPPostBQP||,classical, suppose we require the queries to the PostBQP oracle to be not only “classical,” but deterministic: that is, they have to be generated in advance by a P machine, and can’t depend on any random bits whatsoever.  And suppose we define BPPPostBPP||,classical similarly.  In that case, it’s not hard to see that the equalities BQPPostBQP||,classical = PostBQP and BPPPostBPP||,classical = PostBPP both go through.  You don’t actually care about this, do you?  But Ryan O’Donnell and Cem Say did, and that’s good enough for me.

I wish I could say that these are the only cases of mistakes recently being found in decade-old papers of mine, but alas, such is not the case.  In the near future, my student Adam Bouland, MIT undergrad Mitchell Lee, and Singapore’s Joe Fitzsimons will post to the arXiv a paper that grew out of an error in my 2005 paper Quantum Computing and Hidden Variables. In that paper, I introduced a hypothetical generalization of the quantum computing model, in which one gets to see the entire trajectory of a hidden variable, rather than just a single measurement outcome. I showed that this generalization would let us solve problems somewhat beyond what we think we can do with a “standard” quantum computer. In particular, we could solve the collision problem in O(1) queries, efficiently solve Graph Isomorphism (and all other problems in the Statistical Zero-Knowledge class), and search an N-element list in only ~N1/3 steps, rather than the ~N1/2 steps of Grover’s search algorithm. That part of the paper remains fine!

On the other hand, at the end of the paper, I also gave a brief argument to show that, even in the hidden-variable model, ~N1/3 steps are required to search an N-element list. But Mitchell Lee and Adam Bouland discovered that that argument is wrong: it fails to account for all the possible ways that an algorithm could exploit the correlations between the hidden variable’s values at different moments in time. (I’ve previously discussed this error in other blog posts, as well as in the latest edition of Quantum Computing Since Democritus.)

If we suitably restrict the hidden-variable theory, then we can correctly prove a lower bound of ~N1/4, or even (with strong enough assumptions) ~N1/3; and we do that in the forthcoming paper. Even with no restrictions, as far as we know an ~N1/3 lower bound for search with hidden variables remains true. But it now looks like proving it will require a major advance in our understanding of hidden-variable theories: for example, a proof that the “Schrödinger theory” is robust to small perturbations, which I’d given as the main open problem in my 2005 paper.

As if that weren’t enough, in my 2003 paper Quantum Certificate Complexity, I claimed (as a side remark) that one could get a recursive Boolean function f with an asymptotic gap between the block sensitivity bs(f) and the randomized certificate complexity RC(f). However, two and a half years ago, Avishay Tal discovered that this didn’t work, because block sensitivity doesn’t behave nicely under composition.  (In assuming it did, I was propagating an error introduced earlier by Wegener and Zádori.)  More broadly, Avishay showed that there is no recursively-defined Boolean function with an asymptotic gap between bs(f) and RC(f). On the other hand, if we just want some Boolean function with an asymptotic gap between bs(f) and RC(f), then Raghav Kulkarni observed that we can use a non-recursive function introduced by Xiaoming Sun, which yields bs(f)≈N3/7 and RC(f)≈N4/7. This is actually a larger separation than the one I’d wrongly claimed.

Now that I’ve come clean about all these things, hopefully the healing can begin at last.