A query complexity breakthrough

Update (June 26): See this just-released paper, which independently obtains a couple of the same results as the Ambainis et al. paper, but in a different way (using the original Göös et al. function, rather than modifications of it).

Lots of people have accused me of overusing the word “breakthrough” on this blog. So I ask them: what word should I use when a paper comes out that solves not one, not two, but three of the open problems I’ve cared about most for literally half of my life, since I was 17 years old?

Yesterday morning, Andris Ambainis, Kaspars Balodis, Aleksandrs Belovs, Troy Lee, Miklos Santha, and Juris Smotrovs posted a preprint to ECCC in which they give:

(1) A total Boolean function f with roughly a fourth-power separation between its deterministic and bounded-error quantum query complexities (i.e., with D(f)~Q(f)4). This refutes the conjecture, which people have been making since Beals et al.’s seminal work in 1998, that the biggest possible gap is quadratic.

(2) A total Boolean function f with a quadratic separation between its deterministic and randomized query complexities (with D(f)~R0(f)2). This refutes a conjecture of Saks and Wigderson from 1986, that the best possible gap is R0(f)~D(f)0.753 (from the recursive AND/OR tree), and shows that the known relation D(f)=O(R0(f)2) is close to tight.

(3) The first total Boolean function f with any asymptotic gap between its zero-error and bounded-error randomized query complexities (in particular, with R0(f)~R(f)3/2).

(There are also other new separations—for example, involving exact quantum query complexity and approximate degree as a real polynomial. But the above three are the most spectacular to me.)

In updates to this post (coming soon), I’ll try my best to explain to general readers what D(f), R(f), and so forth are (see here for the classic survey of these measures), and I’ll also discuss how Ambainis et al. designed the strange functions f that achieve the separations (though their paper already does a good job of explaining it). For now, I’ll just write the stuff that’s easier to write.

I’m at the Federated Computing Research Conference in Portland, Oregon right now, where yesterday I gave my STOC talk (click here for the PowerPoint slides) about the largest possible separations between R(f) and Q(f) for partial Boolean functions f. (That paper is also joint work with Andris Ambainis, who has his fingers in many pies, or his queries in many oracles, or something.) Anyway, when I did a practice run of my talk on Monday night, I commented that, of course, for total Boolean functions f (those not involving a promise), the largest known gap between R(f) and Q(f) is quadratic, and is achieved when f is the OR function because of Grover’s algorithm.

Then, Tuesday morning, an hour before I was to give my talk, I saw the Ambainis et al. bombshell, which made that comment obsolete. So, being notoriously bad at keeping my mouth shut, I mentioned to my audience that, while it was great that they came all the way to Portland to learn what was new in theoretical computer science, if they wanted real news in the subfield I was talking about, they could stop listening to me and check their laptops.

(Having said that, I have had a wonderful time at FCRC, and have learned lots of other interesting things—I can do another addendum to the post about FCRC highlights if people want me to.)

Anyway, within the tiny world of query complexity—i.e., the world where I cut my teeth and spent much of my career—the Ambainis et al. paper is sufficiently revolutionary that I feel the need to say what it doesn’t do.

First, the paper does not give a better-than-quadratic gap between R(f) and Q(f) (i.e., between bounded-error randomized and quantum query complexities). The quantum algorithms that compute their functions f are still “just” variants of the old standbys, Grover’s algorithm and amplitude amplification. What’s new is that the authors have found functions where you can get the quadratic, Grover speedup between R(f) and Q(f), while also getting asymptotic gaps between D(f) and R(f), and between R0(f) and R(f). So, putting it together, you get superquadratic gaps between D(f) and Q(f), and between R0(f) and Q(f). But it remains at least a plausible conjecture that R(f)=O(Q(f)2) for all total Boolean functions f—i.e., if you insist on a “fair comparison,” then the largest known quantum speedup for total Boolean functions remains the Grover one.

Second, as far as I can tell (I might be mistaken) (I’m not), the paper doesn’t give new separations involving certificate complexity or block sensitivity (e.g., between D(f) and bs(f)). So for example, it remains open whether D(f)=O(bs(f)2), and whether C(f)=O(bs(f)α) for some α<2. (Update: Avishay Tal, in the comments, informs me that the latter conjecture was falsified by Gilmer, Saks, and Srinivasan in 2013. Wow, I’m really out of it!)

In the end, achieving these separations didn’t require any sophisticated new mathematical machinery—just finding the right functions, something that could’ve been done back in 1998, had anyone been clever enough. So, where did these bizarre functions f come from? Ambainis et al. directly adapted them from a great recent communication complexity paper by Mika Göös, Toniann Pitassi, and Thomas Watson. But the Göös et al. paper itself could’ve been written much earlier. It’s yet another example of something I’ve seen again and again in this business, how there’s no substitute for just playing around with a bunch of examples.

The highest compliment one researcher can pay another is, “I wish I’d found that myself.” And I do, of course, but having missed it, I’m thrilled that at least I get to be alive for it and blog about it. Huge congratulations to the authors!

Addendum: What’s this about?

OK, so let’s say you have a Boolean function f:{0,1}n→{0,1}, mapping n input bits to 1 output bit. Some examples are the OR function, which outputs 1 if any of the n input bits are 1, and the MAJORITY function, which outputs 1 if the majority of them are.

Query complexity is the study of how many input bits you need to read in order to learn the value of the output bit. So for example, in evaluating the OR function, if you found a single input bit that was 1, you could stop right there: you’d know that the output was 1, without even needing to look at the remaining bits. In the worst case, however, if the input consisted of all 0s, you’d have to look at all of them before you could be totally sure the output was 0. So we say that the OR function has a deterministic query complexity of n.

In this game, we don’t care about any other resources used by an algorithm, like memory or running time: just how many bits of the input it looks at! There are many reasons why, but the simplest is that, unlike with memory or running time, for many functions we can actually figure out how many input bits need to be looked at, without needing to solve anything like P vs. NP. (But note that this can already be nontrivial! For algorithms can often cleverly avoid looking at all the bits, for example by looking at some and then deciding which ones to look at next based on which values they see.)

In general, given a deterministic algorithm A and an n-bit input string x, let DA,x (an integer from 0 to n) be the number of bits of x that A examines when you run it. Then let DA be the maximum of DA,x over all n-bit strings x. Then D(f), or the deterministic query complexity of f, is the minimum of DA, over all algorithms A that correctly evaluate f(x) on every input x.

For example, D(OR) and D(MAJORITY) are both n: in the worst case, you need to read everything. For a more interesting example, consider the 3-bit Boolean function

f(x,y,z) = (not(x) and y) or (x and z).

This function has D(f)=2, even though it depends on all 3 of the input bits. (Do you see why?) In general, even if f depends on n input bits, D(f) could be as small as log2n.

The bounded-error randomized query complexity, or R(f), is like D(f), except that now we allow the algorithm to make random choices of which input bit to query, and for each input x, the algorithm only needs to compute f(x) with probability 2/3. (Here the choice of 2/3 is arbitrary; if you wanted the right answer with some larger constant probability, say 99.9%, you could just repeat the algorithm a constant number of times and take a majority vote.) The zero-error randomized query complexity, or R0(f), is the variant where the algorithm is allowed to make random choices, but at the end of the day, needs to output the correct f(x) with probability 1.

To illustrate these concepts, consider the three-bit majority function, MAJ(x,y,z). We have D(MAJ)=3, since if a deterministic algorithm queried one bit and got a 0 and queried a second bit and got a 1 (as can happen), it would have no choice but to query the third bit. But for any possible setting of x, y, and z, if we choose which bits to query randomly, there’s at least a 1/3 chance that the first two queries will return either two 0s or two 1s—at which point we can stop, with no need to query the third bit. Hence R0(MAJ)≤(1/3)2+(2/3)3=8/3 (in fact it equals 8/3, although we haven’t quite shown that). Meanwhile, R(MAJ), as we defined it, is only 1, since if you just need a 2/3 probability of being correct, you can simply pick x, y, or z at random and output it!

The bounded-error quantum query complexity, or Q(f), is the minimum number of queries made by a quantum algorithm for f, which, again, has to output the right answer with probability at least 2/3 for every input x. Here a quantum algorithm makes a “query” by feeding a superposition of basis states, each of the form |i,a,w〉, to a “black box,” which maps each basis state to |i, a XOR xi, w〉, where i is the index of the input bit xi to be queried, a is a 1-qubit “answer register” into which xi is reversibly written, and w is a “workspace” that doesn’t participate in the query. In between two queries, the algorithm can apply any unitary transformation it wants to the superposition of |i,a,w〉’s, as long as it doesn’t depend on x. Finally, some designated qubit is measured to decide whether the algorithm accepts or rejects.

As an example, consider the 2-bit XOR function, XOR(x,y). We have D(XOR)=R0(XOR)=R(XOR)=2, since until you’ve queried both bits, you’ve learned nothing about their XOR. By contrast, Q(XOR)=1, because of the famous Deutsch-Jozsa algorithm.

It’s clear that

0 ≤ Q(f) ≤ R(f) ≤ R0(f) ≤ D(f) ≤ n,

since a quantum algorithm can simulate a randomized one and a randomized one can simulate a deterministic one.

A central question for the field, since these measures were studied in the 1980s or so, has been how far apart these measures can get from each other. If you allow partial Boolean functions—meaning that only some n-bit strings, not all of them, are “valid inputs” for which the algorithm needs to return a definite answer—then it’s easy to get enormous separations between any two of the measures (indeed, even bigger than exponential), as for example in my recent paper with Andris.

For total functions, by contrast, it’s been known for a long time that these measures can differ by at most polynomial factors:

D(f) = O(R(f)3) (Nisan)

D(f) = O(R0(f)2) (folklore, I think)

R0(f) = O(R2 log(n)) (Midrijanis)

D(f) = O(Q(f)6) (Beals et al. 1998)

OK, so what were the largest known gaps? For D versus R0 (as well as D versus R), the largest known gap since 1986 has come from the “recursive AND/OR tree”: that is, an OR of two ANDs of two ORs of two ANDs of … forming a complete binary tree of depth d, with the n=2d input variables comprising the leaves. For this function, we have D(f)=n, whereas Saks and Wigderson showed that R0(f)=Θ(n0.753) (and later, Santha showed that R(f)=Θ(n0.753) as well).

For D versus Q, the largest gap has been for the OR function: we have D(OR)=n (as mentioned earlier), but Q(OR)=Θ(√n) because of Grover’s algorithm. Finally, for R0 versus R, no asymptotic gap has been known for any total function. (This is a problem that I clearly remember working on back in 2000, when I was an undergrad. I even wrote a computer program, the Boolean Function Wizard, partly to search for separations between R0 versus R. Alas, while I did find one or two functions with separations, I was unable to conclude anything from them about asymptotics.)

So, how did Ambainis et al. achieve bigger gaps for each of these? I’ll try to have an explanation written by the time my flight from Portland to Boston has landed tonight. But if you can’t wait for that, or you prefer it straight from the horse’s mouth, read their paper!

Addendum 2: The Actual Functions

As I mentioned before, the starting point for everything Ambainis et al. do is a certain Boolean function g recently constructed by Göös, Pitassi, and Watson (henceforth GPW), for different purposes than the ones that concern Ambainis et al. We think of the inputs to g as divided into nm “cells,” which are arranged in a rectangular grid with m columns and n rows. Each cell contains a bit that’s either 0 or 1 (its “label), as well as a pointer to another cell (consisting of ~log2(nm) bits). The pointer can also be “null” (i.e., can point nowhere). We’ll imagine that a query of a cell gives you everything: the label and all the bits of the pointer. This could increase the query complexity of an algorithm, but only by a log(n) factor, which we won’t worry about.

Let X be a setting of all the labels and pointers in the grid. Then the question we ask about X is the following:

    Does there exist a “marked column”: that is, a column where all n of the labels are 1, and which has exactly one non-null pointer, which begins a chain of pointers of length m-1, which visits exactly one “0” cell in each column other than the marked column, and then terminates at a null pointer?

If such a marked column exists, then we set g(X)=1; otherwise we set g(X)=0. Crucially, notice that if a marked column exists, then it’s unique, since the chain of pointers “zeroes out” all m-1 of the other columns, and prevents them from being marked.

This g already leads to a new query complexity separation, one that refutes a strengthened form of the Saks-Wigderson conjecture. For it’s not hard to see that D(g)=Ω(mn): indeed, any deterministic algorithm must query almost all of the cells. A variant of this is proved in the paper, but the basic idea is that an adversary can answer all queries with giant fields of ‘1’ labels and null pointers—until a given column is almost completed, at which point the adversary fills in the last cell with a ‘0’ label and a pointer to the last ‘0’ cell that it filled in. The algorithm just can’t catch a break; it will need to fill in m-1 columns before it knows where the marked one is (if a marked column exists at all).

By contrast, it’s possible to show that, if n=m, then R(g) is about O(n4/3). I had an argument for R(g)=O((n+m)log(m)) in an earlier version of this post, but the argument was wrong; I thank Alexander Belov for catching the error. I’ll post the R(g)=O(n4/3) argument once I understand it.

To get the other separations—for example, total Boolean functions for which D~R02, D~Q4, R0~Q3, R0~R3/2, and R~approxdeg4—Ambainis et al. need to add various “enhancements” to the basic GPW function g defined above. There are three enhancements, which can either be added individually or combined, depending on one’s needs.

1. Instead of just a single marked column, we can define g(X) to be 1 if and only if there are k marked columns, which point to each other in a cycle, and which also point to a trail of m-k ‘0’ cells, showing that none of the other columns contain all ‘1’ cells. This can help a bounded-error randomized algorithm—which can quickly find one of the all-1 columns using random sampling—while not much helping a zero-error randomized algorithm.

2. Instead of a linear chain of pointers showing that all the non-marked columns contain a ‘0’ cell, for g(X) to be 1 we can demand a complete binary tree of pointers, originating at a marked column and fanning out to all the unmarked columns in only log(m) layers. This can substantially help a quantum algorithm, which can’t follow a pointer trail any faster than a classical algorithm can; but which, given a complete binary tree, can “fan out” and run Grover’s algorithm on all the leaves in only the square root of the number of queries that would be needed classically. Meanwhile, however, putting the pointers in a tree doesn’t much help deterministic or randomized algorithms.

3. In addition to pointers “fanning out” from a marked column to all of the unmarked columns, we can demand that in every unmarked column, some ‘0’ cell contains a back-pointer, which leads back to a marked column. These back-pointers can help a randomized or quantum algorithm find a marked column faster, while not much helping a deterministic algorithm.

Unless I’m mistaken, the situation is this:

With no enhancements, you can get D~R2 and something like D~R03/2 (although I still don’t understand how you get the latter with no enhancements; the paper mentions it without proof Andris has kindly supplied a proof here).

With only the cycle enhancement, you can get R0~R3/2.

With only the binary tree enhancement, you can get R~approxdeg4.

With only the back-pointer enhancement, you can get D~R02.

With the cycle enhancement and the binary-tree enhancement, you can get R0~Q3.

With the back-pointer enhancement and the binary-tree enhancement, you can get D~Q4.

It’s an interesting question whether there are separations that require both the cycle enhancement and the back-pointer enhancement; Ambainis et al. don’t give any examples.

And here’s another interesting question not mentioned in the paper. Using the binary-tree enhancement, Ambainis et al. achieve a fourth-power separation between bounded-error randomized query complexity and approximate degree as a real polynomial—i.e., quadratically better than any separation that was known before. Their proof of this involves cleverly constructing a low-degree polynomial by summing a bunch of low-degree polynomials derived from quantum algorithms (one for each possible marked row). As a result, their final, summed polynomial does not itself correspond to a quantum algorithm, meaning that they don’t get a fourth-power separation between R and Q (which would’ve been even more spectacular than what they do get). On the other hand, purely from the existence of a function with R~approxdeg4, we can deduce that that function has either

(i) a super-quadratic gap between R and Q (refuting my conjecture that the Grover speedup is the best possible quantum speedup for total Boolean functions), or
(ii) a quadratic gap between quantum query complexity and approximate degree—substantially improving over the gap found by Ambainis in 2003.

I conjecture that the truth is (ii); it would be great to have a proof or disproof of this.

50 Responses to “A query complexity breakthrough”

  1. Ryan O'Donnell Says:

    Definitely a truly outstanding result. (And the preceding Goos-Pitassi-Watson one is pretty great too.)

  2. John Says:

    Typo: 0.753 -> 1/0.753

  3. Avishay Tal Says:

    Hi Scott, regarding the block-sensitivity vs. certificate complexity question you mentioned, it was resolved in 2013 by Gilmer, Saks and Srinivasan (http://arxiv.org/abs/1306.0630): they proved the existence of a function whose certificate complexity is quadratically larger than its block sensitivity.

  4. Sniffnoy Says:

    Some examples, those!

  5. MadHatter Says:

    Yes, FCRC updates please!

  6. F. Marshall Says:

    FCRC highlights post would be cool!

  7. Scott Says:

    Ryan #1: Completely agreed!

    John #2 and Avishay #3: OK, thanks very much, fixed!

    MadHatter #5 and F. Marshall #6: OK, it’s a-comin’!

  8. Mike Says:

    Scott, you say that “We have D(XOR)=R0(XOR)=R(XOR)=2, since until you’ve queried both bits, you’ve learned nothing about their XOR. By contrast, Q(XOR)=1, because of the famous Deutsch-Jozsa algorithm.”

    But I recently saw a paper (http://arxiv.org/abs/1506.04627) that claims “we provide an efficient classical version of the Deutsch-Jozsa algorithm, which was one of the first examples of quantum computational speed-up. Our conclusion is that the quantum Deutsch-Josza algorithm owes its speed-up to resources that are not necessarily quantum-mechanical, and when compared with this new classical solution, offers no speed-up at all.”

    Have you seen the paper? If so, what do you think?

  9. Rahul Says:

    f(x,y,z) = (not(x) and y) or (x and z).

    This function has D(f)=2, even though it depends on all 3 of the input bits. (Do you see why?)

    For the non-expert, can someone explain why? I tried to think about this & I always seem to find a need to examine all three bits.

    Trying to think of the expression in other ways didn’t help either: http://www.wolframalpha.com/input/?i=%28NOT+X+AND+Y%29+OR+%28X+AND+Z%29

  10. Scott Says:

    Mike #8: Yes, the problem that the Deutsch-Jozsa algorithm solves with 1 quantum query, can also be solved by a classical randomized algorithm that makes 2 queries on average. So you get a factor-of-2 speedup, but no more than that—exactly as in the example I showed!

    This is an observation that’s been well-understood since the early 1990s, and that’s explicitly noted in countless papers, so there’s absolutely no reason for someone to write a paper about it in 2015.

    Fundamentally, the fact that you only get a factor of 2 is the reason why Deutsch-Jozsa was never a satisfying/convincing example of a quantum speedup—and that’s precisely what made the later work by Bernstein-Vazirani, Simon, and Shor (which gave successively more convincing examples) so important.

    Yes, Deutsch-Jozsa shows a huge advantage for quantum deterministic algorithms over classical deterministic ones, so it’s a hint of what might be possible. But to be truly convincing, you need an advantage over classical probabilistic algorithms, not only deterministic ones.

  11. Scott Says:

    Rahul #9: The expression is saying, “if x=0 then output y; otherwise output z.” So, first query x, then query either y or z depending on what you see.

  12. Jay Says:

    Thx for the examples! Could you give a hint of why polynomial separations for query complexity are more interesting than for time complexity?

    Suppose two problems have the same time complexity. Would they both have the same query complexity?

  13. Scott Says:

    Jay #12: Polynomial separations are also very interesting for time complexity. Indeed there’s a whole community that studies them, namely the algorithms community! 🙂 The main thing that’s different about query complexity is just that we can prove polynomial and even larger separations unconditionally (i.e., it doesn’t require solving P-versus-NP-type problems).

    Two problems could both be solvable in (say) linear time, yet have arbitrarily different query complexities: for example, maybe one problem requires scanning all n input bits and doing a simple, linear-time computation on them, whereas another problem requires reading only log(n) of the bits, but then doing an exponential-time computation on them (so that the total time is 2log(n)=n). Conversely, two problems could also have the same query complexity (say, linear in n) but extremely different time complexities.

  14. Jay Says:

    Scott #13

    1) Thx for this clarification.

    2) sorry, I screw up my question: suppose two problems A and B such that one can reduce problem A to problem B in polynomial time. Would that implicates both have the same query complexity?

  15. Rahul Says:

    Scott #11:

    Ah! Got it. Very elegant, thanks!

    So for a general f(x,y,z) or extended for n input bits there’s no automated procedure to what D(f) will be? One has to take every specific f and argue for it?

  16. Scott Says:

    Jay #14: No, it does not imply that. For example, XOR on n bits and XOR on the first log(n) bits are both computable in polynomial time, so trivially polytime reducible to each other, but their query complexities are different. Keep in mind that query complexity (with an n-bit input) is always between 0 and n, whereas time complexity can be much greater than n. Yes, it’s possible to have reductions that meaningfully relate the query complexity of problem A to the query complexity of problem B. But those reductions need to be carefully designed to preserve query complexity, and not blow it up by factors of n or whatever (which would make the reductions meaningless).

  17. Scott Says:

    Rahul #15: Check out Algorithms for Boolean Function Query Properties, which is the first theory paper I wrote (back in 2000)! Given the truth table of an n-bit Boolean function f, which takes 2n bits to specify, it’s possible to compute D(f) in about 3n time. R(f) and Q(f) can also be computed in time polynomial in the size of the truth table (in the Q(f) case, by a 2003 result of Barnum, Saks, and Szegedy). But of course, that would just give you the answer for a particular input size n, not the asymptotics (which is what we really care about). In practice, it tends to be much, much easier to work out the asymptotics for D(f) than for the randomized and quantum query complexities.

  18. Jay Says:

    Thanks again 🙂

  19. Avi Says:

    >But for any possible setting of x, y, and z, if we choose which bits to query randomly, there’s at least a 1/3 chance that the first two queries will return either two 0s or two 1s—at which point we can stop, with no need to query the third bit.

    Isn’t it 1/2? Whatever the first bit is, the second bit has a 50% of being the same.

  20. Sniffnoy Says:

    This is nitpicking, but, I think you might want to clarify in the defintions of R(f) and R_0(f) that in the case of R(f), you are still looking at the worst case number of queries like in D(f), but for R_0(f), you are looking at the expected number of queries. I mean, the example you used obviously conveys the latter, but I’m not sure it conveys the former.

    …or does that end up being equivalent somehow? I’m a bit out of my area here…

  21. Alex Lopez-Ortiz Says:

    The reason for the supposed overuse of the word breakthrough is that it has two different meanings:

    1) a sudden, dramatic, and important discovery or development.

    2) a major achievement or success that permits further progress, as in technology.

    For example, the Stothers&Vassilevska-Williams result on Matrix Multiplication is a major result on an important problem and thus clearly a breakthrough in the sense of (1). At the same time results by the same authors as well as Le Gall strongly suggest that not much further developments will flow from this technique and hence not a breakthrough in the sense of (2).

    So in the future you might want to say “a complexity breakthrough (type 1)”. <g>

  22. Scott Says:

    Avi #19: No, think about it. If we set the 3 bits to, say, 001 (the hardest case), then there are 3 possible pairs of bits that we can query, but for two of them we’ll see different values (01), and for only one of them will we see the same value twice (00). Another way to say this is that if we pick a 0 first (which happens with probability 2/3), then we have a 1/2 chance of picking the same value next, but if we pick the 1 first (which happens with probability 1/3), then we have no chance of picking the same value next. So overall, the probability is 1/3.

  23. Scott Says:

    Sniffnoy #20: Yeah, I was hoping not to have to get into that. 🙂 There’s a normalization issue: you can also define R(f) in terms of expected number of queries, but then it’s always at least a factor of 3 smaller than R0(f), for the silly reason that your algorithm can succeed with probability 2/3 by

    – immediately accepting with 1/3 probability,
    – immediately rejecting with 1/3 probability, and
    – running the zero-error algorithm with 1/3 probability.

    So, if R(f) were defined in terms of expected number of queries, then my preference would be to multiply it by 3, so that it’s equal to R0(f) except when it has a “real” reason to be smaller.

  24. Job Says:

    When using Quantum algorithms like Grover’s and DJ’s for query complexity, what’s the black-box that’s used? Is it derived from the input string?

    For example, the boolean OR function receives an n-bit string as input, but Grover’s algorithm operates on a quantum circuit.

  25. Sniffnoy Says:

    Scott #22: Yikes! That’s a pretty good reason not to define it that way. Especially because then you have to adjust the multiplier if you want to also adjust the threshold probability.

    I think my original complaint about this is something of a lost purpose, actually — I’m used to nitpicking about this when people talk about zero-error random stuff, but don’t make any note that we’re looking at expected time (or other resources), making it unclear what the advantage of the randomness is. In this case that was the part you did make clear, so my nitpick didn’t serve its usual purpose! I think I just did that because I was used to bringing it up without thinking about whether there was a good reason to.

  26. Graeme Says:

    >…Andris Ambainis, who has his fingers in many pies, or his >queries in many oracles, or something.

    I think what you meant to say here was, “…Andris Ambainis, who is the oracle for many queries.”

  27. Scott Says:

    Graeme #26: Yes, that’s better; thanks!

  28. Scott Says:

    Job #24: Sorry, there was a notational problem in my post (now fixed) that probably contributed to your confusion. Namely, I was repurposing f as the black box to be queried, rather than using x for the black box (as I should have), and reserving f for the property of the black box x that you’re trying to compute.

    So yes, in all the examples discussed in this post (OR, Deutsch-Jozsa, etc.), you should identify the black box with the n-bit input string x. A “query” supplies an index i from 1 to n, whereupon the black box returns xi (the ith bit of x).

    Grover’s algorithm can easily be cast in this framework: we simply say that we’re searching for an index i such that xi=1, or trying to decide whether such an i exists.

  29. Sniffnoy Says:

    OK, this is truly nitpicking, but the actual definition of g doesn’t require terminating in a null pointer after m-1 steps; it just requires that you visit one 0-value cell in each of the other m-1 columns during the first m-1 steps. The chain can continue after that.

  30. Scott Says:

    Sniffnoy #29: Thanks; I’d missed that! But presumably the separations themselves should work fine with either convention.

  31. Job Says:

    Thanks, now i understand.

    The algorithm A really receives a function X which produces the input string, bit by bit.

    Then the Quantum version of A receives a quantum circuit implementing X.

    For the boolean OR function, the classical algorithm has to query X for up to n bits, whereas the quantum algorithm can use Grover’s to determine whether X ever produces the value 1, with at most √n queries.

    Is there a quantum speedup for PARITY?

  32. Rahul Says:

    So does query complexity have practical uses? e.g. In complex logic circuits, or ladder diagrams etc. do engineers use any of this work to decide a minimal set of inputs that need to be examined? Or is this purely an elegant set of problems?

    Even in the TCS / complexity sphere does this tie in with other areas? i.e. does a fourth-power separation between deterministic and bounded-error quantum query complexities have wider implications for designing QCs or P/NP ?

  33. Scott Says:

    Job #31:

      Is there a quantum speedup for PARITY?

    You can get a factor-of-2 speedup, by grouping the input bits into n/2 pairs, running Deutsch-Jozsa on each pair, and then taking the parity of the results yourself (which doesn’t require any additional queries to the input).

    However, this factor-of-2 speedup is the best possible for PARITY: that was proven by Farhi et al. (in one of their first forays into quantum computing), as well as by Beals et al.

    One can also prove that there’s no asymptotic quantum advantage for MAJORITY. So, PARITY and MAJORITY are not functions where a quantum computer can really help you, in contrast to AND and OR.

  34. Scott Says:

    Rahul #32: The main practical uses of query complexity have been as (1) a testing-ground for designing new algorithms (especially quantum algorithms), and (2) a way to understand the limitations of large classes of algorithms.

    As an example, Shor’s algorithm was directly inspired by Simon’s algorithm, which was a result in query complexity (and which some people considered of no real importance, since it was “merely” query complexity). And the core of Shor’s algorithm itself is period-finding, which is also query complexity.

    If you’re willing to count BosonSampling, 🙂 that was also directly inspired by query complexity stuff, e.g. my 2010 results on the Fourier Sampling problem.

    Query complexity is also what tells us that Grover’s algorithm is optimal, that n1/3 is the fastest a quantum computer can find collisions in cryptographic hash functions, and that you shouldn’t hope for a fast quantum algorithm for parity or majority voting (see comment #33)—in all these cases, without using some additional information about the problem structure.

    Now, even by the above standards, asking (e.g.) for the largest possible separation between classical and quantum query complexities for total Boolean functions is a more theoretical question—since even if you manage to find a super-quadratic separation, there’s no guarantee it will be for a problem of any practical interest to anyone. (As, indeed, the GPW / Ambainis et al. functions probably aren’t!)

    But then again, who knows? Simon’s algorithm also looked like something of no conceivable practical value to anyone, before it inspired Shor’s algorithm. In science, sometimes the best way to make practically-important discoveries is to forget entirely about practical importance, and just pursue what’s cool or elegant or interesting. At least, that’s how it’s been for 350 years…

  35. David Speyer Says:

    Maybe this is too classical to be worth mentioning, but the famous n log n bound for sorting by comparison is basically a query complexity result: You need to compare n log n pairs (i,j) in order to sort n objects. And, in this case, it turned out that the other processing doesn’t drive the time demands up above n log n.

  36. Scott Says:

    David #35: Yes, thanks for that! I’d estimate that maybe a third of theoretical computer science boils down in one way or another to query complexity. 🙂 You see it even in seemingly remote places, like cryptography and computational economics, as well as of course in basic algorithms and data structures.

  37. Job Says:

    From your description it sounds like the fourth-power speedup over D(g) is the result of accumulating smaller √n speedups (using Grover’s and the fanout enhancement?). Is that how it is?

    If so, what’s stopping us from using that approach to get increasingly larger speedups for similar problems? (e.g by adding more dimensions to the table)

  38. Scott Says:

    Job #37: Yes, although the two √n speedups come from different sources, so it’s not like you can keep accumulating more of them willy-nilly.

    You could try adding more dimensions to the table, but in the existing constructions, both of the dimensions serve very specific purposes, so you’d have to invent a purpose for the new dimensions to serve. My guess (I might be wrong) is that you’ll get diminishing returns this way—at least, that’s what’s happened in the past whenever I’ve tried to produce larger separations between query complexity measures by generalizing a 2-dimensional problem to 3 or more dimensions. Note also that, no matter what you do, we know that for total Boolean functions, you can’t get any separations bigger than D~R02 (which Ambainis et al. already achieved), or D~R3, or R0~R2, or D~Q6. So while it’s conceivable that adding dimensions could help in some way, there must be a limit beyond which it stops helping.

  39. Shmi Nux Says:

    Sorry, unrelated to query complexity: optimizing packing in the real world: http://www.quora.com/How-did-game-developers-pack-entire-games-into-so-little-memory-twenty-five-years-ago

    > my packer used a variety of algorithms (first-fit, best-fit, etc.) to try to find the best packing, including a stochastic search akin to the gradient descent process used in Simulated annealing. Basically, I had a whole bunch of different packing strategies, and would try them all and use the best result.

    I wonder if a CS expert would have suggested something better.

    > The problem with using a random guided search like that, though, is that you never know if you’re going to get the same result again. Some Crash levels fit into the maximum allowed number of pages (I think it was 21) only by virtue of the stochastic packer “getting lucky”. This meant that once you had the level packed, you might change the code for a turtle and never be able to find a 21-page packing again.

  40. Joshua Zelinsky Says:


    Has anyone looked for how lowarge query complexity separations between quantum computers and the sort of hidden variable models you were looking at? You had the N^{1/3} database search example which beats Grover’s N^{1/2}. Can this be turned into a Boolean function separation?

  41. Scott Says:

    Joshua #40: That’s a wonderful question; thanks for asking it!

    Let DQ(f) be the “dynamical quantum query complexity” (i.e., quantum query complexity in the hidden-variable model I defined in 2005).

    Then I conjecture that DQ(f)=Ω(bs(f)1/3) for all f; this would follow if a cube-root speedup was the best you could get for the Grover search problem. In my original paper, I had sketched what I claimed was a proof for this, but the proof was wrong—see my blog post about this, or the later paper with Bouland, Fitzsimons, and Lee.

    Anyway, if the cube-root lower bound is true—or true for some particular class of hidden-variable theories—then for those theories, combining with D(f)=O(bs(f)3) would yield D(f)=O(DQ(f)9).

    What separations are possible between D(f) and DQ(f), or between R(f) and DQ(f), etc. in light of Ambainis et al.’s results, is a great question that I’ll take up in a subsequent comment.

  42. Craig Gidney Says:

    I think $D~R_0^{3/2}$ comes from the fact that:

    1) If there are more than n^{3/2} zeroes on the board, then random sampling will knock out a column after O(sqrt(n)) samples. Unless the distribution is very uneven, random sampling of un-eliminated columns will eliminate all but sqrt(n) columns within O(n * sqrt(n)) samples; and then you can just brute force the rest.

    2) If there are fewer than n^{3/2} zeroes, then fully scanning randomly chosen columns, then following all the chains, and iterating on un-eliminated columns, becomes viable. If there is a long chain of zeroes, like there must be if there’s a solution, then you’ll tend to hit it at the halfway mark when you sample a column, cutting the un-eliminated columns in half.

    3) I’m not exactly sure how to solve the case where there isn’t a long chain of zeros and there’s less than n^{3/2} zeroes. I think it has to do with finding columns A and B such that A doesn’t lead to B and B doesn’t lead to A proving there’s no solution. An adversary trying to avoid that disproof being easy to stumble upon might have to connect columns in ways that mean searching outward from one column must lead to at least half of the other columns on average, eliminating them.

  43. Scott Says:

    Joshua #40: I’ve now thought a bit about your question of getting bigger separations between between classical and “hidden-variable” query complexities, than we know between classical and quantum. The situation turns out to be remarkably rich and interesting.

    The fundamental difficulty is that, with the hidden-variable models, you only get to see a hidden-variable history once. I.e., you don’t get to sample multiple hidden-variable histories in superposition, then sample the history of a hidden variable through that quantum computation, and so on recursively. Because of this, hidden-variable algorithms don’t compose in the way you would like. And since the quantum algorithms for the Ambainis et al. functions all work by composing multiple calls to Grover’s algorithm, this failure of composition is a serious problem.

    As an easier warmup case, let’s consider an OR of √N AND’s of √N variables each. This function is well-known to have quantum query complexity Ω(√N). Thus, by analogy to the case of the OR function, one might hope that its hidden-variable query complexity would be O(N1/3). The trouble is that that would require recursion: if we run the recursive Grover algorithm, then sampling the hidden variable will speed up the search for a subtree of the OR that evaluates to 1, but it’s not going to speed up the search within each subtree. For this reason, the best upper bound I’m able to get on the hidden-variable query complexity of the OR/AND tree is

    O( (√N)1/3 (√N)1/2) = O(N5/12).

    This is better than √N, but not all the way down to N1/3.

    And for Ambainis et al.’s function h—the one that gives a cubic separation between R0 and Q—I’m not able to get any upper bound on hidden-variable query complexity better than the upper bound on Q. Basically, hidden-variable sampling only obviously helps you for the last step of the algorithm, the one where you’re verifying that the cycle of “marked rows” really is marked. But when you plug in m1/3 rather than √m for the query complexity of the last step, the linear program that tells you how to optimize the parameters still doesn’t return a better solution than T=(mn)1/3 queries.

    Likewise, for Ambainis et al.’s f function—the one they use to achieve a quartic separation between D and Q—I also currently can’t get anything better when I consider the hidden-variable query complexity. The reason is that we have D=Ω(nm) assuming n≥2m, while (ignoring log factors)

    Q = O(√n + √m),

    and the hidden-variable complexity is O(√n + m1/3). But as long as we need the assumption n≥2m, improving that √m to m1/3 just doesn’t make a difference.

    OK, but having said all that: there’s also an alternative model we could consider, where you get to make “multiple non-collapsing measurements” of a state, then use the (classical) results of those measurements as inputs to your next quantum computation, and so on. (Note, however, that you still don’t get to invoke the “multiple non-collapsing measurements” oracle in superposition.) This is basically the model that Bouland, Fitzsimons, Lee, and I considered.

    In this stronger model, first of all, using T queries, you can prepare a superposition that has a probability mass of T2/N on the ‘1’ branch of the OR/AND tree. You can then keep sampling from that superposition over and over, and for each branch you see, check whether it’s indeed a ‘1’ branch in O(N1/6) queries. So if a ‘1’ branch exists, then you’ll succeed in finding one after (N/T2)*N1/6 queries total. Setting

    T = N7/6/T2

    and solving for T, we find that a non-collapsing algorithm can evaluate the OR/AND tree with

    O(N7/18) ~ O(N0.389)

    queries, which is better than the

    O(N5/12) ~ O(N0.417)

    for the hidden-variable algorithm, but still not all the way down to O(N1/3).

    In the non-collapsing measurements model, I also get that, by setting k=N2/7 and m=N6/7 (where N is the total number of cells), we can get an O(N2/7)-query algorithm for the function h. This yields a 3.5th-power separation between R0 and the non-collapsing query complexity, better than the cubic separation that Ambainis et al. got between R0 and Q.

    Likewise, for Ambainis et al.’s function f, I get that there’s a non-collapsing algorithm for the first stage (the one that finds the candidate marked column) that uses

    O( min{ (n/k)1/3k, k1/3√(n/k) } ) = O(n7/15)

    queries, and an algorithm for the second stage (the one that verifies the marked column) that uses O(m1/3) queries (as usual, ignoring log factors). Combining this with the lower bound D(f)=Ω(nm) that holds when n≥2m, we find that the non-collapsing query complexity of this function is

    O(D(f)7/30) ~ O(D(f)0.233).

    So, again, this is slightly better than the quartic separation between D and Q.

  44. Scott Says:

    OK, everyone—Andris has kindly sent me a proof of the n2/3 randomized upper bound for GPW’s original function (or rather, a taller, skinnier variant thereof), which I’m reproducing below. It is, indeed, broadly along the lines that Craig Gidney #42 suggested.


    here is the algorithm. Set the number of columns to n^{1/3}, the number of rows to n^{2/3}.

    1. Sample n^{1/3} log n random entries in each column, eliminate all columns with at least one 0. W.h.p. this eliminates all columns in which the total number of 0s is at least C n^{1/3}.

    2. Repeat C log n times, for an appropriately chosen C:

    2a. Choose a random column in which we do not know any 0s, query all elements in it.

    2b. If the column is all-1, test if f=1.

    2c. If not, take every 0 in the column and pursue the sequence of pointers starting at it until the sequence ends or visits the same column for the second time.

    3. If an all-1 column was not found in step 2, answer “f=0”

    The number of queries is:

    Step 1: n^{1/3} log n in each of n^{1/3} columns, total of n^{2/3} log n;

    Step 2: in each iteration, we have:
    – n^{2/3} queries to query all the elements of column;
    – C n^{2/3} queries to pursue c n^{1/3} sequences of pointers, each of which is of length <= n^{1/3}; Since the number of iterations is O(log n), the total number of queries is O(n^{2/3} log n). Correctness: Let S_i be the set of columns that contain 0s but for which we have not yet queried any zeros, after i repetitions of step i. I would like to claim that, if f=1, E[S_{i+1}]<=E[S_i/2]. This implies that, after O(log n) repetitions, S_i will be empty w.h.p. The proof of E[S_{i+1}]<=E[S_i/2] is as follows: - if f=1, there is the good path of pointers; - all columns of S_i are on this path; - if we choose column k in step 2a, we end up eliminating from S_i all the columns which are after column k on the path; - since k is chosen randomly, every other column in S_i is after k with probability 1/2.

  45. jonas Says:

    Hello, Scott. Thanks for this writeup, it’s a nice introduction to query complexities. It seems that this is a TCS topic that would deserve to be known better. David Speyer #35: nice comment, thanks.

    I agree with Sniffnoy #20 that the way you leave quantifiers implicit in the definitions of R and R_0 is confusing. Even after your reasoning in #23, I think you should try to improve that part of the description.

  46. Jan-Åke Says:

    Scott, regarding #10, I would recommend rereading the paper.

  47. Scott Says:

    Jan-Åke #46: OK, I apologize. Having now read past the abstract, I see that what you do is to build a Spekkens-like toy model where the Deutsch-Jozsa problem can be solved with few queries with certainty. Now, people could argue about whether a Spekkens-like model actually counts as “classical resources,” or whether it’s already “semi-quantum”—but in any case, that’s certainly less trivial than observing the randomized algorithm, so if my comment implied that’s all your paper did, then I stand corrected. However, I might also suggest revising your abstract to ward off the “obvious” misreading (i.e., “well of course Deutsch-Jozsa doesn’t give a real quantum speedup, because there’s a fast randomized algorithm for it! everyone knows that!”).

  48. Jan-Åke Says:

    Thanks, that is good advice. (Note: I’ve run our computer implementation of the algorithm with n=10000.)

  49. Job Says:

    Now, people could argue about whether a Spekkens-like model actually counts as “classical resources,” or whether it’s already “semi-quantum”.

    I’d say that Jan’s model is classical because it can be efficiently simulated classically.

    Of course, the simulation overhead needs to be taken into account. But what’s the overhead? That’s the problem.

    If the QC gets a quantum version of the circuit as input, then does the classical algorithm also get its customized version of the circuit – whatever that might be?

    This is why i find the oracle separations for BQP to be weaker than normal.

    Granted, for Simon’s problem it’s not clear how a custom circuit could allow a classical machine to match a QC, but it’s not out of the question.

    My impression is that a toy model, like Jan’s, can exhibit interference and solve Simon’s problem for simple circuits (e.g. using NOT/CNOT), which we already know can be efficiently simulated, but will have a real tough time when Toffoli’s or controlled PI/x rotations are used.

  50. Shtetl-Optimized » Blog Archive » Quantum query complexity: the other shoe drops Says:

    […] weeks ago I blogged about a breakthrough in query complexity: namely, the refutation by Ambainis et al. of a whole slew of conjectures that […]