Sensitivity Conjecture resolved
The Sensitivity Conjecture, which I blogged about here, says that, for every Boolean function f:{0,1}^n→{0,1}, the sensitivity of f—that is, the maximum, over all 2^n input strings x∈{0,1}^n, of the number of input bits such that flipping them changes the value of f—is at most polynomially smaller than a bunch of other complexity measures of f, including f’s block sensitivity, degree as a real polynomial, and classical and quantum query complexities. (For more, see for example this survey by Buhrman and de Wolf. Or for quick definitions of the relevant concepts, see here.)
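For concreteness, here is a small brute-force sketch (my own illustration, not from the post) of two of these measures, sensitivity and real degree, for a few standard functions; `TREE` is the 2-level AND-OR tree, a textbook example where sensitivity is quadratically smaller than degree:

```python
from itertools import product

def sensitivity(f, n):
    """s(f): max over inputs x of the number of bits whose flip changes f(x)."""
    return max(
        sum(f(x[:i] + (1 - x[i],) + x[i+1:]) != f(x) for i in range(n))
        for x in product((0, 1), repeat=n))

def degree(f, n):
    """deg(f): degree of f as a multilinear real polynomial.  The coefficient
    of prod_{i in S} x_i is sum_{T subset S} (-1)^(|S|-|T|) f(1_T)."""
    best = 0
    for S in product((0, 1), repeat=n):
        support = [i for i in range(n) if S[i]]
        coeff = 0
        for bits in product((0, 1), repeat=len(support)):
            T = [0] * n
            for i, b in zip(support, bits):
                T[i] = b
            coeff += (-1) ** (len(support) - sum(bits)) * f(tuple(T))
        if coeff != 0:
            best = max(best, sum(S))
    return best

n = 4
OR     = lambda x: int(any(x))
PARITY = lambda x: sum(x) % 2
TREE   = lambda x: int((x[0] and x[1]) or (x[2] and x[3]))  # 2-level AND-OR
assert sensitivity(OR, n) == 4 and degree(OR, n) == 4
assert sensitivity(PARITY, n) == 4 and degree(PARITY, n) == 4
# For the AND-OR tree, s = 2 while deg = 4: sensitivity can be quadratically
# smaller than degree, which is exactly the gap Huang's bound shows is worst possible.
assert sensitivity(TREE, n) == 2 and degree(TREE, n) == 4
print("s and deg computed for OR, PARITY, AND-OR tree")
```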
Ever since it was posed by Nisan and Szegedy in 1989, this conjecture has stood as one of the most frustrating and embarrassing open problems in all of combinatorics and theoretical computer science. It seemed so easy, and so similar to other statements that had 5-line proofs. But a lot of the best people in the field sank months into trying to prove it. For whatever it’s worth, I also sank … well, at least weeks into it.
Now Hao Huang, a mathematician at Emory University, has posted a 6-page preprint on his homepage that finally proves the Sensitivity Conjecture, in the form s(f)≥√deg(f). (I thank Ryan O’Donnell for tipping me off to this.) Within the preprint, the proof itself is about a page and a half.
Whenever there’s an announcement like this, ~99% of the time either the proof is wrong, or at any rate it’s way too complicated for outsiders to evaluate it quickly. This is one of the remaining 1% of cases. I’m rather confident that the proof is right. Why? Because I read and understood it. It took me about half an hour. If you’re comfortable with concepts like induced subgraph and eigenvalue, you can do the same.
From pioneering work by Gotsman and Linial in 1992, it was known that to prove the Sensitivity Conjecture, it suffices to prove the following even simpler combinatorial conjecture:
Let S be any subset of the n-dimensional Boolean hypercube, {0,1}^n, which has size 2^(n-1)+1. Then there must be a point in S with at least ~n^c neighbors in S.
Here c>0 is some constant (say 1/2), and two points in S are “neighbors” if and only if they differ in a single coordinate. Note that if S had size 2^(n-1), then the above statement would be false—as witnessed, for example, by the set of all n-bit strings with an even number of 1’s.
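For tiny n, both halves of this statement can be confirmed by brute force. Here is a quick check of mine, purely illustrative, for n=4, where the claimed bound is √4 = 2:

```python
from itertools import combinations, product

n = 4
verts = list(product((0, 1), repeat=n))

def nbrs_in(S, v):
    """Number of neighbors of v (Hamming distance 1) that lie inside set S."""
    return sum((v[:i] + (1 - v[i],) + v[i+1:]) in S for i in range(n))

# Every subset of size 2^(n-1)+1 = 9 contains a point with >= sqrt(4) = 2
# neighbors in the subset.  C(16, 9) = 11440 cases, so this finishes fast.
ok = True
for c in combinations(verts, 2 ** (n - 1) + 1):
    S = set(c)
    if max(nbrs_in(S, v) for v in S) < 2:
        ok = False
        break
assert ok

# Tightness at size 2^(n-1): in the even-parity set, every point has 0
# neighbors inside the set, since flipping one bit always changes parity.
even = {v for v in verts if sum(v) % 2 == 0}
assert len(even) == 2 ** (n - 1)
assert all(nbrs_in(even, v) == 0 for v in even)
print("checked all 11440 subsets of size 9, plus the parity witness")
```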
Huang proceeds by proving the Gotsman-Linial Conjecture. And the way he proves Gotsman-Linial is … well, at this point maybe I should just let you read the damn preprint yourself. I can’t say it more simply than he does.
If I had to try anyway, I’d say: Huang constructs a 2^n×2^n matrix, called A_n, that has 0’s where there are no edges between the corresponding vertices of the Boolean hypercube, and either 1’s or -1’s where there are edges—with a simple, weird pattern of 1’s and -1’s that magically makes everything work. He then lets H be an induced subgraph of the Boolean hypercube of size 2^(n-1)+1. He lower-bounds the maximum degree of H by the largest eigenvalue of the corresponding (2^(n-1)+1)×(2^(n-1)+1) submatrix of A_n. Finally, he lower-bounds that largest eigenvalue by … no, I don’t want to spoil it! Read it yourself!
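Without spoiling anything further, the matrix itself takes only a few lines to write down and sanity-check. A sketch (the recursion is Huang's; the checks are my own):

```python
def huang_matrix(n):
    """Huang's signed adjacency matrix of the n-cube:
    A_1 = [[0,1],[1,0]];  A_n = [[A_{n-1}, I], [I, -A_{n-1}]]."""
    if n == 1:
        return [[0, 1], [1, 0]]
    A = huang_matrix(n - 1)
    m = len(A)
    I = [[int(i == j) for j in range(m)] for i in range(m)]
    return ([A[i] + I[i] for i in range(m)] +
            [I[i] + [-x for x in A[i]] for i in range(m)])

def matmul(A, B):
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

for n in range(1, 6):
    A = huang_matrix(n)
    N = len(A)                       # N = 2^n
    AA = matmul(A, A)
    # A_n^2 = n*I: every eigenvalue is +sqrt(n) or -sqrt(n).  Since the
    # trace is 0, each sign occurs with multiplicity exactly 2^(n-1).
    assert all(AA[i][j] == (n if i == j else 0)
               for i in range(N) for j in range(N))
    assert sum(A[i][i] for i in range(N)) == 0
print("A_n^2 = nI and tr(A_n) = 0 for n = 1..5")
```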
Paul Erdös famously spoke of a book, maintained by God, in which was written the simplest, most beautiful proof of each theorem. The highest compliment Erdös could give a proof was that it “came straight from the book.” In this case, I find it hard to imagine that even God knows how to prove the Sensitivity Conjecture in any simpler way than this.
Indeed, the question is: how could such an elementary 1.5-page argument have been overlooked for 30 years? I don’t have a compelling answer to that, besides noting that “short” and “elementary” often have little to do with “obvious.” Once you start looking at the spectral properties of this matrix An, the pieces snap together in precisely the right way—but how would you know to look at that?
By coincidence, earlier today I finished reading my first P. G. Wodehouse novel (Right Ho, Jeeves), on the gushing recommendation of a friend. I don’t know how I’d missed Wodehouse for 38 years. His defining talent is his ability to tie together five or six plot threads in a way that feels perfect and inevitable even though you didn’t see it coming. This produces a form of pleasure that’s nearly indistinguishable from the pleasure one feels in reading a “proof from the book.” So my pleasure centers are pretty overloaded today—but in such depressing times for the world, I’ll take pleasure wherever I can get it.
Huge congratulations to Hao!
Added thought: What this really is, is one of the purest illustrations I’ve seen in my career of the power and glory of the P≠NP phenomenon. We talk all the time about how proofs are easier to verify than to find. In practice, though, it can be far from obvious that that’s true. Consider your typical STOC/FOCS paper: writing it probably took the authors several months, while fully understanding the thing from scratch would probably take … also several months! If there’s a gap, it’s only by a factor of 4 or 5 or something. Whereas in this case, I don’t know how long Huang spent searching for the proof, but the combined search efforts of the community add up to years or decades. The ratio of the difficulty of finding to the difficulty of completely grasping is in the hundreds of thousands or millions.
Another added thought: Because Hao actually proves a stronger statement than the original Sensitivity Conjecture, it has additional implications, a few of which Hao mentions in his preprint. Here’s one he didn’t mention: any randomized algorithm to guess the parity of an n-bit string, which succeeds with probability at least 2/3 on the majority of strings, must make at least ~√n queries to the string, while any such quantum algorithm must make at least ~n^(1/4) queries. For more, see the paper Weak Parity by me, Ambainis, Balodis, and Bavarian (Section 6).
Important Update: Hao Huang himself has graciously visited the comment section to satisfy readers’ curiosity by providing a detailed timeline of his work on the Sensitivity Conjecture. (tl;dr: he was introduced to the problem by Mike Saks in 2012, and had been attacking it on and off since then, until he finally had the key insight this past month while writing a grant proposal. Who knew that grant proposals could ever be useful for anything?!?)
Another Update: In the comments section, my former student Shalev Ben-David points out a simplification of Huang’s argument, which no longer uses Cauchy’s interlacing theorem. I thought there was no way this proof could possibly be made any simpler, and I was wrong!
Comment #1 July 2nd, 2019 at 2:24 am
Very enjoyable post, thanks!
Comment #2 July 2nd, 2019 at 3:21 am
This is an absolutely outstanding result! It’s concise enough to read and understand over a morning coffee, but solves a problem that has occupied some of the world’s top theoretical computer scientists over a 30 year period [and me 🙂 – I also spent far too long trying to prove this many years ago, completely fruitlessly]. The equivalence result of Gotsman and Linial that the result uses has a proof that’s only a few lines long as well. It’s pretty amazing that “putting in some minus signs in a fixed matrix and looking at the eigenvalues” is enough to prove this result.
Comment #3 July 2nd, 2019 at 3:49 am
Wodehouse is a lot of fun. Your description of why you liked the one you read reminded me of Piccadilly Jim. There’s an exquisite scene where several people are pretending to be other people (one of them two layers deep), and they all end up in the same room together. Some people are expecting some of them to recognize each other, while others are expecting them not to. Of course, they all maintain their separate illusions for long enough to create quite a bit of humour during the unravelling. Sadly, I’ve only seen the movie, but I’ve heard the book is even better.
Comment #4 July 2nd, 2019 at 4:30 am
[…] Thanks to Noga Alon for telling me about the breakthrough. For more on the conjecture and a polymath project devoted to solve it see this post on Aaronson’s Shtetl Optimized (SO). Update: See also this post on GLL, and this new lovely post on SO. […]
Comment #5 July 2nd, 2019 at 4:32 am
Because it is an isolated curiosity no one relevant to the domain where the problem lies cares.
Comment #6 July 2nd, 2019 at 4:34 am
I was just joking but sometimes things fall like https://www.quantamagazine.org/mathematician-disproves-hedetniemis-graph-theory-conjecture-20190617/, https://www.quantamagazine.org/statistician-proves-gaussian-correlation-inequality-20170328/ etc.
Comment #7 July 2nd, 2019 at 5:53 am
In case anybody else needs it, there’s a very explicit definition of sensitivity/block-sensitivity here:
https://cstheory.stackexchange.com/questions/19902/boolean-functions-where-sensitivity-equals-block-sensitivity
I found the LaTeX definitions a bit easier to work with 🙂 In particular, I kept not realizing that we were flipping input bits one at a time (a realization without which the rest of the definition becomes ludicrous!)
Long time lurker, thanks for all the entertaining posts!
Comment #8 July 2nd, 2019 at 6:31 am
L #5: Nope, this one was an isolated curiosity that everyone in the problem domain cared about. 🙂
Comment #9 July 2nd, 2019 at 7:19 am
Tim #3: Alas, last night I was reading more about Wodehouse the man, and it’s not great. I’d known that the big controversy of Wodehouse’s life, which dogged him until he died, was his willingly making “light and humorous” radio broadcasts for the Nazis, while the Nazis had him under house arrest during WWII. I knew that this divided the English literary establishment, with A.A. Milne (the creator of Winnie the Pooh) ending his friendship with Wodehouse over it, while no less than George Orwell defended Wodehouse as a politically naïve relic of Edwardian England—something that seems 100% plausible from Wodehouse’s writing.
I hadn’t known that Wodehouse was also a crude antisemite in his letters to friends (in a way that sometimes seeped into his published writings as well).
After reading up on this, it seems to me that Wodehouse was not a Nazi or a moral monster—but he wasn’t morally astute or praiseworthy either. As many critics observed, he simply never progressed morally beyond the ‘schoolboy phase.’ When WWII came, he didn’t care which side won, he regarded the whole thing as a big joke, and he genuinely, smugly believed that regarding it as a joke made him superior to anyone dull enough to pick sides.
He was also one of the greatest comic writers of all time—able, as not many are, to transmit rib-shaking laughs across generations and centuries. I look forward to reading more by him.
Comment #10 July 2nd, 2019 at 8:05 am
I must say that those Jeeves and Wooster stories start getting old after you’ve read two or three. It feels like reading the same book over and over.
Comment #11 July 2nd, 2019 at 9:02 am
PGW has never failed to pick me up from slumps (which seem all too frequent these days). Scott, I recommend you to step into the world of Blandings castle — flower pots fly and pigs disappear 🙂
Comment #12 July 2nd, 2019 at 11:23 am
It seems that this provides an example of a (signed) adjacency matrix which has incredibly high multiplicity of its eigenvalues (equal to half the number of vertices), but absolutely none of whose multiplicity is explained by (sign-preserving) automorphisms. Is this the only example of this type, or the best one?
To check that there are no nontrivial sign-preserving automorphisms, I think it’s cleaner to give a global description of the signing. For two n-bit strings that differ in one place, forming an edge in the Boolean hypercube, the associated sign is (-1) raised to the power of the number of 1s to the right of the place they differ. Next observe that automorphisms of the Boolean hypercube come from permuting the coordinates and flipping some of them.
For each coordinate i, we can characterize n-i as the number of coordinates that the sign of edges flipping the i’th coordinate nontrivially depends on. This number is preserved by any permutation or coordinate flip, so in fact the relevant permutation must be trivial. Then because the sign of each edge must be fixed, we can’t flip any coordinates either.
We can turn this into an unsigned adjacency matrix by viewing it as defining a double cover of the Boolean cube, at which point it will presumably have a single nontrivial automorphism, switching the double cover, but multiplicity as high as a fourth the number of vertices of the graph.
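As a sanity check on the global description above, one can verify mechanically that the "(-1) to the number of 1s to the right of the flipped place" rule gives a valid signing, i.e. that the resulting matrix squares to nI just like the recursive construction (which it should match up to an automorphism of the cube). A quick sketch of mine:

```python
from itertools import product

n = 4
verts = list(product((0, 1), repeat=n))
idx = {v: i for i, v in enumerate(verts)}

def sign(v, j):
    """Sign of the edge that flips coordinate j of v: (-1) to the number of
    1s strictly to the right of position j (the global rule in the comment)."""
    return (-1) ** sum(v[j + 1:])

N = 2 ** n
A = [[0] * N for _ in range(N)]
for v in verts:
    for j in range(n):
        w = v[:j] + (1 - v[j],) + v[j+1:]
        A[idx[v]][idx[w]] = sign(v, j)   # v and w agree to the right of j,
                                         # so the matrix comes out symmetric

# The rule is a valid signing: A^2 = n*I, hence the spectrum is again
# {+sqrt(n), -sqrt(n)} with equal multiplicities.
AA = [[sum(A[i][k] * A[k][j] for k in range(N)) for j in range(N)]
      for i in range(N)]
assert all(AA[i][j] == (n if i == j else 0) for i in range(N) for j in range(N))
print("the explicit sign rule also satisfies A^2 = nI")
```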
Comment #13 July 2nd, 2019 at 11:34 am
As you enjoyed Wodehouse’s book, I warmly suggest the amazing tv series “Jeeves and Wooster” with Hugh Laurie (Dr. House!) and Stephen Fry.
https://www.imdb.com/title/tt0098833/
Comment #15 July 2nd, 2019 at 2:50 pm
Is this another rare lower bounds proof? Is the technique something new, that can possibly be applied to other problems? This is cool.
Comment #16 July 2nd, 2019 at 3:00 pm
asdf #15: Yes, it’s a lower bound, albeit for a very simple combinatorial measure (the sensitivity). It has basically nothing to do with circuit lower bounds problems like P vs. NP.
Whether the techniques will be useful for other problems is an interesting question. Mostly, I expect that dozens of experts are slapping their foreheads right now, trying to figure out how they missed this. The basic ingredients were all readily available in the early 90s, right after the problem was posed, and (as I said) combining them takes just a page. It’s just that no one saw how.
Comment #17 July 2nd, 2019 at 3:47 pm
When it comes to these in-hindsight-simple results, how much do you think it is a productive exercise to reflect on how one might have seen it in advance? There seems to be a trade-off where you risk losing a key insight but also risk wasting time. I’m not even technically an undergrad though, so I would appreciate your take.
Comment #18 July 2nd, 2019 at 4:14 pm
Evan #17: You’re asking exactly the right “strategic question” here. Since these questions are hard to answer in general, let me just speak for myself and about this specific case. While the proof is within my understanding with good margin to spare 🙂 , I’m under no illusions that I “could have” or “should have” or “would have” found it. I knew Gotsman-Linial, I knew the use of algebraic techniques in combinatorics, but I just wasn’t thinking in the relevant direction at all. On the other hand, now that I see the proof, I feel like it really would be excellent practice to assimilate every single step—not in the sense of merely following the step, but in the sense of being able to apply it in other situations. In this case, that would mean, at least, the strategic replacement of 1’s by -1’s in an adjacency matrix, and the Cauchy interlacing theorem.
Comment #19 July 2nd, 2019 at 5:04 pm
In trying to wrap my head around the proof, I got stuck here: the matrices A_n that he defines clearly have completely different eigenvalues from the hypercube Q_n. Why does this not create an issue for the interlacing argument?
Comment #20 July 2nd, 2019 at 7:14 pm
Really beautiful proof.
My thoughts are: Morally, why does this proof work? What “in principle” is this A_n matrix achieving by converting 1 to -1? As it stands, it seems qualitatively correct, but magical to me.
Comment #21 July 2nd, 2019 at 7:35 pm
Naresh #20: There’s an answer to your question, but I don’t know if there’s any answer that’s shorter than the proof itself.
Comment #22 July 2nd, 2019 at 8:13 pm
I was wondering if there are other ways of assigning signs to the Boolean cube to ensure that the spectrum is still sqrt{n}, -sqrt{n}. So we need A^2 = nI and this is equivalent to saying that for any 4 cycle of the cube, 3 edges should be of the same sign and the 4th edge should be of opposite sign. For such a good sign pattern note that picking any vertex and flipping the signs of all the edges incident on it would still give a good sign pattern. Up to this transformation I think Hao’s construction is the only way to sign the cube and get the needed spectrum.
To show this see an n-cube as 2 (n-1) cubes attached by the matching edges in between them. Now say you assigned signs some way to the first (n-1) cube. WLOG we can assume that all the matching edges have sign 1, else take the vertex in the second cube that the -1 matched edge is incident on and flip the signs of all the edges incident on that vertex. Now note that the sign of an edge e in the second (n-1) cube should exactly be the opposite of that in the first cube. To see this consider the 4 cycle formed by the edge e in the first and second (n-1) cubes and the two matching edges which were given sign 1 and note that the 4 signs should be {1,1,1,-1}. But this is exactly Hao’s construction.
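The 4-cycle condition is easy to confirm mechanically on the recursive signing. Here's a quick check of mine (assuming the recursion below is the one from the preprint): for n=4, every 4-cycle of the cube carries an odd number of -1 edges, equivalently three edges of one sign and one of the other.

```python
from itertools import combinations, product

def huang_matrix(n):
    """A_1 = [[0,1],[1,0]];  A_n = [[A_{n-1}, I], [I, -A_{n-1}]]."""
    if n == 1:
        return [[0, 1], [1, 0]]
    A = huang_matrix(n - 1)
    m = len(A)
    I = [[int(i == j) for j in range(m)] for i in range(m)]
    return ([A[i] + I[i] for i in range(m)] +
            [I[i] + [-x for x in A[i]] for i in range(m)])

n = 4
A = huang_matrix(n)
verts = list(product((0, 1), repeat=n))   # lexicographic order matches the
idx = {v: k for k, v in enumerate(verts)} # recursion's block structure

def flip(v, i):
    return v[:i] + (1 - v[i],) + v[i+1:]

# Every 4-cycle v -> v^i -> v^{ij} -> v^j -> v should have sign product -1,
# i.e. an odd number of -1 edges: exactly the "3 same, 1 opposite" pattern,
# and exactly what makes the off-diagonal entries of A^2 vanish.
for v in verts:
    for i, j in combinations(range(n), 2):
        u = flip(flip(v, i), j)
        prod_sign = (A[idx[v]][idx[flip(v, i)]] * A[idx[flip(v, i)]][idx[u]] *
                     A[idx[flip(v, j)]][idx[u]] * A[idx[v]][idx[flip(v, j)]])
        assert prod_sign == -1
print("every 4-cycle has sign product -1")
```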
Comment #23 July 2nd, 2019 at 10:31 pm
Wodehouse’s antisemitism, and particularly his alleged Naziism, are greatly exaggerated.
I could go on, but rather, I would like to point out that Wodehouse provided a proof that “The Code Of The Woosters” is his greatest Bertie & Jeeves novel. The proof is over 200 pages long, but is easy to read.
Comment #24 July 3rd, 2019 at 12:04 am
Scott #9. As it happens, my wife was recently telling me about this (at least partial) rebuttal of precisely the same Wodehouse-was-an-antisemite article as you linked to. I hadn’t read either the article or the rebuttal before today, but I’m glad you’re able to enjoy Wodehouse’s work, even if you think he was a bit morally naïve.
Comment #25 July 3rd, 2019 at 2:31 am
I am wondering if you know of any “interesting” implications of this result. To date, I have the strange feeling that this is the answer to a question many people were interesting in, but for no other reason than its beauty and naturality(?). These are by no means bad reasons of course! In other words, does it help us to understand computation, even in some restricted models?
Comment #26 July 3rd, 2019 at 3:27 am
What do you mean by ‘Because Hao actually proves a stronger statement than the original Sensitivity Conjecture’?
Comment #27 July 3rd, 2019 at 4:10 am
Peter #23:
Wodehouse’s antisemitism, and particularly his alleged Naziism, are greatly exaggerated.
In my comment, I tried hard not to exaggerate them.
I could go on, but rather, I would like to point out that Wodehouse provided a proof that “The Code Of The Woosters” is his greatest Bertie & Jeeves novel. The proof is over 200 pages long, but is easy to read.
Err, wouldn’t the proof require reading all the Bertie & Jeeves novels? NP≠coNP. 😀
Comment #28 July 3rd, 2019 at 4:18 am
B. #25:
To date, I have the strange feeling that this is the answer to a question many people were interesting [sic] in, but for no other reason than its beauty and naturality(?).
In this particular case, your feeling is emphatically correct. I don’t want to lie to you and claim it isn’t.
In other words, does it help us to understand computation, even in some restricted models?
In the second update to the post, I gave one example of how it helps us understand computation. Namely, it proves my, Ambainis et al.’s “weak parity conjecture”—showing that any randomized algorithm to guess the parity of an n-bit string, even just on slightly more than half of all strings, needs to examine at least ~√n bits. To see why this is nontrivial, note that there actually is such a randomized algorithm that examines only ~n^0.753 bits! Whereas before two days ago, the only lower bound we knew was that ~log(n) bits have to be examined.
I regret that right now I can’t think of other implications for computation, but maybe someone else can?
Comment #29 July 3rd, 2019 at 4:19 am
Note that not only is deg(f)<=s(f)^2 a new result, but I believe even deg(f)<=bs(f)^2 is new. The previous best was deg(f)<=bs(f)^3.
Comment #30 July 3rd, 2019 at 4:22 am
aa #26:
What do you mean by ‘Because Hao actually proves a stronger statement than the original Sensitivity Conjecture’?
I meant, he actually proves the conjecture of Chung et al., that every induced subgraph of the Boolean hypercube of size 2^(n-1)+1 has maximum degree at least ~√n. This implies the Gotsman-Linial Conjecture (which, strictly speaking, involves both a subset of the hypercube and its complement), which is known to be equivalent to the Sensitivity Conjecture, but an implication in the other direction hadn’t been known.
Comment #31 July 3rd, 2019 at 4:49 am
Shalev #29: Nice observation! Can that, in turn, be used to tighten anything else?
Comment #32 July 3rd, 2019 at 5:05 am
Scott #31: Well, block sensitivity lower bounds various other measures, so we can get (for example) R(f)>=sqrt(deg(f)) and Q(f)>=deg(f)^(1/4). Perhaps the most interesting relation in that direction is the relationship between approximate degree and exact degree: they are now related by a power 4 instead of the previous 6.
This result also gives a new proof that D(f)<=bs(f)^3 – and indeed, it shows D(f)<=bs(f)*s(f)^2, which is better than the previously known D(f)<=bs(f)^2*s(f). But none of these are all that amazing; I don't think this gives us anything spectacular (except for the resolution of the sensitivity conjecture, of course!)
Comment #33 July 3rd, 2019 at 5:55 am
Wouldn’t everyone making sure to think how to apply these techniques in another setting be slightly counterproductive? In the sense that the proof is about picking which of the many many possible areas to bring together, then building a construction; so either someone needs to try to find something in common between multiple surprising proofs to get nice guidelines, or else the problems will get better method-coverage if people do not synchronise their method preferences.
Comment #34 July 3rd, 2019 at 7:30 am
Martin Roetteler #19: It’s OK that A_n has completely different eigenvalues from Q^n (in fact, the argument requires it!) – the interlacing theorem is applied to principal submatrices of A_n, and ends up proving a lower bound on the largest eigenvalue of such submatrices. By Lemma 2.3, this implies a lower bound on the maximal degree of the graph associated with the corresponding “unsigned” matrix.
Comment #35 July 3rd, 2019 at 9:17 am
By the way, in case others (like myself) are unfamiliar or uncomfortable with Cauchy’s interlace theorem, here’s a version of the proof that does not use it:
Let S be the sqrt(n)-eigenspace of the matrix A_n. Then S has dimension 2^(n-1). Consider a large principal submatrix B of A_n. We wish to lower-bound its spectral norm (maximum eigenvalue). This is the same as maximizing x^T B x over vectors x with norm 1. But it’s not hard to see that this maximum is the same as the maximum of x^T A_n x over unit vectors x that have a 0 entry on indices corresponding to rows/columns that are not in B. Let L be the subspace of all such vectors with 0 entries on those indices. Then L has dimension at least 2^(n-1)+1, since B uses at least that many rows/columns. It follows that the intersection of L with S must have dimension at least 1. Thus there is a unit vector x that is in both L and S, meaning it has 0s on entries corresponding to rows/columns not in B but it is a sqrt(n)-eigenvector of A_n. This gives a lower bound of sqrt(n) on the spectral norm of B, as desired.
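This dimension-counting argument is easy to sanity-check numerically for tiny n. The sketch below (mine, pure Python, with power iteration standing in for a linear-algebra library) confirms that for n=3, every principal submatrix of A_n on 2^(n-1)+1 = 5 indices has spectral norm at least sqrt(3):

```python
from itertools import combinations
import random

def huang_matrix(n):
    """A_1 = [[0,1],[1,0]];  A_n = [[A_{n-1}, I], [I, -A_{n-1}]]."""
    if n == 1:
        return [[0, 1], [1, 0]]
    A = huang_matrix(n - 1)
    m = len(A)
    I = [[int(i == j) for j in range(m)] for i in range(m)]
    return ([A[i] + I[i] for i in range(m)] +
            [I[i] + [-x for x in A[i]] for i in range(m)])

def spectral_norm(B, iters=500):
    """Lower bound on the spectral norm of symmetric B via power iteration
    on B^2; converges to the true norm from below."""
    rng = random.Random(0)
    k = len(B)
    x = [rng.random() + 0.1 for _ in range(k)]
    for _ in range(iters):
        y = [sum(B[i][j] * x[j] for j in range(k)) for i in range(k)]
        x = [sum(B[i][j] * y[j] for j in range(k)) for i in range(k)]
        scale = max(abs(t) for t in x) or 1.0
        x = [t / scale for t in x]
    y = [sum(B[i][j] * x[j] for j in range(k)) for i in range(k)]
    # Rayleigh quotient of B^2 at x, square-rooted: never exceeds ||B||.
    return (sum(t * t for t in y) / sum(t * t for t in x)) ** 0.5

n = 3
A = huang_matrix(n)
size = 2 ** (n - 1) + 1              # 5 of the 8 rows/columns
worst = min(spectral_norm([[A[i][j] for j in S] for i in S])
            for S in combinations(range(2 ** n), size))
# The argument predicts every such submatrix has norm >= sqrt(3); in fact
# the vector in the intersection is an exact sqrt(3)-eigenvector of B.
assert worst >= n ** 0.5 - 1e-3
print("all 56 principal 5x5 submatrices have spectral norm >= sqrt(3)")
```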
Comment #36 July 3rd, 2019 at 9:36 am
Apart from everything by Wodehouse, I love this interview, done when he was 91. It is amazing to me how someone can remain so irrepressibly happy:
https://web.archive.org/web/20161202045207/http://www.theparisreview.org/interviews/3773/p-g-wodehouse-the-art-of-fiction-no-60-p-g-wodehouse
What comes through in the interview: Nothing fazes him. He knows what he’s doing, knows he couldn’t do anything else, knows he’s good at what he does, is happy. If critics complain about something, he cheerfully accepts their criticism but with no regrets. He rereads the same old books he likes. (“Every year or so I read Shakespeare straight through.”) He cannot recall any bad times in his life. He is aware of how fortunate he has been. A. A. Milne was nasty to him (http://strangeco.blogspot.com/2013/03/the-p-g-wodehousea-milne-feud.html), spearheading the public lynching of his reputation, but having got it out of his system he continues to read Milne’s books for pleasure.
There’s also this by Stephen Fry: http://www.pgwodehousebooks.com/fry.htm (“he taught me something about good nature. It *is* enough to be benign, to be gentle, to be funny, to be kind.”)
Comment #37 July 3rd, 2019 at 11:55 am
“any randomized algorithm to guess the parity of an n-bit string, which succeeds with probability at least 2/3 on the majority of strings, must make at least ~√n queries to the string”
I don’t get how anything short of looking at every bit of the string can work, since we have no idea how the input strings are generated (any bit we haven’t looked at could change the parity with the same probability as any other bit).
Comment #38 July 3rd, 2019 at 12:02 pm
If we do √n queries, we’re left with n – √n bits we haven’t looked at, and how is the parity of those bits related to the parity of the √n bits we’ve sampled? (without any a priori idea of how the strings are generated). Two successive random processes create order?
Comment #39 July 3rd, 2019 at 12:08 pm
“The ratio of the difficulty of finding to the difficulty of completely grasping is in the hundreds of thousands or millions”
It’s not that different from Alpha Go Zero taking a couple of hours (from scratch) to come up with observable novel tactics that human experts had overlooked for thousands of years.
Comment #40 July 3rd, 2019 at 12:54 pm
fred #37: If it seems impossible to you, then READ OUR PAPER to find out exactly how to do it! Also, pay careful attention to what I said: the algorithm only needs to succeed on slightly more than half of n-bit strings, not on all strings.
Comment #41 July 3rd, 2019 at 1:07 pm
Tim #24, and svat #36, thank you for the links.
I had heard the calumnies about Wodehouse but those articles refute it all quite convincingly.
In particular, the transcript of the actual broadcasts shows them to have been all of a piece with the man himself, mocking his captors gently but clearly.
Comment #42 July 3rd, 2019 at 1:31 pm
Doug #41: Again, the sane charge against Wodehouse is not that he was some sort of secret Nazi sympathizer. That would be unhinged.
I have some (ahem) experience with hyperbolic, reality-divorced charges being laid against one’s person based on out-of-context or spitefully-reinterpreted quotes, so I try to be super-sensitive about doing the same to others.
The charge, rather, is that Wodehouse was cheerful, kindly, pleasant, undoubtedly a blast to talk to … but also shockingly morally obtuse, completely failing to understand (for example) why rule by Nazis would be any worse than rule by anyone else. That he was sort of trapped, developmentally, in the England of 1900-1914—and this was one of the secrets to his success as a writer, but it also made him singularly ill-equipped to understand the ensuing events of the 20th century. And the evidence for this is not the one isolated incident, but everything he said and did throughout his life.
Comment #43 July 3rd, 2019 at 2:05 pm
“Wodehouse’s antisemitism, and particularly his alleged Naziism, are greatly exaggerated.”
Interesting because, in the current climate, racism is clearly treated like being pregnant (you’re either pure as a virgin or about to pop triplets, there’s no in-between).
Comment #44 July 3rd, 2019 at 2:39 pm
Scott #40.
I think I’m struggling with statements about “one” specific input string, like
“the problem of computing the parity of an n-bit input string”
and statements related to a set of input strings, like
“where one only has to succeed on a 1/2+ε fraction of input strings”
What do you actually consider as “one run” of the algorithm, where at the end of it you can say you’ve beat the 1/2 fraction or not?
Is the number of input strings in a given run itself a free parameter?
Is one run of the algorithm (for a given n) always being fed every one of the 2^n possible input strings (in unknown order)? Or just a subset based on some unknown sampling of all possible 2^n strings? (e.g. me feeding your algo a set of strings that all have even parity)
Has this been actually implemented?
Comment #45 July 3rd, 2019 at 2:47 pm
fred #44: You need a general algorithm that does something on any n-bit input string. We say an algorithm A “succeeds” if, for more than half of input strings X, we have
Pr[ A(X)=Par(X) ] ≥ 2/3,
where A(X) is A’s output and Par(X) is the parity of X, and here the probability is over the internal randomness used by A.
We, surprisingly, give a randomized algorithm that succeeds in the above sense and that makes only O(n^0.754) queries to X.
Since our algorithm basically just amounts to game-tree evaluation, I’m sure thousands of people have coded it up without intending to do so. In any case, it would be trivial to code, if for some unknown reason one was satisfied to “succeed” on Parity in the sense above.
Comment #46 July 3rd, 2019 at 2:56 pm
Shalev #32: Cool!! I wonder if there’s insight here that could be usable towards D(f)=O(Q(f)^4), thereby settling my other favorite open problem about Boolean function complexity measures…
Comment #47 July 3rd, 2019 at 3:35 pm
Shalev #35: Nice!!! I didn’t think it was possible to make this proof any simpler, and I was wrong.
Comment #48 July 3rd, 2019 at 4:43 pm
The number of papers on the Sensitivity Conjecture is rather small and these papers are of minimal length. Draw your own conclusions.
References
[1] A. Ambainis, X. Sun, New separation between s(f) and bs(f), Electronic Colloquium on Computational Complexity (ECCC), 18:116, 2011. Available at http://eccc.hpi-web.de/report/2011/11.
[2] F. Chung, Z. Furedi, R. Graham, P. Seymour, On induced subgraphs of the cube, J. Comb. Theory Ser. A, 49(1) (1988), 180–187.
[3] A. Drucker, A note on a communication game, CoRR, abs/1706.07890, 2017.
[5] J. Gilmer, M. Koucky, M. Saks, A new approach to the sensitivity conjecture, Proceedings of the 2015 Conference on Innovations in Theoretical Computer Science, ACM, 2015, 247–254.
[6] P. Gopalan, R. Servedio, A. Wigderson, Degree and sensitivity: tails of two distributions, 31st Conference on Computational Complexity, CCC 2016, Tokyo, Japan, 13:1–13:23.
[7] C. Gotsman, N. Linial, The equivalence of two problems on the cube, J. Combin. Theory Ser. A, 61(1) (1992), 142–146.
[8] P. Hatami, R. Kulkarni, D. Pankratov, Variations on the Sensitivity Conjecture, Theory of Computing Library, Graduate Surveys 4 (2011), 1–27.
[9] C. Kenyon, S. Kutin, Sensitivity, block sensitivity, and ℓ-block sensitivity of Boolean functions, Information and Computation, 189(1) (2004), 43–53.
[10] N. Nisan, CREW PRAMs and decision trees, Proc. 21st STOC, New York, NY, ACM Press (1989), 327–335.
[11] N. Nisan, M. Szegedy, On the degree of Boolean functions as real polynomials, Comput. Complexity, 4 (1992), 462–467.
[12] D. Rubinstein, Sensitivity vs. block sensitivity of Boolean functions, Combinatorica, 15(2) (1995), 297–299.
[13] M. Virza, Sensitivity versus block sensitivity of Boolean functions, Information Processing Letters, 111 (2011), 433.
Comment #49 July 3rd, 2019 at 5:28 pm
Scott: Thanks for writing such an excellent article! You explained the problem and my contribution much better than I did. Recently I have received quite a few inquiries from my non-mathematician friends about this work, and I just showed them the link to your article 🙂
Regarding how long it took me to find this proof, here is the timeline for those who are interested.
Nov 2012: I was introduced to this problem by Michael Saks when I was a postdoc at the IAS, and got immediately attracted by the induced subgraph reformulation. And of course, in the next few weeks, I exhausted all the combinatorial techniques I was aware of, yet I could not even improve the constant factor from the Chung-Furedi-Graham-Seymour paper, or give an alternative proof without using the isoperimetric inequality.
Around mid-2013: I started to believe that the maximum eigenvalue is a better parameter to look at; it is actually polynomially related to the max degree, i.e. \sqrt{\Delta(G)} \le \lambda(G) \le \Delta(G), and in some sense it reflects a kind of “average degree” (unfortunately the average degree itself could be very small, something like \sqrt{n}/2^n).
2013-2018: I revisited this conjecture every time I learned a new tool, though without any success. But at least thinking about it helped me quickly fall asleep on many nights.
Late 2018: After working on a project (with Pohoata and Klurman) that uses Cvetkovic’s inertia bound to re-prove Kleitman’s isodiametric theorem (another cute proof using algebra to solve an extremal combinatorial problem), and several semesters of teaching a graduate combinatorics course, I started to have a better understanding of eigenvalue interlacing, and believed that it might help with this problem. For example, applying interlacing to the original adjacency matrix, one can already show that with a (1/2+c) proportion of vertices, the induced subgraph has maximum degree c’·\sqrt{n}. I don’t think this statement could follow easily from combinatorial arguments. Yet at that time, I was hoping to develop something more general using the eigenspace decomposition of the adjacency matrix, like in this unanswered MO question:
https://mathoverflow.net/questions/331825/generalization-of-cauchys-eigenvalue-interlacing-theorem
June 2019: In a Madrid hotel, when I was painfully writing a proposal and trying to make the approaches sound more convincing, I finally realized that the maximum eigenvalue of any pseudo-adjacency matrix of a graph provides a lower bound on the maximum degree. The rest was just a bit of trial and error and linear algebra.
Comment #50 July 3rd, 2019 at 6:47 pm
Z #48: Thanks for the bibliography! I’m not sure that it’s complete, but even supposing it was—13 papers by serious authors, on an open problem somewhat disconnected from the rest of TCS, is a perfectly respectable number. (And in terms of the actual effort expended on the problem, I think those papers are just the tip of an iceberg.)
Comment #51 July 3rd, 2019 at 6:53 pm
Hao #49: Thank you so much for the detailed timeline! I’d suspected that you must’ve worked on the problem for a long time and tried many blind alleys—that this spectral approach didn’t just “spring fully-formed from the brow of Zeus”—but it’s great to have confirmation.
Comment #52 July 3rd, 2019 at 8:50 pm
I’m utterly flummoxed about the weak parity results. If a classical algorithm attempting to guess the parity of a string skips reading even a single bit of that string, then that unread bit is equally likely to be 0 or 1, and the algorithm’s guess will have exactly 1/2 probability of being correct, won’t it?
Comment #53 July 3rd, 2019 at 9:11 pm
Bram #52: You should be flummoxed! 😀
If you’d like to think about it without reading our paper—well, it’s analogous to the famous hat puzzle. Note that the following two statements can simultaneously be true:
(1) The success probability, averaged over all inputs, is exactly 1/2.
(2) On more than half of the inputs, the success probability exceeds 2/3.
Comment #54 July 3rd, 2019 at 10:09 pm
Scott#46, one barrier to extending Hao’s results to give D(f)=O(Q(f)^4) is that D(f) can be cubically larger than s(f) – there’s no hope of showing D(f)<=s(f)^2, unfortunately. Same for R(f). See my paper with Pooya Hatami and Avishay Tal:
https://arxiv.org/pdf/1605.07084.pdf
Z#48 and Scott#50: That bibliography isn't all the papers on the sensitivity conjecture! Those are just the ones Hao cites in his paper.
Note that the Hatami, Kulkarni, and Pankratov citation is a survey about sensitivity from 2011; it itself contains many citations, so check it out for early work. As for more recent progress, check out the citations in the paper I linked above, where we list "recent progress" on sensitivity as [AS11, Bop12, AP14, ABG+14, APV15, AV15, GKS15, Sze15, GNS+16, GSTW16, Tal16], only two of which are cited in Z's list (I think). Also, our list of citations is only current as of mid 2016, and there has been work since.
Comment #55 July 4th, 2019 at 1:22 am
[…] by /u/JoshuaZ1 [link] […]
Comment #56 July 4th, 2019 at 12:55 pm
What is meant by the summation over “j ~ 1” in the proof of Lemma 2.3?
Comment #57 July 4th, 2019 at 1:47 pm
Hmm from context I’m going to guess j ≠ 1
Comment #58 July 4th, 2019 at 3:27 pm
Scott #53: So here’s an example strategy: I look at all but one of the bits, chosen randomly, and if they’re all 0 then I guess that the remaining one is 1; otherwise I say “don’t know.” That way there are N possible configurations for which I always guess correctly, and one for which I always guess wrong, so I’m more than 50% likely to guess right on more than half the configurations. Ironically, though, when I do guess it’s almost always wrong, because I’ll always guess on the one bad configuration but only rarely on the others, so maybe the opposite strategy is the ‘better’ one.
Am I making sense here? It seems like there isn’t a very big space of possible strategies: all you can do is look at a certain number of bits chosen randomly and, based on the number of 0 and 1 bits, guess either way.
Comment #59 July 4th, 2019 at 3:43 pm
cga #56, #57: It means “j adjacent to 1” (in the graph, where vertices have been put in correspondence with natural numbers).
Comment #60 July 4th, 2019 at 4:09 pm
Bram #58: You’re more-or-less on the right track. Keep in mind that, by the stated rules, you only need to win on slightly more than half of all n-bit strings.
Thus, here’s an example that we actually thought of first. Suppose you run Grover’s algorithm on your n-bit string to search for ‘1’ bits, taking O(√n) time. If the string turns out to be all 0’s, you can output that its parity is 0. If it’s not all 0’s, you can guess that its parity is 1—as it’s indeed slightly more likely to be than 0 (since we eliminated a single string of parity 0).
So now all one needs to do is think of a classical randomized algorithm with a similar behavior! 🙂
Comment #61 July 4th, 2019 at 11:09 pm
Is the argument in #35 really simpler? That looks to me like roughly the reasoning used to prove the interlacing theorem, just specialized to this case. Am I mistaken there?
Comment #62 July 4th, 2019 at 11:30 pm
Sniffnoy #61: Eh, I guess it’s a matter of taste. I’d probably seen the interlacing theorem at some point in my life, but I neither remembered it nor knew its proof. And its statement contains a lot more information than Huang actually uses. So, given that it’s possible to do so, I’d rather replace the appeal to that theorem by a few lines of totally self-contained linear algebra.
Comment #63 July 5th, 2019 at 5:05 am
The signing of the hypercube is extremal in the sense that it minimizes the spectral radius. This follows from the fact that if A is any signing then the diagonal elements of A^2 are all equal to the dimension, hence A must have an eigenvalue whose magnitude is at least the square root of the dimension.
For this reason, it wouldn’t be too surprising if this signing were to be found somewhere else.
We can ask for similar signings of other Cayley graphs, aiming at a spectral radius which is the square root of the number of generators. Some examples include the Cayley graph on the symmetric group generated by all transpositions, and the Schreier graph of the “slice” (all vectors in the hypercube with some fixed Hamming weight) generated by all transpositions.
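The diagonal computation in the argument above is easy to replay numerically. Here is a small illustrative sketch (pure Python, using a random ±1 signing of Q_3; not part of the original comment):

```python
import itertools, random

n = 3
V = list(itertools.product([0, 1], repeat=n))
idx = {v: i for i, v in enumerate(V)}
N = len(V)  # 2^n vertices

# Adjacency matrix of the hypercube Q_n with a random +/-1 sign on each edge
A = [[0] * N for _ in range(N)]
for v in V:
    for b in range(n):
        w = list(v); w[b] ^= 1; w = tuple(w)
        if idx[v] < idx[w]:
            s = random.choice([-1, 1])
            A[idx[v]][idx[w]] = s
            A[idx[w]][idx[v]] = s

# (A^2)_{ii} sums s^2 = 1 over the n edges at vertex i, so every diagonal
# entry of A^2 equals n no matter which signing was chosen.
diag = [sum(A[i][j] * A[j][i] for j in range(N)) for i in range(N)]
assert all(d == n for d in diag)
# Hence sum of squared eigenvalues = trace(A^2) = n * 2^n, which forces
# some eigenvalue with |lambda| >= sqrt(n).
```

Since trace(A²) = Σλᵢ² = n·2ⁿ over 2ⁿ eigenvalues, at least one λᵢ² must reach n, which is exactly the spectral-radius lower bound described above.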
Comment #64 July 5th, 2019 at 7:03 am
If I understand correctly, the proof is about taking a step back, looking at the problem in a larger space (more dimensions)? So, giving it room to breathe?
I always find that technique quite fascinating.
Comment #65 July 5th, 2019 at 11:19 am
fred #64: You could describe it that way, but you could also describe almost any nontrivial proof the same way! I.e., almost every really interesting proof works by answering a larger question than the one that was asked, involving objects that don’t even appear in the original question. The hard part is figuring out which larger context is the right one for a given question.
In this case, I’d say that the key insights were:
(1) Realizing you can prove the Gotsman-Linial conjecture by lower-bounding the largest eigenvalue of the induced subgraph, and that this is still true even if you attach weights of +1 and -1 to the edges of the Boolean hypercube.
(2) Finding a clever way to attach those weights that makes everything magically work out, with the √n and -√n eigenspaces both of dimension 2^(n-1).
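Huang’s recursive signing can be written down and checked directly. Here is a small pure-Python sketch (not from the original thread; the block definition A_1 = [[0,1],[1,0]], A_n = [[A_{n-1}, I], [I, -A_{n-1}]] follows Huang’s preprint):

```python
def huang_matrix(n):
    """Huang's signed adjacency matrix of the n-cube: A_1 = [[0,1],[1,0]],
    A_n = [[A_{n-1}, I], [I, -A_{n-1}]] in block form."""
    A = [[0, 1], [1, 0]]
    for _ in range(n - 1):
        m = len(A)
        B = [[0] * (2 * m) for _ in range(2 * m)]
        for i in range(m):
            B[i][i + m] = B[i + m][i] = 1      # off-diagonal identity blocks
            for j in range(m):
                B[i][j] = A[i][j]              # top-left block:   A_{n-1}
                B[i + m][j + m] = -A[i][j]     # bottom-right:    -A_{n-1}
        A = B
    return A

n = 4
A = huang_matrix(n)
N = len(A)  # 2^n

# A^2 = n * I, so every eigenvalue is +sqrt(n) or -sqrt(n); trace(A) = 0
# then forces the two eigenspaces to have equal dimension 2^(n-1).
A2 = [[sum(A[i][k] * A[k][j] for k in range(N)) for j in range(N)] for i in range(N)]
assert all(A2[i][j] == (n if i == j else 0) for i in range(N) for j in range(N))
assert sum(A[i][i] for i in range(N)) == 0
```

The two asserted identities are all that Huang’s Lemma 2.3 needs: A² = nI pins every eigenvalue to ±√n, and the zero trace splits the multiplicities evenly.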
Comment #66 July 5th, 2019 at 4:50 pm
Scott #60: This seems like a combinatorial rather than an algorithmic problem to me. My guess (although I could be wrong) is that there’s no benefit to being careful about which bits you look at: you might as well choose which ones to look at randomly, and then guess based on how many 0 bits there were, ignoring order. There’s a convex hull of solution qualities: for each fraction of the possible assignments, that many of them are guessed right at least some other fraction of the time, but the fraction 1/2 + epsilon seems to be the most important.
Running through simple cases by hand: with two bits read out of 3, the best strategy is: on 00 guess 1, on 01 flip a coin, and on 11 guess 0. That gets 2/3 reliability in 6/8 of the cases and zero in the others. For two bits out of four, the best strategy is: if the two bits are different guess 1; otherwise guess 0 with probability 2/3 and 1 otherwise, which in 10 out of 16 cases guesses right 2/3 of the time. With 2 bits out of 5, the best strategy is the exact inverse of the one for 2 bits out of 3: on 00 guess 0, on 11 guess 1, and on 01 flip a coin. 5/8 of the time that guesses right 3/5 of the time.
Notable things in that list are that the three strategies are wildly different from each other, and that 2 is less than the square root of 5.
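Bram’s n=3 case can be verified by exhaustive enumeration. A quick sketch (the encoding of the strategy is my own; it averages over the three possible pairs of read bits):

```python
from itertools import product
from fractions import Fraction

def success_prob(x):
    """Exact success probability on string x of the n=3 strategy:
    read two of the three bits (pair chosen uniformly); on 00 guess
    parity 1, on 11 guess parity 0, on 01 flip a fair coin."""
    pairs = [(0, 1), (0, 2), (1, 2)]
    parity = sum(x) % 2
    total = Fraction(0)
    for i, j in pairs:
        seen = x[i] + x[j]
        if seen == 0:
            total += 1 if parity == 1 else 0
        elif seen == 2:
            total += 1 if parity == 0 else 0
        else:
            total += Fraction(1, 2)  # coin flip
    return total / len(pairs)

probs = {x: success_prob(x) for x in product([0, 1], repeat=3)}
# 6 of the 8 strings succeed with probability exactly 2/3;
# 000 and 111 never succeed.
assert sum(1 for p in probs.values() if p == Fraction(2, 3)) == 6
assert probs[(0, 0, 0)] == 0 and probs[(1, 1, 1)] == 0
```

Note that the average success probability over all 8 strings comes out to exactly 1/2, which is a nice concrete instance of the two simultaneous statements in comment #53.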
Comment #67 July 5th, 2019 at 5:01 pm
Bram #66: It’s absolutely true that, before you begin the guessing algorithm, you can reorder the n bits in any way you like. Once the algorithm starts, though, you may need to make choices that break complete permutation symmetry among the bits. Indeed, that’s exactly what happens in our randomized algorithm that correctly guesses the parity on more than half of strings by examining only O(n^0.754) bits. If it still sounds impossible … well, I think you now have “permission” to look at the paper!
Comment #68 July 6th, 2019 at 5:21 am
Yeah that does sound unreasonable, especially for a classical algorithm.
It’s like it’s concentrating all of the non-zero probability of randomly being right for more than half of the inputs, into just being right for exactly half + 1.
BTW, when using Grover’s search to compute OR(X) do you run it on a function f(i) = x_i?
Comment #69 July 6th, 2019 at 5:31 am
Rao #22: If we think of the signs as elements in F_2, then the condition becomes that every 4-cycle sums to 1. You’re describing the space of all solutions to a system of linear equations.
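As a sanity check (my own, not from the thread), Huang’s signing, reconstructed here from the block definition in his paper, does satisfy this F_2 condition: every 4-cycle of the hypercube carries an odd number of -1 edges, i.e. its sign product is -1.

```python
def huang_matrix(n):
    """Huang's recursive signing: A_1 = [[0,1],[1,0]],
    A_n = [[A_{n-1}, I], [I, -A_{n-1}]]; vertex i <-> binary string of i."""
    A = [[0, 1], [1, 0]]
    for _ in range(n - 1):
        m = len(A)
        B = [[0] * (2 * m) for _ in range(2 * m)]
        for i in range(m):
            B[i][i + m] = B[i + m][i] = 1
            for j in range(m):
                B[i][j] = A[i][j]
                B[i + m][j + m] = -A[i][j]
        A = B
    return A

n = 3
A = huang_matrix(n)

# Every 4-cycle of Q_n flips two fixed coordinates a < b; the product of
# its four edge signs should be -1 (equivalently, the signs sum to 1 in F_2).
cycles_checked = 0
for v in range(2 ** n):
    for a in range(n):
        for b in range(a + 1, n):
            u1 = v ^ (1 << a)
            u2 = v ^ (1 << a) ^ (1 << b)
            u3 = v ^ (1 << b)
            assert A[v][u1] * A[u1][u2] * A[u2][u3] * A[u3][v] == -1
            cycles_checked += 1
```

This is really the same fact as A² = nI in disguise: the off-diagonal entry of A² between two vertices at Hamming distance 2 is the sum of the two path products around their 4-cycle, and it vanishes exactly when the cycle product is -1.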
Comment #70 July 6th, 2019 at 7:54 am
Job #68:
BTW, when using Grover’s search to compute OR(X) do you run it on a function f(i) = x_i?
Yes.
Comment #71 July 7th, 2019 at 5:30 pm
Scott #67
“you may need to make choices that break complete permutation symmetry among the bits”
I figured that if the 2^N entries are laid down in increasing order (0, 1, 2, 3, … 2^N−1), looking at the bits and the parity, there’s a kind of recursive symmetry to it: no matter what subset of bits you consider, half of the entries are gonna have parity 1, the other half parity 0.
But I still don’t see how scrambling that symmetry would make it so that you magically get more lucky on average.
What puzzles me is that if you consider sampling N-1 bits instead of √n (so you know all the bits except one), you’re still left having to “beat” a random coin flip. Seems odd that looking at fewer bits would give you an edge…
Bram #66
“Running through simple cases by hand, with two bits guessed out of 3 the best strategy is on 00 guess 1, on 01, flip a coin, and on 11 guess 0. That gets 2/3 reliability in 6/8 of the cases and zero in the others.”
I don’t see it… 00, guess 1, you’re right half the time, 11, guess 0, you’re right half the time, and 01, flip a coin, you’re still right half the time.
Comment #72 July 7th, 2019 at 5:38 pm
Btw, I do understand the Monty Hall trick, where something is hidden in 3 boxes, you take one guess, and then host shows you one of the other two boxes that’s empty (he can always do that) and asks you if you want to switch, and you’re always better off switching your initial choice. But I don’t see how this applies here.
Comment #73 July 7th, 2019 at 8:44 pm
fred #71, #72: Read the paper.
Comment #74 July 8th, 2019 at 3:41 am
There is also an elegant “low-level variant” of this proof that doesn’t use any theorems except the following: “a linear system of m equations and m+1 variables has a non-trivial solution.”
https://fedyapetrov.wordpress.com/2019/07/03/low-level-variant-of-huangs-argument-for-sensitivity-conjecture/
Comment #75 July 8th, 2019 at 6:24 am
Alexander #74: That sounds exactly like what Shalev noted higher up in the thread.
Comment #76 July 8th, 2019 at 8:39 am
Scott #73
Haha, I’m trying!
I’m even struggling with the concept of “random algorithm”, e.g when you write:
“[…] the minimum number of queries made by any randomized algorithm that computes f(X) with success probability at least 2/3 for every X”
How do you come up with this probability?
Do you mean that over the entire set of possible inputs (the finite set of 2^n inputs of length n), the fraction of inputs for which it works is more than 2/3?
Or do you mean that if we were to feed that same one input to the algorithm over and over, the algorithm could give a true or false answer, but it would come up with the right answer 2/3 of the time (as an asymptotic fraction, as we feed it the same input an infinite number of times)? And that this would be true for any one input (of the 2^n possible) we decide to feed it?
Comment #77 July 8th, 2019 at 9:01 am
fred #76: The latter. As I said before, on each individual one of the 2^(n-1)+1 inputs on which the algorithm “succeeds,” it outputs the right answer with at least 2/3 probability, where here the probability is over the algorithm’s internal random choices.
Comment #78 July 8th, 2019 at 11:30 am
Thanks for your patience, Scott!
Time isn’t a resource I have a lot left to spend, but it seems really worth it to try and understand this fully (it’s not everyday I come across something as puzzling which also has been solved). I’ll do my homework without bugging you more about it.
Comment #79 July 11th, 2019 at 3:18 pm
[…] of circuits and chips for digital computers. Huang’s proof is only six pages long and has been applauded for its simplicity. The method has the potential to apply to other prominent computer science […]
Comment #80 July 12th, 2019 at 5:52 am
[…] fact, as also noted on Scott’s blog, this case of interlacing can be inferred from simpler reasoning—but […]
Comment #81 July 21st, 2019 at 2:44 pm
From the pure operator theory perspective the useful general trick from the paper was improving the interlacing bound by modifying the matrix. However, the way lemma 2.3 is stated may obscure what is going on a bit unless one reads through the paper closely.
Indeed, the paper’s statement “From the proof, one actually has lambda1(H) >= lambda1(A_H)…” appears to mean that to convert “actually” to “exactly” one must reformulate Lemma 2.3 and its proof slightly, breaking it up into two components:
1) The degree is greater than the top eigenvalue of the adjacency matrix.
2) The top eigenvalue of a self-adjoint matrix with non-negative entries is only reduced when the entries are multiplied by numbers in {-1,0,1}. (However, without change to the proof, the set of factors {-1,0,1} can be enlarged to the unit interval [-1,1] or to the complex unit ball, as long as one enforces self-adjointness.)
So a key idea of the proof of Theorem 1.1 is that although the unmodified induced adjacency matrix indeed has a top eigenvalue greater than √n, the interlacing bound itself is too lossy to prove this without modification. (For example, in the n=4 case the eigenvalues of the unmodified adjacency matrix of Q^4 are {-4,-2,-2,-2,-2,0,0,0,0,0,0,2,2,2,2,4}, giving the trivial bound lambda1 >= 0.)
In any given application of this trick for improving interlacing bounds, some luck will be required for the optimal factors to be simple to write down. Finding the factors is an optimization problem, though, for which one can enlist computer help.
Comment #82 July 25th, 2019 at 1:06 pm
[…] and theoretical computer science,” wrote Scott Aaronson of the University of Texas, Austin, in a blog post. “The list of people who tried to solve it and failed is like a who’s who of discrete math and […]
Comment #83 July 26th, 2019 at 7:13 am
Hi Scott,
I’m a long-time reader of your blog, though I’ve only commented a couple of times since I usually don’t have much to add. But this time, I noticed something about the “magical” pattern of 1’s and -1’s that you might find interesting. Consider the following slightly modified pattern:
A_1 = 0,1;-1,0
A_n = A_{n-1}, I; -I, -A_{n-1}
First, this matrix is anti-symmetric with eigenvalues ±i·√n, but I think a modified version of Lemma 2.3 would still go through.
Second, it is the adjacency matrix of a directed version of the cube graph. And not just any cube graph, but in fact the *Klee-Minty cube*! The KM cube is a notable example of a *unique sink orientation* (USO); USOs are purely combinatorial representations of various optimization problems such as linear programming. The KM cube can be used to show that the simplex algorithm may need exponentially many steps to solve an LP, since it contains a Hamiltonian path of length 2^n - 1.
So, I am not that surprised to see a construction similar to Klee-Minty cubes used here, as it basically maximizes “askewness” in some sense. Both Will Sawin #12 and Sankeerth Rao #22 make correct observations about the properties of this cube. In particular, the operation of “flipping all the bits at one vertex” mentioned by #22 turns the KM cube into an *odd USO*, which we observed in this paper: https://arxiv.org/pdf/1704.08481.pdf (see Lemma 21).
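For what it’s worth, the modified pattern above can be checked mechanically; here is a small pure-Python sketch (my own, following the comment’s block definition):

```python
def km_matrix(n):
    """Anti-symmetric signing from the comment: A_1 = [[0,1],[-1,0]],
    A_n = [[A_{n-1}, I], [-I, -A_{n-1}]] in block form."""
    A = [[0, 1], [-1, 0]]
    for _ in range(n - 1):
        m = len(A)
        B = [[0] * (2 * m) for _ in range(2 * m)]
        for i in range(m):
            B[i][i + m] = 1    # top-right block:   +I
            B[i + m][i] = -1   # bottom-left block: -I
            for j in range(m):
                B[i][j] = A[i][j]
                B[i + m][j + m] = -A[i][j]
        A = B
    return A

n = 4
A = km_matrix(n)
N = len(A)

# A is anti-symmetric and A^2 = -n * I, so its eigenvalues are +-i*sqrt(n).
assert all(A[i][j] == -A[j][i] for i in range(N) for j in range(N))
A2 = [[sum(A[i][k] * A[k][j] for k in range(N)) for j in range(N)] for i in range(N)]
assert all(A2[i][j] == (-n if i == j else 0) for i in range(N) for j in range(N))
```

The induction mirrors Huang’s: if A² = -(n-1)I, then the block square of [[A, I], [-I, -A]] collapses to [[A²-I, 0], [0, A²-I]] = -nI.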
Comment #84 July 27th, 2019 at 2:41 am
As a part-time group theorist, I cannot help noticing that the -1,0,1 adjacency matrix of the hypercube designed by Hao is proportional to an involution, so its eigenvalues are very well under control (e.g. all the same in absolute value). Does this matrix have a large monomial (i.e. signed permutation matrices) centraliser?
Comment #85 July 28th, 2019 at 12:51 am
I’ve got the proof down to one page at https://www.cs.stanford.edu/~knuth/papers/huang.pdf
(but I’m not sure if God will vote for it)
Comment #86 July 28th, 2019 at 7:22 am
I also wrote a blog post for students and non-experts: [A walk-through of Hao Huang’s solution to the sensitivity conjecture](http://www.zhengwenjie.net/huanghao/)
Comment #87 July 28th, 2019 at 3:26 pm
A bit of conceptual explanation behind the matrix A=A_n. It will be convenient to identify the square matrices of size 2^n with the tensor power of square matrices of size 2.
Let \sigma_1 and \sigma_3 be Pauli matrices (see https://en.wikipedia.org/wiki/Pauli_matrices) and let I be the identity matrix (all of these matrices are square matrices of size 2).
Now let
B_1 = \sigma_1 \otimes I \otimes \cdots \otimes I,
B_2 = \sigma_3 \otimes \sigma_1 \otimes I \otimes \cdots \otimes I,
B_3 = \sigma_3 \otimes \sigma_3 \otimes \sigma_1 \otimes I \otimes \cdots \otimes I,
…
B_n = \sigma_3 \otimes \cdots \otimes \sigma_3 \otimes \sigma_1;
(B_i contains exactly one factor \sigma_1, in the i-th position; all factors before it are equal to \sigma_3 and all factors after it are equal to the identity matrix I. Since \sigma_1 is the only off-diagonal factor, B_i flips exactly the i-th bit, with a ±1 sign determined by the earlier bits.)
Note that the matrices B_i described above appear naturally in the context of quantum mechanics for the following reason: since Pauli matrices anticommute: \sigma_1 \sigma_3=- \sigma_3 \sigma_1, it follows that the matrices B_i anticommute as well: B_i B_j = – B_j B_i whenever i\neq j. Also, (B_i)^2=I\otimes \cdots \otimes I is the identity matrix.
Luckily, we have that A=B_1+\cdots+B_n is a (signed) adjacency matrix for the graph we are interested in.
From the anticommutation of the matrices B_i it follows immediately that
A^2 = (B_1+\cdots+B_n)^2 = nI (the cross terms B_iB_j + B_jB_i cancel in pairs, and each (B_i)^2 = I).
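This computation is easy to replay numerically. One caveat: for A = B_1 + … + B_n to be a signed adjacency matrix of the hypercube, the single off-diagonal factor \sigma_1 must sit in the i-th slot of B_i, with the diagonal \sigma_3 factors before it. A small pure-Python sketch (mine, with integer Kronecker products):

```python
def kron(X, Y):
    """Kronecker product of two square integer matrices (nested lists)."""
    return [[X[i][j] * Y[k][l] for j in range(len(X)) for l in range(len(Y))]
            for i in range(len(X)) for k in range(len(Y))]

def matmul(X, Y):
    N = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(N)) for j in range(N)]
            for i in range(N)]

s1 = [[0, 1], [1, 0]]    # Pauli sigma_1 (off-diagonal: flips a bit)
s3 = [[1, 0], [0, -1]]   # Pauli sigma_3 (diagonal: a sign)
I2 = [[1, 0], [0, 1]]

n = 3
# B_i = sigma_3^{(i-1 factors)} (x) sigma_1 (x) I^{(n-i factors)}
Bs = []
for i in range(n):
    M = ([s3] * i + [s1] + [I2] * (n - 1 - i))[0]
    for F in ([s3] * i + [s1] + [I2] * (n - 1 - i))[1:]:
        M = kron(M, F)
    Bs.append(M)

N = 2 ** n
# Pairwise anticommutation: B_i B_j + B_j B_i = 0 for i != j
for i in range(n):
    for j in range(i + 1, n):
        S, T = matmul(Bs[i], Bs[j]), matmul(Bs[j], Bs[i])
        assert all(S[a][b] + T[a][b] == 0 for a in range(N) for b in range(N))

# A = B_1 + ... + B_n is a signed hypercube adjacency matrix with A^2 = n*I
A = [[sum(B[a][b] for B in Bs) for b in range(N)] for a in range(N)]
assert all((A[a][b] != 0) == (bin(a ^ b).count("1") == 1)
           for a in range(N) for b in range(N))
A2 = matmul(A, A)
assert all(A2[a][b] == (n if a == b else 0) for a in range(N) for b in range(N))
```

For n=2 this reproduces exactly the matrix from Huang’s recursion, so the Clifford-algebra picture and the block-matrix picture agree.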
Comment #88 July 30th, 2019 at 9:16 am
[…] own variant of Huang’s proof on his homepage—in Knuth’s words, fleshing out the argument that Shalev Ben-David previously posted on this blog—and then left a comment about it here, […]
Comment #89 July 31st, 2019 at 7:03 am
Hao #49: This timeline is lovely by the way, and I think enhances your result tremendously on a human level. I think we would all benefit from such timelines being appended to papers, just to get a sense of the struggle behind some of the most magnificent work, as well as more mundane progress. As Scott #51 indicates, it really shows your blood-and-sweat contribution rather than just Zeus’ inspiration.
Comment #90 August 1st, 2019 at 7:25 pm
[…] Knuth on Huang’s Sensitivity Proof: “I’ve got the proof down to one page” [pdf] 6 by espeed | 4 comments on Hacker News. […]
Comment #91 September 3rd, 2019 at 11:37 pm
[…] wrote Scott Aaronson, professor of computer science at the University of Texas at Austin, in a blog post about Huang’s […]
Comment #92 September 4th, 2019 at 9:33 am
[…] this,” University of Texas at Austin theoretical computer scientist Scott Aaronson wrote on his blog, “~99% of the time either the proof is wrong, or at any rate it’s way too complicated […]
Comment #93 November 17th, 2019 at 5:33 pm
[…] and should: at 19 pages, it looks very approachable and clear, if not as radically short as (say) Huang’s proof of the Sensitivity Conjecture. Their argument subsumes all the earlier results that I knew would need to be subsumed, and […]