Six announcements

September 21st, 2015
  1. I did a podcast interview with Julia Galef for her series “Rationally Speaking.”  See also here for the transcript (which I read rather than having to listen to myself stutter).  The interview is all about Aumann’s Theorem, and whether rational people can agree to disagree.  It covers a lot of the same ground as my recent post on the same topic, except with less technical detail about agreement theory and more … well, agreement.  At Julia’s suggestion, we’re planning to do a follow-up podcast about the particular intractability of online disagreements.  I feel confident that we’ll solve that problem once and for all.  (Update: Also check out this YouTube video, where Julia offers additional thoughts about what we discussed.)
  2. When Julia asked me to recommend a book at the end of the interview, I picked probably my favorite contemporary novel: The Mind-Body Problem by Rebecca Newberger Goldstein.  Embarrassingly, I hadn’t realized that Rebecca had already been on Julia’s show twice as a guest!  Anyway, one of the thrills of my life over the last year has been to get to know Rebecca a little, as well as her husband, who’s some guy named Steve Pinker.  Like, they both live right here in Boston!  You can talk to them!  I was especially pleased two weeks ago to learn that Rebecca won the National Humanities Medal—as I told Julia, Rebecca Goldstein getting a medal at the White House is the sort of thing I imagine happening in my ideal fantasy world, making it a pleasant surprise that it happened in this one.  Huge congratulations to Rebecca!
  3. The NSA has released probably its most explicit public statement so far about its plans to move to quantum-resistant cryptography.  For more see Bruce Schneier’s Crypto-Gram.  Hat tip for this item goes to reader Ole Aamot, one of the only people I’ve ever encountered whose name alphabetically precedes mine.
  4. Last Tuesday, I got to hear Ayaan Hirsi Ali speak at MIT about her new book, Heretic, and then spend almost an hour talking to students who had come to argue with her.  I found her clear, articulate, and courageous (as I guess one has to be in her line of work, even with armed cops on either side of the lecture hall).  After the shameful decision of Brandeis in caving in to pressure and cancelling Hirsi Ali’s commencement speech, I thought it spoke well of MIT that they let her speak at all.  The bar shouldn’t be that low, but it is.
  5. From far away on the political spectrum, I also heard Noam Chomsky talk last week (my first time hearing him live), about the current state of linguistics.  Much of the talk, it struck me, could have been given in the 1950s with essentially zero change (and I suspect Chomsky would agree), though a few parts of it were newer, such as the speculation that human languages have many of the features they do in order to minimize the amount of computation that the speaker needs to perform.  The talk was full of declarations that there had been no useful work whatsoever on various questions (e.g., about the evolutionary function of language), that they were total mysteries and would perhaps remain total mysteries forever.
  6. Many of you have surely heard by now that Terry Tao solved the Erdös Discrepancy Problem, by showing that for every infinite sequence of heads and tails and every positive integer C, there’s a positive integer k such that, if you look at the subsequence formed by every kth flip, there comes a point where the heads outnumber tails or vice versa by at least C.  This resolves a problem that’s been open for more than 80 years.  For more details, see this post by Timothy Gowers.  Notably, Tao’s proof builds, in part, on a recent Polymath collaborative online effort.  It was a big deal last year when Konev and Lisitsa used a SAT-solver to prove that there’s always a subsequence with discrepancy at least 3; Tao’s result now improves on that bound by ∞.

Bell inequality violation finally done right

September 15th, 2015

A few weeks ago, Hensen et al., of the Delft University of Technology and Barcelona, Spain, put out a paper reporting the first experiment that violates the Bell inequality in a way that closes off the two main loopholes simultaneously: the locality and detection loopholes.  Well, at least with ~96% confidence.  This is big news, not only because of the result itself, but because of the advances in experimental technique needed to achieve it.  Last Friday, two renowned experimentalists—Chris Monroe of U. of Maryland and Jungsang Kim of Duke—visited MIT, and in addition to talking about their own exciting ion-trap work, they did a huge amount to help me understand the new Bell test experiment.  So OK, let me try to explain this.

While some people like to make it more complicated, the Bell inequality is the following statement. Alice and Bob are cooperating with each other to win a certain game (the “CHSH game”) with the highest possible probability. They can agree on a strategy and share information and particles in advance, but then they can’t communicate once the game starts. Alice gets a uniform random bit x, and Bob gets a uniform random bit y (independent of x).  Their goal is to output bits, a and b respectively, such that a XOR b = x AND y: in other words, such that a and b are different if and only if x and y are both 1.  The Bell inequality says that, in any universe that satisfies the property of local realism, no matter which strategy they use, Alice and Bob can win the game at most 75% of the time (for example, by always outputting a=b=0).

What does local realism mean?  It means that, after she receives her input x, any experiment Alice can perform in her lab has a definite result that might depend on x, on the state of her lab, and on whatever information she pre-shared with Bob, but at any rate, not on Bob’s input y.  If you like: a=a(x,w) is a function of x and of the information w available before the game started, but is not a function of y.  Likewise, b=b(y,w) is a function of y and w, but not of x.  Perhaps the best way to explain local realism is that it’s the thing you believe in, if you believe all the physicists babbling about “quantum entanglement” just missed something completely obvious.  Clearly, at the moment two “entangled” particles are created, but before they separate, one of them flips a tiny coin and then says to the other, “listen, if anyone asks, I’ll be spinning up and you’ll be spinning down.”  Then the naïve, doofus physicists measure one particle, find it spinning down, and wonder how the other particle instantly “knows” to be spinning up—oooh, spooky! mysterious!  Anyway, if that’s how you think it has to work, then you believe in local realism, and you must predict that Alice and Bob can win the CHSH game with probability at most 3/4.
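Incidentally, the 3/4 bound is easy to check by brute force: shared randomness is just a mixture of deterministic strategies, so it suffices to enumerate all 16 deterministic pairs of functions a(x), b(y).  A quick sketch in Python:

```python
# Brute-force check of the classical CHSH bound: enumerate all 16 deterministic
# local strategies a(x), b(y) and see how many of the 4 input pairs each wins.
from itertools import product

best = 0
for a0, a1, b0, b1 in product((0, 1), repeat=4):   # truth tables for a(x), b(y)
    a, b = (a0, a1), (b0, b1)
    wins = sum((a[x] ^ b[y]) == (x & y) for x in (0, 1) for y in (0, 1))
    best = max(best, wins)

print(best / 4)   # 0.75: no local-realist strategy wins more than 3/4 of the time
```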

What Bell observed in 1964 is that, even though quantum mechanics doesn’t let Alice send a signal to Bob (or vice versa) faster than the speed of light, it still makes a prediction about the CHSH game that conflicts with local realism.  (And thus, quantum mechanics exhibits what one might not have realized beforehand was even a logical possibility: it doesn’t allow communication faster than light, but simulating the predictions of quantum mechanics in a classical universe would require faster-than-light communication.)  In particular, if Alice and Bob share entangled qubits, say $$\frac{\left| 00 \right\rangle + \left| 11 \right\rangle}{\sqrt{2}},$$ then there’s a simple protocol that lets them violate the Bell inequality, winning the CHSH game ~85% of the time (with probability (1+1/√2)/2 > 3/4).  Starting in the 1970s, people did experiments that vindicated the prediction of quantum mechanics, and falsified local realism—or so the story goes.
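For concreteness, here’s a toy simulation of that protocol.  The measurement angles below are the standard textbook choice (my addition; any angles achieving cos²(π/8) = (1+1/√2)/2 work equally well):

```python
# A toy check that quantum mechanics beats the classical 3/4 bound. Alice and
# Bob share (|00> + |11>)/sqrt(2); each measures in a real rotated basis whose
# angle depends on their input bit.
import numpy as np

def basis(theta):
    # Columns are the basis vectors for the two measurement outcomes.
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

state = np.array([1, 0, 0, 1]) / np.sqrt(2)       # (|00> + |11>)/sqrt(2)
alice_angle = {0: 0.0, 1: np.pi / 4}              # Alice's setting given x
bob_angle   = {0: np.pi / 8, 1: -np.pi / 8}       # Bob's setting given y

win = 0.0
for x in (0, 1):
    for y in (0, 1):
        for a in (0, 1):
            for b in (0, 1):
                va = basis(alice_angle[x])[:, a]
                vb = basis(bob_angle[y])[:, b]
                p = abs(np.kron(va, vb) @ state) ** 2   # Born rule
                if (a ^ b) == (x & y):
                    win += 0.25 * p               # x, y uniform and independent

print(win, (1 + 1 / np.sqrt(2)) / 2)              # both ~0.8536 > 0.75
```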

The violation of the Bell inequality has a schizophrenic status in physics.  To many of the physicists I know, Nature’s violating the Bell inequality is so trivial and obvious that it’s barely even worth doing the experiment: if people had just understood and believed Bohr and Heisenberg back in 1925, there would’ve been no need for this whole tiresome discussion.  To others, however, the Bell inequality violation remains so unacceptable that some way must be found around it—from casting doubt on the experiments that have been done, to overthrowing basic presuppositions of science (e.g., our own “freedom” to generate random bits x and y to send to Alice and Bob respectively).

For several decades, there was a relatively conservative way out for local realist diehards, and that was to point to “loopholes”: imperfections in the existing experiments which meant that local realism was still theoretically compatible with the results, at least if one was willing to assume a sufficiently strange conspiracy.

Fine, you interject, but surely no one literally believed these little experimental imperfections would be the thing that would rescue local realism?  Not so fast.  Right here, on this blog, I’ve had people point to the loopholes as a reason to accept local realism and reject the reality of quantum entanglement.  See, for example, the numerous comments by Teresa Mendes in my Whether Or Not God Plays Dice, I Do post.  Arguing with Mendes back in 2012, I predicted that the two main loopholes would both be closed in a single experiment—and not merely eventually, but in, like, a decade.  I was wrong: achieving this milestone took only a few years.

Before going further, let’s understand what the two main loopholes are (or rather, were).

The locality loophole arises because the measuring process takes time and Alice and Bob are not infinitely far apart.  Thus, suppose that, the instant Alice starts measuring her particle, a secret signal starts flying toward Bob’s particle at the speed of light, revealing her choice of measurement setting (i.e., the value of x).  Likewise, the instant Bob starts measuring his particle, his doing so sends a secret signal flying toward Alice’s particle, revealing the value of y.  By the time the measurements are finished, a few microseconds later, there’s been plenty of time for the two particles to coordinate their responses to the measurements, despite being “classical under the hood.”

Meanwhile, the detection loophole arises because in practice, measurements of entangled particles—especially of photons—don’t always succeed in finding the particles, let alone ascertaining their properties.  So one needs to select those runs of the experiment where Alice and Bob both find the particles, and discard all the “bad” runs where they don’t.  This by itself wouldn’t be a problem, if not for the fact that the very same measurement that reveals whether the particles are there, is also the one that “counts” (i.e., where Alice and Bob feed x and y and get out a and b)!

To someone with a conspiratorial mind, this opens up the possibility that the measurement’s success or failure is somehow correlated with its result, in a way that could violate the Bell inequality despite there being no real entanglement.  To illustrate, suppose that at the instant they’re created, one entangled particle says to the other: “listen, if Alice measures me in the x=0 basis, I’ll give the a=1 result.  If Bob measures you in the y=1 basis, you give the b=1 result.  In any other case, we’ll just evade detection and count this run as a loss.”  In such a case, Alice and Bob will win the game with certainty, whenever it gets played at all—but that’s only because of the particles’ freedom to choose which rounds will count.  Indeed, by randomly varying their “acceptable” x and y values from one round to the next, the particles can even make it look like x and y have no effect on the probability of a round’s succeeding.
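This conspiracy is easy to act out on a classical computer.  In the toy sketch below, in the spirit of the example above, each pair secretly commits to a single input pair it will respond to, rigs its outputs to win for exactly that pair, and plays “lost particle” otherwise:

```python
# A toy local-hidden-variable "conspiracy" exploiting the detection loophole.
# Each pair secretly commits to responding only to one input pair (xs, ys),
# with outputs rigged to win exactly for those inputs; otherwise it's a no-show.
import random

rounds, detected, wins = 100_000, 0, 0
for _ in range(rounds):
    x, y = random.randint(0, 1), random.randint(0, 1)     # referee's inputs
    xs, ys = random.randint(0, 1), random.randint(0, 1)   # the pair's secret plan
    a = random.randint(0, 1)                              # any a will do...
    b = a ^ (xs & ys)                                     # ...if b is rigged to match
    if (x, y) == (xs, ys):
        detected += 1
        wins += ((a ^ b) == (x & y))

print(detected / rounds)   # ~0.25, independent of x and y
print(wins / detected)     # 1.0: every counted round is won, with no entanglement
```

The detection rate comes out to ~25% regardless of x and y, while the counted rounds are won with certainty: better than quantum mechanics allows, with no entanglement anywhere in sight.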

Until a month ago, the state-of-the-art was that there were experiments that closed the locality loophole, and other experiments that closed the detection loophole, but there was no single experiment that closed both of them.

To close the locality loophole, “all you need” is a fast enough measurement on photons that are far enough apart.  That way, even if the vast Einsteinian conspiracy is trying to send signals between Alice’s and Bob’s particles at the speed of light, to coordinate the answers classically, the whole experiment will be done before the signals can possibly have reached their destinations.  Admittedly, as Nicolas Gisin once pointed out to me, there’s a philosophical difficulty in defining what we mean by the experiment being “done.”  To some purists, a Bell experiment might only be “done” once the results (i.e., the values of a and b) are registered in human experimenters’ brains!  And given the slowness of human reaction times, this might imply that a real Bell experiment ought to be carried out with astronauts on faraway space stations, or with Alice on the moon and Bob on earth (which, OK, would be cool).  If we’re being reasonable, however, we can grant that the experiment is “done” once a and b are safely recorded in classical, macroscopic computer memories—in which case, given the speed of modern computer memories, separating Alice and Bob by half a kilometer can be enough.  And indeed, experiments starting in 1998 (see for example here) have done exactly that; the current record, unless I’m mistaken, is 18 kilometers.  (Update: I was mistaken; it’s 144 kilometers.)  Alas, since these experiments used hard-to-measure photons, they were still open to the detection loophole.

To close the detection loophole, the simplest approach is to use entangled qubits that (unlike photons) are slow and heavy and can be measured with success probability approaching 1.  That’s exactly what various groups did starting in 2001 (see for example here), with trapped ions, superconducting qubits, and other systems.  Alas, given current technology, these sorts of qubits are virtually impossible to move miles apart from each other without decohering them.  So the experiments used qubits that were close together, leaving the locality loophole wide open.

So the problem boils down to: how do you create long-lasting, reliably-measurable entanglement between particles that are very far apart (e.g., in separate labs)?  There are three basic ideas in Hensen et al.’s solution to this problem.

The first idea is to use a hybrid system.  Ultimately, Hensen et al. create entanglement between electron spins in nitrogen vacancy centers in diamond (one of the hottest—or coolest?—experimental quantum information platforms today), in two labs that are about a mile away from each other.  To get these faraway electron spins to talk to each other, they make them communicate via photons.  If you stimulate an electron, it’ll sometimes emit a photon with which it’s entangled.  Very occasionally, the two electrons you care about will even emit photons at the same time.  In those cases, by routing those photons into optical fibers and then measuring the photons, it’s possible to entangle the electrons.

Wait, what?  How does measuring the photons entangle the electrons from whence they came?  This brings us to the second idea, entanglement swapping.  The latter is a famous procedure to create entanglement between two particles A and B that have never interacted, by “merely” entangling A with another particle A’, entangling B with another particle B’, and then performing an entangled measurement on A’ and B’ and conditioning on its result.  To illustrate, consider the state

$$ \frac{\left| 00 \right\rangle + \left| 11 \right\rangle}{\sqrt{2}} \otimes \frac{\left| 00 \right\rangle + \left| 11 \right\rangle}{\sqrt{2}} $$

and now imagine that we project the first and third qubits onto the state $$\frac{\left| 00 \right\rangle + \left| 11 \right\rangle}{\sqrt{2}}.$$

If the measurement succeeds, you can check that we’ll be left with the state $$\frac{\left| 00 \right\rangle + \left| 11 \right\rangle}{\sqrt{2}}$$ in the second and fourth qubits, even though those qubits were not entangled before.
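If you’d rather verify that arithmetic than take it on faith, here’s a short NumPy sketch of the projection (an idealized entangled measurement that succeeds with probability 1/4; the heralded success probability in the real experiment is vastly smaller):

```python
import numpy as np

bell = np.array([1, 0, 0, 1]) / np.sqrt(2)         # (|00> + |11>)/sqrt(2)
# Four qubits ordered (1,2,3,4): qubits 1-2 entangled, qubits 3-4 entangled.
psi = np.kron(bell, bell).reshape(2, 2, 2, 2)      # psi[q1, q2, q3, q4]

# Project qubits 1 and 3 onto the Bell state (a successful "entangled
# measurement"), leaving an unnormalized state on qubits 2 and 4.
phi = bell.reshape(2, 2)                           # phi[q1, q3]
out = np.einsum('ac,abcd->bd', phi.conj(), psi)    # contract away qubits 1, 3

prob = np.sum(np.abs(out) ** 2)                    # success probability: 1/4
out = out / np.sqrt(prob)
print(out.reshape(4))   # [0.707, 0, 0, 0.707]: qubits 2 and 4 are now entangled
```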

So to recap: these two electron spins, in labs a mile away from each other, both have some probability of producing a photon.  The photons, if produced, are routed to a third site, where if they’re both there, then an entangled measurement on both of them (and a conditioning on the results of that measurement) has some nonzero probability of causing the original electron spins to become entangled.

But there’s a problem: if you’ve been paying attention, all we’ve done is cause the electron spins to become entangled with some tiny, nonzero probability (something like 6.4×10⁻⁹ in the actual experiment).  So then, why is this any improvement over the previous experiments, which just directly measured faraway entangled photons, and also had some small but nonzero probability of detecting them?

This leads to the third idea.  The new setup is an improvement because, whenever the photon measurement succeeds, we know that the electron spins are there and that they’re entangled, without having to measure the electron spins to tell us that.  In other words, we’ve decoupled the measurement that tells us whether we succeeded in creating an entangled pair, from the measurement that uses the entangled pair to violate the Bell inequality.  And because of that decoupling, we can now just condition on the runs of the experiment where the entangled pair was there, without worrying that that will open up the detection loophole, biasing the results via some bizarre correlated conspiracy.  It’s as if the whole experiment were simply switched off, except for those rare lucky occasions when an entangled spin pair gets created (with its creation heralded by the photons).  On those rare occasions, Alice and Bob swing into action, measuring their respective spins within the brief window of time—about 4 microseconds—allowed by the locality loophole, seeking an additional morsel of evidence that entanglement is real.  (Well, actually, Alice and Bob swing into action regardless; they only find out later whether this was one of the runs that “counted.”)

So, those are the main ideas (as well as I understand them); then there’s lots of engineering.  In their setup, Hensen et al. were able to create just a few heralded entangled pairs per hour.  This allowed them to produce 245 CHSH games for Alice and Bob to play, and to reject the hypothesis of local realism at ~96% confidence.  Jungsang Kim explained to me that existing technologies could have produced many more events per hour, and hence, in a similar amount of time, “particle physics” (5σ or more) rather than “psychology” (2σ) levels of confidence that local realism is false.  But in this type of experiment, everything is a tradeoff.  Building not one but two labs for manipulating NV centers in diamond is extremely onerous, and Hensen et al. did what they had to do to get a significant result.

The basic idea here, of using photons to entangle longer-lasting qubits, is useful for more than pulverizing local realism.  In particular, the idea is a major part of current proposals for how to build a scalable ion-trap quantum computer.  Because of cross-talk, you can’t feasibly put more than 10 or so ions in the same trap while keeping all of them coherent and controllable.  So the current ideas for scaling up involve having lots of separate traps—but in that case, one will sometimes need to perform a Controlled-NOT, or some other 2-qubit gate, between a qubit in one trap and a qubit in another.  This can be achieved using the Gottesman-Chuang technique of gate teleportation, provided you have reliable entanglement between the traps.  But how do you create such entanglement?  Aha: the current idea is to entangle the ions by using photons as intermediaries, very similar in spirit to what Hensen et al. do.

At a more fundamental level, will this experiment finally convince everyone that local realism is dead, and that quantum mechanics might indeed be the operating system of reality?  Alas, I predict that those who confidently predicted that a loophole-free Bell test could never be done, will simply find some new way to wiggle out, without admitting the slightest problem for their previous view.  This prediction, you might say, is based on a different kind of realism.

Ask Me Anything: Diversity Edition

September 5th, 2015

With the fall semester imminent, and by popular request, I figured I’d do another Ask Me Anything (see here for the previous editions).  This one has a special focus: I’m looking for questions from readers who consider themselves members of groups that have historically been underrepresented in the Shtetl-Optimized comments section.  Besides the “obvious”—e.g., women and underrepresented ethnic groups—other examples might include children, traditionally religious people, jocks, liberal-arts majors… (but any group that includes John Sidles is probably not an example).  If I left out your group, please go ahead and bring it to my and your fellow readers’ attention!

My overriding ideal in life—what is to me as Communism was to Lenin, as Frosted Flakes are to Tony the Tiger—is people of every background coming together to discover and debate universal truths that transcend their backgrounds.  So few things have ever stung me more than accusations of being a closed-minded ivory-tower elitist white male nerd etc. etc.  Anyway, to anyone who’s ever felt excluded here for whatever reason, I hope this AMA will be taken as a small token of goodwill.

Similar rules apply as to my previous AMAs:

  • Only one question per person.
  • No multi-part questions, or questions that require me to read a document or watch a video and then comment on it.
  • Questions need not have anything to do with your underrepresented group (though they could). Math, science, futurology, academic career advice, etc. are all fine.  But please be courteous; anything gratuitously nosy or hostile will be left in the moderation queue.
  • I’ll stop taking further questions most likely after 24 hours (I’ll post a warning before closing the thread).

Update (Sep. 6): For anyone from the Boston area, or planning to visit it, I have an important piece of advice.  Do not ever, under any circumstances, attempt to visit Walden Pond, and tell everyone you know to stay away.  After we spent 40 minutes driving there with a toddler, the warden literally screamed at us to go away, that the park was at capacity. It wasn’t an issue of parking: even if we’d parked elsewhere, we just couldn’t go.  Exceptions were made for the people in front of us, but not for us, the ones with the 2-year-old who’d been promised her weekend outing would be to meet her best friend at Walden Pond.  It’s strangely fitting that what for Thoreau was a place of quiet contemplation, is today purely a site of overcrowding and frustration.

Another Update: OK, no new questions please, only comments on existing questions! I’ll deal with the backlog later today. Thanks to everyone who contributed.

D-Wave Open Thread

August 26th, 2015

A bunch of people have asked me to comment on D-Wave’s release of its 1000-qubit processor, and a paper by a group including Cathy McGeoch saying that the machine is 1 or 2 orders of magnitude faster (in annealing time, not wall-clock time) than simulated annealing running on a single-core classical computer.  It’s even been suggested that the “Scott-signal” has been shining brightly for a week above Quantham City, but that Scott-man has been too lazy and out-of-shape even to change into his tights.

Scientifically, it’s not clear if much has changed.  D-Wave now has a chip with twice as many qubits as the last one.  That chip continues to be pretty effective at finding its own low-energy states: indeed, depending on various details of definition, the machine can even find its own low-energy states “faster” than some implementation of simulated annealing running on a single-core chip.  Of course, it’s entirely possible that Matthias Troyer or Sergei Isakov or Troels Ronnow or someone like that will be able to find a better implementation of simulated annealing that closes even the modest gap—as happened the last time—but I’ll give the authors the benefit of the doubt that they put good-faith effort into optimizing the classical code.

More importantly, I’d say it remains unclear whether any of the machine’s performance on the instances tested here can be attributed to quantum tunneling effects.  In fact, the paper explicitly states (see page 3) that it’s not going to consider such questions, and I think the authors would agree that you could very well see results like theirs, even if what was going on was fundamentally classical annealing.  Also, of course, it’s still true that, if you wanted to solve a practical optimization problem, you’d first need to encode it into the Chimera graph, and that reduction entails a loss that could hand a decisive advantage to simulated annealing, even without the need to go to multiple cores.  (This is what I’ve described elsewhere as essentially all of these performance comparisons taking place on “the D-Wave machine’s home turf”: that is, on binary constraint satisfaction problems that have precisely the topology of D-Wave’s Chimera graph.)

But, I dunno, I’m just not feeling the urge to analyze this in more detail.  Part of the reason is that I think the press might be getting less hyper-excitable these days, thereby reducing the need for a Chief D-Wave Skeptic.  By this point, there may have been enough D-Wave announcements that papers realize they no longer need to cover each one like an extraterrestrial landing.  And there are more hats in the ring now, with John Martinis at Google seeking to build superconducting quantum annealing machines but with ~10,000x longer coherence times than D-Wave’s, and with IBM Research and some others also trying to scale superconducting QC.  The realization has set in, I think, that both D-Wave and the others are in this for the long haul, with D-Wave currently having lots of qubits, but with very short coherence times and unclear prospects for any quantum speedup, and Martinis and some others having qubits of far higher quality, but not yet able to couple enough of them.

The other issue is that, on my flight from Seoul back to Newark, I watched two recent kids’ movies that were almost defiant in their simple, unironic, 1950s-style messages of hope and optimism.  One was Disney’s new live-action Cinderella; the other was Brad Bird’s Tomorrowland.  And seeing these back-to-back filled me with such positivity and good will that, at least for these few hours, it’s hard to summon my usual crusty self.  I say, let’s invent the future together, and build flying cars and jetpacks in our garages!  Let a thousand creative ideas bloom for how to tackle climate change and the other crises facing civilization!  (Admittedly, mass-market flying cars and jetpacks are probably not a step forward on climate change … but, see, there’s that negativity coming back.)  And let another thousand ideas bloom for how to build scalable quantum computers—sure, including D-Wave’s!  Have courage and be kind!

So yeah, if readers would like to discuss the recent D-Wave paper further (especially those who know something about it), they’re more than welcome to do so in the comments section.  But I’ve been away from Dana and Lily for two weeks, and will endeavor to spend time with them rather than obsessively reloading the comments (let’s see if I succeed).

As a small token of my goodwill, I enclose two photos from my last visit to a D-Wave machine, which occurred when I met with some grad students in Waterloo this past spring.  As you can see, I even personally certified that the machine was operating as expected.  But more than that: surpassing all reasonable expectations for quantum AI, this model could actually converse intelligently, through a protruding head resembling that of IQC grad student Sarah Kaiser.


6-photon BosonSampling

August 19th, 2015

The news is more-or-less what the title says!

In Science, a group led by Anthony Laing at Bristol has now reported BosonSampling with 6 photons, beating their own previous record of 5 photons, as well as the earlier record of 4 photons achieved a few years ago by the Walmsley group at Oxford (as well as the 3-photon experiments done by groups around the world).  I only learned the big news from a commenter on this blog, after the paper was already published (protip: if you’ve pushed forward the BosonSampling frontier, feel free to shoot me an email about it).

As several people explain in the comments, the main advance in the paper is arguably not increasing the number of photons, but rather the fact that the device is completely reconfigurable: you can try hundreds of different unitary transformations with the same chip.  In addition, the 3-photon results have an unprecedentedly high fidelity (about 95%).

The 6-photon results are, of course, consistent with quantum mechanics: the transition amplitudes are indeed given by permanents of 6×6 complex matrices.  Key sentence:

After collecting 15 sixfold coincidence events, a confidence of P = 0.998 was determined that these are drawn from a quantum (not classical) distribution.

No one said scaling BosonSampling would be easy: I’m guessing that it took weeks of data-gathering to get those 15 coincidence events.  Scaling up further will probably require improvements to the sources.

There’s also a caveat: their initial state consisted of 2 modes with 3 photons each, as opposed to what we really want, which is 6 modes with 1 photon each.  (Likewise, in the Walmsley group’s 4-photon experiments, the initial state consisted of 2 modes with 2 photons each.)  If the number of modes stayed 2 forever, then the output distributions would remain easy to sample with a classical computer no matter how many photons we had, since we’d then get permanents of matrices with only 2 distinct rows.  So “scaling up” needs to mean increasing not only the number of photons, but also the number of sources.
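To get a concrete feel for how classically easy this regime still is, here’s a sketch of Ryser’s formula, which computes an n×n permanent in exponential time but is effectively instantaneous at n=6, with or without repeated rows:

```python
# Ryser's formula for the permanent: per(A) = (-1)^n * sum over nonempty column
# subsets S of (-1)^|S| * prod_i (sum_{j in S} A[i, j]). Exponential time, but
# a 6x6 permanent is a microseconds-scale computation.
from itertools import product
import numpy as np

def permanent(A):
    n = A.shape[0]
    total = 0.0
    for mask in product((0, 1), repeat=n):        # choose a subset of columns
        if not any(mask):
            continue
        cols = [j for j in range(n) if mask[j]]
        total += (-1) ** sum(mask) * np.prod(A[:, cols].sum(axis=1))
    return (-1) ** n * total

A = (np.random.randn(6, 6) + 1j * np.random.randn(6, 6)) / np.sqrt(2)
print(permanent(A))
```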

Nevertheless, this is an obvious step forward, and it came sooner than I expected.  Huge congratulations to the authors on their accomplishment!

But you might ask: given that 6×6 permanents are still pretty easy for a classical computer (the more so when the matrices have only 2 distinct rows), why should anyone care?  Well, the new result has major implications for what I’ve always regarded as the central goal of quantum computing research, much more important than breaking RSA or Grover search or even quantum simulation: namely, getting Gil Kalai to admit he was wrong.  Gil is on record, repeatedly, on this blog as well as his own (see for example here), as saying that he doesn’t think BosonSampling will ever be possible even with 7 or 8 photons.  I don’t know whether the 6-photon result is giving him second thoughts (or sixth thoughts?) about that prediction.

Jacob Bekenstein (1947-2015)

August 18th, 2015

Today I learned the sad news that Jacob Bekenstein, one of the great theoretical physicists of our time, passed away at the too-early age of 68.

Everyone knows what a big deal it was when Stephen Hawking discovered in 1974 that black holes radiate.  Bekenstein was the guy who, as a grad student in Princeton in the early 1970s, was already raving about black holes having nonzero entropy and temperature, and satisfying the Second Law of Thermodynamics—something just about everyone, including Hawking, considered nuts at the time.  It was, as I understand it, Hawking’s failed attempt to prove Bekenstein wrong that led to Hawking’s discovery of Hawking radiation, and thence to the modern picture of black holes.

In the decades since, Bekenstein continued to prove ingenious physical inequalities, often using thought experiments involving black holes.  The most famous of these, the Bekenstein bound, says that the number of bits that can be stored in any bounded physical system is finite, and is upper-bounded by ~2.6×10⁴³ MR, where M is the system’s mass in kilograms and R is its radius in meters.  (This bound is saturated by black holes, and only by black holes, which therefore emerge as the most compact possible storage medium—though probably not the best for retrieval!)  Bekenstein’s lectures were models of clarity and rigor: at conferences full of audacious speculations, he stood out to my non-expert eyes as someone who was simply trying to follow chains of logic from accepted physical principles, however mind-bogglingly far those chains led but no further.
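For the curious, the quoted constant comes from writing the bound as 2πER/(ħc ln 2) bits with E = Mc², so the coefficient of MR is 2πc/(ħ ln 2); a few lines of Python confirm the arithmetic:

```python
# The Bekenstein bound in bits is 2*pi*E*R / (hbar*c*ln 2) with E = M*c^2,
# so the coefficient of M*R is 2*pi*c / (hbar*ln 2). Checking the arithmetic:
import math

hbar = 1.054571817e-34   # J*s
c = 2.99792458e8         # m/s

coeff = 2 * math.pi * c / (hbar * math.log(2))
print(f"{coeff:.2e} bits per kg*m")   # ~2.58e43, i.e., the ~2.6e43 quoted above
```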

I first met Bekenstein in 2003, when I was a grad student spending a semester at Hebrew University in Jerusalem.  I was struck by the kindness he showed a 21-year-old nobody, who wasn’t even a physics student, coming to bother him.  Not only did he listen patiently to my blather about applying computational complexity to physics, he said that of course physics should ultimately aim to understand everything as the output of some computer program, that he too was thinking in terms of computation when he studied black-hole entropy.  I remember pondering the fact that the greatest reductionist I’d ever met was wearing a yarmulke—and then scolding myself for wasting precious brain-cycles on such a trivial thought when there was science to discuss.  I met Bekenstein maybe four or five more times on visits to Israel, most recently a year and a half ago, when we shared walks to and from the hotel at a firewall workshop at the Weizmann Institute.  He was unfailingly warm, modest, and generous—totally devoid of the egotism that I’ve heard can occasionally afflict people of his stature.  Now, much like with the qubits hitting the event horizon, the information that comprised Jacob Bekenstein might seem to be gone, but it remains woven into the cosmos.

Common Knowledge and Aumann’s Agreement Theorem

August 16th, 2015

The following is the prepared version of a talk that I gave at SPARC: a high-school summer program about applied rationality held in Berkeley, CA for the past two weeks.  I had a wonderful time in Berkeley, meeting new friends and old, but I’m now leaving to visit the CQT in Singapore, and then to attend the AQIS conference in Seoul.

Common Knowledge and Aumann’s Agreement Theorem

August 14, 2015

Thank you so much for inviting me here!  I honestly don’t know whether it’s possible to teach applied rationality, the way this camp is trying to do.  What I know is that, if it is possible, then the people running SPARC are some of the awesomest people on earth to figure out how.  I’m incredibly proud that Chelsea Voss and Paul Christiano are both former students of mine, and I’m amazed by the program they and the others have put together here.  I hope you’re all having fun—or maximizing your utility functions, or whatever.

My research is mostly about quantum computing, and more broadly, computation and physics.  But I was asked to talk about something you can actually use in your lives, so I want to tell a different story, involving common knowledge.

I’ll start with the “Muddy Children Puzzle,” which is one of the greatest logic puzzles ever invented.  How many of you have seen this one?

OK, so the way it goes is, there are a hundred children playing in the mud.  Naturally, they all have muddy foreheads.  At some point their teacher comes along and says to them, as they all sit around in a circle: “stand up if you know your forehead is muddy.”  No one stands up.  For how could they know?  Each kid can see all the other 99 kids’ foreheads, so knows that they’re muddy, but can’t see his or her own forehead.  (We’ll assume that there are no mirrors or camera phones nearby, and also that this is mud that you don’t feel when it’s on your forehead.)

So the teacher tries again.  “Knowing that no one stood up the last time, now stand up if you know your forehead is muddy.”  Still no one stands up.  Why would they?  No matter how many times the teacher repeats the request, still no one stands up.

Then the teacher tries something new.  “Look, I hereby announce that at least one of you has a muddy forehead.”  After that announcement, the teacher again says, “stand up if you know your forehead is muddy”—and again no one stands up.  And again and again; it continues 99 times.  But then the hundredth time, all the children suddenly stand up.

(There’s a variant of the puzzle involving blue-eyed islanders who all suddenly commit suicide on the hundredth day, when they all learn that their eyes are blue—but as a blue-eyed person myself, that’s always struck me as needlessly macabre.)

What’s going on here?  Somehow, the teacher’s announcing to the children that at least one of them had a muddy forehead set something dramatic in motion, which would eventually make them all stand up—but how could that announcement possibly have made any difference?  After all, each child already knew that at least 99 children had muddy foreheads!

Like with many puzzles, the way to get intuition is to change the numbers.  So suppose there were two children with muddy foreheads, and the teacher announced to them that at least one had a muddy forehead, and then asked both of them whether their own forehead was muddy.  Neither would know.  But each child could reason as follows: “if my forehead weren’t muddy, then the other child would’ve seen that, and would also have known that at least one of us has a muddy forehead.  Therefore she would’ve known, when asked, that her own forehead was muddy.  Since she didn’t know, that means my forehead is muddy.”  So then both children know their foreheads are muddy, when the teacher asks a second time.

Now, this argument can be generalized to any (finite) number of children.  The crucial concept here is common knowledge.  We call a fact “common knowledge” if, not only does everyone know it, but everyone knows everyone knows it, and everyone knows everyone knows everyone knows it, and so on.  It’s true that in the beginning, each child knew that all the other children had muddy foreheads, but it wasn’t common knowledge that even one of them had a muddy forehead.  For example, if your forehead and mine are both muddy, then I know that at least one of us has a muddy forehead, and you know that too, but you don’t know that I know it (for what if your forehead were clean?), and I don’t know that you know it (for what if my forehead were clean?).

What the teacher’s announcement did, was to make it common knowledge that at least one child has a muddy forehead (since not only did everyone hear the announcement, but everyone witnessed everyone else hearing it, etc.).  And once you understand that point, it’s easy to argue by induction: after the teacher asks and no child stands up (and everyone sees that no one stood up), it becomes common knowledge that at least two children have muddy foreheads (since if only one child had had a muddy forehead, that child would’ve known it and stood up).  Next it becomes common knowledge that at least three children have muddy foreheads, and so on, until after a hundred rounds it’s common knowledge that everyone’s forehead is muddy, so everyone stands up.
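If you’d rather watch the induction run than re-read it, here’s a toy simulation, with each child’s reasoning collapsed into the common-knowledge lower bound:

```python
# A toy run of the muddy-children induction. All n children are muddy, so each
# child sees n-1 muddy foreheads; after k silent rounds it's common knowledge
# that at least k+1 foreheads are muddy.
def muddy_children(n):
    lower_bound = 1                        # the teacher's public announcement
    for round_number in range(1, n + 1):
        # A child stands up once the n-1 muddy foreheads they can see would
        # contradict the common-knowledge bound unless their own were muddy too.
        if n - 1 < lower_bound:
            return round_number
        lower_bound += 1                   # no one stood; the bound grows by 1
    return None

print(muddy_children(100))   # 100: everyone stands on the hundredth request
```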

The moral is that the mere act of saying something publicly can change the world—even if everything you said was already obvious to every last one of your listeners.  For it’s possible that, until your announcement, not everyone knew that everyone knew the thing, or knew everyone knew everyone knew it, etc., and that could have prevented them from acting.

This idea turns out to have huge real-life consequences, to situations way beyond children with muddy foreheads.  I mean, it also applies to children with dots on their foreheads, or “kick me” signs on their backs…

But seriously, let me give you an example I stole from Steven Pinker, from his wonderful book The Stuff of Thought.  Two people of indeterminate gender—let’s not make any assumptions here—go on a date.  Afterward, one of them says to the other: “Would you like to come up to my apartment to see my etchings?”  The other says, “Sure, I’d love to see them.”

This is such a cliché that we might not even notice the deep paradox here.  It’s like with life itself: people knew for thousands of years that every bird has the right kind of beak for its environment, but not until Darwin and Wallace could anyone articulate why (and only a few people before them even recognized there was a question there that called for a non-circular answer).

In our case, the puzzle is this: both people on the date know perfectly well that the reason they’re going up to the apartment has nothing to do with etchings.  They probably even both know the other knows that.  But if that’s the case, then why don’t they just blurt it out: “would you like to come up for some intercourse?”  (Or “fluid transfer,” as the John Nash character put it in the Beautiful Mind movie?)

So here’s Pinker’s answer.  Yes, both people know why they’re going to the apartment, but they also want to avoid their knowledge becoming common knowledge.  They want plausible deniability.  There are several possible reasons: to preserve the romantic fantasy of being “swept off one’s feet.”  To provide a face-saving way to back out later, should one of them change their mind: since nothing was ever openly said, there’s no agreement to abrogate.  In fact, even if only one of the people (say A) might care about such things, if the other person (say B) thinks there’s any chance A cares, B will also have an interest in avoiding common knowledge, for A’s sake.

Put differently, the issue is that, as soon as you say X out loud, the other person doesn’t merely learn X: they learn that you know X, that you know that they know that you know X, that you want them to know you know X, and an infinity of other things that might upset the delicate epistemic balance.  Contrast that with the situation where X is left unstated: yeah, both people are pretty sure that “etchings” are just a pretext, and can even plausibly guess that the other person knows they’re pretty sure about it.  But once you start getting to 3, 4, 5 levels of indirection—who knows?  Maybe you do just want to show me some etchings.

Philosophers like to discuss Sherlock Holmes and Professor Moriarty meeting in a train station, and Moriarty declaring, “I knew you’d be here,” and Holmes replying, “well, I knew that you knew I’d be here,” and Moriarty saying, “I knew you knew I knew I’d be here,” etc.  But real humans tend to be unable to reason reliably past three or four levels in the knowledge hierarchy.  (Related to that, you might have heard of the game where everyone guesses a number between 0 and 100, and the winner is whoever’s number is the closest to 2/3 of the average of all the numbers.  If this game is played by perfectly rational people, who know they’re all perfectly rational, and know they know, etc., then they must all guess 0—exercise for you to see why.  Yet experiments show that, if you actually want to win this game against average people, you should guess about 20.  People seem to start with 50 or so, iterate the operation of multiplying by 2/3 a few times, and then stop.)
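You can watch that iteration in a few lines of Python (the starting guess of 50 is the one people seem to use):

```python
# Iterating "multiply by 2/3" from a starting guess of 50: perfectly rational
# players iterate all the way to 0; real people seem to stop after a few levels.
guess = 50.0
for level in range(6):
    print(level, round(guess, 1))   # 50.0, 33.3, 22.2, 14.8, 9.9, 6.6
    guess *= 2 / 3
```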

Incidentally, do you know what I would’ve given for someone to have explained this stuff to me back in high school?  I think that a large fraction of the infamous social difficulties that nerds have, is simply down to nerds spending so much time in domains (like math and science) where the point is to struggle with every last neuron to make everything common knowledge, to make all truths as clear and explicit as possible.  Whereas in social contexts, very often you’re managing a delicate epistemic balance where you need certain things to be known, but not known to be known, and so forth—where you need to prevent common knowledge from arising, at least temporarily.  “Normal” people have an intuitive feel for this; it doesn’t need to be explained to them.  For nerds, by contrast, explaining it—in terms of the muddy children puzzle and so forth—might be exactly what’s needed.  Once they’re told the rules of a game, nerds can try playing it too!  They might even turn out to be good at it.

OK, now for a darker example of common knowledge in action.  If you read accounts of Nazi Germany, or the USSR, or North Korea or other despotic regimes today, you can easily be overwhelmed by this sense of, “so why didn’t all the sane people just rise up and overthrow the totalitarian monsters?  Surely there were more sane people than crazy, evil ones.  And probably the sane people even knew, from experience, that many of their neighbors were sane—so why this cowardice?”  Once again, it could be argued that common knowledge is the key.  Even if everyone knows the emperor is naked; indeed, even if everyone knows everyone knows he’s naked, still, if it’s not common knowledge, then anyone who says the emperor’s naked is knowingly assuming a massive personal risk.  That’s why, in the story, it took a child to shift the equilibrium.  Likewise, even if you know that 90% of the populace will join your democratic revolt provided they themselves know 90% will join it, if you can’t make your revolt’s popularity common knowledge, everyone will be stuck second-guessing each other, worried that if they revolt they’ll be an easily-crushed minority.  And because of that very worry, they’ll be correct!

(My favorite Soviet joke involves a man standing in the Moscow train station, handing out leaflets to everyone who passes by.  Eventually, of course, the KGB arrests him—but they discover to their surprise that the leaflets are just blank pieces of paper.  “What’s the meaning of this?” they demand.  “What is there to write?” replies the man.  “It’s so obvious!”  Note that this is precisely a situation where the man is trying to make common knowledge something he assumes his “readers” already know.)

The kicker is that, to prevent something from becoming common knowledge, all you need to do is censor the common-knowledge-producing mechanisms: the press, the Internet, public meetings.  This nicely explains why despots throughout history have been so obsessed with controlling the press, and also explains how it’s possible for 10% of a population to murder and enslave the other 90% (as has happened again and again in our species’ sorry history), even though the 90% could easily overwhelm the 10% by acting in concert.  Finally, it explains why believers in the Enlightenment project tend to be such fanatical absolutists about free speech—why they refuse to “balance” it against cultural sensitivity or social harmony or any other value, as so many well-meaning people urge these days.

OK, but let me try to tell you something surprising about common knowledge.  Here at SPARC, you’ve learned all about Bayes’ rule—how, if you like, you can treat “probabilities” as just made-up numbers in your head, which are required to obey the probability calculus, and then there’s a very definite rule for how to update those numbers when you gain new information.  And indeed, how an agent that wanders around constantly updating these numbers in its head, and taking whichever action maximizes its expected utility (as calculated using the numbers), is probably the leading modern conception of what it means to be “rational.”

Now imagine that you’ve got two agents, call them Alice and Bob, with common knowledge of each other’s honesty and rationality, and with the same prior probability distribution over some set of possible states of the world.  But now imagine they go out and live their lives, and have totally different experiences that lead to their learning different things, and having different posterior distributions.  But then they meet again, and they realize that their opinions about some topic (say, Hillary’s chances of winning the election) are common knowledge: they both know each other’s opinion, and they both know that they both know, and so on.  Then a striking 1976 result called Aumann’s Theorem states that their opinions must be equal.  Or, as it’s summarized: “rational agents with common priors can never agree to disagree about anything.”

Actually, before going further, let’s prove Aumann’s Theorem—since it’s one of those things that sounds like a mistake when you first hear it, and then becomes a triviality once you see the 3-line proof.  (Albeit, a “triviality” that won Aumann a Nobel in economics.)  The key idea is that knowledge induces a partition on the set of possible states of the world.  Huh?  OK, imagine someone is either an old man, an old woman, a young man, or a young woman.  You and I agree in giving each of these a 25% prior probability.  Now imagine that you find out whether they’re a man or a woman, and I find out whether they’re young or old.  This can be illustrated as follows:

[Diagram: the four worlds (old man, old woman, young man, young woman), with your knowledge partition grouping them by gender and mine grouping them by age.]
The diagram tells us, for example, that if the ground truth is “old woman,” then your knowledge is described by the set {old woman, young woman}, while my knowledge is described by the set {old woman, old man}.  And this different information leads us to different beliefs: for example, if someone asks for the probability that the person is a woman, you’ll say 100% but I’ll say 50%.  OK, but what does it mean for information to be common knowledge?  It means that I know that you know that I know that you know, and so on.  Which means that, if you want to find out what’s common knowledge between us, you need to take the least common coarsening of our knowledge partitions.  I.e., if the ground truth is some given world w, then what do I consider it possible that you consider it possible that I consider possible that … etc.?  Iterate this growth process until it stops, by “zigzagging” between our knowledge partitions, and you get the set S of worlds such that, if we’re in world w, then what’s common knowledge between us is that the world belongs to S.  Repeat for all w’s, and you get the least common coarsening of our partitions.  In the above example, the least common coarsening is trivial, with all four worlds ending up in the same set S, but there are nontrivial examples as well:

[Diagram: an example where the least common coarsening of the two knowledge partitions is nontrivial, splitting the worlds into more than one common-knowledge set S.]
Now, if Alice’s expectation of a random variable X is common knowledge between her and Bob, that means that everywhere in S, her expectation must be constant … and hence must equal whatever the expectation is, over all the worlds in S!  Likewise, if Bob’s expectation is common knowledge with Alice, then everywhere in S, it must equal the expectation of X over S.  But that means that Alice’s and Bob’s expectations are the same.
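For the programmers reading this, the zigzag is just a connected-components computation: keep merging any cells of the two partitions that overlap.  A sketch, on the age/gender example from above:

```python
# The "zigzag" as connected components: repeatedly merge any cells of the two
# knowledge partitions that overlap. The result is the least common coarsening.
def common_coarsening(worlds, partition_a, partition_b):
    cells = [{w} for w in worlds]
    for block in partition_a + partition_b:
        merged = set(block).union(*[c for c in cells if c & block])
        cells = [c for c in cells if not (c & block)] + [merged]
    return cells

worlds = ["old man", "old woman", "young man", "young woman"]
alice = [{"old man", "young man"}, {"old woman", "young woman"}]   # knows gender
bob   = [{"old man", "old woman"}, {"young man", "young woman"}]   # knows age
print(common_coarsening(worlds, alice, bob))
# -> one cell containing all four worlds: the trivial coarsening from the text
```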

There are lots of related results.  For example, rational agents with common priors, and common knowledge of each other’s rationality, should never engage in speculative trade (e.g., buying and selling stocks, assuming that they don’t need cash, they’re not earning a commission, etc.).  Why?  Basically because, if I try to sell you a stock for (say) $50, then you should reason that the very fact that I’m offering it means I must have information you don’t that it’s worth less than $50, so then you update accordingly and you don’t want it either.

Or here’s another one: suppose again that we’re Bayesians with common priors, and we’re having a conversation, where I tell you my opinion (say, of the probability Hillary will win the election).  Not any of the reasons or evidence on which the opinion is based—just the opinion itself.  Then you, being Bayesian, update your probabilities to account for what my opinion is.  Then you tell me your opinion (which might have changed after learning mine), I update on that, I tell you my new opinion, then you tell me your new opinion, and so on.  You might think this could go on forever!  But, no, Geanakoplos and Polemarchakis observed that, as long as there are only finitely many possible states of the world in our shared prior, this process must converge after finitely many steps with you and me having the same opinion (and moreover, with it being common knowledge that we have that opinion).  Why?  Because as long as our opinions differ, your telling me your opinion or me telling you mine must induce a nontrivial refinement of one of our knowledge partitions, like so:

[Diagram: a knowledge partition being refined, with one knowledge set splitting along the different possible values of the announced opinion.]
I.e., if you learn something new, then at least one of your knowledge sets must get split along the different possible values of the thing you learned.  But since there are only finitely many underlying states, there can only be finitely many such splittings (note that, since Bayesians never forget anything, knowledge sets that are split will never again rejoin).
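Here’s a toy run of the whole process; the worlds, partitions, and random variable below are all invented for illustration:

```python
# A toy run of the Geanakoplos-Polemarchakis process: Alice announces her
# conditional expectation of X, Bob updates and replies, and so on. Each
# announcement splits the listener's knowledge cells by the value the speaker
# would have announced in each possible world.
from fractions import Fraction

worlds = range(8)                                # uniform prior over 8 worlds
X = {w: Fraction(w * w % 5, 4) for w in worlds}  # what they're arguing about
alice = [{0, 1, 2, 3}, {4, 5, 6, 7}]             # Alice's knowledge partition
bob   = [{0, 1, 4, 5}, {2, 3, 6, 7}]             # Bob's knowledge partition
true_world = 5

def cell_of(partition, w):
    return next(c for c in partition if w in c)

def expect(cell):
    return sum(X[w] for w in cell) / len(cell)

def refine(listener, speaker):
    new = []
    for cell in listener:
        groups = {}
        for w in cell:
            groups.setdefault(expect(cell_of(speaker, w)), set()).add(w)
        new.extend(groups.values())
    return new

for _ in range(10):
    a = expect(cell_of(alice, true_world))   # Alice announces her expectation
    bob = refine(bob, alice)                 # Bob updates on the announcement
    b = expect(cell_of(bob, true_world))     # Bob announces his
    alice = refine(alice, bob)               # Alice updates in turn
    print(f"Alice says {a}, Bob replies {b}")
    if a == b:
        break                                # agreement after finitely many steps
```

On this example the whole negotiation takes two rounds: Alice says 3/8, Bob replies 1/8, and one announcement later they both settle on 1/8.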

And something else: suppose your friend tells you a liberal opinion, then you take it into account, but reply with a more conservative opinion.  The friend takes your opinion into account, and replies with a revised opinion.  Question: is your friend’s new opinion likelier to be more liberal than yours, or more conservative?

Obviously, more liberal!  Yes, maybe your friend now sees some of your points and vice versa, maybe you’ve now drawn a bit closer (ideally!), but you’re not going to suddenly switch sides because of one conversation.

Yet, if you and your friend are Bayesians with common priors, one can prove that that’s not what should happen at all.  Indeed, your expectation of your own future opinion should equal your current opinion, and your expectation of your friend’s next opinion should also equal your current opinion—meaning that you shouldn’t be able to predict in which direction your opinion will change next, nor in which direction your friend will next disagree with you.  Why not?  Formally, because all these expectations are just different ways of calculating an expectation over the same set, namely your current knowledge set (i.e., the set of states of the world that you currently consider possible)!  More intuitively, we could say: if you could predict that, all else equal, the next thing you heard would probably shift your opinion in a liberal direction, then as a Bayesian you should already shift your opinion in a liberal direction right now.  (This is related to what’s called the “martingale property”: sure, a random variable X could evolve in many ways in the future, but the average of all those ways must be its current expectation E[X], by the very definition of E[X]…)
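Here’s a tiny numerical check of that claim, with invented numbers: whatever you might learn tomorrow, the average of your possible future opinions, weighted by how likely you are to reach each one, is exactly your opinion today.

```python
# Your expectation today of tomorrow's opinion equals today's opinion,
# by the law of iterated expectations. A toy check with 6 equally likely worlds:
from fractions import Fraction

X = [Fraction(v) for v in (0, 3, 1, 4, 1, 5)]
today = sum(X) / 6                      # current opinion: E[X]
evens, odds = X[0::2], X[1::2]          # tomorrow you'll learn the world's parity
tomorrow = Fraction(1, 2) * (sum(evens) / 3) + Fraction(1, 2) * (sum(odds) / 3)
print(today, tomorrow)                  # 7/3 and 7/3: no predictable drift
```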

So, putting all these results together, we get a clear picture of what rational disagreements should look like: they should follow unbiased random walks, until sooner or later they terminate in common knowledge of complete agreement.  We now face a bit of a puzzle, in that hardly any disagreements in the history of the world have ever looked like that.  So what gives?

There are a few ways out:

(1) You could say that the “failed prediction” of Aumann’s Theorem is no surprise, since virtually all human beings are irrational cretins, or liars (or at least, it’s not common knowledge that they aren’t). Except for you, of course: you’re perfectly rational and honest.  And if you ever met anyone else as rational and honest as you, maybe you and they could have an Aumannian conversation.  But since such a person probably doesn’t exist, you’re totally justified to stand your ground, discount all opinions that differ from yours, etc.  Notice that, even if you genuinely believed that was all there was to it, Aumann’s Theorem would still have an aspirational significance for you: you would still have to say this is the ideal that all rationalists should strive toward when they disagree.  And that would already conflict with a lot of standard rationalist wisdom.  For example, we all know that arguments from authority carry little weight: what should sway you is not the mere fact of some other person stating their opinion, but the actual arguments and evidence that they’re able to bring.  Except that as we’ve seen, for Bayesians with common priors this isn’t true at all!  Instead, merely hearing your friend’s opinion serves as a powerful summary of what your friend knows.  And if you learn that your rational friend disagrees with you, then even without knowing why, you should take that as seriously as if you discovered a contradiction in your own thought processes.  This is related to an even broader point: there’s a normative rule of rationality that you should judge ideas only on their merits—yet if you’re a Bayesian, of course you’re going to take into account where the ideas come from, and how many other people hold them!  Likewise, if you’re a Bayesian police officer or a Bayesian airport screener or a Bayesian job interviewer, of course you’re going to profile people by their superficial characteristics, however unfair that might be to individuals—so all those studies proving that people evaluate the same resume differently if you change the name at the top are no great surprise.  It seems to me that the tension between these two different views of rationality, the normative and the Bayesian, generates a lot of the most intractable debates of the modern world.

(2) Or—and this is an obvious one—you could reject the assumption of common priors. After all, isn’t a major selling point of Bayesianism supposed to be its subjective aspect, the fact that you pick “whichever prior feels right for you,” and are constrained only in how to update that prior?  If Alice’s and Bob’s priors can be different, then all the reasoning I went through earlier collapses.  So rejecting common priors might seem appealing.  But there’s a paper by Tyler Cowen and Robin Hanson called “Are Disagreements Honest?”—one of the most worldview-destabilizing papers I’ve ever read—that calls that strategy into question.  What it says, basically, is this: if you’re really a thoroughgoing Bayesian rationalist, then your prior ought to allow for the possibility that you are the other person.  Or to put it another way: “you being born as you,” rather than as someone else, should be treated as just one more contingent fact that you observe and then conditionalize on!  And likewise, the other person should condition on the observation that they’re them and not you.  In this way, absolutely everything that makes you different from someone else can be understood as “differing information,” so we’re right back to the situation covered by Aumann’s Theorem.  Imagine, if you like, that we all started out behind some Rawlsian veil of ignorance, as pure reasoning minds that had yet to be assigned specific bodies.  In that original state, there was nothing to differentiate any of us from any other—anything that did would just be information to condition on—so we all should’ve had the same prior.  That might sound fanciful, but in some sense all it’s saying is: what licenses you to privilege an observation just because it’s your eyes that made it, or a thought just because it happened to occur in your head?  Like, if you’re objectively smarter or more observant than everyone else around you, fine, but to whatever extent you agree that you aren’t, your opinion gets no special epistemic protection just because it’s yours.

(3) If you’re uncomfortable with this tendency of Bayesian reasoning to refuse to be confined anywhere, to want to expand to cosmic or metaphysical scope (“I need to condition on having been born as me and not someone else”)—well then, you could reject the entire framework of Bayesianism, as your notion of rationality. Lest I be cast out from this camp as a heretic, I hasten to say: I include this option only for the sake of completeness!

(4) When I first learned about this stuff 12 years ago, it seemed obvious to me that a lot of it could be dismissed as irrelevant to the real world for reasons of complexity. I.e., sure, it might apply to ideal reasoners with unlimited time and computational power, but as soon as you impose realistic constraints, this whole Aumannian house of cards should collapse.  As an example, if Alice and Bob have common priors, then sure, they’ll agree about everything once they’ve effectively shared all their information with each other!  But in practice, we don’t have time to “mind-meld,” swapping our entire life experiences with anyone we meet.  One could conjecture, then, that agreement in general requires a lot of communication.  So I sat down and tried to prove that as a theorem.  And you know what I found?  That my intuition here wasn’t even close to correct!

In more detail, I proved the following theorem.  Suppose Alice and Bob are Bayesians with shared priors, and suppose they’re arguing about (say) the probability of some future event—or more generally, about any random variable X bounded in [0,1].  So, they have a conversation where Alice first announces her expectation of X, then Bob announces his new expectation, and so on.  The theorem says that Alice’s and Bob’s estimates of X will necessarily agree to within ±ε, with probability at least 1-δ over their shared prior, after they’ve exchanged only O(1/(δε^2)) messages.  Note that this bound is completely independent of how much knowledge they have; it depends only on the accuracy with which they want to agree!  Furthermore, the same bound holds even if Alice and Bob only send a few discrete bits about their real-valued expectations with each message, rather than the expectations themselves.
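Before sketching the proof, here’s a toy simulation of the kind of dialogue the theorem is about (the uniform prior, the 64 states, and the particular partitions are all invented; this illustrates the generic Aumannian protocol, not the specific construction in my paper):

```python
import itertools
import random

rng = random.Random(0)
N = 64
states = range(N)
X = {s: rng.random() for s in states}      # the random variable, bounded in [0,1]

# Private information, as partitions of the state space (chosen arbitrarily):
# Alice learns s mod 4; Bob learns s // 16.
def alice_cell(s): return {t for t in states if t % 4 == s % 4}
def bob_cell(s):   return {t for t in states if t // 16 == s // 16}

def expect(K): return sum(X[s] for s in K) / len(K)

def announcement(cell_fn, s, consistent):
    """What this party would announce in state s, given the set of states
    still consistent with the public transcript."""
    return expect(cell_fn(s) & consistent)

true_state = rng.randrange(N)
consistent = set(states)                   # states consistent with the transcript
speakers = itertools.cycle([(alice_cell, "Alice"), (bob_cell, "Bob")])

for rnd in range(1, 21):
    cell_fn, name = next(speakers)
    e = announcement(cell_fn, true_state, consistent)
    print(f"round {rnd}: {name} announces {e:.4f}")
    # Everyone publicly rules out the states in which the speaker would
    # have announced something different (the Bayesian updating step).
    consistent = {s for s in consistent
                  if abs(announcement(cell_fn, s, consistent) - e) < 1e-9}
    other = bob_cell if cell_fn is alice_cell else alice_cell
    if abs(announcement(other, true_state, consistent) - e) < 1e-9:
        print("agreement reached")
        break
```

With generic values, a couple of rounds suffice in a toy this small; the content of the theorem is that the number of messages needed is O(1/(δε^2)) no matter how much information the two parties hold.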

The proof involves the idea that Alice and Bob’s estimates of X, call them X_A and X_B respectively, follow “unbiased random walks” (or more formally, are martingales).  Very roughly, if |X_A-X_B|≥ε with high probability over Alice and Bob’s shared prior, then that fact implies that the next message has a high probability (again, over the shared prior) of causing either X_A or X_B to jump up or down by about ε.  But X_A and X_B, being estimates of X, are bounded between 0 and 1.  So a random walk with a step size of ε can only continue for about 1/ε^2 steps before it hits one of the “absorbing barriers.”
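If you want to sanity-check that 1/ε^2 scaling, here’s a five-minute experiment (mine, purely illustrative): an unbiased ±1 walk on {0,…,n}, which after rescaling to [0,1] is a walk with step size ε=1/n, started from the middle.

```python
import random

def exit_time(n, rng):
    """Steps for an unbiased +/-1 walk from n//2 until it hits 0 or n."""
    pos, steps = n // 2, 0
    while 0 < pos < n:
        pos += 1 if rng.random() < 0.5 else -1
        steps += 1
    return steps

rng = random.Random(1)
for n in [10, 20, 40]:                    # i.e., step sizes eps = 1/n
    avg = sum(exit_time(n, rng) for _ in range(2000)) / 2000
    # From the midpoint, the expected exit time is (n/2)*(n/2) = 1/(4*eps^2).
    print(f"eps = 1/{n}: average {avg:.0f} steps; predicted {n * n // 4}")
```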

The way to formalize this is to look at the variances, Var[X_A] and Var[X_B], with respect to the shared prior.  Because Alice and Bob’s partitions keep getting refined, the variances are monotonically non-decreasing.  They start out at 0 and can never exceed 1 (in fact they can never exceed 1/4, but let’s not worry about constants).  Now, the key lemma is that, if Pr[|X_A-X_B|≥ε]≥δ, then Var[X_B] must increase by at least δε^2 if Alice sends X_A to Bob, and Var[X_A] must increase by at least δε^2 if Bob sends X_B to Alice.  You can see my paper for the proof, or just work it out for yourself.  At any rate, the lemma implies that, after O(1/(δε^2)) rounds of communication, there must be at least a temporary break in the disagreement; there must be some round where Alice and Bob approximately agree with high probability.
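Spelling out the counting at the end (this is just the arithmetic implicit in the lemma, restated in symbols):

```latex
% Each of Var[X_A], Var[X_B] starts at 0, never decreases, and never
% exceeds 1.  Every round in which Pr[|X_A - X_B| >= eps] >= delta pushes
% one of the two variances up by at least delta*eps^2.  So the number of
% such rounds is at most
\frac{2}{\delta\varepsilon^{2}} \;=\; O\!\left(\frac{1}{\delta\varepsilon^{2}}\right).
```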

There are lots of other results in my paper, including an upper bound on the number of calls that Alice and Bob need to make to a “sampling oracle” to carry out this sort of protocol approximately, assuming they’re not perfect Bayesians but agents with bounded computational power.  But let me step back and address the broader question: what should we make of all this?  How should we live with the gargantuan chasm between the prediction of Bayesian rationality for how we should disagree, and the actual facts of how we do disagree?

We could simply declare that human beings are not well-modeled as Bayesians with common priors—that we’ve failed in giving a descriptive account of human behavior—and leave it at that.   OK, but that would still leave the question: does this stuff have normative value?  Should it affect how we behave, if we want to consider ourselves honest and rational?  I would argue, possibly yes.

Yes, you should constantly ask yourself the question: “would I still be defending this opinion, if I had been born as someone else?”  (Though you might say this insight predates Aumann by quite a bit, going back at least to Spinoza.)

Yes, if someone you respect as honest and rational disagrees with you, you should take it as seriously as if the disagreement were between two different aspects of yourself.

Finally, yes, we can try to judge epistemic communities by how closely they approach the Aumannian ideal.  In math and science, in my experience, it’s common to see two people furiously arguing with each other at a blackboard.  Come back five minutes later, and they’re arguing even more furiously, but now their positions have switched.  As we’ve seen, that’s precisely what the math says a rational conversation should look like.  In social and political discussions, though, usually the very best you’ll see is that two people start out diametrically opposed, but eventually one of them says “fine, I’ll grant you this,” and the other says “fine, I’ll grant you that.”  We might say, that’s certainly better than the common alternative, of the two people walking away even more polarized than before!  Yet the math tells us that even the first case—even the two people gradually getting closer in their views—is nothing at all like a rational exchange, which would involve the two participants repeatedly leapfrogging each other, completely changing their opinion about the question under discussion (and then changing back, and back again) every time they learned something new.  The first case, you might say, is more like haggling—more like “I’ll grant you that X is true if you grant me that Y is true”—than like our ideal friendly mathematicians arguing at the blackboard, whose acceptance of new truths is never slow or grudging, never conditional on the other person first agreeing with them about something else.

Armed with this understanding, we could try to rank fields by how hard it is to have an Aumannian conversation in them.  At the bottom—the easiest!—is math (or, let’s say, chess, or debugging a program, or fact-heavy fields like lexicography or geography).  Crucially, here I only mean the parts of these subjects with agreed-on rules and definite answers: once the conversation turns to whose theorems are deeper, or whose fault the bug was, things can get arbitrarily non-Aumannian.  Then there’s the type of science that involves messy correlational studies (I just mean, talking about what’s a risk factor for what, not the political implications).  Then there’s politics and aesthetics, with the most radioactive topics like Israel/Palestine higher up.  And then, at the very peak, there’s gender and social justice debates, where everyone brings their formative experiences along, and absolutely no one is a disinterested truth-seeker, and possibly no Aumannian conversation has ever been had in the history of the world.

I would urge that even at the very top, it’s still incumbent on all of us to try to make the Aumannian move, of “what would I think about this issue if I were someone else and not me?  If I were a man, a woman, black, white, gay, straight, a nerd, a jock?  How much of my thinking about this represents pure Spinozist reason, which could be ported to any rational mind, and how much of it would get lost in translation?”

Anyway, I’m sure some people would argue that, in the end, the whole framework of Bayesian agents, common priors, common knowledge, etc. can be chucked from this discussion like so much scaffolding, and the moral lessons I want to draw boil down to trite advice (“try to see the other person’s point of view”) that you all knew already.  Then again, even if you all knew all this, maybe you didn’t know that you all knew it!  So I hope you gained some new information from this talk in any case.  Thanks.

Update: Coincidentally, there’s a moving NYT piece by Oliver Sacks, which (among other things) recounts his experiences with his cousin, the Aumann of Aumann’s Theorem.

Another Update: If I ever did attempt an Aumannian conversation with someone, the other Scott A. would be a candidate! Here he is in 2011 making several of the same points I did above, using the same examples (I thank him for pointing me to his post).

Introducing some British people to P vs. NP

July 22nd, 2015

Here’s a 5-minute interview that I did with The Naked Scientists (a radio show syndicated by the BBC, and no, I’m not sure why it’s called that), explaining the P vs. NP problem.  For readers of this blog, there won’t be anything new here, but, well … you might enjoy the rest of the hour-long programme [sic], which also includes segments about a few other Clay Millennium problems (the Riemann Hypothesis, Navier-Stokes, and Poincaré), as well as a segment about fat fish that live in caves and gorge themselves on food on the rare occasions when it becomes available, and which turn out to share a gene variant with humans with similar tendencies.

Quantum query complexity: the other shoe drops

June 30th, 2015

Two weeks ago I blogged about a breakthrough in query complexity: namely, the refutation by Ambainis et al. of a whole slew of conjectures that had stood for decades (and that I mostly believed, and that had helped draw me into theoretical computer science as a teenager) about the largest possible gaps between various complexity measures for total Boolean functions. Specifically, Ambainis et al. built on a recent example of Göös, Pitassi, and Watson to construct bizarre Boolean functions f with, among other things, near-quadratic gaps between D(f) and R_0(f) (where D is deterministic query complexity and R_0 is zero-error randomized query complexity), near-1.5th-power gaps between R_0(f) and R(f) (where R is bounded-error randomized query complexity), and near-4th-power gaps between D(f) and Q(f) (where Q is bounded-error quantum query complexity). See my previous post for more about the definitions of these concepts and the significance of the results (and note also that Mukhopadhyay and Sanyal independently obtained weaker results).
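In symbols, reading “near-quadratic” and its cousins as exponents of the form 2−o(1), the claims are that there exist total Boolean functions f (a different f for each separation, if need be) with:

```latex
D(f) \;\ge\; R_0(f)^{\,2-o(1)}, \qquad
R_0(f) \;\ge\; R(f)^{\,1.5-o(1)}, \qquad
D(f) \;\ge\; Q(f)^{\,4-o(1)}.
```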

Because my mental world was in such upheaval, in that earlier post I took pains to point out one thing that Ambainis et al. hadn’t done: namely, they still hadn’t shown any super-quadratic separation between R(f) and Q(f), for any total Boolean function f. (Recall that a total Boolean function, f:{0,1}^n→{0,1}, is one that’s defined for all 2^n possible input strings x∈{0,1}^n. Meanwhile, a partial Boolean function is one where there’s some promise on x: for example, that x encodes a periodic sequence. When phrased in the query complexity model, Shor’s algorithm and the other quantum algorithms achieving exponential speedups work only for partial functions, not for total ones. Indeed, a famous result of Beals et al. from 1998 says that D(f)=O(Q(f)^6) for all total functions f.)
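To make the total-versus-partial distinction concrete, here’s a minimal illustration (the particular promise is my own toy example):

```python
from itertools import product

n = 4
inputs = list(product([0, 1], repeat=n))    # all 2^n = 16 strings

# A *total* function: defined on every input.
def OR(x):
    return int(any(x))

# A promise of the kind behind a *partial* function: the input is periodic
# with period 2.  Inputs violating the promise are not the algorithm's problem.
promise = [x for x in inputs if all(x[i] == x[(i + 2) % n] for i in range(n))]

print(len(inputs), len(promise))            # 16 total inputs vs. 4 promise inputs
print([OR(x) for x in promise])             # -> [0, 1, 1, 1]
```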

So, clinging to a slender reed of sanity, I said it “remains at least a plausible conjecture” that, if you insist on a fair comparison—i.e., bounded-error quantum versus bounded-error randomized—then the biggest speedup quantum algorithms can ever give you over classical ones, for total Boolean functions, is the square-root speedup that Grover’s algorithm easily achieves for the n-bit OR function.

Today, I can proudly report that my PhD student, Shalev Ben-David, has refuted that conjecture as well.  Building on the Göös et al. and Ambainis et al. work, but adding a new twist to it, Shalev has constructed a total Boolean function f such that R(f) grows roughly like Q(f)^2.5 (yes, that’s Q(f) to the 2.5th power). Furthermore, if a conjecture that Ambainis and I made in our recent “Forrelation” paper is correct—namely, that a problem called “k-fold Forrelation” has randomized query complexity roughly Ω(n^(1-1/k))—then one would get nearly a cubic gap between R(f) and Q(f).

The reason I found this question so interesting is that it seemed obvious to me that, to produce a super-quadratic separation between R and Q, one would need a fundamentally new kind of quantum algorithm: one that was unlike Simon’s and Shor’s algorithms in that it worked for total functions, but also unlike Grover’s algorithm in that it didn’t hit some impassable barrier at the square root of the classical running time.

Flummoxing my expectations once again, Shalev produced the super-quadratic separation, but not by designing any new quantum algorithm. Instead, he cleverly engineered a Boolean function for which you can use a combination of Grover’s algorithm and the Forrelation algorithm (or any other quantum algorithm that gives a huge speedup for some partial Boolean function—Forrelation is just the maximal example), to get an overall speedup that’s a little more than quadratic, while still keeping your Boolean function total. I’ll let you read Shalev’s short paper for the details, but briefly, it once again uses the Göös et al. / Ambainis et al. trick of defining a Boolean function that equals 1 if and only if the input string contains some hidden substructure, and the hidden substructure also contains a pointer to a “certificate” that lets you quickly verify that the hidden substructure was indeed there. You can use a super-fast algorithm—let’s say, a quantum algorithm designed for partial functions—to find the hidden substructure assuming it’s there. If you don’t find it, you can simply output 0. But if you do find it (or think you found it), then you can use the certificate, together with Grover’s algorithm, to confirm that you weren’t somehow misled, and that the substructure really was there. This checking step ensures that the function remains total.
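Here’s a deliberately crude classical toy (entirely mine, and vastly simpler than Shalev’s actual construction), just to show the “find, then verify” shape that keeps the function total:

```python
def f(x):
    """x is a list of bits: a data half followed by a pointer half.
    f(x) = 1 iff the pointer names a data position that holds a 1."""
    half = len(x) // 2
    data, pointer_bits = x[:half], x[half:]
    # "Find": trust the pointer, the way the quantum subroutine trusts its
    # promise.  On arbitrary inputs it may point anywhere at all.
    i = int("".join(map(str, pointer_bits)), 2)
    # "Verify": check that the pointed-to structure really is there.  (In
    # the real construction, Grover's algorithm plays this role, which is
    # what makes the answer correct even when the "promise" fails.)
    return 1 if i < half and data[i] == 1 else 0

print(f([0, 1, 0, 0,  0, 0, 0, 1]))   # pointer = 1, data[1] = 1  ->  1
print(f([0, 1, 0, 0,  0, 1, 1, 1]))   # pointer = 7, out of range ->  0
```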

Are there further separations to be found this way? Almost certainly! Indeed, Shalev, Robin Kothari, and I have already found some more things (as well as different/simpler proofs of known separations), though nothing quite as exciting as the above.

Update (July 1): Ronald de Wolf points out in the comments that this “trust-but-verify” trick, for designing total Boolean functions with unexpectedly low quantum query complexities, was also used in a recent paper by himself and Ambainis (while Ashley Montanaro points out that a similar trick was used even earlier, in a different context, by Le Gall).  What’s surprising, you might say, is that it took as long as it did for people to realize how many applications this trick has.

Update (July 2): In conversation with Robin Kothari and Cedric Lin, I realized that Shalev’s super-quadratic separation between R and Q, combined with a recent result of Lin and Lin, resolves another open problem that had bothered me since 2001 or so. Given a Boolean function f, define the “projective quantum query complexity,” or P(f), to be the minimum number of queries made by a bounded-error quantum algorithm, in which the answer register gets immediately measured after each query. This is a model of quantum algorithms that’s powerful enough to capture (for example) Simon’s and Shor’s algorithms, but not Grover’s algorithm. Indeed, one might wonder whether there’s any total Boolean function for which P(f) is asymptotically smaller than R(f)—that’s the question I wondered about around 2001, and that I discussed with Elham Kashefi. Now, by using an argument based on the “Vaidman bomb,” Lin and Lin recently proved the fascinating result that P(f)=O(Q(f)^2) for all functions f, partial or total. But, combining with Shalev’s result that there exists a total f for which R(f)=Ω(Q(f)^2.5), we get that there’s a total f for which R(f)=Ω(P(f)^1.25). In the other direction, the best I know is that P(f)=Ω(bs(f)) and therefore R(f)=O(P(f)^3).
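To spell out the exponent arithmetic in that last step (my own restatement, including the classical block-sensitivity fact I’m relying on for the final bound):

```latex
% Lin-Lin:    P(f) = O(Q(f)^2) for all f,  hence  Q(f) = \Omega(P(f)^{1/2}).
% Ben-David:  there is a total f with  R(f) = \Omega(Q(f)^{2.5}).
% Combining, for that f:
R(f) \;=\; \Omega\bigl(Q(f)^{2.5}\bigr) \;=\; \Omega\bigl(P(f)^{2.5/2}\bigr) \;=\; \Omega\bigl(P(f)^{1.25}\bigr).
% Other direction: P(f) = \Omega(bs(f)), and for total f we classically
% have R(f) \le D(f) = O(bs(f)^3); together these give R(f) = O(P(f)^3).
```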

Celebrate gay marriage—and its 2065 equivalent

June 27th, 2015

Yesterday was a historic day for the United States, and I was as delighted as everyone else I know.  I’ve supported gay marriage since the mid-1990s, when as a teenager, I read Andrew Hodges’ classic biography of Alan Turing, and burned with white-hot rage at Turing’s treatment.  In the world he was born into—our world, until fairly recently—Turing was “free”: free to prove the unsolvability of the halting problem, free to help save civilization from the Nazis, just not free to pursue the sexual and romantic fulfillment that nearly everyone else took for granted.  I resolved then that, if I was against anything in life, I was against the worldview that had hounded Turing to his death, or anything that even vaguely resembled it.

So I’m proud for my country, and I’m thrilled for my gay friends and colleagues and relatives.  At the same time, as I watch my Facebook page light up with an endless sea of rainbow flags and jeers at Antonin Scalia, something gnaws at me.  To stand up for Alan Turing in 1952 would’ve taken genuine courage.  To support gay rights in the 60s, 70s, 80s, even the 90s, took courage.  But celebrating a social change when you know all your friends will upvote you, more than a decade after the tide of history has made the change unstoppable?  It’s fun, it’s righteous, it’s justified, and I’m doing it myself.  But let’s not kid ourselves by calling it courageous.

Do you want to impress me with your moral backbone?  Then go and find a group that almost all of your Facebook friends still consider it okay, even praiseworthy, to despise and mock, for moral failings that either aren’t failings at all or are no worse than the rest of humanity’s.  (I promise: once you start looking, it shouldn’t be hard to find.)  Then take a public stand for that group.