Ask an unbounded question, get an uncomputable answer

December 11th, 2015

Just when I thought I could relax, as the waters slowly receded from the latest D-Tsunami, my inbox and Facebook feed once again lit up with inquiries—this time, asking me to confirm or deny that “A Paradox at the Heart of Mathematics Makes a Physics Problem Unanswerable.”

Uh-oh!

Luckily for my blood pressure, though, this one turned out to refer to something that more-or-less deserves the hype.  In particular, it’s about a phenomenal 146-page paper by Cubitt, Perez-Garcia, and Wolf, which just appeared this week in Nature (in condensed form, of course).  Incidentally, yeah, his name really is Toby Cubitt, pronounced like “qubit.”  He’s a good guy.

To those in quantum computing, Cubitt et al.’s breakthrough is old news, having already been on the arXiv for almost a year (we’ve also had a talk at MIT about it).  The arXiv has created a funny phenomenon, where you learn something new and cool, assimilate it, move on, and then a year later, everyone is suddenly asking you have you seen this thing, is it for real, etc. etc., just because the thing got some rubber stamp like acceptance to Nature that caused the press to pick it up.  Like, dude, I was into the undecidability of the spectral gap way before it went mainstream.

One more amusing anecdote before we dive into the math.  In his Nature News piece popularizing Cubitt et al.’s result, the writer Davide Castelvecchi quotes Rebecca Goldstein, the brilliant novelist and biographer of Kurt Gödel, as saying: “Turing thought more clearly about the relationship between physics and logic than Gödel did.”  Here’s what happened: Nature News wrote to Rebecca to ask what Gödel’s own thoughts were about the relation between undecidability and physics.  Rebecca passed the request along to me.  So I wrote back to her, arguing that they might just as well ask what Turing thought, since the Cubitt et al. result is “really” about Turing-undecidability (with Gödel-undecidability just an automatic corollary), and at any rate:

I also think that Turing thought more clearly about the relationship between logic and physics than Gödel did (indeed, Gödel himself said that it was only Turing’s analysis of the notion of computability, in terms of actual physical machines that one could imagine building, that convinced him that computability had been properly defined).

Rebecca passed that back to Nature News, agreeing with it, and then at some point the quote became hers.  Far from being miffed about this, I consider having my forgettable words attributed to a genius like Rebecca to be one of the great honors of my life.  (By pure coincidence, she and I are having lunch next week; hopefully this will butter her up.)

So, OK, let me restate Cubitt et al.’s great theorem in less pop-sciencey terms than Nature News used.  (You could also just read the paper’s intro, which is exceedingly clear, but what the hell—I’m here to serve.)

Suppose you have a two-dimensional material made of a bunch of stationary particles, each with local Hilbert space dimension d, which are arranged on an L×L square grid (so, there are L² particles in all).  And suppose there’s some fixed d²-dimensional Hamiltonian h, with a local copy h_{i,j}=h acting on each neighboring pair of particles (i,j).  (I.e., the material is translationally invariant, with the same laws of physics acting throughout.)  Let H be the total Hamiltonian: that is, the sum of the h_{i,j}’s over all the neighboring (i,j)’s.

Then a huge fraction of all of physics—quantum field theory, condensed-matter physics, you name it—can be summarized as, you’re trying to figure out the eigenvalues and eigenvectors of H.  The lowest eigenvalue, λ₀, tells you your material’s ground energy, while the higher eigenvalues, λ₁, λ₂, …, tell you the next discrete energy levels that the material can jump up to.  The corresponding eigenvectors tell you which quantum states the material is sitting in when it has these energies: the ground state v₀, and the excited states v₁, v₂, …  Those, in turn, determine basically everything you could want to know about the material: whether it superconducts, etc. etc.

Of course, the eigenvalues and eigenvectors will depend on the lattice size L.  Equally obviously, for any fixed L, you could in principle compute all the eigenvalues and eigenvectors by just diagonalizing some huge-ass matrix.  (That matrix being H.)  But physicists are usually more interested in the limiting behavior as L goes to infinity.  One of their most basic distinctions is: the material is gapped if λ₁−λ₀, the difference between the first excited energy and the ground energy, converges to some positive value or even grows with L as L→∞.  It’s gapless if λ₁−λ₀ converges to 0 as L→∞.  (Actually, Cubitt et al. use more technical definitions of both of these concepts, but we’ll ignore that.)
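
To make the gapped/gapless distinction concrete, here’s a minimal numerical sketch of my own (not anything from the paper): it builds the total Hamiltonian H as a sum of local terms h_{i,j} on a tiny L×L grid of qubits, diagonalizes it, and prints the finite-size gap λ₁−λ₀.  The particular coupling (an antiferromagnetic Heisenberg term) is an arbitrary illustrative choice.

    import numpy as np

    # Build H = sum of h_{i,j} over nearest-neighbor pairs on an L x L grid of qubits (d = 2),
    # with h = X.X + Y.Y + Z.Z on each pair (an arbitrary translationally-invariant example).
    L = 2
    X = np.array([[0, 1], [1, 0]], dtype=complex)
    Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
    Z = np.array([[1, 0], [0, -1]], dtype=complex)
    I2 = np.eye(2, dtype=complex)

    def two_site_op(op_a, op_b, a, b, n):
        """Tensor op_a on site a and op_b on site b, with identities on the other n-2 sites."""
        ops = [I2] * n
        ops[a], ops[b] = op_a, op_b
        out = ops[0]
        for o in ops[1:]:
            out = np.kron(out, o)
        return out

    n_sites = L * L
    site = lambda i, j: i * L + j          # row-major labelling of grid sites
    H = np.zeros((2 ** n_sites, 2 ** n_sites), dtype=complex)
    for i in range(L):
        for j in range(L):
            for (i2, j2) in [(i + 1, j), (i, j + 1)]:      # right and down neighbors
                if i2 < L and j2 < L:
                    for P in (X, Y, Z):
                        H += two_site_op(P, P, site(i, j), site(i2, j2), n_sites)

    evals = np.linalg.eigvalsh(H)
    print("ground energy lambda_0:", evals[0].real)
    print("finite-size gap lambda_1 - lambda_0:", (evals[1] - evals[0]).real)

Of course, this kind of brute-force diagonalization only works at some fixed L; the undecidability is entirely about the limit L→∞.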

Cubitt et al.’s theorem now says the following: for some fixed, constant local dimension d, there is no algorithm that takes as input the local Hamiltonian h (say, as a d²×d² matrix of algebraic numbers), and that decides whether the material is gapped or gapless.  Indeed, you can reduce the halting problem to that problem, in such a way that the material will be gapped if your Turing machine halts, or gapless if it runs forever.

As an immediate corollary, there’s some 2D material—characterized by a translationally-invariant local Hamiltonian h on particles of local dimension d—such that whether the material is gapped or gapless is independent of the axioms of ZF set theory, or whatever else your favorite axioms might be.  (Proof: build a Turing machine M that halts if and only if it finds an inconsistency in set theory, then run Cubitt et al.’s reduction from the halting problem.  By Gödel, if set theory is consistent then it can’t prove whether M halts or not.)
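
For concreteness, here’s a hedged sketch (mine, not theirs) of the machine M in that proof.  The function encodes_zf_proof_of_contradiction is a hypothetical stand-in for an actual proof checker for ZF; everything else is just exhaustive enumeration.

    from itertools import count, product

    def encodes_zf_proof_of_contradiction(candidate: bytes) -> bool:
        """Hypothetical placeholder: return True iff candidate encodes a valid ZF proof of 0=1."""
        raise NotImplementedError("a real proof checker for ZF would go here")

    def M():
        """Halts if and only if ZF is inconsistent (assuming a sound, complete proof checker)."""
        for n in count():
            for candidate in (bytes(t) for t in product(range(256), repeat=n)):
                if encodes_zf_proof_of_contradiction(candidate):
                    return  # halt: an inconsistency in ZF was found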

Cubitt et al. never bother to work out the local dimension d that suffices for them, but it could be worked out, and it’s probably at least in the tens of thousands.  Thus, their result leaves open the possibility that there’s an algorithm to decide gaplessness for 2D lattices of qubits (i.e., the special case d=2), or other “reasonably low-dimensional” quantum systems.  We simply don’t know right now.  Another tantalizing open question is whether there’s an algorithm to decide gaplessness for one-dimensional spin chains—again, even in the special case d=2.  Right now, the best we have in that direction is a difficult recent result of Bravyi and Gosset, which gives an algorithm to decide gaplessness for one-dimensional, frustration-free chains of qubits.  (Here “frustration-free,” an amusing term that does not well describe this subject as a whole, means that you can minimize the energy H by minimizing the energies of each hi,j individually.  Or, if you think of H as a SAT instance, it’s satisfiable.)
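
As a quick illustration of the frustration-free condition just defined (my sketch, nothing to do with Bravyi and Gosset’s actual algorithm): shift each local term so that its smallest eigenvalue is 0; then the total Hamiltonian is frustration-free exactly when its own ground energy is also 0.

    import numpy as np

    def is_frustration_free(local_terms, tol=1e-9):
        """local_terms: Hermitian matrices, already embedded in the full Hilbert space, whose sum is H."""
        shifted = [h - np.linalg.eigvalsh(h)[0] * np.eye(h.shape[0]) for h in local_terms]
        H = sum(shifted)
        return bool(np.linalg.eigvalsh(H)[0] < tol)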

But while the exact value of d where uncomputability kicks in is still up for grabs, it’s extremely important that d is some fixed, universal constant, independent of the Turing machine.  Indeed, as Cubitt et al. point out in their paper, this is the only feature that makes their new result not a trivial corollary of the uncomputability of Wang tiling.  The latter is a famous result from 1966, which says that there’s no algorithm that takes as input a finite set of tiles, and that tells you whether, using unlimited copies of each tile, you could cover the entire plane (or equivalently, arbitrarily large finite regions of the plane).  I.e., this is yet another “natural” math problem that secretly encodes the halting problem.
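
To see what the finite side of Wang tiling looks like, here’s a hedged brute-force sketch (mine, not Berger’s 1966 argument): given a tile set, it searches for a tiling of an n×n square.  Running this for larger and larger n semi-decides tileability of arbitrarily large finite regions, which is exactly the part that no algorithm can turn into a decision procedure for the whole plane.

    def tiles_square(tiles, n):
        """tiles: list of (north, east, south, west) edge colors.  Adjacent edges must match."""
        grid = [[None] * n for _ in range(n)]

        def fits(tile, r, c):
            north, east, south, west = tile
            if r > 0 and grid[r - 1][c][2] != north:   # my north edge vs. the south edge above me
                return False
            if c > 0 and grid[r][c - 1][1] != west:    # my west edge vs. the east edge to my left
                return False
            return True

        def fill(pos):
            if pos == n * n:
                return True
            r, c = divmod(pos, n)
            for tile in tiles:
                if fits(tile, r, c):
                    grid[r][c] = tile
                    if fill(pos + 1):
                        return True
                    grid[r][c] = None
            return False

        return fill(0)

    # A single tile whose edges all share one color trivially tiles any square:
    print(tiles_square([(0, 0, 0, 0)], 4))   # True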

The fact that d is fixed also means that, in order to encode larger and larger Turing machines into the local Hamiltonian h (as you must, if you want to embed the halting problem), you need to use more and more bits of precision (!) in the ~d⁴ real numbers that define h.  This then raises a question: how do you actually extract a description of a Turing machine from the binary expansions of the real numbers that define your Hamiltonian?  To do this, Cubitt et al. use Kitaev’s phase estimation algorithm—which, interestingly, is the only part of their construction that uses quantum mechanics in any way.  One thing that I’d love to understand better is whether the phase estimation is really essential here, or whether the analogous classical question, with the “Hamiltonian” given by a probability distribution over classical constraints, could also be proved to be undecidable for some fixed value of d—thereby showing that Cubitt et al.’s discovery had nothing to do with quantum mechanics.

(It’s possible that the answer to this is obvious; I didn’t think about it deeply.  Note that if the “classical Hamiltonian” is also deterministic, then the problem must be decidable for every fixed d, since there are only finitely many possible h’s, and we could cache all the answers in a lookup table.)
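
Here, for what it’s worth, is a toy classical illustration of the “bits of precision” point above (my own, and far cruder than the actual phase-estimation construction): with d fixed, the only place left to hide a growing Turing-machine description is in the binary expansions of the parameters defining h, so we pack a bit string into the fractional expansion of a single number φ and read it back out.

    from fractions import Fraction

    def encode_bits(bits):
        """Pack a bit string into the fractional binary expansion of a rational number phi in [0, 1)."""
        return sum(Fraction(b, 2 ** (k + 1)) for k, b in enumerate(bits))

    def decode_bits(phi, num_bits):
        """Recover the first num_bits bits of phi's binary expansion."""
        bits = []
        for _ in range(num_bits):
            phi *= 2
            bit = int(phi >= 1)
            bits.append(bit)
            phi -= bit
        return bits

    tm_description = [1, 0, 1, 1, 0, 0, 1]   # stand-in for some encoding of a Turing machine
    phi = encode_bits(tm_description)
    assert decode_bits(phi, len(tm_description)) == tm_description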

Anyway, it’s now my professional duty, as the prickly, curmudgeonly blogger I am, to end the post by shooing you away from two tempting misinterpretations of the Cubitt et al. result.

First, the result does not say—or even suggest—that there’s any real, finite physical system whose behavior is Gödel- or Turing-undecidable.  Thus, it gives no support to speculations like Roger Penrose’s, about “hypercomputing” that would exceed the capabilities of Turing machines.  The reason, again, is that as soon as you fix a lattice size L, everything becomes computable.  The Cubitt et al. result applies only to questions about the limiting behavior, as the number of particles goes to infinity.  But we already knew lots of examples of physical systems for which predicting their behavior in some infinite limit is at least as hard as the halting problem: for instance, the Wang tiles discussed earlier, or Post rewrite systems, or even Turing machines themselves.  Local Hamiltonians are a profound, nontrivial addition to that list—one that will be particularly striking to physicists, many of whom calculate the spectral gaps of at least 50 Hamiltonians between dinner and dessert.  But in some sense, there was no a-priori reason why a problem this general, about physical systems of unbounded size, ought to have been computable.

Second, the result does not say that any particular question physicists want an answer to—for example, the million-dollar Yang-Mills mass gap problem—is Gödel-undecidable.  “All it says” is that the possibility that some real-world question of that kind could be undecidable isn’t totally closed off.  The Nature News piece stresses this latter implication a lot—as, admittedly, do Cubitt et al. themselves.  But to put things in perspective: four logicians proved around 1970 that there’s no algorithm to decide whether an arbitrary polynomial equation has an integer solution, thereby giving a negative solution to Hilbert’s Tenth Problem.  Yet with few exceptions, “working number theorists” barely even noticed this development, nor was (say) Andrew Wiles dissuaded from proving Fermat’s Last Theorem by the absence of a general algorithm to do things like what he was trying to do.  (Indeed, the absence of a general algorithm was shown even earlier for equations like FLT, which have variables in the exponent.)  So I doubt the mathematical physicists who calculate spectral gaps for a living will be any more terrified than the number theorists were, to learn that they’ve been laboring their entire lives on the shores of the halting problem.  “Good for us, then!” they could rightly reply.  “Maybe our jobs won’t be so easy to automate.”

Update (Dec. 20): My colleague Seth Lloyd calls my attention to a PRL paper of his from 1993, which also discusses the construction of physical systems that are gapped if a given Turing machine halts and gapless if it runs forever.  So this basic idea has been around for a while.  As I explained in the post, the main contribution of the Cubitt et al. paper is just to get undecidability into “the sort of system physicists could plausibly care about” (or for which they could’ve plausibly hoped for an analytic solution): in this case, 2D translationally-invariant nearest-neighbor Hamiltonians with bounded local dimension.

Google, D-Wave, and the case of the factor-10^8 speedup for WHAT?

December 9th, 2015

Update (Dec. 16):  If you’re still following this, please check out an important comment by Alex Selby, the discoverer of Selby’s algorithm, which I discussed in the post.  Selby queries a few points in the Google paper: among other things, he disagrees with their explanation of why his classical algorithm works so well on D-Wave’s Chimera graph (and with their prediction that it should stop working for larger graphs), and he explains that Karmarkar-Karp is not the best known classical algorithm for the Number Partitioning problem.  He also questions whether simulated annealing is the benchmark against which everything should be compared (on the grounds that “everything else requires fine-tuning”), pointing out that SA itself typically requires lots of tuning to get it to work well.

Update (Dec. 11): MIT News now has a Q&A with me about the new Google paper. I’m really happy with how the Q&A turned out; people who had trouble understanding this blog post might find the Q&A easier. Thanks very much to Larry Hardesty for arranging it.

Meanwhile, I feel good that there seems to have been actual progress in the D-Wave debate! In previous rounds, I had disagreed vehemently with some of my MIT colleagues (like Ed Farhi and Peter Shor) about the best way to respond to D-Wave’s announcements. Today, though, at our weekly group meeting, there was almost no daylight between any of us. Partly, I’m sure, it’s that I’ve learned to express myself better; partly it’s that the “trigger” this time was a serious research paper by a group separate from D-Wave, rather than some trash-talking statement from Geordie Rose. But mostly it’s that, thanks to the Google group’s careful investigations, this time pretty much anyone who knows anything agrees about all the basic facts, as I laid them out in this blog post and in the Q&A. All that remains are some small differences in emotional attitude: e.g., how much of your time do you want to spend on a speculative, “dirty” approach to quantum computing (which is far ahead of everyone else in terms of engineering and systems integration, but which still shows no signs of an asymptotic speedup over the best classical algorithms, which is pretty unsurprising given theoretical expectations), at a time when the “clean” approaches might finally be closing in on the long-sought asymptotic quantum speedup?

Another Update: Daniel Lidar was nice enough to email me an important observation, and to give me permission to share it here.  Namely, the D-Wave 2X has a minimum annealing time of 20 microseconds.  Because of this, the observed running times for small instance sizes are artificially forced upward, making the growth rate in the machine’s running time look milder than it really is.  (Regular readers might remember that exactly the same issue plagued previous D-Wave vs. classical performance comparisons.)  Correcting this would certainly decrease the D-Wave 2X’s predicted speedup over simulated annealing, in extrapolations to larger numbers of qubits than have been tested so far (although Daniel doesn’t know by how much).  Daniel stresses that he’s not criticizing the Google paper, which explicitly mentions the minimum annealing time—just calling attention to something that deserves emphasis.
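
To see Daniel’s point quantitatively, here’s a hedged toy model (mine, not his analysis): if the true time-to-solution grows like a·exp(b·n), but every reported time is clipped at the 20-microsecond minimum annealing time, then a fit to the reported times over the tested range of n shows a milder growth rate than the true one.  All the parameter values below are made up for illustration.

    import numpy as np

    a, b = 1e-9, 0.03                        # assumed "true" scaling parameters (illustrative only)
    n = np.arange(100, 1001, 100)            # problem sizes
    true_t = a * np.exp(b * n)               # hypothetical true time-to-solution, in seconds
    reported_t = np.maximum(true_t, 20e-6)   # every run takes at least the 20-microsecond annealing floor

    true_rate = np.polyfit(n, np.log(true_t), 1)[0]
    apparent_rate = np.polyfit(n, np.log(reported_t), 1)[0]
    print("true exponential rate:", true_rate)
    print("apparent rate from reported times:", apparent_rate)   # milder, because small-n times are clipped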


In retrospect, I should’ve been suspicious, when more than a year went by with no major D-Wave announcements that everyone wanted me to react to immediately. Could it really be that this debate was over—or not “over,” but where it always should’ve been, in the hands of experts who might disagree vehemently but are always careful to qualify speedup claims—thereby freeing up the erstwhile Chief D-Wave Skeptic for more “””rewarding””” projects, like charting a middle path through the Internet’s endless social justice wars?

Nope.

As many of you will have seen by now, on Monday a team at Google put out a major paper reporting new experiments on the D-Wave 2X machine.  (See also Hartmut Neven’s blog post about this.)  The predictable popularized version of the results—see for example here and here—is that the D-Wave 2X has now demonstrated a factor-of-100-million speedup over standard classical chips, thereby conclusively putting to rest the question of whether the device is “truly a quantum computer.”  In the comment sections of one of my previous posts, D-Wave investor Steve Jurvetson even tried to erect a victory stele, by quoting Karl Popper about falsification.

In situations like this, the first thing I do is turn to Matthias Troyer, who’s arguably the planet’s most balanced, knowledgeable, trustworthy interpreter of quantum annealing experiments. Happily, in collaboration with Ilia Zintchenko and Ethan Brown, Matthias was generous enough to write a clear 3-page document putting the new results into context, and to give me permission to share it on this blog. From a purely scientific standpoint, my post could end right here, with a link to their document.

Then again, from a purely scientific standpoint, the post could’ve ended even earlier, with the link to the Google paper itself!  For this is not a case where the paper hides some crucial issue that the skeptics then need to ferret out.  On the contrary, the paper’s authors include some of the most careful people in the business, and the paper explains the caveats as clearly as one could ask.  In some sense, then, all that’s left for me or Matthias to do is to tell you what you’d learn if you read the paper!

So, OK, has the D-Wave 2X demonstrated a factor-10⁸ speedup or not?  Here’s the shortest answer that I think is non-misleading:

Yes, there’s a factor-10⁸ speedup that looks clearly asymptotic in nature, and there’s also a factor-10⁸ speedup over Quantum Monte Carlo. But the asymptotic speedup is only if you compare against simulated annealing, while the speedup over Quantum Monte Carlo is only constant-factor, not asymptotic. And in any case, both speedups disappear if you compare against other classical algorithms, like that of Alex Selby. Also, the constant-factor speedup probably has less to do with quantum mechanics than with the fact that D-Wave built extremely specialized hardware, which was then compared against a classical chip on the problem of simulating the specialized hardware itself (i.e., on Ising spin minimization instances with the topology of D-Wave’s Chimera graph). Thus, while there’s been genuine, interesting progress, it remains uncertain whether D-Wave’s approach will lead to speedups over the best known classical algorithms, let alone to speedups over the best known classical algorithms that are also asymptotic or also of practical importance. Indeed, all of these points also remain uncertain for quantum annealing as a whole.

To expand a bit, there are really three separate results in the Google paper:

  1. The authors create Chimera instances with tall, thin energy barriers blocking the way to the global minimum, by exploiting the 8-qubit “clusters” that play such a central role in the Chimera graph.  In line with a 2002 theoretical prediction by Farhi, Goldstone, and Gutmann (a prediction we’ve often discussed on this blog), they then find that on these special instances, quantum annealing reaches the global minimum exponentially faster than classical simulated annealing, and that the D-Wave machine realizes this advantage.  As far as I’m concerned, this completely nails down the case for computationally-relevant collective quantum tunneling in the D-Wave machine, at least within the 8-qubit clusters.  On the other hand, the authors point out that there are other classical algorithms, like that of Selby (building on Hamze and de Freitas), which group together the 8-bit clusters into 256-valued mega-variables, and thereby get rid of the energy barrier that kills simulated annealing.  These classical algorithms are found empirically to outperform the D-Wave machine.  The authors also match the D-Wave machine’s asymptotic performance (though not the leading constant) using Quantum Monte Carlo, which (despite its name) is a classical algorithm often used to find quantum-mechanical ground states.
  2. The authors make a case that the ability to tunnel past tall, thin energy barriers—i.e., the central advantage that quantum annealing has been shown to have over classical annealing—might be relevant to at least some real-world optimization problems.  They do this by studying a classic NP-hard problem called Number Partitioning, where you’re given a list of N positive integers, and your goal is to partition the integers into two subsets whose sums differ from each other by as little as possible.  Through numerical studies on classical computers, they find that quantum annealing (in the ideal case) and Quantum Monte Carlo should both outperform simulated annealing, by roughly equal amounts, on random instances of Number Partitioning.  Note that this part of the paper doesn’t involve any experiments on the D-Wave machine itself, so we don’t know whether calibration errors, encoding loss, etc. will kill the theoretical advantage over simulated annealing.  But even if not, this still wouldn’t yield a “true quantum speedup,” since (again) Quantum Monte Carlo is a perfectly-good classical algorithm, whose asymptotics match those of quantum annealing on these instances.  (For concreteness, a toy simulated-annealing baseline for Number Partitioning is sketched just after this list.)
  3. Finally, on the special Chimera instances with the tall, thin energy barriers, the authors find that the D-Wave 2X reaches the global optimum about 10⁸ times faster than Quantum Monte Carlo running on a single-core classical computer.  But, extremely interestingly, they also find that this speedup does not grow with problem size; instead it simply saturates at ~10⁸.  In other words, this is a constant-factor speedup rather than an asymptotic one.  Now, obviously, solving a problem “only” 100 million times faster (rather than asymptotically faster) can still have practical value!  But it’s crucial to remember that this constant-factor speedup is only observed for the Chimera instances—or in essence, for “the problem of simulating the D-Wave machine itself”!  If you wanted to solve something of practical importance, you’d first need to embed it into the Chimera graph, and it remains unclear whether any of the constant-factor speedup would survive that embedding.  In any case, while the paper isn’t explicit about this, I gather that the constant-factor speedup disappears when one compares against (e.g.) the Selby algorithm, rather than against QMC.
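
Here’s the promised toy sketch of classical simulated annealing on Number Partitioning (my illustration; the Google paper’s actual benchmarks are far more careful, and simulated annealing is of course not the best classical algorithm for this problem).  The state is a ±1 assignment s, and the energy is |Σᵢ sᵢaᵢ|, the absolute difference between the two subset sums.

    import math, random

    def simulated_annealing_partition(a, sweeps=20000, T0=None, seed=0):
        """Minimize |sum_i s_i * a_i| over s in {-1, +1}^N with single-spin-flip Metropolis moves."""
        rng = random.Random(seed)
        N = len(a)
        s = [rng.choice((-1, 1)) for _ in range(N)]
        diff = sum(si * ai for si, ai in zip(s, a))        # signed difference between the two subset sums
        best = abs(diff)
        T0 = T0 if T0 is not None else max(a)              # crude initial temperature
        for t in range(sweeps):
            T = T0 * (1 - t / sweeps) + 1e-9               # linear cooling schedule
            i = rng.randrange(N)
            new_diff = diff - 2 * s[i] * a[i]              # effect of flipping spin i
            delta = abs(new_diff) - abs(diff)
            if delta <= 0 or rng.random() < math.exp(-delta / T):
                s[i] = -s[i]
                diff = new_diff
                best = min(best, abs(diff))
        return best

    data_rng = random.Random(1)
    a = [data_rng.randrange(1, 10**6) for _ in range(40)]   # a random Number Partitioning instance
    print("best partition difference found:", simulated_annealing_partition(a))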

So then, what do I say to Steve Jurvetson?  I say—happily, not grudgingly!—that the new Google paper provides the clearest demonstration so far of a D-Wave device’s capabilities.  But then I remind him of all the worries the QC researchers had from the beginning about D-Wave’s whole approach: the absence of error-correction; the restriction to finite-temperature quantum annealing (moreover, using “stoquastic Hamiltonians”), for which we lack clear evidence for a quantum speedup; the rush for more qubits rather than better qubits.  And I say: not only do all these worries remain in force, they’ve been thrown into sharper relief than ever, now that many of the side issues have been dealt with.  The D-Wave 2X is a remarkable piece of engineering.  If it’s still not showing an asymptotic speedup over the best known classical algorithms—as the new Google paper clearly explains that it isn’t—then the reasons are not boring or trivial ones.  Rather, they seem related to fundamental design choices that D-Wave made over a decade ago.

The obvious question now is: can D-Wave improve its design, in order to get a speedup that’s asymptotic, and that holds against all classical algorithms (including QMC and Selby’s algorithm), and that survives the encoding of a “real-world” problem into the Chimera graph?  Well, maybe or maybe not.  The Google paper returns again and again to the subject of planned future improvements to the machine, and how they might clear the path to a “true” quantum speedup. Roughly speaking, if we rule out radical alterations to D-Wave’s approach, there are four main things one would want to try, to see if they helped:

  1. Lower temperatures (and thus, longer qubit lifetimes, and smaller spectral gaps that can be safely gotten across without jumping up to an excited state).
  2. Better calibration of the qubits and couplings (and thus, ability to encode a problem of interest, like the Number Partitioning problem mentioned earlier, to greater precision).
  3. The ability to apply “non-stoquastic” Hamiltonians.  (D-Wave’s existing machines are all limited to stoquastic Hamiltonians, defined as Hamiltonians all of whose off-diagonal entries are real and non-positive.  While stoquastic Hamiltonians are easier from an engineering standpoint, they’re also the easiest kind to simulate classically, using algorithms like QMC—so much so that there’s no consensus on whether it’s even theoretically possible to get a true quantum speedup using stoquastic quantum annealing.  This is a subject of active research.  A quick numerical check of the stoquastic condition is sketched just after this list.)
  4. Better connectivity among the qubits (thereby reducing the huge loss that comes from taking problems of practical interest, and encoding them in the Chimera graph).
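
For reference, here’s a minimal check of the stoquastic condition from point 3 (illustrative only; whether a Hamiltonian is stoquastic depends on the basis you write it in): every off-diagonal entry must be real and non-positive.

    import numpy as np

    def is_stoquastic(H, tol=1e-12):
        """True iff every off-diagonal entry of H is (numerically) real and non-positive."""
        off_diag = H - np.diag(np.diag(H))
        return bool(np.all(np.abs(off_diag.imag) < tol) and np.all(off_diag.real <= tol))

    X = np.array([[0.0, 1.0], [1.0, 0.0]], dtype=complex)
    print(is_stoquastic(-X), is_stoquastic(+X))   # True False: a transverse field -X is stoquastic, +X is not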

(Note that “more qubits” is not on this list: if a “true quantum speedup” is possible at all with D-Wave’s approach, then the 1000+ qubits that they already have seem like more than enough to notice it.)

Anyway, these are all, of course, things D-Wave knows about and will be working on in the near future. As well they should! But to repeat: even if D-Wave makes all four of these improvements, we still have no idea whether they’ll see a true, asymptotic, Selby-resistant, encoding-resistant quantum speedup. We just can’t say for sure that they won’t see one.

In the meantime, while it’s sometimes easy to forget during blog-discussions, the field of experimental quantum computing is a proper superset of D-Wave, and things have gotten tremendously more exciting on many fronts within the last year or two.  In particular, the group of John Martinis at Google (Martinis is one of the coauthors of the Google paper) now has superconducting qubits with orders of magnitude better coherence times than D-Wave’s qubits, and has demonstrated rudimentary quantum error-correction on 9 of them.  They’re now talking about scaling up to ~40 super-high-quality qubits with controllable couplings—not in the remote future, but in, like, the next few years.  If and when they achieve that, I’m extremely optimistic that they’ll be able to show a clear quantum advantage for something (e.g., some BosonSampling-like sampling task), if not necessarily something of practical importance.  IBM Yorktown Heights, which I visited last week, is also working (with IARPA funding) on integrating superconducting qubits with many-microsecond coherence times.  Meanwhile, some of the top ion-trap groups, like Chris Monroe’s at the University of Maryland, are talking similarly big about what they expect to be able to do soon. The “academic approach” to QC—which one could summarize as “understand the qubits, control them, keep them alive, and only then try to scale them up”—is finally bearing some juicy fruit.

(At last week’s IBM conference, there was plenty of D-Wave discussion; how could there not be? But the physicists in attendance—I was almost the only computer scientist there—seemed much more interested in approaches that aim for longer-lasting qubits, fault-tolerance, and a clear asymptotic speedup.)

I still have no idea when and if we’ll have a practical, universal, fault-tolerant QC, capable of factoring 10,000-digit numbers and so on.  But it’s now looking like only a matter of years until Gil Kalai, and the other quantum computing skeptics, will be forced to admit they were wrong—which was always the main application I cared about anyway!

So yeah, it’s a heady time for QC, with many things coming together faster than I’d expected (then again, it was always my personal rule to err on the side of caution, and thereby avoid contributing to runaway spirals of hype).  As we stagger ahead into this new world of computing—bravely, coherently, hopefully non-stoquastically, possibly fault-tolerantly—my goal on this blog will remain what it’s been for a decade: not to prognosticate, not to pick winners, but merely to try to understand and explain what has and hasn’t already been shown.


Update (Dec. 10): Some readers might be interested in an economic analysis of the D-Wave speedup by commenter Carl Shulman.

Another Update: Since apparently some people didn’t understand this post, here are some comments from a Y-Combinator thread about the post that might be helpful:

(1) [T]he conclusion of the Google paper is that we have probable evidence that with enough qubits and a big enough problem it will be faster for a very specific problem compared to a non-optimal classical algorithm (we have ones that are for sure better).

This probably sounds like a somewhat useless result (quantum computer beats B-team classical algorithm), but it is in fact interesting because D-Wave’s computers are designed to perform quantum annealing and they are comparing it to simulated annealing (the somewhat analogous classical algorithm). However they only found evidence of a constant (i.e. one that 4000 qubits wouldn’t help with) speed up (though a large one) compared to a somewhat better algorithm (Quantum Monte Carlo, which is ironically not a quantum algorithm), and they still can’t beat an even better classical algorithm (Selby’s) at all, even in a way that won’t scale.

Scott’s central thesis is that although it is possible there could be a turning point past 2000 qubits where the D-Wave will beat our best classical alternative, none of the data collected so far suggests that. So it’s possible that a 4000 qubit D-Wave machine will exhibit this trend, but there is no evidence of it (yet) from examining a 2000 qubit machine. Scott’s central gripe with D-Wave’s approach is that they don’t have any even pie-in-the-sky theoretical reason to expect this to happen, and scaling up quantum computers without breaking the entire process is much harder than for classical computers so making them even bigger doesn’t seem like a solution.

(2) DWave machines are NOT gate quantum computers; they call their machine quantum annealing machines. It is not known the complexity class of problems that can be solved efficiently by quantum annealing machines, or if that class is equivalent to classical machines.

The result shows that the DWave machine is asymptotically faster than the Simulated Annealing algorithm (yay!), which suggests that it is executing the Quantum Annealing algorithm. However, the paper also explicitly states that this does not mean that the Dwave machine is exhibiting a ‘quantum speedup’. To do this, they would need to show it to outperform the best known classical algorithm, which as the paper acknowledges, it does not.

What the paper does seem to be showing is that the machine in question is actually fundamentally quantum in nature; it’s just not clear yet that that the type of quantum computer it is is an improvement over classical ones.

(3) [I]t isn’t called out in the linked blog since by now Scott probably considers it basic background information, but D-Wave only solves a very particular problem, and it is both not entirely clear that it has a superior solution to that problem than a classical algorithm can obtain and it is not clear that encoding real problems into that problem will not end up costing you all of the gains itself. Really pragmatic applications are still a ways into the future. It’s hard to imagine what they might be when we’re still so early in the process, and still have no good idea what either the practical or theoretical limits are.

(4) The popular perception of quantum computers as “doing things in parallel” is very misleading. A quantum computer lets you perform computation on a superposed state while maintaining that superposition. But that only helps if the structure of the problem lets you somehow “cancel out” the incorrect results leaving you with the single correct one. [There’s hope for the world! –SA]

If I can’t do math, I don’t want to be part of your revolution

December 3rd, 2015

1. Emma Goldman, the fiery early-20th-century anarchist, is credited with giving the world the immortal refrain “if I can’t dance, I don’t want to be part of your revolution” (actually it’s not clear that she ever said it so pithily, but she did express such a thought).  Admittedly, no one would mistake me for either a dancer or an anarchist, but I’ve always felt a kinship with Goldman over her terpsichorean line in the sand.  The other day, it occurred to me that there’s a parallel sentence that sums up my entire political philosophy—on the one hand, my default instinct to side with the downtrodden and with the progressive left, but on the other, my dissent from any even vaguely anti-STEM, anti-rationality, or anti-nerd undercurrents, and my refusal to join any popular uprising that seems liable (for example) to delay the discovery of a P≠NP proof, by inconveniencing the people working on one.

So, here’s my sentence, which you should feel free to reprint on t-shirts and coffee mugs as desired:

If I can’t do math, I don’t want to be part of your revolution.

2. Over at Scientific American‘s website, John Horgan posted an account of a workshop on Integrated Information Theory, which I attended a couple weeks ago at NYU (along with David Chalmers, Giulio Tononi, Christof Koch, Max Tegmark, and a dozen or so others).  I was the “official skeptic” of the workshop, and gave a talk based on my blog post The Unconscious Expander.  I don’t really agree with what Horgan says about physics and information in general, but I do (of course) join him in his skepticism of IIT, and he gives a pretty accurate summary of what people said at the workshop.  (Alas, my joke about my lunch not being poisoned completely bombed with the IIT crowd … as I should’ve predicted!)  The workshop itself was lots of fun; thanks so much to David, Giulio, and Hedda Hassel Morch for organizing it.

3. As you might have noticed, I’ve created a new category on this blog: “Obviously I’m Not Defending Aaronson.”  This category—reserved for posts that caused at least a hundred people to hate me—refers to a peculiar phrase I encountered over and over, in the social media threads denouncing me as a horrible person.  The phrase tends to occur in passages like: “look, obviously I’m not defending Aaronson, but it’s worth pointing out that, if you carefully reread everything he wrote, he never actually said that war orphans should be roasted alive and then eaten for fun.  That’s just something we all know that a clueless, arrogant nerd like him would think.”

4. Right now I’m at the “ThinkQ” conference at IBM in Yorktown Heights.  Here are the PowerPoint slides from my talk yesterday, entitled “The Largest Possible Quantum Speedups.”  Regular readers of this blog will find a lot that’s old and a little that’s new.

Talk, be merry, and be rational

November 23rd, 2015

Yesterday I wrote a statement on behalf of a Scott Alexander SlateStarCodex/rationalist meetup, which happened last night at MIT (in the same room where I teach my graduate class), and which I’d really wanted to attend but couldn’t.  I figured I’d share the statement here:

I had been looking forward to attending tonight’s MIT SlateStarCodex meetup as I hardly ever look forward to anything. Alas, I’m now stuck in Chicago, with my flight cancelled due to snow, and with all flights for the next day booked up. But instead of continuing to be depressed about it, I’ve decided to be happy that this meetup is even happening at all—that there’s a community of people who can read, let’s say, a hypothetical debate moderator questioning Ben Carson about what it’s like to be a severed half-brain, and simply be amused, instead of silently trying to figure out who benefits from the post and which tribe the writer belongs to. (And yes, I know: the answer is the gray tribe.) And you can find this community anywhere—even in Cambridge, Massachusetts! Look, I spend a lot of time online, just getting more and more upset reading social justice debates that are full of people calling each other douchebags without even being able to state anything in the same galactic supercluster as the other side’s case. And then what gives me hope for humanity is to click over to the slatestarcodex tab, and to see all the hundreds of comments (way more than my blog gets) by people who disagree with each other but who all basically get it, who all have minds that don’t make me despair. And to realize that, when Scott Alexander calls an SSC meetup, he can fill a room just about anywhere … well, at least anywhere I would visit. So talk, be merry, and be rational.

I’m now back in town, and told by people who attended the meetup that it was crowded, disorganized, and great.  And now I’m off to Harvard, to attend the other Scott A.’s talk “How To Ruin A Perfectly Good Randomized Controlled Trial.”


Update (Nov. 24) Scott Alexander’s talk at Harvard last night was one of the finest talks I’ve ever attended. He was introduced to rapturous applause as simply “the best blogger on the Internet,” and as finally an important speaker, in a talk series that had previously wasted everyone’s time with the likes of Steven Pinker and Peter Singer. (Scott demurred that his most notable accomplishment in life was giving the talk at Harvard that he was just now giving.) The actual content, as Scott warned from the outset, was “just” a small subset of a basic statistics course, but Scott brought each point alive with numerous recent examples, from psychiatry, pharmacology, and social sciences, where bad statistics or misinterpretations of statistics were accepted by nearly everyone and used to set policy. (E.g., Alcoholics Anonymous groups that claimed an “over 95%” success rate, because the people who relapsed were kicked out partway through and not counted toward the total.) Most impressively, Scott leapt immediately into the meat, ended after 20 minutes, and then spent the next two hours just taking questions. Scott is publicity-shy, but I hope for others’ sake that video of the talk will eventually make its way online.

Then, after the talk, I had the honor of meeting two fellow Boston-area rationalist bloggers, Kate Donovan and Jesse Galef. Yes, I said “fellow”: for almost a decade, I’ve considered myself on the fringes of the “rationalist movement.” I’d hang out a lot with skeptic/effective-altruist/transhumanist/LessWrong/OvercomingBias people (who are increasingly now SlateStarCodex people), read their blogs, listen and respond to their arguments, answer their CS theory questions. But I was always vaguely uncomfortable identifying myself with any group that even seemed to define itself by how rational it was compared to everyone else (even if the rationalists constantly qualified their self-designation with “aspiring”!). Also, my rationalist friends seemed overly interested in questions like how to prevent malevolent AIs from taking over the world, which I tend to think we lack the tools to make much progress on right now (though, like with many other remote possibilities, I’m happy for some people to work on them and see if they find anything interesting).

So, what changed? Well, in the debates about social justice, public shaming, etc. that have swept across the Internet these past few years, it seems to me that my rationalist friends have proven themselves able to weigh opposing arguments, examine their own shortcomings, resist groupthink and hysteria from both sides, and attack ideas rather than people, in a way that the wider society—and most depressingly to me, the “enlightened, liberal” part of society—has often failed. In a real-world test (“real-world,” in this context, meaning social media…), the rationalists have walked the walk and rationaled the rational, and thus they’ve given me no choice but to stand up and be counted as one of them.

Have a great Thanksgiving, those of you in the US!


Another Update: Dana, Lily, and I had the honor of having Scott Alexander over for dinner tonight. I found this genius of human nature, who took so much flak last year for defending me, to be completely uninterested in discussing anything related to social justice or online shaming. Instead, his gaze was fixed on the eternal: he just wanted to grill me all evening about physics and math and epistemology. Having recently read this Nature News article by Ron Cowen, he kept asking me things like: “you say that in quantum gravity, spacetime itself is supposed to dissolve into some sort of network of qubits. Well then, how does each qubit know which other qubits it’s supposed to be connected to? Are there additional qubits to specify the connectivity pattern? If so, then doesn’t that cause an infinite regress?” I handwaved something about AdS/CFT, where a dynamic spacetime is supposed to emerge from an ordinary quantum theory on a fixed background specified in advance. But I added that, in some sense, he had rediscovered the whole problem of quantum gravity that’s confused everyone for almost a century: if quantum mechanics presupposes a causal structure on the qubits or whatever other objects it talks about, then how do you write down a quantum theory of the causal structures themselves?

I’m sure there’s a lesson in here somewhere about what I should spend my time on.

G. Phi. Fo. Fum.

November 4th, 2015

Update (Dec. 14): The long wait is over!  Here’s Laci’s paper on the arXiv.  So far, I’ve read it only deeply enough to note that it contains the following sentence:

A group G ≤ S(Ω) defines the category of G-isomorphisms of strings on the domain Ω; the natural notation for this category, the central object of study in this paper, would seem to be “G-Strings.”

With that, I believe Laci himself has outshone even reddit’s attempt to mine his breakthrough result for juvenile humor.

See also a nice Quanta article about Laci’s algorithm by Erica Klarreich.  There’s only one statement in the article that I disagree with: namely that, if graph isomorphism were inherently quasipolynomial time, then it would be the first natural example of such a problem.  We know other natural problems, like approximating free games and socially-optimal Nash equilibria, that are solvable in n^{O(log n)} time but that can’t be in P unless 3SAT is solvable in ~exp(√n) time.

Update (Nov. 17): Video of Laci’s first talk is now available.

Breaking News (Nov. 12): Jeremy Kun has written up a phenomenal summary of Babai’s first lecture.  I haven’t carefully studied all of it, and in any case, there are many missing details to be filled in later (Babai told Kun that the preprint will be available “soon, soon!”).  But from the summary, four points stood out to me:

  1. Babai actually claims a quasipolynomial-time algorithm for an interestingly more general problem than graph isomorphism, called string isomorphism.  This was already in the abstract, but googling didn’t reveal what string isomorphism was.  So, OK, here’s what it is: you’re given two strings x and y over some finite alphabet, as well as the generators of a group G of permutations of the string indices.  The problem is to determine whether you can transform x to y by applying a permutation in G.  Or even more generally: given a string x, find a full set of generators for the subgroup of G that fixes x.  See Kun’s post for the straightforward reductions from GI to these group-theoretic problems.  (A brute-force sketch of the string isomorphism problem appears just after this list.)
  2. As was hinted in the abstract, in Babai’s analysis of his algorithm, there’s one step that relies on a statement whose only known proof depends on the Classification of Finite Simple Groups.  (Thus, it’s not that the algorithm itself requires iterating through all the sporadic simple groups or anything like that; this only shows up in the correctness proof.)  This is not the first-ever computer-science application of the Classification of Finite Simple Groups (indeed, Babai himself has some previous ones), but it’s certainly the most dramatic.
  3. In previous work on GI, the Johnson graph emerged over and over as a forehead-bangingly hard case that caused numerous algorithms to fail.  In the new work, it looks like Babai’s central technical innovation is to show that, in some sense, the Johnson graph is the only obstruction to taking the divide-and-conquer approaches that people had tried before, and making them run in quasipolynomial time.  I.e., in each step of the recursion, either you can find a Johnson graph on a large fraction of the vertices and handle it specially, or else you can do something that works whenever there’s not a Johnson graph on a large fraction of the vertices.  Babai calls this “split-or-Johnson.”
  4. Babai stressed that in some sense, his new algorithm is the culmination of a program laid out by Eugene Luks in 1982.  Now, the Classification of Finite Simple Groups was also more-or-less completed in the early 1980s.  To my mind, this raises a fascinating socio-mathematical question: which aspects of the new work, if any, could not have been done in the early 80s, possibly by Babai or Luks themselves?  what is it that needed another 30 years?  If the answer turns out to be “nothing,” then to me that’s an astounding illustration of the role of the individual in mathematical progress.  As in: Laci was nice enough to take a third-of-a-century break between his and Luks’ work in the early 80s, and the “natural next step” in their program … and still no one managed to use that break to beat him to the next step!
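
Here’s the promised brute-force sketch of string isomorphism (mine; exponential-time in general, and nothing like Babai’s actual algorithm): enumerate the group generated by the given permutations by breadth-first search, then check whether some element maps x to y.  A permutation is represented as a tuple p with p[i] giving the image of index i.

    from collections import deque

    def compose(p, q):
        """(p o q)[i] = p[q[i]]"""
        return tuple(p[q[i]] for i in range(len(p)))

    def generated_group(gens, n):
        """All permutations obtainable as products of the generators (fine for small groups only)."""
        identity = tuple(range(n))
        seen = {identity}
        queue = deque([identity])
        while queue:
            g = queue.popleft()
            for s in gens:
                h = compose(s, g)
                if h not in seen:
                    seen.add(h)
                    queue.append(h)
        return seen

    def string_isomorphic(x, y, gens):
        """Is there a permutation in the group generated by gens that carries x to y?"""
        n = len(x)
        return any(all(y[p[i]] == x[i] for i in range(n)) for p in generated_group(gens, n))

    shift = tuple((i + 1) % 4 for i in range(4))        # the cyclic shift i -> i+1 mod 4
    print(string_isomorphic("abcd", "dabc", [shift]))   # True: a cyclic shift maps "abcd" to "dabc"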

Earlier today, I was tipped off to what might be the theoretical computer science result of the decade.  My source asked me not to break the news on this blog—but since other theory bloggers (and twitterers) are now covering the story, I guess the graph is out of the Babai.

According to the University of Chicago’s theory seminar calendar, on Tuesday of next week (November 10), the legendary Laszlo Babai will be giving a talk about a new algorithm that solves the graph isomorphism problem in quasipolynomial time.  The previous fastest algorithm to decide whether two n-vertex graphs G and H are isomorphic—by Babai and Luks, back in 1983—ran in exp(√(n log n)) time.  If we credit the announcement, Babai has now gotten that down to exp(polylog(n)), putting one of the central problems of computer science “just barely above P.”  (For years, I’ve answered questions on this blog about the status of graph isomorphism—would I bet that it’s in BQP? in coNP? etc.—by saying that, as far as I and many others are concerned, it might as well just be in P.  Of course I’m happy to reaffirm that conjecture tonight.)
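
For contrast with the quasipolynomial-time claim, here’s the naive brute-force test that tries all n! vertex bijections (a hedged illustration of the problem statement only; it is neither Babai’s algorithm nor anything you’d run beyond tiny graphs).

    from itertools import permutations

    def brute_force_isomorphic(edges_G, edges_H, n):
        """Decide isomorphism of two n-vertex graphs, given as edge lists, by trying every bijection."""
        EG = {frozenset(e) for e in edges_G}
        EH = {frozenset(e) for e in edges_H}
        if len(EG) != len(EH):
            return False
        for pi in permutations(range(n)):               # candidate bijection from V(G) to V(H)
            if all(frozenset((pi[u], pi[v])) in EH for (u, v) in EG):
                return True
        return False

    C4 = [(0, 1), (1, 2), (2, 3), (3, 0)]
    C4_relabelled = [(2, 0), (0, 3), (3, 1), (1, 2)]
    print(brute_force_isomorphic(C4, C4_relabelled, 4))   # True: both are 4-cycles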

Next week, I assume, Laci will lecture to a packed house; then the experts will race to unpack the details.  Until then, we probably need to sit tight; I don’t know any more than what’s in the abstract.  For now, I’m delighted if commenters want to share general thoughts or questions about graph isomorphism (and I’ll try to answer what I can), but I won’t allow uninformed speculations or rumors about the details of the new result—not until Laci has had a chance to speak.


Update (Nov. 5): While we all wait with bated breath for more details, you can amuse yourself with the talk I gave at Laci’s 60th birthday conference five years ago.

Also, a comment of mine that I should probably promote to the main post:

Dana points out to me that non-native English speakers might not get the staggeringly clever pun in this post’s title (hey, it was the best I could do on a deadline).

So, alright, fee fi fo fum is what the approaching giant bellows in Jack and the Beanstalk. It means something big is on the horizon. Also, G is a graph, and Phi is an isomorphism.


Update (Nov. 12): So, Laci gave his talk. Video was made but does not appear to be available yet. However, Gabriel Gaster, who was in attendance, graciously live-tweeted everything. Here’s a Storify of the live-tweets. (What’s a “Storify”?)

A breakthrough on QMA(2)?

October 30th, 2015

Last night, Martin Schwarz posted a preprint to the arXiv that claims to show the new complexity class containment QMA(2) ⊆ EXP.  (See also his brief blog post about this result.)  Here QMA(2) means Quantum Merlin-Arthur with two Merlins—i.e., the set of languages for which a “yes” answer can be witnessed by two unentangled quantum states, |ψ〉⊗|φ〉, on polynomially many qubits each, which are checked by a polynomial-time quantum algorithm—while EXP means deterministic exponential time.  Previously, the best upper bound we had was the trivial QMA(2) ⊆ NEXP (Nondeterministic Exponential Time), which comes from guessing exponential-size classical descriptions of the two quantum proofs.

Whether QMA(2) is contained in EXP is a problem that had fascinated me for a decade.  With Salman Beigi, Andy Drucker, Bill Fefferman, and Peter Shor, we discussed this problem in our 2008 paper The Power of Unentanglement.  That paper (with an additional ingredient supplied by Harrow and Montanaro) shows how to prove that a 3SAT instance of size n is satisfiable, using two unentangled quantum proofs with only Õ(√n) qubits each.  This implies that searching over all n-qubit unentangled proofs must take at least exp(n²) time, unless 3SAT is solvable in 2^{o(n)} time (i.e., unless the Exponential Time Hypothesis is false).  However, since EXP is defined as the set of problems solvable in 2^{p(n)} time, for any polynomial p, this is no barrier to QMA(2) ⊆ EXP being true—it merely constrains the possible techniques that could prove such a containment.

In trying to prove QMA(2) ⊆ EXP, the fundamental difficulty is that you need to optimize over unentangled quantum states only.  That might sound easier than optimizing over all states (including the entangled ones), but ironically, it’s harder!  The reason why it’s harder is that optimizing over all quantum states (say, to find the one that’s accepted by some measurement with the maximum probability) is a convex optimization problem: in fact, it boils down to finding the principal eigenvector of a Hermitian matrix.  By contrast, optimizing over only the separable states is a non-convex optimization problem, which is NP-hard to solve exactly (treating the dimension of the Hilbert space as the input size n)—meaning that the question shifts to what sorts of approximations are possible.

Last week, I had the pleasure of speaking with Martin in person, when I visited Vienna, Austria to give a public lecture at the wonderful new research institute IST.  Martin was then ironing out some final wrinkles in his proof, and I got to watch him in action—in particular, to see the care and detachment with which he examined the possibility that his proof might imply too much (e.g., that NP-complete problems are solvable in quasipolynomial time).  Fortunately, his proof turned out not to imply anything of the kind.

The reason why it didn’t is directly related to the most striking feature of Martin’s proof—namely, that it’s non-relativizing, leaving completely open the question of whether QMA(2)^A ⊆ EXP^A relative to all oracles A.  To explain how this is possible requires saying a bit about how the proof works.  The obvious way to prove QMA(2) ⊆ EXP—what I had assumed from the beginning was the only realistic way—would be to give a quasipolynomial-time approximation algorithm for the so-called Best Separable State or BSS problem.  The BSS problem, as defined in this paper by Russell Impagliazzo, Dana Moshkovitz, and myself (see also this one by Barak et al.), is as follows: you’re given as input an n²×n² Hermitian matrix A, with all its eigenvalues in [0,1].  Your goal is to find length-n unit vectors, u and w, that maximize

(uᵀ⊗wᵀ)A(u⊗w),

to within an additive error of ±ε, for some constant ε.

Of course, if we just asked for a length-n² unit vector v that maximized vᵀAv, we’d be asking for the principal eigenvector of A, which is easy to find in polynomial time.  By contrast, from the ABDFS and Harrow-Montanaro results, it follows that the BSS problem, for constant ε, cannot be solved in poly(n) time, unless 3SAT is solvable in 2^{o(n)} time.  But this still leaves the possibility that BSS is solvable in n^{log(n)} time—and that possibility would immediately imply QMA(2) ⊆ EXP.  So, as I and others saw it, the real challenge here was to find a quasipolynomial-time approximation algorithm for BSS—something that remained elusive, although Brandao-Christandl-Yard made partial progress towards it.
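
To make the contrast concrete, here’s a hedged local-search sketch for BSS (my illustration, and emphatically not the sought-after quasipolynomial-time approximation algorithm): alternately fix w and optimize u, then fix u and optimize w.  With one factor fixed, the objective is a quadratic form in the other factor, so each half-step is just a principal-eigenvector computation; the catch, of course, is that this only finds a local optimum of the non-convex problem.

    import numpy as np

    def bss_local_search(A, n, iters=50, seed=0):
        """A: real symmetric n^2 x n^2 matrix, viewed as acting on R^n tensor R^n.
        Returns a locally optimal value of (u tensor w)^T A (u tensor w) over unit vectors u, w."""
        rng = np.random.default_rng(seed)
        A4 = A.reshape(n, n, n, n)                     # A4[i, j, k, l] = A[(i, j), (k, l)]
        u = rng.standard_normal(n); u /= np.linalg.norm(u)
        w = rng.standard_normal(n); w /= np.linalg.norm(w)
        for _ in range(iters):
            M_u = np.einsum('j,ijkl,l->ik', w, A4, w)  # effective matrix on the u-factor, with w fixed
            u = np.linalg.eigh((M_u + M_u.T) / 2)[1][:, -1]
            M_w = np.einsum('i,ijkl,k->jl', u, A4, u)  # effective matrix on the w-factor, with u fixed
            w = np.linalg.eigh((M_w + M_w.T) / 2)[1][:, -1]
        return float(np.einsum('i,j,ijkl,k,l->', u, w, A4, u, w)), u, w

    n = 3
    B = np.random.default_rng(1).standard_normal((n * n, n * n))
    A = (B + B.T) / 2                                  # a random symmetric test matrix (not from QMA(2))
    print(bss_local_search(A, n)[0])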

But now Martin comes along, and proves QMA(2) ⊆ EXP in a way that sidesteps the BSS problem.  The way he does it is by using the fact that, if a problem is in QMA(2), then we don’t merely know a Hermitian operator A corresponding to the measurement of |ψ〉⊗|φ〉: rather, we know an actual polynomial-size sequence of quantum gates that get multiplied together to produce A.  Using that fact, Chailloux and Sattath showed that a natural variant of the QMA-complete Local Hamiltonians problem, which they call Separable Sparse Hamiltonians, is complete for QMA(2).  Thus, it suffices for Martin to show how to solve the Separable Sparse Hamiltonians problem in singly-exponential time.  This he does by using perturbation theory gadgets to reduce Separable Sparse Hamiltonians to Separable Local Hamiltonians with an exponentially-small promise gap, and then using a result of Shi and Wu to solve the latter problem in singly-exponential time.  All in all, given the significance of the advance, Martin’s paper is remarkably short; a surprising amount of it boils down to deeply understanding some not-especially-well-known results that were already in the literature.

One obvious problem left open is whether the full BSS problem—rather than just the special case of it corresponding to QMA(2)—is solvable in quasipolynomial time after all.  A second obvious problem is whether the containment QMA(2) ⊆ EXP can be improved to QMA(2) ⊆ PSPACE, or even (say) QMA(2) ⊆ PP.  (By comparison, note that QMA ⊆ PP, by a result of Kitaev and Watrous.)


Update (Nov. 10): I thought I should let people know that a serious concern has been raised by an expert about the correctness of the proof—and in particular, about the use of perturbation theory gadgets. Martin tells me that he’s working on a fix, and I very much hope he’ll succeed, but not much to do for now except let the scientific process trundle along (which doesn’t happen at blog-speed).

Ordinary Words Will Do

October 18th, 2015

Izabella Laba, a noted mathematician at the University of British Columbia, recently posted some tweets that used me as a bad, cautionary example for how “STEM faculty should be less contemptuous of social sciences.”  Here was the offending comment of mine, from the epic Walter Lewin thread last fall:

[W]hy not dispense with the empirically-empty notion of “privilege,” and just talk directly about the actual well-being of actual people, or groups of people?  If men are doing horrific things to women—for example, lashing them for driving cars, like in Saudi Arabia—then surely we can just say so in plain language.  Stipulating that the torturers are “exercising their male privilege” with every lash adds nothing to anyone’s understanding of the evil.  It’s bad writing.  More broadly, it seems to me that the entire apparatus of “privilege,” “delegitimation,” etc. etc. can simply be tossed overboard, to rust on the ocean floor alongside dialectical materialism and other theoretical superstructures that were once pompously insisted upon as preconditions of enlightened social discourse.  This isn’t quantum field theory.  Ordinary words will do.

Prof. Laba derisively commented:

Might as well ask you to explain calculus without using fancy words like “derivative” or “continuous.”  Simple number arithmetic will do.

Prof. Laba’s tweets were favorited by Jordan Ellenberg, a mathematician who wrote the excellent popular book How Not to Be Wrong.  (Ellenberg had also criticized me last year for my strange, naïve idea that human relations can be thought about using logic.)

Given my respect for the critics, I guess I’m honor-bound to respond.

For the record, I tend not to think about the social sciences—or for that matter, the natural sciences—as monolithic entities at all.  I admire any honest attempt to discover the truth about anything.  And not being a postmodern relativist, I believe there are deep truths worth discovering in history, psychology, economics, linguistics, possibly even sociology.  Reading the books of Steven Pinker underscored for me how much is actually understood nowadays about human nature—much of it only figured out within the last half-century.  Likewise, reading the voluminous profundities of Scott Alexander taught me that even in psychiatry, there are truths (and even a few definite cures) to be had for those who seek.

I also believe that the social sciences are harder—way harder—than math or physics or CS.  They’re harder because of the tenuousness of the correlations, because of the complexity of each individual human brain (let alone 7 billion of them on the same planet), but most of all, because politics and ideology and the scientist’s own biases place such powerful thumbs on the scale.  This makes it all the more impressive when a social scientist, like (say) Stanley Milgram or Judith Rich Harris or Napoleon Chagnon, teaches the world something important and new.

I will confess to contempt for anything that I regard as pompous obscurantism—for self-referential systems of jargon whose main purposes are to bar outsiders, to mask a lack of actual understanding, and to confer power on certain favored groups.  And I regard the need to be alert to such systems, to nip them in the bud before they grow into Lysenkoism, as in some sense the problem of intellectual life.  Which brings me to the most fundamental asymmetry between the hard and soft sciences.  Namely, the very fact that it’s so much harder to nurture new truths to maturity in the social sciences than it is in math or physics means that in the former, the jargon-weeds have an easier time filling the void—and we know they’ve done it again and again, even in the post-Enlightenment West.

Time for a thought experiment.  Suppose you showed up at a university anytime between, let’s say, 1910 and 1970, and went from department to department asking (in so many words): what are you excited about this century?  Where are your new continents, what’s the future of your field?  Who should I read to learn about that future?

In physics, the consensus answer would’ve been something like: Planck, Einstein, Bohr, Schrödinger, Dirac.

In psychology, it would’ve been: Freud and Jung (with another faction for B. F. Skinner).

In politics and social sciences, over an enormous swath of academia (including in the West), it would’ve been: Marx, Engels, Trotsky, Lenin.

With hindsight, we now know that the physics advice would’ve been absolute perfection, the psychology and politics advice an unmitigated disaster.  Yes, physicists today know more than Einstein, can even correct him on some points, but the continents he revealed to us actually existed—indeed, have only become more important since Einstein’s time.

But Marx and Freud?  You would’ve done better to leave the campus, and ask a random person on the street what she or he thought about economics and psychology.  In high school, I remember cringing through a unit on the 1920s, when we learned about how “two European professors upset a war-weary civilization’s established certainties—with Einstein overturning received wisdom about space and time, and Freud doing just the same for the world of the mind.”  It was never thought important to add that Einstein’s theories turned out to be true while Freud’s turned out to be false.  Still, at least Freud’s ideas led “only” to decades of bad psychology and hundreds of innocent people sent to jail because of testimony procured through hypnosis, rather than to tens of millions of dead, as with the other social-scientific theory that reigned supreme among 20th-century academics.

Marx and Freud built impressive intellectual edifices—sufficiently impressive for a large fraction of intellectuals to have accepted those men as gurus on par with Darwin and Einstein for almost a century.  Yet on nearly every topic they wrote about, we now know that Marx and Freud couldn’t have been any more catastrophically wrong.  Moreover, their wrongness was knowable at the time—and was known to many, though the ones who knew were typically the ones who the intellectual leaders sneered at, as deluded reactionaries.

Which raises a question: suppose that, in the 1920s, I’d taken the social experts’ advice to study Marx and Freud, didn’t understand much of what they said (and found nonsensical much of what I did understand), and eventually rejected them as pretentious charlatans.  Then why wouldn’t I have been just like Prof. Laba’s ignorant rube, who dismisses calculus because he doesn’t understand technical terms like “continuous” and “derivative”?

On reflection, I don’t think that the two cases are comparable at all.

The hard sciences need technical vocabularies for a simple reason: because they’re about things that normal people don’t spend their hours obsessively worrying about.  Yes, I’d have a hard time understanding organic chemists or differential geometers, but largely for the same reasons I’d have a hard time understanding football fans or pirates.  It’s not just that I don’t understand the arguments; it’s that the arguments are about a world that’s alien to me (and that, to be honest, I don’t care about as much as I do my world).

Suppose, by contrast, that you’re writing about the topics everyone spends their time obsessively worrying about: politics, society, the human mind, the relations between the races and sexes.  In other words, suppose you’re writing about precisely the topics for which the ordinary English language has been honed over centuries—for which Shakespeare and Twain and Dr. King and so many others deployed the language to such spectacular effect.  In that case, what excuse could you possibly have to write in academese, to pepper your prose with undefined in-group neologisms?

Well, let’s be charitable; maybe you have a reason.  For example, maybe you’re doing a complicated meta-analysis of psychology papers, so you need to talk about r-values and kurtosis and heteroskedasticity.  Or maybe you’re putting people in an fMRI machine while you ask them questions, so you need to talk about the temporal resolution in the anterior cingulate cortex.  Or maybe you’re analyzing sibling rivalries using game theory, so you need Nash equilibria.  Or you’re picking apart sentences using Chomskyan formal grammar.  In all these cases, armchair language doesn’t suffice because you’re not just sitting in your armchair: you’re using a new tool to examine the everyday from a different perspective.  For present purposes, you might as well be doing algebraic geometry.

The Freudians and Marxists would, of course, claim that they’re doing the exact same thing.  Yes, they’d say, you thought you had the words to discuss your own mind or the power structure of society, but really you didn’t, because you lacked the revolutionary theoretical framework that we now possess.  (Trotsky’s writings are suffused with this brand of arrogance in nearly every sentence: for example, when he ridicules the bourgeois liberals who whine about “human rights violations” in the early USSR, yet who are too dense to phrase their objections within the framework of dialectical materialism.)

I submit that, even without the hindsight of 2015, there would’ve been excellent reasons to be skeptical of these claims.  Has it ever happened, you might ask yourself, that someone sat in their study and mused about the same human questions that occupied Plato and Shakespeare and Hume, in the same human way they did, and then came up with a new, scientific conclusion that was as rigorous and secure as relativity or evolution?

Let me know if I missed something, but I can’t think of a single example.  Sure, it seems to me, there have been geniuses of human nature, who enlarged our vision without any recourse to the quantitative methods of science.  But even those geniuses “only” contributed melodies for other geniuses to answer in counterpoint, rather than stones for everyone who came later to build upon.  Also, the geniuses usually wrote well.

Am I claiming that progress is impossible in the social realm?  Not at all.  The emancipation of slaves, the end of dueling and blasphemy laws and the divine right of kings, women’s suffrage and participation in the workforce, gay marriage—all these strike me as crystal-clear examples of moral progress, as advances that will still be considered progress a thousand years from now, if there’s anyone around then to discuss such things.  Evolutionary psychology, heuristics and biases, reciprocal altruism, and countless other developments likewise strike me as intellectual progress within the sciences of human nature.  But none of these advances needed recondite language!  Ordinary words sufficed for Thomas Paine and Frederick Douglass and John Stuart Mill, as they sufficed for Robert Axelrod and for Kahneman and Tversky.  So forgive me for thinking that whatever is true and important in the social world today, should likewise be defensible to every smart person in ordinary words, and that this represents a genuine difference between the social sciences and physics.

Which brings us to the central point that Prof. Laba disputed in that comment of mine.  I believe there are countless moral heroes in our time, as well as social scientists who struggle heroically to get the right answers.  But as far as I can tell, the people who build complex intellectual edifices around words like “privilege” and “delegitimation” and “entitlement” and “marginalized” are very much the same sort of people who, a few generations ago, built similar edifices around “bourgeoisie” and “dialectical” and “false consciousness.”  In both cases, there’s an impressive body of theory that’s held up as the equivalent in its domain of relativity, quantum mechanics, and Darwinism, with any skeptics denounced as science-deniers.  In both cases, enlightened liberals are tempted to side with the theorists, since the theorists believe in so many of the same causes that the enlightened liberals believe in, and hate so many of the same people who the enlightened liberals hate.  But in both cases, the theorists’ language seems to alternate between incomprehensible word-salad and fervid, often profanity-laced denunciations, skipping entirely over calm clarity.  And in both cases, the only thing that the impressive theoretical edifice ever seems to get used for, is to prove over and over that certain favored groups should get more power while disfavored ones should get less.

So I’m led to the view that, if you want to rouse people’s anger about injustice or their pity about preventable suffering, or end arbitrary discrimination codified into law, or give individuals more freedom to pursue their own happiness, or come up with a new insight about human nature, or simply describe the human realities that you see around you—for all these purposes, the words that sufficed for every previous generation’s great humanists will also suffice for you.

On the other hand, to restrict freedom and invent new forms of discrimination—and to do it in the name of equality and justice—that takes theory.  You’ll need a sophisticated framework, for example, to prove that even if two adults both insist they’re consenting to a relationship, really they might not be, because of power structures in the wider society that your superior insight lets you see.  You’ll need advanced discourse to assure you that, even though your gut reaction might be horror at (say) someone who misspoke once and then had their life gleefully destroyed on social media, your gut is not to be trusted, because it’s poisoned by the same imperialist, patriarchal biases as everything else—and because what looks like a cruel lynching needs to be understood in a broader social context (did the victim belong to a dominant group, or to a marginalized one?).  Finally, you’ll need oodles of theory (bring out the Marcuse) to explain why the neoliberal fanaticism about “free speech” and “tolerance” and “due process” and “the presumption of innocence” is too abstract and simplistic—for those concepts, too, fail to distinguish between a marginalized group that deserves society’s protection and a dominant group that doesn’t.

So I concede to Prof. Laba that the complicated discourse of privilege, hegemony, etc. serves a definite purpose for the people who wield it, just as much as the complicated discourse of quantum field theory serves a purpose for physicists.  It’s just that the purposes of the privilege-warriors aren’t my purposes.  For my purposes—which include fighting injustice, advancing every social and natural science as quickly as possible, and helping all well-meaning women and men see each other’s common humanity—I said last year and I say again that ordinary words will do.


Update (Oct. 26): Izabella Laba has written a response to this post, for which I’m extremely grateful. Her reply reveals that she and I have a great deal of common ground, and also a few clear areas of disagreement (e.g., what’s wrong with Steven Pinker?). But my most important objection is simply that, the first time I loaded her blog, the text went directly over the rock image in the background, making it impossible to read without highlighting it.

Six announcements

September 21st, 2015
  1. I did a podcast interview with Julia Galef for her series “Rationally Speaking.”  See also here for the transcript (which I read rather than having to listen to myself stutter).  The interview is all about Aumann’s Theorem, and whether rational people can agree to disagree.  It covers a lot of the same ground as my recent post on the same topic, except with less technical detail about agreement theory and more … well, agreement.  At Julia’s suggestion, we’re planning to do a follow-up podcast about the particular intractability of online disagreements.  I feel confident that we’ll solve that problem once and for all.  (Update: Also check out this YouTube video, where Julia offers additional thoughts about what we discussed.)
  2. When Julia asked me to recommend a book at the end of the interview, I picked probably my favorite contemporary novel: The Mind-Body Problem by Rebecca Newberger Goldstein.  Embarrassingly, I hadn’t realized that Rebecca had already been on Julia’s show twice as a guest!  Anyway, one of the thrills of my life over the last year has been to get to know Rebecca a little, as well as her husband, who’s some guy named Steve Pinker.  Like, they both live right here in Boston!  You can talk to them!  I was especially pleased two weeks ago to learn that Rebecca won the National Humanities Medal—as I told Julia, Rebecca Goldstein getting a medal at the White House is the sort of thing I imagine happening in my ideal fantasy world, making it a pleasant surprise that it happened in this one.  Huge congratulations to Rebecca!
  3. The NSA has released probably its most explicit public statement so far about its plans to move to quantum-resistant cryptography.  For more see Bruce Schneier’s Crypto-Gram.  Hat tip for this item goes to reader Ole Aamot, one of the only people I’ve ever encountered whose name alphabetically precedes mine.
  4. Last Tuesday, I got to hear Ayaan Hirsi Ali speak at MIT about her new book, Heretic, and then spend almost an hour talking to students who had come to argue with her.  I found her clear, articulate, and courageous (as I guess one has to be in her line of work, even with armed cops on either side of the lecture hall).  After Brandeis’s shameful decision to cave to pressure and cancel Hirsi Ali’s commencement speech, I thought it spoke well of MIT that they let her speak at all.  The bar shouldn’t be that low, but it is.
  5. From far away on the political spectrum, I also heard Noam Chomsky talk last week (my first time hearing him live), about the current state of linguistics.  Much of the talk, it struck me, could have been given in the 1950s with essentially zero change (and I suspect Chomsky would agree), though a few parts of it were newer, such as the speculation that human languages have many of the features they do in order to minimize the amount of computation that the speaker needs to perform.  The talk was full of declarations that there had been no useful work whatsoever on various questions (e.g., about the evolutionary function of language), that they were total mysteries and would perhaps remain total mysteries forever.
  6. Many of you have surely heard by now that Terry Tao solved the Erdős Discrepancy Problem, by showing that for every infinite sequence of heads and tails and every positive integer C, there’s a positive integer k such that, if you look at the subsequence formed by every kth flip, there comes a point where the heads outnumber tails or vice versa by at least C (a small code sketch below makes the definition concrete).  This resolves a problem that’s been open for more than 80 years.  For more details, see this post by Timothy Gowers.  Notably, Tao’s proof builds, in part, on a recent Polymath collaborative online effort.  It was a big deal last year when Konev and Lisitsa used a SAT-solver to prove that there’s always a subsequence with discrepancy at least 3; Tao’s result now improves on that bound by ∞.
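
To make the discrepancy notion concrete, here’s a tiny Python sketch I wrote for illustration (it has nothing to do with Tao’s proof or the Polymath effort): it computes the discrepancy of a finite ±1 sequence by checking every step size k and every prefix of the subsequence x_k, x_{2k}, x_{3k}, ….

    def discrepancy(seq):
        """Largest value of |x_k + x_{2k} + ... + x_{nk}| over all step sizes k
        and lengths n, for a finite list of +1/-1 entries (1-indexed in spirit)."""
        best = 0
        for k in range(1, len(seq) + 1):
            partial = 0
            for i in range(k, len(seq) + 1, k):   # positions k, 2k, 3k, ...
                partial += seq[i - 1]
                best = max(best, abs(partial))
        return best

    # The alternating sequence -1, +1, -1, ... looks balanced along k=1,
    # but its k=2 subsequence is all +1's, so the imbalance grows without bound.
    print(discrepancy([(-1) ** i for i in range(1, 12)]))   # prints 5

Tao’s theorem says that no ±1 sequence, however cleverly constructed, keeps this quantity bounded as its length goes to infinity.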

Bell inequality violation finally done right

September 15th, 2015

A few weeks ago, Hensen et al., of the Delft University of Technology and Barcelona, Spain, put out a paper reporting the first experiment that violates the Bell inequality in a way that closes off the two main loopholes simultaneously: the locality and detection loopholes.  Well, at least with ~96% confidence.  This is big news, not only because of the result itself, but because of the advances in experimental technique needed to achieve it.  Last Friday, two renowned experimentalists—Chris Monroe of U. of Maryland and Jungsang Kim of Duke—visited MIT, and in addition to talking about their own exciting ion-trap work, they did a huge amount to help me understand the new Bell test experiment.  So OK, let me try to explain this.

While some people like to make it more complicated, the Bell inequality is the following statement. Alice and Bob are cooperating with each other to win a certain game (the “CHSH game“) with the highest possible probability. They can agree on a strategy and share information and particles in advance, but then they can’t communicate once the game starts. Alice gets a uniform random bit x, and Bob gets a uniform random bit y (independent of x).  Their goal is to output bits, a and b respectively, such that a XOR b = x AND y: in other words, such that a and b are different if and only if x and y are both 1.  The Bell inequality says that, in any universe that satisfies the property of local realism, no matter which strategy they use, Alice and Bob can win the game at most 75% of the time (for example, by always outputting a=b=0).
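
If you’d like to verify the 75% bound rather than take it on faith, here’s a brute-force Python sketch (mine, purely for illustration): since shared randomness is just a probabilistic mixture of deterministic strategies, it suffices to enumerate all 16 deterministic local strategies and check that none of them wins more than 3 of the 4 input pairs.

    from itertools import product

    # A deterministic local strategy is a pair of lookup tables a(x), b(y),
    # each mapping {0,1} -> {0,1}; there are 4 x 4 = 16 such strategies.
    best = 0
    for a0, a1, b0, b1 in product([0, 1], repeat=4):
        a, b = (a0, a1), (b0, b1)
        wins = sum((a[x] ^ b[y]) == (x & y) for x, y in product([0, 1], repeat=2))
        best = max(best, wins)

    print(best / 4)   # 0.75: no local strategy wins more than 3 of the 4 cases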

What does local realism mean?  It means that, after she receives her input x, any experiment Alice can perform in her lab has a definite result that might depend on x, on the state of her lab, and on whatever information she pre-shared with Bob, but at any rate, not on Bob’s input y.  If you like: a=a(x,w) is a function of x and of the information w available before the game started, but is not a function of y.  Likewise, b=b(y,w) is a function of y and w, but not of x.  Perhaps the best way to explain local realism is that it’s the thing you believe in, if you believe all the physicists babbling about “quantum entanglement” just missed something completely obvious.  Clearly, at the moment two “entangled” particles are created, but before they separate, one of them flips a tiny coin and then says to the other, “listen, if anyone asks, I’ll be spinning up and you’ll be spinning down.”  Then the naïve, doofus physicists measure one particle, find it spinning down, and wonder how the other particle instantly “knows” to be spinning up—oooh, spooky! mysterious!  Anyway, if that’s how you think it has to work, then you believe in local realism, and you must predict that Alice and Bob can win the CHSH game with probability at most 3/4.

What Bell observed in 1964 is that, even though quantum mechanics doesn’t let Alice send a signal to Bob (or vice versa) faster than the speed of light, it still makes a prediction about the CHSH game that conflicts with local realism.  (And thus, quantum mechanics exhibits what one might not have realized beforehand was even a logical possibility: it doesn’t allow communication faster than light, but simulating the predictions of quantum mechanics in a classical universe would require faster-than-light communication.)  In particular, if Alice and Bob share entangled qubits, say $$\frac{\left| 00 \right\rangle + \left| 11 \right\rangle}{\sqrt{2}},$$ then there’s a simple protocol that lets them violate the Bell inequality, winning the CHSH game ~85% of the time (with probability (1+1/√2)/2 > 3/4).  Starting in the 1970s, people did experiments that vindicated the prediction of quantum mechanics, and falsified local realism—or so the story goes.
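
And here’s the quantum half of the ledger: a short numpy sketch (again mine, using the textbook measurement angles 0 and π/4 for Alice and ±π/8 for Bob, which I’m assuming rather than taking from any particular experiment) that applies the Born rule to the shared state (|00⟩+|11⟩)/√2 and recovers the (1+1/√2)/2 ≈ 0.854 winning probability.

    import numpy as np

    def basis(theta):
        """Measurement basis in the real plane, rotated by angle theta."""
        return [np.array([np.cos(theta),  np.sin(theta)]),
                np.array([-np.sin(theta), np.cos(theta)])]

    psi = np.array([1, 0, 0, 1]) / np.sqrt(2)       # (|00> + |11>)/sqrt(2)
    alice = {0: 0.0, 1: np.pi / 4}                  # Alice's angle for input x
    bob   = {0: np.pi / 8, 1: -np.pi / 8}           # Bob's angle for input y

    total = 0.0
    for x in (0, 1):
        for y in (0, 1):
            for a, ea in enumerate(basis(alice[x])):
                for b, eb in enumerate(basis(bob[y])):
                    p = abs(np.kron(ea, eb) @ psi) ** 2    # Born rule
                    if (a ^ b) == (x & y):                 # CHSH win condition
                        total += p
    print(total / 4)   # ~0.8536, i.e. (1 + 1/sqrt(2)) / 2

The 0.8536 is exactly cos²(π/8), which Tsirelson’s bound says is the best that quantum mechanics allows.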

The violation of the Bell inequality has a schizophrenic status in physics.  To many of the physicists I know, Nature’s violating the Bell inequality is so trivial and obvious that it’s barely even worth doing the experiment: if people had just understood and believed Bohr and Heisenberg back in 1925, there would’ve been no need for this whole tiresome discussion.  To others, however, the Bell inequality violation remains so unacceptable that some way must be found around it—from casting doubt on the experiments that have been done, to overthrowing basic presuppositions of science (e.g., our own “freedom” to generate random bits x and y to send to Alice and Bob respectively).

For several decades, there was a relatively conservative way out for local realist diehards, and that was to point to “loopholes”: imperfections in the existing experiments which meant that local realism was still theoretically compatible with the results, at least if one was willing to assume a sufficiently strange conspiracy.

Fine, you interject, but surely no one literally believed these little experimental imperfections would be the thing that would rescue local realism?  Not so fast.  Right here, on this blog, I’ve had people point to the loopholes as a reason to accept local realism and reject the reality of quantum entanglement.  See, for example, the numerous comments by Teresa Mendes in my Whether Or Not God Plays Dice, I Do post.  Arguing with Mendes back in 2012, I predicted that the two main loopholes would both be closed in a single experiment—and not merely eventually, but in, like, a decade.  I was wrong: achieving this milestone took only a few years.

Before going further, let’s understand what the two main loopholes are (or rather, were).

The locality loophole arises because the measuring process takes time and Alice and Bob are not infinitely far apart.  Thus, suppose that, the instant Alice starts measuring her particle, a secret signal starts flying toward Bob’s particle at the speed of light, revealing her choice of measurement setting (i.e., the value of x).  Likewise, the instant Bob starts measuring his particle, his doing so sends a secret signal flying toward Alice’s particle, revealing the value of y.  By the time the measurements are finished, a few microseconds later, there’s been plenty of time for the two particles to coordinate their responses to the measurements, despite being “classical under the hood.”

Meanwhile, the detection loophole arises because in practice, measurements of entangled particles—especially of photons—don’t always succeed in finding the particles, let alone ascertaining their properties.  So one needs to select those runs of the experiment where Alice and Bob both find the particles, and discard all the “bad” runs where they don’t.  This by itself wouldn’t be a problem, if not for the fact that the very same measurement that reveals whether the particles are there, is also the one that “counts” (i.e., where Alice and Bob feed x and y and get out a and b)!

To someone with a conspiratorial mind, this opens up the possibility that the measurement’s success or failure is somehow correlated with its result, in a way that could violate the Bell inequality despite there being no real entanglement.  To illustrate, suppose that at the instant they’re created, one entangled particle says to the other: “listen, if Alice measures me in the x=0 basis, I’ll give the a=1 result.  If Bob measures you in the y=1 basis, you give the b=1 result.  In any other case, we’ll just evade detection and count this run as a loss.”  In such a case, Alice and Bob will win the game with certainty, whenever it gets played at all—but that’s only because of the particles’ freedom to choose which rounds will count.  Indeed, by randomly varying their “acceptable” x and y values from one round to the next, the particles can even make it look like x and y have no effect on the probability of a round’s succeeding.
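
If you want to see the conspiracy in action, here’s a toy Python simulation (my own illustration of the loophole, not anything from the experimental literature): each local-realist particle pair secretly picks which input pair (x,y) it’s willing to be detected for, wins every round that survives the cut, and yet the detection rate looks exactly the same for all four input pairs.

    import random

    def play_round():
        x, y = random.randint(0, 1), random.randint(0, 1)     # the referee's inputs
        xs, ys = random.randint(0, 1), random.randint(0, 1)   # the pair's secret pick
        a, b = 0, xs & ys                  # outputs that win whenever (x,y) == (xs,ys)
        detected = (x, y) == (xs, ys)      # otherwise, quietly evade detection
        return x, y, a, b, detected

    rounds = [play_round() for _ in range(200000)]
    kept = [(x, y, a, b) for x, y, a, b, d in rounds if d]
    print(sum((a ^ b) == (x & y) for x, y, a, b in kept) / len(kept))   # 1.0 (!)

    # Detection rate, conditioned on each input pair: about 0.25 across the board,
    # so the post-selection looks completely innocent.
    for xy in [(0, 0), (0, 1), (1, 0), (1, 1)]:
        hits = [d for x, y, a, b, d in rounds if (x, y) == xy]
        print(xy, round(sum(hits) / len(hits), 3))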

Until a month ago, the state-of-the-art was that there were experiments that closed the locality loophole, and other experiments that closed the detection loophole, but there was no single experiment that closed both of them.

To close the locality loophole, “all you need” is a fast enough measurement on photons that are far enough apart.  That way, even if the vast Einsteinian conspiracy is trying to send signals between Alice’s and Bob’s particles at the speed of light, to coordinate the answers classically, the whole experiment will be done before the signals can possibly have reached their destinations.  Admittedly, as Nicolas Gisin once pointed out to me, there’s a philosophical difficulty in defining what we mean by the experiment being “done.”  To some purists, a Bell experiment might only be “done” once the results (i.e., the values of a and b) are registered in human experimenters’ brains!  And given the slowness of human reaction times, this might imply that a real Bell experiment ought to be carried out with astronauts on faraway space stations, or with Alice on the moon and Bob on earth (which, OK, would be cool).  If we’re being reasonable, however, we can grant that the experiment is “done” once a and b are safely recorded in classical, macroscopic computer memories—in which case, given the speed of modern computer memories, separating Alice and Bob by half a kilometer can be enough.  And indeed, experiments starting in 1998 (see for example here) have done exactly that; the current record, unless I’m mistaken, is 18 kilometers.  (Update: I was mistaken; it’s 144 kilometers.)  Alas, since these experiments used hard-to-measure photons, they were still open to the detection loophole.
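
The arithmetic behind “half a kilometer can be enough” is a one-liner, so here it is as a tiny sketch (my own numbers: I’m taking ~1.3 km as a rough lab separation, chosen only to match the ~4-microsecond window mentioned later in this post): the measurement just has to start and finish, on both sides, before light could cross the gap.

    c = 299_792_458.0                       # speed of light, in meters per second
    for meters in (500, 1300):              # ~0.5 km, and roughly a mile apart
        print(meters, "m ->", round(meters / c * 1e6, 2), "microseconds")
    # 500 m -> 1.67 microseconds; 1300 m -> 4.34 microseconds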

To close the detection loophole, the simplest approach is to use entangled qubits that (unlike photons) are slow and heavy and can be measured with success probability approaching 1.  That’s exactly what various groups did starting in 2001 (see for example here), with trapped ions, superconducting qubits, and other systems.  Alas, given current technology, these sorts of qubits are virtually impossible to move miles apart from each other without decohering them.  So the experiments used qubits that were close together, leaving the locality loophole wide open.

So the problem boils down to: how do you create long-lasting, reliably-measurable entanglement between particles that are very far apart (e.g., in separate labs)?  There are three basic ideas in Hensen et al.’s solution to this problem.

The first idea is to use a hybrid system.  Ultimately, Hensen et al. create entanglement between electron spins in nitrogen vacancy centers in diamond (one of the hottest—or coolest?—experimental quantum information platforms today), in two labs that are about a mile away from each other.  To get these faraway electron spins to talk to each other, they make them communicate via photons.  If you stimulate an electron, it’ll sometimes emit a photon with which it’s entangled.  Very occasionally, the two electrons you care about will even emit photons at the same time.  In those cases, by routing those photons into optical fibers and then measuring the photons, it’s possible to entangle the electrons.

Wait, what?  How does measuring the photons entangle the electrons from whence they came?  This brings us to the second idea, entanglement swapping.  The latter is a famous procedure to create entanglement between two particles A and B that have never interacted, by “merely” entangling A with another particle A’, entangling B with another particle B’, and then performing an entangled measurement on A’ and B’ and conditioning on its result.  To illustrate, consider the state

$$ \frac{\left| 00 \right\rangle + \left| 11 \right\rangle}{\sqrt{2}} \otimes \frac{\left| 00 \right\rangle + \left| 11 \right\rangle}{\sqrt{2}} $$

and now imagine that we project the first and third qubits onto the state $$\frac{\left| 00 \right\rangle + \left| 11 \right\rangle}{\sqrt{2}}.$$

If the measurement succeeds, you can check that we’ll be left with the state $$\frac{\left| 00 \right\rangle + \left| 11 \right\rangle}{\sqrt{2}}$$ in the second and fourth qubits, even though those qubits were not entangled before.
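
That claim is easy to verify numerically.  Here’s a short numpy sketch (just a check of the four-qubit algebra above, not a model of the actual NV-center experiment) that builds the two Bell pairs, contracts qubits 1 and 3 against (|00⟩+|11⟩)/√2, and confirms that qubits 2 and 4 come out in exactly that Bell state.

    import numpy as np

    bell = np.array([1, 0, 0, 1]) / np.sqrt(2)           # (|00> + |11>)/sqrt(2)

    # Qubit order (1,2,3,4): qubits (1,2) form one Bell pair, qubits (3,4) the other.
    state = np.kron(bell, bell).reshape(2, 2, 2, 2)      # amplitudes indexed [q1,q2,q3,q4]

    # Project qubits 1 and 3 onto the Bell state: contract <bell| against those indices.
    phi = bell.reshape(2, 2)                             # phi[q1, q3]
    post = np.einsum('ac,abcd->bd', phi.conj(), state)   # unnormalized state of (q2, q4)

    print(np.linalg.norm(post) ** 2)                     # 0.25: this outcome occurs 1/4 of the time
    post = post / np.linalg.norm(post)
    print(np.round(post.reshape(4), 3))                  # [0.707 0. 0. 0.707]: a fresh Bell pair

The 1/4 is the probability of this particular measurement outcome; the other three Bell-measurement outcomes also leave qubits 2 and 4 entangled, just in different Bell states.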

So to recap: these two electron spins, in labs a mile away from each other, both have some probability of producing a photon.  The photons, if produced, are routed to a third site, where if they’re both there, then an entangled measurement on both of them (and a conditioning on the results of that measurement) has some nonzero probability of causing the original electron spins to become entangled.

But there’s a problem: if you’ve been paying attention, all we’ve done is cause the electron spins to become entangled with some tiny, nonzero probability (something like 6.4×10^-9 in the actual experiment).  So then, why is this any improvement over the previous experiments, which just directly measured faraway entangled photons, and also had some small but nonzero probability of detecting them?

This leads to the third idea.  The new setup is an improvement because, whenever the photon measurement succeeds, we know that the electron spins are there and that they’re entangled, without having to measure the electron spins to tell us that.  In other words, we’ve decoupled the measurement that tells us whether we succeeded in creating an entangled pair, from the measurement that uses the entangled pair to violate the Bell inequality.  And because of that decoupling, we can now just condition on the runs of the experiment where the entangled pair was there, without worrying that that will open up the detection loophole, biasing the results via some bizarre correlated conspiracy.  It’s as if the whole experiment were simply switched off, except for those rare lucky occasions when an entangled spin pair gets created (with its creation heralded by the photons).  On those rare occasions, Alice and Bob swing into action, measuring their respective spins within the brief window of time—about 4 microseconds—allowed by the locality loophole, seeking an additional morsel of evidence that entanglement is real.  (Well, actually, Alice and Bob swing into action regardless; they only find out later whether this was one of the runs that “counted.”)

So, those are the main ideas (as well as I understand them); then there’s lots of engineering.  In their setup, Hensen et al. were able to create just a few heralded entangled pairs per hour.  This allowed them to produce 245 CHSH games for Alice and Bob to play, and to reject the hypothesis of local realism at ~96% confidence.  Jungsang Kim explained to me that existing technologies could have produced many more events per hour, and hence, in a similar amount of time, “particle physics” (5σ or more) rather than “psychology” (2σ) levels of confidence that local realism is false.  But in this type of experiment, everything is a tradeoff.  Building not one but two labs for manipulating NV centers in diamond is extremely onerous, and Hensen et al. did what they had to do to get a significant result.
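
The paper’s actual hypothesis test is more sophisticated than anything I’ll attempt here, but a crude binomial calculation (my own back-of-the-envelope, not Hensen et al.’s analysis) already shows why a couple hundred rounds buys you only “psychology-level” confidence: it asks how many wins out of 245 rounds a local-realist world, capped at a 75% win rate, would produce less than 5% of the time.

    from math import comb

    n, p = 245, 0.75        # rounds played; the best win rate local realism allows

    def tail(k):
        """P[at least k wins out of n] if each round were an independent 75% coin."""
        return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

    k = next(k for k in range(n + 1) if tail(k) < 0.05)
    print(k, round(tail(k), 3))   # roughly 196 wins (a ~0.80 win rate) clears the 5% bar

Run the same arithmetic with thousands of rounds instead of hundreds and the threshold moves deep into 5σ territory, which is the tradeoff Kim was describing.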

The basic idea here, of using photons to entangle longer-lasting qubits, is useful for more than pulverizing local realism.  In particular, the idea is a major part of current proposals for how to build a scalable ion-trap quantum computer.  Because of cross-talk, you can’t feasibly put more than 10 or so ions in the same trap while keeping all of them coherent and controllable.  So the current ideas for scaling up involve having lots of separate traps—but in that case, one will sometimes need to perform a Controlled-NOT, or some other 2-qubit gate, between a qubit in one trap and a qubit in another.  This can be achieved using the Gottesman-Chuang technique of gate teleportation, provided you have reliable entanglement between the traps.  But how do you create such entanglement?  Aha: the current idea is to entangle the ions by using photons as intermediaries, very similar in spirit to what Hensen et al. do.

At a more fundamental level, will this experiment finally convince everyone that local realism is dead, and that quantum mechanics might indeed be the operating system of reality?  Alas, I predict that those who confidently predicted that a loophole-free Bell test could never be done, will simply find some new way to wiggle out, without admitting the slightest problem for their previous view.  This prediction, you might say, is based on a different kind of realism.

Ask Me Anything: Diversity Edition

September 5th, 2015

With the fall semester imminent, and by popular request, I figured I’d do another Ask Me Anything (see here for the previous editions).  This one has a special focus: I’m looking for questions from readers who consider themselves members of groups that have historically been underrepresented in the Shtetl-Optimized comments section.  Besides the “obvious”—e.g., women and underrepresented ethnic groups—other examples might include children, traditionally religious people, jocks, liberal-arts majors… (but any group that includes John Sidles is probably not an example).  If I left out your group, please go ahead and bring it to my and your fellow readers’ attention!

My overriding ideal in life—what is to me as Communism was to Lenin, as Frosted Flakes are to Tony the Tiger—is people of every background coming together to discover and debate universal truths that transcend their backgrounds.  So few things have ever stung me more than accusations of being a closed-minded ivory-tower elitist white male nerd etc. etc.  Anyway, to anyone who’s ever felt excluded here for whatever reason, I hope this AMA will be taken as a small token of goodwill.

Similar rules apply as to my previous AMAs:

  • Only one question per person.
  • No multi-part questions, or questions that require me to read a document or watch a video and then comment on it.
  • Questions need not have anything to do with your underrepresented group (though they could). Math, science, futurology, academic career advice, etc. are all fine.  But please be courteous; anything gratuitously nosy or hostile will be left in the moderation queue.
  • I’ll stop taking further questions most likely after 24 hours (I’ll post a warning before closing the thread).

Update (Sep. 6): For anyone from the Boston area, or planning to visit it, I have an important piece of advice.  Do not ever, under any circumstances, attempt to visit Walden Pond, and tell everyone you know to stay away.  After we spent 40 minutes driving there with a toddler, the warden literally screamed at us to go away, that the park was at capacity. It wasn’t an issue of parking: even if we’d parked elsewhere, we just couldn’t go.  Exceptions were made for the people in front of us, but not for us, the ones with the 2-year-old who’d been promised her weekend outing would be to meet her best friend at Walden Pond.  It’s strangely fitting that what for Thoreau was a place of quiet contemplation, is today purely a site of overcrowding and frustration.

Another Update: OK, no new questions please, only comments on existing questions! I’ll deal with the backlog later today. Thanks to everyone who contributed.