Archive for the ‘Mirrored on CSAIL Blog’ Category

The array size of the universe

Friday, May 23rd, 2008

I’ve been increasingly tempted to make this blog into a forum solely for responding to the posts at Overcoming Bias. (Possible new name: “Wallowing in Bias.”)

Two days ago, Robin Hanson pointed to a fascinating paper by Bousso, Harnik, Kribs, and Perez, on predicting the cosmological constant from an “entropic” version of the anthropic principle. Say what you like about whether anthropicology is science or not, for me there’s something delightfully non-intimidating about any physics paper with “anthropic” in the abstract. Sure, you know it’s going to have metric tensors, etc. (after all, it’s a physics paper) — but you also know that in the end, it’s going to turn on some core set of assumptions about the number of sentient observers, the prior probability of the universe being one way rather than another, etc., which will be comprehensible (if not necessarily plausible) to anyone familiar with Bayes’ Theorem and how to think formally.

So in this post, I’m going to try to extract an “anthropic core” of Bousso et al.’s argument — one that doesn’t depend on detailed calculations of entropy production (or anything else) — trusting my expert readers to correct me where I’m mistaken. In defense of this task, I can hardly do better than to quote the authors themselves. In explaining why they make what will seem to many like a decidedly dubious assumption — namely, that the “number of observations” in a given universe should be proportional to the increase in non-gravitational entropy, which is dominated (or so the authors calculate) by starlight hitting dust — they write:

We could have … continued to estimate the number of observers by more explicit anthropic criteria. This would not have changed our final result significantly. But why make a strong assumption if a more conservative one suffices? [p. 14]

In this post I’ll freely make strong assumptions, since my goal is to understand and explain the argument rather than to defend it.

The basic question the authors want to answer is this: why does our causally-connected patch of the universe have the size it does? Or more accurately: taking everything else we know about physics and cosmology as given, why shouldn’t we be surprised that it has the size it does?

From the standpoint of post-1998 cosmology, this is more-or-less equivalent to asking why the cosmological constant Λ ~ 10-122 should have the value it has. For the radius of our causal patch scales like

1/√Λ ~ 1061 Planck lengths ~ 1010 light-years,

while (if you believe the holographic principle) its maximum information content scales like 1/Λ ~ 10122 qubits. To put it differently, there might be stars and galaxies and computers that are more than ~1010 light-years away from us, and they might require more than ~10122 qubits to describe. But if so, they’re receding from us so quickly that we’ll never be able to observe or interact with them.

Of course, to ask why Λ has the value it does is really to ask two questions:

1. Why isn’t Λ smaller than it is, or even zero? (In this post, I’ll ignore the possibility of its being negative.)
2. Why isn’t Λ bigger than it is?

Presumably, any story that answers both questions simultaneously will have to bring in some actual facts about the universe. Let’s face it: 10-122 is just not the sort of answer you expect to get from armchair philosophizing (not that it wouldn’t be great if you did). It’s a number.

As a first remark, it’s easy to understand why Λ isn’t much bigger than it is. If it were really big, then matter in the early universe would’ve flown apart so quickly that stars and galaxies wouldn’t have formed, and hence we wouldn’t be here to blog about it. But this upper bound is far from tight. Bousso et al. write that, based on current estimates, Λ could be about 2000 times bigger than it is without preventing galaxy formation.

As for why Λ isn’t smaller, there’s a “naturalness” argument due originally (I think) to Weinberg, before the astronomers even discovered that Λ>0. One can think of Λ as the energy of empty space; as such, it’s a sum of positive and negative contributions from all possible “scalar fields” (or whatever else) that contribute to that energy. That all of these admittedly-unknown contributions would happen to cancel out exactly, yielding Λ=0, seems fantastically “unnatural” if you choose to think of the contributions as more-or-less random. (Attempts to calculate the likely values of Λ, with no “anthropic correction,” notoriously give values that are off by 120 orders of magnitude!) From this perspective, the smaller you want Λ to be, the higher the price you have to pay in the unlikelihood of your hypothesis.

Based on the above reasoning, Weinberg predicted that Λ would have close to the largest possible value it could have, consistent with the formation of galaxies. As mentioned before, this gives a prediction that’s too big by a factor of 2000 — a vast improvement over the other approaches, which gave predictions that were off by factors of 10120 or infinity!

Still, can’t we do better? One obvious approach to pushing Λ down would be to extend the relatively-uncontroversial argument explaining why Λ can’t be enormous. After all, the tinier we make Λ, the bigger the universe (or at least our causal patch of it) will be. And hence, one might argue, the more observers there will be, hence the more likely we’ll be to exist in the first place! This form of anthropicizing — that we’re twice as likely to exist in a universe with twice as many observers — is what philosopher Nick Bostrom calls the Self-Indication Assumption.

However, two problems with this idea are evident. First, why should it be our causal patch of the universe that matters, rather than the universe as a whole? For anthropic purposes, who cares if the various civilizations that arise in some universe are in causal contact with each other or not, provided they exist? Bousso et al.’s response is basically just to stress that, from what we know about quantum gravity (in particular, black-hole complementarity), it probably doesn’t even make sense to assign a Hilbert space to the entire universe, as opposed to some causal patch of it. Their “Causal-Patch Self-Indication Assumption” still strikes me as profoundly questionable — but let’s be good sports, assume it, and see what the consequences are.

If we do this, we immediately encounter a second problem with the anthropic argument for a low value of Λ: namely, it seems to work too well! On its face, the Self-Indication Assumption wants the number of observers in our causal patch to be infinite, hence the patch itself to be infinite in size, hence Λ=0, in direct conflict with observation.

But wait: what exactly is our prior over the possible values of Λ? Well, it appears Landscapeologists typically just assume a uniform prior over Λ within some range. (Can someone enlighten me on the reasons for this, if there are any? E.g., is it just that the middle part of a Gaussian is roughly uniform?) In that case, the probability that Λ is between ε and 2ε will be of order ε — and such an event, we might guess, would lead to a universe of “size” 1/ε, with order 1/ε observers. In other words, it seems like the tiny prior probability of a small cosmological constant should precisely cancel out the huge number of observers that such a constant leads to — Λ(1/Λ)=1 — leaving us with no prediction whatsoever about the value of Λ. (When I tried to think about this issue years ago, that’s about as far as I got.)

So to summarize: Bousso et al. need to explain to us on the one hand why Λ isn’t 2000 times bigger than it is, and on the other hand why it’s not arbitrarily smaller or 0. Alright, so are you ready for the argument?

The key, which maybe isn’t so surprising in retrospect, turns out to be other stuff that’s known about physics and astronomy (independent of Λ), together with the assumption that that other stuff stays the same (i.e., that all we’re varying is Λ). Sure, say Bousso et al.: in principle a universe with positive cosmological constant Λ could contain up to ~1/Λ bits of information, which corresponds — or so a computer scientist might estimate! — to ~1/Λ observers, like maybe ~1/√Λ observers in each of ~1/√Λ time periods. (The 1/√Λ comes from the Schwarzschild bound on the amount of matter and energy within a given radius, which is linear in the radius and therefore scales like 1/√Λ.)

But in reality, that 1/Λ upper bound on the number of observers won’t be anywhere close to saturated. In reality, what will happen is that after a billion or so years stars will begin to form, radiating light and quickly increasing the universe’s entropy, and then after a couple tens of billions more years, those stars will fizzle out and the universe will return to darkness. And this means that, even though you pay a Λ price in prior probability for a universe with 1/Λ information content, as Λ goes to zero what you get for your money is not ~1/√Λ observers in each of ~1/√Λ time periods (hence ~1/Λ observers in total), but rather just ~1/√Λ observers over a length of time independent of Λ (hence ~1/√Λ observers in total). In other words, you get diminishing returns for postulating a bigger and bigger causal patch, once your causal patch exceeds a few tens of billions of light-years in radius.

So that’s one direction. In the other direction, why shouldn’t we expect Λ to be 2000 times bigger than it is (i.e. the radius of our causal patch to be ~45 times smaller)? Well, Λ could be that big, say the authors, but in that case the galaxies would fly apart from each other before starlight really started to heat things up. So once again you lose out: during the very period when the stars are shining the brightest, entropy production is at its peak, civilizations are presumably arising and killing each other off, etc., the number of galaxies per causal patch is minuscule, and that more than cancels out the larger prior probability that comes with a larger value of Λ.

Putting it all together, then, what you get is a posterior distribution for Λ that’s peaked right around 10-122 or so, corresponding to a causal patch a couple tens of light-years across. This, of course, is exactly what’s observed. You also get the prediction that we should be living in the era when Λ is “just taking over” from gravity, which again is borne out by observation. According to another paper, which I haven’t yet read, several other predictions of cosmological parameters come out right as well.

On the other hand, it seems to me that there are still few enough data points that physicists’ ability to cook up some anthropic explanation to fit them all isn’t sufficiently surprising to compel belief. (In learning theory terms, the measurable cosmological parameters still seem shattered by the concept class of possible anthropic stories.) For those of us who, unlike Eliezer Yudkowsky, still hew to the plodding, non-Bayesian, laughably human norms of traditional science, it seems like what’s needed is a successful prediction of a not-yet-observed cosmological parameter.

Until then, I’m happy to adopt a bullet-dodging attitude toward this and all other proposed anthropic explanations. I assent to none, but wish to understand them all — the more so if they have a novel conceptual twist that I personally failed to think of.

Floating in Platonic heaven

Thursday, May 15th, 2008

In the comments section of my last post, Jack in Danville writes:

I may have misunderstood [an offhand comment about the “irrelevance” of the Continuum Hypothesis] … Intuitively I’ve thought the Continuum Hypothesis describes an aspect of the real world.

I know we’ve touched on similar topics before, but something tells me many of you are hungerin’ for a metamathematical foodfight, and Jack’s perplexity seemed as good a pretext as any for starting a new thread.

So, Jack: this is a Deep Question, but let me try to summarize my view in a few paragraphs.

It’s easy to imagine a “physical process” whose outcome could depend on whether Goldbach’s Conjecture is true or false. (For example, a computer program that tests even numbers successively and halts if it finds one that’s not a sum of two primes.) Likewise for P versus NP, the Riemann Hypothesis, and even considerably more abstract questions.

But can you imagine a “physical process” whose outcome could depend on whether there’s a set larger than the set of integers but smaller than the set of real numbers? If so, what would it look like?

I submit that the key distinction is between

  1. questions that are ultimately about Turing machines and finite sets of integers (even if they’re not phrased that way), and
  2. questions that aren’t.

We need to assume that we have a “direct intuition” about integers and finite processes, which precedes formal reasoning — since without such an intuition, we couldn’t even do formal reasoning in the first place. By contrast, for me the great lesson of Gödel and Cohen’s independence results is that we don’t have a similar intuition about transfinite sets, even if we sometimes fool ourselves into thinking we do. Sure, we might say we’re talking about arbitrary subsets of real numbers, but on closer inspection, it turns out we’re just talking about consequences of the ZFC axioms, and those axioms will happily admit models with intermediate cardinalities and other models without them, the same way the axioms of group theory admit both abelian and non-abelian groups. (Incidentally, Gödel’s models of ZFC+CH and Cohen’s models of ZFC+not(CH) both involve only countably many elements, which makes the notion that they’re telling us about some external reality even harder to understand.)

Of course, everything I’ve said is consistent with the possibility that there’s a “truth” about CH floating in Platonic heaven, or even that a plausible axiom system other than ZFC could prove or disprove CH (which was Gödel’s hope). But the “truth” of CH is not going to have consequences for human beings or the physical universe independent of its provability, in the same way that the truth of P=NP could conceivably have consequences for us even if we weren’t able to prove or disprove it.

For mathematicians, this distinction between “CH-like questions” and “Goldbach/Riemann/Pvs.NP-like questions” is a cringingly obvious one, probably even too obvious to point out. But I’ve seen so many people argue about Platonism versus formalism as if this distinction didn’t exist — as if one can’t be a Platonist about integers but a formalist about transfinite sets — that I think it’s worth hammering home.

To summarize, Kronecker had it backwards. Man and Woman deal with the integers; all else is the province of God.


Monday, February 25th, 2008

A reader named Hernan asked me for my opinion of a well-known rant by Jonathan Katz of Washington University, about why young people shouldn’t go into academic science since there are so few jobs and the jobs that there are stink anyway. I posted my response in the comments section, but since it seems to be of general interest I thought I’d make a proper entry of it.

Katz is correct that opportunities in academic science (at least in the US) are much scarcer than they were during the Cold War; I think government shortsightedness deserves a huge part of the blame for that. On the other hand, countless would-be grad students have already followed the invisible hand and taken Katz’s advice, and are doing quite well in Wall Street, Silicon Valley, etc. So the ones going to grad school are mostly the ones willing to assume the (by now well-known) risks: if they weren’t, they wouldn’t be there.

My fundamental disagreement with Katz is that I think PhD work is increasingly excellent preparation for industry careers. Of course, in some cases (e.g. a quantum computing PhD going to work for an Internet startup), it’s hard to argue that the PhD provides much beyond general skills like analytical thinking, teamwork, project completion, etc., and that those skills couldn’t just as well be obtained elsewhere. But even in those cases, I think a PhD at least won’t hurt your chances in industry these days (notwithstanding Phil Greenspun’s PhD expunging service). So what the PhD does is to give many people an opportunity to spend six years advancing human knowledge and doing something they enjoy, before switching to something that’s actually rewarded by the economy. (One corollary is that, if you’re not enjoying grad school, then you shouldn’t be there. But this is just an instance of a general rule: don’t choose a career option that causes you years of suffering in the hope that the suffering will end later; it probably won’t.)

Furthermore, if there used to be a stigma attached to leaving grad school for industry, I think that’s basically vanished, and that now many PhD programs even see training students for industry as a fundamental part of their mission.

I can’t comment on the rest of Katz’s complaints (the need for conformity, the burden of writing grant proposals, etc.), except to say that so far, my own experience has been more positive. Maybe the worst is ahead!

Incidentally, my comments apply most clearly to computer science PhD programs, which are what I’m most familiar with, but I believe they also apply to physics and other sciences. As for humanities PhD’s … dude, you’re on your own.

Scientific American article is out!

Sunday, February 17th, 2008

After three years of procrastination and delays, my 8-page feature article on “The Limits of Quantum Computers” has finally appeared in the March issue of Scientific American. Once I get permission, I’ll post a plain-text version on my website. In the meantime, you can buy the online issue for US$5.00 from SciAm‘s website, in which case you get colorful sidebars and graphics (including a bearded, white-lab-coated cartoon scientist holding quantum computers and complexity class inclusion diagrams), as well as an interview with Jeff Kimble about the unfortunate movie “Jumper”, and other articles about “the end of cosmology”, prediction markets (Robin Hanson and his “futarchy” get a mention), and the disastrous overfishing of the bluefin tuna (the kind used for toro sushi).

Update (2/18): By popular demand, I’m posting a rough early draft (PDF) of my article online. Read at your own risk!

So, what was it like to write for Scientific American? Exhausting, excruciating, and ultimately worth it. As a general rule, SciAm (probably like all large-circulation magazines) rewrites articles so extensively that the person listed in the byline is less the “writer” than the “content consultant.” Almost every sentence in my article bears the scars of battle (some that I won, more that I didn’t). Yet I have to concede that, when something was really cringe-inducingly wrong, SciAm was willing to listen and make changes — and besides, they did a great job with the cartoons. I’m happy with the end result. Thanks to George Musser, the editor who solicited the article and gave me lots of feedback in the early stages, and to Graham Collins, who finally saw it into print.

A few specific comments for your amusement:

  • In an earlier draft, the cartoons adorning my article were “all balding white guy, all the time” (supposedly, because of the need to keep a “consistent character” throughout the article). I demanded some sort of cartoon-diversity. After a heated discussion among the editors — in which, I’m told, the name of Larry Summers was invoked — they finally agreed to add a cartoon black woman. To those who think I’m a male chauvinist pig: how many brownie points do I get?
  • No, the crystal ball with floating ψ’s and φ’s, mounted atop a keyboard, is not an accurate depiction of what a quantum computer would look like. Having toured some actual QC labs, though, I had to admit it worked better graphically than a lab table piled high with tinfoil, lasers, and assorted pipes.
  • The topic label of the article is “Information Technology.” I pleaded with them to change the topic to “Computer Science,” but to no avail. Apparently the problem was that in the table of contents, the previous two articles were labeled “Prediction Science” and “Brain Science.”
  • The complexity class inclusion diagram on page 67 was a key concession I did win. (Apparently some editors felt a Venn diagram with P, NP, BQP, and PSPACE would be way too complicated, even for readers who regularly gobble down heaping helpings of M-theory.) As you can imagine, exposing people to this stuff seemed pretty important to me: this is apparently the first time P, NP, and NP-completeness have been explained at any length in Scientific American since articles by Knuth and by Lewis and Papadimitriou in the 1970’s.
  • In the author bio on page 67, the description of me as a “high school dropout” is a slight exaggeration, but there’s no other short term for what I am (see here for more).
  • I had nothing to do with the sidebar on page 68, about Vernor Vinge’s novel A Fire Upon the Deep. I’ve never read that (or anything else by Vinge for that matter).
  • My original draft included explanations of both the polynomial and adversary methods for quantum lower bounds, with references to BBCMdW and Ambainis. Shockingly, all of that was cut, while the part about time machines was greatly expanded.

During the hairiest parts of editing process, I was reminded of a passage in Anita and Solomon Feferman’s biography of the great logician Alfred Tarski, which described Tarski’s writing of an article for Scientific American (the only popular article he ever wrote).

Usually the Scientific American articles are heavily edited; many are rewriteen and some even ghostwritten, but Wisnovsky [Tarski’s editor] knew better than to tamper with Tarski’s work and did not — except for his usage of ‘which’ and ‘that’. It seemed to him that Tarski did a 180-degree reversal of these words, so he changed every ‘which’ to ‘that’ and every ‘that’ to ‘which’ and sent the proofs to Tarski, who changed everything back to the way it had been. Wisnovsky got out Fowler’s Dictionary of Modern English Usage, the house bible, and called Tarski on the telephone. “I asked if I could read him the relevant passage on ‘that’ and ‘which’ and he said, ‘yes’. It goes on for pages, but he listened very patiently until I finished. Then he said, ‘Well, you see, that is Fowler. I am Tarski.’ The minute he said that I caved in. I felt cut off at the knees and I gave up trying to make any changes at all.”

Yet, while the “Tarski approach” to magazine writing is a tempting one, here’s the final irony. I looked up Tarski’s actual article from 1969, and it badly needed an editor.

Geordie Rose at MIT

Tuesday, January 29th, 2008

While there are many, many things in this world that I’m bent on destroying, D-Wave Systems has never been one of them. Ideally, I’d simply let the D-Wave folks do their thing (namely, try to build an adiabatic quantum computer) while I do my thing (namely, study the fundamental limits of quantum computers). It was only when, in connection with D-Wave, cringe-worthy claims about quantum computing started appearing all over the press that I felt a professional obligation to say something.

Now that I’m “involved,” though, I also need to keep you ablog of any notable further developments. And presumably, D-Wave founder Geordie Rose coming to MIT to meet with our quantum information group counts as notable.

Two months ago, you’ll recall, we were graced by a visit from D-Wave’s Mohammad Amin and Andrew Berkley, but I’d never before had the pleasure of meeting Geordie. At least formally, the reason for his visit was not to defend D-Wave, but to present “four hard problems” for us to solve. These problems were as follows:

  1. Find a practical adiabatic factoring algorithm. Because of the equivalence of adiabatic and standard quantum computing, we know that such an algorithm exists, but the running time you get from applying the reduction is something like O(n11). Geordie asks for an O(n3) factoring algorithm in the adiabatic model. It was generally agreed (with one dissent, from Geordie) that reducing factoring to a 3SAT instance, and then throwing a generic adiabatic optimization algorithm at the result, would be a really, really bad approach to this problem.
  2. Find a fault-tolerance threshold for adiabatic quantum computing, similar to the known threshold in the circuit model. Geordie asserted that such a threshold has to exist, because of the equivalence of adiabatic and standard quantum computing. However, others immediately pointed out that this is not so: the equivalence theorem is not known to be “fault-tolerance-preserving.” This is a major open problem that many people have worked on without success.
  3. Prove upper and lower bounds on the adiabatic algorithm’s performance in finding exact solutions to hard optimization problems.
  4. Prove upper and lower bounds on its performance in finding approximate solutions to such problems. (Ed Farhi described 3 and 4 as “so much harder than anything else we’ve failed to solve.”)

While none of these problems are new to the quantum computing community, they’re all extremely good ones, and all (indeed) extremely hard.

Of course, we did also discuss some controversial, red-meat, “did-D-Wave-build-a-quantum-computer” sorts of questions, so I owe it to you to provide a few highlights from that discussion.

Seth Lloyd, who’s been more sympathetic to D-Wave than most of us, correctly pointed out that D-Wave has a “credibility problem in the scientific community.” He discussed in great detail the experiments D-Wave ought to be doing to convince scientists that they’re really seeing quantum effects. I strongly agreed with Seth, adding that I’d rather see two coherent qubits than thousands of incoherent ones. Of course, even if D-Wave could demonstrate two-qubit entanglement (and Geordie says it’s the “next thing on the list”), there would still remain the enormous issues of scalability and of the limitations of the adiabatic algorithm in solving hard optimization problems. But at least we could be more comfortable in saying that what they currently have is a tiny quantum computer.

Geordie conceded that, so far, D-Wave has no direct evidence for entanglement among two or more qubits. He nevertheless argued that they have indirect evidence (basically, that their data are better fit by a simple quantum model than a simple classical one), and that the lack of direct evidence is solely due to the difficulty of performing the requisite measurements. Seth replied that, despite the difficulty, D-Wave would “do itself a big favor” by performing the measurements.

Seth also mentioned D-Wave’s claims to the popular press — for example, about the ability of quantum computers to solve NP-complete problems — as a major factor in its scientific credibility problem. Geordie admitted that some of D-Wave’s publicity was (here he paused for a few seconds) “not inaccurate, but verging on inaccurate.”

Note: Geordie now says that he was only talking about the reporting on D-Wave; in his words, “I stand by 100% anything I’ve ever said to anyone about these machines.” At the time, I understood him quite clearly to be talking about D-Wave’s own publicity; it’s strange that he would have hesitated to admit that reporters have misunderstood things. But I freely admit that I might have misheard or misinterpreted him.

I asked Geordie about the result of Bansal, Bravyi, and Terhal that the planar Ising spin graph problem admits an efficient classical approximation algorithm — thus calling into question D-Wave’s whole strategy of solving other NP approximation problems by mapping them onto Ising spin graph instances. Geordie replied, first, that their machine can handle many non-planar links, and second, that Bansal et al.’s algorithm merely trades an exponential dependence on n for an exponential dependence on 1/ε. I agreed that their algorithm isn’t practical, but argued that its mere existence would have to be dealt with in any attempt to convert approximate solutions of the Ising spin graph problem into approximate solutions of the original optimization problems.

So, where do we stand? Here’s my attempt at a fair summary:

  • The people at D-Wave are not conscious frauds; they genuinely believe in what they’re doing.
  • On the other hand, much of the publicity surrounding D-Wave can be safely rejected. To some academics, even one or two public misrepresentations are enough to destroy a company’s credibility. Others, however, prefer to ignore press releases — seeing hype, exaggeration, and even outright falsehoods as just a necessary part of raising money — and to concentrate solely on a company’s communications with experts. Where you fall between these extremes probably depends on your personality more than anything else.
  • In the past, I criticized D-Wave (rightly, I think) for failing to share information with the scientific community in a good-faith manner. To their credit, they’re now making more of an effort to communicate.
  • Thus far, by Geordie’s own account, there’s no direct evidence that D-Wave’s machine actually produces entanglement at any stage of its operation (which all agree is a non-negotiable requirement for quantum computing). Geordie says that producing such evidence will be the “next thing on the list.” The Sudoku stunt was worthless from a scientific perspective; it did not answer any of the questions that physicists need answered.
  • Even if D-Wave managed to build (say) a coherent 1,024-qubit machine satisfying all of its design specs, it’s not obvious it would outperform a classical computer on any problem of practical interest. This is true both because of the inherent limitations of the adiabatic algorithm, and because of specific concerns about the Ising spin graph problem. On the other hand, it’s also not obvious that such a machine wouldn’t outperform a classical computer on some practical problems. The experiment would be an interesting one! Of course, this uncertainty — combined with the more immediate uncertainties about whether D-Wave can build such a machine at all, and indeed, about whether they can even produce two-qubit entanglement — also means that any talk of “lining up customers” is comically premature.

Science: the toroidal pyramid

Wednesday, January 23rd, 2008

Chad Orzel gripes about this month’s Scientific American special issue on “The Future of Physics” — which is actually extremely good, but which turns out to be exclusively about the future of high-energy particle physics. Not surprisingly, the commenters on Chad’s blog reignite the ancient debate about which science is more fundamental than which other one, and whether all sciences besides particle physics are stamp collecting.

I started writing a comment myself, but then I realized I hadn’t posted anything to my own blog in quite some time, so being nothing if not opportunistic, I decided to put it here instead.

To me, one of the most delicious things about computer science is the way it turns the traditional “pyramid of sciences” on its head. We all know, of course, that math and logic are more fundamental than particle physics (even particle physicists themselves will, if pressed, grudgingly admit as much), and that particle physics is in turn more fundamental than condensed-matter physics, which is more fundamental than chemistry, which is more fundamental than biology, which is more fundamental than psychology, anthropology, and so on, which still are more fundamental than grubby engineering fields like, say, computer science … but then you find out that computer science actually has as strong a claim as math to be the substrate beneath physics, that in a certain sense computer science is math, and that until you understand what kinds of machines the laws of physics do and don’t allow, you haven’t really understood the laws themselves … and the whole hierarchy of fundamental-ness gets twisted into a circle and revealed as the bad nerd joke that it always was.

That was a longer sentence than I intended.

Note (Jan. 25): From now on, all comments asking what I think of the movie “Teeth” will be instantly deleted. I’m sick of the general topic, and regret having ever brought it up. Thank you for your understanding.

Volume 4 is already written (in our hearts)

Thursday, January 10th, 2008

Today is the 70th birthday of Donald E. Knuth: Priest of Programming, Titan of Typesetting, Monarch of MMIX, intellectual heir to Turing and von Neumann, greatest living computer scientist by almost-universal assent … alright, you get the idea.

That being the case, Jeff Shallit proposed to various CS bloggers that we should all band together and present the master with a birthday surprise: one post each about how his work has inspired us. The posts are now in! Readers who don’t know about Knuth’s work (are there any?) should start with this post from Luca. Then see this from David Eppstein, this from Doron Zeilberger, this from Jeff, this from Bill Gasarch, and this from Suresh.

Knuth’s impact on my own work and thinking, while vast, has not been directly through research: his main influence on my BibTeX file is that if not for him, I wouldn’t have a BibTeX file. (One reason is that I’m one of the people Doron Zeilberger attacks for ignoring constant factors, and supporting what he calls “the ruling paradigm in computational complexity theory, with its POL vs. EXP dichotomy.”) So I decided to leave Knuth’s scientific oeuvre to others, and to concentrate in this post on his contributions to two other fields: mathematical exposition and computational theology.

Knuth’s creation of the TeX typesetting system — his original motivation being to perfect the layout of his own Art of Computer Programming books — was remarkable in two ways. First, because scientific typesetting is of so little interest to industry, it’s not clear if something like TeX would ever have been invented if not for one man and his borderline-neurotic perfectionism. Second, TeX is one of the only instances I can think of when a complicated software problem was solved so well that it never had to be solved again (nor will it for many decades, one hazards to guess). At least in math, computer science, and physics, the adoption of TeX has been so universal that failure to use it is now a reliable crackpot indicator.

From Wikipedia:

Since version 3, TeX has used an idiosyncratic version numbering system, where updates have been indicated by adding an extra digit at the end of the decimal, so that the version number asymptotically approaches π. This is a reflection of the fact that TeX is now very stable, and only minor updates are anticipated. The current version of TeX is 3.141592; it was last updated in December 2002 … Even though Donald Knuth himself has suggested a few areas in which TeX could have been improved, he indicated that he firmly believes that having an unchanged system that will produce the same output now and in the future is more important than introducing new features. For this reason, he has stated that the “absolutely final change (to be made after my death)” will be to change the version number to π, at which point all remaining bugs will become features.

But Knuth’s interest in scientific exposition goes far beyond typesetting. His 1974 Surreal Numbers: How Two Ex-Students Turned on to Pure Mathematics and Found Total Happiness, which he wrote in one week, was weirdness at the highest possible level: the Beatles’ White Album of math. It’s said to represent the only occasion in history when a new mathematical theory (Conway’s theory of surreal numbers) was introduced in the form of a novel. (Though admittedly, with the exception of one sex scene, this is a “novel” whose plot development mostly takes the form of lemmas.)

Those seeking to improve their own writing should consult Mathematical Writing (available for free on the web), the lecture notes from a course at Stanford taught by Knuth, Tracy Larrabee, and Paul Roberts. Like a lot of Knuth’s work, Mathematical Writing has the refreshing feel of an open-ended conversation: we get to see Knuth interact with students, other teachers, and visiting luminaries like Mary-Claire van Leunen, Paul Halmos, Jeff Ullman, and Leslie Lamport.

Since I’ve blogged before about the battle over academic publishing, I also wanted to mention Knuth’s remarkable and characteristically methodical 2003 letter to the editorial board of the Journal of Algorithms. Knuth asks in a postscript that his letter not be distributed widely — but not surprisingly, it already has been.

In the rest of this post, I’d like to talk about Things A Computer Scientist Rarely Talks About, the only book of Knuth’s for which I collected one of his coveted $2.56 prizes for spotting an error. (Nothing important, just a typo.)

Things is based on a series of lectures on computer science and religion that Knuth gave in 1997 at MIT. (At the risk of oversimplifying: Knuth practices Christianity, but in a strange form less interested in guns and gays than in some business about “universal compassion.”) Perhaps like most readers, when I bought Things I expected yet another essay on “non-overlapping magisteria,” a famous scientist’s apologia justifying his belief in the Virgin Birth and the Resurrection. But Knuth likes to surprise, and what he delivers instead is mostly a meditation on the typography of Bible verses [sic]. More precisely, Things is a “metabook”: a book about the lessons Knuth learned while writing and typesetting an earlier book, one I haven’t yet read, that analyzed verse 3:16 of every book of the Bible.

But this being a lecture series, Knuth also fields questions from the audience about everything from sin and redemption to mathematical Platonism. He has a habit of parrying all the really difficult questions with humor; indeed, he does this so often one comes to suspect humor is his answer. As far as I could tell, there’s only one passage in the entire book where Knuth directly addresses what atheists are probably waiting for him to address. From one of the question periods:

Q: How did you become so interested in God and religion in the first place?

A: It was because of the family I was born into. If I had been born in other circumstances, my religious life would no doubt have been quite different. (p. 155)

And then on to the next question.

To me, what’s remarkable about this response is that Knuth without any hesitation concedes what skeptics from Xenophanes to Richard Dawkins have held up as the central embarrassment of religion. This, of course, is the near-perfect correlation between the content of religious belief and the upbringing of the believer. How, Dawkins is fond of asking, could there possibly be such a thing as a Christian or Hindu or Jewish child? How could a four-year-old already know what he or she thinks about profound questions of cosmogony, history, and ethics — unless, of course, the child were brainwashed by parents or teachers?

My Bayesian friends, like Robin Hanson, carry this argument a step further. For them, the very fact that Knuth knows his beliefs would be different were he born to different parents must, assuming he’s rational, force him to change his beliefs. For how can he believe something with any conviction, if he knows his belief was largely determined by a logically-irrelevant coin toss?

And yet, openly defying the armies of Bayes arrayed against him, here we have Knuth saying, in effect: yes, if I know that if I were some other person my beliefs would be different, but I’m not that other person; I’m Knuth.

So, readers: is Knuth’s response a cop-out, the understandable yet ultimately-indefensible defense of an otherwise-great scientist who never managed to free himself from certain childhood myths? Or is it a profound acknowledgment that none of us ever escape the circumstances of our birth, that we might as well own up to it, that tolerance ought not to require a shared prior, that the pursuit of science and other universal values can coexist with the personal and incommunicable?

Taking a cue from Knuth himself, I’m going to dodge this question. Instead, I decided to end this post by quoting some of my favorite passages from Chapter 6 of Things A Computer Scientist Rarely Talks About.

On computer science and God: “When I talk about computer science as a possible basis for insights about God, of course I’m not thinking about God as a super-smart intellect surrounded by large clusters of ultrafast Linux workstations and great search engines. That’s the user’s point of view.” (p. 168)

“I think it’s fair to say that many of today’s large computer programs rank among the most complex intellectual achievements of all time. They’re absolutely trivial by comparison with any of the works of God, but still they’re somehow closer to those works than anything else we know.” (p. 169)

On infinity: “Infinity is a red herring. I would be perfectly happy to give up immortality if I could only live Super K years before dying [‘Super K’ being defined similarly to an Ackermann number]. In fact, Super K nanoseconds would be enough.” (p. 172)

On the other hand: “I once thought, if I ever had to preach a sermon in church, I would try to explain Cantor’s theorem to my non-mathematical friends so that they could understand something about the infinite.” (p. 172)

On God and computational complexity: “I think it’s fair to say that God may well be bound by the laws of computational complexity … But I don’t recommend that theologians undertake a deep study of computational complexity (unless, of course, they really enjoy it). ” (p. 174)

On quantum mechanics: “Several years ago, I chanced to open Paul Dirac’s famous book on the subject and I was surprised to find out that Dirac was not only an extremely good writer but also that his book was not totally impossible to understand. The biggest surprise, however — actually a shock — was to learn that the things he talks about in that book were completely different from anything I had ever read in Scientific American or in any other popular account of the subject. Apparently when physicists talk to physicists, they talk about linear transformations of generalized Hilbert spaces over the complex numbers; observable quantities are eigenvalues and eigenfunctions of Hermitian linear operators. But when physicists talk to the general public they don’t dare mention such esoteric things, so they speak instead about particles and spins and such, which are much less than half the story. No wonder I could never really understand the popular articles.” (p. 181)

“The extra detail that gets suppressed when quantum mechanics gets popularized amounts to the fact that, according to quantum mechanics, the universe actually consists of much more data than could ever be observed.” (p. 182)

On free will and the problem of evil: “I can design a program that never crashes if I don’t give the user any options. And if I allow the user to choose from only a small number of options, limited to things that appear on a menu, I can be sure that nothing anomalous will happen, because each option can be foreseen in advance and its effects can be checked. But if I give the user the ability to write programs that will combine with my own program, all hell might break loose. (In this sense the users of Emacs have much more free will than the users of Microsoft Word.) … I suppose we could even regard Figure 5 [a binary tree representing someone’s choices] as the Tree of the Knowledge of Good and Evil.” (p. 189-190)

Ten Signs a Claimed Mathematical Breakthrough is Wrong

Saturday, January 5th, 2008

Yesterday several people asked my opinion of a preprint claiming to solve the Graph Isomorphism problem in deterministic polynomial time. I responded:

If I read all such papers, then I wouldn’t have time for anything else. It’s an interesting question how you decide whether a given paper crosses the plausibility threshold or not. For me personally, the AKS “PRIMES in P” paper somehow crossed it whereas this one somehow doesn’t.

Of course, I’d welcome an opinion from anyone who’s actually read the paper.

Three commenters wrote in to say the paper looked good. Then the author found a bug and retracted it.

Update (1/5): Laci Babai writes in to tell me that’s not quite what happened. See here for what did happen, and here for an argument that Friedland’s approach would if sound have implied P=NP.

My purpose here is not to heap embarrassment on the author: he’s a serious mathematician who had a well-defined and interesting approach, and who (most importantly) retracted his claim as soon as a bug was discovered. (Would that everyone did the same!) Though the stakes are usually smaller, similar things have happened to most of us, including me.

Instead I want to explore the following metaquestion: suppose someone sends you a complicated solution to a famous decades-old math problem, like P vs. NP. How can you decide, in ten minutes or less, whether the solution is worth reading?

For a blogger like me — whose opinions are both expected immediately and googlable indefinitely — this question actually matters. Err in one direction, and I’ll forever be known as the hidebound reactionary who failed to recognize some 21st-century Ramanujan. Err in the other direction, and I’ll spend my whole life proofreading the work of crackpots.

A few will chime in: “but if everyone wrote out their proofs in computer-checkable form, there’d be no need for this absurd dilemma!” Sure, and if everyone buckled up there’d be fewer serious accidents. Yet here’s the bloodied patient, and here we are in the emergency room.

In deciding whether to spend time on a paper, obviously the identity of the authors plays some role. If Razborov says he proved a superlinear circuit lower bound for SAT, the claim on our attention is different than if Roofus McLoofus says the same thing. But the danger of elitism is obvious here — so in this post, I’ll only be interested in what can be inferred from the text itself.

Inspired by Sean Carroll’s closely-related Alternative-Science Respectability Checklist, without further ado I now offer the Ten Signs a Claimed Mathematical Breakthrough is Wrong.

1. The authors don’t use TeX. This simple test (suggested by Dave Bacon) already catches at least 60% of wrong mathematical breakthroughs. David Deutsch and Lov Grover are among the only known false positives.

2. The authors don’t understand the question. Maybe they mistake NP≠coNP for some claim about psychology or metaphysics. Or maybe they solve the Grover problem in O(1) queries, under some notion of quantum computing lifted from a magazine article. I’ve seen both.

3. The approach seems to yield something much stronger and maybe even false (but the authors never discuss that). They’ve proved 3SAT takes exponential time; their argument would go through just as well for 2SAT.

4. The approach conflicts with a known impossibility result (which the authors never mention). The four months I spent proving the collision lower bound actually saved me some time once or twice, when I was able to reject papers violating the bound without reading them.

5. The authors themselves switch to weasel words by the end. The abstract says “we show the problem is in P,” but the conclusion contains phrases like “seems to work” and “in all cases we have tried.” Personally, I happen to be a big fan of heuristic algorithms, honestly advertised and experimentally analyzed. But when a “proof” has turned into a “plausibility argument” by page 47 — release the hounds!

6. The paper jumps into technicalities without presenting a new idea. If a famous problem could be solved only by manipulating formulas and applying standard reductions, then it’s overwhelmingly likely someone would’ve solved it already. The exceptions to this rule are interesting precisely because they’re rare (and even with the exceptions, a new idea is usually needed to find the right manipulations in the first place).

7. The paper doesn’t build on (or in some cases even refer to) any previous work. Math is cumulative. Even Wiles and Perelman had to stand on the lemma-encrusted shoulders of giants.

8. The paper wastes lots of space on standard material. If you’d really proved P≠NP, then you wouldn’t start your paper by laboriously defining 3SAT, in a manner suggesting your readers might not have heard of it.

9. The paper waxes poetic about “practical consequences,” “deep philosophical implications,” etc. Note that most papers make exactly the opposite mistake: they never get around to explaining why anyone should read them. But when it comes to something like P≠NP, to “motivate” your result is to insult your readers’ intelligence.

10. The techniques just seem too wimpy for the problem at hand. Of all ten tests, this is the slipperiest and hardest to apply — but also the decisive one in many cases. As an analogy, suppose your friend in Boston blindfolded you, drove you around for twenty minutes, then took the blindfold off and claimed you were now in Beijing. Yes, you do see Chinese signs and pagoda roofs, and no, you can’t immediately disprove him — but based on your knowledge of both cars and geography, isn’t it more likely you’re just in Chinatown? I know it’s trite, but this is exactly how I feel when I see (for example) a paper that uses category theory to prove NL≠NP. We start in Boston, we end up in Beijing, and at no point is anything resembling an ocean ever crossed.

Obviously, these are just some heuristics I’ve found successful in the past. (The nice thing about math is that sooner or later the truth comes out, and then you know for sure whether your heuristics succeeded.) If a paper fails one or more tests (particularly tests 6-10), that doesn’t necessarily mean it’s wrong; conversely, if it passes all ten that still doesn’t mean it’s right. At some point, there might be nothing left to do except to roll up your sleeves, brew some coffee, and tell your graduate student to read the paper and report back to you.

Review of Mermin’s book

Monday, December 10th, 2007

By now, maybe a half-dozen people have asked me what I thought of David Mermin’s new book Quantum Computer Science: An Introduction; many seemed surprised when I told them I hadn’t read it. Since I aim to please, I finally cracked open my review copy on a train this weekend, and am pleased to report that … yes, it’s quite good. Indeed, the biggest problem is just Mermin’s infamous insistence on spelling qubit “Qbit.” (At this point, one might as well decide to spell quark “Qork.” A language consists of shared, often highly-irrational conventions that cannot be changed unilaterally.)

Mermin’s book is, one might say, radically limited in scope. There’s nothing about physical implementation, not a mixed state or POVM in sight, not a word on the provable limitations of quantum computers, no mention of P, NP, or BQP, and pretty much nothing about any development after 1995. The sole aim is to cover the “quantum canon” — Hadamards and Toffolis, Shor and Grover, quantum error-correction, the BB84 key distribution protocol, etc. — while dispelling various misconceptions along the way. But at that limited task, Mermin — who’s an extremely gifted expositor — does a better job than almost anyone.

He certainly does a better job than I would have. I’ll admit that, when the Mike&Ike book came out seven years ago, I naïvely imagined that the “quantum textbook problem” (or more precisely, the good quantum textbook problem) had been solved. From now on, there would be a single place where everyone could go to learn the quantum canon. And because anyone could know all that twentieth-century material by reading a single book, I could readily assume that anyone who was interested did know it, and could take that as shared background knowledge (like, say, the existence of the Roman Empire) when discussing newer topics like quantum lower bounds, the adiabatic algorithm, or BQP/qpoly.

Of course I couldn’t have been more wrong. In the years since Mike&Ike came out, the total amount of confusion in the world about the |A〉|B〉|C〉’s of quantum computing (as well as the total number of books that try to address that confusion) has increased exponentially. And so it’s good to have a distinguished physicist like Mermin patiently telling readers the following:

“To understand how to build a quantum computer … you must indeed have many years of experience in quantum mechanics and its applications under your belt. But if you only want to know what such a device is capable in principle of doing once you have it, then there is no reason to get involved in the really difficult physics of the subject.” (page xiii)

“This means that all traces of the amplitudes αx characterizing the input state have vanished from the output state. The only role they have played in the measurement is to determine the probability of a particular output.” (page 25)

“Small alterations in the phases produce small alterations in the probabilities of getting that extremely precise digital information, but not the precision of the information itself, once it is acquired.” (page 85)

Personally, I find it hard to remember that anyone needs to be told these things — and even when I do tell them, they don’t believe me (probably because I’m waving my arms too wildly). They think I’m making it up. But Mermin dispels the common misconceptions with a calm air of gravity.

I’ll end with two quibbles.

First, while Mermin talks a great deal about quantum black-box algorithms, he never once mentions the crucial distinction between the “black-box” world — the world where one can prove unconditionally both that quantum computers can solve certain problems exponentially faster than classical computers, and that they can’t solve certain other problems any faster than classical ones — and the “non-black-box” world, where all such statements are necessarily conjectural. The one time he does touch on this distinction, he gets it wrong:

“The best known classical algorithms for finding the period r of such a function take a time that grows faster than any power of the number n of bits of r (exponentially with n1/3).” (page 63)

The best classical algorithm for period-finding provably takes time that grows exponentially with n/2. The best known classical algorithms for factoring take time that grows exponentially with n1/3. But the latter algorithms (necessarily) use deeper properties of the factoring problem than just its reducibility to period-finding.

I found this an uncharacteristic omission for Mermin — whose tendency is to examine whatever he brings up from all possible angles — though perhaps it can be understood in terms of a decision to avoid any mention of complexity theory.

The second quibble is that there are no exercises.

Entanglement for peace and freedom

Friday, November 30th, 2007

A reader named Prempeh writes in the comments section of my last post:

I’m really no happier because of knowing that a phenomenon called quantum entanglement exist [sic]. Now, you say, this phenomenon has the potential to enable super-powerful computing, teleportation, … I say, until science helps me with a comprehensive, provable, repeatable methodology for using it’s [sic] results to make me (and everyone who wants to be) happy, I really do not see it as significantly more helpful than faith.

NB: Any chance that a unification theory could help the poor stave off devastating climate change caused in part by the profligacy of the west? End the brutality of war? Stop child sexual exploitation? Remove corruption, greed, racism, …

This is not a rhetorical question

A few quick non-rhetorical answers:

  1. At the least, thinking about quantum entanglement doesn’t exacerbate problems like war and climate change (if we neglect o(1) terms like the jet fuel needed to fly to conferences). The same can’t be said for many other human endeavors.
  2. The scientific revolution 400 years ago led directly to a doubling of the human lifespan, the birth of democracy and its subsequent spread across the world (Galileo, Newton → Spinoza, Hume, Locke → Paine, Jefferson → …), and the cessation of practices such as witch-burning. It’s true that those few lucky enough to have been tribal chieftains with large harems probably wouldn’t want to trade places with a modern; and also true that Hitler and Stalin managed to surpass the already-impressive brutality of the ancients. But on the whole, it seems to me that the human condition improved once we started understanding how the universe works. And given the number of utopian ideas that managed to do nothing but drench this vale of tears in new tears of their own, I don’t see the relative success of curiosity-driven science as anything to sneeze at.
  3. I do try to do my tiny part to raise awareness of climate change and other threats to civilization. Of course, every time I do so, I’m attacked in the comments section by hordes of denialists who tell me I should stick to what I know about (like quantum entanglement). There’s just no pleasing everyone.
  4. I see the central problem facing humanity — much more central than climate change, greed, racism, or anything else you mentioned — as collective stupidity. If we, as a species, weren’t so collectively stupid, we’d have error-correcting mechanisms that checked the other problems before they spiraled out of control.I also maintain the possibly-naïve hope that, if people could just understand basic conceptual points about how the world works — like why quantum entanglement doesn’t allow faster-than-light communication, but is still not the same as classical correlation — some tiny contribution might be made to fighting the collective stupidity of our species and thereby helping to ensure its continued survival. That, and not the prospect of teleportation or super-powerful computing, is what really motivates me.