Archive for the ‘Complexity’ Category

My visit to D-Wave: Beyond the roast-beef sandwich

Tuesday, February 21st, 2012

Last week I was in Vancouver, to give talks at the University of British Columbia and at the American Association for the Advancement of Science annual meeting.  As part of that visit, on Friday afternoon, John Preskill, John Martinis, Michael Freedman and I accepted a gracious invitation to tour the headquarters of D-Wave Systems in Burnaby (a suburb of Vancouver).  We started out in a conference room, where they served us cookies and sodas.  Being the mature person that I am, the possibility of the cookies being poisoned at no point crossed my mind.

Then we started the tour of D-Wave’s labs.  We looked under a microscope at the superconducting chips; we saw the cooling systems used to get the chips down to 20 millikelvin.  In an experience that harked back to the mainframe era, we actually walked inside the giant black cubes that D-Wave was preparing for shipment.  (The machines are so large partly because of the need for cooling, and partly to let engineers go in and fix things.)  Afterwards, D-Wave CTO Geordie Rose gave a 2-hour presentation about their latest experimental results.  Then we all went out to dinner.  The D-Wave folks were extremely cordial to us and fielded all of our questions.

In spite of my announcement almost a year ago that I was retiring as Chief D-Wave Skeptic, I thought it would be fitting to give Shtetl-Optimized readers an update on what I learned from this visit.  I’ll start with three factual points before moving on to larger issues.

Point #1: D-Wave now has a 128-(qu)bit machine that can output approximate solutions to a particular NP-hard minimization problem—namely, the problem of minimizing the energy of 90-100 Ising spins with pairwise interactions along a certain fixed graph (the “input” to the machine being the tunable interaction strengths).  So I hereby retire my notorious comment from 2007, about the 16-bit machine that D-Wave used for its Sudoku demonstration being no more computationally-useful than a roast-beef sandwich.  D-Wave does have something today that’s more computationally-useful than a roast-beef sandwich; the question is “merely” whether it’s ever more useful than your laptop.  Geordie presented graphs that showed D-Wave’s quantum annealer solving its Ising spin problem “faster” than classical simulated annealing and tabu search (where “faster” means ignoring the time for cooling the annealer down, which seemed fair to me).  Unfortunately, the data didn’t go up to large input sizes, while the data that did go up to large input sizes only compared against complete classical algorithms rather than heuristic ones.  (Of course, all this is leaving aside the large blowups that would likely be incurred in practice, from reducing practical optimization problems to D-Wave’s fixed Ising spin problem.)  In summary, while the observed speedup is certainly interesting, it remains unclear exactly what to make of it, and especially, whether or not quantum coherence is playing a role.

Which brings me to Point #2.  It remains true, as I’ve reiterated here for years, that we have no direct evidence that quantum coherence is playing a role in the observed speedup, or indeed that entanglement between qubits is ever present in the system.  (Note that, if there’s no entanglement, then it becomes extremely implausible that quantum coherence could be playing a role in a speedup.  For while separable-mixed-state quantum computers are not yet known to be efficiently simulable classically, we certainly don’t have any examples where they give a speedup.)  Last year, as reported on this blog, D-Wave had a nice Nature paper that reported quantum tunneling behavior in an 8-qubit system.  However, when I asked D-Wave scientist Mohammad Amin, he said he didn’t think that experiment provided any evidence for entanglement between qubits.

The “obvious” way to demonstrate entanglement between qubits would be to show a Bell inequality violation.  (We know that this can be done in superconducting qubits, as the Schoelkopf group at Yale among others reported it a couple years ago.)  Meanwhile, the “obvious” way to demonstrate a role for quantum coherence in the apparent speedup would be gradually to “turn down” the system’s coherence (for example, by adding an interaction that constantly measured the qubits in the computational basis), and check that the annealer’s performance degraded to that of classical simulated annealing.  Unfortunately, the D-Wave folks told us that neither experiment seems feasible with their current setup, basically because they don’t have arbitrary local unitary transformations and measurements available.  They said they want to try to demonstrate 2-qubit entanglement, but in the meantime, are open to other ideas for how to demonstrate a quantum role in the apparent speedup with their existing setup.

Point #3: D-Wave was finally able to clarify a conceptual point that had been bugging me for years.  I—and apparently many others!—thought D-Wave was claiming that their qubits decohere almost immediately (so that, in particular, entanglement would almost certainly never be present during the computation), but that the lack of entanglement didn’t matter, for some complicated reason having to do with energy gaps.  I was far from alone in regarding such a claim as incredible: as mentioned earlier, there’s no evidence that a quantum computer without entanglement can solve any problem asymptotically faster than a classical computer.  However, that isn’t D-Wave’s claim.  What they think is that their system decoheres almost immediately in the energy eigenbasis, but that it doesn’t decohere in the computational basis—so that, in particular, there would be entanglement at intermediate stages.  If so, that would be perfectly fine from the standpoint of the adiabatic algorithm, which doesn’t need coherence in the energy eigenbasis anyway (after all, the whole point is that, throughout the computation, you want to stay as close to the system’s ground state as possible!).  I understand that, given their knowledge of decoherence mechanisms, some physicists are extremely skeptical that you could have rapid decoherence in the energy basis without getting decoherence in the computational basis also.  So certainly the burden is on D-Wave to demonstrate that they maintain coherence “where it counts.”  But at least I now understand what they’re claiming, and how it would be compatible (if true) with a quantum speedup.

Let me now move on to three broader questions raised by the above points.

The first is: rather than constantly adding more qubits and issuing more hard-to-evaluate announcements, while leaving the scientific characterization of its devices in a state of limbo, why doesn’t D-Wave just focus all its efforts on demonstrating entanglement, or otherwise getting stronger evidence for a quantum role in the apparent speedup?  When I put this question to Mohammad Amin, he said that, if D-Wave had followed my suggestion, it would have published some interesting research papers and then gone out of business—since the fundraising pressure is always for more qubits and more dramatic announcements, not for clearer understanding of its systems.  So, let me try to get a message out to the pointy-haired bosses of the world: a single qubit that you understand is better than a thousand qubits that you don’t.  There’s a reason why academic quantum computing groups focus on pushing down decoherence and demonstrating entanglement in 2, 3, or 4 qubits: because that way, at least you know that the qubits are qubits!  Once you’ve shown that the foundation is solid, then you try to scale up.  So, please support D-Wave if it wants to spend money to show Bell inequality violations, or other “smoking-gun” evidence that its qubits are working together coherently.  You’re welcome, D-Wave!

The second question is one that I’ve encountered many times on the blogosphere: who cares how D-Wave’s system works, and whether it does or doesn’t exploit quantum coherence, as long as it solves practical problems faster?  Sure, maybe what D-Wave is building is really a series of interesting, useful, but still basically “classical” annealing devices.  Maybe the word “quantum” is functioning here as the stone in a stone soup: attracting money, interest, and talented people to build something that, while neat, ultimately doesn’t much depend on quantum mechanics at all.  As long as D-Wave’s (literal!) black box solves the problem instances in such-and-such amount of time, why does it matter what’s inside?

To see the obtuseness of this question, consider a simple thought experiment: suppose D-Wave were marketing a classical, special-purpose, $10-million computer designed to perform simulated annealing, for 90-bit Ising spin glass problems with a certain fixed topology, somewhat better than an off-the-shelf computing cluster. Would there be even 5% of the public interest that there is now? I think D-Wave itself would be the first to admit the answer is no. Indeed, Geordie Rose spoke explicitly in his presentation about the compelling nature of (as he put it) “the quantum computing story,” and how it was key to attracting investment. People don’t care about this stuff because they want to find the ground states of Ising spin systems a bit faster; they care because they want to know whether or not the human race has finally achieved a new form of computing. So characterizing the device matters, goddammit! I pride myself on being willing to adjust my opinions on just about anything in response to new data (as I’ve certainly done in D-Wave’s case), but the insistence that black boxes must be opened and explanations provided is something I’ll carry to the grave. Finally, given the skeptical-yet-positive tone of this post, some people will wonder whether I now regret my earlier, more unmitigated D-Wave skepticism. The answer is no! Asking questions is my job. I’ll give D-Wave credit whenever it answers some of the questions—as it did on this visit!—and will shift my views accordingly. But I’ll also neither stop asking nor apologize for asking, until the evidence for a quantum speedup becomes clear and indisputable (as it certainly hasn’t yet). On the other hand, I do regret the snowballing nastiness that developed as a combined result of my and other skeptics’ statements, D-Wave’s and its supporters’ statements, and the adversarial nature of the blogosphere. For the first time, I find myself really, genuinely hoping—with all my heart—that D-Wave will succeed in proving that it can do some (not necessarily universal) form of scalable quantum computation. For, if nothing else, such a success would prove to the world that my$100,000 is safe, and decisively refute the QC skeptics who, right now, are getting even further under my skin than the uncritical D-Wave boosters ever did.

Whether or not God plays dice, I do

Friday, February 3rd, 2012

Another Update (Feb. 7): I have a new piece up at IEEE Spectrum, explaining why I made this bet.  Thanks to Rachel Courtland for soliciting the piece and for her suggestions improving it.

Update: My $100,000 offer for disproving scalable quantum computing has been Slashdotted. Reading through the comments was amusing as always. The top comment suggested that winning my prize was trivial: “Just point a gun at his head and ask him ‘Convinced?’” (For the record: no, I wouldn’t be, even as I handed over my money. And if you want to be a street thug, why limit yourself to victims who happen to have made public bets about quantum computing?) Many people assumed I was a QC skeptic, and was offering the prize because I hoped to spur research aimed at disproving QC. (Which is actually an interesting misreading: I wonder how much “pro-paranormal” research has been spurred by James Randi’s million-dollar prize?) Other people said the bet was irrelevant since D-Wave has already built scalable QCs. (Oh, how I wish I could put the D-Wave boosters and the QC deniers in the same room, and let them duke it out with each other while leaving me alone for a while!) One person argued that it would be easy to prove the impossibility of scalable QCs, just like it would’ve been easy to prove the impossibility of scalable classical computers in 1946: the only problem is that both proofs would then be invalidated by advances in technology. (I think he understands the word “proof” differently than I do.) Then, buried deep in the comments, with a score of 2 out of 5, was one person who understood precisely: I think he’s saying that while a general quantum computer might be a very long way off, the underlying theory that allows such a thing to exist is on very solid ground (which is why he’s putting up the money). Of course this prize might still cost him since if the news of the prize goes viral he’s going to spend the next decade getting spammed by kooks. OK, two people: There’s some needed context. Aaronson himself works on quantum complexity theory. Much of his work deals with quantum computers (at a conceptual level–what is and isn’t possible). Yet there are some people who reject the idea the quantum computers can scale to “useful” sizes–including some very smart people like Leonid Levin (of Cook-Levin Theorem fame)–and some of them send him email, questions, comments on his blog, etc. saying so. These people are essentially asserting that Aaronson’s career is rooted in things that can’t exist. Thus, Aaronson essentially said “prove it.” It’s true that proving such a statement would be very difficult … But the context is that Aaronson gets mail and questions all the time from people who simply assert that scalable QC is impossible, and he’s challenging them to be more formal about it. He also mentions, in fairness, that if he does have to pay out, he’d consider it an honor, because it would be a great scientific advance. For better or worse, I’m now offering a US$100,000 award for a demonstration, convincing to me, that scalable quantum computing is impossible in the physical world.  This award has no time limit other than my death, and is entirely at my discretion (though if you want to convince me, a good approach would be to convince most of the physics community first).  I might, also at my discretion, decide to split the award among several people or groups, or give a smaller award for a discovery that dramatically weakens the possibility of scalable QC while still leaving it open.  I don’t promise to read every claimed refutation of QC that’s emailed to me.  Indeed, you needn’t even bother to send me your refutation directly: just convince most of the physics community, and believe me, I’ll hear about it!  The prize amount will not be adjusted for inflation.

The impetus for this prize was a post on Dick Lipton’s blog, entitled “Perpetual Motion of the 21st Century?”  (See also this followup post.)  The post consists of a debate between well-known quantum-computing skeptic Gil Kalai and well-known quantum-computing researcher Aram Harrow (Shtetl-Optimized commenters both), about the assumptions behind the Quantum Fault-Tolerance Theorem.  So far, the debate covers well-trodden ground, but I understand that it will continue for a while longer.  Anyway, in the comments section of the post, I pointed out that a refutation of scalable QC would require, not merely poking this or that hole in the Fault-Tolerance Theorem, but the construction of a dramatically-new, classically-efficiently-simulable picture of physical reality: something I don’t expect but would welcome as the scientific thrill of my life.  Gil more-or-less dared me to put a large cash prize behind my words—as I’m now, apparently, known for doing!—and I accepted his dare.

To clarify: no, I don’t expect ever to have to pay the prize, but that’s not, by itself, a sufficient reason for offering it.  After all, I also don’t expect Newt to win the Republican primary, but I’m not ready to put $100,000 on the line for that belief. The real reason to offer this prize is that, if I did have to pay, at least doing so would be an honor: for I’d then (presumably) simply be adding a little to the well-deserved Nobel Prize coffers of one of the greatest revolutionaries in the history of physics. Over on Lipton’s blog, my offer was criticized for being “like offering$100,000 to anyone who can prove that Bigfoot doesn’t exist.”  To me, though, that completely misses the point.  As I wrote there, whether Bigfoot exists is a question about the contingent history of evolution on Earth.  By contrast, whether scalable quantum computing is possible is a question about the laws of physics.  It’s perfectly conceivable that future developments in physics would conflict with scalable quantum computing, in the same way that relativity conflicts with faster-than-light communication, and the Second Law of Thermodynamics conflicts with perpetuum mobiles.  It’s for such a development in physics that I’m offering this prize.

Update: If anyone wants to offer a counterpart prize for a demonstration that scalable quantum computing is possible, I’ll be happy for that—as I’m sure, will many experimental QC groups around the world.  I’m certainly not offering such a prize.

Cerebrum-stuffer from Shtetl Claus

Saturday, December 24th, 2011

Ho3!  Home with family for the holidays and looking for something to do?  Then check out the archives of our 6.893 Philosophy and Theoretical Computer Science course blog.  The course just ended last week, so you can find discussions of everything from the interpretation of quantum mechanics to Occam’s Razor to the Church-Turing Thesis to strong AI, as well as links to student projects, including Criticisms of the Turing Test and Why You Should Ignore (Most of) Them, Barwise Inverse Relation Principle, Bayesian Surprise, Boosting, and Other Things that Begin with the Letter B, and an interactive demonstration of interactive proofs.  Thanks to my TA Andy Drucker, and especially to the students, for making this such an interesting course.

My New York Times essay on quantum computing

Monday, December 5th, 2011

I have a special treat for those commenters who consider me an incorrigible publicity-hound: an essay I was invited to write for the New York Times Science section, entitled Quantum Computing Promises New Insights, Not Just Supermachines.  (My original title was “The Real Reasons to Study Quantum Computing.”)  This piece is part of a collection of essays on “the future of computing,” which include one on self-driving cars by Sebastian Thrun, one on online learning by Daphne Koller, and other interesting stuff (the full list is here).

In writing my essay, the basic constraints were:

(a) I’d been given a rare opportunity to challenge at least ten popular misconceptions about quantum computing, and would kick myself for years if I didn’t hit all of them,

(b) I couldn’t presuppose the reader had heard of quantum computing, and

Satisfying these constraints was harder than it looked, and I benefited greatly from the feedback of friends and colleagues, as well as the enormously helpful Times staff.  I did get one request that floored me: namely, to remove all the material about “interference” and “amplitudes” (too technical), and replace it by something ordinary people could better relate to—like, say, a description of how a quantum computer would work by trying every possible answer in parallel.  Eventually, though, the Gray Lady and I found a compromise that everyone liked (and that actually improved the piece): namely, I’d first summarize the usual “try all answers in parallel” view, and then explain why it was wrong, bringing in the minus signs and Speaking Truth to Parallelism.

To accompany the essay, I also did a short podcast interview about quantum computing with the Times‘ David Corcoran.  (My part starts around 8:20.)  Overall, I’m happy with the interview, but be warned: when Corcoran asks me what quantum computers’ potential is, I start talking about the “try all answers in parallel” misconception—and then they cut to the next question before I get to the part about its being a misconception!  I need to get better at delivering soundbites…

One final comment: in case you’re wondering, those black spots on the Times‘ cartoon of me seem to be artifacts of whatever photo-editing software they used.  They’re not shrapnel wounds or disfiguring acne.

Quantum Algorithms for Quantum Field Theories

Saturday, December 3rd, 2011

For weeks, I’ve been meaning to blog about an important recent paper by Stephen Jordan, Keith Lee, and John Preskill, entitled Quantum Algorithms for Quantum Field Theories.  So I’m now doing so.

As long as I’ve been in quantum computing, people have been wondering aloud about the computational power of realistic quantum field theories (for example, the Standard Model of elementary particles).  But no one seemed to have any detailed analysis of this question (if there’s something I missed, surely commenters will let me know).  The “obvious” guess would be that realistic quantum field theories should provide exactly the same computational power as “ordinary,” nonrelativistic quantum mechanics—in other words, the power of BQP (the class of problems solvable in polynomial time by a quantum computer).  That would be analogous to the situation in classical physics, where bringing in special relativity dramatically changes our understanding of space, time, matter, and energy, but seems (unlike quantum mechanics) to have little or no effect on which computational problems can be solved efficiently.  Analogously, it would seem strange if quantum field theories (QFTs)—which tie together quantum mechanics, special relativity, and detailed knowledge about the elementary particles and their interactions, but seen from far enough away are “just” quantum mechanics—forced any major revision to quantum computing theory.

Until now, though, there seems to have been only one detailed analysis supporting that conclusion, and it applied to (2+1)-dimensional topological QFTs (TQFTs) only, rather than “realistic” (3+1)-dimensional QFTs.  This was the seminal work of Freedman, Kitaev, and Wang and Freedman, Larsen, and Wang in 2000.  (Six years later, Aharonov, Jones, and Landau gave a more computer-science-friendly version, by directly proving the BQP-completeness of approximating the Jones polynomial at roots of unity.  The latter problem was known to be closely-related to simulating TQFTs, from the celebrated work of Witten and others in the 1980s.)  To a theoretical computer scientist, dropping from three to two spatial dimensions might not sound like a big deal, but what’s important is that the relevant degrees of freedom become “topological”, making possible a clean, simple model of computation.  For “realistic” QFTs, by contrast, it wasn’t even obvious how to define a model of computation; putting realistic QFTs on a rigorous mathematical footing remains a notorious open problem.

In their new paper, Jordan, Lee, and Preskill say that they give an algorithm, running on a “conventional” quantum computer, to estimate scattering probabilities in a class of QFTs called “continuum φ4 theories.” Their algorithm uses time polynomial in the number of incoming particles in the scattering experiment and in their total energy, and inversely polynomial in the desired precision ε and in the distance λ-λc between the QFT’s coupling constant λ and a phase transition λc.  (In d=2 spatial dimensions, they say the dependence on the precision scales like (1/ε)2.376, the 2.376 coming from matrix multiplication. Naturally, that should now be amended to (1/ε)2.373.)  To develop their algorithm, Jordan et al. apparently had to introduce some new techniques for coping with the error incurred by discretizing QFTs.  No classical algorithm is known with similar scaling—so when suitably formalized, the “QFT simulation problem” might indeed be in BQP-BPP, matching the uninformed doofus intuition of complexity theorists like me.  Jordan et al. don’t say whether the problem they’re solving is also BQP-complete; I imagine that could be a topic for future research.  They also don’t say whether their precision parameter ε bounds the variation distance between the real and simulated output distributions (rather than just the differences between probabilities of individual scattering outcomes); I hope they or someone else will be able to clarify that point.

In case it isn’t obvious yet, let me make it crystal-clear that I lack the physics background to evaluate Jordan et al.’s work in a serious technical way.  All I can say with confidence is that the small number of people who (1) have the requisite background and (2) care about computational complexity, will probably spend non-negligible time discussing and understanding this paper in the weeks and months to come.

Conflict-of-Interest Warning: At a deep, subconscious level, I probably chose to blog about Jordan et al.’s paper not for any legitimate scientific reason, but simply because I know John Preskill and Stephen Jordan personally, and, despite being physicists, they’re both tremendously-respected colleagues who’ve made many outstanding contributions to quantum computing theory besides this one.  Then again, everything I’ve ever done—and everything you’ve ever done—has probably had such unsavory hidden motives as well, so who’s counting?  In all of history, there have only been ten or twenty people whose commitment to scientific objectivity has been absolute and pure, and since they comment on complexity blogs anonymously, we’ll probably never even know their names…

The Alternative to Resentment

Friday, December 2nd, 2011

A year ago, in a post entitled Anti-Complexitism, I tried to grapple with the strange phenomenon—one we’ve seen in force this past week—of anonymous commenters getting angry about the mere fact of announcements, on theoretical computer science blogs, of progress on longstanding open problems in theoretical computer science.  When I post something about global warming, Osama Bin Laden, or (of course) the  interpretation of quantum mechanics, I expect a groundswell of anger … but a lowering of the matrix-multiplication exponent ω?  Huh?  What was that about?

Well, in this case, some commenters were upset about attribution issues (which hopefully we can put behind us now, everyone agreeing about the importance of both Stothers’ and Vassilevska Williams’ contributions), while others honestly but mistakenly believed that a small improvement to ω isn’t a big deal (I tried to explain why they’re wrong here).  What interests me in this post is the commenters who went further, positing the existence of a powerful “clique” of complexity bloggers that’s doing something reprehensible by “hyping” progress in complexity theory, or by exceeding some quota (what, exactly?) on the use of the word “breakthrough.”

One of the sharpest responses to that paranoid worldview came (ironically) from a wonderful anonymous comment on my Anti-Complexitism post, which I recommend everyone read.  Here was my favorite paragraph:

The final criticism [by the anti-complexites] seems to be: complexity theory makes too much noise which people in other areas do not like.  I really don’t understand this one, I mean what is wrong with people in an area being excited about their area?  Is that wrong?  And where do we make those noise?  On complexity blogs!  If you don’t like complexity theorists being excited about their area why are you reading these blogs?  The metaphor would be an outsider going to a wedding and asking the people in the wedding with a very serious tone: “why is everyone happy here?”

Yesterday, in response to my reposting the above comment on Lance and Bill’s blog, another anonymous commenter had something extremely illuminating to say:

Scott, you are missing the larger socio-economical context: it’s not about excitement.  It’s about researchers competing for scarce resources, primarily funding.  The work involved in funding acquisition is generally loathed, and directly reduces the time scientists have for research and teaching.  If some researchers ramp up their hype-level vis-a-vis the rest of the community, as the complexity community is believed to be doing (what with all them Goedel awards?), they are forcing (or are seen as forcing) the rest either to accept a lower level of funding with all the concomitant disadvantages, or invest more time in hype themselves.  In other words, hypers are defecting in the prisoners dilemma type game scientists are playing, the objective of which is to minimise the labour involved in funding acquisition.

This is similar to teeth-whitening: in the past, it was perfectly possible to be considered attractive with natural, slightly yellowish teeth. Then some defected by bleaching, then more and more, and today natural teeth are socially hardly acceptable, certainly not if you want to be good-looking.  Is that progress?

I posted a response on Lance and Bill’s blog, but then decided it was important enough to repost here.  So:

Dear Anonymous 2:47,

Let me see whether I understand you correctly.  On the view you propose, other scientists shouldn’t have praised (say) Carl Sagan for getting millions of people around the world excited about science.  Rather, they should have despised him, for using hype to divert scarce funding dollars from their own fields to the fields Sagan favored (like astronomy, or Sagan’s preferred parts of astronomy).  Sagan forced all those other scientists to accept a terrible choice: either accept reduced funding, or else sink to Sagan’s level, and perform the loathed task of communicating their own excitement about their own fields to the public.

Actually, there were other scientists who drew essentially that conclusion.  As an example, Sagan was famously denied membership in the National Academy of Sciences, apparently because of a few vocal NAS members who were jealous and resentful of Sagan’s outreach activities.  The view we’re now being asked to accept is that those NAS members are the ones who emerge from the story the moral victors.

So let me thank you, Anonymous 2:47: it’s rare for anyone to explain the motivation behind angry TCS blog comments with that much candor.

Now that the real motivation has (apparently) crawled out from underneath its rock, I can examine it and refute it.  The central point is simply that science isn’t a Prisoner’s-Dilemma-type game.   What you describe as the “socially optimal equilibrium,” where no scientists need to be bothered to communicate their excitement about their fields, is not socially optimal at all—neither from the public’s standpoint nor from science’s.

At the crudest level, science funding is not a fixed-size pie.  For example, when Congress was debating the cancellation of the Superconducting Supercollider, a few physicists from other fields eagerly jumped on the anti-SSC bandwagon, hoping that the SSC money might then get diverted to their own fields.  Ultimately, of course, the SSC was cancelled, and none of the money ever found its way to other areas of physics.

So, if you see people using blogs to talk about research results that excite them, then instead of resenting it, consider starting your own blog to talk about the research results that excite YOU.  If your blog is well-written and interesting, I’ll even add you to my blogroll, game-theoretic funding considerations be damned.  Just go to WordPress.com—it’s free, and it takes only a few minutes to set one up.

ITCS’2012 in Cambridge, MA

Tuesday, November 29th, 2011

Since everything I write now seems to provide an occasion for bitter controversy, I’ll be curious to learn whose sensibilities I inadvertently offended by posting the following announcement for next year’s ITCS conference. -SA

Dear Theorists:

As you know the third Innovation in Theoretical Computer Science Conference will be held in Cambridge this January:  http://research.microsoft.com/en-us/um/newengland/events/itcs2012/.

REGISTRATION IS NOW OPEN and THE PROGRAM IS ONLINE.

In addition to the program, there are going to be a few novelties that we would like to point out to you.

In one session of the conference, students graduating this academic year (as well as researchers completing their postdoc this academic year) will be given few minutes to present themselves and their work.

The presentations will be grouped by University, in alphabetic order.

We hope this will give all of us an opportunity to have a synopsis of the great work being done by the “graduating” members of our community.

In order to speak in this special session, please send an email at  silvio.itcs12@gmail.com by DECEMBER 15.

Registration fees will be waived for presenters at Graduating Bits 2012.

If you/your students are graduating this year, or you plan to hire this year, we are encourage to attend ITCS 2012!

2. COMMUNITY BUILDING

To strengthen our (legendary!) friendship and collaboration, we will treat you to a PLAY BACK show: an improvisational theater where OUR actors will bring to life YOUR stories.

3. CHAIR RANTS

In addition to the chair of each session introducing the speakers and coauthors of the session (who will then introduce themselves and their coauthors), our chairs will provide us with their insights on the papers in their sessions.

We look forward to seeing all of you in Cambridge very soon!

All the Best

Shafi Goldwasser, Silvio Micali, and Yael Tauman Kalai

2.373

Monday, November 28th, 2011

For twenty years, the fastest known algorithm to multiply two n-by-n matrices, due to Coppersmith and Winograd, took a leisurely O(n2.376) steps.   Last year, though, in his PhD thesis, Andrew Stothers gave an improvement to O(n2.374) steps.  And today,  Virginia Vassilevska Williams of Berkeley and Stanford, released a paper that gives a general methodology for analyzing Coppersmith-Winograd-type algorithms, and that improves the matrix-multiplication time to a lightning-fast O(n2.373) steps.  (Virgi’s work was independent of Stothers’, though she credits him and applies an idea of his to simplify her proof.)  Full disclosure: I actually knew a month ago that this was coming—I had a hell of a time keeping the secret.  I’d recommend that you get started memorizing “ω<2.373,” but as Russell Impagliazzo points out in the comments, the exponent might get lowered again in short order.  Huge congratulations to Virgi and to Andrew for this breakthrough!

Update (Nov. 30): Last night I received an extremely gracious email from Andrew Stothers, which he’s given me permission to summarize here.  In the email, Andrew expressed how excited he was about Virgi’s new result, apologized for the confusion he caused by not mentioning his improvement to ω until page 71 of his thesis (he says he doesn’t know why he did it), and said that he meant to publish a paper, but was prevented from doing so by health and job issues.  He also said that he didn’t take issue with anything I wrote here, except that I mistakenly referred to him as Andy rather than Andrew.  In response, I congratulated Andrew on his achievement; expressed how happy I was that—ironically—his work is now finally getting some of the attention that it deserves; and promised to buy him a beer when and if I’m ever in Edinburgh, a city I’ve always wanted to visit.  (On the other hand, I warned Andrew that his LinkedIn profile, which unselfconsciously mentions improvements to his Word and Excel skills as one of the benefits of his PhD research breaching the Coppersmith-Winograd barrier, might have earned him a place in scientific folklore forever!)

In summary, I now see Andrew as an extraordinarily nice fellow who had some bad luck and—most conspicuously—a lack of good advice from people around him.  I do stand by the points that I was originally trying to make:

(a) that this tangled situation shouldn’t in any way detract from Virgi’s fantastic achievement, which (except for a simplification, as she discusses) must be considered completely independent of Andrew’s, and

(b) that there’s indeed an important cautionary lesson for students here, about adequately publicizing your work (yes, there’s a happy medium, between hiring a PR firm to wage a viral marketing campaign and burying your solution to a longstanding open problem so far in the body of your PhD thesis that even world experts in the subject who read your thesis will miss it).

On the other hand, I hereby apologize for anything I said that could even be perceived as slighting Andrew, his important work, or his motives.

Another Update: On the third hand, if you’re one of the commenters whose beef is not about attribution, but about the entire concept of using a CS theory blog to “promote” major milestones in CS theory (like the breaking of the Coppersmith-Winograd barrier), then I apologize for absolutely nothing.  Go read an economics or physics blog; I understand that those are entirely hype-free.  Better yet, go to hell.

In Defense of Kolmogorov Complexity

Tuesday, September 27th, 2011

I got lots of useful and interesting feedback on my last post, though I also learned a valuable sociological lesson about the “two kinds of complexity theory”:

If you write about the kind of complexity theory that involves acronyms like NP, BQP/qpoly, and r.s.r., people will think the issues must be difficult and arcane, even if they’re not and can be understood with very little effort.  By contrast, if you write about the kind of complexity theory that can be illustrated using pictures of coffee cups, people will think the issues can be sorted out with 15 seconds of thought, and will happily propose ‘solutions’ that presuppose what needs to be explained, answer a different question, or fail in simple examples.

Seriously, a large number of commenters raised two important questions, which I’d like to address forthwith in this followup post.

The first question is why I omitted the notion of coarse-graining, which plays a central role in many accounts of entropy and complexity. The short answer is that I shouldn’t have omitted it.  In fact, as both Sean Carroll and Luca Trevisan (among others) quickly pointed out, one can tell a perfectly-reasonable story about the coffee cup by defining the “complextropy,” not in terms of sophistication, but in terms of the ordinary Kolmogorov complexity of a coarse-grained or “smeared-out” state.  If you define the complextropy that way, it should increase and then decrease as desired, and furthermore, it’s probably easier to prove that statement than using the sophistication-based definition (though both versions seem highly nontrivial to analyze).

So, the reason I turned to sophistication was basically just the mathematician’s instinct to situate every concept in the most general structure where that concept makes sense.  For example, why define “connectedness” for polygons in the Euclidean plane, if the concept makes sense for arbitrary topological spaces?  Or in our case, why define “complextropy” for dynamical systems that happen to have a spatial structure over which one can coarse-grain, if the concept also makes sense for arbitrary dynamical systems whose evolution is computable by an efficient algorithm?  Of course, [OPEN PROBLEM ALERT] it would be wonderful to know whether the two types of complextropy can be shown to be related for those dynamical systems for which they both make sense, or whether we can construct a convincing example that separates the two.

The second question is why I invoked Kolmogorov complexity in a discussion about thermodynamics: many people seemed to think that, by doing so, I was making some novel or controversial claim.  I wasn’t.  People like Charles Bennett, Seth Lloyd, and Wojciech Zurek have employed Kolmogorov complexity as a useful language for thermodynamics since the 1980s; I was simply following in their footsteps.  Basically, what Kolmogorov complexity lets you do is talk in a well-defined way about the “entropy” or “randomness” of an individual object, without reference to any ensemble from which the object was drawn.  And this is often extremely convenient: notice that Kolmogorov complexity snuck its way in even when we defined complextropy in terms of coarse-graining!

Of course, if our dynamical system is probabilistic, then we always can talk instead about the “actual” entropy; in that case Kolmogorov complexity basically just amounts to a shorthand.  On the other hand, if our system is deterministic, then talking about the (resource-bounded) Kolmogorov complexity seems essential—since in that case there’s no “true” randomness at all, only pseudorandomness.

But a few commenters went further, disparaging Kolmogorov complexity itself rather than just its application to a particular problem.  Here’s Shtetl-Optimized regular Raoul Ohio:

As usual, my DAH (Devil’s Advocate Hat) is on. This is convenient, because it allows you to comment on anything without doing the work to really understanding it. Thus I will proceed to disparage the notion of using Kolmogorov Complexity (KC) for anything but entertainment.

Math is a subject where a couple of interesting definitions and a few theorems can launch a subfield such as KC. I have never studied KC … but a brief reading of the subject suggests that it started as a joke, and today a lot of people are not in on it.

… the KC of things would change as knowledge in other fields progresses. For example, what is the KC of

δ = 4.66920160910299067185320382…, and

α = 2.502907875095892822283902873218… ?

These are Feigenbaum’s constants (http://en.wikipedia.org/wiki/Feigenbaum_constants). A couple of decades ago, no one knew anything about these numbers. With the concept of analyzing discrete dynamical systems by bifurcation diagrams in hand, these can be calculated with a short program. So, did KC(δ) and KC(α) drop dramatically 20 odd years ago?

…using KC reminds me of physics arguments that use the wave function for the universe. Sure, there must be such a thing, but it is hard to say much about it.

On the other side of the coin, the theorems and proofs in basic KC are rather similar to those in many fields of TCS, and many SO [Shtetl-Optimized] readers might not think of these as a joke…

My intuition is that the entire concept of KC is “ill-posed”, to borrow a term from PDE.

In the interest of “full disclosure”, I must mention that often in the past I have thought some topic was a bunch of hooey until I understood it, after which I thought is was profound, just like listening to Lenard [sic] Cohen.

I wrote a reply to Raoul, and then decided that it should go into a top-level post, for the edification of Kolmogorov-skeptics everywhere.  So without further ado:

Hi Raoul!

I think this is indeed one of those cases where if you understood more, you’d see why your dismissal was wrong. And unlike with (say) art, music, or religion, the reasons why your dismissal is wrong can be articulated in words!

Contrary to what you say, K(x) is not undefinable: I’ll define it right now, as the length of the shortest prefix-free program (in some fixed universal programming language) that prints x and then halts! K(x) is uncomputable, but that’s a very different issue, and something that’s been known since the 1960s.

Basically, what K(x) lets you do is give a clear, observer-independent meaning to the loose notion of there “not existing any patterns” in a string. Already from that statement, it’s obvious that K(x) is going to be hard to compute—for as you correctly point out, detecting the existence or nonexistence of patterns is hard!

(Though contrary to what you say, K(Feigenbaum’s constant) didn’t suddenly become small when Feigenbaum defined the constant, any more than 42038542390523059230 suddenly became composite when I wrote it down, probably for the first time in human history. Please don’t tell me that you make no distinction between mathematical truths and our knowledge of them!)

The key point is that, even without being able to compute K(x) for most x’s, you can still use the definition of K(x) to give meaning to hundreds of intuitions that otherwise would’ve remained forever at a handwaving level. For example:

“The overwhelming majority of strings are patternless.”

“If a short computer program outputs a patternless string, then it can only be doing so by generating the string randomly.”

And many, many less obvious statements—every one of which can be upgraded to a theorem once you have a mathematical definition of “patternlessness”!

Furthermore, the idea of Kolmogorov complexity has actually inspired some important experimental work! For example, if you could compute K, then you could compute the “similarity” between two DNA sequences D1 and D2 by comparing K(D1)+K(D2) to K(D1,D2).

Of course you can’t compute K, but you can compute useful upper bounds on it. For example, let G(x) be the number of bits in the gzip compression of the string x. Then comparing G(D1)+G(D2) to G(D1,D2) has turned out to be a very useful way to measure similarity between DNA sequences.

It’s really no different from how, even though we can never say whether a curve in the physical world is continuous or not (since that would require infinitely precise measurements), the mathematical theories dealing with continuity (e.g., calculus, topology) can still be applied in physics in all sorts of ways.

The First Law of Complexodynamics

Friday, September 23rd, 2011

A few weeks ago, I had the pleasure of attending FQXi’s Setting Time Aright conference, part of which took place on a cruise from Bergen, Norway to Copenhagen, Denmark.  (Why aren’t theoretical computer science conferences ever held on cruises?  If nothing else, it certainly cuts down on attendees sneaking away from the conference venue.)  This conference brought together physicists, cosmologists, philosophers, biologists, psychologists, and (for some strange reason) one quantum complexity blogger to pontificate about the existence, directionality, and nature of time.  If you want to know more about the conference, check out Sean Carroll’s Cosmic Variance posts here and here.

Sean also delivered the opening talk of the conference, during which (among other things) he asked a beautiful question: why does “complexity” or “interestingness” of physical systems seem to increase with time and then hit a maximum and decrease, in contrast to the entropy, which of course increases monotonically?

My purpose, in this post, is to sketch a possible answer to Sean’s question, drawing on concepts from Kolmogorov complexity.  If this answer has been suggested before, I’m sure someone will let me know in the comments section.

First, some background: we all know the Second Law, which says that the entropy of any closed system tends to increase with time until it reaches a maximum value.  Here “entropy” is slippery to define—we’ll come back to that later—but somehow measures how “random” or “generic” or “disordered” a system is.  As Sean points out in his wonderful book From Eternity to Here, the Second Law is almost a tautology: how could a system not tend to evolve to more “generic” configurations?  if it didn’t, those configurations wouldn’t be generic!  So the real question is not why the entropy is increasing, but why it was ever low to begin with.  In other words, why did the universe’s initial state at the big bang contain so much order for the universe’s subsequent evolution to destroy?  I won’t address that celebrated mystery in this post, but will simply take the low entropy of the initial state as given.

The point that interests us is this: even though isolated physical systems get monotonically more entropic, they don’t get monotonically more “complicated” or “interesting.”  Sean didn’t define what he meant by “complicated” or “interesting” here—indeed, defining those concepts was part of his challenge—but he illustrated what he had in mind with the example of a coffee cup.  Shamelessly ripping off his slides:

Entropy increases monotonically from left to right, but intuitively, the “complexity” seems highest in the middle picture: the one with all the tendrils of milk.  And same is true for the whole universe: shortly after the big bang, the universe was basically just a low-entropy soup of high-energy particles.  A googol years from now, after the last black holes have sputtered away in bursts of Hawking radiation, the universe will basically be just a high-entropy soup of low-energy particles.  But today, in between, the universe contains interesting structures such as galaxies and brains and hot-dog-shaped novelty vehicles.  We see the pattern:

In answering Sean’s provocative question (whether there’s some “law of complexodynamics” that would explain his graph), it seems to me that the challenge is twofold:

1. Come up with a plausible formal definition of “complexity.”
2. Prove that the “complexity,” so defined, is large at intermediate times in natural model systems, despite being close to zero at the initial time and close to zero at late times.

To clarify: it’s not hard to explain, at least at a handwaving level, why the complexity should be close to zero at the initial time.  It’s because we assumed the entropy is close to zero, and entropy plausibly gives an upper bound on complexity.  Nor is it hard to explain why the complexity should be close to zero at late times: it’s because the system reaches equilibrium (i.e., something resembling the uniform distribution over all possible states), which we’re essentially defining to be simple.  At intermediate times, neither of those constraints is operative, and therefore the complexity could become large.  But does it become large?  How large?  How could we predict?  And what kind of “complexity” are we talking about, anyway?

After thinking on and off about these questions, I now conjecture that they can be answered using a notion called sophistication from the theory of Kolmogorov complexity.  Recall that the Kolmogorov complexity of a string x is the length of the shortest computer program that outputs x (in some Turing-universal programming language—the exact choice can be shown not to matter much).  Sophistication is a more … well, sophisticated concept, but we’ll get to that later.

As a first step, let’s use Kolmogorov complexity to define entropy.  Already it’s not quite obvious how to do that.  If you start, say, a cellular automaton, or a system of billiard balls, in some simple initial configuration, and then let it evolve for a while according to dynamical laws, visually it will look like the entropy is going up.  But if the system happens to be deterministic, then mathematically, its state can always be specified by giving (1) the initial state, and (2) the number of steps t it’s been run for.  The former takes a constant number of bits to specify (independent of t), while the latter takes log(t) bits.  It follows that, if we use Kolmogorov complexity as our stand-in for entropy, then the entropy can increase at most logarithmically with t—much slower than the linear or polynomial increase that we’d intuitively expect.

There are at least two ways to solve this problem.  The first is to consider probabilistic systems, rather than deterministic ones.  In the probabilistic case, the Kolmogorov complexity really does increase at a polynomial rate, as you’d expect.  The second solution is to replace the Kolmogorov complexity by the resource-bounded Kolmogorov complexity: the length of the shortest computer program that outputs the state in a short amount of time (or the size of the smallest, say, depth-3 circuit that outputs the state—for present purposes, it doesn’t even matter much what kind of resource bound we impose, as long as the bound is severe enough).  Even though there’s a computer program only log(t) bits long to compute the state of the system after t time steps, that program will typically use an amount of time that grows with t (or even faster), so if we rule out sufficiently complex programs, we can again get our program size to increase with t at a polynomial rate.

OK, that was entropy.  What about the thing Sean was calling “complexity”—which, to avoid confusion with other kinds of complexity, from now on I’m going to call “complextropy”?  For this, we’re going to need a cluster of related ideas that go under names like sophistication, Kolmogorov structure functions, and algorithmic statistics.  The backstory is that, in the 1970s (after introducing Kolmogorov complexity), Kolmogorov made an observation that was closely related to Sean’s observation above.  A uniformly random string, he said, has close-to-maximal Kolmogorov complexity, but it’s also one of the least “complex” or “interesting” strings imaginable.  After all, we can describe essentially everything you’d ever want to know about the string by saying “it’s random”!  But is there a way to formalize that intuition?  Indeed there is.

First, given a set S of n-bit strings, let K(S) be the number of bits in the shortest computer program that outputs the elements of S and then halts.  Also, given such a set S and an element x of S, let K(x|S) be the length of the shortest program that outputs x, given an oracle for testing membership in S.  Then we can let the sophistication of x, or Soph(x), be the smallest possible value of K(S), over all sets S such that

1. x∈S and
2. K(x|S) ≥ log2(|S|) – c, for some constant c.  (In other words, one can distill all the “nonrandom” information in x just by saying that x belongs that S.)

Intuitively, Soph(x) is the length of the shortest computer program that describes, not necessarily x itself, but a set S of which x is a “random” or “generic” member.  To illustrate, any string x with small Kolmogorov complexity has small sophistication, since we can let S be the singleton set {x}.  However, a uniformly-random string also has small sophistication, since we can let S be the set {0,1}n of all n-bit strings.  In fact, the question arises of whether there are any sophisticated strings!  Apparently, after Kolmogorov raised this question in the early 1980s, it was answered in the affirmative by Alexander Shen (for more, see this paper by Gács, Tromp, and Vitányi).  The construction is via a diagonalization argument that’s a bit too complicated to fit in this blog post.

But what does any of this have to do with coffee cups?  Well, at first glance, sophistication seems to have exactly the properties that we were looking for in a “complextropy” measure: it’s small for both simple strings and uniformly random strings, but large for strings in a weird third category of “neither simple nor random.”  Unfortunately, as we defined it above, sophistication still doesn’t do the job.  For deterministic systems, the problem is the same as the one pointed out earlier for Kolmogorov complexity: we can always describe the system’s state after t time steps by specifying the initial state, the transition rule, and t.  Therefore the sophistication can never exceed log(t)+c.  Even for probabilistic systems, though, we can specify the set S(t) of all possible states after t time steps by specifying the initial state, the probabilistic transition rule, and t.  And, at least assuming that the probability distribution over S(t) is uniform, by a simple counting argument the state after t steps will almost always be a “generic” element of S(t).  So again, the sophistication will almost never exceed log(t)+c.  (If the distribution over S(t) is nonuniform, then some technical further arguments are needed, which I omit.)

How can we fix this problem?  I think the key is to bring computational resource bounds into the picture.  (We already saw a hint of this in the discussion of entropy.)  In particular, suppose we define the complextropy of an n-bit string x to be something like the following:

the number of bits in the shortest computer program that runs in n log(n) time, and that outputs a nearly-uniform sample from a set S such that (i) x∈S, and (ii) any computer program that outputs x in n log(n) time, given an oracle that provides independent, uniform samples from S, has at least log2(|S|)-c bits, for some constant c.

Here n log(n) is just intended as a concrete example of a complexity bound: one could replace it with some other time bound, or a restriction to (say) constant-depth circuits or some other weak model of computation.  The motivation for the definition is that we want some “complextropy” measure that will assign a value close to 0 to the first and third coffee cups in the picture, but a large value to the second coffee cup.  And thus we consider the length of the shortest efficient computer program that outputs, not necessarily the target string x itself, but a sample from a probability distribution D such that x is not efficiently compressible with respect to D.  (In other words, x looks to any efficient algorithm like a “random” or “generic” sample from D.)

Note that it’s essential for this definition that we imposed a computational efficiency requirement in two places: on the sampling algorithm, and also on the algorithm that reconstructs x given the sampling oracle.  Without the first efficiency constraint, the complextropy could never exceed log(t)+c by the previous argument.  Meanwhile, without the second efficiency constraint, the complextropy would increase, but then it would probably keep right on increasing, for the following reason: a time-bounded sampling algorithm wouldn’t be able to sample from exactly the right set S, only a reasonable facsimile thereof, and a reconstruction algorithm with unlimited time could probably then use special properties of the target string x to reconstruct x with fewer than log2(|S|)-c bits.

But as long as we remember to put computational efficiency requirements on both algorithms, I conjecture that the complextropy will satisfy the “First Law of Complexodynamics,” exhibiting exactly the behavior that Sean wants: small for the initial state, large for intermediate states, then small again once the mixing has finished.  I don’t yet know how to prove this conjecture.  But crucially, it’s not a hopelessly open-ended question that one tosses out just to show how wide-ranging one’s thoughts are, but a relatively-bounded question about which actual theorems could be proved and actual papers published.

If you want to do so, the first step will be to “instantiate” everything I said above with a particular model system and particular resource constraints.  One good choice could be a discretized “coffee cup,” consisting of a 2D array of black and white pixels (the “coffee” and “milk”), which are initially in separated components and then subject to random nearest-neighbor mixing dynamics.  (E.g., at each time step, we pick an adjacent coffee pixel and milk pixel uniformly at random, and swap the two.)  Can we show that for such a system, the complextropy becomes large at intermediate times (intuitively, because of the need to specify the irregular boundaries between the regions of all-black pixels, all-white pixels, and mixed black-and-white pixels)?

One could try to show such a statement either theoretically or empirically.  Theoretically, I have no idea where to begin in proving it, despite a clear intuition that such a statement should hold: let me toss it out as a wonderful (I think) open problem!  At an empirical level, one could simply try to plot the complextropy in some simulated system, like the discrete coffee cup, and show that it has the predicted small-large-small behavior.   One obvious difficulty here is that the complextropy, under any definition like the one I gave, is almost certainly going to be intractable to compute or even approximate.  However, one could try to get around that problem the same way many others have, in empirical research inspired by Kolmogorov complexity: namely, by using something you can compute (e.g., the size of a gzip compressed file) as a rough-and-ready substitute for something you can’t compute (e.g., the Kolmogorov complexity K(x)).  In the interest of a full disclosure, a wonderful MIT undergrad, Lauren Oullette, recently started a research project with me where she’s trying to do exactly that.  So hopefully, by the end of the semester, we’ll be able to answer Sean’s question at least at a physics level of rigor!  Answering the question at a math/CS level of rigor could take a while longer.

PS (unrelated). Are neutrinos traveling faster than light?  See this xkcd strip (which does what I was trying to do in the Deolalikar affair, but better).