## Experimental complexity theory

I just came back from the MIT CSAIL (Computer Science and Artificial Intelligence Lab) annual meeting, which was held at a beach resort in Cape Cod. No, it isn’t California, but for at least a few months a year “my” coast can put up a respectable showing too:

Out of all the ideas I heard at the CSAIL meeting, the one that made me proudest to have become a professor was this: computer scientists should make a serious effort to address world hunger, deforestation, climate change, and other global crises, because of the significant opportunities to tap funding resources that are becoming available in these areas. I’m telling you, if a giant asteroid were going to hit the earth in a week, the first question academics would ask would be how to beat out competing proposals for the $50-million “Deflection of Space-Based Objects” initiative at NSF.

The meeting ended with a “Wild & Crazy Ideas Session,” at which I (naturally) spoke. I briefly considered talking about quantum gravity computing, closed timelike curves, or quantum anthropic postselection, but ultimately decided on something a little less mainstream. My topic was “Experimental Computational Complexity Theory,” or “why do theoretical physicists get $8-billion machines for the sole purpose of confirming or refuting their speculative ideas, whereas theoretical computer scientists get diddlysquat?” More concretely, my proposal is to devote some of the world’s computing power to an all-out attempt to answer questions like the following: does computing the permanent of a 4-by-4 matrix require more arithmetic operations than computing its determinant? You can read my slides here.
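To make the flavor of such an experiment concrete, here is a toy version (not from the talk, with parameter choices of my own): an exhaustive search over straight-line arithmetic programs for one computing the 2-by-2 permanent, perm = ad + bc, in three operations. For the 4-by-4 permanent it is the analogous search space that blows up astronomically.

```python
# Toy experiment: exhaustively search all straight-line arithmetic
# programs of length 3 for one computing the 2x2 permanent,
# perm(a,b,c,d) = a*d + b*c.  A candidate is accepted if it agrees
# with the target at random points modulo a prime (Schwartz-Zippel).
import itertools
import random

PRIME = 10**9 + 7
N_INPUTS = 4    # the entries a, b, c, d of a 2x2 matrix
N_TRIALS = 10   # random evaluation points per candidate program

def target(vals):
    a, b, c, d = vals
    return (a * d + b * c) % PRIME   # the 2x2 permanent

OPS = {
    '+': lambda x, y: (x + y) % PRIME,
    '-': lambda x, y: (x - y) % PRIME,
    '*': lambda x, y: (x * y) % PRIME,
}

def run(prog, point):
    """Evaluate a straight-line program on one input point."""
    vals = list(point)
    for op, i, j in prog:
        vals.append(OPS[op](vals[i], vals[j]))
    return vals[-1]

def search(length=3):
    """Return the first program of the given length agreeing with the
    target polynomial at N_TRIALS random points."""
    points = [[random.randrange(PRIME) for _ in range(N_INPUTS)]
              for _ in range(N_TRIALS)]
    wanted = [target(p) for p in points]

    def gates(k):
        # gate k may combine any two of the N_INPUTS + k earlier values
        n_vals = N_INPUTS + k
        return [(op, i, j) for op in OPS
                for i in range(n_vals) for j in range(n_vals)]

    for prog in itertools.product(*(gates(k) for k in range(length))):
        if all(run(prog, p) == w for p, w in zip(points, wanted)):
            return prog
    return None

prog = search()
print(prog)   # a 3-operation program for a*d + b*c: two * gates and one +
```

Even this toy search examines a few hundred thousand candidate programs; the engineering content of the proposal is pruning such spaces by many orders of magnitude.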

### 57 Responses to “Experimental complexity theory”

1. Nagesh Says:

Scott, are you proposing (as in Avi Wigderson’s remark about algorithms being sanity checks) that huge experimental work should be undertaken just as a sanity check on lower bounds? Or is there something more to your proposal?

2. Nagesh Says:

I think it’s more promising to undertake such huge experiments for speculative ideas rather than for sanity checks of negative results.

3. the reader from Istanbul Says:

Did Wigderson say that, or was it Sipser? Or both? Anybody know?

4. Scott Says:

Nagesh: No, the hope would be that experimental results could help us prove lower bounds. For example, if we found that minimal circuits for the permanent always had a certain structure, we could then try to prove that structure was necessary and that it forced the size to increase superpolynomially with n.

To be clear, I see this as an extreme long-shot proposal, for the reasons laid out in the talk. But right now, theoretical computer science operates almost entirely without the sort of data that theoretical physics takes for granted (or took for granted until recently), and I think we should at least consider the possibility that the limit of n→infinity can be fruitfully approached from below, in the same way the Planck scale can.

5. rrtucci Says:

experimental complexity theory == computer hardware & software

6. lylebot Says:

This is exactly what I do whenever I need to prove something. I was pretty surprised to see in your slides that the theorists’ conventional wisdom is that you can’t learn anything using the methods I use all the time. I’m an IR guy, not a theorist, though, so maybe I didn’t get the cultural conditioning that it’s supposed to be worthless? Or maybe I’m just not smart enough to prove something without looking at examples?

7. Nagesh Says:

Oh, now I get it. So the data acquired with such experiments might reveal insights into the hardness that can lead to actual lower bound proofs.

In fact, this is the approach most applied researchers in CS (like those in data mining, computer vision, etc.) take for inventing their so-called “theories.” Since you want this approach for negative results, the experiments ought to be exhaustive and huge.

Thanks a lot for the explanation, Scott.

8. Nagesh Says:

Did Wigderson say that, or was it Sipser?

It was Sipser, if Bill Gasarch isn’t wrong. Sorry for the mistake. I don’t know whether Wigderson also said it, though.

9. Scott Says:

experimental complexity theory == computer hardware & software

No, that equation completely fails. Complexity theory asks about the best algorithms that could possibly exist. Ordinary hardware and software tells us nothing about such questions, nor is it designed to. Some software (like RSA) is certainly applied complexity theory, but that’s different from experimental complexity theory.

10. Scott Says:

lylebot: Yes, even pure mathematicians often resort to experiment (more often than they admit!) when they want to guess at the truth or falsity of a conjecture. But complexity theory is a special branch of mathematics — one that asks about the asymptotic behavior, not of some particular algorithm or a typical algorithm, but of the best algorithm that could possibly exist. And the set of possible algorithms is both huge beyond imagination and lacking in any simple characterization. That’s why, in the past, experimental work has not been a big help to complexity theory: because the amount of computation that you’d have to do to learn anything interesting is so astronomical. My “wild & crazy” proposal is that we at least ask ourselves whether that’s still true, given both the better understanding and the vastly increased computing power that we have today.

11. Anonymous Says:

But complexity theory is a special branch of mathematics

Thank you, Scott, for finally admitting this! 🙂

12. Scott Says:

13. Anonymous Says:

Even better, of course! I take it, then, that you would not consider it unreasonably presumptuous on the part of mathematicians were they to adopt the following as their slogan:

“Mathematics: the field that contains complexity theory”

(thereby subsuming complexity theory into mathematics in the same way that you yourself would subsume, well, the entire universe into complexity theory!)

14. Scott Says:

Universe ⊆ Complexity
Complexity ⊆ Math
Math ⊆ Universe

15. Jair Says:

So universe = math? That’s what I’ve been trying to tell people for a long time.

16. Not even right Says:

More concretely, my proposal is to devote some of the world’s computing power to an all-out attempt to answer questions like the following: does computing the permanent of a 4-by-4 matrix require more arithmetic operations than computing its determinant? You can read my slides here.

I hadn’t heard of the permanent of a matrix, but Scott just told us about it in his PowerPoint slides. The difference between it and the determinant of a matrix is the absence of the minus signs when doing the multiplication and addition of the matrix elements. It is very surprising to me to be told that we haven’t found a program that calculates the permanent of a 4×4 matrix with minimal steps, though we know the minimal steps for calculating its corresponding determinant.
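Spelled out in code, from the standard definitions (a sketch for illustration, not from the comment): both quantities sum a product over every permutation, and the only difference is that the determinant weights each term by the permutation’s sign.

```python
# The determinant and the permanent from their definitions: both sum a
# product over all permutations; the determinant attaches the sign of
# each permutation, the permanent does not.
from itertools import permutations
from math import prod

def sign(p):
    """Sign of a permutation: (-1) raised to the number of inversions."""
    inv = sum(1 for i in range(len(p)) for j in range(i + 1, len(p))
              if p[i] > p[j])
    return -1 if inv % 2 else 1

def det(M):
    n = len(M)
    return sum(sign(p) * prod(M[i][p[i]] for i in range(n))
               for p in permutations(range(n)))

def perm(M):
    n = len(M)
    return sum(prod(M[i][p[i]] for i in range(n))
               for p in permutations(range(n)))

M = [[1, 2], [3, 4]]
print(det(M), perm(M))   # -2 10
```

Both cost about n!·n operations written this way; the point of the whole discussion is that the determinant drops to O(n^3) via elimination, while for the permanent (which is #P-hard) no comparably fast method is known.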

17. Scott Says:

though we know the minimal steps for calculating its corresponding determinant.

No, we don’t even know that. 🙁

18. Eric Says:

Several people have tried (since Brent’s 1970 Master’s thesis, at least) to discover fast algorithms for small matrix multiplication by brute force (by restricting to bilinear algorithms, which makes things simpler). As far as I know, there was little success.

BTW, for the 4×4 determinant, you can save 1 operation by using Cramer’s formulas for the inner 3×3 one.
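For readers unfamiliar with the term: a bilinear algorithm computes each output entry as a sum of products, where each product multiplies one linear combination of entries of A by one linear combination of entries of B. Strassen’s classical 2×2 scheme, using 7 multiplications instead of 8, is the standard example (a well-known result, sketched here for illustration):

```python
def strassen_2x2(A, B):
    """Strassen's scheme: 7 multiplications (instead of 8) for a 2x2
    matrix product.  It is bilinear: each m_k multiplies one linear
    combination of entries of A by one linear combination of entries
    of B."""
    (a, b), (c, d) = A
    (e, f), (g, h) = B
    m1 = (a + d) * (e + h)
    m2 = (c + d) * e
    m3 = a * (f - h)
    m4 = d * (g - e)
    m5 = (a + b) * h
    m6 = (c - a) * (e + f)
    m7 = (b - d) * (g + h)
    return [[m1 + m4 - m5 + m7, m3 + m5],
            [m2 + m4, m1 - m2 + m3 + m6]]

print(strassen_2x2([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```

Restricting the search to bilinear algorithms, as Eric describes, replaces the space of all straight-line programs by the much smaller (though still enormous) space of such coefficient choices.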

19. Stas Says:

Indeed, why jump to #P-hard problems (such as the permanent) if we don’t yet know how to derive lower bounds for problems in P? For instance, I would love to know whether the complexity of matrix multiplication is O(n^2). There is also the other side of the coin: optimal algorithms/minimal circuits for tractable problems may open up new algorithmic ideas not yet discovered by human creativity/intuition. Then this may lead to new approaches to hard problems, and those may turn out not to be hard at all… This is why I tend to take the possibility that P=NP seriously: we know only very few techniques leading to efficient algorithms, and discovering one of them takes ages (like AKS for PRIMES). So I don’t agree with the Bayesian argument that if there were a polynomial algorithm for NPC we would have discovered it, thus P!=NP. Simply put, we are not efficient at discovering efficient algorithms, and if there is a chance to power ourselves up by performing a complete automated search for them, it may help a lot. Besides, positive results are more useful for practical applications, so assuming equal probability of success, I would rather go for a positive result.

20. Scott Says:

Stas: Nontrivial lower bounds for P problems are wonderful to whatever extent we can get them, and this is universally understood by complexity theorists. On the other hand, sometimes lower bounds for NP and #P problems are (perhaps not surprisingly) easier to prove. As an example, Fortnow and van Melkebeek showed that SAT can’t be solved simultaneously in n^{1.618…} time and n^{o(1)} space (1.618… being the golden ratio). To my knowledge, we don’t currently have any similar result for a P problem. Likewise, we know that there exist problems in #P that don’t have linear-size circuits, but we don’t know the same for P (or for that matter NP).

21. rrtucci Says:

Okay. I retract

experimental complexity theory == computer hardware & software

and replace it with

\mu(experimental complexity theory − computer hardware & software) = 0

where \mu is MY own, well-defined measure.

Scott, you yourself, when pressed for an example of experimental complexity theory, talk about “devoting some of the world’s computing power” instead of “devoting some of the world’s pencil resources.” I’m glad you are duly kowtowing to computer hardware and software. We computer programmers rule the world. Without us, you complexity theorists are utterly unattractive in the eyes of the world.

22. gsgs Says:

The whole theoretical discussion ignores constant factors, but in practice these are important. Isn’t this all a bit too distant from practice?

I’d prefer to see computational bounds in seconds(n) rather than bounds with several parameters and epsilons.

23. rrtucci Says:

correction:
Instead of “−” (the difference of the two sets), I meant their symmetric difference.

24. Stas Says:

Nontrivial lower bounds for P problems are wonderful to whatever extent we can get them, and this is universally understood by complexity theorists. On the other hand, sometimes lower bounds for NP and #P problems are (perhaps not surprisingly) easier to prove.

Thank you, Scott, I got it — the harder the problem, the easier the derivation of a superlinear lower bound.

As an example, Fortnow and van Melkebeek showed that SAT can’t be solved simultaneously in n^{1.618…} time and n^{o(1)} space (1.618… being the golden ratio).

I just wonder if there is a similar result for my favorite NP problem CLIQUE :).

To my knowledge, we don’t currently have any similar result for a P problem.

It makes the experimental search for them even more exciting…

25. Klas M. Says:

Eric:
Actually, I and two of my coauthors have been working on a complete computer search for matrix multiplication algorithms for rectangular matrices. So far we have mostly been working with cases where the optimal number of multiplications is known, but for some of these cases we have now generated all optimal methods. For one case we have in this way also proven that the best known method is in fact optimal.
No paper is available yet, but we are writing it now.

26. Scott Says:

Klas: When you have a preprint, send it to me!

27. Scott Says:

the whole theoretical discussion ignores constant factors, but in practice these are important.

If you’re searching for an optimal algorithm for 4-by-4 matrices, it’s obvious that the whole problem boils down to constant factors, and I say as much in the slides. Specifically, the problem is to reduce the number of cases to check from 10^123 down to some manageable number.

28. rrtucci Says:

Scott said

“More concretely, my proposal is to devote some of the world’s computing power to an all-out attempt to answer questions like the following: does computing the permanent of a 4-by-4 matrix require more arithmetic operations than computing its determinant?”

Scott said

“Complexity theory asks about the best algorithms that could possibly exist. Ordinary hardware and software tells us nothing about such questions, nor is it designed to.”

29. Klas M. Says:

I’ll be happy to do that, Scott. I hope we will have the preprint written up by the end of summer, depending on conference travel, vacations and so on.

30. Scott Says:

rrtucci, we seem to be talking past each other.

Computer hardware and software can be used to do experimental research in complexity theory, just as magnets can be used to do experimental research in high-energy physics. But experimental high-energy physics is not about magnets, and experimental complexity theory would not be about existing hardware and software.

31. Koray Says:

Scott, do you mechanize any of your proofs? Because I’d like to see huge amounts of computing power burned to generate a google-sized database of lemmas.

32. Rahul Says:

I guess Scott means a natural P problem…

33. Scott Says:

Rahul: Yes, thanks for the clarification. Stuffing an n^k-time Turing machine into the problem statement is definitely not allowed.

34. Ernesto Says:

Hi Scott,
Did your presentation only have 8 slides? That was the number of slides in the file linked from your post.
Cheers

35. Scott Says:

Ernesto: Yes, and I didn’t even get to finish it. I was allotted 4 minutes, plus 1 minute for questions.

36. rrtucci Says:

Okay, I’ll say my last word about this and then I’ll shut up. Scott, you are fond of making a distinction between A is_used_by_or_uses B and A is_not_strictly_about B. You’ll say Complexity Theory is_not_strictly_about computer hardware or software. Or Experimental Complexity Theory is_not_strictly_about computer hardware or software. Or

“experimental complexity theory == computer hardware & software
No, that equation completely fails. Complexity theory asks about the best algorithms that could possibly exist. Ordinary hardware and software tells us nothing about such questions, nor is it designed to.”

Your last assertion might be true in the is_not_strictly_about sense, but it is false in the is_used_by_or_uses sense. I was of course talking in the is_used_by_or_uses sense, which is the sense most people would use to interpret a statement such as “experimental complexity theory == computer hardware & software”.

I also think that the A is_not_strictly_about B is a pedantic distinction, invented by and used almost exclusively by complexity theorists to distance themselves from computer programmers, because they consider themselves high above such trades. Of course, they are living in a bubble; the vast majority of the world admires a computer programmer much more than a complexity-what? I mean, so what if complexity-what?s eventually prove that P != NP. Everyone knew that all along. If I were a complexity theorist, I would stop making these true but pointless is_not_strictly_about distinctions, which make non-complexity theorists yawn. I would instead model myself on Donald Knuth, who is an excellent mathematician and a very eager programmer. He is not concerned with these artificial class distinctions.

37. IC Says:

Of course, they are living in a bubble; the vast majority of the world admires a computer programmer much more than a complexity-what? I mean, so what if complexity-what?s eventually prove that P != NP.

Apparently, being admired by the masses is important to you. But it’s not a priority for most complexity theorists and mathematicians. They would rather have their work admired by peers than by the general public, who have very little understanding of, and therefore appreciation for, theoretical research.

Having said that, however, I would have to agree with your last two sentences. The distinction between pure and applied mathematics is an artificial one, and a good mathematician (including the complexity theorist) should not look down on applications.

38. Scott Says:

I’m amused by people who’ve somehow gotten the idea that I look down on applications. I worked on applied projects for four years as an undergrad and summer student at Bell Labs, and just this week I started working with a student to implement the ideas in my “Learnability of Quantum States” paper, and turn them into a practical tool for experimental physicists.

39. IC Says:

Scott, I wasn’t referring to you in particular; it was a general statement about the attitude of some theorists, that’s all. 🙂

40. Daniel Says:

Good idea Scott. I would suggest, however, collaborating with researchers working on theorem provers if you are serious about this. I’m guessing that if the “experimental complexity” you describe ever comes to be, those involved would build something resembling a theorem prover running on a massive computer cluster.

41. Anon Says:

Does anyone know offhand the respective costs for 3×3 matrices? Also, does one want to treat multiplication and addition on an equal footing for these problems?

42. Scott Says:

Anon: Excellent questions!

The situation already looks nontrivial for 3-by-3 matrices — the exact answers might be known, but not by me. I know that 14 operations are sufficient, and I conjecture that that’s optimal. I can show that 9 operations are necessary (i.e., 8 don’t suffice) :-). I’ll be extremely interested if anyone has better upper or lower bounds.

No, there’s no need to treat multiplication and addition gates on an equal footing — I was doing so for simplicity, but it would also be extremely interesting to know the tradeoffs. Indeed, we could even look at the three-way tradeoff among addition, multiplication, and division gates.

(Gaussian elimination of course requires division gates. We can always replace division by addition and multiplication using Taylor series, but to do so we need to be working over either a field of small finite characteristic, or else over the reals or complex numbers where we’re willing to tolerate small errors. And at least for now, I’d prefer to look for determinant circuits that are formally correct regardless of the underlying field.)
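As an aside on the division question: one well-known middle ground is Bareiss’s fraction-free elimination, in which every division performed is exact over the integers, so no Taylor-series tricks or error tolerance are needed. (It still uses division gates, so it does not meet the field-independence desideratum above.) A minimal sketch, assuming integer entries:

```python
def bareiss_det(M):
    """Determinant by Bareiss's fraction-free Gaussian elimination.
    Every division below is exact, so for an integer matrix the whole
    computation stays in the integers."""
    M = [row[:] for row in M]   # work on a copy
    n = len(M)
    sgn = 1     # tracks row swaps
    prev = 1    # previous pivot (divisor for the exact divisions)
    for k in range(n - 1):
        if M[k][k] == 0:
            # find a row below with a nonzero entry in column k
            for r in range(k + 1, n):
                if M[r][k] != 0:
                    M[k], M[r] = M[r], M[k]
                    sgn = -sgn
                    break
            else:
                return 0   # whole column is zero: singular matrix
        for i in range(k + 1, n):
            for j in range(k + 1, n):
                # the division by the previous pivot is exact
                M[i][j] = (M[i][j] * M[k][k] - M[i][k] * M[k][j]) // prev
            M[i][k] = 0
        prev = M[k][k]
    return sgn * M[n - 1][n - 1]

print(bareiss_det([[2, 3, 1], [4, 1, 5], [6, 2, 2]]))   # 52
```

The intermediate entries are themselves minors of the original matrix, which is what guarantees the divisions come out exact.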

43. Alon Rosen Says:

Scott,

Are you aware of Dodgson’s (a.k.a. Lewis Carroll’s) condensation method? It’s an alternative method for computing the determinant. It might not require fewer operations than Gaussian elimination, but it seems to be different from the methods you are considering in your presentation.

44. Eric Says:

Assuming you want to compute a polynomial, to remove divisions (and keep an exact result!) what you need is for your base field to have enough points (so that you can find a point where no denominator vanishes). Of course, there is an overhead in doing so, which is something like O(d log(d) loglog(d)), where d is the degree of the polynomial you want to compute.

45. Scott Says:

Alon: No, I didn’t know about Dodgson condensation! Does anyone know its complexity? Otherwise I’ll work it out myself.

46. Jonathan Vos Post Says:

This week I was teaching Order of Operations to my High School students. I quoted: “Ambition, Distraction, Uglification, and Derision.”

I asked if they knew the author that I cited, “Lewis Carroll.” One young lady said: “Yeah; he molested little girls.”

We went off on a long highly-charged tangent. Eventually, I explained who Charles Dodgson was, what his day job was, and I told the anecdote of his audience with Queen Victoria. And the book that he sent Her Majesty. That book containing part of what Scott Aaronson now wants to probe more deeply. Ummm, I mean probe down the rabbit hole. I mean…

Bareiss, E. H. “Sylvester’s Identity and Multistep Integer-Preserving Gaussian Elimination.” Math. Comput. 22, 565-578, 1968.

Bressoud, D. and Propp, J. “How the Alternating Sign Matrix Conjecture was Solved.” Not. Amer. Math. Soc. 46, 637-646, 1999.

Dodgson, C. L. “Condensation of Determinants, Being a New and Brief Method for Computing their Arithmetic Values.” Proc. Roy. Soc. Ser. A 15, 150-155, 1866.

Lotkin, M. “Note on the Method of Contractants.” Amer. Math. Soc. 55, 476-479, 1959.

Macmillan, R. H. “A New Method for the Numerical Evaluation of Determinants.” J. Roy. Aeronaut. Soc. 59, 772, 1955.

Robbins, D. P. and Rumsey, H. Jr. “Determinants and Alternating Sign Matrices.” Adv. Math. 62, 169-184, 1986.

Weisstein, Eric W. “Condensation.” From MathWorld–A Wolfram Web Resource. http://mathworld.wolfram.com/Condensation.html

47. David Speyer Says:

Scott — Dodgson condensation has complexity O(n^3), which is, if I recall correctly, the same as Gaussian elimination. It tends to be useful for matrices which have a high degree of symmetry, as you wind up performing the same computations repeatedly.
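For the curious, here is a minimal sketch of the condensation iteration David describes (assuming integer entries and nonzero interior entries at every stage; the entrywise divisions are exact by the Desnanot-Jacobi identity, and a real implementation must work around zero divisors):

```python
def dodgson_det(M):
    """Determinant by Dodgson condensation: repeatedly replace the
    matrix by the grid of its 2x2 consecutive minors, dividing each
    entry by the interior of the matrix from two stages earlier.
    Raises ZeroDivisionError if an interior entry is zero."""
    n = len(M)
    prev = [[1] * n for _ in range(n)]   # stage "-1": all ones
    cur = [row[:] for row in M]
    while len(cur) > 1:
        m = len(cur)
        nxt = [[(cur[i][j] * cur[i + 1][j + 1]
                 - cur[i][j + 1] * cur[i + 1][j]) // prev[i + 1][j + 1]
                for j in range(m - 1)]
               for i in range(m - 1)]
        prev, cur = cur, nxt
    return cur[0][0]

print(dodgson_det([[2, 3, 1], [4, 1, 5], [6, 2, 2]]))   # 52
```

Each stage costs O(m^2) multiplications on an m×m matrix, which sums to the O(n^3) total David mentions.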

48. Scott Says:

Thanks, David!

49. ryanw Says:

As I mentioned at FCRC, I am currently working on a project that automates almost all of the existing framework for time lower bounds established via “alternation-trading proofs.” (The Fortnow-van Melkebeek result cited above falls in this category.) My approach applies not only to the existing proofs of statements like
“SAT requires n^c time and n^{o(1)} space”
but also to things like
“Pi_4 SAT requires n^d time on a one-tape Turing machine.”
So far, my implementations have just been written in Maple. Given that, they’re surprisingly powerful: enough to have checked all possible alternation-trading proofs with at most 20 “lines” for the best possible time lower bound for SAT on n^{o(1)}-space algorithms (for a natural definition of “line”).
In some settings, this automated approach has found no better time lower bound than the current best, a strong suggestion that the current best is “optimal” in some sense. In other, more technically annoying settings (where humans fear to tread?), it has managed to yield new improvements. Eventually, when my thesis is fully revised and all that, I’ll post a preprint.

50. Paul Beame Says:

As an example, Fortnow and van Melkebeek showed that SAT can’t be solved simultaneously in n^{1.618…} time and n^{o(1)} space (1.618… being the golden ratio). To my knowledge, we don’t currently have any similar result for a P problem.

It depends a little on exactly what you mean by ‘similar’. We can’t get the time bounds in such tradeoffs to be even as large as n^{1.00001} for problems in P, but Mike Saks, Xiaodong Sun, Erik Vee and I, following work of Miki Ajtai, did prove that a particular simple problem in P can’t be computed simultaneously in n(log n/loglog n)^{1/2} time and n^{0.99999} space, even non-uniformly.

Also, though Ryan Williams was too modest to note it, his recent CCC 2007 paper improved that golden ratio to a number above 1.80.

I don’t think that the hardness of the permanent alone is what makes it so appealing. For low-level complexity lower bounds, it is often easier to prove that a simple function is hard than a complex one. (The alternation-trading arguments are notable exceptions.) The appeal of the permanent is not only that it is #P-hard but also that it has many symmetries and is very similar to the determinant, which is simple to compute.

51. ryano Says:

The number above 1.80 which Ryan believes might be tight is, amusingly, 2cos(pi/7).

52. Johan Richter Says:

You didn’t think we would recognize that on our own? Give us some credit ryano.

53. Boaz Barak Says:

I think this is a great idea.

On the one hand, experimental complexity theory seems much more difficult than, say, experimental number theory, since it seems to require a much larger search space, ruling out many naive approaches. Thus, it’s clear that designing and executing useful experiments would require not just strong computers but a combination of theoretical understanding and engineering/programming skills of the kind many theoreticians do not possess (even those who can program in languages other than BASIC :). But of course, this is also true for experimental physicists/biologists etc.

On the other hand, I think that this difficulty just demonstrates why experimental complexity theory could be so important. We don’t really understand the space of algorithms, and I’m sure that such experiments could come up with surprising results. (Probably good candidates are arithmetic problems with relatively efficient algorithms such as matrix multiplication and Fourier transform, since then we still have a good notion of what are the elementary operations but the search space is not doubly exponential.)

Probably the first step, before asking for funding, is for us complexity theorists to be receptive and supportive of such works.

54. Stas Says:

To specify more precisely what P problems I would like to see attacked experimentally, let me refer to
http://www.win.tue.nl/~gwoegi/CO2a/exact2.pdf
page 9, Open problem 4.3
It’s all related to k-CLIQUE complexity with fixed k. If you let me know about any progress in this direction, I would greatly appreciate it.

55. Tapan Parikh Says:

“computer scientists should make a serious effort to address world hunger, deforestation, climate change, and other global crises, because of the significant opportunities to tap funding resources that are becoming available in these areas.”

What?! And all these years, I’ve been doing this for the sex and the drugs…

No, wait, that’s Bono.

56. Scott Says:

Tapan: 🙂

Whatever your motivations, I’m a big fan of your work, and it made me feel pretty shitty to be competing with you on the job market this year. I mean, yes, the dearth of quantum complexity theorems is one of the most urgent crises facing our civilization … but more urgent than third-world poverty? That’s a tough one.

So I’m delighted to read on your homepage that you ended up at Berkeley. I loved it there, and I don’t think you could’ve made a better choice. If you get lucky, you might even encounter others there who share your interest in social justice.

Also, I highly recommend the mango smoothies at Hummingbird on Euclid Avenue.

57. John Sidles Says:

The exploration of geometry in large dimensions is a worthy “experimental” mathematics topic.

In our own work, we are presently computing (non-sparse) Riemann tensors on Kähler manifolds of (complex) dimension up to 99. Each tensor thus requires 99^4 ≈ 10^8 complex entries. This is about the limit of what can be computed on a desktop-scale machine. These tensors are showing us remarkable mathematical regularities that we are far from understanding analytically.

We would dearly love to move to larger dimensions, say Riemann tensors with 1000^4 entries. This would let us extend the Kählerian geometry to protein-scale spin simulations. IBM’s Blue Gene racks seem ideally suited for this scale of computation.

Upon a review of the literature, we presently believe that our 99^4 Kählerian Riemann tensors are the largest ever computed in explicit numerical form — does anyone know of a comparable or larger geometric computation effort?