2.373

For twenty years, the fastest known algorithm to multiply two n-by-n matrices, due to Coppersmith and Winograd, took a leisurely O(n^2.376) steps.  Last year, though, in his PhD thesis, Andrew Stothers gave an improvement to O(n^2.374) steps.  And today, Virginia Vassilevska Williams of Berkeley and Stanford released a paper that gives a general methodology for analyzing Coppersmith-Winograd-type algorithms, and that improves the matrix-multiplication time to a lightning-fast O(n^2.373) steps.  (Virgi’s work was independent of Stothers’, though she credits him and applies an idea of his to simplify her proof.)  Full disclosure: I actually knew a month ago that this was coming—I had a hell of a time keeping the secret.  I’d recommend that you get started memorizing “ω<2.373,” but as Russell Impagliazzo points out in the comments, the exponent might get lowered again in short order.  Huge congratulations to Virgi and to Andrew for this breakthrough!

Update (Nov. 30): Last night I received an extremely gracious email from Andrew Stothers, which he’s given me permission to summarize here.  In the email, Andrew expressed how excited he was about Virgi’s new result, apologized for the confusion he caused by not mentioning his improvement to ω until page 71 of his thesis (he says he doesn’t know why he did it), and said that he meant to publish a paper, but was prevented from doing so by health and job issues.  He also said that he didn’t take issue with anything I wrote here, except that I mistakenly referred to him as Andy rather than Andrew.  In response, I congratulated Andrew on his achievement; expressed how happy I was that—ironically—his work is now finally getting some of the attention that it deserves; and promised to buy him a beer when and if I’m ever in Edinburgh, a city I’ve always wanted to visit.  (On the other hand, I warned Andrew that his LinkedIn profile, which unselfconsciously mentions improvements to his Word and Excel skills as one of the benefits of his PhD research breaching the Coppersmith-Winograd barrier, might have earned him a place in scientific folklore forever!)

In summary, I now see Andrew as an extraordinarily nice fellow who had some bad luck and—most conspicuously—a lack of good advice from people around him.  I do stand by the points that I was originally trying to make:

(a) that this tangled situation shouldn’t in any way detract from Virgi’s fantastic achievement, which (except for a simplification, as she discusses) must be considered completely independent of Andrew’s, and

(b) that there’s indeed an important cautionary lesson for students here, about adequately publicizing your work (yes, there’s a happy medium, between hiring a PR firm to wage a viral marketing campaign and burying your solution to a longstanding open problem so far in the body of your PhD thesis that even world experts in the subject who read your thesis will miss it).

On the other hand, I hereby apologize for anything I said that could even be perceived as slighting Andrew, his important work, or his motives.

Another Update: On the third hand, if you’re one of the commenters whose beef is not about attribution, but about the entire concept of using a CS theory blog to “promote” major milestones in CS theory (like the breaking of the Coppersmith-Winograd barrier), then I apologize for absolutely nothing.  Go read an economics or physics blog; I understand that those are entirely hype-free.  Better yet, go to hell.

138 Responses to “2.373”

1. Russell Impagliazzo Says:

Scott, ye of little faith. Virginia will soon have this down to 2.372, making the time spent on memorizing the fourth digit obsolete.

2. Bram Cohen Says:

The approach to this problem seems a little like the Shawshank Redemption approach to breaking out of prison.

3. Scott Says:

Russell: Well, after Virgi and Ryan hauled themselves all the way to Israel for my and Dana’s wedding, it seems like the least I can do to promise to memorize the statements of their next ten breakthrough theorems (including any relevant constants, and even in cases where those constants look likely to be superseded soon). I’ll make this effort despite the fact that my memory grows worse by the day.

4. Marco Says:

I totally appreciate the importance of the result and the tour de force needed to prove it. But I was also curious to see approximately how big the matrices need to be so that only half the steps are required with the new algorithm:

http://www.wolframalpha.com/input/?i=NSolve%5Bn%5E%7B0.003%7D%3D%3D2%2Cn%5D

5. Scott Says:

Bram: Yes, this obviously required a level of perseverance beyond what most of us can muster. Indeed, it occurred to me that the section on page 3, entitled “Why wasn’t an improvement on CW found in the 1990s?,” could be shortened to “See rest of paper.” 🙂

6. Circe Says:

A minor point: As Virginia’s paper points out, this is not the first improvement in 20 years: an improvement was also made by Stothers in 2010, though this paper improves upon the analysis.

7. Scott Says:

Marco #4: Could you provide some more details for your calculation? Firstly, what are you calculating: the matrix size needed for the new algorithm to beat the naïve algorithm by a factor of 2, or the matrix size needed for it to beat Coppersmith-Winograd? Secondly, where do you get 2^(1/0.003) from?

8. major point Says:

Circe: Why is that a minor point? I’m confused. The exponent for matrix multiplication was improved by Stothers in 2010 and no one seemed to care (did he publish his paper?), i.e. no one blogged about it, announced it. But now it is improved a little bit more and it’s a “breakthrough”?

9. Marco Says:

Scott #7: I naively checked for what $n$ one gets $n^{2.376}/n^{2.373} = n^{0.003} =2$ (so it is the new result vs Coppersmith-Winograd, up to many important details I suppose)

10. Circe Says:

major point:
did he publish his paper?

His thesis is cited in the Williams paper and is available from the UoEdinburgh thesis repository: http://www.maths.ed.ac.uk/pg/thesis/stothers.pdf

11. anony Says:

I browsed thru the paper. It’s like over 40 pages are full of random symbols and formulas. Couldn’t help thinking, how can one make sure that the paper is mathematically correct?

12. chi Says:

I think he is calculating the improvement over Coppersmith-Winograd, which is n^2.376/n^2.373 = n^(2.376-2.373) = n^0.003 (well … if you’re prepared to ignore some constants hidden by O notation). Then Wolfram Alpha solves n^0.003 = 2 for n.
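This back-of-envelope crossover can be checked in a couple of lines. A minimal sketch that, like the comments above, ignores all the constants hidden by the O notation:

```python
import math

# Solve n^0.003 == 2 for n: the (constant-free) matrix size at which an
# n^2.373 algorithm would beat an n^2.376 one by a factor of 2.
n = 2 ** (1 / 0.003)

# n itself overflows intuition, so report its order of magnitude instead.
print(f"n ~ 10^{math.log10(n):.1f}")  # roughly 10^100: astronomically large
```

Of course, as Scott notes below, the actual crossover is meaningless without the constant factors, which this sketch deliberately omits.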

13. Bram Cohen Says:

Circe, the Stothers result improved on an old technique to set a record for that technique, but it didn’t set an overall record because other techniques had lower exponents. This paper improves on the Stothers result to set an overall record, which happens by coincidence to just barely beat the old record. Or maybe it isn’t so coincidental. The authors say they can improve their result with more analysis, and given how brute force the techniques involved are they might have been eager to declare victory.

14. Scott Says:

Circe and “major point”: Virgi tells me that, for whatever reason, Stothers never published his result anywhere except his PhD thesis, where it was deeply buried (not even mentioned in the abstract!). His writeup also had bugs. Stothers never posted to arXiv, ECCC, or any other preprint repository, he didn’t communicate his result to the experts in the subject, and he’s apparently since left the field and doesn’t seem reachable via email.

Corrections: Stothers communicated his important result to a few experts (not many), and an email address has been found, though I won’t publish it since he values his privacy.

FWIW, I think Stothers’ result would have been very big news in 2010, if he’d bothered to tell people about it! But he didn’t. There’s an important lesson here for students everywhere: in academia, the tree that falls in the middle of the forest can’t claim priority!

Since I’ve been told that the sentence above is the one that set people off, I hereby retract it and apologize for it. There is indeed an important lesson for students here—and the lesson does indeed involve doing what Stothers didn’t do—but I nevertheless badly misstated the content of the lesson.

Having said that, as Virgi explains in the paper, after she finally learned through the grapevine about Stothers’ work, she was able to use his ideas to simplify (to a “mere” 70 pages, I guess 🙂 ) the proof of her own, better bound. I thought Virgi was very fair in sharing credit with Stothers, for a breakthrough that ultimately builds on both of their contributions.

15. Circe Says:

Scott: Of course, I agree with what you say here. I was just pointing out a difference between the claims in Virginia’s paper and your blog post 🙂

16. Scott Says:

Marco #9 and chi #12: In that case, the calculation is meaningless unless you include constant factors (though I have no idea whether they’d make the crossover point “better” or “worse”!).

I think it’s best to think of this result as being like a new upper bound on the mass of the Higgs boson (and the matrix multiplication exponent really is as important for computer science as the Higgs boson mass is for physics—largely because of the mind-boggling number of other things known to be equivalent to MM).

In particle physics, you need to go to higher and higher energies to uncover the “deeper truths”; in TCS, you have to go to larger and larger input sizes. In both cases, two unfortunate side-effects are that you need more and more complicated machinery, and that you get further and further from anything directly relevant to daily life. But in both cases, practical relevance wasn’t the goal to begin with; the goal was to redraw the boundary between human understanding and ignorance.

17. Scott Says:

anony #11:

Couldn’t help thinking, how can one make sure that the paper is mathematically correct?

An improvement to the matrix multiplication exponent is important enough that I expect other people to redo the calculations in short order. Even if they don’t check every formula in Virgi’s paper, they can still provide “independent experimental confirmations”: i.e., carry out the program she describes, and make sure they get the same optimization problems and the same solutions to them.

18. Marco Says:

Scott #11: I agree with your analysis and I like your parallel with Physics. As I wrote already 1) I appreciate the importance of the result, independently of “practical applications”, 2) I was just curious 🙂 , and 3) it was a naive calculation. I agree that constants are important to make sense of the estimate, but the estimate also means that from a practical point of view (which, again, I agree is not important here!!!) an improvement in the constants would probably have a larger impact.

19. Henry Cohn Says:

The Stothers situation is certainly weird, but let me try to clear up a few misconceptions:

Stothers absolutely communicated his result to at least some people in the field. For example, I heard about it from Amin Shokrollahi and Volker Strassen in June 2010.

Assuming his calculations are correct, he did obtain the first improvement on the Coppersmith-Winograd exponent (contrary to what Bram says above). I have no idea why he downplayed the improvement so much in his thesis – I actually assumed he had withdrawn his claimed improvement when I skimmed through his thesis and didn’t see it prominently mentioned. It wasn’t until Chris Umans insisted that it really was in there that I looked more carefully and discovered that indeed it was.

There’s an important lesson here for students everywhere: in academia, the tree that falls in the middle of the forest can’t claim priority!

I agree that there’s an important lesson here about adequately publicizing work, but I strongly disagree about priority. The only way the priority here is unclear is if there are nontrivial errors in the work Stothers did. That may be the case: I haven’t checked it, and I don’t know whether anyone else has. However, to the extent it is correct (or easily correctable), he deserves credit for the first improvement.

Anyway, it’s great to have a methodology for carrying out these sorts of optimization problems, and Virgi obtains a better exponent, so the new paper is certainly important (and more so than what Stothers did). However, Stothers should not be overlooked, and it’s not fair to him to describe this as the first improvement on Coppersmith and Winograd, unless the bugs in his write-up are pretty serious.

20. Scott Says:

Henry: Thanks for the clarifications; I edited the post to something that I hope represents the story better. I didn’t know that you, Shokrollahi, and Strassen knew about Stothers’ lowering of ω a year ago—if so, then maybe you should’ve told more people about it! 🙂

I also continue to think that the question of priority is more complicated than you say. As an extreme case, suppose that Stothers had lowered ω but hadn’t told anyone else (similar to what Newton and Gauss did on various occasions). Or suppose he’d “published” it in a book, only one copy of which was ever printed, shelved in a library in rural Estonia. Or suppose he’d presented the proof, in anagram form, in a footnote on page 374 of a thesis about goose migration patterns, with no explanation or reference to it anywhere else in the thesis.

In those cases, in an earlier era, Stothers’ theoretical priority would probably be discovered only decades later if at all, by which time it would be a curiosity for mathematical historians. These days news travels faster, but doesn’t the principle remain the same?

Looking through Stothers’ thesis, the following seems to be the closest he comes to saying what he did anywhere in the abstract or introduction:

Chapters 3 and 4 build on the work of Coppersmith and Winograd and examine how cubing and raising to the fourth power of Coppersmith and Winograd’s “complicated” algorithm affect the value of ω, if at all.

Reading the above sentence (especially the “if at all” part), I think it would be completely reasonable to infer (as you apparently did infer) that Stothers tried to lower ω and failed, or had tentative ideas that might or might not pan out, etc. So, while this isn’t quite as strange as the hypothetical examples I gave above, I think it’s somewhere on the same spectrum.

21. Sasho Nikolov Says:

So any conjectures what happens with using higher tensor powers as base cases? Any reason to believe that taking higher and higher powers does not converge to $n^{2}$ at infinity?

22. Lower bounds Says:

What are known and conjectured lower bounds for matrix multiplication?

Also, to be sure: addition of two scalars and multiplication of two scalars are both considered as “unit steps” ?

23. Scott Says:

Lower bounds: Alas, no lower bound is known better than the trivial Ω(n^2). (Though Ran Raz proved an Ω(n^2 log n) lower bound under a restriction on the absolute values of the field elements.) In this respect matrix multiplication is no different from most other algorithmic problems.

And yes, addition and multiplication are both “unit steps”—though the fast matrix-multiplication algorithms that are known are all based on recursion on multiplication, so that without loss of generality people count the multiplication steps only (the addition steps only affect the constant in the big-O).
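As a concrete illustration of “recursion on multiplication,” here is a minimal sketch of Strassen’s classical 1969 algorithm (the ancestor of the Coppersmith-Winograd line, not the new algorithm), which trades 8 recursive multiplications for 7 and thereby achieves O(n^log2(7)) ≈ O(n^2.807). Each level does O(n^2) extra additions, which only affect the constant, exactly as described above:

```python
def strassen(A, B):
    """Multiply two n-by-n matrices (n a power of 2, as lists of lists)
    using Strassen's recursion: 7 recursive multiplications instead of 8."""
    n = len(A)
    if n == 1:
        return [[A[0][0] * B[0][0]]]
    h = n // 2

    def quad(M, r, c):  # extract an h-by-h quadrant starting at (r, c)
        return [row[c:c + h] for row in M[r:r + h]]

    def add(X, Y):
        return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

    def sub(X, Y):
        return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

    A11, A12, A21, A22 = quad(A, 0, 0), quad(A, 0, h), quad(A, h, 0), quad(A, h, h)
    B11, B12, B21, B22 = quad(B, 0, 0), quad(B, 0, h), quad(B, h, 0), quad(B, h, h)

    # The seven recursive products; the additions only change the constant.
    M1 = strassen(add(A11, A22), add(B11, B22))
    M2 = strassen(add(A21, A22), B11)
    M3 = strassen(A11, sub(B12, B22))
    M4 = strassen(A22, sub(B21, B11))
    M5 = strassen(add(A11, A12), B22)
    M6 = strassen(sub(A21, A11), add(B11, B12))
    M7 = strassen(sub(A12, A22), add(B21, B22))

    C11 = add(sub(add(M1, M4), M5), M7)
    C12 = add(M3, M5)
    C21 = add(M2, M4)
    C22 = add(add(sub(M1, M2), M3), M6)

    # Reassemble the four quadrants into the product matrix.
    top = [r1 + r2 for r1, r2 in zip(C11, C12)]
    bot = [r1 + r2 for r1, r2 in zip(C21, C22)]
    return top + bot
```

This is purely illustrative: the Coppersmith-Winograd-style algorithms discussed in the post use far more intricate base constructions, though the same principle, counting only the multiplications, still applies.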

24. Scott Says:

Sasho Nikolov #20: Excellent question! I don’t know any intuitive reason to believe the higher tensor powers should converge on ω=2, nor do I know any intuitive reason to believe they shouldn’t. Maybe Virgi or someone else can provide enlightenment.

25. pete Says:

Bram Cohen: it says in the paper that the best previous factor was 2.376, so what do you mean that Stothers did not improve it?

Also, I don’t totally see how to reconcile the claim of independent work with the following statement from the paper:
“Our approach can be seen as a vast generalization of Stothers’ analysis, and part of our proof has benefited from an observation of Stothers’ which we will point out in the main text.”

26. Henry Cohn Says:

Yeah, I agree that Stothers did an astonishingly bad job of advertising his results. I feel a little guilty in hindsight, since before Chris convinced me to look at Chapters 3 and 4, I told several people that Stothers did not improve the exponent of matrix multiplication in his thesis, so until now my net contribution to his credit was negative. As for why I didn’t publicize it more back when I first heard, at first I assumed he was keeping it under wraps while he worked at bigger improvements, and then I looked briefly at his thesis and decided it was a false alarm.

Let’s assume that all of this work is at least essentially correct. Then there are two relevant sorts of credit: intellectual credit and credit for contributing to the research community. Regarding intellectual credit, Stothers has priority, but his work was poorly publicized enough that there’s no doubt Virgi’s work was fully independent, so she deserves equal credit for rediscovering it (and in fact she went further and deserves more credit for that). Regarding credit for contributions to the community, Stothers deserves some credit, because he told some people about his work, wrote it up, and made it publicly available. However, the most troubling aspect of this is that he didn’t make the contribution he could have, and it’s not clear why.

One possibility is that he genuinely believed his results were of little importance, since they were “just” analyzing a power of the Coppersmith-Winograd tensor. Another possibility is that he believed readers should be kept in suspense until the end of a calculation. A third possibility is that he was leaving the field and just didn’t care any longer.

It’s sad that Stothers seems to have abandoned his results, and never sufficiently publicized them, but it’s great that Virgi rediscovered them and went further.

27. Circe Says:

The somewhat funny thing is that the only place (other than the equation $\omega < 2.374$ in his thesis) where Stothers does make a claim that he improved CW89 is on his LinkedIn Profile: http://uk.linkedin.com/pub/andrew-stothers/30/42b/836 🙂

28. Scott Says:

Pete #25:

I don’t totally see how to reconcile the claim of independent work with the following statement from the paper:
“Our approach can be seen as a vast generalization of Stothers’ analysis, and part of our proof has benefited from an observation of Stothers’ which we will point out in the main text.”

Virgi discovered her improvement to ω independently of Stothers. Later, though, she found out about Stothers’ work, and realized that an idea of his could be used to simplify her proof.

29. Scott Says:

Circe #27: Great find! From the LinkedIn profile:

The main aim of my thesis was to show the existence of an algorithm to reduce the running time a computer requires to multiply two matrices. After researching the different methods, it was shown, for the first time in twenty years, that the running time could be further reduced from the (then) record. I was required to present my findings in the form of a thesis- this passed with only minor corrections required. This required high levels of analytical and logical skills, and the work was praised by my examiners. I am currently in the process of writing a paper with my supervisor showing these findings. I attended Transferable Skills courses in scientific writing and communication. I also gained much experience in various software packages include Word, Excel, LaTeX, Power Point, and Maple.

30. tim Says:

“I thought Virgi was very fair in sharing credit with Stothers, for a breakthrough that ultimately builds on both of their contributions.”

It’s actually not up to Virgi to give/decide credit for the previous result.

31. pete Says:

I would not write this about my “independent” work:

“Our approach can be seen as a vast generalization of Stothers’ analysis ..”

32. Henry Cohn Says:

Incidentally, Stothers seems to be on Twitter, so perhaps that will help re-establish contact with him (I’ve directed a couple of tweets towards him).

33. Sasho Nikolov Says:

@pete Independent work can be a generalization of someone else’s work. It can even be a special case of someone else’s work and still be independent. “Independent” is a statement about what information you had available when deriving the results. “Generalization” is a statement about how theorems and techniques relate to each other. The two are completely different things.

This situation is quite bizarre. It seems like Stothers wanted to publish a paper but for some reason didn’t get around to it.

34. Scott Says:

Henry, just to clarify: did Chris convince you that Stothers had lowered ω before or after you heard about Virgi’s result? Also, do you know how Shokrollahi and Strassen found out about Stothers’ result?

Anyway, following the lead of The Onion, I hereby put out the following request to my readers. If, while entering a locker room at night, you witness someone in the showers lowering the matrix multiplication exponent, don’t just confide to a friend what you saw: notify your nearest complexity blogger immediately. (Sorry, couldn’t resist.)

35. pete Says:

ok, Scott, but put your request in a blog post, not in the comments!

Scott, I concur with Pete, put your request in a blog post! It’s far too good to go unnoticed!

This whole situation is really puzzling. Stothers wrote a result in his Ph.D. thesis that 4 senior members knew about and didn’t tell anyone else for whatever reason. They didn’t believe the result, were waiting for a paper to come out, etc. Effectively, it was Virginia who made this work public through the course of her own research and now rightfully so, these giants would like to credit Stothers for the work that he did. But let’s not forget it was Virginia who deserves credit for making this work public. Without her paper, there would be no blog posts on matrix multiplication and Stothers work would continue to go unnoticed.

37. rgt Says:

“Without her paper, there would be no blog posts on matrix multiplication and Stothers work would continue to go unnoticed.”

research does not happen only in blogs. dissemination of results is sometimes slow. 2010 to 2011 is not a long time. clearly experts knew about the result. everybody here is comparing bounds rather than discussing new ideas. stothers and others seem to think that this is a technical variant of known ideas. they might have a point. this would explain everything.

38. Dave Bacon Says:

Sweet! Only another twenty years to push it past sqrt(5) 🙂

39. Douglas Knight Says:

“the same optimization problems”

Stothers already got the same optimization problem.

40. Cindy Says:

I disagree with you, Scott. Stothers did communicate his results to several experts. It is not his fault that nobody blogged about it or tweeted it for him. His thesis is publicly available. He did what he could, but he left the field. It is the responsibility of those still in it to keep up with the literature.

Speculation: Maybe he couldn’t get a job? Maybe he didn’t appreciate the significance of his work? If so, then some fault lies with his advisor and those experts he contacted. Maybe he did appreciate the significance, and saw it as a shallow technical extension of previous ideas.

It does not matter, though, since Williams properly credits Stothers. I hope that she also posts her paper to the arXiv. Unlike theses on university websites or pdfs on personal homepages, arXiv papers don’t get lost.

41. Scott Says:

rgt: Your apparently-preferred model of how science should work is one that I hope never triumphs. A lowering of the matrix-multiplication exponent, regardless of how it was achieved, is something significant enough that essentially everyone in the theory community would want to know about it. And in the Internet age, it’s something that essentially everyone would quickly know about, if small efforts had been made such as posting to ECCC or arXiv or submission to any conference. Ergo, if that doesn’t happen, then something has gone very interestingly wrong in the process of scientific communication, which it might be worth trying to understand and fix.

42. Scott Says:

Stothers already got the same optimization problem.

Douglas #39: According to Virgi’s paper, that’s true for the 4th-power case (though for some reason, his numerical optimization software returned a worse exponent than Virgi’s). On the other hand, Stothers didn’t analyze the 8th-power case, which apparently required new and harder techniques.

43. bernard Says:

does the arxiv allow submissions of phd theses? If so, then why won’t all universities post the theses of their graduates on the arxiv? If not, then we can’t blame stothers. He wanted to leave academia and he wrote a thesis which was made public. Writing a separate paper after you left academia does not make sense.

44. Anonymous Says:

I find the language you used in this post quite disparaging towards Stothers’s work. Andy Stothers “discussed” an improvement, but Virgi’s improvement was a “breakthrough”.

It sounds like you are already trying to find a way to put one work on a pedestal and simultaneously wipe the other from the history books. It would certainly be better to spend time reading these works, to appreciate their insights and contributions before really worrying about these things. I’m sure both have wonderful new ideas in their own right.

45. zan Says:

I agree with Anonymous #44. This discussion is unprofessional.

46. Scott Says:

Cindy #40:

Stothers did communicate his results to several experts. It is not his fault that nobody blogged about it or tweeted it for him. His thesis is publicly available. He did what he could, but he left the field. It is the responsibility of those still in it to keep up with the literature.

No, it’s not anyone’s responsibility to “keep up with” the full text of every PhD thesis written. The scientific process does a pretty good job of bringing results as big as a lowering of ω to the forefront, but only if the authors allow it to. The reason no one “blogged about [Stothers’ result] or tweeted it for him” isn’t that there was some conspiracy against him; it’s just that hardly anyone knew what he did!

Most importantly, students looking to draw lessons here need to realize that Stothers didn’t “do what he could” to publicize his achievement. Even a world expert on fast matrix multiplication (Henry Cohn), who read Stothers’ PhD thesis, still didn’t understand for a long time that Stothers was claiming a lowering of ω!! (See comment #19.) And this isn’t a subtle or complicated question; you’ve either lowered ω or you haven’t. Looking at Stothers’ thesis reveals that Henry’s reaction was natural: it reads as if Stothers were intentionally burying the result, whether because of doubts about its correctness, a desire to polish the result further, or some other reason.

47. Scott Says:

zan #45: I completely agree that the discussion is unprofessional, and I’m saddened that a simple post about a great achievement for our field got derailed in this way. I had no intention (or even thought) whatsoever to adjudicate any priority dispute—as I said, I thought Virgi already handled the issue admirably in her paper—but after various commenters made what I thought were unjustified accusations, I did feel a need to answer them.

Look, when I post something controversial, I expect lots of controversy in the comments. But when even a post like this elicits such a venomous reaction, I feel like maybe the time has come to shut down this blog entirely.

48. Douglas Knight Says:

#42. oops. thanks.

49. Nex Says:

So why is this a breakthrough? To an outsider like me it looks like a completely insignificant improvement. Even to Stothers and his peers it must have looked quite insignificant if they didn’t even bother to have it properly published or publicized.

Scott,
I just had a chance to read this post and the following comments. I would like to react to your comment #47, by letting you know that as a graduate student, I feel this type of discussion is actually very important, and should definitely not be a reason for you to stop blogging – on the contrary.
As a graduate student I need to learn not only how to produce scientific results but also what I should do (and not do) to disseminate them, and how I should give credit when independent discoveries occur (as this is not a very rare event). Discussions of this sort help me learn what other people think about these questions.

I believe the right thing to do as a scientific community is not to try to stop discussions on such issues – as they are an inevitable part of scientific research and we shouldn’t pretend they do not exist. Instead, we should discuss them as openly as possible, and learn as much as we can from that discussion. And what is a better place for an open discussion than the comment thread of a scientific blog post?

51. Helger Lipmaa Says:

Scott said:

Or suppose he’d “published” it in a book, only one copy of which was ever printed, shelved in a library in rural Estonia.

I think you should have said “e-book, published on a webpage of an obscure farm in rural E-stonia” 🙂

52. Sasho Nikolov Says:

I think the bitter anon/semi-anon users have entirely missed the point. Virgi has derived general procedures that make it possible to automate the analysis of many of the CW constructions (but not all?). This is really the big deal, not who got the better constant and when. And this is the essential difference between her work and Stothers’.

Given how many interesting questions this brings, the discussion has been very disappointing (although for one I am curious why a PhD student buries this kind of result on the bottom of page 70 of his thesis and doesn’t even state a theorem).

I am wondering if the new techniques give tools to reason about different tensor powers in a more abstract way, i.e. without necessarily deriving the exact value of $latex \omega$. For example, an interesting question is whether there exists a “best” tensor power or there is an infinite sequence of tensor powers for which the $latex \omega$ decreases monotonically.

53. Kenneth W. Regan Says:

Scott’s description in the post is fair. If people wish to invest time in the meta-question of what was on Andrew Stothers’ mind (and his advisor’s?) when he could have been plugging his 0.0018’s worth, I suggest the primary sources: his blog, and his other blog.

54. Circe Says:

Sasha: I think it is a good sign the discussion is finally back to technical issues. I think the main result in the Williams paper is not the automation of the analysis of the CW construction, but that of the analysis of its tensor powers, _given_ the original construction. The other main part of the result seems to be that this automation procedure can potentially be applied to tensor powers of other constructions too, rather than just CW.

55. Sasho Nikolov Says:

@Circe, right, I am not sure if my comment made so much sense. I meant that Virgi shows how to automate the analysis of the $n$-th tensor power of any base algorithm from some class. My question w.r.t. whether the new techniques might allow proving some more high-level abstract theorems was for using the $k$-th tensor power of the CW algorithm A as a base.

56. Anthony Says:

… I hope these results do not lead to the sort of petty arguments we had to endure back when Deolalikar’s paper and, more recently, (the other) Williams’s paper were announced.

Congratulations to both authors anyway. I hope further improvements will follow.

57. jonas Says:

For people like me who are unfamiliar with these results, could you clarify about it a bit please?

What do these results suppose of the elements of the matrix? Can they come from some general number space, represented in an opaque way, such that the algorithm only does arithmetic operations on them, which are assumed to take unit time? If so, what is assumed of the number space (eg. field of zero characteristic, field, commutative ring, any ring)?

Also, are the matrices represented in the usual way, that is, as a dense array of elements? Or do they instead have to be stored in some special way that’s more time-consuming to encode or decode?

I tried to look this up. Knuth, section 4.6.4, explains how to multiply two n×n matrices in O(n^2.808) time, and that this works over any ring with opaque elements, representing the matrices in the obvious way. About the more theoretical results (not practically applicable for small matrices) he is vaguer, though, and the second edition is older than the 2.376 record anyway.

58. Scott Says:

Nex #49:

So why is this a breakthrough? To an outsider like me it looks like a completely insignificant improvement. Even to Stothers and his peers it must have looked quite insignificant if they didn’t even bother to have it properly published or publicized.

It’s a breakthrough because the value of the matrix multiplication exponent ω is one of the oldest and most fundamental problems in all of theoretical computer science. A huge number of important algorithmic problems—many of them (like parsing context-free languages) totally unrelated to matrix multiplication at first glance—turn out to have n^ω complexities, or other complexities that depend on the strange and mysterious constant ω. Many people conjecture that ω=2, yet to date, some deep and serious attempts to prove this have failed (as have all attempts to prove ω>2). Failing a solution to the “ω=2?” problem, any progress on better understanding ω is of interest.

So if Stothers thought his small improvement to ω was unimportant (and I don’t claim to understand his motivations), then he was simply mistaken! BTW, if you read Henry Cohn’s comments, he was under the (understandable) misconception, after reading Stothers’ thesis, that Stothers had not improved ω, or had retracted an earlier claim to do so. Henry knows perfectly well that an improvement to ω is a big deal, which might be one reason why he’s worked on the problem himself! 🙂

59. SB Says:

Stothers seems quite a laid back guy from his blog posts: http://andrewstothers.wordpress.com/2010/12/29/reckoning-the-2010-review/

It’s a meta-discussion that I do not want to provoke anyone into, but I cannot help but admire someone who achieves some breakthrough result, seemingly without sweating it and while having a broad range of other interests.

60. Scott Says:

jonas #57:

For people like me who are unfamiliar with these results, could you clarify a bit, please? What do these results assume about the elements of the matrix?

It’s a good question. Someone correct me if I’m wrong, but I believe Strassen’s algorithm, and all the subsequent fast matrix-multiplication algorithms that have been discovered, work for arbitrary rings (where we assume addition and multiplication of ring elements can be done in unit time). In particular, I don’t think any of these algorithms require division: they’re all essentially based on recursive multiplication of submatrices.

As an interesting meta-question, if we define variants of ω for different underlying number spaces (all fields, some specific field, rings, commutative rings, etc.), then I don’t know what equivalences are known among those variants. Can anyone enlighten us?
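To make the division-free, recursive structure Scott describes concrete, here is a minimal Python sketch of Strassen’s classical seven-multiplication scheme (not from any comment above; helper names are mine, and n is assumed to be a power of 2 for simplicity). Note that only ring operations +, −, × are used, so any ring elements would work in place of integers.

```python
def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def sub(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def strassen(A, B):
    """Multiply two n-by-n matrices (n a power of 2) with 7 recursive products."""
    n = len(A)
    if n == 1:
        return [[A[0][0] * B[0][0]]]
    h = n // 2
    # Split each matrix into four h-by-h blocks.
    A11 = [row[:h] for row in A[:h]]; A12 = [row[h:] for row in A[:h]]
    A21 = [row[:h] for row in A[h:]]; A22 = [row[h:] for row in A[h:]]
    B11 = [row[:h] for row in B[:h]]; B12 = [row[h:] for row in B[:h]]
    B21 = [row[:h] for row in B[h:]]; B22 = [row[h:] for row in B[h:]]
    # Seven recursive multiplications instead of the naive eight.
    M1 = strassen(add(A11, A22), add(B11, B22))
    M2 = strassen(add(A21, A22), B11)
    M3 = strassen(A11, sub(B12, B22))
    M4 = strassen(A22, sub(B21, B11))
    M5 = strassen(add(A11, A12), B22)
    M6 = strassen(sub(A21, A11), add(B11, B12))
    M7 = strassen(sub(A12, A22), add(B21, B22))
    # Recombine into the four blocks of the product.
    C11 = add(sub(add(M1, M4), M5), M7)
    C12 = add(M3, M5)
    C21 = add(M2, M4)
    C22 = add(sub(add(M1, M3), M2), M6)
    top = [r1 + r2 for r1, r2 in zip(C11, C12)]
    bot = [r1 + r2 for r1, r2 in zip(C21, C22)]
    return top + bot
```

Since the recursion does 7 subproblems of half the size plus O(n^2) additions, the running time is O(n^{log_2 7}) = O(n^2.81), the starting point of the whole line of work discussed in this thread.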

61. Scott Says:

Just a quick note: some people might be interested in a post I wrote years ago entitled The Relativity of Originality. In that post, I discussed the “Free Will Theorem” by Conway and Kochen—and how I had published almost the same result as them four years earlier (as a beginning grad student), but had buried it inside my review of Stephen Wolfram’s A New Kind of Science, and thereby failed to communicate it adequately.

Rather than making a stink about this, I resolved not to make the same mistake next time, and to feel grateful to Conway and Kochen for giving the point (basically, a certain corollary of the Bell-type inequalities, which uses nonlocality as merely a tool for drawing a conclusion about the different issue of nondeterminism) the much wider discussion that it deserved.

62. Bram Cohen Says:

Everybody, I wasn’t trying to spread misinformation in my last comment, I just misread what was said in Virgi’s paper. I think my brain just assumed that the number given by Stothers was in between the two previous records, given the claim in Scott’s post that the record hadn’t been broken in twenty years.

63. Bram Cohen Says:

Reading over Stothers’ thesis, he does make very clear in chapter 4 (but not in the abstract!) that he’s claiming a lower value of ω. It’s rather strange that he uses a suboptimal parameterization, but my guess is that he didn’t use Maple properly to do the optimization. Certainly the values he gives are correct. He also predicts both Virgi’s extension and that it wouldn’t reduce the exponent very much.

64. Anonymous Says:

Thesis
http://www.maths.ed.ac.uk/pg/thesis/stothers.pdf
p.71
Raising the original Coppersmith and Winograd algorithm to the third power did not yield a reduction in the upper bound for the value of ω. However, it has provided the framework for the method we will now use to derive an improvement.
p.81
ω<2.373689703

65. Anonymous Says:

p. 108
As conjectured above, it may be possible to raise Coppersmith and Winograd’s algorithm to the eighth (or higher) power to obtain a reduction in ω. However, it is likely that any gains obtained in doing this will be very small.

66. Markus Bläser Says:

I was the external reviewer of the PhD thesis of Andrew Stothers. So let me explain what Andrew thought (more precisely, what I think he thought) and why I think he is right (except for not putting a preprint online): if he had proved his result in 1988, then nobody would have used the word breakthrough. It is a decent piece of hard work, but we do not get any new insights. We do not get any new tensors to start with, nor do we get a really new way of using tensors for matrix multiplication. Andrew and Virginia are pushing the CW method to its limits, but that’s it. Nothing more and nothing less. It is currently not clear to me how this will get any better bound than 2.37… (or whatever the infimum of this process is, it is certainly not 2; whenever you raise the tensor to a higher power, the improvement also shrinks geometrically).

Strassen’s 2.81 is a breakthrough, because nobody believed that there was something better than the naive algorithm. Ten years later, Pan gave a new starting algorithm yielding a small improvement. Then Bini et al. introduced the concept of border rank. It was just a very small improvement in absolute numbers, and came only one year after Pan’s improvement, but it introduced a new and very useful concept. It’s a breakthrough. Then Schönhage showed how to use disjoint copies of the matrix multiplication tensor (the “Tau theorem”), and Strassen showed how to use tensors that were not matrix multiplication tensors at all (the laser method) but had inner and outer structures which are matrix multiplication tensors. Then Coppersmith and Winograd found a new method in which the outer structure need not be a matrix multiplication tensor. Cohn & Umans and Cohn et al. gave a completely new framework for studying this problem. (BTW, there is a not-so-well-known paper by Coppersmith and Winograd in which they push Schönhage’s methods to their limits and get something like 2.49.)

Andrew Stothers and his advisor prepared a journal article; I think it is submitted. They should have submitted a preprint. The calculations in Andrew’s thesis are correct and, according to Andrew, were checked with the help of computers.

PS: Over fields, the exponent can only depend on the characteristic of the field.

67. Sniffnoy Says:

These things that depend on n^ω, do they actually depend on n^ω, or on the time required for matrix multiplication? If the infimum isn’t achievable, these could be different. Are people only using it as a lower bound or something?

68. Sniffnoy Says:

Ergh. Do <sup> tags not work here or did I just enter that wrong? Must be the latter, right?

69. Scott Says:

Markus #66: Thanks very much for sharing your perspective.

If he had proved his result in 1988, then nobody would have used the word breakthrough.

I agree with that, but I also think the fact that it’s now 2011 rather than 1988—and that there have now been 23 years of people repeating the constant “2.376” in paper after paper, with no hint of any lowering despite serious efforts to find one—is extremely relevant.

70. Scott Says:

Sniffnoy #67: These dozens of other things depend on the time needed for matrix multiplication, and therefore depend on ω—regardless of whether ω is an actual minimum or “merely” an infimum. (In the latter case, recognizing context-free languages and the other problems equivalent to MM would also have an infinite sequence of better and better algorithms, with no single best one.)

71. A Breakthrough On Matrix Product « Gödel’s Lost Letter and P=NP Says:

[…] examined Stothers’ thesis, has contributed an important comment on the blog of Scott Aaronson here. It evaluates the significance of the work in-context, and also removes the doubt going back to […]

72. Circe Says:

It’s a meta-discussion that I do not want to provoke anyone into, but I cannot help but admire someone who achieves some breakthrough result, seemingly without sweating it and while having a broad range of other interests.

My thoughts too about Andrew Stothers. His bio on LinkedIn somehow leads me to imagine Aryabhata saying in his bio that “One of my side projects was to improve the approximation for $\pi$ that had been lying unimproved for the last 1000 years. But since the improvement was only in the last decimal place, I don’t consider it a big deal. Also, I learned writing Sanskrit verse along the way.”

73. Dániel Varga Says:

Wow, Andrew marked both his blogs private sometime in the last couple of hours.

74. Scott Says:

SB #59 and Circe #72: I vociferously agree that Stothers accomplished something extremely important (in fact, much more important than he himself or his thesis reviewers seem to believe!). And I agree that he deserves to be recognized for it (as he will be recognized—ironically, almost entirely because of Virgi’s paper!).

But I can’t for the life of me understand the idea that his failure to communicate his results in any way provides a model for emulation by others. I’ve actually seen this sort of thing—claiming a result while backing off from claiming it, circulating a writeup to a few people but not really circulating it, burying an observation where no one will find it—happen over and over again in science, and its invariable effect is to leave fields in a state of confusion. There’s an excellent reason why, 350 years ago, science moved from the “announce-by-cryptogram” model to the model of rapid, widespread dissemination of research. And I’m not willing to forsake the attendant gains in human progress, just because some commenters here seem to enjoy the romantic image of someone stuffing the proof of a theorem into a bottle, throwing the bottle into the ocean, then going back to collecting seashells (or whatever), secure in the knowledge that the history of mathematics will need to be rewritten once the bottle washes up on some distant beach a thousand years later. Sorry, not how it works in this civilization.

Scott’s blog posts have a tendency to turn into something controversial. FWIW, I don’t think Scott made any comments that could be construed as provocative, and he gave both Virgi and Andrew their due credit. Yet, somehow it turned into … well, you know what …

Scott, in the extremely unlikely event that BQP \in BPTIME (n^{\log n}), you will make a great career in writing sensationalist fare (No offence intended, just a joke. Again, FWIW, I think you wrote the thing very correctly without taking sides.).

76. Alexander S. Kulikov Says:

Thanks for pointing out this result, Scott! Everybody is still discussing the recent result by Ryan, and now we have an amazing result by Virgi. Strong family. =) Now it is Ryan’s turn again. =) So, I’m curious what Ryan is going to prove.

Two points:
1. About two years ago, James Eve published his ideas on multiplying matrices in n^{2+o(1)} time, and passed away shortly afterwards.
The report, and Donald Knuth’s comment on it, are available here: http://www.cs.ncl.ac.uk/publications/trs/abstract/1169

2. There is a very nice presentation by Uri Zwick on graph algorithms based on matrix multiplication:
http://logic.pdmi.ras.ru/ssct09/slides/SSCT09_Uri_Zwick.pptx
(In case somebody is interested we also have a video recording of his minicourse.)

77. Joshua Zelinsky Says:

Follow-up to Sniffnoy’s question: many of the problems that use matrix multiplication in a critical step involve matrices that are restricted in some way by how they arise from the problem (like matching sets in graph theory). But I haven’t seen anyone look at those and use the restricted forms either to improve on the matrix multiplication in those contexts, or alternatively to show that the actual matrix multiplication will end up being faster. Is this just too difficult, or is there just not much room for improvement there? Naively, for example, I’d think that multiplying matrices whose coefficients lie in some limited set would be easier (for example, if the coefficients are in {0,1}).
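One standard observation relevant to the {0,1} case (a textbook reduction, not something claimed by anyone in this thread): the Boolean (AND/OR) matrix product reduces to the ordinary integer product, by multiplying over the integers and then replacing every nonzero entry with 1. So any fast integer matrix multiplication algorithm speeds up the Boolean case too. A minimal Python sketch, where `bool_matmul` is my own name and the inner product is naive for clarity:

```python
def bool_matmul(A, B):
    """Boolean (AND/OR) product of two n-by-n 0/1 matrices via integer arithmetic."""
    n = len(A)
    # Ordinary integer product of the 0/1 matrices; a fast MM algorithm
    # could be substituted here without changing the final answer.
    C = [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
         for i in range(n)]
    # Threshold: entry (i,j) is 1 iff some k has A[i][k] = B[k][j] = 1.
    return [[1 if c > 0 else 0 for c in row] for row in C]
```

This shows why restricting coefficients to {0,1} is, as far as upper bounds go, no harder than the general integer case; whether it is strictly easier is exactly the kind of question Joshua raises.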

From Stothers’ thesis abstract:

“Chapters 3 and 4 build on the work of Coppersmith and Winograd and examine how cubing and raising to the fourth power of Coppersmith and Winograd’s ‘complicated’ algorithm affect the value of omega, if at all.”

Yes, it does have an effect.

79. Chronoz Says:

Looking at Stothers’ LinkedIn profile reminds me of the alt-text of this ridiculously amusing xkcd comic. http://www.xkcd.com/664/

Do check out the alt-text.

80. Douglas Knight Says:

Joshua Zelinsky, Markus Bläser, above, asserts that the answer cannot depend on the field of coefficients, only the characteristic. This allows the possibility that restricting to 0-1 matrices makes it easier, especially if char 2 is easier than char 0, but it is pretty suggestive that it doesn’t help.

81. Scott Says:

Scott, in the extremely unlikely event that BQP \in BPTIME (n^{\log n}), you will make a great career in writing sensationalist fare

Thanks man; I’ll take a compliment where I can find one. 😀 Somehow, though, I seem to have a knack for writing what gets angrily derided as “sensationalist fare,” even when I write about completely-innocuous-seeming things that are part of the scientific consensus. For example: “It is not currently known that BQP ⊆ BPTIME(nlog(n)).” Now let the flamewar begin…

82. Russell Impagliazzo Says:

Ken, those links to Stothers’ blogs say they are private and I don’t have permission to read them. Is posting the link an ironic comment on his publicity-shy nature?

83. Sid Says:

@Russell Impagliazzo: According to comment 73, they were public but were marked private soon after the links were posted.

84. Kenneth W. Regan Says:

The only irony I intended was on the smallness of the advance compared to the idea of “plugging” it (idiom note: “plugging” in this context means heavy advertising/selling). I gave Dick an opinion similar to Markus Bläser’s by phone this morning to say “here’s where he’s coming from” before signing off on the balance of opinion given in our post. I spent almost an hour reading Stothers’ blogs, for reasons ranging from wanting to pin down the dates in 2010 for the post to personal interest, before making the comment above. I felt that the conversation going on here should be informed by them. I’ve addressed this privately.

In phone conversation just now we were talking about Strassen’s belief that \omega > 2, which is publicly known. Dick suggested 2 + 1/e = 2.367879441… as a possible floor, right up against what’s known. Any takers or breakers?

85. John Sidles Says:

I keep an informal list of creative work accomplished by folks whose subsequent main career was non-research and/or non-academic and/or non-mainstream; prominent examples include Charles Hermite’s student Jules Henri Poincaré (mining engineer), Ludwig Wittgenstein’s philosophy student Maurice Drury (medicine), Bertram Kostant’s student James Harris “Jim” Simons (financial engineering), Richard Feynman’s student Marc Kislinger (medicine), and even musician Horatio Parker’s student Charles Ives (insurance).

Here the point is that perhaps Andrew Stothers has evaluated the worth of his own thesis’ main result by standards that are not universally embraced, yet nonetheless are entirely reasonable.

86. gabor Says:

@Markus Bläser:

It is currently not clear to me how this will get any better
bound than 2.37… (or whatever the infimum of this process is, it is certainly not 2; whenever you raise the tensor to a higher power, the improvement also shrinks geometrically).

Do you have a proof of this? According to Williams’s paper, the 3rd tensor power gave no improvement over 2.375…, yet the 4th tensor power gives something. The behavior seems to be non-monotone with respect to the Nth tensor power. If you are correct, then from the 2nd to the 3rd there is no improvement, so why even check the 4th, since the improvement can only be smaller…?

The calculations in Andrew’s thesis are correct and, according to Andrew, were checked with the help of computers.

What do you mean by “the calculations are correct”?

87. A new faster algorithm for matrix multiplication « Introduction to Algorithm Analysis and Design Says:

[…] Andrew Stothers also had a similar (but a bit weaker result) last year in his PhD thesis). See Scott Aaronson’s blog post and Dick Lipton and (our own) Ken Regan’s blog post for more on the […]

88. pete Says:

“Here the point is that perhaps Andrew Stothers has evaluated the worth of his own thesis’ main result by standards …”

It appears that Blaser, Cohn, Strassen and Shokrallahi evaluated his thesis and said they did not care too much. One should not blame Stothers for therefore not advertising his result. It appears that he was, sadly, not encouraged to do so.

89. Jeff Says:

I think some of the negative reactions are due to the fact that two similar results appear to have been viewed differently. We can blame Stothers for not publicizing his work, but one could jump to the conclusion that who one knows is as important as what one does. If Stothers had made this discovery while at MIT or Berkeley, would it have been treated differently?

I think the “point” is that it is exciting that progress has been made on such a fundamental and stubborn problem. This comment thread seems to have moved away from that point.

91. Henry Cohn Says:

It appears that Blaser, Cohn, Strassen and Shokrallahi evaluated his thesis and said they did not care too much.

I’m sorry if I gave that impression, and I don’t read Bläser’s comment as saying he didn’t care. I have no idea whether Strassen or Shokrollahi ever looked at the thesis itself, since I last talked with them about it before any of us were aware it was available online. Of course I can’t speak for them regarding evaluation, but they evidently thought the improved exponent was news worth discussing. I did look at the thesis some time later (while searching to see if Stothers had a preprint), and unfortunately I jumped to conclusions based on the abstract and introduction, which were pretty vaguely written; in particular, I believed that Stothers had in the end failed to improve on the Coppersmith and Winograd exponent using the 3rd and 4th tensor powers. That would have been a less exciting outcome, but even then working this out carefully would have been a worthwhile addition to the literature (by showing that a natural generalization did not yield an improved exponent). Actually getting some progress on lowering the exponent after two decades is definitely exciting.

I think the “point” is that it is exciting that progress has been made on such a fundamental and stubborn problem.

I agree!

92. breakthrough Says:

On Scott’s comment #3:
If you only need to “memorize the statements of their next ten breakthrough theorems,” you should not be worried about an old man’s memory problems, as it looks like there will be 20 more before you grow your first grey hair.

(will someone stop this couple from solving all our problems?)

93. Andrew Marshall Says:

It’s possibly worth mentioning that the Wikipedia article on matrix multiplication still mentions CW as the record holder. Perhaps one of the experts in the field fancies updating it?

http://en.wikipedia.org/wiki/Matrix_multiplication#Algorithms_for_efficient_matrix_multiplication

94. Scott Says:

breakthrough #92: I share your fear about Ryan and Virgi; but on the other hand, my memory is already rapidly worsening with nary a gray hair on my head. This will be quite a race…

95. Lou Scheffer Says:

Andrew #93: OK, Wikipedia updated.

96. Scott Says:

Jeff #89:

I think some of the negative reactions are due to the fact that two similar results appear to have been viewed differently. We can blame Stothers for not publicizing his work, but one could jump to the conclusion that who one knows is as important as what one does. If Stothers had made this discovery while at MIT or Berkeley, would it have been treated differently?

Aha! Again and again, when people react angrily to something I’ve written, I don’t even understand where the animus is coming from until someone points out an alternative reading of my post that’s so uncharitable, so detached from what I was thinking, that it would never have occurred to me on my own. Given the cosmopolitan, globetrotting nature of TCS, I figured it went without saying that an improvement of ω would be recognized as a Big Deal even if it came from the moon—provided the moon-people posted a preprint or something! (OK, I suppose a lowering of ω by moon-people would be a Big Deal even for reasons wholly unrelated to the asymptotic complexity of matrix multiplication…)

On reflection, though, Andrew’s geographic location almost certainly did play a role: not directly, through some sort of anti-Scottish sentiment (which would be especially strange on my part 😉 ), but indirectly, through making it less likely that he’d get the advice he needed about how to publicize his work.

97. John Sidles Says:

Scott’s Great Truth: “Given the cosmopolitan, globetrotting nature of TCS …”

Scott has expressed a Great Truth whose opposite great truth also has been evident in some aspects of the Stothers-Vassilevska episode:

The Opposite Great Truth of TCS: “Given the parochial, insular nature of TCS …”

It’s good news (IMHO) that the centroid of the Stothers-Vassilevska discourse has been steadily shifting towards TCS as a cosmopolitan/global enterprise, and for this shift we owe thanks to weblogs like Gödel’s Lost Letter and Shtetl Optimized.

98. Rahul Says:

Scott: If the upshot of this brouhaha is that you’re now more likely to visit Edinburgh, well, I really can’t complain…

99. Lou Scheffer Says:

For at least some of us, “the entire concept of using a CS theory blog to “promote” major milestones in CS theory” is precisely why we read this blog. It’s no different than running into a colleague who enthusiastically mentions a recent result they’ve run across. You might believe it, scoff at it, run to find the article, and so on, but the simple fact that a colleague found this interesting is an enormously useful hint. This is particularly true for those such as myself, who are interested in CS theory but not professionally involved, and live/work/study where there are few if any CS theorists to run into. You used to have to reside in a university department (or corporate lab) to get this timely and supremely useful insight, but now blogs at least theoretically extend this to anyone in the world who is interested. So I for one am extremely appreciative, whether or not I agree with the commentary.

100. curious Says:

Shouldn’t Virgi have made the world known about Andrew’s breakthru as soon as she realized that it was a breakthru?

101. Scott Says:

curious #100: Virgi did more than anyone else (including Andrew) to “make the world known about Andrew’s breakthru”, by discussing it prominently in her paper in relation to her own breakthru. I’d say she’s discharged her responsibilities there.

102. Jeff Says:

Tangentially, as someone in industry, I think it’s a shame that PhD dissertations are only available behind paywalls. If they were published on the arXiv, or better yet via something along the lines Tim Gowers described, a wider group of people might examine them.

103. Cindy Says:

Thank you, Scott, for your updates. I was not especially happy with the attribution in the original blog post.

It does seem like Stothers received bad advice, and was harmed by his poor connections. He also made a mistake by burying the result to some degree. I wonder why things worked so differently for Williams. Posting a pdf on a personal web site is also an obscure way to distribute research, and it seems like her announcement was the same as Stothers’ announcement, by email to a few experts–but the reception has been completely different.

104. pete Says:

“(a) that this tangled situation shouldn’t in any way detract from Virgi’s fantastic achievement, which (except for a simplification, as she discusses) must be considered completely independent of Andrew’s, and”

Yes, and this tangled situation (of not getting good advice) should not in any way detract from Stothers’ fantastic achievement of breaking the 20-year bound! Which of course was completely independent of Virgi’s.

105. Aaron Says:

Scott, you’re telling people who are sick of hyping to go to hell. I appreciated the entire discussion until you decided to react childishly and add a couple more reasons for the “controversy” you constantly seem to seek.

I find this a bit insulting to other people who don’t necessarily share your taste in improving upper bounds by epsilon using recycled techniques+case analysis. I mean, no offense to the author, the result is nice and important, but my personal belief is that it will, most probably, never help us get significantly close to the lower bound.

Aside from that, keep up the good work, and wishing for the best.

106. Scott Says:

Cindy #103:

I wonder why things worked so differently for Williams.

Well, for starters,
(1) she’s submitted her paper to a major theory conference, and
(2) she didn’t bury her main result deep in the body of the paper, but put it right in the title and abstract (the parts of the paper that ~99% of readers never get past 🙂 ).

Because of (1) and (2), even if the news hadn’t happened to reach people (e.g., me, Dick Lipton) who were excited to blog about it, it would have gotten out very quickly anyway, in contrast to the Stothers case.

107. Ce se mai intâmplã in informatica teoreticã | Isarlâk Says:

[…] multe puteţi citi in limba englezã pe blogurile lui Scott Aaronson, Bill Gasarch şi Lance Fortnow, respectiv Robert Lipton. Acest din urmã blog conţine şi un […]

108. Scott Says:

Aaron #105:

I find this a bit insulting to other people who don’t necessarily share your taste in improving upper bounds by epsilon using recycled techniques+case analysis.

That was a remarkably ironic ending to a sentence that began “I find this a bit insulting…”! 😀

Here’s what I think are reasonable ground rules. If (what everyone agrees to be) a solid new technical result doesn’t interest you, just ignore it. If a blog keeps mentioning technical results none of which interest you, stop reading that blog. The only reasons to actually denigrate a new result in the blogosphere would be if, for example, the result was wrong, it was well-known for a long time, it was a complete triviality, it was being grossly overhyped in major media outlets (like CNN or The New York Times, not a couple nerd blogs), or people were drawing (or the authors were encouraging them to draw) wrong conclusions from the result about broader issues. Even if one of these conditions holds, one should still try to highlight what, if anything, is good about the result. But none of the conditions holds in this case. If your only beef with a particular result is that it isn’t your cup of tea (to mix food groups), then I don’t think there’s any need to tell the world.

109. Paul Beame Says:

BTW: Scott mentions in the post that he knew about the result a month ago. This is the kind of thing one hears about in conference foyers or over lunch (that was true for several of us at FOCS for this result, though not with the details), though I don’t want to speculate about whether that was the path to Scott this time, since he wasn’t there. Maybe another bit of advice to students…?

110. Aaron Says:

Scott, since you agree that it may just as well be a matter of taste, you should maybe consider that it’s still ridiculous to undermine everyone who doesn’t share your opinions (and I’m saying this leaving all my own opinions aside). It’s a new result and it’s cool; that’s great. But if I don’t think it’s a breakthrough, I should gtfo, right?

111. AR Says:

Aaron said:
it’s still ridiculous to undermine everyone who doesn’t share your opinions

Isn’t this what most of the comments (including yours) are trying to do? Scott posts “I think X” and the comments are full of puppets typing many words trying to undermine belief in opinion X.

…my personal belief is that it will, most probably, never help us get significantly close to the lower bound.

Please, explain what qualifies you to this claim. Do you have any proof? Or is it just trolling?

112. Arul Says:

Doesn’t the group-theoretic (GT) method of Cohn and other researchers recover CW’s 2.376? Is it possible, and wouldn’t it be really great, if it recovered the 2.373 as well?

113. curious Says:

But Scott, she didn’t have to wait till she improved the result and submitted her paper. Couldn’t she have made the world known about Andrew’s breakthrough as soon as she came to know about it?

I don’t know how Cohn can say now that the result was buried deep in Andrew’s thesis and hence he didn’t realize that there was breakthrough stuff in there. Was he not supposed to go through the thesis carefully as an examiner?

114. anonymous commenters are bad Says:

curious, I think you miss the point of independent research. She came to know of Stothers’ result after she had already developed her result.

I don’t think Cohn was an examiner of the thesis. He is just a world expert on matrix multiplication. Someone told him about Stothers’ work. It seems as though he read the intro/abstract of the thesis and didn’t think the thesis was worth reading any further.

115. Henry Cohn Says:

I don’t know how Cohn can say now that the result was buried deep in Andrew’s thesis and hence he didn’t realize that there was breakthrough stuff in there. Was he not supposed to go through the thesis carefully as an examiner?

Bläser was actually the examiner, while I had no role in this (and in fact didn’t see the thesis until I found it on the web some time after Stothers graduated).

116. curious Says:

@Cohn: Thanks for the clarification. I had misunderstood and therefore sincerely apologize to you.

117. ano Says:

I suspect that part of the fodder for the nasty turn the comments took (/have taken) is that you edited your original post when you learned of Stothers’s result, instead of only appending to it.

The original post said:
“For twenty years, the fastest known algorithm [..] took a leisurely O(n^2.376) steps. But no more. Virginia Vassilevska Williams, [..] breakthrough paper [..] multiply n-by-n matrices in a lightning-fast O(n^2.373) steps. [..] Huge congratulations to Virgi!”

which is fine. You’re excited, as everyone should be, about the improvement in the exponent.

But after you edited it, it looked (for a while) something like:

“For twenty years, the fastest known algorithm to multiply two n-by-n matrices, due to Coppersmith and Winograd, took a leisurely O(n^2.376) steps. Last year, though, buried in his PhD thesis where no one would read it, Andrew Stothers discussed an improvement to O(n^2.374) steps. And today, Virginia Vassilevska Williams of Berkeley and Stanford, released a breakthrough paper…”

The clause “buried…” in the second sentence (which I may have paraphrased) is understandable for those who read the first version: it’s your (fair enough) excuse for why you hadn’t heard of it when you posted the first version. But someone who saw only the second version misunderstood it and felt you were dismissing one of the results and calling the other a breakthrough.

And the core point, that there has been a breakthrough made — the exponent has been improved — by two people, was lost in this chatter.

Congrats again to Williams and Stothers for their fantastic work.

118. SB5 Says:

Is Virgi related to Ryan Williams? Oh I so hope she is! 🙂

119. Ano Says:

From an outsider’s point of view, over-advertising such results as breakthroughs might not be a good strategy. It says something about how relevant TCS is to modern computing science, where there are much more interesting things going on. I also find strange the culture of improving an exponent by 0.001 without thinking about what real difference it makes.

120. Scott Says:

SB5: Uh, they’re related by way of being married to each other.

121. Arul Says:

http://epubs.siam.org/sicomp/resource/1/smjcat/v11/i3/p472_s1?isAuthorized=no

The abstract here says that an exponent of $2 + \delta$, for some $\delta > 0$ (and maybe for all $\delta > 0$), cannot be achieved by any one algorithm. Do you have any thoughts about what they mean?

122. Charles Says:

Actually, the economics blogs I read are quite hype-full, much more so than this blog.

Regardless, keep up the good work, Scott! I really enjoy reading your blog; not only does it keep me informed about the major events in TCS but it’s entertaining.

123. gabor Says:

“The abstract here says that an exponent of $2 + \delta$, for some $\delta > 0$ (and maybe for all $\delta > 0$), cannot be achieved by any one algorithm. … Do you have any thoughts about what they mean?”

Strassen’s algorithm does 2-by-2 matrix multiplication with 7 multiplications, which yields $O(n^{\log_2 7})$ operations for $n$-by-$n$ matrices. They show that for every $m$, if you can multiply $m$-by-$m$ matrices with $k$ multiplications, then $\omega < \log_m k$, i.e., no single $m$-by-$m$ base algorithm can attain the "best" exponent. This is why $\omega$ is defined as an infimum.
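[To make the $m = 2$, $k = 7$ case concrete, here is a minimal Python sketch of Strassen’s recursion (an illustration added in editing, not part of gabor’s comment); it counts scalar multiplications to exhibit the $7^k$ count, hence $n^{\log_2 7} \approx n^{2.807}$, for $n = 2^k$:]

```python
# Illustrative sketch: Strassen's 7-multiplication scheme for 2x2 blocks,
# applied recursively. Matrices are lists of lists, n a power of 2.

def strassen(A, B):
    """Return (A*B, number of scalar multiplications performed)."""
    n = len(A)
    if n == 1:
        return [[A[0][0] * B[0][0]]], 1

    h = n // 2
    def quad(M, r, c):  # extract the (r, c) half-size quadrant
        return [row[c*h:(c+1)*h] for row in M[r*h:(r+1)*h]]
    A11, A12, A21, A22 = quad(A,0,0), quad(A,0,1), quad(A,1,0), quad(A,1,1)
    B11, B12, B21, B22 = quad(B,0,0), quad(B,0,1), quad(B,1,0), quad(B,1,1)

    def add(X, Y): return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]
    def sub(X, Y): return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

    # The seven recursive products (instead of the naive eight).
    M1, c1 = strassen(add(A11, A22), add(B11, B22))
    M2, c2 = strassen(add(A21, A22), B11)
    M3, c3 = strassen(A11, sub(B12, B22))
    M4, c4 = strassen(A22, sub(B21, B11))
    M5, c5 = strassen(add(A11, A12), B22)
    M6, c6 = strassen(sub(A21, A11), add(B11, B12))
    M7, c7 = strassen(sub(A12, A22), add(B21, B22))

    # Recombine quadrants using only additions/subtractions.
    C11 = add(sub(add(M1, M4), M5), M7)
    C12 = add(M3, M5)
    C21 = add(M2, M4)
    C22 = add(sub(add(M1, M3), M2), M6)

    C = [C11[i] + C12[i] for i in range(h)] + [C21[i] + C22[i] for i in range(h)]
    return C, c1 + c2 + c3 + c4 + c5 + c6 + c7
```

[For $n = 4$ the count comes out to $7^2 = 49$ scalar multiplications, versus $4^3 = 64$ for the naive algorithm; the gap widens as $n$ grows.]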

124. jonas Says:

125. Farbod Says:

On page 3 of Vassilevska Williams’s draft it is mentioned that the author was indeed aware of the improvement made by Stothers *prior* to the completion of her work. She also mentions that she has benefited from some observations of Stothers.

Given this, I am a bit puzzled by the title chosen for her paper. A better title would have eliminated much of the controversy. Also, assuming that the announcements on this and other blogs were made after reading the introduction to her paper, the argument that Stothers’ work was unknown to many (hence uncredited) doesn’t make much sense.

This new contribution of Vassilevska Williams could be seen as an advertisement of Stothers’ great work, as well as great new generalizations.

126. Scott Says:

Farbod #125: No, that’s not correct. Virgi found out about Stothers’ thesis after she’d already lowered ω. She was then able to apply an idea from his thesis to simplify her already-existing proof. It’s really not that complicated; I wish people could get it straight so we could move on.

127. Farbod Says:

By “the completion of her work” in my previous comment I meant finishing up the final draft and making the announcement.

Given that she was made aware of the fact that the other work had already “broken the barrier” (independently, but earlier nonetheless), the title could have been chosen more carefully.

128. DI Says:

I tend to agree with Farbod.
Time 1: Williams lowers ω.
Time 2: Williams discovers the work of Stothers. As it is very closely related to her own work, she should read it and verify whether Stothers’ result is correct. It seems that this is what Williams did, because after all, she was able to apply some of Stothers’ techniques to simplify some proofs.
Time 3 (at the moment of communicating the result): Williams had already read Stothers’ work in detail and found no flaw in it (otherwise she should have mentioned one), so she should have realized herself that Stothers’ work was the first to break the CW barrier. She chose, however, a title that presents her result as the first to break the CW barrier.

I also agree with Cohn: if there is no serious flaw in Stothers’ work, we should credit it as the first to break the CW barrier. Williams’s work could then be considered an independent great result and a vast improvement on Stothers’ work.

129. The Meaning of Omega « Gödel’s Lost Letter and P=NP Says:

[…] note that Markus Bläser has contended in a comment that the extension of CW used by both new papers has limitations, and we infer that some other […]

130. What is the worse case time complexity of matrices multiplication in Matlab? - Quora Says:

[…] Peripheral point: It seems that new lower bounds have been discovered since Coppersmith-Winograd: http://www.scottaaronson.com/blo… […]

131. Markus Bläser Says:

@various people

1) Do not get me wrong. I like the work by Andrew and Virginia. Both are great pieces of hard work. But if you know the work by Coppersmith and Winograd, then you know the limitations of their methods before reading these papers. I do not have a formal lower bound on what you can get out of the outer structure of the Coppersmith-Winograd tensor, but from what one typically observes, the improvement shrinks geometrically with the power you take.

2) One possible reason why you do not get anything from the third power is that it is the product of the second and the first power. The second power is superior to the first power. Note that taking the second, fourth, or whatever power is done for the sake of analysis. In the construction, you take a huge power of the tensor, and then it is better to think of it as a product of second powers of the tensor than as a product of second powers as well as first powers. (Again, I cannot prove this.)

3) Ian Gordon, the internal examiner, and I told Andrew after his viva to put a preprint online as soon as possible. Read the update in the post for why he did not.

4) “The calculations in Stothers’ paper are correct” in my earlier comment means that they are not wrong. Some comments above raised the question of whether there was a serious flaw in Stothers’ work. There is not.

132. Arul Says:

@Markus Bläser: Comment #66: “whenever you raise the tensor to a higher power, the improvement also shrinks geometrically.” Is there a proof for this?

133. Arul Says:

@gabor: Comment #123: Thank you.

134. Jr Says:

The Strassen/Coppersmith-Winograd/Stothers/Vassilevska Williams line of results is purely algebraic, right? Has the complexity of matrix multiplication been studied in other models, say on Turing machines?

135. techtings» Key mathematical tool sees first advance in 24 years Says:

[…] knew a month ago that this was coming – I had a hell of a time keeping the secret,” wrote Scott Aaronson, another computer scientist and blogger, based at the Massachusetts Institute of […]

136. Cup Sets, Sunflowers, and Matrix Multiplication | Combinatorics and more Says:

[…] =2.376, to 2.374 and 2.373 respectively. (See the discussions over Lipton’s blog (1,2), Shtetl optimized, and Computational […]

137. Amir Shpilka on fast matrix multiplication | MyCQstate Says:

[…] : multiplying a column by a row matrix requires the computation of all possible products. A lot of excitement was generated recently when the best previously known upper bound, due to Coppersmith and Vinograd […]

138. The Stothers Situation | Andrew's Blog Says:

[…] asking me about my PhD thesis- confused, it emerged that a theoretical computer science lecturer had been writing about it. Fortunately, my twitter was private at the time and they were spared puns and inane ramblings […]