Customers who liked this quantum recommendation engine might also like its dequantization

I’m in Boulder, CO right now for the wonderful Boulder summer school on quantum information, where I’ll be lecturing today and tomorrow on introductory quantum algorithms.  But I now face the happy obligation of taking a break from all the lecture-preparing and schmoozing, to blog about a striking new result by a student of mine—a result that will probably make an appearance in my lectures as well.

Yesterday, Ewin Tang—an 18-year-old who just finished a bachelor’s at UT Austin, and who will be starting a PhD in CS at the University of Washington in the fall—posted a preprint entitled A quantum-inspired classical algorithm for recommendation systems. Ewin’s new algorithm solves the following problem, very loosely stated: given m users and n products, and incomplete data about which users like which products, organized into a convenient binary tree data structure; and given also the assumption that the full m×n preference matrix is low-rank (i.e., that there are not too many ways the users vary in their preferences), sample some products that a given user is likely to want to buy.  This is an abstraction of the problem that’s famously faced by Amazon and Netflix, every time they tell you which books or movies you “might enjoy.”  What’s striking about Ewin’s algorithm is that it uses only polylogarithmic time: that is, time polynomial in log(m), log(n), the matrix rank, and the inverses of the relevant error parameters.  Admittedly, the polynomial involves exponents of 33 and 24: so, not exactly “practical”!  But it seems likely to me that the algorithm will run much, much faster in practice than it can be guaranteed to run in theory.  Indeed, if any readers would like to implement the thing and test it out, please let us know in the comments section!

As the title suggests, Ewin’s algorithm was directly inspired by a quantum algorithm for the same problem, which Kerenidis and Prakash (henceforth KP) gave in 2016, and whose claim to fame was that it, too, ran in polylog(m,n) time.  Prior to Ewin’s result, the KP algorithm was arguably the strongest candidate there was for an exponential quantum speedup for a real-world machine learning problem.  The new result thus, I think, significantly changes the landscape of quantum machine learning, by killing off one of its flagship applications.  (Note that whether KP gives a real exponential speedup was one of the main open problems mentioned in John Preskill’s survey on the applications of near-term quantum computers.)  At the same time, Ewin’s result yields a new algorithm that can be run on today’s computers, that could conceivably be useful to those who need to recommend products to customers, and that was only discovered by exploiting intuition that came from quantum computing. So I’d consider this both a defeat and a victory for quantum algorithms research.

This result was the outcome of Ewin’s undergraduate thesis project (!), which I supervised. A year and a half ago, Ewin took my intro quantum information class, whereupon it quickly became clear that I should offer this person an independent project.  So I gave Ewin the problem of proving a poly(m,n) lower bound on the number of queries that any classical randomized algorithm would need to make to the user preference data, in order to generate product recommendations for a given user, in exactly the same setting that KP had studied.  This seemed obvious to me: in their algorithm, KP made essential use of quantum phase estimation, the same primitive used in Shor’s factoring algorithm.  Without phase estimation, you seemed to be stuck doing linear algebra on the full m×n matrix, which of course would take poly(m,n) time.  But KP had left the problem open, I didn’t know how to solve it either, and nailing it down seemed like an obvious challenge, if we wanted to establish the reality of quantum speedups for at least one practical machine learning problem.  (For the difficulties in finding such speedups, see my essay for Nature Physics, much of which is still relevant even though it was written prior to KP.)

Anyway, for a year, Ewin tried and failed to rule out a superfast classical algorithm for the KP problem—eventually, of course, discovering the unexpected reason for the failure!  Throughout this journey, I served as Ewin’s occasional sounding board, but can take no further credit for the result.  Indeed, I admit that I was initially skeptical when Ewin told me that phase estimation did not look essential after all for generating superfast recommendations—that a classical algorithm could get a similar effect by randomly sampling a tiny submatrix of the user preference matrix, and then carefully exploiting a variant of a 2004 result by Frieze, Kannan, and Vempala.  So when I was in Berkeley a few weeks ago for the Simons quantum computing program, I had the idea of flying Ewin over to explain the new result to the experts, including Kerenidis and Prakash themselves.  After four hours of lectures and Q&A, a consensus emerged that the thing looked solid.  Only after that gauntlet did I advise Ewin to put the preprint online.

So what’s next?  Well, one obvious challenge is to bring down the running time of Ewin’s algorithm, and (as I mentioned before) to investigate whether or not it could give a practical benefit today.  A different challenge is to find some other example of a quantum algorithm that solves a real-world machine learning problem with only a polylogarithmic number of queries … one for which the exponential quantum speedup will hopefully be Ewin-proof, ideally even provably so!  The field is now wide open.  It’s possible that my Forrelation problem, which Raz and Tal recently used for their breakthrough oracle separation between BQP and PH, could be an ingredient in such a separation.

Anyway, there’s much more to say about Ewin’s achievement, but I now need to run to lecture about quantum algorithms like Simon’s and Shor’s, which do achieve provable exponential speedups in query complexity!  Please join me in offering hearty congratulations, see Ewin’s nicely-written paper for details, and if you have any questions for me or (better yet) Ewin, feel free to ask in the comments.

Update: On the Hacker News thread, some commenters are lamenting that such a brilliant mind as Ewin’s would spend its time figuring out how to entice consumers to buy even more products that they don’t need. I confess that that’s an angle that hadn’t even occurred to me: I simply thought that it was a beautiful question whether you actually need a quantum computer to sample the rows of a partially-specified low-rank matrix in polylogarithmic time, and if the application to recommendation systems helped to motivate that question, then so much the better. Now, though, I feel compelled to point out that, in addition to the potentially lucrative application to Amazon and Netflix, research on low-rank matrix sampling algorithms might someday find many other, more economically worthless applications as well.

Another Update: For those who are interested, streaming video of my quantum algorithms lectures in Boulder are now available:

You can also see all the other lectures here.

18 Responses to “Customers who liked this quantum recommendation engine might also like its dequantization”

  1. Mike Says:

    Does this new approach offer clear insights that might be applicable beyond the machine learning case?

  2. Carl Lumma Says:

    So glad you blogged about this — I didn’t get the spelling of Ewin’s name after brunch the other week, and no similar name is listed as a postdoc on your home page (I didn’t guess he was an undergrad!), and couldn’t find a relevant video on the Simons Institute youtube channel.

  3. Scott Says:

    Mike #1: Good question! Check out the comments in Ewin’s paper about L2 sampling; they might offer some insights about where else these ideas might be useful. I should point out, though, that in many cases (Simon’s problem, period-finding, Forrelation, quantum walk on glued trees…), people have proved exponential separations between the quantum and randomized query complexities, which means that there certainly won’t be a dequantization in those cases along the same lines as what Ewin did.

  4. New top story on Hacker News: Customers who liked this Recommendation Engine may also like its Dequantization – Tech + Hckr News Says:

    […] Customers who liked this Recommendation Engine may also like its Dequantization 3 by beefman | 0 comments on Hacker News. […]

  5. Vaso Vuk Says:

    Hello I’m interested to implement this algorithm — I think memoization would end up creating great reductions in run time. Quantum computing could lead to exponential speedups, but it is unclear how one would draw the quantum circuit for such a sophisticated idea at this point. Any leads on that would be greatly appreciated.

  6. asdf Says:

    > Yesterday, Ewin Tang—an 18-year-old who just finished a bachelor’s at UT Austin, and who will be starting a PhD in CS at the University of Washington in the fall

    Could he finish that PhD before he even starts it? That result plus some incantations sounds like a thesis.

  7. Scott Says:

    asdf #6: Possibly, but why? Even when an undergrad is already an accomplished researcher, grad school provides an excellent opportunity to become even better. The pay isn’t high, but one will never again have as much free time to learn new things and think about hard problems (at least in my experience). At this level, it’s not about clearing some arbitrary milestone so one can get a degree, but about developing into the strongest researcher one can be.

    Having said that, if U. of Washington causes any trouble, we in Austin stand ready to take Ewin back… 🙂

  8. J-K Says:

    Just to chip in that recommender algorithms are also used in biology.

  9. C. Sowak Says:

    You are so right about learning and time. I think not realizing that early enough might have been one of the biggest mistakes in my whole life.
    The other thing is that it’s really good for one’s character to live without much money – helps you to stay closer to the roots.

    Nice one! Keep up the good work 🙂

  10. Scott Says:

    J-K #8: Cool! For what?

  11. Brian Says:

    Vaso #5: The implementation of this will require working through some of the assumptions in the paper. For one, a “low-overhead data structure”. Maybe it could be an extension of an already well-engineered framework for building recommendation systems.

    Scott #10: Here’s a paper where researchers predicted cancer drug response using a recommendation system:

  12. pst Says:

    Shot in the dark, just because I happened to be skimming yesterday and it seems to do with training machines on sparse data — related?

  13. JimV Says:

    After reading the post and then coming back the next day and reading the title again, I see what a great title it was. As my nephews say, sweet!

  14. rick samborski Says:

    You say: “Now, though, I feel compelled to point out that, in addition to the potentially lucrative application to Amazon and Netflix, research on low-rank matrix sampling algorithms might someday find many other, more economically worthless applications as well.” Don’t you mean more economically worthwhile applications as well? Or maybe that’s a joke.

  15. Scott Says:

    rick #14: Reread that sentence in the context of the whole paragraph and see if you get the joke.

  16. Mike Maxwell Says:

    I glance at this blog occasionally in hopes that some day I’ll understand s.t., and that it will make a difference. Alas, I’m a linguist with some computing knowledge (I understand the basics of algorithmic complexity), but most of this is over my head.

    But I have a question about a linguistic application that *sounds* like it might be related to this recommendation problem. In morphology, we have languages with multiple inflection classes, like Spanish -ar/-er/-ir verbs (similar systems in other Romance languages); the set of affixes is the same for all verbs within an inflection class, for example Spanish hablar (“to speak”) and arreglar (“to arrange”) both belong to the ar class, and therefore have all the same suffixes. Often some of the affixes are the same across all classes, for instance -o is the first person present indicative for -ar, -er and -ir verbs.

    Some languages have far more than three such systems; and these inflection classes can occur for nouns and adjectives as well as for verbs. And languages tend to have lots more nouns than verbs.

    To some extent cross-cutting these inflection classes are alternations in the stem. The Spanish verb tener (“to have”) is -er class, and venir (“to come”) is -ir class; and although they have some affixes that are different because of this, they have many of the same changes in their stem (= the part before the suffix), like tienes (“you have”) and vienes (“you come”), where the ‘e’ of the stem has changed to ‘ie’.

    A linguistic term that I’ll need to use is ‘lexeme’: it means a word, including all its affixed affixed forms. For English, an example would be RUN, with inflected forms run, runs, running, and ran.

    Spanish dictionaries of course tell you all this, i.e. given a lexeme, the dictionary tells you enough that you can construct all its affixed forms. But if you’re working on a less studied language, you may not have a good dictionary, indeed you may be creating the language’s first dictionary from words you find in corpora (texts). And except for the most common words (and sometimes not even then), you won’t find every form of every word in the corpus. In other words (pun intended), the matrix of inflected forms of lexemes is sparse (the matrix would contain, for each of the m lexemes, all n of its affixed forms, including stem changes).

    Your job as a lexicographer is then to decide which inflection class and which kinds of stem change a given lexeme has (or could have, since the answer may be ambiguous given the attested forms). For any given a column (let’s say different verbs are different columns) that’s the most similar to that column, in some sense of similar (same affixes, same stem changes).

    I’m guessing there’s something about this problem that is different from the recommender problem, but I’m not seeing what it is. (Actually, I know of one such difference: it has to do with the number of inflection classes you get in real languages, which is far lower than what you could get in principle, given the number of distinct affixes. But part of my interest lies in why this constraint exists in real languages.)

  17. Yovel Says:

    Hi Ewin (hopefully you are reading this),
    I’m going to start my thesis soon and I would love to try writing it about your paper. Of course I would not want to steal your own paper (plus you will probably be in a better position to write such a paper :)). Are you planning to try looking into lowering the polylog degree yourself soon, or is it free for grabs?

  18. Ewin Says:

    Mike #16:

    That does seem like a sensible application! I will note, though, that if n is sufficiently small (as it sounds like it might be in this case), the brute-force ways to do these types of linear algebra tasks become pretty fast (not much more time than it would take to upload the corpus into the matrix).

    Yovel #17:

    Feel free! I’m not currently planning to try lowering the exponents. From my perspective, the easiest way of doing this would be to abandon my algorithm strategy completely and do something more direct. It’s reasonable to me that such an approach could improve the exponents by at least half.

Leave a Reply