Time to do Scott’s research again

Given vectors v_1,…,v_m in R^n, is it NP-hard to find a unit vector w that maximizes the (absolute value of the) product of inner products, |(v_1·w)···(v_m·w)|? What about in C^n? Are these problems hard to approximate on a logarithmic scale?

$20 for answers to all three. I should get better at doing these things myself.
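
For concreteness, here is what the objective looks like numerically, with a naive random-restart local search (a quick sketch assuming numpy and scipy; the instance V, the restart count, and the small smoothing constant are arbitrary illustrative choices, and none of this addresses the hardness questions):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
V = rng.standard_normal((5, 3))          # rows are v_1, ..., v_m (here m=5, n=3)

def neg_log_objective(w):
    # Work with -sum_i log|v_i . w|; normalizing w inside makes the function
    # scale-invariant, so we can run an unconstrained optimizer over R^n.
    w = w / np.linalg.norm(w)
    return -np.sum(np.log(np.abs(V @ w) + 1e-12))

# Random restarts, since the objective has many local optima on the sphere.
results = [minimize(neg_log_objective, rng.standard_normal(3)) for _ in range(50)]
best = min(results, key=lambda r: r.fun)
w_star = best.x / np.linalg.norm(best.x)
print(w_star, np.prod(np.abs(V @ w_star)))
```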

30 Responses to “Time to do Scott’s research again”

  1. Anonymous Says:

    Min -Sum_i Log(v_i.w) subject to ||w|| &lt;= 1?

  2. Scott Says:

    No, it’s not a convex problem. As an example, let v1=(1,0) and v2=(0,1). Then there are exactly four local optima:
    w=(1,1)/sqrt(2),
    w=(1,-1)/sqrt(2),
    w=(-1,1)/sqrt(2), and
    w=(-1,-1)/sqrt(2).
    The average of two such optima is not a local optimum.
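
    A quick numerical check of this example (plain numpy): the normalized average of the first two optima is (1,0), which zeroes out the second factor entirely.

    ```python
    import numpy as np

    V = np.array([[1.0, 0.0], [0.0, 1.0]])            # v1, v2
    objective = lambda w: abs(np.prod(V @ w))

    w1 = np.array([1.0, 1.0]) / np.sqrt(2)
    w2 = np.array([1.0, -1.0]) / np.sqrt(2)
    avg = (w1 + w2) / np.linalg.norm(w1 + w2)         # normalizes to (1, 0)

    print(objective(w1), objective(w2), objective(avg))   # ~0.5, ~0.5, 0.0
    ```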

  3. Anonymous Says:

    Sorry Scott, you are right: for it to be convex, you need to add the constraints v_i.w > 0 for all i, which then gives a different problem.

  4. Scott Says:

    PS. In case anyone’s wondering, the problem comes from maximum-likelihood estimation of quantum states. I.e. given a set of measurement outcomes, find the (pure) state which is most likely to have produced those outcomes, assuming a uniform prior over pure states. As I recently found out, this is something experimental physicists actually do on a regular basis, using numerical optimization software — of course, without ever asking about complexity.

  5. Anonymous Says:

    Would that simplify to
    |(v1.v2 … vm) . (w.w. … w)|
    ?

  6. Scott Says:

    No. Inner products don’t commute like that.

  7. Fixpoint Says:

    Time to delurk and have some fun…

    Here’s a transformation of the problem which may help.

    Observation: the optimal w can be expressed as a linear combination of the v_i; w = Sum_i (a_i * v_i). (Otherwise we could project w onto the span of the v_i to get a w’ with exactly the same product of dot products but shorter length, and rescaling w’ back to unit norm would then only increase the product.)

    Then the objective function becomes
    |(v1·(a1*v1+a2*v2+…+am*vm))···(vm·(a1*v1+…+am*vm))|.

    Let M be the m*n matrix with v_i in row i, and let a be the column vector of the a_i. Then w itself is just transpose(M)*a, and we can rewrite your objective as:

    Given an m*n matrix M, maximize |b1*b2*…*bm| / ||transpose(M)*a||^m over a, where b = M*transpose(M)*a (equivalently, maximize |b1*…*bm| subject to ||transpose(M)*a|| = 1).

    I’m not sure if this gets us much closer to a solution, but at least it starts to look a bit more familiar in this guise.

    In particular, we can imagine approximating a solution by choosing values for b (an obvious start would be b_i = 1/sqrt(m) for each i!) and then solving for a. Perhaps use the partial derivatives of ||w|| with respect to b to hillclimb. I wouldn’t be surprised if this is something fairly standard in the end — it’s not exactly my area of expertise, but I had fun thinking about it.

    Regards,
    Christopher
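
    A small numerical check of the observation above (numpy only; the instance is illustrative): projecting w onto span{v_1, …, v_m} leaves every inner product v_i·w unchanged and never increases ||w||, so rescaling the projection back to unit length can only increase the product.

    ```python
    import numpy as np

    rng = np.random.default_rng(1)
    M = rng.standard_normal((3, 5))          # rows are v_1, v_2, v_3 in R^5
    w = rng.standard_normal(5)

    Q, _ = np.linalg.qr(M.T)                 # columns of Q span the row space of M
    w_proj = Q @ (Q.T @ w)                   # orthogonal projection of w onto that span

    print(np.allclose(M @ w, M @ w_proj))                 # True: same inner products
    print(np.linalg.norm(w_proj) <= np.linalg.norm(w))    # True: projection is no longer
    ```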

  8. Robin Blume-Kohout Says:

    Scott,

    Glad to see you dropping by State Estimation Land again! A couple of comments:
    1) “Prior” is pretty widely interpreted to mean “prior probability distribution”. That’s a measure. MLE is a frequentist method, and doesn’t involve (or even admit the existence of!) a prior. Technically, what you are talking about is a constraint on the domain of the likelihood function, not a prior distribution.
    2) There _are_ people who do MLE with a restriction to pure states, but they’re bad bad people (:>). The more popular (and much better motivated) scheme is optimization over mixed states (a convex set), with the non-holonomic constraint that Rho is positive.
    3) When you optimize over mixed states, the log of the likelihood function is concave, so maximizing it is a convex problem — quite well-behaved. There have been rumblings for a couple of years now that Andrew Doherty is going to unleash a sophisticated linear programming scheme to do it; as of recently most people were using gradient algorithms on sqrt{Rho}, and arguing about whether that parameterization has local maxima or not.
    4) I have major issues with MLE, even over all states, but will restrain myself from using your blog as a soapbox! :>
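
    For concreteness, here is a minimal sketch of the kind of mixed-state scheme mentioned in point 3): gradient ascent on a square-root-style parameterization rho = A A†/Tr(A A†), where A is any complex matrix (not necessarily the Hermitian square root). The measurement effects, step size, and iteration count below are illustrative assumptions, not anything from this thread.

    ```python
    import numpy as np

    def mle_mixed(effects, dim, steps=5000, lr=0.1, seed=0):
        """Maximize sum_k log Tr(E_k rho) over rho = A A^dag / Tr(A A^dag)."""
        rng = np.random.default_rng(seed)
        A = rng.standard_normal((dim, dim)) + 1j * rng.standard_normal((dim, dim))
        for _ in range(steps):
            G = A @ A.conj().T                              # unnormalized rho
            N = np.trace(G).real
            # Wirtinger gradient of sum_k [log Tr(E_k G) - log Tr(G)] w.r.t. conj(A)
            grad = sum(E @ A / np.trace(E @ G).real for E in effects)
            grad -= len(effects) * A / N
            A = A + lr * grad
        rho = A @ A.conj().T
        return rho / np.trace(rho).real

    # Illustrative data: one click for each of the six Pauli-basis outcomes on a qubit.
    z0, z1 = np.array([1, 0], complex), np.array([0, 1], complex)
    kets = [z0, z1, (z0 + z1) / np.sqrt(2), (z0 - z1) / np.sqrt(2),
            (z0 + 1j * z1) / np.sqrt(2), (z0 - 1j * z1) / np.sqrt(2)]
    effects = [np.outer(k, k.conj()) for k in kets]
    print(np.round(mle_mixed(effects, 2), 3))               # expect roughly I/2 for this data
    ```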

  9. Fixpoint Says:

    Hmm. Math is hard. Well, linear algebra is hard. (It probably would have been easier had I shown up for more classes).

    Taking inspiration from Scott’s example, what if we simplify the problem a bit to be just finding the signs of the components of w?

    Letting w_i = ((-1)^x_i)/sqrt(n), how hard is it to find n bits x_i which maximize Scott’s product?

    I could imagine trying to build “hard” instances by taking n == m, and the matrix of vs to be an identity matrix plus some small epsilon times another matrix — but I haven’t worked out the details.

    Regards,
    Christopher
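
    A brute-force baseline for this restricted sign version (numpy only; exhaustive over the 2^n sign patterns, so only for tiny instances). The instance below is of the identity-plus-epsilon form suggested above, with an arbitrary perturbation.

    ```python
    import itertools
    import numpy as np

    def best_sign_vector(V):
        """Search all w with entries +-1/sqrt(n) for the largest |prod_i v_i.w|."""
        m, n = V.shape
        best_w, best_val = None, -1.0
        for signs in itertools.product([1.0, -1.0], repeat=n):
            w = np.array(signs) / np.sqrt(n)
            val = abs(np.prod(V @ w))
            if val > best_val:
                best_w, best_val = w, val
        return best_w, best_val

    rng = np.random.default_rng(2)
    V = np.eye(4) + 0.1 * rng.standard_normal((4, 4))   # identity plus small perturbation
    print(best_sign_vector(V))
    ```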

  10. Scott Says:

    Thanks so much, Robin! Thanks especially for reminding me of the religious difference between frequentists and Bayesians over MLE. I thought I had it straight, but evidently I didn’t. Like Sunnis and Shiites, frequentists and Bayesians are much more similar to each other than either is to me.

    To clarify: are you saying that, if we want to maximize the likelihood function over mixed states, then there’s a unique local optimum (or rather hyperplane of local optima) and hence it’s doable in time polynomial in the Hilbert space dimension?

  11. Greg Kuperberg Says:

    sum_k log |v_k.w| is a concave function on the sphere in R^n, except on the hyperplanes where the argument of the logarithm vanishes. So you are maximizing a kind of cobblestone function. Once you pick a list of signs for the inner products v_k.w, you can maximize the restricted function efficiently using convex programming methods. However, picking the right cobblestone to maximize looks hard to me. I bet that it is indeed NP-hard.

    The same function for C^n (or CP^{n-1}) is no longer concave. It can’t be, because the complement of all of the hyperplanes is now connected. It’s probably also NP-hard, although unfortunately, it does not clearly reduce to the real case.

    I don’t see that passing to mixed states helps much. If you elect not to take the logarithm and instead maximize prod_k |v_k.w|^2, then you can think of the objective as a product of m linear functions on the Bloch region of mixed states. The maximum has to be an extremal point, i.e., a pure state, by the saddle-like form of the objective.

  12. geoff Says:

    No, no, no.
    No, no, yes.
    No, yes, no.
    No, yes, yes.
    Yes, no, no.
    Yes, no, yes.
    Yes, yes, no.
    Yes, yes, yes.
    Where’s my $20?

  13. Scott Says:

    Thanks, Greg! The “decomposition into cobblestones” is an insight I was missing.

    I found it extremely annoying that I couldn’t reduce the real case to the complex one or vice versa.

  14. Scott Says:

    Where’s my $20?

    I dunno, wiseass — where’s your proof? Here in Nerd Country, when we use the word “answer,” we mean “justified answer.”

  15. John Sidles Says:

    A conceptually similar “is-it-NP-hard?” problem that I asked on Mike Nielsen’s weblog (and to which no one ever replied) is this:

    Given a set of two or more Hermitian matrices H_i, is there a unitary transform U such that the similarity transform U H_i U^dagger is sparse for each H_i?

    Here “sparse” may be taken to mean “no more than two nonzero entries in any row or column”.

    To a physicist, this is the problem of finding a “natural” basis for quantum-entangling Hamiltonian interactions, i.e., the basis in which the entanglement is simplest. To a cryptographer, this same problem amounts to breaking certain public key distribution systems.

    When I search the complexity literature, attempting to find an answer to this question, I find that the inhabitants of the “Complexity Zoo” all look alike to me! It’s worse than helping my wife try to identify crossbill species.

  16. John Sidles Says:

    Hey, within five hours, fate has granted my wish (at least partially)!

    Namely, Dominik Janzing and Pawel Wocjan have this evening posted a preprint entitled Estimating diagonal entries of powers of sparse symmetric matrices is BQP-complete.

    Janzing and Wocjan do not precisely solve the Hermitian “sparsity” problem just posted, but they have solved a closely-related problem by an approach that I had not considered, namely, given a problem that is conjectured to be NP-complete, prove instead that it is BQP-complete!

    Then as the authors note “Given the assumption that BQP .ne. BPP, i.e., that a quantum computer is more powerful than a classical computer, the required information on the spectral measure cannot be obtained by an efficient classical algorithm.”

    To me, this work makes it plausible that the BQP-complete “Zoo” may not only be very much larger than the NP-complete “Zoo,” but even better, the quantum animals in BQP may eventually prove to be very much easier to capture than classical animals, once we know more about them and their mathematical habitat.

  17. Douglas Knight Says:

    I’ve heard there are more sophisticated disagreements between frequentists and Bayesians, but when it comes to MLE, the frequentists are just wrong. As usually done, it depends on a choice of coordinates. This is a special case of a prior, and I think it can be done for general prior measures.

    Even if one acknowledges priors, MLE is essentially the mode, and has the same drawbacks as that. (E.g., dependence on the coarseness of binning should make you nervous.)

  18. puzzled Says:

    I don’t know much about Quantum and don’t have time to read the paper John mentioned, but I am surprised at the provable existence of BQP-complete problems. Doesn’t a BQP algorithm contain a ‘promise’ in a way similar to BPP?

    Also, entries of matrix powers are generally findable in #P, and #P functions are approximable in the Poly Hierarchy. So if this problem is BQP-hard, does this supposed result show BQP is in PH? Or am I confused, or not careful enough to relate the parameters associated with the mentioned results and this paper?

  19. Anonymous Says:

    Scott–I appreciate this blog’s math side, but I have to say that throwing out concrete math problems and asking about their NP-hardness status is not the same as engaging your audience in complexity theory proper, e.g. teaching us about how one should think about quantum information or invent new complexity classes.

    You admitted to being bad at NP-completeness proofs; would you admit this if you thought it was a very important activity?

  20. Nagesh Adluru Says:

    would you admit this if you thought it was a very important activity?

    Very nice question!!

  21. John Sidles Says:

    Puzzled sez: “Entries of matrix powers are generally findable in #P … ”

    Aha … but if we generalize to interval matrices (i.e., matrices whose entries are known only within intervals, e.g., [2.0, 2.1]), then even a task as simple as computing the cube of an interval matrix is NP-hard.

    Such problems are ubiquitous in control engineering, because engineers almost never know the parameters of a physical system exactly. Which is why engineers in general, and quantum system engineers in particular, are beginning to take such a strong interest in complexity theory.

    As for Scott’s confession of being daunted by the NP-hard “zoo”, I admire him very much for it, and I don’t mind confessing that my own difficulties in grappling with the zoo are ~100X greater, even when I restrict my attention to problems that are relevant to hardware.

    Conversely, I’m skeptical of colleagues who assert that they have privileged access to the imperial message (a fable that I’m pretty sure most complexity theorists know, and those who don’t know it will enjoy).

  22. Robin Blume-Kohout Says:

    Scott: Thanks for what I think was a compliment… is “You have helped me to understand the rantings of foolish people” a good thing? Anyway, I suppose the commonality of frequentists and Bayesians is that anyone who takes the time to define themselves as one or the other is passionate about statistical inference. I support this sort of passion — I see it as “passionate about understanding how the hell we figure anything out, ever.”

    Regarding your clarification question: yes. Although I freely confess to being a physicist, and that I _haven’t_ thought for more than 8 hours about whether there could possibly exist ultra-pathological cases that are hard.

    Several people have told me in the last week that I should visit PI. If it works out, I shall come equipped with bell, book, and candle, wearing my evangelist’s hat!

    Greg: Your cobblestone argument is interesting — I gotta take the time to understand it! However, I can’t agree with your last point on mixed states. Suppose I’m dealing with a qubit, and I’ve observed the pure states {|+z>,|-z>,|+x>,|-x>,|+y>,|-y>}. The likelihood of the maximally mixed state is higher than that of any pure state! More generally, you are right about the likelihood being the product of linear functions. Its logarithm is therefore a sum of concave-downward functions (log(x) is concave downward), and is itself concave downward. That means it has a unique maximum, which can be found by a convex program, gradient method, etc.

    Doug: I basically agree with what you’ve said (that frequentists are on the wrong side of the argument for quantum state estimation), but… I’m a bit unclear about a couple of things.
    1. “When it comes to MLE, the frequentists are just wrong.” MLE is a purely frequentist approach — a good Bayesian wouldn’t have anything to say about it except that it’s unjustified. Is this what you mean?
    2. MLE doesn’t depend on coordinates. The measurements that you make define a coordinate system, but MLE is just “Pick the state (hypothesis) that maximizes the conditional probability of the observed data.” No coords there!
    3. Having thought about this a _lot_, I still don’t know of a way to formulate MLE in terms of a prior, where “prior” means “measure”. I know of two ways to formulate MLE in a Bayesian context (1: pick a particular cost function; 2: apply the frequentist axiom brutally), but neither has anything to do with the prior. Can you enlighten me?
    4. You’re right that in the presence of a prior, MLE is the mode! However, I’d say it’s even worse than you say: there IS no mode! Every state has infinitesimal (zero) posterior probability. What gives? Simple: a measure does not have values; it has integrals. So the answer would be _entirely_ dependent on a completely arbitrary binning… which should make you more than nervous; it should make you OUTRAGED! 🙂

    (Sorry, Scott… I promised “no soapbox”…)
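
    A quick numerical check of the qubit example above (numpy only): with one observation of each of the six Pauli-basis outcomes, the maximally mixed state has likelihood (1/2)^6 = 1/64, while no pure state can beat 1/216, since a pure qubit state cannot give probability 1/2 along all three axes at once.

    ```python
    import numpy as np

    z0, z1 = np.array([1, 0], complex), np.array([0, 1], complex)
    kets = [z0, z1, (z0 + z1) / np.sqrt(2), (z0 - z1) / np.sqrt(2),
            (z0 + 1j * z1) / np.sqrt(2), (z0 - 1j * z1) / np.sqrt(2)]
    effects = [np.outer(k, k.conj()) for k in kets]

    likelihood = lambda rho: np.prod([np.trace(E @ rho).real for E in effects])

    print(likelihood(np.eye(2) / 2))          # 1/64 ~ 0.0156 for the maximally mixed state

    rng = np.random.default_rng(4)
    best_pure = 0.0
    for _ in range(20000):                    # random pure states |psi><psi|
        psi = rng.standard_normal(2) + 1j * rng.standard_normal(2)
        psi /= np.linalg.norm(psi)
        best_pure = max(best_pure, likelihood(np.outer(psi, psi.conj())))
    print(best_pure)                          # approaches 1/216 ~ 0.0046, well below 1/64
    ```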

  23. Anonymous Says:

    No, no, no.
    No, no, yes.
    No, yes, no.
    No, yes, yes.
    Yes, no, no.
    Yes, no, yes.
    Yes, yes, no.
    Yes, yes, yes.

    Madeline Kahn imitation?

  24. puzzled Says:

    John–I know approximate matrix powers can be NP-hard; that doesn’t contradict their being in #P (which is hard for the PH, Toda’s thm.), and I’m fairly sure adding interval uncertainty doesn’t change that.

    I’m just asking why reducing BQP to an approximate matrix power entry computation doesn’t show BQP is in PH. BQP being in PH wouldn’t be that shocking, nor would it show BQP to be a ‘weak’ or classically tractable class. It’s just an interesting result that lots of people are trying to prove, so I’m skeptical when an unrefereed paper seems to unwittingly resolve it. And again, I’m skeptical that they can show *any* problem BQP-complete.

  25. John Sidles Says:

    Puzzled: Since this is “international I freely confess to not being an expert week”, I will freely confess to being puzzled AND not-an-expert. Perhaps someone more expert than me can comment.

    Like many folks, I don’t mind learning that a given problem is NP-hard; what is greatly exasperating is to be left in doubt, and then having to work around the issue in the design process.

  26. Anonymous Says:

    Robin,

    (1) No, that’s not what I meant. I meant that MLE depends on a prior, and frequentists deny this. So I should convince you of (2).

    At this point, I should put in a disclaimer that I don’t know what I’m talking about, and might be just wrong.

    (2) If your states are discrete, then I would say there are no coordinates. (Actually, I think that a prior is still relevant; you should multiply the likelihood by the prior.)
    I believe that people do MLE as in (4)

  27. Douglas Knight Says:

    As I was saying,
    I believe that people do MLE as in (4), where there is a continuous space of states. This is what Scott was talking about, isn’t it? Then the likelihood function is a measure on the space of states. People express it as a function times the standard (“Lebesgue”) measure and choose the point where this function is maximized.

    The coordinates came in defining the standard measure, with which to compare the likelihood measure. The standard measure acts as a prior. Another way of saying this is that the coordinates give a way of binning, at every level of coarseness. (I suppose you could define “coordinates” this way.)

  28. Robin Blume-Kohout Says:

    I don’t think anybody’s reading this any more (Scott’s moved on to new, deeper questions), but just for posterity’s sake, here are a couple of final notes.

    MLE really doesn’t involve a prior. A prior is, as Douglas noted, a measure over states. The set of states can be discrete+finite (in which case defining a measure is trivially easy), or it can be continuous. In all the cases we’re discussing on this blog, it’s continuous, and defining a measure is more or less nontrivial.

    Anyway, the likelihood function is L(rho) = p( M | rho ), where rho is your state and M is your observation. Note that this is a probability measure over possible measurement records, but not over states! It is a function of rho. That is, it has a well-defined value for each rho. No measure here! So finding the rho* that maximizes L(rho) is perfectly well-defined without recourse to a measure.

    You can multiply L by some standard (Lebesgue) measure to get a probability measure, but frequentists don’t do that. They don’t even accept the existence of a measure over states, much less a “standard” measure. Furthermore, their methods are entirely consistent within that framework. They just aren’t always consistent with reality!

    So, anybody who does do MLE over a continuous state space, and makes reference to a prior measure (constraints are fine, even if some people erroneously call them priors), is completely off the rails. It is possible to construct a Bayesian method that follows such a protocol, but (a) it’s not a good method, and (b) it’s not properly “MLE”. But criticizing such folk is much like clubbing baby seals.

  29. puzzled Says:

    John–while I’m still a bit skeptical about the paper, Lance’s last post cleared up one confusion of mine: approximating matrix entries need not be doable in the PH if the matrix’s entries aren’t all nonnegative reals.

  30. Scott Says:

    Scott–I appreciate this blog’s math side, but I have to say that throwing out concrete math problems and asking about their NP-hardness status is not the same as engaging your audience in complexity theory proper, e.g. teaching us about how one should think about quantum information or invent new complexity classes.

    Sorry, I don’t know any way to engage people with math except through the asking and answering of concrete questions. Apparently Erdős had the same problem.

    But as for inventing new complexity classes, that’s easy! Just string together “BP”, “N”, “Q”, etc. in some random order, top it off with a “P” or an “E”, and add subscripts and oracles to taste.

    You admitted to being bad at NP-completeness proofs; would you admit this if you thought it was a very important activity?

    I didn’t use to think it was an important activity, but I’m encountering more and more cases where it would’ve been helpful to me to be better at it.