Avi Wigderson’s “Permanent” Impact on Me

The following is the lightly-edited transcript of a talk that I gave a week ago, on Wednesday October 5, at Avi Wigderson’s 60th birthday conference at the Institute for Advanced Study in Princeton.  Videos of all the talks (including mine) are now available here.

Thanks so much to Sanjeev Arora, Boaz Barak, Ran Raz, Peter Sarnak, and Amir Shpilka for organizing the conference and for inviting me to speak; to all the other participants and speakers for a great conference; and of course to Avi himself for being Avi. –SA


I apologize that I wasn’t able to prepare slides for today’s talk. But the good news is that, ever since I moved to Texas two months ago, I now carry concealed chalk everywhere I go. [Pull chalk out of pocket]

My history with Avi goes back literally half my life. I spent a semester with him at Hebrew University, and then a year with him as a postdoc here at IAS. Avi has played a bigger role in my career than just about anyone—he helped me professionally, he helped me intellectually, and once I dated and then married an Israeli theoretical computer scientist (who was also a postdoc in Avi’s group), Avi even helped me learn Hebrew. Just today, Avi taught me the Hebrew word for the permanent of a matrix. It turns out that it’s [throaty noises] pehhrmahnent.

But it all started with a talk that Avi gave in Princeton in 2000, which I attended as a prospective graduate student. That talk was about the following function of an n×n matrix A ∈ R^{n×n}, the permanent:

$$ \text{Per}(A) = \sum_{\sigma \in S_n} \prod_{i=1}^n a_{i,\sigma(i)}. $$

Avi contrasted that function with a superficially similar function, the determinant:

$$ \text{Det}(A) = \sum_{\sigma \in S_n} \text{sgn}(\sigma) \prod_{i=1}^n a_{i,\sigma(i)}. $$

In this talk, I want to share a few of the amazing things Avi said about these two functions, and how the things he said then reverberated through my entire career.

Firstly, like we all learn in kindergarten or whatever, the determinant is computable in polynomial time, for example by using Gaussian elimination. By contrast, Valiant proved in 1979 that computing the permanent is #P-complete—which means, at least as hard as any combinatorial counting problem, a class of problems believed to be even harder than the NP-complete ones.

So, despite differing from each other only by some innocent-looking -1 factors, which the determinant has but the permanent lacks, these two functions effectively engage different regions of mathematics. The determinant is linear-algebraic and geometric; it’s the product of the eigenvalues and the volume of the parallelepiped defined by the row vectors. But the permanent is much more stubbornly combinatorial. It’s not quite as pervasive in mathematics as the determinant is, though it does show up. As an example, if you have a bipartite graph G, then the permanent of G’s adjacency matrix counts the number of perfect matchings in G.

When n=2, computing the permanent doesn’t look too different from computing the determinant: indeed, we have

$$
\text{Per}\left(
\begin{array}
[c]{cc}%
a & b\\
c & d
\end{array}
\right) =\text{Det}\left(
\begin{array}
[c]{cc}%
a & -b\\
c & d
\end{array}
\right).
$$

But as n gets larger, the fact that the permanent is #P-complete means that it must get exponentially harder to compute than the determinant, unless basic complexity classes collapse. And indeed, to this day, the fastest known algorithm to compute an n×n permanent, Ryser’s algorithm, takes O(n·2^n) time, which is only modestly better than the brute-force algorithm that just sums all n! terms.
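
To make the contrast concrete, here's a minimal Python sketch of Ryser's formula. I've written it the naive way, so it runs in O(2^n·n^2) time rather than the O(n·2^n) you get by enumerating the subsets in Gray-code order; the function name and interface are just for illustration.

```python
from itertools import combinations

def permanent_ryser(A):
    """Permanent via Ryser's inclusion-exclusion formula:
    Per(A) = (-1)^n * sum over nonempty S of (-1)^|S| * prod_i (sum_{j in S} a_ij).
    Naive O(2^n * n^2) version; Gray-code ordering of the subsets gives O(n * 2^n)."""
    n = len(A)
    total = 0
    for size in range(1, n + 1):
        for S in combinations(range(n), size):
            prod = 1
            for i in range(n):
                prod *= sum(A[i][j] for j in S)
            total += (-1) ** size * prod
    return (-1) ** n * total

# A 0/1 biadjacency matrix with exactly one perfect matching:
assert permanent_ryser([[1, 1], [0, 1]]) == 1
```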

Yet interestingly, not everything about the permanent is hard. So for example, if A is nonnegative, then in 2001, Jerrum, Sinclair, and Vigoda famously gave a polynomial-time randomized algorithm to approximate Per(A) to within a 1+ε multiplicative factor, for ε>0 as small as you like. As an even simpler example, if A is nonnegative and you just want to know whether its permanent is zero or nonzero, that’s equivalent to deciding whether a bipartite graph has at least one perfect matching. And we all know that that can be done in polynomial time.


OK, but the usual algorithm from the textbooks that puts the matching problem in the class P is already a slightly nontrivial one—maybe first grade rather than kindergarten! It involves repeatedly using depth-first search to construct augmenting paths, then modifying the graph, etc. etc.

Sixteen years ago in Princeton, the first thing Avi said that blew my mind is that there’s a vastly simpler polynomial-time algorithm to decide whether a bipartite graph has a perfect matching—or equivalently, to decide whether a nonnegative matrix has a zero or nonzero permanent. This algorithm is not quite as efficient as the textbook one, but it makes up for it by being more magical.

So here’s what you do: you start with the 0/1 adjacency matrix of your graph. I’ll do a 2×2 example, since that’s all I’ll be able to compute on the fly:

$$ \left(
\begin{array}
[c]{cc}%
1 & 1\\
0 & 1
\end{array}
\right). $$

Then you normalize each row so it sums to 1. In the above example, this would give

$$ \left(
\begin{array}
[c]{cc}%
\frac{1}{2} & \frac{1}{2} \\
0 & 1
\end{array}
\right). $$

Next you normalize each column so it sums to 1:

$$ \left(
\begin{array}
[c]{cc}%
1 & \frac{1}{3} \\
0 & \frac{2}{3}
\end{array}
\right). $$

OK, but now the problem is that the rows are no longer normalized, so you normalize them again:

$$ \left(
\begin{array}
[c]{cc}%
\frac{3}{4} & \frac{1}{4} \\
0 & 1
\end{array}
\right). $$

Then you normalize the columns again, and so on. You repeat something like n^3 log(n) times. If, after that time, all the row sums and column sums have become within ±1/n of 1, then you conclude that the permanent was nonzero and the graph had a perfect matching. Otherwise, the permanent was zero and the graph had no perfect matching.
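
If you want to play with this, here's a short Python sketch of exactly the procedure just described, with the iteration count and the ±1/n tolerance taken from the text; the helper's name and interface are mine.

```python
import numpy as np

def has_perfect_matching(adj):
    """Matrix-scaling test for a perfect matching in a bipartite graph,
    given its n x n 0/1 biadjacency matrix: alternately normalize rows and
    columns, then check whether all sums ended up within 1/n of 1."""
    M = np.array(adj, dtype=float)
    n = M.shape[0]
    if (M.sum(axis=1) == 0).any() or (M.sum(axis=0) == 0).any():
        return False  # a vertex with no edges already rules out a perfect matching
    for _ in range(int(n ** 3 * np.log(n + 1)) + 10):
        M /= M.sum(axis=1, keepdims=True)  # make every row sum to 1
        M /= M.sum(axis=0, keepdims=True)  # make every column sum to 1
    rows_ok = (np.abs(M.sum(axis=1) - 1) <= 1.0 / n).all()
    cols_ok = (np.abs(M.sum(axis=0) - 1) <= 1.0 / n).all()
    return bool(rows_ok and cols_ok)

print(has_perfect_matching([[1, 1], [0, 1]]))                   # True
print(has_perfect_matching([[1, 0, 0], [1, 0, 0], [1, 1, 1]]))  # False
```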

What gives? Well, it’s a nice exercise to prove why this works. I’ll just give you a sketch: first, when you multiply any row or column of a matrix by a scalar, you multiply the permanent by that same scalar. By using that fact, together with the arithmetic-geometric mean inequality, it’s possible to prove that, in every iteration but the first, when you rebalance all the rows or all the columns to sum to 1, you can’t be decreasing the permanent. The permanent increases monotonically.

Second, if the permanent is nonzero, then after the first iteration it’s at least 1/n^n, simply because we started with a matrix of 0’s and 1’s.

Third, the permanent is always at most the product of the row sums or the product of the column sums, which after the first iteration is 1.

Fourth, at any iteration where there’s some row sum or column sum that’s far from 1, rescaling must not only increase the permanent, but increase it by an appreciable amount—like, a 1+1/n^2 factor or so.

Putting these four observations together, we find that if the permanent is nonzero, then our scaling procedure must terminate after a polynomial number of steps, with every row sum and every column sum close to 1—since otherwise, the permanent would just keep on increasing past its upper bound of 1.

But a converse statement is also true. Suppose the matrix can be scaled so that every row sum and every column sum gets within ±1/n of 1. Then the rescaled entries define a flow through the bipartite graph, with roughly the same amount of flow through each of the 2n vertices. But if such a flow exists, then Hall’s Theorem tells you that there must be a perfect matching (and hence the permanent must be nonzero)—since if a matching didn’t exist, then there would necessarily be a bottleneck preventing the flow.

Together with Nati Linial and Alex Samorodnitsky, Avi generalized this to show that scaling the rows and columns gives you a polynomial-time algorithm to approximate the permanent of a nonnegative matrix. This basically follows from the so-called Egorychev-Falikman Theorem, which says that the permanent of a doubly stochastic matrix is at least n!/n^n. The approximation ratio that you get this way, ~e^n, isn’t nearly as good as Jerrum-Sinclair-Vigoda’s, but the advantage is that the algorithm is deterministic (and also ridiculously simple).
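
Here's a rough sketch of the idea behind that approximation, under the simplifying assumptions that the matrix has no all-zero row or column and that its permanent is nonzero: run the same scaling, keep track of the factors you divide out, and then use the fact that the permanent of the (approximately) doubly stochastic matrix you end up with lies between n!/n^n and 1. This is only the idea, not Linial, Samorodnitsky, and Wigderson's actual algorithm or analysis.

```python
import math
import numpy as np

def permanent_bounds(A, iters=1000):
    """Bracket the permanent of a nonnegative matrix within a ~e^n factor.
    Assumes no all-zero row or column and a nonzero permanent; after scaling,
    the residual matrix B satisfies (roughly) n!/n^n <= Per(B) <= 1."""
    M = np.array(A, dtype=float)
    n = M.shape[0]
    log_scale = 0.0  # log of the product of all row/column sums divided out
    for _ in range(iters):
        r = M.sum(axis=1, keepdims=True)
        log_scale += np.log(r).sum()
        M /= r
        c = M.sum(axis=0, keepdims=True)
        log_scale += np.log(c).sum()
        M /= c
    # Per(A) = Per(B) * exp(log_scale), with n!/n^n <= Per(B) <= 1.
    upper = math.exp(log_scale)
    lower = math.exp(log_scale + math.lgamma(n + 1) - n * math.log(n))
    return lower, upper

print(permanent_bounds([[1, 1], [1, 1]]))  # roughly (2.0, 4.0); the true permanent is 2
```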

For myself, though, I just filed away this idea of matrix scaling for whenever I might need it. It didn’t take long. A year after Avi’s lecture, when I was a beginning grad student at Berkeley, I was obsessing about the foundations of quantum mechanics. Specifically, I was obsessing about the fact that, when you measure a quantum state, the rules of quantum mechanics tell you how to calculate the probability that you’ll see a particular outcome. But the rules are silent about so-called multiple-time or transition probabilities. In other words: suppose we adopt Everett’s Many-Worlds view, according to which quantum mechanics needs to be applied consistently to every system, regardless of scale, so in particular, the state of the entire universe (including us) is a quantum superposition state. We perceive just one branch, but there are also these other branches where we made different choices or where different things happened to us, etc.

OK, fine: for me, that’s not the troubling part! The troubling part is that quantum mechanics rejects as meaningless questions like the following: given that you’re in this branch of the superposition at time t_1, what’s the probability that you’ll be in that branch at time t_2, after some unitary transformation is applied? Orthodox quantum mechanics would say: well, either someone measured you at time t_1, in which case their act of measuring collapsed the superposition and created a whole new situation. Or else no one measured at t_1—but in that case, your state at time t_1 was the superposition state, full stop. It’s sheer metaphysics to imagine a “real you” that jumps around from one branch of the superposition to another, having a sequence of definite experiences.

Granted, in practice, branches of the universe’s superposition that split from each other tend never to rejoin, for the same thermodynamic reasons why eggs tend never to unscramble themselves. And as long as the history of the Everettian multiverse has the structure of a tree, we can sensibly define transition probabilities. But if, with some technology of the remote future, we were able to do quantum interference experiments on human brains (or other conscious entities), the rules of quantum mechanics would no longer predict what those beings should see—not even probabilistically.

I was interested in the question: suppose we just wanted to postulate transition probabilities, with the transitions taking place in some fixed orthogonal basis. What would be a mathematically reasonable way to do that? And it occurred to me that one thing you could do is the following. Suppose for simplicity that you have a pure quantum state, which is just a unit vector of n complex numbers called amplitudes:

$$ \left(
\begin{array}
[c]{c}%
\alpha_{1}\\
\vdots\\
\alpha_{n}%
\end{array}
\right) $$

Then the first rule of quantum mechanics says that you can apply any unitary transformation U (that is, any norm-preserving linear transformation) to map this state to a new one:

$$ \left(
\begin{array}
[c]{c}%
\beta_{1}\\
\vdots\\
\beta_{n}%
\end{array}
\right) =\left(
\begin{array}
[c]{ccc}%
u_{11} & \cdots & u_{1n}\\
\vdots & \ddots & \vdots\\
u_{n1} & \cdots & u_{nn}%
\end{array}
\right) \left(
\begin{array}
[c]{c}%
\alpha_{1}\\
\vdots\\
\alpha_{n}%
\end{array}
\right). $$

The second rule of quantum mechanics, the famous Born Rule, says that if you measure in the standard basis before applying U, then the probability that you’ll find yourself in state i equals |α_i|^2. Likewise, if you measure in the standard basis after applying U, the probability that you’ll find yourself in state j equals |β_j|^2.

OK, but what’s the probability that you’re in state i at the initial time, and then state j at the final time? These joint probabilities, call them p_{ij}, had better add up to |α_i|^2 and |β_j|^2, if we sum the rows and columns respectively. And ideally, they should be “derived” in some way from the unitary U—so that for example, if u_{ij}=0 then p_{ij}=0 as well.

So here’s something you could do: start by replacing each u_{ij} by its absolute value, to get a nonnegative matrix. Then, normalize the ith row so that it sums to |α_i|^2, for each i. Then normalize the jth column so that it sums to |β_j|^2, for each j. Then normalize the rows, then the columns, and keep iterating until hopefully you end up with all the rows and columns having the right sums.

So the first question I faced was, does this process converge? And I remembered what Avi taught me about the permanent. In this case, because of the nonuniform row and column scalings, the permanent no longer works as a progress measure, but there’s something else that does work. Namely, as a first step, we can use the Max-Flow/Min-Cut Theorem to show that there exists a nonnegative matrix F=(f_{ij}) such that f_{ij}=0 whenever u_{ij}=0, and also

$$ \sum_j f_{ij} = \left|\alpha_i\right|^2 \forall i,\ \ \ \ \ \sum_i f_{ij} = \left|\beta_j\right|^2 \forall j. $$

Then, letting M=(m_{ij}) be our current rescaled matrix (so that initially m_{ij}:=|u_{ij}|), we use

$$ \prod_{i,j : u_{ij}\ne 0} m_{ij}^{f_{ij}} $$

as our progress measure. By using the nonnegativity of the Kullback-Leibler divergence, one can prove that this quantity never decreases. So then, just like with 0/1 matrices and the permanent, we get eventual convergence, and indeed convergence after a number of iterations that’s polynomial in n.
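
Here's a small Python sketch of the scaling process itself; the convergence argument via the flow F and the product progress measure above is the part I don't implement, and the interface and details are made up purely for illustration.

```python
import numpy as np

def transition_probs(U, alpha, iters=1000):
    """Candidate joint probabilities p_ij: start from |u_ij|, then alternately
    rescale row i toward |alpha_i|^2 and column j toward |beta_j|^2, where
    beta = U @ alpha.  Rows index the initial state, columns the final state."""
    U = np.asarray(U, dtype=complex)
    alpha = np.asarray(alpha, dtype=complex)
    beta = U @ alpha
    row_targets = np.abs(alpha) ** 2
    col_targets = np.abs(beta) ** 2
    M = np.abs(U)
    for _ in range(iters):
        r = M.sum(axis=1)
        M *= (row_targets / np.where(r > 0, r, 1.0))[:, None]
        c = M.sum(axis=0)
        M *= (col_targets / np.where(c > 0, c, 1.0))[None, :]
    return M

# Starting in state 1 and applying a 2x2 Hadamard-like rotation:
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
print(transition_probs(H, [1, 0]))  # rows sum to |alpha_i|^2, columns to |beta_j|^2
```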

I was pretty stoked about this until I went to the library, and discovered that Erwin Schrödinger had proposed the same matrix scaling process in 1931! And Masao Nagasawa and others then rigorously analyzed it. OK, but their motivations were somewhat different, and for some reason they never talked about finite-dimensional matrices, only infinite-dimensional ones.

I can’t resist telling you my favorite open problem about this matrix scaling process: namely, is it stable under small perturbations? In other words, if I change one of the α_i’s or u_{ij}’s by some small ε, then do the final p_{ij}’s also change by at most some small δ? To clarify, several people have shown me how to prove that the mapping to the p_{ij}’s is continuous. But for computer science applications, one needs something stronger: namely that when the matrix M, and the row and column scalings, actually arise from a unitary matrix in the way above, we get strong uniform continuity, with a 1/n^{O(1)} change to the inputs producing only a 1/n^{O(1)} change to the outputs (and hopefully even better than that).

The more general idea that I was groping toward or reinventing here is called a hidden-variable theory, of which the most famous example is Bohmian mechanics. Again, though, Bohmian mechanics has the defect that it’s only formulated for some exotic state space that the physicists care about for some reason—a space involving pointlike objects called “particles” that move around in 3 Euclidean dimensions (why 3? why not 17?).

Anyway, this whole thing led me to wonder: under the Schrödinger scaling process, or something like it, what’s the computational complexity of sampling an entire history of the hidden variable through a quantum computation? (“If, at the moment of your death, your whole life history flashes before you in an instant, what can you then efficiently compute?”)

Clearly the complexity is at least BQP (i.e., quantum polynomial time), because even sampling where the hidden variable is at a single time is equivalent to sampling the output distribution of a quantum computer. But could the complexity be even more than BQP, because of the correlations between the hidden variable values at different times? I noticed that, indeed, sampling a hidden variable history would let you do some crazy-seeming things, like solve the Graph Isomorphism problem in polynomial time (OK, fine, that seemed more impressive at the time than it does after Babai’s breakthrough), or find collisions in arbitrary cryptographic hash functions, or more generally, solve any problem in the complexity class SZK (Statistical Zero Knowledge).

But you might ask: what evidence do we have that any of these problems are hard even for garden-variety quantum computers? As many of you know, it’s widely conjectured today that NP⊄BQP—i.e., that quantum computers can’t solve NP-complete problems in polynomial time. And in the “black box” setting, where all you know how to do is query candidate solutions to your NP-complete problem to check whether they’re valid, it’s been proven that quantum computers don’t give you an exponential speedup: the best they can give is the square-root speedup of Grover’s algorithm.

But for these SZK problems, like finding collisions in hash functions, who the hell knows? So, this is the line of thought that led me to probably the most important thing I did in grad school, the so-called quantum lower bound for collision-finding. That result says that, if (again) your hash function is only accessible as a black box, then a quantum computer can provide at most a polynomial speedup over a classical computer for finding collisions in it (in this case, it turns out, at most a two-thirds power speedup). There are several reasons you might care about that, such as showing that one of the basic building blocks of modern cryptography could still be secure in a world with quantum computers, or proving an oracle separation between SZK and BQP. But my original motivation was just to understand how transition probabilities would change quantum computation.


The permanent has also shown up in a much more direct way in my work on quantum computation. If we go back to Avi’s lecture from 2000, a second thing he said that blew my mind was that apparently, or so he had heard, even the fundamental particles of the universe know something about the determinant and the permanent. In particular, he said, fermions—the matter particles, like the quarks and electrons in this stage—have transition amplitudes that are determinants of matrices. Meanwhile, bosons—the force-carrying particles, like the photons coming from the ceiling that let you see this talk—have transition amplitudes that are permanents of matrices.

Or as Steven Weinberg, one of the great physicists on earth, memorably put it in the first edition of his recent quantum mechanics textbook: “in the case of bosons, it is also a determinant, except without minus signs.” I’ve had the pleasure of getting to know Weinberg at Austin, so recently I asked him about that line. He told me that of course he knew that the determinant without minus signs is called a permanent, but he thought no one else would know! As far as he knew, the permanent was just some esoteric function used by a few quantum field theorists who needed to calculate boson amplitudes.

Briefly, the reason why the permanent and determinant turn up here is the following: whenever you have n particles that are identical, to calculate the amplitude for them to do something, you need to sum over all n! possible permutations of the particles. Furthermore, each contribution to the sum is a product of n complex numbers, one u_{ij} for each particle that hops from i to j. But there’s a difference: when you swap two identical bosons, nothing happens, and that’s why bosons give rise to the permanent (of an n×n complex matrix, if there are n bosons). By contrast, when you swap two identical fermions, the amplitude for that state of the universe gets multiplied by -1, and that’s why fermions give rise to the determinant.
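
In the simplest “collision-free” case, where n identical, non-interacting particles enter n distinct modes and leave n distinct modes of an interferometer described by a one-particle unitary U, that rule boils down to taking the permanent or determinant of an n×n submatrix of U. Here's a hedged sketch, reusing the Ryser function from earlier; it glosses over indexing conventions and over the extra normalization needed when a mode is occupied more than once.

```python
import numpy as np

def transition_amplitude(U, in_modes, out_modes, particles="bosons"):
    """Collision-free n-particle transition amplitude: the permanent (bosons)
    or determinant (fermions) of the n x n submatrix of the one-particle
    unitary U picked out by the occupied input and output modes."""
    sub = np.asarray(U)[np.ix_(list(in_modes), list(out_modes))]
    if particles == "fermions":
        return np.linalg.det(sub)
    return permanent_ryser(sub.tolist())  # the Ryser sketch from earlier
```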

Anyway, Avi ended his talk with a quip about how unfair it seemed to the bosons that they should have to work so much harder than the fermions just to calculate where they should be!

And then that one joke of Avi—that way of looking at things—rattled around in my head for a decade, like a song I couldn’t get rid of. It raised the question: wait a minute, bosons—particles that occur in Nature—are governed by a #P-complete function? Does that mean we could actually use bosons to solve #P-complete problems in polynomial time? That seems ridiculous, like the kind of nonsense I’m fighting every few weeks on my blog! As I said before, most of us don’t even expect quantum computers to be able to solve NP-complete problems in polynomial time, let alone #P-complete ones.

As it happens, Troyansky and Tishby had already taken up that puzzle in 1996. (Indeed Avi, being the social butterfly and hub node of our field that he is, had learned about the role of permanents and determinants in quantum mechanics from them.) What Troyansky and Tishby said was, it’s true that if you have a system of n identical, non-interacting bosons, their transition amplitudes are given by permanents of n×n matrices. OK, but amplitudes in quantum mechanics are not directly observable. They’re just what you use to calculate the probability that you’ll see this or that measurement outcome. But if you try to encode a hard instance of a #P-complete problem into a bosonic system, the relevant amplitudes will in general be exponentially small. And that means that, if you want a decent estimate of the permanent, you’ll need to repeat the experiment an exponential number of times. So OK, they said, nice try, but this doesn’t give you a computational advantage after all in calculating the permanent compared to classical brute force.

In our 2011 work on BosonSampling, my student Alex Arkhipov and I reopened the question. We said, not so fast. It’s true that bosons don’t seem to help you in estimating the permanent of a specific matrix of your choice. But what if your goal was just to sample a random n×n matrix A ∈ C^{n×n}, in a way that’s somehow biased toward matrices with larger permanents? Now, why would that be your goal? I have no idea! But this sampling is something that a bosonic system would easily let you do.

So, what Arkhipov and I proved was that this gives rise to a class of probability distributions that can be sampled in quantum polynomial time (indeed, by a very rudimentary type of quantum computer), but that can’t be sampled in classical polynomial time unless the polynomial hierarchy collapses to the third level. And even though you’re not solving a #P-complete problem, the #P-completeness of the permanent still plays a crucial role in explaining why the sampling problem is hard. (Basically, one proves that the probabilities are #P-hard even to approximate, but that if there were a fast classical sampling algorithm, then the probabilities could be approximated in the class BPP^NP. So if a fast classical sampling algorithm existed, then P^{#P} would equal BPP^NP, which would collapse the polynomial hierarchy by Toda’s Theorem.)

When we started on this, Arkhipov and I thought about it as just pure complexity theory—conceptually clarifying what role the #P-completeness of the permanent plays in physics. But then at some point it occurred to us: bosons (such as photons) actually exist, and experimentalists in quantum optics like to play with them, so maybe they could demonstrate some of this stuff in the lab. And as it turned out, the quantum optics people were looking for something to do at the time, and they ate it up.

Over the past five years, a trend has arisen in experimental physics that goes by the name “Quantum Supremacy,” although some people are now backing away from the name because of Trump. The idea is: without yet having a universal quantum computer, can we use the hardware that we’re able to build today to demonstrate the reality of a quantum-computational speedup as clearly as possible? Not necessarily for a useful problem, but just for some problem? Of course, no experiment can prove that something is scaling polynomially rather than exponentially, since that’s an asymptotic statement. But an experiment could certainly raise the stakes for the people who deny such a statement—for example, by solving something a trillion times faster than we know how to solve it otherwise, using methods for which we don’t know a reason for them not to scale.

I like to say that for me, the #1 application of quantum computing, more than breaking RSA or even simulating physics and chemistry, is simply disproving the people who say that quantum computing is impossible! So, quantum supremacy targets that application.

Experimental BosonSampling has become a major part of the race to demonstrate quantum supremacy. By now, at least a half-dozen groups around the world have reported small-scale implementations—the record, so far, being an experiment at Bristol that used 6 photons, and experimentally confirmed that, yes, their transition amplitudes are given by permanents of 6×6 complex matrices. The challenge now is to build single-photon sources that are good enough that you could scale up to (let’s say) 30 photons, which is where you’d really start seeing a quantum advantage over the best known classical algorithms. And again, this whole quest really started with Avi’s joke.

A year after my and Arkhipov’s work, I noticed that one also can run the connection between quantum optics and the permanent in the “reverse” direction. In other words: with BosonSampling, we used the famous theorem of Valiant, that the permanent is #P-complete, to help us argue that bosons can solve hard sampling problems. But if we know by some other means that quantum optics lets us encode #P-complete problems, then we can use that to give an independent, “quantum” proof that the permanent is #P-complete in the first place! As it happens, there is another way to see why quantum optics lets us encode #P-complete problems. Namely, we can use celebrated work by Knill, Laflamme, and Milburn (KLM) from 2001, which showed how to perform universal quantum computation using quantum optics with the one additional resource of “feed-forward measurements.” With minor modifications, the construction by KLM also lets us encode a #P-complete problem into a bosonic amplitude, which we know is a permanent—thereby proving that the permanent is #P-complete, in what I personally regard as a much more intuitive way than Valiant’s original approach based on cycle covers. This illustrates a theme that we’ve seen over and over in the last 13 years or so, which is the use of quantum methods and arguments to gain insight even about classical computation.

Admittedly, I wasn’t proving anything here in classical complexity theory that wasn’t already known, just giving a different proof for an old result! Extremely recently, however, my students Daniel Grier and Luke Schaeffer have extended my argument based on quantum optics, to show that computing the permanent of a unitary or orthogonal matrix is #P-complete. (Indeed, even over finite fields of characteristic k, computing the permanent of an orthogonal matrix is a Mod_kP-complete problem, as long as k is not 2 or 3—which turns out to be the tight answer.) This is not a result that we previously knew by any means, whether quantum or classical.

I can’t resist telling you the biggest theoretical open problem that arose from my and Arkhipov’s work. We would like to say: even if you had a polynomial-time algorithm that sampled a probability distribution that was merely close, in variation distance, to the BosonSampling distribution, that would already imply a collapse of the polynomial hierarchy. But we can only prove that under the assumption that a certain problem is #P-complete, and no one has yet managed to prove it #P-complete. That problem is the following:

Given an n×n matrix A, each of whose entries is an i.i.d. complex Gaussian with mean 0 and variance 1 (that is, drawn from N(0,1)_C), estimate |Per(A)|^2, to within additive error ±ε·n!, with probability at least 1-δ over the choice of A. Do this in time polynomial in n, 1/ε, and 1/δ.

Note that, if you care about exactly computing the permanent of a Gaussian random matrix, or about approximating the permanent of an arbitrary matrix, we know how to prove both of those problems #P-complete. The difficulty “only” arises when we combine approximation and average-case in the same problem.

At the moment, we don’t even know something more basic, which is: what’s the distribution over |Per(A)|^2, when A is an n×n matrix of i.i.d. N(0,1)_C Gaussians? Based on numerical evidence, we conjecture that the distribution converges to lognormal as n gets large. By using the interpretation of the determinant as the volume of a parallelepiped, we can prove that the distribution over |Det(A)|^2 converges to lognormal. And the distribution over |Per(A)|^2 looks almost the same when you plot it. But not surprisingly, the permanent is harder to analyze.
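
If you'd like to see the numerical evidence for yourself, here's a sketch that draws i.i.d. N(0,1)_C matrices and collects log |Per(A)|^2, again using the exponential-time Ryser sketch from earlier, so only for small n; plotting a histogram of the output is then one line in your favorite plotting library.

```python
import numpy as np

def log_permanent_squared_samples(n=6, trials=2000, seed=0):
    """Samples of log |Per(A)|^2 for A an n x n matrix of i.i.d. standard
    complex Gaussians (real and imaginary parts each N(0, 1/2), so that
    E|a_ij|^2 = 1).  Conjecturally these look approximately Gaussian for
    large n, i.e., |Per(A)|^2 is approximately lognormal."""
    rng = np.random.default_rng(seed)
    samples = []
    for _ in range(trials):
        A = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
        samples.append(np.log(abs(permanent_ryser(A.tolist())) ** 2))
    return np.array(samples)
```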


This brings me to my final vignette. Why would anyone even suspect that approximating the permanent of a Gaussian random matrix would be a #P-hard problem? Well, because if you look at the permanent of an n×n matrix over a large enough finite field, say F_p, that function famously has the property of random self-reducibility. This means: the ability to calculate such a permanent in polynomial time, on 90% of all matrices in F_p^{n×n}, or even for that matter on only 1% of them, implies the ability to calculate it in polynomial time on every such matrix.

The reason for this is simply that the permanent is a low-degree polynomial, and low-degree polynomials have extremely useful error-correcting properties. In particular, if you can compute such a polynomial on any large fraction of points, then you can do noisy polynomial interpolation (e.g., the Berlekamp-Welch algorithm, or list decoding), in order to get the value of the polynomial on an arbitrary point.
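
Here's a minimal sketch of that random self-reduction over F_p (p prime, p > n+1), in the basic form where we simply trust the oracle's answers on the points we happen to query; tolerating a constant fraction of errors is exactly where Berlekamp-Welch or list decoding would come in. The function names, and the toy "oracle" that happens to be correct on every matrix, are mine.

```python
import random

def lagrange_at_zero(xs, ys, p):
    """Value at 0 of the unique polynomial of degree < len(xs) through the
    points (xs[i], ys[i]) over the prime field F_p."""
    total = 0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        num, den = 1, 1
        for j, xj in enumerate(xs):
            if j != i:
                num = num * (-xj) % p
                den = den * (xi - xj) % p
        total = (total + yi * num * pow(den, p - 2, p)) % p
    return total

def permanent_by_self_reduction(A, p, oracle):
    """Random self-reduction for the permanent: Per(A + t*R) is a degree-n
    polynomial in t, and for each fixed t != 0 the matrix A + t*R is uniformly
    random, so querying the oracle at t = 1..n+1 and interpolating back to
    t = 0 recovers Per(A) mod p."""
    n = len(A)
    R = [[random.randrange(p) for _ in range(n)] for _ in range(n)]
    ts = list(range(1, n + 2))
    ys = [oracle([[(A[i][j] + t * R[i][j]) % p for j in range(n)]
                  for i in range(n)]) % p for t in ts]
    return lagrange_at_zero(ts, ys, p)

# Toy check, with an "oracle" that happens to be correct on every matrix:
p = 101
A = [[1, 2, 3], [4, 5, 6], [7, 8, 10]]
oracle = lambda M: permanent_ryser(M) % p  # the Ryser sketch from earlier
print(permanent_by_self_reduction(A, p, oracle), permanent_ryser(A) % p)  # the two values agree
```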

I don’t specifically remember Avi talking about the random self-reducibility of the permanent in his 2000 lecture, but he obviously would have talked about it! And it was really knowing about the random self-reducibility of the permanent, and how powerful it was, that led me and Alex Arkhipov to the study of BosonSampling in the first place.

In complexity theory, the random self-reducibility of the permanent is hugely important because it was sort of the spark for some of our most convincing examples of non-relativizing results—that is, results that fail relative to a suitable oracle. The most famous such result is that #P, and for that matter even PSPACE, admit interactive protocols (the IP=PSPACE theorem). In the 1970s, Baker, Gill, and Solovay pointed out that non-relativizing methods would be needed to resolve P vs. NP and many of the other great problems of the field.

In 2007, Avi and I wrote our only joint paper so far. In that paper, we decided to take a closer look at the non-relativizing results based on interactive proofs. We said: while it’s true that these results don’t relativize—that is, there are oracles relative to which they fail—nevertheless, these results hold relative to all oracles that themselves encode low-degree polynomials over finite fields (such as the permanent). So, introducing a term, Avi and I said that results like IP=PSPACE algebrize.

But then we also showed that, if you want to prove P≠NP—or for that matter, even prove circuit lower bounds that go “slightly” beyond what’s already known (such as NEXP⊄P/poly)—you’ll need techniques that are not only non-relativizing, but also non-algebrizing. So in some sense, the properties of the permanent that are used (for example) in proving that it has an interactive protocol, just “aren’t prying the black box open wide enough.”

I have a more recent result, from 2011 or so, that I never got around to finishing a paper about. In this newer work, I decided to take another look at the question: what is it about the permanent that actually fails to relativize? And I prove the following result: relative to an arbitrary oracle A, the class #P has complete problems that are both random self-reducible and downward self-reducible (that is, reducible to smaller instances of the same problem). So, contrary to what certainly I and maybe others had often thought, it’s not the random self-reducibility of the permanent that’s the crucial thing about it. What’s important, instead, is a further property that the permanent has, of being self-checkable and self-correctible.

In other words: given (say) a noisy circuit for the permanent, it’s not just that you can correct that circuit to compute whichever low-degree polynomial it was close to computing. Rather, it’s that you can confirm that the polynomial is in fact the permanent, and nothing else.

I like the way Ketan Mulmuley thinks about this phenomenon in his Geometric Complexity Theory, which is a speculative, audacious program to try to prove that the permanent is harder than the determinant, and to tackle the other great separation questions of complexity theory (including P vs. NP), by using algebraic geometry and representation theory. Mulmuley says: the permanent is a polynomial in the entries of an n×n matrix that not only satisfies certain symmetries (e.g., under interchanging rows or columns), but is uniquely characterized by those symmetries. In other words, if you find a polynomial that passes certain tests—for example, if it behaves in the right way under rescaling and interchanging rows and columns—then that polynomial must be the permanent, or a scalar multiple of the permanent. Similarly, if you find a polynomial that passes the usual interactive proof for the permanent, that polynomial must be the permanent. I think this goes a long way toward explaining why the permanent is so special: it’s not just any hard-to-compute, low-degree polynomial; it’s one that you can recognize when you come across it.


I’ve now told you about the eventual impact of one single survey talk that Avi gave 16 years ago—not even a particularly major or important one. So you can only imagine what Avi’s impact must have been on all of us, if you integrate over all the talks he’s given and papers he’s written and young people he’s mentored and connections he’s made his entire career. May that impact be permanent.

70 Responses to “Avi Wigderson’s “Permanent” Impact on Me”

  1. Anonymous Says:

    Forgot the backslash in “\sigma \in S_n” in the formulæ.

  2. Another Trump Democrat Says:

    Any thoughts about the “Immanant of a matrix”?
    https://en.wikipedia.org/wiki/Immanant_of_a_matrix
    It seems like it could give a range of things between determinant and permanent.
    (My understanding is superficial, but STEM interest is the real reason to visit your blog.)

  3. Scott Says:

    Anonymous #1: Thanks! Fixed.

  4. Scott Says:

    ATD #2: Yeah, someone asked about that after the talk. Just as determinants give amplitudes for fermions and permanents give amplitudes for bosons, immanants give amplitudes for anyons, which are quasi-particles that don’t exist naturally but that could in principle exist in two-dimensional media, and which many condensed-matter physicists are working to create because of their applications to quantum computation.

    Now, it’s known that almost all immanants are #P-complete, except for the determinant and a few others that are “close” to the determinant. That means, unfortunately, that anyons and their associated immanants don’t seem to add much that’s new to the basic picture of BosonSampling. I.e., sure, you get a hard sampling problem, but BosonSampling was already hard! Furthermore, many nonabelian anyons are actually known to be universal for quantum computation (just by braiding them around each other in an appropriate pattern)—in which case, why even bother with BosonSampling? Just do Shor’s algorithm or something like that! 🙂

  5. jonas Says:

    Thank you for this writeup, it was a very nice read.

    I’m certainly not as influenced by Avi as you are, but I’ll tell my story anyway.

    I first met Avi Wigderson in 2007, when he gave a talk about classical complexity theory. He explained to us why theoretical computer scientists believe that all randomized polynomial time algorithms can be derandomized in polynomial time. I’m not sure what the exact theorem he gave was; all I remember is that it used very weak assumptions.

    At that time, this was an idea very new to me. I’d heard of some problems with randomized algorithms and more complicated deterministic ones already, and of the existence of a deterministic polynomial time primality test too, but I had no idea that there were such general results about how anything can be derandomized.

    I’ve met Avi one or two more times at conferences, where he also gave very interesting talks. (I know that’s a difficult skill, and many good mathematicians give talks that are much harder to follow, even when the mathematics they’re talking about is interesting.) But because I was so young and easy to impress at that first talk, that’s the one that stuck in my mind the most.

  6. Sniffnoy Says:

    Now I’m wondering about other generalizations of the permanent and determinant. I saw a talk some time ago about the “λ-determinant”, a different generalization of the two, which is discussed here and was introduced here. And while trying to find that, I stumbled across the “fermionant”; fortunately that paper already discusses the complexity of computing it!

  7. Zelah Says:

    Hi and thanks for your fascinating write up.

    Now, you mentioned an amazing result (to myself) that the permanent of a Unitary / Orthogonal is #P-Complete due to Daniel Grier and Luke Schaeffer.

    I have tried to find the paper with this result but have come up short. I am interested to find this paper, due to Gurvits’s algorithm for the Permanent. Please can you explain how they get around Gurvits’s approximation?

    Thanks in advance
    Zelah

  8. mike Says:

    Looks like Quantum Supremacy is in the news:

    https://arxiv.org/abs/1610.03632

  9. Scott Says:

    Sniffnoy #6: Sorry, I’d never seen the λ-determinant. Let me know if you find anything out about its complexity!

  10. Scott Says:

    Zelah #7: There’s an excellent reason you weren’t able to find Luke and Daniel’s paper—because it isn’t out yet! But it should be out any day (I just went over it with them this morning).

    One clarification: Gurvits only shows that the permanent of a unitary matrix can be approximated, to within small additive error, in randomized polynomial time. That’s 100% consistent with the exact computation of it being #P-hard.

  11. Scott Says:

    mike #8: Yes! That’s not even the only interesting quantum-supremacy-related paper to have appeared on the arXiv recently. See also here, here, here.

  12. Sniffnoy Says:

    Actually, oops, the λ-determinant doesn’t generalize the permanent after all, just the determinant. I thought it did, but I guess I misremembered. It does generalize the determinant (the (-1)-determinant) and… the product of the diagonal elements (the 0-determinant)? Yeah, that’s less good. And there isn’t any value of λ that gets you the permanent, as can be seen by looking at 3×3 matrices. (The 1-determinant coincides with the permanent when n≤2, but that’s kind of dumb.)

  13. fred Says:

    What’s behind all the recent “kindergarten” jokes?

  14. Another Trump Democrat Says:

    @fred #13 The “kindergarten” reference means something so easy even a deplorable redneck Trump supporter can understand it! 🙂

  15. Sniffnoy Says:

    Oops, I left my university’s proxy in the fermionant link in comment #6. Correct link is here. Sorry about that.

  16. Joshua Zelinsky Says:

    Now, I’m wondering what the correct generalization is that encompasses both the λ-determinant and the immanant. Actually, it isn’t immediately obvious to me that the λ-determinant is not a subcase of the immanant. Is there some quick way of seeing that? The fact that the λ-determinant doesn’t contain the permanent of course shows that the reverse inclusion doesn’t occur.

  17. mjgeddes Says:

    I’m inclined to consider seriously the view that the ‘mathematical realm’ is a real concrete place that exists *beneath* the physical level, and the mathematical realm is built from pure computation. The idea would then be that what we think of as the physical world ‘emerges’ as a higher-level manifestation of mathematics.

    This conception of mathematical realism differs substantially from Tegmark’s ‘mathematical multiverse’ or other philosophical views of mathematical realism such as Platonism that regard mathematical objects as timeless abstractions outside space-time.

    In my conception, mathematics is not timeless, it’s computation, and there is an arrow-of-time that exists for math just as there’s an arrow of time that exists for physics. My idea is that we should *define* the cosmological arrow-of-time as the direction of increasing complexity. So more complex math objects are built from simpler ones, and this hierarchical increase in complexity *defines* time, which I think is truly fundamental.

    I think there’s a mathematical hierarchy, and all of mathematics can be classified into 3 major levels of existence. Here are my 3-levels of mathematical existence:

    (1) At the base is ‘Theory of Computation’ – quantum computation at root. So I think the mathematical realm is *equivalent* to the quantum realm. It’s here (at the sub-physical level) that quantum wave-functions reside – they’re pure mathematical objects.

    http://www.zarzuelazen.com/TheoryOfComputation.html

    (2) Next level up is ‘Algebra’ which I think is where integers live. At this second level we are talking about functions and relations, which I think is about how the lower-level Turing machines interact. On this level I would also locate matrices and groups, and I would classify combinatorics, graph theory and number theory as arising at this level.

    http://www.zarzuelazen.com/Algebra.html

    (3) Next level up is ‘Analysis’ , and here we reach the level of the real numbers (the continuum). Geometry and topology start to emerge at this level, so this is about calculus and related stuff. I think complex numbers also live here. At a high-enough level there’s a ‘phase transition’ and sufficiently complex geometrical structures start to be interpreted as ‘physical’.
    http://www.zarzuelazen.com/SetTheory&Analysis.html

    Remember, I think the arrow-of-time is *defined* by the direction of increasing complexity as we move up the mathematical hierarchy. Past level 3, the physical world emerges (the boundary between levels 3 and 4 is the phase transition between mathematics and physics).

    In my ontology, it really should come as no surprise that abstract mathematical properties such as ‘determinants’ and ‘permanents’ can be related to quantum properties and particle physics. According to my scheme, the mathematical realm IS the quantum realm, and matrices are a high-level explanation of the way that Turing-machines interact (thus generating functions and relations).

  18. Sniffnoy Says:

    Josh: I’m doubting there is one. And the λ-determinant is definitely not an immanant. It’s not even a polynomial in the entries, it’s a Laurent polynomial in the entries! Take a look at the 3×3 case, for instance. (Also, like, the fact that λ is taken from the field you’re working over, rather than being a partition, would make it pretty hard for that to be workable even without that — especially, say, if the field were uncountable…)

    It’s not too hard to encompass both the immanant and the fermionant, though. (Side note: I guess technically the 1-fermionant is (-1)^n times the permanent, rather than just the permanent. That certainly doesn’t affect the complexity though.) Consider immanants, but with irreducible characters replaced by general class functions; now you’ve included the fermionants as well. I guess another way of describing that is “linear combinations of immanants”; so, fermionants are linear combinations of immanants.

    I dunno, I just brought up the λ-determinant because I remembered seeing it once, but it doesn’t seem like it actually fits in very well here.

  19. Chris Billington Says:

    (Please delete my previous comment – I accidentally pressed submit)

    Hi Scott,

    I too found myself reinventing a hidden variable theory without knowing that that’s what I was doing, in order to postulate classical transition probabilities of atomic states, so that we could make a slightly better semiclassical model. You might remember me emailing you ages ago to ask about some detail.

    My hidden variable theory was awful and limited, so it was great to discover yours, which was more general — I ended up using the Schrodinger theory.

    It’s not a paper yet (soon!) but here’s a poster about it if you are interested in what we’re using your work for:

    https://www.researchgate.net/publication/285264301_Reproduction_of_Stern-Gerlach_separation_in_semiclassical_models_using_hidden_variables

    Cheers,

    Chris

  20. Another Trump Democrat Says:

    @Sniffnoy#18 ” Consider immanants, but with irreducible characters replaced by general class functions”
    Sure, but you have a sequence of these indexed by n (you want to define something for all n by n matrices), so without any restriction, you can smuggle anything into this setup, even uncomputable functions. You need some restrictions on the type of functions to bring it down into the complexity classes you’re interested in. For example you could require that your thingo-nant is in #P when restricted to permutation matrices, then extend it to square matrices in the natural way, and it should still be in #P (or something closely related depending on what kind of matrix elements you have). Then the question is, can you come up with anything that could plausibly be strictly between P and #P-complete (even if you couldn’t prove it, of course). My guess is the situation would still be like Scott#4, that a thingo-nant will almost always have a permanent lurking in there, or else be close to a determinant, but maybe you can tweak some parameters to be somewhere in between.

  21. P=BPP Says:

    If $P=BPP$ can random self reducibility be replaced by something stronger? Please enlighten with references.

  22. Scott Says:

    P=BPP #21: Interesting question but I think the answer is no. Random self-reducibility is a black-box notion: you’re given a black box that computes (say) a #P-complete function on most inputs, and you need to correct it to compute the function on all inputs, in polynomial time and with a polynomial number of black-box calls. Now, with a deterministic algorithm, you could always get unlucky and query your black box only on inputs where it errs. And like most phenomena in randomized black-box complexity (as opposed to computational complexity), this one seems totally unaffected if P=BPP.

  23. J Says:

    I’m sorry to write this but you ended up attracting some venom (tangentially related to the lecture). http://www.math.rutgers.edu/~zeilberg/Opinion155.html

    On the other hand, it’s quite exciting and confusing to learn that apparently falsifiable, in real time with real experiments predictions are meaningless??

  24. Scott Says:

    J #23: LOL!! Is it, like, a sign you’ve made it in math or CS when Doron Zeilberger posts a diatribe against you? I’d respond, were there arguments in that post to respond to.

  25. P=BPP Says:

    For NP-complete problems, approximation is straightforward. You have a certificate and you want to somehow approximate this certificate (for example you relax using semidefinite programming and so on).

    For #P-complete problems it is less clear what the analogy is (except counting certificates)? Given this, is there any other reason to believe the Permanent has a deterministic poly time approximation other than P=BPP? Maybe if there is another behind-the-scenes reason it would be worth looking into.

  26. David Speyer Says:

    Re complexity of the lambda determinant: Computing it naively from the defining recursion requires O(n^3) arithmetic operations, so it is pretty efficient (over, say, the real or the rational numbers).

    One subtlety: Some of those operations are divisions, so in degenerate cases you may run into problems with dividing by 0. Jim Propp challenged me at one point to give an explicit way to work around the zeroes, analogous to the “zero window” algorithm in the number wall algorithm https://cs.uwaterloo.ca/journals/JIS/VOL4/LUNNON/numbwall10.html but I never responded to his challenge.

    Fomin, Grigoriev and Koshevoy https://arxiv.org/abs/1307.8425 have general results on how the arithmetic circuit length of a polynomial can change when you remove the division gate. I suspect there is an answer to your question buried in there if you read carefully.

  27. Why is the physical space 3-dimensional? | Ajit Jadhav's Weblog Says:

    […] happened was that I ran into Prof. Scott Aaronson’s latest blog post [^], which is actually a transcript of an informal talk he gave recently. The topic of this post […]

  28. Sniffnoy Says:

    …huh, I guess it is only O(n^3). I didn’t even try checking the recursion for it, I just sort of assumed it had to be awful! Well, I feel a bit foolish now.

    That seems really weird though. Computing the determinant in polynomial time without ever doing a row-reduction. I mean, it does have division in it, as you mention, so we’re not escaping that. But it doesn’t have any branching — I suppose because rather than branching when it needs to divide by zero, it simply fails instead!

    So not quite an algorithm, then. But if it’s O(n^3) generically it seems unlikely that a version that does handle general inputs would be significantly worse?

    Meanwhile the Fomin et al paper mentions that Strassen showed division can always be eliminated at polynomial cost, but of course that’s for polynomials (doesn’t really make sense otherwise, does it?), and this isn’t. Annoying.

  29. jonas Says:

    Re J #23: wow. The surprising part of that post is where it says quantum computing is “a challenging intellectual mathematical game, but with empty content”, and even tries to invoke Gil Kalai for that. That’s not what Gil Kalai said about quantum computing, right?

  30. Sniffnoy Says:

    By the way, Scott, in your bit about “not everything about the permanent is hard”, I’m surprised you didn’t mention that it’s easy to tell whether the permanent of an integer matrix is odd or even. 😛

  31. Scott Says:

    jonas #29: Right, that’s not at all what Gil says—he imagines that the field is heading toward the enormously meaningful, contentful discovery that QC is impossible! 🙂 And while I think Gil is dead wrong about that, I respect him for at least correctly understanding (unlike other, less curious QC skeptics) what a revolutionary development in physics it would be to explain why QC is impossible.

    A second problem with Zeilberger’s post is that he acknowledges Turing’s undecidability theorem to be deep and important (“because it reveals statements previously thought to be meaningful to be meaningless,” in Zeilberger’s absurd worldview), but then says that all later results about undecidability are empty of content. Here, it seems to me, Zeilberger simply isn’t following his own wrong logic correctly: he could also, for example, have celebrated my and Yedidia’s paper for proving that the question of the value of BB(8000) is “meaningless,” rather than “meaningful” as some people had mistakenly thought.

  32. Douglas Knight Says:

    Sniffnoy 28, Laurent polynomials are polynomials in additional variables, so doesn’t Strassen’s theorem apply?

  33. Sniffnoy Says:

    Douglas Knight: I’m not really seeing how? I mean, I haven’t checked the actual construction, but just from the statement given there, I don’t see how it would automatically apply.

    I mean, sure, you could frontload one division for each variable that appears inverted, so now you not only have various x_i but also x_i^(-1). And because we’re dealing with a Laurent polynomial, that shouldn’t cause any domain problems, where we accidentally restrict the domain by frontloading the division like that.

    But the problem, it seems to me, is that our operations don’t know that x_i^(-1) is supposed to represent the multiplicative inverse of x_i, and so won’t be able to do algebra with them appropriately. 1/a can’t be replaced with a^(-1), a*a^(-1) can’t be simplified to 1, a^(-1)+b^(-1) can’t be replaced by (a+b)a^(-1)b^(-1), etc.

    Later when I have time I’ll try doing an example out and see if I can find a concrete illustration of this, but I expect they shouldn’t be hard to come by.

    (Also, Scott, I’ve got a comment caught in moderation, would you mind approving it or deleting it? Thank you!)

  34. Polymath Says:

    If Gordeev and Haeusler are correct that NP=PSPACE (with their current upper bound on the size of the shortest proof/disproof in ZF of n-quantifier QBF statements being O(n^270) before any attempts to optimize it have been tried), much of what you say here becomes irrelevant. You should take a look at their work, because it satisfies zero of the usual tests for crankishness, and looks exactly like one would expect the first proof of a such a shocking result to look like if it were true (a team of researchers well-established in the field with a long record of publications building on their previous work with some new ideas and writing clearly and actively interacting with everyone currently querying the proof).

  35. fred Says:

    Hi Scott,
    sorry to be off-topic (from the Permanent of a Matrix to the Rise of the Matrix)…

    Given your skepticism on the topic, I wonder whether you watched this talk from Sam Harris on the AI singularity? (it’s only 14 min)

  36. Scott Says:

    Polymath #34: Inspired by your comment, I just checked Gordeev and Haeusler’s preprint against my Ten Signs A Claimed Mathematical Breakthrough Is Wrong. I counted 2.5 of the 10 Signs as activated here:

    6. The paper jumps into technicalities without presenting a new idea.
    Definitely true and the most worrying part.

    7. The paper doesn’t build on (or in some cases even refer to) any previous work.
    Partly true. The paper cites only a few ancient papers, obscure papers, and papers by the authors themselves.

    10. The techniques just seem too wimpy for the problem at hand.
    Definitely true. If NP=coNP=PSPACE from technical model-theoretic arguments like these, it would just seem like a random unexplained coincidence.

    As you say, this paper activates far fewer of the Signs than the usual such paper (and congratulations to the authors for that!), but still enough for me to wager (say) 500:1 odds against.

    You say that the authors are “actively interacting with everyone currently querying the proof.” Is there a website or blog where that’s happening? Could you share a link?

    I think viewing an interactive protocol would help enormously in getting more quickly to the heart of the matter, as we know because IP is strictly larger than NP … no wait, that’s precisely what these authors are denying! 😉

  37. Polymath Says:

    They are discussing their proof and correcting some misunderstandings on it on the FOM (Foundations of Mathematics) email forum.

    And they do indeed have significant-looking new ideas, which they make clear quite early in the paper, regarding conditions under which exponential reduction in proof size is possible, with careful control of the parameters used in the reduction, that are related to their previous proof-theoretic work but extend it with a powerful new notion of “natural deduction” involving directed acyclic graphs with some extra structure.

    I’m sure the correctness of their proof will be settled one way or another within 2 or 3 months. In the meantime, I hereby put up $1 at the odds you offered, since, whatever limit you must have had in mind, it’s reasonable to suppose you didn’t want to require people to make fractional bets.

  38. Polymath Says:

    Also, these are proof-theoretic arguments not model-theoretic arguments. The authors are proof theorists and models have nothing to do with it.

  39. Sniffnoy Says:

    Douglas: Oh you know what, I think it’s not too hard to get this to work using the fact that it’s a Laurent polynomial *with bounded denominator*. Specifically, each entry appears in the denominator at most once, and only the interior entries, at that.

    So, take the λ-determinant and multiply by the product of the interior entries. This is a polynomial, and we know how to compute it using division; so we can eliminate the division while keeping everything polynomial. Then we just need to divide by the product of the interior entries again.

    Now, there’s one subtlety here: How do we know each interior entry does appear in the denominator? After all, if λ=0 (the diagonal product) or λ=-1 (the determinant), they don’t! This is relevant because otherwise our method has again artificially restricted the domain, and isn’t the general algorithm we wanted (though it’s still better than what we started with, which involved divisions by a number of other things as well). But, this seems pretty easy to see from the alternating-sign-matrix expansion: Barring the exceptional cases λ=0 and λ=-1 for which the terms with denominator don’t appear in the first place, there’s simply no way for distinct terms there to cancel each other out!

    So it seems like we can indeed compute the λ-determinant in polynomial time for any λ; if λ=0 or λ=-1 we already know how, otherwise we use the method above.

    But that’s for fixed λ; what if we want λ to be part of the input? Well, fortunately, if I’m not mistaken, the λ-determinant is actually polynomial as well, not just a Laurent polynomial, because the number of inversions of a given alternating sign matrix is always at least the number of -1’s (this isn’t too hard to see, if I haven’t messed up). So we can straightforwardly include that, at least so long as λ isn’t 0 or -1.

    But if we’re working over a field like, say, the rationals, where perhaps we can break out of the algebraic circuit model and do some equality testing and branching and such, then, yeah, we’re good, because we can explicitly check at the beginning whether λ is 0 or -1 and go from there.

    The only thing maybe missing here is that while this shows (assuming Strassen’s construction, which I haven’t looked up) that the λ-determinant can be computed in a polynomial number of “operations”, perhaps (assuming we’re working over, say, Q) the bit complexity is worse than that? After all, in the case of the determinant, straight Gauss-Jordan elimination isn’t enough to guarantee polynomial bit complexity; the intermediate numbers can grow exponentially large. You have to use a more advanced algorithm if you want to demonstrate that the determinant is truly polynomial-time, and not just a polynomial number of “operations”.

    Now we haven’t explicitly done anything here that would put any bound on how large the intermediate numbers can get… but (ignoring λ∈{0,-1}) this is just, like, a circuit, not a complicated branching thing like Gaussian elimination. So I don’t think that should be a problem here, right? That should be enough to guarantee that everything really is polynomial? Or no?

    So anyway yes it looks like the λ-determinant is in polynomial time, but if the bit at the end is wrong, then, at least, it’s possible in a polynomial number of “operations” regardless (assuming Strassen’s construction).

  40. Sniffnoy Says:

    Oops, that should say “polynomial in λ as well”, not “polynomial as well”.

  41. Scott Says:

    Polymath #37: While Dana has forbidden me from making more scientific bets, I’m sure she’ll make an exception for a claim of NP=coNP=PSPACE. So let’s be a little less wimpy about it. If I send you $10000 if the proof turns out to be valid, will you send me $20 if not?

  42. P=BPP Says:

    @Scott you skipped my query #25.

    Also, what if a BPP algorithm uses black-box randomization, like randomized self-reduction? If we can’t derandomize RSR, then we shouldn’t be able to derandomize the BPP algorithm either, right? So is this a constructive proof of P ≠ BPP? (I know P=BPP makes more sense from what we know.)

  43. Scott Says:

    P=BPP:

      Given this is there any other reason to believe Permanent has deterministic poly time approximation other than P=BPP?

    The basic premises of derandomization seem to me like an overwhelmingly sufficient reason to believe the truth of the statement: if E requires exponential-size circuits then there are excellent pseudorandom generators, and if there are excellent PRGs then you can plug them into the Jerrum-Sinclair-Vigoda algorithm and boom, you’ve got a deterministic poly-time approximation algorithm for nonnegative permanents.
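
    To illustrate the shape of that argument (a purely schematic sketch, not any particular construction; prg and randomized_approx are just stand-ins): given a PRG with O(log n)-bit seeds that fools the algorithm, you’d enumerate every seed, feed each pseudorandom string to the algorithm in place of its coin flips, and take the median of the outputs:

      from statistics import median

      def derandomize_with_prg(randomized_approx, prg, seed_bits, x):
          # Enumerate all 2^seed_bits seeds (polynomially many when seed_bits = O(log n)),
          # run the randomized approximation algorithm with each pseudorandom string in
          # place of its random coins, and return the median of the resulting estimates.
          outputs = []
          for seed in range(2 ** seed_bits):
              coins = prg(seed)  # hypothetical generator stretching the seed to poly(n) bits
              outputs.append(randomized_approx(x, coins))
          return median(outputs)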

    So the interesting question here is whether JSV can be derandomized in a less high-powered way, as for example primality testing was.

      Also, what if a BPP algorithm uses black-box randomization, like randomized self-reduction? If we can’t derandomize RSR, then we shouldn’t be able to derandomize the BPP algorithm either, right?

    No, there’s a category error in your question. A BPP algorithm is a piece of code. If it has subroutines, then the subroutines themselves have to be BPP algorithms, and you’re even allowed to examine the subroutines and know their internal structure when you’re trying to derandomize the algorithm.

    At most, your argument points toward the existence of an oracle relative to which P≠BPP. But that’s something we already knew from more direct arguments!

  44. P=BPP Says:

    @Scott #43 Thank you but I lost you at ” A BPP algorithm is a piece of code. If it has subroutines, then the subroutines themselves have to be BPP algorithms, and you’re even allowed to examine the subroutines and know their internal structure when you’re trying to derandomize the algorithm.”

    Dick Lipton says that if you have a randomized algorithm for Discrete Log that works on a 1/(log p)^c fraction of inputs, then you have a BPP algorithm for Discrete Log. Is this correct? This is an example of RSR. You said RSR is not derandomizable. So if someone has a randomized algorithm for Discrete Log that works on a 1/(log p)^c fraction of inputs, then he/she has a BPP algorithm for the same. If you derandomize this, then you have derandomized a particular RSR, which you said was impossible. I am sure I am mistaken, mainly because I do not understand. If you have an elementary reference, I can read it and get back.

  45. Scott Says:

    P=BPP #44: No, I meant that there’s no black-box derandomization of RSR. In other words: if you have a black box that computes a low-degree polynomial over a large finite field on a 1-ε fraction of inputs (with the ε fraction of errors adversarially chosen), and your only access to the polynomial is via the black box, and the polynomial could otherwise be arbitrary, then there’s a randomized algorithm to compute the polynomial on an arbitrary input that makes only a polynomial number of black-box queries, but there’s no deterministic such algorithm.
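
    Concretely, the randomized direction looks roughly like this (a sketch over a prime field; black_box, the degree bound d, and the modulus p are stand-ins, and for simplicity it assumes all d+1 queries happen to be answered correctly, which holds with probability at least 1-(d+1)ε):

      import random

      def rsr_evaluate(black_box, x, d, p):
          # Evaluate a degree-<=d polynomial over F_p at the point x, given only a
          # black box that is correct on most inputs: query along a random line
          # through x at d+1 distinct nonzero parameters (each query point is then
          # uniformly random), and Lagrange-interpolate back to parameter 0.
          y = [random.randrange(p) for _ in range(len(x))]      # random direction
          ts = random.sample(range(1, p), d + 1)                # distinct nonzero t's
          vals = [black_box([(xi + t * yi) % p for xi, yi in zip(x, y)]) for t in ts]
          result = 0
          for i, ti in enumerate(ts):
              num, den = 1, 1
              for j, tj in enumerate(ts):
                  if j != i:
                      num = (num * (-tj)) % p
                      den = (den * (ti - tj)) % p
              result = (result + vals[i] * num * pow(den, p - 2, p)) % p
          return result

    (The real argument uses more query points plus noisy interpolation to tolerate the errors; this sketch just shows where the randomness enters.)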

    But of course, for some specific polynomial, it might simply turn out that it’s computable in P. And if (hypothetically) the discovery of that proceeded by first finding an average-case algorithm, then converting it to a BPP algorithm using RSR, and finally converting the BPP algorithm into a P algorithm, you could say that the discovery had proceeded via “derandomizing RSR.” But again, it wouldn’t have been a black-box derandomization, because you would’ve needed to use additional facts about this polynomial, not just the fact that you could compute the polynomial on a 1-ε fraction of inputs.

    The distinction between black-box and non-black-box is crucial to what we’re talking about, and if you don’t assimilate it, we’re just going to go around in circles.

  46. Gil Kalai Says:

    Indeed, it is a very tempting idea to think that BosonSampling could somehow help us to estimate the value of permanents and to identify decision problems (or decision-like problems) which are difficult. (Of course, there are good reasons to think that it could not help in (multiplicatively) approximating the value of permanents of general matrices, since that task is #P-complete.)

    Here are a few questions:

    1) Given a general complex n by m matrix M with orthonormal rows, BosonSampling allows one to additively approximate the absolute value squared of the permanent of an n by n submatrix, up to 1/poly error. (I suppose this goes back to Troyansky and Tishby.) How hard is this computational task? Is it BQP-hard? Or in P? (Additively approximating the values of certain Jones polynomials, which a quantum computer can do, is known to be BQP-hard. But I don’t know about permanents.)

  47. Gil Kalai Says:

    (cont.)

    2) There are various decision problems that we can solve using BosonSampling, and it is interesting to ask how hard they are.

    Consider a class F of n by n submatrices Y (with repeated columns allowed) such that there is an efficient algorithm to tell whether a submatrix Y is in the class. The associated decision problem is

    “Is the sum A of the squared absolute values of the permanents of the submatrices in F larger than the corresponding sum B for the submatrices not in F?”
    I don’t know how hard this problem is on its own. BosonSampling gives us an efficient quantum algorithm to answer YES if A > (1+ε)B and NO if A < (1−ε)B.
    Again, I don’t know how hard this is. (E.g., could it be unique-games-hard… is it in P?)

    3) A very simple special case of question 2 is when F consists of the submatrices that miss the first column of M. Maybe there the question is in P? (In this case, by the permanental Cauchy-Binet theorem, A can itself be expressed as the value of a certain permanent, but I don’t know how hard it is to approximate such a permanent for a general M.)

    4) (This is something we briefly discussed with Scott and others before.) You can extend this question slightly by not requiring that the rows of M be orthonormal. (And then, when you sample, you normalize by the sum of all contributions.) This is a general version of BosonSampling: how hard is it? Is it #P-hard? Is it equivalent to ordinary BosonSampling?

  48. Scott Says:

    Gil #46: You and I have discussed these things many times, but no, I don’t see how BosonSampling gives you a speedup for additively estimating the permanent. After all, Gurvits’s algorithm already lets you approximate Per(A) to within ±ε·||A||^n additive error in randomized O(n²/ε²) time. As far as we know today, in order to see an advantage with BosonSampling, you really do need to talk about a sampling task (like sampling a random matrix from a distribution that’s weighted toward matrices with larger permanents).
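
    (For anyone curious: Gurvits’s estimator is simple enough to sketch in a few lines. Each sample below has expectation Per(A) and magnitude at most ||A||^n for any x in {-1,+1}^n, so averaging O(1/ε²) samples gives the stated additive error. This is just an illustrative sketch, with no attention paid to numerical issues.)

      import random

      def gurvits_estimate(A, num_samples):
          # Glynn/Gurvits estimator: for uniform x in {-1,+1}^n, the quantity
          #   prod_i x_i * prod_j (sum_i A[i][j] * x_i)
          # has expectation Per(A); average many independent samples of it.
          n = len(A)
          total = 0.0
          for _ in range(num_samples):
              x = [random.choice((-1, 1)) for _ in range(n)]
              sample = 1.0
              for xi in x:
                  sample *= xi
              for j in range(n):
                  sample *= sum(A[i][j] * x[i] for i in range(n))
              total += sample
          return total / num_samples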

  49. Gil Kalai Says:

    Scott #48: Yes, we certainly talked about related matters. I remember that we don’t know anything regarding question 4. Certainly the power of BosonSampling (and FourierSampling) as a sampling task is quite amazing. One conjecture in this direction that I find appealing is that a classical computer with a BQP oracle, or perhaps even with a QMA oracle, cannot perform BosonSampling!!!

    Regarding the first problem: is it the case that any approximation of |Per(A)| (when the value is sufficiently large that sampling is relevant) that we can achieve with BosonSampling can also be achieved by a classical computer via Gurvits’s algorithm? (I was not aware of that.)

    Regarding problems 2 and 3 (certainly I’d love to see some discussion in the literature): on the face of it, you can get an efficient approximate answer (in the sense I asked above) to very complicated-looking decision problems. Maybe there are reasons to expect that there are efficient classical algorithms as well (namely, that whenever A > (1+ε)B or B > (1+ε)A, there must be a “reason” for it that allows an efficient classical algorithm). This looks like an interesting question. I don’t know the answer even for the very simple special case of question 2 (question 3). (Maybe you told me and I forgot.)

    As for the paper of Lidror Troyansky and Tali Tishby, I think they had several purposes besides pointing out that BosonSampling does not lead to efficient answers to #P problems. They wanted to point out that BosonSampling can be implemented by simple devices, and also to draw a distinction between the bosonic case and the fermionic case that would reflect the huge gap in computational complexity between computing determinants and computing permanents. Certainly, this gap is demonstrated in a very convincing and beautiful way in your paper with Alex.

  50. Polymath Says:

    Scott, I will take that bet, but I am confused, because it seems to me that you could only place the odds that THIS proof is correct at 500:1 if your prior for the actual truth of the result was also very low, since, conditioned on someone proving it, a proof looking like this doesn’t seem very surprising. What are your actual prior probabilities for the truth of the following?
    1) NP = coNP
    2) NP = PSPACE
    3) P = NP
    4) P = PSPACE

  51. Scott Says:

    Polymath #50: I’ve previously given probabilities for such statements in the 1% range, which actually seems quite consistent with my 0.2% probability for this particular attempt.

    Incidentally, Dana has since signed off on $20 vs. $10,000. It’s only out of respect for her that I won’t propose an increase to $2,000 vs. $1,000,000. 🙂

  52. fred Says:

    Hi Scott,

    is there anything here to be excited about? (Stanford and the National Institute of Informatics in Tokyo)

    http://spectrum.ieee.org/tech-talk/computing/hardware/new-computer-combines-electronic-circuits-with-light-pulses

  53. Scott Says:

    Fred #52: Possibly interesting from an engineering or hardware standpoint, but there would seem to be even less reason here than with D-Wave for imagining that you’re going to see any scalable speedup for NP-complete problems, since there doesn’t even seem to be an attempt to exploit collective quantum effects.

  54. Polymath Says:

    Your utility function is linear further up than I expected.

  55. Joshua Zelinsky Says:

    Scott #31,

    “A second problem with Zeilberger’s post is that he acknowledges Turing’s undecidability theorem to be deep and important (“because it reveals statements previously thought to be meaningful to be meaningless,” in Zeilberger’s absurd worldview), but then says that all later results about undecidability are empty of content. Here, it seems to me, Zeilberger simply isn’t following his own wrong logic correctly: he could also, for example, have celebrated my and Yedidia’s paper for proving that the question of the value of BB(8000) is “meaningless,” rather than “meaningful” as some people had mistakenly thought.”

    While I agree with the general thrust of your point here, I have to wonder whether anyone, prior to this work, would seriously have suggested that BB(8000) would be resolvable in ZFC. At the level of personal anecdote, for a few years now I’ve been of the opinion that any such claim for anything as small as BB(100) or so was almost certainly ridiculous. Did anyone seriously suggest that we should actually expect the values we can successfully determine in ZFC to go up as high as 8000?

  56. Scott Says:

    Joshua #55: The trouble with your question is that before our paper, I can’t even find people speculating in writing about the question of the least k such that BB(k) is independent of ZFC! But yes, I agree: one could have picked many better examples of computability results to illustrate the absurdity of Zeilberger’s claim that “meaningful” computability ended with Gödel and Turing.

  57. Easy Says:

    He seems to be a well-published researcher, and his paper passes your 10-point check on big claims: https://arxiv.org/abs/1602.04781
    Is he right? Any news?

  58. Sniffnoy Says:

    Man, it seems like other people are way too eager to say Scott’s “ten signs” have been passed!

    (And what about 11 and 12? And this is close enough to P != NP that many of those 8 signs (and Sid’s 9th) will apply as well, though many of those overlap…)

  59. gentzen Says:

    Sniffnoy #58: I don’t know whether Hauptmann’s proof of P != NP or his proof of NP != PSPACE is correct. I don’t even know whether potential holes (i.e., refutations) were brought to the author’s attention. But if those papers of Hauptmann’s don’t pass Scott’s “ten signs”, then which paper would?

    Note that proving P != PH is equivalent to proving P != NP, since P=NP would imply P=PH. So your “And this is close enough to P != NP” suggests that you should not judge this paper.

    But just because Hauptmann’s paper passes Scott’s “ten signs” doesn’t mean that it is productive to discuss it here. I admit that discussing a paper which fails most of Scott’s “ten signs” here is not too productive either, but at least one can establish which of the “ten signs” apply.

    And we should not forget that the plausibility of the result is important too. Hence scepticism toward a paper claiming to prove P=NP or NP=PSPACE is always justified, even if it is a fine paper.

  60. jonas Says:

    @gentzen: In case it’s not clear, the so-called ten signs are the ones described in the blog entry https://scottaaronson-production.mystagingwebsite.com/?p=304 “Ten Signs a Claimed Mathematical Breakthrough is Wrong” (Scott accidentally gave the wrong link).

  61. Sniffnoy Says:

    Note that proving P != PH is equivalent to proving P != NP, since P=NP would imply P=PH.

    Oops, yes. If PH collapses, it must collapse at some level; there’s no way to have P=NP but P!=Σ_2. You are right, they are equivalent. You’d think “…and therefore P != NP” is the sort of implication that would be important enough to put in the abstract! Not only did they put the important conclusion at the end, they left out the most important part of it. Burying the lede, man. Not a good way to write things.

    So your “And this is close enough to P != NP” hints that you should not judge this paper.

    I indeed have no intention of attempting to judge this paper, but that’s a bit harsh for making a simple mistake like that.

  62. AJ Says:

    I do not see any collapse results if Mod_kP is in P/poly for k ≥ 2. Could there be any?

  63. Scott Says:

    AJ #62: Yes. Toda’s Theorem gives us that PH is contained in BPP^{Mod_kP} for any k. So if Mod_kP is in P/poly, then PH is also in P/poly, which collapses PH to Σ_2 by Karp-Lipton (and indeed, even to the smaller classes ZPP^NP and S_2P).
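
    Spelled out (just schematically), the chain is

    $$ \mathrm{Mod}_k\mathrm{P} \subseteq \mathrm{P/poly} \;\Longrightarrow\; \mathrm{PH} \subseteq \mathrm{BPP}^{\mathrm{Mod}_k\mathrm{P}} \subseteq \mathrm{P/poly} \;\Longrightarrow\; \mathrm{PH} = \Sigma_2^{p}, $$

    where the containment of PH in BPP^{Mod_kP} is Toda, the next containment uses the hypothesis (fix good randomness and hardwire circuits for the oracle), and the final collapse is Karp-Lipton.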

  64. AJ Says:

    Interesting, but strange indeed, right?

  65. gentzen Says:

    Sniffnoy #61: This mistake proves that the person making it hasn’t even browsed the conclusion of the paper before passing judgment. The real challenge is to decline requests from anonymous people like “Easy” without tying the refusal to a “too explicit” judgment of the paper in question. Scott’s idea of offering bets based on overwhelming priors may be one possible way to achieve this. I decided to respond with “Whether it is correct or not is probably best determined by the normal peer review process”.

    The only issue is that the results of the review process are not public, and neither is its timeline. But the fact that a paper has passed peer review does become public information, so the public information is, in the end, sort of sufficient (after enough time has passed).

  66. Scott Says:

    AJ #64: Yes, Toda’s Theorem is strange.

  67. Scott Says:

    Easy #57: FDR once famously told an activist, “OK fine, you’ve convinced me. Now go out, organize some marches on Washington, and pressure me!”

    Similarly: OK fine, you’ve made me aware of Hauptmann’s P≠NP claim. Now if enough other people also made me aware of it, I might be forced to offer betting odds against or something! As it is, though, I’ll just let it sit there and let experts examine it for bugs, as is the usual practice.

  68. Aula Says:

    Scott #67:

    I might be forced to offer betting odds against or something!

    I’m not going to ask you to do that, since I don’t want to get you in trouble with your wife. I want to ask you something else instead: how would you determine the odds for a paper like this? If the paper were claiming something generally believed to be false (like the recent NP=PSPACE paper), then it would be easy to give enormous odds for it being incorrect. Likewise, if the paper were trying to prove something that is most likely true, but the proof itself were almost completely incomprehensible (like Deolalikar’s attempt at P≠NP some years ago), that would also justify huge odds against it. However, this particular paper is largely based on results and methods published by other people, tweaking them just enough to make them work and explaining why it’s necessary to do so. I think that kinda limits the ways in which the paper could be wrong; some subtle technicality is still possible, but a blatant misunderstanding (or utter crackpottery) not so much, and that should be reflected in the odds.

  69. Scott Says:

    Aula #68: On the contrary (and again speaking in generalities), if a paper claims to resolve a gargantuan open problem by using only small tweaks to results and methods published by other people, that makes me want to bet at huge odds that the paper is wrong. For if the existing results and methods sufficed, then why wasn’t it done already?

    (And incidentally, mildly tweaking other people’s results and methods is actually one of the more dangerous things we do in TCS, since there are so many opportunities to have subtly misunderstood the exact conditions under which the other people’s methods apply, the exact meanings of their terms and notation, etc. Just like in programming, errors tend to crop up at the interfaces between supposed abstraction boundaries.)

  70. Aula Says:

    Scott #69:

    On the contrary (and again speaking in generalities), if a paper claims to resolve a gargantuan open problem by using only small tweaks to results and methods published by other people, that makes me want to bet at huge odds that the paper is wrong. For if the existing results and methods sufficed, then why wasn’t it done already?

    Because if people are convinced that the problem will need a 1000-page solution, they won’t ever bother looking for a 40-page one. Even if they are looking for a short solution, they may quickly discard an idea because of perceived difficulties, even if those difficulties could be overcome quite easily, while wasting a lot of time pursuing a hopeless idea that seems promising. (One of the most profound results of AI research so far is the observation that human intelligence is really biased in this way.)

    (And incidentally, mildly tweaking other people’s results and methods is actually one of the more dangerous things we do in TCS, […] Just like in programming, errors tend to crop up at the interfaces between supposed abstraction boundaries.)

    Well, sure, but OTOH that usually makes the errors easier to catch, at least in real-world programming.