Archive for the ‘Nerd Interest’ Category

Quantifying the Rise and Fall of Complexity in Closed Systems: The Coffee Automaton

Tuesday, May 27th, 2014

Update (June 3): A few days after we posted this paper online, Brent Werness, a postdoc in probability theory at the University of Washington, discovered a serious error in the “experimental” part of the paper.  Happily, Brent is now collaborating with us on producing a new version of the paper that fixes the error, which we hope to have available within a few months (and which will replace the version currently on the arXiv).

To make a long story short: while the overall idea, of measuring “apparent complexity” by the compressed file size of a coarse-grained image, is fine, the “interacting coffee automaton” that we study in the paper is not an example where the apparent complexity becomes large at intermediate times.  That fact can be deduced as a corollary of a result of Liggett from 2009 about the “symmetric exclusion process,” and can be seen as a far-reaching generalization of a result that we prove in our paper’s appendix: namely, that in the non-interacting coffee automaton (our “control case”), the apparent complexity after t time steps is upper-bounded by O(log(nt)).  As it turns out, we were more right than we knew to worry about large-deviation bounds giving complete mathematical control over what happens when the cream spills into the coffee, thereby preventing the apparent complexity from ever becoming large!

But what about our numerical results, which showed a small but unmistakable complexity bump for the interacting automaton (figure 10(a) in the paper)?  It now appears that the complexity bump we saw in our data is likely to be explainable by an incomplete removal of what we called “border pixel artifacts”: that is, “spurious” complexity that arises merely from the fact that, at the border between cream and coffee, we need to round the fraction of cream up or down to the nearest integer to produce a grayscale.  In the paper, we devoted a whole section (Section 6) to border pixel artifacts and the need to deal with them: something sufficiently non-obvious that in the comments of this post, you can find people arguing with me that it’s a non-issue.  Well, it now appears that we erred by underestimating the severity of border pixel artifacts, and that a better procedure to get rid of them would also eliminate the complexity bump for the interacting automaton.

Once again, this error has no effect on either the general idea of complexity rising and then falling in closed thermodynamic systems, or our proposal for how to quantify that rise and fall—the two aspects of the paper that have generated the most interest.  But we made a bad choice of model system with which to illustrate those ideas.  Had I looked more carefully at the data, I could’ve noticed the problem before we posted, and I take responsibility for my failure to do so.

The good news is that ultimately, I think the truth only makes our story more interesting.  For it turns out that apparent complexity, as we define it, is not something that’s trivial to achieve by just setting loose a bunch of randomly-walking particles, which bump into each other but are otherwise completely independent.  If you want “complexity” along the approach to thermal equilibrium, you need to work a bit harder for it.  One promising idea, which we’re now exploring, is to consider a cream tendril whose tip takes a random walk through the coffee, leaving a trail of cream in its wake.  Using results in probability theory—closely related, or so I’m told, to the results for which Wendelin Werner won his Fields Medal!—it may even be possible to prove analytically that the apparent complexity becomes large in thermodynamic systems with this sort of behavior, much as one can prove that the complexity doesn’t become large in our original coffee automaton.

So, if you’re interested in this topic, stay tuned for the updated version of our paper.  In the meantime, I wish to express our deepest imaginable gratitude to Brent Werness for telling us all this.


Good news!  After nearly three years of procrastination, fellow blogger Sean Carroll, former MIT undergraduate Lauren Ouellette, and yours truly finally finished a paper with the above title (coming soon to an arXiv near you).  PowerPoint slides are also available (as usual, you’re on your own if you can’t open them—sorry!).

For the background and context of this paper, please see my old post “The First Law of Complexodynamics,” which discussed Sean’s problem of defining a “complextropy” measure that first increases and then decreases in closed thermodynamic systems, in contrast to entropy (which increases monotonically).  In this exploratory paper, we basically do five things:

  1. We survey several candidate “complextropy” measures: their strengths, weaknesses, and relations to one another.
  2. We propose a model system for studying such measures: a probabilistic cellular automaton that models a cup of coffee into which cream has just been poured.
  3. We report the results of numerical experiments with one of the measures, which we call “apparent complexity” (basically, the gzip file size of a smeared-out image of the coffee cup).  The results confirm that the apparent complexity does indeed increase, reach a maximum, then turn around and decrease as the coffee and cream mix.
  4. We discuss a technical issue that one needs to overcome (the so-called “border pixels” problem) before one can do meaningful experiments in this area, and offer a solution.
  5. We raise the open problem of proving analytically that the apparent complexity ever becomes large for the coffee automaton.  To underscore this problem’s difficulty, we prove that the apparent complexity doesn’t become large in a simplified version of the coffee automaton.

Anyway, here’s the abstract:

In contrast to entropy, which increases monotonically, the “complexity” or “interestingness” of closed systems seems intuitively to increase at first and then decrease as equilibrium is approached. For example, our universe lacked complex structures at the Big Bang and will also lack them after black holes evaporate and particles are dispersed. This paper makes an initial attempt to quantify this pattern. As a model system, we use a simple, two-dimensional cellular automaton that simulates the mixing of two liquids (“coffee” and “cream”). A plausible complexity measure is then the Kolmogorov complexity of a coarse-grained approximation of the automaton’s state, which we dub the “apparent complexity.” We study this complexity measure, and show analytically that it never becomes large when the liquid particles are non-interacting. By contrast, when the particles do interact, we give numerical evidence that the complexity reaches a maximum comparable to the “coffee cup’s” horizontal dimension. We raise the problem of proving this behavior analytically.
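To make the measurement pipeline concrete, here is a minimal sketch of "apparent complexity" as described above: coarse-grain the grid, quantize to a few gray levels, and take the compressed size. Everything specific in it is an assumption for illustration rather than the paper's actual code: the function names, the block size, the three gray levels, the random-adjacent-swap update rule, and the use of `zlib` (DEFLATE, the same compression that gzip uses) in place of the gzip tool itself.

```python
import random
import zlib


def initial_state(n):
    """n x n grid with cream (1) in the top half and coffee (0) below."""
    return [[1 if r < n // 2 else 0 for _ in range(n)] for r in range(n)]


def step(grid, rng, swaps):
    """Hypothetical mixing rule: swap the contents of random adjacent cells."""
    n = len(grid)
    for _ in range(swaps):
        r, c = rng.randrange(n), rng.randrange(n)
        dr, dc = rng.choice([(0, 1), (1, 0), (0, -1), (-1, 0)])
        r2, c2 = r + dr, c + dc
        if 0 <= r2 < n and 0 <= c2 < n:
            grid[r][c], grid[r2][c2] = grid[r2][c2], grid[r][c]


def coarse_grain(grid, block, levels=3):
    """Average over block x block tiles, then round down to a few gray levels."""
    m = len(grid) // block
    coarse = []
    for br in range(m):
        row = []
        for bc in range(m):
            total = sum(
                grid[br * block + i][bc * block + j]
                for i in range(block)
                for j in range(block)
            )
            # The rounding here is exactly where "border pixel" effects enter.
            row.append(min(total * levels // (block * block), levels - 1))
        coarse.append(row)
    return coarse


def apparent_complexity(grid, block=4, levels=3):
    """Compressed size in bytes of the coarse-grained, quantized image."""
    coarse = coarse_grain(grid, block, levels)
    data = bytes(value for row in coarse for value in row)
    return len(zlib.compress(data, 9))


if __name__ == "__main__":
    rng = random.Random(0)
    grid = initial_state(32)
    for checkpoint in range(5):
        print(checkpoint, apparent_complexity(grid))
        step(grid, rng, 2000)
```

The sketch only shows how the quantity is measured. As the update at the top of this post explains, any bump it displays for a model like this one may be a rounding artifact rather than genuine complexity.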

Questions and comments more than welcome.


In unrelated news, Shafi Goldwasser has asked me to announce that the Call for Papers for the 2015 Innovations in Theoretical Computer Science (ITCS) conference is now available.

Why I Am Not An Integrated Information Theorist (or, The Unconscious Expander)

Wednesday, May 21st, 2014

Happy birthday to me!

Recently, lots of people have been asking me what I think about IIT—no, not the Indian Institutes of Technology, but Integrated Information Theory, a widely-discussed “mathematical theory of consciousness” developed over the past decade by the neuroscientist Giulio Tononi.  One of the askers was Max Tegmark, who’s enthusiastically adopted IIT as a plank in his radical mathematizing platform (see his paper “Consciousness as a State of Matter”).  When, in the comment thread about Max’s Mathematical Universe Hypothesis, I expressed doubts about IIT, Max challenged me to back up my doubts with a quantitative calculation.

So, this is the post that I promised to Max and all the others, about why I don’t believe IIT.  And yes, it will contain that quantitative calculation.

But first, what is IIT?  The central ideas of IIT, as I understand them, are:

(1) to propose a quantitative measure, called Φ, of the amount of “integrated information” in a physical system (i.e. information that can’t be localized in the system’s individual parts), and then

(2) to hypothesize that a physical system is “conscious” if and only if it has a large value of Φ—and indeed, that a system is more conscious the larger its Φ value.

I’ll return later to the precise definition of Φ—but basically, it’s obtained by minimizing, over all subdivisions of your physical system into two parts A and B, some measure of the mutual information between A’s outputs and B’s inputs and vice versa.  Now, one immediate consequence of any definition like this is that all sorts of simple physical systems (a thermostat, a photodiode, etc.) will turn out to have small but nonzero Φ values.  To his credit, Tononi cheerfully accepts the panpsychist implication: yes, he says, it really does mean that thermostats and photodiodes have small but nonzero levels of consciousness.  On the other hand, for the theory to work, it had better be the case that Φ is small for “intuitively unconscious” systems, and only large for “intuitively conscious” systems.  As I’ll explain later, this strikes me as a crucial point on which IIT fails.

The literature on IIT is too big to do it justice in a blog post.  Strikingly, in addition to the “primary” literature, there’s now even a “secondary” literature, which treats IIT as a sort of established base on which to build further speculations about consciousness.  Besides the Tegmark paper linked to above, see for example this paper by Maguire et al., and associated popular article.  (Ironically, Maguire et al. use IIT to argue for the Penrose-like view that consciousness might have uncomputable aspects—a use diametrically opposed to Tegmark’s.)

Anyway, if you want to read a popular article about IIT, there are loads of them: see here for the New York Times’s, here for Scientific American’s, here for IEEE Spectrum’s, and here for the New Yorker’s.  Unfortunately, none of those articles will tell you the meat (i.e., the definition of integrated information); for that you need technical papers, like this or this by Tononi, or this by Seth et al.  IIT is also described in Christof Koch’s memoir Consciousness: Confessions of a Romantic Reductionist, which I read and enjoyed; as well as Tononi’s Phi: A Voyage from the Brain to the Soul, which I haven’t yet read.  (Koch, one of the world’s best-known thinkers and writers about consciousness, has also become an evangelist for IIT.)

So, I want to explain why I don’t think IIT solves even the problem that it “plausibly could have” solved.  But before I can do that, I need to do some philosophical ground-clearing.  Broadly speaking, what is it that a “mathematical theory of consciousness” is supposed to do?  What questions should it answer, and how should we judge whether it’s succeeded?

The most obvious thing a consciousness theory could do is to explain why consciousness exists: that is, to solve what David Chalmers calls the “Hard Problem,” by telling us how a clump of neurons is able to give rise to the taste of strawberries, the redness of red … you know, all that ineffable first-persony stuff.  Alas, there’s a strong argument—one that I, personally, find completely convincing—why that’s too much to ask of any scientific theory.  Namely, no matter what the third-person facts were, one could always imagine a universe consistent with those facts in which no one “really” experienced anything.  So for example, if someone claims that integrated information “explains” why consciousness exists—nope, sorry!  I’ve just conjured into my imagination beings whose Φ-values are a thousand, nay a trillion times larger than humans’, yet who are also philosophical zombies: entities that there’s nothing that it’s like to be.  Granted, maybe such zombies can’t exist in the actual world: maybe, if you tried to create one, God would notice its large Φ-value and generously bequeath it a soul.  But if so, then that’s a further fact about our world, a fact that manifestly couldn’t be deduced from the properties of Φ alone.  Notice that the details of Φ are completely irrelevant to the argument.

Faced with this point, many scientifically-minded people start yelling and throwing things.  They say that “zombies” and so forth are empty metaphysics, and that our only hope of learning about consciousness is to engage with actual facts about the brain.  And that’s a perfectly reasonable position!  As far as I’m concerned, you absolutely have the option of dismissing Chalmers’ Hard Problem as a navel-gazing distraction from the real work of neuroscience.  The one thing you can’t do is have it both ways: that is, you can’t say both that the Hard Problem is meaningless, and that progress in neuroscience will soon solve the problem if it hasn’t already.  You can’t maintain simultaneously that

(a) once you account for someone’s observed behavior and the details of their brain organization, there’s nothing further about consciousness to be explained, and

(b) remarkably, the XYZ theory of consciousness can explain the “nothing further” (e.g., by reducing it to integrated information processing), or might be on the verge of doing so.

As obvious as this sounds, it seems to me that large swaths of consciousness-theorizing can just be summarily rejected for trying to have their brain and eat it in precisely the above way.

Fortunately, I think IIT survives the above observations.  For we can easily interpret IIT as trying to do something more “modest” than solve the Hard Problem, although still staggeringly audacious.  Namely, we can say that IIT “merely” aims to tell us which physical systems are associated with consciousness and which aren’t, purely in terms of the systems’ physical organization.  The test of such a theory is whether it can produce results agreeing with “commonsense intuition”: for example, whether it can affirm, from first principles, that (most) humans are conscious; that dogs and horses are also conscious but less so; that rocks, livers, bacteria colonies, and existing digital computers are not conscious (or are hardly conscious); and that a room full of people has no “mega-consciousness” over and above the consciousnesses of the individuals.

The reason it’s so important that the theory uphold “common sense” on these test cases is that, given the experimental inaccessibility of consciousness, this is basically the only test available to us.  If the theory gets the test cases “wrong” (i.e., gives results diverging from common sense), it’s not clear that there’s anything else for the theory to get “right.”  Of course, supposing we had a theory that got the test cases right, we could then have a field day with the less-obvious cases, programming our computers to tell us exactly how much consciousness is present in octopi, fetuses, brain-damaged patients, and hypothetical AI bots.

In my opinion, how to construct a theory that tells us which physical systems are conscious and which aren’t—giving answers that agree with “common sense” whenever the latter renders a verdict—is one of the deepest, most fascinating problems in all of science.  Since I don’t know a standard name for the problem, I hereby call it the Pretty-Hard Problem of Consciousness.  Unlike with the Hard Hard Problem, I don’t know of any philosophical reason why the Pretty-Hard Problem should be inherently unsolvable; but on the other hand, humans seem nowhere close to solving it (if we had solved it, then we could reduce the abortion, animal rights, and strong AI debates to “gentlemen, let us calculate!”).

Now, I regard IIT as a serious, honorable attempt to grapple with the Pretty-Hard Problem of Consciousness: something concrete enough to move the discussion forward.  But I also regard IIT as a failed attempt on the problem.  And I wish people would recognize its failure, learn from it, and move on.

In my view, IIT fails to solve the Pretty-Hard Problem because it unavoidably predicts vast amounts of consciousness in physical systems that no sane person would regard as particularly “conscious” at all: indeed, systems that do nothing but apply a low-density parity-check code, or other simple transformations of their input data.  Moreover, IIT predicts not merely that these systems are “slightly” conscious (which would be fine), but that they can be unboundedly more conscious than humans are.

To justify that claim, I first need to define Φ.  Strikingly, despite the large literature about Φ, I had a hard time finding a clear mathematical definition of it—one that not only listed formulas but fully defined the structures that the formulas were talking about.  Complicating matters further, there are several competing definitions of Φ in the literature, including ΦDM (discrete memoryless), ΦE (empirical), and ΦAR (autoregressive), which apply in different contexts (e.g., some take time evolution into account and others don’t).  Nevertheless, I think I can define Φ in a way that will make sense to theoretical computer scientists.  And crucially, the broad point I want to make about Φ won’t depend much on the details of its formalization anyway.

We consider a discrete system in a state x = (x_1, …, x_n) ∈ S^n, where S is a finite alphabet (the simplest case is S = {0,1}).  We imagine that the system evolves via an “updating function” f : S^n → S^n.  Then the question that interests us is whether the x_i’s can be partitioned into two sets A and B, of roughly comparable size, such that the updates to the variables in A don’t depend very much on the variables in B and vice versa.  If such a partition exists, then we say that the computation of f does not involve “global integration of information,” which on Tononi’s theory is a defining aspect of consciousness.

More formally, given a partition (A,B) of {1,…,n}, let us write an input y = (y_1, …, y_n) ∈ S^n to f in the form (y_A, y_B), where y_A consists of the y variables in A and y_B consists of the y variables in B.  Then we can think of f as mapping an input pair (y_A, y_B) to an output pair (z_A, z_B).  Now, we define the “effective information” EI(A→B) as H(z_B | A random, y_B = x_B).  Or in words, EI(A→B) is the Shannon entropy of the output variables in B, if the input variables in A are drawn uniformly at random, while the input variables in B are fixed to their values in x.  It’s a measure of the dependence of B on A in the computation of f(x).  Similarly, we define

EI(B→A) := H(z_A | B random, y_A = x_A).

We then consider the sum

Φ(A,B) := EI(A→B) + EI(B→A).
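For very small systems, the definitions of EI and Φ(A,B) can be evaluated by brute force. The following is a sketch under stated assumptions: variables are indexed from 0, the alphabet defaults to {0,1}, and the function names and the two toy update functions are hypothetical, chosen only for illustration.

```python
import itertools
import math
from collections import Counter


def effective_information(f, x, src, dst, alphabet=(0, 1)):
    """EI(src -> dst): entropy in bits of the outputs in dst when the inputs
    in src are uniformly random and all other inputs are fixed as in x."""
    counts = Counter()
    for values in itertools.product(alphabet, repeat=len(src)):
        y = list(x)
        for index, value in zip(src, values):
            y[index] = value
        z = f(tuple(y))
        counts[tuple(z[i] for i in dst)] += 1
    total = sum(counts.values())
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def phi_of_partition(f, x, A, B, alphabet=(0, 1)):
    """Phi(A,B) = EI(A -> B) + EI(B -> A)."""
    return (effective_information(f, x, A, B, alphabet)
            + effective_information(f, x, B, A, alphabet))


def swap(y):
    """Toy update: each variable copies the other one."""
    return (y[1], y[0])


def identity(y):
    """Toy update: each variable copies itself."""
    return y


if __name__ == "__main__":
    x = (0, 1)
    print(phi_of_partition(swap, x, (0,), (1,)))      # each side fully determines the other
    print(phi_of_partition(identity, x, (0,), (1,)))  # no cross-dependence at all
```

The loop over `itertools.product` is exponential in the size of the randomized side, so this is usable only for toy cases.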

Intuitively, we’d like the integrated information Φ = Φ(f,x) to be the minimum of Φ(A,B), over all 2^n − 2 possible partitions of {1,…,n} into nonempty sets A and B.  The idea is that Φ should be large, if and only if it’s not possible to partition the variables into two sets A and B, in such a way that not much information flows from A to B or vice versa when f(x) is computed.

However, no sooner do we propose this than we notice a technical problem.  What if A is much larger than B, or vice versa?  As an extreme case, what if A = {1,…,n−1} and B = {n}?  In that case, we’ll have Φ(A,B) ≤ 2 log_2 |S|, but only for the boring reason that there’s hardly any entropy in B as a whole, to either influence A or be influenced by it.  For this reason, Tononi proposes a fix where we normalize each Φ(A,B) by dividing it by min{|A|,|B|}.  He then defines the integrated information Φ to be Φ(A,B), for whichever partition (A,B) minimizes the ratio Φ(A,B) / min{|A|,|B|}.  (Unless I missed it, Tononi never specifies what we should do if there are multiple (A,B)’s that all achieve the same minimum of Φ(A,B) / min{|A|,|B|}.  I’ll return to that point later, along with other idiosyncrasies of the normalization procedure.)
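The normalized minimization can be sketched the same way. This is an illustrative brute-force version, not Tononi's procedure as published: the names are hypothetical, variables are indexed from 0, and each unordered bipartition is visited once (by forcing variable 0 into A), which is harmless because Φ(A,B) is symmetric in A and B. Because the tie-breaking rule is unspecified, the function returns every minimizer instead of choosing one.

```python
import itertools
import math
from collections import Counter


def effective_information(f, x, src, dst, alphabet=(0, 1)):
    """Entropy in bits of outputs in dst when inputs in src are uniformly random."""
    counts = Counter()
    for values in itertools.product(alphabet, repeat=len(src)):
        y = list(x)
        for index, value in zip(src, values):
            y[index] = value
        z = f(tuple(y))
        counts[tuple(z[i] for i in dst)] += 1
    total = sum(counts.values())
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def minimizing_bipartitions(f, x, alphabet=(0, 1)):
    """Return [(Phi(A,B), A, B), ...] for every bipartition that minimizes
    Phi(A,B) / min(|A|, |B|)."""
    n = len(x)
    scored = []
    for size in range(1, n):
        for rest in itertools.combinations(range(1, n), size - 1):
            A = (0,) + rest
            B = tuple(i for i in range(n) if i not in A)
            value = (effective_information(f, x, A, B, alphabet)
                     + effective_information(f, x, B, A, alphabet))
            scored.append((value / min(len(A), len(B)), value, A, B))
    best = min(entry[0] for entry in scored)
    return [(value, A, B) for norm, value, A, B in scored if abs(norm - best) < 1e-9]


def two_pairs(y):
    """Toy update on 4 bits: variables 0,1 swap with each other, as do 2,3."""
    return (y[1], y[0], y[3], y[2])


if __name__ == "__main__":
    print(minimizing_bipartitions(two_pairs, (0, 0, 0, 0)))
```

For the toy `two_pairs` system the cut between {0,1} and {2,3} carries no information in either direction, so it is the unique minimizer and Φ comes out as zero, matching the intuition that two independent modules are not "integrated."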

Tononi gives some simple examples of the computation of Φ, showing that it is indeed larger for systems that are more “richly interconnected” in an intuitive sense.  He speculates, plausibly, that Φ is quite large for (some reasonable model of) the interconnection network of the human brain—and probably larger for the brain than for typical electronic devices (which tend to be highly modular in design, thereby decreasing their Φ), or, let’s say, than for other organs like the pancreas.  Ambitiously, he even speculates at length about how a large value of Φ might be connected to the phenomenology of consciousness.

To be sure, empirical work in integrated information theory has been hampered by three difficulties.  The first difficulty is that we don’t know the detailed interconnection network of the human brain.  The second difficulty is that it’s not even clear what we should define that network to be: for example, as a crude first attempt, should we assign a Boolean variable to each neuron, which equals 1 if the neuron is currently firing and 0 if it’s not firing, and let f be the function that updates those variables over a timescale of, say, a millisecond?  What other variables do we need—firing rates, internal states of the neurons, neurotransmitter levels?  Is choosing many of these variables uniformly at random (for the purpose of calculating Φ) really a reasonable way to “randomize” the variables, and if not, what other prescription should we use?

The third and final difficulty is that, even if we knew exactly what we meant by “the f and x corresponding to the human brain,” and even if we had complete knowledge of that f and x, computing Φ(f,x) could still be computationally intractable.  For recall that the definition of Φ involved minimizing a quantity over all the exponentially-many possible bipartitions of {1,…,n}.  While it’s not directly relevant to my arguments in this post, I leave it as a challenge for interested readers to pin down the computational complexity of approximating Φ to some reasonable precision, assuming that f is specified by a polynomial-size Boolean circuit, or alternatively, by an NC^0 function (i.e., a function each of whose outputs depends on only a constant number of the inputs).  (Presumably Φ will be #P-hard to calculate exactly, but only because calculating entropy exactly is a #P-hard problem, which is not interesting.)

I conjecture that approximating Φ is an NP-hard problem, even for restricted families of f’s like NC^0 circuits, which invites the amusing thought that God, or Nature, would need to solve an NP-hard problem just to decide whether or not to imbue a given physical system with consciousness!  (Alas, if you wanted to exploit this as a practical approach for solving NP-complete problems such as 3SAT, you’d need to do a rather drastic experiment on your own brain: an experiment whose result would be to render you unconscious if your 3SAT instance was satisfiable, or conscious if it was unsatisfiable!  In neither case would you be able to communicate the outcome of the experiment to anyone else, nor would you have any recollection of the outcome after the experiment was finished.)  In the other direction, it would also be interesting to upper-bound the complexity of approximating Φ.  Because of the need to estimate the entropies of distributions (even given a bipartition (A,B)), I don’t know that this problem is in NP; the best I can observe is that it’s in AM.

In any case, my own reason for rejecting IIT has nothing to do with any of the “merely practical” issues above: neither the difficulty of defining f and x, nor the difficulty of learning them, nor the difficulty of calculating Φ(f,x).  My reason is much more basic, striking directly at the hypothesized link between “integrated information” and consciousness.  Specifically, I claim the following:

Yes, it might be a decent rule of thumb that, if you want to know which brain regions (for example) are associated with consciousness, you should start by looking for regions with lots of information integration.  And yes, it’s even possible, for all I know, that having a large Φ-value is one necessary condition among many for a physical system to be conscious.  However, having a large Φ-value is certainly not a sufficient condition for consciousness, or even for the appearance of consciousness.  As a consequence, Φ can’t possibly capture the essence of what makes a physical system conscious, or even of what makes a system look conscious to external observers.

The demonstration of this claim is embarrassingly simple.  Let S = F_p, where p is some prime sufficiently larger than n, and let V be an n×n Vandermonde matrix over F_p: that is, a matrix whose (i,j) entry equals i^(j−1) (mod p).  Then let f : S^n → S^n be the update function defined by f(x) = Vx.  Now, for p large enough, the Vandermonde matrix is well-known to have the property that every submatrix is full-rank (i.e., “every submatrix preserves all the information that it’s possible to preserve about the part of x that it acts on”).  And this implies that, regardless of which bipartition (A,B) of {1,…,n} we choose, we’ll get

EI(A→B) = EI(B→A) = min{|A|,|B|} log_2 p,

and hence

Φ(A,B) = EI(A→B) + EI(B→A) = 2 min{|A|,|B|} log_2 p,

or after normalizing,

Φ(A,B) / min{|A|,|B|} = 2 log_2 p.

Or in words: the normalized information integration has the same value (namely, the maximum value!) for every possible bipartition.  Now, I’d like to proceed from here to a determination of Φ itself, but I’m prevented from doing so by the ambiguity in the definition of Φ that I noted earlier.  Namely, since every bipartition (A,B) minimizes the normalized value Φ(A,B) / min{|A|,|B|}, in theory I ought to be able to pick any of them for the purpose of calculating Φ.  But the unnormalized value Φ(A,B), which gives the final Φ, can vary greatly, across bipartitions: from 2 log_2 p (if min{|A|,|B|} = 1) all the way up to n log_2 p (if min{|A|,|B|} = n/2).  So at this point, Φ is simply undefined.
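The Vandermonde calculation can be checked numerically on a tiny instance. The sketch below relies on one observation that is not spelled out in the text: for a linear map f(x) = Mx over F_p, randomizing the inputs in A makes the outputs in B uniform over an affine subspace, so EI(A→B) equals log_2 p times the rank of the submatrix with rows in B and columns in A. The function names, the 0-based indexing, and the choices n = 4 and p = 2^31 − 1 are assumptions made for illustration.

```python
import itertools
from fractions import Fraction


def rank_mod_p(rows, p):
    """Rank over F_p of a matrix given as a list of rows (Gauss-Jordan)."""
    m = [[v % p for v in row] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        pivot = next((r for r in range(rank, len(m)) if m[r][col]), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        inverse = pow(m[rank][col], -1, p)  # modular inverse, Python 3.8+
        m[rank] = [(v * inverse) % p for v in m[rank]]
        for r in range(len(m)):
            if r != rank and m[r][col]:
                factor = m[r][col]
                m[r] = [(a - factor * b) % p for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


def vandermonde(n, p):
    """Row for node i in 1..n holds i^0, i^1, ..., i^(n-1) mod p."""
    return [[pow(i, j, p) for j in range(n)] for i in range(1, n + 1)]


def phi_values(M, p):
    """Sets of normalized and unnormalized Phi(A,B), in units of log2(p),
    taken over all bipartitions of the variable indices."""
    n = len(M)
    normalized, unnormalized = set(), set()
    for size in range(1, n):
        for rest in itertools.combinations(range(1, n), size - 1):
            A = (0,) + rest
            B = tuple(i for i in range(n) if i not in A)
            ei_ab = rank_mod_p([[M[r][c] for c in A] for r in B], p)
            ei_ba = rank_mod_p([[M[r][c] for c in B] for r in A], p)
            unnormalized.add(ei_ab + ei_ba)
            normalized.add(Fraction(ei_ab + ei_ba, min(len(A), len(B))))
    return normalized, unnormalized


if __name__ == "__main__":
    p = 2**31 - 1
    print(phi_values(vandermonde(4, p), p))
```

For n = 4 every bipartition has the same normalized value, while the unnormalized value depends on which one is picked, which is the ambiguity described above in miniature.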

On the other hand, I can solve this problem, and make Φ well-defined, by an ironic little hack.  The hack is to replace the Vandermonde matrix V by an n×n matrix W, which consists of the first n/2 rows of the Vandermonde matrix each repeated twice (assume for simplicity that n is a multiple of 4).  As before, we let f(x)=Wx.  Then if we set A={1,…,n/2} and B={n/2+1,…,n}, we can achieve

EI(A→B) = EI(B→A) = (n/4) log_2 p,

Φ(A,B) = EI(A→B) + EI(B→A) = (n/2) log_2 p,

and hence

Φ(A,B) / min{|A|,|B|} = log_2 p.

In this case, I claim that the above is the unique bipartition that minimizes the normalized integrated information Φ(A,B) / min{|A|,|B|}, up to trivial reorderings of the rows.  To prove this claim: if |A| = |B| = n/2, then clearly we minimize Φ(A,B) by maximizing the number of repeated rows in A and the number of repeated rows in B, exactly as we did above.  Thus, assume |A| ≤ |B| (the case |B| ≤ |A| is analogous).  Then, measuring information in units of log_2 p for the rest of this proof, clearly

EI(B→A) ≥ |A|/2,

while

EI(A→B) ≥ min{|A|, |B|/2}.

So if we let |A|=cn and |B|=(1-c)n for some c∈(0,1/2], then

Φ(A,B) ≥ [c/2 + min{c, (1-c)/2}] n,

and

Φ(A,B) / min{|A|,|B|} = Φ(A,B) / |A| ≥ 1/2 + min{1, 1/(2c) – 1/2}.

But the above expression is uniquely minimized when c=1/2.  Hence the normalized integrated information is minimized essentially uniquely by setting A={1,…,n/2} and B={n/2+1,…,n}, and we get

Φ = Φ(A,B) = (n/2) log_2 p,

which is quite a large value (only a factor of 2 less than the trivial upper bound of n log_2 p).
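For readers who want the minimization over c spelled out, here is the case split, working in units of log_2 p. The name g for the lower-bound expression is introduced here only for convenience.

```latex
\[
g(c) \;=\; \frac12 + \min\Bigl\{1,\ \frac{1}{2c}-\frac12\Bigr\}
\;=\;
\begin{cases}
\dfrac32, & 0 < c \le \dfrac13,\\[6pt]
\dfrac{1}{2c}, & \dfrac13 \le c \le \dfrac12.
\end{cases}
\]
% The switch happens at c = 1/3 because 1/(2c) - 1/2 >= 1 exactly when c <= 1/3.
% On [1/3, 1/2] the function 1/(2c) is strictly decreasing, from 3/2 down to 1,
% so g(c) > 1 for every c < 1/2 and g(1/2) = 1.
```

Since the balanced bipartition with the repeated rows kept together achieves a normalized value of exactly 1 in these units, no unbalanced bipartition can tie it.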

Now, why did I call the switch from V to W an “ironic little hack”?  Because, in order to ensure a large value of Φ, I decreased—by a factor of 2, in fact—the amount of “information integration” that was intuitively happening in my system!  I did that in order to decrease the normalized value Φ(A,B) / min{|A|,|B|} for the particular bipartition (A,B) that I cared about, thereby ensuring that that (A,B) would be chosen over all the other bipartitions, thereby increasing the final, unnormalized value Φ(A,B) that Tononi’s prescription tells me to return.  I hope I’m not alone in fearing that this illustrates a disturbing non-robustness in the definition of Φ.
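The effect of the hack can be checked by brute force on the smallest allowed instance, n = 4. As a sketch under assumptions: the code uses the fact that for a linear map over F_p, EI(A→B) is log_2 p times the rank of the submatrix with rows in B and columns in A; it indexes variables from 0; and the names and the prime p = 2^31 − 1 are illustrative choices.

```python
import itertools
from fractions import Fraction


def rank_mod_p(rows, p):
    """Rank over F_p of a matrix given as a list of rows (Gauss-Jordan)."""
    m = [[v % p for v in row] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        pivot = next((r for r in range(rank, len(m)) if m[r][col]), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        inverse = pow(m[rank][col], -1, p)  # modular inverse, Python 3.8+
        m[rank] = [(v * inverse) % p for v in m[rank]]
        for r in range(len(m)):
            if r != rank and m[r][col]:
                factor = m[r][col]
                m[r] = [(a - factor * b) % p for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


def doubled_vandermonde(n, p):
    """W: the first n/2 Vandermonde rows (nodes 1..n/2), each repeated twice."""
    rows = []
    for i in range(1, n // 2 + 1):
        row = [pow(i, j, p) for j in range(n)]
        rows.extend([row, list(row)])
    return rows


def minimizers(M, p):
    """All bipartitions minimizing Phi(A,B)/min(|A|,|B|), as (Phi, A, B)
    with Phi measured in units of log2(p)."""
    n = len(M)
    scored = []
    for size in range(1, n):
        for rest in itertools.combinations(range(1, n), size - 1):
            A = (0,) + rest
            B = tuple(i for i in range(n) if i not in A)
            phi_ab = (rank_mod_p([[M[r][c] for c in A] for r in B], p)
                      + rank_mod_p([[M[r][c] for c in B] for r in A], p))
            scored.append((Fraction(phi_ab, min(len(A), len(B))), phi_ab, A, B))
    best = min(entry[0] for entry in scored)
    return [(phi_ab, A, B) for norm, phi_ab, A, B in scored if norm == best]


if __name__ == "__main__":
    p = 2**31 - 1
    print(minimizers(doubled_vandermonde(4, p), p))
```

On this instance the only minimizer is the split into the first two and last two variables, with Φ = 2 = n/2 in units of log_2 p. A check at n = 4 is of course only a sanity test, not a substitute for the proof.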

But let’s leave that issue aside; maybe it can be ameliorated by fiddling with the definition.  The broader point is this: I’ve shown that my system (the system that simply applies the matrix W to an input vector x) has an enormous amount of integrated information Φ.  Indeed, this system’s Φ equals half of its entire information content.  So for example, if n were 10^14 or so, something that wouldn’t be hard to arrange with existing computers, then this system’s Φ would exceed any plausible upper bound on the integrated information content of the human brain.

And yet this Vandermonde system doesn’t even come close to doing anything that we’d want to call intelligent, let alone conscious!  When you apply the Vandermonde matrix to a vector, all you’re really doing is mapping the list of coefficients of a degree-(n−1) polynomial over F_p, to the values of the polynomial on the n points 1,…,n.  Now, evaluating a polynomial on a set of points turns out to be an excellent way to achieve “integrated information,” with every subset of outputs as correlated with every subset of inputs as it could possibly be.  In fact, that’s precisely why polynomials are used so heavily in error-correcting codes, such as the Reed-Solomon code, employed (among many other places) in CDs and DVDs.  But that doesn’t imply that every time you start up your DVD player you’re lighting the fire of consciousness.  It doesn’t even hint at such a thing.  All it tells us is that you can have integrated information without consciousness (or even intelligence), just like you can have computation without consciousness, and unpredictability without consciousness, and electricity without consciousness.
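The equivalence between the matrix-vector product and polynomial evaluation is easy to see in code. This is a small sketch with hypothetical function names. It assumes the (i,j) entry is i^(j−1) with rows indexed from 1, so that row i evaluates the polynomial at the point i.

```python
def vandermonde_apply(coeffs, p):
    """Compute V @ coeffs over F_p, where row i (for i = 1..n) of V is
    (i^0, i^1, ..., i^(n-1))."""
    n = len(coeffs)
    return [
        sum(pow(i, j, p) * coeffs[j] for j in range(n)) % p
        for i in range(1, n + 1)
    ]


def poly_eval(coeffs, x, p):
    """Evaluate coeffs[0] + coeffs[1]*x + ... at x over F_p (Horner's rule)."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % p
    return acc


if __name__ == "__main__":
    p = 101
    coeffs = [3, 0, 2]  # the polynomial 3 + 2x^2
    print(vandermonde_apply(coeffs, p))
    print([poly_eval(coeffs, x, p) for x in range(1, len(coeffs) + 1)])
```

Both lines print the same list, since each output coordinate of Vx is just the polynomial with coefficient vector x evaluated at one point.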

It might be objected that, in defining my “Vandermonde system,” I was too abstract and mathematical.  I said that the system maps the input vector x to the output vector Wx, but I didn’t say anything about how it did so.  To perform a computation—even a computation as simple as a matrix-vector multiply—won’t we need a physical network of wires, logic gates, and so forth?  And in any realistic such network, won’t each logic gate be directly connected to at most a few other gates, rather than to billions of them?  And if we define the integrated information Φ, not directly in terms of the inputs and outputs of the function f(x)=Wx, but in terms of all the actual logic gates involved in computing f, isn’t it possible or even likely that Φ will go back down?

This is a good objection, but I don’t think it can rescue IIT.  For we can achieve the same qualitative effect that I illustrated with the Vandermonde matrix—the same “global information integration,” in which every large set of outputs depends heavily on every large set of inputs—even using much “sparser” computations, ones where each individual output depends on only a few of the inputs.  This is precisely the idea behind low-density parity check (LDPC) codes, which have had a major impact on coding theory over the past two decades.  Of course, one would need to muck around a bit to construct a physical system based on LDPC codes whose integrated information Φ was provably large, and for which there were no wildly-unbalanced bipartitions that achieved lower Φ(A,B)/min{|A|,|B|} values than the balanced bipartitions one cared about.  But I feel safe in asserting that this could be done, similarly to how I did it with the Vandermonde matrix.

More generally, we can achieve pretty good information integration by hooking together logic gates according to any bipartite expander graph: that is, any graph with n vertices on each side, such that every k vertices on the left side are connected to at least min{(1+ε)k,n} vertices on the right side, for some constant ε>0.  And it’s well-known how to create expander graphs whose degree (i.e., the number of edges incident to each vertex, or the number of wires coming out of each logic gate) is a constant, such as 3.  One can do so either by plunking down edges at random, or (less trivially) by explicit constructions from algebra or combinatorics.  And as indicated in the title of this post, I feel 100% confident in saying that the so-constructed expander graphs are not conscious!  The brain might be an expander, but not every expander is a brain.
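For intuition, the expansion condition can be tested exhaustively on very small graphs. The sketch below is illustrative only: the function names and the adjacency representation are assumptions, the check is exponential in n, and a random constant-degree graph of this size is not guaranteed to pass for any particular ε.

```python
import itertools
import random


def is_bipartite_expander(neighbors, n_right, eps):
    """neighbors[u] is the set of right-side vertices adjacent to left vertex u.
    Returns True if every set of k left vertices reaches at least
    min((1 + eps) * k, n_right) right vertices."""
    n_left = len(neighbors)
    for k in range(1, n_left + 1):
        needed = min((1 + eps) * k, n_right)
        for subset in itertools.combinations(range(n_left), k):
            reached = set().union(*(neighbors[u] for u in subset))
            if len(reached) < needed:
                return False
    return True


def random_bipartite_graph(n, degree, rng):
    """Each left vertex picks `degree` distinct right vertices at random."""
    return [set(rng.sample(range(n), degree)) for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)
    graph = random_bipartite_graph(10, 3, rng)
    print(is_bipartite_expander(graph, 10, 0.2))
```

Two sanity cases show what the definition rules in and out: the complete bipartite graph passes for any ε, while a perfect matching fails for every ε > 0, because a single left vertex reaches only one right vertex.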

Before winding down this post, I can’t resist telling you that the concept of integrated information (though it wasn’t called that) played an interesting role in computational complexity in the 1970s.  As I understand the history, Leslie Valiant conjectured that Boolean functions f:{0,1}^n→{0,1}^n with a high degree of “information integration” (such as discrete analogues of the Fourier transform) might be good candidates for proving circuit lower bounds, which in turn might be baby steps toward P≠NP.  More strongly, Valiant conjectured that the property of information integration, all by itself, implied that such functions had to be at least somewhat computationally complex (i.e., that they couldn’t be computed by circuits of size O(n), or even required circuits of size Ω(n log n)).  Alas, that hope was refuted by Valiant’s later discovery of linear-size superconcentrators.  Just as information integration doesn’t suffice for intelligence or consciousness, so Valiant learned that information integration doesn’t suffice for circuit lower bounds either.

As humans, we seem to have the intuition that global integration of information is such a powerful property that no “simple” or “mundane” computational process could possibly achieve it.  But our intuition is wrong.  If it were right, then we wouldn’t have linear-size superconcentrators or LDPC codes.

I should mention that I had the privilege of briefly speaking with Giulio Tononi (as well as his collaborator, Christof Koch) this winter at an FQXi conference in Puerto Rico.  At that time, I challenged Tononi with a much cruder, handwavier version of some of the same points that I made above.  Tononi’s response, as best as I can reconstruct it, was that it’s wrong to approach IIT like a mathematician; instead one needs to start “from the inside,” with the phenomenology of consciousness, and only then try to build general theories that can be tested against counterexamples.  This response perplexed me: of course you can start from phenomenology, or from anything else you like, when constructing your theory of consciousness.  However, once your theory has been constructed, surely it’s then fair game for others to try to refute it with counterexamples?  And surely the theory should be judged, like anything else in science or philosophy, by how well it withstands such attacks?

But let me end on a positive note.  In my opinion, the fact that Integrated Information Theory is wrong—demonstrably wrong, for reasons that go to its core—puts it in something like the top 2% of all mathematical theories of consciousness ever proposed.  Almost all competing theories of consciousness, it seems to me, have been so vague, fluffy, and malleable that they can only aspire to wrongness.

[Endnote: See also this related post, by the philosopher Eric Schwitzgebel: Why Tononi Should Think That the United States Is Conscious.  While the discussion is much more informal, and the proposed counterexample more debatable, the basic objection to IIT is the same.]


Update (5/22): Here are a few clarifications of this post that might be helpful.

(1) The stuff about zombies and the Hard Problem was simply meant as motivation and background for what I called the “Pretty-Hard Problem of Consciousness”—the problem that I take IIT to be addressing.  You can disagree with the zombie stuff without it having any effect on my arguments about IIT.

(2) I wasn’t arguing in this post that dualism is true, or that consciousness is irreducibly mysterious, or that there could never be any convincing theory that told us how much consciousness was present in a physical system.  All I was arguing was that, at any rate, IIT is not such a theory.

(3) Yes, it’s true that my demonstration of IIT’s falsehood assumes—as an axiom, if you like—that while we might not know exactly what we mean by “consciousness,” at any rate we’re talking about something that humans have to a greater extent than DVD players.  If you reject that axiom, then I’d simply want to define a new word for a certain quality that non-anesthetized humans seem to have and that DVD players seem not to, and clarify that that other quality is the one I’m interested in.

(4) For my counterexample, the reason I chose the Vandermonde matrix is not merely that it’s invertible, but that all of its submatrices are full-rank.  This is the property that’s relevant for producing a large value of the integrated information Φ; by contrast, note that the identity matrix is invertible, but produces a system with Φ=0.  (As another note, if we work over a large enough field, then a random matrix will have this same property with high probability—but I wanted an explicit example, and while the Vandermonde is far from the only one, it’s one of the simplest.)

(5) The n×n Vandermonde matrix only does what I want if we work over (say) a prime field F_p with p≫n elements.  Thus, it’s natural to wonder whether similar examples exist where the basic system variables are bits, rather than elements of F_p.  The answer is yes. One way to get such examples is using the low-density parity check codes that I mention in the post.  Another common way to get Boolean examples, and which is also used in practice in error-correcting codes, is to start with the Vandermonde matrix (a.k.a. the Reed-Solomon code), and then combine it with an additional component that encodes the elements of F_p as strings of bits in some way.  Of course, you then need to check that doing this doesn’t harm the properties of the original Vandermonde matrix that you cared about (e.g., the “information integration”) too much, which causes some additional complication.

(6) Finally, it might be objected that my counterexamples ignored the issue of dynamics and “feedback loops”: they all consisted of unidirectional processes, which map inputs to outputs and then halt.  However, this can be fixed by the simple expedient of iterating the process over and over!  I.e., first map x to Wx, then map Wx to W^2x, and so on.  The integrated information should then be the same as in the unidirectional case.
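As a rough illustration of the contrast drawn in clarification (4), the following sketch builds a small Vandermonde matrix over F_p and compares it with the identity matrix. It does not compute Φ. It only computes a crude proxy that I'm assuming for illustration: the rank of the two "cross" blocks linking one half of the inputs to the other half of the outputs, for a single balanced bipartition. The parameters (p=101, n=6, evaluation points 1..n) and the helper names are hypothetical choices, not anything taken from IIT.

```python
def rank_mod_p(rows, p):
    """Rank of a matrix over F_p, by Gaussian elimination (p must be prime)."""
    m = [[v % p for v in row] for row in rows]
    rank = 0
    ncols = len(m[0]) if m else 0
    for col in range(ncols):
        pivot = next((r for r in range(rank, len(m)) if m[r][col]), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        inv = pow(m[rank][col], -1, p)  # modular inverse (Python 3.8+)
        m[rank] = [(v * inv) % p for v in m[rank]]
        for r in range(len(m)):
            if r != rank and m[r][col]:
                f = m[r][col]
                m[r] = [(a - f * b) % p for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank

def block(mat, rows, cols):
    """Submatrix with the given row and column indices."""
    return [[mat[i][j] for j in cols] for i in rows]

p, n = 101, 6
vander = [[pow(x, j, p) for j in range(n)] for x in range(1, n + 1)]
ident = [[int(i == j) for j in range(n)] for i in range(n)]
first, second = range(n // 2), range(n // 2, n)

for name, mat in (("Vandermonde", vander), ("identity", ident)):
    cross = (rank_mod_p(block(mat, first, second), p),
             rank_mod_p(block(mat, second, first), p))
    print(name, "full rank:", rank_mod_p(mat, p), "cross-block ranks:", cross)
```

Both matrices are invertible (rank 6), but the identity's cross blocks have rank 0, while for this particular bipartition the Vandermonde's cross blocks have full rank 3. That checks only two specific blocks, not every submatrix.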


Update (5/24): See a very interesting comment by David Chalmers.

Waiting for BQP Fever

Tuesday, April 1st, 2014

Update (April 5): By now, three or four people have written in asking for my reaction to the preprint “Computational solution to quantum foundational problems” by Arkady Bolotin.  (See here for the inevitable Slashdot discussion, entitled “P vs. NP Problem Linked to the Quantum Nature of the Universe.”)  It gives me no pleasure to respond to this sort of thing—it would be far better to let papers this gobsmackingly uninformed about the relevant issues fade away in quiet obscurity—but since that no longer seems to be possible in the age of social media, my brief response is here.


(note: sorry, no April Fools post, just a post that happens to have gone up on April Fools)

This weekend, Dana and I celebrated our third anniversary by going out to your typical sappy romantic movie: Particle Fever, a documentary about the Large Hadron Collider.  As it turns out, the movie was spectacularly good; anyone who reads this blog should go see it.  Or, to offer even higher praise:

If watching Particle Fever doesn’t cause you to feel in your bones the value of fundamental science—the thrill of discovery, unmotivated by any application—then you are not truly human.  You are a barnyard animal who happens to walk on its hind legs.

Indeed, I regard Particle Fever as one of the finest advertisements for science itself ever created.  It’s effective precisely because it doesn’t try to tell you why science is important (except for one scene, where an economist asks a physicist after a public talk about the “return on investment” of the LHC, and is given the standard correct answer, about “what was the return on investment of radio waves when they were first discovered?”).  Instead, the movie simply shows you the lives of particle physicists, of people who take for granted the urgency of knowing the truth about the basic constituents of reality.  And in showing you the scientists’ quest, it makes you feel as they feel.  Incidentally, the movie also shows footage of Congressmen ridiculing the uselessness of the Superconducting Supercollider, during the debates that led to the SSC’s cancellation.  So, gently, implicitly, you’re invited to choose: whose side are you on?

I do have a few, not quite criticisms of the movie, but points that any viewer should bear in mind while watching it.

First, it’s important not to come away with the impression that Particle Fever shows “what science is usually like.”  Sure, there are plenty of scenes that any scientist would find familiar: sleep-deprived postdocs; boisterous theorists correcting each other’s statements over Chinese food; a harried lab manager walking to the office oblivious to traffic.  On the other hand, the decades-long quest to find the Higgs boson, the agonizing drought of new data before the one big money shot, the need for an entire field to coalesce around a single machine, the whole careers hitched to specific speculative scenarios that this one machine could favor or disfavor—all of that is a profoundly abnormal situation in the history of science.  Particle physics didn’t used to be that way, and other parts of science are not that way today.  Of course, the fact that particle physics became that way makes it unusually suited for a suspenseful movie—a fact that the creators of Particle Fever understood perfectly and exploited to the hilt.

Second, the movie frames the importance of the Higgs search as follows: if the Higgs boson turned out to be relatively light, like 115 GeV, then that would favor supersymmetry, and hence an “elegant, orderly universe.”  If, on the other hand, the Higgs turned out to be relatively heavy, like 140 GeV, then that would favor anthropic multiverse scenarios (and hence a “messy, random universe”).  So the fact that the Higgs ended up being 125 GeV means the universe is coyly refusing to tell us whether it’s orderly or random, and more research is needed.

In my view, it’s entirely appropriate for a movie like this one to relate its subject matter to big, metaphysical questions, to the kinds of questions anyone can get curious about (in contrast to, say, “what is the mechanism of electroweak symmetry breaking?”) and that the scientists themselves talk about anyway.  But caution is needed here.  My lay understanding, which might be wrong, is as follows: while it’s true that a lighter Higgs would tend to favor supersymmetric models, the only way to argue that a heavier Higgs would “favor the multiverse,” is if you believe that a multiverse is automatically favored by a lack of better explanations.  More broadly, I wish the film had made clearer that the explanation for (some) apparent “fine-tunings” in the Standard Model might be neither supersymmetry, nor the multiverse, nor “it’s just an inexplicable accident,” but simply some other explanation that no one has thought of yet, but that would emerge from a better understanding of quantum field theory.  As one example, on reading up on the subject after watching the film, I was surprised to learn that a very conservative-sounding idea—that of “asymptotically safe gravity”—was used in 2009 to predict the Higgs mass right on the nose, at 126.3 ± 2.2 GeV.  Of course, it’s possible that this was just a lucky guess (there were, after all, lots of Higgs mass predictions).  But as an outsider, I’d love to understand why possibilities like this don’t seem to get discussed more (there might, of course, be perfectly good reasons that I don’t know).

Third, for understandable dramatic reasons, the movie focuses almost entirely on the “younger generation,” from postdocs working on ATLAS and CMS detectors, to theorists like Nima Arkani-Hamed who are excited about the LHC because of its ability to test scenarios like supersymmetry.  From the movie’s perspective, the creation of the Standard Model itself, in the 60s and 70s, might as well be ancient history.  Indeed, when Peter Higgs finally appears near the end of the film, it’s as if Isaac Newton has walked onstage.  At several points, I found myself wishing that some of the original architects of the Standard Model, like Steven Weinberg or Sheldon Glashow, had been interviewed to provide their perspectives.  After all, their model is really the one that’s been vindicated at the LHC, not (so far) any of the newer ideas like supersymmetry or large extra dimensions.

OK, but let me come to the main point of this post.  I confess that my overwhelming emotion on watching Particle Fever was one of regret—regret that my own field, quantum computing, has never managed to make the case for itself the way particle physics and cosmology have, in terms of the human urge to explore the unknown.

See, from my perspective, there’s a lot to envy about the high-energy physicists.  Most importantly, they don’t perceive any need to justify what they do in terms of practical applications.  Sure, they happily point to “spinoffs,” like the fact that the Web was invented at CERN.  But any time they try to justify what they do, the unstated message is that if you don’t see the inherent value of understanding the universe, then the problem lies with you.

Now, no marketing consultant would ever in a trillion years endorse such an out-of-touch, elitist sales pitch.  But the remarkable fact is that the message has more-or-less worked.  While the cancellation of the SSC was a setback, the high-energy physicists did succeed in persuading the world to pony up the $11 billion needed to build the LHC, and to gain the information that the mass of the Higgs boson is about 125 GeV.

Now contrast that with quantum computing.  To hear the media tell it, a quantum computer would be a powerful new gizmo, sort of like existing computers except faster.  (Why would it be faster?  Something to do with trying both 0 and 1 at the same time.)  The reasons to build quantum computers are things that could make any buzzword-spouting dullard nod in recognition: cracking uncrackable encryption, finding bugs in aviation software, sifting through massive data sets, maybe even curing cancer, predicting the weather, or finding aliens.  And all of this could be yours in a few short years—or some say it’s even commercially available today.  So, if you check back in a few years and it’s still not on store shelves, probably it went the way of flying cars or moving sidewalks: another technological marvel that just failed to materialize for some reason.

Foolishly, shortsightedly, many academics in quantum computing have played along with this stunted vision of their field—because saying this sort of thing is the easiest way to get funding, because everyone else says the same stuff, and because after you’ve repeated something on enough grant applications you start to believe it yourself.  All in all, then, it’s just easier to go along with the “gizmo vision” of quantum computing than to ask pointed questions like:

What happens when it turns out that some of the most-hyped applications of quantum computers (e.g., optimization, machine learning, and Big Data) were based on wildly inflated hopes—that there simply isn’t much quantum speedup to be had for typical problems of that kind, that yes, quantum algorithms exist, but they aren’t much faster than the best classical randomized algorithms?  What happens when it turns out that the real applications of quantum computing—like breaking RSA and simulating quantum systems—are nice, but not important enough by themselves to justify the cost?  (E.g., when the imminent risk of a quantum computer simply causes people to switch from RSA to other cryptographic codes?  Or when the large polynomial overheads of quantum simulation algorithms limit their usefulness?)  Finally, what happens when it turns out that the promises of useful quantum computers in 5-10 years were wildly unrealistic?

I’ll tell you: when this happens, the spigots of funding that once flowed freely will dry up, and the techno-journalists and pointy-haired bosses who once sang our praises will turn to the next craze.  And they’re unlikely to be impressed when we protest, “no, look, the reasons we told you before for why you should support quantum computing were never the real reasons!  and the real reasons remain as valid as ever!”

In my view, we as a community have failed to make the honest case for quantum computing—the case based on basic science—because we’ve underestimated the public.  We’ve falsely believed that people would never support us if we told them the truth: that while the potential applications are wonderful cherries on the sundae, they’re not and have never been the main reason to build a quantum computer.  The main reason is that we want to make absolutely manifest what quantum mechanics says about the nature of reality.  We want to lift the enormity of Hilbert space out of the textbooks, and rub its full, linear, unmodified truth in the face of anyone who denies it.  Or if it isn’t the truth, then we want to discover what is the truth.

Many people would say it’s impossible to make the latter pitch, that funders and laypeople would never understand it or buy it.  But there’s an $11-billion, 17-mile ring under Geneva that speaks against their cynicism.

Anyway, let me end this “movie review” with an anecdote.  The other day a respected colleague of mine—someone who doesn’t normally follow such matters—asked me what I thought about D-Wave.  After I’d given my usual spiel, he smiled and said:

“See Scott, but you could imagine scientists of the 1400s saying the same things about Columbus!  He had no plan that could survive academic scrutiny.  He raised money under the false belief that he could reach India by sailing due west.  And he didn’t understand what he’d found even after he’d found it.  Yet for all that, it was Columbus, and not some academic critic on the sidelines, who discovered the new world.”

With this one analogy, my colleague had eloquently summarized the case for D-Wave, a case often leveled against me much more verbosely.  But I had an answer.

“I accept your analogy!” I replied.  “But to me, Columbus and the other conquerors of the Americas weren’t heroes to be admired or emulated.  Motivated by gold and spices rather than knowledge, they spread disease, killed and enslaved millions in one of history’s greatest holocausts, and burned the priceless records of the Maya and Inca civilizations so that the world would never even understand what was lost.  I submit that, had it been undertaken by curious and careful scientists—or at least people with a scientific mindset—rather than by swashbucklers funded by greedy kings, the European exploration and colonization of the Americas could have been incalculably less tragic.”

The trouble is, when I say things like that, people just laugh at me knowingly.  There he goes again, the pie-in-the-sky complexity theorist, who has no idea what it takes to get anything done in the real world.  What an amusingly contrary perspective he has.

And that, in the end, is why I think Particle Fever is such an important movie.  Through the stories of the people who built the LHC, you’ll see how it really is possible to reach a new continent without the promise of gold or the allure of lies.

This review of Max Tegmark’s book also occurs infinitely often in the decimal expansion of π

Saturday, March 22nd, 2014

Two months ago, commenter rrtucci asked me what I thought about Max Tegmark and his “Mathematical Universe Hypothesis”: the idea, which Tegmark defends in his recent book Our Mathematical Universe, that physical and mathematical existence are the same thing, and that what we call “the physical world” is simply one more mathematical structure, alongside the dodecahedron and so forth.  I replied as follows:

…I find Max a fascinating person, a wonderful conference organizer, someone who’s always been extremely nice to me personally, and an absolute master at finding common ground with his intellectual opponents; I’m trying to learn from him, and hope someday to become 10^-122 as good.  I can also say that, like various other commentators (e.g., Peter Woit), I personally find the “Mathematical Universe Hypothesis” to be devoid of content.

After Peter Woit found that comment and highlighted it on his own blog, my comments section was graced by none other than Tegmark himself, who wrote:

Thanks Scott for your all to [sic] kind words!  I very much look forward to hearing what you think about what I actually say in the book once you’ve had a chance to read it!  I’m happy to give you a hardcopy (which can double as door-stop) – just let me know.

With this reply, Max illustrated perfectly why I’ve been trying to learn from him, and how far I fall short.  Where I would’ve said “yo dumbass, why don’t you read my book before spouting off?,” Tegmark gracefully, diplomatically shamed me into reading his book.

So, now that I’ve done so, what do I think?  Briefly, I think it’s a superb piece of popular science writing—stuffed to the gills with thought-provoking arguments, entertaining anecdotes, and fascinating facts.  I think everyone interested in math, science, or philosophy should buy the book and read it.  And I still think the MUH is basically devoid of content, as it stands.

Let me start with what makes the book so good.  First and foremost, the personal touch.  Tegmark deftly conveys the excitement of being involved in the analysis of the cosmic microwave background fluctuations, of actually getting detailed numerical data about the origin of the universe.  (The book came out just a few months before last week’s bombshell announcement of B-modes in the CMB data; presumably the next edition will have an update about that.)  And Tegmark doesn’t just give you arguments for the Many-Worlds Interpretation of quantum mechanics; he tells you how he came to believe it.  He writes of being a beginning PhD student at Berkeley, living at International House (and dating an Australian exchange student who he met his first day at IHouse), who became obsessed with solving the quantum measurement problem, and who therefore headed to the physics library, where he was awestruck by reading the original Many-Worlds articles of Hugh Everett and Bryce DeWitt.  As it happens, every single part of the last sentence also describes me (!!!), except that the Australian exchange student who I met my first day at IHouse lost interest in me when she decided that I was too nerdy.  And also, I eventually decided that the MWI left me pretty much as confused about the measurement problem as before, whereas Tegmark remains a wholehearted Many-Worlder.

The other thing I loved about Tegmark’s book was its almost comical concreteness.  He doesn’t just metaphorically write about “knobs” for adjusting the constants of physics: he shows you a picture of a box with the knobs on it.  He also shows a “letter” that lists not only his street address, zip code, town, state, and country, but also his planet, Hubble volume, post-inflationary bubble, quantum branch, and mathematical structure.  Probably my favorite figure was the one labeled “What Dark Matter Looks Like / What Dark Energy Looks Like,” which showed two blank boxes.

Sometimes Tegmark seems to subtly subvert the conventions of popular-science writing.  For example, in the first chapter, he includes a table that categorizes each of the book’s remaining chapters as “Mainstream,” “Controversial,” or “Extremely Controversial.”  And whenever you’re reading the text and cringing at a crucial factual point that was left out, chances are good you’ll find a footnote at the bottom of the page explaining that point.  I hope both of these conventions become de rigueur for all future pop-science books, but I’m not counting on it.

The book has what Tegmark himself describes as a “Dr. Jekyll / Mr. Hyde” structure, with the first (“Dr. Jekyll”) half of the book relaying more-or-less accepted discoveries in physics and cosmology, and the second (“Mr. Hyde”) half focusing on Tegmark’s own Mathematical Universe Hypothesis (MUH).  Let’s accept that both halves are enjoyable reads, and that the first half contains lots of wonderful science.  Is there anything worth saying about the truth or falsehood of the MUH?

In my view, the MUH gestures toward two points that are both correct and important—neither of them new, but both well worth repeating in a pop-science book.  The first is that the laws of physics aren’t “suggestions,” which the particles can obey when they feel like it but ignore when Uri Geller picks up a spoon.  In that respect, they’re completely unlike human laws, and the fact that we use the same word for both is unfortunate.  Nor are the laws merely observed correlations, as in “scientists find link between yogurt and weight loss.”  The links of fundamental physics are ironclad: the world “obeys” them in much the same sense that a computer obeys its code, or the positive integers obey the rules of arithmetic.  Of course we don’t yet know the complete program describing the state evolution of the universe, but everything learned since Galileo leads one to expect that such a program exists.  (According to quantum mechanics, the program describing our observed reality is a probabilistic one, but for me, that fact by itself does nothing to change its lawlike character.  After all, if you know the initial state, Hamiltonian, and measurement basis, then quantum mechanics gives you a perfect algorithm to calculate the probabilities.)
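For readers who want to see the "perfect algorithm" from the parenthetical above spelled out, here is a minimal sketch for a tiny system: evolve the initial state under exp(-iHt) (with ħ=1), then apply the Born rule in the chosen measurement basis. The function names and the truncated Taylor series are my own illustrative choices; a serious computation would use a linear-algebra library instead.

```python
import math

def matmul(a, b):
    """Product of two small matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def expm(m, terms=40):
    """Matrix exponential by truncated Taylor series (adequate for tiny matrices)."""
    n = len(m)
    result = [[complex(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[v / k for v in row] for row in matmul(term, m)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result

def outcome_probabilities(hamiltonian, state, basis, t):
    """Born rule: p_i = |<b_i| exp(-iHt) |state>|^2, in units where hbar = 1."""
    u = expm([[-1j * t * h for h in row] for row in hamiltonian])
    evolved = [sum(u[i][j] * state[j] for j in range(len(state)))
               for i in range(len(state))]
    return [abs(sum(b[i].conjugate() * evolved[i] for i in range(len(b)))) ** 2
            for b in basis]

# One qubit: H is the Pauli X matrix, the initial state is |0>,
# and we measure in the {|0>, |1>} basis after time t = pi/4.
pauli_x = [[0, 1], [1, 0]]
probs = outcome_probabilities(pauli_x, [1, 0], [[1, 0], [0, 1]], math.pi / 4)
print(probs)
```

For this choice the two outcomes come out equally likely (cos²(π/4) and sin²(π/4), up to floating-point error). The individual outcome is random, but the map from (state, Hamiltonian, basis) to the probabilities is completely deterministic.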

The second true and important nugget in the MUH is that the laws are “mathematical.”  By itself, I’d say that’s a vacuous statement, since anything that can be described at all can be described mathematically.  (As a degenerate case, a “mathematical description of reality” could simply be a gargantuan string of bits, listing everything that will ever happen at every point in spacetime.)  The nontrivial part is that, at least if we ignore boundary conditions and the details of our local environment (which maybe we shouldn’t!), the laws of nature are expressible as simple, elegant math—and moreover, the same structures (complex numbers, group representations, Riemannian manifolds…) that mathematicians find important for internal reasons, again and again turn out to play a crucial role in physics.  It didn’t have to be that way, but it is.

Putting the two points together, it seems fair to say that the physical world is “isomorphic to” a mathematical structure—and moreover, a structure whose time evolution obeys simple, elegant laws.   All of this I find unobjectionable: if you believe it, it doesn’t make you a Tegmarkian; it makes you ready for freshman science class.

But Tegmark goes further.  He doesn’t say that the universe is “isomorphic” to a mathematical structure; he says that it is that structure, that its physical and mathematical existence are the same thing.  Furthermore, he says that every mathematical structure “exists” in the same sense that “ours” does; we simply find ourselves in one of the structures capable of intelligent life (which shouldn’t surprise us).  Thus, for Tegmark, the answer to Stephen Hawking’s famous question—“What is it that breathes fire into the equations and gives them a universe to describe?”—is that every consistent set of equations has fire breathed into it.  Or rather, every mathematical structure of at most countable cardinality whose relations are definable by some computer program.  (Tegmark allows that structures that aren’t computably definable, like the set of real numbers, might not have fire breathed into them.)

Anyway, the ensemble of all (computable?) mathematical structures, constituting the totality of existence, is what Tegmark calls the “Level IV multiverse.”  In his nomenclature, our universe consists of anything from which we can receive signals; anything that exists but that we can’t receive signals from is part of a “multiverse” rather than our universe.  The “Level I multiverse” is just the entirety of our spacetime, including faraway regions from which we can never receive a signal due to the dark energy.  The Level II multiverse consists of the infinitely many other “bubbles” (i.e., “local Big Bangs”), with different values of the constants of physics, that would, in eternal inflation cosmologies, have generically formed out of the same inflating substance that gave rise to our Big Bang.  The Level III multiverse is Everett’s many worlds.  Thus, for Tegmark, the Level IV multiverse is a sort of natural culmination of earlier multiverse theorizing.  (Some people might call it a reductio ad absurdum, but Tegmark is nothing if not a bullet-swallower.)

Now, why should you believe in any of these multiverses?  Or better: what does it buy you to believe in them?

As Tegmark correctly points out, none of the multiverses are “theories,” but they might be implications of theories that we have other good reasons to accept.  In particular, it seems crazy to believe that the Big Bang created space only up to the furthest point from which light can reach the earth, and no further.  So, do you believe that space extends further than our cosmological horizon?  Then boom! you believe in the Level I multiverse, according to Tegmark’s definition of it.

Likewise, do you believe there was a period of inflation in the first ~10^-32 seconds after the Big Bang?  Inflation has made several confirmed predictions (e.g., about the “fractal” nature of the CMB perturbations), and if last week’s announcement of B-modes in the CMB is independently verified, that will pretty much clinch the case for inflation.  But Alan Guth, Andrei Linde, and others have argued that, if you accept inflation, then it seems hard to prevent patches of the inflating substance from continuing to inflate forever, and thereby giving rise to infinitely many “other” Big Bangs.  Furthermore, if you accept string theory, then the six extra dimensions should generically curl up differently in each of those Big Bangs, giving rise to different apparent values of the constants of physics.  So then boom! with those assumptions, you’re sold on the Level II multiverse as well.  Finally, of course, there are people (like David Deutsch, Eliezer Yudkowsky, and Tegmark himself) who think that quantum mechanics forces you to accept the Level III multiverse of Everett.  Better yet, Tegmark claims that these multiverses are “falsifiable.”  For example, if inflation turns out to be wrong, then the Level II multiverse is dead, while if quantum mechanics is wrong, then the Level III one is dead.

Admittedly, the Level IV multiverse is a tougher sell, even by the standards of the last two paragraphs.  If you believe physical existence to be the same thing as mathematical existence, what puzzles does that help to explain?  What novel predictions does it make?  Forging fearlessly ahead, Tegmark argues that the MUH helps to “explain” why our universe has so many mathematical regularities in the first place.  And it “predicts” that more mathematical regularities will be discovered, and that everything discovered by science will be mathematically describable.  But what about the existence of other mathematical universes?  If, Tegmark says (on page 354), our qualitative laws of physics turn out to allow a narrow range of numerical constants that permit life, whereas other possible qualitative laws have no range of numerical constants that permit life, then that would be evidence for the existence of a mathematical multiverse.  For if our qualitative laws were the only ones into which fire had been breathed, then why would they just so happen to have a narrow but nonempty range of life-permitting constants?

I suppose I’m not alone in finding this totally unpersuasive.  When most scientists say they want “predictions,” they have in mind something meatier than “predict the universe will continue to be describable by mathematics.”  (How would we know if we found something that wasn’t mathematically describable?  Could we even describe such a thing with English words, in order to write papers about it?)  They also have in mind something meatier than “predict that the laws of physics will be compatible with the existence of intelligent observers, but if you changed them a little, then they’d stop being compatible.”  (The first part of that prediction is solid enough, but the second part might depend entirely on what we mean by a “little change” or even an “intelligent observer.”)

What’s worse is that Tegmark’s rules appear to let him have it both ways.  To whatever extent the laws of physics turn out to be “as simple and elegant as anyone could hope for,” Tegmark can say: “you see?  that’s evidence for the mathematical character of our universe, and hence for the MUH!”  But to whatever extent the laws turn out not to be so elegant, to be weird or arbitrary, he can say: “see?  that’s evidence that our laws were selected more-or-less randomly among all possible laws compatible with the existence of intelligent life—just as the MUH predicted!”

Still, maybe the MUH could be sharpened to the point where it did make definite predictions?  As Tegmark acknowledges, the central difficulty with doing so is that no one has any idea what measure to use over the space of mathematical objects (or even computably-describable objects).  This becomes clear if we ask a simple question like: what fraction of the mathematical multiverse consists of worlds that contain nothing but a single three-dimensional cube?

We could try to answer such a question using the universal prior: that is, we could make a list of all self-delimiting computer programs, then count the total weight of programs that generate a single cube and then halt, where each n-bit program gets assigned 1/2^n weight.  Sure, the resulting fraction would be uncomputable, but at least we’d have defined it.  Except wait … which programming language should we use?  (The constant factors could actually matter here!)  Worse yet, what exactly counts as a “cube”?  Does it have to have faces, or are vertices and edges enough?  How should we interpret the string of 1’s and 0’s output by the program, in order to know whether it describes a cube or not?  (Also, how do we decide whether two programs describe the “same” cube?  And if they do, does that mean they’re describing the same universe, or two different universes that happen to be identical?)
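To make the language-dependence concrete, here is a minimal toy sketch in Python.  Everything in it is hypothetical: the two “languages” LANG_A and LANG_B are hand-made lookup tables from halting bit-string programs to labeled outputs, not real programming languages, and the label "cube" simply sidesteps the question of what counts as a cube.  A real universal prior would sum over infinitely many programs and would be uncomputable; this only shows how the 1/2^n weighting works and how the answer shifts with the choice of language.

```python
from fractions import Fraction


def is_prefix_free(programs):
    """Return True if no program is a prefix of another (i.e., self-delimiting)."""
    ordered = sorted(programs)
    # In sorted order, any prefix relationship shows up between neighbors.
    return all(not b.startswith(a) for a, b in zip(ordered, ordered[1:]))


def prior_weight(language, target):
    """Sum 2^-len(p) over all programs p whose (toy) output equals target."""
    assert is_prefix_free(language), "programs must be self-delimiting"
    return sum(
        Fraction(1, 2 ** len(program))
        for program, output in language.items()
        if output == target
    )


# Two hypothetical toy languages: halting program (bit string) -> output label.
LANG_A = {"0": "cube", "10": "sphere", "110": "cube"}
LANG_B = {"0": "sphere", "10": "empty", "1110": "cube"}

print(prior_weight(LANG_A, "cube"))  # 1/2 + 1/8 = 5/8
print(prior_weight(LANG_B, "cube"))  # 1/16
```

The same “world” gets weight 5/8 under one toy language and 1/16 under the other, which is the sense in which the choice of language matters before we have even asked what a cube is.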

These problems are simply more-dramatic versions of the “standard” measure problem in inflationary cosmology, which asks how to make statistical predictions in a multiverse where everything that can happen will happen, and will happen an infinite number of times.  The measure problem is sometimes discussed as if it were a technical issue: something to acknowledge but then set to the side, in the hope that someone will eventually come along with some clever counting rule that solves it.  To my mind, however, the problem goes deeper: it’s a sign that, although we might have started out in physics, we’ve now stumbled into metaphysics.

Some cosmologists would strongly protest that view.  Most of them would agree with me that Tegmark’s Level IV multiverse is metaphysics, but they’d insist that the Level I, Level II, and perhaps Level III multiverses were perfectly within the scope of scientific inquiry: they either exist or don’t exist, and the fact that we get confused about the measure problem is our issue, not nature’s.

My response can be summed up in a question: why not ride this slippery slope all the way to the bottom?  Thinkers like Nick Bostrom and Robin Hanson have pointed out that, in the far future, we might expect that computer-simulated worlds (as in The Matrix) will vastly outnumber the “real” world.  So then, why shouldn’t we predict that we’re much more likely to live in a computer simulation than we are in one of the “original” worlds doing the simulating?  And as a logical next step, why shouldn’t we do physics by trying to calculate a probability measure over different kinds of simulated worlds: for example, those run by benevolent simulators versus evil ones?  (For our world, my own money’s on “evil.”)

But why stop there?  As Tegmark points out, what does it matter if a computer simulation is actually run or not?  Indeed, why shouldn’t you say something like the following: assuming that π is a normal number, your entire life history must be encoded infinitely many times in π’s decimal expansion.  Therefore, you’re infinitely more likely to be one of your infinitely many doppelgängers “living in the digits of π” than you are to be the “real” you, of whom there’s only one!  (Of course, you might also be living in the digits of e or √2, possibilities that also merit reflection.)

At this point, of course, you’re all the way at the bottom of the slope, in Mathematical Universe Land, where Tegmark is eagerly waiting for you.  But you still have no idea how to calculate a measure over mathematical objects: for example, how to say whether you’re more likely to be living in the first 10^(10^120) digits of π, or the first 10^(10^120) digits of e.  And as a consequence, you still don’t know how to use the MUH to constrain your expectations for what you’re going to see next.

Now, notice that these different ways down the slippery slope all have a common structure:

  1. We borrow an idea from science that’s real and important and profound: for example, the possible infinite size and duration of our universe, or inflationary cosmology, or the linearity of quantum mechanics, or the likelihood of π being a normal number, or the possibility of computer-simulated universes.
  2. We then run with that idea until we smack right into a measure problem, and lose the ability to make useful predictions.

Many people want to frame the multiverse debates as “science versus pseudoscience,” or “science versus science fiction,” or (as I did before) “physics versus metaphysics.”  But actually, I don’t think any of those dichotomies get to the nub of the matter.  All of the multiverses I’ve mentioned—certainly the inflationary and Everett multiverses, but even the computer-simuverse and the π-verse—have their origins in legitimate scientific questions and in genuinely-great achievements of science.  However, they then extrapolate those achievements in a direction that hasn’t yet led to anything impressive.  Or at least, not to anything that we couldn’t have gotten without the ontological commitments that led to the multiverse and its measure problem.

What is it, in general, that makes a scientific theory impressive?  I’d say that the answer is simple: connecting elegant math to actual facts of experience.

When Einstein said, the perihelion of Mercury precesses at 43 seconds of arc per century because gravity is the curvature of spacetime—that was impressive.

When Dirac said, you should see a positron because this equation in quantum field theory is a quadratic with both positive and negative solutions (and then the positron was found)—that was impressive.

When Darwin said, there must be equal numbers of males and females in all these different animal species because any other ratio would fail to be an equilibrium—that was impressive.

When people say that multiverse theorizing “isn’t science,” I think what they mean is that it’s failed, so far, to be impressive science in the above sense.  It hasn’t yet produced any satisfying clicks of understanding, much less dramatically-confirmed predictions.  Yes, Steven Weinberg kind-of, sort-of used “multiverse” reasoning to predict—correctly—that the cosmological constant should be nonzero.  But as far as I can tell, he could just as well have dispensed with the “multiverse” part, and said: “I see no physical reason why the cosmological constant should be zero, rather than having some small nonzero value still consistent with the formation of stars and galaxies.”

At this, many multiverse proponents would protest: “look, Einstein, Dirac, and Darwin is setting a pretty high bar!  Those guys were smart but also lucky, and it’s unrealistic to expect that scientists will always be so lucky.  For many aspects of the world, there might not be an elegant theoretical explanation—or any explanation at all better than, ‘well, if it were much different, then we probably wouldn’t be here talking about it.’  So, are you saying we should ignore where the evidence leads us, just because of some a-priori prejudice in favor of mathematical elegance?”

In a sense, yes, I am saying that.  Here’s an analogy: suppose an aspiring filmmaker said, “I want my films to capture the reality of human experience, not some Hollywood myth.  So, in most of my movies nothing much will happen at all.  If something does happen—say, a major character dies—it won’t be after some interesting, character-forming struggle, but meaninglessly, in a way totally unrelated to the rest of the film.  Like maybe they get hit by a bus.  Then some other random stuff will happen, and then the movie will end.”

Such a filmmaker, I’d say, would have a perfect plan for creating boring, arthouse movies that nobody wants to watch.  Dramatic, character-forming struggles against the odds might not be the norm of human experience, but they are the central ingredient of entertaining cinema—so if you want to create an entertaining movie, then you have to postselect on those parts of human experience that do involve dramatic struggles.  In the same way, I claim that elegant mathematical explanations for observed facts are the central ingredient of great science.  Not everything in the universe might have such an explanation, but if one wants to create great science, one has to postselect on the things that do.

(Note that there’s an irony here: the same unsatisfyingness, the same lack of explanatory oomph, that make something a “lousy movie” to those with a scientific mindset, can easily make it a great movie to those without such a mindset.  The hunger for nontrivial mathematical explanations is a hunger one has to acquire!)

Some readers might argue: “but weren’t quantum mechanics, chaos theory, and Gödel’s theorem scientifically important precisely because they said that certain phenomena—the exact timing of a radioactive decay, next month’s weather, the bits of Chaitin’s Ω—were unpredictable and unexplainable in fundamental ways?”  To me, these are the exceptions that prove the rule.  Quantum mechanics, chaos, and Gödel’s theorem were great science not because they declared certain facts unexplainable, but because they explained why those facts (and not other facts) had no explanations of certain kinds.  Even more to the point, they gave definite rules to help figure out what would and wouldn’t be explainable in their respective domains: is this state an eigenstate of the operator you’re measuring?  is the Lyapunov exponent positive?  is there a proof of independence from PA or ZFC?

So, what would be the analogue of the above for the multiverse?  Is there any Level II or IV multiverse hypothesis that says: sure, the mass of the electron might be a cosmic accident, with at best an anthropic explanation, but the mass of the Higgs boson is almost certainly not such an accident?  Or that the sum or difference of the two masses is not an accident?  (And no, it doesn’t count to affirm as “non-accidental” things that we already have non-anthropic explanations for.)  If such a hypothesis exists, tell me in the comments!  As far as I know, all Level II and IV multiverse hypotheses are still at the stage where basically anything that isn’t already explained might vary across universes and be anthropically selected.  And that, to my mind, makes them very different in character from quantum mechanics, chaos, or Gödel’s theorem.

In summary, here’s what I feel is a reasonable position to take right now, regarding all four of Tegmark’s multiverse levels (not to mention the computer-simuverse, which I humbly propose as Level 3.5):

Yes, these multiverses are a perfectly fine thing to speculate about: sure they’re unobservable, but so are plenty of other entities that science has forced us to accept.  There are even natural reasons, within physics and cosmology, that could lead a person to speculate about each of these multiverse levels.  So if you want to speculate, knock yourself out!  If, however, you want me to accept the results as more than speculation—if you want me to put them on the bookshelf next to Darwin and Einstein—then you’ll need to do more than argue that other stuff I already believe logically entails a multiverse (which I’ve never been sure about), or point to facts that are currently unexplained as evidence that we need a multiverse to explain their unexplainability, or claim as triumphs for your hypothesis things that don’t really need the hypothesis at all, or describe implausible hypothetical scenarios that could confirm or falsify the hypothesis.  Rather, you’ll need to use your multiverse hypothesis—and your proposed solution to the resulting measure problem—to do something new that impresses me.

TIME’s cover story on D-Wave: A case study in the conventions of modern journalism

Thursday, February 6th, 2014

This morning, commenter rrtucci pointed me to TIME Magazine’s cover story about D-Wave (yes, in today’s digital media environment, I need Shtetl-Optimized readers to tell me what’s on the cover of TIME…).  rrtucci predicted that, soon after reading the article, I’d be hospitalized with a severe stress-induced bleeding ulcer.  Undeterred, I gritted my teeth, paid the $5 to go behind the paywall, and read the article.

The article, by Lev Grossman, could certainly be a lot worse.  If you get to the end, it discusses the experiments by Matthias Troyer’s group, and it makes clear the lack of any practically-relevant speedup today from the D-Wave devices.  It also includes a few skeptical quotes:

“In quantum computing, we have to be careful what we mean by ‘utilizing quantum effects,’” says Monroe, the University of Maryland scientist, who’s among the doubters. “This generally means that we are able to store superpositions of information in such a way that the system retains its ‘fuzziness,’ or quantum coherence, so that it can perform tasks that are impossible otherwise. And by that token there is no evidence that the D-Wave machine is utilizing quantum effects.”

One of the closest observers of the controversy has been Scott Aaronson, an associate professor at MIT and the author of a highly influential quantum-computing blog [aww, shucks –SA]. He remains, at best, cautious. “I’m convinced … that interesting quantum effects are probably present in D-Wave’s devices,” he wrote in an email. “But I’m not convinced that those effects, right now, are playing any causal role in solving any problems faster than we could solve them with a classical computer. Nor do I think there’s any good argument that D-Wave’s current approach, scaled up, will lead to such a speedup in the future. It might, but there’s currently no good reason to think so.”

Happily, the quote from me is something that I actually agreed with at the time I said it!  Today, having read the Shin et al. paper—which hadn’t yet come out when Grossman emailed me—I might tone down the statement “I’m convinced … that interesting quantum effects are probably present” to something like: “there’s pretty good evidence for quantum effects like entanglement at a ‘local’ level, but at the ‘global’ level we really have no idea.”

Alas, ultimately I regard this article as another victim (through no fault of the writer, possibly) of the strange conventions of modern journalism.  Maybe I can best explain those conventions with a quickie illustration:

MAGIC 8-BALL: THE RENEGADE MATH WHIZ WHO COULD CHANGE NUMBERS FOREVER

An eccentric billionaire, whose fascinating hobbies include nude skydiving and shark-taming, has been shaking up the scientific world lately with his controversial claim that 8+0 equals 17  [… six more pages about the billionaire redacted …]  It must be said that mathematicians, who we reached for comment because we’re diligent reporters, have tended to be miffed, skeptical, and sometimes even sarcastic about the billionaire’s claims.  Not surprisingly, though, the billionaire and his supporters have had some dismissive comments of their own about the mathematicians.  So, which side is right?  Or is the truth somewhere in the middle?  At this early stage, it’s hard for an outsider to say.  In the meantime, the raging controversy itself is reason enough for us to be covering this story using this story template.  Stay tuned for more!

As shown (for example) by Will Bourne’s story in Inc. magazine, it’s possible for a popular magazine to break out of the above template when covering D-Wave, or at least bend it more toward reality.  But it’s not easy.

More detailed comments:

  • The article gets off on a weird foot in the very first paragraph, describing the insides of D-Wave’s devices as “the coldest place in the universe.”  Err, 20mK is pretty cold, but colder temperatures are routinely achieved in many other physics experiments.  (Are D-Wave’s the coldest current, continuously-operating experiments, or something like that?  I dunno: counterexamples, anyone?  I’ve learned from experts that they’re not, not even close.  I heard from someone who had a bunch of dilution fridges running at 10mK in the lab he was emailing me from…)
  • The article jumps enthusiastically into the standard Quantum Computing = Exponential Parallelism Fallacy (the QC=EPF), which is so common to QC journalism that I don’t know if it’s even worth pointing it out anymore (but here I am doing so).
  • Commendably, the article states clearly that QCs would offer speedups only for certain specific problems, not others; that D-Wave’s devices are designed only for adiabatic optimization, and wouldn’t be useful (e.g.) for codebreaking; and that even for optimization, “D-Wave’s hardware isn’t powerful enough or well enough understood to show serious quantum speedup yet.”  But there’s a crucial further point that the article doesn’t make: namely, that we have no idea yet whether adiabatic optimization is something where quantum computers can give any practically-important speedup.  In other words, even if you could implement adiabatic optimization perfectly—at zero temperature, with zero decoherence—we still don’t know whether there’s any quantum speedup to be had that way, for any of the nifty applications that the article mentions: “software design, tumor treatments, logistical planning, the stock market, airlines schedules, the search for Earth-like planets in other solar systems, and in particular machine learning.”  In that respect, adiabatic optimization is extremely different from (e.g.) Shor’s factoring algorithm or quantum simulation: things where we know how much speedup we could get, at least compared to the best currently-known classical algorithms.  But I better stop now, since I feel myself entering an infinite loop (and I didn’t even need the adiabatic algorithm to detect it).

Merry Christmas! My quantum computing research explained, using only the 1000 most common English words

Tuesday, December 24th, 2013

[With special thanks to the Up-Goer Five Text Editor, which was inspired by this xkcd]

I study computers that would work in a different way than any computer that we have today.  These computers would be very small, and they would use facts about the world that are not well known to us from day to day life.  No one has built one of these computers yet—at least, we don’t think they have!—but we can still reason about what they could do for us if we did build them.

How would these new computers work? Well, when you go small enough, you find that, in order to figure out what the chance is that something will happen, you need to both add and take away a whole lot of numbers—one number for each possible way that the thing could happen, in fact. What’s interesting is, this means that the different ways a thing could happen can “kill each other out,” so that the thing never happens at all! I know it sounds weird, but the world of very small things has been known to work that way for almost a hundred years.

So, with the new kind of computer, the idea is to make the different ways each wrong answer could be reached kill each other out (with some of them “pointing” in one direction, some “pointing” in another direction), while the different ways that the right answer could be reached all point in more or less the same direction. If you can get that to happen, then when you finally look at the computer, you’ll find that there’s a very good chance that you’ll see the right answer. And if you don’t see the right answer, then you can just run the computer again until you do.

For some problems—like breaking a big number into its smallest parts (say, 43259 = 181 × 239)—we’ve learned that the new computers would be much, much faster than we think any of today’s computers could ever be. For other problems, however, the new computers don’t look like they’d be faster at all. So a big part of my work is trying to figure out for which problems the new computers would be faster, and for which problems they wouldn’t be.

You might wonder, why is it so hard to build these new computers? Why don’t we have them already? This part is a little hard to explain using the words I’m allowed, but let me try. It turns out that the new computers would very easily break. In fact, if the bits in such a computer were to “get out” in any way—that is, to work themselves into the air in the surrounding room, or whatever—then you could quickly lose everything about the new computer that makes it faster than today’s computers. For this reason, if you’re building the new kind of computer, you have to keep it very, very carefully away from anything that could cause it to lose its state—but then at the same time, you do have to touch the computer, to make it do the steps that will eventually give you the right answer. And no one knows how to do all of this yet. So far, people have only been able to use the new computers for very small checks, like breaking 15 into 3 × 5. But people are working very hard today on figuring out how to do bigger things with the new kind of computer.

In fact, building the new kind of computer is so hard, that some people even believe it won’t be possible! But my answer to them is simple. If it’s not possible, then that’s even more interesting to me than if it is possible! And either way, the only way I know to find out the truth is to try it and see what happens.

Sometimes, people pretend that they already built one of these computers even though they didn’t. Or they say things about what the computers could do that aren’t true. I have to admit that, even though I don’t really enjoy it, I do spend a lot of my time these days writing about why those people are wrong.

Oh, one other thing. Not long from now, it might be possible to build computers that don’t do everything that the new computers could eventually do, but that at least do some of it. Like, maybe we could use nothing but light and mirrors to answer questions that, while not important in and of themselves, are still hard to answer using today’s computers. That would at least show that we can do something that’s hard for today’s computers, and it could be a step along the way to the new computers. Anyway, that’s what a lot of my own work has been about for the past four years or so.

Besides the new kind of computers, I’m also interested in understanding what today’s computers can and can’t do. The biggest open problem about today’s computers could be put this way: if a computer can check an answer to a problem in a short time, then can a computer also find an answer in a short time? Almost all of us think that the answer is no, but no one knows how to show it. Six years ago, another guy and I figured out one of the reasons why this question is so hard to answer: that is, why the ideas that we already know don’t work.

Anyway, I have to go to dinner now. I hope you enjoyed this little piece about the kind of stuff that I work on.

Luke Muehlhauser interviews me about philosophical progress

Saturday, December 14th, 2013

I’m shipping out today to sunny Rio de Janeiro, where I’ll be giving a weeklong course about BosonSampling, at the invitation of Ernesto Galvão.  Then it’s on to Pennsylvania (where I’ll celebrate Christmas Eve with old family friends), Israel (where I’ll drop off Dana and Lily with Dana’s family in Tel Aviv, then lecture at the Jerusalem Winter School in Theoretical Physics), Puerto Rico (where I’ll speak at the FQXi conference on Physics of Information), back to Israel, and then New York before returning to Boston at the beginning of February.  Given this travel schedule, it’s possible that blogging will be even lighter than usual for the next month and a half (or not—we’ll see).

In the meantime, however, I’ve got the equivalent of at least five new blog posts to tide over Shtetl-Optimized fans.  Luke Muehlhauser, the Executive Director of the Machine Intelligence Research Institute (formerly the Singularity Institute for Artificial Intelligence), did an in-depth interview with me about “philosophical progress,” in which he prodded me to expand on certain comments in Why Philosophers Should Care About Computational Complexity and The Ghost in the Quantum Turing Machine.  Here are (abridged versions of) Luke’s five questions:

1. Why are you so interested in philosophy? And what is the social value of philosophy, from your perspective?

2. What are some of your favorite examples of illuminating Q-primes [i.e., scientifically-addressable pieces of big philosophical questions] that were solved within your own field, theoretical computer science?

3. Do you wish philosophy-the-field would be reformed in certain ways? Would you like to see more crosstalk between disciplines about philosophical issues? Do you think that, as Clark Glymour suggested, philosophy departments should be defunded unless they produce work that is directly useful to other fields … ?

4. Suppose a mathematically and analytically skilled student wanted to make progress, in roughly the way you describe, on the Big Questions of philosophy. What would you recommend they study? What should they read to be inspired? What skills should they develop? Where should they go to study?

5. Which object-level thinking tactics … do you use in your own theoretical (especially philosophical) research?  Are there tactics you suspect might be helpful, which you haven’t yet used much yourself?

For the answers—or at least my answers—click here!

PS. In case you missed it before, Quantum Computing Since Democritus was chosen by Scientific American blogger Jennifer Ouellette (via the “Time Lord,” Sean Carroll) as the top physics book of 2013.  Woohoo!!

23, Me, and the Right to Misinterpret Probabilities

Wednesday, December 11th, 2013

If you’re the sort of person who reads this blog, you may have heard that 23andMe—the company that (until recently) let anyone spit into a capsule, send it away to a DNA lab, and then learn basic information about their ancestry, disease risks, etc.—has suspended much of its service, on orders from the US Food and Drug Administration.  As I understand it, on Nov. 25, the FDA ordered 23andMe to stop marketing to new customers (though it can still serve existing customers), and on Dec. 5, the company stopped offering new health-related information to any customers (though you can still access the health information you had before, and ancestry and other non-health information is unaffected).

Of course, the impact of these developments is broader: within a couple weeks, “do-it-yourself genomics” has gone from an industry whose explosive growth lots of commentators took as a given, to one whose future looks severely in doubt (at least in the US).

The FDA gave the reasons for its order in a letter to Anne Wojcicki, 23andMe’s CEO.  Excerpts:

For instance, if the BRCA-related risk assessment for breast or ovarian cancer reports a false positive, it could lead a patient to undergo prophylactic surgery, chemoprevention, intensive screening, or other morbidity-inducing actions, while a false negative could result in a failure to recognize an actual risk that may exist.  Assessments for drug responses carry the risks that patients relying on such tests may begin to self-manage their treatments through dose changes or even abandon certain therapies depending on the outcome of the assessment.  For example, false genotype results for your warfarin drug response test could have significant unreasonable risk of illness, injury, or death to the patient due to thrombosis or bleeding events that occur from treatment with a drug at a dose that does not provide the appropriately calibrated anticoagulant effect …  The risk of serious injury or death is known to be high when patients are either non-compliant or not properly dosed; combined with the risk that a direct-to-consumer test result may be used by a patient to self-manage, serious concerns are raised if test results are not adequately understood by patients or if incorrect test results are reported.

To clarify, the DNA labs that 23andMe uses are already government-regulated.  Thus, the question at issue here is not whether, if 23andMe claims (say) that you have CG instead of CC at some particular locus, the information is reliable.  Rather, the question is whether 23andMe should be allowed to tell you that fact, while also telling you that a recent research paper found that people with CG have a 10.4% probability of developing Alzheimer’s disease, as compared to a 7.2% base rate.  More bluntly, the question is whether ordinary schmoes ought to be trusted to learn such facts about themselves, without a doctor as an intermediary to interpret the results for them, or perhaps to decide that there’s no good reason for the patient to know at all.
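For concreteness, here is a small Python sketch of the arithmetic behind a statement like that.  The two rates are just the illustrative figures from the paragraph above, not real study results, and the function name risk_summary is my own invention for this example.

```python
def risk_summary(base_rate, conditional_rate):
    """Return (absolute increase, relative risk) for a rate given a gene variant."""
    absolute_increase = conditional_rate - base_rate
    relative_risk = conditional_rate / base_rate
    return absolute_increase, relative_risk


# Illustrative figures from the text: 7.2% base rate, 10.4% with the CG variant.
BASE_RATE = 0.072
RATE_WITH_VARIANT = 0.104

absolute_increase, relative_risk = risk_summary(BASE_RATE, RATE_WITH_VARIANT)

print(f"absolute increase: {absolute_increase:.1%}")          # 3.2%
print(f"relative risk: {relative_risk:.2f}x")                 # 1.44x
print(f"chance of not developing it: {1 - RATE_WITH_VARIANT:.1%}")  # 89.6%
```

So the same fact can be reported as “44% higher risk,” as “3.2 percentage points higher,” or as “an 89.6% chance of not developing the disease,” which is exactly the kind of thing the dispute is about whether people can be trusted to parse.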

Among medical experts, a common attitude seems to be something like this: sure, getting access to your own genetic data is harmless fun, as long as you’re an overeducated nerd who just wants to satisfy his or her intellectual curiosity (or perhaps narcissism).  But 23andMe crossed a crucial line when it started marketing its service to the hoi polloi, as something that could genuinely tell them about health risks.  Most people don’t understand probability, and are incapable of parsing “based on certain gene variants we found, your chances of developing diabetes are about 6 times higher than the baseline” as anything other than “you will develop diabetes.”  Nor, just as worryingly, are they able to parse “your chances are lower than the baseline” as anything other than “you won’t develop diabetes.”

I understand this argument.  Nevertheless, I find it completely inconsistent with a free society.  Moreover, I predict that in the future, the FDA’s current stance will be looked back upon as an outrage, with the subtleties in the FDA’s position mattering about as much as the subtleties in the Church’s position toward Galileo (“look, Mr. G., it’s fine to discuss heliocentrism among your fellow astronomers, as a hypothesis or a calculational tool—just don’t write books telling the general public that heliocentrism is literally true, and that they should change their worldviews as a result!”).  That’s why I signed this petition asking the FDA to reconsider its decision, and I encourage you to sign it too.

Here are some comments that might help clarify my views:

(1) I signed up for 23andMe a few years ago, as did the rest of my family.  The information I gained from it wasn’t exactly earth-shattering: I learned, for example, that my eyes are probably blue, that my ancestry is mostly Ashkenazi, that there’s a risk my eyesight will further deteriorate as I age (the same thing a succession of ophthalmologists told me), that I can’t taste the bitter flavor in brussels sprouts, and that I’m an “unlikely sprinter.”  On the other hand, seeing exactly which gene variants correlate with these things, and how they compare to the variants my parents and brother have, was … cool.  It felt like I imagine it must have felt to buy a personal computer in 1975.  In addition, I found nothing the slightest bit dishonest about the way the results were reported.  Each result was stated explicitly in terms of probabilities—giving both the baseline rate for each condition, and the rate conditioned on having such-and-such gene variant—and there were even links to the original research papers if I wanted to read them myself.  I only wish that I got half as much context and detail from conventional doctor visits—or for that matter, from most materials I’ve read from the FDA itself.  (When Dana was pregnant, I was pleasantly surprised when some of the tests she underwent came back with explicit probabilities and base rates.  I remember wishing doctors would give me that kind of information more often.)

(2) From my limited reading and experience, I think it’s entirely possible that do-it-yourself genetic testing is overhyped; that it won’t live up to its most fervent advocates’ promises; that for most interesting traits there are just too many genes involved, via too many labyrinthine pathways, to make terribly useful predictions about individuals, etc.  So it’s important to me that, in deciding whether what 23andMe does should be legal, we’re not being asked to decide any of these complicated questions!  We’re only being asked whether the FDA should get to decide the answers in advance.

(3) As regular readers will know, I’m far from a doctrinaire libertarian.  Thus, my opposition to shutting down 23andMe is not at all a corollary of reflexive opposition to any government regulation of anything.  In fact, I’d be fine if the FDA wanted to insert a warning message on 23andMe (in addition to the warnings 23andMe already provides), emphasizing that genetic tests only provide crude statistical information, that they need to be interpreted with care, consult your doctor before doing anything based on these results, etc.  But when it comes to banning access to the results, I have trouble with some of the obvious slippery slopes.  E.g., what happens when some Chinese or Russian company launches a competing service?  Do we ban Americans from mailing their saliva overseas?  What happens when individuals become able just to sequence their entire genomes, and store and analyze them on their laptops?  Do we ban the sequencing technology?  Or do we just ban software that makes it easy enough to analyze the results?  If the software is hard enough to use, so only professional biologists use it, does that make it OK again?  Also, if the FDA will be in the business of banning genomic data analysis tools, then what about medical books?  For that matter, what about any books or websites, of any kind, that might cause someone to make a poor medical decision?  What would such a policy, if applied consistently, do to the multibillion-dollar alternative medicine industry?

(4) I don’t understand the history of 23andMe’s interactions with the FDA.  From what I’ve read, though, they have been communicating for five years, with everything 23andMe has said in public sounding conciliatory rather than defiant (though the FDA has accused 23andMe of being tardy with its responses).  Apparently, the key problem is simply that the FDA hasn’t yet developed a regulatory policy specifically for direct-to-consumer genetic tests.  It’s been considering such a policy for years—but in the meantime, it believes no one should be marketing such tests for health purposes before a policy exists.  Alas, there are very few cases where I’d feel inclined to support a government in saying: “X is a new technology that lots of people are excited about.  However, our regulatory policies haven’t yet caught up to X.  Therefore, our decision is that X is banned, until and unless we figure out how to regulate it.”  Maybe I could support such a policy, if X had the potential to level cities and kill millions.  But when it comes to consumer DNA tests, this sort of preemptive banning seems purposefully designed to give wet dreams to Ayn Rand fans.

(5) I confess that, despite everything I’ve said, my moral intuitions might be different if dead bodies were piling up because of terrible 23andMe-inspired medical decisions.  But as far as I know, there’s no evidence so far that even a single person was harmed.  Which isn’t so surprising: after all, people might run to their doctor terrified about something they learned on 23andMe, but no sane doctor would ever make a decision solely on that basis, without ordering further tests.

Twenty Reasons to Believe Oswald Acted Alone

Monday, December 2nd, 2013

As the world marked the 50th anniversary of the JFK assassination, I have to confess … no, no, not that I was in on the plot.  I wasn’t even born then, silly.  I have to confess that, in between struggling to make a paper deadline, attending a workshop in Princeton, celebrating Thanksgivukkah, teaching Lily how to pat her head and clap her hands, and not blogging, I also started dipping, for the first time in my life, into a tiny fraction of the vast literature about the JFK assassination.  The trigger (so to speak) for me was this article by David Talbot, the founder of Salon.com.  I figured, if the founder of Salon is a JFK conspiracy buff—if, for crying out loud, my skeptical heroes Bertrand Russell and Carl Sagan were both JFK conspiracy buffs—then maybe it’s at least worth familiarizing myself with the basic facts and arguments.

So, what happened when I did?  Were the scales peeled from my eyes?

In a sense, yes, they were.  Given how much has been written about this subject, and how many intelligent people take seriously the possibility of a conspiracy, I was shocked by how compelling I found the evidence to be that there were exactly three shots, all fired by Lee Harvey Oswald with a Carcano rifle from the sixth floor of the Texas School Book Depository, just as the Warren Commission said in 1964.  And as for Oswald’s motives, I think I understand them as well and as poorly as I understand the motives of the people who send me ramblings every week about P vs. NP and the secrets of the universe.

Before I started reading, if someone forced me to guess, maybe I would’ve assigned a ~10% probability to some sort of conspiracy.  Now, though, I’d place the JFK conspiracy hypothesis firmly in Moon-landings-were-faked, Twin-Towers-collapsed-from-the-inside territory.  Or to put it differently, “Oswald as lone, crazed assassin” has been added to my large class of “sanity-complete” propositions: propositions defined by the property that if I doubt any one of them, then there’s scarcely any part of the historical record that I shouldn’t doubt.  (And while one can’t exclude the possibility that Oswald confided in someone else before the act—his wife or a friend, for example—and that other person kept it a secret for 50 years, what’s known about Oswald strongly suggests that he didn’t.)

So, what convinced me?  In this post, I’ll give twenty reasons for believing that Oswald acted alone.  Notably, my reasons will have less to do with the minutiae of bullet angles and autopsy reports, than with general principles for deciding what’s true and what isn’t.  Of course, part of the reason for this focus is that the minutiae are debated in unbelievable detail elsewhere, and I have nothing further to contribute to those debates.  But another reason is that I’m skeptical that anyone actually comes to believe the JFK conspiracy hypothesis because they don’t see how the second bullet came in at the appropriate angle to pass through JFK’s neck and shoulder and then hit Governor Connally.  Clear up some technical point (or ten or fifty of them)—as has been done over and over—and the believers will simply claim that the data you used was altered by the CIA, or they’ll switch to other “anomalies” without batting an eye.  Instead, people start with certain general beliefs about how the world works, “who’s really in charge,” what sorts of explanations to look for, etc., and then use their general beliefs to decide which claims to accept about JFK’s head wounds or the foliage in Dealey Plaza—not vice versa.  That being so, one might as well just discuss the general beliefs from the outset.  So without further ado, here are my twenty reasons:

1. Conspiracy theorizing represents a known bug in the human nervous system.  Given that, I think our prior should be overwhelmingly against anything that even looks like a conspiracy theory.  (This is not to say conspiracies never happen.  Of course they do: Watergate, the Tobacco Institute, and the Nazi Final Solution were three well-known examples.  But the difference between conspiracy theorists’ fantasies and actual known conspiracies is this: in a conspiracy theory, some powerful organization’s public face hides a dark and terrible secret; its true mission is the opposite of its stated one.  By contrast, in every real conspiracy I can think of, the facade was already 90% as terrible as the reality!  And the “dark secret” was that the organization was doing precisely what you’d expect it to do, if its members genuinely held the beliefs that they claimed to hold.)

2. The shooting of Oswald by Jack Ruby created the perfect conditions for conspiracy theorizing to fester.  Conditioned on that happening, it would be astonishing if a conspiracy industry hadn’t arisen, with its hundreds of books and labyrinthine arguments, even under the assumption that Oswald and Ruby both really acted alone.

3. Other high-profile assassinations to which we might compare this one—for example, those of Lincoln, Garfield, McKinley, RFK, Martin Luther King Jr., Gandhi, Yitzchak Rabin…—appear to have been the work of “lone nuts,” or at most “conspiracies” of small numbers of lowlifes.  So why not this one?

4. Oswald seems to have perfectly fit the profile of a psychopathic killer (see, for example, Case Closed by Gerald Posner).  From very early in his life, Oswald exhibited grandiosity, resentment, lack of remorse, doctrinaire ideological fixations, and obsession with how he’d be remembered by history.

5. A half-century of investigation has failed to link any individual besides Oswald to the crime.  Conspiracy theorists love to throw around large, complicated entities like the CIA or the Mafia as potential “conspirators”—but in the rare cases when they’ve tried to go further, and implicate an actual human being other than Oswald or Ruby (or distant power figures like LBJ), the results have been pathetic and tragic.

6. Oswald had previously tried to assassinate General Walker—a fact that was confirmed by his widow Marina Oswald, but that, incredibly, is barely even discussed in the reams of conspiracy literature.

7. There’s clear evidence that Oswald murdered Officer Tippit an hour after shooting JFK—a fact that seems perfectly consistent with the state of mind of someone who’d just murdered the President, but that, again, seems to get remarkably little discussion in the conspiracy literature.

8. Besides being a violent nut, Oswald was also a known pathological liar.  He lied on his employment applications, he lied about having established a thriving New Orleans branch of Fair Play for Cuba, he lied and lied and lied.  Because of this tendency—as well as his persecution complex—Oswald’s loud protestations after his arrest that he was just a “patsy” count for almost nothing.

9. According to police accounts, Oswald acted snide and proud of himself after being taken into custody: for example, when asked whether he had killed the President, he replied “you find out for yourself.”  He certainly didn’t act like an innocent “patsy” arrested on such a grave charge would plausibly act.

10. Almost all JFK conspiracy theories must be false, simply because they’re mutually inconsistent.  Once you realize that, and start judging the competing conspiracy theories by the standards you’d have to judge them by if at most one could be true, enlightenment may dawn as you find there’s nothing in the way of just rejecting all of them.  (Of course, some people have gone through an analogous process with religions.)

11. The case for Oswald as lone assassin seems to become stronger, the more you focus on the physical evidence and stuff that happened right around the time and place of the event.  To an astonishing degree, the case for a conspiracy seems to rely on verbal testimony years or decades afterward—often by people who are known confabulators, who were nowhere near Dealey Plaza at the time, who have financial or revenge reasons to invent stories, and who “remembered” seeing Oswald and Ruby with CIA agents, etc. only under drugs or hypnosis.  This is precisely the pattern we would expect if conspiracy theorizing reflected the reality of the human nervous system rather than the reality of the assassination.

12. If the conspiracy is so powerful, why didn’t it do something more impressive than just assassinate JFK? Why didn’t it rig the election to prevent JFK from becoming President in the first place?  (In math, very often the way you discover a bug in your argument is by realizing that the argument gives you more than you originally intended—vastly, implausibly more.  Yet every pro-conspiracy argument I’ve read seems to suffer from the same problem.  For example, after successfully killing JFK, did the conspiracy simply disband?  Or did it go on to mastermind other assassinations?  If it didn’t, why not?  Isn’t pulling the puppet-strings of the world sort of an ongoing proposition?  What, if any, are the limits to this conspiracy’s power?)

13. Pretty much all the conspiracy writers I encountered exude total, 100% confidence, not only in the existence of additional shooters, but in the guilt of their favored villains (they might profess ignorance, but then in the very next sentence they’d talk about how JFK’s murder was “a triumph for the national security establishment”).  For me, their confidence had the effect of weakening my own confidence in their intellectual honesty, and in any aspects of their arguments that I had to take on faith.  The conspiracy camp would of course reply that the “Oswald acted alone” camp also exudes too much confidence in its position.  But the two cases are not symmetric: for one thing, because there are so many different conspiracy theories, but only one Oswald.  If I were a conspiracy believer I’d be racked with doubts, if nothing else then about whether my conspiracy was the right one.

14. Every conspiracy theory I’ve encountered seems to require “uncontrolled growth” in size and complexity: that is, the numbers of additional shooters, alterations of medical records, murders of inconvenient witnesses, coverups, coverups of the coverups, etc. that need to be postulated all seem to multiply without bound.  To some conspiracy believers, this uncontrolled growth might actually be a feature: the more nefarious and far-reaching the conspiracy’s tentacles, the better.  It should go without saying that I regard it as a bug.

15. JFK was not a liberal Messiah.  He moved slowly on civil rights for fear of a conservative backlash, invested heavily in building nukes, signed off on the botched plans to kill Fidel Castro, and helped lay the groundwork for the US’s later involvement in Vietnam.  Yes, it’s possible that he would’ve made wiser decisions about Vietnam than LBJ ended up making; that’s part of what makes his assassination (like RFK’s later assassination) a tragedy.  But many conspiracy theorists’ view of JFK as an implacable enemy of the military-industrial complex is preposterous.

16. By the same token, LBJ was not exactly a right-wing conspirator’s dream candidate.  He was, if anything, more aggressive on poverty and civil rights than JFK was.  And even if he did end up being better for certain military contractors, that’s not something that would’ve been easy to predict in 1963, when the US’s involvement in Vietnam had barely started.

17. Lots of politically-powerful figures have gone on the record as believers in a conspiracy, including John Kerry, numerous members of Congress, and even frequently-accused conspirator LBJ himself.  Some people would say that this lends credibility to the conspiracy cause.  To me, however, it indicates just the opposite: that there’s no secret cabal running the world, and that those in power are just as prone to bugs in the human nervous system as anyone else is.

18. As far as I can tell, the conspiracy theorists are absolutely correct that JFK’s security in Dallas was unbelievably poor; that the Warren Commission was as interested in reassuring the nation and preventing a war with the USSR or Cuba as it was in reaching the truth (the fact that it did reach the truth is almost incidental); and that agencies like the CIA and FBI kept records related to the assassination classified for way longer than there was any legitimate reason to (though note that most records finally were declassified in the 1990s, and they provided zero evidence for any conspiracy).  As you might guess, I ascribe all of these things to bureaucratic incompetence rather than to conspiratorial ultra-competence.  But once again, these government screwups help us understand how so many intelligent people could come to believe in a conspiracy even in the total absence of one.

19. In the context of the time, the belief that JFK was killed by a conspiracy filled a particular need: namely, the need to believe that the confusing, turbulent events of the 1960s had an understandable guiding motive behind them, and that a great man like JFK could only be brought down by an equally-great evil, rather than by a chronically-unemployed loser who happened to see on a map that JFK’s motorcade would be passing by his workplace.  Ironically, I think that Roger Ebert got it exactly right when he praised Oliver Stone’s JFK movie for its “emotional truth.”  In much the same way, one could say that Birth of a Nation was “emotionally true” for Southern racists, or that Ben Stein’s Expelled was “emotionally true” for creationists.  Again, I’d say that the “emotional truth” of the conspiracy hypothesis is further evidence for its factual falsehood: for it explains how so many people could come to believe in a conspiracy even if the evidence for one were dirt-poor.

20. At its core, every conspiracy argument seems to be built out of “holes”: “the details that don’t add up in the official account,” “the questions that haven’t been answered,” etc.  What I’ve never found is a truly coherent alternative scenario: just one “hole” after another.  This pattern is the single most important red flag for me, because it suggests that the JFK conspiracy theorists view themselves as basically defense attorneys: people who only need to sow enough doubts, rather than establish the reality of what happened.  Crucially, creationism, 9/11 trutherism, and every other elaborate-yet-totally-wrong intellectual edifice I’ve ever encountered has operated on precisely the same “defense attorney principle”: “if we can just raise enough doubts about the other side’s case, we win!”  But that’s a terrible approach to knowledge, once you’ve seen firsthand how a skilled arguer can raise unlimited doubts even about the nonexistence of a monster under your bed.  Such arguers are hoping, of course, that you’ll find their monster hypothesis so much more fun, exciting, and ironically comforting than the “random sounds in the night hypothesis,” that it won’t even occur to you to demand they show you their monster.

Further reading: this article in Slate.

Five announcements

Tuesday, October 1st, 2013

Update (Oct. 3): OK, a sixth announcement.  I just posted a question on CS Theory StackExchange, entitled “Overarching reasons why problems are in P or BPP.”  If you have suggested additions or improvements to my rough list of “overarching reasons,” please post them over there — thanks!


1. I’m in Oxford right now, for a Clay Institute workshop on New Insights into Computational Intractability.  The workshop is concurrent with three others, including one on Number Theory and Physics that includes an amplituhedron-related talk by Andrew Hodges.  (Speaking of which, see here for a small but non-parodic observation about expressing amplitudes as volumes of polytopes.)

2. I was hoping to stay in the UK one more week, to attend the Newton Institute’s special semester on Mathematical Challenges in Quantum Information over in Cambridge.  But alas I had to cancel, since my diaper-changing services are needed in the other Cambridge.  So, if anyone in Cambridge (or anywhere else in the United Kingdom) really wants to talk to me, come to Oxford this week!

3. Back in June, Jens Eisert and three others posted a preprint claiming that the output of a BosonSampling device would be “indistinguishable from the uniform distribution” in various senses.  Ever since then, people have been emailing me, leaving comments on this blog, and cornering me at conferences to ask whether Alex Arkhipov and I had any response to these claims.  OK, so just this weekend, we posted our own 41-page preprint, entitled “BosonSampling Is Far From Uniform.”  I hope it suffices by way of reply!  (Incidentally, this is also the paper I hinted at in a previous post: the one where π²/6 and the Euler-Mascheroni constant make cameo appearances.)  To clarify, if we just wanted to answer the claims of the Eisert group, then I think a couple paragraphs would suffice for that (see, for example, these PowerPoint slides).  In our new paper, however, Alex and I take the opportunity to go further: we study lots of interesting questions about the statistical properties of Haar-random BosonSampling distributions, and about how one might test efficiently whether a claimed BosonSampling device worked, even with hundreds or thousands of photons.

4. Also on the arXiv last night, there was a phenomenal survey about the quantum PCP conjecture by Dorit Aharonov, Itai Arad, and my former postdoc Thomas Vidick (soon to be a professor at Caltech).  I recommend reading it in the strongest possible terms, if you’d like to see how far people have come with this problem (but also, how far they still have to go) since my “Quantum PCP Manifesto” seven years ago.

5. Christos Papadimitriou asked me to publicize that the deadline for early registration and hotel reservations for the upcoming FOCS in Berkeley is fast approaching!  Indeed, it’s October 4 (three days from now).  See here for details, and here for information about student travel support.  (The links were down when I just tried them, but hopefully the server will be back up soon.)