Archive for the ‘Physics for Doofuses’ Category

Is “information is physical” contentful?

Thursday, July 20th, 2017

“Information is physical.”

This slogan seems to have originated around 1991 with Rolf Landauer.  It’s ricocheted around quantum information for the entire time I’ve been in the field, incanted in funding agency reports and popular articles and at the beginnings and ends of talks.

But what the hell does it mean?

There are many things it’s taken to mean, in my experience, that don’t make a lot of sense when you think about them—or else they’re vacuously true, or purely a matter of perspective, or not faithful readings of the slogan’s words.

For example, some people seem to use the slogan to mean something more like its converse: “physics is informational.”  That is, the laws of physics are ultimately not about mass or energy or pressure, but about bits and computations on them.  As I’ve often said, my problem with that view is less its audacity than its timidity!  It’s like, what would the universe have to do in order not to be informational in this sense?  “Information” is just a name we give to whatever picks out one element from a set of possibilities, with the “amount” of information given by the log of the set’s cardinality (and with suitable generalizations to infinite sets, nonuniform probability distributions, yadda yadda).  So, as long as the laws of physics take the form of telling us that some observations or configurations of the world are possible and others are not, or of giving us probabilities for each configuration, no duh they’re about information!

Other people use “information is physical” to pour scorn on the idea that “information” could mean anything without some actual physical instantiation of the abstract 0’s and 1’s, such as voltage differences in a loop of wire.  Here I certainly agree with the tautology that in order to exist physically—that is, be embodied in the physical world—a piece of information (like a song, video, or computer program) does need to be embodied in the physical world.  But my inner Platonist slumps in his armchair when people go on to assert that, for example, it’s meaningless to discuss the first prime number larger than 1010^125, because according to post-1998 cosmology, one couldn’t fit its digits inside the observable universe.

If the cosmologists revise their models next week, will this prime suddenly burst into existence, with all the mathematical properties that one could’ve predicted for it on general grounds—only to fade back into the netherworld if the cosmologists revise their models again?  Why would anyone want to use language in such a tortured way?

Yes, brains, computers, yellow books, and so on that encode mathematical knowledge comprise only a tiny sliver of the physical world.  But it’s equally true that the physical world we observe comprises only a tiny sliver of mathematical possibility-space.

Still other people use “information is physical” simply to express their enthusiasm for the modern merger of physical and information sciences, as exemplified by quantum computing.  Far be it from me to temper that enthusiasm: rock on, dudes!

Yet others use “information is physical” to mean that the rules governing information processing and transmission in the physical world aren’t knowable a priori, but can only be learned from physics.  This is clearest in the case of quantum information, which has its own internal logic that generalizes the logic of classical information.  But in some sense, we didn’t need quantum mechanics to tell us this!  Of course the laws of physics have ultimate jurisdiction over whatever occurs in the physical world, information processing included.

My biggest beef, with all these unpackings of the “information is physical” slogan, is that none of them really engage with any of the deep truths that we’ve learned about physics.  That is, we could’ve had more-or-less the same debates about any of them, even in a hypothetical world where the laws of physics were completely different.


So then what should we mean by “information is physical”?  In the rest of this post, I’d like to propose an answer to that question.

We get closer to the meat of the slogan if we consider some actual physical phenomena, say in quantum mechanics.  The double-slit experiment will do fine.

Recall: you shoot photons, one by one, at a screen with two slits, then examine the probability distribution over where the photons end up on a second screen.  You ask: does that distribution contain alternating “light” and “dark” regions, the signature of interference between positive and negative amplitudes?  And the answer, predicted by the math and confirmed by experiment, is: yes, but only if the information about which slit the photon went through failed to get recorded anywhere else in the universe, other than the photon location itself.

Here a skeptic interjects: but that has to be wrong!  The criterion for where a physical particle lands on a physical screen can’t possibly depend on anything as airy as whether “information” got “recorded” or not.  For what counts as “information,” anyway?  As an extreme example: what if God, unbeknownst to us mortals, took divine note of which slit the photon went through?  Would that destroy the interference pattern?  If so, then every time we do the experiment, are we collecting data about the existence or nonexistence of an all-knowing God?

It seems to me that the answer is: insofar as the mind of God can be modeled as a tensor factor in Hilbert space, yes, we are.  And crucially, if quantum mechanics is universally true, then the mind of God would have to be such a tensor factor, in order for its state to play any role in the prediction of observed phenomena.

To say this another way: it’s obvious and unexceptionable that, by observing a physical system, you can often learn something about what information must be in it.  For example, you need never have heard of DNA to deduce that chickens must somehow contain information about making more chickens.  What’s much more surprising is that, in quantum mechanics, you can often deduce things about what information can’t be present, anywhere in the physical world—because if such information existed, even a billion light-years away, it would necessarily have a physical effect that you don’t see.

Another famous example here concerns identical particles.  You may have heard the slogan that “if you’ve seen one electron, you’ve seen them all”: that is, apart from position, momentum, and spin, every two electrons have exactly the same mass, same charge, same every other property, including even any properties yet to be discovered.  Again the skeptic interjects: but that has to be wrong.  Logically, you could only ever confirm that two electrons were different, by observing a difference in their behavior.  Even if the electrons had behaved identically for a billion years, you couldn’t rule out the possibility that they were actually different, for example because of tiny nametags (“Hi, I’m Emily the Electron!” “Hi, I’m Ernie!”) that had no effect on any experiment you’d thought to perform, but were visible to God.

You can probably guess where this is going.  Quantum mechanics says that, no, you can verify that two particles are perfectly identical by doing an experiment where you swap them and see what happens.  If the particles are identical in all respects, then you’ll see quantum interference between the swapped and un-swapped states.  If they aren’t, you won’t.  The kind of interference you’ll see is different for fermions (like electrons) than for bosons (like photons), but the basic principle is the same in both cases.  Once again, quantum mechanics lets you verify that a specific type of information—in this case, information that distinguishes one particle from another—was not present anywhere in the physical world, because if it were, it would’ve destroyed an interference effect that you in fact saw.

This, I think, already provides a meatier sense in which “information is physical” than any of the senses discussed previously.


But we haven’t gotten to the filet mignon yet.  The late, great Jacob Bekenstein will forever be associated with the discovery that information, wherever and whenever it occurs in the physical world, takes up a minimum amount of space.  The most precise form of this statement, called the covariant entropy bound, was worked out in detail by Raphael Bousso.  Here I’ll be discussing a looser version of the bound, which holds in “non-pathological” cases, and which states that a bounded physical system can store at most A/(4 ln 2) bits of information, where A is the area in Planck units of any surface that encloses the system—so, about 1069 bits per square meter.  (Actually it’s 1069 qubits per square meter, but because of Holevo’s theorem, an upper bound on the number of qubits is also an upper bound on the number of classical bits that can be reliably stored in a system and then retrieved later.)

You might have heard of the famous way Nature enforces this bound.  Namely, if you tried to create a hard drive that stored more than 1069 bits per square meter of surface area, the hard drive would necessarily collapse to a black hole.  And from that point on, the information storage capacity would scale “only” with the area of the black hole’s event horizon—a black hole itself being the densest possible hard drive allowed by physics.

Let’s hear once more from our skeptic.  “Nonsense!  Matter can take up space.  Energy can take up space.  But information?  Bah!  That’s just a category mistake.  For a proof, suppose God took one of your black holes, with a 1-square-meter event horizon, which already had its supposed maximum of ~1069 bits of information.  And suppose She then created a bunch of new fundamental fields, which didn’t interact with gravity, electromagnetism, or any of the other fields that we know from observation, but which had the effect of encoding 10300 new bits in the region of the black hole.  Presto!  An unlimited amount of additional information, exactly where Bekenstein said it couldn’t exist.”

We’d like to pinpoint what’s wrong with the skeptic’s argument—and do so in a self-contained, non-question-begging way, a way that doesn’t pull any rabbits out of hats, other than the general principles of relativity and quantum mechanics.  I was confused myself about how to do this, until a month ago, when Daniel Harlow helped set me straight (any remaining howlers in my exposition are 100% mine, not his).

I believe the logic goes like this:

  1. Relativity—even just Galilean relativity—demands that, in flat space, the laws of physics must have the same form for all inertial observers (i.e., all observers who move through space at constant speed).
  2. Anything in the physical world that varies in space—say, a field that encodes different bits of information at different locations—also varies in time, from the perspective of an observer who moves through the field at a constant speed.
  3. Combining 1 and 2, we conclude that anything that can vary in space can also vary in time.  Or to say it better, there’s only one kind of varying: varying in spacetime.
  4. More strongly, special relativity tells us that there’s a specific numerical conversion factor between units of space and units of time: namely the speed of light, c.  Loosely speaking, this means that if we know the rate at which a field varies across space, we can also calculate the rate at which it varies across time, and vice versa.
  5. Anything that varies across time carries energy.  Why?  Because this is essentially the definition of energy in quantum mechanics!  Up to a constant multiple (namely, Planck’s constant), energy is the expected speed of rotation of the global phase of the wavefunction, when you apply your Hamiltonian.  If the global phase rotates at the slowest possible speed, then we take the energy to be zero, and say you’re in a vacuum state.  If it rotates at the next highest speed, we say you’re in a first excited state, and so on.  Indeed, assuming a time-independent Hamiltonian, the evolution of any quantum system can be fully described by simply decomposing the wavefunction into a superposition of energy eigenstates, then tracking of the phase of each eigenstate’s amplitude as it loops around and around the unit circle.  No energy means no looping around means nothing ever changes.
  6. Combining 3 and 5, any field that varies across space carries energy.
  7. More strongly, combining 4 and 5, if we know how quickly a field varies across space, we can lower-bound how much energy it has to contain.
  8. In general relativity, anything that carries energy couples to the gravitational field.  This means that anything that carries energy necessarily has an observable effect: if nothing else, its effect on the warping of spacetime.  (This is dramatically illustrated by dark matter, which is currently observable via its spacetime warping effect and nothing else.)
  9. Combining 6 and 8, any field that varies across space couples to the gravitational field.
  10. More strongly, combining 7 and 8, if we know how quickly a field varies across space, then we can lower-bound by how much it has to warp spacetime.  This is so because of another famous (and distinctive) feature of gravity: namely, the fact that it’s universally attractive, so all the warping contributions add up.
  11. But in GR, spacetime can only be warped by so much before we create a black hole: this is the famous Schwarzschild bound.
  12. Combining 10 and 11, the information contained in a physical field can only vary so quickly across space, before it causes spacetime to collapse to a black hole.

Summarizing where we’ve gotten, we could say: any information that’s spatially localized at all, can only be localized so precisely.  In our world, the more densely you try to pack 1’s and 0’s, the more energy you need, therefore the more you warp spacetime, until all you’ve gotten for your trouble is a black hole.  Furthermore, if we rewrote the above conceptual argument in math—keeping track of all the G’s, c’s, h’s, and so on—we could derive a quantitative bound on how much information there can be in a bounded region of space.  And if we were careful enough, that bound would be precisely the holographic entropy bound, which says that the number of (qu)bits is at most A/(4 ln 2), where A is the area of a bounding surface in Planck units.

Let’s pause to point out some interesting features of this argument.

Firstly, we pretty much needed the whole kitchen sink of basic physical principles: special relativity (both the equivalence of inertial frames and the finiteness of the speed of light), quantum mechanics (in the form of the universal relation between energy and frequency), and finally general relativity and gravity.  All three of the fundamental constants G, c, and h made appearances, which is why all three show up in the detailed statement of the holographic bound.

But secondly, gravity only appeared from step 8 onwards.  Up till then, everything could be said solely in the language of quantum field theory: that is, quantum mechanics plus special relativity.  The result would be the so-called Bekenstein bound, which upper-bounds the number of bits in any spatial region by the product of the region’s radius and its energy content.  I learned that there’s an interesting history here: Bekenstein originally deduced this bound using ingenious thought experiments involving black holes.  Only later did people realize that the Bekenstein bound can be derived purely within QFT (see here and here for example)—in contrast to the holographic bound, which really is a statement about quantum gravity.  (An early hint of this was that, while the holographic bound involves Newton’s gravitational constant G, the Bekenstein bound doesn’t.)

Thirdly, speaking of QFT, some readers might be struck by the fact that at no point in our 12-step program did we ever seem to need QFT machinery.  Which is fortunate, because if we had needed it, I wouldn’t have been able to explain any of this!  But here I have to confess that I cheated slightly.  Recall step 4, which said that “if you know the rate at which a field varies across space, you can calculate the rate at which it varies across time.”  It turns out that, in order to give that sentence a definite meaning, one uses the fact that in QFT, space and time derivatives in the Hamiltonian need to be related by a factor of c, since otherwise the Hamiltonian wouldn’t be Lorentz-invariant.

Fourthly, eagle-eyed readers might notice a loophole in the argument.  Namely, we never upper-bounded how much information God could add to the world, via fields that are constant across all of spacetime.  For example, there’s nothing to stop Her from creating a new scalar field that takes the same value everywhere in the universe—with that value, in suitable units, encoding 1050000 separate divine thoughts in its binary expansion.  But OK, being constant, such a field would interact with nothing and affect no observations—so Occam’s Razor itches to slice it off, by rewriting the laws of physics in a simpler form where that field is absent.  If you like, such a field would at most be a comment in the source code of the universe: it could be as long as the Great Programmer wanted it to be, but would have no observable effect on those of us living inside the program’s execution.


Of course, even before relativity and quantum mechanics, information had already been playing a surprisingly fleshy role in physics, through its appearance as entropy in 19th-century thermodynamics.  Which leads to another puzzle.  To a computer scientist, the concept of entropy, as the log of the number of microstates compatible with a given macrostate, seems clear enough, as does the intuition for why it should increase monotonically with time.  Or at least, to whatever extent we’re confused about these matters, we’re no more confused than the physicists are!

But then why should this information-theoretic concept be so closely connected to tangible quantities like temperature, and pressure, and energy?  From the mere assumption that a black hole has a nonzero entropy—that is, that it takes many bits to describe—how could Bekenstein and Hawking have possibly deduced that it also has a nonzero temperature?  Or: if you put your finger into a tub of hot water, does the heat that you feel somehow reflect how many bits are needed to describe the water’s microstate?

Once again our skeptic pipes up: “but surely God could stuff as many additional bits as She wanted into the microstate of the hot water—for example, in degrees of freedom that are still unknown to physics—without the new bits having any effect on the water’s temperature.”

But we should’ve learned by now to doubt this sort of argument.  There’s no general principle, in our universe, saying that you can hide as many bits as you want in a physical object, without those bits influencing the object’s observable properties.  On the contrary, in case after case, our laws of physics seem to be intolerant of “wallflower bits,” which hide in a corner without talking to anyone.  If a bit is there, the laws of physics want it to affect other nearby bits and be affected by them in turn.

In the case of thermodynamics, the assumption that does all the real work here is that of equidistribution.  That is, whatever degrees of freedom might be available to your thermal system, your gas in a box or whatever, we assume that they’re all already “as randomized as they could possibly be,” subject to a few observed properties like temperature and volume and pressure.  (At least, we assume that in classical thermodynamics.  Non-equilibrium thermodynamics is a whole different can of worms, worms that don’t stay in equilibrium.)  Crucially, we assume this despite the fact that we might not even know all the relevant degrees of freedom.

Why is this assumption justified?  “Because experiment bears it out,” the physics teacher explains—but we can do better.  The assumption is justified because, as long as the degrees of freedom that we’re talking about all interact with each other, they’ve already had plenty of time to equilibrate.  And conversely, if a degree of freedom doesn’t interact with the stuff we’re observing—or with anything that interacts with the stuff we’re observing, etc.—well then, who cares about it anyway?

But now, because the microscopic laws of physics have the fundamental property of reversibility—that is, they never destroy information—a new bit has to go somewhere, and it can’t overwrite degrees of freedom that are already fully randomized.  This is why, if you pump more bits of information into a tub of hot water, while keeping it at the same volume, the new bits have nowhere to go except into pushing up the energy.  Now, there are often ways to push up the energy other than by raising the temperature—the concept of specific heat, in chemistry, is precisely about this—but if you need to stuff more bits into a substance, at the cost of raising its energy, certainly one of the obvious ways to do it is to describe a greater range of possible speeds for the water molecules.  So since that can happen, by equidistribution it typically does happen, which means that the molecules move faster on average, and your finger feels the water get hotter.


In summary, our laws of physics are structured in such a way that even pure information often has “nowhere to hide”: if the bits are there at all in the abstract machinery of the world, then they’re forced to pipe up and have a measurable effect.  And this is not a tautology, but comes about only because of nontrivial facts about special and general relativity, quantum mechanics, quantum field theory, and thermodynamics.  And this is what I think people should mean when they say “information is physical.”

Anyway, if this was all obvious to you, I apologize for having wasted your time!  But in my defense, it was never explained to me quite this way, nor was it sorted out in my head until recently—even though it seems like one of the most basic and general things one can possibly say about physics.


Endnotes. Thanks again to Daniel Harlow, not only for explaining the logic of the holographic bound to me but for several suggestions that improved this post.

Some readers might suspect circularity in the arguments we’ve made: are we merely saying that “any information that has observable physical consequences, has observable physical consequences”?  No, it’s more than that.  In all the examples I discussed, the magic was that we inserted certain information into our abstract mathematical description of the world, taking no care to ensure that the information’s presence would have any observable consequences whatsoever.  But then the principles of quantum mechanics, quantum gravity, or thermodynamics forced the information to be detectable in very specific ways (namely, via the destruction of quantum interference, the warping of spacetime, or the generation of heat respectively).

The universe has a high (but not infinite) Sleep Number

Friday, February 12th, 2016

As everyone knows, this was a momentous week in the history of science.  And I don’t need to tell you why: the STOC and CCC accepted paper lists finally came out.

Haha, kidding!  I meant, we learned this week that gravitational waves were directly detected for the first time, a hundred years after Einstein first predicted them (he then reneged on the prediction, then reinstated it, then reneged again, then reinstated it a second time—see Daniel Kennefick’s article for some of the fascinating story).

By now, we all know some of the basic parameters here: a merger of two black holes, ~1.3 billion light-years away, weighing ~36 and ~29 solar masses respectively, which (when they merged) gave off 3 solar masses’ worth of energy in the form of gravitational waves—in those brief 0.2 seconds, radiating more watts of power than all the stars in the observable universe combined.  By the time the waves reached earth, they were only stretching and compressing space by 1 part in 4×1021—thus, changing the lengths of the 4-kilometer arms of LIGO by 10-18 meters (1/1000 the diameter of a proton).  But this was detected, in possibly the highest-precision measurement ever made.

As I read the historic news, there’s one question that kept gnawing at me: how close would you need to have been to the merging black holes before you could, you know, feel the distortion of space?  I made a guess, assuming the strength of gravitational waves fell off with distance as 1/r2.  Then I checked Wikipedia and learned that the strength falls off only as 1/r, which completely changes the situation, and implies that the answer to my question is: you’d need to be very close.  Even if you were only as far from the black-hole cataclysm as the earth is from the sun, I get that you’d be stretched and squished by a mere ~50 nanometers (this interview with Jennifer Ouellette and Amber Stuver says 165 nanometers, but as a theoretical computer scientist, I try not to sweat factors of 3).  Even if you were 3000 miles from the black holes—New-York/LA distance—I get that the gravitational waves would only stretch and squish you by around a millimeter.  Would you feel that?  Not sure.  At 300 miles, it would be maybe a centimeter—though presumably the linearized approximation is breaking down by that point.  (See also this Physics StackExchange answer, which reaches similar conclusions, though again off from mine by factors of 3 or 4.)  Now, the black holes themselves were orbiting about 200 miles from each other before they merged.  So, the distance at which you could safely feel their gravitational waves, isn’t too far from the distance at which they’d rip you to shreds and swallow you!

In summary, to stretch and squeeze spacetime by just a few hundred nanometers per meter, along the surface of a sphere whose radius equals our orbit around the sun, requires more watts of power than all the stars in the observable universe give off as starlight.  People often say that the message of general relativity is that matter bends spacetime “as if it were a mattress.”  But they should add that the reason it took so long for humans to notice this, is that it’s a really friggin’ firm mattress, one that you need to bounce up and down on unbelievably hard before it quivers, and would probably never want to sleep on.

As if I needed to say it, this post is an invitation for experts to correct whatever I got wrong.  Public humiliation, I’ve found, is a very fast and effective way to learn an unfamiliar field.

The First Law of Complexodynamics

Friday, September 23rd, 2011

A few weeks ago, I had the pleasure of attending FQXi’s Setting Time Aright conference, part of which took place on a cruise from Bergen, Norway to Copenhagen, Denmark.  (Why aren’t theoretical computer science conferences ever held on cruises?  If nothing else, it certainly cuts down on attendees sneaking away from the conference venue.)  This conference brought together physicists, cosmologists, philosophers, biologists, psychologists, and (for some strange reason) one quantum complexity blogger to pontificate about the existence, directionality, and nature of time.  If you want to know more about the conference, check out Sean Carroll’s Cosmic Variance posts here and here.

Sean also delivered the opening talk of the conference, during which (among other things) he asked a beautiful question: why does “complexity” or “interestingness” of physical systems seem to increase with time and then hit a maximum and decrease, in contrast to the entropy, which of course increases monotonically?

My purpose, in this post, is to sketch a possible answer to Sean’s question, drawing on concepts from Kolmogorov complexity.  If this answer has been suggested before, I’m sure someone will let me know in the comments section.

First, some background: we all know the Second Law, which says that the entropy of any closed system tends to increase with time until it reaches a maximum value.  Here “entropy” is slippery to define—we’ll come back to that later—but somehow measures how “random” or “generic” or “disordered” a system is.  As Sean points out in his wonderful book From Eternity to Here, the Second Law is almost a tautology: how could a system not tend to evolve to more “generic” configurations?  if it didn’t, those configurations wouldn’t be generic!  So the real question is not why the entropy is increasing, but why it was ever low to begin with.  In other words, why did the universe’s initial state at the big bang contain so much order for the universe’s subsequent evolution to destroy?  I won’t address that celebrated mystery in this post, but will simply take the low entropy of the initial state as given.

The point that interests us is this: even though isolated physical systems get monotonically more entropic, they don’t get monotonically more “complicated” or “interesting.”  Sean didn’t define what he meant by “complicated” or “interesting” here—indeed, defining those concepts was part of his challenge—but he illustrated what he had in mind with the example of a coffee cup.  Shamelessly ripping off his slides:

Entropy increases monotonically from left to right, but intuitively, the “complexity” seems highest in the middle picture: the one with all the tendrils of milk.  And same is true for the whole universe: shortly after the big bang, the universe was basically just a low-entropy soup of high-energy particles.  A googol years from now, after the last black holes have sputtered away in bursts of Hawking radiation, the universe will basically be just a high-entropy soup of low-energy particles.  But today, in between, the universe contains interesting structures such as galaxies and brains and hot-dog-shaped novelty vehicles.  We see the pattern:

 

In answering Sean’s provocative question (whether there’s some “law of complexodynamics” that would explain his graph), it seems to me that the challenge is twofold:

  1. Come up with a plausible formal definition of “complexity.”
  2. Prove that the “complexity,” so defined, is large at intermediate times in natural model systems, despite being close to zero at the initial time and close to zero at late times.

To clarify: it’s not hard to explain, at least at a handwaving level, why the complexity should be close to zero at the initial time.  It’s because we assumed the entropy is close to zero, and entropy plausibly gives an upper bound on complexity.  Nor is it hard to explain why the complexity should be close to zero at late times: it’s because the system reaches equilibrium (i.e., something resembling the uniform distribution over all possible states), which we’re essentially defining to be simple.  At intermediate times, neither of those constraints is operative, and therefore the complexity could become large.  But does it become large?  How large?  How could we predict?  And what kind of “complexity” are we talking about, anyway?

After thinking on and off about these questions, I now conjecture that they can be answered using a notion called sophistication from the theory of Kolmogorov complexity.  Recall that the Kolmogorov complexity of a string x is the length of the shortest computer program that outputs x (in some Turing-universal programming language—the exact choice can be shown not to matter much).  Sophistication is a more … well, sophisticated concept, but we’ll get to that later.

As a first step, let’s use Kolmogorov complexity to define entropy.  Already it’s not quite obvious how to do that.  If you start, say, a cellular automaton, or a system of billiard balls, in some simple initial configuration, and then let it evolve for a while according to dynamical laws, visually it will look like the entropy is going up.  But if the system happens to be deterministic, then mathematically, its state can always be specified by giving (1) the initial state, and (2) the number of steps t it’s been run for.  The former takes a constant number of bits to specify (independent of t), while the latter takes log(t) bits.  It follows that, if we use Kolmogorov complexity as our stand-in for entropy, then the entropy can increase at most logarithmically with t—much slower than the linear or polynomial increase that we’d intuitively expect.

There are at least two ways to solve this problem.  The first is to consider probabilistic systems, rather than deterministic ones.  In the probabilistic case, the Kolmogorov complexity really does increase at a polynomial rate, as you’d expect.  The second solution is to replace the Kolmogorov complexity by the resource-bounded Kolmogorov complexity: the length of the shortest computer program that outputs the state in a short amount of time (or the size of the smallest, say, depth-3 circuit that outputs the state—for present purposes, it doesn’t even matter much what kind of resource bound we impose, as long as the bound is severe enough).  Even though there’s a computer program only log(t) bits long to compute the state of the system after t time steps, that program will typically use an amount of time that grows with t (or even faster), so if we rule out sufficiently complex programs, we can again get our program size to increase with t at a polynomial rate.

OK, that was entropy.  What about the thing Sean was calling “complexity”—which, to avoid confusion with other kinds of complexity, from now on I’m going to call “complextropy”?  For this, we’re going to need a cluster of related ideas that go under names like sophistication, Kolmogorov structure functions, and algorithmic statistics.  The backstory is that, in the 1970s (after introducing Kolmogorov complexity), Kolmogorov made an observation that was closely related to Sean’s observation above.  A uniformly random string, he said, has close-to-maximal Kolmogorov complexity, but it’s also one of the least “complex” or “interesting” strings imaginable.  After all, we can describe essentially everything you’d ever want to know about the string by saying “it’s random”!  But is there a way to formalize that intuition?  Indeed there is.

First, given a set S of n-bit strings, let K(S) be the number of bits in the shortest computer program that outputs the elements of S and then halts.  Also, given such a set S and an element x of S, let K(x|S) be the length of the shortest program that outputs x, given an oracle for testing membership in S.  Then we can let the sophistication of x, or Soph(x), be the smallest possible value of K(S), over all sets S such that

  1. x∈S and
  2. K(x|S) ≥ log2(|S|) – c, for some constant c.  (In other words, one can distill all the “nonrandom” information in x just by saying that x belongs that S.)

Intuitively, Soph(x) is the length of the shortest computer program that describes, not necessarily x itself, but a set S of which x is a “random” or “generic” member.  To illustrate, any string x with small Kolmogorov complexity has small sophistication, since we can let S be the singleton set {x}.  However, a uniformly-random string also has small sophistication, since we can let S be the set {0,1}n of all n-bit strings.  In fact, the question arises of whether there are any sophisticated strings!  Apparently, after Kolmogorov raised this question in the early 1980s, it was answered in the affirmative by Alexander Shen (for more, see this paper by Gács, Tromp, and Vitányi).  The construction is via a diagonalization argument that’s a bit too complicated to fit in this blog post.

But what does any of this have to do with coffee cups?  Well, at first glance, sophistication seems to have exactly the properties that we were looking for in a “complextropy” measure: it’s small for both simple strings and uniformly random strings, but large for strings in a weird third category of “neither simple nor random.”  Unfortunately, as we defined it above, sophistication still doesn’t do the job.  For deterministic systems, the problem is the same as the one pointed out earlier for Kolmogorov complexity: we can always describe the system’s state after t time steps by specifying the initial state, the transition rule, and t.  Therefore the sophistication can never exceed log(t)+c.  Even for probabilistic systems, though, we can specify the set S(t) of all possible states after t time steps by specifying the initial state, the probabilistic transition rule, and t.  And, at least assuming that the probability distribution over S(t) is uniform, by a simple counting argument the state after t steps will almost always be a “generic” element of S(t).  So again, the sophistication will almost never exceed log(t)+c.  (If the distribution over S(t) is nonuniform, then some technical further arguments are needed, which I omit.)

How can we fix this problem?  I think the key is to bring computational resource bounds into the picture.  (We already saw a hint of this in the discussion of entropy.)  In particular, suppose we define the complextropy of an n-bit string x to be something like the following:

the number of bits in the shortest computer program that runs in n log(n) time, and that outputs a nearly-uniform sample from a set S such that (i) x∈S, and (ii) any computer program that outputs x in n log(n) time, given an oracle that provides independent, uniform samples from S, has at least log2(|S|)-c bits, for some constant c.

Here n log(n) is just intended as a concrete example of a complexity bound: one could replace it with some other time bound, or a restriction to (say) constant-depth circuits or some other weak model of computation.  The motivation for the definition is that we want some “complextropy” measure that will assign a value close to 0 to the first and third coffee cups in the picture, but a large value to the second coffee cup.  And thus we consider the length of the shortest efficient computer program that outputs, not necessarily the target string x itself, but a sample from a probability distribution D such that x is not efficiently compressible with respect to D.  (In other words, x looks to any efficient algorithm like a “random” or “generic” sample from D.)

Note that it’s essential for this definition that we imposed a computational efficiency requirement in two places: on the sampling algorithm, and also on the algorithm that reconstructs x given the sampling oracle.  Without the first efficiency constraint, the complextropy could never exceed log(t)+c by the previous argument.  Meanwhile, without the second efficiency constraint, the complextropy would increase, but then it would probably keep right on increasing, for the following reason: a time-bounded sampling algorithm wouldn’t be able to sample from exactly the right set S, only a reasonable facsimile thereof, and a reconstruction algorithm with unlimited time could probably then use special properties of the target string x to reconstruct x with fewer than log2(|S|)-c bits.

But as long as we remember to put computational efficiency requirements on both algorithms, I conjecture that the complextropy will satisfy the “First Law of Complexodynamics,” exhibiting exactly the behavior that Sean wants: small for the initial state, large for intermediate states, then small again once the mixing has finished.  I don’t yet know how to prove this conjecture.  But crucially, it’s not a hopelessly open-ended question that one tosses out just to show how wide-ranging one’s thoughts are, but a relatively-bounded question about which actual theorems could be proved and actual papers published.

If you want to do so, the first step will be to “instantiate” everything I said above with a particular model system and particular resource constraints.  One good choice could be a discretized “coffee cup,” consisting of a 2D array of black and white pixels (the “coffee” and “milk”), which are initially in separated components and then subject to random nearest-neighbor mixing dynamics.  (E.g., at each time step, we pick an adjacent coffee pixel and milk pixel uniformly at random, and swap the two.)  Can we show that for such a system, the complextropy becomes large at intermediate times (intuitively, because of the need to specify the irregular boundaries between the regions of all-black pixels, all-white pixels, and mixed black-and-white pixels)?

One could try to show such a statement either theoretically or empirically.  Theoretically, I have no idea where to begin in proving it, despite a clear intuition that such a statement should hold: let me toss it out as a wonderful (I think) open problem!  At an empirical level, one could simply try to plot the complextropy in some simulated system, like the discrete coffee cup, and show that it has the predicted small-large-small behavior.   One obvious difficulty here is that the complextropy, under any definition like the one I gave, is almost certainly going to be intractable to compute or even approximate.  However, one could try to get around that problem the same way many others have, in empirical research inspired by Kolmogorov complexity: namely, by using something you can compute (e.g., the size of a gzip compressed file) as a rough-and-ready substitute for something you can’t compute (e.g., the Kolmogorov complexity K(x)).  In the interest of a full disclosure, a wonderful MIT undergrad, Lauren Oullette, recently started a research project with me where she’s trying to do exactly that.  So hopefully, by the end of the semester, we’ll be able to answer Sean’s question at least at a physics level of rigor!  Answering the question at a math/CS level of rigor could take a while longer.

PS (unrelated). Are neutrinos traveling faster than light?  See this xkcd strip (which does what I was trying to do in the Deolalikar affair, but better).

Physics for Doofuses: Why Beds Exist

Friday, September 3rd, 2010

I promised to blog more about research, and I will.  Unfortunately, in the one week between my world tour and the start of the fall semester, I’ve been spending less time on quantum complexity research than on sleeping on a new mattress that I bought.  This has provided ample time to ponder the following question, which I’ve decided to add to the Shtetl-Optimized Physics for Doofuses series:

Why is a soft bed more comfortable than a hard one?

At first glance, this question seems too doofusy even for a series such as this, which makes its target audience clear.  The trouble is that, while perfectly reasonable-sounding answers immediately suggest themselves, several of those answers can be shown to be wrong.

Let’s start with the most common answer: a soft bed is more comfortable than a hard bed because it molds to your shape.   The inadequacy of this answer can be seen by the following thought experiment: lie on a soft bed, and let it mold to your body.  Then imagine that the bed retains exactly the same molded shape, but is replaced by ceramic.  No longer so comfortable!

Ah, you reply, but that’s because a ceramic bed doesn’t change its shape as you shift positions throughout the night.  But this reply is still inadequate—since even if you’re lying as still as possible, it still seems clear that a soft bed is more comfortable than a hard one.

So it seems any answer needs to start from the observation that, even when you’re lying still, you’re not really lying still: you’re breathing in and out, there are tiny vibrations, etc.  The real point of a soft bed is to create a gentler potential well, which absorbs the shocks that would otherwise be caused by those sorts of small movements.

(I was tempted to say the point is to damp the movements, but that can’t be right: trampolines are designed for minimal damping, yet sleeping on a trampoline could actually be pretty comfortable.  So the essential thing a bed needs to do is simply to make way in response to small movements and vibrations.  How hard the bed tries to spring back to its original shape is a secondary question—the answer to which presumably influences, for example, whether you prefer an innerspring or a memory-foam mattress.)

So then why aren’t beds even softer than they are?  Well, the limit of infinite softness would be a bed that immediately collapsed to nothing when you lay on it, dropping you to the floor.  But even before that limit, a bed that was too soft would give you too much freedom to shift into awkward positions and thereby cause yourself back problems.  This suggests an answer to a question raised by a colleague: is the purpose of a bed to approximate, as well as possible on the earth’s surface, the experience of sleeping in zero gravity?  Unless I’m mistaken, the answer is no.  Sleeping in space would be like sleeping on a bed that was too soft, with the same potential for back problems and so forth.

Given that lying in bed is normally the least active thing we do, I find it ironic that the only reasons we lie in bed in the first place (as opposed to, say, on steel beams) are dynamical: they involve the way the bed responds to continual vibrations and movements.

I’ll be grateful if knowledgeable physicists, physiologists, or sleepers can correct any errors in the above account.  Meantime, the next time your spouse, partner, roommate, parent, etc. accuses you of lounging in bed all afternoon like a comatose dog, you can reply that nothing could be further from the truth: rather, inspired by a post on Shtetl-Optimized, you’re struggling to reconcile your modern understanding of the physics and biology of lying in bed with the prescientific, phenomenal experience of lying in bed, and thereby make yourself into a more enlightened human being.


How long could a black hole remain in the center of the earth?

Sunday, December 21st, 2008

The above question came up in conversation with Michael Vassar and some other nerds in New York City yesterday (before I went with relatives to see Gimpel Tam, an extraordinarily dark and depressing musical performed entirely in Yiddish).  Look, I know a massive black hole would swallow the earth extremely quickly, and I also know that a microscopic black hole would quickly evaporate as Hawking radiation.  So suppose we chose one of intermediate size so as to maximize the earth’s survival time—how long a time could we achieve?  (Does the answer depend on the viscosity of the magma or whatever else is in the earth’s core?)  Sure, I could try to calculate an answer myself, but why bother when so many physicists read this blog?  Pencils out!

Quantum Computing Since Democritus Lecture 20: Cosmology and Complexity

Thursday, August 21st, 2008

Come watch me attempt to explain the implications of a positive cosmological constant for computational complexity theory.  If this blog is about anything, it’s about me talking about subjects I don’t understand sufficiently well and thereby making a fool of myself.  But it’s also about experts taking the time to correct me.  The latter is the primary saving grace.

Physics for Doofuses: Mass vs. charge deathmatch

Sunday, July 15th, 2007

Back in high school, I was struck by the apparent symmetry between mass and charge. For the one you’ve got Newton’s F=Gm1m2/r2, for the other you’ve got Coulomb’s F=Kq1q2/r2. So then why, in our current understanding of the universe, are mass and charge treated so differently? Why should one be inextricably linked to the geometry of spacetime, whereas the other seems more like an add-on? Why should it be so much harder to give a quantum-mechanical treatment of one than the other? Notwithstanding that such questions occupied Einstein for the last decades of his life, let’s plunge ahead.

When we look for differences between mass and charge, we immediately notice several.

(1) Charge can be negative whereas mass can’t.

That’s why gravity is always attractive, whereas the Coulomb force is both attractive and repulsive. Since positive and negative charges tend to neutralize each other, this already explains why gravity is relevant to the large-scale structure of the universe while electromagnetism isn’t. It also explains why there can’t be any “charge black holes” analogous to gravitational black holes. (I don’t mean charged black holes; I mean “black holes” that are black because of electric charge.) Unfortunately, it still doesn’t explain why mass should be related to the geometry of spacetime.

(2) Charge appears to be quantized (coming in units of 1/3 of an electron charge), whereas mass appears not to be quantized, at least not in units we know.

(3) The apparent mass of a moving object increases Lorentzianly, whereas the charge is invariant.

These are interesting differences, but they also don’t seem to get us anywhere.

(4) Gravity is “many orders of magnitude weaker” than electromagnetism.

One hears this statement often; the trouble is, what does it mean? How does one compare the “intrinsic” strength of gravity and electromagnetism, without plugging in the masses and charges of typical particles that we happen to find in the universe? (Help me.)

(5) Gravity is transmitted by a spin-2 particle, whereas electromagnetism is transmitted by a spin-1 particle.

This difference is surely crucial; the trouble with it (to use a pomo word) is that it’s too “theory-laden.” Since no one has ever seen a graviton, the reason we know gravitons are spin-2 particles in the first place must have to do with more “basic” properties of gravity. So if we want a non-circular explanation for why gravity is different from the Coulomb force, it’d be better to phrase the explanation directly in terms of the more basic properties.

(6) Charge shows up in only one fundamental equation of physics — F=Kq1q2/r2 — whereas mass shows up in two equations: F=Gm1m2/r2 and F=ma.

Now we finally seem to be getting somewhere. Difference (6) was the basis for Einstein’s equivalence principle, which was one of the main steps on the road to general relativity.

But while the equivalence principle suggests the possibility of relating mass to spacetime geometry, I could never understand why it implies the necessity of doing so. If we wanted, why couldn’t we simply regard the equivalence of gravitational and inertial mass as a weird coincidence? Why are we forced to take the drastic step of making spacetime itself into a pseudo-Riemannian manifold?

The answer seems to be that we’re not! It’s possible to treat general relativity as just a complicated field theory on flat spacetime, involving a tensor at every point — and indeed, this is a perspective that both Feynman and Weinberg famously adopted at various times. It’s just that most people see it as simpler, more parsimonious, to interpret the tensors geometrically.

So the real question is: why should the field theory of Gmm/r2 involve these complicated tensors (which also turn out to be hard to quantize), whereas the field theory of Kqq/r2 is much simpler and easier to quantize?

After studying this question assiduously for years (alright, alright — I Googled it), I came across the following point, which struck me as the crucial one:

(7) Whereas the electric force is mediated by photons, which don’t themselves carry charge, the gravitational force is mediated by gravitons, which do themselves carry energy.

Photons sail past each other, ships passing in the night. They’re too busy tugging on the charges in the universe even to notice each other’s presence. (Indeed, this is why it’s so hard to build a quantum computer with photons as qubits, despite photons’ excellent coherence times.) Gravitons, by contrast, are constantly tugging at the matter in the universe and at each other. This is why Maxwell’s equations are linear whereas Einstein’s are nonlinear — and that, in turn, is related to why Einstein’s are so much harder than Maxwell’s to quantize.

When I ran this explanation by non-doofus friends like Daniel Gottesman, they immediately pointed out that I’ve ignored the strong nuclear force — which, while it’s also nonlinear, turns out to be possible to quantize in certain energy regimes, using the hack called “renormalization.” Incidentally, John Preskill told me that this hack only works in 3+1 dimensions: if spacetime were 5-dimensional, then the strong force wouldn’t be renormalizable either. And in the other direction, if spacetime were 3-dimensional, then gravity would become a topological theory that we do sort of know how to quantize.

However, I see no reason to let these actual facts mar our tidy explanation. Think of it this way: if electromagnetism (being linear) is in P and gravity (being nonlinear) is NP-complete, then the strong force is Graph Isomorphism.

My physicist friends were at least willing to concede to me that, while the explanation I’ve settled on is not completely right, it’s not completely wrong either. And that, my friends, means that it more than meets the standards of the Physics for Doofuses series.

Physics for Doofuses: Understanding Electricity

Sunday, April 15th, 2007

Welcome to an occasional new Shtetl-Optimized series, where physicists get to amuse themselves by watching me struggle to understand the most basic concepts of their discipline. I’ll consider my post on black hole singularities to be retroactively part of this series.

Official motto: “Because if I talked about complexity, you wouldn’t understand it.”

Unofficial motto: “Because if I talked about climate change, I’d start another flamewar — and as much as I want to save civilization, I want even more for everyone to like me.”

Today’s topic is Understanding Electricity. First of all, what makes electricity confusing? Well, besides electricity’s evil twin magnetism (which we’ll get to another time), what makes it confusing is that there are six things to keep track of: charge, current, energy, power, voltage, and resistance, which are measured respectively in coulombs, amps, joules, watts, volts, and ohms. And I mean, sure you can memorize formulas for these things, but what are they, in terms of actual electrons flowing through a wire?

Alright, let’s take ’em one by one.

Charge is the q in kqq/r2. Twice as many electrons, twice as much charge. ‘Nuff said.

Current is charge per unit time. It’s how many electrons are flowing through a cross-section of the wire every second. If you’ve got 100 amps coming out, you can send 50 this way and 50 that way, or π this way and 100-π that way, etc.

Energy … Alright, even I know this one. Energy is what we fight wars to liberate. In our case, if you have a bunch of electrons going through a wire, then the energy scales like the number of electrons times the speed of the electrons squared.

Power is energy per unit time: how much energy does your appliance consume every second? Duh, that’s why a 60-watt light bulb is environmentally-friendlier than a 100-watt bulb.

Voltage is the first one I had trouble with back in freshman physics. It’s energy per charge, or power per current. Intuitively, voltage measures how much energy gets imparted to each individual electron. Thus, if you have a 110-volt hairdryer and you plug it into a 220-volt outlet, then the trouble is that the electrons have twice as much energy as the hairdryer expects. This is what transformers are for: to ramp voltages up and down.

Incidentally, the ability to transform voltages is related to why what comes out of your socket is alternating current (AC) instead of direct current (DC). AC, of course, is the kind where the electrons switch direction 60 times or so per second, while DC is the kind where they always flow in the same direction. For computers and other electronics, you clearly want DC, since logic gates are unidirectional. And indeed, the earliest power plants did transmit DC. In the 1890’s, Thomas Edison fought vigorously against the adoption of AC, going so far as to electrocute dogs, horses, and even an elephant using AC in order to “prove” that it was unsafe. (These demonstrations proved about as much as D-Wave’s quantum computer — since needless to say, one can also electrocute elephants using DC. To draw any conclusions a comparative study is needed.)

So why did AC win? Because it turns out that it’s not practical to transmit DC over distances of more than about a mile. The reason is this: the longer the wire, the more power gets lost along the way. On the other hand, the higher the voltage, the less power gets lost along the way. This means that if you want to send power over a long wire and have a reasonable amount of it reach its destination, then you want to transmit at high voltages. But high voltages are no good for household appliances, for safety and other reasons. So once the power gets close to its destination, you want to convert back down to lower voltages.

Now, the simplest way to convert high voltages to low ones was discovered by Michael Faraday, and relies on the principle of electromagnetic induction. This is the principle according to which a changing electric current creates a changing magnetic field, which can in turn be used to drive another current. (Damn, I knew we wouldn’t get far without bumping into electricity’s evil and confusing magnetwin.) And that gives us a simple way to convert one voltage to another — analogous to using a small, quickly-rotating gear to drive a big, slowly-rotating gear.

So to make a long story short: while in principle it’s possible to convert voltages with DC, it’s more practical to do it with AC. And if you don’t convert voltages, then you can only transmit power for about a mile — meaning that you’d have to build millions of tiny power plants, unless you only cared about urban centers like New York.

Resistance is the trickiest of the six concepts. Basically, resistance is the thing you need to cut in half, if you want to send twice as much current through a wire at the same voltage. If you have two appliances hooked up serially, the total resistance is the sum of the individual resistances: Rtot = R1 + R2. On the other hand, if you have two appliances hooked up in parallel, the reciprocal of the total resistance is the sum of the reciprocals of the individual resistances: 1/Rtot = 1/R1 + 1/R2. If you’re like me, you’ll immediately ask: why should resistance obey these identities? Or to put it differently, why should the thing that obeys one or both of these identities be resistance, defined as voltage divided by current?

Well, as it turns out, the identities don’t always hold. That they do in most cases of interest is just an empirical fact, called Ohm’s Law. I suspect that much confusion could be eliminated in freshman physics classes, were it made clear that there’s nothing obvious about this “Law”: a new physical assumption is being introduced. (Challenge for commenters: can you give me a handwaving argument for why Ohm’s Law should hold? The rule is that your argument has to be grounded in terms of what the actual electrons in a wire are doing.)

Here are some useful formulas that follow from the above discussion:

Power = Voltage2/Resistance = Current2 x Resistance = Voltage x Current
Voltage = Power/Current = Current x Resistance = √(Power x Resistance)
Resistance = Voltage/Current = Power/Current2 = Voltage2/Power
Current = Power/Voltage = Voltage/Resistance = √(Power/Resistance)

Understand? Really? Take the test!

Update (4/16): Chad Orzel answers my question about Ohm’s Law.

The event horizon’s involved, but the singularity is committed

Thursday, March 22nd, 2007

Lenny Susskind — the Stanford string theorist who Shtetl-Optimized readers will remember from this entry — is currently visiting Perimeter Institute to give a fascinating series of lectures on “Black Holes and Holography.”

After this morning’s lecture (yes, I’m actually getting up at 10am for them), the following question occurred to me: what’s the connection between a black hole having an event horizon and its having a singularity? In other words, once you’ve clumped enough stuff together that light can’t escape, why have you also clumped enough together to create a singularity? I know there’s a physics answer; what I’m looking for is a conceptual answer.

Of course, one direction of the correspondence — that you can’t have a singularity without also having an event horizon — is the famous Cosmic Censorship Hypothesis popularized by Hawking. But what about the other direction?

When I posed this question at lunch, Daniel Gottesman patiently explained to me that singularities and event horizons just sort of go together, “like bacon and eggs.” However, this answer was unsatisfying to me for several reasons — one of them being that, with my limited bacon experience, I don’t know why bacon and eggs go together. (I have eaten eggs with turkey bacon, but I wouldn’t describe their combined goodness as greater than the sum of their individual goodnesses.)

So then Daniel gave me a second answer, which, by the time it lodged in my brain, had morphed itself into the following. By definition, an event horizon is a surface that twists the causal structure in its interior, so that none of the geodesics (paths taken by light rays) lead outside the horizon. But geodesics can’t just stop: assuming there are no closed timelike curves, they have to either keep going forever or else terminate at a singularity. In particular, if you take a causal structure that “wants” to send geodesics off to infinity, and shoehorn it into a finite box (as you do when creating a black hole), the causal structure gets very, very angry — so much so that it has to “vent its anger” somewhere by forming a singularity!

Of course this can’t be the full explanation, since why can’t the geodesics just circle around forever? But if it’s even slightly correct, then it makes me much happier. The reason is that it reminds me of things I already know, like the hairy ball theorem (there must be a spot on the Earth’s surface where the wind isn’t blowing), or Cauchy’s integral theorem (if the integral around a closed curve in the complex plane is nonzero, then there must be a singularity in the middle), or even the Nash equilibrium theorem. In each of these cases, you take a geometric structure with some global property, and then deduce that having that property makes the structure “angry,” so that it needs a special point (a singularity, an equilibrium, or whatever) to blow off some steam.

So, question for the relativistas: is there a theorem in GR anything like my beautiful story, or am I just talking out of my ass as usual?

Update (3/22): Well, it turns out that I was ignorantly groping toward the famous Penrose-Hawking singularity theorems. Thanks to Dave Bacon, Sean Carroll, and ambitwistor for immediately pointing this out.