Higher-level causation exists (but I wish it didn’t)

Unrelated Update (June 6): It looks like the issues we’ve had with commenting have finally been fixed! Thanks so much to Christie Wright and others at WordPress Concierge Services for handling this. Let me know if you still have problems. In the meantime, I also stopped asking for commenters’ email addresses (many commenters filled that field with nonsense anyway).  Oops, that ended up being a terrible idea, because it made commenting impossible!  Back to how it was before.

Update (June 5): Erik Hoel was kind enough to write a 5-page response to this post (Word .docx format), and to give me permission to share it here.  I might respond to various parts of it later.  For now, though, I’ll simply say that I stand by what I wrote, and that requiring the macro-distribution to arise by marginalizing the micro-distribution still seems like the correct choice to me (and is what’s assumed in, e.g., the proof of the data processing inequality).  But I invite readers to read my post along with Erik’s response, form their own opinions, and share them in the comments section.

This past Thursday, Natalie Wolchover—a math/science writer whose work has typically been outstanding—published a piece in Quanta magazine entitled “A Theory of Reality as More Than the Sum of Its Parts.”  The piece deals with recent work by Erik Hoel and his collaborators, including Giulio Tononi (Hoel’s adviser, and the founder of integrated information theory, previously critiqued on this blog).  Commenter Jim Cross asked me to expand on my thoughts about causal emergence in a blog post, so: your post, monsieur.

In their new work, Hoel and others claim to make the amazing discovery that scientific reductionism is false—or, more precisely, that there can exist “causal information” in macroscopic systems, information relevant for predicting the systems’ future behavior, that’s not reducible to causal information about the systems’ microscopic building blocks.  For more about what we’ll be discussing, see Hoel’s FQXi essay “Agent Above, Atom Below,” or better yet, his paper in Entropy, When the Map Is Better Than the Territory.  Here’s the abstract of the Entropy paper:

The causal structure of any system can be analyzed at a multitude of spatial and temporal scales. It has long been thought that while higher scale (macro) descriptions may be useful to observers, they are at best a compressed description and at worse leave out critical information and causal relationships. However, recent research applying information theory to causal analysis has shown that the causal structure of some systems can actually come into focus and be more informative at a macroscale. That is, a macroscale description of a system (a map) can be more informative than a fully detailed microscale description of the system (the territory). This has been called “causal emergence.” While causal emergence may at first seem counterintuitive, this paper grounds the phenomenon in a classic concept from information theory: Shannon’s discovery of the channel capacity. I argue that systems have a particular causal capacity, and that different descriptions of those systems take advantage of that capacity to various degrees. For some systems, only macroscale descriptions use the full causal capacity. These macroscales can either be coarse-grains, or may leave variables and states out of the model (exogenous, or “black boxed”) in various ways, which can improve the efficacy and informativeness via the same mathematical principles of how error-correcting codes take advantage of an information channel’s capacity. The causal capacity of a system can approach the channel capacity as more and different kinds of macroscales are considered. Ultimately, this provides a general framework for understanding how the causal structure of some systems cannot be fully captured by even the most detailed microscale description.

Anyway, Wolchover’s popular article quoted various researchers praising the theory of causal emergence, as well as a single inexplicably curmudgeonly skeptic—some guy who sounded like he was so off his game (or maybe just bored with debates about ‘reductionism’ versus ’emergence’?), that he couldn’t even be bothered to engage the details of what he was supposed to be commenting on.

Hoel’s ideas do not impress Scott Aaronson, a theoretical computer scientist at the University of Texas, Austin. He says causal emergence isn’t radical in its basic premise. After reading Hoel’s recent essay for the Foundational Questions Institute, “Agent Above, Atom Below” (the one that featured Romeo and Juliet), Aaronson said, “It was hard for me to find anything in the essay that the world’s most orthodox reductionist would disagree with. Yes, of course you want to pass to higher abstraction layers in order to make predictions, and to tell causal stories that are predictively useful — and the essay explains some of the reasons why.”

After the Quanta piece came out, Sean Carroll tweeted approvingly about the above paragraph, calling me a “voice of reason [yes, Sean; have I ever not been?], slapping down the idea that emergent higher levels have spooky causal powers.”  Then Sean, in turn, was criticized for that remark by Hoel and others.

Hoel in particular raised a reasonable-sounding question.  Namely, in my “curmudgeon paragraph” from Wolchover’s article, I claimed that the notion of “causal emergence,” or causality at the macro-scale, says nothing fundamentally new.  Instead it simply reiterates the usual worldview of science, according to which

  1. the universe is ultimately made of quantum fields evolving by some Hamiltonian, but
  2. if someone asks (say) “why has air travel in the US gotten so terrible?”, a useful answer is going to talk about politics or psychology or economics or history rather than the movements of quarks and leptons.

But then, Hoel asks, if there’s nothing here for the world’s most orthodox reductionist to disagree with, then how do we find Carroll and other reductionists … err, disagreeing?

I think this dilemma is actually not hard to resolve.  Faced with a claim about “causation at higher levels,” what reductionists disagree with is not the object-level claim that such causation exists (I scratched my nose because it itched, not because of the Standard Model of elementary particles).  Rather, they disagree with the meta-level claim that there’s anything shocking about such causation, anything that poses a special difficulty for the reductionist worldview that physics has held for centuries.  I.e., they consider it true both that

  1. my nose is made of subatomic particles, and its behavior is in principle fully determined (at least probabilistically) by the quantum state of those particles together with the laws governing them, and
  2. my nose itched.

At least if we leave the hard problem of consciousness out of it—that’s a separate debate—there seems to be no reason to imagine a contradiction between 1 and 2 that needs to be resolved, but “only” a vast network of intervening mechanisms to be elucidated.  So, this is how it is that reductionists can find anti-reductionist claims to be both wrong and vacuously correct at the same time.

(Incidentally, yes, quantum entanglement provides an obvious sense in which “the whole is more than the sum of its parts,” but even in quantum mechanics, the whole isn’t more than the density matrix, which is still a huge array of numbers evolving by an equation, just different numbers than one would’ve thought a priori.  For that reason, it’s not obvious what relevance, if any, QM has to reductionism versus anti-reductionism.  In any case, QM is not what Hoel invokes in his causal emergence theory.)

From reading the philosophical parts of Hoel’s papers, it was clear to me that some remarks like the above might help ward off the forehead-banging confusions that these discussions inevitably provoke.  So standard-issue crustiness is what I offered Natalie Wolchover when she asked me, not having time on short notice to go through the technical arguments.

But of course this still leaves the question: what is in the mathematical part of Hoel’s Entropy paper?  What exactly is it that the advocates of causal emergence claim provides a new argument against reductionism?

To answer that question, yesterday I (finally) read the Entropy paper all the way through.

Much like Tononi’s integrated information theory was built around a numerical measure called Φ, causal emergence is built around a different numerical quantity, this one supposed to measure the amount of “causal information” at a particular scale.  The measure is called effective information or EI, and it’s basically the mutual information between a system’s initial state sI and its final state sF, assuming a uniform distribution over sI.  Much like with Φ in IIT, computations of this EI are then used as the basis for wide-ranging philosophical claims—even though EI, like Φ, has aspects that could be criticized as arbitrary, and as not obviously connected with what we’re trying to understand.

Once again like with Φ, one of those assumptions is that of a uniform distribution over one of the variables, sI, whose relatedness we’re trying to measure.  In my IIT post, I remarked on that assumption, but I didn’t harp on it, since I didn’t see that it did serious harm, and in any case my central objection to Φ would hold regardless of which distribution we chose.  With causal emergence, by contrast, this uniformity assumption turns out to be the key to everything.

For here is the argument from the Entropy paper, for the existence of macroscopic causality that’s not reducible to causality in the underlying components.  Suppose I have a system with 8 possible states (called “microstates”), which I label 1 through 8.  And suppose the system evolves as follows: if it starts out in states 1 through 7, then it goes to state 1.  If, on the other hand, it starts in state 8, then it stays in state 8.  In such a case, it seems reasonable to “coarse-grain” the system, by lumping together initial states 1 through 7 into a single “macrostate,” call it A, and letting the initial state 8 comprise a second macrostate, call it B.

We now ask: how much information does knowing the system’s initial state tell you about its final state?  If we’re talking about microstates, and we let the system start out in a uniform distribution over microstates 1 through 8, then 7/8 of the time the system goes to state 1.  So there’s just not much information about the final state to be predicted—specifically, only 7/8×log2(8/7) + 1/8×log2(8) ≈ 0.54 bits of entropy—which, in this case, is also the mutual information between the initial and final microstates.  If, on the other hand, we’re talking about macrostates, and we let the system start in a uniform distribution over macrostates A and B, then A goes to A and B goes to B.  So knowing the initial macrostate gives us 1 full bit of information about the final state, which is more than the ~0.54 bits that looking at the microstate gave us!  Ergo reductionism is false.

Once the argument is spelled out, it’s clear that the entire thing boils down to, how shall I put this, a normalization issue.  That is: we insist on the uniform distribution over microstates when calculating microscopic EI, and we also insist on the uniform distribution over macrostates when calculating macroscopic EI, and we ignore the fact that the uniform distribution over microstates gives rise to a non-uniform distribution over macrostates, because some macrostates can be formed in more ways than others.  If we fixed this, demanding that the two distributions be compatible with each other, we’d immediately find that, surprise, knowing the complete initial microstate of a system always gives you at least as much power to predict the system’s future as knowing a macroscopic approximation to that state.  (How could it not?  For given the microstate, we could in principle compute the macroscopic approximation for ourselves, but not vice versa.)

The closest the paper comes to acknowledging the problem—i.e., that it’s all just a normalization trick—seems to be the following paragraph in the discussion section:

Another possible objection to causal emergence is that it is not natural but rather enforced upon a system via an experimenter’s application of an intervention distribution, that is, from using macro-interventions.  For formalization purposes, it is the experimenter who is the source of the intervention distribution, which reveals a causal structure that already exists.  Additionally, nature itself may intervene upon a system with statistical regularities, just like an intervention distribution.  Some of these naturally occurring input distributions may have a viable interpretation as a macroscale causal model (such as being equal to Hmax [the maximum entropy] at some particular macroscale).  In this sense, some systems may function over their inputs and outputs at a microscale or macroscale, depending on their own causal capacity and the probability distribution of some natural source of driving input.

As far as I understand it, this paragraph is saying that, for all we know, something could give rise to a uniform distribution over macrostates, so therefore that’s a valid thing to look at, even if it’s not what we get by taking a uniform distribution over microstates and then coarse-graining it.  Well, OK, but unknown interventions could give rise to many other distributions over macrostates as well.  In any case, if we’re directly comparing causal information at the microscale against causal information at the macroscale, it still seems reasonable to me to demand that in the comparison, the macro-distribution arise by coarse-graining the micro one.  But in that case, the entire argument collapses.

Despite everything I said above, the real purpose of this post is to announce that I’ve changed my mind.  I now believe that, while Hoel’s argument might be unsatisfactory, the conclusion is fundamentally correct: scientific reductionism is false.  There is higher-level causation in our universe, and it’s 100% genuine, not just a verbal sleight-of-hand.  In particular, there are causal forces that can only be understood in terms of human desires and goals, and not in terms of subatomic particles blindly bouncing around.

So what caused such a dramatic conversion?

By 2015, after decades of research and diplomacy and activism and struggle, 196 nations had finally agreed to limit their carbon dioxide emissions—every nation on earth besides Syria and Nicaragua, and Nicaragua only because it thought the agreement didn’t go far enough.  The human race had thereby started to carve out some sort of future for itself, one in which the oceans might rise slowly enough that we could adapt, and maybe buy enough time until new technologies were invented that changed the outlook.  Of course the Paris agreement fell far short of what was needed, but it was a start, something to build on in the coming decades.  Even in the US, long the hotbed of intransigence and denial on this issue, 69% of the public supported joining the Paris agreement, compared to a mere 13% who opposed.  Clean energy was getting cheaper by the year.  Most of the US’s largest corporations, including Google, Microsoft, Apple, Intel, Mars, PG&E, and ExxonMobil—ExxonMobil, for godsakes—vocally supported staying in the agreement and working to cut their own carbon footprints.  All in all, there was reason to be cautiously optimistic that children born today wouldn’t live to curse their parents for having brought them into a world so close to collapse.

In order to unravel all this, in order to steer the heavy ship of destiny off the path toward averting the crisis and toward the path of existential despair, a huge number of unlikely events would need to happen in succession, as if propelled by some evil supernatural force.

Like what?  I dunno, maybe a fascist demagogue would take over the United States on a campaign based on willful cruelty, on digging up and burning dirty fuels just because and even if it made zero economic sense, just for the fun of sticking it to liberals, or because of the urgent need to save the US coal industry, which employs fewer people than Arby’s.  Such a demagogue would have no chance of getting elected, you say?

So let’s suppose he’s up against a historically unpopular opponent.  Let’s suppose that even then, he still loses the popular vote, but somehow ekes out an Electoral College win.  Maybe he gets crucial help in winning the election from a hostile foreign power—and for some reason, pro-American nationalists are totally OK with that, even cheer it.  Even then, we’d still probably need a string of additional absurd coincidences.  Like, I dunno, maybe the fascist’s opponent has an aide who used to be married to a guy who likes sending lewd photos to minors, and investigating that guy leads the FBI to some emails that ultimately turn out to mean nothing whatsoever, but that the media hyperventilate about precisely in time to cause just enough people to vote to bring the fascist to power, thereby bringing about the end of the world.  Something like that.

It’s kind of like, you know that thing where the small population in Europe that produced Einstein and von Neumann and Erdös and Ulam and Tarski and von Karman and Polya was systematically exterminated (along with millions of other innocents) soon after it started producing such people, and the world still hasn’t fully recovered?  How many things needed to go wrong for that to happen?  Obviously you needed Hitler to be born, and to survive the trenches and assassination plots; and Hindenburg to make the fateful decision to give Hitler power.  But beyond that, the world had to sleep as Germany rebuilt its military; every last country had to turn away refugees; the UK had to shut down Jewish immigration to Palestine at exactly the right time; newspapers had to bury the story; government record-keeping had to have advanced just to the point that rounding up millions for mass murder was (barely) logistically possible; and finally, the war had to continue long enough for nearly every European country to have just enough time to ship its Jews to their deaths, before the Allies showed up to liberate mostly the ashes.

In my view, these simply aren’t the sort of outcomes that you expect from atoms blindly interacting according to the laws of physics.  These are, instead, the signatures of higher-level causation—and specifically, of a teleological force that operates in our universe to make it distinctively cruel and horrible.

Admittedly, I don’t claim to know the exact mechanism of the higher-level causation.  Maybe, as the physicist Yakir Aharonov has advocated, our universe has not only a special, low-entropy initial state at the Big Bang, but also a “postselected final state,” toward which the outcomes of quantum measurements get mysteriously “pulled”—an effect that might show up in experiments as ever-so-slight deviations from the Born rule.  And because of the postselected final state, even if the human race naïvely had only (say) a one-in-thousand chance of killing itself off, even if the paths to its destruction all involved some improbable absurdity, like an orange clown showing up from nowhere—nevertheless, the orange clown would show up.  Alternatively, maybe the higher-level causation unfolds through subtle correlations in the universe’s initial state, along the lines I sketched in my 2013 essay The Ghost in the Quantum Turing Machine.  Or maybe Erik Hoel is right after all, and it all comes down to normalization: if we looked at the uniform distribution over macrostates rather than over microstates, we’d discover that orange clowns destroying the world predominated.  Whatever the details, though, I think it can no longer be doubted that we live, not in the coldly impersonal universe that physics posited for centuries, but instead in a tragicomically evil one.

I call my theory reverse Hollywoodism, because it holds that the real world has the inverse of the typical Hollywood movie’s narrative arc.  Again and again, what we observe is that the forces of good have every possible advantage, from money to knowledge to overwhelming numerical superiority.  Yet somehow good still fumbles.  Somehow a string of improbable coincidences, or a black swan or an orange Hitler, show up at the last moment to let horribleness eke out a last-minute victory, as if the world itself had been rooting for horribleness all along.  That’s our universe.

I’m fine if you don’t believe this theory: maybe you’re congenitally more optimistic than I am (in which case, more power to you); maybe the full weight of our universe’s freakish awfulness doesn’t bear down on you as it does on me.  But I hope you’ll concede that, if nothing else, this theory is a genuinely non-reductionist one.

160 Responses to “Higher-level causation exists (but I wish it didn’t)”

  1. Sniffnoy Says:

    So, this is how it is that reductionists can find anti-reductionist claims to be both wrong and vacuously correct at the same time.

    A state of affairs so common that some combination of Miriam Weizenbaum and Daniel Dennett coined a word for it. 🙂

  2. Daniel Says:

    You know given the times Scott, I think that we need to be more careful about strong declarations which are “obviously not meant to be taken seriously”. There are so many people seriously saying things which in any reasonable world would merit that description that I think we have a responsibility to at least clearly state at the end whether we actually believe what we are saying or not…

  3. Kavi Gupta Says:

    I was recently thinking of an idea for a movie plot. In 2018, a weird bug involving two presidents with the same last name but different first names results in accidental nuclear anhilation. A team of time travelers then attempts to change the past to ensure that President Clinton isn’t elected in 2016. However, all their attempts fail, since they end up electing Jeb Bush (same bug triggered). They then try to prop up Bernie Sanders, but are unable to make that work. Finally, they do their last resort plan, and make a bunch of minor modifications to history to ensure Trump’s election.

    Why not just fix the bug? Working with legacy code is the worst.

  4. Scott Says:

    Daniel #2: It’s funny; I’d been thinking the exact opposite—that given the times, clearly stating whether we actually believe what we’re saying is now evidently passé! I found it remarkable that, in the thug’s entire speech pulling the US out of the Paris accord, he never said a word to indicate whether he considered it true or false that human CO2 emissions were dangerously changing the climate. I’d expected a denial; instead I got a transcendence of the entire category of truth.

  5. Shecky R Says:

    Bravo!! (…though perhaps my atoms & quarks merely force me to say that).

  6. Daniel Says:

    Scott #4: I don’t think this is the standard you want to emulate, especially since the first part of your post was actually serious.

    In case the obvious flaw in the second part isn’t clear to any one, it isn’t that Scott is too pessimistic, rather it is that he is much too optimistic in the assumptions (that people are basically good and things are usually on the right track)which lead to his silly conclusion!

  7. Paul Chapman Says:

    If I am more optimistic, perhaps it’s because I see this time as the beginning of history, not the end of it. Recorded history only stretches back a few thousand years, in a universe billions of years old (during which time, from a human perspective, ‘coldly impersonal’ is as literal a description as one could accurately give).

    And a Hollywood film tells a whole story, while history is currently no further than somewhere into the second act — and quite possibly only a minute from the start of the film, where some horrible explosion kills some nameless people before the protagonist has even appeared. (My protagonist here is not a Messiah, but a hoped-for fundamental decency of our species, and the final triumph of a highly abstracted will to survive.)

    I also know just enough history (but admittedly perhaps a dangerously small amount) to see that progess has been made, and is being made (I’m sure you’re aware of Pinker’s work). There have been some serious setbacks which you appear to ascribe to a possible ‘postselected final state’, and there are almost certainly far worse setbacks to come (the first ‘terrorist’ use of a nuclear device in a city, for example), but we haven’t yet collected anywhere near enough evidence to give us 1 sigma of confidence in the pessimistic conjecture, let alone 5.

    I have an aphorism: every act of evil so far perpetrated on and by the human race will one day be surpassed (and then surpassed again), but so will every soaring human achievement.

    Cheers, Paul

  8. Joshua Zelinsky Says:

    Without commenting on the specifics of causal emergence as defined here, my preferred mental analog for high level causes v. low level reductionism is something like this: Suppose you are doing math and you want to construct a general foundation for most of the math you want to do. One option is to use set theory, say ZFC. Another option is to use something else, say topos theory or category theory as your fundamental objects. Now, suppose someone else just cares about understanding the integers; the vast majority of the time, they won’t care what their foundation underlying is. In this analogy, the high-level causes are like the integers, while the low level reduction is the underlying foundations.

    This analogy is not perfect; it turns out that for some purposes, your foundations do matter for some statements. And unlike with reductionism, the mathematician has a choice what foundation they pick. There is however one similarity there; we can conceive of their being different hypothetical universes which macroscopically for human purposes look identical but have different ultimate reductions(e.g. maybe one runs on string theory and another does not, or maybe they both uses strings but with slightly different parameters). However, here too the analogy isn’t perfect since some physicists speculate that if we really understood things well enough, then we might understand why the laws of physics had only one option.

    As for the issue of climate change, I’m reminded of the MLK quote about the arc of history. Unfortunately, there very likely isn’t any deity looking out for people, so while a comforting idea, it isn’t one we can take for granted. If the arc of history is to bend towards justice and sanity, we must exert that effort ourselves.

    There are a number of charities one can donate to that will help with climate change in a variety of ways. Two of my preferred ones are Everybody Solar http://www.everybodysolar.org/ which helps solar panels for nonprofits like homeless shelters and science museums, and the Solar Electric Light Fund http://self.org/ which helps get solar panels to power locations in the developing world. Every little bit helps. If not now, when?

  9. Sniffnoy Says:

    If I had to defend the strange normalization: The paragraph you quote makes it sounds like the uniform distribution on macrostates is coming from people deliberately doing experiments. That is to say, in your example, a human experimenter doesn’t realize that macrostate A is made up of various microstates, or that it’s made up of more microstates than B, so when they run an experiment on the results of setting up A and the results of setting up B they use the uniform distribution on {A,B}.

    I’m not sure how well this works in context. But ultimately it doesn’t matter; the first part of your post already says the important parts, that this is only exciting to the extent that it’s misconstrued.

  10. Carl 'SAI' Mitchell Says:

    WRT the joking “reverse hollywoodism” argument:

    “… the forces of good have every possible advantage, from money to knowledge to overwhelming numerical superiority.”

    Except the ability to do evil, by definition. That ability is amazingly strong. Violence works, especially when used against those unwilling to resist with violence. Evil has the advantage of ruthlessness. Good restricts itself to a subset of all possible actions, evil does not. Sadly this tends to offset the other advantages good has.

    Also don’t underestimate the power of apathy. Plenty of otherwise good people won’t bother to oppose evil until it’s too late, if ever.

  11. Raoul Ohio Says:


    Don’t forget you are a Computer Scientist. The states should be numbered 0 through 7, not 1 through 8.

  12. Nick Says:

    Regarding administration’s rationale for withdrawing from Paris, its hard to take their statements on the matter at face value, especially when they blatantly misrepresent research on things like expected impact of the accord, but what is clearer, if you look at the election results in comparison with each state’s respective carbon intensity of energy consumption, that the action is one that could disproportionately benefit economies of states that voted red.

  13. Arko Bose Says:

    If we ask the question: “What is the most fundamental aspect of reality that follows unitary, linear evolution?”, the answer that seems to be favored – as noted in this blog and in Sean Carroll’s latest book, The Big Picture – by nature is the class of functions that belong to L^2.

    Anything above this fundamental level necessarily involves loss of visibility of at least some degrees of freedom of a system, for if we wish to claim that we have a useful theory that can make testable predictions about a system, then such a theory must necessarily involve functions that are not computationally hard. And predictions about a macroscopic system – containing ~ 10^24 particles – cannot be computationally feasible if a theory were to attempt to keep an account of all those particles and their (possibly entangled) states.

    But this coarse graining often leads to non-linearities, which give rise to the classes of emergent phenomena that we see around us (again, noted by Sean in The Big Picture).

    A very rough analogy would be to see that we can say with certainty that the sum of 1, 2, 3, and 4 is always 10, but we cannot say with any degree of certainty where 10 came from AFTER the summation is done (-1 -2 + 9 + 4 = 1 + 2 + 3 + 4 = … = 10, etc.). Any sort of composition (basically any operation that can be reduced to addition) involves loss of information about the initial state, and this loss of information is the key to emergent phenomena.

    What more there is to emergent phenomena I fail to understand. What am I missing?

  14. anonymousInfTh Says:

    Let X be the random variable that describes the initial micro states and Y the random variable that describes the final micro states. The operation of “coarsening” micro states amounts to computing a random variable Z=f(X), for some function f. Now, the Data Processing Theorem says that I(f(X);Y) is NEVER greater than I(X;Y). So, how can we have a greater mutual information between initial and final states by coarsening?

  15. Edan Maor Says:

    “In case the obvious flaw in the second part isn’t clear to any one, it isn’t that Scott is too pessimistic, rather it is that he is much too optimistic in the assumptions (that people are basically good and things are usually on the right track)which lead to his silly conclusion!”

    I strongly disagree with this. I’m with Scott on this one – I think that there are more people that are fundamentally good. We have the numbers, the knowledge, the technology… but there is a tendency for that not to matter.

    (Well, for now… I am still optimistic that over the long run, good wins out).

  16. James R. Lee Says:

    In their new work, Hoel and others claim to make the amazing discovery that scientific reductionism is false—or, more precisely, that there can exist “causal information” in macroscopic systems, information relevant for predicting the systems’ future behavior, that’s not reducible to causal information about the systems’ microscopic building blocks.

    Scott, admit you’re a little scared.

    If one adds efficiency of the predictor to their model, their narratives start to ring true, and then you’ll be running into them at CCC.

  17. Jim Cross Says:

    Thanks, Scott, for commenting on this.

    When I saw your comments at Quanta, I thought I detected some degree of ambivalence in them and I wasn’t sure in the end where you stood. I also had been over on Sean Carroll’s site when the debate on top down causation was going on. So I hoped you might want to clarify.

    I didn’t expect quite this response. Is it correct to say you disagree with Carroll?

    I need to spend some more time absorbing this.

    Causation arises from time and memory. We observe A, later we observe B, and infer that A causes B, because from past experience A seems to come before B. In the real world, complex systems A and B do not exist in isolation, and we do not have the ability to describe either with complete precision. Even if complex phenomena like human thought or weather systems are composed of subatomic particles never disobeying the rules of physics, the explanatory power of those particles to explain those complex phenomena is lost.

    Not all Hollywood movies have good triumphing over evil. Sometimes evil needs to win so they can do a sequel.

  18. Adrià Garriga-Alonso Says:

    >I think that we need to be more careful about strong declarations which are “obviously not meant to be taken seriously”

    In fact, I’d been taking it all seriously, and was uttering a “WTF” out loud, as I thought that this is in no way an argument for the underlying nature of reality having high-level causation.

    But if it was really not to be taken as an argument, rather as a bitter complaint about the state of affairs, then I agree.

  19. adamt Says:

    I would say Unsong has quite affected you. When you get too down, remember Mr. Rogers’ advice: look for the helpers. All the people who come out in a tragedy looking to help and lend a hand. You see them in this orange clown farce tragedy too. IOW: you are not alone, Scott. Please remember.

  20. adamt Says:

    On the more serious part… do you think Hoel and the IIT folks connected are putting this stuff out in bad faith? When you explain this it seems such a glaring error that they must be. Or else they are capable of amazing self-deception. How could they do all this work and not see such an obvious flaw, and then highlight it with such obfuscated language?

    Given the recent fight in Scientific American about the “inflation paradigm” and then this, it seems quite a lot of theorists are engaging in willful bad faith claiming to know more than they actually do. Or just incredible self-deception fueled by ordinary cognitive bias and motivated reasoning.

    Oh, for Feynman and theorists more comfortable dealing with uncertainty! Oh, for theorists more immune to self-deception!

    How can ordinary humans be expected to combat their own cognitive bias and motivated reasoning when even the best and brightest can not do it??!!

  21. John Sidles Says:

    These Shtetl Optimized themes were prominent in today’s National Geographic interview with arch-nerd free-climber Alex Honnold, upon the occasion of Honnold’s pioneering free solo ascent of Yosemite’s El Capitan:

    Q Do you have any notion of how big a deal this is and what you’ve done?

    A That’s always the funny thing. It doesn’t feel that big a deal when you finally do it, because you put so much effort in. I mean the whole point is to make it feel not that crazy.

    Seeking to demonstrate Quantum Supremacy is not essentially different, is it?

    Q Do you feel the world kind of needed something cool like this, at this moment in time?

    A What the world needs is for the U.S. to stay in the Paris Accords. There’s some bigger issues. But I think it’s always cool for somebody to work on something difficult and achieve their dream. Hopefully people can draw inspiration from this.

    In summary, there exist plenty of inspiring reasons to seek to demonstrate Quantum Supremacy — or conversely, to continue the quest for ever-stronger verifications of the Strong Church-Turing Thesis … which after all is an equally respectable, equally thrilling quest. Both Honnold’s free-climbing and the “climb” toward Quantum Supremacy remind us that the bravely nerdy human cognition that motivates and disciplines these inspirational quests ranks among their most societally valuable aspects.

  22. Mitchell Porter Says:

    Actually, you need to consider a uniform distribution over red states rather than blue states. But you did vow to “never, never normalize this”…

  23. Candide III Says:

    In my view, these simply aren’t the sort of outcomes that you expect from atoms blindly interacting according to the laws of physics. These are, instead, the signatures of higher-level causation—and specifically, of a teleological force that operates in our universe to make it distinctively cruel and horrible.

    Good heavens. What a revelation! Now I finally know why Macron was elected President of France.

  24. Candide III Says:

    Poe’s Law strikes again.

  25. Jay Says:

    In other words, Trump is what you get when sampling from Murphy’s distribution.

    …kind of convincing indeed 😉

  26. fred Says:

    So, it’s possible to imagine a gambling system that’s totally random and fair at the basic level (say, a simple 50/50 quantum coin flip), yet, at the macro level, the casino is somehow able to “beat the odds” consistently, thanks to some high-level “emerging” phenomenon?

  27. fred Says:

    At best this could prove the validity of quantum type multiverse theory:

    Given a perfectly random coin, some universes will see an infinite sequence of heads.
    A player who only plays heads will then be convinced that “praying” is working as an emerging causal agent (how else could you explain that a random coin is consistently falling on heads?).

  28. wolfgang Says:

    >> this theory is a genuinely non-reductionist one.

    But is it?
    All you did is assume final conditions instead of initial conditions, nothing else about physics changes.

    I think a truly non-reductionist theory would e.g. assume that Trump’s mind is completely detached from reality, making decisions (which influence the state of the world) without being influenced by anything else. But this is obviously a ridiculous hypothesis …

  29. Erik Hoel Says:

    Stopping in to say hi.

    I recently sent Scott an email, and a possible post if he would like it, offering my reply to some of his misunderstandings about the theory. Hopefully we can have an informative conversation about it. In particular, his point about normalization is not accurate, as how EI increases at higher scales doesn’t stem from an arbitrary normalization any more than how the mutual information between a sender and receiver can be increased via encoding. Both are features, not bugs. But before I give my tl;dr version here in the comments I’ll wait to see if he responds or posts my reply.

    Thanks to everyone for their time and interest,

    Erik Hoel

  30. JimV Says:

    I think you’re right, unfortunately: Nixon, Reagan, G.W.Bush, Trump – the arc isn’t bending towards truth or justice.

    Well, at least it clears up the Fermi Paradox.

  31. Scott Says:

    Joshua Zelinsky #8: Thanks for your comment. I completely agree with you that, if one wants to state the anti-reductionist position as strongly as possible, one ought to focus on how higher-level concepts (water, jealousy, baseball, computation, …) would often still have exactly the same internal logic that they have today, even if their underlying physical substrate (e.g., the Standard Model of elementary particles) had turned out to be something totally different from what we now know it to be. The notion of renormalization group flow in QFT even provides a precise instantiation of this. And yes, I also see it as very directly analogous to how higher-level mathematical concepts (prime numbers, elliptic curves, etc.) typically survive unscathed, even if you rip out their “logical foundations” (ZF set theory or whatever) and replace them with alternative foundations.

    Of course, none of this changes the fact that the macrostates “supervene” on the microstates, to use the philosophers’ term for “are completely determined by.” And I also don’t see how it provides support for Hoel’s thesis, that one can somehow get more information (in the pure Shannon sense, not factoring in computational efficiency as mentioned by James Lee #16) by looking at macrostates than by looking at the microstates that constitute them.

  32. Scott Says:

    wolfgang #28: If explaining why the initial conditions or final conditions of a physical theory are how they are requires making reference to concepts like human desires and intentions, then that’s quite enough for me to call the theory “non-reductionist,” even if the theory otherwise posits the working-out of simple physical laws.

    (“But what about the anthropic principle?” I hear someone reply. Well, I’ll grant that you can still call yourself a reductionist even if you think that aspects of the laws of physics, including of the initial or final conditions, can only be explained by the observable universe needing to be hospitable for intelligent life. But once intelligent life arises at all, if you start explaining what happens to it in teleological terms—“we simply don’t find ourselves in the worlds where good triumphs over evil” —that’s definitely non-reductionist! If it isn’t, then what is?)

  33. Erik Hoel Says:

    Thanks to Scott for putting that up. I won’t spam the thread, but since the data processing inequality (dpi) has been brought up three times, I just wanted to say: I think there’s no more contradiction between the dpi and causal emergence than between the dpi and Shannon’s noisy-channel coding theorem. Causal emergence treats causal structure like it’s a channel over which interventions are sent, where interventions at higher scales act like error-correcting codes. In this manner, there can be extra information in using a “macrocode.” Conversely, in the context of an information channel the dpi is about the impossibility of doing some clever transformation on an already received code to get extra information.

  34. Scott Says:

    adamt #19, #20: Thanks for the wise words from Mr. Rogers. From the perspective of Reverse Hollywoodism Theory, “the helpers” are indeed the only reason why our civilization has continued as long as it has, even in the presence of evil’s built-in teleological advantage. So even if they ultimately lose, helpers are as essential to the plot as Darth Vader is to Star Wars, or the stepmother is to Cinderella.

    No, I don’t believe for a second that anyone you mentioned—the inflationary cosmologists, integrated information theorists, causal emergentists, etc.—is arguing in bad faith. In the case of inflation, I don’t have a strong opinion right now; I think Guth, Carroll, et al are right when they point out what’s appealing about the model, and Penrose, Steinhardt, et al are also right when they point out what’s unappealing about it.

    With IIT and causal emergence, I do have strong opinions, but am certain that Tononi, Hoel et al honestly hold their opposing views. If you doubt that, read their responses to my critical blog posts and decide for yourself.

  35. Scott Says:

    Raoul #11:

      Don’t forget you are a Computer Scientist. The states should be numbered 0 through 7, not 1 through 8.

    Eh, I’m a theoretical computer scientist, one who spends a lot of time doing math and trying to write for broad audiences, and essentially zero time programming.

  36. Peter Morgan Says:

    Scott #31: “I also don’t see how it provides support for Hoel’s thesis, that one can somehow get more information (in the pure Shannon sense, not factoring in computational efficiency as mentioned by James Lee #16) by looking at macrostates than by looking at the microstates that constitute them.”
    Instead of thinking about states, macro and micro, perhaps better (or at least it’s different) to think about the relationships between observables, in the vein of Axiomatic QFT.
    In such terms, the question is whether there is anything one can discover by looking at observables associated with two regions of space-time together, O1 and O2, contained in the algebra A(O1∪O2), that cannot be discovered by looking at observables associated with each region separately, contained in the algebra generated by A(O1) and A(O2), A(O1)∨A(O2). The axiomatic assumption that nothing extra can be discovered, called Additivity, is a commonplace in Axiomatic QFT, but a formal statement of the rather vaguely defined “emergence” might well be to weaken or deny Additivity as an axiom, A(O1∪O2)≠A(O1)∨A(O2).
    Whether a state over A(O1) and a state over A(O2), which determine measurement statistics locally in O1 and in O2 separately, determine a state over A(O1)∨A(O2), or over A(O1∪O2) if that is different, is a distinct question (answer no, because entanglement, etc., but in practice, perhaps yes, because decoherence).
    Axiomatic QFT, however, is conceptually different enough from Markov processes, which is mostly about the evolution of states with relatively little focus on observables, that this may not be helpful for Hoel.

  37. Jon K. Says:

    I think EI is an interesting idea, Erik. I can also see Scott’s point of view regarding normalization concerns being ignored.

    1) In the simple numerical example provided above, is there an information-theoretic concept that bridges the lower level that measures ~0.54 bits of information and the coarse-grained level that measures the 1 bit of effective information?

    2) Can this same concept give rise to less (instead of more) EI at a higher level? That is, could you take a system that has deterministic behavior at one level–like a register cycling through 2^n states by adding 1 each time and then repeating the cycle–but chunk these states in such a way as to get something less than 1 bit of EI at a higher level? For instance, coarse-grain states based only on the even bits in the register, so you have fewer states and 50-50 probabilities at the higher level instead of 100% probabilities. In other words, does your model talk about “agents” below and “atoms” above, as well?
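    One way to make question (2) concrete (my own toy construction, not from the papers, with EI computed as mutual information under a uniform intervention distribution over states): a deterministic 4-state counter has 2 bits of EI, but grouping its states pairwise leaves a macro model with 0 bits.

```python
import math

def ei(tpm):
    """Effective information in bits: I(X;Y) with X uniform over rows."""
    n = len(tpm)
    py = [sum(row[j] for row in tpm) / n for j in range(len(tpm[0]))]
    h_y = -sum(p * math.log2(p) for p in py if p > 0)
    h_y_given_x = sum(-sum(p * math.log2(p) for p in row if p > 0)
                      for row in tpm) / n
    return h_y - h_y_given_x

# Micro: deterministic 4-state cycle 0 -> 1 -> 2 -> 3 -> 0.
micro = [[0, 1, 0, 0],
         [0, 0, 1, 0],
         [0, 0, 0, 1],
         [1, 0, 0, 0]]

# Macro: group {0,1} -> A and {2,3} -> B; intervening uniformly on a
# macro state means averaging over its micro realizations.
macro = [[0.5, 0.5],   # A -> {1, 2}, i.e. half A, half B
         [0.5, 0.5]]   # B -> {3, 0}, i.e. half B, half A

print(ei(micro))  # 2.0 bits
print(ei(macro))  # 0.0 bits
```

    So under this kind of measure, coarse-graining can indeed destroy effective information as well as (on Hoel's account) create it.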

  38. Scott Says:

    anonymousInfTh #14: As I pointed out in the update, the resolution is that the data processing inequality assumes that the distribution over macrostates is obtained by starting from the distribution over microstates, then marginalizing out the microscopic stuff that you don’t care about. That assumption is precisely what Hoel rejects, for reasons that still don’t entirely make sense to me.
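    To see the two choices side by side, here is a sketch under an assumed reconstruction of the example (8 states, with states 0–6 each mapping to a uniformly random state in 0–6 and state 7 fixed, chosen to match the ~0.54-bit and 1-bit figures quoted in comment #37):

```python
import math

def mi(px, tpm):
    """I(X;Y) in bits, given input distribution px and row-stochastic tpm."""
    py = [sum(px[i] * tpm[i][j] for i in range(len(px)))
          for j in range(len(tpm[0]))]
    h_y = -sum(p * math.log2(p) for p in py if p > 0)
    h_y_x = sum(px[i] * -sum(p * math.log2(p) for p in tpm[i] if p > 0)
                for i in range(len(px)))
    return h_y - h_y_x

# Micro: states 0-6 each go to a uniformly random state in 0-6;
# state 7 goes to itself.
micro = [[1/7] * 7 + [0] for _ in range(7)] + [[0] * 7 + [1]]
print(mi([1/8] * 8, micro))   # uniform micro interventions: ~0.544 bits

# Macro: A = {0..6}, B = {7}; the macro dynamics are deterministic.
macro = [[1, 0], [0, 1]]
print(mi([7/8, 1/8], macro))  # marginalized macro distribution: ~0.544 bits
print(mi([1/2, 1/2], macro))  # uniform macro distribution: 1.0 bit
```

    With the marginalized macro distribution the macro value exactly matches the micro value, consistent with the data processing inequality; the apparent gain to 1 bit appears only when the macro distribution is replaced by the uniform one.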

  39. James R. Lee Says:

    I feel hesitant to elevate this to a scientific discussion because Hoel’s response contains the usual hallmarks of philosophers. I will never comprehend the pursuit of understanding through obfuscation (or the use of the word “supervene” by any well-intentioned person).

    It seems one should address the simple question: What future event can I predict better from a macroscale description than from the microscale description? Does more “causal information” not entail more predictive power?

  40. James R. Lee Says:

    Actually, this theory is pretty brilliant. It gives a mathematical basis to (the modern practice of) philosophy.

    When you love the uniform distribution above all else, making up new words actually increases the amount of information in the world.

      mutual information between a sender and receiver can be increased via encoding

    Getting more revolutionary by the second…

  41. Scott Says:

    Jim Cross #17:

      I didn’t expect quite this response. Is it correct to say you disagree with Carroll?

    I think it’s fair to say that Sean and I agree both about the weakness of arguments against reductionism based on “causal emergence,” and about the tragicomic horribleness of Anthony Weiner’s sexts, and a sequence of other trivialities, having conspired to cast such a dark shadow over the future of life on earth. However, I strongly suspect that Sean—who’s a much more equable person than I am, and also one of the world’s leading public reductionists—is willing to seek reductionist explanations for the latter long after I’ve given up. 😉

  42. venky Says:

    All philosophical arguments go back to Scott’s post on Aumann’s common knowledge, where disagreement stops once the sigma algebra finally ends up no coarser or finer than needed.

  43. Jacques Distler Says:

    Rather than considering just two choices for the distribution for macrostates (the uniform distribution and the distribution obtained by coarse-graining the distribution for microstates), it might be clarifying to consider the ensemble of all possible distributions for macrostates.

    Choosing a distribution out of that ensemble adds information. Calling that added information “emergent” is a marketing gimmick, but there’s no question that it’s not determined by the microphysics.

    It might be helpful, too, to take the set of values for sf NOT to be a finite set, thereby removing the uniform distribution as a “distinguished” choice among the ensemble of possible distributions.

    Generically, then, the only distinguished choice will be the one obtained by coarse-graining the distribution of microstates (the “reductionist” choice).

  44. Lee-kai Wang Says:

    A huge part of the disagreement about whether Hoel’s argument is antireductionist or not seems to be wrapped up in the principle of “causal exclusion”. From “Quantifying causal emergence shows that macro can beat micro” (Hoel, Albantakis, Tononi, 2013):

    Causal Exclusion and Its Implications. Causal analysis as presented here endorses both supervenience (no extra causal ingredients at the macro level) and causal exclusion [for a given system at a given time, causation occurs at one level only, otherwise causes would be double counted (4)]. However, causal analysis also demonstrates that EI can actually be maximal at a macro level, depending on the system’s architecture. In such cases, causal exclusion turns the reductionist assumption on its head, because to avoid double-counting causes, optimal macro causation must exclude micro causation.

    Hoel subscribes to causal exclusion – and also assumes that reductionists subscribe to it. This leads to statements such as: if the “macro”-level explanation has greater EI than the “micro”-level explanation, then the macro one must be “correct”, to the exclusion of the micro one.

    On the other hand, Scott rejects causal exclusion and believes that causes (e.g., of nose-scratching) can be explained at multiple scales simultaneously. Personally, this seems very reasonable and intuitive to me – no different from how a mathematical relationship like the Pythagorean theorem might be proved (with equal validity) using many different concepts with differing levels of complexity.

    Once you take causal exclusion away, Hoel’s paper boils down to using a measure like EI to rank multiple (perhaps infinite) valid causal models. It’s certainly very interesting and thought-provoking to quantify properties of a model by measures such as EI or complexity, but it also immediately becomes clear that which measure (and what normalizations) you choose will have its own effects on the rankings. This raises no philosophical quandaries whatsoever – unless you subscribe to causal exclusion.

  45. Scott Says:

    Lee-Kai #44: Thanks! Yes, while it wasn’t directly needed for my argument, I probably should’ve said in the post that the notion that an event has a “true cause” at a given scale, which rules out the possibility of valid causal explanations at other scales, strikes me as self-evidently absurd.

    If I drop a rock, why does it fall? Is it because I’m no longer holding it? Because of the rock’s being heavier than air? Because of the proximity of the earth? Because of spacetime curvature? Because of graviton transfer?

    I take it as patently obvious that each of these, and others, are valid answers, depending on what the questioner cares about—or in Judea Pearl’s terms, where we’re imagining placing “do” operators in the vast causal network of the world. If my 4-year-old daughter asks, the optimal answer will target more proximate causes than if Einstein asks.

    Even in pure math, if we write a program that halts on counterexamples to Fermat’s Last Theorem, and then ask why the program is still running after a month, there’s an enormous range of possible valid answers, from “because if you examine this giant execution trace, you’ll see that the ‘halt’ instruction never gets triggered” to “because Wiles proved that all semistable elliptic curves are modular.”

  46. Isotopeblue Says:

    Combining your thoughts on two controversial and completely unrelated topics in one post just undermines both of your messages. An unfortunate post.

  47. Anon E. Moose Says:

    It sounds like the authors are trying to formalize the concept of emergent phenomena as the requirement of high complexity for describing/simulating the higher-level phenomena in terms of the underlying lower-level ones, and then claiming that the causal relations drawn between the compact high-level descriptions have more predictive power than their low-level counterparts, whereas in fact they have less predictive power, but that power can be used more efficiently.
    In the example of the light switch, the “On -> Light” causal model is simple to reason about, but it cannot explain the occasional observations of ‘On -> Light, then no light’ or ‘On -> Fire’, whereas a more detailed model would allow for such observations at the expense of having to include the concepts of a tungsten filament burning out, or a faulty switch and wiring causing a fire.
    Would this be a reasonable, if somewhat dumbed-down, approximation of their thesis and your critique of it?

  48. Scott Says:

    isotopeblue: If an argument is valid, then how can it possibly be “undermined” by putting it next to something that you consider to be unrelated?

    Like, I realize that the habit of reading an argument for A, then saying to yourself, “this seems completely convincing, but before I accept A’s truth, let me check what the author has to say about B, C, and D, so I know whether he’s a serious and reliable ally”—I know it’s deeply ingrained in human nature, in me no less than in anyone else. But it’s a habit that’s worth fighting every day, and I hope the juxtaposition in this post helps spur someone to that realization.

  49. Erik Hoel Says:

    #37 Jon K: thanks, and good questions.

    1) If the system is considered an information channel, the 1 bit is the maximum information it can transmit per use (its capacity). Analyzing the causation at the lower level is like using the channel such that each state corresponds to a specific unique message (like 000). This turns out to be bad because of the noise, and thus it’s only 0.54 bits. Much better to think of states 1-7 as the same message (0) and then send that reliably. Note that the latter is the same as coarse-graining (i.e., analyzing the system causally at a macroscale with the same associated EI of 1 bit).

    2) The lowest level can definitely have the highest EI. Scott made it sound as if we claimed we proved scientific reductionism false. What the theory actually allows for is system-dependence: it depends on the properties and structure. Basically, the theory asks at what scale differences in state (interventions) make the most difference (have the most informative effects). In general all systems lean toward reduction: the vast majority of macro scales have a low EI for most systems, lower than the microscale.

    #44 Lee-kai Wang

    You’re right – I think causal exclusion is a strong argument for universal reductionism. I agree with Scott that it’s not true, but there’s a big difference between proclaiming something obviously false and trying to prove it wrong by engaging with the arguments. On the other end, there’s a pure causal relativism that some people, such as Scott in #45, hold about causation. Often this is due to a confusion between understanding and causation. Everyone is a philosopher when the person they disagree with is a scientist, and a scientist when the person they disagree with is a philosopher. But neither the exclusion argument nor the pure relativism argument is true. Nearly everyone will admit to causal exclusion over gerrymandered groupings or states. Even if the causal structure is relative to some set of sensible observer/experimental questions, there are still issues of scale, as most questions wouldn’t uniquely specify a scale, and thus causal emergence comes into play. So there’s both a strong claim and a weak claim (the weak claim is that definable causal structure doesn’t really exist a priori, but causal emergence still comes into play once you start doing causal analysis).

    #47 Anon E. Moose.

    Your observation is good, but there are strong reasons to believe we’re not just doing some multiplication of the predictive power by the efficiency. This is because the phenomenon is very similar to how information transmission can be improved by encoding, which leads to a real increase in transmitted information. Keep in mind that while it may sound like a crazy claim it isn’t *all* information that is increasing at higher scales, it is only information associated with the causal structure (or causal information). Lots of people don’t understand that and immediately have a knee-jerk reaction.

  50. Eric Bahr Says:

    What you wrote seems to echo a sentiment Russell expressed several times.

    >If you are going to say [that God’s goodness is anything more than mere fiat], you will then have to say that it is not only through God that right and wrong came into being, but that they are in their essence logically anterior to God. You could, of course, if you liked, say that there was a superior deity who gave orders to the God who made this world, or could take up the line that some of the gnostics took up—a line which I often thought was a very plausible one—that as a matter of fact this world that we know was made by the devil at a moment when God was not looking. There is a good deal to be said for that, and I am not concerned to refute it.

  51. gentzen Says:

    Scott’s willingness to offer standard-issue crustiness to Natalie Wolchover on short notice and his long blog post can both be seen as part of his efforts to support good science journalism. And those efforts are successful indeed. Even the comments on that article are impressive. It is hard to describe how much I rejoiced upon reading lemur’s comment (even if off-topic):

    …, but these hurt my philosophical feelings: QM is Nature’s way of having to avoid dealing with an infinite number of bits

    His efforts to engage with politics on the other hand have not been significantly more successful than the efforts of other mortal beings. But mentioning Hitler in the context of the current events also doesn’t indicate deep political experience. (And maybe the hostile foreign power had no real alternative other than trying to prevent the historically unpopular opponent from winning the election.) But I have no deep political experience either, so I better stop here.

    Regarding philosophy, Scott is certainly entitled to his “It was hard for me to find anything in the essay that the world’s most orthodox reductionist would disagree with” opinion. His essay The Ghost in the Quantum Turing Machine offers unconventional ideas that one can really disagree with. I also loved some of his answers in the Closer to Truth videos, both for being unexpected and for being unbelievably straight.

    (That comment was sort of directed towards Erik Hoel, at least that was the intention when I started writing it. But well… I started to appreciate good science journalism after experiencing bad one, and wondered about the consequences. But isn’t “bad” a subjective judgment? I read a (physics) book from 2007 and was put off by many historical inaccuracies. Then I read another book by the same author from 2013, and noticed that she had corrected most of those inaccuracies in the meantime. I took another look at the older book, and no longer found it as bad, because I now had the feeling that its author nevertheless tried to give her best.)

  52. Anon E. Moose Says:

    @Erik Hoel, 49

    I don’t understand it either: in your theory, information content seems to be *decreasing* at higher scales, which allows for more compact encoding solely because it is a lossy encoding of the underlying phenomena.
    If you’re saying that what is increasing is the ratio of relevant to non-relevant information, then it does sound like a sleight of hand, specifically juggling Occam’s razors – you take a model, then take a proper subset of the possible observations that it can yield, then construct a simpler model which can only yield that subset, and use the simplicity to claim explanatory precedence over the underlying model while ignoring the fact that the two do not explain the same sets of observations.

  53. Joshua Zelinsky Says:

    Scott #31, “And I also don’t see how it provides support for Hoel’s thesis, that one can somehow get more information (in the pure Shannon sense, not factoring in computational efficiency as mentioned by James Lee #16) by looking at macrostates than by looking at the microstates that constitute them.”

    I suspect that you don’t see it providing said support because it doesn’t, or at least doesn’t in any way which is obvious to me either.

  54. Erik Hoel Says:

    @ #52, Anon. E. Moose
    Thanks for the interest. Have you read the papers? It’s extremely difficult to tell from just Scott’s post what’s going on. I understand why you’re asking that question, but causal emergence is very different. The effective information is definitely increasing at (some) higher scales. Even Scott agrees with this, he just thinks there must be some form of arbitrary normalization built into the measure for this to happen (such as an enforced uniform distribution). The problem is he hasn’t understood that it’s not some arbitrary normalization; it’s a necessity brought about by the multiple-realizability of macro-states and macro-interventions. It’s also a general necessity to capture causal structure appropriately. Mathematically, it’s very similar to how error-correcting codes aren’t arbitrary normalizations, although someone just hearing about them could easily make the same mistake Scott did with this.

    Basically, information theory can be divided cleanly into two parts: source coding and channel coding. So far, people have only talked about higher scales in terms of source coding. If you think of it in terms of source coding, then higher scales are at best lossy representations, i.e., exactly what you’re saying. The data processing inequality applies, and all that jazz. But the proposal here is that causal structure should actually be thought of in terms of channel coding. This means that higher scales can error-correct and thus have more (effective) information. Of course, we can always do what Scott suggests and change the measure purposefully so it fails to capture the causal structure at higher scales to avoid this conclusion. But this isn’t really an objection, more like a bad suggestion that misses the point.
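    For what it’s worth, the textbook version of the encoding phenomenon being invoked can be sketched in a few lines (a standard repetition code over a binary symmetric channel; my illustration, not the construction from the papers): per decoded message the mutual information rises, at the price of three channel uses, which is one way to see why there is no tension with the data processing inequality.

```python
import math

def h2(p):
    """Binary entropy in bits."""
    if p in (0, 1):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

p = 0.2                   # crossover probability of the raw channel
raw_info = 1 - h2(p)      # I(X;Y) per message, uncoded, uniform input

# Repetition-3 with majority decoding: the end-to-end channel is again
# binary symmetric, with crossover = P(2 or 3 of the copies flip).
p3 = 3 * p**2 * (1 - p) + p**3
coded_info = 1 - h2(p3)

print(raw_info)    # ~0.28 bits per message
print(coded_info)  # ~0.52 bits per message, using 3 channel uses
```

    The per-message information goes up because the encoder chooses a structured input distribution; nothing is extracted "for free" from an already-received signal.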

  55. JimV Says:

    My worthless impression of the Switch-On/Light-On, Switch-Off/Light-Off model is that, yes, humans like to use simple heuristics to plot their way through life; but sometimes the hoof-beats are from zebras.

  56. Adamt Says:

    Hoel #54,

    You object to Scott’s normalization fix and suggest you disagree both technically and conceptually, but then your so-called technical objection is just that Scott’s fix ruins your conceptual story. Then you cloud up the whole thing with obfuscated language about supervening layers and so on. Scott thinks you are arguing in good faith, but I have my doubts… from here it just looks like you’ve erected a rickety Rube Goldberg contraption with feigned philosophical underpinnings to arrive at the conceptual story you want to tell of causal emergence.

    Scott, I think Carroll signed the letter, but his explanation after was pretty tepid. Nowhere near as full-throated as your other famous collaborator 😉

    You claim to be Switzerland here, but should I read it as telling that you interpreted my comments as indicting the Linde/Guth side and not ISL?? 😉

  57. Erik Hoel Says:

    @Adamt #56. I don’t dismiss Scott’s “normalization fix” because of conceptual reasons. I dismiss it because it leads to nonsensical conclusions.

    Scott’s objection to the math has boiled down to believing that “requiring the macro-distribution to arise by marginalizing the micro-distribution seems like the correct choice to me.” This is referring to the distribution of interventions on the system by some experimenter. So Scott claims experimenters should perturb any given system through all possible micro-states. If Scott wants to analyze the information in the causal relationships of a computer’s logic gates, he must first learn the full set of its subatomic states. Even worse, such a “measure” of causal influence has nothing to do with a system’s mechanisms or connectivity, except that of its fundamental microscale. But changing the measure in the horrible way Scott suggests doesn’t change the fact that causal structure does indeed shift its properties across scales, such as becoming more deterministic, less degenerate, or creating different mechanisms. This is the heart of causal emergence. A good measure like effective information is sensitive to this. Scott’s arbitrary fix severs this connection so the measure has nothing to do with the strength of causal relationships. That’s why it spits out random numbers for even a simple causal relationship, like that of a light switch to a light bulb. Now how, on earth, do you think that’s a conceptual issue and not a technical one? Because I want some of whatever you and Scott are smoking, so I too can look at a chain of deterministic COPY gates in a computer and go: “zero causal relationships here….” ::: exhalation :::

    Broad philosophical handwaving about how there is no real problem to be solved or how all events have infinite causes are totally fine in my book. Hell, they are sometimes warranted and I completely welcome them. But the technical objection about some kind of unnecessary normalization hidden away in EI is literally just wrong. In an earlier post, I also point out why the data processing inequality does not apply to this situation. So there are no technical objections I know of that hold up.

  58. Scott Says:

    Adamt #56: I’m not, to put it mildly, a good source of information on what the universe might have been doing in the first trillionth of a trillionth of a second of its existence. And yes, I’m open to the possibility that we know far less about such matters than a lot of popular cosmology writing would lead one to think. (From 1 second or so after the Big Bang is a different story: there we have a picture based on the Standard Model and GR that impressively matches observed data.)

    Sean Carroll occupies an interesting position here: on the one hand, he signed the letter in support of inflationary cosmology; but on the other hand, it was his own writings, such as this one, that drove home the point for me that most of the arguments for inflation that the educated layperson would know are just flat-out misleading or wrong. Among other things, Roger Penrose is fundamentally correct that inflation is of no help whatsoever in producing a special initial state for the universe, because it requires an even specialer state to get it started.

    On the other hand, the alternatives to inflation, like bouncing and ekpyrotic cosmologies, don’t seem to an outsider to inspire any great confidence; and inflation arguably did have success in predicting detailed features of the CMB (even if perhaps those same features could’ve been predicted without inflation, just by saying “we don’t know the exact mechanism, but everything should probably be as simple and generic as it could be”).

    So, all in all, I’m left in the uncomfortable position of saying that most of the arguments for inflation that an outsider would know (such as it giving enough time for different regions of the observable universe to equilibrate with each other) appear to be wrong, but nevertheless, some variant of inflation might very plausibly have happened, for all I know.

  59. John Sidles Says:

    Scott (in the OP) seeks emergent answers to “Why has air travel in the US gotten so terrible?”

    Breaking News Donald Trump has appointed Steve “Torture Memo” Bradbury to be General Counsel of Transportation. Quarks and gluons aside, we air-travellers emergently dislike the sound of THIS! 🙂

    More broadly, the discourse of the Aaronson/Hoel debate (so far) calls to mind Groucho Marx’s assessment “I’ve had a lovely time, but this wasn’t it.” 🙂


    Mind-fullness  Searching the latest of Mind — Ludwig Wittgenstein’s least-liked academic journal — for philosophical assessments of emergent phenomena turned up a really lovely article (as it seemed to me) by MIT professor Jack Spencer, having the engaging title “Able to do the Impossible” (Mind, 2017, preprint here).

    One illuminating aspect of Spencer’s article (for me) is its application to philosophical universes in which the Extended Church-Turing Thesis is not only factually false, but even metaphysically impossible.

    Nonetheless, by the construction “Case 1: Close calls” of Spencer’s section 1.3 “Constructing other G-cases”, it can plausibly happen that the Extended Church-Turing Thesis is “G-true” (in Spencer’s terminology) … in the sense that the Extended Church-Turing Thesis is emergently true (as a dynamical consequence of QED field theory, for example).

    There’s no shortage of newly-launched enterprises whose business models rely upon “G-closeness” to truth of the Extended Church-Turing Thesis … and the good news for young STEAM-students is, these enterprises are hiring.

    Thus Spencer’s modes of thinking about emergent physics, cognition, and truth are (for me anyway) appreciably more pragmatically fruitful than any modes that the Hoel/Aaronson debate has suggested (so far, at any rate).

    Meta conclusion  Efforts to identify and conciliate the strongest emergence-related STEAM-works are more likely to provoke fruitful discourse and/or prospering STEAM-enterprises than criticism of the weakest works.

  60. Daniel Tung Says:

    I think it is an interesting attempt. But the main concept used is basically relative entropy at higher levels of descriptions. I am not sure that Hoel has made a convincing argument (which I think is quite bold and far reaching) in arguing that higher level causation is mostly just relative entropy.
    Taking a Bayesian perspective, entropy is a measure of our knowledge, and relative entropy measures our information gain from one distribution to another (at any level of description, macro or micro). More information gain at a higher level than the corresponding lower level states does not necessarily mean it can be considered as a causation at higher level. Anyway, what is the proposed relation between information/knowledge and causation? One is passive and one is active. The former may be a necessary condition for the latter but not a sufficient one.
    By the way, I also think that Tononi’s phi function fails to capture consciousness. To me, it is definitely not a sufficient condition for consciousness (I am not a panpsychist). It is at most a necessary condition. I suspect it is the same here about Hoel’s argument. Relative entropy may be a necessary condition for higher level causation, but not a sufficient one.

  61. Gil Kalai Says:

    A remark regarding Scott’s itching nose example:

    Scott wrote: “I scratched my nose because it itched, not because of the Standard Model of elementary particles” and elaborated: “my nose is made of subatomic particles, and its behavior is in principle fully determined (at least probabilistically) by the quantum state of those particles together with the laws governing them”

    Consider the three statements:

    A) Scott’s nose itches

    B) Scott’s nose will itch on June 9, 2022

    C) Itching nose causes Scott to scratch it

    First, in all these statements Scott himself and even Scott’s nose are not concrete well-defined macro states of some larger (“microscopic”) system. The ambiguities in associating a concrete macro state to Scott and his nose still allow the statement “Scott’s nose will itch on June 9, 2022” to be pretty well defined and not to depend on the precise macro states chosen to represent Scott and his nose.

    Second, in order to give a very precise (even probabilistic) prediction for the event “Scott’s nose will itch on June 9, 2022”, we need a huge microscopic system, perhaps as large as the entire universe, or at least large enough to enable us to tell also whether Erik Hoel’s nose will itch on the same day.

    Third, the causal structure “itchy nose causes Scott to scratch it” may emerge in much smaller systems which do not allow predicting if the nose will itch.

    Last, the ambiguity in “what Scott is” is related also to the question of whether Scott has free will regarding scratching his nose when it itches.

  62. Michael Says:

    I have a feeling that EI is described somewhat clumsily, with reductionists, naturally, replying to the claims as formally described.

    If you have a full description of a system in terms of microstates and their dynamics, and a description of the observed distribution of initial states, there is nothing else to measure about the system — the rest can be deduced. If you list the macrostates, the micro-level description will provide a complete prediction of possible dynamic behaviours; if you add some indication of distribution of input states (and most of the time macro states are not enough to _completely_ describe the dynamics, because there is a distribution of micro-level states inside each macro-level state), you get the distribution of the output states just from the micro-level description.

    The description of EI leaves a superficial impression that the claim above is not believed to be literally true; I am not convinced. On the other hand, the claim above is not that strong, and I think this claim is approximately all that reductionism expects from the world.
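    [A minimal sketch of the deduction described above (an invented example: given the micro transition matrix plus a distribution over micro-states inside each macro-state, the macro dynamics follow with nothing extra to measure):]

```python
import numpy as np

def coarse_grain(tpm, partition, p_within):
    """Macro transition matrix deduced by marginalizing the micro one.
    partition: micro-state indices belonging to each macro-state.
    p_within:  distribution over micro-states inside each macro-state."""
    m = len(partition)
    macro = np.zeros((m, m))
    for a, states_a in enumerate(partition):
        for b, states_b in enumerate(partition):
            for w, x in zip(p_within[a], states_a):
                macro[a, b] += w * tpm[x, states_b].sum()
    return macro

micro = np.array([[1/3, 1/3, 1/3, 0],
                  [1/3, 1/3, 1/3, 0],
                  [1/3, 1/3, 1/3, 0],
                  [0,   0,   0,   1]])
partition = [[0, 1, 2], [3]]
p_within = [[1/3, 1/3, 1/3], [1.0]]

print(coarse_grain(micro, partition, p_within))
# the identity matrix: the macro dynamics, deduced from the micro description
```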

    In my view after reading the reply and the comments here, the descriptions of EI in terms of interventions and system as causal channel etc. seem to be more about choosing the description level for optimal use of information, stability of observed behavior etc.

    If we compress pretty pictures, of course we would like to classify all the possible pictures into 2^n equiprobable classes which capture the most interesting differences for us, and equiprobability is about efficiency of description and selecting the most relevant information to convey, etc.

    And when we study dynamics, if some part of information doesn’t affect the outcome, and some part is used in such a chaotic way that our measurements do not give enough precision, it is quite natural for a natural science to become less interested in these parts of the description and switch to studying the parts of the input that have a measurable causal effect on the output.

    So maybe EI should be framed not as claiming reductionism is wrong (just not a currently feasible program in most fields of study, which is obvious), and maybe some comparison with Lyapunov exponents could be drawn; they are about describing evolution along one axis as stable and predictable and along some other axis as very unstable, which seems to be something similar…

  63. saraswat Says:

    Scott — take heart in the words of the emerging Trump administration philosopher, James Mattis … “Once we’ve exhausted all possible alternatives, the Americans will do the right thing. We will still be there”! What will be true of the Pacific will be true of the Paris accord as well.

    (Even though I hate referring to a Churchillism…)

  64. saraswat Says:

    (On a slightly different topic, is there a version of Scott Aaronson for AI? cf Scott #35 …

    “a theoretical computer scientist, one who spends a lot of time doing math _and AI_ and trying to write for broad audiences _about which outcomes of AI research can be made to happen in practice (just SMOP)/which are possible/which are implausible/which are impossible_”.

    (my additional requirements in _…_ :-).)

    If ever the world needs such a person, it is now!)

  65. JimV Says:

    I am getting the further worthless impression that some of the disagreement is semantic re “causal”. I don’t think there would be much objection to the work if it were called “heuristic emergence”, but until there is some empirical proof of macro-states not being determined by their micro-states, “causal emergence” doesn’t sound right to some of us.

  66. Peter Morgan Says:

    Saraswat #63, Mattis has clearly not internalized the rapture of Republican Christianity, else his conclusion would be “We will be in Heaven”.

    I’ll address Hoel’s approach differently than I did in my #36. Hoel works with Markov processes, but specializes to systems that have an exact macro-state separation: that is, having a block-diagonal matrix presentation. Physical systems, however, are only separable in this way for limited lengths of time. There are small probabilities, for example, of interactions between my fingers, subsystems of my body, and my toes; infrequently, I have to cut my toenails. In general, every entry in a Markov matrix is likely to be non-zero and different from other entries.
    To identify how to coarse-grain in the general case, we have to consider either the matrix entries or we have to consider whatever information is not encoded in the Markov matrix (physical adjacency, for example). If the former, an algorithm might, for example, compute an eigenvector basis (over the complex numbers) and compute which to discard (but there’s likely a better algorithm!); the algorithm is clearly a source of extra information. If the latter, there’s an even clearer source of extra information. In both cases, the Markov process is embedded in a larger system of other degrees of freedom, but we can’t just move Hoel’s argument to that larger system, because the same problem applies to it (perhaps more so, because now either the matrix or the external information is more complex).
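    [A sketch of the eigenvector route mentioned above, on an invented nearly-block-diagonal Markov matrix: eigenvalues close to 1 pick out the slow, macro-like modes, and the sign pattern of the second-slowest eigenvector recovers the blocks. This is one naive algorithm among many, and, as the comment says, the algorithm itself is a source of extra information:]

```python
import numpy as np

# Two nearly-decoupled blocks with a small leak between them
# (like fingers and toes, interacting only rarely).
eps = 0.01
a, b = 0.5 - eps / 2, eps / 2
T = np.array([[a, a, b, b],
              [a, a, b, b],
              [b, b, a, a],
              [b, b, a, a]])        # row-stochastic: each row sums to 1

vals, vecs = np.linalg.eig(T)
order = np.argsort(-vals.real)
print(np.round(vals.real[order], 3))   # eigenvalues 1.0 and 0.98 are the two slow modes

# Sign pattern of the second-slowest eigenvector recovers the macro partition:
macro = (vecs[:, order[1]].real > 0).astype(int)
print(macro)   # [0 0 1 1] or its complement: the block structure
```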

  67. Adam Treat Says:

    Scott #58: Yes, that’s what I meant in saying Sean’s support seemed tepid in comparison to Susskind’s. Penrose’s point is well taken. I saw him make this point in a talk he gave about his own alternative theory. I do believe the controversy is not too hard to follow for an educated layman, and the essential criticisms of ISL and Penrose seem pretty grave for inflation. In particular, this is damning: http://physics.princeton.edu/~cosmo/sciam/index.html#facts

    You make a point of publishing your errors for all to see, but I have seen zero response from Linde/Guth to that. Moreover, it seems ISL is saying that even when all parameters are fixed in a given model, inflation can still give a ‘prediction’ for any CMB observable encountered, and thus the explanatory power is zilch.

    To take this further off-topic – but hey, this is much more fun – I look at Penrose as an elder sage who deeply cares to get things right rather than toot his own horn. I’m drawn to his CCC theory because I find it beautiful. He’s probably wrong about this, strong AI, and objective collapse, but it is fun to think about what if he isn’t! 🙂

    I’ve been spending my little day dreaming time thinking about what he might mean by the gravitational laws being uncomputable, like what if the Schwarzschild radius for a black hole were a function of the Kolmogorov complexity of the information inside the hole. If nature has some conservation law for Kolmogorov complexity, that would pose problems for the physical Church-Turing thesis, wouldn’t you say?

  68. Erik Hoel Says:

    #62 Michael
    You are correct in thinking the theory is about choosing appropriate levels of description. It’s in explicit agreement that higher scales can be deduced from lower scales: they can (this is also called “supervenience”). However, one can actually gain information about the causal structure at higher scales. This turns out to be very similar to how codes perform error-correction. Higher scales are like codes, but for causal relationships. Scott misses this and thinks the conclusion must be due to some form of arbitrary normalization and proposes a “fix” that makes EI give nonsensical values at higher scales. His proposal is like looking at Shannon’s noisy-channel coding theorem and saying that it’s cheating because one must always use the same input distribution no matter what. Of course, this doesn’t stop us from measuring EI over macro-variables in the real world using macro-interventions. In fact, we do this in the lab. Luckily, we can have Scott come over and edit the results of those interventions to fit whatever he thinks the fundamental distribution should have been down at the level of strings and hand us back our random numbers.

  69. fred Says:

    What would be the practical applications of this?

  70. fred Says:

    Fundamentally, isn’t this all about what’s going on in the human brain?
    A brain is when the state of small clumps of atoms becomes super sensitive to the statistical properties of larger clumps of atoms.
    E.g. if nearby atoms are in the form of a chair, a certain neuron will fire, and if a distant galaxy is in the shape of a spiral, another neuron will fire.
    The human brain is the most complex object we know because to understand it you really need to understand its entire environment (for a human brain, the entire universe).
    It’s not about the map being better than the territory, it’s about the territory spontaneously creating its own map within itself.

  71. Adam Treat Says:

    JimV #65: Yes, exactly. This is evident when Hoel says he has ‘technical objections’ and then proceeds to quibble with Scott’s normalization fix because it doesn’t align with the conceptual story he wants to tell.

    It is also telling that he doesn’t answer when challenged whether this causal emergence leads to new predictive power or not. He has a story he wants to tell and his ‘EI’ as conceived with uniform distributions allows him to do that. When Scott gives the lie behind it, he objects because it doesn’t allow him to tell the same story.

  72. Jay Says:


    Thanks for asking for comments. One good point from Erik Hoel is that, at least for some physical systems, there is confusion about which level of description is best. Most notably in neuroscience: is it best to describe everything (or some things, and under what conditions?) at the neural level, or at the psychological level, or at some freebit level maybe?

    Do you consider this question physical or vacuous? If the former, don’t you think this work could help shed light on it?

    @Erik Hoel

    Thanks for chiming in. Suppose S1 is a system described in terms of particles moving here and there, and we can compute what will happen for the next few hours. Suppose S2 is a system described in terms of two persons arguing, and we can compute how the argument will resolve. According to your causality measure, is there anything to learn from whether S1 is, or is not, S2?
    Now suppose S0 is a system described as a computer that may or may not emulate S1. According to Tononi, there is something to learn from whether S0 is, or is not, S1 (e.g., the latter could be far more conscious if it is not simulated). Does your causality measure share this property to some extent?

  73. Scott Says:

    Jay #72: Yes, for each complex system that science studies (the brain, the economy, a macromolecule…), there are important debates to be had about what are the right levels of description for understanding it. Such debates are a large part of what scientists do! Often they try to make a case for a certain level of description, by showing what actual insights they can obtain at that level that were inaccessible at “higher” or “lower” levels.

    But unfortunately, I don’t see that causal emergence theory sheds any new light here. For suppose I knew that the effective information (as always, assuming the uniform distribution) was greater at one scale than at a higher or lower one. What useful information does that give me? How does that connect to anything else I might want to know?

  74. Curious Wavefunction Says:

    I have always had a problem with reductionists who claim that complex systems may not be reducible in practice but that they are still reducible in principle.

    If that is the case, show me.

    In other words, if you cannot actually show me that a complex system is reducible in practice, then assuming that it must be reducible in principle is, beyond a point, a matter of faith.

  75. Erik Hoel Says:

    Adam Treat #67

    Scott’s supposed criticism of the math is that there is an “issue of normalization” hiding in effective information (EI). But Scott has mixed up doing macro-interventions (putting a macro-variable in some macro-state) with normalization. Scott’s confusion led him to propose a “fix” whereby all causal analysis should be forced to obey some arbitrary distribution at the microscale.

    Let’s say you want to measure the EI between two neurons to assess the strength of their causal relationship. There are plenty of reasons to do this, but maybe you want a quantifiable measure of how their causal relationship decreases under a drug. You first measure the EI over their states {firing, not-firing}. Then Scott comes running and declares that you must now discover how many subatomic states make up {firing} and {not-firing} and what those subatomic states do. Then you must weight your value by that. This produces numbers that have nothing to do with the causal relationship between two neurons. That’s not a “fix” haha. Scott just didn’t understand where the shift in the intervention distribution was coming from (macro-interventions being multiply-realizable). He also didn’t grasp why EI originally uses a uniform distribution at all, which is what makes it decomposable into properties that govern the strength of causal relationships (like determinism/degeneracy). This is extremely clear in the papers but Scott’s only objection (and it’s not much of one) of “well, you could also NOT use the uniform distribution!” ignores this.
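    [A toy rendering of this two-neuron setup (the transition probabilities are invented for illustration, and EI is computed as I(X;Y) over uniform interventions on {firing, not-firing}); the measured value does drop as the coupling weakens:]

```python
import numpy as np

def ei(tpm):
    """EI = I(X;Y), interventions X uniform over the intervention set."""
    n = tpm.shape[0]
    p_x = np.full(n, 1.0 / n)
    p_y = p_x @ tpm
    return sum(p_x[x] * tpm[x, y] * np.log2(tpm[x, y] / p_y[y])
               for x in range(n) for y in range(tpm.shape[1]) if tpm[x, y] > 0)

# Rows: intervention on the presynaptic neuron {firing, not-firing};
# columns: probabilities of the postsynaptic neuron's response.
healthy = np.array([[0.9, 0.1],
                    [0.1, 0.9]])
drugged = np.array([[0.6, 0.4],    # invented numbers: weakened coupling
                    [0.4, 0.6]])

print(ei(healthy))   # ~0.53 bits
print(ei(drugged))   # ~0.03 bits: the measured causal strength drops under the drug
```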

    His only other technical objection is something about how the data processing inequality applies in some unspecified way. But information theory has two parts: source and channel coding. In channel coding, the dpi says only that you cannot milk an already *output* message for extra information. Causal emergence is about treating interventions as *inputs* to a channel, so the dpi doesn’t apply.

    And btw, I’m not answering all things people toss out as I don’t want to spam the thread. Trying to correct the mistakes of the guy who thinks Trump proves teleology is difficult enough.

  76. Ajit R. Jadhav Says:

    Two things.

    1. Take a digital pic of, say, an attractive woman’s smiling face. Using a good digital image-viewing program, easily available these days, go on enlarging some smaller and still smaller section of it a lot. … Soon enough, you can’t tell, e.g., if it’s her eyelashes or her eyebrows that are in view.

    “Come back” to the “normal” scale, and you so easily can.

    Does this example qualify towards searching an answer to the question under discussion? Or is it that the question is much broader/deeper than that?

    [My problem here is that I seemingly don’t quite get what the problem is. … Ummm… why don’t you guys define your terms?]

    2. If the former, i.e., if the example does qualify, i.e., if referring purely to the physical aspects of the universe is enough to address the question, then, guess, you guys could look up a little bit about certain ideas like algebraic multigrid—why the technique accelerates the solution process at all. … Or, maybe, any other suitable multi-scaling approach. … Maybe there is a hint or two in these for you.

    My two cents.



  77. Erik Hoel Says:

    Jay #72. In theory, you could measure the causal relationships at both scales without reference to each other (unlike what Scott seems to believe). But I do think there is still something to learn, as it tells you something about the supervenient structure and the relationship between the high and low causal models. Your second question is a great one. It’s definitely true that the same system could causally emerge in real life but not in a computer simulation.

    #76 Ajit Jadhav. Speaking broadly, you are right that the image example does have something to do with it. A closer metaphor is like a camera being blurry or focused. Causal emergence is when the causal structure of a system snaps into focus at a particular scale, but is blurry at the other scales.

  78. Scott Says:

    Curious Wavefunction #74: There are two problems with your view. The first is that, in the complex systems that we build ourselves (such as large software packages), this is EXACTLY how it works. I.e. we know that in principle, you could trace what the program is doing one opcode or even one transistor at a time—but we also understand why in practice no one would ever want to do that. So, given that we think we already know the “opcodes” that are relevant for the physics of everyday life (namely, the Standard Model of elementary particles plus gravity), why shouldn’t we generalize that experience to other complex systems?

    The second difficulty is, what’s the alternative? Either you can come right out and say that you think magic happens at some intermediate scale—like with my “reverse Hollywoodism” theory—and that that’s what prevents reductionism from working even in principle. Or you can engage in endless verbal obfuscation and misdirection that never cash out into anything clear.

  79. Jay Says:

    Scott #73: I won’t try to predict what Erik Hoel would answer, but those are excellent questions indeed. In other words, I’m very curious to see if he will answer them.

    But let me try to give an example of something that would constitute a sensible answer in my eyes. Suppose I define a new measure, HE (thk JimV), and say: if HE for some scale (DNA sequence, cell, individual, group) is greater than HE at the other scales, then considering that scale will allow predicting most of the evolution from natural selection. Of course, I’m not saying that’s what Erik Hoel did or tried to reach, but that’s exactly the kind of thing that would make me say ‘ok you’re right, you do have an interesting measure of causality’.

  80. JimV Says:

    “Trying to correct the mistakes of the guy who thinks Trump proves teleology is difficult enough.”

    I would say, “Trying to correct the mistakes of the guy who thinks that Scott thinks Trump proves teleology is even harder” – if I were the sort of person who says that sort of thing, which of course I’m not.

  81. Curious Wavefunction Says:

    Scott #78: While I agree with your computer example, I also think it’s far simpler to imagine reductionism working in principle in that case than in the case of extrapolating from atoms to Winston Churchill’s election as prime minister during the Second World War. As for your second question, when you ask what’s the alternative, it’s precisely theories like this one that might be the alternative.

    Now you may ask why strong reductionism in principle should work for the computer example and not for the Churchill example. To which may I venture a possibility: that theories like the one proposed by Erik Hoel are not all-or-none, and that they may apply on a continuum and on a case by case basis. You may need to invoke them to a significant extent for some cases in which strong reductionism may not be sufficient, and you may not need them as much for other cases where it is. Is there any fundamental principle in nature which says that strong reductionism needs to be a universally applicable phenomenon?

    I think that part of the problem in these discussions seems to arise from semantics. When someone says that “atoms can account for Beethoven’s 5th”, I am much more inclined to accept it than when someone says “atoms can explain Beethoven’s 5th”. The latter is much harder to grasp (while the former seems true but in a trivial and uninteresting manner) in terms of causal relationships since there is very little in my physical experiments with atoms that seems to have any connection with the creation of the 5th and my enjoyment of it. My guess is that the debate between reductionists and anti-reductionists may remain unresolved because they actually mean different things and are bent on getting mired in binary choices.

  82. Jay Says:

    Erik Hoel #77.

    >It’s definitely true that the same system could causally emerge in real life but not in a computer simulation.

    Let me rephrase: Does your causality measure share this oddity, and if yes do you think you can go beyond hand waving on this point?

  83. fred Says:

    Causality means asking “why?”.
    As Feynman put it, you only go so far until you hit some assumption that’s taken for granted.
    If you keep asking “why?” against any historical event, eventually you’ll get the workings of individual human brains.
    And fundamentally we can’t even understand our own decisions (notions of free will and other nonsense) without looking at things at the molecular level.

  84. Scott Says:

    Curious Wavefunction #81: Alas, this is a perfect example of what drives me up a wall in these reductionism debates, because I can never put my finger on any actual clear question that’s at issue.

    It’s obvious that, if you knew the complete quantum state on a large enough region of a spacelike hypersurface surrounding Beethoven, you could in principle simulate every last detail of the process by which he wrote his 5th symphony. And it’s equally obvious that that’s the wrong level of description for understanding how or why he wrote the symphony, where it fits in the history of music, etc. So then,

    What is the additional question that you would like to have answered? What sort of result, or discovery, would count as an answer to it?

    And regarding causal emergence: even if I agreed that there was a deep mystery here (which again, if we leave consciousness out of it, I don’t), why should I accept that “effective information” was a key to resolving that mystery, as opposed to a thousand other numerical measures that one could write down? Where has that assumption been justified, in a way that’s not a word ratatouille but proceeds linearly from assumptions to a conclusion?

    Much like with IIT, the suspicion arises that some of the commentators who say they like causal emergence haven’t actually looked under the hood of what they bought—so their expressions of support ought to be interpreted as support for some aspirational theory of the future, rather than for anything that’s actually been written down.

  85. Michael Says:

    #76 Erik Hoel: I think you misunderstand Scott’s argument; it would work just as well in the opposite direction, measuring common information between input and output when micro-states follow a distribution derived from a good distribution of macro-states.

    And uniform distribution of interventions on a fixed scale is not always natural: when an imperfect channel is already digital and different errors have different probabilities, the ECCs become asymmetric.

    Maybe it would be a more convincing characterisation of the scale choice efficiency if you took the maximum, over all distributions, of the common information between intervention and output, divided by the logarithm of the number of interventions. That way, at least, inventing meaningless micro-states would be strictly harmful, unlike under the current definition of EI. I think it would be a more natural metric of effectiveness: you want a cheap description that still allows one to capture a lot of the relationship between inputs and outputs.

  86. Erik Hoel Says:

    Jay #79 and #82. Unless development dries up, the theory will have practical applications, but remember that this is a research program that is at most 3–4 papers old. It’s so new, and we’ve been focused on fundamental theory, such as showing why EI is sensitive to causal structure (the uniform distribution), how causal structure shifts across scales (determinism/degeneracy change), and how measuring EI can capture that (which Scott believes is due to an arbitrary normalization, but really it stems from macro-interventions). Additionally, we’ve been grounding it in information theory (as channel coding). But it will have applications. For instance, Scott in #73 admits that scale is important in science, but then says such issues are settled just by scientists making a case. But in many systems, such as the brain, no obvious scale pops out at you, and yet scale matters for things like understanding the neural code. As to the computer/simulation issue: causal emergence is more flexible in how it groups states than IIT. So perhaps not. I’d love to have a better answer for whether this is true, but I just don’t know yet.

    JimV #80. But correcting the mistake of the guy who thinks I think that Scott thinks Trump proves teleology is even harder. I’m sure he’s winking, but the Trump-brain factor should never be underestimated. I only read the blog every few years when Scott makes a big post about my research. So we should be right on course for the 2020 election haha.

    Curious Wavefunction #81. I generally agree with your position. But Scott doesn’t represent the theory correctly. It is precisely such a continuum. It justifies reduction in many cases (in fact, it shows that mathematically systems *generally* lean toward reduction).

  87. Curious Wavefunction Says:

    Scott #84: For me the central question *is* one of reductionism in practice. The reason why emergentists have a problem with strong reductionism is simply because reductionism in practice has not been demonstrated (or is impossible to demonstrate) except in a limited number of cases, while reductionism in principle (quarks/strings simply *accounting* for everything) is uninteresting because it is both trivially true as well as pitched at the wrong level of explanation. What’s left, then?

    Then there’s contingency which I think might pose the ultimate obstacle. The actual realization of Beethoven’s 5th was a result of umpteen contingencies of chemical, biological and ultimately cultural evolution that neither strong reductionism nor causal emergence can really account for. Once contingency enters the picture then we can only answer the how and not the why. An exhaustive enumeration of the causal hypersurface of reductionist possibilities may include Beethoven and his 5th as one potential result. It still won’t tell us why out of the countably infinite scenarios (ones in which “Beethoven” might have been the name of an albino rhino at the Berlin zoo for instance), *that* was the chosen one that made the transition from the world of potential existence to the world of real existence.

    I agree with you that the particulars of Erik’s theory may not provide the correct answer, but I do find attractive the fact that the theory does provide a potential angle for accounting for complexity that does not do away with reductionism but which (as Erik says above) does put strong reductionism on a continuum. In that sense it lends itself to a view of explanation as a slider rather than a binary switch and is more nuanced and flexible in my opinion. Personally I find this flexibility and diversity pleasing.

  88. Isotopeblue Says:

    Re #48, the question is not what’s logically the fact, but what’s rhetorically effective. That said, the fact that your chimeric post goaded Hoel into showing just what a jerk he is (#75) suggests it may have been worth it.

  89. Scott Says:

    Isotopeblue #88: Sorry, I won’t be approving further comments from you, since I don’t see that they’re contributing to the discussion in any way.

  90. Daniel Freeman Says:

    @Erik: What does your EI theory predict that reductionism doesn’t? I’ve read your paper, and I’ve read most of this thread, and I still don’t really have a good answer to that question.

    Like, characteristic lengthscales (be they ”emergent” or not) are old hat in physics. Is the meat of the claim really just, “We have a diagnostic that may be better than alternatives at identifying a characteristic lengthscale.”?

    And if so, what at all does that have to do with *causality*? Whatever it does have to do with causality–why doesn’t that also apply to a renormalization group procedure?

    (I hope this doesn’t sound too combative–I’m really genuinely curious about this!)

  91. Jay Says:

    Erik Hoel #86. Fair enough. Thanks for the discussion and good luck with your future research.

  92. Erik Hoel Says:

    Hey all, I’ll answer some last questions before bowing out. Feel free to contact me via email. While I don’t think Scott put together various aspects of the theory (in ways I’ve detailed pretty thoroughly), I do applaud his exemplary, cordial engagement with the ideas. I look forward to chatting with him in the future and thank Scott very much for his interest.

    #85 Michael. To me it reads like you are proposing we measure the EI at the microscale using the macroscale distribution (of interventions, I’m assuming, but I could be wrong). In fact, that’s close to the theory (for instance, doing this would give 1 bit in Scott’s example, not 0.54 bits). We do define a similar quantity to that in your third point, called “effectiveness” (the mutual information between a set of interventions and their effects, divided by the logarithm of the number of those interventions). It is indeed a big part of the theory.

    #90 Daniel. Don’t worry, you don’t sound combative. Causal emergence isn’t against reductionism. It’s about developing tools that tell which is which, depending on the system. This is a broad answer, but I think universal reductionism predicts that the best causal models (no matter the purpose) are always the most detailed (the territory), whereas causal emergence disagrees (maps can be better). As to your latter question about what this has to do with causation: it’s using information theory to quantify the causal structure of systems. It’s framed around interventions and their effects (causal analysis). One of the big proofs of the papers (which Scott leaves out) shows how these information-theoretic measurements capture key properties of causation and precisely aren’t arbitrary in the way Scott is suggesting. One open question is indeed how this should be applied to physics, but there do seem to be some relations to the renormalization group procedure.

    Thanks to everyone for their time if they got this far! – Erik

  93. Michael Musson Says:

    I think I am missing something in the argument for causal emergence.

    If I have microstates 1–8 and choose to group them as macrostates A and B, I haven’t changed the information content of the system. I have changed the information content of my description of the system. If I talk about A and B, then I have a more efficient description of the system, but a much less effective one than the description in terms of 1–8.

    Comparing entropy is apples and oranges since 1-8 and A,B are two different systems, no?

  94. Mr. Eldritch Says:

    Well, of course it’s a matter of faith! It’s also a matter of faith that evolution is real, or that the Earth is actually round – I genuinely cannot refute all arguments made against those positions, and just take it on faith that, say, the flight paths of major air routes MUST have some origin that can wholly be explained by geography, legislation, and business practices, rather than indicating a mark against the spherical earth.

    There’s plenty of arguments where I personally haven’t – or can’t – work out everything in full detail, but I take it on faith that the premises actually lead to the conclusions.

    If I were to doubt everything that I personally cannot derive from primary sources, arguments I fully understand, and calculations I’ve worked out on my own, then I would be unable to say anything at all with confidence. (Would the Apollo astronauts have been fried by the Van Allen belt radiation, thereby proving the Moon landing a fraud? I don’t think so, and other people tell me that they wouldn’t have – but I haven’t done the calculations myself and wouldn’t really know how, and even though I have seen someone else do those calculations I just kinda glossed over them and certainly didn’t double-check.)

    I’m being more than a little hyperbolic here, but the point is that there are *lots* of things I personally can’t show are reducible in practice, because I don’t have the expertise or time or insight or knowledge needed to do so. I don’t see any fundamental difference in assuming that some things are so difficult to work out in practice that *nobody* can completely demonstrate them.

  95. Mr. Eldritch Says:

    That was directed at Curious Wavefunction #74, by the way.

  96. Tim Makarios Says:

    … Einstein and von Neumann and Erdös and Ulam and Tarski and von Karman and Polya …
    Can I add Noether to that list?

    Your musings on higher-level causation and seemingly improbable disastrous events reminded me a bit of the end of Amos chapter 5:

    What sorrow awaits you who say,
        “If only the day of the LORD were here!”
    You have no idea what you are wishing for.
        That day will bring darkness, not light.
    In that day you will be like a man who runs from a lion—
        only to meet a bear.
    Escaping from the bear, he leans his hand against a wall in his house—
        and he’s bitten by a snake.
    Yes, the day of the LORD will be dark and hopeless,
        without a ray of joy or hope.

    “I hate all your show and pretense—
        the hypocrisy of your religious festivals and solemn assemblies.
    I will not accept your burnt offerings and grain offerings.
        I won’t even notice all your choice peace offerings.
    Away with your noisy hymns of praise!
        I will not listen to the music of your harps.
    Instead, I want to see a mighty flood of justice,
        an endless river of righteous living.

    “Was it to me you were bringing sacrifices and offerings during the forty years in the wilderness, Israel? No, you served your pagan gods—Sakkuth your king god and Kaiwan your star god—the images you made for yourselves. So I will send you into exile, to a land east of Damascus,” says the LORD, whose name is the God of Heaven’s Armies.

    P.S. I don’t think you’ve quite got the comments fixed yet. The main page of your blog says this post has 95 comments, but when I follow the link, the page stops at comment #64, except on my tablet, where I came earlier via a different route, and it loaded down to comment #73, and pre-filled someone else’s name and email address for me to comment under.

  97. Mateus Araújo Says:

    I don’t see how Aharonov’s fundamental postselection is in any way non-reductionist. He postulates a final *quantum state*, not a final macrostate that includes complex entities like minds. And even if he did, one could again simply express this macrostate in terms of fundamental elements and obtain again a reductionist description.

    His post-selection is akin to having a boundary condition in the future, analogous to a boundary condition in the past (of course, with the obvious difference that we have no reason to think that it exists).

  98. JimV Says:

    A) Some things are only caused to happen at a higher level than basic physics; until the higher level was reached, they would not have happened.

    B) In probing nature for the causes of various things, sometimes it is better to focus on information at a higher level than basic physics.

    I’m still confused as to which of these (or none of the above) “causal emergence” is supposed to mean.

  99. Daryl Mcullough Says:

    A lot of heated debates in philosophy and science sound like there is no actual disagreement. One side says

    “X is of course true, but I insist, nevertheless, that Y is true.”

    while the other side says

    “Y is of course true, but I insist, nevertheless, that X is true.”

    It’s like the stupid beer commercial from years ago “Tastes great/Less filling”.

    Reductionists and nonreductionists don’t really disagree, they just prefer to emphasize one or another of two true claims, which are as Scott described them:

    1. My nose and my behavior are completely described by quantum field theory.

    2. I scratch my nose because it itches.

    Nobody really disagrees, do they?

  100. Daryl Mcullough Says:

    Having said that, I think there is a sense in which nonreductionist explanations are called for. If the high-level causal explanation can be “implemented” in many different ways in terms of the low-level, then the appropriate level of explanation is at the higher level.

    If you’re watching someone play chess, and you ask, “Why did he allow his rook to be captured?”, then presumably the answer is in terms of “It took the heat off his queen” or something chess-appropriate. An explanation in terms of QED is inappropriate, not simply because it’s overkill, but because it’s not an explanation at all. If the topic is chess, then the explanation should be in chess terms, one that would be equally appropriate regardless of how the chess player is implemented.

    Of course, in a game of chess, there might be events that “break the abstraction”. If the player made the wrong move because his hands were shaking, then you can’t give an explanation of that in purely chess terms. But if it’s possible to give a chess-level explanation, it’s preferable.

  101. Peter Morgan Says:

    Musson #93, FWIW, I see that as largely my argument as I put it in my #66 (ancient history by now, of course). Hoel replied when I posted my #66 on his FQXi thread, from which I’d say that he hasn’t yet sufficiently considered the information that is external to a Markovian model of a system, both at the space-time end and at the consciousness end; but for now it seems he has a viable research project.

  102. Scott Says:

    Daryl #99: I think your comment wins this thread, particularly in its analogizing of the reductionist/non-reductionist debate to a beer commercial.

    However, I also find that there’s a strong reason in practice to favor the “reductionist” side. Namely, they’re the side that says:

    “Of course the beer tastes great while also being less filling. That’s exactly what we’ve been saying for centuries.”

    Whereas the non-reductionist side says:

    “We wish to announce our paradigm-smashing, world-shaking discovery. Namely, while it might be true that the beer is less filling, it also tastes great!”

    It’s the pretension of shockingness and originality that grates.

  103. Scott Says:

    Mateus #97: In the context of “reverse Hollywoodism theory,” what would make Aharonov’s postselected final state non-reductionist (in my understanding) is that it involves conditions like “an orange clown has destroyed life on earth,” not conditions like “the universe is filled with thermal radiation at such-and-such temperature.” I.e., the final quantum state has enormous Kolmogorov complexity, if we need to use the programming language of fundamental physics rather than that of human-level concepts.

  104. Will Says:

    Scott’s objection does not need the macroscale distribution to be calculated by enumerating all the microstates, as you suggest.

    Here is a slight variant of the objection: You draw an analogy to channel capacity. But it is possible to make the analogy much more directly. Each system, *at a given description level*, can be viewed as a communications channel where the experimenter chooses a macrostate input. From this point of view, we should let the experimenter choose the input probability distribution to maximize the channel capacity. If we do this, we of course restore the reductionist inequality.

    Here is a second objection: Consider a wire that is part of a computing system or controlling some electrical device. For the microstates, we will take voltages on the wire. A voltage in some interval counts as a 0 bit and turns the device off, and a voltage in a second interval counts as a 1 bit and turns the device on.

    Our macrostates will be sets of possible voltages. It is easy to see that a given macro scale achieves the maximum possible channel capacity (1 bit) if and only if

    A) Each macrostate is contained in one of the two intervals.

    B) The number of macrostates in each interval is equal.

    Condition A) is obviously reasonable. But B) is incredibly lax as compared to our understanding of what makes a reasonable causal model of the system. There is a massive number of ways to divide two large sets into partitions with the same number of parts. But very few of them seem like good causal descriptions of the system.

    Furthermore, if n is the number of microstates in each interval, then it is possible to build a chain of 2 log_2(n) macroscales, each a coarsening of the previous, that alternate between 1 bit and (1/3) log_2 3 + (2/3) log_2 (3/2) ≈ 0.918 bits of effective information. This alternation does not correspond to anything in the causal structure of the system.

    Now consider a variant of the wire system, where we also examine what happens to voltages between the two intervals. I expect the probability that such a voltage will be registered as an on bit / activate the device to vary smoothly between 0 on one interval and 1 on the other. What is the macroscale that maximizes the effective information? It is not too hard to see that the solution must coarse-grain the region between the intervals into one or a few macrostates, while fine-graining each of the intervals as much as possible. The purpose of this is to ensure that most of the mass of the uniform distribution on macrostates is supported on one of the two intervals, with very little in between.

    This is the *exact opposite* of how an engineer studying the system would choose to coarse-grain it, if there were some problem where the voltage was sometimes between the two intervals. They would coarse-grain the intervals where the behavior of the system is known and as expected, and look very finely at what is going on between the two intervals, where the behavior is more complicated. They may study, e.g., the exact graph of how the probability distribution varies as a function of the voltage between the two intervals.

    Finally, a small objection: Consider any system at all where the macrostate contains more causal information than the microstate. To this system we add a cup of hot tea and a jug of cold milk. We rig up the system so that, in each macrostate, some of the cold milk is poured into the hot tea, to equalize the entropy of each macrostate. However, the tea/milk has no connection to the output whatsoever. By Boltzmann’s equation, this equalizes the number of microstates within each macrostate, and so ensures that the microscale contains at least as much information as the macroscale.
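    To put numbers on the wire example, here is a toy sketch in Python (the partitions and the EI routine are my own illustrative reconstruction of the uniform-intervention definition, not code from the paper), reproducing the 1-bit and ≈0.918-bit values:

```python
import math

def effective_information(channel):
    """EI = I(intervention; effect), with interventions drawn uniformly
    over the macrostates. channel[i][j] = P(effect j | do(macrostate i))."""
    n = len(channel)
    p_out = [sum(row[j] for row in channel) / n for j in range(len(channel[0]))]
    ei = 0.0
    for row in channel:
        for j, p in enumerate(row):
            if p > 0:
                ei += (p / n) * math.log2(p / p_out[j])
    return ei

# Deterministic wire: voltages in one interval turn the device off (effect 0),
# voltages in the other turn it on (effect 1).

# Two macrostates, one per interval: EI = 1 bit.
two_state = [[1, 0],
             [0, 1]]

# Three macrostates: the off interval split in two, the on interval kept whole.
# The split adds nothing causally, yet EI drops to
# (1/3)*log2(3) + (2/3)*log2(3/2) ≈ 0.918 bits.
three_state = [[1, 0],
               [1, 0],
               [0, 1]]

print(effective_information(two_state))    # 1.0
print(effective_information(three_state))  # ≈ 0.918
```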

  105. Mateus Araújo Says:

    Scott #103: So you consider non-reductionist a mechanism that is merely very hard to explain in terms of fundamental physics, instead of impossible to explain in terms of fundamental physics?

    I guess that is fair enough. If there were a great orange shitpile written in the final state of the universe I would start believing that there is a god, and that he hates us.

  106. Scott Says:

    Mateus #105: Yup.

  107. Nick Nolan Says:

    Let’s see if I understand this correctly.

    System with 8 variables X = (X_1, X_2, … , X_8).

    Orthodox reductionism, with some locality: the world can be explained using some local update rule
    X_n' = f(X_{n-1}, X_n, X_{n+1}) + randomness.

    It’s OK to describe the universe with macroscopic theories X = g(X), as
    long as they can in theory be described as well or better using some
    reductionist function f.

    Orthodox reductionism fails only if someone can show that there exist macroscopic
    phenomena that can’t be reduced to f. Just finding some g does not invalidate anything.

  108. Anonymous Says:

    I still get the name and mail address of other people when entering a comment.

    My question for you Scott: If you observe a state A why do you assume with such confidence that there was a single causal chain leading to that state?

    There could be thousands, or not? Just like in exponential planning domains. So how can you be sure that Jews would not have been persecuted and killed in Germany if one of the states or transitions you listed had not happened? Maybe Hitler would have found another way to gain power (he failed before but then rose again). Maybe if Hitler had been killed, someone else would have stepped in and continued his way. Maybe some of the states you listed appear in many paths to that final state and some only in a few. Maybe the system is sensitive in that way. Maybe all paths in the system would have led to the same outcome. Who knows?

  109. Greg Ver Steeg Says:

    Here is a new measure of “causal emergence” that sidesteps issues with normalization. We want to use something like a channel capacity between interventions and effects, as Hoel suggests. For channel capacity, we are supposed to take the supremum of mutual information over all input (intervention) distributions. When we do so, however, we see that Hoel’s examples give the trivial result that the capacity of the micro-channel and the macro-channel are the same, as expected (and for arbitrary examples, the macro-channel capacity can only decrease). However, if we define the “Intervention Efficiency” (IE) as IE = max_{p(intervention_C)} I(intervention_C ; effects_C) / log |C|, for a given coarse-graining C with |C| states, then we recover Hoel’s result that a light switch has maximal causal efficiency when we consider the macro-states “on” and “off”, versus the micro-states for every configuration of atoms (leading to a huge number of states, dramatically lowering the efficiency).
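    For concreteness, here is a toy sketch in Python (the light-switch channels and the Blahut–Arimoto routine are my own illustration, not code from any paper), showing that the two capacities agree at 1 bit while IE penalizes the padded microscale:

```python
import math

def capacity(channel, iters=500):
    """Channel capacity sup_p I(X;Y), via Blahut-Arimoto iteration.
    channel[x][y] = P(y|x)."""
    n = len(channel)
    p = [1.0 / n] * n
    for _ in range(iters):
        q = [sum(p[x] * channel[x][y] for x in range(n))
             for y in range(len(channel[0]))]
        # Multiplicative update: p[x] is reweighted by exp(D(channel[x] || q)).
        w = []
        for x in range(n):
            d = sum(pyx * math.log(pyx / q[y])
                    for y, pyx in enumerate(channel[x]) if pyx > 0)
            w.append(p[x] * math.exp(d))
        z = sum(w)
        p = [wx / z for wx in w]
    # Mutual information (in bits) at the optimizing input distribution.
    q = [sum(p[x] * channel[x][y] for x in range(n))
         for y in range(len(channel[0]))]
    return sum(p[x] * pyx * math.log2(pyx / q[y])
               for x in range(n)
               for y, pyx in enumerate(channel[x]) if pyx > 0)

def intervention_efficiency(channel):
    """IE = capacity / log2(|C|), with |C| the number of intervention states."""
    return capacity(channel) / math.log2(len(channel))

# Light switch: 2 macrostates (on/off) vs. 8 hypothetical microstates,
# 4 per macrostate, all producing the same effect class deterministically.
macro = [[1, 0], [0, 1]]
micro = [[1, 0]] * 4 + [[0, 1]] * 4

print(capacity(macro), capacity(micro))  # both 1 bit
print(intervention_efficiency(macro))    # 1.0
print(intervention_efficiency(micro))    # 1/3
```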

  110. FeepingCreature Says:


    “Surprise exists in the map, not in the territory. There are no surprising facts, only models that are surprised by facts. Likewise for facts called such nasty names as “bizarre”, “incredible”, “unbelievable”, “unexpected”, “strange”, “anomalous”, or “weird”. When you find yourself tempted by such labels, it may be wise to check if the alleged fact is really factual. But if the fact checks out, then the problem isn’t the fact, it’s you.” —Think Like Reality (LessWrong)

  111. Jon K. Says:

    #99 and #102

    I think there is lot of “talking past each other” that goes on in the scientific world and sometimes in these threads as well. It’s always great when people really understand the point the other side is trying to communicate as best as possible, before an idea is dismissed. Granted, this takes time and is not feasible for every “paradigm-smashing” theory that shows up in your mailbox.

    Misunderstandings may be better solved through short back-and-forth dialogue, as opposed to long papers/articles/posts. Too bad scientists don’t have more round table discussions. I’d love to see Scott and another QM expert or two sit down with some respected scientists on the “hidden variable” side (e.g. Wolfram, ‘t Hooft, etc.). Having an informal, short-response discussion might allow both sides to really pinpoint where their opinions diverge, and clear up any misunderstandings that exist.

  112. Scott Says:

    Jon K. #111: But scientists do sit down and hash out their differences all the time. It’s just that, because of selection effects, those discussions often aren’t what you see on blogs like this one. Often the scientists start with a vast amount of shared understanding, and the point they disagree about is too technical—too deep in the search tree—for hashing it out on a general-interest blog to make sense. Other times, one side realizes extremely quickly that they made a mistake, or the sides realize they were just using words differently, and there’s no reason to take the whole thing public.

    But then there are the cases where essentially all experts have a certain shared understanding, based on common use of words and concepts—for example, that local hidden variable theories can’t violate the Bell inequality—but a small contingent goes to the public with claims that some local hidden variable theory can too violate Bell. In such cases, in my experience, no amount of discussion with the radical contingent ever really clears anything up, because the radical contingent never submits to the same shared understanding of words and concepts as everyone else. “Yes, you say that 1+1=3 is false, but that’s only because you’ve ignored multiple realizability and non-orientable manifolds and Clifford algebras and yada yada yada…”

    Anyway, for reasons I’ll leave as exercises for the reader, it’s these latter cases that disproportionately make it to Shtetl-Optimized. 🙂

  113. Scott Says:

    Will #104: Thanks very much for your comment, which helped me clarify my own thoughts! This will, I think, be my final “technical” response to Erik Hoel, and to Simon DeDeo (who has a tweetstorm arguing that I’m wrong).

    It seems to me that there are two comparisons that would make sense here.

    On the one hand, you could look at a “natural” distribution Dmic over microstates, as well as the distribution Dmac over macrostates that Dmic gives rise to (obtained by coarse-graining Dmic). If you do that, then of course predictability given a sample from Dmic will be at least as great as predictability given a sample from Dmac, by the data processing inequality. So, no good for Hoel’s thesis.

    Alternatively, you could take seriously everything Erik keeps saying about the importance of coding, exogenous interventions by the experimenter, and Pearl’s do-operators. That would suggest letting the experimenter choose a distribution Dmac over macrostates that maximizes the effective information. OK, but then that Dmac is consistent with some underlying distribution Dmic over microstates, and a sample from Dmic still gives you at least as great predictability as a sample from Dmac. So, no good for Hoel’s thesis either.

    What Hoel actually does is an unholy hybrid of the two things. On the one hand, he says that we get to consider a uniform distribution Dmac over macrostates, because we’re the experimenters and we can exogenously intervene. OK. But on the other hand, he refuses to let us consider the distribution Dmic over microstates that would naturally arise by drawing a macrostate from Dmac. Instead he demands, for no discernible reason, that the microstate be drawn from the uniform distribution over all microstates, something that would only make sense in a situation with no exogenous control.

    Of course, only by rigging the comparison in that way is he able to arrive at the amazing conclusion that a Dmac sample can give you more information than a Dmic sample.
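    To make this concrete, here is a toy numerical illustration in Python (the 4-state dynamics are invented purely for the example): once the microstate distribution Dmic is required to marginalize to Dmac, coarse-graining both ends of the channel can only shrink the mutual information, just as the data processing inequality promises.

```python
import math

def mutual_information(p_in, channel):
    """I(X;Y) in bits, for input distribution p_in and channel[x][y] = P(y|x)."""
    p_out = [sum(p_in[x] * channel[x][y] for x in range(len(p_in)))
             for y in range(len(channel[0]))]
    mi = 0.0
    for x, px in enumerate(p_in):
        for y, pyx in enumerate(channel[x]):
            if px > 0 and pyx > 0:
                mi += px * pyx * math.log2(pyx / p_out[y])
    return mi

# Toy micro dynamics: 4 microstates whose transitions mostly preserve the
# macro grouping {0,1} -> A, {2,3} -> B, but scramble within each group.
micro = [
    [0.45, 0.45, 0.05, 0.05],
    [0.40, 0.50, 0.05, 0.05],
    [0.05, 0.05, 0.50, 0.40],
    [0.05, 0.05, 0.45, 0.45],
]
group = [0, 0, 1, 1]  # macrostate of each microstate

# Macro channel obtained by marginalizing the micro channel, with the
# microstate drawn uniformly *within* the chosen macrostate.
macro = [[0.0, 0.0], [0.0, 0.0]]
for g in (0, 1):
    members = [x for x in range(4) if group[x] == g]
    for x in members:
        for y in range(4):
            macro[g][group[y]] += micro[x][y] / len(members)

d_mac = [0.5, 0.5]                # uniform over macrostates
d_mic = [0.25, 0.25, 0.25, 0.25]  # the micro distribution d_mac induces

i_mic = mutual_information(d_mic, micro)
i_mac = mutual_information(d_mac, macro)
print(i_mic, i_mac)               # ~0.533 bits vs. ~0.531 bits
assert i_mac <= i_mic + 1e-12     # data processing inequality
```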

  114. Sniffnoy Says:

    Scott #113: Thanks, I was getting confused here! I was confused as to how Hoel’s talking about it in terms of channel capacity (which should yield an inequality as usual, right?) matched up with his uniform distribution; the answer appears to be “it doesn’t”.

  115. Richard Gaylord Says:

    “maybe just bored with debates about ‘reductionism’ versus ’emergence’?”. me too. here’s two statements that are relevant.

    (1) from Roald Hoffmann, Nobel Prize-winning theoretical chemist:

    “there are concepts in chemistry which are not reducible to physics. Or if they are so reduced, they lose much that is interesting about them. I would ask the reader who is a chemist to think of ideas such as aromaticity, acidity and basicity, the concept of a functional group, or a substituent effect. Those constructs have a tendency to wilt at the edges as one tries to define them too closely. They cannot be mathematicized, they cannot be defined unambiguously. But they are of fantastic utility to our science.”

    (2) from Physics Today April 2017 “In Defense of Crazy Ideas”:

    “As for Crazy Ideas of the Third Kind […] The extension of a rubber band, which roughly obeys Hooke’s law, is purely entropic and has nothing to do with the forces between the atoms that make up the material, so one could say that in that case a force law emerges from Boltzmann’s definition of entropy.”

  116. Adamt Says:

    The comments are completely screwed up for me. A few days ago I could see up to around #85, but now it’s stuck at #73. Also, I am not Adamt, but their info is stuck in the boxes (previously it was someone else’s info).

    No need to post this comment, just letting you know.

  117. Adamt Says:

    That comment seems to have cleared up the logjam, because now I can see up to #115.

  118. Libb Says:

    I can’t see the last comments. The homepage link displays 115 comments but when I click I only see the first 64 comments

  119. Libb Says:

    Ok now that I commented I can see them..

  120. Erik Hoel Says:

    Just popping back in this one time to link a new non-technical explainer of causal emergence on my website. It also demonstrates why Scott’s criticism, which is actually clearest not in his original post but in #113 (and also Will’s #104), is based on failing to separate out the information associated with causal structure from information in general.

    Thanks for the discussion – Erik

  121. dorothy Says:

    I couldn’t miss the irony of your ‘never, never, never normalize this’ post and the ‘it all comes down to normalization’ statement about Trump in your current post. Bless the English language.

  122. Cameron Smith Says:

    The paragraph that begins

    Once the argument is spelled out, it’s clear that the entire thing boils down to, …, a normalization issue.

    really is the main point on which any future debate on this topic, at least insofar as it relates to the paper at hand, hinges.

    I know everyone, including myself, becomes skeptical by default when people reference their own research, but I think there is a somewhat related question as follows:

    If probabilistic constraints are defined over potentially overlapping subsets of components generating a microstate, for example as a result of interfaces between sub-components comprising the micro-system and other, naively independent, systems that interact through said interfaces, is there a distribution over the joint microstates capable of satisfying those constraints?

    The answer is, quite trivially, no, and this has been studied extensively at least since Boole (1862), and likely earlier. The setting is purely classical but otherwise, from a probability-theory perspective, related to http://dx.doi.org/10.1103/PhysRevA.88.022118 except that hypergraphs are explicitly considered on three- or four-component microsystems as opposed to n-cycles. Some collections of distributions over sub-components cannot arise from any joint distribution over microstates, which, in a sense, limits the subset of possible interfaces a system can simultaneously engage in with other systems: https://arxiv.org/abs/1506.02749 . This may seem quite a bit less exciting, since it also seems completely consistent with reductionism, but it could be useful more broadly in reasoning about the various manners in which systems interact with one another, by noting some ways in which classical systems actually cannot possibly interact, despite the potential for spending some time in ill-fated attempts to do so. In a way, I get the sense that this is among the things that motivated the primary paper at hand in this blog article.
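    To make the incompatibility concrete, here is a minimal, stdlib-only Python sketch (a hypothetical toy of my own, not the construction from either linked paper): three binary components whose pairwise marginals each demand perfect anti-correlation admit no joint distribution over microstates, because every microstate must contain at least one agreeing pair.

    ```python
    # Toy instance of Boole's compatible-marginals problem (hypothetical example):
    # three binary components A, B, C with pairwise constraints P[A != B] = 1,
    # P[B != C] = 1, P[A != C] = 1. A joint distribution satisfying all three
    # constraints could put mass only on microstates where every pair disagrees.
    from itertools import product

    pairs = [(0, 1), (1, 2), (0, 2)]
    states = list(product([0, 1], repeat=3))  # the 8 joint microstates (a, b, c)

    # Support of any candidate joint: microstates where all three pairs disagree.
    support = [s for s in states if all(s[i] != s[j] for i, j in pairs)]

    # With two values and three components, some pair must agree (pigeonhole),
    # so the support is empty and no joint distribution exists.
    print("admissible microstates:", support)  # → admissible microstates: []
    ```

    The empty support is exactly the odd-cycle obstruction: the pairwise marginals are individually fine, but jointly unsatisfiable.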

  123. Joshua Zelinsky Says:

    On the topic of the data processing inequality, here is a question that may not be well-posed: what is the relationship between the data processing inequality, superdense coding, and Bell’s inequality? It seems like superdense coding gets around the data processing inequality because the inequality only applies to local processes, and the entanglement is not local. This sounds close to what happens with Bell’s inequality, but not quite: evading the data processing inequality somehow seems like a less surprising claim, but what exactly is the difference that makes it less surprising? It seems like this should have something to do with local realism, but I think I may be confused here.

  124. OnOracles Says:

    So are you a theist?

  125. Scott Says:

    OnOracles #124: If I answered that, I don’t see how it would provide any additional information beyond what’s in the post.

    (It’s similar to how journalists always ask me, “so is the D-Wave device a quantum computer or not?,” as if a categorization choice could give them additional information beyond the facts of what the device is and does.)

  126. Jair Says:

    Do you think the hard problem of consciousness is a genuine obstacle to pure reductionism?

  127. Isotopeblue Says:

    Hmmm, you’re probably right. Apologies for the grumpy last night posts.

  128. OnOracles Says:

    Well, if everything has a causation, then “the sum of the parts does not give the whole picture” makes sense, right?

    The right question, then, is: if the sum of the parts does not give the whole picture, then does everything have a causation?

  129. OnOracles Says:

    There is something called common information (rather, common knowledge, which you have already blogged about) that may be the notion of additional information you are missing, and that could convey what you intend to convey unequivocally.

  130. quax Says:

    “The world itself had been rooting for horribleness all along. “

    Not the entire world, but enough humans long for it to make it come about.

    One only has to do some cursory field research at sites like zerohedge.com, where the alt-right congregates, to experience that this is true.

    Incidentally, Scott, that is why I did not understand why you engaged people like Mencius Moldbug in an attempt at a rational discussion. There is nothing to discuss: they are neither irrational nor inconsistent; they have simply made the choice to glorify evil (for lack of a better word in the higher-level causation plane).

  131. A primer on causal emergence | Artificia Intelligence Says:

    […] first post is inspired by the physicist and blogger Scott Aaronson, who recently blogged his criticisms about a theory I’ve been working on, called causal emergence. Since his criticisms reflected a […]

  132. Shmi Nux Says:

    I suspect that if the theory were called something like “A Model for Quantifying Emergence at Various Scales,” instead of carrying grandiose claims like “When the Map Is Better Than the Territory,” there would be a lot less controversy and a lot more productive discussion of the power of the suggested measures.

  133. quax Says:

    The fact that Erik Hoel thinks that you are a physicist, rather than a theoretical computer scientist, something that can be verified rather easily, does not inspire confidence in his rigour.

  134. gentzen Says:

    quax #133: Erik Hoel probably thinks that Scott is a polymath, hence he uses the profession which he thinks most appropriate for the given context. Since physicists are the ones who believe in scientific reductionism, it just makes sense to call Scott a physicist for this discussion. On the other hand, I am not so sure whether Scott really believes in scientific reductionism. I guess he just wants a more convincing disproof that doesn’t feel like a cheap trick to him.

  135. Jim Cross Says:

    I have been struggling with the last part of your post, Scott. I can’t decide if it is mostly tongue in cheek or serious. I’ll take you at your word.

    “In particular, there are causal forces that can only be understood in terms of human desires and goals, and not in terms of subatomic particles blindly bouncing around.”

    Reductionism, whatever its successes, has been spectacularly unsuccessful at explaining complex phenomena. So anybody who insists it can be the definitive explanation beyond doubt seems to hold something more akin to a religious than a scientific belief.

    Your core argument for changing your mind – the Paris Accord – seems particularly weak.

    The Paris Accord isn’t really a solution to climate change. It is at best a first step. I’ve always felt that technological changes, rather than treaties, would in the long run accomplish more and that this will happen regardless of treaties, agreements, or efforts at enforcement.

    What really makes the argument weak, however, is that the US cannot actually withdraw from the agreement until 2020. So unless the millions who turned out for Trump, and the millions who didn’t turn out for Clinton, repeat that history in 2020, we will likely not withdraw from the accord at all.

    Your argument against reductionism seems to be a sort of cosmic Murphy’s Law and heavily subject to observer bias.

  136. ARaybold Says:

    Erik Hoel #120: In that primer, and specifically the note in which you explain that Scott is wrong by definition [1], I was confused by your use of the term ‘black-boxing’ (the definition of which apparently rules out Scott’s objections). I have previously seen the term used to mean that when some subsystem is treated as a black box, one does not look at its internal states and transitions, but only at its interactions with the rest of the system. In the example, however, when the macrostate comprised of S1 is black-boxed, it is completely excised from consideration, and all its interactions with the remaining states are ignored.

    While it seems plausible to me that an analysis performed in accordance with the former definition would lead to results that are valid for the whole system, it seems equally the case that if you apply black-boxing in the second sense, the results would not be generally applicable to the whole system, or only with caveats specific to what behavior, exactly, is being consigned to the black box. Taking this specific example from an experimental point of view: if someone performed an experiment on this system in which, by accident or design, it never started in state S1, we would probably not think he had achieved some special insight into how the system worked, and certainly not an insight inexplicable by reductive analysis; we would think of him as being misled by experimental error.

    I guess I am not questioning whether 1/3 or 1/4 is the correct value for S2, S3 and S4 in the black-boxed distribution, but whether setting S1 to zero gives a result that is meaningful for the system as a whole.

    [1] http://www.erikphoel.com/uploads/1/7/8/8/17883727/black-boxing-mistake.pdf
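    As a concrete check on the 1/3-versus-1/4 question, here is a small Python sketch (with made-up probabilities, not the actual numbers from the example): excising S1 and renormalizing the remainder yields the conditional distribution on the surviving states, which in general differs from an imposed uniform 1/4 over all four states.

    ```python
    # Hypothetical 4-state distribution; the specific numbers are invented.
    micro = {"S1": 0.4, "S2": 0.2, "S3": 0.2, "S4": 0.2}

    # "Black-boxing" S1 in the excision sense: drop it, then renormalize
    # what remains, i.e. condition on the system not being in S1.
    survivors = {s: p for s, p in micro.items() if s != "S1"}
    total = sum(survivors.values())
    conditional = {s: p / total for s, p in survivors.items()}

    # Each survivor now carries 0.2/0.6 = 1/3, not the 1/4 that a uniform
    # assumption over all four states would assign.
    print(conditional)
    ```

    Whether this conditional distribution, rather than a freshly imposed uniform one, is the meaningful object for the system as a whole is exactly the question above.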

  137. Michael P Says:

    Hi Scott,
    I think you’re exaggerating the impact of the US withdrawal from the Paris agreement. If you recall Trump’s speech, he was mostly talking about money, even hinting that he would re-join at the right price, and he never claimed that climate change wasn’t real. I watched the speech and couldn’t help but picture a car buyer who pretends to leave the dealership so that the dealer will drop the price. Remember that Trump is a professional bargainer, and even wrote a book on bargaining. I don’t think this withdrawal is permanent, or that it will last a considerable time.

  138. Scott Says:

    Michael #137: The trouble is, he also called climate change a “hoax” during the campaign. So who knows what he believes? Did he ever literally believe that Obama tapped his phones, or wasn’t born in the US? My own best guess is that the entire concept of “belief,” in some proposition about the external world being “true” or “false,” isn’t really applicable to him.

  139. practical joke Says:

    “Once the argument is spelled out, it’s clear that the entire thing boils down to, how shall I put this, a normalization issue.”

    Just pick whatever normalization lets you make fun of Florida 🙂

  140. Michael P Says:

    `the entire concept of “belief,” in some proposition about the external world being “true” or “false,” isn’t really applicable to him.`

    That’s applicable to most politicians, left and right. You mentioned yourself “social justice warriors” who couldn’t possibly be less concerned about `some proposition about the external world being “true” or “false”`

    Trump, *as a person*, is certainly one of the most unhinged of them, ever. On the other hand, in the last couple of years the left, *as a political movement*, seems to me even more unhinged than Trump, and certainly more violent. Trump will pass, likely in 3.5 years. I’m not sure that the trend of deaf, unreasoning violence set by the left will pass anytime soon.

  141. Theophanes Raptis Says:

    Strange but the same has been said about Buddha!
    There must be a demarcation line somewhere…

  142. shawnuff Says:

    What about “autopoiesis”? Is this concept completely out of scope in this context?

  143. fred Says:

    Scott #138

    “[…]in some proposition about the external world being “true” or “false,” isn’t really applicable to him.”

    He’s the first realization of a macroscopic qubit.

  144. jonas Says:

    Re #112

    > or the sides realize they were just using words differently

    Indeed. It often turns out that there is no actual disagreement, just some miscommunication, and if you ask the other party to explain why they said something, you’ll find out they didn’t mean to imply what you thought they were implying. Listening to others is a hard skill for me, and for other people too, and I have to actively work on it.

    Re Jim Cross #135

    > The Paris Accord isn’t really a solution to climate change. It is at best a first step. I’ve always felt that technological changes, rather than treaties, would in the long run accomplish more and that this will happen regardless of treaties, agreements, or efforts at enforcement.

    Yes! Please build those nuclear fusion reactors and vacuum trains already, it’s getting urgent.

  145. Eli Says:

    So hold on. I honestly thought the whole thing about Donald Trump was some elaborate Talmudic joke.

  146. Jim Cross Says:

    “The scenario suggests green energy is taking root more quickly than most experts anticipate. It would mean that global carbon dioxide pollution from fossil fuels may decline after 2026, a contrast with the International Energy Agency’s central forecast, which sees emissions rising steadily for decades to come. ”

  147. Phylliida Says:

    This might have already been mentioned, but from a computational point of view, isn’t the view that “everything can’t be addressed in microscopic terms” trivial?

    As a simple example, let’s say I have lots of Turing machines for which I know whether or not they halt. I treat an (eventually) halting Turing machine as a 0, and one that loops forever as a 1.

    Now I encode the contents of this post in binary into Turing machines and provide them to you. Sure, technically microscopic models describe your message, but computationally the macroscopic model of “given the bits, what is your post” is clearly much more powerful.

    This might seem like a dumb edge case but there are many more practical similar ones you can imagine.

  148. Oldman Says:

    The good brain dies by 35. Amazed it still runs for you. Envious here.

  149. quax Says:

    @gentzen, fair enough, benefit of the doubt and all that. But then again I first learned about the concept of emergence of complex patterns far away from the thermodynamic equilibrium from Haken’s work back in the nineties.

    IMHO Hoel may have to resurrect some 19th century physicists to have the kind of debate he is looking for.

    Is it just me, or do philosophers always seem to lag behind the physics discourse by about a century?

  150. fred Says:

    Nothing at the macro level is truly emergent.
    Micro and macro are two sides of the same coin; it’s like asking “What runs a computer? Its own atoms and their physics, or its software?” So high-level causation implies assuming some intent behind the initial conditions of the universe.

  151. ilya shmulevich Says:

    I concur with Scott that “requiring the macro-distribution to arise by marginalizing the micro-distribution” is correct. Coarse-graining mappings are designed to preserve some property of the original probability structure (e.g., the steady-state distribution), since that is precisely the point of using such mappings in the first place. For example, we explored such mappings for probabilistic Boolean networks, which are discrete dynamical systems whose dynamics are described by Markov chains. (https://doi.org/10.1016/S0165-1684(02)00480-2)

    A uniform distribution over the microstates will necessarily lead to a non-uniform distribution over the macrostates. In general, the assumption that the states, at any given scale, are distributed uniformly (or their intervention distribution, in this case) violates any possible assumption of uniformity at any of the lower scales, which is demanded by the method. The only way to save it is indeed to “break” (hence, the term “fix” used in Scott’s post) the probabilistic structure of the underlying model such that an arbitrary (in this case, uniform) distribution can be imposed at each scale. Everything does seem to hinge on this assumption.
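    The uniformity clash is easy to see in code. Here is a minimal Python sketch (a generic toy coarse-graining of my own, not the probabilistic-Boolean-network construction from the linked paper): whenever the coarse-graining blocks have unequal sizes, marginalizing a uniform micro-distribution necessarily produces a non-uniform macro-distribution.

    ```python
    # Generic toy example (not from the cited paper): four microstates
    # coarse-grained into macrostates A = {s1} and B = {s2, s3, s4}.
    micro_states = ["s1", "s2", "s3", "s4"]
    coarse = {"s1": "A", "s2": "B", "s3": "B", "s4": "B"}

    # Uniform distribution over microstates...
    uniform_micro = {s: 1.0 / len(micro_states) for s in micro_states}

    # ...marginalized to the macro level by summing within each block.
    macro = {}
    for s, p in uniform_micro.items():
        macro[coarse[s]] = macro.get(coarse[s], 0.0) + p

    # Unequal block sizes force a non-uniform macro-distribution:
    print(macro)  # → {'A': 0.25, 'B': 0.75}
    ```

    Imposing a uniform (1/2, 1/2) distribution at the macro level would therefore require abandoning uniformity at the micro level, which is the hinge of the whole argument.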

  152. Jon K. Says:

    I love it, fred! “Emergence” is a not-too-well-defined term.

    What if emergence represents a crossover point, when a system can be described with a new, higher-level model? Sure, you could use the original, more fundamental model, but if the complexity of the system yields new, higher-level structures, then why not roll up that complexity into new chunks and thereby reduce the complexity of your model going forward/higher?

    I guess the risk in creating higher-level models, and doing away with the more fundamental models that have become unwieldy, is that you might not capture all of the details of the more fundamental model or of the actual system itself… but maybe you could capture those details in some probabilistic way. I wonder if QM models aren’t fundamental, but are actually just some higher-level statistical models.

    Somewhat related: what if the uncertainty principle is related to bandwidth limits?!

    (just some thoughts from an armchair scientist… have at it)

  153. Theophanes Raptis Says:

    What if “emergence” := “emulation”? (That’s not Bostrom’s argument, at any rate!)

  154. Jon K. Says:

    Hi Theo,

    Can you expand on your reply?

  155. fred Says:

    Jon K. #152

    Right, I get the usefulness of high-level concepts; e.g., that’s what software is. It allows us to totally determine the behavior (initial conditions and end state) of macro properties (transistors as on/off switches) of systems made of uncountable atoms (digital computers).

    But the real puzzler here is the emergence of “software” as a thing…
    Software is some subtle mapping between biological structures in our brain and macro silicon structures in the computers.
    Also, if I present you with two complex mechanical systems, you’d have a tough time measuring the degree of “softwareness” in each of them.
    Software (and any explanatory high-level concept) ultimately traces back to our own ability to contemplate the universe as conscious beings, which runs into the hard problem of consciousness and the tendency of inanimate matter to organize itself into structures that are super-sensitive to the macro states in their environment (i.e., brains).

  156. fred Says:

    To expand a bit…
    It seems that the two basic ingredients necessary for the appearance of “agents” claiming that there is such a thing as high-level causation (aka the illusion of free will) are the ones required for life:

    1) ability for matter to spontaneously assemble itself into clumps that can auto-duplicate almost perfectly (almost -> mutation).

    2) ability for matter to spontaneously assemble into small clumps sensitive to the macro properties of bigger and more distant clumps, which increases the likelihood of property 1).

  157. Theophanes Raptis Says:

    Suppose you have a theory A and a theory B, with two entirely different axiomatic bases, that have a cross-section of common predictions directly accessible to us. If A can be made larger than B, it would be appropriate to call A a possible “emulator” of B, especially if the two are in fact constructively inequivalent even if they appear, to a certain extent, isomorphic to us. The latter also calls for a theory of constructibility, which appears to be considerably harder than a mere theory of computability in the absence of a TOE guaranteeing a proper ontic substrate.

  158. steve Says:

    So there is absolutely nowhere to go to escape political fulminations?

    I would have thought that a blog about quantum mechanics would be a refuge from the incessant hysteria of “progressives” about the horrors of Trump’s election victory.

    Apparently I was naive.

  159. Scott Says:

    steve #158: I don’t know how long you’ve been reading this blog, but it’s ALWAYS been about whatever I’ve felt like writing about on a given day.

    In addition, while reasonable people might draw the line in different places, surely we agree that there’s SOME level of political horribleness beyond which remaining silent is a louder political statement than protesting. If I’m not waiting for the thugs in power to literally round up and imprison dissidents before sounding the alarm, it’s simply because I’m a patriot who holds the US to a higher standard than I’d hold some random authoritarian hellhole.

  160. Theophanes Raptis Says:

    It would be rather more enlightening if someone could provide a criterion for the limits of a “scientific” discussion before it becomes “political.” It’s not so much about the critique first attempted by Kuhn and Feyerabend as it is about what was already written in the Socratic dialogues, which seem to remain as lively as ever, on both populism and the swaying of the “masses” into beliefs (when directly translated into premises) of any kind.

Leave a Reply