Integrated Information Theory: Virgil Griffith opines

Remember the two discussions about Integrated Information Theory that we had a month ago on this blog?  You know, the ones where I argued that IIT fails because “the brain might be an expander, but not every expander is a brain”; where IIT inventor Giulio Tononi wrote a 14-page response biting the bullet with mustard; and where famous philosopher of mind David Chalmers, and leading consciousness researcher (and IIT supporter) Christof Koch, also got involved in the comments section?

OK, so one more thing about that.  Virgil Griffith recently completed his PhD under Christof Koch at Caltech—as he puts it, “immersing [him]self in the nitty-gritty of IIT for the past 6.5 years.”  This morning, Virgil sent me two striking letters about his thoughts on the recent IIT exchanges on this blog.  He asked me to share them here, something that I’m more than happy to do:

Reading these letters, what jumped out at me—given Virgil’s long apprenticeship in the heart of IIT-land—was the amount of agreement between my views and his.  In particular, Virgil agrees with my central contention that Φ, as it stands, can at most be a necessary condition for consciousness, not a sufficient condition, and remarks that “[t]o move IIT from talked about to accepted among hard scientists, it may be necessary for [Tononi] to wash his hands of sufficiency claims.”  He agrees that a lack of mathematical clarity in the definition of Φ is a “major problem in the IIT literature,” commenting that “IIT needs more mathematically inclined people at its helm.”  He also says he agrees “110%” that the lack of a derivation of the form of Φ from IIT’s axioms is “a pothole in the theory,” and further agrees 110% that the current prescriptions for computing Φ contain many unjustified idiosyncrasies.

Indeed, given the level of agreement here, there’s not all that much for me to rebut, defend, or clarify!

I suppose there are a few things.

  1. Just as a clarifying remark, in a few places where it looks from the formatting like Virgil is responding to something I said (for example, “The conceptual structure is unified—it cannot be decomposed into independent components” and “Clearly, a theory of consciousness must be able to provide an adequate account for such seemingly disparate but largely uncontroversial facts”), he’s actually responding to something Giulio said (and that I, at most, quoted).
  2. Virgil says, correctly, that Giulio would respond to my central objection against IIT by challenging my “intuition for things being unconscious.”  (Indeed, because Giulio did respond, there’s no need to speculate about how he would respond!)  However, Virgil then goes on to explicate Giulio’s response using the analogy of temperature (interestingly, the same analogy I used for a different purpose).  He points out how counterintuitive it would be for Kelvin’s contemporaries to accept that “even the coldest thing you’ve touched actually has substantial heat in it,” and remarks: “I find this ‘Kelvin scale for C’ analogy makes the panpsychism much more palatable.”  The trouble is that I never objected to IIT’s panpsychism per se: I only objected to its seemingly arbitrary and selective panpsychism.  It’s one thing for a theory to ascribe some amount of consciousness to a 2D grid or an expander graph.  It’s quite another for a theory to ascribe vastly more consciousness to those things than it ascribes to a human brain—even while denying consciousness to things that are intuitively similar but organized a little differently (say, a 1D grid).  A better analogy here would be if Kelvin’s theory of temperature had predicted, not merely that all ordinary things had some heat in them, but that an ice cube was hotter than the Sun, even though a popsicle was, of course, colder than the Sun.  (The ice cube, you see, “integrates heat” in a way that the popsicle doesn’t…)
  3. Virgil imagines two ways that an IIT proponent could respond to my argument involving the cerebellum—the argument that accuses IIT proponents of changing the rules of the game according to convenience (a 2D grid has a large Φ?  suck it up and accept it; your intuitions about a grid’s lack of consciousness are irrelevant.  the human cerebellum has a small Φ?  ah, that’s a victory for IIT, since the cerebellum is intuitively unconscious).  The trouble is that both of Virgil’s imagined responses are by reference to the IIT axioms.  But I wasn’t talking about the axioms themselves, but about whether we’re allowed to validate the axioms, by checking their consequences against earlier, pre-theoretic intuitions.  And I was pointing out that Giulio seemed happy to do so when the results “went in IIT’s favor” (in the cerebellum example), even though he lectured me against doing so in the cases of the expander and the 2D grid (cases where IIT does less well, to put it mildly, at capturing our intuitions).
  4. Virgil chastises me for ridiculing Giulio’s phenomenological argument for the consciousness of a 2D grid by way of nursery rhymes: “Just because it feels like something to see a wall, doesn’t mean it feels like something to be a wall.  You can smell a rose, and the rose can smell good, but that doesn’t mean the rose can smell you.”  Virgil amusingly comments: “Even when both are inebriated, I’ve never heard [Giulio] nor [Christof] separately or collectively imply anything like this.  Moreover, they’re each far too clueful to fall for something so trivial.”  For my part, I agree that neither Giulio nor Christof would ever advocate something as transparently silly as, “if you have a rich inner experience when thinking about X, then that’s evidence X itself is conscious.”  And I apologize if I seemed to suggest they would.  To clarify, my point was not that Giulio was making such an absurd statement, but rather that, assuming he wasn’t, I didn’t know what he was trying to say in the passages of his that I’d just quoted at length.  The silly thing seemed like the “obvious” reading of his words, and my hermeneutic powers were unequal to the task of figuring out the non-silly, non-obvious reading that he surely intended.

Anyway, there’s much more to Virgil’s letters than the above—including answers to some of my subsidiary questions about the details of IIT (e.g., how to handle unbalanced partitions, and the mathematical meanings of terms like “mechanism” and “system of mechanisms”).  Also, in parts of the letters, Virgil’s main concern is neither to agree with me nor to agree with Giulio, but rather to offer his own ideas, developed in the course of his PhD work, for how to move forward and fix some of the problems with IIT.  All in all, these are recommended reads for anyone who’s been following this debate.

28 Responses to “Integrated Information Theory: Virgil Griffith opines”

  1. Patrick Says:

    Correct me if I’m wrong. But if we treat large phi as a necessary condition but not a sufficient condition, then the objections from pre-theoretic intuition fall away. All things with consciousness have “large” phi. In other words, the weird predictions only come when we treat large phi as a sufficient condition.

  2. Scott Says:

    Patrick #1: You can get a “weird prediction” purely from treating large Φ as a necessary condition, but it’s not quite as weird. The weird prediction is that if you take a conscious system (say, a human brain) and simulate it on a computer, then you can bring another consciousness into being only if the computer happens to have a large Φ. So for example, simulating the brain using a 2D grid of logic gates might or might not be enough to bring about consciousness, but simulating it using a Turing machine with a 1D tape certainly wouldn’t be. And you might find it bizarre (as I do) that, given two behaviorally-indistinguishable systems, one could have qualia and the other not because of what seems to a computer scientist like a completely unimportant detail of their organization. On the other hand, this is exactly the sort of counterintuitive prediction that I could suck up and accept, if I thought I was forced to by the successes of IIT in other domains (which, of course, I don’t think I am).

  3. Darrell Burgan Says:

    I apologize if I’m slow on this, but I am unclear as to what “consciousness” is, in unambiguous terms. We define the Turing Test for an AI as one that a human cannot distinguish from a real human, but have we defined what it is that defines a human as conscious in the first place? If we cannot rigorously define what consciousness is, how can we say what is or isn’t necessary, sufficient, or even related to consciousness?

    For example, I “know” that I’m conscious, but under the current science it seems there is no way I can objectively prove that to anyone else! The best I can hope to do is prove I am indistinguishable in conversation from another person (or perhaps an AI), but only insofar as I can convince another allegedly conscious mind into thinking so.

    It all seems very circular to me.

  4. Rahul Says:

    “It all seems very circular to me.”

    That’s exactly my impression from this whole discussion.

    Aren’t we trying to define “consciousness”, a somewhat ill-defined state, in terms of Φ, another vaguely defined parameter?

  5. domenico Says:

    When I read about IIT for the first time, I noticed a strong analogy with Paul Davies’s theory of the definition of life (which I read superficially some time ago).
    But I would go further: a sterile organism has only consciousness, so a sterile elementary organism without sensory input (due to illness or encystment) is only consciousness, and the first Artificial Intelligence will be only consciousness (so conscious life is possible without self-replication).
    Self-replication could be only a means of obtaining complex information (complex code or a complex program): it is possible that there is only one hard algorithmic problem behind both life and consciousness.

  6. fred Says:

    Can we ever hope to tackle these questions without a better understanding of “physical-ness”?
    I.e. the degree of equivalence between discrete algorithms (digital systems) and the physical systems they simulate.
    That would require understanding properly the nature of space and time (discrete at the Planck scale?).

    Some say that the equivalence is perfect, i.e. the physical universe is actually digital in nature (the universe is nothing more and nothing less than a number).

    Others say it’s not the case, but don’t really explain what the implications are (afaik).

  7. fred Says:

    Darrell #3
    Well, every single concept is eventually circular if you dig deep enough, no? Everything is explained in terms of something else (a journalist asked Feynman why magnets are so mysterious… everything is as mysterious as the magnet!).

    For me it helps to look at this discussion from a practical perspective:
    1) we are conscious.
    2) consciousness seems to be an emergent property of some human brains (just like the relation between the concept of a wave and water molecules).
    3) is there a set of metrics that would allow us to pin-point more precisely conscious brains (without asking them “are you conscious”?). Those metrics should score low(er) for brains that have been dead for 10 hours, for brains with mental diseases, for animals, etc.
    4) maybe the same metrics will allow us to assess whether a coma patient is conscious or not, whether a blob from outer space is conscious or not.

  8. James Cross Says:

    Virgil’s reference to the All or Nothing transform at the end of his letter seems to be making an argument somewhat similar to that made in “Is Consciousness Computable? Quantifying Integrated Information Using Algorithmic Information Theory,” which I brought up some time back (and which may have set you off on this path of looking at IIT).

    It seems to me that in the real world maximal Φ would not be practical. If every bit depends on every other bit, then one random flipping of a bit (a cosmic ray hit on a neuron, for example) would cause a complete collapse of consciousness. High Φ, or at least some minimal level of Φ, might be required for consciousness, but it seems to me that real-world consciousness would work with some optimal level of Φ rather than a maximal level.
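The fragility described in the comment above can be illustrated with a loose analogy. The sketch below does not compute Φ and does not implement an all-or-nothing transform; it only assumes that SHA-256 is a fair stand-in for "every output bit depends on every input bit" and shows how far a single flipped input bit spreads. The helper names and the sample message are hypothetical.

```python
import hashlib


def bit_difference(a: bytes, b: bytes) -> int:
    """Count the bit positions at which two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def flip_bit(data: bytes, index: int) -> bytes:
    """Return a copy of data with the bit at position index flipped."""
    flipped = bytearray(data)
    flipped[index // 8] ^= 1 << (index % 8)
    return bytes(flipped)


# A toy "system state" and the same state after one simulated cosmic-ray hit.
message = b"a toy stand-in for a system state"
corrupted = flip_bit(message, 0)

# Hash both states and count how many of the 256 output bits changed.
digest_a = hashlib.sha256(message).digest()
digest_b = hashlib.sha256(corrupted).digest()
changed = bit_difference(digest_a, digest_b)

print(f"{changed} of 256 output bits changed after a single input bit flip")
```

In a maximally interdependent system of this kind, a one-bit fault is indistinguishable from a completely different input, which is the robustness worry raised in the comment.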

  9. Wondering Says:

    #8 James

    ‘It seems to me in the real world maximal Φ would not be practical….it seems to me that real world consciousness would work with some optimal level of Φ rather than a maximal level.’

    Exactly. Basic engineering principles of all types emphasize the desirability of modularity, rather than maximal Φ or anything like that. Unnecessarily large Φ is bad, and surely it’s selected against.

  10. Rahul Says:

    @Wondering

    But is nature’s system design modular? I agree that in modern design, low Φ is good.

    But that need not mean nature also adheres to the principle of selecting against large Φ.

  11. Patrick Says:

    @darrell(#3)

    This post is actually the continuation of a previous discussion on this blog.

    http://www.scottaaronson.com/blog/?p=1799

    To make the problem tractable, Scott proposed limiting the problem to producing a theory that is consistent with our intuitions on what is conscious. This is still a “pretty hard problem” but avoids thorny questions about the essence of consciousness.

  12. Phil Says:

    #2 Scott

    I don’t understand why there should necessarily be a large difference in \(\phi\) between a computer operating on a 2D grid of logic gates and a Turing machine with a 1D tape.

    In the case of, say, the Ising model, the structure of the couplings/correlations clearly depends on dimensionality, and I can see why \(\phi\) should be different in 1 vs. 2D. But in the case of the computer program, it seems like \(\phi\) should not be determined by the physical structure of the hardware, but rather by the correlation between abstract states of the program, which shouldn’t depend on hardware. Perhaps this is not the case for the current definition of \(\phi\), but it also seems like this would be a straightforward change to make.

    So perhaps there are issues with ambiguity of what state space or coordinate system to use when computing \(\phi\), but this seems like a relatively fixable problem, rather than a trenchant objection to the idea of \(\phi\) as a necessary (but not sufficient) condition for consciousness.

  13. Scott Says:

    Phil #12: FWIW, Giulio himself has been adamant that what matters for calculating Φ is the organization of the actual physical components (e.g., the gates and wires), rather than that of some abstract mathematical structure that the components are simulating. He’s made that point repeatedly in his papers, claiming that it’s what rules out existing computers being very conscious. And he made it again in his response to me, when he remarked that the Vandermonde matrix, being a mathematical abstraction, has no Φ-value; only actual logic circuits for applying the Vandermonde transformation have Φ-values.

    Of course, other people always have the option of breaking with IIT’s founder on this point. But in Giulio’s defense, if we did go the abstract route, then it seems to me that ambiguities in which abstract structure we should be calculating Φ for would become immense, crippling our ability to get any clear predictions about the “amount of consciousness” present in anything. The problem is that the same computational process can be described at many different levels of abstraction—in terms of the movements of electrons, the transistors, the pipelined process on the chip, the assembly-language instructions, the C/Python/etc. code, a mathematical description of the algorithm—and I see no reason whatsoever for the value of Φ to be robust to changes in the description level.

  14. Phil Says:

    #13 Scott

    I definitely agree that the value of \(\phi\) will not be robust to changes in the level of description of a particular system, but I guess I am more optimistic than you that this problem is resolvable. The issue of ambiguity in absolute information content does not seem unique to IIT. For example, if we wanted to compute the information content of a message written on a piece of paper, we could consider the positions of molecules in the ink and paper, or the detailed shapes or misshapes of the letters, etc etc. But in practice we just consider the identity of the various letters and their frequencies.

    This is a tendentious example, and clearly the problem of ambiguity will be much harder in cases of potentially conscious systems, but again I think there are at least some cases where the solution presents itself. If we are performing a whole brain simulation, then the correct level of description will involve the activation states of the “neurons” in the simulation, which will not depend on the hardware architecture or programming language, and which certainly leaves open the possibility that a computer simulation could be just as conscious as a real brain. Apparently Tononi does not agree with this view, but I am quite happy to try to rescue the idea of IIT from the idiosyncrasies of his approach.

    It also occurs to me that the issue of ambiguity in computing \(\phi\) is not entirely unlike the ambiguity of classical entropy, where the absolute entropy of some physical system can depend on the scale of partitioning of microstates. We get around this by only considering relative entropy or changes in entropy, so that the constant associated with scale drops out. Perhaps there could be some kind of normalization scheme allowing a similar fix for computing \(\phi\), though it’s not clear how this might work (and again I am certainly oversimplifying the problem).
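The entropy analogy in the comment above can be made concrete with a standard identity, offered here as a sketch rather than as anything taken from the IIT literature. Assume continuous variables X and Y with differential entropies h(X) and h(Y), and let X_Δ and Y_Δ be their discretizations into bins of width Δ (all of this notation is introduced only for illustration):

```latex
H(X_\Delta) \approx h(X) - \log \Delta \quad (\Delta \to 0),
\qquad\text{so}\qquad
H(X_\Delta) - H(Y_\Delta) \approx h(X) - h(Y).
```

The absolute entropy grows without bound as the bins shrink, but the bin-size term cancels in the difference as long as both variables are binned at the same Δ. Whether Φ admits any comparable normalization is exactly the open question the comment raises.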

  15. Virgil Says:

    #13 Scott

    As said in the letter, I always thought it was unfair for phi to claim to be a sufficient measure for consciousness but then afterwards claim that the systems must also be “physical”. If physicalness is a requirement for consciousness, then phi should be in some physical units, not abstract bits.

  16. TF Says:

    Scott hits the nail on the head – as there are countless models of varying degrees of abstraction that fit a given system’s dynamics (in a perfectly counterfactual manner), the game of C measures has to be understood in the context of fundamental (or at least adequate) descriptions of the dynamics (physics) of the system under scrutiny.

    In all fairness this is what GT tries to do – his basic formulation is in terms of a “neural network” and the notion of abstraction he suggests is not that abstract – namely coarse graining (uniformly) in space and time (that is without “supervening” higher order structures which could be said to be in the eye of the beholder).

    The only piece missing here is that we have no idea how phi behaves as a function of spatio-temporal grain. Ideally we would like something like thermodynamics and statistical mechanics to be in play here: that is, that at some point (grain in space and time) we can measure more or less all that we need (that is, have sufficient knowledge) without having to track down every little sub-atomic critter (which of course doesn’t seem possible, but that’s another issue).

    GT seems to be claiming that we should expect something like that (at least with brain-like systems) – that there will turn out to be a most informative scale for computing phi, because (his point) conscious experience seems to have a definite grain in space and time. However, given that we understand phi very little, and that it seems very volatile (whereas I would think we would like a smooth measure or something) and sensitive to N, it seems to me like pie in the sky at this point.

    It’s interesting to note that maybe a way out here is a multi-scale measure of C, as that in itself offers a “context free” signature to some extent.

  17. fred Says:

    Scott, probably something obvious, but one big difference between a 1D grid and a 2D grid: with a 1D grid, the number of possible paths between any two nodes (as in maximum flow) is always one, while with a 2D grid that number very quickly becomes gigantic.
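The path-counting contrast in the comment above is easy to check by brute force on small grids. The sketch below counts simple (non-self-intersecting) paths between the two ends of a line of n nodes and between opposite corners of an n×n grid; the function names are hypothetical, and the exhaustive search is only practical for small n. Note one assumption about terminology: what a unit-capacity maximum flow measures is the number of edge-disjoint paths, which stays bounded by a node's degree in a grid, so the quantity that explodes is the number of distinct simple paths counted here.

```python
from typing import Dict, List, Tuple

Node = Tuple[int, int]


def grid_graph(rows: int, cols: int) -> Dict[Node, List[Node]]:
    """Build the adjacency lists of a rows x cols grid with 4-neighbor links."""
    adjacency: Dict[Node, List[Node]] = {}
    for r in range(rows):
        for c in range(cols):
            neighbors = []
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    neighbors.append((nr, nc))
            adjacency[(r, c)] = neighbors
    return adjacency


def count_simple_paths(adjacency: Dict[Node, List[Node]],
                       start: Node, goal: Node) -> int:
    """Count paths from start to goal that never revisit a node (plain DFS)."""
    visited = {start}

    def dfs(node: Node) -> int:
        if node == goal:
            return 1
        total = 0
        for nxt in adjacency[node]:
            if nxt not in visited:
                visited.add(nxt)
                total += dfs(nxt)
                visited.remove(nxt)
        return total

    return dfs(start)


# Map n -> (paths along a 1D line of n nodes, paths across an n x n grid).
path_counts = {}
for n in range(2, 6):
    line = grid_graph(1, n)
    square = grid_graph(n, n)
    path_counts[n] = (
        count_simple_paths(line, (0, 0), (0, n - 1)),
        count_simple_paths(square, (0, 0), (n - 1, n - 1)),
    )
    print(n, path_counts[n])
```

The 1D count stays at one for every n, while the 2D count grows very rapidly, which is the mathematical difference the comment points to. Whether that difference has anything to do with consciousness is a separate question.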

  18. Scott Says:

    fred #17: Well, sure, there’s a huge number of mathematical differences between 1 and 2 dimensions (as between 2 and 3 dimensions, and 3 and 4 dimensions, etc.), and any of them conceivably could be relevant to consciousness. But the claim that any one of them is relevant to consciousness requires a strong argument.

  19. Darrell Burgan Says:

    Patrick #11: yup, I was following that thread closely as well. It’s a fascinating topic to me.

    I get that science has to start somewhere, and the “pretty hard problem” does seem a good place to start.

    What irks me is that, without a rigorous definition of consciousness, a software program has as much chance of “proving” it is conscious (under the Turing Test) as I have. This defies my intuition! :-)

    Anyway, attempting a measure of consciousness, as in IIT, seems very premature given we can’t even define what consciousness is yet, much less measure or model it.

    Apologies to Scott for being impolite and bringing the word “consciousness” into the discussion. :-)

  20. DaveK Says:

    I think there are massive problems with Giulio’s blank-wall argument:

    [I]f one thinks a bit about it, the experience of empty 2D visual space is not at all empty, but contains a remarkable amount of structure.  In fact, when we stare at the blank screen, quite a lot is immediately available to us without any effort whatsoever.  Thus, we are aware of all the possible locations in space (“points”): the various locations are right “there”, in front of us.  We are aware of their relative positions: a point may be left or right of another, above or below, and so on, for every position, without us having to order them.  And we are aware of the relative distances among points: quite clearly, two points may be close or far, and this is the case for every position.  Because we are aware of all of this immediately, without any need to calculate anything, and quite regularly, since 2D space pervades most of our experiences, we tend to take for granted the vast set of relationship[s] that make up 2D space.

    You don’t need to be a full-on Dennettian to doubt this. How can he be so certain that all that information about, for example, the relationships between points is actually there, in the immediate conscious experience, rather than just inferrable from it on demand when attended to? I think the evidence of tachistoscope experiments strongly suggests that the amount of information available about a sensum starts off small and increases with time of exposure to the sensum, precisely because that information is not just “immediately” there but because there in fact is a “need to calculate” very much of it. Subjectively, I very often have the experience of a conscious sensation being unclear and ill-formed in my initial apprehension of it, and clarifying over time as I pay attention to it. Giulio’s claims about what can be inferred from the blank wall experience contain some very substantial unstated pre-assumptions that would need to be first stated explicitly and then justified before his argument could be considered at all rigorous.

    I’m afraid I’m obliged to place him alongside Penrose in my ‘Why scientists should study philosophy before they start trying to reason about consciousness’ folder.

  21. Erik Says:

    Scott: In regards to your point:
    “… it seems to me that ambiguities in which abstract structure we should be calculating Φ for would become immense… The problem is that the same computational process can be described at many different levels of abstraction… and I see no reason whatsoever for the value of Φ to be robust to changes in the description level.”

    There has actually been work done on how to find the optimal causal spatiotemporal scale for physical systems from the Tononi lab:

    Quantifying causal emergence shows that macro can beat micro

    This approach to finding appropriate spatiotemporal scales isn’t directly about quantifying consciousness. Rather, the idea is that any particular system can be viewed at, as you said, a huge number of possible spatiotemporal levels (or coarse-grains), and that in (small) systems a causal model can be derived for each level. Precisely because the measure we use to find the amount of causal information in those causal models is dependent on scale-choice, this allows for the causal relationships of the system to be best characterized at a particular spatiotemporal level. If the best causal model is found to be at a non-micro spatiotemporal level, then that system can be said to be causally emergent.

    The notion of causal emergence has a relationship to IIT in that it can be used to justify the existence of a “causally privileged” spatiotemporal scale (for the brain perhaps the scale of neurons or cortical minicolumns) over which the IIT measure can be applied, or alternatively to suggest that the same process occurs within IIT whereby there is some particular scale for any physical system at which integrated information is maximal.

    I also wanted to alert you to this work on causal emergence as I know you are interested in the notion (and problems) of choosing macro-states, as is done here in reference to causal relations.

  22. Darrell Burgan Says:

    Erik #21, isn’t the mere fact that effects at the micro level can interfere evidence that behavior can emerge? One could argue that the behavior of quantum systems itself is an emergence of the sum-over-all-histories going on underneath.

    It seems intuitively obvious to me that when the scale gets exponentially larger than the micro (i.e. when exponential numbers of layers of behavior are added on top of one another), that behavior will emerge simply because of the complexity of all the interference going on beneath it. Am I oversimplifying? Is the notion of emergence really still controversial?

  23. Erik Says:

    Darrell Burgan #22:
    There are different types of acknowledged “emergence”. Everyone believes in “weak” emergence – which sounds like what you’re talking about in terms of the macro-behavior being very different than the micro-behavior (but I don’t fully grok your point about the micro-level “interfering”).

    The notion of causal emergence IS controversial. This is because of a common and powerful reductionist argument which (in simplest form) goes: the micro-level (most basic physical description) of any system is sufficient to describe the behavior of that system, so why not say that all the causal power responsible for the behavior of the system occurs at the micro-level?

  24. Darrell Burgan Says:

    Erik #23 – you said:

    … why not say that all the causal power responsible for the behavior of the system occurs at the micro-level?

    When one has multiple, perhaps an exponential number of, micro systems that are together contributing to a macro behavior, it seems reasonable to me that the macro behavior might be more than just chaotically unpredictable given what’s going on underneath; I think it reasonable that the macro behavior’s relation to the micro systems’ behavior could be formally undecidable. In other words, impossible in principle to say what micro system behavior contributed to the behavior happening at the macro level. Is this not causal emergence?

    I’d also point to Godel’s incompleteness theorems, which I interpret to mean that any consistent causal system, macro or micro, cannot describe all possible outcomes. The causal system cannot be both consistent and complete. If one accepts that Godel’s theorems apply to causal systems, then don’t we have a mathematical argument for causal emergence?

    I’m a software engineer, not a scientist, so determinism is definitely a comforting thought to me. :-) But I’ve seen enough really complex systems to know that sometimes the behavior of the system as a whole is very difficult to explain given the behavior of its components. And this is for systems that have only dozens of layers. When I imagine systems with exponential numbers of layers, I find it hard to imagine that there isn’t emergence going on.

  25. Dan Fitch Says:

    Excellent and interesting letters, Virgil. Thanks for writing to clarify some of the mathematical pieces, and thanks for continuing this conversation where we amateurs can attempt to follow along, Scott.

  26. Dmytry Says:

    In what sense, exactly, would it be a “necessary condition”? It’s a real number, nonzero for pretty much anything that is not static.

    As I’ve said earlier, the value is almost maximally high for an ideal gas (atoms bouncing around). Ohh, they want to freely pick the scale to get rid of some counter examples? How cute.

  27. Erik Says:

    Dmytry #26 – The theory is not as simple as you seem to think. Actually, for the vast majority of real-world conglomerates, phi is zero.

    You say: “As I’ve said earlier, the value is almost maximally high for an ideal gas (atoms bouncing around)”.

    That is simply false. Systems with minimal causal structure, such as a gas, will not have high phi.

    I’d suggest running a few simple systems through the algorithms detailed in the latest paper for testing your intuitions – instead of leaping to conclusions.

    You also say: “Ohh, they want to freely pick the scale to get rid of some counter examples?”

    As posted above, IIT does precisely the *opposite* of your interpretation – it lets the intrinsic properties of the system pick the scale without reference to any observer.

    There appears to be a lot of confusion about IIT and phi, often based off of taking simplistic versions of the calculation as stand-ins for the real thing.

    Consider a system (set) of three elements {ABC}. For the 8 members of the powerset of {ABC}, integrated information is classified. If the powerset member {{AB},{C}} has the highest integrated information, then the system should be interpreted as: AB is a single complex and C is a single complex – there are 2 consciousnesses in the system. {A} alone would not be conscious, nor {ABC}, nor would any member of the powerset other than {AB} alone or {C} alone.

    So even if phi were taken as just a necessary condition, it would still impose strong constraints on what systems and what spatiotemporal scales can be conscious.
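To make the combinatorics in the comment above concrete, here is a minimal sketch that only enumerates candidates for the three-element system; it computes no integrated information at all. It assumes the usual definitions: the powerset of {A, B, C} has 8 members (including the empty set), while a grouping such as {{AB},{C}} is, strictly speaking, one of the 5 set partitions of the system. The function names are hypothetical.

```python
from itertools import combinations
from typing import Iterator, List


def powerset(elements: List[str]) -> List[frozenset]:
    """Return every subset of elements, from the empty set to the full set."""
    return [
        frozenset(combo)
        for size in range(len(elements) + 1)
        for combo in combinations(elements, size)
    ]


def partitions(elements: List[str]) -> Iterator[List[List[str]]]:
    """Yield every way to split elements into non-empty, disjoint blocks."""
    if not elements:
        yield []
        return
    first, rest = elements[0], elements[1:]
    for smaller in partitions(rest):
        # Put the first element into each existing block in turn...
        for i, block in enumerate(smaller):
            yield smaller[:i] + [[first] + block] + smaller[i + 1:]
        # ...or give it a block of its own.
        yield [[first]] + smaller


system = ["A", "B", "C"]
subsets = powerset(system)
all_partitions = list(partitions(system))

print(len(subsets), "subsets (candidate complexes, including the empty set)")
print(len(all_partitions), "partitions (candidate carvings of the whole system)")
for carving in all_partitions:
    print(carving)
```

Under the reading in the comment, a full IIT calculation would score each candidate and keep only the maximal ones, for example AB and C as two separate complexes. That scoring step is deliberately left out here.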

  28. Ben Standeven Says:

    Darrell Burgan #24:
    The way formal undecidability is usually defined, you’d need an infinite number of layers to ensure that the behavior of the system is formally undecidable. But I suspect you could define some sort of “exponential undecidability” and show that a system with exponentially many layers can be exponentially undecidable. But I don’t know that I’d call that emergence; if there are exponentially many layers, there are double-exponentially many objects at the bottom layer, so you shouldn’t expect to be able to predict their behavior.
