What I believe II (ft. Sarah Constantin and Stacey Jeffery)

August 15th, 2017

Unrelated Update: To everyone who keeps asking me about the “new” P≠NP proof: I’d again bet $200,000 that the paper won’t stand, except that the last time I tried that, it didn’t achieve its purpose, which was to get people to stop asking me about it. So: please stop asking, and if the thing hasn’t been refuted by the end of the week, you can come back and tell me I was a closed-minded fool.


In my post “The Kolmogorov Option,” I tried to step back from current controversies, and use history to reflect on the broader question of how nerds should behave when their penchant for speaking unpopular truths collides head-on with their desire to be kind and decent and charitable, and to be judged as such by their culture.  I was gratified to get positive feedback about this approach from men and women all over the ideological spectrum.

However, a few people who I like and respect accused me of “dogwhistling.” They warned, in particular, that if I wouldn’t just come out and say what I thought about the James Damore Google memo thing, then people would assume the very worst—even though, of course, my friends themselves knew better.

So in this post, I’ll come out and say what I think.  But first, I’ll do something even better: I’ll hand the podium over to two friends, Sarah Constantin and Stacey Jeffery, both of whom were kind enough to email me detailed thoughts in response to my Kolmogorov post.


Sarah Constantin completed her PhD in math at Yale. I don’t think I’ve met her in person yet, but we have a huge number of mutual friends in the so-called “rationalist community.”  Whenever Sarah emails me about something I’ve written, I pay extremely close attention, because I have yet to read a single thing by her that wasn’t full of insight and good sense.  I strongly urge anyone who likes her beautiful essay below to check out her blog, which is called Otium.

Sarah Constantin’s Commentary:

I’ve had a women-in-STEM essay brewing in me for years, but I’ve been reluctant to actually write publicly on the topic for fear of stirring up a firestorm of controversy.  On the other hand, we seem to be at a cultural inflection point on the issue, especially in the wake of the leaked Google memo, and other people are already scared to speak out, so I think it’s past time for me to put my name on the line, and Scott has graciously provided me a platform to do so.

I’m a woman in tech myself. I’m a data scientist doing machine learning for drug discovery at Recursion Pharmaceuticals, and before that I was a data scientist at Palantir. Before that I was a woman in math — I got my PhD from Yale, studying applied harmonic analysis. I’ve been in this world all my adult life, and I obviously don’t believe my gender makes me unfit to do the work.

I’m also not under any misapprehension that I’m some sort of exception. I’ve been mentored by Ingrid Daubechies and Maryam Mirzakhani (the first female Fields Medalist, who died tragically young last month).  I’ve been lucky enough to work with women who are far, far better than me.  There are a lot of remarkable women in math and computer science — women just aren’t the majority in those fields. But “not the majority” doesn’t mean “rare” or “unknown.”

I even think diversity programs can be worthwhile. I went to the Institute for Advanced Studies’ Women and Math Program, which would be an excellent graduate summer school even if it weren’t all-female, and taught at its sister program for high school girls, which likewise is a great math camp independent of the gender angle. There’s a certain magic, if you’re in a male-dominated field, of once in a while being in a room full of women doing math, and I hope that everybody gets to have that experience once.  

But (you knew the “but” was coming), I think the Google memo was largely correct, and the way people conventionally talk about women in tech is wrong.

Let’s look at some of his claims. From the beginning of the memo:

  • Google’s political bias has equated the freedom from offense with psychological safety, but shaming into silence is the antithesis of psychological safety.
  • This silencing has created an ideological echo chamber where some ideas are too sacred to be honestly discussed.
  • The lack of discussion fosters the most extreme and authoritarian elements of this ideology.
  • Extreme: all disparities in representation are due to oppression
  • Authoritarian: we should discriminate to correct for this oppression

Okay, so there’s a pervasive assumption that any deviation from 50% representation of women in technical jobs is a.) due to oppression, and b.) ought to be corrected by differential hiring practices. I think it is basically true that people widely believe this, and that people can lose their jobs for openly contradicting it (as James Damore, the author of the memo, did).  I have heard people I work with advocating hiring quotas for women (i.e. explicitly earmarking a number of jobs for women candidates only).  It’s not a strawman.

Then, Damore disagrees with this assumption:

  • Differences in distributions of traits between men and women may in part explain why we don’t have 50% representation of women in tech and leadership. Discrimination to reach equal representation is unfair, divisive, and bad for business.

Again, I agree with Damore. Note that this doesn’t mean that I must believe that sexism against women isn’t real and important (I’ve heard enough horror stories to be confident that some work environments are toxic to women).  It doesn’t even mean that I must be certain that the different rates of men and women in technical fields are due to genetics.  I’m very far from certain, and I’m not an expert in psychology. I don’t think I can do justice to the science in this post, so I’m not going to cover the research literature.

But I do think it’s irresponsible to assume a priori that there are no innate sex differences that might explain what we see.  It’s an empirical matter, and a topic for research, not dogma.

Moreover, I think discrimination on the basis of sex to reach equal representation is unfair and unproductive.  It’s unfair, because it’s not meritocratic.  You’re not choosing the best human for the job regardless of gender.

I think women might actually benefit from companies giving genuine meritocracy a chance. “Blind” auditions (in which the evaluator doesn’t see the performer) gave women a better chance of landing orchestra jobs; apparently, orchestras were prejudiced against female musicians, and the blinding canceled out that prejudice. Google’s own research has actually shown that the single best predictor of work performance is a work sample — testing candidates with a small project similar to what they’d do on the job. Work samples are easy to anonymize to reduce gender bias, and they’re more effective than traditional interviews, where split-second first impressions usually decide who gets hired, but don’t correlate at all with job performance. A number of tech companies have switched to work samples as part of their interview process.  I used work samples myself when I was hiring for a startup, just because they seemed more accurate at predicting who’d be good at the job; entirely without intending to, I got a 50% gender ratio.  If you want to reduce gender bias in tech, it’s worth at least considering blinded hiring via work samples.

Moreover, thinking about “representation” in science and technology reflects underlying assumptions that I think are quite dangerous.

You expect interest groups to squabble over who gets a piece of the federal budget. In politics, people will band together in blocs, and try to get the biggest piece of the spoils they can.  “Women should get such-and-such a percent of tech jobs” sounds precisely like this kind of politicking; women are assumed to be a unified bloc who will vote together, and the focus is on what size chunk they can negotiate for themselves. If a tech job (or a university position) were a cushy sinecure, a ticket to privilege, and nothing more, you might reasonably ask “how come some people get more goodies than others? Isn’t meritocracy just an excuse to restrict the goodies to your preferred group?”

Again, this is not a strawman. Here’s one Vox response to the memo stating explicitly that she believes women are a unified bloc:

The manifesto’s sleight-of-hand delineation between “women, on average” and the actual living, breathing women who have had to work alongside this guy failed to reassure many of those women — and failed to reassure me. That’s because the manifesto’s author overestimated the extent to which women are willing to be turned against their own gender.

Speaking for myself, it doesn’t matter to me how soothingly a man coos that I’m not like most women, when those coos are accompanied by misogyny against most women. I am a woman. I do not stop being one during the parts of the day when I am practicing my craft. There can be no realistic chance of individual comfort for me in an environment where others in my demographic categories (or, really, any protected demographic categories) are subjected to skepticism and condescension.

She can’t be comfortable unless everybody in any protected demographic category — note that this is a legal, governmental category — is given the benefit of the doubt?  That’s a pretty collectivist commitment!

Or, look at Piper Harron, an assistant professor in math who blogged on the American Mathematical Society’s website that universities should simply “stop hiring white cis men”, and explicitly says “If you are on a hiring committee, and you are looking at applicants and you see a stellar white male applicant, think long and hard about whether your department needs another white man. You are not hiring a researching robot who will output papers from a dark closet. You are hiring an educator, a role model, a spokesperson, an advisor, a committee person … There is no objectivity. There is no meritocracy.”

Piper Harron reflects an extreme, of course, but she’s explicitly saying, on America’s major communication channel for and by mathematicians, that whether you get to work in math should not be based on whether you’re actually good at math. For her, it’s all politics.  Life itself is political, and therefore a zero-sum power struggle between groups.  

But most of us, male or female, didn’t fall in love with science and technology for that. Science is the mission to explore and understand our universe. Technology is the project of expanding human power to shape that universe. What we do towards those goals will live longer than any “protected demographic category”, any nation, any civilization.  We know how the Babylonians mapped the stars.

Women deserve an equal chance at a berth on the journey of exploration not because they form a political bloc but because some of them are discoverers and can contribute to the human mission.

Maybe, in a world corrupted by rent-seeking, the majority of well-paying jobs have some element of unearned privilege; perhaps almost all of us got at least part of our salaries by indirectly expropriating someone who had as good a right to it as us.

But that’s not a good thing, and that’s not what we hope for science and engineering to be, and I truly believe that this is not the inevitable fate of the human race — that we can only squabble over scraps, and never create.  

I’ve seen creation, and I’ve seen discovery. I know they’re real.

I care a lot more about whether my company achieves its goal of curing 100 rare diseases in 10 years than about the demographic makeup of our team.  We have an actual mission; we are trying to do something beyond collecting spoils.  

Do I rely on brilliant work by other women every day? I do. My respect for myself and my female colleagues is not incompatible with primarily caring about the mission.

Am I “turning against my own gender” because I see women as individuals first? I don’t think so. We’re half the human race, for Pete’s sake! We’re diverse. We disagree. We’re human.

When you think of “women-in-STEM” as a talking point on a political agenda, you mention Ada Lovelace and Grace Hopper in passing, and move on to talking about quotas.  When you think of women as individuals, you start to notice how many genuinely foundational advances were made by women — just in my own field of machine learning, Adele Cutler co-invented random forests, Corrina Cortes co-invented support vector machines, and Fei Fei Li created the famous ImageNet benchmark dataset that started a revolution in image recognition.

As a child, my favorite book was Carl Sagan’s Contact, a novel about Ellie Arroway, an astronomer loosely based on his wife Ann Druyan. The name is not an accident; like the title character in Sinclair Lewis’ Arrowsmith, Ellie is a truth-seeking scientist who battles corruption, anti-intellectualism, and blind prejudice.  Sexism is one of the challenges she faces, but the essence of her life is about wonder and curiosity. She’s what I’ve always tried to become.

I hope that, in seeking to encourage the world’s Ellies in science and technology, we remember why we’re doing that in the first place. I hope we remember humans are explorers.


Now let’s hear from another friend who wrote to me recently, and who has a slightly different take.  Stacey Jeffery is a quantum computing theorist at one of my favorite research centers, CWI in Amsterdam.  She completed her PhD at University of Waterloo, and has done wonderful work on quantum query complexity and other topics close to my heart.  When I was being viciously attacked in the comment-171 affair, Stacey was one of the first people to send me a note of support, and I’ve never forgotten it.

Stacey Jeffery’s Commentary

I don’t think Google was right to fire Damore. This makes me a minority among people with whom I have discussed this issue.  Hopefully some people come out in the comments in support of the other position, so it’s not just me presenting that view, but the main argument I encountered was that what he said just sounded way too sexist for Google to put up with.  I agree with part of that, it did sound sexist to me.  In fact it also sounded racist to me. But that’s not because he necessarily said anything actually sexist or actually racist, but because he said the kinds of things that you usually only hear from sexist people, and in particular, the kind of sexist people who are also racist.  I’m very unlikely to try to pursue further interaction with a person who says these kinds of things for those reasons, but I think firing him for what he said between the lines sets a very bad precedent.  It seems to me he was fired for associating himself with the wrong ideas, and it does feel a bit like certain subjects are not up for rational discussion.  If Google wants an open environment, where employees can feel safe discussing company policy, I don’t think this contributes to that.  If they want their employees, and the world, to think that they aim for diversity because it’s the most rational course of action to achieve their overall objectives, rather than because it serves some secret agenda, like maintaining a PC public image, then I don’t think they’ve served that cause either.  Personally, this irritates me the most, because I feel they have damaged the image for a cause I feel strongly about.

My position is independent of the validity of Damore’s attempt at scientific argument, which is outside my area of expertise.  I personally don’t think it’s very productive for non-social-scientists to take authoritative positions on social science issues, especially ones that appear to be controversial within the field (but I say this as a layperson).  This may include some of the other commentary in this blog post, which I have not yet read, and might even extend to Scott’s decision to comment on this issue at all (but this bridge was crossed in the previous blog post).  However, I think one of the reasons that many of us do this is that the burden of solving the problem of too few women in STEM is often placed on us.  Some people in STEM feel they are blamed for not being welcoming enough to women (in fact, in my specific field, it’s my experience that the majority of people are very sympathetic).  Many scientific funding applications even ask applicants how they plan to address the issue of diversity, as if they should be the ones to come up with a solution for this difficult problem that nobody knows the answer to, and is not even within their expertise.  So it’s not surprising when these same people start to think about and form opinions on these social science issues.  Obviously, we working in STEM have valuable insight into how we might encourage women to pursue STEM careers, and we should be pushed to think about this, but we don’t have all the answers (and maybe we should remember that the next time we consider authoring an authoritative memo on the subject).


Scott’s Mansplaining Commentary

I’m incredibly grateful to Sarah and Stacey for sharing their views.  Now it’s time for me to mansplain my own thoughts in light of what they said.  Let me start with a seven-point creed.

1. I believe that science and engineering, both in academia and in industry, benefit enormously from contributions from people of every ethnic background and gender identity.  This sort of university-president-style banality shouldn’t even need to be said, but in a world where the President of the US criticizes neo-Nazis only under extreme pressure from his own party, I suppose it does.

2. I believe that there’s no noticeable difference in average ability between men and women in STEM fields—or if there’s some small disparity, for all I know the advantage goes to women. I have enough Sheldon Cooper in me that, if this hadn’t been my experience, I’d probably let it slip that it hadn’t been, but it has been.  When I taught 6.045 (undergrad computability and complexity) at MIT, women were only 20% or so of the students, but for whatever reasons they were wildly overrepresented among the top students.

3. I believe that women in STEM face obstacles that men don’t.  These range from the sheer awkwardness of sometimes being the only woman in a room full of guys, to challenges related to pregnancy and childcare, to actual belittlement and harassment.  Note that, even if men in STEM fields are no more sexist on average than men in other fields—or are less sexist, as one might expect from their generally socially liberal views and attitudes—the mere fact of the gender imbalance means that women in STEM will have many more opportunities to be exposed to whatever sexists there are.  This puts a special burden on us to create a welcoming environment for women.

4. Given that we know that gender gaps in interest and inclination appear early in life, I believe in doing anything we can to encourage girls’ interest in STEM fields.  Trust me, my four-year-old daughter Lily wishes I didn’t believe so fervently in working with her every day on her math skills.

5. I believe that gender diversity is valuable in itself.  It’s just nicer, for men and women alike, to have a work environment with many people of both sexes—especially if (as is often the case in STEM) so much of our lives revolves around our work.  I think that affirmative action for women, women-only scholarships and conferences, and other current efforts to improve gender diversity can all be defended and supported on that ground alone.

6. I believe that John Stuart Mill’s The Subjection of Women is one of the masterpieces of history, possibly the highest pinnacle that moral philosophy has ever reached.  Everyone should read it carefully and reflect on it if they haven’t already.

7. I believe it’s a tragedy that the current holder of the US presidency is a confessed sexual predator, who’s full of contempt not merely for feminism, but for essentially every worthwhile human value. I believe those of us on the “pro-Enlightenment side” now face the historic burden of banding together to stop this thug by every legal and peaceful means available. I believe that, whenever the “good guys” tear each other down in internecine warfare—e.g. “nerds vs. feminists”—it represents a wasted opportunity and an unearned victory for the enemies of progress.

OK, now for the part that might blow some people’s minds.  I hold that every single belief above is compatible with what James Damore wrote in his now-infamous memo—at least, if we’re talking about the actual words in it.  In some cases, Damore even makes the above points himself.  In particular, there’s nothing in what he wrote about female Googlers being less qualified on average than male Googlers, or being too neurotic to code, or anything like that: the question at hand is just why there are fewer women in these positions, and that in turn becomes a question about why there are fewer women earlier in the CS pipeline.  Reasonable people need not agree about the answers to those questions, or regard them as known or obvious, to see that the failure to make this one elementary distinction, between quality and quantity, already condemns 95% of Damore’s attackers as not having read or understood what he wrote.

Let that be the measure of just how terrifyingly efficient the social-media outrage machine has become at twisting its victims’ words to fit a clickbait narrative—a phenomenon with which I happen to be personally acquainted.  Strikingly, it seems not to make the slightest difference if (as in this case) the original source text is easily available to everyone.

Still, while most coverage of Damore’s memo was depressing in its monotonous incomprehension, dissent was by no means confined to the right-wingers eager to recruit Damore to their side.  Peter Singer—the legendary leftist moral philosopher, and someone whose fearlessness and consistency I’ve always admired whether I’ve agreed with him or not—wrote a powerful condemnation of Google’s decision to fire Damore.  Scott Alexander was brilliant as usual in picking apart bad arguments.  Megan McArdle drew on her experiences to illustrate some of Damore’s contentions.  Steven Pinker tweeted that Damore’s firing “makes [the] job of anti-Trumpists harder.”

Like Peter Singer, and also like Sarah Constantin and Stacey Jeffery above, I have no plans to take any position on biological differences in male and female inclinations and cognitive styles, and what role (if any) such differences might play in 80% of Google engineers being male—or, for that matter, what role they might play in 80% of graduating veterinarians now being female, or other striking gender gaps.  I decline to take a position not only because I’m not an expert, but also because, as Singer says, doing so isn’t necessary to reach the right verdict about Damore’s firing.  It suffices to note that the basic thesis being discussed—namely, that natural selection doesn’t stop at the neck, and that it’s perfectly plausible that it acted differently on women and men in ways that might help explain many of the population-level differences that we see today—can also be found in, for example, The Blank Slate by Steven Pinker, and other mainstream works by some of the greatest thinkers alive.

And therefore I say: if James Damore deserves to be fired from Google, for treating evolutionary psychology as potentially relevant to social issues, then Steven Pinker deserves to be fired from Harvard for the same offense.

Yes, I realize that an employee of a private company is different from a tenured professor.  But I don’t see why it’s relevant here.  For if someone really believes that mooting the hypothesis of an evolutionary reason for average differences in cognitive styles between men and women, is enough by itself to create a hostile environment for women—well then, why should tenure be a bar to firing, any more than it is in cases of sexual harassment?

But the reductio needn’t stop there.  It seems to me that, if Damore deserves to be fired, then so do the 56% of Googlers who said in a poll that they opposed his firing.  For isn’t that 56% just as responsible for maintaining a hostile environment as Damore himself was? (And how would Google find out which employees opposed the firing? Well, if there’s any company on earth that could…)  Furthermore, after those 56% of Googlers are fired, any of the remaining 44% who think the 56% shouldn’t have been fired should be fired as well!  And so on iteratively, until only an ideologically reliable core remains, which might or might not be the empty set.

OK, but while the wider implications of Damore’s firing have frightened and depressed me all week, as I said, I depart from Damore on the question of affirmative action and other diversity policies.  Fundamentally, what I want is a sort of negotiated agreement or bargain, between STEM nerds and the wider culture in which they live.  The agreement would work like this: STEM nerds do everything they can to foster diversity, including by creating environments that are welcoming for women, and by supporting affirmative action, women-only scholarships and conferences, and other diversity policies.  The STEM nerds also agree never to talk in public about possible cognitive-science explanations for gender disparities in which careers people choose, or overlapping bell curves,  or anything else potentially inflammatory.  In return, just two things:

  1. Male STEM nerds don’t regularly get libelled as misogynist monsters, who must be scaring all the women away with their inherently gross, icky, creepy, discriminatory brogrammer maleness.
  2. The fields beloved by STEM nerds are suffered to continue to exist, rather than getting destroyed and rebuilt along explicitly ideological lines, as already happened with many humanities and social science fields.

So in summary, neither side advances its theories about the causes of gender gaps; both sides simply agree that there are more interesting topics to explore.  In concrete terms, the social-justice side gets to retain 100% of what it has now, or maybe even expand it.  And all it has to offer in exchange is “R-E-S-P-E-C-T“!  Like, don’t smear and shame male nerds as a class, or nerdy disciplines themselves, for gender gaps that the male nerds would be as happy as anybody to see eradicated.

The trouble is that, fueled by outrage-fests on social media, I think the social-justice side is currently failing to uphold its end of this imagined bargain.  Nearly every day the sun rises to yet another thinkpiece about the toxic “bro culture” of Silicon Valley: a culture so uniquely and incorrigibly misogynist, it seems, that it still intentionally keeps women out, even after law and biology and most other white-collar fields have achieved or exceeded gender parity, their own “bro cultures” notwithstanding.  The trouble with this slander against male STEM nerds, besides its fundamental falsity (which Scott Alexander documented), is that puts the male nerds into an impossible position.  For how can they refute the slander without talking about other possible explanations for fields like CS being 80% male, which is the very thing we all know they’re not supposed to talk about?

In Europe, in the Middle Ages, the Church would sometimes enjoy forcing the local Jews into “disputations” about whose religion was the true one.  At these events, a popular tactic on the Church’s side was to make statements that the Jews couldn’t possibly answer without blaspheming the name of Christ—which, of course, could lead to the Jews’ expulsion or execution if they dared it.

Maybe I have weird moral intuitions, but it’s hard for me to imagine a more contemptible act of intellectual treason, than deliberately trapping your opponents between surrender and blasphemy.  I’d actually rather have someone force me into one or the other, than make me choose, and thereby make me responsible for whichever choice I made.  So I believe the social-justice left would do well to forswear this trapping tactic forever.

Ironically, I suspect that in the long term, doing so would benefit no entity more than the social-justice left itself.  If I had to steelman, in one sentence, the argument that in the space of one year propelled the “alt-right” from obscurity in dark and hateful corners of the Internet, to the improbable and ghastly ascent of Donald Trump and his white-nationalist brigade to the most powerful office on earth, the argument would be this:

If the elites, the technocrats, the “Cathedral”-dwellers, were willing to lie to the masses about humans being blank slates—and they obviously were—then why shouldn’t we assume that they also lied to us about healthcare and free trade and guns and climate change and everything else?

We progressives deluded ourselves that we could permanently shame our enemies into silence, on pain of sexism, racism, xenophobia, and other blasphemies.  But the “victories” won that way were hollow and illusory, and the crumbling of the illusion brings us to where we are now: with a vindictive, delusional madman in the White House who has a non-negligible chance of starting a nuclear war this week.

The Enlightenment was a specific historical period in 18th-century Europe.  But the term can also be used much more broadly, to refer to every trend in human history that’s other than horrible.  Seen that way, the Enlightenment encompasses the scientific revolution, the abolition of slavery, the decline of all forms of violence, the spread of democracy and literacy, and the liberation of women from domestic drudgery to careers of their own choosing.  The invention of Google, which made the entire world’s knowledge just a search bar away, is now also a permanent part of the story of the Enlightenment.

I fantasize that, within my lifetime, the Enlightenment will expand further to tolerate a diversity of cognitive styles—including people on the Asperger’s and autism spectrum, with their penchant for speaking uncomfortable truths—as well as a diversity of natural abilities and inclinations.  Society might or might not get the “demographically correct” percentage of Ellie Arroways—Ellie might decide to become a doctor or musician rather than an astronomer, and that’s fine too—but most important, it will nurture all the Ellie Arroways that it gets, all the misfits and explorers of every background.  I wonder whether, while disagreeing on exactly what’s meant by it, all parties to this debate could agree that diversity represents a next frontier for the Enlightenment.


Comment Policy: Any comment, from any side, that attacks people rather than propositions will be deleted.  I don’t care if the comment also makes useful points: if it contains a single ad hominem, it’s out.

As it happens, I’m at a quantum supremacy workshop in Bristol, UK right now—yeah, yeah, I’m a closet supremacist after all, hur hur—so I probably won’t participate in the comments until later.

The Kolmogorov option

August 8th, 2017

Andrey Nikolaevich Kolmogorov was one of the giants of 20th-century mathematics.  I’ve always found it amazing that the same man was responsible both for establishing the foundations of classical probability theory in the 1930s, and also for co-inventing the theory of algorithmic randomness (a.k.a. Kolmogorov complexity) in the 1960s, which challenged the classical foundations, by holding that it is possible after all to talk about the entropy of an individual object, without reference to any ensemble from which the object was drawn.  Incredibly, going strong into his eighties, Kolmogorov then pioneered the study of “sophistication,” which amends Kolmogorov complexity to assign low values both to “simple” objects and “random” ones, and high values only to a third category of objects, which are “neither simple nor random.”  So, Kolmogorov was at the vanguard of the revolution, counter-revolution, and counter-counter-revolution.

But that doesn’t even scratch the surface of his accomplishments: he made fundamental contributions to topology and dynamical systems, and together with Vladimir Arnold, solved Hilbert’s thirteenth problem, showing that any multivariate continuous function can be written as a composition of continuous functions of two variables.  He mentored an awe-inspiring list of young mathematicians, whose names (besides Arnold) include Dobrushin, Dynkin, Gelfand, Martin-Löf, Sinai, and in theoretical computer science, our own Leonid Levin.  If that wasn’t enough, during World War II Kolmogorov applied his mathematical gifts to artillery problems, helping to protect Moscow from German bombardment.

Kolmogorov was private in his personal and political life, which might have had something to do with being gay, at a time and place when that was in no way widely accepted.  From what I’ve read—for example, in Gessen’s biography of Perelman—Kolmogorov seems to have been generally a model of integrity and decency.  He established schools for mathematically gifted children, which became jewels of the Soviet Union; one still reads about them with awe.  And at a time when Soviet mathematics was convulsed by antisemitism—with students of Jewish descent excluded from the top math programs for made-up reasons, sent instead to remote trade schools—Kolmogorov quietly protected Jewish researchers.

OK, but all this leaves a question.  Kolmogorov was a leading and admired Soviet scientist all through the era of Stalin’s purges, the Gulag, the KGB, the murders and disappearances and forced confessions, the show trials, the rewritings of history, the allies suddenly denounced as traitors, the tragicomedy of Lysenkoism.  Anyone as intelligent, individualistic, and morally sensitive as Kolmogorov would obviously have seen through the lies of his government, and been horrified by its brutality.  So then why did he utter nary a word in public against what was happening?

As far as I can tell, the answer is simply: because Kolmogorov knew better than to pick fights he couldn’t win.  He judged that he could best serve the cause of truth by building up an enclosed little bubble of truth, and protecting that bubble from interference by the Soviet system, and even making the bubble useful to the system wherever he could—rather than futilely struggling to reform the system, and simply making martyrs of himself and all his students for his trouble.

There’s a saying of Kolmogorov, which associates wisdom with keeping your mouth shut:

“Every mathematician believes that he is ahead of the others. The reason none state this belief in public is because they are intelligent people.”

There’s also a story that Kolmogorov loved to tell about himself, which presents math as a sort of refuge from the arbitrariness of the world: he said that he once studied to become a historian, but was put off by the fact that historians demanded ten different proofs for the same proposition, whereas in math, a single proof suffices.

There was also a dark side to political quietism.  In 1936, Kolmogorov joined other mathematicians in testifying against his former mentor in the so-called Luzin affair.  By many accounts, he did this because the police blackmailed him, by threatening to reveal his homosexual relationship with Pavel Aleksandrov.  On the other hand, while he was never foolish enough to take on Lysenko directly, Kolmogorov did publish a paper in 1940 courageously supporting Mendelian genetics.


It seems likely that in every culture, there have been truths, which moreover everyone knows to be true on some level, but which are so corrosive to the culture’s moral self-conception that one can’t assert them, or even entertain them seriously, without (in the best case) being ostracized for the rest of one’s life.  In the USSR, those truths were the ones that undermined the entire communist project: for example, that humans are not blank slates; that Mendelian genetics is right; that Soviet collectivized agriculture was a humanitarian disaster.  In our own culture, those truths are—well, you didn’t expect me to say, did you? 🙂

I’ve long been fascinated by the psychology of unspeakable truths.  Like, for any halfway perceptive person in the USSR, there must have been an incredible temptation to make a name for yourself as a daring truth-teller: so much low-hanging fruit!  So much to say that’s correct and important, and that best of all, hardly anyone else is saying!

But then one would think better of it.  It’s not as if, when you speak a forbidden truth, your colleagues and superiors will thank you for correcting their misconceptions.  Indeed, it’s not as if they didn’t already know, on some level, whatever you imagined yourself telling them.  In fact it’s often because they fear you might be right that the authorities see no choice but to make an example of you, lest the heresy spread more widely.  One corollary is that the more reasonably and cogently you make your case, the more you force the authorities’ hand.

But what’s the inner psychology of the authorities?  For some, it probably really is as cynical as the preceding paragraph makes it sound.  But for most, I doubt that.  I think that most authorities simply internalize the ruling ideology so deeply that they equate dissent with sin.  So in particular, the better you can ground your case in empirical facts, the craftier and more conniving a deceiver you become in their eyes, and hence the more virtuous they are for punishing you.  Someone who’s arrived at that point is completely insulated from argument: absent some crisis that makes them reevaluate their entire life, there’s no sense in even trying.  The question of whether or not your arguments have merit won’t even get entered upon, nor will the authority ever be able to repeat back your arguments in a form you’d recognize—for even repeating the arguments correctly could invite accusations of secretly agreeing with them.  Instead, the sole subject of interest will be you: who you think you are, what your motivations were to utter something so divisive and hateful.  And you have as good a chance of convincing authorities of your benign motivations as you’d have of convincing the Inquisition that, sure, you’re a heretic, but the good kind of heretic, the kind who rejects the divinity of Jesus but believes in niceness and tolerance and helping people.  To an Inquisitor, “good heretic” doesn’t parse any better than “round square,” and the very utterance of such a phrase is an invitation to mockery.  If the Inquisition had had Twitter, its favorite sentence would be “I can’t even.”

If it means anything to be a lover of truth, it means that anytime society finds itself stuck in one of these naked-emperor equilibriums—i.e., an equilibrium with certain facts known to nearly everyone, but severe punishments for anyone who tries to make those facts common knowledge—you hope that eventually society climbs its way out.  But crucially, you can hope this while also realizing that, if you tried singlehandedly to change the equilibrium, it wouldn’t achieve anything good for the cause of truth.  If iconoclasts simply throw themselves against a ruling ideology one by one, they can be picked off as easily as tribesmen charging a tank with spears, and each kill will only embolden the tank-gunners still further.  The charging tribesmen don’t even have the assurance that, if truth ultimately does prevail, then they’ll be honored as martyrs: they might instead end up like Ted Nelson babbling about hypertext in 1960, or H.C. Pocklington yammering about polynomial-time algorithms in 1917, nearly forgotten by history for being too far ahead of their time.

Does this mean that, like Winston Smith, the iconoclast simply must accept that 2+2=5, and that a boot will stamp on a human face forever?  No, not at all.  Instead the iconoclast can choose what I think of as the Kolmogorov option.  This is where you build up fortresses of truth in places the ideological authorities don’t particularly understand or care about, like pure math, or butterfly taxonomy, or irregular verbs.  You avoid a direct assault on any beliefs your culture considers necessary for it to operate.  You even seek out common ground with the local enforcers of orthodoxy.  Best of all is a shared enemy, and a way your knowledge and skills might be useful against that enemy.  For Kolmogorov, the shared enemy was the Nazis; for someone today, an excellent choice might be Trump, who’s rightly despised by many intellectual factions that spend most of their time despising each other.  Meanwhile, you wait for a moment when, because of social tectonic shifts beyond your control, the ruling ideology has become fragile enough that truth-tellers acting in concert really can bring it down.  You accept that this moment of reckoning might never arrive, or not in your lifetime.  But even if so, you could still be honored by future generations for building your local pocket of truth, and for not giving falsehood any more aid or comfort than was necessary for your survival.


When it comes to the amount of flak one takes for defending controversial views in public under one’s own name, I defer to almost no one.  For anyone tempted, based on this post, to call me a conformist or coward: how many times have you been denounced online, and from how many different corners of the ideological spectrum?  How many people have demanded your firing?   How many death threats have you received?  How many threatened lawsuits?  How many comments that simply say “kill yourself kike” or similar?  Answer and we can talk about cowardice.

But, yes, there are places even I won’t go, hills I won’t die on.  Broadly speaking:

  • My Law is that, as a scientist, I’ll hold discovering and disseminating the truth to be a central duty of my life, one that overrides almost every other value.  I’ll constantly urge myself to share what I see as the truth, even if it’s wildly unpopular, or makes me look weird, or is otherwise damaging to me.
  • The Amendment to the Law is that I’ll go to great lengths not to hurt anyone else’s feelings: for example, by propagating negative stereotypes, or by saying anything that might discourage any enthusiastic person from entering science.  And if I don’t understand what is or isn’t hurtful, then I’ll defer to the leading intellectuals in my culture to tell me.  This Amendment often overrides the Law, causing me to bite my tongue.
  • The Amendment to the Amendment is that, when pushed, I’ll stand by what I care about—such as free scientific inquiry, liberal Enlightenment norms, humor, clarity, and the survival of the planet and of family and friends and colleagues and nerdy misfits wherever they might be found.  So if someone puts me in a situation where there’s no way to protect what I care about without speaking a truth that hurts someone’s feelings, then I might speak the truth, feelings be damned.  (Even then, though, I’ll try to minimize collateral damage.)

When I see social media ablaze with this or that popular falsehood, I sometimes feel the “Galileo urge” washing over me.  I think: I’m a tenured professor with a semi-popular blog.  How can I look myself in the mirror, if I won’t use my platform and relative job safety to declare to the world, “and yet it moves”?

But then I remember that even Galileo weighed his options and tried hard to be prudent.  In his mind, the Dialogue Concerning the Two Chief World Systems actually represented a compromise (!).  Galileo never declared outright that the earth orbits the sun.  Instead, he put the Copernican doctrine, as a “possible view,” into the mouth of his character Salviati—only to have Simplicio “refute” Salviati, by the final dialogue, with the argument that faith always trumps reason, and that human beings are pathetically unequipped to deduce the plan of God from mere surface appearances.  Then, when that fig-leaf turned out not to be wide enough to fool the Church, Galileo quickly capitulated.  He repented of his error, and agreed never to defend the Copernican heresy again.  And he didn’t, at least not publicly.

Some have called Galileo a coward for that.  But the great David Hilbert held a different view.  Hilbert said that science, unlike religion, has no need for martyrs, because it’s based on facts that can’t be denied indefinitely.  Given that, Hilbert considered Galileo’s response to be precisely correct: in effect Galileo told the Inquisitors, hey, you’re the ones with the torture rack.  Just tell me which way you want it.  I can have the earth orbiting Mars and Venus in figure-eights by tomorrow if you decree it so.

Three hundred years later, Andrey Kolmogorov would say to the Soviet authorities, in so many words: hey, you’re the ones with the Gulag and secret police.  Consider me at your service.  I’ll even help you stop Hitler’s ideology from taking over the world—you’re 100% right about that one, I’ll give you that.  Now as for your own wondrous ideology: just tell me the dogma of the week, and I’ll try to make sure Soviet mathematics presents no threat to it.

There’s a quiet dignity to Kolmogorov’s (and Galileo’s) approach: a dignity that I suspect will be alien to many, but recognizable to those in the business of science.


Comment Policy: I welcome discussion about the responses of Galileo, Kolmogorov, and other historical figures to official ideologies that they didn’t believe in; and about the meta-question of how a truth-valuing person ought to behave when living under such ideologies.  In the hopes of maintaining a civil discussion, any comments that mention current hot-button ideological disputes will be ruthlessly deleted.

Is “information is physical” contentful?

July 20th, 2017

“Information is physical.”

This slogan seems to have originated around 1991 with Rolf Landauer.  It’s ricocheted around quantum information for the entire time I’ve been in the field, incanted in funding agency reports and popular articles and at the beginnings and ends of talks.

But what the hell does it mean?

There are many things it’s taken to mean, in my experience, that don’t make a lot of sense when you think about them—or else they’re vacuously true, or purely a matter of perspective, or not faithful readings of the slogan’s words.

For example, some people seem to use the slogan to mean something more like its converse: “physics is informational.”  That is, the laws of physics are ultimately not about mass or energy or pressure, but about bits and computations on them.  As I’ve often said, my problem with that view is less its audacity than its timidity!  It’s like, what would the universe have to do in order not to be informational in this sense?  “Information” is just a name we give to whatever picks out one element from a set of possibilities, with the “amount” of information given by the log of the set’s cardinality (and with suitable generalizations to infinite sets, nonuniform probability distributions, yadda yadda).  So, as long as the laws of physics take the form of telling us that some observations or configurations of the world are possible and others are not, or of giving us probabilities for each configuration, no duh they’re about information!

Other people use “information is physical” to pour scorn on the idea that “information” could mean anything without some actual physical instantiation of the abstract 0’s and 1’s, such as voltage differences in a loop of wire.  Here I certainly agree with the tautology that in order to exist physically—that is, be embodied in the physical world—a piece of information (like a song, video, or computer program) does need to be embodied in the physical world.  But my inner Platonist slumps in his armchair when people go on to assert that, for example, it’s meaningless to discuss the first prime number larger than 1010^125, because according to post-1998 cosmology, one couldn’t fit its digits inside the observable universe.

If the cosmologists revise their models next week, will this prime suddenly burst into existence, with all the mathematical properties that one could’ve predicted for it on general grounds—only to fade back into the netherworld if the cosmologists revise their models again?  Why would anyone want to use language in such a tortured way?

Yes, brains, computers, yellow books, and so on that encode mathematical knowledge comprise only a tiny sliver of the physical world.  But it’s equally true that the physical world we observe comprises only a tiny sliver of mathematical possibility-space.

Still other people use “information is physical” simply to express their enthusiasm for the modern merger of physical and information sciences, as exemplified by quantum computing.  Far be it from me to temper that enthusiasm: rock on, dudes!

Yet others use “information is physical” to mean that the rules governing information processing and transmission in the physical world aren’t knowable a priori, but can only be learned from physics.  This is clearest in the case of quantum information, which has its own internal logic that generalizes the logic of classical information.  But in some sense, we didn’t need quantum mechanics to tell us this!  Of course the laws of physics have ultimate jurisdiction over whatever occurs in the physical world, information processing included.

My biggest beef, with all these unpackings of the “information is physical” slogan, is that none of them really engage with any of the deep truths that we’ve learned about physics.  That is, we could’ve had more-or-less the same debates about any of them, even in a hypothetical world where the laws of physics were completely different.


So then what should we mean by “information is physical”?  In the rest of this post, I’d like to propose an answer to that question.

We get closer to the meat of the slogan if we consider some actual physical phenomena, say in quantum mechanics.  The double-slit experiment will do fine.

Recall: you shoot photons, one by one, at a screen with two slits, then examine the probability distribution over where the photons end up on a second screen.  You ask: does that distribution contain alternating “light” and “dark” regions, the signature of interference between positive and negative amplitudes?  And the answer, predicted by the math and confirmed by experiment, is: yes, but only if the information about which slit the photon went through failed to get recorded anywhere else in the universe, other than the photon location itself.

Here a skeptic interjects: but that has to be wrong!  The criterion for where a physical particle lands on a physical screen can’t possibly depend on anything as airy as whether “information” got “recorded” or not.  For what counts as “information,” anyway?  As an extreme example: what if God, unbeknownst to us mortals, took divine note of which slit the photon went through?  Would that destroy the interference pattern?  If so, then every time we do the experiment, are we collecting data about the existence or nonexistence of an all-knowing God?

It seems to me that the answer is: insofar as the mind of God can be modeled as a tensor factor in Hilbert space, yes, we are.  And crucially, if quantum mechanics is universally true, then the mind of God would have to be such a tensor factor, in order for its state to play any role in the prediction of observed phenomena.

To say this another way: it’s obvious and unexceptionable that, by observing a physical system, you can often learn something about what information must be in it.  For example, you need never have heard of DNA to deduce that chickens must somehow contain information about making more chickens.  What’s much more surprising is that, in quantum mechanics, you can often deduce things about what information can’t be present, anywhere in the physical world—because if such information existed, even a billion light-years away, it would necessarily have a physical effect that you don’t see.

Another famous example here concerns identical particles.  You may have heard the slogan that “if you’ve seen one electron, you’ve seen them all”: that is, apart from position, momentum, and spin, every two electrons have exactly the same mass, same charge, same every other property, including even any properties yet to be discovered.  Again the skeptic interjects: but that has to be wrong.  Logically, you could only ever confirm that two electrons were different, by observing a difference in their behavior.  Even if the electrons had behaved identically for a billion years, you couldn’t rule out the possibility that they were actually different, for example because of tiny nametags (“Hi, I’m Emily the Electron!” “Hi, I’m Ernie!”) that had no effect on any experiment you’d thought to perform, but were visible to God.

You can probably guess where this is going.  Quantum mechanics says that, no, you can verify that two particles are perfectly identical by doing an experiment where you swap them and see what happens.  If the particles are identical in all respects, then you’ll see quantum interference between the swapped and un-swapped states.  If they aren’t, you won’t.  The kind of interference you’ll see is different for fermions (like electrons) than for bosons (like photons), but the basic principle is the same in both cases.  Once again, quantum mechanics lets you verify that a specific type of information—in this case, information that distinguishes one particle from another—was not present anywhere in the physical world, because if it were, it would’ve destroyed an interference effect that you in fact saw.

This, I think, already provides a meatier sense in which “information is physical” than any of the senses discussed previously.


But we haven’t gotten to the filet mignon yet.  The late, great Jacob Bekenstein will forever be associated with the discovery that information, wherever and whenever it occurs in the physical world, takes up a minimum amount of space.  The most precise form of this statement, called the covariant entropy bound, was worked out in detail by Raphael Bousso.  Here I’ll be discussing a looser version of the bound, which holds in “non-pathological” cases, and which states that a bounded physical system can store at most A/(4 ln 2) bits of information, where A is the area in Planck units of any surface that encloses the system—so, about 1069 bits per square meter.  (Actually it’s 1069 qubits per square meter, but because of Holevo’s theorem, an upper bound on the number of qubits is also an upper bound on the number of classical bits that can be reliably stored in a system and then retrieved later.)

You might have heard of the famous way Nature enforces this bound.  Namely, if you tried to create a hard drive that stored more than 1069 bits per square meter of surface area, the hard drive would necessarily collapse to a black hole.  And from that point on, the information storage capacity would scale “only” with the area of the black hole’s event horizon—a black hole itself being the densest possible hard drive allowed by physics.

Let’s hear once more from our skeptic.  “Nonsense!  Matter can take up space.  Energy can take up space.  But information?  Bah!  That’s just a category mistake.  For a proof, suppose God took one of your black holes, with a 1-square-meter event horizon, which already had its supposed maximum of ~1069 bits of information.  And suppose She then created a bunch of new fundamental fields, which didn’t interact with gravity, electromagnetism, or any of the other fields that we know from observation, but which had the effect of encoding 10300 new bits in the region of the black hole.  Presto!  An unlimited amount of additional information, exactly where Bekenstein said it couldn’t exist.”

We’d like to pinpoint what’s wrong with the skeptic’s argument—and do so in a self-contained, non-question-begging way, a way that doesn’t pull any rabbits out of hats, other than the general principles of relativity and quantum mechanics.  I was confused myself about how to do this, until a month ago, when Daniel Harlow helped set me straight (any remaining howlers in my exposition are 100% mine, not his).

I believe the logic goes like this:

  1. Relativity—even just Galilean relativity—demands that, in flat space, the laws of physics must have the same form for all inertial observers (i.e., all observers who move through space at constant speed).
  2. Anything in the physical world that varies in space—say, a field that encodes different bits of information at different locations—also varies in time, from the perspective of an observer who moves through the field at a constant speed.
  3. Combining 1 and 2, we conclude that anything that can vary in space can also vary in time.  Or to say it better, there’s only one kind of varying: varying in spacetime.
  4. More strongly, special relativity tells us that there’s a specific numerical conversion factor between units of space and units of time: namely the speed of light, c.  Loosely speaking, this means that if we know the rate at which a field varies across space, we can also calculate the rate at which it varies across time, and vice versa.
  5. Anything that varies across time carries energy.  Why?  Because this is essentially the definition of energy in quantum mechanics!  Up to a constant multiple (namely, Planck’s constant), energy is the expected speed of rotation of the global phase of the wavefunction, when you apply your Hamiltonian.  If the global phase rotates at the slowest possible speed, then we take the energy to be zero, and say you’re in a vacuum state.  If it rotates at the next highest speed, we say you’re in a first excited state, and so on.  Indeed, assuming a time-independent Hamiltonian, the evolution of any quantum system can be fully described by simply decomposing the wavefunction into a superposition of energy eigenstates, then tracking of the phase of each eigenstate’s amplitude as it loops around and around the unit circle.  No energy means no looping around means nothing ever changes.
  6. Combining 3 and 5, any field that varies across space carries energy.
  7. More strongly, combining 4 and 5, if we know how quickly a field varies across space, we can lower-bound how much energy it has to contain.
  8. In general relativity, anything that carries energy couples to the gravitational field.  This means that anything that carries energy necessarily has an observable effect: if nothing else, its effect on the warping of spacetime.  (This is dramatically illustrated by dark matter, which is currently observable via its spacetime warping effect and nothing else.)
  9. Combining 6 and 8, any field that varies across space couples to the gravitational field.
  10. More strongly, combining 7 and 8, if we know how quickly a field varies across space, then we can lower-bound by how much it has to warp spacetime.  This is so because of another famous (and distinctive) feature of gravity: namely, the fact that it’s universally attractive, so all the warping contributions add up.
  11. But in GR, spacetime can only be warped by so much before we create a black hole: this is the famous Schwarzschild bound.
  12. Combining 10 and 11, the information contained in a physical field can only vary so quickly across space, before it causes spacetime to collapse to a black hole.

Summarizing where we’ve gotten, we could say: any information that’s spatially localized at all, can only be localized so precisely.  In our world, the more densely you try to pack 1’s and 0’s, the more energy you need, therefore the more you warp spacetime, until all you’ve gotten for your trouble is a black hole.  Furthermore, if we rewrote the above conceptual argument in math—keeping track of all the G’s, c’s, h’s, and so on—we could derive a quantitative bound on how much information there can be in a bounded region of space.  And if we were careful enough, that bound would be precisely the holographic entropy bound, which says that the number of (qu)bits is at most A/(4 ln 2), where A is the area of a bounding surface in Planck units.

Let’s pause to point out some interesting features of this argument.

Firstly, we pretty much needed the whole kitchen sink of basic physical principles: special relativity (both the equivalence of inertial frames and the finiteness of the speed of light), quantum mechanics (in the form of the universal relation between energy and frequency), and finally general relativity and gravity.  All three of the fundamental constants G, c, and h made appearances, which is why all three show up in the detailed statement of the holographic bound.

But secondly, gravity only appeared from step 8 onwards.  Up till then, everything could be said solely in the language of quantum field theory: that is, quantum mechanics plus special relativity.  The result would be the so-called Bekenstein bound, which upper-bounds the number of bits in any spatial region by the product of the region’s radius and its energy content.  I learned that there’s an interesting history here: Bekenstein originally deduced this bound using ingenious thought experiments involving black holes.  Only later did people realize that the Bekenstein bound can be derived purely within QFT (see here and here for example)—in contrast to the holographic bound, which really is a statement about quantum gravity.  (An early hint of this was that, while the holographic bound involves Newton’s gravitational constant G, the Bekenstein bound doesn’t.)

Thirdly, speaking of QFT, some readers might be struck by the fact that at no point in our 12-step program did we ever seem to need QFT machinery.  Which is fortunate, because if we had needed it, I wouldn’t have been able to explain any of this!  But here I have to confess that I cheated slightly.  Recall step 4, which said that “if you know the rate at which a field varies across space, you can calculate the rate at which it varies across time.”  It turns out that, in order to give that sentence a definite meaning, one uses the fact that in QFT, space and time derivatives in the Hamiltonian need to be related by a factor of c, since otherwise the Hamiltonian wouldn’t be Lorentz-invariant.

Fourthly, eagle-eyed readers might notice a loophole in the argument.  Namely, we never upper-bounded how much information God could add to the world, via fields that are constant across all of spacetime.  For example, there’s nothing to stop Her from creating a new scalar field that takes the same value everywhere in the universe—with that value, in suitable units, encoding 1050000 separate divine thoughts in its binary expansion.  But OK, being constant, such a field would interact with nothing and affect no observations—so Occam’s Razor itches to slice it off, by rewriting the laws of physics in a simpler form where that field is absent.  If you like, such a field would at most be a comment in the source code of the universe: it could be as long as the Great Programmer wanted it to be, but would have no observable effect on those of us living inside the program’s execution.


Of course, even before relativity and quantum mechanics, information had already been playing a surprisingly fleshy role in physics, through its appearance as entropy in 19th-century thermodynamics.  Which leads to another puzzle.  To a computer scientist, the concept of entropy, as the log of the number of microstates compatible with a given macrostate, seems clear enough, as does the intuition for why it should increase monotonically with time.  Or at least, to whatever extent we’re confused about these matters, we’re no more confused than the physicists are!

But then why should this information-theoretic concept be so closely connected to tangible quantities like temperature, and pressure, and energy?  From the mere assumption that a black hole has a nonzero entropy—that is, that it takes many bits to describe—how could Bekenstein and Hawking have possibly deduced that it also has a nonzero temperature?  Or: if you put your finger into a tub of hot water, does the heat that you feel somehow reflect how many bits are needed to describe the water’s microstate?

Once again our skeptic pipes up: “but surely God could stuff as many additional bits as She wanted into the microstate of the hot water—for example, in degrees of freedom that are still unknown to physics—without the new bits having any effect on the water’s temperature.”

But we should’ve learned by now to doubt this sort of argument.  There’s no general principle, in our universe, saying that you can hide as many bits as you want in a physical object, without those bits influencing the object’s observable properties.  On the contrary, in case after case, our laws of physics seem to be intolerant of “wallflower bits,” which hide in a corner without talking to anyone.  If a bit is there, the laws of physics want it to affect other nearby bits and be affected by them in turn.

In the case of thermodynamics, the assumption that does all the real work here is that of equidistribution.  That is, whatever degrees of freedom might be available to your thermal system, your gas in a box or whatever, we assume that they’re all already “as randomized as they could possibly be,” subject to a few observed properties like temperature and volume and pressure.  (At least, we assume that in classical thermodynamics.  Non-equilibrium thermodynamics is a whole different can of worms, worms that don’t stay in equilibrium.)  Crucially, we assume this despite the fact that we might not even know all the relevant degrees of freedom.

Why is this assumption justified?  “Because experiment bears it out,” the physics teacher explains—but we can do better.  The assumption is justified because, as long as the degrees of freedom that we’re talking about all interact with each other, they’ve already had plenty of time to equilibrate.  And conversely, if a degree of freedom doesn’t interact with the stuff we’re observing—or with anything that interacts with the stuff we’re observing, etc.—well then, who cares about it anyway?

But now, because the microscopic laws of physics have the fundamental property of reversibility—that is, they never destroy information—a new bit has to go somewhere, and it can’t overwrite degrees of freedom that are already fully randomized.  This is why, if you pump more bits of information into a tub of hot water, while keeping it at the same volume, the new bits have nowhere to go except into pushing up the energy.  Now, there are often ways to push up the energy other than by raising the temperature—the concept of specific heat, in chemistry, is precisely about this—but if you need to stuff more bits into a substance, at the cost of raising its energy, certainly one of the obvious ways to do it is to describe a greater range of possible speeds for the water molecules.  So since that can happen, by equidistribution it typically does happen, which means that the molecules move faster on average, and your finger feels the water get hotter.


In summary, our laws of physics are structured in such a way that even pure information often has “nowhere to hide”: if the bits are there at all in the abstract machinery of the world, then they’re forced to pipe up and have a measurable effect.  And this is not a tautology, but comes about only because of nontrivial facts about special and general relativity, quantum mechanics, quantum field theory, and thermodynamics.  And this is what I think people should mean when they say “information is physical.”

Anyway, if this was all obvious to you, I apologize for having wasted your time!  But in my defense, it was never explained to me quite this way, nor was it sorted out in my head until recently—even though it seems like one of the most basic and general things one can possibly say about physics.


Endnotes. Thanks again to Daniel Harlow, not only for explaining the logic of the holographic bound to me but for several suggestions that improved this post.

Some readers might suspect circularity in the arguments we’ve made: are we merely saying that “any information that has observable physical consequences, has observable physical consequences”?  No, it’s more than that.  In all the examples I discussed, the magic was that we inserted certain information into our abstract mathematical description of the world, taking no care to ensure that the information’s presence would have any observable consequences whatsoever.  But then the principles of quantum mechanics, quantum gravity, or thermodynamics forced the information to be detectable in very specific ways (namely, via the destruction of quantum interference, the warping of spacetime, or the generation of heat respectively).

Three things

July 17th, 2017

I was shocked and horrified to learn of the loss of Maryam Mirzakhani at age 40, after a battle with cancer (see here or here).  Mirzakhani was a renowned mathematician at Stanford and the world’s first and so far only female Fields Medalist.  I never had the privilege of meeting her, but everything I’ve read about her fills me with admiration.  I wish to offer condolences to her friends and family, including her husband Jan Vondrák, also a professor at Stanford and a member of the CS theory community.


In other depressing news, discussion continues to rage on social media about “The Uninhabitable Earth,” the New York magazine article by David Wallace-Wells arguing that the dangers of climate change have been systematically understated even by climate scientists; that sea level rise is the least of the problems; and that if we stay the current course, much of the earth’s landmass has a good chance of being uninhabitable by the year 2100.  In an unusual turn of events, the Wallace-Wells piece has been getting slammed by climate scientists, including Michael Mann (see here and also this interview)—people who are usually in the news to refute the claims of deniers.

Some of the critics’ arguments seem cogent to me: for example, that Wallace-Wells misunderstood some satellite data, and more broadly, that the piece misleadingly presents its scenario as overwhelmingly probable by 2100 if we do nothing, rather than as “only” 10% likely or whatever—i.e., a mere Trump-becoming-president level of risk.  Other objections to the article impressed me less: for example, that doom-and-gloom is a bad way to motivate people about climate change; that the masses need a more optimistic takeaway.  That obviously has no bearing on the truth of what’s going to happen—but even if we did agree to entertain such arguments, well, it’s not as if mainstream messaging on climate change has been an unmitigated success.  What if everyone should be sweating-in-the-night terrified?

As far as I understand it, the question of the plausibility of Wallace-Wells’s catastrophe scenario mostly just comes down to a single scientific unknown: namely, will the melting permafrost belch huge amounts of methane into the atmosphere?  If it does, then “Armageddon” is probably a fair description of what awaits us in the next century, and if not, not.  Alas, our understanding of permafrost doesn’t seem especially reliable, and it strikes me that models of such feedbacks have a long history of erring on the side of conservatism (for example, researchers were astonished by how quickly glaciers and ice shelves fell apart).

So, while I wish the article was written with more caveats, I submit that runaway warming scenarios deserve more attention rather than less.  And we should be putting discussion of those scenarios in exactly the broader context that Wallace-Wells does: namely, that of the Permian-Triassic extinction event, the Fermi paradox, and the conditions for a technological civilization to survive past its infancy.

Certainly we spend much more time on risks to civilization (e.g., nuclear terrorism, bioengineered pandemics) that strike me as less probable than this one.  And certainly this tail, in the distribution of possible outcomes, deserves at least as much attention as its more popular opposite, the tail where climate change turns out not to be much of a problem at all.  For the grim truth about climate change is that history won’t end in 2100: only the projections do.  And the mere addition of 50 more years could easily suffice to turn a tail risk into a body risk.

Of course, that the worst will happen is a clear prediction of reverse Hollywoodism theory—besides being the “natural, default” prediction for a computer scientist used to worst-case analysis.  This is one prediction that I hope turns out to be as wrong as possible.


OK, now for something to cheer us all up.  Yesterday the group of Misha Lukin, at Harvard, put a paper on the arXiv reporting the creation of a 51-qubit quantum simulator using cold atoms.  The paper doesn’t directly address the question of quantum supremacy, or indeed of performance comparisons between the new device and classical simulations at all.  But this is clearly a big step forward, while the world waits for the fully-programmable 50-qubit superconducting QCs that have been promised by the groups at Google and IBM.

Indeed, this strikes me as the most exciting news in experimental quantum information since last month, when Jian-Wei Pan’s group in Shanghai reported the first transmission of entangled photons from a satellite to earth—thereby allowing violations of the Bell inequality over 1200 kilometers, teleportation of a qubit from earth to space, and other major firsts.  These are breakthroughs that we knew were in the works ever since the Chinese government launched the QUESS satellite devoted to quantum communications.  I should’ve blogged about them in June.  Then again, regular readers of Shtetl-Optimized, familiar as they already are with the universal reach of quantum mechanics and with the general state of quantum information technology, shouldn’t find anything here that fundamentally surprises them, should they?

Amsterdam art museums plagiarizing my blog?

July 12th, 2017

This past week I had the pleasure of attending COLT (Conference on Learning Theory) 2017 in Amsterdam, and of giving an invited talk on “PAC-Learning and Reconstruction of Quantum States.”  You can see the PowerPoint slides here; videos were also made, but don’t seem to be available yet.

This was my first COLT, but almost certainly not the last.  I learned lots of cool new tidbits, from the expressive power of small-depth neural networks, to a modern theoretical computer science definition of “non-discriminatory” (namely, your learning algorithm’s output should be independent of protected categories like race, sex, etc. after conditioning on the truth you’re trying to predict), to the inapproximability of VC dimension (assuming the Exponential Time Hypothesis).  You can see the full schedule here.  Thanks so much to the PC chairs, Ohad Shamir and Satyen Kale, for inviting me and for putting on a great conference.

And one more thing: I’m not normally big on art museums, but Amsterdam turns out to have two in close proximity to each other—the Rijksmuseum and the Stedelijk—each containing something that Shtetl-Optimized readers might recognize.

Photo credits: Ronald de Wolf and Marijn Heule.

ITCS’2018

July 5th, 2017

My friend Anna Karlin, who chairs the ITCS program committee this year, asked me to post the following announcement, and I’m happy to oblige her.  I’ve enjoyed ITCS every time I’ve attended, and was even involved in the statement that led to ITCS’s creation, although I don’t take direct responsibility for the content of this ad. –SA


The ITCS 2018 Call For Papers is now available!

ITCS is a conference that stands apart from all others. For a decade now, it has been celebrating the vibrancy and unity of our field of Theoretical Computer Science. See this blog post for a detailed discussion of what makes ITCS so cool and the brief description of ITCS’17 at the end of this post.

ITCS seeks to promote research that carries a strong conceptual message  (e.g., introducing a new concept, model or understanding, opening a new line of inquiry within traditional or interdisciplinary areas, introducing new mathematical techniques and methodologies, or new applications of known techniques). ITCS welcomes both conceptual and technical contributions whose contents will advance and inspire the greater theory community.

This year, ITCS will be held at MIT in Cambridge, MA from January 11-14, 2018.

The submission deadline is September 8, 2017, with notification of decisions by October 30, 2017.

 Authors should strive to make their papers accessible not only to experts in their subarea, but also to the theory community at large. The committee will place a premium on writing that conveys clearly and in the simplest possible way what the paper is accomplishing.

Ten-page versions of accepted papers will be published in an electronic proceedings of the conference. However, the alternative of publishing a one page abstract with a link to a full PDF will also be available (to accommodate subsequent publication in journals that would not consider results that have been published in preliminary form in a conference proceedings).

You can find all the details in the official Call For Papers.

 On last year’s ITCS (by the PC Chair Christos Papadimitriou)

 This past ITCS (2017) was by all accounts the most successful ever.  We had 170+ submissions and 61 papers, including 5 “invited papers”, and 90+ registrants, all new records.  There was a voluntary poster session for authors to get a chance to go through more detail, and the famous Graduating Bits event, where the younger ones get their 5 minutes to show off their accomplishment and personality.

The spirit of the conference was invigorating, heartwarming, and great fun.  I believe none of the twelve sessions had fewer than 70 attendees — no parallelism, of course — while the now famous last session was among the best attended and went one hour overtime due to the excitement of discussion (compare with the last large conference that you attended).

Alex Halderman testifying before the Senate Intelligence Committee

June 21st, 2017

This morning, my childhood best friend Alex Halderman testified before the US Senate about the proven ease of hacking electronic voting machines without leaving any record, the certainty that Russia has the technical capability to hack American elections, and the urgency of three commonsense (and cheap) countermeasures:

  1. a paper trail for every vote cast in every state,
  2. routine statistical sampling of the paper trail—enough to determine whether large-scale tampering occurred, and
  3. cybersecurity audits to instill general best practices (such as firewalling election systems).

You can watch Alex on C-SPAN here—his testimony begins at 2:16:13, and is followed by the Q&A period.  You can also read Alex’s prepared testimony here, as well as his accompanying Washington Post editorial (joint with Justin Talbot-Zorn).

Alex’s testimony—its civic, nonpartisan nature, right down to Alex’s flourish of approvingly quoting President Trump in support of paper ballots—reflects a moving optimism that, even in these dark times for democracy, Congress can be prodded into doing the right thing merely because it’s clearly, overwhelmingly in the national interest.  I wish I could say I shared that optimism.  Nevertheless, when called to testify, what can one do but act on the assumption that such optimism is justified?  Here’s hoping that Alex’s urgent message is heard and acted on.

Higher-level causation exists (but I wish it didn’t)

June 4th, 2017

Unrelated Update (June 6): It looks like the issues we’ve had with commenting have finally been fixed! Thanks so much to Christie Wright and others at WordPress Concierge Services for handling this. Let me know if you still have problems. In the meantime, I also stopped asking for commenters’ email addresses (many commenters filled that field with nonsense anyway).  Oops, that ended up being a terrible idea, because it made commenting impossible!  Back to how it was before.


Update (June 5): Erik Hoel was kind enough to write a 5-page response to this post (Word .docx format), and to give me permission to share it here.  I might respond to various parts of it later.  For now, though, I’ll simply say that I stand by what I wrote, and that requiring the macro-distribution to arise by marginalizing the micro-distribution still seems like the correct choice to me (and is what’s assumed in, e.g., the proof of the data processing inequality).  But I invite readers to read my post along with Erik’s response, form their own opinions, and share them in the comments section.


This past Thursday, Natalie Wolchover—a math/science writer whose work has typically been outstanding—published a piece in Quanta magazine entitled “A Theory of Reality as More Than the Sum of Its Parts.”  The piece deals with recent work by Erik Hoel and his collaborators, including Giulio Tononi (Hoel’s adviser, and the founder of integrated information theory, previously critiqued on this blog).  Commenter Jim Cross asked me to expand on my thoughts about causal emergence in a blog post, so: your post, monsieur.

In their new work, Hoel and others claim to make the amazing discovery that scientific reductionism is false—or, more precisely, that there can exist “causal information” in macroscopic systems, information relevant for predicting the systems’ future behavior, that’s not reducible to causal information about the systems’ microscopic building blocks.  For more about what we’ll be discussing, see Hoel’s FQXi essay “Agent Above, Atom Below,” or better yet, his paper in Entropy, When the Map Is Better Than the Territory.  Here’s the abstract of the Entropy paper:

The causal structure of any system can be analyzed at a multitude of spatial and temporal scales. It has long been thought that while higher scale (macro) descriptions may be useful to observers, they are at best a compressed description and at worse leave out critical information and causal relationships. However, recent research applying information theory to causal analysis has shown that the causal structure of some systems can actually come into focus and be more informative at a macroscale. That is, a macroscale description of a system (a map) can be more informative than a fully detailed microscale description of the system (the territory). This has been called “causal emergence.” While causal emergence may at first seem counterintuitive, this paper grounds the phenomenon in a classic concept from information theory: Shannon’s discovery of the channel capacity. I argue that systems have a particular causal capacity, and that different descriptions of those systems take advantage of that capacity to various degrees. For some systems, only macroscale descriptions use the full causal capacity. These macroscales can either be coarse-grains, or may leave variables and states out of the model (exogenous, or “black boxed”) in various ways, which can improve the efficacy and informativeness via the same mathematical principles of how error-correcting codes take advantage of an information channel’s capacity. The causal capacity of a system can approach the channel capacity as more and different kinds of macroscales are considered. Ultimately, this provides a general framework for understanding how the causal structure of some systems cannot be fully captured by even the most detailed microscale description.

Anyway, Wolchover’s popular article quoted various researchers praising the theory of causal emergence, as well as a single inexplicably curmudgeonly skeptic—some guy who sounded like he was so off his game (or maybe just bored with debates about ‘reductionism’ versus ’emergence’?), that he couldn’t even be bothered to engage the details of what he was supposed to be commenting on.

Hoel’s ideas do not impress Scott Aaronson, a theoretical computer scientist at the University of Texas, Austin. He says causal emergence isn’t radical in its basic premise. After reading Hoel’s recent essay for the Foundational Questions Institute, “Agent Above, Atom Below” (the one that featured Romeo and Juliet), Aaronson said, “It was hard for me to find anything in the essay that the world’s most orthodox reductionist would disagree with. Yes, of course you want to pass to higher abstraction layers in order to make predictions, and to tell causal stories that are predictively useful — and the essay explains some of the reasons why.”

After the Quanta piece came out, Sean Carroll tweeted approvingly about the above paragraph, calling me a “voice of reason [yes, Sean; have I ever not been?], slapping down the idea that emergent higher levels have spooky causal powers.”  Then Sean, in turn, was criticized for that remark by Hoel and others.

Hoel in particular raised a reasonable-sounding question.  Namely, in my “curmudgeon paragraph” from Wolchover’s article, I claimed that the notion of “causal emergence,” or causality at the macro-scale, says nothing fundamentally new.  Instead it simply reiterates the usual worldview of science, according to which

  1. the universe is ultimately made of quantum fields evolving by some Hamiltonian, but
  2. if someone asks (say) “why has air travel in the US gotten so terrible?”, a useful answer is going to talk about politics or psychology or economics or history rather than the movements of quarks and leptons.

But then, Hoel asks, if there’s nothing here for the world’s most orthodox reductionist to disagree with, then how do we find Carroll and other reductionists … err, disagreeing?

I think this dilemma is actually not hard to resolve.  Faced with a claim about “causation at higher levels,” what reductionists disagree with is not the object-level claim that such causation exists (I scratched my nose because it itched, not because of the Standard Model of elementary particles).  Rather, they disagree with the meta-level claim that there’s anything shocking about such causation, anything that poses a special difficulty for the reductionist worldview that physics has held for centuries.  I.e., they consider it true both that

  1. my nose is made of subatomic particles, and its behavior is in principle fully determined (at least probabilistically) by the quantum state of those particles together with the laws governing them, and
  2. my nose itched.

At least if we leave the hard problem of consciousness out of it—that’s a separate debate—there seems to be no reason to imagine a contradiction between 1 and 2 that needs to be resolved, but “only” a vast network of intervening mechanisms to be elucidated.  So, this is how it is that reductionists can find anti-reductionist claims to be both wrong and vacuously correct at the same time.

(Incidentally, yes, quantum entanglement provides an obvious sense in which “the whole is more than the sum of its parts,” but even in quantum mechanics, the whole isn’t more than the density matrix, which is still a huge array of numbers evolving by an equation, just different numbers than one would’ve thought a priori.  For that reason, it’s not obvious what relevance, if any, QM has to reductionism versus anti-reductionism.  In any case, QM is not what Hoel invokes in his causal emergence theory.)

From reading the philosophical parts of Hoel’s papers, it was clear to me that some remarks like the above might help ward off the forehead-banging confusions that these discussions inevitably provoke.  So standard-issue crustiness is what I offered Natalie Wolchover when she asked me, not having time on short notice to go through the technical arguments.

But of course this still leaves the question: what is in the mathematical part of Hoel’s Entropy paper?  What exactly is it that the advocates of causal emergence claim provides a new argument against reductionism?


To answer that question, yesterday I (finally) read the Entropy paper all the way through.

Much like Tononi’s integrated information theory was built around a numerical measure called Φ, causal emergence is built around a different numerical quantity, this one supposed to measure the amount of “causal information” at a particular scale.  The measure is called effective information or EI, and it’s basically the mutual information between a system’s initial state sI and its final state sF, assuming a uniform distribution over sI.  Much like with Φ in IIT, computations of this EI are then used as the basis for wide-ranging philosophical claims—even though EI, like Φ, has aspects that could be criticized as arbitrary, and as not obviously connected with what we’re trying to understand.

Once again like with Φ, one of those assumptions is that of a uniform distribution over one of the variables, sI, whose relatedness we’re trying to measure.  In my IIT post, I remarked on that assumption, but I didn’t harp on it, since I didn’t see that it did serious harm, and in any case my central objection to Φ would hold regardless of which distribution we chose.  With causal emergence, by contrast, this uniformity assumption turns out to be the key to everything.

For here is the argument from the Entropy paper, for the existence of macroscopic causality that’s not reducible to causality in the underlying components.  Suppose I have a system with 8 possible states (called “microstates”), which I label 1 through 8.  And suppose the system evolves as follows: if it starts out in states 1 through 7, then it goes to state 1.  If, on the other hand, it starts in state 8, then it stays in state 8.  In such a case, it seems reasonable to “coarse-grain” the system, by lumping together initial states 1 through 7 into a single “macrostate,” call it A, and letting the initial state 8 comprise a second macrostate, call it B.

We now ask: how much information does knowing the system’s initial state tell you about its final state?  If we’re talking about microstates, and we let the system start out in a uniform distribution over microstates 1 through 8, then 7/8 of the time the system goes to state 1.  So there’s just not much information about the final state to be predicted—specifically, only 7/8×log2(8/7) + 1/8×log2(8) ≈ 0.54 bits of entropy—which, in this case, is also the mutual information between the initial and final microstates.  If, on the other hand, we’re talking about macrostates, and we let the system start in a uniform distribution over macrostates A and B, then A goes to A and B goes to B.  So knowing the initial macrostate gives us 1 full bit of information about the final state, which is more than the ~0.54 bits that looking at the microstate gave us!  Ergo reductionism is false.

Once the argument is spelled out, it’s clear that the entire thing boils down to, how shall I put this, a normalization issue.  That is: we insist on the uniform distribution over microstates when calculating microscopic EI, and we also insist on the uniform distribution over macrostates when calculating macroscopic EI, and we ignore the fact that the uniform distribution over microstates gives rise to a non-uniform distribution over macrostates, because some macrostates can be formed in more ways than others.  If we fixed this, demanding that the two distributions be compatible with each other, we’d immediately find that, surprise, knowing the complete initial microstate of a system always gives you at least as much power to predict the system’s future as knowing a macroscopic approximation to that state.  (How could it not?  For given the microstate, we could in principle compute the macroscopic approximation for ourselves, but not vice versa.)

The closest the paper comes to acknowledging the problem—i.e., that it’s all just a normalization trick—seems to be the following paragraph in the discussion section:

Another possible objection to causal emergence is that it is not natural but rather enforced upon a system via an experimenter’s application of an intervention distribution, that is, from using macro-interventions.  For formalization purposes, it is the experimenter who is the source of the intervention distribution, which reveals a causal structure that already exists.  Additionally, nature itself may intervene upon a system with statistical regularities, just like an intervention distribution.  Some of these naturally occurring input distributions may have a viable interpretation as a macroscale causal model (such as being equal to Hmax [the maximum entropy] at some particular macroscale).  In this sense, some systems may function over their inputs and outputs at a microscale or macroscale, depending on their own causal capacity and the probability distribution of some natural source of driving input.

As far as I understand it, this paragraph is saying that, for all we know, something could give rise to a uniform distribution over macrostates, so therefore that’s a valid thing to look at, even if it’s not what we get by taking a uniform distribution over microstates and then coarse-graining it.  Well, OK, but unknown interventions could give rise to many other distributions over macrostates as well.  In any case, if we’re directly comparing causal information at the microscale against causal information at the macroscale, it still seems reasonable to me to demand that in the comparison, the macro-distribution arise by coarse-graining the micro one.  But in that case, the entire argument collapses.


Despite everything I said above, the real purpose of this post is to announce that I’ve changed my mind.  I now believe that, while Hoel’s argument might be unsatisfactory, the conclusion is fundamentally correct: scientific reductionism is false.  There is higher-level causation in our universe, and it’s 100% genuine, not just a verbal sleight-of-hand.  In particular, there are causal forces that can only be understood in terms of human desires and goals, and not in terms of subatomic particles blindly bouncing around.

So what caused such a dramatic conversion?

By 2015, after decades of research and diplomacy and activism and struggle, 196 nations had finally agreed to limit their carbon dioxide emissions—every nation on earth besides Syria and Nicaragua, and Nicaragua only because it thought the agreement didn’t go far enough.  The human race had thereby started to carve out some sort of future for itself, one in which the oceans might rise slowly enough that we could adapt, and maybe buy enough time until new technologies were invented that changed the outlook.  Of course the Paris agreement fell far short of what was needed, but it was a start, something to build on in the coming decades.  Even in the US, long the hotbed of intransigence and denial on this issue, 69% of the public supported joining the Paris agreement, compared to a mere 13% who opposed.  Clean energy was getting cheaper by the year.  Most of the US’s largest corporations, including Google, Microsoft, Apple, Intel, Mars, PG&E, and ExxonMobil—ExxonMobil, for godsakes—vocally supported staying in the agreement and working to cut their own carbon footprints.  All in all, there was reason to be cautiously optimistic that children born today wouldn’t live to curse their parents for having brought them into a world so close to collapse.

In order to unravel all this, in order to steer the heavy ship of destiny off the path toward averting the crisis and toward the path of existential despair, a huge number of unlikely events would need to happen in succession, as if propelled by some evil supernatural force.

Like what?  I dunno, maybe a fascist demagogue would take over the United States on a campaign based on willful cruelty, on digging up and burning dirty fuels just because and even if it made zero economic sense, just for the fun of sticking it to liberals, or because of the urgent need to save the US coal industry, which employs fewer people than Arby’s.  Such a demagogue would have no chance of getting elected, you say?

So let’s suppose he’s up against a historically unpopular opponent.  Let’s suppose that even then, he still loses the popular vote, but somehow ekes out an Electoral College win.  Maybe he gets crucial help in winning the election from a hostile foreign power—and for some reason, pro-American nationalists are totally OK with that, even cheer it.  Even then, we’d still probably need a string of additional absurd coincidences.  Like, I dunno, maybe the fascist’s opponent has an aide who used to be married to a guy who likes sending lewd photos to minors, and investigating that guy leads the FBI to some emails that ultimately turn out to mean nothing whatsoever, but that the media hyperventilate about precisely in time to cause just enough people to vote to bring the fascist to power, thereby bringing about the end of the world.  Something like that.

It’s kind of like, you know that thing where the small population in Europe that produced Einstein and von Neumann and Erdös and Ulam and Tarski and von Karman and Polya was systematically exterminated (along with millions of other innocents) soon after it started producing such people, and the world still hasn’t fully recovered?  How many things needed to go wrong for that to happen?  Obviously you needed Hitler to be born, and to survive the trenches and assassination plots; and Hindenburg to make the fateful decision to give Hitler power.  But beyond that, the world had to sleep as Germany rebuilt its military; every last country had to turn away refugees; the UK had to shut down Jewish immigration to Palestine at exactly the right time; newspapers had to bury the story; government record-keeping had to have advanced just to the point that rounding up millions for mass murder was (barely) logistically possible; and finally, the war had to continue long enough for nearly every European country to have just enough time to ship its Jews to their deaths, before the Allies showed up to liberate mostly the ashes.

In my view, these simply aren’t the sort of outcomes that you expect from atoms blindly interacting according to the laws of physics.  These are, instead, the signatures of higher-level causation—and specifically, of a teleological force that operates in our universe to make it distinctively cruel and horrible.

Admittedly, I don’t claim to know the exact mechanism of the higher-level causation.  Maybe, as the physicist Yakir Aharonov has advocated, our universe has not only a special, low-entropy initial state at the Big Bang, but also a “postselected final state,” toward which the outcomes of quantum measurements get mysteriously “pulled”—an effect that might show up in experiments as ever-so-slight deviations from the Born rule.  And because of the postselected final state, even if the human race naïvely had only (say) a one-in-thousand chance of killing itself off, even if the paths to its destruction all involved some improbable absurdity, like an orange clown showing up from nowhere—nevertheless, the orange clown would show up.  Alternatively, maybe the higher-level causation unfolds through subtle correlations in the universe’s initial state, along the lines I sketched in my 2013 essay The Ghost in the Quantum Turing Machine.  Or maybe Erik Hoel is right after all, and it all comes down to normalization: if we looked at the uniform distribution over macrostates rather than over microstates, we’d discover that orange clowns destroying the world predominated.  Whatever the details, though, I think it can no longer be doubted that we live, not in the coldly impersonal universe that physics posited for centuries, but instead in a tragicomically evil one.

I call my theory reverse Hollywoodism, because it holds that the real world has the inverse of the typical Hollywood movie’s narrative arc.  Again and again, what we observe is that the forces of good have every possible advantage, from money to knowledge to overwhelming numerical superiority.  Yet somehow good still fumbles.  Somehow a string of improbable coincidences, or a black swan or an orange Hitler, show up at the last moment to let horribleness eke out a last-minute victory, as if the world itself had been rooting for horribleness all along.  That’s our universe.

I’m fine if you don’t believe this theory: maybe you’re congenitally more optimistic than I am (in which case, more power to you); maybe the full weight of our universe’s freakish awfulness doesn’t bear down on you as it does on me.  But I hope you’ll concede that, if nothing else, this theory is a genuinely non-reductionist one.

The Social Justice Warriors are right

May 29th, 2017

As you might know, I haven’t been exactly the world’s most consistent fan of the Social Justice movement, nor has it been the most consistent fan of me.

I cringe when I read about yet another conservative college lecture shut down by mob violence; or student protesters demanding the firing of a professor for trying gently to argue and reason with them; or an editor forced from his position for writing a (progressive) defense of “cultural appropriation”—a practice that I take to have been ubiquitous for all of recorded history, and without which there wouldn’t be any culture at all.  I cringe not only because I know that I was in the crosshairs once before and could easily be again, but also because, it seems to me, the Social Justice scalp-hunters are so astoundingly oblivious to the misdirection of their energies, to the power of their message for losing elections and neutering the progressive cause, to the massive gift their every absurdity provides to the world’s Fox Newses and Breitbarts and Trumps.

Yet there’s at least one issue where it seems to me that the Social Justice Warriors are 100% right, and their opponents 100% wrong. This is the moral imperative to take down every monument to Confederate “war heroes,” and to rename every street and school and college named after individuals whose primary contribution to the world was to defend chattel slavery.  As a now-Southerner, I have a greater personal stake here than I did before: UT Austin just recently removed its statue of Jefferson Davis, while keeping up its statue of Robert E. Lee.  My kids will likely attend what until very recently was called Robert E. Lee Elementary—this summer renamed Russell Lee Elementary.  (My suggestion, that the school be called T. D. Lee Parity Violation Elementary, was sadly never considered.)

So I was gratified that last week, New Orleans finally took down its monuments to slavers.  Mayor Mitch Landrieu’s speech, setting out the reasons for the removal, is worth reading.

I used to have little patience for “merely symbolic” issues: would that offensive statues and flags were the worst problems!  But it now seems to me that the fight over Confederate symbols is just a thinly-veiled proxy for the biggest moral question that’s faced the United States through its history, and also the most urgent question facing it in 2017.  Namely: Did the Union actually win the Civil War? Were the anti-Enlightenment forces—the slavers, the worshippers of blood and land and race and hierarchy—truly defeated? Do those forces acknowledge the finality and the rightness of their defeat?

For those who say that, sure, slavery was bad and all, but we need to keep statues to slavers up so as not to “erase history,” we need only change the example. Would we similarly defend statues of Hitler, Himmler, and Goebbels, looming over Berlin in heroic poses?  Yes, let Germans reflect somberly and often on this aspect of their heritage—but not by hoisting a swastika over City Hall.

For those who say the Civil War wasn’t “really” about slavery, I reply: this is the canonical example of a “Mount Stupid” belief, the sort of thing you can say only if you’ve learned enough to be wrong but not enough to be unwrong.  In 1861, the Confederate ringleaders themselves loudly proclaimed to future generations that, indeed, their desire to preserve slavery was their overriding reason to secede. Here’s CSA Vice-President Alexander Stephens, in his famous Cornerstone Speech:

Our new government is founded upon exactly the opposite ideas; its foundations are laid, its cornerstone rests, upon the great truth that the negro is not equal to the white man; that slavery, subordination to the superior race, is his natural and normal condition. This, our new government, is the first, in the history of the world, based upon this great physical, philosophical, and moral truth.

Here’s Texas’ Declaration of Secession:

We hold as undeniable truths that the governments of the various States, and of the confederacy itself, were established exclusively by the white race, for themselves and their posterity; that the African race had no agency in their establishment; that they were rightfully held and regarded as an inferior and dependent race, and in that condition only could their existence in this country be rendered beneficial or tolerable. That in this free government all white men are and of right ought to be entitled to equal civil and political rights; that the servitude of the African race, as existing in these States, is mutually beneficial to both bond and free, and is abundantly authorized and justified by the experience of mankind, and the revealed will of the Almighty Creator, as recognized by all Christian nations; while the destruction of the existing relations between the two races, as advocated by our sectional enemies, would bring inevitable calamities upon both and desolation upon the fifteen slave-holding states.

It was only when defeat looked inevitable that the slavers started changing their story, claiming that their real grievance was never about slavery per se, but only “states’ rights” (states’ right to do what, exactly?). So again, why should we take the slavers’ rationalizations any more seriously than we take the postwar epiphanies of jailed Nazis that actually, they’d never felt any personal animus toward Jews, that the Final Solution was just the world’s biggest bureaucratic mishap?  Of course there’s a difference: when the Allies occupied Germany, they insisted on de-Nazification.  They didn’t suffer streets to be named after Hitler. And today, incredibly, fascism and white nationalism are greater threats here in the US than they are in Germany.  One reads about the historic irony of some American Jews, who are eligible for German citizenship because of grandparents expelled from there, now seeking to move there because they’re terrified about Trump.

By contrast, after a brief Reconstruction, the United States lost its will to continue de-Confederatizing the South.  The leaders were left free to write book after book whitewashing their cause, even to hold political office again.  And probably not by coincidence, we then got nearly a hundred years of Jim Crow—and still today, a half-century after the civil rights movement, southern governors and legislatures that do everything in their power to disenfranchise black voters.

For those who ask: but wasn’t Robert E. Lee a great general who was admired by millions? Didn’t he fight bravely for a cause he believed in?  Maybe it’s just me, but I’m allergic to granting undue respect to history’s villains just because they managed to amass power and get others to go along with them.  I remember reading once in some magazine that, yes, Genghis Khan might have raped thousands and murdered millions, but since DNA tests suggest that ~1% of humanity is now descended from him, we should also celebrate Khan’s positive contribution to “peopling the world.” Likewise, Hegel and Marx and Freud and Heidegger might have been wrong in nearly everything they said, sometimes with horrific consequences, but their ideas still need to be studied reverently, because of the number of other intellectuals who took them seriously.  As I reject those special pleas, so I reject the analogous ones for Jefferson Davis, Alexander Stephens, and Robert E. Lee, who as far as I can tell, should all (along with the rest of the Confederate leadership) have been sentenced for treason.

This has nothing to do with judging the past by standards of the present. By all means, build statues to Washington and Jefferson even though they held slaves, to Lincoln even though he called blacks inferior even while he freed them, to Churchill even though he fought the independence of India.  But don’t look for moral complexity where there isn’t any.  Don’t celebrate people who were terrible even for their own time, whose public life was devoted entirely to what we now know to be evil.

And if, after the last Confederate general comes down, the public spaces are too empty, fill them with monuments to Alan Turing, Marian Rejewski, Bertrand Russell, Hypatia of Alexandria, Emmy Noether, Lise Meitner, Mark Twain, Srinivasa Ramanujan, Frederick Douglass, Vasili Arkhipov, Stanislav Petrov, Raoul Wallenberg, even the inventors of saltwater taffy or Gatorade or the intermittent windshield wiper.  There are, I think, enough people who added value to the world to fill every city square and street sign.

Yet more errors in papers

May 24th, 2017

Following up on my posts PostBQP Postscripts and More Wrong Things I Said In Papers, it felt like time for another post in which I publicly flog myself for mistakes in my research papers.  [Warning: The rest of this post is kinda, sorta technical.  Read at your own risk.]


(1) In my 2006 paper “Oracles are subtle but not malicious,” I claimed to show that if PP is contained in BQP/qpoly, then the counting hierarchy collapses to QMA (Theorem 5).  But on further reflection, I only know how to show a collapse of the counting hierarchy under the stronger assumption that PP is in BQP/poly.  If PP is in BQP/qpoly, then certainly P#P=PP=QMA, but I don’t know how to collapse any further levels of the counting hierarchy.  The issue is this: in QMA, we can indeed nondeterministically guess an (amplified) quantum advice state for a BQP/qpoly algorithm.  We can then verify that the advice state works to solve PP problems, by using (for example) the interactive protocol for the permanent, or some other #P-complete problem.  But having done that, how do we then unravel the higher levels of the counting hierarchy?  For example, how do we simulate PPPP in PPBQP=PP?  We don’t have any mechanism to pass the quantum advice up to the oracle PP machine, since queries to a PP oracle are by definition classical strings.  We could try to use tools from my later paper with Andy Drucker, passing a classical description of the quantum advice up to the oracle and then using the description to reconstruct the advice for ourselves.  But doing so just doesn’t seem to give us a complexity class that’s low for PP, which is what would be needed to unravel the counting hierarchy.  I still think this result might be recoverable, but a new idea is needed.


(2) In my 2008 algebrization paper with Avi Wigderson, one of the most surprising things we showed was a general connection between communication complexity lower bounds and algebraic query complexity lower bounds.  Specifically, given a Boolean oracle A:{0,1}n→{0,1}, let ~A be a low-degree extension of A over a finite field F (that is, ~A(x)=A(x) whenever x∈{0,1}n).  Then suppose we have an algorithm that’s able to learn some property of A, by making k black-box queries to ~A.  We observed that, in such a case, if Alice is given the top half of the truth table of A, and Bob is given the bottom half of the truth table, then there’s necessarily a communication protocol by which Alice and Bob can learn the same property of A, by exchanging at most O(kn log|F|) bits.  This connection is extremely model-independent: a randomized query algorithm gives rise to a randomized communication protocol, a quantum query algorithm gives rise to a quantum communication protocol, etc. etc.  The upshot is that, if you want to lower-bound the number of queries that an algorithm needs to make to the algebraic extension oracle ~A, in order to learn something about A, then it suffices to prove a suitable communication complexity lower bound.  And the latter, unlike algebraic query complexity, is a well-studied subject with countless results that one can take off the shelf.  We illustrated how one could use this connection to prove, for example, that there exists an oracle A such that NPA ⊄ BQP~A, for any low-degree extension ~A of A—a separation that we didn’t and don’t know how to prove any other way. Likewise, there exists an oracle B such that BQPB ⊄ BPP~B for any low-degree extension ~B of B.

The trouble is, our “proof sketches” for these separations (in Theorem 5.11) are inadequate, even for “sketches.”  They can often be fixed, but only by appealing to special properties of the communication complexity separations in question, properties that don’t necessarily hold for an arbitrary communication separation between two arbitrary models.

The issue is this: while it’s true, as we claimed, that a communication complexity lower bound implies an algebraic query complexity lower bound, it’s not true in general that a communication complexity upper bound implies an algebraic query complexity upper bound!  So, from a communication separation between models C and D, we certainly obtain a query complexity problem that’s not in D~A, but then the problem might not be in CA.  What tripped us up was that, in the cases we had in mind (e.g. Disjointness), it’s obvious that the query problem is in CA.  In other cases, however, such as Raz’s separation between quantum and randomized communication complexity, it probably isn’t even true.  In the latter case, to recover the desired conclusion about algebraic query complexity (namely, the existence of an oracle B such that BQPB ⊄ BPP~B), what seems to be needed is to start from a later quantum vs. classical communication complexity separation due to Klartag and Regev, and then convert their communication problem into a query problem using a recent approach by myself and Shalev Ben-David (see Section 4).  Unfortunately, my and Shalev’s approach only tells us nonconstructively that there exists a query problem with the desired separation, with no upper bound on the gate complexity of the quantum algorithm.  So strictly speaking, I still don’t know how to get a separation between the relativized complexity classes BQPB and BPP~B defined in terms of Turing machines.

In any case, I of course should have realized this issue with the algebrization paper the moment Shalev and I encountered the same issue when writing our later paper.  Let me acknowledge Shalev, as well as Robin Kothari, for helping to spur my realization of the issue.


In case it wasn’t clear, the mistakes I’ve detailed here have no effect on the main results of the papers in question (e.g., the existence of an oracle relative to which PP has linear-sized circuits; the existence and pervasiveness of the algebrization barrier).  The effect is entirely on various “bonus” results—results that, because they’re bonus, were gone over much less carefully by authors and reviewers alike.

Nevertheless, I’ve always felt like in science, the louder you are about your own mistakes, the better.  Hence this post.