Archive for the ‘Nerd Interest’ Category

Common Knowledge and Aumann’s Agreement Theorem

Sunday, August 16th, 2015

The following is the prepared version of a talk that I gave at SPARC: a high-school summer program about applied rationality held in Berkeley, CA for the past two weeks.  I had a wonderful time in Berkeley, meeting new friends and old, but I’m now leaving to visit the CQT in Singapore, and then to attend the AQIS conference in Seoul.


Common Knowledge and Aumann’s Agreement Theorem

August 14, 2015

Thank you so much for inviting me here!  I honestly don’t know whether it’s possible to teach applied rationality, the way this camp is trying to do.  What I know is that, if it is possible, then the people running SPARC are some of the awesomest people on earth to figure out how.  I’m incredibly proud that Chelsea Voss and Paul Christiano are both former students of mine, and I’m amazed by the program they and the others have put together here.  I hope you’re all having fun—or maximizing your utility functions, or whatever.

My research is mostly about quantum computing, and more broadly, computation and physics.  But I was asked to talk about something you can actually use in your lives, so I want to tell a different story, involving common knowledge.

I’ll start with the “Muddy Children Puzzle,” which is one of the greatest logic puzzles ever invented.  How many of you have seen this one?

OK, so the way it goes is, there are a hundred children playing in the mud.  Naturally, they all have muddy foreheads.  At some point their teacher comes along and says to them, as they all sit around in a circle: “stand up if you know your forehead is muddy.”  No one stands up.  For how could they know?  Each kid can see all the other 99 kids’ foreheads, so knows that they’re muddy, but can’t see his or her own forehead.  (We’ll assume that there are no mirrors or camera phones nearby, and also that this is mud that you don’t feel when it’s on your forehead.)

So the teacher tries again.  “Knowing that no one stood up the last time, now stand up if you know your forehead is muddy.”  Still no one stands up.  Why would they?  No matter how many times the teacher repeats the request, still no one stands up.

Then the teacher tries something new.  “Look, I hereby announce that at least one of you has a muddy forehead.”  After that announcement, the teacher again says, “stand up if you know your forehead is muddy”—and again no one stands up.  And again and again; it continues 99 times.  But then the hundredth time, all the children suddenly stand up.

(There’s a variant of the puzzle involving blue-eyed islanders who all suddenly commit suicide on the hundredth day, when they all learn that their eyes are blue—but as a blue-eyed person myself, that’s always struck me as needlessly macabre.)

What’s going on here?  Somehow, the teacher’s announcing to the children that at least one of them had a muddy forehead set something dramatic in motion, which would eventually make them all stand up—but how could that announcement possibly have made any difference?  After all, each child already knew that at least 99 children had muddy foreheads!

Like with many puzzles, the way to get intuition is to change the numbers.  So suppose there were two children with muddy foreheads, and the teacher announced to them that at least one had a muddy forehead, and then asked both of them whether their own forehead was muddy.  Neither would know.  But each child could reason as follows: “if my forehead weren’t muddy, then the other child would’ve seen that, and would also have known that at least one of us has a muddy forehead.  Therefore she would’ve known, when asked, that her own forehead was muddy.  Since she didn’t know, that means my forehead is muddy.”  So then both children know their foreheads are muddy, when the teacher asks a second time.

Now, this argument can be generalized to any (finite) number of children.  The crucial concept here is common knowledge.  We call a fact “common knowledge” if, not only does everyone know it, but everyone knows everyone knows it, and everyone knows everyone knows everyone knows it, and so on.  It’s true that in the beginning, each child knew that all the other children had muddy foreheads, but it wasn’t common knowledge that even one of them had a muddy forehead.  For example, if your forehead and mine are both muddy, then I know that at least one of us has a muddy forehead, and you know that too, but you don’t know that I know it (for what if your forehead were clean?), and I don’t know that you know it (for what if my forehead were clean?).

What the teacher’s announcement did, was to make it common knowledge that at least one child has a muddy forehead (since not only did everyone hear the announcement, but everyone witnessed everyone else hearing it, etc.).  And once you understand that point, it’s easy to argue by induction: after the teacher asks and no child stands up (and everyone sees that no one stood up), it becomes common knowledge that at least two children have muddy foreheads (since if only one child had had a muddy forehead, that child would’ve known it and stood up).  Next it becomes common knowledge that at least three children have muddy foreheads, and so on, until after a hundred rounds it’s common knowledge that everyone’s forehead is muddy, so everyone stands up.
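If you’d like to see the induction mechanized, here’s a minimal sketch in Python (my own toy illustration, not part of the talk): at the start of round r it’s common knowledge that at least r foreheads are muddy, and a muddy child stands up as soon as the muddy foreheads she can see no longer suffice to account for that bound.

```python
# A toy simulation (mine, not from the talk) of the muddy children protocol.
def muddy_children(num_children, num_muddy):
    # At the start of round r, "at least r foreheads are muddy" is common
    # knowledge: the teacher announced "at least one," and nobody stood up
    # in rounds 1 .. r-1.
    for round_number in range(1, num_children + 1):
        # A muddy child sees num_muddy - 1 muddy foreheads.  Once that count
        # falls short of the common-knowledge bound, her own forehead must be
        # muddy, so she stands up.
        if num_muddy - 1 < round_number:
            return round_number            # all muddy children stand up together
    return None

print(muddy_children(100, 100))   # -> 100: everyone stands up on the hundredth request
print(muddy_children(100, 2))     # -> 2: with two muddy children, both stand in round 2
```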

The moral is that the mere act of saying something publicly can change the world—even if everything you said was already obvious to every last one of your listeners.  For it’s possible that, until your announcement, not everyone knew that everyone knew the thing, or knew everyone knew everyone knew it, etc., and that could have prevented them from acting.

This idea turns out to have huge real-life consequences, to situations way beyond children with muddy foreheads.  I mean, it also applies to children with dots on their foreheads, or “kick me” signs on their backs…

But seriously, let me give you an example I stole from Steven Pinker, from his wonderful book The Stuff of Thought.  Two people of indeterminate gender—let’s not make any assumptions here—go on a date.  Afterward, one of them says to the other: “Would you like to come up to my apartment to see my etchings?”  The other says, “Sure, I’d love to see them.”

This is such a cliché that we might not even notice the deep paradox here.  It’s like with life itself: people knew for thousands of years that every bird has the right kind of beak for its environment, but not until Darwin and Wallace could anyone articulate why (and only a few people before them even recognized there was a question there that called for a non-circular answer).

In our case, the puzzle is this: both people on the date know perfectly well that the reason they’re going up to the apartment has nothing to do with etchings.  They probably even both know the other knows that.  But if that’s the case, then why don’t they just blurt it out: “would you like to come up for some intercourse?”  (Or “fluid transfer,” as the John Nash character put it in the Beautiful Mind movie?)

So here’s Pinker’s answer.  Yes, both people know why they’re going to the apartment, but they also want to avoid their knowledge becoming common knowledge.  They want plausible deniability.  There are several possible reasons: to preserve the romantic fantasy of being “swept off one’s feet.”  To provide a face-saving way to back out later, should one of them change their mind: since nothing was ever openly said, there’s no agreement to abrogate.  In fact, even if only one of the people (say A) might care about such things, if the other person (say B) thinks there’s any chance A cares, B will also have an interest in avoiding common knowledge, for A’s sake.

Put differently, the issue is that, as soon as you say X out loud, the other person doesn’t merely learn X: they learn that you know X, that you know that they know that you know X, that you want them to know you know X, and an infinity of other things that might upset the delicate epistemic balance.  Contrast that with the situation where X is left unstated: yeah, both people are pretty sure that “etchings” are just a pretext, and can even plausibly guess that the other person knows they’re pretty sure about it.  But once you start getting to 3, 4, 5 levels of indirection—who knows?  Maybe you do just want to show me some etchings.

Philosophers like to discuss Sherlock Holmes and Professor Moriarty meeting in a train station, and Moriarty declaring, “I knew you’d be here,” and Holmes replying, “well, I knew that you knew I’d be here,” and Moriarty saying, “I knew you knew I knew I’d be here,” etc.  But real humans tend to be unable to reason reliably past three or four levels in the knowledge hierarchy.  (Related to that, you might have heard of the game where everyone guesses a number between 0 and 100, and the winner is whoever’s number is the closest to 2/3 of the average of all the numbers.  If this game is played by perfectly rational people, who know they’re all perfectly rational, and know they know, etc., then they must all guess 0—exercise for you to see why.  Yet experiments show that, if you actually want to win this game against average people, you should guess about 20.  People seem to start with 50 or so, iterate the operation of multiplying by 2/3 a few times, and then stop.)
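As a toy illustration of that last point (my own numbers, not data from the actual experiments), here’s what a few rounds of “multiply by 2/3” reasoning look like, starting from 50:

```python
# Level-by-level reasoning in the guess-2/3-of-the-average game (toy numbers).
guess = 50.0
for level in range(6):
    print(f"level {level}: guess {guess:.1f}")
    guess *= 2 / 3
# level 0: 50.0, level 1: 33.3, level 2: 22.2, level 3: 14.8, level 4: 9.9, ...
# Iterating forever drives the guess to 0; stopping after two or three rounds
# lands near the ~20 that tends to win against real people.
```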

Incidentally, do you know what I would’ve given for someone to have explained this stuff to me back in high school?  I think that a large fraction of the infamous social difficulties that nerds have, is simply down to nerds spending so much time in domains (like math and science) where the point is to struggle with every last neuron to make everything common knowledge, to make all truths as clear and explicit as possible.  Whereas in social contexts, very often you’re managing a delicate epistemic balance where you need certain things to be known, but not known to be known, and so forth—where you need to prevent common knowledge from arising, at least temporarily.  “Normal” people have an intuitive feel for this; it doesn’t need to be explained to them.  For nerds, by contrast, explaining it—in terms of the muddy children puzzle and so forth—might be exactly what’s needed.  Once they’re told the rules of a game, nerds can try playing it too!  They might even turn out to be good at it.

OK, now for a darker example of common knowledge in action.  If you read accounts of Nazi Germany, or the USSR, or North Korea or other despotic regimes today, you can easily be overwhelmed by this sense of, “so why didn’t all the sane people just rise up and overthrow the totalitarian monsters?  Surely there were more sane people than crazy, evil ones.  And probably the sane people even knew, from experience, that many of their neighbors were sane—so why this cowardice?”  Once again, it could be argued that common knowledge is the key.  Even if everyone knows the emperor is naked; indeed, even if everyone knows everyone knows he’s naked, still, if it’s not common knowledge, then anyone who says the emperor’s naked is knowingly assuming a massive personal risk.  That’s why, in the story, it took a child to shift the equilibrium.  Likewise, even if you know that 90% of the populace will join your democratic revolt provided they themselves know 90% will join it, if you can’t make your revolt’s popularity common knowledge, everyone will be stuck second-guessing each other, worried that if they revolt they’ll be an easily-crushed minority.  And because of that very worry, they’ll be correct!

(My favorite Soviet joke involves a man standing in the Moscow train station, handing out leaflets to everyone who passes by.  Eventually, of course, the KGB arrests him—but they discover to their surprise that the leaflets are just blank pieces of paper.  “What’s the meaning of this?” they demand.  “What is there to write?” replies the man.  “It’s so obvious!”  Note that this is precisely a situation where the man is trying to make common knowledge something he assumes his “readers” already know.)

The kicker is that, to prevent something from becoming common knowledge, all you need to do is censor the common-knowledge-producing mechanisms: the press, the Internet, public meetings.  This nicely explains why despots throughout history have been so obsessed with controlling the press, and also explains how it’s possible for 10% of a population to murder and enslave the other 90% (as has happened again and again in our species’ sorry history), even though the 90% could easily overwhelm the 10% by acting in concert.  Finally, it explains why believers in the Enlightenment project tend to be such fanatical absolutists about free speech—why they refuse to “balance” it against cultural sensitivity or social harmony or any other value, as so many well-meaning people urge these days.

OK, but let me try to tell you something surprising about common knowledge.  Here at SPARC, you’ve learned all about Bayes’ rule—how, if you like, you can treat “probabilities” as just made-up numbers in your head, which are required to obey the probability calculus, and then there’s a very definite rule for how to update those numbers when you gain new information.  And indeed, how an agent that wanders around constantly updating these numbers in its head, and taking whichever action maximizes its expected utility (as calculated using the numbers), is probably the leading modern conception of what it means to be “rational.”

Now imagine that you’ve got two agents, call them Alice and Bob, with common knowledge of each other’s honesty and rationality, and with the same prior probability distribution over some set of possible states of the world.  But now imagine they go out and live their lives, and have totally different experiences that lead to their learning different things, and having different posterior distributions.  But then they meet again, and they realize that their opinions about some topic (say, Hillary’s chances of winning the election) are common knowledge: they both know each other’s opinion, and they both know that they both know, and so on.  Then a striking 1976 result called Aumann’s Theorem states that their opinions must be equal.  Or, as it’s summarized: “rational agents with common priors can never agree to disagree about anything.”

Actually, before going further, let’s prove Aumann’s Theorem—since it’s one of those things that sounds like a mistake when you first hear it, and then becomes a triviality once you see the 3-line proof.  (Albeit, a “triviality” that won Aumann a Nobel in economics.)  The key idea is that knowledge induces a partition on the set of possible states of the world.  Huh?  OK, imagine someone is either an old man, an old woman, a young man, or a young woman.  You and I agree in giving each of these a 25% prior probability.  Now imagine that you find out whether they’re a man or a woman, and I find out whether they’re young or old.  This can be illustrated as follows:

[Diagram: the four possible worlds (old man, old woman, young man, young woman); your knowledge partition separates men from women, while mine separates old from young.]

The diagram tells us, for example, that if the ground truth is “old woman,” then your knowledge is described by the set {old woman, young woman}, while my knowledge is described by the set {old woman, old man}.  And this different information leads us to different beliefs: for example, if someone asks for the probability that the person is a woman, you’ll say 100% but I’ll say 50%.  OK, but what does it mean for information to be common knowledge?  It means that I know that you know that I know that you know, and so on.  Which means that, if you want to find out what’s common knowledge between us, you need to take the least common coarsening of our knowledge partitions.  I.e., if the ground truth is some given world w, then what do I consider it possible that you consider it possible that I consider possible that … etc.?  Iterate this growth process until it stops, by “zigzagging” between our knowledge partitions, and you get the set S of worlds such that, if we’re in world w, then what’s common knowledge between us is that the world belongs to S.  Repeat for all w’s, and you get the least common coarsening of our partitions.  In the above example, the least common coarsening is trivial, with all four worlds ending up in the same set S, but there are nontrivial examples as well:

[Diagram: a nontrivial example, in which the least common coarsening of our two knowledge partitions consists of more than one set S.]

Now, if Alice’s expectation of a random variable X is common knowledge between her and Bob, that means that everywhere in S, her expectation must be constant … and hence must equal whatever the expectation is, over all the worlds in S!  Likewise, if Bob’s expectation is common knowledge with Alice, then everywhere in S, it must equal the expectation of X over S.  But that means that Alice’s and Bob’s expectations are the same.
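For the computationally inclined, here’s a minimal sketch in Python (my own illustration, not part of the talk) of the least common coarsening: the “zigzag” process amounts to merging any two worlds that share a block in either partition, so the sets S are just the connected components of the resulting overlap graph.  Using the old/young, man/woman example from above:

```python
# The least common coarsening of two knowledge partitions (a sketch of my own):
# merge any two worlds that share a block in either partition; each resulting
# connected component is a set S such that, if the true world lies in S, then
# it's common knowledge between us that the world lies in S (and nothing finer).

def least_common_coarsening(partition_a, partition_b, worlds):
    parent = {w: w for w in worlds}        # union-find over worlds
    def find(w):
        while parent[w] != w:
            parent[w] = parent[parent[w]]
            w = parent[w]
        return w
    def union(u, v):
        parent[find(u)] = find(v)
    for block in list(partition_a) + list(partition_b):
        for w in block[1:]:
            union(block[0], w)
    components = {}
    for w in worlds:
        components.setdefault(find(w), set()).add(w)
    return list(components.values())

worlds = ["old man", "old woman", "young man", "young woman"]
you = [["old man", "young man"], ["old woman", "young woman"]]   # you learned the gender
me  = [["old man", "old woman"], ["young man", "young woman"]]   # I learned the age

print(least_common_coarsening(you, me, worlds))
# -> a single set containing all four worlds: the coarsening is trivial here,
#    so the only thing common knowledge between us is that the world is one of
#    these four.
```

On a nontrivial example, the same routine returns several sets S, and the argument above applies within each one: any expectation that’s constant on S, i.e. common knowledge, has to equal the expectation taken over all of S.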

There are lots of related results.  For example, rational agents with common priors, and common knowledge of each other’s rationality, should never engage in speculative trade (e.g., buying and selling stocks, assuming that they don’t need cash, they’re not earning a commission, etc.).  Why?  Basically because, if I try to sell you a stock for (say) $50, then you should reason that the very fact that I’m offering it means I must have information you don’t that it’s worth less than $50, so then you update accordingly and you don’t want it either.

Or here’s another one: suppose again that we’re Bayesians with common priors, and we’re having a conversation, where I tell you my opinion (say, of the probability Hillary will win the election).  Not any of the reasons or evidence on which the opinion is based—just the opinion itself.  Then you, being Bayesian, update your probabilities to account for what my opinion is.  Then you tell me your opinion (which might have changed after learning mine), I update on that, I tell you my new opinion, then you tell me your new opinion, and so on.  You might think this could go on forever!  But, no, Geanakoplos and Polemarchakis observed that, as long as there are only finitely many possible states of the world in our shared prior, this process must converge after finitely many steps with you and me having the same opinion (and moreover, with it being common knowledge that we have that opinion).  Why?  Because as long as our opinions differ, your telling me your opinion or me telling you mine must induce a nontrivial refinement of one of our knowledge partitions, like so:

[Diagram: one of your knowledge sets being split in two by the newly announced opinion.]

I.e., if you learn something new, then at least one of your knowledge sets must get split along the different possible values of the thing you learned.  But since there are only finitely many underlying states, there can only be finitely many such splittings (note that, since Bayesians never forget anything, knowledge sets that are split will never again rejoin).
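Here’s a minimal simulation of such a conversation in Python (a toy example of my own, with a made-up uniform prior over twelve worlds and arbitrary partitions; an illustration of the Geanakoplos-Polemarchakis process, not their actual construction).  Each round, Bob conditions on the value Alice just announced, which splits his knowledge sets exactly as described above, and vice versa; the printed estimates agree after a couple of rounds:

```python
import random

# A toy run of the Geanakoplos-Polemarchakis protocol: Alice and Bob alternately
# announce their conditional expectations of X and condition on what they hear.

random.seed(0)
N = 12
states = list(range(N))
prior = [1.0 / N] * N                      # uniform common prior
X = [random.random() for _ in states]      # the quantity they're arguing about

def cell_of(partition, w):
    """The block of `partition` containing world w."""
    return next(c for c in partition if w in c)

def expectation(cell):
    """E[X | the true world lies in `cell`], under the common prior."""
    total = sum(prior[w] for w in cell)
    return sum(prior[w] * X[w] for w in cell) / total

def refine(partition, announced):
    """Condition on an announcement: split each block along the announced values."""
    new = []
    for c in partition:
        groups = {}
        for w in c:
            groups.setdefault(announced[w], []).append(w)
        new.extend(groups.values())
    return new

alice = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]    # Alice's knowledge partition
bob   = [[0, 4, 8], [1, 5, 9], [2, 6, 10], [3, 7, 11]]  # Bob's knowledge partition
true_world = 5

for round_number in range(1, 21):
    a_opinion = {w: expectation(cell_of(alice, w)) for w in states}
    bob = refine(bob, a_opinion)           # Bob hears Alice's estimate
    b_opinion = {w: expectation(cell_of(bob, w)) for w in states}
    alice = refine(alice, b_opinion)       # Alice hears Bob's estimate
    a, b = a_opinion[true_world], b_opinion[true_world]
    print(f"round {round_number}: Alice says {a:.4f}, Bob says {b:.4f}")
    if abs(a - b) < 1e-12:
        break                              # common knowledge of agreement
```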

And something else: suppose your friend tells you a liberal opinion, then you take it into account, but reply with a more conservative opinion.  The friend takes your opinion into account, and replies with a revised opinion.  Question: is your friend’s new opinion likelier to be more liberal than yours, or more conservative?

Obviously, more liberal!  Yes, maybe your friend now sees some of your points and vice versa, maybe you’ve now drawn a bit closer (ideally!), but you’re not going to suddenly switch sides because of one conversation.

Yet, if you and your friend are Bayesians with common priors, one can prove that that’s not what should happen at all.  Indeed, your expectation of your own future opinion should equal your current opinion, and your expectation of your friend’s next opinion should also equal your current opinion—meaning that you shouldn’t be able to predict in which direction your opinion will change next, nor in which direction your friend will next disagree with you.  Why not?  Formally, because all these expectations are just different ways of calculating an expectation over the same set, namely your current knowledge set (i.e., the set of states of the world that you currently consider possible)!  More intuitively, we could say: if you could predict that, all else equal, the next thing you heard would probably shift your opinion in a liberal direction, then as a Bayesian you should already shift your opinion in a liberal direction right now.  (This is related to what’s called the “martingale property”: sure, a random variable X could evolve in many ways in the future, but the average of all those ways must be its current expectation E[X], by the very definition of E[X]…)
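If you want to see the martingale property numerically, here’s a tiny check with toy numbers of my own: your current opinion is exactly the prior-weighted average of the opinions you might hold after your knowledge set gets split by the next message.

```python
# A tiny numerical check (toy numbers) that your current opinion equals the
# average of your possible future opinions, weighted by how likely each
# refinement is given what you currently know.

prior = {"w1": 0.1, "w2": 0.2, "w3": 0.3, "w4": 0.4}
X     = {"w1": 1.0, "w2": 0.0, "w3": 0.5, "w4": 0.25}

def cond_exp(cell):
    p = sum(prior[w] for w in cell)
    return sum(prior[w] * X[w] for w in cell) / p

current_cell = ["w1", "w2", "w3", "w4"]            # your current knowledge set
refinement   = [["w1", "w3"], ["w2", "w4"]]        # how the next message might split it

current_opinion = cond_exp(current_cell)
p_current = sum(prior[w] for w in current_cell)
expected_future_opinion = sum(
    (sum(prior[w] for w in cell) / p_current) * cond_exp(cell) for cell in refinement
)
print(current_opinion, expected_future_opinion)    # both equal 0.35, up to rounding
```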

So, putting all these results together, we get a clear picture of what rational disagreements should look like: they should follow unbiased random walks, until sooner or later they terminate in common knowledge of complete agreement.  We now face a bit of a puzzle, in that hardly any disagreements in the history of the world have ever looked like that.  So what gives?

There are a few ways out:

(1) You could say that the “failed prediction” of Aumann’s Theorem is no surprise, since virtually all human beings are irrational cretins, or liars (or at least, it’s not common knowledge that they aren’t). Except for you, of course: you’re perfectly rational and honest.  And if you ever met anyone else as rational and honest as you, maybe you and they could have an Aumannian conversation.  But since such a person probably doesn’t exist, you’re totally justified to stand your ground, discount all opinions that differ from yours, etc.  Notice that, even if you genuinely believed that was all there was to it, Aumann’s Theorem would still have an aspirational significance for you: you would still have to say this is the ideal that all rationalists should strive toward when they disagree.  And that would already conflict with a lot of standard rationalist wisdom.  For example, we all know that arguments from authority carry little weight: what should sway you is not the mere fact of some other person stating their opinion, but the actual arguments and evidence that they’re able to bring.  Except that as we’ve seen, for Bayesians with common priors this isn’t true at all!  Instead, merely hearing your friend’s opinion serves as a powerful summary of what your friend knows.  And if you learn that your rational friend disagrees with you, then even without knowing why, you should take that as seriously as if you discovered a contradiction in your own thought processes.  This is related to an even broader point: there’s a normative rule of rationality that you should judge ideas only on their merits—yet if you’re a Bayesian, of course you’re going to take into account where the ideas come from, and how many other people hold them!  Likewise, if you’re a Bayesian police officer or a Bayesian airport screener or a Bayesian job interviewer, of course you’re going to profile people by their superficial characteristics, however unfair that might be to individuals—so all those studies proving that people evaluate the same resume differently if you change the name at the top are no great surprise.  It seems to me that the tension between these two different views of rationality, the normative and the Bayesian, generates a lot of the most intractable debates of the modern world.

(2) Or—and this is an obvious one—you could reject the assumption of common priors. After all, isn’t a major selling point of Bayesianism supposed to be its subjective aspect, the fact that you pick “whichever prior feels right for you,” and are constrained only in how to update that prior?  If Alice’s and Bob’s priors can be different, then all the reasoning I went through earlier collapses.  So rejecting common priors might seem appealing.  But there’s a paper by Tyler Cowen and Robin Hanson called “Are Disagreements Honest?”—one of the most worldview-destabilizing papers I’ve ever read—that calls that strategy into question.  What it says, basically, is this: if you’re really a thoroughgoing Bayesian rationalist, then your prior ought to allow for the possibility that you are the other person.  Or to put it another way: “you being born as you,” rather than as someone else, should be treated as just one more contingent fact that you observe and then conditionalize on!  And likewise, the other person should condition on the observation that they’re them and not you.  In this way, absolutely everything that makes you different from someone else can be understood as “differing information,” so we’re right back to the situation covered by Aumann’s Theorem.  Imagine, if you like, that we all started out behind some Rawlsian veil of ignorance, as pure reasoning minds that had yet to be assigned specific bodies.  In that original state, there was nothing to differentiate any of us from any other—anything that did would just be information to condition on—so we all should’ve had the same prior.  That might sound fanciful, but in some sense all it’s saying is: what licenses you to privilege an observation just because it’s your eyes that made it, or a thought just because it happened to occur in your head?  Like, if you’re objectively smarter or more observant than everyone else around you, fine, but to whatever extent you agree that you aren’t, your opinion gets no special epistemic protection just because it’s yours.

(3) If you’re uncomfortable with this tendency of Bayesian reasoning to refuse to be confined anywhere, to want to expand to cosmic or metaphysical scope (“I need to condition on having been born as me and not someone else”)—well then, you could reject the entire framework of Bayesianism, as your notion of rationality. Lest I be cast out from this camp as a heretic, I hasten to say: I include this option only for the sake of completeness!

(4) When I first learned about this stuff 12 years ago, it seemed obvious to me that a lot of it could be dismissed as irrelevant to the real world for reasons of complexity. I.e., sure, it might apply to ideal reasoners with unlimited time and computational power, but as soon as you impose realistic constraints, this whole Aumannian house of cards should collapse.  As an example, if Alice and Bob have common priors, then sure they’ll agree about everything if they effectively share all their information with each other!  But in practice, we don’t have time to “mind-meld,” swapping our entire life experiences with anyone we meet.  So one could conjecture that agreement, in general, requires a lot of communication.  So then I sat down and tried to prove that as a theorem.  And you know what I found?  That my intuition here wasn’t even close to correct!

In more detail, I proved the following theorem.  Suppose Alice and Bob are Bayesians with shared priors, and suppose they’re arguing about (say) the probability of some future event—or more generally, about any random variable X bounded in [0,1].  So, they have a conversation where Alice first announces her expectation of X, then Bob announces his new expectation, and so on.  The theorem says that Alice’s and Bob’s estimates of X will necessarily agree to within ±ε, with probability at least 1-δ over their shared prior, after they’ve exchanged only O(1/(δε²)) messages.  Note that this bound is completely independent of how much knowledge they have; it depends only on the accuracy with which they want to agree!  Furthermore, the same bound holds even if Alice and Bob only send a few discrete bits about their real-valued expectations with each message, rather than the expectations themselves.

The proof involves the idea that Alice and Bob’s estimates of X, call them X_A and X_B respectively, follow “unbiased random walks” (or more formally, are martingales).  Very roughly, if |X_A-X_B|≥ε with high probability over Alice and Bob’s shared prior, then that fact implies that the next message has a high probability (again, over the shared prior) of causing either X_A or X_B to jump up or down by about ε.  But X_A and X_B, being estimates of X, are bounded between 0 and 1.  So a random walk with a step size of ε can only continue for about 1/ε² steps before it hits one of the “absorbing barriers.”
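To see the 1/ε² intuition in action, here’s a quick Monte Carlo sketch (my own toy illustration, not the actual proof): an unbiased walk with ±ε steps, confined to [0,1], gets absorbed after a number of steps that grows like 1/ε².

```python
import random

# A quick Monte Carlo sketch (toy, not the actual proof) of the absorbing-barrier
# intuition: an unbiased walk with +-eps steps, confined to [0,1], lasts on the
# order of 1/eps^2 steps.

def steps_until_absorbed(eps, start=0.5):
    x, steps = start, 0
    while eps < x < 1 - eps:
        x += random.choice([-eps, +eps])   # unbiased step: the martingale property
        steps += 1
    return steps

eps = 0.05
trials = [steps_until_absorbed(eps) for _ in range(2000)]
print(sum(trials) / len(trials), 1 / eps ** 2)   # both scale like 1/eps^2 as eps shrinks
```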

The way to formalize this is to look at the variances, Var[X_A] and Var[X_B], with respect to the shared prior.  Because Alice and Bob’s partitions keep getting refined, the variances are monotonically non-decreasing.  They start out 0 and can never exceed 1 (in fact they can never exceed 1/4, but let’s not worry about constants).  Now, the key lemma is that, if Pr[|X_A-X_B|≥ε]≥δ, then Var[X_B] must increase by at least δε² if Alice sends X_A to Bob, and Var[X_A] must increase by at least δε² if Bob sends X_B to Alice.  You can see my paper for the proof, or just work it out for yourself.  At any rate, the lemma implies that, after O(1/(δε²)) rounds of communication, there must be at least a temporary break in the disagreement; there must be some round where Alice and Bob approximately agree with high probability.

There are lots of other results in my paper, including an upper bound on the number of calls that Alice and Bob need to make to a “sampling oracle” to carry out this sort of protocol approximately, assuming they’re not perfect Bayesians but agents with bounded computational power.  But let me step back and address the broader question: what should we make of all this?  How should we live with the gargantuan chasm between the prediction of Bayesian rationality for how we should disagree, and the actual facts of how we do disagree?

We could simply declare that human beings are not well-modeled as Bayesians with common priors—that we’ve failed in giving a descriptive account of human behavior—and leave it at that.   OK, but that would still leave the question: does this stuff have normative value?  Should it affect how we behave, if we want to consider ourselves honest and rational?  I would argue, possibly yes.

Yes, you should constantly ask yourself the question: “would I still be defending this opinion, if I had been born as someone else?”  (Though you might say this insight predates Aumann by quite a bit, going back at least to Spinoza.)

Yes, if someone you respect as honest and rational disagrees with you, you should take it as seriously as if the disagreement were between two different aspects of yourself.

Finally, yes, we can try to judge epistemic communities by how closely they approach the Aumannian ideal.  In math and science, in my experience, it’s common to see two people furiously arguing with each other at a blackboard.  Come back five minutes later, and they’re arguing even more furiously, but now their positions have switched.  As we’ve seen, that’s precisely what the math says a rational conversation should look like.  In social and political discussions, though, usually the very best you’ll see is that two people start out diametrically opposed, but eventually one of them says “fine, I’ll grant you this,” and the other says “fine, I’ll grant you that.”  We might say, that’s certainly better than the common alternative, of the two people walking away even more polarized than before!  Yet the math tells us that even the first case—even the two people gradually getting closer in their views—is nothing at all like a rational exchange, which would involve the two participants repeatedly leapfrogging each other, completely changing their opinion about the question under discussion (and then changing back, and back again) every time they learned something new.  The first case, you might say, is more like haggling—more like “I’ll grant you that X is true if you grant me that Y is true”—than like our ideal friendly mathematicians arguing at the blackboard, whose acceptance of new truths is never slow or grudging, never conditional on the other person first agreeing with them about something else.

Armed with this understanding, we could try to rank fields by how hard it is to have an Aumannian conversation in them.  At the bottom—the easiest!—is math (or, let’s say, chess, or debugging a program, or fact-heavy fields like lexicography or geography).  Crucially, here I only mean the parts of these subjects with agreed-on rules and definite answers: once the conversation turns to whose theorems are deeper, or whose fault the bug was, things can get arbitrarily non-Aumannian.  Then there’s the type of science that involves messy correlational studies (I just mean, talking about what’s a risk factor for what, not the political implications).  Then there’s politics and aesthetics, with the most radioactive topics like Israel/Palestine higher up.  And then, at the very peak, there’s gender and social justice debates, where everyone brings their formative experiences along, and absolutely no one is a disinterested truth-seeker, and possibly no Aumannian conversation has ever been had in the history of the world.

I would urge that even at the very top, it’s still incumbent on all of us to try to make the Aumannian move, of “what would I think about this issue if I were someone else and not me?  If I were a man, a woman, black, white, gay, straight, a nerd, a jock?  How much of my thinking about this represents pure Spinozist reason, which could be ported to any rational mind, and how much of it would get lost in translation?”

Anyway, I’m sure some people would argue that, in the end, the whole framework of Bayesian agents, common priors, common knowledge, etc. can be chucked from this discussion like so much scaffolding, and the moral lessons I want to draw boil down to trite advice (“try to see the other person’s point of view”) that you all knew already.  Then again, even if you all knew all this, maybe you didn’t know that you all knew it!  So I hope you gained some new information from this talk in any case.  Thanks.


Update: Coincidentally, there’s a moving NYT piece by Oliver Sacks, which (among other things) recounts his experiences with his cousin, the Aumann of Aumann’s theorem.


Another Update: If I ever did attempt an Aumannian conversation with someone, the other Scott A. would be a candidate! Here he is in 2011 making several of the same points I did above, using the same examples (I thank him for pointing me to his post).

Celebrate gay marriage—and its 2065 equivalent

Saturday, June 27th, 2015

Yesterday was a historic day for the United States, and I was as delighted as everyone else I know.  I’ve supported gay marriage since the mid-1990s, when as a teenager, I read Andrew Hodges’ classic biography of Alan Turing, and burned with white-hot rage at Turing’s treatment.  In the world he was born into—our world, until fairly recently—Turing was “free”: free to prove the unsolvability of the halting problem, free to help save civilization from the Nazis, just not free to pursue the sexual and romantic fulfillment that nearly everyone else took for granted.  I resolved then that, if I was against anything in life, I was against the worldview that had hounded Turing to his death, or anything that even vaguely resembled it.

So I’m proud for my country, and I’m thrilled for my gay friends and colleagues and relatives.  At the same time, seeing my Facebook page light up with an endless sea of rainbow flags and jeers at Antonin Scalia, there’s something that gnaws at me.  To stand up for Alan Turing in 1952 would’ve taken genuine courage.  To support gay rights in the 60s, 70s, 80s, even the 90s, took courage.  But celebrating a social change when you know all your friends will upvote you, more than a decade after the tide of history has made the change unstoppable?  It’s fun, it’s righteous, it’s justified, I’m doing it myself.  But let’s not kid ourselves by calling it courageous.

Do you want to impress me with your moral backbone?  Then go and find a group that almost all of your Facebook friends still consider it okay, even praiseworthy, to despise and mock, for moral failings that either aren’t failings at all or are no worse than the rest of humanity’s.  (I promise: once you start looking, it shouldn’t be hard to find.)  Then take a public stand for that group.

Can blog posts nourish the soul? Scott A. (alas, not me) as existence proof

Wednesday, June 3rd, 2015

Reading the essays and speculative fiction of Scott Alexander, as they’ve grown in awesomeness even just within the past half-year, has for me been like witnessing the birth of a new Asimov.  (For more Alexandery goodness, check out Universal Love, Said the Cactus Person.)  That this nerd-bard, this spinner of stupid Internet memes into reflections on eternity, came to my attention by way of his brilliantly defending me, is almost immaterial at this point; I don’t think it plays any role in my continuing admiration for his work.  Whatever you do, just keep writing, other Scott A.

NSA in P/poly: The Power of Precomputation

Friday, May 22nd, 2015

Even after the Snowden revelations, there remained at least one big mystery about what the NSA was doing and how.  The NSA’s classified 2013 budget request mentioned, as a priority item, “groundbreaking cryptanalytic capabilities to defeat adversarial cryptography and exploit internet traffic.”  There was a requested increase, of several hundred million dollars, for “cryptanalytic IT services” and “cryptanalysis and exploitation services program C” (whatever that was).  And a classified presentation slide showed encrypted data being passed to a high-performance computing system called “TURMOIL,” and decrypts coming out.  But whatever was going on inside TURMOIL seemed to be secret even within NSA; someone at Snowden’s level wouldn’t have had access to the details.

So, what was (or is) inside the NSA’s cryptanalytic black box?  A quantum computer?  Maybe even one that they bought from D-Wave?  (Rimshot.)  A fast classical factoring algorithm?  A proof of P=NP?  Commentators on the Internet rushed to suggest each of these far-reaching possibilities.  Some of us tried to pour cold water on these speculations—pointing out that one could envision many scenarios that were a little more prosaic, a little more tied to the details of how public-key crypto is actually used in the real world.  Were we just naïve?

This week, a new bombshell 14-author paper (see also the website) advances an exceedingly plausible hypothesis about what may have been the NSA’s greatest cryptanalytic secret of recent years.  One of the authors is J. Alex Halderman of the University of Michigan, my best friend since junior high school, who I’ve blogged about before.  Because of that, I had some advance knowledge of this scoop, and found myself having to do what regular Shtetl-Optimized readers will know is the single hardest thing in the world for me: bite my tongue and not say anything.  Until now, that is.

Besides Alex, the other authors are David Adrian, Karthikeyan Bhargavan, Zakir Durumeric, Pierrick Gaudry, Matthew Green, Nadia Heninger, Drew Springall, Emmanuel Thomé, Luke Valenta, Benjamin VanderSloot, Eric Wustrow, Santiago Zanella-Béguelin, and Paul Zimmermann (two of these, Green and Heninger, have previously turned up on Shtetl-Optimized).

These authors study vulnerabilities in Diffie-Hellman key exchange, the “original” (but still widely-used) public-key cryptosystem, the one that predates even RSA.  Diffie-Hellman is the thing where Alice and Bob first agree on a huge prime number p and a number g, then Alice picks a secret a and sends Bob g^a (mod p), and Bob picks a secret b and sends Alice g^b (mod p), and then Alice and Bob can both compute (g^a)^b = (g^b)^a = g^(ab) (mod p), but an eavesdropper who’s listening in only knows p, g, g^a (mod p), and g^b (mod p), and one can plausibly conjecture that it’s hard from those things alone to get g^(ab) (mod p).  So then Alice and Bob share a secret unknown to the eavesdropper, which they didn’t before, and they can use that secret to start doing cryptography.
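Just to make the protocol concrete, here’s a toy version in Python.  It uses the Mersenne prime 2^127 - 1, which is laughably small by modern standards; real deployments use primes of 1024 bits and up, and choose g to generate a large subgroup.

```python
# Toy Diffie-Hellman key exchange (insecure parameters, purely for illustration).
import random

p = 2**127 - 1          # a known Mersenne prime; far too small for real use
g = 5                   # toy base; real deployments pick g generating a large subgroup

a = random.randrange(2, p - 1)   # Alice's secret exponent
b = random.randrange(2, p - 1)   # Bob's secret exponent

A = pow(g, a, p)        # Alice sends g^a (mod p)
B = pow(g, b, p)        # Bob sends g^b (mod p)

alice_key = pow(B, a, p)         # Alice computes (g^b)^a (mod p)
bob_key   = pow(A, b, p)         # Bob computes (g^a)^b (mod p)
assert alice_key == bob_key      # shared secret; the eavesdropper sees only p, g, A, B
```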

As far as anyone knows today, the best way to break Diffie-Hellman is simply by calculating discrete logarithms: that is, solving the problem of recovering a given only g and h = g^a (mod p).  At least on a classical computer, the fastest known algorithm for discrete logarithms (over fields of prime order) is the number field sieve (NFS).  Under plausible conjectures about the distribution of “smooth” numbers, NFS uses time that grows like exp((1.923+o(1)) (log p)^(1/3) (log log p)^(2/3)), where the exp and logs are base e (and yes, even the lower-order stuff like (log log p)^(2/3) makes a big difference in practice).  Of course, once you know the running time of the best-known algorithm, you can then try to choose a key size (that is, a value of log(p)) that’s out of reach for that algorithm on the computing hardware of today.

(Note that the recent breakthrough of Antoine Joux, solving discrete log in quasipolynomial time in fields of small characteristic, also relied heavily on sieving ideas.  But there are no improvements from this yet for the “original” discrete log problem, over prime fields.)

But there’s one crucial further fact, which has been understood for at least a decade by theoretical cryptographers, but somehow was slow to filter out to the people who deploy practical cryptosystems.  The further fact is that in NFS, you can arrange things so that almost all the discrete-logging effort depends only on the prime number p, and not at all on the specific numbers g and h for which you’re trying to take the discrete log.  After this initial “precomputation” step, you then have a massive database that you can use to speed up the “descent” step: the step of solving g^a = h (mod p), for any (g,h) pair that you want.

It’s a little like the complexity class P/poly, where a single, hard-to-compute “advice string” unlocks exponentially many inputs once you have it.  (Or a bit more precisely, one could say that NFS reveals that exponentiation modulo a prime number is sort of a trapdoor one-way function, except that the trapdoor information is subexponential-size, and given the trapdoor, inverting the function is still subexponential-time, but a milder subexponential than before.)

The kicker is that, in practice, a large percentage of all clients and servers that use Diffie-Hellman key exchange use the same few prime numbers p.  This means that, if you wanted to decrypt a large fraction of all the traffic encrypted with Diffie-Hellman, you wouldn’t need to do NFS over and over: you could just do it for a few p’s and cache the results.  This fact can singlehandedly change the outlook for breaking Diffie-Hellman.

The story is different depending on the key size, log(p).  In the 1990s, the US government insisted on “export-grade” cryptography for products sold overseas (what a quaint concept!), which meant that the key size was restricted to 512 bits.  For 512-bit keys, Adrian et al. were able to implement NFS and use it to do the precomputation step in about 7 days on a cluster with a few thousand cores.  After this initial precomputation step (which produced 2.5GB of data), doing the descent, to find the discrete log for a specific (g,h) pair, took only about 90 seconds on a 24-core machine.

OK, but no one still uses 512-bit keys, do they?  The first part of Adrian et al.’s paper demonstrates that, because of implementation issues, even today you can force many servers to “downgrade” to the 512-bit, export-grade keys—and then, having done so, you can stall for time for 90 seconds as you figure out the session key, and then do a man-in-the-middle attack and take over and impersonate the server.  It’s an impressive example of the sort of game computer security researchers have been playing for a long time—but it’s really just a warmup to the main act.

As you’d expect, many servers today are configured more intelligently, and will only agree to 1024-bit keys.  But even there, Adrian et al. found that a large fraction of servers rely on just a single 1024-bit prime (!), and many of the ones that don’t rely on just a few other primes.  Adrian et al. estimate that, for a single 1024-bit prime, doing the NFS precomputation would take about 45 million years using a single core—or to put it more ominously, 1 year using 45 million cores.  If you built special-purpose hardware, that could go down by almost two orders of magnitude, putting the monetary cost at a few hundred million dollars, completely within the reach of a sufficiently determined nation-state.  Once the precomputation was done, and the terabytes of output stored in a data center somewhere, computing a particular discrete log would then take about 30 days using 1 core, or mere minutes using a supercomputer.  Once again, none of this is assuming any algorithmic advances beyond what’s publicly known.  (Of course, it’s possible that the NSA also has some algorithmic advances; even modest ones could obviate the need for special-purpose hardware.)

While writing this post, I did my own back-of-the-envelope, and got that using NFS, calculating a 1024-bit discrete log should be about 7.5 million times harder than calculating a 512-bit discrete log.  So, extrapolating from the 7 days it took Adrian et al. to do it for 512 bits, this suggests that it might’ve taken them about 143,840 years to calculate 1024-bit discrete logs with the few thousand cores they had, or 1 year if they had 143,840 times as many cores (since almost all this stuff is extremely parallelizable).  Adrian et al. mention optimizations that they expect would improve this by a factor of 3, giving us about 100 million core-years, very similar to Adrian et al.’s estimate of 45 million core-years (the lower-order terms in the running time of NFS might account for some of the remaining discrepancy).
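For anyone who wants to redo that back-of-the-envelope themselves, here’s the calculation in a few lines of Python, dropping the o(1) term as I did above:

```python
# Rough NFS cost comparison for 512-bit vs. 1024-bit primes, using the
# heuristic running time exp((1.923+o(1)) (ln p)^(1/3) (ln ln p)^(2/3))
# with the o(1) dropped.
from math import exp, log

def nfs_cost(bits):
    ln_p = bits * log(2)
    return exp(1.923 * ln_p ** (1 / 3) * log(ln_p) ** (2 / 3))

print(nfs_cost(1024) / nfs_cost(512))   # about 7.5 million, the ratio quoted above
```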

Adrian et al. mount a detailed argument in their paper that all of the details about NSA’s “groundbreaking cryptanalytic capabilities” that we learned from the Snowden documents match what would be true if the NSA were doing something like the above.  The way Alex put it to me is that, sure, the NSA might not have been doing this, but if not, then he would like to understand why not—for it would’ve been completely feasible within the cryptanalytic budget they had, and the NSA would’ve known that, and it would’ve been a very good codebreaking value for the money.

Now that we know about this weakness of Diffie-Hellman key exchange, what can be done?

The most obvious solution—but a good one!—is just to use longer keys.  For decades, when applied cryptographers would announce some attack like this, theorists like me would say with exasperation: “dude, why don’t you fix all these problems in one stroke by just, like, increasing the key sizes by a factor of 10?  when it’s an exponential against a polynomial, we all know the exponential will win eventually, so why not just go out to where it does?”  The applied cryptographers explain to us, with equal exasperation in their voices, that there are all sorts of reasons why not, from efficiency to (maybe the biggest thing) backwards-compatibility.  You can’t unilaterally demand 2048-bit keys, if millions of your customers are using browsers that only understand 1024-bit keys.  On the other hand, given the new revelations, it looks like there really will be a big push to migrate to larger key sizes, as the theorists would’ve suggested from their ivory towers.

A second, equally-obvious solution is to stop relying so much on the same few prime numbers in Diffie-Hellman key exchange.  (Note that the reason RSA isn’t vulnerable to this particular attack is that it inherently requires a different composite number N for each public key.)  In practice, generating a new huge random prime number tends to be expensive—taking, say, a few minutes—which is why people so often rely on “standard” primes.  At the least, we could use libraries of millions of “safe” primes, from which a prime for a given session is chosen randomly.
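To get a feel for why minting fresh primes on the fly is considered expensive, here’s a rough sketch in Python (my own, not from the paper) of generating a “safe” prime p = 2q+1 with q also prime, using Miller-Rabin testing.  Even at half the standard key size this takes a number of seconds in pure Python, and the wait grows quickly with the bit length (real implementations typically perform additional checks as well).

```python
# Minting a random "safe" prime p = 2q + 1 (q also prime) by trial and error.
import random

def is_probable_prime(n, rounds=40):
    if n < 2:
        return False
    for small in (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37):
        if n % small == 0:
            return n == small
    d, r = n - 1, 0
    while d % 2 == 0:
        d //= 2
        r += 1
    for _ in range(rounds):                # Miller-Rabin witnesses
        a = random.randrange(2, n - 1)
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(r - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False
    return True

def random_safe_prime(bits):
    while True:
        q = random.getrandbits(bits - 1) | (1 << (bits - 2)) | 1   # odd, full-length candidate
        if is_probable_prime(q) and is_probable_prime(2 * q + 1):
            return 2 * q + 1

print(random_safe_prime(512))   # expect a wait of seconds; 1024+ bits takes far longer
```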

A third solution is to migrate to elliptic-curve cryptography (ECC), which as far as anyone knows today, is much less vulnerable to descent attacks than the original Diffie-Hellman scheme.  Alas, there’s been a lot of understandable distrust of ECC after the Dual_EC_DRBG scandal, in which it came out that the NSA backdoored some of NIST’s elliptic-curve-based pseudorandom generators by choosing particular parameters that it knew how to handle.  But maybe the right lesson to draw is that mod-p groups and elliptic-curve groups both seem to be pretty good for cryptography, but the mod-p groups are less good if everyone is using the same few prime numbers p (and those primes are “within nation-state range”), and the elliptic-curve groups are less good if everyone is using the same few parameters.  (A lot of these things do seem pretty predictable with hindsight, but how many did you predict?)

Many people will use this paper to ask political questions, like: hasn’t the NSA’s codebreaking mission once again usurped its mission to ensure the nation’s information security?  Doesn’t the 512-bit vulnerability that many Diffie-Hellman implementations still face, as a holdover from the 1990s export rules, illustrate why encryption should never be deliberately weakened for purposes of “national security”?  How can we get over the issue of backwards-compatibility, and get everyone using strong crypto?  People absolutely should be asking such questions.

But for readers of this blog, there’s one question that probably looms even larger than those of freedom versus security, openness versus secrecy, etc.: namely, the question of theory versus practice.  Which “side” should be said to have “won” this round?  Some will say: those useless theoretical cryptographers, they didn’t even know how their coveted Diffie-Hellman system could be broken in the real world!  The theoretical cryptographers might reply: of course we knew about the ability to do precomputation with NFS!  This wasn’t some NSA secret; it’s something we discussed openly for years.  And if someone told us how Diffie-Hellman was actually being used (with much of the world relying on the same few primes), we could’ve immediately spotted the potential for such an attack.  To which others might reply: then why didn’t you spot it?

Perhaps the right lesson to draw is how silly such debates really are.  In the end, piecing this story together took a team that was willing to do everything from learning some fairly difficult number theory to coding up simulations to poring over the Snowden documents for clues about the NSA’s budget.  Clear thought doesn’t respect the boundaries between disciplines, or between theory and practice.

(Thanks very much to Nadia Heninger and Neal Koblitz for reading this post and correcting a few errors in it.  For more about this, see Bruce Schneier’s post or Matt Green’s post.)

Five announcements

Saturday, May 16th, 2015

1. Sanjeev Arora sent me a heads-up that there’s a discussion about the future of the STOC conference  at the Windows on Theory blog—in particular, about the idea of turning STOC into a longer “CS theory festival.”  If you have opinions about this, don’t miss the chance to make your voice heard.

2. Back in January, I blogged about a new quantum optimization algorithm by Farhi, Goldstone, and Gutmann, which was notable for being, as far as anyone could tell, the first quantum algorithm to achieve a provably better approximation ratio than the best-known classical algorithm for an NP-hard optimization problem.  Today, I report that a fearsome list of authors—Boaz Barak, Ankur Moitra, Ryan O’Donnell, Prasad Raghavendra, Oded Regev, David Steurer, Luca Trevisan, Aravindan Vijayaraghavan, David Witmer, and John Wright—has put out an eagerly-awaited paper that gives a classical algorithm for the same problem, with better performance than the quantum algorithm’s.  (They write that this “improves both qualitatively and quantitatively” on Farhi et al.’s work; I assume “qualitatively” refers to the fact that the new algorithm is classical.)  What happened, apparently, is that after I blogged (with enthusiasm) about the Farhi et al. result, a bunch of classical complexity theorists read my post and decided independently that they could match or beat the quantum algorithm’s performance classically; then they found out about each other and decided to merge their efforts.  I’m proud to say that this isn’t the first example of this blog catalyzing actual research progress, though it’s probably the best example so far.  [Update: Luca Trevisan now has a great post explaining what happened in much more detail, entitled “How Many Theoreticians Does It Take to Approximate Max 3Lin?”]

Another update: Farhi et al. have posted a new version of their paper, in which they can almost match the performance of the classical algorithm using their quantum algorithm.

3. Jennifer Ouellette has a wonderful article in Quanta magazine about recent progress in AdS/MERA (i.e., “the emergence of spacetime from entanglement”), centered around the ideas of Brian Swingle.  This is one of the main things that I’d love to understand better right now—if I succeed even partially, you’ll know because I’ll write a blog post trying to explain it to others.  See also this blog post by Sean Carroll (about this paper by Ning Bao et al.), and this paper by Pastawski, Yoshida, Harlow, and Preskill, which explicitly mines the AdS/CFT correspondence for examples of quantum error-correcting codes.

4. Celebrity rationalist Julia Galef, who I had the great honor of meeting recently, has a podcast interview with Sean Carroll about why Carroll accepts the many-worlds interpretation.  (Or if, like me, you prefer the written word to the spoken one, click here for a full transcript.)  Unfortunately, Sean is given the opportunity at the end of the interview to recommend one science book to his listeners—just one!—but he squanders it by plugging some weird, self-indulgent thing called Quantum Computing Since Democritus.  Julia also has a YouTube video about what she learned from the interview, but I haven’t yet watched it (is there a transcript?).

5. I came across an insightful if meandering essay about nerd culture by Meredith L. Patterson.  In particular, noticing how the term “nerd” has been co-opted by normal, socially-skilled people, who’ve quickly set about remaking nerd social norms to make them identical to the rest of the world’s norms, Patterson coins the term “weird-nerd” to describe people like herself, who are still nerds in the original sense and who don’t see nerd culture as something horribly, irreparably broken.  As she writes: “We’ll start to feel less defensive when we get some indication — any indication — that our critics understand what parts of our culture we don’t want to lose and why we don’t want to lose them.”  (But is this the start of a linguistic treadmill?  Will we eventually need to talk about weird-weird-nerds, etc.?)

How can we fight online shaming campaigns?

Wednesday, February 25th, 2015

Longtime friend and colleague Boaz Barak sent me a fascinating New York Times Magazine article that profiles people who lost their jobs or otherwise had their lives ruined, because of a single remark that then got amplified a trillionfold in importance by social media.  (The author, Jon Ronson, also has a forthcoming book on the topic.)  The article opens with Justine Sacco: a woman who, about to board a flight to Cape Town, tweeted “Going to Africa.  Hope I don’t get AIDS.  Just kidding.  I’m white!”

To the few friends who read Sacco’s Twitter feed, it would’ve been obvious that she was trying to mock the belief of many well-off white people that they live in a bubble, insulated from the problems of the Third World; she wasn’t actually mocking black Africans who suffer from AIDS.  In a just world, maybe Sacco deserved someone to take her aside and quietly explain that her tweet might be read the wrong way, that she should be more careful next time.  Instead, by the time she landed in Cape Town, she learned that she’d become the #1 worldwide Twitter trend and a global symbol of racism.  She lost her career, she lost her entire previous life, and tens of thousands of people expressed glee about it.  The article rather heartbreakingly describes Sacco’s attempts to start over.

There are many more stories like the above.  Some I’d already heard about: the father of three who lost his job after he whispered a silly joke involving “dongles” to the person next to him at a conference, whereupon Adria Richards, a woman in front of him, snapped his photo and posted it to social media, to make an example of him as a sexist pig.  (Afterwards, a counter-reaction formed, which successfully got Richards fired from her job: justice??)  Other stories I hadn’t heard.

Reading this article made it clear to me just how easily I got off, in my own recent brush with the online shaming-mobs.  Yes, I made the ‘mistake’ of writing too openly about my experiences as a nerdy male teenager, and the impact that one specific aspect of feminist thought (not all of feminism!) had had on me.  Within the context of the conversation that a few nerdy men and women were having on this blog, my opening up led to exactly the results I was hoping for: readers thoughtfully sharing their own experiences, a meaningful exchange of ideas, even (dare I say it?) glimmers of understanding and empathy.

Alas, once the comment was wrested from its original setting into the clickbait bazaar, the story became “MIT professor explains: the real oppression is having to learn to talk to women” (the title of Amanda Marcotte’s hit-piece, something even some in Marcotte’s ideological camp called sickeningly cruel).  My photo was on the front page of Salon, next to the headline “The plight of the bitter nerd.”  I was subjected to hostile psychoanalysis not once but twice on ‘Dr. Nerdlove,’ a nerd-bashing site whose very name drips with irony, rather like the ‘Democratic People’s Republic of Korea.’  There were tweets and blog comments that urged MIT to fire me, that compared me to a mass-murderer, and that “deduced” (from first principles!) all the ways my parents must have screwed up in raising me, and how my female students must cower in fear of me.  And yes, when you Google me, this affair now more-or-less overshadows everything else I’ve done in my life.

But then … there were also hundreds of men and women who rose to my defense, and they were heavily concentrated among the people I most admire and respect.  My supporters ranged from the actual female students who took my classes or worked with me or who I encouraged in their careers, from whom there was only kindness, not a single negative word; to the shy nerds who thanked me for being one of the only people to acknowledge their reality; to the lesbians and bisexual women who told me my experience also resonated with them; to the female friends and colleagues who sent me notes urging me to ignore the nonsense.  In the end, not only have I not lost any friends over this, I’ve gained new ones, and I’ve learned new sides of the friends I had.

Oh, and I didn’t get any death threats: I guess that’s good!  (Once in my life I did get death threats—graphic, explicit threats, about which I had to contact the police—but it was because I refused to publicize someone’s P=NP proof.)

Since I was away from campus when this blew up, I did feel some fear about the professional backlash that would await me on my return.  Would my office be vandalized?  Would activist groups be protesting my classes?  Would MIT police be there to escort me from campus?

Well, you want to know what happened instead?  Students and colleagues have stopped me in the hall, or come by my office, just to say they support me.  My class has record enrollment this term.  I was invited to participate in MIT’s Diversity Summit, since the organizers felt it would mean a lot to the students to see someone there who had opened up about diversity issues in STEM in such a powerful way.  (I regretfully had to decline, since the summit conflicted with a trip to Stanford.)  And an MIT graduate women’s reading group invited me for a dinner discussion (at my suggestion, Laurie Penny participated as well).  Imagine that: not only are MIT’s women’s groups not picketing me, they’re inviting me over for dinner!  Is there any better answer to the claim, urged on me by some of my overzealous supporters, that the bile of Amanda Marcotte represents all of feminism these days?

Speaking of which, I met Laurie Penny for coffee last month, and she and I quickly hit it off.  We’ve even agreed to write a joint blog post about our advice for shy nerds.  (In my What I Believe post, I had promised a post of advice for shy female nerds—but at Laurie’s urging, we’re broadening the focus to shy nerds of both sexes.)  Even though Laurie’s essay is the thing that brought me to the attention of the Twitter-mobs (which wasn’t Laurie’s intent!), and even though I disagreed with several points in her essay, I knew on reading it that Laurie was someone I’d enjoy talking to.  Unlike so much writing by online social justice activists, which tends to be encrusted with the specialized technical terms of that field—you know, terms like “asshat,” “shitlord,” “douchecanoe,” and “precious feefees of entitled white dudes”—Laurie’s prose shone with humanity and vulnerability: her own, which she freely shared, and mine, which she generously acknowledged.

Overall, the response to my comment has never made me happier or more grateful to be part of the STEM community (I never liked the bureaucratic acronym “STEM,” but fine, I’ll own it).  To many outsiders, we STEM nerds are a sorry lot: we’re “sperglords” (yes, slurs are fine, as long as they’re directed against the right targets!) who might be competent in certain narrow domains, but who lack empathy and emotional depth, and are basically narcissistic children.  Yet somehow when the chips were down, it’s my fellow STEM nerds, and people who hang out with STEM nerds a lot, who showed me far more empathy and compassion than many of the “normals” did.  So if STEM nerds are psychologically broken, then I say: may I surround myself, for the rest of my life, with men and women who are psychologically broken like I am.  May I raise Lily, and any future children I have, to be as psychologically broken as they can be.  And may I stay as far as possible from anyone who’s too well-adjusted.

I reserve my ultimate gratitude for the many women in STEM, friends and strangers alike, who sent me messages of support these past two months.  I’m not ashamed to say it: witnessing how so many STEM women stood up for me has made me want to stand up for them, even more than I did before.  If they’re not called on often enough in class, I’ll call on them more.  If they’re subtly discouraged from careers in science, I’ll blatantly encourage them back.  If they’re sexually harassed, I’ll confront their harassers myself (well, if asked to).  I will listen to them, and I will try to improve.

Is it selfish that I want to help female STEM nerds partly because they helped me?  Here’s the thing: one of my deepest moral beliefs is in the obligation to fight for those among the disadvantaged who don’t despise you, and who wouldn’t gladly rid the planet of everyone like you if they could.  (As I’ve written before, on issue after issue, this belief makes me a left-winger by American standards, and a right-winger by academic ones.)  In the present context, I’d say I have a massive moral obligation toward female STEM nerds and toward Laurie Penny’s version of feminism, and none at all toward Marcotte’s version.

All this is just to say that I’m unbelievably lucky—privileged (!)—to have had so many at MIT and elsewhere willing to stand up for me, and to have reached a stage in life where I’m strong enough to say what I think and to weather anything the Internet says back.  What worries me is that others, more vulnerable, didn’t and won’t have it as easy when the Twitter hate-machine turns its barrel on them.  So in the rest of this post, I’d like to discuss the problem of what to do about social-media shaming campaigns that aim to, and do, destroy the lives of individuals.  I’m convinced that this is a phenomenon that’s only going to get more and more common: something sprung on us faster than our social norms have evolved to deal with it.  And it would be nice if we could solve it without having to wait for a few high-profile suicides.

But first, let me address a few obvious questions about why this problem is even a problem at all.

Isn’t social shaming as old as society itself—and permanent records of the shaming as old as print media?

Yes, but there’s also something fundamentally new about the problem of the Twitter-mobs.  Before, it would take someone—say, a newspaper editor—to make a conscious decision to the effect, “this comment is worth destroying someone’s life over.”  Today, there might be such an individual, but it’s also possible for lives to be destroyed in a decentralized, distributed fashion, with thousands of Twitterers collaborating to push a non-story past the point of no return.  And among the people who “break” the story, not one has to intend to ruin the victim’s life, or accept responsibility for it afterward: after all, each one made the story only ε bigger than it already was.  (Incidentally, this is one reason why I haven’t gotten a Twitter account: while it has many worthwhile uses, it’s also a medium that might as well have been designed for mobs, for ganging up, for status-seeking among allies stripped of rational arguments.  It’s like the world’s biggest high school.)

Don’t some targets of online shaming campaigns, y’know, deserve it?

Of course!  Some are genuine racists or misogynists or homophobes, who once would’ve been able to inflict hatred their entire lives without consequence, and were only brought down thanks to social media.  The trouble is, the participants in online shaming campaigns will always think they’re meting out righteous justice, whether they are or aren’t.  But there’s an excellent reason why we’ve learned in modern societies not to avenge even the worst crimes via lynch mobs.  There’s a reason why we have trials and lawyers and the opportunity for the accused to show their innocence.

Some might say that no safeguards are possible or necessary here, since we’re not talking about state violence, just individuals exercising their free speech right to vilify someone, demand their firing, that sort of thing.  Yet in today’s world, trial-by-Internet can be more consequential than the old kind of trial: would you rather spend a year in jail, but then be free to move to another town where no one knew about it, or have your Google search results tarnished with lurid accusations (let’s say, that you molested children) for the rest of your life—to have that forever prevent you from getting a job or a relationship, and have no way to correct the record?  With trial by Twitter, there’s no presumption of innocence, no requirement to prove that any other party was harmed, just the law of the schoolyard.

Whether shaming is justified in a particular case is a complicated question, but for whatever it’s worth, here are a few of the questions I would ask:

  • Did the person express a wish for anyone (or any group of people) to come to harm, or for anyone’s rights to be infringed?
  • Did the person express glee or mockery about anyone else’s suffering?
  • Did the person perpetrate a grievous factual falsehood—like, something one could prove was a falsehood in a court of law?
  • Did the person violate anyone else’s confidence?
  • How much does the speaker’s identity matter?  If it had been a man rather than a woman (or vice versa) saying parallel things, would we have taken equal offense?
  • Does the comment have what obscenity law calls “redeeming social value”?  E.g., does it express an unusual viewpoint, or lead to an interesting discussion?

Of course, even in those cases where shaming campaigns are justified, they’ll sometimes be unproductive and ill-advised.

Aren’t society’s most powerful fair targets for public criticism, even mocking or vicious criticism?

Of course.  Few would claim, for example, that we have an ethical obligation to ease up on Todd Akin over his “legitimate rape” remarks, since all the rage might give Akin an anxiety attack.  Completely apart from the (de)merits of the remarks, we accept that, when you become (let’s say) an elected official, a CEO, or a university president, part of the bargain is that you no longer get to complain if people organize to express their hatred of you.

But what’s striking about the cases in the NYT article is that it’s not public figures being gleefully destroyed: just ordinary people who, in most cases, made one ill-advised joke or tweet, no worse than countless things you or I have probably said in private among friends.  The social justice warriors try to justify what would otherwise look like bullying by shifting attention away from individuals: sure, Justine Sacco might be a decent person, but she stands for the entire category of upper-middle-class, entitled white women, a powerful structural force against whom the underclass is engaged in a righteous struggle.  Like in a war, the enemy must be fought by any means necessary, even if it means picking off one hapless enemy foot-soldier to make an example to the rest.  And anyway, why do you care more about this one professional white woman than about the millions of victims of racism?  Is it because you’re a racist yourself?

I find this line of thinking repugnant.  For it perverts worthy struggles for social equality into something callous and inhuman, and thereby undermines the struggles themselves.  It seems to me to have roughly the same relation to real human rights activism as the Inquisition did to the ethical teachings of Jesus.  It’s also repugnant because of its massive chilling effect: watching a few shaming campaigns is enough to make even the most well-intentioned writer want to hide behind a pseudonym, or only offer those ideas and experiences that are sure to win approval.  And the chilling effect is not some accidental byproduct; it’s the goal.  This negates what, for me, is a large part of the promise of the Internet: that if people from all walks of life can just communicate openly, everything made common knowledge, nothing whispered or secondhand, then all the well-intentioned people will eventually come to understand each other.


If I’m right that online shaming of decent people is a real problem that’s only going to get worse, what’s the solution?  Let’s examine five possibilities.

(1) Libel law.  For generations, libel has been recognized as one of the rare types of speech that even a liberal, democratic society can legitimately censor (along with fraud, incitement to imminent violence, national secrets, child porn, and a few others).  That libel is illegal reflects a realistic understanding of the importance of reputation: if, for example, CNN falsely reports that you raped your children, then it doesn’t really matter if MSNBC later corrects the record; your life as you knew it is done.

The trouble is, it’s not clear how to apply libel law in the age of social media.  In the cases we’re talking about, an innocent person’s life gets ruined because of the collective effect of thousands of people piling on to make nasty comments, and it’s neither possible nor desirable to prosecute all of them.  Furthermore, in many cases the problem is not that the shamers said anything untrue: rather, it’s that they “merely” took something true and spitefully misunderstood it, or blew it wildly, viciously, astronomically out of proportion.  I don’t see any legal remedies here.

(2) “Shame the shamers.”  Some people will say the only answer is to hit the shamers with their own weapons.  If an overzealous activist gets an innocent jokester fired from his job, shame the activist until she’s fired from her job.  If vigilantes post the jokester’s home address on the Internet with crosshairs overlaid, find the vigilantes’ home addresses and post those.  It probably won’t surprise many people that I’m not a fan of this solution.  For it only exacerbates the real problem: that of mob justice overwhelming reasoned debate.  The most I can say in favor of vigilantism is this: you probably don’t get to complain about online shaming, if what you’re being shamed for is itself a shaming campaign that you prosecuted against a specific person.

(In a decade writing this blog, I can think of exactly one case where I engaged in what might be called a shaming campaign: namely, against the Bell’s inequality denier Joy Christian.  Christian had provoked me over six years, not merely by being forehead-bangingly wrong about Bell’s theorem, but by insulting me and others when we tried to reason with him, and by demanding prize money from me because he had ‘proved’ that quantum computing was a fraud.  Despite that, I still regret the shaming aspects of my Joy Christian posts, and will strive not to repeat them.)

(3) Technological solutions.  We could try to change the functioning of the Internet, to make it harder to use it to ruin people’s lives.  This, more-or-less, is what the European Court of Justice was going for, with its much-discussed recent ruling upholding a “right to be forgotten” (more precisely, a right for individuals to petition for embarrassing information about them to be de-listed from search engines).  Alas, I fear that the Streisand effect, the Internet’s eternal memory, and the existence of different countries with different legal systems will forever make a mockery of all such technological solutions.  But, OK, given that Google is constantly tweaking its ranking algorithms anyway, maybe it could give less weight to cruel attacks against non-public-figures?  Or more weight (or even special placement) to sites explaining how the individual was cleared of the accusations?  There might be scope for such things, but I have the strong feeling that they should be done, if at all, on a voluntary basis.

(4) Self-censorship.  We could simply train people not to express any views online that might jeopardize their lives or careers, or at any rate, not to express those views under their real names.  Many people I’ve talked to seem to favor this solution, but I can’t get behind it.  For it effectively cedes to the most militant activists the right to decide what is or isn’t acceptable online discourse.  It tells them that they can use social shame as a weapon to get what they want.  When women are ridiculed for sharing stories of anorexia or being sexually assaulted or being discouraged from careers in science, it’s reprehensible to say that the solution is to teach those women to shut up about it.  I not only agree with that but go further: privacy is sometimes important, but is also an overrated value.  The respect that one rational person affords another for openly sharing the truth (or his or her understanding of the truth), in a spirit of sympathy and goodwill, is a higher value than privacy.  And the Internet’s ability to foster that respect (sometimes!) is worth defending.

(5) Standing up.  And so we come to the only solution that I can wholeheartedly stand behind.  This is for people who abhor shaming campaigns to speak out, loudly, for those who are unfairly shamed.

At the nadir of my own Twitter episode, when it felt like my life was now finished, time to throw in the towel, the psychiatrist Scott Alexander wrote a 10,000-word essay in my defense, which also ranged controversially into numerous other issues.  In a comment on his girlfriend Ozy’s blog, Alexander now says that he regrets aspects of Untitled (then again, it was already tagged “Things I Will Regret Writing” when he posted it!).  In particular, he now feels that the piece was too broad in its critique of feminism.  However, he then explains as follows what motivated him to write it:

Scott Aaronson is one of the nicest and most decent people in the world, who does nothing but try to expand human knowledge and support and mentor other people working on the same in a bunch of incredible ways. After a lot of prompting he exposed his deepest personal insecurities, something I as a psychiatrist have to really respect. Amanda Marcotte tried to use that to make mincemeat of him, casually, as if destroying him was barely worth her time. She did it on a site where she gets more pageviews than he ever will, among people who don’t know him, and probably stained his reputation among nonphysicists permanently. I know I have weird moral intuitions, but this is about as close to pure evil punching pure good in the face just because it can as I’ve ever seen in my life. It made me physically ill, and I mentioned in the comments of the post that I lost a couple pounds pacing back and forth and shaking and not sleeping after I read it. That was the place I was writing from. And it was part of what seemed to me to be an obvious trend, and although “feminists vs. nerds” is a really crude way of framing it, I couldn’t think of a better one in that mental state and I couldn’t let it pass.

I had three reactions on reading this.  First, if there is a Scott in this discussion who’s “pure good,” then it’s not I.  Second, maybe the ultimate solution to the problem of online shaming mobs is to make a thousand copies of Alexander, and give each one a laptop with an Internet connection.  But third, as long as we have only one of him, the rest of us have a lot of work cut out for us.  I know, without having to ask, that the only real way I can thank Alexander for coming to my defense, is to use this blog to defend other people (anywhere on the ideological spectrum) who are attacked online for sharing in a spirit of honesty and goodwill.  So if you encounter such a person, let me know—I’d much prefer that to letting me know about the latest attempt to solve NP-complete problems in polynomial time with some analog contraption.


Unrelated Update: Since I started this post with Boaz Barak, let me also point to his recent blog post on why theoretical computer scientists care so much about asymptotics, despite understanding full well that the constants can overwhelm them in practice.  Boaz articulates something that I’ve tried to say many times, but he’s crisper and more eloquent.
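
To make Boaz’s point concrete, here is a toy illustration of my own (not from his post, and with a completely made-up constant factor): in this little cost model, the asymptotically faster algorithm loses to the “naive” quadratic one at every input size shown except the last, and only pulls ahead somewhere between n = 100,000 and n = 1,000,000.

    import math

    # Hypothetical cost models: an O(n log n) algorithm saddled with a large
    # constant factor, versus an O(n^2) algorithm with constant 1.
    def cost_fancy(n, c=10_000):
        return c * n * math.log2(n)

    def cost_naive(n):
        return n * n

    for n in [100, 10_000, 100_000, 1_000_000]:
        winner = "O(n log n)" if cost_fancy(n) < cost_naive(n) else "O(n^2)"
        print(f"n = {n:>9,}: cheaper in this toy model is {winner}")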


Update (Feb. 27): Since a couple people asked, I explain here what I see as the basic problems with the “Dr. Nerdlove” site.


Update (Feb. 28): In the middle of this affair, perhaps the one thing that depressed me the most was Salon’s “Plight of the bitter nerd” headline. Random idiots on the Internet were one thing, but how could a “serious,” “respectable” magazine lend its legitimacy to such casual meanness? I’ve now figured out the answer: I used to read Salon sometimes in the late 90s and early 2000s, but not since then, and I simply hadn’t appreciated how far the magazine had descended into clickbait trash. There’s an amusing fake Salon Twitter account that skewers the magazine with made-up headlines (“Ten signs your cat might be racist” / “Nerd supremacism: should we have affirmative action to get cool people into engineering?”), mixed with actual Salon headlines, in such a way that it would be difficult to tell many of them apart were they not marked. (Indeed, someone should write a web app where you get quizzed to see how well you can distinguish them.) “The plight of the bitter nerd” is offered there as one of the real headlines that’s indistinguishable from the parodies.

“The Man Who Tried to Redeem the World with Logic”

Wednesday, February 18th, 2015

No, I’m not talking about me!

Check out an amazing Nautilus article of that title by Amanda Gefter, a fine science writer of my acquaintance.  The article tells the story of Walter Pitts, who [spoiler alert] grew up on the mean streets of Prohibition-era Detroit, discovered Russell and Whitehead’s Principia Mathematica in the library at age 12 while hiding from bullies, corresponded with Russell about errors he’d found in the Principia, then ran away from home at age 15, co-invented neural networks with Warren McCulloch in 1943, became the protégé of Norbert Wiener at MIT, was disowned by Wiener because Wiener’s wife concocted a lie that Pitts and others who she hated had seduced Wiener’s daughter, and then became depressed and drank himself to death.  Interested yet?  It’s not often that I encounter a piece of nerd history that’s important and riveting and that had been totally unknown to me; this is one of the times.

Update (Feb. 19): Also in Nautilus, you can check out a fun interview with me.

Update (Feb. 24): In loosely-related news, check out a riveting profile of Geoffrey Hinton (and more generally, of deep learning, a.k.a. re-branded neural networks) in the Chronicle of Higher Education.  I had the pleasure of meeting Hinton when he visited MIT a few months ago; he struck me as an extraordinary person.  Hat tip to commenter Chris W.

Happy Second Birthday Lily

Wednesday, January 21st, 2015

[Photo: a cat]

Two years ago, I blogged when Lily was born.  Today I can blog that she runs, climbs, swims (sort of), constructs 3-word sentences, demands chocolate cake, counts to 10 in both English and Hebrew, and knows colors, letters, shapes, animals, friends, relatives, the sun, and the moon.  To all external appearances she’s now as conscious as you and I are (and considerably more so than the cat in the photo).

But the most impressive thing Lily does—the thing that puts her far beyond where her parents were at the same age, in a few areas—is her use of the iPad.  There she does phonics exercises, plays puzzle games that aren’t always trivial for me to win, and watches educational videos on YouTube (skipping past the ads, and complaining if the Internet connection goes down).  She chooses the apps and videos herself, easily switching between them when she gets bored.  It’s a sight to behold, and definitely something to try with your own toddler if you have one.  (There’s a movement these days that encourages parents to ban kids from using touch-screen devices, fearful that too much screen time will distract them from the real world.  To which I reply: for better or worse, this is the real world that our kids will grow up into.)

People often ask whether Dana and I will steer Lily into becoming a theoretical computer scientist like us.  My answer is “hell no”: I’ll support Lily in whatever she wants to do, whether that means logic, combinatorics, algebraic geometry, or even something further afield like theoretical neuroscience or physics.

As recent events illustrated, the world is not always the kindest place for nerds (male or female), with our normal ways of thinking, talking, and interacting sometimes misunderstood by others in the cruelest ways imaginable.  Yet despite everything, nerds do sometimes manage to meet, get married, and even produce offspring with nerd potential of their own.  We’re here, we’re sometimes inappropriately clear, and we’re not going anywhere.

So to life!  And happy birthday Lily!

What I believe

Tuesday, December 30th, 2014

Two weeks ago, prompted by a commenter named Amy, I wrote by far the most personal thing I’ve ever made public—what’s now being referred to in some places as just “comment 171.”  My thinking was: I’m giving up a privacy that I won’t regain for as long as I live, opening myself to ridicule, doing the blog equivalent of a queen-and-two-rook sacrifice.  But at least—and this is what matters—no one will ever again be able to question the depth of my feminist ideals.  Not after they understand how I clung to those ideals through a decade when I wanted to die.  And any teenage male nerds who read this blog, and who find themselves in a similar hole, will know that they too can get out without giving up on feminism. Surely that’s a message any decent person could get behind?

Alas, I was overoptimistic.  Twitter is now abuzz with people accusing me of holding precisely the barbaric attitudes that my story was all about resisting, defeating, and escaping, even when life throws you into those nasty attitudes’ gravity well, even when it tests you as most of your critics will never be tested.  Many of the tweets are full of the courageous clucks of those who speak for justice as long as they’re pretty sure their friends will agree with them: wow just wow, so sad how he totes doesn’t get it, expletives in place of arguments.  This whole affair makes me despair of the power of language to convey human reality—or at least, of my own ability to use language for that end.  I took the most dramatic, almost self-immolating step I could to get people to see me as I was, rather than according to some preexisting mental template of a “privileged, entitled, elite male scientist.”  And many responded by pressing down the template all the more firmly, twisting my words until they fit, and then congratulating each other for their bravery in doing so.

Here, of course, these twitterers (and redditors and facebookers) inadvertently helped make my argument for me.  Does anyone still not understand the sort of paralyzing fear that I endured as a teenager, that millions of other nerds endure, and that I tried to explain in the comment—the fear that civilized people will condemn you as soon as they find out who you really are (even if the truth seems far from uncommonly bad), that your only escape is to hide or lie?

Thankfully, not everyone responded with snarls.  Throughout the past two weeks, I’ve been getting regular emails from shy nerds who thanked me profusely for sharing as I did, for giving them hope for their own lives, and for articulating a life-crushing problem that anyone who’s spent a day among STEM nerds knows perfectly well, but that no one acknowledges in polite company.  I owe the writers of those emails more than they owe me, since they’re the ones who convinced me that on balance, I did the right thing.

I’m equally grateful to have gotten some interesting, compassionate responses from feminist women.  The most striking was that of Laurie Penny in the New Statesman—a response that others of Penny’s views should study, if they want to understand how to win hearts and change minds.

I do not intend for a moment to minimise Aaronson’s suffering. Having been a lonely, anxious, horny young person who hated herself and was bullied I can categorically say that it is an awful place to be. I have seen responses to nerd anti-feminism along the lines of ‘being bullied at school doesn’t make you oppressed.’ Maybe it’s not a vector of oppression in the same way, but it’s not nothing. It burns. It takes a long time to heal.

Feminism, however, is not to blame for making life hell for ‘shy, nerdy men.’ Patriarchy is to blame for that. It is a real shame that Aaronson picked up Dworkin rather than any of the many feminist theorists and writers who manage to combine raw rage with refusal to resort to sexual shame as an instructive tool. Weaponised shame – male, female or other – has no place in any feminism I subscribe to. Ironically, Aronson [sic] actually writes a lot like Dworkin – he writes from pain felt and relived and wrenched from the intimate core of himself, and because of that his writing is powerfully honest, but also flawed …

What fascinates me about Aaronson’s piece, in which there was such raw, honest suffering, was that there was not one mention of women in any respect other than how they might relieve him from his pain by taking pity, or educating him differently. And Aaronson is not a misogynist. Aaronson is obviously a compassionate, well-meaning and highly intelligent man [damn straight—SA]

I’ll have more to say about Penny’s arguments in a later post—where I agree and where I part ways from her—but there’s one factual point I should clear up now.  When I started writing comment 171, I filled it with anecdotes from the happier part of my life (roughly, from age 24 onward): the part where I finally became able to ask; where women, with a frequency that I couldn’t have imagined as a teenager, actually answered ‘yes’; and where I got to learn about their own fears and insecurities and quirks.  In the earlier draft, I also wrote about my wife’s experiences as a woman in computer science, which differed from Amy’s in some crucial ways.  But then I removed it all, for a simple reason: because while I have the right to bare my own soul on my blog, I don’t have the right to bare other people’s unless they want me to.

Without further ado, and for the benefit of the world’s Twitterariat, I’m now just going to state nine of my core beliefs.

1. I believe that women are authors of their own stories, that they don’t exist merely to please men, that they are not homogeneous, that they’re not slot machines that ‘pay out’ but only if you say the right things.  I don’t want my two-year-old daughter to grow up to be anyone else’s property, and I’m happy that she won’t.  And I’d hope all this would no more need to be said, than (say) that Gentiles shouldn’t be slaughtered to use their blood in making matzo.

2. I believe everyone’s story should be listened to—and concretely, that everyone should feel 300% welcome to participate in my comments section.  I don’t promise to agree with you, but I promise to try to engage your ideas thoughtfully, whether you’re a man, woman, child, AI-bot, or unusually-bright keyboard-pecking chicken.  Indeed, I spend a nontrivial fraction of my life doing exactly that (well, not so much with chickens).

3. I believe no one has the right to anyone else’s sexual affections.  I believe establishing this principle was one of the triumphs of modern civilization.

4. I believe women who go into male-dominated fields like math, CS, and physics deserve praise, encouragement, and support.  But that’s putting the point too tepidly: if I get to pick 100 people (unrelated to me) to put onto a spaceship as the earth is being destroyed, I start thinking immediately about six or seven of my female colleagues in complexity and quantum computing.  And no, Twitter: not because being female, they could help repopulate the species.  Just because they’re great people.

5. I believe there still exist men who think women are inferior, that they have no business in science, that they’re good only for sandwich-making and sex.  Though I don’t consider it legally practicable, as a moral matter I’d be fine if every such man were thrown in prison for life.

6. I believe that even if they don’t hold views anything like the above (as, overwhelmingly, they don’t), there might be nerdy males who unintentionally behave in ways that tend to drive some women away from science.  I believe this is a complicated problem best approached with charity: we want win-win solutions, where no one is made to feel despised because of who they are.  Toward that end, I believe open, honest communication (as I’ve been trying to foster on this blog) is essential.

7. I believe that no one should be ashamed of inborn sexual desires: not straight men, not straight women, not gays, not lesbians, not even pedophiles (though in the last case, there might really be no moral solution other than a lifetime of unfulfilled longing).  Indeed, I’ve always felt a special kinship with gays and lesbians, precisely because the sense of having to hide from the world, of being hissed at for a sexual makeup that you never chose, is one that I can relate to on a visceral level.  This is one reason why I’ve staunchly supported gay marriage since adolescence, when it was still radical.  It’s also why the tragedy of Alan Turing, of his court-ordered chemical castration and subsequent suicide, was one of the formative influences of my life.

8. I believe that “the problem of the nerdy heterosexual male” is surely one of the worst social problems today that you can’t even acknowledge as being a problem—the more so, if you weight the problems by how likely academics like me are to know the sufferers and to feel a personal stake in helping them. How to help all the young male nerds I meet who suffer from this problem, in a way that passes feminist muster, and that triggers the world’s sympathy rather than outrage, is a problem that interests me as much as P vs. NP, and that right now seems about equally hard.

9. I believe that, just as there are shy, nerdy men, there are also shy, nerdy women, who likewise suffer from feeling unwanted, sexually invisible, or ashamed to express their desires.  On top of that, these women also have additional difficulties that come with being women!  At the same time, I also think there are crucial differences between the two cases—at least in the world as it currently exists—which might make the shy-nerdy-male problem vastly harder to solve than the shy-nerdy-female one.  Those differences, and my advice for shy nerdy females, will be the subject of another post.  (That’s the thing about blogging: in for a penny, in for a post.)


Update (Dec. 31): I struggle always to be ready to change my views in light of new arguments and evidence. After reflecting on the many thoughtful comments here, there are two concessions that I’m now willing to make.

The first concession is that, as Laurie Penny maintained, my problems weren’t caused by feminism, but rather by the Patriarchy. One thing I’ve learned these last few days is that, as many people use it, the notion of “Patriarchy” is sufficiently elastic as to encompass almost anything about the relations between the sexes that is, or has ever been, bad or messed up—regardless of who benefits, who’s hurt, or who instigated it. So if you tell such a person that your problem was not caused by the Patriarchy, it’s as if you’ve told a pious person that a certain evil wasn’t the Devil’s handiwork: the person has trouble even parsing what you said, since within her framework, “evil” and “Devil-caused” are close to synonymous. If you want to be understood, far better just to agree that it was Beelzebub and be done with it. This might sound facetious, but it’s really not: I believe in the principle of always adopting the other side’s terms of reference, whenever doing so will facilitate understanding and not sacrifice what actually matters to you.

Smash the Patriarchy!

The second concession is that, all my life, I’ve benefited from male privilege, white privilege, and straight privilege. I would only add that, for some time, I was about as miserable as it’s possible for a person to be, so that in an instant, I would’ve traded all three privileges for the privilege of not being miserable. And if, as some suggested, there are many women, blacks, and gays who would’ve gladly accepted the other side of that trade—well then, so much the better for all of us, I guess. “Privilege” simply struck me as a pompous, cumbersome way to describe such situations: why not just say that person A’s life stinks in this way, and person B’s stinks in that way? If they’re not actively bothering each other, then why do we also need to spread person A’s stink over to person B and vice versa, by claiming they’re each “privileged” by not having the other one’s?

However, I now understand why so many people became so attached to that word: if I won’t use it, they think it means I think that sexism, racism, and homophobia don’t exist, rather than just that I think people fixated on a really bad way to talk about these problems.


Update (Jan. 1): Yesterday I gave a seminar at the Hebrew University of Jerusalem. Since I’d been spending all my time dealing with comment-171-gate, I showed up with no slides, no notes, no anything—just me and the whiteboard. But for an hour and a half, I got to forget entirely about the thousands of people on the Internet I’d never met who were now calling me an asshole because of wild, “postmodernist” misreadings of a blog comment, which twisted what I said (and meant) into its exact opposite, building up a fake-Scott-Aaronson onto whom the ax-grinders could project all of their own bogeymen. For 90 minutes I got to forget all that, and just throw myself into separations between randomized and quantum query complexity. It was the most cathartic lecture of my life. And in the near future, I’d like more such catharses. Someday I’ll say more about the inexhaustibly-fascinating topic of nerds and sex—and in particular, I’ll write the promised post about shy female nerds—but not now. This will be my last post on the subject for a while.

On balance, I don’t regret having shared my story—because it prompted an epic discussion; because I learned so much from the dozens of other nerd coming-of-age stories that it drew out, similar to mine but also different; because what I learned will change the way I talk about these issues in the future; and most of all, because so many people, men and also some women, emailed me to say how my speaking out gave them hope for their own lives. But I do regret a few rhetorical flourishes, which I should have known might be misread maliciously, though I could never have guessed how maliciously. I never meant to minimize the suffering of other people, nor to deny that many others have had things as bad or worse than I did (again, how does one even compare?). I meant only that, if we’re going to discuss how to change the culture of STEM fields, or design sexual-conduct policies to minimize suffering, then I request a seat at the table not as the “white male powerful oppressor figure,” but as someone who also suffered something atypically extreme, overcame it, and gained relevant knowledge that way. I never meant to suggest that anyone else should leave the table.

To the people who tweeted that female MIT students should now be afraid to take classes with me: please check out the beautiful blog post by Yan, a female student who did take 6.045 with me. See also this by Lisa Danz and this by Chelsea Voss.

More broadly: thank you to everyone who sent me messages of support, but especially to all the female mathematicians and scientists who did so.  I take great solace from the fact that, of all the women and men whose contributions to the world I had respected beforehand, not one (to my knowledge) reacted to this affair in a mean-spirited way.

Happy New Year, everyone. May 2015 be a year of compassion and understanding.


Update (Jan. 2): If you’ve been following this at all, then please, please, please read Scott Alexander’s tour-de-force post. To understand what it was like for me to read this, after all I’ve been through the past few days, try to imagine Galileo’s Dialogue Concerning the Two Chief World Systems, the American Declaration of Independence, John Stuart Mill’s The Subjection of Women, and Clarence Darrow’s closing arguments in the Scopes trial all rolled into one, except with you as the protagonist. Reason and emotion are traditionally imagined as opposites, but that’s never seemed entirely right to me: while, yes, part of reason is learning how to separate out emotion, I never experience such intense emotion as when, like with Alexander’s piece, I see reason finally taking a stand, reason used to face down a thousand bullies and as a fulcrum to move the world.


Update (Jan. 13): Please check out this beautiful Quora answer by Jean Yang, a PhD student in MIT CSAIL. She’s answering the question: “What do you think of Scott Aaronson’s comment #171 and the subsequent posts?”

More generally, I’ve been thrilled by the almost-unanimously positive reactions that I’ve been getting these past two weeks from women in STEM fields, even as so many people outside STEM have responded with incomprehension and cruelty.  Witnessing that pattern has—if possible—made me even more of a supporter and admirer of STEM women than I was before this thing started.


Update (Jan. 17): See this comment on Lavinia Collins’s blog for my final response to the various criticisms that have been leveled at me.

The Turing movie

Tuesday, December 16th, 2014

Last week I finally saw The Imitation Game, the movie with Benedict Cumberbatch as Alan Turing.

OK, so for those who haven’t yet seen it: should you?  Here’s my one paragraph summary: imagine that you told the story of Alan Turing—one of the greatest triumphs and tragedies of human history, needing no embellishment whatsoever—to someone who only sort-of understood it, and who filled in the gaps with weird fabrications and Hollywood clichés.  And imagine that person retold the story to a second person, who understood even less, and that that person retold it to a third, who understood least of all, but who was charged with making the movie that would bring Turing’s story before the largest audience it’s ever had.  And yet, imagine that enough of the enormity of the original story made it through this noisy channel, that the final product was still pretty good.  (Except, imagine how much better it could’ve been!)

The fabrications were especially frustrating to me, because we know it’s possible to bring Alan Turing’s story to life in a way that fully honors the true science and history.  We know that, because Hugh Whitemore’s 1986 play Breaking the Code did it.  The producers of The Imitation Game would’ve done better just to junk their script, and remake Breaking the Code into a Hollywood blockbuster.  (Note that there is a 1996 BBC adaptation of Breaking the Code, with Derek Jacobi as Turing.)

Anyway, the movie focuses mostly on Turing’s codebreaking work at Bletchley Park, but also jumps around in time to his childhood at Sherborne School, and to his arrest for “homosexual indecency” and its aftermath.  Turing’s two world-changing papers—On Computable Numbers and Computing Machinery and Intelligence—are both mentioned, though strangely, his paper about computing zeroes of the Riemann zeta function is entirely overlooked.

Here are my miscellaneous comments:

  • The boastful, trash-talking, humor-impaired badass-nerd of the movie seems a lot closer to The Big Bang Theory‘s Sheldon Cooper, or to some other Hollywood concept of “why smart people are so annoying,” than to the historical Alan Turing.  (At least in Sheldon’s case, the archetype is used for laughs, not drama or veracity.)  As portrayed in the definitive biography (Andrew Hodges’ Alan Turing: The Enigma), Turing was eccentric, sure, and fiercely individualistic (e.g., holding up his pants with pieces of string), but he didn’t get off on insulting the intelligence of the people around him.
  • In the movie, Turing is pretty much singlehandedly responsible for designing, building, and operating the Bombes (the codebreaking machines), which he does over the strenuous objections of his superiors.  This, of course, is absurd: Bletchley employed about 10,000 people at its height.  Turing may have been the single most important cog in the operation, but he was still a cog.  And by November 1942, the operation was already running smoothly enough that Turing could set sail for the US (in waters that were now much safer, thanks to Bletchley!), to consult on other cryptographic projects at Bell Labs.
  • But perhaps the movie’s zaniest conceit is that Turing was also in charge of deciding what to do with Bletchley’s intelligence (!).  In the movie, it falls to him, not the military, to decide which ship convoys will be saved, and which sacrificed to prevent spilling Bletchley’s secret.  If that had any historicity to it, it would surely be the most military and political power ever entrusted to a mathematician (update: see the comments section for potential counterexamples).
  • It’s true that Turing (along with three other codebreakers) wrote a letter directly to Winston Churchill, pleading for more funding for Bletchley Park—and that Churchill saw the letter, and ordered “Action this day! Make sure they have all they want on extreme priority.”  However, the letter was not a power play to elevate Turing over Hugh Alexander and his other colleagues: in fact, Alexander co-signed the letter.  More broadly, the fierce infighting between Turing and everyone else at Bletchley Park, central to the movie’s plot, seems to have been almost entirely invented for dramatic purposes.
  • The movie actually deserves a lot of credit for getting right that the major technical problem of Bletchley Park was how to get the Bombes to search through keys fast enough—and that speeding things up is where Turing made a central contribution.  As a result, The Imitation Game might be the first Hollywood movie ever made whose plot revolves around computational efficiency.  (Counterexamples, anyone?)  Unfortunately, the movie presents Turing’s great insight as being that one can speed up the search by guessing common phrases, like “HEIL HITLER,” that are likely to be in the plaintext.  That was, I believe, obvious to everyone from the beginning.  (A toy sketch of how such a crib narrows the search appears just after this list.)
  • Turing never built a computer in his own home, and he never named a computer “Christopher,” after his childhood crush Christopher Morcom.  (On the other hand, Christopher Morcom existed, and his early death from tuberculosis really did devastate Turing, sending him into morbid-yet-prescient ruminations about whether a mind could exist separately from a brain.)
  • I found it ironic that The Imitation Game, produced in 2014, is far more squeamish about on-screen homosexuality than Breaking the Code, produced in 1986.  Turing talks about being gay (which is an improvement over 2001’s Enigma, which made Turing straight!), but is never shown embracing another man.  However, the more important problem is that the movie botches the story of the burglary of Turing’s house (i.e., the event that led to Turing’s arrest and conviction for homosexual indecency), omitting the role of Turing’s own naiveté in revealing his homosexuality to the police, and substituting some cloak-and-dagger spy stuff.  Once again, Breaking the Code handled this perfectly.
  • In one scene, Euler is pronounced “Yooler.”
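
Since the crib idea itself was the obvious part, here is a minimal sketch (mine, with a made-up intercept, and nothing like the actual Bombe machinery) of the one piece of it that was genuinely clever to exploit: the Enigma’s reflector guarantees that no letter ever encrypts to itself, so a guessed crib can be ruled out of most alignments against the ciphertext before a single rotor setting is tried.

    def possible_crib_positions(ciphertext, crib):
        """Offsets where the crib could plausibly sit, using the fact that
        Enigma never maps a letter to itself."""
        positions = []
        for offset in range(len(ciphertext) - len(crib) + 1):
            window = ciphertext[offset:offset + len(crib)]
            # Any coincidence between a guessed plaintext letter and the
            # ciphertext letter at the same position rules this offset out.
            if all(p != c for p, c in zip(crib, window)):
                positions.append(offset)
        return positions

    # Hypothetical intercept, and a commonly guessed German crib ("weather report").
    intercept = "QFZWRWIVTYRESXBFOGKUHQBAISEZLNP"
    print(possible_crib_positions(intercept, "WETTERBERICHT"))

Each surviving offset then seeds the real (and vastly harder) search over rotor orders, rotor positions, and plugboard settings, which is where the actual speedups came in.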

For more, see an excellent piece in Slate, How Accurate Is The Imitation Game?.  And for other science bloggers’ reactions, see this review by Christos Papadimitriou (which I thought was extremely kind, though it focuses more on Turing himself than on the movie), this reaction by Peter Woit, which largely echoes mine, and this by Clifford Johnson.