I’ve supervised a lot of great student projects in my nine years at MIT, but my inner nerdy teenager has never been as *personally* delighted by a project as it is right now. Today, I’m proud to announce that Adam Yedidia, a PhD student at MIT (but an MEng student when he did most of this work), has explicitly constructed a one-tape, two-symbol Turing machine with 7,918 states, whose behavior (when run on a blank tape) can never be proven from the usual axioms of set theory, under reasonable consistency hypotheses. Adam has also constructed a 4,888-state Turing machine that halts iff there’s a counterexample to Goldbach’s Conjecture, and a 5,372-state machine that halts iff there’s a counterexample to the Riemann Hypothesis. In all three cases, this is the first time we’ve had a reasonable explicit upper bound on how many states you need in a Turing machine before you can see the behavior in question.

Here’s our research paper, on which Adam generously included me as a coauthor, even though he did the heavy lifting. Also, here’s a github repository where you can download all the code Adam used to generate these Turing machines, and even use it to build your own small Turing machines that encode interesting mathematical statements. Finally, here’s a YouTube video where Adam walks you through how to use his tools.

A more precise statement of our main result is this: we give a 7,918-state Turing machine, called Z (and actually explicitly listed in our paper!), such that:

- Z runs forever, assuming the consistency of a large-cardinal theory called SRP (Stationary Ramsey Property), but
- Z can’t be
*proved* to run forever in ZFC (Zermelo-Fraenkel set theory with the Axiom of Choice, the usual foundation for mathematics), assuming that ZFC is consistent.

A bit of background: it follows, as an immediate consequence of Gödel’s Incompleteness Theorem, that there’s *some* computer program, of *some* length, that eludes the power of ordinary mathematics to prove what it does, when it’s run with an unlimited amount of memory. So for example, such a program could simply enumerate all the possible consequences of the ZFC axioms, one after another, and halt if it ever found a contradiction (e.g., a proof of 1+1=3). Assuming ZFC is consistent, this program must run forever. But again assuming ZFC is consistent, ZFC can’t *prove* that the program runs forever, since if it did, then it would prove its own consistency, thereby violating the Second Incompleteness Theorem!

Alas, this argument still leaves us in the dark about *where*, in space of computer programs, the “Gödelian gremlin” rears its undecidable head. A program that searches for an inconsistency in ZFC is a fairly complicated animal: it needs to encode not only the ZFC axiom schema, but also the language and inference rules of first-order logic. Such a program might be thousands of lines long if written in a standard programming language like C, or millions of instructions if compiled down to a bare-bones machine code. You’d certainly never run across such a program by chance—not even if you had a computer the size of the observable universe, trying one random program after another for billions of years in a “primordial soup”!

So the question stands—a question that strikes me as *obviously* important, even though as far as I know, only one or two people ever asked the question before us; see here for example. Namely: do the axioms of set theory suffice to analyze the behavior of every computer program that’s at most, let’s say, 50 machine instructions long? Or are there super-short programs that *already* exhibit “Gödelian behavior”?

Theoretical computer scientists might object that this is “merely a question of constants.” Well yes, OK, but the origin of life in our universe—a not entirely unrelated puzzle—is also “merely a question of constants”! In more detail, we know that it’s *possible* with our laws of physics to build a self-replicating machine: say, DNA or RNA and their associated paraphernalia. We also know that tiny molecules like H_{2}O and CO_{2} are not self-replicating. But we don’t know *how small* the smallest self-replicating molecule can be—and that’s an issue that influences whether we should expect to find ourselves alone in the universe or find it teeming with life.

Some people might also object that what we’re asking about has already been studied, in the half-century quest to design the smallest universal Turing machine (the subject of Stephen Wolfram’s $25,000 prize in 2007, to which I responded with my own $25.00 prize). But I see that as fundamentally different, for the following reason. A universal Turing machine—that is, a machine that simulates any other machine that’s described to it on its input tape—has the privilege of offloading almost all of its complexity onto the description format for the input machine. So indeed, that’s exactly what all known tiny universal machines do! But a program that checks (say) Goldbach’s Conjecture, or the Riemann Hypothesis, or the consistency of set theory, on an initially blank tape, has no such liberty. For such machines, the number of states really *does* seem like an intrinsic measure of complexity, because the complexity can’t be shoehorned anywhere else.

One can also phrase what we’re asking in terms of the infamous Busy Beaver function. Recall that BB(n), or the n^{th} Busy Beaver number, is defined to be the maximum number of steps that any n-state Turing machine takes when run on an initially blank tape, assuming that the machine eventually halts. The Busy Beaver function was the centerpiece of my 1998 essay Who Can Name the Bigger Number?, which *might* still attract more readers than anything else I’ve written since. As I stressed there, if you’re in a biggest-number-naming contest, and you write “BB(10000),” you’ll *destroy* any opponent—however otherwise mathematically literate they are—who’s innocent of computability theory. For BB(n) grows faster than any computable sequence of integers: indeed, if it didn’t, then one could use that fact to solve the halting problem, contradicting Turing’s theorem.

But the BB function has a second amazing property: namely, it’s a perfectly well-defined integer function, and yet once you fix the axioms of mathematics, only finitely many values of the function can ever be *proved*, even in principle. To see why, consider again a Turing machine M that halts if and only if there’s a contradiction in ZF set theory. Clearly such a machine could be built, with some finite number of states k. But then ZF set theory can’t possibly determine the value of BB(k) (or BB(k+1), BB(k+2), etc.), unless ZF is inconsistent! For to do so, ZF would need to prove that M ran forever, and therefore prove its own consistency, and therefore be inconsistent by Gödel’s Theorem.

OK, but we can now ask a quantitative question: *how many* values of the BB function is it possible for us to know? Where exactly is the precipice at which this function “departs the realm of mortals and enters the realm of God”: is it closer to n=10 or to n=10,000,000? In practice, *four* values of BB have been determined so far:

- BB(1)=1
- BB(2)=6
- BB(3)=21 (Lin and Rado 1965)
- BB(4)=107 (Brady 1975)

We also know some lower bounds:

See Heiner Marxen’s page or the Googology Wiki (which somehow I only learned about today) for more information.

Some Busy Beaver enthusiasts have opined that even BB(6) will never be known exactly. On the other hand, the abstract argument from before tells us only that, if we confine ourselves to (say) ZF set theory, then there’s *some* k—possibly in the tens of millions or higher—such that the values of BB(k), BB(k+1), BB(k+2), and so on can never be proven. So again: is the number of knowable values of the BB function more like 10, or more like a million?

This is the question that Adam and I (but mostly Adam) have finally addressed.

It’s hopeless to design a Turing machine by hand for all but the simplest tasks, so as a first step, Adam created a new programming language, called Laconic, specifically for writing programs that compile down to small Turing machines. Laconic programs actually compile to an intermediary language called TMD (Turing Machine Descriptor), and from there to Turing machines.

Even then, we estimate that a direct attempt to write a Laconic program that searched for a contradiction in ZFC would lead to a Turing machine with millions of states. There were three ideas needed to get the state count down to something reasonable.

The first was to take advantage of the work of Harvey Friedman, who’s one of the one or two people I mentioned earlier who’s written about these problems before. In particular, Friedman has been laboring since the 1960s to find “natural” arithmetical statements that are provably independent of ZFC or other strong set theories. (See this *AMS Notices* piece by Martin Davis for a discussion of Friedman’s progress as of 2006.) Not only does Friedman’s quest continue, but some of his most important progress has come only within the last year. His statements—typically involving objects called “order-invariant graphs”—strike me as alien, and as far removed from anything I’d personally have independent reasons to think about (but is that just a sign of my limited perspective?). Be that as it may, Friedman’s statements *still* seem a lot easier to encode as short computer programs than the full apparatus of first-order logic and set theory! So that’s what we started with; our work wouldn’t have been possible without Friedman (who we consulted by email throughout the project).

The second idea was something we called “on-tape processing.” Basically, instead of compiling directly from Laconic down to Turing machine, Adam wrote an *interpreter* in Turing machine (which took about 4000 states—a single, fixed cost), and then had the final Turing machine first write a higher-level program onto its tape and then interpret that program. Instead of the compilation process producing a huge multiplicative overhead in the number of Turing machine states (and a repetitive machine), this approach gives us only an additive overhead. We found that this one idea decreased the number of states by roughly an order of magnitude.

The third idea was first suggested in 2002 by Ben-Amram and Petersen (and refined for us by Luke Schaeffer); we call it “introspective encoding.” When we write the program to be interpreted onto the Turing machine tape, the naïve approach would use one Turing machine state per bit. But that’s clearly wasteful, since in an n-state Turing machine, every state contains ~log(n) bits of information (because of the other states it needs to point to). A better approach tries to exploit as many of those bits as it can; doing that gave us up to a factor-of-5 additional savings in the number of states.

For Goldbach’s Conjecture and the Riemann Hypothesis, we paid the same 4000-state overhead for the interpreter, but then the program to be interpreted was simpler, giving a smaller overall machine. Incidentally, it’s not intuitively obvious that the Riemann Hypothesis is equivalent to the statement that some particular computer program runs forever, but it is—that follows, for example, from work by Lagarias and by Davis, Matijasevich, and Robinson (we used the latter; an earlier version of this post incorrectly stated that we used the Lagarias result).

To preempt the inevitable question in the comments section: yes, we *did* run these Turing machines for a while, and no, none of them had halted after a day or so. But before you interpret that as evidence in favor of Goldbach, Riemann, and the consistency of ZFC, you should probably know that a Turing machine to test whether *all perfect squares are less than 5*, produced using Laconic, needed to run for more than an hour before it found the first counterexample (namely, 3^{2}=9) and halted. Laconic Turing machines are optimized only for the number of states, not for speed, to put it mildly.

Of course, three orders of magnitude still remain between the largest value of n (namely, 4) for which BB(n) is known to be knowable in ZFC-based mathematics, and the smallest value of n (namely, 7,918) for which BB(n) is known to be unknowable. I’m optimistic that further improvements are possible to the machine Z—whether that means simplifications to Friedman’s statement, a redesigned interpreter (possibly using lambda calculus?), or a “multi-stage rocket model” where a bare-bones interpreter would be used to unpack a second, richer interpreter which would be used to unpack a third, etc., until you got to the actual program you cared about. But I’d be *shocked* if anyone in my lifetime determined the value of BB(10), for example, or proved the value independent of set theory. Even after the Singularity happens, I imagine that our robot overlords would find the determination of BB(10) quite a challenge.

In an early *Shtetl-Optimized* post, I described theoretical computer science as “quantitative epistemology.” Constructing small Turing machines whose behavior eludes set theory is not conventional theoretical computer science by any stretch of the imagination: it’s closer in practice to programming languages or computer architecture, or even the recreational practice known as code-golfing. On the other hand, I’ve never been involved with any other project that was so clearly, explicitly about pinning down the quantitative boundary between the knowable and the unknowable.

Comments on our paper are welcome.

**Addendum:** Some people might wonder “why Turing machines,” as opposed to a more reasonable programming language like C or Python. Well, first of all, we needed a language that could address an unlimited amount of memory. Also, the BB function is traditionally defined in terms of Turing machines. But the most important issue is that we wanted there to be *no suspicion whatsoever* that our choice of programming language was artificially helping to make our machine small. And hopefully everyone can agree that one-tape, two-symbol Turing machines aren’t designed for *anyone’s* convenience!