by Scott Aaronson

[Author's blog]

[This essay in Spanish]
In an old joke, two noblemen vie to name the bigger number. The first, after ruminating for hours, triumphantly announces "Eighty-three!" The second, mightily impressed, replies "You win." A biggest number contest is clearly pointless when the contestants take turns. But what if the contestants write down their numbers simultaneously, neither aware of the other’s? To introduce a talk on "Big Numbers," I invite two audience volunteers to try exactly this. I tell them the rules:
You have fifteen seconds. Using standard math notation, English words, or both, name a single whole number—not an infinity—on a blank index card. Be precise enough for any reasonable modern mathematician to determine exactly what number you’ve named, by consulting only your card and, if necessary, the published literature. So contestants can’t say "the number of sand grains in the Sahara," because sand drifts in and out of the Sahara regularly. Nor can they say "my opponent’s number plus one," or "the biggest number anyone’s ever thought of plus one"—again, these are ill-defined, given what our reasonable mathematician has available. Within the rules, the contestant who names the bigger number wins.
Are you ready? Get set. Go. The contest’s results are never quite what I’d hope. Once, a seventh-grade
boy filled his card with a string of successive 9’s. Like many other big-number
tyros, he sought to maximize his number by stuffing a 9 into every place value.
Had he chosen easy-to-write 1’s rather than curvaceous 9’s, his number could
have been millions of times bigger. He still would been decimated, though, by
the girl he was up against, who wrote a string of 9’s followed by the
superscript And yet the girl’s number could have been much bigger still, had she stacked
the mighty exponential more than once. Take , for example. This
behemoth, equal to 9 Place value, exponentials, stacked exponentials: each can express boundlessly
big numbers, and in this sense they’re all equivalent. But the notational
systems differ dramatically in the numbers they can express Such paradigms are historical rarities. We find a flurry in antiquity, another flurry in the twentieth century, and nothing much in between. But when a new way to express big numbers concisely does emerge, it’s often a byproduct of a major scientific revolution: systematized mathematics, formal logic, computer science. Revolutions this momentous, as any Kuhnian could tell you, only happen under the right social conditions. Thus is the story of big numbers a story of human progress. And herein lies a parallel with another mathematical story. In his remarkable
and underappreciated book This same pattern holds, I think, for big numbers. Curiosity and openness
lead to fascination with big numbers, and to the buoyant view that no quantity,
whether of the number of stars in the galaxy or the number of possible bridge
hands, is too immense for the mind to enumerate. Conversely, ignorance and
irrationality lead to fatalism concerning big numbers. Historian Ilan Vardi cites the ancient Greek
term ¨ But sand doesn’t escape counting, as Archimedes recognized in the third
century B.C. Here’s how he began There are some ... who think that the number of the sand is infinite in multitude ... again there are some who, without regarding it as infinite, yet think that no number has been named which is great enough to exceed its multitude ... But I will try to show you [numbers that] exceed not only the number of the mass of sand equal in magnitude to the earth ... but also that of a mass equal in magnitude to the universe. This Archimedes proceeded to do, essentially by using the ancient Greek term
Consider, for example, the oft-repeated legend of the Grand Vizier in Persia
who invented chess. The King, so the legend goes, was delighted with the new
game, and invited the Vizier to name his own reward. The Vizier replied that,
being a modest man, he desired only one grain of wheat on the first square of a
chessboard, two grains on the second, four on the third, and so on, with twice
as many grains on each square as on the last. The innumerate King agreed, not
realizing that the total number of grains on all 64 squares would be
2 Fittingly, this same exponential growth is what makes chess itself so
difficult. There are only about 35 legal choices for each chess move, but the
choices multiply exponentially to yield something like 10 Although computers will probably never solve NP-complete problems
efficiently, there’s more hope for another grail of computer science:
replicating human intelligence. The human brain has roughly a hundred billion
neurons linked by a hundred trillion synapses. And though the function of an
individual neuron is only partially understood, it’s thought that each neuron
fires electrical impulses according to relatively simple rules up to a thousand
times each second. So what we have is a highly interconnected computer capable
of maybe 10 To prognosticators of artificial intelligence, Moore’s Law is a glorious
herald of exponential growth. But exponentials have a drearier side as well. The
human population recently passed six billion and is doubling about once every
forty years. At this exponential rate, if an average person weighs seventy
kilograms, then by the year 3750 the entire Earth will be composed of human
flesh. But before you invest in deodorant, realize that the population will stop
increasing long before this—either because of famine, epidemic disease, global
warming, mass species extinctions, unbreathable air, or, entering the
speculative realm, birth control. It’s not hard to fathom why physicist Albert
Bartlett asserted "the greatest shortcoming of the human race" to be "our
inability to understand the exponential function." Or why Carl Sagan advised us
to "never underestimate an exponential." In his book ¨ Exponentials are familiar, relevant, intimately connected to the physical world and to human hopes and fears. Using the notational systems I’ll discuss next, we can concisely name numbers that make exponentials picayune by comparison, that subjectively speaking exceed as much as the latter exceeds 9. But these new systems may seem more abstruse than exponentials. In his essay "On Number Numbness," Douglas Hofstadter leads his readers to the precipice of these systems, but then avers: If we were to continue our discussion just one zillisecond longer, we would find ourselves smack-dab in the middle of the theory of recursive functions and algorithmic complexity, and that would be too abstract. So let’s drop the topic right here. But to drop the topic is to forfeit, not only the biggest number contest, but any hope of understanding how stronger paradigms lead to vaster numbers. And so we arrive in the early twentieth century, when a school of mathematicians called the formalists sought to place all of mathematics on a rigorous axiomatic basis. A key question for the formalists was what the word ‘computable’ means. That is, how do we tell whether a sequence of numbers can be listed by a definite, mechanical procedure? Some mathematicians thought that ‘computable’ coincided with a technical notion called ‘primitive recursive.’ But in 1928 Wilhelm Ackermann disproved them by constructing a sequence of numbers that’s clearly computable, yet grows too quickly to be primitive recursive. Ackermann’s idea was to create an endless procession of arithmetic operations, each more powerful than the last. First comes addition. Second comes multiplication, which we can think of as repeated addition: for example, 5´3 means 5 added to itself 3 times, or 5+5+5 = 15. Third comes exponentiation, which we can think of as repeated multiplication. Fourth comes ... what? Well, we have to invent a weird new operation, for repeated exponentiation. The mathematician Rudy Rucker calls it ‘tetration.’ For example, ‘5 tetrated to the 3’ means 5 raised to its own power 3 times, or , a number with 2,185 digits. We can go on. Fifth comes repeated tetration: shall we call it ‘pentation’? Sixth comes repeated pentation: ‘hexation’? The operations continue infinitely, with each one standing on its predecessor to peer even higher into the firmament of big numbers. If each operation were a candy flavor, then the Ackermann sequence would be
the sampler pack, mixing one number of each flavor. First in the sequence is
1+1, or (don’t hold your breath) 2. Second is 2´2, or
4. Third is 3 raised to the 3
Fee. Fi. Fo. Fum. Fourth is 4 tetrated to the 4, or , which has
10 Wielding the Ackermann sequence, we can clobber unschooled opponents in the biggest-number contest. But we need to be careful, since there are several definitions of the Ackermann sequence, not all identical. Under the fifteen-second time limit, here’s what I might write to avoid ambiguity:
A(111)—Ackermann seq—A(1)=1+1, A(2)=2´2, A(3)=3 Recondite as it seems, the Ackermann sequence does have some applications. A
problem in an area called Ramsey theory asks for the minimum dimension of a
hypercube satisfying a certain property. The true dimension is thought to be 6,
but the lowest dimension anyone’s been able is prove is so huge that it can only
be expressed using the same ‘weird arithmetic’ that underlies the Ackermann
sequence. Indeed, the ¨ Ackermann numbers are pretty big, but they’re not yet big enough. The quest
for still bigger numbers takes us back to the formalists. After Ackermann
demonstrated that ‘primitive recursive’ isn’t what we mean by ‘computable,’ the
question still stood: what "Computing," said Turing, is normally done by writing certain symbols on paper. We may suppose this paper to be divided into squares like a child’s arithmetic book. In elementary arithmetic the 2-dimensional character of the paper is sometimes used. But such use is always avoidable, and I think it will be agreed that the two-dimensional character of paper is no essential of computation. I assume then that the computation is carried out on one-dimensional paper, on a tape divided into squares. Turing continued to explicate his machine using ingenious reasoning from first principles. The tape, said Turing, extends infinitely in both directions, since a theoretical machine ought not be constrained by physical limits on resources. Furthermore, there’s a symbol written on each square of the tape, like the ‘1’s and ‘0’s in a modern computer’s memory. But how are the symbols manipulated? Well, there’s a ‘tape head’ moving back and forth along the tape, examining one square at a time, writing and erasing symbols according to definite rules. The rules are the tape head’s program: change them, and you change what the tape head does. Turing’s august insight was that we can program the tape head to carry out
Nope. Turing proved that this problem, called the Halting Problem, is unsolvable by Turing machines. The proof is a beautiful example of self-reference. It formalizes an old argument about why you can never have perfect introspection: because if you could, then you could determine what you were going to do ten seconds from now, and then do something else. Turing imagined that there was a special machine that could solve the Halting Problem. Then he showed how we could have this machine analyze itself, in such a way that it has to halt if it runs forever, and run forever if it halts. Like a hound that finally catches its tail and devours itself, the mythical machine vanishes in a fury of contradiction. (That’s the sort of thing you don’t say in a research paper.) ¨ "Very nice," you say (or perhaps you say, "not nice at all"). "But what does
all this have to do with big numbers?" Aha! The connection wasn’t published
until May of 1962. Then, in the His idea was simple. Just as we can classify words by how many letters they
contain, we can classify Turing machines by how many rules they have in the tape
head. Some machines have only one rule, others have two rules, still others have
three rules, and so on. But for each fixed whole number N, just as there are
only finitely many distinct words with N letters, so too are there only finitely many distinct
machines with N rules. Among these machines, some halt and others run forever
when started on a blank tape. Of the ones that halt, asked Rado, what’s the
maximum number of steps that any machine takes Rado called this maximum the N Now, suppose we knew the N But here’s a curious fact. Suppose we could name a number Conclusion? The sequence of Busy Beaver numbers, BB(1), BB(2), and so on,
grows faster than This means that no computer program could list all the Busy Beavers one by one. It doesn’t mean that specific Busy Beavers need remain eternally unknowable. And in fact, pinning them down has been a computer science pastime ever since Rado published his article. It’s easy to verify that BB(1), the first Busy Beaver number, is 1. That’s because if a one-rule Turing machine doesn’t halt after the very first step, it’ll just keep moving along the tape endlessly. There’s no room for any more complex behavior. With two rules we can do more, and a little grunt work will ascertain that BB(2) is 6. Six steps. What about the third Busy Beaver? In 1965 Rado, together with Shen Lin, proved that BB(3) is 21. The task was an arduous one, requiring human analysis of many machines to prove that they don’t halt—since, remember, there’s no algorithm for listing the Busy Beaver numbers. Next, in 1983, Allan Brady proved that BB(4) is 107. Unimpressed so far? Well, as with the Ackermann sequence, don’t be fooled by the first few numbers. In 1984, A.K. Dewdney devoted a Indeed, already the top five and six-rule contenders elude us: we can’t explain how they ‘work’ in human terms. If creativity imbues their design, it’s not because humans put it there. One way to understand this is that even small Turing machines can encode profound mathematical problems. Take Goldbach’s conjecture, that every even number 4 or higher is a sum of two prime numbers: 10=7+3, 18=13+5. The conjecture has resisted proof since 1742. Yet we could design a Turing machine with, oh, let’s say 100 rules, that tests each even number to see whether it’s a sum of two primes, and halts when and if it finds a counterexample to the conjecture. Then knowing BB(100), we could in principle run this machine for BB(100) steps, decide whether it halts, and thereby resolve Goldbach’s conjecture. We need not venture far in the sequence to enter the lair of basilisks. But as Rado stressed, even if we can’t list the Busy Beaver numbers, they’re perfectly well-defined mathematically. If you ever challenge a friend to the biggest number contest, I suggest you write something like this:
BB(11111)—Busy Beaver shift #—1, 6, 21, etc If your friend doesn’t know about Turing machines or anything similar, but only about, say, Ackermann numbers, then you’ll win the contest. You’ll still win even if you grant your friend a handicap, and allow him the entire lifetime of the universe to write his number. The key to the biggest number contest is a potent paradigm, and Turing’s theory of computation is potent indeed. ¨ But what if your friend knows about Turing machines as well? Is there a notational system for big numbers more powerful than even Busy Beavers? Suppose we could endow a Turing machine with a magical ability to solve the
Halting Problem. What would we get? We’d get a ‘super Turing machine’: one with
abilities beyond those of any ordinary machine. But now, how hard is it to
decide whether a Imagine a novel, which is imbedded in a longer novel, which itself is
imbedded in an even And there’s no escape. Suppose a Turing machine had a magical ability to
solve the Halting Problem, But how’s this relevant to big numbers? Well, each level of Kleene’s
hierarchy generates a faster-growing Busy Beaver sequence than do all the
previous levels. Indeed, each level’s sequence grows so rapidly that it can only
be computed by a higher level. For example, define BB You might think that now, in the biggest-number contest, you could obliterate even an opponent who uses the Busy Beaver sequence by writing something like this:
BB But not quite. The problem is that I’ve never seen these "higher-level Busy Beavers" defined anywhere, probably because, to people who know computability theory, they’re a fairly obvious extension of the ordinary Busy Beaver numbers. So our reasonable modern mathematician wouldn’t know what number you were naming. If you want to use higher-level Busy Beavers in the biggest number contest, here’s what I suggest. First, publish a paper formalizing the concept in some obscure, low-prestige journal. Then, during the contest, cite the paper on your index card. To exceed higher-level Busy Beavers, we’d presumably need some new computational model surpassing even Turing machines. I can’t imagine what such a model would look like. Yet somehow I doubt that the story of notational systems for big numbers is over. Perhaps someday humans will be able concisely to name numbers that make Busy Beaver 100 seem as puerile and amusingly small as our nobleman’s eighty-three. Or if we’ll never name such numbers, perhaps other civilizations will. Is a biggest number contest afoot throughout the galaxy? ¨ You might wonder why we can’t transcend the whole parade of paradigms, and name numbers by a system that encompasses and surpasses them all. Suppose you wrote the following in the biggest number contest:
The biggest whole number nameable with 1,000 characters of English text Surely this number exists. Using 1,000 characters, we can name only finitely many numbers, and among these numbers there has to be a biggest. And yet we’ve made no reference to how the number’s named. The English text could invoke Ackermann numbers, or Busy Beavers, or higher-level Busy Beavers, or even some yet more sweeping concept that nobody’s thought of yet. So unless our opponent uses the same ploy, we’ve got him licked. What a brilliant idea! Why didn’t we think of this earlier? Unfortunately it doesn’t work. We might as well have written
One plus the biggest whole number nameable with 1,000 characters of English text This number takes at least 1,001 characters to name. Yet we’ve just named it with only 80 characters! Like a snake that swallows itself whole, our colossal number dissolves in a tumult of contradiction. What gives? The paradox I’ve just described was first published by Bertrand Russell, who attributed it to a librarian named G. G. Berry. The Berry Paradox arises not from mathematics, but from the ambiguity inherent in the English language. There’s no surefire way to convert an English phrase into the number it names (or to decide whether it names a number at all), which is why I invoked a "reasonable modern mathematician" in the rules for the biggest number contest. To circumvent the Berry Paradox, we need to name numbers using a precise, mathematical notational system, such as Turing machines—which is exactly the idea behind the Busy Beaver sequence. So in short, there’s no wily language trick by which to surpass Archimedes, Ackermann, Turing, and Rado, no royal road to big numbers. You might also wonder why we can’t use infinity in the contest. The answer is, for the same reason why we can’t use a rocket car in a bike race. Infinity is fascinating and elegant, but it’s not a whole number. Nor can we ‘subtract from infinity’ to yield a whole number. Infinity minus 17 is still infinity, whereas infinity minus infinity is undefined: it could be 0, 38, or even infinity again. Actually I should speak of infinities, plural. For in the late nineteenth century, Georg Cantor proved that there are different levels of infinity: for example, the infinity of points on a line is greater than the infinity of whole numbers. What’s more, just as there’s no biggest number, so too is there no biggest infinity. But the quest for big infinities is more abstruse than the quest for big numbers. And it involves, not a succession of paradigms, but essentially one: Cantor’s. ¨ So here we are, at the frontier of big number knowledge. As Euclid’s disciple
supposedly asked, "what is the Imagine trying to explain the Turing machine to Archimedes. The genius of Syracuse listens patiently as you discuss the papyrus tape extending infinitely in both directions, the time steps, states, input and output sequences. At last he explodes. "Foolishness!" he declares (or the ancient Greek equivalent). "All you’ve given me is an elaborate definition, with no value outside of itself." How do you respond? Archimedes has never heard of computers, those
cantankerous devices that, twenty-three centuries from his time, will transact
the world’s affairs. So you can’t claim practical application. Nor can you
appeal to Hilbert and the formalist program, since Archimedes hasn’t heard of
those either. But then it hits you: the Busy Beaver sequence. You define the
sequence for Archimedes, convince him that BB(1000) is more than his
10 Indeed, one could define science as reason’s attempt to compensate for our
inability to perceive big numbers. If we could run at 280,000,000 meters per
second, there’d be no need for a special theory of relativity: it’d be obvious
to everyone that the faster we go, the heavier and squatter we get, and the
faster time elapses in the rest of the world. If we could live for 70,000,000
years, there’d be no theory of evolution, and But Whence the cowering before big numbers, then? Does it have a biological
origin? In 1999, a group led by neuropsychologist
Stanislas Dehaene reported evidence in If Dehaene et al.’s hypothesis is correct, then which representation do we use for big numbers? Surely the symbolic one—for nobody’s mental number line could be long enough to contain , 5 pentated to the 5, or BB(1000). And here, I suspect, is the problem. When thinking about 3, 4, or 7, we’re guided by our spatial intuition, honed over millions of years of perceiving 3 gazelles, 4 mates, 7 members of a hostile clan. But when thinking about BB(1000), we have only language, that evolutionary neophyte, to rely upon. The usual neural pathways for representing numbers lead to dead ends. And this, perhaps, is why people are afraid of big numbers. Could early intervention mitigate our big number phobia? What if second-grade
math teachers took an hour-long hiatus from stultifying busywork to ask their
students, "How do you name really, Who can name the bigger number? Whoever has the deeper paradigm. Are you ready? Get set. Go.
References Petr Beckmann, Allan H. Brady, "The Determination of the Value of Rado’s Noncomputable
Function Sigma(k) for Four-State Turing Machines," Gregory J. Chaitin, "The Berry Paradox," A.K. Dewdney, S. Dehaene and E. Spelke and P. Pinel and R. Stanescu and S. Tsivkin,
"Sources of Mathematical Thinking: Behavioral and Brain-Imaging Evidence,"
Douglas Hofstadter, Robert Kanigel, Stephen C. Kleene, "Recursive predicates and quantifiers," Donald E. Knuth, Dexter C. Kozen, ———, Shen Lin and Tibor Rado, "Computer studies of Turing machine problems,"
Heiner Marxen, Busy Beaver, at http://www.drb.insel.de/~heiner/BB/. ——— and Jürgen Buntrock, "Attacking the Busy Beaver 5," Tibor Rado, "On Non-Computable Functions," Rudy Rucker, Carl Sagan, Michael Somos, "Busy Beaver Turing Machine." At http://grail.cba.csuohio.edu/~somos/bb.html. Alan Turing, "On computable numbers, with an application to the
Entscheidungsproblem," Ilan Vardi, "Archimedes, the Sand Reckoner," at http://www.ihes.fr/~ilan/sand_reckoner.ps. Eric W. Weisstein, |