The Busy Beaver Frontier

Update (July 27): I now have a substantially revised and expanded version (now revised and expanded even a second time), which incorporates (among other things) the extensive feedback that I got from this blog post. There are new philosophical remarks, some lovely new open problems, and an even-faster-growing (!) integer sequence. Check it out!

A life that was all covid, cancellations, and Trump, all desperate rearguard defense of the beleaguered ideals of the Enlightenment, would hardly be worth living. So it was an exquisite delight, these past two weeks, to forget current events and write an 18-page survey article about the Busy Beaver function: the staggeringly quickly-growing function that probably encodes a huge portion of all interesting mathematical truth in its first hundred values, if only we could know those values or exploit them if we did.

Without further ado, here’s the title, abstract, and link:

The Busy Beaver Frontier
by Scott Aaronson

The Busy Beaver function, with its incomprehensibly rapid growth, has captivated generations of computer scientists, mathematicians, and hobbyists. In this survey, I offer a personal view of the BB function 58 years after its introduction, emphasizing lesser-known insights, recent progress, and especially favorite open problems. Examples of such problems include: when does the BB function first exceed the Ackermann function? Is the value of BB(20) independent of set theory? Can we prove that BB(n+1) > 2^BB(n) for large enough n? Given BB(n), how many advice bits are needed to compute BB(n+1)? Do all Busy Beavers halt on all inputs, not just the 0 input? Is it decidable whether BB(n) is even or odd?

The article is slated to appear soon in SIGACT News. I’m grateful to Bill Gasarch for suggesting it—even with everything else going on, this was a commission I felt I couldn’t turn down!

Besides Bill, I’m grateful to the various Busy Beaver experts who answered my inquiries, to Marijn Heule and Andy Drucker for suggesting some of the open problems, to Marijn for creating a figure, and to Lily, my 7-year-old daughter, for raising the question about the first value of n at which the Busy Beaver function exceeds the Ackermann function. (Yes, Lily’s covid homeschooling has included multiple lessons on very large positive integers.)

There are still a few days until I have to deliver the final version. So if you spot anything wrong or in need of improvement, don’t hesitate to leave a comment or send an email. Thanks in advance!

Of course Busy Beaver has been an obsession that I’ve returned to many times in my life: for example, in that Who Can Name the Bigger Number? essay that I wrote way back when I was 18, in Quantum Computing Since Democritus, in my public lecture at Festivaletteratura, and in my 2016 paper with Adam Yedidia that showed that the values of all Busy Beaver numbers beyond the 7910th are independent of the axioms of set theory (Stefan O’Rear has since shown that independence starts at the 748th value or sooner). This survey, however, represents the first time I’ve tried to take stock of BusyBeaverology as a research topic—collecting in one place all the lesser-known theorems and empirical observations and open problems that I found the most striking, in the hope of inspiring not just contemplation or wonderment but actual progress.

Within the last few months, the world of deep mathematics that you can actually explain to a child lost two of its greatest giants: John Conway (who died of covid, and who I eulogized here) and Ron Graham. One thing I found poignant, and that I didn’t know before I started writing, is that Conway and Graham both play significant roles in the story of the Busy Beaver function. Conway, because most of the best known candidates for Busy Beaver Turing machines turn out, when you analyze them, to be testing variants of the notorious Collatz Conjecture—and Conway is the one who proved, in 1972, that the set of “Collatz-like questions” is Turing-undecidable. And Graham because of Graham’s number from Ramsey theory—a candidate for the biggest number that’s ever played a role in mathematical research—and because of the discovery, four years ago, that the 18th Busy Beaver number exceeds Graham’s number.

(“Just how big is Graham’s number? So big that the 17th Busy Beaver number is not yet known to exceed it!”)

Anyway, I tried to make the survey pretty accessible, while still providing enough technical content to sink one’s two overgrown front teeth into (don’t worry, there are no such puns in the piece itself). I hope you like reading it at least 1/BB(10) as much as I liked writing it.
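For readers who want to get hands-on with the definition before opening the survey, it's simple enough to brute-force at tiny sizes. Here's a minimal Python sketch (my own illustration, not code from the survey; the function names are made up) that enumerates every 2-state, 2-symbol Turing machine and finds the longest halting runtime, following Radó's convention that the final transition into the halt state also writes, moves, and counts as a step:

```python
from itertools import product

def run(tm, cap):
    """Run a 2-state, 2-symbol TM from a blank tape; return steps to halt, or None."""
    tape, pos, state, steps = {}, 0, 0, 0
    while steps < cap:
        write, move, nxt = tm[(state, tape.get(pos, 0))]
        tape[pos] = write
        pos += move
        steps += 1          # the halting transition itself counts as a step
        if nxt == 'H':
            return steps
        state = nxt
    return None             # didn't halt within the cap

# Enumerate every machine: 4 table entries, each (write, move, next-state).
entries = list(product([0, 1], [-1, 1], [0, 1, 'H']))
best = 0
for table in product(entries, repeat=4):
    tm = {(0, 0): table[0], (0, 1): table[1], (1, 0): table[2], (1, 1): table[3]}
    s = run(tm, cap=100)
    if s is not None:
        best = max(best, s)
print(best)  # 6, matching the known value S(2) = 6
```

With these conventions the search recovers S(2) = 6. Of course, already at 5 or 6 states, naive enumeration with a fixed step cap tells you nothing — which is the whole point of the subject.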

Update (July 24): Longtime commenter Joshua Zelinsky gently reminded me that one of the main questions discussed in the survey—namely, whether we can prove BB(n+1) > 2^BB(n) for all large enough n—was first brought to my attention by him, Joshua, in a 2013 Ask-Me-Anything session on this blog! I apologize to Joshua for the major oversight, which has now been corrected. On the positive side, we just got a powerful demonstration both of the intellectual benefits of blogging, and of the benefits of sharing paper drafts on one’s blog before sending them to the editor!

404 Responses to “The Busy Beaver Frontier”

  1. Sniffnoy Says:

    Interesting summary!

    Two quick notes:

    1. Conjectures 11 and 12 are interesting. Particularly because, I remember back in the thread where this was all being discussed, I asked Stefan O’Rear if he could get fewer states by focusing on PA rather than ZF, and he was like, actually, ZF is easier! But like… ultimately that can’t be the case, right? And yet it is surprising that at the moment nobody seems to know how to do better on PA than on ZF.

    2. I have to nitpick, but my understanding is that the significance of Graham’s number is largely a fiction; there was never actually any version of Graham’s proof that used this number as an upper bound. Or at least that’s what John Baez says Graham told him.

    And, heh, I see you closed down the other thread before I had a chance to go back and reply to stuff… ah well, maybe best to avoid getting into such arguments, heh.

  2. Harvey Friedman Says:

    “the values of all Busy Beaver numbers beyond the 7910th are independent of the axioms of set theory (Stefan O’Rear has since shown that independence starts at the 748th value or sooner).”

    One might be a little more precise here and say that for c = 7910 (later improved to 748),

    for all n >= c and m, the statement BB(n) = m is unprovable in ZF(C).

    When put this precisely, this raises the following question about BB:

    QUESTION. Let n be fixed and suppose that for all m, the statement BB(n) = m is unprovable in ZF(C). Then is it true that for all r, the statement BB(n+1) = r is unprovable in ZF(C)?

    From work of Goedel, we know that ZF and ZFC are equivalent for these questions.

  3. Jon Awbrey Says:

    One of my favorite fast functions …


  4. Jon Awbrey Says:

    Here’s another one …

  5. Harvey Friedman Says:

    I can now answer my question in the previous post.

    THEOREM. Suppose ZF(C) proves BB(n) = m. Then every TM with at most n states that doesn’t halt can be proved in ZF(C) to not halt. Conversely, if every TM with at most n states that doesn’t halt can be proved in ZF(C) to not halt, then for some m, BB(n) = m is provable in ZF(C).

    Proof: For the first claim, just use that: ZF(C) sees that a TM with <= n states halts if and only if it halts in <= m steps. For the second claim, assume

    *) every TM with at most n states that doesn't halt can be proved in ZF(C) to not halt.

    We now show how to determine BB(n) within ZF(C). Let m be the actual value of BB(n). Then ZF(C) correctly identifies the halting TMs with at most n states by waiting m steps. We need to see that ZF(C) can actually prove that the other TMs with at most n states do not halt. That is clear by *). QED

    COROLLARY. If BB(n) = m is provable in ZF(C) then for some r, BB(n+1) = r is provable in ZF(C).

  6. Jon Awbrey Says:

    All my favorite integer sequences, a few of them very fast growing, spring from the “lambda point” where graph theory, logic, and number theory meet, going back to a time when I was playing around with Gödel numbers of graph-theoretic structures and thinking about computational complexity. I posted a couple of links to the OEIS earlier but they must have fallen into the spam trap. I’ll try this non-linky comment for now and add links later.

  7. wolfgang Says:

    The discussion of BB(n) usually just distinguishes halting vs non-halting TMs, but I think it would be interesting to further distinguish the non-halting, e.g. as follows: non-halting which produces a finite pattern of 0s and 1s; non-halting which produces an infinite pattern of low complexity, e.g. 01010101…; and finally non-halting which produces a pattern indistinguishable from random, e.g. calculating pi, etc.

    And I think it would be interesting to know something about the ratio of such TMs, basically the ratio of ‘boring’ TMs (stuck in a loop or finishing early) vs the ‘interesting’ TMs.
    I think both TMs which stop after a finite but very large number of steps and TMs which never stop but create complicated patterns are ‘interesting’, and suspect that the ratio of ‘interesting’ to ‘boring’ TMs quickly tends to zero…

  8. Michael Raskin Says:

    When you mention Rayo’s construction, you say that it is too shaky; is it known, though, that uniquely-ZF-definable numbers grow faster than BB of rank of largest constructive ordinal provable in ZF?

    Also, maybe SKI combinator logic (with number of reductions as runtime) could also be mentioned as a contender for a well-known succinctly-definable model where reasonably short program can already run for a long time?
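To fix a convention for "number of reductions as runtime," here is a hypothetical minimal SKI reducer in Python (my own sketch; leftmost-outermost reduction, with every contraction of an S, K, or I redex counted as one step):

```python
def reduce_once(t):
    """One leftmost-outermost reduction step; returns (term, reduced?).
    Terms are 'S', 'K', 'I', variables, or application pairs (f, a)."""
    if isinstance(t, tuple):
        f, a = t
        if f == 'I':                                   # I x -> x
            return a, True
        if isinstance(f, tuple) and f[0] == 'K':       # K x y -> x
            return f[1], True
        if isinstance(f, tuple) and isinstance(f[0], tuple) and f[0][0] == 'S':
            x, y, z = f[0][1], f[1], a                 # S x y z -> x z (y z)
            return ((x, z), (y, z)), True
        nf, r = reduce_once(f)                         # otherwise reduce inside,
        if r:
            return (nf, a), True
        na, r = reduce_once(a)                         # left to right
        if r:
            return (f, na), True
    return t, False

def normalize(t, cap=10_000):
    """Reduce to normal form, counting steps; None if the cap is hit."""
    for steps in range(cap):
        t, r = reduce_once(t)
        if not r:
            return t, steps
    return None

# S K K behaves as the identity: (((S K) K) x) reduces to x in two steps.
print(normalize(((('S', 'K'), 'K'), 'x')))  # ('x', 2)
```

Under a convention like this, "shortest term exceeding a given reduction count" is a perfectly well-posed BB analogue.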

  9. Stephen L Says:

    Mathematically-literate non-computer-scientist here. Nice article! I was able to follow until section 3. This paragraph lost me a bit:

    “Define the “super Busy Beaver function,” BB_1(n), exactly the same way as BB(n), except that the Turing machines being maximized over are now equipped with a HALT oracle in some suitable way. (The “original” BB function would then be BB_0(n).) Since the arguments in Section 2 relativize, we find that BB_1(n) dominates not only BB(n) itself, but any function computable using a BB oracle.”

    How does having a Halt oracle allow for longer-running programs?

  10. Joshua B Zelinsky Says:

    @ Stephen L, #9

    “How does having a Halt oracle allow for longer-running programs?”

    It may be easier as an intuition pump to consider a Turing machine which, rather than a Halting oracle, has a Busy Beaver oracle (that is, an oracle for your regular Busy Beaver function). If so, then for even modest n, one can build an n-state machine that does something like “count to BB(2^n),” or a machine that does “find k = BB(n-10), and then count to BB(k).” The number of transitions here should intuitively grow much faster than BB(n).

  11. Oleg Eterevsky Says:

    Not claiming any theoretical value, but last autumn as a toy project I decided to find some Busy Beavers in Brainfuck (with some limitations). TL;DR: I ran all the BF programs up to length 18 and solved the halting problem for all programs up to length 12. The longest-running program that I found is this one: >+[>++>+++[->]+. It runs for 9213 steps.
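For anyone who wants to try a search like this, here is a hedged sketch of a minimal Brainfuck interpreter in Python (my own code, not Oleg's; no I/O commands, balanced brackets assumed, and each executed command counted as one step — a different step-counting convention than his would give different totals). It's shown on a tiny balanced toy program rather than the champion quoted above:

```python
def bf_run(prog, cap=100_000):
    """Interpret Brainfuck (no I/O); returns (steps, tape) on halt, else None."""
    # Precompute matching brackets (assumes prog is balanced).
    match, stack = {}, []
    for i, c in enumerate(prog):
        if c == '[':
            stack.append(i)
        elif c == ']':
            j = stack.pop()
            match[i], match[j] = j, i
    tape, pos, pc, steps = {}, 0, 0, 0
    while pc < len(prog):
        if steps >= cap:
            return None                      # treated as non-halting
        c = prog[pc]
        if c == '+':
            tape[pos] = tape.get(pos, 0) + 1
        elif c == '-':
            tape[pos] = tape.get(pos, 0) - 1
        elif c == '>':
            pos += 1
        elif c == '<':
            pos -= 1
        elif c == '[' and tape.get(pos, 0) == 0:
            pc = match[pc]                   # skip the loop body
        elif c == ']' and tape.get(pos, 0) != 0:
            pc = match[pc]                   # jump back to re-enter the loop
        pc += 1
        steps += 1
    return steps, tape

# Toy example: 3 loop iterations, 22 executed commands, cell 1 ends at 6.
print(bf_run('+++[->++<]'))  # (22, {0: 0, 1: 6})
```

Wrapping this in an enumerator over all programs of a given length, with a step cap and some loop-detection heuristics, reproduces the shape of the search described above.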

  12. Scott Says:

    Harvey Friedman #2 and #5: Granting that you surely forget more logic and computability theory in an hour than I’ve learned in my entire life, isn’t that observation implicitly right there in Proposition 4 in the survey? 🙂

    (Separately, in your corollary at the end, shouldn’t “provable” be “unprovable”?)

  13. Idle Squirrel Says:

    I know I should probably feel bad for tuning into the comment section before reading the paper to check if someone has already found a way to politicize or culture-weaponize BB (e.g., by suggesting that the term “beaver frontier” is offensive to… rural Canadians).

    So how about preemptively turning it into an argument about the supremacy of computable numbers instead – a number that cannot be computed should not be said to be bigger than any number that can be – because come on, that’s cheating.

  14. fred Says:

    My first encounter with the Busy Beaver was in 1984, in the Scientific American column “Computer Recreations” by A.K. Dewdney.

    The SA official site only has his full article on Mandelbrot:

    But you can find many of his columns (including the Beaver one) in the book:

  15. wolfgang Says:

    @Idle Squirrel #13 >> that’s cheating

    But cheating is sometimes quite interesting, e.g. in a ‘name the bigger number’ contest I would go with S + 1 , where S is the largest well-defined number ever mentioned by Scott or one of his commenters on this blog.

    You may say that this is quite fuzzy, because S might change over time, but I think this loophole can be fixed by considering a Turing machine capable of simulating Scott and his commenters … including this one.

  16. Scott Says:

    wolfgang #15:

      in a ‘name the bigger number’ contest I would go with S + 1 , where S is the largest well-defined number ever mentioned by Scott or one of his commenters on this blog.

    The trouble is, you’re a commenter on this blog, and you just mentioned S+1. So to whatever extent S is well-defined at all, we get S≥S+1, an obvious absurdity.

  17. Scott Says:

    fred #14: Hey, I also first learned about Busy Beaver from A. K. Dewdney’s The New Turing Omnibus! I think when I was 16. And I wondered why no one had told me about such huge numbers earlier, and figured I’d tell my own kids as early as possible when and if I had any. 😀

  18. Scott Says:

    Idle Squirrel #13: If the Busy Beaver function is considered suspect because of noncomputable supremacy, then we could also consider the Idle Squirrel function, defined as the least number of steps made by any n-state Turing machine on an all-0 input. As IS(n)=1 for all n, my survey article on The Idle Squirrel Frontier would have the advantage of being a lot shorter… 😀

  19. Scott Says:

    Sniffnoy #1:

    1) Yes, PA has to be easier than ZFC. I guess Stefan was saying that some particular way to encode PA is even worse than what he got for ZFC, but if so, then it must be a bad encoding.

    2) Yeah, I knew from the Wikipedia page that Graham’s number is a looser upper bound that Graham explained to Martin Gardner; what appeared in Graham’s paper was a tighter (though still incomprehensibly enormous) bound that’s harder to explain. That seemed legit though, since even for mathematical reasons one might prefer a simpler, looser bound.

  20. Dangernorm Says:

    Is there a stylistic requirement that you use dismissive scare quotes around names that don’t, or that you believe don’t, match the names people use to file their tax paperwork? If someone makes a significant enough contribution that you’d want to reference them at all, surely we can respect their decision to do so under the name Wythagoras, or any other. I assume you wouldn’t do the same for the self-selected name of transgender persons, even if you knew that they hadn’t yet filed formal name change paperwork.

  21. Scott Says:

    Michael Raskin #8:

      When you mention Rayo’s construction, you say that it is too shaky; is it known, though, that uniquely-ZF-definable numbers grow faster than BB of rank of largest constructive ordinal provable in ZF?

    I and others debated some of these issues years ago on this MathOverflow page. I also talked them over with Agustin Rayo, my former colleague at MIT (and a cool guy), who completely agreed that his number is ill-defined given the philosophical commitments that I’m willing to make.

    To make a long story short, my current understanding is that, if
    (1) someone has a way of uniquely defining a huge number using a ZF predicate, and
    (2) I’d be willing to admit their number as well-defined (e.g., not depending on an intended model of set theory),
    then it should be possible to simulate their construction using an ordinal BB function. Or to say it another way: I don’t see set theory as having a magical power to make integers well-defined that otherwise wouldn’t be. If an integer is well-defined, then I’d like its definition to ultimately be in terms of first-order quantification over the integers (possibly with an ordinal number of quantifiers), in which case that integer will be “ordinal-BB-simulable.” The proof of the construction’s soundness might depend on highfalutin set theory (for example, large-cardinal axioms to establish the existence of the requisite ordinal), but the construction itself shouldn’t.

    But if anyone wants to revisit that debate in this thread, I won’t stop them…

  22. Scott Says:

    wolfgang #7:

      The discussion of BB(n) usually just distinguishes halting vs non-halting TMs, but I think it would be interesting to further distinguish the non-halting, e.g. as follows: non-halting which produces a finite pattern of 0s and 1s; non-halting which produces an infinite pattern of low complexity, e.g. 01010101…; and finally non-halting which produces a pattern indistinguishable from random, e.g. calculating pi, etc.

    Yes, I mention that distinction in Section 5.6 of the survey. In practice, when you’re trying to calculate BB numbers, the non-halting machines that you’re worried about are virtually all ones that generate non-repeating patterns. If it’s a repeating pattern, then it tends to be easy to detect that, prove the machine never halts, and discard it.

      And I think it would be interesting to know something about the ratio of such TMs, basically the ratio of ‘boring’ TMs (stuck in a loop or finishing early) vs the ‘interesting’ TMs.
      I think both TMs which stop after a finite but very large number of steps and TMs which never stop but create complicated patterns are ‘interesting’, and suspect that the ratio of ‘interesting’ to ‘boring’ TMs quickly tends to zero…

    No, that’s not the case. See for example the literature on Chaitin’s constant Ω. Once you have a prefix-free encoding scheme (so that the notion of a “random program” makes sense at all), the proportion of programs displaying basically any nontrivial behavior you want (e.g., generating an infinite non-repeating pattern) is going to be some uncomputable real. So in particular, it will be bounded away from 0.

  23. DangerNorm Says:

    Actually, on the subject of math results posted by Internet users, have you heard of the paper, A lower bound on the length of the shortest superpattern, which credits Anonymous 4chan Poster as the main contributor? I expect that the number of significant results credited by other-than-legal-names will only increase with time.

  24. Scott Says:

    Dangernorm #20:

      Is there a stylistic requirement that you use dismissive scare quotes around names that don’t, or that you believe don’t, match the names people use to file their tax paperwork? If someone makes a significant enough contribution that you’d want to reference them at all, surely we can respect their decision to do so under the name Wythagoras, or any other. I assume you wouldn’t do the same for the self-selected name of transgender persons, even if you knew that they hadn’t yet filed formal name change paperwork.

    Aha, thank you! This is 2020, so I knew someone would find something to take offense at, even in a survey article about an uncomputable integer sequence. 😀

    I have nothing but admiration for anyone who discovered that BB(7) > 10^10^10^10^18,000,000. The trouble is, there’s not much tradition in academia for publishing original research under pseudonymous handles—the closest to a counterexample that I could think of was Bourbaki, which of course published under the same name for many decades. Academics can change their names, as many trans people do (or for that matter, spouses who take a new surname when they get married). But academic exchange usually does presuppose some level of consistency in what name a person is known by, so that they can stay accountable for what they said and also so that it’s easy for others to credit their contributions.

    In the case at hand, the point is not just that Wythagoras presumably doesn’t use that name to file taxes—rather, it’s that as far as I know, they haven’t used that name for anything besides a few forum posts about large numbers. Indeed, I had wanted to contact Wythagoras to solicit feedback on my survey, but couldn’t find any way to do that. (Wythagoras, if you’re reading this now: big fan of your posts; please get in touch! 🙂 ) And what if someone had questions about the veracity of the result, which was described rather briefly on the forum and wasn’t peer-reviewed?

    I’m extremely far from the reactionary camp that says “if it’s not in a peer-reviewed journal, it doesn’t exist.” But I think we’re still negotiating the norms for results that, in some cases, exist only as pseudonymous blog comments with no way to contact the author. And this issue came to the fore with my Busy Beaver survey—given the centrality, especially recently, of online contributions outside the normal academic process.

  25. DangerNorm Says:

    I see. I’m not offended, but I am in favor of expanding the area of society in which people may operate without pre-doxing themselves.

    My own sense is that mathematics should be the most open to this, since the reader can scarcely understand what a math paper is even trying to say without joining the author hand-in-hand, step-by-step. There is not the issue one has with, for example, the collection of datasets, whereby one must put trust in the process, even if the alleged data itself is included. The proof could as well have been published by a university, posted on 4chan (as in the paper mentioned above), or recovered from carvings in Antarctic passages in an unknown language, but with clear enough pictures that you can follow the geometric construction; you’re either persuaded by the proof or you aren’t.

    As someone more embedded in the academic process than myself, does this match your understanding?

  26. Stephen Jordan Says:

    At a time like this it is very valuable to have a mathematical world into which one can be absorbed. I’ve chosen the world of tensor rank and algebraic complexity. I can’t quite put my finger on what specific property makes certain mathematical subjects particularly suitable for this but I think Busy Beaver has it.

    The hard part is keeping the external world from intruding. I could be bounded in a nutshell and count myself king of infinite space were it not that I have facebook.

  27. Idle Squirrel Says:

    Satoshi Nakamoto comes to mind as having made a rather significant impact with a pseudonymous paper even if not in the classic academic sense.

  28. John Michael Says:

    I found your “Bigger Number” essay when I was a kid and was utterly fascinated (and also made deeply confused and curious by the whole concept of uncomputability). It’s probably still buried near the bottom of the bookmarks on the family computer, heh. Cool to be reading an even more fascinating piece on an overlapping subject all these years later! (And a bit embarrassing to find out I’m now older than you were when you wrote that article!)

    Thanks for being such an inspiring & engaging writer!

  29. Jacob Manaker Says:

    You mention in the paper that the optimal functions for small n seem to be running Collatz-type iterations. This would make sense if they “unpacked” into some larger function via a FRACTRAN-type encoding.

  30. Gautham Kamath Says:


    In the article, the lowest N for which the value of BB(N) can’t be proved in ZF is N = 748. Similarly, the current lowest N for which knowing BB(N) would settle Riemann is N = 744, and for Goldbach N = 27.

    1) If someone manages to lower the ZF bound to N < 744, but let's say that we magically know that N = 744 can't be improved upon for Riemann. Would that mean that Riemann is not provable in ZF?

    2) Do you think that the above current bounds hint that proving Riemann is more difficult than proving Goldbach?

  31. Scott Says:

    DangerNorm #25:

      The proof could as well have been published by a university, posted on 4chan (as in the paper mentioned above), or recovered from carvings in Antarctic passages in an unknown language, but with clear enough pictures that you can follow the geometric construction; you’re either persuaded by the proof or you aren’t.

      As someone more embedded in the academic process than myself, does this match your understanding?

    The trouble is that it doesn’t. Like, you’ve described the Platonic ideal of math research, and it used to be more like that, and it’s still like that for certain problems and in certain areas of math. But many modern proofs are insanely complicated, and they depend on previous work in messy and incompletely-specified ways, and include many steps like “this is handled using the standard tools.” It’s typically unrealistic that even world experts would be able to follow such a proof without some back-and-forth with the author. (By analogy, even if your code contains all the essential ideas of a modern operating system, it’s probably not going to boot on the first try.)

    Sure, after a long refereeing process, a product will hopefully emerge that can be understood with no further input from the author. But the forum posts that we’re talking about are not refereed! (At most they’re commented on by other forum users.)

    These problems are compounded in the specific case of searches for 6- and 7-state Busy Beavers, which involve a mixture of informal reasoning and the results of running simulations. In cases like this, even if code is available for download, rather than struggling to get the code to work, most people are just going to trust the author about what the results were, and are also going to trust the author that the code is doing what it’s supposed to do and that it correctly links up with the informal reasoning.

    Eventually, I expect people to find solutions to these problems—which might involve automated proof-checking, or decoupling peer review from journal publication, or letting authors have pseudonymous handles by which they’re reachable even decades later, or something else. But right now we’re in a transitional era where the solutions haven’t yet emerged.

  32. Scott Says:

    Stephen Jordan #26:

      I could be bounded in a nutshell and count myself king of infinite space were it not that I have facebook.

    I’m tempted to make that my blog’s new tagline! 😀

  33. Scott Says:

    Idle Squirrel #27:

      Satoshi Nakamoto comes to mind as having made a rather significant impact with a pseudonymous paper even if not in the classic academic sense.

    Yes, good, that’s another big example! Besides Satoshi and Bourbaki, what others are there?

  34. zjin Says:

    I wonder if you would comment on the news of National Quantum Internet. It seemed that the U.S. Department of Energy and the University of Chicago would have a major project to build this.

  35. Scott Says:

    Jacob Manaker #29:

      You mention in the paper that the optimal functions for small n seem to be running Collatz-type iterations. This would make sense if they “unpacked” into some larger function via a FRACTRAN-type encoding.

    While I don’t know FRACTRAN well, the Collatz-like iteration could itself be seen as a sort of unpacking: we start with 0, which then gets successively “unpacked” into larger and larger positive integers, but only a finite number of times until the iterative process terminates for a modularity reason. But in addition to that, there’s some “unpacking” in a Turing machine with only 5 states implicitly encoding a relatively complicated Collatz-like iteration.
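To illustrate the “unpacking” phenomenon, here is the classic example in a few lines of Python (a hedged illustration only: the map actually encoded by the 5-state champion is a different, Collatz-like map, not the original Collatz map shown here):

```python
def collatz_steps(n):
    """Number of applications of the Collatz map n -> n/2 or 3n+1 to reach 1."""
    steps = 0
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps += 1
    return steps

# A tiny seed "unpacks" into a long terminating trajectory:
print(collatz_steps(27))  # 111 steps, peaking at 9232 along the way
```

The analogy: a 5-state table is the tiny seed, and the long-but-finite trajectory is the astronomical running time.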

  36. Scott Says:

    Gautham Kamath #30:

      1) If someone manages to lower the ZF bound to N < 744, but let's say that we magically know that N = 744 can't be improved upon for Riemann. Would that mean that Riemann is not provable in ZF?

    Yes, it would mean that. As I point out in footnote 18, the moment the Riemann hypothesis was proven, the truth of RH would then be proved to be equivalent to the non-halting of a one-state Turing machine—namely, one that just goes into a trivial infinite loop! 😀

      2) Do you think that the above current bounds hint that proving Riemann is more difficult than proving Goldbach?

    No, not necessarily. In practice, the number of states needed to encode a conjecture via a Turing machine need not have any correlation with the difficulty of proving it.
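To make the encoding idea concrete, here is a hedged Python sketch (my own illustration; the real 27-state machine is a Turing-machine version of the same loop): a program that searches for a Goldbach counterexample halts if and only if the conjecture is false. The bound below exists only so the sketch terminates.

```python
def is_prime(n):
    """Trial-division primality test (fine for small n)."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def goldbach_counterexample(limit):
    """First even n in [4, limit] that is not a sum of two primes, else None.
    Dropping the limit gives a program that halts iff Goldbach is false."""
    for n in range(4, limit + 1, 2):
        if not any(is_prime(p) and is_prime(n - p) for p in range(2, n - 1)):
            return n
    return None

print(goldbach_counterexample(1000))  # None: no counterexample below 1000
```

Note that nothing about the loop's size tracks the difficulty of the conjecture, which is exactly the point of the answer above.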

  37. Scott Says:

    zjin #34: Sorry, I haven’t read about it and don’t have a comment right now.

  38. CC Says:

    Thanks for the article. Just wanted to note my appreciation since I had not really read about BB before and you made me do it and it was rewarding.

  39. Gautham Kamath Says:

    Scott #36: Thanks for the explanation!

    Unrelated: You gave a talk at my company (Cirrus Logic) last year and it was pretty well received; attendance was significantly higher than for the professor talks we usually have. So the organizer would like to have future speaker talks about Quantum Computing and wanted suggestions from me. I said that since you were on the theoretical computer science side, maybe we should now get someone from the engineering/experimental side, and suggested he try to reach out to the Google engineering team, since they had the quantum supremacy milestone several months ago.

    I know you are friends with some of those guys; perhaps there is a name you can suggest, someone who is a good expositor, just like you? Or some other engineer/experimentalist colleague of yours that has nothing to do with Google?

  40. Scott Says:

    CC #38: Thanks!

  41. Scott Says:

    Gautham #39: If you wanted a QC experimentalist from UT Austin, Shyam Shankar is your man. Though in the Covid/Zoom era, I guess proximity no longer matters, so you could also reach out to Google folks in Santa Barbara, like Ryan Babbush or Sergio Boixo.

  42. Mark Marshall Says:

    Scott #33

    At school, I learnt about the “Student T Test”, which was published by “Student”. He was also the head brewer at Guinness, which tells us all we need to know about statistics.

  43. Filip Says:

    Scott #33: How did you forget G. W. Peck? It’s the pseudonym of Ronald Graham, Erdős, Douglas West, George B. Purdy, Fan Chung, and Daniel Kleitman.

  44. asdf Says:

    The citations to the googolology wiki remind me of the following extremely nerdy passage from HPMOR:

    “Meself,” Hagrid continued, “I think we might ‘ave a Parisian hydra on our ‘ands. They’re no threat to a wizard, yeh’ve just got to keep holdin’ ’em off long enough, and there’s no way yeh can lose. I mean literally no way yeh can lose so long’s yeh keep fightin’. Trouble is, against a Parisian hydra, most creatures give up long before. Takes a while to cut down all the heads, yeh see.”

    “Bah,” said the foreign boy. “In Durmstrang we learn to fight Buchholz hydra. Unimaginably more tedious to fight! I mean literally, cannot imagine. First-years not believe us when we tell them winning is possible! Instructor must give second order, iterate until they comprehend.”

    I had to look it up: the Googolology wiki describes the Buchholz hydra here.

  45. John Michael Says:

    Question about uncomputability and definability:

    So, the value of BB(748) is independent of ZFC. As I understand it, by Gödel’s completeness theorem, this implies that in different models of ZFC, BB(748) has different values. Right? That’s pretty weird though, since if you had two competing values for BB(748) proven in two different extensions of set theory, then given enough time and space and negentropy, you could just run the machines and check which is right…

    I’m missing some things here but I’m not sure what they are… Does it have something to do with “nonstandard models” or whatever and weird theories like ZFC+¬Con(ZFC)? Scott, why do you consider BB(748) to be mathematically well-defined, even though you don’t consider Rado’s 2nd order number to be? Don’t both change depending on which model of set theory you’re in?

  46. lazyworm Says:

    About your conjecture that BB(20,2) is independent of ZF: one can speculate that BB(10,3) should also be independent of ZF… and then that there exists an integer k such that BB(2,k) is independent of ZF.

  47. Oscar Cunningham Says:

    I realised an interesting thing while reading this. If we define the theory T_n as PA + ‘the nth Busy Beaver machine is b’, where b actually is the nth Busy Beaver machine, then the T_n form a sequence of effectively axiomatized consistent theories with arbitrarily high consistency strength! For any other effectively axiomatized consistent theory S, there’s some n_S such that PA+Con(T_n_S) proves Con(S).

    So the Busy Beaver numbers give a countable ladder on which we can place the various Large Cardinal axioms for comparison to each other. Previously I’d been assuming that the structure of all possible Large Cardinal axioms was much more complicated than that, and that the order of their consistency strengths would be transfinite, with no hope of having countable cofinality.

  48. Raoul Ohio Says:

    Scott #24,

    Pretty sure Student of “Student T” test fame is not her/his real name.

  49. asdf Says:

    John Michael #45, !CON(ZFC) asserts that a certain Turing machine (one that searches for a proof that 1=0) eventually halts. Assuming ZFC is actually consistent, in a model of ZFC with standard integers, this TM never halts, so some other TM is the busy beaver (the longest-running halting TM) for that number of states. In an alternate model with nonstandard arithmetic, the 1=0 TM halts after a nonstandard number of steps, but that is larger than any standard integer, so there is no way to actually run the TM for that long to see what happens. In particular, the busy beaver in that model would have a nonstandard running time.

  50. Laurent Claessens Says:

    If I understand correctly, there is a small typo on page 8 : “For completeness, here are the Busy Beavers and (for n ≥ 5)” It should be “n <= 5".

    I've a question about "usefulness" of BB in the following sense : Is there a question in mathematics that
    1. can be asked without knowing about BB
    2. cannot be answered without using some BB ?

  51. Scott Says:

    Oscar Cunningham #47: Yes!!!! That’s a beautiful observation. Would you mind if I included it in my survey, crediting you?

    In some sense, I suppose it’s obvious that we can order all effectively axiomatized theories T along a countable ladder—namely, the ladder of “the number of bits needed to write a program that enumerates all the theorems of T”—which is all that the BB ladder really is. The less obvious part is that the steps on this ladder are themselves theories that are ordered by consistency strength: namely, your theories of the form “PA + BB(n)=k.”

    Here’s something that I’m now wondering. If we consider popular theories like ZF, is their consistency provably equivalent to some statement of the form BB(n)=k? Or is Con(ZF) sandwiched between different such statements? In other words, let x≥5 be the first integer for which ZF doesn’t prove the value of BB(x), and let y≤748 be the first integer for which the value of BB(y) implies Con(ZF). Clearly x≤y. But is there a gap between x and y, and if so, how large is it?

  52. Scott Says:

    Laurent Claessens #50:

      If I understand correctly, there is a small typo on page 8 : “For completeness, here are the Busy Beavers and (for n ≥ 5)” It should be “n <= 5".

    No, that’s not a typo. For n≥5, no one has established what the Busy Beavers are; we only have candidates.

      I’ve a question about “usefulness” of BB in the following sense : Is there a question in mathematics that
      1. can be asked without knowing about BB
      2. cannot be answered without using some BB ?

    Probably the best example is, “Here’s someone who never studied computability or logic. What’s a positive integer that’s vastly bigger than anything they could write down within the lifetime of the observable universe?” 😀

    More seriously, I sometimes see BB show up in papers on Kolmogorov complexity, sophistication, depth, Chaitin’s Ω, and other topics in computability theory. In all these cases, undoubtedly one could “route around” BB if one really wanted to, and phrase everything in terms of other ways to make computability quantitative, like K and Ω. But why? The relevant question is just whether BB is a useful concept, and I think it clearly is, at least for anyone who cares about any quantitative question that touches on computability.

  53. Scott Says:

    John Michael #45: Yes, it has to do with nonstandard models. The situation is this: let G be the actual value of BB(748), whatever it is.

    (And if you don’t accept that there’s such a thing as the “actual value” of BB(748), then get out of the room! 😀 For me, if there’s no objective fact of the matter about whether a given Turing machine halts or runs forever on an all-0 input—separate from the question of that fact’s provability—then there’s no objective fact of the matter about anything, including anything that we’re talking about in this very conversation.)

    Still with me? OK then, ZF clearly proves that BB(748)≥G, by just simulating a 748-state Busy Beaver for the requisite number of steps. So every model of ZF “knows” that BB(748)≥G. But some models of ZF “incorrectly believe” that BB(748) is strictly greater than G. None of these models “believe” that BB(748)=H, for any positive integer H strictly greater than G—if they did, that would easily lead to a contradiction. Their “belief,” so to speak, is instead that BB(748) equals a nonstandard integer. Which is the same as saying: these models “believe” that some particular 748-state TM halts, even though in actual reality that TM runs forever. By Gödel, this false belief can never be disproved within ZF, but it’s false all the same.

    In other words: metatheoretically, we can say that these theories, though consistent, are all “pathological” and arithmetically unsound, in exactly the same way that ZF+Not(Con(ZF)) is “pathological” and arithmetically unsound.

  54. Oscar Cunningham Says:

    Scott #51: Of course I don’t mind!

  55. Sniffnoy Says:

    A bunch more mathematicians’ pseudonyms here. Some of these are pseudonyms they published mathematics under; some are not.

    Also, some other amusing irregularities in academic authorship: F. D. C. Willard and G. Mirkwood 🙂

  56. Jon Awbrey Says:

    Dear Scott,

    I put a few links related to my previous comments in a blog post —

    🙞 Riffs and Rotes • 5

  57. DR Says:

    An option I have seen and maybe tried myself when referring to internet pseudonyms is to cite “user X”, where X is the username, but with no quotes in the text. So for instance, one can cite user Wythagoras on Googolology, rather than “Wythagoras”. Or, to pick a rather well-known anonymous mathematician, user quid on MathOverflow (no longer active, sadly), rather than “quid”.

  58. Cecile McKee Says:

    I’ve just met your blog (by following a link from Pinker’s website). I love it! And, from what I can tell of your eclectic interests, I thought you might enjoy this “Mental Floss” piece:

    You: “OK, I mused, how many people have even heard of the Linguistics Society of America, compared to the number who’ve heard of Pinker or read his books?”

  59. Joshua B Zelinsky Says:

    One other thought which may or may not be useful:

    On page 16, you ask “do all Busy Beavers halt on all finite inputs?” While my guess is that the answer is probably no, it might be possible to prove results of the sort which I believe you once characterized as something like “If pigs fly, then the moon is not made of green cheese.” In particular, if we assume that all Busy Beavers do halt on all finite inputs, then subject to that assumption, it looks like conditionally proving that BB(5) = 47,176,870 might become easier. For each of the remaining 25 five-state Turing machines, it would then suffice to find some initial tape configuration on which we can prove that machine does not halt. That seems much easier than proving it for the specific case of not halting on the blank tape.

  60. Persona Says:

    “One can even define, for example, BBωZF (n), where ωZF is the computable ordinal that’s the supremum of all the computable ordinals that can be proven to exist in ZF set theory. Or BBωLC (n), where LC is some large-cardinal theory extending ZF, and ωLC is the computable ordinal that similarly encodes LC’s power.”

    BBωLC (n) is unambiguously defined as long as LC is effectively axiomatized and consistent, right? Could we define BBmax(n) = max_LC BBωLC (n), with LC running over all effectively axiomatized consistent theories extending ZF?

  61. Scott Says:

    Persona #60: For a fixed value of n, why would we expect that maximum to exist?

  62. Persona Says:

    Scott #61: Okay, next try. Define some enumeration of all effectively axiomatized extensions of ZF.

    Let L(m,n)=max_(number(LC) < m and LC is consistent) BBωLC (n).

    Then L(n)=L(n,n) should be well-defined.

  63. Michael Says:

    Scott, an off topic question- I was just reading Jonathan and Jesse Kellerman’s murder mystery Half Moon Bay and at one point the protagonist questions a physics professor named DELIA Moskowitz. Do either you or Dana know the Kellermans?

  64. Job Says:

    So BB(n) is basically a race to see which n-state machine encodes the largest number in its run time.

    For large enough n we could embed a random number generator, as well as a seed, such that it runs until it sees k consecutive heads, for some large k – just let the seed do the work.

    And if we embed an m-state simulator (m less than n), then k could be another BB(m). That’s like raising BB(m) to the power of RNG. 🙂

    Not surprised it grows so fast, very interesting stuff.

    (BTW the comment preview is having issues with < characters).

  65. asdf Says:

    There is an MO thread about math pseudonyms:

  66. Nick Says:

    This is exactly the kind of “simple-but-deep” inquiry that brought me to this blog in the first place, and I imagine that goes for many other readers as well. Let’s all try to keep that common ground in mind in the political discussions that take place on this blog. That’s at least as much a reminder to myself as to anyone else!

    Questions / comments:

    1. Say BB(n) is provable in T, but BB(n + 1) is not. In a qualitative sense, what is it that changes to make that the case? What is the difference between n and n + 1? Why is the cutoff point n + 1 and not n? It’s like you’re walking down a staircase, and then all of a sudden you’ve taken a step into a bottomless pit. I hope this question makes sense despite being vague.

    2. Let CC(n) be the number of machines in T(n) that halt in BB(n) steps. What is known about CC? Sorry if this is addressed in the article and I missed it.

    3. Scott: I am far from an expert, but I have read the TeXBook, and your definition of BB on page 2 strikes me as ugly-looking TeX. I don’t have any suggestions for how to make it look better, except to ask “What would Knuth do?” (Knuth, if you are reading this, please advise.)

    4. BB(1) through BB(4) is [1, 6, 21, 107]. As I write this, there is exactly one sequence in the Online Encyclopedia of Integer Sequences containing that subsequence, and it is BB [1]. So here is a challenge for anyone who is really bored: devise a natural (i.e. non-artificial) integer sequence containing [1, 6, 21, 107] as a subsequence, preferably as its first four values.
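    Incidentally, the small values are easy to check by direct simulation. Here is a minimal sketch in Python (the transition table is the well-known 2-state Busy Beaver champion; the encoding is my own, not anything from the survey), confirming that the champion runs for exactly BB(2) = 6 steps on the blank tape, leaving 4 ones:

```python
from collections import defaultdict

def run(machine, max_steps=10**6):
    """Simulate a 2-symbol Turing machine on the all-0 tape.
    Returns (steps until halting, number of 1s left on the tape)."""
    tape = defaultdict(int)          # unwritten cells read as 0
    pos, state, steps = 0, 'A', 0
    while state != 'H' and steps < max_steps:
        write, move, state = machine[(state, tape[pos])]
        tape[pos] = write
        pos += 1 if move == 'R' else -1
        steps += 1
    return steps, sum(tape.values())

# The 2-state champion: (write, move, next state) for each (state, symbol).
bb2 = {('A', 0): (1, 'R', 'B'), ('A', 1): (1, 'L', 'B'),
       ('B', 0): (1, 'L', 'A'), ('B', 1): (1, 'R', 'H')}

print(run(bb2))  # (6, 4)
```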


  67. Sniffnoy Says:

    Job #64: It’s a (limited) HTML comment box; write &lt; instead of < to get a less-than sign.

  68. Raoul Ohio Says:

    Sniffnoy #55,

    Your reference includes my guess: E.T. Bell / John Taine. Related topic: I want to suggest to everyone a highly entertaining book “The Search for E.T. Bell” by Constance Reid. It is listed online as a novel, but it is a serious biography — and it is astounding! For unknown reasons, Bell invented an elaborate history for himself that no one caught on to, including his wife, son, and colleagues at Caltech, which is all the more remarkable because he spent much of his childhood in San Jose.

  69. Scott Says:

    Nick #66:

    1. One possible answer to your question is that it happens to take n+1 states to design a Turing machine that searches for contradictions in the formal system. A second possible answer is that the phase transition has to happen somewhere, so why not at n+1? 🙂

    2. CC(n) is another quantity that, if you know it, lets you solve the halting problem for all Turing machines with ≤n states. CC(n) also has n log2 n – O(n) bits of algorithmic information (i.e., no shorter program can output CC(n)). CC(n) is extremely closely related to, although not identical with, Chaitin’s halting probability Ω.

    3. Sorry! My LaTeX skills are better than those of the people who use < and > for the angle-brackets of quantum states, and who write words in math mode that are actually products of variables, but they’re not MUCH better. 🙂

  70. Scott Says:

    Michael #63: No, sorry, I’ve never heard of those people.

  71. Persona Says:

    Regarding my comment #62: I was looking for a trick to avoid what you called the “argument over which large-cardinal axioms are allowed when defining a generalized BB function”. Does it work? I mean, if L is well-defined then it grows (for sufficiently large n) at least as fast as any BBωLC for some fixed LC.

  72. Jacob Says:

    Layman here, with a probably naive question about the proof of proposition 4.

    How would T proving that M never halts prove T’s consistency? Wouldn’t proving that M never halts just prove that a proof of 0=1 does not exist? Couldn’t there still exist other inconsistencies in T besides 0=1?

  73. Zeb Says:

    In your discussion of the platonic reality of the busy beaver numbers BB(n) – and your reasons for rejecting Rayo’s function – you make several references to the “standard integers”.

    Before saying something you will find controversial, let me say that I agree with your philosophical position that the standard integers are a definite thing that truly exists, in some platonic sense. (J. D. Hamkins seems to have a coherent view of an alternative “multiversal” philosophical stance, where the “standard integers” can’t be nailed down at all and are at best a convenient fiction, which I find slightly terrifying and hard to refute.)

    I also agree that the values of the busy beaver function can be probed empirically. The fact that BB(3) = 21 is an empirical fact which we can test by a concrete experiment, so it has a physical reality to it that the truth or falsity of the continuum hypothesis lacks.

    Here is the issue: there is no guarantee that the “standard integers” are the same thing as the “physical integers”. I think that you are conflating two concepts here which might not be the same at all. Let me explain.

    We all should agree that it is an empirical fact that the number 10 is a natural number. Most of us have directly observed children counting to 10, the number 10 behaves in every way like we expect natural numbers to act (i.e. it is either even or odd), we can watch the second-hand of a mechanical watch tick by 10 times in a row, etc. So I think we can all agree that 10 is a “physical integer”. However, I would like to convince you that 10 might not be a “standard integer” – I claim that 10 might be non-standardly large! For this, we need to perform a thought experiment.

    Suppose first that it is possible to completely describe our universe – or at least, a universe which looks very similar to our universe – as a mathematical structure which satisfies some list of axioms of first order logic. This assumption is surprisingly controversial: the only academic I’m aware of who seems willing to really defend it is Tegmark, even though the search for a “grand unified theory” of physics seems to presuppose the existence of such a description. But at least, I think we can agree that this assumption hasn’t been completely ruled out.

    Suppose further, that the axioms which our universe satisfies are sufficiently permissive that we can prove that for each natural number n, there exists some universe which looks superficially similar to ours, in which some living creature is able to survive for n years (and in fact is able to simulate n steps of computation of any Turing machine). This is a slightly more controversial assumption, but there are enough people making a serious effort to live forever in the actual, physical universe that we should be able to at least entertain this idea.

    Then we can use standard logical constructions (such as the ultraproduct construction, or the Gödel completeness theorem) to produce a universe which looks superficially similar to ours, but in which the internal “physical integers” are a nonstandard model of PA. What’s more, inside this universe, there will be some living creature which lives for a non-standardly large time n. We can even arrange that this n is so non-standardly big, that in this alternative universe, the Turing machine which searches for contradictions within ZFC eventually finds a contradiction and halts – so it would be an empirical fact in this universe, that BB(748) is some nonstandardly huge integer. This nonstandardly huge integer would pass every empirical test for being a true “physical integer” that you could throw at it – people would be able to count up to it, you could time that amount of time passing on a stopwatch, it would either be even or odd, etc.

    The above might seem like a fantastical scenario – but how can we rule it out? Any physical experiment we could devise to ensure that our universe is built out of a mathematical structure in which the internal “physical integers” are the true “standard integers” could be described as an axiom of our universe, and we can always arrange to find nonstandard models which additionally satisfy any finite list of additional axioms (so long as they are consistent with PA). So there is no physical experiment we can perform to definitively rule out the possibility that the physical integers are nonstandard – or even, for that matter, to rule out the possibility that the number 10 = 1+1+1+1+1+1+1+1+1+1 is nonstandard.

    Going a step further, I’d say that while we can agree that empirically we have BB(3) = 21 in our physical universe, it is not clear that BB(3) is 21 in the true “standard integers”: perhaps 21 is nonstandardly large, and the 3-state Turing machine which we find empirically halts after 21 steps (when we simulate it via our physical pencils and paper), actually runs forever in the true standard integers, which end somewhere between the standard integer 3 and the nonstandardly large number 10 (no, I can’t list out the precise set of which integers below 10 are the true standard ones – that set is infinite, undefinable, and uncomputable!).

  74. murmur Says:

    Hi Scott, can you explain why in Proposition 4 there should be a finite state Turing machine that finds all the proofs in T?

  75. Gerald Says:

    On conjecture 12: PA is equivalent to ZF-INF (axiom of infinity deleted). Commenter #1 however mentioned that ZF was actually easier to do than PA. Does this mean that deleting INF would not help to further simplify the known 748-state construction, in other words that INF comes for free?

    Conjectures 11 and 12 suggest that just adding INF may already double the number of required states. I feel that plain ZF (no large cardinals) should not actually be that much stronger than PA. An interesting, much stronger theory would be Second Order Number Theory with Projective Determinacy (Z2+PD). Since the work of Woodin and others in the 1980s there is a growing consensus that Z2+PD is the canonical theory of second-order math, i.e. the right theory of V(ω+1), just as PA is the canonical theory of first-order number theory. So let me suggest

    Conjecture 11 a: Second order math (Z2+PD) does not prove the value of BB(25).

    Has anyone actually tried to add a supercompact or other sufficiently large cardinals to the 748-state machine in order to get the strength up to PD?

  76. Bunsen Burner Says:

    How is any of this modified if you move to second-order logic? Or even higher? Maybe there is an argument here even for infinitary logics? Has anyone considered such a thing?

  77. Joshua B Zelinsky Says:

    One other question that comes to mind. Is it possible for there to be an n such that there are two genuinely distinct Turing machines with n states which are both Busy Beavers for that n? More rigorously, is there an n, such that there are Turing Machines T1 and T2 each with n states such that T1 and T2 run for different numbers of steps on some input L (both halting but running for different numbers of steps before halting, or one halting on L and the other not halting on L), but that T1 and T2 both run for BB(n) steps on the blank tape? My guess would be no.

  78. Scott Says:

    Jacob #72: Principle of explosion. If a theory proves any contradiction, then it proves everything else (including 0=1), since a falsehood implies anything.
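    Written out, the standard derivation takes just two inference steps: given both P and ¬P, one derives an arbitrary Q:

```latex
\begin{align*}
1.\;& P          && \text{(one half of the contradiction)}\\
2.\;& \neg P     && \text{(the other half)}\\
3.\;& P \lor Q   && \text{(disjunction introduction, from 1)}\\
4.\;& Q          && \text{(disjunctive syllogism, from 2 and 3)}
\end{align*}
```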

  79. Scott Says:

    murmur #74: The existence of such a Turing machine is what it means for a theory to be “computable” (another term is “effectively axiomatizable”). All the usual theories you’ve heard of, like PA and ZFC, have this property, which is why Gödel’s Theorem applies to them. You can simply write a program that does a breadth-first search over all possible valid derivations from the axioms, incorporating more and more axioms as it goes if there are infinitely many of them. Alternatively, you can have the program iterate over all finite strings, and check each one to see whether it constitutes a valid proof.
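    The second strategy can be sketched in a few lines of Python. Here is_valid_proof is a stand-in for a real proof checker (for any effectively axiomatized theory, such a checker is a finite, purely mechanical procedure); the toy checker at the end is only for illustration:

```python
from itertools import count, product

def search_for_proof(is_valid_proof, alphabet="01"):
    """Enumerate all finite strings in length-lexicographic order,
    halting if and only if some string passes the proof checker."""
    for length in count(1):
        for chars in product(alphabet, repeat=length):
            candidate = "".join(chars)
            if is_valid_proof(candidate):
                return candidate

# Toy illustration: a "checker" that accepts only the string "11".
print(search_for_proof(lambda s: s == "11"))  # 11
```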

  80. Scott Says:

    Bunsen Burner #76: I confess that I’ve never really grokked second-order logic. And I’m deeply suspicious of anything that presupposes a definite truth-value for, e.g., the Axiom of Choice or the Continuum Hypothesis. But maybe someone who understands this stuff would like to enlighten us on how second-order logic would change the discussion of the BB function?

  81. Scott Says:

    Joshua #77: Don’t we already get such an example when n=1? Consider the 1-state machine that halts on a 0, but keeps moving to the right when it sees 1’s. That, and a machine that halts on both 0 and 1, both demonstrate BB(1)=1.
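    Here is a quick sanity check of that claim as a Python sketch (the machine encoding is mine, not from the survey): both machines halt after exactly one step on the blank tape, while the first keeps moving right for as long as it keeps seeing 1’s:

```python
def steps_until_halt(machine, tape=None, max_steps=100):
    """Run a 2-symbol Turing machine starting in state 'A'.
    Returns the step count at halting, or None if max_steps is exceeded."""
    tape = dict(tape or {})          # unwritten cells read as 0
    pos, state, steps = 0, 'A', 0
    while state != 'H':
        if steps >= max_steps:
            return None
        write, move, state = machine[(state, tape.get(pos, 0))]
        tape[pos] = write
        pos += 1 if move == 'R' else -1
        steps += 1
    return steps

# Machine 1: halts on reading 0, keeps moving right on 1.
m1 = {('A', 0): (1, 'R', 'H'), ('A', 1): (1, 'R', 'A')}
# Machine 2: halts on reading either symbol.
m2 = {('A', 0): (1, 'R', 'H'), ('A', 1): (1, 'R', 'H')}

print(steps_until_halt(m1), steps_until_halt(m2))        # 1 1
print(steps_until_halt(m1, {i: 1 for i in range(200)}))  # None (still running on 1's)
```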

  82. Joshua B Zelinsky Says:

    Scott, yes, sorry, I actually meant to write n>1 there. Although now I’m worried someone is going to point out a trivial example for n=2 or n=3.

  83. invisibules Says:

    I’ve just tucked in to the paper (and haven’t yet got to the meaty bits) — it occurred to me that props 1, 3 and 4 don’t apply to all BB_L. For example in my computer language, all programs of size evenly divisible by 100 are defined to halt immediately. 🙂

  84. invisibules Says:

    oh… but you saw to that in the definition of BB_L, with that sneaky <=n…

  85. Nick Says:

    Scott #69

    Is it obvious that the function CC has these interesting properties? Couldn’t it be that CC(n) = 1 for all n (or for all n > some n_cc)? The survey is careful not to assume uniqueness (‘_A_ machine M that achieves the maximum is also called _an_ “n-state Busy Beaver.”‘), but does anyone have any idea how common or rare Busy-Beaverness really is?

  86. Scott Says:

    Nick #85: Sorry, I thought you meant halt in at most BB(n) steps (or equivalently, halt at all). I have no idea how many machines run for exactly BB(n) steps, except that it’s at least as many as fit into one isomorphism class. That’s related to the question Joshua Zelinsky just asked.

  87. Scott Says:

    Gerald #75:

      Has anyone actually tried to add a supercompact or other sufficiently large cardinals to the 748-state machine in order to get the strength up to PD?

    Not that I’m aware of. That’s another worthwhile project!

    And I wouldn’t even think of encoding PA by doing ZF-INF — I’d instead look for some “non-logical” arithmetical statement equivalent to Con(PA), the simpler the better.

  88. Gerald Says:

    Bunsen Burner #76: Second-order logic tells us essentially nothing new. Once we assume the second-order induction axiom, the theory is complete regarding arithmetical statements. The second-order Peano axioms are categorical: there is only one model, the set of natural numbers. Second-order ZFC (with second-order versions of separation and replacement) has only the trivial models V(κ) with κ inaccessible.

    So, second-order PA (PA2) decides every statement (first-order or otherwise) about natural numbers, but we often don’t know which way. Gödel’s completeness theorem does not hold for second-order logic; there is no complete logical deduction system. PA2 “proves” the statement T if and only if T is actually true. Likewise ZFC2 “proves” the continuum hypothesis iff CH is actually true. Unless you are a hardcore platonist, the latter depends on the background set theory in use. “Proves” of course only means “implies” in the semantic sense.

    While infinitary logics are an important tool in model theory and set theory, they are uninteresting for finitary math. Once we can axiomatize well-ordering, i.e. “there exist no n1, n2, n3, … such that n1 > n2 > n3 > …”, the theory completely determines finitary math, the reason being that nonstandard models of the natural numbers are not well-ordered.

  89. Nick Says:

    Scott #86

    I don’t know if you have time to add more stuff, but it might be worth mentioning CC or something like it as an open problem (with JBZ’s qualifier “genuinely distinct” to weed out mirror images, etc). If the goal is to get these open problems out there in the hopes that answers will show up, this is one I sure would like to see answered.

    BTW, the chess master’s quotation on page 3 should be capitalized I think: “no,” -> “No,”

  90. Scott Says:

    Zeb #73: I was with your comment right up to the paragraph that begins “Then we can use standard logical constructions…”, where you increasingly say things that I don’t even know how to refute, they make so little sense to me! It’s easiest to start from the end: that BB(3)=21 is a theorem of Peano Arithmetic. There are no models of arithmetic where BB(3) is a nonstandard integer, or indeed anything other than 21. More broadly, though, what would it even mean to live in a physical universe governed by a nonstandard model of arithmetic? What empirical test could we ever do to tell if we were in such a universe? Would the result of such a test always be “oh sure, that will confirm we’re in a nonstandard universe, but it will only do so at a nonstandard time—i.e., after what you naïvely regard as the end of eternity”? 🙂

    Your comment made me realize something, though, about why I love the BB function so much: because it forces all the vague-talkers and anti-Platonists and anti-Law-of-the-Excluded-Middle people (not saying that you’re one) to put their cards on the table! It’s like, you agree that it’s a fact that BB(2)=6 and BB(3)=21? Well then, why isn’t there a fact about the value of BB(1000)? At which n does there stop being a fact about the value of BB(n)? But once you agree that there’s a fact about the value of BB(1000), how are you not a Platonist about the positive integers, just like I am?

  91. maline Says:

    Is there really any reason to suspect a connection between the BB numbers and the (generalized) Collatz conjecture? Sure, the Busy Beavers we have found can be expressed as iterating Collatz-like maps, but always with a particular start and end. They never explore more than one orbit, and that one is always one that ends. Is there any reason to relate this to the question of whether there are orbits that do not end?

  92. Scott Says:

    maline #91: I concede that it’s a genuine difference that with the Collatz Conjecture, one cares about all orbits, whereas with BB, one only cares about the orbit that starts at 0.

    Still, given a Collatz-like iteration rule g, it could already be a hard problem to decide if a specific orbit of g is finite or infinite … especially to prove that it’s infinite in case it is! And that’s exactly the sort of problem that might arise in proving that some of the 5- or 6-state holdout machines run forever (although I don’t know for certain that it does). And of course, if we don’t even know g’s behavior on the orbit starting at 0, then we’d seem to have little hope of understanding all the orbits! Thus, I stand by the claim that progress on determining the small values of BB could go hand in hand with progress on generalized Collatz iteration problems.

  93. Zeb Says:

    Scott #90: Here’s the thing, though. In our hypothetical nonstandard universe, there is a nonstandard version of Scott, who says, “Come now, here I have a proof in PA that BB(748) is at least [some huge (nonstandard) number]. Here, I’ll even write it out for you, and go through every step in detail! How can you claim that there might be some other model of arithmetic where BB(748) is smaller than that?” – and indeed, there would be such a proof in this universe, which could be written on a nonstandardly large physical piece of paper, with a nonstandardly long length. And I could perform the physical experiment where I feed this proof to my electronic formalized proof checker (which has a nonstandardly large amount of battery life), and it would (after a nonstandardly long time) output the result “yes, this looks totally legit!” Of course, none of us would be aware that any of these quantities (length of the proof, amount of time that passes, etc.) are actually nonstandard – we would think they are just ordinary, perhaps even somewhat small, numbers.

    So somehow that hypothetical version of Scott in that hypothetical universe is wrong, even though his reasoning looks just like yours. This is a very confusing state of affairs for him, but for us, it is easy to explain: his “proof” is not a true, platonic proof, since it is nonstandardly long.

    The same thing might be happening here, with your claim that BB(3) is 21. The proof you can write down of the claim that BB(3) = 21 probably has at least 10 steps in it, no? If 10 were secretly a nonstandardly large integer all along, then this “proof” would just be another nonstandard mirage, and wouldn’t have any implications about what is true or false in the true, platonic, standard model of the integers.

  94. asdf Says:

    It’s possible to believe in natural numbers while rejecting PA’s impredicative induction axioms, i.e. recognize known numbers like 0, 1, 2, … but reject axioms that quantify over “numbers” that haven’t already been proven to exist. You get a system of arithmetic weaker than PA but that can apparently still handle most ordinary math like calculus. See Ed Nelson’s 200-page book or his more accessible informal talk on the subject. What I don’t know is whether Nelson’s predicative arithmetic proves the existence of the BB numbers. I suspect that it does not.

  95. Scott Says:

    Zeb #93: Sorry, I still don’t follow you. We know for sure that 10 is not a nonstandard integer! Like, even within the nonstandard models of PA, where there are nonstandard integers, the integer 10 still exists and is still standard, because we can actually construct it as 1+1+1+1+1+1+1+1+1+1. This is not a matter of opinion or interpretation.

    Beyond that specific point, you might say I attach vastly less metaphysical importance to nonstandard models of arithmetic than you apparently do! For me, nonstandard models are best understood as formal artifacts of the completeness theorem. So for example, a “nonstandard proof” of PA’s inconsistency, within a model of PA+Not(Con(PA)), is not actually a “proof” at all. It’s just a placeholder that represents PA’s inability to rule out such a proof.

    By analogy, Andrew Wiles presumably spent years with tokens in his brain for objects like “the smallest counterexample to FLT” and “a non-modular elliptic curve.” But the fact that he had these tokens, and could even do complicated manipulations of them, doesn’t mean that the tokens ever had referents, any more than my daughter’s tokens for “unicorn” or “mermaid” do. For me, that’s precisely what a “nonstandard integer” is: a token for an integer that doesn’t actually exist.

    And while one could of course invent laws of physics that included those tokens as basic entities (just like one could invent worlds with unicorns and mermaids), they’d be totally unlike the laws of physics that we see. And if you want to claim that our laws might secretly already involve the nonstandard entities, but we could only notice them by doing nonstandard experiments or by waiting nonstandard amounts of time or some such … well then, you’ll have to take the discussion up with my nonstandard doppelgänger, rather than with the standard me who you’re talking to! 😀

  96. Scott Says:

    Persona #62 and #71: I’m sorry for the delay in replying to you—I wanted a chance to think it over.

    Briefly, yes, I think your proposal works! An even simpler version of your proposal would be as follows:

    Let ω(n) be the largest computable ordinal that’s definable by a computer program at most n bits long. Let me stick my neck out and say that ω(n) strikes me as a thing that clearly Platonically exists, despite the inability of any fixed formal system (like PA or ZFC) to determine it beyond the first few values of n. Just like BB(n) itself Platonically exists, even if PA or ZFC can’t calculate it beyond the first values. The one relevant difference, I guess, is that computable ordinals are not objects that you can even define using first-order quantification over the integers: you need the notion of a well-ordering (i.e., no infinite descending sequence). But I’m fine with that.

    Anyway, once we have ω(n), we can then define:

    L(n) := BB_{ω(n)}(n).

    This will grow faster than any of the ordinal-indexed functions BB_α that I considered in Section 3 of my survey—in effect, by diagonalizing across all those functions.

    Indeed, it should have similar growth as your version, which diagonalizes across all computable ordinals that can be proved to exist in consistent, computable extensions of ZF. To see this: my version simulates yours because mine eventually hits all the computable ordinals. But your version also simulates mine, because for every computable ordinal α, one could consistently extend ZF with an axiom that says “the following Turing machine, M, computes the order relation of a computable ordinal, which we’ll call α.”

    Does anyone else have any comments on this? Any reasons why it’s ill-defined that I overlooked?

    If not, then may I include this observation in my article? And should I thank you as “commenter Persona,” or by a real name? 🙂

  97. Zeb Says:

    Here’s an experiment you can perform, to help understand what it would “feel like” to live in a nonstandard universe, where the number 10 was secretly nonstandardly large. Set an alarm to go off in 10 minutes. Then set aside all distractions, put the phone away, stop thinking about any sort of research, and just wait patiently for the alarm to go off.


    Now tell me – how certain are you that an eternity did not just pass, between beginning the experiment and ending it? Can you really track every single step of what happened, from minute to minute? Or was there a sort of vague feeling where the time in between became a long blur, with only a short summary claiming that quite a lot of time passed being stored in your brain?

    This is, of course, a joke – doing the above experiment won’t actually prove anything. Here is a real experiment you can perform to help determine whether you live in a nonstandard universe: take the Turing machine which is supposedly the busy beaver on three states, listed on page 8 of your paper, and try simulating it. If it runs forever, then congratulations! You probably live in the true, platonic, standard universe (but there’s no way to be absolutely sure: nonstandard universes can mimic standard universes for the sake of fooling any particular experiment). If it ever halts, then you might want to seriously consider the possibility that you live in a nonstandard universe, where enormous “numbers” like “21” exist.

    [Ok, that was also a joke… I think. A better experiment would be to search for a contradiction in PA. If you ever find one, then either PA is truly inconsistent, or you live in one of the nonstandard universes which witness the failure of Con(PA) – and either way, logic becomes very, very difficult for you to make sense of.]

  98. Zeb Says:

    Scott #95: Oh, I think I understand your philosophical position better now. You seem to be saying that while the true integers really have a Platonic existence, these nonstandard models of arithmetic do not! So the fact that your nonstandard doppelganger is wrong doesn’t bother you, because to you, he doesn’t truly exist in any meaningful sense. For the same reason, you see no reason to spend huge amounts of time imagining what life is like from his point of view.

    This seems consistent with your position that you don’t feel confident that second order statements such as the continuum hypothesis should have a definite truth value. After all, nonstandard models of arithmetic are uncomputable, so they might coherently be viewed as being just as unreal as well-orderings of the continuum or unicorns.

  99. Scott Says:

    Zeb #98: It’s not exactly that the nonstandard models “don’t exist.” Given a countable number of steps, you can even explicitly construct a countable nonstandard model of PA; the completeness theorem tells you exactly how.

    It’s more like: the standard model of the integers has such an unambiguous meaning (indeed, one that’s presupposed in any mathematical discussion, including about nonstandard models), and is obviously so vastly more central to mathematics than any nonstandard models, that the word “true” should just mean “true in the standard model” unless specified otherwise.

    If someone insists otherwise, then I say to that person: all your study of logic did for you is lead you into a tangle that you’ll need more study of logic to get out of. Right now, you know less than someone who never studied logic at all!

    To illustrate what I mean, consider the following dialogue:

    Alice: “Unicorns don’t exist.”

    Bob: “No, you only mean that unicorns don’t exist in our world. They do exist in logically consistent hypothetical worlds—for example, those that you get by taking our world and then adjoining a unicorn to it.”

    Alice: “Yeah, that’s what I said, that unicorns don’t exist.”

  100. Scott Says:

    Joshua Zelinsky #82: OK, I also have a counterexample to your conjecture for n=2. Namely, the machine M, given by

    A 0:1LB 1:H
    B 0:1RB 1:1RA,

    runs for 6 steps on an all-0 tape, just like the “official” 2-state Busy Beaver

    A 0:1RB 1:1LB
    B 0:1LA 1:H.

    But when started on a “1” square (surrounded by infinite “0” squares to either side), M halts in a single step, whereas the “official” machine takes 4 steps to halt.
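    These step counts are easy to check by direct simulation. Here is a minimal sketch (my own code, not anything from the survey), which assumes the convention that the halting transition itself counts as one step, matching the counts quoted above:

```python
# A minimal checker for the two 2-state machines above (my own code, not the
# survey's). Transitions map (state, scanned symbol) -> (write, move, next
# state), with "H" marking a halting transition, which counts as one step.

def run(tm, tape=None, max_steps=1000):
    tape = dict(tape or {})          # sparse two-way-infinite tape
    pos, state, steps = 0, "A", 0
    while steps < max_steps:
        steps += 1
        action = tm[(state, tape.get(pos, 0))]
        if action == "H":
            return steps
        write, move, state = action
        tape[pos] = write
        pos += 1 if move == "R" else -1
    return None                      # treat as non-halting

# The counterexample machine M from the comment:
M = {("A", 0): (1, "L", "B"), ("A", 1): "H",
     ("B", 0): (1, "R", "B"), ("B", 1): (1, "R", "A")}

# The "official" 2-state Busy Beaver:
official = {("A", 0): (1, "R", "B"), ("A", 1): (1, "L", "B"),
            ("B", 0): (1, "L", "A"), ("B", 1): "H"}

print(run(M))                # 6 steps from the all-0 tape
print(run(M, {0: 1}))        # 1 step when started on a "1"
print(run(official))         # 6 steps from the all-0 tape
print(run(official, {0: 1})) # 4 steps when started on a "1"
```

    Representing the tape as a dict of written cells keeps the simulator two-way infinite without any extra bookkeeping.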

    Having said that, looking at Pascal Michel’s page, it looks like this non-uniqueness has only been observed for n=1 and n=2—apparently the shift-number Busy Beavers for n=3 and n=4 are essentially unique, and so is the candidate for n=5. Thus, your conjecture might hold for n=3 onwards!

  101. Job Says:

    I’m wondering what’s the smallest run time that is not achievable by any n-state machine, for some n.

    Since many n-state machines have huge run times, and many never even halt, the space of possible run times for an n-state machine (described by k bits) will have some holes below 2^k, right?

    That’s like finding out that a 6-state machine can’t run for exactly 234 steps, though there is one that can do 235.

    Maybe we could use that to prove a statement false? E.g. X is false because otherwise we can construct a 20-state machine that runs for exactly 2^105 steps, well known to be impossible even though you wouldn’t guess it. Is that plausible?

    It’s the lazy beaver function: What’s the least amount of work that no one is already doing?

  102. Scott Says:

    Job #101: That’s a beautiful question! Note that, unlike BB, your LB function is computable (and, as you pointed out, upper-bounded by (4n+1)^{2n}+1).

    Clearly LB(1)=2. I just confirmed that LB(2)=7. But LB(3) might already drop below BB(3).

    Note that LB increases monotonically for similar reasons to BB: given an n-state machine, we can always replicate its runtime with an (n+1)-state machine, or increase its runtime by 1 step.

    Can anyone here say anything else about this function?
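    For what it’s worth, the first two values are small enough to brute-force. Here is a minimal sketch (my own code, not from this thread), which assumes the usual 2-symbol model and a 20-step cap, safe for n ≤ 2 since BB(2)=6 means every halting 2-state machine halts within 6 steps:

```python
# Brute-force sketch of the Lazy Beaver function for tiny n (my own code).
# Enumerate every n-state, 2-symbol machine; each table entry is either
# None (a halting transition, which counts as a step) or a tuple
# (write, move, next state).
from itertools import product

def runtime(tm, cap=20):
    tape, pos, state, steps = {}, 0, 0, 0
    while steps < cap:
        steps += 1
        action = tm[(state, tape.get(pos, 0))]
        if action is None:           # halting transition counts as a step
            return steps
        write, move, state = action
        tape[pos] = write
        pos += move
    return None                      # assumed non-halting (safe for n <= 2)

def lazy_beaver(n):
    actions = [None] + [(w, d, s) for w in (0, 1) for d in (-1, 1)
                        for s in range(n)]
    keys = [(s, b) for s in range(n) for b in (0, 1)]
    seen = set()
    for choice in product(actions, repeat=len(keys)):
        t = runtime(dict(zip(keys, choice)))
        if t is not None:
            seen.add(t)
    lb = 1                           # smallest runtime achieved by no machine
    while lb in seen:
        lb += 1
    return lb

print(lazy_beaver(1))  # 2
print(lazy_beaver(2))  # 7
```

    The same enumeration works in principle for n=3, though the 13^6 ≈ 4.8 million machines make it noticeably slower, and a larger step cap would be needed.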

  103. Gerald Says:

    For finite mathematics, i.e. the theory of the positive integers, most mathematicians agree with Scott. The philosophy is platonism. We know what valid (standard) natural numbers look like: they can in principle be written down. And prior to any logic and formal theories, there is already truth about them out there. It’s the job of mathematicians to find what is true. It’s easy to see that at least Π1 statements like Goldbach’s conjecture should obviously be either true or false “in reality”. It’s then also not too hard to extend this truth-realism to all arithmetical statements.

    Now, there seems to be quite an amount of confusion going around regarding Gödel’s incompleteness theorems and nonstandard models of arithmetic. People often mistakenly think that these theorems say that there can’t be objective truth in mathematics, that truth-relativism is a fundamental feature of math. This is wrong. The incompleteness theorems only say that no effective formal theory can capture all the truth that is already there, that every such theory must necessarily be incomplete. Gödel sentences are actually true statements about the positive integers that the theory merely fails to prove. Nonstandard natural numbers are cleverly constructed infinitary objects. They are not natural numbers; they are fake integers that the theory is too weak to detect as invalid. Nonstandard models exploit the incompleteness of a theory like PA to trick it into accepting these fake numbers as valid numbers.

    The situation is very different in infinitary math. We do not know what all the “valid” reals (subsets of integers) are. Many set theorists believe that we have to accept a mathematical multiverse here: that statements about infinitary combinatorics like the continuum hypothesis (CH) do not come with a prior truth value attached. However, this has nothing to do with finite math and the Gödel incompleteness theorems. Unfortunately, some popular math articles present the independence of CH as an example of Gödel incompleteness. This is bad: it’s wrong and further adds to the confusion. Neither CH nor its negation is a Gödel sentence. The independence of CH from ZFC is quite a different beast. Also, the reals added in Cohen’s construction are completely valid real numbers; they are in no way fake or “nonstandard”. The Cohen models of set theory are not nonstandard models in any sense.

    So, we have platonism for finite math, but we probably have to accept a more formalistic foundation for infinitary math, at least for now. Of course not everyone agrees with this philosophy. Ultrafinitists, for example, bring the limitations of the physical world into the picture. They claim that huge numbers like Graham’s number, TREE(3), BB(1000), etc. are still fictions: they are only thoughts. They are defined in an abstract way, but in no realistically conceivable universe can they actually be written out. They say: if ZFC is inconsistent but the shortest proof of a contradiction has length TREE(3), why should we bother? It’s hard to argue against this. IMO, if you want anti-platonism for finite math, you have to resort to some ultrafinitist argument.

  104. Mg Says:

    Conjecture: for all n>4, BB(n) is an odd perfect number such that there is a zero of the Riemann zeta function with its imaginary part floored equal to BB(n) and real part not equal to 1/2, and also BB(n)’s big-endian ASCII encoding is a proof of P=NP in Coq (your favourite implementation)

  105. Persona Says:

    Scott #96: Thanks for your kind reply. Of course, you may include this observation in the article. You may refer to me as Niels S. Lohmann. (I’m not identical with any Niels Lohmann you might currently find via Google.)

  106. Scott Says:

    Job #101: I now have a conjecture about your Lazy Beaver function. I think LB(n) is going to grow like n^{Ω(n)}. In other words, I think that as n gets large, the small running times are going to get densely filled in, and the first unfilled running time will be only polynomially smaller than the total number of n-state machines. Moreover, I think that weaker results in this direction (e.g., at least some exponential growth with n) are probably feasible to prove, by giving explicit constructions that fill in the running times—likely a patchwork of constructions, with the better ones kicking in only later, but with the small values of n and small running times that the better constructions can’t handle having already been handled by worse constructions.

    If I mention this conjecture in my survey, how should I thank you? As “commenter Job” or by a real name? 🙂

  107. Scott Says:

    Mg #104:

      Conjecture: for all n>4, BB(n) is an odd perfect number such that there is a zero of the Riemann zeta function with its imaginary part floored equal to BB(n) and real part not equal to 1/2, and also BB(n)’s big-endian ASCII encoding is a proof of P=NP in Coq (your favourite implementation)

    You see, that’s exactly why I think it’s so important for us to nail down BB(5)—to refute conjectures like that one. 😀

  108. Scott Says:

    Gerald #103: That’s beautifully said, better than I could’ve said it. I give it a standing ovation.

    But I do disagree with you on one point. I think I have a clear enough conception of what a “valid” real (i.e., an infinite binary sequence) is. It’s only at sets of reals (not coincidentally, the subject of CH) that I lose a clear conception.

    And related to that, I think it’s crucial to understand that Cohen’s construction takes place entirely in the world of countable models of ZF, not the world of “the real reals.” So, yes, a Cohen real is a real real, but it’s being added to a set of reals that’s actually only countable (except, a model of ZF “mistakenly thought” that it was uncountable)! So it shouldn’t be thought of as enlarging the universe of reals in any Platonic sense.

  109. maline Says:

    Scott #108: A calculable real, or an algorithmically generated binary sequence, is clearly “valid” – but there are only countably many of those. Personally, I am not at all comfortable with the concept of a “general” binary sequence as a valid object. If there is no way at all of generating or describing such an entity, then in what sense does it “exist”?

    On a similar note, do you find Gödel’s model L, or similarly constructible models, to be less “true” than von Neumann’s V? Why?

  110. Scott Says:

    maline #109: So let me state my position more carefully. I’m not aware of any question of the form, “does there exist a real number / infinite binary sequence with property X?,” that I wouldn’t accept as having a Platonically correct answer. (Can anyone suggest a candidate?) I am, of course, aware of multiple such questions about sets of reals.

  111. Filip Says:

    A fresh 2 hour Richard Karp interview hosted by Lex Fridman:

    It’s pretty advanced and technical 🙂

  112. DangerNorm Says:

    It occurs to me that I’ve been thinking about the meaning of S(n) a bit wrong. I’d been thinking of it as the longest that an n-state Turing machine can run and still halt, but of course this is not true: a machine that performs some computation whose runtime depends on its input, like factoring an integer, could be made to run arbitrarily long by setting it to work on a tape containing a large enough number. Hence why “starting on an empty tape” is part of the definition.

    However, this requires that you simply assume that the number is encoded on the tape when you start counting. If you require that the machine also encode the value of the number, then BB(n+m) puts an upper bound on the largest number that can be encoded, where n is the number of states of the machine that implements the algorithm, and m is the number of states of a machine that writes out the input before simulating the n-state machine.

    This line of thought led me to another function that one might wonder at the properties and values of: the largest natural number that can be encoded on a Turing tape with an n-state Turing machine, such that all smaller natural numbers can also be encoded with an at-most n-state Turing machine, using the same encoding.

  113. Zeb Says:

    Gerald #103:
    > They are not natural numbers; they are fake integers that the theory is too weak to detect as invalid. Nonstandard models exploit the incompleteness of a theory like PA to trick it into accepting these fake numbers as valid numbers.

    I’m not claiming that the true natural numbers don’t exist and aren’t meaningful, or that nonstandard naturals are true numbers. I’m just pointing out that if a model of nonstandard arithmetic looks sufficiently like the true numbers to fool PA, then it can also be used to construct a fake physics (which we might be living in right now).

    I see no physical experiment we can perform which will definitively rule out the possibility that we are living in a universe based on one of the nonstandard models of arithmetic, and that a nonstandard amount of time has already passed within our lifetimes – do you have one in mind?

    I understand that second order axioms can pick out one single true, platonic set of integers (and that currently, the same can’t be said to be true of set theory). But physics doesn’t magically give us access to second-order properties of the integers, unless the physical Church-Turing thesis is dramatically incorrect.

    Any refutation of this possibility must be philosophical rather than logical or physical. You can argue that there are philosophical reasons to believe that we do not live in an uncomputable universe which merely believes itself to be computable with respect to its internal model of the integers – and honestly, I find such a philosophical argument fairly compelling. But there are no *logical* reasons that this couldn’t be occurring, given our observations, just as there is no logical way we can be completely sure that we don’t live in a simulation, or that we aren’t characters in a play about the hubris of pure logic.

    (My view on ultrafinitism is that it is laughably naive: just because you can count to a number and grasp it in your hands, you are convinced that that number is “safe”? Nothing is absolutely safe.)

    Scott #99: The difference between this situation and the unicorn example that you give, is that there are actually physical experiments we can perform to rule out the existence of unicorns. For instance, we can send teams of people around the world to search for unicorns, or ask our neighbors if they have ever seen a unicorn, or break into secret government labs where we believe that unicorns might be being created.

  114. DangerNorm Says:

    Actually, I suppose that function would be very slow growing, compared to BB(n), since uniquely specifying every natural number up to some n requires at least log_2(n) bits, and there just aren’t that many bits available in the specification of Turing machines, as described right on page 1.

  115. Scott Says:

    Job #101 (following up on your #106): Let T(n) be the set of n-state Turing machines. Then I believe I can now prove the following about your Lazy Beaver function:

    |T(n)|/c^n ≤ LB(n) ≤ |T(n)|,

    for some constant c. The lower bound is via a construction combining two pieces: the “introspective encoding” of Ben-Amram and Petersen 2002 (see my survey for more), which should let you build an n-state machine that runs for any desired number of steps up to some |T(n)|/c^n, but accurate only to within plus or minus n or so; and a more specialized gadget that uses O(log n) states to run however many additional steps between 0 and n you need. I can post more details later if anyone is sufficiently interested.

    Of course one can ask many other questions about the spectrum of runtimes: for example, how much less than BB(n) is the second-longest runtime?

  116. Scott Says:

    Zeb #113: Physical experiments can only rule out unicorns in the same ways they rule out nonstandard integers in physics. What if the unicorns, being magical, are invisible and make no sound, like Carl Sagan’s invisible dragon in the garage? So yes, with unicorns and nonstandard integers alike, the argument is ultimately “philosophical.” It’s something like: not only have we seen no evidence that this exists or influences physical reality, but its existence would do violence to our present understanding of physical reality, and no empirical problem has been presented that postulating this new entity would solve.

  117. RANDY Says:

    I have at multiple times in the past had an experience that seemed like I had just completed a supertask with order type ω*, with no subjective connection between any portion of the ω*-ordered part & the things that must have happened before. It seems this was because my brain conflated two parts of a monotonous experience—it has always been something like raking leaves or pacing that is repetitive but not static. (This seems related to Zeb #97’s joke experiments.) Experience continuing on after an ω-ordered supertask does seem more impossible, but that could just be a failure of imagination on my part. So I do not find physical time being nonstandard to be obviously impossible.

    But I am not sure if 10 being nonstandardly large makes sense, because we can explicitly enumerate the finite number of…oh. Okay, I think I see Zeb’s point. We can (maybe) imagine a larger nonstandard world, but what would the standard world look like from a nonstandard world? It seems it would have to look like it just stopped after some finite point, if it were imaginable at all.

  118. maline Says:

    Scott #110: Well, I’m convinced by your argument that questions about Turing machine behavior are “ultimately finitary”, which implies that querying a HALT oracle is a well-defined “procedure”. Therefore I’ll accept as “valid” any binary sequence that can be constructed using such an oracle, and so on up the ordinal hierarchy of oracles. Combining a countable set of well-defined oracles into a single oracle, with the index of the desired sub-oracle given as part of the query, also seems fairly innocuous. How far does that get us? I think until, but not including, ε_0. That limit step might not work, because you will need an infinitely long address to describe which sub-sub-…-oracle you want to query!

    As long as there is some such limit step that we treat as potentially ill-defined, we have a countable set of “valid reals”. Diagonalization is not an issue because we cannot enumerate the set without using the unacceptable oracle.

    BTW, does anyone know whether this set has a name? Does it correspond to anything a set theorist would find familiar?

    Anyway, how does this leave me with regard to statements of the form you mentioned? Are there any familiar cases where a specific real number or a binary sequence can be shown to exist, but may not equal any of my “valid”, definable ones? If so, I probably will have to modify my setup to include them. But if not, then it may be that Scott and I are in agreement: Perhaps all “interesting” statements of the form “there exists a real number/ infinite binary sequence with property X” do have Platonic truth values, but only because they turn out to be discussing constructible entities!

  119. Scott Says:

    maline #118: Right, at some point we’re going to bump up against Löwenheim-Skolem. Any theory of the reals will have a model with “secretly” only countably many reals. So even when we talk about, e.g., a “random real” with no definable regularities of any kind, in some sense we have no way to tell whether it’s secretly just a special example of such a real, constructed as part of a countable model.

    All the same, I … feel like I have a clear enough conception of what it would mean to flip a fair coin a countable number of times. And I’m very open to hearing counterexamples, but so far I haven’t heard a compelling problem that would come from extending my Platonism to the whole uncountable collection of such countable sequences of coin flips. I have no similarly clear conception of what it would mean to pick a random set of reals (where do you start?), or of whether it’s “better” for CH or AC or Projective Determinacy to hold or not hold.

  120. maline Says:

    Scott #119: Well, if our universe (spacetime universe, not universe of discourse!) is actually infinite in extent, then there really are infinitely many independent qubits. There probably are even infinitely many literal coins. So I guess arbitrary infinite sequences might be a real thing whether we like them or not…

    BTW, is there a sensible formalism to deal with infinitely many qubits? How could you give amplitudes to uncountably many sequences, while keeping the total probability finite?

    I also take back my point about “infinitely long addresses”. The sub-oracles are indexed by ordinals, each of which can be written naturally and finitely in the Cantor notation. So now I’m confused as to where would be a good place to stop allowing oracles.

  121. Scott Says:

    maline #120:

      BTW, is there a sensible formalism to deal with infinitely many qubits? How could you give amplitudes to uncountably many sequences, while keeping the total probability finite?

    The question is an ironic one, since the physicists (especially in the context of QFT) were dealing with all the complications of infinite numbers of qubits for generations before quantum information came along, and started worrying about only finite numbers! Anyway, the short answer to your question is yes. You do much the same thing as in classical probability over continuous spaces, and assign probabilities not to individual points but to intervals (and calculate the probabilities by integrating a density). Things are nicer when the Hilbert space has a countable basis, but it’s possible to do QM even in Hilbert spaces of uncountable dimension. But it’s complicated! And I, for one, am infinitely grateful (finitely grateful?) for the finitude of my Hilbert spaces.

  122. Job Says:

    If I mention this conjecture in my survey, how should I thank you? As “commenter Job” or by a real name?

    Thanks, but no need. It’s easier for me to learn when using an alias. Like, a commenter is good enough if you need to blame someone. 🙂

    I believe I can now prove the following about your Lazy Beaver function:
    |T(n)|/c^n ≤ LB(n) ≤ |T(n)|,
    for some constant c.

    Are there any bounds on c?

    Also, LB(n) is strictly less than |T(n)| right?
    E.g. if NH(n) is the number of non-halting n-state machines, then LB(n) is no more than |T(n)|-NH(n)? Plus lots of machines will have the same run time.

    Since |T(n)| – LB(n) is both computable and an upper bound on the number of candidate machines for BB(n), I wonder how large LB(n) is even allowed to be.

    E.g. if you could always narrow down BB(n) to n^c machines, for some fixed c, would anything break?

  123. Scott Says:

    Everyone: Now that I have my own Busy Beaver search code up and running, I can start answering some of the empirical questions without relying on others! Here are two nuggets for tonight:

    – LB(3)=22. In other words, 3-state Turing machines fill out the entire spectrum of runtimes from 1 up to BB(3)=21, so the third Lazy Beaver number is just BB(3)+1=22, similar to LB(1)=BB(1)+1 and LB(2)=BB(2)+1. We know this pattern must break down by LB(6)<<BB(6) if not earlier.

    – The 3-state Busy Beaver is indeed unique up to trivial isomorphisms.

  124. asdf Says:

    If we believe the Church-Turing thesis then we can’t verify the existence of long random bit strings (say from coin tosses or quantum experiments). But we go around believing in their existence. Standard and nonstandard integers seem about the same way. Similarly, GR lets us predict stuff about the interior of black holes, but by definition (if we believe GR) there is no way to observationally verify the predictions. It usually doesn’t bother us too much. I don’t see how physics can even confirm the existence of ordinary computable numbers like A(100), let alone busy beaver numbers. The number of informational bits in the universe is much smaller than the log of that number. So it’s all philosophy again.

  125. STEM Caveman Says:

    Busy Beaver for halting time (rather than output size) is an obfuscation. There is nothing interesting you can say about specific values of BB(n) that is not an obfuscation of a clearer and sharper statement about halting of an n-state Turing machine. The reason is that knowing BB(n) is equivalent to solving the halting problem for all n-state TM’s, but any specific statement about BB for specific n comes through individual machines, e.g., lower bounds, or that 27 states is enough to encode a search for Goldbach counterexamples and 748 encompasses ZF.

    > “nonstandard integer” is: a token for an integer that doesn’t actually exist.

    A nonstandard integer is an anti-concept introduced to avoid speaking explicitly about syntax.

  126. Scott Says:

    STEM Caveman #125: Wait … you’re saying that, whenever we look at the maximum of a finite set, there must be a particular element in the set that achieves the maximum? Thank you! How did I write a whole article on the BB function while missing such a profound insight? 😀

  127. Scott Says:

    asdf #124: I’m not even sure what it means to “confirm the existence” of an integer. Do you agree that 10^80 exists? Would we need to count up to it, maybe with fingers and toes, to be sure it did? Or is it enough that astronomers assure us that there are >10^80 atoms in the observable universe? What if next week the astronomers revise their models and say there are actually >10^8000 atoms—will the latter number then pop into existence? Who would want to use language in such a tortured way? Why not simply say that, in whatever sense 10 and 20 “exist,” 10^8000 and A(100) and BB(100) clearly also “exist”? And even though we can’t very well count to them, we can (wonderfully) still study them and prove things about them—just like we don’t need to fly to the Sun and touch it in order to learn about its composition?

  128. Scott Says:

    Nick #66, #85, #89: I’m doing final revisions to the article now, and decided to add a short section on the question of uniqueness of Busy Beavers. Should I thank you as “commenter Nick” or by a real name?

  129. STEM Caveman Says:

    @Scott 126

    Not to deny you the fun of “sneering the messenger”, but:

    Apart from the communication and comprehension overhead caused by talking in terms of BB, this is a finitary version of issues about “predicativity” that arise when defining something as a max or min of some set of values. It’s not as trivial as you imply, since membership in the set is not computable. Generally, predicative definitions are easier to understand and work with, whereas impredicative ones are chosen for their aesthetic or metaphysical advantages.

  130. maline Says:

    Scott #121: Can you give me a keyword or reference on working with infinitely many qubits?

    In standard presentations of QFT the issue is avoided by working in Fock space: there are only countably many field modes and the total number of excitations must be finite. My question is about what happens if you drop the second condition and allow infinitely many particles, as is appropriate for an infinite and homogeneous universe.

  131. Sniffnoy Says:

    Scott #119:

    Hm, so you don’t have a problem with ℘(N), but you do have a problem with ℘(℘(N))… what about ℘(ω_1)? Since the question you asked was “where do you start”… 🙂

  132. Luke G Says:

    maline #118

    On the question of how far you can get up the hierarchy of ordinals, you can get considerably farther than epsilon_0 with “well defined” operations by introducing the Veblen function, which is basically a way of systematizing diagonalization.

    Using the Veblen function, you can get up to the Feferman–Schütte ordinal with only “well defined” operations. More precisely speaking, this ordinal is often considered the limit of “predicative” mathematics. “Predicative” basically means you can only build on what you’ve already established: sets can only be defined in terms of sets that you already know exist. To get above the Feferman–Schütte ordinal, you need to use impredicative principles, such as assuming larger ordinals already exist even though you haven’t yet given a construction for them. (Disclaimer: there is some disagreement on exactly what “predicative” means and hence the corresponding ordinal.)
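For reference, the standard definitions behind Luke G’s summary (textbook material, not claims made in this thread) can be stated compactly:

```latex
% Veblen hierarchy and the Feferman–Schütte ordinal (standard definitions)
\begin{aligned}
\varphi_0(\alpha)        &= \omega^{\alpha},\\
\varphi_{\beta+1}(\alpha) &= \text{the } \alpha\text{-th fixed point of } \varphi_{\beta},\\
\varphi_{\lambda}(\alpha) &= \text{the } \alpha\text{-th common fixed point of all } \varphi_{\beta},\ \beta < \lambda \quad (\lambda\ \text{a limit}),\\
\Gamma_0                 &= \min\{\alpha : \varphi_{\alpha}(0) = \alpha\}.
\end{aligned}
```

Γ_0 is then the smallest ordinal the hierarchy cannot reach from below, which is why it shows up as the proposed boundary of predicativity.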

  133. Jonathan Weinstein Says:

    Thanks, I enjoyed very much the BB survey. I also scanned through your older article on big numbers. There are some (perhaps difficult) questions that would come up if we made the rules in the “bigger number” game that you can call BB, but not any other functions besides those on a calculator (which has exponents, but not factorials, let’s say.) Should I spend my limited time nesting BBs? Or putting a tower of exponents inside a BB? The following seems like a reasonable conjecture at a recursive answer: If it costs time d to write BB(), then the optimal solution given total time t is to write the optimal solution for time t-d, then put BB() around it.

  134. Scott Says:

    Jonathan #133: If you’re allowed only BB and calculator functions, then definitely, without a doubt, just nest BB as many times as you can (except possibly in the last second). I guess stuff like “BB(BB(…BB(100))) nested BB(100) times” is disallowed under your rules?

  135. Jon Says:

    Scott #119: re: sequences of coin flips, you may like

  136. Sniffnoy Says:

    maline #118, Luke G #132:

    Expanding on Luke G’s final comment, here’s an article by Nik Weaver arguing that the idea that Γ_0 is the first impredicative ordinal is unfounded, that it is in fact entirely predicative, and that the first impredicative ordinal is in fact much larger (larger than the small Veblen ordinal, and even — he claims but doesn’t explicitly argue — larger than the large Veblen ordinal). Note: Anything involving predicativity is very much not my area and I can’t really meaningfully discuss this myself, other than to say that it seems convincing to me.

    Also maline, just in case you’re not familiar with all these big countable ordinals, this series of blog posts by John Baez (1, 2, 3) is I think a good introduction. 🙂 (Although part 1 will probably be things you already know.)

  137. Persona Says:

    What about the Busy Beaver function of a Turing machine with oracle access to L?

  138. Scott Says:

    Persona #137: For which L?

  139. Persona Says:

    Scott #138: L as defined in Scott #96.

  140. Scott Says:

    Persona #139: Oh sure, you can always take BB with an oracle for whatever fast-growing function you previously defined (this is the BB version of what’s called the “Turing jump” in computability theory). I’m curious about whether you can go beyond diagonalizing across all the computable ordinal notations in some more impressive way than that.

  141. STEM Caveman Says:


    > Given a countable number of steps, you can even explicitly construct a countable nonstandard model of PA; the completeness theorem tells you exactly how.

    What? There are such strong limitations on how explicit it can be that, like Alice and Bob talking about unicorns, it is pretty much everything we could mean if we said “there is no such thing as an explicit nonstandard model”.

    You are very much correct about learning so much logic that even more logic is needed to escape the trap. But this can also be applied to the BB function — a grand theory of hypothetical numbers-according-to-a-formalism, that lack the materiality of what we think of as integers but (lo-GIC!!!!) get talked about as though they were, out of force of verbal habit.

  142. Curtis Bright Says:

    Sniffnoy #131:
    If I’m understanding Scott correctly, then he does have a problem with ℘(N). Because if this set platonically exists, then presumably there would be a definite answer to whether it has a subset whose cardinality is strictly between the cardinalities of N and ℘(N). (You also need to believe in the existence of ℘(N)×℘(N) to formalize this.)

  143. Scott Says:

    STEM Caveman #141: Sorry, by “construct” I just meant “follow the iterative process from the proof of the Completeness Theorem.” I didn’t mean that you’d end up with computable addition or multiplication operations (by Tennenbaum’s theorem, you won’t).

    Also, the surest sign that your study of logic has led you into a trap that you need more logic to get out of, is if you start to doubt the “materiality” of (by which we simply mean, objectivity of truths about) positive integers, whether we’re talking about 10 or about BB(10). 🙂

  144. Scott Says:

    Sniffnoy #131 and Curtis Bright #142: Since “have a problem with” is a bit vague, let me make this discussion concrete as follows. Can either of you propose a candidate for a mathematical statement with an indefinite truth-value—like I think is very plausibly the case for CH—but whose “logical complexity” is lower than CH’s?

    Somewhere between quantification over reals and quantification over sets of reals, we seem to lose Platonicity, and it would be fascinating to zoom in on where (although, if I try to look, I fear getting lost in a thicket of arcane ordinal notations).

  145. Vaarsuvius Says:

    Being entirely a novice in this area, my questions may or may not be laughable to the extreme. Nevertheless:

    1. The Busy Beaver number BB(n) seems to be a definition based on a qualifier i.e. “the grains of sand in the largest beach on Earth”. Of course, beaches are real places whose average volume of sand can be counted, just as the Busy Beaver is a condition that can be tested for by evaluating the possibility space of all possible Turing machines with n states. Doesn’t that make BB(n) as “real” a number as n, given that n’s definition in PA is just the number that satisfies the condition (n-1)+1 (speaking in extremely loose terms, since addition is defined recursively and each new number defined in terms of the previous)?

    2. Can we design a program to check for BB numbers in anywhere near reasonable time, even if you allow quantum computers?

  146. Nick Says:

    Wowee, a mention in section 5.7! I’ll claim comment #66 as “Nick Drozd”.

    The definition of “essentially different” in section 5.7 is not what I would have expected. Take a machine M and construct a new machine M’ by replacing every Left shift with a Right and vice versa. M’ will be just like M except that it runs in the opposite direction, so we wouldn’t want to count M and M’ separately for BB purposes. Maybe there are some other transformations like this. I’m sure there’s a more succinct way of describing this idea. Importantly, it is tedious but straightforward (primitive recursive) to verify that two machines are similar in this sense.

    In contrast, your definition says that machines M and M′ are different “if there exists an input configuration on which they run for different numbers of steps”. Certainly this is a striking way to define similarity, and it underscores the point from section 1.1 about “a goal completely orthogonal to any normal programmer’s goal”. But it doesn’t seem like a straightforward property to test for. Is it even decidable in general? If so, is it primitive recursive?

    Is it possible for two machines that exhibit different behavior to be runtime-equivalent? That is, do there exist two machines that run for exactly the same number of steps on all inputs but that leave different tape contents for some input?
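Nick’s left/right swap is easy to make concrete. A minimal sketch (my own illustration, not code from the thread): machines are encoded as dicts mapping (state, symbol) to (write, move, next), with move ∈ {−1, +1} and 'H' for halt, under the convention that a halting transition writes, moves, and counts as a step.

```python
def run(tm, cap=10**6):
    """Run a machine on a blank tape; return its halting step count,
    or None if it is still running after `cap` steps."""
    tape, pos, state, steps = {}, 0, 0, 0
    while steps < cap:
        write, move, nxt = tm[(state, tape.get(pos, 0))]
        tape[pos] = write
        pos += move
        steps += 1
        if nxt == 'H':
            return steps
        state = nxt
    return None

def mirror(tm):
    """Nick's transformation: swap every Left and Right shift."""
    return {k: (w, -m, nxt) for k, (w, m, nxt) in tm.items()}

# The 2-state champion (1RB 1LB / 1LA 1RH, which runs 6 steps) and its
# mirror halt after exactly the same number of steps.
champ = {(0, 0): (1, +1, 1), (0, 1): (1, -1, 1),
         (1, 0): (1, -1, 0), (1, 1): (1, +1, 'H')}
assert run(champ) == run(mirror(champ)) == 6
```

Since mirroring is an involution that preserves step counts on every input, any census of “essentially different” machines should presumably quotient it out.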

  147. Gerald Says:

    Scott #119: “All the same, I … feel like I have a clear enough conception of what it would mean to flip a fair coin a countable number of times. And I’m very open to hearing counterexamples, but so far I haven’t heard a compelling problem that would come from extending my Platonism to the whole uncountable collection of such countable sequences of coin flips. I have no similarly clear conception of what it would mean to pick a random set of reals (where do you start?), or of whether it’s “better” for CH or AC or Projective Determinacy to hold or not hold.”

    This is a strong statement, I’m not that confident. Should there be a nonconstructible real? Should 0# exist? Should Projective Determinacy hold? Must god know the one correct answer to these questions?

    If you say you have a clear conception about arbitrary (!) ω-sequences of coin flips (reals) does this intuition include quantifying over reals? Should every second-order arithmetic statement (allowing quantifiers over reals) have a definite “right” truth value?

    While CH is a third-order statement involving quantifiers over arbitrary sets of reals, PD talks only about definable sets of reals. It is an infinite scheme of statements in second-order number theory, similar to induction in PA. Should it have a definite truth value? We know it fails in L but follows from large cardinals.

    The reason why I’m a platonist about first-order arithmetic is that I know the quantifiers should only range over numbers I can at least in principle write down. If Goldbach is wrong, we can in principle write down an actual counterexample. This does not apply to the reals: they are uncountable, and I don’t have a name for most of them. There are many forcing constructions that add a single or only a few new interesting reals with special properties. I can’t see in general whether these new reals should exist or not; they all seem legit. I don’t see when the set of reals is ever complete; it seems to never be. Making the universe broader adds new reals. Making the universe higher also adds new reals of a different kind. For the positive integers, every model of even the weakest theory already contains all of them.

    Btw. the countable ground model is only a technicality in some approaches to forcing. One can start with an arbitrary transitive model (even V itself) and then construct V[G] via the boolean-valued-model/ultrafilter construction.

  148. Scott Says:

    Vaarsuvius #145:

    1. Yes, I’d say that the BB numbers are as sharply defined as any other numbers in math. People have even determined the first 4 of them!

    2. There is no algorithm, classical or quantum, to calculate BB(n) given n, in any finite amount of time. That’s the whole point of being uncomputable. The first 4 values were determined via a combination of automated tools and ad hoc hand analysis of particular machines to prove that they didn’t halt.
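What can be done algorithmically is a one-sided search: enumerate every n-state, 2-symbol machine, simulate each on a blank tape for a bounded number of steps, and record the longest halting run. This yields a certified lower bound on BB(n); it equals BB(n) exactly when the cap is large enough, which is what the nonhalting proofs establish. A minimal sketch, with my own hypothetical encoding of machines as dicts:

```python
from itertools import product

def run(tm, cap):
    """Simulate on a blank tape; return the halting step count, or None
    if the machine is still running after `cap` steps. A halting
    transition writes, moves, and counts as a step."""
    tape, pos, state, steps = {}, 0, 0, 0
    while steps < cap:
        write, move, nxt = tm[(state, tape.get(pos, 0))]
        tape[pos] = write
        pos += move
        steps += 1
        if nxt == 'H':
            return steps
        state = nxt
    return None

def bb_lower_bound(n, cap):
    """Max halting runtime over all n-state, 2-symbol machines simulated
    for at most `cap` steps: a certified lower bound on BB(n)."""
    keys = [(s, b) for s in range(n) for b in (0, 1)]
    options = list(product((0, 1), (-1, +1), list(range(n)) + ['H']))
    best = 0
    for rules in product(options, repeat=len(keys)):
        t = run(dict(zip(keys, rules)), cap)
        if t is not None and t > best:
            best = t
    return best

print(bb_lower_bound(2, 100))  # -> 6, recovering BB(2) = 6
```

The roughly (4n+4)^(2n) machines make this hopeless beyond very small n, which is exactly where the ad hoc nonhalting analyses take over.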

  149. Nick Says:

    Here’s a historical question about Rado’s paper “On Non-Computable Functions”. Reading over it just now, I notice that the abstract, the introduction, and the closing summary do not make any mention of Turing machines or the Busy Beaver function. Instead, they discuss the general point that non-computable functions can be defined using “extremely primitive” means, like “the principle that a finite, non-empty set of non-negative integers has a largest element”, and without using any diagonal arguments. The rest of the paper describes the Busy Beaver game in detail.

    So the question is: did Rado sandwich the Busy Beaver game in those general observations at the behest of an editor? I’m imagining a scenario where Rado just writes about BB, and an editor says, that’s frivolous, recreational, etc, and then Rado (or maybe even the editor) adds the stuff about “exceptionally well-defined” functions as a way of justifying BB’s importance. But that’s just my feeling. Does anyone know for sure?

  150. STEM Caveman Says:

    > PA is equivalent to ZF-INF (axiom of infinity deleted)

    Negated, not deleted. I.e., ZF + Not(INF).

  151. Sniffnoy Says:

    Hm, so having now read the part about beeping busy beavers — is there any way to extend this construction to make a function that grows like BB_k for any given k? Like is there a way to get an approximation of BB_2 via “beep-beeping busy beavers”? 🙂

  152. dm Says:

    Your survey is surprisingly accessible to a biologist with limited math sophistication. I (apparently) followed things well-enough that the question of uniqueness came to mind just in time to be addressed by your conjecture.
    1) What does the distribution of halting run-times look like for TMs below the champion, for, say, n=5? I’m guessing that the current champion is well separated from the pack.
    2) What is the smallest state number needed to implement the Collatz map on consecutive numbers? Presumably, BB(n) for that number must be greater than 2^68.

  153. Scott Says:

    Sniffnoy #151:

      is there any way to extend this construction to make a function that grows like BB_k for any given k? Like is there a way to get an approximation of BB_2 via “beep-beeping busy beavers”?

    Hey, I wondered about that too! My guess was that it’s just a happy accident that there’s a definition for BBB that’s essentially as simple as the definition of BB itself, and that as k gets large, the definitions for BB_k will necessarily get more and more kludgy and complicated. But I’d love to be proven wrong about that!

  154. Scott Says:

    Gerald #147: Well, as a quantum mechanicist in my day job, I’m more than happy to take a fair coin-flip as a basic primitive of my conceptual universe—and having flipped a coin 10 or 20 times, I’m more than happy to add an ellipsis and imagine the same process continuing forever. 😀 Whereas there’s nothing I’m similarly happy with that would need to be done separately to each point on a continuum.

    Having said that, I’m entirely open to being convinced that there’s some question about the existence or nonexistence of an infinite binary sequence, that I’d admit as probably not having a Platonic answer. I just haven’t seen it yet.

    Regarding 0# and nonconstructible reals, can’t we even explicitly calculate a few of their digits—just like we can explicitly calculate the first few values of the BB function? Can’t we imagine whatever we did continuing to infinity? If so, then my feeling is unequivocally: yes, these objects exist. Which is entirely compatible with the fact that set theories that we might want to use might be unable to talk about the objects—because if they could then they’d yield a contradiction, and indeed the objects were basically constructed by diagonalizing against those set theories.

    I still don’t understand the various determinacy axioms well enough to have feelings one way or the other, although perhaps I’ve already committed myself to a view with what I said above.

  155. Scott Says:

    dm #152:

    1) There are probably many, many interesting things to figure out about the distribution of runtimes, and I hope my survey plays some part in inspiring people to do that! But, yes, the current n=5 and n=6 champions are well separated from the pack, although there are other 5- and 6-state machines with “qualitatively similar” behavior (e.g., running for about half as long as the champion in the 5-state case, or for 10 raised to a slightly smaller exponent in the 6-state case).

    2) Collatz is not exactly a question about the halting of a Turing machine on a specific input—it’s a question about whether a Turing machine halts on all inputs. So Collatz and BB aren’t directly comparable. Still, Pascal Michel has indeed studied the smallest Turing machines that implement the Collatz map on arbitrary inputs—see for example this paper.

  156. Curtis Bright Says:

    If I understand your view correctly, Scott:

    • Statements in first-order arithmetic: have definite truth-values.

    • Statements in second-order arithmetic: you suspect they have definite truth-values, but you are open to counterexamples.

    • Statements in third-order arithmetic: may not have definite truth-values (e.g., the continuum hypothesis).

    I found a paper that lists a number of statements equivalent to a “second-order version of the continuum hypothesis” (equivalent to what Gerald mentioned, that all reals are constructible). So you would have to believe that either the propositions in Thm 1.1 are true or the propositions in Thm 1.2 are true, and maybe you do—I’m not sure if these count as plausible “indefinite truth-value” statements or not.

  157. Bruce Smith Says:

    Regarding the “open problem” of “improving Proposition 1” (BB(n+1) > BB(n)), I think I can improve it in two distinct ways — one that I suspect might be obvious (though I think your paper could usefully mention it), and one that your paper implies is new.

    Theorem X1 (suspected obvious): BB(n+2) ≥ BB(n)(1 + 2/n).

    Theorem X2 (new): BB(n+1) ≥ BB(n) + 2.

    Here are their proofs, encoded by rot13 for those of you who’d like to puzzle them out first:

    Proof of X1:

    Yrg O or n znkvzny-ehagvzr znpuvar va G(a), naq yrg C or gur fgngr bs O juvpu vf zbfg bsgra gur pheerag fgngr nf nal fgrc bs O fgnegf ehaavat. (C znl be znl abg or gur vavgvny fgngr.)

    Abgr gung C vf gur pheerag fgngr ng gur fgneg bs ng yrnfg OO(a)/a fgrcf.

    Jr jvyy znxr n arj znpuvar O’ ol zbqvslvat O, nqqvat gjb arj fgngrf naq vapernfvat vgf ehagvzr.

    Gb qb guvf, fgneg jvgu O, gura ercynpr vgf fgngr C jvgu n guerr-fgngr frdhrapr, A1 A2 C, jurer gur frdhrapr A1 A2 “qbrf abguvat” naq C npgf nf orsber. (Vs C jnf gur vavgvny fgngr, A1 vf abj gur vavgvny fgngr; bgurejvfr gur vavgvny fgngr vf hapunatrq.)

    Zber cerpvfryl, rirel rkvfgvat “genafvgvba gb C” (fbzr fhofrg bs gur 2a genafvgvba ehyrf va O) vf ercynprq ol na bgurejvfr-vqragvpny genafvgvba gb A1. Qrsvar A1 gb abg nygre gur gncr (v.r. jevgr gur fnzr ovg vg’f ernqvat), zbir yrsg, naq tb gb A2. Qrsvar A2 gb abg nygre gur gncr, zbir evtug, naq tb gb C. Qrsvar C nf orsber (rkprcg gung vs bar bs vgf rkvfgvat genafvgvbaf jrag gb C, vg abj tbrf gb A1 qhr gb gur zbqvsvpngvba nyernql qrfpevorq).

    Pbzcnevat n eha bs O naq n eha bs O’, rirel fgrc fgnegvat abg ng C orunirf nf orsber. Rirel fgrc fgnegvat ng C orpbzrf n frevrf bs guerr fgrcf fgnegvat ng A1, A2, naq C, erfcrpgviryl. Gur A1 fgrc zbirf yrsg, gur A2 fgrc zbirf evtug (arvgure bar nygref gur gncr), gura gur C fgrc fgnegf va na vqragvpny fgngr nf vg qvq va O, fb vg orunirf vqragvpnyyl.

    Fvapr jr pubfr C gb eha ng yrnfg OO(a)/a gvzrf, rnpu bs A1 naq A2 nyfb ehaf gung znal gvzrf. Gur bgure fgngrf eha gur fnzr ahzore bs gvzrf nf orsber. DRQ.

    (Vs genafvgvbaf jrer nyybjrq gb “zbir arvgure Y abe E”, jr pbhyq nqq bayl bar arj fgngr, naq cebir OO(a+1) ≥ OO(a)(1 + 1/a).)

    Proof of X2:

    Qrsvar O nf orsber, ohg yrg C or gur fgngr juvpu vf pheerag jura vg unygf (va vgf fvatyr eha ba gur mreb gncr), naq (nsgre gung unyg) yrg gur gncr pbagnva o1 ng gur pheerag cbfvgvba, naq o2 gb vgf vzzrqvngr yrsg. Abgr gung guvf vzcyvrf C’f ehyr sbe o1 vf UNYG. (Gurer znl or bgure UNYG vafgehpgvbaf va gur znpuvar, va C be va bgure fgngrf, ohg gurl arire eha.)

    Jr jvyy sbez n arj znpuvar O’ sebz O ol erivfvat C naq nqqvat bar arj fgngr A.

    Yrnir C’f ehyr sbe pbzcyrzrag(o1) hapunatrq, ohg ercynpr C’f UNYG ehyr sbe o1 jvgu “jevgr pbzcyrzrag(o2), zbir Yrsg, tb gb A”.

    Qrsvar A fb gung jura vg ernqf o2, vg zbirf evtug, ohg jura vg ernqf pbzcyrzrag(o2), vg unygf.

    Pbzcnevat gur eha bs O’ gb gur eha bs O, gurl orunir vqragvpnyyl hagvy gur ynfg fgrc bs O, jura O unygf ohg O’ qbrfa’g. Gur gncr ng gur fgneg bs gung fgrc (va rvgure znpuvar) ybbxf ybpnyyl yvxr [o2] [C o1]. (Gung vf, vg unf gjb fhpprffvir pryyf nf fubja, jvgu gur frpbaq bar pheerag naq gur fgngr orvat C, naq nyy bgure pryyf pbhyq or nalguvat.)

    Va O’, gur arkg fgngr ybbxf yvxr [A o2] [pbzcyrzrag(o2)], naq gur fgngr nsgre gung ybbxf yvxr [o2] [A pbzcyrzrag(o2)], naq nsgre gung gur znpuvar unygf. DRQ.


  158. Joshua B Zelinsky Says:

    It might also be of interest to examine the graph structure of the Busy Beaver machines. Often when introducing Turing machines to students, we represent them as directed graphs, with vertices for states and directed edges for transitions. For a given Turing machine, we can then ask what properties its graph has.

    In that context: do Busy Beaver machines always have strongly connected graphs (in the sense that given any two distinct vertices A and B, there is always a directed path from A to B)? This is true for the 2-state, 3-state, and 4-state machines, as well as the candidates for 5 and 6. It is *not* the case for Wythagoras’s 7-state candidate.

    Here’s a rough intuition for why we should expect Busy Beaver machines to be strongly connected. Assume that a given Turing machine is not strongly connected. Then we can partition the states into two non-empty subsets X and Y, such that after we have entered a state in Y we never return to any state in X. Note that at some point all transitions stay in Y. Let’s say X has x total states and Y has y total states. Then the X component is used for at most BB(x) steps, and has written at most BB(x) tape symbols.

    Define T(k,n) to be the largest number of steps an n-state Turing machine can run for before halting, when run on some input of length no larger than k. (Note that T(0,n) = BB(n).) Now, assume that the m-state Busy Beaver has a partition given as above, with m = x + y.

    Then BB(x+y) would be bounded above by BB(x) + T(BB(x),y). This seems unlikely to be true for large m.

    A slightly weaker but also plausible guess is that there is some constant k such that any Busy Beaver machine has a strongly connected graph with at most k exceptional states/vertices. And again, a similar argument would apply to the above.

    If this is true in general, this might also in some sense give implicitly a statement about the limits of introspection; if I understand the introspection technique, it should often lead to Turing machines which are not strongly connected.
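Joshua’s strong-connectivity property is easy to test mechanically for any given machine. A minimal sketch (my own illustration, assuming a hypothetical dict encoding of machines as (state, symbol) → (write, move, next-state or 'H')):

```python
def state_graph(tm):
    """Directed graph on states: an edge s -> t for each non-halt
    transition (s, symbol) -> (write, move, t)."""
    g = {}
    for (s, _), (_, _, nxt) in tm.items():
        g.setdefault(s, set())
        if nxt != 'H':
            g[s].add(nxt)
            g.setdefault(nxt, set())
    return g

def strongly_connected(g):
    """Kosaraju-style check: every vertex is reachable from one root in
    both the graph and its reverse."""
    def reachable(adj, root):
        seen, stack = {root}, [root]
        while stack:
            for t in adj[stack.pop()] - seen:
                seen.add(t)
                stack.append(t)
        return seen
    rev = {v: set() for v in g}
    for s in g:
        for t in g[s]:
            rev[t].add(s)
    root = next(iter(g))
    return len(reachable(g, root)) == len(g) == len(reachable(rev, root))

# The 2-state champion's graph (A <-> B) is strongly connected.
champ = {(0, 0): (1, +1, 1), (0, 1): (1, -1, 1),
         (1, 0): (1, -1, 0), (1, 1): (1, +1, 'H')}
assert strongly_connected(state_graph(champ))
```

By Joshua’s observation, the same check would fail on Wythagoras’s 7-state candidate.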

  159. Joshua B Zelinsky Says:

    Bruce @ #157,

    The first of your two statements I think has been proved before (someone proved that result in an earlier comment thread here).

    Your second proof’s trick of using the actual situation on the tape at the halt is clever.

  160. Vasek Says:

    I wonder, can some of the conjectures from the survey make a good project for PolyTCS? I feel that an understandable problem that anyone can start playing with could have its place there.

  161. Scott Says:

    Curtis Bright #156: Yes, that’s precisely my view.

    I looked at the paper you linked and found it quite interesting, but I’d strongly prefer a candidate whose statement didn’t involve Σ^1_2 definability or whatever—one that was just directly about the underlying concepts, like CH is.

  162. Scott Says:

    Bruce Smith #157: Nice!! May I include a link to your comment in the paper and acknowledge you? (Although maybe I should de-rot13 it first.)

  163. Scott Says:

    Joshua #158: Goddammit, yet another great question from you! Actually, the same question had rattled around in my mind when I looked at the current BB(7) champion and thought “that can’t possibly be optimal,” but I never formulated the conjecture explicitly.

    Alas, I’m already way over SIGACT News’s page limit, and am not sure if I should risk the editors’ ire by exceeding BB(3) pages… 🙂

  164. maline Says:

    Luke G #132: Let me try to be explicit about what I am assuming. Let’s say that I accept the following:

    1) The questions of whether a given Turing machine will halt on a given input, and of what the output will be if it does halt, are “well-defined” (meaning they have “absolutely true” answers).

    2) A “well-defined” oracle is one that provides values of a “well-defined” function from the naturals to the naturals.

    3) The behavior of a Turing machine that is able to query a “well-defined” oracle is itself “well-defined”.

    Now consider the ordinal hierarchy of oracles, where oracle 1 answers the Halting problem, oracle n+1 answers the Halting problem for machines with access to oracle n, and the oracle for a limit ordinal allows access to all of the previous oracles. Up until what ordinal should I consider these oracles to be “well-defined”?

    It seems clear that the only obstacle will be if, at a particular limit ordinal, the capability of accessing all previous oracles cannot be expressed as a “well-defined” function from the naturals to the naturals – that is, the command to query a particular sub-oracle cannot be expressed as a natural number in a “well-defined” way. My initial thought was that this might happen at epsilon_0, because any ordinal smaller than epsilon_0 is naturally indexed by tuples of a fixed length: the coefficients in the Cantor normal form.

    But this of course was silly of me; there are infinitely many other indexing schemes that can work for much larger ordinals. Indeed, if I understand correctly, any ordinal smaller than the Church-Kleene ordinal can be put in bijection with the natural numbers in a recursive way. Such a bijection would provide exactly the indexing we need to “well-define” our limit ordinal oracle. But we can do even more: why should we be limited to recursive schemes, when we have this incredible tower of oracles at our command? All we should require is that, for each particular limit ordinal, there must be an indexing scheme that can be decoded by a Turing machine equipped with some specific earlier oracle. So how far up can we actually go?
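One way to see why nothing goes wrong far below the Church-Kleene ordinal: every ordinal below (say) epsilon_0 already has a finite, machine-checkable name via its Cantor normal form, and the ordering on names is computable, which is exactly the kind of “well-defined” indexing the limit-stage oracles need. A toy sketch of such an indexing (my own illustration): an ordinal ω^e1 + ω^e2 + … with e1 ≥ e2 ≥ … is represented as the tuple of its exponents, each exponent recursively in the same form.

```python
def cmp_ord(a, b):
    """Compare two ordinals below epsilon_0 in Cantor normal form,
    represented as tuples of exponent-ordinals in non-increasing order.
    Returns -1, 0, or 1. Comparison is lexicographic, with exponents
    compared recursively; a proper prefix is the smaller ordinal."""
    for x, y in zip(a, b):
        c = cmp_ord(x, y)
        if c != 0:
            return c
    return (len(a) > len(b)) - (len(a) < len(b))

ZERO = ()            # 0
ONE = (ZERO,)        # w^0 = 1
TWO = (ZERO, ZERO)   # 1 + 1
OMEGA = (ONE,)       # w^1
OMEGA_SQ = (TWO,)    # w^2
assert cmp_ord(TWO, OMEGA) == -1          # 2 < w
assert cmp_ord(OMEGA, (ONE, ZERO)) == -1  # w < w + 1
assert cmp_ord(OMEGA_SQ, OMEGA) == 1      # w^2 > w
```

The schemes maline describes replace “computable” here with “computable relative to some earlier oracle,” but the shape of the indexing problem is the same.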

  165. Bruce Smith Says:

    Scott #162: Please do! And if you can locate the original of Theorem X1 (see Joshua #159), I think that’s highly worth mentioning too, since the paper has very few limits on BB(n + c) in terms of BB(n) for any small constants c.

  166. Bruce Smith Says:

    Scott #163 and Joshua #158: I do think that argument for a “likely-almost-connected directed graph” is pretty interesting, and worth mentioning (I think you could do it in a single paragraph).

    More generally, so are intuitive arguments modeling the likely evolution of a long-running machine as a “pseudorandom walk through tape states which slowly lengthen”. For example, that would “explain” the runtime being quadratic in the number of ones left or the number of cells reached (due to statistics of random walks of tape position). (I’m sure such ideas are not original. I even think I read them already in related old columns by Martin Gardner.)

  167. Bruce Smith Says:

    Thinking more, I think I was wrong about the random walk statistics prediction. It would only hold for the part of the runtime spent while the tape state had already reached maximal length, which would be a small fraction of the runtime. Your earlier explanation of the observed quadratic relation, implying a more systematic scanning of the tape, seems better.

    I suppose people have examined machine-state sequences to see how random they look. (BTW, I’m sure Wolfram must have written up some experiments related to this in A New Kind of Science.)

  168. Luke G Says:

    maline #164

    This is getting outside my familiarity (so I hope someone here can correct me if I’m wrong!), but my understanding is that adding BB oracles doesn’t actually give you access to ordinals above the Church-Kleene ordinal. In particular, the Church-Kleene ordinal is both the limit of recursive ordinals and the limit of hyperarithmetical ordinals.

  169. Bruce Smith Says:

    I think I can improve Proposition 1 further!

    Theorem X3: BB(n+1) ≥ BB(n) + 3.


    Let B be a maximal-runtime machine in T(n), and let P be the state which is current when it halts (in its single run on the zero tape), and (after that halt) let the tape contain b1 at the current position, and b2 to its immediate left. Note that this implies P’s rule for b1 is HALT. (There may be other HALT instructions in the machine, in P or in other states, but they never run.)

    We will form a new machine B’ from B by revising P and adding one new state N.

    Leave P’s rule for not(b1) unchanged, but replace P’s HALT rule for b1 with “write b1, move Left, go to N”.

    Define N with these rules:
    b2: write not(b2), move Right, go to P;
    not(b2): HALT.

    Comparing the run of B’ to the run of B, they behave identically until the last step of B, when B halts but B’ doesn’t. The tape at the start of that step (in either machine) looks locally like [b2] [P b1]. (That is, it has two successive cells as shown, with the second one current and the state being P; all other cells could be anything.)

    B’ will perform three extra steps compared to B, with these states and actions:

    (from above) [b2] [P b1] (will halt in B but not in B’)
    new state 1: [N b2] [b1]
    new state 2: [not(b2)] [P b1]
    new state 3: [N not(b2)] [b1] (will halt in B’). QED.


    Can we do better by using more knowledge about the tape? (We know its entire state when B halts.)

    For most tape states, yes; but it might be all 0 (except for [P b1] if b1 is not 0), and in that case I can’t think of any way to use the rest of it with so few new instructions.

    We also don’t know enough to safely have N jump to whatever state last jumped to P — our revised instruction in P, by writing the correct bit onto the tape, could make P safe to rerun, or make that next-older state safe to rerun, but (in general) not both at once.
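Bruce’s construction is concrete enough to machine-check on small examples. Below is a sketch (my own code, not Bruce’s), for 2-symbol machines, using the halt-in-place convention his proof assumes: a HALT rule neither writes nor moves. The test machine is hypothetical, chosen only because it halts quickly.

```python
def run(tm, cap=10**6):
    """Simulate on a blank tape; a rule equal to 'H' halts in place.
    Returns (steps, tape, pos, state) at the moment of halting."""
    tape, pos, state, steps = {}, 0, 0, 0
    while steps < cap:
        rule = tm[(state, tape.get(pos, 0))]
        if rule == 'H':
            return steps, tape, pos, state
        write, move, nxt = rule
        tape[pos] = write
        pos += move
        state = nxt
        steps += 1
    raise RuntimeError("no halt within cap")

def extend(tm, n):
    """Theorem X3's construction: from an n-state halting machine B,
    read off the halting state P and the symbols b1 (current cell) and
    b2 (cell to its left), then build B' with one new state N
    (numbered n) that runs exactly 3 steps longer."""
    _, tape, pos, P = run(tm)
    b1, b2 = tape.get(pos, 0), tape.get(pos - 1, 0)
    new = dict(tm)
    new[(P, b1)] = (b1, -1, n)      # was HALT: write b1, move Left, -> N
    new[(n, b2)] = (1 - b2, +1, P)  # N reads b2: flip it, move Right, -> P
    new[(n, 1 - b2)] = 'H'          # N reads not(b2): HALT
    return new

# A small 2-state machine (not necessarily a champion under this
# convention): it halts after 5 steps, and its extension after 8.
B = {(0, 0): (1, +1, 1), (0, 1): (1, -1, 1),
     (1, 0): (1, -1, 0), (1, 1): 'H'}
assert run(B)[0] == 5
assert run(extend(B, 2))[0] == 8
```

Running this on the actual 5- and 6-state champions would be the real test; the toy machine only checks the bookkeeping.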

  170. Bjørn Kjos-Hanssen Says:

    Solovay in the 1970s showed that a function is computable from every sufficiently fast growing function (given as an oracle) iff it is hyperarithmetic. So the other functions all have more to them than just their speed, so to speak.

    P.S. Google seems to have a hard time learning that I want this blog and not the Israeli TV show Shtisel!

  171. STEM Caveman Says:

    @Scott 143,

    It’s not just the lack of computable model. The completeness theorem is a strong form of the Axiom of Choice, and something like that must be true for the countable case, e.g. equivalence to countable Dependent Choice (combinatorially, Koenig’s tree lemma or the like). This is wildly nonconstructive. The completeness theorem in such cases is a proof that you will never reach a contradiction if you pretend to talk about “actual objects” rather than syntax, and I guess one can get used to the talk to the point that it feels real, but the proof does not produce new nonstandard objects at the same ontological level as the old, standard ones.

    So it looks to me like your minimal meta-mathematical assumptions are Dependent Choice and the existence of a definite answer to the halting of any individual TM, regardless of our ability to find that answer, coupled to some basic set-theoretic setup for talking about discrete objects. This is indeed pretty minimal by modern standards, and covers virtually all of concrete mathematics for TCS, but I don’t think it is all that coherent when examined. Basically the trick is to avoid issues by leaving certain things vague or undefined. When you stop doing that the issues re-emerge and at a more finite level.

    For example, your article takes it as a given that the runtimes of n-state machines that halt are meaningful (“objective” as you wrote), since they can eventually be computed. But they rapidly exhaust what is physically computable, and what we actually (physically) have are upper and lower bounds produced within some theory. The length of the halting proofs, a.k.a. upper bounds, has its own Busy Beaver-like growth as a function of n. Things become theory-dependent, and the supposed objectivity is that there is some consistency between the different theories that can be effectively used. That’s no doubt true for very small n, but there quickly appears a need for more and more sophisticated theory and more choices of theory, so you get a miniaturized form of all the Goedelian complications it seemed one was avoiding by talking about things that halt in finite time.

  172. gentzen Says:

    maline #164, Luke G #168

    Not claiming to be an expert either, but you are basically correct that the hyperarithmetic sets are closely related to the construction outlined by maline. However, I never heard of hyperarithmetical ordinals. But it is a good question, what comes after the recursive ordinals, in terms still related to computability.

    (And now I had a short parenthetical remark putting “possibly relevant computability concepts” into context, but it got so long that I decided it earned its own paragraph:) If we denote recursive/computable by Δ_0^0, then Δ_1^0 would be next, which denotes limit computability, described in footnote 62 as “trial and error procedure … by following a guessing procedure which may make a finite number of initial errors before it eventually converges to the correct answer” (attributed to Putnam). Then comes Δ_2^0, Δ_3^0, …, but that is a bit boring and not very enlightening conceptually, except maybe for the fact that we get Δ instead of Σ or Π, because we need function classes. And then we come to Δ_0^1, which denotes arithmetic computability/definability. And next comes Δ_1^1, which denotes hyperarithmetic computability/definability. This is again an interesting concept, which has at least three totally different characterizations. Next would be Δ_2^1, I don’t know a name for that, but it is the last one still covered by Shoenfield’s absoluteness theorem.
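    The “trial and error” picture of limit computability quoted above can be made concrete with a toy sketch (my own illustration; `stage_guess`, `countdown`, and `forever` are hypothetical names): a stage-t guesser simulates a process for t steps and guesses that it halts iff it has halted already. The guesses may be wrong finitely often, but they eventually converge to the correct answer.

```python
def stage_guess(program, t):
    """Stage-t guess at "does `program` ever halt?": simulate t steps
    and guess True iff it has halted by then (Putnam-style trial and error)."""
    state = program["init"]()
    for _ in range(t):
        state = program["step"](state)
        if state is None:          # halted within the budget
            return True
    return False                   # no halt seen yet, so guess "runs forever"

# Toy halting process: counts down from 5, then halts (signalled by None).
countdown = {"init": lambda: 5,
             "step": lambda n: None if n == 0 else n - 1}
# Toy non-halting process: counts up forever.
forever = {"init": lambda: 0, "step": lambda n: n + 1}

guesses = [stage_guess(countdown, t) for t in range(10)]
# Wrong (False) at stages 0..5, then permanently correct (True) from stage 6 on.
```

    For the non-halting process the guesser is simply correct at every stage; the point is that no stage ever certifies that the answer has stabilized.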

    What else could/should I say about hyperarithmetic sets? (I tried to explain why I find such concepts important, but then I drifted too much into closure of function classes under composition, Weihrauch complexity (because there are subtle differences in ways to compose functions), and I was too unsure whether closure under composition is trivial or not:) Classes like Δ_0^0, Δ_1^0, Δ_0^1, and Δ_1^1 interpreted as function classes are closed under composition of functions. It is less obvious to me whether this is still true for classes like Δ_2^0, Δ_3^0, Δ_2^1, or Δ_3^1. At least for the individual classes in the logarithmic time hierarchy (AC^0), I guess it is not true, but maybe I am making a trivial mistake here. (So I better stop here, and do my homework first.)

  173. gentzen Says:

    Gerald #75:

    An interesting, much stronger theory would be Second Order Number Theory with Projective Determinacy (Z2+PD). Since the work of Woodin and others in the 1980s there is a growing consensus that Z2+PD is the canonical theory of second order math, i.e. the right theory of V(ω+1), while PA being the canonical theory of first order number theory.

    Scott #144:

    …, let me make this discussion concrete as follows. Can either of you propose a candidate for a mathematical statement with an indefinite truth-value—like I think is very plausibly the case for CH—but whose “logical complexity” is lower than CH’s?

    Somewhere between quantification over reals and quantification over sets of reals, we seem to lose Platonicity, and it would be fascinating to zoom in on where

    In response to Scott’s question, I looked up the following FOM post by Dmytro Taranovsky about Z2+PD. I had not noticed that Gerald had already mentioned Z2+PD before. The relevant point here is that ZFC is compatible with Z2+PD but does not prove it, yet there is a growing consensus that Z2+PD is “the canonical theory of second order math” or “the basic theory of real numbers and projective sets”.

  174. Joshua B Zelinsky Says:

    @Bruce Smith #169,

    That looks valid.

    “For most tape states, yes; but it might be all 0 (except for [P b1] if b1 is not 0), and in that case I can’t think of any way to use the rest of it with so few new instructions.”

    So it seems like one reasonable subgoal would be to prove that for n>1, the Busy Beaver machine on n states never leaves a blank tape when it is finished. I’m not sure how to prove that, but it may be worth noting that if it did, then a whole bunch of moves at the end would have to be writing 0s. That should restrict what the system can do, via something akin to a pumping lemma or a pigeonhole argument: on its last pass, the machine mostly moves in one direction away from where it has a lot of 1s, needs to replace them all with 0s, and then must somehow know enough to stop when it gets to the end of the line.

  175. Scott Says:

    Bjørn Kjos-Hanssen #170:

      Solovay in the 1970s showed that a function is computable from every sufficiently fast growing function (given as an oracle) iff it is hyperarithmetic. So the other functions all have more to them than just their speed, so to speak.

    That’s extremely interesting—do you have a reference for that?? Andy Drucker just raised the question to me, in an email a few weeks ago, of characterizing the functions that are computable given an oracle for any sufficiently fast-growing function.

  176. Scott Says:

    STEM Caveman #171: Your comment led me to some interesting reading on Dependent Choice, Countable Choice, König’s Lemma, Weak König’s Lemma, and the relation of all these axioms to each other and to the Completeness Theorem. Briefly, I have no problem at all with Dependent Choice—as I said before, I’m totally fine with the notion of flipping a coin a countable number of times. (Flipping it an uncountable number of times is a different matter.) All the same, if I had to give up Dependent Choice or even König’s Lemma, I could still do math and TCS with hardly any problem.

    By contrast, I could not do math and TCS if I had to give up the idea that a Turing machine either halts or runs forever, and that which of the two it does is an iron fact of Platonic reality. It’s the denial of that idea that strikes me as crazy and barely even comprehensible. I’m inclined to turn the tables and ask: what’s a specific example of a Turing machine whose halting you consider to be indeterminate? (As opposed to not yet known, which surely you agree is different?) Or: you agree, don’t you, that BB(1)=1, BB(2)=6, BB(3)=21, BB(4)=107, BB(5)≥47,176,870, BB(6)>10^36,534, BB(7)>10^10^10^10^18,705,353, that these are all facts as surely as 4+3=7 is a fact? Great then, what’s the first value of n for which you think that BB(n) is indeterminate?
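    For the smallest of these values, the “iron fact” is even mechanically checkable. A minimal brute-force sketch (my own code, not from the paper) recovers BB(2) = 6, with 4 ones written, by running every 2-state, 2-symbol machine from a blank tape. The step cap of 50 is safe only because S(2) is already known to be small; for larger n no such shortcut exists.

```python
from itertools import product

def run(machine, cap):
    """Simulate a 2-symbol TM from a blank tape; `machine` maps
    (state, symbol) -> (write, move, next_state), with next_state -1 = halt.
    Returns (steps, ones) if it halts within `cap` steps, else None."""
    tape, pos, state, steps = {}, 0, 0, 0
    while steps < cap:
        write, move, nxt = machine[(state, tape.get(pos, 0))]
        tape[pos] = write
        pos += move
        steps += 1
        if nxt == -1:                       # the halting transition counts as a step
            return steps, sum(tape.values())
        state = nxt
    return None

# Enumerate all 2-state machines: each of the 4 (state, symbol) slots gets
# a write bit, a move (-1 = left, +1 = right), and a next state in {0, 1, halt}.
options = list(product([0, 1], [-1, 1], [0, 1, -1]))
best_steps, best_ones = 0, 0
for rules in product(options, repeat=4):
    machine = dict(zip([(0, 0), (0, 1), (1, 0), (1, 1)], rules))
    result = run(machine, cap=50)
    if result:
        best_steps = max(best_steps, result[0])
        best_ones = max(best_ones, result[1])
# best_steps == 6 and best_ones == 4, matching BB(2) = 6 and Sigma(2) = 4.
```

    The same loop with 3 states already needs a vastly larger cap and non-halting analysis, which is exactly where the interesting difficulties begin.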

  177. Gargantua Says:

    Let ω(n) be the supremum of all the computable ordinals whose order relations are computable by n-state Turing machines. Could we consider Turing machines equipped with an oracle for all BB_{ω(n)} and define their Busy Beaver function?

  178. Bjørn Kjos-Hanssen Says:

    Scott #175:

    Theorem. The following are equivalent
    (i) f is hyperarithmetic
    (ii) “f is computable from all sufficiently fast-growing functions” in the sense that there is a function g such that all h dominating g compute f.

    Proof sketch:
    (ii) implies (i): Apply Solovay’s result in “Hyperarithmetically encodable sets”, TAMS 1978 [link: ]
    to the range of g.
    (i) implies (ii): Consider a characterization of hyperarithmetic sets in terms of iterated halting problems over the computable ordinals. [ ]

    I’m not sure if this has been spelled out in a paper yet actually. A version for partial functions seems to be in Stephan and Yu, “A reducibility related to being hyperimmune-free”, APAL 2014.

  179. Scott Says:

    Gargantua #177: The problem is that, in order to define BB_α(n), where α is an ordinal, you need a notation for the ordinal—a way for the Turing machine to tell the oracle which level in the ordinal hierarchy it’s talking about. Whenever we have a computable ordinal, defined via a Turing machine, that machine provides a ready-made notation, but if we try to go beyond the Church-Kleene ordinal, we’re in danger of writing nonsense if we’re not extremely careful about notations. (I know just enough about the subject to tell you that, and am trying to learn more!)

  180. Gargantua Says:

    Scott #179: But you accepted these BB_{ω(n)} to be well-defined for n ∈ N. If we do so, then they’re just a family of integer sequences and we could define oracle TMs with access to them. These machines don’t have to know how these sequences are defined.

  181. Scott Says:

    Gargantua #180: Oh, sorry—I misread your earlier comment! I thought you wanted to define a single BB function based on the supremum of computable ordinals (which is called the Church-Kleene ordinal), rather than diagonalizing across all the different computable ordinals. Yes, sure, you can do the latter, and then look at Busy Beaver with an oracle for the result, and so on as often as you like.

    Let me be clear: there’s no question of ever defining a “fastest-growing sequence.” You can always find a faster-growing one. But precisely for that reason, the standards are higher. A new sequence is only interesting if, in some sense, it blows all the previous sequences out of the water, rather than merely exceeding them in ways that we previously understood.

  182. Bruce Smith Says:

    Joshua Zelinsky #174:

    “… prove that for n > 1, the Busy Beaver machine on n states never leaves a blank tape when it is finished.”

    (Or even a tape which is blank everywhere except in the current cell.)

    That is an interesting idea — but do you have any intuitive reason to guess it might be true? (Or at least, “true for some non-accidental reason”, which I guess means “provable”?)

    I didn’t fully understand your proposal for a proof outline, so maybe there is an idea implicit in that which I don’t yet understand.

    I do see that, when the machine is k steps from halting and in state X, there is a fraction of at least 1/2^k tape states which would cause it to halt (since it will examine at most k cells before halting).

    Maybe that’s unclear — what I mean is, for every k (between 0 and the runtime), there is a machine state X and a “local tape state” K (a set of at most k adjacent relative cell positions, plus a k-tuple of bits to fill them), such that it’s a true statement about the machine’s time evolution that “if it’s ever in machine state X and local tape state K, then it will halt in exactly k steps”.

    Are you suggesting that in a BB machine, for small k and the associated X and K, it’s unlikely the number of ones in K is small? And beyond that, it’s also unlikely that the halting process (for those last k steps) erases all those ones in K? (Where by “unlikely” I mean “if we restrict consideration to this kind of machine, we somehow force the machine to lose the competition to be a BB machine”.)

    I don’t yet see a heuristic reason to suspect that, let alone a potential proof outline. For example, I don’t see why “zeroing a special pattern of ones at the same time as we recognize it” is somehow a bad strategy for a BB machine to use.

    On the other hand, the general idea of considering this sort of analysis does seem attractive.

    A related more general question: is there any known consequence of a machine M “being a BB machine” which would be “locally observable” in the sense that certain short patterns of behavior (if they occur during the run that starts from the empty tape) can be ruled out? (Since any machine that ever does that could not be one that wins the runtime competition to be a BB machine.) (Not counting “entering a short infinite loop” — I mean something compatible with halting, but not compatible with having maximum runtime for its number of states.)

  183. Gargantua Says:

    I think I’m having trouble understanding certain steps in the construction of the BB-hierarchy, starting already with B_ω. How exactly is it defined?

  184. Gargantua Says:

    Typo in previous comment: B_ω should be BB_ω. By the way: How can we type subscripts here?

  185. Bruce Smith Says:

    I asked just above, more or less: is there any short behavior sequence in a machine M which rules out its being a BB machine, but doesn’t rule out its halting?

    Here is one example: there are instruction sequences which can be “pessimized” (revised to do the same thing in the same number of states but more time), which therefore can’t be present in a BB machine. (I guess this might be well known — I apologize if I’m repeating what most of you already know.)

    (By an “instruction sequence” I really mean a subgraph of the graph of machine states with some designated “entry states”, so that the rest of the machine can only jump to those entry states rather than to arbitrary states within this subgraph. A “pessimization” is then a revision of the rules of the states in the subgraph, so that machine histories are unchanged except for what happens within periods spent within this subgraph. Those periods might take longer, but otherwise they must leave the machine in the same state (after exit from the subgraph) as without the revision.)

    Here is an example that seems natural: suppose you wanted to zero out and skip over the next two tape cells, and then go to state P. To do this, you write an instruction sequence (call it ORIG) with one entry point X1 and one internal state X2, coded as:


    X1: 0: 0 R X2, 1: 0 R X2
    X2: 0: 0 R P, 1: 0 R P

    But you could “pessimize” this by instead writing:


    X1: 0: 0 R X2, 1: 0 R X2
    X2: 0: 0 R P, 1: 0 L X1

    This is a “weak pessimization”, since it only takes longer for some tape states, and takes the same time for others. But you *alternatively* could have pessimized it as:


    X1: 0: 0 R X2, 1: 0 R X2
    X2: 0: 1 L X1, 1: 0 R P

    This is also a weak pessimization, but its weakness occurs on a disjoint set of tape states — every tape state is slowed down by one or the other of these pessimizations.

    Therefore, no BB machine can contain ORIG!

    Proof: any machine M (running on the blank tape) either never runs those instructions, or it runs them on tape states (I mean just the local part of the tape, seen by these two instructions) of the form x1, and/or it runs them on tape states of the form x0; in each case there is a way to either remove machine states (with unchanged runtime) or to strongly pessimize the machine. Therefore M is not a BB machine. QED.
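    The claims above are mechanically checkable. Here is a sketch (my own code; 'P' below is just a sentinel for the exit state) that runs ORIG and both weak pessimizations on every 2-bit local tape, confirming that all three have identical effects on the tape and head while each pessimization is strictly slower on its own half of the tape states:

```python
from itertools import product

# Instruction tables: (state, symbol) -> (write, move, next). 'P' is the exit.
ORIG = {('X1', 0): (0, 1, 'X2'), ('X1', 1): (0, 1, 'X2'),
        ('X2', 0): (0, 1, 'P'),  ('X2', 1): (0, 1, 'P')}
PESS_A = dict(ORIG); PESS_A[('X2', 1)] = (0, -1, 'X1')   # bounce back on a 1
PESS_B = dict(ORIG); PESS_B[('X2', 0)] = (1, -1, 'X1')   # bounce back on a 0

def run_gadget(table, bits):
    """Enter at X1 over cell 0 with local tape `bits`; run until exit to P.
    Returns (final tape, final head position, steps taken)."""
    tape = dict(enumerate(bits))
    pos, state, steps = 0, 'X1', 0
    while state != 'P':
        write, move, nxt = table[(state, tape.get(pos, 0))]
        tape[pos] = write
        pos += move
        state = nxt
        steps += 1
    return tape, pos, steps

for bits in product([0, 1], repeat=2):
    t0, p0, s0 = run_gadget(ORIG, bits)
    ta, pa, sa = run_gadget(PESS_A, bits)
    tb, pb, sb = run_gadget(PESS_B, bits)
    assert (ta, pa) == (t0, p0) == (tb, pb)   # same effect on tape and head
    assert sa >= s0 and sb >= s0              # neither variant is ever faster
    assert sa > s0 or sb > s0                 # at least one is strictly slower
```

    The disjointness is visible in the traces: PESS_A loses two extra steps exactly when the second cell holds a 1, and PESS_B exactly when it holds a 0.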


    To connect this to a prior topic — if you could rule out certain instruction sequences as coming just before halting in a BB machine, you might prove the remaining possibilities either leave the tape non-blank or have states that can be reused by a new state added to modify halting behavior (in the same way, in my proof of Theorem X3, the new state N reuses state P, but going farther back in time).

    (However, I’m pretty skeptical this scheme could work well and fully enough to actually improve Theorem X3. I spent awhile considering cases about the prior one or two instructions before P, in that theorem’s proof, and found some cases that seemingly leave no room for this kind of thing to work. On the other hand, I hadn’t yet realized “pessimizable code can be ruled out”, so I might have missed something. But on the third hand, in the general case where every state might be an entry point, I don’t know if pessimizable code even exists.)

  186. Bruce Smith Says:

    FYI, this sequence is pessimizable even if both its states can be entry points:

    X1: move right, go to X2
    X2: write 0, move right, go to P

    In the pessimized forms, X2 might go to X1, which is safe even if X1 didn’t just run.

  187. Bruce Smith Says:

    Joshua Zelinsky #174: I should clarify my comment that sparked our discussion about whether a BB machine can terminate with an almost-blank tape, since I’m realizing it might have been unclear in a way that makes that question seem more significant for improving Theorem X3 than it is. (Though of course, that question also has intrinsic interest.)

    What’s true: for a reasonably large fraction of tape states S, if you knew machine M halted (on the initial blank tape) with tape state S, you could use a slight variant of the proof of Theorem X2 to improve that theorem’s conclusion, or even (for a smaller fraction of tape states) Theorem X3’s conclusion.

    What might be true, but I’m not sure: just for improving Theorem X2, *every* tape state S is like that, except the ones which are blank everywhere except perhaps under the halting step with state P. (I just mentioned the blank tape as an easy-to-describe counterexample to the idea that you could *always* do this — not intending to imply that it was the only counterexample. Also, though I mentioned it after Theorem X3, my own feeling of its significance may have been partly left over from my earlier attempts to improve Theorem X2. And I then forgot some of this by the time I read your idea about proving the post-halting tape was non-blank, since I got interested in that idea for its own sake.)

    It might be literally true that those states are the only exceptions relevant to this way of improving Theorem X2 (and it would only take a few minutes to check), but since X2 is already superseded by X3, this no longer seems important. For X3, it’s much harder to improve it this way — I’m nearly certain there are almost-but-not-quite-so-blank tape states which would fail to improve it in any way I know.

    (This potential improvement, relative to the proof of X2, is just to get state N to be a one-state loop which stops a little farther along, optimizing all details of which side of P it starts on, which bit value it stops at, which direction it goes, and what P writes under itself before going to N. For an arbitrary non-blank tape state, the worst case is probably a 1 on one side of P and blank everywhere else. In that case N can first run when it’s over that 1, then skip it and the P cell and stop on the 0 (on the other side of the P cell), thus equalling but not exceeding the performance of X3. Even if you proved the tape had lots of ones elsewhere, if it happened to surround the P cell like 01P01, those farther away ones would not help improve this result.)

  188. gentzen Says:

    gentzen #172: … I tried to do some homework now …

    Classes like Δ_0^0, Δ_1^0, Δ_0^1, and Δ_1^1 interpreted as function classes are closed under composition of functions. It is less obvious to me whether this is still true for classes like Δ_2^0, Δ_3^0, Δ_2^1, or Δ_3^1. At least for the individual classes in the logarithmic time hierarchy (AC^0), I guess it is not true, but maybe I am making a trivial mistake here. (So I better stop here, and do my homework first.)

    The non-closure under composition of functions seems to be an artifact of the logarithmic time hierarchy (AC^0, an individual class in this hierarchy would be circuits of depth k, minus some details). Using computations in the limit with oracles, it is pretty obvious (to me now) that Δ_2^0 and Δ_3^0 are closed under composition of functions. And if I had some basic understanding of Δ_2^1 and Δ_3^1 (like I have for Δ_1^1), then it would probably be pretty obvious to me that they are closed under composition of functions.

  189. gentzen Says:

    STEM Caveman #171:

    It’s not just the lack of computable model. The completeness theorem is a strong form of the Axiom of Choice, and something like that must be true for the countable case, e.g. equivalence to countable Dependent Choice (combinatorially, Koenig’s tree lemma or the like). This is wildly nonconstructive.

    The way the completeness theorem is presented is wildly nonconstructive, because textbooks are eager to present it for arbitrary sets of axioms. Maybe some are modest enough to restrict themselves to arbitrary countable sets of axioms, but you will have a hard time finding a textbook which limits itself to computable sets of axioms.

    For non-standard models of PA, we do have a computable set of axioms, and therefore we can construct reasonably concrete models in that case. (However, your exchange with Scott makes it clear to me that you won’t even be happy with those models.)

    The concrete model consists of symbol strings as objects; functions (like S(x), x+y, and x*y for PA) are represented as (extremely simple) computable functions from strings to strings; predicates (like x < y in case of PA) are represented as limit computable yes/no functions (of strings); and the equality relation (between two symbol strings that represent the objects) is a limit computable yes/no function (of strings) too.

    The completeness theorem in such cases is a proof that you will never reach a contradiction if you pretend to talk about “actual objects” rather than syntax, and I guess one can get used to the talk to the point that it feels real, but the proof does not produce new nonstandard objects at the same ontological level as the old, standard ones.

    Well, for a long time, I was unsure too whether the completeness theorem would fall into that category. My main reasons against believing this were that Kurt Gödel and Paul Bernays are much more careful with respect to ontological commitments than typical text-books, and since both of them believed that the model existence theorem implied “real existence”, then it is probably just the fault of modern text-books that I failed to see how it could imply it.

    In the linear algebra course in the first semester (a very long time ago), the professor proved that every vector space has a (Hamel) basis. Just like you wrote above, I accepted that this proof meant that I would never be able to prove otherwise by finding a contradiction, but I still thought that this was a sort of modeling error. The existence implied by that theorem was simply not the type of existence I expected from mathematical idealization of reality. I am more than willing to accept existence of some abstract objects that can never be constructed exactly, but only approximated better and better, such that the abstract objects act as an idealization of the actual objects. (And that is why limit computability is OK for me, because it approximates some idealized answer better and better in a suitable sense.)

    So it looks to me like your minimal meta-mathematical assumptions are Dependent Choice and the existence of a definite answer to the halting of any individual TM, regardless of our ability to find that answer, coupled to some basic set-theoretic setup for talking about discrete objects.

    The concrete model I mentioned above does depend on a definite answer to the halting of any individual TM, and even more depends on the definite last revision of an answer that may be revised a finite number of times by a TM which may run forever. However, neither Dependent Choice nor basic set-theoretic setup are needed.

    Scott #176:

    Your comment led me to some interesting reading on Dependent Choice, Countable Choice, König’s Lemma, Weak König’s Lemma, and the relation of all these axioms to each other and to the Completeness Theorem.

    The interesting thing is that using Weak König’s Lemma is both weaker and stronger than the concrete construction above, in a certain sense. It is weaker, because it is just a “de dicto” construction (there is no explicitly defined object associated with that construction), which has weaker ontological implications than accepting limit computation. The concrete construction above is a “de re” construction, because it defines an explicit, well-defined object, and is therefore stronger. But it is also weaker, because it does not need to work with arbitrary sets, but just with computable sets.

  190. Joshua B Zelinsky Says:

    @Bruce Smith #187,

    It does seem to me that the all-blank tape is the only barrier, for the following reason: if, where you halted, the tape contains a 0 followed by a string of 1s (without loss of generality, say it looks like 01….1[], where [] is the cell the head currently occupies and there are k 1s), then one can use the new state to loop, moving along the tape until it hits the 0, and halt there. Does that not work?
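    That scan-until-0 trick is easy to make concrete. A toy sketch (my own; `extra_steps_from_scan` is a hypothetical name) counts the steps such a new state N would add: on reading a 1 it leaves it and keeps moving, staying in N; on reading a 0 it halts.

```python
def extra_steps_from_scan(cells):
    """Hypothetical extra state N: on 1, keep moving (staying in N);
    on 0, halt. `cells` lists the tape contents in the direction of
    travel, starting from the cell N first runs on. Returns the number
    of extra steps N contributes."""
    steps = 0
    for bit in cells:
        steps += 1                 # the halting transition also counts as a step
        if bit == 0:
            return steps
    raise ValueError("N never sees a 0, so it would run forever")

# Scanning across k = 5 ones to the terminating 0 yields k + 1 = 6 extra steps.
```

    So any non-blank stretch of 1s bounded by a 0 buys at least a few extra steps with a single added state, which is why the all-blank (or nearly all-blank) final tape is the awkward case.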

  191. Eric Cordian Says:

    Hi Scott,

    As a supporter of Israel, but not necessarily of the policies of the Netanyahu government, I was wondering what you thought of Seth Rogen’s recent comments that Israel’s existence “Doesn’t make sense,” and that as a Jewish person, he was “fed a huge amount of lies about Israel” his entire life.

    While I’d like to see peace in the region, and justice for the Palestinians, I think the existence of Israel serves a useful purpose. If Jewish people ever again find themselves in a situation where the country they live in is planning on murdering them, and no other country will give them a visa, they can go to Israel where they will be welcomed, and helped to build new lives.

    While being displaced from one’s home country is obviously sub-optimal, I think we can all agree it is vastly superior to being murdered.

  192. Scott Says:

    Eric Cordian #191: I agree with what you wrote. I’m not, to put it extremely mildly, a fan of the Netanyahu government, or how they torched the peace process while making Israel every day less of a liberal democracy and more of a settler-theocracy. But just like my contempt for Trump doesn’t mean I want to see the US destroyed and its current inhabitants murdered or exiled, so my contempt for Netanyahu doesn’t mean I want the same for Israel. And while I loved Seth Rogen’s North Korea movie, I think he’s incredibly naïve if he doesn’t realize that that’s what “the end of Israel” would mean. Israel is no longer a theoretical proposal for “what to do with the Jews,” but an actual country with millions of inhabitants, many of whom happen to be my friends and family, and with universities and high-tech industry and all the rest. “Ending” Israel now is about as feasible as “ending” Canada or Singapore or the US, and as horrifying if you think through its implications—especially given the stated intentions of some of Israel’s neighbors to complete the Holocaust if and when they become able.

    And now, let’s please get back to Busy Beaver! 🙂

  193. maline Says:

    Luke G #168, Gentzen #172, and Scott #179: The hyperarithmetical sets are indeed “well-defined” according to my assumptions in #164. Their recursive construction is almost identical to what I want, except for a detail I mentioned: in that construction, the ordinal notations must be computable. There must be a way for a Turing machine to “find” the correct sub-oracle to query, starting with a natural-number “address” and without supernatural help.

    I want to loosen this, and allow oracles to be used in the ordinal notation system. For a given limit ordinal a, the corresponding oracle should be considered well-defined if it first interprets the address number using some fixed oracle machine of order <a, and then sends the query to the correct oracle, which may be of any order <a.

    According to [ hyperarithmeticity and hyperdegrees], if I understand correctly, the supremum of the ordinals that can be given notations using an oracle X is written \omega_1^X and is larger than the Church-Kleene ordinal. Now suppose we define a function f on the ordinals, where f(a) is the supremum of \omega_1^{X(b)} over ordinals b < a. Then whenever a < f(a), there is some ordinal b < a such that \omega_1^{X(b)} > a, meaning that oracle X(b) can be used to interpret a notation for a. So it seems to me that the cutoff for my suggested “well-definedness” will be at the lowest fixed point of this function f.

    Can anyone tell me whether this makes sense, whether there is actually a countable fixed point, and whether that cutoff ordinal has a name? Scott, do you agree that up to this point, all of the questions are still “essentially finitary” and so have Platonic answers?

  194. maline Says:

    Where it says “ba” in my previous comment it should be “b smaller than a”.

    Can someone give me a reference for how LaTeX, links, and so on work here?

  195. Toby Ord Says:

    Hi Scott,

    I am so excited to see this paper. When I left Melbourne in 2003, I was working on questions concerning generalised busy beaver functions for models of computation that go beyond Turing machines. I packed it all away in a shoebox, taped it shut, and took it to Oxford with me. 17 years later, I believe it is still unopened, somewhere in my attic. I’ve now read your paper and this thread, and think I have a few things to add from back in the day, as well as a few observations inspired by this recent progress.

    First, the paper mentions rates of growth between computable rates and busy beaver rates. I’d noticed this too. My thinking was to use the ideas behind the answers to Post’s problem (of finding an r.e. set that is neither recursive nor equivalent to the halting set). Since the Turing degrees were eventually shown to be dense between these levels, there should be corresponding quasi busy beaver functions that are also dense between computable functions and BB().

    Since there are also non-comparable r.e. Turing degrees, there should also be rapidly growing functions that grow faster than all computable functions, that can’t be used to compute BB(), and where neither one can be used to compute the other. The only way this can happen is if neither surpasses the other, but they keep swapping the lead. I can’t remember if I proved any of this more formally, or if it was all still conjecture that the analogies play out from Turing degrees to quasi busy beaver functions. If the analogies do hold, then you could also look through the properties of the Turing degrees (which are, frankly, a mess) and port things over to the quasi busy beaver functions.

    I also noticed the idea of using oracles to get faster rates of growth all the way up the arithmetical hierarchy, and of using oracles defined in terms of other busy beaver functions to provide an interesting alternative to halting sets. But I didn’t know enough about how computable ordinals actually work to think of things like the BB_{\omega(n)}(n) of comment #96.

  196. Toby Ord Says:

    It seems like we could do with some nice notation for a function growing not just faster but uncomputably faster than another. It is a bit tricky to pin down, as BB(n)^2 grows faster than BB(n) by an uncomputably growing amount (and ratio), but we wouldn’t want it to count for these purposes. I think the key is that we should let the monotonic function, by which the slower function is allowed to be boosted, be computable relative to an oracle for that function.

    Turing degrees use the notation ≤, <, =, etc., with subscripted T. How about using the curly versions of these (∼, ≺, ≻, etc.) with the subscript capital T? Then, e.g.

    f ≻_T g could mean: for all functions w(·) computable by an o-machine with oracle g, there is still a natural number k such that for all n > k, f(n) > w(g(n)).

  197. Toby Ord Says:

    Looking at the spectrum of run times is a great idea.

    Conjecture: the ratio between the BB and the second place beaver grows incomputably quickly.

    Indeed, we can almost show that something like this is true in wide generality, since any way of reliably bounding (even probabilistically) the true value of BB(n) given initial segments of the spectrum would lead to a contradiction. This doesn’t quite resolve the second-place-beaver conjecture, but might be a useful start and should produce many similar results.

  198. Scott Says:

    Toby Ord #195-197: Thanks so much for the kind words and comments!

    Yes, of course my functions with intermediate growth between computable and BB were inspired by the intermediate r.e. Turing degrees, although it’s not obvious to me how to derive either of those results from the other.

    You write:

      there should also be rapidly growing functions that grow faster than all computable functions, that can’t be used to compute BB() and where neither one can be used to compute the other. The only way this can happen is if neither surpasses the other, but they keep swapping the lead.

    I had the same thought, but one needs to be careful, since we could also get what you asked for by (e.g.) taking my function g with intermediate growth, and then twiddling its oddness or evenness to encode two different Turing-incomparable languages. What you really meant, I assume, is two growth rates h1 and h2 such that no upper bound on h1 is computable given h2 and vice versa. I agree that this ought to be possible as well, via an h1 and h2 that swap the lead infinitely many times. But in such a pathological case, I’d be tempted to just define h1 and h2 to have “the same growth rate” and be done with it! 🙂

    BB(n)^2 is small potatoes. For me, “noticeably faster than BB” means, at the least, faster than anything computable given a BB oracle. 🙂

    I confess that I don’t have an intuition about whether BB(n) should be a computable function of the second-longest runtime—although notice that BB(n-1) is upper-bounded by, and hence easy to compute given, the second-longest n-state runtime. Would you mind if I put this question in the paper and acknowledged you?

  199. Scott Says:

    Just to have something to link to, I’m now going to de-rot13 the first part of Bruce Smith’s comment #157:

    Theorem X1 (suspected obvious): BB(n+2) ≥ BB(n)(1 + 2/n).

    Proof: Let B be a maximal-runtime machine in T(n), and let P be the state of B which is most often the current state as any step of B starts running. (P may or may not be the initial state.)

    Note that P is the current state at the start of at least BB(n)/n steps.

    We will make a new machine B’ by modifying B, adding two new states and increasing its runtime.

    To do this, start with B, then replace its state P with a three-state sequence, N1 N2 P, where the sequence N1 N2 “does nothing” and P acts as before. (If P was the initial state, N1 is now the initial state; otherwise the initial state is unchanged.)

    More precisely, every existing “transition to P” (some subset of the 2n transition rules in B) is replaced by an otherwise-identical transition to N1. Define N1 to not alter the tape (i.e. write the same bit it’s reading), move left, and go to N2. Define N2 to not alter the tape, move right, and go to P. Define P as before (except that if one of its existing transitions went to P, it now goes to N1 due to the modification already described).

    Comparing a run of B and a run of B’, every step starting not at P behaves as before. Every step starting at P becomes a series of three steps starting at N1, N2, and P, respectively. The N1 step moves left, the N2 step moves right (neither one alters the tape), then the P step starts in an identical state as it did in B, so it behaves identically.

    Since we chose P to run at least BB(n)/n times, each of N1 and N2 also runs that many times. The other states run the same number of times as before. QED.

    (If transitions were allowed to “move neither L nor R”, we could add only one new state, and prove BB(n+1) ≥ BB(n)(1 + 1/n).)

    Scott’s note: the independent earlier discovery of this little result is here.
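    Bruce’s construction is concrete enough to check by machine. Here is a sketch (my own dictionary-based TM encoding and state names, not from the comment) that applies the N1/N2 splitting to the 2-state Busy Beaver champion and confirms the runtime grows from BB(2) = 6 to 6·(1 + 2/2) = 12:

```python
def run(tm, start='A', max_steps=10**6):
    """Simulate tm (dict: (state, sym) -> (write, move, next)) on a blank tape.
    Returns (steps, visits), where visits counts how often each state was current."""
    tape, head, state, steps, visits = {}, 0, start, 0, {}
    while state != 'H' and steps < max_steps:
        visits[state] = visits.get(state, 0) + 1
        write, move, state = tm[(state, tape.get(head, 0))]
        tape[head] = write
        head += 1 if move == 'R' else -1
        steps += 1
    return steps, visits

def split_state(tm, P, n1='N1', n2='N2'):
    """Bruce's construction: replace state P by the sequence N1 N2 P, where
    N1 moves left and N2 moves right, neither altering the tape."""
    new = {}
    for (q, s), (w, m, r) in tm.items():
        new[(q, s)] = (w, m, n1 if r == P else r)   # redirect transitions to P
    for s in (0, 1):
        new[(n1, s)] = (s, 'L', n2)   # write back the same bit, move left
        new[(n2, s)] = (s, 'R', P)    # write back the same bit, move right
    return new

# 2-state Busy Beaver champion: runs for BB(2) = 6 steps on a blank tape.
bb2 = {('A', 0): (1, 'R', 'B'), ('A', 1): (1, 'L', 'B'),
       ('B', 0): (1, 'L', 'A'), ('B', 1): (1, 'R', 'H')}

steps, visits = run(bb2)
P = max(visits, key=visits.get)   # most frequently current state (here: A, 3 visits)
# If P was the initial state, N1 becomes the new initial state, as in the proof.
steps2, _ = run(split_state(bb2, P), start='N1' if P == 'A' else 'A')
print(steps, steps2)   # 6 12: runtime grows by exactly 2*visits[P]
```

    The 4-state machine produced here runs for 12 steps, matching the bound BB(n)(1 + 2/n) at n = 2 with equality (the actual BB(4) = 107 is of course far larger).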

  200. Raoul Ohio Says:

    I don’t know much about BB but I am a great grand student of Rado’s.

  201. Sniffnoy Says:

    maline #194:

    It’s just (limited) HTML. So if you want to write “b<a”, you would write “b&lt;a”. Similarly links are done the HTML way, etc. I don’t believe there’s any way to embed LaTeX.

    Unfortunately, the <sup> and <sub> tags don’t appear to work at the moment… or rather, they only work for Scott! Superscripts and subscripts are reserved to him alone. 😛 (But he’s trying to fix that, I think?)

  202. gentzen Says:

    maline #193:

    But we can do even more: why should we be limited to recursive schemes, when we have this incredible tower of oracles at our command? All we should require is that, for each particular limit ordinal, there must be an indexing scheme that can be decoded by a Turing machine equipped with some specific earlier oracle. So how far up can we actually go?

    I believe that Luke G tried to argue that you still cannot go beyond the Church-Kleene ordinal (=hyperarithmetical), when he wrote (the same claim also occurs on wikipedia):

    In particular, the Church-Kleene ordinal is both the limit of recursive ordinals and the limit of hyperarithmetical ordinals.

    So all the oracles up to that point won’t help you to define bigger ordinals. On the other hand, nothing seems to prevent you from restarting the game at that point, using the fixed oracle for the hyperarithmetical sets.

    I want to loosen this, and allow oracles to be used in the ordinal notation system.

    OK, but how do you want to get suitable well-defined oracles beyond the hyperarithmetic sets, given the barrier indicated by Luke G? And where do you want to go? Δ_2^1? Δ_0^2?

  203. Scott Says:

    Sniffnoy #201: I don’t know how to fix it!

    Would someone like $200 to be temporary webmaster of this blog and fix all the technical stuff that’s wrong with it?

  204. Bruce Smith Says:

    Joshua Zelinsky #190:

    That works, but how well it works depends on how many 1s there are. So in the likely-worst case of 01[]01, or the worse-in-a-different way 01[]0… (0s forever), you can start on the 1 or 0 in the 1[]0 part, and move either way as you loop, and place any bit inside [] before you start, and code it to skip over either 1s or 0s as it loops — but you still can’t make it last more than 3 steps before it has stopped by hitting the other kind of bit than it started on.

    (In fact, the only way to make it last even that long is to place 1 inside [], and start on 1 in 1[]0, and move right, and skip over 1s. Then it skips over two 1s and stops on the 0 in 1[]0.)

  205. Bruce Smith Says:

    Toby Ord #197, and fyi Scott #198:

    If you just see an initial segment of the spectrum, you won’t thereby have any idea “how initial it is”, so even if it contains the second place beaver, you won’t know that.

    OTOH if someone promises you m (a runtime) is the second place beaver for n, you can dovetail over all n-machines until you find the next one that halts after time m, and then you know BB(n).

    (I’m not sure how or whether either of these observations impact your conjecture. I guess by “uncomputably quickly” you mean given n, not given m.)
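    Bruce’s dovetailing recipe can be sketched directly. In the sketch below, toy generators (one yield per simulated step) stand in for a real enumeration of n-state Turing machines, and the runtime spectrum {3, 5, 9} plus one non-halting machine is made up for illustration:

```python
def dovetail_bb(machines, m):
    """Dovetail over `machines` (generators yielding once per step; a finite
    generator = a halting machine). Given the promise that m is the
    second-longest halting runtime, the first machine seen to halt after more
    than m steps must be the champion, so its runtime is BB(n)."""
    steps = [0] * len(machines)
    alive = set(range(len(machines)))
    while True:
        for i in list(alive):
            try:
                next(machines[i])          # advance machine i by one step
                steps[i] += 1
            except StopIteration:          # machine i just halted
                alive.discard(i)
                if steps[i] > m:
                    return steps[i]

def toy_machine(k):
    """Stand-in for an n-state TM: halts after exactly k steps (k=None: runs forever)."""
    t = 0
    while k is None or t < k:
        yield
        t += 1

# Toy 'spectrum' of halting times {3, 5, 9} plus a non-halting machine;
# second place is 5, so the champion's runtime 9 plays the role of BB(n).
ms = [toy_machine(3), toy_machine(None), toy_machine(9), toy_machine(5)]
bb = dovetail_bb(ms, 5)
print(bb)   # 9
```

    The non-halting machine keeps getting stepped forever, but that never blocks progress, which is the whole point of dovetailing.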

  206. Scott Says:

    Bruce #205: Duhhhh, yes, thanks! 😀

    I guess the precise version of the question should be: does there exist a total, computable function f(n,m), such that whenever m is the second-place running-time for n-state Turing machines, we have f(n,m) = BB(n)? Here the requirement of totality means that the program for f still needs to halt, even if someone lied and set m:=BB(n) (for example).

  207. Bruce Smith Says:

    Scott #206: That’s an interesting question, but I’m not sure it’s equivalent to how I interpreted Toby Ord’s conjecture. I think what I thought he meant was: there is no computable g(n) such that BB(n) / runtime(second place beaver for n) is bounded above by g(n). (Leaving “infinitely often” vs “eventually always” unspecified, for now.)

    I see that if his conjecture was false, you could use g to implement the f in yours. (It would only search up to runtimes of m times g(n).) But I’m missing the other direction of implication, if it’s there. (I guess I’m too tired just now to properly look for it.) Well, I do see that yours is intended to be more general, anyway — it replaces “ratio” with “computable relation”, I guess. In that case I am more skeptical of yours than his, though perhaps only slightly more. (But I guess you did not conjecture “f doesn’t exist” yourself, just stated that form of the question.)

  208. Bruce Smith Says:

    Actually I want to go on record as conjecturing the second place beaver will not be *too* much different in runtime than the winner. Since the winner has pessimized itself as much as possible, all some variant has to do is use the same algorithm but not be quite so clever at being slow. For example, if there is any “setup phase” at the beginning, never repeated, then the winner will artificially do it as slowly as possible in some small number of states, and the variant only has to “do it straightforwardly instead” (in same number of states) to have a runtime differing only by a tiny additive constant.

    So for any of these conjectures to hold, a BB program has to be necessarily so convoluted that there is never any distinguishable “setup code”, or really any other code with an “abstractly specifiable purpose” which runs in a short time, many times during the entire history. Otherwise the variant can just accomplish that same purpose slightly faster and violate these conjectures. And this strikes me as unlikely to always hold, even if it often holds.

  209. Bruce Smith Says:

    Addendum: I’m forgetting again the “always” vs “sometimes” issue. My view is compatible with “sometimes the 2nd place is pretty close, other times it’s far away, depending on the particular winner for this n”. And my interpretation of Toby’s conjecture is also compatible with that. (I better stop posting for tonight!)

  210. Filip Says:

    Scott #203: I can do it, are sub/sup tags and LaTeX needed or is there anything more?

  211. Bruce Smith Says:

    Addendum again: ah, now I see why your f version with “computable relation” is equivalent to Toby’s g version with “computable ratio” — those are the same thing! (You can compute any relation and then express it as a ratio.) I think this means I misinterpreted his conjecture in the first place as more like a “slowly growing ratio”.

  212. Toby Ord Says:

    Bruce #205, That’s a nice idea. Indeed it generalises quite far. If you give me the third highest run time for n states (and I really trust you that it is!), I can still calculate BB(n). I could also do it from the median runtime, or the first centile runtime (though there are technical challenges if the quantiles don’t divide cleanly into the total number of halting machines).

    Your way of clarifying my point with a ratio is at least roughly what I was trying to say, and I’m interested to see that you are suggesting it is not only false, but that the ratio between BB(n)/runtime(second place beaver for n) might converge to 1. That’s impressively bold.

  213. Toby Ord Says:

    Scott #198,

    You are right that those clarifications are needed for the idea of two incomparable rates of quasi busy beaver growth. I wouldn’t be so quick to call them ‘the same growth rate’ though, as (i) it will probably lead to an inconsistency, since relations of the form ‘incomparable with’ are not transitive but ‘equal to’ are, and (ii) it would anyway deprive you of an interesting object of study.

    What else can we say about this relationship between the rates of growth? They must swap the lead infinitely many times, of course, and it must be extremely difficult to pin down where that occurs. If a Turing machine could determine any infinite sequence of natural numbers at which we knew the first function was in the lead (no matter how sparse), then it could modify the first function to always be in the lead. So such sets are not recursive (and not recursive given these functions as oracles). But they can’t be too hard to compute either, as a Turing machine with the actual busy beaver function could compute both quasi beaver growth functions and compare them. In contrast, if the incomparability occurs for two rates of growth that are strictly beyond the busy beaver function, then determining such a sequence of points where one is ahead must be even harder.

    Or, if we make a new function from the max of the two of these at each n, then we presumably get a function whose growth rate is related to the join of these two Turing degrees.

    Is it possible to ‘split up’ the power of growth of BB into a finite set of functions, none of which grows as fast, but if you had any upper bounds to all of the functions in the set, you could compute BB?

    There are no doubt deeper questions one can ask too. I think there is something here, even if it ends up just recasting results about Turing degrees.

  214. Joshua B Zelinsky Says:

    @ Bruce Smith #204,

    Ah yes, you are correct. So one would need a stronger claim there than just that little piece of tape.

    Also, a minor point regarding the second-place BB function. Since you’ve constructed machines with n+1 states which halt after more than BB(n)+1 steps, it follows that the second-place machine on n states will take at least BB(n-1) steps.

  215. Zirui Wang Says:

    You may want to add a link to the 27-state TM for Goldbach. I think I have a fair chance deciding whether such a small thing halts.

  216. Toby Ord Says:

    A thought on the robustness of the BB function as an oracle for the halting problem.

    The main robustness focus (quite rightly) is on the intriguing property that any upper bound to the BB function also works as an oracle for the halting problem. So it has a kind of upwards robustness to noise that isn’t remotely possessed by an oracle like the set of codes of machines that halt (or, equivalently, the real number whose binary expansion that set gives). This robustness is not surprising for the study of fast-growing functions, but is surprising for oracles, or for the plausibility of finding and using oracles in the physical world.

    But there is also a robustness stemming from the vast amount of redundant information, from the fact that any BB(n) lets you calculate all BB(i) for i < n. One way to think about that is that if you were lucky enough to find a BB oracle, then even if all the numbers got increased by some arbitrary amount and then arbitrarily many of them got set to zero in an arbitrary pattern, so long as an infinite set of them were left, you would still have an oracle for the halting problem (as you can backfill each gap). So the infinite number of bits of information they store is encoded in every infinite subsequence. I.e., for all n: K(BB(n-1) | BB(n)) = O(1); and for every infinite subsequence: K(subsequence) = infinity, while K(subsequence1 | subsequence2) = O(1).

    This is pretty obvious mathematically, but an interesting property none-the-less, and might lead somewhere…
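    The backfilling step can be illustrated end-to-end for the smallest case. Below is a sketch (my own brute-force encoding, assuming the known value BB(2) = 6 as the surviving oracle value): enumerate every 1-state machine, run each for at most 6 steps, and recover BB(1) = 1 as the longest halting time:

```python
from itertools import product

def run_steps(tm, bound):
    """Run tm (dict: (state, sym) -> (write, move, next)) from a blank tape.
    Return the halting time, or None if it exceeds `bound` steps."""
    tape, head, state = {}, 0, 'A'
    for t in range(1, bound + 1):
        w, m, state = tm[(state, tape.get(head, 0))]
        tape[head] = w
        head += 1 if m == 'R' else -1
        if state == 'H':
            return t
    return None

# Backfill BB(1) from BB(2) = 6: any 1-state machine that runs longer than
# 6 steps can never halt, so running each for 6 steps suffices.
rules = list(product([0, 1], 'LR', 'AH'))   # all possible (write, move, next)
best = 0
for r0, r1 in product(rules, repeat=2):     # all 64 one-state machines
    t = run_steps({('A', 0): r0, ('A', 1): r1}, 6)
    if t is not None:
        best = max(best, t)
print(best)   # 1
```

    (A 1-state machine either halts on its very first step or marches off in one direction over fresh 0s forever, which is why the answer is 1.)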

  217. maline Says:

    Gentzen #202: Oh, I didn’t understand that point. You are saying that \omega_1^X [the supremum of the ordinals that can be given a notation using an oracle X] will always just be the Church-Kleene ordinal, unless X itself uses a non-calculable ordinal?

    That seems very surprising to me: It means that hyperarithmetic oracles are completely unhelpful in defining ordinal notations! Any ordinal that can be indexed using such an oracle has a calculable notation as well. What is the intuition for this?

    Anyway, if this is true then the Church-Kleene ordinal is indeed the cutoff for my scheme. Helping myself to a Church-Kleene level oracle would be obviously illegal. So my claim is that the hyperarithmetic sets, and corresponding reals, are those that “undeniably” have Platonic existence.

    And now for Scott’s challenge: is there any particular real (or set of integers) that is known not to be hyperarithmetic, but that we “want to believe” should truly exist?

  218. Jon Awbrey Says:

    Dear Scott,

    This discussion inspired me to go back and look at some of the work I did in the late 80s when I was trying to understand Cook’s Theorem.  One of the programs I wrote to explore the integration of sequential learning and propositional reasoning had a propositional calculus module based on C.S. Peirce’s logical graphs, so I used that syntax to write out the clauses for finite approximations to Turing machines, taking the 4-state parity machine from one of Wilf’s books as an object example.  It was 1989 and all I had was a 286 PC with 600K heap, but I did manage to emulate a parity machine capable of 1 bit of computation.  Here’s a link to an exposition of that.

    🙞 Differential Analytic Turing Automata • Overview

    It may be quicker to skip to Part 2 and refer to Part 1 only as needed.

    I’ll try doing BB(2) when I next get a chance. I always learned a lot just from looking at the propositional form.

  219. Bruce Smith Says:

    Toby #212: “… you are suggesting it is not only false, but that the ratio between BB(n)/runtime(second place beaver for n) might converge to 1. That’s impressively bold.”

    Either bold or reckless! But all I meant to suggest, at the strongest, was that it might converge to a constant, or at least not grow without bound. And at the weakest, all I’m really comfortable conjecturing is that it might become small (ie lower than some constant) infinitely often — that is, it might not *uniformly* grow without bound (if I’m using that term correctly).

    (For better or worse, I also conjecture that we’ll never know the answer!)

  220. Gerald Says:

    STEM Caveman #171.
    I think that the proof of the completeness theorem for a countable set of nonlogical symbols goes through in ZF, no choice at all needed (more generally if you can wellorder the set of nonlogical symbols). Typically a term-model is constructed and at some point we need to ‘complete’ a set of formulas by transfinite induction. However, given a wellorder on the set of nonlogical symbols, everything can be canonically wellordered here, so no choice at all should be needed.
    Or am I missing something?

    So, if you start with an uncountable set of nonlogical symbols that you cannot wellorder, you get into trouble later. That’s not too surprising, but the source of the problem is not the completeness theorem. Of course not needing AC alone does not mean that the proof is constructive.

  221. Bruce Smith Says:

    Joshua #214,

    “… one would need a stronger claim there [than just about] that little piece of tape.” I think it’s more accurate to say, a stronger claim *about* that little piece of tape. But any claim that would help, strikes me as implausible. (And so does *proving* the original conjecture about “the halted tape will never be blank”, as opposed to just its truth.)

    BTW, I noticed that all the BB machines (or candidates) listed in the paper never encounter a 0 without immediately turning it into a 1! Whereas when they hit a 1, about half the states turn it into each of 1 or 0. So those machines do seem to “favor 1”, and it’s indeed *plausible* that all BBs leave lots of 1s at the end. All I’m saying I don’t yet find plausible is that we could *prove* that! But it’s a truly interesting question, and I’d still like to hear any ideas about it, even if they have no bearing on improving Theorem X3 or anything of that kind.

    “… it follows that the second place machine on n will take at least BB(n-1) steps.”

    Actually that was already known just from Proposition 1 (as I assume Scott implied at the end of #198). That’s because any M in T(n) is also trivially in T(n+1) just by adding an unreachable state at the end, so the runtime spectrum for n is contained in that for n+1. (I think some other comment, or the paper, made that last statement explicitly.)

    But it’s true that Theorems X2/X3 improve this, by forcing each point in that spectrum to expand upwards slightly more. (So incidentally, their proofs (and Proposition 1’s), taken all together, also work to force LB (Lazy Beaver) to grow in the same manner, since those proofs used transforms that work on all machines — they never assumed the starting machine was a BB machine.)

    So I think your comment could have been “because of Theorem X3, the second place n-machine will take at least BB(n-1) + 2 steps”.

  222. Bruce Smith Says:

    Scott, I presume this is well known: Busy Beavers can’t be very compressible, since otherwise a smaller machine could hardcode and unpack their description and then simulate it — a contradiction.

    I have a few comments/questions about its implications:

    – am I right that Lemma 15’s error term severely limits how sharp this statement can be?

    – I think this provides yet another proof of Proposition 4, since proving “M is a BB machine” is proving “M’s description has high K-complexity”, but any proof of a constant having arbitrarily high K-complexity is not possible (with the threshold depending only on the proof theory and the universal language used to define K-complexity).

    – (A technical point, more or less implicit in your paper: K(BB(n)) must equal K(M) (up to a constant) for M the first busy beaver of n states, since given either one we can get the other. Therefore K(BB(n)) lower bounds the K-complexity of any busy beaver of n states (up to a constant). Is this correct as stated here?)

    – I am wondering whether/how this relates to your paper’s Theorems 18 and 19, and especially to your speculation that K(BB(n+1)|BB(n)) might be as low as a constant. If so, “naively integrating that over n”, we’d get K(BB(n)) ≤ O(n), which *looks* lower than the apparent lower bound on K(BB(n)) which we’re discussing — but I’m not sure it really is, due to this Lemma 15 error term issue. So I can’t tell whether that’s a contradiction, or not, or even whether I’m just misunderstanding how relative K-complexity works, regarding “integrating it” like that.

  223. Bruce Smith Says:

    Scott, about Conjecture 21 (BBs above n = 2 are essentially unique):

    – I think someone earlier in this comment thread pointed out the left-right symmetry in BBs. I think you ought to mention this in your definition of “essentially unique”, since otherwise some of your claims about it seem wrong as stated (since to get the same behavior you must sometimes flip the input). One way is to relativize everything to whatever direction the instruction in A0 moves in (or if it’s Halt, then the instruction in A1).

    – If the current candidate for n=7 is the winner, then Conjecture 21 is false (easy exercise). Personally, I’d consider this evidence against that candidate, rather than against the conjecture! However, my feeling about the conjecture is that it’s at best “true by accident”. I can easily imagine it being false, not by accident, if there are two ways of coding the same algorithm, which differ only in exactly how they mess up when encountering an unexpected primordial 1. Or even if there is only one way to code that algorithm, but it has an unused single instruction (like in the n=7 candidate). (By “instruction” I mean that thing that each state has two of.) Conjecturing this never happens is sort of like conjecturing “the number of instructions needed is always even”! Not exactly, since it implies there’s a state which is only jumped to when the bit under it is known, which does seem “objectively inefficient”. But sometimes there is unavoidable inefficiency, and maybe even the BB oracle can’t avoid that every time.

  224. Scott Says:

    Filip #210: That’s great, thanks so much! I’d say the main things to handle right now are:

    (1) Superscripts and subscripts in comments
    (2) LaTeX in comments
    (3) Some German spam that apparently some people are seeing on this blog (?)
    (4) Updating my PHP

    (Can anyone suggest anything else that needs to be fixed about this blog at a technical level?)

    Anyway, if you’re still interested, then please get in touch with me by email!

  225. Scott Says:

    Toby Ord #213: You might be right that my idea of declaring two growth rates “the same” if they’re incomparable can’t withstand scrutiny.

    My issue is this: in 20+ years in theoretical computer science, I’ve seen more than my share of weird growth rates, but I’ve never once seen two that were incomparable (i.e., infinitely often overtaking each other), unless they were specifically constructed that way. So it feels to me like it should be possible to identify a subset of growth rates that’s totally ordered and that “includes all the ones that matter.”

    Does anyone here have a candidate for such a subset, or have a reason why this can’t be done?

    Now, regarding “splitting up” the growth of BB into (say) two functions f and g, such that
    (i) neither f nor g is upper-bounded by a computable function,
    (ii) an oracle for f or g alone doesn’t let you compute BB, and
    (iii) an oracle for any upper bound on f, PLUS any upper bound on g, DOES let you compute BB
    —my feeling is that yes, this could be done, via a construction similar to the one in my survey. We’ll just have to alternate: for some regions of n, f will stay constant while g will experience BB-like growth, and for the other regions of n it will be the reverse. And we’ll choose the n’s at which the alternations happen increasingly far apart from each other, in such a way as to kill off each possible reduction from f to g or vice versa. Would you like to work out the details or should I? 🙂

  226. Scott Says:

    Zirui Wang #215: The claimed 27-state machine is here (and I did put the link in the article). Unfortunately, the documentation for why its non-halting is equivalent to Goldbach is pretty spare, and I don’t know if more detailed documentation exists.

  227. Scott Says:

    Incidentally, Toby Ord and others: given the magnitude of my uncertainty, I decided in the end not to formulate a conjecture or even a concrete question about the relation between BB(n) and the second-place running time, but just to ask about it in general.

  228. Scott Says:

    Bruce Smith #222: Yes, I did know that Busy Beavers aren’t very compressible, although I hadn’t explicitly thought about it that way—I’d just thought about it as

    K(nth Busy Beaver) = n log_2 n ± O(n).

    More importantly, I’d completely missed the implication of that fact that you noticed, that

    K(BB(n+1) | BB(n)) = Ω(log(n))

    (at least, on average over n’s), since yes, one can integrate over conditional Kolmogorov complexities.

    I updated the paper accordingly and acknowledged you. Thanks again!!

  229. Scott Says:

    Bruce Smith #223: The definition of two Turing machines M and M’ being “essentially different” that I went with was the one suggested by Joshua Zelinsky—namely, that there’s some input for which M and M’ run for different numbers of steps. This definition can justly be criticized for conflating machines that are genuinely different (but happen to have the same runtimes), but it certainly handles the left/right symmetry issue.

  230. Nick Says:

    Filip #210, Scott #224:

    The German spam links are visible at the bottom of the blog if and only if Javascript is NOT enabled. This has been the case for every browser I’ve tried, including on my phone and in Emacs.

  231. Bruce Smith Says:

    Scott #228: My pleasure, and thanks for the opportunity!

    Scott #229: Consider this example: suppose my input is 001111[1]100 where the [1] marks the initial tape position. Then a machine coded to “move right until 0, then halt” runs for 3 steps on this input. But a machine coded to “move left until 0, then halt” runs for 6 steps on this same input. So the definition says those machines are essentially different, as far as I understand it. By the same token it would surely classify a typical BB machine and its mirror image as different.
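    (Bruce’s step counts are easy to confirm with a tiny simulation; the tape encoding and names below are just illustrative:)

```python
def run_until_zero(tape, pos, step):
    """Count steps of a machine that moves by `step` each step until it reads
    a 0, counting the final read-0-and-halt as a step."""
    steps = 0
    while True:
        steps += 1
        if tape[pos] == 0:
            return steps
        pos += step

tape = [0, 0, 1, 1, 1, 1, 1, 1, 0, 0]   # Bruce's input 001111[1]100
right_steps = run_until_zero(tape, 6, +1)   # "move right until 0, then halt"
left_steps = run_until_zero(tape, 6, -1)    # "move left until 0, then halt"
print(right_steps, left_steps)   # 3 6
```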

  232. Scott Says:

    Bruce Smith #231: Aha, yet another good catch! I guess I was implicitly thinking that, if you invert a Turing machine about the left/right axis, you should also invert its input along with it. But you’re right that that needs to be said explicitly, and once you do, Joshua Z’s criterion becomes more unwieldy.

  233. Bruce Smith Says:

    Scott #232: I was assuming you should fix the definition, but you’re right that that makes it unwieldy. Maybe it’s simpler to just fix the conjecture — “essentially unique except for exchanging left and right”, or however you want to describe it.

  234. Toby Ord Says:

    Scott #225, Yes, I think you are right that these incomparable growth rates are somewhat pathological. But I’m not yet sure how common they are once we depart from the slender subspace of functions that we normally talk about. e.g. it was hard to construct the first algorithmically random real, but most have that property. Something similar might be true here. e.g. perhaps most pairs of random monotonic functions from natural numbers to natural numbers have incomparable growth. It feels to me that they actually do, though there is a problem in picking the measure.

    But I don’t think there is generally as much incomparability here as there is in the Turing degrees. e.g. obviously for random sets of natural numbers X and Y, machines with oracles X and Y are almost always incomparable in their abilities. Whereas I doubt they would almost always have busy beaver functions BB_X and BB_Y that can’t be upper-bounded by a computable transform of the other one. (Note that the details of X can’t in general be extracted from all upper bounds to BB_X.)

    I think you are right that to construct a CC and DD that together give BB, they need to alternate the lead for longer and longer times. On my sketch where they alternate according to the digits of Omega, I think CC on its own can give you BB because the islands of 0s and 1s don’t grow in size fast enough — so you could make a transform of CC where you look ahead exponentially far, then copy those results back to the early values to get an upper bound of BB that bridges across the damaged parts of CC. I think something like this can be made to get around CC’s damage. My guess is that you should therefore make the islands where each has the lead grow in size uncomputably quickly, to avoid these tricks.

    Note that this ‘computable transform’ of CC that I applied uses a trick a bit more powerful than my previous definition. It is computable given an oracle for CC, even though I don’t think there is a computable function w, such that w(CC(n)) upper bounds BB(n). But this more powerful computable transform now seems to me to be the right way to think about it.

  235. Toby Ord Says:

    Scott #225, As to finding a core totally ordered sequence of fast growing functions, I think that is where you already are, powering along the arithmetical hierarchy of busy beaver functions. I doubt you will find incomparable pairs without working towards it, even if in some sense it is common.

    I am highly unlikely to be able to get round to actually proving any of these things. But I hope I can be of some assistance in pointing out promising approaches, or new areas to explore.

  236. Toby Ord Says:

    Here is another new direction: places where BB growth can occur and what follows from that.

    For example, it is well known that one can solve the halting problem via a long enough initial segment of Chaitin’s Omega. Suppose you are asking about whether a program of length n halts and have the first n bits of Omega. Just simulate all programs in dovetail, adding 2^-i to a running total when one of size i halts. Keep going until this total exceeds your initial segment of Omega. Then all programs of length n or less that will ever halt have halted, so you can check if the one you are interested in has halted. But this is a slow process, with time complexity of order BB(n) and a space complexity of order Ones(n).
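    This Omega procedure can be sketched with a toy prefix-free ‘language’ — the program codes, lengths, and runtimes below are entirely made up, standing in for a real universal machine:

```python
from fractions import Fraction

# Toy prefix-free 'language': code -> runtime (None = never halts).
# These programs and runtimes are invented purely for illustration.
programs = {'0': 3, '10': None, '110': 20, '1110': 7}

# The toy Omega: sum of 2^-length over the halting programs (= 11/16 here).
omega = sum(Fraction(1, 2**len(c)) for c, rt in programs.items() if rt is not None)

def halts(code, n_bits):
    """Decide halting for `code` (of length <= n_bits) from the first n_bits
    of Omega: dovetail all programs, accumulating 2^-length as each halts,
    until the running total reaches Omega truncated to n_bits. At that point
    every halting program of length <= n_bits has already halted."""
    omega_n = Fraction(int(omega * 2**n_bits), 2**n_bits)  # first n_bits of Omega
    total, halted, t = Fraction(0), set(), 0
    while total < omega_n:
        t += 1                                 # one more dovetailed step
        for c, rt in programs.items():
            if rt is not None and rt == t:     # program c halts at time t
                total += Fraction(1, 2**len(c))
                halted.add(c)
    return code in halted

halts_110 = halts('110', 3)   # length-3 halting program, decided from 3 bits
halts_10 = halts('10', 2)     # length-2 non-halting program, from 2 bits
print(halts_110, halts_10)    # True False
```

    The stopping condition works because once the total reaches the truncated Omega, the remaining halting probability is below 2^-n, less than the contribution any unhalted program of length ≤ n would make.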

    Are there interesting things that come up when uncomputably fast rates of growth enter into computational complexity? We know that a machine at level omega in the arithmetical hierarchy can decide all the truths of first order arithmetic. What are the time complexities for different algorithms that do so? Are there sensible algorithms that use, say, BB_omega(n) time? I also find it fun that more powerful hypermachines can be more *inefficient* than any Turing machine (and so inefficient, that if you could measure it, you could solve the halting problem). If you think bubble sort is bad, wait until you see how inefficiently a machine epsilon_0 levels up the arithmetical hierarchy can sort a list!

    Another example comes from when I was trying to test the limits of probabilistic approaches to solving the halting problem. Suppose you have a Turing machine with access to a fair coin. It turns out you can do *slightly* better than you may have thought. You can design a process where for all input TMs, it halts with probability 1, correctly identifies a halting TM with probability > 1/2 and a non-halting one with probability = 1/2. What you do is start by flipping a coin until you get Tails. Then simulate the input TM for as many steps as you had Heads. If it halts, return True. If it hasn’t halted yet, flip a coin and return True/False at random.

    For all input TMs, this gets it right with probability = 0.5 if they don’t halt, and probability ≥ 0.5 + 1/2^(BB(n)+1) if they do. What is cute here is that you can’t do something like this that would *computably* bound the probability away from 0.5, or you could work out how many trials to do so that the law of large numbers moves this probability close to 1 and effectively solves the halting problem. But here it is so barely above 0.5 that we can’t effectively apply the law of large numbers.
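    [The procedure is concrete enough to sketch. In this editor's illustration the input TM is abstracted to just its halting time (None for never), and the coin is injectable so a run can be traced deterministically.]

    ```python
    def probably_halts(runtime, flip):
        """One run of the procedure.  `runtime` is how many steps the input
        TM takes to halt (None = runs forever); `flip()` returns True for
        Heads.  A True verdict means 'I think it halts'."""
        heads = 0
        while flip():              # flip a coin until the first Tails
            heads += 1
        if runtime is not None and runtime <= heads:
            return True            # simulated long enough to see it halt
        return flip()              # can't tell: answer with one more fair flip

    # Deterministic trace: two Heads then Tails means we simulate 2 steps,
    # so a machine that halts within 2 steps is correctly identified.
    flips = iter([True, True, False])
    assert probably_halts(2, lambda: next(flips)) is True
    ```

    [With a fair coin, a machine halting in t steps is caught outright with probability 2^-t, giving overall success probability 1/2 + 2^-(t+1), which matches the bound above.]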

    These examples are all perhaps more humorous than useful, but there is probably some more serious stuff too.

  237. Zirui Wang Says:

    Scott #226: The pseudocode further down at the link is quite clear that the machine puts an even number of 1s on the tape to see if there is a cut, where both sides are prime.

  238. Scott Says:

    Zirui #237: Good, so would you say you’re satisfied that the machine indeed does that? (If one understood what to look for, it might not be hard to check by running the machine, although I haven’t tried it yet.)

  239. Scott Says:

    This is a test of LaTeX in comments; I’m just going to enclose it between two dollar signs:

    $$ x^2 + y^2 = z^2 $$

  240. Scott Says:

    Everyone: Filip says he enabled rich HTML in the comments for people other than me, so feel free to try that out as well!

  241. Filip Says:

    Scott #240 and all:

    <sup> and <sub> are the ones that are enabled for everyone – but WordPress doesn’t like nested tags so something like 2^2^x might not work well.

    Anyway \( \LaTeX \) is there as well. There are two formats:

    inline \( latex goes here \)
    block $$ latex goes here $$

    Nick #230:
    I’ve also deleted the German spam links, can you confirm that?

  242. Toby Ord Says:

    Test of html: BBω &sc;T BB1 &sc;T BB &sc;T A ~T exp

  243. Toby Ord Says:

    The HTML test seems to have mostly worked, but my curly greater-than signs (ampersand sc semicolon) worked in the preview, not in the posted comment. In contrast, my omega (ampersand omega semicolon) worked fine.

  244. Luke Says:

    Hi, love your blog, first time questioner here…

    “The question is interesting because it speaks to a very old debate: namely, whether Godelian
    independence from strong formal systems is a property only of “absurdly complicated” statements
    in arithmetic, such as those that talk directly about the formal systems themselves, or whether
    independence rears its head even for “natural” statements. Of course, expressibility by a small Turing machine is not quite the same as “naturalness,” but it has the great advantage of being definite.”

    Indeed, it is this self-referential nature of GIC that many people attempt to separate from “natural” statements. This impulse makes sense to me and I can understand why people do it. It is this same intuition of mine that tells me that what the “smallest unprovable BB(n) under axiomatic system A” question is doing is just a rephrasing of the liar’s paradox.

    For any finite axiomatic system A, at some point you will exhaust the theorems A can prove (since theorems are just combinations of the finite axioms). Then you have a choice, you can say “well that’s all A can tell us” or you can say “let’s keep going!”. If you choose the latter, the only place left to go is self-reference. In metaphor, you’ve already said everything you can about the area inside the fence A props up, now all you have left is talking about A itself.

    Is this intuition of mine that we are just rephrasing the liar’s paradox reasonable? I don’t mean to say it’s uninteresting! Clearly I’m interested enough to ask this question. But there’s a nagging part of me that says trying to frame this kind of question as a counterexample to the claims of “unnaturalness” of Gödel Incompleteness is misleading.

  245. Scott Says:

    Toby Ord #242: Can you try ampersand gt semicolon for “greater than”?

  246. Scott Says:

    Filip #241: Thanks so much again for all your help!!

    Are special symbols, bold, and italic enabled for everyone as well?

  247. Scott Says:

    Luke #244: Glad you like this blog! I do fundamentally disagree with one thing you said. Namely, just about any reasonable formal system has an infinite number of theorems, because there are infinitely many ways to combine even a finite list of axioms (and in cases like induction schema, there are infinitely many axioms as well). So it’s not obvious that there couldn’t be a single system to exhaust all (and only) the true theorems—that really is something that we needed a Gödel to discover. In the case of the incompleteness theorem, the source of the difficulty is, yes, the ability of computable formal systems to talk about themselves. On the other hand, in the case of (e.g.) the independence of the Axiom of Choice and the Continuum Hypothesis from ZF set theory, the source of the difficulty is different, and has nothing to do with self-reference.

    As I mentioned in the survey, we do know examples of arithmetical theorems that one might independently care about—they have nothing obviously to do with self-reference—that are provably independent of Peano Arithmetic (typically they require ordinal induction). Harvey Friedman has been working for a long time to produce similar examples even for ZF set theory. I think it’s fair to say that, 90 years after Gödel’s discovery, the extent to which that’s possible remains an open problem. Our own project, to find small Turing machines whose behavior is provably independent of ZF, was very much motivated by that problem.

  248. Bruce Smith Says:

    I think I can pessimize the n=7 candidate by filling in the N/A slot and modifying another instruction.

    Note that the only way to get to state C is via the 1RC instruction in state B. That means that whenever we enter C, the cell to our left is 1 (as just set by that rule).

    The cell under C (I mean, the current tape position as now seen by state C) might be 0 or 1. I will assume we know it is sometimes each of those values — I didn’t verify this, but I assume whoever designed this candidate inspected the history enough to verify it, or they would have written another N/A into one of those slots in State C. (In any case, anyone with a simulator can probably check this within the first few dozen steps of history.)

    That means we can replace the contents of either slot in State C, and know that our replacement will run at least once. For this pessimization, I’ll replace C1, since its contents (1RB) conveniently also exist in A0.

    The goal (as we enter C and are about to run C1) is to delay for two steps (without net tape motion or unfixable net tape change) and then execute 1RB. We do this by replacing C1’s contents with 0LA, which changes the tape to 0 (to be fixed later), moves left, and then runs A1 (remember we happen to know the cell to our left was 1). We program A1 to not change the tape, move right (over the 0 we wrote a moment ago), but stay in A. This now runs A0, which does the 1RB we were supposed to do two steps before, from the correct tape position. That also fixes the temporary tape change, as promised (not by undoing the change, but by ensuring that cell’s prior state no longer matters).

    To summarize: we replace C1’s 1RB with 0LA, and program A1 with 1RA. The result is that every time the old machine enters C1 (at least once, but probably lots of times), the new one takes two more steps but ends up doing the same thing.

    (BTW, I didn’t simulate this, so don’t trust it without verifying!)


    I noticed this because I was wondering whether I could prove that no BB contains an N/A (more formally, an instruction that never runs when starting on the blank tape), or failing that, at least prove *some* upper limit on how many N/As it might contain. But the only general theorems I can come up with are:

    – either all N/As are in 0-slots, or all N/As are in 1-slots. (This has two proofs: given an exception, either merge the two involved states into one, or use them to pessimize the machine in some simple way I don’t fully recall right now.)

    – for large enough n, we can’t have any constant fraction of slots containing N/As, or this machine’s description would be too compressible to be a BB (at least I *think* this is true — I’m not confident enough about all the necessary ingredients of this argument to be positive).

    I got several partial results which give various conditions under which we can pessimize machines with at least several N/As, but no evidence yet that at least one of these conditions always holds. I don’t feel like spelling all these out right now, but if anyone wants to take this up, they can all be rediscovered quickly by using the ideas in my other proofs, plus the fact that for every additional N/A, there is one more state reachable by only one input transition (and thus with a neighbor cell of known content when it runs). (However, I’m not optimistic that pursuing all this would quickly lead to a better result — all I can say is I didn’t rule that out.)


    Another question is whether there is any theorem of the form BB(n + c) ≥ BB(n) + k, for very small c > 2 but not implied by the theorems already known. In principle this seems possible — for example, doesn’t BB(n + c) ≥ BB(n) + BB( c) seem intuitively likely? But I have not thought of any new way to prove anything with smaller c than in Theorem 16!

    The problem is that if you insert code into an existing BB, even after it halted, it starts on a potentially dense pseudorandom tape which it can’t just ignore, but it also can’t be sure it isn’t blank. So there is no safe way to either treat it as a usable initial state, or erase it.

    But there are lots of other proof strategies to think about. What would seem especially interesting is anything which treated the time evolution function of a BB machine in a higher-level way, e.g. showing that to be the BB winner it has to have certain statistical properties in the kind of tape states it uses and/or leaves behind (and thereby a certain number of ones, or better yet, of one-zero changes as you scan the tape). Then you could insert new code which “did nothing” in a slightly slower way than now, or add new code after halting which moved back and forth over some number of existing runs of common bits, perhaps modifying them as it went.


    Even those schemes would not get at the “intuitive real reason” why BB grows so fast — that it keeps finding “new ideas” (which it now has enough capacity to make use of) at each new value of n (or at least, at infinitely many new values, and probably at a reasonably dense set of them). I have no idea how to address that formally, let alone prove it.

  249. Bruce Smith Says:

    (The reason the prior comment uses an extra blank space in BB( c) is to avoid a new bug which causes a copyright symbol to appear in BB(c), at least in the preview.)

  250. Sniffnoy Says:

    Toby Ord #243:

    Yeah I’ve had the same problem before with a number of symbols. Scott/Filip, can we get it so that arbitrary HTML entities are allowed, at least for printable characters?

  251. Bruce Smith Says:

    This is about “inverting the BB function”, and an implication of that for K(n | BB(n)).

    Using the same ideas as in the discussion of Lazy Beaver (in the paper and/or in other comments), we can prove the following exists and is computable:

    R(m) = the smallest n such that there is a machine in T(n) with runtime exactly m.

    (Feel free to suggest a better name than R.)

    R is sort of like a specialized “complexity measure” for numbers thought of as runtimes. (Note that when R(m) = n, there are trivially also machines of all sizes larger than n with runtime exactly m.)

    The entire function R straightforwardly encodes every “spectrum of runtimes of the set T(n)”, and the LB function too.

    And specifically, if m happens to be BB(n), then R(m) = n and R(m+1) = n + 1 = R(m+2) = R(m+3). (Thus LB(n+1) ≥ LB(n) + 3, as I mentioned earlier — but I didn’t reread the paper to see whether something even stronger than that about LB’s increase is known.)

    (Of course, R(m+1) = R(m) + 1 is not a *sufficient* condition for m = BB(n) — otherwise BB would be computable!)
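    [R really is computable by brute force, since deciding whether some n-state machine runs for exactly m steps only requires simulating each machine for m steps, and a "conveyor belt" of m states that marches right and halts guarantees R(m) ≤ m. A sketch (an editor's illustration with its own encoding conventions; as in the paper, the halting transition counts as a step):]

    ```python
    from itertools import product

    def machines(n):
        """All n-state, 2-symbol TM tables: table[(state, bit)] = (write,
        move, next), with next == n meaning HALT.  States are 0..n-1,
        moves are +1 (right) / -1 (left)."""
        slots = [(s, b) for s in range(n) for b in (0, 1)]
        options = list(product((0, 1), (-1, 1), range(n + 1)))
        for choice in product(options, repeat=len(slots)):
            yield dict(zip(slots, choice))

    def runtime(table, n, max_steps):
        """Steps until halting (the halting transition counts as a step),
        or None if the machine hasn't halted within max_steps."""
        tape, head, state = {}, 0, 0
        for step in range(1, max_steps + 1):
            write, move, nxt = table[(state, tape.get(head, 0))]
            tape[head] = write
            head += move
            if nxt == n:                     # HALT
                return step
            state = nxt
        return None

    def R(m):
        """Smallest n such that some n-state machine runs for exactly m
        steps from the blank tape.  Terminates, since an m-state chain
        that steps right m-1 times and then halts achieves runtime m."""
        n = 1
        while True:
            if any(runtime(t, n, m) == m for t in machines(n)):
                return n
            n += 1
    ```

    [Only tiny m are feasible, of course: |T(n)| grows like (4(n+1))^(2n), which is exactly why R is interesting as a complexity measure rather than as something to tabulate.]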


    Anyway, we can use this to “invert BB” — by which I mean, define a computable function f so that, given m, if m = some BB(n) then f(m) = n, and otherwise f’s value at m doesn’t matter.

    To define f using R, just set f(m) = R(m) for all m. (Most of those values don’t matter, but they don’t hurt!)

    (So we might as well just say “R inverts BB”, even though R has an independent meaning aside from that in which all its values are useful.)


    So, can we use this for anything? Yes — it means K(n | BB(n)) = O(1) (which is as close to 0 as any K(a|b) theorem can get, I think).

    Unfortunately this doesn’t justify removing the WLOG comment at the start of Theorem 20 (in the latest version of bb.pdf — Theorem 18 in the prior version), because the same argument doesn’t work with BB_L: it’s possible that BB_L(n) = BB_L(n+1). Maybe there is some similar function like R, and argument about K(n | BB_L(n)), in that case — I don’t know.

  252. Bruce Smith Says:

    (In my #248, by “a theorem of the form BB(n + c) ≥ BB(n) + k”, I really meant “a theorem of the form BB(n + c) ≥ BB(n) + k(n)”, for some specific small constant c and function k.)

  253. Nick Says:

    Filip #241

    I can’t find any spam links or posts, so it looks like everything is fixed.

  254. Bruce Smith Says:

    What happens if we restrict attention to reversible Turing machines?

    More specifically, to “microscopically reversible” ones, by which I mean, looking at just the involved subset of the machine on any step, there is exactly one prior state compatible with any given next state (conditional on that “next state” actually arising from at least one prior state).

    I think this is equivalent to the following conditions on the rules:

    – for every state, all transitions into it come from the same direction (L or R), and at most one writes either bit (0 or 1) as it transitions (that way you can examine the neighborhood of the new tape position after the transition, to see how to uniquely run it backwards);

    – every state, considering both its rules together, either complements or leaves alone its underlying bit (never one of each behavior in its two rules). (That way, knowing that state was the prior one and the tape position it departed from, you can reconstruct the prior value of the bit in that tape cell.)

    I don’t know whether you can express the time-reversed Turing machine as another Turing machine of this same form, though certainly its rule is very local (so at least it’s a cellular automaton). (The reversed version has to examine the proper neighbor, change the state based on that bit and the old state, then change that underlying bit based on that new state. Maybe this can be done by using a few new states for each old one? I don’t know.)

    None of the BBs or candidates in the paper are reversible (as easily seen just from which bits they write). Can a reversible machine still exhibit the same general variety of behavior as a regular one? Can it usefully compute anything? (It certainly has a large enough supply of “free energy (zero) bits”, and potentially a place to “dump garbage” (the zeros on one side of its work area), but it might be hard to traverse the work area and old garbage and notice exactly when it got to the end of that so it could dump new garbage. Maybe it’s easier to just keep moving into fresh tape (always in one direction, in the long run) and leave garbage behind.)
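    [Bruce's two conditions are local enough to machine-check. The sketch below is an editor's illustration using its own table encoding (table[(state, bit)] = (write, move, next), with 'H' for halt), and one reading of the ambiguous clause: transitions into a common state must all share a direction and write pairwise distinct bits, so each step can be undone uniquely.]

    ```python
    def is_reversible(table, n):
        """Check the two local-reversibility conditions on an n-state,
        2-symbol table (under the interpretation stated above)."""
        # Condition 2: each state, over its two rules, either complements
        # or preserves the bit it reads -- never one of each.
        for s in range(n):
            w0, w1 = table[(s, 0)][0], table[(s, 1)][0]
            if (w0, w1) not in ((0, 1), (1, 0)):
                return False
        # Condition 1: all transitions into a state come from the same
        # direction and write distinct bits, so the cell just departed
        # identifies the unique predecessor configuration.
        incoming = {}
        for (s, b), (w, d, nxt) in table.items():
            if nxt != 'H':
                incoming.setdefault(nxt, []).append((w, d))
        for trans in incoming.values():
            if len({d for _, d in trans}) > 1:
                return False
            writes = [w for w, _ in trans]
            if len(writes) != len(set(writes)):
                return False
        return True

    # Example: a 1-state machine that complements its bit and halts
    # passes both conditions trivially (no non-halting transitions).
    assert is_reversible({(0, 0): (1, 1, 'H'), (0, 1): (0, 1, 'H')}, 1)
    ```

    [A checker like this, combined with an enumerator, would let one search for the "reversible Busy Beaver" values Bruce is asking about.]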

  255. Filip Dimitrovski Says:

    Scott #246, Toby Ord #243:

    There’s lists (<ul>, <li>), subscript/superscript (<sub>, <sup>),
    links (<a>), bold (<b>, <strong>), italic (<i>, <em>), strike, blockquote, code for everyone.
    — —

    The HTML entities / ampersand characters supported are mostly in the ASCII range. Toby wants “sc” which is in Unicode — but I can’t find a list of all of them.

    I did find but it also has all the spacing / non-printable entities. If I have time I will compile a list that should be sufficient.

    In my opinion it’s easier to just use \( BB_{\omega} \succ_{T} BB_{1} \dots \) by typing
    \( BB_{\omega} \succ_{T} \)
    I wrote a script that will render the TeX in the comment preview, but it may not always catch up with your typing. The typesetting library used is MathJax.

    — —

    Bruce Smith #249:

    I looked at the code and the (c) thing dates back 5 years, so it’s not new. I did fix it now 😊 Anyway, I hope I’m not interrupting your Busy Beaver discussion; we should test the comment stuff in another thread.

  256. Toby Ord Says:

    Great, I’ll give this another test. First testing ampersand gt semicolon: >

    The main thing that had me going with ampersand sc semicolon was that it worked in the preview, but not the comment. However, if the following LaTeX works, then I agree we don’t need all the HTML entitites: \(BB_\omega \succ_T BB_1\)

    Test of HTML sup and sub: BB_1(10) > 10^10

    Scott should definitely document the fact that you start and end with double dollar signs for blocks or backslashed parentheses for inline in the Comment Policy section (the former is a bit ambiguous at the moment – I thought the pair was one at the start and one at the end).

  257. Sniffnoy Says:

    Filip Dimitrovski #255:

    There’s a list of all named ones here, if only named ones are going to be allowed. (Which covers all cases I’ve actually wanted, really, but I’m surprised you can’t just like… turn on numeric ones as well, maybe with an exception for ones that could potentially mess things up… I mean it’s not like I couldn’t just copy and paste in a character I wanted, if I really wanted!)

  258. Sniffnoy Says:

    Filip #255:

    Actually, I guess better than Wikipedia is the official list here; it seems to include some that aren’t on Wikipedia at the moment.

  259. Sami Says:

    I wonder if any of the exotic behavior of busy beaver is visible by looking at the computable function bb_m(n), defined as the supremum of runtimes of n-state machines that stop after at most m steps. This impatient beaver stays computable no matter how large n gets, but could any interesting properties emerge here, or are the nonprovability and other stuff completely hidden in the m goes to infinity limit?

  260. Jon Awbrey Says:

    The following site is very useful for unicode —

    There is also a converter app —

  261. Zeb Says:

    What can we say about the Kolmogorov complexity of the final state of the tape after the nth busy beaver halts, as a function of n? At first I thought that it should be very high, but clearly it is bounded by the number of bits needed to describe the nth busy beaver (plus an additive constant). Are there any better upper bounds we can put on it?

  262. Scott Says:

    Sami #259: It’s an excellent question, closely related to the Lazy Beaver discussion above. By definition, the spectrum of runtimes doesn’t tell you anything until you get up to LB(n), which is relatively close to |T(n)|. After that … well, it clearly doesn’t let you compute BB, since otherwise BB would be computable, but it would be fascinating to know whether any of the properties of the BB machines cast “shadows” lower down in the runtime spectrum.

  263. Toby Ord Says:

    I want to take a step back and think more generally about the relationships between (1) fast rates of growth, (2) computability, and (3) ordinals. BB relates (1) and (2). The recursive ordinals that mark the levels of the arithmetical hierarchy combine (2) and (3). Computable approaches to fast growing functions tend to involve ordinals, combining (1) and (3). Our current approaches to the best known variant of BB (by jumping a transfinite distance up the arithmetical hierarchy) involve all three.

    Let’s start by considering the sequence of binary operators: addition, multiplication, exponentiation, tetration, etc. Each is defined recursively by repeated applications of the one before. Together they form the infinite hierarchy of the hyperoperations. The best notation I know of for combining x with y using the zth hyperoperation is x[z]y (so 10^6 = 10[3]6). Each of these hyperoperations is computable, and indeed is primitive recursive. But the function A(n) = n[n]n (the most elegant definition for the Ackermann function) diagonalises out of the hyperoperations, growing faster than any of them, and indeed so fast that it grows faster than any primitive recursive function. (The hyperoperations thus nicely stratify the primitive recursive rates of growth.)
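    [The x[z]y notation is easy to put in code. In this editor's sketch, levels 1–3 are short-circuited to Python's built-in operators so that small values actually finish (the pure one-step recursion blows the stack even on 10[3]6):]

    ```python
    def hyper(x, z, y):
        """x[z]y: [1] addition, [2] multiplication, [3] exponentiation,
        [4] tetration, and so on, each level iterating the one below."""
        if z == 1:
            return x + y
        if z == 2:
            return x * y
        if z == 3:
            return x ** y
        if y == 0:
            return 1                       # x[z]0 = 1 for z >= 4
        return hyper(x, z - 1, hyper(x, z, y - 1))

    def A(n):
        """The elegant Ackermann-style diagonal n[n]n."""
        return hyper(n, n, n)
    ```

    [A(3) = 3[3]3 = 27 is the last value worth evaluating: A(4) = 4[4]4 = 4^(4^256) already has far more digits than fit in memory, so don't call it.]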

    So far so familiar. Indeed, this is sometimes seen as a prelude to the main event of the busy beaver function. But it is interesting to pause to wonder how much busy-beaverology can be applied here too. As we did with the busy beaver function, we can add A(n) as a new primitive function for primitive recursion and expand its power. For example, we can now get to new functions that grow faster than A(n).

    Q1: does this lead to an interesting transfinite hierarchy of computability extending primitive recursion? (i.e. add the limit of the new fast growing functions in as a second additional primitive, then keep going)
    Q2: does it suffice to add *any* function that grows faster than A(n) as the primitive, like you can with the busy beaver function?

    If the answers are both ‘yes’, the analogy would be particularly tight. Even if not, it could still be interesting.

  264. Toby Ord Says:

    The relationship between fast growing functions and the ordinals has always seemed very tight to me. For example let’s build a fast growing function. Suppose you start with the constant zero function (f(n) = 0), then add one to get the constant one function (f(n) = 1), then add one to get the constant two function, etc. Then diagonalise out by taking the nth of these functions applied to n (which gives f(n) = n), giving you something that grows faster than all of them. Then you can go on to f(n) = n+1, then f(n) = n+2… to f(n) = n+n and so on, in a way that mirrors the standard introductions to the ordinals. And indeed to some extent you could use these to represent the ordinals. I think a lot of people (e.g. high school students) would find this a more concrete and less mysterious way of working with a related pattern.

    Suppose we keep going with successor functions (adding one to the right hand side) and limit functions (diagonalising out of a parameterised family of earlier functions). We go through f(n) = n*n, then f(n) = n^n, and onwards. This is just the beginnings of the hyperoperations and we can go smoothly up through them, continuing with f(n) = n[4]n, etc. and diagonalising out to f(n) = n[n]n = A(n).
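    [The two building moves here — successor steps and diagonalisation out of a family — can be written down directly. An editor's sketch of the first few rungs:]

    ```python
    def const(c):
        """The constant function n -> c."""
        return lambda n: c

    def succ(f):
        """Successor step: from f, build n -> f(n) + 1."""
        return lambda n: f(n) + 1

    def diag(family):
        """Limit step: diagonalise out of a family k -> f_k, giving
        n -> f_n(n), which eventually outgrows every member."""
        return lambda n: family(n)(n)

    # The constants 0, 1, 2, ... and their diagonal, the identity f(n) = n:
    identity = diag(lambda k: const(k))

    # The family n, n+1, n+2, ... and its diagonal, doubling f(n) = n + n:
    double = diag(lambda k: (lambda n: identity(n) + k))
    ```

    [Continuing in this style yields n*n, n^n, the hyperoperations, and finally the Ackermann diagonal, mirroring the ordinal ladder described above.]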

    Why don’t the standard introductions to the ordinals proceed this way? You see ω, then ω+ω then ω*ω, then ω^ω, and then it is treated as if something strange and interesting happens, where you use fixed points to get to ε0, which is also represented as ω^ω^ω… It is shown to have the fascinating fixpoint property of being the first α such that α=ω^α, and then this property is used to reach ever higher ordinals. But as far as I can tell, there is nothing special causing a barrier at the point of exponentiation that warrants a new approach. We could just move on to ω[4]ω (which would equal what is usually denoted ε0). And we could use the hyperoperations to power on further with ω[5]ω, ω[6]ω, and diagonalise out to ω[ω]ω, then use other ideas from the theory of fast growing functions to proceed further.

    Note that even the special fixpoint property of α=ω^α doesn’t seem special to me, as it is just the third ordinal to have such a property. ω*ω was the first to have α=ω+α, and ω^ω was the first to have α=ω*α. So it seems like we could have started to use the fixpoint approach earlier if we wanted and that we didn’t have to start using it at ε0. So the standard approach has a strange Frankensteinian combination of the hyperoperation approach switching to the fixpoint approach at an arbitrary point (between the familiar exponentiation and the unfamiliar tetration).

    As far as I can tell, the reason we don’t just continue with hyperoperations (at least for some considerable distance) is that they were only systematised in 1914, whereas the standard approach was already systematised by Veblen in 1908. I find even this puzzling, as the hyperoperations are an extremely obvious thing. What fraction of bright high school children came up with raising n to the nth power n times when daydreaming in class? 50%? Or noticed the pattern of multiplication as repeated addition, and exponentiation as repeated multiplication and wondered if this keeps going? Only a particularly bright high school student could properly formalise this, but it is still pretty easy in the scheme of things.

    Q3. Is there any problem with the approach of defining ordinals by hyperoperations and then the Ackermann function that I may not have spotted?
    Q4. Which ordinals (in standard notations) do ω[4]ω, ω[5]ω, ω[6]ω, and ω[ω]ω correspond to?
    Q5. It might be supposed that a remaining special property of ε0 is that Cantor normal form works for all ordinals up to that point, but not beyond. Would a version of Cantor normal form that uses [4] instead of [3] work beyond that point? If so this remains an arbitrary stopping point once hyperoperations are considered first-class citizens.

  265. Toby Ord Says:

    Finally, let’s look at the relationship between all three of rates of growth, computability, and ordinals.

    As far as I understand, the recursive ordinals are usually represented with programs/machines that compute unusual orderings of the naturals, where the order-type of the ordering gives you the ordinal it is representing. e.g. imagine a program that rates any even number as ‘less than’ any odd number, and uses the usual orderings within each category. This would represent ω+ω.

    Q6: Could one instead have a system of recursive ordinals directly based on rates of growth? e.g. using the extension of Cantor normal form via the hyperoperators (that I mentioned in the previous comment) for ordinals up to ω[ω]ω, then continuing with related schemes with even faster rates of growth for higher ordinals. So instead of an ordinal being represented by a program taking two natural numbers and comparing them, it is a program that takes a single natural number and applies a monotonic fast growing function to it.

    If so, you could have a very elegant computability hierarchy based entirely in fast rates of growth. You use an oracle for the busy beaver function of a given machine type as the jump operator (instead of using the halting problem), to get above an infinite sequence of such jumps you diagonalise out of the sequence of fast rates of growth (instead of dovetailing the oracle sets). So, e.g., the level Δ_ω is a machine with an oracle for BB_n(n). And for each ordinal in this hierarchy, they get represented with fast growing computable functions. (Indeed, if we wanted to be extreme purists we wouldn’t even need to use ordinal notation in their subscripts, but could just use the fast growing functions themselves. e.g. using n instead of ω, n[4]n instead of ε0…). Then all of the Busy Beavers we’ve talked about including BB, BB_1 and even (BBw(n)(n)) could be done with fast growing functions all the way down. No halting functions, no orderings, perhaps even no ordinals (at least not considered in their usual sense as infinite numbers).

    If all of that works (or can be made to work) I think this is pretty cool. And I think it is the kind of thing Scott is going for in taking the busy beaver function seriously, and in being as finitistic as possible. It may seem a bit more complicated now, because we usually just help ourselves to the traditional versions of these concepts and have built our own here, but I think it is actually at least as simple and may be more pure this way.

    And it may also open up some new possibilities in terms of getting ever faster rates of growth. For example, what if we use uncomputable rates of growth to specify the ordinals we need in order to reach ever higher levels of uncomputability and thus faster rates of growth?

    Q7: Since ordinals usually correspond to rates of growth, is there an ordinal corresponding to the BB rate of growth?
    Q8: If so, could we then specify that level of the arithmetical hierarchy, going beyond where we’ve been able to get to so far, and then take BB function for machines with that oracle?

  266. Bruce Smith Says:

    Can BB( ) *sometimes* be computably upper-bounded?

    That is, could there be a computable partial function f (defined for an infinite number of values) which gives an upper bound for BB(n+1) in terms of BB(n)? (To make this more likely, we’ll let it know both n and BB(n).)

    More precisely: we want f(n, BB(n)) to upper bound BB(n+1) whenever f is defined for that pair of arguments. Otherwise, all we require about f is that for an infinite number of n, f(n, BB(n)) is defined.

    (By a “computable partial function”, I mean it has an algorithm which emits an answer or emits a symbol that means “f is not defined for that pair of arguments”, and always halts.)

    (If some f had this property, we could modify it to output BB(n+1) exactly, but somehow this question seems more intuitive to think about when I only ask f to upper bound it.)

    Motivation: in thinking about whether Theorem X3 (aka Proposition 14) could be improved, it’s reasonable to assume not (that is, for some values of n it’s the best that happens in reality) and look for a contradiction. To start with, we can strengthen that even more, to make it even easier to find a contradiction. That hypothesis for contradiction might then be “there is an algorithm which recognizes an infinite set of values of n for which BB(n+1) = BB(n) + 3”. But maybe we’d get lucky and merely the fact that it recognized any “computable upper bound” would lead to a contradiction. Hence this question.

  267. Bruce Smith Says:

    In my last comment, I think the n argument to f is redundant, since it can be computed from BB(n). (But it might make certain f easier to express.)

    A related, simpler question is this one:

    Could there be a computable total function g such that BB(n+1) ≤ g(BB(n)) infinitely often?

    (Unlike for f, here we don’t require g to *know* for which values of n that is the case.)

  268. Scott Says:

    Toby Ord: Thanks for all these fascinating thoughts! Yes, it should absolutely be possible to define ordinals using fast-growing functions, not just the reverse. But for defining generalized BB functions, we need more than just the definition of the ordinal, no? We also need a coding scheme by which an oracle Turing machine can specify anything up to the ordinal, in order to command its oracle about which halting problem to solve. How would we use the fast-growing function definitions to get that?

  269. Sniffnoy Says:

    Toby Ord #264:

    Oh, I can answer some of your questions about ordinals and hyperoperations. I’ve looked at this before, and, well — basically, the reason that people don’t tend to use hyperoperations with ordinals is that the results end up being pretty disappointing once you go beyond exponentiation.

    Let’s assume we’re defining hyperoperations analogously to how one would on whole numbers, so that everything extends seamlessly. (Later we’ll consider some alternatives — but I think I’ll post that as a separate comment if you don’t mind.)

    Let’s look at α[4]β, as β increases. First we have 1, then α, then α^α, then α^(α^α)… then we have α[4]ω, which will be some epsilon number; let’s just call it ε. (I mean, specifically, it’ll be the smallest epsilon number greater than α, but that’s not the point.)

    But then we get that α[4](ω+1)=ε as well. And then of course α[4](ω+2)=ε also. And so forth, until you get α[4](ω2)=ε as well… and ultimately you get α[4]β = α[4]ω for all β≥ω.

    So, α[4]β turns out not to be that interesting as β ranges through all ordinals; it’s only really interesting for β≤ω.

    The same thing will of course happen for α[5]β as well, or for α[γ]β for any γ≥5. Except here it’s even worse; let’s look at what happens if we make the additional assumption that α is infinite. Well then in that case we’ll actually get α[5]β = α[4]ω for all β≥2. Yikes! And then of course we’ll get α[γ]β = α[4]ω for any γ≥5, β≥2.

    Moreover, once γ≥ω, then even if α and β are finite, well, assuming α and β are, say, more than 2 or so (to avoid trivialities), you’ll get α[ω]β=ω still.

    So basically, your cases here are:
    γ≤3: Successor, addition, multiplication, exponentiation; these are interesting
    γ=4: Tetration — this is interesting for finite β, but becomes uninteresting once β is infinite
    5≤γ<ω: Further ordinary hyper operations — this is interesting for finite α and β, but becomes uninteresting once either argument is infinite
    γ≥ω: This is uninteresting

    Or to put it differently, the interesting cases of hyper operations, using Roman letters to denote finite numbers, are:
    1. α[n]β for n≤3
    2. α[4]k
    3. m[n]k

    So basically, the reason that people only study higher hyper operations for whole numbers, rather than for ordinals, is that that’s pretty much the only interesting case. The only interesting case that restriction misses is α[4]k. So, yeah. That’s why.
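
    For whole numbers, the recursion behind these cases can be written out directly. Here is a minimal Python sketch of a[n]b (my own illustrative code, using the indexing from the list above, so [0] is successor and [3] is exponentiation):

```python
def hyper(a, n, b):
    """Compute a[n]b: [0] successor, [1] addition, [2] multiplication,
    [3] exponentiation, [4] tetration, and so on."""
    if n == 0:
        return b + 1  # successor ignores a
    if b == 0:
        # Base cases chosen so the recursion reproduces +, *, and ^:
        return a if n == 1 else (0 if n == 2 else 1)
    # The defining recursion: a[n]b = a [n-1] (a[n](b-1))
    return hyper(a, n - 1, hyper(a, n, b - 1))

print(hyper(5, 2, 6))   # 30 = 5*6
print(hyper(2, 3, 6))   # 64 = 2^6
print(hyper(2, 4, 3))   # 16 = 2^(2^2)
print(hyper(3, 4, 2))   # 27 = 3^3
```

    (Keeping the arguments tiny matters: everything bottoms out in unary successor steps, so even hyper(2, 4, 4), whose value is 2^16 = 65536, already blows past a default Python recursion limit.)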

    (There are also some definition issues when α=0; what is 0[4]ω, for instance? But I suppose those can largely be ignored.)

    (Also worth noting while above I assumed we’re using the ordinary ordinal operations, there are various alternative notions of addition, multiplication, and exponentiation on ordinals (I wrote a paper on this 🙂 )… but it doesn’t matter which ones you use here. The uninteresting cases above will be uninteresting regardless of which notion of exponentiation you use, and moreover will be insensitive to which one you use — not only will you get uninteresting results, you’ll get the same uninteresting results! That said, which notion of exponentiation you use can affect your results in the α[4]k case, as can be seen by considering (ω+2)[4]2.)

    Anyway I didn’t really explain in this comment why this happens — although a lot of the answer to that question is fairly obvious once the phenomenon is pointed out, I do think there is something worth remarking on here — but I’ll get to that in a later comment I think.

  270. Sniffnoy Says:

    Toby Ord #264:

    Oops, obviously that sequence in the third paragraph of my previous comment was supposed to start α, α^α, α^(α^α)… I guess stacked sup tags really don’t work. Maybe I should just use this newfangled LaTeX feature. 😛

  271. Sniffnoy Says:

    Toby Ord #264:

    OK, so, I want to now write more about my previous comment, and answer the question — why does this happen, and why can it not be avoided?

    The first obvious answer to why this happens is, continuity. Like — the root of the problem is α[4](ω+1), right? (Well, that and m[ω]k, but that case is less interesting, and kind of obviously impossible to avoid. Although obviously continuity is the problem there as well.) We get α[4](ω+1)=α[4]ω because α[3]β, that is α^β, is continuous in β, and β is where the nesting occurs.

    Except, this raises the question — wait, why doesn’t this happen for α[n](ω+1), n≤3? Why don’t we get α[n](ω+1)=α[n]ω there, too?

    And the answer here has to do with the definition of α[n]β when n is that small. We define multiplication of ordinals to be iterated right-addition, we define exponentiation of ordinals to be iterated right-multiplication; in other words, it’s in the left argument that the nesting occurs. And addition and multiplication are continuous in the right argument, not the left argument, so continuity isn’t relevant here and doesn’t cause the problem above.

    But this then raises the question — well, wait. So for n≤3 we’re iterating on the right, to ensure that we get the standard ordinal operations (and to avoid continuity problems), but then for n≥4 we’re iterating on the left? Why the switch? Why not just do things consistently on the right, get rid of the continuity problem entirely?

    Well, because that introduces a different problem, and I’d say a worse one. We have to switch to iterating on the left in order to match the standard hyper operations on the whole numbers. For n≤3, the choice of direction doesn’t matter for whole numbers, since addition and multiplication are both commutative, but for n≥4 it does. So if we want to match both the standard ordinal operations and the standard hyper operations, we have to iterate on the right for n≤3, but on the left for n≥4.

    (…I guess that doesn’t address the case of what happens when n becomes an infinite ordinal — hm. I hadn’t thought about that before. But let’s ignore that case for now… finite n first…)

    OK, but why not just not match the usual hyper operations? Well, because the usual hyper operations are defined that way for a reason! If instead of going α, α^α, α^(α^α), α^(α^(α^α)), etc., you went α, α^α, (α^α)^α, ((α^α)^α)^α, etc., this would then just be α^(α^k) as k increases. That’s a lot slower-growing and a lot less interesting (I mean, it can be directly expressed in terms of the previous operations). That’s… really not what we want from a hyper operation.

    Of course, this once again raises the question, why does this not happen for n≤3?

    Well, for exponentiation we have the iteration relation (α^β)^γ = α^(βγ), which causes this problem. This problem doesn’t come up for n=3 because the relevant expression there is (α·β)·γ. There is an algebraic relation that applies here, associativity, but it doesn’t, like, reduce things. Similarly with n=2, making multiplication from addition — yeah, addition is associative, but so what? And as for n=1, making addition from successor, once again… yeah.

    OK, but this still raises the question — why does exponentiation have a relation that reduces it like this while addition and multiplication don’t?

    …and, well, it would be possible to give a detailed answer to this question, but I think I’m going to stop here. But the short answer here is that notice what exponentiation is reduced to — it is reduced to multiplication. Basically, well… addition and multiplication are special in a way. All the algebraic relations you get among the low hyper operations, they all involve addition or multiplication, and that’s due to these two (especially addition) having a certain applicability that exponentiation and the higher ones don’t. So with exponentiation, even all the relations you get involving it, still involve addition or multiplication. So exponentiation can have a relation that reduces it to multiplication, but you won’t get that phenomenon with n=1 or n=2.

    …OK, that’s not really a complete answer, because that doesn’t explain why the multiplication case can’t reduce to addition, but still. I’m skipping a detailed answer because I don’t think we all really need to be reminded where the basic algebraic relations between addition, multiplication, and exponentiation come from; anyone reading this probably already has a pretty good understanding of that, or can sit down and think about it if they haven’t thought about it recently. So, you all can write that explanation yourself. But, I don’t know, I thought the rest of this was maybe worth laying out explicitly!

    Point is, the switch at n=4, which causes things to become uninteresting, is necessary to avoid a worse way in which things could become uninteresting.

    Notionally you could switch back once n (γ) becomes infinite to rescue things somewhat in that case — remember, what happens when n becomes infinite is really a distinct cause of uninterestingness — but I think that still causes things to become uninteresting, you just get slightly different values out of it. I’m not entirely sure, though; I haven’t really looked at that. Still seems kind of pointless if things are crappy in between, though.

  272. Bruce Smith Says:

    Is there any known connection between “BusyBeaverology” and complexity classes like P, NP, EXP, etc?

    For example — define the predicate R'(n, m) := (n ≤ R(m)), using R from my comment #251. Then if we encode n in unary and m in binary, I think a straightforward implementation of R’ is in EXP (since it iterates over all T(n) and smaller, and runs each one for time up to m). (Unfortunately I don’t see any way it’s EXP-complete.) Is there any hope of proving an arbitrary implementation of R’ is not in P?

    But that is just a shot in the dark — my real question is the general one above. The reason I can imagine *some* connection is that, unlike in many discussions of computability, we do care something about the runtimes.

  273. Sniffnoy Says:

    Toby Ord #264:

    Btw, part of my motivation for looking at ordinal hyperoperations was the same as yours — that the fast-growing hierarchy starts out as hyperoperations, so it seems like ordinal hyperoperations should go together with the fast-growing hierarchy, right? But as far as I can tell that’s not the case. :-/

    So yeah, unfortunately, like I said above, ω[4]ω, ω[5]ω, ω[6]ω, and ω[ω]ω, and indeed ω[γ]ω for any larger γ, are all disappointingly just equal to ε_0. :-/

  274. Toby Ord Says:

    Thanks so much Sniffnoy for this very detailed explanation of what goes wrong with a direct application of the hyperoperations (especially with why things change at level 3 in the hierarchy). The upshot is indeed unfortunate for my unified approach to rates of growth, ordinals, and uncomputability.

    Looking here, I see a couple of modified definitions of the hyperoperations for transfinite arithmetic that lose some elegance and some independent motivation, but that apparently basically get the job done. I’m glad to see people have looked into this obvious and natural idea! In these versions, the ordinals of the form ω[k]ω roughly correspond to Φ_k(0) in the Veblen hierarchy, with ω[ω]ω roughly being Φ_ω(0) and the supremum of the sequence ω, ω[ω]ω, ω[ω[ω]ω]ω … being the Feferman-Schutte ordinal.

    Note that this would mean that in terms of fast growing function representations, the Feferman-Schutte ordinal is at about the same place in the ordinals as Graham’s number is in the naturals, since the latter is precisely equal to an expression with 64 nestings of 3s around a 6.

    So my approach of coding ordinals with their equivalent fast growing functions on the natural numbers is less good than I’d hoped, as it is less clear that the finite hyperoperations really are the same as their ordinal counterparts.

    I don’t think the method is sunk, though it is somewhat less shiny than if it had all worked as cleanly as I’d hoped.

    Here is a thought, aimed at avoiding the slightly inelegant matching up of fast growing functions with ordinals. What about just ignoring the intermediate step of labelling levels in the hierarchy with ordinals at all? Instead, just label them with fast growing functions.

    So: the levels are all of the form \(BB_{f(k)}(n)\) where f(k) is a fast growing function (I use k to distinguish it from the variable that the BB function actually takes, which is n and which I will sometimes suppress).

    \(BB_0\) is the busy beaver function for Turing machines
    \(BB_{f(k)+1}\) is the busy beaver function for Turing machines equipped with oracle \(BB_{f(k)}\)

    Then for any sequence of BB functions \(BB_{f_j(k)}\), we can diagonalise out to make a level above all of them with \(BB_{f_k(k)}\).

    Then proceeding upwards through the hyperarithmetic hierarchy takes us through \(BB_k, BB_{k+k}, BB_{k*k}, BB_{k^k}, BB_{k[4]k}, BB_{k[k]k},\) etc.

    The question is then around the rules for how we can build up these subscript functions. If we allow any (monotonic) recursive functions, then we can certainly have one for k nestings of square brackets, like k[k[k[k]k]k]k. If using the ordinals, we’d have to worry about whether this really is precisely the Feferman-Schutte ordinal, but I think we don’t have to worry about that here.

    I conjecture that with recursive functions in the subscript, this hierarchy goes the same distance as the arithmetical hierarchy, but *potentially* does so in a clean way that is just built of fast growing functions.

    Then one can ask about what happens if we can have subscripts corresponding to well defined and ‘platonic’ fast growing functions that aren’t themselves recursive.

    e.g. \(BB_{BB(k)}(n)\)
    (not to be confused with \(BB_{BB(n)}(n)\), which has already been discussed in Scott’s paper and is just diagonalising over all finite levels)
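
    The diagonalisation move here — passing from a family \(f_j\) to \(f_k(k)\) — can be illustrated with ordinary computable fast-growing functions. A toy Python sketch (my own illustration; the BB versions are of course uncomputable):

```python
def fgh(j, n):
    """Finite levels of a fast-growing hierarchy:
    f_0(n) = n + 1, and f_{j+1}(n) applies f_j to n, n times."""
    if j == 0:
        return n + 1
    for _ in range(n):
        n = fgh(j - 1, n)
    return n

def diag(k):
    """Diagonalisation: f_k(k) eventually dominates every fixed level f_j."""
    return fgh(k, k)

print(fgh(1, 5))  # 10: level 1 doubles its argument
print(fgh(2, 3))  # 24: level 2 is n * 2^n
print(diag(2))    # 8
```

    (Each level is built from the one below just as \(BB_{f(k)+1}\) is built from \(BB_{f(k)}\), and the diagonal leaves every fixed level behind, which is the property wanted of \(BB_{f_k(k)}\) relative to the \(BB_{f_j(k)}\).)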

  275. Toby Ord Says:

    Scott #268,
    Yes, it should absolutely be possible to define ordinals using fast-growing functions, not just the reverse. But for defining generalized BB functions, we need more than just the definition of the ordinal, no? We also need a coding scheme by which an oracle Turing machine can specify anything up to the ordinal, in order to command its oracle about which halting problem to solve. How would we use the fast-growing function definitions to get that?

    I don’t think you quite need that if using BB functions, but I may be wrong.

    My approach is to use the BB property that any function which grows faster suffices. Consider level ω. The function I want to use to encompass all levels below that is \(BB_n(n)\), which grows fast enough that we don’t need to be able to pick out any particular level of BB function below this. If you want to ask about an o-machine at level 47, we don’t extract out \(BB_{47}(n)\), we just use \(BB_n(n)\) itself.

    As far as I can see, the main flaws in this strategy could be if this doesn’t work beyond some higher level for reasons I haven’t considered, or if there is a problem where we really need to know in full generality how high n needs to be before \(BB_n(n)\) forever exceeds \(BB_{47}(n)\). When dealing with a single jump, we don’t need to worry about that, because there is some hard-coded k you could put in where beyond that k, the faster growing function wins and we don’t need to know it. But I suppose here there might need to be an effective method for finding it. In the case I’m actually considering, we know that 47 is such a k. I *think* it may be similarly easy to determine a lower bound for when this diagonalised BB function exceeds the lower level you are asking about, though it might depend on the details (e.g. I’m now floating versions of this based on ordinals and based purely on fast growing functions, and these may differ, with the latter version sounding easier to me as we just use its subscript function to find the value of k).

  276. Scott Says:

    Bruce Smith #272: Those are excellent questions! The Busy Beaver function itself seems literally to shoot past the realm of complexity theory, into the stratosphere of computability theory. But I’d wondered myself about a question that’s clearly related to your question, although I’m not sure if it’s identical. Namely, what’s the complexity of computing the Lazy Beaver function, LB(n)? It’s clear that it can be done in exp(n) time. Can one show that the problem is complete for unary problems in EXP?

    Alas, to address one of your questions, proving EXP-hardness is virtually the only tool that we now have for proving unconditionally that a problem is not in P.
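
    For concreteness, here is a toy brute-force for LB(n), along the lines of the exp(n)-time computation described above. The machine convention (2-symbol quintuple machines, with the halting transition counted as a final step) and the step bound are my own assumptions, and it is only feasible for very small n:

```python
from itertools import product

def lazy_beaver(n):
    """Toy brute-force for LB(n): the least t such that no n-state,
    2-symbol Turing machine halts in exactly t steps on the blank tape."""
    # Each of the 2n table entries is either 'H' (halt)
    # or a triple (write, move, next_state).
    entries = ['H'] + [(w, d, s) for w in (0, 1) for d in (-1, 1) for s in range(n)]
    num_machines = len(entries) ** (2 * n)
    # Pigeonhole: LB(n) <= num_machines + 1, so running every machine
    # for num_machines + 1 steps is enough to determine LB(n).
    limit = num_machines + 1
    achieved = set()
    for table in product(entries, repeat=2 * n):
        tape, head, state, steps = {}, 0, 0, 0
        while steps < limit:
            entry = table[2 * state + tape.get(head, 0)]
            steps += 1
            if entry == 'H':
                achieved.add(steps)
                break
            w, d, state = entry
            tape[head] = w
            head += d
    t = 1
    while t in achieved:
        t += 1
    return t

print(lazy_beaver(1))  # 2: a 1-state machine halts after 1 step or never
```

    (Already for n = 2 this loops over 9^4 machines for up to 6562 steps each, which exhibits the 2^O(n log n) scaling directly.)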

  277. Scott Says:

    Toby Ord #275: I love your idea of just constructing the hierarchy of BBα functions directly—using oracles for fast-growing functions to define even faster-growing functions—and thereby bypassing entirely the need for Turing machines that encode oracle notations. I think it works, and I don’t know why I didn’t think of it myself!

    – Given any ordinal BB function, BBα, we can define BBα+1 as the BB function for Turing machines with oracles for BBα.
    – Given any countable sequence of ordinal BB functions, BBα(1), BBα(2), …, we can define BBβ(n), where β is the limit ordinal of α(1), α(2), …, as BBα(n)(n).

    One might wonder if, by proceeding far enough in this way, we could even get past the Church-Kleene ordinal. Unfortunately, the whole point of the Solovay result that I’ve now added to my survey (as Proposition 4) is that, if we did, then our fast-growing functions would no longer be hard to compute merely in virtue of how quickly they grew. For that property, unless I’ve misunderstood something, the “hyperarithmetic” BB functions (i.e., those of the form BBα, for some ordinal α below the Church-Kleene ordinal) do indeed seem to be the limit.

  278. Sniffnoy Says:

    Scott #276:

    Here’s an idea. Above it was noted that all the busy beavers seem to have the property that they halt on every input, not just on empty input, right? So let’s actually define the function this way; BB'(n) will be the longest runtime on empty input for machines with n states that halt on every input. Obviously BB'(n)≤BB(n), it seems like maybe they’re equal, but who knows.

    Then BB'(n) can be straightforwardly generalized to a simple complexity class like P — BB’P(n) would be the longest runtime on empty input for machines with n states that run in polynomial time.

    …except, now that I actually write that out, hm. I was thinking that, like, BB’P(n) would, like, be much smaller than BB(n), but in fact the disconnect between the thing we’re taking the maximum over (the specific case of empty input) and the condition we’re imposing (which isn’t affected by the runtime of any one input on its own) means that likely it’s probably pretty close to BB(n), like BB(n-c) or something. Like you could use a constant number of states to check for empty input and then execute the busy beaver, right? While in the nonempty case just halting immediately.

    Oh well, so much for that idea…

  279. Bruce Smith Says:

    Scott #276: “Namely, what’s the complexity of computing the Lazy Beaver function, LB(n)? It’s clear that it can be done in exp(n) time. …”

    (I assume you mean if the input n is encoded in unary, right?)

  280. Bruce Smith Says:

    (Nevermind, I was confused — I see there are only exp(n) members of T(n) in the sense of exp used in complexity theory.)

  281. Bruce Smith Says:

    Actually I was confused on several levels at once —

    – yes, |T(n)| is in exp(n) (meaning O(2^poly(n)), right?), and so is its square, which bounds runtime of computing LB(n);

    – but you *do* have to encode n in unary anyway, if you want input size to be like n;

    – but you considered that obvious and ok to leave implicit, since after all, you said exp(n) rather than exp(log n)!

    Anyway, that computation does run all those machines for up to about 2^(n log n) time (is that enough to cover each poly in O(2^poly(n))? I think not) — but since its output lumps them all together, I see no way to single out any one machine of interest, which you would seem to need to do to show this is EXP-complete. Nor can you use the implicit self-reference (in any way I can see), even though one of those machines is one that computes LB on a hardcoded n in binary (but thereby takes more than exp time, so is still running when this computation gives up on it). Even if the self-reference really worked, the lumping together would seem to make it unexploitable (since any machine of interest is probably not the last one to halt before LB(n)).

  282. Bruce Smith Says:

    Re my comment #248: I did end up simulating my slight pessimization of Wythagoras’ machine, to verify the pessimization really happens and works properly. I’d post my program here (a very short simulator in python), but the HTML <pre> tag is not working (at least in preview).

  283. Bruce Smith Says:

    Wait, if we hypothesize that LB(n) for unary n is in P (i.e. runtime in poly(n)) (ignoring for the moment that it’s not a predicate), then can we try to make a contradiction like this:

    there is also a P machine which hardcodes the n in binary, expands it to unary, runs the hypothetical machine, gets LB(n) as a number, knows exactly how long it already ran and subtracts that to get a runtime it still needs to exhibit, thinks “hmm, *that* runtime is < LB(n) so maybe I have a chance at getting it exactly”, then actually does that, thereby running for exactly LB(n) for a contradiction? This is just one of all the machines simulated by LB, but all it has to do is fill in the hole left by all of them!

  284. Bruce Smith Says:

    To make that work, one part is knowing the exact runtime of the hypothetical LB-in-poly(n)-machine. But our new machine can simulate it and count its steps, and know its own simulation of k steps took exactly ck steps for some small c. So as far as I can tell right now, that idea could actually be made to work! Of course I’m probably missing something…

  285. Bruce Smith Says:

    Another detail is how that machine uses up the computed exact runtime. Everything covered so far can be done in c + O(log n) states (and quickly), which leaves it still almost n states to play with — but not quite n, so just the fact that this computed number is < LB(n) is not quite enough — and worse, it’s a parameter on the tape, not something it can hardcode into the design of its remaining states at runtime!

    But, having it as a number is really most of the issue. Here is what it can do (I imagine this is mostly repeating ideas in the existing work on limits on LB, though I haven’t read those):

    – first it’s important to realize that this machine doesn’t have to do all the work. Its design can include guesses, and as long as we can prove at least one combination of guesses will work, the corresponding machine fills in the hole and achieves the contradiction. The others try and harmlessly fail by using up “wrong guesses about the total runtime”.

    – So the main thing this machine does is count down from the computed desired runtime to 0. But of course that doesn’t work as stated, since counting down takes more than one step per count.

    – But if it uses a properly coded algorithm for a prespecified fixed number-size (that size is coded into the machine in binary, and the size of *that* number is hardcoded as one of the “design guesses” mentioned above), then it can count down in a way that takes the exact same time per count, which is O(size of original number). (To count down by 1, it scans over the entire number and back. To scan the right amount of tape, it pulls along a smaller counter that tells it where it is, or maybe it just premakes a special mark at the end, storing each main-number-bit in two cells.)

    – We can also require that the ratio of steps / count is a power of two, for convenience. Then instead of counting from the original number, it counts down from some truncation of it, by a hardcoded amount.

    – The remaining error is small and is just used up by a hardcoded “design guess”. Even this part still has n – O(log n) – c states to be coded in, so with luck, it’s easy (by appealing to known limits on LB to show a machine to fix that error exists in that many states).

  286. Bruce Smith Says:

    If this works at all, I guess it proves something much stronger than “LB with unary input is not in P”, more like “it takes almost-exponential time”. But I didn’t try to work out the exact limit, and it might even depend on some details of how the contradicting-machine is constructed.

  287. Bruce Smith Says:

    One thing I might have missed — my claim about simulating k steps in ck steps was not “carefully worked out”, and if I try to do that now, just having noticed the issue, what I’m sure of is not quite as much. I’m pretty sure it’s still usable, though. Probably it’s easiest to just claim (1) the simulation of k steps takes within poly(k) steps (probably at worst O(k^2)), and (2) the simulation leaves the exact number of steps it actually took (as opposed to k itself), as a binary number on the tape, after it’s done (with the tape having the final tape of the sim on one side or in one “simulated tape” (e.g. every third cell of the real tape, with also a guaranteed end-marker of the reached point in another cell), and this number of real steps as a binary number on the other side).

  288. Bruce Smith Says:

    Furthermore, the details of how to construct that simulator, eg the max size of “number of steps counter” it supports, are hardcoded into it as another of those “design guesses”.

  289. Bruce Smith Says:

    Another way it could fail would be if LB(n) was too small! But I think you said it’s known to be at least |T(n)|/c^n for some c. So this is no problem if we just want to prove it can’t be computed in P, but I suppose it matters if we really try to optimize how high is the runtime we can prove it can’t be done in. Could that runtime depend on the unknown actual value of LB? That is, “if LB is like this then the limit is this in terms of LB, otherwise it’s this other absolute limit”?

  290. Scott Says:

    Sniffnoy #278: One sneaky way that you might imagine saving your idea, would be to exploit a peculiarity of the Turing machine model—namely, that you might never actually know for certain whether you’re dealing with the empty input, or whether your tape head has just been initialized in the middle of a giant field of 0’s, with 1’s in the far yonder distance! So you could, e.g., restrict attention to Turing machines that are constrained to run in polynomial time assuming that they ever find input delimiters, and then maximize their running time assuming that they in fact never find such delimiters.

    Alas, besides being artificial, I don’t think this approach can escape BB-style growth, because one could always satisfy the definition via a machine that simulates a Busy Beaver until it encounters input delimiters, at which point it halts. In fact, I believe you’ll still have

    BB'(n) ≥ BB(n – cn/ log n)

    via introspective encoding. But I certainly don’t see how to do better than that.

  291. Scott Says:

    Bruce Smith #282: Kudos on your “pessimization” of the Wythagoras machine! However, even if my survey weren’t already “off to the printer,” there’d be an interesting philosophical question about whether to credit you for having set a new n=7 record. For Wythagoras’ lower bound, as stated in my survey, was \( BB(7) > 10^{2\times 10^{10^{10^{18,705,353}}}}, \) and your pessimization shouldn’t affect that bound at all! 🙂

  292. Job Says:

    Since BB(n) grows so fast, if it were to ever approach LB(m) for some huge m, much larger than n, that would be a problem, right?

    Because we’d be able to use the m-n states to easily pad BB(n) up to LB(m), which is not possible?

    Can we say that BB(n) is never anywhere near LB(m) for any m > n, no matter how large m gets?

  293. Scott Says:

    Bruce Smith #281, #283-289 (!): Sorry for the delay! I wanted time to think about this.

    I think your idea for showing by contradiction that there’s no poly(n)-time algorithm to compute LB(n) could plausibly work, and it’s awesome! Let me restate the idea in my own words:

    Suppose A is a polynomial-time algorithm to compute LB(n) given n. Let A correspond to a c-state Turing machine. Then for all n, there should be a c+O(log n)-state Turing machine that first writes n onto the tape, then runs A to compute LB(n), and finally stalls until it’s run for precisely LB(n) steps. But since c+O(log n)<<n for large n, this would mean there was also an n-state machine that ran for precisely LB(n) steps, which contradicts the definition of the Lazy Beaver function, QED.

    As you say, there are a few annoying details to be worked out here, including:

    (1) How does A know exactly how many steps it itself ran for? (In practice, we might need to envelop A inside a simulating shell that runs for a still polynomial but larger and more “transparent” number of steps.)

    (2) What gadgets do we use to force our new machine to run for exactly LB(n) steps, no more and no less? (I completely agree that this will probably involve first approximating the number of steps within a small additive constant, then switching over to some gadget that gives us LB(n) exactly.)

    (3) How do we show that LB(n) is large enough for this strategy to succeed? (This, of course, should just be a matter of formalizing my argument for why LB(n)>|T(n)|/c^n, which indeed relies on the same idea that I just sketched in (2), for getting precise control over runtimes.)

    I also agree that, if this works at all, then it ought to work not merely to show LB(n) isn’t computable in poly(n) time, but to show it’s not computable in any amount of time much less than LB(n) itself—and hence, I think, that at least n^(n-O(n/log n)) time is needed.

    As you pointed out, there are some annoying issues with stating this potential result in terms of conventional complexity classes, although I don’t much care. One could either say that LB (with n encoded in unary) is not in FP, or that LB (with n encoded in binary) is not in FEXP, requiring doubly-exponential time.

    If this works, then the question of whether computing LB is complete for Unary-FEXP or FEXPEXP (depending on how you formalize it) becomes even more interesting for me than it had been. If it’s not complete for a superpolynomial-time class, then we’d get possibly the first example ever of a problem that can be proved unconditionally to be outside P, despite not being complete for such a class. On the other hand, like you, I’m at a loss right now for how to prove the problem complete.

    (Note that it’s not an issue here that EXP means 2^poly(n) time, because completeness reductions can blow up the inputs by polynomial sizes anyway.)

    Anyway, what’s your day job, and would you like to collaborate further on this? 🙂 (I’d invite you to visit Austin to work on it, but alas, that will probably have to await the post-covid era…)

  294. Scott Says:

    Job #292: No. LB(m) certainly will exceed any fixed value as m gets large, including BB(n) for any fixed n. This is not a problem. It simply means that LB(m) must be large enough, and incompressible enough, that if we tried to pad out an n-state Busy Beaver so that it ran for exactly LB(m) steps, we’d necessarily add more than m-n states.

  295. Bruce Smith Says:

    Scott #291, I agree that my improvement to Wythagoras’s result is so low-level as to almost not matter. It’s not interesting as a significant change in the bound or as a “new idea” for getting the bound to be so high — it’s mainly interesting just as a demo of how you can often pessimize handmade machines by these sorts of low-level changes. But maybe it does matter that, at this moment, no known BB candidate has an unused instruction. (As I mentioned earlier, one of your conjectures can only be true if no BB machine has one.)

  296. Bruce Smith Says:

    (Or at least, not if that unused instruction is in A1. I guess I don’t fully know this for the other instruction slots! Interesting question.)

  297. Bruce Smith Says:

    Scott #293: Thanks! (I’m relieved that if there is some mistake in there, it’s not obvious enough for you to have noticed it immediately! 🙂

    Fortunately my day job leaves me a reasonable amount of free time, certainly enough to collaborate remotely, which I would love to do!

    One thing I wondered — I infer from wikipedia and also from complexity zoo that proving P != co-NP would imply P != NP. Is that right? If so, that makes the following question more interesting (though it might be interesting even if not):

    – note that a machine M of n states is a witness for s(M) being a possible runtime for n states, which can be verified in time s(M). But the machines we’re talking about now are near-exponential time in terms of n. Might we be able to pad their inputs somehow (even more than by expressing n in unary) to bring their natural time limit closer to P? If so, is it interesting that we have a problem here (whether exact time m is possible for an n-state machine — at least in the case where m happens to be LP(n), I didn’t check whether this all works for general m) which we might prove a minimum runtime for (in terms of n), but for which “yes, it’s possible” has a presumably-much-faster-to-verify-than-to-find witness?

  298. Job Says:

    It simply means that LB(m) must be large enough, and incompressible enough, that if we tried to pad out an n-state Busy Beaver so that it ran for exactly LB(m) steps, we’d necessarily add more than m-n states.

    OK, I guess BB(n) is never less than LB(m-n) steps away from LB(m), for any m > n? 🙂

    Since it’s easy to produce an (m-n)-state machine that runs for less than LB(m-n), to be appended to BB(n), anything closer than that would result in a contradiction.

  299. Bruce Smith Says:

    About my last question, I guess if you have that witness, it gives you about a quadratic speedup (number of machines to check is about the same as time you must run them when checking them), so probably this has no bearing on P != co-NP, since it’s ok if those two polys have different degrees.

  300. Scott Says:

    Bruce Smith #297: Yes, P=?NP and P=?coNP are simply the same question. P=NP would therefore imply NP=coNP, but NP=coNP isn’t known to imply P=NP.

    And yes, “there exists an n-state TM that runs for exactly k steps” has a convenient witness in the form of the TM itself. But it’s not clear how useful that is, since as you say, when k is large enough that this question is interesting, it’s already nearly as large as the number of n-state TMs itself (in particular, exponential). I didn’t understand your remark about padding—sure, we could pad to m>>n states, but then LB(m) will become much greater than our machine’s running time, leaving us no better than when we started, no?

  301. Bruce Smith Says:

    Job #298:

    – remember that you can’t just append two machines and add their empty-tape runtimes, since the second one to run sees all the junk left by the first one on the tape. If not for this issue, we could prove BB(n+m) ≥ BB(n) + BB(m).

    – also remember that for LB, much or all of the hard part is exactly matching a desired runtime, not just getting close to it or exceeding it.

  302. Bruce Smith Says:

    Scott #300, our posts which “crossed in the mail” come to the same conclusion, that this idea is not going to help with P != co-NP in any obvious way. And I think they reach that conclusion for the same reason, in different words. But even so, I’ll explain what I meant about padding.

    I was talking about padding the way we encoded the same input n, to move the witness-verifying problem (in terms of its input size, rather than in terms of n) from EXP to P (and the witness-finding-and-verifying problem to something smaller than it is now, but, I hoped, not as small as P), rather than changing the value n itself.

    But I realized I can’t do this, since the natural runtimes are related by the slower one being about the square of the faster one, so if I adjust things to get the faster one in P, the slower one will also be in P, just with a higher degree of polynomial.

  303. Bruce Smith Says:

    My answer to Job #298 suggested the following, which I think gives an example of a class of improved theorems of the form BB(n + c) ≥ BB(n) + k for various pairs of constants c and k — well, I think not quantitatively improved compared to Prop. 15 (though I’m not sure that failure is provable), but different enough to be worth mentioning:

    define BBclean(n) as the longest runtime of an n-state machine run on an empty tape, where the machine is also required (when run on an empty tape) to leave the tape empty when it’s done.

    Then BB(n + c) ≥ BBclean(c) + BB(n). Proof — just append the machines with the clean one running first.

    But BBclean(c) is probably larger than BB(c + the same extra stuff as in Theorem 16), since you can simulate the machine, track and mark the extent of its use of the tape, and then clean the tape after it’s done. (I didn’t prove this but I think it’s pretty simple. In fact, it must have been proved as part of proving Theorem 16, since it would be hard to make use of a “dirty tape” even if it contained the number you wanted it to contain in a readable way.)

    And for various constants c, that ought to give you k which grows a lot compared to c.

    So Prop. 14 can be extended to still-small c but large k. But since the result is additive, it’s probably still no match for Prop 15 as it stands now, whose result multiplies the runtime.

  304. Job Says:

    Bruce #301,

    remember that you can’t just append two machines and add their empty-tape runtimes, since the second one to run sees all the junk left by the first one on the tape. If not for this issue, we could prove BB(n+m) ≥ BB(n) + BB(m).

    I see, the tape states do get in the way of concatenating two machines so it’s not that simple, too bad.

    Also, question on your idea to show that LB(n) is not in P, using Scott’s version:

    Suppose A is a polynomial-time algorithm to compute LB(n) given n. Let A correspond to a c-state Turing machine. Then for all n, there should be a c+O(log n)-state Turing machine that first writes n onto the tape, then runs A to compute LB(n), and finally stalls until it’s run for precisely LB(n) steps. But since c+O(log n)<<n for large n, this would mean there was also an n-state machine that ran for precisely LB(n) steps, which contradicts the definition of the Lazy Beaver function, QED.

    I’m wondering how this would distinguish between a machine that actually computes LB(n) vs a fake one that just outputs precomputed values of LB(n) (for large enough n to show the contradiction), or even one that guesses LB(n) randomly.

    Wouldn’t the Turing machine construction in the proof also manage to take a fake or lucky polynomial-time LB(n) solver and successfully pad it to run for exactly LB(n)?

    I’m assuming that would be a problem since we can already produce polynomial-time machines that are partial solvers for LB(n) and those should not lead to a contradiction.

  305. Bruce Smith Says:

    My reasoning in #299 was quite wrong, and my claim in #302 that it was the same as Scott’s reasoning was therefore probably also wrong. I’ll post more on that later, but the short version is, it’s even easier to see than I thought that this won’t help with P != co-NP, and it’s for a more fundamental reason.

    OTOH there are some interesting generalizations of this — other functions than LB, which the same method can prove are not in P, or, I think, not in any TIME(f(n)) class you like (at least out of a wider choice of them — I better not claim too much before working out details!). I’ll post that too, shortly.

  306. Bruce Smith Says:

    Job #304: “I’m wondering how this would distinguish between a machine that actually computes LB(n) vs a fake one that just outputs precomputed values of LB(n) …”

    [summary: your question inspired one or two interesting extensions of what we can prove!]

    That’s a really good question! I had not thought of it before. But the same proof shows this also could not happen. But it also shows more, which I didn’t notice until you asked this question. (It’s possible Scott noticed it, since maybe something he said earlier hinted at it and I didn’t get that at the time. He’ll have to tell us.)

    So — the same proof that shows our contradicting-machine could not quickly compute m (the impossible runtime), also shows it could not hardcode m (in the space it has available in its states) (because doing so would lead to the same contradiction, since it could use the value just as well in that case).

    At one level that is not surprising — we know m can be almost as large as |T(n)| in terms of bits (the formula in Scott’s paper for min possible value of LB is not much smaller than T(n) once you take logs), so hardcoding a value of that size would potentially use up most or all of the n states.

    But I didn’t work out the details of that comparison — Scott can tell us whether this idea actually can improve his lower bound on LB(n), or he might say he already used that idea (implicitly or explicitly) in his proof of that bound. (I don’t know his proof.)

    But, beyond this bound on the size of m (it is too big to be hardcoded into n – c – log(n) states or whatever), this argument also shows it can’t be *compressed* into that many states, in a way that is efficient enough to decompress!

    This argument is not strong enough to show K(LB(n)) is that large, since it says nothing about a kind of compression that is very slow to decompress.

    But it does show, not only does our hypothetical algorithm A have no way to compute m in a certain time, *no machine of a certain number of states* (which is almost n) is able to compute it in that same amount of time!

    There is no requirement that these machines, for different n values, are related to each other.

    So I think we have proved not only that an *algorithm* for LB can’t have a faster runtime than a certain runtime function (which is much higher than polynomial but less than exponential) — we’ve proved that a *family of turing machines, one for each value of n*, with each one limited to some number of states a bit less than n, can’t do that!

    AFAIK this is not a concept I’ve heard of before — unless it’s equivalent to a family of *circuits*, which actually, it might be. (Scott or any other expert will know instantly, I think.)

    I guess each of those TMs is like a universal TM with about n log n bits of advice, and a circuit is sort of like that too (circuit size == number of advice bits), so maybe it *is* the same concept. Rather than trying to be 100% sure of that, I’ll wait for Scott to confirm it!

    But actually, circuits in complexity theory usually have one output bit, but in this situation we have more… and I think that’s inherent in the situation, since if we ran a circuit with one output log m times to get log m output bits (giving it more inputs so we could tell it which bit we wanted), that would take substantially more runtime… but maybe that wouldn’t matter….

  307. Bruce Smith Says:

    Since it’s getting late, I’ll post only the “generalization” — the rest will have to wait til tomorrow.

    There are functions different from LB with the same way of being proved not in P (or in certain other time-complexity classes).

    First, what properties of LB are we actually using? It is not so much that it produces an impossible runtime that matters — rather, it produces a value that it is impossible for us to get from an n-state TM, in whatever specified way we like (i.e. the value of f(M) for some f we specify), provided we can compute f(M) within time g(n).

    There are only |T(n)| values we can get that way, so in some larger set of potential values V(n), there are some we can’t get. If we define h(n) as “some value in V(n) which we can’t get as f(M) for M in T(n)” (but also make sure h is deterministic and in fact computable, eg the first such value), then we can look for a way to make an n-state TM which *does* make that value, to give a contradiction.

    So far, we have not even cared that the domain of f is a set of TMs rather than (say) a set of numbers, or if it is TMs, that we get values from them by running them rather than (say) taking rot13 of their bitstring and repeating that until it’s long enough. All that matters is that we specified f: T(n) -> V(n) and |V(n)| > |T(n)| and h picks one of the values not in the range of f. (In fact it’s enough if h sometimes fails and picks a possible value, as long as it picks an impossible one infinitely often.)
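    The pigeonhole step in the paragraph above (pick the first value in V(n) outside the range of f) can be sketched in a few lines. This is only an illustration of the counting argument, with a hypothetical helper name of my own; nothing here comes from the thread:

    ```python
    def first_unclaimed(achieved, universe):
        """Return the first element of `universe` (in its given order) that is
        not among `achieved`; by pigeonhole such an element exists whenever
        the number of distinct achieved values is smaller than the universe."""
        achieved = set(achieved)
        for v in universe:
            if v not in achieved:
                return v
        return None  # every candidate value was achieved


    # Toy example: five "machines" claim runtimes {1, 1, 2, 4, 7};
    # the first unclaimed runtime among 1..6 is 3.
    print(first_unclaimed([1, 1, 2, 4, 7], range(1, 7)))  # -> 3
    ```

    For the actual h, `achieved` would be the multiset {f(M) : M in T(n)} and `universe` an enumeration of V(n); the return value is guaranteed to exist exactly when |V(n)| > |T(n)|.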

    But in the next part the nature of f starts to matter, since to get the contradiction we need to be able to program the TM so that, if h runs fast enough that the TM can use it to think about h’s output for very long, this is enough to let the TM *cause* f(itself) to be the forbidden value.

    So what can M (a TM) do to influence f(M)? For arbitrary f that’s complicated to answer, but the only kind of example I yet know of, that is interesting here, is where f runs the TM and does something with values related to running the TM which the TM can control in an intelligent way.

    Those values include: the exact runtime, the tape contents when it halts, the final state when it halts. The final state is not important since the TM could easily write it on the tape if desired, so we can ignore it.

    For LB(), f(M) = s(M) and M can, with difficulty, control s(M) within a certain range.

    But another choice is, for example, “f(M) = the first n^2 bits of output tape content (relative to current tape pos after halting), provided M halts within time g(n)”. (Or even, those bits at step g(n), whether or not M has halted by then.) (Note that number of n^2-bit strings > |T(n)|.)

    (Note that this is even more like a “complexity function” than LB is. More on that below.)

    So let’s see how it works out with that f.

    The function analogous to LB (called h) is, described roughly, “the first n^2-bit string that no n-state TM can output within time g(n)”.

    As with LB, we can prove h is well-defined. It’s also computable, by an algorithm which runs all n-state TMs for time g(n).

    (So M’s runtime does still matter — but it doesn’t have to be part of the value of f(M).)

    What about the contradiction?

    Suppose we can compute h “quickly” (TBD) using some family of slightly-less-than-n-state TMs, one per n. (For example, the ones that hardcode n in binary and feed it to a fixed TM of c states which implements a fast algorithm for h.)

    Then for large enough n we can make the contra-TM which computes h(n) (and afterwards has the forbidden output on its tape as an n^2-bit string), moves it into the right position quickly enough, then halts, so that forbidden output is this TM’s output, giving the contradiction.

    (This is analogous to “make sure my runtime is exactly m”, but is easier for the TM to do. In fact, if we define things properly, the post-h part is nothing! The contra-TM is then as simple as “hardcode n and feed that to the h-algorithm”.)

    (We could do similar things for an f() which used both the runtime and the output, but I don’t know if we gain anything beyond just using the output alone. We could also vary how h picks a value that f can’t produce, among all such values — for example, instead of LB, we could use “the first impossible runtime at least as large as |T(n)|^2”, which is obviously between |T(n)|^2 and |T(n)|^2 + |T(n)| (inclusive).)

    (We could also allow h to examine TMs of various numbers of states, and let g depend both on h’s parameter n and on the actual number of states used, as well as on the output and/or exact runtime. That is, it could enforce a tradeoff so that to violate its prediction a TM would need to either be small enough or run fast enough but could trade these off according to a specific relation. And perhaps there are even more things we can do.)


    The meaning of “quickly” above (about the runtime of h that leads to contradiction) is “in a runtime slightly less than g(n)”, but I won’t analyze that more closely here, except to point out that since M’s job is easier if it only tries to control output, this runtime can probably get closer to g(n) in that case than in the LB case where f also cares about M’s exact runtime. For example, when f cares about exact runtime, M has to *simulate* h (within a time limit), but if f only cares about M’s output, then M only has to *run* h directly (within the same time limit, so fitting into it is easier).


    So that’s enough about the generalization itself, unless I forgot something due to being tired now.

    For which time-complexity classes could we use this? It seems like g(n) could get pretty large, since no matter how large it is, clearly only |T(n)| outputs are possible, so h still exists, and the nature of the contradiction stays about the same. In all cases the provable min runtime of h is a bit less (in some sense) than g(n).

    It could also get pretty small, with the only limiting factor that the overhead in the TM’s using the output of h (and simulating h if it needs to also control its own exact runtime) might start to get too large by comparison to get a good limit.


    I mentioned above that a certain kind of h function is similar to a complexity measure on the values V(n). In fact, it’s a pretty natural generalization of K(v) to ask not for the size of the smallest program that can *ever* compute v, but the smallest one that can do it in a certain time. And then this h(n) function is defined as “the first v not computable by an n-state program in time g(n)”, and what we’re proving about h is, roughly, that it itself can’t be computed in that time. Since it outputs a v that it just said can’t be computed in that time, this is almost tautological! The only thing obscuring that is the dependence of some of these functions or limits on the allowed number of states of the machine that computes it.

    So it wouldn’t surprise me if this form of the idea (about TM output rather than TM exact runtime) (including the theorem about h’s runtime in that case) has been discovered before, as part of an exploration of generalizations of K-complexity. But if not, it’s even more interesting than if this was only true about the LB function!

  308. Bruce Smith Says:

    Job #304, What did you mean by “… we can already produce polynomial-time machines that are partial solvers for LB(n)…”?

    I didn’t quite notice that when I replied first. If we really could produce those, that *would* be a problem, but I am not aware that we can (if I understand properly what you mean — it sounds like “machines that can compute LB for some values of n”, but maybe you meant something more like “machines that can delay for some number of steps compatible with the output of LB”).

  309. Bruce Smith Says:

    Here’s another formatting error in the preview — with no spaces in f( TM ) I get f(TM), which in preview looks like f with a superscript of TM in small print (ie as if f was a trademark).

  310. Sniffnoy Says:

    Toby Ord #274:

    Oh interesting. I hadn’t seen that before. So let’s see…

    The accepted (but not at all upvoted) answer, by Simply Beautiful Art, suggests remedying the problem by always taking the larger of the two values. Huh. Kind of inelegant, but I guess it is something nontrivial to study.

    The highest-voted answer, by Mike Battaglia, gives a way of looking at it from which it does make sense to view the Veblen hierarchy as hyper operations. Interesting! I’d never thought of that before.

    The next answer, by Simply Beautiful Art once again, only covers tetration and seems… pretty arbitrary? This seems worth ignoring to me.

    And then the next answer, by Alec Rhea, seems to just be “iterate on the right” and ignores the problem that this makes things small. <shrug>

    And then the final answer by Timothy just comments on the usual problems that come up when you try to define it. So that’s not helpful.

  311. Joshua B Zelinsky Says:

    One other thought about the growth of BB which seems reasonable to conjecture/ask about. Let BB^(k)(n) be the kth finite difference of BB(n). Is it always true that, for every constant c, BB^(k)(n) > c for sufficiently large n?

  312. maline Says:

    Scott #121: If you don’t mind, I’d still appreciate a reference on working with infinitely many qubits.

    To state the difficulty in short: We cannot use a separable Hilbert space because there are uncountably many orthogonal states. But using a nonseparable Hilbert space doesn’t really help: a state in a nonseparable Hilbert space, when expressed in some basis, may only have countably many nonzero amplitudes. But that means that if we use, say, the up/down basis, there is no way to represent the state where all the qubits are in the “plus” state!

    I have not seen a good way to represent the space of states in this system, so I’m excited to hear that the problem has been well studied.

  313. anon Says:

    Scott #277: Noah Schweber does something similar in his reply here.

  314. Scott Says:

    maline #312: Sorry, I’m not going to be able to provide what you’re looking for. All I meant was that Hilbert spaces of countable and uncountable infinite dimensions can both be defined, and the latter could be interpreted in terms of infinite numbers of qubits, and pretty much anything one could say about this subject has been said by the quantum field theorists and C* algebraists, to whom I refer you for whatever other questions you have about it. Personally, I prefer all my Hilbert spaces finite-dimensional, or countable at the largest.

  315. maline Says:

    Scott #314: Okay, thanks anyway.

    I do know that the C* algebra approach to quantum theory, which avoids Hilbert space altogether, can handle infinitely many qubits. But that approach has the shortcoming of not telling you what the space of possible states looks like!

  316. Bruce Smith Says:

    Joshua #311: there was a formatting error in your comment, but if I guess correctly, you are just asking whether BB(n + k) – BB(n) eventually always exceeds … any c? If so, this is unknown for k = 1 but implied by Prop. 15 for all higher k. (If I take out “always” then I guess it’s known for all k by Theorem 16.)

    But maybe you were asking something else which got lost in the formatting.

  317. Bruce Smith Says:

    [This is mostly about the same ideas as my last big comment last night, #307, but hopefully somewhat cleaner, and more abstract in a useful way. But it does have new material, like “comparison to hierarchy theorems”.]

    The essence of the new idea seems to be some sort of complexity measure, or more precisely, a new class of complexity measures, on finite data.

    The generic version of this is “how big (n) of a process of kind G(n) is required to produce output v”.

    (By “kind of process” we also include how the value v should be extracted from the record of running the process, aka the process’s “history”. So far we’re only using deterministic processes with no inputs, so there is a total function from processes to histories and thence to values.)

    In the specific case of LB, the “kind of process G(n)” is something like “a turing machine of n states running for time up to |T(n)| + 1 and halting by then”; in related theorems it is more generally G(n, m) = “a turing machine of n states running for time up to m and halting by then”. And in both cases the “extracted value” is just the exact runtime, or a “None” value if the machine doesn’t halt in the time allotted. And the theorems require that the range of possible runtimes (whose size is m) is larger than |T(n)|.

    In the other kind of “h function” I wrote about last night, G(n) (where this G is also parametrized by a time limit function g on n and a number-of-output-bits function k on n) is instead, most naturally, “a turing machine of n states running for time g(n) (or any lower time, if it halts by then) and maybe or maybe not halting by then”, and the extracted value is “the tape content in the interval of k(n) tape cells starting at the current tape position, visible after exactly g(n) steps or after halting, whichever comes first”. The theorems require nothing directly about g(n), but about k(n) they require that the range of possible outputs, ie V(n) = {0,1}^k(n), is larger in size than T(n), ie |V(n)| > |T(n)|. Note that in this case |G(n)| = |T(n)| — they are 1-1 by definition. The more fundamental general requirement is |V(n)| > |G(n)|.

    (Of course you could also replace “turing machines of n states” with “L-programs of n bits”. That might simplify some things, mainly by making |T(n)| a simpler expression in n, and/or by permitting tighter bounds, just like it does with BB vs. BB_L.)


    Having defined G(n), you can define an h function h(n) in any manner which infinitely often emits an output value which is impossible for any process in G(n) to generate. In the complexity interpretation, this just means a value “with complexity higher than n”.

    The h function does not even have to be computable or deterministic. For example it could be defined by a family of unrelated turing machines, one for each n. The condition |V(n)| > |G(n)| is what guarantees that a suitable h function exists. (It does more — it guarantees a computable one exists, namely, “the first impossible output”.) But typically you want something more from h — you’re interested in a specific h function, not just any one that satisfies the output condition. So if you know it exists anyway (ie that it does infinitely often produce impossible values), that implies |V(n)| > |G(n)| (at least for those n), but you don’t have to think about that explicitly.

    For LB(), the h function is just LB.

    In the other case I discussed last night, it’s just something like “the first output value not generatable by anything in G(n)”.


    Having defined G and h, you then prove trivially that for infinitely many n, nothing in G(n) can output h(n). This is just a restatement of the requirement above for h.

    To do more, you need that these so-called “processes” in G(n) (which up to now could be anything with corresponding values in V(n), according to some function f(process) -> V(n)) are “computation-like”. I won’t try to formalize that here, but what you end up wanting is to think of their histories as structured objects, and to reason like “none of their histories even have an impossible v inside them, since if one did, we could make a modified one (a different process created by modifying the original one) which used that v inside it as its output value, getting a contradiction”.

    There are details which limit that conclusion a bit, and which depend on exactly what kind of processes these are, including how that value is extracted from their histories. But the upshot is that you can show (at least) that there is no “algorithm with a certain amount of advice and able to be run within the resources of a G(n) process (possibly while also being interpreted there, depending on f)” which can always compute h(n).


    So, how does this result compare to other ones?

    It has some similarity to the “hierarchy theorems” (space or time). It doesn’t seem to me to be just a restatement of one of them, nor is it exactly “a new hierarchy theorem”, though I have some hope that it could be recast as one (“complexity hierarchy theorem”??).

    But skimming the proof of the space hierarchy theorem in wikipedia, it sure looks highly related to the contradiction we get in this case.

    It also reminds me of some proofs about Kolmogorov complexity.

    The resource being limited in the examples of G(n) given above is a combination of program complexity and runtime. If, in the proof of this theorem, you only limit runtime, like in the time hierarchy theorem, this theorem no longer works, since more program complexity can make up for lack of time by being used to hardcode the “impossible output”. And if you only limit program complexity, you just get K-complexity. So it seems essential to limit both at once.

    Another thing it reminds me of is a statement that “co-NX is more powerful than X”. (I don’t know if there is any formal validity to this idea. This is part of why I was pursuing whether this could be used to separate co-NP from P, but I concluded it can’t do anything related to that — I’ll explain what was making me think it maybe could, and what I now think about it, later.)

    The reason it reminds me of a statement like that is simply that “X” is “any process in G(n)”, and “co-NX” is “the procedure of letting a nondeterministic choice of a process in G(n) rule out an answer, and if that never happens, emitting that answer (or any such answer)”. The latter is what h does, and it provably computes something we can’t always compute within G(n).

    Since this is a “function problem” rather than a “predicate”, maybe it’s fundamentally different than what “co-N” means for predicate-classes.


    Why can we use this to separate h from a class like P (or many other time complexity classes g(n)), which doesn’t limit the polynomial degree or the constant in O(), so for any one value of n, it might include algorithms with any absolute runtime?

    I don’t yet have as clean an answer to that as I would like! I also suspect that using this that way is wasting most of its potential power.

    But the understanding I have is something like this (in the specific case of the class P, or more precisely FP):

    by choosing G appropriately, so g(n) grows faster than any polynomial, and the allowed program complexity (n in this case — it would not *have* to equal the n parameter in general!) grows without limit, we manage to dominate every P algorithm for high enough n. (There are two thresholds n has to exceed — it has to make g(n) dominate the hypothetical polynomial runtime, and it has to exceed the fixed size of the turing machine for the hypothetical FP-algorithm. If it only did either of those alone, we wouldn’t get our result.)

    Having done that, we know that any FP-algorithm for h — even one with up to a certain amount of advice, I guess o(n log n) or so when G(n) is 1-1 with T(n) — would eventually fit inside G(n), at which point we get the contradiction, since we can modify the machine it fits into to emit h’s output directly.

  318. Scott Says:

    Bruce Smith #306, #307: Just last night I was having some of the same thoughts, and I see that you’ve beat me to writing them! The way I was going to put it, was that your argument (assuming it works) would also show that LB(n) is not computable in subexponential time even with o(n log n) bits of Karp-Lipton advice (i.e., extra information dependent on the input length). The reason being that that would still lead to an n-state Turing machine that ran for exactly LB(n) steps, contradicting LB(n)’s definition.

    Of course, once you have n log n advice bits (and just linear time), or once you have exponential time (and no advice bits), it does become possible to calculate LB(n), so in that sense such a result would be optimal.

    The other point I wanted to make is that this is strongly reminiscent of arguments that one makes in the theory of resource-bounded Kolmogorov complexity, and (especially) circuit lower bounds that “explicitize” Shannon’s counting argument. For example, here’s the classic proof that there must be problems solvable in EXPSPACE (exponential space) that require exponential-size circuits:

      First, if we consider all \( 2^{2^n} \) Boolean functions \( f:\{0,1\}^n \to \{0,1\} \), almost all of them must require circuits with at least (say) \( 2^n/n^2 \) gates, just by counting / pigeonhole. So in exponential space, we can search for the first such function f, if all their truth tables were arranged in lexicographic order, and then compute that f.
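    The counting step in the quoted argument can be made quantitative; the following is a rough back-of-the-envelope version with an unspecified constant of my own, not anything from the post:

    ```latex
    % A circuit with s >= n gates is described by choosing, for each gate,
    % its type and its two inputs from among the at most s+n earlier wires:
    \#\{\text{size-}s\text{ circuits on }n\text{ inputs}\}
      \;\le\; \bigl(c\,(s+n)^2\bigr)^{s} \;=\; 2^{O(s \log s)}.
    % Plugging in s = 2^n/n^2 gives 2^{O(2^n/n)} \ll 2^{2^n}, so almost all
    % of the 2^{2^n} Boolean functions need more than 2^n/n^2 gates.
    ```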

    Once we have that f, we can then use it to “work our way downwards,” and get problems in smaller complexity classes (like NEXP^NP and even MAEXP) that also don’t have polynomial-size circuits.
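
    The counting / pigeonhole step above can be checked numerically. In the sketch below the function names are mine, and \( (3(n+s)^2)^s \) is one standard crude upper bound on the number of s-gate circuits (each gate picks one of a few types and two inputs from among the variables and the other gates):

```python
# Numeric check of the counting argument: there are 2^(2^n) Boolean
# functions on n inputs, but at most about (3*(n+s)^2)^s circuits with
# s binary gates, so for small enough s most functions have no circuit.
def num_functions(n):
    return 2 ** (2 ** n)

def circuits_upper_bound(n, s):
    return (3 * (n + s) ** 2) ** s

def counting_threshold(n):
    """Smallest s at which the circuit count could cover all functions;
    below this, counting alone shows most functions need more gates."""
    s = 1
    while circuits_upper_bound(n, s) < num_functions(n):
        s += 1
    return s

# The threshold grows roughly like 2^n / n, consistent (up to the
# crudeness of the bound) with the 2^n/n^2 figure quoted above.
for n in range(2, 11):
    print(n, counting_threshold(n))
```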

    Isn’t this similar to what we do with LB? First, we observe that, by a counting argument, there must be many relatively small runtimes that aren’t “claimed” by any n-state Turing machine. Second, to make things explicit, we consider the first such runtime. Third, we use that runtime as the basis for a lower bound, by arguing that if the runtime were easy to compute in this-or-that way, then we could contradict the very way we defined it.
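
    To make the three-step recipe concrete, here is a toy brute-force computation of LB(n) for very small n. The machine encoding below (2-symbol quintuples, a HALT sentinel as the “next state,” the halting transition counted as a step) is my own convention, so exact values depend on it; the step cutoff is ample for n ≤ 2, since even the 2-state Busy Beaver runs for only 6 steps:

```python
from itertools import product

HALT = -1  # sentinel "next state": the machine halts on this step

def run(tm, limit):
    """Simulate on a blank two-way tape; return the exact halting step,
    or None if still running after `limit` steps."""
    tape, pos, state = {}, 0, 0
    for step in range(1, limit + 1):
        write, move, nxt = tm[(state, tape.get(pos, 0))]
        tape[pos] = write
        pos += move
        if nxt == HALT:
            return step
        state = nxt
    return None

def lazy_beaver(n, limit=100):
    """Least t such that no n-state machine halts in exactly t steps.
    `limit` is a heuristic cutoff, generous for n <= 2."""
    keys = [(q, s) for q in range(n) for s in (0, 1)]
    actions = [(w, m, nxt) for w in (0, 1) for m in (-1, 1)
               for nxt in list(range(n)) + [HALT]]
    claimed = set()  # runtimes "claimed" by some n-state machine
    for choice in product(actions, repeat=len(keys)):
        t = run(dict(zip(keys, choice)), limit)
        if t is not None:
            claimed.add(t)
    t = 1
    while t in claimed:  # first unclaimed runtime
        t += 1
    return t
```

    Under this convention a 1-state machine either halts on its very first step or marches off the tape forever, so lazy_beaver(1) returns 2.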

  319. Job Says:

    Job #304, What did you mean by “… we can already produce polynomial-time machines that are partial solvers for LB(n)…”?

    Only that there are polynomial-time functions that happen to coincide with LB(n), for one or more values of n.

    I see your point though that it’s not trivial to embed an LB(n) value into an n-state machine. In the sense of, “here’s LB(n) produce an n-state machine that encodes it”.

    At the same time, it sounds like the proof only requires a single LB(n) for a sufficiently large n, and that seems a lot easier.
    E.g. we don’t have to find a way to encode every LB(n) into an n-state machine, just find a single polynomial-time (n or less)-state machine that happens to leave LB(n) on the tape.

    And then the TM construction in the proof would produce a contradiction from that?

    Basically i’m wondering whether the proof also produces a contradiction for a statement that is true.

    It reminds me of P vs NP, where a proof that P != NP, by contradiction, also says that we can’t even partially solve NP-Complete instances in polynomial time.

  320. Bruce Smith Says:

    [Scott #318: I want to get this out before reading your reply, since this is my “last important idea to get out”, and then I’ll read that and reply.]

    If we really have a new way of proving problems h are not in P, it seems reasonable to ask whether we can use that to separate P from seemingly-higher classes X, by constructing an h definitely inside X but now provably outside P.

    (We could say the same thing for other time classes besides P, but I don’t know enough about those to know what would be interesting.)

    I tried this for X = co-NP and failed and concluded it can’t work (which Scott probably knew all along), but of course there is also X = PSPACE = NPSPACE = co-NPSPACE.

    So can one of the h’s we’ve been defining look anything like one of the known PSPACE-complete problems? (I’ll slowly get to this, below.)


    The h’s inherently output large values, whereas a PSPACE-complete problem is a predicate (with 1-bit output). As is well known, you can ask about a derived predicate like h'(n, m) := (h(n) < m), and use binary search on that to compute h(n) in O(log m) calls of h'. As long as O(log m) is “not a big multiplier” on your g(n), this won’t increase your total runtime too much.
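
    That standard function-to-predicate reduction can be sketched in a few lines (the names are mine; h is a stand-in for any function with a known bound on its output):

```python
# Recover h(n) from the derived predicate h'(n, m) := (h(n) < m)
# using binary search: O(log m_max) predicate calls.
def compute_via_predicate(h_pred, m_max):
    """Return the value v with h_pred(m) == (v < m), where 0 <= v < m_max."""
    lo, hi = 0, m_max          # invariant: lo <= v < hi
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if h_pred(mid):        # is v < mid ?
            hi = mid
        else:
            lo = mid
    return lo

# Usage: recover v = 1234 from nothing but its comparison predicate.
v = 1234
assert compute_via_predicate(lambda m: v < m, 10**6) == v
```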

    In LB, m is O(g(n)), so this is fine. In the machine-output kind of h, the output length can’t exceed g(n) but it can certainly exceed log(g(n))! The output as a number can go up to 2^g(n) (though you might choose G(n)’s internal f to discard most of that).

    So in the worst case, log m can be g(n) and we’re asking whether O(g(n)^2) is worse than O(g(n)). Well, yes, but (I am pretty sure) if g(n) is in a typical named large time complexity class (P, EXP, or higher — technically I mean the time complexity functions used to define those complexity classes), then g(n)^2 is in the same class.

    So I hope I can mostly ignore this function/predicate distinction. Of course I still might make some kind of logical error due to ignoring it, even if its “effect on runtime” is ok.


    Looking at Wikipedia, some of the PSPACE-complete problems look at least *slightly* like these h functions — though not enough to convince me there is anything real to pursue there. Having no definite idea, I think I’ll stop there, and just say that I didn’t see anything to completely rule out this approach — especially for other pairs of not-yet-separated complexity classes that I don’t know much about.

    But I also think that exploring this new idea (the h functions themselves, and the limit on ability to compute a complex value) “on its own terms” is the main right thing to do now — if the idea is actually new, that is how to understand what it’s really about, and if not, that is how to find out in what form it’s been discovered before.

    The new idea has certain aspects that I’m sure I don’t yet understand deeply enough, including potential nondeterminism, nonuniformity, how it combines its two resource limits (program size and runtime), and how to take advantage of how poorly it “fits into” already known complexity classes, in the sense of not matching up to them very directly.

    [unless I’m forgetting something, that’s all I have to say on this at the moment, except the details about what I was thinking about this and P vs. co-NP, which are probably not very high priority to report.]

  321. Bruce Smith Says:

    Scott #318:

    “… EXPSPACE proof … Isn’t this similar to what we do with LB?”

    Yes, I think there is a very close analogy just as you describe. To help me think clearly about it, let me make it even more explicit:

    > First, we observe that, by a counting argument, there must be many relatively small runtimes that aren’t “claimed” by any n-state Turing machine.

    (Actually for LB we only assumed “at least one”, but that is enough to make the result work just as well.)

    In the EXPSPACE result you described, the analogy is that we observe there must be many (and thus at least one) truth tables of our fixed size, which aren’t “claimed” by a “small” circuit that can evaluate to them.

    > Second, to make things explicit, we consider the first such runtime.

    Exactly the same. I would say this is to make sure what we’re defining is a computable, deterministic function, which lies in the claimed class, in this case EXPSPACE (though we’re not done proving that, at this stage).

    > Third, we use that runtime as the basis for a lower bound, by arguing that if the runtime were easy to compute in this-or-that way, then we could contradict the very way we defined it.

    In the LB case, the way we compute the runtime is to run a small enough Turing machine for a small enough time, and observe that this value is its exact runtime.

    (So we have to iterate over all those Turing machines and run them that long — this has to fit in whatever our complexity class is.)

    In the EXPSPACE case, the analogous value to the runtime is the truth table, and the way we compute it from one “small example” (circuit) is to evaluate the circuit on each input.

    So we have to iterate over all small circuits (where small means “almost but not quite large enough to get arbitrary truth tables”). Fortunately we only have a space bound, not a (small) time bound! And these circuits fit into it, just barely (same with the truth tables). Anyway we have to, and can, at this point prove we have an EXPSPACE algorithm.

    Is the contradiction also exactly analogous?

    LB: it emits a runtime that can’t be achieved. If the small TM could know that runtime by shortly before its time limit runs out, it could achieve it.

    EXPSPACE proof: it computes a truth table that can’t be achieved by a small circuit (the same one regardless of its inputs, depending only on n). Then it uses that to evaluate its inputs and emit one bit from that truth table. Thus its total set of values (for that n) imitates that truth table. If some small circuit could imitate that same truth table, that would be the contradiction.

    So I think they are identical except for the function vs. predicate distinction, which is basically handled in the “standard way”.

    I’ll post this first, then maybe reply to your other points separately.

  322. Andrei Says:

    I know what your stance on this kind of stuff is, but, anyway, thought you should know:

    This is the author:

  323. Bruce Smith Says:

    still Scott #318:

    “… resource-bounded Kolmogorov complexity …”

    I don’t know anything about that except the name, except that I have a vague memory of hearing something about it long ago. But guessing from the name, that is exactly the sort of thing I was saying last night that my “generalized h function” might be based on. So, did the people studying that notice they could use it to construct problems provably outside almost any time complexity class? If they did (and if they used that in whatever interesting ways we would think of now, which you can never tell), then that part of this is not new. Since you mention it now in connection with that EXPSPACE proof, I take it that they did notice something like that, and construct similar proofs in it.

    Given that, in what ways is the proof of LB’s complexity new?

    – the function whose time-complexity we lower-bound is about runtimes of programs rather than about explicit outputs of programs. (But in either case it’s an “impossible value for a suitably-small-and-limited program”.)

    – the contradiction is created by controlling a runtime, rather than by controlling an explicit output.

    I think those are actually independent points! We could mix and match them, that is, make all four variants of those design choices.

    Re the first option: since a Turing machine simulation can turn a runtime into a number, whenever m is a possible runtime of a machine in T(n), m is also a possible explicit output of a slightly larger machine.

    Re the second option: it is more complicated (so more work to prove), and takes more overhead in runtime and program size, for a machine you’re constructing to control its exact runtime, than for it to control its exact explicit output. But they are both possible and give the same sort of contradiction. But the higher overhead for controlling a runtime means it probably hurts your result rather than helps it, I would guess. OTOH to get a direct contradiction in the LB case, we did have to output our number “in the form of a runtime”, simply because that’s what LB measures about us when it computes its value. But if we just wanted some function we could prove was not in P (or some other time complexity class), and didn’t care in advance that it was LB or had anything to do with runtimes, it’s probably easier to make that function by using explicit outputs rather than runtimes in both of the above choices, and you will probably get tighter bounds, and for a wider variety of time complexities (lots rather than just one).

    And re the way the LB proof combines both options: it seems to me it combines them independently. That is, it turns the “subroutine runtime” into a number by simulation, then emits that number as a runtime by “carefully controlling its own exact runtime”; the fact that it does both of these things doesn’t help it do either one, it does them entirely independently — in fact, worse than that, it has to worry about its own exact runtime even while simulating a machine to measure *its* runtime. So one makes the other worse — the opposite of their being some nice way to do them both together.

    So if we just redefined LB as an h function with the same output size and time complexity g(), I predict we’d get a tighter bound with an easier proof.

    So I presently see LB’s role as having inspired this discussion and these thoughts, but not necessarily as being a useful addition to what can be done using “generalized h functions based on explicit outputs”.

    (Though I am eager to be corrected on that, if possible! And also, to keep exploring “applications” of either one, and/or to understand them more deeply.)

  324. Scott Says:

    Bruce Smith #320: If LazyBeaver∉P were to lead to a proof of P≠PSPACE, this would probably have to be counted the most consequential blog comment thread in the history of the world. 😀

    Alas, I have a reason for pessimism: the proof of LB∉P, assuming it indeed works, is still “just” a proof by diagonalization, and we know (for, e.g., relativization reasons) that diagonalization proofs can’t possibly separate P from PSPACE on their own. The part that wasn’t obvious to me—though maybe it was obvious to others—was that, nevertheless, diagonalization proofs can potentially do a little bit better than separating P from TIME(f(n)) for some superpolynomial f.

  325. Bruce Smith Says:

    Job #319: “Basically i’m wondering whether the proof also produces a contradiction for a statement that is true.”

    I hope not! 🙂

    I prefer to think that it produces a contradiction for a statement that we might have heretofore thought *might* be true, but that we now realize we have just proved is *not* true! (Namely, that for some n, LB(n) might be hardcoded into a TM with sufficiently less than n states, or compressed in that form in a way that can be decompressed quickly enough.)

  326. Bruce Smith Says:

    Scott #323:

    “… the most consequential blog comment thread in the history of the world. 😀”

    You mean, in the history of the world, so far! I think blogs’ importance is growing, though not quite as fast as BB…

    “… and we know (for, e.g., relativization reasons) that diagonalization proofs can’t possibly separate P from PSPACE on their own.”

    Sometime I’d like to look into those reasons, just for fun. But I take it now as I think you intend — we can’t make progress on that by just “looking for one of these h functions among the known PSPACE-complete problems”. And if I understand right, if a function like that exists at all, it would have to be defined quite differently than we’ve discussed so far, since the definitions so far do “relativize” in the sense that if all the machines had access to some fixed oracle, the proofs would still work.

    I guess the proofs would have to “examine the machines’ structure” instead of only ever simulating/extending them.

    Would it be enough (in principle, just to defeat relativization) if they diagonalized over detailed machine histories rather than just over machine outputs? I ask this because the history structure is different if oracle calls are present or not. (Of course the proof would have to make use of some fact about the existing primitives in the machine, which is not true of oracle calls — otherwise it could be extended to cover them. Eg that the output of each operation is a simple function of a small local neighborhood, ie that the machine’s time evolution is a cellular automaton.) (I’m not saying I have any idea how to make use of that, even if you say “yes, in principle that would be immune to relativization”.)


    I guess it would also be interesting to understand exactly why the LB proof and the related h-function proofs are not prevented by relativization from reaching their conclusions. And given that, what further conclusions might still be plausible, regarding only that issue.


    I should also ask — what accounts for your excitement about “a potential new way to establish a problem is not in P”, if not “a potential way to separate P from some other class”? Is it purely its intrinsic interestingness, rather than “some application”?

  327. Scott Says:

    Bruce Smith #326: I agree, the novel aspect here is that the problem has to do with exact runtimes. If you remove that, then it’s really closely analogous to (say) the proof that there’s a language in EXP that has no size-\( n^2 \) circuits, namely that in EXP you can go through all functions \( f:\{0,1\}^n\to\{0,1\} \) in lexicographic order until you find the first function that’s hard enough (which will happen after at most \( \exp(n^{O(1)}) \) functions), and use that one.

    In both cases, the hardness argument is a sort of mashup of diagonalization with a counting argument. And, even more to the point, in both cases we end up with a hard problem that we currently have no reason to think is complete for any standard complexity class—it’s just some random problem that’s hard. (Almost literally “random”: the problem, in both cases, is to reproduce a specific pseudorandom object, one that’s difficult although not impossible to compress.)

    Of course, one difference is that it’s only the LazyBeaver argument that would yield a separation between EXP and P. This has to do with the fact that LB is specifically designed to evade (uniform) Turing machines rather than circuits.

  328. Bruce Smith Says:

    I googled “resource-bounded Kolmogorov complexity” and am starting to read one of the first results, since I recognized an author name and it’s relatively recent.

    The Pervasive Reach of Resource-Bounded Kolmogorov Complexity in Computational Complexity Theory

    Eric Allender
    Michal Koucký
    Detlef Ronneburger
    Sambuddha Roy

    It’s interesting; near the beginning it defines the same notion of complexity I defined last night (except using programs rather than turing machines); if nothing else, it gives a survey of some aspects of this field up to 2006 or maybe a bit later.

    (I didn’t yet find out whether they used this in the same way as in the putative new proof of LB’s time complexity. OTOH it’s clear they’re doing things both somewhat related and much more sophisticated.)

  329. Bruce Smith Says:

    Scott #327:

    (FYI you refer to my #326, but your replies seem to be to points in my #323.)

    “Of course, one difference is that it’s only the LazyBeaver argument that would yield a separation between EXP and P. This has to do with the fact that LB is specifically designed to evade (uniform) Turing machines rather than circuits.”

    This sounds very important to me, but I mostly don’t understand it — can you elaborate?

    Also I’m not sure why you say “uniform” — for each value of n, LB(n) goes through all TMs of size n, so it has nothing to do with any series of TMs, one per n, which are generated by one algorithm — so you must mean something else, which I can’t guess.

  330. Joshua B Zelinsky Says:

    Bruce #316,

    Ugh. Not the question. Need to use preview better. Ok. Trying that again with more words so I don’t need to worry about weirdness with greater-than signs.

    Define for a function f(n), f^(k)(n) as follows: f^(0)(n) = f(n), and for each k ≥ 0 define f^(k+1)(n) = f^(k)(n+1) − f^(k)(n). This is sometimes called the kth finite difference, and is a discrete analog of the derivative. Then the question is whether it is true that for any k and any c, BB^(k)(n) is at least c for sufficiently large n. This is a much stronger statement than the conjecture that BB(n+1) − BB(n) goes to infinity (which is the k=1 case of this conjecture). Does that make sense?
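
    The definition above is easy to state in code. Here is a minimal sketch (function names are mine), using f(n) = n³ as a sanity check, since its 3rd finite difference is the constant 3! = 6 and its 4th is identically 0, the discrete analog of repeated differentiation:

```python
# kth finite difference: f^(0) = f, f^(k+1)(n) = f^(k)(n+1) - f^(k)(n).
def finite_difference(f, k):
    if k == 0:
        return f
    g = finite_difference(f, k - 1)
    return lambda n: g(n + 1) - g(n)

# Sanity check on f(n) = n^3.
cube = lambda n: n ** 3
d3 = finite_difference(cube, 3)
d4 = finite_difference(cube, 4)
assert all(d3(n) == 6 for n in range(10))
assert all(d4(n) == 0 for n in range(10))
```

    The conjecture then reads: for every fixed k, finite_difference(BB, k)(n) tends to infinity with n.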

  331. Scott Says:

    Bruce Smith #329: A “uniform” model of computation, like Turing machines, is one where the same machine needs to work for every input length n. A “nonuniform” model, like Boolean circuits, is one where you can have a different machine for each n. P is a uniform complexity class, while P/poly is a nonuniform class. And of course, the lower bounds situation is very different for the two. We know P≠EXP by the Time Hierarchy Theorem, but we can’t show EXP⊄P/poly; the best we can currently do is MAEXP⊄P/poly.

    When we take the lexicographically first Boolean function that requires large circuits, we do well in “fighting” against small circuits, but (as it turns out) not nearly as well as we could do in fighting against fast Turing machines. For example, if our problem is in EXP, then we can get that the problem is outside TIME(n^k) for some fixed k, but not that it’s outside P, let alone that it requires exponential time. Taking the lowest runtime that’s not claimed by any n-state Turing machine does much better in fighting against Turing machines—letting us separate EXP from P and even from SUBEXP (just like we knew how to do from the Time Hierarchy Theorem). I hope that clarifies.

  332. Bruce Smith Says:

    Joshua #330, yes, this now makes perfect sense.

    As for the conjecture itself — I agree it seems “intuitively very likely”. But I have even less idea how to try proving it than for just its k = 1 case!

  333. Zirui Wang Says:

    Do you know that the commas inside numbers should be typed {,}? For example, instead of 1,234,567, it should be 1{,}234{,}567.

  334. Job Says:

    What if we lived in a world where we could use LB(n) as a sinkhole in algorithm design?

    For SAT we’d just create an n-state machine that:
    1. Picks a random solution x.
    2. If x is valid, stop and output x.
    3. Otherwise, output x and then stall until LB(n).

    We can be sure the machine will never actually take path #3. 🙂

    This is what I’ve always wanted from quantum computing, a “never happened” gate.

  335. Bruce Smith Says:

    Scott #331, thanks for trying, but so far that doesn’t clarify your #327!

    FYI I already knew your first paragraph except the last clause (“the best we can currently do …”).

    I am comparing these things: the two proofs you outlined (first the one involving EXPSPACE and second the one involving EXP, since they both “find and use a large circuit”) and the LB putative proof of runtime.

    I understand why taking a (necessarily-)large circuit fights against a small circuit (ie is guaranteed to not be imitable by it). (By definition.)

    You then seem to compare that to “fighting against fast Turing machines”, implying this is more difficult. I’m not sure if you mean “the large circuit tries to fight a fast Turing machine” or something else which I can’t guess. If you meant that, then (1) I don’t understand the relevance, since it doesn’t happen in any of the problems mentioned above, (2) I don’t understand your result in that comparison — I’ll explain why in the following.

    – As I see it, there is no relevance of uniformity here (no place to even try to apply it), since we’re operating independently for each n.

    – Any fixed turing machine running for time t can be simulated by a circuit of size about t^2. So to really compare them you’d have to get specific about size limits and number of outputs, but I don’t see any “generic difference” in this context.

    – Just the fact that the number of small circuits is limited, and the number of turing machines (fast or not) is limited, is enough to find an output or runtime or truth table which goes “unclaimed”. Their smallness/fastness is not directly relevant for that.

    So basically, I don’t yet understand your original remark *or* these new ones… 🙁

    If there was something definite I could compute (about various time bounds, numbers of things, etc), maybe that would help me understand it, but I don’t even understand exactly what you’re comparing, so there is nothing for me to compute.

  336. Bruce Smith Says:

    Now your #331 looks longer — edited? So maybe it will clarify further — reading now.

  337. Bruce Smith Says:

    Scott, your longer (edited?) #331 is a little clearer, but only maybe 10-20% of the way to “clarifying” all the things I don’t get yet.

    I think what would really clarify it is if you spell out and compare two theorems (or refer to them specifically, if they are ones you already posted or I can find on the web). Then I’ll understand things such as, when you say “if our problem is in EXP”, what else you’re also assuming about “our problem” — as it is, the context is getting too complicated for me to work out or guess. (If you are tired/busy, feel free to defer til tomorrow.)

    I also presently still think that anything the LB proof could do about proving something (in this case, LB itself) requires almost-exponential time, some of the h-functions I defined yesterday (which don’t involve using runtimes for “communication”, only as resource limits) can do with a likely-tighter bound and more easily. Are you not addressing that claim, or arguing against it?

  338. Bruce Smith Says:

    To clarify my last claim — I claim we could prove better/easier theorems about the h functions (which look at outputs on tape, not runtimes, though they still limit runtimes) than we could prove about LB runtime. (I don’t claim that there is some better way than we already discussed to prove the same theorem we discussed about LB. That is, I’m comparing two analogous theorems, not two proofs of one theorem.)

  339. Bruce Smith Says:

    (Please ignore my remark about a TM of time t being simulable by a circuit of size t^2, since it seems irrelevant in this context!)

  340. a Says:

    Is this correct result

  341. Beat Hörmann Says:

    Using Turing machines that operate on tapes of a finite length, the value of BB(n) for a certain value of n can be approximated from below by successively lengthening the tape, provided that my arguments below hold.

    == Tape-Limited Turing Machine
    Let T_n,m be a Turing machine (a TM) with n denoting the finite number of states and m denoting the finite number of storage fields on the tape. T_n,m starts somewhere on the tape, but crashes at the attempt to go beyond the borders of the tape—T_n,m halts in the “Crash” state. Otherwise T_n,m works exactly the same as a normal BB(n)-TM.

    The busy beaver BB(n, m) for tape-limited TMs is defined analogously to BB(n), however, crashing TMs do not count. [The reason is that a crashing TM may not halt if it could operate on an infinitely long tape so that BB(n, m) > BB(n) would be possible for m > m*, (m* see below).]

    ==Universal State
    The universal state U_n,m describes at any point in time the current state of T_n,m, the current position of the head on the tape, and the entire current content of the tape.

    == Claims
    1. Let W_n,m be the number of all possible universal states. For given values of n and m, the value of W_n,m can be explicitly computed.

    2. For given values of n and m, an explicit program can decide whether T_n,m halts by simulating T_n,m for W_n,m steps at most: T_n,m crashes or halts normally within W_n,m steps or else runs for ever because it got into a universal state it was before. Thus BB(n, m) can be explicitly computed.

    3. W_n,m >= BB(n, m) for all n and m, because even a champion T_n,m cannot run longer than W_n,m steps.

    4. There exists a threshold m*, such that for all m > m*: BB(n, m) = BB(n, m*) > BB(n, m*-1), meaning that lengthening the tape beyond m* is not effective.

    5. BB(n) >= BB(n, m) for all n and m, because a champion operating on a tape of limited length cannot run longer than a champion operating on an infinitely long tape.

    6. For all m > m*: BB(n) = BB(n, m), because of Claim 4. The value of m* is not computable, for otherwise BB(n) would be computable.

    == Conclusions
    a) The sequence e_i := BB(n, i) for i = 1, 2,… where e_{i+1} >= e_i can be explicitly computed for a finite number of elements.

    b) [Side-effect] W_n,m = BB(n, m) does not hold for all m < m*, for otherwise m* could be computed: Set c to 1 and compute W_n,c and BB(n, c). If both values are equal, increment c by 1 and compute W_n,c and BB(n, c) again with the incremented value of c. Do this until W_n,c > BB(n,c): then m* = c – 1. It follows that for certain n and m < m* there exist universal states U_n,m never visited by any tape-limited T_n,m TM.
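
    Claim 2 above is straightforward to realize in code. Here is a minimal sketch of the bounded simulation, under a toy quintuple encoding of my own (including the arbitrary choices that the head starts mid-tape and that a halting transition takes precedence over a crash on the same step):

```python
# Claim 2 as code: a machine with n states on a tape of m cells has at
# most W = n * m * 2**m universal states, so simulating for W steps
# decides its fate: any run that long must revisit a universal state,
# and a deterministic machine that does so loops forever.
HALT = -1  # sentinel "next state": the machine halts on this step

def fate(tm, n, m, start=None):
    """Return ('halt', steps), ('crash', steps), or ('loop', None)."""
    pos = m // 2 if start is None else start   # head starts mid-tape
    tape, state = [0] * m, 0
    limit = n * m * 2 ** m                     # W_{n,m}
    for step in range(1, limit + 1):
        write, move, nxt = tm[(state, tape[pos])]
        tape[pos] = write
        pos += move
        if nxt == HALT:
            return ('halt', step)
        if not 0 <= pos < m:                   # beyond the borders
            return ('crash', step)
        state = nxt
    return ('loop', None)                      # pigeonhole: repeated state
```

    From here, BB(n, m) is just the maximum steps over all n-state machines whose fate is 'halt' (crashing machines excluded), exactly as in the claims above.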

  342. Ian M Finn Says:

    a #340: No.

  343. Scott Says:

    Ian Finn #342 and others: From my standpoint, this is almost the perfect case—the NP=RP claim posted, debunked, and retracted, all before I’ve eaten and taken a shower! 😀

  344. Bruce Smith Says:

    Scott #331 and #327,

    In the relative clarity of morning, I think I might finally get your main point, though I am partly guessing.

    Consider these two classes of “small things to fight against” (ie some big thing has to be constructed to surely differ from all of them):

    – turing machines of size n and max runtime g(n)


    – small circuits of size g(n).

    (Here g(n) is any growth function you’re interested in — in the present context it’s roughly exponential in n.)

    Both of these have a “computational power” which grows basically like g(n) (at least if we ignore the potentially greater “power” of nonuniformity), but the number of such turing machines is only T(n), regardless of g(n), whereas the number of such circuits is more like 2^g(n) (exponentially greater in the present context).

    I didn’t go on to draw further conclusions from this; I think you did that, and considered them too obvious to make explicit, whereas for me, they are mostly not obvious enough to have noticed yet! But am I on the right track in understanding your main point?

  345. Nick Says:

    The survey conjectures that \( BBB(3) = 55 \), as witnessed by the program

    • 1LB 0RB 1RA 0LC 1RC 1RA

    I have not been able to find anything better than that, so I am inclined to agree with the conjecture. For completeness, here are three more programs that also hit 55. They are not essentially different.

    • 1RB 0LB 1LA 0RC 1LC 1LA
    • 1LC 0RC 1RB 1RA 1RA 0LB
    • 1RC 0LC 1LB 1LA 1LA 0RB

  346. Scott Says:

    Bruce Smith #344: Fundamentally, you are trying to give my handwavy, off-the-cuff remarks a far more thorough exegesis than they deserve or can withstand. 😀

    Yes, though, my main point was simply that I have no idea how to reprove P≠EXP, let alone SUBEXP≠EXP, using the constructions based on finding the lexicographically first Boolean function with a high circuit complexity. These constructions can separate EXP from SIZE(n^k) for fixed k, and they can also separate double-exponential time from subexponential circuit size, but the fact that they involve enumerating over all small circuits seems like a barrier to doing better. Whereas the Lazy Beaver function seems to have just the right properties to reproduce the best that we know how to prove from the Time Hierarchy Theorem, even though it’s no good for circuit lower bounds (being a problem that depends only on n, not on a length-n input).

  347. Scott Says:

    Zirui Wang #333: No, I didn’t know that, thanks!

  348. Bruce Smith Says:

    Beat Hörmann #341:

    This is interesting, so I hope you don’t mind if I take the time to criticize your presentation as well as commenting on your ideas.

    As far as I know, your reasoning and claims are all correct (except I didn’t take the time to understand claim (b) at the end, since I’m in a bit of a hurry now).

    There are two technical points you left ambiguous: where the tape position starts (within the permitted m cells), and whether W covers all physically permitted configurations or only reachable ones.

    About defining W: by reading farther, I could tell you meant all configurations, whether reachable or not. (As you say, both forms of the definition are finite and computable, since the set of reachable configurations is.)

    About the starting tape position: for definiteness you could just require the tape position to start in the middle of the permitted range; but it might be more interesting to give a separate limit on tape usage to the left and right (more on this below).

    There are also ways your notation could be improved:

    – in general it’s better to use subscripts like _n,m on *classes* of machines, rather than on individual members of the class.

    – So, you could say T_n,m (for even m) is the class of “space-limited turing machines” which have n states and allow use of cell positions from -m/2 to +m/2.

    – (But to fit better with the existing notation T(n), you should probably call that class T(n,m) instead. If you want one with different left and right limits, that could be T(n, mL, mR).)

    – “U” and “universal” should be reserved for “universal computers” or “universal turing machines”. For the concept you called U, you could use “full state space” or “configuration space” of a machine, and let that space simply be a function of the machine, ie machine M is always in some configuration in its configuration space S(M). Then your W doesn’t even need its own notation — it’s just the size of that space, |S(M)|.

    Finally, most of your definitions and notation are not really needed, since there are more economical ways to present the same ideas using existing concepts. So, this is how I might rewrite your entire post (not including claim (b)) (this is technically not quite equivalent, due to the starting tape position issue, but gets at similar points; I think you can see how to revise it to be exactly equivalent if you want to):


    [re-presentation of your ideas:]

    Consider the Turing machines T(n,m) which have n states and use at most space m before halting (with the allowed space centered around the starting position), and the function BB(n,m), defined like BB but only for these machines.

    Unlike with T(n), the fate of these machines is computable — such a machine M must either halt, exceed its allowed space, or enter an infinite loop in its finite configuration space S(M), within an easily computable number of steps, namely |S(M)|.

    Therefore, BB(n,m) is computable, and BB(n,m) ≤ BB(n), but it increases (or stays constant) as m increases, eventually equalling BB(n) for a finite (but not computable) m = m*.

    It might be possible to study this for small values of m experimentally. It would be interesting to study how BB(n,m) increases for fixed n, and how soon it reaches BB(n).


    Now I can give my own comments on these ideas:

    – Yes, this would be interesting!

    – It’s likely that the people who studied BB(n) experimentally, already know this — at least, in their surveys of various machines, they could have recorded for each machine the maximum tape usage on both left and right, pretty easily.

    – Some of Scott’s comments, and especially his Conjecture 19 (as numbered in the latest version of bb.pdf), suggest that they did study this, and found that the typical tape usage for a BB machine is roughly proportional to the square root of the elapsed number of steps — much smaller than the max possible tape usage, but much larger than if the machine was using up most configurations possible for low tape usage before proceeding to use more tape.

    – We might also wonder whether a typical BB machine grows its left and right tape usage at an equal pace. This too is probably known experimentally, but I don’t know it.

  349. Bruce Smith Says:

    Scott #346: “… a far more thorough exegesis than they deserve or can withstand. 😀”

    Well, maybe I am treating you a bit like an “Oracle”…

    But seriously, in this particular case, it was well worth it, both to figure out my guess in #344 and to hear your follow-up clarification in #346 (whose main point I didn’t guess, but which seems worthwhile to understand explicitly).

    Which reminds me, if you felt like clarifying/elaborating *this* part:

    “… even though [Lazy Beaver is] no good for circuit lower bounds (being a problem that depends only on n, not on a length-n input).”

    I would probably learn yet another worthwhile idea!

  350. Scott Says:

    Bruce #349: I just meant that, given a function like LB(n) that only depends on n, you can simply hardwire the value of LB(n) into the nth circuit, so there’s no hope of getting a hard problem in the nonuniform (circuit) model.

  351. Bruce Smith Says:

    Scott, your paper mentions a proven lower bound LB(n) ≥ |T(n)|/(c^n) for some c, and also gives Conjecture 25 that it is at least |T(n)|/(n^c) for some c — in my memory I’ve been confusing these sometimes!

    Anyway, I think Conjecture 25 isn’t possible unless you replace T(n) with T'(n), defined as a subset of T(n) with exactly one representative per equivalence class under state permutations (of all states but the starting state). Otherwise that bound seems clearly larger (for high enough n) than |T'(n)| + 1, which is impossible.

    (This assumes I’m not confused when I estimate |T'(n)| as just slightly higher than sqrt(|T(n)|).)

    (I don’t think this affects our discussion about the runtime of LB, at least not much, since even the proven bound is exponential in n, assuming a reasonable range for c.)


    Separately, as I mentioned in a reply to Job, our discussion about LB also leads to a lower bound on its value (assuming as always that “the proof works”) — just as a “subroutine to compute LB(n)” is not possible in a T(n) machine if it emits that value early enough to make possible the contradiction, the same goes for an “introspective encoder which emits LB(n)”, from which one could derive a bound. (I even speculated to Job that maybe you derived Conj. 25 that way.)

    I didn’t try to figure out how that bound might relate quantitatively to this one. If this one came from trying to design a machine to stop at exactly a desired step, then presumably the machinery of that could be the same in the two cases. I think this would leave the same “available runtime” for each of these “subroutines”. But their natural runtimes are different — the introspective encoder of LB(n) is probably fast, but is limited by available number of states, whereas the “use of hypothetical fast algorithm for LB (in a simulator which counts its steps)” is slow, but only needs a constant number of states (except to encode its argument n). This difference also leaves different tradeoffs for the design of the later stage which adjusts the exact runtime to a specified value — so maybe that stage is not the same after all. So maybe these two ways of deriving bounds would give different results.

  352. Bruce Smith Says:

    Wait, I got confused partway through writing that, and ended up comparing the wrong things. There are really three related things one could compare — Conj. 25, assuming you derived it by designing a machine to run for a desired hardcoded time; the bound on LB(n) derivable by introspectively encoding it into a machine which then runs for that desired time (giving a contradiction); the “contradiction-machine” in the proof of LB runtime, which also computes LB(n) but in a different way. I think the first part of my comment compared the first two of those, and the last part compared the last two of those (but I only noticed that right after posting). In fact all three are interesting to compare, but the two bounds calculations are more similar than I said in my second paragraph — probably I now think they ought to give very similar bounds.

  353. Scott Says:

    Bruce #351: Ah, that’s an excellent catch, thank you!! I completely agree, Conjecture 25 (along with the claim about |T(n)|/c^n) can hold only if we talk about equivalence classes — which must’ve been how I slipped into thinking about it without saying so — since otherwise, there simply aren’t enough inequivalent Turing machines to get close to |T(n)| distinct runtimes. I’ll fix this in the paper.

  354. Toby Ord Says:

    All of this talk about the Lazy Beaver has made me notice the connection to the Berry paradox:

    “The smallest positive integer not definable in under sixty letters.”

    If we use the running time of Turing machines as a way of defining/specifying numbers, then LB(n) is the smallest number not definable in that manner in under n+1 states.

    This isn’t in itself a paradox, but is drawing close to one. I haven’t been following your LB discussion closely, but I gather that there have been attempts to make a machine of a fixed number of states that works out LB(n) and then runs for that many steps. If so, then this is a very close analogy to the Berry paradox. You may well have noticed this already, but I think the explicit connection may be useful either for further progress or for explaining your approach to others.

  355. Toby Ord Says:

    The Berry paradox has been used by Boolos and by Chaitin to prove versions of Gödel’s Incompleteness Theorem. So there might be ideas lurking in their proofs that are useful here.

    As Wikipedia summarises:

    George Boolos (1989) built on a formalized version of Berry’s paradox to prove Gödel’s Incompleteness Theorem in a new and much simpler way. The basic idea of his proof is that a proposition that holds of x if and only if x = n for some natural number n can be called a definition for n, and that the set {(n, k): n has a definition that is k symbols long} can be shown to be representable (using Gödel numbers). Then the proposition “m is the first number not definable in less than k symbols” can be formalized and shown to be a definition in the sense just stated.

  356. Beat Hörmann Says:

    Bruce Smith #348:

    Thanks for improving my presentation! You fully got the idea.

    I find Conclusion (b), if it holds, counter-intuitive: to be a BB(n, m < m*) champion, a TM does not necessarily have to pass through all possible configurations.

    I didn’t want to set a specific starting position because, as you mentioned, we would then have to worry about left and right spaces and I don’t think that it matters at which position the TM starts if we make statements about the set T(n, m).

    Note that when computing BB(n, m), TMs that exceed their allowed space must be treated in the same way as TMs that run forever: they must be omitted.

    As you mentioned, what I’m saying about space-limited TMs is probably well-known to the BB community. Since I didn’t find any mention of it in Scott’s survey, I just thought I’d mention it in the comment section. The fundamental questions raised in Scott’s survey arise solely from the fact that the Turing machines involved have access to an infinitely large memory.

    When I first heard about the article by Yedidia and Aaronson [15] in 2017 I began to analyze and simulate the first of the still open five-state TMs:

    A0->C1L, A1->E1L, B0->H1L, B1->D1L, C0->D1R, C1->D0L, D0->A1L, D1->E1R, E0->B0L, E1->C0R

    After step 814.320*10^12 the tape is 44488325 bits long (note the presumed square relationship between space and time!). This TM is heavily left-leaning: in step 3027886 it takes its rightmost position, only 20 squares from the starting position, but it spreads steadily to the left.

    You can replace this TM with a modified TM that jumps over the boring configurations of the original TM. I have simulated this modified TM up to 53*10^9 steps, which is about 10^18 steps of the original TM. The tape was then 121646698642 bits long.

    This simple TM is incredibly creative in creating and moving around a small number of tiny patterns on the tape, thus hiding the simple yes/no answer to the simple question: do you intend to play this game forever or not? (Restrict the tape to a finite size and the monster is instantly tamed.) I think that Scott put it right: “In my view, the resolution of this problem [BB(5)] would be a minor milestone in humanity’s understanding of computation.”
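    For anyone who wants to reproduce the qualitative behavior described above, here is a bare-bones direct simulator for that transition table (a sketch of mine, not the accelerated simulator; it runs the machine step by step, so it is only practical for modest step counts):

```python
# The five-state holdout from the comment above, as (state, symbol) -> (write, move, next).
# 'H' marks the halt transition (state B on symbol 0); L/R are encoded as -1/+1.
TM = {
    ('A', 0): (1, -1, 'C'), ('A', 1): (1, -1, 'E'),
    ('B', 0): (1, -1, 'H'), ('B', 1): (1, -1, 'D'),
    ('C', 0): (1, +1, 'D'), ('C', 1): (0, -1, 'D'),
    ('D', 0): (1, -1, 'A'), ('D', 1): (1, +1, 'E'),
    ('E', 0): (0, -1, 'B'), ('E', 1): (0, +1, 'C'),
}

def run(tm, max_steps):
    """Run on a blank two-way tape; report halting step (or None) and tape extents."""
    tape, state, pos = {}, 'A', 0
    left = right = 0
    for step in range(1, max_steps + 1):
        write, move, state = tm[(state, tape.get(pos, 0))]
        tape[pos] = write
        pos += move
        left, right = min(left, pos), max(right, pos)
        if state == 'H':
            return step, left, right
    return None, left, right
```

    Even a few hundred thousand steps already show the pattern: no halt, a right extent that stays tiny, and a steadily growing left extent.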

  357. Nick Says:

    I spoke too soon about BBB(3). The survey claims that the program

    • 1LB 0RB 1RA 0LC 1RC 1RA

    hits state B at step 55 and then “spends an eternity in state C”. In fact, state B gets hit again at step 12341, after which another long period is spent in state C. If the program never hits B again, then

    $$ BBB(3) = 12341 $$

    My feeling (which is not worth all that much) is that this is not the case, and that state B will get hit over and over at increasingly large step intervals.

  358. Scott Says:

    Nick #357: That’s a striking claim, but how could it be? I just simulated that machine again, and after 55 steps, it’s just moving infinitely to the right, across a field of all 0’s, while remaining in state C. Nothing could stop it besides running out of tape. If you email me, I’ll be happy to send you my code.
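    For reference, a check along these lines is easy to script; the following is a sketch of my own (not the code offered above), using the convention that e.g. 1LB means “write 1, move left, go to state B”:

```python
TM3 = {  # "1LB 0RB 1RA 0LC 1RC 1RA" from comment #357, states A, B, C
    ('A', 0): (1, -1, 'B'), ('A', 1): (0, +1, 'B'),
    ('B', 0): (1, +1, 'A'), ('B', 1): (0, -1, 'C'),
    ('C', 0): (1, +1, 'C'), ('C', 1): (1, +1, 'A'),
}

def last_visit(tm, target, max_steps):
    """Last step (1-indexed) at which the machine is in state `target` after a move."""
    tape, state, pos, last = {}, 'A', 0, None
    for step in range(1, max_steps + 1):
        write, move, state = tm[(state, tape.get(pos, 0))]
        tape[pos] = write
        pos += move
        if state == target:
            last = step
    return last
```

    Over a long run, the last visit to state B should come back at (or within a step of, depending on counting convention) 55, with the machine in state C from then on.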

  359. Scott Says:

    Toby Ord #354: I completely agree with you, Lazy Beaver is another “effectivization” of the Berry Paradox, besides the other effectivizations like Kolmogorov complexity.

  360. Zeb Says:

    Sniffnoy #151, Scott #153: As a possible next step after beeping Turing machines, how about this:

    We use a beeping Turing machine as before, but now instead of looking at the last time the machine beeps, we look at the set of all tape positions where the machine only ever beeps *in that position of the tape* finitely many times, and if there are finitely many such positions, we take the last time it beeps in one of those positions.

    Can this be used to compute BB_2(n)?

    Also: Can we prove that BBB(n) > BB(n) for every large enough n?

  361. Toby Ord Says:

    Here is a pretty half-baked idea, included for completeness.

    Some of the Turing machines with n states compute total functions from natural numbers to natural numbers (on some interpretation of the tape as a number). Some of these functions grow faster than others. We could consider a kind of meta busy beaver function(al) that takes n and returns the fastest growing function computed by an n state TM. Let’s denote it \(MBB_n\).

    This gives rise to various questions, such as: for what n do you get a function that grows faster than exponential? Or faster than the Ackermann function?

    Is this interesting/useful? I’m not sure.

    In some way, if you care about fast growing functions, this could be a bit interesting. And you can also do things like apply the output function to n: \(MBB_n(n)\), which diagonalises out of the rates of growth that are Turing-computable, so is not itself computable. Is this an interesting novel proof of uncomputability? Maybe? It certainly feels extremely derivative of the Busy Beaver Function, yet the proof feels substantially different to the usual proof there. It is related to the proof of the halting function’s uncomputability, but uses a different kind of diagonalisation (it doesn’t change the nth value, it just uses it directly).

    It might also be useful to use this in proofs related to BB(n). e.g. if the convention for output is such that these TMs can be composed with each other, then that might be useful. e.g. you can get inequalities such as: \(BB(m+n) \ge MBB_m(BB(n))\), or \( MBB_{m+n} \ge MBB_m \cdot MBB_n \).

    Note that the possibility of incomparable rates of growth is a sticking point here, as it means that the definition I gave is not actually determinate. There are ways of resolving this (e.g. taking the lexically first among those that aren’t dominated, or making a new function that upper bounds the top tier), but I’m not sure which is best as it depends what purpose this is being used for.

  362. Nick Says:

    Scott #358

    I retract my retraction. My tape allocation scheme had a bug that triggered right around step 12289. Whoops!

    To make up for that embarrassing blunder, here is a four-state program that hits state C at step 2568 and then spins off into state B forever:

    • 1RD 1RA 1LB 1LD 0RB 1RA 0RC 0RD

    If that’s right, then BBB(4) >= 2568. Could somebody verify this?
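    One way to verify is a direct simulation; the sketch below is mine, under the same notational convention as the program listing:

```python
TM4 = {  # "1RD 1RA 1LB 1LD 0RB 1RA 0RC 0RD" from comment #362, states A..D
    ('A', 0): (1, +1, 'D'), ('A', 1): (1, +1, 'A'),
    ('B', 0): (1, -1, 'B'), ('B', 1): (1, -1, 'D'),
    ('C', 0): (0, +1, 'B'), ('C', 1): (1, +1, 'A'),
    ('D', 0): (0, +1, 'C'), ('D', 1): (0, +1, 'D'),
}

def last_visit(tm, target, max_steps):
    """Last step (1-indexed) at which the machine is in state `target` after a move."""
    tape, state, pos, last = {}, 'A', 0, None
    for step in range(1, max_steps + 1):
        write, move, state = tm[(state, tape.get(pos, 0))]
        tape[pos] = write
        pos += move
        if state == target:
            last = step
    return last
```

    If the figure is right, the last visit to state C over a long run lands at step 2568 (up to counting convention), after which the machine stays in state B, since B-on-0 writes a 1 and moves left into fresh 0s forever.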

  363. Adrian Trejo Nuñez Says:

    Nick #362

    I ran your machine for one million steps and I see the same thing that you do.

  364. Scott Says:

    Nick #362 and Adrian #363: Thanks!!! I was going to check the machine but you beat me to it. If it’s ok, I’ll be happy to put this into my survey, with acknowledgment, when I get a chance.

  365. Bruce Smith Says:

    Toby Ord #361:

    That sounds interesting, but can you be more specific on exactly what you have in mind for defining the total function? I am guessing you mean: input number encoded on input tape, runtime used as output of function, and it’s only a total function if the machine always halts.

    But even if I guessed right (of which I am not at all sure), how did you want to encode the number on the input tape? (This might be a “detail” but it seems very important.)

  366. Bruce Smith Says:

    Beat Hörmann #356:

    Those are some very impressive simulation runtimes and tape lengths!

    That is also interesting information about that 5-state BB candidate. I would guess that it’s unlikely that, in general, this sort of asymmetry (in left/right tape usage) is present for infinite TMs and not present for those that eventually halt; therefore, this example makes it likely that not all of the BBs are symmetrical either. But for all I know, none of them are!

    Would it be convenient for you to simulate the BB(5) candidate known to halt, until it does halt, to confirm its runtime and report on its left/right tape usage?

    As for whether it’s counterintuitive that a “champion” only sparsely uses the allowed “state space” (of tape configurations) — in fact it might be what you ought to expect. After all, if it used most possible tape states (as analyzed with some “word length” k, much smaller than total tape usage m, looking at the “probability distribution” of the 2^k possible “words”), then as it evolved it would approximately be experiencing a random locally surrounding tape. This means (intuitively — not proven, to my knowledge) it would approximately take a random walk on its “connectivity graph”. I think this means it would hit its own Halt instruction approximately once every 2n steps! If that’s not exactly true (eg if the chance of hitting each state depends on its number of in-transitions in that graph), it certainly would still hit it far, far too often for it to be a BB.

    To test that idea, you could take any of your simulations and sample some “tape words” (say for k = 10) and test what sort of probability distribution they seem to have (for a single machine). I conjecture that (for a single machine) it is very far from the uniform distribution! It is probably good enough to do it for only the final tape configuration, if you happen to have that still on disk.
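    A quick version of that test, with my own choice of statistic — the empirical entropy of the length-k window distribution, which is k bits for the uniform distribution and much lower for a structured tape:

```python
from collections import Counter
from math import log2

def word_entropy(tape_bits, k=10):
    """Empirical entropy (in bits) of the distribution of length-k windows of a tape,
    given as a string of '0'/'1' characters.  Uniform over {0,1}^k would give k bits."""
    windows = [tape_bits[i:i + k] for i in range(len(tape_bits) - k + 1)]
    counts = Counter(windows)
    total = len(windows)
    return -sum(c / total * log2(c / total) for c in counts.values())
```

    On a perfectly periodic tape like "10" repeated, the k = 10 window entropy is about 1 bit, as far from the uniform 10 bits as the conjecture above predicts real champion tapes to be.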

    (BTW I have not read the literature on this — for all I know, they’ve already done tests like this and reported on them.)

  367. Scott Says:

    Bruce and others: Let’s assume the proof that LB(n) is not computable in poly(n) time works. Then here are some more interesting observations about LB:

    (1) Is there at least a poly(n)-time algorithm that recognizes the value of LB(n) when shown it? I claim that the answer is again an unconditional “no.” For suppose there existed such an algorithm, call it A. Also, let LB(n) take m bits to write down. Then suppose we hardwired the left m/2 bits of LB(n), and then cycled through all possible values for the right m/2 bits—checking each result using A, and stopping only when we’d correctly reconstructed LB(n). This yields a program that takes only m/2+o(m) bits to specify, and that (when run on a blank input) computes LB(n) in only 2^{m/2} n^{O(1)} steps. But, if the stuff we’ve been talking about works at all, then from there we can produce an n-state Turing machine that halts after exactly LB(n) steps, contradiction.
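    Spelled out, the reconstruction loop in (1) is just the following — a sketch; the recognizer A is hypothetical, so the toy recognizer in the usage below is a stand-in that recognizes one fixed value:

```python
def reconstruct(recognizer, high_bits, m):
    """Recover an m-bit value from its top m//2 bits plus a yes/no recognizer,
    by cycling through all 2^(m - m//2) completions of the low bits.
    This is the compression step of the argument: the description is the
    high bits plus the (fixed-size) recognizer code."""
    low_len = m - m // 2
    prefix = high_bits << low_len
    for low in range(1 << low_len):   # at most ~2^(m/2) calls to the recognizer
        if recognizer(prefix | low):
            return prefix | low
    return None
```

    The point is the counting: the loop makes at most about 2^{m/2} recognizer calls, each cheap by assumption, which is where the 2^{m/2} n^{O(1)} running time comes from.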

    (2) More generally, is there any NP witness for the value of LB(n)? (Of n^{O(1)} size, and taking n^{O(1)} steps to verify?) I don’t know, but I claim that if there is, then P≠NP! For if LB∈NP and P=NP, then LB∈P, which contradicts what I’m assuming to have been proven. (Indeed, since LB is a unary problem, one can even deduce the stronger consequence EXP≠NEXP.) I don’t expect this to be a promising approach to prove P≠NP, but it’s one that I hadn’t seen and that isn’t obviously dead on arrival (e.g., it wouldn’t require NP=EXP or anything like that to work)…

  368. Bruce Smith Says:

    Scott #346: “… the Lazy Beaver function seems to have just the right properties to reproduce the best that we know how to prove from the Time Hierarchy Theorem …”

    I was wondering just how close it really gets to that, if optimized for that. In other words, what if we start with the putative proof of LB’s high runtime, and vary the details until it’s as much as possible like the Time Hierarchy Theorem, and with its provable bound also as close as possible.

    (As a reference for THT, I used these course notes by Luca Trevisan, found as a reference in the Wikipedia article. The notes present two proofs, and (I think) state more conditions on the theorem than Wikipedia does.)

    Summary: you can get to a bound which is almost, but not quite, as good, while still staying in the “function problem” rather than “predicate problem” world. (Is there a standard Time Hierarchy Theorem for function problems?) (Caveat: I did not work out every last detail, so I could have missed something important, especially for extreme values of the parameters.)

    But in order to get that bound, several changes to the LB runtime proof were needed, listed below.

    Parameters used below: g(n) is the runtime limit; s(n) is the limit on number of TM states (for LB proof, s(n) = n); k(n) is the number of bits of TM output used (as described below, and also a couple nights ago). I assume k(n) is just barely big enough to make 2^k(n) > |T'(s(n))|, which is needed to make the desired output of the h function exist.

    These changes from the LB proof were needed (plus corresponding changes in the function h whose runtime is being lower bounded):

    – don’t use runtimes to communicate values — only use the max runtime g(n) as a resource limit. This means the “output of a TM” is what it leaves on its tape after halting (actually only the first k(n) bits of that) rather than its runtime. You still have to simulate it and count its runtime (to enforce the limit), so this has no effect on simulation overhead.

    – reduce the limited number of states of the TMs from n to some very slow-growing function, s(n). (The slower the better, since the number of iterations over TMs is |T'(s(n))|, and this is another multiplier which worsens the bound. The limit on how slow-growing s(n) can be is, I think, only that the runtime for computing s(n) from n should be small enough compared to the other time taken.)

    – as mentioned, this reduces the iteration over TMs to |T'(s(n))| rather than |T'(n)|. In the THT itself, there is no iteration like that in the proof — the function (really, language) it defines gets a single TM description as part of its input.

    – change that iteration, so that instead of running those TMs on an empty tape, we run them on a tape containing n in binary. This is needed in order to permit s(n) < log n and get any benefit from that, since if the contra-TM has to hardcode n internally and put it on its own tape, the state count to do that (say, log n, though I know it can be a bit lower) becomes a big limit in how good the bound can be.

    After all that, the resulting h function (analogous to LB) has this definition:

    h(n) is the first word in {0,1}^k(n) which can’t be the output word (length-k tape prefix after halting) of an s(n)-state TM started on a tape containing n in binary and halting within time g(n).
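    As a sanity check of this definition, here is a toy implementation at very small parameters (my own encoding conventions; s = 1 and k = 7, so that 2^k exceeds the 64 possible one-state tables and the word h(n) is guaranteed to exist):

```python
from itertools import product

def h(n, s=1, g=100, k=7):
    """Toy version of the h(n) defined above, for one-state machines.
    Enumerates every (write, move, next) table, runs each on a tape holding
    n in binary (head at the leftmost bit), and returns the first k-bit word
    that is not the tape prefix of any machine halting within g steps."""
    entry_choices = list(product([0, 1], [-1, +1], list(range(s)) + ['H']))
    outputs = set()
    for table in product(entry_choices, repeat=2 * s):
        tm = {(st, sym): table[2 * st + sym] for st in range(s) for sym in (0, 1)}
        tape = {i: int(b) for i, b in enumerate(format(n, 'b'))}
        state, pos = 0, 0
        for _ in range(g):
            write, move, state = tm[(state, tape.get(pos, 0))]
            tape[pos] = write
            pos += move
            if state == 'H':   # halted: record the length-k tape prefix
                outputs.add(''.join(str(tape.get(i, 0)) for i in range(k)))
                break
    for w in (format(i, '0%db' % k) for i in range(2 ** k)):
        if w not in outputs:
            return w
```

    This is only meant to make the quantifiers concrete; the interesting regime is of course when the naive iteration over |T'(s(n))| machines is the dominant cost.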

    And the resulting bound comes from this:

    if you could compute h(n) uniformly in time (exactly) g(n) or lower, you’d get a contradiction;

    but the upper bound (ie naive runtime) for computing h (ignoring a few details, like recording the results after each iteration, which I think are small compared to the rest, provided g(n) itself is not extremely small) is:

    O( g(n) log(g(n)) |T'(s(n))| )

    where |T'(s(n))| is the number of TMs you iterate over, and log(g(n)) is the simulation overhead (assuming you can use the same simulation methods as for THT, which I think is true). (This sim overhead depends on the computation model and would be different for models other than TMs, I think, for both this theorem and the THT.)


    I also think you can do an analogous thing for a space limit (ie imitate the Space Hierarchy Theorem), but I didn’t work out the details enough to compare the bounds in that case.

    I have a feeling all this is already known, either as a “function problem THT/SHT” (though I would not be surprised if they use a different proof and get a better bound), and/or as part of “resource-bounded Kolmogorov complexity” (in which they seem to prefer single “complexity measures” that combine number of machine states (really, description bits) used, and amount of time or space used, in various ways depending on the measure).

  369. a Says:

    … appears to be wrong. I wonder: if in fact NP=RP holds, and in fact P=NP is the truth, then do we still need pseudorandom generators to show P=BPP, thus showing P=NP (assuming that there is no direct proof of P=NP but only by showing P=BPP), or don’t we need them?

  370. Job Says:

      (2) More generally, is there any NP witness for the value of LB(n)? (Of n^{O(1)} size, and taking n^{O(1)} steps to verify?) I don’t know, but I claim that if there is, then P≠NP!

    Wouldn’t an NP witness for LB(n) be in direct conflict with the proof that LB∉P, as sketched out?

    At that point the proof would produce a contradiction for a true statement, such as your (1) above, since you’d be able to produce an n-state machine that leaves LB(n) on the tape.

    It’s like, we can either show that LB∉P, or we can show that LB∈NP, but not both?

    E.g. if LB∈NP then I would expect something about Bruce’s approach to be impossible, such as a machine being able to consistently pad its own runtime with the necessary precision for LB(n).

    I actually wonder if there is an argument to be made that, for a given L, a proof that L∉P is always inconsistent with a proof that L∈NP. But only because this seems to be a pattern with P≠NP.

    You can either know a language’s non-p-ness or its np-ness, but not both. As soon as you prove one, the other becomes undefined. I believe this is what we know as the uncertain-p principle.

  371. Bruce Smith Says:

    Scott #367: both parts are very interesting!

    A detail in (2): since you are freely mixing function problems with predicate-problem complexity classes, let me see if I guess correctly how to formalize that — actually I think I know of two ways: (a) LB’ as a predicate, to be proven not in P and wondered about re being in NP, takes n in unary, i in binary (or unary), and one bit b, and accepts iff the ith bit of LB(n) is b, relying on the user to know the easily computed max length of LB(n) in binary. Or (b) LB’ takes n in unary, m in binary, and one bit b, and accepts iff the truth value of (LB(n) < m) is given by b.

    Other comments:

    in (1), the first half of LB(n) combined with the code for A (and for using A like you describe) is just a special case of a “small enough compressed form of LB(n) which decompresses fast enough” to get the contradiction (assuming your calcs are correct — I think so but I didn’t check carefully).

    Open-ended question: Are there other special cases of that general class of compressed forms, which can be used to prove other things impossible?

    in (2), I assume your *guess* is simply that LB is not in NP and P is not NP. And certainly it doesn’t seem likely that there would be a poly-size/time witness for an exponential number of turing machines not having a particular exact runtime.

    Does it make sense to ask whether there is an oracle under which LB is in / not in NP?

    I don’t know if this is very related, but since we agree we can’t think of any way LB might be complete for any time-complexity class it’s in, I should mention that, if I understand (i.e. guess) correctly some parts of the introduction of , someone has proved completeness of languages which recognize various kinds of “high complexity strings” for various high-time-complexity classes. I have no idea what kind of reductions might accomplish that, so I can’t guess whether maybe there is some non-obvious way LB as a language (ie the set of its values? or maybe some related but denser language) might be complete for something.

  372. Scott Says:

    Job #370:

      Wouldn’t an NP witness for LB(n) be in direct conflict with the proof that LB∉P, as sketched out?

      At that point the proof would produce a contradiction for a true statement, such as your (1) above, since you’d be able to produce an n-state machine that leaves LB(n) in the tape.

    No, you only get a contradiction if the machine halts after exactly LB(n) steps. And you won’t be able to arrange that if the machine is too slow to search through potential witnesses. That’s why I currently don’t see anything simpler than the arguments that I gave.

  373. Bruce Smith Says:

    Job #370: since the NP witness can be very long (some poly(n)), it might be way too long to fit inside an n-state machine, so its (potential) existence doesn’t seem to affect anything you might try to do inside an n-state machine to reach a contradiction.

  374. Scott Says:

    Bruce Smith #371: For formalizing the question, why not just consider the language

    { (0^n, k): LB(n)=k } ?

    There are oracles relative to which NP=EXP. Relative to those oracles, LB would necessarily be in NP, no?

    Yeah, I also idly wondered about the “power from random strings” results, and whether they might give us any leverage in showing that an oracle for LB lets you do other interesting stuff. Maybe they would if we had an oracle that, for any k, told us whether there’s an n-state TM that runs for exactly k steps. But an oracle for LB seems substantially weaker than that.

  375. Toby Ord Says:

    Bruce #365,

    I mean encode an input number on the tape and output based on an encoded number on the tape (i.e. how everyone other than those of us on this thread uses Turing machines). The standard I learnt at university was to start the TM on a blank square (0), with the input number in unary in 1s to the right, followed by an infinite blank tape. It should always halt, and do so in the same kind of configuration, looking at a blank square with the output number encoded in unary to the right, followed by blank tape. This way, there is an obvious way to compose Turing machines. I’m not sure what was supposed to be to the left – the convention I learned may have been for a one-way tape, hence the starting blank square to stop you trying to run off the end of the tape.

  376. Beat Hörmann Says:

    Bruce Smith #366:

    I just simulated the Marxen-Buntrock BB(5) candidate and can confirm the 47,176,870 steps and the 4,097 ones on the tape (my simulation neither overwrites the tape at the current position nor moves the tape if in the Halting state). The machine uses 12,289 squares of space, extending 12,243 squares to the left and 45 squares to the right; hence, heavily left-leaning.

    Here are the figures for the champions of BB(2), BB(3), BB(4) and again for the Marxen-Buntrock BB(5) candidate, respectively:

    BB(2):            steps: 6         ones: 4     space: 4      left: 2      right: 1
    BB(3):            steps: 21        ones: 5     space: 5      left: 1      right: 3
    BB(4):            steps: 107       ones: 12    space: 14     left: 10     right: 3
    BB(5) candidate:  steps: 47176870  ones: 4097  space: 12289  left: 12243  right: 45

  377. Bruce Smith Says:

    Scott #374:

    The problem with the language { (0^n, k): LB(n)=k } is that, if you had an oracle for it, you could not use it to access the function version of LB unless you could guess its value!

    So it might be ok as an oracle for use within a nondeterministic machine, but not for efficient use within a deterministic machine.

    (OTOH I guess it’s fine as a formalization of that “LB recognizer” you also mentioned. Actually, it’s not — see next issue.)

    (In fairness, perhaps in your original context, part (2) of your comment #367, it was good enough — I’d just like a standard version that works in all contexts.)

    To work in both contexts (P or NP machine), I think you need something like the ones I said; let me rewrite them (or something similar enough to them) as languages in the same notation:

    { (0^n, m, b): (LB(n) < m) == b}

    { (0^n, i, b): (ith bit of LB(n) is b) }

    The reason I think you need b — rather than just the following:

    { (0^n, m): (LB(n) < m)}

    { (0^n, i): (ith bit of LB(n) is 1) }

    is that, though from a deterministic machine those would be just as good, from a nondet. machine you would fail instead of reading a False comparison or 0 bit as desired (if I understand how all that stuff conventionally works, which of course I might not).

    (Come to think of it, if oracles just give binary data, this is never an issue. I seem to be assuming an “oracle for a language” gives “accept or reject signals” like a nondet. computation path does. Maybe I just made that up somehow without realizing it?? I guess yes — sorry! I leave it here so you can confirm.)

    (BTW, this “b trick” is something I just learned a few days ago, from that same paper about various complexity measures. They used it in the context of the usage protocols for certain Turing machines, not in language definitions.)
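    To illustrate why the b-less comparison version is “just as good” from a deterministic machine: it supports binary search. Here's a toy Python sketch, where a hidden integer stands in for LB(n) (purely an illustration — LB itself is of course uncomputable, so no actual program plays the oracle's role):

```python
# Toy illustration: a deterministic machine can recover a hidden value from a
# comparison oracle for { (0^n, m) : hidden < m } by binary search, which is
# why the b-less language suffices in the deterministic setting.
def make_oracle(hidden):
    # Membership test for the language, with n fixed; "hidden" stands in for LB(n).
    return lambda m: hidden < m

def recover(oracle, hi):
    lo = 0                        # invariant: lo <= hidden <= hi
    while lo < hi:
        mid = (lo + hi) // 2
        if oracle(mid + 1):       # hidden < mid + 1, i.e. hidden <= mid
            hi = mid
        else:
            lo = mid + 1
    return lo

print(recover(make_oracle(47176870), 10**9))  # → 47176870, in ~30 queries
```

    A nondeterministic machine can't run this loop on the b-less language in the same way, since a negative comparison shows up as a rejecting path rather than as usable data — hence the b.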

    “… Maybe [those results] would [give us leverage] if we had an oracle that, for any k, told us whether there’s an n-state TM that runs for exactly k steps. But an oracle for LB seems substantially weaker than that.”

    I agree.

    Do you have a good survey-like reference for “completeness (for some class) of a language consisting of high-complexity-in-some-sense strings”?

  378. Bruce Smith Says:

    Toby #375: thanks, I get it now.

    To make them compose properly with BB and get those interesting inequalities, you would have to change the BB output convention, or (when mixing MBB and BB) wrap the BB with a simulator which counted the steps and put that onto the tape (thus incurring the same overhead in state count as in Theorem 16).

    But it’s interesting that this MBB (if we ignore the well-definedness issues) probably does have composition laws similar to those you mentioned. (Even though I bet they’re very weak compared to reality, just like our lower bounds for BB.)

  379. Bruce Smith Says:

    Scott #374 (again):

    You mentioned “an oracle that, for any k, told us whether there’s an n-state TM that runs for exactly k steps” — i.e., the language

    L_runtimes = { (0^n, k) | k is in R(n) }

    (where R is the spectrum of runtimes of n-state TMs as in your Section 5.9). (Feel free to suggest a better language-name.)

    That language seems interesting in a few ways.

    For example, I think you can prove it’s not in P using the same argument as for your “LB recognizer” in your comment #367, part (1). And I think the same comments in part (2) then follow (making it unlikely to have NP-witnesses for either false or true). Of course a specific TM is a short witness for its runtime being in R(n), but it takes too long to verify. I think this also shows “if P != NP, there are TMs that halt at time k, but for which there is no poly(n)-sized proof of their halting at time k.” (I imagine that’s already well-known, and maybe has simpler proofs. BTW I’m not trying to state it entirely precisely here.)

  380. Bruce Smith Says:

    Beat Hörmann #376: Thanks!

    That confirms the runtimes from the table in the paper.

    (The “ones” figures sometimes differ by 1, which at first I thought (based on your comment about HALT) meant some simulator used to make the table in the paper improperly wrote a bit to the tape after halting — but then I realized that “Rado’s ones function” in the paper is not defined over runtime champions (in fact, that would not be well-defined unless we picked the one that ended with the most 1s), but over all n-state TMs that halt — so the winner might not be a runtime champion. Even so, the near-coincidence in ones values makes me wonder whether it *happens* to be the same TM in those cases, but there *also happens* to be a “HALT bit-writing bug” in those simulators. If that is the case, my own tiny Python simulator agrees with yours that BB(3) does end with 5 ones on its tape.)

    It’s interesting that left/right asymmetry looks common. You can already see it for BB(4), as well as for both BB(5) candidates we discussed so far.

    After the fact, I can make up an intuitive theoretical reason for that (a pretty vague one): the machine needs code which is able to safely grow the written part of the tape at one end without going off to infinity; but it doesn’t need such code for both ends — it can afford to be very conservative about growing the other end. So if such code uses up states or “half-states” (plausible in hindsight), and has to exist independently for the two tape ends if they both want to use it (plausible), that might explain this pattern!

  381. Bruce Smith Says:

    Well, here is a simpler proof of “for any poly, there exist n-state TMs without poly(n)-sized proofs of their halting at time k” — if not, you could solve the halting problem by looking for those proofs! Also, log(k) would be bounded by poly(n)!

    Now if I had talked about poly(log k)-sized proofs, maybe that’s not so simple to rule out. At least it would not have the above problems. And it is still covered by those “NP-witness speculations”, since poly(log k) is poly in the input size of the language L_runtimes (which is n + log k).

    So I amend my prior comment about that issue to: “if P != NP, for any poly, there are TMs that halt at time k, but for which there is no poly(log k)-sized proof of their halting at time k.” (But I still wonder if this is known unconditionally in some other way.)

  382. Bruce Smith Says:

    Arrgh, I have to correct *another* mistake: my prior comment should not have assumed P!=NP, but P=NP!

    This is counterintuitive to me. It makes me want an unconditional proof even more.

    Here’s an apparently different proof of the same theorem:

    Theorem: if P = NP, then for any poly, there are TMs that halt at (exactly) time k, but for which there is no poly(log k)-sized proof of their halting at time k.

    Proof: assume otherwise. Then we can construct a counterexample-TM which finds its own proof of halting at time k, and purposefully halts sooner. To make sure it comes under the conditions of the hypothesis, it also has to halt if it fails to find that proof (even though, in the end, we know it will); and it has to run for a long fixed time before doing the search, to ensure k is large enough to exceed q(poly(log k)), where q is the polynomial runtime of the P algorithm for SAT. (The existence of that algorithm, due to P=NP, is what ensures this proof search can end before time k has elapsed.) QED.

    (Caveat: I did not actually write down and confirm the math. I just “think it works”.)

  383. Bruce Smith Says:

    (q should also include the overhead of constructing a circuit saying that a proof of length n is correct and has the right form — that is, the function taking proof length to circuit size.)

  384. Toby Ord Says:

    Scott #277,

    Glad to be of help! Yes, your concrete version is what I’m thinking of, though there is an additional challenge re the limit ordinals, which is that there are multiple sequences of α(n) that converge to any limit ordinal β, which would give multiple definitions for BBβ. Any would suffice, but we need to pick a particular one in order to define it properly. I can see methods of doing this for various ordinals I can consider, but I don’t know if there is a standard way of getting a canonical sequence of ordinals leading up to an arbitrary ordinal (less than the Church-Kleene ordinal).

    It does feel to me like there may be a method here of getting beyond the Church-Kleene ordinal — perhaps by bypassing some assumptions of that theorem of Solovay, or (sadly) by giving up the ‘any upper bound suffices’ property. That property has served us well and feels inextricably linked to busy beaver growth, but if the ultimate quest was fast growing functions (or compact specifications of large numbers) then perhaps we could proceed even without it.

  385. STEM Caveman Says:

    @Scott #176

    > I have no problem at all with Dependent Choice … I’m totally fine with the notion of flipping a coin a countable number of times.

    Even a single random bit (as an actual object chosen according to a probability measure, rather than the measure itself) is a notion that does not stand up to examination. As I mentioned, these ideas are not all that coherent and the misconception that they are essential, that they have to exist as full fledged Stuff in your theory rather than being syntactic sugar to help the calculations go down easier, is the result of verbal habit getting internalized. There’s a reason why probability was treated with suspicion before Kolmogorov and also a reason why probabilists today tend to think in terms roughly akin to “algebras of potential random observables” rather than actual randomness (whatever that could be).

    > By contrast, I could not do math and TCS if I had to give up the idea that a Turing machine either halts or runs forever,

    Not only could you do it, it would be easy and beneficial. Just do whatever you’re currently doing, and get in the habit of describing it more precisely. You will find that certificates (e.g. proofs, finite computations) that a Turing machine halts or runs forever are perfectly useful concepts but if you make arguments that genuinely, unavoidably depend on a priori existence of arbitrary TM halting status, independent of any ability to find a certificate, then there is an excellent chance that what you are doing will not be able to have tangible consequences for earthly computation.
    If you find this crazy, try to give the simplest example where you think the (very un-quantum!) idea of a pre-existent reality of halting status for all TMs does something useful and can’t be better phrased without that assumption. It is not so easy, and people have in one setting or another looked hard for examples.

    > It’s the denial of that idea that strikes me as crazy and barely even comprehensible.

    Not assuming God or the Devil exist strikes people as crazy if they have formed over several decades the habit of seriously referring to those as real entities. When the Reverend Jeremiah Wright says “God damn America”, he probably has in mind a super-being with very particular properties that really exists, performing specific acts of damnation, and he might be very insistent that this is necessary for his words to make sense. But those who don’t share his ontological assumptions can make perfect sense, arguably better sense, of what he is saying with at least one fewer existence assumption. His God is a syntactic construct, a bunch of search-and-replace macros that replace “God damn X” with “I say X should be damned” and similar macros to secularize “damn”, Hell, Heaven, etc. His less religious listeners like Obama and Oprah made those conversions automatically without difficulty. Same with noncomputable objects in TCS.

    > I’m inclined to turn the tables and ask: what’s a specific example of a Turing machine whose halting you consider to be indeterminate? (As opposed to not yet known, which surely you agree is different?) Or: you agree, don’t you, that BB(1)=1, BB(2)=6, BB(3)=21, BB(4)=107, BB(5)≥47,176,870, BB(6)>10^36,534, BB(7)>10^10^10^10^18705353, that these are all facts as surely as 4+3=7 is a fact? Great then, what’s the first value of n for which you think that BB(n) is indeterminate?

    I don’t agree with any of that. Not assuming God exists as part of your mathematics doesn’t mean you have to demonstrate His non-existence, or the existence of the Devil, or even to personally disbelieve in God. It just means you generally phrase your theorems without reference to a deity.

    What we have concerning the Busy Beaver problem are (perhaps implicit) proofs of upper and lower bounds on runtime of particular small machines. If the proofs are done in consistent proof systems, and the systems are consistent with each other (e.g., can all be unified within ZF), then these bounds will be consistent with each other. That consistency would also follow from there being an underlying value “BB(n)”, but that doesn’t mean there has to be such a thing or that we could ever show it by getting the lower and upper bounds to agree. BB(n) is defined as Max of some complicated function on a complicated set, and the only properties ever needed of Max of anything are lower and upper bounds on it, not that it itself be a number. This may sound strange to you, but writing in this way would have shortened and clarified the exposition in your article, and treating Max as not a number but a target for inequalities is akin to setting up a formalism for physics that automatically forbids certain unphysical questions from being asked.

    If you nevertheless insist on writing “BB(n)”, we can say “BB(k) is at least L” if there is a k-state machine that we can prove runs at least L steps, and “BB(k) is at most M” if we can prove all k-state machines that halt do so in M steps or less, without assuming BB(k) is an integer let alone a pre-existent one. This sticks to precisely the sort of information we actually have, and doesn’t contradict BB being a definite integer, but doesn’t assume it either. I see no reason why every 6 or 7-state TM that halts should have a proof of less than 10^1000 pages in ZF that it halts, and certainly no reason to have a description that compact of its exact runtime (i.e., BB(6) and BB(7), for the right choice of machine). Nor any reason to believe ZF can be canonically extended to efficiently cover these issues even for small n. In what sense then do you expect BB(k) to be usefully treated as an integer-valued function, rather than a metaphor for the data we actually have (bounds on runtime), or a somewhat obfuscatory abbreviation for more specific statements about machines?

  386. Scott Says:

    STEM Caveman #385: I’ve considered what you’ve written, and sorry, but my choice is not to talk in the way you’re advocating. I see nothing that I’d gain in exchange for all the new mental acrobatics required. Elsewhere in this thread, we’ve been making actual mathematical progress on questions that I find fascinating—for example, about the LB and BBB functions. Can you imagine the verbal contortions needed if we weren’t even allowed to refer to those functions as functions at all? (Or will you graciously allow the LB function, since it has a computable upper bound?)

    We might as well forget about Dependent Choice, etc., and just focus on whether a given Turing machine halts or runs forever, since you deny a definite fact even there. Do you at least agree that, if a Turing machine does halt, then it’s a fact that it halts? If not, at what number of steps does it stop being a fact? If so, then doesn’t this really come down to the Law of the Excluded Middle? Maybe the issue is that I’ve never understood the advantage to be gained by denying that law, outside perhaps of specialized applications in program verification and the like.

  387. Scott Says:

    Bruce Smith #382: If NP=EXP, then for every n-state TM that halts in k steps, there’s a poly(n,log(k))-sized proof of its halting that can be checked in poly(n,log(k)) time. Indeed, what I just wrote is equivalent to NP=EXP. So it seems to me that the argument you gave could be replaced by: if P=NP, then by the Time Hierarchy Theorem, NP≠EXP.

  388. Nick Says:

    Scott #386

    > I’ve never understood the advantage to be gained by denying [the Law of the Excluded Middle], outside perhaps of specialized applications in program verification and the like.

    I’d say program verification is a critical application for the purposes of this discussion — after all, that’s exactly what we’re doing when we talk about machines halting or not. As to your particular question:

    > Do you at least agree that, if a Turing machine does halt, then it’s a fact that it halts? If not, at what number of steps does it stop being a fact?

    Let’s take “is a fact” to mean “is provable”. Then, yes, for every halting machine, there is a proof that it halts.

    But the situation is different for nonhaltingness: it is NOT the case that for every non-halting machine there is a proof that it does not halt. There are proofs for some of them (like the BBB(3) and BBB(4) candidates), but for others there are not. As for when that happens, that’s an open question. Preliminary research suggests that haltingness can’t be decided for machines of 748 states, but that upper bound is unlikely to be tight.

    Constructivists don’t deny the law of the excluded middle in general. Instead, they require it to be proved on a case-by-case basis. For example, it can be proved constructively that every integer is either even or not. That is, if you give me an integer, I guarantee that I can give back either a proof that it is even or a proof that it is not even. I can’t make the same guarantee about haltingness, because you might come up with a nonhalting machine for which there is no proof of nonhaltingness.

    I don’t have a problem with nonconstructive arguments, but they should be sharply demarcated from constructive ones because they require stronger assumptions. An argument like the following should be flagged for further consideration: “Let M be a machine such that blah blah blah. Now, either M halts or it doesn’t. If it halts, blah blah blah, and if it doesn’t, then blah blah blah…”

  389. Scott Says:

    Nick #388: Sorry, but I expect that having to “flag for further consideration” every time I branch on whether a Turing machine halts or doesn’t halt, or whether an integer exists or doesn’t exist, etc., would increase my cognitive load enough to seriously impair my ability to do math. And to what end? Also, there are lots of other things I could flag—e.g., every time I find a proof that uses induction, or real numbers, I could worry about whether there’s a different proof that doesn’t use them. Until someone gives me a reason, why should I worry about this thing in preference to all the others?

  390. Bruce Smith Says:

    Scott #387:

    Thanks very much, that greatly clarifies the situation for me.

    The only part I can’t yet confirm for myself is the equivalence of those statements — I only see how to go from NP=EXP to the statement about short proofs, not the other way. But I will think on this for a day or so (off and on) and see if I can figure out the other direction.

  391. Bruce Smith Says:

    I think I see it now:

    an algorithm A in EXP, applied to input x, can be turned into a TM which hardcodes x, feeds it to A, and then halts after an even or odd number of steps (exponential in |x|) to signal that it accepts or rejects x.

    If we assume those short proofs of halting at specific times exist, then there is one to use as an NP- or co-NP- witness of this halting. So we conclude A is in both NP and co-NP, thus EXP = NP = co-NP.

  392. Toby Ord Says:

    Bruce #378: Oh, you are right about the composition issue. What I said is closer to the Ones function than the Busy Beaver function (and requires the ones to be in a single contiguous block, as well as returning to the right start position). That said, I think the standard proof that there is an uncomputable function via the Busy Beaver function is with a version like this that allows composition. It is a pity this doesn’t work as I wrote it for the actual Busy Beaver though.

  393. Toby Ord Says:

    Scott #277,

    I just noticed a problem with my use of \(BB_n(n)\) as the oracle for level ω etc. The problem is that some work needs to be done to prove that this exceeds every \(BB_k(n)\) for a fixed k. This is because it is possible to have a sequence of faster and faster growing functions where this diagonal function doesn’t grow faster than all (or any!) of them — e.g., if the kth function is zero all the way up to and including k, and then grows with gradient k, the diagonal function would just be the constant zero function. And even the diagonal +1 (the usual trick) would just be the constant 1 function. It is assured to not equal any of those in the sequence, but not assured to grow faster. I assume that this problem doesn’t actually occur in the case of reaching level ω in the Busy Beaver function, but am not completely sure, and am more unsure about the higher limit ordinals. It may not be too hard to prove it works, but a proof is needed nonetheless.

  394. Nick Says:

    Scott #389

    In general, proofs that require a certain proof technique to work should be separated from those that don’t require that technique. That goes for everything: the axiom of choice, law of the excluded middle, and yes, even induction. That’s not to say that someone doing number theory has to make a big announcement every time they use induction. But say a theorem is proved using induction, and then another proof is found of that theorem that does not use induction. Is that an improvement? I would say so, since it means the theorem can be proved by weaker theories that don’t have induction.

    Here is an old statement of yours [1], which I think you would still stand behind (please correct me if that’s not the case):

    I submit that the key distinction is between

    1. questions that are ultimately about Turing machines and finite sets of integers (even if they’re not phrased that way), and

    2. questions that aren’t.

    I would modify that to say “Turing machines for which haltingness is decidable”. That includes all halting machines, as well as some of the nonhalting ones. It probably includes, for instance, all 4-state machines. But once you get outside of that and start talking about arbitrary machines, you may be in territory that is more like the second set of questions than the first.


  395. Nick Says:

    Speaking of 4-state machines, here is a new candidate for BBB(4):

    • 1LB 1LC 1RC 1LD 1LA 1RD 0LD 0RB

    This one hits state B at step 2819 before spinning out into state D. If this is right, then BBB(4) >= 2819.

    The 2568-step candidate left its tape totally blank before spinning out, giving it a disappointing sigma count of 0. This one has a sigma count of 69 (that is, it leaves 69 ones on the tape, in one consecutive block in fact). For comparison, the sigma count of the 4-state busy beaver is 13.
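    For anyone who wants to check claims like this one, here is a rough Python sketch of how such a candidate could be run, under my assumptions about the notation (eight actions in the order A0 A1 B0 B1 C0 C1 D0 D1, with e.g. “1LB” meaning write 1, move left, go to state B):

```python
# Parse and run a 4-state 2-color program in the compact notation used above.
# Assumed token order: A0 A1 B0 B1 C0 C1 D0 D1; "1LB" = write 1, move L, go to B.
def parse(prog):
    table = {}
    for i, act in enumerate(prog.split()):
        table[("ABCD"[i // 2], i % 2)] = (int(act[0]), act[1], act[2])
    return table

def run(table, steps):
    tape, pos, state = {}, 0, "A"
    last_b = None                      # last step at which the machine enters B
    for t in range(1, steps + 1):
        write, move, state = table[(state, tape.get(pos, 0))]
        tape[pos] = write
        pos += 1 if move == "R" else -1
        if state == "B":
            last_b = t
    return last_b, sum(tape.values())

last_b, ones = run(parse("1LB 1LC 1RC 1LD 1LA 1RD 0LD 0RB"), 5000)
# If the claim above (and my reading of the format) is right, last_b should
# come out around 2819 and ones should be 69; I haven't verified the numbers.
print(last_b, ones)
```

    Note that the program has all eight actions defined and no halt action, so it never halts — which is exactly why the BBB game needs the “spinning out” criterion rather than a halt.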

  396. Beat Hörmann Says:

    Bruce Smith #380:

    Thanks for clarifying on sigma vs. “ones”!

    According to “”, cited as [10] in Scott’s survey article, there are two sigma(5) candidates with 4098 ones, among them the Marxen-Buntrock BB(5) candidate. Both machines are described as executing 1L in the halting state and overwriting a 0, so that the best known result for sigma(5) is 4097 if the machine does not write a 1 in the halting state.

    Funny that there exist not one but two candidates for sigma(5).

    So the Marxen-Buntrock BB(5)/sigma(5) candidate uses 12,289 squares of space but produces only 4097 ones. If that candidate is the true champion, then it behaves differently than the champions for BB(1), BB(2), BB(3), and BB(4), which produce almost as many ones as they use space.

    I also simulated the other sigma(5) candidate, which I can confirm makes 11,798,826 steps. It is more memory-efficient than the BB(5)/sigma(5) candidate: to produce 4097 ones it needs only 6145 squares of space.

  397. Scott Says:

    Nick #394:

      I would modify that to say “Turing machines for which haltingness is decidable”.

    Then this is really the crux of the disagreement. Firstly, what do you mean by M’s haltingness being “decidable”: do you mean there has to be a TM to decide whether M(x) halts for arbitrary x? If so, why is that relevant at all to the definiteness of whether M halts on a specific x? Or do you mean that M’s haltingness has to be provable? If so, provable in which theory: PA? ZF? Large-cardinal theories?

    Rather than trying to adjudicate any of that, I take a far simpler approach. I hold these truths to be self-evident, that every Turing machine either halts or runs forever, and that which it does is a fact independent of all theories, interpretations, and models. And that to decide these facts, theories are instituted among men, but when a theory fails to decide the facts to our satisfaction, we have the right, we have the duty, to replace it by a better theory.

    This, incidentally, was also Gödel’s position (Gödel went even further, hoping for “self-evident” axioms that would settle the Axiom of Choice and the Continuum Hypothesis, something on which I’m completely agnostic). When restricted to arithmetic, I have yet to see a single problem that this position leads to.

  398. gentzen Says:

    This, incidentally, was also Gödel’s position (Gödel went even further, hoping for “self-evident” axioms that would settle the Axiom of Choice and the Continuum Hypothesis, something on which I’m completely agnostic).

    The Axiom of Choice is well researched and understood extremely well. The Continuum Hypothesis is also well researched, but not understood so well. The currently accepted foundations of mathematics are ZFC, and it would be a mistake to assume that ZF is an improvement over ZFC. It is well known when and why Choice can make trouble, and some good ways to cope with it are known too.

    When restricted to arithmetic, I have yet to see a single problem that this position leads to.

    The problems are not in the proofs, but in the statements of the theorems being too vague (or idealized) about what has actually been proved. The classical symptom of this is corollaries stated after the main theorem, which turn out not to be corollaries of the main theorem itself, but only of its proof.

    Take the model existence theorem for first order logic, as an example. For me, being computable in the limit means that the existence implied by that theorem has a nice specific interpretation of what it means, and of what it doesn’t mean. But since the theorem is normally just stated in a way that the existence it proves might also just be of the type ensured by the Axiom of Choice (the best interpretation I know so far is that I will be unable to construct a contradiction), I wondered for years what sort of ontological commitments might be sufficient for proving it (even though in the end the answer turned out to be sort of trivial).
    Or maybe a better example is the other way round, where the theorem suggests that the proof showed something much more nontrivial than it actually did. The No Free Lunch Theorems for Optimization are paradigmatic examples of this problem.

  399. Scott Says:

    gentzen #398: I agree that the metamathematics of AC are well-understood—I only meant taking a metaphysical position on its “Platonic” truth or falsehood.

    It sounds like most of the problems you describe could be solved by just stating theorems more carefully, without any change to the foundations of mathematics.

    And yes, the No Free Lunch Theorem seems like a perfect example of its name: a failed attempt to get a nontrivial conclusion out of a trivial argument. 😀

  400. gentzen Says:

    The unclear Platonic status of AC is not caused by AC itself, but by the fact that there are many possible alternative set theories. And those set theories are more different from one another than merely having different consistency strengths (or having different axiomatizations). For specific set theories (not just in terms of axiomatization, but also in terms of how the specific set theory is supposed to be interpreted), AC often becomes trivially true or false. Just like AC is trivially true in ZFC (especially if ZFC is interpreted as the “backport” of von Neumann–Bernays–Gödel set theory).

    Just stating theorems more carefully sounds easier in theory than it actually is in practice. This becomes important for proof assistants and automatic theorem proving. Even those intuitionistic dependent type theories used in Coq and some other proof assistants are no picnic in this respect.

    But I basically agree with you that no change to the foundations of mathematics with respect to arithmetic is required (regarding excluded middle). Even for second order arithmetic, I see no indications that being careful about excluded middle would change the conclusions in any meaningful way. The story is different for third order arithmetic.

  401. Nick Says:

    Scott #397

    Firstly, what do you mean by M’s haltingness being “decidable”…?

    Fix a theory T and an input I and let M be a machine. M’s haltingness on I is decidable in T if there is a proof in T that M halts on I or there is a proof in T that M does not halt on I.

    The haltingness of all halting machines is decidable, and the haltingness of some, but not all, nonhalting machines is decidable. I conjecture that the haltingness of any n-state machine for n under 5 is decidable.

    I hold these truths to be self-evident, that every Turing machine either halts or runs forever, and that which it does is a fact independent of all theories, interpretations, and models. And that to decide these facts, theories are instituted among men, but when a theory fails to decide the facts to our satisfaction, we have the right, we have the duty, to replace it by a better theory.

    I don’t deny any of that, so maybe we are arguing at cross purposes. It sounds like you are arguing against an extreme constructivist position that identifies truth with provability. That may be STEM Caveman’s position, but it isn’t mine. All I’m saying is that it’s a good idea to separate those arguments that make unavoidable appeals to LEM from those that do not. (I guess I am a moderate, open-minded constructivist.)

    In fact, from a platonic standpoint, it doesn’t matter whether or not you care about the distinction, because it always already exists anyway. For let P be a provable claim. Then either P can be proved without LEM or it cannot be proved without LEM (is that claim itself constructively valid???). So my claim is just that it’s a good idea to keep this distinction in mind. If you make a LEM argument, does it really need LEM, or can it be done without? This is like asking whether a theorem of geometry can be proved without set theory. Ignoring the distinction or not is just a matter of hygiene.

    Do you have any examples of arguments that make unavoidable appeal to LEM that don’t have to do with halting, incompleteness, uncountable sets, etc?

  402. Nick Says:

    Philosophy aside, I’d like to make a conjecture:

    For all n > 2, BB(n) < BBB(n) < BB(n + 1).

    I don’t know how to prove that, but my intuition is the following.

    On each execution step, a machine does three things: print some color, move left or right, then go to some state. Call these things together an action.

    A 2-color n-state program has 2n actions (one per color per state). For a halting program, at least one of these actions tells the machine to halt. A halting action cannot contribute to the computation, so a halting program is effectively operating with 2n – 1 actions. The BBB game does not require programs to halt, so that wasted halt action can be used for something more productive. A program can operate with all 2n actions, and can thus do more computation.

    On the other hand, a halting (n + 1)-state program has 2n + 2 actions. One of these is a wasted halt action, so it’s really operating with 2n + 1 actions. This is more than the nonhalting n-state machine’s 2n actions, so the halting (n + 1)-state program can do more computation, and BBB(n) < BB(n + 1).

    Can anyone come up with an explicit construction to show this?

  403. Scott Says:

    Nick #402: No, that’s not the case. As I point out in the survey, BBB grows like BB_1 (the BB function for Turing machines with HALT oracles), so it completely dominates BB for larger n. This is because having an algorithm to compute an upper bound on BBB would let you decide whether a given TM accepts a finite or an infinite number of inputs, which is a complete problem for the second level of the arithmetical hierarchy.

  404. yme Says:

    STEM Caveman said in comment #385: “I see no reason why every 6 or 7-state TM that halts should have a proof of less than 10^1000 pages in ZF that it halts”

    Sure, but that didn’t stop you from describing them as TMs that halt. Isn’t that the sort of statement you’re arguing is meaningless?

    It’s not easy to only ever talk about proofs, but never about the meaning of the statements that they’re proofs of. And why should we? If we didn’t think that a statement has a meaning, that it’s either true or false, and that its provability implies its truth, why would we care about proving it?
