Incompleteness ex machina

I have a treat with which to impress your friends at New Year’s Eve parties tomorrow night: a rollicking essay graciously contributed by a reader named Sebastian Oberhoff, about a unified and simplified way to prove all of Gödel’s Incompleteness Theorems, as well as Rosser’s Theorem, directly in terms of computer programs. In particular, this improves over my treatments in Quantum Computing Since Democritus and my Rosser’s Theorem via Turing machines post. While there won’t be anything new here for the experts, I loved the style—indeed, it brings back wistful memories of how I used to write, before I accumulated too many imaginary (and non-imaginary) readers tut-tutting at crass jokes over my shoulder. May 2019 bring us all the time and the courage to express ourselves authentically, even in ways that might be sneered at as incomplete, inconsistent, or unsound.

67 Responses to “Incompleteness ex machina”

  1. William Hird Says:

    What a clever character, he proves Goedel’s incompleteness theorem twice and proves the existence of time travel all in the same paper! (Look at the date of the paper; my computer says it’s still 12/30/2018.) LOL.

  2. Sebastian Oberhoff Says:

    It was already the 31st here in Germany when I found one more error and hastily sent Scott another version. Fortunately light travels faster than the earth turns.

  3. Andrew Certain Says:

    I already don’t believe that computers will ever be able to replicate human intelligence, so maybe this is just confirmation bias, but I read the essay as providing another argument that they won’t.

    As anybody who reads this blog knows, there are plenty of conjectures in math that are believed to be true even though they aren’t yet proven. Some of these conjectures may actually be unprovable; nevertheless, at some point there is enough consensus among mathematicians that they are true and they are then used to prove other valuable results.

    But what would it mean for a computer to “believe” that a mathematical statement is true but be unable to prove it? Certainly computers can make probabilistic statements, but to me at least there is some qualitative difference between “there is a 99.9% probability that this theorem is true” and “I believe with certainty that this theorem is correct but I can’t prove it.”

    Of course, it’s easy to imagine a computer digesting all text related to prime numbers and then, when posed the question “What do you think about the Riemann Hypothesis?”, spitting out “I believe the Riemann Hypothesis is true, but not yet proven.” But could it ever, when given novel statements which are neither provable nor disprovable, pick which ones it “believed” to be true? I freely admit it may be a failure of my imagination, but the conclusion of this essay at least to me highlights the non-computable abilities of human cognition.

  4. William Hird Says:

    Hi Sebastian, wonderful paper, I was just kidding of course, and best wishes for a Happy New Year to all friends of Shtetl-Optimized 😉

  5. James Gallagher Says:

    William #1

    That would have been almost appropriate if true, since Gödel himself had a proof that time travel is possible in an unusual spacetime (the Gödel metric).

    (Note that I have taken care to spell Gödel correctly, since I noticed in a previous comment that I unforgivably misspelled “von Neumann” twice. My New Year’s resolution is to take care to check spelling of such notable people.)

  6. Scott Says:

    Andrew Certain #3: Nominative determinism strikes again! I’d try to become less “certain” of your beliefs in this area. For starters, I’m not absolutely certain of mathematical statements that I can’t prove. Even for P≠NP, I’m on record as saying that my confidence is “merely” ~98%. For (say) e+π being irrational, my confidence might be 99.99999%—so high that seeing a proof wouldn’t even substantially increase my confidence, because of the possibility of an undiscovered error in the proof—but it’s not 100%.

    For a second thing, this does seem like obvious confirmation bias, because Oberhoff’s delightful essay is presenting material that was fully understood in the 1930s, and which your own comment suggests you already knew. The essay has some pedagogical innovation but no new philosophical arguments.

    To my mind, the idea that Gödel’s theorem places an interesting limit on machine intelligence, one that isn’t at the same time a limit on human intelligence, has been thoroughly debunked—with the key insight, again, being that human mathematicians are not absolutely certain about the unproven consistency of formal systems, any more than machines would be. And if you want to fall back on talking about humans’ subjective experience of feeling certain—well, you might as well have talked about their subjective experience of tasting strawberries, or any other conscious perception, so why even bring anything as abstruse as Gödel into it, other than to impress laypeople? 🙂

    Having said that, if you really want to believe that Gödel is the “sword in the stone” proving human specialness—well, you’ve got Penrose on your side, and we could agree to disagree. But on one condition only: at least don’t pretend that every time you read another proof of Gödel it somehow provides you more ammunition!

  7. Kevin S Van Horn Says:

    Andrew Certain #3: Why do you believe that computers will never be able to replicate human intelligence? Human brains are physical systems, their operation described by the same laws of physics as any other physical system. Why should it be fundamentally impossible to build a machine whose intended capabilities we already know to be physically possible to achieve with only modest inputs of materials and energy?

  8. mjgeddes Says:

    Andrew #3, Scott #6

    Yes, I think the correct conclusion to draw from Gödel is not that there’s anything uncomputable about cognition, but rather that mathematics, in some sense, exceeds (is beyond the comprehension of) cognition.

    I think what Penrose doesn’t like is the idea that this seems to prevent us from reasoning our way to mathematical truth. But all it actually means in practice is that mathematics is an *empirical* field, just like physics, say. We are not limited to deductive methods; we can *empirically* find patterns through numerical calculations and extrapolate. Of course this sort of empirical extrapolation doesn’t deliver certainty, but we don’t actually need certainty to make progress. So give up the Penrose demand for certainty, and there really isn’t a problem here.

  9. Scott Says:

    mjgeddes #8: I don’t quite agree with that view either. We can have certainty in mathematics—at least, something closer to certainty than is possible in any other area of human inquiry—just not certainty about every possible question that might occur to us. Or rather: there’s no single formal system (set of axioms and inference rules) that can deliver certainty about every question, even every arithmetical question. You’d need to keep adding new axioms without end if you wanted to settle everything.

    It also needs to be said that, 88 years after Gödel, the jury is still out on whether any of this matters for the sorts of questions mathematicians mostly worry about in practice. I.e., setting aside questions of transfinite set theory (like the Axiom of Choice and the Continuum Hypothesis), are any major open problems (things like P vs NP, the Riemann Hypothesis, the Yang-Mills mass gap, etc.) actually Gödel-undecidable? We still don’t know the answer to that.

  10. mjgeddes Says:

    Yes Scott, I agree that we *can* have certainty when we use *deductive* methods. But my point was, we’re not limited to deduction. For instance, for Riemann, we can simply perform billions of numerical calculations to test whether each zero is on the critical line or not. So I’m thinking that this deployment of numerical methods lets us get beyond deduction? (But at the cost of giving up certainty.)

  11. Gian Mario Says:

    I might be being naive, but couldn’t a computer simply add any unproven or unprovable conjecture/sentence as a new axiom and keep generating new theorems (either everything will work fine forever or it will generate a contradiction at some point)? Just like mathematicians do.

  12. Scott Says:

    Gian Mario #11: Of course you can try that; the question is, which unproven conjectures do you add? E.g., do you add the Unique Games Conjecture or its negation? The point of Gödel and Turing is that there’s no algorithmic way to make these decisions that’s also infallible. And the response of most people in AI, understandably, has been to shrug their shoulders. An AI wouldn’t need to be infallible to change the world beyond recognition, it would merely need to be as good as humans or better, and Gödel and Turing have nothing whatsoever to say against that!

  13. Greg Rosenthal Says:

    Andrew Certain #3: Here’s one way a computer could sensibly assign probabilities to mathematical statements that haven’t been proven or disproven.

  14. Gian Mario Manca Says:

    I was thinking about it, of course: “which new axiom do you add?” As you mentioned, I don’t think there is a “good” intrinsic mechanism for adding new axioms. That’s what makes mathematics a human enterprise, I guess. An AI will need some utility function to maximize to learn how to make these decisions. Maybe we can tell the AI: “maximize the number of your citations in the current mathematical journals” 😀

  15. William Hird Says:

    @ James #5

    How do you people get that “O” with the two little dots above it for Goedel’s last name, that’s so cool, my computer keyboard doesn’t seem to have that character, I’m feeling empty all of a sudden……….. 🙁

  16. ppnl Says:

    Scott #9:

    The way I see it math and physics are doing the same thing but in different directions. In physics, we look at the complex world and try to derive a simple set of “laws” that create that complex world. In math, we start with some simple set of assumptions and try to derive the complex consequences of those assumptions.

  17. ppnl Says:

    James,

    You have to use alt key codes. I can’t do that because my keyboard does not have a numlock or numpad keys.

    So I would use copy and paste. Easier anyway.

    I think you can also turn on the onscreen keyboard and switch to a German keyboard.

  18. domotorp Says:

    @Sebastian I have a little trouble understanding your notation. For example in the proof of Theorem 4 after the algorithm box, shouldn’t that A(A) be H(H)?

    @William Try a non-US keyboard, or US with international input. But simplest is to just copy paste a ö that someone else wrote…

  19. Gabriel Says:

    There’s also the alternative proof that the halting problem is undecidable, using the busy beaver function instead of diagonalization: we know that BB(n) is eventually larger than any computable function; in particular, it is uncomputable. But if the halting problem were decidable, then we could compute BB(n).
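    The contrapositive in that proof can be illustrated with a toy model (everything here is made up for illustration, not from the essay): in any programming language whose halting problem *is* decidable, the busy-beaver analogue becomes computable by brute force, which is exactly what fails for real Turing machines.

```python
# Toy language: a "program" is an int n; it runs for n steps if n > 0,
# and loops forever if n == 0. Halting is decidable for this language,
# which is precisely why its busy-beaver analogue is computable.

def halts(p):
    # Decidable halting predicate for the toy language.
    return p != 0

def run_steps(p):
    # Running time of a halting toy program.
    return p

def toy_bb(n):
    # "Busy beaver" for the toy language: the longest running time
    # among all halting programs of "size" at most n.
    return max(run_steps(p) for p in range(n + 1) if halts(p))

print(toy_bb(5))  # -> 5
```

    For genuine Turing machines the same brute-force loop would need a real `halts`, so a halting decider would yield a computable BB(n), contradicting BB's growth rate.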

  20. William Hird Says:

    @ Domotorp #18 & PPNL #17

    Thank you both, I did not know there was such a thing as a non-US keyboard. I’m feeling Gödel-complete already 😉

  21. Sebastian Oberhoff Says:

    @domotorp No. We’re not necessarily running H on itself. We’re running H on an arbitrary algorithm A and trying to figure out whether that algorithm halts on itself. That’s the form of the Halting Problem stated in Lemma 1.
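    The diagonal step behind that form of the Halting Problem can be sketched in code (hypothetical names; `H` is the assumed decider for “does A halt on its own source?”):

```python
# Sketch of the diagonalization: assume H(A) answers whether A(A) halts.
# Build D to do the opposite of whatever H predicts about its input.

def make_D(H):
    def D(A):
        if H(A):            # H claims A(A) halts...
            while True:     # ...so D loops forever instead
                pass
        return              # otherwise D halts immediately
    return D

def H_candidate(A):
    # An arbitrary candidate decider (it claims everything halts on itself).
    return True

D = make_D(H_candidate)
# H_candidate says D(D) would halt, yet by construction D(D) loops forever;
# whichever answer a candidate H gives about D, it's wrong.
print(H_candidate(D))  # -> True
```

    Any candidate `H` fails on `D` in the same way, which is the content of Lemma 1.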

  22. James Gallagher Says:

    William #15

    In Linux it used to just be “Alt Gr + [” then press “o” with my GB keyboard, but I was on Windows and so did it the dumb way by using the inbuilt character map app.

    It is lovely that the above comments actually demonstrate there are several solutions to these kinds of problems, and the simplest is the one we quite often miss – just copy and paste the character from a post above you!

  23. Joshua Zelinsky Says:

    Scott #6

    “Even for P≠NP, I’m on record as saying that my confidence is “merely” ~98%. For (say) e+π being irrational, my confidence might be 99.99999%—so high that seeing a proof wouldn’t even substantially increase my confidence, because of the possibility of an undiscovered error in the proof—but it’s not 100%.”

    Can you explain why you have such a high confidence about e+π being irrational? We’ve gone out to a lot of digits, and we don’t know any structural reason that e and π should be related, so our default presumption is that there shouldn’t be a relation. But nothing structurally awful happens if it turns out to be rational; there’s nothing equivalent to, say, the weirdness one would get from P=NP. My own confidence in the irrationality of e+π is much lower (maybe 99%), and to some extent this problem is closer to my sort of work than yours (although still not incredibly close).

    I wonder if this high confidence is connected to the fact that P ?= NP is in your own field, whereas e+π being irrational is in another branch of math; it wouldn’t surprise me if people are frequently more confident about the major conjectures in nearby subfields than in their own. On the other hand, it seems that there are a lot of mathematicians who don’t realize how much evidence there is to believe that P != NP, and a lot of non-number theorists who don’t realize how much evidence there is for RH, to use another example.

  24. JimV Says:

    Andrew Certain, thanks for your comment since it is one a layman like myself can respond to with a degree of self-assurance. The way I see it is that human brains are nano-tech computers with around 100 billion neurons for memory (which can be supplemented by external memory–books, notes, computer files, etc.) and about a quadrillion synapses for processing. Its base code/operating system has been refined by biological evolution over hundreds of millions of years. The largest super-computer yet developed has about a million processors, and its operating system has had less than 100 years of development, albeit at a faster generation rate of new versions.

    So it will be no easy task to develop an electronic computer as powerful as the human brain, and maybe the human race will kill itself before it can.

    On the other hand, most of our brains’ computer power is used for translating light impinging on retinal surfaces into a 3D representation of the world, picking out one voice in a noisy crowd, walking and running without falling over, and so on, whereas a computer can ignore all that and focus on very specific problems, such as beating us in chess and Go (which it can do handily).

  25. Scott Says:

    Joshua #23: Just because it feels like a totally arbitrary coincidence that could have no reason to be true. Or think of it this way: among e+π, e+2π, e+3π, and every other nonzero rational combination of the two, at most one could possibly be rational, right? So then why should e+π be the one? 🙂 But OK, on reflection and prompted by you, I’ll downgrade my confidence to 99.99%. Adopting Eliezer Yudkowsky’s heuristic, I suppose that given 10,000 equally “obvious” questions, I’d guess wrong for at least one of them.
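    The “at most one could possibly be rational” step is a standard one-line argument (spelled out here for completeness, not quoted from the thread):

```latex
\text{If } e+q_1\pi \in \mathbb{Q} \text{ and } e+q_2\pi \in \mathbb{Q}
\text{ for rationals } q_1 \neq q_2, \text{ then }
(e+q_1\pi)-(e+q_2\pi) = (q_1-q_2)\pi \in \mathbb{Q},
\text{ so } \pi \text{ would be rational, a contradiction.}
```

    So among e+π, e+2π, e+3π, … at most one member can be rational, which is the counting step in the heuristic above.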

  26. Joshua Zelinsky Says:

    Actually, that’s a pretty strong argument about all the possible rational combinations. That does increase my confidence that it is irrational, but I’m not sure how much I should update from that argument.

  27. Sniffnoy Says:

    William Hird:

    If you have a compose key set up, then you can get “ö” via compose-quote-o. Or compose-o-quote.

    But note that even if you have no keys set up for this sort of thing, Shtetl-Optimized accepts (some, limited) HTML — in particular, it accepts HTML entities. So you can make “ö” by entering “&ouml;”. And this will work anywhere that accepts HTML, including most things that only accept limited HTML.

  28. Sniffnoy Says:

    Josh #26: I mean, we expect more than just that e and π are linearly independent over Q, right? We expect that they’re algebraically independent over Q. But I’m not entirely sure how this fits into what you and Scott are saying about estimating just how much we expect this. 😛

  29. gentzen Says:

    Scott #9: If the jury uses handwaving concepts like “mostly worry about in practice” or “major open problems”, then it is no surprise that it remains unclear whether Gödel matters:

    It also needs to be said that, 88 years after Gödel, the jury is still out on whether any of this matters for the sorts of questions mathematicians mostly worry about in practice. I.e., setting aside questions of transfinite set theory (…), are any major open problems (…) actually Gödel-undecidable? We still don’t know the answer to that.

    Let me first note that we cannot prove that the Riemann Hypothesis is Gödel-undecidable without actually proving it. So we need to look among proved theorems for suitable candidates of major (no longer open) problems that could be judged as Gödel-undecidable:

    The graph minor theorem is one potential candidate, which has nothing to do with transfinite set theory. It is currently beyond the reach of the formal systems analysed in terms of ordinal analysis, i.e. it is not provable in Π¹₁-CA₀. However, I don’t think that it goes beyond higher order logic (i.e. MacLane set theory).

    Martin’s theorem that all Borel sets are determined is another potential candidate, which goes beyond higher order logic. (It depends on replacement in an essential way.) It might be accused of being related to transfinite set theory, but it shows that replacement has consequences in analysis, consequences even for the simple sets of reals favored by the French analysts.

    If I were the jury, then the graph minor theorem would be fair game, as long as ordinal analysis doesn’t catch up. (And even if it did, one would have to check whether the analysis is clear enough to be fully convincing that the corresponding formal system is consistent.) Martin’s theorem is not as good, because I only learned about it while reading Penelope Maddy’s “Believing the Axioms”, linked to by a comment on a question about justifying mathematics.

  30. Jon K. Says:

    I’m interested in people’s thoughts about what criteria you look for in an axiom… Do you require it to be “self-evident”? Or is an axiom being independent of your current formal system sufficient for you to add it? Do you view math as talking about some objective abstract entity, or is it just symbol shunting? I always wonder if Platonic mathematicians and a practical physicist should have different criteria for axioms.

    Also, I watched a talk about “abstract state machines” by Yuri Gurevich recently. Does anyone have thoughts on his view that the Church-Turing thesis can’t be true for all “algorithms” in a more general sense of the word?

    Happy new year!

  31. William Hird Says:

    Sniffnoy#27
    Thank you for the suggestion, although I’m not that computer literate yet, I will put that on my New Year’s Resolution List along with proving P!=NP 🙂

  32. Scott Says:

    gentzen #29: Sure, there are borderline cases (and Harvey Friedman’s work over the last 50 years has precisely been probing this border). But if any questions like P vs. NP were proved independent of ZFC, that would be completely unambiguous evidence that incompleteness mattered for conventional mathematical practice, with no handwaving needed.

    (I’m taking it as undisputed here that incompleteness was one of the greatest insights in the history of human thought, and that it matters enormously for mathematical foundations and set theory and logic and CS. But the question we’re asking here is different.)

    It’s actually an open problem whether the arithmetical consequences of the Graph Minor Theorem (i.e., the reasons why people would care about that theorem in CS) are independent of PA—I had a discussion with Friedman a couple years ago about precisely that point. But let’s assume for the sake of argument that they are independent of PA. Even then, that, and the other examples you gave, are of course not examples of independence from full ZFC—as evidenced by the very fact that we call them “theorems”!

    If, hypothetically, RH were to be proved equivalent to Con(ZFC), that would be one way for its unprovability in ZFC to be established by means that didn’t simultaneously prove in the conventional sense that RH was true. (Though in such a case, we would have a proof of RH from a large cardinal axiom.)

  33. Scott Says:

    Jon #30: There are two very different things people mean by “axioms.”

    First, there are axioms for particular theories, like Euclidean geometry or groups or polynomial-time computation. The criteria for those sorts of axioms are things like: do they capture the phenomena we care about? Are our original motivating examples actually models for the axioms? Are things that we didn’t want also models? Are the axioms minimal (could we get by with weaker axioms)? Are they simple and intuitive?

    Then there are axioms that are proposed for “the whole of mathematics,” like those of ZFC. Here we have all the above requirements, but additionally, a Platonist would want the axioms to be “metatheoretically true,” whatever is meant by that.

    And crucially, even if we’re not Platonists—e.g., even if we believe there’s no fact of the matter about the Axiom of Choice or the Continuum Hypothesis—I’d say that we still, at the least, need the axioms to be arithmetically sound. That is, we need all their consequences for ordinary arithmetic to be true, so that (e.g.) if they imply that some Turing machine halts, then that machine does in fact halt when we run it. That’s the requirement that’s presumably violated by, for example, the theory ZFC+Not(Con(ZFC)), despite that theory’s consistency.

  34. Sniffnoy Says:

    Wait, why did my HTML entity not work? It worked fine in preview. >_> Does “&ouml;” really not work here? Oy…

  35. Sniffnoy Says:

    OK I guess I must have just entered it wrong the first time or something… you can indeed enter “&ouml;” for “ö”. Yes? It’s not going to suddenly stop working? (I sure hope so.)

    …I feel kind of like I should say something vaguely on topic about the graph minor theorem or about Gödel’s theorems or something, but, eh, I’ll skip it. 😛

  36. Vanessa Says:

    I can’t help but wonder whether Scott Aaronson and Joshua Zelinsky will Aumann-agree on the probability of e+π being irrational by the end of this thread 🙂

  37. gentzen Says:

    Scott #32: My guess is that RH has a real chance to be independent of ZFC and all large cardinal axioms that are currently being considered. It might need axioms derived from different considerations (maybe even contradicting ZFC) than the maximization and limitation-of-size principles related to the intuitive conception of set theory. (Something “unjustifiable” like the principle that events of probability zero won’t happen without an “explainable/understandable” reason.) I see no reason to guess something similar for P vs. NP, especially since our understanding (and capabilities) still grows steadily with no end in sight.

    If we look for examples of (provably) Gödel-undecidable problems today, I expect that they will be related to “size issues”. As long as those size issues are related to ordinal numbers, they should have arithmetical consequences in the sense that some fast growing function is total. I would have thought that Friedman’s SSCG function is such a fast growing function as a consequence of the Graph Minor Theorem, and that PA would be unable to prove that it is total. I am a bit surprised to learn that this has not been established rigorously. Maybe I should browse the survey What’s so special about Kruskal’s theorem and the ordinal Γ0? A survey of some results in proof theory by Jean H. Gallier (1991) to learn more about the details.

    On the other hand, maybe I should use your remark about arithmetical consequences as an opportunity to defend the use of second order arithmetic for reverse mathematics. Simply put, first order arithmetic is not expressive enough to state theorems like the model existence theorem or Ramsey’s theorems. While P vs. NP is equivalent to a Π_2^0 sentence, separation principles like Kőnig’s lemma must talk about (the existence of) classes, and hence are not equivalent to arithmetical sentences. And if a theorem is not equivalent to some arithmetical sentence, then mathematicians are normally interested in the theorem itself, and not just in its arithmetical consequences.

  38. Daniel Pehoushek Says:

    A curious idea that NP is simply equivalent to #P: (where should I post it?)

    Complete correctness of programs for NP (zero chopped subspaces, zero randomization) implies true equivalence with #P, because, All satisfying solutions are acceptable as answers from the program. Inequality with #P would imply NOT all solutions are answers, contradicting correctness.

    I have had two decades of experience with #sat solvers, so the theorem comes from experience. My best counting benchmark (eleven weeks, 4cnf 4 coloring, degree 6 graph): #P 472,406,068,323,174 inferences 67 trillion. At Satisfiability 2002, I published an examination of #P=#Q, where the number of satisfying assignments equals the number of valid quantifications. That theorem required more than three inferences.

  39. Sebastian Oberhoff Says:

    As I keep pondering this topic I came across another little result I thought was kinda cute. A (new?) negative solution to the decision problem:

    Theorem: Any consistent formal system F capable of reasoning about algorithms is undecidable. More precisely, there’s no algorithm D which can determine, for an arbitrary sentence S, whether S is provable or not.

    Proof: Suppose F is decidable. Then one can amend F by adding the inference rule

    If D(S) returns false, then ¬S is a theorem.

    This gives rise to a new system F’ which is still capable of reasoning about algorithms. Its only difference from F is that all the undecidable sentences have become disprovable. Hence F’ is complete. But as I show in the essay, any consistent formal system capable of reasoning about algorithms can’t be complete. □

    This turns undecidability into a consequence of incompleteness. And because incompleteness is an obvious consequence of incompleteness the two concepts kinda meld together. I’m not sure if there’s anything interesting left to be said here.
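    The completeness of F’ in that proof can be sketched with a hypothetical decider (the names and the stub oracle below are made up purely for illustration):

```python
# Hypothetical sketch: given an assumed decision procedure D for F
# (D(S) answers "is sentence S provable in F?"), the amended system F'
# settles every sentence S either way, so F' is complete.

def settled_in_F_prime(S, D):
    if D(S):
        return f"{S} is a theorem"     # provable in F, hence in F'
    else:
        return f"¬({S}) is a theorem"  # the added inference rule fires

# Stub oracle for illustration only: "provable" iff the sentence
# mentions halting (an arbitrary toy rule, not a real decider).
toy_D = lambda S: "halts" in S

print(settled_in_F_prime("M halts", toy_D))  # -> M halts is a theorem
print(settled_in_F_prime("0 = 1", toy_D))    # -> ¬(0 = 1) is a theorem
```

    Since every sentence comes out settled one way or the other, completeness of F’ follows from the assumed existence of D, which the incompleteness theorem then refutes.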

  40. Jay- Says:

    Happy new year to all!

    Scott #33

    > I’d say that we still, at the least, need the axioms to be arithmetically sound. That is, we need all their consequences for ordinary arithmetic to be true, so that (e.g.) if they imply that some Turing machine halts, then that machine does in fact halt when we run it. That’s the requirement that’s presumably violated by, for example, the theory ZFC+Not(Con(ZFC)), despite that theory’s consistency.

    Is this formulation known or believed equivalent to the wikipedia short definition (sound == all theorems are tautologies)? In other words: is it “presumably” as in “Experts know that presumably…” or as in “Experts would presumably agree that…”?

    Also, I’m probably missing something, but I would have thought the reverse: let U be a consistent but unsound theory, and M a TM that doesn’t halt; then U can say M halts after S steps if S can’t be defined using the axioms of U. On the contrary, if a theory says a particular TM doesn’t halt and in fact it does, can’t we use that to prove the theory inconsistent?

  41. Sebastian Oberhoff Says:

    Maybe there really is something more to say about the proof I just gave above.

    So Hilbert’s “Entscheidungsproblem” asked for a formalization of “all of mathematics” which was decidable. That is, just as proofs could be checked by machine, it should be possible to *find* proofs by machine. There should be an algorithm D which answers “true” and “false” depending on whether or not a given sentence S is provable. The existence of such an algorithm was not ruled out by Gödel, because D could deal with undecidable sentences by (correctly) designating them not provable. That would’ve still met Hilbert’s demands.

    Here’s where Turing comes along with the undecidability of the Halting Problem. I’ve never seen the details spelled out but presumably they run as follows:

    Theorem: Let F be a sound formalization of all of mathematics. Then there’s no decision procedure for F.

    Proof: Suppose D is a decision procedure for F. Because F formalizes “all of mathematics” it better also be able to reason about Turing machines. This would allow us to solve the Halting Problem for algorithm A as follows. Simply ask D whether «A(A) halts» is provable. If A(A) halts, this can be proven by writing out the computation and D will say so. If A(A) doesn’t halt, then this can’t be provable without F becoming unsound in the process. So D will answer that «A(A) halts» isn’t provable. □
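    That reduction can be written out in a few lines (`decides_provable` stands for the assumed oracle D; the stub below only exercises the plumbing, since a real oracle is exactly what the theorem rules out):

```python
# Sketch of the reduction: a decision procedure for a sound F would
# solve the Halting Problem. decides_provable is the assumed oracle.

def halts_on_self(A, decides_provable):
    sentence = f"«{A}({A}) halts»"
    # By soundness, «A(A) halts» is provable in F iff A(A) really halts,
    # so the oracle's verdict decides the Halting Problem for A.
    return decides_provable(sentence)

# Stub oracle for illustration only: pretends every such sentence is
# provable. (No correct oracle can exist; that's the theorem.)
stub = lambda s: s.endswith("halts»")

print(halts_on_self("A", stub))  # -> True
```

    The one-line body is the whole point: with D in hand, deciding halting costs nothing more than asking one provability question.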

    Note that this argument crucially rests on F’s soundness. If F is merely consistent, then it might be able to prove «A(A) halts» despite the fact that A(A) doesn’t halt without becoming inconsistent. But then D no longer solves the Halting Problem.

    The argument I give in the previous post rests on consistency alone.

  42. Scott Says:

    Daniel #38: I confess that I didn’t understand your argument in the slightest. For starters, NP consists of languages and #P of integer-valued functions, so they can’t be equal. Were you instead trying to argue NP=P#P? If so, then for starters, why do you think NP=coNP? (I.e., if the number of satisfying assignments is zero, what’s a short proof of that fact that can be efficiently checked?)

  43. fred Says:

    Atoms auto-assemble to create self-replicating and mutating structures, which eventually expand to “do mathematics” (exploring the mathematical realm for its truths).
    Is there a connection between the basic elements of physics (the physical “axioms”) and the basic axioms of mathematics?
    Is a Turing complete system (like game of life) enough to eventually evolve structures that can spontaneously obsess over mathematics?

  44. Sebastian Oberhoff Says:

    Two minor corrections:

    When I said in the post before the last that “incompleteness is an obvious consequence of incompleteness” I meant “incompleteness is an obvious consequence of undecidability”.

    When I said in the last post that “If A(A) doesn’t halt, then this can’t be provable without F becoming unsound in the process” I meant “If A(A) doesn’t halt, then «A(A) halts» can’t be provable without F becoming unsound in the process”.

  45. Maximilian Says:

    Hi Scott,

    When you say you have a certainty of about 98% for the truth of P!=NP (or of some other mathematical conjecture), would you say this is for similar reasons as those for which e.g. a physicist would assign a similar probability to the truth of some natural law?

    It seems to me that both mathematicians and empirical scientists arrive in essentially the same way at those beliefs, namely by some kind of inductive inference. E.g. some “inference to the best explanation” of already known facts, with assumed higher probability for simpler explanations etc.

    And this inductive method seems to work in mathematics similarly well as in science, since many mathematical conjectures were suspected to be true (believed with high probability) by mathematicians long before they were proved.

    This seems to shed interesting light on the “problem of induction”: If inductive inference works in mathematics, then the explanation why induction (in general) works, cannot be “because we happen to live in a universe where induction works”. Since mathematics, unlike empirical science, is arguably the same in every possible universe, the fact that inductive inference works cannot be attributed to a contingent statement like “our universe happens to be induction friendly”.

    In other words: If induction works in mathematics, and mathematics works necessarily, induction must also work necessarily, since the success of induction in mathematics cannot be explained by contingent circumstances like in which kind of universe we happen to live.

    This would mean that induction works necessarily, and that therefore there must exist some purely mathematical explanation of why induction works. E.g. a proof that (in some precise sense) “simpler” hypotheses are indeed more likely to be true, like Ockham’s razor conjectures.

    (If induction indeed works necessarily, this of course doesn’t change the fact that it still only works probabilistically. Also to be clear, I’m of course not talking about the familiar proof technique “mathematical induction”.)

    It would be nice to hear whether you find this plausible. Or whether you think inductive inference in mathematics is of an entirely different kind from the one employed in science and everyday life.

  46. jemand Says:

    In the Rosser-via-Turing discussion there was something about getting an intermediate degree below the Halting Problem. Does the S.O. construction give new insights on that?

  47. Randy Olsen Says:

    As an alternative to viewing math and science (esp. physics) in a hierarchy with math on a higher level (given its apparent purity, which I do appreciate), I have wondered if mathematics might be an *interface* between different physical descriptions of reality. The reason this might not have been considered much previously is that almost all effort has gone into developing the space-time description. I would also call this the energy-and-momentum-centric description, and we’ve benefitted from its fantastic successes.

    Now, however, with “information” (and associated topics such as error correction) being discussed as physical, might we be developing a description that we are connecting to the space-time description via mathematics? At least in a symbolic sense, imagining coordinates alternative to space-time is not so difficult. For example, rather than work in time, with its “tyranny” of energy conservation, why not play around in a coordinate in which information could be conserved? Exactly when some information is exchanged may not be as important as whether it is ever exchanged, period. Rather than spanning time, we might want to span… scenarios.

    As a possible entry point into such thinking, how about trading time for something like Temperature*time (from dimensional analysis with Planck’s constant and entropy)?

  48. Godel's ghost Says:

    “…if you declare yourself a very stable genius, you’re not…”

    Mr. Oberhoff, any formal system capable of [political] reasoning cannot prove its own consistency, including the ones that reasoned that Trump is an unstable idiot.

  49. fred Says:

    It’s not that hard to come up with some scenarios where our machines would not match what our minds can do.

    You could imagine that what we perceive as physical reality (with its limitations) is only a subset of a more general mega-reality that’s way more complex and rich with resources (higher speed of light, QM where entanglement is more robust, etc.), and somehow the human brain is “straddling” the two realities.

    We can actually already do this scenario today/soon with Virtual Reality:

    Imagine that we hook up a group of people, since birth, to a VR system, which has very coarse resources (like a Minecraft world, with any arbitrary limitations on available resources).
    Their experience of reality is only through this VR system. They would build a theory of how physics works in such a system, and come up with models for their own brains, but they would never realize that their own minds are actually tapping into a much more comprehensive reality, which can’t be modeled within the world they’re experiencing.
    Another example would be to couple a QC with a simulation that’s classical.

    It’s all tied to the fact that what we can directly perceive as conscious beings is fairly limited, but I guess you could also imagine a scenario where a very powerful QC is interfaced into a classical computer.

  50. fred Says:

    Segue: a recent interview of Susskind where he talks a bit about his current research – the connection between wormholes, quantum entanglement, and complexity theory.

    https://youtu.be/CQAcLW6qdQY?t=2978

  51. Sandro Says:

    This would mean that induction works necessarily, and that therefore there must exist some purely mathematical explanation of why induction works. E.g. a proof that (in some precise sense) “simpler” hypotheses are indeed more likely to be true, like Ockham’s razor conjectures.

    Indeed, read up on Solomonoff Induction.

  52. Maximilian Says:

    @Sandro #51: Solomonoff induction assumes that shorter strings of bits have higher probability, i.e. that Ockham’s Razor is true, but it doesn’t give an a priori justification for this assumption. A mathematical explanation why Ockham’s Razor works is exactly what is in question.

    (Also, since Solomonoff induction is based on the notion of a Universal Turing Machine, predictions made by Solomonoff induction can vary extremely based on what implementation of a UTM you use for it, since there is no unique “ideal” UTM known to date. But that’s another problem.)
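    To make the role of the length-weighted prior concrete, here is a toy sketch (my own illustration, not Solomonoff induction itself, which is uncomputable): over a made-up finite set of hypotheses, assign each one prior mass proportional to 2^(−description length) and update by Bayes’ rule. The hypothesis names and likelihoods are invented for the example.

    ```python
    # Toy sketch of a length-weighted ("Occam") prior over a finite hypothesis
    # set, updated by Bayes' rule. Solomonoff induction proper is uncomputable;
    # this only illustrates how shorter hypotheses start out with more mass.
    # The hypotheses and likelihoods below are made up for illustration.

    hypotheses = {
        "h0":   (1, 0.5),  # (description length in bits, P(data | hypothesis))
        "h10":  (2, 0.9),
        "h110": (3, 0.9),
    }

    # Prior proportional to 2^(-length), normalized over the finite set.
    prior = {h: 2.0 ** -length for h, (length, _) in hypotheses.items()}
    z = sum(prior.values())
    prior = {h: p / z for h, p in prior.items()}

    # Bayesian update: posterior proportional to prior * likelihood.
    post = {h: prior[h] * hypotheses[h][1] for h in hypotheses}
    z = sum(post.values())
    post = {h: p / z for h, p in post.items()}

    # The shortest hypothesis keeps its edge here despite a worse likelihood.
    print(max(post, key=post.get))  # → h0
    ```

    Note that the sketch sidesteps the UTM-dependence problem mentioned above: the “description lengths” are simply stipulated, whereas Solomonoff induction would derive them from program lengths on some particular UTM.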

  53. Trevor Adcock Says:

    Their experience of reality is only through this VR system. They would build a theory of how physics work in such system, and come up with models for their own brains.. but they would never realize that their own minds are actually tapping into a much more comprehensive reality, which can’t be modeled within the world they’re experiencing.
    Another example would be to couple a QC with a simulation that’s classical.

    But Turing machines can be constructed in Minecraft, fred. They would be confused by how powerful their minds could be. But they should be able to figure out that it’s all reducible to a Turing machine.

  54. Sandro Says:

    @Maximilian #52:

    A mathematical explanation why Ockham’s Razor works is exactly what is in question.

    Any prior would converge with Bayesian inference, but minimum description length/Ockham’s razor is a non-arbitrary partial ordering that converges optimally, which is why it’s the universal prior in Solomonoff Induction.

  55. Gerard Says:

    Scott #42

    “Were you instead trying to argue NP=P^#P? If so, then for starters, why do you think NP=coNP? (I.e., if the number of satisfying assignments is zero, what’s a short proof of that fact that can be efficiently checked?)”

    If you have a #P oracle wouldn’t the proof that the number of satisfying solutions is zero simply be that the oracle tells you so ?

  56. Scott Says:

    Gerard #55: Well yes, but so far you haven’t said anything remotely comprehensible to explain why you can implement a #P oracle in NP—something that, if true, would be a revolution in complexity theory.

  57. Maximilian Says:

    @Sandro #54: The optimality argument has been refuted, see http://philsci-archive.pitt.edu/12429/1/soloccam.pdf

  58. Sniffnoy Says:

    Maximilian #57:

    Wow, thanks for that link! I wasn’t expecting much when I clicked on it but it makes a good argument. The key point isn’t stated clearly up front, though, so let me summarize it for everyone else. (Warning, I may be butchering some of the technical details regarding different notions of computability a little.)

    So, the idea is that a universal prior formalizes the principle of Occam’s Razor, and a universal prior works well (for some appropriate notion of that), so this shows that Occam’s Razor works, right? The thing is, there’s a theorem that any semicomputable prior on the set of semicomputable sequences (and that assigns nonzero probability to every semicomputable hypothesis) can be represented as a universal prior. So this idea that universal priors are this special Occamian subset of these is not true — if you demand semicomputability (and non-dogmatism), there’s nothing to contrast them with! And the argument “Occam’s Razor does well in a semicomputable world” just becomes “Assuming a semicomputable world (and being semicomputable yourself) does well in a semicomputable world”, which is less interesting.

    OTOH, one could just reply “Yes, assuming a semicomputable world does well in a semicomputable world, but that’s equivalent to Occam’s Razor doing well in a semicomputable world, so what’s the problem? It still remains the case that Occam’s Razor does well in a semicomputable world, so unless you think the world might not be semicomputable, it’s still justified; it’s OK if one only implicitly uses Occam’s Razor rather than invoking it explicitly.” But — as mentioned above — this kind of requires acknowledging priors that don’t assume semicomputable (or aren’t semicomputable themselves, or are dogmatic) as a live option to contrast against, so this reply seems to be on pretty shaky ground…

  59. Sandro Says:

    @Maximilian Comment #57:

    The optimality argument has been refuted, see http://philsci-archive.pitt.edu/12429/1/soloccam.pdf

    Thanks for the link! The paper actually does not agree with your conclusion, as per footnote 5 in the paper:

    This phrasing suggests a weaker property than reliability (i.e., convergence to the truth), namely optimality (convergence to predictions that are at least as good as those of any other prediction method). However, the proof that is referred to is about reliability.

    So the paper covers reliability, not optimality. As I initially said, any prior would converge with Bayesian inference, or as the paper says, “We can prove that any Bayesian predictor, operating under the inductive assumption of S, is reliable under the assumption that the data is indeed generated by some source S* ∈ S.”

    I don’t think the optimality of Solomonoff induction is in dispute, and I think optimality is sufficient justification for Occam’s razor as a general recommendation for prediction: it’s simple, it’s explicable/communicable, it’s effective, and it’s optimal.

    However, the paper also concludes that, “If there exists a way to salvage the argument at all, then it would have to consist in demonstrating anew that effectiveness as an inductive assumption does lead to a fundamental simplicity notion”.

    That is, the paper does not definitively establish that effectiveness does not require simplicity, as you seem to imply; it concludes only that effectiveness is sufficient for Solomonoff Induction, and that simplicity does not appear necessary unless it is itself required by effectiveness. That remains an open problem.

  60. Maximilian Says:

    @Sandro #59: The problem is that the Solomonoff prediction argument does not provide an epistemic justification for Ockham’s Razor.

    The justification you name is merely pragmatic. First of all, there already exists a much more direct pragmatic justification of Ockham’s Razor (optimal worst-case performance, see Kelly 2008 in the references) which doesn’t require Solomonoff prediction at all. Also the familiar argument by Reichenbach 1935 or its “meta-induction” refinement by Schurz 2008.

    Second, only an epistemic justification can be converted into an explanation, since any epistemic justification E can be used to explain a hypothesis H if H is true: any epistemic justification E for H must entail H logically or probabilistically. This is not the case for a pragmatic justification, which entails only the usefulness of H. But an explanation is exactly what should exist by the argument in #45, and therefore also an epistemic justification. By the same argument, this explanation should not require any contingent “inductive assumptions” (like effectiveness of the world), since, as I said, induction (and by extension, Ockham’s Razor) apparently works necessarily because it apparently works in mathematics.

  61. Sniffnoy Says:

    By the same argument, this explanation should not require any contingent “inductive assumptions” (like effectiveness of the world), since as I said, induction (and by extension, Ockham’s Razor) apparently works necessarily because it apparently works in mathematics.

    I don’t think this is correct; I think this is a bad generalization. Occam’s Razor seems to work in mathematics. It also seems to work in matters involving the external world. The former of these must hold necessarily. That doesn’t mean the latter must also. I don’t see any reason to assume that one explanation must cover both (well, other than… I don’t need to finish this joke 😛 ). And indeed an explanation based on, say, Solomonoff Induction necessarily only works towards explaining the latter, since Solomonoff Induction assumes you’re already logically omniscient (I’m not familiar with these other arguments but I wouldn’t be surprised if they did too, formalizing partial states of logical rather than empirical knowledge is not so easy).

  62. Maximilian Says:

    @Sniffnoy #61: I don’t see any reason to assume that induction works totally differently in science than in mathematics. They seem to be quite similar in the “inference to the best explanation” sense where they both use Ockham’s razor. So it would require an argument why they should be completely different things.

    Additionally, nobody would argue that deduction works totally differently when you use necessary premises, like “all integers have a successor”, as opposed to when you use contingent premises like “all men are mortal”. It’s completely irrelevant for deduction whether the premises or conclusions are necessary or contingent. A similar thing can be said about probability theory when we use something like Carnap’s logical probabilities as opposed to contingent probabilities like relative frequencies or propensities. This makes absolutely no difference for probability theory. And both deductive logic and probability theory work necessarily, because otherwise they wouldn’t work with necessary subjects such as mathematics. If it isn’t justified to assume two totally different kinds of deduction or probability theory because they seem so similar, then it is also not justified to assume the same thing for induction or Ockham’s razor.

  63. Sniffnoy Says:

    Maximilian #62:

    No, I still think you’re mixing things together that don’t belong together. Different levels, really. Certainly deduction works just as well with contingent premises, but I don’t think one can infer the rest of what you say from that. The relation between necessary domains and contingent ones is asymmetric.

    I’m going to abuse some logic terminology here — sorry for the abuse of terminology but I’m having trouble thinking of a better way to put this — and talk about the level of “theory” and the level of “model”. By “model” I mean, like, a description of the world as a mathematical object. By “theory” I mean the statements we make about the world, about this object. Like I said, I’m abusing logic terminology here, I don’t mean it in quite the same way, but like in logic, a “model” is a mathematical object and a “theory” is statements about that object.

    Deduction is a tool that works at the level of theory. It takes true statements and makes other true statements. By contrast, probability theory is a tool that works at the model level. We imagine a space of possible worlds, the worlds being mathematical objects, and we put a probability distribution on that space. You can assign probabilities to statements, yes, but this happens by replacing the statement by its extension.

    Which is to say, probability theory assumes you are logically omniscient. As you already know, attempting to apply probability theory to mathematics results in triviality — any given statement has probability either 1 (it is true), or 0 (it is false). There are not multiple possible models; your space of possible worlds is a single point.

    Yes, of course in actual logic there can be multiple models of ZFC, etc. Like I said, I’m not talking about that, I’m abusing the terminology a bit. I’m assuming a sort of mathematical platonism where the model comes first and the theory merely describes it. So, if you know the model, you also know the theory.

    And so probability theory, as well as all our other tools based on it for dealing with contingent matters, being model-based, trivialize questions of theory. Sure, it’s true that deduction works for contingent matters, but that’s a theory-level concern; it just lets you navigate around a theory, gets you between different statements within it. And probability theory works on the level where we’re not dealing with those, where we already automatically know all true statements about any given model.

    But if there’s one thing in your comment that makes me stop and say “Hold on, what?” it’s this:

    A similar thing can be said about probability theory when we use something like Carnap’s logical probabilities as opposed to contingent probabilities like relative frequencies or propensities. This makes absolutely no difference for probability theory.

    Carnap’s logical probabilities — as best I can tell from my quick reading, because I actually hadn’t encountered these before, so thank you for bringing them to my attention — are no good for these purposes. They’re “admissible”, they obey the laws of ordinary probability theory. Which means automatically that they’ll assign any necessarily true statement a probability of 1 and any necessarily false statement a probability of 0. Which is to say, they assume logical omniscience, once again, the very thing we’re trying to avoid here. (Carnap’s logical probabilities also seem to require adding various symbols one wouldn’t normally have, but I’ll ignore that for now.)

    My understanding was that the entire idea of “logical probability” — in the sense of, logical probability that avoids logical omniscience, that allows us to assign probabilities other than 0 or 1 to mathematical statements that are nonetheless provable or disprovable — and how to formalize it was basically a gigantic open problem. OK, maybe less gigantic now thanks to the people at MIRI. But it’s pretty obvious why the problem is hard — because, as already mentioned, if your logical probabilities satisfy the laws of ordinary probability, you once again end up with decidable mathematical statements getting a probability in {0,1}, i.e., logical omniscience.

    So, as I said, the relation between these tools — deduction and probability theory — is an asymmetric one. Deduction works at the theory level. It therefore also automatically applies at the model level, which incorporates the theory level. Probability theory works at the model level, which incorporates the theory level; but probability theory is also a tool that trivializes the theory level, so if you attempt to port it back to the theory level without adjustment you get something trivial. Logical probability is, if it is to be nontrivial, not probability; it is a new tool that resembles probability. There’s a real asymmetry here, and the automatic porting in one direction doesn’t mean that you can automatically (nontrivially) port things in the other direction.

  64. gentzen Says:

    And now Adam has posted yet a third segment, in which I talk about small, lighthearted things like existential threats to civilization and the prospects for superintelligent AI.

    Starting at 16:40, Scott says something like “… we don’t have anything in our past experience that is analogous to it … We are asking about a logical possibility that is sort of unlike anything that’s ever been seen. The closest analogy that I can think of is sometimes people ask things like what if these famous math problems like the Riemann hypothesis or P vs NP, what if they are independent of the axioms of set theory and you could neither prove nor disprove them… it is a logical possibility … until we have even one genuine example of it … we can’t really do statistics with a sample of size zero”

    I would like to take that discussion as an opportunity to explain some of my recent thoughts on circularity which I find interesting (and which are more or less related to Sebastian Oberhoff’s rollicking essay). But before that …

    There is a huge difference between a math problem “being independent of the axioms of set theory” and a math problem that “you could neither prove nor disprove”. I agree with Scott that we don’t have examples of math problems even remotely comparable to the Riemann hypothesis or P vs NP that “you could neither prove nor disprove”. On the other hand, I don’t agree with the claim that there would be no nice, good, and natural examples of Gödel-undecidable math problems. Especially since Gödel explicitly mentions the formal system of Principia Mathematica, which is slightly weaker than ZFC, Martin’s theorem (see my comment #29 above) should already do the trick.

    There are quite some famous math problems that have been proved independent of some reasonable natural axiom system in the past, prominent examples include the parallel postulate and the continuum hypothesis. (If I were the judge, I would say they have both been disproven in a suitable sense. I am less sure how to judge the axiom of choice in this context.)

    Can we learn how to answer the “what if …” question from these examples? It seems like we somehow managed to “construct sufficiently nice and natural models” where those hypotheses failed. But how could we ever manage to construct natural models of the natural numbers different from the one true model, in a way that those models would give interesting insights into P vs NP? (My recent thoughts on circularity below give some intuition for how consistent (but unsound/dishonest) axiom systems give models of the natural numbers.)

    Gentzen’s consistency proof of PA suggests a different way to answer the “what if …” question: It uses the additional assumption that epsilon_0 is well founded. An analogy for P vs NP would be to prove P != NP using additional derandomization assumptions. The challenge here is to address the objection that “you explicitly put in (the additional assumptions) what you wanted to prove in the first place”, and that hence your proof is circular and worthless.

    OK, here are my thoughts on circularity:

    Of course Hume is right that justifying induction by its success in the past is circular. Of course Copenhagen is right that describing measurements in terms of unitary quantum mechanics is circular. Of course Poincaré is right that defining the natural numbers as finite strings of digits is circular. (Please forgive me for simplifying those subtle philosophical positions into such objectionable short statements. My comment is already too long, and subtle philosophical elaborations at this point would reduce the number of readers to basically zero.)

    But this circularity is somehow trivial, it doesn’t really count. It does make sense to use induction, describing measurement in terms of unitary quantum mechanics does clarify things, and the natural numbers are really well defined. But why? Has anybody ever written a clear refutation explaining how to overcome those trivial circularities? Peter Smith’s essay Induction and predicativity at least addresses this question with respect to the natural numbers. But I still remember how Nik Weaver got angry when I linked to that essay, basically disagreeing with the starting point that this trivial circularity even exists. My most intuitive explanation of this circularity so far at least no longer caused angry replies, but perhaps this is because it stays so abstract (like a model of set theory where the continuum hypothesis is refuted) that it doesn’t hurt anyone. Nothing is said about where those infinite sequences of digits come from.

    So I wondered how to make this more concrete, in a spirit similar to how one shows that equality testing for computable real numbers is undecidable. The idea there is to take some Turing machine for which we would like to decide whether it halts, and write out the real number 0.000…000… with an additional zero for every step of the machine, outputting one final 1 when (and only if) the machine finally halts. Deciding whether this real number is equal to zero is equivalent to solving the halting problem for the given Turing machine.

    How can one do something similarly concrete for natural numbers? Suppose that you have a consistent (but maybe unsound/dishonest) axiom system that claims that some given Turing machine halts. Then this Turing machine defines a natural number in any model of that axiom system. To expose the trivial circularity, we can now use this natural number as the length of a finite string of digits. How do we get the digits themselves? We can just use any Turing machine which outputs digits, independent of whether it halts or not. (If it doesn’t halt, it would still be interesting whether it outputs a finite or an infinite number of digits, but in the worst case we can ask the axiom system again for its opinion on this matter.) The requirement here is that the string defined in this way should define a natural number in any model of that axiom system.
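    The real-number construction above can be sketched concretely (the function and its arguments are my own illustration, not a full Turing-machine simulator): write one digit per simulated step, all zeros, with a single 1 appearing only at the step where the machine halts. Deciding whether the resulting real number equals zero then amounts to deciding halting.

    ```python
    # Sketch of the undecidability-of-equality construction: the real number
    # 0.000...000... gets one digit per machine step, and a single 1 appears
    # only if/when the machine halts. Here the machine is abstracted into
    # `steps_until_halt`: the number of steps before halting, or None if it
    # runs forever (which, of course, no algorithm can determine in general).

    def digits(steps_until_halt, n):
        """Return the first n digits after the decimal point."""
        out = []
        for i in range(n):
            if steps_until_halt is not None and i == steps_until_halt:
                out.append(1)  # the machine halted at this step
            else:
                out.append(0)
        return out

    # A machine halting after 3 steps yields 0.0001000...; a non-halting
    # machine yields 0.000..., but no finite number of digit queries can
    # confirm that the latter really equals zero.
    print(digits(3, 8))     # → [0, 0, 0, 1, 0, 0, 0, 0]
    print(digits(None, 8))  # → [0, 0, 0, 0, 0, 0, 0, 0]
    ```

    Any equality test for such reals that only inspects finitely many digits would thereby solve the halting problem for the given machine, which is the point of the reduction.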

  65. gentzen Says:

    Sniffnoy #63:

    Deduction is a tool that works at the level of theory. It takes true statements and makes other true statements. By contrast, probability theory is a tool that works at the model level. We imagine a space of possible worlds, the worlds being mathematical objects, and we put a probability distribution on that space.

    I used to think the same about probability theory, until I read Kevin van Horn’s From Propositional Logic to Plausible Reasoning: A Uniqueness Theorem. He puts the (conditional) probability at the level of the logical consequence relation A |= B. For me, as someone who explicitly takes non-classical logic into account, this is a huge difference. I used to think that you must restrict yourself to classical logic if you want to use probability theory. However, A |= B is just true or false, even in non-classical logic. There is a slight difference in that A, B, C |= D is equivalent to A∧B∧C |= D in classical logic, but may be false in some non-classical logics. But A, B, C |= D can still be interpreted as the conditional probability P(D | A, B, C).

  66. Sniffnoy Says:

    gentzen #65: I must admit I’m not following what you’re saying.

    FWIW — and I realize this doesn’t seem to relate to your particular claim, but I think it’s worth noting — it does appear that what Van Horn sets out in that paper is also not logical probability in the sense I discussed above.

  67. Defining a natural number as a finite string of digits is circular | Gentzen translated Says:

    […] idea was part of yet another attempt to explain how defining a natural number as a finite string of digits is circular. This post will […]