It is important to note that, from the mathematical point of view, there is no debate about whether Kolmogorov complexity is suitable or serious: it *is* the mathematical measure of randomness, period. And this is not merely by decree. Just as the theory of computability converged in the 1930s, when Turing, Post, Church, Kleene and others tried to define the concept of algorithm and all arrived at equivalent theories, the development of Kolmogorov complexity underwent a similar convergence of definitions: the authors mentioned above (notably Chaitin and Levin), together with Schnorr and Martin-Löf, proved that all these theories characterizing randomness (of both finite strings and infinite sequences) are equivalent in a strong sense (even if some subtle differences arise when passing between finite strings and infinite sequences). All these approaches characterize the same notion of randomness via incompressibility (Kolmogorov, Chaitin), unpredictability (Schnorr) and statistical typicality (Martin-Löf). For more see: http://www.scholarpedia.org/article/Algorithmic_randomness

Concerning the uncomputability of Kolmogorov complexity: it is more precise to call K an upper semicomputable function (approximable from above), which is very different from being “simply” uncomputable. It means that one can compute upper bounds, as Scott has pointed out, and hence that compressibility is a sufficient test for non-randomness. That is, even if you cannot prove that a program is the shortest one generating a string x, if you do find a short program generating x then you can safely declare x non-random. Again, if you are worried about the choice of universal Turing machine or programming language, bear in mind that measures computed with different universal Turing machines or programming languages differ only by a “small” constant (the size of a translation program between the chosen machine and any other). This is the content of the so-called invariance theorem in the theory of Kolmogorov complexity.
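Since any lossless compressor gives such an upper bound, this one-sided test is easy to try out. Here is a minimal Python sketch using zlib purely as an illustration (the bound it yields is, of course, far looser than the true K, and an incompressible-by-zlib string may still have a short program):

```python
import os
import zlib

# The compressed size of a string (plus the fixed size of a decompressor)
# upper-bounds its Kolmogorov complexity, so a short compressed form
# certifies non-randomness. The converse fails: zlib's failure to
# compress proves nothing about true randomness.
def k_upper_bound_bits(s: bytes) -> int:
    return 8 * len(zlib.compress(s, 9))

patterned = b"01" * 5000        # highly regular: compresses dramatically
typical = os.urandom(10_000)    # a typical string: essentially incompressible
```

Running it, `k_upper_bound_bits(patterned)` comes out tiny compared to the 80,000 bits of input, so `patterned` is safely declared non-random, while `typical` yields a bound slightly *above* its own length, which tells us nothing.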

Now, it is true that the invariance theorem doesn’t say how “small” that constant is or, put differently, how fast evaluations of K on different universal Turing machines or programming languages converge to the same values. However, experimental work has been done in this respect, with interesting results (able to classify and characterize space-time diagrams, images, and even graphs and networks), strongly suggesting that the rate of convergence and the constants involved are actually small and controllable. Here are references available online:

http://arxiv.org/abs/1306.0322

http://arxiv.org/abs/1212.6745

http://arxiv.org/abs/1211.1302

http://arxiv.org/abs/1211.4891

(forthcoming in the journal Computability)

There is even an online Kolmogorov complexity calculator that provides upper bounds using these approaches:

http://complexitycalculator.com/

It comes with explanations that we hope will help readers understand how to evaluate K and other complexity measures.

References on Kolmogorov complexity and algorithmic randomness:

– Calude’s *Information and Randomness* (Springer)

– Li and Vitányi’s *An Introduction to Kolmogorov Complexity and Its Applications* (Springer) (already mentioned on two occasions by Scott himself)

– Downey and Hirschfeldt’s *Algorithmic Randomness and Complexity* (Springer)

– Nies’ *Computability and Randomness* (Oxford)

I also dare to recommend my own edited volume:

– Zenil (ed.), *Randomness Through Computation: Some Answers, More Questions* (World Scientific and Imperial College Press), with contributions from most of the pioneers and current researchers in Kolmogorov complexity and algorithmic randomness, explaining their work, its challenges and its future.

Some colleagues and I are also preparing a book, to be published by Springer next year under the title *Methods and Applications of Kolmogorov Complexity*, which presents the method we have been using: applying the algorithmic Coding theorem (a beautiful theorem connecting the Kolmogorov complexity of a string to the frequency with which random programs produce it) to evaluate K, as an alternative to the lossless-compression approach used hitherto (cf. the papers recommended above).
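As a loose illustration of the Coding-theorem idea (not the actual method of the book or the papers above, which enumerate small Turing machines), here is a toy sketch: exhaustively run every program of a made-up two-instruction language, tally how often each output appears, and estimate K(s) as minus the log of that output frequency:

```python
from itertools import product
from math import log2

# Toy "machine" (invented for illustration): a program is a string over
# {'+', '*'}; starting from 1, '+' adds one and '*' doubles. The real
# method enumerates small Turing machines instead.
def run(prog):
    x = 1
    for op in prog:
        x = x + 1 if op == '+' else 2 * x
    return x

def ctm_estimates(max_len):
    # Exhaustively execute every program up to length max_len,
    # tallying how often each output occurs.
    counts, total = {}, 0
    for length in range(1, max_len + 1):
        for prog in product('+*', repeat=length):
            out = run(prog)
            counts[out] = counts.get(out, 0) + 1
            total += 1
    # Coding theorem: K(s) is close to -log2 m(s), where m(s) is the
    # probability that a random program outputs s.
    return {s: -log2(c / total) for s, c in counts.items()}

K = ctm_estimates(8)
```

Outputs producible by many programs receive low estimated complexity, which is exactly the qualitative behavior the Coding theorem predicts; the estimated probabilities 2^(−K(s)) sum to 1 by construction.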

Last, but not least, consider that Kolmogorov complexity (through computable variants in the form of similarity distances and information metrics) has proven very successful at spam filtering, authorship detection and phylogenetic classification, among other applications.

To answer the last comment (#52): of course Solomonoff’s paper is about Kolmogorov complexity, and MDL is clearly a byproduct of Kolmogorov complexity too. Solomonoff, together with Chaitin and Kolmogorov, is fully credited for Kolmogorov(-Chaitin-Solomonoff) complexity! All three proved the invariance theorem, but Solomonoff was more interested in the problem of induction, which led to the theory of algorithmic probability and optimal learning.

In fact, it is no random event, IMHO, that the first paper in machine learning was also among the first in Kolmogorov complexity (Solomonoff 1956).

*Using the techniques of today, it looks like the best one can do is essentially to construct a Turing machine that explicitly enumerates the theorems of PA or ZFC—in which case the numerical bounds would be absolutely terrible, since one would need to code up the axiom schema, the inference rules of first-order logic, etc. (Probably the situation is somewhat better in a programming language like Lisp or Python.)*

I’d still be interested in knowing those bounds. While they’re big, they’re not *that* big: there are lots of things in the universe that *seem* to (and probably do) have a higher Kolmogorov complexity than that.

But yes, it would be even better if someone could significantly chop down these ‘naive’ bounds.

*… my own intuition—and Friedman seems to agree—is that this problem actually provides an incredibly useful and underappreciated yardstick. *

I tend to agree with you. But if you haven’t yet, it’s worth reading Panu Raatikainen’s opposing view in “On Interpreting Chaitin’s Incompleteness Theorem”, *Journal of Philosophical Logic* **27** (1998). At the very least it calls for fine-tuning any claims one wants to make.

1) regarding complexity:

Would you mind answering my question in post #37, or maybe explaining why it’s not an interesting question?

2) regarding “simpler” statements equivalent to Con(PA) or Con(ZFC), and inspired by Stefan Geschke’s answer on MathOverflow:

What about trying all diophantine equations below some threshold? The idea would be that coding “Y” as an integer and “try all solutions to all diophantine equations using fewer than X variables” can require far less space than “try all solutions to the particular diophantine equation Z”.

Pseudocode:

a) select a set of X integers (say X = 10000)

b) try it as a tentative solution to every diophantine equation with fewer than X variables

c) count the number of equations solved so far

d) if this count exceeds a certain integer Y, halt; otherwise redo (a–d) with the next set of integers

For a well-chosen pair of integers X and Y, this will halt iff a counterexample to Con(PA) or Con(ZF) can be found. X is “easy” to find (write down the most obvious equivalent diophantine equation and look at its size), whereas Y is absolutely non-trivial (but guaranteed to exist), as it is the number of solvable diophantine equations (with fewer than X variables) not equivalent to Con(PA) or Con(ZF).
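A runnable toy version of this counting search, with an entirely hypothetical stand-in for the enumeration of diophantine equations (three hand-picked polynomials over the naturals, so the “fewer than X variables” bound is only decorative here):

```python
from itertools import count, product

def equations(max_vars):
    # Hypothetical stand-in: in the real construction this would
    # enumerate all diophantine equations with < max_vars variables.
    return [
        lambda v: v[0] - 3,                 # x = 3            (solvable)
        lambda v: v[0]**2 + v[1]**2 - 25,   # x^2 + y^2 = 25   (solvable)
        lambda v: v[0]**2 + 1,              # x^2 = -1         (unsolvable)
    ]

def search(X, Y):
    # Dovetail over ever-larger boxes of natural-number tuples, counting
    # how many equations have been solved so far; halt once that count
    # exceeds the expected number Y of solvable equations.
    eqs = equations(X)
    solved = set()
    for bound in count(1):
        for v in product(range(bound), repeat=X):
            for i, p in enumerate(eqs):
                if p(v) == 0:
                    solved.add(i)
        if len(solved) > Y:
            return solved
```

With Y = 1, `search(2, 1)` halts as soon as the two solvable equations are both found; with Y = 2 (the true number of solvable equations) it would run forever, mirroring how the real program halts only if the Con(ZF)-encoding equation unexpectedly has a solution.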

Two interesting (?) observations, however:

– this guarantees the existence of a TM much shorter than the “obvious” ones

– one can design a series of TMs equivalent to the consistency of any interesting set of axioms, and the size of these TMs will grow at most as the log of the number of variables in the most obvious reduction to a diophantine equation.

Does that partially answer your question, or were you thinking of an even more drastic shortening?

I feel exactly the same way! As you say, it may take a while for someone to figure this out. It may take someone who knows a lot about thermodynamics *and* computation, and thinks about both at once for a long time.

*As you freely admit, right now the choices of log(runtime), program output as a positive integer, etc. seem totally arbitrary: why not the space usage or the runtime mod 2?*

Well, a lot of thermodynamics is about *maximizing entropy subject to constraints on quantities you care about*: from here we inevitably get the partition function, the fact that each of our quantities has a “conjugate”, and all those confusing relationships involving partial derivatives of our quantities and their conjugates.

*You get to pick* the quantities you care about. If you’re a physicist, you care about energy, so you’ll be forced into thinking about its conjugate: roughly, temperature. If you’re a physicist studying a box of gas, you care about its volume, so you’ll also be forced into thinking about its conjugate: roughly, pressure. And so on.

But if you’re a computer scientist, you’re interested in other things. We picked some examples of things you might be interested in, to illustrate this philosophy. But it’s really up to you. If the runtime mod 2 is what turns you on (not likely), go ahead and use that! All the basic framework will still apply.

Our plan had been to study more properties of the partition function Z(beta, gamma, delta), which we begin to discuss in Section 4.4. That’s the “obvious thing to do” from a thermodynamics perspective. The rate at which it diverges as beta, gamma and delta approach certain values should be interesting, and could provide a new way of thinking about the average behavior of computer programs.
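To make the thermodynamic analogy concrete, here is a toy numerical sketch. All the data are made up, standing in for a finite truncation of the ensemble of halting programs, with the paper’s choices E = log(runtime), V = output, N = program length:

```python
import math

# Entirely invented attributes (E, V, N) for four hypothetical programs:
# E = log of runtime, V = output value, N = length in bits.
programs = [
    (math.log(3), 1, 2),
    (math.log(5), 2, 3),
    (math.log(17), 1, 3),
    (math.log(120), 4, 4),
]

def Z(beta, gamma, delta):
    """Partition function: the sum of Boltzmann weights over programs."""
    return sum(math.exp(-beta * E - gamma * V - delta * N)
               for E, V, N in programs)

def mean_log_runtime(beta, gamma, delta):
    """Ensemble average of E, i.e. -d(log Z)/d(beta), via the weights."""
    weights = [math.exp(-beta * E - gamma * V - delta * N)
               for E, V, N in programs]
    return sum(w * E for w, (E, _, _) in zip(weights, programs)) / sum(weights)
```

At beta = gamma = delta = 0 the partition function simply counts programs, and differentiating log Z in beta recovers the average log-runtime, exactly as in ordinary statistical mechanics where conjugate quantities fall out of log Z.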

Unfortunately, the partition function seems highly dependent on the choice of universal Turing machine. We say a bit about this. The problems we point out arise from “goofy” universal Turing machines where infinitely many programs halt immediately, or all programs take a long time to run. So, I wonder if there’s an interesting class of “non-goofy” universal Turing machines.

It seems we can’t say much about “the average behavior of computer programs” if we can’t define and exclude goofy universal Turing machines, which are like goofy programming languages that nobody would actually want to use.

Thanks for the comment!

Regarding your and Bruce Smith’s wonderful question: welcome to the club; I’ve also been asking people for explicit numerical bounds on where the “onset of undecidability” occurs! In fact, half a year ago, I asked a closely-related question on MathOverflow. I didn’t get a definite answer, but I did learn that there’s approximately *one* logician who’s actively worked on this problem: Harvey Friedman.

Using the techniques of today, it looks like the best one can do is essentially to construct a Turing machine that explicitly enumerates the theorems of PA or ZFC—in which case the numerical bounds would be absolutely terrible, since one would need to code up the axiom schema, the inference rules of first-order logic, etc. (Probably the situation is somewhat better in a programming language like Lisp or Python.) Finding a *short* program whose behavior is provably independent of PA or ZFC seems closely-related to the notorious problem of finding “natural” Gödel-undecidable arithmetical statements. Thus, while finding the smallest N such that the N^{th} Busy Beaver number is independent of ZFC might be dismissed as a parlor game, my own intuition—and Friedman seems to agree—is that this problem actually provides an incredibly useful and underappreciated yardstick. I doubt much progress can be made on it without some genuine breakthroughs in mathematical foundations!

Moving on to your paper with Mike Stay. You know, every time I read anything technical you’ve written, my head spins from notions that look at first glance like “compile-time errors”: a finite field with one element. Programs that never halt as “singularities to be patched up.” The log of a program’s runtime as analogous to the energy of a gas container. Then I wish my brain would expand enough that these notions would compile.

Since I already knew (and accepted) that algorithmic entropy was closely related to thermodynamic entropy, I find the new quantities that you and Stay define—algorithmic temperature, algorithmic heat, algorithmic pressure—deliciously tantalizing. But I kept asking myself what these quantities are *good for*—i.e., what questions in complexity or computability could they help address? Indeed, until I knew a use to which these quantities were going to be put, I don’t think I could even understand the more basic matter of why we should *define* them in one way rather than another way. (As you freely admit, right now the choices of log(runtime), program output as a positive integer, etc. seem totally arbitrary: why not the space usage or the runtime mod 2?)

On the other hand, the metaphor you construct is so intriguing that, even if neither you nor I can think of any use for it at present, I can readily imagine *someone, someday* finding a use!

On the third hand, I suspect that the *biggest* application of your metaphor might be one that you don’t even mention in the paper. Namely, it could provide a perfect pedagogical tool for *explaining* thermodynamic concepts to people with math/CS backgrounds! When I took chemistry and physics as an undergrad, I was driven up a wall by the constant demands to manipulate the partial derivatives of magic quantities called “entropy,” “temperature,” and “pressure” using rules that one memorized, without ever seeing what those quantities *were* in a mathematically self-contained way. (An unrealistic toy system would’ve been fine, as long as I was told everything about it! Best of all would’ve been a *discrete* toy system.)

Today, if I ever feel a need to relearn basic thermodynamics, your paper is going to be the first thing I consult—even though it almost certainly wasn’t intended for that purpose!

*I also learned a valuable sociological lesson about the “two kinds of complexity theory”…*

Yeah: if you explain things clearly, more people will think about them – and the results are not always pretty. 🙂

Still worth the risk, I’d say.

I’ve been thinking about Kolmogorov complexity a bit these days. I was going to mention my paper with Mike Stay on algorithmic thermodynamics, but I see Jochen beat me to it. I still have an excuse for doing so, though: a little correction. Our paper does *not* apply Kolmogorov complexity to thermodynamics. Rather, we take the relation between Kolmogorov complexity and entropy and extend it so we can apply more concepts from thermodynamics to the theory of computation! I think there’s room for more work in this reverse direction – I hope people try it.

I also wrote up a pop explanation of how Kritchman and Raz proved Gödel’s second incompleteness theorem using Kolmogorov complexity and the idea behind the ‘surprise examination’ paradox, here. And Bruce Smith and I asked a question which you or your readers might be able to answer.

@ Comment #41: Classy response.

“Fundamentally, though, this isn’t a current events blog; it’s a procrastination blog.”

Great closer!

Ha, I epically resemble that remark.

– Edsger Dijkstra

And if it weren’t, what would Scott have to say about Jobs? Also,

“If you have n bits of axioms, you can never prove that a program is the smallest possible if it is more than n bits long.”

– Gregory Chaitin

So in the limit, talking to the iPhone 4S is just as useful as talking to another piece of plastic.
