The Limits of Mathematics

G. J. Chaitin, IBM Research Division, P. O. Box 704, Yorktown Heights, NY 10598, chaitin@watson.ibm.com

Journal of Universal Computer Science 2, No. 5 (1996), pp. 270-305

Introduction

In a remarkable development, I have constructed a new definition for a self-delimiting universal Turing machine (UTM) that is easy to program and runs very quickly. This provides a new foundation for algorithmic information theory (AIT), which is the theory of the size in bits of programs for self-delimiting UTM's. Previously, AIT had an abstract mathematical quality. Now it is possible to write down executable programs that embody the constructions in the proofs of theorems. So AIT goes from dealing with remote idealized mythical objects to being a theory about practical down-to-earth gadgets that one can actually play with and use.

This new self-delimiting UTM is implemented via software written in a new version of LISP that I invented especially for this purpose. This LISP was designed by writing an interpreter for it in Mathematica that was then translated into C. I have tested this software by running it on IBM RS/6000 workstations with the AIX version of UNIX.

Using this new software and the latest theoretical ideas, it is now possible to give a self-contained ``hands on'' mini-course presenting very concretely my latest proofs of my two fundamental information-theoretic incompleteness theorems. The first of these theorems states that an N-bit formal axiomatic system cannot enable one to exhibit any specific object with program-size complexity greater than N + c. The second of these theorems states that an N-bit formal axiomatic system cannot enable one to determine more than N + c' scattered bits of the halting probability Ω.

Most people believe that anything that is true is true for a reason. These theorems show that some things are true for no reason at all, i.e., accidentally, or at random.

As is shown in this course, the algorithms considered in the proofs of these two theorems are now easy to program and run, and by looking at the size in bits of these programs one can actually, for the first time, determine exact values for the constants c and c'.

I used this approach and software in an intensive short course on the limits of mathematics that I gave at the University of Maine in Orono in the summer of 1994. I also lectured on this material during a stay at the Santa Fe Institute in the spring of 1995, and at a meeting at the Black Sea University in Romania in the summer of 1995. A summary of the approach that I used on these three occasions will appear under the title ``A new version of algorithmic information theory'' in a forthcoming issue of the new magazine Complexity, which has just been launched by the Santa Fe Institute and Wiley. A less technical discussion of the basic ideas involved, ``How to run algorithmic information theory on a computer'', will also appear in Complexity.

After presenting this material at these three different places, it became obvious to me that the material is extremely difficult to understand in its original form. So next time, at the Rovaniemi Institute of Technology in the spring of 1996, I am going to use the new, more understandable software in this report; everything has been redone in an attempt to make it as easy to understand as possible.

For their stimulating invitations, I thank Prof. George Markowsky of the University of Maine, Prof. Cristian Calude of the University of Auckland, Prof. John Casti of the Santa Fe Institute, and Prof. Veikko Keränen of the Rovaniemi Institute of Technology. And I am grateful to IBM for supporting my research for almost thirty years, and to my current management chain at the IBM Research Division, Dan Prener, Christos Georgiou, Eric Kronstadt, Jeff Jaffe, and Paul Horn.

This report includes the Mathematica code for the LISP interpreter lisp.m, and the LISP runs *.r used to present the information-theoretic incompleteness theorems of algorithmic information theory. This report also includes the input to the interpreter *.l for producing these LISP runs, and the C version of the interpreter lisp.c.

The New Idea

Here is a quick summary of this new LISP, in which atoms can now be either words or unsigned decimal integers. First of all, comments are written like this: [comment]. Each LISP primitive function has a fixed number of arguments. ' is QUOTE, = is EQ, and atom, car, cdr, cadr, caddr, cons are provided with their usual meaning. We also have lambda, define, let, if, display and eval. The notation " indicates that what follows is an S-expression with explicit parentheses, rather than, as is usual in this LISP, an M-expression, in which the parentheses for each primitive function are implicit. nil denotes the empty list (), and the logical truth values are true and false. For dealing with unsigned decimal integers we have +, -, *, ^, <, >, <=, >=, base10-to-2, base2-to-10.
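Because every primitive function has a fixed number of arguments, converting an M-expression into a fully parenthesized S-expression is purely mechanical. Here is a minimal sketch of that conversion in Python (not part of the actual interpreter; the arity table is an illustrative subset of the primitives listed above):

```python
# Sketch (not Chaitin's code): infer the parentheses of an M-expression
# from a fixed arity table. Atoms such as nil, numbers, and variables
# are any tokens not listed in the table.

ARITY = {
    "quote": 1, "atom": 1, "car": 1, "cdr": 1, "cadr": 1, "caddr": 1,
    "cons": 2, "eval": 1, "display": 1, "read-bit": 0, "read-exp": 0,
    "if": 3, "=": 2, "+": 2, "-": 2, "*": 2, "^": 2,
}

def m_to_s(tokens):
    """Consume tokens of an M-expression; return an S-expression string."""
    tok = tokens.pop(0)
    if tok not in ARITY:                 # atom: nil, number, variable, ...
        return tok
    args = [m_to_s(tokens) for _ in range(ARITY[tok])]
    return "(" + " ".join([tok] + args) + ")"

print(m_to_s("cons eval read-exp cons eval read-exp nil".split()))
# (cons (eval (read-exp)) (cons (eval (read-exp)) nil))
```

The example run is exactly the pairing construction used later in the course outline.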

So far this is fairly standard. The new idea is this. We define our standard self-delimiting universal Turing machine as follows. Its program is in binary, and appears on a tape in the following form. First comes a LISP expression, written in ASCII with 8 bits per character, and terminated by an end-of-line character `\n'. The TM reads in this LISP expression, and then evaluates it. As it does this, two new primitive functions read-bit and read-exp with no arguments may be used to read more from the TM tape. Both of these functions explode if the tape is exhausted, killing the computation. read-bit reads a single bit from the tape. read-exp reads in an entire LISP expression, in 8-bit character chunks, until it reaches an end-of-line character `\n'.

This is the only way that information on the TM tape may be accessed, which forces it to be used in a self-delimiting fashion. This is because no algorithm can search for the end of the tape and then use the length of the tape as data in the computation. If an algorithm attempts to read a bit that is not on the tape, the algorithm aborts.
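The tape discipline can be sketched in Python as follows (an illustrative stand-in, not the interpreter's code): the only operation is consuming the next bit, and exhausting the tape raises an error that kills the computation, so no program can detect the end of the tape and survive.

```python
# Sketch of the self-delimiting tape discipline (names are assumptions):
# bits are consumed one at a time; reading past the end aborts the
# computation instead of signaling end-of-tape.

class OutOfData(Exception):
    """Raised when a program tries to read a bit that is not on the tape."""

class Tape:
    def __init__(self, bits):
        self.bits = list(bits)
        self.pos = 0

    def read_bit(self):
        if self.pos >= len(self.bits):
            raise OutOfData()       # no end-of-tape test is possible
        b = self.bits[self.pos]
        self.pos += 1
        return b

t = Tape([1, 0, 1])
print(t.read_bit(), t.read_bit(), t.read_bit())   # 1 0 1
```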

How is information placed on the TM tape in the first place? Well, in the starting environment, the tape is empty and any attempt to read it will give an error message. To place information on the tape, one must use the primitive function try which tries to see if an expression can be evaluated.

Consider the three arguments α, β and γ of try. The meaning of the first argument is as follows. If α is no-time-limit, then there is no depth limit. Otherwise α must be an unsigned decimal integer, and gives the depth limit (limit on the nesting depth of function calls and re-evaluations). The second argument β of try is the expression to be evaluated as long as the depth limit α is not exceeded. And the third argument γ of try is a list of bits to be used as the TM tape.

The value ν returned by the primitive function try is a triple. The first element of ν is success if the evaluation of β was completed successfully, and the first element of ν is failure if this was not the case. The second element of ν is out-of-data if the evaluation of β aborted because an attempt was made to read a non-existent bit from the TM tape. The second element of ν is out-of-time if evaluation of β aborted because the depth limit α was exceeded. These are the only possible error flags, because this LISP is designed with maximally permissive semantics. If the computation β terminated normally instead of aborting, the second element of ν will be the result produced by the computation β, i.e., its value. That's the second element of the list ν produced by the try primitive function.

The third element of the value ν is a list of all the arguments to the primitive function display that were encountered during the evaluation of β. More precisely, if display was called N times during the evaluation of β, then the third element of ν will be a list of N elements. The N arguments of display appear in the third element of ν in chronological order. Thus try can not only be used to determine if a computation β reads too much tape or goes on too long (i.e., to greater depth than α), but try can also be used to capture all the output that β displayed as it went along, whether the computation β aborted or not.
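The calling convention just described can be mocked up in Python along these lines (the thunk interface and exception names are assumptions made for the sketch, not the interpreter's internals): whatever happens inside β, the caller always gets back the triple (success/failure, value or error flag, captured display output).

```python
# Mock-up of try's result triple. The evaluation of beta is represented
# as a Python thunk that appends to `displays` when it calls display.

class OutOfData(Exception): pass
class OutOfTime(Exception): pass

def try_(thunk, displays):
    """Evaluate thunk; package the outcome as try's triple."""
    try:
        value = thunk()
        return ("success", value, displays)
    except OutOfData:
        return ("failure", "out-of-data", displays)
    except OutOfTime:
        return ("failure", "out-of-time", displays)

out = []
def beta():
    out.append("x")            # simulates (display x)
    raise OutOfTime()          # simulates exceeding the depth limit

print(try_(beta, out))         # ('failure', 'out-of-time', ['x'])
```

Note that the display output is delivered even though the computation aborted, matching the behavior described above.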

In summary, all that one has to do to simulate a self-delimiting universal Turing machine U(p) running on the binary program p is to write

          try no-time-limit 'eval read-exp p
This is an M-expression with parentheses omitted from primitive functions. (Recall that all primitive functions have a fixed number of arguments.) With the parentheses supplied, it becomes the S-expression
          (try no-time-limit ('(eval(read-exp))) p)
This says that one is to read a complete LISP S-expression from the TM tape p and then evaluate it without any time limit and using whatever is left on the tape p.

Some more primitive functions have also been added. The 2-argument function append denotes list concatenation, and the 1-argument function bits converts an S-expression into the list of the bits in its ASCII character string representation. These are used for constructing the bit strings that are then put on the TM tape using try's third argument γ. We also provide the 1-argument functions size and length that respectively give the number of characters in an S-expression, and the number of elements in a list. Note that the functions append, size and length could be programmed rather than included as built-in primitive functions, but it is extremely convenient and much much faster to provide them built in.
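Under the stated convention of 8 bits per ASCII character, a Python stand-in for bits might look like this (the most-significant-bit-first order is an assumption of the sketch):

```python
# Sketch of the bits primitive: explode a character string into the bits
# of its ASCII representation, 8 bits per character, MSB first (assumed).

def bits(s):
    return [(ord(ch) >> k) & 1 for ch in s for k in range(7, -1, -1)]

print(bits("a"))          # [0, 1, 1, 0, 0, 0, 0, 1]  (97 in binary)
print(len(bits("(a)")))   # 24: three characters, 8 bits each
```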

Finally a new 1-argument identity function debug with the side-effect of outputting its argument is provided for debugging. Output produced by debug is invisible to the ``official'' display and try output mechanism. debug is needed because try α β γ suppresses all output θ produced within its depth-controlled evaluation of β. Instead try collects all output θ from within β for inclusion in the final value ν that try returns, namely ν = (success/failure, value of β, θ).

Course Outline

The course begins by explaining with examples my new LISP. See examples.r.

Then the theory of LISP program-size complexity is developed a little bit. LISP program-size complexity is extremely simple and concrete. In particular, it is easy to show that it is impossible to prove that a self-contained LISP expression is elegant, i.e., that no smaller expression has the same value. More precisely, a formal axiomatic system of LISP complexity N cannot prove that a LISP expression is elegant if the LISP expression is more than N + 410 characters long. See godel.r.

Next we define our standard self-delimiting universal Turing machine U(p) using

          cadr try no-time-limit 'eval read-exp p
as explained in the previous section.

Next we show that

H(x, y) ≤ H(x) + H(y) + c

with c = 432. Here H(...) denotes the size in bits of the smallest program that makes our standard universal Turing machine compute .... Thus this inequality states that the information needed to compute the pair (x, y) is bounded by a constant c plus the sum of the information needed to compute x and the information needed to compute y. Consider

          cons eval read-exp
          cons eval read-exp
               nil
This is an M-expression with parentheses omitted from primitive functions. With all the parentheses supplied, it becomes the S-expression
          (cons (eval (read-exp))
          (cons (eval (read-exp))
                nil))
c = 432 is just 8 bits plus 8 times the size in characters of this LISP S-expression. See utm.r.
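This arithmetic is easy to check directly, assuming the S-expression is measured in the single-spaced printed form shown above (53 characters), with the extra 8 bits presumably accounting for the terminating end-of-line character:

```python
# Checking c = 432: 8 bits plus 8 bits per character of the S-expression.
# The single-spaced printed form below is an assumption about how the
# interpreter measures size.

sexp = "(cons (eval (read-exp)) (cons (eval (read-exp)) nil))"
print(len(sexp))           # 53
print(8 + 8 * len(sexp))   # 432
```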

Consider a binary string x whose size is | x | bits. In utm.r we also show that

H(x) ≤ 2| x | + c

and

H(x) ≤ | x | + H(| x |) + c'

with c = 1106 and c' = 1024. As before, the programs for doing this are exhibited and run.
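A standard construction behind a bound of the shape H(x) ≤ 2|x| + c (a sketch of the idea only, not necessarily the exact program in utm.r) sends each bit of x twice and then marks the end with an unequal pair, so the decoder can recover x from the tape without being told |x| in advance, i.e., the encoding is self-delimiting:

```python
# Self-delimiting "doubling" code: 2|x| + 2 bits on the tape, plus a
# constant-size decoding program, gives H(x) <= 2|x| + c.

def encode(x):
    out = []
    for b in x:
        out += [b, b]          # each bit of x sent twice
    out += [0, 1]              # end marker: an unequal pair
    return out

def decode(tape):
    x, i = [], 0
    while True:
        a, b = tape[i], tape[i + 1]
        i += 2
        if a != b:             # unequal pair: end of x
            return x
        x.append(a)

print(decode(encode([1, 0, 1])))   # [1, 0, 1]
```

The sharper bound H(x) ≤ |x| + H(|x|) + c' comes from the same idea applied at a higher level: first transmit |x| self-delimitingly, then the |x| raw bits of x.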

Next we turn to the self-delimiting program-size complexity H(X) for infinite r.e. sets X. This is defined to be the size in bits of the smallest LISP expression x that executes forever without halting and outputs the members of the r.e. set X using the LISP primitive display, which is an identity function with the side-effect of outputting the value of its argument. Note that this LISP expression x is allowed to read additional bits or expressions from the TM tape using the primitive functions read-bit and read-exp if x so desires. But of course x is charged for this; this adds to x's program size.

It is in order to deal with such unending expressions x that the LISP primitive function for time-limited evaluation try captures all output from display within its second argument β.

Now consider a formal axiomatic system A of complexity N, i.e., with a set of theorems TA that considered as an r.e. set as above has self-delimiting program-size complexity H(TA) = N. We show that A cannot enable us to exhibit a specific S-expression s with self-delimiting complexity H(s) greater than N + c. Here c = 4872. See godel2.r.

Next we show two different ways to calculate the halting probability Ω of our standard self-delimiting universal Turing machine in the limit from below. See omega.r and omega2.r. The first way of doing this, omega.r, is straightforward. The second way to calculate Ω, omega2.r, uses a more clever method. Using the clever method as a subroutine, we show that if ΩN is the first N bits of the fractional part of the base-two real number Ω, then

HN) > Nc

with c = 8000. Again this is done with a program that can actually be run and whose size gives us a value for c. See omega3.r.
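The limit-from-below computation can be illustrated with a toy Python sketch (the halting predicate here is a made-up stand-in, not the standard UTM of this course): programs are enumerated in length order, each halting program p contributes 2^(-|p|) to the sum, and extensions of halting programs are skipped so that the contributing set stays prefix-free, as self-delimitation requires.

```python
# Toy lower approximation of a halting probability. `halts` is a
# hypothetical stand-in predicate; a real run would dovetail time-limited
# evaluations on the actual UTM instead.

from itertools import product

def omega_lower_bound(halts, max_len):
    total, halted = 0.0, []
    for n in range(1, max_len + 1):
        for tup in product("01", repeat=n):
            p = "".join(tup)
            if any(p.startswith(q) for q in halted):
                continue               # extension of a halting program
            if halts(p):
                halted.append(p)
                total += 2.0 ** (-len(p))
    return total

# Toy machine: a program halts exactly when its first bit is 1,
# so the only contributing program is "1" and the sum is 2^-1.
print(omega_lower_bound(lambda p: p[0] == "1", 6))   # 0.5
```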

Consider again the formal axiomatic system A with complexity N, i.e., with self-delimiting program-size complexity H(TA) = N. Using the lower bound of Nc on HN) established in omega3.r, we show that A cannot enable us to determine more than the first N + c' bits of Ω. Here c' = 15328. In fact, we show that A cannot enable us to determine more than N + c' bits of Ω even if they are scattered and we leave gaps. See godel3.r.

Last but not least, the philosophical implications of all this should be discussed, especially the extent to which it tends to justify experimental mathematics. This would be along the lines of the discussion in my talk transcript ``Randomness in arithmetic and the decline and fall of reductionism in pure mathematics''.

This concludes our ``hands-on'' mini-course on the information-theoretic limits of mathematics.

LISP Source Code

The above LISP interpreter runs were produced using the C version of the interpreter and the following input files: examples.l, godel.l, utm.l, godel2.l, omega.l, omega2.l, omega3.l, godel3.l.

Here is another complete set of source (and output). This time the code is heavily commented, debug output is included, auxiliary functions are tested, and the C version of the LISP interpreter was also used. Source code: xexamples.l, xgodel.l, xutm.l, xgodel2.l, xomega.l, xomega2.l, xomega3.l, xgodel3.l. LISP runs: xexamples.r, xgodel.r, xutm.r, xgodel2.r, xomega.r, xomega2.r, xomega3.r, xgodel3.r.

Bibliography

Here is a useful collection of hand-outs for this course:
  1. G. J. Chaitin, ``Randomness in arithmetic and the decline and fall of reductionism in pure mathematics'', in J. Cornwell, Nature's Imagination, Oxford University Press, 1995, pp. 27-44.
  2. G. J. Chaitin, ``The Berry paradox'', Complexity 1:1 (1995), pp. 26-30.
  3. G. J. Chaitin, ``A new version of algorithmic information theory'', Complexity, to appear.
  4. G. J. Chaitin, ``How to run algorithmic information theory on a computer'', Complexity, to appear.