Most mathy people have a pretty good mental model of what a random process is (for example, generating a sequence of 20 independent bits).
I think most mathy people also have the intuition that there’s a sense in which an individual string like
10101001110000100101 is more “random” than
0000000000000000000 even though both strings are equally likely under the above random process, but they don’t know how formalize it, and may even doubt that there is any way to make sense of this intuition.
Mathematical logic (or maybe theoretical computer science) has a method for quantifying the randomness of individual strings: given a string , the Kolmogorov complexity of is the length of the shortest Turing machine that outputs it.
In this blog post, I would like to explain why I think this is a very satisfying definition.
I think a good way to help avoid philosophical quagmires when thinking about randomness is to recognize that random numbers are useful in the real world, and to make sure that your thinking about randomness preserves that.
For example, there are algorithms that take a fixed length string , and produce the correct answer to whatever problem they’re trying to solve on some large proportion of all the length strings. Then a good approach would just be to feed a random , and you’ll get the right answer with probability .
Just to give a concrete example: a very familiar way that random numbers are useful is to estimate the average of a large list of numbers by taking a random sample and averaging them. You might have a list of 1000 numbers (say, bounded between 0 and 10), and have encode a set of 100 indices, then will return the average of the numbers at those indices. If you say that succeeds for this problem if it returns an average that’s within some fixed tolerance of the true average, then you can work out for the given tolerance (although I think getting exact numbers for this problem is actually pretty tricky).
The reason that I think that the Kolmogorov complexity is a good account of randomness is that they above story “factors” through Kolmogorov complexity in the following way: For any computable where is high enough (in a sense to be made precise below), there is an integer such that:
- For all with , returns a correct answer.
- Almost all (of the given length ) have .
That is, Kolmogorov complexity lets you view the problem as follows: Any string of high complexity will yield the right answer when fed into , so the only role of randomness is as an easy way to generate a string of a high Kolmogorov complexity.
As a note: the notation means the shortest Turing program that outputs when given as an input. The reason for using this concept instead of is that we want to, e.g., consider any string of all 0s to be low complexity, even if the length of the string happens to be a high complexity number.
Some Rough Intuitions
The intuition for why almost all strings should have high Kolmogorov complexity is that there are only so many Turing machines: For example, there are strings of length and Turing machines of length , so the proportion of strings of Kolmogorov complexity must be at least .
The intuition for why should be correct for all strings of sufficiently high complexity is as follows: We’re presuming that is correct for most strings, and that is computable. If isn’t correct, that means you can describe it fairly succinctly: i.e., as the th string for which isn’t correct. This will be a short description since, by presumption, will be small.
I said above that this fact about Kolmogorov complexity only holds if is high
enough. How can we formalize this? One approach would be to consider a sequence of algorithms instead of a single as above. Each algorithm should return a correct answer on at least of its input strings. Furthermore, the different algorithms should be consistent: specifically, if returns the correct answer, then so should for .
Now, if we kept the size of the input string fixed, then this would be trivial, since for greater than , would have to return the correct answer on any string. So we should also consider algorithms that take input strings of length and give a correct answer on at least of those strings. (And if , we will have to define “correct answer” for so that every input string returns a correct answer. Thus won’t be very useful, but we can look at for higher s.)
In fact, it turns out we can just describe in terms of the sets of input strings on which the algorithm returns a correct answer.
Definition: A P-test is an assignment of a natural number to each
finite string such that, for each , the number of of length such that is .
If has length , then corresponds to being the smallest such that returns a correct answer in our discussion above.
Theorem (Martin-Löf?): For any computable P-test , there is a constant such that: For all of length and natural numbers , if , then .
Furthermore, the proportion of such that is at least .
I think this was one of Martin-Löf’s original theorems but I’m actually not sure. It’s a rephrasing of the results in Section 2.4 of Li and Vitányi’s book.
So, there is a complexity bound such that any string of high enough complexity will return a correct answer when plugged into the algorithm. However,
may have to be made high (which corresponds to making high) to ensure
that there are a large number of such high complexity strings (or any at all).
What about Noisy Data?
The algorithms discussed above are all deterministic: that is, they correspond to things like Monte Carlo integration rather than averaging noisy data collected from the real world.
So what about noisy data? Random numbers are also useful in analyzing real world data, but the theorem above only applies to computable algorithms. The answer is so simple that it seems like cheating: if you model the noise in your data as coming from some infinite binary sequence , you can simply redo the whole thing but with Turing machines that have access to ! In other words, you won’t get theorems about , but you will get theorems about , which is the length of the shortest Turing machine that has access to and outputs .
What about Infinite Random Sequences?
Above we considered algorithms that knew ahead of time how many random bits we need. What about algorithms that might request a random bit at any time? This is also handled by Kolmogorov complexity: here we say that an infinite binary sequence is Martin-Löf random if there is some such that each prefix of the sequence of length has complexity at least . (There actually has to be a technical change to the definition of complexity of finite strings in this case.)
As in the finite case, there’s a theorem saying that any sufficiently robust algorithm will yield a correct answer on any Martin-Löf random sequence.
One thing I like about this framework is that it provides an idea for what it means for a single infinite sequence to be random. For example, people often say that the primes are random (in fact, it’s one of their main points of interest). Since the primes are computable, they aren’t random in this sense, but this gives an idea of what it might mean: perhaps there’s some programming language that encapsulates “non-number-theoretic” ideas in some way, and some sequence derived from the primes can be shown to be “Martin-Löf” random with Turing machines replaced by this weaker description language. But this is pure speculation.
2 thoughts on “A Good Definition of Randomness”
Without having read the rest of this post, here’s my stab at it:
abbabaabaababagets parsed by the mind into an equivalence class with about (or exactly) that ratio of
aaaaaaaaaaaaa† gets sent into its own equivalence class, and
aaaabaaaaaaaawould get sent into their own equivalence class too.
† keeping that number of
ab‘s because that “looks close enough” — the exact topic at hand
Is it just that the ratios of a’s to b’s are approximately equal, or would you also put abbabaabaababa in a different class from aaaaaaaabbbbbb?