Set Theory and Weather Prediction

Here’s a puzzle:

You and Bob are going to play a game which has the following steps.

  1. Bob thinks of some function f\colon \mathbb{R}\to\mathbb{R} (it’s arbitrary: it doesn’t have to be continuous or anything).
  2. You pick an x \in \mathbb{R}.
  3. Bob reveals to you the table of values \{(x_0, f(x_0))\mid x_0\ne x\} of his function on every input except the one you specified.
  4. You guess the value f(x) of Bob’s secret function on the number x that you picked in step 2.

You win if you guess right, you lose if you guess wrong. What’s the best strategy you have?

This initially seems completely hopeless: the values of f on inputs x_0\ne x have nothing to do with the value of f on input x, so how could you do any better than just making a wild guess?

In fact, it turns out that if you, say, choose x in Step 2 with uniform probability from \lbrack 0,1\rbrack, the axiom of choice implies that you have a strategy such that, whatever f Bob picked, you will win the game with probability 1!

The strategy is as follows: Let \sim be the equivalence relation on functions from \mathbb{R} to \mathbb{R} defined by f \sim g iff for all but finitely many y, f(y) = g(y). Using the axiom of choice, pick a representative from each equivalence class.

In Step 2, choose x with uniform probability from \lbrack 0,1\rbrack. When, in step 3, Bob reveals \{(x_0, f(x_0)) \mid x_0 \ne x\}, you know what equivalence class f is in, because you know its values at all but one point. Let g be the representative of that equivalence class that you picked ahead of time. Now, in step 4, guess that f(x) is equal to g(x).

What is the probability of success of this strategy? Well, whatever f that Bob picks, the representative g of its equivalence class will differ from it in only finitely many places. You will win the game if, in Step 2, you pick any number besides one of those finitely many numbers. Thus, you win with probability 1 no matter what function Bob selects.
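The axiom-of-choice step (picking a representative from every equivalence class) cannot be carried out by any algorithm, but the rest of the strategy can be sketched under a simplifying assumption: suppose Bob's f lies in one known equivalence class whose agreed-upon representative is g \equiv 0, and discretize \lbrack 0,1\rbrack into a grid of N points. All the names below (N, bad_points, and so on) are illustrative, not part of the original puzzle:

```python
import random

# Toy sketch of the winning strategy.  We *assume* Bob's secret f lies in a
# single known equivalence class whose agreed-upon representative is g = 0,
# sidestepping the non-computable axiom-of-choice step, and we stand in for
# [0,1] with a grid of N points.
N = 10**6
g = lambda x: 0.0                         # the pre-chosen representative

bad_points = {3, 141, 592, 653}           # the finitely many places f differs from g
f = lambda x: 1.0 if x in bad_points else g(x)   # Bob's secret function

trials = 10_000
wins = 0
for _ in range(trials):
    x = random.randrange(N)               # "uniform" choice of x from the grid
    wins += (g(x) == f(x))                # strategy: guess the representative's value
print(wins / trials)                      # close to 1: we lose only on bad_points
```

With four bad points among a million grid points, a loss is roughly a 4-in-a-million event per trial, mirroring the finite (hence measure-zero) loss set in the real game.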

This puzzle originally had the following form:

Suppose that there are countably infinitely many prisoners: Prisoner 1, Prisoner 2, etc., arranged so that Prisoner i can see Prisoner j iff i < j.
A warden puts either a red hat or a blue hat on each prisoner’s head, and asks each to guess the color of the hat on his own head. Prove that the prisoners have a strategy of coordinating their guesses so that only finitely many of them will be wrong.

As before, let \sim be the equivalence relation on functions f\colon \mathbb{N}\to\{\text{red}, \text{blue}\} defined by f\sim g iff f and g differ on only finitely many places. The prisoners’ strategy will then be: Beforehand, pick a representative from each equivalence class. Let f(i) be the color of the hat on Prisoner i’s head. Then, since each Prisoner i can see the color of the hats on Prisoner j for j > i, each prisoner knows which equivalence class f is in. Suppose g\sim f is the representative that they picked beforehand. Then, for each i, Prisoner i will guess that he’s wearing hat g(i), and since g\sim f, only finitely many of them will be wrong.
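The prisoners' strategy can likewise be exercised concretely if we (hypothetically) restrict the warden to assignments in a single known equivalence class, say those that are red from some point on; the names below are mine, not from the puzzle:

```python
# Hypothetical finite window into the prisoners' strategy: restrict the
# warden to hat assignments that are "red" from some point on, so they all
# fall in one equivalence class whose representative g is the all-red pattern.
g = lambda i: "red"

deviations = {0, 2, 7, 100}                      # where the warden departs from g
f = lambda i: "blue" if i in deviations else g(i)

# Prisoner i sees f(j) for all j > i, which pins down the equivalence class,
# so each prisoner guesses g(i).  The wrong guesses are exactly the
# (finitely many) places where f differs from the representative.
wrong = [i for i in range(1000) if g(i) != f(i)]
print(wrong)  # [0, 2, 7, 100]
```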

For some interesting comments on this puzzle, see Greg Muller’s blog post on it here and Chris Hardin and Alan Taylor’s paper An Introduction to Infinite Hat Problems.

After hearing this puzzle, Chris Hardin came up with a great generalization. Instead of having a Prisoner i for each i\in \mathbb{N} and declaring that Prisoner i can see Prisoner j iff i < j, let (P, <) be an arbitrary partial order and declare that for each p \in P, there is a Prisoner p, and that Prisoner p can see Prisoner q iff p < q. Assuming again that red and blue hats are placed on all prisoners and that they must all guess the color of the hat on their own head, how many of them will be able to guess correctly?

Call a partially ordered set A reverse well-founded if there are no strictly increasing chains a_1 < a_2 < a_3 < \cdots in it. Chris Hardin and Alan Taylor showed in their paper A Peculiar Connection Between the Axiom of Choice and Predicting the Future that the prisoners have a strategy so that the set of prisoners who are wrong will be reverse well-founded. In the case of the original prisoners problem, this implies that there will be only finitely many prisoners who are wrong, since there are no infinite reverse well-founded subsets of \mathbb{N}.

Suppose that there is a Prisoner x for each x\in\mathbb{R} and that Prisoner x can see Prisoner y iff x > y. Then, since all reverse well-founded subsets of \mathbb{R} are countable, at most countably many prisoners will be wrong under the Hardin-Taylor strategy. Since all countable subsets of \mathbb{R} are measure zero, this gives another way to win the game against Bob with probability one.
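The fact that reverse well-founded subsets of \mathbb{R} are countable has a short standard proof, sketched here for completeness (the argument is not from the original post; it uses only that \mathbb{Q} is countable and dense):

```latex
\textbf{Claim.} Every reverse well-founded $A \subseteq \mathbb{R}$ is countable.

\textbf{Sketch.} Having no strictly increasing sequence means $(A, >)$ is
well-founded, so every nonempty subset of $A$ has a maximum.  In particular,
every $a \in A$ except possibly $\min A$ has an immediate predecessor
$a^- = \max\bigl(A \cap (-\infty, a)\bigr)$, and we may pick a rational
$q_a \in (a^-, a)$.  If $a < b$ in $A$, then $q_a < a \le b^- < q_b$, so
$a \mapsto q_a$ injects all but at most one point of $A$ into $\mathbb{Q}$.
```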

In fact, it implies that you can do more: You don’t need Bob to tell you \{(x_0, f(x_0)) \mid x_0\ne x\}, just \{(x_0, f(x_0)) \mid x_0 < x\}. Hardin and Taylor express this by imagining that we represent the weather with respect to time as an arbitrary function f\colon \mathbb{R}\to \mathbb{R}. Then, given that we can observe the past, there is an almost perfect weatherman who can predict the current weather with probability 1. They further show that the weatherman can almost surely get the weather right for some interval into the future.

What is the Hardin-Taylor strategy? What the prisoners do is that they first choose a well-ordering \prec of the set of functions from P to \{\text{red}, \text{blue}\} (this uses the axiom of choice), and then for each p, Prisoner p simply guesses that his hat color is g_p(p), where g_p is the \prec-least function consistent with what Prisoner p sees.

Now, suppose that there is a sequence p_1 < p_2 < \cdots of prisoners who are wrong. Since each Prisoner p_i sees all the prisoners that Prisoner p_j for i < j sees, we must have that g_{p_1} \succeq g_{p_2} \succeq \cdots. In fact, since g_{p_j}(p_j) \ne g_{p_i}(p_j) for i < j (since by assumption Prisoner p_j was wrong about his hat color, whereas Prisoner p_i will be right about it, since he can see Prisoner p_j), we have that g_{p_1} \succ g_{p_2} \succ \cdots, but this contradicts the fact that \prec is a well-ordering.
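Although the full strategy needs a well-ordering of an uncountable set of functions, the mechanism in this proof can be sanity-checked on a finite line of prisoners, where lexicographic order serves as the well-ordering (a hypothetical toy setup; `least_consistent` and the other names are mine):

```python
import itertools

# Finite sanity check of the Hardin-Taylor mechanism: n prisoners in a line,
# prisoner i sees prisoner j iff i < j, and the "well-ordering" of all hat
# assignments is lexicographic order, which well-orders a finite set.
n = 6
colors = ("blue", "red")
all_assignments = sorted(itertools.product(colors, repeat=n))  # the well-order

def least_consistent(f, p):
    """The least assignment (in the chosen order) agreeing with everything
    prisoner p sees, i.e. with f on all positions j > p."""
    return next(g for g in all_assignments if g[p+1:] == f[p+1:])

for f in itertools.product(colors, repeat=n):      # every possible hat placement
    guesses = [least_consistent(f, p)[p] for p in range(n)]
    wrong = [p for p in range(n) if guesses[p] != f[p]]
    # The g_p's of the wrong prisoners form a strictly decreasing sequence
    # in the chosen well-order -- the heart of the contradiction above.
    gs = [least_consistent(f, p) for p in wrong]
    assert all(gs[i] > gs[i + 1] for i in range(len(gs) - 1))
print("checked", 2**n, "hat placements")
```

In the infinite setting the same strict descent is impossible because \prec is a well-ordering, which is exactly the contradiction the proof extracts.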

12 thoughts on “Set Theory and Weather Prediction”

  1. The style of explanation does a lot here.

    I think it’s more obvious if you assume at the outset that Bob knows all the equiv class representatives. (giving him more info will only strengthen the proof). So Bob picks places to change the representative, hoping you will stumble on one of those, but of course you don’t.

    On the other hand, it really seems magical if you explain it this way: after Bob picks f, you pick x, and Bob gives you the table, Step 4 is to “pick an equivalence class representative g for the table and guess f(x)=g(x)” all at once.

    It could even be made into a joke!

  2. But this post is ignoring that the concept of “uniformly choosing x between 0 and 1” is inconsistent with the axiom of choice.

  3. This is interesting – conditional on the function being f, your strategy has probability 1 of getting his function right. However, if you don’t have an initial distribution over the functions he could choose, then of course you can’t assign a probability that you’ll win from the beginning. I would conjecture that for most reasonable distributions (I don’t have a clear sense of how to characterize “reasonable” distributions on the space of functions) the event of you winning will turn out to be unmeasurable.

You’re right, I changed the phrase “… the axiom of choice implies that you have a strategy that will win the game with probability 1” to: “… the axiom of choice implies that you have a strategy such that, whatever f Bob picked, you will win the game with probability 1”

    Another way to linguistically slip past this subtlety might have been to say: “The axiom of choice implies that you have a strategy such that, whatever Bob’s distribution over the space of functions, the expected value over that distribution of the probability of you winning the game is 1.”

  5. By choosing a representative function g from each equivalence class ahead of time, there is a (deterministic) mapping
    G(f, x): {(x', f(x')) | x' ≠ x} -> Reals,
    where G(f, x) = g(x) if g is the representative function for the class containing f.

    Suppose that f is white noise with a non-zero variance. Each f(x) is random, and f(x) and f(x') are independent of one another if x' ≠ x. Since G(f, x) is a deterministic mapping, and since it uses information about f everywhere except at x, G(f, x) and f(x) are independent. By independence, the variance of the error VAR[G(f, x) - f(x)] for a given x is at least VAR[f(x)]. It follows that the probability of a correct guess (even over random x) is zero.

    The article’s proof seems to be based in the statement “the set {x | g(x) differs from f(x)} has measure 0,” which is why a random selection of x helps. However, I believe it is implicit in the proof is that f is fully known to the guesser — it’s fixed at the beginning of the problem and used to construct a “nice” G(f,x) that yields G(f,x) = f(x) except possibly at a finite number of points.

    If the guesser didn’t know f, G(f,x) couldn’t be chosen so nicely. Suppose f is non-zero except at finitely many points. For any given f and x, choose G(f,x) = 0. The function g with g=f except at x where we define g(x) = 0 certainly belongs to the same class as f, so we can make it the representative function for that class. However, fix f and define g(x) = G(f,x). Now, g=0, which means that g and f differ at infinitely many points. The probability of guessing f(x) is now zero.

  6. mike, you are incorrect that f is known to the guesser. The guesser simply picks a representative for each equivalence class; this can be done before f is even chosen. When f is revealed (for all values except x) the equivalence class for f is known, so the guesser chooses g to be the representative from that class.

    Your example with G(f,x) = 0 indicates that you don’t understand the solution. An important point to the solution is that the representative g that is chosen is the same regardless of what x is. Otherwise we could not assert that the guesser is wrong for only finitely many x because g(x) differs from f(x) at only finitely many places. In your example, you would have g(x) be identically zero, which obviously does not match f(x) at all but finitely many places.

    Your probability argument doesn’t hold because G(f,x) is not a random variable.

  7. You cannot choose a random integer uniformly! Likewise neither can you choose a random real number uniformly from [0,1]. Therefore the strategy does not exist. Also, for all such problems the strategy requires an access into an infinite set in finite amount of time, which is impossible.

  8. The question is how you picked the arbitrary representatives. The axiom says that a pick exists. It doesn’t say you can actually pick. You can see how absurd it is if you move the arbitrary choice to immediately before the guess. No outside observer could tell if you chose the arbitrary representatives beforehand or after seeing x and the table of values. Picking without a corresponding probability distribution is meaningless.

    I think “in practice” this would result in the guesser being wrong practically all the time and saying that this doesn’t rule out that the probability is 1, he’s just extremely unlucky, but “eventually” the rate of correct guesses will converge to 1, but no idea when it’s going to happen. “Eventually.”
