Here’s a puzzle:
You and Bob are going to play a game which has the following steps.
- Bob thinks of some function
(it’s arbitrary: it doesn’t have to be continuous or anything).
- You pick an
.
- Bob reveals to you the table of values
of his function on every input except the one you specified
- You guess the value
of Bob’s secret function on the number
that you picked in step 2.
You win if you guess right, you lose if you guess wrong. What’s the best strategy you have?
This initially seems completely hopeless: the values of on inputs
have nothing to do with the value of
on input
, so how could you do any better then just making a wild guess?
In fact, it turns out that if you, say, choose in Step 2 with uniform probability from
, the axiom of choice implies that you have a strategy such that, whatever
Bob picked, you will win the game with probability 1!
The strategy is as follows: Let be the equivalence relation on functions from
to
defined by
iff for all but finitely many
,
. Using the axiom of choice, pick a representative from each equivalence class.
In Step 2, choose with uniform probability from
. When, in step 3, Bob reveals
, you know what equivalence class
is in, because you know its values at all but one point. Let
be the representative of that equivalence class that you picked ahead of time. Now, in step 4, guess that
is equal to
.
What is the probability of success of this strategy? Well, whatever that Bob picks, the representative
of its equivalence class will differ from it in only finitely many places. You will win the game if, in Step 2, you pick any number besides one of those finitely many numbers. Thus, you win with probability 1 no matter what function Bob selects.
This puzzle originally had the following form:
Suppose that there are countably infinitely many prisoners: Prisoner 1, Prisoner 2, etc., arranged so that Prisoner
can see Prisoner
iff
.
A warden puts either a red hat or a blue hat on each prisoner’s head, and asks each to guess the color of the hat on his own head. Prove that the prisoners have a strategy of coordinating their guesses so that only finitely many of them will be wrong.
As before, let be the equivalence relation on functions
defined by
iff
and
differ on only finitely many places. The prisoners’ strategy will then be: Beforehand, pick a representative from each equivalence class. Let
be the color of the hat on Prisoner
‘s head. Then, since each Prisoner
can see the color of the hats on Prisoner
for
, each prisoner knows which equivalence class
is in. Suppose
is the representative that they picked beforehand. Then, for each
, Prisoner
will guess that he’s wearing hat
, and since
, only finitely many of them will be wrong.
For some interesting comments on this puzzle, see Greg Muller’s blog post on it here and Chris Hardin and Alan Taylor’s paper An Introduction to Infinite Hat Problems.
After hearing this puzzle, Chris Hardin came up with a great generalization. Instead of having a Prisoner for each
and declare that Prisoner
can see Prisoner
iff
, let
be an arbitrary partial order and declare that for each
, there is a Prisoner
, and that Prisoner
can see Prisoner
iff
. Assuming again that red and blue hats are placed on all prisoners and that they must all guess the color of the hat on their own head, how many of them will be able to guess correctly?
Call a partially ordered set reverse well-founded if there are no strictly increasing chains
in it. Chris Hardin and Alan Taylor showed in their paper A Peculiar Connection Between the Axiom of Choice and Predicting the Future that the prisoners have a strategy so that the set of prisoners who are wrong will be reverse well-founded. In the case of the original prisoners problem, this implies that there will be only finitely many prisoners who are wrong, since there are no infinite reverse well-founded subsets of
.
Suppose that there is a Prisoner for each
and that Prisoner
can see Prisoner
iff
. Then, since all reverse well-founded subsets of
are countable, at most countably many prisoners will be wrong under the Hardin-Taylor strategy. Since all countable subsets of
are measure zero, this gives another way to win the game against Bob with probability one.
In fact, it implies that you can do more: You don’t need Bob to tell you , just
. Hardin and Taylor express this by imagining that we represent the weather with respect to time as an arbitrary function
. Then, given that we can observe the past, there is an almost perfect weatherman who can predict the current weather with probability 1. They further show that the weatherman can almost surely get the weather right for some interval into the future.
What is the Hardin-Taylor strategy? What the prisoners do is that they first choose a well-ordering of the set of functions from
to
(this uses the axiom of choice), and then for each
, Prisoner
simply guesses that his hat color is
, where
is the
-least function consistent with what Prisoner
sees.
Now, suppose that there is a sequence of prisoners who are wrong. Since each Prisoner
sees all the prisoners that Prisoner
for
sees, we must have that
. In fact, since
for
(since by assumption Prisoner
was wrong about his hat color, whereas Prisoner
will be right about it, since he can see Prisoner
), we have that
, but this contradicts the fact that
is a well-ordering.
The style of explanation does a lot here.
I think it’s more obvious if you assume at the outset that Bob knows all the equiv class representatives. (giving him more info will only strengthen the proof). So Bob picks places to change the representative, hoping you will stumble on one of those, but of course you don’t.
On the other hand, it really seems magical if you explain it this way:
After Bob picks f, you pick x, and Bob gives you the table, Step 4 is to “pick an equivalence representative g for the table and guess f(x)=g(x)”
all at once.
It could even be made into a joke!
But this post is ignoring that the concept of “uniformly choosing x between 0 and 1” is inconsistent with the axiom of choice.
Why would the existence of a uniform probability distribution on
be inconsistent with the axiom of choice?
This is interesting – conditional on the function being f, your strategy has probability 1 of getting his function right. However, if you don’t have an initial distribution over the functions he could choose, then of course you can’t assign a probability that you’ll win from the beginning. I would conjecture that for most reasonable distributions (I don’t have a clear sense of how to characterize “reasonable” distributions on the space of functions) the event of you winning will turn out to be unmeasurable.
You’re right, I changed the phrase “… the axiom of choice implies that you have a strategy that will win the game with probability 1” to: “… the axiom of choice implies that that you have a strategy such that, whatever
Bob picked, you will win the game with probability 1”
Another way to linguistically slip past this subtlety might have been to say: “The axiom of choice implies that you have a strategy such that, whatever Bob’s distribution over the space of functions, the expected value over that distribution of the probability of you winning the game is 1.”
By choosing a representative function g from each equivalence class ahead of time, there is a (deterministic) mapping
G(f,x): {(x’,f) | x’ =/= x}-> Reals
where G(f,x) = g(x) if g is the representative function for the class containing f.
Suppose that f is white noise with a non-zero variance. Each f(x) is random, and f(x) and f(x’) are independent of one another if x’ =/= x. Since G(f,x) is a deterministic mapping, and since it uses information about f everywhere except at x, G(f,x) and f(x) are independent. By independence, the variance of the error VAR[G(f,x) – f(x)] for a given x is at least VAR[f(x)]. It follows that the probability of a correct guess (even over random x) is zero.
The article’s proof seems to be based in the statement “the set {x | g(x) differs from f(x)} has measure 0,” which is why a random selection of x helps. However, I believe it is implicit in the proof is that f is fully known to the guesser — it’s fixed at the beginning of the problem and used to construct a “nice” G(f,x) that yields G(f,x) = f(x) except possibly at a finite number of points.
If the guesser didn’t know f, G(f,x) couldn’t be chosen so nicely. Suppose f is non-zero except at finitely many points. For any given f and x, choose G(f,x) = 0. The function g with g=f except at x where we define g(x) = 0 certainly belongs to the same class as f, so we can make it the representative function for that class. However, fix f and define g(x) = G(f,x). Now, g=0, which means that g and f differ at infinitely many points. The probability of guessing f(x) is now zero.
mike, you are incorrect that knowledge of f is known to the guesser. The guesser simply picks a representative for each equivalence class; this can be done before f is even chosen. When f is revealed (for all values except x) the equivalence class for f is known, so the guesser chooses g to be the representative from that class.
Your example with G(f,x) = 0 indicates that you don’t understand the solution. An important point to the solution is that the representative g that is chosen is the same regardless of what x is. Otherwise we could not assert that the guesser is wrong for only finitely many x because g(x) differs from f(x) at only finitely many places. In your example, you would have g(x) be identically zero, which obviously does not match f(x) at all but finitely many places.
Your probability argument doesn’t hold because G(f,x) is not a random variable.
Hmm, for some reason the prisoner example seems to be much less strange than the function example which seems much more counterintuitive.
Links broken, try http://www.math.union.edu/~hardinc/pub/peculiar.pdf
You cannot choose a random integer uniformly! Likewise neither can you choose a random real number uniformly from [0,1]. Therefore the strategy does not exist. Also, for all such problems the strategy requires an access into an infinite set in finite amount of time, which is impossible.
The question is how you picked the arbitrary representatives. The axiom says that a pick exists. It doesn’t say you can actually pick. You can see how absurd it is if you move the arbitrary choice to immediately before the guess. No outside observer could tell if you chose the arbitrary representatives beforehand or after seeing x and the table of values. Picking without a corresponding probability distribution is meaningless.
I think “in practice” this would result in the guesser being wrong practically all the time and saying that this doesn’t rule out that the probability is 1, he’s just extremely unlucky, but “eventually” the rate of correct guesses will converge to 1, but no idea when it’s going to happen. “Eventually.”