I’d been interested in making a math youtube video for quite a while, but the impetus for this was that I just found out that 3blue1brown, a math youtuber who produces excellent videos, has actually released the python scripts that he uses to make them on his github.

So this weekend, I’ve been poking at them to figure out how they work, and I really think that letting people easily make high-quality math animations could be revolutionary in terms of how mathematical ideas get communicated. I’m very excited!


Let’s grant that “continent” is not a well-defined term; or, to put it another way, the set of continents is not a well-defined set. Even given that, it turns out there’s a mathematical notion of a “variable set”, or “set-valued sheaf”, that can capture the notion of a set which can vary under different assumptions. Intuitively, a set-valued sheaf on a topological space $X$ is like a continuous function with domain $X$, except the range is not another topological space: it’s the category of all sets!

Rather than define “sheaf of sets on a topological space” explicitly, let’s work through what it means in the CGP Grey case. For simplicity, let’s just focus on two of the things that CGP Grey mentions: the meaning of “continent” can vary depending on how large you require a continent to be, and it can vary depending on how separated you require two continents to be to count as distinct.

Since these are two independent parameters, let’s take our topological space $X$ to be a square of parameter values, say $[0,1]^2$. The first coordinate will represent our “looseness about the size requirement”; i.e., if it’s larger, we’ll consider smaller islands to be continents. The second coordinate will represent our “degree of consideration of land bridges”; i.e., if it’s larger, we’ll require larger amounts of water to separate two continents.

To be clear, these parameters are *subjective*: that is, I’m not postulating any quantitative correspondence between the parameters and, e.g., a minimum size requirement to be a continent.

Now let’s see what the variable set of continents might look like. First, let’s set the second parameter to 0 and vary the first parameter. The set might look like this:

Note that some continents, like South America, are always in the set of continents, but as the parameter gets loosened, other elements get added to the set.

Now, let’s set the first parameter to 0 and vary the second one. That graph might look like this:

This is a little more subtle than the previous graph; instead of new continents getting added, two continents which are distinct might become equal: Europe and Asia quickly become equal, as there is actually no ocean between them at all. If you disregard the Panama Canal, North America and South America become one continent. If you disregard the Suez Canal, Eurasia and Africa become one continent.
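
To make the two slices concrete, here is a toy model in Python of what the variable set of continents could look like. All the numeric cutoffs (and the idea of encoding it as a plain function) are my own illustrative inventions, not anything from the video; a real sheaf also records how these pointwise sets glue together over open regions, which this sketch omits.

```python
# Toy "variable set" of continents: each point (size_loose, bridge_loose)
# of the parameter square gets its own set of continents.
# All cutoff values below are made up purely for illustration.

def continents(size_loose, bridge_loose):
    s = {"Africa", "Antarctica", "Australia", "Europe", "Asia",
         "North America", "South America"}
    # First parameter: loosening the size requirement admits big islands.
    if size_loose > 0.3:
        s.add("Greenland")
    if size_loose > 0.6:
        s.add("Borneo")
    # Second parameter: disregarding narrower separations merges continents.
    if bridge_loose > 0.1:        # no ocean between Europe and Asia at all
        s -= {"Europe", "Asia"}
        s.add("Eurasia")
    if bridge_loose > 0.4:        # disregard the Panama Canal
        s -= {"North America", "South America"}
        s.add("America")
    if bridge_loose > 0.7:        # disregard the Suez Canal
        s -= {"Eurasia", "Africa"}
        s.add("Afro-Eurasia")
    return s
```

For example, `continents(0.9, 0.0)` contains Borneo while `continents(0.0, 0.0)` does not, matching the first slice above.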

Now, we’ve only looked at two slices of this variable set (and even those two slices have been under-specified, since I haven’t said in complete detail how to interpret the two parameters). But let’s suppose that the full variable set on $X$ can be filled out to give a set-valued sheaf called $C$.

Given that, what can we do with this? Well, one of the reasons sheaves are interesting from a logical perspective is that if we consider the category of all set-valued sheaves on $X$ (or on any fixed topological space), this forms a type of category called a topos, which acts so much like the category of sets that we can actually pretend that its objects *are* sets, and do normal set theory with them. The only proviso is that the internal logic does not include the law of the excluded middle: the axiom that $P \lor \neg P$ for any proposition $P$.

So, what are some things you can do in this logic, where we get to pretend that $C$ is a genuine set?

Well, we know that $C$ has elements: we know there is a thing called $\mathrm{SouthAmerica}$ that’s in $C$, a thing called $\mathrm{Africa}$ that’s in $C$, and so on. We *don’t* know there’s a thing called $\mathrm{Borneo}$ that’s in $C$; I’ll show how to deal with that later.

This is where the lack of the law of excluded middle first rears its head: in this logic it is *neither* the case that $\mathrm{Europe} = \mathrm{Asia}$, *nor* the case that $\mathrm{Europe} \neq \mathrm{Asia}$! On the other hand, it *is* the case, for example, that $\mathrm{SouthAmerica} \neq \mathrm{Africa}$, since those two are distinct at every parameter value. This might seem unusual with ordinary sets, but I think it’s pretty intuitive here. Note that there can be relationships between these facts, e.g., $\mathrm{Africa} = \mathrm{Asia}$ implies $\mathrm{Europe} = \mathrm{Asia}$.

In normal set theory, you can determine the cardinality of any set. And in fact, the video’s stated aim is to say what the cardinality of the set of continents is. One of the consequences of losing the law of the excluded middle is that the notion of *finiteness* becomes more subtle (e.g., see here or here), which again seems appropriate here. It turns out that $C$ is what’s called *subfinite*, but doesn’t have a definite cardinality.

However, there are still true statements involving the cardinality of $C$: for example, assuming no continents other than the ones in the graphs above are added, it’s the case that the cardinality of $C$ is less than 11 (even though the cardinality does not equal a specific number below 11). For another example, there might be a relationship between the two parameters such that something like “the first parameter exceeding the second implies the number of continents is greater than 7” is true.

OK, so far we’ve discussed how the logic handles things like the possibility of two continents becoming equal. How does it handle the conditional *existence* of continents like Borneo? So far it’s not clear how to even talk about these things in the language.

To explain that, we have to back up a bit. In normal set theory, there are sets with one element, and we might as well pick a distinguished one; call it $1$. Note that for any set $A$ (still in normal set theory), the elements of $A$ are in 1-1 correspondence with maps $1 \to A$; so we could just as well talk about those maps instead of elements of $A$.

Similarly, in the theory of set-valued sheaves on our parameter space, there is also a set $1$, and instead of saying that continents like $\mathrm{SouthAmerica}$ are elements of $C$, we could instead have talked about maps from $1$ to $C$ and relationships between them.

Now, in normal set theory, $1$ has only two subsets: itself and the empty set. But that proof depends on excluded middle (since it goes by asking whether or not the unique element of $1$ is in a given subset), so if we drop it, it’s no longer necessarily true. Indeed, in this logic, there is a subset of $1$, call it $U_{\mathrm{Borneo}}$, that is not the empty set and not $1$. Furthermore, there is a map, $\mathrm{Borneo}$, from $U_{\mathrm{Borneo}}$ to $C$. The fact that this map has domain $U_{\mathrm{Borneo}}$ instead of $1$ represents the conditional nature of Borneo’s existence as an element of $C$.

Just as with the equality hypotheses, we can represent relationships between conditional existences: for example, if Greenland is a continent whenever Borneo is, we have a map from $U_{\mathrm{Borneo}}$ to $U_{\mathrm{Greenland}}$. If there are at least 7 continents whenever Borneo exists, we have a map from $U_{\mathrm{Borneo}}$ to the subset of $1$ on which $C$ has at least 7 elements.

Toposes were invented in the service of algebraic geometry (see here for a good account of the history of this topic). However, I think they also provide a beautiful account of how set theory can take account of fuzzy concepts. See here for more on this notion of variable sets.


The setup is as follows: Imagine that there is some set of states of the world, called the *macrostates*, that we humans can distinguish. To each of these macrostates is associated some large number of microstates, where a microstate is a complete specification of all information about all the particles in a system.

For example, given a container of gas, different macrostates would correspond to different pressures and temperatures of the gas, since we can determine those with macroscopic measurements. Microstates would correspond to complete information about how all particles in the gas are moving.

Every macrostate has an associated quantity called its *entropy*, written with an $S$. The entropy of a macrostate obeys the following rules:

- The entropy is equal to Boltzmann’s constant, $k_B$, times the logarithm of the number of associated microstates: $S = k_B \ln W$, where $W$ is the number of microstates.
- If a system is at a temperature $T$, and you heat it by adding energy $Q$ to it (while keeping it at temperature $T$, by allowing it to expand, say), then its entropy increases by $Q/T$.
- The total entropy of the universe always increases. (This is the second law of thermodynamics.)

These rules alone let you do a surprising number of useful calculations:

Suppose you have a gas occupying a volume $V$ at temperature $T$ with total number of molecules $N$. How much energy do you have to add to it to double its volume while keeping it at the same temperature?

With a doubled volume, imagine that the gas occupies two spaces of volume $V$. Then the number of microstates of the system gets multiplied by $2^N$ since, for each microstate of the original gas, there are $2^N$ new microstates corresponding to which space of volume $V$ each molecule is in.

That means that the entropy must have gone up by $N k_B \ln 2$ by Rule #1, but by Rule #2, that means that to achieve this entropy increase by heating, we must have added $T N k_B \ln 2$ of energy to the system.
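
As a sanity check on the arithmetic, here is this doubling energy computed for one mole of gas at room temperature; the function name and the choice of $T = 300\,$K are just for illustration.

```python
import math

k_B = 1.380649e-23  # Boltzmann's constant, J/K

def isothermal_doubling_energy(N, T):
    """Energy (J) to double the volume of N molecules at fixed temperature T.

    The microstate count is multiplied by 2**N, so by Rule #1 the entropy
    rises by N * k_B * ln 2, and by Rule #2 that entropy must be supplied
    as heat T * delta_S."""
    delta_S = N * k_B * math.log(2)
    return T * delta_S

# One mole of gas at room temperature needs about 1.7 kJ:
E = isothermal_doubling_energy(N=6.022e23, T=300.0)
```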

Generally, when hot things and cold things touch, the hot things get cooler and the cold things get hotter. This is because hot things that are getting cooler are losing entropy, and cold things that are getting hotter are gaining entropy, but the entropy being gained is greater than the entropy being lost.

How much energy does it take to reverse the process? Suppose you have a refrigerator with 10 kg of food that’s currently at room temperature (say 20°C) and you want to lower it to 0°C. Suppose the specific heat of the food is 3.5 kJ/kg°C. That means that a total of $10\ \mathrm{kg} \times 3.5\ \mathrm{kJ/kg\,°C} \times 20\,\mathrm{°C} = 700\ \mathrm{kJ}$ can be extracted from the food as it cools.

When the food has lost energy $E$, that must mean its temperature is:

$$T(E) = 293\ \mathrm{K} - \frac{E}{35\ \mathrm{kJ/K}}$$

That means the entropy lost by the food as it’s cooled is

$$\Delta S = \int_0^{700\ \mathrm{kJ}} \frac{dE}{T(E)} = 35\ \mathrm{kJ/K} \times \ln\frac{293}{273} \approx 2.47\ \mathrm{kJ/K}$$

This is about 2.5 kJ/°C. Whatever process is used for refrigeration, it must obey Rule #3, and thus increase entropy somewhere else.

If you increase entropy by exhausting heat into the room, which is at 20°C = 293 K, then you’ll have to exhaust $293\ \mathrm{K} \times 2.5\ \mathrm{kJ/K} \approx 732.5\ \mathrm{kJ}$ of energy. You can get 700 kJ of that from the food, but you still need an extra 32.5 kJ of energy, which is why refrigerators have to be plugged in and don’t work on their own.
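
The whole refrigerator calculation fits in a few lines. Note that this version uses the exact integral $\int dE/T$ for the entropy, which gives about 2.47 kJ/K rather than the rounded 2.5 kJ/K, so the extra energy comes out near 25 kJ rather than 32.5 kJ.

```python
import math

def extra_energy_to_cool(m=10.0, c=3.5, T_hot=293.0, T_cold=273.0):
    """Minimum outside energy (kJ) to cool food of mass m (kg) and specific
    heat c (kJ/kg/K) from T_hot to T_cold (K), exhausting heat into a room
    at temperature T_hot."""
    heat_removed = m * c * (T_hot - T_cold)   # 700 kJ leaves the food
    # Entropy lost by the food: integrate dQ/T = m*c*dT/T over the cooling.
    dS = m * c * math.log(T_hot / T_cold)     # ~2.47 kJ/K
    # Rule #3: the room must gain at least dS, i.e. receive T_hot * dS of heat.
    heat_exhausted = T_hot * dS
    return heat_exhausted - heat_removed

extra = extra_energy_to_cool()  # roughly 25 kJ of wall-socket energy
```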

How much energy does it take to delete a bit of information in a computer? A bit could be in state 0 or state 1, and after deleting it, it will be in state 0 (say). That means that the process of deleting the bit takes all the microstates of the 0 macrostate and all the microstates of the 1 macrostate to microstates of the 0 macrostate. That halves the number of microstates, or subtracts $k_B \ln 2$ from the entropy.

In order to obey Rule #3, you therefore have to exhaust a minimum of $k_B T \ln 2$ of energy as waste heat, where $T$ is the ambient temperature. This is called Landauer’s principle.
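
Plugging in numbers, at room temperature the Landauer bound is a few zeptojoules per bit (the function name here is my own):

```python
import math

k_B = 1.380649e-23  # Boltzmann's constant, J/K

def landauer_limit(T):
    """Minimum heat (J) exhausted when erasing one bit at ambient temperature T.

    Erasure halves the number of microstates, removing k_B * ln 2 of entropy
    from the bit, and Rule #3 forces at least that much entropy to reappear
    in the environment."""
    return k_B * T * math.log(2)

E_bit = landauer_limit(300.0)  # about 2.9e-21 J
```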


But I only thought that because I never bothered to actually plug in any numbers. Using the formula $F = G m_1 m_2 / r^2$ for the force of gravity, you can see that if you have two 1-kilogram objects 0.1 meters apart, the acceleration due to gravity between them is enough to move them by 2.7 millimeters in just 15 minutes. Wow!
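
Here is the back-of-the-envelope version of that calculation, treating the initial acceleration as constant over the 15 minutes (it actually grows as the balls approach, so this slightly underestimates the drift):

```python
G = 6.674e-11  # gravitational constant, m^3 / (kg s^2)

def gravitational_drift(m=1.0, r=0.1, t=15 * 60.0):
    """Distance (m) one ball moves toward the other under their mutual
    gravity, holding the initial acceleration fixed."""
    a = G * m / r**2        # ~6.7e-9 m/s^2 on each 1-kg ball
    return 0.5 * a * t**2

d = gravitational_drift()   # ~2.7 mm in 15 minutes
```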

Of course, this makes sense, given that $G$ was measured all the way back in 1798 by Henry Cavendish. I think I knew that it was measured quite long ago, but I just assumed it was based on some astronomical calculation, or was some sort of indirect inference. Nope, it turns out Henry Cavendish put some lead balls on a torsion balance and directly measured how much they attracted each other. Cool!


- $\Sigma_0$ and $\Pi_0$ are both equal to the set of sentences $\phi$ such that a computer can determine the truth or falsity of $\phi$ in finite time. Sentences in $\Sigma_0$ are essentially computations, things like “$2 + 2 = 4$” or “For all $n$ less than a million, if $n$ is even, then $n^2$ is even.”
- $\Sigma_{n+1}$ for $n \geq 0$ is defined as follows: If, for all $k$, $\phi(k)$ is a $\Pi_n$ sentence, then “there exists a $k$ such that $\phi(k)$” is a $\Sigma_{n+1}$ sentence.
- Similarly, if, for all $k$, $\phi(k)$ is a $\Sigma_n$ sentence, then “for all $k$, $\phi(k)$” is a $\Pi_{n+1}$ sentence.

It’s not obvious from the definition, but by encoding pairs, triples, etc. of natural numbers as single natural numbers, you are allowed to include multiple instances of the same quantifier without moving up the hierarchy. For example, a sentence “there exist $j$ and $k$ such that $\phi(j, k)$” is $\Sigma_1$ if $\phi(j, k)$ is $\Pi_0$. Essentially, $\Sigma_n$ contains all statements with $n$ alternating blocks of quantifiers where the outermost block is a “there exists”, and $\Pi_n$ contains all statements with $n$ alternating blocks of quantifiers where the outermost block is a “for all”.

Many number-theoretic and combinatorial statements are $\Pi_1$: Fermat’s last theorem, Goldbach’s conjecture, and the Riemann Hypothesis (which can be seen to be in $\Pi_1$ by a formulation of Jeffrey Lagarias). By encoding finite groups as numbers, theorems like the classification of finite simple groups can also be seen to be $\Pi_1$.

Note that statements of the form “for all $n$, $P(f(n))$ holds”, where $f$ is some computable function and $P$ is a computable property, are also in $\Pi_1$, since for a fixed $n$, “$P(f(n))$” is checkable in finite time by a computer.

There are many sentences of the form “for all $m$, there exists an $n$ such that $\phi(m, n)$” in $\Pi_2$ for which we actually know a bound on $n$ in terms of $m$, so we get a sentence in $\Pi_1$ when we use that bound. For example, the statement that there are infinitely many primes is in $\Pi_2$, since it can be written as “For all $m$, there exists an $n > m$ such that $n$ is prime”, but we also know a version that’s in $\Pi_1$: for example, we know that there’s a prime between $n$ and $2n$ for any $n$.
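
To see why $\Pi_1$ sentences are the “computationally refutable” ones, here is a sketch that checks the Bertrand-postulate version for many values of $n$. Running the loop longer can only ever refute the sentence by finding a counterexample; no finite run confirms it.

```python
def is_prime(k):
    if k < 2:
        return False
    i = 2
    while i * i <= k:
        if k % i == 0:
            return False
        i += 1
    return True

def bertrand_holds(n):
    """A Sigma_0 fact: 'there is a prime p with n < p <= 2n' is decidable
    by a finite search."""
    return any(is_prime(p) for p in range(n + 1, 2 * n + 1))

# The Pi_1 sentence is 'for all n >= 1, bertrand_holds(n)'; we can only
# sample it, never exhaust it:
ok = all(bertrand_holds(n) for n in range(1, 1000))
```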

There is a theorem that the sentences in this hierarchy actually do get harder to decide the truth of in general:

Theorem: For any $n$: even in a programming language augmented with the magical ability to decide the truth of $\Sigma_n$ sentences, it is not possible to write a program which decides the truth of every $\Sigma_{n+1}$ sentence.

What if we used the above idea to classify scientific statements instead of number-theoretic ones? We would get a new hierarchy where:

- $\Sigma_0$ and $\Pi_0$ are both equal to the set of sentences $\phi$ such that a definite experiment can be performed which demonstrates the truth or falsity of $\phi$.
- $\Sigma_{n+1}$ and $\Pi_{n+1}$ for $n \geq 0$ are defined just as before.

Examples of $\Pi_1$ statements in this classification would be things like: any pair of objects dropped in a vacuum will fall with the same acceleration; or, given any gram of liquid water at a standard pressure, adding 4.18 joules of energy will raise its temperature by 1 degree Celsius.

Of course, those examples are idealized: to make them actually true, you have to add in lots more caveats and tolerances: for example, you have to say that any pair of objects weighing no more than such-and-such will fall with the same acceleration up to such-and-such tolerance (and add in various other caveats). More on this later.

An example of a $\Pi_2$ scientific statement might be something like: if you put any two objects in thermal contact, then for any given tolerance, there will be a time at which their temperatures are equal to within that tolerance.

You might have to use a statement of complexity $\Pi_3$ to say something about randomness. For example, a $\Pi_3$ scientific statement might be: Suppose you are doing an experiment where you acquire a carbon-11 atom, wait for it to decay to boron-11 and record the time, then repeat. Then, for any tolerance $\epsilon$, there will be an $N$ such that for any $n > N$, the proportion of the first $n$ carbon-11 atoms that decayed in less than 20.334 minutes will be no more than $\epsilon$ away from 0.5.
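
For intuition, here is a small simulation of that experiment under the standard exponential-decay model; the seed and sample size are arbitrary choices of mine.

```python
import math
import random

random.seed(0)  # arbitrary

HALF_LIFE = 20.334              # minutes, carbon-11
rate = math.log(2) / HALF_LIFE  # exponential decay rate

# Record simulated decay times; the Pi_3 sentence asserts the proportion
# below the half-life eventually stays within any tolerance of 0.5.
N = 100_000
below = sum(random.expovariate(rate) < HALF_LIFE for _ in range(N))
proportion = below / N          # should land near 0.5
```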

In the Arithmetic Hierarchy case, we had a theorem saying that statements higher up the hierarchy were qualitatively harder to decide the truth of. Is there a similar theorem here?

In fact, you can make this precise (although, to be honest, I’m not sure what the cleanest way to do it is). In particular, $\Pi_1$ statements are learnable in the limit of acquiring all the experimental data, while $\Pi_2$ statements aren’t. One way to make this rigorous is: Suppose you have a $\Pi_1$ statement “for all $x$, $P(x)$”, where for any object $x$ (assume that $x$ ranges over some countable set of physical objects $x_1, x_2, \ldots$), $P(x)$ can be established or refuted by a single experiment.

Now, suppose that you have a probability distribution over all possibilities for $P(x)$ for the objects $x_1, x_2, \ldots$: that is, you have a probability distribution $\mu$ on the Cantor space $\{0,1\}^{\mathbb{N}}$, where the elements of the sample space are to be interpreted as full specifications of whether or not $P$ holds for each $x_i$: e.g., things like “$P(x_1)$ and not $P(x_2)$ and $P(x_3)$ and …”.

Call events which are finite conjunctions of events of the form “$P(x_i)$” or “not $P(x_i)$” *basic events*.

Now say that this probability distribution is *open-minded* with respect to an event $E$ if, for any basic event $B$, if $B$ and $E$ are consistent, then $\mu(E \mid B) > 0$.

Now, assuming that $\mu$ is open-minded with respect to the event $H$ = “for all $i$, $P(x_i)$”, it’s pretty easy to show that, if $H$ is true, $\mu(H \mid B_n) \to 1$ as $n \to \infty$, where $B_n$ is the basic event recording the outcomes of the first $n$ experiments. That is, if $H$ is actually true, it will be learned. (Of course, if it is false, that will be learned in finite time automatically.)
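
Here is a minimal numerical illustration of learning in the limit, under one specific open-minded prior that I made up: probability 1/2 on $H$, and, conditional on not-$H$, independent fair-coin outcomes for each experiment.

```python
# Toy open-minded prior: P(H) = 1/2, and given not-H each experiment is
# an independent fair coin flip. These modeling choices are mine, purely
# for illustration.

def posterior_after(n):
    """P(H | first n experiments all came out positive), by Bayes' rule."""
    p_data_given_h = 1.0
    p_data_given_not_h = 0.5 ** n
    return (p_data_given_h * 0.5
            / (p_data_given_h * 0.5 + p_data_given_not_h * 0.5))

probs = [posterior_after(n) for n in range(20)]
# 0.5, 0.667, 0.8, 0.889, ... climbing toward 1. A single negative
# result would instead send the posterior to 0 permanently.
```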

On the other hand, for general $\Pi_2$ statements, it’s pretty easy to see that this is not the case: in fact, any computable procedure assigning a probability to a $\Pi_2$ statement just based on seeing finitely many data points of the form “$P(x_i)$” or “not $P(x_i)$” can be tricked by some sequence of outcomes constructed by using the definition of the computable procedure against itself.

I would like to emphasize though, that the above are off-the-cuff basically-trivial observations. There must be a cleaner, nicer framework to state a scientific hierarchy theorem, but I don’t know what it is.

In my opinion, thinking about the complexity of scientific statements in terms of quantifiers yields some insights.

For example, the $\Pi_1$ case is basically the scientific method as learned in elementary school: a hypothesis is generated, and by repeated experiments we either reject it, or gradually come to correctly accept the truth of it.

The fact that science is not so simple is seen by just recognizing that there are scientific hypotheses that are not in $\Pi_1$ form. However, probably even more significant is the fact that many real-world phenomena *add quantifiers* to our hypotheses:

- **Measurement errors.** As alluded to above, to make our scientific statements true, we have to add tolerances. If we don’t know what the tolerances should be when we make the hypothesis, we have to add at least one quantifier.
- **Not everyone can perform the experiments.** Not everyone performs experiments; in fact, most people rely on scientific consensus.

That automatically brings the hypothesis up to (at least) $\Sigma_2$, for example:

There exists a time $t_0$ such that, for all times $t > t_0$, the scientific consensus at time $t$ will be in favor of global warming (or evolution, or that eggs are good for you, or whatever).

I feel like thinking about the complexity of scientific hypotheses in terms of quantifier alternations is so natural that it must have been studied before, but I can’t find anything by googling around. Does anyone know where to find more information on this?


This is bad, not just because the right-hand rule is confusing, but because it leads people to wonder if the right-hand rule has some physical reality. For example, see the comments on this youtube explanation of gyroscopes by the PhysicsGirl: there’s a lot of confusion over whether the right-hand rule is a fact of nature or a convention.

It turns out that it’s unnecessary to use the right-hand rule because it’s actually unnecessary to convert rotational quantities to vectors at all, and I think many people are unaware of this.

Let me explain by analogy with the vector case. Given a particle, how do you represent its linear momentum? You take a vector whose direction is the same as the direction of the particle’s velocity, and whose magnitude is the particle’s mass times its speed.

A system of two particles has a linear momentum obtained by adding the linear momenta of the individual particles by putting the vectors head-to-tail.

Now, how do we represent the angular momentum of a particle with respect to some base point $O$? Usually, what you do is take the position vector $\mathbf{r}$ of the particle with respect to $O$ and the particle’s linear momentum vector $\mathbf{p}$, and define the angular momentum to be the cross product $\mathbf{r} \times \mathbf{p}$, which requires the right-hand rule. Then, as before, the angular momentum of a system of two particles is the sum of the angular momenta of the particles individually.

However, there is another thing you can do: instead of taking the cross product $\mathbf{r} \times \mathbf{p}$, which is a vector, represent the angular momentum as a 2-dimensional object: an oriented parallelogram, called $\mathbf{r} \wedge \mathbf{p}$, which lies in the same plane as $\mathbf{r}$ and $\mathbf{p}$ and whose area equals the magnitude $|\mathbf{r} \times \mathbf{p}|$.

These add analogously to vectors: you add two oriented parallelograms by matching up edges of the same length and opposite orientation. (Apologies for the poor handwriting: the squiggles inside the parallelograms are meant to be arrows indicating orientation.)

Two of these parallelograms are declared to be equal if they have the same (signed) area and live in the same plane: in that way, you can always add any two of them by reshaping one of them to have the right side length to add with the other.
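
Here is a minimal sketch of this bivector arithmetic in code; the class and component names are my own, and the componentwise addition below is the algebraic counterpart of the reshape-and-glue picture above.

```python
# Minimal bivector sketch: a bivector in 3D is stored by its components
# on the three coordinate planes, and adding oriented parallelograms is
# componentwise addition of those components.

class Bivector:
    def __init__(self, xy=0.0, yz=0.0, zx=0.0):
        self.xy, self.yz, self.zx = xy, yz, zx

    def __add__(self, other):
        return Bivector(self.xy + other.xy,
                        self.yz + other.yz,
                        self.zx + other.zx)

    def area(self):
        """Area of the (reshaped) parallelogram representing this bivector."""
        return (self.xy**2 + self.yz**2 + self.zx**2) ** 0.5

def wedge(r, p):
    """Oriented parallelogram spanned by vectors r and p."""
    return Bivector(xy=r[0] * p[1] - r[1] * p[0],
                    yz=r[1] * p[2] - r[2] * p[1],
                    zx=r[2] * p[0] - r[0] * p[2])

# A particle at (1, 0, 0) with momentum (0, 2, 0): its angular momentum
# is a parallelogram lying purely in the xy-plane; no third axis needed.
L = wedge((1.0, 0.0, 0.0), (0.0, 2.0, 0.0))
```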

And, this addition is still appropriate: under this definition, the angular momentum of the system of particles is the sum of the angular momenta of the individual particles, and it has all the properties you want: it’s unchanged unless the system experiences a net torque (also represented by a parallelogram), etc.

This gets rid of the arbitrariness and complexities of the right-hand rule, and also has the nice property that for a situation restricted to a plane, you don’t have to arbitrarily introduce a third dimension not part of the problem.

These parallelograms are actually called bivectors, and this is just a very small part of a much larger enterprise called geometric algebra that allows algebraic manipulation of higher-dimensional objects just as linear algebra allows the algebraic manipulation of vectors. There’s a small cadre of physicists who think lots of physics should be redone with geometric algebra.

I’m not qualified to have an opinion on that, but I am pretty confident that using signed parallelogram addition in YouTube physics explanations would be clearer than using the right-hand rule.


Using complexity theory, we can give a partial account of this phenomenon. The concept that we’ll use is length-conditional complexity: if $x$ is an $n$-bit number, $C(x \mid n)$ means the length of the shortest Turing machine that outputs $x$ when given $n$ as an input. In a previous post, I stated the following theorem:

Theorem: Suppose you have a computable family of properties $P_c$ of $n$-bit numbers (with $c$ measuring how stringent the property is) such that:

- For all $n$, $c$, and $x$: $P_{c+1}(x)$ implies $P_c(x)$.
- For all $n$ and $c$, there are at least $(1 - 2^{-c}) \cdot 2^n$ $n$-digit numbers $x$ such that $P_c(x)$ holds.

Then there is a $d$ such that for all $n$ and $c$: $P_c(x)$ holds for every $x$ of length $n$ such that $C(x \mid n) \geq n - c + d$.

Furthermore, the number of $x$’s of length $n$ such that $C(x \mid n) \geq n - c + d$ is at least $(1 - 2^{d-c}) \cdot 2^n$.

In this context, $P_c$ might be of the form “$x$ is not close to any number in some special set”, like “$x$ is not close to a square number” or “$x$ is not close to a prime”, where the meaning of “close” will depend on $n$ and $c$. Or it could be that some statistic about $x$ deviates from the expected value by at most some large amount (again, with the meaning of “large” depending on $n$ and $c$).

Now, suppose we have a bag of computable properties $P^1, \ldots, P^m$ that we’re interested in. Assume that we’re using the same $n$ and $c$ for each, and that they are of similar complexity, meaning that the value of $d$ in the above theorem is the same for each of them.

Then, assuming that $m \cdot 2^{d-c}$ is small, given a random $n$-bit number $x$, it is likely (with probability at least $1 - 2^{d-c}$) that $x$ satisfies $C(x \mid n) \geq n - c + d$, and thus $P^i$ should hold for *all* $i$ without exception. If some $P^i$ doesn’t hold, that means some assumption was violated, in particular that $P^i$ doesn’t hold for at least $2^{n-c}$ $n$-bit numbers, which is good evidence that there’s some pattern or mathematical law here that you were unaware of (e.g., a large proportion of numbers are close to one in some special set). (Note that it is also possible that the assumption that was violated was that the complexity of $P^i$ was higher than expected. However, that *is* tied pretty closely to the length of a program that implements $P^i$, so, assuming that you know how to compute $P^i$, it’s unlikely that it’s much more complex than you originally thought.)
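
The probability bookkeeping in this argument is just a union bound. As a sketch (the helper name is hypothetical, and I write $2^{d-c}$ for the fraction of numbers on which each property can fail):

```python
def chance_all_hold(m, c, d=0):
    """Union bound: if each of m properties fails on at most a 2**(d - c)
    fraction of n-bit numbers, a uniformly random n-bit number satisfies
    all of them with probability at least 1 - m * 2**(d - c)."""
    return max(0.0, 1.0 - m * 2.0 ** (d - c))

few_properties = chance_all_hold(m=10, c=20)   # bound is nearly 1
too_many = chance_all_hold(m=2**20, c=20)      # bound degenerates to 0
```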

On the other hand, if $m$ is large (say, around $2^{c-d}$), then there may be no $x$’s at all satisfying every $P^i$. In that case, it may simply be that the properties are independent for different values of $i$, and it’s not reasonable to assume that if some $P^i$ fails, then you have discovered a possible new mathematical law; it may just be a coincidence instead.

Furthermore, as $n$ gets smaller, you will be *forced* to decrease $c$, making $2^{d-c}$ larger, since if $c > n$, then each $P^i$ must hold of at least $(1 - 2^{-c}) \cdot 2^n > 2^n - 1$ of the $n$-bit numbers, and the only way to do that is to hold for all of them, making the properties trivial.

I certainly don’t claim this captures everything about the Strong Law of Small Numbers. But I like this account, because it gives a way to think about what “small” means: namely, it means small enough that you’re considering numbers of a length $n$ comparable to $c$, where $c$ is the complexity of the properties of numbers that you’re interested in.


It’s an intriguing fact that if you look at the sequence of geometric means $\sqrt[n]{a_1 a_2 \cdots a_n}$ of the continued fraction coefficients of $x$, this approaches a single constant, called Khinchin’s constant, which is approximately $2.685$, for *almost* every $x$. This means that if you were to pick $x$ (for convenience, say it’s between 0 and 1) by writing a decimal point and then repeatedly rolling a ten-sided die forever to generate the digits after that, the $x$ you generate would have this property with probability 1.
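
You can watch the geometric means settle near 2.685 numerically. The sketch below samples a “random” $x$ to 5000 decimal digits and runs the Euclidean algorithm to extract a few thousand continued fraction coefficients; a rational approximation like this only mimics a random real for its early coefficients, but those are all we use. The digit count and seed are arbitrary choices of mine.

```python
import math
import random

random.seed(1)  # arbitrary

# Sample a "random" x in (0, 1) to 5000 decimal digits, as a fraction.
DIGITS = 5000
num, den = random.randrange(10 ** DIGITS), 10 ** DIGITS

# Continued fraction coefficients of num/den via the Euclidean algorithm.
coeffs = []
p, q = num, den
while q and len(coeffs) < 3001:
    a, p, q = p // q, q, p % q
    coeffs.append(a)

terms = coeffs[1:]  # drop the integer part a_0
log_gm = sum(math.log(a) for a in terms) / len(terms)
geometric_mean = math.exp(log_gm)  # should land near 2.685
```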

However, as the Wikipedia page above says, although almost all $x$ have this property (call it the “Khinchin property”), *no* number that wasn’t specifically constructed to have the Khinchin property has been proven to do so (and some numbers, like $e$, $\sqrt{2}$, and all rational numbers, have been shown to *not* have the Khinchin property).

If you want to be the first to find a number having the Khinchin property that wasn’t specifically constructed to have it, my advice is to try Chaitin’s constant, $\Omega$. Roughly, you can think of $\Omega$ as the probability that a randomly selected Turing machine will halt, although there are a few more technicalities than that.

More importantly for our purposes, it’s very likely to have the Khinchin property, because it’s algorithmically random, meaning it has *all* computable properties that almost all numbers have! That means that the following statement implies that $\Omega$ has the Khinchin property:

There is a computable function $f$ satisfying: For all $k$ and $\epsilon$, the set of numbers $x$ between 0 and 1 such that the $n$th geometric mean of $x$’s continued fraction coefficients is within $\epsilon$ of Khinchin’s constant for all $n \geq f(k, \epsilon)$ has measure at least $1 - 2^{-k}$.

Proving that sounds a lot easier to me than, e.g., proving that $\pi$ has the Khinchin property (note that the fact that $f$ is computable is the hard part).

However, some might quibble about whether or not this meets the original criterion: it’s definitely true that $\Omega$ wasn’t constructed to have the Khinchin property; however, in a certain sense, it was constructed to have *every* such property!


I think most mathy people also have the *intuition* that there’s a sense in which an individual string like `10101001110000100101`

is more “random” than `00000000000000000000`

even though both strings are equally likely under the above random process, but they don’t know how to formalize it, and may even doubt that there is *any* way to make sense of this intuition.

Mathematical logic (or maybe theoretical computer science) has a method for quantifying the randomness of individual strings: given a string $x$, the Kolmogorov complexity of $x$ is the length of the shortest Turing machine that outputs it.

In this blog post, I would like to explain why I think this is a very satisfying definition.

I think a good way to help avoid philosophical quagmires when thinking about randomness is to recognize that random numbers are useful in the real world, and to make sure that your thinking about randomness preserves that.

For example, there are algorithms $A$ that take a string $r$ of some fixed length $n$, and produce the correct answer to whatever problem they’re trying to solve on some large proportion $1 - \epsilon$ of all the length-$n$ strings. Then a good approach would just be to feed $A$ a random $r$, and you’ll get the right answer with probability $1 - \epsilon$.

Just to give a concrete example: a very familiar way that random numbers are useful is to estimate the average of a large list of numbers by taking a random sample and averaging them. You might have a list of 1000 numbers (say, bounded between 0 and 10), and have $r$ encode a set of 100 indices; then $A(r)$ will return the average of the numbers at those indices. If you say that $A$ succeeds for this problem if it returns an average that’s within some fixed tolerance of the true average, then you can work out $\epsilon$ for the given tolerance (although I think getting exact numbers for this problem is actually pretty tricky).
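
Here is that sampling example written out; the specific numbers (seed, sample size, tolerance) are arbitrary choices of mine.

```python
import random

random.seed(0)  # arbitrary

# A list of 1000 numbers between 0 and 10, average estimated from a
# random sample of 100 indices.
data = [random.uniform(0, 10) for _ in range(1000)]
true_avg = sum(data) / len(data)

def A(indices):
    """The algorithm: average the entries at the sampled indices."""
    return sum(data[i] for i in indices) / len(indices)

estimate = A(random.sample(range(1000), 100))
error = abs(estimate - true_avg)  # typically a few tenths
```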

The reason that I think that Kolmogorov complexity is a good account of randomness is that the above story “factors” through Kolmogorov complexity in the following way: For any computable $A$ where $1 - \epsilon$ is high enough (in a sense to be made precise below), there is an integer $k$ such that:

- For *all* $r$ with $C(r \mid n) \geq k$, $A(r)$ returns a correct answer.
- Almost all $r$ (of the given length $n$) have $C(r \mid n) \geq k$.

That is, Kolmogorov complexity lets you view the problem as follows: Any string of high complexity will yield the right answer when fed into $A$, so the only role of randomness is as an easy way to generate a string of high Kolmogorov complexity.

As a note: the notation $C(r \mid n)$ means the length of the shortest Turing program that outputs $r$ when given $n$ as an input. The reason for using this concept instead of the plain Kolmogorov complexity $C(r)$ is that we want to, e.g., consider any string of all 0s to be low complexity, even if the length of the string happens to be a high-complexity number.

The intuition for why almost all strings should have high Kolmogorov complexity is that there are only so many Turing machines: For example, there are $2^n$ strings of length $n$, and fewer than $2^{n-c}$ Turing machines of length less than $n - c$, so the proportion of strings of Kolmogorov complexity at least $n - c$ must be at least $1 - 2^{-c}$.
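
A compressor like zlib only gives a crude upper bound on Kolmogorov complexity, but it illustrates the counting argument’s conclusion: a highly patterned string has a very short description, while a typical random string has none.

```python
import random
import zlib

random.seed(0)  # arbitrary

n = 1000
zeros = bytes(n)                                        # highly patterned
noise = bytes(random.randrange(256) for _ in range(n))  # typical string

# The all-zeros string compresses to a few dozen bytes; the random one
# actually gets slightly longer, since it has no structure to exploit.
len_zeros = len(zlib.compress(zeros))
len_noise = len(zlib.compress(noise))
```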

The intuition for why $A$ should be correct for all strings of sufficiently high complexity is as follows: We’re presuming that $A$ is correct for most strings, and that $A$ is computable. If $A$ isn’t correct on a string $r$, that means you can describe $r$ fairly succinctly: i.e., as the $i$th string for which $A$ isn’t correct. This will be a short description since, by presumption, $i$ will be small.

I said above that this fact about Kolmogorov complexity only holds if $1 - \epsilon$ is high enough. How can we formalize this? One approach would be to consider a sequence of algorithms $A_1, A_2, \ldots$ instead of a single $A$ as above. Each algorithm $A_m$ should return a correct answer on at least a $1 - 2^{-m}$ proportion of its input strings. Furthermore, the different algorithms should be consistent: specifically, if $A_m$ returns the correct answer on a string, then so should $A_{m'}$ for every $m' \geq m$.

Now, if we kept the size $n$ of the input string fixed, then this would be trivial, since for $m$ greater than $n$, $A_m$ would have to return the correct answer on any string. So we should also consider how each algorithm treats input strings of every length $n$, giving a correct answer on at least $1 - 2^{-m}$ of the strings of each length. (And if $m > n$, we will have to define “correct answer” for $A_m$ so that every input string of length $n$ returns a correct answer. Thus $A_m$ won’t be very useful on such short strings, but we can look at its behavior for higher $n$’s.)

In fact, it turns out we can describe the whole setup just in terms of the sets of input strings on which each algorithm returns a correct answer.

Definition: A P-test is an assignment of a natural number $\rho(x)$ to each finite string $x$ such that, for each $m$ and each $n$, the number of $x$ of length $n$ such that $\rho(x) \geq m$ is at most $2^{n-m}$.

If $x$ has length $n$, then $\rho(x)$ corresponds to the smallest $m$ such that $A_m$ returns a correct answer on $x$ in our discussion above.

Theorem (Martin-Löf?): For any computable P-test $\rho$, there is a constant $c$ such that: For all $x$ of length $n$ and natural numbers $m$, if $C(x \mid n) \geq n - m + c$, then $\rho(x) \leq m$. Furthermore, the proportion of $x$ of length $n$ such that $C(x \mid n) \geq n - m + c$ is at least $1 - 2^{c-m}$.

I think this was one of Martin-Löf’s original theorems but I’m actually not sure. It’s a rephrasing of the results in Section 2.4 of Li and Vitányi’s book.

So, there is a complexity bound such that any string of high enough complexity will return a correct answer when plugged into the algorithm. However, $m$ may have to be made high (which corresponds to making the success proportion $1 - 2^{-m}$ high) to ensure that there are a large number of such high-complexity strings (or any at all).

The algorithms discussed above are all deterministic: that is, they correspond to things like Monte Carlo integration rather than averaging noisy data collected from the real world.

So what about noisy data? Random numbers are also useful in analyzing real-world data, but the theorem above only applies to computable algorithms. The answer is so simple that it seems like cheating: if you model the noise in your data as coming from some infinite binary sequence $\eta$, you can simply redo the whole thing but with Turing machines that have access to $\eta$! In other words, you won’t get theorems about $C(x \mid n)$, but you will get theorems about $C^\eta(x \mid n)$, which is the length of the shortest Turing machine that has access to $\eta$ and outputs $x$ when given $n$.

Above we considered algorithms that knew ahead of time how many random bits we need. What about algorithms that might request a random bit at any time? This is also handled by Kolmogorov complexity: here we say that an infinite binary sequence is Martin-Löf random if there is some such that each prefix of the sequence of length has complexity at least . (There actually has to be a technical change to the definition of complexity of finite strings in this case.)

As in the finite case, there’s a theorem saying that any sufficiently robust algorithm will yield a correct answer on any Martin-Löf random sequence.

One thing I like about this framework is that it provides an idea for what it means for a single infinite sequence to be random. For example, people often say that the primes are random (in fact, it’s one of their main points of interest). Since the primes are computable, they aren’t random in this sense, but this gives an idea of what it might mean: perhaps there’s some programming language that encapsulates “non-number-theoretic” ideas in some way, and some sequence derived from the primes can be shown to be “Martin-Löf” random with Turing machines replaced by this weaker description language. But this is pure speculation.

]]>

I’ve known about the gist of generating functions for a while, and I’d always thought that the fact that differentiation was meaningful was just a magical coincidence (for some reason, addition and multiplication being meaningful didn’t seem as surprising to me).

But recently, Nathan Linger pointed out to me that over in the functional programming community, they have what I think is a very satisfying answer to this question (he said he got it from sigfpe’s blog, but I’m not sure what post, maybe this one?).

I actually find the general concept of generating functions surprisingly slippery. André Joyal’s notion of a combinatorial species makes things more concrete for me. A combinatorial species is simply a functor from to itself, where is the category of finite sets and bijections. The idea is that, for a finite set , should be considered as the set of all structures of a certain kind on .

For example: is the functor taking a set to the set of all linear orders of elements of , so . Another example is which takes to the set of all trees on elements of .

Each species has an associated generating function where (you could use any set of cardinality instead of ).

It is now possible to make precise the fact that meaningful operations on species correspond to meaningful operations on generating functions. For example, if you define addition on species by letting be the disjoint union of and , then the generating function of the sum of two species is the sum of the generating functions. Similarly, multiplication on species is defined by letting be a pair of an element of and an element of for some partition , and it corresponds to multiplication of generating functions.
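The multiplication rule can be checked mechanically. Here is a small Python sketch (my names, not from the text) using the species of "sets" — exactly one structure on every finite set — so a product structure on a two-block partition is just a subset, and binomial convolution of the coefficients indeed counts 2^n subsets of an n-element set.

```python
import math

N = 8  # work with exponential generating functions truncated at degree N

def egf_product(a, b):
    """Binomial convolution in coefficient form, a[n] = #structures on n
    elements. This is how EGF multiplication mirrors species multiplication:
    choose which i of the n elements go into the first block."""
    return [sum(math.comb(n, i) * a[i] * b[n - i] for i in range(n + 1))
            for n in range(N)]

E = [1] * N                  # species of sets: one structure on every set
subsets = egf_product(E, E)  # an (S, complement) pair, i.e. a subset
assert subsets == [2**n for n in range(N)]
```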

As mentioned before, there’s also an operation on species which corresponds to differentiation of the generating function. It corresponds to an addition of a “hole” in the structure. That is, , and takes bijections on to the output of on the same bijection but fixing .

This is a very powerful fact. For example, a linear order with a hole is the same thing as the product of two linear orders (the one to the left of the hole and the one to the right of the hole). If is the generating function of , this gives us the equation . Since we also know that should be 1, this gives us . Thus, if we didn’t know it already, we can deduce that there are linear orders on elements.
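The "linear order with a hole = pair of linear orders" identity can be verified coefficient by coefficient on truncated power series. A quick Python sketch (names are mine): the generating function of linear orders has all coefficients 1, and its formal derivative equals its square.

```python
import math
from itertools import permutations

N = 8  # truncation order

def series_product(a, b):
    """Cauchy product of two truncated power series."""
    return [sum(a[i] * b[n - i] for i in range(n + 1)) for n in range(N)]

def series_derivative(a):
    """Formal derivative of a truncated power series."""
    return [(n + 1) * a[n + 1] for n in range(N - 1)]

# EGF of linear orders: sum_n n! x^n / n! = 1/(1-x), all coefficients 1.
L = [1] * N

# "A linear order with a hole is a pair of linear orders" becomes L' = L * L:
assert series_derivative(L) == series_product(L, L)[:N - 1]

# And the coefficients really do count linear orders: n! on n elements.
for n in range(6):
    assert len(list(permutations(range(n)))) == math.factorial(n)
```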

Furthermore, there is a notion of composition of species corresponding to composition of generating functions, where intuitively is a partition of together with a -structure on each element of the partition, and a -structure on the partition as a whole.

What is the connection between differentiation of species and ordinary differentiation? Two main ways of approaching ordinary differentiation are through limits or through infinitesimals. There don’t seem to be any limits around, so let’s focus on infinitesimals.

Although you can make infinitesimals precise in various ways, most people who think about calculus using infinitesimals do so in a non-rigorous way. Here’s one common non-rigorous principle:

Let be such that . Then for any (smooth, real-valued) and real , .

Of course, there is no such in the standard definition of the real numbers.

Although this is non-rigorous, if we can translate the same non-rigorous principle over to species, that would give a good account of why differentiation shows up in generating functions. The main insight is what the meaning of should be. The condition that for species simply means that there should be at least one -structure on some set. The condition that is subtler: it means that you can’t put a -structure on two sets at the same time. As in the case with real numbers, this is impossible, but the reasoning works anyway so we’ll go with it.

Now let’s think about what means. An -structure on a set is either an -structure or a -structure. By the definition of composition of species, an -structure on a set is a partition of together with an -structure on each element of the partition and an -structure on the partition as a whole. This means that each element of the partition is given either an -structure or a -structure.

But, by the infinitesimal nature of , at most one element of the partition can be given a structure. That means there are two cases: 0 elements of the partition have a -structure or 1 does. If 0 elements do, every element of the partition has an -structure, and the species is . On the other hand, if one does, we can describe the species by saying which one does (with a hole in the -structure) and what the -structure was. That’s . Therefore !
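Since the inline notation was lost above, here is my reconstruction of the computation in symbols, matching the nilpotent-infinitesimal principle stated earlier:

```latex
% The infinitesimal principle for functions:
\varepsilon \neq 0, \quad \varepsilon^2 = 0, \qquad
f(x + \varepsilon) = f(x) + \varepsilon\, f'(x)
% Its species analogue: F \circ (X + \varepsilon) splits according to how
% many blocks of the partition carry an \varepsilon-structure (0 or 1,
% since \varepsilon^2 = 0 forbids two such blocks):
F(X + \varepsilon) = F(X) + \varepsilon \cdot F'(X)
```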

Note that if we didn’t know what the formula for a species with a hole was, we could go through the preceding informal reasoning and deduce that it should correspond to the derivative! I don’t know if others are convinced, but I find this quite satisfying. To be totally clear, as mentioned before, this argument came from sigfpe’s blog and I don’t know if it has history before that.

]]>

Warning: The solutions are given right after the puzzles. If you want to think about them, cover the screen.

Puzzle: There are 10 prisoners labeled . A malicious warden puts a black or white hat on each. Prisoner can see prisoner ’s hat iff . The warden has each prisoner guess the hat color on their head, in order starting from prisoner 1. The prisoners can hear previous guesses. If a prisoner guesses right, they are freed; otherwise, they are sent back to jail. Given that the prisoners can strategize as a group beforehand, for how many prisoners can they guarantee freedom in the worst case?

It turns out that the prisoners can guarantee the freedom of all prisoners except prisoner 1: Prisoner 1 first counts the number of black hats, then guesses black if it’s even and white if it’s odd. Now prisoner 2 knows their hat color, since they heard prisoner 1’s guess and they can count the number of black hats they see. Once prisoner 2 guesses correctly, prisoner 3 can guess correctly using prisoner 1 and 2’s answer, and so forth.
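The parity strategy is easy to sanity-check by brute force. Here is a Python sketch (function names are mine) that exhaustively verifies that prisoners 2 through 10 always guess correctly.

```python
from itertools import product

def run_parity_strategy(hats):
    """hats[i] in {0, 1} (1 = black); prisoner i sees hats[i+1:].
    Returns every prisoner's guess under the parity strategy."""
    n = len(hats)
    # Prisoner 1 announces the parity of the black hats they see.
    guesses = [sum(hats[1:]) % 2]
    for i in range(1, n):
        seen = sum(hats[i + 1:])    # hats still visible to prisoner i
        heard = sum(guesses[1:i])   # correct guesses of earlier prisoners
        # Announced parity minus everything accounted for = my own hat.
        guesses.append((guesses[0] - heard - seen) % 2)
    return guesses

# Exhaustive check over all 2^10 hat assignments:
for hats in product([0, 1], repeat=10):
    guesses = run_parity_strategy(hats)
    assert guesses[1:] == list(hats[1:])  # everyone but prisoner 1 is right
```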

Generalization 1: What if there are 3 hat colors? What if there are hat colors? What if the hat colors are drawn from an arbitrary, possibly infinite set ?

This generalization makes the solution simpler, since it reveals something about what was going on in the first solution.

As long as the prisoners can put a group structure on the hat colors, they can run the same strategy as before: Player 1 adds up all the colors and announces that. Each subsequent player adds up all the colors they can see and the correct guesses, and subtracts that from the sum that player 1 announced.
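In the cyclic-group case (hat colors 0, …, k−1 under addition mod k) the bookkeeping is identical; a Python sketch (my names), checked exhaustively for 3 colors and 5 prisoners:

```python
from itertools import product

def run_sum_strategy(hats, k):
    """Hat colors are elements of Z/k; prisoner 1 announces the sum mod k.
    Each later prisoner subtracts what they can see and what they've heard."""
    guesses = [sum(hats[1:]) % k]
    for i in range(1, len(hats)):
        seen = sum(hats[i + 1:])
        heard = sum(guesses[1:i])  # earlier prisoners' (correct) guesses
        guesses.append((guesses[0] - heard - seen) % k)
    return guesses

# All but prisoner 1 are saved, for every assignment of 3 colors to 5 prisoners:
for hats in product(range(3), repeat=5):
    assert run_sum_strategy(hats, 3)[1:] == list(hats[1:])
```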

Generalization 2: What if , i.e., there are infinitely many prisoners, arranged like ?

The solution to this generalization makes things more complicated: it uses a trick which is unrelated to the previous problems and which it is not clear how to generalize.

It turns out there is still a strategy to free all but the first prisoner.

Here’s the new trick: Consider the set of all possible assignments of hat colors to prisoners. Let for iff agrees with for all but finitely many prisoners. This is an equivalence relation. Beforehand, the prisoners choose a representative from each equivalence class (this requires the axiom of choice).

Since each prisoner can see all but finitely many of the other prisoners’ hats, each prisoner knows which equivalence class they’re in. Then the solution is similar to before: the first prisoner adds up the finitely many differences between the hat colors and the chosen representative, and the subsequent ones can then all correctly deduce their own hat color.

Generalization 3: What if is an arbitrary ordinal? What if is an arbitrary linear ordering?

This solution makes things simpler, as it reveals what was going on in the previous solution. This problem was solved by Chris Hardin and Alan Taylor.

The answer is that, for an arbitrary linear order, the prisoners have a strategy that *does not use communication* (i.e., nobody can hear anyone else’s guess) and guarantees that the set of prisoners who guess wrong does not have an infinite ascending chain.

If is an ordinal, this guarantees that is finite, and then the prisoners can use communication to save all but the first prisoner. If is the reals, then this guarantees that they can free all prisoners except those in a set of measure zero.

The solution is surprisingly simple. The prisoners agree beforehand on a well-ordering of the set of assignments of hat colors to prisoners, then each prisoner takes the -least assignment consistent with what they can see, and guesses the hat color they have in that assignment.

As an exercise, try showing that an infinite ascending sequence of wrong guesses would translate into an infinite descending sequence of hat color assignments, and thus contradict well-ordering.

Generalization 4: What if is just a relation, not necessarily transitive?

Chris Hardin and Alan Taylor considered this generalization in this paper. It turns out that things become complicated again.

For example, suddenly the number of different hat colors matters. Hardin and Taylor prove the following striking theorem:

Theorem: Suppose the prisoners are labeled and that each even can see all higher-numbered odds and each odd can see all higher-numbered evens. Suppose that no prisoner can hear anyone else’s guess. Then:

- If there are 2 hat colors, the prisoners have a strategy guaranteed to save infinitely many prisoners.
- If there are hat colors, the prisoners have no strategy guaranteed to save anybody.
- If there are hat colors, whether or not the prisoners have a strategy that can save anybody is independent of ZFC.

]]>

has elements , and operations , , and so forth defined on it. Furthermore, there is a map which takes sets to cardinalities such that (and so on).

Ordinary generating functions can be thought of entirely analogously, with set maps replacing sets: There is a class with elements , , and operations , . Furthermore, there is a (partial) map such that (and so on). Here, is defined by . Other operations on set maps (like disjoint union) are similarly defined pointwise.

(This is probably obvious and trivial to anyone who actually works with generating functions, but it only occurred to me recently, so I thought I’d write a blog post about it.)

The class is in fact a set, and is just the set of formal power series . The partial map takes to just in case is “canonically isomorphic” (a notion I’ll leave slippery and undefined but that can be made precise) to the map , where indicates disjoint union.

That provides a semantics for ordinary generating functions. Furthermore, this semantics has a number of features beyond those of cardinality. For example, in addition to respecting and , represents composition.

A similar semantics can be provided for exponential generating functions, but it takes a little more work. In particular, we have to single out as a distinguished set. Let be the smallest set containing all measurable subsets of for any finite and which is closed under finite products, countable disjoint unions, and products with sets for finite .

We can define the measure of all sets in by extending Lebesgue measure in the obvious way (taking the product of a set with will multiply the measure by ). Furthermore, notice that, by construction, every element of every set in is a tuple which (after flattening) has all of its elements either natural numbers or elements of and has at least one element of . Therefore, we can define a pre-ordering on by comparing the corresponding first elements that are in .

The point of all that is that, for , we can form the set which will again be in and its measure will be . The corresponding statement with cardinality is not true since you have to worry about the case when elements in the tuple are equal () but the set of tuples that have duplicates has measure 0, so by working with measure, we can get the equality we want.

Finally, let be the set of formal power series . The partial map takes to just in case is “canonically isomorphic” to the map for all in . Just as before, this map respects , , composition, etc.

Note that the exponential generating functions are usually explained via labeled objects and some sort of relabeling operation. This approach weasels out of that by observing that the event that there was a label collision has probability 0, so you can just ignore it.

]]>

The `Resolve` function is called like `Resolve[formula, domain]`, where `domain` gives the domain for the quantifiers in `formula`. Since we’ll always be working over the reals in this blog post, let’s set that to be the default at the start.

`In[1]:= Unprotect[Resolve]; Resolve[expr_] := Resolve[expr, Reals]; Protect[Resolve]; `

Now let’s see what quantifier elimination lets you do!

(A couple of caveats first though: First, many of these algorithms are extremely inefficient. Second, I had some trouble exporting the Mathematica notebook, so I basically just copy-and-pasted the text. Apologies if it’s unreadable.)

Let’s start with just existential formulas. By eliminating quantifiers from , we can tell what the conditions are on a such that there’s at least one solution . For example:

`In[2]:= Resolve[Exists[x, x^2 + b x + c == 0]]`

Out[2]= -b^2 + 4 c <= 0

This just tells you that there’s a solution to the quadratic if the discriminant is non-negative. Let’s turn this into a function:

`In[3]:= atLeastOneSolution[formula_, variable_] := Resolve[Exists[variable, formula]]`

Now we can verify that cubics always have solutions:

`In[4]:= atLeastOneSolution[x^3 + b x^2 + c x + d == 0, x]`

Out[4]= True

Now suppose we wanted to find when something has at least two solutions. Just like resolving told us when there was at least one, will be true exactly when there are at least two.

This is just as easy to program as `atLeastOneSolution` was, except that when we create the variables and we have to be careful to avoid capture (what if one of those two already appeared in ?). Mathematica provides a function called `Unique`: if you call `Unique[]`, you’re guaranteed to get back a variable that’s never been used before. With that we can define `atLeastTwoSolutions` correctly (edit: actually, this isn’t right if the passed-in variable is also bound in the passed-in formula):

`In[5]:= atLeastTwoSolutions[formula_, v_] :=`

With[{s1 = Unique[], s2 = Unique[]},

Resolve[

Exists[{s1, s2},

s1 != s2 && (formula /. v -> s1) && (formula /. v -> s2)]]]

We can check this by verifying that quadratics have two solutions when the discriminant is strictly positive:

`In[6]:= atLeastTwoSolutions[x^2 + b x + c == 0, x]`

Out[6]= -b^2 + 4 c < 0

Here’s the condition for the cubic to have at least two solutions:

`In[7]:= atLeastTwoSolutions[x^3 + b x^2 + c x + d == 0, x]`

Out[7]= c < b^2/3 &&

1/27 (-2 b^3 + 9 b c) - 2/27 Sqrt[b^6 - 9 b^4 c + 27 b^2 c^2 - 27 c^3] <=

d <= 1/27 (-2 b^3 + 9 b c) + 2/27 Sqrt[b^6 - 9 b^4 c + 27 b^2 c^2 - 27 c^3]

Note that (and I believe `Resolve` always does this) the condition given first is sufficient for the later square root to be well-defined:

`In[8]:= Resolve[ForAll[{b, c}, c < b^2/3 ⇒ b^6 - 9 b^4 c + 27 b^2 c^2 - 27 c^3 > 0]]`

Out[8]= True

It’s clear that we can determine when there are at least n solutions by a very similar trick: just resolve .

We’ll first write a helper function to produce the conjunction of inequalities we’ll need:

`In[9]:= noneEqual[vars_] :=`

And @@ Flatten[Table[If[s1 === s2, True, s1 != s2], {s1, vars}, {s2, vars}]]

In[10]:= noneEqual[{x, y, z}]

Out[10]= x != y && x != z && y != x && y != z && z != x && z != y

And now we’ll write `atLeastNSolutions`:

`In[11]:= atLeastNSolutions[formula_, v_, n_] := With[{sList = Array[Unique[] &, n]},`

Resolve[

Exists[sList,

noneEqual[sList] && (And @@ Table[formula /. v -> s, {s, sList}])]]]

Given `atLeastNSolutions`, we can easily write `exactlyNSolutions`:

`In[12]:= exactlyNSolutions[formula_, v_, n_] :=`

BooleanConvert[

atLeastNSolutions[formula, v, n] && ! atLeastNSolutions[formula, v, n + 1]]

I used `BooleanConvert` instead of `Resolve` since there won’t be any quantifiers left in the formula, so we just have to do Boolean simplifications.

`In[13]:= exactlyNSolutions[x^3 + b x^2 + c x + d == 0, x, 2]`

Out[13]= ! 1/27 (-2 b^3 + 9 b c) - 2/27 Sqrt[b^6 - 9 b^4 c + 27 b^2 c^2 - 27 c^3] < d <

1/27 (-2 b^3 + 9 b c) + 2/27 Sqrt[b^6 - 9 b^4 c + 27 b^2 c^2 - 27 c^3] &&

1/27 (-2 b^3 + 9 b c) - 2/27 Sqrt[b^6 - 9 b^4 c + 27 b^2 c^2 - 27 c^3] <=

d <= 1/27 (-2 b^3 + 9 b c) +

2/27 Sqrt[b^6 - 9 b^4 c + 27 b^2 c^2 - 27 c^3] && c < b^2/3

In[14]:= exactlyNSolutions[x^2 + b x + c == 0, x, 1]

Out[14]= -b^2 + 4 c <= 0 && -b^2 + 4 c >= 0

This last calculation shows that a quadratic has exactly one solution exactly when the discriminant is both nonnegative and nonpositive (as you can see, there is no guarantee that the formula will be in its simplest form).
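The discriminant conditions that `Resolve` produced can be packaged as an ordinary root-counting function. Here is the same fact as a plain Python sketch (not Mathematica; names are mine), agreeing with Out[2] (at least one solution iff -b^2 + 4c <= 0) and Out[6] (at least two iff -b^2 + 4c < 0):

```python
def real_root_count(b, c):
    """Number of real solutions of x^2 + b*x + c == 0, via the discriminant
    in the same normalization Mathematica printed (-b^2 + 4c)."""
    d = -b * b + 4 * c
    if d < 0:
        return 2   # Out[6]: strict inequality gives two distinct roots
    if d == 0:
        return 1   # Out[14]: nonnegative and nonpositive, i.e. exactly zero
    return 0

assert real_root_count(0, -1) == 2   # x^2 - 1: roots +1 and -1
assert real_root_count(-2, 1) == 1   # (x - 1)^2: a double root
assert real_root_count(0, 1) == 0    # x^2 + 1: no real roots
```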

We now have a way to test whether a formula with one free variable has solutions for specific values of , since `exactlyNSolutions` will return either `True` or `False` if you quantify out the only variable. For example:

`In[15]:= p = x^4 - 3 x^3 + 1`

Out[15]= 1 - 3 x^3 + x^4

In[16]:= Plot[Evaluate[p], {x, -3, 3}]

`In[17]:= exactlyNSolutions[p == 0, x, 2]`

Out[17]= True

It would be nice, however, to have a function which will just tell you how many solutions such a formula has.

In the single-variable polynomial case, we could just try `exactlyNSolutions` for until we find the right . However, there might not be finitely many solutions if the formula involves inequalities or polynomials in more variables (e.g., has infinitely many solutions).

How can we tell if a formula has infinitely many solutions? Well, the fact that has quantifier elimination implies that for with just free, must be a finite union of points and open intervals (since the only quantifier-free terms are and ). Therefore is infinite iff it contains a non-empty open interval, i.e., iff .

`In[18]:= infinitelyManySolutions[formula_, v_] := With[{a = Unique[], b = Unique[]},`

Resolve[Exists[{a, b}, a < b && ForAll[v, a < v < b ⇒ formula]]]]

To test:

`In[19]:= infinitelyManySolutions[Exists[y, x^2 + y^2 == 1], x]`

Out[19]= True

Now we can write `numberOfSolutions` and be assured that it will always (theoretically) terminate for any formula with a single free variable:

`In[20]:= numberOfSolutions[formula_, v_] :=`

If[infinitelyManySolutions[formula, v], Infinity,

Block[{n = 0},

While[! exactlyNSolutions[formula, v, n], n++];

n]]

A few examples:

`In[21]:= numberOfSolutions[p == 0, x]`

Out[21]= 2

In[22]:= numberOfSolutions[p > x^2, x]

Out[22]= ∞

In[23]:= numberOfSolutions[p > x^6 + 5, x]

Out[23]= ∞

In[24]:= numberOfSolutions[p > x^6 + 6, x]

Out[24]= 0

In[26]:= Plot[{p, x^6 + 5, x^6 + 6}, {x, -1.6, -1},

PlotLegend -> {HoldForm[p], x^6 + 5, x^6 + 6}, LegendPosition -> {1, 0},

ImageSize -> Large]

Up to now, all our functions have taken single variables, but we can accommodate tuples of variables as well. First, we’ll define the analogue of `noneEqual` to produce the formula asserting that none of the given tuples are equal (recall that two tuples are unequal iff a pair of corresponding components is unequal):

`In[27]:= noTuplesEqual[tuples_] := And @@ Flatten[Table[If[t1 === t2, True,`

Or @@ MapThread[#1 != #2 &, {t1, t2}]], {t1, tuples}, {t2, tuples}]]

In[28]:= noTuplesEqual[{{x[1], y[1]}, {x[2], y[2]}}]

Out[28]= (x[1] != x[2] || y[1] != y[2]) && (x[2] != x[1] || y[2] != y[1])

Now we can add rules to our old function to deal with tuples of variables as well:

`In[29]:= atLeastNSolutions[formula_, variables_List, n_] := With[`

{sList = Array[Unique[] &, {n, Length[variables]}]},

Resolve[

Exists[Evaluate[Flatten[sList]],

noTuplesEqual[sList] &&


`And @@`

Table[

formula /. MapThread[Rule, {variables, tuple}], {tuple, sList}]]]];

We can extend `infinitelyManySolutions` by observing that a formula has infinitely many solutions iff some projection does.

`In[30]:= infinitelyManySolutions[formula_, variables_List] := Or @@ Table[`

infinitelyManySolutions[Exists[Select[variables, ! (# === v) &], formula],

v], {v, variables}]

In[33]:= ContourPlot[{x^2 + y^3 - 2, x^2 + y^2/4 - 2}, {x, -3, 3}, {y, -3, 3}]

`In[34]:= exactlyNSolutions[x^2 + y^3 == 2 && x^2 + y^2/4 == 2, {x, y}, 2]`

Out[34]= False

(There are actually four solutions. This example of a set of equations for which it’s difficult to tell how many solutions there are by graphing is from Stan Wagon.)

In the last section, we saw how to use quantifier elimination to find out how many roots there are. But how can you actually find the roots?

In a certain sense, you’ve already found them just when you identified how many there are! To “find” a root in this sense, you just introduce a new symbol for it, and have some means for answering questions about its properties. Given some property , if you want to determine if it holds of the 6th root of some polynomial with 17 roots, then you just have to decide .

We can implement this with a function `withSpecificRoot`, which takes a variable, the formula it’s supposed to be a solution to, which of the roots it’s a solution to, the total number of roots, and a formula in which you want to use this root:

`In[35]:= withSpecificRoot[variable_, rootFormula_, whichRoot_, totalRoots_, formula_] :=`


`With[{roots = Array[Unique[] &, totalRoots]},`

Resolve[

Exists[Evaluate[roots~Join~{variable}],

Less[Sequence @@ roots] &&

variable ==

roots[[whichRoot]] &&

(And @@

Table[(rootFormula /. variable -> root), {root, roots}]) && formula]]]

We can tell where various roots are with respect to already-known real numbers:

`In[36]:= withSpecificRoot[x, x^2 - 3 == 0, 1, 2, x < 3]`

Out[36]= True

In[37]:= withSpecificRoot[x, p == 0, 1, 2, x < 1]

Out[37]= True

In[38]:= withSpecificRoot[x, p == 0, 2, 2, x < 1]

Out[38]= False

We can also compute relationships between roots like :

`In[39]:= withSpecificRoot[sqrt6, sqrt6^2 == 6, 2, 2,`

withSpecificRoot[lhs, lhs^2 == 5 + 2 sqrt6, 2, 2,

withSpecificRoot[sqrt3, sqrt3^2 == 3, 2, 2,

withSpecificRoot[sqrt2, sqrt2^2 == 2, 2, 2,

lhs == sqrt3 + sqrt2

]]]]

Out[39]= True

That’s all I have time for now, but I hope to write another blog post on the subject soon!

]]>

Although topology is usually motivated as a study of spatial structures, you can interpret topological spaces as being a particular type of logic, and give a purely logical, non-spatial interpretation to a number of bits of topology.

This seems like one of those facts that was obvious to everyone else already, but I’ll write a quick blog post about it anyway.

As you’re probably aware, a set of natural numbers is called semi-decidable if there is a computer program which, given any , will eventually terminate and return “Yes” if . If , the program is not required to ever return and you may never learn whether or not .

There are many such “semi-decidable” propositions unrelated to natural numbers which intuitively have the same property: i.e., there is some test you can perform such that, if is true, you will eventually find out, but if it’s false, you may never learn that fact. For example, consider the proposition that you are (strictly) taller than 6 feet. To test , you could measure your height with ever-finer rulers. If your height is actually strictly greater than 6 feet, you will eventually find out when you use a ruler with granularity finer than to measure your height. On the other hand if is false and you are unfortunate enough that is exactly 6, you will never learn whether or not is true no matter how fine a ruler you use.
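The ever-finer-ruler test can be sketched directly. Here is a toy Python version (exact rationals standing in for real measurements; all names are mine): it halts with a verdict whenever your height is strictly above the threshold, but would loop forever in the borderline case — exactly the one-sided behavior described above.

```python
from fractions import Fraction

def verify_taller_than(height, threshold):
    """One-sided (semi-decidable) test: halts, returning True, whenever
    height > threshold; may run forever otherwise (so don't call it then!)."""
    granularity = Fraction(1)
    while True:
        # Measure with the current ruler: round the true height down.
        measured = (height // granularity) * granularity
        if measured > threshold:
            return True  # verified: even a lower bound on height exceeds it
        granularity /= 2  # fetch a finer ruler and try again

# 6.01 ft versus a 6 ft threshold: terminates once the ruler is fine enough.
assert verify_taller_than(Fraction(601, 100), Fraction(6)) is True
```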

Let’s come up with a logic for such semi-decidable propositions. We’ll keep it a propositional logic to keep things simple. First off, notice that we shouldn’t allow negation: if is semi-decidable, it’s not necessarily the case that is. On the other hand, we can allow conjunction: if and are semi-decidable, then you can test by testing and separately and stopping if and when both the tests for and stop.

Furthermore, we can allow *arbitrary* disjunctions: given , we can test by running all the tests in parallel and stopping when any of them stop. Note that even given that we can run arbitrarily many tests at the same time, it still doesn’t follow that arbitrary conjunctions of semi-decidable propositions are semi-decidable: if the first one terminates after 1 minute, the second after 2, etc., we’ll never be able to stop the test of the conjunction even though all tests terminate eventually.

Implication is a bit tricky: isn’t necessarily semi-decidable for the same reason that isn’t, but we still want to reason about the case where implies . Therefore, we’ll allow the formation of the statement with the meaning that implies , but only at the “top level”, i.e., you can’t nest this connective.

Finally, both and are semi-decidable.

Now we need rules to tell us when a set of statements implies another statement. First off, there are some boring structural rules that I’ll omit (e.g., and imply and so on).

There are rules that give the two connectives their meaning:

- For any and , and hold.
- For any , , and , the statements and together imply .
- For any and , holds.
- For any and , the set of statements implies the statement .

Finally, there’s a distributivity rule:

- For any and the statements and are equivalent (each implies the other).

As you’ve probably guessed, there is a close connection between semi-decidable logics and topological spaces. In fact, given a topological space, you can form a semi-decidable logic by making a propositional symbol for each open set . You can then interpret all propositional formulas as open sets by interpreting as union and as intersection. Finally, take as a set of axioms the set of all statements where the open set corresponding to is a subset of the open set corresponding to . This set is closed under the inference rules given.
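This dictionary can be played with concretely on a finite space. Here is a tiny Python sketch (my own toy construction) where the propositions are the open sets of a four-open-set topology on {0, 1, 2}, disjunction is union, conjunction is intersection, and entailment is inclusion:

```python
# Propositions = open sets of a small topology on the space {0, 1, 2}.
space = frozenset({0, 1, 2})
opens = {frozenset(), frozenset({0}), frozenset({0, 1}), space}

# Disjunction is union, conjunction is intersection; a topology is closed
# under both (for finitely many opens), so the interpretation is well-defined:
for U in opens:
    for V in opens:
        assert U | V in opens
        assert U & V in opens

# "P entails Q" is interpreted as inclusion of the corresponding open sets:
assert frozenset({0}) <= frozenset({0, 1})  # {0} entails {0, 1} ...
assert not space <= frozenset({0, 1})       # ... but not conversely
```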

You can also start with a semi-decidable logic and generate a topology; this is a form of Stone duality. In general, if you start with a topological space, translate to a semi-decidable logic, then translate back, you might not get your original space back. However, you will if the space you start with is sufficiently nice (e.g., Hausdorff).

With that out of the way, let’s interpret some topological concepts in our new logical framework!

- Topologically, a *neighborhood* is any set which contains a non-empty open set. Logically, these correspond to propositions that are *possible to learn*. In the height example, the proposition that your height is in is a neighborhood, since you might happen to learn it by learning some stronger fact, but you can’t run a semi-decidable test for exactly that proposition. In contrast, the proposition that your height is exactly 6 ft is not even a neighborhood.
- Topologically, an open covering is a set of open sets whose union covers the whole space. Logically, this corresponds to a *deterministic experiment*. If you run the tests in parallel, you are guaranteed that eventually (at least) one of them will stop, since the sets cover the whole space. In the height example, the open covering of all open intervals of diameter corresponds to measuring your height to a granularity of and recording the results.
- Topologically, an open covering is a *refinement* of an open covering if every element of is a subset of some element of . Logically, this corresponds to experiment being *more informative* than experiment : whatever answer you get from experiment , you will be able to answer the question asked by experiment .
- Topologically, a space is *compact* if every open cover has a finite refinement. Logically, this means that anything about the space that you could find out by any experiment at all is actually discoverable by an experiment that runs only finitely many tests, and hence is (maybe) doable in real life.
- Topologically, a space has *Lebesgue covering dimension* if all open covers have a refinement with no of the open sets having non-empty intersection. Logically, this corresponds to something like a bound on the amount of information you can get from one experiment. The information you get from running an experiment is just the list of propositions (open sets) which you’ve learned are true. The condition guarantees that that list will be no longer than , bounding the information received from the experiment. This makes spatial sense too: a measurement on in general yields less information than a single measurement on .

Actually, that last correspondence was the whole impetus for me writing this blog post: I never really understood the definition of Lebesgue covering dimension from a spatial perspective, but it makes perfect sense to me from a logical perspective.

Here are a few more random facts which may or may not be accurate and/or make sense.

I believe that “semi-decidable logic” as I presented it is in fact the propositional version of geometric logic. Geometric logic also has a higher-order form: just as the propositional form corresponds to topologies, I believe the full higher-order form corresponds to toposes.

I think you can extend this interpretation of topological spaces to an analogous one for sheaves. I believe it’s something like: a sheaf corresponds to a set of solutions to some problem that you learn more about as you learn more semi-decidable propositions. In particular, the gluing property corresponds to the fact that: if you can determine via an experiment which of or holds, and you have a solution given and a solution given that are compatible, then you have a solution: run the experiment, then use the solution of whichever of or turns out to be true.

Making sense of this is left as an exercise to the reader.

I said above that I’d address the assumption that you can run arbitrarily many tests at once. I believe that, among many other things, Grothendieck topologies remove this restriction.

Regular topologies have the sort-of-odd property that the open covering relation is completely determined by the partial ordering on open sets given by inclusion. Grothendieck topologies do away with this: in a Grothendieck topology, there is in addition to the partial order an assignment for every open set of which sets of open sets are deemed to define “open covers”. Grothendieck topologies also remove the restriction that the “partial order” on open sets is a partial order; it’s allowed to be a more general category.

]]>

- if logically entails .
- if Sue considers at least as likely to be true as is.

Let be the equivalence relation defined by iff and let similarly be defined by iff .

Then we know what type of structure B/≡ is (where B is the set of propositions): since we’re assuming classical logic in this article, it’s a Boolean algebra. What type of structure is B/≈?

We can at least come up with a couple of examples. Since Sue is a perfect logician, it must be that if p ≤ q, then p ⊑ q. If Sue is extremely conservative, she may decline to offer opinions about whether one proposition is more likely to be true than another except when she’s forced to by logic. In this case, ⊑ is equal to ≤, and B/≈ is therefore again a Boolean algebra.

In the other extreme, Sue may have opinions about *every* pair of propositions, making B/≈ a total ordering. A principal example of this is where B/≈ is isomorphic to a subset of [0, 1] and Sue’s opinions about the propositions were generated by her assigning a probability P(p) to every proposition p.

What’s in between on the spectrum from logic to probability? Are there totally ordered structures *not* isomorphic to [0, 1] or a subset of it? More ambitiously: every Boolean algebra has operations ∧, ∨, ¬, while [0, 1] has operations x · y, x + y, 1 − x, which play similar roles in the computation of probabilities (note that + is partial on [0, 1]). How are these related, and does every structure on the spectrum from logic to probability have analogous operations?

These structures (i.e., structures of the form B/≈ for some ⊑ acceptable in a sense to be defined below) were called **scales** and were defined and explored in a very nice paper by Michael Hardy.

Modding out by the equivalence relations once and for all, the general setup is that we have a map σ (induced by the identity function on propositions in the above setup) from a Boolean algebra B to a poset S. What should be true of σ?

Since if a proposition p logically entails a proposition q, Sue will consider q at least as likely to be true as p, we should have that p ≤ q implies σ(p) ≤ σ(q) (≤ will now denote the ordering in either B or S, depending on context). In fact, we should have that p < q implies σ(p) < σ(q).

Actually we should have more: for example, it should be the case that if p ≤ q, then σ(¬q) ≤ σ(¬p). In general, if φ(p) is a propositional formula in which p appears negatively (that is, all occurrences of p are negated in a normal form of φ), then p ≤ q should imply σ(φ(q)) ≤ σ(φ(p)), and the reverse is true if p appears positively in φ. Furthermore, if p < q we can require that the inequality be strict.

Finally, we should require that σ(¬q) ≤ σ(¬p) hold not just if p ≤ q, but even if it only holds that σ(p) ≤ σ(q). That is, even if p doesn’t logically entail q, if you consider q at least as likely to be true as p, you should consider ¬p at least as likely to be true as ¬q. A similar generalization of the monotonicity property above holds as well.

These considerations are equivalent to Hardy’s definition:

Let B be a Boolean algebra, S be a poset, and σ : B → S. Then σ is called a **basic scaling** if:

- σ is strictly increasing, so that a < b implies σ(a) < σ(b).
- σ preserves relative complementation, so that if a ≤ b and c ≤ d with σ(a) = σ(c) and σ(b) = σ(d), then σ(b ∖ a) = σ(d ∖ c), where b ∖ a is the relative complement b ∧ ¬a.

Hardy proves that the relative complement operation is well-defined on S; that is, σ(b ∖ a) depends only on σ(a), σ(b), and the fact that a ≤ b. Note, however, that it is a partial operation: even if x ≤ y in S, there is no guarantee that there are a ≤ b in B such that σ(a) = x and σ(b) = y.
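We can see this well-definedness concretely in the simplest kind of example. Below, σ is a measure with positive integer weights on the Boolean algebra of subsets of a four-element set (this toy example and all the names in it are mine, not Hardy’s); the weights are chosen so that distinct subsets can collide in σ-value, and the check confirms that the relative complement’s value depends only on the endpoints’ values:

```python
from itertools import combinations

# A toy basic scaling (my own example): sigma is a measure with positive
# integer weights, so it is strictly increasing on the subset order.
weights = {0: 1, 1: 2, 2: 3, 3: 4}
universe = frozenset(weights)

def subsets(s):
    s = list(s)
    return [frozenset(c) for r in range(len(s) + 1) for c in combinations(s, r)]

def sigma(a):
    return sum(weights[x] for x in a)

# Whenever a <= b, c <= d, sigma(a) == sigma(c), and sigma(b) == sigma(d),
# the relative complements b - a and d - c get the same sigma-value, so
# relative complementation is well-defined on the image of sigma.
pairs = [(a, b) for a in subsets(universe) for b in subsets(universe) if a <= b]
for a, b in pairs:
    for c, d in pairs:
        if sigma(a) == sigma(c) and sigma(b) == sigma(d):
            assert sigma(b - a) == sigma(d - c)
print("relative complementation is well-defined on this scale")
```

Here the check succeeds because σ(b ∖ a) = σ(b) − σ(a) whenever a ≤ b, which visibly depends only on the two σ-values.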

A **scale** is then defined as a poset together with a partial relative complement operation which is the range of a basic scaling.

Hardy’s paper gives many examples of scales, including a few pretty wild ones. Here’s one: Let be the boolean algebra of subsets of . Let iff or . Let iff . This defines a basic scaling to a scale .

What does it look like? Every element except for the bottom has an immediate predecessor, and every element except for the top has an immediate successor. Therefore, it is partitioned into “galaxies”, together with an initial galaxy and a final galaxy. Between any two galaxies that are comparable, there are uncountably many galaxies and infinite antichains of galaxies.

We already know that there are appropriate analogues of ¬ in all scales, since we know that relative complementation carries over in a well-defined way from the domain Boolean algebra (¬x being the relative complement of x in the top element).

What about +? Hardy proves the following:

- If a ∧ b = 0 in B, then σ(a ∨ b) depends only on σ(a) and σ(b). In this case we define σ(a) + σ(b) to be σ(a ∨ b).
- For x and y in the scale, if x + y exists then x ≤ ¬y.

It turns out that, for any y, the operation x ↦ x + y is a partial injective map. Let z ↦ z − y be its inverse.

Hardy calls a scale **divided** if the necessary condition for x + y to exist given by (2) above is also sufficient. He proves:

For any divided scale and any a, b in the domain Boolean algebra, σ(a) + σ(b) = σ(a ∨ b) + σ(a ∧ b).

In other words, all divided scales do have a + operation, which satisfies the appropriate law from probability theory.

Finding an analogue of · or ÷ is trickier, and, when he wrote the paper, Hardy only knew how to do it in the case that the scale is linearly ordered and Archimedean, defined as follows:

Let x be an element of a scale. Then x is called **infinitesimal** if there is an infinite set A of pairwise disjoint elements of the domain Boolean algebra such that x ≤ σ(a) for all a in A. A scale is called **Archimedean** if it is divided and has no nonzero infinitesimals.

The idea behind the definition of infinitesimal is that, assigning the Boolean algebra a total measure of 1, the measures of the elements of such an infinite set must approach 0.

In that case, you can define a division as follows: given x ≤ y, let n₀ be the maximum number of times x can be subtracted from y, let n₁ be the maximum number of times the resulting remainder can be subtracted from x, and so on. The quotient x ÷ y is then defined as the continued fraction:

x ÷ y = 1/(n₀ + 1/(n₁ + 1/(n₂ + ⋯)))

Then the map x ↦ x ÷ 1 (division by the top element) maps the scale injectively onto a subscale of [0, 1] (in particular, preserving +). Thus, multiplication can be pulled back from its definition on [0, 1].
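For ordinary positive quantities, this subtraction scheme is just the Euclidean algorithm, and the continued fraction it builds really does compute the quotient. Here is a quick sketch (the function name is mine):

```python
from fractions import Fraction

def cf_quotient(a, b, depth=50):
    """Quotient a/b via the subtraction scheme in the text: n0 = number of
    times a can be subtracted from b, n1 = number of times the remainder
    can be subtracted from a, and so on (this is Euclid's algorithm); the
    result is the continued fraction 1/(n0 + 1/(n1 + ...))."""
    assert 0 < a <= b
    big, small = b, a
    terms = []
    while small != 0 and len(terms) < depth:
        n, r = divmod(big, small)   # subtract `small` from `big` n times
        terms.append(n)
        big, small = small, r
    value = Fraction(0)
    for n in reversed(terms):       # evaluate the continued fraction
        value = 1 / (n + value)
    return value

print(cf_quotient(Fraction(3), Fraction(7)))  # 3/7
```

For example, 3 can be subtracted from 7 twice (remainder 1), and 1 can be subtracted from 3 three times, so 3 ÷ 7 = 1/(2 + 1/3) = 3/7.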

]]>

Let’s momentarily accept the heresy of saying that the square root of a negative number is 0, so that our square-root function will be total.

How can we represent the situation of this branching “function” topologically?

One thing we could do is just take the graph of the multivalued “function” itself, with the subspace topology from ℝ², which topologically looks just like this:

This has the downside that, at the origin (the place where all three lines meet), it doesn’t really represent the fact that the graph of the original multivalued “function” was the union of two genuine single-valued functions: There is no neighborhood of the origin which is functional in any way.

But what about this topological space? (The two filled-in dots represent points which are present in the space, the non-filled-in dot represents a point missing from the space. The border is not part of the space.)

Here, the open sets are given by a basis consisting of open sets along any of the three branches, i.e., like this

and this:

and this:

as well as open sets of this form:

and this form:

(Note that this space isn’t Hausdorff!).

This space (call it E) *represents* the fact that the original graph was the union of two genuine single-valued functions in the following sense: there is a function, π, from E to the x-axis (i.e., to ℝ) such that for every point e of E, there is an open set U containing e such that π restricted to U is a homeomorphism onto an open set in ℝ. That is, every point in E has a neighborhood such that the inverse of π restricted to that neighborhood *is* a function.
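Concretely, the local inverses of π are the three “sections” visible in the picture. A small sketch (the parametrization of the branches is my own):

```python
import math

# The square-root space, parametrized (my own encoding of the picture):
# points (x, +sqrt x) and (x, -sqrt x) for x >= 0, and (x, 0) for x <= 0.
# pi projects a point down to the x-axis.
def pi(point):
    x, y = point
    return x

def upper(x): return (x, math.sqrt(x))   # section over x >= 0
def lower(x): return (x, -math.sqrt(x))  # section over x >= 0
def left(x):  return (x, 0.0)            # section over x <= 0

# Each section is a local inverse of pi: pi(section(x)) == x.
for section, x in [(upper, 4.0), (lower, 4.0), (left, -2.0)]:
    assert pi(section(x)) == x
```

Away from the origin, each point of the space lies on exactly one of these sections, which is the “functional neighborhood” property the paragraph above describes.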

The space E as defined above is (sort of) a sheaf over ℝ. More precisely, it’s an étalé space: to restate the definition above more generally, an étalé space over a topological space X is a topological space E together with a function π : E → X such that, for every e in E, there is an open set U containing e such that π(U) is open and π restricted to U is a homeomorphism onto π(U). In general, for any x in X, the set π⁻¹(x) is called the *stalk* over x. Note that, as a subspace of E, any stalk is discrete.

As another example along similar lines, gluing the domains of the different branches of the complex logarithm gives rise to a sheaf over ℂ: (This image is from Wikipedia and is by Jan Homann.)

In this case, E is the space pictured, X is ℂ (or ℂ ∖ {0}), and π is projection along the depicted vertical direction. Note that this represents how the domains of the various branches of the complex logarithm fit together; this graph is *not* a depiction of the complex logarithm or any branch of it, which is a function from ℂ to ℂ, and thus hard to draw!

The two examples I gave were of natural functions which happen to be multivalued, but there are much more general examples of étalé spaces. For example, for any topological spaces X and Y, there is a sheaf of *all* continuous functions from X to Y! The étalé space corresponding to this sheaf would have, for each x in X, the stalk over x equal to the set of germs of continuous functions from X to Y at x. (See here for more.)

And now we’ll switch to a seemingly totally different topic.

A modal logic is a logic that contains an operator □, representing necessity, and an operator ◇, representing possibility. If p is a proposition, then □p should be interpreted as the proposition “It is necessary that p”, or “p is necessarily true”, and ◇p should be interpreted as the proposition “It is possible that p is true”.

The operators □ and ◇ are dual to each other: ◇p is equivalent to ¬□¬p, and similarly □p is equivalent to ¬◇¬p (if you think about it, this actually makes real-life sense, as well as just sense in logic-land!).

There are a number of different axiomatic systems for propositional modal logic; here we’ll just consider S4, which was invented by C. I. Lewis in the early 20th century. It has the following rules:

- For all propositions p, □p → p is a theorem.
- For all propositions p, □p → □□p is a theorem.
- For all propositions p and q, □(p → q) → (□p → □q) is a theorem.
- For any proposition p, if p is a theorem, then □p is a theorem. (Note that this does *not* say that p → □p is a theorem; if it did, the whole thing would be trivial!)

In the middle of the 20th century, Saul Kripke invented possible-worlds semantics for modal logics. The idea is that there is a set W of possible worlds, and at each possible world w, each atomic proposition may hold or not, independently of the other possible worlds. Furthermore, there is a relation R between possible worlds; wRv should be interpreted as “world w considers world v possible”. The whole setup is called a *Kripke frame*.

This gives a semantics for modal logic: for any proposition p, ◇p holds at a world w if there is some world v that w considers possible such that p holds at v. Similarly, □p holds at a world w if p holds at *every* world that w considers possible.

Somebody showed (probably Kripke, but I’m not sure) that the class of Kripke frames where R is reflexive and transitive corresponds to S4, in the sense that all theorems of S4 hold in all such Kripke frames, and everything which holds in all such Kripke frames is a theorem of S4.

Note that the accessibility relation is completely clear-cut and discrete: given a world w, you know exactly which worlds it considers possible. But, interpreting accessibility as a measure of “closeness” of two worlds, observe that topology gives us a more nuanced version of what “closeness” means: we think of the topology on ℝ, for example, as defining what “closeness” means on ℝ, even though no two fixed real numbers are actually close to one another! (At least, they’re not close to one another in any absolute sense.)

It turns out we can incorporate this sense of “closeness” into the semantics for modal logic as well. We’ll define a topological Kripke model this way: We again have a set W of possible worlds, but instead of defining a relation between them, we define a topology on W. As above, we say, for each possible world, which of the atomic propositions hold at that world. Note that this is equivalent to defining, for each atomic proposition, an arbitrary subset of W. In general, we’ll let [[p]] be the set of worlds at which proposition p holds. Then we can define [[□p]] to be the interior of [[p]] and [[◇p]] to be the closure of [[p]]. In other words, □p holds at a world w if there is some open set U containing w such that p holds at every world in U, and ◇p holds at a world w if for all open sets U containing w there is some world in U at which p holds.
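On a finite topological space, the interior and closure operators (and hence this semantics) can be computed directly. Here is a minimal sketch; the particular worlds, topology, and proposition are my own toy choices:

```python
# A tiny topological Kripke model: worlds {0, 1, 2}, with a topology
# (closed under union and intersection) listed explicitly.
worlds = frozenset({0, 1, 2})
opens = [frozenset(), frozenset({0}), frozenset({0, 1}), worlds]

def interior(s):   # [[box p]] = interior of [[p]]
    return frozenset().union(*[u for u in opens if u <= s])

def closure(s):    # [[dia p]] = closure of [[p]], dual to interior
    return worlds - interior(worlds - s)

def implies(s, t): # [[p -> q]]
    return (worlds - s) | t

p = frozenset({1, 2})  # [[p]]: an arbitrary (non-open) proposition

# The S4 axioms hold at every world:
assert implies(interior(p), p) == worlds                      # box p -> p
assert implies(interior(p), interior(interior(p))) == worlds  # box p -> box box p
# ...but p -> box p fails at some worlds, as it should:
print(implies(p, interior(p)))  # frozenset({0})
```

Here [[p]] = {1, 2} has empty interior, so □p holds nowhere and p → □p fails at worlds 1 and 2, illustrating why the necessitation rule does not make p → □p a theorem.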

It turns out that this semantics, too, corresponds to S4, in the sense that all theorems of S4 hold in all topological Kripke models and all propositions which hold in all topological Kripke models are theorems of S4. (I’m not exactly sure who first came up with this. I looked at the Wikipedia article and some of its links, but I couldn’t quite figure out who was the first. It seems Tarski was involved somehow anyway.)

So far, the logics we’ve considered have all been propositional, but you can easily add first-order logic to S4 to get FOS4. There have been a number of proposals for how to define a semantics for FOS4. In 2008, Steve Awodey and Kohei Kishida proposed that the right generalization of topological semantics for S4 to first-order semantics for FOS4 was to use étalé spaces!

Here’s an example of how it works. Consider the following étalé space:

The top space, E, is homeomorphic to ℝ. The bottom space, X, is the circle S¹. The projection π winds the line around the circle.

We can put a first-order structure on this étalé space by considering each stalk π⁻¹(x), for x in X, to be a first-order universe, just as when defining regular semantics for first-order logic.

We can define the interpretation of relation symbols, function symbols, and constant symbols more or less arbitrarily: The only restrictions are that the selection of the interpretation of any particular constant symbol from each stalk must be done in a continuous manner, and there is a similar continuity condition on the interpretation of function symbols.

It will turn out that the interpretation of a formula with n free variables will be a subset of E^n, where E^n is the n-th fibered product of E with itself over X. (This just means that, e.g., E² is the étalé space over X where the stalk over any x is the square of the stalk of E over x.) In particular, as in the regular topological semantics, the interpretation of a sentence will be a subset of X.

The crucial step, as before, is that the interpretation of □φ is the interior of the interpretation of φ, and the interpretation of ◇φ is the closure of the interpretation of φ, although now the interiors and closures are being taken in the topology of E^n.

To see how this works, suppose that in our example we define the relation ≤ on each stalk by restricting the usual ordering on ℝ. Then it is true in our model that ∃x ∀y (x ≤ y) (since it is true in every stalk), but it is not true that ∃x □∀y (x ≤ y). The reason is that in this stalk:

the red dot is the minimum element of the stalk, but if you push it to the right just a little bit, that’s no longer true. Intuitively, it’s the minimum element, but not *necessarily* so.

Awodey and Kishida prove that, as in the other cases, this sheaf semantics corresponds to FOS4 in that every theorem of FOS4 holds in all sheaf models, and everything which holds in all sheaf models is a theorem of FOS4. (They actually prove something a bit stronger than that.)

This relationship between sheaves and logic came up relatively recently, but there is a longer and more well-known relationship which I’ll just mention briefly, namely through toposes. Using topos theory, you can interpret the class of sheaves over a topological space X as a category similar to the category of sets. The interpretation is actually fairly similar to the interpretation here, but instead of interpreting modal logic, higher-order (non-modal) logic is interpreted. It turns out that the set of truth values in this interpretation is the set of open subsets of X (whereas in the modal interpretation just given, the truth values could be arbitrary subsets of X). One of the remarkable things about this is that the set of open subsets of X can *itself* be interpreted as an étalé space over X, which is what allows the equivalent of power sets to be taken in this category.

For more information on this, see Sheaves in Geometry and Logic.

]]>

**Observation #1** (which I read in Chapter 23 of David Easley and Jon Kleinberg’s book Networks, Crowds, and Markets): **Voters will vote strategically (i.e., they will lie) even when they have a common goal.**

In the setup above, where each voter has a set of personal preferences and voters are essentially competing with other voters who have different preferences, it is easy to come up with situations where it would be advantageous for a voter to lie. For example, if a voter’s true ranking is A > B > C, where A, B, and C are candidates, but B has a much better chance of winning than A does, it may be advantageous for the voter to submit the ranking B > A > C if she wants to maximize the chance that B comes out on top.

However, in a situation where every voter has the same goals, but they have different private information (and it’s impossible or infeasible for them to share their private information with each other), it seems like there’s never a reason for a voter to lie. But there is, even when there are only two alternatives that are being voted on.

Consider the following game: There is a vase filled with marbles. Either it has 10 white marbles (call this state A) or 5 white marbles and 5 green marbles (call this state B). Which of state A or state B holds was determined by flipping a fair coin before the game started (and this fact is common knowledge). Each of three voters independently and without communication draws one marble at random from the vase, observes its color, puts it back, and then votes on whether state A or state B holds. The voters win if a majority guesses right and lose otherwise.

As you can work out: if you draw a white marble, you believe state A holds with probability 2/3 and state B holds with probability 1/3. If you draw a green marble, you believe state A holds with probability 0 and state B holds with probability 1 (you are sure that state B holds).

However, voters will not vote their true beliefs: suppose they did, and consider whether a fixed voter has an incentive to deviate from this strategy (i.e., consider whether all voters voting their true beliefs is a Nash equilibrium). When you draw a green marble, you should definitely vote for state B. But what should you do when you draw a white marble? The key question is: when will your vote make a difference? Only when one other voter votes A and the other voter votes B. But because they are voting sincerely, the voter who voted B must have drawn a green marble and, therefore, she must be right! So, you should vote B as well. That is, if you think that the other two voters are voting sincerely, you should disregard the information you get from observing a marble, and always vote B!
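The pivotal-vote argument can be checked by brute-force enumeration. In the sketch below (the encoding is mine), we condition on having drawn a white marble ourselves, assume the other two voters vote sincerely, and compare the probability of a group win if we vote sincerely (for the all-white state) against always voting for the mixed state:

```python
from fractions import Fraction
from itertools import product

# State A = 10 white marbles, state B = 5 white / 5 green, fair-coin prior.
# The other two voters vote sincerely (green -> B, white -> A); we have
# drawn WHITE and compare voting A (our sincere belief) with voting B.
def p_draw(color, state):
    if state == "A":
        return Fraction(1) if color == "white" else Fraction(0)
    return Fraction(1, 2)  # state B: half white, half green

def win_prob(my_vote):
    won, norm = Fraction(0), Fraction(0)
    for state in ("A", "B"):
        for d1, d2 in product(("white", "green"), repeat=2):
            # joint weight of: this state, our white draw, the others' draws
            w = Fraction(1, 2) * p_draw("white", state) * p_draw(d1, state) * p_draw(d2, state)
            if w == 0:
                continue
            norm += w
            votes = [my_vote] + ["B" if d == "green" else "A" for d in (d1, d2)]
            majority = "A" if votes.count("A") >= 2 else "B"
            if majority == state:
                won += w
    return won / norm  # conditioned on our own white draw

print(win_prob("A"), win_prob("B"))  # 3/4 11/12
```

Even after drawing white (which makes A more likely), always voting B wins with probability 11/12 versus 3/4 for sincere voting, exactly as the argument above predicts.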

In the Easley-Kleinberg chapter, the authors also demonstrate a version of this with juries, where the vote must be unanimous to convict a defendant, and otherwise she will be acquitted. A similar situation happens: assume that you are thinking of voting to acquit. Under what circumstances will your vote make a difference? Only when every other juror has voted to convict. In that circumstance, it is quite likely that you are wrong, and the defendant was guilty, and thus you should have voted to convict no matter what! (This shows that everyone voting sincerely is not an equilibrium but doesn’t show what the equilibria of this game are. According to the authors, finding the equilibria is quite difficult.)

**Observation #2** (which I saw in a paper by Roger Sewell, David MacKay, and Ian McLean): **Maximizing the Entropy of the Outcome of Voting Leads to Good Results**

As I alluded to above, Arrow’s impossibility theorem says that there’s no satisfactory way to provide an output ranking of candidates given the input rankings from each voter (here we will assume that we simply know each voter’s true ranking and not consider strategic voting). However, this just applies to *deterministic *voting procedures: it is, in fact, quite easy to come up with a *probabilistic *voting procedure satisfying all of the hypotheses of Arrow’s impossibility theorem: just pick a voter uniformly at random and take their preferences to be the output ranking!

Aside from the fact that it seems unlikely that the general public would accept such explicit randomization in the voting procedure any time soon, this process (which is called Random Dictator) has a couple of other negative aspects. First of all, it can easily lead to extreme outcomes: Suppose that there are 20 candidates, and the top 10 in the output ranking will be given various positions in the government. Suppose 10 candidates are from one political party, and 10 are from the other, and further that the populace is highly polarized: everyone ranks their party’s 10 candidates strictly better than each of the 10 candidates of the other party. Then it is guaranteed that a single-party government will be the result. What might be better is 10 officials chosen from a mix of the two parties according to each party’s representation in the voting public.

Another way in which Random Dictator doesn’t compromise very well is the following: suppose there is heavy contention for the top rank among voters between candidates A1, …, Ak, but there is a candidate B that is everyone’s second choice. Then under Random Dictator there is zero chance that B will be top-ranked in the output ranking, even though it intuitively seems that B is more generally liked by the population than any of the Ai.

The authors of the paper fix these problems by proposing the following procedure: for each pair of candidates A and B, record the proportion p(A, B) of the voting population which prefers A to B. Now, from all probability distributions over output rankings such that, for each pair of candidates A, B, the probability of A being ranked higher than B in the output ranking is p(A, B), choose the one that has maximum entropy. Then choose an output ranking according to that distribution.

I won’t define maximum entropy here, but I will give a few examples. The idea is that there is a number called the entropy associated with every probability distribution, and furthermore that if you are looking for a probability distribution in a certain class, but don’t know anything about it except that it is in that class, then the “right” distribution to take is the one that maximizes the entropy (obviously this is an unprovable assertion). In some sense, choosing the maximum entropy distribution from a class codifies the fact that you know nothing about it except that it is in that class.

For example, the maximum entropy probability distribution over the set {1, …, n} is the uniform distribution, which assigns probability 1/n to each number. Fixing a mean μ and a variance σ², the maximum entropy probability distribution over ℝ with mean μ and variance σ² is a Gaussian distribution. Fixing μ > 0, the maximum entropy distribution over the positive reals with mean μ is the exponential distribution with mean μ.

Basically, what the authors have proposed is that the actual total ranking of each voter is not important (or else we would have the first problem mentioned above), and in particular which candidate each voter happened to place in the top rank is not important (or else we would have the second problem mentioned above); the only thing that’s really important is getting the proportions of the pairwise rankings right. And the way to get a distribution on output rankings which reflects nothing except for the constraints on the pairwise rankings is to pick the maximum entropy distribution satisfying those constraints.
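The maximum-entropy step can be sketched numerically for three candidates. The fitting loop below is iterative proportional fitting, whose fixed point is the maximum-entropy distribution matching the pairwise constraints; the loop and the target proportions (taken from a hypothetical voter profile) are my own illustration, not the authors’ algorithm:

```python
from itertools import permutations

# 3 candidates; targets are hypothetical pairwise proportions p(A, B).
cands = "ABC"
rankings = list(permutations(cands))
target = {("A", "B"): 0.55, ("B", "C"): 0.65, ("A", "C"): 0.65}

def above(r, a, b):
    return r.index(a) < r.index(b)

# Iterative proportional fitting: repeatedly rescale so each pairwise
# marginal matches its target; the limit is the max-entropy distribution
# over the 6 rankings with those marginals.
p = {r: 1 / len(rankings) for r in rankings}
for _ in range(2000):
    for (a, b), t in target.items():
        cur = sum(p[r] for r in rankings if above(r, a, b))
        for r in rankings:
            p[r] *= (t / cur) if above(r, a, b) else (1 - t) / (1 - cur)
z = sum(p.values())
p = {r: v / z for r, v in p.items()}

for (a, b), t in target.items():
    got = sum(p[r] for r in rankings if above(r, a, b))
    assert abs(got - t) < 1e-4  # all pairwise constraints are met
```

Sampling an output ranking from the fitted p is then the proposed voting procedure: it preserves the pairwise proportions and, by maximum entropy, nothing else.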

The beauty of this is that if you disagree with them, that’s fine: all you have to do is figure out what *you* think is the important information to preserve from the voters’ rankings, then pick the output distribution which maximizes entropy among those preserving that information, and that will be the “right” output for your choice of what’s important, in the sense that it will not “take into account” anything that you don’t think is important. To take a trivial example, if you decide that the total ranking of each voter is important, then the maximum entropy distribution on output rankings will degenerate into the Random Dictator process.

The authors additionally ran simulations of elections using various voting procedures in order to verify that the maximum entropy voting scheme they propose is “better” (in some senses they define) than others.

]]>

In his chapter on quantum mechanics, he defends the “many-worlds” interpretation (although he doesn’t think the term accurately describes the concept) versus the Copenhagen interpretation. In the process of doing so, he does something I thought was extraordinary: he comes up with a simple model of quantum mechanics in which all of the standard concepts you read about (the two-slit experiment, the Heisenberg uncertainty principle, etc.) are represented. This model requires no prerequisites from physics and actually uses almost totally discrete mathematics!

(Edit: I somehow missed this when originally writing this post, but Drescher also outlines quantish physics in an online paper.)

I’ll sketch it below.

The first step is to define the “classical” version of our physics, which we will then tweak to get the quantum version. The “topology” of our universe will be given by a finite directed graph where each vertex has three edges coming in and three edges going out. There is given a bijection b from edges to edges such that if edge e is directed into vertex v, then edge b(e) is directed out of vertex v. Given this bijection, you can think of each edge as actually a piece of a wire: a directed loop in the graph. Finally, we require that each edge is labeled as a *control* edge or as one of two *switch* edges, so that the three in-edges to any given vertex get distinct labels.

We can picture vertices like this:

The vertex is represented by the big box in the center. We will always put the control edge at the top, and the two switch edges at the bottom. Note that the out-edge b(e) need not have the same label as the in-edge e.

Particles inhabit edges. If P is the set of particles and E is the set of edges, then a point in E^P determines the position of each particle. The set E^P is thus called (classical) configuration space. Time in this universe is discrete; to describe how the system evolves, we just have to define the successor function succ^c, which tells how the system progresses one time step (the superscript c stands for “classical”).

For any edge e and label l, we let e_l be the edge with the same destination vertex as e and with label l.

For a configuration x and an edge e, we let occ(x, e) be 0 if edge e does not have a particle in configuration x, and we let it be 1 if it does.

Now we define succ^c as follows: a particle on a control edge e always moves to b(e). A particle on a switch edge e moves to b(e) if the control in-edge at e’s destination vertex is unoccupied, and to b(e′), where e′ is the other switch in-edge at that vertex, if the control in-edge is occupied.
In other words, particles on a control edge always go straight along whatever loop they are on. However, particles on a switch edge may or may not cross over to the loop of the other switch edge, depending on whether or not there is a particle on the control edge (hence the names).
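In other words, each vertex acts as a controlled swap: this is exactly the (reversible) Fredkin gate. A one-step truth table for a single vertex, in an encoding of my own:

```python
# One classical time-step at a single vertex.  Inputs: occupancy (0 or 1)
# of the control in-edge and the two switch in-edges; outputs: occupancies
# of the corresponding out-edges.  This is the Fredkin (controlled-swap)
# gate, which is reversible, matching the bijection b between in- and
# out-edges.
def vertex_step(control, switch0, switch1):
    if control:
        switch0, switch1 = switch1, switch0  # control particle present: switches cross over
    return (control, switch0, switch1)       # the control particle passes straight through

assert vertex_step(0, 1, 0) == (0, 1, 0)  # no control particle: stay on own loop
assert vertex_step(1, 1, 0) == (1, 0, 1)  # control particle: cross to the other loop
```

Reversibility matters here: since each configuration has a unique predecessor, no information is destroyed as the universe evolves.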

For example, this configuration:

turns into this configuration:

and this configuration:

turns into this configuration:

Now for the “quantum” variation, which Drescher calls quantish physics. In this case, each particle now has a *sign* (+ or −) attached to it. Furthermore, each vertex v has an angle θ_v associated with it, called v’s *measurement angle*. The classical configuration space was E^P; the quantum configuration space will be the set of all formal complex linear combinations of (signed) classical states. Given a state Σ_x w_x · x, the number |w_x|² can reasonably be interpreted as the probability of being in state x (see Drescher for more comment on this).

The task now is to describe the successor function succ^q describing how the universe evolves through time. First some preliminary definitions:

Given a nonzero complex number w, an angle θ, and a sign s, let split(w, θ, s) be the component of w which is parallel to θ if s = + and the component of w which is perpendicular to θ if s = −.

Note that split(w, θ, +) + split(w, θ, −) = w and, due to the Pythagorean theorem, |split(w, θ, +)|² + |split(w, θ, −)|² = |w|².

If, furthermore, , let be the component of which is parallel to (note: , not as above) if and perpendicular to if .

As above , and . Similarly, and .

Finally, note that for a given θ and s, the split function is simply multiplication by a number independent of w (and similarly for the two-argument split function).
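These identities can be illustrated with a minimal numeric stand-in. The cos/sin multipliers below are my own concrete choice, picked to satisfy the Pythagorean and linearity properties; Drescher defines split geometrically:

```python
import math

# A stand-in for the one-argument split: multiplication by cos(theta) for
# the "parallel" component and sin(theta) for the "perpendicular" one.
def split(w, theta, parallel):
    return w * (math.cos(theta) if parallel else math.sin(theta))

w, theta = 0.8 - 0.6j, 0.7
par, perp = split(w, theta, True), split(w, theta, False)

# Pythagoras: the squared magnitudes of the two components sum to |w|^2.
assert abs(abs(par) ** 2 + abs(perp) ** 2 - abs(w) ** 2) < 1e-9
# Splitting is multiplication by a number not depending on w, hence linear in w:
assert split(2 * w, theta, True) == 2 * split(w, theta, True)
```

The linearity in w is what makes it legitimate to define succ^q on classical configurations and extend by linearity, as the next paragraph does.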

First off, succ^q is linear over ℂ; it therefore suffices to define succ^q(x) for classical configurations x.

If a particle is on a control edge e, it always passes straight through to b(e).

If a particle is on a switch edge, the successor state will be the sum of (up to) 4 non-zero classical configurations, corresponding to whether or not it stays on its loop or crosses over to the loop of the other switch edge, and whether or not it changes sign. Let x_{s,t}, for signs s and t, denote the classical state where the particle stays on its own loop iff s = + and keeps the same sign iff t = +. Then the weight given to x_{s,t} is split(1, θ, s, t), where θ is the vertex’s measurement angle (the 1 is not a typo). The use of 1 is just because we are assuming that we are starting from a pure classical configuration; if it’s a classical configuration times a weight w, the 1 would be replaced with w.

When there are n particles in the classical configuration x, each is split separately; succ^q(x) may be the sum of up to 4^n classical configurations. Since the splitting is simply multiplication by a complex number, it doesn’t matter in what order the splittings are performed.

I’ll now briefly describe some quantum phenomena which can be interpreted in the quantish world. For much more insight, meaning, and many more examples, please see Drescher’s book!

**The Two-Slit Experiment**

Suppose we have the following configuration:

where the measurement angles of the first two vertices are equal and oblique to the measurement angle of the third, and the sign of the particle is positive. Then, it is always the case that after three timesteps, the particle is in the middle rightmost edge (i.e., succ^q applied three times to the initial state yields just the single classical state where the particle is in the middle rightmost edge). It is never in the bottom rightmost edge.

However, if we *remove* one of the ways the particle can get to the bottom edge:

now the particle arrives at the bottom rightmost edge with positive probability (i.e., nonzero weight). (The ellipsis simply means that this edge goes somewhere else; we don’t care what happens there.) This is because of destructive interference in the first case, which was removed in the second.
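The cancellation can be seen in a toy amplitude computation in the spirit of this setup. The specific path amplitudes below are my own illustration (using cos/sin multipliers with a sign flip on one branch), not Drescher’s exact gates:

```python
import math

# Two paths lead to the bottom edge; the split rule gives one of them a
# minus sign, so with both paths open the amplitudes cancel exactly.
theta = 0.6
path1 = math.cos(theta) * math.sin(theta)    # split parallel, then perpendicular
path2 = math.sin(theta) * -math.cos(theta)   # split perpendicular, then parallel (sign flipped)

both_open = path1 + path2  # amplitude with both paths available
one_blocked = path1        # amplitude with one path removed

assert abs(both_open) < 1e-12     # destructive interference: never at the bottom edge
assert abs(one_blocked) ** 2 > 0  # blocking a path makes the outcome possible!
```

Removing a path removes one of the two cancelling terms, which is exactly why blocking a slit can *increase* the probability of reaching the bottom edge.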

Suppose we try to investigate what’s going on, and *observe* if the particle is on one particular edge:

What’s going on here? Well first of all, the way we observe things is by, e.g., having the particles we want to observe interact with particles in our eye. In this example, we’ll take the red particle at the bottom to be a particle “in our eye” and observe the blue particle by having it interact with the red. Second of all, there are various “delay” gates throughout (at the bottom left, the delay gate is explicit, at the top, two wires are labeled “delay”, which means that they pass through a delay gate, although I haven’t drawn it). These aren’t really significant; they’re just to synchronize things.

Note that this gate has exactly the same behavior as our first setup, except that we are observing when the blue particle is on the bottom edge by having it interact with the red particle (and syncing things up). However, the results are as in the second case: with nonzero probability, the red particle appears on the bottom rightmost edge! The observation blocks the destructive interference of the first case.

**Heisenberg Uncertainty**

Say that a particle in a quantish state is definite with respect to a measurement angle θ if passing it through a switch wire of a vertex with measurement angle θ (with no other particles entering the vertex) will yield the particle always emerging on one specific edge. Heisenberg uncertainty is represented in quantish physics by the fact that whenever a particle is definite with respect to some measurement angle θ, it is always indefinite with respect to any measurement angle oblique to θ.

**The Einstein-Podolsky-Rosen Experiment**

Unfortunately, the diagrams for this setup are beyond my poor figure-making abilities. However, I can substitute a poor description for poor figure-making. It is possible to entangle two quantish particles by sending both of them through gates which have related measurement angles, then setting up two further gates through which a third particle is sent. In the first of these gates, one of the switch wires from the first particle’s measurement is the control wire, and in the second of these gates, one of the switch wires from the second particle’s measurement is the control wire. You then observe if the third particle emerges from the second of the two gates at the same place where it entered.

In that case, you can then measure the two particles both with respect to any fixed angle, and you will get the same results for both. (Let me reiterate that this was a terrible description; see Drescher’s book for more).

]]>

Conversely, we can call a function *strongly non-even* if for all , , .

Finding strongly non-even functions is easy, as any injective function provides a trivial example. We can make things harder for ourselves by considering only functions from to . But now, it is just as easy to show that there are no strongly non-even functions.

Therefore, let’s make the following definition: Let a function be *non-even of order* if, for all , . Thus, a strongly non-even function is non-even of order , and a function being non-even of order implies that it’s non-even of order for all .

In this paper, the set theorists Peter Komjáth and Saharon Shelah proved:

The existence of a non-even function of order 1 is equivalent to the Continuum Hypothesis (i.e., the statement that $2^{\aleph_0} = \aleph_1$).

Thus, if we assume that there is a non-even function of order 1, then we can conclude that $2^{\aleph_0} = \aleph_1$. Can we weaken the hypothesis and still conclude something interesting? We can, as they also proved:

For any , if there is a non-even function of order , then .

They showed this by showing the following (the statement above follows directly from this, given just a bit of thought):

For any vector space over of cardinality , and any function from to , there is an and a set of unordered pairs such that and , where the cardinality of is .

I’ll show here how to prove the weaker statement obtained by replacing with .

Fix and . We will construct a set of basis elements with the property that for every set , . Taking to be , this will provide the required number of unordered pairs. (As a slight bit of notational convenience, if is a set of basis elements, I will write for .)

Let be a basis for . For , let be called the th slice of basis elements. For each between and and , we will pick to be in the th slice of basis elements.

Given , suppose that we have defined for and we will define as follows: pick it so that it is in the th slice of basis elements but is not any singleton set of the form: for any finite sets and . Since there are only such singleton sets, and elements of the th slice of basis elements, we can find such a .

Now we define the . Given , assume that have already been defined and we will define : pick it from the th slice so that for all subsets of , (this is possible by the defining condition of .)

It is now easy to prove the following proposition by induction on :

For any set (where ), equals .

(The proof simply uses the defining property of the .) Now, taking , the result follows.

]]>

Over the course of John Harrison’s logic textbook *Handbook of Practical Logic and Automated Reasoning*, all three of these algorithms (and many more) are implemented. Furthermore, you can download and play with the code for free. (However, I still recommend checking out the book, especially if you are looking for a good textbook for a course on logic with a concrete, computational bent.)

Below, I’ll describe how to install the programs and try them out. There are many more interesting functions in this suite that I haven’t described.

The software is written in OCaml and can be run interactively in an OCaml toplevel (don’t worry, you won’t actually need to know any OCaml). Download and install OCaml as well as its preprocessor Camlp5 (which is used for formatting formulas nicely).

Then, download the code from here (under “All the code together”) and unzip it somewhere.

To run it, go to wherever you unzipped it and type `make interactive` in a shell.

(At least, that’s what worked for me on a Mac OSX. Other systems may be different.)

The Tarski-Seidenberg theorem implies that there is a decision procedure which, given a first-order sentence over the reals using plus, times, 0, and 1, will tell you if it’s true or not. The function `real_qelim` implements this. Let’s try it out. (The symbol `#` indicates the beginning of the prompt; don’t type that, just type in what’s after it.)

This function knows that not all quadratic polynomials have roots, but all cubics do.

```
# real_qelim <<forall b c. exists x. x^2 + b*x + c = 0>>;;
- : fol formula = <<false>>
# real_qelim <<forall b c d. exists x. x^3 + b*x^2 + c*x + d = 0>>;;
- : fol formula = <<true>>
```
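As an aside, here is why every cubic has a real root, the fact just verified symbolically: the cubic term dominates for large |x|, so the polynomial changes sign and a root can be found by bisection. This is a Python sketch of mine, not part of Harrison's suite:

```python
def cubic_root(b, c, d):
    """Find a real root of x^3 + b*x^2 + c*x + d by bisection.
    For M = 1 + |b| + |c| + |d| the cubic term dominates, so
    f(-M) < 0 < f(M) and a sign change is guaranteed."""
    f = lambda x: x**3 + b * x**2 + c * x + d
    lo = -(1 + abs(b) + abs(c) + abs(d))
    hi = -lo
    for _ in range(200):          # halve the bracketing interval 200 times
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

Bisection is only an illustration of the existence fact; `real_qelim` decides the sentence symbolically, without computing any roots.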

Many geometric puzzles can, in theory, be solved automatically by this function. Unfortunately, it is too slow for most interesting ones. Harrison notes that there are open problems about kissing numbers of high-dimensional spheres which could be solved in theory by this algorithm, although in practice it is an unworkable approach.

This algorithm actually does something stronger than decide the truth of first-order sentences: it does quantifier-elimination, which means that if you give it a formula with free variables, it will give you a quantifier-free formula in those same free variables (in the case of a sentence, which has no free variables, that means either the formula “true” or the formula “false”).

For example, if you’ve forgotten the quadratic formula and want to know what the condition is for a quadratic polynomial to have a root:

```
# real_qelim <<exists x. x^2 + b*x + c = 0>>;;
- : fol formula = <<(0 + c * 4) + b * (0 + b * -1) = 0 \/ ~(0 + c * 4) + b * (0 + b * -1) = 0 /\ ~(0 + c * 4) + b * (0 + b * -1) > 0>>
```

Note that there is no claim that the formula it gives you will be completely simplified, only that it will be correct.
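As a sanity check on that unsimplified output, we can transcribe it literally and compare it with the familiar discriminant criterion b^2 - 4c >= 0. The Python below is my own transcription, not part of the suite:

```python
import random

def qelim_condition(b, c):
    # Literal transcription of real_qelim's output, where d abbreviates
    # the repeated subterm (0 + c * 4) + b * (0 + b * -1), i.e. 4c - b^2:
    #   d = 0  \/  ~(d = 0) /\ ~(d > 0)
    d = (0 + c * 4) + b * (0 + b * -1)
    return d == 0 or (not (d == 0) and not (d > 0))

def discriminant_condition(b, c):
    # Textbook criterion for x^2 + b*x + c to have a real root.
    return b * b - 4 * c >= 0

# The two conditions agree at every sampled point.
for _ in range(1000):
    b = random.uniform(-10, 10)
    c = random.uniform(-10, 10)
    assert qelim_condition(b, c) == discriminant_condition(b, c)
```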

We can similarly use the function `complex_qelim` to do quantifier elimination over the complexes. The fact that this is possible is easier to prove than the corresponding fact for the reals, and the algorithm is correspondingly faster.

```
# complex_qelim <<forall x. x^3 = 1 ==> x = 1>>;;
- : fol formula = <<false>>
```

The following sentence is also true over the reals (although for a different reason than why it’s true over the complexes), but it takes significantly longer for the real quantifier elimination algorithm to decide it.

```
# complex_qelim <<forall x1 x2 x3. (x1^3 = 1 /\ x2^3 = 1 /\ x3^3 = 1 /\ ~(x1 = x2) /\ ~(x1 = x3) /\ ~(x2 = x3)) ==> x1 + x2 + x3 = 0>>;;
- : fol formula = <<true>>
```
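We can also spot-check the same fact numerically (a quick Python check of mine, separate from Harrison's code): the three distinct cube roots of 1 really do sum to zero.

```python
import cmath

# The three complex cube roots of unity: exp(2*pi*i*k/3) for k = 0, 1, 2.
roots = [cmath.exp(2j * cmath.pi * k / 3) for k in range(3)]

# Each satisfies z^3 = 1 up to floating-point error, they are pairwise
# distinct, and their sum vanishes, matching the symbolic result.
assert all(abs(z**3 - 1) < 1e-9 for z in roots)
assert abs(sum(roots)) < 1e-9
```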

Suppose we read on Wikipedia that the translation of the limaçon $r = b + a\cos\theta$ to rectangular coordinates is $(x^2 + y^2 - ax)^2 = b^2(x^2 + y^2)$. We can verify this (I’ve used `s` to represent $\sin\theta$ and `c` to represent $\cos\theta$):

```
# complex_qelim << forall r s c x y. (x^2 + y^2 = r^2 /\ r * c = x /\ r * s = y ==> forall a b. (r = b + a * c ==> (x^2 + y^2 - a * x)^2 = b^2 * (x^2 + y^2)))>>;;
- : fol formula = <<true>>
```
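The limaçon identity can also be spot-checked numerically. The Python below is my own quick check, with `a` and `b` the limaçon's parameters:

```python
import math

def limacon_identity_holds(a, b, theta):
    """Sample the limacon r = b + a*cos(theta) at one angle and check the
    rectangular form (x^2 + y^2 - a*x)^2 = b^2 * (x^2 + y^2)."""
    r = b + a * math.cos(theta)
    x, y = r * math.cos(theta), r * math.sin(theta)
    lhs = (x**2 + y**2 - a * x) ** 2
    rhs = b**2 * (x**2 + y**2)
    return math.isclose(lhs, rhs, rel_tol=1e-9, abs_tol=1e-9)
```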

Finally, first-order sentences with plus and less-than over the integers and over the natural numbers are decidable. The relevant functions are `integer_qelim` and `natural_qelim`. Even though multiplication of variables is prohibited, we can still multiply by constants (since, for example, instead of $3x$ we could have written $x + x + x$ anyway).

An example Harrison gives is: There is an old (easy) puzzle which is to show that, with 3- and 5-cent stamps, you can make an $n$-cent stamp for any $n \geq 8$.

```
# natural_qelim <<forall n. n >= 8 ==> exists x y. 3 * x + 5 * y = n>>;;
- : fol formula = <<true>>
```
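The same puzzle can be confirmed by brute force in Python (again my own check, not Harrison's code):

```python
def representable(n):
    """True iff n cents of postage can be made from 3- and 5-cent stamps."""
    return any((n - 3 * x) % 5 == 0 for x in range(n // 3 + 1))

# Every n >= 8 is representable, while 7 is not.
assert all(representable(n) for n in range(8, 201))
assert not representable(7)
```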

]]>

The first professor you come across, , knows what means, the second professor you come across, , knows what means, and the third professor you come across, , knows what means. So you have fortunately solved your problem.

But you’re now curious and decide to meet some other professors in the department. The next professor you come across is named . He doesn’t know what means or what means, but if you told him what meant, he would be able to tell you what means (for example, maybe he knows that is the noun form of ).

The next professor you meet is named . If you told him what meant and what meant, he would be able to tell you what means.

The next professor you meet is named . If you told him a *method* for finding out the meaning of given the meaning of , he would be able to tell you the meaning of .

In general, for any two professors and , there is a professor with the property that if you told him what knew, he would be able to tell you what knows (but doesn’t know any more than that).

Notice that some professors have essentially the same state of knowledge. For example, and have essentially the same knowledge, since to get the meaning of out of you only have to tell him a method for finding out the meaning of given the meaning of , which is something that you can do without any particular special knowledge concerning , and .

A more nontrivial example is that has the same state of knowledge as . This is because each can “simulate” the other. In one direction, suppose somebody told the meaning of . He therefore knows a trivial “method” for getting the meaning of given any inputs, and so he knows the meaning of . In the other direction, suppose we told a method for turning (methods for turning the meaning of into the meaning of ) into the meaning of . Well, knows a method for turning the meaning of into the meaning of , so he can use that find the meaning of . He can then use his method a second time to turn that into the meaning of .

The puzzle is then to prove that there are only finitely many professors with different states of knowledge.

This puzzle is equivalent to showing that intuitionistic implicational propositional logic over three variables has only finitely many logically inequivalent formulas. Another formulation is: Let be the free cartesian closed category over objects. Given objects and , say that and are equivalent if there is an arrow from to and an arrow from to . Then there are only finitely many equivalence classes in . (The corresponding statements are also true with replaced by .)

This fact was first proved using algebraic methods by Arturo Diego in his Ph.D. thesis in the 1940’s. It was subsequently reproved using semantic methods by various people including Nicolaas de Bruijn and Alasdair Urquhart. A good overview of those results is in Lex Hendriks’s thesis. I proved it in a combinatorial way as part of my thesis.

To rigorously state the problem: Let be a finite set of propositional variables, and let be the smallest set containing and such that if formulas and are in , then the formula is in . We define a relation between sets of formulas in and formulas in as follows: We will let be the smallest relation such that, for any , :

- .
- If and , then .
- If and , then .
- If , then .

The relation formalizes the notion of a formula being provable given a set of hypotheses.

We say that and are equivalent if and , and the proposition is then that there are only finitely many equivalence classes in .

Unfortunately, a full solution using no other machinery (at least the one that I came up with) is a bit too notationally cumbersome for a blog entry, but the puzzle is by no means inaccessible to someone with no other knowledge of logic.

In any case, I will sketch the solution in an important special case: that of formulas which are *left-associated*, i.e., of the form where each is a propositional variable. Since we know how to parenthesize such formulas, we can write them simply in the form .

The crucial insight is the following:

Let be a propositional variable for . Let and suppose that . Then the formulas and are equivalent.

To see why this is so, imagine that you are trying to prove some and you have some hypothesis . What does allow you to do?

- If , it allows you to complete the proof immediately.
- If is of the form , then it allows you to change the goal from to .
- If is of the form , then it allows you to change the goal from to and give yourself as a hypothesis.

In the third case above, suppose that is of the form . The only way that can be of use is if you have some way to change the goal from to . In general, if you have as a hypothesis, if you get to use you must have a method for turning the goal from to for . By assumption in the boxed statement, we therefore have a method for turning the goal from to and vice versa. So and are equivalent. (Actually, what this argument shows is that they can be used in equivalent ways as hypotheses in a proof, which turns out to be enough.)

Now also observe that

For any formula and variable , is equivalent to .

From the two boxed facts, it pretty easily follows that every left-associated formula is equivalent to one of length at most (where is the number of propositional variables): Suppose that is equivalent to no shorter formula. Chopping up into triplets, I claim that no two triplets are the same: If so, and a triplet consisted of all the same variable, we could apply the second boxed fact to get a shorter formula. Otherwise, we can apply the first. Since there are only distinct triplets, that gives a length of at most .

Therefore, there are at most left-associated formulas.
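Only the counting step of that argument (the pigeonhole on triplets, not the logical equivalences) lends itself to direct illustration; here is a Python sketch of mine, using the hypothetical variables `p` and `q`:

```python
import random

def triplet_blocks(seq):
    """Chop a sequence of variables into consecutive blocks of three."""
    return [tuple(seq[i:i + 3]) for i in range(0, 3 * (len(seq) // 3), 3)]

def has_repeated_block(seq):
    blocks = triplet_blocks(seq)
    return len(set(blocks)) < len(blocks)

# With k = 2 variables there are only k**3 = 8 distinct triplets, so any
# left-associated formula with 9 or more triplet blocks repeats one.
variables = ['p', 'q']
for _ in range(100):
    seq = [random.choice(variables) for _ in range(27)]  # 9 blocks > 8
    assert has_repeated_block(seq)
```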

]]>

In particular, is stronger than . But certainly, given that we believe that everything proves is true, we believe that does not prove a contradiction, and hence is consistent. Thus, we believe that everything that proves is true. But by a similar argument, we believe that everything that proves is true. Where does this stop? Once we believe that everything proves is true, what, exactly, are we committed to believing?

This is from Chapters 13–15 of Torkel Franzén’s book *Inexhaustibility*, which is admirably clear and well-written.

First off, let be , and let be . By the considerations above, we accept that each is sound. (A theory being sound means that everything it proves is true.)

So, if we therefore let , then we accept that is sound. We could therefore define to be and we would have to accept that as sound as well, but in making this definition we run up against our first snag.

The snag is this: In order to express a sentence of the form in the language of number theory, we must choose some recursively enumerable presentation of , and which recursively enumerable presentation we choose matters. For example, if we add to any given presentation of the stipulation that we are adding, for all and such that the axiom , then we haven’t actually added any new axioms, but if we construct the statement with presented the second way, will imply Fermat’s Last Theorem, while constructed from the original presentation of may not.

As you might guess from the above, we are going to want to construct for ordinals . When is a successor ordinal, it is clear how to get a “reasonable” presentation of from a reasonable presentation of (where ), but if is a limit ordinal, in general it won’t be clear (although it is clear for ).

So how can we solve this problem?

The first step is to use a more computable representation of ordinals, namely ordinal *notations*. An ordinal notation is a number with the following property: It is either 0, or for every , the output of the th Turing machine on input is an ordinal notation. (What this recursive definition really means is that the set of ordinal notations is the smallest set satisfying the above property.)

Given an ordinal notation , we let the ordinal it represents, , be defined by and , where denotes the output of the th Turing machine on input .

We can now uniformly pick presentations of for ordinal notations by letting and be presented as the union of over , where the consistency statements are constructed using the presentations given by induction.

Unfortunately, this doesn’t prevent us from doing the trick mentioned above: For any true sentence , there is an ordinal notation such that and proves . The catch is that will be quite an unusual notation for 1, and we’re not really justified in taking to be a consistency extension of because doesn’t “know” that is an ordinal notation.

However, we can make a reasonable definition for what it means for (or any extension) to prove that a number is an ordinal notation. (This is actually not trivial, since can only talk about numbers, but the set of ordinal notations was defined to be the least *set* satisfying a certain property.) We can then define an *autonomous* consistency extension of as follows: is an autonomous consistency extension of itself, and if is an autonomous consistency extension of , and proves that is an ordinal notation, then is an autonomous consistency extension of .

The autonomous consistency extensions of have some claim to being exactly those that we recognize to be consistency extensions of solely on the basis that we accept . But that isn’t really completely satisfying. There’s nothing stopping us from letting be the union of the autonomous consistency extensions of and considering . Similarly, we got the set of autonomous consistency extensions of by starting with and then closing under *finite* applications of a particular operation, but we could also have considered transfinite applications of that operation.

Does there exist a theory (which we believe is true) which will prove anything *any *reasonable iterated consistency extension of proves? It turns out there is. Let be the theory obtained by adding to the axiom that for any sentence (that is, sentence of the form where all of ‘s quantifiers are bounded), if proves , then is true.

This property is called -soundness, and the axiom formalizing it is called a reflection axiom. If is -sound, then so is , since if proved a false statement , then would prove the false statement (false because -soundness implies consistency). Similarly, any union of a chain of -sound theories must be -sound.

Because we can formalize the above argument in , proves that every autonomous consistency extension of is -sound. Therefore, it proves that every autonomous consistency extension of is consistent. Therefore (since essentially autonomous consistency extensions of say nothing besides the fact that lower autonomous consistency extensions are consistent), extends each autonomous consistency extension of .

Okay, so adding axioms asserting that is -sound takes us beyond all the autonomous consistency extensions of . But what happens if we add to axioms asserting that is -sound? This is called a *reflection* extension, and we can form autonomous iterated reflection extensions just as we can autonomous iterated consistency extensions.

Is there any theory (which we believe is true) which goes beyond all the autonomous reflection extensions in the same way that goes beyond all the autonomous consistency extensions of ? There is. The theory asserts that all sentences that proves are true. But it’s actually the case that *all* sentences that proves are true.

By a result of Tarski’s we can’t define truth of an arithmetical formula in , but we can define it by adding a new predicate to the language of , together with suitable axioms. The resulting theory , extends every autonomous reflection extension of .

In terms of what arithmetical sentences they can prove, is an equivalent theory to (edit: not ), which is the theory of second-order arithmetic, with a comprehension axiom for all arithmetic formulas. This is essentially because sets of numbers in are interchangeable with formulas in the language of with one free variable in .

And, of course, we then get autonomous iterated *truth* extensions of , in analogy to the autonomous iterated reflection extensions and the autonomous iterated consistency extensions. Here there is again a natural theory which extends all the autonomous iterated truth extensions, a theory called : it’s a theory of second-order arithmetic, like , but it allows comprehension for -formulas (formulas with a universal set quantifier in front), instead of just arithmetic formulas.

Of course, we can now start again, taking consistency or reflection extensions of . But, as Franzén says:

[E]xtending to opens the door to a number of possible extensions that go beyond reflection. In particular, we can extend a theory by introducing axioms about sets of higher type—meaning sets of sets of natural numbers, sets of sets of sets of natural numbers, and so on—and by introducing stronger comprehension principles for sets of a given type. … Axiomatic set theories like give powerful first-order theories which prove everything provable in such iterated autonomous extensions. … In this connection the term “reflection” reappears and takes on a new meaning. … [This] leads to a further indefinite sequence of extensions of set theory, and furthermore, “axioms of infinity” [i.e., large cardinal axioms], have been formulated which can be reasonably argued to be stronger, as far as arithmetical theorems are concerned, than any such extension by set-theoretic reflection.

]]>

A natural question to ask is whether or not the representation of as a trigonometric series is unique, if it has one. It was the consideration of this question that led Cantor to the invention of set theory.

There is a nice writeup of this story in the first part of this article by Alexander Kechris. I’ll give part of the story below.

Cantor solved the problem in the affirmative; i.e., he proved:

Suppose that a trigonometric series converges to zero everywhere in . Then all the coefficients of that series are zero.

(By subtraction, this is equivalent to the problem stated above.) He was also able to show (by a very similar method) the following, which I’ll call the Isolated Points Lemma:

Suppose and that a trigonometric series converges to zero on . Then that series converges to zero at as well.

From these two results, we can immediately conclude the following:

Suppose that a trigonometric series converges to zero at all but finitely many points. Then the coefficients of that series are all zero.

Call a set a set of uniqueness if whenever a trigonometric series converges to zero on , the coefficients of that series are all zero. Then the previous result may be stated: “All finite sets are sets of uniqueness.”

But we can use the Isolated Points Lemma to show more than that. For example, we can show that the set is a set of uniqueness. The reason is that if a trigonometric series converges to zero on , then by the Isolated Points lemma, it also converges to zero on the points in .

But, *now that we know* that it converges to zero on the points in , we can apply the Isolated Points Lemma again to show that it converges to zero at 0 (since we now know that it converges to zero on, e.g., ).

What we have actually shown by the above argument is the following:

Given , let be the set of limit points of (also known as the Cantor-Bendixson derivative of ). If is a set of uniqueness, then is a set of uniqueness.

For any , let be the th Cantor-Bendixson derivative of . Then, by iterating the above fact, we have the following:

Suppose that for some , . Then is a set of uniqueness.

This is as far as we can go as long as we merely iterate the Cantor-Bendixson derivative finitely often. But, if we make the leap to iterating it transfinitely many times, we can go much further:

Theorem: All countable closed sets are sets of uniqueness.

**Proof:** First, define for all ordinals and all closed sets as follows:

- .
- , when is a limit ordinal.

The Isolated Points Lemma says that if a trigonometric series converges to zero outside of , then it converges to zero outside of . We will generalize this by showing the following lemma:

Lemma: If a trigonometric series converges to zero outside of , then it converges to zero outside of for any .

**Proof of Lemma:** This is by transfinite induction. The successor step of the induction is just the Isolated Points Lemma again, so all we have to show is that, fixing a trigonometric series, if is a limit ordinal and the series converges to zero outside of each for , then it converges to zero outside of . But this follows simply because every point of must be in some for by definition. **End of proof of Lemma.**

To complete the proof of the theorem then, we just have to observe that for all countable closed , for some . Clearly, for all closed , there is an such that (this is because is a decreasing sequence). But it is a standard fact that a set such that is either empty or of cardinality . (Such a set is called a perfect set and a reference for the cited fact is page 7 of David Marker’s notes on descriptive set theory.) **End of proof.**

As a historical note, Kechris reports that while thinking about the above issues led Cantor to discover ordinals, he never actually wrote down a proof of the above theorem; that was finally done by Lebesgue in 1903.

Further, it was later proven by Bernstein and Young independently that arbitrary countable sets are sets of uniqueness, and by Bari that countable unions of closed sets of uniqueness are sets of uniqueness.

Edit: Simplified the proof.

]]>

Although it is not strictly related to logic, I’ll write up what I learned here.

My main source for this is sigfpe’s blog post on this. In fact, all I really did was take his post and remove the Haskell from it (and probably add some mistakes). If you can read Haskell, I definitely recommend that post (and his blog in general). I also read the article on quantum groups in the Princeton Companion to Mathematics, which is an amazing book.

We’ll take the ordinary definition of a group and turn it into the definition of a quantum group in two steps.

**Step 1: Groorgs.**

In the first step, we’ll “symmetrize” the definition of a group to get an object I call a groorg. What does it mean to symmetrize a definition? Well, for one thing, as part of the definition of a group we have a multiplication . Therefore, to make it symmetric, we should also have a comultiplication .

Furthermore, this comultiplication should satisfy laws dual to those satisfied by multiplication. For example, multiplication is associative, which means that for any , , , the two ways of using multiplication to turn the triple into a single element are the same. Dually, then, comultiplication should be coassociative, meaning that for any element , the two ways of using comultiplication to turn that single element into a triple should be the same.
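To put the two laws side by side, writing $m$ for the multiplication and $\Delta$ for the comultiplication (symbols chosen here just for this aside, since the text does not fix any notation):

```latex
% Associativity of the multiplication m : G \times G \to G:
m \circ (m \times \mathrm{id}) = m \circ (\mathrm{id} \times m)
% Coassociativity of the comultiplication \Delta : G \to G \times G:
(\Delta \times \mathrm{id}) \circ \Delta = (\mathrm{id} \times \Delta) \circ \Delta
```

Each law is obtained from the other by reversing the direction of every map.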

OK, so we’ve seen that the dual of multiplication is comultiplication. Another part of the definition of a group is the inverse function , which is no problem, since it can be its own dual. But what about the identity element ? This is puzzling until you think of the identity not as an element of , but as a map from a one element set to , which, when written in that form, I’ll call . Then it’s clear that the dual of should be . The astute reader will note that there is only one such function, so this seems to be a trivial addition, but let’s press on regardless.

We can now begin the definition of a groorg.

Definition of a Groorg, Part 1. A groorg is a set together with:

- A map .
- A map .
- A map .
- A map .
- A map .
These must satisfy the following properties:

- Multiplication must be associative and comultiplication must be coassociative.
- should be a unit for multiplication. This means that, given any , if you form an ordered pair with and , then apply to that ordered pair, you get back.
- should be a counit for comultiplication. This means that, given any , if you apply to to get an ordered pair, then destroy one of the components with (i.e., just discard it), then you get back.

Let’s pause here. If you’ve followed along, you may have noticed that this last condition forces to be (which does satisfy the conditions so far). This is obviously quite restrictive, but it has one benefit: it means that we can rewrite the inverse law using comultiplication. To see what I mean, consider the following: If we let and , then the usual inverse law for a group says that for any , . Now, since we know that is forced to take each to , we can rewrite the inverse law as: .

We can make this completely symmetric by getting into the act:

Definition of a Groorg, Part 2. A groorg is also required to satisfy: for all (where and are as above).

And we finally have a requirement which says that the two ways of computing by using comultiplication and multiplication are equal.

Definition of a Groorg, Part 3. A groorg is also required to satisfy: , where the second is the natural multiplication on .

This concludes the definition of a groorg.

But, what use is it? As we’ve observed, in every groorg, we must have that sends to , so that the groorg just reduces to an ordinary group. Furthermore, every group becomes a groorg by defining the comultiplication in that way (and by defining the counit in the only possible way).

What we’ve gained is that we now have a definition of a group which is equivalent to the old one and which is symmetric, which will lend itself well to our next step.

**Step 2. Adding superpositions.**

How can we turn this concept of a group into one of a “quantum” group? If you’re like me, the only thing you know about quantum mechanics is that you often hear the word “superposition” used in conjunction with it. That’s not much, but it turns out to be enough in this case.

Instead of having the composition of two group elements be another group element, let’s have it be a *superposition* of group elements. It turns out that what this should mean is a *linear combination* of group elements. So, let be the -vector space generated by taking as a formal set of basis vectors. Instead of requiring that be a map from to , we will let it be a map from to . So we have the following:

Provisional Definition of a Quantum Group. A quantum group is a set together with

- A map .
- A map .
- A map .
- A map
- A map .
satisfying …

Before we can think about what properties these functions should satisfy, we have to settle a question: We know how to multiply two group elements to get a superposition of group elements, but how should we multiply two superpositions of group elements? For example, what should the product of and be? (Note that I am using the same symbol to stand for the group element and the formal basis vector corresponding to it.)

The quickest way to define the multiplication of superpositions is to notice that, since is a basis for , the map extends to a *linear* map , which we can use to multiply superpositions. In the above example, is equal to , so the product of the two superpositions would be .

Now, our symmetric definition of a group above translates exactly, and we no longer need to mention the basis explicitly:

Definition of a Quantum Group (or Hopf Algebra). A quantum group is a -vector space together with:

- A linear map
- A linear map
- A linear map
- A linear map
- A linear map
satisfying the analogues of the laws given in the definition of a groorg.

**Some Combinatorial Examples**

I believe there are many examples of the usefulness of this concept in physics. However, because I don’t know any physics, I won’t give them.

Here are two combinatorial examples from sigfpe’s blog post:

**Example 1: A Quantum Group on Finite Strings.**

Let be an alphabet, and let be the set of all finite strings with characters from . We may put a quantum group structure on as follows:

- We let , the concatenation of and .
- We let , the empty string.
- We let and where .
- We let , where is the length of and is the reverse of .
- We let be the sum of where and can be shuffled together to give . This means that the characters in occur in in the same order, and when you remove them you get .

For an example of the comultiplication, if and , then .

We can verify the inverse law in this case: If we apply on the right to , we get . Applying to this, we get .
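Here is a Python sketch of Example 1's comultiplication (my own code, following the description above, not any particular library); a term u ⊗ v of the formal sum is represented as the pair `(u, v)`, and a formal sum as a list with multiplicity:

```python
from itertools import combinations

def deshuffle(w):
    """Comultiplication of Example 1: all ways of extracting a subsequence
    u from w (keeping character order), leaving its complement v.
    Each pair (u, v) is one term of the formal sum."""
    n = len(w)
    terms = []
    for r in range(n + 1):
        for idx in combinations(range(n), r):
            chosen = set(idx)
            u = ''.join(w[i] for i in idx)
            v = ''.join(w[i] for i in range(n) if i not in chosen)
            terms.append((u, v))
    return terms

def check_coassociative(w):
    """Deshuffling the left factor again yields the same multiset of
    triples as deshuffling the right factor."""
    lhs = sorted((a, b, v) for u, v in deshuffle(w) for a, b in deshuffle(u))
    rhs = sorted((u, b, c) for u, v in deshuffle(w) for b, c in deshuffle(v))
    return lhs == rhs
```

Both sides of the coassociativity check enumerate the ways of splitting the positions of the word into three order-preserving groups, which is why they agree.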

**Example 2. Another Quantum Group on Finite Strings.**

This is also a quantum group on .

- We let be the sum of all possible ways of shuffling and together.
- We let , the empty string.
- We let and where .
- We let , where is the length of and is the reverse of .
- We let be the sum of all such that .

Again, we can verify the inverse law in a specific case: We have that . If we apply on the right, we get . Now applying , we get .
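The multiplication of Example 2 can be sketched the same way (again my own code; a formal sum is a list of words with multiplicity):

```python
def shuffle(u, v):
    """Multiplication of Example 2: the formal sum of all interleavings
    of u and v, each word keeping its own character order."""
    if not u:
        return [v]
    if not v:
        return [u]
    # Either the first letter of u or the first letter of v comes first.
    return ([u[0] + w for w in shuffle(u[1:], v)] +
            [v[0] + w for w in shuffle(u, v[1:])])
```

For words of lengths m and n there are C(m + n, m) shuffles, counted with multiplicity.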

**Example 3: A Quantum Group on Finite Binary Trees.**

This example is from here. We think of a finite binary tree as a finite tree where each node has either zero or two children, and where we distinguish between left and right. This picture from the above paper shows how, if you select a leaf of a finite binary tree, you may divide the tree into the tree to the left of the leaf and the tree to the right of the leaf:

If you select a multiset of leaves, you may similarly divide your tree into trees.

We may now define a quantum group on $k[T]$, where $T$ is the set of finite binary trees, as follows:

- $m(t_1 \otimes t_2)$ is the sum of all trees generated as follows: Suppose that $t_1$ has $k$ leaves. Divide $t_2$ into $k$ trees as above, and stick them onto the leaves of $t_1$. There are many graphical examples of this here.
- $\eta(1)$ is the tree with a single node and no children.
- $\Delta(t)$ is the sum of all $t_1 \otimes t_2$, where $t_1$ and $t_2$ are trees that $t$ can be divided into. There are many graphical examples of this here.
- $\epsilon(t)$ is 1 if $t$ is the tree with only one node, and zero otherwise.
- For the definition of $S$, I refer you to the above paper. However, there are many graphical examples of the antipode here.

]]>

The idea is that you construct a superset $\mathbb{R}^*$ which contains the reals and also some infinitesimals, prove that some statement holds of $\mathbb{R}^*$, and then use a general "transfer principle" to conclude that the same statement holds of $\mathbb{R}$.

Implicit in this procedure is the idea that $\mathbb{R}$ is the *real* world, and therefore the goal is to prove things about it. We construct a field $\mathbb{R}^*$ with infinitesimals, but only as a method for eventually proving something about $\mathbb{R}$.

We can do precisely the same thing with $\mathbb{N}$ instead of with $\mathbb{R}$. But, in Weak Theories of Nonstandard Arithmetic and Analysis, Jeremy Avigad observed that if we don't care about transferring the results back down to $\mathbb{N}$, then we can get all the basic results of calculus and elementary real analysis just by working with $\mathbb{N}^*$, and without ever having to construct the reals.

Let me first differentiate two approaches to nonstandard analysis. The first is the one I mentioned above, where you actually *construct* a field $\mathbb{R}^*$ (although you need the axiom of choice to do it). This is done entirely within ordinary mathematics. Call this the semantic approach.

Another approach is the axiomatic approach. A good example of this is Edward Nelson‘s internal set theory. In this approach, you take an ordinary axiomatization of some part of mathematics (for example, ZFC), introduce a new predicate for being “standard” or “normal-sized”, and some axioms saying that there exist things which are not standard and how these things relate to everything else. In the usual situation, a sentence which does not contain the predicate “standard” is provable in the new theory iff it’s provable in the old theory. (This is the case with IST and ZFC.)

The axiomatic approach is the approach we'll take here. We'll let our language $L$ consist of a function symbol for each primitive recursive function and relation, together with a predicate $\mathrm{st}(x)$ (read "$x$ is standard") and a constant $\omega$. Our axioms will be the following:

- If $\varphi$ is a true (in the natural numbers) first-order $L$-sentence that does not include the new predicate $\mathrm{st}$, then we take $\varphi$ as an axiom.
- We take $\neg\,\mathrm{st}(\omega)$ as an axiom.
- We take $\forall x \forall y\,(\mathrm{st}(x) \wedge y \leq x \to \mathrm{st}(y))$ as an axiom.
- We take $\mathrm{st}(x_1) \wedge \dots \wedge \mathrm{st}(x_n) \to \mathrm{st}(f(x_1, \dots, x_n))$ to be an axiom for each $n$-ary primitive recursive function $f$.

The interpretation of our sentences is that we are now quantifying over a domain which includes infinitely large natural numbers (of which $\omega$ is an example) and that the predicate $\mathrm{st}$ picks out those which are normal-sized. However, since we are working within the axiomatic system, I will still refer to the domain we are quantifying over as $\mathbb{N}$.

Within the system, construct $\mathbb{Z}$ and $\mathbb{Q}$ from $\mathbb{N}$ as usual. We make the following definitions:

We say that a natural number $n$ is unbounded if it is not standard (i.e., if $\neg\,\mathrm{st}(n)$ holds). We say that an integer $x$ is unbounded if $|x|$ is unbounded. We say that a rational is unbounded if the closest integer to it is unbounded.

Furthermore, we say that a rational number $q$ is infinitesimal if it equals 0 or if $1/q$ is unbounded. We say that $p$ and $q$ are infinitely close, written $p \approx q$, if $p - q$ is infinitesimal.

Let $\mathbb{Q}_b$ be the set of rationals which are not unbounded. We can now do analysis on $\mathbb{Q}_b$. First of all, we can define continuity in a natural way: We say that $f$ is continuous if whenever $p \approx q$, $f(p) \approx f(q)$.

We have the intermediate value theorem for $\mathbb{Q}_b$: If $f(0) < 0$ and $f(1) > 0$ and $f$ is continuous, then there is a $p$ such that $f(p) \approx 0$. Proof: Recall that $\omega$ is a natural number. Let $n$ be the maximum natural number less than $\omega$ such that $f(n/\omega) < 0$. (This is possible because there are only finitely many natural numbers less than any natural number, including $\omega$!) But then $f(n/\omega)$ must be infinitely close to $0$, since by continuity $f(n/\omega) \approx f((n+1)/\omega)$ and $f((n+1)/\omega) \geq 0$.

We can also prove that any continuous function on $[0, 1]$ attains a maximum (up to $\approx$) by essentially the same means: just consider the $n \leq \omega$ for which $f(n/\omega)$ is a maximum, which is again possible considering that there are only finitely many $n \leq \omega$.

Turning to differentiation, we may define $f'(x) = a$ if for all non-zero infinitesimals $\varepsilon$,

$$\frac{f(x + \varepsilon) - f(x)}{\varepsilon} \approx a.$$

(Note that the derivative is actually defined only up to $\approx$.) We can then prove that the derivative of $x^2$ is $2x$ by letting $\varepsilon$ be an arbitrary infinitesimal, expanding $(x + \varepsilon)^2 = x^2 + 2x\varepsilon + \varepsilon^2$, dividing by $\varepsilon$, and noting that what results is $2x$ plus an infinitesimal.
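This computation has a familiar finite analogue: dual numbers, where a formal $\varepsilon$ with $\varepsilon^2 = 0$ plays the role of discarding the infinitesimal part. To be clear, this is only an analogy to (not an implementation of) Avigad's system, and the little Python class below is entirely my own sketch; but it makes the "expand, divide, drop the infinitesimal" step mechanical.

```python
class Dual:
    """Numbers of the form re + d * eps, computed with eps**2 = 0."""
    def __init__(self, re, d=0.0):
        self.re, self.d = re, d

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.re + other.re, self.d + other.d)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # (a + b*eps)(c + d*eps) = ac + (ad + bc)*eps, since eps**2 = 0.
        return Dual(self.re * other.re, self.re * other.d + self.d * other.re)

    __rmul__ = __mul__

def derivative(f, x):
    # f(x + eps) = f(x) + f'(x)*eps: read the derivative off the eps part.
    return f(Dual(x, 1.0)).d

# The worked example from the text: the derivative of x**2 is 2x.
assert derivative(lambda t: t * t, 3.0) == 6.0
```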

Avigad notes that we may continue by defining $e^x$, $\sin x$, and $\cos x$ by taking an unbounded partial sum of the Taylor expansions, and that this is sufficient to prove all the basic properties. He also cites an easy proof in this setting of the Cauchy-Peano theorem on the existence of solutions to differential equations.

]]>

Are there any games which can actually be played in the real world with an undecidable analysis function? Robert Hearn, in the same thesis that I linked to last time, showed that the answer is yes.

In order to make sure that our games can be played in the real world, we'll restrict our attention to games where each game has only finitely many positions (that is, the board has only finitely many states it can be in).

First off, observe that if $G$ is a game of perfect information, then it has a decidable analysis function. By the fact that there are only finitely many positions, you can construct a finite game tree for $G$ (finite because you can cut it off when positions repeat) and then induct up it to find out who has a winning strategy. (You may want to think about this if you haven't seen it before. This is something like what is sometimes called Zermelo's Theorem.)
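As a toy illustration of this backward induction (my own example, not from Hearn), here is a memoized Python solver for the subtraction game in which players alternately remove 1 or 2 stones and the player who takes the last stone wins:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def first_player_wins(stones):
    """A position is a win for the player to move iff some legal move
    leads to a position that is a loss for the opponent."""
    return any(not first_player_wins(stones - k)
               for k in (1, 2) if k <= stones)

# Losing positions for the player to move are exactly the multiples of 3.
assert [n for n in range(10) if not first_player_wins(n)] == [0, 3, 6, 9]
```

The same induction, run over the (finite, repetition-pruned) tree of positions of any perfect-information game, computes its analysis function.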

OK, so let's look at games which are not of perfect information, i.e., where each of the players has some private information. Now it's possible that no player has a winning strategy (for example, consider the two-player game where both players secretly choose either 0 or 1, and if the sum is odd then Player 1 wins, and if it's even then Player 2 wins). Even so, the question of which, if any, players have a winning strategy is well-defined.

It turns out that these also have decidable analysis functions. I’ll also omit this proof, but it’s similar to the above except that, instead of a finite tree of positions, you can construct, for each player, the tree of subsets of positions that he considers possible at any given time.

So what hope is there? I said that both games of perfect information and games of imperfect information have decidable analysis functions. But there was an unstated assumption: that all players are playing against each other. Hearn showed that a game with two players who are playing as a team against a third player but such that each of two team members has private information (which they are not allowed to communicate) can be undecidable.

The idea of the proof is as follows: Imagine that we have three players, Player 1, Player 2, and Player 3, and that Players 2 and 3 are playing as a team against Player 1. In order to make the analysis function undecidable, we would like to do something like the following: in game $G_n$, force Player 2 to emulate the $n$th Turing machine, and have his team win if it halts. But we can't literally make the board an infinite tape and make Player 2's legal moves be those simulating the $n$th Turing machine, because then the board would have infinitely many positions.

But what if, instead of an infinite tape, the board was a single cell where Player 2 wrote out the computational history of the $n$th Turing machine one character at a time? That is, if $t_i$ represents the (finite) contents of the tape at time $i$, he writes out the concatenation of all the $t_i$ (with, say, a special symbol # separating them). This would be good, but how can we enforce this given that we want the set of legal moves to depend only on the board position?

The solution is the following: We require both Player 2 and Player 3 to write out the computational history of the $n$th Turing machine in two separate "streams". Player 1 lets each of them know which stream he wants them to write a character to at any given time. Player 1 can then check the streams against each other by advancing one player ahead of the other in one of the streams and checking, character by character, that the string that one of them is writing out is one step further along the computational history of the $n$th Turing machine than the string that the other is writing out. Since Player 2 and Player 3 do not know which of them is advanced relative to the other, they will not be able to cheat.

Hearn uses this argument to show that a game he calls Team Computation has an undecidable analysis function. He then uses that to show that the team version of Constraint Logic (discussed in the previous post) is undecidable.

]]>

But what if you come across a new game, which no computer scientist has heard of yet? Well, you’re in luck, as Robert Hearn, in his thesis, formulated a framework called Constraint Logic intended to make it easy to prove complexity results for games.

Games are classified by Hearn by whether they are zero-, one-, or two-player, and further by whether or not their length is polynomially bounded by their initial setup. (Hearn also covers team games, which I’ll cover in a later post.)

Recall that by saying a game is in a complexity class I mean that the problem of, given an initial setup for the game, determining whether or not there is a winning strategy for a given player (if any) and, if so, what that strategy is, is in that complexity class. Thus, when we talk about a game being in a particular complexity class, we are always implicitly talking about a way of generalizing the game to different initial setups. For example, the statement that Go is in EXPTIME means that the problem of, given $n$, determining a winning strategy for Go on an $n \times n$ board is in EXPTIME.

- An example of a zero-player bounded-length game is the game Clock Solitaire. Once the cards are dealt, the game is completely deterministic (this makes it zero-player), and furthermore, each card is turned over at most once (this makes it bounded). Games of this sort are in the complexity class P.
- An example of a zero-player unbounded game is Conway’s Game of Life. Games of this sort tend to be in PSPACE. Conway’s game of life is PSPACE-complete.
- An example of a one-player bounded-length game is Peg Solitaire, mentioned above. It is bounded since each move removes a peg, and there is no way to get a peg back. Games of this sort are in NP. Peg Solitaire, as mentioned above, is NP-complete.
- Examples of one-player unbounded games include Rubik’s Cube and Rush Hour. Games of this sort tend to be in PSPACE. Rush Hour is PSPACE-complete. (There are a couple of different ways of generalizing the Rubik’s Cube to higher dimensions, but I’m not aware of complexity results for any of them. If you know, please leave a comment.)
- Examples of two-player bounded-length games include Tic-Tac-Toe, John Nash's game Hex, and Othello. In each case, the game is of bounded length since once a player marks a space on the board, it can never be unmarked or marked again. Games of this sort are in PSPACE. Hex and Othello are PSPACE-complete. (As with the Rubik's cube, there are a couple of different ways to generalize Tic-Tac-Toe to higher dimensions, but I don't know of any complexity results for them. If you know, please leave a comment.)
- Examples of two-player unbounded games include many familiar games such as checkers, chess, and go. Games of this type tend to be in EXPTIME. Each of the three games mentioned is EXPTIME-complete.
- Hearn also defines “team” games, which I will skip in this post.

So what is constraint logic? It’s a single setup which naturally gives examples of games of each of the six types listed above, and furthermore is in each case complete with respect to the associated complexity class. Thus, if you can reduce constraint logic to whichever game you’re interested in, you have a completeness result.

The setup for constraint logic is as follows: The game board is an undirected graph with weights of 1 or 2 on the edges and non-negative integers assigned to the vertices (these non-negative integers are called the *minimum inflow* of the vertex). A position of the board is an assignment of direction to each edge. The position is legal if, for each vertex, the sum of all the weights of the edges directed towards the vertex is at least the minimum inflow of the vertex. A move is generally reversing the direction of an edge (to give another legal position), and the goal of the game is generally to reverse the direction of a specified edge.
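A minimal Python sketch of these rules (the function and variable names are my own): a position maps each directed edge to its weight, and a reversal is a legal move exactly when the resulting position still meets every minimum inflow.

```python
def inflow(position, vertex):
    # position maps (u, v) -> weight, meaning the edge points from u to v.
    return sum(w for (u, v), w in position.items() if v == vertex)

def is_legal(position, min_inflow):
    return all(inflow(position, x) >= need for x, need in min_inflow.items())

def can_reverse(position, edge, min_inflow):
    """Reversing edge (u, v) is a legal move iff the new position is legal."""
    u, v = edge
    w = position.pop((u, v))
    position[(v, u)] = w
    ok = is_legal(position, min_inflow)
    del position[(v, u)]       # undo the trial reversal
    position[(u, v)] = w
    return ok
```

For instance, at a vertex `x` with minimum inflow 2 fed by two weight-1 edges, an outgoing weight-2 edge can be reversed, but reversing either weight-1 input first would starve `x`.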

How does this give us each of the six types?

- To form a zero-player bounded-length game: Suppose given a board position $B$ and a goal edge $e$ (i.e., $B$ is a directed graph with weights assigned to the edges and non-negative integers assigned to the vertices, $e$ is an edge of $B$, and the game is won if the direction of $e$ is reversed). Then let $E_1$ be the set of edges such that it's legal to reverse the direction of each one. Reverse the direction of all of these edges to get a new position. Let $E_2$ be the set of all edges such that it's legal to reverse the direction of the edge *and it hasn't been reversed before*. Reverse all of those, and so on. Since an edge is reversed at most once, the game is bounded-length. (Notice that it might be the case that in reversing some edge in $E_i$, you make it so that some other edge in $E_i$ can no longer be legally reversed. This can indeed happen, but it is possible to restrict to a subclass of games where this never happens.) This game is P-complete.
- To form a zero-player unbounded-length game, you essentially do the same thing as above, but remove the restriction that an edge can be reversed at most once. There are some technicalities, however, which are to ensure that every time you try to reverse an edge, it is a legal move. This game is PSPACE-complete.
- To form a one-player bounded game: On each turn the player reverses the direction of an edge (to form a legal position). Each edge can be reversed at most once. The player wins if he is able to reverse the direction of the pre-specified goal edge. This game is NP-complete.
- To form a one-player unbounded game: Just as above, except that each edge may be reversed as many times as necessary. This game is PSPACE-complete.
- To form a two-player bounded-length game: Each player has a set of edges which only they may reverse. Each player also has their own target edge. They take turns reversing edges, and each edge may be reversed at most once. The first player to reverse their target edge wins. This game is PSPACE-complete.
- To form a two-player unbounded game: Just as above, except that each edge may be reversed an unlimited number of times. This game is EXPTIME-complete.

Part II of Hearn’s thesis (linked above) contains a large number of complexity proofs for games based on these results. Furthermore, he has results which make this endeavor easier: a priori, you would have to reduce any graph to the game in question to prove that it’s complete with respect to the appropriate complexity class. Hearn proves that it suffices to reduce planar graphs which consist solely of AND vertices, which are those of minimum inflow 2 and with exactly three adjacent edges, one of which has weight 2 and two of which have weight 1, and OR vertices, which are those of minimum inflow 2 and with exactly three adjacent edges, all of which have weight 2. (You can interpret games built up from these vertices as a kind of a non-deterministic circuit computation, which is where they get their name.)

For example, consider sliding block puzzles: In a sliding block puzzle, you are given a number of rectangular blocks in a rectangular box. The goal is to slide the blocks around to get one particular block to one particular place. Using Constraint Logic, Hearn, together with his advisor, Erik Demaine (who is amazing, by the way), showed that this problem is PSPACE-complete. The proof is shown in this figure, reproduced from this paper by Demaine and Hearn:

On the left is the translation of an AND vertex; on the right is the translation of an OR vertex. In each case, the three yellow blocks on the border represent the three edges adjacent to the vertex. Reversing the direction of an edge corresponds to either pushing the block into the square, or pulling it out. The squares are designed so that it is possible to slide the blocks in the interior around so that pushing a yellow block inside is possible iff the corresponding edge reversal is legal.

For more information, see Hearn’s thesis and the paper by Demaine and Hearn linked above, as well as this additional paper by Demaine and Hearn on the subject.

]]>

Suppose that we relax the restrictions on the program, and we allow it to take a number out of the bag that it has put in (but once the program has done that, the number stays out forever). We call the set of numbers which are in the bag from some point on (i.e., the set of numbers which are put in the bag and never taken out) in a procedure of this sort a 2-c.e. set.

We can analogously let an $n$-c.e. set be one given by a program which can, for each number, "toggle" that number's status up to and including $n$ times if it likes.

The puzzle is then to find, for each $n$, an example of a set which is $(n+1)$-c.e. but not $n$-c.e.

Here is an example of a set which is $(n+1)$-c.e. but not $n$-c.e.: The program simulates all Turing machines simultaneously (by dovetailing, this can still be done with a finite algorithm).

For each $e$ it does the following: If the $e$th Turing machine halts, it puts $e$ in the bag. In this case, the program also checks to see what number the $e$th Turing machine output when it halted; call it $e_1$. If eventually the $e_1$th Turing machine halts, the program takes $e$ out of the bag. In this case, the program also checks to see what number the $e_1$th Turing machine output when it halted; call it $e_2$. If eventually the $e_2$th Turing machine halts, the program puts $e$ back in the bag. And so on, up to a maximum of $n+1$ changes.
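The dovetailing trick mentioned above can be sketched in Python with generators standing in for step-by-step machine simulations (the scaffolding is entirely mine; the `make_machine` callback stands in for "the $i$th Turing machine"):

```python
from itertools import count, islice

def dovetail(make_machine):
    """make_machine(i) returns a generator that yields once per computation
    step and returns (StopIteration) when machine i halts. At stage s we
    start machine s and run every live machine one more step, yielding i
    whenever machine i is discovered to halt."""
    running = {}
    for stage in count():
        running[stage] = make_machine(stage)
        for i in list(running):
            try:
                next(running[i])
            except StopIteration:
                del running[i]
                yield i

def toy(i):
    """A stand-in 'machine': halts after i steps iff i is even, and runs
    forever otherwise."""
    def machine():
        if i % 2:
            while True:
                yield
        for _ in range(i):
            yield
    return machine()

# Every halting machine is discovered in finite time, despite the
# non-halting ones interleaved among them.
assert list(islice(dovetail(toy), 3)) == [0, 2, 4]
```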

Call that set $A$. Clearly it's $(n+1)$-c.e. We have to show that it's not $n$-c.e.

First observe that $A$ is complete with respect to $(n+1)$-c.e. sets. This means that if $B$ is $(n+1)$-c.e., then there is a computable function $f$ such that $x \in B$ iff $f(x) \in A$ for all $x$.

Now observe that if $B$ is $n$-c.e., then its complement is $(n+1)$-c.e. We can find an $(n+1)$-c.e. presentation of the complement by letting our computer program start by putting all numbers in the bag and then doing the opposite of what the $n$-c.e. presentation of $B$ does.

Combining these two observations, if $A$ were $n$-c.e., there would be a computable function $f$ such that for all $x$, $x \in A$ iff $f(x) \notin A$. By Kleene's Fixed Point Theorem (also known as Kleene's Recursion Theorem), there is an $e$ such that the $e$th Turing machine behaves the same as the $f(e)$th Turing machine. For this particular $e$ we then have both that $e \in A$ iff $f(e) \in A$ and that $e \in A$ iff $f(e) \notin A$, which is a contradiction.

]]>

**Puzzle #1.** Describe a winning strategy for the following game: You are given three numbers $x_1, x_2, x_3$. You must correctly say for each number whether or not it is in $K$. You are allowed to ask (and receive a truthful answer to) two questions of the form "Is $n$ in $K$?" for any $n$.

**Puzzle #2.** Show that there is *no* winning strategy for the game which is the same as that in Puzzle #1 except you are given two numbers and may ask only one question. (Even stronger, show that if $A$ is a set such that you *can* win that game with $A$ in place of $K$, then $A$ must be decidable.)

These puzzles are special cases of more general questions answered in Terse sets, superterse sets, and verbose sets by Richard Beigel, William Gasarch, John Gill, and James Owings. I also suggest looking at Richard Beigel’s page of online papers, which has a lot of interesting stuff.

I read about this first in Piergiorgio Odifreddi’s book “Classical Recursion Theory, Volume 1.”

**Answer #1.** The first thing to do is to find out how many of $x_1, x_2, x_3$ are in $K$. To do this, let $e$ be such that the $e$th Turing machine simulates the $x_1$th, $x_2$th and $x_3$th Turing machines in parallel, and halts after any two of them halt.

By asking if $e$ is in $K$, you will find out either that two or more, or that one or fewer, of $x_1, x_2, x_3$ are in $K$. Use a similar second question to find out exactly how many of $x_1, x_2, x_3$ are in $K$.

Once you know how many of $x_1, x_2, x_3$ are in $K$, just run the $x_1$th, $x_2$th, and $x_3$th Turing machines in parallel, and wait until all the ones that are going to halt do halt. Then you will know which of them are in $K$.
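The counting trick can be phrased abstractly: each question to $K$ acts as a threshold query "do at least $k$ of the three numbers lie in $K$?", and two such queries binary-search the count between 0 and 3. A toy Python version (with a simulated oracle, of course, since $K$ itself is undecidable; all names are mine):

```python
def count_with_two_questions(at_least):
    """Determine how many of three numbers are in K using two questions,
    where at_least(k) answers 'are at least k of them in K?'."""
    if at_least(2):
        return 3 if at_least(3) else 2
    return 1 if at_least(1) else 0

def make_oracle(membership):
    # Simulated oracle over known membership bits, for demonstration only.
    return lambda k: sum(membership) >= k

assert count_with_two_questions(make_oracle([True, False, True])) == 2
```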

**Answer #2**. Suppose that you have a winning strategy, and I'll show how to compute $A$. Let $f_n$ be the function which, given $x_1$ and $x_2$, returns the guess for whether or not $x_1$ is in $A$, supposing that you received a "no" answer to whatever question your strategy decided to ask. Similarly define $g_n$ (the guess for $x_2$ under a "no" answer), $f_y$, and $g_y$ (the guesses under a "yes" answer).

Either it is the case that for all $x_1$ there is an $x_2$ such that $f_n(x_1, x_2) = f_y(x_1, x_2)$ or it isn't. Suppose first that it is. Then given $x_1$, we may compute whether or not $x_1$ is in $A$ by searching for such an $x_2$ and computing $f_n(x_1, x_2)$ (which equals $f_y(x_1, x_2)$). This must be the correct guess since either "yes" or "no" must be the correct answer to whatever question the strategy asked.

Now suppose that there is an $x_1$ such that for all $x_2$, $f_n(x_1, x_2) \neq f_y(x_1, x_2)$. Since $x_1$ is just a single number, we can assume that we know whether or not it's in $A$. Then, given any $x_2$, exactly one of $f_n(x_1, x_2)$ and $f_y(x_1, x_2)$ is a correct guess about $x_1$, so we know which of them is correct, hence we know whether "yes" or "no" is the correct answer to the question that the strategy would pose. Hence we can compute its correct guess as to whether or not $x_2$ is in $A$.

**Additional Results.** The paper linked above gives generalizations of both puzzles: You can find out which of any $2^n - 1$ numbers are in $K$ by asking $n$ questions (by the same strategy of determining how many of them are in $K$), and, for any $n$, if you can determine which of any $2^n$ numbers are in a set $A$ by asking $n$ questions, then $A$ is decidable.

The authors call a set *verbose* if you can determine which of any $2^n - 1$ numbers are in it by asking $n$ questions. They call it *terse* if you need $n$ questions to determine which of any $n$ numbers are in the set. They show a number of interesting results about these, mainly along the lines that (very roughly) lots of both kinds exist, and in lots of different places.

]]>

It’s pretty clear that spatial concepts having to do with distances and rotation require the real numbers. For example, if we took $\mathbb{Q}^2$ as our model of the plane, the distance from $(0, 0)$ to $(1, 1)$ would not be rational, and we would not be able to rotate a point about another point by most angles.

But I always implicitly thought that spatial notions not depending on distances or angles required only the rationals. It turns out that I was wrong: there are spatial notions not depending on distances or angles which differ depending on whether you take space to be $\mathbb{Q}^n$ or $\mathbb{R}^n$. The fact that I was wrong follows from a theorem of Micha Perles which is very famous in combinatorics, but which I only found out about recently.

I found out because the combinatorialist Drew Armstrong told me about it, and he referred me to the online book Lectures on Discrete and Polyhedral Geometry by Igor Pak.

Actually, the fact that I was wrong follows just from a lemma in the proof of Perles’s result, which I will state before telling you what Perles’s main result was.

Consider the following system of points and lines (the image is stolen from Pak’s manuscript):

The lemma is then that while there is a collinearity- and noncollinearity-preserving embedding of this diagram into $\mathbb{R}^2$, there is not one into $\mathbb{Q}^2$. Note that the question of a collinearity- and noncollinearity-preserving embedding of the diagram says nothing about angles or distances. The proof is simply to assume that there is a rational embedding, then to find a rational transformation of the configuration to one where you know that one of the points has an irrational coordinate. This proof appears on page 108 of Pak's book.

Perles’s main theorem is the following, and I think it’s quite striking: A *polytope* is the convex hull of a finite set of points in some $\mathbb{R}^d$, where we consider two polytopes equivalent if they are combinatorially equivalent: i.e., if there is a bijection between the two sets of vertices such that if one pair of vertices has an edge between them, the corresponding pair does as well, etc. Then for all dimensions $d$ greater than 3, there is a $d$-dimensional polytope which is *not* equivalent to one which is the convex hull of a set of points with only rational coordinates.

The discussion of this in Pak’s book is in Part I, Section 12.5.

Edit: I removed a paragraph on planar graphs because it didn’t really fit the article, and I took out the phrase “purely combinatorial property,” which was misleading and probably incorrect.

]]>

Well, we could certainly allow the set $\mathbb{N}$ into our universe: natural numbers are the most basic computational objects there are. (Notation: I’ll use $N$ to refer to $\mathbb{N}$ when we’re considering it as part of the universe we’re building, and just $\mathbb{N}$ when we’re talking about the set of natural numbers in the “real” world.) What should we take as our set of functions from $N$ to $N$? Since we want to admit only computable things, we should let $N^N$ be the set of computable functions from $\mathbb{N}$ to $\mathbb{N}$, which we can represent non-uniquely by their indices (i.e., by the programs which compute them).

(For clarity, I’ll use the following notation for computable functions: $\varphi_e$ denotes the partial function from $\mathbb{N}$ to $\mathbb{N}$ computed by the $e$th Turing machine. Given $n$, it is possible that the computation of $\varphi_e(n)$ never halts; in that case, I’ll write $\varphi_e(n)\uparrow$ and say that $\varphi_e(n)$ is undefined. If it does halt, I’ll write $\varphi_e(n)\downarrow$ and say that $\varphi_e(n)$ is defined. If it halts, then it yields an output. To indicate what it is, I’ll write $\varphi_e(n) = m$ or $\varphi_e(n)\downarrow = m$.)

So, we’ve decided that $N^N$ should equal the set of indices of total computable functions. What should $N^{(N^N)}$ equal? At this point, there is a slight subtlety: It’s not simply the set of computable functions from $N^N$ (considered as a subset of $\mathbb{N}$) to $N$ (considered as $\mathbb{N}$), because we would like to only admit those functions from $N^N$ to $N$ that return the same number when given inputs which represent the same element of $N^N$.

Therefore, we’ll let $N^{(N^N)}$ be the set of $e$ such that, for all $a \in N^N$, $\varphi_e(a)\downarrow$, and whenever $a, b \in N^N$ are such that for all $n$, $\varphi_a(n) = \varphi_b(n)$, then $\varphi_e(a) = \varphi_e(b)$.

We can similarly define $(N^N)^{(N^N)}$, except that there are now two places where we should take into account that we consider $a, b \in N^N$ equivalent if for all $n$, $\varphi_a(n) = \varphi_b(n)$: We’ll let $(N^N)^{(N^N)}$ be the set of $e$ such that, whenever $a, b \in N^N$ are equivalent in the aforementioned sense, $\varphi_e(a)$ and $\varphi_e(b)$ are defined and equivalent in the aforementioned sense.

In a similar fashion, we can define all the higher types, and so on; these sets are called the sets of hereditarily computable functions.

Can we generalize this construction to a category that incorporates all possible computable representations of real objects? More ambitiously, can we generalize to a category that is a genuine mathematical *universe* in the sense that questions like “Does the Riemann Hypothesis hold in this category?” are meaningful? The answer, due to Martin Hyland, is yes.

This material is from Jaap van Oosten‘s book “Realizability: A Categorical Perspective” (link to Preface, Introduction and Table of Contents). Unfortunately, I don’t know of a freely available explanation of the effective topos on the web, which is part of the reason why I’m writing this blog entry. (If you know of one, please leave a comment. Edit: Found one.) However, the Stanford Encyclopedia of Philosophy has a pretty good section on the realizability interpretation of intuitionistic logic, on which Hyland’s effective topos is based.

Back to the math. Notice that what we did in the case of $N^{(N^N)}$ was the following: Although we represented the computable functions as a subset of $\mathbb{N}$, we still kept the “real” set hiding around in the background: we used it to determine what the appropriate elements of $N^{(N^N)}$ should be: If two elements of $N^N$ represented the same element of the “real” set $\mathbb{N}^{\mathbb{N}}$, then an element of $N^{(N^N)}$ should assign the same number to both of them.

That suggests a generalization. Let an *assembly* be a pair $(X, E)$, where $X$ is a set, and $E$ is a function from $X$ to $\mathcal{P}(\mathbb{N})$, the power set of $\mathbb{N}$. We think of $E$ as assigning to each element of $X$ its set of computable representations. We let a morphism between two assemblies $(X, E)$ and $(Y, F)$ be a function $f$ from $X$ to $Y$ such that there is an $e$ such that, whenever $x \in X$ and $n \in E(x)$, $\varphi_e(n)\downarrow$ and $\varphi_e(n) \in F(f(x))$.
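Here is a finite toy model of assemblies in Python (entirely my own scaffolding: realizers are plain integers, and an ordinary Python function stands in for the computable map $\varphi_e$):

```python
class Assembly:
    """A pair (X, E): a set of elements plus a map E sending each element
    to its set of integer 'realizers'."""
    def __init__(self, elements, E):
        self.elements = list(elements)
        self.E = E

def is_tracked(f, track, A, B):
    """Check that `track` witnesses f : A -> B as a morphism of assemblies:
    it must send every realizer of x to a realizer of f(x)."""
    return all(track(n) in B.E(f(x))
               for x in A.elements
               for n in A.E(x))

# Finite pieces of N as assemblies: each number realizes itself.
N5 = Assembly(range(5), lambda n: {n})
N10 = Assembly(range(10), lambda n: {n})

# Doubling is a morphism, tracked by doubling on realizers...
assert is_tracked(lambda x: 2 * x, lambda n: 2 * n, N5, N10)
# ...but the identity on realizers does not track doubling.
assert not is_tracked(lambda x: 2 * x, lambda n: n, N5, N10)
```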

With these morphisms, the class of assemblies forms a category. Let $(X, E)$ and $(Y, F)$ be two assemblies. Then they have a direct product given by $(X \times Y, G)$, where $G((x, y)) = \{\langle m, n \rangle : m \in E(x), n \in F(y)\}$ and $\langle \cdot, \cdot \rangle$ is a pairing function. They have an exponential given by $(Z, H)$, where $Z$ is the set of morphisms from $(X, E)$ to $(Y, F)$ and $H(f)$ is the set of indices $e$ witnessing, as above, that $f$ is a morphism.

If we let $N$ be the assembly $(\mathbb{N}, E)$, where $E(n) = \{n\}$, then the iterated exponential objects of $N$ correspond precisely to our initial definition of the sets of hereditarily computable objects above.

This is all great, but we still can’t call the category of assemblies a mathematical *universe*. Why not? Well, in the real world, we ask questions like “Is the Riemann Hypothesis true?”, “Is Goldbach’s conjecture true?”, etc., but we don’t yet know how to ask questions like “Is the Riemann Hypothesis true *in the category of assemblies*?” any more than it makes sense to ask whether or not the Riemann Hypothesis is true in a group or in the number 17. What we need is a way to interpret statements as being true or not in this category.

In turning the category of assemblies into one in which we can interpret logical statements, there are three considerations, each of which builds on the previous ones.

**The object of truth values should have more than two elements.** Let’s step back to the ordinary category of sets for a moment. Say we have two sets $X$ and $Y$ and an injection $f$ from $X$ to $Y$. Given $y \in Y$, what is the truth value of the proposition that $y$ is in the image of $f$? Well, I don’t know, but it’s clearly either *true* or *false*. But in the category of assemblies it’s more complicated due to the computational information we have lying around. Say that we have an injective morphism $f$ from an assembly $(X, E)$ to an assembly $(Y, F)$. Given $y \in Y$, now what is the truth value of the proposition that $y$ is in the image of $f$? What if there is an $x$ such that $f(x) = y$ but $E(x) \neq F(y)$? Without resolving the issue now, one plausible answer is that the truth value should be the set of indices of all computable functions taking $E(x)$ to $F(y)$, so that the more “alike” $x$ and $y$, the more “true” the proposition is, and furthermore this “alikeness” is represented in a computational way. So, a working hypothesis is that the set of truth values should be something like $\mathcal{P}(\mathbb{N})$.

**Objects should come equipped with an equivalence relation.** In the category of assemblies, there is no question about whether or not two elements of a given object are equal. If we are making a category where the object of truth values is something like $\mathcal{P}(\mathbb{N})$, however, we should allow that the proposition that different elements are equal has a truth value in that object, rather than in the classical set of truth values. Therefore, objects should be something like $(X, \sim)$, where $X$ is a set and $\sim$ is a map from $X \times X$ to $\mathcal{P}(\mathbb{N})$. (We can represent an assembly $(X, E)$ as the object $(X, \sim)$ in our new category where $x \sim x = E(x)$ and $x \sim y = \emptyset$ if $x \neq y$.)

**Morphisms should be more general than functions.** If we’re allowing objects to come equipped with some sort of equivalence relation, we will have to let morphisms be more general than functions: If $f$ is a morphism from $(X, \sim)$ to $(Y, \sim)$ and $f(x) = y$, and $x \sim x'$ holds to some degree, then $f(x') = y$ is also true to some degree. So morphisms should probably be some sort of relation on $X \times Y$ that resembles a function in some way.

Now, after listing all those (somewhat vague) considerations, I’ll describe the category that takes them into account. It’s called the Effective Topos and it was discovered/invented by Martin Hyland.

**Description of the category**

The objects of the effective topos are pairs $(X, \sim)$ where $X$ is a set and $\sim$ is a map from $X \times X$ to $\mathcal{P}(\mathbb{N})$. This map is required to satisfy the following properties:

- There must be a number $s$ such that for all $x, y$ and all $e$, if $e \in (x \sim y)$ then $\varphi_s(e) \in (y \sim x)$.
- There must be a number $t$ such that for all $x, y, z$ and all $e, f$, if $e \in (x \sim y)$ and $f \in (y \sim z)$ then $\varphi_t(\langle e, f \rangle) \in (x \sim z)$.

(In the above, $s$ stands for “symmetric” and $t$ stands for “transitive.”)

A morphism from $(X, \sim)$ to $(Y, \sim)$ is represented by a function $G$ from $X \times Y$ to $\mathcal{P}(\mathbb{N})$ satisfying the following:

- There must be a number $a_{st}$ such that for all $x, y$ and all $e$, if $e \in G(x, y)$ then $\varphi_{a_{st}}(e) = \langle m, n \rangle$ where $m \in (x \sim x)$ and $n \in (y \sim y)$.
- There must be a number $a_{rl}$ such that for all $x, x', y, y'$ and all $e, f, g$, if $e \in G(x, y)$, $f \in (x \sim x')$, and $g \in (y \sim y')$, then $\varphi_{a_{rl}}(\langle e, f, g \rangle) \in G(x', y')$.
- There must be a number $a_{sv}$ such that for all $x, y, y'$ and all $e, f$, if $e \in G(x, y)$ and $f \in G(x, y')$ then $\varphi_{a_{sv}}(\langle e, f \rangle) \in (y \sim y')$.
- There must be a number $a_{tl}$ such that for all $x$ and all $e$, if $e \in (x \sim x)$ then $\varphi_{a_{tl}}(e) \in \bigcup_y G(x, y)$.

(In the above, stands for “strict,” stands for “relational,” stands for “single-valued” and stands for “total.”)

We say that two such representations and are equivalent if there exist such that for all , , if then and conversely if then . (Thus, a morphism in the Effective Topos is actually an equivalence class of representations as above.)

Figuring out how to compose such morphisms is an exercise left to the websurfer.

Let and be two objects. Their direct product is given by where . To form the exponential , take the object , where in the definition of , you emulate the definition of a morphism given above.

The object of truth values (often denoted in any topos) is , where .

The object playing the role of a singleton set is where .

The map from to representing the truth value is given by the equivalence class of the map defined by .

The natural numbers object of the effective topos is , where and where .

**Interpretation of logical formulas in the effective topos.**

I’ll now describe how logical formulas can be interpreted in Hyland’s effective topos. If are variables intended to range over the objects respectively and is a formula with free variables from , then I’ll show how to find a map from to interpreting that formula. If , and thus the formula has no free variables and is a sentence, then the interpretation will give a map from to . We say that a sentence holds in the effective topos if its interpretation is equal to the map defined above.

The only atomic relation is equality, and the interpretation of atomic formulas is given by the component of the objects of the effective topos.

For clarity, assume that and contain only one free variable, and that it ranges over the object . If we know the interpretations of and already, then we have the following:

- is represented by the map taking to .
- is represented by the map taking to .
- is represented by the map taking to .

Now suppose that has two free variables ranging over and respectively, and I’ll show you how to interpret quantifiers.

- is represented by the map taking to .
- is represented by the map taking to .

Now, once we observe that we can interpret the power “set” of an object in the effective topos as the exponential , we know how to interpret all first- and higher-order sentences as holding or not in the effective topos.

Here are some interesting sentences given by Van Oosten that highlight some differences between the effective topos and the ordinary category of sets.

- Note that we may write the relation “” as a relation on in our language. Then the sentence is true in the effective topos.
- For every formula , where is a variable ranging over and is a variable ranging over , the sentence holds in the effective topos.
- We may construct the rationals and the reals in the effective topos just as we do in the category of sets. However, they have different properties. For example, in the effective topos the statement “There exists a bounded monotonic sequence in that does not converge to a limit.” holds, contradicting the Bolzano-Weierstrass theorem. Intuitively, this is because we can find a bounded, monotonic sequence converging to a real number whose binary expansion encodes the halting problem but such that every member of the sequence has a decidable binary expansion.
- The sentence holds in the effective topos.
- Similar to the above, we can show that the intermediate value theorem fails in the effective topos.
- In the effective topos, the statement “All functions from to are continuous” holds.
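The bounded-monotonic-sequence phenomenon can be sketched in code. This is a toy illustration of my own, not the actual effective-topos construction: I substitute a Collatz-reachability predicate for a genuine halting predicate, but the shape is the same. Each term is computed by a bounded search, the sequence is monotone and bounded, and the limit packages the answers to infinitely many "does this ever happen?" questions.

```python
from fractions import Fraction

# Toy stand-in for "machine k halts within n steps": does the Collatz
# iteration starting from k reach 1 within n steps? Like a real halting
# predicate, this is checkable by running a bounded computation.
def halts_within(k: int, n: int) -> bool:
    x = k
    for _ in range(n):
        if x == 1:
            return True
        x = 3 * x + 1 if x % 2 else x // 2
    return x == 1

def x(n: int, num_machines: int = 20) -> Fraction:
    # n-th term: add 2^-k for each "machine" k observed to halt so far.
    return sum((Fraction(1, 2 ** k)
                for k in range(1, num_machines + 1)
                if halts_within(k, n)), Fraction(0))

seq = [x(n) for n in range(50)]
# Each term is computable, the sequence is monotone and bounded by 1,
# but the limit encodes which "machines" ever halt.
assert all(a <= b for a, b in zip(seq, seq[1:]))
assert all(t < 1 for t in seq)
```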

]]>

`x` and adds it to `3`, the compiler will figure out that `x` is an `int`. (It couldn’t be a `float`, since it was added to `3` and not `3.0`.)

But it often seems like the compiler should be able to infer not just the types of expressions, but the expressions themselves! For example, if the compiler infers that the type of some function `f` is `(int -> int) -> (int list) -> int list` (i.e., `f` is a higher-order function which takes a function from `int` to `int`, a list of `int`s, and produces a list of `int`s), then `f` is very probably the `map` function, defined informally by `map g [x_1;...;x_n] = [g x_1;...;g x_n]`.

Therefore, if the compiler determines that some expression has that type, and the user has somehow omitted the actual function definition, why not allow the compiler to infer what the expression is?

I made a stab at implementing this type of idea in a toy language I call TermInf (apologies for the weird hosting: I don’t have another hosting service at the moment). It’s a modification of the toy language Poly from Andrej Bauer’s Programming Language Zoo. You’ll need OCaml to compile it. Please feel free to alert me to any bugs or to tell me that my code is horrible.

More details below.

The basic idea is really simple: For any expression `e`, the expression `{e}` is also an expression. The compiler will infer the type `t1` of `e` and the type `t2` that `{e}` has to be. It will search for a sequence of coercions taking `t1` to `t2`, and if there is a unique one, it will replace `{e}` with that sequence of coercions applied to `e`.
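The search for a unique coercion sequence can be sketched as a graph search. Everything below is hypothetical (types as opaque atoms, my own function names and toy coercion table), not TermInf's actual implementation:

```python
from collections import deque

# Hypothetical coercion table: name -> (source type, target type).
coercions = {
    "bool_to_int": ("bool", "int"),
    "int_to_float": ("int", "float"),
}

def find_unique_coercion_path(t1, t2, max_len=5):
    # Breadth-first search over declared coercions; succeed only if the
    # shortest path from t1 to t2 is unique, mirroring TermInf's
    # "unique sequence of coercions" requirement.
    paths_by_len = {}
    queue = deque([(t1, [])])
    while queue:
        t, path = queue.popleft()
        if len(path) > max_len:
            continue
        if t == t2:
            paths_by_len.setdefault(len(path), []).append(path)
            continue
        for name, (src, dst) in coercions.items():
            if src == t:
                queue.append((dst, path + [name]))
    if not paths_by_len:
        raise ValueError("no coercion path")
    shortest = paths_by_len[min(paths_by_len)]
    if len(shortest) != 1:
        raise ValueError("problem with term inference (ambiguous)")
    return shortest[0]

assert find_unique_coercion_path("bool", "int") == ["bool_to_int"]
assert find_unique_coercion_path("bool", "float") == ["bool_to_int", "int_to_float"]
```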

Which functions are coercions is determined by the user; functions can be declared to be coercions or removed from the list of coercions at any point.

I can think of at least three ways this would be useful.

**1. Automatically coercing from one base type to another**

This is actually the least interesting of the three, but it serves to illustrate how TermInf works.

You can use `$show_coercions` to show all the current coercions. The identity function `id` is always a coercion.

```
TermInf. Press Ctrl-D to exit.
TermInf> $show_coercions
id
```

Let’s define a new coercion from `bool` to `int`.

```
TermInf> let_coercion bool_to_int = fun x -> if x then 1 else 0
val bool_to_int : bool -> int
TermInf> $show_coercions
bool_to_int
id
```

Now we can use the coercion.

```
TermInf> {true} + 7
- : int = 8
```

In that instance, the interpreter could determine that the type of `{true}` had to be `int`, since it was added to `7`. In the following instance, the interpreter can’t determine the type of `{true}`.

```
TermInf> {true}
Problem with term inference.
```

But we can always explicitly give a type to any expression, so we can use that to tell the type-inferer what the type of `{true}` is.

```
TermInf> {true} : bool
- : bool = true
TermInf> {true} : int
- : int = 1
```

**2. Lifting functions**

We can view the function `List.map` as a coercion, taking a function `'a -> 'b` to a function `'a list -> 'b list`.

```
TermInf> let_coercion map = rec map is fun f -> fun l -> match l with [] -> [] | x::ll -> (f x)::(map f ll)
val map : ('a -> 'b) -> 'a list -> 'b list
TermInf> $show_coercions
map
bool_to_int
id
```

Now we can try it out.

```
TermInf> let square = fun x -> x * x
val square : int -> int
TermInf> ({square} 3) : int
- : int = 9
TermInf> ({square} [1;2;3]) : int list
- : int list = 1 :: 4 :: 9 :: []
TermInf> ({square} [[1;2];[5;6;7]]) : int list list
- : (int list) list = (1 :: 4 :: []) :: (25 :: 36 :: 49 :: []) :: []
```

Note that in our case, we had to explicitly tell the interpreter what the return type was, although presumably in practice the interpreter or compiler would usually be able to infer it.

The idea is that we can change the basic structure of the thing passed to `{square}`, and the term inferer will adapt. Note that in the third case, the term inferer iterated `map` to produce the required `(int list list -> int list list)` type.

We can similarly look inside the structure of pairs.

```
TermInf> let_coercion map_pair = fun f -> fun x -> (f (fst x), f (snd x))
val map_pair : ('a -> 'b) -> 'a * 'a -> 'b * 'b
TermInf> ({square} [(1,2);(3,4)]) : (int * int) list
- : (int * int) list = (1, 4) :: (9, 16) :: []
TermInf> ({square} ([1;2],[3;4])) : (int list) * (int list)
- : int list * int list = (1 :: 4 :: [], 9 :: 16 :: [])
```

Essentially all variants of map can be added. For example, the function `mapi : ((int * 'a) -> 'b) -> 'a list -> 'b list`, where the function takes the index of the list element, can be added. Then the term-inferer will determine which version of map (or sequence of versions of map) is needed based on the function given to it.

**3. Term inference in conjunction with phantom types.**

I put just enough type aliasing in TermInf to allow you to use phantom types. (For a great introduction to phantom types, see this blog post).

Here’s an example of how type aliasing works in TermInf:

```
TermInf> type hidden = int
TermInf> let f = (fun x -> x + 7) : hidden -> hidden
val f : hidden -> hidden
TermInf> let x = 3 : hidden
val x : hidden
TermInf> f x
- : hidden = 10
TermInf> f 3
The types hidden and int are incompatible
```

Something we might like to do with phantom types is have the type system do a static dimensional analysis on our program. Here’s an attempt to do that:

```
TermInf> type meters
TermInf> type gallons
TermInf> type 'a units = int
TermInf> let add = (fun x -> fun y -> x + y) : 'a units -> 'a units -> 'a units
val add : 'a units -> 'a units -> 'a units
TermInf> let times = (fun x -> fun y -> x * y) : 'a units -> 'b units -> ('a * 'b) units
val times : 'a units -> 'b units -> ('a * 'b) units
TermInf> let one_gal = 1 : gallons units
val one_gal : gallons units
TermInf> let one_m = 1 : meters units
val one_m : meters units
```

Then we have the following correct behavior:

```
TermInf> add one_gal one_gal
- : gallons units = 2
TermInf> times one_gal one_m
- : (gallons * meters) units = 1
TermInf> add one_gal one_m
The types gallons and meters are incompatible
```

But the following is not correct:

```
TermInf> let x = times one_gal one_m
val x : (gallons * meters) units
TermInf> let y = times one_m one_gal
val y : (meters * gallons) units
TermInf> add x y
The types gallons and meters are incompatible
```

Of course, the problem is that the interpreter doesn’t know that units commute.

But we can fix this with coercions.

```
TermInf> let_id_coercion commute = id : ('a * 'b) units -> ('b * 'a) units
val commute : ('a * 'b) units -> ('b * 'a) units
TermInf> add x {y}
- : (gallons * meters) units = 2
```

We’ve declared `commute` to be an identity coercion (by using `let_id_coercion` instead of `let_coercion`) to help the interpreter when it’s deciding if a term inference is unique or not.

Note that we don’t use term inference on both `x` and `y`, because then it couldn’t determine what type to give it.

```
TermInf> add {x} y
- : (meters * gallons) units = 2
TermInf> add {x} {y}
Problem with term inference.
```

This version of `commute` will just commute the two units at the top level, but there are a finite number of identity coercions that you can define that will give you associativity and commutativity (and inverses, if you want). Thus, the type system will be able to perform a static dimensional analysis on your program.

Edit: I should note that I left out several details about how this actually works. For example, the interpreter doesn’t search through *all* sequences of coercions, since there are infinitely many (and the problem of deciding if there is a unique one between any two given types is undecidable in general). Instead it limits itself to sequences of coercions whose type is never “bigger” than the starting type or the goal type, where “bigger” is defined by a straightforward length function.
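One way to realize that bound (a sketch under my own assumptions, not TermInf's actual code) is a length function on type expressions, used to prune any search state whose type exceeds both endpoints:

```python
# Types are represented as nested tuples, e.g. ("int",) for int,
# ("list", ("int",)) for int list. The "size" counts constructors.
def type_size(t):
    head, *args = t
    return 1 + sum(type_size(a) for a in args)

def within_bound(t, start, goal):
    # Prune search states whose type is bigger than both endpoints.
    return type_size(t) <= max(type_size(start), type_size(goal))

int_t = ("int",)
int_list = ("list", int_t)
assert type_size(int_t) == 1
assert type_size(int_list) == 2
assert within_bound(int_t, int_t, int_list)
assert not within_bound(("list", int_list), int_t, int_list)
```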

]]>

This image from the Wikipedia article shows a typical sequence of moves for a chocolate bar:

At this point, Player A is forced to eat the poisoned square and hence loses the game.

Although the question of *what* the winning strategies are for this game is very much an open problem, the question of *who* has a winning strategy is not: On the 1 × 1 board, Player B wins (since Player A must eat the poison piece on his first move). But for any other board, Player A has a winning strategy.

To see why, suppose not. Then if Player A’s first move is to eat just the one square in the top right-hand corner, Player B must have a winning response (since we are supposing that Player B has a winning response to *any* move that Player A makes). But if Player B’s response is winning, then Player A could have simply made that move to start with.
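Since the strategy-stealing argument doesn't exhibit the winning moves, it's fun to find them by brute force on small boards. A Python sketch of my own (not from the post):

```python
from functools import lru_cache

# Brute-force solver for finite Chomp. A position is a non-increasing
# tuple of row lengths; the poison square is row 0, column 0. The player
# who must move at position (1,) is forced to eat the poison and loses.
@lru_cache(maxsize=None)
def first_player_wins(rows):
    if rows == (1,):
        return False  # only the poison square remains: the mover loses
    for i in range(len(rows)):
        for j in range(rows[i]):
            if (i, j) == (0, 0):
                continue  # eating the poison square is never a winning move
            # Bite at (i, j): every row k >= i is cut down to length at most j.
            new = tuple(r if k < i else min(r, j) for k, r in enumerate(rows))
            new = tuple(r for r in new if r > 0)
            if not first_player_wins(new):
                return True
    return False

# Player A wins every n x m board except the 1 x 1 board.
assert not first_player_wins((1,))
assert all(first_player_wins(tuple([m] * n))
           for n in range(1, 5) for m in range(1, 5) if (n, m) != (1, 1))
```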

However, suppose we play Chomp not just on n × m boards, but on α × β boards, where α and β are arbitrary ordinals. The game still makes sense just as before, and will always end in finite time, but Player A no longer wins all of the time (there will no longer be a top right-hand corner square if either α or β is a limit ordinal).

Scott Huddleston and Jerry Shurman investigated Ordinal Chomp in this paper, and showed that it has a number of interesting properties. I’ll describe a few of them below.

First of all, to make things a bit easier to discuss, we will consider only games where is finite rather than games where is arbitrary. However, everything said will go through fine in that case as well.

Secondly, we will use the following game which is equivalent to Chomp: At any point in the game, the board is described by a non-increasing sequence . On each turn, the appropriate player picks an and a such that and replaces the sequence with . We call this *taking a bite of height *. Playing on an chocolate bar as described above corresponds to playing with the sequence consisting of ‘s.

For ease of discussion later, we make the convention that any sequence (not just a non-increasing one) describes a game position by stipulating that is the same as the position .

We saw above that Player A wins all non-trivial finite Chomp games, so let’s start by looking at a transfinite chomp game that Player B wins: , or in our new notation. What is Player B’s winning strategy? Well, notice that Player A’s first move has to put the game either in the state for some or for some . In either case, Player B can then move the game into state of the form for some . From a position of that form, whatever Player A does, Player B can again move to a position of that form. Eventually, Player B will move to the position , and Player A will be forced to eat the poison piece.

So, Player B can win at least *some* transfinite Chomp games, although he still loses a lot of them: for example, he loses all games of the form where . The reason is that in such a game Player A can win by first moving to the position and then using Player B’s winning strategy! Similarly, Player B loses all games for .

In fact, for any ordinal , there is exactly one ordinal such that Player B wins . This is a consequence of what Huddleston and Shurman call the Fundamental Theorem of Transfinite Chomp in the above paper. Another interesting consequence, which illustrates the style of reasoning used in their proof of the Fundamental Theorem, is the following:

**Theorem:** For any sequence of ordinals, there is exactly one ordinal such that Player B wins the game . (Remember the convention above about sequences which are not necessarily non-increasing.)

**Proof:** The uniqueness is similar to the argument given above: If Player B has a winning strategy on the game then Player A has a winning strategy on all games where , since Player A can just move to the position and then use Player B’s winning strategy.

The existence is by induction. Fix and suppose that for all where for all and for at least one , we know that there is a such that Player B has a winning strategy.

For each ordinal , let be the set of all obtainable from by taking a bite of height (this term was defined above, if you forgot what it means).

Let . Let be the minimal ordinal not in . I claim that Player B has a winning strategy for . We will show this by showing that, for any move Player A makes, Player B can make a move that we know leads to a winning strategy for B.

So suppose Player A moves to . Either or . First suppose that . This means that Player A moved by taking a bite of height (say) out of by moving to . But by construction, we know that , which means that the player who is to move (Player B in this case) has a winning strategy.

Now suppose . This means by construction of that where for all and for at least one . Thus, if Player B moves to the position he ensures himself a win.

Note that this proof is constructive. This means that you can actually use it to compute that (as we already know), the unique such that Player B wins is . As a puzzle, you might like to find the unique such that Player B wins (or such that Player B wins or or whatever you like, although the last one will be hard).

The much-more-general Fundamental Theorem of Transfinite Chomp in the paper linked above is also constructive. It allows you, in theory, to compute who will win the -dimensional game (we have been considering -dimensional games) for any ordinals . However, this is quite difficult in practice: according to the Wikipedia article, it is an open question who wins the -dimensional game .

As a final note, in the book Tracking the Automatic Ant, David Gale gives a very nice non-constructive proof that for all , there is a unique such that Player B wins .

]]>

The standard solution to this is essentially to forbid the construction of any set which is too *big*. This solves the problem since you can prove that there are many sets which are not members of themselves, making too big to be a set. But you also end up throwing out many sets which you might want to have: for example, the set of all sets, the set of all groups, etc.

Randall Holmes recently published a paper espousing another solution: instead of forbidding the construction of sets which are too *big*, forbid the construction of sets which are too *asymmetric*. Details below.

Imagine you have some permutation of the universe of sets. Because any set is also a set of sets, we can also consider the related permutation defined by . That is, acts on a set by applying to ‘s elements. By iteration, we have for any .

For , say that a set is -symmetric if for all permutations of the universe of sets. We say that a set is symmetric if it’s -symmetric for some . Holmes’s criterion is then to forbid the construction of any set which is not symmetric. (You may have noticed that this discussion is not quite rigorous. Holmes’s paper has a fully rigorous formalization of this.)

So which sets are symmetric? First of all, notice that the empty set is symmetric, as it’s 1-symmetric. Therefore the set consisting of solely the empty set is 2-symmetric and therefore symmetric. Similarly any hereditarily finite set (this means a set which can be written down with a finite number of ‘s and ‘s and ‘s and nothing else) is symmetric, since it will be -symmetric where is the maximum depth of the braces.

It’s also the case that the set of all sets is 1-symmetric, so that exists. What about the set of all groups? A group will be encoded as some ordered pair of a set and a binary operation on that set, and a binary operation will be further encoded as a set of ordered pairs. The set of all groups will be -symmetric where is large enough to “pierce” the encoding, so that it ends up just permuting the group elements (and thus permuting the groups and sending the set of all groups to itself).

Can we develop mathematics in this theory? It seems that constructing the natural numbers will be a problem. The usual (von Neumann) definition of the natural numbers is that 0 = {}, 1 = {0}, 2 = {0, 1}, and, in general, each natural number is the set of all the preceding ones. All of these sets exist, since the von Neumann definition of will be -symmetric, but the set of all natural numbers is not symmetric.

However, we can go back instead to Frege’s original definition of the natural numbers: each is represented as the set of all sets of cardinality . For each , Frege’s definition of is 2-symmetric, and the set of all natural numbers is 3-symmetric. The rationals and reals can be constructed as usual.

So, how do we know that the set is not symmetric? We don’t, but an encouraging fact is the following: There is no known way to prove that for any formula , the set exists. Instead, one can prove that exists if is *stratified*: this means that one can assign a natural number to each variable in so that for any occurrence of the formula in , is assigned the number one less than that assigned to , and for any occurrence of the formula in , is assigned the same number as that assigned to . The formula defining is emphatically not stratified!
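The stratification condition is easy to check mechanically. Here is a minimal sketch of my own, handling only a bare list of atomic constraints (quantifier and connective structure is ignored):

```python
# Atoms are ("in", x, y), meaning "x is an element of y" (so x must be
# assigned a number one less than y), or ("eq", x, y) (equal numbers).
def is_stratified(atoms):
    # Difference constraints: level(y) - level(x) = d. Propagate offsets
    # within each connected component and look for contradictions.
    edges = {}
    for kind, x, y in atoms:
        d = 1 if kind == "in" else 0
        edges.setdefault(x, []).append((y, d))
        edges.setdefault(y, []).append((x, -d))
    level = {}
    for start in edges:
        if start in level:
            continue
        level[start] = 0
        stack = [start]
        while stack:
            v = stack.pop()
            for w, d in edges[v]:
                if w not in level:
                    level[w] = level[v] + d
                    stack.append(w)
                elif level[w] != level[v] + d:
                    return False  # inconsistent level requirements
    return True

# "x in y and y in z" is stratified; Russell's "x in x" is not.
assert is_stratified([("in", "x", "y"), ("in", "y", "z")])
assert not is_stratified([("in", "x", "x")])
```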

If you like working with universal sets, but it makes you uneasy to use a set theory which you don’t know is consistent, check out NFU. It uses the concept of stratified formulas to avoid Russell’s paradox, allows the existence of the set of all sets (and set of all groups, etc.) and is known to be consistent relative to ZFC. In fact, Randall Holmes proposed the system I’ve discussed here as a way of clarifying the semantics of a related set theory. A book developing mathematics in NFU is here.

]]>

is provable (by a very simple proof!), it’s not possible to prove the truth or falsity of all such identities. This is because Daniel Richardson proved the following:

Let denote the class of expressions generated by

- The rational numbers, and .
- The variable
- The operations of addition, multiplication, and composition.
- The sine, exponential, and absolute value functions.

Then the problem of deciding whether or not an expression in is identically zero is undecidable. This means as well that the problem of deciding whether or not two expressions are always equal is also undecidable, since this is equivalent to deciding if is identically zero.

A summary of Richardson’s proof (mostly from Richardson’s paper itself) is below.

The proof depends on the MRDP theorem, which says that for any recursively enumerable set , there is a polynomial such that

For all , iff there exist such that .
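The statement gives a semi-decision procedure: to test membership, search for a solution of the polynomial equation. A toy sketch, with a stand-in polynomial of my own choosing rather than an actual MRDP polynomial:

```python
from itertools import count, product

# Semi-deciding membership in a Diophantine set by blind search. As a
# purely illustrative stand-in, take p(a, x, y) = x^2 + y^2 - a, so the
# set S is the set of sums of two squares.
def in_diophantine_set(a, max_bound=None):
    # Search ever-larger boxes of witnesses; without max_bound this
    # halts iff a is in S (a semi-decision, not a decision, procedure).
    bounds = count(1) if max_bound is None else range(1, max_bound + 1)
    for bound in bounds:
        for x, y in product(range(bound + 1), repeat=2):
            if x * x + y * y - a == 0:
                return True
    return False

assert in_diophantine_set(5, max_bound=10)      # 5 = 1^2 + 2^2
assert not in_diophantine_set(3, max_bound=10)  # 3 is not a sum of two squares
```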

At the time that Richardson proved his result, the MRDP theorem was not proven. Instead, only the weaker result where is allowed to be an exponential polynomial (i.e., the are allowed to appear as exponents in ) had been proven, and so that’s what he used. I haven’t read Richardson’s proof closely enough to determine if his result can be improved using the full MRDP theorem.

In any case, since there are recursively enumerable sets which are not decidable, we may let be an exponential polynomial such that the problem of deciding, given , whether or not there are such that is undecidable.

Therefore, the problem of deciding, given , whether or not there are *non-negative real numbers* such that

is undecidable.

Now, let be an exponential polynomial which grows very fast (and such that is very large). Then, if is a natural number and there are non-negative reals such that

is less than one, then both and are small. From this last fact, we conclude that each is close to a natural number. Let be the natural number closest to . Then, will be small. But then, because it’s an integer, it will be zero.

Therefore, we have an expression formed from sine and exponential functions (and rational numbers, addition, and multiplication) such that for each , there exist non-negative reals such that iff there exist natural numbers such that (which is an undecidable problem).

By an argument which I won’t reproduce here, we can replace with with the property that for each there exists a real such that iff there exist natural numbers such that . (Notice that now ranges over all reals.) But now, consider the sequence of functions

Each is identically zero iff the corresponding is ever less than zero, which is an undecidable problem.

The reference for Richardson’s paper is: Daniel Richardson, “Some unsolvable problems involving elementary functions of a real variable,” Journal of Symbolic Logic, Volume 33, 1968, pages 514–520.

Another reference is: B.F. Caviness, “On canonical forms and simplification,” Journal of the ACM, volume 17, 1970, pages 385–396.

]]>

We also know a priori that there must be uncomputable functions, since there are uncountably many functions from the natural numbers to the natural numbers but only countably many computer programs. But that is nonconstructive, and the two examples I gave above seem a bit like they’re cheating since their definitions refer to the concept of computability. Is there a natural example of an uncomputable function that does not refer to computability?
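The counting argument can be made slightly more concrete by diagonalization: given any enumeration of total functions, one can write down a function differing from every one of them. A small sketch:

```python
# Given a list (or enumeration) of total functions f_0, f_1, ..., the
# function n -> f_n(n) + 1 differs from each f_n at input n. Applied to
# an enumeration of all programs, this yields a function no program
# computes.
def diagonal(fs):
    return lambda n: fs[n](n) + 1

fs = [lambda n: 0, lambda n: n * n, lambda n: 2 * n + 1]
g = diagonal(fs)
# g differs from each f_n at n: g(0)=1 != 0, g(1)=2 != 1, g(2)=6 != 5.
assert all(g(n) != fs[n](n) for n in range(len(fs)))
```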

In this paper, Alex Nabutovsky found what I think is a great example of such a function from geometry. Details below.

For any , let be the -dimensional unit sphere . For all there is an “equatorial” embedding of into sending to . This is certainly the nicest way of embedding into but there are other ways.

If is an embedding of into then let its *amount of wiggle room* be the maximum amount that ‘s image can be thickened before it intersects itself. More precisely, it is the maximum such that , where , and are in the image of , is the unit normal to the image of at and is the unit normal to the image of at . We let ‘s *crumbledness* be the reciprocal of its amount of wiggle room.

It is known (by a theorem of Stephen Smale) that any embedding of into can be isotoped to the equatorial embedding (up to reparameterization), but you may have to increase the crumbledness to do it. Nabutovsky proves that, for any dimension and crumbledness , there is an such that any embedding of crumbledness can be isotoped to the equatorial embedding going through only embeddings of crumbledness . We lose nothing in terms of complexity by considering .

Nabutovsky shows that for any , any satisfying the above condition is uncomputable, and, furthermore, the minimum which works grows like the busy beaver function!

The proof depends on the fact that for , it is undecidable if a compact manifold (presented either as a simplicial complex, or as a zero-set of a polynomial with rational coefficients, or some such representation that a computer can handle) embedded in is diffeomorphic to . (For a good summary of these types of undecidability results in group theory and topology, see Section 3 of Bob Soare’s Computability and Differential Geometry.) Essentially, the idea is that if were computable, you could decide if a manifold were homeomorphic to by embedding it in , measuring its crumbledness (say as ), then checking all possible isotopies of the manifold through embeddings of crumbledness . The fact that the manifold’s crumbledness can be measured and that all possible isotopies going through embeddings of bounded crumbledness can be checked computably is related to the fact that often you can computably search over compact spaces, as I wrote about in this post.

If you can get a hold of a copy, I highly recommend Shmuel Weinberger’s book Computers, Rigidity, and Moduli, where he talks about this and other related results in a very lively and engaging manner.

Edit: Fixed some notation.

]]>

A *symplectic structure* on a manifold is a differential 2-form satisfying two conditions:

- is *non-degenerate*, i.e. for each and tangent vector based at , if for all tangent vectors based at , then is the zero vector;
- is *closed*, i.e. the exterior derivative of is zero, i.e. .

In trying to come up with answers to questions like “what do you do?” and “what is symplectic geometry?” that would be accessible to an advanced undergraduate or beginning graduate student, I’ve tried to come up with fairly intuitive descriptions of what the two symplectic structure conditions really mean.

Non-degeneracy is pretty easy, because my intended audience is certainly familiar with the dot product in Euclidean space, and probably familiar with more general machinery like inner products and bilinear forms. A *bilinear form* on a vector space over a field is just an assignment of a number in to each pair of vectors, in such a way that the assignment is linear in each vector in the pair. A bilinear form is called *non-degenerate* if the only thing that pairs to zero with every single vector is the zero vector. A 2-form on is a collection of skew-symmetric bilinear forms, one for each tangent space of . Saying that is non-degenerate is saying that each of these bilinear forms is non-degenerate.

It’s much less clear how to describe to the uninitiated what the closed condition means. It’s even a bit unclear why this condition is required in the first place. A pretty nice answer came up yesterday, in a reading group I attend that is trying to learn about generalized complex structure. We are going through the PhD thesis of Marco Gualtieri, titled “Generalized Complex Geometry”. It is available at the following websites:

http://front.math.ucdavis.edu/0401.5221

http://front.math.ucdavis.edu/0703.5298

This was the first meeting, and Tomoo Matsumura was the speaker. He suggested that the requirement that is an *integrability condition*. I had never thought of it this way, but I probably will from now on.

**Almost complex and complex structures**

Let me first describe what integrability means for an almost complex structure on a manifold. A *complex structure on a vector space* V, where V is real and finite-dimensional, is a linear endomorphism J : V → V such that J² = −Id. Taking the determinant of both sides, we have (det J)² = (−1)^dim V. Since (det J)² ≥ 0, we must have (−1)^dim V = 1, so dim V must be even. Furthermore, since (det J)² = 1, we know det J ≠ 0, so J is a linear automorphism. The complex structure makes V into a complex vector space, by setting

(a + bi) · v = a v + b J(v)

for a + bi ∈ ℂ and v ∈ V.

The standard example is with its usual ordered basis, labelled , and complex structure defined by and for all . Putting , we obtain the usual ordered basis for with its usual complex structure.
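The standard complex structure can be checked concretely. This sketch uses one common convention for J (an assumption on my part) and verifies J² = −Id with plain list-based matrices:

```python
# Standard complex structure on R^(2n), under the convention
# J(e_{2k}) = e_{2k+1} and J(e_{2k+1}) = -e_{2k}.
def standard_J(n):
    m = [[0] * (2 * n) for _ in range(2 * n)]
    for k in range(n):
        m[2 * k + 1][2 * k] = 1   # J(e_{2k}) = e_{2k+1}
        m[2 * k][2 * k + 1] = -1  # J(e_{2k+1}) = -e_{2k}
    return m

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

J = standard_J(2)
minus_id = [[-1 if i == j else 0 for j in range(4)] for i in range(4)]
assert matmul(J, J) == minus_id  # J squared is minus the identity
```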

Let be a -dimensional manifold. An *almost complex structure* on is a smoothly-varying collection of complex structures, one for each tangent space of . (The existence of an almost complex structure forces to be even.) An almost complex structure on is just a bunch of complex structures on the tangent spaces glued together smoothly along . Recall that as a manifold, all tangent spaces to a vector space can be canonically identified with the vector space itself, so a choice of complex structure on the vector space induces an almost complex structure on the vector space as a manifold.

An almost complex structure on is called a *complex structure* if the complex structures on the vector spaces fit together in an even nicer way. We require that there be a covering of by coordinate neighborhoods such that on each such neighborhood is the pullback of the standard complex structure on . We require also that all transition maps for these coordinate charts be holomorphic with respect to the standard complex structure. This collection of coordinate charts forms a complex atlas for , and gives the structure of a complex manifold. (Notice that it’s easy to choose coordinates so that a single looks like the standard one, . We require that this hold not just at a single point, but in an entire neighborhood of the point.)

An almost complex structure is called *integrable* if it is actually a complex structure. There are many integrability conditions for almost complex structures, such as the vanishing of the *Nijenhuis tensor* associated to an almost complex structure.

**Almost symplectic and symplectic structures**

Now we give a parallel discussion for symplectic structures. A *symplectic structure on a vector space* V, where V is real and finite-dimensional, is a non-degenerate, skew-symmetric bilinear form ω on V. Choose a basis for V and represent ω by a matrix A relative to this basis. Because ω is non-degenerate we know det A ≠ 0, and because it is skew-symmetric we know that Aᵀ = −A, so det A = det(−A) = (−1)^dim V · det A. Hence (−1)^dim V = 1, so dim V must be even.
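The parity argument can be checked numerically: for a skew-symmetric matrix A, det A = det(Aᵀ) = det(−A) = (−1)ⁿ det A, so in odd dimension the determinant vanishes and the form is degenerate. A small sketch of my own with exact integer arithmetic:

```python
import random

# Determinant by cofactor expansion along the first row (fine for the
# tiny matrices used here).
def det(m):
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))

def random_skew(n):
    # Random integer skew-symmetric matrix: m[j][i] = -m[i][j].
    m = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            m[i][j] = random.randint(-5, 5)
            m[j][i] = -m[i][j]
    return m

assert all(det(random_skew(3)) == 0 for _ in range(20))  # odd: always degenerate
assert det([[0, 1], [-1, 0]]) == 1  # even: can be non-degenerate
```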

The standard example is with its usual ordered basis, labelled , and symplectic structure defined by and , where is the Kronecker delta.

Let be a -dimensional manifold. An *almost symplectic structure* on is a smoothly-varying collection of symplectic structures, one for each tangent space of . (The existence of an almost symplectic structure forces to be even.) An almost symplectic structure on is just a bunch of symplectic structures on the tangent spaces glued together smoothly along . As before, a choice of symplectic structure on a vector space induces an almost symplectic structure on the vector space as a manifold.

An almost symplectic structure on $M$ is called a *symplectic structure* if the symplectic structures on the vector spaces fit together in an even nicer way. Analogous to the complex structure case, we require that there be a covering of $M$ by coordinate neighborhoods such that on each such neighborhood $\omega$ is the pullback of the standard symplectic structure on $\mathbb{R}^{2n}$. We require also that all transition maps for these coordinate charts be symplectic with respect to the standard symplectic structure. A manifold with a symplectic structure is a symplectic manifold.

Not much seems to be said about almost symplectic structures on manifolds, and so even less is said about *integrable* almost symplectic structures. But if one were to say something about them, surely the first thing would be to notice that, by Darboux’s Theorem, there is an extremely simple integrability condition. This is exactly the requirement that $\omega$ be closed, i.e. that $d\omega = 0$.

**Summary**

To summarize, every manifold is locally isomorphic to some $\mathbb{R}^n$. An almost complex manifold is one equipped with a smoothly varying collection of complex structures on its tangent spaces. An almost complex manifold is a complex manifold if it is locally isomorphic to some $\mathbb{C}^n$ with its standard complex structure. In this case, the almost complex structure is called integrable. Every previous sentence in this paragraph holds with the word “complex” replaced with “symplectic”. There are many well-known conditions for an almost complex structure to be integrable. To the best of my knowledge, there is really only one well-known condition for an almost symplectic structure to be integrable, and this is the innocuous looking requirement that $d\omega = 0$.

]]>

]]>

This first one is Euler’s original argument for the equality of two expressions (both of which happen to define $e^x$):

I’ll also sketch how this can be made rigorous in non-standard analysis.

The argument is as follows: The limit $\lim_{n \to \infty} (1 + x/n)^n$ is equal to $(1 + x/N)^N$, where $N$ is infinitely large. By the binomial theorem, this is:

$\sum_{k=0}^{N} \binom{N}{k} \frac{x^k}{N^k}$

Since $\binom{N}{k}$ is $\frac{N(N-1)\cdots(N-k+1)}{k!}$, this is the sum as $k$ ranges from $0$ to $N$ of:

$\frac{N(N-1)\cdots(N-k+1)}{N^k} \cdot \frac{x^k}{k!}$

Now, if $k$ is infinitely large, this term is so small that it may be neglected. On the other hand, if $k$ is finite, then $\frac{N-i}{N} \approx 1$ for $i < k$. Therefore

$\frac{N(N-1)\cdots(N-k+1)}{N^k} \cdot \frac{x^k}{k!} \approx \frac{x^k}{k!}$

and the whole sum is equal to

$\sum_{k} \frac{x^k}{k!}$,

as desired.
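The finite-$k$ step of the argument is easy to see numerically: for a large (finite) $N$, $(1 + x/N)^N$ is already close to the partial sums of $\sum_k x^k/k!$. A quick check (my own, in Python):

```python
import math

# Compare (1 + x/N)^N with the partial sums of x^k/k! for a large N.
# The binomial coefficients N(N-1)...(N-k+1)/N^k approach 1 for each
# fixed k, which is exactly the finite-k step of Euler's argument.
x = 1.7
N = 10**6
lhs = (1 + x / N) ** N
rhs = sum(x**k / math.factorial(k) for k in range(50))

assert abs(rhs - math.exp(x)) < 1e-12   # the series nails e^x
assert abs(lhs - math.exp(x)) < 1e-4    # the limit expression: error O(1/N)
```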

Now, I’ll sketch how to make this rigorous in non-standard analysis. This is from Higher Trigonometry, Hyperreal Numbers, and Euler’s Analysis of Infinities by Mark McKinzie and Curtis Tuckey, which is the best introductory article on non-standard analysis that I’ve read.

In non-standard analysis, one extends the real numbers $\mathbb{R}$ to a larger field $\mathbb{R}^*$ which contains all the reals, but also a positive $\varepsilon$ which is less than every positive real (and hence also a number $1/\varepsilon$ which is greater than every real). For every function $f \colon \mathbb{R} \to \mathbb{R}$, there is a function $f^* \colon \mathbb{R}^* \to \mathbb{R}^*$, and the $f^*$’s satisfy all the same identities and inequalities formed out of composition that the $f$’s do. (For example, $\sin^*(x)^2 + \cos^*(x)^2 = 1$ for all hyperreal $x$.) For that reason, I’ll often omit the $*$. The range of the extended floor function $\lfloor \cdot \rfloor^*$ is called $\mathbb{Z}^*$, the hyperintegers. Since every real is less than some integer, the same is true in the hyperreals and there are therefore infinite hyperintegers.

We call a nonzero hyperreal $x$ infinitesimal if $|x|$ is less than every positive real. We say that $x$ and $y$ are close (written $x \simeq y$) if $x - y$ is infinitesimal. We say that $x$ is infinite if $1/x$ is infinitesimal (equivalently, if $|x|$ is greater than every real). We say that $x$ is finite if it’s not infinite (equivalently, if $|x|$ is less than some real).

Let $(a_n)$ be a sequence of hyperreals. We say that it is determinate if $\sum_{n=0}^{M} a_n \simeq \sum_{n=0}^{N} a_n$ whenever $M$ and $N$ are infinite.

The Summation Theorem can then be proven: If $(a_n)$ and $(b_n)$ are two determinate sequences such that $a_n \simeq b_n$ for all finite $n$, then $\sum_{n=0}^{N} a_n \simeq \sum_{n=0}^{N} b_n$ for all infinite $N$.

By appropriately replacing “equals” with “is close to”, Euler’s argument above may now be adapted to prove that for all infinite $N$ and finite $x$,

$(1 + x/N)^N \simeq \sum_{k=0}^{N} \frac{x^k}{k!}$

(the sequence may be proved determinate by comparison with the geometric sequence, which is easily shown determinate). By a transfer principle, this may in turn be used to prove that $\lim_{n \to \infty} (1 + x/n)^n = \sum_{k=0}^{\infty} \frac{x^k}{k!}$ (in the regular reals).

]]>

To get to the Others’ side.

(Composed by Tim Goldberg.)

]]>

It is known that there is no algorithm which will decide whether an arbitrary arithmetic statement is true. This shouldn’t be surprising, since if there were such an algorithm, it would be able to automatically prove Fermat’s Last Theorem, settle Goldbach’s Conjecture and the Twin Prime Conjecture, etc.

However, if we call a quasi-arithmetic statement one which uses the quantifiers “for all but finitely many $n$” (denoted “$\forall^\infty n$”) and “there exist infinitely many $n$” (denoted “$\exists^\infty n$”) instead of “$\forall n$” and “$\exists n$”, then we *do* have an algorithm for deciding whether a quasi-arithmetic statement is true or not!

This was shown by David Marker and Ted Slaman in this note. The proof goes as follows.

First, observe that “$\exists^\infty n\, \varphi(n)$” is equivalent to “$\neg \forall^\infty n\, \neg \varphi(n)$”, so that we can eliminate all occurrences of $\exists^\infty$.

Next, note that “$\forall^\infty n\, \varphi(n)$” is equivalent to “$\exists m\, \forall n > m\, \varphi(n)$.” Thus, we can replace $\forall^\infty$ with $Q$, where $Q$ is defined to be the quantifier $\exists m\, \forall n > m$.

Now, prove that all statements involving only the quantifier $Q$ are true in $\mathbb{N}$ iff they are true in $\mathbb{R}$. This is proved by induction on the structure of the formulas. The crucial step is the following: If $Qn\, \varphi(n)$ holds in $\mathbb{N}$, then $\varphi(n)$ is true for all sufficiently large natural numbers. However, a subset of $\mathbb{R}$ defined only by quantifiers over $\mathbb{R}$ is a semialgebraic set, and it is known that all semialgebraic subsets of $\mathbb{R}$ are finite unions of points and intervals. Therefore, if all sufficiently large natural numbers are in some semialgebraic set, then all sufficiently large real numbers must also be in that set.

So, we have reduced the problem to that of deciding whether or not sentences involving the quantifier $Q$ are true over $\mathbb{R}$. But, by a result of Tarski’s, there is an algorithm which will decide whether or not statements using the quantifiers $\forall$ and $\exists$ are true over $\mathbb{R}$, and $Q$ can be defined in terms of $\forall$ and $\exists$.

How does Tarski’s proof work? The first step is to observe that deciding quantifier-free statements is easy, since it’s just a computation. So, the second step is to systematically eliminate quantifiers from statements. One instance of quantifier elimination familiar to everyone is: “$\exists x\, (ax^2 + bx + c = 0)$” (with $a \neq 0$) is equivalent to “$b^2 - 4ac \geq 0$.” This follows from the quadratic formula. Sturm’s theorem is a generalization of this test which will tell you how many distinct real roots any polynomial has, and Tarski’s theorem is a generalization of Sturm’s theorem.
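Sturm’s test is short enough to implement directly. Here is a pure-Python sketch (my own; coefficient lists are highest-degree-first, and the polynomial is assumed squarefree), with exact rational arithmetic, showing how the quadratic case recovers the discriminant test:

```python
from fractions import Fraction

def eval_poly(p, v):
    """Evaluate a polynomial (coefficients highest degree first) at v."""
    r = Fraction(0)
    for c in p:
        r = r * v + c
    return r

def deriv(p):
    n = len(p) - 1
    return [c * (n - i) for i, c in enumerate(p[:-1])]

def rem(a, b):
    """Remainder of polynomial long division a / b."""
    a = [Fraction(c) for c in a]
    b = [Fraction(c) for c in b]
    while len(a) >= len(b):
        if a[0] == 0:
            a.pop(0)
            continue
        q = a[0] / b[0]
        for i in range(len(b)):
            a[i] -= q * b[i]
        a.pop(0)
    return a or [Fraction(0)]

def sturm_chain(p):
    """p, p', then successive negated remainders, down to a constant."""
    chain = [[Fraction(c) for c in p], deriv([Fraction(c) for c in p])]
    while len(chain[-1]) > 1 and any(chain[-1]):
        chain.append([-c for c in rem(chain[-2], chain[-1])])
    return chain

def sign_changes(chain, v):
    signs = [s for s in ((x > 0) - (x < 0)
                         for x in (eval_poly(q, v) for q in chain)) if s]
    return sum(1 for s, t in zip(signs, signs[1:]) if s != t)

def num_real_roots(p, a, b):
    """Distinct real roots of a squarefree p in (a, b], by Sturm's theorem."""
    chain = sturm_chain(p)
    return sign_changes(chain, a) - sign_changes(chain, b)

# The quadratic case recovers the discriminant test b^2 - 4ac >= 0:
assert num_real_roots([1, 0, -2], -100, 100) == 2     # x^2 - 2: disc > 0
assert num_real_roots([1, 0, 1], -100, 100) == 0      # x^2 + 1: disc < 0
assert num_real_roots([1, 0, -3, 1], -100, 100) == 3  # x^3 - 3x + 1
```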

For information on practical algorithms for quantifier elimination over the reals see Algorithms in Real Algebraic Geometry.

]]>

You and Bob are going to play a game which has the following steps.

- Bob thinks of some function $f \colon \mathbb{R} \to \mathbb{R}$ (it’s arbitrary: it doesn’t have to be continuous or anything).
- You pick an $x \in \mathbb{R}$.
- Bob reveals to you the table of values of his function on every input except the one you specified.
- You guess the value of Bob’s secret function on the number that you picked in step 2.
You win if you guess right, you lose if you guess wrong. What’s the best strategy you have?

This initially seems completely hopeless: the values of $f$ on inputs other than $x$ have nothing to do with the value of $f$ on input $x$, so how could you do any better than just making a wild guess?

In fact, it turns out that if you, say, choose $x$ in Step 2 with uniform probability from $[0, 1]$, the axiom of choice implies that you have a strategy such that, whatever $f$ Bob picked, you will win the game with probability 1!

The strategy is as follows: Let $\sim$ be the equivalence relation on functions from $\mathbb{R}$ to $\mathbb{R}$ defined by $f \sim g$ iff for all but finitely many $y$, $f(y) = g(y)$. Using the axiom of choice, pick a representative from each equivalence class.

In Step 2, choose $x$ with uniform probability from $[0, 1]$. When, in step 3, Bob reveals the values of $f$ away from $x$, you know what equivalence class $f$ is in, because you know its values at all but one point. Let $g$ be the representative of that equivalence class that you picked ahead of time. Now, in step 4, guess that $f(x)$ is equal to $g(x)$.

What is the probability of success of this strategy? Well, whatever $f$ Bob picks, the representative of its equivalence class will differ from it in only finitely many places. You will win the game if, in Step 2, you pick any number besides one of those finitely many numbers. Thus, you win with probability 1 no matter what function Bob selects.
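The axiom of choice can’t be run on a computer, but a finite analogue of the strategy can: if the representative differs from Bob’s function in at most $k$ of $N$ points, guessing the representative at a uniformly random point wins with probability $1 - k/N$. A Monte Carlo sketch (parameters are arbitrary, my own illustration):

```python
import random

random.seed(2024)

# Finite stand-in for the strategy: Bob's function on {0, ..., N-1} is known
# to differ from a fixed, public "representative" g in at most k places.
# Guessing g at a uniformly random point loses only at those k places.
N, k, trials = 10_000, 5, 20_000
g = [0.0] * N                             # the agreed-upon representative
f = g[:]
for i in random.sample(range(N), k):      # Bob's f differs from g at k points
    f[i] = 1.0

wins = sum(f[x] == g[x] for x in (random.randrange(N) for _ in range(trials)))
win_rate = wins / trials
assert win_rate >= 1 - 4 * k / N          # expected win rate is 1 - k/N = 0.9995
```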

This puzzle originally had the following form:

Suppose that there are countably infinitely many prisoners: Prisoner 1, Prisoner 2, etc., arranged so that Prisoner $n$ can see Prisoner $m$ iff $n < m$.

A warden puts either a red hat or a blue hat on each prisoner’s head, and asks each to guess the color of the hat on his own head. Prove that the prisoners have a strategy of coordinating their guesses so that only finitely many of them will be wrong.

As before, let $\sim$ be the equivalence relation on functions $f \colon \mathbb{N} \to \{\text{red}, \text{blue}\}$ defined by $f \sim g$ iff $f$ and $g$ differ in only finitely many places. The prisoners’ strategy will then be: Beforehand, pick a representative from each equivalence class. Let $f(n)$ be the color of the hat on Prisoner $n$‘s head. Then, since each Prisoner $n$ can see the color of the hats on Prisoner $m$ for $m > n$, each prisoner knows which equivalence class $f$ is in. Suppose $g$ is the representative that they picked beforehand. Then, for each $n$, Prisoner $n$ will guess that he’s wearing hat $g(n)$, and since $f \sim g$, only finitely many of them will be wrong.

For some interesting comments on this puzzle, see Greg Muller’s blog post on it here and Chris Hardin and Alan Taylor’s paper An Introduction to Infinite Hat Problems.

After hearing this puzzle, Chris Hardin came up with a great generalization. Instead of having a Prisoner $n$ for each $n \in \mathbb{N}$ and declaring that Prisoner $n$ can see Prisoner $m$ iff $n < m$, let $P$ be an arbitrary partial order and declare that for each $a \in P$, there is a Prisoner $a$, and that Prisoner $a$ can see Prisoner $b$ iff $a < b$. Assuming again that red and blue hats are placed on all prisoners and that they must all guess the color of the hat on their own head, how many of them will be able to guess correctly?

Call a partially ordered set reverse well-founded if there are no infinite strictly increasing chains in it. Chris Hardin and Alan Taylor showed in their paper A Peculiar Connection Between the Axiom of Choice and Predicting the Future that the prisoners have a strategy so that the set of prisoners who are wrong will be reverse well-founded. In the case of the original prisoners problem, this implies that there will be only finitely many prisoners who are wrong, since there are no infinite reverse well-founded subsets of $\mathbb{N}$.

Suppose that there is a Prisoner $x$ for each $x \in \mathbb{R}$ and that Prisoner $x$ can see Prisoner $y$ iff $x < y$. Then, since all reverse well-founded subsets of $\mathbb{R}$ are countable, at most countably many prisoners will be wrong under the Hardin-Taylor strategy. Since all countable subsets of $[0, 1]$ have measure zero, this gives another way to win the game against Bob with probability one.

In fact, it implies that you can do more: You don’t need Bob to tell you $f$ on all of $\mathbb{R} \setminus \{x\}$, just $f$ on $(-\infty, x)$. Hardin and Taylor express this by imagining that we represent the weather with respect to time as an arbitrary function $f \colon \mathbb{R} \to \mathbb{R}$. Then, given that we can observe the past, there is an almost perfect weatherman who can predict the current weather with probability 1. They further show that the weatherman can almost surely get the weather right for some interval into the future.

What is the Hardin-Taylor strategy? The prisoners first choose a well-ordering $\prec$ of the set of functions from $P$ to $\{\text{red}, \text{blue}\}$ (this uses the axiom of choice), and then for each $a$, Prisoner $a$ simply guesses that his hat color is $g(a)$, where $g$ is the $\prec$-least function consistent with what Prisoner $a$ sees.

Now, suppose that there is a strictly increasing sequence $a_1 < a_2 < a_3 < \cdots$ of prisoners who are all wrong, and let $g_{a_n}$ be the $\prec$-least function consistent with what Prisoner $a_n$ sees. Since each Prisoner $a_n$ sees all the prisoners that Prisoner $a_m$ for $m > n$ sees, we must have that $g_{a_1} \succeq g_{a_2} \succeq \cdots$. In fact, $g_{a_n} \neq g_{a_{n+1}}$ for each $n$ (since by assumption Prisoner $a_{n+1}$ was wrong about his hat color, whereas Prisoner $a_n$ will be right about it, since he can see Prisoner $a_{n+1}$), so we have that $g_{a_1} \succ g_{a_2} \succ \cdots$, but this contradicts the fact that $\prec$ is a well-ordering.

]]>

The answer is “no” and the reason is that the devil could do the following: Think of the bills you have at the start as being numbered 1, 3, 5, etc. and imagine that the devil has an initial pile of bills numbered 2, 4, 6, etc. Then on the *n*th transaction, the devil gives you the two lowest-numbered bills from his initial pile and takes bill *n* from you (one can easily show that you have bill *n* in your possession at this point). Since the devil takes bill *n* from you on the *n*th transaction, he gets all the bills in the end and you end up with nothing.

So, even though you start with infinitely many bills and each transaction produces a net gain of one bill for you, after all the transactions are done you have nothing.
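A finite-horizon simulation makes the devil’s bookkeeping concrete (the `horizon` cutoff is just an artifact of simulating an infinite pile; this is my own illustration): after $n$ transactions, every bill numbered at most $n$ is gone, so the smallest bill you still hold grows without bound, even though your pile grows at every step.

```python
def your_bills_after(n, horizon):
    """Bills (numbered below `horizon`) you hold after n transactions."""
    yours = set(range(1, horizon, 2))      # you start with 1, 3, 5, ...
    devil = sorted(range(2, horizon, 2))   # devil starts with 2, 4, 6, ...
    for t in range(1, n + 1):
        yours.add(devil.pop(0))            # devil hands over his two
        yours.add(devil.pop(0))            # lowest-numbered bills...
        yours.discard(t)                   # ...and takes bill t from you
    return yours

bills = your_bills_after(100, horizon=10_000)
assert min(bills) == 101        # every bill numbered <= 100 is gone
assert len(bills) == 5100       # yet your pile grew at every step
```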

In that puzzle, the devil was able to use a tricky strategy to give you more than he took at each stage and still end up with everything. In the following puzzle, which made the rounds when I was a graduate student, *no matter what the devil does*, he takes everything from you!

You and the devil are taking a train ride together. The train stops at each ordinal. At stop 0, you have countably infinitely many dollar bills. At each stop, the devil does the following two things (in order):

- If you have a nonzero number of dollar bills, the devil takes one and destroys it.
- The devil gives you countably infinitely many dollar bills.
Prove that no matter what the devil does, when the train reaches stop $\omega_1$ (the first uncountable ordinal), you will have no money.

Solution below.

For each stop which is before , let be the first stop by which all the dollar bills which you had at stop which will be destroyed by the devil before stop have already been destroyed by the devil.

Since you have only countably many dollars, is a supremum of countably many countable ordinals, and is therefore itself countable.

Now, let . As a supremum of countably many countable ordinals, is itself countable.

*Lemma*. For any , at stop , you have no money.

*Proof*. Suppose that you have at least one dollar bill at stop . Then one of your dollar bills will be destroyed by the devil at that stop. Let that be dollar bill . Then you must have had at all stops for sufficiently large . In particular, there is an ordinal such that you had both at stop and stop . But, by the definition of , this is only possible if is not destroyed at any countable ordinal. But is destroyed at stop , which is a contradiction.

*Corollary. *At stop , you have no money.

*Proof*. Suppose not. Any dollar bill that you have at stop $\omega_1$ you must have had at some countable ordinal. But then, by the above lemma, that dollar was destroyed by a countable stop, which is a contradiction.

]]>

The obvious thing to do is to compute a Riemann sum $\frac{1}{n} \sum_{k=0}^{n-1} f(k/n)$ for some large $n$. However, this could be arbitrarily far from the true value of $\int_0^1 f$. For example, $f$ might be 0 at each sample point $k/n$, but might curve sharply up in between each $k/n$ and $(k+1)/n$ so that its definite integral is arbitrarily close to 1.

However, since $f$ is continuous on $[0, 1]$, it is uniformly continuous. This means that for all $\varepsilon > 0$ there is a $\delta > 0$ such that whenever $|x - y| < \delta$, $|f(x) - f(y)| < \varepsilon$. If we could compute a function $d$ such that for all $\varepsilon > 0$, whenever $|x - y| < d(\varepsilon)$, $|f(x) - f(y)| < \varepsilon$, then we could compute the definite integral of $f$ with arbitrary accuracy: If we want to know $\int_0^1 f$ to within 0.001, then choose $n$ so that $1/n < d(0.001)$ and we can take the Riemann sum $\frac{1}{n} \sum_{k=0}^{n-1} f(k/n)$, since each point of $[0, 1]$ is within $d(0.001)$ of a sample point.
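Here is a sketch (my own, in Python) of this scheme, handed a modulus of uniform continuity $d$. Since $\sin$ is 1-Lipschitz, $d(\varepsilon) = \varepsilon$ works as its modulus:

```python
import math

# If d is a modulus of uniform continuity for f on [0, 1] -- that is,
# |x - y| < d(eps) implies |f(x) - f(y)| < eps -- then a Riemann sum on a
# grid finer than d(eps) is within eps of the true integral.
def integrate(f, d, eps):
    n = int(1 / d(eps)) + 1                 # grid spacing 1/n < d(eps)
    return sum(f(k / n) for k in range(n)) / n

f = math.sin                                # 1-Lipschitz, so d(eps) = eps
approx = integrate(f, lambda e: e, 0.001)
exact = 1 - math.cos(1)                     # integral of sin over [0, 1]
assert abs(approx - exact) < 0.001
```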

So, one answer to the question “What extra information do we need in order to compute $\int_0^1 f$?” is: a function witnessing $f$’s uniform continuity.

In 1998, Alex Simpson showed, building on ideas of Ulrich Berger and others, that another answer is: Nothing!

For technical convenience, we switch from considering functions with domain and range to functions with domain and range , and try to compute the definite integral of such a function over , since that value will again be in .

For our representation of the reals in $[-1, 1]$ we will use functions from $\mathbb{N}$ to $\{-1, 0, 1\}$, where we are thinking of $a$ as representing $\sum_{i=0}^{\infty} a(i) 2^{-(i+1)}$. All reals in $[-1, 1]$ have representations. Some have more than one, but that won’t concern us.

We will use $[-1, 1]$ to denote both the interval of reals from $-1$ to $1$ and the set of its representations. Hopefully this will not cause confusion.

So a computable function from $[-1, 1]$ to $[-1, 1]$ is a computable second-order functional which takes a function from $\mathbb{N}$ to $\{-1, 0, 1\}$ and returns a function from $\mathbb{N}$ to $\{-1, 0, 1\}$. (There are some technicalities about the fact that a real can have more than one representation, but we will ignore them.)

Similarly, a *predicate* on $[-1, 1]$ is a function from $[-1, 1]$ to $\{\text{true}, \text{false}\}$. An amazing fact due to Ulrich Berger is that for any computable predicate $P$ we can decide whether or not there is an $x$ such that $P(x)$ holds. (To see why this is amazing, observe that the analogous statement for computable predicates on the integers is false, due to the Halting Problem.)

What’s more, we can define computable functionals $E$ and $\varepsilon$ so that, if $P$ is a computable predicate, $E(P)$ returns true or false depending on whether or not some $x$ satisfies $P$, and $\varepsilon(P)$ returns a number which satisfies $P$ if anything does.

Before we define those functions, I’ll give some notation. If is a function from to , then we let be defined by and for . We define and similarly.

Now define and by mutual recursion as follows: Let and let do the following: it uses checks to see if there is an such that and holds and, if so, it returns , where . If not, it uses to check if there is an such that and holds and if so, it returns , where . If both of those fail, it returns , where .

If you think about the above definitions for a while, it should probably become clear to you that they should work, *if *they terminate. But they seem way too crazy to terminate. In fact they do, given one condition which I’ll remark on below, and the crucial fact is that since is computable and total, when given an , it only looks at finitely many values before terminating.

The condition that allows them to terminate is that we specify that the evaluation is *lazy*. That is, when I say for example that returns , I mean that it returns the function which on input returns and *only on non-zero input* calculates to determine its return value for that input. So, it only calls when it needs to. Since every call to involves a recursive call to , if you followed all of them immediately, the program could not possibly terminate.
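The lazy-evaluation trick can be sketched in Python in the simpler setting of predicates on infinite binary sequences (a toy analogue of the interval; the names `cons`, `find`, and `exists` are my own, following Escardó’s exposition rather than the paper’s notation):

```python
def cons(b, rest_thunk):
    """The sequence starting with bit b; the tail is computed lazily."""
    memo = {}
    def s(n):
        if n == 0:
            return b
        if 'rest' not in memo:
            memo['rest'] = rest_thunk()    # force the tail only on demand
        return memo['rest'](n - 1)
    return s

def find(p):
    """Return a binary sequence satisfying p, if any does.

    Works whenever p is total, i.e. inspects only finitely many bits of
    its argument -- laziness is what keeps the recursion from diverging.
    """
    left = cons(0, lambda: find(lambda s: p(cons(0, lambda: s))))
    if p(left):
        return left
    return cons(1, lambda: find(lambda s: p(cons(1, lambda: s))))

def exists(p):
    return p(find(p))

# Is there a sequence whose bits at positions 3 and 5 are both 1?
p = lambda s: s(3) == 1 and s(5) == 1
assert exists(p)
witness = find(p)
assert witness(3) == 1 and witness(5) == 1

# No sequence has a bit equal to 2, and the search still terminates.
assert not exists(lambda s: s(0) == 2)
```

Note how every recursive call to `find` is wrapped in a thunk, exactly as the lazy-evaluation condition in the text demands: unwrapping them eagerly would loop forever.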

Note also that we can use to define a as well.

Now let’s use this to compute the definite integral. Observe that if , then , if , then , and if , then . Conversely, every real in has a representation such that , and similarly for the other two cases.

Therefore, if is such that for all , it must be the case that , and if we are trying to compute a representative for , we may safely say that , and similarly for the other two cases. Furthermore, if for all . Then .

Next, observe that, for any , by stretching it out and splitting it up, we may find and such that , and the range of on is equal to the range of on and the range of on is equal to the range of on

Therefore, the following definition for (almost) works: Given , first use to check if for all . If so, return . If not, check if for all or if for all and do the analogous thing. Otherwise, return .

Since is continuous, the process of splitting it up will eventually terminate, as will the algorithm.

Actually this algorithm doesn’t quite work, as there are a couple of details that I’ve omitted, but it is essentially correct. For more information, see Alex Simpson’s paper, and for more on Berger’s work, see Martin Escardó’s blog post, both of which include actual runnable code.

]]>

For every *n*, *T* proves *P*(*n*)

and

*T* does not prove that for all *n*, *P*(*n*)

To see that these statements are not actually contradictory, replace *T* with a person Bob and observe that it’s certainly possible that

For every *a*, *b*, *c* > 1, and *n* > 2, Bob can verify that *a*^{n} + *b*^{n} ≠ *c*^{n}.

is true but that

Bob can verify that for every *a*, *b*, *c* > 1, and *n* > 2, *a*^{n} + *b*^{n} ≠ *c*^{n}.

is false, since for the first statement it suffices for Bob to be able to remember grade school arithmetic, whereas for the second statement to be true Bob must be able to prove Fermat’s Last Theorem.

If it’s the case that there is a property *P*(*n*) such that for all *n*, *T* proves *P*(*n*) but it is not the case that *T* proves for all *n*, *P*(*n*), then we could actually add “There exists an *n* such that *~P*(*n*)” to *T* and still have a consistent theory. Thus, we might be in the even stranger situation where

For every *n*, *T* proves *P*(*n*),

but

*T* proves that there exists an *n* such that ~*P*(*n*)

even though *T* is consistent. Such a *T* is called ω-inconsistent. (*T* is called ω-consistent if it is not ω-inconsistent.)

On the other hand, note that it’s *not* really possible that

Bob cannot verify that there exist *a*, *b*, *c* > 1 such that *a*^{2} + *b*^{2} = *c*^{2}.

given that there are such *a*, *b*, and *c*, since we could just tell Bob an example and have him use his knowledge of grade-school arithmetic to verify it.

Similarly, for all the theories *T* we will consider, if there exists an *n* such that *T* proves *P*(*n*), then *T* proves that there exists an *n* such that *P*(*n*).

Now consider the following argument, which is a rough version of Gödel’s original argument for the first incompleteness theorem:

Suppose that *T* is a complete theory (meaning that for every sentence *S*, either *T* proves *S* or *T* proves ~*S*.) Let *G* be the sentence “*G* is not provable in *T*”, which it turns out we can interpret in the language of number theory. Suppose that *G* is provable in *T*. Then there exists a proof of *G* in *T*, so *T* can prove “There exists a proof of *G* in *T*.” But this is the same as ~*G*. So, since *T* can prove both *G* and ~*G*, it is inconsistent.

Now suppose that *G* is not provable in *T*. If *T* was ω-consistent, then since *T* is also complete, we would know that *T* proved “There is no proof of *G* in *T*.” But this is equivalent to *G*. So, since *T* can prove both *G* and ~*G*, it is inconsistent.

So, what we have proven is that if *T* is ω-consistent, then it is incomplete. But Gödel’s First Incompleteness Theorem is usually stated as: If *T* is *consistent*, then *T* is incomplete. This is a stronger and cleaner statement. (Note that in both statements we are assuming that number theory can be expressed in *T*.)

J. Barkley Rosser was able to prove this stronger statement with a very clever change in the self-referential statement *G*.

What Rosser did was the following: Let *R* be the statement “For every proof of *R* in *T*, there is a shorter proof of ~*R*.” Let’s see how this gets around the issue of ω-inconsistency.

First suppose that *R* is provable in *T*, say by a proof of length *n*. If there actually is a shorter proof of ~*R* in *T*, then *T* is inconsistent. Otherwise, we can actually verify case-by-case that there is no proof of ~*R* in *T* of length < *n*. But then we have proven ~*R* in *T*, making it again inconsistent.

Now suppose that ~*R* is provable in *T*, say by a proof of length *n*. If *R* is provable in *T*, then *T* is inconsistent, so suppose that *R* is not provable in *T*. Then, we can as before verify case-by-case in *T* that *R* is not provable in *T* by a proof of length ≤ *n*. But this means that we can prove in *T* that for any proof of *R* in *T*, there is a shorter proof of ~*R*: we can prove in *T* that there is no proof of *R* of length ≤ *n* and we may argue in *T* that for any longer proof of *R* we can take the proof of ~*R* of length *n*. But this is exactly what *R* says, so *R* is provable in *T*, which is a contradiction.

]]>

]]>

As its name implies, the “Hardest Logic Puzzle Ever” has a number of complicating factors which will be irrelevant for this discussion. Instead, consider the following much simpler puzzle which will do just as well.

You are on an island populated by knights and knaves. Knights always tell the truth; knaves always lie. You meet an inhabitant of the island who you know is holding up either 1, 2, or 3 fingers behind his back. You don’t know if this inhabitant is a knight or a knave. By asking two yes-or-no questions, determine how many fingers the inhabitant has behind his back.

There are a number of ways to solve this. One solution follows from the observation that, for any question Q, if you ask an inhabitant of the island

If I asked you question Q, would you say “yes”?

then you will get a truthful response to question Q. A knight will tell the truth about his truthful response to Q, whereas a knave would lie about his false response to Q.

So one solution would be to ask

If I asked you if you were holding 1 finger up, would you say “yes”?

If the inhabitant says “yes”, then you know he is holding 1 finger up. On the other hand, if the inhabitant says “no”, then you ask “If I asked you if you were holding 2 fingers up, would you say ‘yes’?” If the inhabitant says “yes”, then he is holding 2 fingers up, otherwise he is holding 3 fingers up.

This works, and it seems like you can’t do any better: There are three possibilities for how many fingers the inhabitant could be holding up, and since you can only ask yes-or-no questions, you couldn’t determine which of the three possibilities holds with only one question. But this is exactly what Rabern and Rabern claim you *can* do.

Consider what would happen if you asked a knight, “Will you answer ‘no’ to this question?”. If he is bound to answer the question, then he is in trouble, because no matter whether he says “yes” or “no” he will have lied. Rabern and Rabern argue that in this sort of situation, the knight’s head would simply explode as there is nothing else he can do.

Assume that an inhabitant’s head explodes exactly when he cannot consistently answer “yes” or “no” to a question. Given this, we can solve the above problem by asking a single question. To make things simpler, suppose that we know that the inhabitant is a knight. Then we may ask:

Is it the case that you are holding up one finger, or (you are holding up two fingers iff you will answer “no” to this question)?

In this instance, if he answers “yes”, then he’s holding up one finger, if he answers “no”, he’s holding up three fingers, and if his head explodes, he was holding up two fingers.
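This can be checked by brute force, assuming a knight answers truthfully whenever a truthful answer exists and explodes when none does. (The check also shows that in the three-finger case *both* answers happen to be self-consistent; what makes the deduction work is that “no” is consistent *only* in the three-finger case. This is my own formalization, not from the paper.)

```python
# The question: "are you holding up one finger, or
# (are you holding up two fingers iff you will answer 'no')?"
def statement(fingers, answer_is_no):
    return fingers == 1 or ((fingers == 2) == answer_is_no)

def consistent_answers(fingers):
    # An answer is consistent if giving it makes the statement's truth
    # value match the answer ("yes" <-> true, "no" <-> false).
    out = set()
    if statement(fingers, answer_is_no=False):
        out.add('yes')
    if not statement(fingers, answer_is_no=True):
        out.add('no')
    return out

assert consistent_answers(1) == {'yes'}        # "yes" forced -> one finger
assert consistent_answers(2) == set()          # head explodes -> two fingers
assert 'no' in consistent_answers(3)           # only here is "no" consistent
assert 'no' not in consistent_answers(1)
```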

I think Rabern and Rabern’s argument is really clever, but I don’t think it goes far enough. If we can observe inhabitants’ heads exploding and reason based on it, we should be able to ask inhabitants about it. Consider what would happen if we asked a knight:

Is it the case that you will answer “no” to this question and that your head will not blow up upon hearing this question?

If the knight’s head does not blow up upon hearing the question, then he can neither truthfully answer “yes” nor answer “no.” Therefore his head blows up. But if his head blows up only when he can’t answer a question, there’s a problem because given that his head blew up, he could have consistently answered “no” to the question.

So what happens? Well it must be the case that God’s (or whoever decides whether or not to blow up a head) head blows up because God won’t be able to decide whether or not the knight’s head should blow up. Similarly, if we suppose that SuperGod is in charge of deciding whether or not God’s head blows up, SuperGod’s head is in danger of blowing up due to a clever self-referential question, and so forth. So we may actually extract an *unbounded* amount of information from a single yes-or-no question by choosing the question carefully and then observing how much of the universe is destroyed by our asking it.

We can eliminate this silliness a bit by seeing how this applies to the usual Liar’s Paradox. The Liar’s Paradox is

This sentence is false.

(Or “This sentence is not true,” but the two will be equivalent for our purposes.) If it was true, then it would be false, and if it was false, then it would be true. Many people say that this sentence simply doesn’t have a well-defined truth value.

But consider this sentence, which I call the Second-Level Liar’s Paradox:

This sentence is false and it has a well-defined truth-value.

We can’t say that this sentence doesn’t have a well-defined truth-value, since if it doesn’t, then it is unproblematically false! Similarly, we can’t say that it has a well-defined truth value, since in that case it reduces to the Liar’s Paradox and can be neither true nor false. So, the Second-Level Liar’s Paradox has no well-defined (well-defined truth-value or not)-value.

Of course, we can iterate this. It seems that we should have at least the following truth values: true, false, paradoxical_{0}, paradoxical_{1}, paradoxical_{2}, … and possibly more extending in to the ordinals depending on the expressive power of the language with respect to which sets of truth values it can refer to. (Here, a sentence is paradoxical_{i + 1} if it cannot consistently be paradoxical_{i}.)

I’ve made a lot of assumptions here which I’m sure could be challenged about how we should reason in murky, paradoxical situations. One issue which I’m not quite sure how to resolve is the question of why, given that a sentence being false implies that it has a well-defined truth value, the Liar Paradox and Second Level Liar Paradox are not equivalent. But my guess would be that this can be made to give some sort of interesting paraconsistent logic, and probably some interesting puzzles as well.

]]>

Later, I read the same problem in Krzysztof Ciesielski‘s excellent book Set Theory for the Working Mathematician. In that book Ciesielski gives an almost purely set-theoretic solution.

I’ll discuss both solutions below. (Don’t read on yet if you want to think about the puzzle first.)

Geometric Solution: First observe that you can partition a 2-sphere (e.g., ) minus two points into disjoint circles. The easiest way to see this is to see that you can do it if you remove the north and south poles from the sphere by taking the circles to be the lines of latitude. Then observe that you can drag the two holes at the two poles to any other two locations on the sphere you want and allow the circles to follow. (For example, say that the two holes are still on different hemispheres. Then the circles will still radiate out from the holes, but their centers will gradually become closer and closer to the north or south pole, depending on the hemisphere that the hole is in.)

Given that, the partition of is as follows: The first circles in the partition will be those in the -plane with center where and with radius 1. Now notice that every 2-sphere centered at the origin meets these circles in exactly two points. Thus we may complete the partition by taking a partition of each of these 2-spheres into circles separately.

There is another (probably much better) explanation of this solution at cut-the-knot.

Set-theoretic solution: Let $\kappa$ be the first ordinal of cardinality $2^{\aleph_0}$. Pick a well-ordering $\{x_\alpha\}_{\alpha < \kappa}$ of $\mathbb{R}^3$ of length $\kappa$. We will build the partition of $\mathbb{R}^3$ by transfinite recursion along $\kappa$. At each step $\alpha$, we will define a circle $C_\alpha$ such that $x_\alpha \in C_\alpha$ and $C_\alpha \cap C_\beta = \emptyset$ for $\beta < \alpha$ if $C_\alpha \neq C_\beta$. Hopefully, it’s clear that this suffices.

Here’s what to do at step $\alpha$. First of all, if $x_\alpha$ is in some $C_\beta$ where $\beta < \alpha$ then let $C_\alpha = C_\beta$ and stop. Otherwise, pick a plane $P$ passing through $x_\alpha$ which is not coplanar with any $C_\beta$ for $\beta < \alpha$. This is possible since there are $2^{\aleph_0}$ planes through $x_\alpha$ but only $|\alpha| < 2^{\aleph_0}$ circles $C_\beta$.

Now, each circle $C_\beta$ intersects $P$ in at most 2 points. Since there are $2^{\aleph_0}$ circles in $P$ containing $x_\alpha$ but only $|\alpha| < 2^{\aleph_0}$ circles $C_\beta$, there must be a circle in $P$ which contains $x_\alpha$ and is disjoint from each $C_\beta$ for $\beta < \alpha$. Let this circle be $C_\alpha$. This completes the proof.

]]>

**Multivariable Calculus **

*Definition *[Partial Derivatives].

Let $f$ be a function from $R^2$ to $R$ (where $R$ is the smooth line). We define the partial derivative $\frac{\partial f}{\partial x}$ (also written $f_x$) as follows: Given $y_0$, let $g(x) = f(x, y_0)$. Then $\frac{\partial f}{\partial x}(x_0, y_0)$ is defined to be $g'(x_0)$. A similar definition is made for $\frac{\partial f}{\partial y}$, and for functions of more than two variables.

*Definition* [$\Delta_n$].

For $n \geq 1$, let $\Delta_n = \{(d_1, \ldots, d_n) \in R^n : d_i d_j = 0 \text{ for all } i, j\}$. Note that $\Delta_n \subseteq \Delta^n$.

The sets $\Delta_n$ play the role in multivariable calculus that $\Delta$ played in single-variable calculus. For example, we have the following.

*Proposition.* Let $f$ be a function from $R^2$ to $R$. Then, for all $(x, y)$ and all $(d_1, d_2) \in \Delta_2$,

$f(x + d_1, y + d_2) = f(x, y) + d_1 \frac{\partial f}{\partial x}(x, y) + d_2 \frac{\partial f}{\partial y}(x, y)$

and furthermore, $\frac{\partial f}{\partial x}$ and $\frac{\partial f}{\partial y}$ are unique with those properties.

The analogous statement is also true for functions of more than two variables.
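Nilsquare infinitesimals aren’t available in ordinary arithmetic, but dual numbers ($\varepsilon^2 = 0$) give a mechanical analogue of this proposition; evaluating at a point displaced by $\varepsilon$ reads off the partial derivative. A sketch (my own illustration, not from [Bell2]):

```python
from dataclasses import dataclass

# Dual numbers a + b*eps with eps^2 = 0 mimic nilsquare infinitesimals:
# f(x + eps, y) evaluates to f(x, y) + eps * f_x(x, y).
@dataclass
class Dual:
    re: float
    eps: float = 0.0
    def __add__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.re + o.re, self.eps + o.eps)
    __radd__ = __add__
    def __mul__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        # (a + b*eps)(c + d*eps) = ac + (ad + bc)*eps, since eps^2 = 0
        return Dual(self.re * o.re, self.re * o.eps + self.eps * o.re)
    __rmul__ = __mul__

def f(x, y):
    return x * x * y + 3 * y

# Partial derivatives read off from f(x + eps, y) and f(x, y + eps):
fx = f(Dual(2.0, 1.0), Dual(5.0)).eps    # d/dx (x^2 y + 3y) = 2xy = 20
fy = f(Dual(2.0), Dual(5.0, 1.0)).eps    # d/dy (x^2 y + 3y) = x^2 + 3 = 7
assert fx == 20.0 and fy == 7.0
```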

We also have

*Proposition* [Extended Microcancellation]. Let $a_1, \ldots, a_n \in R$. Suppose that for all $(d_1, \ldots, d_n) \in \Delta_n$, $d_1 a_1 + \cdots + d_n a_n = 0$. Then each $a_i$ equals 0.

**Stationary Points and Lagrange Multipliers**

There is an interesting substitute for the method of Lagrange multipliers in Smooth Infinitesimal Analysis. To introduce it, I’ll first discuss the concept of stationary points.

Suppose that we’ve forgotten what a stationary point and a critical point are, and we need to redefine the concept in Smooth Infinitesimal Analysis. How should we do it? We want a stationary point to be such that every local maximum and local minimum is one. A point $a$ gives rise to a local maximum of a single-variable function $f$ just in case there is some neighborhood of $a$ such that $f(x) \leq f(a)$ for all $x$ in that neighborhood.

However, in Smooth Infinitesimal Analysis, there is always a neighborhood of $a$ on which $f$ is *linear*: namely, $\{a + d : d \in D\}$. That means that for $a$ to be a local maximum, $f$ must be *constant* on some neighborhood. Obviously, the same is true if $a$ is a local minimum. This suggests that we say that $f$ has a stationary point at $a$ just in case $f(a + d) = f(a)$ for all $d \in D$.

*Definition* [Stationary Point of a Single-Variable Function]. Let $f\colon \mathbb{R} \to \mathbb{R}$ and $a \in \mathbb{R}$. We say that $f$ has a stationary point at $a$ if for all $d \in D$, $f(a + d) = f(a)$.
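Classically, the content of this definition is that $f(a + \varepsilon) = f(a) + \varepsilon f'(a)$ once $\varepsilon^2 = 0$, so $f(a + \varepsilon) = f(a)$ for all such $\varepsilon$ exactly when $f'(a) = 0$ (by microcancellation). A small sympy sketch of the first-order expansion, with an arbitrarily chosen smooth $f$:

```python
import sympy as sp

x, a, eps = sp.symbols('x a epsilon')

f = sp.sin(x) * sp.exp(x)    # an arbitrary smooth function

# Truncating the expansion after the linear term in eps mimics eps^2 = 0:
expansion = sp.series(f.subs(x, a + eps), eps, 0, 2).removeO()
assert sp.simplify(
    expansion - (f.subs(x, a) + eps * sp.diff(f, x).subs(x, a))
) == 0
```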

Similarly, given a function $f$ of two variables, and a point $(x, y)$, $f$ is linear on the set $\{(x + d_1, y + d_2) : (d_1, d_2) \in D(2)\}$. This suggests that we define $(x, y)$ to be a stationary point of $f$ just in case $f(x + d_1, y + d_2) = f(x, y)$ for all $(d_1, d_2) \in D(2)$.

*Definition* [Stationary Point of a Multivariable Function]. Let $f\colon \mathbb{R}^2 \to \mathbb{R}$. We say that $(x, y)$ is a stationary point of $f$ if for all $(d_1, d_2) \in D(2)$, $f(x + d_1, y + d_2) = f(x, y)$.

Now, suppose we want to maximize or minimize a function $f$ subject to the constraint that the point lie on some level surface $g(x, y) = c$, where $c$ is a constant. Now, we should require of $(x, y)$ not that $f(x + d_1, y + d_2) = f(x, y)$ for all $(d_1, d_2) \in D(2)$, but only for those $(d_1, d_2)$ which keep $(x, y)$ on the same level surface of $g$; that is, those for which $g(x + d_1, y + d_2) = g(x, y)$. I’ll record this in a definition.

*Definition* [Constrained Stationary Point]. Let $f, g\colon \mathbb{R}^2 \to \mathbb{R}$. A point $(x, y)$ is a stationary point of $f$ constrained by $g$ if for all $(d_1, d_2) \in D(2)$, if $g(x + d_1, y + d_2) = g(x, y)$ then $f(x + d_1, y + d_2) = f(x, y)$.

I’ll show how this definition leads immediately to a method of solving constrained extrema problems by doing an example.

This example (and this method) are from [Bell2]. Suppose we want to find the radius $r$ and height $h$ of the cylindrical can (with top and bottom) of least surface area that holds a given volume of $V_0$ cubic centimeters. The surface area is $S(r, h) = 2\pi r^2 + 2\pi r h$, and we are constrained by the volume, which is $V(r, h) = \pi r^2 h = V_0$.

We want to find those $(r, h)$ such that $S(r + d_1, h + d_2) = S(r, h)$ for all those $(d_1, d_2) \in D(2)$ such that $V(r + d_1, h + d_2) = V(r, h)$. So, the first question is to figure out which $(d_1, d_2)$ satisfy that property.

We have

$$V(r + d_1, h + d_2) = \pi (r + d_1)^2 (h + d_2)$$

which is

$$\pi r^2 h + \pi \left( 2 r h\, d_1 + r^2 d_2 \right)$$

since $d_1^2 = d_1 d_2 = 0$. If this is to equal $\pi r^2 h$, then we must have $2 r h\, d_1 + r^2 d_2 = 0$, so that $d_2 = -\frac{2h}{r}\, d_1$.

Now, we want to find an $(r, h)$ so that $S(r + d_1, h + d_2) = S(r, h)$ where $d_2 = -\frac{2h}{r}\, d_1$.

We have

$$S(r + d_1, h + d_2) = 2\pi (r + d_1)^2 + 2\pi (r + d_1)(h + d_2)$$

which is

$$S(r, h) + 2\pi \left( (2r + h)\, d_1 + r\, d_2 \right)$$

If this is to equal $S(r, h)$ then we must have $(2r + h)\, d_1 + r\, d_2 = 0$. Substituting $d_2 = -\frac{2h}{r}\, d_1$, we get $(2r - h)\, d_1 = 0$ for all $d_1 \in D$. By microcancellation, we have $2r - h = 0$, from which it follows that $h = 2r$.
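As a cross-check (mine, not from [Bell2]), the classical elimination method gives the same answer: solve the volume constraint for $h$, substitute into $S$, and set $dS/dr = 0$. In the code the fixed volume is called `V`:

```python
import sympy as sp

r, h, V = sp.symbols('r h V', positive=True)

S = 2*sp.pi*r**2 + 2*sp.pi*r*h          # surface area of the closed can
constraint = sp.Eq(sp.pi*r**2*h, V)     # fixed volume

# Classical route: eliminate h, then find the stationary point of S(r).
h_of_r = sp.solve(constraint, h)[0]     # h = V / (pi r^2)
S_of_r = S.subs(h, h_of_r)              # S(r) = 2 pi r^2 + 2 V / r
r_star = sp.solve(sp.diff(S_of_r, r), r)[0]
h_star = h_of_r.subs(r, r_star)

ratio = sp.simplify(h_star / r_star)    # should be 2, matching h = 2r
assert ratio == 2
```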

**Stokes’s Theorem**

It is interesting that not only can the theorems of vector calculus such as Green’s theorem, Stokes’s theorem, and the Divergence theorem be stated and proved in Smooth Infinitesimal Analysis, but, just as in the classical case, they are all special cases of a generalized Stokes’s theorem.

In this section I will state Stokes’s theorem.

*Definition.* Given $a, b \in \mathbb{R}$, we say that $a \leq b$ if $\neg(b < a)$. We define $[a, b]$ to be the set $\{x \in \mathbb{R} : a \leq x \text{ and } x \leq b\}$.

*Definition.* Let $C\colon [a, b] \to \mathbb{R}^3$ be a curve, and $\mathbf{F}\colon \mathbb{R}^3 \to \mathbb{R}^3$ be a vector field. The line integral $\int_C \mathbf{F} \cdot d\mathbf{r}$ is defined to be $\int_a^b \mathbf{F}(C(t)) \cdot C'(t)\, dt$.

*Definition.* Let $S\colon [a, b] \times [c, d] \to \mathbb{R}^3$ be a surface, and $f\colon \mathbb{R}^3 \to \mathbb{R}$ be a function. The surface integral $\iint_S f\, dA$ is defined to be $\int_a^b \int_c^d f(S(u, v)) \left\lVert \frac{\partial S}{\partial u} \times \frac{\partial S}{\partial v} \right\rVert dv\, du$.

This definition may be intuitively justified in the same manner that the arclength of a function was derived in an earlier section.

*Definition.* Let $S\colon [a, b] \times [c, d] \to \mathbb{R}^3$ be a surface, and $\mathbf{F}$ be a vector field. The surface integral $\iint_S \mathbf{F} \cdot d\mathbf{S}$ is defined to be

$$\int_a^b \int_c^d \mathbf{F}(S(u, v)) \cdot \left( \frac{\partial S}{\partial u} \times \frac{\partial S}{\partial v} \right) dv\, du$$

Note that this equals $\iint_S (\mathbf{F} \cdot \mathbf{n})\, dA$, where $\mathbf{n}$ is the unit normal $\left( \frac{\partial S}{\partial u} \times \frac{\partial S}{\partial v} \right) \big/ \left\lVert \frac{\partial S}{\partial u} \times \frac{\partial S}{\partial v} \right\rVert$.

We extend both definitions to cover formal $\mathbb{Z}$-linear combinations of curves and surfaces, and we define the boundary of a surface $S\colon [a, b] \times [c, d] \to \mathbb{R}^3$ to be the formal $\mathbb{Z}$-linear combination of curves $S(\cdot, c) + S(b, \cdot) - S(\cdot, d) - S(a, \cdot)$.

The curl of a vector field is defined as usual, and we can prove the usual Stokes’s Theorem:

*Theorem*. Let $S$ be a surface and $\mathbf{F}$ a vector field. Then

$$\iint_S (\nabla \times \mathbf{F}) \cdot d\mathbf{S} = \int_{\partial S} \mathbf{F} \cdot d\mathbf{r}$$

This theorem may be used to compute answers to standard multivariable calculus problems requiring Stokes’s theorem in the usual way.
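For a concrete classical instance (an illustration of the statement, not of its proof in SIA), take the vector field $\mathbf{F} = (z, x, y)$, whose curl is $(1, 1, 1)$, and the unit disk in the $xy$-plane; both sides come out to $\pi$:

```python
import sympy as sp

x, y, z, r, th, t = sp.symbols('x y z r theta t')

F = sp.Matrix([z, x, y])                       # the vector field F = (z, x, y)
curl = sp.Matrix([
    sp.diff(F[2], y) - sp.diff(F[1], z),
    sp.diff(F[0], z) - sp.diff(F[2], x),
    sp.diff(F[1], x) - sp.diff(F[0], y),
])                                             # here curl F = (1, 1, 1)

# Surface: the unit disk in the xy-plane, S(r, th) = (r cos th, r sin th, 0)
S = sp.Matrix([r*sp.cos(th), r*sp.sin(th), 0])
normal = S.diff(r).cross(S.diff(th))           # = (0, 0, r)
flux = sp.integrate(curl.subs({x: S[0], y: S[1], z: S[2]}).dot(normal),
                    (th, 0, 2*sp.pi), (r, 0, 1))

# Boundary: the unit circle C(t) = (cos t, sin t, 0), traversed once
C = sp.Matrix([sp.cos(t), sp.sin(t), 0])
line = sp.integrate(F.subs({x: C[0], y: C[1], z: C[2]}).dot(C.diff(t)),
                    (t, 0, 2*sp.pi))

assert flux == line == sp.pi
```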

As an exercise, state the divergence theorem in SIA.

**Generalized Stokes’s Theorem**

The definitions in this section are directly from [Moerdijk-Reyes].

*Definition* [Infinitesimal $n$-cubes]. For $n \geq 0$, and $X$ any set, an infinitesimal $n$-cube in $X$ is some $(f, d_1, \ldots, d_n)$ where $f\colon \mathbb{R}^n \to X$ and $d_1, \ldots, d_n \in D$.

Intuitively, an infinitesimal $n$-cube on a set is specified by saying how you want to map $\mathbb{R}^n$ into your set, and how far you want to go along each coordinate.

Note that an infinitesimal 0-cube is simply a point.

*Definition* [Infinitesimal $n$-chains]. An infinitesimal $n$-chain is a formal $\mathbb{Z}$-linear combination of infinitesimal $n$-cubes.

*Definition* [Boundary of $n$-chains]. Let $c$ be a 1-cube $(f, d_1)$. The boundary $\partial c$ is defined to be the 0-chain $f(d_1) - f(0)$, where this is a formal $\mathbb{Z}$-linear combination of 0-cubes identified as points.

Let $c$ be a 2-cube $(f, d_1, d_2)$. The boundary $\partial c$ is defined to be the 1-chain

$$(u \mapsto f(u, 0),\, d_1) + (v \mapsto f(d_1, v),\, d_2) - (u \mapsto f(u, d_2),\, d_1) - (v \mapsto f(0, v),\, d_2)$$

In general, if $c$ is an $n$-cube $(f, d_1, \ldots, d_n)$, the boundary $\partial c$ is defined to be the $(n-1)$-chain

$$\partial c = \sum_{i=1}^{n} (-1)^{i+1} \left( c\,\big|_{x_i = d_i} - c\,\big|_{x_i = 0} \right)$$

where $c\,\big|_{x_i = e}$ denotes the $(n-1)$-cube $\big( (x_1, \ldots, x_{n-1}) \mapsto f(x_1, \ldots, x_{i-1}, e, x_i, \ldots, x_{n-1}),\; d_1, \ldots, d_{i-1}, d_{i+1}, \ldots, d_n \big)$.

The boundary map is extended to chains in the usual way.
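A useful sanity check on this sign convention (my own illustration, not from [Moerdijk-Reyes]) is that the boundary of a boundary is the zero chain. Here a combinatorial cube is a tuple with `None` marking free coordinates and 0/1 fixing a face:

```python
from collections import Counter

def boundary(cube):
    """cube is a tuple with None for free coordinates and 0/1 for fixed ones.
    Convention: the i-th free coordinate contributes
    (-1)^(i+1) * (face at 1  -  face at 0)."""
    chain = Counter()
    i = 0
    for pos, val in enumerate(cube):
        if val is None:
            i += 1  # index among the free coordinates, 1-based
            for e, s in ((1, 1), (0, -1)):
                face = cube[:pos] + (e,) + cube[pos + 1:]
                chain[face] += (-1) ** (i + 1) * s
    return chain

def boundary_of_chain(chain):
    out = Counter()
    for cube, coeff in chain.items():
        for face, c in boundary(cube).items():
            out[face] += coeff * c
    return out

# The boundary of a boundary vanishes (checked on a 3-cube):
dd = boundary_of_chain(boundary((None, None, None)))
assert all(coeff == 0 for coeff in dd.values())
```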

*Definition* [Differential Forms]. An $n$-form on a set $X$ is a mapping $\omega$ from the infinitesimal $n$-cubes on $X$ to $\mathbb{R}$ satisfying

1. Homogeneity. Let $c = (f, d_1, \ldots, d_n)$ be an infinitesimal $n$-cube, let $1 \leq i \leq n$, and let $a \in \mathbb{R}$. Define $a \cdot_i c$ by $a \cdot_i c = (f, d_1, \ldots, a d_i, \ldots, d_n)$. Then for all such $c$, $\omega(a \cdot_i c) = a\, \omega(c)$.

2. Alternation. Let $\sigma$ be a permutation of $\{1, \ldots, n\}$. Then $\omega(f \circ \sigma, d_{\sigma(1)}, \ldots, d_{\sigma(n)}) = \operatorname{sgn}(\sigma)\, \omega(f, d_1, \ldots, d_n)$, where $(f \circ \sigma)(x_1, \ldots, x_n) = f(x_{\sigma(1)}, \ldots, x_{\sigma(n)})$ and $\operatorname{sgn}(\sigma)$ is the sign of the permutation $\sigma$.

3. Degeneracy. If $d_i = 0$ for some $i$, then $\omega(f, d_1, \ldots, d_n) = 0$.

We often write $\omega(f, d_1, \ldots, d_n)$ as $\omega(f)(d_1, \ldots, d_n)$.

We extend $\omega$ to act on all infinitesimal $n$-chains in the usual way.

These axioms intuitively say that $\omega$ is a reasonable way of assigning an oriented size to the infinitesimal $n$-cubes.

The homogeneity condition says that if you double the length of one side of an infinitesimal $n$-cube, you double its size.

The alternation condition says that if you swap the order of two coordinates in an infinitesimal $n$-cube, then you negate its oriented size.

The degeneracy condition says that if any side of the infinitesimal $n$-cube is of length 0, its oriented size is 0.

By the Kock-Lawvere axiom, for all differential $n$-forms $\omega$, there is a unique map $\hat{\omega}$ such that for all $f$ and $(d_1, \ldots, d_n) \in D^n$ we have $\omega(f, d_1, \ldots, d_n) = \hat{\omega}(f) \cdot d_1 \cdots d_n$.

*Definition* [Exterior Derivative]. The exterior derivative of a differential $n$-form $\omega$ is the $(n+1)$-form $d\omega$ defined by

$$d\omega(c) = \omega(\partial c)$$

for all infinitesimal $(n+1)$-cubes $c$.

*Definition* [Finite $n$-cubes]. A finite $n$-cube in $X$ is a map from $[a_1, b_1] \times \cdots \times [a_n, b_n]$ to $X$.

The boundary of a finite $n$-cube is defined in the same way that the boundary of an infinitesimal $n$-cube was defined.

In the above section, a curve was a finite 1-cube in $\mathbb{R}^3$ and a surface was a finite 2-cube in $\mathbb{R}^3$.

*Definition* [Integration of forms over finite cubes]. Let $\omega$ be an $n$-form on $X$ and $F$ a finite $n$-cube on $X$. Then $\int_F \omega$ is defined to be

$$\int_{a_1}^{b_1} \cdots \int_{a_n}^{b_n} g(x_1, \ldots, x_n)\; dx_n \cdots dx_1$$

where $g(x_1, \ldots, x_n)$ is the unique real number such that for all $(d_1, \ldots, d_n) \in D^n$, $\omega\big( (u_1, \ldots, u_n) \mapsto F(x_1 + u_1, \ldots, x_n + u_n),\; d_1, \ldots, d_n \big) = g(x_1, \ldots, x_n)\, d_1 \cdots d_n$.

Generalized Stokes’s theorem (for finite $n$-cubes) is provable in SIA (see [Moerdijk-Reyes] for the proof).

*Theorem* [Generalized Stokes’s Theorem]. Let $X$ be a set, $\omega$ an $n$-form on $X$, and $F$ a finite $(n+1)$-cube on $X$. Then

$$\int_F d\omega = \int_{\partial F} \omega$$

Let’s see how this gives the Fundamental Theorem of Calculus.

Let $f\colon \mathbb{R} \to \mathbb{R}$ and let $a \leq b$. We would like to see how $\int_a^b f'(x)\, dx = f(b) - f(a)$ is a special case of Generalized Stokes’s Theorem. (On the other hand, that it’s *true* is immediate from the way we defined integration.)

Let $\omega$ be the 0-form on $\mathbb{R}$ defined by $\omega(x) = f(x)$. (Recall that 0-cubes are identified with points.)

Then $d\omega$ is the 1-form which takes an infinitesimal 1-cube $(g, d)$ to $\omega(\partial(g, d)) = f(g(d)) - f(g(0))$. We must show that for the finite 1-cube $F\colon [a, b] \to \mathbb{R}$ given by $F(x) = x$, $\int_F d\omega = \int_{\partial F} \omega$.

The boundary of $F$ is $b - a$ (as a formal linear combination of points, not as a subtraction in $\mathbb{R}$). Therefore, $\int_{\partial F} \omega = \omega(b) - \omega(a) = f(b) - f(a)$. Since $d \mapsto f(x + d) - f(x)$ is a function from $D$ to $\mathbb{R}$, there is a unique $g(x)$ such that $f(x + d) - f(x) = g(x)\, d$ for all $d \in D$. Then $g(x) = f'(x)$, by the definition of the derivative. Therefore, $d\omega(u \mapsto x + u,\, d) = f'(x)\, d$, where this holds for all $d \in D$.

Therefore, $\int_F d\omega = \int_a^b f'(x)\, dx = f(b) - f(a) = \int_{\partial F} \omega$.
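The chain of equalities is easy to confirm symbolically for a particular (arbitrarily chosen) $f$:

```python
import sympy as sp

x, a, b = sp.symbols('x a b')

f = sp.exp(x) * sp.cos(x)                      # an arbitrary smooth function

lhs = sp.integrate(sp.diff(f, x), (x, a, b))   # integral of d(omega), i.e. of f'
rhs = f.subs(x, b) - f.subs(x, a)              # omega on the boundary chain b - a
assert sp.simplify(lhs - rhs) == 0
```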

One can show in a similar manner that Stokes’s theorem and the Divergence theorem are special cases of Generalized Stokes’s theorem, although the computations are significantly more arduous.


Ax’s Theorem: For all polynomial functions $f\colon \mathbb{C}^n \to \mathbb{C}^n$, if $f$ is injective, then $f$ is surjective.

Very rough proof sketch: The field $\mathbb{C}$ has characteristic 0, so each of the axioms asserting that the characteristic is not $p$ (where $p$ is a prime) is true in $\mathbb{C}$. Suppose that some polynomial function $f\colon \mathbb{C}^n \to \mathbb{C}^n$ is injective but not surjective. Then there is a proof of that fact from the axioms of algebraically closed fields, together with the axioms just mentioned. But a proof can only use finitely many axioms. Therefore, there must be some prime $p$ whose axiom is not used in the proof. One can then show that there would be a polynomial function which is injective but not surjective from $K^n$ to $K^n$, where $K$ is a finite field of characteristic $p$. But this is impossible, because $K^n$ is a finite set.
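The final step rests on the pigeonhole fact that an injective self-map of a finite set is surjective. A brute-force illustration over $\mathbb{F}_5^2$, with a hypothetical polynomial map chosen so that injectivity is easy to see:

```python
from itertools import product

p = 5
domain = list(product(range(p), repeat=2))     # the 25 points of F_5 x F_5

def f(v):
    x, y = v
    # Hypothetical example map: x -> x^3 is a bijection of F_5 (since
    # gcd(3, 4) = 1), so (x, y) can be recovered from f(x, y): the map
    # is injective.
    return ((x**3 + 2*y) % p, (y + 1) % p)

image = {f(v) for v in domain}
assert len(image) == len(domain)               # injective on a finite set...
assert image == set(domain)                    # ...hence surjective
```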

More details below.

Fix the language $L = \{0, 1, +, -, \cdot\}$ of rings. The set of axioms ACF for algebraically closed fields consists of the field axioms ($\forall x\, (x + 0 = x)$, etc.) together with, for each $n \geq 1$, an axiom

$$\forall a_0 \cdots \forall a_{n-1}\, \exists x\, \left( x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0 = 0 \right)$$

asserting that all monic polynomials of degree $n$ have a root.

If $p$ is a prime, then let ACF_p be the axioms of ACF together with the axiom

$$\underbrace{1 + 1 + \cdots + 1}_{p \text{ times}} = 0$$

asserting that the field has characteristic $p$. Call that axiom $\chi_p$. Let ACF_0 be the set of axioms of ACF together with $\neg\chi_p$ for each prime $p$. (This asserts that the field has characteristic 0.) We now have the following lemma, provable by (essentially) logical means.

*Lemma*. All of the theories ACF_p and ACF_0 are *complete*, i.e., for any first-order sentence $\phi$ in the language $L$, either the theory proves $\phi$ or the theory proves $\neg\phi$.

*Proof of Ax’s Theorem*. For each $n$ and $d$, let $\Phi_{n,d}$ be a formula asserting that all $n$-tuples of polynomials of degree at most $d$ in $n$ variables which are injective (as maps) are surjective.

First we show that ACF_p proves each $\Phi_{n,d}$. First observe that $\Phi_{n,d}$ is true in each finite field of characteristic $p$, just by virtue of it being a finite set. Since the algebraic closure of $\mathbb{F}_p$ (the field with $p$ elements) is a union of finite fields of characteristic $p$, $\Phi_{n,d}$ is true in that field as well: if there is some injective and non-surjective polynomial function, simply pick a finite field of characteristic $p$ large enough to contain all the coefficients of the polynomial and to witness its non-surjectivity in order to get a contradiction. Since ACF_p is complete and there is an algebraically closed field of characteristic $p$ in which $\Phi_{n,d}$ is true, it follows that ACF_p proves $\Phi_{n,d}$.

Now, assume that some $\Phi_{n,d}$ wasn’t true in $\mathbb{C}$. Then ACF_0 would prove $\neg\Phi_{n,d}$. But the proof would have to use only finitely many axioms $\neg\chi_p$. If $p_0$ is a prime greater than each $p$ such that $\neg\chi_p$ is used in the proof, then ACF_{p_0} proves $\neg\Phi_{n,d}$, contrary to the result of the above paragraph.

For more information, see David Marker’s introductory notes on model theory here.
