XOR’s Hammer

Parenting Recommendations

mkoconnor — Sun, 22 Feb 2026 14:28:13 +0000

My four-year-old has recently started asking me questions about infinity (“is there a biggest number?”, “what’s infinity plus one”, etc.) and he and my three-year-old often get into competitions to name the biggest number.

The correct answers to these questions depend on how you interpret “number” and “infinity”; I’ve decided it’s best to pick one interpretation of these terms and answer their questions consistently based on that interpretation. Clearly, when they are older they will remember the answers their dad gave them when they were three and four and be grateful that all the answers turned out to be consistent.

But what interpretation should I give to these terms? Here are some possibilities:

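Interpretation of “number”	“Infinity” is	$\infty + 1 > \infty$?	$\infty + 1 = 1 + \infty$?	$\infty - 1$ exists?	$\infty - 1 < \infty$?
Cardinal number	$\aleph_0$	no	yes	yes	no
Ordinal number	$\omega$	yes	no	no	n/a
Ordinal number with “natural” operations	$\omega$	yes	yes	no	n/a
Surreal number	$\omega$	yes	yes	yes	yes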

Let me go through some of the tradeoffs:

If “number” means cardinal number, then I’d have to explain to my kids that “infinity plus one” is the same as “infinity”. I don’t think they’d like that. Also, when they get into a “name the highest number competition”, they often say things like “infinity plus infinity” and “infinity plus two”. I’d have to explain that those are all the same, defeating the point of the game. That’s no good.

“number” meaning ordinal number is better, but then if my kids ever ask about “one plus infinity” instead of “infinity plus one”, I’d have to explain that “one plus infinity” equals “infinity” but “infinity plus one” is bigger. Huh? What four-year-old wants to hear that?

If I keep “number” as meaning ordinal number, but interpret “plus” to be the so-called “natural” plus that’s commutative, then that’s better. But even better still is surreal numbers. Much like a Stokke high chair, it will grow with them as they age. When they start asking about “infinity minus one” I can explain that that’s an actual number that’s one less than infinity as expected. Similarly with more advanced things like “sqrt(infinity)” and “log(infinity)”. If they ever ask about infinitely small things, I can explain that as “1/infinity”.
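To make the difference between the two kinds of “plus” concrete, here is a minimal sketch in Python. It only handles ordinals below $\omega^2$, stored as a pair `(a, b)` meaning $\omega \cdot a + b$; that representation and the function names are my own assumptions for illustration.

```python
# Ordinals below omega^2, written as omega*a + b and stored as the pair (a, b).
# This representation is an illustrative assumption, not something from the post.

def ordinal_add(x, y):
    """Ordinary (non-commutative) ordinal addition."""
    (a, b), (c, d) = x, y
    if c == 0:
        return (a, b + d)   # a finite right summand just extends the finite tail
    return (a + c, d)       # an infinite right summand absorbs the left finite tail

def natural_add(x, y):
    """The commutative "natural" sum: add the coefficients componentwise."""
    (a, b), (c, d) = x, y
    return (a + c, b + d)

ONE = (0, 1)
OMEGA = (1, 0)

print(ordinal_add(ONE, OMEGA))    # one plus infinity
print(ordinal_add(OMEGA, ONE))    # infinity plus one
print(natural_add(ONE, OMEGA) == natural_add(OMEGA, ONE))
```

Python compares tuples lexicographically, which happens to agree with the ordering of these ordinals, so `ordinal_add(OMEGA, ONE) > OMEGA` can be checked directly.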

And yet, this essential parenting tool was only invented ~50 years ago. Just another of the modern parenting conveniences that weren’t available to our ancestors.

The Axiom of Choice Isn’t Always Non-Constructive

mkoconnor — Wed, 10 Aug 2022 20:59:02 +0000

The Axiom of Choice is usually introduced as a non-constructive axiom that mathematicians used to care about but don’t really pay much attention to anymore. It’s true that mainstream mathematicians often don’t pay much attention to it, but it turns out that AC isn’t inherently non-constructive: it depends on what the base system it’s being added to is.

Furthermore, adding it may or may not affect which arithmetic statements a system can prove (that is, statements involving only quantifiers and relations over the natural numbers).

Here’s a list of what happens to various axiomatic systems when you add AC. I’m sure it’s very incomplete, it’s just what I happen to be aware of. Hopefully there aren’t significant errors; at some point I’ll actually go through and add links for these things.

Martin-Löf type theory: Constructive, and constructive after AC is added. AC is actually already provable in the system.
Heyting Arithmetic in all finite types: Constructive, and constructive after AC is added. AC is not already provable in the system, but it doesn’t change the provable arithmetic statements.
Local set theory (or the internal logic of the free topos): Constructive, but not constructive after AC is added.
IZF: Constructive, but not constructive after AC is added.
Peano Arithmetic in all finite types: Not constructive even without AC. Adding AC does allow proving more arithmetic statements.
ZF: Not constructive even without AC. Adding AC does not allow proving more arithmetic statements.

A Nice Definition of “Field Theory”

mkoconnor — Fri, 09 Apr 2021 00:51:17 +0000

Like most people, I don’t really know anything about quantum field theory. But the other day I stumbled across this paper by Stefano Gogioso, Maria E. Stasinou, and Bob Coecke that provides a very nice framework for what sort of thing a “quantum field theory” (or really, any “field theory”) is. It certainly doesn’t mean I understand quantum field theory, but knowing what sort of thing it is helps me categorize it in my brain.

The definition is as follows: Suppose we’re given a partial order $(P, \le)$ where we are meant to interpret elements of $P$ as points in spacetime and where $x \le y$ means $x$ could causally affect $y$.

We say that $x, y \in P$ are space-like separated if neither $x \le y$ nor $y \le x$, and we say that a subset $S \subseteq P$ is a space-like slice if it’s an antichain (that is, any two distinct elements are space-like separated).

Given a subset $S \subseteq P$ we say that a subset $\gamma \subseteq P$ is a path to $S$ if:

$\gamma$ is a linear ordering
$\gamma$ has a maximum element and that maximum element is in $S$
$\gamma$ is a maximal subset with the above two properties

Now, we can form a partial symmetric monoidal category as follows:

The objects are the space-like slices of $P$
The monoidal product $S \otimes T$ is defined just when all elements of $S$ are space-like separated from every element of $T$. In that case, it’s defined to be $S \cup T$
The category is a partial order with a morphism from $S$ to $T$ just in case every path to $T$ intersects $S$

In other words, there is a morphism between two space-like slices $S$ and $T$ just in case the state of the world at $S$ should determine the state of the world at $T$.
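For a finite partial order, the slices, paths, and morphisms described above can be computed directly. The following is a rough sketch under my own assumptions: the order is given by its covering relation, the example is a hypothetical four-point “diamond”, and all function names are made up for illustration.

```python
from itertools import product

# Hypothetical toy spacetime: a "diamond" a < b, a < c, b < d, c < d,
# given by its covering relation (x immediately below y).
POINTS = {"a", "b", "c", "d"}
COVERS = {("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")}

def leq_relation(points, covers):
    """Reflexive-transitive closure of the covering relation."""
    leq = {(x, x) for x in points} | set(covers)
    changed = True
    while changed:
        changed = False
        for (x, y), (u, v) in product(list(leq), repeat=2):
            if y == u and (x, v) not in leq:
                leq.add((x, v))
                changed = True
    return leq

def is_slice(subset, leq):
    """A space-like slice is an antichain."""
    return all(x == y or ((x, y) not in leq and (y, x) not in leq)
               for x in subset for y in subset)

def paths_to(target, covers):
    """All maximal chains whose top element lies in the slice `target`.
    In a finite order these descend cover by cover from a point of
    `target` down to a minimal element."""
    below = {}
    for x, y in covers:
        below.setdefault(y, []).append(x)

    def chains(top):
        if top not in below:
            return [frozenset([top])]
        return [c | {top} for x in below[top] for c in chains(x)]

    return [c for t in target for c in chains(t)]

def has_morphism(source, target, covers):
    """A morphism source -> target exists iff every path to target meets source."""
    return all(path & set(source) for path in paths_to(target, covers))

LEQ = leq_relation(POINTS, COVERS)
print(is_slice({"b", "c"}, LEQ))
print(has_morphism({"b", "c"}, {"d"}, COVERS))
print(has_morphism({"b"}, {"d"}, COVERS))
```

In the diamond, the slice `{"b", "c"}` determines `{"d"}` because both paths into `d` pass through it, while `{"b"}` alone does not, since the path through `c` misses it.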

Now, given that representation of spacetime, to form a field theory you simply pick some other symmetric monoidal category $\mathcal{C}$ and a monoidal functor from the category of slices to $\mathcal{C}$. The authors of the paper point out that you have different choices for $\mathcal{C}$ depending on the type of field theory you want: for example, in a finite-dimensional context you could pick a category of finite-dimensional Hilbert spaces and completely positive maps between them, or in an infinite-dimensional context you could pick the category of Hilbert spaces and bounded linear maps.

I find just that definition by itself illuminating. Of course, the paper doesn’t stop there: you can define other concepts from this definition (spacetime regions, foliations, etc.) as well as put restrictions on what slices you allow if you need them to, e.g., be nice topologically. They also relate their approach to other frameworks for quantum field theory. Neat!

But Why Is Proof by Contradiction Non-Constructive?

mkoconnor — Fri, 09 Apr 2021 00:30:29 +0000

We think of a proof as being non-constructive if it proves “There exists an $x$ such that $P(x)$” without ever actually exhibiting such an $x$.

If you want to form a system of mathematics where all proofs are constructive, one thing you can do is remove the principle of proof by contradiction: the principle that you can prove a statement $P$ by showing that $\neg P$ is false. (Let’s leave aside set-theoretical considerations for the moment.)

But one thing you can ask is: exactly why is the principle of proof by contradiction non-constructive? In the paper Linear logic for constructive mathematics, Mike Shulman gave an answer which I found quite mind-blowing: Imagine you’re proving $\exists x\, P(x)$ by contradiction. This means that you allow yourself to assume $\neg \exists x\, P(x)$ and show a contradiction from there. The assumption $\neg \exists x\, P(x)$ is equivalent to $\forall x\, \neg P(x)$, but in order to use such an assumption, you actually have to produce an $x$, so shouldn’t that be constructive?

The answer is that yes it will be, unless you use the hypothesis more than once! So (the paper reasons), you can form a constructive system of mathematics not by removing the law of proof by contradiction, but by requiring you to only use the hypothesis $P$ once when proving a statement $P \to Q$. Absolutely amazing!

It gets even more amazing: Once you’ve committed to doing that, there’s a question: Does proving $P \wedge Q \to R$ mean you’re allowed to use both $P$ and $Q$ in your proof of $R$, or that you could use either to prove $R$? Both are reasonable interpretations, so conjunction splits into two separate connectives.

Dually, disjunction also splits into two connectives, and these two connectives can be given the interpretations of “constructive-or” and “non-constructive-or”. Fantastic!

What does it mean to extend the manipulability of differentials?

mkoconnor — Sat, 09 Nov 2019 14:12:23 +0000

In an interesting paper called Extending the Manipulability of Differentials, the authors Jonathan Bartlett and Asatur Zh. Khurshudyan describe an interesting proposal for representing higher-order derivatives. The argument is basically this:

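As is well-known, the chain rule for first derivatives seems to follow algebraically if you use Leibniz notation for the derivatives:

$\frac{dy}{dx} = \frac{dy}{du} \cdot \frac{du}{dx}$

However, there’s a chain rule for second derivatives and it doesn’t seem to follow algebraically even when using the Leibniz notation for the second derivative:

$\frac{d^2y}{dx^2} = \frac{d^2y}{du^2} \left(\frac{du}{dx}\right)^2 + \frac{dy}{du} \cdot \frac{d^2u}{dx^2}$

But (the authors argue), it does if we use a different notation for the second derivative: $\frac{d\left(\frac{dy}{dx}\right)}{dx}$, and furthermore this can be rewritten by expanding $d\left(\frac{dy}{dx}\right)$ using the quotient rule:

$\frac{d\left(\frac{dy}{dx}\right)}{dx} = \frac{d^2y}{dx^2} - \frac{dy}{dx} \cdot \frac{d^2x}{dx^2}$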
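Here is one way the algebra might go, as a sketch rather than the paper’s own derivation. The variable names are my assumption: $y$ is a function of $u$, $u$ is a function of $x$, and $x$ is the independent variable, so that (as I understand the paper’s convention) $d^2x = 0$.

```latex
% Second derivative of y with respect to u, expanded with the quotient rule:
\frac{d\left(\frac{dy}{du}\right)}{du}
  = \frac{d^2y}{du^2} - \frac{dy}{du} \cdot \frac{d^2u}{du^2}

% Multiply both sides by du^2 / dx^2, treating differentials algebraically:
\frac{d\left(\frac{dy}{du}\right)}{du} \left(\frac{du}{dx}\right)^2
  = \frac{d^2y}{dx^2} - \frac{dy}{du} \cdot \frac{d^2u}{dx^2}

% Rearrange:
\frac{d^2y}{dx^2}
  = \frac{d\left(\frac{dy}{du}\right)}{du} \left(\frac{du}{dx}\right)^2
  + \frac{dy}{du} \cdot \frac{d^2u}{dx^2}
```

Since $d^2x = 0$, the ratios $\frac{d^2y}{dx^2}$ and $\frac{d^2u}{dx^2}$ are the ordinary second derivatives with respect to $x$, and the last line is the usual second-derivative chain rule.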

This is quite intriguing. But I found it a bit odd that the authors presented this as a purely notational idea, when there seems to be some non-trivial semantic thing going on here. This is one attempt at spelling out what it is:

Let be the free smooth algebra with generators (right now, these are just names and the has no meaning). Then the proposition is:

There exists a function such that

(including the case )
For all smooth functions and , , where is the th partial derivative of .

This is non-trivial (and thus useful) because a given element might be expressible in more than one way by the rules above.

This is likely not a complete characterization of all the ways you could use the rules in the paper. In particular, this only deals with total smooth functions, so even an expression like isn’t directly dealt with (you have to say that means an such that ), but I think it captures a good bit of what’s going on.

Hoping for Lane Closures

mkoconnor — Sat, 09 Nov 2019 13:32:59 +0000

Two lanes of a four lane highway are closed and you’re stuck in a traffic jam. If there are no on- or off-ramps, what should you hope to see on the road ahead of you: the other two lanes re-open or one of the two open lanes close?

I think you should hope to see one of the two open lanes close: Since cars are not appearing or disappearing, going from two lanes to one should double the speed of each car, while the reverse will happen when going from two lanes to four lanes.
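The conservation argument can be written out in one line. This is only a sketch, and it quietly assumes that the number of cars per unit length in each lane, $\rho$, is the same before and after the change; $n$ (open lanes) and $v$ (speed) are my own labels.

```latex
% Cars passing a point per unit time must match on both sides of the change:
n_1 \rho v_1 = n_2 \rho v_2
\quad\Longrightarrow\quad
v_2 = \frac{n_1}{n_2} v_1
% Two lanes to one lane: v_2 = 2 v_1.  Two lanes to four lanes: v_2 = v_1 / 2.
```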

Differentiating Sine Without Doing Any Work or Knowing Anything

mkoconnor — Sun, 27 Oct 2019 15:27:33 +0000

If you do a google search for how to derive the facts that $\frac{d}{dx} \sin x = \cos x$ and $\frac{d}{dx} \cos x = -\sin x$, most of the derivations you’ll find rely on knowing something like the double angle formula and go through a direct, non-trivial evaluation of $\lim_{x \to 0} \frac{\sin x}{x}$.

It’s actually possible to compute these derivatives without really remembering any specific facts about trigonometry (besides the fact that $x^2 + y^2 = 1$ for points on a circle) or computing any limit directly. I think this deserves to be better known, so I thought I’d record it in a blog post.

I’ll state the theorem generically, without specifically referencing $\sin$ or $\cos$, and using $t$ instead of $\theta$ to try to avoid allowing any hidden assumptions about angles to creep in.

Theorem: Suppose $x(t)$ and $y(t)$ are differentiable functions satisfying:

$x(t)^2 + y(t)^2 = 1$ for all $t$
The arclength of the path from $(x(0), y(0))$ to $(x(t), y(t))$ is $t$ for any $t \ge 0$

Then one of the following two conditions holds:

$x'(t) = -y(t)$ and $y'(t) = x(t)$ for all $t$
$x'(t) = y(t)$ and $y'(t) = -x(t)$ for all $t$

Proof: We’ll use the two conditions to get two equations in our two unknowns and then solve. Differentiating $x^2 + y^2 = 1$ with respect to $t$ (and dividing by 2) gives $x x' + y y' = 0$. That’s Equation 1.

The arclength condition is: $\int_0^t \sqrt{x'(s)^2 + y'(s)^2} \, ds = t$. Differentiating that with respect to $t$ gives $\sqrt{x'(t)^2 + y'(t)^2} = 1$, and thus $x'(t)^2 + y'(t)^2 = 1$. That’s Equation 2.

These are our two equations. Solving for $x'$ in Equation 1 gives $x' = -\frac{y y'}{x}$. Plugging that in to Equation 2 and simplifying (using $x^2 + y^2 = 1$) gives $(y')^2 = x^2$. Thus, either $y' = x$ or $y' = -x$, which lead to the two possibilities respectively. QED.

From that generic result, $\cos$ and $\sin$ can then be characterized completely by assuming the initial conditions that $x(0) = 1$ and $y(0) = 0$ and by choosing $y' = x$ (choosing the other alternative of $y' = -x$ would correspond to going clockwise instead of counterclockwise around the circle).
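As a numerical sanity check, here is a small sketch that integrates the counterclockwise system $x' = -y$, $y' = x$ starting from $(1, 0)$ and compares the result with the library cosine and sine. The choice of the classical fourth-order Runge-Kutta method and the step count are mine; nothing here is part of the proof.

```python
import math

def integrate_circle(t_end, steps=1000):
    """Integrate x' = -y, y' = x from (x, y) = (1, 0) using classical RK4."""
    def f(x, y):
        return -y, x

    h = t_end / steps
    x, y = 1.0, 0.0
    for _ in range(steps):
        k1x, k1y = f(x, y)
        k2x, k2y = f(x + h / 2 * k1x, y + h / 2 * k1y)
        k3x, k3y = f(x + h / 2 * k2x, y + h / 2 * k2y)
        k4x, k4y = f(x + h * k3x, y + h * k3y)
        x += h / 6 * (k1x + 2 * k2x + 2 * k3x + k4x)
        y += h / 6 * (k1y + 2 * k2y + 2 * k3y + k4y)
    return x, y

x, y = integrate_circle(1.0)
# Both differences should be tiny compared with the values themselves.
print(abs(x - math.cos(1.0)), abs(y - math.sin(1.0)))
```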

How does the Infinitesimal Intuition About Lie Brackets Actually Work?

mkoconnor — Wed, 16 Oct 2019 19:19:09 +0000

You can often get the gist of a mathematical subject via an informal explanation involving infinitesimals. But I often find that questions arise from that informal explanation that I’d like resolved, but I don’t want to jump all the way to the full definitions. Without a rigorous basis for reasoning about infinitesimals, it can be tricky to dig any deeper, and I think allowing that slightly deeper digging is a nice benefit that an understanding of nonstandard analysis can provide.

One example of this for me is the Lie bracket of vector fields: this is supposed to be a commutator of vector fields, but isn’t addition of vector fields supposed to be commutative? How is the Lie bracket ever non-zero, given that intuition? The answer may be obvious to most mathematicians, but it wasn’t for me. Fortunately, nonstandard analysis provides a nice way to push an informal, infinitesimal-based understanding of vector fields and Lie brackets far enough to answer this to my intuitive satisfaction.

This is based on this paper, which provides a development of differential geometry in nonstandard analysis. I’ll simplify the presentation in two ways: One is by assuming that the given manifold we’re considering is compact. The other, which you can ignore if you’re not familiar with these issues, is by working in an internal set theory; this essentially means that we work in a framework where $\mathbb{R}$ already has infinitesimals in it, rather than having to pass to some extension field containing infinitesimals.

The basic intuition for treating tangent vectors nonstandardly is to think of them as things which tell you how to flow an infinitesimal amount from a given point. To flesh this out, let’s start with a few definitions:

Given two points and a positive infinitesimal with , let’s say that is if is infinitesimal in some chart (equivalently in every chart). Similarly, let’s say that is if isn’t infinitesimal in some chart (equivalently in any chart).

Now we can build up nonstandard notions of tangent vectors and vector fields. The paper above makes the choice to fix a single positive infinitesimal and use it as a sort of global length scale throughout. Given that, we have:

A prevector at a point is a pair where such that is .
Two prevectors , are equivalent if is .
A tangent vector at is a prevector , considered up to equivalence.

Similarly we can define:

A prevector field on is a function from to such that is for all .
Two prevector fields , are equivalent if is for all .
A vector field is a prevector field considered up to equivalence.

Now, we can transfer classical notions over to the nonstandard case. For example, we can add two prevectors based at the same point by doing normal vector addition in a chart: this does depend on what chart you use but only up to . Similarly, it turns out that we get a well-defined vector addition based off of this notion of prevector addition.

Similarly, we can define addition of vector fields pointwise. Furthermore, if there are underlying prevector fields and that satisfy a certain regularity condition (corresponding to classically), then addition of the vector fields is equivalent to the composition of the prevector fields (which, recall, are just functions).

We can also easily define a flow (or integral curve) of a prevector field : Starting at a point , the flow of along for time is ; that is, composed with itself many times.

There’s a sort of bonus “paradox” resolution here: I used to wonder intuitively why differential equations could have non-unique solutions: Isn’t it the case that the differential equation always tells you exactly where to move in the next infinitesimal time step? The answer is no, it doesn’t: for an infinitesimal timestep , it only tells you where to move up to , so you might be able to make different steps that “add up” to appreciably different solutions.

Moving on to the Lie bracket, consider prevector fields and ; as the “commutator” intuition about the Lie Bracket suggests, let’s consider the prevector field . As the “vector fields commute with each other” intuition suggests, this prevector field is everywhere equivalent to the zero prevector field.

But we still want to study it. In order to make it “appreciable”, the paper defines the Lie bracket prevector field of and to be , that is, composed with itself many times. This is sufficiently many times that this prevector field is now distinct from the zero prevector field.

The paper proves that (under a regularity condition corresponding to classically), this does indeed correspond to the classical Lie bracket. I found this to be a very satisfactory resolution of the conflicting intuitions I mentioned at the start of the post.
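The “negligible commutator, appreciable after many compositions” picture can be imitated numerically. This is only a rough sketch under several assumptions of mine: a small real number `EPS` stands in for the infinitesimal length scale, the two example fields are hypothetical ($X = (1, 0)$ and $Y = (0, x)$ on the plane), the commutator is taken in one particular order, and the number of compositions is taken to be about `1 / EPS`.

```python
EPS = 1e-3  # small real number standing in for the infinitesimal length scale

# Prevector fields as plain functions on the plane (hypothetical example fields):
# F moves a point by EPS along X = (1, 0); G moves it by EPS along Y = (0, x).
def F(p):
    x, y = p
    return (x + EPS, y)

def G(p):
    x, y = p
    return (x, y + EPS * x)

def F_inv(p):
    x, y = p
    return (x - EPS, y)

def G_inv(p):
    x, y = p
    return (x, y - EPS * x)

def commutator(p):
    """Apply F, then G, then F inverse, then G inverse.
    Each point moves only by about EPS**2."""
    return G_inv(F_inv(G(F(p))))

def bracket(p):
    """The commutator composed with itself about 1/EPS times."""
    for _ in range(round(1 / EPS)):
        p = commutator(p)
    return p

start = (0.5, 0.0)
one_step = commutator(start)
many_steps = bracket(start)
print((one_step[1] - start[1]) / EPS)     # about EPS: negligible at scale EPS
print((many_steps[1] - start[1]) / EPS)   # about 1: appreciable at scale EPS
```

For these two fields the classical Lie bracket is the constant field $(0, 1)$ up to the sign convention, which is what the rescaled displacement after many compositions approximates.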

How is it even possible for a sailboat to sail into the wind?

mkoconnor — Mon, 29 May 2017 17:21:21 +0000

Until this morning, I didn’t really understand how it was possible for a sailboat to sail into the wind: popular descriptions like Wikipedia’s talk about keels and lift and the Bernoulli effect and so forth, but this feels like a leap beyond my understanding: I wanted an account of how it is even possible in terms of the very basics.

I think to most physics people this is obvious, but in case there are others like me out there, I thought I’d record the explanation I came up with this morning here.

Suppose we model the situation as follows: there is a single air particle with mass $m_a$ and velocity $v_a$. The sailboat is also a particle; it has mass $m_s$ and is initially at rest. The two particles interact in some way (it could involve rudders, keels, whatever), and afterwards the air particle has velocity $v_a'$ and the sailboat has velocity $v_s'$.

If we require that momentum is conserved and kinetic energy doesn’t increase, so that $m_a v_a = m_a v_a' + m_s v_s'$ and $\frac{1}{2} m_a |v_a|^2 \ge \frac{1}{2} m_a |v_a'|^2 + \frac{1}{2} m_s |v_s'|^2$, then it is possible to prove that our everyday intuition is correct! That is, you can show that $v_a \cdot v_s' \ge 0$, indicating that the cosine of the angle that the sailboat makes with respect to the initial air velocity must be nonnegative: that means the sailboat can’t go against the wind! (In fact, if the sailboat’s velocity is nonzero, the dot product has to be strictly positive.)
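The two-particle claim can be checked in a few lines. The symbols are my own labels: $m_a$ and $v_a$ for the air particle’s mass and initial velocity, $m_s$ for the sailboat’s mass, and primes for the velocities after the interaction.

```latex
% Momentum conservation, solved for the air particle's final velocity:
v_a' = v_a - \frac{m_s}{m_a} v_s'

% Its squared length:
|v_a'|^2 = |v_a|^2 - 2 \frac{m_s}{m_a} \, v_a \cdot v_s' + \frac{m_s^2}{m_a^2} |v_s'|^2

% Substitute into m_a |v_a|^2 \ge m_a |v_a'|^2 + m_s |v_s'|^2 and cancel m_a |v_a|^2:
0 \ge -2 m_s \, v_a \cdot v_s' + \frac{m_s^2}{m_a} |v_s'|^2 + m_s |v_s'|^2

% Rearrange:
v_a \cdot v_s' \ge \frac{1}{2} \left( 1 + \frac{m_s}{m_a} \right) |v_s'|^2 \ge 0
```

The last line also shows why the dot product is strictly positive whenever the sailboat ends up moving at all.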

The resolution is to add a third particle representing the ocean (or other air particles). Now it becomes possible for the sailboat’s final velocity to go against the wind, even though it is still impossible for the combined system of the ocean plus the sailboat to go upwind.

Looking at it this way with three particles, it’s actually pretty easy to see that it’s possible for the sailboat to go against the wind: imagine a bowling ball representing the ocean floating in space, with a ping pong ball representing the sailboat touching it. Another ping pong ball representing the air collides with the first ping pong ball. Both ping pong balls bounce in the opposite direction (and the bowling ball is deflected very slightly): The sailboat is now moving in the direction opposite the initial air velocity!
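Here is a one-dimensional sketch of a three-particle version. It departs from the ping pong picture in one respect that I want to flag: the masses are hypothetical and chosen so that the sailboat is heavier than the air particle, which makes a simple sequence of two head-on elastic collisions settle without any further bounces.

```python
def elastic_1d(m1, v1, m2, v2):
    """Velocities after a head-on elastic collision of two particles in 1D."""
    u1 = ((m1 - m2) * v1 + 2 * m2 * v2) / (m1 + m2)
    u2 = ((m2 - m1) * v2 + 2 * m1 * v1) / (m1 + m2)
    return u1, u2

# Hypothetical masses, chosen so that no further collisions occur afterwards.
M_AIR, M_BOAT, M_OCEAN = 1.0, 4.0, 1000.0
v_air, v_boat, v_ocean = 1.0, 0.0, 0.0   # the wind blows in the +x direction

# The air hits the boat, then the boat hits the ocean it is resting against.
v_air, v_boat = elastic_1d(M_AIR, v_air, M_BOAT, v_boat)
v_boat, v_ocean = elastic_1d(M_BOAT, v_boat, M_OCEAN, v_ocean)

momentum = M_AIR * v_air + M_BOAT * v_boat + M_OCEAN * v_ocean
print(v_air, v_boat, v_ocean, momentum)
```

The boat ends with a negative velocity (against the wind), while the boat and ocean together still carry positive momentum, and the total momentum is unchanged.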

Anyway, I still have no idea how the rudders and keels and so forth actually work, but looking at it this way satisfied my curiosity about how it was even possible for a sailboat to sail against the wind.

Making Money Disappear Through Infinite Iteration, Now In YouTube Form!

mkoconnor — Sun, 07 May 2017 17:32:12 +0000

A while ago, I wrote a blog post called Making Money Disappear Through Infinite Iteration, and I just put out a video version of this post on youtube. Note: it’s very rough and unpolished, but I hope to make more videos and get better at their production as time goes on.

I’d been interested in making a math youtube video for quite a while, but the impetus for this was that I just found out that 3blue1brown, who is a math youtuber who produces excellent videos, has actually released the python scripts that he uses to make them on his github.

So this weekend, I’ve been poking at them to figure out how they work, and I really think allowing people to easily make high quality math animations might be as revolutionary as $\LaTeX$ in terms of allowing people to communicate mathematical ideas. I’m very excited!

The CGP Grey Sheaf of Continents

mkoconnor — Sun, 24 Jul 2016 18:52:27 +0000

CGP Grey is a youtuber with a variety of interesting videos, often about the quirks of geography and political boundaries. In this video, he asks the question “How many continents are there?”, discusses a variety of subtleties in the notion of “continent”, and concludes that it is not well-defined enough to provide an answer.

Let’s grant that “continent” is not a well-defined term; or, to put it another way, the set of continents is not a well-defined set. Even given that, it turns out there’s a mathematical notion of a “variable set”, or “set-valued sheaf”, that can capture the notion of a set which can vary under different assumptions. Intuitively a set-valued sheaf on a topological space $X$ is like a continuous function with domain $X$, except the range is not another topological space, it’s the category of all sets!

Rather than define “sheaf of sets on a topological space” explicitly, let’s work through what it means in the CGP Grey case. For simplicity, let’s just focus on two of the things that CGP Grey mentions: the meaning of “continent” can vary depending on how large you require a continent to be, and the meaning of “continent” can vary depending on how separated you require two continents to be to count as distinct.

Since these are two independent parameters, let’s take our topological space to be . The first coordinate will represent our “looseness about the size requirement”; i.e., if it’s larger, we’ll consider smaller islands to be continents. The second coordinate will represent our “degree of consideration of land bridges”; i.e., if it’s larger, we’ll require larger amounts of water to separate two continents.

To be clear, these parameters are subjective: that is, I’m not postulating any quantitative correspondence between the parameters and, e.g., a minimum size requirement to be a continent.

Now let’s see what the variable set of continents might look like. First, let’s set the second parameter to 0 and vary the first parameter. The set might look like this:

Note that some continents, like South America, are always in the set of continents, but as the parameter gets loosened, other elements get added to the set.

Now, let’s set the first parameter to 0 and vary the second one. That graph might look like this:

This is a little more subtle than the previous graph; instead of new continents getting added, two continents which are distinct might become equal: Europe and Asia quickly become equal, as there is actually no ocean between them at all. If you disregard the Panama Canal, North America and South America become one continent. If you disregard the Suez Canal, Eurasia and Africa become one continent.
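To make the two slices concrete, here is a crude stand-in for the variable set: a plain Python function from the two parameters to a set of continents, where merged continents are represented as one group of names. It is not a sheaf, just a table of values, and every threshold in it is a made-up placeholder of mine rather than anything from the video.

```python
def continents(size_looseness, bridge_tolerance):
    """Return the continents as a set of frozensets of (possibly merged) names.
    All thresholds below are made-up placeholders, not values from the video."""
    land = ["Africa", "Antarctica", "Asia", "Australia", "Europe",
            "North America", "South America"]
    # Loosening the size requirement adds smaller land masses.
    if size_looseness > 0.5:
        land.append("Greenland")
    if size_looseness > 0.8:
        land.append("Borneo")

    # Ignoring narrower and narrower separations makes continents equal.
    merges = [(0.1, "Europe", "Asia"),
              (0.5, "North America", "South America"),
              (0.6, "Asia", "Africa")]
    group = {name: {name} for name in land}
    for threshold, a, b in merges:
        if bridge_tolerance > threshold:
            merged = group[a] | group[b]
            for name in merged:
                group[name] = merged
    return {frozenset(g) for g in group.values()}

print(len(continents(0.0, 0.0)))   # strict on both parameters
print(len(continents(1.0, 0.0)))   # loose about size only
print(len(continents(0.0, 1.0)))   # loose about separation only
```

With these placeholder thresholds, Greenland appears before Borneo does, so Greenland is a continent whenever Borneo is.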

Now, we’ve only looked at two slices of this variable set (and even those two slices have been under-specified, since I haven’t said in complete detail how to interpret the two parameters). But let’s suppose that the full variable set on can be filled out to give a set-valued sheaf called .

Given that, what can we do with this? Well, one of the reasons sheaves are interesting from a logical perspective is that if we consider the category of all set-valued sheaves on (or any fixed topological space), this forms a type of category called a topos which acts so much like the category of sets that we can actually pretend that its elements are sets, and do normal set theory in it. The only proviso is that the internal logic does not include the law of the excluded middle: the axiom that $P \vee \neg P$ for any proposition $P$.

So, what are some things you can do in this logic where we get to pretend that is a genuine set?

Well, we know that has elements: we know there is a thing called that’s in , and a thing called that’s in and so on. We don’t know there’s a thing called that’s in ; I’ll show how to deal with that later.

This is where the lack of the law of the excluded middle first rears its head: in this logic it is neither the case that , nor that ! On the other hand, it is the case that . This might seem unusual with ordinary sets, but I think it’s pretty intuitive here. Note that there can be relationships between these facts, e.g., implies .

In normal set theory, you can determine the cardinality of any set. And in fact, the video’s stated aim is to say what the cardinality is. One of the consequences of losing the law of the excluded middle is that the notion of finiteness becomes more subtle (e.g., see here or here), which again seems appropriate here. It turns out to be the case that is what’s called subfinite, but doesn’t have a definite cardinality.

However, there are still true things using the cardinality of in them: for example, assuming no continents other than the ones in the graphs above are added, it’s the case that the cardinality of is less than 11 (even though the cardinality does not equal a specific number below 11). For another example, there might be a relationship between the two parameters such that something like implies the number of continents is greater than 7 is true.

OK, so far we’ve discussed how the logic handles things like the possibility of two continents becoming equal. How does it handle the conditional existence of continents like Borneo? So far it’s not clear how to even talk about these things in the language.

To explain that, we have to back up a bit. In normal set theory, there are sets with one element, and we might as well pick a distinguished one, call it . Note that for any set (still in normal set theory), the elements of are in 1-1 correspondence with maps ; so we could as well talk about those maps instead of elements of .

Similarly, in the theory of set-valued sheaves on , there is also a set , and instead of saying that continents like are elements of , we could have instead talked about maps from to and relationships between them.

Now, in normal set theory, has only two subsets: itself and the empty set. But that proof depends on excluded middle (since it goes by asking whether or not is in a given subset), so if we drop it, it’s no longer necessarily true. Indeed, in this logic, there is a subset of , call it , that is not the empty set and not . Furthermore, there is a map, , from to . The fact that this map has domain instead of represents the conditional nature of Borneo’s existence as an element of .

Just as with the equality hypotheses, we can represent relationships between conditional existences: for example, if Greenland is a continent whenever Borneo is, we have a map from to . If there are at least 7 continents whenever Borneo exists, we have a map from to .

Toposes were invented in the service of algebraic geometry (see here for a good account of the history of this topic). However, I think they also provide a beautiful account of how set theory can take account of fuzzy concepts. See here for more on this notion of variable sets.

Thermodynamics is Easier Than I Thought

mkoconnor — Thu, 19 May 2016 04:26:14 +0000

Actually, thermodynamics is hard and I don’t understand it. But even without totally understanding thermodynamics, it turns out it’s possible to do a surprising number of useful calculations with just a couple of simple rules about entropy.

The setup is as follows: Imagine that there is some set of states of the world, called the macrostates, that we humans can distinguish. To each of these macrostates is associated some large number of microstates, where a microstate is a complete specification of all information about all the particles in a system.

For example, given a container of gas, different macrostates would correspond to different pressures and temperatures of the gas, since we can determine those with macroscopic measurements. Microstates would correspond to complete information about how all particles in the gas are moving.

Every macrostate has an associated quantity called its entropy, written with an $S$. The entropy of a macrostate obeys the following rules:

The entropy is equal to Boltzmann’s constant, $k_B$, times the logarithm of the number $\Omega$ of associated microstates: $S = k_B \ln \Omega$.
If a system is at a temperature $T$, and you heat it by adding energy $Q$ to it (while keeping it at temperature $T$, by allowing it to expand, say), then its entropy increases by $Q/T$.
The total entropy of the universe never decreases. (This is the second law of thermodynamics.)

These rules alone let you do a surprising number of useful calculations:

Doubling the Volume of a Gas

Suppose you have a gas occupying a volume $V$ at temperature $T$ with total number of molecules $N$. How much energy do you have to add to it to double its volume while keeping it at the same temperature?

With a doubled volume, imagine that the gas occupies two spaces of volume $V$. Then the number of microstates of the system gets multiplied by $2^N$ since, for each microstate of the original gas, there are $2^N$ new microstates corresponding to which space of volume $V$ each molecule is in.

That means that the entropy must have gone up by $k_B \ln 2^N = N k_B \ln 2$ by Rule #1 (where $k_B$ is Boltzmann’s constant), but by Rule #2, that means that to achieve this entropy increase by heating, we must have added $N k_B T \ln 2$ energy to the system.
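To get a feel for the size of this, here is a small sketch that plugs in numbers. The example of one mole of gas at 300 K is my own choice; the constants are the exact SI values.

```python
import math

K_B = 1.380649e-23     # Boltzmann's constant in J/K (exact SI value)
N_A = 6.02214076e23    # Avogadro's number (exact SI value)

def heat_to_double_volume(n_molecules, temperature_kelvin):
    """Energy N * k_B * T * ln(2) needed to double the volume at fixed T."""
    entropy_gain = n_molecules * K_B * math.log(2)   # Rule #1
    return temperature_kelvin * entropy_gain         # Rule #2

# One mole at 300 K: roughly 1.7 kJ.
print(heat_to_double_volume(N_A, 300.0))
```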

Efficient Refrigeration

Generally, when hot things and cold things touch, the hot things get cooler and the cold things get hotter. This is because hot things that are getting cooler are losing entropy, and cold things that are getting hotter are gaining entropy, but the entropy being gained is greater than the entropy being lost.

How much energy does it take to reverse the process? Suppose you have a refrigerator with 10 kg of food that’s currently at room temperature (say 20°C) and you want to lower it to 0°C. Suppose the specific heat of the food is 3.5 kJ/kg°C. That means that a total of $10 \times 20 \times 3.5 = 700$ kJ can be extracted from the food as it cools.

When the food has lost $E$ kJ of energy, that must mean its temperature (in kelvin) is:

$T(E) = 293 - \frac{E}{35}$

That means the entropy lost by the food as it’s cooled is

$\int_0^{700} \frac{dE}{293 - E/35} = 35 \ln \frac{293}{273}$

This is about 2.5 kJ/°C. Whatever process is used for refrigeration, it must obey Rule #3, and thus increase entropy somewhere else.

If you increase entropy by exhausting heat into the room, which is at 20°C = 293K, then you’ll have to exhaust $2.5 \times 293 = 732.5$ kJ of energy. You can get 700kJ of that from the food, but you still need an extra 32.5 kJ of energy, which is why refrigerators have to be plugged in and don’t work on their own.
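The refrigerator numbers can be reproduced with a short script. This is a sketch using the figures from the example (10 kg, 3.5 kJ/kg°C, 293 K down to 273 K); the variable names are mine. Note that it keeps the unrounded entropy, so it reports an extra energy of about 25 kJ, whereas the 32.5 kJ in the text comes from first rounding the entropy up to 2.5 kJ/°C.

```python
import math

MASS_KG = 10.0
SPECIFIC_HEAT = 3.5             # kJ per kg per kelvin
T_ROOM, T_COLD = 293.0, 273.0   # kelvin

heat_capacity = MASS_KG * SPECIFIC_HEAT                    # 35 kJ/K
heat_removed = heat_capacity * (T_ROOM - T_COLD)           # 700 kJ
# Integrating dE / T as the food cools gives C * ln(T_ROOM / T_COLD).
entropy_lost = heat_capacity * math.log(T_ROOM / T_COLD)   # kJ/K
# Rule #2 at room temperature: heat needed to create that much entropy.
heat_exhausted = entropy_lost * T_ROOM
extra_energy = heat_exhausted - heat_removed

print(round(entropy_lost, 3), round(heat_exhausted, 1), round(extra_energy, 1))
```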

Deleting a Bit

How much energy does it take to delete a bit of information in a computer? A bit could be in state 0 or state 1, and after deleting it, it will be in state 0 (say). That means that the process of deleting the bit takes all the microstates of the 0 macrostate and all the microstates of the 1 macrostate to microstates of the 0 macrostate. That halves the number of microstates, or subtracts $k_B \ln 2$ from the entropy.

In order to obey Rule #3, you therefore have to exhaust a minimum of $k_B T \ln 2$ energy as waste heat, where $k_B$ is Boltzmann’s constant and $T$ is the ambient temperature. This is called Landauer’s principle.
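For a sense of scale, here is a minimal sketch that evaluates this bound. The ambient temperature of 300 K is my own example value.

```python
import math

K_B = 1.380649e-23   # Boltzmann's constant in J/K (exact SI value)

def landauer_limit(temperature_kelvin):
    """Minimum waste heat, in joules, for erasing one bit: k_B * T * ln(2)."""
    return K_B * temperature_kelvin * math.log(2)

# At 300 K this is on the order of 3e-21 joules per bit.
print(landauer_limit(300.0))
```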

Gravity is Stronger Than I Thought

mkoconnor — Thu, 19 May 2016 02:57:21 +0000

I’m not a physicist, and I’d always supposed that, while the Earth has a significant gravitational pull because it’s so massive, the gravitational pull between everyday objects must be completely undetectable, or maybe only detectable with modern laboratory equipment.

But I only thought that because I never bothered to actually plug in any numbers. Using the formula $F = G m_1 m_2 / r^2$ for the force of gravity, you can see that if you have two 1-kilogram objects 0.1 meters apart, the acceleration due to gravity between them is enough to move each of them by 2.7 millimeters in just 15 minutes. Wow!

Of course, this makes sense, given that $G$ was measured all the way back in 1798 by Henry Cavendish. I think I knew that $G$ was measured quite long ago, but I just assumed it was based on some astronomical calculations, or was some sort of indirect inference. Nope, it turns out Henry Cavendish put some lead balls on a torsion balance and directly measured how much they attracted each other. Cool!
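Here is the plug-in-the-numbers step as a short sketch. It assumes the acceleration stays constant over the 15 minutes, which is reasonable because the distance between the objects barely changes; the function name and the rounded value of $G$ are mine.

```python
G = 6.674e-11   # gravitational constant in m^3 kg^-1 s^-2 (rounded)

def displacement(m_other, distance, seconds):
    """Distance moved from rest under the pull of a mass `m_other` at
    `distance` metres, treating the acceleration as constant."""
    acceleration = G * m_other / distance ** 2
    return 0.5 * acceleration * seconds ** 2

# A 1 kg object 0.1 m away, for 15 minutes: about 2.7 millimetres.
print(displacement(1.0, 0.1, 15 * 60) * 1000)
```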

The Arithmetic Hierarchy Meets the Real World

mkoconnor — Sun, 15 May 2016 00:30:42 +0000

Mathematical logic has a categorization of sentences in terms of increasing complexity called the Arithmetic Hierarchy. This hierarchy defines sets of sentences $\Pi_i$ and $\Sigma_i$ for all nonnegative integers $i$. The definition is as follows:

$\Pi_0$ and $\Sigma_0$ are both equal to the set of sentences $P$ such that a computer can determine the truth or falsity of $P$ in finite time. Sentences in $\Pi_0$ are essentially computations, things like $2 + 2 = 4$ or “For all $x$ less than a million, if $x > 2$ then $x^2 > 2x$.”

$\Pi_i$ for $i > 0$ is defined as follows: If, for all $n$, $P(n)$ is a $\Sigma_{i-1}$ sentence, then $\forall n\, P(n)$ is a $\Pi_i$ sentence.

Similarly, if for all $n$, $P(n)$ is a $\Pi_{i-1}$ sentence, then $\exists n\, P(n)$ is a $\Sigma_i$ sentence.

It’s not obvious from the definition, but by encoding pairs, triples, etc. of natural numbers as single natural numbers, you are allowed to include multiple instances of the same quantifier without moving up the hierarchy. For example, a sentence $\forall n\, \forall m\, P(n, m)$ is $\Pi_1$ if $P(n, m)$ is $\Pi_0$. Essentially $\Pi_i$ contains all statements with $i$ quantifier alternations where the outermost quantifier is a $\forall$ and $\Sigma_i$ contains all statements with $i$ quantifier alternations where the outermost quantifier is an $\exists$.

Many number-theoretic and combinatorial statements are $\Pi_1$: Fermat’s last theorem, Goldbach’s conjecture, and the Riemann Hypothesis (which can be seen to be in $\Pi_1$ by a formulation of Jeffrey Lagarias). By encoding finite groups as numbers, theorems like the classification of finite simple groups can also be seen to be $\Pi_1$.

Note that statements of the form $\forall n\, \exists m < f(n)\, P(n, m)$, where $f$ is some computable function and $P(n, m)$ is $\Pi_0$, are also in $\Pi_1$ since for a fixed $n$, $\exists m < f(n)\, P(n, m)$ is checkable in finite time by a computer.

There are many sentences of the form $\forall n\, \exists m\, P(n, m)$ in $\Pi_2$ for which we actually know a bound on $m$, so we get a sentence in $\Pi_1$ when we use that bound. For example, the statement that there are infinitely many primes is in $\Pi_2$, since it can be written as “For all $n$, there exists an $m > n$ such that $m$ is prime”, but we also know a version that’s in $\Pi_1$: for example, we know that there’s a prime between $n$ and $2n$ for any $n \geq 1$.
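To make the "bounded search" point concrete, here is a small sketch of the bounded version of the infinitely-many-primes statement: for each $n$ the inner existential only has to look at the finitely many candidates between $n$ and $2n$, so each instance is a finite computation. The helper names are made up, and the check only covers a finite range, so it illustrates the shape of the statement rather than proving it.

```python
def is_prime(k: int) -> bool:
    """Trial division; fine for the small numbers used here."""
    if k < 2:
        return False
    i = 2
    while i * i <= k:
        if k % i == 0:
            return False
        i += 1
    return True

def has_prime_between(n: int) -> bool:
    """Bounded search: is there a prime m with n <= m <= 2n?"""
    return any(is_prime(m) for m in range(n, 2 * n + 1))

# Each instance halts, so the universal statement is refutable by a finite search.
checked = all(has_prime_between(n) for n in range(1, 2000))
print(checked)
```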

There is a theorem that the sentences in this hierarchy actually do get harder to decide the truth of in general:

Theorem: For any $i$: even in a programming language augmented with the magical ability to decide the truth of $\Pi_i$ sentences, it is not possible to write a program which decides the truth of every $\Pi_{i+1}$ sentence.

A Scientific Hierarchy

What if we used the above idea to classify scientific statements instead of number-theoretic ones? We would get a new hierarchy where:

$\Pi_0$ and $\Sigma_0$ are both equal to the set of sentences $P$ such that a definite experiment can be performed which demonstrates the truth or falsity of $P$.
$\Pi_i$ and $\Sigma_i$ for $i > 0$ are defined just as before.

Examples of $\Pi_1$ statements in this classification would be things like: Any pair of objects dropped in a vacuum will fall with the same acceleration; or given any gram of liquid water at a standard pressure, adding 4.18 joules of energy will raise it 1 degree Celsius.

Of course, those examples are idealized: to make them actually true, you have to add in lots more caveats and tolerances: for example, you have to say that any pair of objects weighing no more than such-and-such will fall with the same acceleration up to such-and-such tolerance (and add in various other caveats). More on this later.

An example of a $\Pi_2$ scientific statement might be something like: if you put any two objects in thermal contact, then for any given tolerance, there will be a time at which their temperatures are equal to within that tolerance.

You might have to use a statement of complexity $\Pi_3$ to say something about randomness. For example, a $\Pi_3$ scientific statement might be: Suppose you are doing an experiment where you acquire a carbon-11 atom, wait for it to decay to boron-11 and record the time, then repeat. Then, for any tolerance $\epsilon$, there will be an $n$ such that for any $m \geq n$ the proportion of the first $m$ carbon-11 atoms that decayed in less than 20.334 minutes will be no more than $\epsilon$ away from 0.5.

A Scientific Hierarchy Theorem?

In the Arithmetic Hierarchy case, we had a theorem saying that statements higher up the hierarchy were qualitatively harder to decide the truth of. Is there a similar theorem here?

In fact, you can make this precise (although, to be honest, I’m not sure what the cleanest way to do it is). In particular, $\Pi_1$ statements are learnable in the limit of acquiring all the experimental data, while $\Pi_2$ statements aren’t. One way to make this rigorous is: Suppose you have a $\Pi_1$ statement $\forall x\, P(x)$, where for any object $x$ (assume that $x$ ranges over some countable set of physical objects $x_0, x_1, x_2, \ldots$), $P(x)$ can be established or refuted by a single experiment.

Now, suppose that you have a probability distribution over all possibilities for $P(x)$ for objects $x$: that is, you have a probability distribution $\mu$ on the Cantor space $2^{\omega}$ where the elements of the sample space are to be interpreted as full specifications of whether or not $P(x)$ holds for each $x$: e.g., things like $P(x_0) \wedge \neg P(x_1) \wedge P(x_2) \wedge \cdots$ or $\neg P(x_0) \wedge \neg P(x_1) \wedge \neg P(x_2) \wedge \cdots$.

Call events which are finite conjunctions of events of the form $P(x)$ or $\neg P(x)$ basic events.

Now say that this probability distribution $\mu$ is open-minded with respect to an event $E$ if, for any basic event $B$, if $E$ and $B$ are consistent, then $\mu(E \mid B) > 0$.

Now, assuming that $\mu$ is open-minded with respect to the event $\forall x\, P(x)$, it’s pretty easy to show that $\lim_{n \to \infty} \mu(\forall x\, P(x) \mid B_n) = 1$, where $B_n = P(x_0) \wedge \cdots \wedge P(x_n)$. That is, if $\forall x\, P(x)$ is actually true, it will be learned. (Of course, if it is false, that will be learned in finite time automatically.)

On the other hand, for general $\Pi_2$ statements, it’s pretty easy to see that this is not the case: In fact, any computable procedure assigning a probability for a statement $\forall x\, \exists y\, P(x, y)$ just based on seeing finitely many data points of the form $P(x, y)$ or $\neg P(x, y)$ can be tricked by some value of $P$ by using the definition of the computable procedure in $P$ itself.

I would like to emphasize though, that the above are off-the-cuff basically-trivial observations. There must be a cleaner, nicer framework to state a scientific hierarchy theorem, but I don’t know what it is.
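As a toy illustration of the "learnable in the limit" claim for universal statements, here is a sketch with one particular prior, which is purely my assumption: with probability $q$ the statement "$P$ holds of every object" is true, and otherwise each experiment comes out true or false like an independent fair coin flip. Bayes' rule then gives the posterior after $n$ confirming experiments, and it climbs toward 1.

```python
from fractions import Fraction

def posterior_all_true(prior_all_true: Fraction, n_confirmations: int) -> Fraction:
    """Posterior that P holds of every object after n confirming experiments.

    Toy prior (an assumption, not from the post): with probability q the
    universal statement holds; otherwise each outcome is a fair coin flip,
    so n confirmations in a row have probability 2^-n.
    """
    q = prior_all_true
    likelihood_otherwise = Fraction(1, 2) ** n_confirmations
    return q / (q + (1 - q) * likelihood_otherwise)

q = Fraction(1, 100)
posteriors = [posterior_all_true(q, n) for n in (0, 10, 20, 40)]
for n, p in zip((0, 10, 20, 40), posteriors):
    print(n, float(p))
```

The only thing that matters here is that the prior gives the universal statement nonzero weight; a prior with $q = 0$ would never learn it no matter how many confirmations came in.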

Science is Hard

In my opinion, thinking about the complexity of scientific statements in terms of quantifiers yields some insights.

For example, the $\Pi_1$ case is basically the scientific method as learned in elementary school: a hypothesis is generated, and by repeated experiments we either reject it, or gradually come to correctly accept the truth of it.

The fact that science is not so simple is seen by just recognizing that there are scientific hypotheses that are not in $\Pi_1$ form. However, probably even more significant is the fact that many real-world phenomena add quantifiers to our hypotheses:

Measurement errors. As alluded to above, to make our scientific statements true, we have to add tolerances. If we don’t know what the tolerances should be when we make the hypothesis, we have to add at least one quantifier.
Not everyone can perform the experiments. Not everyone performs experiments; in fact, most people rely on scientific consensus.
That automatically brings the hypothesis up to (at least) $\Sigma_2$, for example:
There exists a time $t$ such that for all times $t' > t$ the scientific consensus at time $t'$ will be in favor of global warming (or evolution, or that eggs are good for you, or whatever).

Where can I go for more information?

I feel like thinking about the complexity of scientific hypotheses in terms of quantifier alternations is so natural that it must have been studied before, but I can’t find anything by googling around. Does anyone know where to find more information on this?

YouTube Physics Explanations Shouldn’t Use the Right-Hand Rule

mkoconnor — Tue, 10 May 2016 03:32:46 +0000

Popular explanations of physical phenomena like gyroscopes or magnetic fields often end up having to explain the right-hand rule to explain how rotational quantities add (say, by using the right-hand rule to convert angular momenta into vectors, then adding the vectors).

This is bad, not just because the right-hand rule is confusing, but because it leads people to wonder if the right-hand rule has some physical reality. For example, see the comments on this YouTube explanation of gyroscopes by the PhysicsGirl: there’s a lot of confusion over whether the right-hand rule is a fact of nature or a convention.

It turns out that it’s unnecessary to use the right-hand rule because it’s actually unnecessary to convert rotational quantities to vectors at all, and I think many people are unaware of this.

Let me explain by analogy with the vector case. Given a particle, how do you represent its linear momentum? You take a vector whose direction is the same as the direction of the particle’s velocity, and whose magnitude is the particle’s mass times its speed.

A system of two particles has a linear momentum obtained by adding the linear momenta of the individual particles by putting the vectors head-to-tail.

Now, how do we represent the angular momentum of a particle with respect to some base point $O$? Usually, what you do is take the position vector $\vec{r}$ of the particle with respect to $O$ and the particle’s linear momentum vector $\vec{p}$, and define the angular momentum to be the cross product $\vec{r} \times \vec{p}$, which requires the right-hand rule. Then, as before, the angular momentum of two particles is the sum of the angular momentum of the particles individually.

However, there is another thing you can do: instead of taking the cross product $\vec{r} \times \vec{p}$, which is a vector, represent it as a 2-dimensional object: an oriented parallelogram called $\vec{r} \wedge \vec{p}$ which is in the same plane as $\vec{r}$ and $\vec{p}$ and whose area is $|\vec{r}|\,|\vec{p}| \sin\theta$, where $\theta$ is the angle between $\vec{r}$ and $\vec{p}$.

These add analogously to vectors: you add two oriented parallelograms by matching up edges of the same length and opposite orientation. (Apologies for the poor handwriting: the squiggles inside the parallelograms are meant to be arrows indicating orientation.)

Two of these parallelograms are declared to be equal if they have the same (signed) area and live in the same plane: in that way, you can always add any two of them by reshaping one of them to have the right side length to add with the other.

And, this addition is still appropriate: under this definition, the angular momentum of the system of particles is the sum of the angular momenta of the individual particles, and it has all the properties you want: it’s unchanged unless the system experiences a net torque (also represented by a parallelogram), etc.
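For anyone who wants to compute with these parallelograms, here is a minimal sketch. It stores an oriented parallelogram by its signed areas projected onto the xy, yz and zx coordinate planes; that coordinate representation and the names `Bivector` and `wedge` are my choices for illustration. Addition is then just componentwise, and no handedness convention is needed.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Bivector:
    """Oriented parallelogram, stored as signed areas in the xy, yz and zx planes."""
    xy: float
    yz: float
    zx: float

    def __add__(self, other: "Bivector") -> "Bivector":
        return Bivector(self.xy + other.xy, self.yz + other.yz, self.zx + other.zx)

    def area(self) -> float:
        """Unsigned area of the parallelogram."""
        return (self.xy**2 + self.yz**2 + self.zx**2) ** 0.5

def wedge(r, p) -> Bivector:
    """Oriented parallelogram spanned by r then p (antisymmetric in r and p)."""
    rx, ry, rz = r
    px, py, pz = p
    return Bivector(rx * py - ry * px, ry * pz - rz * py, rz * px - rx * pz)

# Two particles moving in the xy-plane: only the xy component is ever nonzero,
# so a planar problem never has to mention a third dimension.
first = wedge((1, 0, 0), (0, 2, 0))   # counterclockwise
second = wedge((0, 1, 0), (3, 0, 0))  # clockwise
total = first + second
print(first, second, total)
```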

This gets rid of the arbitrariness and complexities of the right-hand rule, and also has the nice property that for a situation restricted to a plane, you don’t have to arbitrarily introduce a third dimension not part of the problem.

These parallelograms are actually called bivectors, and this is just a very small part of a much larger enterprise called geometric algebra that allows algebraic manipulation of higher-dimensional objects just as linear algebra allows the algebraic manipulation of vectors. There’s a small cadre of physicists who think lots of physics should be redone with geometric algebra.

I’m not qualified to have an opinion on that, but I am pretty confident that using signed parallelogram addition in YouTube physics explanations would be clearer than using the right-hand rule.

A Complexity-Theoretic Account of The Strong Law of Small Numbers

mkoconnor — Mon, 09 May 2016 19:17:51 +0000

The Strong Law of Small Numbers (see also Wikipedia) says that “There aren’t enough small numbers to meet the many demands made of them.” It means that when you look at small numbers, it’s easy to see compelling patterns that turn out to be false when you look at larger numbers.

Using complexity theory, we can give a partial account of this phenomenon. The concept that we’ll use is length-conditional complexity: if $x$ is an $n$-bit number, $K(x|n)$ means the length of the shortest Turing machine that outputs $x$ when given $n$ as an input. In a previous post, I stated the following theorem:

Theorem: Suppose you have a computable property $P_{n,m}(x)$ of $n$-bit numbers $x$ such that:

For all $x$, $n$, $m$, $P_{n,m}(x)$ implies $P_{n,m+1}(x)$.

For all $n$, $m$, there are at least $2^n - 2^{n-m}$ $n$-bit numbers $x$ such that $P_{n,m}(x)$ holds.

Then there is a $c$ such that for all $n$ and $m$, $P_{n,m}(x)$ holds for every $x$ of length $n$ such that $K(x|n) > n + c - m$.

Furthermore, the number of $x$’s of length $n$ such that $K(x|n) > n + c - m$ is at least $2^n - 2^{n+c-m+1}$.

In this context, $P_{n,m}(x)$ might be of the form “$x$ is not close to some special set of numbers”, like “$x$ is not close to a square number” or “$x$ is not close to a prime”, where the meaning of “close” will depend on $n$ and $m$. Or it could be that some statistic about $x$ does not deviate from the expected value by some large amount (again, with the meaning of “large” depending on $n$ and $m$).

Now, suppose we have a bag of computable properties $P^1, \ldots, P^k$ that we’re interested in. Assume that we’re using the same $n$ and $m$ for each $P^i$ and that they are of similar complexity, meaning that the value of $c$ in the above theorem is the same for each of them.

Then assuming that $c$ is small, given a random $n$-bit number $x$, it is likely (with probability at least $1 - 2^{c-m+1}$) that $x$ satisfies $K(x|n) > n + c - m$, and thus $P^i_{n,m}(x)$ should hold for all $i$ without exception. If some $P^i_{n,m}(x)$ doesn’t hold, that means some assumption was violated, in particular that $P^i_{n,m}$ doesn’t hold for at least $2^n - 2^{n-m}$ $n$-bit numbers, which is good evidence that there’s some pattern or mathematical law here that you were unaware of (e.g., a large proportion of numbers are close to one in some special set). (Note that it is also possible that the violated assumption was the one about complexity, i.e., that the complexity of $P^i$ was higher than expected. However, that is tied pretty closely to the length of a program that implements $P^i$, so, assuming that you know how to compute $P^i$, it’s unlikely that it’s much more complex than you originally thought.)

On the other hand, if $c$ is large (say, $c \geq m$), then there may be no $x$’s such that $K(x|n) > n + c - m$. In that case, it may simply be that the properties $P^i$ are independent for different values of $i$. In that case, it’s not reasonable to assume that if some $P^i$ fails, then you have discovered a possible new mathematical law; it may just be a coincidence instead.

Furthermore, as $n$ gets smaller, you will be forced to decrease $m$, making $c - m$ larger, since if $m > n$, then each $P^i_{n,m}$ must be trivial: it must hold of at least $2^n - 2^{n-m}$ of the numbers, and the only way to do that is to hold for all of them.

I certainly don’t claim this captures everything about the Strong Law of Small Numbers. But I like this account, because it gives a way to think about what “small” means: namely, it means small enough that you’ve chosen an $m$ such that $m \leq c$, where $c$ is the complexity of the properties of numbers that you’re interested in.

Two Constants: Khinchin and Chaitin

mkoconnor — Fri, 06 May 2016 03:45:44 +0000

Take a real number, $x$. Write out its continued fraction:

$$x = a_0 + \cfrac{1}{a_1 + \cfrac{1}{a_2 + \cfrac{1}{a_3 + \cdots}}}$$

It’s an intriguing fact that if you look at the sequence of geometric means $a_1, (a_1 a_2)^{1/2}, (a_1 a_2 a_3)^{1/3}, \ldots$ this approaches a single constant, called Khinchin’s constant, which is approximately $2.685$, for almost every $x$. This means that if you were to pick $x$ (for convenience, say it’s between 0 and 1) by writing a decimal point and then repeatedly rolling a ten-sided die forever to generate the digits after that, the $x$ you generate would have this property with probability 1.
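If you want to play with this, here is a small sketch that computes continued fraction terms exactly using rational arithmetic and then takes the geometric mean of $a_1, \ldots, a_n$. The input is a 50-digit decimal truncation of $\pi$, which is only an illustrative choice; with so few terms the mean is nowhere near converged, so this shows the computation, not the limit.

```python
import math
from fractions import Fraction

def continued_fraction(x: Fraction, max_terms: int) -> list[int]:
    """First max_terms continued fraction terms of a rational x (exact arithmetic)."""
    terms = []
    for _ in range(max_terms):
        a = math.floor(x)
        terms.append(a)
        frac = x - a
        if frac == 0:  # rational inputs eventually terminate
            break
        x = 1 / frac
    return terms

def geometric_mean(terms: list[int]) -> float:
    return math.exp(sum(math.log(a) for a in terms) / len(terms))

# 50 decimal digits of pi; only the leading terms of its expansion are meaningful.
PI_50 = Fraction("3.14159265358979323846264338327950288419716939937510")
terms = continued_fraction(PI_50, 21)
gm = geometric_mean(terms[1:])  # skip a_0, as in the definition above
print(terms)
print(gm)
```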

However, as the Wikipedia page above says, although almost all $x$ have this property (call it the “Khinchin property”), no number that wasn’t specifically constructed to have the Khinchin property has been proven to do so (and some numbers, like $e$ and $\sqrt{2}$ and all rational numbers, have been shown to not have the Khinchin property).

If you want to be the first to find a number having the Khinchin property that wasn’t specifically constructed to have it, my advice is to try Chaitin’s Constant, $\Omega$. Roughly, you can think of $\Omega$ as the probability that a randomly selected Turing machine will halt, although there are a few more technicalities than that.

More importantly for our purposes, it’s very likely to have the Khinchin property, because it’s algorithmically random, meaning it has every computable property that almost all numbers have! That means that the following statement implies that $\Omega$ has the Khinchin property:

There is a computable function $\delta(n, m)$ satisfying: For all $n, m$, the set of numbers $x$ between 0 and 1 such that $\left|(a_1 a_2 \cdots a_k)^{1/k} - K_0\right| < 1/n$ for all $k > \delta(n,m)$ has measure at least $1 - 2^{-m}$ (here $a_1, a_2, \ldots$ are the continued fraction terms of $x$ and $K_0$ is Khinchin’s constant).

Proving that sounds a lot easier to me than, e.g., proving $\pi$ has the Khinchin property (note that the fact that $\delta$ is computable is the hard part).

However, some might quibble about whether or not this meets the original criterion: it’s definitely true that $\Omega$ wasn’t constructed to have the Khinchin property; however, in a certain sense, it was constructed to have every such property!

A Good Definition of Randomness

mkoconnor — Wed, 04 May 2016 02:37:53 +0000

Most mathy people have a pretty good mental model of what a random process is (for example, generating a sequence of 20 independent bits).

I think most mathy people also have the intuition that there’s a sense in which an individual string like 10101001110000100101 is more “random” than 00000000000000000000 even though both strings are equally likely under the above random process, but they don’t know how to formalize it, and may even doubt that there is any way to make sense of this intuition.

Mathematical logic (or maybe theoretical computer science) has a method for quantifying the randomness of individual strings: given a string $x$, the Kolmogorov complexity $K(x)$ of $x$ is the length of the shortest Turing machine that outputs it.
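Kolmogorov complexity itself isn't computable, but a general-purpose compressor gives a crude upper-bound stand-in that matches the intuition. The sketch below uses zlib as that proxy, which is my substitution and not the definition above, and uses 1000-character strings because compressor overhead swamps everything at length 20.

```python
import random
import zlib

def compressed_size(data: bytes) -> int:
    """Length of the zlib-compressed data: a rough proxy, not Kolmogorov complexity."""
    return len(zlib.compress(data, 9))

rng = random.Random(0)  # fixed seed so the example is repeatable
patterned = b"0" * 1000
noisy = bytes(rng.choice(b"01") for _ in range(1000))

print(compressed_size(patterned), compressed_size(noisy))
```

The all-zeros string compresses to a handful of bytes, while the coin-flip string stays far larger.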

In this blog post, I would like to explain why I think this is a very satisfying definition.

Keeping Grounded

I think a good way to help avoid philosophical quagmires when thinking about randomness is to recognize that random numbers are useful in the real world, and to make sure that your thinking about randomness preserves that.

For example, there are algorithms $A$ that take a fixed-length string $x$ of length $n$, and produce the correct answer to whatever problem they’re trying to solve on some large proportion $p$ of all the length-$n$ strings. Then a good approach would just be to feed $A$ a random $x$, and you’ll get the right answer with probability $p$.

Just to give a concrete example: a very familiar way that random numbers are useful is to estimate the average of a large list of numbers by taking a random sample and averaging them. You might have a list of 1000 numbers (say, bounded between 0 and 10), and have $x$ encode a set of 100 indices, then $A(x)$ will return the average of the numbers at those indices. If you say that $A$ succeeds for this problem if it returns an average that’s within some fixed tolerance of the true average, then you can work out $p$ for the given tolerance (although I think getting exact numbers for this problem is actually pretty tricky).
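Since the exact success proportion is awkward to work out by hand, here is a sketch that just estimates it by simulation. The list contents (uniform between 0 and 10), the tolerance of 1.0, and the number of trials are all arbitrary choices of mine for illustration.

```python
import random

def sample_mean_succeeds(values, sample_size, tolerance, rng) -> bool:
    """Average a random set of indices; succeed if within tolerance of the true mean."""
    true_mean = sum(values) / len(values)
    indices = rng.sample(range(len(values)), sample_size)
    estimate = sum(values[i] for i in indices) / sample_size
    return abs(estimate - true_mean) <= tolerance

rng = random.Random(0)  # fixed seed so the run is repeatable
values = [rng.uniform(0, 10) for _ in range(1000)]

trials = 2000
successes = sum(sample_mean_succeeds(values, 100, 1.0, rng) for _ in range(trials))
p_estimate = successes / trials
print(p_estimate)
```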

The reason that I think that the Kolmogorov complexity is a good account of randomness is that the above story “factors” through Kolmogorov complexity in the following way: For any computable $A$ where $p$ is high enough (in a sense to be made precise below), there is an integer $c$ such that:

For all $x$ with $K(x|n) > c$, $A(x)$ returns a correct answer.
Almost all $x$ (of the given length $n$) have $K(x|n) > c$.

That is, Kolmogorov complexity lets you view the problem as follows: Any string of high complexity will yield the right answer when fed into $A$, so the only role of randomness is as an easy way to generate a string of a high Kolmogorov complexity.

As a note: the notation $K(x|n)$ means the length of the shortest Turing machine that outputs $x$ when given $n$ as an input. The reason for using this concept instead of $K(x)$ is that we want to, e.g., consider any string of all 0s to be low complexity, even if the length of the string happens to be a high complexity number.

Some Rough Intuitions

The intuition for why almost all strings should have high Kolmogorov complexity is that there are only so many Turing machines: For example, there are $2^n$ strings of length $n$ and fewer than $2^k$ Turing machines of length less than $k$, so the proportion of strings of Kolmogorov complexity at least $k$ must be at least $1 - 2^{k-n}$.
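The counting argument is easy to turn into numbers. This sketch assumes machines are encoded as binary strings, so there are $2^k - 1$ of them shorter than $k$ bits, and each can describe at most one string; the function names are made up.

```python
from fractions import Fraction

def machines_shorter_than(k: int) -> int:
    """Number of binary strings (candidate machines) of length less than k."""
    return 2**k - 1

def min_fraction_with_complexity_at_least(n: int, k: int) -> Fraction:
    """Lower bound on the share of n-bit strings with no description shorter than k bits."""
    describable = min(machines_shorter_than(k), 2**n)
    return Fraction(2**n - describable, 2**n)

bound = min_fraction_with_complexity_at_least(n=20, k=10)
print(bound, float(bound))
```

So at least 99.9% of 20-bit strings cannot be described in fewer than 10 bits, whatever the description language is.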

The intuition for why $A$ should be correct for all strings of sufficiently high complexity is as follows: We’re presuming that $A$ is correct for most strings, and that $A$ is computable. If $A(x)$ isn’t correct, that means you can describe $x$ fairly succinctly: i.e., as the $i$th string for which $A$ isn’t correct. This will be a short description since, by presumption, $i$ will be small.

Formalization

I said above that this fact about Kolmogorov complexity only holds if $p$ is high enough. How can we formalize this? One approach would be to consider a sequence of algorithms $A_i$ instead of a single $A$ as above. Each algorithm $A_i$ should return a correct answer on at least $1 - 2^{-i}$ of its input strings. Furthermore, the different algorithms should be consistent: specifically, if $A_i(x)$ returns the correct answer, then so should $A_j(x)$ for $j > i$.

Now, if we kept the size $n$ of the input string fixed, then this would be trivial, since for $i$ greater than $n$, $A_i$ would have to return the correct answer on any string. So we should also consider algorithms $A_{i,n}$ that take input strings of length $n$ and give a correct answer on at least $1 - 2^{-i}$ of those strings. (And if $i > n$, we will have to define “correct answer” for $A_{i,n}$ so that every input string returns a correct answer. Thus $A_{i,n}$ won’t be very useful, but we can look at $A_{i,n'}$ for higher $n'$s.)

In fact, it turns out we can just describe $A_{i,n}$ in terms of the sets of input strings on which the algorithm returns a correct answer.

Definition: A P-test is an assignment of a natural number $\delta(x)$ to each finite string $x$ such that, for each $n$ and $m$, the number of $x$ of length $n$ such that $\delta(x) \geq m$ is at most $2^{n-m}$.

If $x$ has length $n$, then $\delta(x) + 1$ corresponds to being the smallest $i$ such that $A_{i,n}(x)$ returns a correct answer in our discussion above.

Theorem (Martin-Löf?): For any computable P-test $\delta$, there is a constant $c$ such that: For all $x$ of length $n$ and natural numbers $m$, if $K(x|n) \geq n + c - m$, then $\delta(x) < m$.

Furthermore, the proportion of $x$ of length $n$ such that $K(x|n) \geq n + c - m$ is at least $1 - 2^{c-m}$.

I think this was one of Martin-Löf’s original theorems but I’m actually not sure. It’s a rephrasing of the results in Section 2.4 of Li and Vitányi’s book.

So, there is a complexity bound $n + c - m$ such that any string of high enough complexity will return a correct answer when plugged into the algorithm. However, $m$ may have to be made high (which corresponds to making $p$ high) to ensure that there are a large number of such high complexity strings (or any at all).

What about Noisy Data?

The algorithms discussed above are all deterministic: that is, they correspond to things like Monte Carlo integration rather than averaging noisy data collected from the real world.

So what about noisy data? Random numbers are also useful in analyzing real world data, but the theorem above only applies to computable algorithms. The answer is so simple that it seems like cheating: if you model the noise in your data as coming from some infinite binary sequence $X$, you can simply redo the whole thing but with Turing machines that have access to $X$! In other words, you won’t get theorems about $K(x|n)$, but you will get theorems about $K^X(x|n)$, which is the length of the shortest Turing machine that has access to $X$ and outputs $x$ when given $n$ as an input.

What about Infinite Random Sequences?

Above we considered algorithms that knew ahead of time how many random bits we need. What about algorithms that might request a random bit at any time? This is also handled by Kolmogorov complexity: here we say that an infinite binary sequence is Martin-Löf random if there is some $c$ such that each prefix of the sequence of length $n$ has complexity at least $n - c$. (There actually has to be a technical change to the definition of complexity of finite strings in this case.)

As in the finite case, there’s a theorem saying that any sufficiently robust algorithm will yield a correct answer on any Martin-Löf random sequence.

One thing I like about this framework is that it provides an idea for what it means for a single infinite sequence to be random. For example, people often say that the primes are random (in fact, it’s one of their main points of interest). Since the primes are computable, they aren’t random in this sense, but this gives an idea of what it might mean: perhaps there’s some programming language that encapsulates “non-number-theoretic” ideas in some way, and some sequence derived from the primes can be shown to be “Martin-Löf” random with Turing machines replaced by this weaker description language. But this is pure speculation.

Why is the derivative of a generating function meaningful?

mkoconnor — Fri, 21 Aug 2015 14:59:45 +0000

A generating function is a formal power series where the sequence of coefficients is the object of interest. Usually the point of using them is that operations on the power series (like addition, multiplication, and differentiation) correspond to meaningful operations on your sequence of coefficients.

I’ve known about the gist of generating functions for a while, and I’d always thought that the fact that differentiation was meaningful was just a magical coincidence (for some reason, addition and multiplication being meaningful didn’t seem as surprising to me).

But recently, Nathan Linger pointed out to me that over in the functional programming community, they have what I think is a very satisfying answer to this question (he said he got it from sigfpe’s blog, but I’m not sure what post, maybe this one?).

Combinatorial Species

I actually find the general concept of generating functions surprisingly slippery. André Joyal’s notion of a combinatorial species makes things more concrete for me. A combinatorial species is simply a functor $F$ from $\mathbb{B}$ to itself, where $\mathbb{B}$ is the category of finite sets and bijections. The idea is that, for a finite set $A$, $F(A)$ should be considered as the set of all structures of a certain kind on $A$.

For example: $L$ is the functor taking a set $A$ to the set of all linear orders of elements of $A$, so $|L(A)| = |A|!$. Another example is $T$, which takes $A$ to the set of all trees on elements of $A$.

Each species $F$ has an associated generating function $F(x) = \sum_{n \geq 0} f_n \frac{x^n}{n!}$ where $f_n = |F(\{1, \ldots, n\})|$ (you could use any set of cardinality $n$ instead of $\{1, \ldots, n\}$).

It is now possible to make precise the fact that meaningful operations on species correspond to meaningful operations on generating functions. For example, if you define addition on species by letting $(F + G)(A)$ be the disjoint union of $F(A)$ and $G(A)$, then the generating function of the sum of two species is the sum of the generating functions. Similarly, multiplication on species is defined by letting an element of $(F \cdot G)(A)$ be a pair of an element of $F(A_1)$ and an element of $G(A_2)$ for some partition $A = A_1 \sqcup A_2$, and it corresponds to multiplication of generating functions.

As mentioned before, there’s also an operation on species which corresponds to differentiation of the generating function. It corresponds to an addition of a “hole” in the structure. That is, $F'(A) = F(A \sqcup \{*\})$, and $F'$ takes bijections on $A$ to the output of $F$ on the same bijection but fixing $*$.

This is a very powerful fact. For example, a linear order with a hole is the same thing as the product of two linear orders (the one to the left of the hole and the one to the right of the hole). If $L(x)$ is the generating function of $L$, this gives us the equation $L'(x) = L(x)^2$. Since we also know that $L(0)$ should be 1, this gives us $L(x) = \frac{1}{1-x} = \sum_{n \geq 0} n! \frac{x^n}{n!}$. Thus, if we didn’t know it already, we can deduce that there are $n!$ linear orders on $n$ elements.
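The "linear order with a hole is a pair of linear orders" identity can be checked on the level of counts. In this sketch a species is represented only by its list of counts $f_0, f_1, f_2, \ldots$ (structures on an $n$-element set), which is a simplification I'm assuming for illustration: the derivative shifts the list, and the product sums over ways to split the set in two.

```python
from itertools import permutations
from math import comb, factorial

def count_linear_orders(n: int) -> int:
    """Count linear orders on an n-element set by brute force."""
    return sum(1 for _ in permutations(range(n)))

def derivative(counts: list[int]) -> list[int]:
    """Species derivative: structures on n points plus one hole, so f'_n = f_{n+1}."""
    return counts[1:]

def product(f: list[int], g: list[int]) -> list[int]:
    """Species product: split the n points into two parts, one structure on each."""
    size = min(len(f), len(g))
    return [sum(comb(n, k) * f[k] * g[n - k] for k in range(n + 1)) for n in range(size)]

L = [count_linear_orders(n) for n in range(8)]
print(L)
print(derivative(L) == product(L, L)[:7])  # counts agree with L' = L * L
```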

Furthermore, there is a notion of composition of species corresponding to composition of generating functions, where intuitively an element of $(F \circ G)(A)$ is a partition of $A$ together with a $G$-structure on each element of the partition, and an $F$-structure on the partition as a whole.

So why is differentiation meaningful?

What is the connection between differentiation of species and ordinary differentiation? Two main ways of approaching ordinary differentiation are through limits or through infinitesimals. There don’t seem to be any limits around, so let’s focus on infinitesimals.

Although you can make infinitesimals precise in various ways, most people who think about calculus using infinitesimals do so in a non-rigorous way. Here’s one common non-rigorous principle:

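Let $\varepsilon$ be such that $\varepsilon \neq 0$ but $\varepsilon^2 = 0$. Then for any (smooth, real-valued) $f$ and real $x$, $f(x + \varepsilon) = f(x) + \varepsilon f'(x)$.

Of course, there is no such $\varepsilon$ in the standard definition of the real numbers.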
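One standard way to make this principle rigorous, at least for polynomials, is the dual numbers: pairs $a + b\varepsilon$ multiplied with the rule $\varepsilon^2 = 0$. Here is a minimal sketch; the `Dual` class and the example polynomial are my own illustration, not something from the argument being described.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Dual:
    """A number real + eps * epsilon, where epsilon squared is declared to be 0."""
    real: float
    eps: float = 0.0

    def __add__(self, other):
        other = _lift(other)
        return Dual(self.real + other.real, self.eps + other.eps)
    __radd__ = __add__

    def __mul__(self, other):
        other = _lift(other)
        # (a + b e)(c + d e) = ac + (ad + bc) e, since the e^2 term vanishes.
        return Dual(self.real * other.real, self.real * other.eps + self.eps * other.real)
    __rmul__ = __mul__

def _lift(value):
    """Treat plain numbers as duals with zero epsilon part."""
    return value if isinstance(value, Dual) else Dual(value)

def f(x):
    return 3 * x * x * x + 2 * x + 1  # f'(x) = 9x^2 + 2

result = f(Dual(2.0, 1.0))  # evaluate f at 2 + epsilon
print(result)  # Dual(real=29.0, eps=38.0), i.e. f(2) + epsilon * f'(2)
```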

Although this is non-rigorous, if we can translate the same non-rigorous principle over to species, that would give a good account of why differentiation shows up in generating functions. The main insight is what the meaning of $\varepsilon$ should be. The condition that $\varepsilon \neq 0$ for species simply means that there should be at least one $\varepsilon$-structure on some set. The condition that $\varepsilon^2 = 0$ is subtler: it means that you can’t put an $\varepsilon$-structure on two sets at the same time. As in the case with real numbers, this is impossible, but the reasoning works anyway so we’ll go with it.

Now let’s think about what $F(X + \varepsilon)$ means for species $F$ and $X$. An $(X + \varepsilon)$-structure on a set is either an $X$-structure or an $\varepsilon$-structure. By the definition of composition of species, an $F(X + \varepsilon)$-structure on a set $A$ is a partition of $A$ together with an $(X + \varepsilon)$-structure on each element of the partition and an $F$-structure on the partition as a whole. This means that each element of the partition is given either an $X$-structure or an $\varepsilon$-structure.

But, by the infinitesimal nature of $\varepsilon$, at most one element of the partition can be given an $\varepsilon$-structure. That means there are two cases: 0 elements of the partition have an $\varepsilon$-structure or 1 does. If 0 elements do, every element of the partition has an $X$-structure, and the species is $F(X)$. On the other hand, if one does, we can describe the species by saying which one does (with a hole in the $F$-structure) and what the $\varepsilon$-structure was. That’s $\varepsilon F'(X)$. Therefore $F(X + \varepsilon) = F(X) + \varepsilon F'(X)$!

Note that if we didn’t know what the formula for a species with a hole was, we could go through the preceding informal reasoning and deduce that it should correspond to the derivative! I don’t know if others are convinced, but I find this quite satisfying. To be totally clear, as mentioned before, this argument came from sigfpe’s blog and I don’t know if it has history before that.

Complexity to Simplicity and Back Again

mkoconnor — Mon, 13 Feb 2012 04:29:46 +0000

Generalizing a problem can make the solution simpler or more complicated, and it’s often hard to predict which beforehand. Here’s a mini-example of a puzzle and four generalizations which alternately make it simpler or more complicated.

Warning: The solutions are given right after the puzzles. If you want to think about them, cover the screen.

Puzzle: There are 10 prisoners labeled $1, \ldots, 10$. A malicious warden puts a black or white hat on each. Prisoner $i$ can see prisoner $j$’s hat iff $i < j$. The warden has each prisoner guess the hat color on their head, in order starting from prisoner 1. The prisoners can hear previous guesses.

If a prisoner guesses right, they are freed, otherwise they are sent back to jail. Given that the prisoners can strategize as a group beforehand, for how many prisoners can they guarantee freedom in the worst case?

It turns out that the prisoners can guarantee the freedom of all prisoners except prisoner 1: Prisoner 1 first counts the number of black hats, then guesses black if it’s even and white if it’s odd. Now prisoner 2 knows their hat color, since they heard prisoner 1’s guess and they can count the number of black hats they see. Once prisoner 2 guesses correctly, prisoner 3 can guess correctly using prisoner 1 and 2’s answer, and so forth.

Generalization 1: What if there are 3 hat colors? What if there are $k$ hat colors? What if the hat colors are drawn from an arbitrary, possibly infinite set $C$?

This generalization makes the solution simpler, since it reveals something about what was going on in the first solution.

As long as the prisoners can put a group structure on the hat colors, they can run the same strategy as before: Prisoner 1 adds up all the colors they can see and announces that. Each subsequent prisoner adds up all the colors they can see and the correct guesses they have heard, and subtracts that from the sum that prisoner 1 announced.
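Here is a sketch of that strategy with the hat colors taken to be the integers mod $k$ under addition, which is one convenient choice of group and my assumption here. It brute-forces every hat assignment and checks that everyone after the first prisoner guesses correctly; with $k = 2$ it is the parity strategy from the original puzzle.

```python
from itertools import product

def run_strategy(hats, k):
    """hats[i] is prisoner i's color in Z_k; prisoner i sees hats[i+1:].

    Returns every prisoner's guess, in the order the guesses are made.
    """
    # The first prisoner announces the sum of all the hats they can see.
    announced = sum(hats[1:]) % k
    guesses = [announced]
    for i in range(1, len(hats)):
        seen = sum(hats[i + 1:])    # hats still ahead of prisoner i
        heard = sum(guesses[1:i])   # earlier guesses, all of which were correct
        guesses.append((announced - seen - heard) % k)
    return guesses

def all_but_first_freed(num_prisoners: int, k: int) -> bool:
    """Check every possible hat assignment exhaustively."""
    return all(
        run_strategy(hats, k)[1:] == list(hats[1:])
        for hats in product(range(k), repeat=num_prisoners)
    )

print(all_but_first_freed(10, 2))  # the original black/white puzzle
print(all_but_first_freed(6, 3))   # three colors
```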

Generalization 2: What if the set of prisoners $P$ is $\omega$, i.e., there are infinitely many prisoners, arranged like $1, 2, 3, \ldots$?

The solution to this generalization makes things more complicated: it uses a trick which is not related to the previous problems, and which is not clear how to generalize.

It turns out there is still a strategy to free all but the first prisoner.

Here’s the new trick: Consider the set of all possible assignments of hat colors to prisoners. Let for iff agrees with for all but finitely many prisoners. This is an equivalence relation. Beforehand, the prisoners choose a representative from each equivalence class (this requires the axiom of choice).

Since each prisoner can see all but finitely many of the other prisoners’ hats, each prisoner knows which equivalence class they’re in. Then the solution is similar to before: The first prisoner adds up the finitely many differences between the hat colors and the chosen representative, and the subsequent ones can then all correctly deduce their own hat color.

Generalization 3: What if the prisoners are ordered like an arbitrary ordinal? What if they are ordered by an arbitrary linear ordering?

This solution makes things simpler, as it reveals what was going on in the previous solution. This problem was solved by Chris Hardin and Alan Taylor.

The answer is that, for an arbitrary linear order, the prisoners have a strategy that does not use communication (i.e., nobody can hear anyone else’s guess) and guarantees that the set $W$ of prisoners who guess wrong does not have an infinite ascending chain.

If the order is an ordinal, this guarantees that $W$ is finite, and then the prisoners can use communication to save all but the first prisoner. If the order is the reals, then this guarantees that they can free all prisoners except those in a set of measure zero.

The solution is surprisingly simple. The prisoners agree beforehand on a well-ordering $\prec$ of the set of assignments of hat colors to prisoners, then each prisoner takes the $\prec$-least assignment consistent with what they can see, and guesses the hat color they have in that assignment.

As an exercise, try showing that an infinite ascending sequence of wrong guesses would translate into an infinite descending sequence of hat color assignments, and thus contradict well-ordering.

Generalization 4: What if the visibility relation is just a relation, not necessarily transitive?

Chris Hardin and Alan Taylor considered this generalization in this paper. It turns out that things become complicated again.

For example, suddenly the number of different hat colors matters. Hardin and Taylor prove the following striking theorem:

Theorem: Suppose the prisoners are labeled by the natural numbers and that each even-numbered prisoner can see all higher-numbered odd prisoners and each odd-numbered prisoner can see all higher-numbered even prisoners. Suppose that no prisoner can hear anyone else’s guess. Then:

If there are 2 hat colors, the prisoners have a strategy guaranteed to save infinitely many prisoners.

If there are hat colors, the prisoners have no strategy guaranteed to save anybody.

If there are hat colors, whether or not the prisoners have a strategy that can save anybody is independent of ZFC.

Generating Functions as Cardinality of Set Maps

mkoconnor — Sat, 24 Dec 2011 18:57:28 +0000

There is a class of all cardinalities, and it has elements $0$, $1$, and operations $+$, $\times$, and so forth defined on it. Furthermore, there is a map $|\cdot|$ which takes sets to cardinalities such that $|A \times B| = |A| \times |B|$ (and so on).

Ordinary generating functions can be thought of entirely analogously with set maps replacing sets: There is a class $G$ with elements $0$, $1$, and operations $+$, $\times$. Furthermore, there is a (partial) map $g$ from set maps to $G$ such that $g(F \times H) = g(F) \times g(H)$ (and so on). Here, $F \times H$ is defined by $(F \times H)(X) = F(X) \times H(X)$. Other operations on set maps (like disjoint union) are similarly defined pointwise.

(This is probably obvious and trivial to anyone who actually works with generating functions, but it only occurred to me recently, so I thought I’d write a blog post about it.)

The class is in fact a set, and is just the set of formal power series . The partial map takes to just in case is “canonically isomorphic” (a notion I’ll leave slippery and undefined but that can be made precise) to the map , where indicates disjoint union.

That provides a semantics for ordinary generating functions. Furthermore, this semantics has a number of features beyond those of cardinality. For example, in addition to respecting and , represents composition.

A similar semantics can be provided for exponential generating functions, but it takes a little more work. In particular, we have to single out as a distinguished set. Let be the smallest set containing all measurable subsets of for any finite and which is closed under finite products, countable disjoint unions, and products with sets for finite .

We can define the measure of all sets in by extending Lebesgue measure in the obvious way (taking the product of a set with will multiply the measure by ). Furthermore, notice that, by construction, every element of every set in is a tuple which (after flattening) has all of its elements either natural numbers or elements of and has at least one element of . Therefore, we can define a pre-ordering on by comparing the corresponding first elements that are in .

The point of all that is that, for , we can form the set which will again be in and its measure will be . The corresponding statement with cardinality is not true since you have to worry about the case when elements in the tuple are equal () but the set of tuples that have duplicates has measure 0, so by working with measure, we can get the equality we want.

Finally, let be the set of formal power series . The partial map takes to just in case is “canonically isomorphic” to the map for all in . Just as before, this map respects , , composition, etc.

Note that the exponential generating functions are usually explained via labeled objects and some sort of relabeling operation. This approach weasels out of that by observing that the event that there was a label collision has probability 0, so you can just ignore it.

Mathematica and Quantifier Elimination

mkoconnor — Thu, 15 Dec 2011 05:38:22 +0000

In 1931, Alfred Tarski proved that the real ordered field allows quantifier elimination: i.e., every first-order formula is equivalent to one with no quantifiers. This is implemented in Mathematica’s “Resolve” function.

The Resolve function is called like Resolve[formula,domain] where domain gives the domain for the quantifiers in formula. Since we’ll always be working over $\mathbb{R}$ in this blog post, let’s set that to be the default at the start.

In[1]:= Unprotect[Resolve]; Resolve[expr_] := Resolve[expr, Reals]; Protect[Resolve];

Now let’s see what quantifier elimination lets you do!

(A couple of caveats first though: First, many of these algorithms are extremely inefficient. Second, I had some trouble exporting the Mathematica notebook, so I basically just copy-and-pasted the text. Apologies if it’s unreadable.)

How many solutions?

Let’s start with just existential formulas. By eliminating quantifiers from $\exists x\, \phi(x, a)$, we can tell what the conditions are on $a$ such that there’s at least one solution $x$. For example:

In[2]:= Resolve[Exists[x, x^2 + b x + c == 0]]
Out[2]= -b^2 + 4 c <= 0

This just tells you that there’s a solution to the quadratic if the discriminant is non-negative. Let’s turn this into a function:

In[3]:= atLeastOneSolution[formula_, variable_] := Resolve[Exists[variable, formula]]

Now we can verify that cubics always have solutions:

In[4]:= atLeastOneSolution[x^3 + b x^2 + c x + d == 0, x]
Out[4]= True

Now suppose we wanted to find when something has at least two solutions. Just like resolving $\exists x\, \phi(x)$ told us when there was at least one, $\exists x_1 \exists x_2\, (x_1 \neq x_2 \wedge \phi(x_1) \wedge \phi(x_2))$ will be true exactly when there are at least two.

This is just as easy to program as atLeastOneSolution was, except that when we create the variables $x_1$ and $x_2$ we have to be careful to avoid capture (what if one of those two already appeared in $\phi$?). Mathematica provides a function called Unique where if you call Unique[], you’re guaranteed to get back a variable that’s never been used before. With that we can define atLeastTwoSolutions correctly (edit: actually, this isn’t right if the passed-in variable is also bound in the passed-in formula):

In[5]:= atLeastTwoSolutions[formula_, v_] := With[{s1 = Unique[], s2 = Unique[]}, Resolve[Exists[{s1, s2}, s1 != s2 && (formula /. v -> s1) && (formula /. v -> s2)]]]

We can check this by verifying that quadratics have two solutions when the discriminant is strictly positive:

In[6]:= atLeastTwoSolutions[x^2 + b x + c == 0, x]
Out[6]= -b^2 + 4 c < 0

Here’s the condition for the cubic to have at least two solutions:

In[7]:= atLeastTwoSolutions[x^3 + b x^2 + c x + d == 0, x]
Out[7]= c < b^2/3 && 1/27 (-2 b^3 + 9 b c) - 2/27 Sqrt[b^6 - 9 b^4 c + 27 b^2 c^2 - 27 c^3] <= d <= 1/27 (-2 b^3 + 9 b c) + 2/27 Sqrt[b^6 - 9 b^4 c + 27 b^2 c^2 - 27 c^3]

Note that (and I believe Resolve always does this) the condition given first is sufficient that the later square root is well-defined:

In[8]:= Resolve[ForAll[{b, c}, c < b^2/3 ⇒ b^6 - 9 b^4 c + 27 b^2 c^2 - 27 c^3 > 0]]
Out[8]= True
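The result in Out[8] can also be checked by hand, because the expression under the square root is a perfect cube. The expansion below is a worked check added for illustration:

```latex
\begin{align*}
(b^2 - 3c)^3 &= b^6 - 3\,b^4(3c) + 3\,b^2(3c)^2 - (3c)^3 \\
             &= b^6 - 9 b^4 c + 27 b^2 c^2 - 27 c^3 .
\end{align*}
```

So $c < b^2/3$ is the same as $b^2 - 3c > 0$, which makes the cube, and hence the expression under the square root, strictly positive.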
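
It’s clear that we can determine when there are at least $n$ solutions by a very similar trick: just resolve $\exists x_1 \cdots \exists x_n\, \big(\bigwedge_{i \neq j} x_i \neq x_j \wedge \bigwedge_i \phi(x_i)\big)$.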

We’ll first write a helper function to produce the conjunction of inequalities we’ll need:

In[9]:= noneEqual[vars_] := And @@ Flatten[Table[If[s1 === s2, True, s1 != s2], {s1, vars}, {s2, vars}]]
In[10]:= noneEqual[{x, y, z}]
Out[10]= x != y && x != z && y != x && y != z && z != x && z != y

And now we’ll write atLeastNSolutions:

In[11]:= atLeastNSolutions[formula_, v_, n_] := With[{sList = Array[Unique[] &, n]}, Resolve[ Exists[sList, noneEqual[sList] && (And @@ Table[formula /. v -> s, {s, sList}])]]]

Given atLeastNSolutions, we can easily write exactlyNSolutions:

In[12]:= exactlyNSolutions[formula_, v_, n_] := BooleanConvert[ atLeastNSolutions[formula, v, n] && ! atLeastNSolutions[formula, v, n + 1]]

I used BooleanConvert instead of Resolve since there won’t be any quantifiers left in the formula, so we just have to do Boolean simplifications.

In[13]:= exactlyNSolutions[x^3 + b x^2 + c x + d == 0, x, 2]
Out[13]= ! 1/27 (-2 b^3 + 9 b c) - 2/27 Sqrt[b^6 - 9 b^4 c + 27 b^2 c^2 - 27 c^3] < d < 1/27 (-2 b^3 + 9 b c) + 2/27 Sqrt[b^6 - 9 b^4 c + 27 b^2 c^2 - 27 c^3] && 1/27 (-2 b^3 + 9 b c) - 2/27 Sqrt[b^6 - 9 b^4 c + 27 b^2 c^2 - 27 c^3] <= d <= 1/27 (-2 b^3 + 9 b c) + 2/27 Sqrt[b^6 - 9 b^4 c + 27 b^2 c^2 - 27 c^3] && c < b^2/3
In[14]:= exactlyNSolutions[x^2 + b x + c == 0, x, 1]
Out[14]= -b^2 + 4 c <= 0 && -b^2 + 4 c >= 0

This last calculation shows that a quadratic has exactly one solution exactly when the discriminant is both nonnegative and nonpositive (as you can see, there is no guarantee that the formula will be in its simplest form).

We now have a way to test whether a formula with one free variable has $n$ solutions for specific values of $n$, since exactlyNSolutions will return either True or False if you quantify out the only variable. For example:

In[15]:= p = x^4 - 3 x^3 + 1
Out[15]= 1 - 3 x^3 + x^4
In[16]:= Plot[Evaluate[p], {x, -3, 3}]

In[17]:= exactlyNSolutions[p == 0, x, 2]
Out[17]= True

It would be nice, however, to have a function which will just tell you how many solutions such a formula has.

In the single-variable polynomial case, we could just try exactlyNSolutions for $n = 0, 1, 2, \ldots$ until we find the right $n$. However, there might not be finitely many solutions if the formula involves inequalities or higher-dimension polynomials (e.g., $x^2 + y^2 = 1$ has infinitely many solutions).

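How can we tell if a formula has infinitely many solutions? Well, the fact that $\mathbb{R}$ has quantifier elimination implies that $\{x : \phi(x)\}$ for $\phi$ with just $x$ free must be a finite union of points and open intervals (since the only quantifier-free atomic formulas are $p(x) = 0$ and $p(x) > 0$ for polynomials $p$). Therefore $\{x : \phi(x)\}$ is infinite iff it contains a non-empty open interval, i.e., iff $\exists a\, \exists b\, \big(a < b \wedge \forall x\, (a < x < b \Rightarrow \phi(x))\big)$.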

In[18]:= infinitelyManySolutions[formula_, v_] := With[{a = Unique[], b = Unique[]}, Resolve[Exists[{a, b}, a < b && ForAll[v, a < v < b ⇒ formula]]]]

To test:

In[19]:= infinitelyManySolutions[Exists[y, x^2 + y^2 == 1], x]
Out[19]= True

Now we can write numberOfSolutions and be assured that it will always (theoretically) terminate for any formula with a single free variable:

In[20]:= numberOfSolutions[formula_, v_] := If[infinitelyManySolutions[formula, v], Infinity, Block[{n = 0}, While[! exactlyNSolutions[formula, v, n], n++]; n]]

A few examples:

In[21]:= numberOfSolutions[p == 0, x]
Out[21]= 2
In[22]:= numberOfSolutions[p > x^2, x]
Out[22]= ∞
In[23]:= numberOfSolutions[p > x^6 + 5, x]
Out[23]= ∞
In[24]:= numberOfSolutions[p > x^6 + 6, x]
Out[24]= 0
In[26]:= Plot[{p, x^6 + 5, x^6 + 6}, {x, -1.6, -1}, PlotLegend -> {HoldForm[p], x^6 + 5, x^6 + 6}, LegendPosition -> {1, 0}, ImageSize -> Large]

Up to now, all our functions have taken single variables, but we can accommodate tuples of variables as well. First, we’ll define the analogue of noneEqual to produce the formula asserting that none of the given tuples are equal (recall that two tuples are unequal iff a pair of corresponding components is unequal):

In[27]:= noTuplesEqual[tuples_] := And @@ Flatten[Table[If[t1 === t2, True, Or @@ MapThread[#1 != #2 &, {t1, t2}]], {t1, tuples}, {t2, tuples}]]
In[28]:= noTuplesEqual[{{x[1], y[1]}, {x[2], y[2]}}]
Out[28]= (x[1] != x[2] || y[1] != y[2]) && (x[2] != x[1] || y[2] != y[1])

Now we can add rules to our old function to deal with tuples of variables as well:

In[29]:= atLeastNSolutions[formula_, variables_List, n_] := With[{sList = Array[Unique[] &, {n, Length[variables]}]}, Resolve[Exists[Evaluate[Flatten[sList]], noTuplesEqual[sList] && And @@ Table[formula /. MapThread[Rule, {variables, tuple}], {tuple, sList}]]]];

We can extend infinitelyManySolutions by observing that a formula has infinitely many solutions iff some projection does.

In[30]:= infinitelyManySolutions[formula_, variables_List] := Or @@ Table[infinitelyManySolutions[Exists[Select[variables, ! (# === v) &], formula], v], {v, variables}]
In[33]:= ContourPlot[{x^2 + y^3 - 2, x^2 + y^2/4 - 2}, {x, -3, 3}, {y, -3, 3}]

In[34]:= exactlyNSolutions[x^2 + y^3 == 2 && x^2 + y^2/4 == 2, {x, y}, 2]
Out[34]= False

(There are actually four solutions. This example of a set of equations for which it’s difficult to tell how many solutions there are by graphing is from Stan Wagon.)

Solving Polynomial Equations

In the last section, we saw how to use quantifier elimination to find out how many roots there are. But how can you actually find the roots?

In a certain sense, you’ve already found them just when you identified how many there are! To “find” a root in this sense, you just introduce a new symbol for it, and have some means for answering questions about its properties. Given some property $P$, if you want to determine if it holds of the 6th root of some polynomial $q$ with 17 roots, then you just have to decide $\exists x_1 \cdots \exists x_{17}\, \big(x_1 < \cdots < x_{17} \wedge \bigwedge_i q(x_i) = 0 \wedge P(x_6)\big)$.

We can implement this by a function withSpecificRoot, that takes a variable, the formula it’s supposed to be a solution to, which of the roots it is (counting from the smallest), the total number of roots, and a formula in which you want to use this root:

In[35]:= withSpecificRoot[variable_, rootFormula_, whichRoot_, totalRoots_, formula_] := With[{roots = Array[Unique[] &, totalRoots]}, Resolve[Exists[Evaluate[roots~Join~{variable}], Less[Sequence @@ roots] && variable == roots[[whichRoot]] && (And @@ Table[(rootFormula /. variable -> root), {root, roots}]) && formula]]]

We can tell where various roots are with respect to already-known real numbers:

In[36]:= withSpecificRoot[x, x^2 - 3 == 0, 1, 2, x < 3]
Out[36]= True
In[37]:= withSpecificRoot[x, p == 0, 1, 2, x < 1]
Out[37]= True
In[38]:= withSpecificRoot[x, p == 0, 2, 2, x < 1]
Out[38]= False

We can also compute relationships between roots like $\sqrt{5 + 2\sqrt{6}} = \sqrt{2} + \sqrt{3}$:

In[39]:= withSpecificRoot[sqrt6, sqrt6^2 == 6, 2, 2, withSpecificRoot[lhs, lhs^2 == 5 + 2 sqrt6, 2, 2, withSpecificRoot[sqrt3, sqrt3^2 == 3, 2, 2, withSpecificRoot[sqrt2, sqrt2^2 == 2, 2, 2, lhs == sqrt3 + sqrt2]]]]
Out[39]= True

That’s all I have time for now, but I hope to write another blog post on the subject soon!

A Logical Interpretation of Some Bits of Topology

mkoconnor — Sat, 09 Jul 2011 09:26:56 +0000

Edit: These ideas are also discussed here and here (thanks to Qiaochu Yuan: I found out about those links by him linking back to this post).

Although topology is usually motivated as a study of spatial structures, you can interpret topological spaces as being a particular type of logic, and give a purely logical, non-spatial interpretation to a number of bits of topology.

This seems like one of those facts that was obvious to everyone else already, but I’ll write a quick blog post about it anyway.

As you’re probably aware, a set $S$ of natural numbers is called semi-decidable if there is a computer program which, given any $n$, will eventually terminate and return “Yes” if $n \in S$. If $n \notin S$, the program is not required to ever return and you may never learn whether or not $n \in S$.

There are many such “semi-decidable” propositions $P$ unrelated to natural numbers which intuitively have the same property: i.e., there is some test you can perform such that, if $P$ is true, you will eventually find out, but if it’s false, you may never learn that fact. For example, consider the proposition $P$ that you are (strictly) taller than 6 feet. To test $P$, you could measure your height $h$ with ever-finer rulers. If your height is actually strictly greater than 6 feet, you will eventually find out when you use a ruler with granularity finer than $h - 6$ to measure your height. On the other hand if $P$ is false and you are unfortunate enough that $h$ is exactly 6, you will never learn whether or not $P$ is true no matter how fine a ruler you use.

Semi-Decidable Logic

Let’s come up with a logic for such semi-decidable propositions. We’ll keep it a propositional logic to keep things simple. First off, notice that we shouldn’t allow negation: if $P$ is semi-decidable, it’s not necessarily the case that $\neg P$ is. On the other hand, we can allow conjunction: if $P$ and $Q$ are semi-decidable, then you can test $P \wedge Q$ by testing $P$ and $Q$ separately and stopping if and when both the tests for $P$ and $Q$ stop.

Furthermore, we can allow arbitrary disjunctions: given $\{P_i\}_{i \in I}$, we can test $\bigvee_{i \in I} P_i$ by running all the tests in parallel and stopping when any of them stop. Note that even given that we can run arbitrarily many tests at the same time, it still doesn’t follow that arbitrary conjunctions of semi-decidable propositions are semi-decidable: if the first one terminates after 1 minute, the second after 2, etc., we’ll never be able to stop the test of the conjunction even though all tests terminate eventually.
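To make the testing procedures concrete, here is a rough Python sketch in which a test is a generator that yields False until it has “stopped”. Everything here is an illustrative assumption: the names are made up, the ruler at step $k$ has granularity $2^{-k}$, a step budget stands in for “you may never learn”, and the disjunction handles only finitely many tests.

```python
from itertools import count

def taller_than(height, threshold):
    """Semi-decidable test for height > threshold using ever-finer rulers."""
    for k in count():
        eps = 2.0 ** -k
        # A ruler of granularity eps only certifies the lower bound height - eps.
        yield height - eps > threshold

def conj(test_p, test_q):
    """Stops if and when both tests have stopped."""
    p_done = q_done = False
    while True:
        p_done = p_done or next(test_p)
        q_done = q_done or next(test_q)
        yield p_done and q_done

def disj(tests):
    """Runs finitely many tests in parallel; stops when any of them stops."""
    tests = list(tests)
    while True:
        yield any([next(t) for t in tests])  # the list advances every test one step

def run(test, max_steps):
    """Return the step at which the test stopped, or None if the budget ran out."""
    for step, stopped in zip(range(max_steps), test):
        if stopped:
            return step
    return None
```

For instance, `run(taller_than(6.1, 6), 100)` stops after a few refinements, while `run(taller_than(6.0, 6), 100)` exhausts its budget, mirroring the unlucky case where your height is exactly 6 feet.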

Implication is a bit tricky: $P \to Q$ isn’t necessarily semi-decidable for the same reason that $\neg P$ isn’t, but we still want to reason about the case where $P$ implies $Q$. Therefore, we’ll allow the formation of the statement $P \vdash Q$ with the meaning that $P$ implies $Q$, but only at the “top level”, i.e., you can’t nest this connective.

Finally, both $\top$ and $\bot$ are semi-decidable.

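Now we need rules to tell us when a set of statements implies another statement. First off, there are some boring structural rules that I’ll omit (e.g., $P \vdash Q$ and $Q \vdash R$ imply $P \vdash R$ and so on).

There are rules that give the two connectives their meaning:

For any $P$ and $Q$, $P \wedge Q \vdash P$ and $P \wedge Q \vdash Q$ hold.
For any $P$, $Q$, and $R$, the statements $R \vdash P$ and $R \vdash Q$ together imply $R \vdash P \wedge Q$.
For any $\{P_i\}_{i \in I}$ and $j \in I$, $P_j \vdash \bigvee_i P_i$ holds.
For any $\{P_i\}_{i \in I}$ and $Q$, the set of statements $\{P_i \vdash Q\}_{i \in I}$ implies the statement $\bigvee_i P_i \vdash Q$.

Finally, there’s a distributivity rule:

For any $P$ and $\{Q_i\}_{i \in I}$ the statements $P \wedge \bigvee_i Q_i$ and $\bigvee_i (P \wedge Q_i)$ are equivalent (each implies the other).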

Topology

As you’ve probably guessed, there is a close connection between semi-decidable logics and topological spaces. In fact, given a topological space, you can form a semi-decidable logic by making a propositional symbol $P_U$ for each open set $U$. You can then interpret all propositional formulas as open sets by interpreting $\vee$ as union and $\wedge$ as intersection. Finally, take as a set of axioms the set of all statements $\phi \vdash \psi$ where the open set corresponding to $\phi$ is a subset of the open set corresponding to $\psi$. This set is closed under the inference rules given.
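The translation from a space to a logic can be tried out on a small finite example. The sketch below is only an illustration: the five-open-set topology on three points and all function names are made up, propositions are represented directly by their open sets, and the top-level implication is subset inclusion.

```python
points = frozenset({1, 2, 3})
# A made-up topology on three points, listed by its open sets.
opens = [frozenset(), frozenset({1}), frozenset({2}), frozenset({1, 2}), points]

def disj(sets):
    """Disjunction is interpreted as union (the empty disjunction is the empty set)."""
    return frozenset().union(*sets)

def conj(a, b):
    """Conjunction is interpreted as intersection."""
    return a & b

def entails(a, b):
    """The top-level statement 'a implies b' is taken as an axiom iff a is a subset of b."""
    return a <= b

def is_topology(opens, points):
    """Check closure under the finite unions and intersections used here."""
    opens = set(opens)
    return (frozenset() in opens and points in opens and
            all(a & b in opens and a | b in opens for a in opens for b in opens))

def distributive(opens):
    """Check the distributivity rule for every triple of open sets."""
    return all(conj(p, disj([q, r])) == disj([conj(p, q), conj(p, r)])
               for p in opens for q in opens for r in opens)

axioms = [(u, v) for u in opens for v in opens if entails(u, v)]
```

Here `axioms` contains, for example, the pair `({1}, {1, 2})` but not its converse.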

You can also start with a semi-decidable logic and generate a topology; this is a form of Stone duality. In general, if you start with a topological space, translate to a semi-decidable logic, then translate back, you might not get your original space back. However, you will if the space you start with is sufficiently nice (e.g., Hausdorff).

With that out of the way, let’s interpret some topological concepts in our new logical framework!

Topologically, a neighborhood is any set which contains a non-empty open set. Logically, these correspond to propositions that are possible to learn. In the height example, the proposition that your height is in $[6, 7]$ is a neighborhood since you might happen to learn it by learning some stronger fact, but you can’t run a semi-decidable test for exactly that proposition. In contrast, the proposition that your height is exactly 6 ft is not even a neighborhood.
Topologically, an open covering is a set of open sets whose union covers the whole space. Logically, this corresponds to a deterministic experiment. If you run the tests in parallel, you are guaranteed that eventually (at least) one of them will stop, since the sets cover the whole space. In the height example, the open covering of all open intervals of diameter $\epsilon$ corresponds to measuring your height to a granularity of $\epsilon$ and recording the results.
Topologically, an open covering $\mathcal{V}$ is a refinement of an open covering $\mathcal{U}$ if every element of $\mathcal{V}$ is a subset of some element of $\mathcal{U}$. Logically, this corresponds to experiment $\mathcal{V}$ being more informative than experiment $\mathcal{U}$. Whatever answer you get from experiment $\mathcal{V}$, you will be able to answer the question asked by experiment $\mathcal{U}$.
Topologically, a space is compact if every open cover has a finite refinement. Logically, this means that anything about the space that you could find out by any experiment at all is actually discoverable by an experiment that runs only finitely many tests and hence is (maybe) doable in real life.
Topologically, a space has Lebesgue covering dimension $n$ if all open covers have a refinement with no $n + 2$ of the open sets having non-empty intersection. Logically, this corresponds to something like a bound on the amount of information you can get from one experiment. The information you get from running an experiment is just the list of propositions (open sets) which you’ve learned are true. The condition guarantees that that list will be no longer than $n + 1$, bounding the information received from the experiment. This makes spatial sense too: a measurement on $\mathbb{R}$ in general yields less information than a single measurement on $\mathbb{R}^2$.

Actually, that last correspondence was the whole impetus for me writing this blog post: I never really understood the definition of Lebesgue covering dimension from a spatial perspective, but it makes perfect sense to me from a logical perspective.

Miscellaneous

Here are a few more random facts which may or may not be accurate and/or make sense.

Geometric Logic

I believe that “semi-decidable logic” as I presented it is in fact the propositional version of geometric logic. Geometric logic also has a higher-order form: just as the propositional form corresponds to topologies, I believe the full higher-order form corresponds to toposes.

Sheaves

I think you can extend this interpretation of topological spaces to an analogous one for sheaves. I believe it’s something like: a sheaf corresponds to a set of solutions to some problem that you learn more about as you learn more semi-decidable propositions. In particular, the gluing property corresponds to the following fact: if you can determine via an experiment which of $P$ or $Q$ holds, and you have a solution given $P$ and a solution given $Q$ that are compatible, then you have a solution: run the experiment, then use the solution of whichever of $P$ or $Q$ turns out to be true.

Making sense of this is left as an exercise to the reader.

Grothendieck Topologies

I said above that I’d address the assumption that you can run arbitrarily many tests at once. I believe that, among many other things, Grothendieck topologies remove this restriction.

Regular topologies have the sort-of-odd property that the open covering relation is completely determined by the partial ordering on open sets given by inclusion. Grothendieck topologies do away with this: in a Grothendieck topology, there is in addition to the partial order an assignment for every open set of which sets of open sets are deemed to define “open covers”. Grothendieck topologies also remove the restriction that the “partial order” on open sets is a partial order; it’s allowed to be a more general category.

The Spectrum From Logic to Probability

mkoconnor — Sat, 18 Sep 2010 22:36:29 +0000

Let $\mathcal{P}$ be the set of propositions considered by some rational logician (call her Sue). Further, suppose that $\mathcal{P}$ is closed under the propositional connectives $\wedge$, $\vee$, $\neg$. Here are two related but different preorders on $\mathcal{P}$:

$p \le_L q$ if $p$ logically entails $q$.
$p \le_S q$ if Sue considers $q$ at least as likely to be true as $p$ is.

Let $\sim_L$ be the equivalence relation defined by $p \sim_L q$ iff $p \le_L q \le_L p$ and let $\sim_S$ similarly be defined by $p \sim_S q$ iff $p \le_S q \le_S p$.

Then we know what type of structure $\mathcal{P}/{\sim_L}$ is: since we’re assuming classical logic in this article, it’s a Boolean algebra. What type of structure is $\mathcal{P}/{\sim_S}$?

We can at least come up with a couple of examples. Since Sue is a perfect logician, it must be that if $p \le_L q$, then $p \le_S q$. If Sue is extremely conservative, she may decline to offer opinions about whether one proposition is more likely to be true than another except when she’s forced to by logic. In this case, $\mathcal{P}/{\sim_S}$ is equal to $\mathcal{P}/{\sim_L}$ and therefore again a Boolean algebra.

In the other extreme, Sue may have opinions about every pair of propositions, making $\mathcal{P}/{\sim_S}$ a total ordering. A principal example of this is where $\mathcal{P}/{\sim_S}$ is isomorphic to a subset of $[0, 1]$ and Sue’s opinions about the propositions were generated by her assigning a probability $\Pr(p) \in [0, 1]$ to every proposition $p$.

What’s in between on the spectrum from logic to probability? Are there totally ordered structures not isomorphic to $[0, 1]$ or a subset? More ambitiously: every Boolean algebra has operations $\wedge$, $\vee$, $\neg$, while $[0, 1]$ has operations $\cdot$, $+$, $1 - x$ which play similar roles in the computation of probabilities (note that $+$ is partial on $[0, 1]$). How are these related and does every structure on the spectrum from logic to probability have analogous operations?

These structures (i.e., structures of the form for some acceptable in a sense to be defined below) were called scales and defined and explored in a very nice paper by Michael Hardy.

The Definition of a Scale

Modding out by the equivalence relations once and for all, the general setup is that we have a map $\rho \colon A \to S$ (induced by the identity function on propositions in the above setup) from a Boolean algebra $A$ to a poset $S$. What should be true of $\rho$?

Since if a proposition $p$ logically entails a proposition $q$, Sue will consider $q$ at least as likely to be true as $p$, we should have that $p \le q$ implies $\rho(p) \le \rho(q)$ ($\le$ will now be the ordering in either $A$ or $S$, depending on context). In fact, we should have that $p < q$ implies $\rho(p) < \rho(q)$.

Actually we should have more: For example, it should be the case that if , then . In general, if is a propositional formula where appears negatively (that is, all occurrences of are negated in a normal form of ), then should imply and the reverse is true if appears positively in . Furthermore, if we can require that the inequality be strict.

Finally, we should require that not just if , but even if it only holds that . That is, even if doesn’t logically entail , if you consider more likely to be true than , you should consider more likely to be true than . A similar generalization to holds as above.

These considerations are equivalent to Hardy’s definition:

Let $A$ be a Boolean algebra, $S$ be a poset, and $\rho \colon A \to S$. Then $\rho$ is called a basic scaling if:

$\rho$ is strictly increasing, so that $a < b$ implies $\rho(a) < \rho(b)$.

preserves relative complementation, so that if and , then , where is the relative complement .

Hardy proves that the relative complement operation is well-defined on , that is, that depends only on , , and . Note however, that it is a partial operation: even if in , there is no guarantee that there such that , , .

A scale is then defined as a poset together with a partial relative complement operation which is the range of a basic scaling.

An Example

Hardy’s paper gives many examples of scales, including a few pretty wild ones. Here’s one: Let be the boolean algebra of subsets of . Let iff or . Let iff . This defines a basic scaling to a scale .

What does it look like? Every element except for has an immediate predecessor, and every element except for has an immediate successor. Therefore, it is partitioned into “galaxies” together with in initial galaxy and a final galaxy . Between any two galaxies that are comparable, there are uncountably many galaxies and infinite antichains of galaxies.

Analogues of , ,

We already know that there are appropriate analogues of in all scales, since we know that relative complementation carries over in a well-defined way from the domain Boolean algebra.

What about ? Hardy proves the following:

If in , then depends only on and . In this case we define to be .

For , if exists then .

It turns out that, for any , the operation is a partial injective map. Let be its inverse.

Hardy calls a scale divided if the necessary condition for existing given by (2) above is also sufficient. He proves:

For any divided scale, and , in Boolean algebra ,

In other words, all divided scales do have a operation, which satisfies the appropriate law from probability theory.

Finding an analogue of or is trickier, and, when he wrote the paper, Hardy only knew how to do it in the case that the scale is linearly ordered and Archimedean, defined as follows:

Let . Then is called infinitesimal if there is an infinite subset such that for such that for all .

A scale is called Archimedean if it is divided and has no nonzero infinitesimals.

The idea behind the definition of infinitesimal is that, assigning the Boolean algebra a total measure of 1, the measures of the elements of must approach 0.

In that case, you can define a division as follows: Let be the maximum number of times can be subtracted from , let be the maximum number of times that result can be subtracted from , and so on. The quotient is then defined as the continued fraction:

Then the map maps the scale injective to a subscale of (in particular, preserving ). Thus, can be pulled back from its definition on .
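The repeated-subtraction recipe is the usual Euclidean algorithm for continued fractions. The Python sketch below runs it on ordinary rational numbers rather than on scale elements. The names and the convention that it expands $a/b$ as $n_0 + 1/(n_1 + 1/(n_2 + \cdots))$ are assumptions for illustration, and Hardy’s paper may orient the quotient differently.

```python
from fractions import Fraction

def continued_fraction_by_subtraction(a, b, max_terms=20):
    """Coefficients [n0, n1, ...] of a/b, found using only subtraction and comparison."""
    terms = []
    while b > 0 and len(terms) < max_terms:
        n = 0
        while a >= b:          # subtract b from a as many times as possible
            a -= b
            n += 1
        terms.append(n)
        a, b = b, a            # next, subtract the remainder from the old b
    return terms

def evaluate(terms):
    """Fold the coefficients back into n0 + 1/(n1 + 1/(n2 + ...))."""
    value = Fraction(terms[-1])
    for n in reversed(terms[:-1]):
        value = n + 1 / value
    return value
```

For rational inputs the loop terminates by itself. The `max_terms` cutoff only matters for irrational ratios, where the expansion is infinite.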

Topology and First-Order Modal Logic

mkoconnor — Sat, 13 Mar 2010 07:23:02 +0000

The normal square root function can be considered to be multi-valued.

Let’s momentarily accept the heresy of saying that the square root of a negative number is $0$, so that our function will be total.

How can we represent the situation of this branching “function” topologically?

One thing we could do is just take the graph of the multivalued “function” itself in the subspace topology of $\mathbb{R}^2$, which is topologically just like this:

This has the downside that, at the origin (the place where all three lines meet), it doesn’t really represent the fact that the graph of the original multivalued “function” was the union of two genuine single-valued functions: There is no neighborhood of the origin which is functional in any way.

But what about this topological space? (The two filled-in dots represent points which are present in the space, the non-filled-in dot represents a point missing from the space. The border is not part of the space.)

Here, the open sets are given by a basis consisting of: open sets on any of the three branches: i.e., like this

and this:

as well as open sets of this form:

and this form:

(Note that this space isn’t Hausdorff!).

This space (call it $E$) represents the fact that the original graph was the union of two genuine single-valued functions in the following sense: There is a function, $\pi$, from $E$ to the $x$-axis (i.e., $\mathbb{R}$) such that for all points $e \in E$, there is an open set $U \ni e$ such that $\pi$ restricted to $U$ is a homeomorphism to an open set in $\mathbb{R}$. That is, every point in $E$ has a neighborhood such that the inverse of $\pi$ restricted to that neighborhood is a function.

The space $E$ as defined above is (sort of) a sheaf over $\mathbb{R}$. More precisely, it’s an étalé space: To restate the definition above more generally, an étalé space over a topological space $X$ is a topological space $E$ together with a function $\pi \colon E \to X$ such that, for all $e \in E$, there is an open set $U \ni e$ such that $\pi(U)$ is open and $\pi|_U \colon U \to \pi(U)$ is a homeomorphism. In general, for any $x \in X$, the set $\pi^{-1}(x)$ is called the stalk over $x$. Note that, as a subspace of $E$, any stalk is discrete.

As another example along similar lines, gluing the domains of the different branches of the complex logarithm gives rise to a sheaf over : (This image is from Wikipedia and is by Jan Homann.)

In this case, is the space pictured, is (or ), and is projection along the depicted vertical direction. Note that this represents how the domains of the various branches of the complex logarithm fit together; this graph is not a depiction of the complex logarithm or any branch of the complex logarithm, which is a function from to , and thus hard to draw!

The two examples I gave were of natural functions which happen to be multivalued, but there are much more general examples of étalé spaces. For example, for any topological spaces $X$ and $Y$, there is a sheaf of all continuous functions from $X$ to $Y$! The étalé space corresponding to this sheaf would have, for each $x \in X$, the stalk equal to the set of germs of continuous functions from $X$ to $Y$ at $x$. (See here for more).

First-Order Modal Logic

And now we’ll switch to a seemingly totally different topic.

A modal logic is a logic that contains an operator $\Box$, representing necessity, and $\Diamond$, representing possibility. If $\varphi$ is a proposition, then $\Box\varphi$ should be interpreted as the proposition “It is necessary that $\varphi$”, or “$\varphi$ is necessarily true”, and $\Diamond\varphi$ should be interpreted as the proposition “It is possible that $\varphi$ is true”.

The operators $\Box$ and $\Diamond$ are dual to each other: $\Box\varphi$ is equivalent to $\neg\Diamond\neg\varphi$ and similarly $\Diamond\varphi$ is equivalent to $\neg\Box\neg\varphi$ (if you think about it, this actually makes real-life sense, as well as just sense in logic-land!).

There are a number of different axiomatic systems for propositional modal logic; here we’ll just consider S4, which was invented by C. I. Lewis in the early 20th century. It has the following rules:

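For all propositions $\varphi$, $\Box\varphi \to \varphi$ is a theorem.
For all propositions $\varphi$, $\Box\varphi \to \Box\Box\varphi$ is a theorem.
For all propositions $\varphi$ and $\psi$, $\Box(\varphi \to \psi) \to (\Box\varphi \to \Box\psi)$ is a theorem.
For any proposition $\varphi$, if $\varphi$ is a theorem, then $\Box\varphi$ is a theorem. (Note that this does not say that $\varphi \to \Box\varphi$ is a theorem; if it did, the whole thing would be trivial!).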

In the middle of the 20th century, Saul Kripke invented possible-worlds semantics for modal logics. The idea is that there is a set $W$ of possible worlds and at each possible world $w \in W$, each atomic proposition may hold or not, independently of the other possible worlds. Furthermore, there is a relation $R$ between possible worlds; $w \mathrel{R} w'$ should be interpreted as “World $w$ considers world $w'$ possible”. The whole setup is called a Kripke frame.

This gives a semantics for modal logic: For any proposition $\varphi$, $\Diamond\varphi$ holds at a world $w$ if there is some world $w'$ that $w$ considers possible such that $\varphi$ holds at $w'$. Similarly, $\Box\varphi$ holds at a world $w$ if $\varphi$ holds at every world $w'$ that $w$ considers possible.

Somebody showed (probably Kripke, but I’m not sure) that the class of Kripke frames where $R$ is reflexive and transitive corresponds to S4, in the sense that all theorems of S4 hold in all such Kripke frames, and everything which holds in all such Kripke frames is a theorem of S4.
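The possible-worlds semantics just described is easy to try out on a small finite frame. Here is a minimal sketch, not anything from Kripke or from a particular library: the tuple encoding of formulas and the names `holds` and `valid` are assumptions made up for this example. It checks two of the S4 rules on a three-world frame whose relation is reflexive and transitive, and shows that "if a proposition holds then it necessarily holds" fails there.

```python
from itertools import product

# Formulas are nested tuples (an encoding assumed for this sketch):
#   ("atom", "p"), ("not", f), ("imp", f, g), ("box", f), ("dia", f)

def holds(f, w, worlds, access, val):
    """Return True if formula f holds at world w.

    access is a set of pairs (w, w2) meaning "w considers w2 possible";
    val maps each atom name to the set of worlds where that atom holds.
    """
    kind = f[0]
    if kind == "atom":
        return w in val[f[1]]
    if kind == "not":
        return not holds(f[1], w, worlds, access, val)
    if kind == "imp":
        return (not holds(f[1], w, worlds, access, val)
                or holds(f[2], w, worlds, access, val))
    seen = [w2 for w2 in worlds if (w, w2) in access]
    if kind == "box":
        return all(holds(f[1], w2, worlds, access, val) for w2 in seen)
    if kind == "dia":
        return any(holds(f[1], w2, worlds, access, val) for w2 in seen)
    raise ValueError(f"unknown connective: {kind}")

def valid(f, worlds, access, atoms=("p",)):
    """True if f holds at every world under every valuation of the atoms."""
    subsets = [frozenset(w for w, keep in zip(worlds, bits) if keep)
               for bits in product([False, True], repeat=len(worlds))]
    return all(holds(f, w, worlds, access, dict(zip(atoms, choice)))
               for choice in product(subsets, repeat=len(atoms))
               for w in worlds)

p = ("atom", "p")
worlds = [0, 1, 2]
# Reflexive and transitive: 0 sees everything, 1 sees 1 and 2, 2 sees itself.
preorder = {(0, 0), (0, 1), (0, 2), (1, 1), (1, 2), (2, 2)}

t_axiom = ("imp", ("box", p), p)                       # box p -> p
four_axiom = ("imp", ("box", p), ("box", ("box", p)))  # box p -> box box p
collapse = ("imp", p, ("box", p))                      # p -> box p

print(valid(t_axiom, worlds, preorder))     # True
print(valid(four_axiom, worlds, preorder))  # True
print(valid(collapse, worlds, preorder))    # False
```

Dropping reflexivity breaks the first rule: on the two-world frame where world 0 sees only world 1, `valid(t_axiom, [0, 1], {(0, 1)})` is false.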

Note that the accessibility relation $R$ is completely clear-cut and discrete: given a world $w$, you know exactly what worlds it considers possible. But, interpreting $R$ as a measure of “closeness” of two worlds, observe that topology gives us a more nuanced version of what “closeness” means: we think of the topology on $\mathbb{R}$, for example, as defining what “closeness” means on $\mathbb{R}$, even though no two fixed real numbers are actually close to one another! (At least, they’re not close to one another in any absolute sense.)

It turns out we can incorporate this sense of “closeness” into the semantics for modal logic as well. We’ll define a topological Kripke model this way: We again have a set $W$ of possible worlds, but instead of defining a relation $R$ between them, we define a topology on $W$. As above, we say, for each possible world $w$, which of the atomic propositions hold at that world. Note that this is equivalent to choosing, for each atomic proposition, an arbitrary subset of $W$. In general, we’ll let $[\![\varphi]\!]$ be the set of worlds at which proposition $\varphi$ holds. Then we can define $[\![\Box\varphi]\!]$ to be the interior of $[\![\varphi]\!]$ and $[\![\Diamond\varphi]\!]$ to be the closure of $[\![\varphi]\!]$. In other words, $\Box\varphi$ holds at a world $w$ if there is some open set $U \ni w$ such that $\varphi$ holds at every world $w' \in U$, and $\Diamond\varphi$ holds at a world $w$ if for all open sets $U \ni w$ there is some $w' \in U$ such that $\varphi$ holds at $w'$.

It turns out that this semantics, too, corresponds to S4, in the sense that all theorems of S4 hold in all topological Kripke models and all propositions which hold in all topological Kripke models are theorems of S4. (I’m not exactly sure who first came up with this. I looked at the Wikipedia article and some of its links, but I couldn’t quite figure out who was the first. It seems Tarski was involved somehow anyway.)
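To get a feel for the interior/closure reading of necessity and possibility, here is a small sketch on a finite topological space. The three-point space and the helper names (`interior`, `closure`, `all_subsets`) are hypothetical choices for illustration only. The loop checks, for every set of worlds, the topological counterparts of the first three S4 rules.

```python
from itertools import combinations

def interior(s, opens):
    """Largest open set inside s: the union of all open subsets of s."""
    result = set()
    for u in opens:
        if u <= s:
            result |= u
    return frozenset(result)

def closure(s, opens, space):
    """Closure as the complement of the interior of the complement."""
    return space - interior(space - s, opens)

def all_subsets(space):
    items = sorted(space)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

# A hypothetical three-world space whose open sets form a nested chain.
space = frozenset({0, 1, 2})
opens = [frozenset(), frozenset({0}), frozenset({0, 1}), space]

for a in all_subsets(space):
    # "necessarily phi implies phi": the interior sits inside the set.
    assert interior(a, opens) <= a
    # "necessarily phi implies necessarily necessarily phi".
    assert interior(interior(a, opens), opens) == interior(a, opens)
    for b in all_subsets(space):
        # Interior distributes over intersection (conjunction).
        assert interior(a & b, opens) == interior(a, opens) & interior(b, opens)

# If p holds exactly at worlds 1 and 2, "necessarily p" holds nowhere,
# and if p holds only at world 0, "possibly p" holds everywhere.
print(interior(frozenset({1, 2}), opens))       # frozenset()
print(closure(frozenset({0}), opens, space))    # frozenset({0, 1, 2})
```

The last two lines show the "closeness" flavour: world 0 lies in every nonempty open set, so anything true at world 0 is possible at every world.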

So far, the logics we’ve considered have all been propositional, but you can easily add first-order logic to S4 to get FOS4. There have been a number of proposals for how to define a semantics for FOS4. In 2008, Steve Awodey and Kohei Kishida proposed that the right generalization of topological semantics for S4 to first-order semantics for FOS4 was to use étalé spaces!

Here’s an example of how it works. Consider the following étalé space:

The top space, $E$, is homeomorphic to $\{x \in \mathbb{R} \mid x > 0\}$. The bottom space, $X$, is the circle $S^1$. The projection is given by .

We can put a first-order structure on this étalé space by considering each stalk $E_x$ for $x \in X$ to be a first-order universe, just as when defining regular semantics for first-order logic.

We can define the interpretation of relation symbols, function symbols, and constant symbols more or less arbitrarily: The only restrictions are that the selection of the interpretation of any particular constant symbol from each stalk must be done in a continuous manner, and there is a similar continuity condition on the interpretation of function symbols.

It will turn out that the interpretation of a formula with $n$ free variables will be a subset of $E^n$, where $E^n$ is the $n$th fibered product over $X$ of $E$ with itself. (This just means that, e.g., $E^2$ is the étalé space where the stalk of any $x \in X$ is $E_x \times E_x$.) In particular, as in the regular topological semantics, the interpretation of a sentence will be a subset of $X$.

The crucial step, as before, is that the interpretation of $\Box\varphi$ is the interior of the interpretation of $\varphi$, and the interpretation of $\Diamond\varphi$ is the closure of the interpretation of $\varphi$, although now the interiors and closures are being taken in the topology of $E^n$.

To see how this works, suppose that in our example we define the relation $\le$ on each stalk by restricting the usual ordering on $\mathbb{R}$. Then it is true in our model that $\exists x\,\forall y\,(x \le y)$ (since it is true in every stalk), but it is not true that $\exists x\,\Box\forall y\,(x \le y)$. The reason is that in this stalk:

the red dot is the minimum element of the stalk, but if you push it to the right just a little bit, that’s no longer true. Intuitively, it’s the minimum element, but not necessarily so.

Awodey and Kishida prove that, as in the other cases, this sheaf semantics corresponds to FOS4 in that every theorem of FOS4 holds in all sheaf models, and everything which holds in all sheaf models is a theorem of FOS4. (They actually prove something a bit stronger than that.)

Other Uses of Sheaves in Logic

This relationship between sheaves and logic came up relatively recently, but there is a longer and more well-known relationship which I’ll just mention briefly, namely through toposes. Using topos theory, you can interpret the class of sheaves over a topological space $X$ as a category similar to the category of sets. The interpretation is actually fairly similar to the interpretation here, but instead of interpreting modal logic, higher-order (non-modal) logic is interpreted. It turns out that the set of truth values in this interpretation is the set of open subsets of $X$ (whereas in the modal interpretation just given, the truth values could be arbitrary subsets of $X$). One of the remarkable things about this is that the set of open subsets of $X$ can itself be interpreted as an étalé space over $X$, which is what allows the equivalent of power sets to be taken in this category.

For more information on this, see Sheaves in Geometry and Logic.

Two Interesting Observations about Voting I Hadn’t Seen Until Recently

mkoconnor — Tue, 23 Feb 2010 01:32:43 +0000

By “voting”, I mean the following general problem: Suppose there are $n$ candidates and $m$ voters. Each voter produces a total ordering of all candidates. A voting procedure is a function which takes as input all $m$ orderings, and produces an output ranking of all candidates. Arrow’s impossibility theorem states that there is really no satisfactory voting procedure when the number of candidates is greater than 2 (majority rule is a good voting procedure when there are two candidates).

Observation #1 (which I read in Chapter 23 of David Easley and Jon Kleinberg’s book Networks, Crowds, and Markets): Voters will vote strategically (i.e., they will lie) even when they have a common goal.

In the setup above where each voter has a set of personal preferences and voters are essentially competing with other voters who have different preferences, it is easy to come up with situations where it would be advantageous for a voter to lie. For example, if a voter’s true rankings are $A > B > C$ where $A$, $B$, and $C$ are candidates, but $B$ has a much better chance of winning than $C$ does, it may be advantageous for the voter to submit a ranking of $A > C > B$ if she wants to maximize the chance that $A$ comes out on top.

However, in a situation where every voter has the same goals, but they have different private information (and it’s impossible or infeasible for them to share their private information with each other), it seems like there’s never a reason for a voter to lie. But there is, even when there are only two alternatives that are being voted on.

Consider the following game: There is a vase filled with marbles. Either it has 10 white marbles (call this state $W$) or 5 white marbles and 5 green marbles (call this state $M$). Which of state $W$ or state $M$ holds was determined by flipping a fair coin before the game started (and this fact is common knowledge). Each of three voters independently and without communication draws one marble at random from the vase, observes its color, puts it back, and then votes on whether $W$ or $M$ holds. The voters win if a majority guesses right and lose otherwise.

As you can work out: if you draw a white marble, you believe state $W$ holds with probability $2/3$ and $M$ holds with probability $1/3$. If you draw a green marble, you believe state $W$ holds with probability $0$ and $M$ holds with probability $1$ (you are sure that state $M$ holds).

However, voters will not vote their true beliefs: Suppose they did, and consider whether a fixed voter has an incentive to deviate from this strategy (i.e., consider whether all voters voting their true beliefs is a Nash equilibrium). When you draw a green marble, you should definitely vote for state $M$. But what should you do when you draw a white marble? The key question is: when will your vote make a difference? Only when one other voter votes $W$ and the other voter votes $M$. But because they are voting sincerely, the voter who voted $M$ must have drawn a green marble and, therefore, she must be right! So, you should vote $M$ as well. That is, if you think that the other two voters are voting sincerely, you should disregard the information you get from observing a marble, and always vote $M$!
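The marble game is small enough to enumerate exactly. The following sketch does so with exact fractions. The labels "W" (ten white marbles) and "M" (five white, five green) and all function names are my own shorthand for this example, not anything from the Easley-Kleinberg chapter. It computes the belief after drawing a white marble, and compares the group's chance of winning when everyone votes sincerely against the case where one voter ignores her marble and always votes for the mixed vase.

```python
from fractions import Fraction
from itertools import product

HALF = Fraction(1, 2)
# P(colour drawn | state). "W": ten white marbles; "M": five white, five green.
DRAW = {"W": {"white": Fraction(1), "green": Fraction(0)},
        "M": {"white": HALF, "green": HALF}}

def posterior_all_white_given_white():
    """Bayes' rule with a fair-coin prior over the two states."""
    joint_w = HALF * DRAW["W"]["white"]
    joint_m = HALF * DRAW["M"]["white"]
    return joint_w / (joint_w + joint_m)

def sincere(colour):
    # After a white draw, "W" is the more likely state; after green, "M" is certain.
    return "W" if colour == "white" else "M"

def always_m(colour):
    return "M"

def win_probability(strategies):
    """Exact probability that a majority of the three votes names the true state."""
    total = Fraction(0)
    for state in ("W", "M"):
        for colours in product(("white", "green"), repeat=3):
            prob = HALF  # prior probability of this state
            for c in colours:
                prob *= DRAW[state][c]
            votes = [s(c) for s, c in zip(strategies, colours)]
            if votes.count(state) >= 2:
                total += prob
    return total

print(posterior_all_white_given_white())                 # 2/3
print(win_probability([sincere, sincere, sincere]))      # 3/4
print(win_probability([always_m, sincere, sincere]))     # 7/8
```

So if the other two voters are sincere, a single voter switching to "always vote for the mixed vase" raises the group's winning chance from 3/4 to 7/8, which is why sincere voting is not an equilibrium.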

In the Easley-Kleinberg chapter, the authors also demonstrate a version of this with juries, where the vote must be unanimous to convict a defendant, and otherwise she will be acquitted. A similar situation happens: assume that you are thinking of voting to acquit. Under what circumstances will your vote make a difference? Only when every other juror has voted to convict. In that circumstance, it is quite likely that you are wrong, and the defendant was guilty, and thus you should have voted to convict no matter what! (This shows that everyone voting sincerely is not an equilibrium but doesn’t show what the equilibria of this game are. According to the authors, finding the equilibria is quite difficult.)

Observation #2 (which I saw in a paper by Roger Sewell, David Mackay, and Ian McLean): Maximizing the Entropy of the Outcome of Voting Leads to Good Results

As I alluded to above, Arrow’s impossibility theorem says that there’s no satisfactory way to provide an output ranking of candidates given the input rankings from each voter (here we will assume that we simply know each voter’s true ranking and not consider strategic voting). However, this just applies to deterministic voting procedures: it is, in fact, quite easy to come up with a probabilistic voting procedure satisfying all of the hypotheses of Arrow’s impossibility theorem: just pick a voter uniformly at random and take their preferences to be the output ranking!

Aside from the fact that it seems unlikely that the general public would accept such explicit randomization in the voting procedure any time soon, this process (which is called Random Dictator) has a couple of other negative aspects. First of all, it can easily lead to extreme outcomes: Suppose that there are 20 candidates, and the top 10 in the output ranking will be given various positions in the government. Suppose 10 candidates are from one political party, and 10 are from the other, and further that the populace is highly polarized: everyone ranks their party’s 10 candidates strictly better than each of the 10 candidates of the other party. Then it is guaranteed that a single-party government will be the result. What might be better is 10 officials chosen from a mix of the two parties according to each party’s representation in the voting public.

Another way in which Random Dictator doesn’t compromise very well is the following: Suppose there is heavy contention between candidates $A_1, \ldots, A_k$ for the top rank among voters, but there is a candidate $B$ that is everyone’s second choice. Then under Random Dictator there is zero chance that $B$ will be top-ranked in the output ranking, even though it intuitively seems that $B$ is more generally liked by the population than any of $A_1, \ldots, A_k$.

The authors of the paper fix these problems by proposing the following procedure: For each pair of candidates $A$ and $B$, record the proportion $p_{A,B}$ of the voting population which prefers $A$ to $B$. Now, from all probability distributions over output rankings such that, for each pair of candidates $A$, $B$, the probability of $A$ being ranked higher than $B$ in the output ranking is $p_{A,B}$, choose the one that has maximum entropy. Then choose an output ranking according to that distribution.

I won’t define maximum entropy here, but I will give a few examples. The idea is that there is a number called the entropy associated with every probability distribution, and furthermore that if you are looking for a probability distribution in a certain class, but don’t know anything about it except that it is in that class, then the “right” distribution to take is the one that maximizes the entropy (obviously this is an unprovable assertion). In some sense, choosing the maximum entropy distribution from a class codifies the fact that you know nothing about it except that it is in that class.

For example, the maximum entropy probability distribution over the set $\{1, \ldots, n\}$ is the uniform distribution, which assigns probability $1/n$ to each number. Fixing a $\mu$ and $\sigma^2 > 0$, the maximum entropy probability distribution in the set of probability distributions over $\mathbb{R}$ with mean $\mu$ and variance $\sigma^2$ is a Gaussian distribution. Fixing $\mu > 0$, the maximum entropy distribution over the positive reals with mean $\mu$ is the exponential distribution with mean $\mu$.

Basically, what the authors have proposed is that the actual total rankings of the voters are not important (or else we would have the first problem mentioned above), and in particular which candidate each voter happened to place in the top rank is not important (or else we would have the second problem mentioned above); the only thing that’s really important is getting the proportions of the pairwise rankings right. And the way to get a distribution on output rankings which reflects nothing except for the constraints on the pairwise rankings is to pick the maximum entropy distribution satisfying those constraints.
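A tiny case shows the difference between Random Dictator and the maximum entropy choice without needing an optimizer. This sketch is not the authors' algorithm: the two-voter electorate and the helper names are made up for illustration, and I use the standard Shannon entropy $-\sum_i p_i \ln p_i$. With two voters holding exactly opposite rankings, every pairwise proportion is 1/2. Both Random Dictator and the uniform distribution over all six rankings reproduce those proportions, but the uniform one has larger entropy (and, being uniform over all rankings, it has the largest entropy possible).

```python
from fractions import Fraction
from itertools import permutations
from math import log

def pairwise_from_ballots(ballots):
    """Proportion of voters ranking a above b, for each ordered pair (a, b)."""
    candidates = sorted(ballots[0])
    props = {}
    for a, b in permutations(candidates, 2):
        wins = sum(1 for r in ballots if r.index(a) < r.index(b))
        props[(a, b)] = Fraction(wins, len(ballots))
    return props

def pairwise_from_distribution(dist):
    """Probability that a is ranked above b under a distribution over rankings."""
    candidates = sorted(next(iter(dist)))
    props = {}
    for a, b in permutations(candidates, 2):
        props[(a, b)] = sum(
            (pr for r, pr in dist.items() if r.index(a) < r.index(b)),
            Fraction(0))
    return props

def entropy(dist):
    """Shannon entropy in nats."""
    return -sum(float(pr) * log(float(pr)) for pr in dist.values() if pr > 0)

# Hypothetical electorate: two voters with exactly opposite rankings.
ballots = [("A", "B", "C"), ("C", "B", "A")]
target = pairwise_from_ballots(ballots)

random_dictator = {r: Fraction(1, 2) for r in ballots}
uniform = {r: Fraction(1, 6) for r in permutations("ABC")}

print(pairwise_from_distribution(random_dictator) == target)  # True
print(pairwise_from_distribution(uniform) == target)          # True
print(entropy(random_dictator) < entropy(uniform))            # True
```

In larger examples the constraints no longer admit the uniform distribution, and finding the maximizer takes real numerical work; this toy case only illustrates what is being maximized and what is being held fixed.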

The beauty of this is that if you disagree with them, that’s fine: all you have to do is figure out what you think is the important information to preserve from the voters’ rankings, then pick the output distribution which maximizes entropy among those preserving that information, and that will be the “right” output for your choice of what’s important, in the sense that it will not “take into account” anything that you don’t think is important. To take a trivial example, if you decide that the total ranking of each voter is important, then the maximum entropy distribution on output rankings will degenerate into the Random Dictator process.

The authors additionally ran simulations of elections using various voting procedures in order to verify that the maximum entropy voting scheme they propose is “better” (in some senses they define) than others.

Quantish Physics: A Discrete Model of Quantum Physics

mkoconnor — Wed, 17 Feb 2010 05:11:12 +0000

In the book Good and Real, author Gary Drescher, who received his PhD from MIT’s AI lab, defends the view that determinism is a consistent and coherent view of the world. In doing so, he enters many different arenas: ethics, decision theory, and physics.

In his chapter on quantum mechanics, he defends the “many-worlds” interpretation (although he doesn’t think the term accurately describes the concept) versus the Copenhagen interpretation. In the process of doing so, he does something I thought was extraordinary: he comes up with a simple model of quantum mechanics in which all of the standard concepts you read about: the two-slit experiment, the Heisenberg uncertainty principle, etc., are represented. This model requires no prerequisites from physics and actually uses almost totally discrete mathematics!

(Edit: I somehow missed this when originally writing this post, but Drescher also outlines quantish physics in an online paper.)

I’ll sketch it below.

The first step is to define the “classical” version of our physics, which we will then tweak to get the quantum version. The “topology” of our universe will be given by a finite directed graph where each vertex has three edges coming in, and three edges going out. There is given a bijection $\sigma$ on the set of edges such that if edge $e$ is directed in to vertex $v$, edge $\sigma(e)$ is directed out of vertex $v$. Given this bijection, you can think of each edge as actually a piece of a wire: a directed loop in the graph. Finally, we require that each edge is labeled either $C$ (control), $S_0$ or $S_1$ (switch) so that the three in-edges to any given vertex get distinct labels.

We can picture vertices like this:

The vertex is represented by the big box in the center. We will always put the control edge at the top, and the two switch edges at the bottom. Note that the edge $\sigma(e)$ need not have the same label as $e$.

Particles inhabit edges. If $P$ is the set of particles and $E$ is the set of edges, then a point in $E^P$ determines the position of each particle. The set $E^P$ is thus called (classical) configuration space. Time in this universe is discrete; to describe how the system evolves, we just have to define the successor function $\mathrm{succ}^c\colon E^P \to E^P$ which tells how the system progresses one time step (the superscript $c$ stands for “classical”).

For any edge $e$ and label $\ell$, we let $e_\ell$ be the edge with the same destination vertex as $e$ and with label $\ell$.

For a configuration $x$ and edge $e$, we let $x[e] = 0$ if edge $e$ does not have a particle in configuration $x$, and we let it be $1$ if it does.

Now we define $\mathrm{succ}^c$ as follows: For every particle $p$ on an edge $e$ labeled $C$, $\mathrm{succ}^c(x)(p) = \sigma(e)$. For every particle $p$ on an edge $e$ labeled $S_i$, $\mathrm{succ}^c(x)(p) = \sigma(e_{S_j})$, where $j = i$ if $x[e_C] = 0$ and $j = 1 - i$ if $x[e_C] = 1$.

In other words, particles on a control edge always go straight along whatever loop they are on. However, particles on a switch edge may or may not cross over to the loop of the other switch edge, depending on whether or not there is a particle on the control edge (hence the names).
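The verbal rule above (control particles go straight; switch particles cross over exactly when the control edge is occupied) can be simulated directly. This is a minimal sketch under assumptions of my own: the dictionary-based graph encoding, the label strings "C", "S0", "S1", and the function name `classical_step` are all hypothetical, and the example universe is the smallest one I could think of, a single vertex whose three edges are self-loops.

```python
def classical_step(config, label, dest, sigma):
    """Advance every particle by one time step.

    config: dict particle -> edge it currently sits on
    label:  dict edge -> "C" (control), "S0" or "S1" (switch)
    dest:   dict edge -> vertex the edge points in to
    sigma:  dict in-edge -> the out-edge that continues its wire
    """
    occupied = set(config.values())
    # Look up the in-edge of a vertex by its label.
    in_edge = {(dest[e], label[e]): e for e in label}
    new_config = {}
    for particle, e in config.items():
        v = dest[e]
        if label[e] == "C":
            # Control particles always continue along their own wire.
            new_config[particle] = sigma[e]
        else:
            control_occupied = in_edge[(v, "C")] in occupied
            other = "S1" if label[e] == "S0" else "S0"
            wire = other if control_occupied else label[e]
            new_config[particle] = sigma[in_edge[(v, wire)]]
    return new_config

# Hypothetical one-vertex universe: three self-loops, each its own wire.
label = {"top": "C", "left": "S0", "right": "S1"}
dest = {"top": "v", "left": "v", "right": "v"}
sigma = {"top": "top", "left": "left", "right": "right"}

print(classical_step({"a": "left"}, label, dest, sigma))
# {'a': 'left'}
print(classical_step({"a": "left", "k": "top"}, label, dest, sigma))
# {'a': 'right', 'k': 'top'}
```

With the control edge empty the particle stays on its own loop; with a particle on the control edge it crosses to the other switch loop, and two particles on the two switch edges simply trade places.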

For example, this configuration:

turns into this configuration:

and this configuration:

turns into this configuration:

Now for the “quantum” variation, which Drescher calls quantish physics. In this case, each particle now has a sign (/) attached to it. Furthermore, each vertex has an angle associated with it, called ‘s measurement angle. The classical configuration space was ; the quantum configuration space will be the set of all formal linear combinations of states . Given a state , the number can reasonably be interpreted as the probability of being in state (see Drescher for more comment on this).

The task now is to describe the successor function describing how the universe evolves through time. First some preliminary definitions:

Given a nonzero complex number , an angle , and , let be the component of which is parallel to if and the component of which is perpendicular to if .

Note that and, due to the Pythagorean theorem, .

If, furthermore, , let be the component of which is parallel to (note: , not as above) if and perpendicular to if .

As above , and . Similarly, and .

Finally, note that for a given , and , the split function is simply multiplication by a complex number independent of (and similarly for the two-argument split function).

First off, is -linear; it therefore suffices to define for classical configurations .

If a particle is on edge labeled , it always passes straight through to .

If a particle is on a switch edge, the successor state will be the sum of (up to) 4 non-zero classical configurations corresponding to whether or not it stays on its loop or crosses over to the loop of the other switch edge, and whether or not it changes sign. Let for denote the classical state where the particle stays on its own loop iff and the particle keeps the same sign iff . Then the weight given to is (the is not a typo). The use of is just because we are assuming that we are starting from a pure classical configuration; if it’s a classical configuration time a weight , the would be replaced with .

When there are particles in the classical configuration , each is split separately; may be the sum of up to classical configurations. Since the splitting is simply multiplication by a complex number, it doesn’t matter in what order the splittings are performed.

I’ll now briefly describe some quantum phenomena which can be interpreted in the quantish world. For much more insight, meaning, and many more examples, please see Drescher’s book!

The Two-Slit Experiment

Suppose we have the following configuration:

where the measurement angles of and are equal and oblique to the measurement angle of , and the sign of the particle is positive. Then, it is always the case that after three timesteps, the particle is in the middle rightmost edge (i.e., applied three times to the initial state yields the linear combination consisting of the sum of the single classical state where the particle is in the middle rightmost edge). It is never in the bottom rightmost edge.

However, if we remove one of the ways the particle can get to the bottom edge:

now the particle arrives at the bottom rightmost edge with positive probability (i.e., nonzero weight). (The ellipsis simply means that this edge goes somewhere else; we don’t care what happens there.) This is because of destructive interference in the first case, which was removed in the second.

Suppose we try to investigate what’s going on, and observe if the particle is on one particular edge:

What’s going on here? Well first of all, the way we observe things is by, e.g., having the particles we want to observe interact with particles in our eye. In this example, we’ll take the red particle at the bottom to be a particle “in our eye” and observe the blue particle by having it interact with the red. Second of all, there are various “delay” gates throughout (at the bottom left, the delay gate is explicit, at the top, two wires are labeled “delay”, which means that they pass through a delay gate, although I haven’t drawn it). These aren’t really significant; they’re just to synchronize things.

Note that this gate has exactly the same behavior as our first setup, except that we are observing when the blue particle is on the bottom edge by having it interact with the red particle (and syncing things up). However, the results are as in the second case: with nonzero probability, the red particle appears on the bottom rightmost edge! The observation blocks the destructive interference of the first case.

Heisenberg Uncertainty

Say that a particle in a quantish state is definite with respect to a measurement angle $\theta$ if passing it through a switch wire of a vertex with measurement angle $\theta$ and no other particles entering the vertex will yield the particle always emerging on one specific edge. Heisenberg uncertainty is represented in quantish physics by the fact that whenever a particle is definite with respect to some measurement angle $\theta$, it is always indefinite with respect to measurement angles oblique to $\theta$.

The Einstein-Podolsky-Rosen Experiment

Unfortunately, the diagrams for this setup are beyond my poor figure-making abilities. However, I can substitute a poor description for poor figure-making. It is possible to entangle two quantish particles by sending both of them through gates which have related measurement angles, then setting up two further gates through which a third particle is sent. In the first of these gates, one of the switch wires from the first particle’s measurement is the control wire, and in the second of these gates, one of the switch wires from the second particle’s measurement is the control wire. You then observe if the third particle emerges from the second of the two gates at the same place where it entered.

In that case, you can then measure the two particles both with respect to any fixed angle, and you will get the same results for both. (Let me reiterate that this was a terrible description; see Drescher’s book for more).

Functions with Very Low Symmetry and the Continuum Hypothesis

mkoconnor — Sun, 19 Jul 2009 20:32:03 +0000

A function $f$ from $\mathbb{R}$ to $\mathbb{R}$ is called even if for all $h$, $f(-h) = f(h)$. We might call it even about the point $x$ if, for all $h$, $f(x - h) = f(x + h)$.

Conversely, we can call a function $f$ strongly non-even if for all $x$, $h > 0$, $f(x - h) \neq f(x + h)$.

Finding strongly non-even functions is easy, as any injective function provides a trivial example. We can make things harder for ourselves by considering only functions from $\mathbb{R}$ to $\mathbb{N}$. But now, it is just as easy to show that there are no strongly non-even functions (some two distinct reals $a < b$ must share a value, so take $x = (a + b)/2$ and $h = (b - a)/2$).

Therefore, let’s make the following definition: Let a function $f\colon \mathbb{R} \to \mathbb{N}$ be non-even of order $n$ if, for all $x$, $|\{h > 0 \mid f(x-h) = f(x + h)\}| \leq n$. Thus, a strongly non-even function is non-even of order $0$, and a function being non-even of order $n$ implies that it’s non-even of order $m$ for all $m \geq n$.

In this paper, the set theorists Peter Komjáth and Saharon Shelah proved:

The existence of a non-even function of order 1 is equivalent to the Continuum Hypothesis (i.e., the statement that $2^{\aleph_0} = \aleph_1$).

Thus, if we assume that there is a non-even function of order 1, then we can conclude that $2^{\aleph_0} = \aleph_1$. Can we weaken the hypothesis and still conclude something interesting? We can, as they also proved:

For any , if there is a non-even function of order , then .

They showed this by showing the following (the statement above follows directly from this, given just a bit of thought):

For any vector space over of cardinality , and any function from to , there is an and a set of unordered pairs such that and , where the cardinality of is .

I’ll show here how to prove the weaker statement obtained by replacing with .

Fix and . We will construct a set of basis elements with the property that for every set , . Taking to be , this will provide the required number of unordered pairs. (As a slight bit of notational convenience, if is a set of basis elements, I will write for .)

Let be a basis for . For , let be called the th slice of basis elements. For each between and and , we will pick to be in the th slice of basis elements.

Given , suppose that we have defined for and we will define as follows: pick it so that it is in the th slice of basis elements but is not any singleton set of the form: for any finite sets and . Since there are only such singleton sets, and elements of the th slice of basis elements, we can find such a .

Now we define the . Given , assume that have already been defined and we will define : pick it from the th slice so that for all subsets of , (this is possible by the defining condition of .

It is now easy to prove the following proposition by induction on :

For any set (where ), equals .

(The proof simply uses the defining property of the .) Now, taking , the result follows.

A Suite of Cool Logic Programs

mkoconnor — Fri, 15 May 2009 00:24:49 +0000

You may have heard of the Tarski-Seidenberg theorem, which says that the first-order theory of the reals is decidable. You may also have heard that the first-order theory of the complex numbers is similarly decidable, or that the first-order theory of the integers without multiplication is decidable.

In the course of John Harrison’s logic textbook Handbook of Practical Logic and Automated Reasoning, all three of these algorithms (and many more) are implemented. Furthermore, you can download and play with them for free. (However, I still recommend checking out the book, especially if you are looking for a good textbook for a course on logic with a concrete, computational bent.)

Below, I’ll describe how to install the programs and try them out. There are many more interesting functions in this suite that I haven’t described.

Installation:

The software is written in OCaml and can be run interactively in an OCaml toplevel (don’t worry, you won’t actually need to know any OCaml). Download and install OCaml as well as its preprocessor Camlp5 (which is used for formatting formulas nicely).

Then, download the code from here (under “All the code together”) and unzip it somewhere.

To run it, go to wherever you unzipped it and type make interactive in a shell.

(At least, that’s what worked for me on Mac OS X. Other systems may be different.)

The Tarski-Seidenberg Theorem.

The Tarski-Seidenberg theorem implies that there is a decision procedure which, given a first-order sentence over $\mathbb{R}$ using plus, times, 0, and 1, will tell you if it’s true or not. The function real_qelim implements this. Let’s try it out. (The symbol # indicates the beginning of the prompt; don’t type that, just type in what’s after it.)

This function knows that not all quadratic polynomials have roots, but all cubics do.

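# real_qelim <<forall b c. exists x. x^2 + b * x + c = 0>>;;
- : fol formula = <<false>>
# real_qelim <<forall b c d. exists x. x^3 + b * x^2 + c * x + d = 0>>
  ;;
- : fol formula = <<true>>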

Many geometric puzzles can, in theory, be solved automatically by this function. Unfortunately, it is too slow for most interesting ones. Harrison notes that there are open problems about kissing numbers of high-dimensional spheres which could be solved in theory by this algorithm, although in practice it is an unworkable approach.

This algorithm actually does something stronger than decide the truth of first-order sentences: it does quantifier-elimination, which means that if you give it a formula with free variables, it will give you a quantifier-free formula in those same free variables (in the case of a sentence, which has no free variables, that means either the formula “true” or the formula “false”).

For example, if you’ve forgotten the quadratic formula and want to know what the condition is for a quadratic polynomial to have a root:

# real_qelim <<exists x. x^2 + b * x + c = 0>>;;
- : fol formula =
<<(0 + c * 4) + b * (0 + b * -1) = 0 \/
 ~(0 + c * 4) + b * (0 + b * -1) = 0 /\
 ~(0 + c * 4) + b * (0 + b * -1) > 0>>

Note that there is no claim that the formula it gives you will be completely simplified, only that it will be correct.
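The unsimplified output can be sanity-checked against the familiar discriminant condition. The sketch below is plain Python, not part of Harrison's code. It transcribes the printed formula literally, reading each `~` as negating the whole atom that follows it (my reading of the output syntax), and compares it with $b^2 - 4c \geq 0$ on a grid of integer values.

```python
def qelim_output(b, c):
    """Literal transcription of the quantifier-free formula printed above."""
    d = (0 + c * 4) + b * (0 + b * -1)   # this is 4c - b^2
    return d == 0 or (not d == 0 and not d > 0)

def has_real_root(b, c):
    """Textbook condition for x^2 + b*x + c = 0 to have a real root."""
    return b * b - 4 * c >= 0

agree = all(qelim_output(b, c) == has_real_root(b, c)
            for b in range(-10, 11) for c in range(-10, 11))
print(agree)  # True
```

Agreement on a finite grid is of course only a spot check, but here the two conditions are visibly the same statement about the sign of $4c - b^2$.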

Deciding Sentences over the Complex Numbers

We can similarly use the function complex_qelim to do quantifier elimination over the complexes. The fact that this is possible is easier to prove than the corresponding fact for the reals, and the algorithm is similarly faster.

# complex_qelim < x = 1>>;;
- : fol formula = <>

The following sentence is also true over the reals (although for a different reason than why it’s true over the complexes), but it takes significantly longer for the real quantifier elimination algorithm to decide it.

# complex_qelim < x1 + x2 + x3 = 0>>;;
- : fol formula = <<true>>

Suppose we read on Wikipedia that the translation of the limaçon $r = b + a\cos\theta$ to rectangular coordinates is $(x^2 + y^2 - ax)^2 = b^2(x^2 + y^2)$. We can verify this (I’ve used s to represent $\sin\theta$ and c to represent $\cos\theta$):

# complex_qelim << forall r s c x y. (x^2 + y^2 = r^2
  /\ r * c = x /\ r * s = y ==>
  forall a b. (r = b + a * c ==>
  (x^2 + y^2 - a * x)^2 = b^2 * (x^2 + y^2)))>>;;
- : fol formula = <<true>>

Presburger Arithmetic

Finally, first-order sentences with plus and less-than over the integers and over the natural numbers are decidable. The relevant functions are integer_qelim and natural_qelim. Even though multiplication of variables is prohibited, we can still multiply by constants (since, for example, instead of $3x$ we could have written $x + x + x$ anyway).

An example Harrison gives is: There is an old (easy) puzzle which is to show that, with 3- and 5-cent stamps, you can make an $n$-cent stamp for any $n \geq 8$.

# natural_qelim <<forall n. n >= 8 ==>
  exists x y. 3 * x + 5 * y = n>>;;
- : fol formula = <<true>>
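
As an independent sanity check (plain brute force in Python, unrelated to Harrison's code), here is a small sketch that searches for a combination of 3- and 5-cent stamps. The function name `stamp_combination` and the upper bound 200 are arbitrary choices made here; unlike the decision procedure, this only tests finitely many cases.

```python
def stamp_combination(n):
    """Return (x, y) with 3*x + 5*y == n, or None if no such pair exists."""
    for y in range(n // 5 + 1):
        rest = n - 5 * y
        if rest % 3 == 0:
            return rest // 3, y
    return None


# Every amount from 8 up to (but not including) 200 can be made...
assert all(stamp_combination(n) is not None for n in range(8, 200))
# ...while 7 cannot, so the bound 8 in the puzzle is sharp.
assert stamp_combination(7) is None
```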

An Interesting Puzzle in Propositional Logic

mkoconnor — Thu, 09 Apr 2009 13:47:48 +0000

Suppose that you’re translating an ancient text, and in this text you come across three words whose meaning you are unsure of: $\alpha$, $\beta$, and $\gamma$. So, you head down to the ancient language department of your local university.

The first professor you come across, $A$, knows what $\alpha$ means, the second professor you come across, $B$, knows what $\beta$ means, and the third professor you come across, $C$, knows what $\gamma$ means. So you have fortunately solved your problem.

But you’re now curious and decide to meet some other professors in the department. The next professor you come across is named $A \to B$. He doesn’t know what $\alpha$ means or what $\beta$ means, but if you told him what $\alpha$ meant, he would be able to tell you what $\beta$ means (for example, maybe he knows that $\beta$ is the noun form of $\alpha$).

The next professor you meet is named $A \to (B \to C)$. If you told him what $\alpha$ meant and what $\beta$ meant, he would be able to tell you what $\gamma$ means.

The next professor you meet is named $(A \to B) \to C$. If you told him a method for finding out the meaning of $\beta$ given the meaning of $\alpha$, he would be able to tell you the meaning of $\gamma$.

In general, for any two professors $P$ and $Q$, there is a professor $P \to Q$ with the property that if you told him what $P$ knew, he would be able to tell you what $Q$ knows (but doesn’t know any more than that).

Notice that some professors have essentially the same state of knowledge. For example, $C$ and $(A \to A) \to C$ have essentially the same knowledge, since to get the meaning of $\gamma$ out of $(A \to A) \to C$ you only have to tell him a method for finding out the meaning of $\alpha$ given the meaning of $\alpha$, which is something that you can do without any particular special knowledge concerning $\alpha$, $\beta$, and $\gamma$.

A more nontrivial example is that $A \to B$ has the same state of knowledge as $((A \to B) \to A) \to B$. This is because each can “simulate” the other. In one direction, suppose somebody told $((A \to B) \to A) \to B$ the meaning of $\alpha$. He therefore knows a trivial “method” for getting the meaning of $\alpha$ given any inputs, and so he knows the meaning of $\beta$. In the other direction, suppose we told $A \to B$ a method for turning (methods for turning the meaning of $\alpha$ into the meaning of $\beta$) into the meaning of $\alpha$. Well, $A \to B$ knows a method for turning the meaning of $\alpha$ into the meaning of $\beta$, so he can use that to find the meaning of $\alpha$. He can then use his method a second time to turn that into the meaning of $\beta$.
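
If we read “knowing a method for turning one meaning into another” as “holding a function” (an assumption of this sketch, in the spirit of the Curry-Howard correspondence), the two simulations just described can be written out in Python. One professor holds a function `f` from meanings of the first word to meanings of the second; the other holds a function `h` that expects a method for turning such functions into a meaning of the first word. The names `forward`, `backward`, `f` and `h` are illustrative only.

```python
def forward(f):
    """From f : A -> B, build a function of type ((A -> B) -> A) -> B.

    Given a method g for turning (A -> B)-methods into an A, apply g to f
    to obtain an A, then apply f a second time to obtain a B.
    """
    return lambda g: f(g(f))


def backward(h):
    """From h : ((A -> B) -> A) -> B, build a function of type A -> B.

    Given an A, the function that ignores its input and returns that A is
    a trivial method of type (A -> B) -> A, which h turns into a B.
    """
    return lambda a: h(lambda _f: a)


# Toy stand-ins: "meanings" are integers and f adds one.
f = lambda a: a + 1
round_trip = backward(forward(f))
print(round_trip(41))  # prints 42
```

Going forward and then backward returns a function that behaves like the original `f`, which is the sense in which each professor can stand in for the other.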

The puzzle is then to prove that there are only finitely many professors with different states of knowledge.

This puzzle is equivalent to showing that intuitionistic implicational propositional logic over three variables has only finitely many logically inequivalent formulas. Another formulation is: Let $\mathcal{C}$ be the free cartesian closed category over three objects. Given objects $X$ and $Y$, say that $X$ and $Y$ are equivalent if there is an arrow from $X$ to $Y$ and an arrow from $Y$ to $X$. Then there are only finitely many equivalence classes in $\mathcal{C}$. (The corresponding statements are also true with three replaced by any natural number $n$.)

This fact was first proved using algebraic methods by Arturo Diego in his Ph.D. thesis in the 1940’s. It was subsequently reproved using semantic methods by various people including Nicolaas de Bruijn and Alasdair Urquhart. A good overview of those results is in Lex Hendriks’s thesis. I proved it in a combinatorial way as part of my thesis.

To rigorously state the problem: Let be a finite set of propositional variables, and let be the smallest set containing and such that if formulas and are in , then the formula is in . We define a relation between sets of formulas in and formulas in as follows: We will let be the smallest relation such that, for any , :

.
If and , then .
If and , then .
If , then .

The relation formalizes the notion of a formula being provable given a set of hypotheses.

We say that and are equivalent if and , and the proposition is then that there are only finitely many equivalence classes in .

Unfortunately, a full solution using no other machinery (at least the one that I came up with) is a bit too notationally cumbersome for a blog entry, but the puzzle is by no means inaccessible to someone with no other knowledge of logic.

In any case, I will sketch the solution in an important special case: that of formulas which are left-associated, i.e., of the form $(\cdots((p_1 \to p_2) \to p_3) \to \cdots) \to p_k$ where each $p_i$ is a propositional variable. Since we know how to parenthesize such formulas, we can write them simply in the form $p_1 p_2 \cdots p_k$.

The crucial insight is the following:

Let be a propositional variable for . Let and suppose that . Then the formulas and are equivalent.

To see why this is so, imagine that you are trying to prove some and you have some hypothesis . What does allow you to do?

If , it allows you to complete the proof immediately.
If is of the form , then it allows you to change the goal from to .
If is of the form , then it allows you to change the goal from to and give yourself as a hypothesis.

In the third case above, suppose that is of the form . The only way that can be of use is if you have some way to change the goal from to . In general, if you have as a hypothesis, if you get to use you must have a method for turning the goal from to for . By assumption in the boxed statement, we therefore have a method for turning the goal from to and vice versa. So and are equivalent. (Actually, what this argument shows is that they can be used in equivalent ways as hypotheses in a proof, which turns out to be enough.)

Now also observe that

For any formula $\phi$ and variable $p$, $((\phi \to p) \to p) \to p$ is equivalent to $\phi \to p$.

From the two boxed facts, it pretty easily follows that every left-associated formula is equivalent to one of length at most $3n^3$ (where $n$ is the number of propositional variables): Suppose that $\phi$ is equivalent to no shorter formula. Chopping $\phi$ up into triplets, I claim that no two triplets are the same: If so, and a triplet consisted of all the same variable, we could apply the second boxed fact to get a shorter formula. Otherwise, we can apply the first. Since there are only $n^3$ distinct triplets, that gives a length of at most $3n^3$.

Therefore, there are at most left-associated formulas.

What Happens When You Iterate Gödel’s Theorem?

mkoconnor — Tue, 24 Mar 2009 02:38:30 +0000

Let $PA$ be Peano Arithmetic. Gödel’s Second Incompleteness Theorem says that no consistent theory $T$ extending $PA$ can prove its own consistency. (I’ll write $\mathrm{Con}(T)$ for the statement asserting $T$’s consistency; more on this later.)

In particular, $PA + \mathrm{Con}(PA)$ is stronger than $PA$. But certainly, given that we believe that everything $PA$ proves is true, we believe that $PA$ does not prove a contradiction, and hence is consistent. Thus, we believe that everything that $PA + \mathrm{Con}(PA)$ proves is true. But by a similar argument, we believe that everything that $PA + \mathrm{Con}(PA) + \mathrm{Con}(PA + \mathrm{Con}(PA))$ proves is true. Where does this stop? Once we believe that everything $PA$ proves is true, what, exactly, are we committed to believing?

This is from Chapters 13–15 of Torkel Franzén’s book Inexhaustibility, which is admirably clear and well-written.

First off, let $PA_0$ be $PA$, and let $PA_{n+1}$ be $PA_n + \mathrm{Con}(PA_n)$. By the considerations above, we accept that each $PA_n$ is sound. (A theory being sound means that everything it proves is true.)

So, if we therefore let $PA_\omega = \bigcup_n PA_n$, then we accept that $PA_\omega$ is sound. We could therefore define $PA_{\omega+1}$ to be $PA_\omega + \mathrm{Con}(PA_\omega)$ and we would have to accept that as sound as well, but in making this definition we run up against our first snag.

The snag is this: In order to express a sentence of the form $\mathrm{Con}(T)$ in the language of number theory, we must choose some recursively enumerable presentation of $T$, and which recursively enumerable presentation we choose matters. For example, if we add to any given presentation of $PA$ the stipulation that we are adding, for all $a, b, c > 0$ and $n > 2$ such that $a^n + b^n = c^n$, the axiom $0 = 1$, then we haven’t actually added any new axioms, but if we construct the statement $\mathrm{Con}(PA)$ with $PA$ presented the second way, $\mathrm{Con}(PA)$ will imply Fermat’s Last Theorem, while $\mathrm{Con}(PA)$ constructed from the original presentation of $PA$ may not.

As you might guess from the above, we are going to want to construct $PA_\alpha$ for ordinals $\alpha$. When $\alpha$ is a successor ordinal, it is clear how to get a “reasonable” presentation of $PA_\alpha$ from a reasonable presentation of $PA_\beta$ (where $\alpha = \beta + 1$), but if $\alpha$ is a limit ordinal, in general it won’t be clear (although it is clear for $\omega$).

So how can we solve this problem?

The first step is to use a more computable representation of ordinals, namely ordinal notations. An ordinal notation is a number $a$ with the following property: It is either 0, or for every $n$, the output of the $a$th Turing machine on input $n$ is an ordinal notation. (What this recursive definition really means is that the set of ordinal notations is the smallest set satisfying the above property.)

Given an ordinal notation , we let the ordinal it represents, , be defined by and , where denotes the output of the th Turing machine on input .

We can now uniformly pick presentations of for ordinal notations by letting and be presented as the union of over , where the consistency statements are constructed using the presentations given by induction.

Unfortunately, this doesn’t prevent us from doing the trick mentioned above: For any true sentence , there is an ordinal notation such that and proves . The catch is that will be quite an unusual notation for 1, and we’re not really justified in taking to be a consistency extension of because doesn’t “know” that is an ordinal notation.

However, we can make a reasonable definition for what it means for (or any extension) to prove that a number is an ordinal notation. (This is actually not trivial, since can only talk about numbers, but the set of ordinal notations was defined to be the least set satisfying a certain property.) We can then define an autonomous consistency extension of as follows: is an autonomous consistency extension of itself, and if is an autonomous consistency extension of , and proves that is an ordinal notation, then is an autonomous consistency extension of .

The autonomous consistency extensions of have some claim to being exactly those that we recognize to be consistency extensions of solely on the basis that we accept . But that isn’t really completely satisfying. There’s nothing stopping us from letting be the union of the autonomous consistency extensions of and considering . Similarly, we got the set of autonomous consistency extensions of by starting with and then closing under finite applications of a particular operation, but we could also have considered transfinite applications of that operation.

Does there exist a theory (which we believe is true) which will prove anything any reasonable iterated consistency extension of proves? It turns out there is. Let be the theory obtained by adding to the axiom that for any sentence (that is, sentence of the form where all of ‘s quantifiers are bounded), if proves , then is true.

This property is called -soundness, and the axiom formalizing it is called a reflection axiom. If is -sound, then so is , since if proved a false statement , then would prove the false statement (false because -soundness implies consistency). Similarly, any union of a chain of -sound theories must be -sound.

Because we can formalize the above argument in , proves that every autonomous consistency extension of is -sound. Therefore, it proves that every autonomous consistency extension of is consistent. Therefore (since essentially autonomous consistency extensions of say nothing besides the fact that lower autonomous consistency extensions are consistent), extends each autonomous consistency extension of .

Okay, so adding axioms asserting that is -sound takes us beyond all the autonomous consistency extensions of . But what happens if we add to axioms asserting that is -sound? This is called a reflection extension, and we can form autonomous iterated reflection extensions just as we can autonomous iterated consistency extensions.

Is there any theory (which we believe is true) which goes beyond all the autonomous reflection extensions the same way that goes beyond all the autonomous consistency extensions of ? There is. The theory asserts that all sentences that proves are true. But it’s actually the case that all sentences that proves are true.

By a result of Tarski’s we can’t define truth of an arithmetical formula in , but we can define it by adding a new predicate to the language of , together with suitable axioms. The resulting theory , extends every autonomous reflection extension of .

In terms of what arithmetical sentences they can prove, is an equivalent theory to (edit: not ), which is the theory of second-order arithmetic, with a comprehension axiom for all arithmetic formulas. This is essentially because sets of numbers in are interchangeable with formulas in the language of with one free variable in .

And, of course, we then get autonomous iterated truth extensions of , in analogy to the autonomous iterated reflection extensions and the autonomous iterated consistency extensions. Here there is again a natural theory which extends all the autonomous iterated truth extensions, a theory called : it’s a theory of second-order arithmetic, like , but it allows comprehension for -formulas (formulas with a universal set quantifier in front), instead of just arithmetic formulas.

Of course, we can now start again, taking consistency or reflection extensions of . But, as Franzén says:

[E]xtending to opens the door to a number of possible extensions that go beyond reflection. In particular, we can extend a theory by introducing axioms about sets of higher type—meaning sets of sets of natural numbers, sets of sets of sets of natural numbers, and so on—and by introducing stronger comprehension principles for sets of a given type. … Axiomatic set theories like give powerful first-order theories which prove everything provable in such iterated autonomous extensions. … In this connection the term “reflection” reappears and takes on a new meaning. … [This] leads to a further indefinite sequence of extensions of set theory, and furthermore, “axioms of infinity” [i.e., large cardinal axioms], have been formulated which can be reasonably argued to be stronger, as far as arithmetical theorems are concerned, than any such extension by set-theoretic reflection.

Trigonometric Series and the Beginnings of Set Theory

mkoconnor — Sat, 20 Dec 2008 13:38:17 +0000

Let $f$ be a $2\pi$-periodic function. It may or may not have a representation as a trigonometric series

$$f(x) = \frac{a_0}{2} + \sum_{n=1}^{\infty} \left(a_n \cos nx + b_n \sin nx\right).$$

A natural question to ask is whether or not the representation of $f$ as a trigonometric series is unique, if it has one. It was the consideration of this question that led Cantor to the invention of set theory.

There is a nice writeup of this story in the first part of this article by Alexander Kechris. I’ll give part of the story below.

Cantor solved the problem in the affirmative; i.e., he proved:

Suppose that a trigonometric series converges to zero everywhere in $\mathbb{R}$. Then all the coefficients of that series are zero.

(By subtraction, this is equivalent to the problem stated above.) He was also able to show (by a very similar method) the following, which I’ll call the Isolated Points Lemma:

Suppose $a < x_0 < b$ and that a trigonometric series converges to zero on $(a, b) \setminus \{x_0\}$. Then that series converges to zero at $x_0$ as well.

From these two results, we can immediately conclude the following:

Suppose that a trigonometric series converges to zero at all but finitely many points. Then the coefficients of that series are all zero.

Call a set $E$ a set of uniqueness if whenever a trigonometric series converges to zero on the complement of $E$, the coefficients of that series are all zero. Then the previous result may be stated: “All finite sets are sets of uniqueness.”

But we can use the Isolated Points Lemma to show more than that. For example, we can show that the set $E = \{0, 1, 1/2, 1/3, \ldots\}$ is a set of uniqueness. The reason is that if a trigonometric series converges to zero on the complement of $E$, then by the Isolated Points Lemma, it also converges to zero on the points in $\{1, 1/2, 1/3, \ldots\}$.

But, now that we know that it converges to zero on the points in $\{1, 1/2, 1/3, \ldots\}$, we can apply the Isolated Points Lemma again to show that it converges to zero at 0 (since we now know that it converges to zero on, e.g., $(-1, 1) \setminus \{0\}$).

What we have actually shown by the above argument is the following:

Given $E$, let $E'$ be the set of limit points of $E$ (also known as the Cantor-Bendixson derivative of $E$). If $E'$ is a set of uniqueness, then $E$ is a set of uniqueness.

For any $n$, let $E^{(n)}$ be the $n$th Cantor-Bendixson derivative of $E$. Then, by iterating the above fact, we have the following:

Suppose that for some $n$, $E^{(n)} = \emptyset$. Then $E$ is a set of uniqueness.
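
For a concrete instance with two layers of accumulation (the set is chosen here for illustration and is not from Kechris's article), write $E'$ for the set of limit points of $E$ and take

$$E = \{0\} \cup \{1/n : n \geq 1\} \cup \{1/n + 1/m : n, m \geq 1\}.$$

Letting $m \to \infty$ with $n$ fixed shows that each $1/n$ is a limit point, and letting both grow shows that $0$ is one; no other point is. Hence

$$E' = \{0\} \cup \{1/n : n \geq 1\}, \qquad E'' = \{0\}, \qquad E''' = \emptyset,$$

so three applications of the derivative empty the set, and the statement above applies with $n = 3$.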

This is as far as we can go as long as we merely iterate the Cantor-Bendixson derivative finitely often. But, if we make the leap to iterating it transfinitely many times, we can go much further:

Theorem: All countable closed sets are sets of uniqueness.

Proof: First, define $E^{(\alpha)}$ for all ordinals $\alpha$ and all closed sets $E$ as follows:

$E^{(0)} = E$ and $E^{(\alpha+1)} = (E^{(\alpha)})'$.
$E^{(\lambda)} = \bigcap_{\alpha < \lambda} E^{(\alpha)}$, when $\lambda$ is a limit ordinal.

The Isolated Points Lemma says that if a trigonometric series converges to zero outside of $E$, then it converges to zero outside of $E'$. We will generalize this by showing the following lemma:

Lemma: If a trigonometric series converges to zero outside of $E$, then it converges to zero outside of $E^{(\alpha)}$ for any $\alpha$.

Proof of Lemma: This is by transfinite induction. The successor step of the induction is just the Isolated Points Lemma again, so all we have to show is that, fixing a trigonometric series, if $\lambda$ is a limit ordinal and the series converges to zero outside of each $E^{(\alpha)}$ for $\alpha < \lambda$, then it converges to zero outside of $E^{(\lambda)}$. But this follows simply because every point outside of $E^{(\lambda)}$ must be outside of some $E^{(\alpha)}$ for $\alpha < \lambda$ by definition. End of proof of Lemma.

To complete the proof of the theorem then, we just have to observe that for all countable closed $E$, $E^{(\alpha)} = \emptyset$ for some $\alpha$. Clearly, for all closed $E$, there is an $\alpha$ such that $E^{(\alpha)} = E^{(\alpha+1)}$ (this is because $E^{(\alpha)}$ is a decreasing sequence). But it is a standard fact that a set $P$ such that $P = P'$ is either empty or of cardinality $2^{\aleph_0}$. (Such a set is called a perfect set and a reference for the cited fact is page 7 of David Marker’s notes on descriptive set theory.) Since $E^{(\alpha)} \subseteq E$ is countable, it must be empty. End of proof.

As a historical note, Kechris reports that while thinking about the above issues led Cantor to discover ordinals, he never actually wrote down a proof of the above theorem; that was finally done by Lebesgue in 1903.

Further, it was later proven by Bernstein and Young independently that arbitrary countable sets are sets of uniqueness, and by Bari that countable unions of closed sets of uniqueness are sets of uniqueness.

Edit: Simplified the proof.

A Simple Introduction to Quantum Groups

mkoconnor — Fri, 12 Dec 2008 02:49:23 +0000

In the course of reading some background material for an article by James Worthington on using bialgebraic structures in automata theory, I was led to finally reading up on what a Hopf algebra (sometimes called a “quantum group”) is.

Although it is not strictly related to logic, I’ll write up what I learned here.

My main source for this is sigfpe’s blog post on this. In fact, all I really did was take his post and remove the Haskell from it (and probably add some mistakes). If you can read Haskell, I definitely recommend that post (and his blog in general). I also read the article on quantum groups in the Princeton Companion to Mathematics, which is an amazing book.

We’ll take the ordinary definition of a group and turn it into the definition of a quantum group in two steps.

Step 1: Groorgs.

In the first step, we’ll “symmetrize” the definition of a group to get an object I call a groorg. What does it mean to symmetrize a definition? Well, for one thing, as part of the definition of a group we have a multiplication $\mathrm{mult} : G \times G \to G$. Therefore, to make it symmetric, we should also have a comultiplication $\mathrm{comult} : G \to G \times G$.

Furthermore, this comultiplication should satisfy laws dual to those satisfied by multiplication. For example, multiplication is associative, which means that for any $g$, $h$, $k$, the two ways of using multiplication to turn the triple $(g, h, k)$ into a single element are the same. Dually, then, comultiplication should be coassociative, meaning that for any element $g$, the two ways of using comultiplication to turn that single element into a triple should be the same.

OK, so we’ve seen that the dual of multiplication is comultiplication. Another part of the definition of a group is the inverse function $\mathrm{inv} : G \to G$, which is no problem, since it can be its own dual. But what about the identity element $e$? This is puzzling until you think of the identity not as an element of $G$, but as a map from a one element set $\{*\}$ to $G$, which, when written in that form, I’ll call $\mathrm{unit}$. Then it’s clear that the dual of $\mathrm{unit}$ should be $\mathrm{counit} : G \to \{*\}$. The astute reader will note that there is only one such function, so this seems to be a trivial addition, but let’s press on regardless.

We can now begin the definition of a groorg.

Definition of a Groorg, Part 1. A groorg is a set $G$ together with:

A map $\mathrm{mult} : G \times G \to G$.

A map $\mathrm{comult} : G \to G \times G$.

A map $\mathrm{unit} : \{*\} \to G$.

A map $\mathrm{counit} : G \to \{*\}$.

A map $\mathrm{inv} : G \to G$.

These must satisfy the following properties:

Multiplication must be associative and comultiplication must be coassociative.

$\mathrm{unit}$ should be a unit for multiplication. This means that, given any $g$, if you form an ordered pair with $g$ and $\mathrm{unit}(*)$, then apply $\mathrm{mult}$ to that ordered pair, you get $g$ back.

$\mathrm{counit}$ should be a counit for comultiplication. This means that, given any $g$, if you apply $\mathrm{comult}$ to $g$ to get an ordered pair, then destroy one of the components with $\mathrm{counit}$ (i.e., just discard it), then you get $g$ back.

Let’s pause here. If you’ve followed along, you may have noticed that this last condition forces to be (which does satisfy the conditions so far). This is obviously quite restrictive, but it has one benefit: it means that we can rewrite the inverse law using comultiplication. To see what I mean, consider the following: If we let and , then the usual inverse law for a group says that for any , . Now, since we know that is forced to take each to , we can rewrite the inverse law as: .

We can make this completely symmetric by getting into the act:

Definition of a Groorg, Part 2. A groorg is also required to satisfy: for all (where and are as above).

And we finally have a requirement which says that the two ways of computing by using comultiplication and multiplication are equal.

Definition of a Groorg, Part 3. A groorg is also required to satisfy: , where the second is the natural multiplication on .

This concludes the definition of a groorg.

But, what use is it? As we’ve observed, in every groorg, we must have that $\mathrm{comult}$ sends $g$ to $(g, g)$, so that the groorg just reduces to an ordinary group. Furthermore, every group becomes a groorg by defining comult in that way (and by defining counit in the only possible way).
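
Both halves of this observation can be checked by brute force on small examples. The sketch below is mine, with illustrative names such as `counit_law_holds` and `diagonal`. It first enumerates every map from a two-element set to pairs and confirms that only the diagonal map satisfies the counit law, then checks that the cyclic group of order 3 with the diagonal comultiplication satisfies the inverse and compatibility laws as I read them from the definition above.

```python
from itertools import product


def counit_law_holds(G, comult):
    """Counit law: discarding either component of comult(g) must give back g."""
    return all(comult[g][0] == g and comult[g][1] == g for g in G)


def diagonal(G):
    """The map sending each g to the pair (g, g)."""
    return {g: (g, g) for g in G}


# Enumerate every map G -> G x G on a two-element set; keep the lawful ones.
G2 = [0, 1]
pairs = list(product(G2, repeat=2))
lawful = [dict(zip(G2, images))
          for images in product(pairs, repeat=len(G2))
          if counit_law_holds(G2, dict(zip(G2, images)))]
assert lawful == [diagonal(G2)]

# The cyclic group Z/3 as a groorg, with the forced diagonal comultiplication.
Z3 = [0, 1, 2]
mult = lambda g, h: (g + h) % 3
inv = lambda g: (-g) % 3
unit = 0
comult = diagonal(Z3)

for g in Z3:
    left, right = comult[g]
    # Inverse law phrased through comult: invert one copy, then multiply.
    assert mult(inv(left), right) == unit == mult(left, inv(right))
    for h in Z3:
        # Comultiplying a product agrees with multiplying the comultiplied
        # factors componentwise.
        (g1, g2), (h1, h2) = comult[g], comult[h]
        assert comult[mult(g, h)] == (mult(g1, h1), mult(g2, h2))
```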

What we’ve gained is that we now have a definition of a group which is equivalent to the old one and which is symmetric, which will lend itself well to our next step.

Step 2. Adding superpositions.

How can we turn this concept of a group into one of a “quantum” group? If you’re like me, the only thing you know about quantum mechanics is that you often hear the word “superposition” used in conjunction with it. That’s not much, but it turns out to be enough in this case.

Instead of having the composition of two group elements be another group element, let’s have it be a superposition of group elements. It turns out that what this should mean is a linear combination of group elements. So, let be the -vector space generated by taking as a formal set of basis vectors. Instead of requiring that be a map from to , we will let it be a map from to . So we have the following:

Provisional Definition of a Quantum Group. A quantum group is a set together with

A map .

A map .

A map .

A map

A map .

satisfying …

Before we can think about what properties these functions should satisfy, we have to settle a question: We know how to multiply two group elements to get a superposition of group elements, but how should we multiply two superpositions of group elements? For example, what should the product of and be? (Note that I am using the same symbol to stand for the group element and the formal basis vector corresponding to it.)

The quickest way to define the multiplication of superpositions is to notice that, since is a basis for , the map extends to a linear map , which we can use to multiply superpositions. In the above example, is equal to , so the product of the two superpositions would be .

Now, our symmetric definition of a group above translates exactly, and we no longer need to mention the basis explicitly:

Definition of a Quantum Group (or Hopf Algebra). A quantum group is a -vector space together with:

A linear map

A linear map

A linear map

A linear map

A linear map

satisfying the analogues of the laws given in the definition of a groorg.

Some Combinatorial Examples

I believe there are many examples of the usefulness of this concept in physics. However because I don’t know any physics, I won’t give them.

Here are two combinatorial examples from sigfpe’s blog post:

Example 1: A Quantum Group on Finite Strings.

Let be an alphabet, and let be the set of all finite strings with characters from . We may put a quantum group structure on as follows:

We let , the concatenation of and .
We let , the empty string.
We let and where .
We let , where is the length of and is the reverse of .
We let be the sum of where and can be shuffled together to give . This means that the characters in occur in in the same order, and when you remove them you get .

For an example of the comultiplication, if and , then .
We can verify the inverse law in this case: If we apply on the right to , we get . Applying to this, we get .
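
The same check can be automated. The sketch below is mine rather than sigfpe's Haskell, and it rests on my reading of the definitions above: multiplication is concatenation, comultiplication sums over all ways of pulling a subsequence out of a string, the counit is 1 on the empty string and 0 elsewhere, and the inverse map is assumed to be $(-1)^{\text{length}}$ times the reversed string. Linear combinations of strings are stored as dictionaries from strings to integer coefficients.

```python
from collections import Counter
from itertools import combinations


def comult(word):
    """Sum of t (x) u over all ways to split word into a subsequence t and
    the leftover characters u (so t and u shuffle together to give word)."""
    result = Counter()
    positions = range(len(word))
    for k in range(len(word) + 1):
        for chosen in combinations(positions, k):
            t = "".join(word[i] for i in chosen)
            u = "".join(word[i] for i in positions if i not in chosen)
            result[(t, u)] += 1
    return result


def antipode(word):
    """Assumed inverse map: the sign (-1)^length and the reversed word."""
    return (-1) ** len(word), word[::-1]


def mult_after_antipode_on_right(word):
    """Apply comult, then the inverse map on the right factor, then concatenate."""
    total = Counter()
    for (t, u), coeff in comult(word).items():
        sign, reversed_u = antipode(u)
        total[t + reversed_u] += coeff * sign
    # Drop terms whose coefficients cancelled to zero.
    return {w: c for w, c in total.items() if c != 0}


def unit_after_counit(word):
    """counit is 1 on the empty string and 0 otherwise; unit then sends the
    scalar c to c times the empty string."""
    return {"": 1} if word == "" else {}


# The inverse law says the two sides agree on every basis string.
for word in ["", "a", "ab", "aab"]:
    assert mult_after_antipode_on_right(word) == unit_after_counit(word)
```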

Example 2. Another Quantum Group on Finite Strings.
This is also a quantum group on .

We let be the sum of all possible ways of shuffling and together.
We let , the empty string.
We let and where .
We let , where is the length of and is the reverse of .
We let be the sum of all such that .

Again, we can verify the inverse law in a specific case: We have that . If we apply on the right, we get . Now applying , we get .

Example 3: A Quantum Group on Finite Binary Trees.
This example is from here. We think of a finite binary tree as a finite tree where each node has either zero or two children, and where we distinguish between left and right. This picture from the above paper shows how, if you select a leaf of a finite binary tree, you may divide the tree into the tree to the left of the leaf and the tree to the right of the leaf:

If you select a multiset of leaves, you may similarly divide your tree into trees.

We may now define a quantum group on , where is the set of finite binary trees as follows:

is the sum of all trees generated as follows: Suppose that has leaves. Divide into trees as above, and stick them onto the leaves of . There are many graphical examples of this here.
is the tree with a single node and no leaves.
is the sum of all , where and are trees that can be divided into. There are many graphical examples of this here.
is 1 if is the tree with only one node, and zero otherwise.
For the definition of , I refer you to the above paper. However, there are many graphical examples of the antipode here.

Doing Calculus on the Rationals (with the help of Nonstandard Analysis)

mkoconnor — Sat, 15 Nov 2008 02:32:08 +0000

Nonstandard Analysis is usually used to introduce infinitesimals into the real numbers in an attempt to make arguments in analysis more intuitive.

The idea is that you construct a superset $\mathbb{R}^*$ which contains the reals and also some infinitesimals, prove that some statement holds of $\mathbb{R}^*$, and then use a general “transfer principle” to conclude that the same statement holds of $\mathbb{R}$.

Implicit in this procedure is the idea that $\mathbb{R}$ is the real world, and therefore the goal is to prove things about it. We construct a field $\mathbb{R}^*$ with infinitesimals, but only as a method for eventually proving something about $\mathbb{R}$.

We can do precisely the same thing with $\mathbb{Q}$ instead of with $\mathbb{R}$. But, in Weak Theories of Nonstandard Arithmetic and Analysis, Jeremy Avigad observed that if we don’t care about transferring the results back down to $\mathbb{Q}$, then we can get all the basic results of calculus and elementary real analysis just by working with $\mathbb{Q}^*$, and without ever having to construct the reals.

Let me first differentiate two approaches to nonstandard analysis. The first is the one I mentioned above, where you actually construct a field $\mathbb{R}^*$ (although you need the axiom of choice to do it). This is done entirely within ordinary mathematics. Call this the semantic approach.

Another approach is the axiomatic approach. A good example of this is Edward Nelson’s internal set theory. In this approach, you take an ordinary axiomatization of some part of mathematics (for example, ZFC), introduce a new predicate for being “standard” or “normal-sized”, and some axioms saying that there exist things which are not standard and how these things relate to everything else. In the usual situation, a sentence which does not contain the predicate “standard” is provable in the new theory iff it’s provable in the old theory. (This is the case with IST and ZFC.)

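The axiomatic approach is the approach we’ll take here. We’ll let our language $L$ consist of a function symbol for each primitive recursive function and relation, together with a predicate $\mathrm{st}$ and a constant $\omega$. Our axioms will be the following:

If $\phi$ is a true (in the natural numbers) first-order $L$-sentence that does not include the new predicate $\mathrm{st}$, then we take $\phi$ as an axiom.
We take $\neg \mathrm{st}(\omega)$ as an axiom.
We take $\forall x\, \forall y\, (\mathrm{st}(x) \wedge y < x \to \mathrm{st}(y))$ as an axiom.
We take $\forall x_1 \ldots \forall x_n\, (\mathrm{st}(x_1) \wedge \cdots \wedge \mathrm{st}(x_n) \to \mathrm{st}(f(x_1, \ldots, x_n)))$ to be an axiom for each $n$-ary primitive recursive function $f$.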

The interpretation of our sentences is that we are now quantifying over a domain which includes infinitely large natural numbers (of which $\omega$ is an example) and that the predicate $\mathrm{st}$ picks out those which are normal-sized. However, since we are working within the axiomatic system, I will still refer to the domain we are quantifying over as $\mathbb{N}$.

Within the system, construct $\mathbb{Z}$ and $\mathbb{Q}$ from $\mathbb{N}$ as usual. We make the following definitions:

We say that a natural number $n$ is unbounded if it is not standard (i.e., if $\neg \mathrm{st}(n)$). We say that an integer $z$ is unbounded if $|z|$ is unbounded. We say that a rational $q$ is unbounded if the closest integer to it is unbounded.

Furthermore, we say that a rational number $q$ is infinitesimal if it equals 0 or if $1/q$ is unbounded. We say that $q$ and $r$ are infinitely close, written $q \sim r$, if $q - r$ is infinitesimal.

Let $\mathbb{Q}_b$ be the set of rationals which are not unbounded. We can now do analysis on $\mathbb{Q}_b$. First of all, we can define continuity in a natural way: We say that $f$ is continuous if whenever $x \sim y$, $f(x) \sim f(y)$.

We have the intermediate value theorem for $\mathbb{Q}_b$: If $f(0) < 0$ and $f(1) > 0$ and $f$ is continuous, then there is an $x$ such that $f(x) \sim 0$. Proof: Recall that $\omega$ is a natural number. Let $k$ be the maximum natural number less than $\omega$ such that $f(k/\omega) < 0$. (This is possible because there are only finitely many natural numbers less than any natural number, including $\omega$!) But then $f(k/\omega)$ must be infinitely close to $0$, since by continuity $f(k/\omega) \sim f((k+1)/\omega)$, and $f((k+1)/\omega) \geq 0$.
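
The search in this proof can be imitated with exact rational arithmetic, with the caveat that an ordinary large integer is only a stand-in for the unbounded constant, so "infinitely close" becomes "within a small explicit bound". The function name `approximate_root` and the test function are my own choices for this sketch.

```python
from fractions import Fraction


def approximate_root(f, omega):
    """Largest k < omega with f(k/omega) < 0, returned as the rational k/omega.

    omega is an ordinary large integer standing in for the unbounded constant.
    """
    k = max(k for k in range(omega) if f(Fraction(k, omega)) < 0)
    return Fraction(k, omega)


# f(0) < 0 < f(1), and f has no rational root at all.
f = lambda x: x * x - Fraction(1, 2)
omega = 10_000
x = approximate_root(f, omega)

# f changes sign between x and the next grid point...
assert f(x) < 0 <= f(x + Fraction(1, omega))
# ...and on [0, 1] neighbouring grid values of this f differ by at most
# 3/omega, so f(x) is within that bound of zero.
assert abs(f(x)) < Fraction(3, omega)
```

Even though $x^2 = 1/2$ has no rational solution, the grid search finds a rational at which the function is as close to zero as the grid allows, which is the finite shadow of the statement that $f(x)$ is infinitely close to zero.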

We can also prove that any continuous function on $[0, 1]$ attains a maximum (up to $\sim$) by essentially the same means: just consider the $k$ for which $f(k/\omega)$ is a maximum, which is again possible considering that there are only finitely many $k$.

Turning to differentiation, we may define $f'(x) = y$ if for all non-zero infinitesimals $h$,

$$\frac{f(x+h) - f(x)}{h} \sim y.$$

(Note that the derivative is actually defined only up to $\sim$.) We can then prove that the derivative of $x^2$ is $2x$ by letting $h$ be an arbitrary infinitesimal, expanding $(x+h)^2 - x^2$, dividing by $h$, and noting that what results is $2x$ plus an infinitesimal.
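
Spelled out for the function $x \mapsto x^2$ (assuming that is the example intended), with $h$ a non-zero infinitesimal:

$$\frac{(x+h)^2 - x^2}{h} = \frac{2xh + h^2}{h} = 2x + h.$$

The result differs from $2x$ by exactly $h$, which is infinitesimal, so the quotient is infinitely close to $2x$ for every choice of $h$.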

Avigad notes that we may continue by defining $e^x$, $\sin x$, and $\cos x$ by taking an unbounded partial sum of the Taylor expansions, and that this is sufficient to prove all the basic properties. He also cites an easy proof in this setting of the Cauchy-Peano theorem on the existence of solutions to differential equations.

Games Which are Impossible to Analyze

mkoconnor — Sun, 09 Nov 2008 14:38:35 +0000

In the last post, I mentioned the computational complexity of various games. To be explicit, we consider each “game” to actually be a sequence of games $G_n$ for $n \in \mathbb{N}$. For example, $G_n$ would be checkers played on an $n \times n$ board. The problem was then to analyze the computational complexity of the function which takes $n$ and tells you which player has a winning strategy in $G_n$ and what the winning strategy is. I’ll call that function the analysis function of $G$.

Are there any games which can actually be played in the real world with an undecidable analysis function? Robert Hearn, in the same thesis that I linked to last time, showed that the answer is yes.

In order to make sure that our games can be played in the real world, we’ll restrict our attention to games where each $G_n$ has only finitely many positions (that is, the board has only finitely many states it can be in).

First off, observe that if $G$ is a game of perfect information, then it has a decidable analysis function. By the fact that there are only finitely many positions in each $G_n$, you can construct a finite game tree for $G_n$ (finite because you can cut it off when positions repeat) and then induct up it to find out who has a winning strategy. (You may want to think about this if you haven’t seen it before. This is something like what is sometimes called Zermelo’s Theorem.)
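
The induction up the game tree can be sketched in a few lines. The game below is a hypothetical toy (a pile of stones, each player removes 1, 2 or 3, and whoever takes the last stone wins); it is not taken from Hearn's thesis, and it is acyclic, so no cut-off for repeated positions is needed.

```python
from functools import lru_cache

# Hypothetical toy game used only for illustration: players alternate
# removing 1, 2 or 3 stones, and whoever takes the last stone wins.
MOVES = (1, 2, 3)


@lru_cache(maxsize=None)
def mover_wins(stones: int) -> bool:
    """Return True if the player about to move has a winning strategy."""
    # A position is winning iff some legal move leads to a position that is
    # losing for the opponent. With no stones left there are no legal moves,
    # so any() is False: the player to move has already lost.
    return any(not mover_wins(stones - m) for m in MOVES if m <= stones)


def winning_move(stones: int):
    """Return a move that keeps the win, or None if the mover is losing."""
    for m in MOVES:
        if m <= stones and not mover_wins(stones - m):
            return m
    return None


print(mover_wins(5), winning_move(5))
```

The memoised recursion visits each position once, which is the "finitely many positions" observation in miniature.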

OK, so let’s look at games which are not of perfect information, i.e., each of the players has some private information. Now it’s possible that no player has a winning strategy (for example, consider the two-player game where both players secretly choose either 0 or 1, and if the sum is odd then Player 1 wins, and if it’s even then Player 2 wins). Even so, the question of which, if any, players have a winning strategy is well-defined.
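
The parity example is small enough to check exhaustively. The sketch below assumes that a strategy in this one-shot game is just a fixed choice of 0 or 1, and that "winning" means winning against every possible choice by the opponent; the function names are mine.

```python
CHOICES = (0, 1)


def winner(choice1: int, choice2: int) -> int:
    """Player 1 wins on an odd sum, Player 2 on an even sum."""
    return 1 if (choice1 + choice2) % 2 == 1 else 2


def has_winning_strategy(player: int) -> bool:
    """Check whether some fixed choice wins against every opposing choice."""
    for mine in CHOICES:
        if player == 1:
            outcomes = [winner(mine, theirs) for theirs in CHOICES]
        else:
            outcomes = [winner(theirs, mine) for theirs in CHOICES]
        if all(w == player for w in outcomes):
            return True
    return False


print(has_winning_strategy(1), has_winning_strategy(2))
```

Both calls return False: whatever one player picks, the other has a reply that flips the parity.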

It turns out that these also have decidable analysis functions. I’ll also omit this proof, but it’s similar to the above except that, instead of a finite tree of positions, you can construct, for each player, the tree of subsets of positions that he considers possible at any given time.

So what hope is there? I said that both games of perfect information and games of imperfect information have decidable analysis functions. But there was an unstated assumption: that all players are playing against each other. Hearn showed that a game with two players who are playing as a team against a third player, but such that each of the two team members has private information (which they are not allowed to communicate), can be undecidable.

The idea of the proof is as follows: Imagine that we have three players, Player 1, Player 2, and Player 3, and that Players 2 and 3 are playing as a team against Player 1. In order to make the analysis function undecidable, we would like to do something like (say) in game $G_n$ force Player 2 to emulate the $n$th Turing machine and have his team win if it halts. But we can’t literally make the board an infinite tape and make Player 2’s legal moves be those simulating the $n$th Turing machine, because then the board would have infinitely many positions.

But what if, instead of an infinite tape, the board was a single cell where Player 2 wrote out the computational history of the $n$th Turing machine one character at a time? That is, if $s_t$ represents the (finite) contents of the tape at time $t$, he writes out the concatenation of all the $s_t$ (with, say, a special symbol # separating them). This would be good, but how can we enforce this given that we want the set of legal moves to depend only on the board position?

The solution is the following: We require both Player 2 and Player 3 to write out the computational history of the $n$th Turing machine in two separate “streams”. Player 1 lets each of them know which stream he wants them to write a character to at any given time. Player 1 can then check them against each other by advancing one of them ahead of the other in one of the streams and checking, character by character, that the string that one of them is writing out is one step further along the computational history of the $n$th Turing machine than the string that the other is writing out. Since Player 2 and Player 3 do not know who is advanced relative to the other, they will not be able to cheat.

Hearn uses this argument to show that a game he calls Team Computation has an undecidable analysis function. He then uses that to show that the team version of Constraint Logic (discussed in the previous post) is undecidable.

How to Show that Games are Hard

mkoconnor — Tue, 04 Nov 2008 02:35:43 +0000

Peg Solitaire is a pretty popular game, often found in restaurants (including Cracker Barrel, if I remember correctly). It’s also NP-complete (by which I mean determining a winning strategy given the initial set-up is an NP-complete problem). You may have also heard of computational complexity results for Minesweeper (see here, for example). There are a number of other results showing that various popular games are complete for some complexity class.

But what if you come across a new game, which no computer scientist has heard of yet? Well, you’re in luck, as Robert Hearn, in his thesis, formulated a framework called Constraint Logic intended to make it easy to prove complexity results for games.

Games are classified by Hearn by whether they are zero-, one-, or two-player, and further by whether or not their length is polynomially bounded by their initial setup. (Hearn also covers team games, which I’ll cover in a later post.)

Recall that by saying a game is in a complexity class I mean that the problem of, given an initial setup for the game, determining whether or not there is a winning strategy for a given player (if any) and, if so, what that strategy is, is in that complexity class. Thus, when we talk about a game being in a particular complexity class, we are always implicitly talking about a way of generalizing the game to different initial setups. For example, the statement that Go is in EXPTIME means that the problem of, given $n$, determining a winning strategy for Go on an $n \times n$ board is in EXPTIME.

An example of a zero-player bounded-length game is the game Clock Solitaire. Once the cards are dealt, the game is completely deterministic (this makes it zero-player), and furthermore, each card is turned over at most once (this makes it bounded). Games of this sort are in the complexity class P.
An example of a zero-player unbounded game is Conway’s Game of Life. Games of this sort tend to be in PSPACE. Conway’s Game of Life is PSPACE-complete.
An example of a one-player bounded-length game is Peg Solitaire, mentioned above. It is bounded since each move removes a peg, and there is no way to get a peg back. Games of this sort are in NP. Peg Solitaire, as mentioned above, is NP-complete.
Examples of one-player unbounded games include Rubik’s Cube and Rush Hour. Games of this sort tend to be in PSPACE. Rush Hour is PSPACE-complete. (There are a couple of different ways of generalizing the Rubik’s Cube to higher dimensions, but I’m not aware of complexity results for any of them. If you know, please leave a comment.)
Examples of two-player bounded-length games include Tic-Tac-Toe, John Nash’s game Hex, and Othello. In each case, the game is of bounded length since once a player marks a space on the board, it can never be unmarked or marked again. Games of this sort are in PSPACE. Hex and Othello are PSPACE-complete. (As with the Rubik’s Cube, there are a couple of different ways to generalize Tic-Tac-Toe to higher dimensions, but I don’t know of any complexity results for them. If you know, please leave a comment.)
Examples of two-player unbounded games include many familiar games such as checkers, chess, and go. Games of this type tend to be in EXPTIME. Each of the three games mentioned is EXPTIME-complete.
Hearn also defines “team” games, which I will skip in this post.

So what is constraint logic? It’s a single setup which naturally gives examples of games of each of the six types listed above, and furthermore is in each case complete with respect to the associated complexity class. Thus, if you can reduce constraint logic to whichever game you’re interested in, you have a completeness result.

The setup for constraint logic is as follows: The game board is an undirected graph with weights of 1 or 2 on the edges and non-negative integers assigned to the vertices (these non-negative integers are called the minimum inflow of the vertex). A position of the board is an assignment of direction to each edge. The position is legal if, for each vertex, the sum of all the weights of the edges directed towards the vertex is at least the minimum inflow of the vertex. A move is generally reversing the direction of an edge (to give another legal position), and the goal of the game is generally to reverse the direction of a specified edge.
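
To make the legality rule concrete, here is a minimal sketch of a board and a legality check. The representation (a dict of edges, a dict recording which endpoint each edge points to, and a dict of minimum inflows) and all names are my own choices, not Hearn's; the example vertex `a` has minimum inflow 2, two weight-1 edges and one weight-2 edge.

```python
# Each edge id maps to (endpoint, endpoint, weight). A position ("heads")
# maps each edge id to the endpoint that the edge currently points to.
EDGES = {"x": ("p", "a", 1), "y": ("q", "a", 1), "out": ("a", "r", 2)}
MIN_INFLOW = {"a": 2, "p": 0, "q": 0, "r": 0}
START = {"x": "a", "y": "a", "out": "r"}


def inflow(vertex, edges, heads):
    """Sum of the weights of the edges directed towards `vertex`."""
    return sum(w for eid, (_, _, w) in edges.items() if heads[eid] == vertex)


def is_legal(edges, heads, min_inflow):
    """A position is legal if every vertex meets its minimum inflow."""
    return all(inflow(v, edges, heads) >= need for v, need in min_inflow.items())


def reverse_edge(edges, heads, eid):
    """Return a new position with edge `eid` pointing the other way."""
    u, v, _ = edges[eid]
    new_heads = dict(heads)
    new_heads[eid] = u if heads[eid] == v else v
    return new_heads


def legal_moves(edges, heads, min_inflow):
    """Edges whose reversal leads to another legal position."""
    return [eid for eid in edges
            if is_legal(edges, reverse_edge(edges, heads, eid), min_inflow)]


print(legal_moves(EDGES, START, MIN_INFLOW))
```

In the starting position only `out` can be reversed: turning either weight-1 edge away from `a` would drop its inflow to 1.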

How does this give us each of the six types?

To form a zero-player bounded-length game: Suppose we are given a board position $G_0$ and a goal edge $e$ (i.e., $G_0$ is a directed graph with weights assigned to the edges and non-negative integers assigned to the vertices, $e$ is an edge of $G_0$, and the game is won if the direction of $e$ is reversed). Then let $R_0$ be the set of edges $e'$ such that it’s legal to reverse the direction of $e'$. Reverse the direction of all of these edges to get a new position $G_1$. Let $R_1$ be the set of all edges $e'$ such that it’s legal to reverse the direction of $e'$ and $e'$ hasn’t been reversed before. Reverse all of those, and so on. Since an edge is reversed at most once, the game is bounded length. (Notice that it might be the case that in reversing some edge in $R_i$, you make it so that some other edge in $R_i$ can no longer be legally reversed. This can indeed happen, but it is possible to restrict to a subclass of games where this never happens.) This game is P-complete.
To form a zero-player unbounded-length game, you essentially do the same thing as above, but remove the restriction that an edge can be reversed at most once. There are some technicalities, however, which are to ensure that every time you try to reverse an edge, it is a legal move. This game is PSPACE-complete.
To form a one-player bounded game: On each turn the player reverses the direction of an edge (to form a legal position). Each edge can be reversed at most once. The player wins if he is able to reverse the direction of the pre-specified goal edge. This game is NP-complete.
To form a one-player unbounded game: Just as above, except that each edge may be reversed as many times as necessary. This game is PSPACE-complete.
To form a two-player bounded game: Each player has a set of edges which only they may reverse. Each player also has their own target edge. They take turns reversing edges and each edge may be reversed at most once. The first player to reverse their target edge wins. This game is PSPACE-complete.
To form a two-player unbounded game: Just as above, except that each edge may be reversed an unlimited number of times. This game is EXPTIME-complete.
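
As an illustration of the one-player bounded variant in the list above, here is a brute-force search, a sketch under my own board representation (edge id to endpoints and weight, plus a dict recording which endpoint each edge points to). It takes exponential time in general, which is unsurprising for an NP-complete game.

```python
def inflow(vertex, edges, heads):
    """Sum of the weights of the edges directed towards `vertex`."""
    return sum(w for eid, (_, _, w) in edges.items() if heads[eid] == vertex)


def is_legal(edges, heads, min_inflow):
    return all(inflow(v, edges, heads) >= need for v, need in min_inflow.items())


def flipped(edges, heads, eid):
    """Return a new position with edge `eid` pointing the other way."""
    u, v, _ = edges[eid]
    new_heads = dict(heads)
    new_heads[eid] = u if heads[eid] == v else v
    return new_heads


def solve_bounded(edges, heads, min_inflow, goal, used=frozenset()):
    """Return a list of edge reversals ending with `goal`, or None.

    Each edge may be reversed at most once, and every intermediate
    position must be legal.
    """
    for eid in edges:
        if eid in used:
            continue
        nxt = flipped(edges, heads, eid)
        if not is_legal(edges, nxt, min_inflow):
            continue
        if eid == goal:
            return [eid]
        rest = solve_bounded(edges, nxt, min_inflow, goal, used | {eid})
        if rest is not None:
            return [eid] + rest
    return None


# Hypothetical example: vertex "a" needs inflow 2. The weight-2 edge "out"
# can only be turned away from "a" after both weight-1 edges point at it.
EDGES = {"x": ("p", "a", 1), "y": ("q", "a", 1), "out": ("a", "r", 2)}
HEADS = {"x": "p", "y": "q", "out": "a"}
print(solve_bounded(EDGES, HEADS, {"a": 2}, "out"))
```

Dropping the edge `y` from this example makes the goal unreachable, and the search returns None.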

Part II of Hearn’s thesis (linked above) contains a large number of complexity proofs for games based on these results. Furthermore, he has results which make this endeavor easier: a priori, you would have to reduce any graph to the game in question to prove that it’s complete with respect to the appropriate complexity class. Hearn proves that it suffices to reduce planar graphs which consist solely of AND vertices, which are those of minimum inflow 2 and with exactly three adjacent edges, one of which has weight 2 and two of which have weight 1, and OR vertices, which are those of minimum inflow 2 and with exactly three adjacent edges, all of which have weight 2. (You can interpret games built up from these vertices as a kind of a non-deterministic circuit computation, which is where they get their name.)

For example, consider sliding block puzzles: In a sliding block puzzle, you are given a number of rectangular blocks in a rectangular box. The goal is to slide the blocks around to get one particular block in one particular place. Using Constraint Logic, Hearn, together with his advisor, Erik Demaine (who is amazing, by the way), showed that this was PSPACE-complete. The proof is shown in this figure, reproduced from this paper by Demaine and Hearn:

On the left is the translation of an AND vertex; on the right is the translation of an OR vertex. In each case, the three yellow blocks on the border represent the three edges adjacent to the vertex. Reversing the direction of an edge corresponds to either pushing the block into the square, or pulling it out. The squares are designed so that it is possible to slide the blocks in the interior around so that pushing a yellow block inside is possible iff the corresponding edge reversal is legal.

For more information, see Hearn’s thesis and the paper by Demaine and Hearn linked above, as well as this additional paper by Demaine and Hearn on the subject.

Another Puzzle in Recursion Theory: n-Enumerable Sets

mkoconnor — Sun, 26 Oct 2008 17:11:40 +0000

We can think of a computably enumerable (or c.e.) set as a bag which some computer program puts more and more numbers into over time. The set then consists of all numbers which are in the bag from some point on (i.e., the numbers which are eventually put in the bag by the program).

Suppose that we relax the restrictions on the program, and we allow it to take a number out of the bag that it has put in (but once the program has done that, the number stays out forever). We call the set of numbers which are in the bag from some point on (i.e., the set of numbers which are put in the bag and never taken out) in a procedure of this sort a 2-c.e. set.

We can analogously let an $n$-c.e. set be one given by a program which can, for each number, “toggle” that number’s status up to and including $n$ times if it likes.

The puzzle is then to find, for each $n$, an example of a set which is $(n+1)$-c.e. but not $n$-c.e.

Here is an example of a set which is $(n+1)$-c.e. but not $n$-c.e.: The program simulates all Turing machines simultaneously (by dovetailing, this can still be done with a finite algorithm).

For each $e$ it does the following: If the $e$th Turing machine halts, it puts $e$ in the bag. In this case, the program also checks to see what number the $e$th Turing machine output when it halted; call it $e_1$. If eventually the $e_1$th Turing machine halts, the program takes $e$ out of the bag. In this case, the program also checks to see what number the $e_1$th Turing machine output when it halted; call it $e_2$. If eventually the $e_2$th Turing machine halts, the program puts $e$ back in the bag. And so on, up to a maximum of $n+1$ times.
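
The toggling chain can be traced on a toy model. In the sketch below, real Turing machines are replaced by a hypothetical lookup table that lists, for each machine that halts, the number it outputs; machines missing from the table never halt. A real program could not consult such a table, it would only see halting events as they occur, so this computes the eventual contents of the bag rather than simulating the enumeration.

```python
def final_membership(index, outputs, max_toggles):
    """Return True if `index` ends up in the bag.

    `outputs` is a hypothetical table: machine index -> output, for the
    machines that halt. Each halting machine in the chain causes one
    toggle, and at most `max_toggles` toggles are allowed.
    """
    toggles = 0
    current = index
    while toggles < max_toggles and current in outputs:
        toggles += 1                 # this machine halts: toggle `index`
        current = outputs[current]   # next, watch the machine it names
    # The number is in the bag after an odd number of toggles.
    return toggles % 2 == 1


# Machines 0, 1 and 3 halt with the outputs shown; machine 2 never halts.
TABLE = {0: 1, 1: 2, 3: 3}
print([final_membership(e, TABLE, 3) for e in range(4)])
```

Machine 3 outputs its own index, so it uses every toggle it is allowed; changing `max_toggles` from 3 to 2 flips its membership.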

Call that set $A_{n+1}$. Clearly it’s $(n+1)$-c.e. We have to show that it’s not $n$-c.e.

First observe that each $A_n$ is complete with respect to $n$-c.e. sets. This means that if $B$ is $n$-c.e., then there is a computable function $f$ such that $e \in B$ iff $f(e) \in A_n$ for all $e$.

Now observe that if $B$ is $n$-c.e., then its complement $\overline{B}$ is $(n+1)$-c.e. We can find an $(n+1)$-c.e. presentation of $\overline{B}$ by letting our computer program start by putting all numbers in the bag and then doing the opposite of what the $n$-c.e. presentation of $B$ does.

Combining these two observations, if $A_{n+1}$ were $n$-c.e., there would be a computable function $f$ such that for all $e$, $e \in A_{n+1}$ iff $f(e) \notin A_{n+1}$. By Kleene’s Fixed Point Theorem (also known as Kleene’s Recursion Theorem), there is an $e$ such that the $e$th Turing machine behaves the same as the $f(e)$th Turing machine. For this particular $e$ we then have that $e \in A_{n+1}$ iff $e \notin A_{n+1}$, which is a contradiction.

Two Puzzles in Recursion Theory: Verbose Sets and Terse Sets

mkoconnor — Thu, 16 Oct 2008 23:16:05 +0000

Let $K$ be the set of all $e$ such that the $e$th Turing machine halts. (For these puzzles, we will assume that Turing machines are always run on a blank initial state, i.e., they take no input.) Recall that $K$ is computably enumerable, but not decidable.

Puzzle #1. Describe a winning strategy for the following game: You are given three numbers $a$, $b$, $c$. You must correctly say for each number whether or not it is in $K$. You are allowed to ask (and receive a truthful answer to) two questions of the form “Is $x$ in $K$?” for any $x$.

Puzzle #2. Show that there is no winning strategy for the game which is the same as that in Puzzle #1 except you are given two numbers and may ask only one question. (Even stronger, show that if $A$ is a set such that you can win that game with $A$ in place of $K$, then $A$ must be decidable.)

These puzzles are special cases of more general questions answered in Terse sets, superterse sets, and verbose sets by Richard Beigel, William Gasarch, John Gill, and James Owings. I also suggest looking at Richard Beigel’s page of online papers, which has a lot of interesting stuff.

I read about this first in Piergiorgio Odifreddi’s book “Classical Recursion Theory, Volume 1.”

Answer #1. The first thing to do is to find out how many of $a$, $b$, $c$ are in $K$. To do this, let $e$ be such that the $e$th Turing machine simulates the $a$th, $b$th and $c$th Turing machines in parallel, and halts after any two of them halt.

By asking if $e$ is in $K$, you will find out either that there are two or more or that there are one or fewer of $a$, $b$, $c$ in $K$. Use a similar second question to find out exactly how many of $a$, $b$, $c$ are in $K$.

Once you know how many of $a$, $b$, $c$ are in $K$, just run the $a$th, $b$th, and $c$th Turing machines in parallel, and wait until all the ones that are going to halt do halt. Then you will know which of them are in $K$.
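
Here is a toy run of this strategy. It assumes a stand-in for real Turing machines: each machine is just a halting time (a step count, or None if it never halts), and the oracle question "is the machine that halts once k of these have halted in the halting set?" is modelled by a function `oracle(k)`. The solver only looks at the halting times through bounded, step-by-step simulation.

```python
def make_oracle(halt_times):
    """Build a toy oracle and a log of the questions it was asked.

    oracle(k) answers whether a machine that runs all the given machines
    in parallel and halts once k of them have halted would itself halt.
    """
    asked = []

    def oracle(k):
        asked.append(k)
        return sum(t is not None for t in halt_times) >= k

    return oracle, asked


def decide_three(halt_times, oracle):
    """Decide which of three machines halt, using two oracle questions."""
    # Two threshold questions pin down exactly how many machines halt.
    if oracle(2):
        count = 3 if oracle(3) else 2
    else:
        count = 1 if oracle(1) else 0
    # Run everything in parallel until that many machines have halted.
    halted = set()
    step = 0
    while len(halted) < count:
        step += 1
        for i, t in enumerate(halt_times):
            if t is not None and t <= step:
                halted.add(i)
    return [i in halted for i in range(len(halt_times))]


times = [5, None, 2]
oracle, asked = make_oracle(times)
print(decide_three(times, oracle), asked)
```

The loop is guaranteed to stop because `count` is the true number of halting machines, which is the whole point of spending both questions on counting.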

Answer #2. Suppose that you have a winning strategy, and I’ll show how to compute $K$. Let $f_{\text{no}}$ be the function which, given $a$ and $b$, returns the guess for whether or not $a$ is in $K$, supposing that you received a “no” answer to whatever question your strategy decided to ask. Similarly define $f_{\text{yes}}$, $g_{\text{no}}$, and $g_{\text{yes}}$, where the $g$ functions return the guesses about $b$.

Either it is the case that for all $a$ there is a $b$ such that $f_{\text{no}}(a,b) = f_{\text{yes}}(a,b)$ or it isn’t. Suppose first that it is. Then given $a$, we may compute whether or not $a$ is in $K$ by searching for such a $b$ and computing $f_{\text{no}}(a,b)$ (which equals $f_{\text{yes}}(a,b)$). This must be the correct guess since either “yes” or “no” must be the correct answer to whatever question the strategy asked.

Now suppose that there is an $a_0$ such that for all $b$, $f_{\text{no}}(a_0,b) \neq f_{\text{yes}}(a_0,b)$. Since $a_0$ is just a single number, we can assume that we know whether or not it’s in $K$. Then, given any $b$, since $f_{\text{no}}(a_0,b) \neq f_{\text{yes}}(a_0,b)$, we know which of them is correct, hence we know whether “yes” or “no” is the correct answer to the question that the strategy would pose. Hence we can compute its correct guess as to whether or not $b$ is in $K$.

Additional Results. The paper linked above gives generalizations of both puzzles: You can find out whether any set of $2^n - 1$ numbers is in $K$ by asking $n$ questions (by the same strategy of determining how many of them are in $K$), and, for any $A$, if you can determine whether any set of $2^n$ numbers is in $A$ by asking $n$ questions then $A$ is decidable.

The authors call a set verbose if it is such that you can determine whether any set of $2^n - 1$ numbers is in that set by asking $n$ questions. They call it terse if you need $n$ questions to determine whether or not $n$ numbers are in the set. They show a number of interesting results about these, mainly along the lines that (very roughly) lots of both kinds exist, and in lots of different places.

When are the Real Numbers Necessary?

mkoconnor — Tue, 14 Oct 2008 02:44:18 +0000

The natural numbers can all be finitely represented, as can the rational numbers. The real numbers, however, cannot be so represented and require some notion of “infinity” to define. This makes it both computationally and philosophically interesting to determine for what purposes you need the real numbers, and for what purposes you need only the rationals.

It’s pretty clear that spatial concepts having to do with distances and rotation require the real numbers. For example, if we took as our model of the plane, the distance from to would not be rational, and we would not be able to rotate the point about the point by most angles.

But I always implicitly thought that spatial notions not depending on distances or angles required only the rationals. It turns out that I was wrong: there are spatial notions not depending on distances or angles which differ depending on whether you take space to be or . The fact that I was wrong follows from a theorem of Micha Perles which is very famous in combinatorics, but which I only found out about recently.

I found out because the combinatorialist Drew Armstrong told me about it, and he referred me to the online book Lectures on Discrete and Polyhedral Geometry by Igor Pak.

Actually, the fact that I was wrong follows just from a lemma in the proof of Perles’s result, which I will state before telling you what Perles’s main result was.

Consider the following system of points and lines (the image is stolen from Pak’s manuscript):

The lemma is then that while there is a collinearity- and noncollinearity-preserving embedding of this diagram into $\mathbb{R}^2$, there is not one into $\mathbb{Q}^2$. Note that the question of a collinearity- and noncollinearity-preserving embedding of the diagram says nothing about angles or distances. The proof is simply to assume that there is a rational embedding, then to find a rational transformation of the configuration to one where you know that one of the points has an irrational coordinate. This proof appears on page 108 of Pak’s book.

Perles’s main theorem is the following, and I think it’s quite striking: A polytope is the convex hull of a finite set of points in some , where we consider two polytopes equivalent if they are combinatorially equivalent: i.e., if there is a bijection between the two sets of vertices such that if one pair of vertices has an edge between them, the corresponding pair does as well, etc. Then for all dimensions greater than 3, there is an -dimensional polytope which is not equivalent to one which is the convex hull of a set of points with only rational coordinates.

The discussion of this in Pak’s book is in Part I, Section 12.5.

Edit: I removed a paragraph on planar graphs because it didn’t really fit the article, and I took out the phrase “purely combinatorial property,” which was misleading and probably incorrect.

What Would the World Look Like if Everything was Computable?: An Introduction to Hyland’s Effective Topos

mkoconnor — Tue, 14 Oct 2008 00:48:39 +0000

Suppose that we wanted to construct a mathematical universe where all objects were computable in some sense. How would we do it?

Well, we could certainly allow the set into our universe: natural numbers are the most basic computational objects there are. (Notation: I’ll use to refer to when we’re considering it as part of the universe we’re building, and just when we’re talking about the set of natural numbers in the “real” world.) What should we take as our set of functions from to ? Since we want to admit only computable things, we should let be the set of computable functions from to , which we can represent non-uniquely by their indices (i.e., by the programs which compute them).

(For clarity, I’ll use the following notation for computable functions: denotes the partial function from to computed by the th Turing machine. Given , it is possible that the computation never halts; in that case, I’ll write and say that is undefined. If it does halt, I’ll write and say that is defined. If it halts, then it yields an output . To indicate what it is, I’ll write or .)

So, we’ve decided that should equal . What should equal? At this point, there is a slight subtlety: It’s not simply the set of computable functions from (considered as a subset of ) to (considered as ), because we would like to only admit those functions from to that return the same number when given inputs which represent the same element of .

Therefore, we’ll let be the set of such that, for all , , and whenever are such that for all , , then .

We can similarly define , except that there are now two places where we should take into account that we consider equivalent if for all , : We’ll let be the set of such that, whenever are equivalent in the aforementioned sense, and are defined and equivalent in the aforementioned sense.

In a similar fashion, we can define and so on; these sets are called the sets of hereditarily computable functions.

Can we generalize this construction to a category that incorporates all possible computable representations of real objects? More ambitiously, can we generalize to a category that is a genuine mathematical universe in the sense that questions like “Does the Riemann Hypothesis hold in this category?” are meaningful? The answer, due to Martin Hyland, is yes.

This material is from Jaap van Oosten’s book “Realizability: A Categorical Perspective” (link to Preface, Introduction and Table of Contents). Unfortunately, I don’t know of a freely available explanation of the effective topos on the web, which is part of the reason why I’m writing this blog entry. (If you know of one, please leave a comment. Edit: Found one.) However, the Stanford Encyclopedia of Philosophy has a pretty good section on the realizability interpretation of intuitionistic logic, on which Hyland’s effective topos is based.

Back to the math. Notice that what we did in the case was the following: Although we represented the computable functions as a subset of , we still kept the “real” set hiding around in the background: we used it to determine what the appropriate elements of should be: If two elements of represented the same element of the “real set” , then an element of should assign the same number to both of them.

That suggests a generalization. Let an assembly be a pair , where is a set, and is a function from to , the power set of . We think of as assigning to each element of its computable representations. We let a morphism between two assemblies and be a function from to such that there is an such that, whenever , and , .

With these morphisms, the class of assemblies forms a category. Let and be two assemblies. Then they have a direct product given by , where and is a pairing function. They have an exponential given by , where .
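
For readers who have not seen one, a pairing function is a computable bijection between pairs of natural numbers and natural numbers. The sketch below uses the Cantor pairing function as one standard choice; which pairing function the construction above has in mind is not specified, so treat this one as an assumption.

```python
from math import isqrt


def pair(a: int, b: int) -> int:
    """Cantor pairing: encode the pair (a, b) as a single natural number."""
    return (a + b) * (a + b + 1) // 2 + b


def unpair(z: int):
    """Inverse of `pair`: recover (a, b) from the code z."""
    # w is the index of the diagonal a + b = w on which the pair lies.
    w = (isqrt(8 * z + 1) - 1) // 2
    b = z - w * (w + 1) // 2
    return w - b, b


print(pair(2, 3), unpair(18))
```

Because both directions are computable, a single natural number can carry a representation of each component of a pair, which is what the product of two assemblies needs.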

If we let be the assembly , where , then the iterated exponential objects of correspond precisely to our initial definition of the sets of hereditarily computable objects above.

This is all great, but we still can’t call the category of assemblies a mathematical universe. Why not? Well, in the real world, we ask questions like “Is the Riemann Hypothesis true?”, “Is Goldbach’s conjecture true”, etc., but we don’t yet know how to ask questions like “Is the Riemann Hypothesis true in the category of assemblies?” any more than it makes sense to ask whether or not the Riemann Hypothesis is true in the group or in the number 17. What we need is a way to interpret statements being true or not in this category.

In turning the category of assemblies into one in which we can interpret logical statements, there are three considerations, each of which builds on the previous ones.

The object of truth values should have more than two elements. Let’s step back to the ordinary category of sets for a moment. Say we have two sets and and an injection from to . Given , what is the truth value of the proposition that is in the image of ? Well, I don’t know, but it’s clearly either or . But in the category of assemblies it’s more complicated due to the computational information we have lying around. Say that we have an injective morphism from an assembly to an assembly . Given , now what is the truth value of the proposition that is in the image of ? What if there is a such that but ? Without resolving the issue now, one plausible answer is that the truth value should be the set of indices of all computable functions taking to , so that the more “alike” and , the more “true” the proposition is, and furthermore this “alikeness” is represented in a computational way. So, a working hypothesis is that the set of truth values should be something like .
Objects should come equipped with an equivalence relation. In the category of assemblies, there is no question about whether or not two elements of a given object are equal. If we are making a category where the object of truth values is something like , however, we should allow that the proposition that different elements are equal has a truth value in that object, rather than in the classical set of truth values. Therefore, objects should be something like , where is a set and is a map from to . (We can represent an assembly as the object in our new category where and if ).
Morphisms should be more general than functions. If we’re allowing objects to come equipped with some sort of equivalence relation, we will have to let morphisms be more general than functions: If is a morphism from and , and , then is also true to some degree. So morphisms should probably be some sort of relation on that resembles a function in some way.

Now, after listing all those (somewhat vague) considerations, I’ll describe the category that takes them into account. It’s called the Effective Topos and it was discovered/invented by Martin Hyland.

Description of the category

The objects of the effective topos are pairs where is a set and is a map from to . This map is required to satisfy the following properties:

There must be a number such that for all if then .
There must be a number such that for all if and then .

(In the above, stands for “symmetric” and stands for “transitive.”)

A morphism from to is represented by a function satisfying the following:

There must be a number such that for all , if then where and .
There must be a number such that for all , , if , , and , then .
There must be a number such that for all , , if and then .
There must be a number such that for all , if then .

(In the above, stands for “strict,” stands for “relational,” stands for “single-valued” and stands for “total.”)

We say that two such representations and are equivalent if there exist such that for all , , if then and conversely if then . (Thus, a morphism in the Effective Topos is actually an equivalence class of representations as above.)

Figuring out how to compose such morphisms is an exercise left to the websurfer.

Let and be two objects. Their direct product is given by where . To form the exponential , take the object , where in the definition of , you emulate the definition of a morphism given above.

The object of truth values (often denoted in any topos) is , where .

The object playing the role of a singleton set is where .

The map from to representing the truth value is given by the equivalence class of the map defined by .

The natural numbers object of the effective topos is , where and where .

Interpretation of logical formulas in the effective topos.

I’ll now describe how logical formulas can be interpreted in Hyland’s effective topos. If are variables intended to range over the objects respectively and is a formula with free variables from , then I’ll show how to find a map from to interpreting that formula. If , and thus the formula has no free variables and is a sentence, then the interpretation will give a map from to . We say that a sentence holds in the effective topos if its interpretation is equal to the map defined above.

The only atomic relation is equality, and the interpretation of atomic formulas is given by the component of the objects of the effective topos.

For clarity, assume that and contain only one free variable, and that it ranges over the object . If we know the interpretations of and already, then we have the following:

is represented by the map taking to .
is represented by the map taking to .
is represented by the map taking to .

Now suppose that has two free variables ranging over and respectively, and I’ll show you how to interpret quantifiers.

is represented by the map taking to .
is represented by the map taking to .

Now, once we observe that we can interpret the power “set” of an object in the effective topos as the exponential , we know how to interpret all first- and higher-order sentences as holding or not in the effective topos.

Here are some interesting sentences given by Van Oosten that highlight some differences between the effective topos and the ordinary category of sets.

Note that we may write the relation “” as a relation on in our language. Then the sentence is true in the effective topos.
For every formula , where is a variable ranging over and is a variable ranging over , the sentence holds in the effective topos.
We may construct the rationals and the reals in the effective topos just as we do in the category of sets. However, they have different properties. For example, in the effective topos the statement “There exists a bounded monotonic sequence in that does not converge to a limit.” holds, contradicting the Bolzano-Weierstrass theorem. Intuitively, this is because we can find a bounded, monotonic sequence converging to a real number whose binary expansion encodes the halting problem but such that every member of the sequence has a decidable binary expansion.
The sentence holds in the effective topos.
Similar to the above, we can show that the intermediate value theorem fails in the effective topos.
In the effective topos, the statement “All functions from to are continuous” holds.

A language which does term inference

mkoconnor — Wed, 08 Oct 2008 01:38:53 +0000

Many strongly typed languages like OCaml do type inference. That is, even though they’re strongly typed, you don’t have to explicitly say what the type of everything is since a lot of the time the compiler can figure it out by itself. For example, if you define a function which takes an x and adds it to 3, the compiler will figure out that x is an int. (It couldn’t be a float, since it was added to 3 and not 3.0.)

But it often seems like the compiler should be able to infer not just the types of expressions, but the expressions themselves! For example, if the compiler infers that the type of some function f is (int -> int) -> (int list) -> int list (i.e., f is a higher-order function which takes a function from int to int, a list of ints, and produces a list of ints), then f is very probably the map function, defined informally by

map g [x_1;...;x_n] = [g x_1;...;g x_n].

Therefore, if the compiler determines that some expression has that type, and the user has somehow omitted the actual function definition, why not allow the compiler to infer what the expression is?

I made a stab at implementing this type of idea in a toy language I call TermInf (apologies for the weird hosting: I don’t have another hosting service at the moment). It’s a modification of the toy language Poly from Andrej Bauer’s Programming Language Zoo. You’ll need OCaml to compile it. Please feel free to alert me to any bugs or to tell me that my code is horrible.

More details below.

The basic idea is really simple: For any expression e, the expression {e} is also an expression. The compiler will infer the type t1 of e and the type t2 that {e} has to be. It will search for a sequence of coercions taking t1 to t2 and if there is a unique one, it will replace {e} with that sequence of coercions applied to e.

Which functions are coercions is determined by the user; functions can be declared to be coercions or removed from the list of coercions at any point.
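The search for a unique sequence of coercions can be sketched in a few lines of Python. This is only an illustration of the idea, not TermInf's actual algorithm: it handles base types only, the table `COERCIONS` and the coercion `int_to_float` are hypothetical, and the length cap `max_len` is an arbitrary assumption.

```python
from collections import deque

# Hypothetical coercion table: name -> (source type, target type, function).
# bool_to_int mirrors the example in this post; int_to_float is made up.
COERCIONS = {
    "bool_to_int": ("bool", "int", lambda b: 1 if b else 0),
    "int_to_float": ("int", "float", float),
}

def coercion_paths(src, dst, max_len=3):
    """Return every sequence of coercion names taking src to dst (at most max_len steps)."""
    paths = []
    queue = deque([(src, [])])
    while queue:
        ty, path = queue.popleft()
        if ty == dst:
            paths.append(path)  # the empty path plays the role of id
            continue
        if len(path) == max_len:
            continue
        for name, (a, b, _) in COERCIONS.items():
            if a == ty:
                queue.append((b, path + [name]))
    return paths

def infer(value, src, dst):
    """Apply the unique coercion sequence from src to dst, or fail if none or several exist."""
    paths = coercion_paths(src, dst)
    if len(paths) != 1:
        raise TypeError("Problem with term inference.")
    for name in paths[0]:
        value = COERCIONS[name][2](value)
    return value
```

With this table, `infer(True, "bool", "int")` follows the single path through `bool_to_int`, while `infer(1, "int", "bool")` fails because no path exists.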

I can think of at least three ways this would be useful.

1. Automatically coercing from one base type to another

This is actually the least interesting of the three, but it serves to illustrate how TermInf works.

You can use $show_coercions to show all the current coercions. The identity function id is always a coercion.

TermInf. Press Ctrl-D to exit.
TermInf> $show_coercions
id

Let’s define a new coercion from bool to int.

TermInf> let_coercion bool_to_int = fun x -> if x then 1 else 0
val bool_to_int : bool -> int
TermInf> $show_coercions
bool_to_int
id

Now we can use the coercion.

TermInf> {true} + 7
- : int = 8

In that instance, the interpreter could determine that the type of {true} had to be int, since it was added to 7. In the following instance, the interpreter can’t determine the type of {true}.

TermInf> {true}
Problem with term inference.

But we can always explicitly give a type to any expression, so we can use that to tell the type-inferer what the type of {true} is.

TermInf> {true} : bool
- : bool = true
TermInf> {true} : int
- : int = 1

2. Lifting functions
We can view the function List.map as a coercion, taking a function 'a -> 'b to a function 'a list -> 'b list.

TermInf> let_coercion map = rec map is fun f -> fun l -> match l with [] -> [] | x::ll -> (f x)::(map f ll)
val map : ('a -> 'b) -> 'a list -> 'b list
TermInf> $show_coercions
map
bool_to_int
id

Now we can try it out.

TermInf> let square = fun x -> x * x
val square : int -> int
TermInf> ({square} 3) : int
- : int = 9
TermInf> ({square} [1;2;3]) : int list
- : int list = 1 :: 4 :: 9 :: []
TermInf> ({square} [[1;2];[5;6;7]]) : int list list
- : (int list) list = (1 :: 4 :: []) :: (25 :: 36 :: 49 :: []) :: []

Note that in our case, we had to explicitly tell the interpreter what the return type was, although presumably in practice the interpreter or compiler would usually be able to infer it.

The idea is that we can change the basic structure of the thing passed to {square}, and the term inferer will adapt. Note that in the third case, the term inferer iterated map to produce the required (int list list -> int list list) type.

We can similarly look inside the structure of pairs.

TermInf> let_coercion map_pair = fun f -> fun x -> (f (fst x), f (snd x))
val map_pair : ('a -> 'b) -> 'a * 'a -> 'b * 'b
TermInf> ({square} [(1,2);(3,4)]) : (int * int) list
- : (int * int) list = (1, 4) :: (9, 16) :: []
TermInf> ({square} ([1;2],[3;4])) : (int list) * (int list)
- : int list * int list = (1 :: 4 :: [], 9 :: 16 :: [])

Essentially all variants of map can be added. For example, the function mapi : ((int * 'a) -> 'b) -> 'a list -> 'b list where the function takes the index of the list element can be added. Then the term-inferer will determine which version of map (or sequence of versions of map) is needed based on the function given to it.
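For readers without TermInf at hand, the lifting behavior above can be imitated in Python. The sketch below is an assumption-laden analogue, not a translation: it chooses how deep to map by inspecting the runtime shape of the argument, whereas TermInf picks the sequence of map and map_pair coercions statically from the types. The name `lift` is hypothetical.

```python
def lift(f, value):
    """Apply f at the leaves of nested lists and pairs.

    Lists stand in for the map coercion and tuples for map_pair.
    """
    if isinstance(value, list):
        return [lift(f, v) for v in value]
    if isinstance(value, tuple):
        return tuple(lift(f, v) for v in value)
    return f(value)

def square(x):
    return x * x
```

Calling `lift(square, [[1, 2], [5, 6, 7]])` squares every integer while keeping the list-of-lists shape, which corresponds to the third {square} example above.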

3. Term inference in conjunction with phantom types.
I put just enough type aliasing in TermInf to allow you to use phantom types. (For a great introduction to phantom types, see this blog post.)

Here’s an example of how type aliasing works in TermInf:

TermInf> type hidden = int
TermInf> let f = (fun x -> x + 7) : hidden -> hidden
val f : hidden -> hidden
TermInf> let x = 3 : hidden
val x : hidden
TermInf> f x
- : hidden = 10
TermInf> f 3
The types hidden and int are incompatible

Something we might like to do with phantom types is have the type system do a static dimensional analysis on our program. Here’s an attempt to do that:

TermInf> type meters
TermInf> type gallons
TermInf> type 'a units = int
TermInf> let add = (fun x -> fun y -> x + y) : 'a units -> 'a units -> 'a units
val add : 'a units -> 'a units -> 'a units
TermInf> let times = (fun x -> fun y -> x * y) : 'a units -> 'b units -> ('a * 'b) units
val times : 'a units -> 'b units -> ('a * 'b) units
TermInf> let one_gal = 1 : gallons units
val one_gal : gallons units
TermInf> let one_m = 1 : meters units
val one_m : meters units

Then we have the following correct behavior:

TermInf> add one_gal one_gal
- : gallons units = 2
TermInf> times one_gal one_m
- : (gallons * meters) units = 1
TermInf> add one_gal one_m
The types gallons and meters are incompatible

But the following is not correct:

TermInf> let x = times one_gal one_m
val x : (gallons * meters) units
TermInf> let y = times one_m one_gal
val y : (meters * gallons) units
TermInf> add x y
The types gallons and meters are incompatible

Of course, the problem is that the interpreter doesn’t know that units commute. But we can fix this with coercions.

TermInf> let_id_coercion commute = id : ('a * 'b) units -> ('b * 'a) units
val commute : ('a * 'b) units -> ('b * 'a) units
TermInf> add x {y}
- : (gallons * meters) units = 2

We’ve declared commute to be an identity coercion (by using let_id_coercion instead of let_coercion) to help the interpreter when it’s deciding if a term inference is unique or not.

Note that we don’t use term inference on both x and y, because then the interpreter couldn’t determine what type to give them.

TermInf> add {x} y
- : (meters * gallons) units = 2
TermInf> add {x} {y}
Problem with term inference.

This version of commute will just commute the two units at the top level, but there are a finite number of identity coercions that you can define that will give you associativity and commutativity (and inverses, if you want). Thus, the type system will be able to perform a static dimensional analysis on your program.
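As a point of comparison, here is a small Python sketch of the same dimensional check done at run time instead of in the type system. It swaps in a different technique: units are stored as a multiset, so commutativity and associativity hold automatically and no coercions are needed. The class `Quantity` and its fields are hypothetical and are not part of TermInf.

```python
from collections import Counter

class Quantity:
    """An integer tagged with a multiset of unit names."""

    def __init__(self, value, *units):
        self.value = value
        # A Counter forgets the order in which units were multiplied.
        self.units = Counter(units)

    def __add__(self, other):
        if self.units != other.units:
            raise TypeError("incompatible units")
        return Quantity(self.value + other.value, *self.units.elements())

    def __mul__(self, other):
        combined = self.units + other.units
        return Quantity(self.value * other.value, *combined.elements())

one_gal = Quantity(1, "gallons")
one_m = Quantity(1, "meters")
```

Here `one_gal * one_m + one_m * one_gal` succeeds with value 2, while `one_gal + one_m` raises an error. The price is that the mistake is caught only when the code runs, which is exactly what the static version is meant to avoid.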

Edit: I should note that I left out several details about how this actually works. For example, the interpreter doesn’t search through all sequences of coercions, since there are infinitely many (and the problem of deciding if there is a unique one between any two given types is undecidable in general). Instead it limits itself to sequences of coercions whose type is never “bigger” than the starting type or the goal type, where “bigger” is defined by a straightforward length function.

Playing Games in the Transfinite: An Introduction to “Ordinal Chomp”

mkoconnor — Tue, 30 Sep 2008 03:14:52 +0000

Chomp is a two-player game which is played as follows: The two players, A and B, start with a “board” which is a chocolate bar divided into small squares. With Player A starting, they take turns choosing a square and eating it together with all squares above and to the right. The catch is that the square at the lower left-hand corner is poisonous, and the player who is forced to eat it loses.

This image from the Wikipedia article shows a typical sequence of moves for a chocolate bar:

At this point, Player A is forced to eat the poisoned square and hence loses the game.

Although the question of what the winning strategies are for this game is very much an open problem, the question of who has a winning strategy is not: On the board, Player B wins (since Player A must eat the poison piece on his first move). But for any other board, Player A has a winning strategy.

To see why, suppose not. Then if Player A’s first move is to eat just the one square in the top right-hand corner, Player B must have a winning response (since we are supposing that Player B has a winning response to any move that Player A makes). But if Player B’s response is winning, then Player A could have simply made that move to start with.
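The claim that Player A wins every finite rectangular board other than the single poisoned square can be checked by brute force on small boards. The following is a minimal Python sketch, not taken from the post or from the paper discussed below. It assumes a position is stored as a tuple of row lengths with the bottom row first, and the names `moves`, `mover_wins`, and `first_player_wins` are made up for illustration.

```python
from functools import lru_cache

def moves(board):
    """Yield every position reachable by one bite that does not eat the poison."""
    for i, length in enumerate(board):
        for b in range(length):
            if i == 0 and b == 0:
                continue  # this bite would eat the poisoned lower-left square
            # Rows i and above are cut back to at most b squares.
            yield board[:i] + tuple(min(x, b) for x in board[i:])

@lru_cache(maxsize=None)
def mover_wins(board):
    """True if the player about to move can force the opponent to eat the poison."""
    return any(not mover_wins(nxt) for nxt in moves(board))

def first_player_wins(rows, cols):
    """Who wins on a full rows-by-cols chocolate bar?"""
    return mover_wins((cols,) * rows)
```

The search confirms the strategy-stealing argument for small boards, but like that argument it says nothing useful about what the winning moves look like in general.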

However, suppose we play Chomp not just on boards, but on boards, where and are arbitrary ordinals. The game still makes sense just as before, and will always end in finite time, but Player A no longer wins all of the time (there will no longer be a top right-hand corner square if either or is a limit ordinal).

Scott Huddleston and Jerry Shurman investigated Ordinal Chomp in this paper, and showed that it has a number of interesting properties. I’ll describe a few of them below.

First of all, to make things a bit easier to discuss, we will consider only games where is finite rather than games where is arbitrary. However, everything said will go through fine in that case as well.

Secondly, we will use the following game which is equivalent to Chomp: At any point in the game, the board is described by a non-increasing sequence . On each turn, the appropriate player picks an and a such that and replaces the sequence with . We call this taking a bite of height . Playing on an chocolate bar as described above corresponds to playing with the sequence consisting of ‘s.

For ease of discussion later, we make the convention that any sequence (not just a non-increasing one) describes a game position by stipulating that is the same as the position .

We saw above that Player A wins all non-trivial finite Chomp games, so let’s start by looking at a transfinite chomp game that Player B wins: , or in our new notation. What is Player B’s winning strategy? Well, notice that Player A’s first move has to put the game either in the state for some or for some . In either case, Player B can then move the game into state of the form for some . From a position of that form, whatever Player A does, Player B can again move to a position of that form. Eventually, Player B will move to the position , and Player A will be forced to eat the poison piece.

So, Player B can win at least some transfinite Chomp games, although he still loses a lot of them: for example, he loses all games of the form where $\ldots > \omega$. The reason is that in such a game Player A can win by first moving to the position and then using Player B’s winning strategy! Similarly, Player B loses all games for $\ldots > 2$.

In fact, for any ordinal , there is exactly one ordinal such that Player B wins . This is a consequence of what Huddleston and Shurman call the Fundamental Theorem of Transfinite Chomp in the above paper. Another interesting consequence, which illustrates the style of reasoning used in their proof of the Fundamental Theorem, is the following:

Theorem: For any sequence of ordinals, there is exactly one ordinal such that Player B wins the game . (Remember the convention above about sequences which are not necessarily non-increasing.)

Proof: The uniqueness is similar to the argument given above: If Player B has a winning strategy on the game then Player A has a winning strategy on all games where $\ldots > \alpha_1$, since Player A can just move to the position and then use Player B’s winning strategy.

The existence is by induction. Fix and suppose that for all where for all and for at least one , we know that there is a such that Player B has a winning strategy.

For each ordinal , let be the set of all obtainable from by taking a bite of height (this term was defined above, if you forgot what it means).

Let $\ldots > \gamma \}$. Let be the minimal ordinal not in . I claim that Player B has a winning strategy for . We will show this by showing that, for any move Player A makes, Player B can make a move that we know leads to a winning strategy for B.

So suppose Player A moves to . Either or . First suppose that . This means that Player A moved by taking a bite of height (say) out of by moving to . But by construction, we know that , which means that the player who is to move (Player B in this case) has a winning strategy.

Now suppose . This means by construction of that where for all and for at least one . Thus, if Player B moves to the position he ensures himself a win.

Note that this proof is constructive. This means that you can actually use it to compute that (as we already know), the unique such that Player B wins is . As a puzzle, you might like to find the unique such that Player B wins (or such that Player B wins or or whatever you like, although the last one will be hard).

The much-more-general Fundamental Theorem of Transfinite Chomp in the paper linked above is also constructive. It allows you, in theory, to compute who will win the -dimensional game (we have been considering -dimensional games) for any ordinals . However, this is quite difficult in practice: according to the Wikipedia article, it is an open question who wins the -dimensional game .

As a final note, in the book Tracking the Automatic Ant, David Gale gives a very nice non-constructive proof that for all , there is a unique such that Player B wins .

Avoiding Set-Theoretic Paradoxes using Symmetry

mkoconnor — Mon, 22 Sep 2008 02:34:08 +0000

Intuitively, for any property of sets, there should be a set which has as its members all and only those sets such that holds. But this can’t actually work, due to Russell’s Paradox: Let , and then you can derive a contradiction from both and .

The standard solution to this is essentially to forbid the construction of any set which is too big. This solves the problem since you can prove that there are many sets which are not members of themselves, making too big to be a set. But you also end up throwing out many sets which you might want to have: for example, the set of all sets, the set of all groups, etc.

Randall Holmes recently published a paper espousing another solution: instead of forbidding the construction of sets which are too big, forbid the construction of sets which are too asymmetric. Details below.

Imagine you have some permutation of the universe of sets. Because any set is also a set of sets, we can also consider the related permutation defined by . That is, acts on a set by applying to ‘s elements. By iteration, we have for any .

For , say that a set is -symmetric if for all permutations of the universe of sets. We say that a set is symmetric if it’s -symmetric for some . Holmes’s criterion is then to forbid the construction of any set which is not symmetric. (You may have noticed that this discussion is not quite rigorous. Holmes’s paper has a fully rigorous formalization of this.)

So which sets are symmetric? First of all, notice that the empty set is symmetric, as it’s 1-symmetric. Therefore the set consisting of solely the empty set is 2-symmetric and therefore symmetric. Similarly any hereditarily finite set (this means a set which can be written down with a finite number of ‘s and ‘s and ‘s and nothing else) is symmetric, since it will be -symmetric where is the maximum depth of the braces.

It’s also the case that the set of all sets is 1-symmetric, so that exists. What about the set of all groups? A group will be encoded as some ordered pair of a set and a binary operation on that set, and a binary operation will be further encoded as a set of ordered pairs. The set of all groups will be -symmetric where is large enough to “pierce” the encoding, so that it ends up just permuting the group elements (and thus permuting the groups and sending the set of all groups to itself).

Can we develop mathematics in this theory? It seems that constructing the natural numbers will be a problem. The usual (von Neumann) definition of the natural numbers is that:

and, in general, each natural number is the set of all the preceding ones. All of these sets exist, since the von Neumann definition of will be -symmetric, but the set of all natural numbers is not symmetric.

However, we can go back instead to Frege’s original definition of the natural numbers: each is represented as the set of all sets of cardinality . For each , Frege’s definition of is 2-symmetric, and the set of all natural numbers is 3-symmetric. The rationals and reals can be constructed as usual.

So, how do we know that the set is not symmetric? We don’t, but an encouraging fact is the following: There is no known way to prove that for any formula , the set exists. Instead, one can prove that exists if is stratified: this means that one can assign a natural number to each variable in so that for any occurrence of the formula in , is assigned the number one less than that assigned to , and for any occurrence of the formula in , is assigned the same number as that assigned to . The formula defining is emphatically not stratified!
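Checking whether a formula is stratified is a mechanical task, and a short Python sketch may make the definition more concrete. The encoding is an assumption of this sketch: a formula is reduced to its list of atomic subformulas, written ("in", x, y) for membership of x in y and ("eq", x, y) for equality, and variables are identified by name. The function name `stratify` is hypothetical.

```python
from collections import defaultdict, deque

def stratify(atoms):
    """Return a level assignment for the variables, or None if none exists.

    ("in", x, y) requires level[y] == level[x] + 1;
    ("eq", x, y) requires level[y] == level[x].
    """
    graph = defaultdict(list)
    for kind, x, y in atoms:
        diff = 1 if kind == "in" else 0
        graph[x].append((y, diff))
        graph[y].append((x, -diff))

    level = {}
    for start in list(graph):
        if start in level:
            continue
        level[start] = 0
        component = [start]
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v, diff in graph[u]:
                if v not in level:
                    level[v] = level[u] + diff
                    component.append(v)
                    queue.append(v)
                elif level[v] != level[u] + diff:
                    return None  # two constraints disagree: not stratified
        # Shift the component so that its smallest level is 0.
        low = min(level[v] for v in component)
        for v in component:
            level[v] -= low
    return level
```

On the single atom ("in", "x", "x"), which is the membership pattern behind Russell's paradox, the function returns None, since a variable cannot sit one level below itself.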

If you like working with universal sets, but it makes you uneasy to use a set theory which you don’t know is consistent, check out NFU. It uses the concept of stratified formulas to avoid Russell’s paradox, allows the existence of the set of all sets (and set of all groups, etc.) and is known to be consistent relative to ZFC. In fact, Randall Holmes proposed the system I’ve discussed here as a way of clarifying the semantics of a related set theory. A book developing mathematics in NFU is here.

The Undecidability of Identities Involving Sine, Exponentiation, and Absolute Value

mkoconnor — Sun, 14 Sep 2008 17:45:27 +0000

In the book A=B, the authors point out that while the identity

is provable (by a very simple proof!), it’s not possible to prove the truth or falsity of all such identities. This is because Daniel Richardson proved the following:

Let denote the class of expressions generated by

The rational numbers, and .
The variable
The operations of addition, multiplication, and composition.
The sine, exponential, and absolute value functions.

Then the problem of deciding whether or not an expression in is identically zero is undecidable. This means as well that the problem of deciding whether or not two expressions are always equal is also undecidable, since this is equivalent to deciding if is identically zero.

A summary of Richardson’s proof (mostly from Richardson’s paper itself) is below.

The proof depends on the MRDP theorem, which says that for any recursively enumerable set , there is a polynomial such that

For all , iff there exist such that .
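To see the shape of this statement in a familiar case, here is a small Python sketch. The example polynomial is my own choice and is not from Richardson's paper: a natural number n is composite exactly when n - (x + 2)(y + 2) = 0 has a solution in natural numbers x and y. The function name `has_witness` and the search bound are likewise assumptions of the sketch.

```python
from itertools import product

def has_witness(p, n, k, bound):
    """Search all k-tuples with entries in 0..bound for a zero of p(n, x_1, ..., x_k).

    Returns the first witness found, or None if there is none within the bound.
    """
    for xs in product(range(bound + 1), repeat=k):
        if p(n, *xs) == 0:
            return xs
    return None

def composite(n, x, y):
    """Vanishes for some natural numbers x, y exactly when n is composite."""
    return n - (x + 2) * (y + 2)
```

For this particular polynomial a bound of n is enough, so the search decides the set. For a general polynomial no such bound can be computed, and the search can only confirm membership, never refute it, which is why the sets described this way are recursively enumerable and not necessarily decidable.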

At the time that Richardson proved his result, the MRDP theorem had not yet been proven. Instead, only the weaker result where is allowed to be an exponential polynomial (i.e., the are allowed to appear as exponents in ) had been proven, and so that’s what he used. I haven’t read Richardson’s proof closely enough to determine if his result can be improved using the full MRDP theorem.

In any case, since there are recursively enumerable sets which are not decidable, we may let be an exponential polynomial such that the problem of deciding, given , whether or not there are such that is undecidable.

Therefore, the problem of deciding, given , whether or not there are non-negative real numbers such that

is undecidable.

Now, let be an exponential polynomial which grows very fast (and such that is very large). Then, if is a natural number and there are non-negative reals such that

is less than one, then both and are small. From this last fact, we conclude that each is close to a natural number. Let be the natural number closest to . Then, will be small. But then, because it’s an integer, it will be zero.

Therefore, we have an expression formed from sine and exponential functions (and rational numbers, addition, and multiplication) such that for each , there exist non-negative reals such that iff there exist natural numbers such that (which is an undecidable problem).

By an argument which I won’t reproduce here, we can replace with with the property that for each there exists a real such that iff there exist natural numbers such that . (Notice that now ranges over all reals.) But now, consider the sequence of functions

Each is identically zero iff the corresponding is ever less than zero, which is an undecidable problem.

The reference for Richardson’s paper is: Daniel Richardson, “Some unsolvable problems involving elementary functions of a real variable,” Journal of Symbolic Logic, Volume 33, 1968, pages 514–520

Another reference is: B.F. Caviness, “On canonical forms and simplification,” Journal of the ACM, volume 17, 1970, pages 385–396.

A Geometrically Natural Uncomputable Function

mkoconnor — Fri, 05 Sep 2008 02:50:47 +0000

There are many functions from to that cannot be computed by any algorithm or computer program. For example, a famous one is the halting problem, defined by if the th Turing machine halts and if the th Turing machine does not halt. Another one in the same spirit is the busy beaver function.

We also know a priori that there must be uncomputable functions, since there are functions from to but only computer programs. But that is nonconstructive, and the two examples I gave above seem a bit like they’re cheating since their definitions refer to the concept of computability. Is there a natural example of an uncomputable function that does not refer to computability?

In this paper, Alex Nabutovsky found what I think is a great example of such a function from geometry. Details below.

For any , let be the -dimensional unit sphere . For all there is an “equatorial” embedding of into sending to . This is certainly the nicest way of embedding into but there are other ways.

If is an embedding of into then let its amount of wiggle room be the maximum amount that ‘s image can be thickened before it intersects itself. More precisely, it is the maximum such that , where , and are in the image of , is the unit normal to the image of at and is the unit normal to the image of at . We let ‘s crumbledness be the reciprocal of its amount of wiggle room.

It is known (by a theorem of Stephen Smale) that any embedding of into can be isotoped to the equatorial embedding (up to reparameterization), but you may have to increase the crumbledness to do it. Nabutovsky proves that, for any dimension and crumbledness , there is an such that any embedding of crumbledness can be isotoped to the equatorial embedding going through only embeddings of crumbledness . We lose nothing in terms of complexity by considering .

Nabutovsky shows that for any , any satisfying the above condition is uncomputable, and, furthermore, the minimum which works grows like the busy beaver function!

The proof depends on the fact that for , it is undecidable if a compact manifold (presented either as a simplicial complex, or as a zero-set of a polynomial with rational coefficients, or some such representation that a computer can handle) embedded in is diffeomorphic to . (For a good summary of these types of undecidability results in group theory and topology, see Section 3 of Bob Soare’s Computability and Differential Geometry.) Essentially, the idea is that if were computable, you could decide if a manifold were homeomorphic to by embedding it in , measuring its crumbledness (say as ), then checking all possible isotopies of the manifold through embeddings of crumbledness . The fact that the manifold’s crumbledness can be measured and that all possible isotopies going through embeddings of bounded crumbledness can be checked computably is related to the fact that often you can computably search over compact spaces, as I wrote about in this post.

If you can get a hold of a copy, I highly recommend Shmuel Weinberger’s book Computers, Rigidity, and Moduli, where he talks about this and other related results in a very lively and engaging manner.

Edit: Fixed some notation.

Integrability Conditions (Guest Post!)

mkoconnor — Thu, 04 Sep 2008 23:38:13 +0000

Please enjoy the following guest post on differential geometry by Tim Goldberg.

A symplectic structure on a manifold is a differential -form satisfying two conditions:

is non-degenerate, i.e. for each and tangent vector based at , if for all tangent vectors based at , then is the zero vector;
is closed, i.e. the exterior derivative of is zero, i.e. .

In trying to come up with answers to questions like “what do you do?” and “what is symplectic geometry?” that would be accessible to an advanced undergraduate or beginning graduate student, I’ve tried to come up with fairly intuitive descriptions of what the two symplectic structure conditions really mean.

Non-degeneracy is pretty easy, because my intended audience is certainly familiar with the dot product in Euclidean space, and probably familiar with more general machinery like inner products and bilinear forms. A bilinear form on a vector space over a field is just an assignment of a number in to each pair of vectors, in such a way that the assignment is linear in each vector in the pair. A bilinear form is called non-degenerate if the only thing that pairs to zero with every single vector is the zero vector. A -form on is a collection of skew-symmetric bilinear forms, one for each tangent space of . Saying that is non-degenerate is saying that each of these bilinear forms is non-degenerate.

It’s much less clear how to describe to the uninitiated what the closed condition means. It’s even a bit unclear why this condition is required in the first place. A pretty nice answer came up yesterday, in a reading group I attend that is trying to learn about generalized complex structure. We are going through the PhD thesis of Marco Gualtieri, titled “Generalized Complex Geometry”. It is available at the following websites:

http://front.math.ucdavis.edu/0401.5221

http://front.math.ucdavis.edu/0703.5298.

This was the first meeting, and Tomoo Matsumura was the speaker. He suggested that the requirement that is an integrability condition. I had never thought of it this way, but I probably will from now on.

Almost complex and complex structures

Let me first describe what integrability means for an almost complex structure on a manifold. A complex structure on a vector space , where is real and finite-dimensional, is a linear endomorphism such that . Taking the determinant of both sides, we have . Since , we must have , so must be even. Furthermore, since , we know , so is a linear automorphism. The complex structure makes into a complex vector space, by setting

for and .
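The parity argument in the paragraph above can be written out in one line. The symbols are assumptions, since the post's own notation is not visible here: $V$ is the real vector space, $m = \dim V$, and $J$ is the complex structure, so that $J^2 = -\mathrm{id}_V$.

```latex
\det(J)^2 \;=\; \det(J^2) \;=\; \det(-\mathrm{id}_V) \;=\; (-1)^m .
```

Because $\det(J)$ is a real number, the left-hand side is non-negative, which forces $(-1)^m = 1$ and hence $m$ even. The same line shows $\det(J)^2 = 1$, so $\det(J) \neq 0$ and $J$ is invertible.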

The standard example is with its usual ordered basis, labelled , and complex structure defined by and for all . Putting , we obtain the usual ordered basis for with its usual complex structure.

Let be a -dimensional manifold. An almost complex structure on is a smoothly-varying collection of complex structures, one for each tangent space of . (The existence of an almost complex structure forces to be even.) An almost complex structure on is just a bunch of complex structures on the tangent spaces glued together smoothly along . Recall that as a manifold, all tangent spaces to a vector space can be canonically identified with the vector space itself, so a choice of complex structure on the vector space induces an almost complex structure on the vector space as a manifold.

An almost complex structure on is called a complex structure if the complex structures on the vector spaces fit together in an even nicer way. We require that there be a covering of by coordinate neighborhoods such that on each such neighborhood is the pullback of the standard complex structure on . We require also that all transition maps for these coordinate charts be holomorphic with respect to the standard complex structure. This collection of coordinate charts forms a complex atlas for , and gives the structure of a complex manifold. (Notice that it’s easy to choose coordinates so that a single looks like the standard one, . We require that this hold not just at a single point, but in an entire neighborhood of the point.)

An almost complex structure is called integrable if it is actually a complex structure. There are many integrability conditions for almost complex structures, such as the vanishing of the Nijenhuis tensor associated to an almost complex structure.

Almost symplectic and symplectic structures

Now we give a parallel discussion for symplectic structures. A symplectic structure on a vector space , where is real and finite-dimensional, is a non-degenerate and skew-symmetric bilinear form . Choose a basis for and represent by a matrix relative to this basis. Because is non-degenerate we know , and because it is skew-symmetric we know that , so . Hence , so must be even.
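Spelled out, the determinant computation runs as follows. The symbols are assumptions made for this display: $A$ is the matrix representing the bilinear form in the chosen basis and $m$ is the dimension of the vector space, so skew-symmetry reads $A^{T} = -A$.

```latex
\det(A) \;=\; \det(A^{T}) \;=\; \det(-A) \;=\; (-1)^m \det(A) .
```

Non-degeneracy gives $\det(A) \neq 0$, so we may divide through to get $(-1)^m = 1$, which forces $m$ to be even.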

The standard example is with its usual ordered basis, labelled , and symplectic structure defined by and , where is the Kronecker delta.

Let be a -dimensional manifold. An almost symplectic structure on is a smoothly-varying collection of symplectic structures, one for each tangent space of . (The existence of an almost symplectic structure forces to be even.) An almost symplectic structure on is just a bunch of symplectic structures on the tangent spaces glued together smoothly along . As before, a choice of symplectic structure on a vector space induces an almost symplectic structure on the vector space as a manifold.

An almost symplectic structure on is called a symplectic structure if the symplectic structures on the vector spaces fit together in an even nicer way. Analogous to the complex structure case, we require that there be a covering of by coordinate neighborhoods such that on each such neighborhood is the pullback of the standard symplectic structure on . We require also that all transition maps for these coordinate charts be symplectic with respect to the standard symplectic structure. A manifold with a symplectic structure is a symplectic manifold.

Not much seems to be said about almost symplectic structures on manifolds, and so even less is said about integrable almost symplectic structures. But if one were to say something about them, surely the first thing would be to notice that, by Darboux’s Theorem, there is an extremely simple integrability condition. This is exactly that be closed, i.e. that .

Summary

To summarize, every manifold is locally isomorphic to some $\mathbb{R}^n$. An almost complex manifold is one equipped with a smoothly varying collection of complex structures on its tangent spaces. An almost complex manifold is a complex manifold if it is locally isomorphic to some $\mathbb{R}^{2n}$ with its standard complex structure. In this case, the almost complex structure is called integrable. Every previous sentence in this paragraph holds with the word “complex” replaced with “symplectic”. There are many well-known conditions for an almost complex structure to be integrable. To the best of my knowledge, there is really only one well-known condition for an almost symplectic structure $\omega$ to be integrable, and this is the innocuous looking requirement that $d\omega = 0$.

Lots of Fun Math Papers

mkoconnor — Fri, 29 Aug 2008 18:22:11 +0000

In the course of looking up a link for my last blog entry, I discovered the MAA Writing Awards site, which collects many pdfs of articles that have won MAA writing awards. From browsing it a bit, it seems to be a goldmine of fun math articles.

Non-Rigorous Arguments 1: Two Formulas For e

mkoconnor — Fri, 29 Aug 2008 17:31:25 +0000

I’m a big fan of non-rigorous arguments, especially in calculus and analysis. I think there should be a book cataloging all the beautiful, morally-true-but-not-actually-true proofs that mathematicians have advanced, but until that time I’ll try to at least catalog a few of them on my blog.

This first one is Euler’s original argument for the equality of two expressions (both of which happen to define $e$):

$$\lim_{n \to \infty} \left(1 + \frac{1}{n}\right)^n = \sum_{k=0}^{\infty} \frac{1}{k!}$$

I’ll also sketch how this can be made rigorous in non-standard analysis.

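The argument is as follows: The limit $\lim_{n \to \infty} \left(1 + \frac{1}{n}\right)^n$ is equal to $\left(1 + \frac{1}{N}\right)^N$, where $N$ is infinitely large. By the binomial theorem, this is:

$$\sum_{k=0}^{N} \binom{N}{k} \frac{1}{N^k}$$

Since $\binom{N}{k}$ is $\frac{N(N-1)\cdots(N-k+1)}{k!}$, this is the sum as $k$ ranges from $0$ to $N$ of:

$$\frac{1}{k!} \cdot \frac{N}{N} \cdot \frac{N-1}{N} \cdots \frac{N-k+1}{N}$$

Now, if $k$ is infinitely large, this term is so small that it may be neglected. On the other hand, if $k$ is finite, then $\frac{N-j}{N} = 1$ for $j < k$. Therefore

$$\frac{1}{k!} \cdot \frac{N}{N} \cdot \frac{N-1}{N} \cdots \frac{N-k+1}{N} = \frac{1}{k!}$$

and the whole sum is equal to

$$\sum_{k=0}^{\infty} \frac{1}{k!}$$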

as desired.
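The two key steps of this argument can be watched numerically with a large but finite $N$ standing in for the infinite one. This is only an illustration under that assumption, and the function names below are mine. The code computes individual terms of the binomial expansion exactly, so one can see that for a fixed small $k$ the term is very nearly $1/k!$, and that the two expressions for $e$ agree to several digits.

```python
from fractions import Fraction
from math import factorial


def compound_limit(n):
    """(1 + 1/n)^n as an exact rational."""
    return (1 + Fraction(1, n)) ** n


def binomial_term(n, k):
    """k-th term of the binomial expansion of (1 + 1/n)^n, written as
    (1/k!) * (n/n) * ((n-1)/n) * ... * ((n-k+1)/n)."""
    term = Fraction(1, factorial(k))
    for j in range(k):
        term *= Fraction(n - j, n)
    return term


def factorial_series(terms):
    """Partial sum 1/0! + 1/1! + ... of the series, with the given number of terms."""
    return sum(Fraction(1, factorial(k)) for k in range(terms))


if __name__ == "__main__":
    N = 10**4
    # For fixed k the ratio of the binomial term to 1/k! is close to 1.
    print(float(binomial_term(N, 5) * factorial(5)))
    # The two expressions for e, side by side.
    print(float(compound_limit(N)), float(factorial_series(20)))
```

Increasing `N` pushes the ratio for each fixed `k` closer to 1, which is the finite shadow of the step "if $k$ is finite, then $(N-j)/N = 1$".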

Now, I’ll sketch how to make this rigorous in non-standard analysis. This is from Higher Trigonometry, Hyperreal Numbers, and Euler’s Analysis of Infinities by Mark McKinzie and Curtis Tuckey, which is the best introductory article on non-standard analysis that I’ve read.

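In non-standard analysis, one extends the real numbers $\mathbb{R}$ to a larger field ${}^*\mathbb{R}$ which contains all the reals, but also a positive $\epsilon$ which is less than every positive real (and hence also a number $1/\epsilon$ which is greater than every real). For every function $f\colon \mathbb{R}^n \to \mathbb{R}$, there is a function ${}^*f\colon {}^*\mathbb{R}^n \to {}^*\mathbb{R}$, and the ${}^*f$’s satisfy all the same identities and inequalities formed out of composition that the $f$’s do. (For example, ${}^*\sin^2 x + {}^*\cos^2 x = 1$ for all hyperreal $x$.) For that reason, I’ll often omit the ${}^*$. The range of ${}^*\lfloor \cdot \rfloor$, the extension of the floor function, is called ${}^*\mathbb{Z}$, the hyperintegers. Since every real $x$ satisfies $\lfloor x \rfloor \le x < \lfloor x \rfloor + 1$, the same is true in the hyperreals and there are therefore infinite hyperintegers.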

We call a nonzero hyperreal $x$ infinitesimal if $|x|$ is less than every positive real. We say that $x$ and $y$ are close (written $x \approx y$) if $x - y$ is zero or infinitesimal. We say that $x$ is infinite if $1/x$ is infinitesimal (equivalently, if $|x|$ is greater than every real). We say that $x$ is finite if it’s not infinite (equivalently, if $|x|$ is less than some real).

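Let $a_0, a_1, a_2, \ldots$ be a sequence of hyperreals. We say that it is determinate if whenever $M$ and $N$ are infinite

$$\sum_{k=0}^{M} a_k \approx \sum_{k=0}^{N} a_k$$

The Summation Theorem can then be proven: If $a_k$ and $b_k$ are two determinate sequences such that $a_k \approx b_k$ for all finite $k$, then $\sum_{k=0}^{N} a_k \approx \sum_{k=0}^{N} b_k$ for all infinite $N$.

By appropriately replacing “equals” with “is close to”, Euler’s argument above may now be adapted to prove that for all infinite $M$ and $N$,

$$\left(1 + \frac{1}{N}\right)^N \approx \sum_{k=0}^{M} \frac{1}{k!}$$

(the sequence $1/k!$ may be proved determinate by comparison with the geometric sequence, which is easily shown determinate). By a transfer principle, this may in turn be used to prove that $\lim_{n \to \infty} \left(1 + \frac{1}{n}\right)^n = \sum_{k=0}^{\infty} \frac{1}{k!}$ (in the regular reals).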

A Curious Application of Ambiguity with Respect to the Possessive Form

mkoconnor — Mon, 25 Aug 2008 22:38:25 +0000

Why did the chicken cross the island on Lost?

To get to the Others’ side.

(Composed by Tim Goldberg.)

Almost a Number-Theoretic Miracle

mkoconnor — Mon, 25 Aug 2008 18:33:20 +0000

An arithmetic statement is one made up of quantifiers “$\forall n$,” “$\exists n$” (ranging over the natural numbers), the logical connectives “and,” “or,” “not”, function symbols $+$, $\cdot$, constants $0$, $1$, and variables which are bound by the aforementioned quantifiers.

It is known that there is no algorithm which will decide whether an arithmetic statement is true. This shouldn’t be surprising, since if there were such an algorithm, it would be able to automatically prove Fermat’s Last Theorem, settle Goldbach’s Conjecture and the Twin Prime Conjecture, etc.

However, if we call a quasi-arithmetic statement one which uses the quantifiers “for all but finitely many $n$” (denoted “$\forall^\infty n$”) and “there exist infinitely many $n$” (denoted “$\exists^\infty n$”) instead of “$\forall n$” and “$\exists n$”, then we do have an algorithm for deciding whether a quasi-arithmetic statement is true or not!

This was shown by David Marker and Ted Slaman in this note. The proof goes as follows.

First, observe that “$\exists^\infty n\, \phi(n)$” is equivalent to “$\neg \forall^\infty n\, \neg \phi(n)$”, so that we can eliminate all occurrences of $\exists^\infty$.

Next, note that “$\forall^\infty n\, \phi(n)$” is equivalent to “$(\exists m)(\forall n > m)\, \phi(n)$.” Thus, we can replace $\forall^\infty n$ with $Qn$, where $Qn$ is defined to be the quantifier $(\exists m)(\forall n > m)$.

Now, prove that all statements involving only the quantifier $Q$ are true in $\mathbb{N}$ iff they are true in $\mathbb{R}$. This is proved by induction on the structure of the formulas. The crucial step is the following: If $Qn\, \phi(n)$ holds in $\mathbb{N}$, then $\phi(n)$ is true for all sufficiently large natural numbers. However, a subset of $\mathbb{R}$ defined only by quantifiers over $\mathbb{R}$ is a semialgebraic set, and it is known that all semialgebraic subsets of $\mathbb{R}$ are finite unions of points and intervals. Therefore, if all sufficiently large natural numbers are in some semialgebraic set, then all sufficiently large real numbers must also be in that set.

So, we have reduced the problem to that of deciding whether or not sentences involving the quantifier $Q$ are true over $\mathbb{R}$. But, by a result of Tarski’s, there is an algorithm which will decide whether or not statements using the quantifiers $\forall$ and $\exists$ are true over $\mathbb{R}$, and $Q$ can be defined in terms of $\forall$ and $\exists$.

How does Tarski’s proof work? The first step is to observe that deciding quantifier-free statements is easy, since it’s just a computation. So, the second step is to systematically eliminate quantifiers from statements. One instance of quantifier elimination that is familiar to everyone is: $\exists x\, (x^2 + bx + c = 0)$ is equivalent to $b^2 - 4c \geq 0$. This follows from the quadratic formula. Sturm’s theorem is a generalization of this test which will tell you how many distinct real roots any polynomial has, and Tarski’s theorem is a generalization of Sturm’s theorem.
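Sturm’s test is short enough to sketch in code. What follows is a minimal illustration, not a practical implementation: the function names are my own, polynomials are assumed to be nonzero and given as lists of rational coefficients with the highest degree first, and the count is taken over the whole real line by comparing signs at $-\infty$ and $+\infty$.

```python
from fractions import Fraction


def trim(p):
    """Drop leading zero coefficients (coefficients are highest degree first)."""
    i = 0
    while i < len(p) - 1 and p[i] == 0:
        i += 1
    return p[i:]


def derivative(p):
    deg = len(p) - 1
    return trim([c * (deg - i) for i, c in enumerate(p[:-1])] or [Fraction(0)])


def remainder(a, b):
    """Remainder of polynomial a on division by the nonzero polynomial b."""
    a = list(a)
    while len(a) >= len(b):
        factor = a[0] / b[0]
        for i in range(len(b)):
            a[i] -= factor * b[i]
        a = a[1:]  # the leading coefficient is now zero, so drop it
    return trim(a or [Fraction(0)])


def sturm_sequence(p):
    """p, p', then negated remainders, stopping before the zero polynomial."""
    seq = [p, derivative(p)]
    while any(seq[-1]):
        seq.append([-c for c in remainder(seq[-2], seq[-1])])
    return seq[:-1]


def sign(x):
    return (x > 0) - (x < 0)


def sign_changes(signs):
    signs = [s for s in signs if s != 0]
    return sum(1 for s, t in zip(signs, signs[1:]) if s != t)


def count_distinct_real_roots(coeffs):
    """Number of distinct real roots of a nonzero polynomial, by Sturm's theorem."""
    p = trim([Fraction(c) for c in coeffs])
    seq = sturm_sequence(p)
    # The sign far to the right is that of the leading coefficient;
    # far to the left it also picks up (-1)^degree.
    at_plus_inf = [sign(q[0]) for q in seq]
    at_minus_inf = [sign(q[0]) * (-1) ** (len(q) - 1) for q in seq]
    return sign_changes(at_minus_inf) - sign_changes(at_plus_inf)
```

For a monic quadratic, `count_distinct_real_roots([1, b, c]) > 0` agrees with the discriminant test $b^2 - 4c \geq 0$, which is the sense in which Sturm’s theorem generalizes it.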

For information on practical algorithms for quantifier elimination over the reals see Algorithms in Real Algebraic Geometry.