How can one do calculus with (nilpotent) infinitesimals?: An Introduction to Smooth Infinitesimal Analysis

Many mathematicians, from Archimedes to Leibniz to Euler and beyond, made use of infinitesimals in their arguments. These were later replaced rigorously with limits, but many people still find it useful to think and derive with infinitesimals.

Unfortunately, in most informal setups the existence of infinitesimals is technically contradictory, so it can be difficult to grasp the means by which one fruitfully manipulates them. It would be useful to have an axiomatic framework with the following properties:

1. It is consistent.

2. The system acts as a good “intuition pump” for the real world. In particular, this entails that if you prove something in the system, then while it won’t necessarily be true in the real world, there should be a high probability that it’s morally true in the real world, i.e., with some extra assumptions it becomes true. It should also ideally entail that many of the proofs of Archimedes, et al., involving infinitesimals can be formulated as is (or close to “as is”).

“Smooth infinitesimal analysis” is one attempt to satisfy these conditions.

(This is a blogified version of the first part of an article I wrote here.)

Axioms and Logic

Consider the following axioms:

Axiom 1. R is a set, 0 and 1 are elements of R and + and \cdot are binary operations on R. The structure \langle R, +, \cdot, 0, 1\rangle is a commutative ring with unit.

Furthermore, we have that \forall x\, ((x \ne 0) \implies (\exists y\, xy = 1)), but I don’t want to call R a field for a reason I’ll discuss in a moment.

Axiom 2. There is a transitive irreflexive relation < on R. It satisfies 0 < 1, and for all x, y, and z, we have x < y \implies x + z < y + z and (x < y and z > 0)\implies xz < yz.

It also satisfies \forall x, y\, (x\ne y) \implies (x > y \vee x < y), but I don’t want to call < total, for a reason I’ll discuss in a moment.

Axiom 3. For all x > 0 there is a unique y > 0 such that y^2 = x.

Axiom 4 [Kock-Lawvere Axiom]. Let D = \{d \in R\mid d^2 = 0\}. Then for all functions f from D to R, there is a unique a\in R such that for all d\in D, f(d) = f(0) + d\cdot a.

After reading the Kock-Lawvere Axiom you are probably quite puzzled. In the first place, we can easily prove that D = \{0\}: Let d\in D. For a proof by contradiction, assume that d\ne 0, then there is a d^{-1} and if d^2 equalled 0, we would have d = d^2 d^{-1} = 0.

For an alternate proof that D = \{0\}: Again assume that d\ne 0 for a contradiction. Then d > 0 or d < 0. In the first case, d^2 > 0, so d^2\ne 0 (since < is irreflexive). In the second case, we have 0 < -d by adding -d to both sides, and again d^2 = (-d)^2 > 0. Either way d^2 \ne 0, contradicting d\in D.

Now, if D = \{0\}, then for any a\in R, and any function f from D to R, we have f(d) = f(0) + d\cdot a for all d\in D. This contradicts the uniqueness of a. Therefore, the axioms presented so far are contradictory.

However, we have the following surprising fact.

Fact. There is a form of set theory (called a local set theory, or topos logic) whose underlying logic is restricted to intuitionistic logic, and under which Axioms 1 through 4 (and also the axioms to be presented later in this paper) taken together are consistent.

Definition. Smooth Infinitesimal Analysis (SIA) is the system whose axioms are those sentences marked as Axioms in this paper and whose logic is that alluded to in the above fact.

References for the Fact above are [Moerdijk-Reyes] and [Kock]. References for topos logic specifically are [Bell1] and [MacLane-Moerdijk].

Essentially, intuitionistic logic disallows proof by contradiction (which was used in both proofs that D = \{0\} above) and its equivalent brother, the law of the excluded middle, which says that for any proposition P, P\vee \neg P holds.

I won’t formally define intuitionistic logic or topos logic here as it would take too much space and there’s no real way to understand it except by seeing examples anyway. If you avoid proofs by contradiction and proofs using the law of the excluded middle (which usually come up in ways like: “Let x\in R. Then either x = 0 or x\ne 0.”), you will be okay.

But before we go further we might ask, “what does this logic have to do with the real world anyway?” Possibly nothing, but recall that our goals above do not require that we work with “real” objects; just that we have a consistent system which will act as a good “intuition pump” about the real world. We are guaranteed that the system is consistent by a theorem; for the second condition each person will have to judge for themselves.

To conclude this section, it should now be clear why I didn’t want to call R a field and < a total order:
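Even though we have \forall x\, ((x\ne 0)\implies x invertible), we can’t conclude from that that \forall x\, ((x = 0) \vee (x invertible)), because the proof of the latter from the former uses the law of the excluded middle. Calling R a field would unduly give the impression that the latter is true. Similarly, from \forall x, y\, ((x\ne y) \implies (x > y \vee x < y)) we can’t conclude \forall x, y\, (x < y \vee x = y \vee x > y), so calling < a total order would be just as misleading.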

For the rest of this blog entry I will generally work within SIA (except, obviously, when I announce new axioms or make remarks about SIA).

Single-Variable Calculus

An Important Lemma
This lemma is easy to prove, but because it is used over and over again, I’ll isolate it here:

Lemma [Microcancellation] Let a, b\in R. If for all d\in D we have ad = bd, then a = b.

Proof.

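Let f\in R^D be given by f(d) = ad, so that f(0) = 0. Then for all d\in D we have both f(d) = f(0) + d\cdot a and, by hypothesis, f(d) = f(0) + d\cdot b. By the uniqueness condition of the Kock-Lawvere axiom, we have that a = b.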

\square

Basic Rules
Let f be a function from R to R, and let x\in R. We may define a function g from D to R as follows: for all d\in D, let g(d) = f(x + d). Then the Kock-Lawvere axiom tells us that there is a unique a so that g(d) = g(0) + ad for all d\in D. Thus, we have that for all functions f from R to R and all x\in R, there is a unique a so that f(x + d) = f(x) + ad for all d. We define f'(x) to be this a.

We thus have the following fundamental fact:

Proposition [Fundamental Fact about Derivatives]
For all f\in R^R, all x\in R, and all d\in D,

f(x + d) = f(x) + f'(x)d

and furthermore, f'(x) is the unique element of R with that property.

\square
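
As a small worked illustration (a sketch only, with f(x) = x^3 chosen purely as an example), the derivative can be read off by expanding f(x + d) and discarding every term that contains d^2:

```latex
\begin{align*}
f(x + d) &= (x + d)^3 \\
         &= x^3 + 3x^2 d + 3x d^2 + d^3 \\
         &= x^3 + 3x^2 d && \text{since } d^2 = 0 \\
         &= f(x) + (3x^2)\,d .
\end{align*}
% This holds for every d in D, so by the uniqueness clause of the
% Fundamental Fact, f'(x) = 3x^2.
```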

Proposition Let f, g\in R^R, c\in R. Then:

1. (f + g)' = f' + g'

2. (cf)' = cf'

3. (fg)' = f'g + fg'

4. If for all x, g(x) \ne 0, then (f/g)' = (gf' - fg')/g^2.

5. (f\circ g)' = (f'\circ g)\cdot g'.

Proof
I’ll prove 3 and 5 and leave the rest as exercises.

To prove 3: Let x\in R and d\in D. Let h(x) = f(x)g(x). Then

h(x + d) = f(x + d)g(x + d) = (f(x) + f'(x)d)(g(x) + g'(x)d)

which, multiplying out and using d^2 = 0, is equal to

f(x)g(x) + d(f'(x)g(x) + f(x)g'(x)) = h(x) +d(f'(x)g(x) + f(x)g'(x)).

On the other hand, we know that h(x + d) = h(x) + h'(x)d, so

h'(x)d = d(f'(x)g(x) + f(x)g'(x)).

Since d was an arbitrary element of D, we may use microcancellation, and we obtain h'(x) = f'(x)g(x) + f(x)g'(x).

To prove 5: Let x\in R and d\in D. Then

f(g(x + d)) = f(g(x) + g'(x)d).

Now, g'(x)d is in D (since (g'(x)d)^2 = d^2(g'(x))^2 = 0), so

f(g(x) + g'(x)d) = f(g(x)) + g'(x)f'(g(x))d.

As before, this gives us that g'(x)f'(g(x)) is the derivative of f(g(x)).

\square
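
The arithmetic in these proofs only ever uses d^2 = 0, so it can be mimicked on a computer with pairs (a, b) standing for a + bd. The following Python sketch does this with the classical ring of dual numbers. This is a stand-in under stated assumptions, not a model of SIA: it says nothing about the intuitionistic logic or about the uniqueness clause of the Kock-Lawvere axiom. The names Dual, _lift, and derivative, and the sample functions f and g, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dual:
    """An element a + b*d of the classical ring of dual numbers (d*d = 0)."""
    a: float  # ordinary part
    b: float  # coefficient of the nilpotent d

    def __add__(self, other):
        other = _lift(other)
        return Dual(self.a + other.a, self.b + other.b)

    __radd__ = __add__

    def __mul__(self, other):
        other = _lift(other)
        # (a + b d)(a' + b' d) = a a' + (a b' + b a') d, because d*d = 0
        return Dual(self.a * other.a, self.a * other.b + self.b * other.a)

    __rmul__ = __mul__


def _lift(value):
    """Treat a plain number x as x + 0*d."""
    return value if isinstance(value, Dual) else Dual(float(value), 0.0)


def derivative(func, x):
    """Read f'(x) off as the d-coefficient of f(x + d)."""
    return func(Dual(float(x), 1.0)).b


def f(t):
    return t * t * t + 2 * t      # f(t) = t^3 + 2t, so f'(t) = 3t^2 + 2


def g(t):
    return t * t + 1              # g(t) = t^2 + 1, so g'(t) = 2t


x = 3.0
product_rule = derivative(lambda t: f(t) * g(t), x)   # should equal f'g + fg'
chain_rule = derivative(lambda t: f(g(t)), x)         # should equal f'(g(x)) g'(x)
print(product_rule, chain_rule)
```

At x = 3 one can check by hand that f'g + fg' = 29 \cdot 10 + 33 \cdot 6 and f'(g(3))g'(3) = 302 \cdot 6, which is what the two computed values should agree with.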

In order to do integration, let’s add the following axiom:

Axiom 5 For all f\in R^R there is a unique g\in R^R such that g' = f and g(0) = 0. We write g(x) as \int_0^x f(t)\,dt.

We can now derive the rules of integration in the usual way by inverting the rules of differentiation.

Deriving formulas for Arclength, etc.

I’d now like to derive the formula for the arclength of the graph of a function y = f(x) (say, from x = 0 to x = 1). Because “arclength” isn’t formally defined, the strategy I’ll take is to make some reasonable assumptions that any notion of arclength should satisfy and work with them.

For this problem, and other problems which use geometric reasoning, it’s important to note that the Kock-Lawvere axiom can be stated in the following form:

Proposition [Microstraightness] If f\colon R\to R^n is any curve, x\in R, and d\in D, then the portion of the curve from f(x) to f(x+d) is straight.

\square

Let f\in R^R be any function, and let s(x) be the arclength of the graph of y = f(x) from 0 to x. (That is, s is the function which we would like to determine.)

Let x_0 \in R and d\in D be arbitrary and consider s(x_0 + d) - s(x_0). It should be the length of the segment of y = f(x) from x_0 to x_0 + d, as in the following figure. Write P = (x_0, f(x_0)) and Q = (x_0 + d, f(x_0 + d)) for the endpoints of that segment, and R = (x_0 + d, f(x_0)) for the point on the vertical line through Q at the height of P.

Because of microstraightness, we know that the part of the graph of y=f(x) from P to Q is a straight line. Furthermore, it is the hypotenuse of a right triangle with legs PR and RQ. The length of PR is d.

To determine the length of RQ: Note that the height of P is f(x_0), so the height of R is f(x_0). On the other hand, the height of Q is f(x_0 + d) = f(x_0) + f'(x_0)d, so the length of RQ is f'(x_0)d.

The hypotenuse of a right triangle with legs of length 1 and f'(x_0) is \sqrt{1 + f'(x_0)^2}. By scaling down by a factor of d, we see that the length of PQ is d\sqrt{1+f'(x_0)^2}.

So, we know that s(x_0 + d) - s(x_0) should be d\sqrt{1 + f'(x_0)^2}. On the other hand, s(x_0 + d) - s(x_0) = ds'(x_0). Since d was arbitrary, microcancellation gives s'(x_0) = \sqrt{1 + f'(x_0)^2}, and since x_0 was arbitrary, s'(x) = \sqrt{1 + f'(x)^2} for all x. Since s(0) = 0, we have

s(x) = \int_0^x \sqrt{1 + f'(t)^2}\,dt
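
A classical numerical sanity check of this formula can be sketched as follows. This is ordinary floating-point integration with Simpson's rule, not a computation inside SIA. The helper names simpson and arclength and the test function f(x) = cosh(x) are assumptions made for illustration. For f = cosh we have f' = sinh and \sqrt{1 + \sinh^2} = \cosh, so the exact arclength from 0 to x is sinh(x).

```python
import math


def simpson(func, a, b, n=1000):
    """Composite Simpson's rule on [a, b] with n subintervals (n made even)."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        # interior points alternate weights 4, 2, 4, ...
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3


def arclength(fprime, x):
    """Approximate s(x), the integral from 0 to x of sqrt(1 + f'(t)^2) dt."""
    return simpson(lambda t: math.sqrt(1.0 + fprime(t) ** 2), 0.0, x)


# f(x) = cosh(x): derivative is sinh, exact arclength on [0, 1] is sinh(1).
approx = arclength(math.sinh, 1.0)
exact = math.sinh(1.0)
print(approx, exact)
```

For a straight line y = mx the same function should return x\sqrt{1 + m^2}, which agrees with the right-triangle argument used in the derivation.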

Several other formulas can be derived using precisely the same technique. For example, suppose we want to know the surface area of revolution of y = f(x). Furthermore, suppose we know that the surface area of a frustum of a cone with radii r_1 and r_2 and slant height h as in the figure below is \pi(r_1 + r_2)h. (See below to eliminate this assumption.)

Then, let A(x_0) be the surface area of revolution of y = f(x) from x = 0 to x = x_0 about the x-axis. As before, consider A(x_0 + d) - A(x_0) where d\in D is arbitrary. This should be the surface area of the frustum obtained by rotating PQ about the x-axis. The slant height is the length of PQ, which we determined earlier was (\sqrt{1 + f'(x_0)^2})d. The two radii are f(x_0) and f(x_0 + d) = f(x_0) + f'(x_0)d. Therefore,

A(x_0 + d) - A(x_0) = \pi(f(x_0) + f(x_0) + f'(x_0)d)(\sqrt{1 + f'(x_0)^2})d

which, multiplying out and using d^2 = 0, becomes 2\pi f(x_0)\sqrt{1 + f'(x_0)^2}\,d. As before, A(x_0 + d) - A(x_0) is also equal to A'(x_0)d, so by microcancellation A'(x_0) = 2\pi f(x_0)\sqrt{1 + f'(x_0)^2}. Since A(0) = 0, we have

A(x) = 2\pi\int_0^x f(t) \sqrt{1 + f'(t)^2}\,dt
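
As a consistency check (a sketch only), the formula can be applied to the line y = mx revolved from x = 0 to x = r/m, which sweeps out a cone of radius r. The symbols m, r, and h (the slant height) are introduced here for illustration, with m > 0 assumed. Since the formula was itself derived from the frustum formula, this confirms consistency rather than giving an independent proof.

```latex
\begin{align*}
A(r/m) &= 2\pi \int_0^{r/m} m t \sqrt{1 + m^2}\,dt
        = \pi m \sqrt{1 + m^2}\,\Bigl(\frac{r}{m}\Bigr)^{2}
        = \frac{\pi r^2 \sqrt{1 + m^2}}{m}, \\
h      &= \sqrt{(r/m)^2 + r^2} = \frac{r\sqrt{1 + m^2}}{m}, \\
\text{so}\quad A(r/m) &= \pi r h .
\end{align*}
```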

In a precisely analogous way, one may derive the formula for the volume of the solid of revolution of y = f(x) about the x-axis, the formula for the arclength of a curve r = f(\theta) given in polar form, and show that the (signed) area under the curve y = f(x) from x = a to x = b is \int_a^b f(x)\,dx.
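
For instance, the volume case can be sketched as follows. Here V(x_0) denotes the volume obtained by revolving y = f(x) from x = 0 to x = x_0 about the x-axis. The classical formula for the volume of a frustum is taken as an assumption, in the same way the frustum's surface area was assumed above.

```latex
% Assumed classical fact: a frustum with radii r_1, r_2 and height H
% (measured along the axis) has volume (\pi H / 3)(r_1^2 + r_1 r_2 + r_2^2).
% Here H = d, r_1 = f(x_0), r_2 = f(x_0 + d) = f(x_0) + f'(x_0) d.
\begin{align*}
V(x_0 + d) - V(x_0)
  &= \frac{\pi d}{3}\Bigl( f(x_0)^2 + f(x_0)\bigl(f(x_0) + f'(x_0)d\bigr)
       + \bigl(f(x_0) + f'(x_0)d\bigr)^2 \Bigr) \\
  &= \frac{\pi d}{3}\Bigl( 3f(x_0)^2 + 3f(x_0)f'(x_0)\,d + f'(x_0)^2 d^2 \Bigr) \\
  &= \pi f(x_0)^2\, d && \text{since } d^2 = 0 .
\end{align*}
% Comparing with V(x_0 + d) - V(x_0) = V'(x_0) d and microcancelling gives
% V'(x) = \pi f(x)^2, and since V(0) = 0,  V(x) = \pi \int_0^x f(t)^2 \, dt .
```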

Above we assumed that we knew the surface area of a frustum of a cone. Finally, as an exercise, eliminate this assumption by deriving the formula for the surface area of a cone (from which the formula for the surface area of a frustum follows by an argument with similar triangles) as follows:

Fix a cone C of slant height h and radius r. The cone C can be considered to be the graph of a function y = mx from x = 0 to x = r/m revolved a full 2\pi radians around the x-axis.

Let A(\theta) be the area of the surface formed by revolving the graph of y = mx from x = 0 to x = r/m only \theta radians around the x-axis.

Using a method similar to that above, determine that A(\theta) = (1/2)\theta rh. This gives the surface area as A(2\pi) = \pi rh.

The Equation of a Catenary

In the above section, essentially the same method was used again and again to solve different problems. As an example of a different way to apply SIA in single-variable calculus, in this section I’ll outline how the equation of a catenary may be derived in it. (The full derivation is in [Bell2].)

To do this, we’ll need the existence of functions \sin, \cos, \exp in R^R satisfying \sin(0) = 0, \cos(0) = \exp(0) =1, \sin' = \cos, \cos' = -\sin and \exp' = \exp. We get this from the following set of axioms.

Axiom (Scheme) 6. For every C^\infty function f\colon \mathbb{R}^n \to \mathbb{R}^m (in the real world), we assume we have a function f\colon R^n \to R^m (in SIA). Furthermore, for any true identity constructed out of such functions, composition, and partial differentiation operators, we may take the corresponding statement in SIA to be an axiom. (“True” means true for the corresponding functions between cartesian products of \mathbb{R} in the real world.)

(We can actually go further. For every C^\infty manifold \mathbb{M} in the real world, we may assume that there is a set M in SIA, and for every C^\infty function f\colon \mathbb{M}\to\mathbb{N} we may assume that there is a function f\colon M\to N in SIA, and we may assume that these functions satisfy all identities true of them in the real world. But I will not use these extra axioms in this article.)

Suppose that we have a flexible rope of constant weight w per unit length suspended from two points A and B (see the figure below). We would like to find the function f such that the graph of y = f(x) is the curve that the rope makes. (We will actually disregard the points A and B and consider f to be defined on all of R.)

Let T(x) be the tension in the rope at the point (x,f(x)). (Recall that the tension at a point in a rope in equilibrium is defined as follows: That point in the rope is being pulled by both sides of the rope with some force. Since the rope is in equilibrium, the magnitude of the two forces must be equal. The tension is that common magnitude.)

Let \phi(x) be the angle that the tangent to f(x) makes with the positive x-axis. (That is, \phi(x) is defined so that \sin\phi(x) = f'(x)\cos\phi(x)). We suppose that we have chosen the origin so that \phi(0) = 0.

Let s(x) be the arclength of f(x) from 0 to x.

Let x_0\in R and d\in D be arbitrary. Consider the segment of the rope from P = (x_0,f(x_0)) to Q = (x_0 + d,f(x_0 + d)). This segment is in equilibrium under three forces:

1. A force of magnitude T(x_0) with direction \phi(x_0) + \pi.

2. A force of magnitude T(x_0 + d) with direction \phi(x_0 + d).

3. A force of magnitude w(s(x_0 + d) - s(x_0)) = ws'(x_0)d with direction -\pi/2.

By resolving these forces horizontally and using microcancellation, one can show that the horizontal component of the tension (that is, T(x)\cos\phi(x)) is constant. Call this constant T_0.

By resolving these forces vertically and using microcancellation and the fact that \phi(0) = 0, one can show that the vertical component of the tension (that is, T(x)\sin\phi(x)) is ws(x).

Finally, by combining the results of the previous two paragraphs and using the fact that \sin\phi(x) = \cos\phi(x) f'(x) and s'(x) = \sqrt{1 + f'(x)^2}, one can show that f satisfies the differential equation 1 + (f')^2 = a^2(f'')^2, where a = T_0/w.
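
The steps just outlined can be sketched as follows. The shorthand H(x) = T(x)\cos\phi(x) and V(x) = T(x)\sin\phi(x) is introduced here for illustration and is not notation from [Bell2]. Passing from H' = 0 to "H is constant", and from V' = ws' with V(0) = ws(0) to V = ws, relies on the uniqueness part of Axiom 5. I also assume w \ne 0.

```latex
\begin{align*}
\text{horizontal:}\quad & H(x_0 + d) - H(x_0) = 0
   \;\Longrightarrow\; H'(x_0)\,d = 0
   \;\Longrightarrow\; H' = 0, \text{ so } H = T_0 \text{ is constant;} \\
\text{vertical:}\quad & V(x_0 + d) - V(x_0) = w\,s'(x_0)\,d
   \;\Longrightarrow\; V' = w\,s',
   \text{ and } V(0) = 0 = w\,s(0) \text{ gives } V = w\,s; \\
\text{combine:}\quad & w\,s = T\sin\phi = (T\cos\phi)\,f' = T_0\,f'
   \;\Longrightarrow\; s = a\,f' \quad (a = T_0/w); \\
\text{differentiate:}\quad & a\,f'' = s' = \sqrt{1 + (f')^2}
   \;\Longrightarrow\; a^2 (f'')^2 = 1 + (f')^2 .
\end{align*}
```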

Solving differential equations symbolically is the same in SIA as it is classically, since no infinitesimals or limits are involved. In this case, the answer turns out to be

f(x) = a\cosh\left(\frac{x}{a}\right) = \frac{a(e^{x/a} + e^{-x/a})}{2},

if we add the initial condition f(0) = a to our previously assumed initial condition f'(0) = 0.
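
As a quick check (a sketch only), one can verify directly that this f satisfies the differential equation and both initial conditions. The only identity needed is \cosh^2 - \sinh^2 = 1, which is true classically and hence available in SIA through Axiom (Scheme) 6.

```latex
\begin{align*}
f(x)   &= a\cosh(x/a), & f(0)  &= a\cosh(0) = a, \\
f'(x)  &= \sinh(x/a),  & f'(0) &= \sinh(0) = 0, \\
f''(x) &= \tfrac{1}{a}\cosh(x/a), \\
1 + f'(x)^2 &= 1 + \sinh^2(x/a) = \cosh^2(x/a) = a^2 f''(x)^2 .
\end{align*}
```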

I’ll include a post on multivariable calculus later.

8 thoughts on “How can one do calculus with (nilpotent) infinitesimals?: An Introduction to Smooth Infinitesimal Analysis”

  1. I recently became very interested in smooth infinitesimal analysis, and found your article a useful guide when trying to understand some of the earlier works (Kock, Reyes-Moerdijk, etc.).

    However, there is something that bothers me about this setup, and I wonder if you have any ideas on it. Specifically, the problem is Axiom 5. This axiomatizes the fundamental theorem of calculus! Is there no way to define an area function using infinitesimals, then show that its derivative is the original function?

    I think the idea of taking as an axiom “every curve is made up of infinitesimally small line segments” is a beautiful concept, but it seems to me to be giving up too much to also have to axiomatize the existence of an anti-derivative.

    Do you know of any way to not have to take this as an axiom?

  2. Hi Geoff,
    I’m glad you liked the article and find the subject interesting!

    Axiom 5 doesn’t really axiomatize the fundamental theorem of calculus, since all Axiom 5 says is that antiderivatives of functions exist. The fundamental theorem of calculus says something more: it says that not only do antiderivatives of (suitable) functions f(x) exist, but that furthermore the function F(x) giving the (signed) area under the graph of f from 0 to x is a specific example of an antiderivative. (Also, that all other antiderivatives are equal to F plus some constant.)

    The notation is confusing though, because in Axiom 5, I said that antiderivatives of functions exist, and furthermore I denoted the antiderivative of f which takes the value 0 on input 0 by \int_0^x f(t)\,dt. This could be confusing, because normally \int_0^x f(t)\,dt is defined as a limit of Riemann sums, which is the classical formalization of the “area” concept, and so with that notation, the statement that the derivative of \int_0^x f(t)\,dt is f(x) is indeed the fundamental theorem of calculus. But with the notation I used in this article, it’s just a definition.

    As an aside, you can prove a version of the fundamental theorem of calculus in smooth infinitesimal analysis, in much the same way that you can find the area of a cone.

    If you want to not take the existence of antiderivatives as an axiom, you might want to take a look at Bell’s book “A Primer on Smooth Infinitesimal Analysis,” which does a lot of stuff without that axiom, but in its place he assumes that if f' = g' and f(0) = g(0) then f = g.

  3. Ah, of course you’re right. I think I was confused by the notation. Thanks for the quick reply.

    I guess my next question would be: how does one define the “area under a curve” function? I assume we would like to do it without limits.

    Now that I look over the above again, I notice that there’s something similar with the arc length function: you mention that it’s not formally defined, so instead work with what reasonable properties it should have. But there has to be some nice way to formally define these notions; perhaps I just need to get Bell’s book.

    As a side note, I’m amazed by how much more elegant some of the proofs with smooth infinitesimal analysis are than with regular calculus. In particular, the proofs of the product and chain rules are so much nicer than their usual counterparts.

  4. I definitely agree that the proofs in smooth infinitesimal analysis are much nicer than in classical calculus; that’s the main reason why I like it.

    You define the “area under a curve” function in a similar way to arclength as follows: You show that, given f, there is a unique function g(a,b) (to be interpreted as the area under the curve y = f(x) from x = a to x = b) such that: for all a,b,c, g(a,b) + g(b,c) = g(a,c) and, if f is linear on [a,b], then g(a,b) is the appropriate value (i.e., the formula for the area of a trapezoid: ((f(a) + f(b))/2)\cdot (b-a)). That unique function turns out to be \int_a^b f(t)\,dt.

    The way area is dealt with in smooth infinitesimal analysis (i.e., isolate some conditions that an area function must have, then prove there is a unique function satisfying those conditions) is not so different from what is done in the classical case. The problem is that it’s done over again for each problem type: that is, you use it once to determine the area of a cone, then once again to prove the fundamental theorem of calculus, etc.

    If I understand your question right, you’re wondering if you can do it once and for all. That is, can you prove that there is a unique function assigning to subsets of, say, R^2 an element of R satisfying the appropriate conditions. I don’t know, but I would guess that the answer is “no.” I would have to think some more about that (and probably learn some more first!). If you have any thoughts, let me know.

  5. Robert, I wouldn’t call myself an expert, but yes: the fact that smooth infinitesimal analysis is modeled in a topos means “smooth spaces” include function spaces, where the calculus of variations naturally lives. For instance, one has a smooth space of smooth paths between two points of a manifold, and it is an easy proposition that tangent vectors in that smooth space are equivalent to vector fields along a chosen path. One can go on to perform analysis on smooth functionals on such function spaces within this setting in a very intuitive fashion.
