A Tour de Morse (theory)

Morse theory is amazing.  Very geometric, more-or-less very intuitive.  You don’t really explore it in detail until you’ve seen a fair bit of differential topology, but if you look closely, you start getting exposed to its core ideas as early as multivariate Calculus.

As is the fashion in modern geometry (specifically, algebraic geometry), we study geometric objects by studying the behavior of (appropriate classes of) functions on them.  But which functions?  In algebraic geometry, if you’ve got some nice affine variety, you’ve got a set of god-given functions to use: the coordinate ring of the variety.  Here, in the affine case, this is a finitely generated, reduced (= no nilpotents) algebra over a field.  Basically, a quotient of a polynomial ring by a radical ideal.  Not too bad, quite manageable.

However, for a smooth manifold M, the class of smooth functions on M is really big.  To make it “worse”, the existence of bump functions (and hence partitions of unity) kills the higher cohomology of the sheaf of smooth functions, so that sheaf on its own carries no cohomological info.  Of course, we can use differential forms to obtain geometric (cohomological) info; this is known as the de Rham cohomology of M, and it’s actually isomorphic to the singular cohomology of M with real coefficients, which is also isomorphic to the Čech cohomology of the constant sheaf, \mathbb{R}_M.

Cue Morse functions.

But first (I lied), we need to recall some basic terminology.  Let f: M \to N be a smooth function between the smooth manifolds M and N.  For every point x \in M, f induces a linear transformation d_xf : T_x M \to T_{f(x)}N between tangent spaces.  We say that x is a regular point of f if the map d_xf is surjective (this means “f is a submersion at x”).  If d_xf is not surjective, we say x is a critical point of f.  Now take y \in N.  We say y is a regular value of f if every x \in f^{-1}(y) is a regular point of f.  If this is not the case (i.e., some point in the preimage of y is a critical point), we say that y is a critical value of f.  If you’ve been good and remember your basic Calculus, regularity of a point/value tells us a lot (topologically) about M near x.  Via the Implicit Function Theorem, if y is a regular value, the set f^{-1}(y) is a smooth submanifold of M, of pure codimension \dim N (so codimension one when N = \mathbb{R}, which is the case we care about).  If x is a regular point of f, there is an open neighborhood U of x in M such that f^{-1}(f(x)) \cap U is a smooth submanifold of M, again of pure codimension \dim N.

But what happens at critical points?  Critical values?  How much do we have to worry?  How abundant are they?  Fortunately, we have

Sard’s Theorem: the set of critical values of f: M \to N has measure zero in N.

So, “almost all” points of N are regular values of f.  But, let’s go deeper: what happens at critical points?

Okay, so this is where you start seeing this stuff in early Calculus.  Say we’ve got a smooth function f: \mathbb{R} \to \mathbb{R}, and we look at its graph, M, in \mathbb{R}^2.  One of the first things we investigate is the “tangent line” at each point of the graph; here, these are the tangent spaces to M.  Using this we can answer the question “where does f achieve extreme values?”  Every Calc student knows (or, should know) that these can only happen when the tangent line to the graph over some point x \in \mathbb{R} is “horizontal”, that is, when f'(x) = 0.  Equivalently, when d_xf : T_x\mathbb{R} \to T_{f(x)}\mathbb{R} is not surjective (since in the one dim. case, d_xf(v) = f'(x)v, and d_xf is surjective iff f'(x) \neq 0).

But what about the second derivative?  After all, we said f was infinitely differentiable.  Hopefully these higher derivatives contain more information?

Of course, you already know the answer.  Suppose f'(x) = 0, but f''(x) \neq 0.  Well, it’s either going to be positive or negative.  If f''(x) > 0, then we know f has a local minimum at x.  If f''(x) < 0, then f has a local maximum at x.  Similarly, we’d say the graph is locally “concave up” in the former case, “concave down” in the latter.  Intuitively, the graph “looks like” the parabola y = \pm x^2 around x, depending on the sign.  We can’t really apply this analysis in the case where f''(x) = 0; for that, you’d need to use Taylor’s theorem to get more information about f at x.

It isn’t really until we start doing Calculus in several variables that we see the utility of this approach.  Let’s move to two variables.  Let f: \mathbb{R}^2 \to \mathbb{R} be a smooth function, and let M = \{(x,y,f(x,y)) \mid x,y \in \mathbb{R}\} \subseteq \mathbb{R}^3 be the graph of f.  Suppose p = (x_0,y_0) is a critical point of f.  Recall the differential in this case is given by

d_pf(a,b) = a \frac{\partial f}{\partial x}(p) + b \frac{\partial f}{\partial y}(p)

and saying that p is a critical point of f means that \frac{\partial f}{\partial x}(p) = \frac{\partial f}{\partial y}(p) = 0.  Since we’ve got more than one variable, any kind of “Second derivative test” is going to need information from all the second partial derivatives, in some way.  For example, how do we reinterpret the criterion f''(x) \neq 0 in this case?

I’ll save you the trouble and just say it: what we need to examine is something called the Hessian of f at p:

H(p) = \begin{pmatrix} \frac{\partial^2 f}{\partial x^2}(p) & \frac{\partial^2 f}{\partial x \partial y}(p) \\ \frac{\partial^2 f}{\partial y \partial x}(p) & \frac{\partial^2 f}{\partial y^2}(p) \end{pmatrix}

The Hessian of f at a point is just the matrix of second partials of f, arranged in a particular way.  (In the general case of \mathbb{R}^n, with coordinates (x_1,\cdots, x_n), the Hessian takes the form \left ( \frac{\partial^2 f}{\partial x_i \partial x_j} \right )_{1\leq i,j \leq n}.)  Requiring that “f''(x) \neq 0” now becomes D(p) := \text{det}H(p) \neq 0, and in such a case, we say p is a nondegenerate critical point of f.  We say

  • p is a local minimum of f if D(p) > 0, and \frac{\partial^2 f}{\partial x^2}(p) > 0;
  • p is a local maximum of f if D(p) > 0, and \frac{\partial^2 f}{\partial x^2}(p) < 0; and
  • p is a saddle point of f if D(p) < 0.

Intuitively, this says that the graph of f locally looks like the paraboloid z= \pm (x^2 + y^2) in the first two cases (depending on the sign), and like the hyperbolic paraboloid (= “saddle”) z = x^2 - y^2 in the third case.
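To see the test in action, here is a minimal Python sketch.  The function f(x,y) = x^3 - 3x + y^2 is my own choice of example (it doesn’t come from anything above), and the helper names are made up for illustration.  Its critical points are (1,0) and (-1,0), and its second partials are f_xx = 6x, f_xy = 0, f_yy = 2, which I’ve just computed by hand and hard-coded.

```python
def classify_critical_point(fxx, fxy, fyy):
    """Second derivative test at a critical point, given the second partials there."""
    D = fxx * fyy - fxy ** 2          # determinant of the Hessian
    if D == 0:
        return "degenerate"           # the test says nothing in this case
    if D < 0:
        return "saddle"
    return "local minimum" if fxx > 0 else "local maximum"


def hessian_entries(x, y):
    """Hand-computed second partials of the example f(x, y) = x^3 - 3x + y^2."""
    return 6 * x, 0.0, 2.0


# The two critical points of the example function.
at_plus = classify_critical_point(*hessian_entries(1.0, 0.0))    # "local minimum"
at_minus = classify_critical_point(*hessian_entries(-1.0, 0.0))  # "saddle"
print(at_plus, "/", at_minus)
```

At (1,0) we get D = 12 > 0 with f_xx = 6 > 0, a local minimum; at (-1,0) we get D = -12 < 0, a saddle.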

But what do I mean “looks like”?  Is there a formal way to express this?  Of course, or I wouldn’t be talking about it.

Might as well do the general case: Let M be a smooth manifold of dimension n, f: M \to \mathbb{R} a smooth function. Let p \in M, and suppose that p is a nondegenerate critical point of f*.  Then, there is a smooth system of coordinates about p such that, in these coordinates, f may be written as

f(y_1,\cdots, y_n) = f(p) - \sum_{i= 1}^\lambda y_i^2 + \sum_{i = \lambda + 1}^n y_i^2

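where 0 \leq \lambda \leq n is the index of f at p (= the number of negative eigenvalues of H(p)).  The extreme cases are the familiar ones: \lambda = 0 (the first sum is empty) is a local minimum, and \lambda = n is a local maximum.  This result is known as the Morse Lemma, and it legitimizes our intuition from the previous examples.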
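Here is the simplest case I can think of where the Morse coordinate can be written down by hand (again, my own example, and the function names below are made up).  Take f(x) = \cos x on \mathbb{R}, which has a nondegenerate critical point at 0 with f''(0) = -1, so the index is 1.  The half-angle identity \cos x = 1 - 2\sin^2(x/2) suggests the new coordinate y = \sqrt{2}\sin(x/2); this is a legitimate coordinate near 0 because dy/dx = \sqrt{2}/2 \neq 0 there.  The sketch below just checks numerically that f = f(0) - y^2 in this coordinate.

```python
import math


def f(x):
    return math.cos(x)


def morse_coordinate(x):
    """Candidate Morse coordinate y = sqrt(2) * sin(x / 2) near the critical point 0."""
    return math.sqrt(2) * math.sin(x / 2)


def normal_form(y):
    """Morse Lemma normal form for n = 1 and index 1: f(p) - y^2."""
    return f(0.0) - y ** 2


# Compare f with its normal form on a small neighborhood of 0.
samples = [k / 10 for k in range(-10, 11)]
max_error = max(abs(f(x) - normal_form(morse_coordinate(x))) for x in samples)
print(max_error)
```

The error is at the level of floating-point rounding, which is what the identity predicts.  In higher dimensions you usually can’t write the coordinates down this explicitly; the lemma only promises that they exist.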

*We had previously defined the Hessian of f at p within a given coordinate system.  As it turns out, “nondegeneracy” of a critical point is independent of coordinates, as is the index.*

Nondegeneracy of a critical point is basically the next best thing to requiring regularity of a point.  On top of the normal form given by the Morse Lemma, nondegenerate critical points are isolated.  That is, at such a point p, we can find an open neighborhood U of p such that p is the only critical point of f|_U.  This isn’t even that hard to show: if (x_1,\cdots,x_n) are local coordinates on an open neighborhood V of p, define a new function, F : V \to \mathbb{R}^n via

F(q) = \left(\frac{\partial f}{\partial x_1}(q),\cdots,\frac{\partial f}{\partial x_n}(q)\right)

Since p is a critical point of f, F(p) = (0,\cdots,0) \in \mathbb{R}^n.  More generally, a point q \in V is a critical point of f exactly when F(q) = 0.  The differential of F at p is (in these coordinates) the Hessian of f at p, so nondegeneracy of p implies nonsingularity of d_pF.  Hence, by the Inverse Function Theorem, F carries some open neighborhood U \subseteq V of p diffeomorphically onto an open neighborhood of the origin in \mathbb{R}^n.  In particular, F is injective on U, so p is the only point of U that F sends to the origin.  That is, p is the only critical point of f inside U.
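For contrast, here’s a quick sanity check (my own example, not needed for anything else here) that nondegeneracy can’t be dropped.  Take f(x,y) = x^2 on \mathbb{R}^2, with F the map of first partials as above:

```latex
F(x,y) = \left(\frac{\partial f}{\partial x}(x,y), \frac{\partial f}{\partial y}(x,y)\right) = (2x, 0),
\qquad
H(x,y) = \begin{pmatrix} 2 & 0 \\ 0 & 0 \end{pmatrix},
\qquad
\det H(x,y) = 0
```

Every point (0,y) on the y-axis is a critical point, so none of them is isolated, and every one of them is degenerate.  The Inverse Function Theorem argument breaks exactly where it should: the differential of F is singular, and F collapses the whole y-axis to the origin.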

In keeping with all these definitions, we say a smooth function f: M \to \mathbb{R} is a Morse function if all its critical points are nondegenerate.  Some authors impose the additional requirements that every critical value has only one corresponding critical point, or that f be proper (= preimage of a compact set is compact).  For now, I’ll stick to my original definition.

Morse functions are basically as good as it gets for our current approach:  Almost all level sets f^{-1}(c) are smooth submanifolds of M of codimension one, and the bad points (= critical points) where our analysis fails are isolated incidents, and even then, we know exactly what f looks like in an open neighborhood of a bad point.  But are Morse functions too good to be true?  Do we encounter them often?  As it turns out, just as with our worries about regular points/values, “almost all” smooth functions are Morse functions.  The core of the proof is actually (again) just Sard’s theorem.

Let’s just examine the case where f: U \to \mathbb{R} is a smooth function on an open subset U \subseteq \mathbb{R}^n.  Let (x_1,\cdots,x_n) be a choice of coordinates on U.  For a = (a_1,\cdots,a_n) \in \mathbb{R}^n, we define a smooth function

f_a := f + a_1x_1 + \cdots + a_n x_n

Theorem: No matter what the function f is, for almost all choices of a, f_a is a Morse function on U.

Again, we use the function F(q) = (\frac{\partial f}{\partial x_1}(q),\cdots,\frac{\partial f}{\partial x_n}(q)).  Then, the derivative of f_a at a point p is represented in these coordinates as

d_p f_a = \left(\frac{\partial f_a}{\partial x_1}(p),\cdots, \frac{\partial f_a}{\partial x_n}(p)\right) = F(p) + a

So, p is a critical point of f_a if and only if F(p) = -a.  Since f_a and f have the same second partials, the Hessian of f_a at p is the matrix d_p F.  If -a is a regular value of F, then d_pF is nonsingular whenever F(p) = -a.  Consequently, every critical point of f_a is nondegenerate.  Sard’s theorem then implies that -a is a regular value of F for almost all a \in \mathbb{R}^n, which is exactly what we wanted.
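A one-variable toy case makes the theorem tangible.  Take f(x) = x^3 on \mathbb{R} (my own choice of example; the function names below are made up), which is not Morse: 0 is a degenerate critical point.  Then f_a = x^3 + ax, with f_a'(x) = 3x^2 + a and f_a''(x) = 6x, both computed by hand.  The sketch below finds the critical points of f_a and checks which values of a give a Morse function.

```python
import math


def critical_points(a):
    """Real solutions of f_a'(x) = 3x^2 + a = 0, where f_a(x) = x^3 + a*x."""
    if a > 0:
        return []                      # no critical points at all
    if a == 0:
        return [0.0]
    r = math.sqrt(-a / 3)
    return [-r, r]


def is_morse(a):
    """f_a is Morse iff f_a''(x) = 6x is nonzero at every critical point."""
    return all(6 * x != 0 for x in critical_points(a))


# Scan a few values of a and record the ones where f_a fails to be Morse.
bad_values = [a for a in (-2, -1, -0.5, 0, 0.5, 1, 2) if not is_morse(a)]
print(bad_values)  # [0]
```

Only a = 0 fails.  That matches the proof: here F(x) = 3x^2, whose only critical value is 0, so -a is a regular value of F precisely when a \neq 0, and the bad set \{0\} has measure zero.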

There’s so much more to talk about, but I’ve already rambled on for quite a bit.  Until next time.


Published by brianhepler

I'm a third-year math postdoc at the University of Wisconsin-Madison, where I work as a member of the geometry and topology research group. Generally speaking, I think math is pretty neat; and, if you give me the chance, I'll talk your ear off. Especially the more abstract stuff. It's really hard to communicate that love with the general population, but I'm going to do my best to show you a world of pure imagination.
