Extrema

In this part of the course we work on the following skills:

  • Locating and classifying the extrema of scalar fields.
  • Applying the method of Lagrange multipliers to optimize quantities subject to constraints.

See also the graded exercises and additional exercises associated to this part of the course.

In the previous chapter we introduced various notions of differentials for higher dimensional functions (scalar fields, vector fields, paths, etc.). This part of the course is devoted to searching for extrema (minima / maxima) in various scenarios. This extends what we already know for functions on $\mathbb{R}$, and we will find that in higher dimensions more possibilities and subtleties exist.

Extrema (minima / maxima / saddle)

Let $S \subseteq \mathbb{R}^n$ be open, let $f : S \to \mathbb{R}$ be a scalar field and let $a \in S$.

Definition

If $f(a) \le f(x)$ (resp. $f(a) \ge f(x)$) for all $x \in S$, then $f(a)$ is said to be the absolute minimum (resp. maximum) of $f$.

Definition

If $f(a) \le f(x)$ (resp. $f(a) \ge f(x)$) for all $x \in B(a,r)$ for some $r > 0$, then $f(a)$ is said to be a relative minimum (resp. maximum) of $f$.

Collectively we call these points the extrema of the scalar field. In the case of a scalar field defined on $\mathbb{R}^2$ we can visualize the scalar field as a 3D plot, as in the figure. Here we see the extrema as the "flat" places. We sometimes use global as a synonym of absolute and local as a synonym of relative.

Bumps
$f(x,y) = x e^{-x^2 - y^2} + \tfrac{1}{4} e^{-y^3/10}$

To proceed it is convenient to connect the extrema with the behaviour of the gradient of the scalar field.

Theorem

If $f : S \to \mathbb{R}$ is differentiable and has a relative minimum or maximum at $a$, then $\nabla f(a) = 0$.

Proof

Suppose $f$ has a relative minimum at $a$ (or consider $-f$). For any unit vector $v$ let $g(u) = f(a + uv)$. We know that $g : \mathbb{R} \to \mathbb{R}$ has a relative minimum at $u = 0$, so $g'(0) = 0$. This means that the directional derivative $D_v f(a) = 0$ for every $v$. Consequently $\nabla f(a) = 0$.

Graph of an inflection
$\nabla f(a) = 0$ doesn't imply a minimum or maximum at $a$, even in $\mathbb{R}$, as seen with the function $f(x) = x^3$. In higher dimensions even more is possible.

Observe that, here and in the subsequent text, we can always consider the case of $f : \mathbb{R} \to \mathbb{R}$, i.e., the case of $\mathbb{R}^n$ where $n = 1$. Everything still holds and reduces to the arguments and formulae previously developed for functions of one variable.

Definition (stationary point)

If $\nabla f(a) = 0$ then $a$ is called a stationary point.

Graph of a bowl-shaped function
If $f(x,y) = x^2 + y^2$ then $\nabla f(x,y) = \begin{pmatrix} 2x \\ 2y \end{pmatrix}$ and $\nabla f(0,0) = \begin{pmatrix} 0 \\ 0 \end{pmatrix}$. The point $(0,0)$ is an absolute minimum for $f$.

As we see in the inflection example, the converse of the above theorem fails in the sense that a stationary point might not be a minimum or a maximum. This motivates the following.

Definition (saddle point)

If $\nabla f(a) = 0$ and $a$ is neither a minimum nor a maximum, then $a$ is said to be a saddle point.

The quintessential saddle has the shape seen in the graph. However, a saddle point might also resemble a 1D inflection, or be more complicated still, given the possibilities available in higher dimensions.

Graph of a saddle
If $f(x,y) = x^2 - y^2$ then $\nabla f(x,y) = \begin{pmatrix} 2x \\ -2y \end{pmatrix}$ and $\nabla f(0,0) = 0$. The point $(0,0)$ is a saddle point for $f$.
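As an illustration, the gradients in the two examples above can be checked symbolically. Here is a minimal sketch, assuming Python with the sympy library is available:

```python
import sympy as sp

x, y = sp.symbols('x y')

# The bowl x^2 + y^2 and the saddle x^2 - y^2 from the examples above.
for f in (x**2 + y**2, x**2 - y**2):
    grad = sp.Matrix([sp.diff(f, x), sp.diff(f, y)])
    # The gradient vanishes at (0,0): both are stationary points.
    print(f, grad.T, grad.subs({x: 0, y: 0}).T)
```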

Hessian matrix

To proceed it is useful to develop the idea of a second order Taylor expansion in this higher dimensional setting. In particular this will allow us to identify the local behaviour close to stationary points. The main object for doing this is the Hessian matrix. Let $f : \mathbb{R}^2 \to \mathbb{R}$ be twice differentiable and use the notation $f(x,y)$. The Hessian matrix at $a \in \mathbb{R}^2$ is defined as

$$Hf(a) = \begin{pmatrix} \frac{\partial^2 f}{\partial x^2}(a) & \frac{\partial^2 f}{\partial x \partial y}(a) \\ \frac{\partial^2 f}{\partial y \partial x}(a) & \frac{\partial^2 f}{\partial y^2}(a) \end{pmatrix}.$$

Observe that the Hessian matrix $Hf(a)$ is a symmetric matrix since we know that

$$\frac{\partial^2 f}{\partial x \partial y}(a) = \frac{\partial^2 f}{\partial y \partial x}(a)$$

for twice differentiable functions.

The Hessian matrix is defined analogously in any dimension.

Let $f : \mathbb{R}^n \to \mathbb{R}$ be twice differentiable. The Hessian matrix at $a \in \mathbb{R}^n$ is defined as

$$Hf(a) = \begin{pmatrix} \frac{\partial^2 f}{\partial x_1^2}(a) & \frac{\partial^2 f}{\partial x_1 \partial x_2}(a) & \cdots & \frac{\partial^2 f}{\partial x_1 \partial x_n}(a) \\ \frac{\partial^2 f}{\partial x_2 \partial x_1}(a) & \frac{\partial^2 f}{\partial x_2^2}(a) & \cdots & \frac{\partial^2 f}{\partial x_2 \partial x_n}(a) \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial^2 f}{\partial x_n \partial x_1}(a) & \frac{\partial^2 f}{\partial x_n \partial x_2}(a) & \cdots & \frac{\partial^2 f}{\partial x_n^2}(a) \end{pmatrix}.$$

Observe that the Hessian matrix is a real symmetric matrix in any dimension. If $f : \mathbb{R} \to \mathbb{R}$ then $Hf(a)$ is a $1 \times 1$ matrix and coincides with the second derivative of $f$. In this sense what we know about extrema in $\mathbb{R}$ is just a special case of everything we do here.

As an example, let $f(x,y) = x^2 - y^2$ (figure). The gradient and the Hessian are respectively

$$\nabla f(x,y) = \begin{pmatrix} \frac{\partial f}{\partial x}(x,y) \\ \frac{\partial f}{\partial y}(x,y) \end{pmatrix} = \begin{pmatrix} 2x \\ -2y \end{pmatrix}, \qquad Hf(x,y) = \begin{pmatrix} \frac{\partial^2 f}{\partial x^2}(x,y) & \frac{\partial^2 f}{\partial x \partial y}(x,y) \\ \frac{\partial^2 f}{\partial y \partial x}(x,y) & \frac{\partial^2 f}{\partial y^2}(x,y) \end{pmatrix} = \begin{pmatrix} 2 & 0 \\ 0 & -2 \end{pmatrix}.$$

The point $(0,0)$ is a stationary point since $\nabla f(0,0) = \begin{pmatrix} 0 \\ 0 \end{pmatrix}$. In this example $Hf$ does not depend on $(x,y)$, but in general we can expect dependence, and so it gives a different matrix at different points $(x,y)$.
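The same computation can be automated. A minimal sketch, assuming sympy, whose hessian helper builds exactly the matrix of second order partial derivatives defined above:

```python
import sympy as sp

x, y = sp.symbols('x y')
f = x**2 - y**2

grad = sp.Matrix([sp.diff(f, x), sp.diff(f, y)])  # (2x, -2y)
H = sp.hessian(f, (x, y))                         # constant in this example

print(grad.subs({x: 0, y: 0}))  # zero vector, so (0,0) is stationary
print(H)                        # Matrix([[2, 0], [0, -2]])
```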

Theorem

If $v = (v_1, \dots, v_n)$ then,

$$v \, Hf(a) \, v^T = \sum_{j,k=1}^{n} \frac{\partial^2 f}{\partial x_j \partial x_k}(a) \, v_j v_k \in \mathbb{R}.$$
Proof

Multiplying the matrices we calculate that

$$v \, Hf(a) \, v^T = \begin{pmatrix} v_1 & \cdots & v_n \end{pmatrix} \begin{pmatrix} \partial_1 \partial_1 f(a) & \cdots & \partial_1 \partial_n f(a) \\ \vdots & \ddots & \vdots \\ \partial_n \partial_1 f(a) & \cdots & \partial_n \partial_n f(a) \end{pmatrix} \begin{pmatrix} v_1 \\ \vdots \\ v_n \end{pmatrix} = \sum_{j,k=1}^{n} \partial_j \partial_k f(a) \, v_j v_k$$

as required.
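The identity in this theorem is easy to check numerically. A sketch assuming numpy, with a random symmetric matrix standing in for $Hf(a)$:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 3))
H = (A + A.T) / 2                # symmetric, like a Hessian
v = rng.normal(size=3)

matrix_form = v @ H @ v          # v Hf(a) v^T as a matrix product
double_sum = sum(H[j, k] * v[j] * v[k]
                 for j in range(3) for k in range(3))
print(np.isclose(matrix_form, double_sum))  # True
```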

Second order Taylor formula for scalar fields

First let's recall the first order Taylor approximation we saw before. If $f$ is differentiable at $a$ then

$$f(x) \approx f(a) + \nabla f(a) \cdot (x - a).$$

If $a$ is a stationary point then this only tells us that $f(x) \approx f(a)$, so a natural next question is to search for slightly more detailed information.

Theorem (second order Taylor for scalar fields)

Let $f$ be a scalar field twice differentiable on $B(a,r)$. Then, for $x$ close to $a$,

$$f(x) \approx f(a) + \nabla f(a) \cdot (x-a) + \tfrac{1}{2} (x-a) \, Hf(a) \, (x-a)^T$$

in the sense that the error is $o(\lVert x - a \rVert^2)$.

Proof

Let $v = x - a$ and let $g(u) = f(a + uv)$. The Taylor expansion of $g$ tells us that $g(1) = g(0) + g'(0) + \frac{1}{2} g''(c)$ for some $c \in (0,1)$. Since $g(u) = f(a_1 + u v_1, \dots, a_n + u v_n)$, by the chain rule,

$$g'(u) = \sum_{j=1}^{n} \partial_j f(a_1 + u v_1, \dots, a_n + u v_n) \, v_j = \nabla f(a + uv) \cdot v, \qquad g''(u) = \sum_{j,k=1}^{n} \partial_j \partial_k f(a_1 + u v_1, \dots, a_n + u v_n) \, v_j v_k = v^T \, Hf(a + uv) \, v.$$

Consequently $f(a+v) = f(a) + \nabla f(a) \cdot v + \frac{1}{2} v^T \, Hf(a + cv) \, v$. We define the "error" in the approximation as $\epsilon(v) = \frac{1}{2} v^T \left( Hf(a+cv) - Hf(a) \right) v$ and estimate that

$$|\epsilon(v)| \le \sum_{j,k=1}^{n} |v_j v_k| \left| \partial_j \partial_k f(a + cv) - \partial_j \partial_k f(a) \right|.$$

Since $|v_j v_k| \le \lVert v \rVert^2$ we observe (using the continuity of the second order partial derivatives) that $|\epsilon(v)| / \lVert v \rVert^2 \to 0$ as $\lVert v \rVert \to 0$, as required.
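The claim that the error is $o(\lVert x-a \rVert^2)$ can be observed numerically: the ratio between the error and $\lVert v \rVert^2$ shrinks as $v \to 0$. A sketch assuming numpy, using $f(x,y) = \sin x \cos y$ with gradient and Hessian computed by hand:

```python
import numpy as np

def f(p):
    x, y = p
    return np.sin(x) * np.cos(y)

a = np.array([0.3, -0.2])
# Gradient and Hessian of sin(x)cos(y) at a, computed by hand.
grad = np.array([np.cos(a[0]) * np.cos(a[1]),
                 -np.sin(a[0]) * np.sin(a[1])])
H = np.array([[-np.sin(a[0]) * np.cos(a[1]), -np.cos(a[0]) * np.sin(a[1])],
              [-np.cos(a[0]) * np.sin(a[1]), -np.sin(a[0]) * np.cos(a[1])]])

v = np.array([1.0, -2.0])
for t in (1e-1, 1e-2, 1e-3):
    w = t * v
    taylor = f(a) + grad @ w + 0.5 * w @ H @ w
    error = abs(f(a + w) - taylor)
    print(t, error / (w @ w))   # the ratio tends to 0 with t
```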

Classifying stationary points

In order to classify the stationary points we will take advantage of the Hessian matrix, and therefore we need to first understand the following fact about real symmetric matrices.

Theorem

Let $A$ be a real symmetric matrix and let $Q(v) = v^T A v$. Then,

  • $Q(v) > 0$ for all $v \ne 0$ $\iff$ all eigenvalues of $A$ are positive,
  • $Q(v) < 0$ for all $v \ne 0$ $\iff$ all eigenvalues of $A$ are negative.
Proof

Since $A$ is symmetric it can be diagonalised by a matrix $B$ which is orthogonal ($B^T = B^{-1}$), and the diagonal matrix $D = B^T A B$ has the eigenvalues of $A$ on the diagonal. This means that $Q(v) = v^T B B^T A B B^T v = w^T D w$ where $w = B^T v$. Consequently $Q(v) = \sum_j \lambda_j w_j^2$. Observe that, if all $\lambda_j > 0$ then $\sum_j \lambda_j w_j^2 > 0$, since $w \ne 0$ whenever $v \ne 0$.

In order to prove the other direction in the "if and only if" statement, observe that $Q(B u_k) = \lambda_k$, where $u_k$ denotes the $k$-th standard basis vector (so that $B u_k$ is an eigenvector of $A$). This means that, if $Q(v) > 0$ for all $v \ne 0$, then $\lambda_k > 0$ for all $k$.
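Numerically, the connection between the sign of $Q$ and the eigenvalues can be probed by sampling. A sketch assuming numpy (eigvalsh is numpy's eigenvalue routine for symmetric matrices):

```python
import numpy as np

H = np.array([[2.0, 1.0],
              [1.0, 3.0]])            # real symmetric

print(np.linalg.eigvalsh(H))          # both eigenvalues positive

rng = np.random.default_rng(1)
V = rng.normal(size=(1000, 2))        # many random nonzero vectors
Q = np.einsum('ij,jk,ik->i', V, H, V) # Q(v) = v^T H v for each row v
print(Q.min() > 0)                    # True, as the theorem predicts
```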

Theorem (classifying stationary points)

Let $f$ be a scalar field twice differentiable on $B(a,r)$. Suppose $\nabla f(a) = 0$ and consider the eigenvalues of $Hf(a)$. Then,

  • All eigenvalues are positive $\implies$ relative minimum at $a$,
  • All eigenvalues are negative $\implies$ relative maximum at $a$,
  • Some eigenvalues positive, some negative $\implies$ $a$ is a saddle point.

Proof

Let $Q(v) = v^T Hf(a) v$, let $w = B^T v$ as above, and let $\Lambda := \min_j \lambda_j$, which is positive when all eigenvalues are positive. Observe that $\lVert w \rVert = \lVert v \rVert$ and that $Q(v) = \sum_j \lambda_j w_j^2 \ge \Lambda \sum_j w_j^2 = \Lambda \lVert v \rVert^2$. We then have, by the second order Taylor formula,

$$f(a+v) - f(a) = \tfrac{1}{2} v^T \, Hf(a) \, v + \epsilon(v) \ge \left( \tfrac{\Lambda}{2} - \tfrac{|\epsilon(v)|}{\lVert v \rVert^2} \right) \lVert v \rVert^2.$$

Since $|\epsilon(v)| / \lVert v \rVert^2 \to 0$ as $\lVert v \rVert \to 0$, we have $|\epsilon(v)| / \lVert v \rVert^2 < \frac{\Lambda}{2}$ when $\lVert v \rVert$ is small, and so $f(a+v) > f(a)$ for all small $v \ne 0$. The argument is analogous for the second part. For the final part, consider $v_j$, the eigenvector for $\lambda_j$, and apply the argument of the first or second part.
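This classification translates directly into a small routine: compute the eigenvalues of the Hessian at a stationary point and inspect their signs. A sketch assuming numpy; note that the case of a zero eigenvalue is not covered by the theorem, so the routine reports it as inconclusive:

```python
import numpy as np

def classify(H, tol=1e-12):
    """Classify a stationary point from the eigenvalues of its Hessian."""
    eig = np.linalg.eigvalsh(H)
    if np.all(eig > tol):
        return 'relative minimum'
    if np.all(eig < -tol):
        return 'relative maximum'
    if np.any(eig > tol) and np.any(eig < -tol):
        return 'saddle point'
    return 'inconclusive'  # some eigenvalue is (numerically) zero

print(classify(np.diag([2.0, 2.0])))    # bowl x^2 + y^2 -> relative minimum
print(classify(np.diag([2.0, -2.0])))   # x^2 - y^2 -> saddle point
print(classify(np.diag([-2.0, -2.0])))  # -> relative maximum
```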

Attaining extreme values

Here we explore the extreme value theorem for continuous scalar fields. The argument will be in two parts: firstly we show that continuity implies boundedness; secondly we show that boundedness implies that the maximum and minimum are attained. We use the following notation for an interval / rectangle / cuboid / tesseract, etc. If $a = (a_1, \dots, a_n)$ and $b = (b_1, \dots, b_n)$ then we consider the $n$-dimensional closed Cartesian product

$$[a,b] = [a_1, b_1] \times \cdots \times [a_n, b_n].$$

We call this set a rectangle (independent of the dimension). As a first step it is convenient to know that all sequences in our setting have convergent subsequences.

Theorem

If $\{x_n\}_n$ is a sequence in $[a,b]$ then there exists a convergent subsequence $\{x_{n_j}\}_j$.

Proof

In order to prove the theorem we construct the subsequence. Firstly we divide $[a,b]$ into sub-rectangles of size half the original. We then choose a sub-rectangle which contains infinitely many elements of the sequence and choose the first of these elements to be part of the subsequence. We repeat this process, again dividing the chosen sub-rectangle in half and choosing (with a later index) the next element of the subsequence. Repeating indefinitely gives the full subsequence; since the diameters of the nested sub-rectangles tend to zero, the subsequence is Cauchy and therefore converges.
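The halving construction can be imitated computationally, with a long finite sample standing in for the sequence and "contains the most remaining points" standing in for "contains infinitely many elements". A purely illustrative sketch assuming numpy:

```python
import numpy as np

def halving_subsequence(seq, lo, hi, stages=6):
    """Pick one point per stage from ever-smaller halves of [lo, hi]."""
    lo, hi = np.asarray(lo, float), np.asarray(hi, float)
    remaining = np.arange(len(seq))
    picked = []
    n = len(lo)
    for _ in range(stages):
        mid = (lo + hi) / 2
        # Among the 2^n sub-rectangles, keep one containing the most
        # remaining points (a finite stand-in for "infinitely many").
        best = None
        for mask in range(2 ** n):
            bits = np.array([(mask >> k) & 1 for k in range(n)], dtype=bool)
            new_lo = np.where(bits, mid, lo)
            new_hi = np.where(bits, hi, mid)
            pts = seq[remaining]
            inside = remaining[np.all((pts >= new_lo) & (pts <= new_hi), axis=1)]
            if best is None or len(inside) > len(best[0]):
                best = (inside, new_lo, new_hi)
        inside, lo, hi = best
        if len(inside) == 0:
            break
        picked.append(inside[0])   # first remaining element: indices increase
        remaining = inside[1:]
    return picked

rng = np.random.default_rng(2)
seq = rng.uniform(0, 1, size=(10000, 2))   # a finite "sequence" in [0,1]^2
idx = halving_subsequence(seq, [0, 0], [1, 1])
print(seq[idx])                            # chosen points cluster together
```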

Theorem

Suppose that $f$ is a scalar field continuous at every point in the closed rectangle $[a,b]$. Then $f$ is bounded on $[a,b]$ in the sense that there exists $C > 0$ such that $|f(x)| \le C$ for all $x \in [a,b]$.