
Extrema

In this part of the course we work on the following skills:

  • Locating and classifying the extrema of scalar fields.
  • Applying the method of Lagrange multipliers to optimize quantities subject to constraints.

See also the exercises associated to this part of the course.

In the previous chapter we introduced various notions of differentials for higher dimensional functions (scalar fields, vector fields, paths, etc.). This part of the course is devoted to searching for extrema (minima / maxima) in various different scenarios. This extends what we already know for functions on $\mathbb{R}$ and we will find that in higher dimensions more possibilities and subtleties exist.

Extrema (minima / maxima / saddle)

Let $S \subseteq \mathbb{R}^n$ be open, let $f : S \to \mathbb{R}$ be a scalar field and let $a \in S$.

Definition

If $f(a) \le f(x)$ (resp. $f(a) \ge f(x)$) for all $x \in S$, then $f(a)$ is said to be the absolute minimum (resp. maximum) of $f$.

Definition

If $f(a) \le f(x)$ (resp. $f(a) \ge f(x)$) for all $x \in B(a,r)$ for some $r > 0$, then $f(a)$ is said to be a relative minimum (resp. maximum) of $f$.

Collectively we call these points the extrema of the scalar field. In the case of a scalar field defined on $\mathbb{R}^2$ we can visualize the scalar field as a 3D plot like the figure. Here we see the extrema as the "flat" places. We sometimes use global as a synonym of absolute and local as a synonym of relative.

Bumps
Graph of $f(x,y) = x\,e^{-x^2-y^2} + \tfrac{1}{4}e^{-y^3/10}$.

To proceed it is convenient to connect the extrema with the behaviour of the gradient of the scalar field.

Theorem

If $f : S \to \mathbb{R}$ is differentiable and has a relative minimum or maximum at $a$, then $\nabla f(a) = 0$.

Proof

Suppose $f$ has a relative minimum at $a$ (or consider $-f$). For any unit vector $v$ let $g(u) = f(a + uv)$. We know that $g : \mathbb{R} \to \mathbb{R}$ has a relative minimum at $u = 0$, so $g'(0) = 0$. This means that the directional derivative $D_v f(a) = 0$ for every $v$. Consequently $\nabla f(a) = 0$.

Graph of an inflection
$\nabla f(a) = 0$ doesn't imply a minimum or maximum at $a$, even in $\mathbb{R}$, as seen with the function $f(x) = x^3$. In higher dimensions even more is possible.

Observe that, here and in the subsequent text, we can always consider the case of $f : \mathbb{R} \to \mathbb{R}$, i.e., the case of $\mathbb{R}^n$ where $n = 1$. Everything still holds and reduces to the arguments and formulae previously developed for functions of one variable.

Definition (stationary point)

If $\nabla f(a) = 0$ then $a$ is called a stationary point.

Graph of a bowl shaped function
If $f(x,y) = x^2 + y^2$ then $\nabla f(x,y) = (2x, 2y)$ and $\nabla f(0,0) = (0,0)$. The point $(0,0)$ is an absolute minimum for $f$.

As we see in the inflection example, the converse of the above theorem fails in the sense that a stationary point might not be a minimum or a maximum. This motivates the following.

Definition (saddle point)

If $\nabla f(a) = 0$ and $a$ is neither a minimum nor a maximum then $a$ is said to be a saddle point.

The quintessential saddle has the shape seen in the graph. However it might also resemble an inflection in 1D, or be more complicated, given the possibilities available in higher dimensions.

Graph of a saddle
If $f(x,y) = x^2 - y^2$ then $\nabla f(x,y) = (2x, -2y)$ and $\nabla f(0,0) = (0,0)$. The point $(0,0)$ is a saddle point for $f$.

Hessian matrix

To proceed it is useful to develop the idea of a second order Taylor expansion in this higher dimensional setting. In particular this will allow us to identify the local behaviour close to stationary points. The main object for doing this is the Hessian matrix. Let $f : \mathbb{R}^2 \to \mathbb{R}$ be twice differentiable and use the notation $f(x,y)$. The Hessian matrix at $a \in \mathbb{R}^2$ is defined as

$$H_f(a) = \begin{pmatrix} \frac{\partial^2 f}{\partial x^2}(a) & \frac{\partial^2 f}{\partial x \partial y}(a) \\ \frac{\partial^2 f}{\partial y \partial x}(a) & \frac{\partial^2 f}{\partial y^2}(a) \end{pmatrix}.$$

Observe that the Hessian matrix $H_f(a)$ is a symmetric matrix since we know that

$$\frac{\partial^2 f}{\partial x \partial y}(a) = \frac{\partial^2 f}{\partial y \partial x}(a)$$

for twice differentiable functions.

The Hessian matrix is defined analogously in any dimension.

Let $f : \mathbb{R}^n \to \mathbb{R}$ be twice differentiable. The Hessian matrix at $a \in \mathbb{R}^n$ is defined as

$$H_f(a) = \begin{pmatrix} \frac{\partial^2 f}{\partial x_1^2}(a) & \frac{\partial^2 f}{\partial x_1 \partial x_2}(a) & \cdots & \frac{\partial^2 f}{\partial x_1 \partial x_n}(a) \\ \frac{\partial^2 f}{\partial x_2 \partial x_1}(a) & \frac{\partial^2 f}{\partial x_2^2}(a) & \cdots & \frac{\partial^2 f}{\partial x_2 \partial x_n}(a) \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial^2 f}{\partial x_n \partial x_1}(a) & \frac{\partial^2 f}{\partial x_n \partial x_2}(a) & \cdots & \frac{\partial^2 f}{\partial x_n^2}(a) \end{pmatrix}.$$

Observe that the Hessian matrix is a real symmetric matrix in any dimension. If $f : \mathbb{R} \to \mathbb{R}$ then $H_f(a)$ is a $1 \times 1$ matrix and coincides with the second derivative of $f$. In this sense what we know about extrema in $\mathbb{R}$ is just a special case of everything we do here.

As an example, let $f(x,y) = x^2 - y^2$ (figure). The gradient and the Hessian are respectively

$$\nabla f(x,y) = \begin{pmatrix} \frac{\partial f}{\partial x}(x,y) \\ \frac{\partial f}{\partial y}(x,y) \end{pmatrix} = \begin{pmatrix} 2x \\ -2y \end{pmatrix}, \qquad H_f(x,y) = \begin{pmatrix} \frac{\partial^2 f}{\partial x^2}(x,y) & \frac{\partial^2 f}{\partial x \partial y}(x,y) \\ \frac{\partial^2 f}{\partial y \partial x}(x,y) & \frac{\partial^2 f}{\partial y^2}(x,y) \end{pmatrix} = \begin{pmatrix} 2 & 0 \\ 0 & -2 \end{pmatrix}.$$

The point $(0,0)$ is a stationary point since $\nabla f(0,0) = (0,0)$. In this example $H_f$ does not depend on $(x,y)$ but in general we can expect dependence and so it gives a different matrix at different points $(x,y)$.
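The same computation can be reproduced symbolically. The following is a minimal sketch (not part of the original notes) assuming SymPy is available; it computes the gradient and Hessian of $f(x,y) = x^2 - y^2$ and checks that $(0,0)$ is a stationary point.

```python
# Sketch assuming SymPy; reproduces the gradient and Hessian computed above.
import sympy as sp

x, y = sp.symbols('x y')
f = x**2 - y**2

grad = sp.Matrix([sp.diff(f, x), sp.diff(f, y)])  # gradient as a column vector
H = sp.hessian(f, (x, y))                         # Hessian matrix

print(grad)                     # Matrix([[2*x], [-2*y]])
print(H)                        # Matrix([[2, 0], [0, -2]])
print(grad.subs({x: 0, y: 0}))  # Matrix([[0], [0]]): (0,0) is a stationary point
```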

Theorem

If $v = (v_1, \dots, v_n)$ then,

$$v \, H_f(a) \, v^T = \sum_{j,k=1}^{n} \frac{\partial^2 f}{\partial x_j \partial x_k}(a)\, v_j v_k \in \mathbb{R}.$$
Proof

Multiplying the matrices we calculate that

$$v \, H_f(a) \, v^T = \begin{pmatrix} v_1 & \cdots & v_n \end{pmatrix} \begin{pmatrix} \partial_1 \partial_1 f(a) & \cdots & \partial_1 \partial_n f(a) \\ \vdots & \ddots & \vdots \\ \partial_n \partial_1 f(a) & \cdots & \partial_n \partial_n f(a) \end{pmatrix} \begin{pmatrix} v_1 \\ \vdots \\ v_n \end{pmatrix} = \sum_{j,k=1}^{n} \partial_j \partial_k f(a)\, v_j v_k$$

as required.

Second order Taylor formula for scalar fields

First let's recall the first order Taylor approximation we saw before. If f is differentiable at a then

$$f(x) \approx f(a) + \nabla f(a) \cdot (x - a).$$

If $a$ is a stationary point then this only tells us that $f(x) \approx f(a)$, so a natural next question is to search for slightly more detailed information.

Theorem (second order Taylor for scalar fields)

Let f be a scalar field twice differentiable on B(a,r). Then, for x close to a,

$$f(x) \approx f(a) + \nabla f(a) \cdot (x - a) + \tfrac{1}{2}\,(x - a)\, H_f(a)\, (x - a)^T$$

in the sense that the error is $o(\lVert x - a \rVert^2)$.

Proof

Let $v = x - a$ and let $g(u) = f(a + uv)$. The Taylor expansion of $g$ tells us that $g(1) = g(0) + g'(0) + \tfrac{1}{2} g''(c)$ for some $c \in (0,1)$. Since $g(u) = f(a_1 + u v_1, \dots, a_n + u v_n)$, by the chain rule,

$$g'(u) = \sum_{j=1}^{n} \partial_j f(a_1 + u v_1, \dots, a_n + u v_n)\, v_j = \nabla f(a + uv) \cdot v, \qquad g''(u) = \sum_{j,k=1}^{n} \partial_j \partial_k f(a_1 + u v_1, \dots, a_n + u v_n)\, v_j v_k = v^T\, H_f(a + uv)\, v.$$

Consequently $f(a+v) = f(a) + \nabla f(a) \cdot v + \tfrac{1}{2}\, v^T H_f(a + cv)\, v$. We define the "error" in the approximation as $\epsilon(v) = \tfrac{1}{2}\, v^T \big(H_f(a + cv) - H_f(a)\big) v$ and estimate that

$$|\epsilon(v)| \le \sum_{j,k=1}^{n} |v_j v_k|\, \big|\partial_j \partial_k f(a + cv) - \partial_j \partial_k f(a)\big|.$$

Since $|v_j v_k| \le \lVert v \rVert^2$ we observe that $|\epsilon(v)| / \lVert v \rVert^2 \to 0$ as $\lVert v \rVert \to 0$, as required.
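As a quick sanity check of the statement (not part of the original notes), one can verify numerically that the error of the second order approximation decays faster than $\lVert x - a \rVert^2$. The sketch below assumes NumPy and uses the illustrative choice $f(x,y) = \sin(x)\,e^{y}$ at $a = (0,0)$, where $\nabla f(a) = (1, 0)$ and $H_f(a) = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$.

```python
# Numerical check (illustrative choices): the ratio |error| / ||v||^2 tends to 0.
import numpy as np

def f(p):
    return np.sin(p[0]) * np.exp(p[1])

grad_a = np.array([1.0, 0.0])          # gradient of f at (0, 0)
H_a = np.array([[0.0, 1.0],
                [1.0, 0.0]])           # Hessian of f at (0, 0)

for t in [1e-1, 1e-2, 1e-3]:
    v = t * np.array([0.6, 0.8])       # ||v|| = t
    taylor = f(np.zeros(2)) + grad_a @ v + 0.5 * v @ H_a @ v
    error = abs(f(v) - taylor)
    print(t, error / np.dot(v, v))     # decreasing ratio: error is o(||v||^2)
```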

Classifying stationary points

In order to classify the stationary points we will take advantage of the Hessian matrix and therefore we need to first understand the following fact about real symmetric matrices.

Theorem

Let $A$ be a real symmetric matrix and let $Q(v) = v^T A v$. Then,

  • $Q(v) > 0$ for all $v \neq 0$ $\iff$ all eigenvalues of $A$ are positive,
  • $Q(v) < 0$ for all $v \neq 0$ $\iff$ all eigenvalues of $A$ are negative.
Proof

Since $A$ is symmetric it can be diagonalised by an orthogonal matrix $B$ ($B^T = B^{-1}$): the matrix $D = B^T A B$ is diagonal and has the eigenvalues of $A$ on the diagonal. Writing $A = B D B^T$, this means that $Q(v) = v^T B D B^T v = w^T D w$ where $w = B^T v$. Consequently $Q(v) = \sum_j \lambda_j w_j^2$. Observe that, if all $\lambda_j > 0$ then $\sum_j \lambda_j w_j^2 > 0$ whenever $w \neq 0$, and $w \neq 0$ whenever $v \neq 0$ since $B$ is invertible.

In order to prove the other direction in the "if and only if" statement, observe that $Q(B u_k) = \lambda_k$, where $u_k$ denotes the $k$-th standard basis vector (so $B u_k$ is the $k$-th column of $B$, an eigenvector of $A$). This means that, if $Q(v) > 0$ for all $v \neq 0$, then $\lambda_k > 0$ for all $k$.
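The theorem can be illustrated numerically (a hedged sketch, not part of the original notes, assuming NumPy): for a symmetric matrix with positive eigenvalues, $Q(v) = v^T A v$ is positive for every sampled $v \neq 0$.

```python
# Illustration: positive eigenvalues of a symmetric matrix give Q(v) > 0.
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])            # real symmetric matrix

print(np.linalg.eigvalsh(A))          # eigenvalues (5 +/- sqrt(5))/2, both positive

rng = np.random.default_rng(0)
for _ in range(5):
    v = rng.standard_normal(2)        # a random nonzero vector
    print(v @ A @ v > 0)              # True in every case
```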

Theorem (classifying stationary points)

Let $f$ be a scalar field twice differentiable on $B(a,r)$. Suppose $\nabla f(a) = 0$ and consider the eigenvalues of $H_f(a)$. Then,

  • All eigenvalues are positive $\implies$ relative minimum at $a$,
  • All eigenvalues are negative $\implies$ relative maximum at $a$,
  • Some positive, some negative $\implies$ $a$ is a saddle point.

Proof

Let $Q(v) = v^T H_f(a) v$, let $w = B^T v$ be as in the previous proof, and let $\Lambda := \min_j \lambda_j$, which is positive in the first case. Observe that $\lVert w \rVert = \lVert v \rVert$ and that $Q(v) = \sum_j \lambda_j w_j^2 \ge \Lambda \sum_j w_j^2 = \Lambda \lVert v \rVert^2$. By the second order Taylor formula we then have

$$f(a+v) - f(a) = \tfrac{1}{2}\, v^T H_f(a)\, v + \epsilon(v) \ge \left(\tfrac{\Lambda}{2} - \tfrac{|\epsilon(v)|}{\lVert v \rVert^2}\right) \lVert v \rVert^2.$$

Since $|\epsilon(v)| / \lVert v \rVert^2 \to 0$ as $\lVert v \rVert \to 0$, we have $|\epsilon(v)| / \lVert v \rVert^2 < \tfrac{\Lambda}{2}$ when $\lVert v \rVert$ is small, and so $f(a+v) > f(a)$ for all sufficiently small $v \neq 0$. The argument is analogous for the second part. For the final part consider $v_j$, the eigenvector for $\lambda_j$, and apply the argument of the first or second part along that direction.
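The classification recipe translates directly into a short computation. The following sketch (assuming SymPy; the function $f(x,y) = x^3 - 3x + y^2$ is an illustrative choice, not from the notes) finds the stationary points and classifies them via the eigenvalues of the Hessian.

```python
# Sketch: solve grad f = 0, then classify via the eigenvalues of the Hessian.
import sympy as sp

x, y = sp.symbols('x y', real=True)
f = x**3 - 3*x + y**2

grad = [sp.diff(f, v) for v in (x, y)]
H = sp.hessian(f, (x, y))

for pt in sp.solve(grad, (x, y), dict=True):
    eigs = list(H.subs(pt).eigenvals().keys())
    if all(e > 0 for e in eigs):
        kind = "relative minimum"
    elif all(e < 0 for e in eigs):
        kind = "relative maximum"
    else:
        kind = "saddle point"        # (a zero eigenvalue would be inconclusive)
    print(pt, eigs, kind)
# {x: -1, y: 0} with eigenvalues {-6, 2}: saddle point
# {x: 1, y: 0}  with eigenvalues {6, 2}:  relative minimum
```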

Attaining extreme values

Here we explore the extreme value theorem for continuous scalar fields. The argument will be in two parts: firstly we show that continuity implies boundedness; secondly we show that boundedness implies that the maximum and minimum are attained. We use the following notation for interval / rectangle / cuboid / tesseract, etc. If $a = (a_1, \dots, a_n)$ and $b = (b_1, \dots, b_n)$ then we consider the $n$-dimensional closed Cartesian product

$$[a,b] = [a_1, b_1] \times \cdots \times [a_n, b_n].$$

We call this set a rectangle (independent of the dimension). As a first step it is convenient to know that all sequences in our setting have convergent subsequences.

Theorem

If $\{x_n\}_n$ is a sequence in $[a,b]$ then there exists a convergent subsequence $\{x_{n_j}\}_j$.

Proof

In order to prove the theorem we construct the subsequence. Firstly we divide $[a,b]$ into sub-rectangles of half the size of the original. We then choose a sub-rectangle which contains infinitely many elements of the sequence and choose the first of these elements to be part of the subsequence. We repeat this process, again dividing the chosen sub-rectangle in half and choosing the next element of the subsequence. Repeating indefinitely gives the full subsequence; since the diameters of the nested sub-rectangles tend to zero, the subsequence is Cauchy and therefore converges to a point of $[a,b]$.

Theorem

Suppose that $f$ is a scalar field continuous at every point in the closed rectangle $[a,b]$. Then $f$ is bounded on $[a,b]$ in the sense that there exists $C > 0$ such that $|f(x)| \le C$ for all $x \in [a,b]$.

Proof

Suppose the contrary: for every $n \in \mathbb{N}$ there exists $x_n \in [a,b]$ such that $|f(x_n)| > n$. The Bolzano-Weierstrass theorem (the previous result) means that there exists a subsequence $\{x_{n_j}\}_j$ which converges to some $x \in [a,b]$. Continuity of $f$ means that $f(x_{n_j})$ converges to $f(x)$, which is impossible since $|f(x_{n_j})| > n_j \to \infty$. This is a contradiction and hence the theorem is proved.

We can now use the above result on the boundedness in order to show that the extreme values are actually obtained.

Theorem

Suppose that $f$ is a scalar field continuous at every point in the closed rectangle $[a,b]$. Then there exist points $x, y \in [a,b]$ such that

$$f(x) = \inf f \quad \text{and} \quad f(y) = \sup f.$$
Proof

By the boundedness theorem $\sup f$ is finite and so there exists a sequence $\{x_n\}_n$ in $[a,b]$ such that $f(x_n)$ converges to $\sup f$. The Bolzano-Weierstrass theorem implies that there exists a subsequence $\{x_{n_j}\}_j$ which converges to some $y \in [a,b]$. By continuity $f(x_{n_j}) \to f(y) = \sup f$. The argument for the infimum is analogous.

Extrema with constraints (Lagrange multipliers)

We now consider a slightly different problem to the one earlier in this chapter. There we wished to find the extrema of a given scalar field. Here the general problem is to minimise or maximise a given scalar field $f(x,y)$ under the constraint $g(x,y) = 0$. Subsequently we will also consider the same problem in higher dimensions. For the graphical representation we draw the constraint curve and also various level sets of the function whose extrema we seek. The graphical representation suggests to us that at the "touching point" the gradient vectors are parallel. In other words, $\nabla f = \lambda \nabla g$ for some $\lambda \in \mathbb{R}$. The implementation of this idea is the method of Lagrange multipliers.

Suppose that a differentiable scalar field $f(x,y)$ has a relative minimum or maximum when it is subject to the constraint $g(x,y) = 0$. Then there exists a scalar $\lambda$ such that, at the extremum point,

$$\nabla f = \lambda \nabla g.$$
Visualization of Lagrange multiplier method
Searching for extrema of $f$ under the constraint $g = 0$.
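The two-dimensional method can be carried out symbolically: solve the system $\nabla f = \lambda \nabla g$ together with $g = 0$. The sketch below assumes SymPy and uses the illustrative choices $f(x,y) = x + y$ and $g(x,y) = x^2 + y^2 - 1$ (not from the notes).

```python
# Sketch of the 2D Lagrange multiplier method: grad f = lambda * grad g, g = 0.
import sympy as sp

x, y, lam = sp.symbols('x y lambda', real=True)
f = x + y                          # quantity to optimise (illustrative)
g = x**2 + y**2 - 1                # constraint g(x, y) = 0: the unit circle

equations = [sp.diff(f, x) - lam * sp.diff(g, x),
             sp.diff(f, y) - lam * sp.diff(g, y),
             g]

for s in sp.solve(equations, (x, y, lam), dict=True):
    print(s, " f =", f.subs(s))
# constrained maximum  sqrt(2) at ( 1/sqrt(2),  1/sqrt(2))
# constrained minimum -sqrt(2) at (-1/sqrt(2), -1/sqrt(2))
```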

In three dimensions a similar result holds. Suppose that a differentiable scalar field $f(x,y,z)$ has a relative minimum or maximum when it is subject to the constraints

$$g_1(x,y,z) = 0, \qquad g_2(x,y,z) = 0$$

and the gradients $\nabla g_k$ are linearly independent. Then there exist scalars $\lambda_1$, $\lambda_2$ such that, at the extremum point,

$$\nabla f = \lambda_1 \nabla g_1 + \lambda_2 \nabla g_2.$$

In higher dimensions and possibly with additional constraints we have the following general theorem.

Theorem (Lagrange multipliers)

Suppose that a differentiable scalar field $f(x_1, \dots, x_n)$ has a relative extremum when it is subject to the $m$ constraints

$$g_1(x_1, \dots, x_n) = 0, \quad \dots, \quad g_m(x_1, \dots, x_n) = 0,$$

where $m < n$, and the gradients $\nabla g_k$ are all linearly independent. Then there exist $m$ scalars $\lambda_1, \dots, \lambda_m$ such that, at each extremum point,

$$\nabla f = \lambda_1 \nabla g_1 + \cdots + \lambda_m \nabla g_m.$$
The Lagrange multiplier method is often stated and far less often proved.

Since the proof is rather involved we will follow this tradition here. See, for example, Chapter 14 of "A First Course in Real Analysis" (2012) by Protter & Morrey for a complete proof and further discussion.
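Before turning to the idea of the proof, here is a hedged sketch (assuming SymPy) of the general statement in the case $n = 3$, $m = 2$: the function $f(x,y,z) = x$ and the constraints $g_1 = x^2 + y^2 + z^2 - 1$, $g_2 = z$ are illustrative choices, not taken from the notes.

```python
# Sketch for n = 3, m = 2: grad f = l1 * grad g1 + l2 * grad g2, plus constraints.
import sympy as sp

x, y, z, l1, l2 = sp.symbols('x y z lambda1 lambda2', real=True)
f = x                                # quantity to optimise (illustrative)
g1 = x**2 + y**2 + z**2 - 1          # first constraint: the unit sphere
g2 = z                               # second constraint: the plane z = 0

def grad(h):
    return sp.Matrix([sp.diff(h, v) for v in (x, y, z)])

equations = list(grad(f) - l1 * grad(g1) - l2 * grad(g2)) + [g1, g2]

for s in sp.solve(equations, (x, y, z, l1, l2), dict=True):
    print(s, " f =", f.subs(s))
# constrained maximum f = 1 at (1, 0, 0); constrained minimum f = -1 at (-1, 0, 0)
```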

Idea of proof

Let us consider a particular case of the method when $n = 3$ and $m = 2$. More precisely we consider the following problem: find the maxima and minima of $f(x,y,z)$ along the curve $C$ defined as

$$g_1(x,y,z) = 0, \qquad g_2(x,y,z) = 0$$

where $g_1$, $g_2$ are differentiable functions. In this particular case we will prove the Lagrange multiplier method. Suppose that $a$ is some point on the curve. Let $\alpha(t)$ denote a path which lies in the curve $C$ in the sense that $\alpha(t) \in C$ for all $t \in (-1,1)$, with $\alpha'(t) \neq 0$ and $\alpha(0) = a$. If $a$ is a local minimum for $f$ restricted to $C$ it means that $f(\alpha(t)) \ge f(\alpha(0))$ for all $t \in (-\delta, \delta)$ for some $\delta > 0$. In words, moving away from $a$ along the curve $C$ doesn't cause $f$ to decrease. Let $h(t) = f(\alpha(t))$ and observe that $h : \mathbb{R} \to \mathbb{R}$, so we know how to find its extrema. In particular we know that $h'(0) = 0$. By the chain rule $h'(t) = \nabla f(\alpha(t)) \cdot \alpha'(t)$ and so

$$\nabla f(a) \cdot \alpha'(0) = 0.$$

Since we know that $g_1(\alpha(t)) = 0$ and $g_2(\alpha(t)) = 0$, again by the chain rule,

$$\nabla g_1(a) \cdot \alpha'(0) = 0, \qquad \nabla g_2(a) \cdot \alpha'(0) = 0.$$

To proceed it is convenient to isolate the following result of linear algebra.

Consider $w, u_1, u_2 \in \mathbb{R}^3$ and let $V = \{v : u_k \cdot v = 0, \ k = 1, 2\}$. If $w \cdot v = 0$ for all $v \in V$ then $w = \lambda_1 u_1 + \lambda_2 u_2$ for some $\lambda_1, \lambda_2 \in \mathbb{R}$.

In order to prove this we write $w = \lambda_1 u_1 + \lambda_2 u_2 + v_0$ where $v_0 \in V$, because $u_1, u_2$ together with $V$ must span $\mathbb{R}^3$. Since $v_0 \in V$ and, by assumption, $w \cdot v_0 = 0$,

$$0 = w \cdot v_0 = (\lambda_1 u_1 + \lambda_2 u_2 + v_0) \cdot v_0 = v_0 \cdot v_0 = \lVert v_0 \rVert^2.$$

This means that $v_0 = 0$ and so $w = \lambda_1 u_1 + \lambda_2 u_2$.

The above statement holds in any dimension with any number of vectors, with the analogous proof. Applying this lemma with $w = \nabla f(a)$, $u_1 = \nabla g_1(a)$ and $u_2 = \nabla g_2(a)$ recovers exactly the Lagrange multiplier method in this setting.