# Extrema

In this part of the course we work on the following skills:

• Locating and classifying the extrema of scalar fields.
• Applying Lagrange's multipliers method to optimize quantities with respect to constraints.

In the previous chapter we introduced various notions of differentials for higher dimensional functions (scalar fields, vector fields, paths, etc.). This part of the course is devoted to searching for extrema (minima / maxima) in various different scenarios. This extends what we already know for functions in $\mathbb{R}$ and we will find that in higher dimensions more possibilities and subtleties exist.

## Extrema (minima / maxima / saddle)

Let $S\subset {\mathbb{R}}^{n}$ be open, $f:S\to \mathbb{R}$ be a scalar field and $\mathbf{a}\in S$.

Definition

If $f\left(\mathbf{a}\right)\le f\left(\mathbf{x}\right)$ (resp. $f\left(\mathbf{a}\right)\ge f\left(\mathbf{x}\right)$) for all $\mathbf{x}\in S$, then $f\left(\mathbf{a}\right)$ is said to be the absolute minimum (resp. maximum) of $f$.

Definition

If $f\left(\mathbf{a}\right)\le f\left(\mathbf{x}\right)$ (resp. $f\left(\mathbf{a}\right)\ge f\left(\mathbf{x}\right)$) for all $\mathbf{x}\in B\left(\mathbf{a},r\right)$ for some $r>0$, then $f\left(\mathbf{a}\right)$ is said to be a relative minimum (resp. maximum) of $f$.

Collectively we call these points the extrema of the scalar field. In the case of a scalar field defined on ${\mathbb{R}}^{2}$ we can visualize the scalar field as a 3D plot, as in the figure, and the extrema appear as the "flat" places. We sometimes use global as a synonym of absolute and local as a synonym of relative.

To proceed it is convenient to connect the extrema with the behaviour of the gradient of the scalar field.

Theorem

If $f:S\to \mathbb{R}$ is differentiable and has a relative minimum or maximum at $\mathbf{a}$, then $\mathrm{\nabla }f\left(\mathbf{a}\right)=\mathbf{0}$.

Proof

Suppose $f$ has a relative minimum at $\mathbf{a}$ (otherwise consider $-f$). For any unit vector $\mathbf{v}$ let $g\left(u\right)=f\left(\mathbf{a}+u\mathbf{v}\right)$. We know that $g:\mathbb{R}\to \mathbb{R}$ has a relative minimum at $u=0$ so ${g}^{\prime }\left(0\right)=0$. This means that the directional derivative ${D}_{\mathbf{v}}f\left(\mathbf{a}\right)=0$ for every $\mathbf{v}$. Consequently $\mathrm{\nabla }f\left(\mathbf{a}\right)=\mathbf{0}$.

Observe that, here and in the subsequent text, we can always consider the case of $f:\mathbb{R}\to \mathbb{R}$, i.e., the case of ${\mathbb{R}}^{n}$ where $n=1$. Everything still holds and reduces to the arguments and formulae previously developed for functions of one variable.

Definition (stationary point)

If $\mathrm{\nabla }f\left(\mathbf{a}\right)=\mathbf{0}$ then $\mathbf{a}$ is called a stationary point.
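
In practice, stationary points are found by solving the system $\mathrm{\nabla }f=\mathbf{0}$. As an illustration (a sketch with a made-up function, not one from the text), this can be done symbolically:

```python
import sympy as sp

# Hypothetical example: find the stationary points of
# f(x, y) = x**3 - 3*x + y**2 by solving grad f = 0 symbolically.
x, y = sp.symbols('x y', real=True)
f = x**3 - 3*x + y**2

grad = [sp.diff(f, v) for v in (x, y)]      # gradient components
points = sp.solve(grad, [x, y], dict=True)  # solve df/dx = 0, df/dy = 0

# two stationary points: (1, 0) and (-1, 0)
print(points)
```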

As we see in the inflection example, the converse of the above theorem fails in the sense that a stationary point might not be a minimum or a maximum. This motivates the following.

Definition (saddle point)

If $\mathrm{\nabla }f\left(\mathbf{a}\right)=\mathbf{0}$ and $\mathbf{a}$ is neither a relative minimum nor a relative maximum then $\mathbf{a}$ is said to be a saddle point.

The quintessential saddle has the shape seen in the graph. However a saddle point might instead resemble a one-dimensional inflection, or take a more complicated form, reflecting the many possibilities available in higher dimensions.

## Hessian matrix

To proceed it is useful to develop the idea of a second order Taylor expansion in this higher dimensional setting. In particular this will allow us to identify the local behaviour close to stationary points. The main object for doing this is the Hessian matrix. Let $f:{\mathbb{R}}^{2}\to \mathbb{R}$ be twice differentiable and use the notation $f\left(x,y\right)$. The Hessian matrix at $\mathbf{a}\in {\mathbb{R}}^{2}$ is defined as

$\mathbf{H}f\left(\mathbf{a}\right)=\left(\begin{array}{cc}\frac{{\partial }^{2}f}{\partial {x}^{2}}\left(\mathbf{a}\right)& \frac{{\partial }^{2}f}{\partial x\phantom{\rule{0.167em}{0ex}}\partial y}\left(\mathbf{a}\right)\\ \frac{{\partial }^{2}f}{\partial y\phantom{\rule{0.167em}{0ex}}\partial x}\left(\mathbf{a}\right)& \frac{{\partial }^{2}f}{\partial {y}^{2}}\left(\mathbf{a}\right)\end{array}\right).$

Observe that the Hessian matrix $\mathbf{H}f\left(\mathbf{a}\right)$ is a symmetric matrix since we know that

$\frac{{\partial }^{2}f}{\partial x\phantom{\rule{0.167em}{0ex}}\partial y}\left(\mathbf{a}\right)=\frac{{\partial }^{2}f}{\partial y\phantom{\rule{0.167em}{0ex}}\partial x}\left(\mathbf{a}\right)$

for twice differentiable functions.
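
This symmetry can be checked symbolically; the following is an illustration with a made-up function (not one from the text):

```python
import sympy as sp

# Symbolic check that the two mixed second-order partials agree,
# so the Hessian matrix is symmetric.
x, y = sp.symbols('x y', real=True)
f = sp.exp(x * y) + x**2 * sp.sin(y)

fxy = sp.diff(f, x, y)  # differentiate in x, then in y
fyx = sp.diff(f, y, x)  # differentiate in y, then in x
same = sp.simplify(fxy - fyx) == 0
```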

The Hessian matrix is defined analogously in any dimension.

Let $f:{\mathbb{R}}^{n}\to \mathbb{R}$ be twice differentiable. The Hessian matrix at $\mathbf{a}\in {\mathbb{R}}^{n}$ is defined as

$\mathbf{H}f\left(\mathbf{a}\right)=\left(\begin{array}{cccc}\frac{{\partial }^{2}f}{\partial {x}_{1}^{2}}\left(\mathbf{a}\right)& \frac{{\partial }^{2}f}{\partial {x}_{1}\phantom{\rule{0.167em}{0ex}}\partial {x}_{2}}\left(\mathbf{a}\right)& \cdots & \frac{{\partial }^{2}f}{\partial {x}_{1}\phantom{\rule{0.167em}{0ex}}\partial {x}_{n}}\left(\mathbf{a}\right)\\ \frac{{\partial }^{2}f}{\partial {x}_{2}\phantom{\rule{0.167em}{0ex}}\partial {x}_{1}}\left(\mathbf{a}\right)& \frac{{\partial }^{2}f}{\partial {x}_{2}^{2}}\left(\mathbf{a}\right)& \cdots & \frac{{\partial }^{2}f}{\partial {x}_{2}\phantom{\rule{0.167em}{0ex}}\partial {x}_{n}}\left(\mathbf{a}\right)\\ ⋮& ⋮& \ddots & ⋮\\ \frac{{\partial }^{2}f}{\partial {x}_{n}\phantom{\rule{0.167em}{0ex}}\partial {x}_{1}}\left(\mathbf{a}\right)& \frac{{\partial }^{2}f}{\partial {x}_{n}\phantom{\rule{0.167em}{0ex}}\partial {x}_{2}}\left(\mathbf{a}\right)& \cdots & \frac{{\partial }^{2}f}{\partial {x}_{n}^{2}}\left(\mathbf{a}\right)\end{array}\right).$

Observe that the Hessian matrix is a real symmetric matrix in any dimension. If $f:\mathbb{R}\to \mathbb{R}$ then $\mathbf{H}f\left(a\right)$ is a $1×1$ matrix and coincides with the second derivative of $f$. In this sense what we know about extrema in $\mathbb{R}$ is just a special case of everything we do here.

As an example, let $f\left(x,y\right)={x}^{2}-{y}^{2}$ (figure). The gradient and the Hessian are respectively

$\begin{array}{rl}\mathrm{\nabla }f\left(x,y\right)& =\left(\begin{array}{c}\frac{\partial f}{\partial x}\left(x,y\right)\\ \frac{\partial f}{\partial y}\left(x,y\right)\end{array}\right)=\left(\begin{array}{c}2x\\ -2y\end{array}\right),\\ \mathbf{H}f\left(x,y\right)& =\left(\begin{array}{cc}\frac{{\partial }^{2}f}{\partial {x}^{2}}\left(x,y\right)& \frac{{\partial }^{2}f}{\partial x\phantom{\rule{0.167em}{0ex}}\partial y}\left(x,y\right)\\ \frac{{\partial }^{2}f}{\partial y\phantom{\rule{0.167em}{0ex}}\partial x}\left(x,y\right)& \frac{{\partial }^{2}f}{\partial {y}^{2}}\left(x,y\right)\end{array}\right)=\left(\begin{array}{cc}2& 0\\ 0& -2\end{array}\right).\end{array}$

The point $\left(0,0\right)$ is a stationary point since $\mathrm{\nabla }f\left(0,0\right)=\left(\begin{array}{c}0\\ 0\end{array}\right)$. In this example $\mathbf{H}f$ does not depend on $\left(x,y\right)$ but in general we can expect dependence and so it gives a different matrix at different points $\left(x,y\right)$.
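
The example can also be checked numerically. Below is a sketch using central finite differences with hand-chosen step sizes `h`: for $f(x,y)=x^2-y^2$ the gradient at $(0,0)$ should vanish and the Hessian should be $\operatorname{diag}(2,-2)$.

```python
import numpy as np

def f(p):
    x, y = p
    return x**2 - y**2

def gradient(f, p, h=1e-5):
    p = np.asarray(p, dtype=float)
    g = np.zeros_like(p)
    for i in range(p.size):
        e = np.zeros_like(p)
        e[i] = h
        g[i] = (f(p + e) - f(p - e)) / (2 * h)  # central difference
    return g

def hessian(f, p, h=1e-4):
    p = np.asarray(p, dtype=float)
    n = p.size
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei = np.zeros(n)
            ej = np.zeros(n)
            ei[i] = h
            ej[j] = h
            # central difference for the mixed second-order partial
            H[i, j] = (f(p + ei + ej) - f(p + ei - ej)
                       - f(p - ei + ej) + f(p - ei - ej)) / (4 * h**2)
    return H

g = gradient(f, [0.0, 0.0])   # approximately (0, 0)
H = hessian(f, [0.0, 0.0])    # approximately [[2, 0], [0, -2]]
```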

Theorem

If $\mathbf{v}=\left({v}_{1},\dots ,{v}_{n}\right)$ then,

${\mathbf{v}}^{\mathbf{T}}\,\mathbf{H}f\left(\mathbf{a}\right)\,\mathbf{v}=\sum _{j,k=1}^{n}\frac{{\partial }^{2}f}{\partial {x}_{j}\,\partial {x}_{k}}\left(\mathbf{a}\right)\,{v}_{j}{v}_{k}.$

Proof

Multiplying the matrices we calculate that

${\mathbf{v}}^{\mathbf{T}}\,\mathbf{H}f\left(\mathbf{a}\right)\,\mathbf{v}=\sum _{j=1}^{n}{v}_{j}{\left(\mathbf{H}f\left(\mathbf{a}\right)\mathbf{v}\right)}_{j}=\sum _{j=1}^{n}\sum _{k=1}^{n}{v}_{j}\,\frac{{\partial }^{2}f}{\partial {x}_{j}\,\partial {x}_{k}}\left(\mathbf{a}\right)\,{v}_{k},$

as required.

## Second order Taylor formula for scalar fields

First let's recall the first order Taylor approximation we saw before. If $f$ is differentiable at $\mathbf{a}$ then

$f\left(\mathbf{x}\right)\approx f\left(\mathbf{a}\right)+\mathrm{\nabla }f\left(\mathbf{a}\right)\cdot \left(\mathbf{x}-\mathbf{a}\right).$

If $\mathbf{a}$ is a stationary point then this only tells us that $f\left(\mathbf{x}\right)\approx f\left(\mathbf{a}\right)$ so a natural next question is to search for slightly more detailed information.

Theorem (second order Taylor for scalar fields)

Let $f$ be a scalar field twice differentiable on $B\left(\mathbf{a},r\right)$. Then, for $\mathbf{x}$ close to $\mathbf{a}$,

$f\left(\mathbf{x}\right)\approx f\left(\mathbf{a}\right)+\mathrm{\nabla }f\left(\mathbf{a}\right)\cdot \left(\mathbf{x}-\mathbf{a}\right)+\frac{1}{2}{\left(\mathbf{x}-\mathbf{a}\right)}^{\mathbf{T}}\,\mathbf{H}f\left(\mathbf{a}\right)\,\left(\mathbf{x}-\mathbf{a}\right)$

in the sense that the error is $o\left({‖\mathbf{x}-\mathbf{a}‖}^{2}\right)$.

Proof

Let $\mathbf{v}=\mathbf{x}-\mathbf{a}$ and let $g\left(u\right)=f\left(\mathbf{a}+u\mathbf{v}\right)$. The Taylor expansion of $g$ tells us that $g\left(1\right)=g\left(0\right)+{g}^{\prime }\left(0\right)+\frac{1}{2}{g}^{″}\left(c\right)$ for some $c\in \left(0,1\right)$. Since $g\left(u\right)=f\left({a}_{1}+u{v}_{1},\dots ,{a}_{n}+u{v}_{n}\right)$, by the chain rule,

${g}^{\prime }\left(u\right)=\mathrm{\nabla }f\left(\mathbf{a}+u\mathbf{v}\right)\cdot \mathbf{v},\qquad {g}^{″}\left(u\right)={\mathbf{v}}^{\mathbf{T}}\,\mathbf{H}f\left(\mathbf{a}+u\mathbf{v}\right)\,\mathbf{v}.$

Consequently

$f\left(\mathbf{x}\right)=f\left(\mathbf{a}\right)+\mathrm{\nabla }f\left(\mathbf{a}\right)\cdot \mathbf{v}+\frac{1}{2}{\mathbf{v}}^{\mathbf{T}}\,\mathbf{H}f\left(\mathbf{a}+c\mathbf{v}\right)\,\mathbf{v}.$

We define the "error" in the approximation as $ϵ\left(\mathbf{v}\right)=\frac{1}{2}{\mathbf{v}}^{\mathbf{T}}\left(\mathbf{H}f\left(\mathbf{a}+c\mathbf{v}\right)-\mathbf{H}f\left(\mathbf{a}\right)\right)\mathbf{v}$ and estimate that

$|ϵ\left(\mathbf{v}\right)|\le \frac{1}{2}\sum _{j,k=1}^{n}|{v}_{j}{v}_{k}|\,|{\partial }_{j}{\partial }_{k}f\left(\mathbf{a}+c\mathbf{v}\right)-{\partial }_{j}{\partial }_{k}f\left(\mathbf{a}\right)|.$

Since $|{v}_{j}{v}_{k}|\le {‖\mathbf{v}‖}^{2}$ and the second-order partial derivatives are continuous, we conclude that $\frac{|ϵ\left(\mathbf{v}\right)|}{{‖\mathbf{v}‖}^{2}}\to 0$ as $‖\mathbf{v}‖\to 0$ as required.
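
The error behaviour can be observed numerically. The sketch below uses the made-up function $f(x,y)=\sin(x)e^{y}$ with $\mathbf{a}=(0,0)$; the gradient $(1,0)$ and Hessian $\left(\begin{smallmatrix}0&1\\1&0\end{smallmatrix}\right)$ at $\mathbf{a}$ are computed by hand, and the ratio error$/{‖\mathbf{v}‖}^{2}$ is seen to shrink as $\mathbf{v}\to \mathbf{0}$.

```python
import numpy as np

def f(p):
    x, y = p
    return np.sin(x) * np.exp(y)

a = np.array([0.0, 0.0])
grad_a = np.array([1.0, 0.0])          # grad f(0, 0), by hand
H_a = np.array([[0.0, 1.0], [1.0, 0.0]])  # Hf(0, 0), by hand

v = np.array([0.3, -0.2])
ratios = []
for t in [1.0, 0.1, 0.01]:
    w = t * v
    # second order Taylor approximation around a
    taylor2 = f(a) + grad_a @ w + 0.5 * w @ H_a @ w
    ratios.append(abs(f(a + w) - taylor2) / np.linalg.norm(w) ** 2)

print(ratios)  # the ratios decrease towards 0
```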

## Classifying stationary points

In order to classify the stationary points we will take advantage of the Hessian matrix and therefore we need first to understand the following fact about real symmetric matrices.

Theorem

Let $A$ be a real symmetric matrix and let $Q\left(\mathbf{v}\right)={\mathbf{v}}^{\mathbf{T}}A\mathbf{v}$. Then,

• $Q\left(\mathbf{v}\right)>0$ for all $\mathbf{v}\ne \mathbf{0}$ $\phantom{\rule{1em}{0ex}}⟺\phantom{\rule{1em}{0ex}}$ all eigenvalues of $A$ are positive,
• $Q\left(\mathbf{v}\right)<0$ for all $\mathbf{v}\ne \mathbf{0}$ $\phantom{\rule{1em}{0ex}}⟺\phantom{\rule{1em}{0ex}}$ all eigenvalues of $A$ are negative.
Proof

Since $A$ is symmetric it can be diagonalised by an orthogonal matrix $B$ (${B}^{\mathbf{T}}={B}^{-1}$), in the sense that $D={B}^{\mathbf{T}}AB$ is a diagonal matrix with the eigenvalues ${\lambda }_{1},\dots ,{\lambda }_{n}$ of $A$ on the diagonal. Equivalently $A=BD{B}^{\mathbf{T}}$, so $Q\left(\mathbf{v}\right)={\mathbf{v}}^{\mathbf{T}}BD{B}^{\mathbf{T}}\mathbf{v}={\mathbf{w}}^{\mathbf{T}}D\mathbf{w}$ where $\mathbf{w}={B}^{\mathbf{T}}\mathbf{v}$. Consequently $Q\left(\mathbf{v}\right)=\sum _{j}{\lambda }_{j}{w}_{j}^{2}$, and $\mathbf{w}\ne \mathbf{0}$ whenever $\mathbf{v}\ne \mathbf{0}$ since $B$ is invertible. Observe that, if all ${\lambda }_{j}>0$ then $\sum _{j}{\lambda }_{j}{w}_{j}^{2}>0$.

In order to prove the other direction of the "if and only if" statement, observe that $Q\left(B{\mathbf{e}}_{k}\right)={\lambda }_{k}$, where ${\mathbf{e}}_{k}$ is the $k$th standard basis vector (so $B{\mathbf{e}}_{k}$ is an eigenvector of $A$). This means that, if $Q\left(\mathbf{v}\right)>0$ for all $\mathbf{v}\ne \mathbf{0}$ then ${\lambda }_{k}>0$ for all $k$.
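
The diagonalisation step can be illustrated numerically with a random symmetric matrix (a sketch, not part of the text): writing $\mathbf{w}={B}^{\mathbf{T}}\mathbf{v}$, where the columns of $B$ are orthonormal eigenvectors of $A$, the quadratic form becomes $\sum _{j}{\lambda }_{j}{w}_{j}^{2}$.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
A = (A + A.T) / 2                 # symmetrise to get a real symmetric matrix

lam, B = np.linalg.eigh(A)        # A = B @ diag(lam) @ B.T, with B orthogonal
v = rng.standard_normal(4)
w = B.T @ v

quadratic_form = v @ A @ v        # Q(v)
diagonalised = np.sum(lam * w**2) # sum_j lambda_j * w_j**2
```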

Theorem (classifying stationary points)

Let $f$ be a scalar field twice differentiable on $B\left(\mathbf{a},r\right)$. Suppose $\mathrm{\nabla }f\left(\mathbf{a}\right)=\mathbf{0}$ and consider the eigenvalues of $\mathbf{H}f\left(\mathbf{a}\right)$. Then,

• All eigenvalues are positive $\phantom{\rule{1em}{0ex}}⟹\phantom{\rule{1em}{0ex}}$ relative minimum at $\mathbf{a}$,
• All eigenvalues are negative $\phantom{\rule{1em}{0ex}}⟹\phantom{\rule{1em}{0ex}}$ relative maximum at $\mathbf{a}$,
• Some positive, some negative $\phantom{\rule{1em}{0ex}}⟹\phantom{\rule{1em}{0ex}}$ $\mathbf{a}$ is a saddle point.

Proof

Suppose first that all the eigenvalues ${\lambda }_{j}$ of $\mathbf{H}f\left(\mathbf{a}\right)$ are positive. Let $Q\left(\mathbf{v}\right)={\mathbf{v}}^{\mathbf{T}}\mathbf{H}f\left(\mathbf{a}\right)\mathbf{v}$, let $B$ be an orthogonal matrix which diagonalises $\mathbf{H}f\left(\mathbf{a}\right)$, let $\mathbf{w}={B}^{\mathbf{T}}\mathbf{v}$ and let $\mathrm{\Lambda }:=\underset{j}{min}{\lambda }_{j}>0$. Observe that $‖\mathbf{w}‖=‖\mathbf{v}‖$ and that $Q\left(\mathbf{v}\right)=\sum _{j}{\lambda }_{j}{w}_{j}^{2}\ge \mathrm{\Lambda }\sum _{j}{w}_{j}^{2}=\mathrm{\Lambda }{‖\mathbf{v}‖}^{2}$. By the second order Taylor formula,

$f\left(\mathbf{a}+\mathbf{v}\right)-f\left(\mathbf{a}\right)=\frac{1}{2}Q\left(\mathbf{v}\right)+ϵ\left(\mathbf{v}\right)\ge \frac{\mathrm{\Lambda }}{2}{‖\mathbf{v}‖}^{2}+ϵ\left(\mathbf{v}\right).$

Since $\frac{|ϵ\left(\mathbf{v}\right)|}{{‖\mathbf{v}‖}^{2}}\to 0$ as $‖\mathbf{v}‖\to 0$, we have $|ϵ\left(\mathbf{v}\right)|<\frac{\mathrm{\Lambda }}{2}{‖\mathbf{v}‖}^{2}$ whenever $‖\mathbf{v}‖$ is sufficiently small, and so $f\left(\mathbf{a}+\mathbf{v}\right)>f\left(\mathbf{a}\right)$ for such $\mathbf{v}\ne \mathbf{0}$; that is, $f$ has a relative minimum at $\mathbf{a}$. The argument is analogous for the second part. For the final part consider the eigenvectors corresponding to a positive and to a negative eigenvalue: along these directions the argument of the first, respectively second, part shows that $f$ increases, respectively decreases, so $\mathbf{a}$ is a saddle point.
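
The theorem translates directly into a small helper (a sketch, not from the text) which classifies a stationary point by the signs of the Hessian eigenvalues; when some eigenvalue is zero the test is silent.

```python
import numpy as np

def classify(H, tol=1e-10):
    """Classify a stationary point from its Hessian matrix H."""
    lam = np.linalg.eigvalsh(np.asarray(H, dtype=float))
    if np.all(lam > tol):
        return "relative minimum"
    if np.all(lam < -tol):
        return "relative maximum"
    if np.any(lam > tol) and np.any(lam < -tol):
        return "saddle point"
    return "inconclusive"  # a zero eigenvalue: the theorem gives no answer

# The Hessian from the x**2 - y**2 example: eigenvalues 2 and -2.
print(classify([[2, 0], [0, -2]]))  # saddle point
```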

## Attaining extreme values

Here we explore the extreme value theorem for continuous scalar fields. The argument will be in two parts: firstly we show that continuity implies boundedness; secondly we show that boundedness implies that the maximum and minimum are attained. We use the following notation for an interval / rectangle / cuboid / tesseract, etc. If $\mathbf{a}=\left({a}_{1},\dots ,{a}_{n}\right)$ and $\mathbf{b}=\left({b}_{1},\dots ,{b}_{n}\right)$ then we consider the $n$-dimensional closed Cartesian product

$\left[\mathbf{a},\mathbf{b}\right]=\left[{a}_{1},{b}_{1}\right]×\cdots ×\left[{a}_{n},{b}_{n}\right].$

We call this set a rectangle (independent of the dimension). As a first step it is convenient to know that all sequences in our setting have convergent subsequences.

Theorem

If ${\left\{{\mathbf{x}}_{n}\right\}}_{n}$ is a sequence in $\left[\mathbf{a},\mathbf{b}\right]$ there exists a convergent subsequence ${\left\{{\mathbf{x}}_{{n}_{j}}\right\}}_{j}$.

Proof

In order to prove the theorem we construct the subsequence. Firstly we divide $\left[\mathbf{a},\mathbf{b}\right]$ into ${2}^{n}$ sub-rectangles by halving each side. At least one of these sub-rectangles contains infinitely many terms of the sequence; we choose such a sub-rectangle and choose the first of these terms as an element of the subsequence. We then repeat this process, again halving the chosen sub-rectangle and choosing a later term of the sequence, and so on. Since the diameters of the nested sub-rectangles tend to zero, the resulting subsequence is Cauchy and hence converges (to the unique point contained in every sub-rectangle).
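
The halving construction can be sketched in code for a finite, one-dimensional example. This is only an approximation of the proof: with a finite list, "contains infinitely many terms" is replaced by "contains more of the remaining terms", and the sequence $\sin\left(0\right),\sin\left(1\right),\dots$ is a made-up example taking values in $\left[-1,1\right]$.

```python
import math

def halving_subsequence(xs, a, b, steps):
    """Pick indices of xs by repeatedly halving [a, b] and keeping the
    half containing more of the remaining terms."""
    lo, hi = a, b
    last = -1                        # picked indices must strictly increase
    picks = []
    for _ in range(steps):
        mid = (lo + hi) / 2
        left = [n for n in range(last + 1, len(xs)) if lo <= xs[n] <= mid]
        right = [n for n in range(last + 1, len(xs)) if mid < xs[n] <= hi]
        if len(left) >= len(right):  # keep the "fuller" half
            side, hi = left, mid
        else:
            side, lo = right, mid
        last = side[0]               # first remaining term in the chosen half
        picks.append(last)
    return picks

xs = [math.sin(n) for n in range(50000)]
picks = halving_subsequence(xs, -1.0, 1.0, steps=10)
vals = [xs[n] for n in picks]        # later values cluster together
```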

Theorem

Suppose that $f$ is a scalar field continuous at every point in the closed rectangle $\left[\mathbf{a},\mathbf{b}\right]$. Then $f$ is bounded on $\left[\mathbf{a},\mathbf{b}\right]$ in the sense that there exists $C>0$ such that $|f\left(\mathbf{x}\right)|\le C$ for all