
Differential calculus in higher dimension

In this part of the course we work on the following skills:

  • Become comfortable working with coordinates in arbitrary dimension.
  • Develop an intuition for working with vector fields.
  • Understand the subtleties of derivatives in dimension greater than 1; evaluate and manipulate partial derivatives, directional derivatives and the Jacobian matrix.

See also the exercises associated to this part of the course.

Here we start to consider higher dimensional space. That is, instead of $\mathbb{R}$ we consider $\mathbb{R}^n$ for $n\in\mathbb{N}$. We will particularly focus on 2D and 3D but everything also holds in any dimension. Going beyond $\mathbb{R}$ we have more options for functions and correspondingly more options for derivatives. Various different notations are commonly used. Here we will primarily use $(x,y)\in\mathbb{R}^2$, $(x,y,z)\in\mathbb{R}^3$ or, more generally, $x=(x_1,x_2,\ldots,x_n)\in\mathbb{R}^n$ where $x_1\in\mathbb{R},\ldots,x_n\in\mathbb{R}$. For example, $\mathbb{R}^2$ is the plane and $\mathbb{R}^3$ is 3D space.

Definition (inner product)

For $x,y\in\mathbb{R}^n$,

$$x\cdot y = \sum_{k=1}^{n} x_k y_k \in \mathbb{R}.$$

We recall that the inner product being zero has a geometric meaning: the two vectors are orthogonal. We also recall that the "length" of a vector is given by the norm, defined as follows.

Definition (norm)

$$\|x\| = \sqrt{x\cdot x} = \Big(\sum_{k=1}^{n} x_k^2\Big)^{1/2}.$$

For example, in $\mathbb{R}^2$, $\|(x,y)\| = \sqrt{x^2+y^2}$. There are various convenient properties for working with norms and inner products, in particular the Cauchy–Schwarz inequality $|x\cdot y| \le \|x\|\,\|y\|$ and the triangle inequality $\|x+y\| \le \|x\| + \|y\|$.
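Both inequalities are easy to sanity-check numerically. The following sketch (the helper names `dot` and `norm` are ours, not from the notes) verifies Cauchy–Schwarz and the triangle inequality on random vectors:

```python
import math
import random

def dot(x, y):
    """Inner product x . y = sum_k x_k * y_k."""
    return sum(a * b for a, b in zip(x, y))

def norm(x):
    """Norm ||x|| = sqrt(x . x)."""
    return math.sqrt(dot(x, x))

random.seed(0)
for _ in range(1000):
    n = random.randint(1, 5)
    x = [random.uniform(-10, 10) for _ in range(n)]
    y = [random.uniform(-10, 10) for _ in range(n)]
    # Cauchy-Schwarz: |x . y| <= ||x|| ||y||
    assert abs(dot(x, y)) <= norm(x) * norm(y) + 1e-9
    # Triangle inequality: ||x + y|| <= ||x|| + ||y||
    s = [a + b for a, b in zip(x, y)]
    assert norm(s) <= norm(x) + norm(y) + 1e-9
```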

The primary higher-dimensional functions we consider in this course are:

  • Scalar fields: $f:\mathbb{R}^n\to\mathbb{R}$
  • Vector fields: $F:\mathbb{R}^n\to\mathbb{R}^n$
  • Paths: $\alpha:\mathbb{R}\to\mathbb{R}^n$
  • Change of coordinates: $x:\mathbb{R}^n\to\mathbb{R}^n$

These possibilities all fit into the general pattern of $f:\mathbb{R}^n\to\mathbb{R}^m$ for $n,m\in\mathbb{N}$, but tradition and the use of the function give us different terminology and symbols. Such functions are useful for representing various practical things, for example: gravitational force; temperature in a region; wind velocity; fluid flow; electric field; etc.

Open sets, closed sets, boundary, continuity

Let $a\in\mathbb{R}^n$, $r>0$. The open $n$-ball of radius $r$ and centre $a$ is written as

$$B(a,r) := \{x\in\mathbb{R}^n : \|x-a\| < r\}.$$

Definition (interior point)

Let $S\subseteq\mathbb{R}^n$. A point $a\in S$ is said to be an interior point if there is $r>0$ such that $B(a,r)\subseteq S$. The set of all interior points of $S$ is denoted $\operatorname{int}S$.

Definition (open set)

A set $S\subseteq\mathbb{R}^n$ is said to be open if all of its points are interior points, i.e., if $\operatorname{int}S = S$.

Interior points are the centre of a ball contained within the set

For example, open intervals, open disks, open balls, unions of open intervals, etc., are all open sets.

Lemma

Let $r>0$, $a\in\mathbb{R}^n$. The set $B(a,r)\subseteq\mathbb{R}^n$ is open.

Proof

Let $b\in B(a,r)$. It suffices to show that $b$ is an interior point. (1) Let $r_1 = \|b-a\| < r$. (2) Let $r_2 = (r-r_1)/2$. (3) We claim that $B(b,r_2)\subseteq B(a,r)$: in order to see this take any $c\in B(b,r_2)$ and observe that

$$\|c-a\| \le \|c-b\| + \|b-a\| \le r_2 + r_1 = \frac{r+r_1}{2} < r.$$

Observe that the radius of the ball will be small for points close to the boundary.

Definition (Cartesian product)

If $A_1\subseteq\mathbb{R}$, $A_2\subseteq\mathbb{R}$ then the Cartesian product is defined as

$$A_1\times A_2 := \{(x,y) : x\in A_1,\ y\in A_2\} \subseteq \mathbb{R}^2.$$

Analogously the Cartesian product can be defined in higher dimensions: if $A_1\subseteq\mathbb{R}^m$, $A_2\subseteq\mathbb{R}^n$ then the Cartesian product $A_1\times A_2$ is defined as the set of all points $(x_1,\ldots,x_m,y_1,\ldots,y_n)\in\mathbb{R}^{m+n}$ such that $(x_1,\ldots,x_m)\in A_1$ and $(y_1,\ldots,y_n)\in A_2$.

Lemma

If $A_1, A_2$ are open subsets of $\mathbb{R}$ then $A_1\times A_2$ is an open subset of $\mathbb{R}^2$.

Proof

Let $a = (a_1,a_2)\in A_1\times A_2\subseteq\mathbb{R}^2$. Since $A_1$ is open there exists $r_1>0$ such that $B(a_1,r_1)\subseteq A_1$, and similarly for $A_2$. Let $r = \min\{r_1,r_2\}$. This all means that $B(a,r)\subseteq B(a_1,r_1)\times B(a_2,r_2)\subseteq A_1\times A_2$.

If A1,A2 are intervals then A1×A2 is a rectangle

Discussing the "interior" of the set naturally suggests the topic of the "boundary" of the set. In the following definitions we develop this idea.

Definition (exterior points)

Let $S\subseteq\mathbb{R}^n$. A point $a\in\mathbb{R}^n$ is said to be an exterior point of $S$ if there exists $r>0$ such that $B(a,r)\cap S = \emptyset$. The set of all exterior points of $S$ is denoted $\operatorname{ext}S$.

Observe that $\operatorname{ext}S$ is an open set. We use the notation $S^c = \mathbb{R}^n\setminus S$ and we say that $S^c$ is the complement of the set $S$.

Definition (boundary)

The set $\mathbb{R}^n\setminus(\operatorname{int}S\cup\operatorname{ext}S)$ is called the boundary of $S\subseteq\mathbb{R}^n$ and is denoted $\partial S$.

Definition (closed)

A set $S\subseteq\mathbb{R}^n$ is said to be closed if $\partial S\subseteq S$.

Lemma

$S$ is open $\iff$ $S^c$ is closed.

Proof

Observe that $\mathbb{R}^n = \operatorname{int}S \cup \partial S \cup \operatorname{ext}S$ (disjointly). If $x\in\partial S$ then, for every $r>0$, the ball $B(x,r)$ intersects both $S$ and $S^c$, and so $x\in\partial(S^c)$. Similarly with $S$ and $S^c$ swapped, and so $\partial S = \partial(S^c)$. If $S$ is open then $\operatorname{int}S = S$ and $S^c = \operatorname{ext}S\cup\partial S = \operatorname{ext}S\cup\partial(S^c)$, so $\partial(S^c)\subseteq S^c$ and $S^c$ is closed. If $S$ is not open then there exists $a\in S\setminus\operatorname{int}S$. Such an $a$ belongs to $\partial S = \partial(S^c)$ but not to $S^c$, hence $S^c$ is not closed.

Limits and continuity

Let $S\subseteq\mathbb{R}^n$ and $f:S\to\mathbb{R}^m$. If $a\in\mathbb{R}^n$, $b\in\mathbb{R}^m$ we write $\lim_{x\to a}f(x) = b$ to mean that $\|f(x)-b\|\to 0$ as $\|x-a\|\to 0$. Observe how, if $n=m=1$, this is the familiar notion of the limit for functions on $\mathbb{R}$.

Definition (Continuous)

A function $f$ is said to be continuous at $a$ if $f$ is defined at $a$ and $\lim_{x\to a}f(x) = f(a)$. We say $f$ is continuous on $S$ if $f$ is continuous at each point of $S$.

Even functions which look "nice" can fail to be continuous as we can see in the following example.

Example (continuity in higher dimensions)

Let $f$ be defined, for $(x,y)\neq(0,0)$, as

$$f(x,y) = \frac{xy}{x^2+y^2}$$

and $f(0,0) = 0$. What is the behaviour of $f$ when approaching $(0,0)$ along the following lines?

  • Along $\{x=0\}$: $f(0,t) = 0$.
  • Along $\{y=0\}$: $f(t,0) = 0$.
  • Along $\{x=y\}$: $f(t,t) = \tfrac12$.
  • Along $\{x=-y\}$: $f(t,-t) = -\tfrac12$.

Since the value depends on the direction of approach, $f$ has no limit at $(0,0)$ and so is not continuous there.
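The direction-dependence can be checked numerically; this short sketch evaluates $f$ along the four lines above as $t\to 0$:

```python
def f(x, y):
    """f(x, y) = xy / (x^2 + y^2) for (x, y) != (0, 0), with f(0, 0) = 0."""
    if x == 0 and y == 0:
        return 0.0
    return x * y / (x ** 2 + y ** 2)

# Approach (0, 0) along different lines: the value depends on the
# direction, so f has no limit at the origin.
for t in [0.1, 0.01, 0.001]:
    assert f(0, t) == 0.0                  # along {x = 0}
    assert f(t, 0) == 0.0                  # along {y = 0}
    assert abs(f(t, t) - 0.5) < 1e-12      # along {x = y}
    assert abs(f(t, -t) + 0.5) < 1e-12     # along {x = -y}
```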

Theorem

Suppose that $\lim_{x\to a}f(x) = b$ and $\lim_{x\to a}g(x) = c$. Then

  1. $\lim_{x\to a}(f(x)+g(x)) = b+c$,
  2. $\lim_{x\to a}\lambda f(x) = \lambda b$ for every $\lambda\in\mathbb{R}$,
  3. $\lim_{x\to a}f(x)\cdot g(x) = b\cdot c$,
  4. $\lim_{x\to a}\|f(x)\| = \|b\|$.

We prove a couple of the parts of the above theorem here; the other parts are left as exercises.

Proof of part 3.

Observe that $f(x)\cdot g(x) - b\cdot c = (f(x)-b)\cdot(g(x)-c) + b\cdot(g(x)-c) + c\cdot(f(x)-b)$. By the triangle inequality and Cauchy–Schwarz,

$$|f(x)\cdot g(x) - b\cdot c| \le \|f(x)-b\|\,\|g(x)-c\| + \|b\|\,\|g(x)-c\| + \|c\|\,\|f(x)-b\|.$$

Since we already know that $\|f(x)-b\|\to 0$ and $\|g(x)-c\|\to 0$ as $x\to a$, this implies that $f(x)\cdot g(x) - b\cdot c\to 0$.

Proof of part 4.

Taking $f=g$ in part 3 gives $\lim_{x\to a}\|f(x)\|^2 = \|b\|^2$; taking square roots gives the result.

When writing a vector field (or similar functions) it is often convenient to divide the higher-dimensional function into smaller parts. We call these parts the components of a vector field. For example $F(x) = (F_1(x), F_2(x))$ in 2D, $F(x) = (F_1(x), F_2(x), F_3(x))$ in 3D, etc.

Theorem

Let $F(x) = (F_1(x), F_2(x))$. Then $F$ is continuous if and only if $F_1$ and $F_2$ are continuous.

Proof

We prove the two implications separately.

  • ($\Rightarrow$) Let $e_1 = (1,0)$, $e_2 = (0,1)$ and observe that $F_k(x) = F(x)\cdot e_k$. We have already shown that the continuity of two vector fields implies the continuity of their inner product.
  • ($\Leftarrow$) By definition of the norm, $\|F(x)-F(a)\|^2 = \sum_{k=1}^{2}(F_k(x)-F_k(a))^2$, and we know $F_k(x)-F_k(a)\to 0$ as $\|x-a\|\to 0$.

In higher dimensions the analogous statement is true for the vector field $F(x) = (F_1(x),\ldots,F_m(x))$ with exactly the same proof, i.e., $F$ is continuous if and only if each $F_k$ is continuous.

Example (polynomials)

A polynomial in $n$ variables is a scalar field on $\mathbb{R}^n$ of the form

$$f(x_1,\ldots,x_n) = \sum_{k_1=0}^{j}\cdots\sum_{k_n=0}^{j} c_{k_1,\ldots,k_n}\, x_1^{k_1}\cdots x_n^{k_n}.$$

E.g., $f(x,y) := x + 2xy - x^2$ is a polynomial in $2$ variables. Polynomials are continuous everywhere in $\mathbb{R}^n$ because they are finite sums of products of continuous scalar fields.

Example (rational functions)

A rational function is a scalar field

$$f(x) = \frac{p(x)}{q(x)}$$

where $p(x)$ and $q(x)$ are polynomials. A rational function is continuous at every point $x$ such that $q(x)\neq 0$.

As described in the following result, continuity is preserved, in an intuitive way, under composition of functions.

Theorem

Suppose $S\subseteq\mathbb{R}^l$, $T\subseteq\mathbb{R}^m$, $f:S\to\mathbb{R}^m$, $g:T\to\mathbb{R}^n$ and that $f(S)\subseteq T$ so that

$$(g\circ f)(x) = g(f(x))$$

makes sense. If $f$ is continuous at $a\in S$ and $g$ is continuous at $f(a)$ then $g\circ f$ is continuous at $a$.

Proof

Since $f$ is continuous at $a$,

$$\lim_{x\to a}\|g(f(x)) - g(f(a))\| = \lim_{y\to f(a)}\|g(y) - g(f(a))\| = 0.$$

Example

We can consider the scalar field $f(x,y) = \sin(x^2+y) + xy$ as a composition of continuous functions, and hence it is continuous.

Derivatives of scalar fields

Plot where colour represents the value of $f(x,y) = x^2+y^2$. The change in $f$ depends on the direction.

We can imagine, for example in the figure, that in higher dimensions, the derivative of a scalar field depends on the direction. This motivates the following.

Definition (directional derivative)

Let $S\subseteq\mathbb{R}^n$ and $f:S\to\mathbb{R}$. For any $a\in\operatorname{int}S$ and $v\in\mathbb{R}^n$ with $\|v\|=1$, the directional derivative of $f$ with respect to $v$ is defined as

$$D_vf(a) = \lim_{h\to 0}\frac{1}{h}\big(f(a+hv) - f(a)\big).$$

When $h$ is small we can guarantee that $a+hv\in S$ because $a\in\operatorname{int}S$, so this definition makes sense.
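The limit can be approximated by a finite difference. A small sketch (the helper `directional_derivative` is ours, using a central difference) for $f(x,y) = x^2+y^2$:

```python
def f(x, y):
    return x ** 2 + y ** 2

def directional_derivative(f, a, v, h=1e-6):
    """Central-difference approximation of D_v f(a) for a unit vector v."""
    ax, ay = a
    vx, vy = v
    return (f(ax + h * vx, ay + h * vy) - f(ax - h * vx, ay - h * vy)) / (2 * h)

a = (1.0, 2.0)
v = (3 / 5, 4 / 5)                          # unit vector: ||v|| = 1
approx = directional_derivative(f, a, v)
exact = 2 * a[0] * v[0] + 2 * a[1] * v[1]   # grad f = (2x, 2y), dotted with v
assert abs(approx - exact) < 1e-6
```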

Theorem

Suppose $S\subseteq\mathbb{R}^n$, $f:S\to\mathbb{R}$, $a\in\operatorname{int}S$ and $\|v\|=1$. Let $g(t) := f(a+tv)$. If one of the derivatives $g'(t)$ or $D_vf(a+tv)$ exists then the other also exists and

$$g'(t) = D_vf(a+tv).$$

In particular $g'(0) = D_vf(a)$.

Proof

By definition $\frac{1}{h}\big(g(t+h)-g(t)\big) = \frac{1}{h}\big(f(a+tv+hv) - f(a+tv)\big)$, and the result follows by letting $h\to 0$.

The following result is useful for proving later results.

Theorem (mean value)

Assume that $D_vf(a+tv)$ exists for each $t\in[0,1]$. Then for some $\theta\in(0,1)$,

$$f(a+v) - f(a) = D_vf(z), \quad\text{where } z = a+\theta v.$$

Proof

Apply the mean value theorem to $g(t) = f(a+tv)$ on $[0,1]$.

The following notation is convenient. For any $k\in\{1,2,\ldots,n\}$, let $e_k$ be the $n$-dimensional unit vector whose entries are all zero except the $k$th, which is equal to $1$. I.e., $e_1 = (1,0,\ldots,0)$, $e_2 = (0,1,0,\ldots,0)$, $e_n = (0,\ldots,0,1)$.

Definition (partial derivatives)

We define the partial derivative in $x_k$ of $f(x_1,\ldots,x_n)$ at $a$ as

$$\frac{\partial f}{\partial x_k}(a) = D_{e_k}f(a).$$

Remark

Various symbols are used for partial derivatives: $\frac{\partial f}{\partial x_k}(a) = D_kf(a) = \partial_kf(a)$. If a function is written $f(x,y)$ we write $\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}$ for the partial derivatives. Similarly in higher dimension.

In practice, to compute the partial derivative $\frac{\partial f}{\partial x_k}$, one treats all the other variables $x_j$, $j\neq k$, as constants and differentiates with respect to $x_k$. We will see this rigorously in a moment.

If $f:\mathbb{R}\to\mathbb{R}$ is differentiable, then we know that, when $x$ is close to $a$,

$$f(x) \approx f(a) + (x-a)f'(a).$$

More precisely, we know that $f(x) = f(a) + (x-a)f'(a) + \epsilon(x-a)$ where $|\epsilon(x-a)| = o(|x-a|)$. (This is little-$o$ notation and here means that $|f(x)-f(a)-(x-a)f'(a)|/|x-a|\to 0$ as $|x-a|\to 0$.) This way of seeing differentiability is convenient for the higher-dimensional definition of differentiability.

Definition (differentiable)

Let $S\subseteq\mathbb{R}^n$ be open, $f:S\to\mathbb{R}$. We say that $f$ is differentiable at $a\in S$ if there exists a linear transformation $df_a:\mathbb{R}^n\to\mathbb{R}$ such that, for $x\in B(a,r)\subseteq S$,

$$f(x) = f(a) + df_a(x-a) + \epsilon(x-a)$$

where $|\epsilon(x-a)| = o(\|x-a\|)$.

For future convenience we introduce the following notation.

Definition (gradient)

The gradient of the scalar field $f(x,y,z)$ at the point $a$ is

$$\nabla f(a) = \Big(\frac{\partial f}{\partial x}(a),\ \frac{\partial f}{\partial y}(a),\ \frac{\partial f}{\partial z}(a)\Big).$$

In general, when working in $\mathbb{R}^n$ for some $n\in\mathbb{N}$, the gradient of the scalar field $f(x_1,\ldots,x_n)$ at the point $a$ is

$$\nabla f(a) = \Big(\frac{\partial f}{\partial x_1}(a),\ \frac{\partial f}{\partial x_2}(a),\ \ldots,\ \frac{\partial f}{\partial x_n}(a)\Big).$$

Theorem

If $f$ is differentiable at $a$ then $df_a(v) = \nabla f(a)\cdot v$. This means that, for $x\in B(a,r)$,

$$f(x) = f(a) + \nabla f(a)\cdot(x-a) + \epsilon(x-a)$$

where $|\epsilon(x-a)| = o(\|x-a\|)$. Moreover, for any vector $v$ with $\|v\| = 1$,

$$D_vf(a) = \nabla f(a)\cdot v.$$

Proof

Since $f$ is differentiable there exists a linear transformation $df_a:\mathbb{R}^n\to\mathbb{R}$ such that $f(a+hv) = f(a) + h\,df_a(v) + \epsilon(hv)$ and hence

$$D_vf(a) = \lim_{h\to 0}\frac{1}{h}\big(f(a+hv) - f(a)\big) = \lim_{h\to 0}\frac{1}{h}\big(h\,df_a(v) + \epsilon(hv)\big) = df_a(v).$$

In particular $df_a(e_k) = D_{e_k}f(a) = \frac{\partial f}{\partial x_k}(a)$, and so by linearity $df_a(v) = \sum_{k} v_k\,df_a(e_k) = \nabla f(a)\cdot v$.
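The identity $D_vf(a) = \nabla f(a)\cdot v$ can be tested numerically. The sketch below (the helper names are ours) compares a central-difference directional derivative with the analytic gradient of the sample function $f(x,y) = xe^y + \sin(xy)$ over several unit directions:

```python
import math

def f(x, y):
    return x * math.exp(y) + math.sin(x * y)

def grad_f(x, y):
    """Analytic gradient: (e^y + y cos(xy), x e^y + x cos(xy))."""
    return (math.exp(y) + y * math.cos(x * y),
            x * math.exp(y) + x * math.cos(x * y))

def D_v(f, a, v, h=1e-6):
    """Central-difference approximation of the directional derivative."""
    return (f(a[0] + h * v[0], a[1] + h * v[1])
            - f(a[0] - h * v[0], a[1] - h * v[1])) / (2 * h)

a = (0.7, -0.3)
g = grad_f(*a)
for theta in [0.0, 0.5, 1.3, 2.9]:          # several unit directions
    v = (math.cos(theta), math.sin(theta))
    assert abs(D_v(f, a, v) - (g[0] * v[0] + g[1] * v[1])) < 1e-5
```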

Theorem

If f is differentiable at a, then it is continuous at a.

Proof

Observe that $|f(a+v) - f(a)| = |df_a(v) + \epsilon(v)|$. This means that

$$|f(a+v) - f(a)| \le \|\nabla f(a)\|\,\|v\| + |\epsilon(v)|$$

and so this tends to $0$ as $\|v\|\to 0$.

Theorem

Suppose that $f(x_1,\ldots,x_n)$ is a scalar field. If the partial derivatives $\partial_1f(x),\ldots,\partial_nf(x)$ exist for all $x\in B(a,r)$ and are continuous at $a$ then $f$ is differentiable at $a$.

Proof

For convenience define the vectors

$$v = (v_1,v_2,\ldots,v_n), \qquad u_k = (v_1,v_2,\ldots,v_k,0,\ldots,0).$$

Observe that

$$u_k - u_{k-1} = v_ke_k, \qquad u_0 = (0,0,\ldots,0), \qquad u_n = v.$$

Using the mean value theorem we know that there exists $z_k = u_{k-1} + \theta_kv_ke_k$, $\theta_k\in(0,1)$, such that $f(a+u_k) - f(a+u_{k-1}) = v_kD_{e_k}f(a+z_k)$. Consequently

$$f(a+v) - f(a) = \sum_{k=1}^{n}\big(f(a+u_k) - f(a+u_{k-1})\big) = \sum_{k=1}^{n}v_kD_{e_k}f(a+z_k) = \sum_{k=1}^{n}v_kD_{e_k}f(a) + \sum_{k=1}^{n}v_k\big(D_{e_k}f(a+z_k) - D_{e_k}f(a)\big).$$

To conclude, observe that the second sum is $o(\|v\|)$ as $\|v\|\to 0$ by the continuity of the partial derivatives at $a$, and that the first sum is equal to $\nabla f(a)\cdot v$.

Chain rule

When we are working in $\mathbb{R}$ we know that, if $g$ and $h$ are differentiable, then $f(t) = g\circ h(t)$ is also differentiable and $f'(t) = g'(h(t))\,h'(t)$. This is called the chain rule and is frequently very useful in calculating derivatives. We now investigate how this extends to higher dimensions.

Example

Suppose that $\alpha:\mathbb{R}\to\mathbb{R}^3$ describes the position $\alpha(t)$ at time $t$ and that $f:\mathbb{R}^3\to\mathbb{R}$ describes the temperature $f(\alpha)$ at a point $\alpha$. The temperature at time $t$ is equal to $g(t) = f(\alpha(t))$. We want to calculate $g'(t)$ because this is the change in temperature with respect to time.

In situations like the above example it is convenient to consider the derivative of a path $\alpha:\mathbb{R}\to\mathbb{R}^n$. Suppose it has the form $\alpha(t) = (\alpha_1(t),\ldots,\alpha_n(t))$. We define the derivative as

$$\alpha'(t) := (\alpha_1'(t),\ldots,\alpha_n'(t)).$$

Here $\alpha'$ is a vector-valued function which represents the "direction of movement".

$\alpha(t) = (\cos t, \sin t, t)$, $t\in\mathbb{R}$

Theorem

Let $S\subseteq\mathbb{R}^n$ be open and $I\subseteq\mathbb{R}$ an interval. Let $x:I\to S$ and $f:S\to\mathbb{R}$ and define, for $t\in I$,

$$g(t) = f(x(t)).$$

Suppose that $t\in I$ is such that $x'(t)$ exists and $f$ is differentiable at $x(t)$. Then $g'(t)$ exists and

$$g'(t) = \nabla f(x(t))\cdot x'(t).$$

Proof

Since $f$ is differentiable, $f(y) - f(x) = \nabla f(x)\cdot(y-x) + \epsilon(x, y-x)$ where $|\epsilon(x, y-x)| = o(\|y-x\|)$. Let $h>0$ be small.

$$\frac1h\big[g(t+h) - g(t)\big] = \frac1h\big[f(x(t+h)) - f(x(t))\big] = \frac1h\nabla f(x(t))\cdot\big(x(t+h) - x(t)\big) + \frac1h\epsilon\big(x(t), x(t+h)-x(t)\big).$$

Observe that $\frac1h(x(t+h) - x(t))\to x'(t)$ as $h\to 0$, while the error term vanishes because $\|x(t+h)-x(t)\| = O(h)$.

Example

A particle moves in a circle and its position at time $t\in[0,2\pi]$ is given by

$$x(t) = (\cos t, \sin t).$$

The temperature at a point $y = (y_1,y_2)$ is given by the function $f(y) := y_1 + y_2$. The temperature the particle experiences at time $t$ is given by $g(t) = f(x(t))$. The change in temperature:

$$g'(t) = \nabla f(x(t))\cdot x'(t) = (1,1)\cdot(-\sin t, \cos t) = \cos t - \sin t.$$

$x(t)$ is the position of a particle.
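We can confirm the chain-rule computation by comparing it with a numerical derivative of $g$:

```python
import math

def x(t):
    """Position on the circle."""
    return (math.cos(t), math.sin(t))

def f(y1, y2):
    """Temperature at a point."""
    return y1 + y2

def g(t):
    return f(*x(t))

# Chain rule prediction: g'(t) = (1, 1) . (-sin t, cos t) = cos t - sin t
h = 1e-6
for t in [0.0, 1.0, 2.5, 5.0]:
    numeric = (g(t + h) - g(t - h)) / (2 * h)
    assert abs(numeric - (math.cos(t) - math.sin(t))) < 1e-6
```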

Level sets & tangent planes

Let $S\subseteq\mathbb{R}^2$, $f:S\to\mathbb{R}$. Suppose $c\in\mathbb{R}$ and let

$$L(c) = \{x\in S : f(x) = c\}.$$

The set $L(c)$ is called the level set. In general this set can be empty or it can be all of $S$. However the set $L(c)$ is often a curve, and this is the case of interest. This is the same notion as that of contour lines on a map. Suppose the curve through $a\in L(c)$ is parametrized by a differentiable path $x:I\to\mathbb{R}^2$, i.e., $x(t_a) = a$ for some $t_a\in I$ and

$$f(x(t)) = c$$

for all $t\in I$. Then

  • $\nabla f(a)$ is normal to the curve at $a$,
  • the tangent line at $a$ is $\{x\in\mathbb{R}^2 : \nabla f(a)\cdot(x-a) = 0\}$.

This is because the chain rule implies that $\nabla f(x(t))\cdot x'(t) = 0$.

Example

Let $f(x_1,x_2,x_3) := x_1^2 + x_2^2 + x_3^2$.

  • If $c>0$ then $L(c)$ is a sphere (of radius $\sqrt{c}$),
  • $L(0)$ is the single point $(0,0,0)$,
  • If $c<0$ then $L(c)$ is empty.

Example

Let $f(x_1,x_2,x_3) := x_1^2 + x_2^2 - x_3^2$. See figure.

  • If $c>0$ then $L(c)$ is a one-sheeted hyperboloid,
  • $L(0)$ is an infinite cone,
  • If $c<0$ then $L(c)$ is a two-sheeted hyperboloid.

Figures: sphere; two-sheeted hyperboloid; infinite cone; one-sheeted hyperboloid.

Let $f$ be a differentiable scalar field on $S\subseteq\mathbb{R}^3$ and suppose that the level set $L(c) = \{x\in S : f(x) = c\}$ defines a surface.

  • The gradient $\nabla f(a)$ is normal to every curve $\alpha(t)$ in the surface which passes through $a$,
  • the tangent plane at $a$ is $\{x\in\mathbb{R}^3 : \nabla f(a)\cdot(x-a) = 0\}$.

The same argument as in $\mathbb{R}^2$ works in $\mathbb{R}^n$.

Tangent plane and normal vector

Derivatives of vector fields

Essentially everything discussed above for scalar fields extends to vector fields in a predictable way. This is because of the linearity and that we can consider each component of the vector field independently.

Definition (directional derivative)

Let $S\subseteq\mathbb{R}^n$ and $F:S\to\mathbb{R}^m$. For any $a\in\operatorname{int}S$ and $v\in\mathbb{R}^n$ the derivative of the vector field $F$ with respect to $v$ is defined as

$$D_vF(a) := \lim_{h\to 0}\frac1h\big(F(a+hv) - F(a)\big).$$

Remark

If we use the notation $F = (F_1,\ldots,F_m)$, i.e., we write the function using the "components" where each $F_k$ is a scalar field, then $D_vF = (D_vF_1,\ldots,D_vF_m)$.

Definition (differentiable)

We say that $F:\mathbb{R}^n\to\mathbb{R}^m$ is differentiable at $a$ if there exists a linear transformation $dF_a:\mathbb{R}^n\to\mathbb{R}^m$ such that, for $x\in B(a,r)$,

$$F(x) = F(a) + dF_a(x-a) + \epsilon(x-a),$$

where $\|\epsilon(x-a)\| = o(\|x-a\|)$.

Theorem

If $F$ is differentiable at $a$ then $F$ is continuous at $a$ and $dF_a(v) = D_vF(a)$.

Proof

Same as for the case of scalar fields $f:\mathbb{R}^n\to\mathbb{R}$.

Jacobian matrix & the chain rule

The relevant differential for higher-dimensional functions is the Jacobian matrix.

Definition (Jacobian matrix)

Suppose that $F:\mathbb{R}^2\to\mathbb{R}^2$ and use the notation $F(x,y) = (F_1(x,y), F_2(x,y))$. The Jacobian matrix of $F$ at $a$ is defined as

$$DF(a) = \begin{pmatrix} \frac{\partial F_1}{\partial x}(a) & \frac{\partial F_1}{\partial y}(a) \\ \frac{\partial F_2}{\partial x}(a) & \frac{\partial F_2}{\partial y}(a) \end{pmatrix}.$$

The Jacobian matrix is defined analogously in any dimension, i.e., if $F:\mathbb{R}^n\to\mathbb{R}^m$ then the Jacobian at $a$ is

$$DF(a) = \begin{pmatrix} \partial_1F_1(a) & \partial_2F_1(a) & \cdots & \partial_nF_1(a) \\ \partial_1F_2(a) & \partial_2F_2(a) & \cdots & \partial_nF_2(a) \\ \vdots & \vdots & & \vdots \\ \partial_1F_m(a) & \partial_2F_m(a) & \cdots & \partial_nF_m(a) \end{pmatrix}.$$

If we choose a basis then any linear transformation $\mathbb{R}^n\to\mathbb{R}^m$ can be written as an $m\times n$ matrix, and we find that $dF_a(v) = DF(a)\,v$.

Let $S\subseteq\mathbb{R}^n$ and $F:S\to\mathbb{R}^m$. If $F$ is differentiable at $a\in S$ then, for all $x\in B(a,r)\subseteq S$,

$$F(x) = F(a) + DF(a)(x-a) + \epsilon(x-a)$$

where $\|\epsilon(x-a)\| = o(\|x-a\|)$. This is like a first-order Taylor expansion in higher dimensions.

Here we see that in higher dimensions we have a matrix form of the chain rule.

Theorem

Let $S\subseteq\mathbb{R}^l$, $T\subseteq\mathbb{R}^m$ be open. Let $f:S\to T$ and $g:T\to\mathbb{R}^n$ and define

$$h = g\circ f : S\to\mathbb{R}^n.$$

Let $a\in S$. Suppose that $f$ is differentiable at $a$ and $g$ is differentiable at $f(a)$. Then $h$ is differentiable at $a$ and

$$Dh(a) = Dg(f(a))\ Df(a).$$

Proof

Let $u = f(a+v) - f(a)$. Since $f$ and $g$ are differentiable,

$$h(a+v) - h(a) = g(f(a+v)) - g(f(a)) = Dg(f(a))\big(f(a+v) - f(a)\big) + \epsilon_g(u) = Dg(f(a))\,Df(a)\,v + Dg(f(a))\,\epsilon_f(v) + \epsilon_g(u).$$

Each of the last two terms is $o(\|v\|)$ as $\|v\|\to 0$.

Example (polar coordinates)

Here we consider polar coordinates and calculate the Jacobian of this transformation. We can write the change of coordinates

$$(r,\theta)\mapsto(r\cos\theta,\ r\sin\theta)$$

as the function $f(r,\theta) = (x(r,\theta), y(r,\theta))$ where $f:(0,\infty)\times[0,2\pi)\to\mathbb{R}^2$. We calculate the Jacobian matrix of this transformation:

$$Df(r,\theta) = \begin{pmatrix} \frac{\partial x}{\partial r}(r,\theta) & \frac{\partial x}{\partial\theta}(r,\theta) \\ \frac{\partial y}{\partial r}(r,\theta) & \frac{\partial y}{\partial\theta}(r,\theta) \end{pmatrix} = \begin{pmatrix} \cos\theta & -r\sin\theta \\ \sin\theta & r\cos\theta \end{pmatrix}.$$

In particular we see that $\det Df(r,\theta) = r$, the familiar value used in change of variables with polar coordinates.
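The identity $\det Df(r,\theta) = r$ can be confirmed numerically with finite differences (the helper `jacobian` is ours):

```python
import math

def f(r, theta):
    """Polar-to-Cartesian change of coordinates."""
    return (r * math.cos(theta), r * math.sin(theta))

def jacobian(f, r, theta, h=1e-6):
    """2x2 Jacobian matrix approximated by central differences."""
    xr = (f(r + h, theta)[0] - f(r - h, theta)[0]) / (2 * h)
    xt = (f(r, theta + h)[0] - f(r, theta - h)[0]) / (2 * h)
    yr = (f(r + h, theta)[1] - f(r - h, theta)[1]) / (2 * h)
    yt = (f(r, theta + h)[1] - f(r, theta - h)[1]) / (2 * h)
    return ((xr, xt), (yr, yt))

for r, theta in [(1.0, 0.3), (2.5, 1.7), (0.4, 4.0)]:
    (a, b), (c, d) = jacobian(f, r, theta)
    det = a * d - b * c
    assert abs(det - r) < 1e-6          # det Df(r, theta) = r
```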

Suppose now that we wish to calculate derivatives of $h := g\circ f$ for some $g:\mathbb{R}^2\to\mathbb{R}$. Here we take advantage of the theorem concerning multiplication of Jacobians:

$$Dh(r,\theta) = Dg(f(r,\theta))\ Df(r,\theta),$$

that is,

$$\begin{pmatrix} \frac{\partial h}{\partial r}(r,\theta) & \frac{\partial h}{\partial\theta}(r,\theta) \end{pmatrix} = \begin{pmatrix} \frac{\partial g}{\partial x}(f(r,\theta)) & \frac{\partial g}{\partial y}(f(r,\theta)) \end{pmatrix} \begin{pmatrix} \cos\theta & -r\sin\theta \\ \sin\theta & r\cos\theta \end{pmatrix}.$$

In other words, we have shown that

$$\frac{\partial h}{\partial r}(r,\theta) = \frac{\partial g}{\partial x}(r\cos\theta, r\sin\theta)\cos\theta + \frac{\partial g}{\partial y}(r\cos\theta, r\sin\theta)\sin\theta,$$
$$\frac{\partial h}{\partial\theta}(r,\theta) = -r\,\frac{\partial g}{\partial x}(r\cos\theta, r\sin\theta)\sin\theta + r\,\frac{\partial g}{\partial y}(r\cos\theta, r\sin\theta)\cos\theta.$$
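As a sanity check of these formulas, the sketch below takes the concrete choice $g(x,y) = x^2y$ (our example, not from the notes) and compares the chain-rule expressions with numerical derivatives of $h$:

```python
import math

def g(x, y):
    return x ** 2 * y

def h(r, theta):
    return g(r * math.cos(theta), r * math.sin(theta))

def partials_from_chain_rule(r, theta):
    """h_r and h_theta via Dh = Dg(f) Df, using g_x = 2xy and g_y = x^2."""
    x, y = r * math.cos(theta), r * math.sin(theta)
    gx, gy = 2 * x * y, x ** 2
    h_r = gx * math.cos(theta) + gy * math.sin(theta)
    h_theta = -r * gx * math.sin(theta) + r * gy * math.cos(theta)
    return h_r, h_theta

step = 1e-6
for r, theta in [(1.2, 0.4), (2.0, 2.1)]:
    h_r, h_theta = partials_from_chain_rule(r, theta)
    assert abs((h(r + step, theta) - h(r - step, theta)) / (2 * step) - h_r) < 1e-5
    assert abs((h(r, theta + step) - h(r, theta - step)) / (2 * step) - h_theta) < 1e-5
```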

Implicit functions & partial derivatives

Just like with ordinary derivatives, we can take higher-order partial derivatives. For convenience, when we want to write $\frac{\partial}{\partial y}\frac{\partial}{\partial x}f(x,y)$, i.e., differentiate first with respect to $x$ and then with respect to $y$, we write instead $\frac{\partial^2 f}{\partial y\,\partial x}(x,y)$. The analogous notation is used for higher derivatives and any other choice of coordinates.

We first consider the question of when

$$\frac{\partial^2 f}{\partial y\,\partial x}(x,y) \stackrel{?}{=} \frac{\partial^2 f}{\partial x\,\partial y}(x,y).$$

Example (partial derivative problem)

Let $f:\mathbb{R}^2\to\mathbb{R}$ be defined as $f(0,0) = 0$ and, for $(x,y)\neq(0,0)$,

$$f(x,y) := \frac{xy(x^2-y^2)}{x^2+y^2}.$$

We calculate that $\frac{\partial^2 f}{\partial y\,\partial x}(0,0) = -1$ but $\frac{\partial^2 f}{\partial x\,\partial y}(0,0) = 1$.
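The unequal mixed partials can be observed numerically: approximate the first partials by a central difference with a very small step, then difference those with a larger step. The helper names below are ours:

```python
def f(x, y):
    if x == 0 and y == 0:
        return 0.0
    return x * y * (x ** 2 - y ** 2) / (x ** 2 + y ** 2)

def fx(x, y, h=1e-7):
    """Central-difference approximation of the partial derivative in x."""
    return (f(x + h, y) - f(x - h, y)) / (2 * h)

def fy(x, y, h=1e-7):
    """Central-difference approximation of the partial derivative in y."""
    return (f(x, y + h) - f(x, y - h)) / (2 * h)

k = 1e-3  # outer step, much larger than the inner step h
mixed_yx = (fx(0, k) - fx(0, -k)) / (2 * k)   # d/dy (df/dx) at (0, 0)
mixed_xy = (fy(k, 0) - fy(-k, 0)) / (2 * k)   # d/dx (df/dy) at (0, 0)
assert abs(mixed_yx - (-1.0)) < 1e-3
assert abs(mixed_xy - 1.0) < 1e-3
```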

Theorem

Let $f:S\to\mathbb{R}$ be a scalar field such that the partial derivatives $\frac{\partial f}{\partial x}$, $\frac{\partial f}{\partial y}$ and $\frac{\partial^2 f}{\partial y\,\partial x}$ exist on an open set $S\subseteq\mathbb{R}^2$ containing $x$. Further assume that $\frac{\partial^2 f}{\partial y\,\partial x}$ is continuous on $S$. Then the derivative $\frac{\partial^2 f}{\partial x\,\partial y}(x)$ exists and

$$\frac{\partial^2 f}{\partial x\,\partial y}(x) = \frac{\partial^2 f}{\partial y\,\partial x}(x).$$

In many cases we can choose to write a given curve/function either in implicit or explicit form.

  • Implicit: $x^2 - y = 0$; explicit: $y(x) = x^2$.
  • Implicit: $x^2 + y^2 = 1$; explicit: $y(x) = \pm\sqrt{1-x^2}$, $|x|\le 1$.
  • Implicit: $x^2 - y^2 - 1 = 0$; explicit: $y(x) = \pm\sqrt{x^2-1}$, $|x|\ge 1$.
  • Implicit: $x^2 + y^2e^y - 4 = 0$; explicit: a mess?
  • Implicit: $x^2y^4 - 3 = \sin(xy)$; explicit: a huge mess?

Given the above observation, the following method of calculating derivatives is sometimes useful. Suppose that some $f:\mathbb{R}^2\to\mathbb{R}$ is given and we suppose there exists some $y:\mathbb{R}\to\mathbb{R}$ such that

$$f(x, y(x)) = 0 \quad\text{for all } x.$$

Let $h(x) := f(x, y(x))$ and note that $h(x) = 0$ for all $x$, hence $h'(x) = 0$. Here we are using the idea that $h = f\circ g$ where $g(x) = (x, y(x))$. By the chain rule $h'(x)$ is equal to

$$\begin{pmatrix} \frac{\partial f}{\partial x}(x, y(x)) & \frac{\partial f}{\partial y}(x, y(x)) \end{pmatrix} \begin{pmatrix} 1 \\ y'(x) \end{pmatrix} = 0.$$

Consequently, provided $\frac{\partial f}{\partial y}(x, y(x))\neq 0$,

$$y'(x) = -\frac{\frac{\partial f}{\partial x}(x, y(x))}{\frac{\partial f}{\partial y}(x, y(x))}.$$
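As a check, for the circle $f(x,y) = x^2+y^2-1$ the formula gives $y'(x) = -x/y$, which matches the derivative of the explicit branch $y(x) = \sqrt{1-x^2}$:

```python
import math

def f(x, y):
    """Implicit equation of the unit circle: f(x, y) = 0."""
    return x ** 2 + y ** 2 - 1

def y_explicit(x):
    """Upper branch y(x) = sqrt(1 - x^2)."""
    return math.sqrt(1 - x ** 2)

def y_prime_implicit(x, y):
    """y'(x) = -f_x / f_y = -(2x) / (2y) = -x / y."""
    return -x / y

h = 1e-6
for x in [0.0, 0.3, -0.5, 0.8]:
    numeric = (y_explicit(x + h) - y_explicit(x - h)) / (2 * h)
    assert abs(numeric - y_prime_implicit(x, y_explicit(x))) < 1e-6
```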