It is surprising how many people think that analysis consists in the difficult proofs of obvious theorems. All we need know, they say, is what a limit is, the definition of continuity and the definition of the derivative. All the rest is ‘intuitively clear’.

If pressed they will agree that these deﬁnitions apply as much to the rationals \(\mathbb Q\) as to the real numbers \(\mathbb R\). They then have to explain the following interesting example.

Example 1.1. If \(f:{\mathbb Q}\rightarrow {\mathbb Q}\) is given by

\begin{alignat*}{2} f(x)&=-1&&\qquad\text{if $x^{2}<2$,}\\ f(x)&=1&&\qquad\text{otherwise,} \end{alignat*}

then

(i) \(f\) is a continuous function with \(f(0)=-1\), \(f(2)=1\), yet there does not exist a \(c\) with \(f(c)=0\),

(ii) \(f\) is a differentiable function with \(f'(x)=0\) for all \(x\), yet \(f\) is not constant.
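For readers who like to experiment, the jump in Example 1.1 can be watched numerically. The following Python sketch (not part of the course; it uses exact rational arithmetic, so every number involved really is in \(\mathbb Q\)) bisects towards the missing point: the jump from \(-1\) to \(1\) is squeezed into an arbitrarily small rational interval, yet \(f\) never takes the value \(0\), because no rational has square exactly \(2\).

```python
from fractions import Fraction

def f(x):
    """The function of Example 1.1, restricted to the rationals."""
    return -1 if x * x < Fraction(2) else 1

# Bisect towards the 'missing' point: lo always has f(lo) = -1 and
# hi always has f(hi) = 1, and the rational gap between them halves
# at every step.
lo, hi = Fraction(0), Fraction(2)
for _ in range(50):
    mid = (lo + hi) / 2
    if f(mid) == -1:
        lo = mid
    else:
        hi = mid

assert f(lo) == -1 and f(hi) == 1      # the jump survives
assert hi - lo == Fraction(2, 2**50)   # across a tiny rational gap
```

Since all arithmetic here is exact, the run really does exhibit two rationals, astronomically close together, on opposite sides of the jump.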

What is the difference between \(\mathbb R\) and \(\mathbb Q\) which makes calculus work on one even though it fails on the other? Both are ‘ordered fields’, that is, both support operations of ‘addition’ and ‘multiplication’ together with a relation ‘greater than’ (‘order’) with the properties that we expect. If the reader is interested she will find a complete list of the appropriate axioms in texts like the altogether excellent book of Spivak [8] and its many rather less excellent competitors. Interesting as such things may be, they are irrelevant to our purpose, which is not to consider the shared properties of \(\mathbb R\) and \(\mathbb Q\) but to identify a difference between the two systems which will enable us to exclude the possibility of a function like that of Example 1.1 for functions from \(\mathbb R\) to \(\mathbb R\).

To state the diﬀerence we need only recall a deﬁnition from the beginning of the course C5/6.

Deﬁnition 1.2. If \(a_{n}\in {\mathbb R}\) for each \(n\geq 1\) and \(a\in {\mathbb R}\) then we say that \(a_{n}\rightarrow a\) if given \(\epsilon >0\) we can ﬁnd an \(n_{0}(\epsilon )\) such that \[|a_{n}-a|<\epsilon \ \text {for all $n\geq n_{0}(\epsilon )$}.\]

The key property of the reals, the fundamental axiom which makes everything work, was also stated in the course C5/6.

Axiom 1.3 (The fundamental axiom of analysis). If \(a_{n}\in {\mathbb R}\) for each \(n\geq 1\), \(A\in {\mathbb R}\) and \(a_{1}\leq a_{2}\leq a_{3}\leq \ldots \) and \(a_{n}<A\) for each \(n\) then there exists an \(a\in {\mathbb R}\) such that \(a_{n}\rightarrow a\) as \(n\rightarrow \infty \).

Less ponderously, and just as rigorously, the fundamental axiom for the real numbers says every increasing sequence bounded above tends to a limit.
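As a concrete (floating-point, and therefore only suggestive) illustration of the axiom: the sequence \(a_n=\sum_{k=0}^{n}1/k!\) is increasing and bounded above by \(3\), so the axiom promises a limit. The sketch below checks the two hypotheses numerically; the limit, as it happens, is \(e\).

```python
import math

# a_n = sum_{k=0}^{n} 1/k! : each term added is positive, so the
# sequence increases, and a_n < 1 + 1 + 1/2 + 1/4 + ... = 3 for
# every n, so it is bounded above.
a, term, seq = 0.0, 1.0, []
for k in range(20):
    a += term
    term /= k + 1
    seq.append(a)

assert all(x <= y for x, y in zip(seq, seq[1:]))  # increasing
assert all(x < 3 for x in seq)                    # bounded above
assert abs(seq[-1] - math.e) < 1e-12              # the promised limit
```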

Everything which depends on the fundamental axiom is analysis; everything else is mere algebra.

In the course C5/6 you proved the following results on the limits of sequences.

Lemma 2.1. (i) The limit is unique. That is, if \(a_{n}\rightarrow a\) and \(a_{n}\rightarrow b\) as \(n\rightarrow \infty \) then \(a=b\).

(ii) If \(a_{n}\rightarrow a\) as \(n\rightarrow \infty \) and \(n(1)<n(2)<n(3)<\ldots \) then \(a_{n(j)}\rightarrow a\) as \(j\rightarrow \infty \).

(iii) If \(a_{n}=c\) for all \(n\) then \(a_{n}\rightarrow c\) as \(n\rightarrow \infty \).

(iv) If \(a_{n}\rightarrow a\) and \(b_{n}\rightarrow b\) as \(n\rightarrow \infty \) then \(a_{n}+b_{n}\rightarrow a+b\).

(v) If \(a_{n}\rightarrow a\) and \(b_{n}\rightarrow b\) as \(n\rightarrow \infty \) then \(a_{n}b_{n}\rightarrow ab\).

(vi) If \(a_{n}\rightarrow a\) as \(n\rightarrow \infty \) and \(a_{n}\neq 0\) for each \(n\), \(a\neq 0\) then \(a_{n}^{-1}\rightarrow a^{-1}\).

(vii) If \(a_{n}\leq A\) for each \(n\) and \(a_{n}\rightarrow a\) as \(n\rightarrow \infty \) then \(a\leq A\).

Since these results were proved in the earlier course I shall not prove them. However, it is essential for further progress that the reader should be able to prove them for herself. If you cannot prove these results consult your supervisor at once. You should do the same if you cannot prove the following variation on the fundamental axiom.

Exercise 2.2. If \(a_{n}\in {\mathbb R}\) for each \(n\geq 1\), \(B\in {\mathbb R}\), \(a_{1}\geq a_{2}\geq a_{3}\geq \ldots \) and \(a_{n}>B\) for each \(n\) then there exists an \(a\in {\mathbb R}\) such that \(a_{n}\rightarrow a\) as \(n\rightarrow \infty \).

[Hint. If \(a\leq b\) then \(-b\leq -a\).]

Useful as the results of Lemma 2.1 are, they are also true of sequences in \(\mathbb Q\). They are therefore mere, if important, algebra. Our ﬁrst truly ‘analysis’ result may strike the reader as rather odd.

Theorem 2.3 (Axiom of Archimedes). \(1/n\rightarrow 0\) as \(n\rightarrow \infty \).

Theorem 2.3 shows that there is no ‘exotic’ real number \(\gimel \) say (to choose an exotic symbol) with the property that \(1/n>\gimel \) for all integers \(n\geq 1\) yet \(\gimel >0\) (that is, \(\gimel \) is strictly positive and yet smaller than all strictly positive rationals). There exist number systems with such exotic numbers (the famous ‘non-standard analysis’ of Abraham Robinson and the ‘surreal numbers’ of Conway constitute two such systems) but, just as the rationals are, in some sense, too small a system for the standard theorems of analysis to hold, so these non-Archimedean systems are, in some sense, too big. Archimedes and Eudoxus realised the need for an axiom to show that there is no exotic number \(\daleth \) bigger than any integer (i.e. \(\daleth >n\) for all integers \(n\geq 1\); to see the connection with our form of the axiom consider \(\gimel =1/\daleth \)). However, in spite of its name, what was an axiom for Archimedes is a theorem for us.

It is all very well knowing that the fundamental axiom is the foundation of mathematics, but how can we find a proof technique which will allow us to apply it? One technique uses the Bolzano–Weierstrass theorem.

Theorem 3.1 (Bolzano–Weierstrass). If \(x_{n}\in {\mathbb R}\) and there exists a \(K\) such that \(|x_{n}|\leq K\) for all \(n\) then we can ﬁnd \(n(1)<n(2)<\ldots \) and \(x\in {\mathbb R}\) such that \(x_{n(j)}\rightarrow x\) as \(j\rightarrow \infty \).

Recall from C5/6 that we say that a sequence converges if it tends to a limit. The Bolzano–Weierstrass theorem thus says that every bounded sequence of reals has a convergent subsequence. Notice that we say nothing about uniqueness: if \(x_{n}=(-1)^{n}\) then \(x_{2n}\rightarrow 1\) but \(x_{2n+1}\rightarrow -1\) as \(n\rightarrow \infty \).

There is a proof of the Bolzano–Weierstrass theorem by repeated dissection along the lines of the proof of the intermediate value theorem given in course C5/6. We shall give another proof based on the following elegant combinatorial lemma.

Lemma 3.2. If \(x_{n}\in {\mathbb R}\) then at least one of the following two statements must be true.

(A) There exist \(n(1)<n(2)<\ldots \) such that \(x_{n(j)}\leq x_{n(j+1)}\) for each \(j\geq 1\).

(B) There exist \(m(1)<m(2)<\ldots \) such that \(x_{m(j)}\geq x_{m(j+1)}\) for each \(j\geq 1\).
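Lemma 3.2 concerns infinite sequences, but it has a finite cousin (the Erdős–Szekeres theorem, not part of this course): any list of \(n\) terms contains a monotone subsequence of length at least \(\sqrt n\). The sketch below finds the longest monotone subsequence of a finite list by dynamic programming; it illustrates, but does not prove, the lemma.

```python
# Longest non-decreasing subsequence by O(n^2) dynamic programming;
# negating the input finds a non-increasing one.  Erdos-Szekeres
# guarantees one of the two has length at least sqrt(n).
def longest_monotone(xs):
    n = len(xs)
    best = [1] * n            # best[i]: longest run ending at xs[i]
    prev = [-1] * n           # back-pointers for reconstruction
    for i in range(n):
        for j in range(i):
            if xs[j] <= xs[i] and best[j] + 1 > best[i]:
                best[i], prev[i] = best[j] + 1, j
    i = max(range(n), key=best.__getitem__)
    out = []
    while i != -1:
        out.append(xs[i])
        i = prev[i]
    return out[::-1]

xs = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5]
up = longest_monotone(xs)
down = [-v for v in longest_monotone([-x for x in xs])]
assert up == sorted(up)                       # non-decreasing
assert down == sorted(down, reverse=True)     # non-increasing
assert max(len(up), len(down)) ** 2 >= len(xs)
```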

There are a variety of techniques for using the fundamental axiom: successive dissection, using Heine–Borel, looking for the supremum and so on. When the reader is confident in her control of the material in this course (and when she knows the various techniques enumerated in the last sentence) she should test her skill by proving the main results of this course by using each technique in turn. For the moment it seems sensible to concentrate on one proof technique until it is fully mastered. The technique chosen for this exposition is that of Bolzano–Weierstrass. We conclude this section with two important examples, Theorem 3.6 on the existence of suprema and Theorem 3.13 on the general principle of convergence.

It is an unfortunate fact of life that many otherwise perfectly well behaved sets of numbers do not have greatest members (maxima). A simple example is given by \(\{x\in {\mathbb R}:0<x<1\}\). However, we shall see in Theorem 3.6 that any non-empty set of real numbers which is bounded above does have a least upper bound (supremum).

Deﬁnition 3.3. Consider a non-empty set \(A\) of real numbers. We say that \(\alpha \) is a least upper bound (or supremum) for \(A\) if the following two conditions hold.

(i) \(\alpha \geq a\) for all \(a\in A\) (that is, \(\alpha \) is an upper bound).

(ii) If \(\beta \geq a\) for all \(a\in A\) then \(\beta \geq \alpha \) (that is, \(\alpha \) is no greater than any possible upper bound).

It is convenient to have the following alternative characterisation of the supremum.

Lemma 3.5. Consider a non-empty set \(A\) of real numbers; \(\alpha \) is a least upper bound for \(A\) if and only if the following two conditions hold.

(i) \(\alpha \geq a\) for all \(a\in A\).

(ii) Given \(\epsilon >0\) there exists an \(a\in A\) such that \(a+\epsilon \geq \alpha \).

We write \(\sup A\) or \(\sup _{a\in A}a\) for the supremum of \(A\) if it exists.

Here is the promised theorem.

Theorem 3.6 (Existence of the supremum). If \(A\) is a non-empty set of real numbers which is bounded above (that is, there exists a \(K\) such that \(a\leq K\) for all \(a\in A\)) then \(A\) has a supremum.

That this is a genuine theorem of analysis is shown by the fact that, if we work in \(\mathbb Q\), the set \(\{x\in \mathbb Q:x^{2}<2\}\) has no least upper bound. (See Question 18.13 for a detailed discussion.)
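The proof of Theorem 3.6 can be made to feel concrete. The following sketch (floating point, illustrative only) runs a bisection for \(\sup \{x:x^{2}<2\}\), maintaining a member of the set and an upper bound throughout. In \(\mathbb Q\) the two sides close in on a hole; in \(\mathbb R\), and hence here in floating point, they close in on \(\sqrt 2\).

```python
# Bisection for sup{x : x*x < 2}: lo always lies in the set, hi is
# always an upper bound, and the gap halves at every step.
lo, hi = 0.0, 2.0
for _ in range(60):
    mid = (lo + hi) / 2
    if mid * mid < 2:
        lo = mid             # mid belongs to the set
    else:
        hi = mid             # mid is an upper bound

assert lo * lo < 2 <= hi * hi          # the invariants survive
assert abs(hi - 2 ** 0.5) < 1e-12      # both ends approach sqrt(2)
```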

We leave it to the reader to deﬁne the greatest lower bound also called the inﬁmum of a set \(A\) and written \(\inf A\) or \(\inf _{a\in A}a\). She should prove that \[\inf _{a\in A}a=-\sup _{a\in A}(-a)\] provided that either side of the equation exists.

As an example of its use, recall the following lemma from course C5/6.

Lemma 3.7. We work in \(\mathbb C\). If \(\sum _{n=0}^{\infty }a_{n}z^{n}\) converges for \(z=z_{0}\) then it converges for all \(z\) with \(|z|<|z_{0}|\).

The use of Theorem 3.6 gives us a rather transparent proof of the existence of the radius of convergence.

Theorem 3.8. We work in \(\mathbb C\). Consider the sum \(\sum _{n=0}^{\infty }a_{n}z^{n}\). Either \(\sum _{n=0}^{\infty }a_{n}z^{n}\) converges for all \(z\) (in this case we say that the series has inﬁnite radius of convergence) or there exists a real number \(R\geq 0\) such that

(i) \(\sum _{n=0}^{\infty }a_{n}z^{n}\) converges for all \(|z|<R\),

(ii) \(\sum _{n=0}^{\infty }a_{n}z^{n}\) diverges for all \(|z|>R\),

(in this case we say that the series has radius of convergence \(R\)).

The nature of the supremum is illustrated by the following example which we leave to the reader as an exercise in the methods of the course C5/6. (As usual, contact your supervisor if you cannot do it.)

Example 3.9. (i) The sum \(\sum _{n=1}^{\infty }n^{-2}z^{n}\) has radius of convergence \(1\) and converges for all \(|z|=1\).

(ii) The sum \(\sum _{n=0}^{\infty }z^{n}\) has radius of convergence \(1\) and diverges for all \(|z|=1\).

(iii) The sum \(\sum _{n=1}^{\infty }n^{-1}z^{n}\) has radius of convergence \(1\), diverges for \(z=1\) and converges for \(z=-1\).
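A numerical aside on Example 3.9(i), using the standard root-test formula \(R=1/\limsup |a_{n}|^{1/n}\) (which this course has not proved): for \(a_{n}=n^{-2}\) the quantity \(|a_{n}|^{1/n}\) tends to \(1\), inside the radius the tail sums are tiny, and outside the radius the individual terms are unbounded, so convergence is impossible.

```python
# For a_n = 1/n**2 (Example 3.9(i)) the radius of convergence is 1.
def a(n):
    return n ** -2.0

# The root test sees R = 1 numerically: |a_n|**(1/n) -> 1.
root = a(10**6) ** (1.0 / 10**6)
assert abs(root - 1.0) < 1e-3

# Inside the radius the tail of the series is tiny; outside, the
# individual terms themselves blow up.
tail_in = sum(a(n) * 0.9 ** n for n in range(200, 400))
assert tail_in < 1e-9                          # settles at z = 0.9
assert a(200) * 1.1 ** 200 > 1e3               # explodes at z = 1.1
```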

[When revising this part of the course you should also bear in mind Theorem 14.20 discussed later.]

We turn now to the general principle of convergence.

Deﬁnition 3.10. We say that a sequence of real numbers \(x_{n}\) is a Cauchy sequence if given any \(\epsilon >0\) we can ﬁnd \(n_{0}(\epsilon )\) such that \(|x_{p}-x_{q}|<\epsilon \) whenever \(p,q\geq n_{0}(\epsilon )\).

Our first lemma is merely algebraic.

Lemma 3.11. Every convergent sequence of real numbers is a Cauchy sequence.

The converse is a powerful theorem of analysis.

Theorem 3.12. Every Cauchy sequence of real numbers converges.

Combining the two results we get the general principle of convergence.

Theorem 3.13 (General principle of convergence). A sequence of real numbers converges if and only if it is a Cauchy sequence.

The general principle of convergence is usually too general to be used in the same way as the convergence tests of course C5/6, but it has great theoretical importance, as we shall see.

For those who cannot wait, here is a proof of the uncountability of \(\mathbb R\). The argument is much closer to Cantor’s original proof than the standard one given in course C1. I present it as a heavily starred exercise.

Exercise 3.14 (Uncountability of the reals). Let \(y_{1}\), \(y_{2}\), … be any sequence of points in \(\mathbb R\). Let \(x_{0}=0\), \(\delta _{0}=1\).

(i) Show that you can construct inductively a sequence of real numbers \(x_{1}\), \(x_{2}\), … and positive numbers \(\delta _{j}\) such that

(a) \(|x_{n}-x_{n-1}|<\delta _{n-1}/4\),

(b) \(x_{n}\neq y_{n}\),

(c) \(0<\delta _{n}<|x_{n}-y_{n}|\),

(d) \(\delta _{n}<\delta _{n-1}/4\).

(ii) Show that \(\delta _{n+m}<4^{-m}\delta _{n}\) for \(m,n\geq 0\) and deduce that the \(x_{n}\) form a Cauchy sequence. Conclude that \(x_{n}\) tends to a limit \(x\).

(iii) Show that \(|x_{n+m}-x_{n}|<\delta _{n}/3\) for all \(m,n\geq 0\). Deduce that \(|x-x_{n}|\leq \delta _{n}/3\) for all \(n\geq 0\). Why does this show that \(y_{n}\neq x\) for all \(n\)?

(iv) Deduce that the real numbers are uncountable.
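The construction of Exercise 3.14 can be run on a machine. The choice rule below (step right or left by \(\delta _{n-1}/8\), whichever avoids \(y_{n}\), then shrink \(\delta \)) is one hypothetical way of satisfying conditions (a) to (d), and the final \(x\) visibly avoids every listed \(y_{n}\).

```python
# One concrete (hypothetical) choice rule implementing (a)-(d):
# at stage n, move delta/8 to whichever side avoids y_n, then shrink
# delta so that it stays below both delta/4 and |x_n - y_n|.
def avoid(ys):
    x, delta = 0.0, 1.0                  # x_0 = 0, delta_0 = 1
    for y in ys:
        step = delta / 8                 # well below delta/4, as (a) asks
        if abs((x + step) - y) >= step:
            x = x + step                 # stepping right avoids y ...
        else:
            x = x - step                 # ... otherwise stepping left does
        delta = min(delta / 8, abs(x - y) / 2)   # enforce (c) and (d)
    return x

ys = [0.0, 1.0, -0.5, 0.3, 0.25, 0.125]
x = avoid(ys)
assert all(abs(x - y) > 1e-6 for y in ys)    # x escapes the whole list
```

Of course a finite run only produces an approximation to the limit \(x\); the point of parts (ii) and (iii) of the exercise is that the later terms move so little that the escape is permanent.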

In 1908, G. H. Hardy wrote a textbook to introduce the new rigorous analysis (or ‘continental analysis’ as it was known in a Cambridge even more insular than today) to ‘first year students at the Universities whose abilities approach something like what is usually described as “scholarship standard” ’. Apart from the fact that even the most hardened analyst would now hesitate to call an introduction to analysis A Course of Pure Mathematics, it is striking how close the book is in both content and feel to a modern first course in analysis. (And where there are changes it is often not clear that the modern course has the advantage.) The only major difference is that Hardy studies only the real line, but later advances in mathematics mean that we must now study analysis in \({\mathbb R}^{m}\) as well.

In the course C1/2 you saw that \({\mathbb R}^{m}\) has a Euclidean norm \[\|{\mathbf x}\|=\left (\sum _{j=1}^{m}x_{j}^{2}\right )^{1/2}\] (taking the positive square root) with the following properties.

Lemma 4.1. If \(\|\ \|\) is the Euclidean norm on \({\mathbb R}^{m}\) then

(i) \(\|{\mathbf x}\|\geq 0\) for all \({\mathbf x}\in {\mathbb R}^{m}\),

(ii) If \(\|{\mathbf x}\|= 0\) then \({\mathbf x}={\mathbf 0}\),

(iii) If \(\lambda \in {\mathbb R}\) and \({\mathbf x}\in {\mathbb R}^{m}\) then \(\|\lambda {\mathbf x}\|=|\lambda |\|{\mathbf x}\|\).

(iv) (The triangle inequality) If \({\mathbf x},{\mathbf y}\in {\mathbb R}^{m}\) then \(\|{\mathbf x}+{\mathbf y}\| \leq \|{\mathbf x}\|+\|{\mathbf y}\|\).

The only non-trivial part of this lemma was the triangle inequality, which you proved using the Cauchy–Schwarz inequality.

In the course C5/6 you studied the notion of the limit for sequences in \({\mathbb R}^{m}\).

Deﬁnition 4.2. If \(\mathbf {a}_{n}\in {\mathbb R}^{m}\) for each \(n\geq 1\) and \(\mathbf {a}\in {\mathbb R}^{m}\) then we say that \(\mathbf {a}_{n}\rightarrow \mathbf {a}\) if given \(\epsilon >0\) we can ﬁnd an \(n_{0}(\epsilon )\) such that \[\|\mathbf {a}_{n}-\mathbf {a}\|<\epsilon \ \text {for all $n\geq n_{0}(\epsilon )$}.\]

Notice that this shows that Deﬁnition 1.2 was about the distance between two points and not the absolute value of the diﬀerence of two numbers.

You proved the following results on sequences in \({\mathbb R}^{m}\) in exactly the same way as you proved the corresponding results in \(\mathbb R\).

Lemma 4.3. (i) The limit is unique. That is, if \(\mathbf {a}_{n}\rightarrow \mathbf {a}\) and \(\mathbf {a}_{n}\rightarrow \mathbf {b}\) as \(n\rightarrow \infty \) then \(\mathbf {a}=\mathbf {b}\).

(ii) If \(\mathbf {a}_{n}\rightarrow \mathbf {a}\) as \(n\rightarrow \infty \) and \(n(1)<n(2)<n(3)<\ldots \) then \(\mathbf {a}_{n(j)}\rightarrow \mathbf {a}\) as \(j\rightarrow \infty \).

(iii) If \(\mathbf {a}_{n}=\mathbf {c}\) for all \(n\) then \(\mathbf {a}_{n}\rightarrow \mathbf {c}\) as \(n\rightarrow \infty \).

(iv) If \(\mathbf {a}_{n}\rightarrow \mathbf {a}\) and \(\mathbf {b}_{n}\rightarrow \mathbf {b}\) as \(n\rightarrow \infty \) then \(\mathbf {a}_{n}+\mathbf {b}_{n} \rightarrow \mathbf {a}+\mathbf {b}\).

(v) Suppose \(\mathbf {a}_{n}\in {\mathbb R}^{m}\), \(\mathbf {a}\in {\mathbb R}^{m}\), \(\lambda _{n}\in {\mathbb R}\) and \(\lambda \in {\mathbb R}\). If \(\mathbf {a}_{n}\rightarrow \mathbf {a}\) and \(\lambda _{n}\rightarrow \lambda \) then \(\lambda _{n}\mathbf {a}_{n}\rightarrow \lambda \mathbf {a}\).

Once again I leave it to the reader to check that she can indeed prove these results and to consult her supervisor if she cannot. She should also make sure that she understands why certain results in Lemma 2.1, which dealt with an ordered field, do not appear in Lemma 4.3, which deals with a vector space.

Lemma 4.3 is, of course, merely algebra and applies to \({\mathbb Q}^{m}\) as much as to \({\mathbb R}^{m}\). In order to do analysis we need a more powerful tool and, in keeping with the spirit of this course, we extend the Bolzano–Weierstrass theorem to \({\mathbb R}^{m}\). The extension is easy since the real work has already been done in the one dimensional case.

Theorem 4.4 (Bolzano–Weierstrass). If \(\mathbf {x}_{n}\in {\mathbb R}^{m}\) and there exists a \(K\) such that \(\|\mathbf {x}_{n}\|\leq K\) for all \(n\) then we can find \(n(1)<n(2)<\ldots \) and \(\mathbf {x}\in {\mathbb R}^{m}\) such that \(\mathbf {x}_{n(j)}\rightarrow \mathbf {x}\) as \(j\rightarrow \infty \).

Once again ‘any bounded sequence has a convergent subsequence’.

The proof of Theorem 4.4 involves extending a one dimensional result to several dimensions. This is more or less inevitable because we stated the fundamental axiom of analysis in a one dimensional form. However, the Bolzano–Weierstrass theorem itself makes no reference to whether we are working in \(\mathbb R\) or \({\mathbb R}^{m}\). Thus exactly the same proofs as we used in the one dimensional case give us a general principle of convergence in \({\mathbb R}^{m}\). (Compare Definition 3.10 and Theorem 3.13 with what follows.)

Deﬁnition 4.5. We say that a sequence of points \(\mathbf {x}_{n}\) in \({\mathbb R}^{m}\) is a Cauchy sequence if given any \(\epsilon >0\) we can ﬁnd \(n_{0}(\epsilon )\) such that \(\|\mathbf {x}_{p}-\mathbf {x}_{q}\|<\epsilon \) for all \(p,q\geq n_{0}(\epsilon )\).

Theorem 4.6 (General Principle of Convergence). A sequence in \({\mathbb R}^{m}\) converges if and only if it is a Cauchy sequence.

Here is an example of the use of the general principle of convergence. (Following the conventions of course C5/6 we say that \(\sum _{j=1}^{\infty }\mathbf {a}_{j}\) converges if the sequence of partial sums \(\sum _{j=1}^{N}\mathbf {a}_{j}\) tends to a limit \(\mathbf s\) say. We write \(\sum _{j=1}^{\infty }\mathbf {a}_{j}={\mathbf s}\).)

Theorem 4.7. Let \(\mathbf {a}_{n}\in {\mathbb R}^{m}\) for each \(n\). If \(\sum _{j=1}^{\infty }\|\mathbf {a}_{j}\|\) converges then so does \(\sum _{j=1}^{\infty }\mathbf {a}_{j}\).

This result substantially generalises the result in course C5/6 which says that an absolutely convergent series converges.
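A numerical check of the idea behind Theorem 4.7, with the illustrative choice \(\mathbf {a}_{j}=(\cos j,\sin j)/j^{2}\) in \({\mathbb R}^{2}\), so that \(\|\mathbf {a}_{j}\|=1/j^{2}\) and \(\sum \|\mathbf {a}_{j}\|\) converges. The distance between two partial sums is bounded by a tail of \(\sum \|\mathbf {a}_{j}\|\), which is exactly the Cauchy estimate at the heart of the proof.

```python
import math

# Theorem 4.7 numerically: a_j = (cos j, sin j) / j**2 in R^2, so
# ||a_j|| = 1/j**2 and sum ||a_j|| converges.  The distance between
# the N-th and M-th partial sums of sum a_j is at most the tail sum
# of the norms, so the partial sums form a Cauchy sequence.
def partial(N):
    sx = sum(math.cos(j) / j**2 for j in range(1, N + 1))
    sy = sum(math.sin(j) / j**2 for j in range(1, N + 1))
    return (sx, sy)

p, q = partial(2000), partial(4000)
dist = math.hypot(p[0] - q[0], p[1] - q[1])
tail = sum(1.0 / j**2 for j in range(2001, 4001))

assert dist <= tail + 1e-9   # the triangle-inequality bound
assert dist < 1e-3           # the partial sums have nearly settled
```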

When we work in \(\mathbb R\) the intervals are, in some sense, the ‘natural’ sets to consider. One of the problems that we face when we try to do analysis in many dimensions is that the types of sets with which we have to deal are much more diverse. It turns out that the so-called closed and open sets are both sufficiently diverse and sufficiently well behaved to be useful. This short section is devoted to deriving some of their simpler properties. Novices frequently find the topic hard but eventually the reader will appreciate that this section is a rather trivial interlude in a deeper discussion.

The deﬁnition of a closed set is a natural one.

Deﬁnition 5.1. A set \(F\subseteq {\mathbb R}^{m}\) is closed if whenever \(\mathbf {x}_{n}\in F\) for each \(n\) and \(\mathbf {x}_{n}\rightarrow \mathbf {x}\) as \(n\rightarrow \infty \) then \(\mathbf {x}\in F\).

Thus a set is closed in the sense of this course if it is ‘closed under the operation of taking limits’. An indication of why this is a good definition is given by the following version of the Bolzano–Weierstrass theorem.

Theorem 5.2. If \(K\) is a closed bounded set in \({\mathbb R}^{m}\) then every sequence in \(K\) has a subsequence converging to a point of \(K\).

When working in \({\mathbb R}^{m}\) the words ‘closed and bounded’ should always elicit the response ‘Bolzano–Weierstrass’. We shall see important examples of this slogan in action in the next section (Theorem 6.3 and Theorem 6.12).

We turn now to the deﬁnition of an open set.

Deﬁnition 5.3. A set \(U\subseteq {\mathbb R}^{m}\) is open if whenever \(\mathbf {x}\in U\) there exists an \(\epsilon ({\mathbf x})>0\) such that whenever \(\|\mathbf {x}-\mathbf {y}\|<\epsilon ({\mathbf x})\) we have \(\mathbf {y}\in U\).

Thus every point of an open set lies ‘well inside the set’.

Example 5.4. Consider sets in \(\mathbb R\). The interval \([a,b]=\{x\in {\mathbb R}:a\leq x \leq b\}\) is closed, the interval \((a,b)=\{x\in {\mathbb R}:a< x <b\}\) is open, the interval \([a,b)=\{x\in {\mathbb R}:a\leq x <b\}\) is neither open nor closed, \(\mathbb R\) is both open and closed.

Example 5.5. Consider sets in \({\mathbb R}^{m}\). Let \({\mathbf x}\in {\mathbb R}^{m}\) and \(r>0\).

(i) The set \(B({\mathbf x},r) =\{\mathbf {y}\in {\mathbb R}^{m}: \|\mathbf {x}-\mathbf {y}\|<r\}\) is open.

(ii) The set \(\bar {B}({\mathbf x},r) =\{\mathbf {y}\in {\mathbb R}^{m}: \|\mathbf {x}-\mathbf {y}\|\leq r\}\) is closed.

We call \(B({\mathbf x},r)\) the open ball of radius \(r\) and centre \(\mathbf x\). We call \(\bar {B}({\mathbf x},r)\) the closed ball of radius \(r\) and centre \(\mathbf x\). Observe that the closed and open balls of \(\mathbb R\) are precisely the closed and open intervals.

The following restatement of the deﬁnition helps us picture an open set.

Lemma 5.6. A subset \(A\) of \({\mathbb R}^{m}\) is open if and only if each point of \(A\) is the centre of an open ball lying entirely within \(A\).

Thus every point of an open set is surrounded by a ball consisting only of points of the set.

The topics of this section are often treated using the idea of neighbourhoods. We shall not use neighbourhoods very much but they come in useful from time to time.

Deﬁnition 5.7. The set \(N\) is a neighbourhood of the point \(\mathbf x\) if we can ﬁnd an \(r({\mathbf x})>0\) such that \(B({\mathbf x},r({\mathbf x}))\subseteq N\).

Thus a set is open if and only if it is a neighbourhood of every point that it contains.

Returning to the main theme we note the following remarkable fact.

Lemma 5.8. A subset \(A\) of \({\mathbb R}^{m}\) is open if and only if its complement \({\mathbb R}^{m}\setminus A\) is closed.

We observe the following basic results on open and closed sets.

Lemma 5.9. Consider the collection \(\tau \) of open sets in \({\mathbb R}^{m}\).

(i) \(\emptyset \in \tau \), \({\mathbb R}^{m}\in \tau \).

(ii) If \(U_{\alpha }\in \tau \) for all \(\alpha \in A\) then \(\bigcup _{\alpha \in A} U_{\alpha }\in \tau \).

(iii) If \(U_{1},U_{2},\dots ,U_{n}\in \tau \) then \(\bigcap _{j=1}^{n}U_{j}\in \tau \).

Lemma 5.10. Consider the collection \(\mathcal {F}\) of closed sets in \({\mathbb R}^{m}\).

(i) \(\emptyset \in \mathcal {F}\), \({\mathbb R}^{m}\in \mathcal {F}\).

(ii) If \(F_{\alpha }\in \mathcal {F}\) for all \(\alpha \in A\) then \(\bigcap _{\alpha \in A} F_{\alpha }\in \mathcal {F}\).

(iii) If \(F_{1},F_{2},\dots ,F_{n}\in \mathcal {F}\) then \(\bigcup _{j=1}^{n}F_{j}\in \mathcal {F}\).

In this context you should note the following examples.

Example 5.11. (i) \(\bigcap _{j=1}^{\infty }(-2-j^{-1},2+j^{-1}) =[-2,2]\), \(\bigcap _{j=1}^{\infty }(-2+j^{-1},2-j^{-1}) =(-1,1)\), \(\bigcap _{j=1}^{\infty }(-2+j^{-1},2+j^{-1})=(-1,2]\).

(ii) \(\bigcup _{j=1}^{\infty }[-1-j^{-1},1+j^{-1}] =[-2,2]\), \(\bigcup _{j=1}^{\infty }[-1+j^{-1},1-j^{-1}] =(-1,1)\), \(\bigcup _{j=1}^{\infty }[-1+j^{-1},1+j^{-1}]=(-1,2]\).

Thus the restriction to finite collections in part (iii) of Lemmas 5.9 and 5.10 cannot be removed.

So far in this course we have only dealt with subsets and sequences. But analysis deals with functions, or rather with ‘well behaved functions’. The simplest such class was deﬁned in course C5/6.

Definition 5.12. Let \(E\subseteq {\mathbb R}^{m}\). We say that a function \(f:E\rightarrow {\mathbb R}^{p}\) is continuous at some point \({\mathbf x}\in E\) if given \(\epsilon >0\) we can find a \(\delta (\epsilon ,{\mathbf x})>0\) such that whenever \({\mathbf y}\in E\) and \(\|{\mathbf x}-{\mathbf y}\|<\delta (\epsilon ,{\mathbf x})\) we have \[\|f({\mathbf x})-f({\mathbf y})\|<\epsilon .\] If \(f\) is continuous at every point \({\mathbf x}\in E\) we say that \(f\) is a continuous function on \(E\).

In other words, \(f\) is continuous at \(\mathbf x\) if \(f({\mathbf y})\) can be made as close as we wish to \(f({\mathbf x})\) simply by taking \(\mathbf y\) sufficiently close to \(\mathbf x\); that is, \(f\) is locally approximately constant. In less precise terms, ‘small changes produce small effects’.

Exercise 5.13. After looking at parts (iii) to (v) of Lemma 4.3 state the corresponding results for continuous functions. (Thus part (v) corresponds to the statement that if \(\lambda :E\rightarrow {\mathbb R}\) and \(f:E\rightarrow {\mathbb R}^{p}\) are continuous at \({\mathbf x}\in E\) then so is \(\lambda f\).) Prove your statements directly from Deﬁnition 5.12. Consult your supervisor if you have any problems.

In this course we shall mainly use the following simple consequence of the deﬁnition.

Lemma 5.14. Let \(E\subseteq {\mathbb R}^{m}\) and suppose that the function \(f:E\rightarrow {\mathbb R}^{p}\) is continuous at \({\mathbf x}\in E\). Then if \({\mathbf x}_{n}\in E\) and \({\mathbf x}_{n}\rightarrow {\mathbf x}\) it follows that \(f({\mathbf x}_{n})\rightarrow f({\mathbf x})\).

However, more advanced work uses generalisations of the following lemma. (Remember that if \(f:X\rightarrow Y\) is a function and \(A\subseteq Y\) we write \[f^{-1}(A)=\{x\in X\, :\, f(x)\in A\}\] and this notation does not require \(f\) to be invertible.)

Lemma 5.15. The function \(f:{\mathbb R}^{m}\rightarrow {\mathbb R}^{p}\) is continuous if and only if \(f^{-1}(O)\) is open whenever \(O\) is open.

As a simple example of how Lemma 5.15 can be used contrast the ‘\(\epsilon \), \(\delta \) proof’ of the next lemma using Deﬁnition 5.12 with a proof using Lemma 5.15.

Lemma 5.16. If \(f:{\mathbb R}^{m}\rightarrow {\mathbb R}^{p}\) and \(g:{\mathbb R}^{p}\rightarrow {\mathbb R}^{q}\) are continuous then so is their composition \(g\circ f\).

(Recall that we write \(g\circ f({\mathbf x})=g(f({\mathbf x}))\).)

Although we shall not make much use of it, the following is an important consequence of the Bolzano–Weierstrass theorem.

Theorem 5.17. Suppose that \(F_{1}\), \(F_{2}\), …are non-empty bounded closed sets in \({\mathbb R}^{m}\) such that \(F_{1}\supseteq F_{2}\supseteq \dots \). Then \(\bigcap _{j=1}^{\infty }F_{j}\neq \emptyset .\)

That is, the intersection of bounded, closed, nested non-empty sets is itself non-empty. The example \(F_{j}=[j,\infty )\) shows that ‘bounded’ cannot be dropped from the hypothesis. The example \(F_{j}=(0,j^{-1})\) shows that ‘closed’ cannot be dropped from the hypothesis.

Finally, it is worth emphasising that openness is not an intrinsic property of a set but depends on the space in which the set finds itself.

Example 5.18. The interval \((0,1)\) is open in \(\mathbb R\), but the set \(\{(x,0)\in {\mathbb R}^{2}:0<x<1\}\) is not open in \({\mathbb R}^{2}\).

(If you have a sufficiently logical mind Example 5.18 is otiose. Few of us have that kind of mind.)

This section contains the core of a ﬁrst course in analysis. For completeness I shall include some theorems already proved in course C5/6. The ﬁrst such theorem is the intermediate value theorem proved in C5/6 by successive bisection. That proof is entirely satisfactory but, as an exercise, we shall prove the result using the method of Bolzano–Weierstrass.

Theorem 6.1 (Intermediate value theorem). If \(f:[a,b]\rightarrow {\mathbb R}\) is continuous and \(f(a)\leq 0\leq f(b)\) then there exists a \(c\in [a,b]\) such that \(f(c)=0\).

You should compare this result with part (i) of Example 1.1. Note also that the result depends on \(f\) taking values in \(\mathbb R\) rather than \({\mathbb R}^{2}\) or \(\mathbb C\).

Example 6.2. If \(f:[0,1]\rightarrow {\mathbb C}\) is given by \(f(t)=\exp \pi it\) then \(f\) is continuous, \(f(0)=1\), \(f(1)=-1\) but \(f(t)\neq 0\) for all \(t\).
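Returning to real-valued functions, the successive-bisection proof of the intermediate value theorem mentioned above is effective: it is an algorithm for locating the promised \(c\). A floating-point sketch, illustrative only:

```python
# Successive bisection: at every step the interval [a, b] still
# satisfies f(a) <= 0 <= f(b), and its length halves, so the ends
# close in on a root c of f.
def bisect_root(f, a, b, steps=60):
    """Assumes f continuous on [a, b] with f(a) <= 0 <= f(b)."""
    for _ in range(steps):
        mid = (a + b) / 2
        if f(mid) <= 0:
            a = mid          # a root still lies in [mid, b]
        else:
            b = mid          # a root still lies in [a, mid]
    return (a + b) / 2

c = bisect_root(lambda x: x**3 - 2, 0.0, 2.0)
assert abs(c**3 - 2) < 1e-9     # c is (numerically) a root
```

The continuity hypothesis is what guarantees that the point the two ends squeeze down on really is a root; Example 1.1 shows what goes wrong without completeness, and Example 6.2 what goes wrong for complex-valued \(f\).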

Our next result looks a little abstract at ﬁrst.

Theorem 6.3. Let \(K\) be a closed bounded subset of \({\mathbb R}^{m}\) and \(f:K\rightarrow {\mathbb R}^{p}\) a continuous function. Then \(f(K)\) is closed and bounded.

Thus the continuous image of a closed bounded set is closed and bounded. (The example \(m=2\), \(p=1\), \[K=\{(x,y)\, : \, y\geq x^{-1},\ x>0\},\] \(f(x,y)=x\) shows that the continuous image of a closed set need not be closed. The example \(m=1\), \(p=1\), \(K\) the open interval \((0,1)\), \(f(x)=x^{-1}\), shows that the continuous image of a bounded set need not be bounded.)

We derive a much more concrete corollary.

Theorem 6.4. Let \(K\) be a non-empty closed bounded subset of \({\mathbb R}^{m}\) and \(f:K\rightarrow {\mathbb R}\) a continuous function. Then we can ﬁnd \({\mathbf k}_{1}\) and \({\mathbf k}_{2}\) in \(K\) such that \[f({\mathbf k}_{1})\leq f({\mathbf k})\leq f({\mathbf k}_{2})\] for all \({\mathbf k}\in K\).

In other words a real valued continuous function on a closed bounded set is bounded and attains its bounds. Less usefully, we may say that, in this case, \(f\) actually has a maximum and a minimum. Notice that there is no analogous result for vector valued functions. Much of economics consists in an attempt to disguise this fact (there is unlikely to be a state of the economy in which everybody is best off).

Theorem 6.4 has an even more concrete consequence.

Theorem 6.5. Let \(f:[a,b]\rightarrow {\mathbb R}\) be a continuous function. Then we can find \(k_{1},k_{2}\in [a,b]\) such that \[f(k_{1})\leq f(k)\leq f(k_{2})\] for all \(k\in [a,b]\).

In course C5/6 you used this result to prove Rolle’s theorem.

Theorem 6.6 (Rolle’s theorem). If \(f:[a,b]\rightarrow {\mathbb R}\) is a continuous function with \(f\) diﬀerentiable on \((a,b)\) and \(f(a)=f(b)\) then we can ﬁnd a \(c\in (a,b)\) such that \(f'(c)=0\).

By ‘tilting Rolle’s theorem’ we obtain the mean value theorem (in one dimension).

Theorem 6.7 (The one dimensional mean value theorem). If \(f:[a,b]\rightarrow {\mathbb R}\) is a continuous function with \(f\) diﬀerentiable on \((a,b)\) then we can ﬁnd a \(c\in (a,b)\) such that \[f(b)-f(a)=(b-a)f'(c).\]

Historically, mathematicians were slow to grasp the importance of the mean value theorem and the subtlety of its proof. Fallacious proofs still appeared in textbooks (British textbooks, admittedly) in the early years of the 20th century. (The proofs were fallacious in the strictest sense. They ‘proved’ the impossibility of the result given in part (ii) of Example 1.1.)

As you saw in course C5/6, the mean value theorem Theorem 6.7 has the following key corollaries.

Theorem 6.8. Suppose that \(f:(\alpha ,\beta )\rightarrow {\mathbb R}\) is a diﬀerentiable function.

(i) If \(f'(x)\geq 0\) for all \(x\in (\alpha ,\beta )\) then \(f\) is an increasing function on \((\alpha ,\beta )\) (that is \(f(x)\leq f(y)\) whenever \(\alpha <x<y<\beta \)).

(ii) If \(f'(x)> 0\) for all \(x\in (\alpha ,\beta )\) then \(f\) is a strictly increasing function on \((\alpha ,\beta )\) (that is \(f(x)<f(y)\) whenever \(\alpha <x<y<\beta \)).

(iii) If \(f'(x)=0\) for all \(x\in (\alpha ,\beta )\) then \(f\) is constant.

(iv) If \(|f'(x)|\leq K\) for all \(x\in (\alpha ,\beta )\) then \(|f(u)-f(v)|\leq K|u-v|\) for all \(u,v\in (\alpha ,\beta )\).

Part (iii) of Theorem 6.8 is the result that tells us that the solutions (if any) of \(f'(x)=g(x)\) on \((\alpha ,\beta )\) differ only by the addition of a constant (the so-called ‘constant of integration’).

Part (iv) of Theorem 6.8 is an excellent example of the way that the theorems of this section (and many other theorems of analysis) convert local information into global information. We know that \(|f'(x)|\leq K\), so that locally the rate of change of \(f\) is no greater than \(K\). We deduce that \(|f(u)-f(v)|\leq K|u-v|\), so that globally the rate of change of \(f\) is no greater than \(K\).
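As a spot check of part (iv) for a familiar function (a numerical illustration, not a proof): \(f=\sin \) has \(|f'(x)|=|\cos x|\leq 1\) everywhere, so the theorem makes \(\sin \) Lipschitz with constant \(1\) on the whole line.

```python
import math
import random

# |sin'(x)| = |cos x| <= 1, so Theorem 6.8(iv) gives
# |sin u - sin v| <= |u - v| for all u, v: a local derivative bound
# becomes a global Lipschitz bound.  (The 1e-12 slack absorbs
# floating-point rounding.)
random.seed(0)
for _ in range(1000):
    u = random.uniform(-10.0, 10.0)
    v = random.uniform(-10.0, 10.0)
    assert abs(math.sin(u) - math.sin(v)) <= abs(u - v) + 1e-12
```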

The ﬁnal theorem of this section (Theorem 6.12 on uniform continuity) is another excellent example of the conversion of local information to global. We need a couple of deﬁnitions and examples. Recall ﬁrst from Deﬁnition 5.12 what it means for a function to be continuous on a set.

Deﬁnition 6.9. Let \(E\subseteq {\mathbb R}^{m}\). We say that a function \(f:E\rightarrow {\mathbb R}^{p}\) is continuous on \(E\) if given any point \({\mathbf x}\in E\) and any \(\epsilon >0\) we can ﬁnd a \(\delta (\epsilon ,{\mathbf x})>0\) such that whenever \({\mathbf y}\in E\) and \(\|{\mathbf x}-{\mathbf y}\|<\delta (\epsilon ,{\mathbf x})\) we have \[\|f({\mathbf x})-f({\mathbf y})\|<\epsilon .\]

Now compare that deﬁnition with our deﬁnition of uniform continuity.

Deﬁnition 6.10. Let \(E\subseteq {\mathbb R}^{m}\). We say that a function \(f:E\rightarrow {\mathbb R}^{p}\) is uniformly continuous on \(E\) if given any \(\epsilon >0\) we can ﬁnd a \(\delta (\epsilon )>0\) such that whenever \({\mathbf x},{\mathbf y}\in E\) and \(\|{\mathbf x}-{\mathbf y}\|<\delta (\epsilon )\) we have \[\|f({\mathbf x})-f({\mathbf y})\|<\epsilon .\]

Example 6.11. The following three functions are continuous but not uniformly continuous.

(i) \(f_{1}:{\mathbb R}\rightarrow {\mathbb R}\) given by \(f_{1}(x)=x^{2}\).

(ii) \(f_{2}:(0,1)\rightarrow {\mathbb R}\) given by \(f_{2}(x)=x^{-1}\).

(iii) \(f_{3}:(0,1)\rightarrow {\mathbb R}\) given by \(f_{3}(x)=\sin (x^{-1})\).
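To see numerically why \(f_{1}\) fails to be uniformly continuous (a sketch of Example 6.11 (i); the particular numbers are mine): for a ﬁxed \(\delta\) the points \(x\) and \(x+\delta/2\) lie within \(\delta\) of each other, yet \(|f_{1}(x+\delta/2)-f_{1}(x)|=(\delta/2)(2x+\delta/2)\) grows without bound as \(x\) grows, so no single \(\delta(\epsilon)\) can serve for all \(x\).

```python
# f1(x) = x^2 is continuous but not uniformly continuous on R:
# points delta/2 apart can have images arbitrarily far apart.
def f1(x):
    return x * x

delta = 1e-3
# same horizontal gap delta/2 at three widely spaced base points
gaps = [abs(f1(x + delta / 2) - f1(x)) for x in (1.0, 1e3, 1e6)]
```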

Theorem 6.12. Let \(K\) be a closed bounded subset of \({\mathbb R}^{m}\). If \(f:K\rightarrow {\mathbb R}^{p}\) is continuous on \(K\) then \(f\) is uniformly continuous on \(K\).

The results of this section are the foundations of all analysis. It took 200 years to found analysis on the same axiomatic basis as Greek geometry and the successful completion of the project was a triumph of the human intellect. If you can trace a complete path from the fundamental axiom to the one dimensional mean value theorem you will have mastered the essential point of this course.

A function is continuous if it is locally approximately constant. A function is diﬀerentiable if it is locally approximately linear. More precisely, a function is continuous at a point \(\mathbf x\) if it is locally approximately constant with an error which decreases to zero as we approach \(\mathbf x\). A function is diﬀerentiable at a point \(\mathbf x\) if it is locally approximately linear with an error which decreases to zero faster than linearly as we approach \(\mathbf x\).

Deﬁnition 7.1. Suppose that \(E\) is a subset of \({\mathbb R}^{m}\) and \(\mathbf x\) a point such that there exists a \(\delta >0\) with \(B({\mathbf x},\delta )\subseteq E\). We say that \(f:E\rightarrow {\mathbb R}^{p}\) is diﬀerentiable at \(\mathbf x\) if we can ﬁnd a linear map \(\alpha :{\mathbb R}^{m}\rightarrow {\mathbb R}^{p}\) such that, when \(\|{\mathbf h}\|<\delta \),

\begin{equation*} f({\mathbf x}+{\mathbf h})=f({\mathbf x})+\alpha{\mathbf h} +{\boldsymbol\epsilon}({\mathbf x},{\mathbf h})\|{\mathbf h}\| \tag*{$\bigstar$} \end{equation*}

where \(\|{\boldsymbol \epsilon }({\mathbf x},{\mathbf h})\|\rightarrow 0\) as \(\|{\mathbf h}\|\rightarrow 0\). We write \(\alpha =Df({\mathbf x})\) or \(\alpha =f'({\mathbf x})\).

If \(E\) is open and \(f\) is diﬀerentiable at each point of \(E\) we say that \(f\) is diﬀerentiable on \(E\).

Needless to say, the centre of the deﬁnition is the formula \(\bigstar \) and the reader should concentrate on understanding the rôle of each term in that formula. The rest of the deﬁnition is just supporting waﬄe. Formula \(\bigstar \) is sometimes rewritten \[\frac {f({\mathbf x}+{\mathbf h})-f({\mathbf x}) -\alpha {\mathbf h}}{\|{\mathbf h}\|}\rightarrow 0\] as \(\|{\mathbf h}\|\rightarrow 0\).

You have already studied diﬀerentiation in course C5/6 and you regularly use partial diﬀerentiation in applied courses and elsewhere. I shall therefore concentrate on recalling some of the key points.

Lemma 7.2. Let \(f\) be as in Deﬁnition 7.1. If we use standard coordinates then if \(f\) is diﬀerentiable at \(\mathbf x\) its partial derivatives \(\displaystyle f_{i,j}({\mathbf x})= \frac {\partial f_{i}}{\partial x_{j}}\) exist and the matrix of the derivative \(Df({\mathbf x})\) is the Jacobian matrix \((f_{i,j}({\mathbf x}))\) of partial derivatives.
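A quick numerical sanity check of Lemma 7.2 (the map \(f(x,y)=(x^{2}y,\ \sin x+y)\) is a made-up example, not from the notes): the Jacobian matrix of partial derivatives should agree with central ﬁnite differences.

```python
import math

# Toy map f(x, y) = (x^2 y, sin x + y); its Jacobian is
# [[2xy, x^2], [cos x, 1]].
def f(x, y):
    return (x * x * y, math.sin(x) + y)

def jacobian(x, y):
    return [[2 * x * y, x * x],
            [math.cos(x), 1.0]]

def numeric_jacobian(x, y, h=1e-6):
    # central difference approximation to each partial derivative
    fx1, fx0 = f(x + h, y), f(x - h, y)
    fy1, fy0 = f(x, y + h), f(x, y - h)
    return [[(fx1[i] - fx0[i]) / (2 * h), (fy1[i] - fy0[i]) / (2 * h)]
            for i in range(2)]

J = jacobian(0.7, -1.3)
Jnum = numeric_jacobian(0.7, -1.3)
```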

It is customary to point out that the existence of the partial derivatives does not imply the diﬀerentiability of the function (see Example 8.5 below) but the main objections to over-reliance on partial derivatives are that this makes formulae cumbersome and stiﬂes geometric intuition. Let your motto be ‘coordinates and matrices for calculation, vectors and linear maps for understanding’.

One key property of diﬀerentiation is the chain rule. You have already seen this discussed and proved in course C5/6. Our discussion will run along the same lines but is slightly neater.

Lemma 7.3. If \(\alpha :{\mathbb R}^{m}\rightarrow {\mathbb R}^{p}\) is a linear map we can ﬁnd a constant \(K\) such that \(\|\alpha {\mathbf x}\|\leq K\|{\mathbf x}\|\) for all \({\mathbf x}\in {\mathbb R}^{m}\).

We can now make the following deﬁnition.

Deﬁnition 7.4. If \(\alpha :{\mathbb R}^{m}\rightarrow {\mathbb R}^{p}\) is a linear map then \[\|\alpha \|=\sup _{\|{\mathbf x}\|\leq 1}\|\alpha {\mathbf x}\|.\]

The following easy exercise provides an equivalent deﬁnition.

Exercise 7.5. Show that if \(\alpha :{\mathbb R}^{m}\rightarrow {\mathbb R}^{p}\) is a linear map then \[\|\alpha \|=\sup _{\|{\mathbf x}\|= 1}\|\alpha {\mathbf x}\|.\]
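The supremum in the operator norm can be explored numerically. In this sketch (the matrix is my own example) we sample unit vectors \((\cos t,\sin t)^{T}\) and take the largest \(\|\alpha {\mathbf x}\|\); for the diagonal matrix with entries \(3\) and \(1\) the supremum is attained at \((1,0)^{T}\) and equals \(3\).

```python
import math

# Linear map with matrix A = [[3, 0], [0, 1]]; its operator norm is 3.
A = [[3.0, 0.0], [0.0, 1.0]]

def apply(A, x):
    return [A[0][0] * x[0] + A[0][1] * x[1],
            A[1][0] * x[0] + A[1][1] * x[1]]

def norm(x):
    return math.hypot(x[0], x[1])

# sample unit vectors x = (cos t, sin t) and take the largest ||A x||
samples = [(math.cos(t), math.sin(t))
           for t in [k * 2 * math.pi / 10000 for k in range(10000)]]
op_norm_estimate = max(norm(apply(A, x)) for x in samples)
```

Sampling only estimates the supremum from below; here the maximising vector \((1,0)^{T}\) is among the samples, so the estimate is exact.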

This ‘operator norm’ has many pleasant properties.

Lemma 7.6. Let \(\alpha ,\beta :{\mathbb R}^{m}\rightarrow {\mathbb R}^{p}\) be linear maps.

(i) If \({\mathbf x}\in {\mathbb R}^{m}\) then \(\|\alpha {\mathbf x}\|\leq \|\alpha \|\,\|{\mathbf x}\|\).

(ii) \(\|\alpha \|\geq 0\).

(iii) If \(\|\alpha \|= 0\) then \(\alpha =0\).

(iv) If \(\lambda \in {\mathbb R}\) then \(\|\lambda \alpha \|=|\lambda |\|\alpha \|\).

(v) (The triangle inequality) \(\|\alpha +\beta \|\leq \|\alpha \|+\|\beta \|\).

(vi) If \(\gamma :{\mathbb R}^{p}\rightarrow {\mathbb R}^{q}\) then \(\|\gamma \alpha \|\leq \|\gamma \|\,\|\alpha \|\).

Lemma 7.7 (The chain rule). Let \(U\) be a neighbourhood of \(\mathbf x\) in \({\mathbb R}^{m}\) and \(V\) a neighbourhood of \(\mathbf y\) in \({\mathbb R}^{p}\). Suppose that \(f:U\rightarrow V\) is diﬀerentiable at \(\mathbf x\) with derivative \(\alpha \), that \(g:V\rightarrow {\mathbb R}^{q}\) is diﬀerentiable at \(\mathbf y\) with derivative \(\beta \) and that \(f({\mathbf x})={\mathbf y}\). Then \(g\circ f\) is diﬀerentiable at \(\mathbf x\) with derivative \(\beta \alpha \).

In more condensed notation

\begin{equation*} D(g\circ f)({\mathbf x})=Dg(f({\mathbf x}))Df({\mathbf x}). \tag*{$\bigstar\bigstar$} \end{equation*}

It is important to see that formula \(\bigstar \bigstar \) is exactly what we should expect and that its proof is simple book-keeping.
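To see the book-keeping in a concrete case (both maps are my own toy examples): take \(f(t)=(t^{2},t^{3})\) and \(g(u,v)=uv\), so \(g\circ f(t)=t^{5}\). The product \(Dg(f(t))\,Df(t)\) is the \(1\times 2\) row \((t^{3},t^{2})\) times the \(2\times 1\) column \((2t,3t^{2})^{T}\), giving \(2t^{4}+3t^{4}=5t^{4}\), exactly the derivative of \(t^{5}\).

```python
# Chain rule check with f(t) = (t^2, t^3) and g(u, v) = u*v,
# so that (g o f)(t) = t^5 and D(g o f)(t) = 5 t^4.
def composed_derivative(t):
    dg = [t ** 3, t ** 2]        # gradient of g at f(t) = (t^2, t^3)
    df = [2 * t, 3 * t ** 2]     # derivative (column vector) of f at t
    # 1x2 times 2x1 matrix product
    return dg[0] * df[0] + dg[1] * df[1]

t = 1.7
lhs = composed_derivative(t)
rhs = 5 * t ** 4
```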

Remark One of the most troublesome culture clashes between pure mathematics and applied is that to an applied mathematician variables like \(x\) and \(t\) have meanings such as position and time whereas to a pure mathematician all variables are ‘dummy variables’ or ‘place-holders’ to be interchanged at will. To a pure mathematician \(v\) is an arbitrary function deﬁned by its eﬀect on a variable so that \(v(t)=At^{3}\) means precisely the same thing as \(v(x)=Ax^{3}\) whereas to an applied mathematician who thinks of \(v\) as a velocity the statements \(v(t)=At^{3}\) and \(v(x)=Ax^{3}\) mean very diﬀerent (indeed incompatible) things. The applied mathematician writes \[\frac {dv}{dt}=\frac {dv}{dx}\frac {dx}{dt}\] but the nearest the pure mathematician can get to this statement is \[\frac {d\ }{dt}v(x(t))=v'(x(t))x'(t).\] The same problems occur in still more confusing form when we deal with partial derivatives. In particular the formula \[\frac {\partial f}{\partial x_{i}} =\sum _{k=1}^{n}\frac {\partial f}{\partial y_{k}} \frac {\partial y_{k}}{\partial x_{i}}\] makes no sense without using further unspoken conventions. The only remedy that I can suggest is ‘think like a pure mathematician when doing pure mathematics and like an applied mathematician when doing applied mathematics’.

The following simple result has a simple direct proof but it is instructive to prove it by the chain rule.

Lemma 7.8. Let \(U\) be a neighbourhood of \(\mathbf x\) in \({\mathbb R}^{m}\). Suppose that \(f,g:U\rightarrow {\mathbb R}^{p}\) are diﬀerentiable at \(\mathbf x\). Then \(f+g\) is diﬀerentiable at \(\mathbf x\) and \(D(f+g)({\mathbf x})=Df({\mathbf x})+Dg({\mathbf x})\).

So far our study of diﬀerentiation in higher dimensions has remained on the level of mere algebra. (The deﬁnition of the operator norm used the supremum and so lay deeper but we could have avoided the use of the operator norm as was done in course C5/6.) The next result is a true theorem of analysis.

Theorem 7.9 (The mean value inequality). Suppose that \(U\) is an open set in \({\mathbb R}^{m}\) and that \(f:U\rightarrow {\mathbb R}^{p}\) is diﬀerentiable. Consider the straight line segment \[L=\{(1-t){\mathbf a}+t{\mathbf b}:0\leq t\leq 1\}\] joining \(\mathbf a\) and \(\mathbf b\). If \(L\subseteq U\) (i.e. \(L\) lies entirely within \(U\)) and \(\|Df({\mathbf x})\|\leq K\) for all \({\mathbf x}\in L\) then \[\|f({\mathbf a})-f({\mathbf b})\| \leq K\|{\mathbf a}-{\mathbf b}\|.\]

The reader may be disappointed that a theorem involving an equality in one dimension should be replaced by one involving an inequality in many dimensions. However, inspection of the use of the mean value theorem in one dimension shows that its essential content is the statement that ‘local growth bounds global growth’ and this statement is precisely the mean value inequality.

In any case the following example shows that we can not hope for an exact analogue of the one dimensional theorem.

Example 7.10. Let \(f:{\mathbb R}\rightarrow {\mathbb R}^{2}\) be given by \(f(t)=(\cos t,\sin t)^{T}\). Then \(f(0)=f(2\pi )\) but \(Df(t)\neq 0\) for all \(t\).

It is also clear (though we shall not prove it, and, indeed can not state it without using concepts which we have not formally deﬁned) that the correct generalisation when \(L\) is not a straight line will run as follows. ‘If \(L\) is a well behaved path lying entirely within \(U\) and \(\|Df({\mathbf x})\|\leq K\) for all \({\mathbf x}\in L\) then \(\|f({\mathbf a})-f({\mathbf b})\| \leq K\times \operatorname {length}L\)’.

Example 7.11. Let \[U=\{{\mathbf x}\in {\mathbb R}^{2}:\|{\mathbf x}\|>1\} \setminus \{(x,0)^{T}:x\leq 0\}.\] If we take \(\theta ({\mathbf x})\) to be the unique solution of \[\cos (\theta ({\mathbf x}))=x/\|{\mathbf x}\|,\ \sin (\theta ({\mathbf x}))=y/\|{\mathbf x}\|, \ -\pi <\theta (\mathbf x)<\pi \] for \({\mathbf x}=(x,y)^{T}\in U\) then \(\theta :U\rightarrow {\mathbb R}\) is everywhere diﬀerentiable with \(\|D\theta ({\mathbf x})\|<1\). However, if \(\mathbf {a}=(-1,10^{-1})^{T}\), \(\mathbf {b}=(-1,-10^{-1})^{T}\) then \[|\theta (\mathbf {a})-\theta (\mathbf {b})|> \|\mathbf {a}-\mathbf {b}\|.\]

In this section we exploit the mean value inequality to obtain information about the behaviour of reasonably well behaved functions \(f:{\mathbb R}^{m}\rightarrow {\mathbb R}\) in the neighbourhood of a point \(\mathbf x\). (Since we can always translate we shall sometimes put \({\mathbf x}={\mathbf 0}\) without comment.) It may be worth alerting the reader to the fact that we shall be using the one-dimensional mean value inequality (see Theorem 6.8 (iv)) in forms like the following. ‘If \(|f_{,1}(x,c)|\leq K\) for some ﬁxed \(c\) and all \(a\leq x\leq b\) then \(|f(a,c)-f(b,c)|\leq K|b-a|\).’ In this section we shall use row rather than column vectors.

Here is our ﬁrst example.

Theorem 8.1 (Continuity of partial derivatives implies diﬀerentiability). Suppose \(\delta >0\), \({\mathbf x}=(x,y)\in {\mathbb R}^{2}\), \(B({\mathbf x},\delta )\subseteq E\subseteq {\mathbb R}^{2}\) and that \(f:E\rightarrow {\mathbb R}\). If the partial derivatives \(f_{,1}\) and \(f_{,2}\) exist in \(B({\mathbf x},\delta )\) and are continuous at \(\mathbf x\) then writing \[f((x+h,y+k))=f(x,y)+f_{,1}(x,y)h+f_{,2}(x,y)k+ \epsilon (h,k)(h^{2}+k^{2})^{1/2}\] we have \(\epsilon (h,k)\rightarrow 0\) as \((h^{2}+k^{2})^{1/2}\rightarrow 0\). (In other words \(f\) is diﬀerentiable at \(\mathbf x\).)

Although this is not one of the great theorems of all time (or indeed of this course) it does provide a useful short cut for proving functions diﬀerentiable. The following easy extension is left to the reader.

Theorem 8.2. (i) Suppose \(\delta >0\), \({\mathbf x}\in {\mathbb R}^{m}\), \(B({\mathbf x},\delta )\subseteq E\subseteq {\mathbb R}^{m}\) and that \(f:E\rightarrow {\mathbb R}\). If the partial derivatives \(f_{,1}\), \(f_{,2}\), …\(f_{,m}\) exist in \(B({\mathbf x},\delta )\) and are continuous at \(\mathbf x\) then \(f\) is diﬀerentiable at \(\mathbf x\).

(ii) Suppose \(\delta >0\), \({\mathbf x}\in {\mathbb R}^{m}\), \(B({\mathbf x},\delta )\subseteq E\subseteq {\mathbb R}^{m}\) and that \(f:E\rightarrow {\mathbb R}^{p}\). If the partial derivatives \(f_{i,j}\) exist in \(B({\mathbf x},\delta )\) and are continuous at \(\mathbf x\) \([1\leq i\leq p,\ 1\leq j\leq m]\) then \(f\) is diﬀerentiable at \(\mathbf x\).

The next result, although along similar lines, is much more interesting. We write \[f_{,ij}({\mathbf x})=(f_{,j})_{,i}({\mathbf x}),\] or, in more familiar notation \[f_{,ij}=\frac {\partial ^{2}f}{\partial x_{i}\partial x_{j}}.\]

Theorem 8.3 (Second order Taylor series). Suppose \(\delta >0\), \({\mathbf x}=(x,y)\in {\mathbb R}^{2}\), \(B({\mathbf x},\delta )\subseteq E\subseteq {\mathbb R}^{2}\) and that \(f:E\rightarrow {\mathbb R}\). If the partial derivatives \(f_{,1}\), \(f_{,2}\), \(f_{,11}\), \(f_{,12}\), \(f_{,22}\) exist in \(B({\mathbf x},\delta )\) and are continuous at \(\mathbf x\) then writing \begin {align*} f((x+h,y+k))=&f(x,y)+f_{,1}(x,y)h+f_{,2}(x,y)k\\ &+\tfrac {1}{2}(f_{,11}(x,y)h^{2}+2f_{,12}(x,y)hk +f_{,22}(x,y)k^{2})+\epsilon (h,k)(h^{2}+k^{2}) \end {align*}

we have \(\epsilon (h,k)\rightarrow 0\) as \((h^{2}+k^{2})^{1/2}\rightarrow 0\).

We have the following important corollary.

Theorem 8.4 (Symmetry of second partial derivative). Suppose \(\delta >0\), \({\mathbf x}=(x,y)\in {\mathbb R}^{2}\), \(B({\mathbf x},\delta )\subseteq E\subseteq {\mathbb R}^{2}\) and that \(f:E\rightarrow {\mathbb R}\). If the partial derivatives \(f_{,1}\), \(f_{,2}\), \(f_{,11}\), \(f_{,12}\), \(f_{,21}\), \(f_{,22}\) exist in \(B({\mathbf x},\delta )\) and are continuous at \(\mathbf x\) then \(f_{,12}({\mathbf x})=f_{,21}({\mathbf x})\).

Stripped of all the ﬁne (and for practical purposes irrelevant) detail this tells us that we can interchange the order of partial diﬀerentiation for well behaved functions.

It is possible to produce plausibility arguments for the symmetry of second partial derivatives. Here are a couple.

(1) If \(f\) is a multinomial, i.e. \(f(x,y)=\sum _{p=0}^{P}\sum _{q=0}^{Q}a_{p,q}x^{p}y^{q}\), then \(f_{,12}=f_{,21}\). But smooth functions are very close to being polynomial so we would expect the result to be true in general.

(2) Although we can not interchange limits in general it is plausible that if \(f\) is well behaved then \begin {align*} f_{,12}(x,y)&=\lim _{h\rightarrow 0}\lim _{k\rightarrow 0} h^{-1}k^{-1}(f(x+h,y+k)-f(x+h,y)-f(x,y+k)+f(x,y))\\ &=\lim _{k\rightarrow 0}\lim _{h\rightarrow 0} h^{-1}k^{-1}(f(x+h,y+k)-f(x+h,y)-f(x,y+k)+f(x,y))\\ &=f_{,21}(x,y). \end {align*}
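The interchange of limits in (2) can be watched numerically (the smooth function \(f(x,y)=e^{x}\sin y\) is my own choice): the double difference quotient approaches \(f_{,12}(x,y)=e^{x}\cos y\) whichever of \(h\) and \(k\) is taken small ﬁrst.

```python
import math

# Smooth test function f(x, y) = exp(x) sin(y), whose mixed second
# partial derivative is exp(x) cos(y) in either order.
def f(x, y):
    return math.exp(x) * math.sin(y)

def double_quotient(x, y, h, k):
    # (f(x+h, y+k) - f(x+h, y) - f(x, y+k) + f(x, y)) / (h k)
    return (f(x + h, y + k) - f(x + h, y) - f(x, y + k) + f(x, y)) / (h * k)

x, y = 0.3, 0.8
h_first = double_quotient(x, y, 1e-5, 1e-3)   # h much smaller than k
k_first = double_quotient(x, y, 1e-3, 1e-5)   # k much smaller than h
exact = math.exp(x) * math.cos(y)
```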

However, these are merely plausible arguments. They do not make clear the role of the continuity of the second derivative (in Example 8.6 we shall see that the result may fail for discontinuous second partial derivatives). More fundamentally they are algebraic arguments and, as the use of the mean value theorem indicates, the result is one of analysis.

Theorems 8.1 and 8.4 are inextricably linked in many examiners’ minds with the following two counter-examples. The present lecturer would rather you forgot all the course than that you forgot how to prove the theorems and remembered the counter-examples but will not let his personal feelings get in the way of his duty.

Suppose \(f:{\mathbb R}^{2}\rightarrow {\mathbb R}\) has \(f({\mathbf 0})=0\). We write \(g(r,\theta )=f(r\cos \theta ,r\sin \theta )\) (i.e. use polar coordinates). If \(f\) is once diﬀerentiable then, to ﬁrst order in \(r\),

(i) \(g(r,\theta )\) grows like \(r\),

(ii) the contours \(f(x,y)=c\) are parallel lines.

If we seek counter-examples it is reasonable to look for an \(f\) such that (i) is true but (ii) is not. We might therefore be led to consider \(g(r,\theta )=r\sin 2\theta \). (Finding counter-examples is more like tinkering with a meccano kit than anything else.)

Example 8.5. If \begin {align*} f(x,y)&=\frac {xy}{(x^{2}+y^{2})^{1/2}}\qquad \text {for $(x,y)\neq (0,0)$},\\ f(0,0)&=0, \end {align*}

then \(f\) has ﬁrst partial derivatives everywhere and is diﬀerentiable at every point except \((0,0)\), where it is not diﬀerentiable.

Emboldened by our success we could well guess immediately a suitable function to look for in the context of Theorem 8.4. If not, let us consider \(f\) and \(g\) as before. Suppose that \[f({\mathbf 0})=f_{,1}({\mathbf 0})=f_{,2}({\mathbf 0})=0.\] If \(f\) obeys the conclusions of Theorem 8.3 (i.e. has a second order Taylor series) then, to second order in \(r\),

(i) \(g(r,\theta )\) grows like \(r^{2}\),

(ii) the contours \(f(x,y)=c\) are conics.

Once again we look for an \(f\) such that (i) is true but (ii) is not. We might, for example, consider \(g(r,\theta )=r^{2}\sin 4\theta \).

Example 8.6. If \begin {align*} f(x,y)&=\frac {xy(x^{2}-y^{2})}{(x^{2}+y^{2})} \qquad \text {for $(x,y)\neq (0,0)$},\\ f(0,0)&=0, \end {align*}

then \(f\) has ﬁrst and second partial derivatives everywhere but \(f_{,12}(0,0)\neq f_{,21}(0,0)\).
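A direct numerical check of Example 8.6 (a sketch, using the convention \(f_{,12}=(f_{,2})_{,1}\) from these notes): along the axes one ﬁnds \(f_{,1}(0,y)=-y\) and \(f_{,2}(x,0)=x\), so the two mixed second partial derivatives at the origin come out as \(-1\) and \(+1\) respectively.

```python
# f(x, y) = x y (x^2 - y^2) / (x^2 + y^2), f(0, 0) = 0, from Example 8.6.
def f(x, y):
    if (x, y) == (0.0, 0.0):
        return 0.0
    return x * y * (x * x - y * y) / (x * x + y * y)

def f1(x, y, h=1e-6):   # first partial derivative, central difference
    return (f(x + h, y) - f(x - h, y)) / (2 * h)

def f2(x, y, h=1e-6):   # second-variable partial derivative
    return (f(x, y + h) - f(x, y - h)) / (2 * h)

k = 1e-4
f21 = (f1(0.0, k) - f1(0.0, -k)) / (2 * k)   # (f_{,1})_{,2}(0,0), approx -1
f12 = (f2(k, 0.0) - f2(-k, 0.0)) / (2 * k)   # (f_{,2})_{,1}(0,0), approx +1
```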

It is profoundly unfortunate that the last contact many mathematicians have with this topic is a couple of complicated counter-examples. Multi-dimensional calculus leads towards diﬀerential geometry and inﬁnite dimensional calculus (functional analysis). Both subjects depend on understanding objects which we know to be well behaved but which our limited geometric intuition makes it hard for us to comprehend. Counter-examples such as the ones just produced are simply irrelevant.

What happens if a function is smooth (has partial derivatives of all orders)? The following theorem completes the discussion of this section. It is not on the syllabus and is left to the reader as an exercise (but a very instructive one).

Theorem 8.7 (The local Taylor’s theorem). Suppose \(\delta >0\), \({\mathbf x}\in {\mathbb R}^{m}\), \(B({\mathbf x},\delta )\subseteq E\subseteq {\mathbb R}^{m}\) and that \(f:E\rightarrow {\mathbb R}^{p}\). If all the partial derivatives \(f_{i,j}\), \(f_{i,jk}\), \(f_{i,jkl}\), …exist in \(B({\mathbf x},\delta )\) and are continuous at \(\mathbf x\) then writing \begin {align*} f_{i}({\mathbf x}+{\mathbf h})=f_{i}({\mathbf x})& +\sum _{j=1}^{m}f_{i,j}({\mathbf x})h_{j}+ \tfrac {1}{2!}\sum _{j=1}^{m}\sum _{k=1}^{m}f_{i,jk}({\mathbf x})h_{j}h_{k}\\ &+\dots +\text {terms up to $n$th powers of the $h_{j}$}+ \epsilon _{i}(\mathbf {h})\|\mathbf {h}\|^{n} \end {align*}

we have \(\|{\boldsymbol \epsilon }(\mathbf {h})\|\rightarrow 0\) as \(\|\mathbf {h}\|\rightarrow 0\).

The wide awake reader may observe that I used (gasp, horror!) coordinates in the statement of Theorem 8.7. However, the main formula can be stated in a coordinate free way as \[ f({\mathbf x}+{\mathbf h})=f({\mathbf x}) +\alpha _{1}({\mathbf h})+\alpha _{2}({\mathbf h},{\mathbf h}) +\dots +\alpha _{n}({\mathbf h},{\mathbf h},\dots ,{\mathbf h}) +{\boldsymbol \epsilon }(\mathbf {h})\|\mathbf {h}\|^{n}, \] where \(\alpha _{k}:{\mathbb R}^{m}\times {\mathbb R}^{m} \times \dots \times {\mathbb R}^{m}\rightarrow {\mathbb R}^{p}\) is linear in each variable (i.e. a \(k\)-linear function). For more details consult section 12 of chapter VIII of Dieudonné’s Foundations of Modern Analysis [5] where the higher derivatives are dealt with in a coordinate free way. Like Hardy’s book [6], Dieudonné’s is a masterpiece but in a very diﬀerent tradition.

Anyone who feels that the higher derivatives are best studied using coordinates should reﬂect that if \(f:{\mathbb R}^{3}\rightarrow {\mathbb R}^{3}\) is well behaved then the ‘third derivative behaviour’ of \(f\) at a single point is apparently given by the \(3\times 3\times 3\times 3=81\) numbers \(f_{i,jkl}(\mathbf {x})\). By symmetry (see Theorem 8.4) only \(30\) of the numbers are distinct but these \(30\) numbers are independent (consider polynomials in three variables for which the total degree of each term is \(3\)). How can we understand the information carried by an array of \(30\) real numbers?

(Much of this section was sketched in the course C5/6 but this treatment will supply some important extra detail.) Everybody knows what area is, but then everybody knows what honey tastes like. But does honey taste the same to you as it does to me? Perhaps the question is unanswerable but for many practical purposes it is suﬃcient that we agree on what we call honey. In the same way it is important that, when two mathematicians talk about area, (1) they should agree on which sets \(E\) actually have area, and (2) when a set \(E\) has area they should agree as to what the area is.

At one time some mathematicians may well have hoped that every bounded set in \({\mathbb R}^{2}\) could be given an area, every bounded set in \({\mathbb R}^{3}\) could be given a volume, and so on. However, Banach and Tarski showed that a ball in \({\mathbb R}^{3}\) could be split into seven parts which could be reassembled after rotation and translation into a ball of smaller radius. (The proof is non-trivial but there is a good discussion in Wagon’s book The Banach–Tarski Paradox [10].) If each of these seven parts had a volume then repeated Banach–Tarski decompositions would enable us to get a quart into a pint pot. Warned by this we shall not attempt to assign area to every subset of \({\mathbb R}^{2}\) or to integrate every function.

Turning speciﬁcally to the Riemann integral we consider bounded functions \(f:[a,b]\rightarrow {\mathbb R}\). (Thus there exists a \(K\) with \(|f(x)|\leq K\) for all \(x\in [a,b]\).) A dissection \(\mathcal {D}\) of \([a,b]\) is a ﬁnite subset of \([a,b]\) containing the end points \(a\) and \(b\). By convention we write \[\mathcal {D}=\{x_{0},x_{1},\dots ,x_{n}\} \ \text {with}\ a=x_{0}\leq x_{1}\leq x_{2}\leq \dots \leq x_{n}=b.\] We deﬁne the upper sum and lower sum of \(f\) associated with \(\mathcal {D}\) by \begin {align*} S(f,\mathcal {D})=&\sum _{j=1}^{n}(x_{j}-x_{j-1}) \sup _{x\in [x_{j-1},x_{j}]}f(x),\\ s(f,\mathcal {D})=&\sum _{j=1}^{n}(x_{j}-x_{j-1}) \inf _{x\in [x_{j-1},x_{j}]}f(x). \end {align*}
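As a worked illustration (the function and dissection are my own choices): for \(f(x)=x^{2}\) on \([0,1]\) with the uniform dissection \(x_{j}=j/n\), the function is increasing, so the sup and inf on each \([x_{j-1},x_{j}]\) are attained at the endpoints; then \(S-s=1/n\) while both sums squeeze down to \(1/3\).

```python
# Upper and lower sums for the increasing function f(x) = x^2 on [0, 1]
# with the uniform dissection x_j = j/n; sup and inf on each interval
# are the right and left endpoint values respectively.
def upper_lower(n):
    xs = [j / n for j in range(n + 1)]
    S = sum((xs[j] - xs[j - 1]) * xs[j] ** 2 for j in range(1, n + 1))
    s = sum((xs[j] - xs[j - 1]) * xs[j - 1] ** 2 for j in range(1, n + 1))
    return S, s

S, s = upper_lower(1000)
```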

The next lemma is hardly more than an observation but it is the key to the proper treatment of the integral.

Lemma 9.1 (Key integration property). If \(f:[a,b]\rightarrow {\mathbb R}\) is bounded and \(\mathcal {D}_{1}\) and \(\mathcal {D}_{2}\) are two dissections then

\begin{equation*} S(f,\mathcal{D}_{1})\geq S(f,\mathcal{D}_{1}\cup\mathcal{D}_{2}) \geq s(f,\mathcal{D}_{1}\cup\mathcal{D}_{2})\geq s(f,\mathcal{D}_{2}). \tag*{$\bigstar$} \end{equation*}

The inequality \(\bigstar \) tells us that, whatever dissection you pick and whatever dissection I pick, your lower sum can not exceed my upper sum. There is no way we can put a quart in a pint pot and the Banach–Tarski phenomenon is avoided.

Since \(S(f,\mathcal {D})\geq -(b-a)K\) for all dissections \(\mathcal {D}\) we can deﬁne the upper integral \(I^{*}(f)=\inf _{\mathcal {D}}S(f,\mathcal {D})\). We deﬁne the lower integral similarly as \(I_{*}(f)=\sup _{\mathcal {D}}s(f,\mathcal {D})\). The inequality \(\bigstar \) tells us that these concepts behave well.

If \(I^{*}(f)=I_{*}(f)\) we say that \(f\) is Riemann integrable and we write \[\int _{a}^{b}f(x)\,dx=I^{*}(f).\] We write \(\mathcal {R}[a,b]\) or sometimes just \(\mathcal {R}\) for the set of Riemann integrable functions on \([a,b]\).

The following lemma provides a convenient criterion for Riemann integrability.

Lemma 9.3. (i) A bounded function \(f:[a,b]\rightarrow {\mathbb R}\) is Riemann integrable if and only if given any \(\epsilon >0\) we can ﬁnd a dissection \(\mathcal D\) with \[S(f,{\mathcal D})-s(f,{\mathcal D})<\epsilon .\]

(ii) A bounded function \(f:[a,b]\rightarrow {\mathbb R}\) is Riemann integrable with integral \(I\) if and only if given any \(\epsilon >0\) we can ﬁnd a dissection \(\mathcal D\) with \[S(f,{\mathcal D})-s(f,{\mathcal D})<\epsilon , \ \text {and}\ S(f,{\mathcal D})\geq I\geq s(f,{\mathcal D}).\]

The reader who is tempted to start her treatment of the Riemann integral with Lemma 9.3 (ii) should reﬂect that without the inequality \(\bigstar \) she will ﬁnd it hard to prove the uniqueness of \(I\).

Even though we admit that we can not expect all functions to be integrable, any satisfactory theory of integration must have a large collection of integrable functions.

Theorem 9.4. (i) Any monotonic (that is any increasing or decreasing) function \(f:[a,b]\rightarrow {\mathbb R}\) is Riemann integrable.

(ii) Any continuous function \(f:[a,b]\rightarrow {\mathbb R}\) is Riemann integrable.

Part (i) of Theorem 9.4 looks a little odd. However most of the functions \(f\) met with in ‘every day life’, including many discontinuous ones, are the diﬀerence of two increasing functions (that is \(f=f_{1}-f_{2}\) with \(f_{1}\) and \(f_{2}\) increasing). Such functions are called ‘functions of bounded variation’ and have a very pretty, if old fashioned, theory of their own. Theorem 9.4 (i) combined with Lemma 9.7 tells us that every function of bounded variation is Riemann integrable. It is worth noting that Riemann did not have the theorem that every continuous function on a closed bounded interval is uniformly continuous and so was unable to prove that every continuous function is Riemann integrable.

The reader will recall from course C5/6 the standard example of a bounded function which is not Riemann integrable.

Example 9.5. If \(f:[0,1]\rightarrow {\mathbb R}\) is given by \begin {align*} f(x)=0&\qquad \text {when $x$ is rational,}\\ f(x)=1&\qquad \text {when $x$ is irrational,} \end {align*}

then \(f\) is not Riemann integrable.

The following results are easy to prove.

Lemma 9.6. Suppose that \(a\leq c\leq b\) and \(f:[a,b]\rightarrow {\mathbb R}\). If \(f|_{[a,c]}\in {\mathcal R}[a,c]\) and \(f|_{[c,b]}\in {\mathcal R}[c,b]\) then \(f\in {\mathcal R}[a,b]\) and \[\int _{a}^{b}f(x)\,dx=\int _{a}^{c}f(x)\,dx +\int _{c}^{b}f(x)\,dx .\]

If we adopt the convention that \(\int _{b}^{a}f(x)\,dx=-\int _{a}^{b}f(x)\,dx\) then the formula \[\int _{a}^{b}f(x)\,dx=\int _{a}^{c}f(x)\,dx +\int _{c}^{b}f(x)\,dx \] holds independently of the order of \(a\), \(b\) and \(c\).

Lemma 9.7. If \(\lambda ,\mu \in {\mathbb R}\) and \(f,g:[a,b]\rightarrow {\mathbb R}\) are Riemann integrable then so is \(\lambda f+\mu g\) and \[\int _{a}^{b}\bigl (\lambda f(x)+\mu g(x)\bigr )\,dx =\lambda \int _{a}^{b}f(x)\,dx+\mu \int _{a}^{b}g(x)\,dx.\]

In the language of linear algebra, \({\mathcal R}[a,b]\) is a vector space and the integral is a linear functional (i.e. a linear map from \({\mathcal R}[a,b]\) to \({\mathbb R}\)).

Lemma 9.8. If \(f,g:[a,b]\rightarrow {\mathbb R}\) are Riemann integrable and \(f(x)\geq g(x)\) for all \(x\in [a,b]\) then \[\int _{a}^{b}f(x)\,dx\geq \int _{a}^{b}g(x)\,dx.\]

The next result requires a small idea.

Lemma 9.10. (i) If \(f:[a,b]\rightarrow {\mathbb R}\) is Riemann integrable then so is \(f^{2}\).

(ii) If \(f,g:[a,b]\rightarrow {\mathbb R}\) are Riemann integrable then so is \(fg\).

We are now in a position to prove the fundamental theorem of the calculus.

Theorem 9.11 (Fundamental theorem of the calculus). Suppose that \(u\in (a,b)\) and \(f:(a,b)\rightarrow {\mathbb R}\) is a Riemann integrable function which is continuous at some \(t\in (a,b)\). If we set \[F(s)=\int _{u}^{s}f(x)\,dx\] then \(F\) is diﬀerentiable at \(t\) and \(F'(t)=f(t)\).
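A numerical sketch of the theorem (with \(f=\cos \) and \(u=0\), both my own choices): approximating \(F(s)=\int _{0}^{s}\cos x\,dx\) by ﬁne Riemann sums, the difference quotient of \(F\) at \(t\) recovers \(f(t)=\cos t\).

```python
import math

# Midpoint Riemann sum as a stand-in for the Riemann integral.
def integral(f, a, b, n=20000):
    h = (b - a) / n
    return sum(f(a + (j + 0.5) * h) for j in range(n)) * h

# F(s) = integral of cos from 0 to s, so F(s) = sin(s) and F'(t) = cos(t).
def F(s):
    return integral(math.cos, 0.0, s)

t, h = 0.9, 1e-4
quotient = (F(t + h) - F(t - h)) / (2 * h)   # difference quotient of F at t
```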

Sometimes we think of the fundamental theorem in a slightly diﬀerent way.

Theorem 9.12. Suppose that \(f:(a,b)\rightarrow {\mathbb R}\) is continuous, that \(u\in (a,b)\) and \(c\in {\mathbb R}\). Then there is a unique solution to the diﬀerential equation \(g'(t)=f(t)\) \([t\in (a,b)]\) such that \(g(u)=c\).

Thus (under appropriate circumstances) integration and diﬀerentiation are inverse operations and the theories of diﬀerentiation and integration are subsumed in the greater theory of the calculus. Under appropriate circumstances, if the graph of \(F\) has tangent with slope \(f(x)\) at \(x\), \begin {align*} \text {area under the}&\ \text {graph of the slope of the tangent of $F$}\\ &=\text {area under the graph of $f$}\\ &=\int _{a}^{b}f(x)\,dx=\int _{a}^{b}F'(x)\,dx=F(b)-F(a). \end {align*}

Although it is not explicitly on the syllabus the examiners seem deeply attached to the next theorem. The result was stated in course C4 and the proof makes use of quite a lot of the methods developed in the present course.

Theorem 9.13 (Diﬀerentiation under the integral). Suppose \(g:[a,b]\times [c,d]\rightarrow {\mathbb R}\) is continuous and that the partial derivative \(g_{,2}\) exists and is continuous. Then writing \(G(y)=\int _{a}^{b}g(x,y)\,dx\) we have \(G\) diﬀerentiable on \((c,d)\) with \[G'(y)=\int _{a}^{b}g_{,2}(x,y)\,dx.\]

In the last few years the examiners have shown an excessive interest in the next lemma probably because the proof neatly combines several of our previous results.

Lemma 9.14. Suppose that \(g:{\mathbb R}^{2}\rightarrow {\mathbb R}\) is continuous and has continuous partial derivative \(g_{,2}\).

(i) If we set \(G(x,y)=\int _{0}^{x}g(u,y)\,du\) then \(G\) is diﬀerentiable with partial derivatives \[G_{,1}(x,y)=g(x,y)\ \text {and} \ G_{,2}(x,y)=\int _{0}^{x}g_{,2}(u,y)\,du.\]

(ii) If we set \(F(t)=\int _{0}^{t}g(u,t)\,du\) then \(F\) is diﬀerentiable with derivative \[F'(t)=g(t,t)+\int _{0}^{t}g_{,2}(u,t)\,du.\]
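Part (ii) can be checked by hand with a toy integrand (my own choice \(g(u,t)=ut\)): then \(F(t)=\int _{0}^{t}ut\,du=t^{3}/2\), so \(F'(t)=3t^{2}/2\), while the formula gives \(g(t,t)+\int _{0}^{t}g_{,2}(u,t)\,du=t^{2}+t^{2}/2=3t^{2}/2\), in agreement.

```python
# Toy integrand g(u, t) = u*t for Lemma 9.14 (ii); then F(t) = t^3/2
# and the lemma's formula should reproduce F'(t) = 3 t^2 / 2.
def g(u, t):
    return u * t

def g_2(u, t):      # partial derivative of g in its second variable
    return u

def integral(f, a, b, n=10000):
    # midpoint Riemann sum (exact for linear integrands like u -> u)
    h = (b - a) / n
    return sum(f(a + (j + 0.5) * h) for j in range(n)) * h

t = 1.3
formula = g(t, t) + integral(lambda u: g_2(u, t), 0.0, t)
exact = 3 * t ** 2 / 2
```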

We have deﬁned Riemann integration for bounded functions on bounded intervals. However, the reader will already have evaluated, as a matter of routine, integrals in the following manner \[\int _{0}^{1}x^{-1/2}\,dx=\lim _{\epsilon \rightarrow 0+} \int _{\epsilon }^{1}x^{-1/2}\,dx= \lim _{\epsilon \rightarrow 0+}[2x^{1/2}]_{\epsilon }^{1}=2,\] and \[\int _{1}^{\infty }x^{-2}\,dx=\lim _{R\rightarrow \infty } \int _{1}^{R}x^{-2}\,dx= \lim _{R\rightarrow \infty }[-x^{-1}]_{1}^{R}=1.\]

A full theoretical treatment of such integrals with the tools at our disposal is apt to lead into a howling wilderness of ‘inﬁnite integrals of the ﬁrst kind’, ‘Cauchy principal values’ and so on. Instead I shall give a few typical theorems, deﬁnitions and counter-examples from which the reader should be able to reconstruct any parts of the theory that she needs.

Deﬁnition 10.1. If \(f:[a,\infty )\rightarrow {\mathbb R}\) is such that \(f|_{[a,X]}\in {\mathcal R}[a,X]\) for each \(X>a\) and \(\int _{a}^{X}f(x)\,dx\rightarrow L\) as \(X\rightarrow \infty \) then we say that \(\int _{a}^{\infty }f(x)\,dx\) exists with value \(L\).

Lemma 10.2. Suppose \(f:[a,\infty )\rightarrow {\mathbb R}\) is such that \(f|_{[a,X]}\in {\mathcal R}[a,X]\) for each \(X>a\). If \(f(x)\geq 0\) for all \(x\) then \(\int _{a}^{\infty }f(x)\,dx\) exists if and only if there exists a \(K\) such that \(\int _{a}^{X}f(x)\,dx\leq K\) for all \(X\).

Lemma 10.3. Suppose \(f:[a,\infty )\rightarrow {\mathbb R}\) is such that \(f|_{[a,X]}\in {\mathcal R}[a,X]\) for each \(X>a\). If \(\int _{a}^{\infty }|f(x)|\,dx\) exists then \(\int _{a}^{\infty }f(x)\,dx\) exists.

It is natural to state Lemma 10.3 as saying that ‘absolute convergence of the integral implies convergence’.

Additional problems arise when there are two limits involved.

Example 10.4. If \(\lambda ,\mu >0\) then \[\int _{-\mu R}^{\lambda R}\frac {x}{1+x^{2}}\,dx \rightarrow \log \frac {\lambda }{\mu }\] as \(R\rightarrow \infty \).
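Since \(\tfrac{1}{2}\log (1+x^{2})\) is an antiderivative of \(x/(1+x^{2})\), the claimed limit can be checked directly; the values \(\lambda =3\), \(\mu =2\) below are illustrative.

```python
import math

lam, mu = 3.0, 2.0    # illustrative values of λ and μ

def proper_integral(R):
    # antiderivative of x/(1+x²) is ½·log(1+x²)
    return 0.5 * (math.log(1 + (lam * R) ** 2) - math.log(1 + (mu * R) ** 2))

vals = [proper_integral(R) for R in (10.0, 1_000.0, 100_000.0)]
print(vals, math.log(lam / mu))   # the integrals approach log(3/2) ≈ 0.405465
```

The dependence of the limit on \(\lambda /\mu\) is exactly why a symmetric convention (a 'principal value') must be imposed before \(\int _{-\infty }^{\infty }x/(1+x^{2})\,dx\) can be assigned a value.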
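(The reader may also wish to check that the symmetric choice \(\lambda =\mu \) gives limit \(\log 1=0\), which is the 'Cauchy principal value' mentioned above; with \(\lambda\neq\mu\) the answer changes, so no unconditional value exists.)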

A pure mathematician gets round this problem by making a deﬁnition along these lines.

Deﬁnition 10.5. If \(f:{\mathbb R}\rightarrow {\mathbb R}\) is such that \(f|_{[-X,Y]}\in {\mathcal R}[-X,Y]\) for each \(X,Y>0\), then \(\int _{-\infty }^{\infty }f(x)\,dx\) exists with value \(L\) if and only if the following condition holds. Given \(\epsilon >0\) we can ﬁnd an \(X_{0}(\epsilon )>0\) such that \[\left |\int _{-X}^{Y}f(x)\,dx-L\right |<\epsilon \] for all \(X,Y>X_{0}(\epsilon )\).

The physicist gets round the problem by ignoring it. If she is a real physicist with correct physical intuition this works splendidly^{4}, but if not, not.

Speaking broadly, inﬁnite integrals \(\int _{E}f(x)\,dx\) work well when they are absolutely convergent, that is \(\int _{E}|f(x)|\,dx<\infty \), but are full of traps for the unwary otherwise.

So far we have dealt only with the integration of functions \(f:E\rightarrow {\mathbb R}\) with \(E\) a ‘well behaved’ subset of \(\mathbb R\). It is not diﬃcult to extend our deﬁnitions to the full multidimensional case but the main users of multidimensional integrals are either applied mathematicians (in the broadest sense) or diﬀerential geometers and analysts (including probabilists). The applied mathematicians are happy to take the details on trust and both the diﬀerential geometers and the analysts require more sophisticated approaches (though of rather diﬀerent kinds).

However, the particular case of a function \(f:[a,b]\rightarrow {\mathbb R}^{m}\) is needed in the complex variable courses (with \(m=2\)) and elsewhere when line integrals are used. The deﬁnition is simple.

Deﬁnition 10.6. If \(f:[a,b]\rightarrow {\mathbb R}^{m}\) is such that \(f_{j}:[a,b]\rightarrow {\mathbb R}\) is Riemann integrable for each \(j\) then \(\int _{a}^{b}f(x)\,dx=\mathbf {y}\) where \({\mathbf y}\in {\mathbb R}^{m}\) and \[y_{j}=\int _{a}^{b}f_{j}(x)\,dx.\]

It is easy to obtain the properties of this integral directly from its deﬁnition. Here is an example.

Lemma 10.7. If \(\alpha :{\mathbb R}^{m}\rightarrow {\mathbb R}^{p}\) is linear and \(f:[a,b]\rightarrow {\mathbb R}^{m}\) is Riemann integrable then so is \(\alpha \circ f\) and \[\int _{a}^{b}\alpha (f(x))\,dx=\alpha \left ( \int _{a}^{b}f(x)\,dx\right ).\]

Taking \(\alpha \) to be any orthogonal transformation of \({\mathbb R}^{m}\) to itself we see that our deﬁnition is, in fact, coordinate independent. Choosing a particular orthogonal transformation we obtain the following nice result.

Theorem 10.8. If \(f:[a,b]\rightarrow {\mathbb R}^{m}\) is Riemann integrable then \[\left \|\int _{a}^{b}f(x)\,dx\right \| \leq (b-a)\sup _{x\in [a,b]}\|f(x)\|.\]

This result and its extensions are often quoted in the form \[\text {integral}\leq \text {length}\times \text {$\sup $}.\]
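A numerical sketch of Theorem 10.8, with the illustrative choice \(f(x)=(\cos x,\sin x)\) on \([0,2]\), for which \(\sup \|f(x)\|=1\):

```python
import math

a, b, n = 0.0, 2.0, 100_000
step = (b - a) / n

# Midpoint Riemann sums for each coordinate of f(x) = (cos x, sin x).
sx = step * sum(math.cos(a + (k + 0.5) * step) for k in range(n))
sy = step * sum(math.sin(a + (k + 0.5) * step) for k in range(n))

norm_of_integral = math.hypot(sx, sy)   # ‖∫ f(x) dx‖
bound = (b - a) * 1.0                   # (b − a)·sup‖f(x)‖, and sup‖f(x)‖ = 1 here
print(norm_of_integral, bound)
```

The exact value is \(\|(\sin 2,\,1-\cos 2)\|\approx 1.683\), comfortably below the bound \(2\); the inequality is strict here because the direction of \(f(x)\) varies along the curve.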

Exercise 10.9. Show that the collection \(\mathcal R\) of Riemann integrable functions \(f:[a,b]\rightarrow {\mathbb R}^{m}\) forms a real vector space. If we write \[Tf=\int _{a}^{b}f(x)\,dx\] show that \(T:{\mathcal R}\rightarrow {\mathbb R}^{m}\) is a linear map and \(\|Tf\|\leq (b-a)\sup _{x\in [a,b]}\|f(x)\|\).

The Riemann integral is simple to deﬁne and easy to use. So long as we only need to integrate continuous functions (or functions continuous except at a few simple discontinuities) it is perfectly satisfactory. The great majority of mathematicians need nothing else. However, the modern theory of probability and many parts of modern analysis need the more powerful integral of Lebesgue (a hint as to why this might be is given in Exercise 13.9) and its relatives. This is just as easy to use but substantially harder to set up^{5}.

One of the unifying ideas of modern analysis is that of distance. We start with a deﬁnition modelled on those properties of Euclidean distance which do not refer to vector space structures. (Thus you should both compare and contrast Lemma 4.1.)

Deﬁnition 11.1. We say that \((X,d)\) is a metric space if \(X\) is a set and \(d:X^{2}\rightarrow \mathbb {R}\) is a function with the following properties:-

(i) \(d(x,y)\geq 0\) for all \(x,y\in X\).

(ii) \(d(x,y)=0\) if and only if \(x=y\).

(iii) \(d(x,y)=d(y,x)\) for all \(x,y\in X\).

(iv) (The triangle inequality) \(d(x,z)\leq d(x,y)+d(y,z)\) for all \(x,y,z\in X\).

As ought to be the case, we have the following result.

Lemma 11.2. If \(d({\mathbf x},{\mathbf y})=\|{\mathbf x}-{\mathbf y}\|\) then \(({\mathbb R}^{m},d)\) is a metric space.

I shall give later some examples of metric spaces which are both useful and novel (in the context of this course). The metric of Lemma 11.2 is useful but hardly novel. In the interest of balance let me give a metric which will be novel to most of my readers even if it is not useful.

Example 11.3. If \(d({\mathbf x},{\mathbf y})=\|{\mathbf x}\|+\|{\mathbf y}\|\) when \(\mathbf {x}\neq \mathbf {y}\) and \(d({\mathbf x},{\mathbf x})=0\) then \(({\mathbb R}^{m},d)\) is a metric space.

(This metric is called the British Railway metric.) Here is an even odder railway metric.

Example 11.4. If \(\mathbf x\) and \(\mathbf y\) are linearly dependent (in more geometrical language, if \(\mathbf x\), \(\mathbf y\) and \(\mathbf 0\) lie on the same straight line) then set \[d({\mathbf x},{\mathbf y})=\|{\mathbf x}-{\mathbf y}\|.\] Otherwise, set \[d({\mathbf x},{\mathbf y})=\|{\mathbf x}\|+\|{\mathbf y}\|.\] With this deﬁnition, \(({\mathbb R}^{m},d)\) is a metric space.
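Both railway metrics are easy to experiment with; the following sketch spot-checks the axioms of Deﬁnition 11.1 for the British Railway metric of Example 11.3 at random points (a ﬁnite check, of course, not a proof).

```python
import math, itertools, random

def brit_rail(x, y):
    """All journeys go via the origin, unless you stay where you are."""
    if x == y:
        return 0.0
    return math.hypot(*x) + math.hypot(*y)

random.seed(0)
points = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(25)]

symmetric = all(brit_rail(p, q) == brit_rail(q, p)
                for p, q in itertools.product(points, repeat=2))
triangle = all(brit_rail(p, r) <= brit_rail(p, q) + brit_rail(q, r)
               for p, q, r in itertools.product(points, repeat=3))
print(symmetric, triangle)
```

The triangle inequality is clear on inspection: inserting an intermediate stop \(\mathbf q\) can only add the extra leg \(2\|{\mathbf q}\|\), or nothing at all if \(\mathbf q\) coincides with an endpoint.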

Surprising as it may seem the following rather trivial metric is a rich source of intuition for communication theory. (However it is of no analytic interest.)

Lemma 11.5 (Hamming metric). Let \(M\) be the space of sequences \[{\mathbf x}=(x_{1},x_{2},\dots ,x_{n})\] with \(x_{j}=0\) or \(x_{j}=1\). We call \(M\) the space of messages of length \(n\). If \(\mathbf x\) and \(\mathbf y\) are messages we deﬁne \[d({\mathbf x},{\mathbf y})=\sum _{j=1}^{n}|x_{j}-y_{j}|\] to be the Hamming distance between the two messages. \((M,d)\) is a metric space.
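The Hamming distance is equally easy to compute; the exhaustive check below, over all messages of the illustrative length \(n=4\), is again a ﬁnite experiment rather than a proof.

```python
import itertools

def hamming(x, y):
    # number of coordinates in which the two messages disagree
    return sum(abs(a - b) for a, b in zip(x, y))

n = 4
messages = list(itertools.product((0, 1), repeat=n))   # all 16 messages of length 4

print(hamming((0, 1, 1, 0), (1, 1, 0, 0)))             # differs in two positions

triangle_ok = all(
    hamming(x, z) <= hamming(x, y) + hamming(y, z)
    for x, y, z in itertools.product(messages, repeat=3)
)
print(triangle_ok)   # |x_j − z_j| ≤ |x_j − y_j| + |y_j − z_j| in each coordinate
```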

Much of the ‘mere algebra’ which we did for the Euclidean metric carries over with hardly any change to the general metric case. (Compare the following deﬁnition with Deﬁnition 4.2.)

Deﬁnition 11.6. Let \((X,d)\) be a metric space. If \(a_{n}\in X\) for each \(n\geq 1\) and \(a\in X\) then we say that \(a_{n}\rightarrow a\) if given \(\epsilon >0\) we can ﬁnd an \(n_{0}(\epsilon )\) such that \[d(a_{n},a)<\epsilon \ \text {for all $n\geq n_{0}(\epsilon )$}.\]

However, we are not now dealing with a ‘norm on a vector space’ so some results do not carry over. The result corresponding to Lemma 4.3 has far fewer parts.

Lemma 11.7. Let \((X,d)\) be a metric space.

(i) The limit is unique. That is, if \(a_{n}\rightarrow a\) and \(a_{n}\rightarrow b\) as \(n\rightarrow \infty \) then \(a=b\).

(ii) If \(a_{n}\rightarrow a\) as \(n\rightarrow \infty \) and \(n(1)<n(2)<n(3)<\cdots \) then \(a_{n(j)}\rightarrow a\) as \(j\rightarrow \infty \).

(iii) If \(a_{n}=c\) for all \(n\) then \(a_{n}\rightarrow c\) as \(n\rightarrow \infty \).

The material on open and closed sets from Section 5 goes through essentially unchanged.

Deﬁnition 11.8. Let \((X,d)\) be a metric space. A set \(F\subseteq X\) is closed if whenever \(x_{n}\in F\) for each \(n\) and \(x_{n}\rightarrow x\) as \(n\rightarrow \infty \) then \(x\in F\).

Deﬁnition 11.9. Let \((X,d)\) be a metric space. A set \(U\subseteq X\) is open if whenever \(x\in U\) there exists an \(\epsilon >0\) such that whenever \(d(x,y)<\epsilon \) we have \(y\in U\).

Example 11.10. Let \((X,d)\) be a metric space. Let \(x\in X\) and \(r>0\).

(i) The set \(B(x,r) =\{y\in X:d(x,y)<r\}\) is open.

(ii) The set \(\bar {B}(x,r) =\{y\in X:d(x,y)\leq r\}\) is closed.

We call \(B(x,r)\) the open ball of radius \(r\) and centre \(x\). We call \(\bar {B}(x,r)\) the closed ball of radius \(r\) and centre \(x\).

Lemma 11.11. Let \((X,d)\) be a metric space. A subset \(A\) of \(X\) is open if and only if each point of \(A\) is the centre of an open ball lying entirely within \(A\).

Deﬁnition 11.12. Let \((X,d)\) be a metric space. The set \(N\subseteq X\) is a neighbourhood of the point \(x\in X\) if we can ﬁnd an \(r>0\) such that \(B(x,r)\subseteq N\).

Lemma 11.13. Let \((X,d)\) be a metric space. A subset \(A\) of \(X\) is open if and only if its complement \(X\setminus A\) is closed.

Lemma 11.14. Let \((X,d)\) be a metric space. Consider the collection \(\tau \) of open sets in \(X\).

(i) \(\emptyset \in \tau \), \(X\in \tau \).

(ii) If \(U_{\alpha }\in \tau \) for all \(\alpha \in A\) then \(\bigcup _{\alpha \in A} U_{\alpha }\in \tau \).

(iii) If \(U_{1},U_{2},\dots ,U_{n}\in \tau \) then \(\bigcap _{j=1}^{n}U_{j}\in \tau \).

Lemma 11.15. Let \((X,d)\) be a metric space. Consider the collection \(\mathcal {F}\) of closed sets in \(X\).

(i) \(\emptyset \in \mathcal {F}\), \(X\in \mathcal {F}\).

(ii) If \(F_{\alpha }\in \mathcal {F}\) for all \(\alpha \in A\) then \(\bigcap _{\alpha \in A} F_{\alpha }\in \mathcal {F}\).

(iii) If \(F_{1},F_{2},\dots ,F_{n}\in \mathcal {F}\) then \(\bigcup _{j=1}^{n}F_{j}\in \mathcal {F}\).

The new deﬁnition of continuity only breaks the chain of translations slightly because it involves two metric spaces.

Deﬁnition 11.16. Let \((X,d)\) and \((Z,\rho )\) be metric spaces. We say that a function \(f:X \rightarrow Z\) is continuous at some point \(x\in X\) if given \(\epsilon >0\) we can ﬁnd a \(\delta (\epsilon ,x)>0\) such that if \(y\in X\) and \(d(x,y)<\delta (\epsilon ,x)\) we have \[\rho (f(x),f(y))<\epsilon .\] If \(f\) is continuous at every point \(x\in X\) we say that \(f\) is a continuous function on \(X\).

The reader may feel that Deﬁnition 5.12 is more general than Deﬁnition 11.16 because it involves a set \(E\). The following remark shows that this is not so.

Lemma 11.17. Let \((X,d)\) be a metric space and \(E\subseteq X\). Let \(d_{E}:E^{2}\rightarrow {\mathbb R}\) be given by \(d_{E}(u,v)=d(u,v)\) whenever \(u,v\in E\). Then \((E,d_{E})\) is a metric space.

We conclude this section with more results taken directly from Section 5.

Lemma 11.18. Let \((X,d)\) and \((Z,\rho )\) be metric spaces and suppose that the function \(f:X\rightarrow Z\) is continuous at \(x\in X\). Then if \(x_{n}\in X\) and \(x_{n}\rightarrow x\) it follows that \(f(x_{n})\rightarrow f(x)\).

Lemma 11.19. Let \((X,d)\) and \((Z,\rho )\) be metric spaces. The function \(f:X\rightarrow Z\) is continuous if and only if \(f^{-1}(O)\) is open whenever \(O\) is open.

Lemma 11.20. Let \((X,d)\) \((Y,\theta )\) and \((Z,\rho )\) be metric spaces. If \(f:X\rightarrow Y\) and \(g:Y\rightarrow Z\) are continuous then so is their composition \(g\circ f\).

Question 21.1, which asks you to look at open balls and open sets in the two railway metrics given in Examples 11.3 and 11.4, emphasises the fact that open sets in diﬀerent metrics may be very diﬀerent.

In this short starred section I look at a very important class of metrics. The discussion will be informal, as indicated by the replacement of the word ‘Theorem’ by ‘Statement’.

Suppose we wish to build a road joining two points of the plane. The cost of building will depend on the nature of the terrain. If the cost of building a short stretch of length \(\delta s\) near a point \((x,y)\) is (to ﬁrst order) \(g(x,y)\delta s\) then, subject to everything being well behaved, the cost of the road \(\Gamma \) will be \[\int _{\Gamma }g(x,y)\,ds.\] ‘But’ cries the reader ‘you have not deﬁned line integrals yet!’ To which I reply ‘Mathematical discourse takes place on many levels and we must learn to move freely between them’^{6}. It is tempting to deﬁne the distance \(d(A,B)\) between two points \(A\) and \(B\) in the plane as \[d(A,B)=\inf \left \{\int _{\Gamma }g(x,y)\,ds: \text {$\Gamma $ joining $A$ and $B$}\right \}.\]

If we include amongst our conditions that \(g\) is continuous and \(g(x,y)>0\) everywhere we see that \(d\) is a metric.

Statement 12.1. (i) \(d(A,B)\geq 0\) for all \(A\) and \(B\).

(ii) \(d(A,B)=0\) if and only if \(A=B\).

(iii) \(d(A,B)=d(B,A)\) for all \(A\) and \(B\).

(iv) \(d(A,C)\leq d(A,B)+d(B,C)\) for all \(A\), \(B\) and \(C\).

I have rather carefully kept to ‘\(\inf \)’ rather than to ‘\(\min \)’. One problem is that I have not speciﬁed the kind of \(\Gamma \) that is permissible. (Notice that the argument we used to prove Statement 12.1 means that we could not just use ‘smooth’.) However, it can be shown that if we choose a suitable class for the possible \(\Gamma \) and a suitably well behaved \(g\) then the minimum is attained. If \(\Gamma _{0}\) is a path from \(A\) to \(B\) such that \[\int _{\Gamma _{0}}g(x,y)ds=d(A,B)\] we call \(\Gamma _{0}\) a geodesic. (The geodesic need not be unique, consider road building for two towns at diametrically opposite points of a circular marsh.) If any two points are joined by a geodesic then \(d(A,B)\) is the ‘length of the geodesic path joining \(A\) and \(B\)’ where length refers to the metric \(d\) and not to the Euclidean metric.

Let us try to use these ideas to ﬁnd a metric on the upper half-plane \[H=\{z\in {\mathbb C}:\Im z>0\}\] which is invariant under Möbius transformations. (See course C1/2 for background.) More precisely, we want a metric \(d\) such that if \(T\) is a Möbius map mapping \(H\) to itself bijectively, then \(d(Tz_{1},Tz_{2})=d(z_{1},z_{2})\) for all \(z_{1},z_{2}\in H\). Of course no such metric may exist but we shall see where the question leads.

Statement 12.2. The set \(\mathcal {H}\) of Möbius maps \(T\) such that \(T|_{H}:H\rightarrow H\) is bijective is a subgroup of the group \(\mathcal {M}\) of all Möbius transformations. The subgroup \(\mathcal {H}\) is generated by the transformations \(T_{a}\) with \(a\in {\mathbb R}\), \(D_{\lambda }\) with \(\lambda >0\) and \(J\) where \begin {align*} T_{a}(z)=&z+a\\ D_{\lambda }(z)=&\lambda z\\ J(z)=&-z^{-1} \end {align*}

By looking at the eﬀect of elements of \(\mathcal {H}\) on a small arc, we are led to the following. (Throughout we write \(z=x+iy\) and identify the plane \({\mathbb R}^{2}\) with \(\mathbb C\) in the usual manner.)

Statement 12.3. Suppose \(g:H\rightarrow {\mathbb R}\) is well behaved. Then \[\int _{T(\Gamma )}g(x,y)ds=\int _{\Gamma }g(x,y)ds\] for all nice paths \(\Gamma \) in \(H\) and all \(T\in {\mathcal H}\) if and only if \(g(x,y)=Ay^{-1}\) for some \(A>0\).

Let us now ﬁx \(g(x,y)=y^{-1}\) and let \(d\) be the derived metric.

We can use the calculus of variations (course C10, Mathematical Methods) to ﬁnd geodesics.

Statement 12.5. The geodesics for the metric just described are arcs of circles with centres on the real axis and straight lines which cut the real axis at right angles.

Those who are interested in the axiom of parallels for Euclidean geometry will note that, for \((H,d)\), if we call circles with centres on the real axis and straight lines which cut the real axis at right angles geodesic lines for \((H,d)\), then given any geodesic line \(\Gamma \) and any point \(A\) in \(H\) not on \(\Gamma \) we can ﬁnd many geodesic lines through \(A\) which do not intersect \(\Gamma \). More details are given in course D1 (Geometry).

Of course the methods of C10 are indicative rather than conclusive. (We have used a necessary condition and not a suﬃcient one.) However, in this particular case, it is not hard, once the answer is known, to state and prove everything rigorously. If we want general underpinning theorems they can be found in Diﬀerential Geometry but require a lot of hard work.

There is an interesting analogue of this kind of metric for groups.

Lemma 12.6. Let \(G\) be a group and \(A\) a subset of \(G\) generating \(G\). Write \(A^{-1}=\{a^{-1}:a\in A\}\). If we set \[d(x,y)=\min \{n: \text {we can find $a_{1},a_{2},\dots ,a_{n}\in A\cup A^{-1}$ with $a_{1}a_{2}\dots a_{n}=xy^{-1}$}\},\] then \(d\) is a metric on \(G\) with \(d(xz,yz)=d(x,y)\) for all \(x,y,z\in G\). (Thus \(d\) is a right translation invariant metric.)

If \(d(x,y)\) is bounded we call \(\max d(x,y)\) the diameter of the group. The notion of diameter has an obvious relevance to the problem ‘How many shuﬄes do we need to produce a well shuﬄed pack?’.
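Lemma 12.6 can be explored computationally. The sketch below takes \(G\) to be the symmetric group \(S_{3}\), with the illustrative generating set consisting of a transposition and a 3-cycle, ﬁnds \(d(x,y)\) by breadth-ﬁrst search, and reports the diameter of the group.

```python
from collections import deque
from itertools import permutations

def compose(p, q):                     # (p∘q)(i) = p(q(i)), permutations as tuples
    return tuple(p[q[i]] for i in range(len(p)))

def inverse(p):
    inv = [0] * len(p)
    for i, v in enumerate(p):
        inv[v] = i
    return tuple(inv)

gens = [(1, 0, 2), (1, 2, 0)]          # a transposition and a 3-cycle on {0,1,2}
gens += [inverse(g) for g in gens]     # work with A ∪ A⁻¹, as in the lemma

def word_length(g):
    """Least n with a₁a₂…aₙ = g, found by BFS on the Cayley graph."""
    identity = (0, 1, 2)
    seen, queue = {identity: 0}, deque([identity])
    while queue:
        x = queue.popleft()
        if x == g:
            return seen[x]
        for a in gens:
            y = compose(a, x)
            if y not in seen:
                seen[y] = seen[x] + 1
                queue.append(y)
    raise ValueError("generators do not generate the group")

def d(x, y):                           # right-invariant: d(x, y) = d(xy⁻¹, e)
    return word_length(compose(x, inverse(y)))

diameter = max(d(x, y) for x in permutations(range(3))
                       for y in permutations(range(3)))
print(diameter)   # every element of S₃ is a product of at most this many generators
```

With these generators every element of \(S_{3}\) can be written as a product of at most two generators, so the diameter is \(2\).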

If we examine the arguments of Section 11 we see that they are all mere algebra. What do we need to introduce to do genuine analysis on metric spaces? We cannot use a variant of the fundamental axiom because there is no order on our spaces^{7}. The correct variant of the Bolzano–Weierstrass method is, it is generally agreed, the notion of compactness which will be studied in the context of topological spaces (a concept more general than metric spaces) in course C12. Instead we use a generalisation of the general principle of convergence.

Deﬁnition 13.1. If \((X,d)\) is a metric space we say that a sequence of points \(x_{n}\in X\) is Cauchy if given any \(\epsilon >0\) we can ﬁnd \(n_{0}(\epsilon )\) such that \(d(x_{p},x_{q})<\epsilon \) for all \(p,q\geq n_{0}(\epsilon )\).

Deﬁnition 13.2. A metric space \((X,d)\) is complete if every Cauchy sequence in \(X\) converges to a point of \(X\).

Of course, \({\mathbb R}^{n}\) with the Euclidean metric is complete.

To show that the notion of completeness is distinct from ‘Bolzano–Weierstrass notions’ we introduce a dull but useful metric space.

Lemma 13.3. Let \(X\) be any set. If we deﬁne \(d:X^{2}\rightarrow \mathbb {R}\) by \begin {align*} d(x,y)=&1\qquad \text {if $x\neq y$}\\ d(x,x)=&0 \end {align*}

then \((X,d)\) is a metric space.

We call the metric \(d\) of the previous lemma the discrete metric.

Lemma 13.4. Let \((X,d)\) be a space with the discrete metric. Then \((X,d)\) is complete.

Suppose \(X\) is inﬁnite. Then although \(X\) is bounded (if \(x_{0}\in X\) then \(\bar {B}(x_{0},1)=X\)) we can ﬁnd a sequence \(x_{n}\) such that no subsequence converges.

Here is another property of the discrete metric.

Lemma 13.5. If \((X,d)\) is a space with the discrete metric then every subset of \(X\) is both open and closed.

The contraction mapping theorem (Theorem 15.1) and its applications will provide a striking example of the utility of the concept of completeness. However this section and the next are devoted more to examples of spaces which are and are not complete. Students who want to see completeness in action immediately should do the next, starred, example.

Exercise 13.6. We say that a metric space \((X,d)\) has no isolated points if given \(y\in X\) and \(\epsilon >0\) we can ﬁnd an \(x\in X\) such that \(0<d(x,y)<\epsilon \). Show by the methods of Exercise 3.14 that a non-empty complete metric space with no isolated points is uncountable.

Give an example of an inﬁnite countable complete metric space. Give an example of an uncountable metric space all of whose points are isolated.

The next lemma gives a good supply of metric spaces which are complete and of metric spaces which are not complete.

Lemma 13.7. Let \((X,d)\) be a complete metric space. If \(E\) is a subset of \(X\) and we deﬁne \(d_{E}:E^{2}\rightarrow {\mathbb R}\) by \(d_{E}(u,v)=d(u,v)\) whenever \(u,v\in E\) then \((E,d_{E})\) is complete if and only if \(E\) is closed in \((X,d)\).

Thus, for example, the closed interval \([a,b]\) is complete for the usual metric but the open interval \((a,b)\) is not.

Here is a more interesting example of a metric which is not complete.

Example 13.8. Consider \(C([-1,1])\) the set of continuous functions \(f:[-1,1]\rightarrow {\mathbb R}\). If we set \[d(f,g)=\|f-g\|_{1}=\int _{-1}^{1}|f(x)-g(x)|\,dx\] then \(d\) is a metric. Let

\begin{alignat*}{2} f_{n}(x)&=-1&&\qquad\text{for $-1\leq x\leq -1/n$}\\ f_{n}(x)&=nx&&\qquad\text{for $-1/n\leq x\leq 1/n$}\\ f_{n}(x)&=1&&\qquad\text{for $1/n\leq x\leq 1$}. \end{alignat*}

The sequence \(f_{n}\) is Cauchy but has no limit. Thus \((C([-1,1]),d)\) is not complete.
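The Cauchy property can be seen numerically: the sketch below approximates \(\|f_{n}-f_{m}\|_{1}\) by Riemann sums (step sizes illustrative) and shows the distances shrinking like \(1/n\).

```python
def f(n, x):
    # the piecewise-linear function of Example 13.8, written as a clamp
    return max(-1.0, min(1.0, n * x))

def l1_distance(n, m, steps=200_000):
    """Midpoint Riemann sum for ∫_{-1}^{1} |f_n − f_m| dx."""
    step = 2.0 / steps
    return sum(abs(f(n, -1.0 + (k + 0.5) * step) - f(m, -1.0 + (k + 0.5) * step))
               for k in range(steps)) * step

d1 = l1_distance(10, 20)
d2 = l1_distance(100, 200)
print(d1, d2)   # the functions agree off [−1/n, 1/n], so the distances shrink
```

A short computation gives \(\|f_{n}-f_{2n}\|_{1}=1/(2n)\) exactly, so the printed values are close to \(0.05\) and \(0.005\).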

The example just given leaves open the possibility that the same type of metric might be complete when applied to Riemann integrable functions. The following example, which is long, heavily starred, and only for the ambitious, shows that this is not so.

Exercise 13.9. Consider \({\mathcal R}([-1,1])\), the set of Riemann integrable functions \(f:[-1,1]\rightarrow {\mathbb R}\). If we set \[d(f,g)=\|f-g\|_{1}=\int _{-1}^{1}|f(x)-g(x)|\,dx\] show that \(d\) satisﬁes conditions (i), (iii) and (iv) of the deﬁnition of a metric (see Deﬁnition 11.1) but give an example to show that condition (ii) may fail. We could now stop, since \(d\) is not a metric, but this would be to miss the interesting part.

In many ways, condition (ii) is the least important part of the deﬁnition of a metric. We call something satisfying all the other conditions a pseudo-metric. We can deﬁne limits for pseudo-metrics in the same way as for metrics.

If \((X,\rho )\) is a pseudo-metric space show that if \(x_{n}\rightarrow x\) then \(x_{n}\rightarrow y\) if and only if \(\rho (x,y)=0\).

In the same way we can deﬁne Cauchy sequences. A pseudo-metric space is called complete if every Cauchy sequence converges. Our object is to show that \(({\mathcal R}([-1,1]),d)\) is not complete.

(i) Set

\begin{alignat*}{2} \Delta_{n}(x)=&0&&\qquad\text{if $2^{-3n}\leq |x|$}\\ \Delta_{n}(x)=&1-2^{3n}|x|&&\qquad\text{if $|x|\leq 2^{-3n}$.} \end{alignat*}

and \(g_{n}(t)=\sum _{r=-2^{n}}^{2^{n}}\Delta _{n}(t-r2^{-n})\) for all \(-1\leq t\leq 1\). Sketch \(g_{n}\) and show that \(\|g_{n}\|_{1}\leq 2^{-2n+1}\).

(ii) If \(f_{n}(x)=\max \{g_{1}(x),g_{2}(x),\dots ,g_{n}(x)\}\) explain brieﬂy why \(f_{n}\) is a continuous function on \([-1,1]\) with \(0\leq f_{n}(x)\leq 1\). Show that \(\|f_{n+1}-f_{n}\|_{1}\leq 2^{-2n-1}\) and deduce that \(f_{n}\) is a Cauchy sequence in \(({\mathcal R}([-1,1]),d)\).

(iii) Suppose, if possible, that there exists an \(f\in {\mathcal R}([-1,1])\) such that \(\|f_{n}-f\|_{1}\rightarrow 0\) as \(n\rightarrow \infty \). Show that \(\|f\|_{1}\leq 2/3\).

(iv) We now use the notation of Section 9. Explain why there must exist a dissection \(\mathcal {D}\) of \([-1,1]\) such that \(S(f,\mathcal {D})\leq 1\). Deduce that we can ﬁnd \(a\) and \(b\) with \(-1\leq a <b\leq 1\) such that \(f(x)\leq 1/2\) for all \(x\in [a,b]\).

(v) Show that \(\int _{a}^{b}|f(x)-f_{n}(x)|\,dx\nrightarrow 0\) and deduce that \(\|f-f_{n}\|_{1}\nrightarrow 0\). This contradiction shows that the sequence \(f_{n}\) does not converge and \(({\mathcal R}([-1,1]),d)\) is not complete.

The reader who has got this far can feel well pleased with herself but by working still harder we can extract a little more juice from this example and show that \(f_{n}\) converges pointwise to a function which is not Riemann integrable.

(vi) Show that, for each \(x\), the sequence \(f_{n}(x)\) is bounded and increasing. Deduce that \(f_{n}(x)\) tends to a limit \(F(x)\), say, for each \(x\). Show that \(I^{*}(F)=2\).

(vii) Let \(Z_{n}=f_{n}^{-1}(0)\). Explain why \(Z_{n}\) is closed and \(Z_{1}\supseteq Z_{2}\supseteq \dots \). If \(\mathcal {D}\) is a ﬁxed dissection of \([-1,1]\) show that for each \(n\) we can ﬁnd a collection \(\mathcal {I}_{n}\) of intervals \(I=[x_{j-1},x_{j}]\), with end points successive points of the dissection and of total length greater than \(1\), such that \(I\cap Z_{n}\neq \emptyset \) for each \(I\in \mathcal {I}_{n}\).

(viii) Let \(\mathcal {I}=\bigcap _{n=1}^{\infty }\mathcal {I}_{n}\). Show that \(\mathcal {I}\) is a collection of intervals \(I=[x_{j-1},x_{j}]\) with end points successive points of the dissection \(\mathcal {D}\) and with total length greater than \(1\) such that \(I\cap Z_{n}\neq \emptyset \) for each \(I\in \mathcal {I}\) and each \(n\).

(ix) Use Theorem 5.17 to show that \(I\cap \bigcap _{n=1}^{\infty }Z_{n}\neq \emptyset \) whenever \(I\in \mathcal {I}\). Deduce that \(s(F,\mathcal {D})\leq 1\).

(x) Conclude that \(F\) is not Riemann integrable.

If we seek to embed \(({\mathcal R}([-1,1]),d)\) in some larger space of functions which is complete we are led to Lebesgue theory.

This section is devoted to one of the most important metrics on functions. We shall write \(\mathbb {F}\) to mean either \(\mathbb {R}\) or \(\mathbb {C}\).

Deﬁnition 14.1. If \(E\) is a non-empty set we write \(\mathcal {B}(E)\) for the set of bounded functions \(f:E\rightarrow {\mathbb F}\). The uniform norm \(\|\ \|_{\infty }\) on \(\mathcal {B}(E)\) is deﬁned by \(\|f\|_{\infty }=\sup _{x\in E}|f(x)|\).

Lemma 14.2. With the standard operations, \(\mathcal {B}(E)\) is a vector space over \(\mathbb {F}\). If \(f,g\in \mathcal {B}(E)\) the following results are true.

(i) \(\|f\|_{\infty }\geq 0\),

(ii) If \(\|f\|_{\infty }= 0\) then \(f=0\),

(iii) If \(\lambda \in {\mathbb F}\) then \(\|\lambda f\|_{\infty }=|\lambda |\|f\|_{\infty }\).

(iv) (The triangle inequality) \(\|f+g\|_{\infty }\leq \|f\|_{\infty }+\|g\|_{\infty }\).

(v) The pointwise product \(fg\in \mathcal {B}(E)\) (where \(fg(x)=f(x)g(x)\)) and \(\|fg\|_{\infty }\leq \|f\|_{\infty }\|g\|_{\infty }\).

We call the metric \(d\) given by \(d(f,g)=\|f-g\|_{\infty }\) the uniform metric.

Theorem 14.3. If \(E\) is a non-empty set then \(\mathcal {B}(E)\) is complete under the uniform metric.

The space \(\mathcal {B}(E)\) is not very interesting in itself but if \(E\) is a metric space it has a very interesting subspace.

Deﬁnition 14.4. If \((E,d)\) is a non-empty metric space we write \(\mathcal {C}(E)\) for the set of bounded continuous functions \(f:E\rightarrow {\mathbb F}\).

The next remark merely restates what we already know.

Lemma 14.5. If \((E,d)\) is a non-empty metric space then \(\mathcal {C}(E)\) is a vector subspace of \(\mathcal {B}(E)\). Further if \(f,g\in \mathcal {C}(E)\) the pointwise product \(fg\in \mathcal {C}(E)\).

However the next result is new and crucial.

Theorem 14.6. If \((E,d)\) is a non-empty metric space then \(\mathcal {C}(E)\) is a closed subset of \(\mathcal {B}(E)\) under the uniform metric.

This is the famous ‘\(\epsilon /3\) Theorem’^{8}. Traditionally, it has been presented in a diﬀerent but essentially equivalent manner.

Deﬁnition 14.7 (Uniform convergence). If \(E\) is a non-empty set and \(f_{n}:E\rightarrow {\mathbb F}\) and \(f:E\rightarrow {\mathbb F}\) are functions we say that \(f_{n}\) converges uniformly to \(f\) as \(n\rightarrow \infty \) if, given any \(\epsilon >0\) we can ﬁnd an \(n_{0}(\epsilon )\) such that \(|f_{n}(x)-f(x)|<\epsilon \) for all \(x\in E\) and all \(n\geq n_{0}(\epsilon )\).

Theorem 14.6 now takes the following form.

Theorem 14.8. If \((E,d)\) is a non-empty metric space and \(f_{n}:E\rightarrow {\mathbb F}\) form a sequence of continuous functions tending uniformly to \(f\) then \(f\) is continuous.

More brieﬂy, the uniform limit of continuous functions is continuous. (To see the exact equivalence observe that if \(f_{n}\rightarrow f\) uniformly then we can ﬁnd an \(N\) such that \(f_{n}-f_{N}\) is bounded for all \(n\geq N\).)

Since \(\mathcal {C}(E)\) is a closed subset of \(\mathcal {B}(E)\), Theorem 14.3 and Lemma 13.7 give us another important theorem.

Theorem 14.9. If \((E,d)\) is a non-empty metric space then the uniform metric on \(\mathcal {C}(E)\) is complete.

This result, also, has a more traditional form.

Theorem 14.10 (General principle of uniform convergence). Suppose that \((E,d)\) is a non-empty metric space and \(f_{n}:E\rightarrow {\mathbb F}\) is a continuous function \([n\geq 1]\). The sequence \(f_{n}\) converges uniformly to a continuous function \(f\) if and only if, given any \(\epsilon >0\), we can ﬁnd an \(n_{0}(\epsilon )\) such that \(|f_{n}(x)-f_{m}(x)|<\epsilon \) for all \(x\in E\) and all \(n,m\geq n_{0}(\epsilon )\).

This theorem is also known as the GPUC by those who do not object to theorems which sound as though they were a branch of the secret police.

If \(E\) is a closed bounded subset of \({\mathbb R}^{m}\) with the Euclidean metric then by Theorem 6.4 all continuous functions \(f:E\rightarrow {\mathbb F}\) are bounded. In these circumstances we shall write \(C(E)={\mathcal C}(E)\). For the rest of the course we will only consider this special case. We start with a couple of important examples.

Example 14.11 (The witch’s hat). Deﬁne \(f_{n}:[0,2]\rightarrow {\mathbb R}\) by

\begin{alignat*}{2} f_{n}(x)=&1-n|x-n^{-1}|&&\qquad\text{for $|x-n^{-1}|\leq n^{-1}$,}\\ f_{n}(x)=&0&&\qquad\text{otherwise.} \end{alignat*}

Then the \(f_{n}\) are continuous functions such that \(f_{n}(x)\rightarrow 0\) as \(n\rightarrow \infty \) for each \(x\) but \(f_{n}\nrightarrow 0\) uniformly.

More brieﬂy, pointwise convergence does not imply uniform convergence.
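The witch's hat can be checked in a few lines: for a large illustrative \(n\), the values at ﬁxed points are (essentially) \(0\) while the supremum over a ﬁne grid near \(1/n\) is still \(1\).

```python
def witch(n, x):
    # the hat of Example 14.11: peak 1 at x = 1/n, support [0, 2/n]
    return max(0.0, 1.0 - n * abs(x - 1.0 / n))

n = 10_000
pointwise = [witch(n, x) for x in (0.0, 0.5, 1.0, 2.0)]   # each essentially 0
peak = max(witch(n, k / 10**7) for k in range(2 * 10**3 + 1))
print(pointwise, peak)   # the hat still reaches height 1 near x = 1/n
```

For each ﬁxed \(x\) the hat has moved past \(x\) once \(n\) is large, but \(\sup _{x}|f_{n}(x)|=1\) for every \(n\): pointwise but not uniform convergence.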

Example 14.12. Deﬁne \(f_{n}:[0,1]\rightarrow {\mathbb R}\) by \(f_{n}(x)=x^{n}\). Then \(f_{n}(x)\rightarrow f(x)\) as \(n\rightarrow \infty \) where \(f(x)=0\) for \(0\leq x<1\) but \(f(1)=1\).

Thus the pointwise limit of continuous functions need not be continuous.

Uniform convergence is a very useful tool when dealing with integration.

Theorem 14.13. Let \(f_{n}\in C([a,b])\). If \(f_{n}\rightarrow f\) uniformly then \(f\in C([a,b])\) and \[\int _{a}^{b}f_{n}(x)\,dx\rightarrow \int _{a}^{b}f(x)\,dx.\]

Students often miss the full force of this theorem because the formula is so easy to prove. We also need to prove that the second integral actually exists. The second half of Exercise 13.9 shows that the pointwise limit of continuous functions need not be integrable.

Theorem 14.13 should be considered in the context of the following two examples.

Example 14.14 (The tall witch’s hat). Deﬁne \(f_{n}:[0,2]\rightarrow {\mathbb R}\) by

\begin{alignat*}{2} f_{n}(x)=&n(1-n|x-n^{-1}|)&&\qquad \text{for $|x-n^{-1}|\leq n^{-1}$,}\\ f_{n}(x)=&0&&\qquad\text{otherwise.} \end{alignat*}

Then the \(f_{n}\) are continuous functions such that \(f_{n}(x)\rightarrow 0\) as \(n\rightarrow \infty \) but \[\int _{0}^{2}f_{n}(x)\,dx\nrightarrow 0\] as \(n\rightarrow \infty \).
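Each tall witch's hat is a triangle of base \(2/n\) and height \(n\), so its integral is exactly \(1\); the sketch below conﬁrms this with Riemann sums (step counts illustrative).

```python
def tall_witch(n, x):
    # triangle of height n centred at x = 1/n, base [0, 2/n]
    return max(0.0, n * (1.0 - n * abs(x - 1.0 / n)))

def integral(n, steps=200_000):
    """Midpoint Riemann sum for ∫₀² tall_witch(n, x) dx."""
    step = 2.0 / steps
    return sum(tall_witch(n, (k + 0.5) * step) for k in range(steps)) * step

areas = [integral(n) for n in (10, 100, 1000)]
print(areas)   # each close to 1, even though tall_witch(n, x) → 0 pointwise
```

The areas stay at \(1\) while the functions tend to \(0\) at every point: the limit of the integrals is not the integral of the limit.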

Example 14.15 (Escape to inﬁnity). Deﬁne \(f_{n}:{\mathbb R}\rightarrow {\mathbb R}\) by

\begin{alignat*}{2} f_{n}(x)=&1-n^{-1}|x|&&\qquad \text{for $|x|\leq n$,}\\ f_{n}(x)=&0&&\qquad\text{otherwise.} \end{alignat*}

Then the \(f_{n}\) are continuous functions such that \(f_{n}(x)\rightarrow 0\) uniformly as \(n\rightarrow \infty \) but \[\int _{-\infty }^{\infty }f_{n}(x)\,dx\nrightarrow 0\] as \(n\rightarrow \infty \).

Example 14.15 must be well understood since it represents a very common phenomenon which we have to learn to live with.

Traditionally, Theorem 14.13 is always paired with the following result which is really a disguised theorem on integration!

Theorem 14.16. Suppose that \(f_{n}:[a,b]\rightarrow {\mathbb R}\) is diﬀerentiable on \([a,b]\) with continuous derivative \(f_{n}'\) (we take the one-sided derivative at end points if necessary). Suppose that \(f_{n}(x)\rightarrow f(x)\) as \(n\rightarrow \infty \) for each \(x\in [a,b]\) and suppose that \(f_{n}'\) converges uniformly to a limit \(F\) on \([a,b]\). Then \(f\) is diﬀerentiable with derivative \(F\).

The reader knows how to turn results on the limits of sequences into results on inﬁnite sums and vice versa. Applied to Theorems 14.13 and 14.16 this technique produces the following results.

Theorem 14.17 (Term by term integration). Let \(g_{j}:[a,b]\rightarrow {\mathbb R}\) be continuous. If \(\sum _{j=1}^{n}g_{j}\) converges uniformly as \(n\rightarrow \infty \) then \[\int _{a}^{b}\sum _{j=1}^{\infty }g_{j}(x)\,dx =\sum _{j=1}^{\infty }\int _{a}^{b}g_{j}(x)\,dx.\]

Theorem 14.18 (Term by term diﬀerentiation). Let \(g_{j}:[a,b]\rightarrow {\mathbb R}\) be diﬀerentiable with continuous derivative. If \(\sum _{j=1}^{n}g_{j}(x)\) converges for each \(x\) and \(\sum _{j=1}^{n}g_{j}'\) converges uniformly as \(n\rightarrow \infty \) then \(\sum _{j=1}^{\infty }g_{j}\) is diﬀerentiable and \[\frac {d\ }{dx}\left (\sum _{j=1}^{\infty }g_{j}(x)\right ) =\sum _{j=1}^{\infty }g_{j}'(x).\]

Here is a starred application of Theorem 14.16. The reader may, of course, omit it but I include it partly because it is often more useful than the theorem (Theorem 9.13) that it extends and partly because it provides an excellent example of how Theorem 14.16 is used in practice.

Theorem 14.19 (Diﬀerentiation under an inﬁnite integral). Suppose \(g:[0,\infty )\times [c,d]\rightarrow {\mathbb R}\) is continuous and that the partial derivative \(g_{,2}\) exists and is continuous. Suppose further that there exists a continuous function \(h:[0,\infty )\rightarrow {\mathbb R}\) with \(|g_{,2}(x,y)|\leq h(x)\) for all \((x,y)\) and such that \(\int _{0}^{\infty }h(x)\,dx\) exists and is ﬁnite. Then, if \(G(y)=\int _{0}^{\infty }g(x,y)\,dx\) exists for all \(y\in (c,d)\), we have \(G\) diﬀerentiable on \((c,d)\) with \[G'(y)=\int _{0}^{\infty }g_{,2}(x,y)\,dx.\]

We conclude this section with a remark which may appear rather isolated but will become more important in course C12 and later work on complex variable.

Theorem 14.20. We work in \(\mathbb C\). Let \(\sum _{n=0}^{\infty }a_{n}z^{n}\) be a power series with radius of convergence \(R\). If \(0\leq r< R\) then \(\sum _{n=0}^{N}a_{n}z^{n}\) converges uniformly on \(\bar {B}(0,r)\) as \(N\rightarrow \infty \). However, it need not be true that \(\sum _{n=0}^{N}a_{n}z^{n}\) converges uniformly on \(B(0,R)\).
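Theorem 14.20 invites a quick numerical illustration (a sketch, not part of the notes; the function names are mine). Take the geometric series \(\sum z^{n}\), which has \(R=1\): the error of the \(N\)th partial sum on \(|z|=r\) is \(|z^{N+1}/(1-z)|\leq r^{N+1}/(1-r)\), which tends to \(0\) for each fixed \(r<1\) but stays large near \(|z|=1\).

```python
import math
import cmath

def partial_sum(z, N):
    # S_N(z) = sum of z^n for n = 0, ..., N
    return sum(z**n for n in range(N + 1))

def sup_error(r, N, samples=400):
    # approximate the sup of |S_N(z) - 1/(1-z)| over the circle |z| = r
    worst = 0.0
    for k in range(samples):
        z = r * cmath.exp(2j * math.pi * k / samples)
        worst = max(worst, abs(partial_sum(z, N) - 1 / (1 - z)))
    return worst

print(sup_error(0.5, 40))   # tiny: uniform convergence on |z| <= 1/2
print(sup_error(0.99, 40))  # large: convergence is not uniform up to |z| = 1
```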

The next result is a beautiful example of the power of abstraction. In it Banach transformed a ‘folk-technique’ into a theorem.

Theorem 15.1 (The contraction mapping theorem). Let \((X,d)\) be a non-empty complete metric space and \(T:X\rightarrow X\) a mapping such that there exists a \(K<1\) with \(d(Tx,Ty)\leq K d(x,y)\) for all \(x,y\in X\). Then there exists a unique \(x_{0}\in X\) such that \(Tx_{0}=x_{0}\).

More brieﬂy, a contraction mapping on a complete metric space has a unique ﬁxed point.
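The proof is constructive: iterating \(T\) from any starting point gives a sequence converging to the fixed point. Here is a minimal sketch in Python (names are mine), using the classic example \(T=\cos \), which maps \([0,1]\) into itself with \(|\cos 'x|=|\sin x|\leq \sin 1<1\) there.

```python
import math

def fixed_point(T, x0, tol=1e-12, max_iter=1000):
    """Iterate x -> T(x); the contraction mapping theorem guarantees
    convergence to the unique fixed point when T is a contraction."""
    x = x0
    for _ in range(max_iter):
        nxt = T(x)
        if abs(nxt - x) < tol:
            return nxt
        x = nxt
    raise RuntimeError("no convergence")

x = fixed_point(math.cos, 0.5)
print(x)  # about 0.739085, the unique solution of cos x = x
```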

Wide though the conditions are, the reader should exercise caution before attempting to widen them further.

Example 15.2. (i) If \(X\) is the unit circle centre \(\mathbf 0\) in the plane, \(d\) Euclidean distance and \(T\) a non-trivial rotation then \((X,d)\) is a complete metric space and \(d(T{\mathbf x},T{\mathbf y})=d({\mathbf x},{\mathbf y})\) for all \({\mathbf x},{\mathbf y}\in X\) but \(T\) has no ﬁxed point.

(ii) If \(X={\mathbb R}\), \(d\) is Euclidean distance and \[Tx=1+x+g(x)\] for some suitably chosen \(g\) then \((X,d)\) is a complete metric space and \(d(Tx,Ty)< d(x,y)\) for all \(x,y\in X\) but \(T\) has no ﬁxed point.
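For (ii), one hypothetical choice (the notes leave \(g\) unspecified) is \(g(x)=\pi /2-1-\arctan x\), so that \(Tx=x+\pi /2-\arctan x\). The mean value theorem gives \(|Tx-Ty|<|x-y|\) for \(x\neq y\), yet \(Tx-x=\pi /2-\arctan x>0\) for every \(x\), so there is no fixed point and no uniform \(K<1\). A sketch check:

```python
import math

def T(x):
    # Tx = 1 + x + g(x) with the hypothetical choice g(x) = pi/2 - 1 - atan(x),
    # i.e. Tx = x + pi/2 - atan(x)
    return x + math.pi / 2 - math.atan(x)

pts = [i / 7.0 for i in range(-50, 51)]
# no fixed point: Tx - x = pi/2 - atan x > 0 everywhere
assert all(T(x) > x for x in pts)
# distances strictly shrink, but the contraction ratio tends to 1 at infinity,
# so no single K < 1 works
for x in pts:
    for y in pts:
        if x != y:
            assert abs(T(x) - T(y)) < abs(x - y)
print("all checks passed")
```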

We use the contraction mapping theorem to show that a wide class of differential equations actually have solutions. We shall be looking at equations of the form \[y'=f(x,y).\] Our first, simple but important, result is that this problem on differential equations can be turned into a problem on integral equations.

Lemma 15.3. If \(f:{\mathbb R}^{2}\rightarrow {\mathbb R}\) is continuous, \(x_{0},y_{0}\in {\mathbb R}\) and \(\delta >0\) then the following two statements are equivalent.

(A) The function \(y:(x_{0}-\delta ,x_{0}+\delta )\rightarrow {\mathbb R}\) is diﬀerentiable and satisﬁes the equation \(y'(x)=f(x,y(x))\) for all \(x\in (x_{0}-\delta ,x_{0}+\delta )\) together with the boundary condition \(y(x_{0})=y_{0}\).

(B) The function \(y:(x_{0}-\delta ,x_{0}+\delta )\rightarrow {\mathbb R}\) is continuous and satisﬁes the condition \[y(x)=y_{0}+\int _{x_{0}}^{x}f(u,y(u))\,du\] for all \(x\in (x_{0}-\delta ,x_{0}+\delta )\).

Theorem 15.4. Suppose \(f:{\mathbb R}^{2}\rightarrow {\mathbb R}\) is continuous, \(x_{0},y_{0}\in {\mathbb R}\) and \(\delta >0\). Suppose further that there exists a \(K>0\) such that \(K\delta <1\) and

\begin{equation*} |f(x,u)-f(x,v)|\leq K|u-v| \tag*{$\bigstar$} \end{equation*}

for all \(x\in (x_{0}-\delta ,x_{0}+\delta )\) and all \(u,v\in {\mathbb R}\). Then there exists a unique continuous \(y:(x_{0}-\delta ,x_{0}+\delta )\rightarrow {\mathbb R}\) satisfying the condition \[y(x)=y_{0}+\int _{x_{0}}^{x}f(t,y(t))\,dt\] for all \(x\in (x_{0}-\delta ,x_{0}+\delta )\).
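Theorem 15.4 is proved by applying the contraction mapping theorem to the map \((Ty)(x)=y_{0}+\int _{x_{0}}^{x}f(u,y(u))\,du\), a contraction in the uniform norm when \(K\delta <1\). The resulting Picard iteration can be sketched numerically (the grid, the function names and the test equation \(y'=y\) are mine):

```python
import math

def picard(f, x0, y0, delta, n_iter=25, n_grid=2000):
    """Iterate (Ty)(x) = y0 + integral from x0 to x of f(u, y(u)) du on a grid;
    under the Lipschitz hypothesis T is a contraction, so the iterates
    converge to the solution of the integral (hence differential) equation."""
    xs = [x0 + delta * k / n_grid for k in range(n_grid + 1)]
    ys = [y0] * (n_grid + 1)
    h = delta / n_grid
    for _ in range(n_iter):
        new = [y0]
        acc = 0.0
        for k in range(1, n_grid + 1):
            # trapezium rule for the integral of f(u, y(u))
            acc += h * (f(xs[k-1], ys[k-1]) + f(xs[k], ys[k])) / 2
            new.append(y0 + acc)
        ys = new
    return xs, ys

# y' = y, y(0) = 1 on [0, 1/2]: here K = 1 and delta = 1/2, so K*delta < 1
xs, ys = picard(lambda x, y: y, 0.0, 1.0, 0.5)
print(ys[-1], math.exp(0.5))  # both close to e^(1/2)
```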

Condition \(\bigstar \) is called a Lipschitz condition. It is closely related to differentiability (consider the mean value theorem) but does not imply differentiability (consider \(f(x,y)=|y|\)). In the absence of such a condition differential equations can have unexpected properties.

Example 15.5. There are an inﬁnity of diﬀerent solutions to the equation \[y'=3y^{2/3}\] with \(y(0)=0\).
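For the reader who wants the solutions exhibited, one standard family (a sketch, easily checked by direct differentiation) is \[y_{c}(x)=\begin {cases}0&\text {if $x\leq c$,}\\(x-c)^{3}&\text {if $x>c$,}\end {cases}\qquad c\geq 0.\] Each \(y_{c}\) is differentiable with \(y_{c}'(x)=3y_{c}(x)^{2/3}\) and \(y_{c}(0)=0\), and distinct values of \(c\) give distinct solutions. (Note that \(f(x,y)=3y^{2/3}\) fails the Lipschitz condition near \(y=0\).)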

The lecturer cannot bear to let the topic go at this point, but the rest of this section is heavily starred. As it stands, Theorem 15.4 is not quite powerful enough for practical purposes. Careful thought, without the need for new ideas, shows that it can be extended to the following result.

Theorem 15.6. Suppose \(f:{\mathbb R}^{2}\rightarrow {\mathbb R}\) is a continuous function satisfying the following condition. There exists a \(K:[0,\infty )\rightarrow {\mathbb R}\) such that \[|f(x,u)-f(x,v)|\leq K(R) |u-v| \] whenever \(|x|,|u|,|v|\leq R\). Then given any \((x_{0},y_{0})\in {\mathbb R}^{2}\) we can ﬁnd a \(\delta (x_{0},y_{0})>0\) such that there exists a unique diﬀerentiable function \(y:(x_{0}-\delta ,x_{0}+\delta )\rightarrow {\mathbb R}\) which satisﬁes the equation \(y'(x)=f(x,y(x))\) for all \(x\in (x_{0}-\delta ,x_{0}+\delta )\) together with the boundary condition \(y(x_{0})=y_{0}\).

Theorem 15.6 tells us that, under very wide conditions, the differential equation has a local solution through each \((x_{0},y_{0})\). Does it have a global solution, that is, can we find a solution of the equation \(y'(x)=f(x,y(x))\) which is defined for all \(x\in {\mathbb R}\)?

We ﬁnish our discussion by showing that the question is non-trivial.

Theorem 15.7. Suppose \(f:{\mathbb R}^{2}\rightarrow {\mathbb R}\) is a continuous function satisfying the following condition. There exists a \(K:[0,\infty )\rightarrow {\mathbb R}\) such that \[|f(x,u)-f(x,v)|\leq K(R) |u-v| \] for all \(|x|\leq R\) and all \(u\) and \(v\). Then given any \((x_{0},y_{0})\in {\mathbb R}^{2}\) we can find a unique \(y:{\mathbb R}\rightarrow {\mathbb R}\) which is differentiable and satisfies the equation \(y'(x)=f(x,y(x))\) for all \(x\in {\mathbb R}\) together with the boundary condition \(y(x_{0})=y_{0}\).

Example 15.8. If \(f(x,y)=1+y^{2}\) then \[|f(x,u)-f(x,v)|\leq 2R |u-v| \] whenever \(|u|,|v|\leq R\). However, there does not exist a diﬀerentiable function \(y:{\mathbb R}\rightarrow {\mathbb R}\) with \(y'=f(x,y)\) for all \(x\).
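To see where Example 15.8 comes from, the usual separation-of-variables computation (a sketch) gives \[\int \frac {dy}{1+y^{2}}=\int dx,\qquad \text {so}\qquad \arctan y=x+C,\] and the solution through \((0,0)\) is \(y(x)=\tan x\), which exists only on \((-\pi /2,\pi /2)\) and blows up as \(x\rightarrow \pi /2\). No choice of the constant produces a solution defined on all of \({\mathbb R}\).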

(The contents of this section are all starred.)

In course C12 you saw how to solve the diﬀerential equation

\begin{equation*} y''+a(x)y'+b(x)y=f(x) \tag*{$(*)$} \end{equation*}

subject to the conditions \(y(0)=y(1)=0\) by using the Green’s function. There is no doubt in the present author’s mind that the ‘physicist’s approach’ using impulses, delta functions and hand-waving is the most important for the student to master. However, it adds to our confidence in the hand-waving approach to see that its results can be confirmed rigorously.

We start by looking at solutions of

\begin{equation*} y''+a(x)y'+b(x)y=0 \tag*{$(**)$} \end{equation*}

Observe ﬁrst that the results of the last section on the existence of solutions of diﬀerential equations can be extended to the vectorial case with only the most minor variations in the proof. Thus, for example, Theorem 15.7 extends as follows.

Theorem 16.1. Suppose \(f:{\mathbb R}\times {\mathbb R}^{m} \rightarrow {\mathbb R}^{m}\) is a continuous function satisfying the following condition. There exists a \(K:[0,\infty )\rightarrow {\mathbb R}\) such that \[\|f(x,{\mathbf w})-f(x,{\mathbf z})\| \leq K(R) \|{\mathbf w}-{\mathbf z}\| \] for all \(|x|\leq R\) and all \(\mathbf {w}\) and \(\mathbf {z}\). Then given any \(x_{0}\in {\mathbb R}\) and \({\mathbf y}_{0}\in {\mathbb R}^{m}\) we can find a unique \(\mathbf {y}:{\mathbb R}\rightarrow {\mathbb R}^{m}\) which is differentiable and satisfies the equation \[{\mathbf y}'(x)=f(x,{\mathbf y})\] for all \(x\in {\mathbb R}\) together with the boundary condition \(\mathbf {y}(x_{0})=\mathbf {y}_{0}\).

As the reader knows from course C4 (Differential Equations), it is easy to convert equation \((**)\) into a form appropriate for the application of Theorem 16.1. If we set \[{\mathbf y}=\begin {pmatrix}y\\y'\end {pmatrix} \ \text {and}\ f(x,{\mathbf w})= \begin {pmatrix}0&1\\-b(x)&-a(x)\end {pmatrix}{\mathbf w} =\begin {pmatrix}w_{2}\\-b(x)w_{1}-a(x)w_{2}\end {pmatrix},\] then \[{\mathbf y}'(x)=f(x,{\mathbf y}).\] Now \begin {align*} \|f(x,{\mathbf w})-f(x,{\mathbf z})\| &=\left \|\begin {pmatrix} w_{2}-z_{2}\\-b(x)(w_{1}-z_{1})-a(x)(w_{2}-z_{2}) \end {pmatrix}\right \|\\ &\leq (1+|a(x)|+|b(x)|)\|{\mathbf w}-{\mathbf z}\| \end {align*}

so that writing \(K(R)=\sup _{t\in [-R,R]}(1+|a(t)|+|b(t)|)\) we have \[\|f(x,{\mathbf w})-f(x,{\mathbf z})\|\leq K(R) \|{\mathbf w}-{\mathbf z}\|\] whenever \(|x|\leq R\). Thus there is one and only one solution \(y_{[\lambda ]}\), say, of \((**)\) with \(y(0)=0\) and \(y'(0)=\lambda \). By linearity, \(y_{[\lambda ]}=\lambda y_{[1]}\).

Let us write \(y_{1}\) for the solution of \((**)\) with \(y_{1}(0)=0\), \(y_{1}'(0)=1\). We make the following \[\text {\bf key assumption}\qquad y_{1}(1)\neq 0.\] We write \(y_{2}\) for the solution of \((**)\) with \(y_{2}(1)=0\), \(y_{2}'(1)=1\). The next lemma is familiar from courses C4 and C10. We define \[W(x)=y_{1}(x)y_{2}'(x)-y_{2}(x)y_{1}'(x).\] The function \(W\) is called the Wronskian.

Lemma 16.2. (i) The Wronskian is diﬀerentiable with \[W'(x)=-a(x)W(x).\]

(ii) The Wronskian is never zero.

We can now deﬁne the Green’s function \(G:{\mathbb R}^{2}\rightarrow {\mathbb R}\) by

\begin{alignat*}{2} G(s,t)=&y_{1}(s)y_{2}(t)W(t)^{-1}&&\qquad\text{for $s<t$}\\ G(s,t)=&y_{2}(s)y_{1}(t)W(t)^{-1}&&\qquad\text{for $t\leq s$.} \end{alignat*}

(Why should we do any such thing? Because the intuitive arguments of C10 tell us to.) Using Lemma 9.14 it is easy to verify our main result.

Theorem 16.3. If \(f:{\mathbb R}\rightarrow {\mathbb R}\) is continuous then \[y(x)=\int _{0}^{1}G(x,w)f(w)\,dw\] is twice diﬀerentiable and satisﬁes

\begin{equation*} y''+a(x)y'+b(x)y=f(x) \tag*{$(*)$} \end{equation*}

together with the conditions \(y(0)=y(1)=0\).

(Our proof is very direct and involves neither diﬀerentiation under the integral nor Lemma 9.14. However, it is not hard to see that such ideas would be relevant in more complicated situations.)
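In the hypothetical special case \(a=b=0\) (so \((*)\) reads \(y''=f\)) everything is computable by hand: \(y_{1}(s)=s\), \(y_{2}(s)=s-1\) and \(W=1\), so \(G(s,t)=s(t-1)\) for \(s<t\) and \(G(s,t)=(s-1)t\) for \(t\leq s\). Theorem 16.3 can then be checked numerically (the quadrature code is a sketch of mine):

```python
def G(s, t):
    # Green's function for the special case a = b = 0,
    # i.e. y'' = f with y(0) = y(1) = 0: here y1(s) = s, y2(s) = s - 1, W = 1
    return s * (t - 1) if s < t else (s - 1) * t

def solve(f, x, n=20000):
    # y(x) = integral from 0 to 1 of G(x, w) f(w) dw, midpoint rule
    h = 1.0 / n
    return sum(G(x, (k + 0.5) * h) * f((k + 0.5) * h) for k in range(n)) * h

# with f = 1 the exact solution of y'' = 1, y(0) = y(1) = 0 is y(x) = (x^2 - x)/2
for x in (0.25, 0.5, 0.75):
    print(solve(lambda w: 1.0, x), (x * x - x) / 2)
```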

From the point of view of more advanced work this shows how differential operators like \[y\mapsto Sy=y''+a(x)y'+b(x)y\] can be linked with better behaved integral operators like \[f\mapsto Tf \ \text {with}\ Tf(x)=\int _{0}^{1}G(x,v)f(v)\,dv.\] Note that we have shown \(STf=f\) for \(f\) continuous, but note also that, if \(f\) is merely continuous, \(Sf\) need not be defined. The Green’s function \(G\) is an example of an integral kernel. More formally, if we write \[Tf(x)=\int _{0}^{1}K(x,v)f(v)\,dv,\] then \(T\) is called an integral operator with kernel \(K\). (So far as I know, there is no connection with the ‘kernel’ that you meet in the linear mathematics course P1.)

We end with an example to show that things really do go awry if our key assumption fails.

Since new introductions to analysis pour off the printing presses in an unending torrent, it may well be my fault that I cannot recommend one book to cover the whole course. Spivak’s Calculus [8] and J. C. Burkill’s A First Course in Mathematical Analysis [3] are both excellent introductions to analysis but only deal with the real line and not with \({\mathbb R}^{m}\). J. C. Burkill and H. Burkill’s A Second Course in Mathematical Analysis [4] is also good but rather old-fashioned. (Of course, this has nothing to do with the authors’ knowledge but a great deal to do with what was then thought the appropriate content for a second course.) Dieudonné’s Foundations of Modern Analysis [5] is superb but written by someone who neither knew nor cared what others thought was the appropriate content for a second course. A suitable compromise is reached in Marsden and Hoffman’s Elementary Classical Analysis [7]. This covers the material on multidimensional calculus very clearly and is probably the book that most students will find most useful. Make sure that your college library has a copy of the second edition.

The material on metric spaces is covered in Sutherland’s Introduction to Metric and Topological Spaces [9] which is workmanlike though a bit dull. (Once analytic topology is detached from its applications it takes a very gifted writer to make it exciting.)

I would be glad to hear of other suggestions for suitable books. A completely unsuitable but interesting version of the standard analysis course is given by Berlinski’s A Tour of the Calculus [1] — Spivak rewritten by Sterne with additional purple passages by the Ankh-Morpork tourist board.

[1] D. Berlinski, A Tour of the Calculus, Mandarin Paperbacks, 1997.

[2] R. P. Boas, Lion Hunting and Other Mathematical Pursuits (Editors G. L. Alexanderson and D. H. Mugler), Dolciani Mathematical Expositions, Vol. 15, MAA, 1995.

[3] J. C. Burkill, A First Course in Mathematical Analysis, CUP, 1962.

[4] J. C. Burkill and H. Burkill, A Second Course in Mathematical Analysis, CUP, 1970.

[5] J. Dieudonné, Foundations of Modern Analysis, Academic Press, 1960.

[6] G. H. Hardy, A Course of Pure Mathematics, CUP, 1908. (Still available in its 10th edition.)

[7] J. E. Marsden and M. J. Hoffman, Elementary Classical Analysis (2nd edition), W. H. Freeman, New York, 1993.

[8] M. Spivak, Calculus, Addison-Wesley/Benjamin-Cummings, 1967.

[9] W. A. Sutherland, Introduction to Metric and Topological Spaces, OUP, 1975.

[10] S. Wagon, The Banach–Tarski Paradox, CUP, 1993.

[11] E. T. Whittaker and G. N. Watson, A Course of Modern Analysis, CUP, 1902. (Still available in its 4th edition.)

I have tried to produce 12 rather routine questions for each sheet. Any further questions are for interest only. Ambitious students and their supervisors should look at Tripos questions to supplement this meagre fare.

Q 18.1. (We work with the same ideas as in Example 1.1.) (i) Find a diﬀerentiable function \(f:{\mathbb Q}\rightarrow {\mathbb Q}\) such that \(f'(x)=1\) for all \(x\in {\mathbb Q}\) but \(f(0)>f(1)\).

(ii) If we deﬁne \(g:{\mathbb Q}\rightarrow {\mathbb Q}\) by \(g(x)=(x^{2}-2)^{-2}\) show that \(g\) is continuous but unbounded on \(\{x\in {\mathbb Q}:|x|\leq 2\}\).
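One way to see (ii) concretely is to evaluate \(g\) at good rational approximations to \(\sqrt 2\); the iteration used below is the Euclid-style map from Question 18.13, and the code is a sketch of mine using exact rational arithmetic:

```python
from fractions import Fraction

def g(x):
    # g(x) = (x^2 - 2)^(-2), continuous on Q since x^2 = 2 has no rational root
    return 1 / (x * x - 2) ** 2

# Euclid-style approximations a/b -> (3a + 4b)/(2a + 3b) get ever closer to
# sqrt(2), so g blows up even though |x| <= 2 throughout
x = Fraction(1, 1)
vals = []
for _ in range(6):
    vals.append(g(x))
    x = Fraction(3 * x.numerator + 4 * x.denominator,
                 2 * x.numerator + 3 * x.denominator)
print([float(v) for v in vals])  # strictly increasing, without bound
```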

Q 18.2. In Question 18.3 below you are asked to prove Lemma 2.1. Write a paragraph on each of the following subjects.

(i) Would it have been better for the lecturer to prove Lemma 2.1 rather than leave it to you? Why?

(ii) What beneﬁt, if any, do you expect to derive from doing Lemma 2.1? Give reasons.

(iii) What is the best way of approaching Question 18.1? For example, should you look at your notes from last year before attacking the question? Should you look at your notes from last year after attacking the question?

Q 18.3. Prove Lemma 2.1 (restated below for your convenience).

(i) The limit is unique. That is, if \(a_{n}\rightarrow a\) and \(a_{n}\rightarrow b\) as \(n\rightarrow \infty \) then \(a=b\).

(ii) If \(a_{n}\rightarrow a\) as \(n\rightarrow \infty \) and \(n(1)<n(2)<n(3)\ldots \) then \(a_{n(j)}\rightarrow a\) as \(j\rightarrow \infty \).

(iii) If \(a_{n}=c\) for all \(n\) then \(a_{n}\rightarrow c\) as \(n\rightarrow \infty \).

(iv) If \(a_{n}\rightarrow a\) and \(b_{n}\rightarrow b\) as \(n\rightarrow \infty \) then \(a_{n}+b_{n}\rightarrow a+b\).

(v) If \(a_{n}\rightarrow a\) and \(b_{n}\rightarrow b\) as \(n\rightarrow \infty \) then \(a_{n}b_{n}\rightarrow ab\).

(vi) If \(a_{n}\rightarrow a\) as \(n\rightarrow \infty \) and \(a_{n}\neq 0\) for each \(n\), \(a\neq 0\) then \(a_{n}^{-1}\rightarrow a^{-1}\).

(vii) If \(a_{n}\leq A\) for each \(n\) and \(a_{n}\rightarrow a\) as \(n\rightarrow \infty \) then \(a\leq A\).

Q 18.4. Prove Lemma 2.2 which states that a decreasing sequence of real numbers bounded below tends to a limit.

Q 18.5. (i) If \(a_{j}\) is an integer with \(0\leq a_{j}\leq 9\) show from the fundamental axiom that \[\sum _{j=1}^{\infty }a_{j}10^{-j}\] exists. Show that \(0\leq \sum _{j=1}^{\infty }a_{j}10^{-j}\leq 1\), carefully quoting any theorems that you use.

(ii) If \(0\leq x\leq 1\) show that we can ﬁnd integers \(x_{j}\) with \(0\leq x_{j}\leq 9\) such that \[x=\sum _{j=1}^{\infty }x_{j}10^{-j}.\] What important deep result do you use?

(iii) If \(a_{j}\) and \(b_{j}\) are integers with \(0\leq a_{j},b_{j}\leq 9\) and \(a_{j}=b_{j}\) for \(j<N\), \(a_{N}>b_{N}\) show that \[\sum _{j=1}^{\infty }a_{j}10^{-j}\geq \sum _{j=1}^{\infty }b_{j}10^{-j}.\] Give the precise necessary and suﬃcient condition for equality and prove it.

Q 18.6. (i) Let us write \[S_{n}=\sum _{r=0}^{n}\frac {1}{r!}.\] Show, from ﬁrst principles, that \(S_{n}\) converges to a limit (which, with the beneﬁt of extra knowledge, we call \(e\)).

(ii) Show that, if \(n\geq 2\) and \(r\geq 0\) then \[\frac {n!}{(n+r)!}\leq \frac {1}{3^{r}}.\] Deduce carefully that, if \(m\geq n\geq 2\), \[0\leq n!(S_{m}-S_{n})\leq \frac {1}{2}\] and that \[0<n!(e-S_{n})\leq \frac {1}{2}.\] Deduce that \(n!e\) is not an integer for any \(n\) and conclude that \(e\) is irrational.

(iii) Show similarly that \(\displaystyle \sum _{r=0}^{\infty }\frac {1}{(2r)!}\) is irrational.
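The key estimate in Question 18.6 (ii) is easy to check numerically; in the sketch below (names mine) the partial sum \(S_{40}\) stands in for \(e\), which only makes the gap smaller, so the asserted bounds still hold.

```python
from fractions import Fraction
from math import factorial

def S(n):
    # partial sum S_n = sum of 1/r! for r = 0, ..., n, in exact arithmetic
    return sum(Fraction(1, factorial(r)) for r in range(n + 1))

# the key estimate: for m >= n >= 2, we have 0 < n!(S_m - S_n) <= 1/2
for n in (2, 5, 10):
    gap = factorial(n) * (S(40) - S(n))
    assert Fraction(0) < gap <= Fraction(1, 2)
    print(n, float(gap))
```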

Q 18.7. In this question you should feel free to use any results you know (provided you quote them correctly).

(i) By using the mean value theorem, show that \((1+x^{-1})^{x}\) is an increasing function of \(x\) for \(x>0\).

(ii) Show that \((1+n^{-1})^{n}\) tends to a limit \(L\) as \(n\rightarrow \infty \).

(iii) Show that, if \(n\geq N\), \[e\geq \left (1+\frac {1}{n}\right )^{n}\geq \sum _{r=0}^{N}\binom {n}{r}n^{-r}.\] Deduce that \[e\geq L\geq \sum _{r=0}^{N}\frac {1}{r!},\] and conclude that \[\left (1+\frac {1}{n}\right )^{n}\rightarrow e\] as \(n\rightarrow \infty \).

(iv) Assuming the Taylor expansion for \(e^{t}\) show that \[\left (1+\frac {t}{n}\right )^{n}\rightarrow e^{t}\] as \(n\rightarrow \infty \) for all \(t>0\).

(v) Show that \[1-\left (1+\frac {t^{2}}{n^{2}}\right )^{n}\rightarrow 0\] and deduce that \[1-\left (1-\frac {t^{2}}{n^{2}}\right )^{n}\rightarrow 0\] as \(n\rightarrow \infty \). Hence deduce that \[\left (1-\frac {t}{n}\right )^{n}\rightarrow e^{-t}\] as \(n\rightarrow \infty \) for all \(t>0\).

(vi) Conclude that \[\left (1+\frac {t}{n}\right )^{n}\rightarrow e^{t}\] as \(n\rightarrow \infty \) for all real \(t\).
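The conclusion of Question 18.7 is easy to watch happening numerically (a sketch of mine; the error behaves like \(e^{t}t^{2}/(2n)\), though that asymptotic is not needed for the check):

```python
import math

def approx(t, n):
    # the classical approximation (1 + t/n)^n to e^t
    return (1 + t / n) ** n

for t in (1.0, -2.0, 3.5):
    errs = [abs(approx(t, n) - math.exp(t)) for n in (10, 100, 1000, 10000)]
    assert errs == sorted(errs, reverse=True)  # error shrinks as n grows
    print(t, errs[-1])
```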

Q 18.8. (a) Prove the statements made in Example 3.9.

(i) The sum \(\sum _{n=1}^{\infty }n^{-2}z^{n}\) has radius of convergence \(1\) and converges for all \(|z|=1\).

(ii) The sum \(\sum _{n=0}^{\infty }z^{n}\) has radius of convergence \(1\) and diverges for all \(|z|=1\).

(iii) The sum \(\sum _{n=1}^{\infty }n^{-1}z^{n}\) has radius of convergence \(1\), diverges for \(z=1\) and converges for \(z=-1\).

(b) Suppose that \(\sum _{j=0}^{\infty }a_{j}z^{j}\) has radius of convergence \(R\) and \(\sum _{j=0}^{\infty }b_{j}z^{j}\) has radius of convergence \(S\).

(i) Show that if \(R\neq S\) then \(\sum _{j=0}^{\infty }(a_{j}+b_{j})z^{j}\) has radius of convergence \(\min (R,S)\).

(ii) Show that if \(R=S\) then \(\sum _{j=0}^{\infty }(a_{j}+b_{j})z^{j}\) has radius of convergence at least \(R\).

(iii) If \(T\geq 1=R=S\) give an example where \(\sum _{j=0}^{\infty }(a_{j}+b_{j})z^{j}\) has radius of convergence \(T\).

Q 18.9. (i) Suppose that \(x_{n}\) is a bounded sequence of real numbers. Show that \(y_{n}=\sup _{m\geq n}x_{m}\) is a well deﬁned bounded decreasing sequence and so \(y_{n}\) tends to a limit \(y\) say. We write \[\limsup _{n\rightarrow \infty }x_{n}=y.\]

(ii) Let \(a_{n}\) be a sequence of complex numbers. Show that if the sequence \(|a_{n}|^{1/n}\) is unbounded, then \(\sum _{j=0}^{\infty }a_{j}z^{j}\) has radius of convergence \(0\) and if the sequence \(|a_{n}|^{1/n}\) is bounded and non-zero, then \(\sum _{j=0}^{\infty }a_{j}z^{j}\) has radius of convergence \((\limsup _{n\rightarrow \infty }|a_{n}|^{1/n})^{-1}\). State, with reasons, what happens when \(\limsup _{n\rightarrow \infty }|a_{n}|^{1/n}=0\).
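The formula \(R=(\limsup |a_{n}|^{1/n})^{-1}\) can be read off numerically for specific coefficient sequences; working with logarithms avoids floating-point overflow (a sketch of mine):

```python
import math

def root(log_abs_a_n, n):
    # |a_n|^(1/n) computed as exp(log|a_n| / n)
    return math.exp(log_abs_a_n / n)

n = 2000
print(root(-2 * math.log(n), n))      # a_n = n^(-2): tends to 1, so R = 1
print(root(n * math.log(2), n))       # a_n = 2^n:    tends to 2, so R = 1/2
print(root(-math.lgamma(n + 1), n))   # a_n = 1/n!:   tends to 0, so R = infinity
```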

Q 18.10. Suppose that \(A\) and \(B\) are non-empty bounded subsets of \(\mathbb R\). Show that \[\sup \{a+b:a\in A,\ b\in B\}=\sup A+\sup B.\] The last formula is more frequently written \[\sup _{a\in A,\ b\in B}a+b=\sup _{a\in A}a+\sup _{b\in B}b.\]

Suppose, further, that \(a_{n}\) and \(b_{n}\) are bounded sequences of real numbers. For each of the following statements either give a proof that it is always true or an example to show that it is sometimes false.

(i) \(\sup _{n}(a_{n}+b_{n})=\sup _{n}a_{n}+\sup _{n}b_{n}\).

(ii) \(\sup _{a\in A,\ b\in B}ab=(\sup _{a\in A}a)(\sup _{b\in B}b)\).

(iii) \(\inf _{a\in A,\ b\in B}a+b=\inf _{a\in A}a+\inf _{b\in B}b\).

Q 18.11. Prove all the statements made in Lemma 4.3, viz:

(i) The limit is unique. That is, if \(\mathbf {a}_{n}\rightarrow \mathbf {a}\) and \(\mathbf {a}_{n}\rightarrow \mathbf {b}\) as \(n\rightarrow \infty \) then \(\mathbf {a}=\mathbf {b}\).

(ii) If \(\mathbf {a}_{n}\rightarrow \mathbf {a}\) as \(n\rightarrow \infty \) and \(n(1)<n(2)<n(3)\ldots \) then \(\mathbf {a}_{n(j)}\rightarrow \mathbf {a}\) as \(j\rightarrow \infty \).

(iii) If \(\mathbf {a}_{n}=\mathbf {c}\) for all \(n\) then \(\mathbf {a}_{n}\rightarrow \mathbf {c}\) as \(n\rightarrow \infty \).

(iv) If \(\mathbf {a}_{n}\rightarrow \mathbf {a}\) and \(\mathbf {b}_{n}\rightarrow \mathbf {b}\) as \(n\rightarrow \infty \) then \(\mathbf {a}_{n}+\mathbf {b}_{n} \rightarrow \mathbf {a}+\mathbf {b}\).

(v) Suppose \(\mathbf {a}_{n}\in {\mathbb R}^{m}\), \(\mathbf {a}\in {\mathbb R}^{m}\) \(\lambda _{n}\in {\mathbb R}\), and \(\lambda \in {\mathbb R}\). If \(\mathbf {a}_{n}\rightarrow \mathbf {a}\) and \(\lambda _{n}\rightarrow \lambda \) then \(\lambda _{n}\mathbf {a}_{n}\rightarrow \lambda \mathbf {a}\).

Q 18.12. Use the Bolzano–Weierstrass theorem to prove the general principle of convergence in \({\mathbb R}^{m}\) (Theorem 4.6).

The remaining questions are included for general interest and not for relevance to the syllabus or to passing Tripos exams.

Q 18.13. By comparing \[\frac {a}{b}\ \text {with}\ \frac {3a+4b}{2a+3b}\] show that given any \(x\in {\mathbb Q}\) with \(x^{2}\leq 2\) we can ﬁnd a \(y\in {\mathbb Q}\) with \(y>x\) and \(y^{2}\leq 2\). Deduce that the set \(\{x\in {\mathbb Q}:x^{2}<2\}\) has no supremum in \(\mathbb Q\). (The idea of comparing \(a/b\) with \((3a+4b)/(2a+3b)\) goes back as far as Euclid.)
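In exact rational arithmetic the comparison of Question 18.13 becomes a computation; the algebraic identities in the comments below are easily verified by hand (the code itself is a sketch of mine):

```python
from fractions import Fraction

def step(x):
    # a/b -> (3a + 4b)/(2a + 3b); note y^2 - 2 = (a^2 - 2b^2)/(2a + 3b)^2,
    # so x^2 <= 2 forces y^2 <= 2, while y - x = 2(2b^2 - a^2)/(b(2a + 3b)) > 0
    a, b = x.numerator, x.denominator
    return Fraction(3 * a + 4 * b, 2 * a + 3 * b)

x = Fraction(1)
for _ in range(8):
    y = step(x)
    assert y > x and y * y <= 2   # always a strictly larger member of the set
    x = y
print(x, float(x))  # a rational just below sqrt(2)
```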

Q 18.14. If \(a<c<b\) and \(E\) is a subset of \(\mathbb R\) explain why if \(E\cap [a,c]\) and \(E\cap [c,b]\) are ﬁnite then so is \(E\cap [a,b]\). Use this to give a proof by bisection of Bolzano’s theorem.

Q 18.15. Do Exercise 3.14 showing the uncountability of the reals.

Let \(y_{1}\), \(y_{2}\), … be any sequence of points in \(\mathbb R\). Let \(x_{0}=0\), \(\delta _{0}=1\).

(i) Show that you can construct inductively a sequence of real numbers \(x_{1}\), \(x_{2}\), … and positive numbers \(\delta _{j}\) such that

(a) \(|x_{n}-x_{n-1}|<\delta _{n-1}/4\),

(b) \(x_{n}\neq y_{n}\),

(c) \(0<\delta _{n}<|x_{n}-y_{n}|\),

(d) \(\delta _{n}<\delta _{n-1}/4\).

(ii) Show that \(\delta _{n+m}<4^{-m}\delta _{n}\) for \(m,n\geq 0\) and deduce that the \(x_{n}\) form a Cauchy sequence. Conclude that \(x_{n}\) tends to a limit \(x\).

(iii) Show that \(|x_{n+m}-x_{n}|<\delta _{n}/3\) for all \(m,n\geq 0\). Deduce that \(|x-x_{n}|\leq \delta _{n}/3\) for all \(n\geq 0\). Why does this show that \(y_{n}\neq x\) for all \(n\)?

(iv) Prove that the real numbers are uncountable.

Q 18.16. Suppose we consider the collection \(\mathcal Q\) of rational functions \[f(X)=\frac {a_{0}+a_{1}X+\dots +a_{n}X^{n}} {b_{0}+b_{1}X+\dots +b_{m}X^{m}}\] (where the \(a_{j}\) and \(b_{j}\) are real and where \(b_{m}\neq 0\) and \(a_{n}\neq 0\) if \(n\geq 1\)) with the usual rules for addition and multiplication. Suppose that we say that \(f(X)>0\) if \(a_{n}b_{m}>0\) and that \(f>g\) if \(f-g>0\). Convince yourself that \(\mathcal Q\) has all the standard properties of arithmetic and order we expect. (Some students will only be able to convince themselves by looking up the axioms for an ordered ﬁeld in, for example, [8] and then verifying each axiom in turn. Good luck to them, I say.)

Show that \(1/n>1/X>0\) for all integers \(n\) and so Archimedes’ Axiom is false for \(\mathcal Q\).

I have tried to produce 12 rather routine questions for each sheet. Any further questions are for interest only. Ambitious students and their supervisors should look at Tripos questions to supplement this meagre fare.

Q 19.1. Here is an alternative proof of Theorem 6.8.

(i) Let \(K\geq 0\) and \(\epsilon >0\). If \(f:[a,b]\rightarrow {\mathbb R}\) has the property that \[f(b)-f(a)\geq (K+\epsilon )(b-a)\] and \(N\) is a strictly positive integer, show that there exists an integer \(r\) with \(1\leq r\leq N\) and \[f(a+r(b-a)/N)-f(a+(r-1)(b-a)/N)\geq (K+\epsilon )(b-a)/N.\]

(ii) Under the conditions of part (i) deduce that there exists a sequence of closed intervals \([x_{n},y_{n}]\) with \[[a,b]=[x_{0},y_{0}]\supseteq [x_{1},y_{1}]\supseteq [x_{2},y_{2}]\supseteq \dots \] such that \(f(y_{n})-f(x_{n})\geq (K+\epsilon )(y_{n}-x_{n})\) and \(y_{n}-x_{n}\rightarrow 0\) as \(n\rightarrow \infty \).

(iii) Continuing with the same notation, show that \(x_{n}\rightarrow c\) for some \(c\in [a,b]\). Show that \(c\in [x_{n},y_{n}]\) for all \(n\) and that, if \(c\neq x_{n},y_{n}\) then \[\max \left ( \frac {f(y_{n})-f(c)}{y_{n}-c}, \frac {f(c)-f(x_{n})}{c-x_{n}} \right )\geq K+\epsilon .\] Conclude that if \(f\) is diﬀerentiable at \(c\) then \(f'(c)\geq K+\epsilon \).

(iv) Deduce all the results of Theorem 6.8.

Q 19.2. (i) If \(E\) is a set show that the set \[E^{\circ }=\bigcup \{U:\text {$U$ is open and $U\subseteq E$}\}\] has the following properties.

(a) \(E^{\circ }\) is open.

(b) \(E^{\circ }\subseteq E\).

(c) If \(A\) is open and \(A\subseteq E\) then \(A\subseteq E^{\circ }\).

(ii) Suppose that \(E'\) is such that

(a) \(E'\) is open.

(b) \(E'\subseteq E\).

(c) If \(A\) is open and \(A\subseteq E\) then \(A\subseteq E'\).

Show that \(E'=E^{\circ }\).

(iii) Explain why (i) and (ii) make it possible to deﬁne the interior of \(E\) as the largest open set contained in \(E\).

(iv) Show why it is possible to deﬁne the closure of \(E\) as the smallest closed set containing \(E\).

(v) Find the interior and closure of \((a,b)\), \([a,b)\), \((a,b]\) and \([a,b]\) where \(a<b\).

Q 19.3. Do Exercise 5.13, restated below.

After looking at parts (iii) to (v) of Lemma 4.3 state the corresponding results for continuous functions. (Thus part (v) corresponds to the statement that if \(\lambda :E\rightarrow {\mathbb R}\) and \(f:E\rightarrow {\mathbb R}^{p}\) are continuous at \({\mathbf x}\in E\) then so is \(\lambda f\).) Prove your statements directly from Deﬁnition 5.12.

Q 19.4. If \(f,g:{\mathbb R}^{p}\rightarrow {\mathbb R}^{m}\) are continuous show that \[\begin {pmatrix} f\\g \end {pmatrix} :{\mathbb R}^{p}\rightarrow {\mathbb R}^{2m} ={\mathbb R}^{m}\times {\mathbb R}^{m}\] deﬁned by \[\begin {pmatrix} f\\g \end {pmatrix}({\mathbf x})= \begin {pmatrix} f({\mathbf x})\\g({\mathbf x}) \end {pmatrix}\] is continuous.

Show that the map \(A:{\mathbb R}^{m}\times {\mathbb R}^{m} \rightarrow {\mathbb R}^{m}\) deﬁned by \[A\begin {pmatrix} \mathbf {x}\\ \mathbf {y} \end {pmatrix}=\mathbf {x}+\mathbf {y}\] whenever \(\mathbf {x},\mathbf {y}\in {\mathbb R}^{m}\) is continuous. By applying the continuity of compositions of continuous functions (Lemma 5.16) to \(\displaystyle A\circ \begin {pmatrix} f\\g \end {pmatrix}\) deduce that \(f+g\) is continuous.

Obtain the other results of Question 19.3 similarly.

Q 19.5. Let \(f:{\mathbb R}\rightarrow {\mathbb R}\). State which of the following statements are always true and which may be false giving a proof or a counter example as appropriate.

(i) If \(f^{-1}(E)\) is closed whenever \(E\) is closed then \(f\) is continuous.

(ii) If \(f\) is continuous then \(f(U)\) is open whenever \(U\) is open.

(iii) If \(f\) is continuous then \(f(E)\) is bounded whenever \(E\) is bounded.

(iv) If \(f(E)\) is bounded whenever \(E\) is bounded then \(f\) is continuous.

Q 19.6. Suppose that \(E\) is a subset of \({\mathbb R}^{n}\). Show that \(E\) is closed and bounded if and only if every continuous function \(f:E\rightarrow {\mathbb R}^{n}\) is bounded.

Q 19.7. If \(A\) and \(B\) are subsets of \({\mathbb R}^{m}\) we write \[A+B=\{{\mathbf a}+{\mathbf b}: {\mathbf a}\in A,\ {\mathbf b}\in B\}\]

By considering the case \(m=1\), \[A=\{-n:n\in {\mathbb N}\}\ \text {and} \ B=\{n+1/n:n\in {\mathbb N},\ n\geq 2\}\] show that if \(A\) and \(B\) are closed it does not follow that \(A+B\) is closed.

Show however that if \(A\) is closed and bounded and \(B\) is closed then \(A+B\) is closed.

If \(A\) and \(B\) are open, does it follow that \(A+B\) is open? Give reasons.

Q 19.8. Consider the map \(\Omega :{\mathbb R}^{6}\rightarrow {\mathbb R}^{3}\) given by \[\Omega \begin {pmatrix}{\mathbf x}\\{\mathbf y} \end {pmatrix}= {\mathbf x}\wedge {\mathbf y}\] for all \({\mathbf x},{\mathbf y}\in {\mathbb R}^{3}\). (Here \(\wedge \) is the ‘vector product’ of mathematical methods.) Show directly from the deﬁnition that \(\Omega \) is diﬀerentiable.

Having done this, ﬁnd the Jacobian matrix of \(\Omega \).

Q 19.9 (Very traditional). Consider the map \(f:{\mathbb R}^{2}\rightarrow {\mathbb R}\) given by \[f(x,y)=xy|x-y|.\] At which points is \(f\) diﬀerentiable? Prove your statements.

[Deal quickly with the easy part. Do not be surprised if the hard part reveals a special case that must be dealt with separately.]

Q 19.10 (Also very traditional). (i) Consider the function \(f:{\mathbb R}\rightarrow {\mathbb R}\) given by \(f(t)=t^{2}\sin 1/t\) for \(t\neq 0\), \(f(0)=0\). Show that \(f\) is everywhere differentiable and find its derivative. Show that \(f'\) is not continuous. [Deal quickly with the easy part and then go back to the definition to deal with \(t=0\). There is a wide selection of counter-examples obtained by looking at \(t^{\beta }\sin t^{\alpha }\) for various values of \(\alpha \) and \(\beta \).]

(ii) Find an inﬁnitely diﬀerentiable function \(g:(1,\infty )\rightarrow {\mathbb R}\) such that \(g(t)\rightarrow 0\) but \(g'(t)\nrightarrow 0\) as \(t\rightarrow \infty \).

(iii) Consider the function \(F:{\mathbb R}^{2}\rightarrow {\mathbb R}\) given by \(F(s,t)=f(s)+f(t)\) where \(f\) is the function deﬁned in (i). Show that \(F\) is everywhere diﬀerentiable but \(F_{,1}\) and \(F_{,2}\) are not continuous at \((0,0)\).

Q 19.11. Let \(\mathbf {u}\) and \(\mathbf {v}\) be linearly independent vectors in \({\mathbb R}^{2}\). Suppose that \(F:{\mathbb R}^{2}\rightarrow {\mathbb R}\) has the property that \[\frac {F({\mathbf x}+h{\mathbf u})-F({\mathbf x})}{h} \rightarrow a({\mathbf x}),\ \text {and} \ \frac {F({\mathbf x}+h{\mathbf v})-F({\mathbf x})}{h} \rightarrow b({\mathbf x})\] as \(h\rightarrow 0\) for all \({\mathbf x}\in {\mathbb R}^{2}\). Suppose further that \(a:{\mathbb R}^{2}\rightarrow {\mathbb R}\) is continuous. Show that \(F\) is everywhere diﬀerentiable.

Q 19.12. Show that Definition 7.4 can be modified to give the definition of an operator norm \(\|\alpha \|\) for a linear map \(\alpha :{\mathbb C}^{m}\rightarrow {\mathbb C}^{p}\) and that the results of Lemma 7.6 continue to hold mutatis mutandis.

If \(\alpha \) has matrix \((a_{ij})\) with respect to the standard bases show that \[\max _{i,j}|a_{ij}|\leq \|\alpha \|\leq mp\max _{i,j}|a_{ij}|.\]
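For a real \(2\times 2\) matrix the operator norm can be computed exactly as the square root of the larger eigenvalue of \(A^{T}A\) (found from its trace and determinant), which makes the displayed bounds with \(m=p=2\) easy to check; the real case is a sketch standing in for the complex one, and the function name is mine.

```python
import math

def op_norm_2x2(a, b, c, d):
    """Operator (spectral) norm of [[a, b], [c, d]]: the square root of the
    larger eigenvalue of A^T A, found from its trace and determinant."""
    t = a*a + b*b + c*c + d*d          # trace of A^T A
    det = (a*d - b*c) ** 2             # det of A^T A = (det A)^2
    lam = (t + math.sqrt(t*t - 4*det)) / 2
    return math.sqrt(lam)

A = (1.0, -3.0, 2.0, 0.5)
norm = op_norm_2x2(*A)
m = max(abs(x) for x in A)
# the displayed bound max|a_ij| <= ||alpha|| <= mp * max|a_ij| with m = p = 2
assert m <= norm <= 4 * m
print(norm)
```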

For the rest of this question we take \(m=p\) (so we are dealing with endomorphisms of the vector space \({\mathbb C}^{m}\)).

(i) If \(\alpha \) has a lower triangular matrix \((a_{ij})\) with respect to the standard basis show that the set of its eigenvalues is precisely the set of the diagonal entries \(a_{ii}\). (Pure algebra.)

(ii) If \(\alpha \) has a lower triangular matrix \((a_{ij})\) and \(\epsilon >0\) show that we can ﬁnd \(\beta \) such that \(\beta \) has \(m\) distinct non-zero eigenvalues and \(\|\beta -\alpha \|<\epsilon \).

(iii) In algebra you show that given any endomorphism \(\alpha \) we can find an invertible endomorphism \(\theta \) such that \(\theta \alpha \theta ^{-1}\) has a lower triangular matrix with respect to the standard basis. (N.B. This is true for complex vector spaces but not for real ones.) Use this fact to show that given any endomorphism \(\alpha \) and any \(\eta >0\) we can find an endomorphism \(\gamma \) such that \(\|\gamma -\alpha \|<\eta \) and \(\gamma \) has \(m\) distinct non-zero eigenvalues.

(iv) Write \(P_{\alpha }(t)=\det (t\iota -\alpha )\) (where \(\iota \) is the identity endomorphism). Show that if \(\alpha \) has \(m\) distinct eigenvalues then \(P_{\alpha }(\alpha )=0\). (Pure algebra.)

(v) Let \(\alpha \) be any endomorphism. By considering a sequence \(\alpha _{n}\) of endomorphisms with \(m\) distinct eigenvalues such that \(\|\alpha _{n}-\alpha \|\rightarrow 0\) as \(n\rightarrow \infty \) show that \(P_{\alpha }(\alpha )=0\).

[This procedure is substantially longer than that used to prove the Cayley-Hamilton theorem in your algebra course but it is an instructive way of looking at things.]
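The limiting argument in (iv) and (v) is easy to illustrate numerically. The following Python sketch (assuming numpy; np.poly returns the coefficients of the characteristic polynomial of a square matrix) verifies \(P_{\alpha }(\alpha )=0\) for one concrete matrix.

```python
import numpy as np

# Hedged numerical check that P_alpha(alpha) = 0 for a concrete 3x3 matrix.
A = np.array([[2.0, 1.0, 0.0],
              [0.0, 2.0, 1.0],
              [1.0, 0.0, 3.0]])
coeffs = np.poly(A)          # coefficients of det(tI - A), highest power first
P_of_A = sum(c * np.linalg.matrix_power(A, len(coeffs) - 1 - k)
             for k, c in enumerate(coeffs))
assert np.allclose(P_of_A, np.zeros((3, 3)), atol=1e-9)
```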

I have tried to produce 12 rather routine questions for each sheet. Any remaining questions are for interest only. Ambitious students and their supervisors should look at Tripos questions to supplement this meagre fare.

Q 20.1. At the beginning of the 19th century it was hoped that the calculus could be reduced to mere algebra by using Taylor’s theorem. The following example due to Cauchy shows why such hopes were misplaced.

Let \(E:{\mathbb R}\rightarrow {\mathbb R}\) be deﬁned by \(E(t)=\exp (-1/t^{2})\) for \(t\neq 0\) and \(E(0)=0\). Use induction to show that \(E\) is inﬁnitely diﬀerentiable with \begin {align*} E^{(n)}(t)&=Q_{n}(1/t)E(t)\qquad \text {for $t\neq 0$}\\ E^{(n)}(0)&=0, \end {align*}

where \(Q_{n}\) is a polynomial. For which values of \(t\) is it true that \[E(t)=\sum _{n=0}^{\infty }\frac {E^{(n)}(0)t^{n}}{n!}?\]

Explain why the behaviour of \(E\) is consistent with Lemma 8.7 (taking \(m=1\)).
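A small numerical experiment (in Python; the names are mine and no part of the course) makes the moral vivid: every Taylor coefficient of \(E\) at \(0\) vanishes, so the Taylor series sums to \(0\) everywhere, yet \(E(t)>0\) for all \(t\neq 0\).

```python
from math import exp

# Cauchy's example: the Taylor series of E at 0 is identically zero, but E is not.
def E(t):
    return exp(-1.0 / t**2) if t != 0 else 0.0

taylor_sum = 0.0          # sum of E^{(n)}(0) t^n / n!  -- every term vanishes
assert E(0.5) > 0.018 and taylor_sum == 0.0

# E is extraordinarily flat at 0: E(t)/t^k is tiny for every fixed k.
for k in (1, 5, 20):
    assert E(0.01) / 0.01**k < 1e-6
```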

Q 20.2. In the ﬁrst year you saw how to ﬁnd stationary points for a well behaved function \(f:{\mathbb R}^{2}\rightarrow {\mathbb R}\) and how to decide (except in certain speciﬁed cases) whether they are maxima, minima or saddle-points. Use Theorem 8.3 to state and prove your results rigorously.

In the course on quadratic mathematics you will see that given any real symmetric \(m\times m\) matrix \(A\) you can ﬁnd a special orthogonal matrix \(P\) such that \(PAP^{T}\) is diagonal. Explain the relevance of this result to the problem of deciding (except in certain speciﬁed cases) when a stationary point of a well behaved function \(f:{\mathbb R}^{m}\rightarrow {\mathbb R}\) is a maximum, a minimum or a saddle point.

Q 20.3. Deﬁne \(f:[0,1]\rightarrow {\mathbb R}\) by \(f(p/q)=1/q\) when \(p\) and \(q\) are coprime integers with \(1\leq p <q\) and \(f(x)=0\) otherwise.

(i) Show that \(f\) is Riemann integrable and ﬁnd \(\int _{0}^{1}f(x)\,dx\).

(ii) At which points is \(f\) continuous? Prove your answer.
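Those who like experiments may approximate the integral in (i). The following Python sketch (using the standard fractions module; the function names are mine) computes right-endpoint Riemann sums over equal partitions, which visibly shrink towards \(0\).

```python
from fractions import Fraction

# Riemann sums of the function f(p/q) = 1/q (p, q coprime), f = 0 otherwise.
def f(x):           # x a Fraction; Fraction reduces, so .denominator is q
    return Fraction(1, x.denominator) if 0 < x < 1 else Fraction(0)

def riemann_sum(n):  # right-endpoint sum over n equal intervals
    return sum(f(Fraction(j, n)) for j in range(1, n + 1)) / n

s = [riemann_sum(n) for n in (10, 100, 1000)]
assert s[0] > s[1] > s[2]          # the sums shrink...
assert s[2] < Fraction(1, 50)      # ...towards the integral, 0
```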

Q 20.4. We say that a function \(f:[0,1]\rightarrow {\mathbb R}\) is of bounded variation if there exists a constant \(K\) such that whenever \(0\leq x_{0}\leq x_{1}\leq x_{2}\dots \leq x_{n}\leq 1\) we have \[\sum _{j=1}^{n}|f(x_{j-1})-f(x_{j})|\leq K.\]

(i) Show that if \(g:[0,1]\rightarrow {\mathbb R}\) is increasing then \(g\) is of bounded variation. Show that if \(h_{1},h_{2}:[0,1]\rightarrow {\mathbb R}\) are increasing then \(h=h_{1}-h_{2}\) is of bounded variation.

(ii) Suppose that \(f\) is of bounded variation. Show that \[f_{1}(x)=\sup \{ {\textstyle \sum _{j=1}^{n}\left (f(x_{j})-f(y_{j})\right )} :0\leq y_{1}\leq x_{1}\leq y_{2}\leq x_{2}\leq \dots \leq y_{n}\leq x_{n}\leq x\}\] is a well deﬁned increasing function. Show that \(f_{1}-f\) is also an increasing function and so \(f\) is the diﬀerence of two increasing functions.

(iii) Show that if we set \(f(x)=x\sin x^{-1}\) for \(x\neq 0\) and \(f(0)=0\) then \(f:[0,1]\rightarrow {\mathbb R}\) is a continuous function which is not of bounded variation.

[Remember diagrams may not be rigorous but they certainly help you understand what you are talking about.]

Q 20.5. If \(f:[0,1]\rightarrow {\mathbb R}\) is increasing show that if \(x_{n}\) is an increasing sequence then \(f(x_{n})\) tends to a limit as \(n\rightarrow \infty \). Deduce carefully that if \(0<y\leq 1\) then \(f(x)\) tends to a limit \(f(y-)\) as \(x\rightarrow y\) through values of \(x\) with \(x<y\). (We shall set \(f(0-)=f(0)\).) Deﬁne \(f(y+)\) similarly.

Let us call \(f(y+)-f(y-)\) the jump \(J(y)\) at \(y\). Show that if \(y_{1}\), \(y_{2}\), …\(y_{n}\) are distinct points of \((0,1)\) then \[\sum _{j=1}^{n}J(y_{j})\leq f(1)-f(0).\] Show that \(f\) can have only ﬁnitely many jumps of size \(1/n\) or more. Deduce that \(f\) has only countably many jumps and that \(f\) must be continuous at some point.

Q 20.6. Let \(\mathcal K\) be the set of functions \(f:[0,1]\rightarrow {\mathbb R}\) such that \(f\) is continuous on \((0,1]\) and \(f\) has an improper Riemann integral \[{\mathcal I}f=\int _{0}^{1}f(x)\,dx.\]

Show that if \(f,g\in \mathcal K\) and \(\lambda ,\mu \in {\mathbb R}\) then \(\lambda f+\mu g\in \mathcal K\) and \[{\mathcal I}(\lambda f+\mu g)=\lambda {\mathcal I}f +\mu {\mathcal I}g.\]

By giving a proof or counter-example establish whether it is always true that if \(f,g\in {\mathcal K}\) then \(fg\in {\mathcal K}\).

Q 20.7. Suppose that \(f:{\mathbb R}^{2}\rightarrow {\mathbb R}\) is continuous. By using results on the diﬀerentiation of integrals which should be quoted exactly show that \[\frac {d\ }{dt}\int _{a}^{t} \left (\int _{c}^{d}f(u,v)\,dv\right )\,du =\frac {d\ }{dt}\int _{c}^{d} \left (\int _{a}^{t}f(u,v)\,du\right )\,dv.\] Deduce that \[\int _{a}^{b} \left (\int _{c}^{d}f(u,v)\,dv\right )\,du =\int _{c}^{d} \left (\int _{a}^{b}f(u,v)\,du\right )\,dv,\] or, more informally \[\int _{a}^{b} \int _{c}^{d}f(u,v)\,dv\,du =\int _{c}^{d} \int _{a}^{b}f(u,v)\,du\,dv.\]

Q 20.8. Prove all the results of Section 11. (A long but easy and useful exercise. In order to keep the noise level from whinging students down to tolerable levels this counts as 2 questions.)

Q 20.9. See Question 20.8.

Q 20.10. Consider \(\mathcal E\) the space of linear maps \(\alpha :{\mathbb R}^{u}\rightarrow {\mathbb R}^{u}\). We wish to show that \(d(\alpha ,\beta )=\|\alpha -\beta \|\) (with \(\|\alpha \|\) the usual ‘operator norm’) deﬁnes a complete metric.

To this end consider a Cauchy sequence \(\alpha _{n}\) in \(\mathcal E\). Show that if \(\mathbf {x}\in {\mathbb R}^{u}\) then \(\alpha _{n}(\mathbf {x})\) tends to a limit, call it \(\alpha (\mathbf {x})\). Now show that \(\alpha :{\mathbb R}^{u}\rightarrow {\mathbb R}^{u}\) is linear. By using the inequality \begin {align*} \|\alpha (\mathbf {x})-\alpha _{n}(\mathbf {x})\|& \leq \|\alpha _{m}(\mathbf {x})-\alpha _{n}(\mathbf {x})\| +\|\alpha _{m}(\mathbf {x})-\alpha (\mathbf {x})\|\\ &\leq \sup _{p,q\geq n}\|\alpha _{p}(\mathbf {x})-\alpha _{q}(\mathbf {x})\| +\|\alpha _{m}(\mathbf {x})-\alpha (\mathbf {x})\|, \end {align*}

and allowing \(m\rightarrow \infty \) show that \[\|\alpha -\alpha _{n}\|\leq \sup _{p,q\geq n}\|\alpha _{p}-\alpha _{q}\|,\] and deduce that \(\|\alpha -\alpha _{n}\|\rightarrow 0\) as \(n\rightarrow \infty \).

Q 20.11. In Question 20.10 we showed that \(\mathcal {E}\) with the appropriate metric is complete. In this question we make use of the result.

(i) Suppose \(\alpha \in \mathcal {E}\) and \(\|\alpha \|<1\). Let \[\sigma _{n}=\iota +\alpha +\alpha ^{2}+\dots +\alpha ^{n}.\] Show that \(\sigma _{n}\) converges to a limit \(\sigma \), say.

(ii) Compute \((\iota -\alpha )\sigma _{n}\) and show carefully that \((\iota -\alpha )\sigma =\iota \).

(iii) Deduce that if \(\|\iota -\gamma \|<1\) then \(\gamma \) is invertible.

(iv) Show that if \(\beta \in \mathcal {E}\) is invertible and \(\|\beta -\tau \|<\|\beta ^{-1}\|^{-1}\) then \(\tau \) is invertible.

(v) Conclude that the invertible elements of \(\mathcal E\) form an open set.
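The following Python sketch (assuming numpy) illustrates (i) and (ii) for a concrete \(2\times 2\) matrix: when \(\|\alpha \|<1\) the partial sums \(\sigma _{n}\) converge to \((\iota -\alpha )^{-1}\).

```python
import numpy as np

# Partial sums of the geometric (Neumann) series iota + alpha + alpha^2 + ...
alpha = np.array([[0.2, 0.3], [0.1, -0.4]])
assert np.linalg.norm(alpha, 2) < 1    # operator norm below 1

sigma = np.eye(2)
term = np.eye(2)
for _ in range(200):                   # the terms shrink geometrically
    term = term @ alpha
    sigma = sigma + term

assert np.allclose((np.eye(2) - alpha) @ sigma, np.eye(2))
assert np.allclose(sigma, np.linalg.inv(np.eye(2) - alpha))
```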

Q 20.12. Consider \(C([-1,1])\) the set of continuous functions \(f:[-1,1]\rightarrow {\mathbb R}\). Show that, if we set \[d(f,g)=\|f-g\|_{2}= \left (\int _{-1}^{1}|f(x)-g(x)|^{2}\,dx\right )^{1/2}\] then \(d\) is a metric but \(d\) is not complete.

Show that if \(\|f-g\|_{1}\) is deﬁned as in Example 13.8 then \(2^{1/2}\|f\|_{2}\geq \|f\|_{1}\) for all \(f\in C([-1,1])\) but that, given any constant \(K>0\) we can ﬁnd a \(g\in C([-1,1])\) such that \(\|g\|_{2}\geq K\|g\|_{1}\).

The remaining question is included for general interest and not for relevance to the syllabus or to passing Tripos exams.

Q 20.13. We use the notation of Questions 20.10 and 20.11.

(i) Show that if \(\alpha \in \mathcal {E}\) we can ﬁnd \(e^{\alpha }\in \mathcal {E}\) such that \[\left \|\sum _{r=0}^{n}\frac {\alpha ^{r}}{r!} -e^{\alpha }\right \|\rightarrow 0\] as \(n\rightarrow \infty \).

(ii) Show carefully that if \(\alpha \) and \(\beta \) commute \[e^{\alpha }e^{\beta }=e^{\alpha +\beta }.\]

(iii) Show that if \(\alpha \) and \(\beta \) are general (not necessarily commuting) elements of \(\mathcal E\) then \[\left \|h^{-2}(e^{h\alpha }e^{h\beta }-e^{h\beta }e^{h\alpha }) -(\alpha \beta -\beta \alpha )\right \|\rightarrow 0\] as the real number \(h\rightarrow 0\).

Conclude that, in general, \(e^{\alpha }e^{\beta }\) and \(e^{\alpha +\beta }\) need not be equal.
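A hedged numerical illustration of (i)-(iii) (Python, assuming numpy; the function expm below is my own truncated power series, not a library routine):

```python
import numpy as np

# e^A approximated by the partial sums of sum_r A^r / r!
def expm(A, terms=60):
    S, term = np.eye(2), np.eye(2)
    for r in range(1, terms):
        term = term @ A / r
        S = S + term
    return S

a = np.array([[0.0, 1.0], [0.0, 0.0]])
b = np.array([[0.0, 0.0], [1.0, 0.0]])
assert not np.allclose(a @ b, b @ a)                  # a, b do not commute
assert not np.allclose(expm(a) @ expm(b), expm(a + b))  # the law fails
assert np.allclose(expm(a) @ expm(a), expm(2 * a))      # commuting case holds
```

For these two nilpotent matrices the exponentials can be computed by hand (\(e^{a}=\iota +a\), \(e^{b}=\iota +b\)), which makes the failure easy to see directly.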

I have tried to produce 12 rather routine questions for each sheet. The remaining questions are for interest only. Ambitious students and their supervisors should look at Tripos questions to supplement this meagre fare.

Q 21.1. Consider the two railway metrics \(d_{1}\) and \(d_{2}\) on \({\mathbb R}^{2}\) given in Examples 11.3 and 11.4. For each of the two metrics give a reasonably simple description of the open balls of radius \(r\) and centre \(\mathbf x\) when \({\mathbf x}={\mathbf 0}\) and when \({\mathbf x}\neq {\mathbf 0}\) and (a) \(r\leq \|{\mathbf x}\|\), (b) \(r<\|{\mathbf x}\|\leq 2r\), (c) \(2r<\|{\mathbf x}\|\).

Explain the statement ‘when ﬁnding open sets only balls of small radius are important so cases (b) and (c) are irrelevant’.

Give the simplest description you can ﬁnd of open sets in the two metrics.

Q 21.2. Let \(f:{\mathbb R}^{2}\rightarrow {\mathbb R}\) be diﬀerentiable and let \(g(x)=f(x,c-x)\) where \(c\) is a constant. Show that \(g:{\mathbb R}\rightarrow {\mathbb R}\) is diﬀerentiable and ﬁnd its derivative

(i) directly from the deﬁnition of diﬀerentiability

and also

(ii) by using the chain rule.

Deduce that if \(f_{,1}=f_{,2}\) throughout \({\mathbb R}^{2}\) then \(f(x,y)=h(x+y)\) for some diﬀerentiable function \(h\).

Q 21.3 (Traditional). Consider the functions \(f_{n}:[0,1]\rightarrow {\mathbb R}\) deﬁned by \(f_{n}(x)=n^{p}x\exp (-n^{q}x)\) where \(p,q>0\).

(i) Show that \(f_{n}\) converges pointwise on \([0,1]\).

(ii) Show that if \(p<q\) then \(f_{n}\) converges uniformly on \([0,1]\).

(iii) Show that if \(p\geq q\) then \(f_{n}\) does not converge uniformly on \([0,1]\). Does \(f_{n}\) converge uniformly on \([0,1-\epsilon ]\)? Does \(f_{n}\) converge uniformly on \([\epsilon ,1]\)? (Here \(0<\epsilon <1\); you should justify your answers.)
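Behind (ii) and (iii) lies the observation that \(f_{n}\) attains its supremum on \([0,1]\) at \(x=n^{-q}\), where it takes the value \(n^{p-q}/e\). The following Python sketch (names mine) checks this on a grid.

```python
from math import exp

# Approximate sup_{[0,1]} n^p x exp(-n^q x) on a fine grid.
def sup_fn(n, p, q):
    xs = [j / 10000 for j in range(10001)]
    return max(n**p * x * exp(-(n**q) * x) for x in xs)

for n in (2, 4, 8):                              # sup equals n^{p-q}/e
    assert abs(sup_fn(n, p=1, q=2) - n**(1 - 2) / exp(1)) < 1e-3
assert sup_fn(64, p=1, q=2) < 0.01     # p < q: sup -> 0, uniform convergence
assert sup_fn(64, p=2, q=2) > 0.3      # p >= q: sup stays near 1/e
```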

Q 21.4. Let \((X,d)\) be a metric space. We say that \(E\) is dense in \(X\) if whenever \(x\in X\) we can ﬁnd \(e_{n}\in E\) with \(e_{n}\rightarrow x\).

(i) Show that the rationals are dense in the reals (i.e. \(\mathbb Q\) is dense in \(\mathbb R\) if we give \(\mathbb R\) the usual metric). Show that the irrationals are dense in the reals.

[You should make explicit any use you make of the axiom of Archimedes.]

(ii) Consider the space \(C([0,1])\) of continuous functions with the uniform norm. Show that the piecewise linear functions are dense in \(C([0,1])\).

Q 21.5. Recall from your courses in mathematical methods that, if \(f:{\mathbb R}\rightarrow {\mathbb C}\) is a continuous periodic function with period \(2\pi \) then we write \[\hat {f}(n)=\frac {1}{2\pi } \int _{-\pi }^{\pi }f(t)\exp (-int)\,dt.\]

Show, quoting carefully any results that you use, that, if \(a_{n}\in {\mathbb C}\) and \(\sum _{n=-N}^{N} |a_{n}|\) converges, then \(\sum _{n=-N}^{N} a_{n}\exp (int)\) converges uniformly to a continuous periodic function \(f(t)\), say, with period \(2\pi \). Show that \(\hat {f}(n)=a_{n}\) for each \(n\).

[This chain of thought is continued in Question 21.14.]
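The recovery \(\hat {f}(n)=a_{n}\) can be watched numerically. In the Python sketch below (names mine, not part of the course) the integral deﬁning \(\hat {f}(n)\) is replaced by a uniform Riemann sum, which is exact for trigonometric polynomials of low degree.

```python
from cmath import exp as cexp
from math import pi

# f(t) = sum a_n e^{int} with a few absolutely summable coefficients.
a = {-2: 0.25, 0: 1.0, 1: 0.5j, 3: -0.125}

def f(t):
    return sum(c * cexp(1j * n * t) for n, c in a.items())

def fhat(n, M=256):   # (1/2pi) int f(t) e^{-int} dt via an M-point uniform sum
    return sum(f(2 * pi * k / M) * cexp(-1j * n * 2 * pi * k / M)
               for k in range(M)) / M

for n in range(-4, 5):
    assert abs(fhat(n) - a.get(n, 0)) < 1e-9
```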

Q 21.6. Recall that \(\|\ \|\) is said to be a norm on a real vector space \(V\) if the following conditions hold.

(1) \(\|\mathbf {x}\|\geq 0\) for all \(\mathbf {x}\in V\).

(2) \(\|\mathbf {x}\|=0\) implies \(\mathbf {x}=\mathbf {0}\).

(3) \(\|\lambda \mathbf {x}\|=|\lambda |\|\mathbf {x}\|\) for all \(\lambda \in {\mathbb R}\) and all \(\mathbf {x}\in V\).

(4) \(\|\mathbf {x}+\mathbf {y}\|\leq \|\mathbf {x}\|+ \|\mathbf {y}\|\) for all \(\mathbf {x},\mathbf {y}\in V\).

(i) Let \(\|\ \|_{2}\) be the usual Euclidean norm on \({\mathbb R}^{m}\) and let \(\|\ \|\) be another norm on \({\mathbb R}^{m}\). Let \(I:({\mathbb R}^{m},\|\ \|_{2})\rightarrow ({\mathbb R}^{m},\|\ \|)\) be the identity map, given by \(I\mathbf {x}=\mathbf {x}\). By considering basis vectors, or otherwise, show that there exists a constant \(K\) such that \(\|\mathbf {x}\|\leq K\|\mathbf {x}\|_{2}\) for all \(\mathbf {x}\in {\mathbb R}^{m}\). Deduce that \(I\) is continuous.

By applying a theorem on continuous functions on closed bounded sets show that there exists a \(k>0\) such that \(\|\mathbf {x}\|\geq k\) for all \(\mathbf {x}\) with \(\|\mathbf {x}\|_{2}=1\). Conclude that \[k\|\mathbf {x}\|_{2}\leq \|\mathbf {x}\|\leq K\|\mathbf {x}\|_{2}\] for all \(\mathbf {x}\in {\mathbb R}^{m}\).

[Thus all norms on a ﬁnite dimensional space are essentially the same.]

(ii) Consider the real vector space \(\mathbb {R}^{\mathbb {N}}\) of all real sequences \(\mathbf {x}=(x_{1},x_{2},\dots )\) with the usual vector addition and multiplication by scalars. Let \(V\) be the set of all sequences \(\mathbf {x}=(x_{1},x_{2},\dots )\) such that \(x_{j}\neq 0\) for only ﬁnitely many \(j\). Show that \(V\) is a subspace of \(\mathbb {R}^{\mathbb {N}}\) and so a vector space in its own right. By considering norms of the form \(\|\mathbf {x}\|=\max _{j}\kappa _{j}|x_{j}|\), or otherwise, ﬁnd two norms \(\|\ \|_{A}\) and \(\|\ \|_{B}\) such that \[\sup _{\|\mathbf {x}\|_{A}=1}\|\mathbf {x}\|_{B} =\sup _{\|\mathbf {x}\|_{B}=1}\|\mathbf {x}\|_{A}=\infty .\]

Q 21.7. (i) Suppose that \(f:{\mathbb R}\rightarrow {\mathbb R}\) is a twice diﬀerentiable function such that \[\left |\frac {f(x)f''(x)}{f'(x)^{2}}\right | \leq \lambda \] for all \(x\) and some \(|\lambda |<1\). Show that the mapping \[Tx=x-\frac {f(x)}{f'(x)}\] is a contraction mapping and deduce that \(f\) has a unique root \(y\).

(ii) Suppose that \(F:{\mathbb R}\rightarrow {\mathbb R}\) is a twice diﬀerentiable function such that \[\left |\frac {F(x)F''(x)}{F'(x)^{2}}\right | \leq \lambda \] for all \(|x|\leq a\) and some \(|\lambda |<1\) and that \(F(0)=0\). Consider the mapping \[Tx=x-\frac {F(x)}{F'(x)}.\] Show that \(T^{n}x\rightarrow 0\).

Suppose that \[\frac {\sup _{|t|\leq a}|F'(t)|\sup _{|t|\leq a}|F''(t)|} {\inf _{|t|\leq a}|F'(t)|^{2}}=M.\] By using the mean value theorem twice, show that if \(|x|\leq a\) then \[|Tx|\leq Mx^{2}.\]

(iii) If you know what the Newton–Raphson method is, comment on the relevance of the results of (i) and (ii) to that method.
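For those who do know the Newton-Raphson method, a small Python experiment (names mine; here \(f(x)=x^{2}-2\), for which the hypotheses of (i) hold near the root) exhibits the quadratic convergence that the estimate \(|Tx|\leq Mx^{2}\) expresses.

```python
# Iterate the Newton-Raphson map Tx = x - f(x)/f'(x) for f(x) = x^2 - 2.
def T(x):
    return x - (x * x - 2.0) / (2.0 * x)

x = 1.0
errs = []
for _ in range(6):
    x = T(x)
    errs.append(abs(x - 2**0.5))

assert errs[-1] < 1e-12                      # converges to sqrt(2) very fast
# each error is (roughly) at most the square of the previous one
assert all(e2 < e1**2 + 1e-15 for e1, e2 in zip(errs, errs[1:]))
```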

Q 21.8. We work in \({\mathbb R}^{m}\) with the usual ‘Euclidean norm’. The following line of thought is extremely important in later work. Suppose we want the solution \({\mathbf x}_{0}\) of \[{\mathbf x}+{\boldsymbol \epsilon }({\mathbf x})= {\mathbf y}\] where \({\boldsymbol \epsilon }({\mathbf x})\) is ‘a small error term’ or ‘of ﬁrst order compared to \(\mathbf x\)’. The following method of solution is traditional (for the excellent reason that it usually works like a Spanish charm). Start by ignoring the small \(\boldsymbol \epsilon \) term and guess \({\mathbf x}_{0}\approx {\mathbf x}_{1}={\mathbf y}\). Since this initial guess is good we estimate the small \(\boldsymbol \epsilon \) term as \({\boldsymbol \epsilon }({\mathbf x}_{1})\). Feeding this estimate back into the equation we now guess \({\mathbf x}_{0}\approx {\mathbf x}_{2}={\mathbf y}-{\boldsymbol \epsilon }({\mathbf x}_{1})\). Repeating this process gives us the rule for successive approximations \[\mathbf {x}_{n+1}=\mathbf {y}-{\boldsymbol \epsilon }(\mathbf {x}_{n}).\] We now try to justify this in certain cases.

(i) Suppose that \({\boldsymbol \epsilon }({\mathbf 0})={\mathbf 0}\) and \[\|{\boldsymbol \epsilon }({\mathbf a})- {\boldsymbol \epsilon }({\mathbf b})\| <\|{\mathbf a}-{\mathbf b}\|/2\] for all \(\|\mathbf {a}\|,\|\mathbf {b}\|\leq \delta \) where \(\delta >0\). (Note that these are rather stronger conditions than might have been expected. However, this is not to say that the general idea might not work under weaker conditions.) We shall show that the method works for \(\|{\mathbf y}\|\leq \delta /2\).

Consider the closed ball \(B=\{{\mathbf x}:\|{\mathbf x}\|\leq \delta \}\). Show that the equation \[T({\mathbf x})=\mathbf {y}-{\boldsymbol \epsilon }({\mathbf x})\] deﬁnes a map \(T:B\rightarrow B\). Now use the contraction mapping theorem to show that \[{\mathbf x}+{\boldsymbol \epsilon }({\mathbf x})= {\mathbf y}\] has a unique solution \({\mathbf x}(\mathbf {y})\) with \(\|{\mathbf x}(\mathbf {y})\|\leq \delta \) and that \(T^{n}({\mathbf y})\rightarrow {\mathbf x}(\mathbf {y})\) as \(n\rightarrow \infty \).

(ii) Suppose that \(f:{\mathbb R}^{m}\rightarrow {\mathbb R}^{m}\) is a diﬀerentiable function with derivative continuous at \(\mathbf 0\), \(f({\mathbf 0})={\mathbf 0}\) and \(Df({\mathbf 0})=I\). Show that we can ﬁnd a \(\delta >0\) such that whenever \(\|{\mathbf y}\|\leq \delta /2\) the equation \[f({\mathbf x})={\mathbf y}\] has a unique solution \({\mathbf x}(\mathbf {y})\) with \(\|{\mathbf x}(\mathbf {y})\|\leq \delta \).

(iii) Suppose that \(f:{\mathbb R}^{m}\rightarrow {\mathbb R}^{m}\) is a diﬀerentiable function with derivative continuous at \(\mathbf 0\), \(f({\mathbf 0})={\mathbf 0}\) and \(Df({\mathbf 0})\) invertible. Show that we can ﬁnd \(\delta _{1},\delta _{2}>0\) such that whenever \(\|{\mathbf y}\|\leq \delta _{1}\) the equation \[f({\mathbf x})={\mathbf y}\] has a unique solution \({\mathbf x}(\mathbf {y})\) with \(\|{\mathbf x}(\mathbf {y})\|\leq \delta _{2}\).

Q 21.9. (A method of Abel) (i) Suppose that \(a_{j}\) and \(b_{j}\) are sequences of complex numbers and that \(S_{n}=\sum _{j=1}^{n}a_{j}\) for \(n\geq 1\) and \(S_{0}=0\). Show that, if \(1\leq u\leq v\) then \[\sum _{j=u}^{v}a_{j}b_{j}=\sum _{j=u}^{v}S_{j}(b_{j}-b_{j+1}) -S_{u-1}b_{u}+S_{v}b_{v+1}.\] (This is known as partial summation, for obvious reasons.)

(ii) Suppose now that, in addition, the \(b_{j}\) form a decreasing sequence of positive terms and that \(|S_{n}|\leq K\) for all \(n\). Show that \[\left |\sum _{j=u}^{v}a_{j}b_{j}\right | \leq 2Kb_{u}.\] Deduce that if \(b_{j}\rightarrow 0\) as \(j\rightarrow \infty \) then \(\sum _{j=1}^{\infty }a_{j}b_{j}\) converges.

Deduce the alternating series test.

(iii) If \(b_{j}\) is a decreasing sequence of positive terms with \(b_{j}\rightarrow 0\) as \(j\rightarrow \infty \) show that \(\sum _{j=1}^{\infty }b_{j}z^{j}\) converges uniformly in the region given by \(|z|\leq 1\) and \(|z-1|\geq \epsilon \) for all \(\epsilon >0\).
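The identity of (i) and the bound of (ii) can be checked directly. The Python sketch below (my own indexing) takes \(a_{j}=(-1)^{j}\) and \(b_{j}=1/j\), the alternating harmonic case, so that \(K=1\).

```python
# Abel's partial summation identity, checked for a_j = (-1)^j, b_j = 1/j.
N = 50
a = [0.0] + [(-1) ** j for j in range(1, N + 2)]
b = [0.0] + [1.0 / j for j in range(1, N + 2)]
S = [0.0] * (N + 2)
for j in range(1, N + 2):
    S[j] = S[j - 1] + a[j]          # S_j = a_1 + ... + a_j, S_0 = 0

u, v = 3, N
lhs = sum(a[j] * b[j] for j in range(u, v + 1))
rhs = sum(S[j] * (b[j] - b[j + 1]) for j in range(u, v + 1)) \
      - S[u - 1] * b[u] + S[v] * b[v + 1]
assert abs(lhs - rhs) < 1e-12       # the partial summation identity

K = max(abs(s) for s in S)          # here K = 1
assert abs(lhs) <= 2 * K * b[u]     # the bound of (ii)
```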

Q 21.10. We work in \({\mathbb R}^{m}\) with the usual distance.

(i) Show that if \(A_{1}\), \(A_{2}\), … are non-empty, closed and bounded with \(A_{1}\supseteq A_{2}\supseteq \dots \) then \(\bigcap _{j=1}^{\infty }A_{j}\) is non-empty. Is this result true if we merely assume \(A_{j}\) closed and non-empty? Give reasons.

(ii) If \(A\) is non-empty, closed and bounded show that we can ﬁnd \({\mathbf a}',{\mathbf b}'\in A\) such that \[\|{\mathbf a}'-{\mathbf b}'\|\geq \|{\mathbf a}-{\mathbf b}\|\] for all \({\mathbf a},{\mathbf b}\in A\). Is this result true if we merely assume \(A\) bounded and non-empty? Give reasons.

Q 21.11. We work in \({\mathbb R}^{m}\) with the usual distance. Let \(E\) be a closed non-empty subset of \({\mathbb R}^{m}\) and let \(T\) be a map \(T:E\rightarrow E\).

(i) Suppose \(\|T({\mathbf a})-T({\mathbf b})\|<\|{\mathbf a}-{\mathbf b}\|\) for all \({\mathbf a},{\mathbf b}\in E\) with \({\mathbf a}\neq {\mathbf b}\). We saw in Example 15.2 that \(T\) need not have a ﬁxed point. Show that, if \(T\) has a ﬁxed point, it is unique.

(ii) Suppose \(\|T({\mathbf a})-T({\mathbf b})\|>\|{\mathbf a}-{\mathbf b}\|\) for all \({\mathbf a},{\mathbf b}\in E\) with \({\mathbf a}\neq {\mathbf b}\). In Question 21.17 it is shown that \(T\) need not have a ﬁxed point. Show that, if \(T\) has a ﬁxed point, it is unique.

(iii) Suppose \(\|T({\mathbf a})-T({\mathbf b})\|=\|{\mathbf a}-{\mathbf b}\|\) for all \({\mathbf a},{\mathbf b}\in E\). Show that \(T\) need not have a ﬁxed point and that, if \(T\) has a ﬁxed point, it need not be unique.

(iv) Suppose now that \(E\) is non-empty, closed and bounded and \[\|T({\mathbf a})-T({\mathbf b})\|<\|{\mathbf a}-{\mathbf b}\|\] for all \({\mathbf a},{\mathbf b}\in E\) with \({\mathbf a}\neq {\mathbf b}\). By considering \(\inf _{{\mathbf x}\in E}\|{\mathbf x}-T({\mathbf x})\|\), or otherwise, show that \(T\) has a ﬁxed point.

Q 21.12. From time to time numerical analysts mention the spectral radius. It forms no part of any of the non-optional 1B courses but the reader may be interested to see what it is.

(i) Give an example of a linear map \(\beta :{\mathbb R}^{m}\rightarrow {\mathbb R}^{m}\) such that \(\beta ^{m-1}\neq {\mathbf 0}\) but \(\beta ^{m}={\mathbf 0}\).

(ii) Let \(\alpha :{\mathbb R}^{m}\rightarrow {\mathbb R}^{m}\) be linear. If \(n=jk+r\) explain why \[\|\alpha ^{jk+r}\|\leq \|\alpha ^{j}\|^{k}\|\alpha \|^{r}.\]

(iii) Continuing with the hypotheses of (ii), show that \(\Delta =\inf _{n}\|\alpha ^{n}\|^{1/n}\) is well deﬁned and, by using the result of (ii), or otherwise, that \[\|\alpha ^{n}\|^{1/n}\rightarrow \Delta .\] We call \(\Delta \) the spectral radius of \(\alpha \) and write \(\rho (\alpha )=\Delta \).

(iv) If \(\alpha :{\mathbb R}^{m}\rightarrow {\mathbb R}^{m}\) is diagonalisable show that \[\rho (\alpha )=\max \{|\lambda |:\lambda \ \text {an eigenvalue of $\alpha $}\}.\]

(v) Give an example of linear maps \(\alpha ,\beta :{\mathbb R}^{2}\rightarrow {\mathbb R}^{2}\) such that \(\rho (\alpha )=\rho (\beta )=0\) but \(\rho (\alpha +\beta )=1\).

(vi) Recalling Question 20.11 of the third sheet, show that if \(\rho (\iota -\gamma )<1\) then \(\gamma \) is invertible.
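A numerical sketch of (iii) and (iv) (Python, assuming numpy): for a diagonalisable upper triangular matrix the quantities \(\|\alpha ^{n}\|^{1/n}\) descend towards the largest absolute eigenvalue, even though \(\|\alpha \|\) itself is large.

```python
import numpy as np

# ||alpha^n||^{1/n} -> spectral radius, here max(|0.5|, |0.25|) = 0.5.
alpha = np.array([[0.5, 10.0], [0.0, 0.25]])   # eigenvalues 0.5 and 0.25
rho = 0.5

norms = [np.linalg.norm(np.linalg.matrix_power(alpha, n), 2) ** (1.0 / n)
         for n in (1, 10, 100)]
assert norms[0] > 5                 # ||alpha|| is large because of the 10
assert abs(norms[-1] - rho) < 0.05  # but the limit is the spectral radius
assert norms[0] > norms[1] > norms[2]
```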

The remaining questions are included for general interest and not for relevance to the syllabus or to passing Tripos exams.

Q 21.13. We work in \({\mathbb R}^{m}\), \(\|{\mathbf x}-{\mathbf y}\|\) will represent the usual Euclidean distance between \(\mathbf x\) and \(\mathbf y\).

(i) If \(K\) is a closed non-empty bounded set and \(\mathbf x\) is any point, show that there exists a point \({\mathbf k}'\in K\) such that \[\|{\mathbf x}-{\mathbf k}\|\geq \|{\mathbf x}-{\mathbf k}'\|\] for all \({\mathbf k}\in K\). Is \({\mathbf k}'\) necessarily unique?

(ii) If \(E\) is a non-empty closed set and \(\mathbf x\) is any point, show that there exists a point \({\mathbf e}'\in E\) such that \[\|{\mathbf x}-{\mathbf e}\|\geq \|{\mathbf x}-{\mathbf e}'\|\] for all \({\mathbf e}\in E\). We write \(d({\mathbf x},E)=\|{\mathbf x}-{\mathbf e}'\|.\)

(iii) With the notation of (ii) show that \[d({\mathbf x},E)+\|{\mathbf x}-{\mathbf y}\| \geq d({\mathbf y},E).\] Show that \(d(\ ,E):{\mathbb R}^{m}\rightarrow {\mathbb R}\) is continuous.

(iv) If \(E\) is closed and non-empty and \(K\) closed, bounded and non-empty show that there exist \({\mathbf e}'\in E\) and \({\mathbf k}'\in K\) such that \[\|{\mathbf e}-{\mathbf k}\|\geq \|{\mathbf e}'-{\mathbf k}'\|\] for all \({\mathbf e}\in E\) and \({\mathbf k}\in K\). Would this result be true if we only assumed \(E\) and \(K\) closed and non-empty?

Q 21.14. (i) Show that if \(f:{\mathbb R}\rightarrow {\mathbb C}\) is a continuous periodic function with period \(2\pi \) such that \(\hat {f}(n)=0\) for all \(n\) then \[\int _{-\pi }^{\pi }(1-\epsilon _{1}+\epsilon _{2}\cos t)^{N}f(t)\,dt=0\] for all \(N\).

(ii) Show that, given \(\delta >0\) we can ﬁnd \(\epsilon _{1},\ \epsilon _{2}>0\) and an \(\eta >0\) such that \begin {align*} (1-\epsilon _{1}+\epsilon _{2}\cos t)^{N}\rightarrow \infty &\ \text {uniformly for $|t|\leq \eta $}\\ (1-\epsilon _{1}+\epsilon _{2}\cos t)^{N}\rightarrow 0&\ \text {uniformly for $\delta \leq |t|\leq \pi $} \end {align*}

as \(N\rightarrow \infty \).

(iii) Show that if \(f:{\mathbb R}\rightarrow {\mathbb C}\) is a continuous periodic function with period \(2\pi \) such that \(f(0)\) is real and \(f(0)>0\) we can ﬁnd \(\epsilon _{1},\ \epsilon _{2}>0\) such that \[\Re \int _{-\pi }^{\pi }(1-\epsilon _{1}+\epsilon _{2}\cos t)^{N}f(t)\,dt \rightarrow \infty \] as \(N\rightarrow \infty \).

(iv) Show that if \(f:{\mathbb R}\rightarrow {\mathbb C}\) is a continuous periodic function with period \(2\pi \) such that \(\hat {f}(n)=0\) for all \(n\) then \(f(t)=0\) for all \(t\).

(v) By using Question 21.5, or otherwise, show that if \(F:{\mathbb R}\rightarrow {\mathbb C}\) is a continuous periodic function with period \(2\pi \) such that \(\sum |\hat {F}(n)|\) converges then \[F(t)=\sum _{n=-\infty }^{\infty }\hat {F}(n)\exp (int).\]

Q 21.15. (i) Suppose that \(g:{\mathbb R}\rightarrow {\mathbb C}\) is a continuous periodic function with period \(2\pi \). Show that \[\frac {1}{2\pi }\int _{0}^{2\pi } \left |g(t)-\sum _{r=-N}^{N}\hat {g}(r)\exp (irt)\right |^{2} \,dt =\frac {1}{2\pi }\int _{0}^{2\pi }|g(t)|^{2}\,dt -\sum _{r=-N}^{N}|\hat {g}(r)|^{2}.\] Deduce that \[\sum _{r=-N}^{N}|\hat {g}(r)|^{2} \leq \frac {1}{2\pi }\int _{0}^{2\pi }|g(t)|^{2}\,dt.\] (This is a version of Bessel’s inequality.)

(ii) Now suppose that \(f:{\mathbb R}\rightarrow {\mathbb C}\) is a continuously diﬀerentiable periodic function with period \(2\pi \). If \(f\) has derivative \(g\), obtain a simple relation between \(\hat {f}(n)\) and \(\hat {g}(n)\). By applying the Cauchy-Schwarz inequality to \(\sum _{n\neq 0,\ |n|\leq N} |n|^{-1}|\hat {g}(n)|\) show that \[\sum _{n\neq 0,\ |n|\leq N} |\hat {f}(n)| \leq A \left (\frac {1}{2\pi }\int _{0}^{2\pi }|g(t)|^{2}\,dt\right )^{1/2},\] for some constant \(A\). Conclude that \(\sum |\hat {f}(n)|\) converges.

(iii) Deduce, using Question 21.14, that if \(f:{\mathbb R}\rightarrow {\mathbb C}\) is a continuously diﬀerentiable periodic function with period \(2\pi \) then \[f(t)=\sum _{n=-\infty }^{\infty }\hat {f}(n)\exp (int).\]

Q 21.16. We continue with the ideas of Question 21.13. From now on all sets will belong to the collection \(\mathcal {K}\) of closed bounded non-empty subsets of \({\mathbb R}^{m}\). Deﬁne \[d(E,F)=\sup _{e\in E}d(e,F)+\sup _{f\in F}d(f,E).\] Show that \(d\) is a metric on \(\mathcal {K}\).

Show also that \(d\) is a complete metric.

Q 21.17. If \((X,d)\) is a complete metric space and \(T:X\rightarrow X\) is a surjective map such that \[d(Tx,Ty)\geq Kd(x,y)\] for all \(x,y\in X\) and some \(K>1\) show that \(T\) has a unique ﬁxed point.

By considering the map \(T:{\mathbb R}\rightarrow {\mathbb R}\) deﬁned by \(T(x+n)=1+4n+2x\) for \(0\leq x<1\) and \(n\) an integer, or otherwise, show that the condition \(T\) surjective cannot be dropped.

Q 21.18 (A continuous nowhere diﬀerentiable function). The following construction is due to Van der Waerden.

(i) Sketch the graph of the function \(g\) given by the condition \[g(x+k)=|x|\ \ \ \ \mbox {if $k$ is any integer and $-1/2<x\leq 1/2$.}\]

(ii) Let \(g_{n}(t)=2^{-n}g(8^{n}t)\). Sketch the graphs of \(g_{1}\), \(g_{2}\) and \(g_{3}\).

(iii) Let \(f_{n}(t)=g_{1}(t)+g_{2}(t)+\ldots +g_{n}(t)\). Sketch the graphs of \(f_{1}\), \(f_{2}\) and \(f_{3}\).

(iv) Show that \(f_{n}\) converges uniformly to a continuous function \(f\).

(v) If \(n\) is a positive integer and \(r\) an odd integer give an expression for \(f(r2^{-n})\). Show that when \(n\) is large then \(2^{n}|f((r+2)2^{-n})-f(r2^{-n})|\) is large. Conclude that if \(r2^{-n}<t<(r+2)2^{-n}\) then \[\max \left (\left |\frac {f(t)-f(r2^{-n})}{t-r2^{-n}}\right |, \left |\frac {f((r+2)2^{-n})-f(t)}{(r+2)2^{-n}-t}\right |\right )\] is large.

(vi) By developing the ideas of part (v) show that \(f\) is nowhere diﬀerentiable.
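Part (iv) can be illustrated numerically: \(|g_{n}(t)|\leq 2^{-n-1}\), so the uniform distance from \(f\) to \(f_{n}\) is at most \(\sum _{k>n}2^{-k-1}=2^{-n-1}\). A Python sketch (names mine):

```python
# Partial sums of Van der Waerden's series and their uniform tail bound.
def g(x):                      # distance from x to the nearest integer
    return abs(x - round(x))

def f_n(t, n):                 # f_n(t) = sum_{k=1}^{n} 2^{-k} g(8^k t)
    return sum(2.0 ** (-k) * g(8.0 ** k * t) for k in range(1, n + 1))

ts = [j / 997 for j in range(998)]
for n in (5, 10):              # |f_15 - f_n| <= sum_{k=n+1}^{15} 2^{-k-1}
    gap = max(abs(f_n(t, 15) - f_n(t, n)) for t in ts)
    assert gap <= 2.0 ** (-n - 1) + 1e-12
```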

Final Note To Supervisors

Let me reiterate my request for corrections and improvements particularly to the exercises. The easiest path for me is e-mail. My e-mail address is twk@dpmms. I have to express my gratitude to Drs Barden and Pinch for ﬁnding a multitude of errors but I am sure that once the ﬁrst veil of error has been lifted a further veil of deeper errors will appear.

^{1}Footnote for passing historians, this is a course in mathematics.

^{2}Indeed, anyone who works through the exercises in Hardy as a ﬁrst course and the
exercises in Whittaker and Watson’s even older A Course of Modern Analysis [11] as a second
will have had a splendid education in analysis.

^{3}Of course economists know this but few of their public pronouncements seem to mention
it.

^{4}In [2] Boas reports the story of a friend visiting the Princeton common room ‘…where
Einstein was talking to another man, who would shake his head and stop him; Einstein then
thought for a while, then started talking again; was stopped again; and so on. After a while, …my
friend was introduced to Einstein. He asked Einstein who the other man was. “Oh,” said
Einstein, “that’s my mathematician.” ’

^{5}I know of two universities which teach undergraduates Lebesgue theory in their ﬁrst
year but of no university where undergraduates learn Lebesgue theory in their ﬁrst
year.

^{6}‘Shut up!’ he explained.

^{7}There is an appropriate theory for objects with order (lattices) but we shall not pursue
it here.

^{8}Or according to a rival school of thought the ‘\(3\epsilon \) Theorem’.

^{9}Most of the new words in this last paragraph are well worth dropping.

You must lie among the daisies and discourse in novel phrases of your
complicated state of mind,

The meaning doesn’t matter if it’s only idle chatter of a transcendental
kind.

And everyone will say,

As you walk your mystic way,

‘If this young man expresses himself in terms too deep for me

Why, what a very singularly deep young man this deep young man must
be!’

^{10}Boas notes that ‘There is a test for identifying some of the future professional mathematicians at an early age. These are students who instantly comprehend a sentence beginning “Let \(X\) be an ordered quintuple \((a,T,\pi ,\sigma ,{\mathcal B})\) where ….” They are even more promising if they add, “I never really understood it before.”’ ([2], page 231.)

^{11}Changing those things that need to be changed.