It is surprising how many people think that that analysis consists in the difficult proofs of obvious theorems. All we need know, they say, is what a limit is, the definition of continuity and the definition of the derivative. All the rest is ‘intuitively clear’.
If pressed they will agree that these definitions apply as much to the rationals \(\mathbb Q\) as to the real numbers \(\mathbb R\). They then have to explain the following interesting example.
Example 1.1. If \(f:{\mathbb Q}\rightarrow {\mathbb Q}\) is given by
\begin{alignat*}{2} f(x)&=-1&&\qquad\text{if $x^{2}<2$,}\\ f(x)&=1&&\qquad\text{otherwise,} \end{alignat*}
then
(i) \(f\) is continuous function with \(f(0)=-1\), \(f(2)=1\) yet there does not exist a \(c\) with \(f(c)=0\),
(ii) \(f\) is a differentiable function with \(f'(x)=0\) for all \(x\) yet \(f\) is not constant.
What is the difference between \(\mathbb R\) and \(\mathbb Q\) which makes calculus work on one even though it fails on the other. Both are ‘ordered fields’, that is, both support operations of ‘addition’ and ‘multiplication’ together with a relation ‘greater than’ (‘order’) with the properties that we expect. If the reader is interested she will find a complete list of the appropriate axioms in texts like the altogether excellent book of Spivak [8] and its many rather less excellent competitors, but, interesting as such things may be, they are irrelevant to our purpose which is not to consider the shared properties of \(\mathbb R\) and \(\mathbb Q\) but to identify a difference between the two systems which will enable us to exclude the possibility of a function like that of Example 1.1 for functions from \(\mathbb R\) to \(\mathbb R\).
To state the difference we need only recall a definition from the beginning of the course C5/6.
Definition 1.2. If \(a_{n}\in {\mathbb R}\) for each \(n\geq 1\) and \(a\in {\mathbb R}\) then we say that \(a_{n}\rightarrow a\) if given \(\epsilon >0\) we can find an \(n_{0}(\epsilon )\) such that \[|a_{n}-a|<\epsilon \ \text {for all $n\geq n_{0}(\epsilon )$}.\]
The key property of the reals, the fundamental axiom which makes everything work was also stated in the course C5/6.
Axiom 1.3 (The fundamental axiom of analysis). If \(a_{n}\in {\mathbb R}\) for each \(n\geq 1\), \(A\in {\mathbb R}\) and \(a_{1}\leq a_{2}\leq a_{3}\leq \ldots \) and \(a_{n}<A\) for each \(n\) then there exists an \(a\in {\mathbb R}\) such that \(a_{n}\rightarrow a\) as \(n\rightarrow \infty \).
Less ponderously, and just as rigorously, the fundamental axiom for the real numbers says every increasing sequence bounded above tends to a limit.
Everything which depends on the fundamental axiom is analysis, everything else is mere algebra.
In the course C5/6 you proved the following results on the limits of sequences.
Lemma 2.1. (i) The limit is unique. That is, if \(a_{n}\rightarrow a\) and \(a_{n}\rightarrow b\) as \(n\rightarrow \infty \) then \(a=b\).
(ii) If \(a_{n}\rightarrow a\) as \(n\rightarrow \infty \) and \(n(1)<n(2)<n(3)\ldots \) then \(a_{n(j)}\rightarrow a\) as \(j\rightarrow \infty \).
(iii) If \(a_{n}=c\) for all \(n\) then \(a_{n}\rightarrow c\) as \(n\rightarrow \infty \).
(iv) If \(a_{n}\rightarrow a\) and \(b_{n}\rightarrow b\) as \(n\rightarrow \infty \) then \(a_{n}+b_{n}\rightarrow a+b\).
(v) If \(a_{n}\rightarrow a\) and \(b_{n}\rightarrow b\) as \(n\rightarrow \infty \) then \(a_{n}b_{n}\rightarrow ab\).
(vi) If \(a_{n}\rightarrow a\) as \(n\rightarrow \infty \) and \(a_{n}\neq 0\) for each \(n\), \(a\neq 0\) then \(a_{n}^{-1}\rightarrow a^{-1}\).
(vii) If \(a_{n}\leq A\) for each \(n\) and \(a_{n}\rightarrow a\) as \(n\rightarrow \infty \) then \(a\leq A\).
Since these results were proved in the earlier course I shall not prove them. However, it is essential for further progress that the reader should be able to prove them for herself. If you can not prove these results consult your supervisor at once. You should do the same if you cannot prove the following variation on the fundamental axiom.
[Hint. If \(a\leq b\) then \(-b\leq -a\).]
Useful as the results of Lemma 2.1 are, they are also true of sequences in \(\mathbb Q\). They are therefore mere, if important, algebra. Our first truly ‘analysis’ result may strike the reader as rather odd.
Theorem 2.3 shows that there is no ‘exotic’ real number \(\gimel \) say (to choose an exotic symbol) with the property that \(1/n>\gimel \) for all integers \(n\geq 1\) yet \(\gimel >0\) (that is, \(\gimel \) is strictly positive and yet smaller than all strictly positive rationals). There exist number systems with such exotic numbers (the famous ‘non-standard analysis’ of Abraham Robinson and the ‘surreal numbers’ of Conway constitute two such systems) but, just as the rationals are, in some sense, too small a system for the standard theorems of analysis to hold so these non-Archimedean systems are, in some sense, too big. Archimedes and Eudoxus realised the need for an axiom to show that there is no exotic number \(\daleth \) bigger than any integer1 (i.e. \(\daleth >n\) for all integers \(n\geq 1\); to see the connection with our form of the axiom consider \(\gimel =1/\daleth \)). However, in spite of its name, what was an axiom for Archimedes is a theorem for us.
It is all very well knowing that the fundamental axiom is the foundation of mathematics but how can we find a proof technique which will allow us to apply it. One technique uses the Bolzano–Weierstrass theorem.
Theorem 3.1 (Bolzano–Weierstrass). If \(x_{n}\in {\mathbb R}\) and there exists a \(K\) such that \(|x_{n}|\leq K\) for all \(n\) then we can find \(n(1)<n(2)<\ldots \) and \(x\in {\mathbb R}\) such that \(x_{n(j)}\rightarrow x\) as \(j\rightarrow \infty \).
Recall from C5/6 that we say that a sequence converges if it tends to a limit. The Bolzano–Weierstrass theorem thus says that every bounded sequence of reals has a convergent subsequence. Notice that we say nothing about uniqueness, if \(x_{n}=(-1)^{n}\) then \(x_{2n}\rightarrow 1\) but \(x_{2n+1}\rightarrow -1\) as \(n\rightarrow \infty \).
There is a proof of the Bolzano–Weierstrass theorem by repeated dissection along the lines of the proof of the intermediate value theorem given in course C5/6. We shall give another proof based on the following elegant combinatorial lemma.
Lemma 3.2. If \(x_{n}\in {\mathbb R}\) then at least one of the following two statements must be true.
(A) There exist \(n(1)<n(2)<\ldots \) such that \(x_{n(j)}\leq x_{n(j+1)}\) for each \(j\geq 1\).
(B) There exist \(m(1)<m(2)<\ldots \) such that \(x_{m(j)}\geq x_{m(j+1)}\) for each \(j\geq 1\).
There are a variety of techniques for using the fundamental axiom:- successive dissection, using Heine–Borel, looking for the supremum and so on. When the reader is confident in her control of the material in this course (and when she knows the various techniques enumerated in the last sentence) she should test her skill by proving the main results of this course by using each technique in turn. For the moment it seems sensible to concentrate on one proof technique until it is fully mastered. The technique chosen for this exposition is that of Bolzano–Weierstrass. We conclude this section with two important examples, Theorem 3.6 on the existence of suprema and Theorem 3.13 on the general principle of convergence.
It is an unfortunate fact of life that many, otherwise perfectly well behaved sets of numbers do not have greatest members (maxima). A simple example is given by \(\{x\in {\mathbb R}:0<x<1\}\). However, we shall see in Theorem 3.6 any non-empty bounded set of real numbers does have a least upper bound (supremum).
Definition 3.3. Consider a non-empty set \(A\) of real numbers. We say that \(\alpha \) is a least upper bound (or supremum) for \(A\) if the following two conditions hold.
(i) \(\alpha \geq a\) for all \(a\in A\) (that is, \(\alpha \) is an upper bound).
(ii) If \(\beta \geq a\) for all \(a\in A\) then \(\beta \geq \alpha \) (that is, \(\alpha \) is no greater than any possible upper bound).
It is convenient to have the following alternative characterisation of the supremum.
Lemma 3.5. Consider a non-empty set \(A\) of real numbers; \(\alpha \) is a least upper bound for \(A\) if and only if the following two conditions hold.
(i) \(\alpha \geq a\) for all \(a\in A\).
(ii) Given \(\epsilon >0\) there exists an \(a\in A\) such that \(a+\epsilon \geq \alpha \).
We write \(\sup A\) or \(\sup _{a\in A}a\) for the supremum of \(A\) if it exists.
Here is the promised theorem.
Theorem 3.6 (Existence of the supremum). If \(A\) is a non empty set of real numbers which is bounded above (that is, there exists a \(K\) such that \(a\leq K\) for all \(a\in A\)) then \(A\) has a supremum.
That this is a genuine theorem of analysis is shown by the fact that, if we work in \(\mathbb Q\) the set \(\{x\in \mathbb Q:x^{2}<2\}\) has no least upper bound. (See Question 18.13 for a detailed discussion.)
We leave it to the reader to define the greatest lower bound also called the infimum of a set \(A\) and written \(\inf A\) or \(\inf _{a\in A}a\). She should prove that \[\inf _{a\in A}a=-\sup _{a\in A}(-a)\] provided that either side of the equation exists.
As an example of its use, recall the following lemma from course C5/6.
Lemma 3.7. We work in \(\mathbb C\). If \(\sum _{n=0}^{\infty }a_{n}z^{n}\) converges for \(z=z_{0}\) then it converges for all \(z\) with \(|z|<|z_{0}|\).
The use of Theorem 3.6 gives us a rather transparent proof of the existence of the radius of convergence.
Theorem 3.8. We work in \(\mathbb C\). Consider the sum \(\sum _{n=0}^{\infty }a_{n}z^{n}\). Either \(\sum _{n=0}^{\infty }a_{n}z^{n}\) converges for all \(z\) (in this case we say that the series has infinite radius of convergence) or there exists a real number \(R\geq 0\) such that
(i) \(\sum _{n=0}^{\infty }a_{n}z^{n}\) converges for all \(|z|<R\),
(ii) \(\sum _{n=0}^{\infty }a_{n}z^{n}\) diverges for all \(|z|>R\),
(in this case we say that the series has radius of convergence R).
The nature of the supremum is illustrated by the following example which we leave to the reader as an exercise in the methods of the course C5/6. (As usual contact your supervisor if you can not do it.)
Example 3.9. (i) The sum \(\sum _{n=1}^{\infty }n^{-2}z^{n}\) has radius of convergence \(1\) and converges for all \(|z|=1\).
(ii) The sum \(\sum _{n=0}^{\infty }z^{n}\) has radius of convergence \(1\) and diverges for all \(|z|=1\).
(iii) The sum \(\sum _{n=1}^{\infty }n^{-1}z^{n}\) has radius of convergence \(1\), diverges for \(z=1\) and converges for \(z=-1\).
[When revising this part of the course you should also bear in mind Theorem 14.20 discussed later.]
We turn now to the general principle of convergence.
Definition 3.10. We say that a sequence of real numbers \(x_{n}\) is a Cauchy sequence if given any \(\epsilon >0\) we can find \(n_{0}(\epsilon )\) such that \(|x_{p}-x_{q}|<\epsilon \) whenever \(p,q\geq n_{0}(\epsilon )\).
Our first lemma is merely algebraic.
The converse is a powerful theorem of analysis.
Combining the two results we get the general principle of convergence.
Theorem 3.13 (General principle of convergence). A sequence of real numbers converges if and only if it is a Cauchy sequence.
The general principle of convergence is usually too general to be used in the same way as the convergence tests of Course C5/6 but has great theoretical importance as we shall see.
For those who can not wait here is a proof of the uncountability of \(\mathbb R\). The argument is much closer to Cantor’s original proof than the standard one given in course C1. I present it as a heavily starred exercise.
Exercise 3.14 (Uncountability of the reals). Let \(y_{1}\), \(y_{2}\), …be any sequence of points in \(\mathbb R\). Let \(x_{0}=0\), \(\delta _{0}=1\).
(i) Show that you can construct inductively a sequence of real numbers \(x_{1}\), \(x_{2}\), …and positive numbers \(\delta _{j}\) such that
(a) \(|x_{n}-x_{n-1}|<\delta _{n-1}/4\),
(b) \(x_{n}\neq y_{n}\),
(c) \(0<\delta _{n}<|x_{n}-y_{n}|\),
(d) \(\delta _{n}<\delta _{n-1}/4\).
(ii) Show that \(\delta _{n+m}<4^{-m}\delta _{n}\) for \(m,n\geq 0\) and deduce that the \(x_{n}\) form a Cauchy sequence. Conclude that \(x_{n}\) tends to a limit \(x\).
(iii) Show that \(|x_{n+m}-x_{n}|<\delta _{n}/3\) for all \(m,n\geq 0\). Deduce that \(|x-x_{n}|\leq \delta _{n}/3\) for all \(n\geq 0\). Why does this show that \(y_{n}\neq x\) for all \(n\)?
(iv) Deduce that the real numbers are uncountable.
In 1908, G. H. Hardy wrote a textbook to introduce the new rigorous analysis (or ‘continental analysis’ as it was known in a Cambridge even more insular than today) to ‘first year students at the Universities whose abilities approach something like what is usually described as “scholarship standard” ’. Apart from the fact that even the most hardened analyst would now hesitate to call an introduction to analysis A Course of Pure Mathematics it is striking how close the book is in both content and feel to a modern first course in analysis. (And where there are changes it is often not clear that the modern course2 has the advantage.) The only major difference is that Hardy only studies the real line but later advances in mathematics mean that we must now study analysis in \({\mathbb R}^{m}\) as well.
In the course C1/2 you saw that \({\mathbb R}^{m}\) has a Euclidean norm \[\|{\mathbf x}\|=\left (\sum _{i=1}^{m}x_{j}^{2}\right )^{1/2}\] (taking the positive square root) with the following properties.
Lemma 4.1. If \(\|\ \|\) is the Euclidean norm on \({\mathbb R}^{m}\) then
(i) \(\|{\mathbf x}\|\geq 0\) for all \({\mathbf x}\in {\mathbb R}^{m}\),
(ii) If \(\|{\mathbf x}\|= 0\) then \({\mathbf x}={\mathbf 0}\),
(iii) If \(\lambda \in {\mathbb R}\) and \({\mathbf x}\in {\mathbb R}^{m}\) then \(\|\lambda {\mathbf x}\|=|\lambda |\|{\mathbf x}\|\).
(iv) (The triangle inequality) If \({\mathbf x},{\mathbf y}\in {\mathbb R}^{n}\) then \(\|{\mathbf x}+{\mathbf y}\| \leq \|{\mathbf x}\|+\|{\mathbf y}\|\).
The only non-trivial part of this lemma was the triangle inequality which you proved using the Cauchy-Schwarz inequality.
In the course C5/6 you studied the notion of the limit for sequences in \({\mathbb R}^{m}\).
Definition 4.2. If \(\mathbf {a}_{n}\in {\mathbb R}^{m}\) for each \(n\geq 1\) and \(\mathbf {a}\in {\mathbb R}^{m}\) then we say that \(\mathbf {a}_{n}\rightarrow \mathbf {a}\) if given \(\epsilon >0\) we can find an \(n_{0}(\epsilon )\) such that \[\|\mathbf {a}_{n}-\mathbf {a}\|<\epsilon \ \text {for all $n\geq n_{0}(\epsilon )$}.\]
Notice that this shows that Definition 1.2 was about the distance between two points and not the absolute value of the difference of two numbers.
You proved the following results on sequences in \({\mathbb R}^{m}\) in exactly the same way as you proved the corresponding results in \(\mathbb R\).
Lemma 4.3. (i) The limit is unique. That is, if \(\mathbf {a}_{n}\rightarrow \mathbf {a}\) and \(\mathbf {a}_{n}\rightarrow \mathbf {b}\) as \(n\rightarrow \infty \) then \(\mathbf {a}=\mathbf {b}\).
(ii) If \(\mathbf {a}_{n}\rightarrow \mathbf {a}\) as \(n\rightarrow \infty \) and \(n(1)<n(2)<n(3)\ldots \) then \(\mathbf {a}_{n(j)}\rightarrow \mathbf {a}\) as \(j\rightarrow \infty \).
(iii) If \(\mathbf {a}_{n}=\mathbf {c}\) for all \(n\) then \(\mathbf {a}_{n}\rightarrow \mathbf {c}\) as \(n\rightarrow \infty \).
(iv) If \(\mathbf {a}_{n}\rightarrow \mathbf {a}\) and \(\mathbf {b}_{n}\rightarrow \mathbf {b}\) as \(n\rightarrow \infty \) then \(\mathbf {a}_{n}+\mathbf {b}_{n} \rightarrow \mathbf {a}+\mathbf {b}\).
(v) Suppose \(\mathbf {a}_{n}\in {\mathbb R}^{m}\), \(\mathbf {a}\in {\mathbb R}^{m}\), \(\lambda _{n}\in {\mathbb R}\) and \(\lambda \in {\mathbb R}\). If \(\mathbf {a}_{n}\rightarrow \mathbf {a}\) and \(\lambda _{n}\rightarrow \lambda \) then \(\lambda _{n}\mathbf {a}_{n}\rightarrow \lambda \mathbf {a}\).
Once again I leave it to the reader to check that she can indeed prove these results and to consult her supervisor if she can not. She should also make sure that she understands why certain results in Lemma 2.1 which dealt with an ordered field do not appear in Lemma 4.3 which deals with a vector space.
Lemma 4.3 is, of course, merely algebra and applies to \({\mathbb Q}^{m}\) as much as to \({\mathbb R}^{m}\). In order to do analysis we need a more powerful tool and, in keeping with the spirit of this course, we extend the Bolzano–Weierstrass theorem to \({\mathbb R}^{m}\). The extension is easy since the real work has already been done in the one dimensional case.
Theorem 4.4 (Bolzano–Weierstrass). If \(\mathbf {x}_{n}\in {\mathbb R}^{m}\) and there exists a \(K\) such that \(\|\mathbf {x}_{n}\|\leq K\) for all \(n\) then we can find \(n(1)<n(2)<\ldots \) and \(\mathbf {x}\in {\mathbb R}\) such that \(\mathbf {x}_{n(j)}\rightarrow \mathbf {x}\) as \(j\rightarrow \infty \).
Once again ‘any bounded sequence has a convergent subsequence’.
The proof of Theorem 4.4 involves extending a one dimensional result to several dimensions. This is more or less inevitable because we stated the fundamental axiom of analysis in a one dimensional form. However the Bolzano–Weierstrass theorem itself contains no reference as to whether we are working in \(\mathbb R\) or \({\mathbb R}^{m}\). Thus exactly the same proofs as we used in the one dimensional case give us a general principle of convergence in \({\mathbb R}^{m}\). (Compare Definition 3.10 and Theorem 3.13 with what follows.)
Definition 4.5. We say that a sequence of points \(\mathbf {x}_{n}\) in \({\mathbb R}^{m}\) is a Cauchy sequence if given any \(\epsilon >0\) we can find \(n_{0}(\epsilon )\) such that \(\|\mathbf {x}_{p}-\mathbf {x}_{q}\|<\epsilon \) for all \(p,q\geq n_{0}(\epsilon )\).
Theorem 4.6 (General Principle of Convergence). A sequence in \({\mathbb R}^{m}\) converges if and only if it is a Cauchy sequence.
Here is an example of the use of the general principle of convergence. (Following the conventions of course C5/6 we say that \(\sum _{j=1}^{\infty }\mathbf {a}_{j}\) converges if the sequence of partial sums \(\sum _{j=1}^{N}\mathbf {a}_{j}\) tends to a limit \(\mathbf s\) say. We write \(\sum _{j=1}^{\infty }\mathbf {a}_{j}={\mathbf s}\).)
Theorem 4.7. Let \(\mathbf {a}_{n}\in {\mathbb R}^{m}\) for each \(n\). If \(\sum _{j=1}^{\infty }\|\mathbf {a}_{j}\|\) converges then so does \(\sum _{j=1}^{\infty }\mathbf {a}_{j}\).
This result substantially generalises the result in course C5/6 which says that an absolutely convergent series converges.
When we work in \(\mathbb R\) the intervals are, in some sense, the ‘natural’ sets to consider. One of the problems that we face when we try to do analysis in many dimensions is that the types of sets with which we have to deal are much more diverse. It turns out that the so called closed and open sets are both sufficiently diverse and sufficiently well behaved to be useful. This short section is devoted to deriving some of their simpler properties. Novices frequently find the topic hard but eventually the reader will appreciate that this section is a rather trivial interlude in a deeper discussion.
The definition of a closed set is a natural one.
Definition 5.1. A set \(F\subseteq {\mathbb R}^{m}\) is closed if whenever \(\mathbf {x}_{n}\in F\) for each \(n\) and \(\mathbf {x}_{n}\rightarrow \mathbf {x}\) as \(n\rightarrow \infty \) then \(\mathbf {x}\in F\).
Thus a set is closed in the sense of this course if it is ‘closed under the operation of taking limits’. An indication of why this is good definition is given by the following version of the Bolzano–Weierstrass theorem.
Theorem 5.2. If \(K\) is closed bounded set in \({\mathbb R}^{m}\) then every sequence in \(K\) has a subsequence converging to a point of \(K\).
When working in \({\mathbb R}^{m}\) the words ‘closed and bounded’ should always elicit the response ‘Bolzano–Weierstrass’. We shall see important examples of this slogan in action in the next section (Theorem 6.3 and Theorem 6.12).
We turn now to the definition of an open set.
Definition 5.3. A set \(U\subseteq {\mathbb R}^{m}\) is open if whenever \(\mathbf {x}\in U\) there exists an \(\epsilon ({\mathbf x})>0\) such that whenever \(\|\mathbf {x}-\mathbf {y}\|<\epsilon ({\mathbf x})\) we have \(\mathbf {y}\in U\).
Thus every point of an open set lies ‘well inside the set’.
Example 5.4. Consider sets in \(\mathbb R\). The interval \([a,b]=\{x\in {\mathbb R}:a\leq x \leq b\}\) is closed, the interval \((a,b)=\{x\in {\mathbb R}:a< x <b\}\) is open, the interval \([a,b)=\{x\in {\mathbb R}:a\leq x <b\}\) is neither open nor closed, \(\mathbb R\) is both open and closed.
Example 5.5. Consider sets in \({\mathbb R}^{m}\). Let \({\mathbf x}\in {\mathbb R}^{m}\) and \(r>0\).
(i) The set \(B({\mathbf x},r) =\{\mathbf {y}\in {\mathbb R}^{m}: \|\mathbf {x}-\mathbf {y}\|<r\}\) is open.
(ii) The set \(\bar {B}({\mathbf x},r) =\{\mathbf {y}\in {\mathbb R}^{m}: \|\mathbf {x}-\mathbf {y}\|\leq r\}\) is closed.
We call \(B({\mathbf x},r)\) the open ball of radius \(r\) and centre \(\mathbf x\). We call \(\bar {B}({\mathbf x},r)\) the closed ball of radius \(r\) and centre \(\mathbf x\). Observe that the closed and and open balls of \(\mathbb R\) are precisely the closed and open intervals.
The following restatement of the definition helps us picture an open set.
Lemma 5.6. A subset \(A\) of \({\mathbb R}^{m}\) is open if and only if each point of \(A\) is the centre of an open ball lying entirely within \(A\).
Thus every point of an open set is surrounded by a ball consisting only of points of the set.
The topics of this section are often treated using the idea of neighbourhoods. We shall not use neighbourhoods very much but they come in useful from time to time.
Definition 5.7. The set \(N\) is a neighbourhood of the point \(\mathbf x\) if we can find an \(r({\mathbf x})>0\) such that \(B({\mathbf x},r({\mathbf x}))\subseteq N\).
Thus a set is open if and only if it is neighbourhood of every point that it contains.
Returning to the main theme we note the following remarkable fact.
Lemma 5.8. A subset \(A\) of \({\mathbb R}^{m}\) is open if and only if its complement \({\mathbb R}^{m}\setminus A\) is closed.
We observe the following basic results on open and closed sets.
Lemma 5.9. Consider the collection \(\tau \) of open sets in \({\mathbb R}^{m}\).
(i) \(\emptyset \in \tau \), \({\mathbb R}^{m}\in \tau \).
(ii) If \(U_{\alpha }\in \tau \) for all \(\alpha \in A\) then \(\bigcup _{\alpha \in A} U_{\alpha }\in \tau \).
(iii) If \(U_{1},U_{2},\dots ,U_{n}\in \tau \) then \(\bigcap _{j=1}^{n}U_{j}\in \tau \).
Lemma 5.10. Consider the collection \(\mathcal {F}\) of closed sets in \({\mathbb R}^{m}\).
(i) \(\emptyset \in \mathcal {F}\), \({\mathbb R}^{m}\in \mathcal {F}\).
(ii) If \(F_{\alpha }\in \mathcal {F}\) for all \(\alpha \in A\) then \(\bigcap _{\alpha \in A} F_{\alpha }\in \mathcal {F}\).
(iii) If \(F_{1},F_{2},\dots ,F_{n}\in \mathcal {F}\) then \(\bigcup _{j=1}^{n}F_{j}\in \mathcal {F}\).
In this context you should note the following examples.
Example 5.11. (i) \(\bigcap _{j=1}^{\infty }(-2-j^{-1},2+j^{-1}) =[-2,2]\), \(\bigcap _{j=1}^{\infty }(-2+j^{-1},2-j^{-1}) =(-1,1)\), \(\bigcap _{j=1}^{\infty }(-2+j^{-1},2+j^{-1})=(-1,2]\).
(ii) \(\bigcup _{j=1}^{\infty }[-1-j^{-1},1+j^{-1}] =[-2,2]\), \(\bigcup _{j=1}^{\infty }[-1+j^{-1},1-j^{-1}] =(-1,1)\), \(\bigcup _{j=1}^{\infty }[-1+j^{-1},1+j^{-1}]=(-1,2]\).
Thus the restriction to finite collections in part (iii) of Lemmas 5.9 and 5.10 can not be removed.
So far in this course we have only dealt with subsets and sequences. But analysis deals with functions, or rather with ‘well behaved functions’. The simplest such class was defined in course C5/6.
Definition 5.12. Let \(E\subseteq {\mathbb R}^{m}\) we say that a function \(f:E\rightarrow {\mathbb R}^{p}\) is continuous at some point \({\mathbf x}\in E\) if given \(\epsilon >0\) we can find a \(\delta (\epsilon ,{\mathbf x})>0\) such that whenever \({\mathbf y}\in E\) and \(\|{\mathbf x}-{\mathbf y}\|<\delta (\epsilon ,{\mathbf x})\) we have \[\|f({\mathbf x})-f({\mathbf y})\|<\epsilon .\] If \(f\) is continuous at every point \({\mathbf x}\in E\) we say that \(f\) is a continuous function on \(E\).
In other words \(f\) is continuous at \(\mathbf x\) if \(f({\mathbf y})\) can be made as close as we wish to \(f({\mathbf x})\) simply by taking \(\mathbf y\) sufficiently close to \(\mathbf x\). In other words \(f\) is locally approximately constant. In less precise terms ‘small changes produce small effects’.
Exercise 5.13. After looking at parts (iii) to (v) of Lemma 4.3 state the corresponding results for continuous functions. (Thus part (v) corresponds to the statement that if \(\lambda :E\rightarrow {\mathbb R}\) and \(f:E\rightarrow {\mathbb R}^{p}\) are continuous at \({\mathbf x}\in E\) then so is \(\lambda f\).) Prove your statements directly from Definition 5.12. Consult your supervisor if you have any problems.
In this course we shall mainly use the following simple consequence of the definition.
Lemma 5.14. Let \(E\subseteq {\mathbb R}^{m}\) and suppose that the function \(f:E\rightarrow {\mathbb R}^{p}\) is continuous at \({\mathbf x}\in E\). Then if \({\mathbf x}_{n}\in E\) and \({\mathbf x}_{n}\rightarrow {\mathbf x}\) it follows that \(f({\mathbf x}_{n})\rightarrow f({\mathbf x})\).
However, more advanced work uses generalisations of the following lemma. (Remember that if \(f:X\rightarrow Y\) is a function and \(A\subseteq Y\) we write \[f^{-1}(A)=\{x\in X\, :\, f(x)\in A\}\] and this notation does not require \(f\) to be invertible.)
Lemma 5.15. The function \(f:{\mathbb R}^{m}\rightarrow {\mathbb R}^{p}\) is continuous if and only if \(f^{-1}(O)\) is open whenever \(O\) is open.
As a simple example of how Lemma 5.15 can be used contrast the ‘\(\epsilon \), \(\delta \) proof’ of the next lemma using Definition 5.12 with a proof using Lemma 5.15.
Lemma 5.16. If \(f:{\mathbb R}^{m}\rightarrow {\mathbb R}^{p}\) and \(g:{\mathbb R}^{p}\rightarrow {\mathbb R}^{g}\) are continuous the so is their composition \(g\circ f\).
(Recall that we write \(g\circ f({\mathbf x})=g(f({\mathbf x}))\).)
Although we shall not make much use of it, the following is an important consequence of the Bolzano–Weierstrass theorem.
Theorem 5.17. Suppose that \(F_{1}\), \(F_{2}\), …are non-empty bounded closed sets in \({\mathbb R}^{m}\) such that \(F_{1}\supseteq F_{2}\supseteq \dots \). Then \(\bigcap _{j=1}^{\infty }F_{j}\neq \emptyset .\)
That is, the intersection of bounded, closed, nested non-empty sets is itself non-empty. The example \(F_{j}=[j,\infty )\) shows that ‘bounded’ can not be dropped from the hypothesis. The example \(F_{j}=(0,j^{-1})\) shows that ‘closed’ can not be dropped from the hypothesis.
Finally, it is worth emphasising that openess is not an intrinsic property of a set but depends on the space in which the set finds itself.
(If you have a sufficiently logical mind Example 5.18 is otiose. Few of us have that kind of mind.)
This section contains the core of a first course in analysis. For completeness I shall include some theorems already proved in course C5/6. The first such theorem is the intermediate value theorem proved in C5/6 by successive bisection. That proof is entirely satisfactory but, as an exercise, we shall prove the result using the method of Bolzano–Weierstrass.
Theorem 6.1 (Intermediate value theorem). If \(f:[a,b]\rightarrow {\mathbb R}\) is continuous and \(f(a)\leq 0\leq f(b)\) then there exists a \(c\in [a,b]\) such that \(f(c)=0\).
You should compare this result with part (i) of Example 1.1. Note also that the result depends on \(f\) taking values in \(\mathbb R\) rather than \({\mathbb R}^{2}\) or \(\mathbb C\).
Example 6.2. If \(f:[0,1]\rightarrow {\mathbb C}\) is given by \(f(t)=\exp \pi it\) then \(f\) is continuous, \(f(0)=1\), \(f(1)=-1\) but \(f(t)\neq 0\) for all \(t\).
Our next result looks a little abstract at first.
Theorem 6.3. Let \(K\) be a closed bounded subset of \({\mathbb R}^{m}\) and \(f:K\rightarrow {\mathbb R}^{p}\) a continuous function. Then \(f(K)\) is closed and bounded.
Thus the continuous image of a closed bounded set is closed and bounded. (The example \(m=2\), \(p=1\), \[K=\{(x,y)\, : \, y\geq x^{-1}\ ,x>0\}\] \(f(x,y)=x\) shows that the continuous image of a closed set need not be closed.
The example The example \(m=1\), \(p=1\), \(K\) the open interval \((0,1)\), \(f(x)=x^{-1}\), shows that the continuous image of a bounded set need not be bounded.)
We derive a much more concrete corollary.
Theorem 6.4. Let \(K\) be a non-empty closed bounded subset of \({\mathbb R}^{m}\) and \(f:K\rightarrow {\mathbb R}\) a continuous function. Then we can find \({\mathbf k}_{1}\) and \({\mathbf k}_{2}\) in \(K\) such that \[f({\mathbf k}_{1})\leq f({\mathbf k})\leq f({\mathbf k}_{2})\] for all \({\mathbf k}\in K\).
In other words a real valued continuous function on a closed bounded set is bounded and attains its bounds. Less usefully we may say that, in this case \(f\) actually has a maximum and a minimum. Notice that there is no analogous result for vector valued functions. Much of economics consists in an attempt to disguise this fact (there is unlikely to be a state of the economy in which everybody is best off3).
Theorem 6.4 has an even more concrete consequence.
Theorem 6.5. Let \(f:[a,b]\rightarrow {\mathbb R}\) be a continuous function. Then we can find \(k_{1},k_{2}\in [a,b]\) such that \[f(k_{1})\leq f(k)\leq f(k_{2})\] for all \(k\in [a,b]\)
In course C5/6 you used this result to prove Rolle’s theorem.
Theorem 6.6 (Rolle’s theorem). If \(f:[a,b]\rightarrow {\mathbb R}\) is a continuous function with \(f\) differentiable on \((a,b)\) and \(f(a)=f(b)\) then we can find a \(c\in (a,b)\) such that \(f'(c)=0\).
By ‘tilting Rolle’s theorem’ we obtain the mean value (in one dimension).
Theorem 6.7 (The one dimensional mean value theorem). If \(f:[a,b]\rightarrow {\mathbb R}\) is a continuous function with \(f\) differentiable on \((a,b)\) then we can find a \(c\in (a,b)\) such that \[f(b)-f(a)=(b-a)f'(c).\]
Historically, mathematicians were slow to grasp the importance of the mean value theorem and the subtlety of its proof. Fallacious proofs still appeared in textbooks (British textbooks, admittedly) in the early years of the 20th century. (The proofs were fallacious in the strictest sense. They ‘proved’ the impossibility of the result given in part (ii) of Example 1.1.)
As you saw in course C5/6, the mean value theorem Theorem 6.7 has the following key corollaries.
Theorem 6.8. Suppose that \(f:(\alpha ,\beta )\rightarrow {\mathbb R}\) is a differentiable function.
(i) If \(f'(x)\geq 0\) for all \(x\in (\alpha ,\beta )\) then \(f\) is an increasing function on \((\alpha ,\beta )\) (that is \(f(x)\leq f(y)\) whenever \(\alpha <x<y<\beta \)).
(ii) If \(f'(x)> 0\) for all \(x\in (\alpha ,\beta )\) then \(f\) is a strictly increasing function on \((\alpha ,\beta )\) (that is \(f(x)<f(y)\) whenever \(\alpha <x<y<\beta \)).
(iii) If \(f'(x)=0\) for all \(x\in (\alpha ,\beta )\) then \(f\) is constant.
(iv) If \(|f'(x)|\leq K\) for all \(x\in (\alpha ,\beta )\) then \(|f(u)-f(v)|\leq K|u-v|\) for all \(u,v\in (\alpha ,\beta )\)
Part (iii) of of Theorem 6.8 is the result that tells us that the solutions (if any) of \(f'(x)=g(x)\) on \((\alpha ,\beta )\) differ only by the addition of a constant (the so called ‘constant of integration’).
Part (iv) of of Theorem 6.8 is an excellent example of the way that the theorems of this section (and many other theorems of analysis) convert local information into global information. We know that \(|f'(x)|\leq K\) so that locally the rate of change of \(f\) is no greater than \(K\). We deduce that \(|f(u)-f(v)|\leq K|x-y|\) so that globally the rate of change of \(f\) is no greater than \(K\).
The final theorem of this section (Theorem 6.12 on uniform continuity) is another excellent example of the conversion of local information to global. We need a couple of definitions and examples. Recall first from Definition 5.12 what it means for a function to be continuous on a set
Definition 6.9. Let \(E\subseteq {\mathbb R}^{m}\) we say that a function \(f:E\rightarrow {\mathbb R}^{p}\) is continuous on \(E\) if given any point \({\mathbf x}\in E\) and any \(\epsilon >0\) we can find a \(\delta (\epsilon ,{\mathbf x})>0\) such that whenever \({\mathbf y}\in E\) and \(\|{\mathbf x}-{\mathbf y}\|<\delta (\epsilon ,{\mathbf x})\) we have \[\|f({\mathbf x})-f({\mathbf y})\|<\epsilon .\]
Now compare that definition with our definition of uniform continuity.
Definition 6.10. Let \(E\subseteq {\mathbb R}^{m}\) we say that a function \(f:E\rightarrow {\mathbb R}^{p}\) is uniformly continuous on \(E\) if given any \(\epsilon >0\) we can find a \(\delta (\epsilon )>0\) such that whenever \({\mathbf x},{\mathbf y}\in E\) and \(\|{\mathbf x}-{\mathbf y}\|<\delta (\epsilon )\) we have \[\|f({\mathbf x})-f({\mathbf y})\|<\epsilon .\]
Example 6.11. The following three functions are continuous but not uniformly continuous.
(i) \(f_{1}:{\mathbb R}\rightarrow {\mathbb R}\) given by \(f_{1}(x)=x^{2}\).
(ii) \(f_{2}:(0,1)\rightarrow {\mathbb R}\) given by \(f_{2}(x)=x^{-1}\).
(iii) \(f_{3}:(0,1)\rightarrow {\mathbb R}\) given by \(f_{2}(x)=\sin (x^{-1})\).
Theorem 6.12. Let \(K\) be a closed bounded subset of \({\mathbb R}^{m}\). If \(f:K\rightarrow {\mathbb R}^{p}\) is continuous on \(K\) then \(f\) is uniformly continuous on K.
The results of this section are the foundations of all analysis. It took 200 years to found analysis on the same axiomatic basis as Greek geometry and the successful completion of the project was a triumph of the human intellect. If you can trace a compete path from the fundamental axiom to the one dimensional mean value theorem you will have mastered the essential point of this course.
A function is continuous if it is locally approximately constant. A function is differentiable if it is locally approximately linear. More precisely, a function is continuous at a point \(\mathbf x\) if it is locally approximately constant with an error which decreases to zero as we approach \(\mathbf x\). A function is differentiable at a point \(\mathbf x\) if it is locally approximately linear with an error which decreases to zero faster than linearly as we approach \(\mathbf x\).
Definition 7.1. Suppose that \(E\) is a subset of of \({\mathbb R}^{m}\) and \(\mathbf x\) a point such that there exists a \(\delta >0\) with \(B({\mathbf x},\delta )\subseteq E\). We say that \(f:E\rightarrow {\mathbb R}^{p}\), is differentiable at \(\mathbf x\) if we can find a linear map \(\alpha :{\mathbb R}^{m}\rightarrow {\mathbb R}^{p}\) such that, when \(\|{\mathbf h}\|<\delta \)
\begin{equation*} f({\mathbf x}+{\mathbf h})=f({\mathbf x})+\alpha{\mathbf h} +{\boldsymbol\epsilon}({\mathbf x},{\mathbf h})\|{\mathbf h}\| \tag*{$\bigstar$} \end{equation*}
where \(\|{\boldsymbol \epsilon }({\mathbf x},{\mathbf h})\|\rightarrow 0\) as \(\|{\mathbf h\|}\rightarrow 0\). We write \(\alpha =Df({\mathbf x})\) or \(\alpha =f'({\mathbf x})\).
If \(E\) is open and \(f\) is differentiable at each point of \(E\) we say that \(f\) is differentiable on \(E\).
Needless to say, the centre of the definition is the formula \(\bigstar \) and the reader should concentrate on understanding the rôle of each term in that formula. The rest of the definition is just supporting waffle. Formula \(\bigstar \) is sometimes rewritten \[\frac {f({\mathbf x}+{\mathbf h})-f({\mathbf x}) -\alpha {\mathbf h}}{\|{\mathbf h}\|}\rightarrow 0\] as \(\|{\mathbf h}\|\rightarrow 0\).
You have already studied differentiation in course C5/6 and you regularly use partial differentiation in applied courses and elsewhere. I shall therefore concentrate on recalling some of the key points.
Lemma 7.2. Let \(f\) be as in Definition 7.1. If we use standard coordinates then if \(f\) is differentiable at \(\mathbf x\) its partial derivatives \(\displaystyle f_{i,j}({\mathbf x})= \frac {\partial f_{i}}{\partial x_{j}}\) exist and the matrix of the derivative \(Df({\mathbf x})\) is the Jacobian matrix \((f_{i,j}({\mathbf x}))\) of partial derivatives.
It is customary to point out that the existence of the partial derivatives does not imply the differentiability of the function (see Example 8.5 below) but the main objections to over-reliance on partial derivatives are that this makes formulae cumbersome and stifles geometric intuition. Let your motto be ‘coordinates and matrices for calculation, vectors and linear maps for understanding’.
One key property of differentiation is the chain rule. You have already seen this discussed and proved in course C5/6. Our discussion will run along the same lines but is slightly neater.
Lemma 7.3. If \(\alpha :{\mathbb R}^{m}\rightarrow {\mathbb R}^{p}\) is a linear map we can find a constant \(K\) such that \(|\alpha {\mathbf x}|\leq K\|{\mathbf x}\|\) for all \({\mathbf x}\in {\mathbb R}^{m}\).
We can now make the following definition.
Definition 7.4. If \(\alpha :{\mathbb R}^{m}\rightarrow {\mathbb R}^{p}\) is a linear map then \[\|\alpha \|=\sup _{\|{\mathbf x}\|\leq 1}\|\alpha {\mathbf x}\|.\]
The following easy exercise provides an equivalent definition.
Exercise 7.5. Show that if \(\alpha :{\mathbb R}^{m}\rightarrow {\mathbb R}^{p}\) is a linear map then \[\|\alpha \|=\sup _{\|{\mathbf x}\|= 1}\|\alpha {\mathbf x}\|.\]
This ‘operator norm’ has many pleasant properties.
Lemma 7.6. Let \(\alpha ,\beta :{\mathbb R}^{m}\rightarrow {\mathbb R}^{p}\) be linear maps.
(i) If \({\mathbf x}\in {\mathbb R}^{m}\) then \(\|\alpha {\mathbf x}\|\leq \|\alpha \|\,\|{\mathbf x}\|\).
(ii) \(\|\alpha \|\geq 0\),
(iii) If \(\|\alpha \|= 0\) then \(\alpha =0\),
(iv) If \(\lambda \in {\mathbb R}\) then \(\|\lambda \alpha \|=|\lambda |\|\alpha \|\).
(v) (The triangle inequality) \(\|\alpha +\beta \|\leq \|\alpha \|+\|\beta \|\).
(vi) If \(\gamma :{\mathbb R}^{p}\rightarrow {\mathbb R}^{q}\) then \(\|\gamma \alpha \|\leq \|\gamma \|\,\|\alpha \|\).
Lemma 7.7 (The chain rule). Let \(U\) be a neighbourhood of \(\mathbf x\) in \({\mathbb R}^{m}\), and \(V\) a neighbourhood of \(\mathbf x\) in \({\mathbb R}^{p}\). Suppose that \(f:U\rightarrow V\) is differentiable at \(\mathbf x\) with derivative \(\alpha \), that \(f:V\rightarrow {\mathbb R}^{q}\) is differentiable at \(\mathbf y\) with derivative \(\beta \) and that \(f({\mathbf x})={\mathbf y}\). Then \(g\circ f\) is differentiable at \(\mathbf x\) with derivative \(\beta \alpha \).
In more condensed notation
\begin{equation*} D(g\circ f)({\mathbf x})=Dg(f({\mathbf x}))Df({\mathbf x}). \tag*{$\bigstar\bigstar$} \end{equation*}
It is important to see that formula \(\bigstar \bigstar \) is exactly what we should expect and that its proof is simple book-keeping.
Remark One of the most troublesome culture clashes between pure mathematics and applied is that to an applied mathematician variables like \(x\) and \(t\) have meanings such as position and time whereas to a pure mathematician all variables are ‘dummy variables’ or ‘place-holders’ to be interchanged at will. To a pure mathematician \(v\) is an arbitrary function defined by its effect on a variable so that \(v(t)=At^{3}\) means precisely the same thing as \(v(x)=Ax^{3}\) whereas to an applied mathematician who thinks of \(v\) as a velocity the statements \(v(t)=At^{3}\) and \(v(x)=Ax^{3}\) mean very different (indeed incompatible) things. The applied mathematician writes \[\frac {dv}{dt}=\frac {dv}{dx}\frac {dx}{dt}\] but the nearest the pure mathematician can get to this statement is \[\frac {d\ }{dt}v(x(t))=v'(x(t))x'(t).\] The same problems occur in still more confusing form when we deal with partial derivatives. In particular the formula \[\frac {\partial f}{\partial x_{i}} =\sum _{k=1}^{n}\frac {\partial f}{\partial y_{k}} \frac {\partial y_{k}}{\partial x_{i}}\] makes no sense without using further unspoken conventions. The only remedy that I can suggest is ‘think like a pure mathematician when doing pure mathematics and like an applied mathematician when doing applied mathematics’.
The following simple result has a simple direct proof but it is instructive to prove it by the chain rule.
Lemma 7.8. Let \(U\) be a neighbourhood of \(\mathbf x\) in \({\mathbb R}^{m}\). Suppose that \(f,g:U\rightarrow {\mathbb R}^{p}\) are differentiable at \(\mathbf x\). Then \(f+g\) is differentiable at \(\mathbf x\) and \(D(f+g)({\mathbf x})=Df({\mathbf x})+Dg({\mathbf x})\).
So far our study of differentiation in higher dimensions has remained on the level of mere algebra. (The definition of the operator norm used the supremum and so lay deeper but we could have avoided the use of the operator norm as was done in course C5/6.) The next result is a true theorem of analysis.
Theorem 7.9 (The mean value inequality). Suppose that \(U\) is an open set in \({\mathbb R}^{m}\) and that \(f:U\rightarrow {\mathbb R}^{p}\) is differentiable. Consider the straight line segment \[L=\{(1-t){\mathbf a}+t{\mathbf b}:0\leq t\leq 1\}\] joining \(\mathbf a\) and \(\mathbf b\). If \(L\subseteq U\) (i.e. \(L\) lies entirely within \(U\)) and \(\|Df({\mathbf x})\|\leq K\) for all \({\mathbf x}\in L\) then \[\|f({\mathbf a})-f({\mathbf b})\| \leq K\|{\mathbf a}-{\mathbf b}\|.\]
The reader may be disappointed that a theorem involving an equality in one dimension should be replaced by one involving an inequality in many dimensions. However, inspection of the use of the mean value theorem in one dimension shows that its essential content is the statement that ‘local growth bounds global growth’ and this statement is precisely the mean value inequality.
In any case the following example shows that we can not hope for an exact analogue of the one dimensional theorem.
Example 7.10. Let \(f:{\mathbb R}\rightarrow {\mathbb R}^{2}\) be given by \(f(t)=(\cos t,\sin t)^{T}\). Then \(f(0)=f(2\pi )\) but \(Df(t)\neq 0\) for all \(t\).
It is also clear (though we shall not prove it, and, indeed can not state it without using concepts which we have not formally defined) that the correct generalisation when \(L\) is not a straight line will run as follows. ‘If \(L\) is a well behaved path lying entirely within \(U\) and \(\|Df({\mathbf x})\|\leq K\) for all \({\mathbf x}\in L\) then \(\|f({\mathbf a})-f({\mathbf b})\| \leq K\times \operatorname {length}L\)’.
Example 7.11. Let \[U=\{{\mathbf x}\in {\mathbb R}^{2}:\|{\mathbf x}\|>1\} \setminus \{(x,0)^{T}:x\leq 0\}\}\] If we take \(\theta ({\mathbf x})\) to be the unique solution of \[\cos (\theta ({\mathbf x}))=x,\ \sin (\theta ({\mathbf x}))=y, \ -\pi <\theta (\mathbf x)<\pi \] for \({\mathbf x}=(x,y)^{T}\in U\) then \(\theta :U\rightarrow {\mathbb R}\) is everywhere differentiable with \(\|D\theta ({\mathbf x})\|<1\). However if \(\mathbf {a}=(-1,10^{-1})^{T}\), \(\mathbf {b}=(-1,-10^{-1})^{T}\) then \[|\theta (\mathbf {a})-\theta (\mathbf {b})|> \|\mathbf {a}-\mathbf {b}\|.\]
In this section we exploit the mean value inequality to obtain information about the behaviour of reasonably well behaved functions \(f:{\mathbb R}^{m}\rightarrow {\mathbb R}\) in the neighbourhood of a point \(\mathbf x\). (Since we can always translate we shall sometimes put \({\mathbf x}={\mathbf 0}\) without comment.) It may be worth alerting the reader to the fact that we shall be using the one-dimensional mean value inequality (see Theorem 6.8 (iv)) in forms like the following. ‘If \(|f_{,1}(x,c)|\leq K\) for some fixed \(c\) and all \(a\leq x\leq b\) then \(|f(a,c)-f(b,c)|\leq K|b-a|\).’ In this section we shall use row rather than column vectors.
Here is our first example.
Theorem 8.1 (Continuity of partial derivatives implies differentiability). Suppose \(\delta >0\), \({\mathbf x}=(x,y)\in {\mathbb R}^{2}\), \(B({\mathbf x},\delta )\subseteq E\subseteq {\mathbb R}^{2}\) and that \(f:E\rightarrow {\mathbb R}\). If the partial derivatives \(f_{,1}\) and \(f_{,2}\) exist in \(B({\mathbf x},\delta )\) and are continuous at \(\mathbf x\) then writing \[f((x+h,y+k))=f(x,y)+f_{,1}(x,y)h+f_{,2}(x,y)k+ \epsilon (h,k)(h^{2}+k^{2})^{1/2}\] we have \(\epsilon (h,k)\rightarrow 0\) as \((h^{2}+k^{2})^{1/2}\rightarrow 0\). (In other words \(f\) is differentiable at \(\mathbf x\).)
Although this is not one of the great theorems of all time (or indeed of this course) it does provide a useful short cut for proving functions differentiable. The following easy extension is left to the reader.
Theorem 8.2. (i) Suppose \(\delta >0\), \({\mathbf x}\in {\mathbb R}^{m}\) \(B({\mathbf x},\delta )\subseteq E\subseteq {\mathbb R}^{m}\) and that \(f:E\rightarrow {\mathbb R}\). If the partial derivatives \(f_{,1}\), \(f_{,2}\), …\(f_{,m}\) exist in \(B({\mathbf x},\delta )\) and are continuous at \(\mathbf x\) then \(f\) is differentiable at \(\mathbf x\).
(ii) Suppose \(\delta >0\), \({\mathbf x}\in {\mathbb R}^{m}\) \(B({\mathbf x},\delta )\subseteq E\subseteq {\mathbb R}^{m}\) and that \(f:E\rightarrow {\mathbb R}^{p}\). If the partial derivatives \(f_{i,j}\) exist in \(B({\mathbf x},\delta )\) and are continuous at \(\mathbf x\) \([1\leq i\leq p,\ 1\leq j\leq m]\) then \(f\) is differentiable at \(\mathbf x\).
The next result, although along similar lines, is much more interesting. We write \[f_{,ij}({\mathbf x})=(f_{,j})_{,i}({\mathbf x}),\] or, in more familiar notation \[f_{,ij}=\frac {\partial ^{2}f}{\partial x_{i}\partial x_{j}}.\]
Theorem 8.3 (Second order Taylor series). Suppose \(\delta >0\), \({\mathbf x}=(x,y)\in {\mathbb R}^{2}\) \(B({\mathbf x},\delta )\subseteq E\subseteq {\mathbb R}^{2}\) and that \(f:E\rightarrow {\mathbb R}\). If the partial derivatives \(f_{,1}\), \(f_{,2}\), \(f_{,11}\), \(f_{,12}\), \(f_{,22}\) exist in \(B({\mathbf x},\delta )\) and are continuous at \(\mathbf x\) then writing \begin {align*} f((x+h,y+k))=&f(x,y)+f_{,1}(x,y)h+f_{,2}(x,y)k\\ &+\tfrac {1}{2}(f_{,11}(x,y)h^{2}+2f_{,12}(x,y)hk +f_{,22}(x,y)k^{2})+\epsilon (h,k)(h^{2}+k^{2}) \end {align*}
we have \(\epsilon (h,k)\rightarrow 0\) as \((h^{2}+k^{2})^{1/2}\rightarrow 0\).
We have the following important corollary.
Theorem 8.4 (Symmetry of second partial derivative). Suppose \(\delta >0\), \({\mathbf x}=(x,y)\in {\mathbb R}^{2}\) \(B({\mathbf x},\delta )\subseteq E\subseteq {\mathbb R}^{2}\) and that \(f:E\rightarrow {\mathbb R}\). If the partial derivatives \(f_{,1}\), \(f_{,2}\), \(f_{,11}\), \(f_{,12}\), \(f_{,21}\) \(f_{,22}\) exist in \(B({\mathbf x},\delta )\) and are continuous at \(\mathbf x\) then \(f_{,12}({\mathbf x})=f_{,21}({\mathbf x})\).
Stripped of all the fine (and for practical purposes irrelevant) detail this tells us that we can interchange the order of partial differentiation for well behaved functions.
It is possible to produce plausibility arguments for the symmetry of second partial derivatives. Here are a couple.
(1) If \(f\) is a multinomial, i.e. \(f(x,y)=\sum _{p=0}^{P}\sum _{q=0}^{Q}a_{p,q}x^{p}y^{q}\), then \(f_{,12}=f_{,21}\). But smooth functions are very close to being polynomial so we would expect the result to be true in general.
(2) Although we can not interchange limits in general it is plausible that if \(f\) is well behaved then \begin {align*} f_{,12}(x,y)&=\lim _{h\rightarrow 0}\lim _{k\rightarrow 0} h^{-1}k^{-1}(f(x+h,y+k)-f(x+h,y)-f(x,y+k)+f(x,y))\\ &=\lim _{k\rightarrow 0}\lim _{h\rightarrow 0} h^{-1}k^{-1}(f(x+h,y+k)-f(x+h,y)-f(x,y+k)+f(x,y))\\ &=f_{,21}(x,y). \end {align*}
However, these are merely plausible arguments. They do not make clear the role of the continuity of the second derivative (in Example 8.6 we shall see that the result may fail for discontinuous second partial derivatives). More fundamentally they are algebraic arguments and, as the use of the mean value theorem indicates, the result is one of analysis.
Theorems 8.1 and 8.4 are inextricably linked in many examiners’ minds with the following two counter-examples. The present lecturer would rather you forgot all the course than that you forgot how to prove the theorems and remembered the counter-examples but will not let his personal feelings get in the way of his duty.
Suppose \(f:{\mathbb R}^{2}\rightarrow {\mathbb R}\) has \(f({\mathbf 0})=0\). We write \(g(r,\theta )=f(r\cos \theta ,r\sin \theta )\) (i.e. use polar coordinates). If \(f\) is once differentiable then, to first order in \(r\),
(i) \(g(r,\theta )\) grows like \(r\),
(ii) the contours \(f(x,y)=c\) are parallel lines.
If we seek counter-examples it is reasonable to look for an \(f\) such that (i) is true but (ii) is not. We might therefore be led to consider \(g(r,\theta )=r\sin 2\theta \). (Finding counter-examples is more like tinkering with a meccano kit than anything else.)
Example 8.5. If \begin {align*} f(x,y)&=\frac {xy}{(x^{2}+y^{2})^{1/2}}\qquad \text {for $(x,y)\neq (0,0)$},\\ f(0,0)&=0, \end {align*}
then \(f\) has first partial derivatives everywhere, \(f\) is differentiable except at \((0,0)\) where it is not.
Emboldened by our success we could well guess immediately a suitable function to look for in the context of Theorem 8.4. If not let us consider \(f\) and \(g\) as before. Suppose that \[f({\mathbf 0})=f_{,1}({\mathbf 0})=f_{,2}({\mathbf 0})=0.\] If \(f\) obeys the conclusions of Theorem 8.3 (i.e. has a second order Taylor series) then, to second order in \(r\),
(i) \(g(r,\theta )\) grows like \(r^{2}\),
(ii) the contours \(f(x,y)=c\) are conics.
Once again we look for an \(f\) such that (i) is true but (ii) is not. We might, for example, consider \(g(r,\theta )=r^{2}\sin 4\theta \).
Example 8.6. If \begin {align*} f(x,y)&=\frac {xy(x^{2}-y^{2})}{(x^{2}+y^{2})} \qquad \text {for $(x,y)\neq (0,0)$},\\ f(0,0)&=0, \end {align*}
then \(f\) has first and second partial derivatives everywhere but \(f_{,12}(0,0)\neq f_{,21}(0,0)\).
It is profoundly unfortunate that the last contact many mathematicians have with this topic is a couple of complicated counter-examples. Multi-dimensional calculus leads towards differential geometry and infinite dimensional calculus (functional analysis). Both subjects depend on understanding objects which we know to be well behaved but which our limited geometric intuition makes it hard for us to comprehend. Counter-examples such as the ones just produced are simply irrelevant
What happens if a function is smooth (has partial derivatives of all orders)? The following theorem completes the discussion of this section. It is not on the syllabus and left to the reader as an exercise (but a very instructive one).
Theorem 8.7 (The local Taylor’s theorem). Suppose \(\delta >0\), \({\mathbf x}\in {\mathbb R}^{m}\) \(B({\mathbf x},\delta )\subseteq E\subseteq {\mathbb R}^{m}\) and that \(f:E\rightarrow {\mathbb R}^{p}\). If the all partial derivatives \(f_{i,j}\), \(f_{i,jk}\), \(f_{i,jkl}\), …exist in \(B({\mathbf x},\delta )\) and are continuous at \(\mathbf x\) then writing \begin {align*} f_{i}({\mathbf x}+{\mathbf h})=f_{i}({\mathbf x})& +\sum _{j=1}^{m}f_{i,j}({\mathbf x})h_{j}+ \sum _{j=1}^{m}\sum _{k=1}^{m}f_{i,jk}({\mathbf x})h_{j}h_{k}\\ &+\dots +\text {sum up to $n$th powers}+ \epsilon _{i}(\mathbf {h})\|\mathbf {h}\|^{n} \end {align*}
we have \(\|{\boldsymbol \epsilon }(\mathbf {h})\|\rightarrow 0\) as \(\|\mathbf {h}\|\rightarrow 0\).
The wide awake reader may observe that I used (gasp, horror!) coordinates in the statement of Theorem 8.7. However, the main formula can be stated in a coordinate free way as \[ f({\mathbf x}+{\mathbf h})=f({\mathbf x}) +\alpha _{1}({\mathbf h})+\alpha _{2}({\mathbf h},{\mathbf h})+ +\dots +\alpha _{n}({\mathbf h},{\mathbf h},\dots ,{\mathbf h}) +{\boldsymbol \epsilon }(\mathbf {h})\|\mathbf {h}\|^{n}, \] where \(\alpha _{k}:{\mathbb R}^{m}\times {\mathbb R}^{m} \dots \times {\mathbb R}^{m}\rightarrow {\mathbb R}^{p}\) is linear in each variable (i.e. a \(k\)-linear function). For more details consult section 12 of chapter VIII of Dieudonné’s Foundations of Modern Analysis [5] where the higher derivatives are dealt with in a coordinate free way. Like Hardy’s book [6], Dieudonné’s is a masterpiece but in very different tradition.
Anyone who feels that the higher derivatives are best studied using coordinates should reflect that if \(f:{\mathbb R}^{3}\rightarrow {\mathbb R}^{3}\) is well behaved then the ‘third derivative behaviour’ at of \(f\) at a single point is apparently given by the \(3\times 3\times 3\times 3=81\) numbers \(f_{i,jkl}(\mathbf {x})\). By symmetry (see Theorem 8.4) only \(30\) of the numbers are distinct but these \(30\) numbers are independent (consider polynomials in three variables for which the total degree of each term is \(3\)). How can we understand the information carried by an array of \(30\) real numbers?
(Much of this section was sketched in the course C5/6 but this treatment will supply some important extra detail.) Everybody knows what area is, but then everybody knows what honey tastes like. But does honey taste the same to you as it does to me? Perhaps the question is unanswerable but for many practical purposes it is sufficient that we agree on what we call honey. In the same way it is important that when two mathematicians talk about area that (1) they should agree on which sets \(E\) actually have area, and that (2) when a set \(E\) has area they should agree as to what the area is.
At one time some mathematicians may well have hoped that every bounded set in \({\mathbb R}^{2}\) could be given an area, every bounded set in \({\mathbb R}^{3}\) could be given a volume, and so on. However Banach and Tarski showed that a ball in \({\mathbb R}^{3}\) could be split into seven parts which could be reassembled after rotation and translation into a ball of smaller radius. (The proof is non-trivial but there is a good discussion in Wagon’s book The Banach–Tarski Paradox [10].) If each of these seven parts had a volume then repeated Banach Tarski decompositions would enable us to get a quart into a pint pot. Warned by this we shall not attempt to assign area to every subset of \({\mathbb R}^{2}\) or to integrate every function.
Turning specifically to the Riemann integral we consider bounded functions \(f:[a,b]\rightarrow {\mathbb R}\). (Thus there exists a \(K\) with \(|f(x)|\leq K\) for all \(x\in [a,b]\)). A dissection \(\mathcal {D}\) of \([a,b]\) is a finite subset of \([a,b]\) containing the end points \(a\) and \(b\). By convention we write \[\mathcal {D}=\{x_{0},x_{1},\dots ,x_{n}\} \ \text {with}\ a=x_{0}\leq x_{1}\leq x_{2}\leq \dots \leq x_{n}=b.\] We define the upper sum and lower sum of \(f\) associated with \(\mathcal {D}\) by \begin {align*} S(f,\mathcal {D})=&\sum _{j=1}^{n}(x_{j}-x_{j-1}) \sup _{x\in [x_{j-1},x_{j}]}f(x),\\ s(f,\mathcal {D})=&\sum _{j=1}^{n}(x_{j}-x_{j-1}) \inf _{x\in [x_{j-1},x_{j}]}f(x) \end {align*}
The next lemma is hardly more than an observation but it is the key to the proper treatment of the integral.
Lemma 9.1 (Key integration property). If \(f:[a,b]\rightarrow {\mathbb R}\) is bounded and \(\mathcal {D}_{1}\) and \(\mathcal {D}_{2}\) are two dissections then
\begin{equation*} S(f,\mathcal{D}_{1})\geq S(f,\mathcal{D}_{1}\cup\mathcal{D}_{2}) \geq s(f,\mathcal{D}_{1}\cup\mathcal{D}_{2})\geq s(f,\mathcal{D}_{2}). \tag*{$\bigstar$} \end{equation*}
The inequality \(\bigstar \) tells us that, whatever dissection you pick and whatever dissection I pick, your lower sum can not exceed my upper sum. There is no way we can put a quart in a pint pot and the Banach-Tarski phenomenon is avoided.
Since \(S(f,\mathcal {D})\geq -(b-a)K\) for all dissections \(\mathcal {D}\) we can define the upper integral \(I^{*}(f)=\inf _{\mathcal {D}}S(f,\mathcal {D})\). We define the lower integral similarly as \(I_{*}(f)=\sup _{\mathcal {D}}s(f,\mathcal {D})\). The inequality \(\bigstar \) tells us that these concepts behave well.
If \(I^{*}(f)=I_{*}(f)\) we say that \(f\) is Riemann integrable and we write \[\int _{a}^{b}f(x)\,dx=I^{*}(f).\] We write \(\mathcal {R}[a,b]\) or sometimes just \(\mathcal {R}\) for the set of Riemann integrable functions on \([a,b]\).
The following lemma provides a convenient criterion for Riemann integrability.
Lemma 9.3. (i) A bounded function \(f:[a,b]\rightarrow {\mathbb R}\) is Riemann integrable if and only if given any \(\epsilon >0\) we can find a dissection \(\mathcal D\) with \[S(f,{\mathcal D})-s(f,{\mathcal D})<\epsilon .\]
(ii) A bounded function \(f:[a,b]\rightarrow {\mathbb R}\) is Riemann integrable with integral \(I\) if and only if given any \(\epsilon >0\) we can find a dissection \(\mathcal D\) with \[S(f,{\mathcal D})-s(f,{\mathcal D})<\epsilon , \ \text {and}\ S(f,{\mathcal D})\geq I\geq s(f,{\mathcal D}).\]
The reader who is tempted to start her treatment of the Riemann integral with Lemma 9.3 (ii) should reflect that without the inequality \(\bigstar \) she will find it hard to prove the uniqueness of \(I\).
Even though we admit that we can not expect all functions to be integrable, any satisfactory theory of integration must have a large collection of integrable functions.
Theorem 9.4. (i) Any monotonic (that is any increasing or decreasing) function \(f:[a,b]\rightarrow {\mathbb R}\) is Riemann integrable.
(ii) Any continuous function \(f:[a,b]\rightarrow {\mathbb R}\) is Riemann integrable.
Part (i) of Theorem 9.4 looks a little odd. However most of the functions \(f\) met with in ‘every day life’ including many discontinuous ones are the difference of two increasing functions (that is \(f=f_{1}-f_{2}\) with \(f_{1}\) and \(f_{2}\) increasing). Such functions are called ‘functions of bounded variation’ and have a very pretty, if old fashioned, theory of their own. Lemma 9.7 tells us that every function of bounded variation is Riemann integrable. It is worth noting that Riemann did not have the theorem that every continuous function on a closed bounded interval is uniformly continuous and so was unable to prove that every continuous function is Riemann integrable.
The reader will recall from course C5/6 the standard example of bounded function which is not Riemann integrable.
Example 9.5. If \(f:[0,1]\rightarrow {\mathbb R}\) is given by \begin {align*} f(x)=0&\qquad \text {when $x$ is rational,}\\ f(x)=1&\qquad \text {when $x$ is irrational,} \end {align*}
then \(f\) is not Riemann integrable.
The following results are easy to prove.
Lemma 9.6. Suppose that \(a\leq c\leq b\) and \(f:[a,b]\rightarrow {\mathbb R}\). If \(f|_{[a,c]}\in {\mathcal R}[a,c]\) and \(f|_{[c,b]}\in {\mathcal R}[c,b]\) then \(f\in {\mathcal R}[a,b]\) and \[\int _{a}^{b}f(x)\,dx=\int _{a}^{c}f(x)\,dx +\int _{c}^{b}f(x)\,dx .\]
If we adopt the convention that \(\int _{b}^{a}f(x)\,dx=-\int _{a}^{b}f(x)\,dx\) then the formula \[\int _{a}^{b}f(x)\,dx=\int _{a}^{c}f(x)\,dx +\int _{c}^{b}f(x)\,dx \] holds independently of the order of \(a\), \(b\) and \(c\).
Lemma 9.7. If \(\lambda ,\mu \in {\mathbb R}\) and \(f,g:[a,b]\rightarrow {\mathbb R}\) are Riemann integrable then so is \(\lambda f+\mu g\) and \[\int _{a}^{b}\lambda f(x)+\mu g(x)\,dx =\lambda \int _{a}^{b}f(x)\,dx+\mu \int _{a}^{b}g(x)\,dx.\]
In the language of linear algebra \({\mathcal R}[a,b]\) is a vector space and the integral is a linear functional (i.e. a linear map from \({\mathcal R}[a,b]\) to \(\mathbb R\).).
Lemma 9.8. If \(f,g:[a,b]\rightarrow {\mathbb R}\) are Riemann integrable and \(f(x)\geq g(x)\) for all \(x\in [a,b]\) then \[\int _{a}^{b}f(x)\geq \int _{a}^{b}g(x)\,dx.\]
The next result requires a small idea.
Lemma 9.10. (i) If \(f:[a,b]\rightarrow {\mathbb R}\) is Riemann integrable then so is \(f^{2}\).
(ii) If \(f,g:[a,b]\rightarrow {\mathbb R}\) are Riemann integrable then so is \(fg\).
We are now in a position to prove the fundamental theorem of the calculus.
Theorem 9.11 (Fundamental theorem of the calculus). Suppose that \(u\in (a,b)\) and \(f:(a,b)\rightarrow {\mathbb R}\) is a Riemann integrable function function which is continuous at some \(t\in (a,b)\). If we set \[F(s)=\int _{u}^{s}f(x)\,dx\] then \(F\) is differentiable at \(t\) and \(F'(t)=f(t)\).
Sometimes we think of the fundamental theorem in a slightly different way.
Theorem 9.12. Suppose that \(f:(a,b)\rightarrow {\mathbb R}\) is continuous, that \(u\in (a,b)\) and \(c\in {\mathbb R}\). Then there is a unique solution to the differential equation \(g'(t)=f(t)\) \([t\in (a,b)]\) such that \(g(u)=c\).
Thus (under appropriate circumstances) integration and differentiation are inverse operations and the the theories of differentiation and integration are subsumed in the greater theory of the calculus. Under appropriate circumstances, if the graph of \(F\) has tangent with slope \(f(x)\) at \(x\) \begin {align*} \text {area under}&\ \text {the graph of slope of tangent of $F$}\\ &=\text {area under the graph of $f$}\\ &=\int _{a}^{b}f(x)\,dx=\int _{a}^{b}F'(x)\,dx=F(b)-F(a). \end {align*}
Although it is not explicitly on the syllabus the examiners seem deeply attached to the next theorem. The result was stated in course C4 and the proof makes use of quite a lot of the methods developed in the present course.
Theorem 9.13 (Differentiation under the integral). Suppose \(g:[a,b]\times [c,d]\) is continuous and that the partial derivative \(g_{2}\) exists and is continuous. Then writing \(G(y)=\int _{a}^{b}g(x,y)\,dx\) we have \(G\) differentiable on \((c,d)\) with \[G'(y)=\int _{a}^{b}g_{,2}(x,y)\,dx.\]
In the last few years the examiners have shown an excessive interest in the next lemma probably because the proof neatly combines several of our previous results.
Lemma 9.14. Suppose that \(g:{\mathbb R}^{2}\rightarrow {\mathbb R}\) is continuous and has continuous partial derivative \(g_{,2}\).
(i) If we set \(G(x,y)=\int _{0}^{x}g(u,y)\,du\) then \(G\) is differentiable with partial derivatives \[G_{,1}(x,y)=g(x,y)\ \text {and} \ G_{,2}(x,y)=\int _{0}^{x}g_{,2}(u,y)\,du.\]
(ii) If we set \(F(t)=\int _{0}^{t}g(u,t)\,du\) then \(F\) is differentiable with derivative \[F'(t)=g(t,t)+\int _{0}^{t}g_{,2}(u,t)\,du.\]
We have defined Riemann integration for bounded functions on bounded intervals. However, the reader will already have evaluated, as a matter of routine, integrals in the following manner \[\int _{0}^{1}x^{-1/2}\,dx=\lim _{\epsilon \rightarrow 0+} \int _{\epsilon }^{1}x^{-1/2}\,dx= \lim _{\epsilon \rightarrow 0+}[2x^{1/2}]_{\epsilon }^{1}=2,\] and \[\int _{1}^{\infty }x^{-2}\,dx=\lim _{R\rightarrow \infty } \int _{1}^{R}x^{-2}\,dx= \lim _{R\rightarrow \infty }[-x^{-1}]_{1}^{R}=1.\]
A full theoretical treatment of such integrals with the tools at our disposal is apt to lead into a howling wilderness of ‘infinite integrals of the first kind’, ‘Cauchy principal values’ and so on. Instead I shall give a few typical theorems, definitions and counter-examples from which the reader should be able to reconstruct any parts of the theory that she needs.
Definition 10.1. If \(f:[a,\infty )\rightarrow {\mathbb R}\) is such that \(f|_{[a,X]}\in {\mathcal R}[a,X]\) for each \(X>a\) and \(\int _{a}^{X}f(x)\,dx\rightarrow L\) as \(X\rightarrow \infty \) then we say that \(\int _{a}^{\infty }f(x)\,dx\) exists with value \(L\).
Lemma 10.2. Suppose \(f:[a,\infty )\rightarrow {\mathbb R}\) is such that \(f|_{[a,X]}\in {\mathcal R}[a,X]\) for each \(X>a\). If \(f(x)\geq 0\) for all \(x\) then \(\int _{a}^{\infty }f(x)\,dx\) exists if and only if there exists a \(K\) such that \(\int _{a}^{X}f(x)\,dx\leq K\) for all \(X\).
Lemma 10.3. Suppose \(f:[a,\infty )\rightarrow {\mathbb R}\) is such that \(f|_{[a,X]}\in {\mathcal R}[a,X]\) for each \(X>a\). If \(\int _{a}^{\infty }|f(x)|\,dx\) exists then \(\int _{a}^{\infty }f(x)\,dx\) exists.
It is natural to state Lemma 10.3 as saying that ‘absolute convergence of the integral implies conditional convergence’.
Additional problems arise when there are two limits involved.
Example 10.4. If \(\lambda ,\mu >0\) then \[\int _{-\mu R}^{\lambda R}\frac {x}{1+x^{2}}\,dx \rightarrow \log \frac {\lambda }{\mu }\] as \(R\rightarrow \infty \).
A pure mathematician gets round this problem by making a definition along these lines.
Definition 10.5. If \(f:{\mathbb R}\rightarrow {\mathbb R}\) is such that \(f|_{[-X,Y]}\in {\mathcal R}[X,Y]\) for each \(X,Y>0\) \(\int _{-\infty }^{\infty }f(x)\,dx\) exists with value \(L\) if and only if the following condition holds. Given \(\epsilon >0\) we can find an \(X_{0}(\epsilon )>0\) such that \[\left |\int _{-X}^{Y}f(x)\,dx-L\right |<\epsilon .\] for all \(X,Y>X_{0}(\epsilon )\).
The physicist gets round the problem by ignoring it. If she is a real physicist with correct physical intuition this works splendidly4 but if not, not.
Speaking broadly, infinite integrals \(\int _{E}f(x)\,dx\) work well when they are absolutely convergent, that is \(\int _{E}|f(x)|\,dx<\infty \), but are full of traps for the unwary otherwise.
So far we have dealt only with the integration of functions \(f:E\rightarrow {\mathbb R}\) with \(E\) a ‘well behaved’ subset of \(\mathbb R\). It is not difficult to extend our definitions to the full multidimensional case but the main users of multidimensional integrals are either applied mathematicians (in the broadest sense), differential geometers and analysts (including probabilists). The applied mathematicians are happy to take the details on trust and both the differential geometers and the analysts require more sophisticated approaches (though of rather different kinds).
However the particular case of a function \(f:[a,b]\rightarrow {\mathbb R}^{m}\) is needed in the complex variable courses (with \(m=2\)) and elsewhere when line integrals are used. The definition is simple.
Definition 10.6. If \(f:[a,b]\rightarrow {\mathbb R}^{m}\) is such that \(f_{j}:[a,b]\rightarrow {\mathbb R}\) is Riemann integrable for each \(j\) then \(\int _{a}^{b}f(x)\,dx=\mathbf {y}\) where \({\mathbf y}\in {\mathbb R}^{m}\) and \[y_{j}=\int _{a}^{b}f_{j}(x)\,dx.\]
It is easy to obtain the properties of this integral directly from its definition. Here is an example.
Lemma 10.7. If \(\alpha :{\mathbb R}^{m}\rightarrow {\mathbb R}^{p}\) is linear and \(f:[a,b]\rightarrow {\mathbb R}^{m}\) is Riemann integrable then so is \(\alpha \circ f\) and \[\int _{a}^{b}\alpha (f(x))\,dx=\alpha \left ( \int _{a}^{b}f(x)\,dx\right ).\]
Taking \(\alpha \) to be any orthogonal transformation of \({\mathbb R}^{m}\) to itself we see that that our definition is, in fact, coordinate independent. Choosing a particular orthogonal transformation we obtain the following nice result.
Theorem 10.8. If \(f:[a,b]\rightarrow {\mathbb R}^{m}\) is Riemann integrable then \[\left \|\int _{a}^{b}f(x)\,dx\right \| \leq (b-a)\sup _{x\in [a,b]}\|f(x)\|.\]
This result and its extensions are often quoted in the form \[\text {integral}\leq \text {length}\times \text {$\sup $}.\]
Exercise 10.9. Show that the collection \(\mathcal R\) of Riemann integrable functions \(f:[a,b]\rightarrow {\mathbb R}^{m}\) form a real vector space. If we write \[Tf=\int _{a}^{b}f(x)\,dx\] show that \(T:{\mathcal R}\rightarrow {\mathbb R}\) is a linear map and \(|Tf|\leq (b-a)\|f\|\).
The Riemann integral is simple to define and easy to use. So long as we only need to integrate continuous functions (or functions continuous except at a few simple discontinuities) it is perfectly satisfactory. The great majority of mathematicians need nothing else. However, the modern theory of probability and many parts of modern analysis need the more powerful integral of Lebesgue (a hint as to why this might be is given in Exercise 13.9) and its relatives. This is just as easy to use but substantially harder to set up5.
One of the unifying ideas of modern analysis is that of distance. We start with a definition modelled on those properties of Euclidean distance which do not refer to vector space structures. (Thus you should both compare and contrast Lemma 4.1.)
Definition 11.1. We say that \((X,d)\) is a metric space if \(X\) is a set and \(d:X^{2}\rightarrow \mathbb {R}\) is a function with the following properties:-
(i) \(d(x,y)\geq 0\) for all \(x,y\in X\).
(ii) \(d(x,y)=0\) if and only if \(x=y\).
(iii) \(d(x,y)=d(y,x)\) for all \(x,y\in X\).
(iv) (The triangle inequality) \(d(x,z)\leq d(x,y)+d(y,z)\)
As ought to be the case, we have the following result.
Lemma 11.2. If \(d({\mathbf x},{\mathbf y})=\|{\mathbf x}-{\mathbf y}\|\) then \(({\mathbb R}^{m},d)\) is a metric space.
I shall give later some examples of metric spaces which are both useful and novel (in the context of this course). The metric of Lemma 11.2 is useful but hardly novel. In the interest of balance let me give a metric which will be novel to most of my readers even if it is not useful.
Example 11.3. If \(d({\mathbf x},{\mathbf y})=\|{\mathbf x}\|+\|{\mathbf y}\|\) when \(\mathbf {x}\neq \mathbf {y}\) and \(d({\mathbf x},{\mathbf x})=0\) then \(({\mathbb R}^{m},d)\) is a metric space.
(This metric is called the British Railway metric.) Here is an even odder railway metric.
Example 11.4. If \(\mathbf x\) and \(\mathbf y\) are linearly dependent (in more geometrical language) if \(\mathbf x\), \(\mathbf y\) and \(\mathbf 0\) are on the same straight line) then set \[d({\mathbf x},{\mathbf y})=\|{\mathbf x}-{\mathbf y}\|.\] Otherwise, set \[d({\mathbf x},{\mathbf y})=\|{\mathbf x}\|+\|{\mathbf y}\|.\] With this definition, \(({\mathbb R}^{m},d)\) is a metric space.
Surprising as it may seem the following rather trivial metric is a rich source of intuition for communication theory. (However it is of no analytic interest.)
Lemma 11.5 (Hamming metric). Let \(M\) be the space of sequences \[{\mathbf x}=(x_{1},x_{2},\dots ,x_{n})\] with \(x_{j}=0\) or \(x_{j}=1\). We call \(M\) the space of messages of length \(n\). If \(\mathbf x\) and \(\mathbf y\) are messages we define \[d({\mathbf x},{\mathbf y})=\sum _{j=1}^{n}|x_{j}-y_{j}|\] to be the Hamming distance between the two messages. \((M,d)\) is a metric space.
Much of the ‘mere algebra’ which we did for the Euclidean metric carries over with hardly any change to the general metric case. Compare the following definition with Definition 4.2.)
Definition 11.6. Let \((X,d)\) be a metric space. If \(a_{n}\in X\) for each \(n\geq 1\) and \(a\in X\) then we say that \(a_{n}\rightarrow a\) if given \(\epsilon >0\) we can find an \(n_{0}(\epsilon )\) such that \[d(a_{n},a)<\epsilon \ \text {for all $n\geq n_{0}(\epsilon )$}.\]
However, we are not now dealing with a ‘norm on a vector space’ so some results do not carry over. The result corresponding to Lemma 4.3 has far fewer parts.
Lemma 11.7. Let \((X,d)\) be a metric space.
(i) The limit is unique. That is, if \(a_{n}\rightarrow a\) and \(a_{n}\rightarrow b\) as \(n\rightarrow \infty \) then \(a=b\).
(ii) If \(a_{n}\rightarrow a\) as \(n\rightarrow \infty \) and \(n(1)<n(2)<n(3)\ldots \) then \(a_{n(j)}\rightarrow a\) as \(j\rightarrow \infty \).
(iii) If \(a_{n}=c\) for all \(n\) then \(a_{n}\rightarrow c\) as \(n\rightarrow \infty \).
The material on open and closed sets from Section 5 goes through essentially unchanged.
Definition 11.8. Let \((X,d)\) be metric space. A set \(F\subseteq X\) is closed if whenever \(x_{n}\in F\) for each \(n\) and \(x_{n}\rightarrow x\) as \(n\rightarrow \infty \) then \(x\in F\).
Definition 11.9. Let \((X,d)\) be metric space A set \(U\subseteq X\) is open if whenever \(x\in U\) there exists an \(\epsilon >0\) such that whenever \(d(x,y)<\epsilon \) we have \(y\in U\).
Example 11.10. Let \((X,d)\) be metric space. Let \(x\in X\) and \(r>0\).
(i) The set \(B(x,r) =\{y\in X:d(x,y)<r\}\) is open.
(ii) The set \(\bar {B}(x,r) =\{y\in X:d(x,y)\leq r\}\) is closed.
We call \(B(x,r)\) the open ball of radius \(r\) and centre \(x\). We call \(\bar {B}(x,r)\) the closed ball of radius \(r\) and centre \(x\).
Lemma 11.11. Let \((X,d)\) be a metric space. A subset \(A\) of \(X\) is open if and only if each point of \(A\) is the centre of an open ball lying entirely within \(A\).
Definition 11.12. The set \(N\) is a neighbourhood of the point \(x\) if we can find an \(r>0\) such that \(B(x,r)\subseteq N\).
Lemma 11.13. Let \((X,d)\) be a metric space. A subset \(A\) of \(X\) is open if and only if its complement \(X\setminus A\) is closed.
Lemma 11.14. Let \((X,d)\) be a metric space. Consider the collection \(\tau \) of open sets in \(X\).
(i) \(\emptyset \in \tau \), \(X\in \tau \).
(ii) If \(U_{\alpha }\in \tau \) for all \(\alpha \in A\) then \(\bigcup _{\alpha \in A} U_{\alpha }\in \tau \).
(iii) If \(U_{1},U_{2},\dots ,U_{n}\in \tau \) then \(\bigcap _{j=1}^{n}U_{j}\in \tau \).
Lemma 11.15. Let \((X,d)\) be a metric space. Consider the collection \(\mathcal {F}\) of closed sets in \(X\).
(i) \(\emptyset \in \mathcal {F}\), \(X\in \mathcal {F}\).
(ii) If \(F_{\alpha }\in \mathcal {F}\) for all \(\alpha \in A\) then \(\bigcap _{\alpha \in A} F_{\alpha }\in \mathcal {F}\).
(iii) If \(F_{1},F_{2},\dots ,F_{n}\in \mathcal {F}\) then \(\bigcup _{j=1}^{n}F_{j}\in \mathcal {F}\).
The new definition of continuity only breaks the chain of translations slightly because it involves two metric spaces.
Definition 11.16. Let \((X,d)\) and \((Z,\rho )\) be metric spaces. We say that a function \(f:X \rightarrow Z\) is continuous at some point \(x\in X\) if given \(\epsilon >0\) we can find a \(\delta (\epsilon ,x)>0\) such that if \(y\in X\) and \(d(x,y)<\delta (\epsilon ,x)\) we have \[\rho [(f(x),f(y))<\epsilon .\] If \(f\) is continuous at every point \(x\in X\) we say that \(f\) is a continuous function on \(X\).
The reader may feel that Definition 5.12 is more general in than Definition 11.16 because it involves a set \(E\).
The following remark shows that this is not so.
Lemma 11.17. Let \((X,d)\) be a metric space and \(E\subseteq X\). Let \(d_{E}:E^{2}\rightarrow {\mathbb R}\) be given by \(d_{E}(u,v)=d(u,v)\) whenever \(u,v\in E\). Then \((E,d_{E})\) is a metric space.
We conclude this section with more results taken directly from Section 5
Lemma 11.18. Let \((X,d)\) and \((Z,\rho )\) be metric spaces and suppose that the function \(f:X\rightarrow Z\) is continuous at \(x\in X\). Then if \(x_{n}\in X\) and \(x_{n}\rightarrow x\) it follows that \(f(x_{n})\rightarrow f(x)\).
Lemma 11.19. Let \((X,d)\) and \((Z,\rho )\) be metric spaces. The function \(f:X\rightarrow Z\) is continuous if and only if \(f^{-1}(O)\) is open whenever \(O\) is open.
Lemma 11.20. Let \((X,d)\) \((Y,\theta )\) and \((Z,\rho )\) be metric spaces. If \(f:X\rightarrow Y\) and \(g:Y\rightarrow Z\) are continuous then so is their composition \(g\circ f\).
Question 21.1 which asks you to look at open balls and open sets in the two railway metrics given in Examples 11.3 and 11.4 emphasises the fact that open sets in different metrics may be very different.
In this short starred section I look at a very important class of metrics. The discussion will be informal, as indicated by the replacement of the word ‘Theorem’ by ‘Statement’.
Suppose we wish to build a road joining two points of the plane. The cost of a building will depend on the nature of the terrain. If the cost of building a short stretch of length \(\delta s\) near a point \((x,y)\) is (to first order) \(g(x,y)\delta s\) then subject to everything being well behaved the cost of road \(\Gamma \) will be \[\int _{\Gamma }g(x,y)ds.\] ‘But’ cries the reader ‘you have not defined line integrals yet!’ To which I reply ‘Mathematical discourse takes place on many levels and we must learn to move freely between them’6. It is tempting to define the distance \(d(A,B)\) between two points \(A\) and \(B\) in the plane as \[d(A,B)=\inf \left \{\int _{\Gamma }g(x,y)ds: \text {$\Gamma $ joining $A$ and $B$}\right \}.\]
If we include amongst our conditions that \(g\) is continuous and \(g(x,y)>0\) everywhere we see that \(d\) is a metric.
Statement 12.1. (i) \(d(A,B)\geq 0\) for all \(A\) and \(B\).
(ii) \(d(A,B)=0\) if and only if \(A=B\).
(iii) \(d(A,B)=d(B,A)\) for all \(A\) and \(B\).
(iv) \(d(A,C)\leq d(A,B)+d(B,C)\) for all \(A\), \(B\) and \(C\).
I have rather carefully kept to ‘\(\inf \)’ rather than to ‘\(\min \)’. One problem is that I have not specified the kind of \(\Gamma \) that is permissible. (Notice that the argument we used to prove Statement 12.1 means that we could not just use ‘smooth’.) However, it can be shown that if we choose a suitable class for the possible \(\Gamma \) and a suitably well behaved \(g\) then the minimum is attained. If \(\Gamma _{0}\) is a path from \(A\) to \(B\) such that \[\int _{\Gamma _{0}}g(x,y)ds=d(A,B)\] we call \(\Gamma _{0}\) a geodesic. (The geodesic need not be unique, consider road building for two towns at diametrically opposite points of a circular marsh.) If any two points are joined by a geodesic then \(d(A,B)\) is the ‘length of the geodesic path joining \(A\) and \(B\)’ where length refers to the metric \(d\) and not to the Euclidean metric.
Let us try to use these ideas to find a metric on the upper half-plane \[H=\{z\in {\mathbb C}:\Im z>0\}\] which is invariant under Möbius transformation. (See course C1/2 for background). More precisely we want a metric \(d\) such that if \(T\) is a Möbius map mapping \(H\) to itself bijectively, then \(d(Tz_{1},Tz_{2})=d(z_{1},z_{2})\) for all \(z_{1},z_{2}\in H\). Of course no such map may exist but we shall see where the question leads.
Statement 12.2. The set \(\mathcal {H}\) of Möbius maps \(T\) such that \(T|_{H}:H\rightarrow H\) is bijective is a subgroup of the group \(\mathcal {M}\) of all Möbius transformations. The subgroup \(\mathcal {H}\) is generated by the transformations \(T_{a}\) with \(a\in {\mathbb R}\), \(D_{\lambda }\) with \(\lambda >0\) and \(J\) where \begin {align*} T_{a}=&a+z\\ D_{\lambda }(z)=&\lambda z\\ J(z)=&-z^{-1} \end {align*}
By looking at the effect of elements of \(\mathcal {H}\) on a small arc, we are led to the following. (Throughout we write write \(z=x+iy\) and identify the plane \({\mathbb R}^{2}\) with \(\mathbb C\) in the usual manner.)
Statement 12.3. Suppose \(g:H\rightarrow {\mathbb R}\) is well behaved. Then \[\int _{T(\Gamma )}g(x,y)ds=\int _{\Gamma }g(x,y)ds\] for all nice paths \(\Gamma \) in \(H\) and all \(T\in {\mathcal H}\) if and only if \(g(x,y)=Ay^{-1}\) for some \(A>0\).
Let us now fix \(g(x,y)=y^{-1}\) and let \(d\) be the derived metric.
We can use the calculus of variations (course C10, Mathematical Methods) to find geodesics.
Statement 12.5. The geodesics for the metric just described are arcs of circles with centres on the real axis and straight lines which cut the real axis at right angles.
Those who are interested in the axiom of parallels for Euclidean geometry will note that, for \((H,d)\), if we call circles with centres on the real axis and straight lines which cut the real axis at right angles geodesic lines for \((H,d)\), then given any geodesic line \(\Gamma \) and any point \(A\) in \(H\) not on \(\Gamma \) we can find many geodesic lines through \(A\) which do not intersect \(\Gamma \). More details are given in course D1 (Geometry).
Of course the methods of C10 are indicative rather than conclusive. (We have used a necessary condition and not a sufficient one.) However, in this particular case, it is not hard, once the answer is known, to state and prove everything rigorously. If we want general underpinning theorems they can be found in Differential Geometry but require a lot of hard work.
There is an interesting analogue of this kind of metric for groups.
Lemma 12.6. Let \(G\) be a group and \(A\) a subset of \(G\) generating \(G\). Write \(A^{-1}=\{a^{-1}:a\in A\}\). If we set \[d(x,y)=\min \{n: \text {we can find $a_{1},a_{2},\dots ,a_{n}\in A\cup A^{-1}$ with $a_{1}a_{2}\dots a_{n}=xy^{-1}$}\},\] then \(d\) is a metric on \(G\) with \(d(xz,yz)=d(x,y)\) for all \(x,y,z\in G\). (Thus \(d\) is a right translation invariant metric.)
If \(d(x,y)\) is bounded we call \(\max d(x,y)\) the diameter of the group. The notion of diameter has an obvious relevance to the problem ‘How many shuffles do we need to produce a well shuffled pack?’.
If we examine the arguments of Section 11 we see that they are all mere algebra. What do we need to introduce to do genuine analysis on metric spaces. We can not use a variant of the fundamental axiom because there is no order on our spaces7. The correct variant of the Bolzano–Weierstrass method is, it is generally agreed, the notion of compactness which will be studied in the context of topological spaces (a concept more general than metric spaces) in course C 12. Instead we use a generalisation of the general principle of convergence.
Definition 13.1. If \((X,d)\) is a metric space we say that a sequence of points \(x_{n}\in X\) is Cauchy if given any \(\epsilon >0\) we can find \(n_{0}(\epsilon )\) such that \(d(x_{p},x_{q})<\epsilon \) for all \(p,q\geq n_{0}(\epsilon )\).
Of course, \({\mathbb R}^{n}\) with the Euclidean metric is complete.
To show that the notion of completeness is distinct from ‘Bolzano–Weierstrass notions’ we introduce a dull but useful metric space.
Lemma 13.3. Let \(X\) be any set. If we define \(d:X^{2}\rightarrow \mathbb {R}\) by \begin {align*} d(x,y)=&1\qquad \text {if $x\neq y$}\\ d(x,x)=&0 \end {align*}
then \((X,d)\) is a metric space.
We call the metric \(d\) of the previous lemma the discrete metric.
Lemma 13.4. Let \((X,d)\) be a space with the discrete metric. Then \((X,d)\) is complete.
Suppose \(X\) is infinite. Then although \(X\) is bounded (if \(x_{0}\in X\) then \(\bar {B}(x_{0},1)=X\)) we can find a sequence \(x_{n}\) such that no subsequence converges.
Here is another property of the discrete metric.
Lemma 13.5. If \((X,d)\) is a space with the discrete metric then every subset of \(X\) is both open and closed.
The contraction mapping theorem (Theorem 15.1) and its applications will provide a striking example of the utility of the concept of completeness. However this section and the next are devoted more to examples of spaces which are and are not complete. Students who want to see completeness in action immediately should do the next, starred, example.
Exercise 13.6. We say that a metric space \((X,d)\) has no isolated points if given \(y\in X\) and \(\epsilon >0\) we can find an \(x\in X\) such that \(0<d(x,y)<\epsilon \). Show by the methods of Exercise 3.14 that a non-empty metric space with no isolated points is uncountable.
Give an example of an infinite countable metric space. Give an example of an uncountable metric space all of whose points are isolated.
The next lemma gives a good supply of metric spaces which are complete and of metric spaces which are not complete.
Lemma 13.7. Let \((X,d)\) be a complete metric space. If \(E\) is a subset of \(X\) and we define \(d_{E}:E^{2}\rightarrow {\mathbb R}\) by \(d_{E}(u,v)=d(u,v)\) whenever \(u,v\in E\) then \((E,d_{E})\) is complete if and only if \(E\) is closed in \((X,d)\).
Thus, for example, the closed interval \([a,b]\) is complete for the usual metric but the open interval \((a,b)\) is not.
Here is a more interesting example of a metric which is not complete.
Example 13.8. Consider \(C([-1,1])\) the set of continuous functions \(f:[-1,1]\rightarrow {\mathbb R}\). If we set \[d(f,g)=\|f-g\|_{1}=\int _{-1}^{1}|f(x)-g(x)|\,dx\] then \(d\) is a metric. Let
\begin{alignat*}{2} f_{n}(x)&=-1&&\qquad\text{for $-1\leq x\leq -1/n$}\\ f_{n}(x)&=nx&&\qquad\text{for $-1/n\leq x\leq 1/n$}\\ f_{n}(x)&=1&&\qquad\text{for $1/n\leq x\leq 1$}. \end{alignat*}
The sequence \(f_{n}\) is Cauchy but has no limit. Thus \((C([-1,1]),d)\) is not complete.
The example just given leaves open the possibility that the same type of metric might be be complete when applied to Riemann integrable functions. The following example which is long, heavily starred, and only for the ambitious shows that this is not so.
Exercise 13.9. Consider \({\mathcal R}([-1,1])\) the set of Riemann integrable functions functions \(f:[-1,1]\rightarrow {\mathbb R}\). If we set \[d(f,g)=\|f-g\|_{1}=\int _{-1}^{1}|f(x)-g(x)|\,dx\] show that \(d\) satisfies conditions (i), (iii) and (iv) of the definition of a metric (see Definition 11.1) but give an example to show that condition (ii) may fail. We could now stop since \(d\) is not a metric but this would be to miss the interesting part.
In many ways, condition (ii) is the least important part of the definition of a metric. We call something satisfying all the other conditions a pseudo-metric. We can define limits for a pseudo-metrics in the same way as for metrics.
If \((X,\rho )\) is a pseudo-metric space show that if \(x_{n}\rightarrow x\) then \(x_{n}\rightarrow y\) if and only if \(\rho (x,y)=0\).
In the same way we can define Cauchy sequences. A pseudo-metric space is called complete if every Cauchy sequence converges. Our object is to show that \(({\mathcal R}([-1,1]),d)\) is not complete.
(i) Set
\begin{alignat*}{2} \Delta_{n}(x)=&0&&\qquad\text{if $2^{-3n}\leq |x|$}\\ \Delta_{n}(x)=&1-2^{3n}|x|&&\qquad\text{if $|x|\leq 2^{-3n}$.} \end{alignat*}
and \(g_{n}(t)=\sum _{r=-2^{n}}^{2^{n}}\Delta _{n}(t-r2^{-n})\) for all \(-1\leq t\leq 1\). Sketch \(g_{n}\) and show that \(\|g_{n}\|_{1}\leq 2^{-2n+1}\).
(ii) If \(f_{n}(x)=\max \{f_{1}(x),f_{2}(x),\dots ,f_{n}(x)\}\) explain briefly why \(f_{n}\) is a continuous function on \([-1,1]\) with \(0\leq f_{n}(x)\leq 1\). Show that \(\|f_{n+1}-f_{n}\|\leq 2^{-2n+3}\) and deduce that \(f_{n}\) is a Cauchy sequence in \(({\mathcal R}([-1,1]),d)\).
(iii) Suppose if possible that there exists an \(f\in {\mathcal R}([-1,1])\) such that \(\|f_{n}-f\|\rightarrow 0\) as \(n\rightarrow \infty \). Show that \(\|f\|\leq 2/3\).
(iv) We now use the notation of Section 9. Explain why there must exist a dissection \(\mathcal {D}\) of \([-1,1]\) such that \(S(f,\mathcal {D})\leq 1\). Deduce that we can find \(a\) and \(b\) with \(-1\leq a <b\leq 1\) such that \(f(x)\leq 1/2\) for all \(x\in [a,b]\).
(v) Show that \(\int _{a}^{b}|f(x)-f_{n}(x)|\,dx\nrightarrow 0\) and deduce that \(\|f-f_{n}\|_{1}\nrightarrow 0\). This contradiction shows that the sequence \(f_{n}\) does not converge and \(({\mathcal R}([-1,1]),d)\) is not complete.
The reader who has got this far can feel well pleased with herself but by working still harder we can extract a little more juice from this example and show that \(f_{n}\) converges pointwise to a function which is not Riemann integrable.
(v) Show that, for each \(x\), \(f_{n}(x)\) is a bounded increasing function. Deduce that \(f_{n}(x)\) tends to limit \(F(x)\), say, for each \(x\). Show that \(I^{*}(F)=2\).
(vi) Let \(Z_{n}=f_{n}^{-1}(0)\). Explain why \(Z_{n}\) is closed and \(Z_{1}\supseteq Z_{2}\supseteq \dots \). If \(\mathcal {D}\) is a fixed dissection of \([-1,1]\) show that for each \(n\) we can find a collection of \(\mathcal {I}_{n}\) of intervals \(I=[x_{j},x_{j-1}]\) with end points successive points of the dissection and of total length greater than \(1\) such that \(I\cap Z_{n}\neq \emptyset \).
(vii) Let \(\mathcal {I}=\bigcap _{n=1}^{\infty }\mathcal {I}_{n}\). Show that \(\mathcal {I}\) is a collection of intervals \(I=[x_{j},x_{j-1}]\) with end points successive points of the dissection \(\mathcal {D}\) and with total length greater than \(1\) such that \(I\cap Z_{n}\neq \emptyset \) for each \(n\).
(viii) Use Theorem 5.17 to show that \(I\cap \bigcap _{n=1}^{\infty }Z_{n}\neq \emptyset \) whenever \(I\in mathcal{I}\). Deduce that \(s(F,\mathcal {D})\leq 1\).
(ix) Conclude that \(F\) is not Riemann integrable.
If we seek to embed \(({\mathcal R}([-1,1]),d)\) in some larger space of functions which is complete we are led to Lebesgue theory.
This section is devoted to one of the most important metrics on functions. We shall write \(\mathbb {F}\) to mean either \(\mathbb {R}\) or \(\mathbb {C}\).
Definition 14.1. If \(E\) is a non-empty set we write \(\mathcal {B}(E)\) for the set of bounded functions \(f:E\rightarrow {\mathbb F}\). The uniform norm \(\|\ \|_{\infty }\) on \(\mathcal {B}\) is defined by \(\|f\|_{\infty }=\sup _{x\in E}|f(x)|\).
Lemma 14.2. If we use the standard operations \(\mathcal {B}(E)\) is a vector space over \(\mathbb {F}\). If \(f,g,h\in \mathcal {B}(E)\) the following results are true.
(i) \(\|f\|_{\infty }\geq 0\),
(ii) If \(\|f\|_{\infty }= 0\) then \(f=0\),
(iii) If \(\lambda \in {\mathbb F}\) then \(\|\lambda f\|_{\infty }=|\lambda |\|f\|_{\infty }\).
(iv) (The triangle inequality) \(\|f+g\|_{\infty }\leq \|f\|_{\infty }+\|g\|_{\infty }\).
(v) The pointwise product \(fg\in \mathcal {B}(E)\) (where \(fg(x)=f(x)g(x)\)) and \(\|fg\|_{\infty }\leq \|f\|_{\infty }\|g\|_{\infty }\).
We call the metric \(d\) given by \(d(f,g)=\|f-g\|_{\infty }\) the uniform metric.
The space \(\mathcal {B}(E)\) is not very interesting in itself but if \(E\) is a metric space it has a very interesting subspace.
Definition 14.4. If \((E,d)\) is a non-empty metric space we write \(\mathcal {C}(E)\) for the set of bounded continuous functions \(f:E\rightarrow {\mathbb F}\).
The next remark merely restates what we already know.
Lemma 14.5. If \((E,d)\) is a non-empty metric space then \(\mathcal {C}(E)\) is a vector subspace of \(\mathcal {B}(E)\). Further if \(f,g\in \mathcal {C}(E)\) the pointwise product \(fg\in \mathcal {C}(E)\).
However the next result is new and crucial.
Theorem 14.6. If \((E,d)\) is a non-empty metric space then \(\mathcal {C}(E)\) is a closed subset of \(\mathcal {B}(E)\) under the uniform metric.
This is the famous ‘\(\epsilon /3\) Theorem’8. Traditionally, it has been presented in a different but essentially equivalent manner.
Definition 14.7 (Uniform convergence). If \(E\) is a non-empty set and \(f_{n}:E\rightarrow {\mathbb F}\) and \(f:E\rightarrow {\mathbb F}\) are functions we say that \(f_{n}\) converges uniformly to \(f\) as \(n\rightarrow \infty \) if, given any \(\epsilon >0\) we can find an \(n_{0}(\epsilon )\) such that \(|f_{n}(x)-f(x)|<\epsilon \) for all \(x\in E\) and all \(n\geq n_{0}(\epsilon )\).
Theorem 14.6 now takes the following form.
Theorem 14.8. If \((E,d)\) is a non-empty metric space and \(f_{n}:E\rightarrow {\mathbb F}\) form a sequence of continuous functions tending uniformly to \(f\) then \(f\) is continuous.
More briefly, the uniform limit of continuous functions is continuous. (To see the exact equivalence observe that if \(f_{n}\rightarrow f\) uniformly then we can find an \(N\) such that \(f_{n}-f_{N}\) is bounded for all \(n\geq N\).)
Since \(\mathcal {C}(E)\) is a closed subset of \(\mathcal {B}(E)\), Theorem 14.3 and Lemma 13.7 gives us another important theorem.
Theorem 14.9. If \((E,d)\) is a non-empty metric space then the the uniform metric on \(\mathcal {C}(E)\) is complete.
This result, also, has a more traditional form.
Theorem 14.10 (General principle of uniform convergence). Suppose that \((E,d)\) is a non-empty metric space and \(f_{n}:E\rightarrow {\mathbb F}\) is a continuous function \([n\geq 1]\). The sequence \(f_{n}\) converge uniformly to a continuous function \(f\) if and only if given any \(\epsilon >0\) we can find an \(n_{0}(\epsilon )\) such that \(|f_{n}(x)-f_{m}(x)|<\epsilon \) for all \(n,m\geq n_{0}(\epsilon )\).
This theorem is also known as the GPUC by those who do not object to theorems which sound as though they were a branch of the secret police.
If \(E\) is a closed bounded subset of \({\mathbb R}^{m}\) with the Euclidean metric then by Theorem 6.4 all continuous functions \(f:E\rightarrow {\mathbb F}\) are bounded. In these circumstances we shall write \(C(E)=\mathcal {C}\). For the rest of the course we will only consider this special case. We start with a couple of important examples.
Example 14.11 (The witch’s hat). Define \(f_{n}:[0,2]\rightarrow {\mathbb R}\) by
\begin{alignat*}{2} f_{n}(x)=&1-n|x-n^{-1}|&&\qquad\text{for $|x-n^{-1}|\leq n^{-1}$,}\\ f_{n}(x)=&0&&\qquad\text{otherwise.} \end{alignat*}
Then the \(f_{n}\) are continuous functions such that \(f_{n}(x)\rightarrow 0\) as \(n\rightarrow \infty \) for each \(x\) but \(f_{n}\nrightarrow 0\) uniformly.
More briefly, pointwise convergence does not imply uniform convergence.
Example 14.12. Define \(f_{n}:[0,1]\rightarrow {\mathbb R}\) by \(f_{n}(x)=x^{n}\). Then \(f_{n}(x)\rightarrow f(x)\) as \(n\rightarrow \infty \) where \(f(x)=0\) for \(0\leq x<1\) but \(f(1)=1\).
Thus the pointwise limit of continuous functions need not be continuous.
Uniform convergence is a very useful tool when dealing with integration.
Theorem 14.13. Let \(f_{n}\in C([a,b])\). If \(f_{n}\rightarrow f\) uniformly then \(f\in C([a,b])\) and \[\int _{a}^{b}f_{n}(x)\,dx\rightarrow \int _{a}^{b}f(x)\,dx.\]
Students often miss the full force of this theorem because the formula is so easy to prove. We also need to prove that the second integral actually exists. The second half of Example 13.9 shows that the pointwise limit of continuous functions need not be integrable.
Theorem 14.13 should be considered in the context of the following two examples.
Example 14.14 (The tall witch’s hat). Define \(f_{n}:[0,2]\rightarrow {\mathbb R}\) by
\begin{alignat*}{2} f_{n}(x)=&n(1-n|x-n^{-1}|)&&\qquad \text{for $|x-n^{-1}|\leq n^{-1}$,}\\ f_{n}(x)=&0&&\qquad\text{otherwise.} \end{alignat*}
Then the \(f_{n}\) are continuous functions such that \(f_{n}(x)\rightarrow 0\) as \(n\rightarrow \infty \) but \[\int _{0}^{2}f_{n}(x)\,dx\nrightarrow 0\] as \(n\rightarrow \infty \).
Example 14.15 (Escape to infinity). Define \(f_{n}:{\mathbb R}\rightarrow {\mathbb R}\) by
\begin{alignat*}{2} f_{n}(x)=&1-n^{-1}|x|&&\qquad \text{for $|x|\leq n$,}\\ f_{n}(x)=&0&&\qquad\text{otherwise.} \end{alignat*}
Then the \(f_{n}\) are continuous functions such that \(f_{n}(x)\rightarrow 0\) uniformly as \(n\rightarrow \infty \) but \[\int _{-\infty }^{\infty }f_{n}(x)\,dx\nrightarrow 0\] as \(n\rightarrow \infty \).
Example 14.15 must be well understood since it represents a very common phenomenon which we have to learn to live with.
Traditionally, Theorem 14.13 is always paired with the following result which is really a disguised theorem on integration!
Theorem 14.16. Suppose that \(f_{n}:[a,b]\rightarrow {\mathbb R}\) is differentiable on \([a,b]\) with continuous derivative \(f_{n}'\) (we take the one-sided derivative at end points if necessary). Suppose that \(f_{n}(x)\rightarrow f(x)\) as \(n\rightarrow \infty \) for each \(x\in [a,b]\) and suppose that \(f_{n}'\) converges uniformly to a limit \(F\) on \([a,b]\). Then \(f\) is differentiable with derivative \(F\).
The reader knows how to turn results on the limits of sequences into results on infinite sums and vice versa. Applied to Theorems 14.13 and 14.16 the techniques produces the following results.
Theorem 14.17 (Term by term integration). Let \(g_{j}:[a,b]\rightarrow {\mathbb R}\) be continuous. If \(\sum _{j=1}^{n}g_{j}\) converges uniformly as \(n\rightarrow \infty \) then \[\int _{a}^{b}\sum _{j=1}^{\infty }g_{j}(x)\,dx =\sum _{j=1}^{\infty }\int _{a}^{b}g_{j}(x)\,dx.\]
Theorem 14.18 (Term by term differentiation). Let \(g_{j}:[a,b]\rightarrow {\mathbb R}\) be differentiable with continuous derivative. If \(\sum _{j=1}^{n}g_{j}(x)\) converges for each \(x\) and \(\sum _{j=1}^{n}g_{j}'\) converges uniformly as \(n\rightarrow \infty \) then \(\sum _{j=1}^{\infty }g_{j}\) is differentiable and \[\frac {d\ }{dx}\left (\sum _{j=1}^{\infty }g_{j}(x)\right ) =\sum _{j=1}^{\infty }g_{j}(x).\]
Here is a starred application of Theorem 14.16. The reader may, of course omit it but I include it partly because it is often more useful than the theorem (Theorem 9.13) that it extends and partly because it provides an excellent example of how Theorem 14.16 is used in practice.
Theorem 14.19 (Differentiation under an infinite integral). Suppose \(g:[0,\infty ]\times [c,d]\) is continuous and that the partial derivative \(g_{,2}\) exists and is continuous. Suppose further that that there exists a continuous function \(h:[0,\infty ]\times [c,d]\) with \(|g_{,2}(x,y)|\leq h(x)\) for all \((x,y)\) and such that \(\int _{0}^{\infty }h(x)\,dx\) exists and is finite. Then, if \(G(y)=\int _{0}^{\infty }g(x,y)\,dx\) exists for all \(y\in (c,d)\) we have \(G\) differentiable on \((c,d)\) with \[G'(y)=\int _{0}^{\infty }g_{,2}(x,y)\,dx.\]
We conclude this section with a remark which may appear rather isolated but will become more important in course C12 and later work on complex variable.
Theorem 14.20. We work in \(\mathbb C\). Let \(\sum _{n=0}^{\infty }a_{n}z^{n}\) be a power series with radius of convergence \(R\). If \(0\leq r< R\) then \(\sum _{n=0}^{N}a_{n}z^{n}\) converges uniformly on \(\bar {B}(0,r)\) as \(N\rightarrow \infty \). However, it need not be true that \(\sum _{n=0}^{N}a_{n}z^{n}\) converges uniformly on \(B(0,R)\).
The next result is a beautiful example of the power of abstraction. In it Banach transformed a ‘folk-technique’ into a theorem.
Theorem 15.1 (The contraction mapping theorem). Let \((X,d)\) be a non-empty complete metric space and \(T:X\rightarrow X\) a mapping such that there exists a \(K<1\) with \(d(Tx,Ty)\leq K d(x,y)\) for all \(x,y\in X\). Then there exists a unique \(x_{0}\in X\) such that \(Tx_{0}=x_{0}\).
More briefly, a contraction mapping on a complete metric space has a unique fixed point.
Wide though the conditions are the reader should exercise caution before attempting to widen them further.
Example 15.2. (i) If \(X\) is the unit circle centre \(\mathbf 0\) in the plane, \(d\) Euclidean distance and \(T\) a non-trivial rotation then \((X,d)\) is a complete metric space and \(d(T{\mathbf x},T{\mathbf y})=d({\mathbf x},{\mathbf y})\) for all \({\mathbf x},{\mathbf y}\in X\) but \(T\) has no fixed point.
(ii) If \(X={\mathbb R}\), \(d\) is Euclidean distance and \[Tx=1+x+g(x)\] for some suitably chosen \(g\) then \((X,d)\) is a complete metric space and \(d(Tx,Ty)< d(x,y)\) for all \(x,y\in X\) but \(T\) has no fixed point.
We use the contraction mapping theorem to show that a wide class of differential equations actually have a solution. We shall be looking at equations of the form \[y'=f(x,y).\] Our first, simple but important, result is that this problem on differential equations can be turned into a problem on integral equations.
Lemma 15.3. If \(f:{\mathbb R}^{2}\rightarrow {\mathbb R}\) is continuous, \(x_{0},y_{0}\in {\mathbb R}\) and \(\delta >0\) then the following two statements are equivalent.
(A) The function \(y:(x_{0}-\delta ,x_{0}+\delta )\rightarrow {\mathbb R}\) is differentiable and satisfies the equation \(y'(x)=f(x,y(x))\) for all \(x\in (x_{0}-\delta ,x_{0}+\delta )\) together with the boundary condition \(y(x_{0})=y_{0}\).
(B) The function \(y:(x_{0}-\delta ,x_{0}+\delta )\rightarrow {\mathbb R}\) is continuous and satisfies the condition \[y(x)=y_{0}+\int _{x_{0}}^{x}f(u,y(u))\,du\] for all \(x\in (x_{0}-\delta ,x_{0}+\delta )\).
Theorem 15.4. Suppose \(f:{\mathbb R}^{2}\rightarrow {\mathbb R}\) is continuous, \(x_{0},y_{0}\in {\mathbb R}\) and \(\delta >0\). Suppose further that there exists a \(K>0\) such that \(K\delta <1\) and
\begin{equation*} |f(x,u)-f(x,v)|\leq K|u-v| \tag*{$\bigstar$} \end{equation*}
for all \(u,v\in (x_{0}-\delta ,x_{0}+\delta )\). Then there exists a unique \(y:(x_{0}-\delta ,x_{0}+\delta )\rightarrow {\mathbb R}\) is continuous and satisfies the condition \[y(x)=y_{0}+\int _{x_{0}}^{x}f(t,y(t))\,dt\] for all \(x\in (x_{0}-\delta ,x_{0}+\delta )\).
Condition \(\bigstar \) is called a Lipschitz condition. It closely related to differentiability (consider the mean value theorem) but does not imply differentiability (consider \(f(x,y)=|y|\)). In the absence of such a condition differential equations can have unexpected properties.
Example 15.5. There are an infinity of different solutions to the equation \[y'=3y^{2/3}\] with \(y(0)=0\).
The lecturer can not bear to let the topic go at this point but the rest of this section is heavily starred. As it stands Theorem 15.4 is not quite powerful enough for practical purposes. Careful thought, without the need for new ideas shows that it can be extended to the following result.
Theorem 15.6. Suppose \(f:{\mathbb R}^{2}\rightarrow {\mathbb R}\) is a continuous function satisfying the following condition. There exists a \(K:[0,\infty )\rightarrow {\mathbb R}\) such that \[|f(x,u)-f(x,v)|\leq K(R) |u-v| \] whenever \(|x|,|u|,|v|\leq R\). Then given any \((x_{0},y_{0})\in {\mathbb R}^{2}\) we can find a \(\delta (x_{0},y_{0})>0\) such that there exists a unique differentiable function \(y:(x_{0}-\delta ,x_{0}+\delta )\rightarrow {\mathbb R}\) which satisfies the equation \(y'(x)=f(x,y(x))\) for all \(x\in (x_{0}-\delta ,x_{0}+\delta )\) together with the boundary condition \(y(x_{0})=y_{0}\).
Theorem 15.6 tells us that the under very wide conditions the differential equation has a local solution through each \((x_{0},y_{0})\). Does it have a global solution, that is, can we find a solution for the equation \(y'(x)=f(x,y(x))\) which is defined for all \(x\in {\mathbb R}\)?
We finish our discussion by showing that the question is non-trivial.
Theorem 15.7. Suppose \(f:{\mathbb R}^{2}\rightarrow {\mathbb R}\) is a continuous function satisfying the following condition. There exists a \(K:[0,\infty )\rightarrow {\mathbb R}\) such that \[|f(x,u)-f(x,v)|\leq K(R) |u-v| \] for all \(|x|\leq R\) and all \(u\) and \(v\). Then given any \((x_{0},y_{0})\in {\mathbb R}^{2}\) we can find a unique \(y:{\mathbb R}\rightarrow {\mathbb R}\) which is is differentiable and satisfies the equation \(y'(x)=f(x,y(x))\) for all \(x\in {\mathbb R}\) together with the boundary condition \(y(x_{0})=y_{0}\)
Example 15.8. If \(f(x,y)=1+y^{2}\) then \[|f(x,u)-f(x,v)|\leq 2R |u-v| \] whenever \(|u|,|v|\leq R\). However, there does not exist a differentiable function \(y:{\mathbb R}\rightarrow {\mathbb R}\) with \(y'=f(x,y)\) for all \(x\).
(The contents of this section are all starred)
In course C12 you saw how to solve the differential equation
\begin{equation*} y’’+a(x)y’+b(x)y=f(x) \tag*{$(*)$} \end{equation*}
subject to the conditions \(y(0)=y(1)=0\) by using the Green’s function. There is no doubt in the present author’s mind that the ‘physicist’s approach’ using impulses, delta functions and hand-waving is the most important for the student to master. However, it adds to our confidence in the hand-waving approach to see that its results can be confirmed rigourously.
We start by looking at solutions of
\begin{equation*} y’’+a(x)y’+b(x)y=0 \tag*{$(**)$} \end{equation*}
Observe first that the results of the last section on the existence of solutions of differential equations can be extended to the vectorial case with only the most minor variations in the proof. Thus, for example, Theorem 15.7 extends as follows.
Theorem 16.1. Suppose \(f:{\mathbb R}\times {\mathbb R}^{m} \rightarrow {\mathbb R}^{m}\) is a continuous function satisfying the following condition. There exists a \(K:[0,\infty )\rightarrow {\mathbb R}\) such that \[\|f(x,{\mathbf w})-f(x,{\mathbf z})\| \leq K(R) \|{\mathbf w}-{\mathbf z}\| \] for all \(|x|\leq R\) and all \(\mathbf {w}\) and \(\mathbf {z}\). Then given any \(x_{0}\in {\mathbb R}\) and \({\mathbf y}_{0}\in {\mathbb R}^{m}\) we can find a unique \(\mathbf {y}:{\mathbb R}\rightarrow {\mathbb R}^{m}\) which is is differentiable and satisfies the equation \[{\mathbf y}'(x)=f(x,{\mathbf y})\] for all \(x\in {\mathbb R}\) together with the boundary condition \(\mathbf {y}(x_{0})=\mathbf {y}_{0}\).
As the the reader knows from course C4 (Differential Equations) it is easy to convert equation \((**)\) into a form appropriate for the application of Theorem 16.1. If we set \[{\mathbf y}=\begin {pmatrix}y\\y'\end {pmatrix} \ \text {and}\ f(x,{\mathbf w})= \begin {pmatrix}0&1\\-a(x)&-b(x)\end {pmatrix}{\mathbf w} =\begin {pmatrix}w_{2}\\-b(x)w_{1}-a(x)w_{2}\end {pmatrix},\] then \[{\mathbf y}'(x)=f(x,{\mathbf y}).\] Now \begin {align*} \|f(x,{\mathbf w})-f(x,{\mathbf z})\| &=\left \|\begin {pmatrix} w_{2}-z_{2}\\-b(x)(w_{1}-z_{1})-a(x)(w_{2}-z_{2}) \end {pmatrix}\right \|\\ &\leq (1+|a(x)|+|b(x)|)\|{\mathbf w}-{\mathbf z}\| \end {align*}
so that writing \(K(R)=\sup _{t\in [-R,R]}(1+|a(t)|+|b(t)|)\) we have \[\|f(x,{\mathbf w})-f(x,{\mathbf z})\|\leq K(R) \|{\mathbf w}-{\mathbf z}\|\] whenever \(\|x|\leq K(R)\). Thus there is one and only one solution \(y_{[\lambda ]}\), say, of \((**)\) with \(y(0)=0\) and \(y'(0)=\lambda \). By linearity, \(y_{[\lambda ]}=\lambda y_{[1]}\).
Let us write \(y_{1}\) for the solution of \((**)\) with \(y_{1}(0)=0\), \(y_{1}'(0)=1\). We make the following \[\text {\bf key assumption}\qquad y_{1}(1)\neq 0.\] We write \(y_{2}\) for the solution of \((**)\) with \(y_{2}(0)=0\), \(y_{2}'(0)=1\). The next lemmas are familiar from courses C4 and C10. We define \[W(x)=y_{1}(x)y_{2}'(x)-y_{2}(x)y_{1}'(x).\] The function \(W\) is called the Wronskian.
Lemma 16.2. (i) The Wronskian is differentiable with \[W'(x)=-a(x)W(x).\]
(ii) The Wronskian is never zero.
We can now define the Green’s function \(G:{\mathbb R}^{2}\rightarrow {\mathbb R}\) by
\begin{alignat*}{2} G(s,t)=&y_{1}(s)y_{2}(t)W(t)^{-1}&&\qquad\text{for $s<t$}\\ G(s,t)=&y_{2}(s)y_{1}(t)W(t)^{-1}&&\qquad\text{for $t\leq s$.} \end{alignat*}
(Why should we do any such thing? Because the intuitive arguments of C10 tell us to.) Using Lemma 9.14 it is easy to verify our main result.
Theorem 16.3. If \(f:{\mathbb R}\rightarrow {\mathbb R}\) is continuous then \[y(x)=\int _{0}^{1}G(x,w)f(w)\,dw\] is twice differentiable and satisfies
\begin{equation*} y’’+a(x)y’+b(x)y=f(x) \tag*{$(*)$} \end{equation*}
together with the conditions \(y(0)=y(1)=0\).
(Our proof is very direct and involves neither differentiation under the integral nor Lemma 9.14. However, it is not hard to see that such ideas would be relevant in more complicated situations.)
From the point of view of more advanced work this shows how differential operators like \[y\mapsto Sy=y''+a(x)y+b(x)y\] can be linked with better behaved integral operators like \[f\mapsto Tf \ \text {with}\ Tf(x)=\int _{0}^{1}G(x,v)f(v)\,dv.\] Note that we have shown \(STf=f\) for \(f\) continuous, but note also that, if \(f\) is merely continuous, \(Sf\) need not be defined. The Green’s function \(G\) is an example of an integral kernel9. More formally, if we write \[Tf(x)=\int _{0}^{1}K(x,v)f(v)\,dv,\] then \(T\) is called an integral operator with kernel \(K\). (So far as I know, there is no connection with the ‘kernel’ that you meet in the linear mathematics course P1.)
We end with an example to show that things really do go awry if our key assumption fails.
Since new introductions to analysis pour off the printing presses in an unending torrent, it may well be my fault that I can not recommend one book to cover the whole course. Spivak’s Calculus [8] and J. C. Burkill’s A First Course in Mathematical Analysis [3] are both excellent introductions to analysis but only deal with the real line and not with \({\mathbb R}^{m}\). J. C. Burkill and H. Burkill’s A Second Course in Mathematical Analysis [4] is also good but rather old fashioned. (Of course, this has nothing to do with the authors’ knowledge but a great deal to do with what was then thought the appropriate content for a second course.) Dieudonné’s Foundations of Modern Analysis [5] is superb but written by someone who neither knew nor cared what others thought was the appropriate content for a second course10. A suitable compromise is reached in Marsden and Hoffman’s Elementary Classical Analysis [7]. This covers the material on multidimensional calculus very clearly and is probably the book that most students will find most useful. Make sure that your college library has a copy of the second edition.
The material on metric spaces is covered in Sutherland’s Introduction to Metric and Topological Spaces [9] which is workmanlike though a bit dull. (Once analytic topology is detached from its applications it takes a very gifted writer to make it exciting.)
I would be glad to hear of other suggestions for suitable books. A completely unsuitable but interesting version of the standard analysis course is given by Berlinski’s A Tour of the Calculus [1] — Spivak rewritten by Sterne with additional purple passages by the Ankh-Morpork tourist board.
[1] D. Berlinski A Tour of the Calculus Mandarin Paperbacks 1997.
[2] R. P. Boas Lion Hunting and Other Mathematical Pursuits (Editors G. L. Alexander and D. H. Mugler), Dolciani Expositions, Vol 15, MAA, 1995.
[3] J. C. Burkill A First Course in Mathematical Analysis CUP, 1962.
[4] J. C. Burkill and H. Burkill A Second Course in Mathematical Analysis CUP, 1970.
[5] J. Dieudonné Foundations of Modern Analysis, Academic Press, 1960.
[6] G. H. Hardy A Course of Pure Mathematics, CUP, 1908. (Still available in its 10th edition.)
[7] J. E. Marsden and M. J. Hoffman Elementary Classical Analysis (2nd Edition), W. H. Freeman, New York, 1993.
[8] M. Spivak, Calculus Addison-Wesley/Benjamin-Cummings, 1967.
[9] W. A. Sutherland Introduction to Metric and Topological Spaces, OUP, 1975.
[10] S. Wagon The Banach–Tarski Paradox, CUP, 1993.
[11] E. T. Whittaker and G. N. Watson A Course of Modern Analysis CUP, 1902 (Still available in its 4th edition.)
I have tried to produce 12 rather routine questions for each sheet. Any further questions are for interest only. Ambitious students and their supervisors should look at Tripos questions to supplement this meagre fare.
Q 18.1. (We work with the same ideas as in Example 1.1.) (i) Find a differentiable function \(f:{\mathbb Q}\rightarrow {\mathbb Q}\) such that \(f'(x)=1\) for all \(x\in {\mathbb Q}\) but \(f(0)>f(1)\).
(ii) If we define \(g:{\mathbb Q}\rightarrow {\mathbb Q}\) by \(g(x)=(x^{2}-2)^{-2}\) show that \(g\) is continuous but unbounded on \(\{x\in {\mathbb Q}:|x|\leq 2\}\).
Q 18.2. In Question 18.3 below you are asked to prove Lemma 2.1. Write a paragraph on each of the following subjects.
(i) Would it have been better for the lecturer to prove Lemma 2.1 rather than leave it to you. Why?
(ii) What benefit, if any, do you expect to derive from doing Lemma 2.1? Give reasons.
(iii) What is the best way of approaching Question 18.1. For example should you look at your notes from last year before attacking the question? Should you look at your notes from last year after attacking the question?
Q 18.3. Prove Lemma 2.1 (restated below for your convenience)
(i) The limit is unique. That is, if \(a_{n}\rightarrow a\) and \(a_{n}\rightarrow b\) as \(n\rightarrow \infty \) then \(a=b\).
(ii) If \(a_{n}\rightarrow a\) as \(n\rightarrow \infty \) and \(n(1)<n(2)<n(3)\ldots \) then \(a_{n(j)}\rightarrow a\) as \(j\rightarrow \infty \).
(iii) If \(a_{n}=c\) for all \(n\) then \(a_{n}\rightarrow c\) as \(n\rightarrow \infty \).
(iv) If \(a_{n}\rightarrow a\) and \(b_{n}\rightarrow b\) as \(n\rightarrow \infty \) then \(a_{n}+b_{n}\rightarrow a+b\).
(v) If \(a_{n}\rightarrow a\) and \(b_{n}\rightarrow b\) as \(n\rightarrow \infty \) then \(a_{n}b_{n}\rightarrow ab\).
(vi) If \(a_{n}\rightarrow a\) as \(n\rightarrow \infty \) and \(a_{n}\neq 0\) for each \(n\), \(a\neq 0\) then \(a_{n}^{-1}\rightarrow a^{-1}\).
(vii) If \(a_{n}\leq A\) for each \(n\) and \(a_{n}\rightarrow a\) as \(n\rightarrow \infty \) then \(a\leq A\).
Q 18.4. Prove Lemma 2.2 which states that a decreasing sequence of real numbers bounded below tends to a limit.
Q 18.5. (i) If \(a_{j}\) is a integer with \(0\leq a_{j}\leq 9\) show from the fundamental axiom that \[\sum _{j=1}^{\infty }a_{j}10^{-j}\] exists. Show that \(0\leq \sum _{j=1}^{\infty }a_{j}10^{-j}\leq 1\) carefully quoting any theorems that you use.
(ii) If \(0\leq x\leq 1\) show that we can find integers \(x_{j}\) with \(0\leq x_{j}\leq 9\) such that \[x=\sum _{j=1}^{\infty }x_{j}10^{-j}.\] What important deep result do you use?
(iii) If \(a_{j}\) and \(b_{j}\) are integers with \(0\leq a_{j},b_{j}\leq 9\) and \(a_{j}=b_{j}\) for \(j<N\), \(a_{N}>b_{N}\) show that \[\sum _{j=1}^{\infty }a_{j}10^{-j}\geq \sum _{j=1}^{\infty }b_{j}10^{-j}.\] Give the precise necessary and sufficient condition for equality and prove it.
Q 18.6. (i) Let us write \[S_{n}=\sum _{r=0}^{n}\frac {1}{r!}.\] Show, from first principles, that \(S_{n}\) converges to a limit (which, with the benefit of extra knowledge, we call \(e\)).
(ii) Show that, if \(n\geq 2\) and \(r\geq 0\) then \[\frac {n!}{(n+r)!}\leq \frac {1}{3^{r}}.\] Deduce carefully that, if \(m\geq n\geq 2\), \[0\leq n!(S_{m}-S_{n})\leq \frac {1}{2}.\] and that \[0<n!(e-S_{n})\leq \frac {1}{2}.\] Deduce that \(n!e\) is not an integer for any \(n\) and conclude that \(e\) is irrational.
(iii) Show similarly that \(\displaystyle \sum _{r=0}^{\infty }\frac {1}{(2r)!}\) is irrational.
Q 18.7. In this question you should feel free to use any results you know (provided you quote them correctly).
(i) By using the mean value theorem, show that \((1+x^{-1})^{x}\) is an increasing function of \(x\) for \(x\geq 0\).
(ii) Show that \((1+n^{-1})^{n}\) tends to limit \(L\) as \(n\rightarrow \infty \).
(iii) Show that, if \(n\geq N\), \[e\geq \left (1+\frac {1}{n}\right )^{n}\geq \sum _{r=0}^{N}\binom {n}{r}n^{-r}.\] Deduce that \[e\geq L\geq \sum _{r=0}^{N}\frac {1}{r!},\] and conclude that \[\left (1+\frac {1}{n}\right )^{n}\rightarrow e\] as \(n\rightarrow \infty \).
(iv) Assuming the Taylor expansion for \(e^{t}\) show that \[\left (1+\frac {t}{n}\right )^{n}\rightarrow e^{t}\] as \(n\rightarrow \infty \) for all \(t>0\).
(v) Show that \[1-\left (1+\frac {t^{2}}{n^{2}}\right )^{n}\rightarrow 0\] and deduce that \[1-\left (1-\frac {t^{2}}{n^{2}}\right )^{n}\rightarrow 0\] as \(n\rightarrow \infty \). Hence deduce that \[\left (1-\frac {t}{n}\right )^{n}\rightarrow e^{-t}\] as \(n\rightarrow \infty \) for all \(t>0\).
(vi) Conclude that \[\left (1+\frac {t}{n}\right )^{n}\rightarrow e^{t}\] as \(n\rightarrow \infty \) for all real \(t\).
Q 18.8. (a) Prove the statements made in Example 3.9.
(i) The sum \(\sum _{n=1}^{\infty }n^{-2}z^{n}\) has radius of convergence \(1\) and converges for all \(|z|=1\).
(ii) The sum \(\sum _{n=0}^{\infty }z^{n}\) has radius of convergence \(1\) and diverges for all \(|z|=1\).
(iii) The sum \(\sum _{n=1}^{\infty }n^{-1}z^{n}\) has radius of convergence \(1\), diverges for \(z=1\) and converges for \(z=-1\).
(b) Suppose that \(\sum _{j=0}^{\infty }a_{j}z^{j}\) has radius of convergence \(R\) and \(\sum _{j=0}^{\infty }b_{j}z^{j}\) has radius of convergence \(S\).
(i) Show that if \(R\neq S\) then \(\sum _{j=0}^{\infty }(a_{j}+b_{j})z^{j}\) has radius of convergence \(\min (R,S)\).
(ii) Show that if \(R=S\) then \(\sum _{j=0}^{\infty }(a_{j}+b_{j})z^{j}\) has radius of convergence at least \(R\).
(iii) If \(T\geq 1=R=S\) give an example where \(\sum _{j=0}^{\infty }(a_{j}+b_{j})z^{j}\) has radius of convergence \(T\).
Q 18.9. (i) Suppose that \(x_{n}\) is a bounded sequence of real numbers. Show that \(y_{n}=\sup _{m\geq n}x_{m}\) is a well defined bounded decreasing sequence and so \(y_{n}\) tends to a limit \(y\) say. We write \[\limsup _{n\rightarrow \infty }x_{n}=y.\]
(ii) Let \(a_{n}\) be a sequence of complex numbers. Show that if the sequence \(|a_{n}|^{1/n}\) is unbounded, then \(\sum _{j=0}^{\infty }a_{j}z^{j}\) has radius of convergence \(0\) and if the the sequence \(|a_{n}|^{1/n}\) is bounded and non-zero, then \(\sum _{j=0}^{\infty }a_{j}z^{j}\) has radius of convergence \((\limsup _{n\rightarrow \infty }|a_{n}|^{1/n})^{-1}\). State, with reasons, what happens when \(\limsup _{n\rightarrow \infty }|a_{n}|^{1/n}=0\)
Q 18.10. Suppose that \(A\) and \(B\) are non-empty bounded subsets of \(\mathbb R\). Show that \[\sup \{a+b:a\in A,\ b\in B\}=\sup A+\sup B.\] The last formula is more frequently written \[\sup _{a\in A,\ b\in B}a+b=\sup _{a\in A}a+\sup _{b\in B}b.\]
Suppose, further that \(a_{n}\) and \(b_{n}\) are bounded sequences of real numbers. For each of the following statements either give a proof that it is always true or an example to show that it is sometimes false.
(i) \(\sup _{n}(a_{n}+b_{n})=\sup _{n}a_{n}+\sup _{n}b_{n}\).
(ii) \(\sup _{a\in A,\ b\in B}ab=(\sup _{a\in A}a)(\sup _{b\in B}b)\).
(iii) \(\inf _{a\in A,\ b\in B}a+b=\inf _{a\in A}a+\inf _{b\in B}b\).
Q 18.11. Prove all the statements made in Lemma 4.3, viz:
(i) The limit is unique. That is, if \(\mathbf {a}_{n}\rightarrow \mathbf {a}\) and \(\mathbf {a}_{n}\rightarrow \mathbf {b}\) as \(n\rightarrow \infty \) then \(\mathbf {a}=\mathbf {b}\).
(ii) If \(\mathbf {a}_{n}\rightarrow \mathbf {a}\) as \(n\rightarrow \infty \) and \(n(1)<n(2)<n(3)\ldots \) then \(\mathbf {a}_{n(j)}\rightarrow \mathbf {a}\) as \(j\rightarrow \infty \).
(iii) If \(\mathbf {a}_{n}=\mathbf {c}\) for all \(n\) then \(\mathbf {a}_{n}\rightarrow \mathbf {c}\) as \(n\rightarrow \infty \).
(iv) If \(\mathbf {a}_{n}\rightarrow \mathbf {a}\) and \(\mathbf {b}_{n}\rightarrow \mathbf {b}\) as \(n\rightarrow \infty \) then \(\mathbf {a}_{n}+\mathbf {b}_{n} \rightarrow \mathbf {a}+\mathbf {b}\).
(v) Suppose \(\mathbf {a}_{n}\in {\mathbb R}^{m}\), \(\mathbf {a}\in {\mathbb R}^{m}\) \(\lambda _{n}\in {\mathbb R}\), and \(\lambda \in {\mathbb R}\). If \(\mathbf {a}_{n}\rightarrow \mathbf {a}\) and \(\lambda _{n}\rightarrow \lambda \) then \(\lambda _{n}\mathbf {a}_{n}\rightarrow \lambda \mathbf {a}\).
Q 18.12. Use the Bolzano–Weierstrass theorem to prove the general principle of convergence in \({\mathbb R}^{m}\) (Theorem 4.6).
The remaining questions are included for general interest and not for relevance to the syllabus or to passing Tripos exams.
Q 18.13. By comparing \[\frac {a}{b}\ \text {with}\ \frac {3a+4b}{2a+3b}\] show that given any \(x\in {\mathbb Q}\) with \(x^{2}\leq 2\) we can find a \(y\in {\mathbb Q}\) with \(y>x\) and \(y^{2}\leq 2\). Deduce that the set \(\{x\in {\mathbb Q}:x^{2}<2\}\) has no supremum in \(\mathbb Q\). (The idea of comparing \(a/b\) with \((3a+4b)/(2a+3b)\) goes back as far as Euclid.)
Q 18.14. If \(a<c<b\) and \(E\) is a subset of \(\mathbb R\) explain why if \(E\cap [a,c]\) and \(E\cap [c,b]\) are finite then so is \(E\cap [a,b]\). Use this to give a proof by bisection of Bolzano’s theorem.
Q 18.15. Do Exercise 3.14 showing the uncountability of the reals.
Let \(y_{1}\), \(y_{2}\), …be any sequence of points in \(\mathbb R\). Let \(x_{0}=0\), \(\delta _{0}=1\).
(i) Show that you can construct inductively a sequence of real numbers \(x_{1}\), \(x_{2}\), …and positive numbers \(\delta _{j}\) such that
(a) \(|x_{n}-x_{n-1}|<\delta _{n-1}/4\),
(b) \(x_{n}\neq y_{n}\),
(c) \(0<\delta _{n}<|x_{n}-y_{n}|\),
(d) \(\delta _{n}<\delta _{n-1}/4\).
(ii) Show that \(\delta _{n+m}<4^{-m}\delta _{n}\) for \(m,n\geq 0\) and deduce that the \(x_{n}\) form a Cauchy sequence. Conclude that \(x_{n}\) tends to a limit \(x\).
(iii) Show that \(|x_{n+m}-x_{n}|<\delta _{n}/3\) for all \(m,n\geq 0\). Deduce that \(|x-x_{n}|\leq \delta _{n}/3\) for all \(n\geq 0\). Why does this show that \(y_{n}\neq x\) for all \(n\).
(iv) Prove that the real numbers are uncountable.
Q 18.16. Suppose we consider the collection \(\mathcal Q\) of rational functions \[f(X)=\frac {a_{0}+a_{1}X+\dots +a_{n}X^{n}} {b_{0}+b_{1}X+\dots +b_{m}X^{m}}\] (where the \(a_{j}\) and \(b_{j}\) are real and where \(b_{m}\neq 0\) and \(a_{n}\neq 0\) if \(n\geq 1\)) with the usual rules for addition and multiplication. Suppose that we say that \(f(X)>0\) if \(a_{n}b_{m}>0\) and that \(f>g\) if \(f-g>0\). Convince yourself that \(\mathcal Q\) has all the standard properties of arithmetic and order we expect. (Some students will only be able to convince themselves by looking up the axioms for an ordered field in, for example, [8] and then verifying each axiom in turn. Good luck to them, I say.)
Show that \(1/n>1/X>0\) for all integers \(n\) and so Archimedes’ Axiom is false for \(\mathcal Q\).
I have tried to produce 12 rather routine questions for each sheet. Any further questions are for interest only. Ambitious students and their supervisors should look at Tripos questions to supplement this meagre fare.
Q 19.1. Here is an alternative proof of Theorem 6.8
(i) Let \(K\geq 0\) and \(\epsilon >0\). If \(f:[a,b]\rightarrow {\mathbb R}\) has the property that \[f(b)-f(a)\geq (K+\epsilon )(b-a)\] and \(N\) is strictly positive integer show that there exists an integer \(r\) with \(1\leq r\leq N\) and \[f(a+r(b-a)/N)-f(a+(r-1)(b-a)/N)\geq (K+\epsilon )(b-a)/N\]
(ii) Under the conditions of part (i) deduce that there exists a sequence of closed intervals \([x_{n},y_{n}]\) with \[[a,b]=[x_{0},y_{0}]\supseteq [x_{1},y_{1}]\supseteq [x_{2},y_{2}]\supseteq \dots \] such that \(f(y_{n})-f(x_{n})\geq (K+\epsilon )(y_{n}-x_{n})\) and \(y_{n}-x_{n}\rightarrow 0\) as \(n\rightarrow \infty \).
(iii) Continuing with the same notation, show that \(x_{n}\rightarrow c\) for some \(c\in [a,b]\). Show that \(c\in [x_{n},y_{n}]\) for all \(n\) and that, if \(c\neq x_{n},y_{n}\) then \[\max \left ( \frac {f(y_{n})-f(c)}{y_{n}-c}, \frac {f(c)-f(x_{n})}{c-x_{n}} \right )\geq K+\epsilon .\] Conclude that if \(f\) is differentiable at \(c\) then \(f'(c)\geq K+\epsilon \).
(iv) Deduce all the results of Theorem 6.8.
Q 19.2. (i) If \(E\) is a set show that the set \[E^{\circ }=\bigcup \{U:\text {$U$ is open and $U\subseteq E$}\}\] has the following properties.
(a) \(E^{\circ }\) is open.
(b) \(E^{\circ }\subseteq E\).
(c) If \(A\) is open and \(A\subseteq E\) then \(A\subseteq E^{\circ }\).
(ii) Suppose that \(E'\) is such that
(a) \(E'\) is open.
(b) \(E'\subseteq E\).
(c) If \(A\) is open and \(A\subseteq E\) then \(A\subseteq E'\).
Show that \(E'=E^{\circ }\).
(iii) Explain why (i) and (ii) make it possible to define the interior of \(E\) as the largest open set contained in \(E\).
(iv) Show why it is possible to define the closure of \(E\) as the smallest closed set containing \(E\).
(v) Find the interior and closure of \((a,b)\), \([a,b)\), \((a,b]\) and \([a,b]\) where \(a<b\).
Q 19.3. Do the Exercise 5.13 restated below.
After looking at parts (iii) to (v) of Lemma 4.3 state the corresponding results for continuous functions. (Thus part (v) corresponds to the statement that if \(\lambda :E\rightarrow {\mathbb R}\) and \(f:E\rightarrow {\mathbb R}^{p}\) are continuous at \({\mathbf x}\in E\) then so is \(\lambda f\).) Prove your statements directly from Definition 5.12.
Q 19.4. If \(f,g:{\mathbb R}^{p}\rightarrow {\mathbb R}^{m}\) are continuous show that \[\begin {pmatrix} f\\g \end {pmatrix} :{\mathbb R}^{p}\rightarrow {\mathbb R}^{2m} ={\mathbb R}^{m}\times {\mathbb R}^{m}\] defined by \[\begin {pmatrix} f\\g \end {pmatrix}({\mathbf x})= \begin {pmatrix} f({\mathbf x})\\g({\mathbf x}) \end {pmatrix}\] is continuous.
Show that the map \(A:{\mathbb R}^{m}\times {\mathbb R}^{m} \rightarrow {\mathbb R}^{m}\) defined by \[A\begin {pmatrix} \mathbf {x}\\ \mathbf {y} \end {pmatrix}=\mathbf {x}+\mathbf {y}\] whenever \(\mathbf {x},\mathbf {y}\in {\mathbb R}^{m}\) is continuous. By applying the continuity of compositions of continuous functions (Lemma 5.16) to \(\displaystyle A\circ \begin {pmatrix} f\\g \end {pmatrix}\) deduce that \(f+g\) is continuous.
Obtain the other results of Question 19.3 similarly.
Q 19.5. Let \(f:{\mathbb R}\rightarrow {\mathbb R}\). State which of the following statements are always true and which may be false giving a proof or a counter example as appropriate.
(i) If \(f^{-1}(E)\) is closed whenever \(E\) is closed then \(f\) is continuous.
(ii) If \(f\) is continuous then \(f(U)\) is open whenever \(U\) is open.
(iii) If \(f\) is continuous then \(f(E)\) is bounded whenever \(E\) is bounded.
(iv) If \(f(E)\) is bounded whenever \(E\) is bounded then \(f\) is continuous.
Q 19.6. Suppose that \(E\) is a subset of \({\mathbb R}^{n}\). Show that \(E\) is closed and bounded if and only if every continuous function \(f:E\rightarrow {\mathbb R}^{n}\) is bounded.
Q 19.7. If \(A\) and \(B\) are subsets of \({\mathbb R}^{m}\) we write \[A+B=\{{\mathbf a}+{\mathbf b}: {\mathbf a}\in A,\ {\mathbf b}\in B\}\]
By considering the case \(m=1\), \[A=\{-n:n\in {\mathbb N}\}\ \text {and} \ B=\{n+1/n:n\in {\mathbb N},\ n\geq 2\}\] show that if \(A\) and \(B\) are closed it does not follow that \(A+B\) is closed.
Show however that if \(A\) is closed and bounded and \(B\) is closed then \(A+B\) is closed.
If \(A\) and \(B\) are open, does it follow that \(A+B\) is open? Give reasons.
Q 19.8. Consider the map \(\Omega :{\mathbb R}^{6}\rightarrow {\mathbb R}^{3}\) given by \[\Omega \begin {pmatrix}{\mathbf x}\\{\mathbf y} \end {pmatrix}= {\mathbf x}\wedge {\mathbf y}\] for all \({\mathbf x},{\mathbf y}\in {\mathbb R}^{3}\). (Here \(\wedge \) is the ‘vector product’ of mathematical methods.) Show directly from the definition that \(\Omega \) is differentiable.
Having done this, find the Jacobian matrix of \(\Omega \).
Q 19.9 (Very traditional). Consider the map \(f:{\mathbb R}^{2}\rightarrow {\mathbb R}\) given by \[f(x,y)=xy|x-y|.\] At which points is \(f\) differentiable? Prove your statements.
[Deal quickly with the easy part. Do not be surprised if the hard part reveals a special case that must be dealt with separately.]
Q 19.10 (Also very traditional). (i) Consider the function \(f:{\mathbb R}\rightarrow {\mathbb R}\) given by \(f(t)=t^{2}\sin 1/t\) for \(t\neq 0\), \(f(0)=0\). Show that \(f\) is everywhere differentiable and find its derivative. Show that \(f'\) is not continuous. [Deal quickly with the easy part and then go back to the definition to deal with \(t=0\). There are wide selection of counter-examples obtained by looking at \(t^{\beta }\sin t^{\alpha }\) for various values of \(\alpha \) and \(\beta \).]
(ii) Find an infinitely differentiable function \(g:(1,\infty )\rightarrow {\mathbb R}\) such that \(g(t)\rightarrow 0\) but \(g'(t)\nrightarrow 0\) as \(t\rightarrow \infty \).
(iii) Consider the function \(F:{\mathbb R}^{2}\rightarrow {\mathbb R}\) given by \(F(s,t)=f(s)+f(t)\) where \(f\) is the function defined in (i). Show that \(F\) is everywhere differentiable but \(F_{,1}\) and \(F_{,2}\) are not continuous at \((0,0)\).
Q 19.11. Let \(\mathbf {u}\) and \(\mathbf {v}\) be linearly independent vectors in \({\mathbb R}^{2}\). Suppose that \(F:{\mathbb R}^{2}\rightarrow {\mathbb R}\) has the property that \[\frac {F({\mathbf x}+h{\mathbf u})-F({\mathbf x})}{h} \rightarrow a({\mathbf x}),\ \text {and} \ \frac {F({\mathbf x}+h{\mathbf v})-F({\mathbf x})}{h} \rightarrow b({\mathbf x})\] as \(h\rightarrow 0\) for all \({\mathbf x}\in {\mathbb R}^{2}\). Suppose further that \(a:{\mathbb R}^{2}\rightarrow {\mathbb R}\) is continuous. Show that \(F\) is everywhere differentiable.
Q 19.12. Show that Definition 7.4 can be modified to give the definition of an operator norm \(\|\alpha \|\) for a linear map \(\alpha :{\mathbb C}^{m}\rightarrow {\mathbb C}^{p}\) and that the results of Lemma 7.6 continue hold mutatis mutandis11.
If \(\alpha \) has matrix \((a_{ij})\) with respect to the standard bases show that \[\max _{i,j}|a_{ij}|\leq \|\alpha \|\leq mp\max _{i,j}|a_{ij}|.\]
For the rest of this question we take \(m=p\) (so we are dealing with endomorphisms of the vector space \({\mathbb C}^{m}\)).
(i) If \(\alpha \) has a lower triangular matrix \((a_{ij})\) with respect to the standard basis show that the set of its eigenvalues is precisely the set of the diagonal entries \(a_{ii}\). (Pure algebra.)
(ii) If \(\alpha \) has a lower triangular matrix \((a_{ij})\) and \(\epsilon >0\) show that we can find \(\beta \) such that \(\beta \) has \(m\) distinct non-zero eigenvalues and \(\|\beta -\alpha \|<\epsilon \).
(iii) In algebra you show that given any endomorphism \(\alpha \) we can find an invertible endomorphism \(\theta \) such that \(\theta \alpha \theta ^{-1}\) has a lower triangular matrix with respect to the standard basis. (N.B. This is true for complex vector spaces but not for real ones.) Use this fact to show that given any endomorphism \(\alpha \) and any \(\eta >0\) we can an endomorphism \(\gamma \) such that \(\|\gamma -\alpha \|<\eta \) and \(\gamma \) has \(m\) distinct non-zero eigenvalues.
(iv) Write \(P_{\alpha }(t)=\det (t\iota -\alpha )\) (where \(\iota \) is the identity endomorphism). Show that if \(\alpha \) has \(m\) distinct eigenvalues then \(P_{\alpha }(\alpha )=0\). (Pure algebra.)
(v) Let \(\alpha \) be any endomorphism. By considering a sequence \(\alpha _{n}\) of endomorphisms with \(m\) distinct eigenvalues such that \(\|\alpha _{n}-\alpha \|\rightarrow 0\) as \(n\rightarrow \infty \) show that \(P_{\alpha }(\alpha )=0\).
[This procedure is substantially longer than that used to prove the Cayley-Hamilton theorem in your algebra course but it is an instructive way of looking at things.]
I have tried to produce 12 rather routine questions for each sheet. Any remaining questions are for interest only. Ambitious students and their supervisors should look at Tripos questions to supplement this meagre fare.
Q 20.1. At the beginning of the 19th century it was hoped that the calculus could be reduced to mere algebra by using Taylor’s theorem. The following example due to Cauchy shows why such hopes were misplaced.
Let \(E:{\mathbb R}\rightarrow {\mathbb R}\) be defined by \(E(t)=\exp (-1/t^{2})\) for \(t\neq 0\) and \(E(0)=0\). Use induction to show that \(E\) is infinitely differentiable with \begin {align*} E^{(n)}(t)&=Q_{n}(1/t)E(t)\qquad \text {for $t\neq 0$}\\ E^{(n)}(0)&=0. \end {align*}
where \(Q_{n}\) is a polynomial. For which values of \(t\) is it true that \[E(t)=\sum _{n=0}^{\infty }\frac {E^{n}(0)t^{n}}{n!}?\]
Explain why the behaviour of \(E\) is consistent with Lemma 8.7 (taking \(m=1\)).
Q 20.2. In the first year you saw how the find stationary points for a well behaved function \(f:{\mathbb R}^{2}\rightarrow {\mathbb R}\) and how to decide (except in certain specified cases) whether they are maxima, minima or saddle-points. Use Theorem 8.3 to state and prove your results rigorously.
In the course on quadratic mathematics you will see that given any real symmetric \(m\times m\) matrix \(A\) you can find a special orthogonal matrix \(P\) such that \(PAP^{T}\) is diagonal. Explain the relevance of this result to the problem of deciding (except in certain specified cases) when a stationary point of a well behaved function \(f:{\mathbb R}^{m}\rightarrow {\mathbb R}\) is a maximum, a minimum or a saddle point.
Q 20.3. Define \(f:[0,1]\rightarrow {\mathbb R}\) by \(f(p/q)=1/q\) when \(p\) and \(q\) are coprime integers with \(1\leq p <q\) and \(f(x)=0\) otherwise.
(i) Show that \(f\) is Riemann integrable and find \(\int _{0}^{1}f(x)\,dx\).
(ii) At which points is \(f\) continuous? Prove your answer.
Q 20.4. We say that a function \(f:[0,1]\rightarrow {\mathbb R}\) is of bounded variation if there exists a constant \(K\) such that whenever \(0\leq x_{0}\leq x_{1}\leq x_{2}\dots \leq x_{n}\leq 1\) we have \[\sum _{j=1}^{n}|f(x_{j-1})-f(x_{j})|\leq K.\]
(i) Show that if \(g:[0,1]\rightarrow {\mathbb R}\) is increasing then \(g\) is of bounded variation. Show that if \(h_{1},h_{2}:[0,1]\rightarrow {\mathbb R}\) are increasing then \(h=h_{1}-h_{2}\) is of bounded variation.
(ii) Suppose that \(f\) is of bounded variation. Show that \[f_{1}(x)=\sup \{ {\textstyle \sum _{j=1}^{n}f(x_{j})-f(y_{j})} :0\leq y_{1}\leq x_{1}\leq y_{2}\leq x_{2}\leq \dots \leq y_{n}\leq x_{n}\leq x\}\] is a well defined increasing function. Show that \(f_{1}-f\) is also an increasing function and so \(f\) is the difference of two increasing functions.
(iii) Show that if we set \(f(x)=x\sin x^{-1}\) for \(x\neq 0\) and \(f(0)=0\) then \(f:[0,1]\rightarrow {\mathbb R}\) is a continuous function which is not of bounded variation.
[Remember diagrams may not be rigorous but they certainly help you understand what you are talking about.]
Q 20.5. If \(f:[0,1]\rightarrow {\mathbb R}\) is increasing show that if \(x_{n}\) is an increasing sequence then \(f(x_{n})\) tends to limit as \(n\rightarrow \infty \). Deduce carefully that if \(0<y\leq 1\) then \(f(x)\) tends to a limit \(f(y-)\) as \(x\rightarrow y\) through values of \(x\) with \(x<y\). (We shall set \(f(0-)=f(0)\).) Define \(f(y+)\) similarly.
Let us call \(f(y+)-f(y-)\) the jump \(J(y)\) at \(y\). Show that if \(y_{1}\), \(y_{2}\), …\(y_{n}\) are distinct points of \((0,1)\) then \[\sum _{j=1}^{n}J(y_{j})\leq f(1)-f(0).\] Show that \(f\) can have only finitely many jumps of size \(1/n\) or more. Deduce that \(f\) has only countably many jumps and that \(f\) must be continuous at some point.
Q 20.6. Let \(\mathcal K\) be the set of functions \(f:[0,1]\rightarrow {\mathbb R}\) such that \(f\) is continuous on \((0,1]\) and \(f\) has an improper Riemann integral \[{\mathcal I}f=\int _{0}^{1}f(x)\,dx.\]
Show that if \(f,g\in \mathcal K\) and \(\lambda ,\mu \in {\mathbb R}\) then \(\lambda f+\mu g\in \mathcal K\) and \[{\mathcal I}(\lambda f+\mu g)=\lambda {\mathcal I}f +\mu {\mathcal I}g.\]
By giving a proof or counter-example establish whether it is always true that if \(f,g\in {\mathcal K}\) then \(fg\in {\mathcal K}\).
Q 20.7. Suppose that \(f:{\mathbb R}^{2}\rightarrow {\mathbb R}\) is continuous. By using results on the differentiation of integrals which should be quoted exactly show that \[\frac {d\ }{dt}\int _{a}^{t} \left (\int _{c}^{d}f(u,v)\,dv\right )\,du =\frac {d\ }{dt}\int _{c}^{d} \left (\int _{a}^{t}f(u,v)\,du\right )\,dv.\] Deduce that \[\int _{a}^{b} \left (\int _{c}^{d}f(u,v)\,dv\right )\,du =\int _{c}^{d} \left (\int _{a}^{b}f(u,v)\,du\right )\,dv,\] or, more informally \[\int _{a}^{b} \int _{c}^{d}f(u,v)\,dv\,du =\int _{c}^{d} \int _{a}^{b}f(u,v)\,du\,dv.\]
Q 20.8. Prove all the results of Section 11. (A long but easy and useful exercise. In order to keep the noise level from whinging students down to tolerable levels this counts as 2 questions.)
Q 20.9. See Question 20.8.
Q 20.10. Consider \(\mathcal E\) the space of linear maps \(\alpha :{\mathbb R}^{u}\rightarrow {\mathbb R}^{u}\). We wish to show that \(d(\alpha ,\beta )=\|\alpha -\beta \|\) (with \(\|\alpha \|\) the usual ‘operator norm’) defines a complete metric.
To this end consider a Cauchy sequence \(\alpha _{n}\) in \(\mathcal E\). Show that if \(\mathbf {x}\in {\mathbb R}^{u}\) then \(\alpha _{n}(\mathbf {x})\) tends to limit, call it \(\alpha (\mathbf {x})\). Now show that \(\alpha :{\mathbb R}^{u}\rightarrow {\mathbb R}^{u}\) is linear. By using the inequality \begin {align*} \|\alpha (\mathbf {x})-\alpha _{n}(\mathbf {x})\|& \leq \|\alpha _{m}(\mathbf {x})-\alpha _{n}(\mathbf {x})\| +\|\alpha _{m}(\mathbf {x})-\alpha (\mathbf {x})\|\\ &\leq \sup _{p,q\geq n}\|\alpha _{p}(\mathbf {x})-\alpha _{q}(\mathbf {x})\| +\|\alpha _{m}(\mathbf {x})-\alpha (\mathbf {x})\|, \end {align*}
and allowing \(m\rightarrow \infty \) show that \[\|\alpha -\alpha _{n}\|\leq \sup _{p,q\geq n}\|\alpha _{p}-\alpha _{q}\|,\] and deduce that \(\|\alpha -\alpha _{n}\|\rightarrow \infty \) as \(n\rightarrow \infty \).
Q 20.11. In Question 20.10 we showed that \(\mathcal {E}\) with the appropriate metric is complete. In this question we make use of the result.
(i) Suppose \(\alpha \in \mathcal {E}\) and \(\|\alpha \|<1\). Let \[\sigma _{n}=\iota +\alpha +\alpha ^{2}+\dots +\alpha ^{n}.\] Show that \(\sigma _{n}\) converges to a limit \(\sigma \), say.
Compute \((\iota -\alpha )\sigma _{n}\) and show, carefully that \((\iota -\alpha )\sigma =\iota \).
(iii) Deduce that if \(\|\iota -\gamma \|<1\) then \(\gamma \) is invertible.
(iv) Show that if \(\beta \in \mathcal {E}\) is invertible and \(\|\beta -\tau \|<\|\beta ^{-1}\|^{-1}\) then \(\tau \) is invertible.
(v) Conclude that the invertible elements of \(\mathcal E\) form an open set.
Q 20.12. Consider \(C([-1,1])\) the set of continuous functions \(f:[-1,1]\rightarrow {\mathbb R}\). Show that, if we set \[d(f,g)=\|f-g\|_{2}= \left (\int _{-1}^{1}|f(x)-g(x)|^{2}\,dx\right )^{1/2}\] then \(d\) is a metric but \(d\) is not complete.
Show that if \(\|f-g\|_{1}\) is defined as in Example 13.8 then \(2^{1/2}\|f\|_{2}\geq \|f\|_{1}\) for all \(f\in C([-1,1])\) but that, given any constant \(K>0\) we can find a \(g\in C([-1,1])\) such that \(\|g\|_{2}\geq K\|g\|_{1}\).
The remaining question is included for general interest and not for relevance to the syllabus or to passing Tripos exams.
Q 20.13. We use the notation of Questions 20.10 and 20.11.
(i) Show that if \(\alpha \in \mathcal {E}\) we can find \(e^{\alpha }\in \mathcal {E}\) such that \[\left \|\sum _{r=0}^{n}\frac {\alpha ^{r}}{r!} -e^{\alpha }\right \|\rightarrow 0\] as \(n\rightarrow \infty \).
(ii) Show carefully that if \(\alpha \) and \(\beta \) commute \[e^{\alpha }e^{\beta }=e^{\alpha +\beta }.\]
(iii) Show that if \(\alpha \) and \(\beta \) are general (not necessarily commuting) elements of \(\mathcal E\) then \[\left \|h^{-2}(e^{h\alpha }e^{h\beta }-e^{h\beta }e^{h\alpha }) -(\alpha \beta -\beta \alpha )\right \|\rightarrow 0\] as the real number \(h\rightarrow 0\).
Conclude that, in general, \(e^{\alpha }e^{\beta }\) and \(e^{\alpha +\beta }\) need not be equal.
I have tried to produce 12 rather routine questions for each sheet. The remaining questions are for interest only. Ambitious students and their supervisors should look at Tripos questions to supplement this meagre fare.
Q 21.1. Consider the two railway metrics \(d_{1}\) and \(d_{2}\) on \({\mathbb R}^{2}\) given in Examples 11.3 and 11.4. For each of the two metrics give a reasonably simple description of the open balls of radius \(r\) and centre \(\mathbf x\) when \({\mathbf x}={\mathbf 0}\) and when \({\mathbf x}\neq {\mathbf 0}\) and (a) \(r\leq \|{\mathbf x}\|\), (b) \(r<\|{\mathbf x}\|\leq 2r\), (c) \(2r<\|{\mathbf x}\|\).
Explain the statement ‘when finding open sets only balls of small radius are important so cases (b) and (c) are irrelevant’.
Give the simplest description you can find of open sets in the two metrics.
Q 21.2. Let \(f:{\mathbb R}^{2}\rightarrow {\mathbb R}\) be differentiable and let \(g(x)=f(x,c-x)\) where \(c\) constant. Show that \(g:{\mathbb R}\rightarrow {\mathbb R}\) is differentiable and find its derivative
(i) directly from the definition of differentiability
and also
(ii) by using the chain rule.
Deduce that if \(f_{,1}=f_{,2}\) throughout \(\mathbb R\) then \(f(x,y)=h(x+y)\) for some differentiable function \(h\).
Q 21.3 (Traditional). Consider the functions \(f_{n}:[0,1]\rightarrow {\mathbb R}\) defined by \(f_{n}(x)=n^{p}x\exp (-n^{q}x)\) where \(p,q>0\).
(i) Show that \(f_{n}\) converges pointwise on \([0,1]\).
(ii) Show that if \(p<q\) then \(f_{n}\) converges uniformly on \([0,1]\).
(iii) Show that if \(p\geq q\) then \(f_{n}\) does not converge uniformly on \([0,1]\). Does \(f_{n}\) converge uniformly on \([0,1-\epsilon ]\)? Does \(f_{n}\) converge uniformly on \([\epsilon ,1]\)? (Here \(0<\epsilon <1\), you should justify your answers.)
Q 21.4. Let \((X,d)\) be a metric space. We say that \(E\) is dense in \(X\) if whenever \(x\in X\) we can find \(e_{n}\in E\) with \(e_{n}\rightarrow x\).
(i) Show that the rationals are dense in the reals (i.e. \(\mathbb Q\) is dense in \(\mathbb R\) if we give \(\mathbb R\) the usual metric). Show that the irrationals are dense in the reals.
[You should make explicit any use you make of the axiom of Archimedes.]
(ii) Consider the space \(C([0,1])\) of continuous functions with the uniform norm. Show that the piecewise linear functions are dense in \(C([0,1])\).
Q 21.5. Recall from your courses in mathematical methods that, if \(f:{\mathbb R}\rightarrow {\mathbb C}\) is a continuous periodic function with period \(2\pi \) then we write \[\hat {f}(n)=\frac {1}{2\pi } \int _{-\pi }^{\pi }f(t)\exp (-int)\,dt.\]
Show, quoting carefully any results that you use, that, if \(a_{n}\in {\mathbb C}\) and \(\sum _{n=-N}^{N} |a_{n}|\) converges, then \(\sum _{n=-N}^{N} a_{n}\exp (int)\) converges uniformly to a continuous periodic function \(f(t)\), say, with period \(2\pi \). Show that \(\hat {f}(n)=a_{n}\). for each \(n\).
[This chain of thought is continued in Question 21.14.]
Q 21.6. Recall that \(\|\ \|\) is said to be a norm on a real vector space \(V\) if the following conditions hold.
(1) \(\|\mathbf {x}\|\geq 0\) for all \(\mathbf {x}\in V\).
(2) \(\|\mathbf {x}\|=0\) implies \(\mathbf {x}=\mathbf {0}\).
(3) \(\|\lambda \mathbf {x}\|=|\lambda |\|\mathbf {x}\|\) for all \(\lambda \in {\mathbb R}\) and all \(\mathbf {x}\in V\).
(4) \(\|\mathbf {x}+\mathbf {y}\|\leq \|\mathbf {x}\|+ \|\mathbf {y}\|\) for all \(\mathbf {x},\mathbf {y}\in {\mathbb R}\).
(i) Let \(\|\ \|_{2}\) be the usual Euclidean norm on \({\mathbb R}^{m}\) and let \(\|\ \|\) be another norm on \({\mathbb R}^{m}\). Let \(I:({\mathbb R}^{m},\|\ \|_{2})\rightarrow ({\mathbb R}^{m},\|\ \|)\) be the identity map, given by \(I\mathbf {x}=\mathbf {x}\). By considering basis vectors or otherwise that there exists a constant \(K\) such that \(\|\mathbf {x}\|\leq K\|\mathbf {x}\|_{2}\) for all \(\|\mathbf {x}\|\in {\mathbb R}^{m}\). Deduce that \(I\) is continuous.
By applying a theorem on continuous functions on closed bounded sets show that there exists a \(k>0\) such that \(\|\mathbf {x}\|\geq k\) for all \(\mathbf {x}\) with \(\|\mathbf {x}\|_{2}=1\). Conclude that \[k\|\mathbf {x}\|_{2}\leq \|\mathbf {x}\|\leq K\|\mathbf {x}\|_{2}\] for all \(\|\mathbf {x}\|\in {\mathbb R}^{m}\).
[Thus all norms on a finite dimensional space are essentially the same.]
(ii) Consider the real vector space \(\mathbb {R}^{\mathbb {N}}\) of all real sequences \(\mathbf {x}=(x_{1},x_{2},\dots )\) with the usual vector addition and multiplication by scalars. Let \(V\) be the set of all sequences \(\mathbf {x}=(x_{1},x_{2},\dots )\) such that \(x_{j}\neq 0\) for only finitely many \(j\). Show that \(V\) is a subspace of \(\mathbb {R}^{\mathbb {N}}\) and so a vector space in its own right. By considering norms of the form \(\|\mathbf {x}\|=\max _{j}\kappa _{j}|x_{j}|\), or otherwise, find two norms \(\|\ \|_{A}\) and \(\|\ \|_{B}\) such that \[\sup _{\|\mathbf {x}\|_{A}=1}\|\mathbf {x}\|_{B} =\sup _{\|\mathbf {x}\|_{B}=1}\|\mathbf {x}\|_{A}=\infty .\]
Q 21.7. (i) Suppose that \(f:{\mathbb R}\rightarrow {\mathbb R}\) is a twice differentiable function such that \[\left |\frac {f(x)f''(x)}{f'(x)^{2}}\right | \leq \lambda \] for all \(x\) and some \(|\lambda |<1\) Show that the mapping \[Tx=x-\frac {f(x)}{f'(x)}\] is a contraction mapping and deduce that \(f\) has a unique root \(y\).
(ii) Suppose that \(F:{\mathbb R}\rightarrow {\mathbb R}\) is a twice differentiable function such that \[\left |\frac {F(x)F''(x)}{F'(x)^{2}}\right | \leq \lambda \] for all \(|x|\leq a\) and some \(|\lambda |<1\) and that \(F(0)=0\). Consider the mapping \[Tx=x-\frac {F(x)}{F'(x)}.\] Show that \(T^{n}x\rightarrow 0\).
Suppose that \[\frac {\sup _{|t|\leq a}|f'(t)|\sup _{|t|\leq a}|f''(t)|} {\inf _{|t|\leq a}|f'(t)|^{2}}=M.\] By using the mean value theorem twice, show that if \(|x|\leq a\) then \[|Tx|\leq Mx^{2}.\]
(iii) If you know what the Newton–Raphson method is, comment on the relevance of the results of (i) and (ii) to that method.
Q 21.8. We work in \({\mathbb R}^{m}\) with the usual ‘Euclidean norm’. The following line of thought is extremely important in later work. Suppose we want the solution \({\mathbf x}_{0}\) of \[{\mathbf x}+{\boldsymbol \epsilon }({\mathbf x})= {\mathbf y}\] where \({\boldsymbol \epsilon }({\mathbf x})\) is ‘a small error term’ or ‘of first order compared to \(\mathbf x\). The following method of solution is traditional (for the excellent reason that it usually works like a Spanish charm). Start by ignoring the small \(\boldsymbol \epsilon \) term and guess \({\mathbf x}_{0}\approx {\mathbf x}_{1}={\mathbf y}\). Since this initial guess is good we estimate the small \(\boldsymbol \epsilon \) term as \({\boldsymbol \epsilon }({\mathbf x}_{1})\). Feeding this estimate back into the equation we now guess \({\mathbf x}_{0}\approx {\mathbf x}_{2}={\mathbf y}-{\boldsymbol \epsilon }({\mathbf x}_{1})\). Repeating this process gives us the rule for successive approximations \[\mathbf {x}_{n+1}=\mathbf {y}-{\boldsymbol \epsilon }(\mathbf {x}_{n}).\] We now try to justify this in certain cases.
(i) Suppose that \({\boldsymbol \epsilon }({\mathbf 0})={\mathbf 0}\) and \[\|{\boldsymbol \epsilon }({\mathbf a})- {\boldsymbol \epsilon }({\mathbf b})\| <\|{\mathbf a}-{\mathbf b}\|/2\] for all \(\|\mathbf {a}\|,\|\mathbf {b}\|\leq \delta \) where \(\delta >0\). (Note that these are rather stronger conditions than might have been expected. However, this is not to say that the general idea might not work under weaker conditions.) We shall show that the method works for \(\|{\mathbf y}\|\leq \delta /2\).
Consider the closed ball \(B=\{{\mathbf x}:\|{\mathbf x}\|\leq \delta \}\). Show that the equation \[T({\mathbf x})=\mathbf {y}-{\boldsymbol \epsilon }({\mathbf x})\] defines a map \(T:B\rightarrow B\). Now use the contraction mapping theorem to show that \[{\mathbf x}+{\boldsymbol \epsilon }({\mathbf x})= {\mathbf y}\] has a unique solution \({\mathbf x}(\mathbf {y})\) with \(\|{\mathbf x}(\mathbf {y})\|\leq \delta \) and that \(T^{n}({\mathbf y})\rightarrow {\mathbf x}(\mathbf {y})\) as \(n\rightarrow \infty \).
(ii) Suppose that \(f:{\mathbb R}^{m}\rightarrow {\mathbb R}^{m}\) is a differentiable function with derivative continuous at \(\mathbf 0\), \(f({\mathbf 0})={\mathbf 0}\) and \(Df({\mathbf 0})=I\). Show that we can find a \(\delta >0\) such that whenever \(\|{\mathbf y}\|\leq \delta /2\) the equation \[f({\mathbf x})={\mathbf y}\] has a unique solution \({\mathbf x}(\mathbf {y})\) with \(\|{\mathbf x}(\mathbf {y})\|\leq \delta \).
(iii) Suppose that \(f:{\mathbb R}^{m}\rightarrow {\mathbb R}^{m}\) is a differentiable function with derivative continuous at \(\mathbf 0\), \(f({\mathbf 0})={\mathbf 0}\) and \(Df({\mathbf 0})\) invertible. Show that we can find a \(\delta _{1},\delta _{2}>0\) such that whenever \(\|{\mathbf y}\|\leq \delta _{1}\) the equation \[f({\mathbf x})={\mathbf y}\] has a unique solution \({\mathbf x}(\mathbf {y})\) with \(\|{\mathbf x}(\mathbf {y})\|\leq \delta _{2}\).
Q 21.9. (A method of Abel) (i) Suppose that \(a_{j}\) and \(b_{j}\) are sequences of complex numbers and that \(S_{n}=\sum _{j=1}^{n}a_{j}\) for \(n\geq 1\) and \(S_{0}=0\). Show that, if \(1\leq u\leq v\) then \[\sum _{j=u}^{v}a_{j}b_{j}=\sum _{j=u}^{v}S_{j}(b_{j}-b_{j+1}) -S_{u-1}b_{u}+S_{v}b_{v+1}.\] (This is known as partial summation, for obvious reasons.)
(ii) Suppose now that, in addition, the \(b_{j}\) form a decreasing sequence of positive terms and that \(|S_{n}|\leq K\) for all \(n\). Show that \[\left |\sum _{j=u}^{v}a_{j}b_{j}\right | \leq 2Kb_{u}.\] Deduce that if \(b_{j}\rightarrow 0\) as \(j\rightarrow \infty \) then \(\sum _{j=1}^{\infty }a_{j}b_{j}\) converges.
Deduce the alternating series test.
(iii) If \(b_{j}\) is a decreasing sequence of positive terms with \(b_{j}\rightarrow 0\) as \(j\rightarrow \infty \) show that \(\sum _{j=1}^{\infty }b_{j}z^{j}\) converges uniformly in the region given by \(|z|\leq 1\) and \(|z-1|\geq \epsilon \) for all \(\epsilon >0\).
Q 21.10. We work in \({\mathbb R}^{m}\) with the usual distance.
(i) Show that if \(A_{1}\), \(A_{2}\), are non-empty, closed and bounded with \(A_{1}\supseteq A_{2}\supseteq \dots \) then \(\bigcap _{j=1}^{\infty }A_{j}\) is non empty. Is this result true if we merely assume \(A_{j}\) closed and non-empty? Give reasons.
(ii) If \(A\) is non-empty, closed and bounded show that we can find \({\mathbf a}',{\mathbf b}'\in A\) such that \[\|{\mathbf a}'-{\mathbf b}'\|\geq \|{\mathbf a}-{\mathbf b}\|\] for all \({\mathbf a},{\mathbf b}\in A\). Is this result true if we merely assume \(A\) bounded and non-empty? Give reasons.
Q 21.11. We work in \({\mathbb R}^{m}\) with the usual distance. Let \(E\) be a closed non-empty subset of \({\mathbb R}^{m}\) and let \(T\) be a map \(T:E\rightarrow E\).
(i) Suppose \(\|T({\mathbf a})-T({\mathbf b})\|<\|{\mathbf a}-{\mathbf b}\|\) for all \({\mathbf a},{\mathbf b}\in E\) with \({\mathbf a}\neq {\mathbf b}\). We saw in Example 15.2 that \(T\) need not have a fixed point. Show that, if \(T\) has a fixed point, it is unique.
(ii) Suppose \(\|T({\mathbf a})-T({\mathbf b})\|>\|{\mathbf a}-{\mathbf b}\|\) for all \({\mathbf a},{\mathbf b}\in E\) with \({\mathbf a}\neq {\mathbf b}\). In Question 21.17 it is shown that \(T\) need not have a fixed point. Show that, if \(T\) has a fixed point, it is unique.
(iii) Suppose \(\|T({\mathbf a})-T({\mathbf b})\|=\|{\mathbf a}-{\mathbf b}\|\) for all \({\mathbf a},{\mathbf b}\in E\). Show that \(T\) need not have a fixed point. and that, if \(T\) has a fixed point, it need not be unique.
(iv) Suppose now that \(E\) is non-empty, closed and bounded and \[\|T({\mathbf a})-T({\mathbf b})\|<\|{\mathbf a}-{\mathbf b}\|\] for all \({\mathbf a},{\mathbf b}\in E\) with \({\mathbf a}\neq {\mathbf b}\). By considering \(\inf _{{mathbf x}\in E}\|{\mathbf x}-T({mathbf x})\|\), or otherwise show that \(T\) has a fixed point.
Q 21.12. From time to time numerical analysts mention the spectral radius. It forms no part of any of non-optional 1B courses but the reader may be interested to see what it is.
(i) Give an example of a linear map \(\beta :{\mathbb R}^{m}\rightarrow {\mathbb R}^{m}\) such that \(\beta ^{m-1}\neq {\mathbf 0}\) but \(\beta ^{m}={\mathbf 0}\).
(ii) Let \(\alpha :{\mathbb R}^{m}\rightarrow {\mathbb R}^{m}\) be linear. If \(n=jk+r\) explain why \[\|\alpha ^{jk+r}\|\leq \|\alpha ^{j}\|^{k}\|\alpha \|^{r}.\]
(iii) Continuing with the hypotheses of (ii), show that \(\Delta =\inf _{n}\|\alpha ^{n}\|^{1/n}\) is well defined and, by using the result of (ii), or otherwise, that \[\|\alpha ^{n}\|^{1/n}\rightarrow \Delta .\] We call \(\Delta \) the spectral radius of \(\alpha \) and write \(\rho (\alpha )=\Delta \).
(iv) If \(\alpha :{\mathbb R}^{m}\rightarrow {\mathbb R}^{m}\) is diagonalisable show that \[\rho (\alpha )=\max \{|\lambda |:\lambda \ \text {an eigenvalue of $\alpha $}\}.\]
(v) Give an example of linear maps \(\alpha ,\beta :{\mathbb R}^{2}\rightarrow {\mathbb R}^{2}\) such that \(\rho (\alpha )=\rho (\beta )=0\) but \(\rho (\alpha +\beta )=1\).
(vi) Recalling Question 20.11 of the third sheet, show that if \(\rho (\iota -\gamma )<1\) then \(\gamma \) is invertible.
The remaining questions are included for general interest and not for relevance to the syllabus or to passing Tripos exams.
Q 21.13. We work in \({\mathbb R}^{m}\), \(\|{\mathbf x}-{\mathbf y}\|\) will represent the usual Euclidean distance between \(\mathbf x\) and \(\mathbf y\).
(i) If \(K\) is a closed non-empty bounded set and \(\mathbf x\) is any point, show that there exists a point \({\mathbf k}'\in K\) such that \[\|{\mathbf x}-{\mathbf k}\|\geq \|{\mathbf x}-{\mathbf k}'\|\] for all \({\mathbf k}\in K\). Is \({\mathbf k}'\) necessarily unique?
(ii) If \(E\) is a non-empty closed set and \(\mathbf x\) is any point, show that there exists a point \({\mathbf e}'\in E\) such that \[\|{\mathbf x}-{\mathbf e}\|\geq \|{\mathbf x}-{\mathbf e}'\|\] for all \({\mathbf e}\in E\). We write \(d({\mathbf x},E)=\|{\mathbf x}-{\mathbf e}'\|.\)
(iii) With the notation of (ii) show that \[d({\mathbf x},E)+\|{\mathbf x}-{\mathbf y}\| \geq d({\mathbf y},E).\] Show that \(d(\ ,E):{\mathbb R}^{m}\rightarrow {\mathbb R}\) is continuous.
(iv) If \(E\) is closed and non-empty and \(K\) closed, bounded and non-empty show that there exist \({\mathbf e}'\in E\) and \({\mathbf k}'\in K\) such that \[\|{\mathbf e}-{\mathbf k}\|\geq \|{\mathbf e}'-{\mathbf k}'\|.\] Would this result be true if we only assumed \(E\) and \(K\) closed and non-empty.
Q 21.14. (i) Show that if \(f:{\mathbb R}\rightarrow {\mathbb C}\) is a continuous periodic function with period \(2\pi \) such that \(\hat {f}(n)=0\) for all \(n\) then \[\int _{-\pi }^{\pi }(1-\epsilon _{1}+\epsilon _{2}\cos t)^{N}f(t)\,dt=0\] for all \(N\).
(ii) Show that, given \(\delta >0\) we can find \(\epsilon _{1},\ \epsilon _{2}>0\) and an \(\eta >0\) such that \begin {align*} (1-\epsilon _{1}+\epsilon _{2}\cos t)^{N}\rightarrow \infty &\ \text {uniformly for $|t|\leq \eta $}\\ (1-\epsilon _{1}+\epsilon _{2}\cos t)^{N}\rightarrow 0&\ \text {uniformly for $\delta \leq |t|\leq \pi $} \end {align*}
as \(N\rightarrow \infty \).
(iii) Show that if \(f:{\mathbb R}\rightarrow {\mathbb C}\) is a continuous periodic function with period \(2\pi \) such that \(f(0)\) is real and \(f(0)>0\) we can find \(\epsilon _{1},\ \epsilon _{2}>0\) such that \[\Re \int _{-\pi }^{\pi }(1-\epsilon _{1}+\epsilon _{2}\cos t)^{N}f(t)\,dt \rightarrow \infty \] as \(N\rightarrow \infty \).
(iv) Show that if \(f:{\mathbb R}\rightarrow {\mathbb C}\) is a continuous periodic function with period \(2\pi \) such that \(\hat {f}(n)=0\) for all \(n\) then \(f(t)=0\) for all \(t\).
(v) By using Question 21.5, or otherwise, show that if if \(F:{\mathbb R}\rightarrow {\mathbb C}\) is a continuous periodic function with period \(2\pi \) such that \(\sum |\hat {F}(n)|\) converges then \[F(t)=\sum _{n=-\infty }^{\infty }\hat {F}(n)\exp (int).\]
Q 21.15. (i) Suppose that \(g:{\mathbb R}\rightarrow {\mathbb C}\) is a continuous periodic function with period \(2\pi \). Show that \[\frac {1}{2\pi }\int _{0}^{2\pi } \left |g(t)-\sum _{r=-N}^{N}\hat {g}(r)\exp (irt)\right |^{2} \,dt =\frac {1}{2\pi }\int _{0}^{2\pi }|g(t)|^{2}\,dt -\sum _{r=-N}^{N}|\hat {g}(r)|^{2}.\] Deduce that \[\sum _{r=-N}^{N}|\hat {g}(r)|^{2} \leq \frac {1}{2\pi }\int _{0}^{2\pi }|g(t)|^{2}\,dt.\] (This is a version of Bessel’s inequality.)
(ii) Now suppose that \(f:{\mathbb R}\rightarrow {\mathbb C}\) is a continuously differentiable periodic function with period \(2\pi \). If \(f\) has derivative \(g\), obtain a simple relation between \(\hat {f}(n)\) and \(\hat {g}(n)\). By applying the Cauchy-Schwarz inequality to \(\sum _{n\neq 0,\ |n|\leq N} |n||\hat {g}(n)|\) show that \[\sum _{n\neq 0,\ |n|\leq N} |\hat {f}(n)| \leq A \frac {1}{2\pi }\int _{0}^{2\pi }|g(t)|^{2}\,dt,\] for some constant \(A\). Conclude that \(\sum |\hat {f}(n)|\) converges.
(iii) Deduce, using Question 21.14, that if \(f:{\mathbb R}\rightarrow {\mathbb C}\) is a continuously differentiable periodic function with period \(2\pi \) then \[f(t)=\sum _{n=-\infty }^{\infty }\hat {f}(n)\exp (int).\]
Q 21.16. We continue with the ideas of Question 21.13. From now on all sets will belong to the collection \(\mathcal {K}\) of closed bounded non-empty subsets of \({\mathbb R}^{m}\). Define \[d(E,F)=\sup _{e\in E}d(e,F)+\sup _{f\in F}d(f,E).\] Show that \(d\) is a metric on \(\mathcal {K}\).
Show also that \(d\) is a complete metric.
Q 21.17. If \((X,d)\) is a complete metric space and \(T:X\rightarrow X\) is a surjective map such that \[d(Tx,Ty)\geq Kd(x,y)\] for all \(x,y\in X\) and some \(K>1\) show that \(T\) has a unique fixed point.
By considering the map \(T:{\mathbb R}\rightarrow {\mathbb R}\) defined by \(T(x)=1+4n+2x\) for \(0\leq x<1\) and \(n\) an integer, or otherwise show that the condition \(T\) surjective can not be dropped.
Q 21.18 (A continuous nowhere differentiable function). The following construction is due to Van der Waerden.
(i) Sketch the graph of the function \(g\) given by the condition \[g(x+k)=|x|\ \ \ \ \mbox {if $k$ is any integer and $-1/2<x\leq 1/2$.}\]
(ii) Let \(g_{n}(t)=2^{-n}g(8^{n}t)\). Sketch the graphs of \(g_{1}\), \(g_{2}\) and \(g_{3}\).
(iii) Let \(f_{n}(t)=g_{1}(t)+g_{2}(t)+\ldots +g_{n}(t)\). Sketch the graphs of \(f_{1}\), \(f_{2}\) and \(f_{3}\).
(iv) Show that \(f_{n}\) converges uniformly to a continuous function \(f\).
(v) If \(n\) is a positive integer and \(r\) an odd integer give an expression for \(f(r2^{-n})\). Show that when \(n\) is large then \(2^{n}|f((r+2)2^{-n})-f(r2^{-n})|\) is large. Conclude that if \(r2^{-n}<t<(r+2)2^{-n}\) then \[\max \left (\left |\frac {f(t)-f(r2^{-n})}{t-r2^{-n}}\right |, \left |\frac {f((r+2)2^{-n})-f(t)}{(r+2)2^{-n}-t}\right |\right ).\]
(vi) By developing the ideas of part (v) show that \(f\) is nowhere differentiable.
Final Note To Supervisors
Let me reiterate my request for corrections and improvements particularly to the exercises. The easiest path for me is e-mail. My e-mail address is twk@dpmms. I have to express my gratitude to Drs Barden and Pinch for finding a multitude of errors but I am sure that once the first veil of error has been lifted a further veil of deeper errors will appear.
1Footnote for passing historians, this is a course in mathematics.
2Indeed, anyone who works through the exercises in Hardy as a first course and the exercises in Whittaker and Watson’s even older A Course of Modern Analysis [11] as a second will have had a splendid education in analysis.
3Of course economists know this but few of their public pronouncements seem to mention it.
4In [2] Boas reports the story of a friend visiting the Princeton common room ‘…where Einstein was talking to another man, who would shake his head and stop him; Einstein then thought for a while, then started talking again; was stopped again; and so on. After a while, …my friend was introduced to Einstein. He asked Einstein who the other man was. “Oh,” said Einstein, “that’s my mathematician.” ’
5I know of two universities which teach undergraduates Lebesgue theory in their first year but of no university where undergraduates learn Lebesgue theory in their first year.
6‘Shut up!’ he explained.
7There is an appropriate theory for objects with order (lattices) but we shall not pursue it here.
8Or according to a rival school of thought the ‘\(3\epsilon \) Theorem’.
9Most of the new words in this last paragraph are well worth dropping.
You must lie among the daisies and discourse in novel phrases of your
complicated state of mind,
The meaning doesn’t matter if it’s only idle chatter of a transcendental
kind.
And everyone will say,
As you walk your mystic way,
‘If this young man expresses himself in terms too deep for me
Why, what a very singularly deep young man this deep young man must
be!’
10Boas notes that ‘There is a test for identifying some of the future professional mathematicians at an early age. These are students who instantly comprehend a sentence beginning “Let \(X\) be an ordered quintuple \((a,T,\pi ,\sigma ,{\mathcal B})\) where …’. They are even more promising if they add, ‘I never really understood it before.” ’ ([2] page231.)
11Changing those things that need to be changed.