In mathematical analysis, points of the plane are associated with ordered pairs of real numbers, and the plane itself is associated with the set $\mathbb {R}\times \mathbb {R}=\mathbb {R}^2$. We will proceed analogously in representing three-dimensional space. The coordinate system in three-dimensional space can be described as follows. We consider three lines in space intersecting at a point that are mutually perpendicular, which we call the x-, y-, and z-axes. We call the plane spanned by the x- and y-axes the xy-plane, and we have similar definitions for the xz- and yz-planes. We assign an ordered triple (a, b, c) to every point P in space, in which a, b, and c denote the distance (with positive or negative sign) of the point from the yz-, xz-, and xy-planes, respectively. We call the numbers a, b, and c the coordinates of
P. The geometric properties of space imply that the map $P\mapsto (a,b, c)$ that we obtain in this way is a bijection. This justifies our representation of three-dimensional space by ordered triples of real numbers.

Thus if we want to deal with questions both in the plane and in space, we need to deal with sets that consist of ordered p-tuples of real numbers, where $p=2$ or $p=3$. We will see that the specific value of p does not usually play a role in the definitions and proofs that arise. Therefore, for every positive integer p we can define p-dimensional Euclidean space,
by which we simply mean the set of all sequences of real numbers of length p, with the appropriately defined addition, multiplication by a constant, absolute value, and distance. If $p=1$, then this Euclidean space is just the real line; if $p=2$, then it is the plane; and if $p=3$, then it is 3-dimensional space. For $p>3$, p-dimensional space has no directly visualizable meaning, but it is very important for both theory and applications.

Definition 1.1.

$\mathbb {R}^p$ denotes the set of ordered p-tuples of real numbers, that is,

$$ {\mathbb {R}^{p}}=\{ (x_1 ,\ldots , x_p ):x_1 ,\ldots , x_p \in \mathbb {R}\} . $$

The points of the set ${\mathbb {R}^{p}}$ are sometimes called p-dimensional vectors. The sum of the vectors $x=(x_1 ,\ldots , x_p )$ and $y=(y_1 ,\ldots , y_p )$ is the vector

$$ x+y =(x_1 +y_1 ,\ldots , x_p +y_p ) , $$

and the product of the vector x and a real number c is the vector

$$ c\cdot x=(cx_1 ,\ldots , cx_p ) . $$

The absolute value of the vector x is the nonnegative real number

$$ |x|=\sqrt{x_1^2 +\dots +x_p^2} . $$

(The absolute value of the vector x is also called the norm of the vector x.
In order to be consistent with the usage of [7], we will use the term absolute value.)

It is clear that for all $x\in {\mathbb {R}^{p}}$ and $c\in \mathbb {R}$ we have $|cx|=|c|\cdot |x|$. It is also easy to see that if $x=(x_1 ,\ldots , x_p )$, then

$$\begin{aligned} |x|\le |x_1 |+\dots +|x_p |. \end{aligned}$$

(1.1)

The triangle inequality also holds:

$$\begin{aligned} |x+y|\le |x|+|y| \qquad (x, y\in {\mathbb {R}^{p}}). \end{aligned}$$

(1.2)

To prove this it suffices to show that $|x+y|^2 \le (|x|+|y|)^2$, since both sides are nonnegative. By the definition of the absolute value this is exactly

$$\begin{aligned} (x_1&+y_1 )^2 + \dots + (x_p +y_p )^2 \le \\&(x_1^2 +\dots +x_p^2 ) + 2\cdot \sqrt{x_1^2 +\dots +x_p^2}\cdot \sqrt{y_1^2 +\dots +y_p^2}+ y_1^2 +\dots +y_p^2 , \end{aligned}$$

that is,

$$ x_1 y_1 +\dots +x_p y_p \le \sqrt{x_1^2 +\dots +x_p^2} \cdot \sqrt{y_1^2 +\dots +y_p^2}, $$

which is the Cauchy–Schwarz–Bunyakovsky inequality (see [7, Theorem 11.19]).
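As a quick numerical sanity check of (1.1), (1.2), and the Cauchy–Schwarz–Bunyakovsky inequality (a supplementary sketch, not part of the text's argument; the helper name `norm` and the tolerance `1e-9` are illustrative choices):

```python
import math
import random

def norm(x):
    # absolute value (Euclidean norm) of a vector in R^p
    return math.sqrt(sum(t * t for t in x))

random.seed(1)
p = 5
for _ in range(1000):
    x = [random.uniform(-10, 10) for _ in range(p)]
    y = [random.uniform(-10, 10) for _ in range(p)]
    dot = sum(a * b for a, b in zip(x, y))
    # Cauchy–Schwarz–Bunyakovsky inequality
    assert dot <= norm(x) * norm(y) + 1e-9
    # triangle inequality (1.2)
    assert norm([a + b for a, b in zip(x, y)]) <= norm(x) + norm(y) + 1e-9
    # inequality (1.1)
    assert norm(x) <= sum(abs(t) for t in x) + 1e-9
```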

The distance between the vectors x and y is the number ${|x-y|}$. By (1.2) it is clear that

$$ \big ||x|-|y|\big |\le |x-y| \qquad \text{ and } \qquad |x-y| \le |x-z|+|z-y| $$

for all $x,y, z\in {\mathbb {R}^{p}}$. We can consider these to be variants of the triangle inequality.

If we apply (1.1) to the difference of the vectors $x=(x_1 ,\ldots , x_p )$ and $y=(y_1 ,\ldots , y_p )$, then we get that

$$\begin{aligned} ||x|-|y||\le |x-y|\le |x_1 -y_1 |+\dots +|x_p -y_p |. \end{aligned}$$

(1.3)

The scalar product of the vectors $x=(x_1,\ldots , x_p)$ and $y=(y_1,\ldots , y_p)$ is the real number $\sum _{i=1}^p x_iy_i$, which we denote by $\langle x, y\rangle $. One can prove that if $x\ne 0$ and $y\ne 0$, then $\langle x, y\rangle =|x|\cdot |y| \cdot \cos \alpha $, where $\alpha $ denotes the angle enclosed by the two vectors. (For $p=2$ see [7, Remark 14.57].) We say that the vectors $x, y\in {\mathbb {R}^{p}}$ are orthogonal if $\langle x, y\rangle =0$.
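A small illustration of the scalar product, the angle formula, and orthogonality (supplementary; the vectors chosen here are arbitrary examples, not from the text):

```python
import math

def dot(x, y):
    # scalar product <x, y> = sum of x_i * y_i
    return sum(a * b for a, b in zip(x, y))

def norm(x):
    return math.sqrt(dot(x, x))

x = (1.0, 0.0, 1.0)
y = (0.0, 2.0, 0.0)
# <x, y> = 0, so x and y are orthogonal
assert dot(x, y) == 0.0

u = (1.0, 1.0)
v = (1.0, 0.0)
# angle enclosed by u and v: cos(alpha) = <u, v> / (|u| * |v|)
alpha = math.acos(dot(u, v) / (norm(u) * norm(v)))
assert abs(alpha - math.pi / 4) < 1e-12
```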

We say that f is a p-variable real function if $D(f)\subset {\mathbb {R}^{p}}$ and $R(f)\subset \mathbb {R}$. (Recall that D(f) denotes the domain and R(f) denotes the range of the function f.)

Similarly to the case of single-variable functions, multivariable functions are best illustrated by their graphs. The graph of a function $f:H\rightarrow \mathbb {R}$ is the set of pairs (u, f(u)), where $u\in H$. If $H\subset {\mathbb {R}^{p}}$, then $\mathrm{graph}~f \subset {\mathbb {R}^{p}}\times \mathbb {R}$; in other words, $\mathrm{graph}~f$ is the set of pairs $((x_1 ,\ldots ,x_p ), x_{p+1} )$, where $(x_1 ,\ldots , x_p ) \in H$ and $x_{p+1}=f(x_1 ,\ldots , x_p )$. In this case it is useful to “identify” ${\mathbb {R}^{p}}\times \mathbb {R}$ with the set $\mathbb {R}^{p+1}$ in the sense that instead of the pair $((x_1 ,\ldots ,x_p ), x_{p+1} )$, we consider the vector $(x_1 ,\ldots ,x_p , x_{p+1} ) \in \mathbb {R}^{p+1}$. From now on, if $f:H\rightarrow \mathbb {R}$, where $H\subset {\mathbb {R}^{p}}$, then by the graph of f we mean the set

$$ \mathrm{graph}~f =\{ (x_1 ,\ldots ,x_p , x_{p+1} ):(x_1 ,\ldots , x_p ) \in H\ \text {and}\ x_{p+1}=f(x_1 ,\ldots , x_p ) \} . $$

For example, if $f:H\rightarrow \mathbb {R}$, where $H\subset \mathbb {R}^2$, then $\mathrm{graph}~f\subset \mathbb {R}^3 $. Just as we can visualize the graph of a function as a curve in the plane in the $p=1$ case, we can also visualize the graph of a function as a surface in three-dimensional space in the $p=2$ case.

Aside from using the usual coordinate notation $(x_1 , x_2)$ and $(x_1 , x_2 , x_3 )$, we will also use the traditional notation (x, y) and (x, y, z) in the $p=2$ and $p=3$ cases, respectively.

Example 1.2.

1. The graph of the constant function $f(x, y)=c$ is a horizontal plane (in other words, it is parallel to the xy-plane). (See Figure 1.1.)

2. The graph of the function $f(x, y)=x^2$ is an infinite trough-shaped surface, whose intersections with the planes orthogonal to the y-axis are parabolas. (See Figure 1.2.)

We may ask whether multivariable analysis is “more difficult” or more complicated than its single-variable counterpart. The answer is twofold. On the one hand, the answer is that it is not harder at all, since it makes no difference whether we define our mappings on subsets of $\mathbb {R}$ or on subsets of ${\mathbb {R}^{p}}$. On the other hand, the answer is “to a great extent,” since we have “much more room” in a multidimensional space; that is, the relative positions of points in space can be much more complicated than their relative positions on a line. On the real line, a point can be to the left or to the right of another point, and there is no other option.

There is truth to both answers. While it is true that the relative positions of points can be much more complicated in a multidimensional space, this complication mostly falls in the topics of geometry and topology. For a good portion of our studies of multivariable analysis we can follow the guideline that more variables only complicate the notation but not the ideas themselves. We will warn the reader when this guideline is no longer applicable.

Definition 1.3.

We say that a sequence $(x_n)$ of the points $x_n \in {\mathbb {R}^{p}}$ converges to a point $a\in {\mathbb {R}^{p}}$ if for every $\varepsilon >0$ there exists $n_0$ such that ${|x_n -a|<\varepsilon }$ holds for every $n>n_0$. We denote this fact by $\lim _{n\rightarrow \infty } x_n =a$ or simply by $x_n \rightarrow a$.
We say that the sequence of points $(x_n )$ is convergent if there exists an $a\in {\mathbb {R}^{p}}$ to which it converges. In this case we say that a is the limit of the sequence $(x_n )$. If a sequence of points is not convergent, then it is divergent.

We denote by B(a, r) the open ball centered at a with radius r: $B(a, r)=\{ x\in {\mathbb {R}^{p}}:|x-a|\lt r\} $.
Note that if $p=1$, then B(a, r) is the open interval $(a-r, a+r)$, and if $p=2$, then B(a, r) is the open disk with center a and radius r.

Theorem 1.4.

The following statements are equivalent:

- (i)$x_n \rightarrow a$.
- (ii)For every $\varepsilon >0$ there are only finitely many points of the sequence $(x_n)$ that fall outside of the open ball $B(a,\varepsilon )$.
- (iii)$|x_n -a|\rightarrow 0$.

Proof.

The implication (i)$\Rightarrow $(ii) is clear from the definition of $x_n \rightarrow a$.

Suppose (ii), and let $\varepsilon >0$ be given. Then there is an $n_0$ such that ${|x_n -a|<\varepsilon }$ holds for every $n>n_0$. By the definition of the convergence of sequences of real numbers, this means that $|x_n -a|\rightarrow 0$; that is, (iii) holds.

Now suppose (iii), and let $\varepsilon >0$ be given. Then there is an $n_0$ such that ${|x_n -a|<\varepsilon }$ holds for every $n>n_0$. By the definition of the convergence of sequences of points of ${\mathbb {R}^{p}}$, this means that $x_n \rightarrow a$; that is, (i) holds. $\square $

The following theorem states that the convergence of a sequence of points is equivalent to the convergence of the sequences of their coordinates.

Theorem 1.5.

Let $x_n =(x_{n, 1} ,\ldots ,x_{n, p} )\in {\mathbb {R}^{p}}$ for every $n=1,2,\ldots $, and let $a=(a_1 ,\ldots , a_p )$. The sequence $(x_n )$ converges to a if and only if $\lim _{n\rightarrow \infty } x_{n, i} =a_i$ for every $i=1,\ldots , p$.

Proof.

Suppose $x_n \rightarrow a$. Since $0\le |x_{n, i} -a_i |\le |x_n -a|$ for every $i=1,\ldots , p$ and $|x_n -a| \rightarrow 0$, we have that $|x_{n, i} -a_i |\rightarrow 0$ follows from the squeeze theorem (see [7, Theorem 5.7]).

On the other hand, if $|x_{n, i} -a_i |\rightarrow 0$ for every $i=1,\ldots , p$, then the inequality

$$|x_n -a| \le \sum _{i=1}^p|x_{n, i} -a_i |$$

and the repeated use of the squeeze theorem give us ${x_n \rightarrow a}$. $\square $
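The two inequalities used in this proof can be checked numerically on a concrete sequence (an illustrative sketch; the sequence $x_n=(1/n,\,1-2^{-n})\rightarrow a=(0,1)$ is my own example, not from the text):

```python
import math

# x_n = (1/n, 1 - 2^(-n)) converges coordinatewise to a = (0, 1)
a = (0.0, 1.0)

def x(n):
    return (1.0 / n, 1.0 - 2.0 ** (-n))

def dist(u, v):
    # Euclidean distance |u - v|
    return math.sqrt(sum((s - t) ** 2 for s, t in zip(u, v)))

for n in (10, 100, 1000):
    d = dist(x(n), a)
    # each coordinate error is at most |x_n - a| ...
    assert all(abs(x(n)[i] - a[i]) <= d + 1e-15 for i in range(2))
    # ... and |x_n - a| is at most the sum of the coordinate errors
    assert d <= sum(abs(x(n)[i] - a[i]) for i in range(2)) + 1e-15
```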

We can generalize several theorems for sequences of real numbers to sequences of points of ${\mathbb {R}^{p}}$ with the help of the above theorem. The proofs of the next two theorems (which are left to the reader) are just applications of the respective theorems for sequences of real numbers to sequences of coordinates of a point-sequence.

Theorem 1.6.

- (i)If a sequence of points is convergent, then the deletion of finitely many of its terms, addition of finitely many new terms, or the reordering of its terms affect neither the convergence of the sequence nor the value of its limit.
- (ii)If a sequence of points is convergent, then its limit is unique.
- (iii)If a sequence of points converges to a, then each of its subsequences also converges to a.

$\square $

Theorem 1.7.

If $x_n \rightarrow a$ and $y_n \rightarrow b$, then $x_n +y_n \rightarrow a+b$ and $c\cdot x_n \rightarrow c \cdot a$, for every $c\in \mathbb {R}$. $\square $

Theorem 1.8.

(Cauchy’s criterion)
A sequence of points $(x_n )$ is convergent if and only if for every $\varepsilon >0$ there exists an index N such that $|x_n -x_m |<\varepsilon $ for every $n, m\ge N$.

Proof.

If $|x_n -a|<\varepsilon $ for every $n\ge N$, then $|x_n -x_m |<2\varepsilon $ for every $n, m\ge N$. This proves the “only if” direction of our statement.

Let $\varepsilon >0$ be given, and suppose that $|x_n -x_m |<\varepsilon $ for every $n, m\ge N$. If $x_n =(x_{n, 1} ,\ldots ,x_{n, p} )$ $(n=1,2,\ldots )$, then for every $i=1,\ldots , p$ and $n, m\ge N$ we have

$$ |x_{n,i} -x_{m, i} |\le |x_n -x_m |\lt\varepsilon . $$

This means that for every fixed $i=1,\ldots , p$ the sequence $(x_{n, i} )$ satisfies Cauchy’s criterion (for real sequences), and thus it is convergent. Therefore, $(x_n )$ is convergent by Theorem 1.5. $\square $

We say that a set $A\subset {\mathbb {R}^{p}}$ is bounded if there exists a box $[a_1 , b_1 ]\times \ldots \times [a_p , b_p ]$ that covers (contains) it. It is obvious that a set A is bounded if and only if the set of the ith coordinates of its points is bounded in $\mathbb {R}$, for every $i=1,\ldots , p$ (see Exercise 1.1).

A sequence of points $(x_n )$ is bounded if the set of its terms is bounded.

Theorem 1.9.

(Bolzano–Weierstrass theorem) Every bounded sequence of points in ${\mathbb {R}^{p}}$ has a convergent subsequence.

Proof.

Let us assume that the sequence of points $(x_n )$ is bounded, and let $x_n =(x_{n, 1} ,\ldots ,x_{n, p} )$ $(n=1,2,\ldots )$. The sequence of the ith coordinates $(x_{n, i} )$ is bounded for every $i=1,\ldots , p$. Based on the Bolzano–Weierstrass theorem for real sequences (see [7, Theorem 6.9]), we can choose a convergent subsequence $(x_{{n_k} , 1} )$ from $(x_{n, 1} )$. The sequence $(x_{{n_k} , 2} )$ is bounded, since it is a subsequence of the bounded sequence $(x_{n, 2})$. Thus, we can choose a convergent subsequence $(x_{n_{k_l} , 2})$ of $(x_{{n_k} , 2} )$. If $p\ge 3$, then $(x_{n_{k_l} , 3})$ is bounded, since it is a subsequence of the sequence $(x_{n, 3} )$. Therefore, we can choose another convergent subsequence again. Repeating the process p times yields a subsequence $(m_j)$ of the indices for which the ith coordinate sequence of $(x_{m_j})$ is convergent for every $i=1,\ldots , p$. Thus, by Theorem 1.5, the subsequence $(x_{m_j})$ is convergent. $\square $
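The iterated extraction in this proof can be mirrored on a concrete bounded sequence (an illustrative sketch; the sequence $x_n=((-1)^n,\,\cos(n\pi/2))$ is my own example):

```python
import math

# a bounded sequence in R^2: x_n = ((-1)^n, cos(n*pi/2))
def x(n):
    return ((-1.0) ** n, math.cos(n * math.pi / 2))

# Step 1 of the proof: along the even indices, the first coordinate
# sequence converges (it is constantly 1).
# Step 2: among those, along the indices divisible by 4, the second
# coordinate sequence converges as well (it is constantly 1), so the
# subsequence (x_{4k}) converges to (1, 1) by Theorem 1.5.
sub = [x(n) for n in range(0, 40, 4)]
assert all(abs(u - 1.0) < 1e-9 and abs(v - 1.0) < 1e-9 for u, v in sub)
```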

Exercises

1.1.

Prove that for every set $A\subset {\mathbb {R}^{p}}$, the following statements are equivalent.

- (a)The set A is bounded.
- (b)There exists an $r>0$ such that $A\subset B(0,r)$.
- (c)For all $i=1,\ldots , p$ the ith coordinates of the points of A form a bounded set in $\mathbb {R}$.

1.2.

Show that

- (a)if $x_n \rightarrow a$, then $|x_n |\rightarrow |a|$;
- (b)if $x_n \rightarrow a$ and $y_n \rightarrow b$, then $\langle x_n ,y_n \rangle \rightarrow \langle a, b\rangle $.

(Here $x_n , y_n \in {\mathbb {R}^{p}}$ and $\langle x_n , y_n \rangle $ is the scalar product of $x_n$ and $y_n$.)

1.3.

Show that a sequence of points $x_n \in {\mathbb {R}^{p}}$ has no convergent subsequence if and only if $|x_n |\rightarrow \infty $.

1.4.

Show that if every subsequence of $(x_n )$ has a convergent subsequence converging to a, then $x_n \rightarrow a $.

1.5.

Show that if $x_n \in {\mathbb {R}^{p}}$ and $|x_{n+1} -x_n |\le 2^{-n}$ for every n, then $(x_n )$ is convergent.

1.6.

Let $x_0 =(0,0)$, $x_{n+1} =x_n +(2^{-n} , 0)$ if n is even, and $x_{n+1} =x_n +(0,2^{-n})$ if n is odd. Show that $(x_n )$ is convergent. What is its limit?

1.7.

Construct a sequence $x_n \in \mathbb {R}^2$ that, for every $x\in \mathbb {R}^2$, has a subsequence converging to x.

In order to describe the basic properties of subsets of the space ${\mathbb {R}^{p}}$, we need to introduce a few notions. We define some of these by generalizing the corresponding notions from the case $p=1$ to an arbitrary p. Since we do not exclude the $p=1$ case from our definitions, everything we say below holds for the real line as well.

First, we generalize the notion of neighborhoods of points. The neighborhoods of a point $a\in {\mathbb {R}^{p}}$ are the open balls B(a, r), where r is an arbitrary positive real number.

By fixing an arbitrary set $A\subset {\mathbb {R}^{p}}$, we can divide the points of ${\mathbb {R}^{p}}$ into three classes.

The first class consists of the points that have a neighborhood that is a subset of A. We call these points the interior points of A, and denote the set of all interior points of A by $\mathrm{int}~A$. That is,

$$ \mathrm{int}~A=\{ x\in {\mathbb {R}^{p}}:\exists \ r>0,\ B(x, r)\subset A\} . $$

The second class consists of those points that have a neighborhood that is disjoint from A. We call these points the exterior points of A, and denote the set of all exterior points of A by $\mathrm{ext}~A$. That is,

$$ \mathrm{ext}~A=\{ x\in {\mathbb {R}^{p}}:\exists \ r>0,\ B(x, r)\cap A =\emptyset \} . $$

The third class consists of the points that do not belong to either of the first two classes. We call these points the boundary points of A. In other words, a point x is a boundary point of A if every neighborhood of x has a nonempty intersection with both A and the complement of A. We denote the set of all boundary points of A by $\partial A$. That is,

$$ \partial A=\{ x\in {\mathbb {R}^{p}}:\forall \ r>0, \ B(x,r)\cap A \ne \emptyset \ \text {and} \ B(x, r)\setminus A \ne \emptyset \} . $$

It is easy to see that $\mathrm{ext}~A=\mathrm{int}~({\mathbb {R}^{p}}\setminus A)$, $\mathrm{int}~A=\mathrm{ext}~({\mathbb {R}^{p}}\setminus A)$, and $\partial A=\partial ({\mathbb {R}^{p}}\setminus A)$ hold for every set $A\subset {\mathbb {R}^{p}}$.

Example 1.10.

1.a. Every point of the open ball B(a, r) is an interior point. Indeed, if $x\in B(a, r)$, then $|x-a|\lt r$. Let $\delta =r-|x-a|$. Now $\delta >0$ and $B(x,\delta )\subset B(a, r)$, since $y\in B(x,\delta )$ implies $|y-x|<\delta $, and thus

$$ |y-a|\le |y-x|+ |x-a|\lt \delta +|x-a|=r, $$

i.e., $y\in B(a, r)$.

1.b. If $|x-a|\gt r$, then x is an exterior point of the open ball B(a, r). Indeed, $\eta ={|x-a|-r}>0$ and $B(x,\eta )\cap B(a, r)=\emptyset $, since if $y\in B(x,\eta )$, then $|y-x|<\eta $ and

$$ |y-a|\ge |x-a|-|y-x| >|x-a|-\eta =r. $$

1.c. We now prove that the boundary of B(a, r) is the set $S(a, r)=\{ x\in {\mathbb {R}^{p}}: |x-a|=r\}$ (Figure 1.6). (In the case $p=1$, the set S(a, r) consists of the points $a-r$ and $a+r$, while in the case $p=2$ the set S(a, r) is the circle with center a and radius r. In the case $p=3$, S(a, r) is the surface of the ball with center a and radius r.)

Indeed, if $x\in S(a, r)$, then $x\notin B(a, r)$; therefore, every neighborhood of x has nonempty intersection with the complement of B(a, r). We show that every neighborhood of x also has nonempty intersection with B(a, r). Intuitively, it is clear that for every $\varepsilon >0$, the open ball $B(x,\varepsilon )$ contains those points of the segment connecting a and x that are close enough to x.

To formalize this idea, it is enough to show that for a well-chosen $\eta \in (0,1)$ we have $x-t(x-a)\in B(a, r)\cap B(x,\varepsilon )$ if $t\in (0,\eta )$. Since

$$ |(x-t(x-a))-a|=(1-t)\cdot |x-a| =(1-t)\cdot r\lt r, $$

it follows that $x-t(x-a)\in B(a, r)$. On the other hand,

$$ |(x-t(x-a))-x|=t\cdot |x-a|<\eta \cdot r <\varepsilon $$

for $\eta < \varepsilon /r$, and then $x-t(x-a)\in B(x,\varepsilon )$ also holds for every ${t\in (0,\eta )}$.
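The trichotomy interior/exterior/boundary established in 1.a–1.c for an open ball can be phrased as a tiny classifier (a supplementary sketch; the function name `classify` is an illustrative choice):

```python
import math

def classify(x, a, r):
    # interior / exterior / boundary of the open ball B(a, r),
    # following Example 1.10: compare |x - a| with r
    d = math.dist(x, a)
    if d < r:
        return "interior"
    if d > r:
        return "exterior"
    return "boundary"

a = (0.0, 0.0)
assert classify((0.5, 0.0), a, 1.0) == "interior"
assert classify((2.0, 0.0), a, 1.0) == "exterior"
assert classify((1.0, 0.0), a, 1.0) == "boundary"
```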

2. By an axis-parallel rectangle in ${\mathbb {R}^{p}}$, or just a rectangle or a box for short, we will mean a set of the form

$$\begin{aligned} R=[a_1 , b_1 ]\times \ldots \times [a_p , b_p ] , \end{aligned}$$

(1.4)

where $a_i\lt b_i$ for every $i=1,\ldots , p$. The boxes in the Euclidean spaces $\mathbb {R}$, $\mathbb {R}^2$, and $\mathbb {R}^3$ are the nondegenerate and bounded closed intervals, the axis-parallel rectangles, and the rectangular boxes, respectively.

The interior of the box is the open box

$$\begin{aligned} (a_1 , b_1 )\times \ldots \times (a_p , b_p ). \end{aligned}$$

(1.5)

For every point $x=(x_1 ,\ldots , x_p )$ of this open box, we have $a_i\lt x_i\lt b_i$ for every $i=1,\ldots , p$. If $\delta >0$ is small enough, then

$$\begin{aligned} a_i\lt x_i -\delta\lt x_i\lt x_i +\delta\lt b_i \end{aligned}$$

(1.6)

for every $i=1,\ldots , p$. Then $B(x,\delta )\subset R$, since $y=(y_1 ,\ldots , y_p )\in B(x,\delta )$ implies $|y-x|<\delta $, which gives ${|y_i -x_i |<\delta }$ for every i, and thus, by (1.6), $a_i\lt y_i\lt b_i$ for every i.

If the point $x=(x_1 ,\ldots , x_p )$ is not in the open box defined in (1.5), then x is not an interior point of R. Indeed, if there exists an i such that $x_i\lt a_i$ or $x_i \gt b_i $, then we can find an appropriate neighborhood of x that is disjoint from R. Therefore, in this case x is an exterior point. On the other hand, if $x\in R$ and there exists i such that $x_i =a_i$ or $x_i =b_i $, then every neighborhood of x intersects both R and its complement, and thus x is a boundary point of R (Figure 1.7).

3. Let $\mathbb {Q}^p$ be the set of those points $x \in {\mathbb {R}^{p}}$ for which every coordinate of x is rational. We show that

$$\mathrm{int}~\mathbb {Q}^p =\mathrm{ext}~\mathbb {Q}^p =\emptyset .$$

First, we prove that $\mathbb {Q}^p$ intersects every box in ${\mathbb {R}^{p}}$. Indeed, we know that the set of rational numbers is everywhere dense; i.e., there are rational numbers in every interval. (See [7, Theorem 3.2].) If R is the box defined in (1.4) and $x_i \in [a_i , b_i ] \cap \mathbb {Q}$ for every $i=1,\ldots , p$, then the point $x=(x_1 ,\ldots , x_p )$ is an element of both $\mathbb {Q}^p$ and R. Thus, $\mathbb {Q}^p$ intersects every box. From this it follows that $\mathbb {Q}^p$ intersects every ball. This is true, since every ball contains a box: if $a=(a_1 ,\ldots , a_p )$ and $r>0$, then for every $\eta \lt r/p$,

$$\begin{aligned}{}[a_1 -\eta , a_1 +\eta ]\times \ldots \times [a_p -\eta ,a_p +\eta ]\subset B(a, r). \end{aligned}$$

(1.7)

Indeed, if $x=(x_1 ,\ldots , x_p )$ is an element of the left-hand side of (1.7), then $|x_i -a_i |\le \eta $ for each i, and thus

$$|x-a| \le \sum _{i=1}^p|x_i -a_i | \le p\eta \lt r,$$

and $x\in B(a, r)$. We have proved that $\mathbb {Q}^p$ intersects every ball, and thus $\mathrm{ext}~\mathbb {Q}^p =\emptyset $.

Now we prove that each ball B(a, r) has a point that is not an element of $\mathbb {Q}^p$. We need to find a point in B(a, r) that has at least one irrational coordinate. We can, however, go further and find a point that has only irrational coordinates. We know that the set of irrational numbers is also everywhere dense (see [7, Theorem 3.12]). Thus we can repeat the same steps as above, and then $\mathrm{int}~\mathbb {Q}^p =\emptyset $ follows.

In the end we get that $\mathbb {Q}^p$ has neither interior nor exterior points, i.e., every point $x\in {\mathbb {R}^{p}}$ is a boundary point of $\mathbb {Q}^p$.
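The density argument can be made constructive (a supplementary sketch; the helper name `rational_point_in_ball` and the choice $\eta = r/(p+1)$ are my own, using Python's `Fraction.limit_denominator`):

```python
from fractions import Fraction
import math

def rational_point_in_ball(a, r):
    # find a point of Q^p inside B(a, r): approximate each coordinate by a
    # rational with error < eta, where p * eta < r (cf. inequality (1.7))
    p = len(a)
    eta = r / (p + 1)
    D = int(1 / eta) + 1  # limit_denominator(D) approximates within 1/D <= eta
    return tuple(Fraction(t).limit_denominator(D) for t in a)

a = (math.sqrt(2), math.pi, -1.25)
q = rational_point_in_ball(a, 0.01)
assert all(isinstance(c, Fraction) for c in q)
assert math.dist([float(c) for c in q], a) < 0.01
```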

Definition 1.11.

We say that a point $a\in {\mathbb {R}^{p}}$ is a limit point of the set $A\subset {\mathbb {R}^{p}}$ if every neighborhood of the point a contains infinitely many points of A. We call the set of all limit points of the set A the derived set of A, and denote it by $A'$.

We say that a point $a\in {\mathbb {R}^{p}}$ is an isolated point of A if there exists $r>0$ such that $B(a, r)\cap A=\{ a\} $.

Remark 1.12.

1. The limit points of A are not necessarily elements of the set A. For example, every point y that satisfies ${|y-a|}=r$ is a limit point of the ball B(a, r) (see Example 1.10.1.c). Thus $S(a,r)\subset B(a, r)' $. However, $S(a,r)\cap B(a, r)=\emptyset $.

2. By our definitions, the isolated points of A need to be elements of A. It is easy to see that the set of all isolated points of A is nothing other than the set $A\setminus A'$. It follows that every point of A is either an isolated point or a limit point of A.

3. It is also easy to see that a point a is a limit point of the set A if and only if there exists a sequence $x_n \in A\setminus \{ a\}$ that converges to a.

We say that the set $A\subset {\mathbb {R}^{p}}$ is open if every point of A is an interior point of A, i.e., if $A=\mathrm{int}~A$. The open balls and open boxes are indeed open sets by Example 1.10.1a and Example 1.10.2. The empty set and ${\mathbb {R}^{p}}$ are also open.

Obviously, the set A is open if and only if $A\cap \partial A=\emptyset $.

Theorem 1.13.

The following hold for an arbitrary set $A\subset {\mathbb {R}^{p}}$:

- (i)$\mathrm{int}~A$ and $\mathrm{ext}~A$ are open sets;
- (ii)$\mathrm{int}~A$ is the largest open set contained in A.

Proof.

Part (i) follows from the definition and from the fact that every ball is an open set.

If $G\subset A$ is open and $x\in G$, then there exists $r>0$ such that $B(x, r)\subset G$. In this case, $B(x, r)\subset A$ also holds, and thus $x\in \mathrm{int}~A$. We have proved that $\mathrm{int}~A$ contains every open set contained in A. Since $\mathrm{int}~A$ is also open by part (i), it follows that (ii) holds. $\square $

Theorem 1.14.

The intersection of finitely many open sets and the union of arbitrarily many open sets are also open.

Proof.

If A and B are open sets and $x\in A\cap B$, then $x\in \mathrm{int}~A$ and $x\in \mathrm{int}~B$, which means that there exist positive numbers r and s such that $B(x, r)\subset A$ and $B(x, s)\subset B$. In this case, $B(x,\min (r, s))\subset A\cap B$, and thus $x\in \mathrm{int}~(A\cap B)$. We have proved that every point of $A\cap B$ is an interior point of $A\cap B$, and thus the set $A\cap B$ is open. By induction we have that the intersection of n open sets is open, for every $n\in \mathbb {N}^+$.

Let $G_i$ be an open set for each $i\in I$, where I is an arbitrary (finite or infinite) index set, and let $G=\bigcup _{i\in I}G_i $. If $x\in G$, then x is in one of the sets $G_{i_0}$. Since $G_{i_0}$ is open, it follows that $x\in \mathrm{int}~G_{i_0} $, i.e., $B(x, r)\subset G_{i_0}$ for some $r>0$. Now $B(x, r)\subset G$ holds, and thus $x\in \mathrm{int}~G$. This is true for every $x\in G$, which implies that the set G is open. $\square $

Remark 1.15.

The intersection of infinitely many open sets is not necessarily open. For example, the intersection of the sets B(x, 1 / n) is the singleton $\{ x\}$. This set is not open, since its interior is empty.

We say that a ball B(x, r) is a rational ball if each of the coordinates of its center x, along with its radius, is a rational number.

Lemma 1.16.

Every open set is the union of rational balls.

Proof.

Let G be an open set and $x\in G$. Then $B(x, r)\subset G$ holds for some $r>0$. As shown in Example 1.10.3, every ball contains a point with rational coordinates. Let $y\in B(x, r/2)$ be such a point. If $s\in \mathbb {Q}$ and $|x-y|\lt s\lt r/2$, then B(y, s) is a rational ball that contains x, since $|x-y|\lt s$. On the other hand, $B(y,s)\subset B(x, r)$, since ${z\in B(y, s)}$ implies

$$ |z-x|\le |z-y|+|y-x|\lt s+(r/2)\lt r. $$

We have proved that every point in G is in a rational ball contained in G. Therefore, G is equal to the union of all the rational balls it contains. $\square $
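The construction in this proof can be carried out explicitly (a supplementary sketch; the helper `rational_ball`, the denominator bound, and the radius $s=3r/8$ are my own illustrative choices):

```python
from fractions import Fraction
import math

def rational_ball(x, r):
    # Lemma 1.16 construction (sketch): a rational ball B(y, s)
    # with x in B(y, s) and B(y, s) contained in B(x, r)
    p = len(x)
    D = int(4 * p / r) + 1
    # rational center: |x - y| <= sum |x_i - y_i| < p * (1/D) <= r/4
    y = tuple(Fraction(t).limit_denominator(D) for t in x)
    # a Python float is itself a rational number, so Fraction(r) is exact;
    # s = 3r/8 gives |x - y| < r/4 < s < r/2, hence B(y, s) ⊂ B(x, r)
    s = Fraction(3, 8) * Fraction(r)
    return y, s

x = (math.e, -math.sqrt(3))
y, s = rational_ball(x, 0.5)
assert math.dist([float(c) for c in y], x) < float(s) < 0.25
```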

We say that a set $A\subset {\mathbb {R}^{p}}$ is closed if it contains each of its boundary points, i.e., $\partial A\subset A$. Thus every box is closed. The set $\overline{B} (a, r)=\{ x\in {\mathbb {R}^{p}}:|x-a|\le r\}$ is also closed. We call this set the closed ball with center a and radius r.

Theorem 1.17.

For every set $A\subset {\mathbb {R}^{p}}$ the following are equivalent:

- (i)A is a closed set.
- (ii)${\mathbb {R}^{p}}\setminus A$ is an open set.
- (iii)If $x_n \in A$ for every n and $x_n \rightarrow a$, then $a\in A$.

Proof.

(i)$\Rightarrow $(ii): If A is closed and $x\notin A$, then $x\notin \mathrm{int}~A$ and $x\notin \partial A$, and thus $x\in \mathrm{ext}~A$. Thus $B(x, r)\cap A=\emptyset $ holds for some $r>0$, i.e., $B(x, r)\subset {\mathbb {R}^{p}}\setminus A$. We have shown that every point of ${\mathbb {R}^{p}}\setminus A$ is an interior point of ${\mathbb {R}^{p}}\setminus A$; that is, ${\mathbb {R}^{p}}\setminus A$ is open.

(ii)$\Rightarrow $(iii): We prove by contradiction. Assume that $x_n \rightarrow a$, where $x_n \in A$ for every n, but $a\notin A$, i.e., $a\in {\mathbb {R}^{p}}\setminus A$. Since ${\mathbb {R}^{p}}\setminus A$ is open, $B(a, r)\subset {\mathbb {R}^{p}}\setminus A$ for some $r>0$. On the other hand, as $x_n \rightarrow a$, we have $x_n \in B(a, r)\subset {\mathbb {R}^{p}}\setminus A$ for every n large enough. This is a contradiction, since $x_n \in A$ for every n.

(iii)$\Rightarrow $(i): Let $a\in \partial A$. Then for every $n\in \mathbb {N}^+$ we have $B(a, 1/n)\cap A\ne \emptyset $. Choose a point $x_n \in B(a, 1/n)\cap A$ for each n. Then $x_n \rightarrow a$, and thus $a\in A$ by (iii). We have proved that $\partial A\subset A$, i.e., A is closed. $\square $

It follows from our previous theorem that the boundary of every set is a closed set. Indeed, $\partial A ={\mathbb {R}^{p}}\setminus (\mathrm{int}~A\, \cup \, \mathrm{ext}~A)$, and by Theorems 1.13 and 1.14, $\mathrm{int}~A \,\cup \, \mathrm{ext}~A$ is open. It is also easy to see that the set of limit points of an arbitrary set is closed (see Exercise 1.22).

Theorem 1.18.

The union of finitely many closed sets and the intersection of arbitrarily many closed sets are also closed sets.

Obviously, there are sets that are neither open nor closed (for example, the set $\mathbb {Q}$ as a subset of $\mathbb {R}$). On the other hand, the empty set and ${\mathbb {R}^{p}}$ are both open and closed at the same time. We will show that there is no other set in ${\mathbb {R}^{p}}$ that is both open and closed.

For every $a, b\in {\mathbb {R}^{p}}$ we denote by [a, b] the set $\{ a+t(b-a):t\in [0,1]\}$. It is clear that [a, b] is the segment connecting the points a and b.

Theorem 1.19.

If $A\subset {\mathbb {R}^{p}}$, $a\in A$, and ${b\in {\mathbb {R}^{p}}\setminus A}$, then the segment [a, b] intersects the boundary of A, i.e., $[a, b]\cap \partial A \ne \emptyset $.

Proof.

Let $T=\{ t\in [0,1]:a+t(b-a)\in A\} $. The set T is nonempty (since $0\in T$) and bounded; thus it has a least upper bound. Let $t_0 =\sup T$. We show that the point $x_0 =a+t_0 (b-a)$ is in the boundary set of A. Obviously, for every $\varepsilon >0$, the interval $(t_0 -\varepsilon , t_0 +\varepsilon )$ intersects both T and $[0,1]\setminus T$. (This is also true in the case $t_0 =1$, since $1\notin T$.) If $t\in (t_0 -\varepsilon , t_0 +\varepsilon )\cap T$, then the point $x=a+t(b-a)$ is an element of A, and $|x-x_0 |<\varepsilon \cdot |b-a|$. However, if $t\in (t_0 -\varepsilon , t_0 +\varepsilon )\setminus T$, then the point $y=a+t(b-a)$ is not an element of A, and $|y-x_0 |<\varepsilon \cdot |b-a|$. We have proved that every neighborhood of $x_0$ intersects both A and the complement of A, i.e., $x_0 \in \partial A $. $\square $
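For sets where membership along the segment changes only once (e.g., convex sets), the proof's $t_0 =\sup T$ can be approximated by bisection (a supplementary sketch; `boundary_point` is a hypothetical helper, not from the text):

```python
import math

def boundary_point(a, b, in_A, iters=60):
    # bisection sketch of Theorem 1.19: keep in_A true at lo and false at
    # hi; the common limit is a boundary point of A on the segment [a, b].
    # (The proof's t0 = sup T needs no structure on A; bisection finds *a*
    # sign change, which for convex A coincides with sup T.)
    def point(t):
        return tuple(ai + t * (bi - ai) for ai, bi in zip(a, b))
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if in_A(point(mid)):
            lo = mid
        else:
            hi = mid
    return point(lo)

# A = open unit ball: the segment from (0, 0) to (2, 0) meets ∂A at (1, 0)
x0 = boundary_point((0.0, 0.0), (2.0, 0.0),
                    lambda q: math.dist(q, (0.0, 0.0)) < 1.0)
assert abs(math.dist(x0, (0.0, 0.0)) - 1.0) < 1e-9
```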

Corollary 1.20.

If a set $A\subset {\mathbb {R}^{p}}$ is both open and closed, then $A=\emptyset $ or $A={\mathbb {R}^{p}}$.

Proof.

If A is an open set, then $A\cap \partial A =\emptyset $. If, however, A is a closed set, then $\partial A \subset A$. Only if $\partial A =\emptyset $ can these conditions both hold. Now Theorem 1.19 states that if $\emptyset \ne A\ne {\mathbb {R}^{p}}$, then $\partial A\ne \emptyset $. $\square $

The connected open sets play an important role in multivariable analysis.

Definition 1.21.

We say that an open set $G\subset {\mathbb {R}^{p}}$ is connected if G cannot be represented as the union of two disjoint nonempty open sets.

Theorem 1.22.

- (i)An open set G is connected if and only if every pair of its points can be connected with a polygonal line contained entirely in G.
- (ii)Every open set can be written as the union of pairwise disjoint connected open sets (the number of which can be finite or infinite).

Proof.

Let $G\subset {\mathbb {R}^{p}}$ be an open set. We call the points $x, y\in G$ equivalent if they can be connected by a polygonal line that lies entirely in G. We will denote this fact by $x\sim y$. Obviously, this is an equivalence relation in G. If $x\in G$, then $B(x, r)\subset G$ for some $r>0$. The point x is equivalent to every point y of B(x, r), since $[x,y]\subset B(x, r) \subset G$. It follows that every equivalence class (the set of points equivalent to an arbitrary fixed point) is an open set. Since the different equivalence classes are disjoint, we have a system of pairwise disjoint open sets whose union is G.

If G is connected, then there is only one equivalence class, for otherwise, we could write G as the union of two disjoint nonempty open sets (e.g., take a single class and the union of the rest). Thus we have proved that if G is connected, then every pair of its points are equivalent to each other.

To prove the converse, let us assume that every pair of points in G are equivalent to each other, but G is not connected. Let $G=A\cup B$, where A and B are nonempty disjoint open sets. Let $x\in A$, $y\in B$, and let T be a polygonal line connecting the two points, say the union of the segments $[x_{i-1}, x_i ]$ $(i=1,\ldots , n)$, where $x_0 =x$ and $x_n =y$. Since $x_0 \in A$ and $x_n \notin A$, there exists i such that $x_{i-1} \in A$ and $x_i \notin A$. The segment $[x_{i-1}, x_i ]$ contains a boundary point of A by Theorem 1.19. This is impossible, since every point of $[x_{i-1}, x_i ]$ is either an exterior or an interior point of A, as implied by $[x_{i-1}, x_i ]\subset G=A\cup B$. This contradiction proves (i).

We showed that an arbitrary open set G can be written as the union of pairwise disjoint open sets $G_i$, where the sets $G_i$ are exactly the equivalence classes of the relation $\sim$. Since any two points of $G_i$ can be connected by a polygonal line lying in $G_i$, part (i) implies that each $G_i$ is also connected, which proves (ii). $\square $

We call the connected open sets domains.

The proof of Theorem 1.22 also shows that the decomposition in part (ii) of the theorem is unique: the open sets of the decomposition are just the equivalence classes of the $x\sim y$ equivalence relation. We call the domains of this decomposition of the set G the components of G.

Definition 1.23.

We call the set $A\cup \partial A$ the closure of the set A, and use the notation $\mathrm{cl}~A$.

Theorem 1.24.

For an arbitrary set $A\subset {\mathbb {R}^{p}}$, the following hold.

- (i)the point x is in $\mathrm{cl}~A$ if and only if every neighborhood of x intersects A;
- (ii)$\mathrm{cl}~A =A\cup A' $;
- (iii)$\mathrm{cl}~A = {\mathbb {R}^{p}}\setminus \mathrm{ext}~A= {\mathbb {R}^{p}}\setminus \mathrm{int}~({\mathbb {R}^{p}}\setminus A)$;
- (iv)$\mathrm{cl}~A$ is the smallest closed set containing A.

Proof.

We leave the proof of (i)–(iii) to the reader, while (iv) follows from (iii) and Theorem 1.13. $\square $

Our next theorem is a generalization of Cantor’s axiom^{8} (see [7, p. 33]).
Note that Cantor’s axiom states only that if the sets $A_1 \supset A_2 \supset \ldots $ are closed intervals in $\mathbb {R}$, then their intersection is nonempty. As the following theorem shows, it follows from Cantor’s axiom and from the other axioms of the real numbers that the statement is also true in ${\mathbb {R}^{p}}$ (for every p) and for much more general sets. From now on, we consider only subsets of ${\mathbb {R}^{p}}$.

Theorem 1.25.

(Cantor’s Theorem)
If the sets $A_1 \supset A_2 \supset \ldots $ are bounded, closed, and nonempty, then the set $\bigcap _{n=1}^\infty A_n$ is also nonempty.

Proof.

Choose a point $x_n$ from each set $A_n$. The sequence $(x_n)$ is bounded, since it is contained in the bounded set $A_1$. The Bolzano–Weierstrass theorem (Theorem 1.9) states that $(x_n)$ has a convergent subsequence. Let $(x_{n_k} )$ be one such subsequence, and let its limit be a. We show that $a\in \bigcap _{n=1}^\infty A_n $.

Let n be fixed. For k large enough, we have $n_k >n$, and thus ${x_{n_k }\in A_{n_k}\subset A_n}$. Therefore, the sequence $(x_{n_k} )$ is contained in $A_n$, except for at most finitely many of its points. Since $A_n$ is closed, we have $a\in A_n$ (Theorem 1.17). Also, since n was arbitrary, it follows that $a \in \bigcap _{n=1}^\infty A_n $. $\square $
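As a toy illustration of the proof (the sets here are my own choice, not from the text), take $A_n =[0, 1/n]$ in $\mathbb {R}$: bounded, closed, nested, and nonempty. Choosing $x_n =1/n\in A_n$ gives a sequence converging to 0, and the limit lies in every $A_n$, so the intersection is nonempty (here it is exactly $\{ 0\}$).

```python
# Nested bounded closed sets A_n = [0, 1/n]; pick x_n = 1/n from A_n.
xs = [1.0 / n for n in range(1, 101)]
limit = 0.0  # the limit of the sequence (x_n)

# The limit belongs to every A_n, witnessing a point of the intersection.
in_every_A_n = all(0.0 <= limit <= 1.0 / n for n in range(1, 101))
```

Note that both hypotheses are needed: with the non-closed sets $(0, 1/n)$ or the unbounded sets $[n,\infty )$ the intersection is empty (compare Exercise 1.33).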

Theorem 1.26.

(Lindelöf’s^{9} Theorem)
If the set A is covered by the union of some open sets, then we can choose countably many of those open sets whose union also covers A.

Lemma 1.27.

The set of rational balls is countable.

Proof.

Let $(r_n )_{n=1}^\infty $ be an enumeration of the rational numbers. If $x=(r_{n_1} ,\ldots , r_{n_p})$ and $r=r_m $, then we call $n_1 +\ldots +n_p +m$ the weight of B(x, r). Obviously, there are only finitely many balls with a given weight w for every $w\ge p+1$. It follows that there exists a sequence that contains every rational ball. Indeed, first we enumerate the rational balls with weight $p+1$ (there is at most one such ball). Then we list the rational balls with weight $p+2$, and so on. In this way we list every rational ball in a single infinite sequence, which proves that the set of rational balls is countable. $\square $
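The weight-based listing can be mimicked directly. The sketch below is an illustrative reconstruction (not from the text): it enumerates the index tuples $(n_1 ,\ldots , n_p , m)$ grouped by weight $n_1 +\ldots +n_p +m$; only finitely many tuples share each weight, so every tuple eventually appears exactly once in the resulting sequence.

```python
from itertools import product

def tuples_by_weight(p, max_weight):
    """List index tuples (n_1, ..., n_p, m) with all entries >= 1,
    ordered by weight n_1 + ... + n_p + m.  Each such tuple corresponds
    to one rational ball B(x, r) with x = (r_{n_1}, ..., r_{n_p}), r = r_m."""
    out = []
    for w in range(p + 1, max_weight + 1):
        for t in product(range(1, w + 1), repeat=p + 1):
            if sum(t) == w:
                out.append(t)
    return out

seq = tuples_by_weight(2, 6)  # p = 2: tuples (n_1, n_2, m)
```

The first entry is the single tuple of minimal weight $p+1$, and the sequence continues weight by weight, exactly as in the proof.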

Remark 1.28.

The proof above also shows that the set $\mathbb {Q}^p$ (the set of points with rational coordinates) is countable. Combining this result with Example 1.10.3, we get that there exists a set in ${\mathbb {R}^{p}}$ that is countable and everywhere dense.

Proof of Theorem 1.26.

Let $(B_n )_{n=1}^\infty $ be an enumeration of the rational balls. (By Lemma 1.27, there is such a sequence.)

Let $\mathcal {G}$ be a system of open sets whose union covers A. For every ball $B_n$ that is contained in at least one of the open sets $G\in \mathcal {G}$ we choose an open set $G_n \in \mathcal {G}$ such that $B_n \subset G_n $. In this way we have chosen the countable subsystem $\{ G_n \}$ of $\mathcal {G}$. The union of the sets of this subsystem is the same as the union of all sets in $\mathcal {G}$. Indeed, if $x\in \bigcup \mathcal {G}$, then there is a set $G\in \mathcal {G}$ containing x. By Lemma 1.16, there is a ball $B_n$ such that $x\in B_n \subset G $. Since $B_n \subset G_n $ holds, it follows that $x\in \bigcup _{n=1}^\infty G_n $.

Therefore, if the union of $\mathcal {G}$ covers A, then the union of the sets $G_n$ also covers A. $\square $

Example 1.29.

1. The balls B(0, r) cover the whole space ${\mathbb {R}^{p}}$. Lindelöf’s theorem claims that countably many of these also cover ${\mathbb {R}^{p}}$, e.g., $\bigcup _{n=1}^\infty B(0,n)={\mathbb {R}^{p}}$. On the other hand, it is obvious that finitely many of the balls B(0, r) cannot cover the whole of ${\mathbb {R}^{p}}$.

2. The open sets $G_r ={\mathbb {R}^{p}}\setminus \overline{B}(0,r) =\{ x\in {\mathbb {R}^{p}}:|x|>r\}$ cover the set $A= {\mathbb {R}^{p}}\setminus \{ 0\}$. Lindelöf’s theorem claims that countably many of these also cover A, e.g., $\bigcup _{n=1}^\infty G_{1/n} =A $. On the other hand, it is obvious that finitely many of the sets $G_r$ do not cover A.

The examples above show that we cannot replace the word “countable” by “finite” in Lindelöf’s theorem. That is, we cannot always choose a finite subcovering system from a covering system of open sets. The sets that satisfy this stronger condition form another important class of sets.

Definition 1.30.

We call a set $A\subset {\mathbb {R}^{p}}$ compact if we can choose a finite covering system from each of its covering systems of open sets.

Theorem 1.31.

(Borel’s^{10} Theorem)
A set $A\subset {\mathbb {R}^{p}}$ is compact if and only if it is bounded and closed.

Proof.

Let A be compact. Since $A\subset {\mathbb {R}^{p}}=\bigcup _{n=1}^\infty B(0,n)$, there exists N such that $A\subset \bigcup _{n=1}^N B(0,n)=B(0,N)$ (this follows from the compactness of A). Thus A is bounded.

Now we prove that A is closed. We shall do so by showing that ${\mathbb {R}^{p}}\setminus A$ is open. Let $a\in {\mathbb {R}^{p}}\setminus A$. Then

$$ A\subset {\mathbb {R}^{p}}\setminus \{ a\} =\bigcup _{k=1}^\infty \left( {\mathbb {R}^{p}}\setminus \overline{B} (a, 1/k)\right) $$

is an open cover of A, and then, by the compactness of A, there exists an integer K such that

$$ A\subset \bigcup _{k=1}^K \left( {\mathbb {R}^{p}}\setminus \overline{B} (a, 1/k) \right) ={\mathbb {R}^{p}}\setminus \overline{B} (a, 1/K) . $$

Thus $B(a, 1/K) \cap A=\emptyset $ and $B(a, 1/K) \subset {\mathbb {R}^{p}}\setminus A$. Since $a\in {\mathbb {R}^{p}}\setminus A$ was arbitrary, this proves that ${\mathbb {R}^{p}}\setminus A$ is open.

Now suppose that A is bounded and closed; we shall show that A is compact. Let $\mathcal {G}$ be a system of open sets covering A. By Lindelöf’s theorem there exists a countable subsystem $\{ G_1 , G_2 ,\ldots \}$ of $\mathcal {G}$ that also covers A. Let

$$F_n =A\setminus \bigcup _{i=1}^n G_i =A\cap \left( {\mathbb {R}^{p}}\setminus \bigcup _{i=1}^n G_i \right) $$

for each n. The sets $F_n$ are closed (since ${\bigcup _{i=1}^n G_i}$ is open, $A_n ={{\mathbb {R}^{p}}\setminus \bigcup _{i=1}^n G_i}$ is closed, and thus $F_n =A\cap A_n$ is also closed), and they are bounded (since they are contained in A), and $F_1 \supset F_2 \supset \ldots $ holds. If the sets $F_n$ were all nonempty, then by Cantor’s theorem, their intersection $A\setminus \bigcup _{i=1}^\infty G_i$ would also be nonempty. However, this is impossible, since $A\subset \bigcup _{i=1}^\infty G_i $. Thus, there exists n such that $F_n = A\setminus \bigcup _{i=1}^n G_i =\emptyset $; that is, $A\subset \bigcup _{i=1}^n G_i $. This shows that finitely many of the sets $G_i$ cover A. $\square $

If A and B are nonempty sets in ${\mathbb {R}^{p}}$, then the distance between A and B is

$$ \mathrm{dist}(A, B)=\inf \{ |x-y|:x\in A,\ y\in B\}. $$

The distance between two disjoint closed sets can be zero (see Exercise 1.36). Our next theorem shows that this is possible only if neither A nor B is bounded.

Theorem 1.32.

Let A and B be disjoint nonempty closed sets, and suppose that at least one of them is bounded. Then

- (i)there exist points $a\in A$ and $b\in B$ such that $\mathrm{dist}(A, B)=|a-b|$, and
- (ii)$\mathrm{dist}(A, B)>0$.

Proof.

Let $\mathrm{dist}(A, B)=d$, and let the points $a_n \in A$ and $b_n \in B$ be chosen such that $|a_n-b_n|\lt d+(1/n)$ $(n=1,2,\ldots )$. Since at least one of the sets A and B is bounded, it follows that both of the sequences $(a_n )$ and $(b_n )$ are bounded.

By the Bolzano–Weierstrass theorem (Theorem 1.9) we can select a convergent subsequence of $(a_n )$. Replacing $(a_n )$ by this subsequence, we may assume that $(a_n )$ itself is convergent. Then we select a convergent subsequence of $(b_n )$. Passing to this subsequence, we may assume that $(a_n )$ and $(b_n )$ are both convergent.

If $a_n \rightarrow a $ and $b_n \rightarrow b$, then $a\in A$ and $b\in B$, since A and B are both closed. Now $|a-b|=\lim _{n\rightarrow \infty } |a_n -b_n |\le d$. Using the definition of the distance between sets, we get $|a-b|\ge d$, and thus $|a-b|= d$. This proves (i), while (ii) follows immediately from (i). $\square $
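For finite sets the infimum in the definition of $\mathrm{dist}(A, B)$ is simply a minimum over all pairs and is attained, as in part (i) of the theorem. A minimal numeric sketch (the sample points are chosen for illustration):

```python
import math

def dist(A, B):
    """dist(A, B) = inf{|x - y| : x in A, y in B}; for finite point
    sets the infimum is a minimum over the finitely many pairs."""
    return min(math.dist(x, y) for x in A for y in B)

A = [(0.0, 0.0), (1.0, 0.0)]
B = [(0.0, 3.0), (4.0, 0.0)]
d = dist(A, B)  # attained, e.g., by the pair a = (0, 0), b = (0, 3)
```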

Exercises

1.8.

Let $p=2$. Find $\mathrm{int}\, A$, $\mathrm{ext}\, A$, and $\partial A$ for each of the sets below.

- (a)$\{ (x, y)\in \mathbb {R}^2:x, y>0, \ x+y\lt 1 \}$;
- (b)$\{ (x, 0)\in \mathbb {R}^2:0\lt x\lt 1\}$;
- (c)$\{(x, y)\in \mathbb {R}^2:x=1/n$ ($n=1,2,\ldots $), ${0\lt y\lt1}\}$.

1.9.

Find every set $A\subset {\mathbb {R}^{p}}$ such that $\mathrm{int}\, A$ has exactly three elements. (S)

1.10.

Show that $\partial (A\cup B)\subset \partial A \cup \partial B$ and $\partial (A\cap B)\subset \partial A \cup \partial B$ hold for every $A, B\subset {\mathbb {R}^{p}}$. (S)

1.11.

Is there a set $A\subset \mathbb {R}^2$ such that ${\partial A =\{ (1/n , 0):n=1,2,\ldots \}}$?

1.12.

Let $A\subset \mathbb {R}^2$ be a closed set. Show that $A=\partial H$ for a suitable set ${H\subset \mathbb {R}^2}$.

1.13.

Show that $\partial \, \partial A\subset \partial A$ for every set $A\subset {\mathbb {R}^{p}}$. Also show that $\partial \, \partial A= \partial A$ is not always true.

1.14.

Show that if the set $A\subset {\mathbb {R}^{p}}$ is open or closed, then $\partial \, \partial A=\partial A$ and $\mathrm{int}~\partial A =\emptyset $.

1.15.

Show that the union of infinitely many closed sets is not necessarily closed.

1.16.

Show that every open set of ${\mathbb {R}^{p}}$ can be written as the union of countably many boxes.

1.17.

What are the sets whose boundary consists of exactly three points?

1.18.

Show that if $A\subset {\mathbb {R}^{p}}$, where $p>1$, and if $\partial A$ is countable, then one of A and ${\mathbb {R}^{p}}\setminus A$ is countable.

1.19.

Which are the sets satisfying

- (a)$\mathrm{int}\, A=\partial A$?
- (b)$\mathrm{int}~A =\mathrm{cl}~A$?
- (c)$\mathrm{ext}~A =\mathrm{cl}~A$?

1.20.

Show that every infinite bounded set has a limit point.

1.21.

What are the sets with no limit points? What are the sets with exactly three limit points?

1.22.

Show that for every set $A\subset {\mathbb {R}^{p}}$, the set $A'$ is closed. (S)

1.23.

Find every set $A \subset \mathbb {R}^2$ that satisfies $A' = A$ and $(\mathbb {R}^2\setminus A)' = \mathbb {R}^2\setminus A$.

1.24.

Let $A\subset \mathbb {R}^2$ be bounded, $G\subset \mathbb {R}^2$ open, and let $A'\subset G$. Show that $A\setminus G$ is finite.

1.25.

Construct a set A such that the sets $A,\ A' ,\ A'' $, etc. are distinct.

1.26.

Is there a bounded infinite set every point of which is an isolated point?

1.27.

Show that the number of isolated points of an arbitrary set is countable. (H)

1.28.

A set $A\subset {\mathbb {R}^{p}}$ is called everywhere dense
if it has a point in every ball. Construct an everywhere dense set $A\subset \mathbb {R}^2$ that does not contain three collinear points.

1.29.

Decompose $\mathbb {R}^2$ into infinitely many pairwise disjoint everywhere dense sets.

1.30.

Construct a function $f:\mathbb {R}\rightarrow \mathbb {R}$ whose graph is everywhere dense in $\mathbb {R}^2$.

1.31.

We call a set $A\subset \mathbb {R}^2$ a star if it is the union of three segments that have a common endpoint but are otherwise disjoint.
Show that every system of pairwise disjoint stars is countable. ($*$ H)

1.32.

Show that a system of pairwise disjoint stars in $\mathbb {R}^2$ cannot cover a line. ($*$)

1.33.

Construct a sequence of sets $A_1 \supset A_2 \supset \ldots $ that satisfy $\bigcap _{n=1}^\infty A_n =\emptyset $ and are

- (a)bounded and nonempty;
- (b)closed and nonempty.

1.34.

Show that a set $A\subset {\mathbb {R}^{p}}$ is bounded and closed if and only if every sequence $x_n \in A$ has a subsequence converging to a point of A.

1.35.

Is there a sequence $x_n \in \mathbb {R}$ such that $[0,1]\subset \bigcup _{n=1}^\infty (x_n -2^{-n} , x_n +2^{-n} )$?

How about a sequence with $[0,1]\subset \bigcup _{n=1}^\infty (x_n -2^{-n-1} , x_n +2^{-n-1} )$? (H)

1.36.

Give examples of two disjoint nonempty closed sets with distance zero (a) in $\mathbb {R}^2$, and (b) in $\mathbb {R}$. (S)

1.37.

A set $G\subset {\mathbb {R}^{p}}$ is called a regular open set if $G=\mathrm{int}~\mathrm{cl}~G$.
Show that for every $G\subset {\mathbb {R}^{p}}$ the following statements are equivalent.

- (i)The set G is regular open.
- (ii)There is a set A with $G=\mathrm{int}~\mathrm{cl}~A $.
- (iii)There is a set A with $G=\mathrm{ext}~\mathrm{int}~A$.
- (iv)$G=\mathrm{ext}~\mathrm{ext}~G$.

1.38.

Which of the following sets in $\mathbb {R}^2$ are regular open?

- (i)$\{ (x, y):x^2 +y^2\lt 1\} $.
- (ii)$\{ (x, y):0\lt x^2 +y^2\lt 1 \} $.
- (iii)$\{ (x, y):x^2 +y^2\lt 1 ,\ y\ne 0 \} $.
- (iv)$\{ (x, y):x^2 +y^2 \in [0,1) \setminus \{ 1/2\} \} $.

1.39.

Show that for every set $A\subset {\mathbb {R}^{p}}$ the following are true:

$$\begin{aligned} \begin{aligned}&\mathrm{ext}~\mathrm{ext}~\mathrm{ext}~\mathrm{ext}~A=\mathrm{ext}~\mathrm{ext}~A ,\qquad \mathrm{ext}~\mathrm{ext}~\mathrm{ext}~\mathrm{int}~A=\mathrm{ext}~\mathrm{int}~A,\\&\mathrm{ext}~\mathrm{ext}~\mathrm{int}~\partial A=\mathrm{int}~\partial A,\qquad \qquad \mathrm{ext}~\mathrm{ext}~\partial A=\mathrm{int}~\partial A,\\&\partial \, \mathrm{ext}~\mathrm{ext}~\mathrm{int}~A=\partial \, \mathrm{ext}~\mathrm{int}~A,\qquad \partial \, \mathrm{ext}~\mathrm{ext}~\mathrm{ext}~A=\partial \, \mathrm{ext}~\mathrm{ext}~A,\\&\partial \, \mathrm{ext}~\mathrm{int}~\partial A=\partial \, \mathrm{int}~\partial A. \end{aligned} \end{aligned}$$

(1.8)

1.40.

Show that applying the operations $\mathrm{int}~,\ \mathrm{ext}~,\ \partial $ to an arbitrary set ${A\subset {\mathbb {R}^{p}}}$ (repeated an arbitrary number of times and in an arbitrarily chosen order) cannot result in more than 25 different sets. (* H)

1.41.

Show that the estimate in the previous exercise is sharp; i.e., give an example of a set $A\subset {\mathbb {R}^{p}}$ such that we get 25 different sets by applying the operations $\mathrm{int}~\!\!,\ \mathrm{ext}~\!\!,\ \partial $ an arbitrary number of times and in an arbitrarily chosen order.

1.42.

Show that applying the operations $\mathrm{int}~,\ \mathrm{ext}~,\ \partial $ together with the closure operation and the complement operation on an arbitrary set $A\subset {\mathbb {R}^{p}}$ (repeated an arbitrary number of times and in an arbitrarily chosen order) cannot result in more than 34 different sets.

At the core of multivariable analysis—as in the case of one-variable analysis—lies the investigation and application of the limit, continuity, differentiation, and integration of functions.

The concept of limit of a multivariable function—similarly to the single-variable case—is the idea that if x is close to a point a, then the value of the function at x is close to the limit.

Definition 1.33.

Let the real-valued function f be defined on the set $A\subset {\mathbb {R}^{p}}$, and let a be a limit point of A. We say that the limit of the function f at the point a restricted to the set A is $b\in \mathbb {R}$ if the following condition is satisfied. For every $\varepsilon >0$ there exists $\delta >0$ such that whenever $x\in A$ and $0<|x-a|<\delta $, then $|f(x)-b|<\varepsilon $.
Notation:
$\lim _{x\rightarrow a, \, x\in A}f(x)=b$
.

If the domain of f is A (i.e., if D(f) is not larger than A), then we can omit the part “restricted to the set A” from the definition and instead we can say that the limit of the function f at the point a is b
. In this case, the notation is $\lim _{x\rightarrow a} f(x)=b$ or $f(x)\rightarrow b$ as $x\rightarrow a$.

Example 1.34.

1. Let $p=2$. We show that $\lim _{(x, y)\rightarrow (0,0)}\frac{x^2 y}{x^2 +y^2} =0$. For $\varepsilon >0$ fixed, $0<|(x, y)|=\sqrt{x^2 +y^2} <\varepsilon $ implies $|y|<\varepsilon $; thus

$$\left| \frac{x^2 y}{x^2 +y^2} \right| \le |y|<\varepsilon .$$

2. We show that the limit $\lim _{(x, y)\rightarrow (0,0)}\frac{xy}{x^2 +y^2}$ does not exist. Since the function is zero on the axes, there exists a point in every neighborhood of (0, 0) where the function is zero. On the other hand, the function is 1/2 at the points of the line $y=x$, whence there exists a point in every neighborhood of (0, 0) where the function is 1/2. This implies that the limit does not exist: we cannot find an appropriate $\delta $ for $\varepsilon = 1/4$, regardless of the value of b. (See Figure 1.10.)

Note, however, that the function $xy/(x^2 +y^2)$ has a limit at the origin when restricted to a line that passes through it, since the function is constant on every such line (aside from the origin itself).
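The two behaviors described above can be checked numerically; the sketch below evaluates $xy/(x^2 +y^2)$ along the x-axis and along the line $y=x$ as the points approach the origin.

```python
def f(x, y):
    # the function of Example 1.34.2, defined away from the origin
    return x * y / (x ** 2 + y ** 2)

ts = (0.1, 0.01, 0.001)
along_axis = [f(t, 0.0) for t in ts]  # restriction to the x-axis
along_diag = [f(t, t) for t in ts]    # restriction to the line y = x
```

The restriction to each line through the origin is constant (0 on the axes, 1/2 on $y=x$), so the two restricted limits disagree and the two-variable limit cannot exist.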

Definition 1.35.

Let the function f be defined on the set $A\subset {\mathbb {R}^{p}}$, and let a be a limit point of A. We say that the limit of the function f at the point a restricted to the set A is infinity (negative infinity) if for every K there exists $\delta\gt0$ such that $f(x)\gt K$ ($f(x)\lt K$) for every $x\in A$ satisfying $0\lt|x-a|\lt\delta $.
Notation:
$\lim _{{x\rightarrow a,\; x\in A}}f(x)=\infty $
(${-\infty }$).

If the domain of f is A (i.e., if it is not larger than A), then we can omit the part “restricted to the set A” of the definition and instead we can say that the limit of the function f at the point a is infinity (negative infinity). In this case, the notation is $\lim _{x\rightarrow a} f(x)=\infty \ (-\infty ) $.

Example 1.36.

Let A be the half-plane $\{ (x, y):y>x\}$.

Then $\lim _{\genfrac{}{}{0.0pt}1{(x, y)\rightarrow (0,0)}{(x, y)\in A}} \frac{1}{y-x}=\infty $. Indeed, if $K> 0$ is fixed and $0<|(x, y)|= \sqrt{x^2 +y^2} <1/K$, then $|x|,\, |y|\lt1/K$, thus $|y-x|<2/K $. On the other hand, if $(x, y)\in A$ also holds, then $x\lt y $ and $0\lt y-x\lt2/K$, and thus ${1}/{(y-x)}\gt K/2$.

By the same argument, $\lim _{\genfrac{}{}{0.0pt}1{(x, y)\rightarrow (0,0)}{(x, y)\in B}} \frac{1}{y-x}=-\infty $, where $B=\{ (x, y):y\lt x\}$. It also follows that the limit $\lim _{(x, y)\rightarrow (0,0)} \frac{1}{y-x}$ does not exist.

These three kinds of limits can be described by a single definition with the help of punctured neighborhoods (sometimes called deleted neighborhoods). The punctured neighborhoods
of a point $a\in {\mathbb {R}^{p}}$ are the sets $B(a, r)\setminus \{ a\}$, where r is an arbitrary positive number.

Recall that the neighborhoods of $\infty $ and $-\infty $ are defined as the half-lines $(a,\infty )$ and $(-\infty , a)$, respectively.

Theorem 1.37.

Let the function f be defined on the set $A\subset {\mathbb {R}^{p}}$, and let a be a limit point of A. Let $\beta $ be a real number b or one of $\pm \infty $. Then $\lim _{{x\rightarrow a,\; x\in A}}f(x)=\beta $ holds if and only if for every neighborhood V of $\beta $, there exists a punctured neighborhood $\,\dot{\!U\,}\!$ of a such that $ f(x)\in V$ for every $x\in \ A\cap \,\dot{\!U\,}\!$. $\square $

The proof of the following theorem is exactly the same as the proof of its single-variable counterpart (see [7, Theorem 10.19]).

Theorem 1.38.

(Transference principle) Let the function f be defined on the set $A\subset {\mathbb {R}^{p}}$, and let a be a limit point of A. Let $\beta $ be a real number b or one of $\pm \infty $. Then $\lim _{{x\rightarrow a,\; x\in A}}f(x)=\beta $ holds if and only if for every sequence $(x_n )$ with $x_n \rightarrow a$ and $x_n \in A \setminus \{a \}$ for every n, we have that $f(x_n )\rightarrow \beta $. $\square $

The following three statements follow easily from the definitions and from the theorems above, combined with their single-variable counterparts. (See [7, Theorems 10.29-10.31].)

Theorem 1.39.

- (i) (Squeeze theorem) If $f(x)\le g(x)\le h(x)$ for every $x\in A\setminus \{a \}$ and $$ \lim _{\genfrac{}{}{0.0pt}1{x\rightarrow a}{x\in A}}f(x)= \lim _{\genfrac{}{}{0.0pt}1{x\rightarrow a}{x\in A}}h(x)=\beta , $$ then $\lim _{{x\rightarrow a,\; x\in A}} g(x)=\beta $.
- (ii) If $$ \lim _{\genfrac{}{}{0.0pt}1{x\rightarrow a}{x \in A}}f(x)=b\lt c=\lim _{\genfrac{}{}{0.0pt}1{x\rightarrow a}{x \in A}} g(x), $$ then there exists a punctured neighborhood $\,\dot{\!U\,}\!$ of a such that $f(x)\lt g(x)$ holds for every $x\in \,\dot{\!U\,}\!\cap A$.
- (iii) If the limits $\lim _{{x\rightarrow a},\; {x\in A}}f(x)=b$ and $\lim _{{x\rightarrow a},\; {x\in A}}g(x)=c$ exist, and furthermore, if $f(x)\le g(x)$ holds at the points of the set $A\setminus \{a \}$, then $b\le c$. $\square $

From the squeeze theorem and from the corresponding theorems on real sequences we obtain the following.

Theorem 1.40.

Let the limits ${\lim _{{x\rightarrow a},\,{x \in A}}f(x)=b}$ and ${\lim _{{x\rightarrow a},\,{x \in A}} g(x)=c}$ exist and be finite. Then we have $ \lim _{{x\rightarrow a},\; {x \in A}}(f(x) +g(x))=b+c$, $ \lim _{{x\rightarrow a},\; {x \in A}}(f(x) \cdot g(x))=b\cdot c$, and, assuming also $c\ne 0$, $\lim _{{x\rightarrow a},\; {x \in A}}(f(x)/g(x))=b/c$. $\square $

Remark 1.41.

In the case of one-variable functions, one can define 15 kinds of limits, considering five different options for the location of the limit (a finite point, left- or right-sided limit at a finite point, $\infty $, and $-\infty $), and three options for the value of the limit (finite, $\infty $, and ${-\infty }$).

In the case of multivariable functions the notion of left- and right-sided limits and limits at $\infty $ and ${-\infty }$ are meaningless. The reason is clear; for $p>1$ we have infinitely many directions in ${\mathbb {R}^{p}}$, instead of merely two. Obviously, it would be pointless to define limits for every direction; if we really need to talk about limits in a given direction, we can simply take the limit of the function restricted to the corresponding line.

The limit at infinity in a given direction can be viewed as the limit at $\infty $ of an appropriate single-variable function. For example, if v is a unit vector in the plane, then a half-line starting from the origin in the direction of v is the set of vectors $tv\ (t>0)$. Thus the limit of a function at infinity in the direction of v can be viewed as the limit of the single-variable function ${t\mapsto f(tv)}$ at infinity.
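Continuing with the function $f(x, y)=xy/(x^2 +y^2)$ of Example 1.34 (my choice of test function): its limit at infinity in the direction of the unit vector $v=(\cos \vartheta , \sin \vartheta )$ is the limit at $\infty $ of $t\mapsto f(tv)$, which here is constant, equal to $\cos \vartheta \sin \vartheta $.

```python
import math

def f(x, y):
    return x * y / (x ** 2 + y ** 2)

def along_direction(v, t):
    # the single-variable function t -> f(t v) for a unit vector v
    return f(t * v[0], t * v[1])

theta = math.pi / 4
v = (math.cos(theta), math.sin(theta))
vals = [along_direction(v, t) for t in (1.0, 10.0, 100.0)]
# f(tv) = cos(theta) * sin(theta) for every t > 0; at theta = pi/4 this is 1/2
```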

Exercises

1.43.

Evaluate the following limits or prove that the limits do not exist for the following two-variable functions at the given points. If the limit exists, find a suitable $\delta $ for every $\varepsilon >0$ (based on the definition of the limit).

- (a)$\dfrac{x-2}{y-3}$, (2, 3);
- (b)$\dfrac{x^2 y}{x^2 +y}$, (0, 0);
- (c)$x\cdot \sin \dfrac{1}{y}$, (0, 0);
- (d)$\dfrac{x^2 -y^2 }{x^2 +y^2}$, (0, 0);
- (e)$x+\dfrac{1}{y}$, (3, 2);
- (f)$\dfrac{\sin xy}{y}$, (0, 0);
- (g)$x^y \ (x>0, \ y\in \mathbb {R})$, (0, 0);
- (h)$(1+x)^y$, (0, 0);
- (i)$\dfrac{x^2 y^2}{x+y}$, (0, 0);
- (j)$\dfrac{xy -1}{x-1}$, (1, 1);
- (k)$\dfrac{\log x}{x-1}$, (1, 1);
- (l)$\dfrac{\root 3 \of {x^2 y^5}}{x^2 +y^2}$, (0, 0);
- (m)$\dfrac{\sin x -\sin y}{x-y},\ (0,0)$.

1.44.

Show that if $A\subset {\mathbb {R}^{p}}$ is countable, then there exists a function $f{:A\rightarrow \mathbb {R}}$ such that $\lim _{x\rightarrow a} f(x)=\infty $ for every point $a\in A'$.

1.45.

Show that if $A\subset {\mathbb {R}^{p}}$, $f:A\rightarrow \mathbb {R}$, and $\lim _{x\rightarrow a} f(x)=\infty $ for every point $a\in A'$, then A is countable. (H)

Definition 1.42.

Let the function f be defined on the set $A\subset {\mathbb {R}^{p}}$, and let $a\in A$. We say that f is continuous at the point a restricted to the set A if for every $\varepsilon >0$, there exists $\delta >0$ such that $x\in A$, $|x-a|<\delta $ imply ${|f(x) -f(a)|<\varepsilon }$.

If the domain of f is equal to A, we can omit the part “restricted to the set A” in the above definitions, and instead we can say that f is continuous at a.

If the function f is continuous at every point $a\in A$, we say that f is continuous on the set A.

Intuitively, the continuity of a function f at a point a means that the graph of f at the point (a, f(a)) “does not break.”

Remark 1.43.

It is obvious from the definition that f is continuous at a point a restricted to the set A if and only if one of the following statements holds:

- (i)the point a is an isolated point of A;
- (ii)$a\in A\cap A'$ and $\lim _{{x\rightarrow a},\; {x\in A}}f(x)=f(a)$.

We can easily prove the following theorem, called the transference principle for continuity, with the help of Theorem 1.38.

Theorem 1.44.

The function f is continuous at the point a restricted to the set A if and only if for every sequence $(x_n )$ with $x_n \rightarrow a$ and $x_n \in A$ we have $f(x_n )\rightarrow f(a) $. $\square $

While investigating multivariable functions, fixing certain variables at given values and considering our original function as a function of the remaining variables can make the investigation considerably easier. The functions we get in this way are the sections of the original function. For example, the sections of the two-variable function f(x, y) are the single-variable functions $y\mapsto f_a (y)=f(a, y)$ and $x\mapsto f^b (x)=f(x, b)$, for every ${a, b\in \mathbb {R}}$. The section $f_a$ is defined at those points y for which the point (a, y) is in the domain D(f) of the function f. Similarly, the section $f^b$ is defined at those points x for which $(x, b)\in D(f)$.

Remark 1.45.

It is easy to see that if a function is continuous at the point $(a_1 ,\ldots , a_p )$, then fixing a subset of the coordinates at the appropriate numbers $a_i$, we obtain a section that is continuous at $(a_{i_1} ,\ldots , a_{i_s})$, where the $i_1 ,\ldots , i_s$ denote the indices of the nonfixed coordinates. For example, if a two-variable function f is continuous at the point (a, b), then the section $f_a$ is continuous at b, and the section $f^b$ is continuous at a. The converse of the statement is not true. The continuity of the sections does not imply the continuity of the original function.

Consider the function $f:\mathbb {R}^2\rightarrow \mathbb {R}$, where $f(x, y)=xy/(x^2 +y^2 )$ if $(x, y)\ne (0,0)$, and $f(0,0)=0$. (See Figure 1.10.) Every section of f is continuous. Indeed, if $a\ne 0$, then the function $f_a (y)=ay/(a^2 +y^2 )$ is continuous everywhere, since it can be written as a rational function whose denominator is never zero (see Theorem 1.48 below). However, for $a=0$ the function $f_a$ is constant, with the value zero, and thus it is continuous as well. Similarly, the section $f^b$ is continuous for every b.

On the other hand, the function f is not continuous at the point (0, 0), since by Example 1.34.2, it does not even have a limit at (0, 0).

Theorem 1.40 implies the following theorem.

Theorem 1.46.

If the functions f and g are continuous at the point a restricted to the set A, then the same is true for the functions $f+g$ and $f\cdot g$. Furthermore, if $g (a)\ne 0$, then the function f / g is also continuous at the point a. $\square $

Definition 1.47.

We call the function $x=(x_1 ,\ldots , x_p )\mapsto x_i$, defined on ${\mathbb {R}^{p}}$, the ith coordinate function.

We call the function $f:{\mathbb {R}^{p}}\rightarrow \mathbb {R}$ a p-variable polynomial function
(polynomial for short) if we can obtain f from the coordinate functions $x_1 ,\ldots , x_p$ and constants using only addition and multiplication. Clearly, the polynomials are finite sums of terms of the form $c\cdot x^{n_1}_1 \cdots x^{n_p}_p$, where the coefficients c are real numbers and the exponents $n_1 ,\ldots , n_p$ are nonnegative integers.

We call the quotients of two p-variable polynomials p-variable rational functions.
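The description of polynomials as finite sums of monomials translates directly into an evaluation routine. A minimal sketch (the representation as coefficient–exponent pairs is my own choice):

```python
from math import prod

def eval_poly(terms, x):
    """Evaluate a p-variable polynomial given as a list of
    (c, (n_1, ..., n_p)) pairs, i.e. a sum of terms c * x_1^n_1 * ... * x_p^n_p."""
    return sum(c * prod(xi ** ni for xi, ni in zip(x, exps)) for c, exps in terms)

# 3*x^2*y + 2*y^3 - 1 as a two-variable polynomial
terms = [(3.0, (2, 1)), (2.0, (0, 3)), (-1.0, (0, 0))]
value = eval_poly(terms, (2.0, 1.0))  # 3*4*1 + 2*1 - 1 = 13
```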

Theorem 1.48.

The polynomials are continuous everywhere. The rational functions are continuous at every point of their domain.

Proof.

First we show that the coordinate functions are continuous everywhere. This follows from the fact that if $|x-a|<\varepsilon $, where $x=(x_1 ,\ldots , x_p )$ and $a=(a_1 ,\ldots , a_p )$, then $|x_i -a_i |<\varepsilon $ for every $i=1,\ldots , p$. From this it is clear, by Theorem 1.46, that the polynomials are continuous everywhere.

If p and q are polynomials, then the domain of the rational function p / q consists of the points where q is not zero. Again, Theorem 1.46 gives that p / q is continuous at those points. $\square $

The following theorem concerns the limits of composite functions.

Theorem 1.49.

Suppose that

- (i)$A\subset {\mathbb {R}^{p}},\ g:A\rightarrow \mathbb {R}$ and $\lim _{x\rightarrow a}g(x)=\gamma $, where $\gamma $ is a real number or one of $\pm \infty $;
- (ii)$g(A)\subset H\subset \mathbb {R}$, $f:H\rightarrow \mathbb {R}$, and $\lim _{y\rightarrow \gamma }f(y)=\beta $, where $\beta $ is a real number or one of $\pm \infty $;
- (iii)$g(x)\ne \gamma $ in a punctured neighborhood of a, or $\gamma \in H$ and f is continuous at $\gamma $ restricted to H.

Then

$$\begin{aligned} \lim _{x\rightarrow a} f(g(x))=\beta . \end{aligned}$$

(1.9)

Proof.

By the transference principle, we have to show that if $x_n \rightarrow a$ is a sequence with $x_n \in A\setminus \{ a\}$ for each n, then $f(g(x_n ))\rightarrow \beta $.

It follows from Theorem 1.38 that $g(x_n )\rightarrow \gamma $. If $g(x)\ne \gamma $ in a punctured neighborhood of a, then $g(x_n )\ne \gamma $ for every n large enough. Then, applying Theorem 1.38 again, we find that $f(g(x_n ))\rightarrow \beta $. Also, if f is continuous at $\gamma $, then Theorem 1.44 gives $f(g(x_n ))\rightarrow f(\gamma )=\beta $. Therefore, applying Theorem 1.38, we obtain (1.9). $\square $

Corollary 1.50.

If g is continuous at a point $a\in {\mathbb {R}^{p}}$ restricted to the set $A\subset {\mathbb {R}^{p}}$ and if the single-variable function f is continuous at g(a) restricted to g(A), then $f\circ g$ is also continuous at a restricted to A. $\square $

This corollary implies that all functions obtained from the coordinate functions using elementary functions^{11} are continuous on their domain. For example, the three-variable function

$$(x,y, z)\mapsto \frac{e^{\cos (x^2 +y)} -z}{1-xyz}$$

is continuous at every point (x, y, z) such that $xyz\ne 1$.

The familiar theorems concerning continuous functions on bounded and closed intervals (see [7, Theorems 10.52 and 10.55]) can be generalized as follows.

Theorem 1.51.

(Weierstrass’s theorem) Let $A\subset {\mathbb {R}^{p}}$ be nonempty, bounded, and closed, and let $f:A\rightarrow \mathbb {R}$ be continuous. Then f is bounded on the set A, and the range of f has a greatest as well as a least element.

Proof.

Let $M=\sup f(A)$. If f is not bounded from above, then $M=\infty $, and for every n there exists a point $x_n \in A$ such that $f(x_n )>n$. On the other hand, if f is bounded from above, then M is finite, and for every positive integer n there exists a point $x_n \in A$ such that $f(x_n )>M-(1/n)$. In both cases we have found a sequence $x_n \in A$ with the property $f(x_n )\rightarrow M$.

The sequence $(x_n)$ is bounded (since its terms are in A). Then, by the Bolzano–Weierstrass theorem, it has a convergent subsequence $(x_{n_k})$. Let $\lim _{k\rightarrow \infty }x_{n_k}=a $. Since A is closed, it follows that $a\in A$ by Theorem 1.17. Now, f is continuous at a, and thus the transference principle implies ${f(x_{n_k} )\rightarrow f(a)}$. Thus $M=f(a)$. We obtain that M is finite, whence f is bounded from above, and that ${M\in f(A)}$; that is, $M =\max f(A)$.

The proof of the existence of $\min f(A)$ is similar. $\square $

Definition 1.52.

We say that a function f is uniformly continuous on the set ${A\subset {\mathbb {R}^{p}}}$ if for every $\varepsilon >0$ there exists a uniform $\delta $, i.e., a $\delta >0$ independent of the location in A such that $x, y\in A$ and $|x-y|\lt \delta $ imply $|f(x)-f(y)|\lt \varepsilon $.

Theorem 1.53.

(Heine’s^{12} theorem) Let $A\subset {\mathbb {R}^{p}}$ be bounded and closed, and let $f:A\rightarrow \mathbb {R}$ be continuous. Then f is uniformly continuous on A.

Proof.

We prove the statement by contradiction. Suppose that f is not uniformly continuous on A. Then there exists $\varepsilon _0 >0$ for which there does not exist a “good” $\delta >0$; that is, there is no $\delta $ satisfying the requirement formulated in the definition of uniform continuity. Then, in particular, ${\delta =1/n}$ is not “good” either; that is, for every n there exist $\alpha _n , \beta _n \in A$ for which $|\alpha _n -\beta _n |<1/n$ but $|f(\alpha _n)-f(\beta _n)|\ge \varepsilon _0$.

Since $\{\alpha _n \} \subset A$ and A is bounded, there exists a convergent subsequence $(\alpha _{n_k})$ whose limit, $\alpha $, is also in A, since A is closed. Now we have

$${ \beta _{n_k} =\left( \beta _{n_k} -\alpha _{n_k} \right) + \alpha _{n_k} \rightarrow 0+ \alpha =\alpha . }$$

Since f is continuous on A, it is continuous at $\alpha $ (restricted to A). Thus, by the transference principle, $f\left( \alpha _{n_k} \right) \rightarrow f(\alpha )$ and $f\left( \beta _{n_k} \right) \rightarrow f(\alpha )$, so

$$ \lim _{k\rightarrow \infty } \left( f\left( \alpha _{n_k} \right) -f\left( \beta _{n_k} \right) \right) =0. $$

This, however, contradicts $|f(\alpha _{n_k})-f(\beta _{n_k})|\ge \varepsilon _0$. $\square $
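The role of closedness in Heine's theorem can be illustrated numerically. The following sketch (the function and the sample points are my own choices, not from the text) mimics the proof's sequences $\alpha_n$, $\beta_n$ for $f(x)=1/x$ on the bounded but not closed set $(0,1]$: the points get arbitrarily close while the function values stay far apart, so no uniform $\delta$ exists.

```python
# f(x) = 1/x is continuous on (0, 1], which is bounded but NOT closed,
# and f is not uniformly continuous there.  Following the scheme of the
# proof, take alpha_n = 1/n and beta_n = 1/(2n): |alpha_n - beta_n| -> 0,
# yet |f(alpha_n) - f(beta_n)| = n is unbounded.

def f(x):
    return 1.0 / x

for n in (10, 100, 1000):
    alpha, beta = 1.0 / n, 1.0 / (2 * n)
    gap = abs(alpha - beta)          # = 1/(2n), tends to 0
    jump = abs(f(alpha) - f(beta))   # = n, does not stay below any epsilon_0
    print(n, gap, jump)
```

On a bounded and closed set such as $[1/10, 1]$ this cannot happen, exactly as the theorem asserts.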

In many different applications of analysis we need to replace the functions involved by simpler functions that approximate the original one and are much easier to handle. An important example is the Weierstrass approximation theorem, which in the one-variable case states that if $f:[a, b]\rightarrow \mathbb {R}$ is continuous, then for every $\varepsilon >0$ there exists a polynomial g such that $|f(x)-g(x)|<\varepsilon $ for every $x\in [a, b]$. (See [7, Theorem 13.19].) Our next theorem is the generalization of this theorem to continuous functions of several variables.

Theorem 1.54.

(Weierstrass’s approximation theorem)
Let the real-valued function f be continuous on the box $R\subset {\mathbb {R}^{p}}$. Then for every $\varepsilon >0$ there exists a p-variable polynomial g such that $|f(x)-g(x)|<\varepsilon $ for every $x\in R$.

Proof.

We prove the theorem by induction on p. The case $p=1$ is covered by [7, Theorem 13.19]. (See also Remark 7.85 of this volume, where we give an independent proof.) We now consider the $p=2$ case.

Let $R=[a,b]\times [c, d]$, and let $0<\varepsilon <1$ be fixed. If f is continuous on R, then by Heine’s theorem (Theorem 1.53), f is uniformly continuous on R. Choose a positive $\delta $ such that $|f(x_1 , y_1 )-f(x_2 , y_2 )|<\varepsilon $ holds for every $(x_1 , y_1 ) , \, (x_2 , y_2 ) \in R$ satisfying $|(x_1 , y_1 )-(x_2 , y_2 )|\lt\delta $. We fix an integer $n>2(b-a )/\delta $ and divide the interval [a, b] into n equal subintervals. Let $a=t_0< t_1< \ldots < t_n =b$ be the endpoints of these subintervals.

For every $i=0,\ldots , n$, let $u_i$ denote the continuous one-variable function that is zero everywhere outside of $(t_{i-1} , t_{i+1} )$, equals 1 at the point $t_i$, and is linear on the intervals $[t_{i-1}, t_i]$ and $[t_i , t_{i+1}]$. (The numbers $t_{-1}\lt a $ and $t_{n+1}\gt b$ can be arbitrarily chosen for the functions $u_0$ and $u_n$.) The functions $u_0 ,\ldots , u_n$ are continuous, and $\sum _{i=0}^n u_i (x)=1$ for every ${x\in [a , b]}$. Consider the function

$$\begin{aligned} f_1 (x, y) =\sum _{i=0}^n f(t_i , y)\cdot u_i (x ). \end{aligned}$$

(1.10)

We show that $|f(x, y)-f_1 (x, y)|<\varepsilon $ for every $(x, y)\in R$. For a fixed $(x, y)\in R$, $u_i (x)$ is nonzero only if $|t_i -x|<2(b -a )/n<\delta $. For every such factor $u_i (x)$ we have $|(t_i ,y) -(x, y)|<\delta $, and thus $|f(t_i ,y) -f(x, y)|<\varepsilon $ by the choice of $\delta $. Since the sum of the numbers $u_i (x )$ is 1, it follows that

$$\begin{aligned} \left| f_1 (x,y)-f(x, y) \right|&= \left| \sum _{i=0}^n (f(t_i ,y)- f(x, y)) \cdot u_i (x)\right| \le \\&\le \sum _{u_i (x)\ne 0} \varepsilon \cdot u_i (x) = \varepsilon \cdot \sum _{i=0}^n u_i (x)=\varepsilon . \end{aligned}$$

By the single-variable version of Weierstrass’s approximation theorem, we can choose the polynomials $g_i$ and $h_i$ such that ${|f(t_i , y) -g_i (y)|}<{\varepsilon /(n+1)}$ for every $y\in [c , d]$, and ${|u_i (x) -h_i (x)|}<{\varepsilon /(n+1)}$ for every $x\in [a , b]$ $(i=0,\ldots , n)$. Consider the two-variable polynomial $g(x, y)=\sum _{i=0}^n g_i (y)\cdot h_i (x )$. We show that g approximates $f_1$ well on R. Indeed,

$$\begin{aligned}&|f(t_i ,y) \cdot u_i (x ) -g_i (y)\cdot h_i (x )|\le \\&\quad \le |f(t_i , y) -g_i (y)| \cdot u_i (x ) + |g_i (y)|\cdot |u_i (x)-h_i (x)| \\&\quad \le (\varepsilon /(n+1))\cdot 1 +(K+\varepsilon ) \cdot (\varepsilon /(n+1)) \le (K+2)\varepsilon /(n+1) , \end{aligned}$$

where K denotes an upper bound of |f| on R. Thus

$$|f_1 (x,y) -g(x, y)| \le \sum _{i=0}^n |f(t_i , y)\cdot u_i (x ) - g_i (y)\cdot h_i (x )|\le (K+2)\varepsilon $$

for every $(x, y)\in R$. We get $|f-g|\le |f-f_1 |+|f_1 -g|<{(K+3)\varepsilon }$ for each point in the box R. Since $\varepsilon $ was arbitrary, we have proved the theorem for ${p=2}$.

In the general case of the induction step a similar argument works. We leave the details to the reader. $\square $
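The key step of the proof, approximating f by the hat-function combination $f_1(x,y)=\sum_i f(t_i,y)u_i(x)$, can be sketched numerically. In the snippet below the box, the sample function, and the grid size are my own choices; it checks that the hats form a partition of unity and that $f_1$ is uniformly close to f.

```python
# Sketch of the construction in the proof: piecewise-linear "hat" functions
# u_i on [a, b] (value 1 at t_i, 0 outside (t_{i-1}, t_{i+1})) sum to 1,
# and f_1(x, y) = sum_i f(t_i, y) * u_i(x) approximates f on the box.

import math

a, b, n = 0.0, 1.0, 50
t = [a + i * (b - a) / n for i in range(n + 1)]
h = (b - a) / n

def u(i, x):
    # hat centered at t[i], supported on (t[i] - h, t[i] + h)
    return max(0.0, 1.0 - abs(x - t[i]) / h)

def f(x, y):
    return math.sin(3 * x) * math.cos(2 * y)   # any continuous sample function

def f1(x, y):
    return sum(f(t[i], y) * u(i, x) for i in range(n + 1))

xs = [j / 20 for j in range(21)]
# the hats form a partition of unity on [a, b]
assert all(abs(sum(u(i, x) for i in range(n + 1)) - 1.0) < 1e-9 for x in xs)
err = max(abs(f(x, y) - f1(x, y)) for x in xs for y in xs)
print(err)   # small, and it shrinks as n grows
```

Replacing each $f(t_i,\cdot)$ and $u_i$ by a one-variable polynomial approximant, as the proof does, then yields the two-variable polynomial g.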

Remark 1.55.

In the previous theorem one can replace the box R by an arbitrary bounded and closed set. More precisely, the following is true: if the set $A\subset {\mathbb {R}^{p}}$ is bounded and closed, and the function $f:A\rightarrow \mathbb {R}$ is continuous, then for every $\varepsilon >0$ there exists a p -variable polynomial g such that $|f(x)-g(x)|<\varepsilon $ holds for every $x\in A$ . See Exercises 1.59–1.63.

Exercises

1.46.

Let $f(x, y)=xy/(x^2 +y^2 )^\alpha $ if $(x, y)\ne (0,0)$, and $f(0,0)=0$. For what values of $\alpha $ will f be continuous at the origin?

1.47.

Let $f(x, y)=|x|^\alpha |y|^\beta $ if $x\ne 0$ and $y\ne 0$, and let $f(x, y)=0$ otherwise. For what values of $\alpha ,\beta $ will f be continuous at the origin?

1.48.

Let $A \subset {\mathbb {R}^{p}}$ and $f:A \rightarrow \mathbb {R}$. Show that if the limit $g(x)=\lim _{y\rightarrow x} f(y)$ exists and is finite for every $x\in A$, then g is continuous on A.

1.49.

Construct a function $f:\mathbb {R}^2\rightarrow \mathbb {R}$ such that f is continuous when restricted to any line, but f is not continuous everywhere. (H)

1.50.

Let the function $f:\mathbb {R}^2\rightarrow \mathbb {R}$ be such that the section $f_a$ is continuous for every a, and the section $f^b$ is monotone and continuous for every b. Show that f is continuous everywhere.

1.51.

Let the set $A\subset {\mathbb {R}^{p}}$ be such that every continuous function $f:A\rightarrow \mathbb {R}$ is bounded. Show that A is bounded and closed.

1.52.

Is there a two-variable polynomial with range $(0,\infty )$? (H S)

1.53.

Show that if $A\subset {\mathbb {R}^{p}}$ is closed and $f:A\rightarrow \mathbb {R}$ is continuous, then the graph of f is a closed set in $\mathbb {R}^{p+1}$.

1.54.

True or false? If the graph of $f:[a, b]\rightarrow \mathbb {R}$ is a closed set in $\mathbb {R}^2$, then f is continuous on [a, b]. (H)

1.55.

Let $A\subset {\mathbb {R}^{p}}$ and $f:A\rightarrow \mathbb {R}$. Show that the graph of f is bounded and closed in $\mathbb {R}^{p+1}$ if and only if A is bounded and closed, and f is continuous on A.

1.56.

Let $A\subset {\mathbb {R}^{p}}$. Which of the following statements is true?

- (a)If every function $f:A\rightarrow \mathbb {R}$ is continuous, then A is closed.
- (b)If every function $f:A\rightarrow \mathbb {R}$ is continuous, then A is bounded.
- (c)If every function $f:A\rightarrow \mathbb {R}$ is uniformly continuous, then A is closed.
- (d)If every function $f:A\rightarrow \mathbb {R}$ is uniformly continuous, then A is bounded.

1.57.

Let $A\subset {\mathbb {R}^{p}}$. Show that the function $f:A\rightarrow \mathbb {R}$ is continuous on A if and only for every open interval $I\subset \mathbb {R}$ there exists an open set $G\subset {\mathbb {R}^{p}}$ such that $f^{-1}(I)=A\cap G$.

1.58.

Show that if $f:{\mathbb {R}^{p}}\rightarrow \mathbb {R}$ is continuous and $g_1 ,\ldots , g_p :[a, b] \rightarrow \mathbb {R}$ are integrable on [a, b], then the function $x\mapsto f(g_1 (x),\ldots , g_p (x))$ is also integrable on [a, b].

In the next five exercises $A\subset {\mathbb {R}^{p}}$ is a fixed bounded and closed set, and $f:A{\rightarrow } \mathbb {R}$ is a fixed continuous function.

1.59.

Show that for every polynomial h and $\varepsilon >0$, there exists a polynomial g such that $\left| |h(x)|-g(x) \right| <\varepsilon $ for every $x\in A$. (S)

1.60.

Let $h_1 ,\ldots , h_n $ be polynomials. Show that for every $\varepsilon >0$, there exist polynomials $g_1 , g_2$ such that $|\max (h_1 (x) ,\ldots , h_n (x)) -g_1 (x)|<\varepsilon $ and $|\min (h_1 (x) ,\ldots , h_n (x)) -g_2 (x)|<\varepsilon $ for every $x\in A$. (S)

1.61.

Show that for every $a, b\in A$ there exists a polynomial $g_{a, b}$ such that $g_{a, b} (a)=f(a)$ and $g_{a, b} (b)=f(b)$. (S)

1.62.

Let $\varepsilon >0$ be fixed. Show that for every $a\in A$, there exists a polynomial $g_a$ such that $g_a (x)>f(x)-\varepsilon $ for every $x\in A$, and $g_a (a)<{f(a)+\varepsilon }$. (S)

1.63.

Show that if $A\subset {\mathbb {R}^{p}}$ is a bounded and closed set and ${f:A\rightarrow \mathbb {R}}$ is a continuous function, then for every $\varepsilon >0$ there exists a p-variable polynomial g such that $|f(x)-g(x)|<\varepsilon $, for every $x\in A$. (S)

Differentiation of multivariable functions shows more diversity than limits or continuity. Although some of the equivalent definitions of differentiability of one-variable functions have a straightforward generalization to functions of several variables, the notion of derivative is more complicated than that for functions of one variable. For this reason we postpone the discussion of differentiability and the derivative of functions of several variables to the next section and begin with those derivatives that we get by fixing all but one variable and differentiating the resulting single-variable function.

Definition 1.56.

Let the function f be defined in a neighborhood of the point $a=(a_1 ,\ldots , a_p ) \in {\mathbb {R}^{p}}$. Let us fix each of the coordinates of $a=(a_1 ,\ldots , a_p )$, except for the ith one, and consider the corresponding section of the function:

$$\begin{aligned} t\mapsto f_i (t)=f(a_1 ,\ldots , a_{i-1} ,t, a_{i+1} ,\ldots , a_p ). \end{aligned}$$

(1.11)

We call the derivative of the single-variable function $f_i$ at the point $a_i$ (when it exists) the ith partial derivative of the function f at a, and use any of the following notation:^{13}

$$ \frac{\partial f}{\partial x_i} (a),\ f'_{x_i}(a),\ f_{x_i}(a),\ D_{x_i} f(a), \ D_i f(a) .$$

So, for example,

$$\begin{aligned} D_i f(a)=\lim _{t\rightarrow a_i}\frac{f(a_1 ,\ldots , a_{i-1} ,t, a_{i+1} ,\ldots , a_p )-f(a)}{t-a_i}, \end{aligned}$$

(1.12)

assuming that the (finite or infinite) limit exists.

Let the function f be defined on a subset of ${\mathbb {R}^{p}}$. By the ith partial derivative function of f we mean the function $D_i f$ that is defined at every point a where the ith partial derivative of f exists and is finite, and whose value at these points is $D_i f (a)$.

Example 1.57.

We get the partial derivatives by fixing all but one of the variables and differentiating the resulting function as a single-variable function. For example, if $f(x, y)=xy(x^2 +y^2 -1)$, then

$$\frac{\partial f}{\partial x} =D_1 f(x, y)=y(x^2 +y^2 -1)+xy\cdot 2x=y\cdot (3x^2 +y^2 -1)$$

and

$$\frac{\partial f}{\partial y} =D_2 f(x, y)=x(x^2 +y^2 -1)+xy\cdot 2y = x\cdot (x^2 +3y^2 -1) $$

at every point (x, y).
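The two partial derivatives computed in Example 1.57 can be cross-checked with symmetric difference quotients; the sample point below is my own choice.

```python
# Check the hand-computed partial derivatives of f(x, y) = xy(x^2 + y^2 - 1)
# against symmetric difference quotients at a sample point.

def f(x, y):
    return x * y * (x**2 + y**2 - 1)

def D1(x, y):   # y * (3x^2 + y^2 - 1), as in Example 1.57
    return y * (3 * x**2 + y**2 - 1)

def D2(x, y):   # x * (x^2 + 3y^2 - 1), as in Example 1.57
    return x * (x**2 + 3 * y**2 - 1)

h = 1e-6
x0, y0 = 0.7, -0.3
num1 = (f(x0 + h, y0) - f(x0 - h, y0)) / (2 * h)   # approximates D1 f(x0, y0)
num2 = (f(x0, y0 + h) - f(x0, y0 - h)) / (2 * h)   # approximates D2 f(x0, y0)
print(abs(num1 - D1(x0, y0)), abs(num2 - D2(x0, y0)))   # both tiny
```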

Remark 1.58.

Continuity does not follow from the existence of finite partial derivatives. Let $f(x, y)=xy/(x^2 +y^2 )$ if $(x, y)\ne (0,0)$, and let $f(0,0)=0$. Both partial derivatives of f exist at the origin, and they are both zero, since the sections $f_0$ and $f^0$ are both constant with value zero. (It is also clear that the partial derivatives of f exist and are finite at every other point $(x, y)\ne (0,0)$.)

However, by Example 1.34.2, f is not continuous at the origin.
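The discontinuity in Remark 1.58 is easy to observe numerically: along the axes the function vanishes (which is why both partial derivatives at the origin are zero), while along the diagonal it is constantly 1/2.

```python
# Remark 1.58 numerically: f(x, y) = xy/(x^2 + y^2), f(0,0) = 0.
# The sections along the axes are identically 0, but along y = x the value
# is 1/2 however close we come to the origin, so f is not continuous there.

def f(x, y):
    return 0.0 if (x, y) == (0.0, 0.0) else x * y / (x**2 + y**2)

for t in (1.0, 1e-3, 1e-9):
    print(f(t, 0.0), f(t, t))   # 0.0 on the axis, 0.5 on the diagonal
```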

According to one of the most important applications of differentiation of one-variable functions, if a is a local extremum point of the function f and if f is differentiable at a, then $f' (a)=0$. (See [7, Theorem 12.44, part (v)].) This theorem can easily be generalized to multivariable functions.

Definition 1.59.

We say that a function f has a local maximum (or local minimum) at the point a if a has a neighborhood U such that f is defined on U and for every $x\in U$ we have $f(x)\le f(a)$ (or $f(x)\ge f(a)$). In this case we say that the point a is a local maximum point (or local minimum point) of the function f.

If for every point $x\in U\setminus \{ a\}$ we have $f(x)< f(a)$ (or $f(x)\gt f(a)$), then we say that a is a strict local maximum point (or strict local minimum point).

We call the local maximum and local minimum the local extrema, while we call the local maximum points and local minimum points local extremal points, collectively.

Let f have a local maximum at the point $a=(a_1 ,\ldots , a_p )$. Obviously, for every $i=1,\ldots , p$, the function $f_i$ defined by (1.11) also has a local maximum at $a_i$. If $f_i$ is differentiable at $a_i$, then $f'_i (a_i )=0$. It is easy to see that $f'_i (a_i )=\pm \infty $ cannot happen, and thus we have proved the following theorem.

Theorem 1.60.

If the function f has a local extremum at the point $a\in {\mathbb {R}^{p}}$, and if the partial derivatives of f exist at a, then $D_i f(a)= 0$ for each $i=1,\ldots , p$. $\square $

Applying Theorems 1.51 and 1.60, we can determine the extrema of functions that are continuous on a bounded and closed set A and have partial derivatives in the interior of A. This method, described in the next theorem, corresponds to the technique that finds the extrema of functions of one variable that are continuous on an interval [a, b] and differentiable in (a, b). (See Example 12.46, Remark 12.47, and Example 12.48 in [7].)

Theorem 1.61.

Let $A\subset {\mathbb {R}^{p}}$ be bounded and closed, let $f:A\rightarrow \mathbb {R}$ be continuous, and let the partial derivatives of f exist at every internal point of A. Every point where f takes its greatest (least) value is either a boundary point of A, or else an internal point of A where the partial derivatives $D_i f$ are zero for every $i=1,\ldots , p$.

Proof.

By Weierstrass’s theorem (Theorem 1.51), f has a maximal value on A. Let $a\in A$ be a point where f takes its largest value. If $a\in \partial A$, then we are done. If, on the other hand, ${a\in \mathrm{int}~A}$, then f has a local maximum at a. By the assumptions of the theorem, the partial derivatives of f exist at the point a; thus $D_i f(a)= 0$ for every $i=1,\ldots , p$ by Theorem 1.60. $\square $

Example 1.62.

1. Find the maximum value of the function $f(x, y)=xy(x^2 +y^2 -1)$ on the disk $K=\{ (x, y):x^2 +y^2 \le 1\}$. In Example 1.10.1.c we saw that the boundary of K is the circle $S=\{ (x, y):x^2 +y^2 =1\}$. Since $S\subset K$, it follows that K is closed. The function f is a polynomial; thus it is continuous (see Theorem 1.48), and then, by Weierstrass’s theorem, f has a maximal value on K. The value of f is zero on the whole set S. Since f is positive at every point $(x, y)\in \mathrm{int}~K$ with $x>0$ and $y<0$, it follows that f takes its largest value somewhere in the interior of K.

Let $(a, b)\in \mathrm{int}~K$ be a point where the value of f is the largest. Now, $0=D_1 f(a, b) =b\cdot (3a^2 +b^2 -1)$ and $0=D_2 f(a, b) =a\cdot (a^2 +3b^2 -1)$.

If $a=0$, then $b=0$ (since $|b|<1$), which is impossible, since the value of the function at the origin is zero, even though its maximal value is positive. Similarly, we can exclude the $b=0$ case. So, $a\ne 0\ne b$, whence $a^2 +3b^2 -1=3a^2 +b^2 -1=0$, and we get $a=\pm 1/2$ and $b=\pm 1/2$. Of these cases, f takes the value 1/8 at the points $(\pm 1/2, \mp 1/2)$, while it takes the value $-1/8$ at the points $(\pm 1/2, \pm 1/2)$. Thus, the largest value of f is 1/8, and f takes this value at two points, namely at $(\pm 1/2, \mp 1/2)$.
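A brute-force grid search (my own check, not part of the text) confirms the value found in Example 1.62.1: the maximum of f on the closed unit disk is 1/8.

```python
# Grid search for the maximum of f(x, y) = xy(x^2 + y^2 - 1) on the closed
# unit disk.  The analysis in Example 1.62.1 gives the exact maximum 1/8,
# attained at (1/2, -1/2) and (-1/2, 1/2); the grid contains these points.

def f(x, y):
    return x * y * (x**2 + y**2 - 1)

n = 400
best = float("-inf")
for i in range(-n, n + 1):
    for j in range(-n, n + 1):
        x, y = i / n, j / n
        if x * x + y * y <= 1.0:
            best = max(best, f(x, y))
print(best)   # 0.125 = 1/8
```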

2. Find the triangle with largest area that can be inscribed in a circle with a fixed radius.

Consider a triangle H defined by its three vertices that lie on the circle $S=\{ (u, v):u^2 +v^2 =1\}$. If the angles between the segments connecting the origin and the vertices are x, y, z, then we can compute the area of H with the help of the formula $(\sin x+\sin y +\sin z) /2$. (This follows from the fact that if the two equal sides of an isosceles triangle are of unit length, and the angle between these two sides is $\alpha $, then the area of the triangle is $\frac{1}{2}\cdot \sin \alpha $.) This is true even if one of the angles x, y, z is larger than $\pi $. Since $z=2\pi -x-y $, we need to find the maximum value of the function $f(x, y)=\sin x+\sin y -\sin (x+y)$ on the set $A=\{ (x, y):x\ge 0,\ y\ge 0, \ x+y\le 2\pi \}$.

The set A is nothing other than the triangle defined by the points (0, 0), $(2\pi , 0)$, and $(0,2\pi )$. Obviously, this is a bounded and closed set, and thus Theorem 1.61 can be applied.

It is easy to see that the function f is zero on the boundary of the set A. Since f takes a positive value (e.g., at the point $(\pi /2,\pi /2)$), it follows that f takes its maximum at an internal point (x, y), for which $D_1 f(x, y)=\cos x -\cos (x+y)=0$ and $D_2 f(x, y)=\cos y -\cos (x+y)=0$. We get $\cos x=\cos y$, so either $y=2\pi -x$ or $y=x$. In the first case, (x, y) lies on the boundary of A, which is impossible. Thus $y=x$ and $\cos x=\cos 2x$. Since $x=2x $ is not possible (it would imply that $x=0$, whence (x, y) would be on the boundary of A again), we must have $2x=2\pi -x $, and $x=y=2\pi /3$. We have proved that the triangle with the largest area that can be inscribed in a circle with fixed radius is an equilateral triangle. $\square $
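The conclusion of Example 1.62.2 can also be confirmed by a grid search (my own numerical check): the maximum of $f(x,y)=\sin x+\sin y-\sin(x+y)$ on the triangle is attained near $x=y=2\pi/3$, with value $3\sqrt{3}/2$, the area formula value of the equilateral triangle.

```python
# Grid search over the triangle x >= 0, y >= 0, x + y <= 2*pi for the
# maximum of f(x, y) = sin x + sin y - sin(x + y).  By Example 1.62.2 the
# maximum is at x = y = 2*pi/3 with value 3*sqrt(3)/2 ~ 2.598.

import math

def f(x, y):
    return math.sin(x) + math.sin(y) - math.sin(x + y)

n = 600
best, x0, y0 = float("-inf"), 0.0, 0.0
for i in range(n + 1):
    for j in range(n + 1 - i):          # keeps x + y <= 2*pi
        x, y = 2 * math.pi * i / n, 2 * math.pi * j / n
        if f(x, y) > best:
            best, x0, y0 = f(x, y), x, y
print(x0, y0, best)   # x0 = y0 = 2*pi/3 (a grid point), best = 3*sqrt(3)/2
```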

Exercises

1.64.

Find the points where the partial derivatives of the following two-variable functions exist.

- (a)$|x+y|$;
- (b)$\root 3 \of {x^3 +y^3} $;
- (c)$f(x, y)=x$ if $x\in \mathbb {Q}$, $f(x, y)=y$ if $x\notin \mathbb {Q}$.

1.65.

Show that the partial derivatives of the function $f(x, y)=xy/\sqrt{x^2 +y^2} ,\ f(0,0)=0$ exist and are bounded everywhere in the plane.

1.66.

Construct a two-variable function whose partial derivatives exist everywhere, but the function is unbounded in every neighborhood of the origin.

1.67.

Let $f:\mathbb {R}^2\rightarrow \mathbb {R}$. Show that if $D_1 f \equiv 0$, then f depends only on the variable y. If $D_2 f \equiv 0$, then f depends only on the variable x.

1.68.

Show that if $f:\mathbb {R}^2\rightarrow \mathbb {R}$, $D_1 f \equiv 0$, and $D_2 f \equiv 0$, then f is constant.

1.69.

Show that if $G\subset {\mathbb {R}^{p}}$ is a connected open set, the partial derivatives of the function $f:G \rightarrow \mathbb {R}$ exist everywhere, and $D_1 f (x)=\ldots =D_p f (x)=0$ for every $x\in G$, then f is constant. (H)

1.70.

Show that if the partial derivatives of the function $f:\mathbb {R}^2\rightarrow \mathbb {R}$ exist everywhere and $|D_1 f |\le 1,\ |D_2 f |\le 1$ everywhere, then f is continuous. (Furthermore, f has the Lipschitz property.)^{14}

1.71.

Construct a two-variable polynomial that has two local maximum points but no local minimum points. (H S)

1.72.

Find the local extremal points of the function $x^2 +xy +y^2 -4\log x -10 \log y$.

1.73.

Find the maximum of $x^3 +y^2 -xy$ on the square $[0,1]\times [0,1]$.

1.74.

Find the minimum of $x+ \frac{y^2}{4x}+ \frac{z^2}{y} + \frac{2}{z}$ in the octant $x,y, z >0$. (First prove that the function can be restricted to a bounded and closed set.)

1.75.

Find the minimum of $(x^3 +y^3 +z^3 )/(xyz)$ on the set $\{ (x,y, z) \in \mathbb {R}^3 : x,y, z >0 \}$.

1.76.

Find the maximum and minimum values of $xy\cdot \log (x^2 +y^2 )$ on the disk ${x^2 +y^2 \le R^2}$.

1.77.

Find the maximum of $x^{\sqrt{2}} \cdot y^e \cdot z^\pi $ restricted to $x,y, z\ge 0$ and $x+y+z{=}1$.

1.78.

Find the minimum value of the function $2x^4 +y^4 -x^2 -2y^2$.

1.79.

What is the minimum value of $xy + \frac{50}{x}+ \frac{20}{y}$ on the set $x, y>0$?

Weierstrass’s approximation theorem states that if f is a continuous function defined on a box (or, more generally, on a closed and bounded set), then f can be approximated by polynomials (see Theorem 1.54 and Exercises 1.59–1.63). However, we cannot control the degree of the approximating polynomials: in general, it may happen that we need polynomials of arbitrarily high degrees for the approximation. The situation is different in the case of local approximation, when we want to approximate a function in a neighborhood of a given point. For an important class of functions, good local approximation is possible using polynomials of first degree.

In the case of single-variable analysis, differentiability is equivalent to local approximability by first-degree polynomials (see [7, Theorem 12.9]). For multivariable functions, differential quotients do not have an immediate equivalent (since we cannot divide by vectors), and therefore, we cannot define differentiability via the limits of differential quotients. Approximability by first-degree polynomials, however, can be generalized verbatim to multivariable functions.

We call the function $\ell :{\mathbb {R}^{p}}\rightarrow \mathbb {R}$ a linear function if there exist real numbers $\alpha _1 ,\ldots ,\alpha _p$ such that $\ell (x)=\alpha _1 x_1 +\ldots +\alpha _p x_p$ for every $x=(x_1 ,\ldots , x_p )\in {\mathbb {R}^{p}}$.

Definition 1.63.

Let the function f be defined in a neighborhood of the point $a\in {\mathbb {R}^{p}}$. We say that f is differentiable at the point a if there exists a linear function $\ell (x)$ such that

$$\begin{aligned} f(x)=f(a) +\ell (x-a)+\varepsilon (x)\cdot |x-a| \end{aligned}$$

(1.13)

for every $x\in D(f)$, where $\varepsilon (x)\rightarrow 0$ as $x\rightarrow a$.

Remark 1.64.

1. It is clear that the function f is differentiable at the point a if and only if it is defined in a neighborhood of $a\in {\mathbb {R}^{p}}$ and there exists a linear function $\ell (x)$ such that

$$\lim _{x\rightarrow a} \frac{f(x)-f(a) -\ell (x-a)}{|x-a|} =0.$$

2. If $p=1$, then the notion of differentiability defined above is equivalent to the “usual” definition, that is, to the existence of a finite limit of the differential quotient $(f(x)-f(a))/(x-a)$ as $x\rightarrow a$.

3. If a function depends only on one of its variables, then the differentiability of the function is equivalent to the differentiability of the corresponding single-variable function. More precisely, let $a_1 \in \mathbb {R}$, and let a single-variable function f be defined in a neighborhood of $a_1$. Let $g(x_1 ,\ldots , x_p )=f(x_1 )$ for every $x_1 \in D(f)$ and $x_2 ,\ldots , x_p \in \mathbb {R}$. For arbitrary $a_2 ,\ldots , a_p$, the function g is differentiable at the point ${a=(a_1 ,\ldots , a_p)}$ if and only if f is differentiable at $a_1$ (see Exercise 1.82).

Example 1.65.

1. It follows from the definition that every polynomial of degree at most one is differentiable everywhere.

2. Let $f(x, y)= \frac{x^2 y^2}{x^2 +y^2}$ if $(x, y)\ne (0,0)$, and let $f(0,0)=0$. We prove that f is differentiable at the origin. Indeed, if $\ell $ is the constant zero function and $(x, y)\ne (0,0)$, then we have

$$\begin{aligned} \left| \frac{f(x,y)-\ell (x,y)}{|(x, y)|} \right|&=\frac{x^2 y^2}{(x^2 +y^2 )\cdot \sqrt{x^2 +y^2 } }=\frac{x^2 y^2}{(x^2 +y^2 )^{3/2} } \le \\&\le \frac{\max (x^2 , y^2 )^2}{\max (x^2 , y^2 )^{3/2}} =\max (x^2 , y^2 )^{1/2}, \end{aligned}$$

and (1.13) holds.
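The bound in Example 1.65.2 can be observed numerically: the ratio $|f(x,y)|/|(x,y)|$ tends to 0 as $(x,y)\to(0,0)$, exactly what differentiability at the origin with $\ell=0$ requires. The sample directions below are my own choices.

```python
# Numerical check of Example 1.65.2: for f(x, y) = x^2 y^2 / (x^2 + y^2),
# the ratio |f(x, y)| / |(x, y)| tends to 0, so f is differentiable at the
# origin with the zero linear function.

import math

def ratio(x, y):
    return (x**2 * y**2 / (x**2 + y**2)) / math.hypot(x, y)

for t in (1.0, 0.1, 0.01, 0.001):
    print(ratio(t, t), ratio(t, -2 * t))   # -> 0 along both directions
```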

We know that every single-variable, differentiable function is continuous (see [7, Theorem 12.4]). The following theorem generalizes this fact for functions with an arbitrary number of variables.

Theorem 1.66.

If the function f is differentiable at a point a, then f is continuous at a.

Proof.

Since the right-hand side of (1.13) converges to f(a) as $x\rightarrow a$, it follows that

$$ \lim _{x\rightarrow a}f(x)=f(a).\ \square $$

The following fundamental theorem expresses the linear function appearing in the definition of differentiability with the help of the partial derivatives.

Theorem 1.67.

If a function f is differentiable at a point $a=(a_1 ,\ldots , a_p )\in {\mathbb {R}^{p}}$, then

- (i)every partial derivative of f exists and is finite at a, and furthermore,
- (ii)there is only one function $\ell $ satisfying Definition 1.63, namely the function $$\ell (x)=D_1 f(a)x_1 +\ldots +D_p f(a)x_p .$$

Proof.

Suppose that (1.13) holds for the linear function $\ell = \alpha _1 x_1 +\ldots +\alpha _p x_p$. Let i be fixed, and apply (1.13) with the point $x=(a_1 ,\ldots , a_{i-1} ,t, a_{i+1} ,\ldots , a_p )$. We get that

$$f_i (t)=f(a)+ \alpha _i (t-a_i )+ \varepsilon (x)\cdot |t-a_i |,$$

where $f_i$ is the function defined at (1.11). Since $f_i (a_i )=f(a)$, we have

$$\frac{f_i (t)-f_i (a_i )}{t-a_i} =\alpha _i \pm \varepsilon (x),$$

and thus by $\lim _{x\rightarrow a} \varepsilon (x)=0$, we obtain that $f_i$ is differentiable at the point $a_i$, and $f'_i (a_i )=\alpha _i $. Therefore, by the definition of the partial derivatives, $D_i f(a)=\alpha _i $. This is true for every $i=1,\ldots , p$, and thus (i) and (ii) are proved. $\square $

Corollary 1.68.

Let f be defined in a neighborhood of $a\in {\mathbb {R}^{p}}$. The function f is differentiable at the point $a\in {\mathbb {R}^{p}}$ if and only if all partial derivatives of f exist at a, they are finite, and

$$\begin{aligned} f(x) = f(a) + D_1 f(a)(x_1 - a_1) + \ldots + D_p f(a)(x_p - a_p) + \varepsilon (x)\cdot |x - a| \end{aligned}$$

(1.14)

for every $x\in D(f)$, where $\lim _{x\rightarrow a} \varepsilon (x)=0$. $\square $

Example 1.69.

1. We show that the function $f(x, y)=xy$ is differentiable at (1, 2). Since $D_1 f(1,2)=2$ and $D_2 f(1,2)=1$, we need to prove

$$\lim _{(x, y)\rightarrow (1,2)} \frac{xy-2-2(x-1)-(y-2)}{\sqrt{(x-1)^2 +(y-2)^2}} =0.$$

Considering that the numerator is $(x-1)(y-2)$ and

$$ \left| \frac{(x-1)(y-2)}{\sqrt{(x-1)^2 +(y-2)^2}}\right| \le |y-2| \rightarrow 0 $$

as $(x, y)\rightarrow (1,2)$, we obtain that indeed, f is differentiable at (1, 2).

2. The function |x| is continuous but not differentiable at 0. This is true in the multivariable case as well. Indeed, the partial derivatives of $|x|=\sqrt{x_1^2 +\ldots +x_p^2}$ do not exist at the origin. Since $|x|=|t|$ at the point $x=(0,\ldots , 0 ,t, 0,\ldots , 0)$, the fraction on the right-hand side of (1.12) is $\frac{|t|-|0|}{t-0}$, which does not have a limit as $t\rightarrow 0$. Therefore, by Theorem 1.67, |x| is not differentiable at the origin.

3. Consider the function $f(x, y)=\sqrt{|xy|}$ on $\mathbb {R}^2$. By Corollary 1.50, f is continuous everywhere. We prove that f is not differentiable at the origin. In contrast to our previous example, the partial derivatives do exist at the origin. Indeed, the sections $f_0$ and $f^0$ are both zero, and hence their derivatives are also constant and equal to zero, i.e., $D_1 f(0,0)=D_2 f(0,0)=0$.

Now, f is differentiable at the origin if and only if

$$\begin{aligned} \lim _{(x, y)\rightarrow (0,0)}\frac{\sqrt{|xy|}}{\sqrt{x^2 +y^2}} =0 \end{aligned}$$

(1.15)

holds (see Corollary 1.68). However, the value of the fraction on the line $y=x$ is $1/\sqrt{2} $, and consequently, there exists a point in every neighborhood of (0, 0) where the fraction is $1/\sqrt{2} $. Thus (1.15) does not hold, and f is not differentiable at the point (0, 0).
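The failure of the limit (1.15) is easy to observe numerically: the fraction is 0 along the axes but $1/\sqrt{2}$ along the diagonal, no matter how close we come to the origin.

```python
# Example 1.69.3 numerically: the fraction sqrt(|xy|) / sqrt(x^2 + y^2)
# equals 0 along the axes but 1/sqrt(2) on the line y = x, so the limit
# (1.15) does not exist and f(x, y) = sqrt(|xy|) is not differentiable at
# the origin, even though D1 f(0,0) = D2 f(0,0) = 0.

import math

def frac(x, y):
    return math.sqrt(abs(x * y)) / math.hypot(x, y)

for t in (0.1, 1e-4, 1e-8):
    print(frac(t, 0.0), frac(t, t))   # 0.0 and ~0.70710678, for every t
```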

The right-hand side of the equality (1.14) can be simplified if we notice that $D_1 f(a)(x_1 -a_1 ) +\ldots +D_p f(a)(x_p -a_p )$ is nothing other than the scalar product of the vectors $(D_1 f(a),\ldots , D_p f(a))$ and $x-a$. This motivates the following definition.

Definition 1.70.

If f is differentiable at the point $a\in {\mathbb {R}^{p}}$, then the vector

$$ (D_1 f(a),\ldots , D_p f(a)) $$

is said to be the derivative vector of f at a and is denoted by $f'(a)$.

Using the notation above, (1.14) becomes $f(x)=f(a)+ \langle f'(a), x-a\rangle +\varepsilon (x)\cdot |x-a|$. In the single-variable case this is the well-known formula $f(x)=f'(a)\cdot (x-a) +f(a) +\varepsilon (x)\cdot |x-a|$.
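The formula $f(x)=f(a)+\langle f'(a), x-a\rangle +\varepsilon(x)\cdot|x-a|$ can be tested numerically; in the sketch below the function $e^x\sin y$ and the point a are my own choices. The error of the first-order approximation, divided by $|x-a|$, shrinks as x approaches a, exactly as the definition demands.

```python
# First-order approximation via the derivative vector: for
# f(x, y) = e^x * sin y, f'(a) = (e^x sin y, e^x cos y) computed by hand,
# the error |f(x) - f(a) - <f'(a), x - a>| is o(|x - a|).

import math

def f(x, y):
    return math.exp(x) * math.sin(y)

def grad(x, y):                       # (D1 f, D2 f)
    return (math.exp(x) * math.sin(y), math.exp(x) * math.cos(y))

a = (0.5, 1.0)
g = grad(*a)
for h in (1e-1, 1e-2, 1e-3):
    x = (a[0] + h, a[1] - h)
    lin = f(*a) + g[0] * (x[0] - a[0]) + g[1] * (x[1] - a[1])
    err = abs(f(*x) - lin)
    print(h, err / math.hypot(h, h))  # error / |x - a| -> 0 as h -> 0
```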

The following theorem gives a useful sufficient condition for differentiability.

Theorem 1.71.

Let f be defined in a neighborhood of $a\in {\mathbb {R}^{p}}$. If the partial derivatives of f exist in a neighborhood of a and they are continuous at a, then f is differentiable at a.

Proof.

We prove the result for $p=3$. It is straightforward to generalize the proof for an arbitrary p.

Let $\varepsilon >0$ be fixed. Since the partial derivatives of f exist in a neighborhood of a and they are continuous at a, there exists $\delta >0$ such that $|D_i f(x) -D_i f(a)|<\varepsilon $ for every $x\in B(a,\delta )$ and $i=1,2,3$.

Let us fix $x=(x_1 , x_2 , x_3 )\in B(a,\delta )$ and connect the points ${a=(a_1 , a_2 , a_3 )}$ and x with a polygonal line consisting of at most three segments, each parallel to one of the axes. Let ${u=(x_1 , a_2 , a_3 )}$ and ${v=(x_1 , x_2 , a_3 )}$. The segment [a, u] is parallel to the x-axis (including the possibility that it is reduced to a point), the segment [u, v] is parallel to the y-axis, and the segment [v, x] is parallel to the z-axis.

The partial derivative $D_1 f$ exists and is finite at each point of the segment [a, u], and thus the section $t\mapsto f(t, a_2 , a_3 )$ is differentiable on the interval $[a_1 , x_1 ]$, and its derivative is $D_1 f(t, a_2 , a_3 )$ there. By the mean value theorem,^{15} there is a point $c_1 \in [a_1 , x_1 ]$ such that

$$f(u)-f(a)=f(x_1 , a_2 , a_3 )-f(a_1 , a_2 , a_3 )=D_1 f(c_1 , a_2 , a_3 ) \cdot (x_1 -a_1 ).$$

Since $(c_1 , a_2 , a_3 )\in B(a,\delta )$, we have $|D_1 f(c_1 , a_2 , a_3 ) -D_1 f(a)|<\varepsilon $, and thus

$$\begin{aligned} |f(u)-f(a) -D_1 f(a) (x_1 -a_1 )|\le \varepsilon \cdot |x_1 -a_1 | \le \varepsilon \cdot |x-a| \end{aligned}$$

(1.16)

follows. Similarly, the partial derivative $D_2 f$ exists and is finite everywhere on the segment [u, v]; thus the section $t\mapsto f(x_1 ,t , a_3 )$ is differentiable on the interval $[a_2 , x_2 ]$, and its derivative is $D_2 f(x_1 ,t , a_3 )$ there. By the mean value theorem, there is a point $c_2 \in [a_2 , x_2 ]$ such that

$$f(v)-f(u)=f(x_1 , x_2 , a_3 )-f(x_1 , a_2 , a_3 )=D_2 f(x_1 , c_2 , a_3 ) \cdot (x_2 -a_2 ).$$

Since $(x_1 , c_2 , a_3 )\in B(a,\delta )$, it follows that $| D_2 f(x_1 , c_2 , a_3 ) -D_2 f(a)|<\varepsilon $, and

$$\begin{aligned} |f(v)-f(u) -D_2 f(a) (x_2 -a_2 )|\le \varepsilon \cdot |x_2 -a_2 |\le \varepsilon \cdot |x-a|. \end{aligned}$$

(1.17)

By the same argument we obtain

$$\begin{aligned} |f(x)-f(v) -D_3 f(a) (x_3 -a_3 )|\le \varepsilon \cdot |x-a|. \end{aligned}$$

(1.18)

Applying the triangle inequality yields

$$\begin{aligned} \big | f(x)- (D_1 f(a) (x_1 -a_1 ) +&D_2 f(a) (x_2 -a_2 ) +D_3 f(a) (x_3 -a_3 ) +f(a) ) \big | \le \\&\le |f(u)-f(a) -D_1 f(a) (x_1 -a_1 )|+\\&+ |f(v)-f(u) -D_2 f(a) (x_2 -a_2 )|+\\&+ |f(x)-f(v) -D_3 f(a) (x_3 -a_3 )| , \end{aligned}$$

whence the approximations (1.16), (1.17), and (1.18) give

$$\big | f(x)- (D_1 f(a) (x_1 -a_1 ) +D_2 f(a) (x_2 -a_2 ) + D_3 f(a) (x_3 -a_3 ) +f(a))\big | \le 3\varepsilon \cdot |x-a|.$$

Since $\varepsilon $ was arbitrary, we have

$$ \lim _{x\rightarrow a} \frac{f(x)-(D_1 f(a)(x_1 -a_1 )+D_2 f(a)(x_2 -a_2 )+D_3 f(a)(x_3 -a_3 )+f(a))}{|x - a|} = 0, $$

and f is differentiable at the point a. $\square $
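The conclusion of Theorem 1.71 can be checked numerically on a concrete function. The sketch below (the function $f(x,y,z)=\sin x + yz$ and the point $a=(0,0,0)$ are chosen purely for illustration) verifies that the error of the linear approximation is small compared with $|x-a|$:

```python
import math

def f(x, y, z):
    # a smooth function with continuous partial derivatives (illustrative choice)
    return math.sin(x) + y * z

# at a = (0,0,0): D1 f = cos(0) = 1, D2 f = z = 0, D3 f = y = 0, f(a) = 0
def linear_approx(x, y, z):
    return 1.0 * x + 0.0 * y + 0.0 * z + 0.0

for h in (1e-2, 1e-3, 1e-4):
    x = y = z = h
    err = abs(f(x, y, z) - linear_approx(x, y, z))
    norm = math.sqrt(3) * h          # |x - a|
    # the ratio err / |x - a| tends to 0 as h -> 0 (here the error is O(h^2))
    assert err / norm < 2 * h
```

The assertion only illustrates the limit for a few step sizes; it is not a proof, but it makes the rate of decay of the error visible.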

Corollary 1.72.

The polynomial functions are differentiable everywhere. The rational functions are differentiable at every point of their domain.

Proof.

The partial derivatives of a polynomial are themselves polynomials, and hence they are continuous everywhere. Thus, by Theorem 1.71, every polynomial is differentiable everywhere. Similarly, the partial derivatives of a rational function are rational functions with the same domain; they are continuous there, and Theorem 1.71 applies again. $\square $

Remark 1.73.

By Theorems 1.66, 1.67, and 1.71 we have the following:

- (i)if f is differentiable at a point a , then f is continuous at a , and its partial derivatives exist and are finite at a ; furthermore,
- (ii)if the partial derivatives of f exist in a neighborhood of a and are continuous at a , then f is differentiable at a .

We prove that the converses of these implications are not true.

Let $f(x, y)=\frac{x^2 y}{x^2 +y^2}$ if $(x, y)\ne (0,0)$, and let $f(0,0)=0$. In Example 1.34.1 we proved that the limit of f at (0, 0) is zero, and thus f is continuous at the origin. (Furthermore, f is continuous everywhere by Theorem 1.48.) The partial derivatives of f exist everywhere. If $a\ne 0$, then the section $f_a (y)=a^2 y/(a^2 +y^2 )$ is differentiable everywhere, and if $a=0$, then $f_a$ is zero everywhere; thus it is also differentiable everywhere. The same is true for the sections $f^b$. Thus the partial derivatives $D_1 f ,\ D_2 f$ exist everywhere and $D_1 f(0,0)=D_2 f(0,0)=0$.

By Theorem 1.67, f is differentiable at the origin if and only if

$$\begin{aligned} \lim _{(x, y)\rightarrow (0,0)}\frac{x^2 y}{(x^2 +y^2 )\sqrt{x^2 +y^2}} =0. \end{aligned}$$

(1.19)

However, the value of the fraction is $\pm 1/(2\sqrt{2}) $ at every point of the line $y=x$ other than the origin, and hence there exists a point in every neighborhood of (0, 0) where the fraction takes the value $\pm 1/(2\sqrt{2}) $. Therefore, (1.19) does not hold, and f is not differentiable at (0, 0). We have shown that the converse of statement (i) is not true.

One can check that the function $f(x)=x^2 \cdot \sin (1/x)$, $f(0)=0$, is differentiable everywhere on $\mathbb {R}$, but its derivative is not continuous at zero (see [7, Example 13.43]). This function shows that the converse of statement (ii) is not true for single-variable functions. By Remark 1.64.3, $g(x_1 ,\ldots , x_p )=f(x_1 )$ is differentiable everywhere on ${\mathbb {R}^{p}}$, and since $D_1 g(x_1 ,\ldots , x_p )=f'(x_1 )$ for every $x\in {\mathbb {R}^{p}}$, the partial derivative $D_1 g$ is not continuous at the origin. We have therefore shown that the converse of (ii) is also not true for p-variable functions.
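The failure of (1.19) for $f(x,y)=x^2 y/(x^2+y^2)$ can be seen numerically: along the line $y=x$ the quotient is constant. A minimal sketch (points chosen for illustration):

```python
import math

def f(x, y):
    return 0.0 if (x, y) == (0.0, 0.0) else x * x * y / (x * x + y * y)

# the quotient whose limit is tested in (1.19)
def q(x, y):
    return f(x, y) / math.hypot(x, y)

# along y = x the quotient equals +-1/(2*sqrt(2)) at every point, so the
# limit at the origin cannot be 0 and (1.19) fails
for t in (1e-1, 1e-3, 1e-6):
    assert abs(q(t, t) - 1 / (2 * math.sqrt(2))) < 1e-12
    assert abs(q(-t, -t) + 1 / (2 * math.sqrt(2))) < 1e-12
```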

If f is a differentiable function of one variable, then the graph of the first-degree polynomial approximating f in a neighborhood of a is nothing but the tangent of the graph of f at the point (a, f(a)). We want to find an analogous statement in the multivariable case.

In three dimensions, planes are given by equations of the form $a_1 x_1 +a_2 x_2 +a_3 x_3 =b$, where at least one of the coefficients $a_1 , a_2 , a_3$ is nonzero. This can be shown as follows. Let S be a plane and let c be a point in S. Let a be a nonzero vector perpendicular to the plane S. We know that a point x is a point of the plane S if and only if the vector $x-c$ is perpendicular to a, i.e., if $\langle x-c, a\rangle =0$. Thus, ${x\in S}$ if and only if $\langle a,x \rangle =\langle a, c \rangle $. Using the notation $x=(x_1 , x_2 , x_3 )$, $a=(a_1 , a_2 , a_3 )$, and $c=(c_1 , c_2 , c_3 )$ we have that $x\in S$ if and only if $a_1 x_1 +a_2 x_2 +a_3 x_3 =b$, where $b=\langle a, c \rangle $.

Now suppose that $a_1 , a_2 , a_3 , b\in \mathbb {R}$, and at least one of $a_1 , a_2 , a_3$ is nonzero. Let $a=(a_1 , a_2 , a_3 )$. Choose a vector c such that $\langle a, c \rangle =b$. Obviously, the equality $a_1 x_1 +a_2 x_2 +a_3 x_3 =b$ holds if and only if ${\langle x-c, a\rangle }=0$, i.e., if the vector $x-c$ is perpendicular to a. We get that $a_1 x_1 +a_2 x_2 +a_3 x_3 =b$ is the equation of the plane containing the point c and perpendicular to the vector a.
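The correspondence between the equation $\langle a, x\rangle = b$ and the plane through c with normal a can be illustrated numerically (the normal vector, point, and direction below are arbitrary examples):

```python
# arbitrarily chosen data for illustration: normal vector a, point c of the plane
a = (1.0, -1.0, 2.0)
c = (1.0, 2.0, 3.0)

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

b = dot(a, c)                        # b = <a, c> = 1 - 2 + 6 = 5
w = (1.0, 1.0, 0.0)                  # a direction lying in the plane: <a, w> = 0
assert dot(a, w) == 0.0

# every point x = c + t*w satisfies a1*x1 + a2*x2 + a3*x3 = b
for t in (-2.0, 0.5, 3.0):
    x = tuple(ci + t * wi for ci, wi in zip(c, w))
    assert abs(dot(a, x) - b) < 1e-12
```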

Let $g(x_1 , x_2 )=a_1 x_1 +a_2 x_2 +b$ be a first-degree polynomial. Then the graph of g, i.e., the set $\{ (x_1 , x_2 , x_3 ):x_3 =a_1 x_1 +a_2 x_2 +b\}$, is a plane. Conversely, if $a_1 x_1 +a_2 x_2 +a_3 x_3 =b$ is the equation of a plane S that satisfies ${a_3 \ne 0}$, then S is the graph of the first-degree polynomial $g(x_1 , x_2 )=-(a_1 /a_3 ) x_1 -(a_2 /a_3 ) x_2 +(b/a_3 )$.

We can now generalize the definition of the tangent to the case of two-variable functions. Let us switch from the coordinate notation $(x_1 , x_2 , x_3 )$ to the notation (x, y, z).

Definition 1.74.

Let $(a, b)\in \mathbb {R}^2$ be fixed, and let f be defined in a neighborhood of the point (a, b). We say that the plane S is the tangent plane of $\mathrm{graph}\, f$ at the point (a, b, f(a, b)) if S contains the point (a, b, f(a, b)), and S is the graph of a first-degree polynomial g that satisfies

$$ \lim _{(x,y)\rightarrow (a,b)} \frac{f(x,y)-g (x,y)}{|(x,y)-(a, b)|}=0. $$

It is clear from Remark 1.64.1 that the graph of f has a tangent plane at the point (a, b, f(a, b)) if and only if f is differentiable at (a, b). Using the definition above and Corollary 1.68, it is also obvious that the equation of the tangent plane is

$$ z= D_1 f(a,b) (x-a)+ D_2 f(a,b) (y-b)+f(a, b). $$

These concepts can be generalized to functions with an arbitrary number of variables. We call the set of points of the space $\mathbb {R}^{p+1}$ that satisfy the equality $a_1 x_1 +\ldots +a_{p+1} x_{p+1} =b$, where at least one of the coefficients $a_1 ,\ldots , a_{p+1}$ is nonzero, a hyperplane of $\mathbb {R}^{p+1}$.
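As a concrete illustration of the tangent-plane equation (the function $f(x,y)=x^2+y^2$ and the point $(1,2)$ are chosen for the example), the plane is $z=2(x-1)+4(y-2)+5$, and the defining limit of Definition 1.74 can be checked numerically:

```python
import math

def f(x, y):
    return x * x + y * y              # illustrative function

a, b = 1.0, 2.0
# D1 f(a,b) = 2a = 2, D2 f(a,b) = 2b = 4, f(a,b) = 5
def g(x, y):                          # first-degree polynomial of the tangent plane
    return 2.0 * (x - a) + 4.0 * (y - b) + 5.0

for h in (1e-2, 1e-3, 1e-4):
    x, y = a + h, b + h
    ratio = abs(f(x, y) - g(x, y)) / math.hypot(x - a, y - b)
    # (f - g)/|(x,y)-(a,b)| -> 0; here f - g = 2 h^2 exactly
    assert ratio < 2 * h
```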

Definition 1.75.

Let f be defined in a neighborhood of the point $u=(u_1 ,\ldots , u_p )\in {\mathbb {R}^{p}}$. We say that the hyperplane $H\subset \mathbb {R}^{p+1}$ is the tangent hyperplane of $\mathrm{graph}\, f$ at the point $v=(u_1 ,\ldots ,u_p , f(u_1 ,\ldots , u_p ))$ if H contains the point v, and H is the graph of a first-degree polynomial g that satisfies

$$ \lim _{x\rightarrow u} (f(x)-g (x))/|x-u|=0. $$

It is easy to see that the graph of f has a tangent hyperplane at the point v if and only if f is differentiable at u. In this case, the equation of the tangent hyperplane is $x_{p+1} =\langle f'(u), x-u\rangle +f(u)$, where $x=(x_1 ,\ldots , x_p )$.

Note that the concept of the tangent and the tangent plane can be defined for every subset of ${\mathbb {R}^{p}}$. The tangent and the tangent plane of the graph of a function are just special cases of the general definition. The reader can find more on this in the appendix of this chapter.

Let f be defined in a neighborhood of $a\in {\mathbb {R}^{p}}$, and let $v\in {\mathbb {R}^{p}}$ be a unit vector. The function $t\mapsto f(a+tv)$ ($t\in \mathbb {R}$) is defined in a neighborhood of 0. The value of $f(a+tv)$ is the height of the graph of the function f at the point $a+tv$. (If $p=2$, then the graph of the function $t\mapsto {f(a+tv)}$ can be illustrated by intersecting the graph of f by the vertical plane containing the line $a+tv$ ($t\in \mathbb {R}$) and the point (a, f(a)) of the graph.) In this way, $t\mapsto {f(a+tv)}$ describes the “climbing” we do as we start from the point (a, f(a)) on the graph of f and walk in the direction of v. Intuitively it is clear that the derivative of the function $t\mapsto {f(a+tv)}$ at the point 0 (if it exists) tells us how steep a slope we need to climb at the point (a, f(a)). We are descending when the derivative is negative, and ascending when the derivative is positive.

Definition 1.76.

Let $v\in {\mathbb {R}^{p}}$ be a unit vector. We call the derivative of the function $t\mapsto f(a+tv)$ at the point 0 (if it exists) the directional derivative of the function f at the point a and in the direction v. Notation: $\frac{\partial f}{\partial v} (a)$ or $D_v f(a)$. In other words,

$$ D_v f(a)=\lim _{t\rightarrow 0} \frac{f(a+tv)-f(a)}{t}, $$

assuming that the limit exists.

Theorem 1.77.

If the function f is differentiable at $a\in {\mathbb {R}^{p}}$, then the single-variable function $t\mapsto f(a+tv)$ is differentiable at 0 for every vector $v\in {\mathbb {R}^{p}}$, and its derivative is $\langle f'(a), v\rangle $. In particular, if $|v|=1$, then the directional derivative $D_v f(a)$ exists and its value is $D_v f(a)=\langle f'(a), v\rangle $.

Proof.

By Corollary 1.68 we have

$$f(a+tv)=f(a) +\langle f'(a), tv\rangle +\varepsilon (a+tv)\cdot |tv|,$$

i.e.,

$$\frac{f(a+tv)-f(a)}{t} =\langle f'(a), v\rangle \pm \varepsilon (a+tv) \cdot |v|$$

for every $t\ne 0$ satisfying $a+tv\in D(f)$. Since $\lim _{x\rightarrow a} \varepsilon (x)=0$ implies $\lim _{t\rightarrow 0} \varepsilon (a+tv)=0$, we have $(f(a+tv)-f(a))/t\rightarrow \langle f'(a), v\rangle $ as $t\rightarrow 0$. Thus we have proved the first statement of the theorem. The second statement is obvious from the first one. $\square $
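The formula $D_v f(a)=\langle f'(a), v\rangle $ can be tested numerically on a concrete function (chosen here for illustration): for $f(x,y)=x^2 y$ at $a=(1,2)$ we have $f'(a)=(4,1)$.

```python
import math

def f(x, y):
    return x * x * y                 # illustrative function; f'(1,2) = (4, 1)

a = (1.0, 2.0)
v = (1 / math.sqrt(2), 1 / math.sqrt(2))   # unit direction vector

t = 1e-6
# symmetric difference quotient of t |-> f(a + t v) at 0
num = (f(a[0] + t * v[0], a[1] + t * v[1])
       - f(a[0] - t * v[0], a[1] - t * v[1])) / (2 * t)

expected = 4.0 * v[0] + 1.0 * v[1]         # <f'(a), v> = 5/sqrt(2)
assert abs(num - expected) < 1e-6
```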

Remark 1.78.

1. The partial derivative $D_i f(a)$ is the same as the directional derivative in the direction $v_i$, where $v_i$ is the vector whose coordinates are all zero except for its ith coordinate, which is 1. This follows directly from the definitions. Furthermore, if f is differentiable at a, this also follows from the formula $D_v f(a)=\langle f'(a), v\rangle $.

2. Suppose that at least one of the partial derivatives $D_i f(a)$ is nonzero, i.e., the derivative vector $f'(a)$ is not the zero vector. If $|v|=1$, then $\langle f'(a), v\rangle =|f'(a)|\cdot \cos \alpha $, where $\alpha $ is the angle between vectors $f'(a)$ and v (see page 3). Therefore, $\langle f'(a), v\rangle \le |f'(a)| $, and equality holds only if the directions of the vectors v and $f'(a)$ are the same. In other words, the “climbing” of the graph of f is the steepest in the direction of the vector $f'(a)$. Because of this, we also call the derivative vector $f'(a)$ the gradient.

3. It is possible that the directional derivative $D_v f(a)$ exists for every $|v|=1$ yet f is not differentiable at a (see Exercise 1.89).
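The claim in Remark 1.78.2, that $\langle f'(a), v\rangle $ is largest when v points in the direction of the gradient, can be illustrated by scanning unit vectors (the gradient below is an arbitrary nonzero example, not tied to a particular f):

```python
import math

grad = (3.0, -4.0)                   # arbitrary nonzero gradient vector; |grad| = 5

def derivative_in_direction(v):      # <f'(a), v> for a unit vector v
    return grad[0] * v[0] + grad[1] * v[1]

# sample unit vectors around the circle and pick the steepest direction
vectors = [(math.cos(2 * math.pi * k / 3600), math.sin(2 * math.pi * k / 3600))
           for k in range(3600)]
best = max(vectors, key=derivative_in_direction)

norm = math.hypot(*grad)                       # |f'(a)| = 5
unit_grad = (grad[0] / norm, grad[1] / norm)   # direction of the gradient

# the maximum of <f'(a), v> over |v| = 1 is |f'(a)|, attained (up to the
# sampling resolution) in the direction of the gradient
assert abs(derivative_in_direction(best) - norm) < 1e-4
assert math.hypot(best[0] - unit_grad[0], best[1] - unit_grad[1]) < 1e-2
```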

As an important corollary of Theorem 1.77, we obtain the mean value theorem for multivariable functions.

Theorem 1.79.

(Mean value theorem) Let the function f be differentiable at the points of the segment [a, b], where $a, b\in {\mathbb {R}^{p}}$. Then

- (i)the single-variable function $F(t)= f(a+t(b-a)) \ (t\in [0,1])$ is differentiable in $[0,1]$, $F'(t)=\langle f'(a+t(b-a)), b-a\rangle $ for every $t\in [0,1]$, and
- (ii)there exists a point $c\in [a, b]$ such that $f(b)-f(a)=\langle f'(c), b-a\rangle $.

Proof.

Let $t_0 \in [0,1]$, and apply Theorem 1.77 to the point ${a+t_0 (b-a)}$ and the vector $v=b-a$. We find that the function

$$ t\mapsto f(a+(t_0 +t)(b-a)) $$

is differentiable at the point 0, and its derivative there is $\langle f' (a+t_0 (b-a)), b-a\rangle $. Thus $F'(t_0 )= \langle f' (a+t_0 (b-a)), b-a\rangle $, which proves (i).

By the single-variable version of the mean value theorem, there exists a point ${u\in [0,1]}$ such that ${F(1)-F(0)=F'(u)}$. Since ${F(0)=f(a)}$ and ${F(1)=f(b)}$, by applying (i) we have ${f(b)-f(a)}={\langle f'(c), b-a\rangle }$, where ${c=a+u(b-a)}$. $\square $
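Both parts of the mean value theorem can be checked numerically on a concrete function (the choices $f(x,y)=x^2+3xy$, $a=(0,0)$, $b=(1,1)$ are for illustration only): here $F(t)=4t^2$ and $\langle f'(a+t(b-a)), b-a\rangle =8t$.

```python
def f(x, y):
    return x * x + 3 * x * y         # illustrative function; f'(x,y) = (2x+3y, 3x)

a, b = (0.0, 0.0), (1.0, 1.0)

def F(t):                            # F(t) = f(a + t(b - a)) = 4 t^2
    return f(a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1]))

def rhs(t):                          # <f'(a + t(b-a)), b - a> = 5t + 3t = 8t
    x = y = t
    return (2 * x + 3 * y) * (b[0] - a[0]) + (3 * x) * (b[1] - a[1])

h = 1e-6
for t in (0.2, 0.5, 0.9):
    numeric = (F(t + h) - F(t - h)) / (2 * h)   # numerical F'(t)
    assert abs(numeric - rhs(t)) < 1e-6          # part (i)

# part (ii): f(b) - f(a) = 4 equals <f'(c), b - a> = 8u at u = 1/2
assert abs(F(1) - F(0) - rhs(0.5)) < 1e-12
```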

Exercises

1.80.

Which of the following functions are differentiable at the origin?

- (a)$\sqrt{x^2 +y^2}$;
- (b)$\sqrt{|x^2 -y^2|}$;
- (c)$\sqrt{|x^3 -y^3|}$;
- (d)$\sqrt{|x^3 +y^3 |}$;
- (e)$\sqrt{|x^2 y+xy^2 |}$;
- (f)$f(x, y)=xy/\sqrt{x^2 +y^2}$, $f(0,0)=0$;
- (g)${\root 3 \of {x^3 +y^3}}$;
- (h)${\root 3 \of {x^3 +y^4}}$ (H S);
- (i)$ x\cdot \sqrt{|y|} $;
- (j)$f(x, y)=xy(x^2 -y^2 )/(x^2 +y^2 )$, $f(0,0)=0$;
- (k)$f(x, y)=(x^3 +y^5 )/(x^2 +y^4 )$, $f(0,0)=0$;
- (l)$f(x, y)=x^2 \cdot \sin (x^2 +y^2 )^{-1}$, $f(0,0)=0$;
- (m)$f(x, y)=\frac{x^3}{x^2 +y^2}$, $f(0,0)=0$;
- (n)$f(x, y)=\frac{\root 3 \of {x^2 y^5}}{\sqrt{x^2 +y^2}}$, $f(0,0)=0$;
- (o)$f(x, y)=x\cdot \sin \frac{1}{y}, \ f(x, 0)=0$.

1.81.

Let $f(x, y)=|x|^\alpha \cdot |y|^\beta $ if $xy\ne 0$, and let $f(x, y)=0$ if $xy=0$. For what values of $\alpha ,\beta $ is f differentiable at the origin? For what values of $\alpha ,\beta $ is f differentiable everywhere?

1.82.

Show that if $f:\mathbb {R}\rightarrow \mathbb {R}$ is differentiable at a, then the function $g(x, y)=f(x)$ is differentiable at (a, b) for every b. (S)

1.83.

For what functions $f:\mathbb {R}^2\rightarrow \mathbb {R}$ will the function $x\cdot f(x, y)$ be differentiable at the origin?

1.84.

Show that if the function $f:\mathbb {R}^2 \rightarrow \mathbb {R}$ is differentiable at the origin, then for every $c{\in } \mathbb {R}$ the single-variable function $g(x){=}f(x, cx)$ is differentiable at 0.

1.85.

Show that if the function $f:{\mathbb {R}^{p}}\rightarrow \mathbb {R}$ is differentiable at a and $f(a)=D_1 f(a)=\ldots =D_p f(a)=0$, then $f\cdot g$ is also differentiable at a for every bounded function $g:\mathbb {R}\rightarrow \mathbb {R}$.

1.86.

True or false? If f is differentiable at $a\in \mathbb {R}^2$ and f has a strict local minimum at a restricted to every line going through a, then f has a strict local minimum at a. (H)

1.87.

Find the directional derivatives of $f(x, y)=\root 3 \of {x^3 +y^3}$ at the origin. Can we choose the vector a such that the directional derivative in the direction u equals $\langle a, u \rangle $ for every $|u|=1$? Prove that f is not differentiable at the origin.

1.88.

Find the directional derivatives of $f(x, y)=\frac{x^3}{x^2 +y^2} , \ f(0,0)=0$, at the origin. Can we choose a vector a such that the directional derivative in the direction u equals $\langle a, u \rangle $ for every $|u|=1$?

1.89.

Construct two-variable functions f whose every directional derivative at the origin is 0, but

- (a)f is not differentiable at the origin,
- (b)f is not continuous at the origin,
- (c)there does not exist a neighborhood of the origin on which f is bounded.

1.90.

Let $G\subset {\mathbb {R}^{p}}$ be a connected open set, and let $f:{\mathbb {R}^{p}}\rightarrow \mathbb {R}$ be differentiable. Show that if $f'(x)=0$ for every $x\in G$, then f is a constant function. (H)

1.91.

Let $f:\mathbb {R}^2\rightarrow \mathbb {R}$ be differentiable in the plane, and let $D_1 f (x, x)= D_2 f (x, x)=0$ for every x. Show that f(x, x) is a constant function.

1.92.

Let the real functions f and g be differentiable at the point $a\in {\mathbb {R}^{p}}$. Find a formula for the partial derivatives of the functions $f\cdot g$ and (when $g(a)\ne 0$) of f / g at the point a in terms of the partial derivatives of f and g.

1.93.

Verify that the gradient of $\sqrt{x^2 +y^2}$ at $(a, b) \ne (0,0)$ is parallel to and points in the same direction as (a, b). Why is this obvious intuitively?

1.94.

Verify that the gradient of $\sqrt{1 - x^2 - y^2}$ at the point (a, b) is parallel to and points in the opposite direction as (a, b) when $0< {a^2 + b^2} < 1$. Why is this obvious intuitively?

1.95.

Let $a, b>0$, and let $T_{a, b}$ denote the tetrahedron bounded by the xy, xz, yz coordinate planes and by the tangent plane of the graph of the function $f(x, y)=1/(xy)$ at the point (a, b). Show that the volume of $T_{a, b}$ is independent of a and b.

Definition 1.80.

Let f be defined in a neighborhood of $a\in {\mathbb {R}^{p}}$. If the partial derivative $D_j f$ exists in a neighborhood of a and the ith partial derivative of $D_j f$ exists at a, then we call this the ijth second-order partial derivative of the function f at the point a, and we use any of the following notations:

$$ \frac{\partial ^2 f}{\partial x_i \partial x_j} (a),\ f''_{x_j x_i} (a),\ f_{x_j x_i} (a),\ D_i D_j f(a),\ D_{ij} f(a). $$

(The function f has at most $p^2$ different second-order partial derivatives at the point a.)

Example 1.81.

1. The partial derivatives of the two-variable function $f(x, y)=\sin (x^2 y)$ exist everywhere, with $D_1 f(x, y)=\cos (x^2 y)\cdot 2xy$ and $D_2 f (x, y)=\cos (x^2 y)\cdot x^2$ for every (x, y). Since the partial derivatives of these functions also exist everywhere, each of the four second-order partial derivatives of f exists everywhere, with

$$\begin{aligned}&D_{11} f(x, y)=D_1 D_1 f(x, y) =-\sin (x^2 y) \cdot 4x^2 y^2 +\cos (x^2 y)\cdot 2y,\\&D_{21} f(x, y)=D_2 D_1 f(x, y) =-\sin (x^2 y) \cdot 2x^3 y +\cos (x^2 y)\cdot 2x ,\\&D_{12} f(x, y)=D_1 D_2 f(x, y) =-\sin (x^2 y) \cdot 2x^3 y +\cos (x^2 y)\cdot 2x ,\\&D_{22} f(x, y)=D_2 D_2 f(x, y) =-\sin (x^2 y) \cdot x^4 . \end{aligned}$$

Note that $D_{12} f(x, y)=D_{21} f(x, y)$ everywhere. This is surprising, since there is no obvious reason why the two calculations should lead to the same result. Our next example shows that $D_{12} f=D_{21}f$ is not always true.

2. Let $f(x, y)=xy\cdot (x^2 -y^2 )/(x^2 +y^2 )$ if $(x, y)\ne (0,0)$, and let $f(0,0)=0$. First we prove that the partial derivative $D_1 f$ exists everywhere. The section $f^0$ is zero everywhere, and thus $D_1 f(x, 0)$ exists for every x, and its value is zero everywhere. If $b\ne 0$, then the section $f^b$ is differentiable everywhere; thus $D_1 f(x, b)$ also exists for every x. If $b\ne 0$, then

$$D_1 f(0,b)=\lim _{x\rightarrow 0}\frac{f(x, b)-f(0,b)}{x}= \lim _{x\rightarrow 0}\frac{xb \cdot (x^2 -b^2 )}{(x^2 +b^2 )\cdot x}= b\cdot \lim _{x\rightarrow 0}\frac{x^2 -b^2}{x^2 +b^2}=-b.$$

We have shown that $D_1 f(x, y)$ exists everywhere, and $D_1 f(0,y)=-y$ for every y. It follows that $D_{21} f(0,0)=D_2 D_1 f(0,0)=-1$.

Now let us consider the partial derivative $D_2 f$. The section $f_0$ is zero everywhere, and thus $D_2 f(0,y)$ exists for every y, and its value is zero everywhere. If $a\ne 0$, then $f_a$ is differentiable everywhere; thus $D_2 f(a, y)$ also exists for every y. If $a\ne 0$, then

$$ D_2 f(a, 0)=\lim _{y\rightarrow 0}\frac{f(a,y)-f(a, 0)}{y}= \lim _{y\rightarrow 0}\frac{ay \cdot (a^2 -y^2 )}{(a^2 +y^2 )\cdot y}= a\cdot \lim _{y\rightarrow 0}\frac{a^2 -y^2}{a^2 +y^2}=a. $$

We have shown that $D_2 f(x, y)$ exists everywhere, and $D_2 f(x, 0)=x$ for every x. It follows that $D_{12} f(0,0)=D_1 D_2 f(0,0)=1$, and thus ${D_{12} f(0,0)\ne D_{21} f(0,0)}$. $\square $
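A finite-difference check (step sizes chosen for illustration) reproduces $D_{21}f(0,0)=-1$ and $D_{12}f(0,0)=1$ for this function:

```python
def f(x, y):
    return 0.0 if (x, y) == (0.0, 0.0) else x * y * (x * x - y * y) / (x * x + y * y)

h = 1e-6                          # inner step for the first-order partials
k = 1e-3                          # outer step for the second-order quotient

def d1f(x, y):                    # central difference approximation of D1 f
    return (f(x + h, y) - f(x - h, y)) / (2 * h)

def d2f(x, y):                    # central difference approximation of D2 f
    return (f(x, y + h) - f(x, y - h)) / (2 * h)

d21 = (d1f(0.0, k) - d1f(0.0, -k)) / (2 * k)   # D2 of D1 f at the origin
d12 = (d2f(k, 0.0) - d2f(-k, 0.0)) / (2 * k)   # D1 of D2 f at the origin

assert abs(d21 + 1.0) < 1e-3      # D21 f(0,0) = -1
assert abs(d12 - 1.0) < 1e-3      # D12 f(0,0) = 1
```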

The following theorem explains why $D_{12} f= D_{21} f$ was true for Example 1.81.1.

Theorem 1.82.

(Young’s^{16} theorem) Let f(x, y) be a two-variable function. If the partial derivative functions $D_1 f(x, y)$ and $D_2 f(x, y)$ exist in a neighborhood of $(a, b)\in \mathbb {R}^2$ and they are differentiable at (a, b), then $D_{12} f(a, b)= D_{21} f(a, b)$.

Lemma 1.83.

- (i)If the partial derivative $D_1 f(x, y)$ exists in a neighborhood of (a, b) and it is differentiable at (a, b), then$$\begin{aligned} \lim _{t\rightarrow 0} \frac{f(a+t,b+t)-f(a+t,b)-f(a,b+t)+f(a, b)}{t^2} =D_{21}f(a, b). \end{aligned}$$(1.20)
- (ii)If the partial derivative $D_2 f(x, y)$ exists in a neighborhood of (a, b) and it is differentiable at (a, b), then$$\begin{aligned} \lim _{t\rightarrow 0} \frac{f(a+t,b+t)-f(a+t,b)-f(a,b+t)+f(a, b)}{t^2} =D_{12}f(a, b). \end{aligned}$$(1.21)

Proof.

(i) Let us use the notation

$$ H(t)=\left( f(a+t,b+t)-f(a+t,b)\right) -\left( f(a,b+t)-f(a, b) \right) , $$

and, for a fixed t, let $F(u)=f(u,b+t)-f(u, b)$. Clearly, $H(t)=F(a+t)-F(a)$. The main idea of the proof is to apply the mean value theorem to the latter expression, and then use the differentiability of $D_1 f$ at (a, b) to show that H(t) is close to $D_{21} f(a, b)\cdot t^2$ when t is small.

Let $\varepsilon >0$ be fixed. Since $D_1 f(x, y)$ is differentiable at (a, b), we can choose $\delta >0$ such that

$$\begin{aligned} \big | D_1 f(x, y)-(D_{11}f(a, b) (x-a)&+D_{21} f(a, b)(y-b)+D_1 f(a, b)) \big | \le \nonumber \\&\le \varepsilon \cdot (|x-a|+|y-b|) \end{aligned}$$

(1.22)

holds for every point $(x,y)\in B((a, b),\delta )$.

Let $0<|t|<\delta /2$ be fixed. The function F is differentiable on the interval ${[a, a+t]}$, since $u\in [a, a+t]$ implies

$$ (u,b+t)\in B((a, b),\delta ) \quad \text{and}\quad (u,b)\in B((a, b),\delta ). $$

The sections $f^{b+t}$ and $f^b$ are therefore differentiable on $[a, a+t]$, with derivatives $D_1 f(u, b+t)$ and $D_1 f(u, b)$, respectively. Thus $F'(u)=D_1 f(u, b+t) -D_1 f(u, b)$ for every $u\in [a, a+t]$. By the mean value theorem we have

$$ F(a+t)-F(a)=(D_1 f(c, b+t) -D_1 f(c, b))\cdot t $$

for an appropriate choice of $c\in [a, a+t]$, and thus

$$\begin{aligned} H(t)=\left( D_1 f(c, b+t) -D_1 f(c, b) \right) \cdot t. \end{aligned}$$

(1.23)

Plugging $(x,y)=(c, b+t)$ and $(x,y)=(c, b)$ into (1.22), we get

$$\begin{aligned} \big | D_1 f(c, b+t)-(D_{11}f(a, b) (c-a)+&D_{21} f(a, b)t+D_1 f(a, b))\big | \le \\&\le \varepsilon \cdot (|c-a|+|t|)\le 2\varepsilon \cdot |t| \end{aligned}$$

and

$$\begin{aligned} \big | D_1 f(c, b)-(D_{11}f(a, b) (c-a)+&D_1 f(a, b)) \big | \le \\&\le \varepsilon \cdot |c-a| \le \varepsilon \cdot |t|, \end{aligned}$$

respectively. Applying the triangle inequality yields

$$\left| D_1 f(c, b+t) -D_1 f(c, b) -D_{21} f(a, b)t\right| \le 3\varepsilon \cdot |t|.$$

Comparing with (1.23), we get

$$\left| \frac{H(t)}{t^2} -D_{21} f(a, b)\right| \le 3\varepsilon .$$

Since $\varepsilon $ was arbitrary, and this is true for every $0<|t|<\delta /2$, (1.20) is proved.

(ii) Let $0<|t|<\delta /2$ be fixed, and let $G(v)=f(a+t,v)-f(a, v)$ for every v for which f is defined at the points $(a+t, v)$ and (a, v). We have $H(t)= G(b+t)-G(b)$ for every t small enough. Repeating the steps of the proof of (i) with the roles of the two variables interchanged, we get (1.21). $\square $
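The limit (1.20) can be tested numerically on a smooth example ($f(x,y)=\sin(x^2 y)$ and $(a,b)=(1,1)$ are illustrative choices), where $D_{21}f(1,1)=2(\cos 1-\sin 1)$:

```python
import math

def f(x, y):
    return math.sin(x * x * y)    # smooth illustrative function

a, b = 1.0, 1.0
# D21 f = -sin(x^2 y) * 2 x^3 y + cos(x^2 y) * 2 x, so at (1, 1):
expected = 2.0 * (math.cos(1.0) - math.sin(1.0))

def H(t):                         # the second-order difference of Lemma 1.83
    return f(a + t, b + t) - f(a + t, b) - f(a, b + t) + f(a, b)

# H(t)/t^2 approaches D21 f(1,1) as t -> 0
for t in (1e-4, 1e-5, -1e-5):
    assert abs(H(t) / (t * t) - expected) < 5e-3
```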

Proof of Theorem 1.82.

By assumption, $D_1 f$ and $D_2 f$ exist in a neighborhood of (a, b) and are differentiable at (a, b). Thus both parts of Lemma 1.83 apply, and the limits (1.20) and (1.21), being limits of the same function, must be equal. Therefore, $D_{12} f(a, b)= D_{21} f(a, b)$. $\square $

Let us revisit Example 1.81.1. One can see that the second-order partial derivatives of f are continuous everywhere. By Theorem 1.71 this implies that the first-order partial derivatives of f are differentiable. Thus, by Young’s theorem, $D_{12} f= D_{21} f$ everywhere.

Definition 1.84.

Let f be differentiable in a neighborhood of $a\in {\mathbb {R}^{p}}$. If the partial derivative functions of f are differentiable at a, then we say that f is twice differentiable
at the point a.

Lemma 1.85.

Let $p>2$, let f be defined in a neighborhood of $a=(a_1 , a_2 ,\ldots , a_p )\in {\mathbb {R}^{p}}$, and consider the section

$$g(u,v)=f(u,v, a_3 ,\ldots , a_p ).$$

If f is twice differentiable at a, then g is twice differentiable at $(a_1 , a_2 )\in \mathbb {R}^2$. Furthermore, $D_{21}g(a_1 , a_2 )=D_{21} f(a)$ and $D_{12}g(a_1 , a_2 )=D_{12} f(a)$.

Proof.

From the definition of the partial derivative, we have $D_1 g(u, v)=D_1 f(u,v, a_3 ,\ldots , a_p )$ and $D_2 g(u, v)=D_2 f(u,v, a_3 ,\ldots , a_p )$ whenever the right-hand sides exist. Thus, $D_1 g$ and $D_2 g$ are defined in a neighborhood of $(a_1 , a_2 )$. By assumption, $D_1 f$ is differentiable at a, and thus

$$D_1 f(x)=D_1 f(a) + \sum _{i=1}^pD_{i1} f(a)(x_i -a_i )+\varepsilon (x)\cdot |x-a|,$$

where $\varepsilon (x)\rightarrow 0$ as $x\rightarrow a$. Applying this with $x=(u,v, a_3 ,\ldots , a_p )$, we obtain

$$\begin{aligned} D_1 g(u, v)=&D_1 g(a_1 , a_2 ) + D_{11} f(a)(u-a_1 )+D_{21} f(a)(v-a_2 )+\\&+ \varepsilon (u,v, a_3 ,\ldots ,a_p ) \cdot |(u, v)-(a_1 , a_2 )|. \end{aligned}$$

Since $\varepsilon (u,v, a_3 ,\ldots , a_p )\rightarrow 0$ as $(u, v)\rightarrow (a_1 , a_2 )$, it follows that $D_1 g$ is differentiable at $(a_1 , a_2 )$, and $D_{21} g (a_1 , a_2 )=D_{21} f(a)$. Similarly, $D_2 g$ is differentiable at $(a_1 , a_2 )$, and $D_{12} g (a_1 , a_2 )=D_{12} f(a)$. $\square $

Theorem 1.86.

If f is twice differentiable at $a\in {\mathbb {R}^{p}}$, then $D_{ij} f(a)= D_{ji} f(a)$ for every $i, j=1,\ldots , p$.

Proof.

We may assume $i\ne j$. Since the roles of the coordinates are symmetric, we may also assume, without loss of generality, that $i=1$ and $j=2$. Consider the section

$$g(u,v)=f(u,v, a_3 ,\ldots , a_p ).$$

Combining Young’s theorem and the previous lemma yields $D_{12} g (a_1 , a_2 )=D_{21} g (a_1 , a_2 )$, and thus $D_{12} f (a)=D_{21} f (a)$. $\square $

Definition 1.87.

We define the kth-order partial derivatives by recursion on k. Assume that we have already defined the kth-order partial derivatives of the function f. Then we define the $(k+1)$st-order partial derivatives as follows.

Let $1\le i_1 ,\ldots , i_{k+1}\le p$ be arbitrary indices, and suppose that the kth-order partial derivative $D_{i_2 \ldots i_{k+1}} f(x)$ exists and is finite in a neighborhood of ${a\in {\mathbb {R}^{p}}}$. If the $i_1$th partial derivative of the function $x\mapsto D_{i_2 \ldots i_{k+1}} f(x)$ exists at a, then we call this the $(k+1)$st-order partial derivative of f at a, and use the notation $D_{i_1 \ldots i_{k+1}} f(a)$. (Obviously, f has at most $p^{k+1}$ different $(k+1)$st-order partial derivatives at a.)

Some other usual notation for $D_{i_1 \ldots i_{k}} f(a)$:

$$ \frac{\partial ^k f}{\partial x_{i_1} \ldots \partial x_{i_{k}}} (a), \ f^{(k)}_{x_{i_k} \ldots x_{i_1}} (a), \ f_{x_{i_k} \ldots x_{i_1}} (a), \ D_{i_1} \ldots D_{i_k} f(a). $$

Definition 1.88.

Suppose that we have already defined k-times differentiability. (We did so in the cases of $k=1$ and $k=2$.) We say that a function f is $(k+1)$ times differentiable at $a\in {\mathbb {R}^{p}}$ if f is k times differentiable in a neighborhood of a, every kth-order partial derivative of f exists and is finite in a neighborhood of a, and these partial derivatives are differentiable at a.

Thus, we have defined k times differentiability for every k.

We say that a function f is infinitely differentiable at a if f is k times differentiable at a for every $k=1,2,\ldots $.

Remark 1.89.

It follows from Theorem 1.67 that if f is k times differentiable at a, then every kth-order partial derivative of f exists and is finite at a.

Theorem 1.90.

The polynomials are infinitely differentiable everywhere. The rational functions are infinitely differentiable at every point of their domains.

Proof.

By Corollary 1.72, polynomials are differentiable everywhere. Suppose we have already proved that polynomials are k times differentiable. Since the kth-order partial derivatives of a polynomial are also polynomials, these are differentiable, showing that the polynomials are also $(k+1)$ times differentiable. Thus, the polynomials are infinitely differentiable.

The proof for rational functions is similar. $\square $

Theorem 1.91.

Let the function f be k times differentiable at $a\in {\mathbb {R}^{p}}$. If the ordered k-tuples $(i_1 ,\ldots , i_k )$ and $(j_1 ,\ldots , j_k )$ are permutations of each other (i.e., each $i=1,\ldots , p$ appears the same number of times in both k-tuples), then $D_{i_1 \ldots i_{k}} f(a)=D_{j_1 \ldots j_{k}} f(a)$.

Proof.

Every permutation can be obtained as a sequence of swaps of adjacent elements, so it suffices to show that $D_{i_1 \ldots i_{k}} f(a)$ does not change when two adjacent indices $i_m$ and $i_{m+1}$ are swapped. This follows by applying Theorem 1.86 to the partial derivative $D_{i_{m+2} \ldots i_k} f$, which is twice differentiable in a neighborhood of a (at a itself if $m=1$) by the definition of k-times differentiability. $\square $

Exercises

1.96.

Find every function $f:\mathbb {R}^2\rightarrow \mathbb {R}$ such that $D_2 (D_1 f)$ is zero everywhere. (H)

1.97.

Young’s theorem implies that the function $f(x, y)=xy\cdot (x^2 -y^2 )/(x^2 +y^2 )$, ${f(0,0)=0}$, cannot be twice differentiable at the origin. Verify, without using the theorem, that $D_1 f$ is not differentiable at the origin.

1.98.

For what values of $\alpha ,\beta >0$ is $|x|^\alpha \cdot |y|^\beta $ twice differentiable at the origin?

1.99.

Show that if $D_{12}f$ and $D_{21}f$ exist in a neighborhood of (a, b) and are continuous at (a, b), then $D_{12}f(a, b) =D_{21}f(a, b)$.

1.100.

Let the partial derivatives $D_{1}f$, $D_2 f$, and $D_{12}f$ exist in a neighborhood of (a, b), and let $D_{12}f$ be continuous at (a, b). Show that $D_{21}f(a, b)$ exists and is equal to $D_{12}f(a, b)$ (Schwarz’s theorem).

1.101.

Let $f:\mathbb {R}^2\rightarrow \mathbb {R}$ be twice differentiable everywhere. Show that if $D_{21}f$ is nonnegative everywhere, then $f(b,d)-f(a,d)-f(b,c)+f(a, c)\ge 0$ for every $a<b$ and $c<d$.

The most important applications of differentiation, for multivariable and single-variable functions alike, are the analysis of functions, finding their greatest and smallest values, and finding good approximations by simpler functions (e.g., polynomials).

Since each of the applications below is based on Taylor^{17} polynomials, our first task is to define these polynomials for p-variable functions and establish their most important properties. This proves to be surprisingly simple: the notation in the multivariable case is necessarily more complicated, but the notion of the Taylor polynomial, as well as its basic properties, is essentially the same as in the single-variable case.

By a monomial we mean a product of the form $c\cdot x^{s_1}_1 \cdots x^{s_p}_p$, where c is a nonzero real number and the exponents $s_1 ,\ldots , s_p$ are nonnegative integers.

The degree of the monomial ${c\cdot x^{s_1}_1 \cdots x^{s_p}_p}$ is $s_1 +\ldots +s_p$. Every p-variable polynomial can be written as a sum of monomials. Obviously, if a polynomial is not the constant zero function, then it can be written in such a way that the p-element exponent sequences of its monomials are distinct. By induction on p one can easily prove that this representation of a polynomial is unique. We call it the canonical form of the polynomial.

We say that the degree of a not identically zero polynomial is the highest degree of the monomials in its canonical form. The constant zero polynomial does not have a degree. Still, when we speak about the set of polynomials of degree at most n, we will include the identically zero polynomial among them.

Lemma 1.92.

Let

$$\begin{aligned} g (x)= \sum _{\genfrac{}{}{0.0pt}1{s_1 ,\ldots , s_p \ge 0}{s_1 +\ldots +s_p \le n}}c_{s_1 \ldots s_p} \cdot (x_{1} - a_{1} )^{s_1} \cdots (x_{p} - a_{p} )^{s_p}. \end{aligned}$$

(1.24)

Then $g(a)=c_{0\ldots 0}$, and furthermore, for every $k\le n$ and $1\le i_1 ,\ldots , i_k \le p$ we have

$$\begin{aligned} D_{i_1 \ldots i_k} g(a)={s_1 ! \cdots s_p !} \cdot c_{s_1 \ldots s_p}, \end{aligned}$$

(1.25)

where $s_1 ,\ldots , s_p$ denote the number of occurrences of the indices $1,\ldots , p$ in the sequence $(i_1 ,\ldots , i_k )$.

Proof.

The equality $g(a)=c_{0\ldots 0}$ is obvious. Let the indices $1\le i_1 ,\ldots , i_k \le p$ be fixed, with $k\le n$. For simplicity, we write $\mathcal{D}$ instead of $D_{i_1 \ldots i_k}$. It is easy to see that if $g_1$ and $g_2$ are polynomials, then $\mathcal{D}(g_1 +g_2 ) = \mathcal{D}g_1 +\mathcal{D}g_2 $ and $\mathcal{D}(\lambda g_1 )=\lambda \cdot \mathcal{D}g_1 $ for every $\lambda \in \mathbb {R}$. Thus, the kth-order partial derivative $\mathcal{D}g(a)$ can be computed by applying $\mathcal{D}$ to each of the terms on the right-hand side of (1.24) and summing the values of the resulting partial derivatives at the point a. Consider the kth-order partial derivative and its value at a: It is easy to see that if the index i is present in the sequence $(i_1 ,\ldots , i_k )$ more than $t_i$ times, then 1.26 is constant and equal to zero. On the other hand, if there is an index i such that i is present in the sequence $(i_1 ,\ldots , i_k )$ fewer than $t_i$ times, then the polynomial 1.26 is divisible by $x_i -a_i$, and thus the value of (1.27) is zero. Therefore, in applying $\mathcal{D}$ to the right-hand side of (1.24) and taking its value at a, we get a nonzero term only if $(t_1 ,\ldots , t_p )=(s_1 ,\ldots , s_p )$.

$$\begin{aligned} \mathcal{D}(x_{1} - a_{1} )^{t_1} \cdots (x_{d} - a_{d} )^{t_p} \end{aligned}$$

(1.26)

$$\begin{aligned} \left( \mathcal{D}(x_{1} - a_{1} )^{t_1} \cdots (x_{d} - a_{d} )^{t_p} \right) (a). \end{aligned}$$

(1.27)

Furthermore, since $\mathcal{D}(x_{1} - a_{1} )^{s_1} \cdots (x_{p} - a_{p} )^{s_p}$ is equal to the constant function $s_1 !\cdots s_p !$, it follows that (1.25) holds. $\square $
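The identity (1.25) can be checked mechanically on a concrete polynomial. The following Python sketch (the names and the example polynomial are our own, not from the text) represents a two-variable polynomial in powers of $(x-a_1)$ and $(y-a_2)$ by a dictionary of coefficients, applies the partial-derivative operators directly to the coefficients, and reads off the value at $a$:

```python
from math import factorial

# Represent g(x, y) = sum of c[(s1, s2)] * (x - a1)**s1 * (y - a2)**s2
# by the coefficient dict c.

def partial(c, var):
    """Partial derivative of the coefficient dict c with respect to var (0 or 1)."""
    d = {}
    for (s1, s2), coef in c.items():
        s = (s1, s2)[var]
        if s > 0:
            key = (s1 - 1, s2) if var == 0 else (s1, s2 - 1)
            d[key] = d.get(key, 0) + s * coef
    return d

def value_at_a(c):
    """Value of the polynomial at (a1, a2): only the constant term survives."""
    return c.get((0, 0), 0)

# g(x, y) = 3 + 2(x - a1) + 5(x - a1)^2 (y - a2),  so c_{2,1} = 5
g = {(0, 0): 3, (1, 0): 2, (2, 1): 5}

# D_{112} g(a): differentiate twice in x, once in y
d = partial(partial(partial(g, 0), 0), 1)
print(value_at_a(d))                       # 2! * 1! * c_{2,1} = 10
assert value_at_a(d) == factorial(2) * factorial(1) * 5
```

Differentiating in the order $x, x, y$ or any permutation of it gives the same value, in accordance with (1.25) applied with $s_1=2$, $s_2=1$.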

Let f be n times differentiable at a. By Theorem 1.91, if $k\le n$, then the kth-order partial derivative $D_{i_1 \ldots i_k} f(a)$ does not depend on the order of the indices $i_1 ,\ldots , i_k$, but only on the number of times each of the indices $1,\ldots , p$ occurs in the sequence $(i_1 ,\ldots , i_k )$. Let $s_1 ,\ldots , s_p$ be nonnegative integers with $s_1 +\ldots +s_p \le n$. We denote by $D^{s_1 \ldots s_p} f(a)$ the number $D_{i_1 \ldots i_k} f(a)$, where the indices $1,\ldots , p$ occur in the sequence $(i_1 ,\ldots , i_k )$ exactly $s_1 ,\ldots , s_p $ times, respectively. We also put $D^{0\ldots 0} f(a)=f(a)$.

Theorem 1.93.

Suppose that the function f is n times differentiable at $a=(a_1 ,\ldots , a_p )\in {\mathbb {R}^{p}}$, and let

$$\begin{aligned} t_n (x)=\sum _{\genfrac{}{}{0.0pt}1{s_1 ,\ldots , s_p \ge 0}{s_1 +\ldots +s_p \le n}}\frac{1}{s_1 ! \cdots s_p !} \cdot D^{s_1 \ldots s_p } f(a) \cdot (x_{1} - a_{1} )^{s_1} \cdots (x_{p} - a_{p} )^{s_p}. \end{aligned}$$

(1.28)

The polynomial $t_n$ is the only polynomial of degree at most n such that $t_n (a)=f(a)$ and

$$\begin{aligned} D_{i_1 \ldots i_k} t_n (a)= D_{i_1 \ldots i_k} f (a) \end{aligned}$$

(1.29)

for every $1\le k\le n$ and $1\le i_1 ,\ldots , i_k \le p$.

Proof.

Let g be a polynomial of degree at most n, and suppose that g satisfies $g(a)=f(a)$ and $D_{i_1 \ldots i_k} g(a)= D_{i_1 \ldots i_k} f(a)$ for every $k\le n$ and $1\le i_j \le p$ $(1\le j\le k)$. Then the polynomial $q=g(x_1 +a_1 ,\ldots , x_p +a_p )$ has degree at most n. Write q as the sum of the monomials ${c\cdot x^{s_1}_1 \cdots x^{s_p}_p}$ (with $c\ne 0$); then ${s_1 +\ldots +s_p \le n}$ holds for each term. If we replace $x_i$ by $x_i -a_i$ in q for every $i=1,\ldots , p$, we recover g, and we get that (1.24) is true for suitable coefficients $c_{s_1 \ldots s_p}$. Then by Lemma 1.92 we have

$$s_1 !\cdots s_p ! \cdot c_{s_1 \ldots s_p}=D_{i_1 \ldots i_k} g(a)=D_{i_1 \ldots i_k} f(a)$$

for every $(i_1 ,\ldots , i_k )$, i.e., $g=t_n $. $\square $

We can see that

$$ t_1 (x) =f(a)+D_1 f(a)\cdot (x_1 -a_1 )+\ldots +D_p f(a)\cdot (x_p -a_p), $$

i.e., the graph of the polynomial $t_1$ is the tangent plane of $\mathrm{graph}\, f$ at (a, f(a)).

In the cases $p=2$ and $p=3$, the polynomial $t_2$ can be written as

$$\begin{aligned}&t_2 (x,y)=f(a,b) +f'_x (a,b)\cdot (x-a)+f'_y (a, b)\cdot (y-b)+\\&+ \frac{1}{2}\cdot f''_{xx} (a, b)\cdot (x-a)^2 +f''_{xy} (a, b)\cdot (x-a)(y-b) +\frac{1}{2}\cdot f''_{yy} (a, b)\cdot (y-b)^2 , \end{aligned}$$

or

$$\begin{aligned} t_2 (x,y,z)&=f(a,b,c) +f'_x (a,b,c)\cdot (x-a)+f'_y (a,b,c)\cdot (y-b)+\\&+f'_z (a,b, c)\cdot (z-c)+\\&+\frac{1}{2}\cdot f''_{xx} (a,b, c)\cdot (x-a)^2+\\&+\frac{1}{2}\cdot f''_{yy} (a,b, c)\cdot (y-b)^2+\frac{1}{2}\cdot f''_{zz} (a,b, c)\cdot (z-c)^2 +\\&+ f''_{xy} (a,b,c)\cdot (x-a)(y-b)+f''_{xz} (a,b,c)\cdot (x-a)(z-c)+\\&+f''_{yz} (a, b, c)\cdot (y-b)(z-c), \end{aligned}$$

respectively.
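For a concrete illustration (the function and the point are our own choice, not from the text), take $f(x,y)=e^x\cos y$ at $(a,b)=(0,0)$. Then $f=f'_x=f''_{xx}=1$, $f'_y=f''_{xy}=0$, and $f''_{yy}=-1$, so the formula for $p=2$ gives $t_2(x,y)=1+x+\frac{x^2}{2}-\frac{y^2}{2}$. A short Python check that $t_2$ approximates f to second order near the origin:

```python
from math import exp, cos

def f(x, y):
    return exp(x) * cos(y)

def t2(x, y):
    # second Taylor polynomial of f at (0, 0), computed from the partials
    # f = 1, f_x = 1, f_y = 0, f_xx = 1, f_xy = 0, f_yy = -1
    return 1 + x + 0.5 * x**2 - 0.5 * y**2

# the error should shrink like the cube of the distance to the origin
for h in (0.1, 0.01):
    print(h, abs(f(h, 2 * h) - t2(h, 2 * h)))

assert abs(f(0.01, 0.02) - t2(0.01, 0.02)) < 1e-5
```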

Remark 1.94.

If the function f is n times differentiable at a, then the polynomial in (1.28) can be written in the following alternative form:

$$\begin{aligned} t_n (x)= f(a)&+ \sum _{i=1}^pD_i f(a) \cdot (x_i -a_i ) + \\&+\frac{1}{2!}\sum _{i_1 , i_2 =1}^p D_{i_1 i_2} f(a) \cdot (x_{i_1} - a_{i_1} ) (x_{i_2} -a_{i_2} ) + \ldots +\\&+ \frac{1}{n!} \sum _{i_1 ,\ldots , i_n =1}^p D_{i_1 \ldots i_n} f(a) \cdot (x_{i_1} - a_{i_1} ) \cdots (x_{i_n} - a_{i_n} ).\nonumber \end{aligned}$$

(1.30)

Indeed, suppose that the index i occurs in the sequence $(i_1 ,\ldots , i_k )$ exactly $s_i$ times (${i =1,\ldots , p}$). Then $s_1 ,\ldots , s_p$ are nonnegative integers with ${s_1 +\ldots +s_p =k}$. It is well known (and easy to show) that the number of distinct permutations of the sequence $(i_1 ,\ldots , i_k )$ is $\frac{k!}{s_1 !\cdots s_p !} $. Using the notation of Theorem 1.93, we can see that the term $D^{s_1 \ldots s_p} f(a)\cdot ({x_{1} - a_{1}})^{s_1} \cdots ({x_{p}-a_{p}})^{s_p}$ occurs $\frac{k!}{s_1 !\cdots s_p !}$ times in the kth sum on the right-hand side of (1.30). This proves that (1.28) and (1.30) define the same polynomial.

Definition 1.95.

The following notion makes it possible to represent the multivariable Taylor polynomials in a simple form similar to that in the single-variable case.

Definition 1.96.

If the function f is n times differentiable at $a\in {\mathbb {R}^{p}}$, then we call the polynomial in (1.31) the kth differential of the function f at a, and we use the notation $d^k f(a)$ $(k\le n)$. Thus $d^k f(a)$ is not a real number, but a p-variable polynomial. If $b=(b_1 ,\ldots , b_p ) \in {\mathbb {R}^{p}}$, then $d^k f(a)(b)$ is the value the polynomial $d^k f(a)$ takes at b; that is,

$$\begin{aligned} \sum _{\genfrac{}{}{0.0pt}1{s_1 ,\ldots , s_p \ge 0}{s_1 +\ldots +s_p =k}}\frac{k!}{s_1 ! \cdots s_p !} \cdot D^{s_1 \ldots s_p } f(a) \cdot x_{1}^{s_1} \cdots x_{p}^{s_p} =\nonumber \\ =\sum _{i_1 ,\ldots , i_k =1}^p D_{i_1 \ldots i_k} f(a) \cdot x_{i_1} \cdots x_{i_k} \end{aligned}$$

(1.31)

$$ d^k f(a)(b)=\sum _{i_1 ,\ldots , i_k =1}^p D_{i_1 \ldots i_k} f(a) \cdot b_{i_1} \cdots b_{i_k} . $$

For $p=2$ and $k=2$ we have

$$ d^2 f(a)(b)=f''_{xx}(a)b_1^2+2f''_{xy}(a)b_1b_2+f''_{yy}(a)b_2^2. $$

Using differentials, we can write the nth Taylor polynomial in the form

$$ t_n (x)=f(a)+d^1 f(a) (x-a) +\frac{1}{2!} d^2 f(a) (x-a) +\ldots + \frac{1}{n!} d^n f(a) (x-a). $$

Again, $d^kf(a)(x-a)$ is the value $d^kf(a)$ takes at $x-a$.
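As a small numerical illustration of the $p=2$, $k=2$ formula (the function is our own example, not from the text): for $f(x,y)=x^2y+y^3$ at $a=(1,1)$ we have $f''_{xx}=2y=2$, $f''_{xy}=2x=2$, $f''_{yy}=6y=6$, so $d^2 f(a)(b)=2b_1^2+4b_1b_2+6b_2^2$:

```python
def d2f(b1, b2, fxx=2.0, fxy=2.0, fyy=6.0):
    # d^2 f(a)(b) = f''_xx b1^2 + 2 f''_xy b1 b2 + f''_yy b2^2,
    # with the second partials of f(x, y) = x^2 y + y^3 at a = (1, 1)
    return fxx * b1**2 + 2 * fxy * b1 * b2 + fyy * b2**2

print(d2f(1, 2))   # 2 + 8 + 24 = 34
assert d2f(1, 2) == 34
```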

Theorem 1.97.

(Taylor’s formula) Let the function f be $(n+1)$ times differentiable at the points of
the segment [a, b], where $a, b\in {\mathbb {R}^{p}}$. Then there exists a point $c\in [a, b]$ such that

$$\begin{aligned} f(b)=t_n (b)+\frac{1}{(n+1)!} d^{n+1} f(c) (b-a). \end{aligned}$$

(1.32)

Lemma 1.98.

Let the function f be n times differentiable at the points of the segment [a, b], where $a, b\in {\mathbb {R}^{p}}$. If $F(t)=f(a+t\cdot (b-a))$ $(t\in [0,1])$, then the function F is n times differentiable on the interval $[0,1]$, and (1.33) holds for every $k\le n$ and $t\in [0,1]$.

$$\begin{aligned} F^{(k)} (t)= d^k f(a+t(b-a))(b-a) \end{aligned}$$

(1.33)

Proof.

We prove the lemma by induction on k. If $k=0$, then the statement is true, since $F^{(0)} (t)= F(t)=f(a+t(b-a))$, and $ d^0 f(a+t(b-a))$ is the constant polynomial $f(a+t(b-a))$. If $k=1$, then (1.33) is exactly part (i) of Theorem 1.79.

Let $1\le k<n$, and suppose that (1.33) is true for every $t\in [0,1]$. By the definition of the differential $d^k f$, we have

$$\begin{aligned} F^{(k)} (t)=\sum _{i_1 ,\ldots , i_k =1}^p D_{i_1 \ldots i_k} f(a+t(b-a)) \cdot (b_{i_1} - a_{i_1} ) \cdots (b_{i_k} - a_{i_k}) \end{aligned}$$

(1.34)

for every $t\in [0,1]$. Since f is $n>k$ times differentiable at the points of [a, b], every kth-order partial derivative $D_{i_1 \ldots i_k} f$ is differentiable there. By part (i) of Theorem 1.79, the function $t\mapsto D_{i_1 \ldots i_k} f(a+t(b-a))$ is differentiable on $[0,1]$, and its derivative is

$$\sum _{i=1}^pD_{i, i_1 \ldots i_k} f(a+t(b-a)) \cdot (b_i -a_i ).$$

This holds for every term on the right-hand side of (1.34). Thus $F^{(k)}$ is differentiable on $[0,1]$, and its derivative is

$$ F^{(k+1)} (t)= \sum _{i_1 ,\ldots , i_{k+1} =1}^p D_{i_1 \ldots i_{k+1}} f(a+t(b-a)) \cdot (b_{i_1} - a_{i_1} ) \cdots (b_{i_{k+1}} - a_{i_{k+1}} ). $$

Therefore, (1.33) holds for $k+1$, and (1.33) has been proved for every ${k\le n}$. $\square $
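Formula (1.33) with $k=1$ can be verified numerically on a concrete function (the function, points, and step size below are our own choices): for $f(x,y)=x^2y$, the difference quotient of $F(t)=f(a+t(b-a))$ should match $d^1 f(a+t(b-a))(b-a)=D_1 f\cdot(b_1-a_1)+D_2 f\cdot(b_2-a_2)$.

```python
def f(x, y):
    return x * x * y

def grad_f(x, y):
    # D1 f = 2xy, D2 f = x^2
    return (2 * x * y, x * x)

a, b = (1.0, 2.0), (3.0, 5.0)

def F(t):
    return f(a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1]))

t = 0.3
x, y = a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1])
d1, d2 = grad_f(x, y)
exact = d1 * (b[0] - a[0]) + d2 * (b[1] - a[1])   # right-hand side of (1.33), k = 1

h = 1e-6
numeric = (F(t + h) - F(t - h)) / (2 * h)          # central difference quotient
print(exact, numeric)
assert abs(exact - numeric) < 1e-6
```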

Proof of Theorem 1.97.

Let $F(t)=f(a+t\cdot (b-a))$, for every ${t\in [0,1]}$. By Lemma 1.98, F is $(n+1)$ times differentiable on the interval $[0,1]$, and (1.33) holds for every $k\le n+1$ and $t\in [0,1]$. If we apply (the single-variable version of) Taylor’s formula with Lagrange remainder (see [7, 13.7]), we get (1.32). $\square $

Theorem 1.99.

Let the function f be n times differentiable at $a=(a_1 ,\ldots , a_p )\in {\mathbb {R}^{p}}$, and let $t_n$ be the nth Taylor polynomial of f at a. Then (1.35) holds. Conversely, if a polynomial q of degree at most n satisfies (1.36), then $q=t_n $. (In other words, among the polynomials of degree at most n, $t_n$ is the one that approximates the function f best locally at the point a.)

$$\begin{aligned} \lim _{x\rightarrow a} \frac{f(x)- t_n (x)}{|x-a|^n} =0. \end{aligned}$$

(1.35)

$$\begin{aligned} \lim _{x\rightarrow a} \frac{f(x)-q(x)}{|x-a|^n} =0, \end{aligned}$$

(1.36)

Proof.

For $n=1$, equation (1.35) is exactly the definition of differentiability of f at a. Thus, we may assume that $n\ge 2$.

Let f be n times differentiable at a. The function $g=f-t_{n}$ is also n times differentiable at a, and by Theorem 1.93, the partial derivatives of g of order at most n are all zero at a. The $(n-1)$st-order partial derivatives of g are differentiable at a, and, for the same reason as above, both their values at a and the values of their partial derivatives at a are zero. By the definition of differentiability, for every $\varepsilon >0$ there exists $\delta >0$ such that if $|x-a|<\delta $, then

$$\begin{aligned} \left| D_{i_1 \ldots i_{n-1}} g(x)\right| \le \varepsilon \cdot |x-a| \end{aligned}$$

(1.37)

for every $1\le i_{j} \le p\,(j=1,\ldots , n-1)$. Let us apply Taylor’s formula (1.32) to g, with $n-2$ in place of n. We find that for every $x\in B(a,\delta )$ there exists ${c\in [a, x]}$ such that

$$\begin{aligned} g(x)&=\frac{1}{(n-1)!} d^{n-1} g(c) (x-a)=\\&=\frac{1}{(n-1)!} \cdot \sum _{i_1 ,\ldots , i_{n-1} =1}^p D_{i_1 \ldots i_{n-1}} g(c)(x_{i_1} -a_{i_1})\cdots (x_{i_{n-1}} -a_{i_{n-1}} ). \end{aligned}$$

Since $|c-a|<\delta $, it follows from (1.37) that

$$|g(x)|\le \frac{p^{n-1}}{(n-1)!} \cdot \varepsilon \cdot |c-a|\cdot |x-a|^{n-1} \le \frac{p^{n-1}}{(n-1)!} \cdot \varepsilon \cdot |x-a|^{n} .$$

This implies

$$\frac{|f(x)-t_{n}(x)|}{|x-a|^{n}} \le \frac{p^{n-1}}{(n-1)!} \cdot \varepsilon $$

for every $0<|x-a|<\delta $. Since $\varepsilon $ was arbitrary, (1.35) has been proved.

Now let us assume that (1.36) holds for a polynomial q of degree at most n. Then $r=q-t_n$ is a polynomial of degree at most n, and (1.38) holds. We need to prove that r is the constant zero function. If $p=1$, then (1.38) implies that a is a root of r with multiplicity at least $n+1$. Since the degree of r is at most n, this is possible only if r is identically zero.

$$\begin{aligned} \lim _{x\rightarrow a} r(x)/|x-a|^n =0. \end{aligned}$$

(1.38)

Let $p>1$, and let $s(t)=r(a+ty)$ $(t\in \mathbb {R})$, where y is a fixed nonzero p-dimensional vector. It is obvious that s is a polynomial in the variable t of degree at most n. Applying Theorem 1.49 on the limit of composite functions to (1.38), we obtain $\lim _{t\rightarrow 0} s(t)/|ty|^n =0$, and thus $\lim _{t\rightarrow 0} s(t)/|t|^n =0$. As we saw above, this implies that $s(t)=0$ for every t. Then $r(a+y)=s(1)=0$ for every vector $y\in {\mathbb {R}^{p}},\ y\ne 0$. Since r is continuous at the point a, it follows that $r(a)=0$ as well, and thus $r\equiv 0$. $\square $

Let f be a function of one variable, and suppose that f is twice differentiable at the point $a\in \mathbb {R}$. It is well known that if $f'(a)=0$ and $f''(a)>0$, then f has a strict local minimum at the point a, and if $f'(a)=0$ and $f''(a)<0$, then f has a strict local maximum at the point a. (See [7, Theorem 12.60].) This implies that if f has a local minimum at the point a, then necessarily $f''(a)\ge 0$. The following application of Taylor’s formula gives a generalization of these results to multivariable functions.

To state our theorem, we need to introduce a few concepts from the field of algebra. We say that a p-variable polynomial is a quadratic form
if every term of its canonical form is of degree two. In other words, a polynomial is a quadratic form if it can be written as $\sum _{i, j=1}^p c_{ij} x_i x_j$. Note that if f is twice differentiable at a, then the second differential $d^2 f(a)$ is a quadratic form, since $d^2 f(a)(x)=\sum _{i, j=1}^p D_{ij} f(a)\cdot x_i x_j $.

Definition 1.100.

A quadratic form q is positive (negative) definite
if ${q(x)>0}$
($q(x)<0$) for every $x\ne 0$.

A quadratic form q is positive (negative) semidefinite
if ${q(x)\ge 0}$
(${q(x)\le 0}$) for every $x\in {\mathbb {R}^{p}}$.

A quadratic form q is indefinite
if it takes both positive and negative values.

Theorem 1.101.

Let f be twice differentiable at $a\in {\mathbb {R}^{p}}$, and let $D_i f(a)=0$ for every $i=1,\ldots , p$.

- (i)If f has a local minimum (maximum) at a, then the quadratic form $d^2 f(a)$ is positive (negative) semidefinite.
- (ii)If the quadratic form $d^2 f(a)$ is positive (negative) definite, then f has a strict local minimum (maximum) at a.

Proof.

(i) We prove the result by contradiction. Let f have a local minimum at a, and suppose that there exists a point $x_0 $ such that $d^2 f(a) (x_0 )<0$. Since $D_i f(a)= 0$ for every $i= 1,\ldots , p$, we have $d^1 f(a)= 0$, and thus $t_2 (x)=f(a)+\frac{1}{2}\cdot d^2 f(a) (x-a)$ for every x. According to Theorem 1.99,

$$\begin{aligned} \lim _{x\rightarrow a} \frac{f(x)-t_2 (x)}{|x-a|^2} =0. \end{aligned}$$

(1.39)

For $t\ne 0$ small enough, (1.39) implies

$$|f(a+tx_0 )-t_2 (a+tx_0 )|< \frac{|d^2 f(a)(x_0 )|}{2} \cdot t^2 .$$

On the other hand,

$$t_2 (a+tx_0 )=f(a)+\frac{t^2}{2} \cdot d^2 f(a) (x_0 ),$$

and thus

$$ f(a+tx_0 )< f(a)+\frac{t^2}{2} \cdot d^2 f(a) (x_0 )+\frac{t^2}{2} \cdot |d^2 f(a) (x_0 )| =f(a) $$

for every $t\ne 0$ small enough. This means that if $d^2 f(a)$ takes a negative value, then f takes a value less than f(a) in every neighborhood of a, which is a contradiction.

We can prove similarly that if f has a local maximum at a, then $d^2 f(a) $ is negative semidefinite. Thus (i) is proved.

Now let $d^2 f(a) $ be positive definite. The function $d^2 f(a) $ is positive and continuous on the set $S(0,1)= \{ x\in {\mathbb {R}^{p}}:|x|=1\}$. Since S(0, 1) is bounded and closed, Theorem 1.51 implies that $d^2 f(a) $ takes a least value on S(0, 1). Let this value be m; then $m>0$ and $d^2 f(a) (x)\ge m$ for every $x\in S(0,1)$. If $x\ne 0$, then $x/|x| \in S(0,1)$, and thus

$$\begin{aligned} d^2 f(a) (x)=|x|^2 \cdot d^2 f(a) (x/|x|)\ge m\cdot |x|^2 . \end{aligned}$$

(1.40)

By (1.39), there exists $\delta >0$ such that $|f(x)-t_2 (x) | < (m/2)\cdot |x-a|^2$ for every $0<|x-a|<\delta $. If $0<|x-a|<\delta $, then (1.40) implies

$$ f(x)>t_2 (x)- (m/2)\cdot |x-a|^2 \ge f(a) +\tfrac{1}{2} \cdot m\cdot |x-a|^2 - (m/2)\cdot |x-a|^2 =f(a). $$

This proves that f has a strict local minimum at a. Similarly, if $d^2 f(a) $ is negative definite, then f has a strict local maximum at a, which proves (ii). $\square $
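As an illustration of part (ii) (the function is our own choice, not from the text), $f(x,y)=x^2+xy+y^2$ has $D_1f=2x+y$ and $D_2f=x+2y$, both zero at the origin, and $d^2f(0)(x,y)=2x^2+2xy+2y^2=\left(x+\frac{y}{2}\right)^2+x^2+\frac{7}{4}y^2$ is positive definite, so Theorem 1.101 predicts a strict local minimum at 0. A numerical sanity check:

```python
from math import cos, sin, pi

# f has gradient zero at the origin and a positive definite second
# differential, so Theorem 1.101 (ii) predicts a strict local minimum there.
def f(x, y):
    return x * x + x * y + y * y

def grad(x, y):
    return (2 * x + y, x + 2 * y)

assert grad(0, 0) == (0, 0)

# sample directions on a small circle around the origin: f should exceed f(0, 0)
r = 1e-3
values = [f(r * cos(2 * pi * k / 12), r * sin(2 * pi * k / 12)) for k in range(12)]
print(min(values))
assert min(values) > f(0, 0)
```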

Remark 1.102.

1. For $p=1$, we have $d^2 f(a) (x)=f''(a)\cdot x^2$, which is positive definite if $f''(a)>0$, negative definite if $f''(a)<0$, positive semidefinite if $f''(a)\ge 0$, and negative semidefinite if $f''(a)\le 0$. (For single-variable functions every quadratic form is semidefinite; there are no indefinite quadratic forms.) Thus, (i) of Theorem 1.101 gives the statement we quoted above: if $f'(a)=0$ and $f''(a)>0$, then f has a strict local minimum at the point a.

Note that for $p>1$, there exist indefinite quadratic forms (e.g., $x_1 x_2$).

2. We show that neither of the converses of the statements of Theorem 1.101 is true. Obviously, every first- and second-order partial derivative of the polynomial $f(x_1 ,\ldots , x_p )= x_1^3 $ is zero at the origin. Thus the quadratic form $d^2 f(0)$ is constant and equal to zero. Consequently, it is positive semidefinite. Still, the function f does not have a local minimum at the origin, since it takes negative values in every neighborhood of the origin.

Now consider the polynomial $g(x_1 ,\ldots , x_p )= x_1^4 +\ldots + x_p^4$, which has a strict local minimum at the origin. Since every second-order partial derivative of g is zero at the origin, the quadratic form $d^2 g(0)$ is constant and equal to zero, and is therefore not positive definite.

3. The quadratic form $ax^2 +bxy +cy^2$ is positive definite if and only if $a> 0$ and $b^2 -4ac < 0$. A classic theorem of abstract algebra states that for every quadratic form (of an arbitrary number of variables) an appropriate matrix (or rather the signs of its subdeterminants) formed from the coefficients of the quadratic form can tell us whether the quadratic form is positive (negative) definite, or positive (negative) semidefinite. For a mathematically precise statement see [6, Theorem 7.3.4].
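The two-variable criterion quoted above extends to a full classification of $q(x,y)=ax^2+bxy+cy^2$ by the sign of the discriminant $b^2-4ac$ together with the signs of a and c. A Python sketch of this case analysis (the function name and labels are our own):

```python
def classify(a, b, c):
    """Classify the quadratic form q(x, y) = a x^2 + b x y + c y^2."""
    disc = b * b - 4 * a * c
    if disc > 0:
        return "indefinite"          # q takes both signs
    # disc <= 0: a and c cannot have opposite signs
    if a > 0 or c > 0:
        return "positive definite" if disc < 0 else "positive semidefinite"
    if a < 0 or c < 0:
        return "negative definite" if disc < 0 else "negative semidefinite"
    return "identically zero"        # a = c = 0 forces b = 0 here

print(classify(1, 0, 1))     # x^2 + y^2: positive definite
print(classify(0, 1, 0))     # x*y: indefinite
print(classify(1, 2, 1))     # (x + y)^2: positive semidefinite
assert classify(-1, 0, -2) == "negative definite"
```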

A single-variable differentiable function f is convex on an interval if and only if each of the tangents of graph f is under the graph of the function (see [7, Theorem 12.64]). Also, a twice-differentiable function is convex on an interval if and only if its second derivative is nonnegative everywhere on the interval (see [7, Theorem 12.65]). Both statements can be generalized in the multivariable case.

Definition 1.103.

We say that the set $H\subset {\mathbb {R}^{p}}$ is convex if H contains every segment whose endpoints are in H.

Every ball is convex. Indeed, if $x,y\in B(a, r)$, then

$$\begin{aligned} |x+t(y-x) -a|&=|(1-t)(x-a)+t(y-a)|\le \\&\le (1-t)|x-a|+t|y-a|<\\&< (1-t)r+tr=r \end{aligned}$$

for every $t\in [0,1]$, i.e., every point of the segment [x, y] is in B(a, r).

A similar argument shows that every closed ball is convex. It is also easy to see that every open or closed box is also convex.

Definition 1.104.

Let $H\subset {\mathbb {R}^{p}}$ be convex. We say that the function ${f:H\rightarrow \mathbb {R}}$ is convex on the set H
if for every $x, y\in H$, the single-variable function $t\mapsto f(x+t(y-x))$ is convex on the interval $[0,1]$. That is, f is convex on H if

$$f((1-t)x+ty)\le (1-t)f(x)+tf(y)$$

holds for every $x, y\in H$ and $t\in [0,1]$.

We say that the function ${f:H\rightarrow \mathbb {R}}$ is concave on the set H
if $-f$
is convex on H.

Figure 1.18 shows an example of a convex function.

Theorem 1.105.

Let f be differentiable on the convex and open set $G\subset {\mathbb {R}^{p}}$. The function f is convex on G if and only if the graph of f lies above the tangent hyperplane at the point (a, f(a)) for every $a\in G$; in other words, f is convex on G if and only if (1.41) holds for every $a, x\in G$.

$$\begin{aligned} f(x)\ge f(a)+\langle f' (a), x-a\rangle \end{aligned}$$

(1.41)

Proof.

Let f be convex on G, and let a and x be different points of G. By Theorem 1.79, the single-variable function $F(t)=f(a+t(x-a))$ is differentiable on $[0,1]$, and $F'(t)=\langle f'(a+t(x-a)), x-a\rangle $ for every $t\in [0,1]$. Since F is convex on $[0,1]$ (by our assumption), we have

$$f(x)=F(1)\ge F(0)+F'(0)=f(a)+\langle f'(a), x-a\rangle ,$$

which is exactly (1.41). (Here we applied [7, Theorem 12.64].)

Now suppose that (1.41) holds for every $a, x\in G$. Let F be the same function as above. We have to prove that F is convex on $[0,1]$. By [7, Theorem 12.64], it is enough to show that $F(t)\ge F(t_0 )+F'(t_0 )(t-t_0 )$ for every $t, t_0 \in [0,1]$. Since $F'(t)=\langle f'(a+t(x-a)), x-a\rangle $, we have to show that

$$f(a+t(x-a)) \ge f(a+t_0 (x-a)) + \langle f'(a+t_0 (x-a)), (t-t_0 )\cdot (x-a ) \rangle .$$

However, this follows from (1.41) if we apply it with ${a+t_0 (x-a)}$ and $a+t (x-a)$ in place of a and x, respectively. $\square $

Theorem 1.106.

Let f be twice differentiable on the convex and open set $G\subset {\mathbb {R}^{p}}$. The function f is convex on G if and only if the quadratic form $d^2 f(a)$ is positive semidefinite for every $a\in G$.

Proof.

Let f be convex on G, and let a and b be different points of G. By Lemma 1.98, the function $F(t)=f(a+t(b-a))$ is twice differentiable on the interval $[0,1]$, and $F'' (0)=d^2 f (a)(b-a)$. Since F is convex on $[0,1]$ (by our assumption), we have $F'' (0)= d^2 f (a)(b-a)\ge 0$. This is true for every $b\in G$, showing that $d^2 f (a)$ is positive semidefinite. Indeed, since G is open, we must have $B(a, r)\subset G$ for a suitable $r>0$. For every $x\in {\mathbb {R}^{p}}$ we have $a+tx\in B(a, r)$ if t is small enough, i.e., $d^2 f (a)(tx)\ge 0$ for every t small enough. Since ${d^2 f (a)(tx)}={t^2 \cdot d^2 f (a)(x)}$, it follows that $d^2 f (a)(x)\ge 0$, and $d^2 f(a)$ is positive semidefinite.

Now let $d^2 f(a)$ be positive semidefinite for every $a\in G$. Let a and b be distinct points of G, and let $F(t)=f(a+t(b-a))$ $(t\in [0,1])$. By Lemma 1.98, F is twice differentiable on the interval $[0,1]$, and $F'' (t)=d^2 f(a+t(b-a))(b-a)\ge 0$, since ${d^2 f(a+t(b-a))}$ is positive semidefinite. This implies that F is convex on $[0,1]$. Since this is true for every $a, b\in G$, $a\ne b$, this means that f is convex on G. $\square $

Example 1.108.

Let $p=2$. The graph of the polynomial ${f(x, y)=x^2 +y^2}$ is a paraboloid of revolution,
since it can be obtained by rotating the graph of the single-variable function ${z=x^2}$ around the z-axis. We show that f is convex in the plane.

For every $(a, b)\in \mathbb {R}^2$ we have

$$\begin{aligned} D_{1,1} f(a, b)&=2,\\ D_{2,1} f(a, b)&=D_{1,2}f(a, b)=0,\\ \hbox {and}\;\;D_{2,2} f(a, b)&=2. \end{aligned}$$

Thus $d^2 f(a,b)(x, y)=2x^2+2y^2$. Since this quadratic form is positive definite, it follows from Theorem 1.106 that f is convex.

Exercises

1.102.

What are the third Taylor polynomials of the following functions?

- (a)x / y at (1, 1);
- (b)$x^3 +y^3 +z^3 -3xyz$ at (1, 1, 1);
- (c)$\sin (x+y)$ at (0, 0);
- (d)$x^y$ at (1, 0).

1.103.

Find the local extremum points and also the least and greatest values (if they exist) of the following two-variable functions:

- (a)$x^2 +xy +y^2-3x-3y$;
- (b)$x^3 y^2 (2-x-y)$;
- (c)$x^3 +y^3 -9xy$;
- (d)$x^4 +y^4 -2x^2 +4xy -2y^2 $.

1.104.

Let $H\subset {\mathbb {R}^{p}}$ be convex. Show that the function $f:H\rightarrow \mathbb {R}$ is convex if and only if the set

$$\{ (x, y)\in \mathbb {R}^{p+1} :x\in H,\ y\ge f(x)\} \subset \mathbb {R}^{p+1}$$

is convex.

1.105.

Let $G\subset {\mathbb {R}^{p}}$ be convex and open. Show that if $f:G\rightarrow \mathbb {R}$ is convex, then it is continuous.

1.106.

Let $G\subset {\mathbb {R}^{p}}$ be convex and open. Show that the function $f:G\rightarrow \mathbb {R}$ is convex if and only if it is continuous and

$$f\left( \frac{x+y}{2}\right) \le \frac{f(x)+f(y)}{2}$$

holds for every $x, y\in G$.

In our previous investigations we introduced the notions of tangent lines and tangent planes in connection with approximations by linear functions. However, the intuitive notion of tangent lines also involves the idea that tangents are the “limits of the secant lines.” Let, for example, f be a one-variable function differentiable at a. The slope of the line (the “secant”) intersecting the graph of f at the points (a, f(a)) and (x, f(x)) is ${(f(x)-f(a))/(x-a)}$. This slope converges to $f'(a)$ as $x\rightarrow a$, and thus the secant “converges” to the line with slope $f'(a)$ that contains point (a, f(a)), i.e., to the tangent line. More precisely, if x converges to a from the right or from the left, then the half-line with endpoint (a, f(a)) that intersects (x, f(x)) “converges” to one of the half-lines that are subsets of the tangent and lie above $[a,\infty )$ or $(-\infty , a]$, respectively. This property will be used for a more general definition of the tangent.

Let $x_0$ and x be different points of ${\mathbb {R}^{p}}$. The half-line $\overrightarrow{x_0 x}$ with endpoint $x_0$ and passing through x consists of the points $x_0 +t(x-x_0 )$ $(t\in \mathbb {R},\ t\ge 0)$. We say that the unit vector $(x-x_0 )/|x-x_0 |$ is the direction vector of this half-line.
Let $x_n \rightarrow x_0$
and $x_n \ne x_0$, for every n, and let ${(x_n -x_0 )/|x_n -x_0 |} \rightarrow v$. In this case we say that the sequence of half-lines $\overrightarrow{x_0 x_n }$ converges to the half-line $\{ x_0 +tv:t\ge 0\}$.

Let $H\subset {\mathbb {R}^{p}}$, and let $x_0 \in H'$. If $x_n \in H\setminus \{ x_0 \}$ and $x_n \rightarrow x_0 $, then by the Bolzano–Weierstrass theorem (Theorem 1.9), the sequence of unit vectors ${(x_n -x_0 )/|x_n -x_0 |}$ has a convergent subsequence. We say that the contingent
of the set H at $x_0$ is the set of vectors v for which there exists a sequence $x_n \in H\setminus \{ x_0 \}$ such that $x_n \rightarrow x_0$ and ${(x_n -x_0 )/|x_n -x_0 |} \rightarrow v$. We denote the contingent of the set H at $x_0$ by $\mathrm{Cont}\, (H;x_0 )$. It is clear that $\mathrm{Cont}\, (H;x_0 )\ne \emptyset $ for every ${x_0 \in H'}$.

In the next three examples we investigate the contingents of curves.
By a curve we mean a map $g:[a, b] \rightarrow {\mathbb {R}^{p}}$ (see [7, p. 380]).

Example 1.109.

1. If the single-variable function f is differentiable at a, then $\mathrm{Cont}\, (\mathrm{graph}~f ; (a, f(a)))$ contains exactly two unit vectors, namely the vector

$$\left( \frac{1}{\sqrt{1+(f'(a))^2 }} ,\frac{f'(a)}{\sqrt{1+(f'(a))^2 } } \right) $$

with slope $f'(a)$, and its negative.

2. Let $g:[a, b]\rightarrow {\mathbb {R}^{p}}$ be a curve, and let g be differentiable at $t_0 \in (a, b)$ with $g'(t_0 )\ne 0$. The contingent of the set $\Gamma =g([a, b])$ at $g(t_0 )$ contains the unit vectors $\pm g'(t_0 )/|g'(t_0 )|$. Indeed, if $t_n \rightarrow t_0 $, then

$$\frac{g(t_n )-g(t_0 )}{t_n -t_0 }\rightarrow g'(t_0 ).$$

We have

$$ \left| \frac{g(t_n )-g(t_0 )}{t_n -t_0 } \right| \rightarrow |g'(t_0 )|, $$

which implies

$$ \frac{g(t_n )-g(t_0 )}{|g(t_n )-g(t_0 )|} = \frac{(g(t_n )-g(t_0 ))/(t_n -t_0 )}{|g(t_n )-g(t_0 )| /(t_n -t_0 )} \rightarrow \frac{g'(t_0 )}{|g'(t_0 )|} $$

if $t_n >t_0$ for every n. Therefore, $g'(t_0 )/|g'(t_0 ) |\in \mathrm{Cont}\, (\Gamma ; g(t_0 ))$. If $t_n$ converges to $t_0$ from the left, we get $-g'(t_0 )/|g'(t_0 )| \in \mathrm{Cont}\, (\Gamma ; g(t_0 ))$ in the same way.

3. Let g be a curve that passes through the point $g(t_0 )$ only once, i.e., $g(t)\ne g(t_0 )$ for every $t\ne t_0$. It is easy to see that then $g(t_n ) \rightarrow g(t_0 )$ holds only if $t_n \rightarrow t_0 $. If we also assume that $g'(t_0 )\ne 0$, then we obtain that the contingent $\mathrm{Cont}\, (\Gamma ; g(t_0 ))$ consists of exactly the unit vectors $\pm g'(t_0 )/|g'(t_0 )|$.
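The claim in part 1 of Example 1.109 can be illustrated numerically (the function and point are our own choice): for $f(x)=x^2$ at $a=1$ we have $f'(a)=2$, so the contingent should contain $(1,2)/\sqrt5$. The unit secant vectors $\bigl(x_n-a,\ f(x_n)-f(a)\bigr)$ divided by their length indeed approach this vector as $x_n\to a$ from the right:

```python
from math import sqrt, isclose

def f(x):
    return x * x

a = 1.0
expected = (1 / sqrt(5), 2 / sqrt(5))   # (1, f'(a)) / sqrt(1 + f'(a)^2), f'(1) = 2

def secant_direction(x):
    # unit vector pointing from (a, f(a)) toward (x, f(x))
    dx, dy = x - a, f(x) - f(a)
    norm = sqrt(dx * dx + dy * dy)
    return (dx / norm, dy / norm)

for x in (1.1, 1.01, 1.001):
    print(secant_direction(x))

v = secant_direction(1.0 + 1e-8)
assert isclose(v[0], expected[0], abs_tol=1e-7)
assert isclose(v[1], expected[1], abs_tol=1e-7)
```

Approaching from the left gives the negative of this vector, matching the "exactly two unit vectors" statement.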

The examples above motivate the following definition of the tangent.

Definition 1.110.

Let $x_0 \in H'$
, and let $|v|=1$. We say that the line $\{ x_0 +tv:t\in \mathbb {R}\}$ is the tangent line of the set H at the point $x_0$ if $\mathrm{Cont}\, ( H;x_0 )=\{ v,-v\} $.

By this definition, the graph of the function f has a tangent line at the point (a, f(a)) not only when f is differentiable at a, but also when $f'(a)=\infty $ or $f'(a)=-\infty $. On the other hand, if $f'_- (a)=-\infty $ and $f'_+ (a)=\infty $, then $\mathrm{graph}~f$ does not have a tangent line at (a, f(a)).

We can easily generalize Definition 1.110 to tangent planes.

Definition 1.111.

Let $x_0 \in H'$, and let S be a plane containing the origin (i.e., let S be a two-dimensional subspace). We say that a plane $\{x_0 +s:s\in S\}$ is the tangent plane of the set H at the point $x_0$ if $\mathrm{Cont}\, ( H;x_0 )$ consists of exactly the unit vectors of S.

Let the function $f:\mathbb {R}^2\rightarrow \mathbb {R}$ be differentiable at $(a, b)\in \mathbb {R}^2$. It is not very difficult to show (though some computation is involved) that the contingent of the set $\mathrm{graph}~f$ at the point (a, b, f(a, b)) consists of the unit vectors $(v_1 , v_2 , v_3 )\in \mathbb {R}^3$ for which $v_3 =D_1 f(a, b)v_1 +D_2 f(a, b)v_2 $.

Footnotes

7

By a polygonal line we mean a set of the form $[a_0 , a_1] \cup [a_1 , a_2 ]\cup \ldots \cup [a_{n-1}, a_n ]$, where $a_0 ,\ldots , a_n$ are arbitrary points in ${\mathbb {R}^{p}}$.

11

By the elementary functions we mean the polynomial, rational, exponential, power, logarithmic, hyperbolic, and trigonometric functions and their inverses, and all functions that can be obtained from these using basic operations and composition.

13

Each of these symbols appears in practice. The symbol ${\partial f}/{\partial x_i} $ is used mostly by engineers and physicists and in older books on mathematics; the symbol $f_{x_i}$ appears in the field of partial differential equations. The symbol $D_i$ is used in contemporary pure mathematics; most of the time (though not exclusively) we will also write $D_i$ for the ith partial derivative.

14

Rudolf Otto Sigismund Lipschitz (1832–1903), German mathematician. A function f is said to have the Lipschitz property
(is Lipschitz, for short) on a set A if there exists a constant $K \ge 0$ such that $|f(x_1)-f(x_0)|\le K\cdot |x_1 -x_0 |$
for all $x_0, x_1 \in A$.

15

The mean value theorem states that if $g:[a, b]\rightarrow \mathbb {R}$ is continuous on [a, b] and differentiable on (a, b), then there is a point $c\in (a, b)$ such that $g'(c)=(g(b)-g(a))/(b-a)$. See [7, Theorem 12.50].