Chapter 2 Elementary Prime Number Theory

for 2018-19 v1 [5 lectures]

In keeping with the ’elementary’ theme of the title I will attempt to keep away from complex variables. Recall that in Chapter 1 we proved the infinitude of primes by relating ∑_p1/p^σ to ζ(σ) for σ > 1. From the Euler Product, we formally get (so not worrying about convergence), $$\log \zeta(\sigma)=\sum_{p} \log \left(1-\frac{1}{p^{\sigma}}\right)^{-1}$$ for σ > 1. We will equate the derivatives of both sides, using the logarithmic derivative $$\frac{d}{d \sigma} \log f(\sigma)=\frac{f^{\prime}(\sigma)}{f(\sigma)}=\frac{f^{\prime}}{f}(\sigma)$$ along with $$\frac{d}{d \sigma} \frac{1}{p^{\sigma}}=\frac{d}{d \sigma} e^{-\sigma \log p}=-\frac{\log p}{p^{\sigma}}$$ Then, because the resulting sum, (1) below, converges uniformly for σ≥ 1 + δ, for any δ > 0 (see Background: Complex Analysis II for a discussion of uniform convergence) we can justify differentiating term-by term to get$$\tag{1}\begin{aligned} \frac{\zeta^{\prime}}{\zeta}(\sigma) &=\frac{d}{d \sigma} \log \zeta(\sigma)=-\sum_{p} \frac{d}{d \sigma} \log \left(1-\frac{1}{p^{\sigma}}\right) \\ &=-\sum_{p} \frac{1}{\left(1-\frac{1}{p^{\sigma}}\right)} \frac{d}{d \sigma}\left(1-\frac{1}{p^{\sigma}}\right) \\ &=-\sum_{p} \frac{1}{\left(1-\frac{1}{p^{\sigma}}\right)} \frac{\log p}{p^{\sigma}} \end{aligned}$$(In the Appendix we show this series converges uniformly for σ ≥ 1 + δ ). Expand (1−1/p^σ)^− 1 as a geometric series to get $$-\frac{\zeta^{\prime}}{\zeta}(\sigma)=\sum_{p} \frac{\log p}{p^{\sigma}} \sum_{k \geq 0} \frac{1}{p^{k \sigma}}=\sum_{p} \sum_{r \geq 1} \frac{\log p}{p^{r \sigma}},$$ on relabelling k + 1 as r.

The right hand side here is a double sum. See the Background: Product of Series notes to see that when the double sum is absolutely convergent, as it is in this case, then it can be rearranged in any way and the resulting series will converge to the same value. In particular, we can write out the right hand side starting as $$\frac{0}{1^{s}}+\frac{\log 2}{2^{s}}+\frac{\log 3}{3^{s}}+\frac{\log 2}{4^{s}}+\frac{\log 5}{5^{s}}+\frac{0}{6^{s}}+\frac{\log 7}{7^{s}}+\frac{\log 2}{8^{s}}+\frac{\log 3}{9^{s}}+\frac{0}{10^{s}}+\ldots$$ This non-rigorous introduction is simply to motivate the following definition.

Definition 2.1 von Mangoldt’s function is defined by $$\Lambda(n)= \begin{cases}\log p & \text { if } n=p^{r}, \\ 0 & \text { otherwise. }\end{cases}$$ Then the above argument concludes with $$\frac{\zeta^{\prime}}{\zeta}(\sigma)=-\sum_{n=1}^{\infty} \frac{\Lambda(n)}{n^{\sigma}},$$ for σ ≥ 1 + δ for any δ > 0, i.e. σ > 1.

Note In the next chapter we will show this equality holds with real σ replaced by complex s for Re s > 1.

Definition 2.2 A Dirichlet Series is a sum of the form $$\sum_{n=1}^{\infty} \frac{a_{n}}{n^{s}}$$ for some sequence {a_n}_n ≥ 1 of complex numbers, where s ∈ ℂ.

Aside Given a sequence {a_n}_n ≥ 1 the associated Dirichlet Series may not converge for any s ∈ ℂ. If it does converge for some s then it can be shown that it will converge in some half-plane Re s > c (which may be the whole of ℂ, i.e. c = − ∞). We can also look at absolute convergence; again if it converges absolutely at some point then it will do so in some half-plane Re s > c_a. Since absolute convergence implies convergence we have c ≤ c_a. It can be shown that 0 ≤ c_a − c ≤ 1. End of Aside Example 2.3ζ(s) and ζ^′(s)/ζ(s) for Re s > 1 are Dirichlet Series.

Notation For an integer n > 1 and prime p, then p^a∥n means that a is the largest power of p that divide p.

For example, if n = 2³5⁴13² = 845000 then 2³∥845000,5⁴∥845000 and 13²∥845000. Note that log 845000 = 3log 2 + 4log 5 + 2log 13 = ∑_p^a||845000alog p a general form of which will be seen in the next proof.

This notation allows an efficient way of writing an integer n in terms of its prime divisors as n = ∏_p^a∥np^a The basic and important property of von Mangoldt’s Λ is

Theorem 2.4 For n ≥ 1$$\sum_{d \mid n} \Lambda(d)=\log n\tag2$$ Proof If n = 1 both sides of (2) are zero.

If n > 1 then ∑_d ∣ nΛ(d) = ∑_{p^r ∣ n}Λ(p^r), which simply means that we have excluded the terms with d not a prime power, for in such cases Λ(d) = 0. Yet on prime powers Λ(p^r) = log p so ∑_{p^r ∣ n}Λ(p^r) = ∑_{p^r ∣ n}log p Observe this is really a double sum, over p and r. Write n = ∏_p^a∥np^a, as a product of distinct primes. Then p^r ∣ n if, and only if, p^a||n and 1 ≤ r ≤ a. In which case $$\begin{aligned} \sum_{p^{r} \mid n} \log p &=\sum_{p^{a} \| n}\left(\sum_{1 \leq r \leq a} 1\right) \log p=\sum_{p^{a} \| n} a \log p \\ &=\sum_{p^{a} \| n} \log p^{a}=\log \prod_{p^{a} \| n} p^{a}=\log n \end{aligned}$$ Combine all these steps to get the stated result.

Notation 2.5 The summatory function of the von Mangoldt function is denoted by ψ(x) = ∑_n ≤ xΛ(n). Denote the sum over the logarithm of primes by θ(x) = ∑_p ≤ xlog p, and the unweighted sum by π(x) = ∑_p ≤ x1.
Landau’s big O-notation If f(x), g(x) and h(x) are functions then we write f(x) = O(h(x)) if there exists C > 0 such that |f(x)| < Ch(x) for all x. The constant C is referred to as the implied constant. The notation is extended so that f(x) = g(x) + O(h(x)) means there exists C > 0 such that |f(x)−g(x)| < Ch(x) for all x.

It is further extended so that f(x) ≤ g(x) + O(h(x)) means there exists a function k(x) satisfying both f(x) ≤ g(x) + k(x) and k(x) = O(h(x)).
Vinogradov’s ≪ ( read as "less than less than") notation. f(x) ≪ g(x) means exactly the same as f(x) = O(g(x)).

If g(x) ≪ f(x) ≪ g(x) we write f(x) ≍ g(x).
Little o-notation We write f(x) = o(g(x)) iff $$\lim _{x \rightarrow \infty} \frac{f(x)}{g(x)}=0 .$$Asymptotic We write f(x) ∼ g(x) iff $$\lim _{x \rightarrow \infty} \frac{f(x)}{g(x)}=1 .$$ Example 2.6 $$\sum_{1 \leq n \leq x} \frac{1}{n}=\log x+O(1)$$Proof in Chapter 1 it was shown that for integer N $$\log (N+1) \leq \sum_{n \leq N} \frac{1}{n} \leq \log N+1 .$$ Given real x > 1 apply this with N = [x], the integer part of x. Then N ≤ x < N + 1 and we deduce $$\log x \leq \sum_{1 \leq n \leq x} \frac{1}{n} \leq \log x+1 .$$ This means, on writing $$\mathcal{E}(x)=\sum_{1 \leq n \leq x} \frac{1}{n}-\log x,$$ that 0 ≤ ℰ(x) ≤ 1. Weaken this to |ℰ(x)| ≤ 1, the definition of ℰ(x) = O(1). Hence result.

To proceed with our investigation into prime numbers we need a version of this with a smaller error term. This is achieved using the following important result. But first a Subtle Point. If f is differentiable on an interval containing [a,b] it may appear obvious that f(b) − f(a) = ∫_a^bf^′(t)dt but it may not, in fact, be true. You need to invoke the Fundamental Theorem of Calculus that says that if f^′ is continuous on [a,b] (alternatively that f has continuous derivative) then (5) holds.

Theorem 2.7 Abel or Partial Summation (Continuous Version) Let g : ℕ → ℂ and set G(x) = ∑_n ≤ xg(n). Let f : ℝ_> 0 → ℂ have a continuous derivative on x > 0. Then ∑_{1 ≤ n ≤ x}g(n)f(n) = f(x)G(x) − ∫₁^xG(t)f^′(t)dt. Note We sometimes write this as ∑_{1 ≤ n ≤ x}g(n)f(n) = f(x)G(x) − ∫₁^xG(t)df(t). Proof The proof is an exercise in the interchange of a finite sum and a finite integral. Start with the simple observation that, since f has a continuous derivative, f(n) = f(x) − (f(x)−f(n)) = f(x) − ∫_n^xf^′(t)dt Then, multiplying by g(n) and summing over n ≤ x gives $$\begin{aligned} \sum_{1 \leq n \leq x} g(n) f(n) &=\sum_{1 \leq n \leq x} g(n)\left(f(x)-\int_{n}^{x} f^{\prime}(t) d t\right) \\ &=f(x) G(x)-\sum_{1 \leq n \leq x} g(n) \int_{n}^{x} f^{\prime}(t) d t \end{aligned}$$ The second term here can be written as $$\sum_{1 \leq n \leq x \atop t \geq n} \int_{1}^{x} g(n) f^{\prime}(t) d t$$ Finite integrals and sums can be interchanged, with the restriction on the integral of t ≥ n reinterpreted as a condition on the sum of n ≤ t. This gives $$\begin{aligned} \underbrace{\sum_{1 \leq n \leq x} \int_{1}^{x}}_{t \geq n} g(n) f^{\prime}(t) d t=& \underbrace{\int_{1}^{x} \sum_{1 \leq n \leq x}}_{n \leq t} g(n) f^{\prime}(t) d t=\int_{1}^{x} f^{\prime}(t) \sum_{1 \leq n \leq t} g(n) d t \\ &=\int_{1}^{x} G(t) f^{\prime}(t) d t, \end{aligned}$$ as required.

An important Special Case is when g(n) = 1 for all n ≥ 1.

Notation For x ∈ ℝ define [x], the integer part of x, to be the largest integer ≤ x. Define {x} = x − [x], the fractional part of x. This satisfies 0 ≤ {x} < 1 for all real x.

Thus, in the notation of the previous Theorem, G(x) = ∑_n ≤ x1 = [x].

We can now state a fundamental result on approximating sums by integrals.

Proposition 2.8 Euler Summation Let f have a continuous derivative on x > 0. Then ∑_{1 ≤ n ≤ x}f(n) = ∫₁^xf(t)dt + f(1) − {x}f(x) + ∫₁^x{t}f^′(t)dt for all real x ≥ 1.

Notes i) If x = N is an integer then the {N}f(N) term is zero. So we can use the proposition to prove results valid for all real x and improved results for integral x.

ii) We have f(1) on both sides of this result, so it could have been written as ∑_{2 ≤ n ≤ x}f(n) = ∫₁^xf(t)dt − {x}f(x) + ∫₁^x{t}f^′(t)dt, but there is a danger that if this was done, you would not have noticed that the left hand side was a sum only over 2 ≤ n ≤ x, not 1 ≤ n ≤ x.
Proof By the result above $$\begin{aligned} \sum_{1 \leq n \leq x} f(n) &=f(x)[x]-\int_{1}^{x}[t] f^{\prime}(t) d t \\ &=f(x)[x]-\int_{1}^{x}(t-\{t\}) f^{\prime}(t) d t \\ &=f(x)[x]-\int_{1}^{x} t f^{\prime}(t) d t+\int_{1}^{x}\{t\} f^{\prime}(t) d t . \end{aligned}$$ We integrate the second integral by parts to get $$\begin{aligned} \int_{1}^{x} t f^{\prime}(t) d t &=[t f(t)]_{1}^{x}-\int_{1}^{x} f(t) d t \\ &=f(x) x-f(1)-\int_{1}^{x} f(t) d t \end{aligned}$$ Substituting back in we get $$\begin{aligned} \sum_{1 \leq n \leq x} f(n) &=f(x)[x]-f(x) x+f(1)+\int_{1}^{x} f(t) d t+\int_{1}^{x}\{t\} f^{\prime}(t) d t \\ &=\int_{1}^{x} f(t) d t+f(1)-\{x\} f(x)+\int_{1}^{x}\{t\} f^{\prime}(t) d t \end{aligned}$$ To see the strength of Proposition 2.8 we improve (3),

Theorem 2.9 There exists a constant γ such that $$\sum_{n \leq x} \frac{1}{n}=\log x+\gamma+O\left(\frac{1}{x}\right)$$ for real x > 1.

Note how the estimate on the error here is best possible (i.e. you could not replace it by a faster diminishing function of x ). This is because as x varies by a minuscule amount from just below an integer n to just above it, the left hand changes by 1/n, yet the main terms log x + γ change imperceptibly (being continuous in x ); it is the error term O(1/x) which exactly matches the change in the left hand side.
Proof From above with f(x) = 1/x we have $$\sum_{n \leq x} \frac{1}{n}=\int_{1}^{x} \frac{d t}{t}+1-\frac{\{x\}}{x}-\int_{1}^{x} \frac{\{t\}}{t^{2}} d t$$ The second integral converges absolutely since $$\int_{1}^{\infty}|\{t\}| \frac{d t}{t^{2}} \leq \int_{1}^{\infty} \frac{d t}{t^{2}}=1$$ Thus we can complete the integral up to ∞, the error in doing so is $$\leq \int_{x}^{\infty}|\{t\}| \frac{d t}{t^{2}} \leq \int_{x}^{\infty} \frac{d t}{t^{2}}=\frac{1}{x} \text {. }$$ Combining, $$\sum_{n \leq x} \frac{1}{n}=\log x+1+O\left(\frac{1}{x}\right)-\int_{1}^{\infty}\{t\} \frac{d t}{t^{2}}$$ Hence the result follows with $$\gamma=1-\int_{1}^{\infty}\{t\} \frac{d t}{t^{2}} .$$ Fundamental Idea. This method of completing a convergent integral up to infinity and then bounding the tail end is often used and should be remembered.

The constant γ is the called Euler’s constant or sometimes the EulerMascheroni constant. Reinterpreted, Theorem 2.9 says $$\gamma=\lim _{x \rightarrow \infty}\left(\sum_{n \leq x} \frac{1}{n}-\log x\right) \text {. }$$ This can be used to calculate γ though the speed of convergence is very slow. Numerically γ ≈ 0.57721566490153286060… It is not known if γ is irrational!

If we assume more about the function f we can state a very useful version of Euler’s summation. Useful in that it easily allows a sum to be replaced by an integral. Corollary 2.10 If f has a continuous derivative on x > 0, is non-negative and monotonic then ∑_{1 ≤ n ≤ x}f(n) = ∫₁^xf(t)dt + O(max(f(1),f(x))), for all real x ≥ 1.

Proof Since f is monotonic its derivative f^′(x) is of constant sign. Thus $$\begin{aligned} \left|\int_{1}^{x}\{t\} f^{\prime}(t) d t\right| & \leq \int_{1}^{x}\left|\{t\} f^{\prime}(t)\right| d t \\ & \leq \int_{1}^{x}\left|f^{\prime}(t)\right| d t \text { since }|\{t\}| \leq 1 \\ &=\left|\int_{1}^{x} f^{\prime}(t) d t\right| \text { since } f^{\prime}(t) \text { is of constant sign } \\ &=|f(x)-f(1)| \end{aligned}$$ Hence, by the triangle inequality applied twice $$\begin{aligned} \mid f(1)-\{x\} f(x)+\int_{1}^{x}\{t\} f^{\prime}(t) d t & \leq|f(1)|+|\{x\} f(x)|+\left|\int_{1}^{x}\{t\} f^{\prime}(t) d t\right| \\ & \leq f(1)+f(x)+|f(x)-f(1)| \\ & \leq f(1)+f(x)+f(x)+f(1) \\ & \leq 4 \max (f(1), f(x)) . \end{aligned}$$ In the last lines of this proof we have used a + b ≤ 2max (a,b) for a, b > 0. Make sure you believe this. An immediate application of Corollary 2.10 is

Example 2.11 Choose f(x) = log x to deduce ∑_{1 ≤ n ≤ x}log n = xlog x − x + O(logx) for real x > 1.

Again we have the best possible error term for real x. We can though do better when x = N an integer and in a number of Problem Sheet questions we look at improving and generalising this result on the sum of logarithms. The interest comes from the fact that ∑_n ≤ Nlog n = log N! and we thus get bound on N!.