1. In the model $Y_{i}=\alpha+\beta x_{i}+\epsilon_{i}$, $i=1, \ldots, n$, where $E\left(\epsilon_{i}\right)=0$, show that the least squares estimator of $\beta$ is$$\widehat{\beta}=\frac{n \sum x_{i} Y_{i}-\left(\sum x_{i}\right)\left(\sum Y_{i}\right)}{n \sum x_{i}^{2}-\left(\sum x_{i}\right)^{2}} .$$Show that $\widehat{\beta}$ is unbiased for $\beta$. Under what additional assumptions is $\widehat{\beta}$ the maximum likelihood estimator of $\beta$?
    Proof.
    (i) To find $\widehat{\alpha}$ and $\widehat{\beta}$ that minimize $S(\alpha, \beta)=\sum_{i=1}^{n}\left(y_{i}-\alpha-\beta x_{i}\right)^{2}$, we calculate\begin{aligned}\frac{\partial S}{\partial \alpha}&=-2 \sum\left(y_{i}-\alpha-\beta x_{i}\right) \\\frac{\partial S}{\partial \beta}&=-2 \sum x_{i}\left(y_{i}-\alpha-\beta x_{i}\right)\end{aligned}Setting these partial derivatives equal to zero, the minimisers $\widehat{\alpha}$ and $\widehat{\beta}$ satisfy the normal equations\begin{aligned}n \widehat{\alpha}+\widehat{\beta} \sum x_{i} &=\sum y_{i} \\\widehat{\alpha} \sum x_{i}+\widehat{\beta} \sum x_{i}^{2} &=\sum x_{i} y_{i}\end{aligned}Multiplying the first equation by $\sum x_{i}$, the second by $n$, and subtracting eliminates $\widehat{\alpha}$:$$\widehat{\beta}\left(n \sum x_{i}^{2}-\left(\sum x_{i}\right)^{2}\right)=n \sum x_{i} y_{i}-\left(\sum x_{i}\right)\left(\sum y_{i}\right),$$which gives the required least squares estimator.
    (ii) Writing $\beta$ over the common denominator,\begin{aligned}E[\widehat{\beta}-\beta]&=E\left[\frac{n \sum x_i (Y_i-\beta x_i)-\left(\sum x_i\right)\left(\sum Y_i-\beta\sum x_i\right)}{n \sum x_i^2-\left(\sum x_i\right)^2}\right]\\&=E\left[\frac{n \sum x_i (\alpha+\epsilon_i)-\left(\sum x_i\right)\left(n\alpha+\sum \epsilon_i\right)}{n \sum x_i^2-\left(\sum x_i\right)^2}\right]\\ &=\frac{n \alpha\sum x_i -\left(\sum x_i\right)\left(n\alpha\right)}{n \sum x_i^2-\left(\sum x_i\right)^2}\qquad\text{since }E[\epsilon_i]=0 \\&=0.\end{aligned}So $\widehat{\beta}$ is unbiased for $\beta$.
    (iii) Assumption: $\epsilon_1, \ldots, \epsilon_n$ are independent normal random variables with mean $0$ and common variance $\sigma^{2}$, say.
    Then $Y_i \sim N\left(\alpha+\beta x_{i}, \sigma^{2}\right)$, so $Y_{i}$ has p.d.f.$$f_{i}\left(y_{i}\right)=\frac{1}{\sqrt{2 \pi \sigma^{2}}} \exp \left(-\frac{1}{2 \sigma^{2}}\left(y_{i}-\alpha-\beta x_{i}\right)^{2}\right), \quad-\infty< y_{i}<\infty.$$So the likelihood $L(\alpha, \beta)$ is \begin{aligned} L(\alpha, \beta) &=\prod_{i=1}^{n} \frac{1}{\sqrt{2 \pi \sigma^{2}}} \exp \left(-\frac{1}{2 \sigma^{2}}\left(y_{i}-\alpha-\beta x_{i}\right)^{2}\right) \\ &=\left(2 \pi \sigma^{2}\right)^{-n / 2} \exp \left(-\frac{1}{2 \sigma^{2}} \sum_{i=1}^{n}\left(y_{i}-\alpha-\beta x_{i}\right)^{2}\right) \end{aligned} with log-likelihood$$\ell(\alpha, \beta)=-\frac{n}{2} \log \left(2 \pi \sigma^{2}\right)-\frac{1}{2 \sigma^{2}} \sum_{i=1}^{n}\left(y_{i}-\alpha-\beta x_{i}\right)^{2}.$$For fixed $\sigma^{2}$, maximising $\ell(\alpha, \beta)$ is equivalent to minimising $\sum_{i=1}^{n}\left(y_{i}-\alpha-\beta x_{i}\right)^{2}$, which is precisely the least squares criterion. Hence under these assumptions $\widehat{\beta}$ is the maximum likelihood estimator of $\beta$.
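    The closed-form slope can be checked numerically. The sketch below simulates data from the model (the $x$ values, true $\alpha$, $\beta$, and error variance are arbitrary choices for the demonstration) and compares the formula with the slope returned by `numpy.polyfit`:

```python
import numpy as np

# Simulate Y_i = alpha + beta * x_i + eps_i with eps_i ~ N(0, 1).
# All numerical values here are arbitrary choices for the demo.
rng = np.random.default_rng(0)
x = np.arange(1.0, 11.0)                  # known constants x_1, ..., x_n
alpha, beta = 2.0, 0.5                    # assumed true parameters
y = alpha + beta * x + rng.normal(0.0, 1.0, size=x.size)

# The closed-form least squares slope from the question.
n = x.size
beta_hat = (n * np.sum(x * y) - np.sum(x) * np.sum(y)) / (n * np.sum(x**2) - np.sum(x)**2)

# numpy's degree-1 least squares fit should give the same slope.
slope, intercept = np.polyfit(x, y, deg=1)
print(abs(beta_hat - slope) < 1e-10)      # True
```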
  2. Suppose $x_{1}, \ldots, x_{n}$ are known constants and that $Y_{1}, \ldots, Y_{n}$ satisfy the ‘regression through the origin’ model $Y_{i}=\beta x_{i}+\epsilon_{i}$, where the $\epsilon_{i}$ are independent $N\left(0, \sigma^{2}\right)$ random variables. Show that the maximum likelihood estimator of $\beta$ is $\widehat{\beta}=\sum x_{i} Y_{i} / \sum x_{i}^{2}$. What is the distribution of $\widehat{\beta}$?
    Suppose we have data giving the distance, in miles, by road $\left(y_{i}\right)$ and in a straight line $\left(x_{i}\right)$ for several different journeys. Why might we prefer to consider the model above to the model $Y_{i}=\alpha+\beta x_{i}+\epsilon_{i}$?
    Assuming the regression ‘through the origin’ model, if the straight-line distance between two locations is 12 miles, how would you use the model to predict the expected distance by road? How could we find a $95 \%$ confidence interval for this expected distance?
    Solution.
    (i) Since the $\epsilon_{i}$ are independent $N\left(0, \sigma^{2}\right)$, the log-likelihood is $-\frac{n}{2}\log\left(2\pi\sigma^{2}\right)-\frac{1}{2\sigma^{2}}S$, where $S=\sum(Y_i-\beta x_i)^2$, so maximising the likelihood over $\beta$ is equivalent to minimising $S$. Now $\frac{\mathrm dS}{\mathrm d\beta}=-2\sum x_i(Y_i-\beta x_i)=2\beta\sum x_i^2-2\sum x_iY_i$, which vanishes when $\beta=\sum x_iY_i/\sum x_i^2$; since $\frac{\mathrm d^2S}{\mathrm d\beta^2}=2\sum x_i^2>0$, this is a minimum, so $\widehat{\beta}=\sum x_{i} Y_{i}/\sum x_{i}^{2}$.
    (ii) $\widehat{\beta}=\frac{\sum x_i(\beta x_{i}+\epsilon_{i})}{\sum x_i^2}=\beta+\frac{\sum x_i\epsilon_i}{\sum x_i^2}$, a linear combination of independent normals, so$$\widehat{\beta}\sim N\left(\beta,\frac{\sigma^2}{\sum x_i^2}\right).$$
    (iii) When the start and end points of a journey coincide, both the straight-line and road distances are zero, so the line should pass through the origin and no intercept $\alpha$ is needed.
    (iv) The expected distance by road is $E[Y]=12\beta$, which the model estimates by $12\widehat{\beta}$. Since $\widehat{\beta}\sim N\left(\beta, \sigma^2/\sum x_i^2\right)$, we have $12\widehat{\beta}\sim N\left(12\beta,\, 144\sigma^2/\sum x_i^2\right)$, so if $\sigma$ is known a 95% confidence interval for the expected distance is$$\left(12\widehat{\beta}-1.96\,\frac{12\sigma}{\sqrt{\sum x_i^2}},\;\; 12\widehat{\beta}+1.96\,\frac{12\sigma}{\sqrt{\sum x_i^2}}\right).$$If $\sigma^2$ is unknown, replace $\sigma$ by the estimate $s$, where $s^2=\sum\left(Y_i-\widehat{\beta}x_i\right)^2/(n-1)$, and 1.96 by the corresponding quantile of the $t_{n-1}$ distribution.
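    A minimal numerical sketch of this prediction and interval, assuming hypothetical journey data and a known $\sigma$ (all values below are invented for illustration):

```python
import numpy as np

# Hypothetical straight-line distances (miles), known error s.d., and an
# assumed true slope of 1.3 -- none of these come from real journey data.
rng = np.random.default_rng(1)
x = np.array([3.0, 5.0, 8.0, 10.0, 15.0, 20.0])
sigma = 1.5
y = 1.3 * x + rng.normal(0.0, sigma, size=x.size)   # simulated road distances

beta_hat = np.sum(x * y) / np.sum(x**2)             # MLE of beta
x0 = 12.0
pred = x0 * beta_hat                                # estimated expected road distance

# Var(x0 * beta_hat) = x0^2 * sigma^2 / sum(x_i^2), so the 95% CI is:
se = x0 * sigma / np.sqrt(np.sum(x**2))
ci = (pred - 1.96 * se, pred + 1.96 * se)
print(pred, ci)
```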
  3. (a) Suppose $Y_1, \ldots, Y_n$ satisfy$$Y_i=a+b\left(x_{i}-\bar{x}\right)+\epsilon_{i}\tag1$$where the $\epsilon_{i}$ are independent $N\left(0, \sigma^{2}\right)$ and the constants $x_{i}$ are not all equal.
    Find the maximum likelihood estimators $\widehat{a}$ and $\widehat{b}$. Show that $\widehat{a}$ and $\widehat{b}$ are unbiased for $a$ and $b$, respectively, and find their variances.
    Assuming $\sigma^{2}$ is known, show how the distribution of $\widehat{b}$ can be used to construct a $95 \%$ confidence interval for $b$.
    (b) The plot below (data from Davison (2003)) shows annual maximum sea levels in Venice for 1931-1981. Consider model (1) and also the second model$$Y_{i}=\alpha+\beta x_{i}+\epsilon_{i}.\tag2$$Give an interpretation in words of the estimates $\widehat{a}=119.6$ and $\widehat{b}=0.567$ for model (1), and $\widehat{\alpha}=-989.4$ and $\widehat{\beta}=0.567$ for model (2).

    [Plot: annual maximum sea levels in Venice, 1931–1981.]
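    Since model (1) is model (2) reparameterised about $\bar{x}$ (with $a=\alpha+\beta\bar{x}$ and $b=\beta$), the two fits share the same slope, and $\widehat{a}$ is the fitted value at the mean year. The sketch below checks this on simulated sea-level data (invented values loosely matched to the quoted estimates, not the Davison (2003) data):

```python
import numpy as np

# Simulated annual-maximum sea levels for 1931-1981; the noise level and
# the generating parameters are assumptions for this demonstration only.
rng = np.random.default_rng(2)
x = np.arange(1931.0, 1982.0)                       # years
y = 119.6 + 0.567 * (x - x.mean()) + rng.normal(0.0, 20.0, size=x.size)

# Model (1): centred covariate. The normal equations decouple because
# sum(x - xbar) = 0, giving a_hat = ybar.
xc = x - x.mean()
b_hat = np.sum(xc * y) / np.sum(xc**2)              # slope
a_hat = y.mean()                                    # level at the mean year

# Model (2): uncentred, using the closed form from Question 1.
n = x.size
beta_hat = (n * np.sum(x * y) - np.sum(x) * np.sum(y)) / (n * np.sum(x**2) - np.sum(x)**2)
alpha_hat = y.mean() - beta_hat * x.mean()

# The slopes agree, and a_hat is the model-(2) fitted value at x = xbar.
print(abs(b_hat - beta_hat) < 1e-6)
print(abs(a_hat - (alpha_hat + beta_hat * x.mean())) < 1e-6)
```

    This also explains why $\widehat{b}=\widehat{\beta}=0.567$ for both fitted models: centring leaves the slope unchanged, while $\widehat{\alpha}=\widehat{a}-\widehat{b}\bar{x}\approx 119.6-0.567\times 1956\approx -989.4$ is the (physically meaningless) extrapolated level at year 0.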