Reference: Wooldridge "Introductory Econometrics - A Modern Approach", Chapter 4
Recall the “I love numbers” course experiments of not-my-brother?
The first OLS estimate suggested that each course increases the math score by 0.15 points ($\hat{\beta}_1=0.15$)…
But that was only one estimate….
For each new random sample we would usually get a different estimate.
So what can we learn about the true value of $\beta_j$ from $\hat{\beta}_j$ in one sample? (We usually only have one sample!)
If MLR 1-4 hold, $\hat{\beta}_j$ is our best guess for the true value of $\beta_j$. But, we want to know:
How would you decide if an estimate is evidence of a ”real” effect (i.e. the true effect is not zero) and not a result of chance?
Recall that we can estimate the variance of an OLS estimator.
An estimate is more likely the result of a ”real” effect if it is large relative to the variance (standard error) of the estimator.
$X$ is normally distributed with mean $\mu$ and standard deviation $\sigma$
What are the properties of the normal distribution?
[Advanced technical footnote: any function that is non-negative and integrates (adds up) to 1 is the pdf of a distribution.]
$\hat{\beta}_j$ is normally distributed if the error term $u$ is normally distributed (assumption MLR.6).
$\hat{\beta}_j$ is approximately normally distributed if the sample is large, even when $u$ is not normally distributed (by the central limit theorem).
We therefore have to standardize: Under MLR 1-6 one can show that:
The ratio of $\hat{\beta}_j - \beta_j$ to the standard deviation follows a standard normal distribution.
The ratio of $\hat{\beta}_j - \beta_j$ to the standard error (called t-statistic) follows a t-distribution.
We can then ask: If the hypothesized value of $\beta_j$ is true (usually $\beta_j=0$), what is the probability of observing a t-statistic as extreme as the one we have?
Each value of the t-statistic is associated with a specific probability: If this probability is low, we conclude that the hypothesized value is probably not true.
Goal: find out whether $\beta_j$ is different from zero?
Remember that $\hat{\beta}_j$ has a distribution?
We therefore have to standardize: Under MLR 1-6 one can show that:
The ratio of $\hat{\beta}_j - \beta_j$ to the standard deviation follows a standard normal distribution.
The ratio of $\hat{\beta}_j - \beta_j$ to the standard error (called t-statistic) follows a t-distribution.
We assume that the null hypothesis is true (typically $\beta_j = 0$)
Calculate t-statistic: $\frac{\hat{\beta}_j-0}{se(\hat{\beta}_j)}$
We continue to make assumptions MLR.1-5 introduced previously.
We additionally assume that the unobserved error term ($u$) is normally distributed in the population.
This is often referred to as the normality assumption. (Note that non-normality of $u$ is not a problem if we have large samples)
Why do we need this assumption? Answer: It implies that the OLS estimator $\hat{\beta}_j$ follows a normal distribution too.
Theorem 4.1: Under the CLM assumptions (MLR.1-6):
$$\hat{\beta}_j \sim Normal(\beta_j,Var(\hat{\beta}_j))$$
where $$ Var(\hat{\beta}_j) = \frac{\sigma^2}{SST_j (1-R^2_j)} $$
[Do you remember how the R-squared is defined? Why does it have a j-subscript? If you’re not sure, check the slides for topic 3.]
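[Optional illustration, not from the book: a small Python (numpy) simulation of Theorem 4.1 with a hypothetical DGP. We draw many random samples with normal errors and re-estimate the slope each time; the estimates vary across samples but are centered at the true value.]

```python
# Simulation sketch (hypothetical DGP, not from the book): draw many random samples
# with normal errors and re-estimate the slope each time.
import numpy as np

rng = np.random.default_rng(0)
beta0, beta1, sigma = 1.0, 0.5, 2.0      # hypothetical "true" parameter values
n, n_reps = 200, 5000

estimates = np.empty(n_reps)
for r in range(n_reps):
    x = rng.normal(size=n)
    u = rng.normal(scale=sigma, size=n)  # MLR.6: normally distributed error
    y = beta0 + beta1 * x + u
    # OLS slope in the simple-regression case: sample cov(x, y) / sample var(x)
    estimates[r] = np.cov(x, y)[0, 1] / np.var(x, ddof=1)

print("mean of the estimates:", estimates.mean())   # close to the true beta1 = 0.5
print("sd of the estimates:  ", estimates.std(ddof=1))
```

A histogram of `estimates` would show the bell shape predicted by Theorem 4.1.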
The result that $\hat{\beta}_j \sim Normal(\beta_j, Var(\hat{\beta}_j))$
implies that $$ \frac{\hat{\beta}_j - \beta_j}{sd(\hat{\beta}_j)} \sim Normal(0,1) $$
where $$ sd(\hat{\beta}_j)= \sqrt{Var(\hat{\beta}_j)}=\frac{\sigma}{\sqrt{SST_j (1-R_j^2)}} $$
In words: the difference between the estimated value and the true parameter value, divided by the standard deviation of the estimator, is normally distributed with mean 0 and standard deviation equal to 1.
Note, we don’t observe $sd(\hat{\beta}_j)$, because it depends on $\sigma$ (the standard deviation of the error term $u$), which is an unknown parameter.
But we can calculate the standard error $se(\hat{\beta}_j)$, which is an estimate of $sd(\hat{\beta}_j)$.
If we replace $sd(\hat{\beta}_j)$ with $se(\hat{\beta}_j)$ we get the t-statistic, which follows a t-distribution instead of a normal distribution.
The test is therefore often referred to as a t-test.
[Technical footnote: $sd(\hat{\beta}_j)$ is a number. The normal distribution minus a number, divided by a number, gives us a (different) normal distribution. $se(\hat{\beta}_j)$, in contrast, has a distribution itself (it is related to a chi-squared distribution), which is why we get something different.]
Formula for standard error: $$ se(\hat{\beta}_j)= \sqrt{\widehat{Var}(\hat{\beta}_j)}=\frac{\hat{\sigma}}{\sqrt{SST_j (1-R_j^2)}} $$
Where $\hat{\sigma}$ is based on the OLS residuals, $$ \hat{\sigma}^2 = \frac{SSR}{n-k-1} = \frac{\sum_{i=1}^n (\hat{u}_i)^2 }{n-k-1} $$
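[A sketch of how these quantities can be computed “by hand” in Python with simulated data and hypothetical parameter values; in practice Stata (or any regression package) reports them for you.]

```python
# Sketch with simulated data: compute sigma-hat and se(beta_hat_1) using
# exactly the formulas above.
import numpy as np

rng = np.random.default_rng(1)
n, k = 500, 2
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)                   # correlated regressors
y = 1.0 + 0.8 * x1 - 0.3 * x2 + rng.normal(scale=2.0, size=n)

X = np.column_stack([np.ones(n), x1, x2])
beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]      # OLS estimates
u_hat = y - X @ beta_hat                             # residuals
sigma2_hat = u_hat @ u_hat / (n - k - 1)             # SSR / (n - k - 1)

# R^2_1 comes from regressing x1 on the other regressors (here just x2 + constant)
Z = np.column_stack([np.ones(n), x2])
gamma = np.linalg.lstsq(Z, x1, rcond=None)[0]
sst1 = np.sum((x1 - x1.mean()) ** 2)
r2_1 = 1.0 - np.sum((x1 - Z @ gamma) ** 2) / sst1

se_beta1 = np.sqrt(sigma2_hat) / np.sqrt(sst1 * (1.0 - r2_1))
print("sigma_hat =", np.sqrt(sigma2_hat), " se(beta_hat_1) =", se_beta1)
```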
The t statistic or the t ratio of $\hat{\beta}_j$ is defined as: $$ t_{\hat{\beta}_j} = \frac{\hat{\beta}_j}{se(\hat{\beta}_j)} $$
[General case (If $H_0: \beta_j=a_j$): $t = \frac{\hat{\beta}_j-a_j}{se(\hat{\beta}_j)}$ ]
Under the CLM assumptions:
$$\frac{\hat{\beta}_j - \beta_j}{se(\hat{\beta}_j)} \sim t_{n-k-1}$$
where $k+1$ is the number of unknown parameters in the population model ($k$ slope parameters & the intercept).
In words, this says that the difference between the estimated value and the true parameter value, divided by the standard error of the estimator, follows a t-distribution with $n-k-1$ degrees of freedom.
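[Side note, not from the book: the critical values of the t distribution depend on the degrees of freedom and approach the standard normal values as $n-k-1$ grows. A quick check in Python (scipy assumed available):]

```python
# Critical values: 97.5th percentile of the t distribution for several df,
# compared with the standard normal value (about 1.96).
from scipy import stats

for df in (5, 10, 26, 60, 120, 1000):
    print(f"df = {df:4d}: c = {stats.t.ppf(0.975, df):.3f}")
print(f"standard normal: {stats.norm.ppf(0.975):.3f}")
```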
Our starting point is the model (DGP): $$ y= \beta_0 + \beta_1 x_1 + \beta_2 x_2 + ... + \beta_k x_k +u $$
Our goal is to test hypotheses about a particular $\beta_j$
Remember: $\beta_j$ is an unknown parameter and we will never know its value with certainty. But we can hypothesize about the value of $\beta_j$ and then use statistical inference to test our hypothesis.
Step 1. Specify the null hypothesis (H0) and the alternative hypothesis (H1).
H0: $\beta_j = 0$
H1: $\beta_j \neq 0$ (two sided alternative)
Step 2. Decide on a significance level ($\alpha$): the highest probability we are willing to accept of rejecting H0 when it is in fact true.
The most common significance level is 5%.
Step 3. Stata computes the t-statistic and looks up the p-value associated with it.
Interpretation: The p-value is the probability of observing a t-statistic as extreme as the one we observed, if the null hypothesis is true.
Thus, small p-values are evidence against the null hypothesis.
Step 4. Compare the significance level ($\alpha$) with the p-value. Decision rule:
If p-value $> \alpha$: H0 is not rejected.
If p-value $\leq \alpha$: H0 is rejected in favor of H1.
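[A minimal sketch of Steps 1-4 in Python with made-up numbers for $\hat{\beta}_j$, $se(\hat{\beta}_j)$ and the degrees of freedom; the only point is to show where the t-statistic and the p-value come from.]

```python
# Steps 1-4 with hypothetical numbers (beta_hat, its standard error, and df are made up).
from scipy import stats

beta_hat, se, df = 0.15, 0.06, 120
alpha = 0.05                                     # Step 2: significance level

t_stat = (beta_hat - 0) / se                     # Step 3: t-statistic under H0: beta_j = 0
p_value = 2 * stats.t.sf(abs(t_stat), df)        # two-sided p-value

print(f"t = {t_stat:.2f}, p-value = {p_value:.4f}")
print("reject H0" if p_value <= alpha else "do not reject H0")   # Step 4
```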
If you can’t look up the p-value, this simple rule of thumb can help: with a two-sided alternative and a reasonably large number of degrees of freedom, the coefficient is statistically significant at the 5% level if $|t| > 2$ (approximately).
Given our rule of thumb, what can we say about the statistical significance at the 5% level of ACT and skipped?
$$ wage = \beta_0 + \beta_1 educ + \beta_2 exper + u $$
The null hypothesis $H0: \beta_2=0$, years of experience has no effect on wage.
H0: $\beta_2=0$
H1: $\beta_2 \neq 0$
Significance level: 5%
Assume that the CLM assumptions hold.
(continues next slide)
(answer next slide)
t-statistic: 6.39, p-value: 0.000
Significance level > p-value $\to$ reject H0.
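[For reference, a hedged sketch of how such a test can be read off a regression output in Python (statsmodels). The data below are simulated from a hypothetical DGP, not the actual wage data, so the numbers will differ from those above.]

```python
# Sketch with SIMULATED data (a hypothetical DGP, not the actual wage data):
# statsmodels reports the t statistic and two-sided p-value for every coefficient.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 500
educ = rng.normal(13, 2, size=n)
exper = rng.normal(10, 5, size=n)
wage = 1 + 0.5 * educ + 0.03 * exper + rng.normal(scale=3, size=n)

X = sm.add_constant(np.column_stack([educ, exper]))   # columns: const, educ, exper
res = sm.OLS(wage, X).fit()
print(res.summary())        # the 't' and 'P>|t|' columns are the t-test for each beta_j
```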
Although H0: $\beta_j=0$ is the most common hypothesis, we sometimes want to test whether $\beta_j$ is equal to some other given constant. Suppose the null hypothesis is $$ H0: \, \beta_j=a_j$$
In this case the appropriate t-statistic is: $$t = \frac{\hat{\beta}_j-a_j}{se(\hat{\beta}_j)}$$
The rest of the t-test is the same.
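[A sketch with hypothetical numbers: testing H0: $\beta_j = a_j$ (here $a_j = 1$) only changes the numerator of the t-statistic.]

```python
# Hypothetical numbers: only the numerator of the t statistic changes when a_j != 0.
from scipy import stats

beta_hat, se, df, a_j = 0.45, 0.10, 137, 1.0     # made-up estimate, se, df, and null value
t_stat = (beta_hat - a_j) / se
p_value = 2 * stats.t.sf(abs(t_stat), df)        # two-sided p-value
print(f"t = {t_stat:.2f}, p-value = {p_value:.4f}")
```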
Is the hsGPA coefficient in the regression below significantly different from 1 (against a two-sided alternative) at the 5% level?
A) Yes B) No C) We can’t say
Once we have estimated the DGP parameters $\beta_0$,$\beta_1$,...,$\beta_k$, and obtained the associated standard errors, we can easily construct a confidence interval (CI) for each $\beta_j$.
The CI provides a range of likely values for the unknown $\beta_j$.
Recall that $\frac{\hat{\beta}_j-\beta_j}{se(\hat{\beta}_j)}$ has a t distribution with $n-k-1$ degrees of freedom (df).
Define a 95% confidence interval for $\beta_j$ as $$ \hat{\beta}_j \pm c \cdot se(\hat{\beta}_j) $$ where the constant $c$ is the $97.5^{th}$ percentile in the $t_{n-k-1}$ distribution.
Question: why 97.5, not 95?
For a 95% Confidence-Interval, the critical value c is chosen to make the area in each tail equal 2.5%, i.e., c is the 97.5th percentile in the t distribution.
The graph shows that, if df=26, then c=2.06.
If H0 were true, we would expect a t-statistic larger than 2.06 or smaller than -2.06 in only 5% of the cases.
$$ \hat{\beta}_j \pm c \cdot se(\hat{\beta}_j) \quad \begin{cases} \bar{\beta}_j = \hat{\beta}_j + c \cdot se(\hat{\beta}_j) & \text{upper limit} \\ \underline{\beta}_j = \hat{\beta}_j - c \cdot se(\hat{\beta}_j) & \text{lower limit} \end{cases} $$
Constructing a 95% confidence interval involves calculating two values, $\bar{\beta}_j$ and $\underline{\beta}_j$, which are such that if random samples were obtained many times, with the confidence interval ($\underline{\beta}_j$, $\bar{\beta}_j$) computed each time, then 95% of these intervals would contain the unknown $\beta_j$.
This implies that if our 95% confidence interval does not include zero, we can reject H0 at the 5% level.
$$ \hat{\beta}_j \pm c \cdot se(\hat{\beta}_j) \quad \begin{cases} \bar{\beta}_j = \hat{\beta}_j + c \cdot se(\hat{\beta}_j) & \text{upper limit} \\ \underline{\beta}_j = \hat{\beta}_j - c \cdot se(\hat{\beta}_j) & \text{lower limit} \end{cases} $$
Unfortunately, for the single sample that we use to construct the CI (confidence interval), we do not know whether $\beta_j$ is actually contained in the interval.
We believe we have obtained a sample that is one of the 95% of all samples where the CI contains $\beta_j$, but we have no guarantee.
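[A sketch of the CI computation in Python with hypothetical numbers for the estimate, its standard error, and the degrees of freedom; the critical value $c$ is the 97.5th percentile of the $t_{n-k-1}$ distribution.]

```python
# 95% CI with hypothetical numbers; with df = 26 the critical value is about 2.06.
from scipy import stats

beta_hat, se, df = 0.15, 0.06, 26
c = stats.t.ppf(0.975, df)                       # 97.5th percentile of t_(n-k-1)
lower, upper = beta_hat - c * se, beta_hat + c * se
print(f"c = {c:.2f}, 95% CI = [{lower:.3f}, {upper:.3f}]")
# If zero lies outside this interval, H0: beta_j = 0 is rejected at the 5% level.
```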
Question: What happens to the CI when $se(\hat{\beta}_j)$ increases?
Assume that CLM assumptions hold.
Question: Which coefficients are statistically significant at the 5% level?
Use the 95% CIs in the output to answer this question.
How is it in other sports?
Let’s find out! We have data on salary and performance statistics.
[Personal thoughts: Baseball is a contender for the worst sport ever invented. This is clear from the fact that the most exciting thing in the game is when the ball is no longer on the field. :P]
Consider the following model of (major league) baseball players’ salaries: $$ log(salary) = \beta_0 + \beta_1 years + \beta_2 gamesyr + \beta_3 bavg + \beta_4 hrunsyr + \beta_5 rbisyr + u$$
Variables:
[We will consider the last three variables in red as those capturing 'performance'.]
Assume that CLM assumptions hold.
Note: Because the dependent variable is log salary, we interpret the coefficients (multiplied by 100) as approximate percentage changes in salary. For example, the model predicts that one additional year in the league increases salary by about 6.8%.
$$ log(salary) = \beta_0 + \beta_1 years + \beta_2 gamesyr + \beta_3 bavg + \beta_4 hrunsyr + \beta_5 rbisyr + u$$
Our hypothesis is very general:
$H0: \beta_3=0, \, \beta_4=0, \, \beta_5=0$
$H1:$ H0 is not true
In economics we sometimes want to test whether a number of coefficients are jointly significant. We can do this with an F-test.
$ \text{Sum of Squares Total: }SST= \sum_{i=1}^n (y_i-\bar{y})^2 $
$ \text{Sum of Squares Explained: }SSE= \sum_{i=1}^n (\hat{y}_i - \bar{y})^2 $
$ \text{Sum of Squares Residual: }SSR= \sum_{i=1}^n (\hat{u}_i)^2 $
We compare the SSR of the restricted model to the SSR of the unrestricted model.
Unrestricted model: $$ log(salary) = \beta_0 + \beta_1 years + \beta_2 gamesyr + \beta_3 bavg + \beta_4 hrunsyr + \beta_5 rbisyr + u$$
Restricted model: $$ log(salary) = \beta_0 + \beta_1 years + \beta_2 gamesyr + u$$
Exclusion restrictions: $H0: \beta_3=0, \, \beta_4=0, \, \beta_5=0$
Econometrics jargon: ”imposing restrictions” means assuming specific values (in our case zeros) for certain parameters, rather than estimating them freely. The null hypothesis is that these parameters are ”jointly” (all) zero.
What can we say about the relationship between SSRr and SSRur?
a) SSRr ≥ SSRur
b) SSRr ≤ SSRur
c) SSRr = SSRur
d) it depends / we can’t say
[SSRr=SSR of restricted model, SSRur=SSR of unrestricted model]
When we add the variables to the model and move from the restricted to the unrestricted model, the SSR can never increase: $SSR_r \geq SSR_{ur}$.
Key question: Does SSR decrease enough for it to be warranted to reject the null hypothesis?
How much should the SSR decrease so that it is likely not a result of chance, i.e., statistically significant?
The F statistic is defined as $$ F = \frac{(SSR_r - SSR_{ur})/q}{SSR_{ur}/(n-k-1)} $$ where $q$ is the number of restrictions (here $q=3$) and $n-k-1$ is the degrees of freedom of the unrestricted model.
To use the F statistic we must know its sampling distribution under the null (this enables us to choose critical values & rejection rules).
Under $H0$, and when the CLM assumptions hold, F follows an F distribution with (q,n-k-1) degrees of freedom: $$F \sim F_{q,n-k-1}$$
The critical values for significance level of 5% for the F distribution are given in Table G.3.b.
Rejection rule: Reject H0 in favor of H1 at (say) the 5% significance level if F>c, where c is the 95th percentile in the $F_{q,n-k-1}$ distribution.
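[A sketch of the F test in Python with hypothetical SSR values, using the formula given above; the critical value and p-value come from scipy's F distribution.]

```python
# F test with hypothetical SSR values; q restrictions, n observations, k regressors
# in the unrestricted model.
from scipy import stats

ssr_r, ssr_ur = 198.3, 183.2                     # made-up sums of squared residuals
n, k, q = 353, 5, 3

F = ((ssr_r - ssr_ur) / q) / (ssr_ur / (n - k - 1))
crit = stats.f.ppf(0.95, q, n - k - 1)           # 5% critical value
p_value = stats.f.sf(F, q, n - k - 1)
print(f"F = {F:.2f}, 5% critical value = {crit:.2f}, p-value = {p_value:.4f}")
```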
Recall from topic 3: $se(\hat{\beta}_j) = \frac{\hat{\sigma}}{\sqrt{SST_j (1-R_j^2)}}$
This example shows quite clearly that by including near-multicollinear regressors you will get large standard errors and, consequently, low t-values.
The p-value is defined as $$ \text{p-value} = Pr(\textbf{F}>F)$$
where $\textbf{F}$ denotes an F-distributed random variable with $df=(q,n-k-1)$ and $F$ is the actual value of the test statistic.
Interpretation of p-value: The probability of observing a value for F as large as we did given that the null hypothesis is true.
For example, $p-value = 0.016$ implies such probability is only 1.6% - hence we would reject the null hypothesis at the 5% level (but not at the 1% level).
Consider the following model and null hypothesis: $$ y= \beta_0 + \beta_1 x_1 + \beta_2 x_2 + ... + \beta_k x_k +u $$
H0: $x_1, x_2,..., x_k$ do not help to explain $y$
That is,
$H0: \beta_1=\beta_2=...=\beta_k=0$
Model under $H0$: $y=\beta_0 + u$
It can be shown that, in this case, the F statistic can be computed as $$ F=\frac{R^2/k}{(1-R^2)/(n-k-1)}$$
[Note: This last formula is only valid if you want to test whether all explanatory variables are jointly significant. (Thought: If you used the earlier formula, what is $q$ here?)]
$$ F=\frac{R^2/k}{(1-R^2)/(n-k-1)}=\frac{0.628/5}{(1-0.628)/347}=117$$
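[A quick check of this arithmetic, and of the corresponding p-value, in Python:]

```python
# Check of the arithmetic above and the associated p-value.
from scipy import stats

r2, k, df = 0.628, 5, 347
F = (r2 / k) / ((1 - r2) / df)
print(F)                          # about 117
print(stats.f.sf(F, k, df))       # p-value: essentially zero
```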
In some cases, the alternative hypothesis is one-sided.
H0: $\beta_j=0$
H1: $\beta_j>0$ or H1: $\beta_j<0$
If H1: $\beta_j>0$, this means that we believe that the effect of $x_j$ can only be positive.
The p-values reported in Stata are for two-sided tests (i.e. H0: $\beta_j=0$, and H1: $\beta_j\neq 0$)
To get the p-value for a one sided test, we just divide the p-value reported in Stata by 2.
$$\text{H0: }\beta_j=0\text{, and H1: } \beta_j\neq 0$$
Significance level = 5% $\to$ We accept being wrong in 5% of the cases.
Extreme t-statistics (both positive and negative) are evidence against the null hypothesis.
Choose critical value for which we would be wrong in 5% of the cases.
$$\text{H0: }\beta_j=0\text{, and H1: } \beta_j > 0$$
Significance level = 5% $\to$ We accept being wrong in 5% of the cases.
Large, positive t-statistics are evidence against the null hypothesis.
Large, negative t-statistics are not evidence against the null hypothesis (they point in the direction opposite to this H1).
Choose critical value for which we would be wrong in 5% of the cases.
One-sided hypothesis testing is less conservative:
This is because we already rule out one direction of the effect.
The key difference is that we look up a different critical value.
Step 1: Specify null and alternative hypothesis (H0: $\beta_j=0$; H1: $\beta_j>0$ (or H1:$\beta_j<0$))
Step 2: Decide on significance level.
Step 3: Compute t-statistic.
Step 4: Divide Stata p-value by 2.
Step 5: Compare p-value (from Step 4) with significance level.
To get the p-value for the one sided test, simply divide the p-value reported in Stata by 2
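[A sketch with hypothetical numbers: the one-sided p-value is the area in a single tail, which equals the two-sided p-value divided by 2 whenever the estimate has the sign predicted by H1.]

```python
# One-sided vs two-sided p-value with a hypothetical t statistic and df.
from scipy import stats

t_stat, df = 1.80, 120
p_two_sided = 2 * stats.t.sf(abs(t_stat), df)
p_one_sided = stats.t.sf(t_stat, df)             # H1: beta_j > 0 (right tail only)
print(p_two_sided, p_one_sided, p_two_sided / 2) # the last two coincide
```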
What’s the p-value for the one sided test for the ACT coefficient?
Assumption MLR.1: The model is linear in parameters: $y=\beta_0 + \beta_1 x_1+...+\beta_k x_k+u$
Assumption MLR.2: Random sampling: We have a sample of n observations $\{(x_{i1}, x_{i2},\ldots,x_{ik},y_i): \, i=1,2,\ldots,n\}$ following the population model in Assumption MLR.1
Assumption MLR.3: No perfect collinearity: In the sample, none of the independent variables is constant and there are no exact linear relationships among the independent variables.
Assumption MLR.4: Zero conditional mean: The error $u$ has an expected value of zero, given any values of the independent variables: $E[u|x_1,x_2,...,x_k]=0$
Assumption MLR.5: Homoskedasticity: The error $u$ has the same variance given any value of the explanatory variables.
Assumption MLR.6: Normality: The population error $u$ is independent of the explanatory variables $x_1,x_2,...,x_k$, and is normally distributed with zero mean and variance $\sigma^2$: $u \sim Normal(0, \sigma^2)$.
If CLM assumptions hold: We can make inference about the underlying parameters (i.e., we can learn from world 2 about world 1).
If MLR.1-5 are true, but the error term is not normally distributed (MLR.6 fails): in large samples the t and F statistics still have approximately t and F distributions (by the central limit theorem), so our tests remain approximately valid.
What is the effect of being married on wage?
Let’s estimate the following model: $$ wage = \beta_0 + \beta_1 educ + \beta_2 exper + \color{red}{\beta_3} married + u $$
Where married is a dummy variable which is 1 if the individual is married and 0 otherwise.
(We will talk more about dummy variables later)
Is the sample large enough? Yes, so we don’t worry about MLR 6.
My interpretation of the results:
Remember how to interpret OLS coefficients.
Nice discussion in Section 4.2 in the book. Summary:
Focus on point estimate and confidence intervals.
Why confidence intervals? They tell us if 'big' or 'small' values/effects are plausible given the evidence.
Don't focus on statistical significance at a specific cut-off level (5%, 1%, etc.)
Report results for hypothesis tests for statistical significance using p-values.
Big advantage is you can use p-value approach without needing to know the underlying distribution. E.g., we saw p-values in both t-test and F-test.
There are some important exceptions: e.g., in most countries pharmaceutical regulation means drugs can be sold as treating a certain disease if their effect is shown to be statistically significant (non-zero), even if that effect might be small.
Today, any statistical program reports p-values.
In the past, researchers did hypothesis testing by looking up critical values in statistical tables.
While this is now hardly done in practice, the approach is of some historical importance.
You don’t need to do this in the exam.