Reference: Wooldridge, Chapters 5-6
This model implies that the effect of x on y is linear (i.e., the same for all levels of x).
Do all DGPs follow this form?
Will an estimate of a linear effect always be the best description of the data?
Can we estimate a simple linear regression in this case?
Is the OLS regression line a good description of the data?
$$y=\beta_0+\beta_1 x +\beta_2 x^2+u$$
For positive values of $x$:
$$y=\beta_0+\beta_1 x +\beta_2 x^2+u$$
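With the quadratic term, the effect of $x$ on $y$ is no longer the same at all levels of $x$; differentiating with respect to $x$ gives a slope that depends on $x$:
$$ \frac{\partial y}{\partial x} = \beta_1 + 2\beta_2 x $$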
Let's estimate the following model: $$wage= \beta_0+\beta_1 exper +\beta_2 exper^2+u$$
Step 1: Generate the squared term ($exper^2$). [Stata: gen exper2=exper*exper]
Step 2: Include the original variable and the square term in the regression
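A minimal Stata sketch of these two steps (assuming the dataset with $wage$ and $exper$ is already loaded, as in the slide):

```stata
* Step 1: generate the squared term
gen exper2 = exper*exper

* Step 2: regress wage on experience and its square
regress wage exper exper2
```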
The estimated effect of experience on wage is positive at low levels of experience and turns negative at higher levels: predicted wage first increases with experience and then decreases.
Note: Because the coefficient on $exper^2$ is not statistically significant (p-value 0.171), we can’t reject the null hypothesis that the effect of experience is linear.
However, we know from other datasets that the effect of experience on wage is inverse u-shaped.
To get at the marginal effect of experience on predicted wage, we take the first derivative.
$$ \frac{\Delta \widehat{wage}}{\Delta exper} = \hat{\beta}_1+2\hat{\beta}_2 exper = 18.84-2*0.79*exper $$
The estimated effect of going from 0 to 1 year of experience is equal to approximately $\$18.84$.
The *estimated* effect of going from 20 to 21 years of experience is equal to approximately $-\$12.76$.
The estimated effect of experience is $\$0$ at $\frac{18.84}{1.58}\approx 11.9$ years.
The maximum/minimum is at the point $|\frac{\beta_1}{2\beta_2}|$
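The turning point comes from setting the estimated marginal effect equal to zero:
$$ \hat{\beta}_1 + 2\hat{\beta}_2 x^* = 0 \quad\Rightarrow\quad x^* = -\frac{\hat{\beta}_1}{2\hat{\beta}_2} = \left|\frac{\hat{\beta}_1}{2\hat{\beta}_2}\right| \text{ when } \hat{\beta}_1 \text{ and } \hat{\beta}_2 \text{ have opposite signs} $$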
Assumption MLR.1: The model is linear in parameters: $y=\beta_0+\beta_1 x_1+\beta_2 x_2 +...+\beta_k x_k + u$
This means we can use OLS to estimate models of nonlinear relationships between $y$ and $x$, as long as the model is linear in the parameters (i.e., the betas enter the equation linearly).
Example of a relationship that is not linear in parameters: $$ f(x,\beta) = \frac{\beta_1 x}{\beta_2 + x}$$ Here $\beta_1$ and $\beta_2$ do not enter linearly, so this model cannot be estimated by OLS; it would require a nonlinear estimation method (such as nonlinear least squares).
Is this a quadratic relationship?
Can we estimate a simple linear regression with this data?
Is this OLS regression line a good description of the data?
growth = (1+return per period)$^{n}$, where $n$ is the number of compounding periods
growth = (1+100/100)$^1$=2
growth = (1+50/100)$^2$=2.25
[50=half of 100, as 6 months is half of one year]
growth = (1+33.3/100)$^3$=2.370370...
[33.3=one third of 100, as 4 months is one third of one year]
If the rate=1 (i.e., 100%) and the compounding period is made indefinitely small, the growth factor is equal to 2.71828...$\equiv e$. $$ \text{Amount}(t)=\text{Initial amount} * e^{rate*time} $$
In words: $e$ is the growth factor if we continuously compound a 100% return. $$ growth= e=\lim_{n \to \infty} \left(1+\frac{1}{n} \right)^n$$
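A quick numerical check of this limit in Stata (just the display command in a do-file):

```stata
* (1 + 1/n)^n approaches e = 2.71828... as n grows
display (1 + 1/1)^1
display (1 + 1/2)^2
display (1 + 1/3)^3
display (1 + 1/1000)^1000
display (1 + 1/1000000)^1000000
```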
We can generalize the continuous growth process to other rates. $$ y_t=y_0 * e^{rate*time} $$
Changes in the amount (expressed as a ratio) $$ \text{growth of y} = e^{rate*time} $$
Some changes are not with respect to time, but other variables. We can therefore generalize: $$ \text{growth of y} = e^{rate*x} $$
Some DGPs can be characterized by a continuous growth process. In this kind of process, the increase in $y$ depends on the current level of $y$.
Examples
A DGP that is characterized by a continuous growth process can be expressed like this: $$ y= e^{\beta_0 + \beta_1 x + u} $$
This is currently non-linear in parameters. But...
...we can transform it to express it as a DGP that is linear in parameters (see MLR.1).
If the DGP has the functional form $$ y= e^{\beta_0 + \beta_1 x + u} $$
We can take (natural) log of both sides to get $$ ln(y)=\beta_0 + \beta_1 x + u $$
If a one-year increase in education increases the wage by a constant dollar amount, then the DGP would look like $$ y= \beta_0 + \beta_1 x + u $$
If a one-year increase in education increases the wage by a constant percentage, then the DGP would look like $$ y= e^{\beta_0 + \beta_1 x + u} $$
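To see the difference, compare how $y$ responds to a one-unit increase in $x$ in each DGP (holding $u$ fixed):
$$ y= \beta_0 + \beta_1 x + u \quad\Rightarrow\quad \Delta y = \beta_1 \text{ (a constant dollar amount)} $$
$$ y= e^{\beta_0 + \beta_1 x + u} \quad\Rightarrow\quad \frac{y_{new}}{y_{old}} = e^{\beta_1} \text{ (a constant percentage change of } 100(e^{\beta_1}-1)\% \text{)} $$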
Let's estimate the following model: $wage=\beta_0 + \beta_1 educ + u$
A one-year increase in education increases predicted hourly wage by $\$0.54$.
[Note: deliberately estimating what we suspect is a 'wrong/misspecified' model.]
Let's estimate the following model: $log(wage)=\beta_0 + \beta_1 educ + u$
A one-year increase in education increases predicted hourly wage by approximately $8.3\%$.
Comparing the R-squared of the two regressions suggests that the log(wage) model fits the data better (strictly speaking, the two R-squareds are not directly comparable because the dependent variables differ).
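A minimal Stata sketch of the two regressions (assuming the wage data with $wage$ and $educ$ are loaded; lwage is just a name chosen here for the logged variable):

```stata
* Level-level model: wage in dollars
regress wage educ

* Log-level model: take the natural log of wage first
gen lwage = log(wage)
regress lwage educ
```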
Wage and Education
Rule of thumb: for small values of $\beta_j$, $100 \cdot \beta_j$ is a good approximation of the percentage change in $y$ from a one-unit increase in $x_j$.
This percentage approximation is only good for small values of $\beta_j$.
Recall the following regression: $log(wage)=\beta_0 + \beta_1 educ + u$
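With the estimated coefficient of about 0.083 from that regression, the exact percentage effect implied by the log-level model is slightly larger than the rule-of-thumb approximation:
$$ \%\Delta \widehat{wage} = 100 \cdot \left(e^{\hat{\beta}_1}-1\right) = 100 \cdot \left(e^{0.083}-1\right) \approx 8.7\% \quad \text{vs. the approximation } 100 \cdot \hat{\beta}_1 = 8.3\% $$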
Chapter 5 introduces the concept of consistency:
As $n$ (the sample size) tends to infinity, the distribution of $\hat{\beta}_j$ collapses to the single point $\beta_j$.
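In Chapter 5's notation, consistency is written using the probability limit:
$$ \underset{n \to \infty}{\text{plim}}\ \hat{\beta}_j = \beta_j $$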
What is the functional form of the relationship between x and y?
Write down an econometric model that describes this relationship.
How would you estimate it?
What is the functional form of the relationship between x and y?
Write down an econometric model that describes this relationship.
How would you estimate it?
MLR.4': $E(u)=0$ and $Cov(x_j,u)=0$, for $j=1,2,...,k$
(Traditional) R-squared: $R^2 = 1-\frac{SSR/n}{SST/n}$
Adjusted R-squared: $\bar{R}^2 = 1-\frac{SSR/(n-{\color{red}k}-1)}{SST/(n-1)}$
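A small worked example with made-up numbers (purely hypothetical, to show the penalty for extra regressors): note that $\bar{R}^2 = 1-(1-R^2)\frac{n-1}{n-k-1}$, so with $n=100$, $k=3$, and $R^2=0.30$:
$$ \bar{R}^2 = 1-(1-0.30)\cdot\frac{99}{96} \approx 0.278 $$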
(e.g., what if we use 100x instead of x; measure x in cents instead of dollars)
We are interested in the effects of smoking during pregnancy ($cigs$) and family income ($faminc$) on birthweight ($bwght$): $$\widehat{bwght} = \hat{\beta}_0 + \hat{\beta}_1 cigs + \hat{\beta}_2 faminc$$
$bwght$ is measured in ounces.
We are interested in the effects of smoking during pregnancy ($cigs$) and family income ($faminc$) on birthweight ($bwght$):
$bwght$ is measured in ounces.
Interpret the estimated coefficients.
We are interested in the effects of smoking during pregnancy ($cigs$) and family income ($faminc$) on birthweight ($bwght$):
Let’s change $bwght$ from ounces to pounds (1 pound = 16 ounces)
We are interested in the effects of smoking during pregnancy ($cigs$) and family income ($faminc$) on birthweight ($bwght$):
Let’s change $cigs$ from cigarettes to packs of cigarettes (1 pack = 20 cigarettes)
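These rescalings change the estimated coefficients in a purely mechanical way; the fit (R-squared) and the t-statistics are unchanged:
$$ bwght \rightarrow bwght/16 \text{ (pounds)}: \quad \text{every } \hat{\beta}_j \text{ and its standard error are divided by } 16 $$
$$ cigs \rightarrow cigs/20 \text{ (packs)}: \quad \hat{\beta}_{packs} = 20 \cdot \hat{\beta}_{cigs}, \text{ with the other coefficients unchanged} $$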
Slope coefficient of Simple Regression Model: $$ \hat{\beta}_1= \frac{Cov(x,y)}{Var(x)} $$
Constant of a Simple Regression Model: $$ \hat{\beta}_0= \bar{y}-\hat{\beta}_1 \bar{x} $$
Variance of the OLS estimator:
Simple regression model: $$ Var(\hat{\beta}_1)= \frac{\sigma^2}{\sum_{i=1}^n (x_i-\bar{x})^2} $$
Multiple Regression Model: $$ Var(\hat{\beta}_j)= \frac{\sigma^2}{SST_j (1-R^2_j)} $$
Standard Errors of the OLS estimator: $$ se(\hat{\beta}_j)= \frac{\hat{\sigma}}{\sqrt{SST_j (1-R^2_j)}} $$
t-statistic/t-ratio (for testing $H_0: \beta_j=0$): $$ t_{\hat{\beta}_j}= \frac{\hat{\beta}_j}{se(\hat{\beta}_j)} $$