So I was tutoring an undergraduate/master’s-level applied econometrics course, and several students asked me why the exogenous control variables need to be included in both stages of 2SLS. More specifically, wouldn’t that double-count the correlation between the instrument and the controls?
This turns out to be a fascinating question that I had somehow never thought carefully about, and one that seemingly lacks documentation online (although I am known as a horrible search-engine user…). Hence I spent half an afternoon creating an illustration which, to maximize accessibility to an audience with little matrix-algebra background, is purely algebraic:
For simplicity, consider the following regression
$$\qquad\qquad Y_i=\beta_0+\beta_1X_i+\beta_2K_i+u_i\,, \qquad\qquad(1)$$
where $X_i$ and $u_i$ are correlated, and $K_i$ is exogenous. We will need IV(s) for $X_i$ because it’s endogenous.
As a quick reminder, the second stage of 2SLS runs the following regression
$$Y_i=\beta_0+\beta_1\hat{X}_i+\beta_2K_i+\nu_i\,, $$
where $\hat{X}_i$ is the predicted value of $X_i$ from the first stage, whatever that might be.
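To make the mechanics concrete, here is a minimal numpy sketch of the manual two-step procedure (the helper and variable names are mine, and the point is only to show where $K_i$ enters; in real work you’d use a dedicated IV routine, since the standard errors from a manually-run second stage are wrong):

```python
import numpy as np

def ols(D, y):
    """OLS coefficients of y on the design matrix D."""
    return np.linalg.lstsq(D, y, rcond=None)[0]

def tsls(y, X, K, Z):
    """Manual 2SLS; note that the exogenous control K enters BOTH stages."""
    one = np.ones(len(y))
    first = np.column_stack([one, Z, K])        # first stage: X on (1, Z, K)
    X_hat = first @ ols(first, X)               # predicted values of X
    second = np.column_stack([one, X_hat, K])   # second stage: Y on (1, X_hat, K)
    return ols(second, y)                       # (beta0_hat, beta1_hat, beta2_hat)
```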
It makes sense that $K_i$ needs to be included in the second stage, as that is just running (1) with $\hat{X}_i$ in place of $X_i$. For the first stage, let’s say for simplicity that we are using only one instrument, $Z_i$. So the question becomes: why does
\begin{equation}
\qquad\qquad X_i=\pi_{0,1}+\pi_{1,1}Z_i+\pi_{2,1}K_i+\xi_{1,i} \qquad\qquad(2)
\end{equation}
make more sense than
\begin{equation}
\qquad\qquad X_i=\pi_{0,2}+\pi_{1,2}Z_i+\xi_{2,i}\,? \qquad\qquad(3)
\end{equation}
Well, it’s because of omitted variable bias! $Z$ and $K$ might well be correlated (the assumptions for $Z$ to be a valid IV do not prohibit this!), and we could cause ourselves trouble if we didn’t account for that in the first stage.
For illustrative purposes, assume $K_i=C_i+\theta Z_i$ where $\mathrm{corr}(C_i,Z_i)=0$, and that we are fully aware of this structure (i.e., both $C_i$ and $\theta$ are known). Then (2) becomes
\begin{align*} X_i&=\pi_{0,1}+\pi_{1,1}Z_i+\pi_{2,1}(C_i+\theta Z_i)+\xi_{1,i} \\ &=\pi_{0,1}+(\pi_{1,1}+\pi_{2,1}\theta)Z_i+\pi_{2,1}C_i+\xi_{1,i}\,, \end{align*}
so if we regress $X$ on $Z$ and $K$, we are effectively regressing $X$ on $Z$ and $C$, i.e., estimating $\pi_{1,1}+\pi_{2,1}\theta$ and $\pi_{2,1}$, then backing out $\pi_{1,1}$ since we know the value of $\theta$.$^1$ Both $\hat{\pi}_{1,1}$ and $\hat{\pi}_{2,1}$ will be unbiased in this situation.
On the other hand, in (3) we have an omitted variable, $K_i$. We can actually pin down the size of the omitted variable bias, because we know exactly what is omitted here! Equation (3) is equivalent to a slight rearrangement of what we just did:
\[X_i=(\pi_{0,1}+\pi_{2,1}C_i)+(\pi_{1,1}+\pi_{2,1}\theta)Z_i+\xi_{1,i}\,,\]
which means
\begin{align*}\hat{\pi}_{0,2}&=\hat{\pi}_{0,1}+\hat{\pi}_{2,1}\bar{C}\,,\\ \hat{\pi}_{1,2}&=\hat{\pi}_{1,1}+\hat{\pi}_{2,1}\theta\,, \end{align*}
indicating that $\hat{\pi}_{1,2}$ comes with a bias of size $\hat{\pi}_{2,1}\theta$, which we cannot remove if we have only estimated (3), since we have no idea about the size of $\hat{\pi}_{2,1}$ unless we also estimate (2).
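A quick simulation makes the omitted variable bias visible. All parameter values below are hypothetical, chosen purely for illustration; with $\pi_{1,1}=2$, $\pi_{2,1}=1.5$, and $\theta=0.8$, the short first stage (3) should land near $2+1.5\times0.8=3.2$ instead of $2$:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Hypothetical first-stage parameters, purely for illustration
pi0, pi1, pi2, theta = 1.0, 2.0, 1.5, 0.8

Z = rng.normal(size=n)
C = rng.normal(size=n)            # drawn independently, so corr(C, Z) = 0
K = C + theta * Z                 # K is exogenous yet correlated with Z
X = pi0 + pi1 * Z + pi2 * K + rng.normal(size=n)

one = np.ones(n)
# First stage (2): include K
pi_with_K = np.linalg.lstsq(np.column_stack([one, Z, K]), X, rcond=None)[0]
# First stage (3): omit K
pi_no_K = np.linalg.lstsq(np.column_stack([one, Z]), X, rcond=None)[0]

print(pi_with_K[1])   # ~ 2.0: unbiased for pi1
print(pi_no_K[1])     # ~ 3.2: pi1 + pi2 * theta, an OVB of pi2 * theta
```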
Now imagine plugging the two $\hat{X}$s into the second-stage equation, one at a time:
\begin{align*}(2)\quad\Rightarrow\quad Y_i&=\beta_{0,1}+\beta_{1,1}\hat{X}_{i,1}+\beta_{2,1}K_i+u_i \\
&=\beta_{0,1}+\beta_{1,1}(\color{blue}{\hat{\pi}_{0,1}}\color{black}{+}(\color{blue}{\hat{\pi}_{1,1}}\color{black}{+}\color{blue}{\hat{\pi}_{2,1}\theta}\color{black}{)Z_i+}\color{blue}{\hat{\pi}_{2,1}}\color{black}{C_i)+\beta_{2,1}(C_i+}\color{blue}{\theta} \color{black}{Z_i)+u_i} \\
&=(\beta_{0,1}+\beta_{1,1}\color{blue}{\hat{\pi}_{0,1}}\color{black}{)+[\beta_{1,1}(}\color{blue}{\hat{\pi}_{1,1}}\color{black}{+}\color{blue}{\hat{\pi}_{2,1}\theta}\color{black}{)+\beta_{2,1}}\color{blue}{\theta}\color{black}{]Z_i+(\beta_{1,1}}\color{blue}{\hat{\pi}_{2,1}}\color{black}{+\beta_{2,1})C_i+u_i} \\
(3)\quad\Rightarrow\quad Y_i&=\beta_{0,2}+\beta_{1,2}\hat{X}_{i,2}+\beta_{2,2}K_i+u_i \\
&=\beta_{0,2}+\beta_{1,2}(\color{blue}{\hat{\pi}_{0,2}}\color{black}{+}\color{blue}{\hat{\pi}_{1,2}}\color{black}{Z_i)+\beta_{2,2}(C_i+}\color{blue}{\theta} \color{black}{Z_i)+u_i} \\
&=(\beta_{0,2}+\beta_{1,2}\color{blue}{\hat{\pi}_{0,2}}\color{black}{)+(\beta_{1,2}}\color{blue}{\hat{\pi}_{1,2}}\color{black}{+\beta_{2,2}}\color{blue}{\theta}\color{black}{)Z_i+\beta_{2,2}C_i+u_i}\end{align*}
I’m colouring everything we know in each scenario in blue. I intentionally wrote $\beta_{j,1}$ and $\beta_{j,2}$ because, as you’ll soon see, plugging in the two different $\hat{X}$s gives you different estimates!
You’ll see why once we consider running this regression instead:
$$Y_i=\gamma_0+\gamma_1Z_i+\gamma_2C_i+u_i\,.$$
Since $Z$ and $C$ are both exogenous, we can obtain unbiased $\hat{\gamma}\,$s. Matching the coefficients on the constant, $Z_i$, and $C_i$ in the two expansions above gives:
\begin{array}{rclcl} \color{blue}{\hat{\gamma}_0}&\color{black}{=}&\hat{\beta}_{0,1}+\hat{\beta}_{1,1}\color{blue}{\hat{\pi}_{0,1}}&\color{black}{=}&\hat{\beta}_{0,2}+\hat{\beta}_{1,2}\color{blue}{\hat{\pi}_{0,2}} \\ \color{blue}{\hat{\gamma}_1}&\color{black}{=}&\hat{\beta}_{1,1}(\color{blue}{\hat{\pi}_{1,1}}\color{black}{+}\color{blue}{\hat{\pi}_{2,1}\theta}\color{black}{)+\hat{\beta}_{2,1}}\color{blue}{\theta}&\color{black}{=}&\hat{\beta}_{1,2}\color{blue}{\hat{\pi}_{1,2}}\color{black}{+\hat{\beta}_{2,2}}\color{blue}{\theta} \\
\color{blue}{\hat{\gamma}_2}&\color{black}{=}&\hat{\beta}_{1,1}\color{blue}{\hat{\pi}_{2,1}}\color{black}{+\hat{\beta}_{2,1}}&\color{black}{=}&\hat{\beta}_{2,2}
\end{array}
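Incidentally, these equalities are not merely asymptotic: since $(1,\hat{X},K)$ (generically) spans the same column space as $(1,Z,C)$, they hold exactly in any given sample. Here is a quick numerical check for the scenario-(2) column, reusing the hypothetical simulation design from above with outcome coefficients of my own choosing:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
one = np.ones(n)

def ols(D, y):
    return np.linalg.lstsq(D, y, rcond=None)[0]

# Hypothetical DGP: same first stage as before, plus an outcome equation
theta = 0.8
Z, C = rng.normal(size=n), rng.normal(size=n)
K = C + theta * Z
e = rng.normal(size=n)                          # shared error: X is endogenous
X = 1.0 + 2.0 * Z + 1.5 * K + e + rng.normal(size=n)
Y = 0.5 + 1.0 * X - 0.7 * K + e

g0, g1, g2 = ols(np.column_stack([one, Z, C]), Y)   # the gamma regression
first = np.column_stack([one, Z, K])
pi_hat = ols(first, X)                              # first stage (2)
p0, p1, p2 = pi_hat
b0, b1, b2 = ols(np.column_stack([one, first @ pi_hat, K]), Y)  # second stage

print(np.isclose(g1, b1 * (p1 + p2 * theta) + b2 * theta))  # True
print(np.isclose(g2, b1 * p2 + b2))                         # True
```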
A little algebra then solves for the IV estimates corresponding to the two different first stages: subtracting $\theta$ times the $\hat{\gamma}_2$ row from the $\hat{\gamma}_1$ row gives $\hat{\gamma}_1-\hat{\gamma}_2\theta=\hat{\beta}_{1,1}\hat{\pi}_{1,1}=\hat{\beta}_{1,2}\hat{\pi}_{1,2}$, so that
\begin{align*} (2)\quad\Rightarrow\quad \hat{\beta}_{1,1}&=\frac{\color{blue}{\hat{\gamma}_1}\color{black}{-}\color{blue}{\hat{\gamma}_2\theta}}{\color{blue}{\hat{\pi}_{1,1}}}\\
(3)\quad\Rightarrow\quad \hat{\beta}_{1,2}&=\frac{\color{blue}{\hat{\gamma}_1}\color{black}{-}\color{blue}{\hat{\gamma}_2\theta}}{\color{blue}{\hat{\pi}_{1,2}}}=\frac{\color{blue}{\hat{\gamma}_1}\color{black}{-}\color{blue}{\hat{\gamma}_2\theta}}{\hat{\pi}_{1,1}+\hat{\pi}_{2,1}\color{blue}{\theta}}
\end{align*}
The only difference between the two IV estimates is the denominator! $\color{blue}{\hat{\pi}_{1,1}}$ is unbiased, while $\color{blue}{\hat{\pi}_{1,2}}$ is not. Hence, if we run the first stage without including $K_i$, we end up with a biased IV estimate of $\beta_1$.
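To close the loop, a simulation of the full procedure (same hypothetical design as above, with true $\beta_1=1$) shows the damage. Per the formulas just derived, the (3)-based estimate should converge to $\hat{\beta}_{1,1}\hat{\pi}_{1,1}/(\hat{\pi}_{1,1}+\hat{\pi}_{2,1}\theta)$, which with these made-up parameters is roughly $2/3.2=0.625$:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500_000
one = np.ones(n)

def ols(D, y):
    return np.linalg.lstsq(D, y, rcond=None)[0]

# Hypothetical DGP: true beta1 = 1.0, same first stage as before
theta = 0.8
Z, C = rng.normal(size=n), rng.normal(size=n)
K = C + theta * Z
e = rng.normal(size=n)                       # shared error makes X endogenous
X = 1.0 + 2.0 * Z + 1.5 * K + e + rng.normal(size=n)
Y = 0.5 + 1.0 * X - 0.7 * K + e

Xhat_with_K = np.column_stack([one, Z, K]) @ ols(np.column_stack([one, Z, K]), X)
Xhat_no_K = np.column_stack([one, Z]) @ ols(np.column_stack([one, Z]), X)

print(ols(np.column_stack([one, Xhat_with_K, K]), Y)[1])  # ~ 1.000: unbiased
print(ols(np.column_stack([one, Xhat_no_K, K]), Y)[1])    # ~ 0.625: biased
```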
1. In practice, you can imagine regressing $K$ on $Z$ and $C$ to get $\hat{\theta}$.