Diligent reader Daniela Falzon, who works at the World Bank (in France . . . or Washington, DC), writes us with the following interesting problem concerning multiple endogenous variables in 2SLS:
I am estimating Y = b0 + b1*X1 + b2*X2 + b3*X1*X2 + X3
X1 is a dummy variable and endogenous,
X2 is continuous and endogenous,
X3 is a set of additional control variables.
Do you have a better idea of how I should do it or should I just focus on the interaction term and instrument it?
Many thanks in advance for your response and best regards,
Thanks for your question, Daniela. Models with multiple endogenous variables are indeed hard to identify, and the results can be hard to interpret.
So we don’t usually like to see them – for one thing it’s not clear why you’re tackling two causal questions at the same time; one is hard enough.
You may have noticed that the only model with more than one endogenous regressor in MHE is the peer effects regression (equation 4.6.6, based on Acemoglu and Angrist, 2000). Here both individual and state-level schooling are treated as endogenous in a wage equation.
But we are really only interested in the peer effect in this case – the effect of state average schooling. Individual schooling is there because we realize that any instrument for average schooling must also be correlated with individual schooling. We therefore try to fix this violation of the exclusion restriction by treating individual schooling as endogenous as well. This is the best reason for having a second endogenous variable that I can think of. And the model may work, as it does in the case of schooling, where we have enough instruments. But not very often, I would think.
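To make the mechanics concrete, here is a minimal simulated sketch of 2SLS with two endogenous regressors and two instruments. Everything in it is an assumption made for illustration: the names x1, x2, z1, z2, the coefficients, and the data-generating process are invented, and it is not the Acemoglu and Angrist setup or their data. It only shows that with as many instruments as endogenous variables, and a first stage in which the instruments move the two regressors differently, both coefficients can be recovered while OLS cannot recover them.

```python
import numpy as np

def two_sls(y, X_endog, X_exog, Z):
    """Manual 2SLS: project all regressors on the instrument set,
    then regress y on the fitted values. Returns coefficients only
    (no standard errors)."""
    W = np.column_stack([X_exog, Z])        # exogenous regressors + excluded instruments
    X = np.column_stack([X_exog, X_endog])  # full regressor matrix
    X_hat = W @ np.linalg.lstsq(W, X, rcond=None)[0]  # first-stage fitted values
    return np.linalg.lstsq(X_hat, y, rcond=None)[0]   # second stage

rng = np.random.default_rng(0)
n = 50_000

# Hypothetical instruments and an unobserved confounder u.
z1, z2 = rng.normal(size=n), rng.normal(size=n)
u = rng.normal(size=n)

# Both regressors depend on u (so both are endogenous) and on both instruments,
# but with different weights, so the first stage has full rank.
x1 = 1.0 * z1 + 0.5 * z2 + u + rng.normal(size=n)
x2 = 0.5 * z1 + 1.0 * z2 + 0.5 * u + rng.normal(size=n)

# Assumed true effects: 0.10 for x1 and 0.05 for x2.
y = 1.0 + 0.10 * x1 + 0.05 * x2 + u + rng.normal(size=n)

const = np.ones((n, 1))
X_endog = np.column_stack([x1, x2])
Z = np.column_stack([z1, z2])

beta_ols = np.linalg.lstsq(np.column_stack([const, X_endog]), y, rcond=None)[0]
beta_2sls = two_sls(y, X_endog, const, Z)

print("OLS  (const, x1, x2):", np.round(beta_ols, 3))
print("2SLS (const, x1, x2):", np.round(beta_2sls, 3))
```

If z2 were dropped, or if both instruments moved x1 and x2 in the same proportions, the first stage could not separate the two effects. That is one way to see why these models are hard to identify in practice.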
More generally, it doesn’t make sense to think of one endogenous variable as a “control” when looking at the effects of another, at least not a good one (in the sense in which we use the terms good and bad control in chapter 3). So any time someone shows me a problem with more than one endogenous variable, my first question is always: why?