Welcome to the MHE Blog. We'll use this space to post corrections and comments and to address any questions you might have about the material in Mostly Harmless Econometrics. We especially welcome questions that are likely to be of interest to other readers. The econometric universe is infinite and expanding, so we ask that questions be brief and relate to material in the book. Here is an example:
    "In Section 8.2.3, (on page 319), you suggest that 42 clusters is enough for the usual cluster variance formula to be fairly accurate. Is that a joke, or do you really think so?"
    To which we'd reply: "Yes."

Bbbbb . . . bivariate probit!

Raphael Studer from Switzerland noticed that the bivariate probit likelihood on page 199 looks suspiciously like the likelihood for old-fashioned monaural probit.

Thanks Raphael – this is indeed the wrong likelihood, so don’t try to maximize that at home, folks. It works only if you don’t have an endogenous regressor in the first place. For the correct biprobit likelihood, see, e.g., pp. 849-851 in Greene (2007) or, better yet, just do it in Stata using biprobit (if you must).

This of course raises the question of how we came to make such a mistake.  Is it because Angrist has such a strong aversion to latent-index models that he couldn’t stand the sight of the full likelihood?  Or is it just another silly mistake Steve missed in galleys?

Published | Tagged | Leave a comment

Adding lagged dependent variables to differenced models

Reader Christopher Ordowich asks:

In sections 5.3-5.4, there is a great discussion of using
fixed effects vs. a lagged dependent variable with panel data. I am
having trouble reconciling some of this discussion with a section in a
recent paper by Imbens and Wooldridge (2008) titled “Recent
Developments in the Econometrics of Program Evaluation.” On page 68 of
their paper (as published by IZA in 2008) they suggest that it might
be better in some circumstances with two periods of data to use first
differencing and a lag of the dependent variable (assuming
unconfoundedness given lagged outcomes). I understand your discussion
of instrumenting for lagged variables if you have more than two
periods, but with two periods, how do you react to adding a lag (the
baseline value of the dependent variable) after first differencing
with only two periods of data? I have had difficulty finding support
for this approach elsewhere and given that you have given much thought
to this issue, I was wondering what your opinion might be.

The way I see it, once you add a lagged dependent variable to a differenced model, you are really doing lagged-dep-var control and not fixed effects. Steve may disagree (he’s generally less dogmatic than me). This is not always exactly true, but it is a theorem for the simple example we use to contrast f.e. and lagged-dep-var control in Section 5.4.

Here’s that again:

two periods
no covariates
the treatment, D_it, is zero for everybody in period 1 and switched on for some in period 2 (think of a training program that some people participate in between periods; period 1 is before, period 2 is after, similar to Ashenfelter and Card, 1985)

ignoring constants,  fixed effects estimation fits

(1) Y_it – Y_it-1 = aD_it + error

lagged dependent variable estimation fits

(2) Y_it = gY_it-1 + bD_it + error

As I understand it, the Imbens-Wooldridge proposal is to throw Y_it-1 into equation (1):

(3) Y_it – Y_it-1 = dY_it-1 + cD_it + error

But in this case, c is (algebraically) the same as b. Why? The coefficient c is

c= COV(Y_it – Y_it-1, D_it*)/V(D_it*)

where D_it* is the residual from a regression of D_it on Y_it-1.  But this residual is orthogonal to Y_it-1, hence

c= COV(Y_it – Y_it-1, D_it*)/V(D_it*) = COV(Y_it, D_it*)/V(D_it*) = b in equation (2)
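The algebra above can be checked numerically. Here is a minimal sketch on simulated data, assuming NumPy; the data-generating process, the sample size, and the variable names are made up for illustration and are not taken from the book. It fits equations (2) and (3) by OLS (with a constant, which the text ignores) and compares the coefficients on D_it.

```python
import numpy as np

# Hypothetical two-period data: y_lag is Y_it-1, d is D_it, y is Y_it.
rng = np.random.default_rng(0)
n = 1000
y_lag = rng.normal(size=n)
# Assumption for illustration: people with low lagged outcomes are more likely treated.
d = (rng.normal(size=n) - 0.5 * y_lag > 0).astype(float)
y = 0.7 * y_lag + 2.0 * d + rng.normal(size=n)

# Same regressors in both models: constant, lagged outcome, treatment.
X = np.column_stack([np.ones(n), y_lag, d])

# Equation (2): Y_it on Y_it-1 and D_it.
coef_levels = np.linalg.lstsq(X, y, rcond=None)[0]
g, b = coef_levels[1], coef_levels[2]

# Equation (3): Y_it - Y_it-1 on Y_it-1 and D_it.
coef_diff = np.linalg.lstsq(X, y - y_lag, rcond=None)[0]
d_coef, c = coef_diff[1], coef_diff[2]

print("b from (2):", b)
print("c from (3):", c)
print("g - 1 from (2):", g - 1, " d from (3):", d_coef)
```

In this sketch the two treatment coefficients agree up to floating-point error, and the only thing that changes is the coefficient on the lagged outcome, which shifts by exactly one.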

So I say: “You wanna do fixed effects? No lagged dependent variable, please (or at least be prepared to instrument it if you include one). You wanna control for lagged dependent variables? Then just do it!”

– JDA


Typo on page 130

Well, we like the occasional casual relationship as much as the next guy, but on page 130 the relationship between draft-eligibility and earnings is meant to be causal . . .

Thanks to Peter Dizikes for pointing this out!


In good company at The Economist Book Shop

[Photo from The Economist Book Shop]

Taken August 28, 2009, really!


Typo on page 174

Hendrik Juerges from the University of Mannheim caught this one:
Bottom of page 174
– should read: “where rho_1 is LATE using …”
– not: “where psi_1 is LATE using …”

Many thanks, Hendrik!


Is 2SLS really OK?

Elias Dinas from EUI asks: In section 4.6.1 you explain very clearly the problems from the straightforward use of the 2SLS logic in binary choice and/or endogenous treatment models. You also provide a simple ‘linearized’ alternative but this is useful at the cost of introducing back-door identifying information. It so happens that I have a continuous Y and a binary D, instrumented with two Zs (one binary, the other continuous). I guess that if Y was also a dummy, MLE could provide consistent estimates for the average effect (following Wooldridge 2003:478). However, in this case, I think I am left with two alternatives: 2-stage probit least squares (the cdsimeq command in Stata), whose second stage however seems to belong in the forbidden regressions family, and the ‘linearized’ 2-stage solution you suggest in the book. So my question is: should I prefer one over the other or even consider a third option? Thank you very much for your help and looking forward to your reply. Elias Dinas

Thanks for your question Elias.

Section 4.6.1 discusses two approaches to 2SLS with a dummy endogenous variable, forbidden (plug-in) regression and the use of nonlinear fitted values as instruments, neither of which we really like. Rather, as suggested by our discussion of nonlinear models with endogenous regressors in Section 6.4.3 (LDV reprise), we think you should use garden-variety 2SLS (IV) for dummy endogenous variables (as always; of course you can try fancier methods in the privacy of your own home, but this is what we like to see in published papers). With a single Bernoulli instrument IV gives you LATE; with two Bernoulli instruments, you get a weighted average of the two underlying LATEs. When one instrument is continuous, the weighting is a little trickier (see, e.g., the “fish paper”). But my experience is that the marginal effects from nonlinear structural models will be close to 2SLS (that’s how you can tell the structural model MFX were done correctly), and with 2SLS you might even get the standard errors right!
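For the single-Bernoulli-instrument case, here is a rough sketch of what garden-variety 2SLS looks like with a dummy endogenous variable, assuming NumPy and simulated data (the data-generating process and all variable names are hypothetical). It uses a plain linear first stage, not a probit, and shows that the resulting estimate coincides with the Wald ratio of reduced form to first stage.

```python
import numpy as np

# Hypothetical data: binary instrument z, binary treatment d, continuous outcome y.
rng = np.random.default_rng(1)
n = 5000
z = rng.integers(0, 2, size=n).astype(float)
u = rng.normal(size=n)                      # unobservable driving both d and y
d = (0.8 * z + u > 0.4).astype(float)       # dummy endogenous regressor
y = 1.5 * d + u + rng.normal(size=n)

# First stage: linear regression of d on a constant and z (no probit).
Z = np.column_stack([np.ones(n), z])
d_hat = Z @ np.linalg.lstsq(Z, d, rcond=None)[0]

# Second stage: regress y on a constant and the linear fitted values.
X_hat = np.column_stack([np.ones(n), d_hat])
tsls = np.linalg.lstsq(X_hat, y, rcond=None)[0][1]

# Wald estimator: difference in mean outcomes over difference in treatment rates.
wald = (y[z == 1].mean() - y[z == 0].mean()) / (d[z == 1].mean() - d[z == 0].mean())

print("2SLS:", tsls)
print("Wald:", wald)
```

The manual second stage here reproduces only the point estimate; its naive standard errors would be wrong, so in practice a packaged IV routine is the safer choice.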

–JA


MHE goes viral!

Recently seen at Logan airport checkpoint

Lucky she got through!

(and this is not one of the authors)


OLS is between the effect on the treated and the effect on controls

We learn something new (and useful!) every day . . .

Macartan Humphreys of Columbia University has shown why regression estimates of treatment effects can often be expected to fall between the average effect on the treated and the average effect on controls. His theorem goes like this: Let D denote treatment, let p(X) denote the propensity score E[D|X], and let M(X) denote the covariate-specific treatment effects, E[Y1-Y0|X]. Suppose that M(X) varies in a monotone way with p(X) (either weakly increasing or weakly decreasing). Then OLS estimates of the treatment effect in a model using saturated control for covariates (i.e., the sort of regression discussed in Section 3.3.1 of MHE) will lie between E[Y1-Y0|D=1] and E[Y1-Y0|D=0]. Read all about it in his working paper.
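A small simulation can illustrate the in-between property. The sketch below assumes NumPy and a made-up design with four covariate cells in which the effect M(X) falls as the propensity score p(X) rises; none of the numbers come from Humphreys' paper or from Angrist (1998). It compares the saturated-regression coefficient with cell-matching estimates of the effect on the treated and on controls.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20000
p = np.array([0.2, 0.4, 0.6, 0.8])   # hypothetical propensity score in each cell
m = np.array([4.0, 3.0, 2.0, 1.0])   # hypothetical effect in each cell, decreasing in p
x = rng.integers(0, 4, size=n)       # covariate cell for each observation
d = (rng.random(n) < p[x]).astype(float)
y = 0.5 * x + m[x] * d + rng.normal(size=n)

# Saturated regression: treatment dummy plus a full set of cell dummies.
X = np.column_stack([d] + [(x == k).astype(float) for k in range(4)])
ols = np.linalg.lstsq(X, y, rcond=None)[0][0]

# Cell-specific treated-minus-control differences in means.
delta = np.array([y[(x == k) & (d == 1)].mean() - y[(x == k) & (d == 0)].mean()
                  for k in range(4)])
n1 = np.array([d[x == k].sum() for k in range(4)])        # treated count per cell
n0 = np.array([(1 - d[x == k]).sum() for k in range(4)])  # control count per cell

att = (delta * n1).sum() / n1.sum()   # matching estimate, effect on the treated
atc = (delta * n0).sum() / n0.sum()   # matching estimate, effect on controls

# OLS weights each cell by its size times the within-cell variance of treatment.
w = n1 * n0 / (n1 + n0)
ols_from_cells = (delta * w).sum() / w.sum()

print("effect on treated:", att)
print("OLS:", ols)
print("effect on controls:", atc)
```

The last few lines spell out why this works: the regression coefficient is a variance-weighted average of the same cell-specific differences that the matching estimators weight by treated or control counts.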

Why is a treatment effect likely to be monotone in the propensity score?  This happens in the Angrist (1998) study of the effects of military service because those who benefit the most from military service are least likely to be qualified and therefore least likely to be treated.  In other cases, where self-selection is more important than qualifications (as in the Roy [1951] model), those most likely to benefit from treatment may be the most likely to get treated.  Either case is fine as long as it’s one or the other.

Why is this useful?  It’s one more reason why OLS is a good summary statistic for program impact.  Check out this figure from Macartan’s paper, which illustrates the OLS-is-in-between property using the Angrist (1998) data:

Figure 3 from Humphreys (2009)

The figure shows how OLS estimates of the effects of voluntary military service are almost always between matching estimates of effects on veterans and matching estimates of effects on non-veterans.  This happens because covariate-specific estimates of veteran effects are either unrelated to the propensity score or they are a weakly decreasing function of the propensity score.


Data up for Lee (2008)

Dave Lee has graciously contributed data and programs from his landmark RD study.  You can get the goods in the MHE Data Archive.


Just-identified IV

Gary Solon of Michigan State University pointed out to us that our claim on p. 209 that “just identified 2SLS is median unbiased” is not quite correct and that the claim should be qualified. Gary notes that if the first stage is really zero, the just identified IV estimator is centered at the same point as the biased OLS estimator.  Similarly, just identified IV is biased for instruments that are extremely weak, as has been shown in the literature.

Gary is right, of course, and we thank him for pointing this out. Just-identified IV is approximately median unbiased, but if the instruments are weak enough you’ll certainly have bias. On the other hand, if a single instrument is really that weak, you’re unlikely to want to use it, since a very low t-stat and high 2nd stage standard errors will warn you away. See the attached note for details.
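Gary's point can be seen in a quick Monte Carlo. This is only a sketch under assumed values, using NumPy: the true coefficient, the error correlation, the sample size, and the function name iv_median are all hypothetical choices for illustration. It reports the median of the just-identified IV estimator across replications for a reasonably strong first stage and for a first stage that is exactly zero.

```python
import numpy as np

def iv_median(pi, beta=1.0, rho=0.8, n=200, reps=2000, seed=0):
    """Median across replications of just-identified IV with first-stage coefficient pi."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=(reps, n))                                   # instrument
    v = rng.normal(size=(reps, n))                                   # first-stage error
    e = rho * v + np.sqrt(1 - rho**2) * rng.normal(size=(reps, n))   # correlated outcome error
    d = pi * z + v                                                   # endogenous regressor
    y = beta * d + e
    # Just-identified IV without a constant (all variables have mean zero by construction).
    iv = (z * y).sum(axis=1) / (z * d).sum(axis=1)
    return np.median(iv)

strong = iv_median(pi=0.5)
zero = iv_median(pi=0.0)

print("true coefficient: 1.0")
print("median IV, strong first stage:", strong)
print("median IV, zero first stage:", zero)
print("OLS probability limit when the first stage is zero:", 1.0 + 0.8)
```

In this setup the median with a strong first stage sits near the true coefficient, while the median with a zero first stage sits near the OLS probability limit, as Gary notes.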
