• Welcome to the MHE Blog. We'll use this space to post corrections and comments and to address any questions you might have about the material in Mostly Harmless Econometrics. We especially welcome questions that are likely to be of interest to other readers. The econometric universe is infinite and expanding, so we ask that questions be brief and relate to material in the book. Here is an example:
    "In Section 8.2.3 (on page 319), you suggest that 42 clusters is enough for the usual cluster variance formula to be fairly accurate. Is that a joke, or do you really think so?"
    To which we'd reply: "Yes."

QOB Qonfusion

Ilyssa wonders

Question: In Table 4.1.1 (p. 124), how are there 30 instruments in Column 8 rather than 27 (= 3 qob dummies * 9 year of birth dummies)?

Why indeed?  There are still 3 QOB main effects: 27 interactions plus 3 main effects makes 30.
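For readers counting along, the arithmetic in miniature (a trivial check, in Python just to spell it out):

```python
# Column 8 of Table 4.1.1: QOB main effects plus QOB-by-YOB interactions
qob_main = 3                    # 3 quarter-of-birth dummies (one quarter omitted)
yob = 9                        # 9 year-of-birth dummies (one year omitted)
interactions = qob_main * yob  # 27 interaction terms
total = qob_main + interactions
print(total)                   # 30, not 27
```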

JA

Published | Tagged , | Leave a comment

Imbens and Angrist Discover LATE

… in the Andean foothills, Chile, November 2011

LACEA: Santiago, November 2011

Published | Tagged , | 1 Comment

Mostly Harmless in Hungary

with the students of Rajk College, after receiving the Von Neumann Award

Published | Tagged | Leave a comment

Covariate Contradiction?

Thoughtful reader Nikhil from UBC asks:

I had a question regarding LATE. In your book you say in a model with
covariates, 2SLS leads to a sort of "covariate-averaged LATE" even
when one does not have a saturated model. Does this mean that as one
introduces covariates the 2SLS estimator is likely to change, and
that a change in the 2SLS estimate is not a comment on the validity of
the instrument? However, in your empirical examples you seem to suggest
that invariance of 2SLS estimates to the introduction of covariates is a
desirable thing. For example, in the first paragraph on p. 152 of
Chapter 4, below Table 4.6.1, you state, "The invariance to covariates
seems desirable: since the same-sex instrument is essentially
independent of the covariates, control for covariates is unnecessary
to eliminate bias and should primarily affect precision." Essentially
my question is: should I start worrying if I see my 2SLS estimates
change as I introduce more covariates in my model? Thanks

Wow, awesome question!  MHE is indeed a little fast and loose on this.
Let me take a stab at clarification.

In Section 4.6.2, we talk about how models with covariates can be
understood as generating a weighted average of cov-specific LATEs across
covariate cells.  True enough ... if the instrument is
discrete and the first stage saturates (includes a full set of covariate
interactions).  So far so good.  Of course, in practice, you might not
want to saturate.  OK, so do Abadie kappa weighting and get the
best-in-class linear approx to the fully saturated model.
Too lazy to do Abadie?  Just do plain old 2SLS, and that will likely be
close enough to a more rigorously justified approx or weighted average.

Later, however, as Nikhil notes - below Table 4.6.1 and on the following page -
we express relief (or satisfaction at least) when IV estimates come out
insensitive to covariates (using samesex) on the grounds that samesex is
independent of covs.  

Contradiction? 

Marginal LATE, that is, LATE with no covariates, is also a weighted average
of covariate-specific LATEs.  The weight here is the histogram of X
(convince yourself of this using the law of iterated expectations).
Now, sticking the covariates in and saturating (where we start in 4.5.2)
produces a weighted average with a different, more complex weighting scheme
(instead of the histogram of X as for marginal LATE, it's the histogram
times the variance of the conditional-on-covariates first stage, as in Thm 4.5.1).
In practice, though, without too much heterogeneity, we don't expect weighting
this way or that to be a big deal.  On the other hand, even under constant
effects, covariates may matter big time when there's substantial omitted variables
bias. Seeing that a randomly assigned instrument generates IV estimates
invariant to covariates makes me happy - as always, it's the OVB I worry about first!
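To see the iterated-expectations claim concretely, here's a small population-level sketch in Python (not Stata; all the numbers are made up for illustration). With a binary instrument independent of X and a first stage that happens to be constant across covariate cells, the marginal Wald estimand is exactly the histogram-weighted average of the cell-specific LATEs:

```python
# Hypothetical population: binary instrument Z, discrete covariate X
px   = [0.5, 0.3, 0.2]   # histogram of X
late = [1.0, 2.0, 5.0]   # covariate-specific LATEs
fs   = 0.4               # first stage E[D|Z=1,x] - E[D|Z=0,x], same in every cell

# Z is independent of X, so the pooled reduced form averages cell reduced
# forms using the histogram of X
reduced_form = sum(p * fs * l for p, l in zip(px, late))

# Marginal Wald estimand = pooled reduced form / pooled first stage ...
marginal_late = reduced_form / fs

# ... which here is just the histogram-weighted average of cell LATEs
histogram_avg = sum(p * l for p, l in zip(px, late))
print(marginal_late, histogram_avg)   # both about 2.1
```

When the first stage varies across cells, the weights pick up the cell-specific compliance rates as well, which is where the more complex weighting in Thm 4.5.1 comes from.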

So to be specific - Nikhil asks if he should worry when IV estimates are
sensitive to covariates - I'd say, yes, worry a little.
Try to figure out whether what you thought was a good instrument is in
fact highly confounded with covariates. If so, it's maybe not such a
great experiment after all.   If not, then perhaps the sensitivity you're
seeing is just a difference in weighting schemes at work.

JA
Published | Tagged , , | Leave a comment

Why are There So Many Dummies?

Lina from Essex writes:

When talking about grouped data and 2SLS (Section 4.1.3) you mention
that expanding a continuous instrument is equivalent to having a set of
Wald estimators that consistently estimate the causal effect of
interest, and in the Vietnam paper you mention that using the whole set
of dummies as instruments is more efficient.  I was wondering whether
using grouped data and instrumenting with the set of dummies for
different values of the continuous instrument differs from using the
continuous instrument (i.e., in your case, using the continuous RSN). Is
there any gain in efficiency in the estimation? Or is it just to
interpret the result as a set of Wald estimators? In other words, if you
have the continuous instrument, why would you expand it and have
overidentification? Thank you very much!!! All the best.

Good question Lina.  One answer is the conceptual appeal of putting 
together Wald estimators.  Takes the mystery out of 2SLS!  But there 
is a more formal argument for dummying out intervals of a continuous 
instrument and then doing 2SLS with the dummies. As discussed in 
Section 4.1.3, in a homoskedastic constant-effects model with a 
continuous instrument, the efficient method of moments estimator uses 
the (unknown) E[D|Z] as an instrument, where D is the variable to be 
instrumented and Z is the continuous instrument.  You
can think of a model with many dummies for intervals of Z as a 
nonparametric approximation to this efficient but infeasible procedure. 
Just using Z itself as an instrument would be a **parametric** approx
and therefore, perhaps, not as good. Of course, you could add polynomials 
in Z for a similar nonparametric flavor, but the first stage would be ugly, 
and as you conjecture, we would lose the conceptual appeal of combining 
Wald estimators.
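The mechanics are easy to sketch. Here's a small simulation in Python (illustrative only; the variable names and data-generating process are invented, and this is not the draft lottery setup) that dummies out intervals of a continuous instrument and runs 2SLS by hand:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Hypothetical DGP: continuous instrument z, endogenous regressor d, true effect = 2
z = rng.uniform(0, 4, n)
v = rng.standard_normal(n)
u = 0.8 * v + rng.standard_normal(n)   # u correlated with v, so d is endogenous
d = 0.5 * z + 0.2 * z**2 + v           # E[d|z] is nonlinear in z
y = 2.0 * d + u

# Dummy out intervals of z: the first stage then fits E[d|z] cell by cell,
# a nonparametric approximation to the efficient (infeasible) instrument E[D|Z]
cells = np.digitize(z, [1.0, 2.0, 3.0])            # 4 intervals of z
Z = (cells[:, None] == np.arange(4)).astype(float)  # dummy matrix

# First stage: fitted values are cell means of d
dhat = Z @ np.linalg.lstsq(Z, d, rcond=None)[0]

# Second stage: regress y on a constant and dhat
X2 = np.column_stack([np.ones(n), dhat])
b_2sls = np.linalg.lstsq(X2, y, rcond=None)[0][1]

# Compare with OLS of y on d, which is biased by the d-u correlation
X_ols = np.column_stack([np.ones(n), d])
b_ols = np.linalg.lstsq(X_ols, y, rcond=None)[0][1]

print(b_2sls, b_ols)   # 2SLS close to 2; OLS biased upward
```

Each cell contributes a grouped-data contrast, so the 2SLS estimate is the promised combination of Wald-style estimators.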

My 1990 draft lottery paper shows this reasoning in action.  
See Newey (1990) for the theory.

JA
Published | Tagged , | Leave a comment

Regression what?!

Matt from Western Kentucky U comments on Chapter 3. . .

Question: You state:

“Our view is that regression can be motivated as a particular sort of
weighted matching estimator, and therefore the differences between
regression and matching estimates are unlikely to be of major
empirical importance” (Chapter 3 p. 70)

I take this to mean that in a ‘mostly harmless way’ regular OLS
regression is in fact a method of matching, or is a matching
estimator.  Is that an appropriate interpretation?  In The Stata
Journal and on his blog, Andrew Gelman takes issue with my understanding;
he states:
“A casual reader of the book might be left with the unfortunate
impression that matching is a competitor to regression rather than a
tool for making regression more effective.”
Any guidance?

Well Matt, Andrew Gelman’s intentions are undoubtedly good but I’m afraid he risks doing some harm here.  Suppose you’re interested in the effects of a treatment, D, and you have a discrete control variable, X, for a selection-on-observables story.  Regress the outcome on D and a full set of dummies (i.e., a saturated model) for X.  The resulting estimate of the effect of D is equal to matching on X and weighting across covariate cells by the variance of treatment conditional on X, as explained in Chapter 3.  While you might not always want to saturate, any other regression model for X gives the best linear approximation to this version subject to whatever parameterization you’re using.
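That equivalence is easy to check numerically. Here's a small simulation in Python (the data-generating process is made up for illustration): with a binary treatment and a saturated model for a discrete X, the OLS coefficient on D exactly matches the treatment-variance-weighted average of the cell-by-cell matching contrasts:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5_000

# Hypothetical data: discrete covariate X, binary treatment D with an
# X-dependent propensity score, and heterogeneous effects (1 + x)
x = rng.integers(0, 3, n)
p = np.array([0.3, 0.5, 0.7])[x]
d = (rng.uniform(size=n) < p).astype(float)
y = 1.0 * x + (1.0 + x) * d + rng.standard_normal(n)

# Saturated regression: y on D plus a full set of X dummies
X = np.column_stack([d, (x[:, None] == np.arange(3)).astype(float)])
b_reg = np.linalg.lstsq(X, y, rcond=None)[0][0]

# Matching: cell-by-cell contrasts weighted by N_x * Var(D | X = x)
b_match, total_w = 0.0, 0.0
for cell in range(3):
    in_cell = x == cell
    delta = y[in_cell & (d == 1)].mean() - y[in_cell & (d == 0)].mean()
    w = in_cell.sum() * d[in_cell].var()    # N_x * p_x(1 - p_x)
    b_match += w * delta
    total_w += w
b_match /= total_w

print(b_reg, b_match)   # identical up to floating-point error
```

The weights put the most emphasis on cells where treatment status varies the most, which is exactly Chapter 3's variance-of-treatment weighting.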

This means that I can’t imagine a situation where matching makes sense but regression does not (though some may say that I’m known for my lack of imagination when it comes to econometric methods).

JA

Published | Tagged , | 3 Comments

High Fashion at the Spring Meeting of Young Economists

could it be ... MHE

as seen at the University of Groningen . . . what a good-lookin crew!

excellent wardrobe choices

what are they spelling?  I wish I knew

SMYE forever

Published | Tagged , | 1 Comment

good eye!

Hui Cao from China caught this one . . . and it’s in the “corrected printing” to boot!

On 6/21/11 12:44 PM, 晖 曹 wrote:

On page 75:  [p(Xi=x|Di=1)(1-p(Xi=x|Di=1))] should be [P(Di=1|Xi=x)(1-P(Di=1|Xi=x))]

Yes indeed!

Published | Tagged , | Leave a comment

Fixed effects, lagged dependent variables, or what?

Arzu Kibris asks

Question: Suppose you have data on turnout at the county level for two
periods, t and t-1, and suppose there has been a change in the
electoral threshold in some states from t-1 to t. You want to analyze
whether this change has affected turnout. Because counties are
clustered within states, you think there might be some unobserved
state-level effects, which you control for with state dummies. In models
of electoral behavior, a lagged dependent variable is also included as
a control to capture habit. So, in this example turnout at t-1 is
included as an independent variable. Would such a model be considered
a fixed effects model with a lagged dependent variable? It does not
include county-level fixed effects; nonetheless, state-level fixed
effects are still accounted for. I could not place such models in your
discussion. I would really appreciate it if you could clarify.

Interesting question Arzu. First thing I would do to clarify the problem is focus on the source of variation. If the law changes of interest are at the state level, then that’s where the action is. You want to control for state effects since that’s the source of OVB. You can control for county effects, but counties are a red herring once you’ve got states under control.

It sounds like you are trying to have both fixed effects and lagged dependent variables. I don’t find the idea of lagged dependent variables very appealing in state-level DD. It’s hard to see why the lagged dependent variable is a primary source of OVB, while there are almost surely time-invariant state effects to worry about.

Good luck with your project!

JA

Published | | Leave a comment

AP F-stat . . . one more time!

One more correction here, folks.  If you follow our prescriptions you won’t get the degrees of freedom for the F-stat right.  ivreg2 gets this right for you (thanks to Jenny Hunt for pointing out this discrepancy and to Mark Schaffer from the ivreg2 team for resolving it).

But if you want to impress your friends and do it longhand . . .

Suppose you have two endogenous variables x1 and x2, and you want the AP F-stat for x2.  There is also a third (exogenous) covariate x3 and you have q instruments, z1, z2, …, zq.  Using ivreg2 you would run

ivreg2 y (x1 x2 = z*) x3, ffirst

To do this manually in Stata, try:

local q = 3                      /* or whatever the number of your instruments is */
reg x1 z* x3
predict double x1hat
reg x2 x1hat x3
predict double x2res, resid
reg x2res z* x3
testparm z*
dis r(F)*(`q'/(`q'-1))

The testparm command in the second-to-last line produces an F-test with q numerator degrees of freedom.  But you lose one degree of freedom for partialling out x1 (you need one of your q instruments to identify x1), so the correct number of degrees of freedom is q-1.  The last line in the code fixes this, and should produce the same answer you get from ivreg2.  If you follow the procedure in MHE, pp. 217-18, by first partialling out x3 from the instruments, you wouldn’t have an x3 anymore in the last regression you run for the F-test.  In this case, you will also have to fix the denominator degrees of freedom for the F-test, to adjust for those lost due to the exogenous covariates.

SP

Published | | Leave a comment