• Welcome to the MHE Blog. We'll use this space to post corrections and comments and to address any questions you might have about the material in Mostly Harmless Econometrics. We especially welcome questions that are likely to be of interest to other readers. The econometric universe is infinite and expanding, so we ask that questions be brief and relate to material in the book. Here is an example:
    "In Section 8.2.3 (on page 319), you suggest that 42 clusters is enough for the usual cluster variance formula to be fairly accurate. Is that a joke, or do you really think so?"
    To which we'd reply: "Yes."

Corrections Coming!

Princeton University Press has graciously released a corrected version of MHE.  This is not a new edition (we’re still recovering from the first!).  But we’ve corrected the mistakes uncovered by careful readers in the past 18 months.  The corrected version is now in print and should be shipping soon from Amazon and other big retailers. PUP plans to fulfill Fall 2010 course orders using the new version.

Which isn’t to say there are no more mistakes, so keep those corrections coming.

JA


Possibly Harmful Econometrics?

Northwestern finance professor Bernard Black describes some interesting causality bloopers, a valuable caution for students and teachers alike!


Regression anatomy revealed

Valerio Filoso from the University of Naples has written a neat Stata routine that automates the regression anatomy formula and makes a complete family of partial regression plots.  Check it out!
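The formula Filoso's routine automates is easy to check numerically. Here is a quick sketch in Python rather than Stata (simulated data, made-up coefficients): the multivariate coefficient on a regressor equals the covariance of the outcome with that regressor's residual (after partialling out the other covariates), divided by the residual's variance.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000
x1 = rng.normal(size=n)
x2 = 0.6 * x1 + rng.normal(size=n)               # correlated regressors
y = 1.0 + 2.0 * x1 - 1.5 * x2 + rng.normal(size=n)

# Full regression of y on a constant, x1, and x2
X = np.column_stack([np.ones(n), x1, x2])
beta = np.linalg.lstsq(X, y, rcond=None)[0]

# Regression anatomy: the coefficient on x1 equals
# Cov(y, x1_tilde) / Var(x1_tilde), where x1_tilde is the residual
# from regressing x1 on the other covariates (constant and x2)
Z = np.column_stack([np.ones(n), x2])
x1_tilde = x1 - Z @ np.linalg.lstsq(Z, x1, rcond=None)[0]
b_anatomy = np.cov(y, x1_tilde)[0, 1] / x1_tilde.var(ddof=1)

assert np.isclose(beta[1], b_anatomy)
```

The partial regression plots the routine produces are scatterplots of y against exactly this residual, x1_tilde.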


The RD bandwidth thing

Vanderson Amadeu da Rocha, a student of economics at FEA-RP / USP,
Brazil, asks:

My questions are about the chapter on Regression Discontinuity
Designs. What criteria are used to choose the neighborhood size in
nonparametric fuzzy and sharp RD designs?

Great question, Vanderson – the bandwidth is indeed at the business end of
nonparametric RD, though until recently we would simply have had to say
“try a few.”

Happily, a new paper by Imbens and Kalyanaraman provides a better answer
by deriving formulas for an MSE-minimizing bandwidth choice.
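For intuition (this is the generic nonparametric smoothing trade-off, not the exact Imbens–Kalyanaraman plug-in formulas): widening the bandwidth adds bias, narrowing it adds variance, and the MSE-minimizing choice balances the two.

```latex
% Local linear RD with bandwidth h and N observations near the cutoff:
\mathrm{AMSE}(h) \;\approx\; C_1 B^2 h^4 \;+\; \frac{C_2\,\sigma^2}{N h},
\qquad
h^\ast \;=\; \left(\frac{C_2\,\sigma^2}{4\,C_1 B^2\,N}\right)^{1/5}
\;\propto\; N^{-1/5}.
```

Here B stands in for the curvature of the conditional mean functions at the cutoff and σ² for the outcome variance there; Imbens and Kalyanaraman's contribution is to derive estimable versions of these pieces so the formula can be computed from the data.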

Good luck with your project!


Can I get an indulgence for bad control?

We get a lot of questions about bad control.  Here’s an interesting one from Colin Vance:

I'd like to estimate the effect of fuel price (which I assume is exogenous) 
on distance driven. As a control, I would like to include the fuel
efficiency of the driver's car. Although efficiency is likely to be
endogenous, leaving it out of the specification runs the risk of
imparting omitted-variables bias on my fuel price estimate. But since it is
*just* a control, I'm inclined to leave efficiency as is in the model
and not worry about whether it is endogenous. Wise move?
Any insights would be appreciated!

Before tackling the metrics, think about a likely motivation for the research question.  Suppose the government is considering a rise in the gas tax.  Policy-makers would like to know how this will affect driving habits and fuel consumption.  The government is unlikely to forbid people from buying a new, more fuel-efficient car in response to the tax; in fact, it would probably like to encourage that.  So who needs to know the causal effect of a price rise conditional on being locked in to a current vehicle?  I think this observation neatly answers Colin’s question: prices will go up, and driving behavior will change for a number of reasons.  There is no policy scenario in which only one response (driving the same car) is allowed.  Then there is the econometric problem: conditioning on fuel efficiency will not actually answer the question of how driving behavior changes for those who don’t buy a more fuel-efficient car.  That’s the bad control problem described in MHE – but that’s just metrics.
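To see the bad control problem in numbers, here is a small simulation sketch (Python rather than Stata; all parameter values are made up). Fuel efficiency responds both to price and to an unobserved "green taste" that also affects driving, so conditioning on efficiency contaminates the price coefficient:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200_000

price = rng.normal(size=n)                # exogenous fuel price
taste = rng.normal(size=n)                # unobserved green taste
# Efficiency is a choice: it responds to price AND to taste
effic = 0.5 * price + taste + rng.normal(scale=0.5, size=n)
# Distance driven: total effect of price is -1 + (-0.3)(0.5) = -1.15
dist = -1.0 * price - 0.3 * effic + 0.5 * taste + rng.normal(size=n)

def ols(y, *cols):
    X = np.column_stack([np.ones(len(y)), *cols])
    return np.linalg.lstsq(X, y, rcond=None)[0]

b_short = ols(dist, price)[1]       # recovers the total effect, about -1.15
b_bad = ols(dist, price, effic)[1]  # "controlled" coefficient: comes out near
                                    # -1.2, neither the total effect (-1.15)
                                    # nor the structural direct effect (-1.0)
print(b_short, b_bad)
```

The short regression answers the policy question; the regression with the bad control answers neither that question nor the "effect holding the car fixed" question, because conditioning on a choice variable makes price and taste correlated in the selected comparison.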

JA


How many df in that?!

Reading pp. 298–299 with somewhat more care than they were written,
Tobias Wuergler from Zurich writes:

In order to demonstrate that robust standard errors are likely to
be more biased than non-robust under homoskedasticity, you use a
bivariate example, where the single regressor is assumed to be in
deviations-from-means form. Wouldn't one need, strictly speaking,
the regressand "y" to be in deviations-from-means form, too, in
order to partial out the constant? If so, the appropriate
degree-of-freedom correction should be (1-2/N) since the residual
maker in a demeaned regression is M(x)M(1), where M(1) is the
annihilator associated with the vector of ones (which one needs to
demean). The square of this residual maker is (M(1)-H(x)), hence
E(e(hat)2)=sigma2*(m(ii,1)-h(ii,x)), and the sum of
(m(ii,1)-h(ii,x)) is equal to (N-1-1) since m(ii,1)=(1-1/N).
Intuitively, a demeaned simple regression (with the original model
having a constant) still needs a degree of freedom correction of 2
as an average needs to be estimated apart from the single beta. Or
am I misunderstanding your example?
(In order to circumvent this complication one could assume a simple
regression through the origin, which would not require x (nor y) to
be demeaned.)

Good catch, Toby – partialing out the constant does not change the
underlying df in the estimated residual; you can't fool Mother
Nature.  So the df should be 2 and not 1.  The argument about the
relative bias of robust and conventional standard errors still
goes through, but to get the details right, change 1-1/N to 1-2/N
and make sure the leverage adds up to 2 and not 1.
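The leverage bookkeeping here is easy to verify numerically (a sketch of ours, not from the book): even with the regressor demeaned, the fitted model still carries a constant and a slope, so the hat matrix has trace 2 and the residuals have N - 2 degrees of freedom.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
x = rng.normal(size=n)
x = x - x.mean()                      # regressor in deviations-from-means form

# The model still includes a constant alongside the demeaned regressor
X = np.column_stack([np.ones(n), x])
H = X @ np.linalg.inv(X.T @ X) @ X.T  # hat (projection) matrix
leverage = np.diag(H)

assert np.isclose(leverage.sum(), 2.0)   # trace(H) = number of parameters = 2
print(1 - leverage.mean())               # average 1 - h_ii = 1 - 2/N, here 0.96
```

This is the sense in which the correction factor should be 1 - 2/N rather than 1 - 1/N: the leverages sum to 2, not 1.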

P-score in the reg?

Geo. from GA asks this interesting question 'bout the propensity score: 

I was wondering whether replacing high dimensional
covariates (X) in the regression model with their propensity scores
(p(X)) was a good idea? That is, Y = a + bT + cX + e becomes Y= a + bT
+ c(p(X)) + e. The book does not really address it unless I missed it.
What are the implications? Thanks.


George: it's certainly not a crazy idea. In fact, Dehejia and Wahba (1999) tried 
this (Table 5, estimates labeled "quadratic in score").  But it's not clear 
what the theoretical justification is here; once you are using regression, 
why do this two-step procedure instead of just sticking the covs you've put 
in the score right into the reg (since you're implicitly assuming these are 
the only source of OVB)?  Also, as we know from chapter 3, regression does not 
estimate the pop ATE or the effect of treatment on the treated except under 
constant effects or if the score is constant. Score fiends are often after 
those parameters instead of the variance-weighted avg that regression produces.
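The last point — that regression recovers a variance-weighted average of covariate-specific effects rather than the ATE — shows up clearly in a two-group simulation (a sketch of the chapter 3 weighting result, with made-up numbers; the weights are Var(T|X) = p(1-p)):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 400_000

x = rng.integers(0, 2, size=n)            # a single binary covariate
p = np.where(x == 1, 0.5, 0.1)            # propensity score by group
t = (rng.random(n) < p).astype(float)     # treatment assignment
effect = np.where(x == 1, 2.0, 1.0)       # heterogeneous treatment effects
y = effect * t + 0.5 * x + rng.normal(size=n)

# OLS of y on (1, t, x): controls saturated in the binary covariate
X = np.column_stack([np.ones(n), t, x])
b_ols = np.linalg.lstsq(X, y, rcond=None)[0][1]

# ATE vs. the variance-weighted average that regression targets
ate = 0.5 * 1.0 + 0.5 * 2.0                        # = 1.5
w0, w1 = 0.1 * 0.9, 0.5 * 0.5                      # Var(t | x) weights
b_weighted = (w0 * 1.0 + w1 * 2.0) / (w0 + w1)     # about 1.735
print(b_ols, ate, b_weighted)
```

The group where treatment status varies more (p near one-half) gets more weight, so the regression coefficient lands near 1.735 rather than the ATE of 1.5.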

42 clusters references swap

The references to Hansen (2007a) and Hansen (2007b) on pages 322–323 are swapped.  On page 322, it should be Hansen (2007b) that is referenced as discussing bias correction of serial correlation parameters, and on page 323, it should be Hansen (2007a) that is referenced as showing pretty good results for state clustering with modest numbers of states.

Steve must have been dozin’ on his galleys by this point.


ivreg2 update

If you’re going to run multiple endogenous variables (not something we’re all that crazy about), you at least oughta look at the appropriate first-stage Fs.  And, as explained in an earlier post, we didn’t give the right formula in MHE.  Luckily, a routine for first-stage F-stats in models with multiple endogenous variables is now programmed in ivreg2.  The same update includes other useful routines, like two-way clustering.  More information below:

New versions of and extensions to the Baum-Schaffer-Stillman packages
ivreg2, xtivreg2, ranktest and xtoverid, and a new program, ivreg29, are
now available from ssc.

The main extensions and upgrades are:

1.  2-way clustering.

2-way clustering, introduced by Cameron, Gelbach and Miller (2006) and
Thompson (2009), is now supported.  2-way clustering, e.g.,

	ivreg2 y x1 x2, cluster(id year)

or
	ivreg2 y (x = z1 z2), gmm2s cluster(id year)

allows for arbitrary within-cluster correlation in two cluster
dimensions.  In the examples above, standard errors and statistics are
robust to disturbances that are autocorrelated (correlated within
panels, clustering on id) and common (correlated across panels,
clustering on year).  In the second example, estimates also are
efficient in the presence of arbitrary within-panel and within-year
clustering.  As with 1-way clustering, the numbers of clusters in both
dimensions should be large.

2.  Angrist-Pischke first-stage F statistics

ivreg2 and xtivreg2 now provide Angrist-Pischke first-stage F
statistics.  Angrist and Pischke (2009, pp. 217-18) introduced
first-stage F statistics for tests of under- and weak identification
when there is more than one endogenous regressor.  In contrast to the
Cragg-Donald and Kleibergen-Paap statistics, which test the
identification of the equation as a whole, the AP first-stage F
statistics are tests of whether one of the endogenous regressors is
under- or weakly identified.

3.  SEs that are robust to autocorrelated across-panel disturbances

Following Thompson (2009), cluster-robust and kernel-robust SEs can be
combined and applied to panel data to produce SEs that are robust to
arbitrary common autocorrelated disturbances.  This can also be combined
with 2-way clustering to provide SEs and statistics that are robust to
autocorrelated within-panel disturbances (clustering on panel id) and to
autocorrelated across-panel disturbances (clustering on time combined
with kernel-based HAC).

4.  ivreg2 has been Mata-ized

... and is noticeably faster, in particular with time series and the CUE
(continuously-updated) GMM estimator.

5.  ivreg29 for users who don't yet have Stata 10 or 11

ivreg2 requires Stata 10 or later.  For those who have only Stata 9, we
have provided a new program, ivreg29.  ivreg29 is basically the previous
version of ivreg2 plus support for AP F-statistics and some minor bug
fixes.  ivreg29 does not support the other features described above.

For full details and examples, see the new help files accompanying the
programs.
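For readers curious what the two-way clustering in item 1 does under the hood: following Cameron, Gelbach and Miller, it combines three one-way cluster sandwiches — cluster on id, cluster on year, minus the clustering on the id-year intersection. A bare-bones OLS illustration in Python (our own sketch, not ivreg2's code, and with no small-sample corrections):

```python
import numpy as np

def cluster_vcov(X, resid, groups):
    """One-way cluster-robust sandwich estimator for OLS
    (no degrees-of-freedom correction, for clarity)."""
    XtX_inv = np.linalg.inv(X.T @ X)
    k = X.shape[1]
    meat = np.zeros((k, k))
    for g in np.unique(groups):
        s = X[groups == g].T @ resid[groups == g]  # within-cluster score sum
        meat += np.outer(s, s)
    return XtX_inv @ meat @ XtX_inv

rng = np.random.default_rng(3)
n_id, n_year = 50, 20
ids = np.repeat(np.arange(n_id), n_year)
years = np.tile(np.arange(n_year), n_id)
n = n_id * n_year
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)
resid = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]

# Two-way formula: V_id + V_year - V_(id x year).  In this balanced panel
# each (id, year) cell has one observation, so the intersection term is
# just White's heteroskedasticity-robust sandwich.
inter = ids * n_year + years
V = (cluster_vcov(X, resid, ids)
     + cluster_vcov(X, resid, years)
     - cluster_vcov(X, resid, inter))
print(np.sqrt(np.diag(V)))            # two-way cluster-robust SEs
```

The subtraction prevents double-counting of the diagonal (own-observation) terms, which appear in both one-way sandwiches.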

Multiple endogenous variables – now what?!

Diligent reader Daniela Falzon, who works at the World Bank (in France . . . or Washington, DC), writes us with the following interesting problem concerning multiple endogenous variables in 2SLS:

I am estimating Y = b0 + b1*X1 + b2*X2 + b3*X1*X2 + X3

Y is a dummy variable
X1 is a dummy variable and endogenous,
X2 is continuous and endogenous
X3 is a set of additional control variables.
I am running ivreg2, and so I just dump in the three endogenous variables and the instruments, and of course I get very weird coefficients/results. And even if they were not weird, I would not be sure how to interpret them.
Do you have a better idea of how I should do this, or should I just focus on the interaction term and instrument it?
Or could you please indicate where in your book “Mostly Harmless Econometrics” I should find the answer?

Many thanks in advance for your response and best regards,

Thanks for your question, Daniela.  Models with multiple endogenous variables are indeed hard to identify, and the results can be hard to interpret.

So we don’t usually like to see them – for one thing, it’s not clear why you’re tackling two causal questions at the same time; one is hard enough.
You may have noticed that the only model with more than one endogenous regressor in MHE is the peer effects regression (equation 4.6.6, based on Acemoglu and Angrist, 2000).  Here we have both individual and state-level schooling endogenous in a wage equation.

But we are really only interested in the peer effect in this case – the effect of state average schooling.  Individual schooling is there because we realize that any instrument for average schooling must also be correlated with individual schooling.  We therefore try to fix this violation of the exclusion restriction by treating individual schooling as endogenous as well.  This is the best reason for having a second endogenous variable that I can think of.  And the model may work – in the case of schooling, we have enough instruments.  But cases like this are rare, I would think.

More generally, it doesn’t make sense to think of one endogenous variable as a “control” when looking at the effects of another, at least not a good one (in the sense in which we use the terms good and bad control in chapter 3).  So any time someone shows me a problem with more than one endogenous variable, my first question is always: why?
