Welcome to the MHE Blog. We'll use this space to post corrections and comments and to address any questions you might have about the material in Mostly Harmless Econometrics. We especially welcome questions that are likely to be of interest to other readers. The econometric universe is infinite and expanding, so we ask that questions be brief and relate to material in the book. Here is an example:
    "In Section 8.2.3, (on page 319), you suggest that 42 clusters is enough for the usual cluster variance formula to be fairly accurate. Is that a joke, or do you really think so?"
    To which we'd reply: "Yes."

whoops

Eagle-eyed Robson Santos notes:

In the last paragraph of p. 55, the
expectation of $f_{i}(s−4)$ is taken and the expectation of
$f_{i}(s−1)$ is not. The text reads:

Conditional on $X_{i}$, the average causal effect of a one-year increase
in schooling is $E[f_{i}(s)−f_{i}(s−1)|X_{i}]$, while the average
causal effect of a four-year increase in schooling is
$E[f_{i}(s)−E[f_{i}(s−4)]|X_{i}]$

In the second equation there is an expectation inside the expectation.

Indeed, Robson, that inner E is a typo! The expression should read $E[f_{i}(s)−f_{i}(s−4)|X_{i}]$.


Pop Quiz

Colin Vance asks:

A brief question about statistical significance: taking a
“population first” approach to econometrics, you note on page 36
that “the regression coefficients defined in this section are not
estimators; rather, they are nonstochastic features of the joint
distribution of dependent and independent variables.” You later
imply on page 40 that the issue of statistical inference arises when
we draw samples. My question is how do we interpret standard errors in
those (admittedly rare) instances when we have data on the entire
population. Does this circumstance render the notion of statistical
significance moot?

Good question, Colin. No single answer, I’d say.

Some would say all data come from “super-populations,” that is, the data we happen to have could have come from other times, other places, or other people, even if we seem to have everyone from a particular scenario. Others take a model-based approach: some kind of stochastic process generates the data at hand; there is always more where they came from. Finally, an approach known as randomization inference recognizes that even in finite populations, counterfactuals remain hidden, and therefore we always require inference. You could spend your life pondering such things. I have to admit I try not to.

-JA
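
For readers who want to see what the randomization-inference idea looks like in practice, here is a minimal sketch in Python. Nothing in it comes from the book: the 20-unit population, the treatment effect of 3, and the function name `randomization_p_value` are all made up for illustration. The outcomes are held fixed (we observe the whole finite population), and only the assignment of treatment is re-drawn, under the sharp null hypothesis that treatment has no effect on anyone.

```python
import random

def randomization_p_value(outcomes, treated, draws=5000, seed=0):
    """Two-sided randomization-inference p-value for a difference in means.

    Outcomes are treated as fixed; only the treatment labels are re-drawn,
    keeping the number of treated units the same as in the data.
    """
    rng = random.Random(seed)

    def diff_in_means(labels):
        t = [y for y, d in zip(outcomes, labels) if d == 1]
        c = [y for y, d in zip(outcomes, labels) if d == 0]
        return sum(t) / len(t) - sum(c) / len(c)

    observed = diff_in_means(treated)
    labels = list(treated)
    extreme = 0
    for _ in range(draws):
        rng.shuffle(labels)  # one hypothetical re-assignment of treatment
        if abs(diff_in_means(labels)) >= abs(observed) - 1e-12:
            extreme += 1
    return observed, extreme / draws

# Hypothetical finite population: 10 treated units, 10 controls.
rng = random.Random(42)
treated = [1] * 10 + [0] * 10
outcomes = [rng.gauss(0, 1) + 3.0 * d for d in treated]  # made-up effect of 3

observed, p_value = randomization_p_value(outcomes, treated)
print(f"difference in means: {observed:.2f}, randomization p-value: {p_value:.4f}")
```

The p-value is the share of re-drawn assignments that produce a difference in means at least as large (in absolute value) as the one observed. No sampling from a larger population is invoked anywhere; the only randomness is in who got treated.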


The Cosmic Allness of OVB

Michael Wolf from the University of Zurich asks the following brilliant OVB question:

Say the long regression of interest is

(1) $y_i = \alpha + \rho s_i + \gamma_1 MO_i + \gamma_2 IQ_i + v_i$.

Here, MO stands for motivation and IQ stands for intelligence. In your notation then, $A_i = (MO_i, IQ_i)$ and $\gamma = (\gamma_1, \gamma_2)$.

In practice, motivation and intelligence are not observed and one estimates the short regression

(2) $y_i = \alpha + \rho s_i + \eta_i$, with $\eta_i = A_i \gamma + v_i$.

Since $s_i$ is correlated with the error term $\eta_i$ (unless $s_i$ is uncorrelated with $A_i$ or $\gamma = 0$), the short regression has OVB. So far so good.

But then you claim that if one estimates the short regression by IV, using a suitable instrument $x_i$ that is uncorrelated with both omitted control variables $MO_i$ and $IQ_i$ (and thus with the error term $\eta_i$) but correlated with the regressor $s_i$, one can estimate $\rho$ consistently. Here, I have some doubt.

What if instead of (1), the long regression of interest were ‘only’
(3) $y_i = \alpha^* + \rho^* s_i + \gamma IQ_i + u_i$.

So here $A_i = IQ_i$. Since the instrument $x_i$ is uncorrelated with the (single) omitted control variable $IQ_i$, estimating the short regression (2) by IV with the same instrument $x_i$ also yields a consistent estimator of $\rho^*$, according to your logic. But this would seem a contradiction, since $\rho^*$ differs from $\rho$.

Yikes! A worrisome contradiction indeed . . . or so it would seem.
But the regressions in our discussion are linked with causal parameters, and that makes all the difference.

We start with 4.1.1, which defines a constant linear causal effect. So LATE = rho in this setup.
The A (ability) variable at the top of page 116 is not a generic omitted variable; it’s a (set of) omitted variable(s) “that give a selection on observables story… the variables A are assumed to be the only reason eta and S are correlated, so that E[Sv]=0.” In other words, the regression of log wages on S and A produces the constant causal effect, rho, as the coefficient on S. This is not generic; it’s an assumption.

Some other OLS regression, which controls for only, say, a subset of the variables in A (assuming A is multivariate), does not produce the same rho, as Michael rightly notes. But IV is indifferent to the various OLS regs you’re thinking about running. We have anchored the IV parameter by making regression 4.1.2 causal and arguing that it is this rho that IV uncovers.

Another way to put this: Given our assumptions, LATE is rho in 4.1.1. What OLS regression produces this same parameter? Only the one including the controls required for selection on observables. Since Michael’s equation (3) is inadequately controlled, it won’t generate the same rho. How to see this in the math? It’s subtle. Take the residual, eta, in 4.1.1, and regress that on the IQ variable that appears in equation (3). The residual from this is orthogonal to IQ of course. But since our A is Michael’s [MO, IQ], it’s not orthogonal to S, because we must control for MO as well as IQ to get orthogonality with S. Therefore the schooling coefficient in (3) is not the schooling coefficient in (1) or in our causal model, 4.1.1 and 4.1.2.
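
A small simulation may make the point concrete. What follows is only a sketch under made-up assumptions, not anything from the book: MO, IQ, and the instrument x are independent standard normals, schooling is s = x + MO + IQ + noise, and the causal effect is rho = 0.1 with gamma1 = 0.6 and gamma2 = 0.3. The helper names (`residualize`, `ols_coef`, `iv_coef`) are hypothetical. In this design the population coefficient on s is 0.325 in the short regression, 0.3 in the IQ-only regression (3), and 0.1 in the long regression (1); IV also targets 0.1.

```python
import random

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def project_out(v, b):
    """Remove from v its projection on b."""
    k = dot(v, b) / dot(b, b)
    return [vi - k * bi for vi, bi in zip(v, b)]

def residualize(v, controls):
    """Residual from regressing v on a constant and the given controls."""
    basis = []
    for c in [[1.0] * len(v)] + controls:
        for b in basis:  # Gram-Schmidt: make each control orthogonal to earlier ones
            c = project_out(c, b)
        basis.append(c)
    for b in basis:
        v = project_out(v, b)
    return v

def ols_coef(y, s, controls):
    """Coefficient on s in a regression of y on s, a constant, and controls."""
    s_tilde = residualize(s, controls)  # Frisch-Waugh partialling out
    return dot(s_tilde, y) / dot(s_tilde, s_tilde)

def iv_coef(y, s, x):
    """Just-identified IV estimate: sample cov(x, y) / cov(x, s)."""
    x_tilde = residualize(x, [])  # demeaned instrument
    return dot(x_tilde, y) / dot(x_tilde, s)

rng = random.Random(0)
n = 20000
rho, gamma1, gamma2 = 0.1, 0.6, 0.3  # made-up parameter values

mo = [rng.gauss(0, 1) for _ in range(n)]
iq = [rng.gauss(0, 1) for _ in range(n)]
x = [rng.gauss(0, 1) for _ in range(n)]  # instrument, independent of MO and IQ
s = [x[i] + mo[i] + iq[i] + rng.gauss(0, 1) for i in range(n)]
y = [1.0 + rho * s[i] + gamma1 * mo[i] + gamma2 * iq[i] + rng.gauss(0, 0.5)
     for i in range(n)]

short_ols = ols_coef(y, s, [])          # short regression (2), by OLS
iq_only_ols = ols_coef(y, s, [iq])      # Michael's regression (3)
long_ols = ols_coef(y, s, [mo, iq])     # long regression (1)
iv = iv_coef(y, s, x)                   # short regression (2), by IV

print(f"short OLS: {short_ols:.3f}")
print(f"IQ-only OLS: {iq_only_ols:.3f}")
print(f"long OLS: {long_ols:.3f}")
print(f"IV: {iv:.3f}")
```

The IV estimate lines up with the long regression, not with the IQ-only regression, even though the instrument is uncorrelated with IQ in both cases. The IQ-only coefficient is a different parameter because MO is still sitting in its error term and MO is correlated with schooling.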

Finally, we’re led to conjecture that if the OVB concept was easy and obvious, econometricians would spend more time explaining it.

Perry Preschool Subjects are Ageless

Observant reader Oliver Jones noticed that the Perry subjects cannot have been age 27 in 1993, as we mistakenly implied on page 11. Rather, 1993 is the publication date of a study looking at the Perry subjects when they were 27.


Hey baby, what’s your name?

Noemi Banerjee-Duflo!


RD News

Help is at hand from Calonico, Cattaneo, and Titiunik – check it out

Three (3) new Stata commands, no less!

rdrobust: new robust, bias-corrected confidence intervals (here’s the theory to go with)

rdbwselect: bandwidth selection this way and that

rdbinselect: Automated binwidth selection for those figs where you’re not doing any smoothing


Signs of aging …

From: Martin Van der Linden
In chapter 1, page 6, you mention the case of the effect of start age
on results in school as an example of fundamentally unidentified
questions.

Do you mean that what cannot be assessed experimentally is the very
effect of starting school later because a student who starts school
at 6 and is perfectly identical in all dimensions but start age to
another student starting school at 7 cannot be found?
If true, is there any reason we would like to measure this pure
effect of start age independently of maturation effect? Isn't it
precisely maturation effect we try to measure when thinking about
start age?

Yes! On test day, any first grader who started at 7 will be older than
a first grader who started at age 6.
Since we think there are big age-at-test effects,
the comparison in test scores between these two is misleading.

Many school districts would like to boost their test scores and are
tempted to go for an older start age to do it.
Older start ages will indeed boost scores
(suppose you couldn't enter first grade until after your bar mitzvah ...).
But that fact doesn't mean older entrants are learning more; they
might well do worse (e.g., by virtue of the dropout age mechanism
detailed in Angrist and Krueger 1991).

JA

Our Chicago Connection

http://theincidentaleconomist.com/wordpress/dont-forget-the-t-shirt/

Thanks Austin!


Why children succeed


Probit better than LPM?

From Mark Schaffer:

Question: Dave Giles, in his econometrics blog, has spent a few blog entries attacking the linear probability model.

http://davegiles.blogspot.co.uk/2012/06/another-gripe-about-linear-probability.html

http://davegiles.blogspot.co.uk/2012/06/yet-another-reason-for-avoiding-linear.html

The first of these is the more convincing (at least for me): he cites Horrace & Oaxaca (2006), who show that the LPM will usually generate biased and inconsistent estimates. Bias doesn’t bother me so much but inconsistency does, especially as it apparently carries over to estimates of the marginal effects.

Dave’s conclusion is that one should use probit or logit unless there are really good reasons not to (e.g., endogenous dummies or with panel data).

You’ve been staunch defenders of estimating the LPM using OLS, so I’d be very interested to see your views on this.

Best wishes,

Mark Schaffer

There are three arguments here: (1) the LPM does not estimate the structural parameters of a non-linear model (Horrace and Oaxaca, 2006); (2) the LPM does not give consistent estimates of the marginal effects (Giles blog 1); and (3) the LPM does not lend itself to dealing with measurement error in the dependent variable (Giles blog 2).

The structural parameters of a binary choice model, just like the probit index coefficients, are not of particular interest to us. We care about the marginal effects. The LPM will do a pretty good job estimating those. If the CEF is linear, as it is for a saturated model, regression gives the CEF – even for LPM. If the CEF is non-linear, regression approximates the CEF. Usually it does it pretty well. Obviously, the LPM won’t give the true marginal effects from the right nonlinear model. But then, the same is true for the “wrong” nonlinear model! The fact that we have a probit, a logit, and the LPM is just a testament to the fact that we don’t know what the “right” model is. Hence, there is a lot to be said for sticking to a linear regression function as compared to a fairly arbitrary choice of a non-linear one! Nonlinearity per se is a red herring.

As for measurement error, we would welcome seeing more applied work taking this seriously. Of course, plain vanilla probit is not the answer.

SP
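
The saturated-model claim above is easy to check numerically. Here is a minimal sketch in Python, with everything in it assumed for illustration: the data are generated from a probit with a single dummy regressor D, and the index coefficients a = -0.5 and b = 0.8 are made up. With one dummy the model is saturated, so the OLS slope from the LPM equals the difference in the share of ones between the two groups, which in turn estimates the true effect of D on the probability that Y = 1.

```python
import random
from statistics import NormalDist

rng = random.Random(1)
n = 50000
a, b = -0.5, 0.8  # hypothetical probit index coefficients

# Binary regressor and binary outcome generated from a probit model
d = [1 if rng.random() < 0.4 else 0 for _ in range(n)]
y = [1 if a + b * di + rng.gauss(0, 1) > 0 else 0 for di in d]

# LPM: OLS slope of Y on D, computed as cov(D, Y) / var(D)
d_bar = sum(d) / n
y_bar = sum(y) / n
lpm_slope = (sum((di - d_bar) * (yi - y_bar) for di, yi in zip(d, y))
             / sum((di - d_bar) ** 2 for di in d))

# Difference in conditional means, E[Y | D = 1] - E[Y | D = 0]
y1 = [yi for di, yi in zip(d, y) if di == 1]
y0 = [yi for di, yi in zip(d, y) if di == 0]
diff_means = sum(y1) / len(y1) - sum(y0) / len(y0)

# True effect of D on P(Y = 1) implied by the probit data-generating process
phi = NormalDist().cdf
true_effect = phi(a + b) - phi(a)

print(f"LPM slope: {lpm_slope:.4f}")
print(f"difference in means: {diff_means:.4f}")
print(f"true effect: {true_effect:.4f}")
```

The first two numbers agree up to floating-point error, and both are close to the third in a sample this size. The sketch says nothing about how good the linear approximation is with continuous regressors; it only illustrates the saturated case.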
