<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Mostly Harmless Econometrics &#187; Questions</title>
	<atom:link href="http://www.mostlyharmlesseconometrics.com/tag/questions/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.mostlyharmlesseconometrics.com</link>
	<description></description>
	<lastBuildDate>Sun, 27 Jun 2010 19:33:45 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	
		<item>
		<title>The RD bandwidth thing</title>
		<link>http://www.mostlyharmlesseconometrics.com/2010/06/the-rd-bandwidth-thing/</link>
		<comments>http://www.mostlyharmlesseconometrics.com/2010/06/the-rd-bandwidth-thing/#comments</comments>
		<pubDate>Sat, 05 Jun 2010 02:46:01 +0000</pubDate>
		<dc:creator>josh</dc:creator>
				<category><![CDATA[Blog]]></category>
		<category><![CDATA[Questions]]></category>

		<guid isPermaLink="false">http://www.mostlyharmlesseconometrics.com/?p=695</guid>
		<description><![CDATA[Vanderson Amadeu da Rocha, a student of economics at FEA-RP / USP, Brazil, asks: My questions are about the chapter of Regression Discontinuity Designs. What criteria are used to determine the neighborhood size in nonparametric RDD Fuzzy and Sharp? Great question Vanderson &#8211; The bandwidth is indeed at the business end of nonparametric RD, though [...]]]></description>
			<content:encoded><![CDATA[<p>Vanderson Amadeu da Rocha, a student of economics at FEA-RP / USP,<br />
Brazil, asks:</p>
<p>My questions are about the chapter of Regression Discontinuity<br />
Designs.  What criteria are used to determine the neighborhood size in<br />
nonparametric RDD Fuzzy and Sharp?</p>
<p><em>Great question, Vanderson &#8211; the bandwidth is indeed at the business end of<br />
nonparametric RD, though until recently we simply would have had to say<br />
&#8220;try a few.&#8221;</em></p>
<p><em>Happily,  a new paper by <a href="//www.nber.org/papers/w14726" target="_blank">Imbens and Kalyanaraman</a> provides a better answer<br />
by deriving formulas for an MSE-minimizing choice. </em></p>
<p><em>Good luck with your project!<br />
</em></p>
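<p>To make the &#8220;try a few&#8221; advice concrete, here is a minimal sketch of a sharp-RD local linear fit repeated over several bandwidths. Everything in it is an assumption for illustration: the simulated data, the true jump <code>tau</code>, the triangular kernel, and the helper name <code>local_linear_rd</code>. It does not implement the Imbens-Kalyanaraman formula; it only shows what the bandwidth controls.</p>
<pre><code class="language-python">import numpy as np

rng = np.random.default_rng(1)
n = 5000
tau = 1.0                                   # hypothetical true jump at the cutoff

x = rng.uniform(-1.0, 1.0, size=n)          # running variable, cutoff at 0
d = np.greater_equal(x, 0.0).astype(float)  # sharp design: treated iff x is at or above 0
y = 0.5 * x + 0.8 * x**2 + tau * d + rng.normal(scale=0.5, size=n)

def local_linear_rd(y, x, d, h):
    """Local linear RD estimate using a triangular kernel with bandwidth h."""
    w = np.maximum(0.0, 1.0 - np.abs(x) / h)    # triangular kernel weights
    keep = np.greater(w, 0.0)                   # observations inside the window
    xs, ds, ys, ws = x[keep], d[keep], y[keep], w[keep]
    # Separate intercepts and slopes on each side of the cutoff.
    X = np.column_stack([np.ones(xs.size), ds, xs, ds * xs])
    sw = np.sqrt(ws)                            # weighted least squares via rescaling
    coef, *_ = np.linalg.lstsq(X * sw[:, None], ys * sw, rcond=None)
    return coef[1]                              # coefficient on d is the estimated jump

bandwidths = [0.2, 0.4, 0.8]
estimates = {h: local_linear_rd(y, x, d, h) for h in bandwidths}
for h, est in estimates.items():
    print(f"h = {h:.1f}: estimated jump = {est:.3f}")
</code></pre>
<p>A narrower window uses fewer observations (noisier estimates), while a wider one leans harder on the linear approximation; an MSE-minimizing rule is a way of trading these off.</p>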
]]></content:encoded>
			<wfw:commentRss>http://www.mostlyharmlesseconometrics.com/2010/06/the-rd-bandwidth-thing/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Can I get an indulgence for bad control?</title>
		<link>http://www.mostlyharmlesseconometrics.com/2010/05/can-i-get-an-indulgence-for-bad-control/</link>
		<comments>http://www.mostlyharmlesseconometrics.com/2010/05/can-i-get-an-indulgence-for-bad-control/#comments</comments>
		<pubDate>Sat, 15 May 2010 02:51:22 +0000</pubDate>
		<dc:creator>josh</dc:creator>
				<category><![CDATA[Blog]]></category>
		<category><![CDATA[Questions]]></category>
		<category><![CDATA[Reader Comments]]></category>

		<guid isPermaLink="false">http://www.mostlyharmlesseconometrics.com/?p=674</guid>
		<description><![CDATA[We get a lot of questions about bad control.  Here&#8217;s an interesting one from Colin Vance: I'd like to estimate the effect of fuel price (which I assume is exogenous) on distance driven. As a control, I would like to include the fuel efficiency of the driver's car. Although efficiency is likely to be endogenous, [...]]]></description>
			<content:encoded><![CDATA[<p>We get a lot of questions about bad control.  Here&#8217;s an interesting one from Colin Vance:</p>
<pre>I'd like to estimate the effect of fuel price (which I assume is exogenous)
on distance driven. As a control, I would like to include the fuel
efficiency of the driver's car. Although efficiency is likely to be
endogenous, leaving it out of the specification runs the risk of
imparting omitted variable bias on my fuel price estimate. But since it is
<strong>*just*</strong> a control, I'm inclined to leave efficiency as is in the model
and not worry about whether it is endogenous. Wise move?
Any insights would be appreciated!</pre>
<p><em>Before tackling the metrics, think about a likely motivation for the research question.  Suppose the government is considering a rise in the gas tax.  Policy-makers would like to know how this will affect driving habits and fuel consumption.  The government is unlikely to forbid people from buying a new, more fuel-efficient car in response to the tax; in fact, it would probably like to encourage that.  So who needs to know the causal effect of a price rise conditional on being locked in to the current vehicle?  I think this observation neatly answers Colin&#8217;s question.  Prices will go up, and driving behavior will change for a number of reasons.  There is no scenario where only one response (driving the same car) is allowed.  Then there is the econometric problem that conditioning on fuel efficiency will not actually answer the question of how driving behavior changes for those who don&#8217;t buy a more fuel-efficient car.  That&#8217;s the bad control problem described in MHE &#8211; but that&#8217;s just metrics.<br />
</em></p>
<p><em>JA</em></p>
]]></content:encoded>
			<wfw:commentRss>http://www.mostlyharmlesseconometrics.com/2010/05/can-i-get-an-indulgence-for-bad-control/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>P-score in the reg?</title>
		<link>http://www.mostlyharmlesseconometrics.com/2010/03/p-score-in-the-reg/</link>
		<comments>http://www.mostlyharmlesseconometrics.com/2010/03/p-score-in-the-reg/#comments</comments>
		<pubDate>Sat, 27 Mar 2010 22:18:04 +0000</pubDate>
		<dc:creator>josh</dc:creator>
				<category><![CDATA[Blog]]></category>
		<category><![CDATA[Questions]]></category>

		<guid isPermaLink="false">http://www.mostlyharmlesseconometrics.com/?p=653</guid>
		<description><![CDATA[Geo. from GA asks this interesting question 'bout the propensity score: I was wondering whether replacing high dimensional covariates (X) in the regression model with their propensity scores (p(X)) was a good idea? That is, Y = a + bT + cX + e becomes Y= a + bT + c(p(X)) + e. The book [...]]]></description>
			<content:encoded><![CDATA[<pre>Geo. from GA asks this interesting question 'bout the propensity score: 

I was wondering whether replacing high dimensional
covariates (X) in the regression model with their propensity scores
(p(X)) was a good idea? That is, Y = a + bT + cX + e becomes Y= a + bT
+ c(p(X)) + e. The book does not really address it unless I missed it.
What are the implications? Thanks.
<em>
</em>
<pre><em>George: it's certainly not a crazy idea. In fact, Dehejia-Wahba (1999) tried
this (Table 5, estimates labeled quadratic in score).  But it's not clear
what the theoretical justification is here; once you are using regression,
why do this two-step procedure instead of just sticking the covs you've put
in the score right into the reg (since you're implicitly assuming these are
the only source of OVB)?  Also, as we know from chapter 3, regression does not
estimate the pop ATE or the effect of treatment on the treated except under
constant effects or if the score is constant. Score fiends are often after
those parameters instead of the variance-weighted avg that regression produces.</em></pre>
</pre>
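<p>A minimal simulation sketch of the two specifications being compared: covariates entered directly versus a quadratic in the estimated score. The data-generating process, the constant effect <code>tau</code>, and the helpers <code>ols</code> and <code>fit_logit</code> are all assumptions made up for illustration; nothing here reproduces Dehejia-Wahba. Under constant effects and selection on the listed covariates only, both regressions should land near the same number, which is the point of the &#8220;why do this two-step procedure&#8221; question.</p>
<pre><code class="language-python">import numpy as np

rng = np.random.default_rng(0)
n, k = 20000, 5
tau = 2.0                                       # hypothetical constant treatment effect

X = rng.normal(size=(n, k))                     # covariates
gamma = np.full(k, 0.5)                         # selection coefficients (assumed)
beta = np.array([1.0, -1.0, 0.5, 0.0, 2.0])     # outcome coefficients (assumed)
p_true = 1.0 / (1.0 + np.exp(-(X @ gamma)))
t = rng.binomial(1, p_true).astype(float)       # treatment depends on X only
y = tau * t + X @ beta + rng.normal(size=n)

def ols(outcome, regressors):
    """Least-squares coefficients for the given regressor matrix."""
    coef, *_ = np.linalg.lstsq(regressors, outcome, rcond=None)
    return coef

def fit_logit(W, t, steps=25):
    """Logit coefficients by Newton's method (W must include a constant)."""
    b = np.zeros(W.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(W @ b)))
        grad = W.T @ (t - p)
        hess = (W * (p * (1.0 - p))[:, None]).T @ W
        b = b + np.linalg.solve(hess, grad)
    return b

const = np.ones(n)
W = np.column_stack([const, X])
p_hat = 1.0 / (1.0 + np.exp(-(W @ fit_logit(W, t))))    # estimated propensity score

# Y = a + bT + cX + e: covariates straight into the regression.
b_direct = ols(y, np.column_stack([const, t, X]))[1]
# Y = a + bT + f(p(X)) + e: quadratic in the estimated score instead.
b_score = ols(y, np.column_stack([const, t, p_hat, p_hat**2]))[1]

print(f"covariates in the reg: {b_direct:.3f}")
print(f"quadratic in score:    {b_score:.3f}")
</code></pre>
<p>With heterogeneous effects neither coefficient is the population ATE or the effect on the treated in general, so this sketch says nothing about that case.</p>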
]]></content:encoded>
			<wfw:commentRss>http://www.mostlyharmlesseconometrics.com/2010/03/p-score-in-the-reg/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Multiple endogenous variables &#8211; now what?!</title>
		<link>http://www.mostlyharmlesseconometrics.com/2010/02/multiple-endogenous-variables-what-now/</link>
		<comments>http://www.mostlyharmlesseconometrics.com/2010/02/multiple-endogenous-variables-what-now/#comments</comments>
		<pubDate>Mon, 08 Feb 2010 19:47:40 +0000</pubDate>
		<dc:creator>josh</dc:creator>
				<category><![CDATA[Blog]]></category>
		<category><![CDATA[Questions]]></category>
		<category><![CDATA[Reader Comments]]></category>

		<guid isPermaLink="false">http://www.mostlyharmlesseconometrics.com/?p=630</guid>
		<description><![CDATA[Diligent reader Daniela Falzon, who works at  the World Bank (in France . . . or Washington, DC) writes us with the following interesting problem concerning multiple endogenous variables in 2SLS: I am estimating Y = b0+ b1*X1 +b2* X2 + b3*X1*X2 + X3 Y is a dummy variable X1 is a dummy variable and [...]]]></description>
			<content:encoded><![CDATA[<p>Diligent reader Daniela Falzon, who works at  the World Bank (in France . . . or Washington, DC) writes us with the following interesting problem concerning multiple endogenous variables in 2SLS:</p>
<p>I am estimating Y = b0+ b1*X1 +b2* X2 + b3*X1*X2 + X3</p>
<div>Y is a dummy variable<br />
X1 is a dummy variable and endogenous,<br />
X2 is continuous and endogenous<br />
X3 is a set of additional control variables.</div>
<div>I am running ivreg2 and so I just dump in the three endogenous variables and  the instruments and of course I get very weird coefficients/results. And even if they were not weird, I would not be sure on how to interpret them.<br />
Do you have a better idea of how I should do it or should I just focus on the interaction term and instrument it?</div>
<div>Or could you please point me to where in your book &#8220;Mostly Harmless Econometrics&#8221; I can find the answer?</div>
<p>Many thanks in advance for your response and best regards,</p>
<p><em>Thanks for your question, Daniela.  Models with multiple endogenous variables are indeed hard to identify and the results can be hard to interpret.</em></p>
<p><em>So we don&#8217;t usually like to see them &#8211; for one thing it&#8217;s not clear why you&#8217;re tackling two causal questions at the same time; one is hard enough.<br />
You may have noticed that the only model with more than one endogenous regressor in MHE is the peer effects regression (equation 4.6.6, based on Acemoglu and Angrist, 2000).  Here we have both individual and state-level schooling endogenous in a wage equation.</em></p>
<p><em>But we are really only interested in the peer effect in this case &#8211; the effect of state average schooling. Individual schooling is there because we realize that any instrument for average schooling must also be correlated with individual schooling.  We therefore try to fix this violation of the exclusion restriction by treating individual schooling as endogenous as well. This is the best reason for having a second endog variable that I can think of.  And the model may work &#8211; in the case of schooling we have enough instruments.  But not very often, I would think.</em></p>
<p><em>More generally, it doesn&#8217;t make sense to think of one endogenous variable as a &#8220;control&#8221; when looking at the effects of another, at least not a <span style="text-decoration: underline">good</span> one (in the sense in which we use the terms good and bad control in chapter 3).  So any time someone shows me a problem with more than one endogenous variable, my first question is always: why?</em></p>
]]></content:encoded>
			<wfw:commentRss>http://www.mostlyharmlesseconometrics.com/2010/02/multiple-endogenous-variables-what-now/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Adding lagged dependent variables to differenced models</title>
		<link>http://www.mostlyharmlesseconometrics.com/2009/10/adding-lagged-dependent-vars-to-differenced-models/</link>
		<comments>http://www.mostlyharmlesseconometrics.com/2009/10/adding-lagged-dependent-vars-to-differenced-models/#comments</comments>
		<pubDate>Wed, 07 Oct 2009 02:32:32 +0000</pubDate>
		<dc:creator>josh</dc:creator>
				<category><![CDATA[Blog]]></category>
		<category><![CDATA[Questions]]></category>
		<category><![CDATA[Reader Comments]]></category>

		<guid isPermaLink="false">http://www.mostlyharmlesseconometrics.com/?p=545</guid>
		<description><![CDATA[Reader Christopher Ordowich asks: In sections 5.3-5.4, there is a great discussion of using fixed effects vs. a lagged dependent variable with panel data. I am having trouble reconciling some of this discussion with a section in a recent paper by Imbens and Wooldridge (2008) titled &#8220;Recent Developments in the Econometrics of Program Evaluation.&#8221; On [...]]]></description>
			<content:encoded><![CDATA[<p><strong>Reader Christopher Ordowich asks:</strong></p>
<p>In sections 5.3-5.4, there is a great discussion of using<br />
fixed effects vs. a lagged dependent variable with panel data. I am<br />
having trouble reconciling some of this discussion with a section in a<br />
recent paper by Imbens and Wooldridge (2008) titled &#8220;Recent<br />
Developments in the Econometrics of Program Evaluation.&#8221; On page 68 of<br />
their paper (as <a href="http://www.iza.org/en/webcontent/publications/papers/viewAbstract?dp_id=3640" target="_blank">published by IZA in 2008</a>) they suggest that it might<br />
be better in some circumstances with two periods of data to use first<br />
differencing and a lag of the dependent variable (assuming<br />
unconfoundedness given lagged outcomes). I understand your discussion<br />
of instrumenting for lagged variables if you have more than two<br />
periods, but with two periods, how do you react to adding a lag (the<br />
baseline value of the dependent variable) after first differencing<br />
with only two periods of data? I have had difficulty finding support<br />
for this approach elsewhere and given that you have given much thought<br />
to this issue, I was wondering what your opinion might be.</p>
<p><em>The way I see it, once you add a lagged dependent variable to a differenced model, you are really doing lagged-dep-var control and not fixed effects.  Steve may disagree (he&#8217;s generally less dogmatic than me).  This is not always exactly true, but it is a theorem for the simple example we use to contrast f.e. and lagged-dep-var control in Section 5.4.</em></p>
<p><em>Here&#8217;s that again:</em></p>
<p><em>two periods<br />
no covariates<br />
the treatment, D_it, is zero for everybody in period 1 and switched on for some in period 2 (think of a training program that some people participate in between periods; period 1 is before, period 2 is after, similar to Ashenfelter and Card, 1985)</em></p>
<p><em>ignoring constants,  fixed effects estimation fits</em></p>
<p><em>(1) Y_it &#8211; Y_it-1 = aD_it + error</em></p>
<p><em>lagged dependent variable estimation fits</em></p>
<p><em>(2) Y_it = gY_it-1 + bD_it + error</em></p>
<p><em>As I understand it, the Imbens-Wooldridge proposal is to throw Y_it-1 into  equation (1):</em></p>
<p><em>(3) Y_it &#8211; Y_it-1 = dY_it-1 + cD_it + error</em></p>
<p><em>But in this case, c is (algebraically) the same as b.  Why? The coefficient c is</em></p>
<p><em>c= COV(Y_it &#8211; Y_it-1, D_it*)/V(D_it*)</em></p>
<p><em>where D_it* is the residual from a regression of D_it on Y_it-1.  But  this residual is orthogonal to Y_it-1, hence</em></p>
<p><em>c= COV(Y_it &#8211; Y_it-1, D_it*)/V(D_it*) = COV(Y_it, D_it*)/V(D_it*) = b in  equation (2)</em></p>
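<p>The algebra above can be checked numerically. This is a minimal sketch on simulated two-period data; the data-generating process and the helper <code>ols</code> are assumptions for illustration, not anything from the book. It fits equations (1), (2) and (3) and compares the treatment coefficients.</p>
<pre><code class="language-python">import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Hypothetical two-period data: everyone is untreated in period 1, and
# people with low baseline outcomes are more likely to be treated.
y_lag = rng.normal(size=n)                                   # Y_it-1
d = np.less(y_lag + rng.normal(size=n), 0.0).astype(float)   # D_it
y = 0.6 * y_lag + 1.5 * d + rng.normal(size=n)               # Y_it

def ols(outcome, *regressors):
    """Least-squares coefficients; a constant is added as the first column."""
    X = np.column_stack([np.ones(len(outcome)), *regressors])
    coef, *_ = np.linalg.lstsq(X, outcome, rcond=None)
    return coef

_, a = ols(y - y_lag, d)                # (1) differenced model, no lag
_, g, b = ols(y, y_lag, d)              # (2) lagged dependent variable control
_, delta, c = ols(y - y_lag, y_lag, d)  # (3) differenced model plus the lag

print(f"a = {a:.4f}, b = {b:.4f}, c = {c:.4f}")
print(f"g - 1 = {g - 1:.4f}, delta = {delta:.4f}")
</code></pre>
<p>In this sketch c matches b up to floating-point error and the coefficient on the lag in (3) is g minus one, so (3) is just (2) with Y_it-1 subtracted from both sides; a is the only estimate that differs.</p>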
<p><em>So I say: &#8220;You wanna do fixed effects?  No lagged dependent variable, please (or at least be prepared to instrument it if you include one).   You wanna control for  lagged dependent variables?  Then, just do it!&#8221;</em></p>
<p><em>&#8211; JDA<br />
</em></p>
]]></content:encoded>
			<wfw:commentRss>http://www.mostlyharmlesseconometrics.com/2009/10/adding-lagged-dependent-vars-to-differenced-models/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Is 2SLS really OK?</title>
		<link>http://www.mostlyharmlesseconometrics.com/2009/07/is-2sls-really-ok/</link>
		<comments>http://www.mostlyharmlesseconometrics.com/2009/07/is-2sls-really-ok/#comments</comments>
		<pubDate>Wed, 08 Jul 2009 12:24:39 +0000</pubDate>
		<dc:creator>josh</dc:creator>
				<category><![CDATA[Blog]]></category>
		<category><![CDATA[Questions]]></category>

		<guid isPermaLink="false">http://www.mostlyharmlesseconometrics.com/?p=487</guid>
		<description><![CDATA[Elias Dinas from EUI asks: In section 4.6.1 you explain very clearly the problems from the straightforward use of the 2SLS logic in binary choice and/or endogenous treatment models. You also provide a simple &#8216;linearized&#8217; alternative but this is useful at the cost of introducing back-door identifying information. It so happens that I have a continuous [...]]]></description>
			<content:encoded><![CDATA[<p><strong>Elias Dinas from EUI asks</strong>: In section 4.6.1 you explain very clearly the problems from the straightforward use of the 2SLS logic in binary choice and/or endogenous treatment models. You also provide a simple &#8216;linearized&#8217; alternative but this is useful at the cost of introducing back-door identifying information. It so happens that I have a continuous Y and a binary D, instrumented with two Zs (one binary, the other continuous). I guess that if Y were also a dummy, MLE could provide consistent estimates for the average effect (following Wooldridge 2003:478). However, in this case, I think I am left with two alternatives: 2-stage probit least squares (the cdsimeq command in Stata), whose second stage however seems to belong to the forbidden regressions family, and the &#8216;linearized&#8217; 2-stage solution you suggest in the book. So my question is: should I prefer one over the other, or even consider a third option? Thank you very much for your help; I look forward to your reply. Elias Dinas</p>
<p><em>Thanks for your question, Elias.</em></p>
<p><em>Section 4.6.1 discusses two approaches to 2SLS with a dummy endogenous variable, forbidden (plug-in) regression and the use of nonlinear fitted values as instruments, neither of which we really like. Rather, as suggested by our discussion of nonlinear models with endogenous regressors in Section 6.4.3 (LDV reprise), we think you should use garden-variety 2SLS (IV) for dummy endogenous variables (as always; of course you can try fancier methods in the privacy of your own home, but this is what we like to see in published papers). With a single Bernoulli instrument, IV gives you LATE; with two Bernoulli instruments, you get a weighted average of the two underlying LATEs. When one instrument is continuous, the weighting is a little trickier (see, e.g., the “fish paper”). But my experience is that the marginal effects from nonlinear structural models will be close to 2SLS (that’s how you can tell the structural model MFX were done correctly), and with 2SLS you might even get the standard errors right!</em></p>
<p><em>–JA</em></p>
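<p>A minimal sketch of what &#8220;garden-variety 2SLS&#8221; looks like in the reader&#8217;s setup: continuous Y, dummy D, one binary and one continuous instrument. The simulated data, the constant effect of 1.0, and the helper <code>ols</code> are assumptions for illustration. The first stage is plain linear regression even though D is a dummy, and with the single binary instrument the IV estimate reduces to the Wald ratio.</p>
<pre><code class="language-python">import numpy as np

rng = np.random.default_rng(2)
n = 20000

v = rng.normal(size=n)                              # unobserved confounder
z1 = rng.binomial(1, 0.5, size=n).astype(float)     # binary instrument
z2 = rng.normal(size=n)                             # continuous instrument
d = np.greater(0.8 * z1 + 0.5 * z2 + v, 0.3).astype(float)  # dummy endogenous regressor
y = 1.0 * d + v + rng.normal(size=n)                # continuous outcome, assumed effect 1.0

def ols(outcome, regressors):
    """Least-squares coefficients for the given regressor matrix."""
    coef, *_ = np.linalg.lstsq(regressors, outcome, rcond=None)
    return coef

const = np.ones(n)

# 2SLS with both instruments: linear first stage, then regress y on fitted d.
Z = np.column_stack([const, z1, z2])
d_hat = Z @ ols(d, Z)
tsls = ols(y, np.column_stack([const, d_hat]))[1]

# With the binary instrument alone, IV is the Wald ratio of mean differences.
Z1 = np.column_stack([const, z1])
d_hat1 = Z1 @ ols(d, Z1)
iv_z1 = ols(y, np.column_stack([const, d_hat1]))[1]
wald = (y[z1 == 1].mean() - y[z1 == 0].mean()) / (d[z1 == 1].mean() - d[z1 == 0].mean())

naive = ols(y, np.column_stack([const, d]))[1]      # OLS, contaminated by v

print(f"OLS: {naive:.3f}  2SLS: {tsls:.3f}  IV (z1 only): {iv_z1:.3f}  Wald: {wald:.3f}")
</code></pre>
<p>The two-step computation gives the right point estimate, but the standard errors from the second-stage OLS are not the 2SLS standard errors; a packaged IV routine handles that.</p>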
]]></content:encoded>
			<wfw:commentRss>http://www.mostlyharmlesseconometrics.com/2009/07/is-2sls-really-ok/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
