
Difference-in-Differences: What it DiD?

Andrew Baker

Stanford University

2020-05-25

1 / 44

Outline of Talk

  1. Overview of DiD

  2. Problems with Staggered DiD

  3. Simulation Results

  4. Some Alternative Methods

  5. Application

2 / 44

Difference-in-Differences

  • Think Card and Krueger minimum wage study comparing NJ and PA.

  • 2 units and 2 time periods.

  • 1 unit (T) is treated, and receives treatment in the second period. The control unit (C) is never treated.

3 / 44

Difference-in-Differences

4 / 44

Difference-in-Differences

  • Building upon Angrist & Pischke (2008, p. 228), we can think of these simple 2x2 DiDs as a fixed effects estimator.

  • Potential Outcomes

    • $Y^1_{i,t}$ = value of dependent variable for unit $i$ in period $t$ with treatment.
    • $Y^0_{i,t}$ = value of dependent variable for unit $i$ in period $t$ without treatment.
  • The expected outcome is a linear function of unit and time fixed effects:

$$E[Y^0_{i,t}] = \alpha_i + \alpha_t$$
$$E[Y^1_{i,t}] = \alpha_i + \alpha_t + \delta D_{st}$$

  • Goal of DiD is to get an unbiased estimate of the treatment effect $\delta$.
5 / 44

Difference-in-Differences as Solving System of Equations for Unknown Variable

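  • Difference in expectations for the control unit between times t = 1 and t = 0:

$$\begin{aligned}
E[Y^0_{C,1}] &= \alpha_1 + \alpha_C \\
E[Y^0_{C,0}] &= \alpha_0 + \alpha_C \\
E[Y^0_{C,1}] - E[Y^0_{C,0}] &= \alpha_1 - \alpha_0
\end{aligned}$$

  • Now do the same thing for the treated unit:

$$\begin{aligned}
E[Y^1_{T,1}] &= \alpha_1 + \alpha_T + \delta \\
E[Y^1_{T,0}] &= \alpha_0 + \alpha_T \\
E[Y^1_{T,1}] - E[Y^1_{T,0}] &= \alpha_1 - \alpha_0 + \delta
\end{aligned}$$

  • If we assume the linear structure of DiD, then an unbiased estimate of $\delta$ is:

$$\delta = \left(E[Y^1_{T,1}] - E[Y^1_{T,0}]\right) - \left(E[Y^0_{C,1}] - E[Y^0_{C,0}]\right)$$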
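
As a quick numerical check of the double difference, here is a minimal sketch in Python. The values chosen for the period effects, unit effects, and treatment effect are illustrative assumptions, not numbers from the talk.

```python
# Hypothetical parameter values for the linear structure (assumptions).
alpha_0, alpha_1 = 1.0, 1.5   # period effects for t = 0 and t = 1
alpha_C, alpha_T = 2.0, 3.0   # unit effects for control (C) and treated (T)
delta = 0.7                   # treatment effect

# Expected outcomes for each (unit, period) cell.
E = {
    ("C", 0): alpha_0 + alpha_C,
    ("C", 1): alpha_1 + alpha_C,
    ("T", 0): alpha_0 + alpha_T,
    ("T", 1): alpha_1 + alpha_T + delta,
}

# First differences over time, then the difference between units.
diff_control = E[("C", 1)] - E[("C", 0)]   # alpha_1 - alpha_0
diff_treated = E[("T", 1)] - E[("T", 0)]   # alpha_1 - alpha_0 + delta
did = diff_treated - diff_control          # recovers delta

print(did)
```

The unit effects drop out of each first difference, and the common period change drops out of the second difference, leaving only the treatment effect.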

6 / 44

Two-Way Differencing

7 / 44

Regression DiD

The DiD can be estimated through linear regression of the form:

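$$y_{it} = \alpha + \beta_1 TREAT_i + \beta_2 POST_t + \delta (TREAT_i \cdot POST_t) + \epsilon_{it} \tag{1}$$

The coefficients from the regression estimate in (1) recover the same parameters as the double-differencing performed above:

$$\begin{aligned}
\alpha &= E[y_{it} \mid i = C, t = 0] = \alpha_0 + \alpha_C \\
\beta_1 &= E[y_{it} \mid i = T, t = 0] - E[y_{it} \mid i = C, t = 0] = (\alpha_0 + \alpha_T) - (\alpha_0 + \alpha_C) = \alpha_T - \alpha_C \\
\beta_2 &= E[y_{it} \mid i = C, t = 1] - E[y_{it} \mid i = C, t = 0] = (\alpha_1 + \alpha_C) - (\alpha_0 + \alpha_C) = \alpha_1 - \alpha_0 \\
\delta &= \left(E[y_{it} \mid i = T, t = 1] - E[y_{it} \mid i = T, t = 0]\right) - \left(E[y_{it} \mid i = C, t = 1] - E[y_{it} \mid i = C, t = 0]\right) = \delta
\end{aligned}$$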
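
The mapping between regression coefficients and cell means can be checked numerically. The sketch below fits the interaction regression to the four cell means of a 2x2 design using NumPy least squares. The parameter values are illustrative assumptions.

```python
import numpy as np

# Hypothetical parameter values (assumptions, not from the talk).
alpha_0, alpha_1 = 1.0, 1.5   # period effects
alpha_C, alpha_T = 2.0, 3.0   # unit effects
delta = 0.7                   # treatment effect

rows = []
for treat in (0, 1):
    for post in (0, 1):
        y = ((alpha_1 if post else alpha_0)
             + (alpha_T if treat else alpha_C)
             + delta * treat * post)
        # Columns: intercept, TREAT, POST, TREAT x POST, outcome.
        rows.append((1.0, treat, post, treat * post, y))

data = np.array(rows, dtype=float)
X, y = data[:, :4], data[:, 4]

# Four cells and four parameters, so the fit is exact.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
intercept, beta_1, beta_2, delta_hat = coef
print(intercept, beta_1, beta_2, delta_hat)
```

The intercept equals the control unit's pre-period mean, `beta_1` the unit gap, `beta_2` the period change, and `delta_hat` the double difference.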

8 / 44

Regression DiD

9 / 44

Regression DiD - The Workhorse Model

  • Advantage of regression DiD - it provides both estimates of δ and standard errors for the estimates.

  • Angrist & Pischke (2008):

    • "It's also easy to add additional (units) or periods to the regression setup... [and] it's easy to add additional covariates."
  • Two-way fixed effects estimator: $y_{it} = \alpha_i + \alpha_t + \delta^{DD} D_{it} + \epsilon_{it}$

    • $\alpha_i$ and $\alpha_t$ are unit and time fixed effects, $D_{it}$ is the unit-time indicator for treatment.

    • $TREAT_i$ and $POST_t$ are now subsumed by the fixed effects.

    • Can be easily modified to include a covariate matrix $X_{it}$, time trends, dynamic treatment effects estimation, etc.
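
For a balanced panel, the two-way fixed effects coefficient can be computed by demeaning the outcome and the treatment indicator by unit and by period. The sketch below compares that shortcut with an explicit dummy-variable regression on a small simulated panel. The panel size, effect size of 2.0, and noise level are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_units, n_periods = 6, 5

# Balanced panel stored as a (units x periods) matrix.
unit_fe = rng.normal(size=(n_units, 1))
time_fe = rng.normal(size=(1, n_periods))
D = np.zeros((n_units, n_periods))
D[:3, 3:] = 1.0                     # first three units treated from period 3 on
Y = unit_fe + time_fe + 2.0 * D + rng.normal(scale=0.1, size=D.shape)

def demean(M):
    """Two-way within transformation for a balanced panel."""
    return M - M.mean(axis=1, keepdims=True) - M.mean(axis=0, keepdims=True) + M.mean()

# Within estimator: regress demeaned Y on demeaned D.
delta_within = float((demean(D) * demean(Y)).sum() / (demean(D) ** 2).sum())

# Same model with explicit unit and time dummies (one time dummy dropped).
unit_d = np.kron(np.eye(n_units), np.ones((n_periods, 1)))
time_d = np.kron(np.ones((n_units, 1)), np.eye(n_periods))
X = np.column_stack([D.ravel(), unit_d, time_d[:, 1:]])
coef, *_ = np.linalg.lstsq(X, Y.ravel(), rcond=None)
delta_lsdv = float(coef[0])

print(delta_within, delta_lsdv)
```

The two estimates agree up to floating-point error. The demeaning shortcut relies on the panel being balanced.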

10 / 44

Where It Goes Wrong

  • There is now a developed literature on the issues with TWFE DiD under "staggered treatment timing" (Abraham and Sun (2018), Borusyak and Jaravel (2018), Callaway and Sant'Anna (2019), Goodman-Bacon (2019), Strezhnev (2018), Athey and Imbens (2018))

    • Different units receive treatment at different periods in time.
  • Probably the most common use of DiD today. If done right, it can increase the amount of cross-sectional variation.

  • Without digging into the literature:

    • $\delta^{DD}$ with staggered treatment timing is a weighted average of many different treatment effects.

    • We know little about what it measures when treatment timing varies, how it compares means across groups, or why different specifications change estimates.

    • The weights are often negative and non-intuitive.

11 / 44

Bias with TWFE - Goodman-Bacon (2019)

  • Goodman-Bacon (2019) provides a clear graphical intuition for the bias. Assume three treatment groups - never treated units (U), early treated units (k), and later treated units (l).

12 / 44

Bias with TWFE - Goodman-Bacon (2019)

  • Goodman-Bacon (2019) shows that we can form four different 2x2 groups in this setting, where the effect can be estimated using the simple regression DiD in each group:

13 / 44

Bias with TWFE - Goodman-Bacon (2019)

  • Important Insights

    • $\delta^{DD}$ is just the weighted average of the four 2x2 treatment effects. The weights are a function of the size of the subsample, the relative size of treatment and control units, and the timing of treatment in the subsample.

    • Already-treated units act as controls even though they are treated.

    • Given the weighting function, panel length alone can change the DiD estimates substantially, even when each 2x2 $\delta^{DD}$ does not change.

    • Groups treated closer to middle of panel receive higher weights than those treated earlier or later.
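
The panel-length point can be illustrated with a small noise-free example. The sketch below assumes three units (early treated in 1990, later treated in 2000, never treated) with constant effects of 1.0 and 3.0 for the two treated units. All of these values are hypothetical. Only the last year of the panel changes between the two runs.

```python
import numpy as np

def twfe(Y, D):
    """TWFE coefficient for a balanced (units x periods) panel via two-way demeaning."""
    def demean(M):
        return M - M.mean(axis=1, keepdims=True) - M.mean(axis=0, keepdims=True) + M.mean()
    Yt, Dt = demean(Y), demean(D)
    return float((Dt * Yt).sum() / (Dt ** 2).sum())

def panel_estimate(last_year):
    years = np.arange(1985, last_year + 1)
    treat_year = np.array([1990.0, 2000.0, np.inf])   # early, later, never treated
    effect = np.array([1.0, 3.0, 0.0])                # constant effect per unit (assumed)
    D = (years[None, :] >= treat_year[:, None]).astype(float)
    Y = effect[:, None] * D                           # no noise, no fixed effects
    return twfe(Y, D)

short_panel = panel_estimate(2004)   # 20 years of data
long_panel = panel_estimate(2014)    # 30 years of data
print(short_panel, long_panel)
```

Every underlying 2x2 effect is 1.0 or 3.0 in both runs, yet the TWFE estimate moves from 2.0 to about 2.36 when ten years are added, because the later-treated unit gains weight.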

14 / 44

Simulation Exercise

  • We can show how easily $\delta^{DD}$ goes awry through a simulation exercise.

  • Consider two sets of DiD estimates - one where the treatment occurs in one period, and one where the treatment is staggered.

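  • The data generating process is linear: $y_{it} = \alpha_i + \alpha_t + \delta_{it} + \epsilon_{it}$.

    • $\alpha_i, \alpha_t \sim N(0, 1)$
    • $\epsilon_{i,t} \sim N\left(0, \left(\frac{1}{2}\right)^2\right)$
  • We will consider two different treatment assignment setups for $\delta_{it}$.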

15 / 44

Simulation 1 - 1 Period Treatment

  • There are 40 states s, and 1000 units i randomly drawn from the 40 states.

  • Data covers years 1980 to 2010, and half the states receive "treatment" in 1995.

  • For every unit incorporated in a treated state, we pull a unit-specific treatment effect from $\mu_i \sim N(0.3, (1/5)^2)$.

  • Treatment effects here are trend breaks rather than unit shifts: the accumulated treatment effect $\delta_{it}$ is $\mu_i \times (year - 1995 + 1)$ for years after 1995.

  • We then estimate the average treatment effect as $\hat{\delta}$ from:

    $$y_{it} = \hat{\alpha}_i + \hat{\alpha}_t + \hat{\delta} D_{it}$$

  • Simulate this data 1,000 times and plot the distribution of estimates $\hat{\delta}$ and the true effect (red line).
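
A single draw of this simulation can be sketched as follows. This is not the author's code. It assumes that units are assigned to states uniformly at random, that treatment switches on in 1995 itself, and that the "true effect" is the average of $\delta_{it}$ over treated observations.

```python
import numpy as np

def twfe(Y, D):
    """TWFE coefficient for a balanced (units x periods) panel via two-way demeaning."""
    def demean(M):
        return M - M.mean(axis=1, keepdims=True) - M.mean(axis=0, keepdims=True) + M.mean()
    Yt, Dt = demean(Y), demean(D)
    return float((Dt * Yt).sum() / (Dt ** 2).sum())

rng = np.random.default_rng(1)
n_states, n_units = 40, 1000
years = np.arange(1980, 2011)

state = rng.integers(0, n_states, size=n_units)              # unit -> state (assumed uniform)
treated_state = rng.permutation(n_states) < n_states // 2    # exactly half the states treated
treated = treated_state[state]

alpha_i = rng.normal(0, 1, size=(n_units, 1))                # unit fixed effects
alpha_t = rng.normal(0, 1, size=(1, years.size))             # year fixed effects
eps = rng.normal(0, 0.5, size=(n_units, years.size))         # noise, sd = 1/2
mu = rng.normal(0.3, 0.2, size=(n_units, 1))                 # unit-specific effect, sd = 1/5

D = (treated[:, None] & (years[None, :] >= 1995)).astype(float)
exposure = np.clip(years[None, :] - 1995 + 1, 0, None)       # 1, 2, 3, ... from 1995 on
delta_it = mu * exposure * D                                 # trend-break treatment effect
Y = alpha_i + alpha_t + delta_it + eps

delta_hat = twfe(Y, D)
true_att = delta_it[D == 1].mean()                           # average effect on treated cells
print(delta_hat, true_att)
```

With a single treatment date and never-treated controls, the estimate should land close to the average treatment effect on the treated. Repeating the draw with different seeds gives the sampling distribution.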

16 / 44

Simulation 1 - 1 Period Treatment

17 / 44

Simulation 1 - 1 Period Treatment

18 / 44

Simulation 2 - Staggered Treatment

  • Run similar analysis with staggered treatment.

  • The 40 states are randomly assigned into four treatment cohorts of about 250 units each, depending on year of treatment assignment (1986, 1992, 1998, and 2004).

  • DGP is identical, except that now $\delta_{it}$ is equal to $\mu_i \times (year - \tau_g + 1)$ where $\tau_g$ is the treatment assignment year.

  • Estimate the treatment effect using TWFE and compare to the analytically derived true δ (red line).
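
A single draw of the staggered version can be sketched the same way. As before, this is an illustration under assumptions: uniform assignment of units to states, ten states per cohort, and the true effect taken as the average of $\delta_{it}$ over treated observations.

```python
import numpy as np

def twfe(Y, D):
    """TWFE coefficient for a balanced (units x periods) panel via two-way demeaning."""
    def demean(M):
        return M - M.mean(axis=1, keepdims=True) - M.mean(axis=0, keepdims=True) + M.mean()
    Yt, Dt = demean(Y), demean(D)
    return float((Dt * Yt).sum() / (Dt ** 2).sum())

rng = np.random.default_rng(2)
n_states, n_units = 40, 1000
years = np.arange(1980, 2011)
cohort_years = np.array([1986, 1992, 1998, 2004])

state = rng.integers(0, n_states, size=n_units)                        # unit -> state
state_cohort = rng.permutation(np.repeat(cohort_years, n_states // 4)) # 10 states per cohort
tau = state_cohort[state][:, None]                                     # unit treatment year

alpha_i = rng.normal(0, 1, size=(n_units, 1))
alpha_t = rng.normal(0, 1, size=(1, years.size))
eps = rng.normal(0, 0.5, size=(n_units, years.size))
mu = rng.normal(0.3, 0.2, size=(n_units, 1))

D = (years[None, :] >= tau).astype(float)
exposure = np.clip(years[None, :] - tau + 1, 0, None)   # zero before treatment
delta_it = mu * exposure
Y = alpha_i + alpha_t + delta_it + eps

delta_hat = twfe(Y, D)
true_att = delta_it[D == 1].mean()
print(delta_hat, true_att)
```

Here every unit is eventually treated, so later cohorts are compared against earlier cohorts whose outcomes are still trending upward. The TWFE estimate falls far below the average effect on the treated.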

19 / 44

Simulation 2 - Staggered Treatment

20 / 44

Simulation 2 - Staggered Treatment


21 / 44

Simulation 2 - Staggered Treatment

  • Main problem - we use prior treated units as controls.

  • When the treatment effect is "dynamic", i.e. takes more than one period to be incorporated into your dependent variable, you are subtracting the treatment effects of previously treated units from the estimates for later-treated units.

  • This biases your estimates towards zero when all the treatment effects are the same.

22 / 44

Another Simulation

  • Can we actually get estimates for δ that are of the wrong sign? Yes, if treatment effects for early treated units are larger (in absolute magnitude) than the treatment effects on later treated units.

  • Here firms are randomly assigned to one of 50 states. The 50 states are randomly assigned into one of 5 treatment groups $G_g$ based on treatment being initiated in 1985, 1991, 1997, 2003, and 2009.

  • All treated firms incorporated in a state in treatment group $G_g$ receive a treatment effect $\delta_i \sim N(\delta_g, 0.2^2)$.

  • The treatment effect is cumulative or dynamic: $\delta_{it} = \delta_i \times (year - G_g)$.

23 / 44

Another Simulation

  • The average treatment effect multiple $\delta_g$ decreases with the treatment year:

Treatment Effect Averages

    G_g     δ_g
    1985    0.5
    1991    0.4
    1997    0.3
    2003    0.2
    2009    0.1
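
One draw of this design can be sketched as follows. The slides do not state the number of firms, the panel years, or the fixed-effect and noise distributions for this simulation. The values below (1,000 firms, 1980 to 2015, and the same distributions as the earlier simulations) are assumptions.

```python
import numpy as np

def twfe(Y, D):
    """TWFE coefficient for a balanced (units x periods) panel via two-way demeaning."""
    def demean(M):
        return M - M.mean(axis=1, keepdims=True) - M.mean(axis=0, keepdims=True) + M.mean()
    Yt, Dt = demean(Y), demean(D)
    return float((Dt * Yt).sum() / (Dt ** 2).sum())

rng = np.random.default_rng(3)
n_states, n_firms = 50, 1000                  # number of firms is an assumption
years = np.arange(1980, 2016)                 # panel span is an assumption
cohort_years = np.array([1985, 1991, 1997, 2003, 2009])
delta_g = np.array([0.5, 0.4, 0.3, 0.2, 0.1])

state = rng.integers(0, n_states, size=n_firms)                        # firm -> state
state_group = rng.permutation(np.repeat(np.arange(5), n_states // 5))  # state -> group
group = state_group[state]
G = cohort_years[group][:, None]                                       # firm treatment year
delta_i = rng.normal(delta_g[group], 0.2)[:, None]                     # firm-specific multiple

D = (years[None, :] >= G).astype(float)
delta_it = delta_i * np.clip(years[None, :] - G, 0, None)   # cumulative effect, zero pre-treatment
Y = (rng.normal(size=(n_firms, 1))                          # firm fixed effects (assumed N(0,1))
     + rng.normal(size=(1, years.size))                     # year fixed effects (assumed N(0,1))
     + delta_it
     + rng.normal(0, 0.5, size=D.shape))                    # noise (assumed sd = 1/2)

delta_hat = twfe(Y, D)
true_att = delta_it[D == 1].mean()
print(delta_hat, true_att)
```

Every firm's treatment effect is positive on average, yet the TWFE coefficient comes out negative in this setup, because the largest and fastest-growing effects belong to the early cohorts that later serve as controls.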
24 / 44

Another Simulation

  • First let's look at the distribution of $\delta^{DD}$ using TWFE estimation with this simulated sample:

25 / 44

Goodman-Bacon Decomposition

26 / 44

Callaway & Sant'Anna

  • Inverse propensity weighted long-difference in cohort-specific average treatment effects between treated and untreated units for a given treatment cohort.

$$ATT(g,t) = E\left[\left(\frac{G_g}{E[G_g]} - \frac{\frac{p_g(X)\,C}{1 - p_g(X)}}{E\left[\frac{p_g(X)\,C}{1 - p_g(X)}\right]}\right)\left(Y_t - Y_{g-1}\right)\right]$$

  • Without covariates, as in the simulated example here, it calculates the simple long difference between all treated units i in relative year k and all potential control units that have not yet been treated by year k.
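
The no-covariate case reduces to a comparison of long differences, which can be sketched directly. The function below is a simplified illustration, not the authors' estimator or any package's API. The helper name `att_gt`, the toy panel, and the effect size of 0.5 per year of exposure are assumptions.

```python
import numpy as np

def att_gt(Y, years, treat_year, g, t):
    """Unconditional ATT(g, t): long difference from year g-1 to year t,
    cohort g minus units not yet treated by year t."""
    base = np.flatnonzero(years == g - 1)[0]
    now = np.flatnonzero(years == t)[0]
    long_diff = Y[:, now] - Y[:, base]
    in_cohort = treat_year == g
    not_yet = treat_year > t            # never-treated units are coded as np.inf
    return float(long_diff[in_cohort].mean() - long_diff[not_yet].mean())

# Toy noise-free panel: two cohorts plus never-treated units.
rng = np.random.default_rng(4)
years = np.arange(1995, 2011)
treat_year = np.array([2000.0, 2000.0, 2004.0, 2004.0, np.inf, np.inf])
unit_fe = rng.normal(size=(treat_year.size, 1))
time_fe = rng.normal(size=(1, years.size))
exposure = np.clip(years[None, :] - treat_year[:, None] + 1, 0, None)
Y = unit_fe + time_fe + 0.5 * exposure  # effect grows by 0.5 per treated year

att_2000_2002 = att_gt(Y, years, treat_year, g=2000, t=2002)   # 3 years of exposure
att_2004_2005 = att_gt(Y, years, treat_year, g=2004, t=2005)   # 2 years of exposure
print(att_2000_2002, att_2004_2005)
```

Because the controls are restricted to units not yet treated by year t, the previously treated cohort never enters the comparison for the later cohort.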
27 / 44

Callaway & Sant'Anna

28 / 44

Abraham and Sun

  • A relatively straightforward extension of the standard event-study TWFE model:

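    $$y_{it} = \alpha_i + \alpha_t + \sum_e \sum_{l \neq -1} \delta_{el} \left(1\{E_i = e\} \cdot D^l_{it}\right) + \epsilon_{it}$$

  • You saturate the relative time indicators (i.e. t = -2, -1, ...) with indicators for the treatment initiation year group, and then aggregate the cohort-specific estimates to overall relative time effects, weighting by cohort size.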

  • In the case of no covariates, this gives you the same estimate as Callaway & Sant'Anna if you fully saturate the model with time indicators (leaving only two relative year identifiers missing).

  • The authors don't claim that it can be used with covariates, but it seemingly follows if we think it is okay with normal TWFE DiD.

29 / 44

Abraham and Sun

30 / 44

Cengiz et al. (2019)

  • Similar to the standard TWFE DiD, but we ensure that no previously treated units enter as controls by trimming the sample.

  • For each treatment cohort $G_g$, get all treated units, and all units that are not treated by year $g + k$, where $g$ is the treatment year and $k$ is the outermost relative year that you want to test (e.g. if you do an event study plot from -5 to 5, k would equal 5).

  • Keep only observations within years $g - k$ and $g + k$ for each cohort-specific dataset, and then stack them in relative time.

  • Run the same TWFE estimates as in standard DiD, but include interactions for the cohort-specific dataset with all of the fixed effects, controls, and clusters.
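
The stacking steps above can be sketched with plain NumPy. This is a simplified illustration of the idea, not the code from Cengiz et al. (2019). The helper name `stacked_did`, the toy panel, and the window k = 2 are assumptions. Demeaning within each cohort-specific dataset plays the role of interacting the unit and time fixed effects with the dataset indicator.

```python
import numpy as np

def demean(M):
    """Two-way within transformation for a balanced (units x periods) block."""
    return M - M.mean(axis=1, keepdims=True) - M.mean(axis=0, keepdims=True) + M.mean()

def stacked_did(Y, years, treat_year, k):
    """Pooled DiD coefficient over cohort-specific datasets with clean controls."""
    num = den = 0.0
    for g in np.unique(treat_year[np.isfinite(treat_year)]):
        cols = (years >= g - k) & (years <= g + k)
        if cols.sum() < 2 * k + 1:
            continue                                  # event window not fully observed
        treated = treat_year == g
        clean = treat_year > g + k                    # not treated by year g + k
        if not clean.any():
            continue                                  # no clean controls for this cohort
        rows = treated | clean
        Y_g = Y[np.ix_(rows, cols)]
        D_g = (treated[rows][:, None] & (years[cols][None, :] >= g)).astype(float)
        Yt, Dt = demean(Y_g), demean(D_g)             # dataset-specific fixed effects
        num += (Dt * Yt).sum()
        den += (Dt ** 2).sum()
    return float(num / den)

# Toy noise-free panel: three cohorts plus never-treated units.
rng = np.random.default_rng(5)
years = np.arange(1995, 2013)
treat_year = np.array([2000.0, 2000.0, 2004.0, 2004.0, 2008.0, 2008.0, np.inf, np.inf])
exposure = np.clip(years[None, :] - treat_year[:, None] + 1, 0, None)
Y = rng.normal(size=(treat_year.size, 1)) + rng.normal(size=(1, years.size)) + 1.0 * exposure

stacked_estimate = stacked_did(Y, years, treat_year, k=2)
print(stacked_estimate)
```

With k = 2 the post-treatment window covers exposures of 1, 2, and 3 years, so each cohort-specific DiD equals 2.0 and so does the pooled estimate. No previously treated unit is used as a control.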

31 / 44

Cengiz et al. (2019)

32 / 44

Model Comparison

  • In the stylized example all the models work. How do they differ?

  • Callaway & Sant'Anna

    • Can be very flexible in determining which control units to consider.
    • Has a more flexible functional form as well (IPW instead of OLS).
    • IPW can run into issues with p-scores near 0 or 1. But just because OLS runs doesn't mean it's right!
  • Abraham & Sun

    • Very similar to regular TWFE OLS and hence easy to explain.
    • Control units are all units not treated within the data sample. If most of your units are treated by the end (or all), this can make control units very non-representative and restricted.
  • Cengiz et al.

    • Also fairly close to regular DiD.
    • Can modify this framework to allow different forms of control units as well.
    • Not theoretically derived.
33 / 44

Application - Medical Marijuana Laws and Opioid Overdose Deaths

  • Bachhuber et al. 2014 found, using a staggered DiD, that states with medical cannabis laws experienced a slower increase in opioid overdose mortality from 1999-2010.

  • Shover et al. 2020 extend the end of the data sample from 2010 to 2017, a period during which 32 more states passed MMLs.

  • Not only do the results go away, but the sign flips; MMLs are associated with higher opioid overdose mortality rates.

  • Authors don't call it difference-in-differences, but it uses TWFE with a binary indicator variable (thus is effectively DiD).

34 / 44

Replication

35 / 44

Event Study Estimates

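  • There is little evidence that covariates matter here, so we estimate a standard event-study DiD with no controls over the two sample periods:

    $$y_{it} = \alpha_i + \alpha_t + \sum_{k = Pre, Post} \delta_k + \sum_{k=-3}^{3} \delta_k + \epsilon_{it}$$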

36 / 44

Event Study Estimates

  • So we can verify that in the first sample (1999 - 2010), there appears to be a negative effect of law introduction, while in the full sample (1999 - 2017), there is a positive effect.

  • But there appears to be evidence of pre-trends in the full sample.

  • In addition, by the end of the sample the number of states adopting MMLs is quite large.

  • If there are dynamic treatment effects, then these estimates could be biased from using many prior treated states as controls.

37 / 44

Goodman-Bacon Decomposition

38 / 44

Goodman-Bacon Decomposition

  • The unweighted average of the 2x2 treatment effects is negative for the earlier vs. later treated comparisons (unbiased), while it is positive for the later vs. earlier treated comparisons (biased).

  • The effect is also positive for the treated vs. untreated units, but there are not many untreated states (i.e. states without medical cannabis laws).

    Type                        Average Estimate    Number of 2x2 Comparisons    Total Weight
    Earlier vs Later Treated    -0.16               91                           0.38
    Later vs Earlier Treated     0.32               105                          0.42
    Treated vs Untreated         0.44               14                           0.20
39 / 44

Callaway & Sant'Anna

40 / 44

Abraham & Sun

  • Skip for now - without covariates it's the same as Callaway & Sant'Anna
41 / 44

Cengiz et al.

  • First, we can plot the state-specific DiD estimates, separated by adoption period:

42 / 44

Cengiz et al.

43 / 44

Takeaways

  • DiDs are a powerful tool and we are going to keep using them.

  • But we should make sure we understand what we're doing! DiD is a comparison of means and at a minimum we should know which means we're comparing.

  • Multiple new methods have been proposed, all of which ensure that you aren't using prior treated units as controls.

  • You should probably tailor your selection of method to your data structure: the methods use and discard different numbers of control units, and depending on your setting this might matter.

  • It is unclear what's going on with MMLs and opioid mortality rates, but it is very unlikely that the results in the first published paper are robust.

44 / 44
