Difference-in-Differences in 2020

Common Pitfalls and How to Avoid Them

Andrew Baker

Stanford University

2020-09-21

1 / 29

Outline of Talk


  1. Overview of DiD

  2. Problems with Staggered DiD

  3. Simulation Results

  4. Some Alternative Methods

  5. Application

2 / 29

Difference-in-Differences


  • Think of the Card and Krueger (1994) minimum wage study comparing New Jersey and Pennsylvania.

  • 2 units and 2 time periods.

  • 1 unit (T) is treated, and receives treatment in the second period. The control unit (C) is never treated.

3 / 29

Difference-in-Differences

4 / 29

Difference-in-Differences

  • Building upon \(\color{blue}{\text{Angrist & Pischke (2008, p. 228)}}\), we can think of these simple 2x2 DiDs as a fixed effects estimator.

  • Potential Outcomes

    • \(Y_{i, t}^1\) = value of dependent variable for unit \(i\) in period \(t\) with treatment.
    • \(Y_{i, t}^0\) = value of dependent variable for unit \(i\) in period \(t\) without treatment.
  • The expected outcome is a linear function of unit and time fixed effects: $$E[Y_{i, t}^0] = \alpha_i + \alpha_t$$ $$E[Y_{i, t}^1] = \alpha_i + \alpha_t + \delta D_{it}$$ where \(D_{it} = 1\) if unit \(i\) is treated in period \(t\).

  • The goal of DiD is to get an unbiased estimate of the treatment effect \(\delta\).
5 / 29

Difference-in-Differences as Solving a System of Equations for an Unknown Variable

  • Difference in expectations for the control unit between t = 1 and t = 0: $$\begin{align*} E[Y_{C, 1}^0] & = \alpha_1 + \alpha_C \\ E[Y_{C, 0}^0] & = \alpha_0 + \alpha_C \\ E[Y_{C, 1}^0] - E[Y_{C, 0}^0] & = \alpha_1 - \alpha_0 \end{align*}$$

  • Now do the same thing for the treated unit: $$\begin{align*} E[Y_{T, 1}^1] & = \alpha_1 + \alpha_T + \delta \\ E[Y_{T, 0}^1] & = \alpha_0 + \alpha_T \\ E[Y_{T, 1}^1] - E[Y_{T, 0}^1] & = \alpha_1 - \alpha_0 + \delta \end{align*}$$

  • If we assume the linear structure of DiD, then differencing the two changes recovers \(\delta\):

$$\delta = \left( E[Y_{T, 1}^1] - E[Y_{T, 0}^1] \right) - \left( E[Y_{C, 1}^0] - E[Y_{C, 0}^0] \right)$$
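The double difference can be computed directly from the four cell means. A minimal Python sketch, using hypothetical outcomes generated from the linear model with \(\alpha_C = 1\), \(\alpha_T = 3\), \(\alpha_0 = 0\), \(\alpha_1 = 2\) and \(\delta = 1.5\) (all values are illustrative assumptions, not from the talk):

```python
from statistics import mean

def did_2x2(y_t0, y_t1, y_c0, y_c1):
    """2x2 DiD: change in the treated unit's mean minus change in the control unit's mean."""
    return (mean(y_t1) - mean(y_t0)) - (mean(y_c1) - mean(y_c0))

# Hypothetical outcomes built from alpha_i + alpha_t + delta * D_it,
# with alpha_C = 1, alpha_T = 3, alpha_0 = 0, alpha_1 = 2, delta = 1.5,
# plus symmetric noise that averages out within each cell.
y_c0 = [1.0 - 0.2, 1.0 + 0.2]   # alpha_0 + alpha_C
y_c1 = [3.0 - 0.1, 3.0 + 0.1]   # alpha_1 + alpha_C
y_t0 = [3.0 - 0.3, 3.0 + 0.3]   # alpha_0 + alpha_T
y_t1 = [6.5 - 0.4, 6.5 + 0.4]   # alpha_1 + alpha_T + delta

print(did_2x2(y_t0, y_t1, y_c0, y_c1))
```

The control unit's change removes \(\alpha_1 - \alpha_0\) and the first differences remove the unit effects, leaving \(\delta\).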

6 / 29

Two-Way Differencing

7 / 29

Regression DiD

The DiD can be estimated through linear regression of the form:

$$\tag{1} y_{it} = \alpha + \beta_1 TREAT_i + \beta_2 POST_t + \delta (TREAT_i \cdot POST_t) + \epsilon_{it}$$

The coefficients from the regression in (1) recover the same parameters as the double-differencing performed above: $$\begin{align*} \alpha &= E[y_{it} | i = C, t = 0] = \alpha_0 + \alpha_C \\ \beta_1 &= E[y_{it} | i = T, t = 0] - E[y_{it} | i = C, t = 0] \\ &= (\alpha_0 + \alpha_T) - (\alpha_0 + \alpha_C) = \alpha_T - \alpha_C \\ \beta_2 &= E[y_{it} | i = C, t = 1] - E[y_{it} | i = C, t = 0] \\ &= (\alpha_1 + \alpha_C) - (\alpha_0 + \alpha_C) = \alpha_1 - \alpha_0 \\ \delta &= \left(E[y_{it} | i = T, t = 1] - E[y_{it} | i = T, t = 0] \right) - \\ &\hspace{.5cm} \left(E[y_{it} | i = C, t = 1] - E[y_{it} | i = C, t = 0] \right) \\ &= (\alpha_1 - \alpha_0 + \delta) - (\alpha_1 - \alpha_0) = \delta \end{align*}$$
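The equivalence can be checked numerically. In the saturated 2x2 regression, OLS fits the four cell means exactly, so \(\hat\delta\) equals the double difference of sample means. A minimal pure-Python sketch: a small Gauss-Jordan solver stands in for a regression library, and the data-generating coefficients and noise are hypothetical.

```python
import random
from statistics import mean

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                for c in range(col, n + 1):
                    M[r][c] -= f * M[col][c]
    return [M[i][n] / M[i][i] for i in range(n)]

def ols(X, y):
    """OLS coefficients from the normal equations X'X b = X'y."""
    k = len(X[0])
    XtX = [[sum(x[a] * x[c] for x in X) for c in range(k)] for a in range(k)]
    Xty = [sum(x[a] * yi for x, yi in zip(X, y)) for a in range(k)]
    return solve(XtX, Xty)

random.seed(1)
X, y, cells = [], [], {}
for treat in (0, 1):
    for post in (0, 1):
        for _ in range(25):
            # Hypothetical DGP: alpha = 1, beta_1 = 2, beta_2 = 0.5, delta = 1.5, N(0, 1) noise
            yi = 1 + 2 * treat + 0.5 * post + 1.5 * treat * post + random.gauss(0, 1)
            X.append([1, treat, post, treat * post])   # constant, TREAT, POST, TREAT*POST
            y.append(yi)
            cells.setdefault((treat, post), []).append(yi)

alpha, beta1, beta2, delta = ols(X, y)
double_diff = (mean(cells[1, 1]) - mean(cells[1, 0])) - (mean(cells[0, 1]) - mean(cells[0, 0]))
print(delta, double_diff)
```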

8 / 29

Regression DiD - The Workhorse Model

  • Advantage of regression DiD: it provides both an estimate of \(\delta\) and a standard error for that estimate.

  • \(\color{blue}{\text{Angrist & Pischke (2008)}}\):

    • "It's also easy to add additional (units) or periods to the regression setup... [and] it's easy to add additional covariates."
  • Two-way fixed effects estimator: $$y_{it} = \alpha_i + \alpha_t + \delta^{DD} D_{it} + \epsilon_{it}$$

    • \(\alpha_i\) and \(\alpha_t\) are unit and time fixed effects, \(D_{it}\) is the unit-time indicator for treatment.

    • \(TREAT_i\) and \(POST_t\) are now subsumed by the fixed effects.

    • Can be easily modified to include a covariate matrix \(X_{it}\), time trends, dynamic treatment effects estimation, etc.
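The TWFE coefficient can be computed without building all the dummies: by the Frisch-Waugh-Lovell theorem, on a balanced panel \(\hat\delta^{DD}\) is the slope of \(y\) on \(D\) after two-way demeaning \(D\). A minimal sketch; the adoption periods and fixed effects in the usage example are hypothetical. With a homogeneous, constant effect (here 2.0) and no noise, the estimate recovers the effect exactly even with staggered timing, which is a useful baseline for what follows.

```python
import random

def twfe(y, D):
    """TWFE DiD coefficient on a balanced panel via Frisch-Waugh-Lovell.

    y and D are lists of lists indexed [unit][period]. D is residualized on
    unit and period fixed effects by two-way demeaning; delta is the slope
    of y on that residual.
    """
    N, T = len(D), len(D[0])
    d_i = [sum(row) / T for row in D]                          # unit means
    d_t = [sum(D[i][t] for i in range(N)) / N for t in range(T)]  # period means
    d_bar = sum(d_i) / N                                       # grand mean
    num = den = 0.0
    for i in range(N):
        for t in range(T):
            d_tilde = D[i][t] - d_i[i] - d_t[t] + d_bar
            num += d_tilde * y[i][t]
            den += d_tilde * d_tilde
    return num / den

random.seed(0)
T = 10
adopt = [3, 3, 6, 6, None, None]   # hypothetical adoption periods (None = never treated)
D = [[1 if a is not None and t >= a else 0 for t in range(T)] for a in adopt]
unit_fe = [random.gauss(0, 1) for _ in adopt]
time_fe = [random.gauss(0, 1) for _ in range(T)]
y = [[unit_fe[i] + time_fe[t] + 2.0 * D[i][t] for t in range(T)] for i in range(len(adopt))]
print(twfe(y, D))   # 2.0 up to floating-point error
```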

9 / 29

Where It Goes Wrong

  • There is now a developed literature on the issues with TWFE DiD under "staggered treatment timing" (Abraham and Sun (2018), Borusyak and Jaravel (2018), Callaway and Sant'Anna (2019), Goodman-Bacon (2019), Strezhnev (2018), Athey and Imbens (2018))

    • Different units receive treatment at different periods in time.
  • Probably the most common use of DiD today. If done right, it can increase the amount of cross-sectional variation.

  • Without digging into the literature:

    • \(\delta^{DD}\) with staggered treatment timing is a weighted average of many different treatment effects.

    • We know little about what it measures when treatment timing varies, how it compares means across groups, or why different specifications change estimates.

    • The weights are often negative and non-intuitive.

10 / 29

Bias with TWFE - Goodman-Bacon (2019)

  • \(\color{blue}{\text{Goodman-Bacon (2019)}}\) provides a clear graphical intuition for the bias. Assume three groups: never treated units (U), early treated units (k), and later treated units (l).

11 / 29

Bias with TWFE - Goodman-Bacon (2019)

  • \(\color{blue}{\text{Goodman-Bacon (2019)}}\) shows that we can form four different 2x2 groups in this setting, where the effect can be estimated using the simple regression DiD in each group:

12 / 29

Bias with TWFE - Goodman-Bacon (2019)

  • Important Insights

    • \(\delta^{DD}\) is just the weighted average of the four 2x2 treatment effects. The weights are a function of the size of each subsample, the relative size of treatment and control units, and the timing of treatment within the subsample.

    • Already-treated units act as controls even though they are treated.

    • Given the weighting function, panel length alone can change the DiD estimate substantially, even when none of the underlying 2x2 estimates change.

    • Groups treated closer to the middle of the panel receive higher weights than those treated earlier or later.
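The decomposition can be verified numerically. Below is a minimal sketch for a balanced panel, assuming every treated cohort adopts after the first period. It uses the Goodman-Bacon (2019) weights, algebraically simplified to depend only on cohort sample shares \(n\) and the fraction of periods each cohort is treated \(\bar D\), and normalized to sum to one. The cohorts, effect sizes and noise in the usage example are hypothetical. The check is that the TWFE coefficient equals the weighted sum of the 2x2 estimates.

```python
import random
from itertools import combinations
from statistics import mean

def twfe(y, D):
    """TWFE coefficient on a balanced panel: slope of y on two-way-demeaned D."""
    N, T = len(D), len(D[0])
    d_i = [sum(r) / T for r in D]
    d_t = [sum(r[t] for r in D) / N for t in range(T)]
    d_bar = sum(d_i) / N
    num = den = 0.0
    for i in range(N):
        for t in range(T):
            d = D[i][t] - d_i[i] - d_t[t] + d_bar
            num += d * y[i][t]
            den += d * d
    return num / den

def bacon_decomposition(y, adopt, T):
    """Return (label, weight, 2x2 estimate) for every comparison; adopt[i] in 1..T-1 or None."""
    groups = {}
    for i, a in enumerate(adopt):
        groups.setdefault(a, []).append(i)
    timing = sorted(a for a in groups if a is not None)
    n = {a: len(ids) / len(adopt) for a, ids in groups.items()}   # sample shares
    dbar = {a: (T - a) / T for a in timing}                        # share of periods treated

    def gmean(a, periods):
        return mean(y[i][t] for i in groups[a] for t in periods)

    def dd(treat, ctrl, pre, post):
        return (gmean(treat, post) - gmean(treat, pre)) - (gmean(ctrl, post) - gmean(ctrl, pre))

    parts = []
    if None in groups:   # treated vs never treated, whole panel
        for k in timing:
            w = n[k] * n[None] * dbar[k] * (1 - dbar[k])
            parts.append((f"{k} vs never", w, dd(k, None, range(k), range(k, T))))
    for k, l in combinations(timing, 2):   # k adopts before l
        w_early = n[k] * n[l] * (dbar[k] - dbar[l]) * (1 - dbar[k])
        w_late = n[k] * n[l] * dbar[l] * (dbar[k] - dbar[l])
        parts.append((f"{k} vs {l} (not yet treated)", w_early, dd(k, l, range(k), range(k, l))))
        parts.append((f"{l} vs {k} (already treated)", w_late, dd(l, k, range(k, l), range(l, T))))
    total = sum(w for _, w, _ in parts)
    return [(label, w / total, est) for label, w, est in parts]

random.seed(0)
T = 12
adopt = [4] * 5 + [8] * 5 + [None] * 5          # hypothetical cohorts
D = [[1 if a is not None and t >= a else 0 for t in range(T)] for a in adopt]
y = [[random.gauss(0, 1) + (0.5 * (t - a + 1) if D[i][t] else 0.0) for t in range(T)]
     for i, a in enumerate(adopt)]
parts = bacon_decomposition(y, adopt, T)
for label, w, est in parts:
    print(f"{label:28s} weight={w:.3f} estimate={est:.3f}")
print(twfe(y, D), sum(w * b for _, w, b in parts))   # the two numbers agree
```

Printing the parts makes the "already treated" comparisons, and how much weight they receive, explicit.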

13 / 29

Simulation Exercise

  • A simulation exercise shows how easily \(\delta^{DD}\) can go awry.

  • Assume we're modeling an outcome variable \(y_{it}\) on a balanced panel with \(T = 36\) years (1980 to 2015) and 1000 firms \(i\).

  • Time-invariant unit effects and time-varying year effects are drawn from \(N\left(0, \left(\tfrac{1}{2}\right)^2\right)\).

  • Each firm is incorporated in one of 50 randomly drawn states, and states are randomly assigned to three treatment groups \(G_g \in \{1989, 1998, 2007\}\).

14 / 29

Simulation Exercise


  • Model the treatment effect process in three ways

    • Only one treatment period (1998) and one treated group, with constant additive treatment effects.

    • Allow for staggered treatment timing but with constant additive effects. Simulated treatment effects \(\tau\) are all positive in expectation but smaller for later cohorts (\(\tau_{G1989} = 5, \tau_{G1998} = 3, \tau_{G2007} = 1\)).

    • Allow for both staggered treatment timing and change-in-trend "dynamic" treatment effects. Instead of a constant \(\tau\) for each group, \(\tau_i\) is the yearly increase in the outcome variable, which compounds over time. Here \(\tau_{i, G1989} = 0.5, \tau_{i, G1998} = 0.3,\) and \(\tau_{i, G2007} = 0.1\).
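The two staggered treatment-effect processes can be written as one small function. This is a sketch under stated assumptions. The per-cohort parameters come from the bullets above. The dynamic case assumes the yearly increase accumulates additively, so the effect \(k\) years after adoption is \((k + 1)\,\tau_g\). The slide does not give the effect size for the single-cohort simulation, so that case is omitted.

```python
TAU_CONSTANT = {1989: 5.0, 1998: 3.0, 2007: 1.0}   # staggered, constant effects
TAU_DYNAMIC = {1989: 0.5, 1998: 0.3, 2007: 0.1}    # staggered, yearly increase

def treatment_effect(cohort, year, dynamic):
    """Additive treatment effect for a firm in `cohort` observed in `year`.

    Constant case: a fixed shift of tau_g from the adoption year onward.
    Dynamic case (assumed linear accumulation): (year - cohort + 1) * tau_g.
    """
    if year < cohort:
        return 0.0
    if dynamic:
        return TAU_DYNAMIC[cohort] * (year - cohort + 1)
    return TAU_CONSTANT[cohort]

print(treatment_effect(1998, 2000, dynamic=False), treatment_effect(1989, 1995, dynamic=True))
```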

15 / 29

Simulation Exercise

16 / 29

Simulation Exercise


  • With the simulated data we estimate TWFE DiD using MLE on:

$$y_{it} = \alpha_i + \alpha_t + \delta^{DD}D_{it} + \epsilon_{it}$$


  • Simple regression model with unit and time fixed effects.


  • For each of the three simulated datasets, we run a Monte Carlo simulation in which we regenerate the data 1,000 times and plot the distribution of \(\widehat{\delta^{DD}}\).


  • Bias is the deviation of \(\widehat{\delta^{DD}}\) from the true underlying treatment effect.
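The Monte Carlo loop for the staggered, dynamic-effects case can be sketched as follows. Several assumptions are not on the slides: an idiosyncratic error \(\epsilon_{it} \sim N(0, 0.5^2)\); dynamic effects that accumulate additively, \((t - g + 1)\,\tau_g\); and, for speed in pure Python, defaults of 200 firms and 20 replications instead of 1,000 of each. The coefficient is computed by OLS, which coincides with the Gaussian MLE for this linear model. The sketch reports the average TWFE estimate next to the average true effect on treated observations, so the bias can be read off directly.

```python
import random
from statistics import mean

YEARS = list(range(1980, 2016))                  # T = 36 years
COHORTS = [1989, 1998, 2007]
TAU = {1989: 0.5, 1998: 0.3, 2007: 0.1}          # yearly increase per cohort

def twfe(y, D):
    """TWFE coefficient on a balanced panel: slope of y on two-way-demeaned D."""
    N, T = len(D), len(D[0])
    d_i = [sum(r) / T for r in D]
    d_t = [sum(r[t] for r in D) / N for t in range(T)]
    d_bar = sum(d_i) / N
    num = den = 0.0
    for i in range(N):
        for t in range(T):
            d = D[i][t] - d_i[i] - d_t[t] + d_bar
            num += d * y[i][t]
            den += d * d
    return num / den

def simulate_once(n_firms, rng):
    """One simulated panel; returns (TWFE estimate, mean true effect on treated obs)."""
    state_cohort = [rng.choice(COHORTS) for _ in range(50)]           # 50 states -> cohorts
    firm_cohort = [state_cohort[rng.randrange(50)] for _ in range(n_firms)]
    unit_fe = [rng.gauss(0, 0.5) for _ in range(n_firms)]
    year_fe = [rng.gauss(0, 0.5) for _ in YEARS]
    D, y, effects = [], [], []
    for i, g in enumerate(firm_cohort):
        d_row, y_row = [], []
        for t, year in enumerate(YEARS):
            treated = year >= g
            tau = TAU[g] * (year - g + 1) if treated else 0.0        # assumed linear accumulation
            if treated:
                effects.append(tau)
            d_row.append(int(treated))
            y_row.append(unit_fe[i] + year_fe[t] + tau + rng.gauss(0, 0.5))  # assumed error term
        D.append(d_row)
        y.append(y_row)
    return twfe(y, D), mean(effects)

def monte_carlo(reps=20, n_firms=200, seed=0):
    """Average TWFE estimate and average true effect across `reps` simulated panels."""
    rng = random.Random(seed)
    draws = [simulate_once(n_firms, rng) for _ in range(reps)]
    return mean(d for d, _ in draws), mean(a for _, a in draws)

est, att = monte_carlo()
print(f"mean TWFE estimate: {est:.3f}, mean true effect on treated: {att:.3f}")
```

In this design every firm is eventually treated, so all identifying variation comes from timing comparisons, including already-treated firms serving as controls.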
17 / 29

Simulation Exercise

18 / 29

Goodman-Bacon Decomposition for Simulation 3


19 / 29

Callaway & Sant'Anna

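  • Estimates cohort-specific average treatment effects using an inverse-propensity-weighted long difference between units first treated in period \(g\) and untreated comparison units:

$$ATT(g, t) = \mathbb{E} \left[\left( \frac{G_g}{\mathbb{E}[G_g]} - \frac{\frac{p_g(X)C}{1 - p_g(X)}}{\mathbb{E}\left[\frac{p_g(X)C}{1 - p_g(X)} \right]} \right) \left(Y_t - Y_{g - 1}\right)\right]$$

where \(G_g\) indicates units first treated in period \(g\), \(C\) indicates comparison (untreated) units, and \(p_g(X)\) is the propensity score for belonging to cohort \(g\) given covariates \(X\).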
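A minimal sketch of the sample analog of this expression. It assumes the propensity scores have already been estimated; the first-stage model of cohort membership on \(X\) is not shown. The toy data are hypothetical. With a constant propensity score, the comparison weights collapse to \(C / \mathbb{E}[C]\), so \(ATT(g, t)\) reduces to the difference in mean long differences between cohort \(g\) and the comparison group, which the usage example checks (treated mean 4.0, comparison mean 1.5).

```python
from statistics import mean

def att_gt_ipw(G, C, p, dy):
    """Sample analog of the IPW ATT(g, t) formula.

    G[i]  = 1 if unit i is first treated in period g, else 0
    C[i]  = 1 if unit i is a comparison (untreated) unit, else 0
    p[i]  = estimated propensity score p_g(X_i), strictly between 0 and 1
    dy[i] = long difference Y_{i,t} - Y_{i,g-1}
    """
    odds = [pi * ci / (1 - pi) for pi, ci in zip(p, C)]   # reweights comparison units
    mean_G, mean_odds = mean(G), mean(odds)
    return mean((gi / mean_G - oi / mean_odds) * di for gi, oi, di in zip(G, odds, dy))

# Hypothetical toy data: two cohort-g units, two comparison units,
# and one unit from another cohort that receives zero weight.
G = [1, 1, 0, 0, 0]
C = [0, 0, 1, 1, 0]
p = [0.4] * 5
dy = [3.0, 5.0, 1.0, 2.0, 9.0]
print(att_gt_ipw(G, C, p, dy))
```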

20 / 29

Abraham and Sun

  • A relatively straightforward extension of the standard event-study TWFE model:

    $$y_{it} = \alpha_i + \alpha_t + \sum_e \sum_{l \neq -1} \delta_{el}(1\{E_i = e\} \cdot D_{it}^l) + \epsilon_{it}$$

  • Interact the relative-time indicators \(D_{it}^l\) (e.g., \(l = -2, -1, \ldots\)) with indicators for the treatment-initiation cohort \(E_i\), then aggregate the cohort-specific estimates into overall relative-time estimates, weighting by cohort size.

  • In the case of no covariates, this gives you the same estimate as Callaway & Sant'Anna if you fully saturate the model with time indicators (leaving only two relative year identifiers missing).

  • The authors don't claim that it can be used with covariates, but this seems to follow if we are comfortable adding covariates to standard TWFE DiD.
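The final aggregation step can be sketched as a share-weighted average. This assumes the cohort-specific coefficients \(\delta_{el}\) have already been estimated. At each relative period \(l\), a cohort's weight is its share of units among the cohorts for which \(l\) is observed. The cohort years, sizes and estimates below are hypothetical.

```python
def aggregate_by_cohort_share(delta, cohort_size):
    """Combine cohort-specific estimates delta[e][l] into one estimate per relative period l.

    Each cohort's weight at period l is its share of units among the cohorts
    for which relative period l is observed.
    """
    rel_periods = sorted({l for est in delta.values() for l in est})
    out = {}
    for l in rel_periods:
        cohorts = [e for e in delta if l in delta[e]]
        total = sum(cohort_size[e] for e in cohorts)
        out[l] = sum(cohort_size[e] / total * delta[e][l] for e in cohorts)
    return out

# Hypothetical: cohort 2005 is observed 0 and 1 years after treatment, cohort 2010 only at 0.
delta = {2005: {0: 1.0, 1: 2.0}, 2010: {0: 2.0}}
sizes = {2005: 100, 2010: 300}
print(aggregate_by_cohort_share(delta, sizes))
```

At \(l = 0\) the result is \(0.25 \cdot 1.0 + 0.75 \cdot 2.0\). At \(l = 1\) only cohort 2005 contributes.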

21 / 29

Stacked Regression

  • Similar to the standard TWFE DiD, but we ensure that no previously treated units enter as controls by trimming the sample.

  • For each treatment cohort \(G_g\), take all units treated in year \(g\) and all units that are not treated by year \(g + k\), where \(g\) is the treatment year and \(k\) is the outermost relative year you want to test (e.g., for an event-study plot from -5 to 5, \(k\) would equal 5).

  • Keep only observations within years \(g - k\) and \(g + k\) for each cohort-specific dataset, and then stack them in relative time.

  • Run the same TWFE regression as in standard DiD, but interact the cohort-specific dataset identifier with all of the fixed effects, controls, and clusters.
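The sample-construction steps can be sketched as follows. The unit names, years and \(k\) are hypothetical, and the regression itself (with stack-interacted fixed effects) is not shown. Each row carries its stack identifier (the cohort year \(g\)), so that identifier can be interacted with the fixed effects.

```python
def build_stacks(adopt, years, k):
    """Build stacked cohort-specific datasets.

    adopt: dict unit -> first treatment year (None if never treated).
    Returns rows (stack, unit, year, rel_time, treated): for each cohort g, the
    units treated in g plus clean controls not treated by g + k, kept in [g - k, g + k].
    """
    rows = []
    for g in sorted({a for a in adopt.values() if a is not None}):
        treated = [u for u, a in adopt.items() if a == g]
        clean = [u for u, a in adopt.items() if a is None or a > g + k]
        for u in treated + clean:
            for year in years:
                if g - k <= year <= g + k:
                    rows.append((g, u, year, year - g, int(u in treated)))
    return rows

# Hypothetical panel: "c" is never treated.
adopt = {"a": 2000, "b": 2003, "c": None, "d": 2010}
rows = build_stacks(adopt, range(1995, 2016), k=2)
print(len(rows))
```

Unit "b" serves as a control in the 2000 stack (treated in 2003, after 2000 + 2) but unit "a" is excluded from the 2003 stack, so no previously treated unit enters as a control.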

22 / 29

Simulations - Remedies

23 / 29

Application - Medical Marijuana Laws and Opioid Overdose Deaths

  • \(\color{blue}{\text{Bachhuber et al. 2014}}\) found, using a staggered DiD, that states with medical cannabis laws experienced a slower increase in opioid overdose mortality from 1999-2010.

  • \(\color{blue}{\text{Shover et al. 2020}}\) extend the sample from 1999-2010 to 1999-2017, a period during which 32 additional states passed medical marijuana laws (MMLs).

  • Not only do the results go away, but the sign flips: MMLs are associated with higher opioid overdose mortality rates.

  • The authors don't call it difference-in-differences, but the model is TWFE with a binary treatment indicator (and is thus effectively DiD).

24 / 29

Replication of MML

25 / 29

Event Study Estimates

  • There is little evidence that covariates matter here, so we estimate a standard event-study DiD with no controls, separately for the two sample periods (1999-2010 and 1999-2017):

    $$y_{it} = \alpha_i + \alpha_t + \sum_{k = k_*}^{k^*} \delta_k D_{it}^k + \epsilon_{it}$$

where \(\alpha_i\) and \(\alpha_t\) are state and year fixed effects, respectively, \(D_{it}^k\) indicates that state \(i\) is \(k\) years from treatment in year \(t\), and \(\delta_k\) are the coefficients on these lead/lag indicators.
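A minimal sketch of how the lead/lag indicators \(D_{it}^k\) might be built for one state-year. Two choices here are assumptions not stated on the slide: \(k = -1\) is the omitted reference year, and years outside \([k_*, k^*]\) are left with all-zero indicators rather than binned. Never-treated states (adoption year `None`) also get all zeros.

```python
def event_time_dummies(adopt_year, year, k_min, k_max, omit=-1):
    """Relative-time indicators D_it^k for k in [k_min, k_max], omitting k = omit."""
    rel = None if adopt_year is None else year - adopt_year
    return {k: int(rel == k) for k in range(k_min, k_max + 1) if k != omit}

# Hypothetical state adopting in 2005, observed in 2007 (two years after treatment).
print(event_time_dummies(2005, 2007, -3, 3))
```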

26 / 29

Goodman-Bacon Decomposition

| Type | 1999-2010: Average Estimate | 1999-2010: Number of 2x2 Comparisons | 1999-2010: Total Weight | 1999-2017: Average Estimate | 1999-2017: Number of 2x2 Comparisons | 1999-2017: Total Weight |
|---|---|---|---|---|---|---|
| Earlier vs Later Treated | -0.11 | 21 | 0.04 | -0.16 | 91 | 0.38 |
| Later vs Earlier Treated | 0.09 | 28 | 0.16 | 0.32 | 105 | 0.42 |
| Treated vs Untreated | -0.25 | 7 | 0.79 | 0.44 | 14 | 0.20 |
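Because \(\delta^{DD}\) is the weighted average of the 2x2 estimates, the overall TWFE estimate implied by each sample can be recovered from the table by summing weight times average estimate. A minimal sketch using the rounded table values (the 1999-2010 weights sum to 0.99, so the implied values are approximate). It gives roughly -0.19 for 1999-2010 and +0.16 for 1999-2017, the sign flip described for the extended sample.

```python
# (average estimate, total weight) by comparison type, copied from the table
table = {
    "1999-2010": [(-0.11, 0.04), (0.09, 0.16), (-0.25, 0.79)],
    "1999-2017": [(-0.16, 0.38), (0.32, 0.42), (0.44, 0.20)],
}

def implied_twfe(rows):
    """Overall TWFE estimate implied by a Goodman-Bacon table: sum of weight * average estimate."""
    return sum(est * w for est, w in rows)

for period, rows in table.items():
    print(period, round(implied_twfe(rows), 4))
```

The decomposition also shows why the sign flips: in 1999-2010 the clean treated-vs-untreated comparisons carry 0.79 of the weight, while in 1999-2017 timing comparisons carry 0.80.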
27 / 29

Remedies

28 / 29

Takeaways

  • DiDs are a powerful tool and we are going to keep using them.

  • But we should make sure we understand what we're doing! DiD is a comparison of means and at a minimum we should know which means we're comparing.

  • Multiple new methods have been proposed, all of which ensure that you aren't using previously treated units as controls.

  • You should probably tailor your choice of method to your data structure: the methods use and discard different amounts of control units, and depending on your setting this might matter.

  • It's unclear what's going on with MMLs and opioid mortality rates, but it is very unlikely that the results in the first published paper are robust.

29 / 29
