Overview of DiD
Problems with Staggered DiD
Simulation Results
Some Alternative Methods
Application
Think of the Card and Krueger minimum wage study comparing NJ and PA.
2 units and 2 time periods.
1 unit (T) is treated, and receives treatment in the second period. The control unit (C) is never treated.
Building upon \(\color{blue}{\text{Angrist & Pischke (2008, p. 228)}}\) we can think of these simple 2x2 DiDs as a fixed effects estimator.
Potential Outcomes
The expected outcome is a linear function of unit and time fixed effects: $$E[{Y_{i, t}^0}] =\alpha_i + \alpha_t$$ $$E[{Y_{i, t}^1}] =\alpha_i + \alpha_t + \delta D_{it}$$
Difference in expectations for the control unit between periods t = 1 and t = 0: $$\begin{align*} E[Y_{C, 1}^0] & = \alpha_1 + \alpha_C \\ E[Y_{C, 0}^0] & = \alpha_0 + \alpha_C \\ E[Y_{C, 1}^0] - E[Y_{C, 0}^0] & = \alpha_1 - \alpha_0 \end{align*}$$
Now do the same thing for the treated unit: $$\begin{align*} E[Y_{T, 1}^1] & = \alpha_1 + \alpha_T + \delta \\ E[Y_{T, 0}^1] & = \alpha_0 + \alpha_T \\ E[Y_{T, 1}^1] - E[Y_{T, 0}^1] & = \alpha_1 - \alpha_0 + \delta \end{align*}$$
$$\delta = \left( E[Y_{T, 1}^1] - E[Y_{T, 0}^1] \right) - \left( E[Y_{C, 1}^0] - E[Y_{C, 0}^0] \right)$$
The DiD can be estimated through linear regression of the form:
$$\tag{1} y_{it} = \alpha + \beta_1 TREAT_i + \beta_2 POST_t + \delta (TREAT_i \cdot POST_t) + \epsilon_{it}$$
The coefficients from the regression in (1) recover the same parameters as the double-differencing performed above: $$\begin{align*} \alpha &= E[y_{it} | i = C, t = 0] = \alpha_0 + \alpha_C \\ \beta_1 &= E[y_{it} | i = T, t = 0] - E[y_{it} | i = C, t= 0] \\ &= (\alpha_0 + \alpha_T) - (\alpha_0 + \alpha_C) = \alpha_T - \alpha_C \\ \beta_2 &= E[y_{it} | i = C, t = 1] - E[y_{it} | i = C, t = 0] \\ &= (\alpha_1 + \alpha_C) - (\alpha_0 + \alpha_C) = \alpha_1 - \alpha_0 \\ \delta &= \left(E[y_{it} | i = T, t = 1] - E[y_{it} | i = T, t = 0] \right) - \\ &\hspace{.5cm} \left(E[y_{it} | i = C, t = 1] - E[y_{it} | i = C, t = 0] \right) = \delta \end{align*}$$
Advantage of regression DiD: it provides both an estimate of \(\delta\) and a standard error for that estimate.
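To make the equivalence concrete, here is a minimal sketch that simulates a 2x2 design and computes \(\delta\) both ways: by double-differencing the four cell means and by OLS on regression (1). All parameter values (`alpha_C`, `alpha_T`, `alpha_0`, `alpha_1`, `delta`, `n`) are hypothetical and chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parameter values, not taken from the slides.
alpha_C, alpha_T = 1.0, 2.0   # unit effects
alpha_0, alpha_1 = 0.5, 1.5   # time effects
delta = 3.0                   # true treatment effect
n = 500                       # observations per unit-period cell

# Build the four cells: (control, treated) x (pre, post).
cells = []
for treat_flag, a_unit in ((0, alpha_C), (1, alpha_T)):
    for post_flag, a_time in ((0, alpha_0), (1, alpha_1)):
        y_cell = a_unit + a_time + delta * treat_flag * post_flag + rng.normal(0, 1, n)
        cells.append(np.column_stack([y_cell, np.full(n, treat_flag), np.full(n, post_flag)]))
data = np.vstack(cells)
y, treat, post = data[:, 0], data[:, 1], data[:, 2]

def cell_mean(tr, po):
    """Sample mean of y in the (treat, post) cell."""
    return y[(treat == tr) & (post == po)].mean()

# Double difference of the four cell means.
did_means = (cell_mean(1, 1) - cell_mean(1, 0)) - (cell_mean(0, 1) - cell_mean(0, 0))

# OLS on regression (1): constant, TREAT, POST, TREAT x POST.
X = np.column_stack([np.ones_like(y), treat, post, treat * post])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
did_ols = beta[3]

print(did_means, did_ols)
```

Because regression (1) is saturated in the four cells, the interaction coefficient matches the double difference up to floating-point error. Standard errors are not computed here.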
\(\color{blue}{\text{Angrist & Pischke (2008)}}\):
Two-way fixed effects estimator: $$y_{it} = \alpha_i + \alpha_t + \delta^{DD} D_{it} + \epsilon_{it}$$
\(\alpha_i\) and \(\alpha_t\) are unit and time fixed effects, \(D_{it}\) is the unit-time indicator for treatment.
\(TREAT_i\) and \(POST_t\) now subsumed by the fixed effects.
Can be easily modified to include a covariate matrix \(X_{it}\), time trends, dynamic treatment effect estimation, etc.
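As a minimal sketch of the TWFE estimator with no covariates: on a balanced panel, \(\delta^{DD}\) can be obtained by two-way demeaning \(y_{it}\) and \(D_{it}\) and regressing one on the other. The function name `twfe_did` and the simulated panel (200 units, 10 periods, one treatment date) are assumptions for illustration only.

```python
import numpy as np

def twfe_did(y, d):
    """TWFE estimate of delta^DD on a balanced panel.

    y, d: arrays of shape (units, periods). Uses the two-way within
    transformation, which is exact only when the panel is balanced.
    """
    def within(m):
        return m - m.mean(axis=1, keepdims=True) - m.mean(axis=0, keepdims=True) + m.mean()
    y_w, d_w = within(y), within(d)
    return float((d_w * y_w).sum() / (d_w ** 2).sum())

# Hypothetical panel: half the units are treated from period 5 onward.
rng = np.random.default_rng(1)
n_units, n_periods, delta = 200, 10, 2.0
treated = np.arange(n_units) < n_units // 2
post = np.arange(n_periods) >= 5
d = (treated[:, None] & post[None, :]).astype(float)

y = (rng.normal(size=(n_units, 1))            # unit fixed effects
     + rng.normal(size=(1, n_periods))        # time fixed effects
     + delta * d                              # constant treatment effect
     + rng.normal(size=(n_units, n_periods))) # idiosyncratic noise

delta_dd = twfe_did(y, d)
print(delta_dd)
```

With a single treatment date and a constant effect, the estimate should land close to the true `delta`; the unit and time effects drop out in the within transformation.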
There is now a well-developed literature on the problems with TWFE DiD under "staggered treatment timing" (Abraham and Sun (2018), Borusyak and Jaravel (2018), Callaway and Sant'Anna (2019), Goodman-Bacon (2019), Strezhnev (2018), Athey and Imbens (2018))
Staggered adoption is probably the most common use of DiD today. Done right, it can increase the amount of cross-sectional variation.
Without digging into the literature:
\(\delta^{DD}\) with staggered treatment timing is a weighted average of many different treatment effects.
We know little about what it measures when treatment timing varies, how it compares means across groups, or why different specifications change estimates.
The weights are often negative and non-intuitive.
Important Insights
\(\delta^{DD}\) is just the weighted average of the four 2x2 treatment effects. The weights are a function of the size of the subsample, the relative size of the treatment and control groups, and the timing of treatment in the subsample.
Already-treated units act as controls even though they are treated.
Given the weighting function, panel length alone can change the DiD estimates substantially, even when each 2x2 \(\delta^{DD}\) does not change.
Groups treated closer to the middle of the panel receive higher weights than those treated earlier or later.
We can show how easily \(\delta^{DD}\) goes awry through a simulation exercise.
Assume we're modeling outcome variable \(y_{it}\) on balanced panel with \(T = 36\) years from 1980 to 2015 with 1000 firms \(i\).
Time-invariant unit effects and time-varying year effects are drawn from \(N \left(0, \left(\frac{1}{2}\right)^2\right)\).
Firms are incorporated in one of 50 randomly drawn states, and states are randomly assigned into three treatment groups \(G_g \in \{1989, 1998, 2007\}\).
Model the treatment effect process in three ways
Only one treatment period (1998) and one treated group, with constant additive treatment effects.
Allow for staggered treatment timing but with constant additive effects. Simulated treatment effects \(\tau\) are all positive in expectation but smaller for later-treated groups (\(\tau_{G1989} = 5, \tau_{G1998} = 3, \tau_{G2007} = 1\)).
Allow for both staggered treatment timing and change-in-trend "dynamic" treatment effects. Instead of a constant \(\tau\) for each group, \(\tau_i\) is the yearly increase in outcome variable that compounds over time. Here \(\tau_{i, G1989} = 0.5, \tau_{i, G1998} = 0.3,\) and \(\tau_{i, G2007} = 0.1\).
$$y_{it} = \alpha_i + \alpha_t + \delta^{DD}D_{it} + \epsilon_{it}$$
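A minimal sketch of the third design (staggered timing with dynamic effects), estimated with the TWFE model above. Several details are assumptions not stated in the slides: the error term is drawn from \(N(0, 0.5^2)\), states are assigned to cohorts by independent uniform draws, and the effect in the first treated year equals one step of \(\tau_i\). The helper `twfe_did` is an illustrative name.

```python
import numpy as np

def twfe_did(y, d):
    """TWFE delta^DD on a balanced panel via two-way demeaning."""
    def within(m):
        return m - m.mean(axis=1, keepdims=True) - m.mean(axis=0, keepdims=True) + m.mean()
    y_w, d_w = within(y), within(d)
    return float((d_w * y_w).sum() / (d_w ** 2).sum())

rng = np.random.default_rng(42)
n_firms, n_states = 1000, 50
years = np.arange(1980, 2016)                 # 36 years
cohorts = np.array([1989, 1998, 2007])        # treatment years
tau = np.array([0.5, 0.3, 0.1])               # yearly increase per cohort

# Assumption: states get a cohort, and firms get a state, by uniform draws.
state_group = rng.integers(0, 3, size=n_states)
firm_state = rng.integers(0, n_states, size=n_firms)
group = state_group[firm_state]
g = cohorts[group][:, None]                   # firm-level treatment year
tau_i = tau[group][:, None]

alpha_i = rng.normal(0, 0.5, size=(n_firms, 1))
alpha_t = rng.normal(0, 0.5, size=(1, years.size))
eps = rng.normal(0, 0.5, size=(n_firms, years.size))   # assumed noise scale

d = (years[None, :] >= g).astype(float)       # treatment indicator D_it
effect = d * tau_i * (years[None, :] - g + 1) # effect grows each treated year
y = alpha_i + alpha_t + effect + eps

twfe_est = twfe_did(y, d)
true_att = effect[d == 1].mean()              # average effect over treated obs
print(twfe_est, true_att)
```

Under these assumptions the TWFE estimate falls well below the true average effect on treated observations, because already-treated firms with growing effects serve as controls for later-treated firms.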
$$\begin{equation} ATT(g, t) = \mathbb{E} \left[\left( \frac{G_g}{\mathbb{E}[G_g]} - \frac{\frac{p_g(X)C}{1 - p_g(X)}}{\mathbb{E}\left[\frac{p_g(X)C}{1 - p_g(X)} \right]} \right) \left(Y_t - Y_{g - 1}\right)\right] \end{equation}$$
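With no covariates, \(p_g(X)\) is constant and the expression reduces to a difference in mean outcome changes between cohort \(g\) and the comparison group \(C\). The sketch below implements only that special case, assuming \(C\) is the never-treated group (coded here as `np.inf`); the function name `att_gt` and the toy panel are hypothetical.

```python
import numpy as np

def att_gt(y, years, first_treat, g, t):
    """Unconditional ATT(g, t) using never-treated units as controls.

    y: (units, periods) outcomes; years: calendar year of each column;
    first_treat: treatment year per unit, np.inf if never treated.
    """
    col = {int(yr): j for j, yr in enumerate(years)}
    change = y[:, col[t]] - y[:, col[g - 1]]   # Y_t - Y_{g-1}
    treated = first_treat == g                 # G_g = 1
    control = np.isinf(first_treat)            # C = 1
    return float(change[treated].mean() - change[control].mean())

# Toy noise-free panel: two units treated in 2002, two never treated.
years = np.arange(2000, 2004)
first_treat = np.array([2002.0, 2002.0, np.inf, np.inf])
unit_fe = np.array([1.0, 2.0, 3.0, 4.0])[:, None]
time_fe = np.array([0.0, 0.5, 1.0, 1.5])[None, :]
y = unit_fe + time_fe + 2.0 * (years[None, :] >= first_treat[:, None])

print(att_gt(y, years, first_treat, g=2002, t=2003))
```

In this toy panel the constant effect of 2 is recovered exactly, since the unit effects difference out and the common time trend is removed by the control group.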
A relatively straightforward extension of the standard event-study TWFE model:
$$y_{it} = \alpha_i + \alpha_t + \sum_e \sum_{l \neq -1} \delta_{el}(1\{E_i = e\} \cdot D_{it}^l) + \epsilon_{it}$$
You saturate the relative time indicators (i.e. t = -2, -1, ...) with indicators for the treatment initiation year group, then aggregate the cohort-specific estimates to overall relative-time effects, weighting by cohort size.
In the case of no covariates, this gives you the same estimate as Callaway & Sant'Anna if you fully saturate the model with time indicators (leaving only two relative year identifiers missing).
The authors don't claim that it can be used with covariates, but it seemingly follows if we think it is okay with normal TWFE DiD.
Similar to the standard TWFE DiD, but we ensure that no previously treated units enter as controls by trimming the sample.
For each treatment cohort \(G_g\), get all treated units, and all units that are not treated by year \(g + k\), where \(g\) is the treatment year and \(k\) is the outermost relative year that you want to test (e.g. if you do an event study plot from -5 to 5, \(k\) would equal 5).
Keep only observations within years \(g - k\) and \(g + k\) for each cohort-specific dataset, and then stack them in relative time.
Run the same TWFE estimates as in standard DiD, but include interactions for the cohort-specific dataset with all of the fixed effects, controls, and clusters.
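The trimming and stacking steps can be sketched as follows. This is one possible reading of the procedure, not the authors' code: the function name `build_stack`, the column layout, and the coding of never-treated units as `np.inf` are all assumptions.

```python
import numpy as np

def build_stack(first_treat, years, k):
    """Stack cohort-specific datasets in relative time.

    Returns an integer array with columns:
    cohort, unit, year, rel_time, treat, d (treat x post).
    """
    rows = []
    finite = first_treat[np.isfinite(first_treat)]
    for g in sorted(set(finite.astype(int).tolist())):
        treated = np.flatnonzero(first_treat == g)
        # Clean controls: units not treated by year g + k (includes never treated).
        clean = np.flatnonzero(first_treat > g + k)
        window = [int(t) for t in years if g - k <= t <= g + k]
        for unit in np.concatenate([treated, clean]):
            is_treated = int(first_treat[unit] == g)
            for t in window:
                rows.append((g, int(unit), t, t - g, is_treated, is_treated * int(t >= g)))
    return np.array(rows)

# Toy example: one unit per cohort plus one never-treated unit.
first_treat = np.array([1989.0, 1998.0, 2007.0, np.inf])
years = np.arange(1980, 2016)
stack = build_stack(first_treat, years, k=5)
print(stack.shape)
```

The regression step would then interact the `cohort` column with the unit and year fixed effects. Note that unit 0 (treated in 1989) never appears as a control in the 1998 or 2007 sub-datasets.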
\(\color{blue}{\text{Bachhuber et al. 2014}}\) found, using a staggered DiD, that states with medical cannabis laws (MMLs) experienced a slower increase in opioid overdose mortality from 1999 to 2010.
\(\color{blue}{\text{Shover et al. 2020}}\) extend the end of the data sample from 2010 to 2017, a period during which 32 extra states passed MMLs.
Not only do the results go away, but the sign flips; MMLs are associated with higher opioid overdose mortality rates.
Authors don't call it difference-in-differences, but it uses TWFE with a binary indicator variable (thus is effectively DiD).
Little evidence covariates matter here, so estimate standard DiD with no controls over the two periods:
$$y_{it} = \alpha_i + \alpha_t + \sum_{k = k_*}^{k^*} \delta_k D_{it}^k + \epsilon_{it}$$
where \(\alpha_i\) and \(\alpha_t\) are state and year fixed effects respectively, and \(\delta_k\) are the coefficients on the lead/lag indicators \(D_{it}^k\) for years around treatment.
Type | Average Estimate (1999-2010) | Number of 2x2 Comparisons (1999-2010) | Total Weight (1999-2010) | Average Estimate (1999-2017) | Number of 2x2 Comparisons (1999-2017) | Total Weight (1999-2017) |
---|---|---|---|---|---|---|
Earlier vs Later Treated | -0.11 | 21 | 0.04 | -0.16 | 91 | 0.38 |
Later vs Earlier Treated | 0.09 | 28 | 0.16 | 0.32 | 105 | 0.42 |
Treated vs Untreated | -0.25 | 7 | 0.79 | 0.44 | 14 | 0.20 |
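As a quick arithmetic check on the decomposition table, the overall estimate implied by each sample is the weight-weighted sum of the three comparison types. This sketch assumes each "Average Estimate" is the weighted average of the 2x2 estimates within its type; the figures are rounded, so the result is approximate. The function name `implied_estimate` is illustrative.

```python
# (average estimate, total weight) per comparison type, copied from the table.
sample_1999_2010 = [(-0.11, 0.04), (0.09, 0.16), (-0.25, 0.79)]
sample_1999_2017 = [(-0.16, 0.38), (0.32, 0.42), (0.44, 0.20)]

def implied_estimate(rows):
    """Weighted average of per-type estimates, renormalised for rounding."""
    total_weight = sum(w for _, w in rows)
    return sum(est * w for est, w in rows) / total_weight

est_2010 = implied_estimate(sample_1999_2010)
est_2017 = implied_estimate(sample_1999_2017)
print(round(est_2010, 3), round(est_2017, 3))
```

The sign flip between the two samples comes from two sources visible in the table: the Treated vs Untreated estimate changes sign, and its weight falls from 0.79 to 0.20 while the timing-only comparisons gain weight.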
DiDs are a powerful tool and we are going to keep using them.
But we should make sure we understand what we're doing! DiD is a comparison of means and at a minimum we should know which means we're comparing.
Multiple new methods have been proposed, all of which ensure that you aren't using previously treated units as controls.
You should probably tailor your selection of method to your data structure: the methods use and discard different numbers of control units, and depending on your setting this might matter.
It is unclear what's going on with MMLs and opioid mortality rates, but it is very unlikely that the results in the first published paper are robust.