class: center, middle, inverse, title-slide

# Difference-in-Differences in 2020
## Common Pitfalls and How to Avoid Them
### Andrew Baker
### Stanford University
### 2020-09-21

---
<style type="text/css">
@media print {
  .has-continuation {
    display: block !important;
  }
}
</style>

# .center.pull[Outline of Talk]

`\(\hspace{2cm}\)`

1. Overview of DiD
2. Problems with Staggered DiD
3. Simulation Results
4. Some Alternative Methods
5. Application

---

# .center.pull[Difference-in-Differences]

`\(\hspace{2cm}\)`

- Think of the Card and Krueger minimum wage study comparing NJ and PA.
- 2 units and 2 time periods.
- 1 unit (T) is treated, and receives treatment in the second period. The control unit (C) is never treated.

---

# .center.pull[Difference-in-Differences]

<img src="SoDa_files/figure-html/d1-1.png" width="720" style="display: block; margin: auto;" />

---

# .center.pull[Difference-in-Differences]

- Building upon `\(\color{blue}{\text{Angrist & Pischke (2008, p. 228)}}\)`, we can think of these simple 2x2 DiDs as a fixed effects estimator.
- Potential Outcomes
  - `\(Y_{i, t}^1\)` = value of dependent variable for unit `\(i\)` in period `\(t\)` with treatment.
  - `\(Y_{i, t}^0\)` = value of dependent variable for unit `\(i\)` in period `\(t\)` without treatment.
- The expected outcome is a *linear function* of unit and time fixed effects:

`$$E[{Y_{i, t}^0}] =\alpha_i + \alpha_t$$`

`$$E[{Y_{i, t}^1}] =\alpha_i + \alpha_t + \delta D_{it}$$`

- The goal of DiD is to get an unbiased estimate of the treatment effect `\(\delta\)`.
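A minimal numeric sketch of this additive structure, in Python. The fixed-effect values and `delta` below are hypothetical numbers chosen only for illustration (they do not come from the talk). The point is that differencing the treated unit's change against the control unit's change cancels both sets of fixed effects and leaves `delta`.

```python
# Hypothetical unit effects, time effects, and treatment effect (illustration only).
alpha_unit = {"C": 1.0, "T": 3.0}
alpha_time = {0: 0.5, 1: 2.0}
delta = 1.5


def expected_outcome(unit, period):
    """Expected outcome under the additive structure: unit effect + time effect + delta * D."""
    treated_now = 1 if (unit == "T" and period == 1) else 0
    return alpha_unit[unit] + alpha_time[period] + delta * treated_now


# Change over time for each unit: the unit effect drops out.
change_treated = expected_outcome("T", 1) - expected_outcome("T", 0)
change_control = expected_outcome("C", 1) - expected_outcome("C", 0)

# Difference of the two changes: the common time effect drops out as well.
did = change_treated - change_control
print(f"DiD = {did:.2f}, true delta = {delta:.2f}")
```

Changing `alpha_unit` or `alpha_time` to any other values leaves `did` equal to `delta`, which is the sense in which the estimator relies only on the additive (parallel trends) structure.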
---

# .center.pull[Difference-in-Differences as Solving a System of Equations for an Unknown Variable]

- Difference in expectations for the *control* unit between times t = 1 and t = 0:

`$$\begin{align*} E[Y_{C, 1}^0] & = \alpha_1 + \alpha_C \\ E[Y_{C, 0}^0] & = \alpha_0 + \alpha_C \\ E[Y_{C, 1}^0] - E[Y_{C, 0}^0] & = \alpha_1 - \alpha_0 \end{align*}$$`

- Now do the same thing for the *treated* unit:

`$$\begin{align*} E[Y_{T, 1}^1] & = \alpha_1 + \alpha_T + \delta \\ E[Y_{T, 0}^1] & = \alpha_0 + \alpha_T \\ E[Y_{T, 1}^1] - E[Y_{T, 0}^1] & = \alpha_1 - \alpha_0 + \delta \end{align*}$$`

- If we assume the linear structure of DiD, then an unbiased estimate of `\(\delta\)` is:

`$$\delta = \left( E[Y_{T, 1}^1] - E[Y_{T, 0}^1] \right) - \left( E[Y_{C, 1}^0] - E[Y_{C, 0}^0] \right)$$`

---

# .center.pull[Two-Way Differencing]

<img src="SoDa_files/figure-html/d2-1.gif" style="display: block; margin: auto;" />

---

# .center.pull[Regression DiD]

The DiD can be estimated through linear regression of the form:

`$$\tag{1} y_{it} = \alpha + \beta_1 TREAT_i + \beta_2 POST_t + \delta (TREAT_i \cdot POST_t) + \epsilon_{it}$$`

The coefficients from the regression estimate in (1) recover the same parameters as the double-differencing performed above:

`$$\begin{align*} \alpha &= E[y_{it} | i = C, t = 0] = \alpha_0 + \alpha_C \\ \beta_1 &= E[y_{it} | i = T, t = 0] - E[y_{it} | i = C, t= 0] \\ &= (\alpha_0 + \alpha_T) - (\alpha_0 + \alpha_C) = \alpha_T - \alpha_C \\ \beta_2 &= E[y_{it} | i = C, t = 1] - E[y_{it} | i = C, t = 0] \\ &= (\alpha_1 + \alpha_C) - (\alpha_0 + \alpha_C) = \alpha_1 - \alpha_0 \\ \delta &= \left(E[y_{it} | i = T, t = 1] - E[y_{it} | i = T, t = 0] \right) - \\ &\hspace{.5cm} \left(E[y_{it} | i = C, t = 1] - E[y_{it} | i = C, t = 0] \right) = \delta \end{align*}$$`

---

# .center.pull[Regression DiD - The Workhorse Model]

- Advantage of regression DiD: it provides both estimates of `\(\delta\)` and standard errors for the estimates.
- `\(\color{blue}{\text{Angrist & Pischke (2008)}}\)`:
  - "It's also easy to add additional (units) or periods to the regression setup... [and] it's easy to add additional covariates."
- Two-way fixed effects estimator:

`$$y_{it} = \alpha_i + \alpha_t + \delta^{DD} D_{it} + \epsilon_{it}$$`

- `\(\alpha_i\)` and `\(\alpha_t\)` are unit and time fixed effects, `\(D_{it}\)` is the unit-time indicator for treatment.
- `\(TREAT_i\)` and `\(POST_t\)` are now subsumed by the fixed effects.
- Can be easily modified to include a covariate matrix `\(X_{it}\)`, time trends, dynamic treatment effects estimation, etc.

---

# .center.pull[Where It Goes Wrong]

- There is now a developed literature on the issues with TWFE DiD under "staggered treatment timing" <span style="color:blue">(Abraham and Sun (2018), Borusyak and Jaravel (2018), Callaway and Sant'Anna (2019), Goodman-Bacon (2019), Strezhnev (2018), Athey and Imbens (2018))</span>
- Different units receive treatment at different periods in time.
- Probably the most common use of DiD today. If done right, it can increase the amount of cross-sectional variation.
- Without digging into the literature:
  - `\(\delta^{DD}\)` with staggered treatment timing is a *weighted average of many different treatment effects*.
  - We know little about what it measures when treatment timing varies, how it compares means across groups, or why different specifications change estimates.
  - The weights are often negative and non-intuitive.

---

# .center.pull[Bias with TWFE - Goodman-Bacon (2019)]

- `\(\color{blue}{\text{Goodman-Bacon (2019)}}\)` provides a clear graphical intuition for the bias. Assume three treatment groups: never treated units (U), early treated units (k), and later treated units (l).
<img src="SoDa_files/figure-html/d3-1.png" width="504" style="display: block; margin: auto;" />

---

# .center.pull[Bias with TWFE - Goodman-Bacon (2019)]

- `\(\color{blue}{\text{Goodman-Bacon (2019)}}\)` shows that we can form four different 2x2 groups in this setting, where the effect can be estimated using the simple regression DiD in each group:

<img src="SoDa_files/figure-html/d4-1.png" width="504" style="display: block; margin: auto;" />

---

# .center.pull[Bias with TWFE - Goodman-Bacon (2019)]

- Important Insights
  - `\(\delta^{DD}\)` is just the weighted average of the four 2x2 treatment effects. The weights are a function of the size of the subsample, the relative size of treatment and control units, and the timing of treatment in the subsample.
  - Already-treated units act as controls even though they are treated.
  - Given the weighting function, panel length alone can change the DiD estimates substantially, even when each 2x2 `\(\delta^{DD}\)` does not change.
  - Groups treated closer to the middle of the panel receive higher weights than those treated earlier or later.

---

# .center.pull[Simulation Exercise]

- We can show how easily `\(\delta^{DD}\)` goes awry through a simulation exercise.
- Assume we're modeling outcome variable `\(y_{it}\)` on a balanced panel with `\(T = 36\)` years from 1980 to 2015 with 1000 firms `\(i\)`.
- Time-invariant unit effects and time-varying year effects are drawn from `\(N \left(0, \left(\frac{1}{2}\right)^2\right)\)`.
- Firms are incorporated in one of 50 randomly drawn states, and states are randomly assigned into three treatment groups `\(G_g \in \{1989, 1998, 2007\}\)`.

---

# .center.pull[Simulation Exercise]

`\(\hspace{0.5cm}\)`

- Model the treatment effect process in three ways
  - Only one treatment period (1998) and one treated group, with constant additive treatment effects.
  - Allow for staggered treatment timing but with constant additive effects.
Simulated treatment effects `\(\tau\)` are all positive in expectation but are smaller for later-treated groups (`\(\tau_{G1989} = 5, \tau_{G1998} = 3, \tau_{G2007} = 1\)`).
  - Allow for both staggered treatment timing and change-in-trend "dynamic" treatment effects. Instead of a constant `\(\tau\)` for each group, `\(\tau_i\)` is the yearly increase in the outcome variable, which accumulates over time. Here `\(\tau_{i, G1989} = 0.5, \tau_{i, G1998} = 0.3,\)` and `\(\tau_{i, G2007} = 0.1\)`.

---

# .center.pull[Simulation Exercise]

<img src="SoDa_files/figure-html/d5-1.png" width="864" style="display: block; margin: auto;" />

---

# .center.pull[Simulation Exercise]

`\(\hspace{0.5cm}\)`

- With the simulated data we estimate TWFE DiD using OLS on:

`$$y_{it} = \alpha_i + \alpha_t + \delta^{DD}D_{it} + \epsilon_{it}$$`

`\(\hspace{0.5cm}\)`

- Simple regression model with unit and time fixed effects.

`\(\hspace{0.5cm}\)`

- For each of the three simulated datasets we run a Monte Carlo simulation where we create the datasets 1,000 times and plot the distribution of `\(\widehat{\delta^{DD}}\)`.

`\(\hspace{0.5cm}\)`

- Bias is the deviation from the true underlying treatment effect.

---

# .center.pull[Simulation Exercise]

<img src="fig2.png" width="3333" style="display: block; margin: auto;" />

---

# .center.pull[Goodman-Bacon Decomposition for Simulation 3]

`\(\hspace{0.5cm}\)`

<img src="SoDa_files/figure-html/d7-1.png" width="576" style="display: block; margin: auto;" />

---

# .center.pull[Callaway & Sant'Anna]

- Inverse propensity weighted long-difference in cohort-specific average treatment effects between treated and untreated units for a given treatment cohort.
`$$\begin{equation} ATT(g, t) = \mathbb{E} \left[\left( \frac{G_g}{\mathbb{E}[G_g]} - \frac{\frac{p_g(X)C}{1 - p_g(X)}}{\mathbb{E}\left[\frac{p_g(X)C}{1 - p_g(X)} \right]} \right) \left(Y_t - Y_{g - 1}\right)\right] \end{equation}$$`

---

# .center.pull[Abraham and Sun]

- A relatively straightforward extension of the standard event-study TWFE model:

`$$y_{it} = \alpha_i + \alpha_t + \sum_e \sum_{l \neq -1} \delta_{el}(1\{E_i = e\} \cdot D_{it}^l) + \epsilon_{it}$$`

- You saturate the relative time indicators (i.e. t = -2, -1, ...) with indicators for the treatment initiation year group, and then aggregate the cohort-specific estimates to overall relative-time estimates, weighting by cohort size.
- In the case of no covariates, this gives you the same estimate as Callaway & Sant'Anna if you *fully saturate* the model with time indicators (leaving only two relative year identifiers missing).
- The authors don't claim that it can be used with covariates, but this seemingly follows if we think covariates are acceptable in standard TWFE DiD.

---

# .center.pull[Stacked Regression]

- Similar to the standard TWFE DiD, but we ensure that no previously treated units enter as controls by trimming the sample.
- For each treatment cohort `\(G_g\)`, get all treated units, and all units that are not treated by year `\(g + k\)`, where `\(g\)` is the treatment year and `\(k\)` is the outermost relative year that you want to test (e.g. if you do an event study plot from -5 to 5, `\(k\)` would equal 5).
- Keep only observations within years `\(g - k\)` and `\(g + k\)` for each cohort-specific dataset, and then stack them in relative time.
- Run the same TWFE estimates as in standard DiD, but interact the cohort-specific dataset indicator with all of the fixed effects, controls, and clusters.

---

# .center.pull[Simulations - Remedies]

<img src="fig4.png" width="3333" style="display: block; margin: auto;" />

---

# .center.pull[Application - Medical Marijuana Laws and Opioid Overdose Deaths]

- `\(\color{blue}{\text{Bachhuber et al.
2014}}\)` found, using a staggered DiD, that states with medical cannabis laws experienced a slower increase in opioid overdose mortality from 1999-2010.
- `\(\color{blue}{\text{Shover et al. 2020}}\)` extend the sample period from 2010 through 2017, a period during which 32 additional states passed medical marijuana laws (MMLs).
- Not only do the results go away, but the sign flips; MMLs are associated with *higher* opioid overdose mortality rates.
- The authors don't call it difference-in-differences, but they use TWFE with a binary indicator variable (so it is effectively DiD).

---

# .center.pull[Replication of MML]

<img src="SoDa_files/figure-html/repmml-1.png" width="504" style="display: block; margin: auto;" />

---

# .center.pull[Event Study Estimates]

- There is little evidence that covariates matter here, so we estimate a standard event-study DiD with no controls over the two sample periods:

`$$y_{it} = \alpha_i + \alpha_t + \sum_{k = k_*}^{k^*} \delta_k D_{it}^k + \epsilon_{it}$$`

where `\(\alpha_i\)` and `\(\alpha_t\)` are state and year fixed effects respectively, and `\(\delta_k\)` are the coefficients on the lead/lag indicators `\(D_{it}^k\)` for years around treatment.
<img src="SoDa_files/figure-html/d14-1.png" width="720" style="display: block; margin: auto;" /> --- # .center.pull[Goodman-Bacon Decomposition] <table class="table table-striped table-hover" style="margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="border-bottom:hidden" colspan="1"></th> <th style="border-bottom:hidden; padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; " colspan="3"><div style="border-bottom: 1px solid #ddd; padding-bottom: 5px; ">1999-2010</div></th> <th style="border-bottom:hidden; padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; " colspan="3"><div style="border-bottom: 1px solid #ddd; padding-bottom: 5px; ">1999-2017</div></th> </tr> <tr> <th style="text-align:center;"> Type </th> <th style="text-align:center;"> Average Estimate </th> <th style="text-align:center;"> Number of 2x2 Comparisons </th> <th style="text-align:center;"> Total Weight </th> <th style="text-align:center;"> Average Estimate </th> <th style="text-align:center;"> Number of 2x2 Comparisons </th> <th style="text-align:center;"> Total Weight </th> </tr> </thead> <tbody> <tr> <td style="text-align:center;"> Earlier vs Later Treated </td> <td style="text-align:center;"> <span style=" color: red !important;">-0.11</span> </td> <td style="text-align:center;"> 21 </td> <td style="text-align:center;"> 0.04 </td> <td style="text-align:center;"> <span style=" color: red !important;">-0.16</span> </td> <td style="text-align:center;"> 91 </td> <td style="text-align:center;"> 0.38 </td> </tr> <tr> <td style="text-align:center;"> Later vs Earlier Treated </td> <td style="text-align:center;"> <span style=" color: blue !important;">0.09</span> </td> <td style="text-align:center;"> 28 </td> <td style="text-align:center;"> 0.16 </td> <td style="text-align:center;"> <span style=" color: blue !important;">0.32</span> </td> <td style="text-align:center;"> 105 </td> <td style="text-align:center;"> 0.42 </td> </tr> <tr> <td 
style="text-align:center;"> Treated vs Untreated </td> <td style="text-align:center;"> <span style=" color: red !important;">-0.25</span> </td> <td style="text-align:center;"> 7 </td> <td style="text-align:center;"> 0.79 </td> <td style="text-align:center;"> <span style=" color: blue !important;">0.44</span> </td> <td style="text-align:center;"> 14 </td> <td style="text-align:center;"> 0.20 </td> </tr> </tbody> </table>

---

# .center.pull[Remedies]

<img src="SoDa_files/figure-html/remedies-1.png" width="720" style="display: block; margin: auto;" />

---

# .center.pull[Takeaways]

- DiDs are a powerful tool and we are going to keep using them.
- But we should make sure we understand what we're doing! DiD is a comparison of means and at a minimum we should know which means we're comparing.
- Multiple new methods have been proposed, all of which ensure that you aren't using previously treated units as controls.
- You should probably tailor your selection of method to your data structure: the methods use and discard different numbers of control units, and depending on your setting this might matter.
- It is unclear what's going on with MMLs and opioid mortality rates, but it is very unlikely that the results in the first published paper are robust.
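As a minimal sketch of why it matters which means TWFE compares, the Python snippet below builds a noise-free toy panel and computes the TWFE coefficient by double-demeaning (valid for a balanced panel). Everything here is a hypothetical illustration, not the talk's simulation: two units, six periods, treatment starting in periods 2 and 4, and arbitrary fixed effects. The helper names (`double_demean`, `twfe_estimate`, `simulate`) are introduced only for this example.

```python
def double_demean(x):
    """Remove unit means and period means from a balanced panel (rows = units)."""
    n_units, n_periods = len(x), len(x[0])
    grand = sum(map(sum, x)) / (n_units * n_periods)
    unit_mean = [sum(row) / n_periods for row in x]
    time_mean = [sum(row[t] for row in x) / n_units for t in range(n_periods)]
    return [[x[i][t] - unit_mean[i] - time_mean[t] + grand
             for t in range(n_periods)] for i in range(n_units)]


def twfe_estimate(y, d):
    """OLS coefficient on d in y = unit FE + time FE + delta * d (balanced panel)."""
    y_res, d_res = double_demean(y), double_demean(d)
    num = sum(dr * yr for d_row, y_row in zip(d_res, y_res)
              for dr, yr in zip(d_row, y_row))
    den = sum(dr * dr for d_row in d_res for dr in d_row)
    return num / den


# Hypothetical toy panel: an early-treated and a late-treated unit,
# with no never-treated unit.
treat_start = [2, 4]
periods = range(6)
d = [[1.0 if t >= g else 0.0 for t in periods] for g in treat_start]
unit_fe = [1.0, -0.5]                 # arbitrary; absorbed by the fixed effects
time_fe = [0.2 * t for t in periods]  # arbitrary; absorbed by the fixed effects


def simulate(effect):
    """Noise-free outcomes; effect(t, g) is the effect in period t for start date g."""
    return [[unit_fe[i] + time_fe[t] + effect(t, g) * d[i][t] for t in periods]
            for i, g in enumerate(treat_start)]


def average_effect_on_treated(effect):
    """Simple average of the true effect over all treated unit-period cells."""
    cells = [effect(t, g) for g in treat_start for t in periods if t >= g]
    return sum(cells) / len(cells)


def constant(t, g):
    return 2.0                        # same effect in every treated period


def trend_break(t, g):
    return 1.0 * (t - g + 1)          # effect grows by 1 each treated period


for name, effect in [("constant", constant), ("trend break", trend_break)]:
    est = twfe_estimate(simulate(effect), d)
    att = average_effect_on_treated(effect)
    print(f"{name}: TWFE = {est:.3f}, average effect on treated = {att:.3f}")
```

In this toy panel the constant-effect estimate matches the true effect, while the trend-break estimate falls well below the average effect on treated observations: the early-treated unit serves as a control for the late-treated unit while its own effect is still growing.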