Difference-in-Differences in 2020
Common Pitfalls and How to Avoid Them
Andrew Baker
Stanford University
2020-09-21

Outline of Talk

Overview of DiD
Problems with Staggered DiD
Simulation Results
Some Alternative Methods
Application


Difference-in-Differences

Think Card and Krueger minimum wage study comparing NJ and PA.
2 units and 2 time periods.
1 unit (T) is treated, and receives treatment in the second period. The control unit (C) is never treated.



Difference-in-Differences

Building upon Angrist & Pischke (2008, p. 228), we can think of these simple 2x2 DiDs as a fixed effects estimator.
Potential Outcomes
- $Y_{i, t}^{1}$ = value of dependent variable for unit $i$ in period $t$ with treatment.
- $Y_{i, t}^{0}$ = value of dependent variable for unit $i$ in period $t$ without treatment.
The expected outcome is a linear function of unit and time fixed effects:
$E [Y_{i, t}^{0}] = α_{i} + α_{t}$
$E [Y_{i, t}^{1}] = α_{i} + α_{t} + δ D_{i, t}$
where $D_{i, t}$ indicates that unit $i$ is treated in period $t$.
The goal of DiD is to get an unbiased estimate of the treatment effect $δ$.


Difference-in-Differences as Solving System of Equations for Unknown Variable

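Difference in expectations for the control unit between times t = 1 and t = 0:
$E [Y_{C, 1}^{0}] - E [Y_{C, 0}^{0}] = (α_{C} + α_{1}) - (α_{C} + α_{0}) = α_{1} - α_{0}$
Now do the same thing for the treated unit:
$E [Y_{T, 1}^{1}] - E [Y_{T, 0}^{0}] = (α_{T} + α_{1} + δ) - (α_{T} + α_{0}) = α_{1} - α_{0} + δ$
If we assume the linear structure of DiD, then an unbiased estimate of $δ$ is:
$δ = (E [Y_{T, 1}^{1}] - E [Y_{T, 0}^{0}]) - (E [Y_{C, 1}^{0}] - E [Y_{C, 0}^{0}])$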
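The double difference can be checked numerically. The sketch below builds the four cell means from made-up fixed effects and a made-up treatment effect (all numbers are illustrative assumptions, not values from the talk) and confirms that differencing recovers the effect.

```python
# Hypothetical fixed effects and treatment effect (illustrative values only).
alpha_unit = {"T": 3.0, "C": 1.0}   # unit fixed effects
alpha_time = {0: 10.0, 1: 12.0}     # time fixed effects
delta = 2.5                         # true treatment effect

# Expected outcomes: only the treated unit in period 1 receives delta.
means = {
    (u, t): alpha_unit[u] + alpha_time[t] + (delta if (u == "T" and t == 1) else 0.0)
    for u in ("T", "C")
    for t in (0, 1)
}

def did_2x2(cell_means):
    """Double difference: (T post - T pre) - (C post - C pre)."""
    treated_change = cell_means[("T", 1)] - cell_means[("T", 0)]
    control_change = cell_means[("C", 1)] - cell_means[("C", 0)]
    return treated_change - control_change

delta_hat = did_2x2(means)
print(delta_hat)
```

The unit effects cancel within each unit's change, and the common time effect cancels across the two changes, leaving only the treatment effect.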


Two-Way Differencing


Regression DiD

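The DiD can be estimated through a linear regression of the form:
$y_{it} = α + β_{1} TREAT_{i} + β_{2} POST_{t} + δ (TREAT_{i} \cdot POST_{t}) + ε_{it}$ (1)
where $TREAT_{i}$ indicates the treated unit and $POST_{t}$ indicates the second period.

The coefficients from the regression in (1) recover the same parameters as the double-differencing performed above:
$α = E [y_{it} | i = C, t = 0] = α_{C} + α_{0}$
$β_{1} = E [y_{it} | i = T, t = 0] - E [y_{it} | i = C, t = 0] = α_{T} - α_{C}$
$β_{2} = E [y_{it} | i = C, t = 1] - E [y_{it} | i = C, t = 0] = α_{1} - α_{0}$
$δ = (E [y_{it} | i = T, t = 1] - E [y_{it} | i = T, t = 0]) - (E [y_{it} | i = C, t = 1] - E [y_{it} | i = C, t = 0])$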


Regression DiD - The Workhorse Model

Advantage of regression DiD: it provides both estimates of $δ$ and standard errors for the estimates.
Angrist & Pischke (2008):
- "It's also easy to add additional (units) or periods to the regression setup... [and] it's easy to add additional covariates."
Two-way fixed effects estimator:
$y_{it} = α_{i} + α_{t} + δ^{DD} D_{it} + ε_{it}$
- $α_{i}$ and $α_{t}$ are unit and time fixed effects, $D_{it}$ is the unit-time indicator for treatment.
- $TREAT_{i}$ and $POST_{t}$ (the treated-unit and post-period indicators) are now subsumed by the fixed effects.
- Can be easily modified to include a covariate matrix $X_{it}$, time trends, dynamic treatment effects estimation, etc.
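A minimal sketch of the two-way fixed effects estimator, assuming a balanced panel and a single treatment indicator with no covariates. On a balanced panel, subtracting unit means and period means (and adding back the grand mean) partials out both sets of fixed effects, so the DiD coefficient is a one-variable regression on the demeaned data. The function name and data layout are illustrative choices, not from the talk.

```python
from statistics import mean

def twfe_did(panel):
    """TWFE DiD coefficient on a balanced panel.

    panel: dict mapping (unit, period) -> (y, d), where d is the 0/1
    treatment indicator. Every unit must appear in every period.
    """
    units = sorted({u for u, _ in panel})
    periods = sorted({t for _, t in panel})

    def demean(idx):
        # Two-way within transformation: x_it - mean_i - mean_t + grand mean.
        unit_avg = {u: mean(panel[(u, t)][idx] for t in periods) for u in units}
        time_avg = {t: mean(panel[(u, t)][idx] for u in units) for t in periods}
        grand = mean(v[idx] for v in panel.values())
        return {k: panel[k][idx] - unit_avg[k[0]] - time_avg[k[1]] + grand
                for k in panel}

    y_tilde, d_tilde = demean(0), demean(1)
    numerator = sum(y_tilde[k] * d_tilde[k] for k in panel)
    denominator = sum(d_tilde[k] ** 2 for k in panel)
    return numerator / denominator

# Hypothetical 2x2 data: the treated unit T gets a 2.5 effect in period 1.
toy = {
    ("T", 0): (13.0, 0), ("T", 1): (17.5, 1),
    ("C", 0): (11.0, 0), ("C", 1): (13.0, 0),
}
print(twfe_did(toy))
```

In the 2x2 case this reproduces the simple double difference, which is why the TWFE form is treated as the natural generalization.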


Where It Goes Wrong

There is now a developed literature on the issues with TWFE DiD with "staggered treatment timing" (Abraham and Sun (2018), Borusyak and Jaravel (2018), Callaway and Sant'Anna (2019), Goodman-Bacon (2019), Strezhnev (2018), Athey and Imbens (2018))
- Different units receive treatment at different periods in time.
Probably the most common use of DiD today. If done right, it can increase the amount of cross-sectional variation.
Without digging into the literature:
- The TWFE DiD estimate $δ^{DD}$ with staggered treatment timing is a weighted average of many different treatment effects.
- We know little about what it measures when treatment timing varies, how it compares means across groups, or why different specifications change estimates.
- The weights are often negative and non-intuitive.


Bias with TWFE - Goodman-Bacon (2019)

Goodman-Bacon (2019) provides a clear graphical intuition for the bias. Assume three treatment groups: never-treated units (U), early-treated units (k), and later-treated units (l).


Bias with TWFE - Goodman-Bacon (2019)

Goodman-Bacon (2019) shows that we can form four different 2x2 groups in this setting, where the effect can be estimated using the simple regression DiD in each group.
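The four 2x2 groups can be enumerated mechanically. The sketch below is an illustration only: the function name is hypothetical, the treatment years for k and l are made up, and it assumes all treatment dates are distinct.

```python
def two_by_twos(treat_dates):
    """List the 2x2 comparisons for timing groups plus never-treated groups.

    treat_dates: dict mapping group label -> treatment period, with None
    for a never-treated group. Treatment periods are assumed distinct.
    Returns (treated_group, control_group, description) tuples.
    """
    treated = sorted(
        (g for g, d in treat_dates.items() if d is not None),
        key=lambda g: treat_dates[g],
    )
    untreated = [g for g, d in treat_dates.items() if d is None]

    out = []
    for g in treated:
        for u in untreated:
            out.append((g, u, "treated vs untreated"))
    for i, early in enumerate(treated):
        for late in treated[i + 1:]:
            # Before the late group is treated, it is a not-yet-treated control.
            out.append((early, late, "earlier treated vs later treated control"))
            # After the early group is treated, it acts as an already-treated control.
            out.append((late, early, "later treated vs earlier treated control"))
    return out

# Hypothetical treatment years for the early (k) and late (l) groups.
comparisons = two_by_twos({"U": None, "k": 1989, "l": 1998})
for treated_group, control_group, label in comparisons:
    print(treated_group, control_group, label)
```

The last kind of comparison is the problematic one: the control group there is already under treatment.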


Bias with TWFE - Goodman-Bacon (2019)

Important Insights
- The TWFE DiD estimate $δ^{DD}$ is just the weighted average of the four 2x2 treatment effects. The weights are a function of the size of the subsample, the relative size of treatment and control units, and the timing of treatment in the subsample.
- Already-treated units act as controls even though they are treated.
- Given the weighting function, panel length alone can change the DiD estimates substantially, even when each 2x2 estimate does not change.
- Groups treated closer to the middle of the panel receive higher weights than those treated earlier or later.
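One ingredient of the weights explains the last two points. In Goodman-Bacon's decomposition, the weight on a treated-versus-untreated comparison scales with the variance of the treatment dummy, D(1 - D), where D is the share of panel periods a group spends treated. The sketch below computes only that term (not the full weight); the panel years 1980 to 2015 and the treatment years match the simulation described later in the talk, and the function name is hypothetical.

```python
def treatment_variance(treat_year, first_year, last_year):
    """Variance of the 0/1 treatment dummy over the panel window."""
    n_periods = last_year - first_year + 1
    share_treated = (last_year - treat_year + 1) / n_periods
    return share_treated * (1 - share_treated)

for year in (1989, 1998, 2007):
    print(year, treatment_variance(year, 1980, 2015))

# Shortening the panel changes the term even though nothing else changes.
print(treatment_variance(1998, 1980, 2010))
```

With 36 periods, the group treated in 1998 is treated for exactly half the panel and gets the maximum value of 0.25, while the 1989 and 2007 groups both get 0.1875. Changing the last year of the panel shifts every group's treated share and therefore the weights.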


Simulation Exercise

We can show how easily $δ^{DD}$, the TWFE DiD estimate, goes awry through a simulation exercise.
Assume we're modeling an outcome variable $y_{it}$ on a balanced panel of 1,000 firms with years from 1980 to 2015.
Time-invariant unit effects and time-varying year effects are drawn at random.
Firms are incorporated in one of 50 randomly drawn states, and states are randomly assigned into three treatment groups.


Simulation Exercise

Model the treatment effect process in three ways
- Only one treatment period (1998) and one treated group, with constant additive treatment effects.
- Allow for staggered treatment timing but with constant additive effects. Simulated treatment effects are all positive in expectation but smaller for later-treated groups ($\tau_{G1989} = 5, \tau_{G1998} = 3, \tau_{G2007} = 1$).
- Allow for both staggered treatment timing and change-in-trend "dynamic" treatment effects. Instead of a constant for each group, $\tau_{G}$ is the yearly increase in the outcome variable that compounds over time.
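One way to write the treatment effect processes above, as a sketch under assumed notation: $g_i$ (the treatment year of firm $i$'s state), the error term $\varepsilon_{it}$, and the convention that the trend effect starts counting at one in the treatment year are all assumptions made here for illustration.

```latex
% Constant additive effect (first and second designs):
y_{it} = \alpha_i + \alpha_t + \tau_{g_i} \, \mathbf{1}\{t \ge g_i\} + \varepsilon_{it}

% Change-in-trend "dynamic" effect (third design); the "+ 1" is an assumed convention:
y_{it} = \alpha_i + \alpha_t + \tau_{g_i} \, (t - g_i + 1) \, \mathbf{1}\{t \ge g_i\} + \varepsilon_{it}
```

In the first design every treated firm shares a single $g_i$; in the second and third, $g_i$ varies across the three treatment groups.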



Simulation Exercise

With the simulated data we estimate TWFE DiD using OLS on:

$y_{it} = α_{i} + α_{t} + δ^{DD} D_{it} + ε_{it}$

Simple regression model with unit and time fixed effects.

For each of the three simulated datasets we run a Monte Carlo simulation where we create the datasets 1,000 times and plot the distribution of the estimated $δ^{DD}$.

Bias is the deviation from the true underlying treatment effect.
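A scaled-down sketch of the Monte Carlo for the simplest design (one treatment year, one treated group, constant additive effect). Several details are assumptions made to keep it short and fast: standard normal draws for the unit effects, year effects, and noise; 60 firms instead of 1,000; 50 replications instead of 1,000; treatment assigned to every other firm rather than by state; and a true effect of 2.0.

```python
import random

def avg(xs):
    xs = list(xs)
    return sum(xs) / len(xs)

def simulate(rng, n_firms=60, first=1980, last=2015, treat_year=1998, effect=2.0):
    """One balanced panel: y_it = unit effect + year effect + effect * D_it + noise."""
    years = range(first, last + 1)
    year_fx = {t: rng.gauss(0, 1) for t in years}      # assumed N(0, 1)
    panel = {}
    for i in range(n_firms):
        unit_fx = rng.gauss(0, 1)                      # assumed N(0, 1)
        treated_firm = (i % 2 == 0)                    # half the firms are treated
        for t in years:
            d = int(treated_firm and t >= treat_year)
            y = unit_fx + year_fx[t] + effect * d + rng.gauss(0, 1)
            panel[(i, t)] = (y, d)
    return panel

def twfe_did(panel):
    """TWFE coefficient via the two-way within transformation (balanced panel)."""
    units = sorted({u for u, _ in panel})
    periods = sorted({t for _, t in panel})

    def demean(idx):
        unit_avg = {u: avg(panel[(u, t)][idx] for t in periods) for u in units}
        time_avg = {t: avg(panel[(u, t)][idx] for u in units) for t in periods}
        grand = avg(v[idx] for v in panel.values())
        return {k: panel[k][idx] - unit_avg[k[0]] - time_avg[k[1]] + grand
                for k in panel}

    y, d = demean(0), demean(1)
    return sum(y[k] * d[k] for k in panel) / sum(d[k] ** 2 for k in panel)

def monte_carlo(n_reps=50, seed=1, effect=2.0):
    rng = random.Random(seed)
    return [twfe_did(simulate(rng, effect=effect)) for _ in range(n_reps)]

estimates = monte_carlo()
bias = avg(estimates) - 2.0
print(f"mean estimate: {avg(estimates):.3f}, bias: {bias:.3f}")
```

In this single-date design the estimates should center on the true effect. The staggered designs can be explored by giving firms different treatment years and effect paths in `simulate`.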



Goodman-Bacon Decomposition for Simulation 3


Callaway & Sant'Anna

Estimates cohort-specific average treatment effects as an inverse-propensity-weighted long difference in outcomes between treated and untreated units for a given treatment cohort.
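A simplified sketch of the long-difference idea, not the authors' estimator: it drops the propensity weighting (so it applies only without covariates), uses not-yet-treated and never-treated units as controls, and takes the year before treatment as the base period. The function name, data layout, and toy numbers are illustrative assumptions.

```python
def att_gt(panel, cohort, g, t):
    """Unweighted ATT(g, t): long difference for cohort g minus controls.

    panel:  dict mapping (unit, year) -> outcome
    cohort: dict mapping unit -> first treatment year (None if never treated)
    The long difference runs from base year g - 1 to year t.
    """
    def long_diff(u):
        return panel[(u, t)] - panel[(u, g - 1)]

    treated = [u for u, c in cohort.items() if c == g]
    # Controls: units outside cohort g that are untreated in both year t
    # and the base year g - 1.
    controls = [u for u, c in cohort.items() if c is None or c > max(t, g)]

    treated_change = sum(long_diff(u) for u in treated) / len(treated)
    control_change = sum(long_diff(u) for u in controls) / len(controls)
    return treated_change - control_change

# Toy data: unit "a" is treated in 2000, unit "b" is never treated.
toy_panel = {("a", 1999): 1.0, ("a", 2001): 4.0,
             ("b", 1999): 2.0, ("b", 2001): 3.0}
toy_cohort = {"a": 2000, "b": None}
print(att_gt(toy_panel, toy_cohort, g=2000, t=2001))
```

Because each cohort-year effect is computed only against units that are still untreated, no already-treated unit ever enters as a control.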


Abraham and Sun

A relatively straightforward extension of the standard event-study TWFE model.
You interact the relative time indicators (i.e. t = -2, -1, ...) with indicators for the treatment initiation year group, and then aggregate the cohort-specific estimates to overall relative time estimates, weighting by cohort size.
In the case of no covariates, this gives you the same estimate as Callaway & Sant'Anna if you fully saturate the model with time indicators (leaving only two relative year identifiers missing).
The authors don't claim that it can be used with covariates, but it seemingly follows if we think it is okay with normal TWFE DiD.
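The interacted specification described above can be sketched as follows. The notation is assumed for illustration: $E_i$ is unit $i$'s treatment year, $D_{it}^{\ell}$ indicates that unit $i$ is $\ell$ years from treatment in year $t$, $\mathcal{R}$ is the set of omitted reference periods, and $N_e$ is the number of units in cohort $e$.

```latex
% Step 1: event-study TWFE with cohort-by-relative-time interactions
y_{it} = \alpha_i + \alpha_t
       + \sum_{e} \sum_{\ell \notin \mathcal{R}} \delta_{e,\ell} \,
         \mathbf{1}\{E_i = e\} \cdot D_{it}^{\ell} + \varepsilon_{it}

% Step 2: aggregate across the cohorts observed at relative time \ell,
% weighting by cohort size
\hat{\nu}_{\ell} = \sum_{e} w_{e,\ell} \, \hat{\delta}_{e,\ell},
\qquad
w_{e,\ell} = \frac{N_e}{\sum_{e'} N_{e'}}
```

The sums in the second step run only over cohorts that are actually observed $\ell$ years from treatment within the sample window.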


Stacked Regression

Similar to the standard TWFE DiD, but we ensure that no previously treated units enter as controls by trimming the sample.
For each treatment cohort $g$, get all treated units, and all units that are not treated by year $g + k$, where $g$ is the treatment year and $k$ is the outermost relative year that you want to test (e.g. if you do an event study plot from -5 to 5, $k$ would equal 5).
Keep only observations within years $g - k$ and $g + k$ for each cohort-specific dataset, and then stack them in relative time.
Run the same TWFE estimates as in standard DiD, but include interactions for the cohort-specific dataset with all of the fixed effects, controls, and clusters.
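The trimming and stacking steps can be sketched as below. This covers only the dataset construction, not the regression; the function name, row layout, and toy cohorts are illustrative assumptions.

```python
def build_stack(panel, cohort, k):
    """Stack cohort-specific datasets in relative time.

    panel:  dict mapping (unit, year) -> outcome
    cohort: dict mapping unit -> treatment year (None if never treated)
    k:      outermost relative year kept on each side of treatment
    Returns rows (stack_id, unit, rel_year, treated_unit, y).
    """
    rows = []
    treatment_years = sorted({c for c in cohort.values() if c is not None})
    for g in treatment_years:
        for u, c in cohort.items():
            is_treated = (c == g)
            # Clean controls: units not treated by year g + k.
            is_clean_control = (c is None or c > g + k)
            if not (is_treated or is_clean_control):
                continue
            for year in range(g - k, g + k + 1):
                if (u, year) in panel:
                    rows.append((g, u, year - g, int(is_treated), panel[(u, year)]))
    return rows

# Toy example: cohorts treated in 1990 and 1993, plus a never-treated unit.
toy_cohort = {"a": 1990, "b": 1993, "c": None}
toy_panel = {(u, y): 0.0 for u in toy_cohort for y in range(1985, 2001)}
stacked = build_stack(toy_panel, toy_cohort, k=2)
print(len(stacked))
```

In the toy example, unit "b" is a valid control in the 1990 stack because it is not treated until after 1992, while unit "a" is excluded from the 1993 stack because it is already treated. The `stack_id` column is what gets interacted with the fixed effects in the regression step.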


Simulations - Remedies


Application - Medical Marijuana Laws and Opioid Overdose Deaths

Bachhuber et al. (2014) found, using a staggered DiD, that states with medical cannabis laws experienced a slower increase in opioid overdose mortality from 1999-2010.
Shover et al. (2019) extend the data sample from 2010 to 2017, a period during which 32 extra states passed MMLs.
Not only do the results go away, but the sign flips; MMLs are associated with higher opioid overdose mortality rates.
The authors don't call it difference-in-differences, but it uses TWFE with a binary indicator variable (and thus is effectively DiD).


Replication of MML


Event Study Estimates

Little evidence that covariates matter here, so estimate standard DiD with no controls over the two periods:

$y_{it} = α_{i} + α_{t} + \sum_{k} δ_{k} D_{it}^{k} + ε_{it}$

where $α_{i}$ and $α_{t}$ are state and year fixed effects respectively, and $δ_{k}$ are the coefficients on the lead/lag indicators $D_{it}^{k}$ for years around treatment.


Goodman-Bacon Decomposition
 

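| Type | Average Estimate (1999-2010) | Number of 2x2 Comparisons (1999-2010) | Total Weight (1999-2010) | Average Estimate (1999-2017) | Number of 2x2 Comparisons (1999-2017) | Total Weight (1999-2017) |
|---|---|---|---|---|---|---|
| Earlier vs Later Treated | -0.11 | 21 | 0.04 | -0.16 | 91 | 0.38 |
| Later vs Earlier Treated | 0.09 | 28 | 0.16 | 0.32 | 105 | 0.42 |
| Treated vs Untreated | -0.25 | 7 | 0.79 | 0.44 | 14 | 0.20 |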
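Because the TWFE estimate is a weighted average of the 2x2 estimates, multiplying each row's average estimate by its total weight and summing should roughly reproduce the overall coefficient for each sample. The sketch below does that arithmetic with the numbers from the table above. It is approximate: the table is rounded, and it assumes each average estimate is weight-averaged within its type.

```python
# (average estimate, total weight) per comparison type, from the table above.
decomposition = {
    "1999-2010": {
        "Earlier vs Later Treated": (-0.11, 0.04),
        "Later vs Earlier Treated": (0.09, 0.16),
        "Treated vs Untreated": (-0.25, 0.79),
    },
    "1999-2017": {
        "Earlier vs Later Treated": (-0.16, 0.38),
        "Later vs Earlier Treated": (0.32, 0.42),
        "Treated vs Untreated": (0.44, 0.20),
    },
}

def implied_twfe(rows):
    """Sum of average estimate times total weight across comparison types."""
    return sum(estimate * weight for estimate, weight in rows.values())

for sample, rows in decomposition.items():
    print(sample, round(implied_twfe(rows), 4))
```

The implied estimate is about -0.19 for 1999-2010 and about 0.16 for 1999-2017. In the short sample nearly 80% of the weight sits on treated-versus-untreated comparisons; in the long sample 80% of the weight sits on comparisons between timing groups, including the later-versus-earlier comparisons that use already-treated states as controls.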
  


Remedies


Takeaways

DiDs are a powerful tool and we are going to keep using them.
But we should make sure we understand what we're doing! DiD is a comparison of means and at a minimum we should know which means we're comparing.
Multiple new methods have been proposed, all of which ensure that you aren't using previously treated units as controls.
You should probably tailor your selection of method to your data structure: the methods use and discard different numbers of control units, and depending on your setting this might matter.
It is unclear what's going on with MMLs and opioid mortality rates, but it is very unlikely that the results in the first published paper are robust.
