[BUG] Different SE's on main regressors using time series operator in absorb() vs as "regular" control in main line #286

Open
hieronymusBusch opened this issue Jul 30, 2024 · 2 comments

@hieronymusBusch

Stata version: 18.0 16jul2024
OS: Windows 10

Hi, this is my first report so I am sorry for any mistakes / misunderstandings.

We (Daniele Girardi & I) are building on reghdfe for our lpdid (local projections difference-in-differences) command. reghdfe is a great help and speeds up our command immensely. We are grateful for your contribution to the community!

Since our outcome is first-differenced, we want to give our users some guidelines on how to enter control variables (in most cases, first-differenced). The most intuitive approach in Stata would be to simply add the "D." operator to a control variable. However, in doing so we noticed that the standard errors on the main regressor are sensitive to where and how such a control is specified.

Expected Behaviour:

We would expect the point estimates and standard errors on the main regressors to be the same regardless of how users enter their first-differenced control variable. Specifically, the SE on the main regressor should not depend on whether users first-difference the control variable themselves before running the command and then include it as a control (in absorb() or in the main line), or skip the manual first-differencing and use the "D." operator instead.

Actual Behaviour:

Standard errors differ when the "D." operator is used inside the absorb() option. Point estimates remain identical. The difference in the example below is small, but in one of our empirical applications it was more notable. It cannot be explained by rounding.

Output / Example:

I report the standard error on the main variable "treat" using different specifications. In this example, we add linear time trends to the diff-in-diff estimation. First-differencing a group-specific linear time trend is equivalent to adding group FE.

clear *
set seed 12345
set obs 1000

gen y = uniform()
gen group = floor(_n/50)
bys group: gen time = _n
gen treat = 0
replace treat = 1 if group>10 & time>15

xtset group time

* using the Stata-provided regress command
regress D.y treat i.group i.time
regress D.y treat D.c.time#i.group i.time
*> SE = .05779

* using reghdfe, mirroring the regress commands above
reghdfe D.y treat , absorb(group time)
reghdfe D.y treat D.c.time#i.group , absorb(time)
*> SE = .05779 (same as above)

* entering controls with D. in absorb()
reghdfe D.y treat , absorb(D.c.time#i.group time)
*> SE = .0578218 (different from above)
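
To put the two absorb() specifications side by side, here is a minimal sketch. It assumes the simulated panel above is in memory and already xtset. The loop and the display format are illustrative and not part of the original report. It prints the SE on treat together with the sample size and the degrees of freedom that reghdfe stores after each run:

```stata
* Assumption: data from the example above are in memory and xtset group time
foreach spec in "group time" "D.c.time#i.group time" {
    quietly reghdfe D.y treat, absorb(`spec')
    * _se[treat] is the standard error on the main regressor
    display "absorb(`spec'): SE = " %9.7f _se[treat] ///
        "  N = " e(N) "  df_a = " e(df_a) "  df_r = " e(df_r)
}
```

If e(N) agrees across the two lines but e(df_a) or e(df_r) does not, the gap in SEs would come from the degrees-of-freedom count rather than from the estimation sample.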

@NilsEnevoldsen

I don't know why, but degrees of freedom are computed slightly differently when both D.c.time#i.group and i.time are in absorb(). It feels like that should represent 68 absorbed degrees of freedom, not 69, but I'm not confident.

You can adjust e(V) to account for this.

reghdfe D.y treat D.c.time#i.group i.time, noabsorb
di e(df_m)
di e(df_a)
di e(df_r)

reghdfe D.y treat D.c.time#i.group, absorb(i.time)
di e(df_m)
di e(df_a)
di e(df_r)

reghdfe D.y treat i.time, absorb(D.c.time#i.group)
di e(df_m)
di e(df_a)
di e(df_r)

reghdfe D.y treat, absorb(D.c.time#i.group i.time)
di e(df_m)
di e(df_a)
di e(df_r)

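* rescale e(V) as if one fewer degree of freedom had been absorbed
matrix V = e(V)
matrix V = V * e(df_r) / (e(df_r)+1)
* erepost is a user-written command (ssc install erepost)
erepost V = V
estimates replay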
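
The rescaling factor follows from how the variance estimate depends on the residual degrees of freedom. This is a sketch under two assumptions: the default unadjusted (homoskedastic) VCE is in use, and the reported e(df_r), written $\mathrm{df}_r$ below, is one too small. With robust or clustered VCEs the small-sample correction takes a different form, so the same factor may not apply exactly.

```latex
% Unadjusted VCE as reported, using the (assumed too small) df_r:
\hat{V}_{\text{reported}} = \frac{\mathrm{RSS}}{\mathrm{df}_r}\,(X'X)^{-1}

% VCE with one fewer absorbed degree of freedom, i.e. df_r + 1 residual df:
\hat{V}_{\text{adjusted}} = \frac{\mathrm{RSS}}{\mathrm{df}_r + 1}\,(X'X)^{-1}
  = \hat{V}_{\text{reported}} \cdot \frac{\mathrm{df}_r}{\mathrm{df}_r + 1}

% Standard errors therefore scale by the square root of that factor:
\mathrm{SE}_{\text{adjusted}} = \mathrm{SE}_{\text{reported}}
  \cdot \sqrt{\frac{\mathrm{df}_r}{\mathrm{df}_r + 1}}
```

Point estimates are unaffected because RSS and $(X'X)^{-1}$ do not depend on the degrees-of-freedom count, which matches the observation that only the SEs differ.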

@hieronymusBusch
Author

Thank you for the comment! Without venturing into the code of reghdfe, my best guess is then that this behaviour originates in how Stata generates e(sample) when using time series operators (see also https://www.statalist.org/forums/forum/general-stata-discussion/general/1756480-complete-sample-sizes-and-e-sample-with-lags-and-leads).
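
One way to check this guess is to compare e(sample) directly between the two absorb() specifications. This is a minimal sketch, assuming the simulated data from the original example are in memory and xtset. The variable names s_fe and s_ts are hypothetical:

```stata
* Assumption: data from the example above are in memory and xtset group time
quietly reghdfe D.y treat, absorb(group time)
generate byte s_fe = e(sample)      // sample with plain fixed effects

quietly reghdfe D.y treat, absorb(D.c.time#i.group time)
generate byte s_ts = e(sample)      // sample with D. operator in absorb()

* Off-diagonal cells would indicate that the estimation samples differ
tabulate s_fe s_ts
```

If the table has no off-diagonal observations, the estimation samples coincide and the difference would have to come from how degrees of freedom are counted.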
