[BUG] Different SE's on main regressors using time series operator in absorb() vs as "regular" control in main line #286

hieronymusBusch · 2024-07-30T08:13:16Z

Stata version: 18.0 16jul2024
OS: Windows 10

Hi, this is my first report so I am sorry for any mistakes / misunderstandings.

We (Daniele Girardi & I) are building on reghdfe for our lpdid (local projections difference-in-differences) command. Your command is a great help and speeds ours up immensely. We are grateful for your contribution to the community!

Since our outcome is first-differenced, we want to give our users some guidance on how to enter control variables (i.e. first-differenced in most cases). The most intuitive way in Stata would be to simply add the "D." operator to a control variable. However, in doing so we noticed that the standard errors on the main regressor are sensitive to where and how such a control is specified.

Expected Behaviour:

We would expect the point estimates and standard errors on the main regressors to be the same, regardless of how users enter their first-differenced control variable. Specifically, the SE on the main regressor should not depend on whether users (a) first-difference the control variable themselves before running the command and then include it as a control, in absorb() or in the main line, or (b) skip the manual step and use the "D." operator instead.

Actual Behaviour:

Standard errors differ when the "D." operator is used inside the absorb() option. Point estimates remain identical. While the difference in the example below is small, it was more notable in one of our empirical applications. The change in SE cannot be explained by rounding.

Output / Example:

I report the standard error on the main variable "treat" using different specifications. In this example, we add linear time trends to the diff-in-diff estimation. First-differencing a group-specific linear time trend is equivalent to adding group FE.
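To spell out the last claim as a sketch (the symbols \gamma_g for the trend slope of group g and t for time are notation introduced here, not taken from reghdfe or lpdid): with time advancing by one unit per period, as in the example below, first-differencing the trend term gives

```latex
\Delta(\gamma_g t) = \gamma_g t - \gamma_g (t - 1) = \gamma_g
```

The differenced trend is therefore a constant per group, which is what a group fixed effect captures. In Stata terms, D.time equals 1 for every observation in the estimation sample, so D.c.time#i.group reduces to the set of group indicators.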

clear *
set seed 12345
set obs 1000

gen y = uniform()
gen group = floor(_n/50)
bys group: gen time = _n
gen treat = 0
replace treat = 1 if group>10 & time>15

xtset group time

* using the Stata-provided regress command
regress D.y treat i.group i.time
regress D.y treat D.c.time#i.group i.time
*> SE = .05779
* using reghdfe mirroring regress command from above
reghdfe D.y treat , absorb(group time)
reghdfe D.y treat D.c.time#i.group , absorb(time)
*> SE = .05779 (same as above)
* entering controls with D. in absorb()
reghdfe D.y treat , absorb(D.c.time#i.group time)
*> SE = .0578218 (different to above)

NilsEnevoldsen · 2024-08-07T21:06:43Z

I don't know why, but degrees of freedom are computed slightly differently when both D.c.time#i.group and i.time are in absorb(). It feels like that should represent 68 absorbed degrees of freedom, not 69, but I'm not confident.

You can adjust e(V) to account for this.

reghdfe D.y treat D.c.time#i.group i.time, noabsorb
di e(df_m)
di e(df_a)
di e(df_r)

reghdfe D.y treat D.c.time#i.group, absorb(i.time)
di e(df_m)
di e(df_a)
di e(df_r)

reghdfe D.y treat i.time, absorb(D.c.time#i.group)
di e(df_m)
di e(df_a)
di e(df_r)

reghdfe D.y treat, absorb(D.c.time#i.group i.time)
di e(df_m)
di e(df_a)
di e(df_r)

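* Rescale the VCE as if one more residual degree of freedom were available
matrix V = e(V)
matrix V = V * e(df_r) / (e(df_r)+1)
* erepost is a user-written command (ssc install erepost)
erepost V = V
estimates replay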
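One way to read the rescaling above, as a sketch that assumes reghdfe's default (unadjusted) VCE and that the last specification counts exactly one absorbed degree of freedom too many. RSS and X are notation introduced here for the residual sum of squares and the partialled-out regressor matrix:

```latex
\hat{V}_{\text{reported}} = \frac{RSS}{df_r}\,(X'X)^{-1},
\qquad
\hat{V}_{\text{adjusted}} = \frac{RSS}{df_r + 1}\,(X'X)^{-1}
  = \hat{V}_{\text{reported}} \cdot \frac{df_r}{df_r + 1}
```

Standard errors then shrink by the factor \sqrt{df_r / (df_r + 1)}. With robust or clustered VCEs the small-sample correction takes a different form, so the same factor need not apply.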

hieronymusBusch · 2024-08-07T22:45:43Z

Thank you for the comment! Without venturing into the code of reghdfe, my best guess is then that this behaviour originates in how Stata generates e(sample) when using time series operators (see also https://www.statalist.org/forums/forum/general-stata-discussion/general/1756480-complete-sample-sizes-and-e-sample-with-lags-and-leads).

hieronymusBusch assigned sergiocorreia Jul 30, 2024
