Hi, this is my first report so I am sorry for any mistakes / misunderstandings.
We (Daniele Girardi & I) are building on reghdfe for our lpdid (local projections difference-in-differences) command. Your command is a great help and immensely speeds up ours. We are grateful for your contribution to the community!
Since our outcome is first-differenced, we want to give our users some guidelines on how to enter control variables (i.e., first-differenced in most cases). The most intuitive approach in Stata would be to simply add the "D." operator to a control variable. However, this is where we noticed that the standard errors on the main regressor are sensitive to where and how we specify such a control.
Expected Behaviour:
We would expect the point estimates and standard errors on the main regressors to be the same, regardless of how users enter their first-differenced control variable. Specifically, the SE on the main regressor should not depend on whether users first-difference the control variable themselves before running the command and then include it in absorb() or in the main line, or instead skip the manual step and use the "D." operator.
Actual Behaviour:
Standard errors differ when using the "D." operator in the absorb() option. Point estimates remain identical. While the SE in the provided example does not change drastically, in one of our empirical applications the change was more notable. The change in SE cannot be explained by rounding.
Output / Example:
I report the standard error on the main variable "treat" using different specifications. In this example, we add linear time trends to the diff-in-diff estimation. First-differencing a group-specific linear time trend is equivalent to adding group FE.
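The equivalence claimed above can be written out in one line. This is a sketch under two assumptions that are not stated in the report: the trend coefficient is written as \gamma_g (a hypothetical symbol), and time advances in unit steps within each group, as it does in the example data.

```latex
% Group-specific linear trend for group g at time t: \gamma_g \, t
% First difference, assuming unit time steps:
\Delta(\gamma_g \, t) = \gamma_g \, t - \gamma_g \,(t-1) = \gamma_g
% The differenced trend is a constant per group, i.e. a group fixed effect.
```

Under the same assumption, D.time equals 1 for every observation, so D.c.time#i.group should span the same columns as i.group.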
clear *
set seed 12345
set obs 1000
gen y = uniform()
gen group = floor(_n/50)
bys group: gen time = _n
gen treat = 0
replace treat = 1 if group>10 & time>15
xtset group time
* using the Stata-provided regress command
regress D.y treat i.group i.time
regress D.y treat D.c.time#i.group i.time
*> SE = .05779
* using reghdfe mirroring regress command from above
reghdfe D.y treat , absorb(group time)
reghdfe D.y treat D.c.time#i.group , absorb(time)
*> SE = .05779 (same as above)
* entering controls with D. in absorb()
reghdfe D.y treat , absorb(D.c.time#i.group time)
*> SE = .0578218 (different to above)
I don't know why, but degrees of freedom are computed slightly differently when both D.c.time#i.group and i.time are in absorb(). It feels like that should represent 68 absorbed degrees of freedom, not 69, but I'm not confident.
You can adjust e(V) to account for this.
reghdfe D.y treat D.c.time#i.group i.time, noabsorb
di e(df_m)
di e(df_a)
di e(df_r)
reghdfe D.y treat D.c.time#i.group, absorb(i.time)
di e(df_m)
di e(df_a)
di e(df_r)
reghdfe D.y treat i.time, absorb(D.c.time#i.group)
di e(df_m)
di e(df_a)
di e(df_r)
reghdfe D.y treat, absorb(D.c.time#i.group i.time)
di e(df_m)
di e(df_a)
di e(df_r)
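* Rescale e(V) as if one fewer degree of freedom had been absorbed
matrix V = e(V)
matrix V = V * e(df_r) / (e(df_r)+1)
* erepost is a user-written command (ssc install erepost)
erepost V = V
estimates replay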
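The rescaling of e(V) in the last lines above can be motivated as follows. This is a sketch, assuming the default unadjusted (homoskedastic) VCE, where the variance estimate is the residual sum of squares divided by the residual degrees of freedom; the symbols \hat{V}, RSS, and df_r are notation introduced here, with df_r standing for e(df_r). Other VCE types apply different small-sample factors, so the same factor may not carry over.

```latex
% Unadjusted VCE: the residual variance is divided by the residual df
\hat{V} = \frac{RSS}{df_r}\,(X'X)^{-1}

% If one degree of freedom too many was counted as absorbed,
% the intended residual df is df_r + 1, so
\hat{V}_{\text{adj}} = \frac{RSS}{df_r + 1}\,(X'X)^{-1}
                     = \hat{V}\cdot\frac{df_r}{df_r + 1}

% Standard errors therefore scale by the square root of that factor
SE_{\text{adj}} = SE \cdot \sqrt{\frac{df_r}{df_r + 1}}
```

Because the factor is slightly below one, the adjusted standard errors are slightly smaller, which is the direction of the gap reported in the example.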
Stata version: 18.0 16jul2024
OS: Windows 10