Ex. 9.5 #236
Comments
binary greedy partition (without pruning)

For simplicity, I wrote a no-pruning version from scratch:

julia> includet("df_regtree.jl")
julia> [rep_calc_df(maxdepth=i) for i=0:4]
5-element Vector{Tuple{Float64, Float64}}:
 (1.080629443872466, 0.041187110826985264)
 (9.996914473623567, 0.11093327184178654)
 (21.34913882167176, 0.14589493792386243)
 (34.57260013395476, 0.2469943540268895)
 (47.90389232392168, 0.31056757408950914)

the number of terminal nodes | call |
Similar experiments appear in Ye (1998):

Ye, J. (1998). On Measuring and Correcting the Effects of Data Mining and Model Selection. Journal of the American Statistical Association, 93(441), 120–131. https://doi.org/10.2307/2669609

A close result is reported if I set m=19:

> mean(replicate(10, calc_df(m=19)))
[1] 60.31616
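Neither `df_regtree.jl` nor `calc_df` is shown in this thread, so the following is only a minimal Python sketch of this kind of experiment, not the author's code. It assumes a fixed Gaussian design, pure-noise responses with mean 0 and variance 1, and the covariance definition of degrees of freedom, estimated by averaging the sum of `y[i] * yhat[i]` over repeated draws of the response. The names `best_split`, `fit_tree`, and `estimate_df`, and the defaults `n=50`, `p=5`, `reps=200`, are hypothetical choices made for illustration.

```python
import random


def best_split(X, y, idx):
    """Greedy search over all features and cut points for the rows in idx.

    Returns (score, feature, threshold), or None if no valid split exists.
    """
    n = len(idx)
    total = sum(y[i] for i in idx)
    best = None
    for j in range(len(X[0])):
        order = sorted(idx, key=lambda i: X[i][j])
        left_sum = 0.0
        for k in range(1, n):
            left_sum += y[order[k - 1]]
            lo, hi = X[order[k - 1]][j], X[order[k]][j]
            if lo == hi:
                continue  # cannot split between equal values
            # Maximizing this score is equivalent to minimizing the
            # residual sum of squares of the two child means.
            score = left_sum ** 2 / k + (total - left_sum) ** 2 / (n - k)
            if best is None or score > best[0]:
                best = (score, j, lo)  # rows with X <= lo go left
    return best


def fit_tree(X, y, idx, depth, yhat):
    """Grow a tree to the given depth (no pruning); write fitted values into yhat."""
    split = best_split(X, y, idx) if depth > 0 and len(idx) > 1 else None
    if split is None:
        mean = sum(y[i] for i in idx) / len(idx)
        for i in idx:
            yhat[i] = mean
        return
    _, j, t = split
    fit_tree(X, y, [i for i in idx if X[i][j] <= t], depth - 1, yhat)
    fit_tree(X, y, [i for i in idx if X[i][j] > t], depth - 1, yhat)


def estimate_df(depth, n=50, p=5, reps=200, seed=0):
    """Monte Carlo estimate of sum_i Cov(yhat_i, y_i) with sigma^2 = 1."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]  # fixed design
    idx = list(range(n))
    total = 0.0
    for _ in range(reps):
        y = [rng.gauss(0, 1) for _ in range(n)]  # pure noise, independent of X
        yhat = [0.0] * n
        fit_tree(X, y, idx, depth, yhat)
        # E[y_i] = 0 here, so y_i * yhat_i is an unbiased covariance term.
        total += sum(a * b for a, b in zip(y, yhat))
    return total / reps


if __name__ == "__main__":
    for d in range(3):
        print(d, round(estimate_df(d), 2))
```

With `depth=0` the fit is the sample mean, so the estimate should sit near 1. With `depth=1` there are two terminal nodes, but the split is chosen by looking at `y`, so the estimate should come out clearly above 2.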
Thanks for your great solution. May I ask why the estimated degrees of freedom are so far from the one in theory?
@litsh What do you mean by "the one in theory"? Do you mean the number of nodes? Actually, what I am trying to say here is that the number of nodes is not the theoretical degrees of freedom. There is a gap, and the gap is referred to as the search cost.

If you are interested, you can check the paper on the excess degrees of freedom, which compares the lasso and best subset regression: Tibshirani, Ryan J. “Degrees of Freedom and Model Search.” Statistica Sinica 25, no. 3 (2015): 1265–96.

I also discussed the search cost in degrees of freedom for more methods (including the regression tree here) in my paper: https://arxiv.org/abs/2308.13630
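To make the gap concrete, here is the definition presumably in use. The thread does not quote it, so treat it as an assumption: it is the covariance definition of degrees of freedom, with $\sigma^2$ the noise variance and $\hat y_i$ the fitted value at observation $i$.

$$
\mathrm{df}(\hat y) = \frac{1}{\sigma^2} \sum_{i=1}^{n} \operatorname{Cov}(\hat y_i, y_i)
$$

If the partition into $m$ terminal nodes were fixed in advance, the fit would be linear, $\hat y = H y$, with $H$ a projection of rank $m$, so that $\mathrm{df} = \operatorname{tr}(H) = m$. When the partition is instead chosen greedily from $y$, the covariance is larger, and the excess can be written as

$$
\text{search cost} = \mathrm{df}(\hat y) - m .
$$

Taking the Julia output earlier in the thread at `maxdepth=1`, and assuming a depth-1 tree has $m = 2$ terminal nodes, the gap is roughly $9.997 - 2 \approx 8.0$. At `maxdepth=0` nothing is searched, and the estimate of about $1.08$ is close to the single terminal node.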
Thank you for your reply! I will read the paper.
Thaison