Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Goodness-of-fit bootstrap documentation #94

Open
Lvulis opened this issue Jul 10, 2020 · 1 comment
Open

Goodness-of-fit bootstrap documentation #94

Lvulis opened this issue Jul 10, 2020 · 1 comment

Comments

@Lvulis
Copy link

Lvulis commented Jul 10, 2020

Hi,

There are two minor issues in the bootstrap_p documentation I believe.

  1. I think there is a discrepancy between what algorithm 2 in vignette 1: an introduction to using the poweRlaw package says is the p value and what bootstrap_p computes. It appears bootstrap_p is computing P(KS_sim >= KS_d) but algorithm 2 states P(KS_sim < KS_d), where KS_sim is the distribution of simulated KS statistics and KS_d is the KS-statistic of the observed data with the fitted distribution. I think bootstrap_p is correct and algorithm 2 incorrect.

Algorithm 2 line 9: KS_d > KS_sim, then P = P + 1.

bootstrap_p line 37 (poweRlaw v0.70.2): p = sum(nof[1, ] >= gof_v[["gof"]], na.rm = TRUE)/no_of_sims 
# Where gof_v[["gof"]] is KS_d and nof[1, ] is KS_sim. 
  1. The bootstrap procedure specified by Clauset et al. (2009) specifies that for each simulation, a new x_min and alpha should be simulated to prevent getting a biased estimate of the p-value (page 677, paragraph 3). If you pass an argument to pars in bootstrap_p, it will use this argument as the parameters of the fitted distribution when computing KS_sim. A quick simulation shows this leads to 2 different distributions of the KS statistic. Believing Clauset et al., the simulation with the prespecified parameter is biased towards larger values. A quick note in algorithm 2 would be helpful but not necessary! In either case the simualted data is fine but for an estimated KS statistic on the right tail of the distributions, you could mistakenly fail to reject the power law.
library(poweRlaw)
rv <- rplcon(1000, 1, 2)

m <- conpl$new(rv)
est <- estimate_xmin(m)
print(est)
$gof
[1] 0.01396956

$xmin
[1] 1.061891

$pars
[1] 1.975244

$ntail
[1] 951

$distance
[1] "ks"

attr(,"class")
[1] "estimate_xmin"

m$setXmin(est)

pp <- bootstrap_p(m, pars = est$pars, threads = 3, no_of_sims = 2500)
pp2 <- bootstrap_p(m, threads = 3, no_of_sims = 2500)

# pars used to estimate KS
head(pp$bootstraps)
         gof     xmin     pars ntail
1 0.01468929 1.254009 1.975244   825
2 0.01724365 1.207974 1.975244   816
3 0.01835665 1.008667 1.975244   995
4 0.01575556 1.020330 1.975244   979
5 0.02101735 1.002672 1.975244  1000
6 0.01718341 1.975697 1.975244   511
# pars used to estimate KS
head(pp2$bootstraps)
         gof     xmin     pars ntail
1 0.01627703 1.159591 1.971896   880
2 0.01844899 1.195371 1.977631   830
3 0.02075603 1.962142 1.920906   507
4 0.01671231 1.008667 1.973206   991
5 0.02177171 1.015397 1.959509   982
6 0.01786802 2.396719 1.937950   415


plot(density(pp$bootstraps[, 1]), xlim = range(c(pp$bootstraps[, 1], pp2$bootstraps[, 1])), 
     ylim = c(0, 120), lwd = 3, main = bquote(f[x]*(KS[sim])), xlab = "KS-statistic")
grid(lwd = 2, col = rgb(0.05, 0.05, 0.05, 0.5))

abline(v = est$gof, lwd = 2, lty = 2)
lines(density(pp2$bootstraps[, 1]), col = 'red', lwd = 3)
legend("topright", lwd = 3, lty = 1, col = c('black', 'red'),
       legend = c('constant alpha', 'estimated alpha'))

> mean(pp$bootstraps[, 1] >= est$gof)
[1] 0.9316
> mean(pp2$bootstraps[, 1] >= est$gof)
[1] 0.8764

> mean(pp$bootstraps[, 1] >= .028)
[1] 0.1256
> mean(pp2$bootstraps[, 1] >= .028)
[1] 0.0352

image

@csgillespie
Copy link
Owner

I realise you opened this issue a long time ago. I don't suppose you could make an MR?

It's been so long since I looked at the "maths" and I don't have the time to get back up to speed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants