Goodness-of-fit bootstrap documentation #94

Lvulis · 2020-07-10T18:42:30Z

Hi,

There are two minor issues in the bootstrap_p documentation I believe.

I think there is a discrepancy between what algorithm 2 in vignette 1: an introduction to using the poweRlaw package says is the p value and what bootstrap_p computes. It appears bootstrap_p is computing P(KS_sim >= KS_d) but algorithm 2 states P(KS_sim < KS_d), where KS_sim is the distribution of simulated KS statistics and KS_d is the KS-statistic of the observed data with the fitted distribution. I think bootstrap_p is correct and algorithm 2 incorrect.

Algorithm 2 line 9: KS_d > KS_sim, then P = P + 1.

bootstrap_p line 37 (poweRlaw v0.70.2): p = sum(nof[1, ] >= gof_v[["gof"]], na.rm = TRUE)/no_of_sims 
# Where gof_v[["gof"]] is KS_d and nof[1, ] is KS_sim.

The bootstrap procedure specified by Clauset et al. (2009) specifies that for each simulation, a new x_min and alpha should be simulated to prevent getting a biased estimate of the p-value (page 677, paragraph 3). If you pass an argument to pars in bootstrap_p, it will use this argument as the parameters of the fitted distribution when computing KS_sim. A quick simulation shows this leads to 2 different distributions of the KS statistic. Believing Clauset et al., the simulation with the prespecified parameter is biased towards larger values. A quick note in algorithm 2 would be helpful but not necessary! In either case the simualted data is fine but for an estimated KS statistic on the right tail of the distributions, you could mistakenly fail to reject the power law.

library(poweRlaw)
rv <- rplcon(1000, 1, 2)

m <- conpl$new(rv)
est <- estimate_xmin(m)
print(est)
$gof
[1] 0.01396956

$xmin
[1] 1.061891

$pars
[1] 1.975244

$ntail
[1] 951

$distance
[1] "ks"

attr(,"class")
[1] "estimate_xmin"

m$setXmin(est)

pp <- bootstrap_p(m, pars = est$pars, threads = 3, no_of_sims = 2500)
pp2 <- bootstrap_p(m, threads = 3, no_of_sims = 2500)

# pars used to estimate KS
head(pp$bootstraps)
         gof     xmin     pars ntail
1 0.01468929 1.254009 1.975244   825
2 0.01724365 1.207974 1.975244   816
3 0.01835665 1.008667 1.975244   995
4 0.01575556 1.020330 1.975244   979
5 0.02101735 1.002672 1.975244  1000
6 0.01718341 1.975697 1.975244   511
# pars used to estimate KS
head(pp2$bootstraps)
         gof     xmin     pars ntail
1 0.01627703 1.159591 1.971896   880
2 0.01844899 1.195371 1.977631   830
3 0.02075603 1.962142 1.920906   507
4 0.01671231 1.008667 1.973206   991
5 0.02177171 1.015397 1.959509   982
6 0.01786802 2.396719 1.937950   415


plot(density(pp$bootstraps[, 1]), xlim = range(c(pp$bootstraps[, 1], pp2$bootstraps[, 1])), 
     ylim = c(0, 120), lwd = 3, main = bquote(f[x]*(KS[sim])), xlab = "KS-statistic")
grid(lwd = 2, col = rgb(0.05, 0.05, 0.05, 0.5))

abline(v = est$gof, lwd = 2, lty = 2)
lines(density(pp2$bootstraps[, 1]), col = 'red', lwd = 3)
legend("topright", lwd = 3, lty = 1, col = c('black', 'red'),
       legend = c('constant alpha', 'estimated alpha'))

> mean(pp$bootstraps[, 1] >= est$gof)
[1] 0.9316
> mean(pp2$bootstraps[, 1] >= est$gof)
[1] 0.8764

> mean(pp$bootstraps[, 1] >= .028)
[1] 0.1256
> mean(pp2$bootstraps[, 1] >= .028)
[1] 0.0352

The text was updated successfully, but these errors were encountered:

csgillespie · 2025-02-02T21:33:00Z

I realise you opened this issue a long time ago. I don't suppose you could make an MR?

It's been so long since I looked at the "maths" and I don't have the time to get back up to speed.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Goodness-of-fit bootstrap documentation #94

Goodness-of-fit bootstrap documentation #94

Lvulis commented Jul 10, 2020

csgillespie commented Feb 2, 2025

Goodness-of-fit bootstrap documentation #94

Goodness-of-fit bootstrap documentation #94

Comments

Lvulis commented Jul 10, 2020

csgillespie commented Feb 2, 2025