Unusual method for population re-sampling in bootstrap #34

Open
dennlinger opened this issue Feb 13, 2023 · 0 comments


Hi, great work and really interesting approach to NLG evaluation!
While going through your implementation of the paired bootstrap tests used to estimate the significance of results, I noticed that the population is re-sampled in an unusual way, e.g., in the line below (the other testing functions in the same script sample similarly):

sub_ids = doc_ids[:int(0.8 * len(doc_ids))]

According to the literature, bootstrap resampling generally draws samples with replacement until the resample has the same size as the original data, see, e.g., (Koehn, 2004). I am not sure how much this affects the significance estimates, but it might have some effect, especially for the rather small QAGS datasets.
Did you follow any recommendations when choosing the sub-sample size (i.e., the 80%), or is this a more or less arbitrary threshold?
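
For reference, here is a minimal sketch of the resampling I would have expected, in contrast to the 80% slice above. It is not taken from your repository: the function name `paired_bootstrap`, its parameters, and the choice of the mean as the compared statistic are all my own assumptions for illustration.

```python
import random


def paired_bootstrap(scores_a, scores_b, n_resamples=1000, seed=0):
    """Return the fraction of resamples in which A's mean score beats B's.

    Hypothetical helper for illustration only; scores_a[i] and scores_b[i]
    are assumed to be the two systems' scores on the same document i.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("paired scores must have the same length")
    rng = random.Random(seed)
    n = len(scores_a)
    wins = 0
    for _ in range(n_resamples):
        # Draw n indices WITH replacement, so each resample has the same
        # size as the original data (some documents repeat, some are absent).
        ids = rng.choices(range(n), k=n)
        mean_a = sum(scores_a[i] for i in ids) / n
        mean_b = sum(scores_b[i] for i in ids) / n
        if mean_a > mean_b:
            wins += 1
    return wins / n_resamples
```

Calling `paired_bootstrap(scores_a, scores_b)` on two per-document score lists would then give the share of resamples in which the first system wins, which is the quantity a paired bootstrap test typically thresholds.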

Thanks in advance for any insights!
Best,
Dennis
