Hi, great work and really interesting approach to NLG evaluation!
I was going through your implementation of the paired bootstrap test for estimating the significance of results and noticed an unusual approach to re-sampling, e.g., at the line referenced below (the other testing functions in the same script sample in a similar way):
BARTScore/analysis.py, line 104 in 248f511
According to the literature, bootstrap resampling generally draws samples with replacement, each of the same size as the original data; see, e.g., Koehn (2004). I am not sure how much this affects significance, but for the rather small QAGS datasets in particular it might have some effect.
Did you follow any recommendations when choosing the portion to sub-sample (i.e., the 80%), or is this a more or less arbitrary threshold?
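To make the comparison concrete, here is a minimal sketch of the resampling I have in mind. It is not taken from the repository: the function name `paired_bootstrap`, its parameters, and the use of the mean over per-example scores as the test statistic are all my own assumptions for illustration. With `ratio=1.0` each resample is drawn with replacement at the size of the original data, as in Koehn (2004); `ratio=0.8` only mimics drawing a smaller portion and may differ from what `analysis.py` actually does.

```python
import numpy as np


def paired_bootstrap(scores_a, scores_b, n_resamples=1000, ratio=1.0, seed=0):
    """Return the fraction of resamples in which system A beats system B.

    Hypothetical helper: `scores_a` and `scores_b` hold one score per test
    example for each system, aligned by index (this is what makes it paired).
    """
    scores_a = np.asarray(scores_a, dtype=float)
    scores_b = np.asarray(scores_b, dtype=float)
    if scores_a.shape != scores_b.shape:
        raise ValueError("a paired test needs one score per example for each system")

    n = len(scores_a)
    # ratio=1.0 resamples to the original size; smaller values sub-sample.
    size = max(1, int(round(ratio * n)))
    rng = np.random.default_rng(seed)

    wins = 0
    for _ in range(n_resamples):
        # Draw example indices WITH replacement and reuse the same indices
        # for both systems, so every resample compares them on identical data.
        idx = rng.integers(0, n, size=size)
        if scores_a[idx].mean() > scores_b[idx].mean():
            wins += 1
    return wins / n_resamples
```

Calling it once with `ratio=1.0` and once with `ratio=0.8` on the same scores would give a rough idea of how much the sub-sampling portion changes the outcome on a small dataset.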
Thanks in advance for any insights!
Best,
Dennis