effective sample size or input sample size? #1

Open
BenWilliams-NOAA opened this issue Oct 2, 2024 · 6 comments

@BenWilliams-NOAA
Member

#' @param Neff matrix of assumed effective sample sizes (same dimension as obs)

Clarification of exactly which sample size to use may prove valuable?
Here is a snippet from Hulson and Williams 2024 where we (continue) to propose some consistency in the terminology, because the literature is all over the place:

• Nominal sample size: the actual sample size obtained for age or length composition data from fishery-independent or fishery-dependent sources.

• Input sample size: the reduced sample size that accounts for overdispersion of age or length composition data used to statistically weight the composition data in SCAA models.

• Effective sample size: the statistic used to measure the difference in fit between SCAA model estimates of age or length composition data and the observed composition data.

• Realized sample size: the sample size that measures the difference between bootstrap estimates of age or length composition and the observed composition for a given bootstrap iteration.
Much of this terminology follows from Thorson et al. (2023).
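To make the ESS definition concrete, here is a minimal R sketch of the McAllister and Ianelli (1997) effective sample size, computed per year from observed and model-expected proportions. The helper name `calc_ess` is hypothetical (not part of this package). It assumes `obs` and `exp` are years × bins matrices of proportions whose rows sum to 1. It returns one value per year, so the result is a vector.

```r
# Hypothetical helper (not part of this package): McAllister-Ianelli (1997)
# effective sample size for each year of a composition matrix.
# obs, exp: matrices of proportions, rows = years, columns = age or length bins.
calc_ess <- function(obs, exp) {
  stopifnot(all(dim(obs) == dim(exp)))
  # Expected multinomial variance summed over bins, divided by the
  # realized squared error between observed and expected proportions.
  rowSums(exp * (1 - exp)) / rowSums((obs - exp)^2)
}

# Toy data: 2 years x 4 bins (illustrative values only).
obs <- matrix(c(0.10, 0.30, 0.40, 0.20,
                0.25, 0.25, 0.30, 0.20),
              nrow = 2, byrow = TRUE)
exp <- matrix(c(0.15, 0.25, 0.35, 0.25,
                0.20, 0.30, 0.30, 0.20),
              nrow = 2, byrow = TRUE)

ess <- calc_ess(obs, exp)
ess  # one value per year: 73 and 148
```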

Also, I believe the sample size is typically a vector rather than a matrix?

@JaneSullivan-NOAA
Collaborator

Good call on the clarification to the documentation, will fix now.

In terms of terminology, I think this is consistent with Hulson and Williams 2024? Effective sample size is what goes into the likelihood, right?

@BenWilliams-NOAA
Member Author

ISS (e.g., nmulti_fish_age in some rockfish .tpl files) is what goes into the likelihood; ESS measures the difference between the model fit and the observations (e.g., effn_fish_age in some of the .tpl files).
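A minimal sketch of where ISS enters the fit, assuming a multinomial likelihood similar in spirit to the ADMB .tpl files mentioned here. The function `comp_nll`, the `tiny` offset, and the convention of subtracting the perfect-fit value are assumptions for illustration, not code from this package or any specific .tpl file. ESS does not appear at all; it would be computed afterward as a diagnostic.

```r
# Hypothetical sketch: multinomial negative log-likelihood kernel for
# composition data, weighted by the input sample size (ISS), one per year.
comp_nll <- function(obs, exp, iss, tiny = 1e-5) {
  stopifnot(nrow(obs) == length(iss))
  # Each year's contribution is scaled by its ISS; subtracting log(obs)
  # makes a perfect fit (exp == obs) return zero.
  -sum(iss * rowSums(obs * (log(exp + tiny) - log(obs + tiny))))
}

# Toy data: 2 years x 4 bins (illustrative values only).
obs <- matrix(c(0.10, 0.30, 0.40, 0.20,
                0.25, 0.25, 0.30, 0.20),
              nrow = 2, byrow = TRUE)
exp <- matrix(c(0.15, 0.25, 0.35, 0.25,
                0.20, 0.30, 0.30, 0.20),
              nrow = 2, byrow = TRUE)

nll_50  <- comp_nll(obs, exp, iss = c(50, 50))
nll_100 <- comp_nll(obs, exp, iss = c(100, 100))  # doubling ISS doubles the weight
```

Because ISS multiplies the likelihood directly, it controls how much the comps pull on the fit, which is why the argument name matters.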

@JaneSullivan-NOAA
Collaborator

Oh, duh! :) I support that change. @Cole-Monnahan-NOAA any objections? I think it makes a lot of sense to be consistent with other packages and the relevant literature.

All this will change is the argument name: run_osa(obs, exp, iss, ...) instead of run_osa(obs, exp, Neff, ...)

@Cole-Monnahan-NOAA
Collaborator

I actually don't follow, and I think it should be ESS. I thought ESS was used in the likelihood? Isn't ESS the reweighted ISS that gets used for the comps? Even if you do variance adjustments like SS3, I think those are equivalent.

When you calculate Pearson residuals, what do you use in the denominator?
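For reference, the usual Pearson residual for multinomial proportions puts the sample size in the denominator: (obs - exp) / sqrt(exp * (1 - exp) / n). A sketch with a hypothetical helper `pearson_resid`, assuming one sample size per year. Residuals scale with sqrt(n), so whether n is the ISS or a reweighted value directly changes their magnitude.

```r
# Hypothetical sketch: Pearson residuals for multinomial composition data.
# obs, exp: years x bins matrices of proportions; n: one sample size per year.
pearson_resid <- function(obs, exp, n) {
  stopifnot(nrow(obs) == length(n))
  # n has length nrow(obs), so R recycles it down each column:
  # element [y, a] is divided by n[y].
  (obs - exp) / sqrt(exp * (1 - exp) / n)
}

# Toy data: 2 years x 4 bins (illustrative values only).
obs <- matrix(c(0.10, 0.30, 0.40, 0.20,
                0.25, 0.25, 0.30, 0.20),
              nrow = 2, byrow = TRUE)
exp <- matrix(c(0.15, 0.25, 0.35, 0.25,
                0.20, 0.30, 0.30, 0.20),
              nrow = 2, byrow = TRUE)
n <- c(100, 400)

r <- pearson_resid(obs, exp, n)
r[2, 1]  # (0.25 - 0.20) / sqrt(0.16 / 400) = 2.5
```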

@JaneSullivan-NOAA
Collaborator

I looked back at what I did when I presented these plots last year for REBS. I did not use the reweighted value for either the Pearson or the OSA calculations (because it is not used in the likelihood; it is just reported). I really could use a class on this.

Given that AFSC hasn't fully caught up with standard practices for data weighting, and the timing for this isn't great, would it be helpful to update the documentation to simply say "the Neff used in the likelihood"? I want to align with Ben and Pete on terminology longer-term, but I don't think it's a high priority for the next three weeks.

@Cole-Monnahan-NOAA
Collaborator

The most important thing is that the N used by the Pearson and OSA residuals is the one used in the likelihood. I'm not familiar with what other folks are doing in their workflows and agree that should be a separate issue. Also note that for the Dirichlet-multinomial (D-M) we need to calculate ESS from the ISS and the theta parameter. I will file a new issue on that.
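A sketch of that D-M conversion, assuming the linear parameterization described by Thorson et al. (2017), where the implied effective sample size is (1 + theta * ISS) / (1 + theta). The helper `dm_ess` is hypothetical, and the saturating parameterization uses a different formula, so check which one the model actually fits.

```r
# Hypothetical helper: effective sample size implied by a linear
# Dirichlet-multinomial likelihood, given the input sample size (iss)
# and the estimated theta (> 0).
dm_ess <- function(iss, theta) {
  stopifnot(all(theta > 0))
  # Approaches 1 as theta -> 0 and approaches iss as theta -> Inf.
  (1 + theta * iss) / (1 + theta)
}

dm_ess(iss = c(50, 100, 200), theta = 0.5)  # 26/1.5, 51/1.5, 101/1.5
```

Because the result is bounded above by the ISS, the D-M can only down-weight the comps relative to the input sample size.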
