Fix description of Kmat in fit_gbcd #6

pcarbo · 2025-01-03T20:30:43Z

Precisely, Kmax is the maximum number of factors added during the initialization step, where we fit flash with a point-Laplace prior. We then apply a nonnegative transform that splits each factor into two. Because one of the factors at this stage is always a nonnegative intercept/baseline factor, at most 2*Kmax - 1 factors enter the next step, where we fit flash with a GB prior and improve the fit using backfit.
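To make the 2*Kmax - 1 count concrete, here is a minimal sketch in plain Python. It is not the gbcd code: `split_nonnegative` is a hypothetical helper, the loadings are made-up numbers, and it assumes the split keeps the positive and negative parts of each factor and drops a part that is entirely zero.

```python
def split_nonnegative(columns):
    """Split each factor into its positive and negative parts.

    Hypothetical helper for illustration only; a part that is
    all zeros is dropped rather than kept as an empty factor.
    """
    parts = []
    for col in columns:
        pos = [max(x, 0.0) for x in col]
        neg = [max(-x, 0.0) for x in col]
        for part in (pos, neg):
            if any(v > 0 for v in part):
                parts.append(part)
    return parts


kmax = 3
# Made-up loadings: one list per factor, one entry per sample.
loadings = [
    [1.0, 2.0, 0.5, 1.5],    # intercept/baseline factor, already nonnegative
    [0.8, -0.3, 0.0, -1.2],  # mixed signs, splits into two parts
    [-0.5, 0.9, -0.7, 0.2],  # mixed signs, splits into two parts
]

split = split_nonnegative(loadings)
print(len(split), 2 * kmax - 1)
```

Under these assumptions the intercept factor contributes one part and every other factor contributes at most two, which is where the upper bound of 2*Kmax - 1 comes from.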

But once we are done fitting flash with the GB prior, there is an additional step that filters out those k's for which l_k and \tilde{l}_k are not consistent (by default, those with a correlation < 0.8), which means the function output can contain fewer than 2*Kmax - 1 factors.
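The filtering step can be sketched as follows. This is only an illustration in plain Python, not the gbcd implementation: `pearson` and `keep_consistent` are hypothetical helpers, the values are made up, and it assumes "correlation" means the Pearson correlation between l_k and \tilde{l}_k.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def keep_consistent(L, L_tilde, threshold=0.8):
    """Return the indices k whose l_k and l~_k have correlation >= threshold."""
    return [
        k
        for k, (l_k, lt_k) in enumerate(zip(L, L_tilde))
        if pearson(l_k, lt_k) >= threshold
    ]


# Made-up example: one list per factor k, one entry per sample.
L = [[1, 0, 1, 0], [0, 1, 1, 0], [1, 1, 0, 0]]
L_tilde = [[0.9, 0.1, 1.0, 0.0], [1, 0, 0, 1], [1.0, 0.8, 0.1, 0.0]]

kept = keep_consistent(L, L_tilde)
print(kept)
```

In this toy input the second factor is dropped because its two versions are anti-correlated, so the output has fewer factors than entered the step.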

If the underlying structure in the data requires far fewer than 2*Kmax - 1 factors to explain, we will find many k's for which l_k and \tilde{l}_k are not consistent, and these will be removed during this step. On the other hand, if the specified Kmax is smaller than needed, this final step will filter out few or no factors, leaving almost all of the 2*Kmax - 1 factors in the function output.

So the resulting number of factors will be at most 2*Kmax - 1, but it can be much smaller than that. The difference depends on the relationship between the specified Kmax and the underlying "true" number of factors needed to explain the data structure. (Again, this is based entirely on my empirical experience with real datasets.) I think I made it clear that Kmax should be interpreted as an approximation of the final K we will get; maybe we can also say that the final K is up to 2*Kmax - 1?

pcarbo added a commit that referenced this issue Jan 3, 2025

Updated description of Kmax addressing Issue #6.

f93909d
