Added asCategorical() #211

Merged
merged 4 commits into from
Dec 25, 2024

gregorgorjanc
Contributor

Here is a simple function that converts a normal (Gaussian) trait to a categorical (threshold) trait following the ordered probit model.
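
As a rough illustration of the ordered probit idea described above, the following Python sketch cuts latent Gaussian values into ordered categories. It is not the asCategorical() code from this pull request; the name `as_categorical` and its arguments are hypothetical, and the thresholds are assumed to be the interior cut points only.

```python
from bisect import bisect_right


def as_categorical(values, thresholds):
    """Map latent Gaussian values to ordered categories 1..K.

    `thresholds` holds the K-1 interior cut points (assumed layout).
    A value lands in category k when it is at or above exactly k-1 cut points.
    """
    cuts = sorted(thresholds)
    # bisect_right counts how many cut points are <= x
    return [bisect_right(cuts, x) + 1 for x in values]


liability = [-1.3, -0.2, 0.1, 0.9, 2.4]
scores = as_categorical(liability, thresholds=[0.0, 1.0])
print(scores)  # [1, 1, 2, 2, 3]
```

Two thresholds give three categories; the latent values themselves are left untouched, so the liability scale remains available alongside the scores.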

I have also added stub code for asCount() for later work.

@gregorgorjanc
Contributor Author

This contributes towards #178 and HighlanderLab/SIMplyBee#453

@gregorgorjanc
Contributor Author

Thinking a bit more about this, maybe we also need the ability to supply a vector of probabilities (instead of thresholds) from which the thresholds are calculated, as @gaynorr suggested in #178.

@gaynorr did you have in mind that such functionality would come as part of the SimParam trait specifications? It would be cool, but we would lose the liability scale values if the pheno columns were converted to scores.

@gregorgorjanc
Contributor Author

I have now expanded the asCategorical() so the user can provide either:

  1. a p vector of category probabilities, together with the mean and var of the trait, from which the required thresholds are calculated
  2. a threshold vector of thresholds, used directly
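
The first option above can be sketched as follows: cumulative category probabilities are passed through the inverse normal CDF for the given mean and variance. This is a minimal Python sketch under stated assumptions, not the merged R code; `thresholds_from_probs` and its argument names are hypothetical, and every category probability is assumed to be strictly positive.

```python
from itertools import accumulate
from statistics import NormalDist


def thresholds_from_probs(p, mean=0.0, var=1.0):
    """Return the K-1 interior thresholds implied by K category probabilities.

    Assumes every entry of `p` is > 0 and that the entries sum to 1.
    """
    if abs(sum(p) - 1.0) > 1e-8:
        raise ValueError("category probabilities must sum to 1")
    dist = NormalDist(mu=mean, sigma=var ** 0.5)
    # Cumulative probabilities, dropping the final entry (which equals 1)
    cumulative = list(accumulate(p))[:-1]
    return [dist.inv_cdf(c) for c in cumulative]


# Three categories with probabilities 0.2, 0.5, 0.3 on a trait with
# mean 10 and variance 4 (illustrative numbers only)
cuts = thresholds_from_probs([0.2, 0.5, 0.3], mean=10.0, var=4.0)
print(cuts)
```

The resulting cut points could then be used exactly like a user-supplied threshold vector, which is why the two inputs are interchangeable from the caller's point of view.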

@gaynorr
Owner

gaynorr commented Dec 16, 2024

I haven't quite fully thought through how to implement these types of traits. The easiest thing to do is process regular traits using these functions.

My next possible iteration was to incorporate the threshold information in the trait definitions themselves. Then the traits would be automatically reported on the correct scale with setPheno. I see GV continuing to hold the latent variable. I'm just not sure what to do about variance calculations. It can easily be left on the scale of the latent variable, but this wouldn't connect too well to the trait as it's being observed.

Putting thresholds in the trait definition also has the benefit of abstracting away a lot of math for the user. The user would probably set items like heritability and prevalence as in the example.
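
To make the heritability and prevalence idea concrete, here is a hedged Python sketch of how those two user-facing quantities could determine the hidden pieces of a binary threshold trait. It assumes a liability with total variance 1 and mean 0; the function names are hypothetical and this is not how AlphaSimR trait definitions are implemented.

```python
import random
from statistics import NormalDist


def liability_setup(h2, prevalence):
    """Derive variance components and the threshold on a unit-variance liability."""
    # Individuals above the threshold are "affected", so P(liability > t) = prevalence
    threshold = NormalDist().inv_cdf(1.0 - prevalence)
    return {"var_g": h2, "var_e": 1.0 - h2, "threshold": threshold}


def simulate_affected(n, h2, prevalence, seed=1):
    """Simulate n binary phenotypes from independent genetic and residual parts."""
    rng = random.Random(seed)
    s = liability_setup(h2, prevalence)
    affected = []
    for _ in range(n):
        g = rng.gauss(0.0, s["var_g"] ** 0.5)  # latent genetic value
        e = rng.gauss(0.0, s["var_e"] ** 0.5)  # residual on the latent scale
        affected.append(int(g + e > s["threshold"]))
    return affected


status = simulate_affected(20000, h2=0.3, prevalence=0.1)
print(sum(status) / len(status))  # should be close to the requested prevalence
```

The user only supplies `h2` and `prevalence`; the threshold and the residual variance follow from them, which is the kind of math that could be abstracted away.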

@gregorgorjanc
Copy link
Contributor Author

GV is just the genetic part. PV is the “full” latent variable that is then “cut” into categories. So all variance calculations could be done with the Gaussian latent variables and can likely be left as they are. This assumes knowledge of the latent h2 and not the observed h2 (à la Falconer…).
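
For the binary case, the gap between latent and observed heritability mentioned above can be illustrated with the classical liability-to-observed-scale transformation, h2_obs = h2_liab * z^2 / (K (1 - K)), where K is the prevalence and z is the standard normal density at the threshold. The Python sketch below is only an illustration of that formula for a two-category trait; `observed_h2` is a hypothetical helper and is not part of this pull request.

```python
from statistics import NormalDist


def observed_h2(h2_liability, prevalence):
    """Convert liability-scale h2 to the observed 0/1 scale for a binary trait."""
    std = NormalDist()
    t = std.inv_cdf(1.0 - prevalence)  # threshold on the standard liability scale
    z = std.pdf(t)                     # normal density at the threshold
    return h2_liability * z ** 2 / (prevalence * (1.0 - prevalence))


for k in (0.5, 0.1, 0.01):
    print(k, observed_h2(0.3, k))
```

The observed-scale value is always smaller than the latent one and shrinks as the trait becomes rarer, which is one reason to keep variance calculations on the latent scale.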

Setting version to devel flag
@gaynorr gaynorr merged commit f030da2 into gaynorr:devel Dec 25, 2024
2 participants