Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

genfile meaning and r^2 #87

Open
SelinaKlees opened this issue Oct 26, 2023 · 1 comment
Open

genfile meaning and r^2 #87

SelinaKlees opened this issue Oct 26, 2023 · 1 comment

Comments

@SelinaKlees
Copy link

Dear Robert,

I have a question regarding the genfile. Actually, it is not totally clear to me if the genotypes of high coverage seq data (i.e. the genfile) is actually used for the impuation itself (as some kind of reference panel) or if it is just used to obtain a quality coefficient (r^2).
Further, in my results plots, I found the r^2 plot and the value but just for the "goodonly", as it is called. How can I obtain the r^2 value for all SNPs?

Thank you for your help,
Selina

@SelinaKlees SelinaKlees changed the title genfile meaning genfile meaning and r^2 Oct 26, 2023
@rwdavies
Copy link
Owner

Hi,

The high coverage genotypes (the genfile) is not used for imputation, it is only used to provide a measure of accuracy during imputation. The reason to include it during imputation is to get an estimate of the accuracy at each iteration, to guide parameter choices, e.g. the number of iterations.

The other plot you mention should be a plot of of the "real" allele frequency (from the pileup) vs the estimated allele frequency (from the imputation). It makes the most sense for SNPs that pass QC.

Nonetheless if you want to plot all SNPs you can take a look at the function

            plotEstimatedAgainstReal(
                outputdir = outputdir,alleleCount=alleleCount,
                estimatedAlleleFrequency=estimatedAlleleFrequency,
                which=passQC,chr=chr,regionName=regionName
            )

and the code for this function here

plotEstimatedAgainstReal <- function(outputdir,alleleCount,estimatedAlleleFrequency,which,chr,regionName) {

you should be able to load the results (RData/EM.a.ll.* as appropriate)
and by sourcing the STITCH/R/functions.R file
then you can call something like

            plotEstimatedAgainstReal(
                outputdir = outputdir,alleleCount=alleleCount,
                estimatedAlleleFrequency=estimatedAlleleFrequency,
                which=rep(TRUE, length(estimatedAlleleFrequency),chr=chr,regionName=regionName
            )

and it should work (though would override your original file)

Hope that helps,
Robbie

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants