no vcf file provided - SNPs with called genotype are imputed #97
Comments
Hi, What's the depth of coverage? (And how many samples?) If the depth of coverage is higher, you might not want to use STITCH. If the depth of coverage is lower, you might not want to trust the genotypes called directly from the WGS. STITCH does not have a way to impute only some missing genotypes. If this is something like RADseq or GBS or similar, my suggestion would be to run everything through STITCH, then make a merged set with the SNPs and genotypes from the RADseq, and for SNPs that do not meet a certain QC filter in the RADseq or GBS, use the imputed ones. Thanks,
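The merge suggested above could be sketched roughly as follows. This is only an illustration in plain Python, not something STITCH provides: the function name `merge_calls`, the dictionary layout, and the per-site `passes_qc` flag are all assumptions made for the example, and the QC rule itself is left to you.

```python
from typing import Dict, Optional, Tuple

Site = Tuple[str, int]                 # (chromosome, position)
Genotypes = Dict[str, Optional[str]]   # sample name -> genotype string, or None if missing


def merge_calls(called: Dict[Site, Genotypes],
                imputed: Dict[Site, Genotypes],
                passes_qc: Dict[Site, bool]) -> Dict[Site, Genotypes]:
    """Hypothetical helper: keep directly called genotypes at sites that
    pass QC, and fall back to imputed genotypes everywhere else."""
    merged: Dict[Site, Genotypes] = {}
    for site in sorted(set(called) | set(imputed)):
        if site in called and passes_qc.get(site, False):
            # Site passed the (user-defined) QC filter: trust the direct calls.
            merged[site] = dict(called[site])
        elif site in imputed:
            # Site failed QC or was never called: use the imputed genotypes.
            merged[site] = dict(imputed[site])
        # Sites that fail QC and have no imputed genotypes are dropped.
    return merged


# Toy input, invented for illustration only.
called = {("chr1", 100): {"s1": "0/1", "s2": "0/0"},
          ("chr1", 200): {"s1": "1/1", "s2": None}}
imputed = {("chr1", 100): {"s1": "0/0", "s2": "0/0"},
           ("chr1", 200): {"s1": "0/1", "s2": "0/1"},
           ("chr1", 300): {"s1": "0/0", "s2": "1/1"}}
passes_qc = {("chr1", 100): True, ("chr1", 200): False}

merged = merge_calls(called, imputed, passes_qc)
```

In a real pipeline the three dictionaries would be filled from the two VCFs with whatever VCF reader you already use; the sketch only shows the per-site decision.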
Hi Robbie,
If you have data at high coverage (>10X), you probably don't need to impute, provided you can tolerate a moderate missing data rate after filtering out genotypes with low GQ (say below 10 or 20). I would say that in its primary purpose, STITCH is neither a variant caller nor designed for imputation of individual missing genotypes. It's designed for quite low coverages (<2X), where individual genotyping of variants in samples is impossible. It also doesn't do variant calling per se, though it can help determine which variants are likely true positives: those that agree with their imputed background (i.e. have a high INFO score). Hope that helps. One last comment: 96 samples is good, but at 0.5X you might see much better accuracy if you imputed many more samples (e.g. 1000 samples). So I would take any results you get at the lower coverage as advisory rather than definitive, if that makes sense (i.e. assume things might get better with more samples; see the STITCH paper, we have a figure about this).
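For the high-coverage route, the GQ filter mentioned above amounts to something like the sketch below. The function names and the `(genotype, GQ)` tuple layout are made up for the example, and the default threshold of 20 is just one of the values suggested above, not a recommendation from STITCH.

```python
from typing import List, Optional, Tuple

Call = Tuple[Optional[str], Optional[int]]  # (genotype string, GQ); None if absent


def mask_low_gq(calls: List[Call], min_gq: int = 20) -> List[Optional[str]]:
    """Hypothetical helper: set genotypes with GQ below min_gq to missing."""
    return [gt if gt is not None and gq is not None and gq >= min_gq else None
            for gt, gq in calls]


def missing_rate(genotypes: List[Optional[str]]) -> float:
    """Fraction of genotypes that are missing (None)."""
    if not genotypes:
        return 0.0
    return sum(g is None for g in genotypes) / len(genotypes)


# Toy calls for one SNP across four samples, invented for illustration.
calls = [("0/1", 35), ("0/0", 8), ("1/1", 20), (None, None)]
masked = mask_low_gq(calls, min_gq=20)
rate = missing_rate(masked)
```

Checking `rate` per SNP (or per sample) after masking is one way to judge whether the resulting missing data rate is still tolerable before deciding that imputation is needed at all.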
Thank you for the comment and advice! Yes, the low sample size will definitely be a discussion point in the manuscript.
Hello,
I have WGS data, including the BAM files and the called SNPs as a VCF. I have now used STITCH to impute my SNPs based on the BAM files and the SNP positions. However, I realized that since I am not providing the VCF, STITCH also imputes genotypes that are actually not missing in the VCF. As a result, many genotypes differ between the called SNPs and the ones obtained from STITCH. Actually, I only want to impute the genotypes that are missing in my VCF. Is there a way to do that with STITCH?
Best,
Selina