Using with B37 #4

Open
ttbek opened this issue Nov 19, 2019 · 0 comments
ttbek commented Nov 19, 2019

I'm attempting to use this pipeline with BAM files that are aligned to B37.

1000G_omni2.5.b38.sites.PASS.vcf.gz
1000G_omni2.5.b38.sites.PASS.vcf.gz.tbi
dbsnp_142.b38.vcf.gz
dbsnp_142.b38.vcf.gz.tbi
hapmap_3.3.b38.sites.vcf.gz
hapmap_3.3.b38.sites.vcf.gz.tbi
These seem to have direct analogues available at ftp://[email protected]/gotcloud/ref/hs37d5-db142-v1.tgz

hg38.centromere.bed.gz
This file seems simple enough, but I'm not sure I have a good B37 equivalent. I could construct a file in the same format using the info at https://hgdownload.soe.ucsc.edu/goldenPath/hg19/database/cytoBand.txt.gz , but its resolution seems to be only 100,000 bp, e.g.:
chr1 121500000 125000000 p11.1 acen
chr1 125000000 128900000 q11 acen
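
A minimal sketch of how the `acen` rows could be collapsed into one centromere interval per chromosome. It assumes the cytoBand columns are chromosome, start, end, band name, and stain, as in the two lines above. The function name `centromere_intervals` and the `strip_chr` option are hypothetical, and whether the pipeline expects exactly this three-column layout is a guess, since the format of hg38.centromere.bed.gz is not shown here.

```python
def centromere_intervals(cytoband_lines, strip_chr=True):
    """Merge all 'acen' bands of a chromosome into one (chrom, start, end) tuple.

    strip_chr=True turns UCSC-style names ("chr1") into B37-style names ("1").
    """
    spans = {}  # chrom -> (min start, max end); dicts keep insertion order
    for line in cytoband_lines:
        fields = line.split()
        # Skip short lines and every band that is not centromeric.
        if len(fields) < 5 or fields[4] != "acen":
            continue
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        if strip_chr and chrom.startswith("chr"):
            chrom = chrom[3:]
        lo, hi = spans.get(chrom, (start, end))
        spans[chrom] = (min(lo, start), max(hi, end))
    return [(chrom, lo, hi) for chrom, (lo, hi) in spans.items()]


def to_bed(intervals):
    """Format the intervals as tab-separated BED text."""
    return "".join(f"{c}\t{s}\t{e}\n" for c, s, e in intervals)
```

The input can be any iterable of text lines, for example a file opened with `gzip.open("cytoBand.txt.gz", "rt")`. The merged intervals are only as precise as the cytoband boundaries themselves.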

I am having trouble finding B37 versions of the following:
HGDP_938.b38.genotypes.svd.fUD.gz
HGDP_938.b38.genotypes.svd.V.gz
I've figured out that they are for Cramore's cram-verify-bam:
"== Options for input SVD files ==
--svd [STR: ] : Prefix of SVD (.fUD, .V) files
--num-PC [INT: 2] : Number of PCs to use"
But I don't know how I would generate these SVD files. Are they the result of some kind of singular value decomposition on the HGDP data, for ancestry?

I've figured out that the md5 folder is also for Cramore, but I'm not sure how to create it for the B37 data. Or is it generated automatically? (I tried a run, even though I knew my B37 input would fail, to see where that might happen.)

HGDP_938.hg38.sites.vcf.gz: this appears to be the unrelated sample set from HGDP, but the HGDP data are on B36.1 and this file is on B38. Is there a B37 version, or should I create one with liftover? I can easily generate the equivalents of HGDP_938.hg38.sites.vcf.gz.csi and HGDP_938.hg38.sites.vcf.gz.tbi once I have my B37 equivalent of HGDP_938.hg38.sites.vcf.gz.

I have the analogs of hs38DH.fa and hs38DH.fa.fai, and I guess I can create my B37 equivalent of hs38DH.dict with Picard.
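
For orientation, a sequence dictionary is a SAM-style header with one `@SQ` line per reference sequence. The sketch below builds such lines from FASTA text and is not a replacement for Picard, whose output may carry additional fields. The function name `sequence_dict_lines` is hypothetical, and the MD5 is computed over the uppercased sequence with whitespace removed, which is my reading of the SAM specification's `M5` tag.

```python
import hashlib


def sequence_dict_lines(fasta_lines):
    """Yield an @HD line, then one @SQ line (SN, LN, M5) per FASTA record."""
    yield "@HD\tVN:1.6"
    name, length, md5 = None, 0, None
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if name is not None:
                yield f"@SQ\tSN:{name}\tLN:{length}\tM5:{md5.hexdigest()}"
            name = line[1:].split()[0]  # sequence name ends at first whitespace
            length, md5 = 0, hashlib.md5()
        elif line and name is not None:
            seq = line.upper()
            length += len(seq)
            md5.update(seq.encode("ascii"))
    if name is not None:
        yield f"@SQ\tSN:{name}\tLN:{length}\tM5:{md5.hexdigest()}"
```

Passing an open FASTA file handle streams the reference line by line, so the whole genome never has to sit in memory.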

hs38DH.gc.w150.s5.gz and hs38DH.gc.w150.s5.gz.gzi: I guess these are somehow analogous to hs37d5.winsize100.gc, but with a different window size, so perhaps I can get away with gzipping and indexing that file (what tool is used for the .gzi index?). https://genome.sph.umich.edu/wiki/GotCloud:_Genetic_Reference_and_Resource_Files#Reference_fasta_Files suggests that I can generate this with qplot. Does qplot take an argument for the window size?
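
To make the window-size question concrete, here is a minimal sketch of a sliding-window GC computation. Reading `w150.s5` as "window of 150 bp, step of 5 bp" is only a guess from the file name, the function name `gc_windows` is hypothetical, and the output is not in the format of either resource file, which is not described here.

```python
def gc_windows(seq, window=150, step=5):
    """Yield (start, gc_fraction) for every full window of `seq`.

    Bases other than G and C (including N) count as non-GC.
    """
    seq = seq.upper()
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        gc = chunk.count("G") + chunk.count("C")
        yield start, gc / window
```

For example, `list(gc_windows("GGCCAATT", window=4, step=2))` gives the GC fraction at offsets 0, 2, and 4.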

Sorry for all the questions. Let me know if there is a better place to ask them, a person to email, etc.
