I'm attempting to use this pipeline with BAM files that are aligned to B37.
1000G_omni2.5.b38.sites.PASS.vcf.gz
1000G_omni2.5.b38.sites.PASS.vcf.gz.tbi
dbsnp_142.b38.vcf.gz
dbsnp_142.b38.vcf.gz.tbi
hapmap_3.3.b38.sites.vcf.gz
hapmap_3.3.b38.sites.vcf.gz.tbi
These seem to have direct B37 analogues available at ftp://[email protected]/gotcloud/ref/hs37d5-db142-v1.tgz
hg38.centromere.bed.gz
This file seems simple enough, but I'm not sure I have a good B37 equivalent. I could construct a file in the same format using the info at https://hgdownload.soe.ucsc.edu/goldenPath/hg19/database/cytoBand.txt.gz , but its resolution seems to be only 100,000 bp, e.g.
chr1 121500000 125000000 p11.1 acen
chr1 125000000 128900000 q11 acen
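A minimal sketch of that construction, assuming the target file is a plain three-column BED (chromosome, start, end) and that the B37 reference uses contig names without the `chr` prefix, as hs37d5 does. The function name `centromere_intervals` and the `strip_chr` parameter are made up for illustration; the actual layout of hg38.centromere.bed.gz should be checked before relying on this.

```python
from collections import OrderedDict


def centromere_intervals(cytoband_lines, strip_chr=True):
    """Merge the 'acen' bands of each chromosome into one (chrom, start, end)."""
    merged = OrderedDict()
    for line in cytoband_lines:
        fields = line.split()
        # cytoBand columns: chrom, start, end, band name, stain
        if len(fields) < 5 or fields[4] != "acen":
            continue
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        if strip_chr and chrom.startswith("chr"):
            chrom = chrom[3:]  # assumption: B37 contigs are named 1, 2, ..., X, Y
        if chrom in merged:
            lo, hi = merged[chrom]
            merged[chrom] = (min(lo, start), max(hi, end))
        else:
            merged[chrom] = (start, end)
    return [(c, s, e) for c, (s, e) in merged.items()]


# The two acen rows quoted above, plus one non-centromere row that is skipped.
example = [
    "chr1\t121500000\t125000000\tp11.1\tacen",
    "chr1\t125000000\t128900000\tq11\tacen",
    "chr1\t128900000\t142600000\tq12\tgvar",
]
for chrom, start, end in centromere_intervals(example):
    print(f"{chrom}\t{start}\t{end}")
```

For real data the lines would come from `gzip.open("cytoBand.txt.gz", "rt")`. The coarse band boundaries carry over unchanged, so this does not address the resolution concern.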
I am having trouble finding B37 versions of the following:
HGDP_938.b38.genotypes.svd.fUD.gz
HGDP_938.b38.genotypes.svd.V.gz
I've figured out that they are for Cramore's cram-verify-bam:
"== Options for input SVD files ==
--svd [STR: ] : Prefix of SVD (.fUD, .V) files
--num-PC [INT: 2] : Number of PCs to use"
But I don't know how I would generate these SVD files. Are they the result of some kind of singular value decomposition on the HGDP data for ancestry?
I've figured out that the md5 folder is also for Cramore, but I'm not sure how to create it for the B37 data. Or is it generated automatically? (I tried a run to see where, even though I knew my B37 input would fail.)
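If the md5 folder turns out to be an htslib-style reference cache (an assumption, nothing above confirms it), each entry would be one contig's sequence stored under the MD5 of its uppercased, whitespace-free bases, which is the same value the SAM `M5` tag records. The sketch below computes those checksums from FASTA lines; `sequence_md5s` and `cache_path` are hypothetical helper names, and the `%2s/%2s/%s` directory layout is htslib's usual cache pattern rather than something verified against this pipeline.

```python
import hashlib


def sequence_md5s(fasta_lines):
    """Yield (name, length, md5) per FASTA record.

    The digest covers the uppercased sequence with all whitespace removed.
    """
    name, digest, length = None, None, 0
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, length, digest.hexdigest()
            header = line[1:].split()
            name = header[0] if header else ""
            digest, length = hashlib.md5(), 0
        elif name is not None and line:
            seq = "".join(line.split()).upper()
            digest.update(seq.encode("ascii"))
            length += len(seq)
    if name is not None:
        yield name, length, digest.hexdigest()


def cache_path(md5, root="md5"):
    """Assumed layout: <root>/<first 2 hex chars>/<next 2>/<remaining 28>."""
    return f"{root}/{md5[:2]}/{md5[2:4]}/{md5[4:]}"


# Tiny in-memory FASTA; a real run would iterate over open("hs37d5.fa").
demo = [">1 example contig", "acgt", "ACGT", ">2", "NN"]
for name, length, md5 in sequence_md5s(demo):
    print(name, length, cache_path(md5))
```

Comparing one computed checksum against the `M5` value in an existing `@SQ` header line for the same contig is a quick way to check that the hashing convention matches.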
HGDP_938.hg38.sites.vcf.gz: this looks like the unrelated set from HGDP, but those data are B36.1 and this file is B38. Is there a B37 version, or should I create one with liftover? HGDP_938.hg38.sites.vcf.gz.csi and HGDP_938.hg38.sites.vcf.gz.tbi I can easily generate once I have my B37 equivalent of HGDP_938.hg38.sites.vcf.gz.
hs38DH.fa and hs38DH.fa.fai: I have the analogs of these, and I guess I can create my B37 equivalent of hs38DH.dict with Picard.
hs38DH.gc.w150.s5.gz and hs38DH.gc.w150.s5.gz.gzi: I guess these are somehow analogous to hs37d5.winsize100.gc, but with a different window size, so perhaps I can get away with gzipping and indexing that file (what tool is used for the .gzi index?). https://genome.sph.umich.edu/wiki/GotCloud:_Genetic_Reference_and_Resource_Files#Reference_fasta_Files suggests that I can generate this with qplot. Does that take an argument for the window size?
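To make the window-size question concrete, here is a rough sketch of what a windowed GC profile is, reading `w150.s5` as a 150 bp window advanced in 5 bp steps. That reading is a guess from the file name. The on-disk format the pipeline expects, and how it treats `N` bases, are not known here, so this only illustrates the computation and is not a replacement for qplot or whatever tool produced the original file. `gc_windows` is a made-up name.

```python
def gc_windows(seq, window=150, step=5):
    """Return (start, gc_fraction) for each full window along seq.

    Windows are `window` bases long and start every `step` bases (0-based).
    N bases count toward the window length but not toward GC; the real
    tool may handle them differently.
    """
    seq = seq.upper()
    profile = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        gc = sum(1 for base in chunk if base in "GC")
        profile.append((start, gc / window))
    return profile


# Small worked example with a 4 bp window and 2 bp step.
print(gc_windows("GGCCAATT", window=4, step=2))
# -> [(0, 1.0), (2, 0.5), (4, 0.0)]
```

On the index: a `.gzi` file is the kind of index htslib's `bgzip` writes alongside its output when run with `-i`, so plain `gzip` output would likely not be usable in its place.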
Sorry for all the questions. Let me know if there is a better place to ask them, a person to email, etc.