To help you get started quickly, we have pre-processed some public-domain data (specifically, the data and truth sets shared during the precisionFDA Truth Challenge) and made it available for immediate use. The files can be downloaded from our public S3 bucket using the links below.
- hs37d5 reference
https://dl4vc.s3.us-east-2.amazonaws.com/hs37d5.fa
https://dl4vc.s3.us-east-2.amazonaws.com/hs37d5.fa.fai
- HG001 50x BAM (generated from precisionFDA HG001 FASTQ file)
https://dl4vc.s3.us-east-2.amazonaws.com/HG001-NA12878-50x.sort.bam
https://dl4vc.s3.us-east-2.amazonaws.com/HG001-NA12878-50x.sort.bam.bai
- Truth set (multi-allelic sites split and normalized)
- High-confidence regions
- HG002 50x BAM (generated from precisionFDA HG002 FASTQ file)
https://dl4vc.s3.us-east-2.amazonaws.com/HG002-NA24385-50x.sort.bam
https://dl4vc.s3.us-east-2.amazonaws.com/HG002-NA24385-50x.sort.bam.bai
- Truth set (multi-allelic sites split and normalized)
- High-confidence regions
- High-recall variant candidates in the high-confidence regions
https://dl4vc.s3.us-east-2.amazonaws.com/HG002-NA24385-50x-candidates.vcf.gz
https://dl4vc.s3.us-east-2.amazonaws.com/HG002-NA24385-50x-candidates.vcf.gz.csi
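If you prefer to script the downloads, the following is a minimal sketch using only the Python standard library. It assumes the bucket URLs listed above. The grouping names (`reference`, `hg001_bam`, and so on) and the helpers `urls_for` and `download` are hypothetical, introduced here for illustration only. The truth sets and high-confidence regions are not included because no links are given for them above.

```python
import os
import urllib.request
from urllib.parse import urlparse

# Public bucket prefix, taken from the links listed above.
BUCKET = "https://dl4vc.s3.us-east-2.amazonaws.com/"

# Hypothetical grouping of the listed files; each data file precedes its index.
FILES = {
    "reference": ["hs37d5.fa", "hs37d5.fa.fai"],
    "hg001_bam": ["HG001-NA12878-50x.sort.bam", "HG001-NA12878-50x.sort.bam.bai"],
    "hg002_bam": ["HG002-NA24385-50x.sort.bam", "HG002-NA24385-50x.sort.bam.bai"],
    "hg002_candidates": [
        "HG002-NA24385-50x-candidates.vcf.gz",
        "HG002-NA24385-50x-candidates.vcf.gz.csi",
    ],
}


def urls_for(groups):
    """Return full download URLs for the requested groups, in listing order."""
    return [BUCKET + name for group in groups for name in FILES[group]]


def download(urls, dest_dir, fetch=urllib.request.urlretrieve):
    """Download each URL into dest_dir, skipping files that already exist.

    `fetch` defaults to urllib's urlretrieve. It is a parameter only so the
    function can be exercised without network access. Returns local paths.
    """
    os.makedirs(dest_dir, exist_ok=True)
    paths = []
    for url in urls:
        # Use the last path component of the URL as the local file name.
        name = os.path.basename(urlparse(url).path)
        path = os.path.join(dest_dir, name)
        if not os.path.exists(path):
            fetch(url, path)
        paths.append(path)
    return paths
```

For example, `download(urls_for(["reference", "hg002_candidates"]), "data")` would fetch the reference and the HG002 candidate VCF into `./data`. The BAM files are large, so request only the groups you need. Each index is listed after its data file, so it is downloaded last and ends up with a newer timestamp. This avoids htslib-based tools warning that the index file is older than the data file.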