Skip to content

Latest commit

 

History

History
118 lines (97 loc) · 4.1 KB

pacbio_hifi_quick_demo.md

File metadata and controls

118 lines (97 loc) · 4.1 KB

PacBio HiFi Somatic Calling Quick Demo

Here is a quick demo for the PacBio somatic calling using HCC1395 tumor-normal pair chromosome 17 data. The data was sequenced using PacBio latest Revio system.

Platform:          PacBio HiFi Revio
Sample:     	   HCC1395-HCC1395BL tumor-normal pair
Normal coverage:   ~40x
Tumor coverage:    ~60x
Reference:         GRCh38_no_alt
Aligner:           pbmm2
Region:            chr17:80000000-80100000

Download data

# Parameters
INPUT_DIR="${HOME}/pacbio_hifi_quick_demo"
OUTPUT_DIR="${INPUT_DIR}/output"

mkdir -p ${INPUT_DIR}
mkdir -p ${OUTPUT_DIR}

# Download quick demo data
# GRCh38_no_alt reference
wget -P ${INPUT_DIR} -nc http://www.bio8.cs.hku.hk/clairs/quick_demo/pacbio_hifi/GRCh38_no_alt_chr17.fa
wget -P ${INPUT_DIR} -nc http://www.bio8.cs.hku.hk/clairs/quick_demo/pacbio_hifi/GRCh38_no_alt_chr17.fa.fai
# Normal and tumor BAM
wget -P ${INPUT_DIR} -nc http://www.bio8.cs.hku.hk/clairs/quick_demo/pacbio_hifi/HCC1395BL_normal_chr17_demo.bam
wget -P ${INPUT_DIR} -nc http://www.bio8.cs.hku.hk/clairs/quick_demo/pacbio_hifi/HCC1395BL_normal_chr17_demo.bam.bai
wget -P ${INPUT_DIR} -nc http://www.bio8.cs.hku.hk/clairs/quick_demo/pacbio_hifi/HCC1395_tumor_chr17_demo.bam
wget -P ${INPUT_DIR} -nc http://www.bio8.cs.hku.hk/clairs/quick_demo/pacbio_hifi/HCC1395_tumor_chr17_demo.bam.bai

# SEQC2 Truth VCF and BED
wget -P ${INPUT_DIR} -nc http://www.bio8.cs.hku.hk/clairs/quick_demo/pacbio_hifi/SEQC2_high-confidence_sSNV_in_HC_regions_v1.2_chr17.vcf.gz
wget -P ${INPUT_DIR} -nc http://www.bio8.cs.hku.hk/clairs/quick_demo/pacbio_hifi/SEQC2_high-confidence_sSNV_in_HC_regions_v1.2_chr17.vcf.gz.tbi
wget -P ${INPUT_DIR} -nc http://www.bio8.cs.hku.hk/clairs/quick_demo/pacbio_hifi/SEQC2_High-Confidence_Regions_v1.2_chr17.bed

REF="GRCh38_no_alt_chr17.fa"
NORMAL_BAM="HCC1395BL_normal_chr17_demo.bam"
TUMOR_BAM="HCC1395_tumor_chr17_demo.bam"
BASELINE_VCF_FILE_PATH="SEQC2_high-confidence_sSNV_in_HC_regions_v1.2_chr17.vcf.gz"
BASELINE_BED_FILE_PATH="SEQC2_High-Confidence_Regions_v1.2_chr17.bed"
OUTPUT_VCF_FILE_PATH="output.vcf.gz"

Somatic variant calling using docker pre-built image

docker run -it \
  -v ${INPUT_DIR}:${INPUT_DIR} \
  -v ${OUTPUT_DIR}:${OUTPUT_DIR} \
  hkubal/clairs:latest \
  /opt/bin/run_clairs \
  --tumor_bam_fn ${INPUT_DIR}/${TUMOR_BAM} \
  --normal_bam_fn ${INPUT_DIR}/${NORMAL_BAM} \
  --ref_fn ${INPUT_DIR}/${REF} \
  --threads 4 \
  --platform hifi_revio \
  --output_dir ${OUTPUT_DIR} \
  --region chr17:80000000-80100000

Run compare_vcf.py for benchmarking (optional)

docker run -it \
  -v ${INPUT_DIR}:${INPUT_DIR} \
  -v ${OUTPUT_DIR}:${OUTPUT_DIR} \
  hkubal/clairs:latest \
  python3 /opt/bin/clairs.py compare_vcf \
     --truth_vcf_fn ${INPUT_DIR}/${BASELINE_VCF_FILE_PATH} \
     --input_vcf_fn ${OUTPUT_DIR}/${OUTPUT_VCF_FILE_PATH} \
     --bed_fn ${INPUT_DIR}/${BASELINE_BED_FILE_PATH} \
     --output_dir ${OUTPUT_DIR}/benchmark \
     --input_filter_tag 'PASS' \
     --ctg_name chr17 \
     --ctg_start 80000000 \
     --ctg_end 80100000

Expected output:

Type Precision Recall F1-score TP FP FN
SNV 1.0 0.9655 0.9825 28 0 1

Or run som.py for benchmarking (optional)

# Need to restrict target BED regions for benchmarking
echo -e "chr17\t80000000\t80100000" > ${INPUT_DIR}/quick_demo.bed

docker run \
-v "${INPUT_DIR}":"${INPUT_DIR}" \
-v "${OUTPUT_DIR}":"${OUTPUT_DIR}" \
jmcdani20/hap.py:v0.3.12 /opt/hap.py/bin/som.py \
    ${INPUT_DIR}/${BASELINE_VCF_FILE_PATH} \
    ${OUTPUT_DIR}/${OUTPUT_VCF_FILE_PATH} \
    -T ${INPUT_DIR}/quick_demo.bed \
    -f ${INPUT_DIR}/${BASELINE_BED_FILE_PATH} \
    -r ${INPUT_DIR}/${REF} \
    -o "${OUTPUT_DIR}/som" \
    -l chr17

Run all commands above:

cd ${HOME}
wget "https://raw.githubusercontent.com/HKU-BAL/clairs/main/demo/pacbio_hifi_quick_demo.sh"
chmod +x pacbio_hifi_quick_demo.sh
./pacbio_hifi_quick_demo.sh

Check the results using less ${HOME}/pacbio_hifi_quick_demo/output/output.vcf.gz.