Filter wham-only DELs and scramble-only SVAs in CleanVcf & docs updates #740

Merged
merged 8 commits into from
Oct 31, 2024
9 changes: 9 additions & 0 deletions .github/.dockstore.yml
Original file line number Diff line number Diff line change
Expand Up @@ -198,6 +198,15 @@ workflows:
      tags:
        - /.*/

  - subclass: WDL
    name: VisualizeCnvs
    primaryDescriptorPath: /wdl/VisualizeCnvs.wdl
    filters:
      branches:
        - main
      tags:
        - /.*/

  - subclass: WDL
    name: SingleSamplePipeline
    primaryDescriptorPath: /wdl/GATKSVPipelineSingleSample.wdl
2 changes: 1 addition & 1 deletion README.md
Expand Up @@ -2,7 +2,7 @@

A structural variation discovery pipeline for Illumina short-read whole-genome sequencing (WGS) data.

- For technical documentation on GATK-SV, including how to run the pipeline, please refer to our website.
+ For technical documentation on GATK-SV, including how to run the pipeline, please refer to our [website](https://broadinstitute.github.io/gatk-sv/).

## Repository structure
* `/carrot`: [Carrot](https://github.com/broadinstitute/carrot) tests

Large diffs are not rendered by default.

Large diffs are not rendered by default.

@@ -0,0 +1,10 @@
{
"VisualizeCnvs.vcf_or_bed": "${this.filtered_vcf}",
"VisualizeCnvs.prefix": "${this.sample_set_set_id}",
"VisualizeCnvs.median_files": "${this.sample_sets.median_cov}",
"VisualizeCnvs.rd_files": "${this.sample_sets.merged_bincov}",
"VisualizeCnvs.ped_file": "${workspace.cohort_ped_file}",
"VisualizeCnvs.min_size": 50000,
"VisualizeCnvs.flags": "-s 999999999",
"VisualizeCnvs.sv_pipeline_docker": "${workspace.sv_pipeline_docker}"
}
Expand Up @@ -5,6 +5,6 @@
  "VisualizeCnvs.rd_files": [{{ test_batch.merged_coverage_file | tojson }}],
  "VisualizeCnvs.ped_file": {{ test_batch.ped_file | tojson }},
  "VisualizeCnvs.min_size": 50000,
- "VisualizeCnvs.flags": "",
+ "VisualizeCnvs.flags": "-s 999999999",
"VisualizeCnvs.sv_pipeline_docker": {{ dockers.sv_pipeline_docker | tojson }}
}
2 changes: 1 addition & 1 deletion scripts/test/terra_validation.py
Expand Up @@ -113,7 +113,7 @@ def main():
parser.add_argument("-j", "--womtool-jar", help="Path to womtool jar", required=True)
parser.add_argument("-n", "--num-input-jsons",
help="Number of Terra input JSONs expected",
- required=False, default=25, type=int)
+ required=False, default=26, type=int)
parser.add_argument("--log-level",
help="Specify level of logging information, ie. info, warning, error (not case-sensitive)",
required=False, default="INFO")
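
For illustration, here is a minimal sketch of how an expected-count argument such as `--num-input-jsons` could be checked. Only the `-n/--num-input-jsons` argument and its new default of 26 come from the diff. The `--inputs-dir` argument, the helper names, and the recursive `*.json` search are assumptions made for this example, so the real `terra_validation.py` may count inputs differently.

```python
import argparse
from pathlib import Path


def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    # Mirrors the argument shown in the diff, with the updated default.
    parser.add_argument("-n", "--num-input-jsons",
                        help="Number of Terra input JSONs expected",
                        required=False, default=26, type=int)
    # Hypothetical argument, not taken from the real script.
    parser.add_argument("--inputs-dir", default=".",
                        help="Directory to search for input JSONs (assumed)")
    return parser.parse_args(argv)


def count_input_jsons(inputs_dir):
    """Count *.json files anywhere under inputs_dir."""
    return sum(1 for _ in Path(inputs_dir).rglob("*.json"))


def check_count(inputs_dir, expected):
    """Raise if the number of JSONs found differs from the expected count."""
    found = count_input_jsons(inputs_dir)
    if found != expected:
        raise ValueError(f"Expected {expected} input JSONs, found {found}")
    return found
```

Under this reading, adding one new Terra input JSON (here, the one for `VisualizeCnvs`) requires bumping the expected count by one, which is consistent with the change from 25 to 26.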
65 changes: 64 additions & 1 deletion wdl/CleanVcfChromosome.wdl
Expand Up @@ -53,6 +53,7 @@ workflow CleanVcfChromosome {
RuntimeAttr? runtime_override_stitch_fragmented_cnvs
RuntimeAttr? runtime_override_final_cleanup
RuntimeAttr? runtime_override_rescue_me_dels
RuntimeAttr? runtime_attr_add_high_fp_rate_filters

# Clean vcf 1b
RuntimeAttr? runtime_attr_override_subset_large_cnvs_1b
Expand Down Expand Up @@ -299,9 +300,17 @@ workflow CleanVcfChromosome {
runtime_attr_override = runtime_override_rescue_me_dels
}

- call FinalCleanup {
+ call AddHighFPRateFilters {
  input:
  vcf=RescueMobileElementDeletions.out,
+ prefix="~{prefix}.high_fp_filtered",
+ sv_pipeline_docker=sv_pipeline_docker,
+ runtime_attr_override=runtime_attr_add_high_fp_rate_filters
+ }
+
+ call FinalCleanup {
+ input:
+ vcf=AddHighFPRateFilters.out,
contig=contig,
prefix="~{prefix}.final_cleanup",
sv_pipeline_docker=sv_pipeline_docker,
Expand Down Expand Up @@ -799,6 +808,60 @@ task StitchFragmentedCnvs {
}
}

# Add FILTER status for pockets of variants with high FP rate: wham-only DELs and Scramble-only SVAs with HIGH_SR_BACKGROUND
task AddHighFPRateFilters {
input {
File vcf
String prefix
String sv_pipeline_docker
RuntimeAttr? runtime_attr_override
}

Float input_size = size(vcf, "GiB")
RuntimeAttr runtime_default = object {
mem_gb: 3.75 + input_size * 1.5,
Collaborator

Was this copy+paste from somewhere? 3.75 should be plenty I think.

Suggested change
- mem_gb: 3.75 + input_size * 1.5,
+ mem_gb: 3.75,

Collaborator Author

Yeah most of the things you pointed out were from copy/paste, I'll do a better job of cleanup next time. Thanks for catching

disk_gb: ceil(100.0 + input_size * 3.0),
Collaborator

Suggested change
- disk_gb: ceil(100.0 + input_size * 3.0),
+ disk_gb: ceil(10 + input_size * 2.0),

cpu_cores: 1,
preemptible_tries: 3,
max_retries: 1,
boot_disk_gb: 10
}
RuntimeAttr runtime_override = select_first([runtime_attr_override, runtime_default])
runtime {
memory: "~{select_first([runtime_override.mem_gb, runtime_default.mem_gb])} GB"
disks: "local-disk ~{select_first([runtime_override.disk_gb, runtime_default.disk_gb])} HDD"
cpu: select_first([runtime_override.cpu_cores, runtime_default.cpu_cores])
preemptible: select_first([runtime_override.preemptible_tries, runtime_default.preemptible_tries])
maxRetries: select_first([runtime_override.max_retries, runtime_default.max_retries])
docker: sv_pipeline_docker
bootDiskSizeGb: select_first([runtime_override.boot_disk_gb, runtime_default.boot_disk_gb])
}
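
To make the sizing discussion concrete, the sketch below evaluates the runtime defaults as written in the diff next to the values suggested in the review thread. The formulas are copied from the `mem_gb` and `disk_gb` lines above. The function names are hypothetical, and Python's `math.ceil` stands in for WDL's `ceil`.

```python
import math


def defaults_as_written(input_size_gib):
    """Runtime defaults as they appear in the diff."""
    return {
        "mem_gb": 3.75 + input_size_gib * 1.5,
        "disk_gb": math.ceil(100.0 + input_size_gib * 3.0),
    }


def defaults_as_suggested(input_size_gib):
    """Runtime defaults after applying the reviewer's suggested changes."""
    return {
        "mem_gb": 3.75,
        "disk_gb": math.ceil(10 + input_size_gib * 2.0),
    }


if __name__ == "__main__":
    for size in (0.5, 4.0, 20.0):
        print(size, defaults_as_written(size), defaults_as_suggested(size))
```

For a 4 GiB input VCF, the defaults as written request 9.75 GB of memory and 112 GB of disk, while the suggested values request 3.75 GB and 18 GB.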

command <<<
set -euo pipefail

python <<CODE
import pysam
fin = pysam.VariantFile("~{vcf}")
Collaborator

Suggested change
- fin = pysam.VariantFile("~{vcf}")
+ with pysam.VariantFile("~{vcf}") as fin:

header = fin.header
header.add_line("##FILTER=<ID=HIGH_ALGORITHM_FP_RATE,Description=\"Categories of variants with low specificity including Wham-only deletions and certain Scramble SVAs\">")
Collaborator

Suggested change
- header.add_line("##FILTER=<ID=HIGH_ALGORITHM_FP_RATE,Description=\"Categories of variants with low specificity including Wham-only deletions and certain Scramble SVAs\">")
+ header.add_line("##FILTER=<ID=HIGH_ALGORITHM_FDR,Description=\"Categories of variants with low precision including Wham-only deletions and certain Scramble SVAs\">")

Collaborator

I realize I probably suggested this name but FDR is more appropriate

fo = pysam.VariantFile("~{prefix}.vcf.gz", 'w', header=header)
Collaborator

Suggested change
- fo = pysam.VariantFile("~{prefix}.vcf.gz", 'w', header=header)
+ with pysam.VariantFile("~{prefix}.vcf.gz", 'w', header=header) as fo:

for record in fin:
    if (record.info['ALGORITHMS'] == ('wham',) and record.info['SVTYPE'] == 'DEL') or \
        (record.info['ALGORITHMS'] == ('scramble',) and record.info['HIGH_SR_BACKGROUND'] and record.alts == ('<INS:ME:SVA>',)):
        record.filter.add('HIGH_ALGORITHM_FP_RATE')
Collaborator

Suggested change
- record.filter.add('HIGH_ALGORITHM_FP_RATE')
+ record.filter.add('HIGH_ALGORITHM_FDR')

    fo.write(record)
fin.close()
fo.close()
CODE
>>>

output {
File out = "~{prefix}.vcf.gz"
}
}
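
As a reading aid, the filter condition in the task above can be factored into a small predicate that is testable without a VCF. This is a sketch under stated assumptions, not the merged implementation: the function names are hypothetical, and the filter ID and description are passed in by the caller because both `HIGH_ALGORITHM_FP_RATE` and `HIGH_ALGORITHM_FDR` appear in the review thread. The `with` blocks follow the reviewer's suggestion, and `pysam` (a third-party package) is imported lazily so the predicate can be exercised without it.

```python
def is_high_fdr_candidate(algorithms, svtype, alts, high_sr_background):
    """Return True for wham-only DELs and for Scramble-only SVA insertions
    that also carry HIGH_SR_BACKGROUND, mirroring the condition in the task."""
    algorithms = tuple(algorithms)
    alts = tuple(alts or ())  # tolerate a record with no ALT alleles
    wham_only_del = algorithms == ("wham",) and svtype == "DEL"
    scramble_only_sva = (
        algorithms == ("scramble",)
        and bool(high_sr_background)
        and alts == ("<INS:ME:SVA>",)
    )
    return wham_only_del or scramble_only_sva


def add_filters(in_vcf, out_vcf, filter_id, description):
    """Copy in_vcf to out_vcf, adding filter_id to matching records."""
    import pysam  # third-party; only needed when real VCFs are processed

    with pysam.VariantFile(in_vcf) as fin:
        header = fin.header
        header.add_line(f'##FILTER=<ID={filter_id},Description="{description}">')
        with pysam.VariantFile(out_vcf, "w", header=header) as fo:
            for record in fin:
                if is_high_fdr_candidate(
                    record.info["ALGORITHMS"],
                    record.info["SVTYPE"],
                    record.alts,
                    record.info["HIGH_SR_BACKGROUND"],
                ):
                    record.filter.add(filter_id)
                fo.write(record)
```

Keeping the predicate separate from the I/O makes the two flagged categories easy to unit test: a call supported by both `wham` and another algorithm, or a Scramble SVA without `HIGH_SR_BACKGROUND`, is left unfiltered.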



# Final VCF cleanup
task FinalCleanup {
2 changes: 1 addition & 1 deletion website/docs/advanced/cromwell/overview.md
Expand Up @@ -29,7 +29,7 @@ Google Cloud Platform (GCP).

# Cromwell Server

- There are two option to communicate with a running Cromwell server:
+ There are two options to communicate with a running Cromwell server:
[REST API](https://cromwell.readthedocs.io/en/stable/tutorials/ServerMode/), and
[Cromshell](https://github.com/broadinstitute/cromshell) which is a command line tool
to interface with a Cromwell server. We recommend using Cromshell due to its simplicity
4 changes: 2 additions & 2 deletions website/docs/best_practices.md
Expand Up @@ -4,8 +4,8 @@ description: Guide for using GATK-SV
sidebar_position: 4
---

- A comprehensive guide for the single-sample calling mode is available in [GATK Best Practices for Structural Variation
- Discovery on Single Samples](https://gatk.broadinstitute.org/hc/en-us/articles/9022653744283-GATK-Best-Practices-for-Structural-Variation-Discovery-on-Single-Samples).
+ A comprehensive guide for the single-sample [calling mode](/docs/gs/calling_modes) is available in
+ [GATK Best Practices for Structural Variation Discovery on Single Samples](https://gatk.broadinstitute.org/hc/en-us/articles/9022653744283-GATK-Best-Practices-for-Structural-Variation-Discovery-on-Single-Samples).
This material covers basic concepts of structural variant calling, specifics of SV VCF formatting, and
advanced troubleshooting that also apply to the joint calling mode as well. This guide is intended to supplement
documentation found here.