Commit

Resolve merge
kjaisingh committed Nov 1, 2024
2 parents 1ad3c1c + fb6720a commit cbcf76e
Showing 27 changed files with 392 additions and 392 deletions.
9 changes: 9 additions & 0 deletions .github/.dockstore.yml
@@ -199,6 +199,15 @@ workflows:
tags:
- /.*/

- subclass: WDL
name: VisualizeCnvs
primaryDescriptorPath: /wdl/VisualizeCnvs.wdl
filters:
branches:
- main
tags:
- /.*/

- subclass: WDL
name: SingleSamplePipeline
primaryDescriptorPath: /wdl/GATKSVPipelineSingleSample.wdl
2 changes: 1 addition & 1 deletion README.md
@@ -2,7 +2,7 @@

A structural variation discovery pipeline for Illumina short-read whole-genome sequencing (WGS) data.

For technical documentation on GATK-SV, including how to run the pipeline, please refer to our website.
For technical documentation on GATK-SV, including how to run the pipeline, please refer to our [website](https://broadinstitute.github.io/gatk-sv/).

## Repository structure
* `/carrot`: [Carrot](https://github.com/broadinstitute/carrot) tests

Large diffs are not rendered by default.

@@ -0,0 +1,10 @@
{
"VisualizeCnvs.vcf_or_bed": "${this.filtered_vcf}",
"VisualizeCnvs.prefix": "${this.sample_set_set_id}",
"VisualizeCnvs.median_files": "${this.sample_sets.median_cov}",
"VisualizeCnvs.rd_files": "${this.sample_sets.merged_bincov}",
"VisualizeCnvs.ped_file": "${workspace.cohort_ped_file}",
"VisualizeCnvs.min_size": 50000,
"VisualizeCnvs.flags": "-s 999999999",
"VisualizeCnvs.sv_pipeline_docker": "${workspace.sv_pipeline_docker}"
}
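
The input JSON above mixes two kinds of values: Terra-style references of the form `${this...}` or `${workspace...}`, and plain literals such as `50000` or `"-s 999999999"`. A minimal sketch of how the two could be told apart when reviewing such a file; the names `TERRA_REFERENCE` and `split_inputs` are hypothetical and not part of the repository, and the example uses only a subset of the keys shown above.

```python
import json
import re

# Matches values that consist entirely of a reference such as "${this.filtered_vcf}".
TERRA_REFERENCE = re.compile(r"^\$\{(this|workspace)\.[A-Za-z0-9_.]+\}$")


def split_inputs(inputs_json):
    """Split an inputs JSON string into (references, literals) dictionaries."""
    references, literals = {}, {}
    for key, value in json.loads(inputs_json).items():
        if isinstance(value, str) and TERRA_REFERENCE.match(value):
            references[key] = value
        else:
            literals[key] = value
    return references, literals


# Subset of the VisualizeCnvs inputs shown in the diff.
EXAMPLE = """
{
  "VisualizeCnvs.vcf_or_bed": "${this.filtered_vcf}",
  "VisualizeCnvs.ped_file": "${workspace.cohort_ped_file}",
  "VisualizeCnvs.min_size": 50000,
  "VisualizeCnvs.flags": "-s 999999999"
}
"""

references, literals = split_inputs(EXAMPLE)
```

Under this split, `min_size` and `flags` are the only hard-coded values in the example, which is where the new `-s 999999999` setting lives.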
@@ -5,6 +5,6 @@
"VisualizeCnvs.rd_files": [{{ test_batch.merged_coverage_file | tojson }}],
"VisualizeCnvs.ped_file": {{ test_batch.ped_file | tojson }},
"VisualizeCnvs.min_size": 50000,
"VisualizeCnvs.flags": "",
"VisualizeCnvs.flags": "-s 999999999",
"VisualizeCnvs.sv_pipeline_docker": {{ dockers.sv_pipeline_docker | tojson }}
}
2 changes: 1 addition & 1 deletion scripts/test/terra_validation.py
@@ -113,7 +113,7 @@ def main():
parser.add_argument("-j", "--womtool-jar", help="Path to womtool jar", required=True)
parser.add_argument("-n", "--num-input-jsons",
help="Number of Terra input JSONs expected",
required=False, default=25, type=int)
required=False, default=26, type=int)
parser.add_argument("--log-level",
help="Specify level of logging information, ie. info, warning, error (not case-sensitive)",
required=False, default="INFO")
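
The default for `--num-input-jsons` moves from 25 to 26, presumably because this commit adds one more Terra input JSON (the VisualizeCnvs inputs). A minimal sketch of how such an expected-count option might be used; only the `-n/--num-input-jsons` argument is taken from the diff, while `build_parser` and `count_matches` are hypothetical helpers and the real script's checking logic is not shown here.

```python
import argparse


def build_parser():
    """Build a parser containing only the option changed in this hunk."""
    parser = argparse.ArgumentParser(description="Sketch of the expected-count option")
    parser.add_argument("-n", "--num-input-jsons",
                        help="Number of Terra input JSONs expected",
                        required=False, default=26, type=int)
    return parser


def count_matches(json_paths, expected):
    """Hypothetical check: True when the number of .json paths equals the expected count."""
    return len([p for p in json_paths if str(p).endswith(".json")]) == expected


args = build_parser().parse_args([])  # no flags given, so the default applies
```

With a check of this kind, adding an input JSON without updating the default would make the validation fail, which is why the two changes travel together.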
85 changes: 84 additions & 1 deletion wdl/CleanVcfChromosome.wdl
@@ -49,6 +49,29 @@ workflow CleanVcfChromosome {
RuntimeAttr? runtime_override_stitch_fragmented_cnvs
RuntimeAttr? runtime_override_final_cleanup
RuntimeAttr? runtime_override_rescue_me_dels
# overrides for local tasks
RuntimeAttr? runtime_override_clean_vcf_1a
RuntimeAttr? runtime_override_clean_vcf_2
RuntimeAttr? runtime_override_clean_vcf_3
RuntimeAttr? runtime_override_clean_vcf_4
RuntimeAttr? runtime_override_clean_vcf_5_scatter
RuntimeAttr? runtime_override_clean_vcf_5_make_cleangq
RuntimeAttr? runtime_override_clean_vcf_5_find_redundant_multiallelics
RuntimeAttr? runtime_override_clean_vcf_5_polish
RuntimeAttr? runtime_override_stitch_fragmented_cnvs
RuntimeAttr? runtime_override_final_cleanup
RuntimeAttr? runtime_override_rescue_me_dels
RuntimeAttr? runtime_attr_add_high_fp_rate_filters
# Clean vcf 1b
RuntimeAttr? runtime_attr_override_subset_large_cnvs_1b
RuntimeAttr? runtime_attr_override_sort_bed_1b
RuntimeAttr? runtime_attr_override_intersect_bed_1b
RuntimeAttr? runtime_attr_override_build_dict_1b
RuntimeAttr? runtime_attr_override_scatter_1b
RuntimeAttr? runtime_attr_override_filter_vcf_1b
RuntimeAttr? runtime_override_concat_vcfs_1b
RuntimeAttr? runtime_override_cat_multi_cnvs_1b
RuntimeAttr? runtime_override_preconcat_step1
RuntimeAttr? runtime_override_hail_merge_step1
@@ -271,9 +294,17 @@ workflow CleanVcfChromosome {
runtime_attr_override = runtime_override_rescue_me_dels
}
call FinalCleanup {
call AddHighFDRFilters {
input:
vcf=RescueMobileElementDeletions.out,
prefix="~{prefix}.high_fdr_filtered",
sv_pipeline_docker=sv_pipeline_docker,
runtime_attr_override=runtime_attr_add_high_fp_rate_filters
}
call FinalCleanup {
input:
vcf=AddHighFDRFilters.out,
contig=contig,
prefix="~{prefix}.final_cleanup",
sv_pipeline_docker=sv_pipeline_docker,
@@ -798,6 +829,58 @@ task StitchFragmentedCnvs {
}
}

# Add FILTER status for pockets of variants with high FP rate: wham-only DELs and Scramble-only SVAs with HIGH_SR_BACKGROUND
task AddHighFDRFilters {
input {
File vcf
String prefix
String sv_pipeline_docker
RuntimeAttr? runtime_attr_override
}
Float input_size = size(vcf, "GiB")
RuntimeAttr runtime_default = object {
mem_gb: 3.75,
disk_gb: ceil(10.0 + input_size * 3.0),
cpu_cores: 1,
preemptible_tries: 3,
max_retries: 1,
boot_disk_gb: 10
}
RuntimeAttr runtime_override = select_first([runtime_attr_override, runtime_default])
runtime {
memory: "~{select_first([runtime_override.mem_gb, runtime_default.mem_gb])} GB"
disks: "local-disk ~{select_first([runtime_override.disk_gb, runtime_default.disk_gb])} HDD"
cpu: select_first([runtime_override.cpu_cores, runtime_default.cpu_cores])
preemptible: select_first([runtime_override.preemptible_tries, runtime_default.preemptible_tries])
maxRetries: select_first([runtime_override.max_retries, runtime_default.max_retries])
docker: sv_pipeline_docker
bootDiskSizeGb: select_first([runtime_override.boot_disk_gb, runtime_default.boot_disk_gb])
}
command <<<
set -euo pipefail
python <<CODE
import pysam
with pysam.VariantFile("~{vcf}", 'r') as fin:
header = fin.header
header.add_line("##FILTER=<ID=HIGH_ALGORITHM_FDR,Description=\"Categories of variants with low precision including Wham-only deletions and certain Scramble SVAs\">")
with pysam.VariantFile("~{prefix}.vcf.gz", 'w', header=header) as fo:
for record in fin:
if (record.info['ALGORITHMS'] == ('wham',) and record.info['SVTYPE'] == 'DEL') or \
(record.info['ALGORITHMS'] == ('scramble',) and record.info['HIGH_SR_BACKGROUND'] and record.alts == ('<INS:ME:SVA>',)):
record.filter.add('HIGH_ALGORITHM_FDR')
fo.write(record)
CODE
>>>
output {
File out = "~{prefix}.vcf.gz"
}
}



# Final VCF cleanup
task FinalCleanup {
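
The condition used by `AddHighFDRFilters` in the hunk above can be read as a standalone predicate over a record's INFO fields and ALT alleles. A minimal sketch, assuming the record is represented by a plain dictionary and a tuple rather than a pysam record; the function name `is_high_fdr` is hypothetical, and a missing `HIGH_SR_BACKGROUND` flag is treated as false.

```python
def is_high_fdr(info, alts):
    """Return True for the two categories tagged HIGH_ALGORITHM_FDR in the diff."""
    algorithms = tuple(info.get("ALGORITHMS", ()))

    # Category 1: deletions supported by Wham alone.
    wham_only_del = algorithms == ("wham",) and info.get("SVTYPE") == "DEL"

    # Category 2: SVA insertions supported by Scramble alone that also carry
    # the HIGH_SR_BACKGROUND flag.
    scramble_only_sva = (
        algorithms == ("scramble",)
        and bool(info.get("HIGH_SR_BACKGROUND", False))
        and tuple(alts or ()) == ("<INS:ME:SVA>",)
    )
    return wham_only_del or scramble_only_sva


# A Wham-only deletion is tagged; adding a second caller removes the tag.
tagged = is_high_fdr({"ALGORITHMS": ("wham",), "SVTYPE": "DEL"}, ("<DEL>",))
untagged = is_high_fdr({"ALGORITHMS": ("manta", "wham"), "SVTYPE": "DEL"}, ("<DEL>",))
```

Both categories require exactly one supporting algorithm, so a variant also called by any other algorithm is left unfiltered.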
1 change: 1 addition & 0 deletions website/.gitignore
@@ -7,6 +7,7 @@
# Generated files
.docusaurus
.cache-loader
package-lock.json

# Misc
.DS_Store
2 changes: 1 addition & 1 deletion website/docs/advanced/cromwell/overview.md
@@ -29,7 +29,7 @@ Google Cloud Platform (GCP).

# Cromwell Server

There are two option to communicate with a running Cromwell server:
There are two options to communicate with a running Cromwell server:
[REST API](https://cromwell.readthedocs.io/en/stable/tutorials/ServerMode/), and
[Cromshell](https://github.com/broadinstitute/cromshell) which is a command line tool
to interface with a Cromwell server. We recommend using Cromshell due to its simplicity
4 changes: 2 additions & 2 deletions website/docs/best_practices.md
@@ -4,8 +4,8 @@ description: Guide for using GATK-SV
sidebar_position: 4
---

A comprehensive guide for the single-sample calling mode is available in [GATK Best Practices for Structural Variation
Discovery on Single Samples](https://gatk.broadinstitute.org/hc/en-us/articles/9022653744283-GATK-Best-Practices-for-Structural-Variation-Discovery-on-Single-Samples).
A comprehensive guide for the single-sample [calling mode](/docs/gs/calling_modes) is available in
[GATK Best Practices for Structural Variation Discovery on Single Samples](https://gatk.broadinstitute.org/hc/en-us/articles/9022653744283-GATK-Best-Practices-for-Structural-Variation-Discovery-on-Single-Samples).
This material covers basic concepts of structural variant calling, specifics of SV VCF formatting, and
advanced troubleshooting that also apply to the joint calling mode as well. This guide is intended to supplement
documentation found here.