forked from melbournebioinformatics/variant_calling_pipeline
pipeline.groovy
////////////////////////////////////////////////////////////
// GATK-based variant-calling pipeline, WGS version.
//
// This pipeline is a port of the VLSCI whole genome variant
// calling pipeline from this Git repo to Bpipe:
//
// https://github.com/claresloggett/variant_calling_pipeline/
//
// This example is intended as an illustration of how a full pipeline
// looks in Bpipe, especially for those who might be familiar with Ruffus.
// There are a number of places where the pipeline is non-ideal, but it has
// been kept that way to reflect the original as accurately as possible.
//
// One particular point of note is that some of the intermediate stages
// store the alignment output in SAM format, which is inadvisable for
// true whole-genome data. It is easy to make them store output in BAM
// format instead, and that change is strongly recommended if you intend
// to run this pipeline on a lot of data.
//
// There are a number of software requirements, which you should ensure are
// satisfied before running the pipeline. These need to be configured
// in the file called 'config.groovy' (which is loaded below). A template
// is provided in config.groovy.template which you can use to
// create the file and set the right paths to your tools and reference
// data.
//
// By default this pipeline will attempt to use all the available cores
// on the computer it runs on. If you don't wish to do that, limit the
// concurrency by running it with the -n flag:
//
//     bpipe run -n 4 pipeline.groovy example_data/input_data_wgs/*.fastq.gz
//
// Assumes: paired-end reads
// Assumes: files in form *<sample_name>*_..._R1.fastq.gz, *<sample_name>*_..._R2.fastq.gz
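//
// As a rough illustration of that naming assumption, the split pattern
// "%_*_R*" used in the run block is expected to group files as sketched
// here. The file names are hypothetical and are not part of the example
// data. The part matched by % names the group; each * matches any text
// without creating a further group.
//
//     sampleA_L001_R1.fastq.gz, sampleA_L001_R2.fastq.gz  -> group "sampleA"
//     sampleB_L001_R1.fastq.gz, sampleB_L001_R2.fastq.gz  -> group "sampleB"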
//
// Author: Simon Sadedin, MCRI
//
////////////////////////////////////////////////////////////
// Create this file by copying config.groovy.template and editing
load 'config.groovy'
// Load the definitions of all the core pipeline stages
load 'pipeline_stages_config.groovy'
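// A minimal sketch of the three Bpipe operators the run block relies on.
// The stage names stageA..stageD are hypothetical and are not defined in
// this repository; the sketch is left commented out so it never executes.
//
//     run {
//         "%.txt" * [ stageA + stageB ] +  // one parallel branch per group of inputs matched by %
//         [ stageC, stageD ]               // then stageC and stageD run in parallel on the same inputs
//     }
//
// Here + chains stages sequentially, [ a, b ] runs its members in parallel,
// and "pattern" * [ ... ] splits the inputs into groups before running the
// bracketed segment once per group.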
run {
    // Align each pair of input files separately in parallel
    "%_*_R*" * [
        fastqc +
        "%.gz" * [ alignBWA ] +
        alignToSamPE +
        samToSortedBam + indexBam +
        dedup + indexBam
    ] +
    // Merge all the BAM files afterwards
    mergeBams +
    indexBam +
    [
        depthOfCoverage,
        realignIntervals + realign + indexBam +
        baseQualRecalCount + baseQualRecalTabulate + indexBam +
        [ callIndels + filterIndels + annotateEnsembl, callSNPs + filterSNPs + annotateEnsembl ]
    ]
}