04. Call deletions with popdel call

Sebastian Niehus edited this page Mar 25, 2021 · 6 revisions

Overview
General calling options
Filtering options for calling
Examples

Overview

After creating the profiles, PopDel call takes a list of all profiles and performs joint calling on all samples simultaneously. In the calling step, variants are analyzed across all samples by iterating over the genome in windows of 30 bp and performing a likelihood ratio test in each window. For a given genomic window, the test compares the likelihood that a deletion of a certain length overlaps the window against the likelihood of observing the reference genome's haplotype. In the process, the size and frequency of the candidate deletion(s) are estimated iteratively using the empirical insert size distributions and adaptive weighting. This approach also works for deletions that overlap each other and for deletions with a very low allele frequency.

The BAM-files are no longer required for this. The genotyped calls are written in VCF-format (v4.2) to the file popdel.vcf. The option -o can be used for changing the path and the name of the output file. In the simplest case the call is:

popdel call myProfiles.txt

where myProfiles.txt contains the paths to the previously created profiles (one path per line). On UNIX systems this list can conveniently be created by running the realpath command and redirecting its output to a file:

realpath myProfileFolder/myProfile*.profile > myProfiles.txt
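Before starting a long calling run, it can help to confirm that every path in the list points to an existing file. The following is a minimal sketch; check_profiles is a hypothetical helper written for this page and is not part of PopDel.

```shell
# Hypothetical helper (not part of PopDel): report listed profile paths
# that do not exist. Succeeds only if every non-empty line is a regular file.
check_profiles() {
    list_file="$1"
    missing=0
    while IFS= read -r profile_path || [ -n "$profile_path" ]; do
        # Skip empty lines
        [ -z "$profile_path" ] && continue
        if [ ! -f "$profile_path" ]; then
            echo "missing profile: $profile_path" >&2
            missing=$((missing + 1))
        fi
    done < "$list_file"
    [ "$missing" -eq 0 ]
}

# Usage, assuming myProfiles.txt was created as shown above:
#   check_profiles myProfiles.txt && popdel call myProfiles.txt
```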

Instead of using a file that lists the profiles, it is also possible to pass the profiles directly as command line arguments, but only if at least two profiles are given:

popdel call myProfile1.profile myProfile2.profile [...] myProfileN.profile

Note: It is NOT possible to run popdel call with only one profile as an argument, as this will be interpreted as the text file containing all the profiles. Use a file containing the path to the profile instead.
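For a single sample, the workaround described in the note can be sketched as follows. The helper name make_single_list and the file names in the usage comment are placeholders chosen for illustration.

```shell
# Hypothetical helper (not part of PopDel): write the absolute path of a
# single profile into a list file that popdel call can read.
make_single_list() {
    realpath "$1" > "$2"
}

# Usage, assuming a profile named mySample.profile exists:
#   make_single_list mySample.profile singleProfile.txt
#   popdel call singleProfile.txt
```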

General calling options

Like the profiling, the calling can be restricted to a single or multiple regions of interest. This is done by either using the option -r followed by the samtools-style region (multiple times for multiple regions) or by using the option -R followed by the path of a file containing one region of interest per line. The profiles contain their own index, so jumping to the regions to perform the calling on them is fast. Other options include:

| Flag | Extended flag | Meaning | Default |
|------|---------------|---------|---------|
| -b | --buffer-size | Number of buffered windows. In range [10000..inf] | 200000 |
| -e | --per-sample-rgid | Internally modify each read group ID by adding the filename. This can be used if read groups across different samples have conflicting IDs. | false |
| -f | --pseudocount-fraction | The biggest likelihood of the background distribution will be divided by this value to determine the pseudocounts of the histogram. Bigger values boost the sensitivity for HET calls but also increase the chance of misclassifying HOMDEL or HOMREF as HET calls. In range [50..inf] | 500 |
| -o | --out | Output file name | popdel.vcf |
| -r | --region-of-interest | Genomic region 'chr:start-end' (closed interval, 1-based index). Calling is limited to this region. Multiple regions can be defined by using the parameter -r multiple times. | / |
| -R | --ROI-file | File listing one or more regions of interest, one region per line. See parameter -r. | |
| -n | --no-regenotyping | Outputs every potential variant window without re-genotyping | false |
| -p | --prior-probability | Prior probability of a deletion. In range [0.0..0.9999] | 0.0001 |
| -t | --iterations | Maximum number of iterations in EM for length estimation | 15 |
| -u | --unsmoothed | Disable the smoothing of the insert size histogram | false |
| -x | --uncompressed-in | Read uncompressed profiles. If used, the profiles of all samples must be uncompressed. See option '-x' for PopDel profile or option '-o' for PopDel view. | false |

Note: Changing the value for the buffer size has a direct influence on memory usage and running time. The default value is set to a good compromise between running time and memory consumption. You can expect the required memory to roughly scale 1:1 with the window buffer, meaning an increase of the buffer by a factor of ten will also increase the required memory by a factor of ten. In our tests this led to a reduction of the running time by one third.
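As an illustration of the -R option described in this section, the sketch below writes a region file and passes it to popdel call. The file names myRegions.txt and myRegionCalls.vcf are arbitrary choices, and the call is only attempted if popdel is on the PATH and myProfiles.txt exists.

```shell
# Write one samtools-style region per line (file name is an arbitrary choice).
printf '%s\n' 'chr21' 'chr19:45000000-55000000' > myRegions.txt

# Restrict calling to the listed regions. The guard only skips the call on
# machines where popdel or the profile list is not available.
if command -v popdel >/dev/null 2>&1 && [ -f myProfiles.txt ]; then
    popdel call -R myRegions.txt -o myRegionCalls.vcf myProfiles.txt
fi
```

This is equivalent to giving each region with its own -r option on the command line; a file is easier to maintain once the number of regions grows.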

Filtering options for calling

| Flag | Extended flag | Meaning | Default |
|------|---------------|---------|---------|
| -A | --active-coverage-file | File with lines consisting of "ReadGroup maxCov". If this value is reached no more new reads are loaded for this read group until the coverage drops again. The sample will be excluded from calling in high-coverage windows. A value of 0 disables the filter for the read group. | |
| -a | --active-coverage | Maximum number of active read pairs (~coverage). This value is taken for all read groups that are not listed in 'active-coverage-file'. Setting it to 0 disables the filter for all read groups that are not specified in 'active-coverage-file'. In range [0..inf] | 100 |
| -c | --min-relative-window-cover | Determines which fraction of a deletion has to be covered by significant windows (see SWIN INFO field). In range [0..2.0] | 0.5 |
| -d | --max-deletion-size | Maximum size of deletions | 10000 |
| -F | --output-failed | Also output calls which did not pass the filters. | |
| -l | --min-init-length | Minimal deletion length at initialization of iteration | 4*standard deviation |
| -m | --min-length | Minimal deletion length during iteration | 95th percentile of standard deviations |
| -s | --min-sample-fraction | Minimum fraction of samples which is required to have enough data in the window. In range [0..1.0] | 0.1 |
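The per-read-group coverage limits for -A can be prepared as a small text file. The sketch below is an illustration only: the read group IDs are invented, the file name myCoverage.txt is arbitrary, and separating the two fields with a single space is an assumption based on the "ReadGroup maxCov" description.

```shell
# Hypothetical read group IDs. The single-space separator is an assumption.
# rg_sampleA is limited to 50 active read pairs; 0 disables the filter
# for rg_sampleB.
printf '%s %s\n' 'rg_sampleA' 50 'rg_sampleB' 0 > myCoverage.txt

# Read groups not listed in the file fall back to the value given with -a.
# The guard only skips the call where popdel or the profile list is missing.
if command -v popdel >/dev/null 2>&1 && [ -f myProfiles.txt ]; then
    popdel call -A myCoverage.txt -a 100 myProfiles.txt
fi
```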

Examples

Perform calling on all profiles listed in myProfiles.txt and write the output to myCalls.vcf:

popdel call -o myCalls.vcf myProfiles.txt

Perform calling on all profiles listed in myProfiles.txt, reporting only deletions with a length between 500 and 5000 on chr21 or in chr19:45000000-55000000. Only deletions with an initialization length of at least 450 are promoted to further iterations:

popdel call -l 450 -m 500 -d 5000 -r chr21 -r chr19:45000000-55000000 myProfiles.txt


Next page → Output popdel.vcf