04. Call deletions with popdel call
Overview
General calling options
Filtering options for calling
Examples
After the profiles have been created, popdel call takes a list of all profiles and performs joint calling on all samples simultaneously. In the calling step, variants are analyzed across all samples by iterating over the genome in windows of 30 bp and performing a likelihood ratio test for each window. The test compares the likelihood that a deletion of a certain length overlaps the window against the likelihood of observing the reference haplotype. The size and frequency of the possible deletion(s) are estimated iteratively using the empirical insert size distributions and adaptive weighting. This approach also works for deletions that overlap each other and for deletions with a very low allele frequency.
The BAM files are no longer required for this step. The genotyped calls are written in VCF format (v4.2) to the file popdel.vcf. The option -o can be used to change the path and name of the output file. In the simplest case the call is:
popdel call myProfiles.txt
where myProfiles.txt contains the paths to the previously created profiles (one path per line). On UNIX systems this list can conveniently be created by running realpath and redirecting its output to a file:
realpath myProfileFolder/myProfile*.profile > myProfiles.txt
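Because the list is just a text file of paths, it can be worth checking it before starting a long calling run. The following is a minimal sketch; check_profile_list is a hypothetical helper and not part of PopDel. It only tests that every listed path is a readable file and does not inspect the profile contents.

```shell
#!/bin/sh
# check_profile_list: hypothetical helper, NOT part of PopDel.
# Reports every path in a profile list that is missing or unreadable.
check_profile_list() {
    list="$1"
    missing=0
    # Read the list line by line; each line is expected to hold one path.
    while IFS= read -r profile || [ -n "$profile" ]; do
        [ -z "$profile" ] && continue      # skip empty lines
        if [ ! -r "$profile" ]; then
            echo "missing or unreadable: $profile" >&2
            missing=$((missing + 1))
        fi
    done < "$list"
    # Exit status is 0 only if every listed path is readable.
    [ "$missing" -eq 0 ]
}

# Usage (assumes myProfiles.txt was created with realpath as described above):
#   check_profile_list myProfiles.txt && popdel call myProfiles.txt
```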
Instead of using a file containing the profiles, it is also possible to pass the profiles directly as command line arguments, but only if at least two profiles are given:
popdel call myProfile1.profile myProfile2.profile [...] myProfileN.profile
Note: It is NOT possible to run popdel call with only one profile as an argument, as this single argument will be interpreted as the text file containing all the profiles. Use a file containing the path to the profile instead.
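One way to avoid the single-argument pitfall is to always go through a list file, regardless of how many profiles there are. The sketch below assumes a realpath that accepts several arguments (as GNU coreutils does); make_profile_list and the file name single.txt are hypothetical names chosen for illustration.

```shell
#!/bin/sh
# make_profile_list: hypothetical helper, NOT part of PopDel.
# Writes the absolute paths of the given profiles to a list file,
# one path per line. Works for one profile as well as for many.
make_profile_list() {
    out="$1"
    shift
    if [ "$#" -eq 0 ]; then
        echo "usage: make_profile_list LIST PROFILE [PROFILE...]" >&2
        return 1
    fi
    realpath "$@" > "$out"
}

# Usage with a single profile:
#   make_profile_list single.txt myProfile1.profile
#   popdel call single.txt
```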
Like the profiling, the calling can be restricted to a single or multiple regions of interest. This is done either by using the option -r followed by a samtools-style region (repeated for multiple regions) or by using the option -R followed by the path of a file containing one region of interest per line. The profiles contain their own index, so jumping to the regions to perform the calling on them is fast. Other options include:
Flag | Extended flag | Meaning | Default |
---|---|---|---|
-b | --buffer-size | Number of buffered windows. In range [10000..inf] | 200000 |
-e | --per-sample-rgid | Internally modify each read group ID by adding the filename. This can be used if read groups across different samples have conflicting IDs. | false |
-f | --pseudocount-fraction | The biggest likelihood of the background distribution will be divided by this value to determine the pseudocounts of the histogram. Bigger values boost the sensitivity for HET calls but also increase the chance of misclassifying HOMDEL or HOMREF as HET calls. In range [50..inf] | 500 |
-o | --out | Output file name | popdel.vcf |
-r | --region-of-interest | Genomic region 'chr:start-end' (closed interval, 1-based index). Calling is limited to this region. Multiple regions can be defined by using the parameter -r multiple times. | / |
-R | --ROI-file | File listing one or more regions of interest, one region per line. See parameter -r. | |
-n | --no-regenotyping | Outputs every potential variant window without re-genotyping | false |
-p | --prior-probability | Prior probability of a deletion. In range [0.0..0.9999] | 0.0001 |
-t | --iterations | Maximum number of iterations in EM for length estimation | 15 |
-u | --unsmoothed | Disable the smoothing of the insert size histogram | false |
-x | --uncompressed-in | Read uncompressed profiles. If used, the profiles of all samples must be uncompressed. See option '-x' for PopDel profile or option '-o' for PopDel view. | false |
Note: Changing the value for the buffer size has a direct influence on memory usage and running time. The default value is set to a good compromise between running time and memory consumption. You can expect the required memory to roughly scale 1:1 with the window buffer, meaning an increase of the buffer by a factor of ten will also increase the required memory by a factor of ten. In our tests this led to a reduction of the running time by one third.
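As a small illustration of the -R option from the table above, the sketch below writes a region file with one samtools-style region per line and passes it to popdel call. The file name myRegions.txt is an assumption made for this example, and the call is only attempted if popdel is installed and myProfiles.txt exists.

```shell
#!/bin/sh
# Write two regions of interest, one per line.
# myRegions.txt is a hypothetical file name chosen for this example.
printf '%s\n' 'chr21' 'chr19:45000000-55000000' > myRegions.txt

# Run the calling only if popdel is available and the profile list exists.
if command -v popdel >/dev/null 2>&1 && [ -f myProfiles.txt ]; then
    popdel call -R myRegions.txt myProfiles.txt
fi
```

Going by the option descriptions, this should restrict calling to the same regions as passing -r chr21 -r chr19:45000000-55000000 on the command line.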
Filtering options for calling:
Flag | Extended flag | Meaning | Default |
---|---|---|---|
-A | --active-coverage-file | File with lines consisting of "ReadGroup maxCov". If this value is reached no more new reads are loaded for this read group until the coverage drops again. The sample will be excluded from calling in high-coverage windows. A value of 0 disables the filter for the read group. | |
-a | --active-coverage | Maximum number of active read pairs (~coverage). This value is taken for all read groups that are not listed in 'active-coverage-file'. Setting it to 0 disables the filter for all read groups that are not specified in 'active-coverage-file'. In range [0..inf] | 100 |
-c | --min-relative-window-cover | Determines which fraction of a deletion has to be covered by significant windows (see SWIN INFO field). In range [0..2.0] | 0.5 |
-d | --max-deletion-size | Maximum size of deletions | 10000 |
-F | --output-failed | Also output calls which did not pass the filters. | |
-l | --min-init-length | Minimal deletion length at initialization of iteration | 4*standard deviation |
-m | --min-length | Minimal deletion length during iteration | 95th percentile of standard deviations |
-s | --min-sample-fraction | Minimum fraction of samples which is required to have enough data in the window. In range [0..1.0] | 0.1 |
Perform calling on all profiles listed in myProfiles.txt and write the output to myCalls.vcf:
popdel call -o myCalls.vcf myProfiles.txt
Perform calling on all profiles listed in myProfiles.txt, only reporting deletions with a length between 500 and 5000 that lie on chr21 or within chr19:45000000-55000000. Only deletions with an initialization length of at least 450 are promoted to further iterations.
popdel call -l 450 -m 500 -d 5000 -r chr21 -r chr19:45000000-55000000 myProfiles.txt
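After a run it can be handy to get a quick tally of the calls in the output. The sketch below relies only on general VCF conventions (header lines start with '#', the seventh tab-separated column is FILTER, and records that pass all filters carry the value PASS); count_vcf_records and count_vcf_passing are hypothetical helper names, not PopDel commands.

```shell
#!/bin/sh
# Hypothetical helpers, NOT part of PopDel. They rely only on the generic
# VCF layout: '#'-prefixed header lines and FILTER in column 7.

# Count all variant records (non-header, non-empty lines).
count_vcf_records() {
    awk '!/^#/ && NF { n++ } END { print n + 0 }' "$1"
}

# Count the records whose FILTER column equals PASS.
count_vcf_passing() {
    awk -F '\t' '!/^#/ && $7 == "PASS" { n++ } END { print n + 0 }' "$1"
}

# Usage (assumes myCalls.vcf was written by one of the calls above):
#   count_vcf_records myCalls.vcf
#   count_vcf_passing myCalls.vcf
```

The two numbers can only differ if failed calls were written to the file, which according to the filtering options requires -F.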
Next page → Output popdel.vcf