Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Differing number of variant read counts for v3.3.0 and v3.5.0 and above #111

Open
sb43 opened this issue Apr 26, 2024 · 2 comments
Open
Assignees

Comments

@sb43
Copy link
Member

sb43 commented Apr 26, 2024

I have tested different versions of pindel to narrow down the read counting issue, it seems the issue has been observed after version v3.3.0
Versions tested and issue reproduced.
pindel_3.10.0 (current version)
pindel_3.5.0
pindel_3.7.0

VCF record showing difference in DP and PD counts in both tumour and normal samples between pindel versions.

Correct call pindel version (V3.3.0)

tabix ../WTSI-COLO_170_1pre_pindel_old_version_3.3.0/WTSI-COLO_170_1pre_vs_WTSI-COLO_170_b.flagged.vcf.gz chr2:202555406-202555406
chr2 202555406 897c3bb4-f119-11ee-90f0-9928a4fd4733 GA G 1200 PASS FF017;LEN=1;PC=D;RE=202555414;REP=7;RS=202555406;S1=105;S2=1683.32 GT:PP:NP:PB:NB:PD:ND:PR:NR:PU:NU:FD:FC:MTR:WTR:AMB:UNK:VAF ./.:0:0:0:0:15:19:15:19:0:0:34:0:0:32:0:0:0.0000 ./.:14:6:15:8:35:27:35:27:15:8:62:23:25:38:5:0:0.3968

Output from all affected versions

tabix pindel_3.10.0/WTSI-COLO_170_1pre_vs_WTSI-COLO_170_b.flagged.vcf.gz chr2:202555406-202555406
chr2 202555406 97380cb8-f75b-11ee-b828-bb7ec1a1a3fd GA G 1200 FF018 PC=D;RS=202555406;RE=202555414;LEN=1;S1=105;S2=1683.32;REP=7;FF017 GT:PP:NP:PB:NB:PD:ND:PR:NR:PU:NU:FD:FC ./.:0:0:0:0:2:0:2:0:0:0:2:0 ./.:14:6:1:0:2:1:16:7:15:6:23:21

VCF_HEADER:
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=PP,Number=1,Type=Integer,Description="Pindel calls on the positive strand">
##FORMAT=<ID=NP,Number=1,Type=Integer,Description="Pindel calls on the negative strand">
##FORMAT=<ID=PB,Number=1,Type=Integer,Description="BWA calls on the positive strand">
##FORMAT=<ID=NB,Number=1,Type=Integer,Description="BWA calls on the negative strand">
##FORMAT=<ID=PD,Number=1,Type=Integer,Description="BWA mapped reads on the positive strand">
##FORMAT=<ID=ND,Number=1,Type=Integer,Description="BWA mapped reads on the negative strand">
##FORMAT=<ID=PR,Number=1,Type=Integer,Description="Total mapped reads on the positive strand">
##FORMAT=<ID=NR,Number=1,Type=Integer,Description="Total mapped reads on the negative strand">
##FORMAT=<ID=PU,Number=1,Type=Integer,Description="Unique calls on the positive strand">
##FORMAT=<ID=NU,Number=1,Type=Integer,Description="Unique calls on the negative strand">
##FORMAT=<ID=FD,Number=1,Type=Integer,Description="Fragment depth">
##FORMAT=<ID=FC,Number=1,Type=Integer,Description="Fragment calls">

##FILTER=<ID=FF018,Description="Sufficient Depth: Pass if depth > 10">

Screenshot 2024-04-26 at 15 31 43
Thanks,
Shriram

@keiranmraine
Copy link
Contributor

Following our discussion I created a Dockerfile to install the dependencies in a single image and very small script focusing on the Bio::DB::HTS call to get_features_by_location that we identified as returning incorrect values:

sub _read_depth {
my ($sam, $seqid, $r_start, $r_end) = @_;
my $align_iter = $sam->get_features_by_location(
-seq_id => $seqid,
-start => $r_start,
-end => $r_end,
-filter => \&_read_filter,
-iterator => 1,
);

Investigation

Reproduce issue in cgppindel 3.10.0 image:

# quay.io/wtsicgp/cgppindel:3.10.0
$ docker run --rm -ti -v $PWD/tmp:/var/spool/data quay.io/wtsicgp/cgppindel:3.10.0 perl /var/spool/data/htslib-bind.pl | wc -l
2

Command runs docker, mounting a folder containing the script and bam/bai files, inline executes the script and counts the reads.

$ tree tmp/
tmp/
├── htslib-bind.pl
├── normal.sample.dupmarked.bam
└── normal.sample.dupmarked.bam.bai

Reproduce and prove fix with minimal image for speed of rebuild:

# reproduced: hts-lib-1.12
$ docker run --rm -ti -v $PWD/tmp:/var/spool/data hts-lib-1.12 perl /var/spool/data/htslib-bind.pl | wc -l
2
# fixed: hts-lib-1.20
$ docker run --rm -ti -v $PWD/tmp:/var/spool/data hts-lib-1.20 perl /var/spool/data/htslib-bind.pl | wc -l
34

Remediation actions

You need to update all of you image stack to 1.20 to ensure this isn't impacting any other tooling. For pindel specifically this is:

You should review the other code stacks as most are built on top of PCAP-core.

htslib-bind.pl

Expects files to be mounted to /var/spool/data/normal.sample.dupmarked.bam (inc index).

#!perl

use strict;
use Const::Fast qw(const);
use Bio::DB::HTS;

const my $F_UNMAPPED    => 4;
const my $F_NOT_PRIMARY => 256;
const my $F_QC_FAIL     => 512;
const my $F_DUPLICATE   => 1024;
const my $F_SUPP_ALIGN  => 2048;
const my @FILTER_READS_IF_FLAG => ($F_DUPLICATE, $F_UNMAPPED, $F_SUPP_ALIGN, $F_QC_FAIL, $F_NOT_PRIMARY);

my $sam = Bio::DB::HTS->new(-bam => '/var/spool/data/normal.sample.dupmarked.bam');
my $seqid = 'chr2';
my $r_start = 202555406;
my $r_end = 202555406;

my $align_iter = $sam->get_features_by_location(
    -seq_id => $seqid,
    -start => $r_start,
    -end => $r_end,
    -filter => \&_read_filter,
    -iterator => 1,
);

while(my $a = $align_iter->next_seq) {
	print $a->qname."\n";
}

sub _read_filter {
  my $flag = shift->flag;
  for my $test(@FILTER_READS_IF_FLAG) {
    return 0 if(($flag | $test) == $flag);
  }
  return 1;
}

@tomdrever
Copy link

This issue has been resolved in cgpPindel 3.11.0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants