multi allelic vcf: vcfanno is not respecting VCF number format and is flipping scores #87
Comments
I see what you mean. It should use The other "fix" would be to simply set
What about printing a warning when writing multiple values when the (previously written) header was
Somehow I missed that the alleles were flipped. That is indeed a bug. I'm looking into this and the other issue raised by @RoanKanninga now.
After much messing about, this is going to have to be indicated as a WARNING. I thought I could magically adjust the order, but this changes the behavior in cases where Number=1 is actually what is desired. I'll push a fix shortly once I have the other issue resolved.
re #83 and #87: if there are already values in the query INFO field for a variant with multiple alternates, incoming values will only overwrite existing values if they are non-nil (or non-zero values of the type). Thanks @RoanKanninga for reporting and providing test cases. When Number=1 in the annotation file (and therefore the input file) and there are multiple alternates in the input file, the values can be out of order. This now issues a warning indicating the file and the field in question and noting that it can be mitigated by decomposing the input file.
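To make the rule in that commit message concrete, here is a minimal Python sketch of the two behaviours it describes: incoming values only overwrite existing ones when they are non-nil and non-zero, and a warning is raised when a Number=1 field meets a variant with several alternates. This is only a model of the description above, not vcfanno's actual code (vcfanno is written in Go); the names `merge_alt_values` and `warn_if_ambiguous` and the warning text are made up for illustration.

```python
import warnings


def merge_alt_values(existing, incoming):
    """Merge per-alternate values position by position.

    An incoming value replaces the existing one only if it is not None
    and not a zero value (0, 0.0 or an empty string).
    """
    merged = list(existing)
    for i, value in enumerate(incoming):
        if i >= len(merged):
            # No existing value at this position yet, so take the incoming one.
            merged.append(value)
        elif value is not None and value != 0 and value != "":
            merged[i] = value
    return merged


def warn_if_ambiguous(number, n_alts, field, path):
    """Warn when a Number=1 field is written for a multi-alternate variant.

    Returns True if a warning was issued, False otherwise.
    """
    if number == "1" and n_alts > 1:
        warnings.warn(
            f"{path}: field {field} has Number=1 but the variant has "
            f"{n_alts} alternates; values may be out of order. "
            "Decomposing the input file avoids this."
        )
        return True
    return False


# The second alternate is filled in without touching the first.
print(merge_alt_values([1.5, None], [None, 2.5]))  # [1.5, 2.5]
```

The zero-value check mirrors the "non-zero values of the type" wording; a real score of exactly 0 would be treated as missing under this rule.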
Hi Brent,
What I really want is just one value for my Number=1 field. For example:
1 123456 . A C,G AN=24,24
I have an INFO field called AN that should always be Number=1, since it is the total number of all the alleles. But what I now see in my data is e.g. AN=24,24 instead of AN=24. So this is my real problem.
Can you use "first" instead of "self" in the ops field of the config file? That should grab only a single value instead of multiple.
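As a rough illustration of the difference between the two ops for the AN example above, the sketch below models "self" as keeping every collected value and "first" as keeping only the first one. The function `apply_op` is hypothetical and only mimics the effect described in this comment; it is not how vcfanno implements its ops.

```python
def apply_op(op, values):
    """Reduce the values collected for one field according to an op name.

    Only the two ops discussed in this thread are modelled.
    """
    if op == "self":
        # Keep everything that was collected, one entry per overlapping record.
        return list(values)
    if op == "first":
        # Keep a single value, or None if nothing was collected.
        return values[0] if values else None
    raise ValueError(f"op not modelled in this sketch: {op}")


collected = [24, 24]  # AN as it appears in the example line above
print(apply_op("self", collected))   # [24, 24]
print(apply_op("first", collected))  # 24
```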
Some of this is addressed in the latest release.
This one is quite complex to explain, so I will start with an example.
This is in my header
CADD,Number=1
CADD_SCALED,Number=A
When I have a multiallelic variant, let's say:
1 208063100 rs5780411 G GA,T
I would expect that CADD_SCALED has two values and CADD only one value.
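That expectation follows from the Number attribute in the VCF header: a fixed integer means exactly that many values, A means one value per ALT allele, and R means one per allele including REF. A small Python sketch of the count check, assuming a hypothetical helper named `expected_count`:

```python
def expected_count(number, n_alts):
    """Return how many values an INFO field should carry, or None if unbounded.

    number is the Number attribute from the header as a string,
    n_alts is the number of ALT alleles on the record.
    """
    if number == "A":
        return n_alts        # one value per ALT allele
    if number == "R":
        return n_alts + 1    # one value per allele, REF included
    if number.isdigit():
        return int(number)   # fixed count such as Number=1
    return None              # "." and "G" are not modelled here


alts = "GA,T".split(",")  # ALT column of the example variant
print(expected_count("1", len(alts)))  # 1 value for CADD
print(expected_count("A", len(alts)))  # 2 values for CADD_SCALED
```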
This works when my annotation file with the CADD/CADD_SCALED scores contains this position only once. But when the file has multiple lines for the same position with different ALT alleles (as with the CADD scores, where you get a score for each ALT allele), it all goes wrong.
Although CADD has Number=1, the CADD INFO field now has 2 values (one for each ALT allele), and the scores have been flipped: the CADD score for ALT allele 1 now has the value of ALT allele 2 and vice versa.
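To show what "flipped" means here, the sketch below lines up per-ALT annotation rows with the ALT order of the query record. It is only an illustration of the ordering problem, not what vcfanno does. The helper `align_by_alt` is hypothetical, the scores are placeholder numbers rather than real CADD values, and a real comparison would also have to match REF and normalise alleles.

```python
def align_by_alt(query_alts, annotation_rows):
    """Order annotation values to follow the ALT order of the query record.

    annotation_rows is a list of (alt, value) pairs, one per annotation
    line at the same position. ALTs without a matching row get None.
    """
    by_alt = {alt: value for alt, value in annotation_rows}
    return [by_alt.get(alt) for alt in query_alts]


query_alts = ["GA", "T"]             # ALT column of the example variant
rows = [("T", 2.0), ("GA", 1.0)]     # placeholder scores, file order differs
print(align_by_alt(query_alts, rows))  # [1.0, 2.0]
```

Taking the values in file order instead would give 2.0 for GA and 1.0 for T, which is the kind of swap described above.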
I included: input (input.vcf), output (annotated.vcf), conf (conf.toml) and the annotation file (whole.vcf.gz + index).
vcfAnno.tar.gz