This package only takes BND notation VCF? #34

Closed
yangyxt opened this issue Mar 22, 2021 · 4 comments

Comments

@yangyxt

yangyxt commented Mar 22, 2021

I tried to convert VCF records to GRanges objects and use the breakpointGRangesToVCF function to normalise symbolic records into BND VCF records.

However, this doesn't work: symbolic records are stored in the GRanges object with an IRanges width > 1, and .toVcfBreakendNotationAlt asserts all(width(gr)==1). Records derived from symbolic VCF entries will therefore always fail this assertion.

I tested this with a simple DELLY-generated SV VCF file. Here is a screenshot of the GRanges object returned by breakpointRanges(vcf):
[Screenshot: GRanges object returned by breakpointRanges(vcf) for the DELLY VCF; image not available]

So, generally speaking, StructuralVariantAnnotation cannot normalise the format of SV records in VCF files from different callers? Should I do the normalisation myself, e.g. convert all symbolic records to BND notation records, and then load the VCF into StructuralVariantAnnotation?

@hsiaoyi0504

I have a similar question, possibly related to #33.
What's the acceptable notation of structural variant calls for StructuralVariantAnnotation?
Does it really support both notations of structural variants? Thank you.

@d-cameron
Member

@yangyxt sorry for the late reply. Can you post which version of DELLY you're using, and a VCF with a few entries in it?

> symbolic records will be stored as records with irange width > 1 in GRange Object

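That is actually possible for IMPRECISE events, since an imprecise breakend position is represented as a range rather than a single base. Without the input VCF, I'm not sure whether that is the case here or whether it's a bug in SVA.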
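To illustrate why an imprecise call produces a range wider than one base, here is a minimal Python sketch. It assumes the breakend range is built from POS plus the CIPOS confidence-interval offsets defined in the VCF 4.2 specification; the thread does not say exactly how SVA computes its ranges, and `breakend_interval` and `width` are hypothetical helpers, not SVA functions.

```python
def breakend_interval(pos, cipos=None):
    """Return the 1-based, inclusive (start, end) range covered by a breakend.

    Assumption: `cipos` is the (lower, upper) offset pair from INFO/CIPOS.
    A precise call (no CIPOS) collapses to a single base.
    """
    if cipos is None:
        return (pos, pos)
    lower, upper = cipos
    return (pos + lower, pos + upper)


def width(interval):
    """Width of an inclusive interval, as IRanges' width() counts it."""
    start, end = interval
    return end - start + 1


precise = breakend_interval(100)
imprecise = breakend_interval(100, cipos=(-10, 20))
print(width(precise), width(imprecise))  # 1 31

# A check like all(width(gr) == 1) therefore rejects the imprecise breakend.
print(all(width(iv) == 1 for iv in (precise, imprecise)))  # False
```

Under this assumption, a width-1 assertion is only safe for precise calls; an imprecise call with a non-zero CIPOS will always trip it.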

@d-cameron
Member

d-cameron commented Jul 2, 2021

Sorry for the long delay - I'm currently updating the documentation to better describe the design of StructuralVariantAnnotation.

> use breakpointGRangesToVCF

breakpointGRangesToVCF is only partially implemented and has not been officially released.

> I tried to convert vcf records to grange objects to normalise symbolic records to BND vcf records.

StructuralVariantAnnotation already does this in breakpointRanges(). I have test cases for VCFs produced by crest, delly, gridss, manta, pindel, tigra, lumpy, and others.

> What's the acceptable notation of structural variant calls for StructuralVariantAnnotation?
> Does it really support both notations of structural variants? Thank you.

Any spec-compliant VCF representation is supported (plus a few caller-specific ones I have special-case code for). That is, sequence, symbolic, breakpoint, and breakend notations are all supported. For example, StructuralVariantAnnotation can correctly parse the following VCF:

##fileformat=VCFv4.2
##INFO=<ID=SVLEN,Number=.,Type=Integer,Description="Difference in length between REF and ALT alleles">
##INFO=<ID=SVTYPE,Number=1,Type=String,Description="Type of structural variant">
##INFO=<ID=MATEID,Number=.,Type=String,Description="ID of mate breakends">
##INFO=<ID=END,Number=1,Type=Integer,Description="End position of the structural variant">
##ALT=<ID=DEL,Description="Deletion">
##contig=<ID=chr,length=18,sequence="CGTGTtgtagtaCCGTAA">
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO
chr	5	sequence	TTGTAGTA	T	.	.	.
chr	5	symbolic	T	<DEL>	.	.	SVTYPE=DEL;SVLEN=-7;END=12
chr	5	breakpoint1	T	T[chr:13[	.	.	SVTYPE=BND;MATEID=breakpoint2
chr	13	breakpoint2	C	]chr:5]C	.	.	SVTYPE=BND;MATEID=breakpoint1
chr	5	breakend	T	T.	.	.	SVTYPE=BND
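The first three records describe the same 7 bp deletion (positions 6 to 12) in three notations. The Python sketch below reduces each of them to the same pair of breakends, which is the idea behind normalising everything to breakpoint notation. It is not SVA's code (SVA is an R package and handles far more cases); `to_breakends` is hypothetical, only simple deletions, `<DEL>`, bracketed BND and single breakends are covered, and the orientation labels are this sketch's own convention: `+` means the break is immediately after the position, `-` immediately before it.

```python
import re

# The bracketed BND forms from the VCF spec: t[p[, t]p], ]p]t, [p[t.
# Greedy (?P<chrom>.+) backtracks to the last ':' so contig names may contain ':'.
BND_RE = re.compile(
    r"^(?P<pre>[ACGTN]*)(?P<b1>[\[\]])(?P<chrom>.+):(?P<pos>\d+)(?P=b1)(?P<post>[ACGTN]*)$",
    re.IGNORECASE,
)


def to_breakends(chrom, pos, ref, alt, info):
    """Convert one VCF ALT allele into a tuple of (chrom, pos, orientation)."""
    # Single breakends: "T." breaks after POS, ".T" breaks before POS.
    if len(alt) > 1 and alt.endswith("."):
        return ((chrom, pos, "+"),)
    if len(alt) > 1 and alt.startswith("."):
        return ((chrom, pos, "-"),)
    m = BND_RE.match(alt)
    if m:
        # Bases before the bracket: local sequence is kept to the left.
        local = "+" if m.group("pre") else "-"
        # '[' joins the piece extending right of p; ']' the piece extending left.
        remote = "-" if m.group("b1") == "[" else "+"
        return ((chrom, pos, local), (m.group("chrom"), int(m.group("pos")), remote))
    if alt == "<DEL>":
        # POS is the padding base; bases POS+1..END are deleted.
        end = int(info["END"])
        return ((chrom, pos, "+"), (chrom, end + 1, "-"))
    # Sequence notation, simple deletion only: REF and ALT share the first base.
    if len(alt) == 1 and len(ref) > 1 and ref[0].upper() == alt[0].upper():
        return ((chrom, pos, "+"), (chrom, pos + len(ref), "-"))
    raise ValueError(f"unsupported ALT in this sketch: {alt!r}")


print(to_breakends("chr", 5, "TTGTAGTA", "T", {}))       # (('chr', 5, '+'), ('chr', 13, '-'))
print(to_breakends("chr", 5, "T", "<DEL>", {"END": "12"}))  # (('chr', 5, '+'), ('chr', 13, '-'))
print(to_breakends("chr", 5, "T", "T[chr:13[", {}))      # (('chr', 5, '+'), ('chr', 13, '-'))
```

The mate record `]chr:5]C` yields the same two breakends in the opposite order, so a real implementation also has to pair mates (via MATEID) and deduplicate them.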

@d-cameron
Member

What is not immediately clear from the docs is that SVA converts everything into breakpoint notation. In the OP's DELLY example, SVA converts DUP000000000 into breakpoint notation, which is why the output includes DUP000000000_bp1 and DUP000000000_bp2, and why INV00026615_bp4 exists (an inversion has 2 breakpoints, i.e. 4 breakends).
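The breakend counts can be sketched in Python: a deletion or tandem duplication is one breakpoint (two breakends), while an inversion is two breakpoints (four breakends). This is a minimal sketch, not SVA's implementation. It assumes the VCF-spec convention that POS is the padding base before the event and END is the last affected base; callers such as DELLY may use other conventions, which is what SVA's caller-specific code handles. The function, the IDs, and the exact `_bpN` ordering are hypothetical; only the suffix style is borrowed from the output described in this comment.

```python
def symbolic_to_breakends(sv_id, chrom, pos, end, svtype):
    """Split one symbolic SV record into named breakends.

    Orientation: '+' = break after the position, '-' = break before it.
    """
    if svtype == "DEL":
        # Keep ...POS, skip POS+1..END, resume at END+1.
        pairs = [((pos, "+"), (end + 1, "-"))]
    elif svtype == "DUP":
        # Tandem copy: END of the first copy joins POS+1 of the second copy.
        pairs = [((end, "+"), (pos + 1, "-"))]
    elif svtype == "INV":
        # Two breakpoints: left flank joins the inverted segment's end,
        # and the inverted segment's start joins the right flank.
        pairs = [((pos, "+"), (end, "+")), ((pos + 1, "-"), (end + 1, "-"))]
    else:
        raise ValueError(f"unsupported SVTYPE in this sketch: {svtype}")

    breakends = []
    for first, second in pairs:
        for p, strand in (first, second):
            n = len(breakends) + 1
            breakends.append((f"{sv_id}_bp{n}", chrom, p, strand))
    return breakends


for svtype in ("DEL", "DUP", "INV"):
    names = [b[0] for b in symbolic_to_breakends(f"{svtype}1", "chr", 5, 12, svtype)]
    print(svtype, names)
```

The loop prints two names for DEL and DUP and four (`INV1_bp1` to `INV1_bp4`) for INV, matching the 2 breakpoints = 4 breakends count for an inversion.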
