VCFFixIndels
Pierre Lindenbaum edited this page Mar 24, 2015 · 16 revisions
Fixes variant alleles in a VCF (for @SolenaLS). Works with multiple alleles in the ALT column.
See also Compilation.

```bash
$ make vcffixindels
```
Option | Description |
---|---|
-o (filename) | output filename. Default: stdout |
-h | get help (this screen) and exit. |
-v | print version and exit. |
-L (level) | log level. One of the java.util.logging.Level values. Optional. |
```bash
$ curl -sx proxy-upgrade.univ-nantes.prive:3128 "ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/supporting/input_callsets/si/ALL.wgs.samtools_pass_filter.20130502.snps_indels.low_coverage.sites.vcf.gz" | gunzip -c | java -jar dist/vcffixindels.jar 2> /dev/null | grep FIX | head -n 15
```

```
##INFO=<ID=INDELFIXED,Number=1,Type=String,Description="Fix Indels for @SolenaLS (position|alleles...)">
1 2030197 . T TTTTGT,TTTTG 999 PASS (...);INDELFIXED=2030101|CGTTTTGTTTTGTTTTGTTTTGTTTTGTTTTGT|CGTTTTGTTTTGTTTTGTTTTGTTTTGTTTTGTTTTGT|CGTTTTGTTTTGTTTTGTTTTGTTTTGTTTTGTTTTG;(...)
1 3046432 . C CCCT,CCC 999 PASS (...);INDEL;INDELFIXED=3046429|TC|TCCCT|TCCC;(...)
1 4258343 rs137902679;rs61115653 A AAT,AA 999 PASS (...);INDELFIXED=4258316|CAAAAAAAAA|CAAAAAAAAAA|CAAAAAAAAAAT;(...)
1 5374893 rs59294415 C CCCC,CCCCA 999 PASS (...);INDELFIXED=5374881|TCCCC|TCCCCCCC|TCCCCCCCA;(...)
1 5669486 rs143435517 C CACAT,CAC 999 PASS (...);INDELFIXED=5669414|TACACACACACACACACACACACAC|TACACACACACACACACACACACACAC|TACACACACACACACACACACACACACAT;(...)
1 5702066 . A AA,AAC 999 PASS (...);INDELFIXED=5702060|TAA|TAAAC|TAAA;(...)
1 5713690 rs70977965 A AAAAA,AAAAAC 999 PASS (...);INDELFIXED=5713678|CAAAA|CAAAAAAAA|CAAAAAAAAC;(...)
1 5911138 . T TGCCATT,TGCCATTCCAAAGAGGCACTCA 999 PASS (...);INDELFIXED=5911135|CT|CTGCCATTCCAAAGAGGCACTCA|CTGCCATT;(...)
1 6067285 rs34064079;rs59468731 G GG,GGC 999 PASS (...);INDELFIXED=6067261|TGGGGGGGG|TGGGGGGGGG|TGGGGGGGGGC;(...)
1 6069978 . TC T,TTC 999 PASS (...);INDELFIXED=6069933|CTTTTTTTTTTTTTTTC|CTTTTTTTTTTTTTTTTC|CTTTTTTTTTTTTTTT;(...)
1 6480786 . C CGGGCCCCAGGCTGCCCGCC,CGGGCCCCAGGCTGCCCGCCT 999 PASS (...);INDELFIXED=6480783|GC|GCGGGCCCCAGGCTGCCCGCCT|GCGGGCCCCAGGCTGCCCGCC;(...)
1 6829103 rs34184977;rs5772255 A AAC,AA 999 PASS (...);INDELFIXED=6829070|TAAAAAAAAAAA|TAAAAAAAAAAAA|TAAAAAAAAAAAAC;(...)
1 7086221 . AG A,AAG 999 PASS (...);INDELFIXED=7086179|TAAAAAAAAAAAAAAG|TAAAAAAAAAAAAAAAG|TAAAAAAAAAAAAAA;(...)
1 8096197 . T TATATATATAC,TAT 999 PASS (...);INDELFIXED=8096143|CATATATATATATATATAT|CATATATATATATATATATAT|CATATATATATATATATATATATATATAC;(...)
```
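The `INDELFIXED` INFO value above is described in the header as `position|alleles...`. Judging from the example output, the first item looks like the position before fixing and the remaining items look like the original alleles, but that reading is an inference rather than documented behaviour. Under that assumption, a small Python sketch for pulling the value out of an INFO column might look like this (the function name `parse_indelfixed` is hypothetical):

```python
from typing import List, Optional, Tuple


def parse_indelfixed(info: str) -> Optional[Tuple[int, List[str]]]:
    """Return (position, alleles) from an INFO string, or None if the
    INDELFIXED key is absent.

    Assumes the value is laid out as position|allele|allele|...,
    as suggested by the header description.
    """
    for field in info.split(";"):
        key, sep, value = field.partition("=")
        if key == "INDELFIXED" and sep:
            parts = value.split("|")
            # First item: position; remaining items: allele strings.
            return int(parts[0]), parts[1:]
    return None
```

Called on an INFO string such as `INDEL;INDELFIXED=3046429|TC|TCCCT|TCCC`, this returns the position as an integer and the alleles as a list of strings; records without the key yield `None`.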
##Source Code
Main code is: https://github.com/lindenb/jvarkit/blob/master/src/main/java/com/github/lindenb/jvarkit/tools/vcffixindels/VCFFixIndels.java
- "Unified Representation of Genetic Variants" http://bioinformatics.oxfordjournals.org/content/early/2015/02/19/bioinformatics.btv112.abstract (hey! it was published after I wrote this tool!)
- https://github.com/quinlan-lab/vcftidy/blob/master/vcftidy.py
- http://www.cureffi.org/2014/04/24/converting-genetic-variants-to-their-minimal-representation/
- Issue Tracker: http://github.com/lindenb/jvarkit/issues
- Source Code: http://github.com/lindenb/jvarkit
##History
- 2013 : Creation
The project is licensed under the MIT license.