ForkVcf
##Motivation
Fork a VCF.
##Compilation
Since 2016-05-30, compiling the "Java API for high-throughput sequencing data (HTS) formats" (htsjdk) library requires gradle http://gradle.org
- java compiler SDK 1.8 http://www.oracle.com/technetwork/java/index.html (NOT the old java 1.7 or 1.6). Please check that this java is in the ${PATH}. Setting JAVA_HOME is not enough (e.g. https://github.com/lindenb/jvarkit/issues/23 )
- GNU Make > 3.81
- curl/wget
- git
- gradle http://gradle.org (only required to compile htsjdk)
- xsltproc http://xmlsoft.org/XSLT/xsltproc2.html
$ git clone "https://github.com/lindenb/jvarkit.git"
$ cd jvarkit
$ make forkvcf
By default, the libraries are not included in the jar file, so you shouldn't move them (https://github.com/lindenb/jvarkit/issues/15#issuecomment-140099011 ). You can create a bigger but standalone executable jar by adding standalone=yes on the command line:
$ git clone "https://github.com/lindenb/jvarkit.git"
$ cd jvarkit
$ make forkvcf standalone=yes
The required libraries will be downloaded and installed in the dist directory.
A file named local.mk can be created and edited to override or add some definitions.
For example it can be used to set the HTTP proxy:
http.proxy.host=your.host.com
http.proxy.port=124567
It can also be used to set the gradle user home ( https://docs.gradle.org/current/userguide/build_environment.html#sec:gradle_properties_and_system_properties ):
gradle.user.home=/dir1/dir2/gradle_user_home
##Synopsis
$ java -jar dist/forkvcf.jar [options] (stdin|file.vcf|file.vcf.gz)
- -o|--output (OUTPUT-FILE) Output file. Default:stdout.
- -maxRecordsInRam|--maxRecordsInRam (NUMBER) When writing files that need to be sorted, this will specify the number of records stored in RAM before spilling to disk. Increasing this number reduces the number of file handles needed to sort a SAM/VCF/... file, and increases the amount of RAM needed. Default value : "500000".
- -tmpdir|--tmpdir (TMPDIR) Set temporary directory
- -g|--groupfile (GROUP) Chromosome group file. Intervals are 1-based. If undefined, splitvcf will use the sequence dictionary to output one vcf per contig.
- -n|--count (VALUE) Number of VCF files to generate. Default value : "2".
- -c|--splitbychunk When this option is used, the variants are first saved in a temporary file, the number of variants is divided by 'count' and the output files are produced linearly. The default is to dispatch the variants as they arrive in the stream. Default value : "false".
- -m|--manifest (VALUE) Optional. Save the names of the produced VCF files in this file.
- -h|--help print help
- -version|--version show version and exit
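The difference between the default dispatch and -c|--splitbychunk can be sketched in a few lines of Python. This is only an illustration, not the tool's actual code: the function names are hypothetical, round-robin order for the default mode is an assumption (the option text only says variants are dispatched as they arrive), and the way a remainder is spread across chunks is also an assumption.

```python
from typing import Iterable, List


def dispatch_streaming(variants: Iterable[str], count: int) -> List[List[str]]:
    """Default mode: assign each record to a group as it arrives.

    Round-robin order is an assumption made for this sketch.
    """
    groups: List[List[str]] = [[] for _ in range(count)]
    for i, record in enumerate(variants):
        groups[i % count].append(record)
    return groups


def dispatch_by_chunk(variants: Iterable[str], count: int) -> List[List[str]]:
    """-c mode: buffer all records first, then cut `count` contiguous chunks."""
    buffered = list(variants)  # stands in for the temporary file
    size, extra = divmod(len(buffered), count)
    groups: List[List[str]] = []
    start = 0
    for g in range(count):
        # Assumption: the first `extra` chunks receive one additional record.
        end = start + size + (1 if g < extra else 0)
        groups.append(buffered[start:end])
        start = end
    return groups


if __name__ == "__main__":
    records = [f"v{i}" for i in range(1, 8)]
    print(dispatch_streaming(records, 3))
    # [['v1', 'v4', 'v7'], ['v2', 'v5'], ['v3', 'v6']]
    print(dispatch_by_chunk(records, 3))
    # [['v1', 'v2', 'v3'], ['v4', 'v5'], ['v6', 'v7']]
```

The streaming mode needs no knowledge of the total number of variants, which is why it can work on stdin without a temporary file; the chunk mode keeps neighbouring variants together at the cost of buffering the whole input first.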
##Source Code
Main code is: https://github.com/lindenb/jvarkit/blob/master/src/main/java/com/github/lindenb/jvarkit/tools/misc/ForkVcf.java
Output filename (option -o) MUST contain the word GROUPID.
$ cat input.vcf | java -jar dist/forkvcf.jar -n 3 -o "_tmp.GROUPID.vcf"
[main] INFO jvarkit - opening VCF file "_tmp.00001.vcf" for writing
[main] INFO jvarkit - opening VCF file "_tmp.00002.vcf" for writing
[main] INFO jvarkit - opening VCF file "_tmp.00003.vcf" for writing
$ wc _tmp.0000*
226 6819 143947 _tmp.00001.vcf
226 6819 140792 _tmp.00002.vcf
225 6161 125219 _tmp.00003.vcf
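The mapping from the -o template to the file names seen in the log above can be mimicked as follows. This is a minimal sketch: the function name is hypothetical, and the five-digit, 1-based zero-padded index is inferred from the example output rather than taken from the source code.

```python
def output_name(template: str, group_index: int) -> str:
    """Replace the GROUPID placeholder with a zero-padded group number.

    Five-digit padding is inferred from the example log, not from the tool.
    """
    if "GROUPID" not in template:
        raise ValueError("output filename must contain the word GROUPID")
    return template.replace("GROUPID", f"{group_index:05d}")


if __name__ == "__main__":
    for i in range(1, 4):
        print(output_name("_tmp.GROUPID.vcf", i))
    # _tmp.00001.vcf
    # _tmp.00002.vcf
    # _tmp.00003.vcf
```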
- Issue Tracker: http://github.com/lindenb/jvarkit/issues
- Source Code: http://github.com/lindenb/jvarkit
The project is licensed under the MIT license.
Should you cite forkvcf ? https://github.com/mr-c/shouldacite/blob/master/should-I-cite-this-software.md
The current reference is:
http://dx.doi.org/10.6084/m9.figshare.1425030
Lindenbaum, Pierre (2015): JVarkit: java-based utilities for Bioinformatics. figshare. http://dx.doi.org/10.6084/m9.figshare.1425030