CompareBams4
##Motivation
Compare two BAM files and print a tab-delimited report.
##Compilation
- Java compiler SDK 1.8 http://www.oracle.com/technetwork/java/index.html (NOT the old Java 1.7 or 1.6). Please check that this java is in the `${PATH}`. Setting JAVA_HOME is not enough (e.g. https://github.com/lindenb/jvarkit/issues/23).
- GNU Make >= 3.81
- curl/wget
- git
- xsltproc http://xmlsoft.org/XSLT/xsltproc2.html (tested with "libxml 20706, libxslt 10126 and libexslt 815")
```bash
$ git clone "https://github.com/lindenb/jvarkit.git"
$ cd jvarkit
$ make cmpbams4
```
By default, the libraries are not included in the jar file, so you shouldn't move them (https://github.com/lindenb/jvarkit/issues/15#issuecomment-140099011). You can create a bigger but standalone executable jar by adding `standalone=yes` on the command line:
```bash
$ git clone "https://github.com/lindenb/jvarkit.git"
$ cd jvarkit
$ make cmpbams4 standalone=yes
```
The required libraries will be downloaded and installed in the `dist` directory.
A file `local.mk` can be created and edited to override or add some definitions. For example, it can be used to set the HTTP proxy:

```
http.proxy.host=your.host.com
http.proxy.port=12345
```
##Synopsis
```bash
$ java -jar dist/cmpbams4.jar [options] (stdin|file)
```
- -o|--output (OUTPUT-FILE) Output file. Default: stdout.
- -c|--chain (VALUE) Lift Over chain file from bam1 to bam2. Optional.
- -m|--mismatch (VALUE) Default Lift Over mismatch. A negative value means "use the default". Default value: "-1".
- -novalidchain|--novalidchain Disable Lift Over chain validation. Default value: "false".
- -st|--samtools Data was sorted using the `samtools sort -n` algorithm (which differs from Picard's), see https://github.com/samtools/hts-specs/issues/5 . Default value: "false".
- -h|--help Print help.
- -version|--version Show version and exit.
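The `-st` flag matters because a comparison of two name-sorted BAMs, presumably done by walking both files in parallel, only works if the files and the tool agree on one read-name order. `samtools sort -n` compares runs of digits numerically, whereas Picard's queryname sort compares names as plain strings (the difference discussed in the hts-specs issue above). The Java sketch below illustrates this under simplifying assumptions; it is not jvarkit code, the names `NameOrderSketch`, `naturalCompare` and `countShared` are hypothetical, and samtools' tie-breaking on SAM flags is ignored.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/**
 * Simplified sketch (not jvarkit code): two read-name orders and a parallel
 * walk over two name-sorted lists. Tie-breaking on SAM flags is ignored.
 */
public class NameOrderSketch {

    /** Plain String order; roughly Picard's queryname order (simplified). */
    static final Comparator<String> LEXICOGRAPHIC = Comparator.naturalOrder();

    /** Digit runs compared as numbers; roughly `samtools sort -n` (simplified). */
    static final Comparator<String> NATURAL = NameOrderSketch::naturalCompare;

    static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    static int naturalCompare(String a, String b) {
        int i = 0, j = 0;
        while (i < a.length() && j < b.length()) {
            char ca = a.charAt(i), cb = b.charAt(j);
            if (isDigit(ca) && isDigit(cb)) {
                // Skip leading zeros, then compare the two digit runs as numbers:
                // a longer run is larger; equal lengths compare digit by digit.
                while (i < a.length() && a.charAt(i) == '0') i++;
                while (j < b.length() && b.charAt(j) == '0') j++;
                int ei = i, ej = j;
                while (ei < a.length() && isDigit(a.charAt(ei))) ei++;
                while (ej < b.length() && isDigit(b.charAt(ej))) ej++;
                if (ei - i != ej - j) return Integer.compare(ei - i, ej - j);
                int cmp = a.substring(i, ei).compareTo(b.substring(j, ej));
                if (cmp != 0) return cmp;
                i = ei;
                j = ej;
            } else {
                if (ca != cb) return Character.compare(ca, cb);
                i++;
                j++;
            }
        }
        // The name with characters left over sorts last.
        return Integer.compare(a.length() - i, b.length() - j);
    }

    /** Walks two lists sorted with `order` in parallel; counts names found in both. */
    static int countShared(List<String> bam1, List<String> bam2, Comparator<String> order) {
        int i = 0, j = 0, shared = 0;
        while (i < bam1.size() && j < bam2.size()) {
            int cmp = order.compare(bam1.get(i), bam2.get(j));
            if (cmp == 0) {
                shared++;
                i++;
                j++;
            } else if (cmp < 0) {
                i++; // treated as a name only present in bam1
            } else {
                j++; // treated as a name only present in bam2
            }
        }
        return shared;
    }

    public static void main(String[] args) {
        List<String> natural = new ArrayList<>(Arrays.asList("read10", "read9"));
        natural.sort(NATURAL);
        List<String> lexicographic = new ArrayList<>(Arrays.asList("read10", "read9"));
        lexicographic.sort(LEXICOGRAPHIC);
        System.out.println(natural + " vs " + lexicographic); // [read9, read10] vs [read10, read9]

        // Both inputs are in `samtools sort -n` order; read9 is missing from bam2.
        List<String> bam2 = Arrays.asList("read10");
        System.out.println(countShared(natural, bam2, NATURAL));       // 1
        System.out.println(countShared(natural, bam2, LEXICOGRAPHIC)); // 0: read10 is missed
    }
}
```

With `read9` and `read10` the two orders disagree, so a parallel walk that uses the wrong comparator skips a name that is present in both inputs.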
##Source Code
Main code is: https://github.com/lindenb/jvarkit/blob/master/src/main/java/com/github/lindenb/jvarkit/tools/cmpbams/CompareBams4.java
##Example
The following Makefile maps the same reads on chromosomes chr21 and chr22 of hg19 and of hg38, then compares the two resulting BAM files:
```makefile
include ../../config/config.mk
CHROMS=21 22
OUTDIR=tmp

define run

${OUTDIR}/$(1).bam : ${OUTDIR}/$(1).fa.bwt R1.fastq.gz R2.fastq.gz
	${bwa.exe} mem -M -R '@RG\tID:SAMPLE\tLB:SAMPLE\tSM:SAMPLE\tPL:illumina\tCN:Nantes' ${OUTDIR}/$(1).fa $$(word 2,$$^) $$(word 3,$$^) | ${samtools.exe} view -b -u -S -F4 - | ${samtools.exe} sort -n -o $$@ -T ${OUTDIR}/$(1)_tmp -

${OUTDIR}/$(1).fa.bwt : ${OUTDIR}/$(1).fa
	${bwa.exe} index $$<

${OUTDIR}/$(1).dict : ${OUTDIR}/$(1).fa
	${java.exe} -jar $(picard.jar) CreateSequenceDictionary R=$$< O=$$@

${OUTDIR}/$(1).fa.fai : ${OUTDIR}/$(1).fa
	${samtools.exe} faidx $$<

${OUTDIR}/$(1).fa :
	mkdir -p $$(dir $$@) && rm -f $$@
	$$(foreach C,${CHROMS}, curl "http://hgdownload.cse.ucsc.edu/goldenPath/$(1)/chromosomes/chr$${C}.fa.gz" | gunzip -c >> $$@;)

endef

all: ${OUTDIR}/diff.txt

${OUTDIR}/diff.txt : ${OUTDIR}/hg19.bam ${OUTDIR}/hg38.bam ${OUTDIR}/hg19ToHg38.over.chain ${OUTDIR}/hg19.dict ${OUTDIR}/hg38.dict
	mkdir -p $(dir $@) && $(call run_jvarkit,cmpbams4) --novalidchain -st -c $(word 3,$^) $(word 1,$^) $(word 2,$^) > $@

${OUTDIR}/hg19ToHg38.over.chain :
	mkdir -p $(dir $@) && curl "http://hgdownload.cse.ucsc.edu/goldenPath/hg19/liftOver/hg19ToHg38.over.chain.gz" | gunzip -c > $@

$(eval $(call run,hg19))
$(eval $(call run,hg38))
```
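The `-c` option receives a UCSC chain file such as `hg19ToHg38.over.chain` above. To show what lifting a coordinate through a chain involves, here is a minimal Java sketch that parses one chain record (a header `chain score tName tSize tStrand tStart tEnd qName qSize qStrand qStart qEnd id`, then `size dt dq` lines and a final `size` line) and maps a 0-based source position to the destination assembly. It is a simplification under stated assumptions, not jvarkit's implementation: it handles a single chain on the '+' query strand only, the class `ChainLiftSketch` and the toy chain are hypothetical, and real tools usually delegate to a library (htsjdk, for example, provides a `LiftOver` class).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Minimal sketch (hypothetical, not jvarkit code): lift a 0-based position
 * through ONE UCSC chain record, '+' query strand only.
 */
public class ChainLiftSketch {

    /** One ungapped aligned block: start in source (t), start in destination (q), length. */
    static final class Block {
        final long tStart, qStart, size;

        Block(long tStart, long qStart, long size) {
            this.tStart = tStart;
            this.qStart = qStart;
            this.size = size;
        }
    }

    /** Parses a chain header plus its "size dt dq" lines into ungapped blocks. */
    static List<Block> parseChain(String header, List<String> dataLines) {
        // chain score tName tSize tStrand tStart tEnd qName qSize qStrand qStart qEnd id
        String[] h = header.trim().split("\\s+");
        if (!h[0].equals("chain") || !h[9].equals("+")) {
            throw new IllegalArgumentException("sketch only supports '+' query strand chains");
        }
        long t = Long.parseLong(h[5]);
        long q = Long.parseLong(h[10]);
        List<Block> blocks = new ArrayList<>();
        for (String line : dataLines) {
            String[] f = line.trim().split("\\s+");
            long size = Long.parseLong(f[0]);
            blocks.add(new Block(t, q, size));
            t += size;
            q += size;
            if (f.length == 3) { // gaps before the next block; the last line has only "size"
                t += Long.parseLong(f[1]);
                q += Long.parseLong(f[2]);
            }
        }
        return blocks;
    }

    /** Returns the lifted 0-based position, or -1 if it falls outside every block. */
    static long lift(List<Block> blocks, long pos) {
        for (Block b : blocks) {
            if (pos >= b.tStart && pos < b.tStart + b.size) {
                return b.qStart + (pos - b.tStart);
            }
        }
        return -1L;
    }

    public static void main(String[] args) {
        // Hypothetical toy chain, NOT taken from hg19ToHg38.over.chain.
        String header = "chain 1000 chrA 1000 + 100 230 chrB 2000 + 500 640 1";
        List<Block> blocks = parseChain(header, Arrays.asList("50 10 20", "70"));
        System.out.println(lift(blocks, 120)); // 520
        System.out.println(lift(blocks, 155)); // -1 (inside the gap)
        System.out.println(lift(blocks, 165)); // 575
    }
}
```

A position that falls in a gap has no counterpart in the other assembly, which is one reason two BAMs mapped on different builds cannot always be compared read by read.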
Example content of `tmp/diff.txt` (truncated):

| onlyIn | liftover | compareContig | shift | diffCigarOperations | diffNM | diffFlags | diffChroms | Count |
|---|---|---|---|---|---|---|---|---|
| BOTH | SameChrom | DiscordantContig | . | -1 | 0 | 147/163 | chr22/chr21 | 2 |
| BOTH | SameChrom | SameContig | Gt100 | 3 | 15 | 83/83 | chr22/chr22 | 1 |
| BOTH | SameChrom | DiscordantContig | . | 3 | 5 | 147/129 | chr21/chr22 | 1 |
| BOTH | SameChrom | SameContig | Gt100 | -1 | 1 | 163/163 | chr22/chr22 | 22 |
| BOTH | SameChrom | DiscordantContig | . | 0 | 1 | 83/99 | chr21/chr22 | 32 |
| BOTH | SameChrom | SameContig | Gt100 | 0 | 2 | 99/99 | chr21/chr21 | 22 |
| BOTH | SameChrom | DiscordantContig | . | 2 | 6 | 81/65 | chr22/chr21 | 1 |
| BOTH | SameChrom | SameContig | Gt100 | 0 | 0 | 185/137 | chr22/chr22 | 20 |
| BOTH | SameChrom | SameContig | Zero | 0 | 0 | 177/177 | chr22/chr22 | 1417 |

(...)
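The report is tab-delimited with a header line, and its last column, `Count`, presumably holds the number of reads falling into each combination of categories. The hedged Java sketch below (class and method names are hypothetical) sums `Count` per value of one column and decodes a `diffFlags` pair, assuming the pair reads `flag-in-bam1/flag-in-bam2` as the `diffChroms` values suggest. The bit meanings are the standard SAM FLAG bits; the short names are informal labels, not an official API.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;

/** Sketch: summarizes the tab-delimited report and decodes SAM flags. */
public class ReportSummary {

    /** Sums the "Count" column for each distinct value of `column`. */
    static Map<String, Long> sumCountBy(BufferedReader in, String column) throws IOException {
        String headerLine = in.readLine();
        if (headerLine == null) throw new IOException("empty report");
        String[] header = headerLine.split("\t");
        int keyIdx = -1, countIdx = -1;
        for (int i = 0; i < header.length; i++) {
            if (header[i].equals(column)) keyIdx = i;
            if (header[i].equals("Count")) countIdx = i;
        }
        if (keyIdx < 0 || countIdx < 0) throw new IOException("column not found");
        Map<String, Long> totals = new TreeMap<>(); // sorted keys for stable output
        String line;
        while ((line = in.readLine()) != null) {
            if (line.isEmpty()) continue;
            String[] f = line.split("\t");
            totals.merge(f[keyIdx], Long.parseLong(f[countIdx]), Long::sum);
        }
        return totals;
    }

    /** Lists the set bits of a SAM FLAG (bits 0x1 to 0x800) using informal names. */
    static String describeFlag(int flag) {
        String[] names = {"PAIRED", "PROPER_PAIR", "UNMAPPED", "MATE_UNMAPPED",
                "REVERSE", "MATE_REVERSE", "READ1", "READ2",
                "SECONDARY", "QC_FAIL", "DUPLICATE", "SUPPLEMENTARY"};
        StringBuilder sb = new StringBuilder();
        for (int bit = 0; bit < names.length; bit++) {
            if ((flag & (1 << bit)) != 0) {
                if (sb.length() > 0) sb.append('|');
                sb.append(names[bit]);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Three rows copied from the example output above.
        String report = "onlyIn\tliftover\tcompareContig\tshift\tdiffCigarOperations\tdiffNM\tdiffFlags\tdiffChroms\tCount\n"
                + "BOTH\tSameChrom\tDiscordantContig\t.\t0\t1\t83/99\tchr21/chr22\t32\n"
                + "BOTH\tSameChrom\tSameContig\tGt100\t0\t2\t99/99\tchr21/chr21\t22\n"
                + "BOTH\tSameChrom\tSameContig\tZero\t0\t0\t177/177\tchr22/chr22\t1417\n";
        // prints {DiscordantContig=32, SameContig=1439}
        System.out.println(sumCountBy(new BufferedReader(new StringReader(report)), "compareContig"));
        // 147 = PAIRED|PROPER_PAIR|REVERSE|READ2, 163 = PAIRED|PROPER_PAIR|MATE_REVERSE|READ2
        for (String flag : "147/163".split("/")) {
            System.out.println(flag + " = " + describeFlag(Integer.parseInt(flag)));
        }
    }
}
```

In the first example row, for instance, both files map the read as the reverse-strand second mate in one BAM and the forward-strand second mate in the other, while placing it on different chromosomes.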
- Issue Tracker: http://github.com/lindenb/jvarkit/issues
- Source Code: http://github.com/lindenb/jvarkit
The project is licensed under the MIT license.
Should you cite cmpbams4? See https://github.com/mr-c/shouldacite/blob/master/should-I-cite-this-software.md
The current reference is:
http://dx.doi.org/10.6084/m9.figshare.1425030
Lindenbaum, Pierre (2015): JVarkit: java-based utilities for Bioinformatics. figshare. http://dx.doi.org/10.6084/m9.figshare.1425030