-
Notifications
You must be signed in to change notification settings - Fork 133
BamToSql
##Motivation
Convert a SAM/BAM to sqlite statements
##Compilation
- java 1.8 http://www.oracle.com/technetwork/java/index.html (NOT the old java 1.7 or 1.6) . Please check that this java is in the
${PATH}
. Setting JAVA_HOME is not enough : (e.g: https://github.com/lindenb/jvarkit/issues/23 ) - GNU Make > 3.81
- curl/wget
- git
- apache ant is only required to compile htsjdk
- xsltproc http://xmlsoft.org/XSLT/xsltproc2.html
$ git clone "https://github.com/lindenb/jvarkit.git"
$ cd jvarkit
$ make bam2sql
by default, the libraries are not included in the jar file, so you shouldn't move them (https://github.com/lindenb/jvarkit/issues/15#issuecomment-140099011 ). You can create a bigger but standalone executable jar by addinging standalone=yes
on the command line:
$ git clone "https://github.com/lindenb/jvarkit.git"
$ cd jvarkit
$ make bam2sql standalone=yes
The required libraries will be downloaded and installed in the dist
directory.
The a file local.mk can be created edited to override/add some paths.
For example it can be used to set the HTTP proxy:
http.proxy.host=your.host.com
http.proxy.port=124567
##Synopsis
$ java -jar dist/bam2sql.jar [options] (stdin|file)
- -o|--output (OUTPUT-FILE) Output file. Default:stdout
- -R|--REF (FASTA) indexed Fasta sequence
- -r|--region (CHR:START-END) Restrict to a given region Default value : "".
- -c|--cigar print cigar data Default value : "false".
- -f|--flag expands details about sam flag Default value : "false".
- -h|--help print help
- -version|--version show version and exit
##Source Code
Main code is: https://github.com/lindenb/jvarkit/blob/master/src/main/java/com/github/lindenb/jvarkit/tools/misc/BamToSql.java
Inserting a BAM in a SQL is not a good idea of course ! But it might be interesting to get some informations about the bases in a segment of bam.
The schema can change if some options (-c , -f) are used. At the time of writing the schema is :
CREATE TABLE IF NOT EXISTS SamFile
(
id INTEGER PRIMARY KEY,
filename TEXT
);
CREATE TABLE IF NOT EXISTS Dictionary
(
id INTEGER PRIMARY KEY,
name TEXT NOT NULL,
length INT NOT NULL,
tid INT NOT NULL,
samfile_id INT NOT NULL,
FOREIGN KEY(samfile_id) REFERENCES SamFile(id)
);
CREATE TABLE IF NOT EXISTS ReadGroup
(
id INTEGER PRIMARY KEY,
groupId TEXT NOT NULL,
sample TEXT NOT NULL,
samfile_id INT NOT NULL,
FOREIGN KEY(samfile_id) REFERENCES SamFile(id)
);
CREATE TABLE IF NOT EXISTS Read
(
id INTEGER PRIMARY KEY,
name TEXT NOT NULL,
flag INTEGER NOT NULL,
rname TEXT,
pos INTEGER,
mapq INTEGER NOT NULL,
cigar TEXT,
rnext TEXT,
pnext INTEGER,
tlen INTEGER,
sequence TEXT NOT NULL,
qualities TEXT NOT NULL,
samfile_id INT NOT NULL,
group_id INT NOT NULL,
FOREIGN KEY(samfile_id) REFERENCES SamFile(id),
FOREIGN KEY(group_id) REFERENCES ReadGroup(id)
);
CREATE TABLE IF NOT EXISTS Cigar
(
id INTEGER PRIMARY KEY,
read_pos INT ,
read_base TEXT,
read_qual INT ,
ref_pos INT ,
ref_base TEXT,
operator TEXT NOT NULL,
read_id INT NOT NULL,
FOREIGN KEY(read_id) REFERENCES Read(id)
);
Build a sqlite3 database for a set of BAM files in the region "rotavirus:1-10""
$java -jar dist/bam2sql.jar -r 'rotavirus:1-10' -R ref.fa -c S*.bam | sqlite3 database.sqlite
Select data from sqlite database where the genomic position is "rotavirus:5"
select SamFile.filename,
ReadGroup.sample,
Read.flag,
Read.rname,
Cigar.operator,
Cigar.read_pos,
Cigar.read_base,
Cigar.read_qual,
Cigar.ref_pos,
Cigar.ref_base
from
SamFile,Read,Cigar,ReadGroup
where
SamFile.id = Read.samfile_id AND
ReadGroup.id = Read.group_id AND
Cigar.read_id = Read.id and
Read.rname = "rotavirus" and
Cigar.ref_pos= 5
;
query:
$ sqlite3 -header -separator ' ' database.sqlite < query.sql | column -t
output:
filename sample flag rname operator read_pos read_base read_qual ref_pos ref_base
S1.bam S1 99 rotavirus M 4 T 10 5 T
S1.bam S1 163 rotavirus M 4 T 10 5 T
S1.bam S1 163 rotavirus M 4 T 10 5 T
S1.bam S1 99 rotavirus M 4 T 10 5 T
S1.bam S1 163 rotavirus M 4 T 10 5 T
S1.bam S1 99 rotavirus M 3 T 10 5 T
S1.bam S1 99 rotavirus M 3 T 10 5 T
S1.bam S1 99 rotavirus M 3 T 10 5 T
S1.bam S1 99 rotavirus M 3 T 10 5 T
S1.bam S1 163 rotavirus M 3 T 10 5 T
S1.bam S1 163 rotavirus M 3 T 10 5 T
S1.bam S1 163 rotavirus M 3 T 10 5 T
S1.bam S1 99 rotavirus M 3 T 10 5 T
S1.bam S1 163 rotavirus M 3 T 10 5 T
S1.bam S1 99 rotavirus M 2 T 10 5 T
S1.bam S1 99 rotavirus M 2 T 10 5 T
S1.bam S1 163 rotavirus M 2 T 10 5 T
S1.bam S1 163 rotavirus M 2 T 10 5 T
S1.bam S1 99 rotavirus M 1 T 10 5 T
S1.bam S1 99 rotavirus M 1 T 10 5 T
S1.bam S1 163 rotavirus M 1 T 10 5 T
S1.bam S1 163 rotavirus M 1 T 10 5 T
S1.bam S1 163 rotavirus M 4 T 10 5 T
S1.bam S1 163 rotavirus M 0 T 10 5 T
S1.bam S1 163 rotavirus M 0 T 10 5 T
S1.bam S1 163 rotavirus M 0 T 10 5 T
S1.bam S1 163 rotavirus M 0 T 10 5 T
S1.bam S1 99 rotavirus S 3 T 10 5 T
S1.bam S1 163 rotavirus S 0 T 10 5 T
S1.bam S1 99 rotavirus S 3 T 10 5 T
S2.bam S2 99 rotavirus M 4 T 10 5 T
S2.bam S2 163 rotavirus M 4 T 10 5 T
S2.bam S2 99 rotavirus M 4 T 10 5 T
S2.bam S2 99 rotavirus M 4 T 10 5 T
S2.bam S2 163 rotavirus M 4 T 10 5 T
S2.bam S2 163 rotavirus M 3 T 10 5 T
S2.bam S2 99 rotavirus M 3 T 10 5 T
S2.bam S2 99 rotavirus M 3 T 10 5 T
S2.bam S2 99 rotavirus M 3 T 10 5 T
S2.bam S2 99 rotavirus M 2 A 10 5 T
S2.bam S2 163 rotavirus M 2 T 10 5 T
S2.bam S2 99 rotavirus M 1 T 10 5 T
S2.bam S2 99 rotavirus M 1 T 10 5 T
S2.bam S2 163 rotavirus M 3 T 10 5 T
S2.bam S2 99 rotavirus M 1 T 10 5 T
S2.bam S2 99 rotavirus M 1 T 10 5 T
S2.bam S2 99 rotavirus M 0 T 10 5 T
S2.bam S2 99 rotavirus M 0 T 10 5 T
S2.bam S2 163 rotavirus S 4 T 10 5 T
S2.bam S2 99 rotavirus S 2 A 10 5 T
S3.bam S3 99 rotavirus M 4 A 10 5 T
S3.bam S3 163 rotavirus M 4 T 10 5 T
S3.bam S3 99 rotavirus M 4 T 10 5 T
S3.bam S3 99 rotavirus M 3 T 10 5 T
S3.bam S3 99 rotavirus M 3 T 10 5 T
S3.bam S3 99 rotavirus M 3 T 10 5 T
S3.bam S3 163 rotavirus M 3 T 10 5 T
S3.bam S3 163 rotavirus M 3 T 10 5 T
S3.bam S3 163 rotavirus M 2 T 10 5 T
S3.bam S3 163 rotavirus M 2 T 10 5 T
S3.bam S3 99 rotavirus M 2 T 10 5 T
S3.bam S3 163 rotavirus M 2 T 10 5 T
S3.bam S3 99 rotavirus M 1 T 10 5 T
S3.bam S3 163 rotavirus M 1 A 10 5 T
S3.bam S3 99 rotavirus M 1 A 10 5 T
S3.bam S3 99 rotavirus M 1 A 10 5 T
S3.bam S3 99 rotavirus M 1 T 10 5 T
S3.bam S3 163 rotavirus M 1 T 10 5 T
S3.bam S3 99 rotavirus M 0 T 10 5 T
S3.bam S3 163 rotavirus M 0 T 10 5 T
S3.bam S3 163 rotavirus M 0 T 10 5 T
S3.bam S3 163 rotavirus M 0 T 10 5 T
S3.bam S3 163 rotavirus M 0 A 10 5 T
S3.bam S3 99 rotavirus M 0 T 10 5 T
S3.bam S3 163 rotavirus M 0 T 10 5 T
S3.bam S3 99 rotavirus S 2 A 10 5 T
S4.bam S4 163 rotavirus M 4 T 10 5 T
S4.bam S4 163 rotavirus M 4 T 10 5 T
S4.bam S4 99 rotavirus M 4 T 10 5 T
S4.bam S4 163 rotavirus M 4 T 10 5 T
S4.bam S4 163 rotavirus M 3 T 10 5 T
S4.bam S4 163 rotavirus M 3 T 10 5 T
S4.bam S4 99 rotavirus M 3 T 10 5 T
S4.bam S4 163 rotavirus M 2 T 10 5 T
S4.bam S4 99 rotavirus M 1 T 10 5 T
S4.bam S4 99 rotavirus M 0 T 10 5 T
S4.bam S4 99 rotavirus M 4 T 10 5 T
S4.bam S4 163 rotavirus M 0 T 10 5 T
S4.bam S4 163 rotavirus M 0 T 10 5 T
- Issue Tracker: http://github.com/lindenb/jvarkit/issues
- Source Code: http://github.com/lindenb/jvarkit
The project is licensed under the MIT license.
Should you cite bam2sql ? https://github.com/mr-c/shouldacite/blob/master/should-I-cite-this-software.md
The current reference is:
http://dx.doi.org/10.6084/m9.figshare.1425030
Lindenbaum, Pierre (2015): JVarkit: java-based utilities for Bioinformatics. figshare. http://dx.doi.org/10.6084/m9.figshare.1425030