VATAligner (Versatile Alignment Tool) is a high-performance, multi-purpose tool for DNA and protein sequence alignment. It supports a wide range of tasks, including nucleotide and protein database creation, homology searches, splice alignments, and alignment of whole-genome sequencing reads.
- Fast and Efficient: Multi-threaded support for large-scale data processing.
- Versatile: DNA and protein sequence alignment with advanced options.
- Flexible: Fine-tune parameters to customize alignment for specific applications.
- Comprehensive Options: Supports chimera alignment, circular DNA alignment, splice alignments, and more.

| Option | Description |
|---|---|
| `-h`, `--help` | Show help message. |
| `-p`, `--threads` | Number of CPU threads (default: 4). |
| `-d`, `--db` | Specify the database file. |
| `-a`, `--vaa` | VAT alignment archive (vatr) file. |
| `--dbtype` | Database type: `nucl` (nucleotide) or `prot` (protein). |

| Option | Description |
|---|---|
| `-i`, `--in` | Input reference file in FASTA format. |

| Option | Description |
|---|---|
| `-q`, `--query` | Input query file. |
| `-k`, `--maxtarget_seqs` | Maximum number of target sequences to report (default: 25). |
| `--top` | Report alignments within the top percentage range of alignment scores (default: 100). |
| `-e`, `--evalue` | Maximum e-value to report (default: 0.001). |
| `--min_score` | Minimum bit score to report alignments (default: 0). |
| `--report_id` | Minimum identity percentage to report alignments (default: 0). |
| `--gapopen` | Gap opening penalty (default: -1; maps to 11 for protein). |
| `--gapextend` | Gap extension penalty (default: -1; maps to 1 for protein). |
| `-S`, `--seed_len` | Seed length (default: 15 for DNA, 8 for protein). |
| `--match` | Match score (default: 5). |
| `--mismatch` | Mismatch score (default: -4). |
| `--simd_sort` | Enable SIMD (AVX2) sorting for double-indexing. |
| `--chimera` | Enable chimera alignment. |
| `--circ` | Enable circular alignment. |
| `--wga` | Enable whole-genome alignment. |
| `--wgs` | Enable whole-genome sequencing. |
| `--splice` | Enable splice alignments. |
| `--dnah` | Enable DNA homology search. |
| `--avx2` | Enable AVX2 hamming distance calculations. |
| `--hifi` | Enable PacBio HiFi/CCS genomic reads. |
| `--matrix` | Specify scoring matrix for protein alignment (default: `blosum62`). |
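
The reporting filters in the table above can be combined in a single call. The sketch below only prints such a command rather than running it. The helper name `strict_protein_cmd`, the `VAT_BIN` variable, and the chosen threshold values are illustrative assumptions, as is the `1e-5` notation for the e-value; only the option names come from the table.

```shell
#!/bin/sh
# Name of the VAT executable; override if it is installed elsewhere (assumption).
VAT_BIN="${VAT_BIN:-VAT}"

# Print (do not run) a protein search with tighter reporting filters.
#   $1 = database file, $2 = query FASTA, $3 = output archive prefix
strict_protein_cmd() {
    # -e lowers the maximum e-value, -k limits reported targets,
    # --report_id sets the minimum identity percentage.
    printf '%s\n' "$VAT_BIN protein -d $1 -q $2 -a $3 -p 8 -e 1e-5 -k 10 --report_id 90 --matrix blosum62"
}

strict_protein_cmd protein_db.vatf protein_test.fa strict_hits
```

Printing the command first makes it easy to review the thresholds before committing to a long run.
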

| Option | Description |
|---|---|
| `--max_seeds` | Maximum number of hits to consider for a seed (default: 0). |
| `--window` | Window size for local hit search (default: 0). |
| `--minimizer` | Window size for minimizer (default: 10). |
| `--xdrop` | X-drop threshold for ungapped alignment (default: 18). |
| `-X`, `--gapped_xdrop` | X-drop threshold for gapped alignment in bits (default: 18). |
| `--ungapped_score` | Minimum raw alignment score to continue local extension (default: 0). |
| `--band` | Band size for dynamic programming computation (default: 8). |
| `--num_shapes` | Number of seed shapes to use (default: 0 = all available). |
| `--ra` | Reduced alphabet (options: `murphy.10`, `MMSEQS12`, `td.10`; default: `null`). |
| `--out2pro` | Output file for DNA-to-protein conversion (default: `out2pro.fa`). |
| `--for_only` | Enable alignment only on the forward strand. |

- Create a nucleotide database: `VAT makevatdb --dbtype nucl --in test_all.fa -d mydb`
- Run DNA alignment: `VAT dna -d mydb.vatf -q test_reads.fa -a alignment_output`
- View the results: `VAT view -a alignment_output.vatr -o alignment_output`, then open the output file with `vim alignment_output`
- Create a protein database: `VAT makevatdb --dbtype prot --in protein_ref.fa -d protein_db`
- Run protein alignment: `VAT protein -d protein_db.vatf -q protein_test.fa -a protein_alignment -p 4`
- View the results: `VAT view -a protein_alignment.vatr -o protein_alignment`, then open the output file with `vim protein_alignment`
- Create a protein database: `VAT makevatdb --dbtype prot --in protein_ref.fa -d protein_db`
- Run BLASTX alignment: `VAT blastx -d protein_db.vatf -q dna_reads.fa -a blastx_output`
- View the results: `VAT view -a blastx_output.vatr -o blastx_output`, then open the output file with `vim blastx_output`
- Convert DNA to protein: `VAT dna2pro --query dna_sequence.fa --out2pro protein_output.fa`
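
The three-step examples above (build a database, align, view) share one pattern, which could be wrapped in a small script along these lines. This is a sketch, not part of VAT: the names `run`, `vat_workflow`, `VAT_BIN`, and `DRY_RUN`, and the `_db`/`_aln` output naming, are assumptions made for illustration. It relies on the `.vatf` and `.vatr` suffixes shown in the examples. By default it only prints the commands.

```shell
#!/bin/sh
set -eu

VAT_BIN="${VAT_BIN:-VAT}"    # VAT executable (assumed to be on PATH)
DRY_RUN="${DRY_RUN:-1}"      # 1 = print commands only, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# vat_workflow DBTYPE MODE REFERENCE QUERY PREFIX
#   DBTYPE: nucl or prot; MODE: dna, protein, or blastx
vat_workflow() {
    # Step 1: build the database (written as PREFIX_db.vatf).
    run "$VAT_BIN" makevatdb --dbtype "$1" --in "$3" -d "${5}_db"
    # Step 2: align the query (archive written as PREFIX_aln.vatr).
    run "$VAT_BIN" "$2" -d "${5}_db.vatf" -q "$4" -a "${5}_aln"
    # Step 3: convert the archive to a readable file.
    run "$VAT_BIN" view -a "${5}_aln.vatr" -o "${5}_aln"
}

vat_workflow nucl dna test_all.fa test_reads.fa demo
```

Setting `DRY_RUN=0` in the environment would execute the same three commands instead of printing them.
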

- Database Type Errors: Specify `--dbtype` as `nucl` or `prot` when creating a database.
- Alignment Errors: Check input formats (`FASTA`/`FASTQ`) and ensure query and database files are compatible.
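
Both checks can be done before starting a run. The sketch below guesses the input format from the first non-blank line (`>` for FASTA, `@` for FASTQ) and maps each alignment subcommand to the database type used in the examples in this README. The function names `seq_format` and `expected_dbtype` are hypothetical helpers, not VAT commands.

```shell
#!/bin/sh
# Guess the format of a sequence file from the first character
# of its first non-blank line.
seq_format() {
    first_char=$(awk 'NF { print substr($0, 1, 1); exit }' "$1")
    case "$first_char" in
        '>') echo fasta ;;
        '@') echo fastq ;;
        *)   echo unknown ;;
    esac
}

# Database type each subcommand is paired with in the examples:
# dna uses a nucl database, protein and blastx use a prot database.
expected_dbtype() {
    case "$1" in
        dna)            echo nucl ;;
        protein|blastx) echo prot ;;
        *)              echo unknown ;;
    esac
}

# Example use:
#   seq_format test_reads.fa
#   expected_dbtype blastx
```
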

| Format | Description |
|---|---|
| `tab` | Tab-delimited summary of alignments. |
| `sam` | Sequence Alignment/Map format. |
| `paf` | Pairwise Alignment Format for long reads. |

Use `--outfmt` to specify the desired output format.
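
As a sketch of how the format name might be passed, the helper below validates it against the three documented values and prints a `VAT view` command. It assumes that `--outfmt` is accepted by the `view` subcommand, which this README does not state, so check `VAT --help` before relying on it. `view_cmd` and `VAT_BIN` are hypothetical names.

```shell
#!/bin/sh
VAT_BIN="${VAT_BIN:-VAT}"    # VAT executable (assumed to be on PATH)

# Print a view command for one of the documented output formats.
#   $1 = alignment archive (.vatr), $2 = output file, $3 = format name
# Assumption: --outfmt belongs to the `view` subcommand.
view_cmd() {
    case "$3" in
        tab|sam|paf)
            printf '%s\n' "$VAT_BIN view -a $1 -o $2 --outfmt $3"
            ;;
        *)
            echo "unsupported format: $3 (expected tab, sam, or paf)" >&2
            return 1
            ;;
    esac
}

view_cmd alignment_output.vatr alignment_output.sam sam
```

Rejecting unknown names up front avoids discovering a typo only after a long alignment has finished.
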