VATAligner

VATAligner (Versatile Alignment Tool) is a high-performance, multi-purpose tool for DNA and protein sequence alignment. It covers nucleotide and protein database creation, homology searches, splice alignments, and alignment of whole-genome sequencing reads.

Features

Fast and Efficient: Multi-threaded support for large-scale data processing.
Versatile: DNA and protein sequence alignment with advanced options.
Flexible: Fine-tune parameters to customize alignment for specific applications.
Comprehensive Options: Supports chimera alignment, circular DNA alignment, splice alignments, and more.

Command-line Options

General Options

Option	Description
`-h`, `--help`	Show help message.
`-p`, `--threads`	Number of CPU threads (default: 4).
`-d`, `--db`	Specify the database file.
`-a`, `--vaa`	VAT alignment archive (vatr) file.
`--dbtype`	Database type: `nucl` (nucleotide) or `prot` (protein).
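
The value passed to `-p` is usually matched to the machine. A minimal sketch, assuming a POSIX shell and that `getconf _NPROCESSORS_ONLN` is available (it is on Linux and macOS). The file names are reused from the protein example in this README, and the command is only printed, not run:

```shell
# Detect the number of online CPUs; fall back to VAT's documented default (4)
# when getconf is missing or returns something non-numeric.
threads=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 4)
case $threads in
  ''|*[!0-9]*) threads=4 ;;
esac

# Print the resulting command for inspection instead of executing it.
echo "VAT protein -d protein_db.vatf -q protein_test.fa -a protein_alignment -p $threads"
```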

Makedb Options

Option	Description
`-i`, `--in`	Input reference file in FASTA format.

Aligner Options

Option	Description
`-q`, `--query`	Input query file.
`-k`, `--maxtarget_seqs`	Maximum number of target sequences to report (default: 25).
`--top`	Report only alignments whose score falls within this percentage of the top alignment score (default: 100).
`-e`, `--evalue`	Maximum e-value to report (default: 0.001).
`--min_score`	Minimum bit score to report alignments (default: 0).
`--report_id`	Minimum identity percentage to report alignments (default: 0).
`--gapopen`	Gap opening penalty (default: -1, meaning the built-in value is used; 11 for protein).
`--gapextend`	Gap extension penalty (default: -1, meaning the built-in value is used; 1 for protein).
`-S`, `--seed_len`	Seed length (default: 15 for DNA, 8 for protein).
`--match`	Match score (default: 5).
`--mismatch`	Mismatch score (default: -4).
`--simd_sort`	Enable SIMD (AVX2) sorting for double-indexing.
`--chimera`	Enable chimera alignment.
`--circ`	Enable circular alignment.
`--wga`	Enable whole-genome alignment.
`--wgs`	Enable alignment of whole-genome sequencing (WGS) reads.
`--splice`	Enable splice alignments.
`--dnah`	Enable DNA homology search.
`--avx2`	Enable AVX2 hamming distance calculations.
`--hifi`	Enable alignment of PacBio HiFi/CCS genomic reads.
`--matrix`	Specify scoring matrix for protein alignment (default: `blosum62`).
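
Several of the reporting options above are typically combined to tighten a search. The sketch below uses only flags listed in this README; the threshold values and the helper name `vat_dna_filtered_cmd` are illustrative assumptions, and the function prints the command rather than running it:

```shell
# Build a DNA alignment command with stricter reporting filters.
# Flags come from the option tables; the values are examples, not recommendations.
vat_dna_filtered_cmd() {
  db=$1; query=$2; out=$3
  # -e: maximum e-value, --report_id: minimum percent identity,
  # -k: maximum target sequences per query, -p: CPU threads
  printf 'VAT dna -d %s -q %s -a %s -e 0.00001 --report_id 90 -k 5 -p 8\n' \
    "$db" "$query" "$out"
}

# Print the command for inspection.
vat_dna_filtered_cmd mydb.vatf test_reads.fa filtered_output
```

Printing first makes it easy to check the flag combination before committing to a long run.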

Advanced Options

Option	Description
`--max_seeds`	Maximum number of hits to consider for a seed (default: 0).
`--window`	Window size for local hit search (default: 0).
`--minimizer`	Window size for minimizer (default: 10).
`--xdrop`	X-drop threshold for ungapped alignment (default: 18).
`-X`, `--gapped_xdrop`	X-drop threshold for gapped alignment in bits (default: 18).
`--ungapped_score`	Minimum raw alignment score to continue local extension (default: 0).
`--band`	Band size for dynamic programming computation (default: 8).
`--num_shapes`	Number of seed shapes to use (default: 0 = all available).
`--ra`	Reduced alphabet (options: `murphy.10`, `MMSEQS12`, `td.10`; default: `null`).
`--out2pro`	Output file for DNA-to-protein conversion (default: `out2pro.fa`).
`--for_only`	Enable alignment only on the forward strand.

Example Usage

DNA Alignment

Create a nucleotide database:

VAT makevatdb --dbtype nucl --in test_all.fa -d mydb

Run DNA alignment:

VAT dna -d mydb.vatf -q test_reads.fa -a alignment_output

View the results:

VAT view -a alignment_output.vatr -o alignment_output
vim alignment_output
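
The three steps above can be chained in a small wrapper that stops at the first failure. This is a sketch: `vat_dna_pipeline` and `VAT_BIN` are hypothetical names, and the `.vatf` and `.vatr` suffixes are inferred from the file names used in the commands above.

```shell
# Name of the VAT executable; override it to point at another build.
VAT_BIN=${VAT_BIN:-VAT}

# Usage: vat_dna_pipeline <reference.fa> <reads.fa> <output_prefix>
vat_dna_pipeline() {
  ref=$1; reads=$2; prefix=$3
  # Each step returns early on failure so later steps never see missing files.
  "$VAT_BIN" makevatdb --dbtype nucl --in "$ref" -d "${prefix}_db" || return 1
  "$VAT_BIN" dna -d "${prefix}_db.vatf" -q "$reads" -a "$prefix" || return 1
  "$VAT_BIN" view -a "${prefix}.vatr" -o "$prefix" || return 1
}

# Example call (commented out so that sourcing this file runs nothing):
# vat_dna_pipeline test_all.fa test_reads.fa alignment_output
```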

Protein Alignment

Create a protein database:

VAT makevatdb --dbtype prot --in protein_ref.fa -d protein_db

Run protein alignment:

VAT protein -d protein_db.vatf -q protein_test.fa -a protein_alignment -p 4

View the results:

VAT view -a protein_alignment.vatr -o protein_alignment
vim protein_alignment

BLASTX Alignment

Create a protein database:

VAT makevatdb --dbtype prot --in protein_ref.fa -d protein_db

Run BLASTX alignment:

VAT blastx -d protein_db.vatf -q dna_reads.fa -a blastx_output

View the results:

VAT view -a blastx_output.vatr -o blastx_output
vim blastx_output

DNA-to-Protein Conversion

Convert DNA to protein:

VAT dna2pro --query dna_sequence.fa --out2pro protein_output.fa

Troubleshooting

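Database Type Errors: Pass `--dbtype nucl` or `--dbtype prot` when creating a database.
Alignment Errors: Check that input files are valid FASTA or FASTQ, and that the query and database types suit the chosen subcommand (for example, `blastx` takes a DNA query and a protein database).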
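
A quick way to rule out a wrong input format before running an alignment is to inspect the first record marker. A minimal sketch that relies only on the FASTA (`>`) and FASTQ (`@`) conventions; `seq_format` is a hypothetical helper, not part of VAT:

```shell
# Guess the format of a sequence file from the first character of its
# first non-blank line: '>' starts a FASTA record, '@' a FASTQ record.
seq_format() {
  first=$(awk 'NF { print substr($0, 1, 1); exit }' "$1")
  case $first in
    '>') echo fasta ;;
    '@') echo fastq ;;
    *)   echo unknown ;;
  esac
}

# Example call (commented out so that sourcing this file runs nothing):
# seq_format test_reads.fa
```

This only checks the leading marker; it does not validate the rest of the file.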

Output Formats

Format	Description
`tab`	Tab-delimited summary of alignments.
`sam`	Sequence Alignment/Map format.
`paf`	Pairwise Alignment Format for long reads.

Use `--outfmt` to specify the desired output format.
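
This README does not say which subcommand accepts `--outfmt`; the sketch below assumes it is passed to `VAT view` alongside `-a` and `-o`. The helper name `vat_view_cmd` and the output naming scheme are likewise assumptions, and the function prints the command rather than running it:

```shell
# Build a `VAT view` command for one of the documented output formats.
# Assumption: --outfmt is an option of the `view` subcommand.
vat_view_cmd() {
  archive=$1; fmt=$2
  case $fmt in
    tab|sam|paf) ;;  # the three formats listed in the table above
    *) echo "unsupported format: $fmt" >&2; return 1 ;;
  esac
  # Name the output after the archive, replacing .vatr with the format name.
  printf 'VAT view -a %s -o %s --outfmt %s\n' \
    "$archive" "${archive%.vatr}.$fmt" "$fmt"
}

# Print the command for inspection.
vat_view_cmd alignment_output.vatr sam
```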
