diff --git a/MANUAL.markdown b/MANUAL.markdown
index 33040aa1..907bcc59 100644
--- a/MANUAL.markdown
+++ b/MANUAL.markdown
@@ -1427,7 +1427,7 @@ on separate processors/cores and synchronize when parsing reads and
outputting alignments. Searching for alignments is highly parallel,
and speedup is fairly close to linear.
-
Comma-separated list of files containing the #1 mates (filename usually includes _1), or, if -c is specified, the mate sequences themselves. E.g., this might be flyA_1.fq,flyB_1.fq, or, if -c is specified, this might be GGTCATCCT,ACGGGTCGT. Sequences specified with this option must correspond file-for-file and read-for-read with those specified in <m2>. Reads may be a mix of different lengths. If - is specified, bowtie will read the #1 mates from the “standard in” filehandle.
+
Comma-separated list of files containing the #1 mates (filename usually includes _1), or, if -c is specified, the mate sequences themselves. E.g., this might be flyA_1.fq,flyB_1.fq, or, if -c is specified, this might be GGTCATCCT,ACGGGTCGT. Sequences specified with this option must correspond file-for-file and read-for-read with those specified in <m2>. Reads may be a mix of different lengths. If - is specified, bowtie will read the #1 mates from the “standard in” filehandle.
@@ -230,7 +230,7 @@
Main arguments
<m2>
-
Comma-separated list of files containing the #2 mates (filename usually includes _2), or, if -c is specified, the mate sequences themselves. E.g., this might be flyA_2.fq,flyB_2.fq, or, if -c is specified, this might be GGTCATCCT,ACGGGTCGT. Sequences specified with this option must correspond file-for-file and read-for-read with those specified in <m1>. Reads may be a mix of different lengths. If - is specified, bowtie will read the #2 mates from the “standard in” filehandle.
+
Comma-separated list of files containing the #2 mates (filename usually includes _2), or, if -c is specified, the mate sequences themselves. E.g., this might be flyA_2.fq,flyB_2.fq, or, if -c is specified, this might be GGTCATCCT,ACGGGTCGT. Sequences specified with this option must correspond file-for-file and read-for-read with those specified in <m1>. Reads may be a mix of different lengths. If - is specified, bowtie will read the #2 mates from the “standard in” filehandle.
@@ -238,7 +238,7 @@
Main arguments
<r>
-
Comma-separated list of files containing a mix of unpaired and paired-end reads in Tab-delimited format. Tab-delimited format is a 1-read-per-line format where unpaired reads consist of a read name, sequence and quality string each separated by tabs. A paired-end read consists of a read name, sequence of the #1 mate, quality values of the #1 mate, sequence of the #2 mate, and quality values of the #2 mate separated by tabs. Quality values can be expressed using any of the scales supported in FASTQ files. Reads may be a mix of different lengths and paired-end and unpaired reads may be intermingled in the same file. If - is specified, bowtie will read the Tab-delimited reads from the “standard in” filehandle.
+
Comma-separated list of files containing a mix of unpaired and paired-end reads in Tab-delimited format. Tab-delimited format is a 1-read-per-line format where unpaired reads consist of a read name, sequence and quality string each separated by tabs. A paired-end read consists of a read name, sequence of the #1 mate, quality values of the #1 mate, sequence of the #2 mate, and quality values of the #2 mate separated by tabs. Quality values can be expressed using any of the scales supported in FASTQ files. Reads may be a mix of different lengths and paired-end and unpaired reads may be intermingled in the same file. If - is specified, bowtie will read the Tab-delimited reads from the “standard in” filehandle.
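The Tab-delimited layout described above can be sketched as a small parser. This is an illustrative sketch only, not bowtie's own reader: the names `TabRead` and `parse_tab_line` are hypothetical, and it assumes a line has exactly 3 fields (unpaired) or 5 fields (paired-end).

```python
from typing import NamedTuple, Optional


class TabRead(NamedTuple):
    """One record from a Tab-delimited read file (hypothetical helper type)."""
    name: str
    seq1: str
    qual1: str
    seq2: Optional[str] = None   # only set for paired-end records
    qual2: Optional[str] = None  # only set for paired-end records

    @property
    def paired(self) -> bool:
        return self.seq2 is not None


def parse_tab_line(line: str) -> TabRead:
    """Parse one line: 3 tab-separated fields = unpaired, 5 = paired-end."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) in (3, 5):
        return TabRead(*fields)
    raise ValueError(f"expected 3 or 5 tab-separated fields, got {len(fields)}")
```

Because the field count is checked per line, paired-end and unpaired records can be intermingled in the same file, as the description allows.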
@@ -246,7 +246,7 @@
Main arguments
<i>
-
A comma-separated list of interleaved paired-end FASTQ files, where the records for the mate #1s are interleaved with the records for the mate #2s. Reads may be a mix of different lengths. If - is specified, Bowtie reads from the “standard in” filehandle.
+
A comma-separated list of interleaved paired-end FASTQ files, where the records for the mate #1s are interleaved with the records for the mate #2s. Reads may be a mix of different lengths. If - is specified, Bowtie reads from the “standard in” filehandle.
@@ -254,7 +254,7 @@
Main arguments
<s>
-
A comma-separated list of files containing unpaired reads to be aligned, or, if -c is specified, the unpaired read sequences themselves. E.g., this might be lane1.fq,lane2.fq,lane3.fq,lane4.fq, or, if -c is specified, this might be GGTCATCCT,ACGGGTCGT. Reads may be a mix of different lengths. If - is specified, Bowtie gets the reads from the “standard in” filehandle.
+
A comma-separated list of files containing unpaired reads to be aligned, or, if -c is specified, the unpaired read sequences themselves. E.g., this might be lane1.fq,lane2.fq,lane3.fq,lane4.fq, or, if -c is specified, this might be GGTCATCCT,ACGGGTCGT. Reads may be a mix of different lengths. If - is specified, Bowtie gets the reads from the “standard in” filehandle.
@@ -262,7 +262,7 @@
Main arguments
<hit>
-
File to write alignments to. By default, alignments are written to the “standard out” filehandle (i.e. the console).
+
File to write alignments to. By default, alignments are written to the “standard out” filehandle (i.e. the console).
@@ -290,7 +290,7 @@
Input
-F
-
Reads are substrings (k-mers) extracted from a FASTA file s. Specifically, for every reference sequence in FASTA file s, Bowtie 2 aligns the k-mers at offsets 1, 1+i, 1+2i, … until reaching the end of the reference. Each k-mer is aligned as a separate read. Quality values are set to all Is (40 on Phred scale). Each k-mer (read) is given a name like sequence_offset, where sequence is the name of the FASTA sequence it was drawn from and offset is its 0-based offset of origin with respect to the sequence. Only single k-mers, i.e. unpaired reads, can be aligned in this way.
+
Reads are substrings (k-mers) extracted from a FASTA file s. Specifically, for every reference sequence in FASTA file s, bowtie aligns the k-mers at offsets 1, 1+i, 1+2i, … until reaching the end of the reference. Each k-mer is aligned as a separate read. Quality values are set to all Is (40 on Phred scale). Each k-mer (read) is given a name like sequence_offset, where sequence is the name of the FASTA sequence it was drawn from and offset is its 0-based offset of origin with respect to the sequence. Only single k-mers, i.e. unpaired reads, can be aligned in this way.
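As a rough illustration of the k-mer extraction described above, the following sketch generates reads from one reference sequence. The function name `fasta_kmers` and its parameters `k` (k-mer length) and `i` (step between k-mers) are hypothetical, and the sketch assumes that only full-length k-mers are emitted; the description does not say how a short remainder at the end of the reference is handled.

```python
from typing import Iterator, Tuple


def fasta_kmers(name: str, sequence: str, k: int, i: int) -> Iterator[Tuple[str, str, str]]:
    """Yield (read_name, kmer, qualities) for every i-th k-mer of a sequence.

    1-based offsets 1, 1+i, 1+2i, ... correspond to 0-based offsets
    0, i, 2i, ..., which is what the read name records.
    Assumption: k-mers that would run past the end are skipped.
    """
    for offset in range(0, len(sequence) - k + 1, i):
        kmer = sequence[offset:offset + k]
        # Qualities are all 'I', i.e. Phred 40 with an ASCII offset of 33.
        yield f"{name}_{offset}", kmer, "I" * k
```

For example, `list(fasta_kmers("chr", "ACGTAC", k=4, i=2))` produces two reads named `chr_0` and `chr_2`.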
@@ -314,7 +314,7 @@
Input
-s/--skip <int>
-
Skip (i.e. do not align) the first <int> reads or pairs in the input.
+
Skip (i.e. do not align) the first <int> reads or pairs in the input.
@@ -362,7 +362,7 @@
Input
--solexa-quals
-
Convert input qualities from Solexa (which can be negative) to Phred (which can’t). This is usually the right option for use with (unconverted) reads emitted by GA Pipeline versions prior to 1.3. Default: off.
+
Convert input qualities from Solexa (which can be negative) to Phred (which can’t). This is usually the right option for use with (unconverted) reads emitted by GA Pipeline versions prior to 1.3. Default: off.
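For reference, the standard relationship between the two scales is Q_phred = 10 * log10(10^(Q_solexa / 10) + 1). The sketch below applies that formula; the function name `solexa_to_phred` is hypothetical, and rounding to the nearest integer is an assumption that may differ from how bowtie itself rounds.

```python
import math


def solexa_to_phred(q_solexa: int) -> int:
    """Convert a Solexa-scaled quality to the Phred scale.

    Uses Q_phred = 10 * log10(10 ** (Q_solexa / 10) + 1).
    Assumption: the result is rounded to the nearest integer.
    """
    return round(10 * math.log10(10 ** (q_solexa / 10) + 1))
```

Negative Solexa values map to small non-negative Phred values (for instance, Solexa -5 maps to Phred 1), while high qualities are nearly unchanged (Solexa 40 maps to Phred 40).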
@@ -378,7 +378,7 @@
Input
--integer-quals
-
Quality values are represented in the read input file as space-separated ASCII integers, e.g., 40 40 30 40…, rather than ASCII characters, e.g., II?I…. Integers are treated as being on the Phred quality scale unless --solexa-quals is also specified. Default: off.
+
Quality values are represented in the read input file as space-separated ASCII integers, e.g., 40 40 30 40…, rather than ASCII characters, e.g., II?I…. Integers are treated as being on the Phred quality scale unless --solexa-quals is also specified. Default: off.
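The two representations above are equivalent under the Phred+33 ASCII encoding. A minimal sketch of the conversion, using a hypothetical helper name `integer_quals_to_ascii` and assuming an ASCII offset of 33:

```python
def integer_quals_to_ascii(quals: str, offset: int = 33) -> str:
    """Turn space-separated integer qualities into an ASCII quality string.

    Assumption: qualities are Phred-scaled and encoded with the given
    ASCII offset (33 by default).
    """
    return "".join(chr(int(q) + offset) for q in quals.split())
```

With this encoding, `integer_quals_to_ascii("40 40 30 40")` yields `II?I`, matching the examples in the option description.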
@@ -386,7 +386,7 @@
Input
--large-index
-
Force usage of a ‘large’ index (those ending in ‘.ebwtl’), even if a small one is present. Default: off.
+
Force usage of a ‘large’ index (those ending in ‘.ebwtl’), even if a small one is present. Default: off.
@@ -405,7 +405,7 @@
Alignment
-n/--seedmms <int>
-
Maximum number of mismatches permitted in the “seed”, i.e. the first L base pairs of the read (where L is set with -l/--seedlen). This may be 0, 1, 2 or 3 and the default is 2. This option is mutually exclusive with the -v option.
+
Maximum number of mismatches permitted in the “seed”, i.e. the first L base pairs of the read (where L is set with -l/--seedlen). This may be 0, 1, 2 or 3 and the default is 2. This option is mutually exclusive with the -v option.
@@ -413,7 +413,7 @@
Alignment
-e/--maqerr <int>
-
Maximum permitted total of quality values at all mismatched read positions throughout the entire alignment, not just in the “seed”. The default is 70. Like Maq, bowtie rounds quality values to the nearest 10 and saturates at 30; rounding can be disabled with --nomaqround.
+
Maximum permitted total of quality values at all mismatched read positions throughout the entire alignment, not just in the “seed”. The default is 70. Like Maq, bowtie rounds quality values to the nearest 10 and saturates at 30; rounding can be disabled with --nomaqround.
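The check described above can be sketched as follows. The function names are hypothetical, and two details are assumptions rather than statements about bowtie's implementation: "nearest 10" is taken to mean rounding halves up (so 15 becomes 20), and disabling rounding is taken to mean the raw quality values are summed.

```python
from typing import Iterable


def maq_round(q: int) -> int:
    """Round a Phred quality to the nearest 10 and saturate at 30.

    Assumption: halves round up (e.g. 15 -> 20, 14 -> 10).
    """
    return min(30, ((q + 5) // 10) * 10)


def within_maqerr(mismatch_quals: Iterable[int], maqerr: int = 70,
                  rounding: bool = True) -> bool:
    """Return True if the summed qualities at mismatched positions
    do not exceed the -e/--maqerr ceiling.

    Assumption: with rounding disabled, raw qualities are summed.
    """
    total = sum(maq_round(q) if rounding else q for q in mismatch_quals)
    return total <= maqerr
```

Under these assumptions, two mismatches at Phred 40 each count as 30 + 30 = 60 and pass the default ceiling of 70, whereas three such mismatches (90) do not.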
@@ -421,7 +421,7 @@
Alignment
-l/--seedlen <int>
-
The “seed length”; i.e., the number of bases on the high-quality end of the read to which the -n ceiling applies. The lowest permitted setting is 5 and the default is 28. bowtie is faster for larger values of -l.
+
The “seed length”; i.e., the number of bases on the high-quality end of the read to which the -n ceiling applies. The lowest permitted setting is 5 and the default is 28. bowtie is faster for larger values of -l.
@@ -477,7 +477,7 @@
Alignment
--maxbts
-
The maximum number of backtracks permitted when aligning a read in -n 2 or -n 3 mode (default: 125 without --best, 800 with --best). A “backtrack” is the introduction of a speculative substitution into the alignment. Without this limit, the default parameters will sometimes require that bowtie try 100s or 1,000s of backtracks to align a read, especially if the read has many low-quality bases and/or has no valid alignments, slowing bowtie down significantly. However, this limit may cause some valid alignments to be missed. Higher limits yield greater sensitivity at the expensive of longer running times. See also: -y/--tryhard.
+
The maximum number of backtracks permitted when aligning a read in -n 2 or -n 3 mode (default: 125 without --best, 800 with --best). A “backtrack” is the introduction of a speculative substitution into the alignment. Without this limit, the default parameters will sometimes require that bowtie try 100s or 1,000s of backtracks to align a read, especially if the read has many low-quality bases and/or has no valid alignments, slowing bowtie down significantly. However, this limit may cause some valid alignments to be missed. Higher limits yield greater sensitivity at the expense of longer running times. See also: -y/--tryhard.
@@ -509,7 +509,7 @@
Alignment
--reads-per-batch <int>
-
Part of bowtie’s batch parsing and used to specify the number of reads that bowtie will consume from the input file at once. Default: 16
+
Part of bowtie’s batch parsing and used to specify the number of reads that bowtie will consume from the input file at once. Default: 16
@@ -520,7 +520,7 @@
Reporting
-k <int>
-
Report up to <int> valid alignments per read or pair (default: 1). Validity of alignments is determined by the alignment policy (combined effects of -n, -v, -l, and -e). If more than one valid alignment exists and the --best and --strata options are specified, then only those alignments belonging to the best alignment “stratum” will be reported. Bowtie is designed to be very fast for small -k but bowtie can become significantly slower as -k increases. If you would like to use Bowtie for larger values of -k, consider building an index with a denser suffix-array sample, i.e. specify a smaller -o/--offrate when invoking bowtie-build for the relevant index (see the Performance tuning section for details).
+
Report up to <int> valid alignments per read or pair (default: 1). Validity of alignments is determined by the alignment policy (combined effects of -n, -v, -l, and -e). If more than one valid alignment exists and the --best and --strata options are specified, then only those alignments belonging to the best alignment “stratum” will be reported. Bowtie is designed to be very fast for small -k but bowtie can become significantly slower as -k increases. If you would like to use Bowtie for larger values of -k, consider building an index with a denser suffix-array sample, i.e. specify a smaller -o/--offrate when invoking bowtie-build for the relevant index (see the Performance tuning section for details).
@@ -528,7 +528,7 @@
Reporting
-a/--all
-
Report all valid alignments per read or pair (default: off). Validity of alignments is determined by the alignment policy (combined effects of -n, -v, -l, and -e). If more than one valid alignment exists and the --best and --strata options are specified, then only those alignments belonging to the best alignment “stratum” will be reported. Bowtie is designed to be very fast for small -k but bowtie can become significantly slower if -a/--all is specified. If you would like to use Bowtie with -a, consider building an index with a denser suffix-array sample, i.e. specify a smaller -o/--offrate when invoking bowtie-build for the relevant index (see the Performance tuning section for details).
+
Report all valid alignments per read or pair (default: off). Validity of alignments is determined by the alignment policy (combined effects of -n, -v, -l, and -e). If more than one valid alignment exists and the --best and --strata options are specified, then only those alignments belonging to the best alignment “stratum” will be reported. Bowtie is designed to be very fast for small -k but bowtie can become significantly slower if -a/--all is specified. If you would like to use Bowtie with -a, consider building an index with a denser suffix-array sample, i.e. specify a smaller -o/--offrate when invoking bowtie-build for the relevant index (see the Performance tuning section for details).
@@ -536,7 +536,7 @@
Reporting
-m <int>
-
Suppress all alignments for a particular read or pair if more than <int> reportable alignments exist for it. Reportable alignments are those that would be reported given the -n, -v, -l, -e, -k, -a, --best, and --strata options. Default: no limit. Bowtie is designed to be very fast for small -m but bowtie can become significantly slower for larger values of -m. If you would like to use Bowtie for larger values of -k, consider building an index with a denser suffix-array sample, i.e. specify a smaller -o/--offrate when invoking bowtie-build for the relevant index (see the Performance tuning section for details).
+
Suppress all alignments for a particular read or pair if more than <int> reportable alignments exist for it. Reportable alignments are those that would be reported given the -n, -v, -l, -e, -k, -a, --best, and --strata options. Default: no limit. Bowtie is designed to be very fast for small -m but bowtie can become significantly slower for larger values of -m. If you would like to use Bowtie for larger values of -m, consider building an index with a denser suffix-array sample, i.e. specify a smaller -o/--offrate when invoking bowtie-build for the relevant index (see the Performance tuning section for details).
@@ -544,7 +544,7 @@
Reporting
-M <int>
-
Behaves like -m except that if a read has more than <int> reportable alignments, one is reported at random. In default output mode, the selected alignment’s 7th column is set to <int>+1 to indicate the read has at least <int>+1 valid alignments. In -S/--sam mode, the selected alignment is given a MAPQ (mapping quality) of 0 and the XM:I field is set to <int>+1. This option requires --best; if specified without --best, --best is enabled automatically.
+
Behaves like -m except that if a read has more than <int> reportable alignments, one is reported at random. In default output mode, the selected alignment’s 7th column is set to <int>+1 to indicate the read has at least <int>+1 valid alignments. In -S/--sam mode, the selected alignment is given a MAPQ (mapping quality) of 0 and the XM:I field is set to <int>+1. This option requires --best; if specified without --best, --best is enabled automatically.
@@ -552,7 +552,7 @@
Reporting
--best
-
Make Bowtie guarantee that reported singleton alignments are “best” in terms of stratum (i.e. number of mismatches, or mismatches in the seed in the case of -n mode) and in terms of the quality values at the mismatched position(s). Stratum always trumps quality; e.g. a 1-mismatch alignment where the mismatched position has Phred quality 40 is preferred over a 2-mismatch alignment where the mismatched positions both have Phred quality 10. When --best is not specified, Bowtie may report alignments that are sub-optimal in terms of stratum and/or quality (though an effort is made to report the best alignment). --best mode also removes all strand bias. Note that --best does not affect which alignments are considered “valid” by bowtie, only which valid alignments are reported by bowtie. When --best is specified and multiple hits are allowed (via -k or -a), the alignments for a given read are guaranteed to appear in best-to-worst order in bowtie’s output. bowtie is somewhat slower when --best is specified.
+
Make Bowtie guarantee that reported singleton alignments are “best” in terms of stratum (i.e. number of mismatches, or mismatches in the seed in the case of -n mode) and in terms of the quality values at the mismatched position(s). Stratum always trumps quality; e.g. a 1-mismatch alignment where the mismatched position has Phred quality 40 is preferred over a 2-mismatch alignment where the mismatched positions both have Phred quality 10. When --best is not specified, Bowtie may report alignments that are sub-optimal in terms of stratum and/or quality (though an effort is made to report the best alignment). --best mode also removes all strand bias. Note that --best does not affect which alignments are considered “valid” by bowtie, only which valid alignments are reported by bowtie. When --best is specified and multiple hits are allowed (via -k or -a), the alignments for a given read are guaranteed to appear in best-to-worst order in bowtie’s output. bowtie is somewhat slower when --best is specified.
@@ -560,7 +560,7 @@
Reporting
--strata
-
If many valid alignments exist and are reportable (e.g. are not disallowed via the -k option) and they fall into more than one alignment “stratum”, report only those alignments that fall into the best stratum. By default, Bowtie reports all reportable alignments regardless of whether they fall into multiple strata. When --strata is specified, --best must also be specified.
+
If many valid alignments exist and are reportable (e.g. are not disallowed via the -k option) and they fall into more than one alignment “stratum”, report only those alignments that fall into the best stratum. By default, Bowtie reports all reportable alignments regardless of whether they fall into multiple strata. When --strata is specified, --best must also be specified.
@@ -646,7 +646,7 @@
SAM
-S/--sam
-
Print alignments in SAM format. See the SAM output section of the manual for details. To suppress all SAM headers, use --sam-nohead in addition to -S/--sam. To suppress just the @SQ headers (e.g. if the alignment is against a very large number of reference sequences), use --sam-nosq in addition to -S/--sam. bowtie does not write BAM files directly, but SAM output can be converted to BAM on the fly by piping bowtie’s output to samtools view.
+
Print alignments in SAM format. See the SAM output section of the manual for details. To suppress all SAM headers, use --sam-nohead in addition to -S/--sam. To suppress just the @SQ headers (e.g. if the alignment is against a very large number of reference sequences), use --sam-nosq in addition to -S/--sam. bowtie does not write BAM files directly, but SAM output can be converted to BAM on the fly by piping bowtie’s output to samtools view.
@@ -678,7 +678,7 @@
SAM
--sam-RG <text>
-
Add <text> (usually of the form TAG:VAL, e.g. ID:IL7LANE2) as a field on the @RG header line. Specify --sam-RG multiple times to set multiple fields. See the SAM Spec for details about what fields are legal. Note that, if any @RG fields are set using this option, the ID and SM fields must both be among them to make the @RG line legal according to the SAM Spec. --sam-RG is ignored unless -S/--sam is also specified.
+
Add <text> (usually of the form TAG:VAL, e.g. ID:IL7LANE2) as a field on the @RG header line. Specify --sam-RG multiple times to set multiple fields. See the SAM Spec for details about what fields are legal. Note that, if any @RG fields are set using this option, the ID and SM fields must both be among them to make the @RG line legal according to the SAM Spec. --sam-RG is ignored unless -S/--sam is also specified.
@@ -709,7 +709,7 @@
Performance
-
+
--reorder
@@ -721,7 +721,7 @@
Performance
--mm
-
Use memory-mapped I/O to load the index, rather than normal C file I/O. Memory-mapping the index allows many concurrent bowtie processes on the same computer to share the same memory image of the index (i.e. you pay the memory overhead just once). This facilitates memory-efficient parallelization of bowtie in situations where using -p is not possible.
+
Use memory-mapped I/O to load the index, rather than normal C file I/O. Memory-mapping the index allows many concurrent bowtie processes on the same computer to share the same memory image of the index (i.e. you pay the memory overhead just once). This facilitates memory-efficient parallelization of bowtie in situations where using -p is not possible.
@@ -729,7 +729,7 @@
Performance
--shmem
-
Use shared memory to load the index, rather than normal C file I/O. Using shared memory allows many concurrent bowtie processes on the same computer to share the same memory image of the index (i.e. you pay the memory overhead just once). This facilitates memory-efficient parallelization of bowtie in situations where using -p is not desirable. Unlike --mm, --shmem installs the index into shared memory permanently, or until the user deletes the shared memory chunks manually. See your operating system documentation for details on how to manually list and remove shared memory chunks (on Linux and Mac OS X, these commands are ipcs and ipcrm). You may also need to increase your OS’s maximum shared-memory chunk size to accommodate larger indexes; see your OS documentation.
+
Use shared memory to load the index, rather than normal C file I/O. Using shared memory allows many concurrent bowtie processes on the same computer to share the same memory image of the index (i.e. you pay the memory overhead just once). This facilitates memory-efficient parallelization of bowtie in situations where using -p is not desirable. Unlike --mm, --shmem installs the index into shared memory permanently, or until the user deletes the shared memory chunks manually. See your operating system documentation for details on how to manually list and remove shared memory chunks (on Linux and Mac OS X, these commands are ipcs and ipcrm). You may also need to increase your OS’s maximum shared-memory chunk size to accommodate larger indexes; see your OS documentation.
@@ -779,8 +779,8 @@
Default bowtie output
Read sequence (reverse-complemented if orientation is -).
ASCII-encoded read qualities (reversed if orientation is -). The encoded quality values are on the Phred scale and the encoding is ASCII-offset by 33 (ASCII char !).
If -M was specified and the prescribed ceiling was exceeded for this read, this column contains the value of the ceiling, indicating that at least that many valid alignments were found in addition to the one reported.
-
Otherwise, this column contains the number of other instances where the same sequence aligned against the same reference characters as were aligned against in the reported alignment. This is not the number of other places the read aligns with the same number of mismatches. The number in this column is generally not a good proxy for that number (e.g., the number in this column may be ‘0’ while the number of other alignments with the same number of mismatches might be large).
-
Comma-separated list of mismatch descriptors. If there are no mismatches in the alignment, this field is empty. A single descriptor has the format offset:reference-base>read-base. The offset is expressed as a 0-based offset from the high-quality (5’) end of the read.
+
Otherwise, this column contains the number of other instances where the same sequence aligned against the same reference characters as were aligned against in the reported alignment. This is not the number of other places the read aligns with the same number of mismatches. The number in this column is generally not a good proxy for that number (e.g., the number in this column may be ‘0’ while the number of other alignments with the same number of mismatches might be large).
+
Comma-separated list of mismatch descriptors. If there are no mismatches in the alignment, this field is empty. A single descriptor has the format offset:reference-base>read-base. The offset is expressed as a 0-based offset from the high-quality (5’) end of the read.
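The mismatch-descriptor column described above can be parsed along these lines. This is a sketch with hypothetical names (`Mismatch`, `parse_mismatches`); the sample descriptors in the usage note are made up for illustration and do not come from real bowtie output.

```python
from typing import List, NamedTuple


class Mismatch(NamedTuple):
    offset: int      # 0-based offset from the high-quality (5') end of the read
    ref_base: str
    read_base: str


def parse_mismatches(field: str) -> List[Mismatch]:
    """Parse a comma-separated list of offset:reference-base>read-base items.

    An empty field means the alignment has no mismatches.
    """
    if not field:
        return []
    mismatches = []
    for descriptor in field.split(","):
        offset, bases = descriptor.split(":")
        ref_base, read_base = bases.split(">")
        mismatches.append(Mismatch(int(offset), ref_base, read_base))
    return mismatches
```

For instance, `parse_mismatches("5:G>A,20:C>T")` returns two records, the first describing a reference `G` read as `A` at offset 5.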
SAM bowtie output
Following is a brief description of the SAM format as output by bowtie when the -S/--sam option is specified. For more details, see the SAM format specification.
@@ -860,9 +860,9 @@
SAM bowtie output
1-based offset into the forward reference strand where leftmost character of the alignment occurs
Mapping quality
CIGAR string representation of alignment
-
Name of reference sequence where mate’s alignment occurs. Set to = if the mate’s reference sequence is the same as this alignment’s, or * if there is no mate.
-
1-based offset into the forward reference strand where leftmost character of the mate’s alignment occurs. Offset is 0 if there is no mate.
-
Inferred insert size. Size is negative if the mate’s alignment occurs upstream of this alignment. Size is 0 if there is no mate.
+
Name of reference sequence where mate’s alignment occurs. Set to = if the mate’s reference sequence is the same as this alignment’s, or * if there is no mate.
+
1-based offset into the forward reference strand where leftmost character of the mate’s alignment occurs. Offset is 0 if there is no mate.
+
Inferred insert size. Size is negative if the mate’s alignment occurs upstream of this alignment. Size is 0 if there is no mate.
Read sequence (reverse-complemented if aligned to the reverse strand)
ASCII-encoded read qualities (reversed if the read aligned to the reverse strand). The encoded quality values are on the Phred quality scale and the encoding is ASCII-offset by 33 (ASCII char !), similarly to a FASTQ file.
Optional fields. Fields are tab-separated. For descriptions of all possible optional fields, see the SAM format specification. bowtie outputs some of these optional fields for each alignment, depending on the type of the alignment:
@@ -889,14 +889,14 @@
SAM bowtie output
XM:i:<N>
-
For a read with no reported alignments, <N> is 0 if the read had no alignments. If -m was specified and the read’s alignments were suppressed because the -m ceiling was exceeded, <N> equals the -m ceiling + 1, to indicate that there were at least that many valid alignments (but all were suppressed). In -M mode, if the alignment was randomly selected because the -M ceiling was exceeded, <N> equals the -M ceiling + 1, to indicate that there were at least that many valid alignments (of which one was reported at random).
+
For a read with no reported alignments, <N> is 0 if the read had no alignments. If -m was specified and the read’s alignments were suppressed because the -m ceiling was exceeded, <N> equals the -m ceiling + 1, to indicate that there were at least that many valid alignments (but all were suppressed). In -M mode, if the alignment was randomly selected because the -M ceiling was exceeded, <N> equals the -M ceiling + 1, to indicate that there were at least that many valid alignments (of which one was reported at random).
The bowtie-build indexer
bowtie-build builds a Bowtie index from a set of DNA sequences. bowtie-build outputs a set of 6 files with suffixes .1.ebwt, .2.ebwt, .3.ebwt, .4.ebwt, .rev.1.ebwt, and .rev.2.ebwt. (If the total length of all the input sequences is greater than about 4 billion, then the index files will end in ebwtl instead of ebwt.) These files together constitute the index: they are all that is needed to align reads to that reference. The original sequence files are no longer used by Bowtie once the index is built.
-
Use of Karkkainen’s blockwise algorithm allows bowtie-build to trade off between running time and memory usage. bowtie-build has three options governing how it makes this trade: -p/--packed, --bmax/--bmaxdivn, and --dcv. By default, bowtie-build will automatically search for the settings that yield the best running time without exhausting memory. This behavior can be disabled using the -a/--noauto option.
-
The indexer provides options pertaining to the “shape” of the index, e.g. --offrate governs the fraction of Burrows-Wheeler rows that are “marked” (i.e., the density of the suffix-array sample; see the original FM Index paper for details). All of these options are potentially profitable trade-offs depending on the application. They have been set to defaults that are reasonable for most cases according to our experiments. See Performance Tuning for details.
+
Use of Karkkainen’s blockwise algorithm allows bowtie-build to trade off between running time and memory usage. bowtie-build has three options governing how it makes this trade: -p/--packed, --bmax/--bmaxdivn, and --dcv. By default, bowtie-build will automatically search for the settings that yield the best running time without exhausting memory. This behavior can be disabled using the -a/--noauto option.
+
The indexer provides options pertaining to the “shape” of the index, e.g. --offrate governs the fraction of Burrows-Wheeler rows that are “marked” (i.e., the density of the suffix-array sample; see the original FM Index paper for details). All of these options are potentially profitable trade-offs depending on the application. They have been set to defaults that are reasonable for most cases according to our experiments. See Performance Tuning for details.
The Bowtie index is based on the FM Index of Ferragina and Manzini, which in turn is based on the Burrows-Wheeler transform. The algorithm used to build the index is based on the blockwise algorithm of Karkkainen.
Command Line
Usage:
@@ -1007,7 +1007,7 @@
Options
-o/--offrate <int>
-
To map alignments back to positions on the reference sequences, it’s necessary to annotate (“mark”) some or all of the Burrows-Wheeler rows with their corresponding location on the genome. -o/--offrate governs how many rows get marked: the indexer will mark every 2^<int> rows. Marking more rows makes reference-position lookups faster, but requires more memory to hold the annotations at runtime. The default is 5 (every 32nd row is marked; for human genome, annotations occupy about 340 megabytes).
+
To map alignments back to positions on the reference sequences, it’s necessary to annotate (“mark”) some or all of the Burrows-Wheeler rows with their corresponding location on the genome. -o/--offrate governs how many rows get marked: the indexer will mark every 2^<int> rows. Marking more rows makes reference-position lookups faster, but requires more memory to hold the annotations at runtime. The default is 5 (every 32nd row is marked; for human genome, annotations occupy about 340 megabytes).
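The trade-off described above can be made concrete with a little arithmetic. In this sketch the function names are hypothetical, and `bytes_per_entry=4` is an assumption for illustration only; the actual per-annotation size in a bowtie index is not stated here.

```python
def marked_row_interval(offrate: int) -> int:
    """Every 2**offrate-th Burrows-Wheeler row is marked."""
    return 1 << offrate


def estimated_annotation_bytes(num_rows: int, offrate: int,
                               bytes_per_entry: int = 4) -> int:
    """Rough memory estimate for the row annotations.

    Assumption: each marked row stores one fixed-size offset of
    bytes_per_entry bytes; this is not taken from the bowtie source.
    """
    return (num_rows // marked_row_interval(offrate)) * bytes_per_entry
```

With the default `--offrate 5`, `marked_row_interval(5)` is 32. Lowering the offrate by one halves the interval and so roughly doubles the annotation memory, which is the cost of the faster lookups.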
Comma-separated list of files containing the #2 mates (filename usually includes _2), or, if -c is specified, the mate sequences themselves. E.g., this might be flyA_2.fq,flyB_2.fq, or, if -c is specified, this might be GGTCATCCT,ACGGGTCGT. Sequences specified with this option must correspond file-for-file and read-for-read with those specified in <m1>. Reads may be a mix of different lengths. If - is specified, bowtie will read the #2 mates from the “standard in” filehandle.
+
Comma-separated list of files containing the #2 mates (filename usually includes _2), or, if -c is specified, the mate sequences themselves. E.g., this might be flyA_2.fq,flyB_2.fq, or, if -c is specified, this might be GGTCATCCT,ACGGGTCGT. Sequences specified with this option must correspond file-for-file and read-for-read with those specified in <m1>. Reads may be a mix of different lengths. If - is specified, bowtie will read the #2 mates from the “standard in” filehandle.
Comma-separated list of files containing a mix of unpaired and paired-end reads in Tab-delimited format. Tab-delimited format is a 1-read-per-line format where unpaired reads consist of a read name, sequence and quality string each separated by tabs. A paired-end read consists of a read name, sequence of the #1 mate, quality values of the #1 mate, sequence of the #2 mate, and quality values of the #2 mate separated by tabs. Quality values can be expressed using any of the scales supported in FASTQ files. Reads may be a mix of different lengths and paired-end and unpaired reads may be intermingled in the same file. If - is specified, bowtie will read the Tab-delimited reads from the “standard in” filehandle.
+
Comma-separated list of files containing a mix of unpaired and paired-end reads in Tab-delimited format. Tab-delimited format is a 1-read-per-line format where unpaired reads consist of a read name, sequence and quality string each separated by tabs. A paired-end read consists of a read name, sequence of the #1 mate, quality values of the #1 mate, sequence of the #2 mate, and quality values of the #2 mate separated by tabs. Quality values can be expressed using any of the scales supported in FASTQ files. Reads may be a mix of different lengths and paired-end and unpaired reads may be intermingled in the same file. If - is specified, bowtie will read the Tab-delimited reads from the “standard in” filehandle.
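The Tab-delimited layout described above can be sketched as a small parser. This is only an illustration, not bowtie's own reader: the names `TabRead` and `parse_tab_line` are hypothetical, and the sketch assumes a line has exactly 3 fields (unpaired) or 5 fields (paired-end).

```python
from typing import NamedTuple, Optional


class TabRead(NamedTuple):
    """One record of the Tab-delimited read format (hypothetical helper type)."""
    name: str
    seq1: str
    qual1: str
    seq2: Optional[str] = None   # only present for paired-end records
    qual2: Optional[str] = None

    @property
    def paired(self) -> bool:
        return self.seq2 is not None


def parse_tab_line(line: str) -> TabRead:
    """Parse one line: 3 tab-separated fields = unpaired, 5 = paired-end."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) not in (3, 5):
        raise ValueError(f"expected 3 or 5 tab-separated fields, got {len(fields)}")
    return TabRead(*fields)


if __name__ == "__main__":
    # Unpaired and paired-end records may be intermingled in the same file.
    for raw in ("r1\tACGT\tIIII\n", "r2\tACGT\tIIII\tTTGA\tIIII\n"):
        rec = parse_tab_line(raw)
        print(rec.name, "paired" if rec.paired else "unpaired")
```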
A comma-separated list of interleaved paired-end FASTQ files, where the records for the mate #1s are interleaved with the records for the mate #2s. Reads may be a mix of different lengths. If - is specified, Bowtie reads from the “standard in” filehandle.
+
A comma-separated list of interleaved paired-end FASTQ files, where the records for the mate #1s are interleaved with the records for the mate #2s. Reads may be a mix of different lengths. If - is specified, Bowtie reads from the “standard in” filehandle.
A comma-separated list of files containing unpaired reads to be aligned, or, if -c is specified, the unpaired read sequences themselves. E.g., this might be lane1.fq,lane2.fq,lane3.fq,lane4.fq, or, if -c is specified, this might be GGTCATCCT,ACGGGTCGT. Reads may be a mix of different lengths. If - is specified, Bowtie gets the reads from the “standard in” filehandle.
+
A comma-separated list of files containing unpaired reads to be aligned, or, if -c is specified, the unpaired read sequences themselves. E.g., this might be lane1.fq,lane2.fq,lane3.fq,lane4.fq, or, if -c is specified, this might be GGTCATCCT,ACGGGTCGT. Reads may be a mix of different lengths. If - is specified, Bowtie gets the reads from the “standard in” filehandle.
Reads are substrings (k-mers) extracted from a FASTA file s. Specifically, for every reference sequence in FASTA file s, bowtie aligns the k-mers at offsets 1, 1+i, 1+2i, … until reaching the end of the reference. Each k-mer is aligned as a separate read. Quality values are set to all Is (40 on Phred scale). Each k-mer (read) is given a name like sequence_offset, where sequence is the name of the FASTA sequence it was drawn from and offset is its 0-based offset of origin with respect to the sequence. Only single k-mers, i.e. unpaired reads, can be aligned in this way.
+
Reads are substrings (k-mers) extracted from a FASTA file s. Specifically, for every reference sequence in FASTA file s, bowtie aligns the k-mers at offsets 1, 1+i, 1+2i, … until reaching the end of the reference. Each k-mer is aligned as a separate read. Quality values are set to all Is (40 on Phred scale). Each k-mer (read) is given a name like sequence_offset, where sequence is the name of the FASTA sequence it was drawn from and offset is its 0-based offset of origin with respect to the sequence. Only single k-mers, i.e. unpaired reads, can be aligned in this way.
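As a rough illustration of the k-mer extraction described above, the following sketch produces (name, sequence, qualities) triples for one reference sequence. The function name `kmer_reads` is hypothetical, and the sketch assumes that a trailing window shorter than k is dropped; the text does not say how the end of the reference is handled.

```python
def kmer_reads(name, sequence, k, i):
    """Return (read_name, kmer, qualities) for every k-mer taken at interval i.

    Offsets in the read names are 0-based, as in the sequence_offset naming
    scheme. Assumption: windows that would run past the end are skipped.
    """
    reads = []
    for offset in range(0, len(sequence) - k + 1, i):
        kmer = sequence[offset:offset + k]
        # Every base gets quality 'I' (Phred 40 in the Phred+33 encoding).
        reads.append((f"{name}_{offset}", kmer, "I" * k))
    return reads


if __name__ == "__main__":
    for read in kmer_reads("chr", "ACGTACGT", k=4, i=2):
        print(read)
```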
Align in colorspace. Read characters are interpreted as colors. The index specified must be a colorspace index (i.e. built with bowtie-build -C), or bowtie will print an error message and quit. See Colorspace alignment for more details.
-
-
-
-
-
-Q/--quals <files>
-
-
-
Comma-separated list of files containing quality values for corresponding unpaired CSFASTA reads. Use in combination with -C and -f. --integer-quals is set automatically when -Q/--quals is specified.
-
-
-
-
-
--Q1 <files>
-
-
-
Comma-separated list of files containing quality values for corresponding CSFASTA #1 mates. Use in combination with -C, -f, and -1. --integer-quals is set automatically when --Q1 is specified.
-
-
-
-
-
--Q2 <files>
-
-
-
Comma-separated list of files containing quality values for corresponding CSFASTA #2 mates. Use in combination with -C, -f, and -2. --integer-quals is set automatically when --Q2 is specified.
-
-
-
-s/--skip <int>
-
Skip (i.e. do not align) the first <int> reads or pairs in the input.
+
Skip (i.e. do not align) the first <int> reads or pairs in the input.
Convert input qualities from Solexa (which can be negative) to Phred (which can’t). This is usually the right option for use with (unconverted) reads emitted by GA Pipeline versions prior to 1.3. Default: off.
+
Convert input qualities from Solexa (which can be negative) to Phred (which can’t). This is usually the right option for use with (unconverted) reads emitted by GA Pipeline versions prior to 1.3. Default: off.
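For reference, the commonly used relationship between the two scales is Q_phred = 10 · log10(10^(Q_solexa / 10) + 1). The sketch below applies it; the function name `solexa_to_phred` is hypothetical, and rounding to the nearest integer is an assumption rather than a statement about bowtie's internals.

```python
import math


def solexa_to_phred(q_solexa):
    """Convert a Solexa-scaled quality (may be negative) to a Phred quality.

    Uses Q_phred = 10 * log10(10 ** (Q_solexa / 10) + 1).
    Assumption: the result is rounded to the nearest integer.
    """
    return round(10 * math.log10(10 ** (q_solexa / 10) + 1))


if __name__ == "__main__":
    # Low Solexa values differ noticeably from Phred; high values converge.
    for q in (-5, 0, 10, 40):
        print(q, "->", solexa_to_phred(q))
```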
Quality values are represented in the read input file as space-separated ASCII integers, e.g., 40 40 30 40…, rather than ASCII characters, e.g., II?I…. Integers are treated as being on the Phred quality scale unless --solexa-quals is also specified. Default: off.
+
Quality values are represented in the read input file as space-separated ASCII integers, e.g., 40 40 30 40…, rather than ASCII characters, e.g., II?I…. Integers are treated as being on the Phred quality scale unless --solexa-quals is also specified. Default: off.
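To make the two representations concrete, this small sketch converts a space-separated integer quality string into the equivalent ASCII string. It assumes Phred-scaled integers and the Phred+33 encoding; the helper name `integer_quals_to_ascii` is hypothetical.

```python
def integer_quals_to_ascii(quals, offset=33):
    """Turn '40 40 30 40' into its ASCII-encoded form (Phred+33 assumed)."""
    return "".join(chr(int(q) + offset) for q in quals.split())


if __name__ == "__main__":
    print(integer_quals_to_ascii("40 40 30 40"))
```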
Maximum number of mismatches permitted in the “seed”, i.e. the first L base pairs of the read (where L is set with -l/--seedlen). This may be 0, 1, 2 or 3 and the default is 2. This option is mutually exclusive with the -v option.
+
Maximum number of mismatches permitted in the “seed”, i.e. the first L base pairs of the read (where L is set with -l/--seedlen). This may be 0, 1, 2 or 3 and the default is 2. This option is mutually exclusive with the -v option.
Maximum permitted total of quality values at all mismatched read positions throughout the entire alignment, not just in the “seed”. The default is 70. Like Maq, bowtie rounds quality values to the nearest 10 and saturates at 30; rounding can be disabled with --nomaqround.
+
Maximum permitted total of quality values at all mismatched read positions throughout the entire alignment, not just in the “seed”. The default is 70. Like Maq, bowtie rounds quality values to the nearest 10 and saturates at 30; rounding can be disabled with --nomaqround.
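The -e check can be pictured as follows. This is a sketch under stated assumptions, not bowtie's code: the helper names are hypothetical, and "nearest 10" is implemented as `(q + 5) // 10 * 10`, which is one plausible reading of the rounding rule.

```python
def maq_round(q):
    """Round a Phred quality to the nearest 10 and saturate at 30 (assumed rule)."""
    return min(30, ((q + 5) // 10) * 10)


def within_e_ceiling(mismatch_quals, e=70, maqround=True):
    """Return True if the summed qualities at mismatched positions are <= e.

    mismatch_quals: Phred qualities at the mismatched read positions only.
    maqround=False mimics disabling the rounding (as with --nomaqround).
    """
    quals = [maq_round(q) for q in mismatch_quals] if maqround else mismatch_quals
    return sum(quals) <= e


if __name__ == "__main__":
    print(within_e_ceiling([40, 25, 12]))   # rounded to 30 + 30 + 10
    print(within_e_ceiling([40, 40, 40]))   # rounded to 30 + 30 + 30
```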
The “seed length”; i.e., the number of bases on the high-quality end of the read to which the -n ceiling applies. The lowest permitted setting is 5 and the default is 28. bowtie is faster for larger values of -l.
+
The “seed length”; i.e., the number of bases on the high-quality end of the read to which the -n ceiling applies. The lowest permitted setting is 5 and the default is 28. bowtie is faster for larger values of -l.
The upstream/downstream mate orientations for a valid paired-end alignment against the forward reference strand. E.g., if --fr is specified and there is a candidate paired-end alignment where mate1 appears upstream of the reverse complement of mate2 and the insert length constraints are met, that alignment is valid. Also, if mate2 appears upstream of the reverse complement of mate1 and all other constraints are met, that too is valid. --rf likewise requires that an upstream mate1 be reverse-complemented and a downstream mate2 be forward-oriented. --ff requires both an upstream mate1 and a downstream mate2 to be forward-oriented. Default: --fr when -C (colorspace alignment) is not specified, --ff when -C is specified.
+
The upstream/downstream mate orientations for a valid paired-end alignment against the forward reference strand. E.g., if --fr is specified and there is a candidate paired-end alignment where mate1 appears upstream of the reverse complement of mate2 and the insert length constraints are met, that alignment is valid. Also, if mate2 appears upstream of the reverse complement of mate1 and all other constraints are met, that too is valid. --rf likewise requires that an upstream mate1 be reverse-complemented and a downstream mate2 be forward-oriented. --ff requires both an upstream mate1 and a downstream mate2 to be forward-oriented.
The maximum number of backtracks permitted when aligning a read in -n 2 or -n 3 mode (default: 125 without --best, 800 with --best). A “backtrack” is the introduction of a speculative substitution into the alignment. Without this limit, the default parameters will sometimes require that bowtie try 100s or 1,000s of backtracks to align a read, especially if the read has many low-quality bases and/or has no valid alignments, slowing bowtie down significantly. However, this limit may cause some valid alignments to be missed. Higher limits yield greater sensitivity at the expense of longer running times. See also: -y/--tryhard.
+
The maximum number of backtracks permitted when aligning a read in -n 2 or -n 3 mode (default: 125 without --best, 800 with --best). A “backtrack” is the introduction of a speculative substitution into the alignment. Without this limit, the default parameters will sometimes require that bowtie try 100s or 1,000s of backtracks to align a read, especially if the read has many low-quality bases and/or has no valid alignments, slowing bowtie down significantly. However, this limit may cause some valid alignments to be missed. Higher limits yield greater sensitivity at the expense of longer running times. See also: -y/--tryhard.
Report up to <int> valid alignments per read or pair (default: 1). Validity of alignments is determined by the alignment policy (combined effects of -n, -v, -l, and -e). If more than one valid alignment exists and the --best and --strata options are specified, then only those alignments belonging to the best alignment “stratum” will be reported. Bowtie is designed to be very fast for small -k but bowtie can become significantly slower as -k increases. If you would like to use Bowtie for larger values of -k, consider building an index with a denser suffix-array sample, i.e. specify a smaller -o/--offrate when invoking bowtie-build for the relevant index (see the Performance tuning section for details).
+
Report up to <int> valid alignments per read or pair (default: 1). Validity of alignments is determined by the alignment policy (combined effects of -n, -v, -l, and -e). If more than one valid alignment exists and the --best and --strata options are specified, then only those alignments belonging to the best alignment “stratum” will be reported. Bowtie is designed to be very fast for small -k but bowtie can become significantly slower as -k increases. If you would like to use Bowtie for larger values of -k, consider building an index with a denser suffix-array sample, i.e. specify a smaller -o/--offrate when invoking bowtie-build for the relevant index (see the Performance tuning section for details).
Report all valid alignments per read or pair (default: off). Validity of alignments is determined by the alignment policy (combined effects of -n, -v, -l, and -e). If more than one valid alignment exists and the --best and --strata options are specified, then only those alignments belonging to the best alignment “stratum” will be reported. Bowtie is designed to be very fast for small -k but bowtie can become significantly slower if -a/--all is specified. If you would like to use Bowtie with -a, consider building an index with a denser suffix-array sample, i.e. specify a smaller -o/--offrate when invoking bowtie-build for the relevant index (see the Performance tuning section for details).
+
Report all valid alignments per read or pair (default: off). Validity of alignments is determined by the alignment policy (combined effects of -n, -v, -l, and -e). If more than one valid alignment exists and the --best and --strata options are specified, then only those alignments belonging to the best alignment “stratum” will be reported. Bowtie is designed to be very fast for small -k but bowtie can become significantly slower if -a/--all is specified. If you would like to use Bowtie with -a, consider building an index with a denser suffix-array sample, i.e. specify a smaller -o/--offrate when invoking bowtie-build for the relevant index (see the Performance tuning section for details).
Suppress all alignments for a particular read or pair if more than <int> reportable alignments exist for it. Reportable alignments are those that would be reported given the -n, -v, -l, -e, -k, -a, --best, and --strata options. Default: no limit. Bowtie is designed to be very fast for small -m but bowtie can become significantly slower for larger values of -m. If you would like to use Bowtie for larger values of -m, consider building an index with a denser suffix-array sample, i.e. specify a smaller -o/--offrate when invoking bowtie-build for the relevant index (see the Performance tuning section for details).
+
Suppress all alignments for a particular read or pair if more than <int> reportable alignments exist for it. Reportable alignments are those that would be reported given the -n, -v, -l, -e, -k, -a, --best, and --strata options. Default: no limit. Bowtie is designed to be very fast for small -m but bowtie can become significantly slower for larger values of -m. If you would like to use Bowtie for larger values of -m, consider building an index with a denser suffix-array sample, i.e. specify a smaller -o/--offrate when invoking bowtie-build for the relevant index (see the Performance tuning section for details).
Behaves like -m except that if a read has more than <int> reportable alignments, one is reported at random. In default output mode, the selected alignment’s 7th column is set to <int>+1 to indicate the read has at least <int>+1 valid alignments. In -S/--sam mode, the selected alignment is given a MAPQ (mapping quality) of 0 and the XM:I field is set to <int>+1. This option requires --best; if specified without --best, --best is enabled automatically.
+
Behaves like -m except that if a read has more than <int> reportable alignments, one is reported at random. In default output mode, the selected alignment’s 7th column is set to <int>+1 to indicate the read has at least <int>+1 valid alignments. In -S/--sam mode, the selected alignment is given a MAPQ (mapping quality) of 0 and the XM:I field is set to <int>+1. This option requires --best; if specified without --best, --best is enabled automatically.
Make Bowtie guarantee that reported singleton alignments are “best” in terms of stratum (i.e. number of mismatches, or mismatches in the seed in the case of -n mode) and in terms of the quality values at the mismatched position(s). Stratum always trumps quality; e.g. a 1-mismatch alignment where the mismatched position has Phred quality 40 is preferred over a 2-mismatch alignment where the mismatched positions both have Phred quality 10. When --best is not specified, Bowtie may report alignments that are sub-optimal in terms of stratum and/or quality (though an effort is made to report the best alignment). --best mode also removes all strand bias. Note that --best does not affect which alignments are considered “valid” by bowtie, only which valid alignments are reported by bowtie. When --best is specified and multiple hits are allowed (via -k or -a), the alignments for a given read are guaranteed to appear in best-to-worst order in bowtie’s output. bowtie is somewhat slower when --best is specified.
+
Make Bowtie guarantee that reported singleton alignments are “best” in terms of stratum (i.e. number of mismatches, or mismatches in the seed in the case of -n mode) and in terms of the quality values at the mismatched position(s). Stratum always trumps quality; e.g. a 1-mismatch alignment where the mismatched position has Phred quality 40 is preferred over a 2-mismatch alignment where the mismatched positions both have Phred quality 10. When --best is not specified, Bowtie may report alignments that are sub-optimal in terms of stratum and/or quality (though an effort is made to report the best alignment). --best mode also removes all strand bias. Note that --best does not affect which alignments are considered “valid” by bowtie, only which valid alignments are reported by bowtie. When --best is specified and multiple hits are allowed (via -k or -a), the alignments for a given read are guaranteed to appear in best-to-worst order in bowtie’s output. bowtie is somewhat slower when --best is specified.
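The "stratum always trumps quality" rule amounts to a two-level sort key. The sketch below only illustrates that ordering; the dictionary layout and the function name `best_order` are hypothetical, and using the sum of the mismatch qualities as the tie-breaker is an assumption.

```python
def best_order(alignments):
    """Sort alignments best-to-worst: fewer mismatches first, then lower
    total quality at the mismatched positions (assumed tie-breaker)."""
    return sorted(
        alignments,
        key=lambda a: (len(a["mismatch_quals"]), sum(a["mismatch_quals"])),
    )


if __name__ == "__main__":
    candidates = [
        {"id": "two_mm_q10", "mismatch_quals": [10, 10]},
        {"id": "one_mm_q40", "mismatch_quals": [40]},
    ]
    # The 1-mismatch alignment comes first even though its mismatch has
    # the higher quality value, matching the example in the text.
    print([a["id"] for a in best_order(candidates)])
```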
If many valid alignments exist and are reportable (e.g. are not disallowed via the -k option) and they fall into more than one alignment “stratum”, report only those alignments that fall into the best stratum. By default, Bowtie reports all reportable alignments regardless of whether they fall into multiple strata. When --strata is specified, --best must also be specified.
+
If many valid alignments exist and are reportable (e.g. are not disallowed via the -k option) and they fall into more than one alignment “stratum”, report only those alignments that fall into the best stratum. By default, Bowtie reports all reportable alignments regardless of whether they fall into multiple strata. When --strata is specified, --best must also be specified.
When decoding colorspace alignments, use <int> as the SNP penalty. This should be set to the user’s best guess of the true ratio of SNPs per base in the subject genome, converted to the Phred quality scale. E.g., if the user expects about 1 SNP every 1,000 positions, --snpphred should be set to 30 (which is also the default). To specify the fraction directly, use --snpfrac.
-
-
-
-
-
--snpfrac <dec>
-
-
-
When decoding colorspace alignments, use <dec> as the estimated ratio of SNPs per base. For best decoding results, this should be set to the user’s best guess of the true ratio. bowtie internally converts the ratio to a Phred quality, and behaves as if that quality had been set via the --snpphred option. Default: 0.001.
-
-
-
-
-
--col-cseq
-
-
-
If reads are in colorspace and the default output mode is active, --col-cseq causes the reads’ color sequence to appear in the read-sequence column (column 5) instead of the decoded nucleotide sequence. See the Decoding colorspace alignments section for details about decoding. This option is ignored in -S/--sam mode.
-
-
-
-
-
--col-cqual
-
-
-
If reads are in colorspace and the default output mode is active, --col-cqual causes the reads’ original (color) quality sequence to appear in the quality column (column 6) instead of the decoded qualities. See the Colorspace alignment section for details about decoding. This option is ignored in -S/--sam mode.
-
-
-
-
-
--col-keepends
-
-
-
When decoding colorspace alignments, bowtie trims off a nucleotide and quality from the left and right edges of the alignment. This is because those nucleotides are supported by only one color, in contrast to the middle nucleotides which are supported by two. Specify --col-keepends to keep the extreme-end nucleotides and qualities.
Print alignments in SAM format. See the SAM output section of the manual for details. To suppress all SAM headers, use --sam-nohead in addition to -S/--sam. To suppress just the @SQ headers (e.g. if the alignment is against a very large number of reference sequences), use --sam-nosq in addition to -S/--sam. bowtie does not write BAM files directly, but SAM output can be converted to BAM on the fly by piping bowtie’s output to samtools view.
+
Print alignments in SAM format. See the SAM output section of the manual for details. To suppress all SAM headers, use --sam-nohead in addition to -S/--sam. To suppress just the @SQ headers (e.g. if the alignment is against a very large number of reference sequences), use --sam-nosq in addition to -S/--sam. bowtie does not write BAM files directly, but SAM output can be converted to BAM on the fly by piping bowtie’s output to samtools view.
Add <text> (usually of the form TAG:VAL, e.g. ID:IL7LANE2) as a field on the @RG header line. Specify --sam-RG multiple times to set multiple fields. See the SAM Spec for details about what fields are legal. Note that, if any @RG fields are set using this option, the ID and SM fields must both be among them to make the @RG line legal according to the SAM Spec. --sam-RG is ignored unless -S/--sam is also specified.
+
Add <text> (usually of the form TAG:VAL, e.g. ID:IL7LANE2) as a field on the @RG header line. Specify --sam-RG multiple times to set multiple fields. See the SAM Spec for details about what fields are legal. Note that, if any @RG fields are set using this option, the ID and SM fields must both be among them to make the @RG line legal according to the SAM Spec. --sam-RG is ignored unless -S/--sam is also specified.
Use memory-mapped I/O to load the index, rather than normal C file I/O. Memory-mapping the index allows many concurrent bowtie processes on the same computer to share the same memory image of the index (i.e. you pay the memory overhead just once). This facilitates memory-efficient parallelization of bowtie in situations where using -p is not possible.
+
Use memory-mapped I/O to load the index, rather than normal C file I/O. Memory-mapping the index allows many concurrent bowtie processes on the same computer to share the same memory image of the index (i.e. you pay the memory overhead just once). This facilitates memory-efficient parallelization of bowtie in situations where using -p is not possible.
Use shared memory to load the index, rather than normal C file I/O. Using shared memory allows many concurrent bowtie processes on the same computer to share the same memory image of the index (i.e. you pay the memory overhead just once). This facilitates memory-efficient parallelization of bowtie in situations where using -p is not desirable. Unlike --mm, --shmem installs the index into shared memory permanently, or until the user deletes the shared memory chunks manually. See your operating system documentation for details on how to manually list and remove shared memory chunks (on Linux and Mac OS X, these commands are ipcs and ipcrm). You may also need to increase your OS’s maximum shared-memory chunk size to accommodate larger indexes; see your OS documentation.
+
Use shared memory to load the index, rather than normal C file I/O. Using shared memory allows many concurrent bowtie processes on the same computer to share the same memory image of the index (i.e. you pay the memory overhead just once). This facilitates memory-efficient parallelization of bowtie in situations where using -p is not desirable. Unlike --mm, --shmem installs the index into shared memory permanently, or until the user deletes the shared memory chunks manually. See your operating system documentation for details on how to manually list and remove shared memory chunks (on Linux and Mac OS X, these commands are ipcs and ipcrm). You may also need to increase your OS’s maximum shared-memory chunk size to accommodate larger indexes; see your OS documentation.
Reference strand aligned to, + for forward strand, - for reverse
Name of reference sequence where alignment occurs, or numeric ID if no name was provided
0-based offset into the forward reference strand where leftmost character of the alignment occurs
-
Read sequence (reverse-complemented if orientation is -).
-
If the read was in colorspace, then the sequence shown in this column is the sequence of decoded nucleotides, not the original colors. See the Colorspace alignment section for details about decoding. To display colors instead, use the --col-cseq option.
-
ASCII-encoded read qualities (reversed if orientation is -). The encoded quality values are on the Phred scale and the encoding is ASCII-offset by 33 (ASCII char !).
-
If the read was in colorspace, then the qualities shown in this column are the decoded qualities, not the original qualities. See the Colorspace alignment section for details about decoding. To display colors instead, use the --col-cqual option.
+
Read sequence (reverse-complemented if orientation is -).
+
ASCII-encoded read qualities (reversed if orientation is -). The encoded quality values are on the Phred scale and the encoding is ASCII-offset by 33 (ASCII char !).
If -M was specified and the prescribed ceiling was exceeded for this read, this column contains the value of the ceiling, indicating that at least that many valid alignments were found in addition to the one reported.
-
Otherwise, this column contains the number of other instances where the same sequence aligned against the same reference characters as were aligned against in the reported alignment. This is not the number of other places the read aligns with the same number of mismatches. The number in this column is generally not a good proxy for that number (e.g., the number in this column may be ‘0’ while the number of other alignments with the same number of mismatches might be large).
-
Comma-separated list of mismatch descriptors. If there are no mismatches in the alignment, this field is empty. A single descriptor has the format offset:reference-base>read-base. The offset is expressed as a 0-based offset from the high-quality (5’) end of the read.
+
Otherwise, this column contains the number of other instances where the same sequence aligned against the same reference characters as were aligned against in the reported alignment. This is not the number of other places the read aligns with the same number of mismatches. The number in this column is generally not a good proxy for that number (e.g., the number in this column may be ‘0’ while the number of other alignments with the same number of mismatches might be large).
+
Comma-separated list of mismatch descriptors. If there are no mismatches in the alignment, this field is empty. A single descriptor has the format offset:reference-base>read-base. The offset is expressed as a 0-based offset from the high-quality (5’) end of the read.
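A mismatch-descriptor field in the offset:reference-base>read-base format can be unpacked with a few lines of Python. The function name `parse_mismatches` and the example field value are illustrative only.

```python
def parse_mismatches(field):
    """Parse a field such as '9:G>A,12:T>C' into (offset, ref_base, read_base)
    tuples. An empty field means the alignment has no mismatches."""
    mismatches = []
    if not field:
        return mismatches
    for descriptor in field.split(","):
        offset, bases = descriptor.split(":")
        ref_base, read_base = bases.split(">")
        # Offsets are 0-based from the high-quality (5') end of the read.
        mismatches.append((int(offset), ref_base, read_base))
    return mismatches


if __name__ == "__main__":
    print(parse_mismatches("9:G>A,12:T>C"))
    print(parse_mismatches(""))
```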
SAM bowtie output
Following is a brief description of the SAM format as output by bowtie when the -S/--sam option is specified. For more details, see the SAM format specification.
1-based offset into the forward reference strand where leftmost character of the alignment occurs
Mapping quality
CIGAR string representation of alignment
-
Name of reference sequence where mate’s alignment occurs. Set to = if the mate’s reference sequence is the same as this alignment’s, or * if there is no mate.
-
1-based offset into the forward reference strand where leftmost character of the mate’s alignment occurs. Offset is 0 if there is no mate.
-
Inferred insert size. Size is negative if the mate’s alignment occurs upstream of this alignment. Size is 0 if there is no mate.
+
Name of reference sequence where mate’s alignment occurs. Set to = if the mate’s reference sequence is the same as this alignment’s, or * if there is no mate.
+
1-based offset into the forward reference strand where leftmost character of the mate’s alignment occurs. Offset is 0 if there is no mate.
+
Inferred insert size. Size is negative if the mate’s alignment occurs upstream of this alignment. Size is 0 if there is no mate.
Read sequence (reverse-complemented if aligned to the reverse strand)
ASCII-encoded read qualities (reversed if the read aligned to the reverse strand). The encoded quality values are on the Phred quality scale and the encoding is ASCII-offset by 33 (ASCII char !), similarly to a FASTQ file.
Optional fields. Fields are tab-separated. For descriptions of all possible optional fields, see the SAM format specification. bowtie outputs some of these optional fields for each alignment, depending on the type of the alignment:
Aligned read has an edit distance of <N> in colorspace. This field is present in addition to the NM field in -C/--color mode, but is omitted otherwise.
-
-
-
-
MD:Z:<S>
-
For aligned reads, <S> is a string representation of the mismatched reference bases in the alignment. See SAM format specification for details. For colorspace alignments, <S> describes the decoded nucleotide alignment, not the colorspace alignment.
+
For aligned reads, <S> is a string representation of the mismatched reference bases in the alignment. See SAM format specification for details.
For a read with no reported alignments, <N> is 0 if the read had no alignments. If -m was specified and the read’s alignments were suppressed because the -m ceiling was exceeded, <N> equals the -m ceiling + 1, to indicate that there were at least that many valid alignments (but all were suppressed). In -M mode, if the alignment was randomly selected because the -M ceiling was exceeded, <N> equals the -M ceiling + 1, to indicate that there were at least that many valid alignments (of which one was reported at random).
+
For a read with no reported alignments, <N> is 0 if the read had no alignments. If -m was specified and the read’s alignments were suppressed because the -m ceiling was exceeded, <N> equals the -m ceiling + 1, to indicate that there were at least that many valid alignments (but all were suppressed). In -M mode, if the alignment was randomly selected because the -M ceiling was exceeded, <N> equals the -M ceiling + 1, to indicate that there were at least that many valid alignments (of which one was reported at random).
The bowtie-build indexer
bowtie-build builds a Bowtie index from a set of DNA sequences. bowtie-build outputs a set of 6 files with suffixes .1.ebwt, .2.ebwt, .3.ebwt, .4.ebwt, .rev.1.ebwt, and .rev.2.ebwt. (If the total length of all the input sequences is greater than about 4 billion, then the index files will end in ebwtl instead of ebwt.) These files together constitute the index: they are all that is needed to align reads to that reference. The original sequence files are no longer used by Bowtie once the index is built.
-
Use of Karkkainen’s blockwise algorithm allows bowtie-build to trade off between running time and memory usage. bowtie-build has three options governing how it makes this trade: -p/--packed, --bmax/--bmaxdivn, and --dcv. By default, bowtie-build will automatically search for the settings that yield the best running time without exhausting memory. This behavior can be disabled using the -a/--noauto option.
-
The indexer provides options pertaining to the “shape” of the index, e.g. --offrate governs the fraction of Burrows-Wheeler rows that are “marked” (i.e., the density of the suffix-array sample; see the original FM Index paper for details). All of these options are potentially profitable trade-offs depending on the application. They have been set to defaults that are reasonable for most cases according to our experiments. See Performance Tuning for details.
+
Use of Karkkainen’s blockwise algorithm allows bowtie-build to trade off between running time and memory usage. bowtie-build has three options governing how it makes this trade: -p/--packed, --bmax/--bmaxdivn, and --dcv. By default, bowtie-build will automatically search for the settings that yield the best running time without exhausting memory. This behavior can be disabled using the -a/--noauto option.
+
The indexer provides options pertaining to the “shape” of the index, e.g. --offrate governs the fraction of Burrows-Wheeler rows that are “marked” (i.e., the density of the suffix-array sample; see the original FM Index paper for details). All of these options are potentially profitable trade-offs depending on the application. They have been set to defaults that are reasonable for most cases according to our experiments. See Performance Tuning for details.
The Bowtie index is based on the FM Index of Ferragina and Manzini, which in turn is based on the Burrows-Wheeler transform. The algorithm used to build the index is based on the blockwise algorithm of Karkkainen.
To map alignments back to positions on the reference sequences, it’s necessary to annotate (“mark”) some or all of the Burrows-Wheeler rows with their corresponding location on the genome. -o/--offrate governs how many rows get marked: the indexer will mark every 2^<int> rows. Marking more rows makes reference-position lookups faster, but requires more memory to hold the annotations at runtime. The default is 5 (every 32nd row is marked; for human genome, annotations occupy about 340 megabytes).
+
To map alignments back to positions on the reference sequences, it’s necessary to annotate (“mark”) some or all of the Burrows-Wheeler rows with their corresponding location on the genome. -o/--offrate governs how many rows get marked: the indexer will mark every 2^<int> rows. Marking more rows makes reference-position lookups faster, but requires more memory to hold the annotations at runtime. The default is 5 (every 32nd row is marked; for human genome, annotations occupy about 340 megabytes).
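The memory cost of a given --offrate can be estimated with simple arithmetic. The sketch below rests on two assumptions that are not stated in this manual: the number of Burrows-Wheeler rows is roughly the reference length, and each annotation occupies 4 bytes. The function name `marked_row_stats` and the example genome length are illustrative only.

```python
def marked_row_stats(n_rows, offrate=5, bytes_per_mark=4):
    """Estimate how many rows are marked and how much memory the marks need.

    n_rows         -- number of Burrows-Wheeler rows (assumed ~ reference length)
    offrate        -- the <int> passed to -o/--offrate; every 2^offrate-th row is marked
    bytes_per_mark -- assumed size of one annotation in bytes (an assumption, not from the manual)
    Returns (interval, marked_rows, total_bytes).
    """
    interval = 2 ** offrate
    marked_rows = n_rows // interval
    return interval, marked_rows, marked_rows * bytes_per_mark


if __name__ == "__main__":
    # Rough, assumed length for a human-sized reference.
    assumed_rows = 2_900_000_000
    for offrate in (4, 5, 6):
        interval, marked, total = marked_row_stats(assumed_rows, offrate)
        print(f"offrate={offrate}: every {interval}th row, "
              f"{marked} marks, about {total / 2**20:.0f} MiB")
```

Under these assumptions the default of 5 lands in the same range as the figure quoted above. Each step up in --offrate halves the annotation memory, and each step down doubles it.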
By default, when bowtie-inspect is run without -s or -n, it recreates the reference nucleotide sequences using the bit-encoded reference nucleotides kept in the .3.ebwt and .4.ebwt index files. When -e/--ebwt-ref is specified, bowtie-inspect recreates the reference sequences from the Burrows-Wheeler-transformed reference sequence in the .1.ebwt file instead. The reference recreation process is much slower when -e/--ebwt-ref is specified. Also, when -e/--ebwt-ref is specified and the index is in colorspace, the reference is printed in colors (A=blue, C=green, G=orange, T=red).
+
By default, when bowtie-inspect is run without -s or -n, it recreates the reference nucleotide sequences using the bit-encoded reference nucleotides kept in the .3.ebwt and .4.ebwt index files. When -e/--ebwt-ref is specified, bowtie-inspect recreates the reference sequences from the Burrows-Wheeler-transformed reference sequence in the .1.ebwt file instead. The reference recreation process is much slower when -e/--ebwt-ref is specified.