Major updates making applications more useful 🥳
Two short-hand command-line arguments (`-i` and `-T`) break with previous versions 💀
- Release binaries CI/CD
- Input alignment format (`-i/--alignment`) is inferred from the file extension (`bam|sam|cram|paf`) or set explicitly with `--alignment-format`
- Added `--aligned/--group-aligned` filter to supplement the filter by unique aligned reads (`--reads/--group-reads`)
- Pretty table output short argument is now `-T` (previously `-t`)
- Input alignment short argument is now `-i` (previously `-A`)
- Added `-H` argument to print a machine-readable header to non-pretty table output [#13]
- Reference alignment grouping by field in header and automated reference selection:
- Requires annotation in the reference sequence header (description), e.g. `taxid=9606; segment="M"`
- Leading and trailing whitespace around header fields and values is trimmed internally on parsing
- `--group-by <field>`: group alignments by this field
- `--group-sep <delimiter>`: the delimiter with which fields in the header are separated
- `--group-select-split <dir>`: selects a single reference per group and outputs it to a file in `<dir>` (`{group_id}.fasta`)
- `--group-select-by <coverage|reads>`: selection by highest coverage or max reads
- `--group-select-order`: outputs the selected reference with index prefixes sorted by the `select-by` metric (`{idx}-{group_id}.fasta`)
- Example: `--group-by "taxid=" --group-sep ";" --group-select-split ref_seqs/ --group-select-by coverage`
- If segment fields are specified, the selected reference for each segment is output, chosen by highest coverage or reads
- Command line: `--segment-field` and `--segment-field-nan`
- Example: `--segment-field "segment=" --segment-field-nan "segment=N/A"`
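
The grouping and selection options above can be illustrated with a minimal Python sketch. This is not the tool's implementation: the record dictionaries, `parse_header_field` and `select_references` are hypothetical names used only for illustration, and the sketch assumes each reference record already carries its header description, coverage and read count.

```python
from collections import defaultdict


def parse_header_field(description, field, sep=";"):
    """Return the value that follows `field` (e.g. "taxid=") in a header
    description, or None if the field is absent."""
    for part in description.split(sep):
        part = part.strip()  # trim whitespace around each field
        if part.startswith(field):
            return part[len(field):].strip()  # trim whitespace around the value
    return None


def select_references(records, group_by="taxid=", sep=";", select_by="coverage"):
    """Group records by a header field and keep the record with the highest
    `select_by` metric ("coverage" or "reads") in each group."""
    groups = defaultdict(list)
    for rec in records:
        group_id = parse_header_field(rec["description"], group_by, sep)
        if group_id is not None:  # records without the field are skipped (assumption)
            groups[group_id].append(rec)
    return {
        group_id: max(members, key=lambda r: r[select_by])
        for group_id, members in groups.items()
    }


# Hypothetical example records, not real results
records = [
    {"id": "ref1", "description": 'taxid=9606; segment="M"', "coverage": 0.42, "reads": 120},
    {"id": "ref2", "description": 'taxid=9606 ; segment="M"', "coverage": 0.87, "reads": 95},
    {"id": "ref3", "description": 'taxid=11320; segment="4"', "coverage": 0.30, "reads": 10},
]

for group_id, rec in select_references(records).items():
    print(f"{group_id}.fasta <- {rec['id']}")
```

Selecting with `select_by="reads"` instead would pick `ref1` for group `9606` in this made-up data, since it has more reads despite lower coverage.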
- Grouped filtering and outputs behave differently from non-grouped filtering and outputs:
- Non-group filters (`--regions`, `--reads`, `--aligned`, `--coverage`, `--length`) are applied before grouping
- Group filters can be applied (`--group-regions`, `--group-reads`, `--group-coverage`, `--group-aligned`)
- Grouped output fields are distinct from the non-grouped fields; they change as follows (described in `--help`):
* Reference sequence identifier is the value that is grouped by, followed by the number of grouped members in brackets, e.g. `9606 (5)`
* Distinct alignment regions are summed across group members
* Alignments are summed across group members
* Unique reads aligned are recomputed across group members
* Covered bases and reference lengths are set to 0
* Coverage is selected to be the highest among the group members
- Conditional coverage filter for the `--regions` filter: the regions filter is applied only if coverage is below this threshold
- This rescues high-coverage sequences, as these usually have few regions
- `--regions-coverage <0.0-1.0>`: a sufficient value can be somewhere around 0.3 - 0.6
- Short argument for conditional coverage filter (`-t`) has replaced pretty table output (now `-T`)
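
As a rough illustration of the grouped output fields and the conditional coverage filter described above, here is a minimal Python sketch. It is not the tool's code: `summarize_group`, `passes_regions_filter` and the record keys are hypothetical, and it assumes that the regions filter is a minimum number of distinct alignment regions.

```python
def summarize_group(group_id, members):
    """Combine per-reference records into one grouped output row."""
    # Unique reads are recomputed across members, so shared reads count once
    reads = set().union(*(m["read_names"] for m in members))
    return {
        "reference": f"{group_id} ({len(members)})",  # grouped value and member count
        "regions": sum(m["regions"] for m in members),
        "alignments": sum(m["alignments"] for m in members),
        "reads": len(reads),
        "covered_bases": 0,  # set to 0 for grouped output
        "length": 0,         # set to 0 for grouped output
        "coverage": max(m["coverage"] for m in members),
    }


def passes_regions_filter(regions, coverage, min_regions, regions_coverage=None):
    """Apply the regions filter only if coverage is below `regions_coverage`."""
    if regions_coverage is not None and coverage >= regions_coverage:
        return True  # high coverage: the regions filter is not applied
    return regions >= min_regions  # assumed meaning of the regions filter


# Hypothetical example records, not real results
members = [
    {"regions": 3, "alignments": 10, "read_names": {"r1", "r2"}, "coverage": 0.2},
    {"regions": 1, "alignments": 4, "read_names": {"r2", "r3"}, "coverage": 0.9},
]
print(summarize_group("9606", members))
print(passes_regions_filter(regions=1, coverage=0.9, min_regions=4, regions_coverage=0.5))
```

In this made-up data, the second member has a single region but 0.9 coverage, so with a threshold of 0.5 it is kept even though it would fail a plain minimum-regions filter.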