Release 0.6.0 · esteinig/vircov

Major updates making applications more useful 🥳

Two short-hand command-line arguments (-i and -T) break with previous versions 💀

Release binaries CI/CD
Input alignment format (-i/--alignment) is inferred from the file extension (bam|sam|cram|paf) or set explicitly with --alignment-format
Added --aligned/--group-aligned filters to supplement the filters by unique aligned reads (--reads/--group-reads)
Pretty table output short argument is now -T (previously -t)
Input alignment short argument is now -i (previously -A)
Added -H argument to print machine-readable header to non-pretty table output [#13]
Reference alignment grouping by field in header and automated reference selection:
- Requires annotation in reference sequence header (description) e.g. taxid=9606; segment="M"
- Leading and trailing whitespace around header fields and values is trimmed internally on parsing
- --group-by <field>: group alignments by this field
- --group-sep <delimiter>: the delimiter with which fields in the header are separated
- --group-select-split <dir>: selects a single reference per group and outputs it to a file in <dir> ({group_id}.fasta)
- --group-select-by <coverage|reads>: selection by highest coverage or max reads
- --group-select-order: outputs the selected references with index prefixes, sorted by the --group-select-by metric ({idx}-{group_id}.fasta)
- Example: --group-by "taxid=" --group-sep ";" --group-select-split ref_seqs/ --group-select-by coverage
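The header parsing described above can be sketched in a few lines of Python. This is an illustration only, not vircov's implementation: the function name `extract_group_value` is hypothetical, and the sketch assumes the field prefix (e.g. "taxid=") and separator are used exactly as passed to --group-by and --group-sep.

```python
from typing import Optional


def extract_group_value(description: str, group_by: str, group_sep: str) -> Optional[str]:
    """Return the value of the `group_by` field in a header description, or None.

    Hypothetical helper: mirrors --group-by / --group-sep as described in the notes.
    """
    for field in description.split(group_sep):
        field = field.strip()  # trim whitespace around the field
        if field.startswith(group_by):
            return field[len(group_by):].strip()  # trim whitespace around the value
    return None


header = 'taxid=9606; segment="M"'
print(extract_group_value(header, "taxid=", ";"))  # 9606
```

The sketch leaves any quotes around a value in place; whether vircov strips them is not stated here.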
If segment fields are specified, a reference is selected and output for each segment by highest coverage or reads
- Command line: --segment-field and --segment-field-nan
- Example: --segment-field "segment=" --segment-field-nan "segment=N/A"
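A rough sketch of per-group, per-segment selection, under stated assumptions: the record layout and the function `select_references` are hypothetical, and a reference whose segment field equals the --segment-field-nan value is assumed to be represented with a segment of `None`. How vircov handles such references internally is not specified in these notes.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical record: (reference id, group value, segment or None, coverage, reads)
Record = Tuple[str, str, Optional[str], float, int]
Key = Tuple[str, Optional[str]]


def select_references(records: List[Record], select_by: str = "coverage") -> Dict[Key, str]:
    """Pick one reference id per (group, segment) by highest coverage or reads."""
    if select_by not in ("coverage", "reads"):
        raise ValueError("select_by must be 'coverage' or 'reads'")
    metric = 3 if select_by == "coverage" else 4  # tuple index of the chosen metric
    best: Dict[Key, Record] = {}
    for rec in records:
        key = (rec[1], rec[2])
        # Keep the first record seen on ties (strict comparison)
        if key not in best or rec[metric] > best[key][metric]:
            best[key] = rec
    return {key: rec[0] for key, rec in best.items()}
```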
Grouped filtering and outputs behave differently from non-grouped filtering and outputs:
- Non-group filters (--regions, --reads, --aligned, --coverage, --length) are applied before grouping
- Group filters can be applied (--group-regions, --group-reads, --group-coverage, --group-aligned)
- Grouped output fields differ from the non-grouped fields as follows (also described in --help):
* Reference sequence identifier is the value that is grouped by followed by the number of grouped members in brackets e.g. 9606 (5)
* Distinct alignment regions are summed across group members
* Alignments are summed across group members
* Unique reads aligned are recomputed across group members
* Covered bases and reference lengths are set to 0
* Coverage is selected to be the highest among the group members
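The grouped output rules listed above can be illustrated with a minimal Python sketch. The `RefStats` class and `summarize_group` function are hypothetical names for illustration and do not reflect vircov's internal types.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class RefStats:
    """Hypothetical per-reference summary row."""
    name: str
    regions: int
    alignments: int
    read_ids: Set[str]
    covered_bases: int
    length: int
    coverage: float


def summarize_group(group_value: str, members: List[RefStats]) -> RefStats:
    """Collapse group members into one row following the rules listed above."""
    if not members:
        raise ValueError("a group needs at least one member")
    # Unique reads are recomputed across members, so shared reads count once
    unique_reads: Set[str] = set().union(*(m.read_ids for m in members))
    return RefStats(
        name=f"{group_value} ({len(members)})",      # e.g. "9606 (5)"
        regions=sum(m.regions for m in members),     # summed across members
        alignments=sum(m.alignments for m in members),
        read_ids=unique_reads,
        covered_bases=0,                             # set to 0 for grouped output
        length=0,                                    # set to 0 for grouped output
        coverage=max(m.coverage for m in members),   # highest among members
    )
```

Recomputing unique reads from read identifiers, rather than summing per-reference counts, avoids counting a read twice when it aligns to more than one group member.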
Conditional coverage threshold for the --regions filter: the regions filter is applied only if coverage is below this threshold
- This rescues high coverage sequences as these usually have few regions
- --regions-coverage <0.0-1.0>: a suitable value is somewhere around 0.3 - 0.6
- Short argument for conditional coverage filter (-t) has replaced pretty table output (now -T)
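The conditional filter logic might look like the following sketch. It assumes that --regions sets a minimum number of distinct alignment regions; the function and parameter names are hypothetical.

```python
from typing import Optional


def passes_regions_filter(
    regions: int,
    coverage: float,
    min_regions: int,
    regions_coverage: Optional[float] = None,
) -> bool:
    """Apply the regions filter only when coverage is below `regions_coverage`."""
    # High-coverage references are rescued: the regions filter is skipped for them
    if regions_coverage is not None and coverage >= regions_coverage:
        return True
    # Assumption: --regions is a minimum count of distinct alignment regions
    return regions >= min_regions


# One region at 90% coverage is kept with a threshold of 0.5
print(passes_regions_filter(regions=1, coverage=0.9, min_regions=3, regions_coverage=0.5))  # True
```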
