0.6.0

@esteinig esteinig released this 18 Feb 23:53
· 82 commits to main since this release
3327d01

Major updates making applications more useful 🥳

Two short-hand command-line arguments (-i and -T) break compatibility with previous versions 💀

  • Added CI/CD for release binaries
  • Input alignment format (-i/--alignment) is now inferred from the file extension (bam|sam|cram|paf) or set explicitly with --alignment-format
  • Added --aligned/--group-aligned filters to supplement the filters by unique aligned reads (--reads/--group-reads)
  • Pretty table output short argument is now -T (previously -t)
  • Input alignment short argument is now -i (previously -A)
  • Added -H argument to print machine-readable header to non-pretty table output [#13]
  • Reference alignment grouping by field in header and automated reference selection:
    - Requires annotation in reference sequence header (description) e.g. taxid=9606; segment="M"
    - Leading and trailing whitespace around header fields and values is trimmed internally on parsing
    - --group-by <field>: group alignments by this field
    - --group-sep <delimiter>: the delimiter with which fields in the header are separated
    - --group-select-split <dir>: selects a single reference per group and outputs it to a file in <dir> ({group_id}.fasta)
    - --group-select-by <coverage|reads>: selection by highest coverage or max reads
    - --group-select-order: outputs the selected references with index prefixes, sorted by the --group-select-by metric ({idx}-{group_id}.fasta)
    - Example: --group-by "taxid=" --group-sep ";" --group-select-split ref_seqs/ --group-select-by coverage
  • If segment fields are specified, the selected reference for each segment is output, chosen by highest coverage or reads
    - Command line: --segment-field and --segment-field-nan
    - Example: --segment-field "segment=" --segment-field-nan "segment=N/A"
  • Grouped filtering and outputs behave differently from non-grouped filtering and outputs:
    - Non-group filters (--regions, --reads, --aligned, --coverage, --length) are applied before grouping
    - Group filters can be applied (--group-regions, --group-reads, --group-coverage, --group-aligned)
    - Grouped output fields are distinct from the non-grouped fields (described in --help). They change as follows:
      * Reference sequence identifier is the value that is grouped by, followed by the number of grouped members in brackets, e.g. 9606 (5)
      * Distinct alignment regions are summed across group members
      * Alignments are summed across group members
      * Unique reads aligned are recomputed across group members
      * Covered bases and reference lengths are set to 0
      * Coverage is selected to be the highest among the group members
  • Added a conditional coverage threshold for the --regions filter: the regions filter is applied only if coverage is below this threshold
    - This rescues high coverage sequences as these usually have few regions
    - --regions-coverage <0.0-1.0>: a reasonable value is usually somewhere around 0.3 to 0.6
    - The short argument -t is now used for the conditional coverage filter; pretty table output has moved to -T
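
The filtering and grouping behaviour listed above can be illustrated with a small Python sketch. This is not the tool's implementation: the RefStats type, its field names, and the functions header_field, passes_regions and group_summary are hypothetical names introduced here. It also assumes that --regions is a minimum count of distinct alignment regions, that "recomputed" unique reads means the union of read identifiers across group members, and that references lacking the grouping field are skipped.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class RefStats:
    """Per-reference summary (hypothetical field names, for illustration only)."""
    name: str
    description: str  # header description, e.g. 'taxid=9606; segment=M'
    regions: int      # distinct alignment regions
    alignments: int
    read_ids: set     # identifiers of reads aligned to this reference
    coverage: float   # fraction of the reference covered, 0.0-1.0


def header_field(description: str, field: str, sep: str) -> Optional[str]:
    """Return the value of `field` (e.g. 'taxid=') from a delimited description."""
    for part in description.split(sep):
        part = part.strip()  # trim whitespace around the field
        if part.startswith(field):
            return part[len(field):].strip()  # trim whitespace around the value
    return None


def passes_regions(ref: RefStats, min_regions: int, regions_coverage: float) -> bool:
    """Apply the regions filter only if coverage is below the conditional threshold."""
    if ref.coverage >= regions_coverage:
        return True  # high coverage reference is rescued regardless of regions
    return ref.regions >= min_regions  # assumption: --regions is a minimum count


def group_summary(refs, field: str, sep: str) -> dict:
    """Group references by a header field and summarise each group."""
    groups = defaultdict(list)
    for ref in refs:
        value = header_field(ref.description, field, sep)
        if value is not None:  # assumption: references without the field are skipped
            groups[value].append(ref)

    summary = {}
    for value, members in groups.items():
        best = max(members, key=lambda m: m.coverage)  # selection by coverage
        summary[value] = {
            "name": f"{value} ({len(members)})",
            "regions": sum(m.regions for m in members),
            "alignments": sum(m.alignments for m in members),
            "reads": len(set().union(*(m.read_ids for m in members))),
            "covered_bases": 0,  # set to 0 in grouped output
            "length": 0,         # set to 0 in grouped output
            "coverage": best.coverage,
            "selected": best.name,
        }
    return summary


refs = [
    RefStats("ref1", "taxid=9606; segment=M", 1, 10, {"r1", "r2"}, 0.8),
    RefStats("ref2", "taxid=9606 ; segment=L", 5, 20, {"r2", "r3"}, 0.2),
    RefStats("ref3", "taxid=11320", 1, 3, {"r9"}, 0.1),
]
# Non-group filters run first, then the remaining references are grouped.
kept = [r for r in refs if passes_regions(r, min_regions=2, regions_coverage=0.5)]
print(group_summary(kept, "taxid=", ";"))
```

In this made-up example, ref1 has a single region but is kept because its coverage (0.8) is above the conditional threshold (0.5), while ref3 is removed by the regions filter before grouping takes place.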