Releases: PacificBiosciences/pb-metagenomics-tools

v3.2.0

16 Jul 19:32

Added new feature to HiFi-MAG-Pipeline:

  • Identifies "superbins" output by SemiBin2, which are bins >100 Mb in size. Including very large superbins (~1 GB) causes crashes in DAS_Tool. This new identification step moves all superbins to a superbins folder in the SemiBin2 sample output directory. The individual superbins can be inspected to determine their contents, which sometimes include interesting eukaryotic genomes.
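
The superbin step described above can be sketched roughly as follows. This is an illustration only, not the pipeline's actual code: the function names, the `*.fa` file pattern, and the reading of "100 Mb" as 100,000,000 bases of sequence are all assumptions.

```python
from pathlib import Path
import shutil

# Assumption: ">100 Mb" is read as more than 100,000,000 bases of sequence.
SUPERBIN_THRESHOLD = 100_000_000


def sequence_length(fasta_path):
    """Sum the lengths of all sequence lines in a FASTA file."""
    total = 0
    with open(fasta_path) as handle:
        for line in handle:
            if not line.startswith(">"):
                total += len(line.strip())
    return total


def move_superbins(bin_dir, threshold=SUPERBIN_THRESHOLD, pattern="*.fa"):
    """Move every bin larger than `threshold` into <bin_dir>/superbins.

    Returns the file names that were moved. The directory layout and the
    file pattern are hypothetical.
    """
    bin_dir = Path(bin_dir)
    superbin_dir = bin_dir / "superbins"
    moved = []
    # sorted() materializes the listing before any file is moved.
    for fasta in sorted(bin_dir.glob(pattern)):
        if sequence_length(fasta) > threshold:
            superbin_dir.mkdir(exist_ok=True)
            shutil.move(str(fasta), str(superbin_dir / fasta.name))
            moved.append(fasta.name)
    return moved
```

Bins left in the original directory are the ones that would continue on to DAS_Tool; the moved files remain available for manual inspection.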

v3.1.0

27 Jun 20:56
d9a88ba

Added new features to HiFi-MAG-Pipeline:

  • Uses a local copy of the CheckM2 database rather than relying on the automated download in the workflow. Fixes issues previously raised.
  • Added new mapping feature. Converts the existing BAM to PAF, then filters alignments based on unique reads, percent of read aligned, and percent identity. Runs at the contig and MAG level, and generates a table and figure of percent reads mapped.
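
As a rough illustration of the mapping filter, the sketch below parses standard PAF columns and applies the three criteria. The threshold defaults (90%) are placeholders, since the release notes do not state the values used, and "unique reads" is interpreted here as keeping one best alignment per read. The function names are hypothetical.

```python
from collections import Counter


def filter_paf(lines, min_aligned_pct=90.0, min_identity_pct=90.0):
    """Return {read_name: contig_name} for reads passing both filters.

    Uses the standard PAF columns: query name (1), query length (2),
    query start (3), query end (4), target name (6), residue matches (10),
    and alignment block length (11). Thresholds are assumed values.
    """
    best = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # not a complete PAF record
        read, qlen = fields[0], int(fields[1])
        qstart, qend = int(fields[2]), int(fields[3])
        contig = fields[5]
        matches, block_len = int(fields[9]), int(fields[10])
        if qlen == 0 or block_len == 0:
            continue
        aligned_pct = 100.0 * (qend - qstart) / qlen
        identity_pct = 100.0 * matches / block_len
        if aligned_pct < min_aligned_pct or identity_pct < min_identity_pct:
            continue
        # Keep one alignment per read: the one with the most matches.
        if read not in best or matches > best[read][1]:
            best[read] = (contig, matches)
    return {read: contig for read, (contig, _) in best.items()}


def percent_reads_mapped(read_to_contig, total_reads):
    """Percent of all reads assigned to each contig."""
    counts = Counter(read_to_contig.values())
    return {contig: 100.0 * n / total_reads for contig, n in counts.items()}
```

MAG-level percentages could be obtained the same way by first translating each contig name to its MAG with a contig-to-MAG lookup.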

v3.0.0

13 May 16:05
5e6249f

Added pb-MAG-mirror workflow to compare and consolidate two MAG sets.

v2.1.0

22 Aug 23:03
9db571c

HiFi-MAG-Pipeline:
Bug fix. Given a reference longer than 4 Gb, minimap2 is unable to see all the sequences and thus cannot produce a correct SAM header. This throws an error at the sorting step of the MinimapToBam rule. The fix avoids the issue by splitting this rule into two new rules: MinimapIndex, which indexes the reference prior to alignment, and MinimapToBam, which runs minimap2 with the --split-prefix option.

Taxonomic-Profiling-Diamond-Megan:

  • Bug fix (#65). Created separate environments for rules to prevent bad env recipes.
  • Feature request (#20). Added access to KEGG annotations with MEGAN-UE. A new Snakefile, Snakefile-diamond-megan-ue.smk, was added to the repo to enable this workflow. Docs updated to explain usage and file download requirements (requires binaries from MEGAN-UE and the mapping file for MEGAN-UE). Mapping KEGG annotations with the CLI tools apparently does not require a license.

v2.0.2

19 Apr 18:13

HiFi-MAG-Pipeline:

Depending on the version of DAS_Tool, the name of the helper script used to generate input files changes:

<= v1.1.3 : Fasta_to_Scaffolds2Bin.sh
> v1.1.3 : Fasta_to_Contig2Bin.sh

The Snakemake workflow was designed to run with the script from <= v1.1.3, which caused errors with newer versions.

This release pins DAS_Tool to v1.1.6 and uses the updated Fasta_to_Contig2Bin.sh name in the workflow.
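
The version-dependent script name can be expressed as a small lookup. The release itself pins DAS_Tool rather than detecting the version, so the sketch below is only an illustration of the naming rule stated above; both function names are hypothetical.

```python
def parse_version(version):
    """Turn 'v1.1.6' or '1.1.6' into a comparable tuple of integers."""
    return tuple(int(part) for part in version.lstrip("v").split("."))


def dastool_helper_script(version):
    """Return the helper script name that ships with a DAS_Tool version."""
    if parse_version(version) <= (1, 1, 3):
        return "Fasta_to_Scaffolds2Bin.sh"
    return "Fasta_to_Contig2Bin.sh"
```

Comparing integer tuples rather than strings keeps versions such as 1.1.10 ordered correctly after 1.1.3.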

v2.0.1

11 Apr 18:13

HiFi-MAG-Pipeline:

Made changes to environment recipes.

  • Added fuzzy matching to problematic environments (dastool, semibin)
  • Added full channel set to environments (some only had bioconda)

Bug fix for an edge case in Filter-Complete-Contigs.py: a dataset consisting only of bins with 100% completeness raised an error when plotting the histogram, due to improper bin sizes. The fix adds a conditional statement to handle this case.
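
One way such a conditional could look is sketched below. This is not the code in Filter-Complete-Contigs.py: the function name, the 5% bin width, and the fallback of a single fixed-width bin are all assumptions made for illustration.

```python
def completeness_bin_edges(values, width=5):
    """Choose histogram bin edges for completeness values (percent, <= 100).

    When every value is identical (for example, all bins are 100% complete),
    the data range is zero, so a single fixed-width bin is returned instead
    of edges derived from the range.
    """
    lo, hi = min(values), max(values)
    if lo == hi:
        # Degenerate case: fall back to one bin ending at the shared value.
        return [hi - width, hi]
    # Normal case: fixed-width bins from the rounded-down minimum up to 100.
    start = int(lo // width) * width
    return list(range(start, 100 + width, width))
```

The returned edges could then be passed to a plotting call that accepts explicit bin edges.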

v2.0.0

24 Feb 05:43

HiFi-MAG-Pipeline received major improvements.

The new version of HiFi-MAG-Pipeline is "completeness-aware":

  • Long contigs >500kb are identified and placed in individual fasta files.
  • They are then examined using CheckM2 to determine percent completeness.
  • All long contigs that are >93% complete are then moved directly to the final MAG set.
  • The long contigs that are <93% complete are pooled with other shorter incomplete contigs from the starting set, and this contig set is subjected to binning.
  • Binning algorithms include MetaBat2 and SemiBin2 (using long read settings).
  • The two bin sets are merged using DAS_Tool.
  • The dereplicated bin set consists of the merged bin set from above and all long complete contigs found.
  • This dereplicated bin set is examined using CheckM2, and subsequently filtered based on several qualities (defaults = >70% completeness, <10% contamination, <20 contigs).
  • All bins/MAGs passing filtering undergo taxonomic assignment using GTDB-Tk.
  • The final MAGs are written as a set of fasta files, several figures are produced, and a summary file of metadata is generated.
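
The two decision points in the steps above (routing long contigs by completeness, then filtering the dereplicated bins) can be sketched as follows. The thresholds come from the list above; the function names and data structures are hypothetical, and treating a contig at exactly 93% as incomplete is an assumption, since the notes only specify >93% and <93%.

```python
LONG_CONTIG_MIN_LENGTH = 500_000   # ">500kb"
COMPLETE_THRESHOLD = 93.0          # ">93% complete"


def route_contigs(contig_lengths, completeness):
    """Split contigs into (complete, to_bin).

    contig_lengths: {contig_name: length in bases}
    completeness:   {contig_name: CheckM2 percent completeness}, expected
                    only for long contigs; missing entries count as 0.
    """
    complete, to_bin = [], []
    for name, length in contig_lengths.items():
        is_long = length > LONG_CONTIG_MIN_LENGTH
        if is_long and completeness.get(name, 0.0) > COMPLETE_THRESHOLD:
            complete.append(name)   # goes straight to the final MAG set
        else:
            to_bin.append(name)     # pooled and subjected to binning
    return complete, to_bin


def passes_filter(completeness, contamination, n_contigs,
                  min_completeness=70.0, max_contamination=10.0,
                  max_contigs=20):
    """Apply the default filters: >70% complete, <10% contaminated, <20 contigs."""
    return (completeness > min_completeness
            and contamination < max_contamination
            and n_contigs < max_contigs)
```

For example, a 600 kb contig at 98% completeness is routed directly to the final set, while a 600 kb contig at 50% completeness joins the shorter contigs for binning.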

The new "completeness-aware" strategy is highly effective at preventing improper binning of complete contigs.

  • It is more effective than the previous "circular-aware" binning used in v1.5 and v1.6.
  • Compared to a standard binning pipeline (e.g., MetaBat2), it results in a 14-67% increase in total MAGs (average 36%) and 13-186% increase in single contig MAGs (average 87%).
  • Compared to the "circular-aware" binning in v1.5, it results in a 14-39% increase in total MAGs (average 27%) and 10-28% increase in single contig MAGs (average 20%).

Beyond the "completeness-aware" strategy, there are several other important updates:

  • It now uses CheckM2 instead of CheckM, and no longer requires the manual download of the CheckM database.
  • For binning, Concoct and MaxBin2 have been retired, and SemiBin2 is used in conjunction with MetaBat2. SemiBin2 is highly effective at binning contigs from long-read assemblies and obtains better results.
  • This version also introduces checkpoints to create forked workflows depending on the properties of the sample, thereby preventing crashes when no bins pass filtering. This applies to the long contig completeness evaluation stage and the binning of incomplete contigs.
  • New figures are produced as part of the long contig evaluations and final summary steps.

v1.6.1

30 Jan 19:31

In #36 it was reported that HiFi-MAG-Pipeline scripts were missing from the required location. This update fixes the issue.

v1.6.0

12 Jan 18:51

Major changes:

  • Added Taxonomic-Profiling-Sourmash workflow (all credit to @bluegenes!).
  • Consolidated existing profiling workflows.
    • Taxonomic-Profiling-Diamond-Megan is the combined workflow of Taxonomic-Functional-Profiling-Protein + MEGAN-RMA-Summary.
    • Taxonomic-Profiling-Minimap-Megan is the combined workflow of Taxonomic-Profiling-Nucleotide + MEGAN-RMA-Summary.
  • Added new binning methods to the custom strategy in HiFi-MAG-Pipeline, including CONCOCT and MaxBin2.
  • Updated all relevant documentation and images.

Minor changes:

  • Taxonomic-Profiling-Minimap-Megan: Added filtered & unfiltered versions of RMA file, using the --minSupportPercent 0.01 flag in sam2rma. Allows filtered & unfiltered taxonomic report outputs.
  • HiFi-MAG-Pipeline: Set minimum contig size for binning to 50000 across all binning methods (MetaBAT2, CONCOCT, MaxBin2).
  • HiFi-MAG-Pipeline: Updated GTDB-Tk requirement to v2.1.1.

v1.5.0

13 Apr 18:04
dbc183d

HiFi-MAG-Pipeline:

  • Updated to require GTDB-Tk v2.0.0+, and ensured compatibility with GTDB 07-RS207 (release 207).
  • Set default max contigs per MAG to 20.
  • Updated docs, including image showing "circular-aware binning" strategy.

Taxonomic-Functional-Profiling-Protein:

  • New default behavior is to output two RMA files per sample. The {sample}_filtered.protein.{mode}.rma file results from the optimal MEGAN-LR filter for precision/recall balance, whereas the {sample}_unfiltered.protein.{mode}.rma file has no filtering (i.e., any read assigned to any taxon is reported). The filtering parameter for {sample}_filtered.protein.{mode}.rma is still controlled with the sam2rma: minSupportPercent argument in the config.yaml file; the default is 0.01.
  • Ensured compatibility with newest MEGAN mapping file: megan-map-Feb2022.db.zip.
  • Updated docs.