
Alternate Approaches and Comparisons

Rauf Salamzade edited this page Sep 5, 2024 · 30 revisions

Comparison to dRep and galah for collapsing redundancy of Enterococcus genomes

For the latest benchmarking with skDER v1.2.6, see the upcoming preprint.

Alternate Approaches for Genomic Dereplication

If dereplication based on ANI thresholds is not needed, these alternate approaches might also be of interest:

1. Phylogenetic tree construction and pruning to retain diversity using Treemmer or a similar tool

One approach to selecting representative genomes is to construct a phylogenetic/phylogenomic tree for all the genomes and then prune samples while maximizing the diversity that is retained. Treemmer is a really nice program for doing this.
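To illustrate the general idea of diversity-preserving pruning, here is a minimal greedy sketch. It is not Treemmer's algorithm (Treemmer operates on the tree itself); it assumes a hypothetical precomputed pairwise distance table (for example, patristic distances read off a tree) and repeatedly thins the closest remaining pair, dropping the member that is on average closer to everything else. The function name `prune_to_size` and its inputs are illustrative only.

```python
from itertools import combinations


def prune_to_size(names, dist, target):
    """Greedily prune genomes until `target` remain, always thinning the most redundant pair.

    names:  list of genome identifiers
    dist:   dict mapping frozenset({a, b}) -> pairwise distance (hypothetical input,
            e.g. patristic distances taken from a tree)
    target: number of genomes to keep (must be >= 1)
    """
    if target < 1:
        raise ValueError("target must be at least 1")
    kept = list(names)

    def mean_dist(x):
        # Average distance from x to every other genome still kept
        others = [y for y in kept if y != x]
        return sum(dist[frozenset((x, y))] for y in others) / len(others)

    while len(kept) > target:
        # The closest remaining pair is the most redundant one (ties broken by name)
        a, b = min(combinations(kept, 2), key=lambda p: (dist[frozenset(p)], p))
        # Drop whichever member of the pair is, on average, closer to the rest,
        # so the more distinctive genome survives
        drop = a if (mean_dist(a), a) <= (mean_dist(b), b) else b
        kept.remove(drop)
    return kept


# Toy example: A and B are near-duplicates, C and D form a looser pair
pairs = {("A", "B"): 0.01, ("A", "C"): 0.50, ("A", "D"): 0.60,
         ("B", "C"): 0.55, ("B", "D"): 0.60, ("C", "D"): 0.30}
dist = {frozenset(p): d for p, d in pairs.items()}
print(prune_to_size(["A", "B", "C", "D"], dist, 2))  # ['B', 'D']
```

Each iteration rescans all remaining pairs, so this toy version is quadratic per step and only suitable for small examples; dedicated tools like Treemmer are built for realistic dataset sizes.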

2. Intra-species identification of strains using PopPUNK

PopPUNK might also be of interest to users who want to cluster genomes within a species into strain-level clusters, after which a representative can be selected from each cluster based on N50 or other assembly metrics. PopPUNK's infrastructure is well designed for scalability.
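The representative-selection step described above can be sketched as follows, assuming cluster assignments have already been parsed into a dictionary (this sketch does not read any PopPUNK output format) and that contig lengths are available for each genome. The names `n50` and `pick_representatives` are hypothetical helpers, and N50 is only one possible selection metric.

```python
def n50(contig_lengths):
    """N50: the largest length L such that contigs of length >= L cover at least half the assembly."""
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    return 0  # empty assembly


def pick_representatives(clusters, contigs):
    """Choose the highest-N50 genome in each cluster.

    clusters: dict cluster_id -> list of genome ids (hypothetical, e.g. parsed from
              PopPUNK cluster assignments)
    contigs:  dict genome id -> list of contig lengths
    """
    # Sorting first means ties on N50 go to the alphabetically first genome,
    # because max() returns the first maximal element it encounters
    return {cid: max(sorted(members), key=lambda g: n50(contigs[g]))
            for cid, members in clusters.items()}


clusters = {"1": ["g1", "g2"], "2": ["g3"]}
contigs = {"g1": [100, 50, 50], "g2": [300, 20], "g3": [10]}
print(pick_representatives(clusters, contigs))  # {'1': 'g2', '2': 'g3'}
```

Swapping the key function (for example, to total assembly length or a completeness score) changes the selection criterion without touching the rest of the logic.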

3. Fast genome clustering using k-mer sketching with RabbitTClust

RabbitTClust appears to be fast and highly useful for clustering very large datasets (hundreds of thousands to millions of genomes). Because it is based on k-mer sketches, it does not compute ANI exactly during the clustering process.
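To show why sketch-based clustering estimates rather than computes ANI, here is a toy bottom-k MinHash sketch combined with the Mash-style distance D = -(1/k) ln(2J / (1 + J)), where J is the estimated k-mer Jaccard similarity. This is not RabbitTClust's implementation: the BLAKE2b hash, k = 21, sketch size 1000, and forward-strand-only k-mers are simplifying assumptions chosen for illustration (real tools typically use canonical k-mers and faster hash functions).

```python
import hashlib
import math
import random


def kmer_hashes(seq, k):
    """64-bit hashes of every k-mer (forward strand only, for simplicity)."""
    return {
        int.from_bytes(hashlib.blake2b(seq[i:i + k].encode(), digest_size=8).digest(), "big")
        for i in range(len(seq) - k + 1)
    }


def sketch(seq, k=21, size=1000):
    """Bottom-k MinHash sketch: keep only the `size` smallest k-mer hashes."""
    return set(sorted(kmer_hashes(seq, k))[:size])


def jaccard_estimate(s1, s2, size=1000):
    """Estimate k-mer Jaccard similarity from two bottom-k sketches."""
    merged = set(sorted(s1 | s2)[:size])  # bottom-k sketch of the union
    if not merged:
        return 0.0
    return len(merged & s1 & s2) / len(merged)


def mash_distance(j, k=21):
    """Mash-style distance; 1 - distance roughly approximates ANI for closely related genomes."""
    if j == 0:
        return 1.0
    return max(0.0, -math.log(2 * j / (1 + j)) / k)


# Toy genomes: a random sequence and a copy carrying 50 substitutions (~1%)
rng = random.Random(0)
genome_a = "".join(rng.choice("ACGT") for _ in range(5000))
mutated = list(genome_a)
for pos in rng.sample(range(len(mutated)), 50):
    mutated[pos] = rng.choice([b for b in "ACGT" if b != mutated[pos]])
genome_b = "".join(mutated)

j = jaccard_estimate(sketch(genome_a), sketch(genome_b))
print(f"Jaccard ~ {j:.3f}, Mash-style distance ~ {mash_distance(j):.4f}")
```

Only a fixed number of hashes is compared per genome pair, which is what makes sketching scale to very large collections; the trade-off is that the resulting distance is a statistical estimate of divergence, not an alignment-based ANI.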