Alternate Approaches and Comparisons
For the latest benchmarking using skDER v1.2.6, see the latest preprint (coming soon).
If dereplication based on ANI thresholds is not needed, these alternate approaches might also be of interest to you:
1. Phylogenetic construction and pruning while retaining diversity using Treemmer or a similar tool.
One approach to selecting representative genomes might be to construct a phylogenetic/phylogenomic tree for all the genomes and then prune samples while maximizing retention of diversity. Treemmer is a really nice program for performing this.
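To illustrate the general idea of pruning while retaining diversity, here is a minimal sketch in Python. It is not Treemmer's implementation: it assumes you have already extracted pairwise patristic (tree-path) distances between leaves, and it simply drops one member of the closest remaining pair until the requested number of genomes is left. The function name `greedy_prune`, the toy genome labels, and the distances are all hypothetical.

```python
from itertools import combinations


def greedy_prune(dist, n_keep):
    """Drop one leaf of the closest remaining pair until n_keep leaves remain.

    dist maps frozenset({a, b}) to the patristic distance between leaves a and b.
    """
    if n_keep < 1:
        raise ValueError("n_keep must be at least 1")
    kept = {leaf for pair in dist for leaf in pair}
    while len(kept) > n_keep:
        # Closest pair among the leaves still kept; ties broken by label.
        a, b = min(
            combinations(sorted(kept), 2),
            key=lambda pair: (dist[frozenset(pair)], pair),
        )
        # Which member to drop is arbitrary here; we drop the second label.
        kept.remove(b)
    return sorted(kept)


# Toy distances for four hypothetical genomes: A/B and C/D are near-duplicates.
toy = {
    frozenset(("A", "B")): 0.01,
    frozenset(("A", "C")): 0.20,
    frozenset(("A", "D")): 0.22,
    frozenset(("B", "C")): 0.21,
    frozenset(("B", "D")): 0.23,
    frozenset(("C", "D")): 0.05,
}
representatives = greedy_prune(toy, 2)
print(representatives)
```

On this toy input, one genome from each near-duplicate pair is removed, so the two retained genomes span the deepest split. Real tools work on the tree itself and scale far better than this all-pairs loop.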
PopPUNK might also be of interest to users who want to cluster genomes within a species into strain clusters, after which they can select representatives based on N50 or other metrics. PopPUNK's infrastructure is well designed for scalability.
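Once cluster assignments are available, picking one representative per cluster by N50 can be sketched as follows. This is a minimal illustration rather than part of any tool: the inputs `cluster_of` (genome to cluster ID) and `contigs_of` (genome to contig lengths) are hypothetical in-memory stand-ins for whatever files your clustering and assembly steps produce.

```python
def n50(contig_lengths):
    """Length of the contig at which the running total, taken from the
    longest contig downwards, first reaches half of the assembly size."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0  # empty assembly


def pick_representatives(cluster_of, contigs_of):
    """Return {cluster: genome}, choosing the genome with the highest N50."""
    best = {}
    for genome in sorted(cluster_of):  # sorted for deterministic tie-breaking
        cluster = cluster_of[genome]
        score = n50(contigs_of[genome])
        if cluster not in best or score > best[cluster][0]:
            best[cluster] = (score, genome)
    return {cluster: genome for cluster, (_, genome) in best.items()}


# Hypothetical toy data: three genomes in two strain clusters.
cluster_of = {"g1": 1, "g2": 1, "g3": 2}
contigs_of = {
    "g1": [500, 300, 200],  # N50 = 500
    "g2": [400, 400, 200],  # N50 = 400
    "g3": [1000],           # N50 = 1000
}
reps = pick_representatives(cluster_of, contigs_of)
print(reps)
```

Other metrics (completeness, contamination, connectivity) could be swapped in by replacing the `score` computation.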
RabbitTClust seems fast and extremely useful for clustering very large datasets (hundreds of thousands to millions of genomes). It is based on k-mer sketches, so it estimates genome similarity from the sketches rather than assessing ANI exactly during the clustering process.
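To show why sketch-based distances are approximate, here is a minimal, generic MinHash sketch in Python. It is not RabbitTClust's implementation. It assumes a bottom-s sketch built with SHA-1, ignores reverse complements, and converts the estimated Jaccard index to a Mash-style distance; the function names, `k`, and the sketch size are illustrative choices only.

```python
import hashlib
import math


def kmers(seq, k):
    """All distinct k-mers of a sequence (reverse complements ignored)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def sketch(seq, k=5, size=16):
    """Bottom-`size` MinHash sketch: the smallest k-mer hash values."""
    hashes = {
        int.from_bytes(hashlib.sha1(km.encode()).digest()[:8], "big")
        for km in kmers(seq, k)
    }
    return sorted(hashes)[:size]


def jaccard_estimate(sk_a, sk_b, size=16):
    """Estimate Jaccard from the `size` smallest hashes of the merged sketches."""
    merged = sorted(set(sk_a) | set(sk_b))[:size]
    if not merged:
        return 0.0
    shared = set(sk_a) & set(sk_b)
    return sum(1 for h in merged if h in shared) / len(merged)


def mash_distance(j, k=5):
    """Mash-style distance from a Jaccard estimate j, capped at 1 when j is 0."""
    if j == 0:
        return 1.0
    return max(0.0, -math.log(2 * j / (1 + j)) / k)


seq_a = "ACGTTGCATGCCGATAGCTAGGCTAACGTTAGC"
seq_b = "ACGTTGCATGCCGATAGCTAGGCTTACGTTAGC"  # one substitution relative to seq_a
j_same = jaccard_estimate(sketch(seq_a), sketch(seq_a))
j_near = jaccard_estimate(sketch(seq_a), sketch(seq_b))
j_none = jaccard_estimate(sketch("AAAAAAAAAA"), sketch("CCCCCCCCCC"))
print(mash_distance(j_same), mash_distance(j_near), mash_distance(j_none))
```

Because only a fixed number of hashes is compared, the resulting distance is an estimate whose precision depends on the sketch size, which is the trade-off that makes clustering at this scale feasible.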