Wrong exemplars returned when using cluster_selection_epsilon (exemplars from eps=0 are returned) #593

Open
lucetka opened this issue May 19, 2023 · 0 comments


When a model is fitted with cluster_selection_epsilon set within the effective range, the exemplars returned appear to be wrong: they are the exemplars of the clusters produced before the epsilon threshold is applied (i.e. those of the eps=0 clustering).

I think this is related to another issue I asked about, #571: the condensed tree returned is always the eps=0 tree, and it shows neither the new "superclusters" selected as a consequence of merging clusters nor the points that fall out at the specified eps level. Others have reported related problems in #586. It would be great if this could be fixed.

Meanwhile, as a quick and dirty workaround that is sufficient for my specific use, I map the labels from the clustering with epsilon onto the labels from the clustering without it. For each newly emerged supercluster, I simply use the exemplars of all the eps=0 clusters that the supercluster has engulfed. Instead of 3 exemplars I may end up with, for example, 6, which in my case (clustering documents) is not necessarily a bad thing, since it also gives an idea of the heterogeneity of the final cluster.

However, I know this is not really correct, because the resulting supercluster consists of more than just the engulfed clusters selected in the eps=0 clustering. The supercluster also absorbs all the points that were discarded as noise at every split above the applied eps level. These points (noise in the eps=0 clustering, but cluster members once eps is applied) are not represented by the exemplars.
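For anyone who wants to try the same workaround, here is a minimal sketch of the label mapping described above. It is not hdbscan code: `pool_exemplars` and its argument names are hypothetical, and it assumes that `exemplars_zero[k]` holds the exemplars of the eps=0 cluster labelled `k` and that noise is labelled -1. It also counts the former noise points absorbed by each supercluster, i.e. the points the pooled exemplars do not represent.

```python
from collections import defaultdict

NOISE = -1  # assumption: noise points carry the label -1


def pool_exemplars(labels_eps, labels_zero, exemplars_zero):
    """Pool eps=0 exemplars per supercluster (hypothetical helper).

    labels_eps     : per-point labels from the clustering with epsilon
    labels_zero    : per-point labels from the clustering with eps=0
    exemplars_zero : exemplars_zero[k] = exemplar points of eps=0 cluster k
    """
    engulfed = defaultdict(set)       # supercluster -> engulfed eps=0 labels
    absorbed_noise = defaultdict(int)  # supercluster -> former noise points
    for eps_label, zero_label in zip(labels_eps, labels_zero):
        eps_label, zero_label = int(eps_label), int(zero_label)
        if eps_label == NOISE:
            continue
        if zero_label == NOISE:
            # Point was noise at eps=0 but belongs to a cluster now;
            # no pooled exemplar stands for it.
            absorbed_noise[eps_label] += 1
        else:
            engulfed[eps_label].add(zero_label)

    pooled = {
        eps_label: [tuple(p) for z in sorted(zero_labels) for p in exemplars_zero[z]]
        for eps_label, zero_labels in engulfed.items()
    }
    return pooled, dict(absorbed_noise)


# Toy data: eps=0 clusters 0 and 1 plus one noise point merge into supercluster 0.
labels_zero = [0, 0, 1, 1, -1, 2, 2]
labels_eps = [0, 0, 0, 0, 0, 1, 1]
exemplars_zero = [[(0.0, 0.0)], [(1.0, 1.0)], [(5.0, 5.0)]]

pooled, absorbed = pool_exemplars(labels_eps, labels_zero, exemplars_zero)
print(pooled)    # {0: [(0.0, 0.0), (1.0, 1.0)], 1: [(5.0, 5.0)]}
print(absorbed)  # {0: 1}
```

In practice the two label arrays would come from two fitted models (one with epsilon, one without), and the exemplars from the one without. Whether the order of the exemplars matches the order of the labels is something I have assumed here, not verified.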

Edit: I should have mentioned that I am using hdbscan 0.8.28 with Python 3.10.2 on 64-bit Windows 10.
