Adding min-max linkage to Agglomerative Clustering #19698
Comments
Hello, I would like to contribute to this issue. Can you please assign it to me? |
@deepthi1107 : To be honest, I don't know if I am allowed to assign you to this, as I am waiting to see if someone confirms this new feature so we can work on it. |
I am super excited to work with you. Yes, let's wait for the assignment.
|
One of the reasons for the scikit-learn-extra library is to create increased visibility for algorithms that are not popular enough to maintain in the core library. (Of course, the visibility of scikit-learn-extra is also reduced relative to this library, and maybe we should consider linking out from the core documentation more often.) Have you considered making this contribution there?
|
@jnothman: No. I used "KMedoids" from the extra library a while ago, and I thought that library was just a small project created to add some new algorithms. I will mention it there. Thanks for your input. |
@deepthi1107 : Hi, I just mentioned it on the scikit-learn-extra page. If they approve, we can start to work on this. |
Cool, thanks! |
@cmarmo @jnothman UPDATE: I opened an issue about this on scikit-learn-extra (scikit-learn-contrib/scikit-learn-extra#93) about three months ago; however, there has been no reaction. Should I go ahead and implement it and submit a PR? Or should I close the issue? |
PR |
Hi,
I checked the scikit-learn inclusion criteria, and the algorithm I am about to suggest satisfies all of them except one: the number of citations, which is 116 according to Google Scholar (not 200+).
However, I think many people (who are on the application side of ML) use scikit-learn for their projects and are not aware of many algorithms available elsewhere. Including a new algorithm in scikit-learn would let them know such a method exists. Please let me know if it is a good idea to add this method to Agglomerative Clustering. If so, I can work on it (I have already implemented it myself) and add it to the package.
Challenge: The linkages in Agglomerative Clustering that can work with a precomputed distance matrix are complete, single, and average. However, the resulting clusters don't have a clear centroid. A method that works with a distance matrix and produces centroid-based clusters would therefore add a lot of flexibility: it could be used with any distance measure, and since it relies on the distance matrix alone, it can run fast on large datasets. It would also provide centroids that can serve as representatives of the clusters.
Solution: The min-max (minimax) linkage proposed by Jacob Bien & Robert Tibshirani (2011) is a good solution. It merges the two clusters for which the ball required to cover their combined points is the smallest among all candidate merges. The ball for a cluster is identified as follows: for each point, find its farthest neighbor within the cluster and record that distance; then choose the point M whose farthest-neighbor distance is the shortest. That point M is the prototype (centroid) and that shortest distance is the radius of the ball. I recently used it in a paper where I had to use DTW distance but also needed a fast algorithm that provides a reasonable centroid, rather than computing DBA at each step of the process.
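The procedure above can be sketched as follows. This is a deliberately naive O(n³)-style illustration, not the efficient algorithm from the paper; the function names (`minimax_radius`, `minimax_agglomerative`) are made up for this sketch:

```python
import numpy as np

def minimax_radius(D, idx):
    """For the cluster given by indices idx: each candidate center's cost is
    its farthest distance to the other members; the prototype minimizes that
    cost, and the minimax radius is that minimum."""
    sub = D[np.ix_(idx, idx)]
    far = sub.max(axis=1)      # farthest-neighbor distance for each point
    c = int(np.argmin(far))    # position (within idx) of the prototype M
    return far[c], idx[c]

def minimax_agglomerative(D, n_clusters):
    """Naive minimax-linkage agglomerative clustering on a symmetric
    (n, n) distance matrix D. Returns labels and per-cluster prototypes."""
    clusters = [[i] for i in range(len(D))]
    while len(clusters) > n_clusters:
        best = None
        # Merge the pair whose combined "covering ball" is smallest.
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                r, _ = minimax_radius(D, np.array(clusters[a] + clusters[b]))
                if best is None or r < best[0]:
                    best = (r, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    labels = np.empty(len(D), dtype=int)
    protos = []
    for k, idx in enumerate(clusters):
        labels[idx] = k
        protos.append(minimax_radius(D, np.array(idx))[1])
    return labels, protos

# Same toy matrix as before: two tight pairs of points.
D = np.array([
    [0.0, 0.1, 5.0, 5.2],
    [0.1, 0.0, 5.1, 5.3],
    [5.0, 5.1, 0.0, 0.2],
    [5.2, 5.3, 0.2, 0.0],
])
labels, protos = minimax_agglomerative(D, n_clusters=2)
print(labels, protos)
```

Unlike the existing distance-matrix linkages, this returns a prototype for each cluster for free, which is exactly the property being requested.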
Alternative solution: N/A
Additional Info:
Paper:
Bien, Jacob, and Robert Tibshirani. "Hierarchical clustering with prototypes via minimax linkage." Journal of the American Statistical Association 106.495 (2011): 1075-1084.