Duplicate grouping in the clustering? #4

Open
soichih opened this issue Nov 15, 2021 · 3 comments

Comments

@soichih

soichih commented Nov 15, 2021

Hello! Thank you for implementing this algorithm.

I am trying to see if we can use this module for our project (brainlife.io) and I've been test driving it.

I've noticed that the algorithm sometimes returns duplicate clusters. Is that expected?

For example, given this adjacency matrix:

[
  [ 0, 9, 0, 9, 11 ],
  [ 9, 0, 9, 8, 6 ],
  [ 0, 9, 0, 9, 11 ],
  [ 9, 8, 9, 0, 2 ],
  [ 11, 6, 11, 2, 0 ]
]

Here is the output from this algorithm.

[ [ 0, 2 ], [ 0, 2 ], [ 1, 3, 4 ] ]

I think the output should be

[ [ 0, 2 ], [ 1, 3, 4 ] ]
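
Until the cause is found, the repeats could be filtered out after the fact. This is only a minimal sketch, assuming the output is an array of arrays of node indices as shown above; `dedupeClusters` is a hypothetical helper of my own, not part of this module, and it hides the symptom rather than fixing the clustering itself.

```javascript
// Hypothetical helper, not part of this module: drops clusters that
// contain exactly the same set of node indices as an earlier cluster.
function dedupeClusters(clusters) {
  const seen = new Set();
  const result = [];
  for (const cluster of clusters) {
    // Sort a copy numerically so member order does not affect the key.
    const key = [...cluster].sort((a, b) => a - b).join(',');
    if (!seen.has(key)) {
      seen.add(key);
      result.push(cluster);
    }
  }
  return result;
}

console.log(dedupeClusters([[0, 2], [0, 2], [1, 3, 4]]));
// prints [ [ 0, 2 ], [ 1, 3, 4 ] ]
```

The first occurrence of each cluster is kept, so the order of the remaining clusters is unchanged.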

Here is another, slightly more complex example.

[
  [
    0, 1, 1, 1, 0,
    1, 0, 0, 0, 1,
    0, 1
  ],
  [
    1, 0, 0, 0, 1,
    2, 1, 1, 1, 2,
    1, 2
  ],
  [
    1, 0, 0, 0, 1,
    2, 1, 1, 1, 2,
    1, 2
  ],
  [
    1, 0, 0, 0, 1,
    2, 1, 1, 1, 2,
    1, 2
  ],
  [
    0, 1, 1, 1, 0,
    1, 0, 0, 0, 1,
    0, 1
  ],
  [
    1, 2, 2, 2, 1,
    0, 1, 1, 1, 0,
    1, 0
  ],
  [
    0, 1, 1, 1, 0,
    1, 0, 0, 0, 1,
    0, 1
  ],
  [
    0, 1, 1, 1, 0,
    1, 0, 0, 0, 1,
    0, 1
  ],
  [
    0, 1, 1, 1, 0,
    1, 0, 0, 0, 1,
    0, 1
  ],
  [
    1, 2, 2, 2, 1,
    0, 1, 1, 1, 0,
    1, 0
  ],
  [
    0, 1, 1, 1, 0,
    1, 0, 0, 0, 1,
    0, 1
  ],
  [
    1, 2, 2, 2, 1,
    0, 1, 1, 1, 0,
    1, 0
  ]
]

Cluster output:

[
  [
    0, 1, 2,  3, 4,
    6, 7, 8, 10
  ],
  [
    0, 1, 2,  3, 4,
    6, 7, 8, 10
  ],
  [
    0, 1, 2,  3, 4,
    6, 7, 8, 10
  ],
  [
    0, 4,  5,  6, 7,
    8, 9, 10, 11
  ],
  [
    0, 4,  5,  6, 7,
    8, 9, 10, 11
  ],
  [
    0, 4,  5,  6, 7,
    8, 9, 10, 11
  ]
]

As you can see, some nodes are listed in more than one cluster. Is each node supposed to appear only once across all clusters?
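
To make the overlap concrete, here is a rough sketch of the check I have in mind. `findSharedNodes` is a hypothetical helper (not part of this module) that treats exact duplicate clusters as one and then lists every node that still appears in more than one cluster.

```javascript
// Hypothetical helper, not part of this module: returns the node indices
// that appear in more than one distinct cluster. Exact duplicate clusters
// are collapsed first so they are only counted once.
function findSharedNodes(clusters) {
  const distinct = new Map();
  for (const cluster of clusters) {
    const sorted = [...cluster].sort((a, b) => a - b);
    distinct.set(sorted.join(','), sorted);
  }
  // Count how many distinct clusters each node belongs to.
  const counts = new Map();
  for (const cluster of distinct.values()) {
    for (const node of new Set(cluster)) {
      counts.set(node, (counts.get(node) || 0) + 1);
    }
  }
  return [...counts.entries()]
    .filter(([, n]) => n > 1)
    .map(([node]) => node)
    .sort((a, b) => a - b);
}

const output = [
  [0, 1, 2, 3, 4, 6, 7, 8, 10],
  [0, 1, 2, 3, 4, 6, 7, 8, 10],
  [0, 1, 2, 3, 4, 6, 7, 8, 10],
  [0, 4, 5, 6, 7, 8, 9, 10, 11],
  [0, 4, 5, 6, 7, 8, 9, 10, 11],
  [0, 4, 5, 6, 7, 8, 9, 10, 11],
];
console.log(findSharedNodes(output));
// prints [ 0, 4, 6, 7, 8, 10 ]
```

So even after the repeated clusters are collapsed, the two remaining clusters in this example share six nodes. For the first example the same check returns an empty array.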

Thank you!!!

@oflisback
Owner

Hi @soichih,

Thanks for checking it out, but I would advise against using this implementation. It was a one-off thing I needed for a side project, and I'm not surprised you found a bug. I would look into alternative implementations, sorry.

@soichih
Author

soichih commented Nov 19, 2021

Thanks. Which implementations would you suggest? Do you mean I should not use MCL at all? Should I try something like this? https://www.npmjs.com/package/@cdxoo/dbscan

@oflisback
Owner

I'm afraid I have no idea. I haven't been active in this space for a long time, and it probably depends a lot on your use case. I do remember that the Markov clustering approach didn't scale very well with the input size, for instance. But best of luck to you! 🚀

@oflisback oflisback reopened this Nov 19, 2021