Duplicate grouping in the clustering? #4

soichih · 2021-11-15T17:54:21Z

Hello! Thank you for implementing this algorithm.

I am trying to see if we can use this module for our project (brainlife.io) and I've been test driving it.

I've noticed that the algorithm sometimes returns duplicate clusters. Is that expected?

For example, given this adjacency matrix.

[
  [ 0, 9, 0, 9, 11 ],
  [ 9, 0, 9, 8, 6 ],
  [ 0, 9, 0, 9, 11 ],
  [ 9, 8, 9, 0, 2 ],
  [ 11, 6, 11, 2, 0 ]
]

Here is the output from this algorithm.

[ [ 0, 2 ], [ 0, 2 ], [ 1, 3, 4 ] ]

I think the output should be

[ [ 0, 2 ], [ 1, 3, 4 ] ]
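As a stopgap, exact duplicates like the repeated `[ 0, 2 ]` above could be filtered out after clustering. The following is a minimal sketch, assuming the clusters arrive as plain arrays of node indices; `dedupeClusters` is a hypothetical helper and not part of this module's API, and it does not address whatever causes the duplicates in the first place.

```javascript
// Hypothetical post-processing helper, not part of the module discussed here.
// Treats each cluster as a set of node indices and keeps only the first
// occurrence of each distinct set.
function dedupeClusters(clusters) {
  const seen = new Set();
  const unique = [];
  for (const cluster of clusters) {
    // Sort a copy numerically so [2, 0] and [0, 2] map to the same key.
    const key = [...cluster].sort((a, b) => a - b).join(",");
    if (!seen.has(key)) {
      seen.add(key);
      unique.push(cluster);
    }
  }
  return unique;
}

// The output reported for the 5x5 matrix above.
const raw = [[0, 2], [0, 2], [1, 3, 4]];
console.log(dedupeClusters(raw)); // [ [ 0, 2 ], [ 1, 3, 4 ] ]
```

Sorting a copy before building the key means the original clusters are left untouched and member order does not affect the comparison.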

Here is another, slightly more complex example.

[
  [
    0, 1, 1, 1, 0,
    1, 0, 0, 0, 1,
    0, 1
  ],
  [
    1, 0, 0, 0, 1,
    2, 1, 1, 1, 2,
    1, 2
  ],
  [
    1, 0, 0, 0, 1,
    2, 1, 1, 1, 2,
    1, 2
  ],
  [
    1, 0, 0, 0, 1,
    2, 1, 1, 1, 2,
    1, 2
  ],
  [
    0, 1, 1, 1, 0,
    1, 0, 0, 0, 1,
    0, 1
  ],
  [
    1, 2, 2, 2, 1,
    0, 1, 1, 1, 0,
    1, 0
  ],
  [
    0, 1, 1, 1, 0,
    1, 0, 0, 0, 1,
    0, 1
  ],
  [
    0, 1, 1, 1, 0,
    1, 0, 0, 0, 1,
    0, 1
  ],
  [
    0, 1, 1, 1, 0,
    1, 0, 0, 0, 1,
    0, 1
  ],
  [
    1, 2, 2, 2, 1,
    0, 1, 1, 1, 0,
    1, 0
  ],
  [
    0, 1, 1, 1, 0,
    1, 0, 0, 0, 1,
    0, 1
  ],
  [
    1, 2, 2, 2, 1,
    0, 1, 1, 1, 0,
    1, 0
  ]
]

Cluster output:

[
  [ 0, 1, 2, 3, 4, 6, 7, 8, 10 ],
  [ 0, 1, 2, 3, 4, 6, 7, 8, 10 ],
  [ 0, 1, 2, 3, 4, 6, 7, 8, 10 ],
  [ 0, 4, 5, 6, 7, 8, 9, 10, 11 ],
  [ 0, 4, 5, 6, 7, 8, 9, 10, 11 ],
  [ 0, 4, 5, 6, 7, 8, 9, 10, 11 ]
]

As you can see, some nodes are listed in more than one cluster. Is each node supposed to appear in exactly one cluster?
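To make the overlap concrete, the nodes that fall into more than one distinct cluster can be listed programmatically. This is a rough sketch under the same assumption that clusters are plain arrays of node indices; `findOverlappingNodes` is a hypothetical name, not something the module provides.

```javascript
// Hypothetical diagnostic helper, not part of the module discussed here.
// Returns the node indices that appear in more than one distinct cluster.
function findOverlappingNodes(clusters) {
  // Collapse exact duplicates first so a repeated cluster is counted once.
  const distinct = new Map();
  for (const cluster of clusters) {
    const sorted = [...cluster].sort((a, b) => a - b);
    distinct.set(sorted.join(","), sorted);
  }
  // Count how many distinct clusters each node belongs to.
  const counts = new Map();
  for (const cluster of distinct.values()) {
    for (const node of new Set(cluster)) {
      counts.set(node, (counts.get(node) || 0) + 1);
    }
  }
  return [...counts.entries()]
    .filter(([, count]) => count > 1)
    .map(([node]) => node)
    .sort((a, b) => a - b);
}

// The two distinct clusters from the second cluster output above,
// each repeated three times as reported.
const first = [0, 1, 2, 3, 4, 6, 7, 8, 10];
const second = [0, 4, 5, 6, 7, 8, 9, 10, 11];
const reported = [first, first, first, second, second, second];
console.log(findOverlappingNodes(reported)); // [ 0, 4, 6, 7, 8, 10 ]
```

For the 12-node example, nodes 0, 4, 6, 7, 8, and 10 belong to both distinct clusters, so removing exact duplicates alone would not give a partition of the nodes.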

Thank you!!!


oflisback · 2021-11-19T14:56:08Z

Hi @soichih,

Thanks for checking it out, but I would advise against using this implementation. It was a one-off thing I needed for a side project, and I'm not surprised that you found a bug. I would look into alternative implementations, sorry.

soichih · 2021-11-19T14:58:03Z

Thanks. Which implementations would you suggest? Do you mean I shouldn't use MCL at all? Should I try something like this? https://www.npmjs.com/package/@cdxoo/dbscan

oflisback · 2021-11-19T15:04:17Z

I'm afraid I have no idea. I haven't been active in this space for a long time, and it probably depends a lot on your use case. I remember that the Markov clustering approach didn't scale very well with the input size, for instance. But best of luck to you! 🚀

oflisback closed this as completed Nov 19, 2021

oflisback reopened this Nov 19, 2021
