Filter a VertexClustering object based on cluster sizes #507

ntamas · 2022-02-07T09:44:49Z

What is the feature or improvement you would like to see?

It is a common requirement when clustering a large graph to plot only the larger communities but not the smaller ones, as illustrated by this SO question.

Maybe it would be even more flexible if we allowed the user to pass an upper and a lower size limit, or an aribtrary filtering predicate based on the sizes.

I've posted a quick sketch of a possible implementation in the SO question.

Use cases for the feature

See above; the SO question provides a possible use-case.

References

N/A

stale · 2022-04-16T03:02:26Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.

5anperez · 2022-12-06T01:33:28Z

Hello! I would like to contribute to igraph and noticed that this is labeled as a good first issue, which I am hunting for, since it will be my very first contribution. Is this still open and if so, is everything I need to get starrted in the wiki?

ntamas · 2022-12-06T08:07:46Z

Yes, the issue is still open and feel free to get started on it. There's no wiki page but the README contains some instructions for getting started with development (after checking out the repo and all its submodules); see here.

I'd suggest that you open a PR with your draft as soon as possible (and mark it as draft) and commit often so we can give you early feedback if needed.

As for this specific task, you can use my SO response as a starting point, but it needs to be made a proper method of the VertexClustering class. I'd probably add a new filter_by_size() method to the VertexClustering class.

szhorvat · 2022-12-06T09:12:09Z

Wouldn't it make sense to try to design some sort of generalized filtering mechanism instead of a one-trick method that only handles sizes?

What if:

I want to filter by total vertex weight or total edge weight inside of a partition.
I want to select the partition containing a specific vertex.
I want to select partitions that contain vertices with a specific property, stored in some attribute. For example, select partitions that have a vertex whose name starts with V, representing areas in the visual cortex within a brain network.
I want to filter an overlapping clustering represented using VertexCover, not VertexClustering

Generally speaking, the area where most open source projects don't do so well is having a coherent, predictable, well thought out design. If you work with a system like Mathematica often enough, this becomes painfully obvious. I would like igraph to stand out from the crowd by being better in this area.

ntamas · 2022-12-06T10:33:39Z

Wouldn't it make sense to try to design some sort of generalized filtering mechanism instead of a one-trick method that only handles sizes?

I was considering this, but I don't want to overstretch the scope of this issue. It's always possible to add such a filtering method later and then rewrite filter_by_size() based on this. One advantage of adding a separate method for size (which is the most common filtering that people want to use) is that you don't actually need to construct the subgraph.

That being said, I can imagine adding a filter() method to VertexClustering that takes a Boolean predicate. The predicate would be a Python callable that takes the list of vertices in the cluster and returns True/False, but in that case it would be the job of the predicate to construct the subgraph on-the-fly if needed.

szhorvat · 2022-12-06T13:43:24Z

Alright, that makes sense. But it's still a useful exercise to think about how a generalization could be formulated and then implemented. Creating the subgraph seems like a heavyweight operation. Would it make sense to add a helper function to the C core which can retrieve the IDs of edges contained within a subgraph induced by some vertices? Then it will become relatively easy to compute things like:

Total number of edges in community
Total edge weight in community
Any statistic (e.g. variance) of edge weights within community

More generally, it might make sense for a filtering method to have access to:

Vertices and their attributes
Edges and their attributes

The first is trivial, as the clusters are generally understood in terms of their vertices and API provides access to these through .as_cover(). There does not seem to be a simple way for the second. Am I missing something? If not, do you agree that it makes sense to add the support function to the C core, or at least discuss these ideas further somewhere else (forum?)

ntamas · 2022-12-20T12:36:46Z

Would it make sense to add a helper function to the C core which can retrieve the IDs of edges contained within a subgraph induced by some vertices?

Sounds like a good idea.

More generally, it might make sense for a filtering method to have access to:

I think that all that the filtering function needs access to is the graph object itself and the members of the current cluster being considered. The graph might not even be needed as you can always add it using functools.partial, e.g.:

def some_predicate(graph, vertices):
    # do something and return True/False here

clusters.filter(partial(some_predicate, graph))

This would allow .filter() to be implemented in the Clustering base class where we don't already have a graph.

But again, I still maintain that this is outside the scope of the current issue, which only asks for filtering based on cluster sizes, so in order not to derail the current discussion, I'd recommend opening a separate issue for a more generic filtering method.

palash018 · 2023-06-10T18:15:49Z

Is anyone still working on this issue currently?

ntamas added the good first issue Candidates for first-time contributors who are already familiar with C label Feb 7, 2022

stale bot added the stale label Apr 16, 2022

szhorvat added the wishlist Feature request that has not been chosen for implementation yet; vote or comment to prioritize it! label Apr 16, 2022

stale bot removed the stale label Apr 16, 2022

szhorvat mentioned this issue Dec 6, 2022

Retrieve biconnected components in terms of edges #619

Open

szhorvat mentioned this issue Dec 20, 2022

A function to find edges contained within an induced subgraph igraph/igraph#2277

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Filter a VertexClustering object based on cluster sizes #507

Filter a VertexClustering object based on cluster sizes #507

ntamas commented Feb 7, 2022

stale bot commented Apr 16, 2022

5anperez commented Dec 6, 2022

ntamas commented Dec 6, 2022

szhorvat commented Dec 6, 2022

ntamas commented Dec 6, 2022

szhorvat commented Dec 6, 2022

ntamas commented Dec 20, 2022

palash018 commented Jun 10, 2023

Filter a VertexClustering object based on cluster sizes #507

Filter a VertexClustering object based on cluster sizes #507

Comments

ntamas commented Feb 7, 2022

stale bot commented Apr 16, 2022

5anperez commented Dec 6, 2022

ntamas commented Dec 6, 2022

szhorvat commented Dec 6, 2022

ntamas commented Dec 6, 2022

szhorvat commented Dec 6, 2022

ntamas commented Dec 20, 2022

palash018 commented Jun 10, 2023