Connectivity_matrix_connect update #489

Open · wants to merge 16 commits into base: main

Conversation

kyralianaka
Contributor

This PR resolves #486 so that views with global indices of different ranges can be used to connect cells with an adjacency matrix (and adds a test).

I also removed the random selection of compartments on the post-synaptic cell, because I don't quite see the benefit over just using the first compartment of the first branch, and the random selection in a list comprehension was really slowing down my building of large networks. That said, I would be interested in having connectivity_matrix_connect take a matrix of shape (num pre comps x num post comps) instead of (num pre cells x num post cells) for finer control of the connectivity... I can open another issue about this as an enhancement.

@kyralianaka kyralianaka requested a review from jnsbck November 7, 2024 15:02
@jnsbck
Contributor

jnsbck commented Nov 7, 2024

This was also fixed by #488, which I merged already.
The random selection is consistent with the other cell-level connection methods. Also, hard-coding the first compartment of the first branch will not work with single-comp cells. I agree that we should make it faster, though, if it slows down the whole thing.

The comp-to-comp connectivity_matrix_connect that you're proposing is basically already possible using connect, since it does not just take single comps but also n comps. Something like the following should work:

import numpy as np

con_matrix = some_boolean_matrix  # boolean connectivity matrix
pre_inds, post_inds = np.where(con_matrix)  # indices of the nonzero (True) entries
pre_comps = net.select(pre_inds)
post_comps = net.select(post_inds)

connect(pre_comps, post_comps, TestSynapse())

@kyralianaka
Contributor Author

I am happy to go through the other methods and remove the random selection; it will always make connecting networks faster and avoid having to look for the postsynaptic compartments.

I'm not sure why you say it won't work for single-compartment cells; I can change the test to single-compartment cells and have no problem.

For the comp2comp mat, that's fine, no need to add then.

In general I would just like to make this function easier to read, mainly by not using so many types of indices for the same cells ("from_idx", "pre_cell_inds", and "global_pre_cell_inds"). I think there are even more simplifications I can do, so I will do those and commit again (will also remove the rand selection).

@kyralianaka
Contributor Author

I also think the added test is really important, especially if you have more plans to change the synapse indexing.

@jnsbck
Contributor

jnsbck commented Nov 8, 2024

I agree that it will make networks faster, but I would not make it the default. I'd be fine with making it a toggle though, i.e.
post_comp="random" by default and something like post_comp=1 for branch 0, comp 1. If we add this, we should keep in mind that branches can have different numbers of segments, though.

Ah I see, you mean 0th comp rather than 1st. In that case I am strongly against hard coding this, since we would always connect soma to soma.

I'd be surprised if you can rewrite it without "from_idx", "pre_cell_inds", and "global_pre_cell_inds", but if you have an idea of how it can work, it would be great if you can make the function easier to read.

Could you briefly explain what the additional test is covering that is not covered by the other cases? Also please add some more doc/comments :)

@kyralianaka
Contributor Author

I am strongly for making it the default haha, but open to giving the option for random selection. Your concern about always connecting soma to soma doesn't quite make sense to me, because the soma doesn't have to be, and often isn't, the zeroth compartment (at least when self-defining cells; I could be wrong about SWC-loaded cells, in which case I would reconsider).

Re all the indices, I renamed things and I think the most clarifying aspect is distinguishing between global_cell_inds and global_comp_inds. I also really like it if the pre and post indices are handled the same way because this makes the function more modular.
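To illustrate the distinction (a minimal sketch; it assumes the view's nodes DataFrame has one row per compartment and a "global_cell_index" column, as used in the test below):

# Global compartment indices: one entry per compartment in the view.
global_comp_inds = pre_cell_view.nodes.index.to_numpy()
# Global cell indices: one entry per cell in the view.
global_cell_inds = pre_cell_view.nodes["global_cell_index"].unique()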

I just added some comments to the tests, they are in the test_connectivity_matrix_connect() function, but you can let me know if anything is still unclear. I might still go add even more comments :)

@kyralianaka
Contributor Author

Thinking about the way cells are typically constructed, would it actually maybe make sense to default to connecting the last compartment of the pre-synaptic cell to the zeroth compartment of the post-synaptic cell? (Yes, before I meant zeroth when I said first, sorry for the confusion.)

@kyralianaka
Contributor Author

Another thing: the test_sparse_connect() function uses this:
assert all([63, 59, 65, 86, 80, 58, 92, 85, 168, 145, 189, 153, 180, 190, 184, 163, 159, 179, 182])
so it will always pass (every element is a nonzero int and therefore truthy)... I will fix that here.

@kyralianaka
Contributor Author

I updated all of the connectivity functions now so that they all index in the same way. It probably looks more repetitive now, but I would argue that that's a good thing: the more modular design avoids potential future bugs.

I haven't yet included the option to randomly select the postsynaptic compartment, but I guess we should talk about that as a group and get everyone's opinions.

@kyralianaka
Contributor Author

Two tests are currently failing because changing the post_comp_index of the fully connected network changes the voltage solutions of the simulations. I will wait to fix this until we have talked about the post_comp_index scheme.

@michaeldeistler
Contributor

+1 for giving the option of random connection vs first-to-first. We should do it consistently for all connection methods though. I am fine with having first-to-first as the default. I am against connecting first to last.

As for connectivity_matrix_connect taking a (comp, comp) matrix: I think this would be great, but doing it naively will blow up memory. Instead, we should support matrices passed in csr or csc format.
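A rough sketch of what that could look like (hypothetical; connectivity_matrix_connect does not accept sparse inputs yet, and num_pre_comps / num_post_comps are placeholder names):

import numpy as np
from scipy.sparse import random as sparse_random

# Hypothetical (comp x comp) connectivity stored in CSR format at ~1% density.
conn = sparse_random(num_pre_comps, num_post_comps, density=0.01, format="csr")
pre_comp_inds, post_comp_inds = conn.nonzero()  # only the stored entries, never a dense matrix

connect(net.select(pre_comp_inds), net.select(post_comp_inds), TestSynapse())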

@kyralianaka
Contributor Author

I just added back the option to randomly select the post-synaptic compartment; setting this to true makes most of the old tests pass (except where I changed other things).

The other thing I changed was the way of sparsely connecting the network because the previous method randomly sampled pre and post cells separately, which resulted in a lot of duplicate connections. Now the connectivity matrix is generated by independent Bernoulli trials, consistent with common random graph generation methods. The one disadvantage of this method is that it could be memory intensive for extremely large numbers of pre and post synaptic cells, but since the matrix is local, I'm hopeful that it won't cause problems.
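In code, the Bernoulli-trial idea is roughly this (a sketch with NumPy; num_pre, num_post, and p as in sparse_connect):

import numpy as np

rng = np.random.default_rng(0)
# One independent Bernoulli(p) trial per (pre cell, post cell) pair.
adjacency = rng.random((num_pre, num_post)) < p
pre_cell_inds, post_cell_inds = np.where(adjacency)  # the sampled connections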

I might look into the (comp, comp) matrix situation in another pull request since it will soon become relevant to my project, but hopefully we can update these functions first :)

@michaeldeistler
Contributor

Cool! When you are done please request a review!

Contributor

@michaeldeistler michaeldeistler left a comment

Awesome, thanks! Could you report the runtime of random_post_comp = True in the PR description and compare it to the runtime before this PR? Especially for large networks.

(And ideally also add the timing for random_post_comp=False after this PR.)
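For example, a quick way to get these numbers (a sketch; it assumes a prebuilt net with 1000 cells and the TestSynapse used in the tests):

import time

start = time.perf_counter()
fully_connect(net.cell(list(range(0, 500))), net.cell(list(range(500, 1000))),
              TestSynapse(), random_post_comp=True)
print(f"fully_connect, random_post_comp=True: {time.perf_counter() - start:.3f} s")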

@@ -84,41 +84,42 @@ def test_fully_connect():

fully_connect(net[8:12], net[12:16], TestSynapse())

# This was previously visually inspected
Contributor

not clear what you mean here

Contributor Author

Here I just mean that I ran network.vis() in a notebook many times to make sure that the connectivity looks as expected, but I can remove these comments

@@ -135,27 +136,31 @@ def test_sparse_connect():

sparse_connect(net[8:12], net[12:], TestSynapse(), p=0.5)

# This was previously visually inspected
Contributor

again, unclear

@@ -203,3 +208,38 @@ def test_connectivity_matrix_connect():
comp_inds = nodes.loc[net.edges[cols].to_numpy().flatten()]
cell_inds = comp_inds["global_cell_index"].to_numpy().reshape(-1, 2)
assert np.all(cell_inds == incides_of_connected_cells)

# Test with different view ranges
Contributor

what is a view range?

Contributor Author

@kyralianaka kyralianaka Nov 21, 2024

This addresses bug #486 where connectivity_matrix_connect() would not work with e.g.

pre = net.cell(list(range(2, 5)))
post = net.cell(list(range(5, 7)))

but did when the pre range started at 0, e.g.

pre = net.cell(list(range(0, 4)))
post = net.cell(list(range(4, 8)))
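For completeness, a usage sketch with these views (the all-ones boolean matrix and the argument order, which matches the other connect functions, are assumptions here):

import numpy as np

connectivity = np.ones((3, 2), dtype=bool)  # 3 pre cells x 2 post cells, all connected
connectivity_matrix_connect(pre, post, TestSynapse(), connectivity)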

Contributor Author

I changed the wording a bit

fully_connect(pre, post, IonotropicSynapse())
fully_connect(pre, post, TestSynapse())
fully_connect(pre, post, IonotropicSynapse(), random_post_comp=True)
fully_connect(pre, post, TestSynapse(), random_post_comp=True)
Contributor

is there a test for random_post_comp=False? At least for fully_connect, but also for the others? At least to check that they do not break?

Contributor Author

All of the tests in test_connection.py are with random_post_comp=False, but the tests that used connection.py everywhere else (test_grad.py and test_basic_modules.py) use random_post_comp=True with fully connect (so that the simulation results are the same as before). I could add tests for random_post_comp=True to test_connection.py -- would this then be enough coverage?

Contributor Author

Update: all those function tests now have both random_post_comp=True & False

@michaeldeistler
Contributor

@kyralianaka please request a review when ready.

Contributor

@jnsbck jnsbck left a comment

Had a quick look and left some comments; not sure when you requested the review, whether it was recent or months ago. Also make sure to rebase once ready, since there have been a lot of merged PRs lately.

# Pre-synapse at the zero-eth branch and zero-eth compartment
global_pre_comp_indices = (
    pre_cell_view.scope("local").branch(0).comp(0).nodes.index.to_numpy()
)  # setting scope ensure that this works indep of current scope
Contributor

make sure to reset this after, in case the user is in global scope

num_pre = len(pre_cell_view._cells_in_view)
num_post = len(post_cell_view._cells_in_view)

# Generate random cxns without duplicates --> respects p but memory intesive if extremely large n cells
Contributor

When I implemented the connect API, @michaeldeistler and I had discussed not wanting to do this, since this can be really bad for large, sparse networks. Did we change our mind about doing it this way?

Contributor

I agree it's cleaner and easier to parse, but not efficient.

Contributor

Good catch Jonas! Let's avoid building an n x n matrix.

Contributor Author

I was looking into this for quite a while, and all of the ways that are memory efficient are extremely time intensive, and vice versa, so I came up with a way to split the cost a bit... generating blocks of the random matrix at a time. I am still thinking about how it could be prettier, but the resource management seems to be good.
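Roughly what I mean by blocks (a simplified sketch; the 100 x 100 block size is the cap mentioned below, and num_pre, num_post, p are as in sparse_connect):

import numpy as np

rng = np.random.default_rng(0)
block = 100  # largest dense matrix ever materialized is block x block
pre_inds, post_inds = [], []
for i in range(0, num_pre, block):
    for j in range(0, num_post, block):
        # Bernoulli(p) trials for this block of (pre, post) pairs only.
        hits = rng.random((min(block, num_pre - i), min(block, num_post - j))) < p
        rows, cols = np.where(hits)
        pre_inds.append(rows + i)
        post_inds.append(cols + j)
pre_inds = np.concatenate(pre_inds)
post_inds = np.concatenate(post_inds)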

Contributor

What is the reason we cannot do it as it was before? I would really like to avoid memory scaling with O(N^2).

Contributor Author

The function before created many duplicate connections between the same cells, and simply filtering these out would remove a lot of connections (inefficient) and make the entered p value somewhat meaningless (more problematic). Also, p=1.0 did not create a fully connected graph, far from it I might add. Resource-efficient methods for constructing sparse random graphs are highly sought after and typically trade off computation speed and memory, so I tried to write a function that balances the two. The function now isn't O(N^2) per se; the largest matrix it creates is 100x100. I can keep looking for better algorithms in the future, but for now the function at least fixes the bug of duplicate connections and restores the traditional meaning of p.

Contributor Author

I'm also thinking now that just removing sparse_connect() might not be so bad... I think we might have used connectivity_matrix_connect or something else in the RNN experiments anyway.

@@ -44,33 +44,52 @@ def fully_connect(
pre_cell_view: "View",
Contributor

should we rename fully_connect to fully_connect_cells? Might be less confusing, since otherwise connections are made comp-to-comp

Contributor Author

I don't think anyone would ever try to connect all of the compartments to all of the other compartments, and it's longer, so I'd prefer not to

@kyralianaka
Contributor Author

Thanks for the feedback. I've done a bit of runtime testing and there is still some room for improvement, but my goal is still to make it as readable as possible (so that it's easy to write more functions like this) while not being too slow. I'll work on this on Monday and think a bit more about restructuring.

@jnsbck jnsbck mentioned this pull request Nov 22, 2024
… scoping, and added function for random post-comp selection
@kyralianaka
Contributor Author

An update on the timing:

              PR         original
fc 0-0        2.485 s
fc rand       2.205 s    2.00 s
sparse 0-0    3.16 s
sparse rand   3.24 s     10963.41 s (3 h)

The random selection does actually seem to be faster now, so I'll just make that the default again unless that changes. I'm also shifting back towards the old fully_connect, which didn't have any problems other than not being so easy to parse.

@jnsbck
Contributor

jnsbck commented Feb 21, 2025

@kyralianaka what is happening here? :)

@kyralianaka
Contributor Author

The state of this was that the sparse_connect function was taking very, very long to connect the cells, and it wasn't doing it in a way one would expect, in that it would often connect the same cells multiple times and p=1.0 would not fully connect the cells...

Then I did some digging and came up with a function for sparse connectivity that traded off memory (normally a full adjacency matrix would be needed) and time (assessing whether to connect cells one at a time). This was after a previous version of the function that generated the expected connectivity patterns at the expense of generating the whole adjacency matrix. The last I heard, though, was that Michael wanted to revert back to the old function, unless maybe I could get append_multiple_synapses to work with sparse array types like csc or csr.

The runtime problem of the old function could be solved by removing the random selection of a post-synaptic compartment for each cell, so I made the random sampling its own function and an optional call within fully_connect as well. I think it's nice to have the option of either knowing that the post-synaptic compartment is always zero (there was a use case for me a while ago) or randomly selecting it. This results in a slight increase in computational time for fully connecting a large network (table above), which I might be able to reduce by shoving the random selection back into the function, but it doesn't save that much and I didn't think it'd be as clean.

Anyway, what's left for me to do, as I understand it, is to revert sparse_connect back to the old function, with or without improved random sampling (because of the time problem), and to write my concerns about the unexpected connections it generates into the docstring. I'll try to push that here today, and you all can let me know if we should eventually merge this.

@jnsbck
Contributor

jnsbck commented Feb 24, 2025

Sounds great, thanks for the update :) Even if we just make first-to-first the default and change nothing else, it should probably make things faster and more predictable, which would already be great imo.

… comp choosing func for efficiency, updated sparse test
@kyralianaka
Contributor Author

Ok, since the random compartment choosing is now done a little differently depending on the connection type (because sparse_connect and connectivity_matrix_connect can have varying numbers of post-synaptic compartments), I put it back into the respective functions. I tried to make everything as consistent as possible with the way it was before, except for the first-to-first connectivity. I still need to add a few more tests and do the new chores :)

@kyralianaka
Contributor Author

kyralianaka commented Feb 25, 2025

New (hopefully final) times for connecting a network of 1000 cells with p=0.5 for the sparse connectivity:

              PR        original
fc 0-0        0.36 s    n/a
fc rand       0.39 s    0.39 s
sparse 0-0    0.20 s    n/a
sparse rand   0.40 s    10963.41 s (3 h)

The sparse random connectivity is faster now because I switched to using the pandas sample method instead of a list comprehension, but 0-0 connectivity is the default.
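For reference, the swap is roughly this (a sketch; it assumes the post view's nodes index is the global compartment index, as in the snippets above):

# One random compartment per post-synaptic cell, drawn in a single vectorized call
# instead of a per-cell list comprehension.
rand_post_comps = (
    post_cell_view.nodes.groupby("global_cell_index").sample(n=1).index.to_numpy()
)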

Successfully merging this pull request may close these issues:

Bug in connectivity_matrix_connect