[ENH] - Update rhythm code & docs #230

Merged
merged 16 commits into main from rhythm on Jun 3, 2021

Conversation

@TomDonoghue (Member) commented Dec 30, 2020

This PR is for updates to the rhythm module, including code, documentation, and tutorials.

So far, it updates the tutorials (since they were a bit limited) and the in-code documentation.

While working on this PR, I noticed an issue in SWM, which was also mentioned by Eric in #193, about how it does the window spacing. I think we need to audit and update this code a bit.
@ryanhammonds: could you check through the SWM algorithm code (see the paper here: https://github.com/bartgips/SWM, and the Matlab implementation here: https://github.com/bartgips/SWM), and see what you think might need fixing? There are a couple of notes in #193 about it as well.
From my check, I think the window check should ensure samples are less than X samples apart from one another (not greater than). There should perhaps also be an additional check to ensure windows are distinct time segments.
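
To illustrate the kind of check I mean (just a sketch from me, not the current code; the helper name and arguments are made up):

import numpy as np

def windows_are_distinct(window_starts, win_len):
    # Windows should be non-overlapping time segments: sorted start
    # positions must be at least one window length apart
    starts = np.sort(np.asarray(window_starts))
    return bool(np.all(np.diff(starts) >= win_len))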

As a side note, I'm not really sure that SWM needs to return costs, so maybe we can make that an optional return and move toward deprecating that output.

@TomDonoghue added the 2.2 (Updates to go into a 2.2.0 release) label on Dec 31, 2020
@ryanhammonds (Member) commented Jan 5, 2021

I dug into this and also think that there is an issue with the current implementation. Eric's comment in #193 ("You need to select from indices which are at least spacing_n_samps away from the entire window to ensure they are disjoint.") may be incorrect based on Algorithm 1 of the manuscript. In the "initialization" section, window positions appear to be independent of the window length, L.

In Eric's code example in #193, the window_start indices:

import numpy as np

len_sig = 23
window_length = 10
n_samp = len_sig - window_length
window_starts = np.array([0, 13])
spacing_n_samps = 3

should be "separated by twice the minimal spacing", so I think the indices should be:

window_starts = np.array([0, 6, 12])  # 18 isn't included since a full window can't fit from that position
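
Using the variables from the example above, one way to generate those starts, if "separated by twice the minimal spacing" means a fixed step of 2 * spacing_n_samps (just my reading, for illustration):

window_starts = np.arange(0, n_samp + 1, 2 * spacing_n_samps)
print(window_starts)  # [ 0  6 12]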

Then, for any position (Xn), a new/random position (Xn') must be at least G (or window_length) away from Xn-1 and Xn+1, based on Algorithm 1 and Fig 2 from the manuscript.
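
Expressed as a check, that constraint would be roughly (my sketch; the helper and argument names are made up):

def proposal_ok(new_start, prev_start, next_start, G):
    # A proposed position Xn' must stay at least G (~ window_length) away
    # from the neighboring window positions Xn-1 and Xn+1
    return (new_start - prev_start >= G) and (next_start - new_start >= G)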

I'm going to have to revisit this because I could be completely wrong. I took a peek at the Matlab implementation and it's going to take me a bit to parse through.

@ryanhammonds (Member) commented:

@TomDonoghue I pushed an algorithm update in 54592dc.

Here is a notebook I used for testing the update and comparing the results to the Matlab implementation. It seems to recover the average window better than the Matlab version. Before the update, the example in the notebook would return an error.

The version I pushed really only works for a small number of windows (i.e. short signals / large window length and spacing) and a small number of iterations. The computation time increases steeply with the number of windows. I think this is because it brute forces distances (i.e. cost differences) between every combination of windows, rather than using Markov chain Monte Carlo sampling as in the paper / Matlab version.
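
For context, the brute-force part is roughly of this form (a simplified sketch, not the actual code I pushed):

import numpy as np
from itertools import combinations

def total_window_cost(windows):
    # Sum of mean squared differences between every pair of windows;
    # the number of pairs grows quadratically with the number of windows
    return sum(np.mean((windows[ii] - windows[jj]) ** 2)
               for ii, jj in combinations(range(len(windows)), 2))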

If we wanted to use MCMC sampling, I think we would need to add an external dependency, unless we wanted to add a stats module to ndsp. I tried bootstrapping + multiprocessing when computing the distances between pairs to speed things up, but couldn't get it to work. I can try again though.
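
For reference, the paper-style MCMC update would boil down to a Metropolis / simulated annealing acceptance step, something like (sketch only; delta_cost and temperature are placeholders):

import numpy as np

def accept_move(delta_cost, temperature, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    # Always accept moves that lower the cost; otherwise accept with
    # probability exp(-delta_cost / temperature)
    return delta_cost < 0 or rng.random() < np.exp(-delta_cost / temperature)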

@ryanhammonds changed the title from [WIP/ENH] - Update rhythm code & docs to [ENH] - Update rhythm code & docs on Jan 19, 2021
Base automatically changed from master to main January 25, 2021 21:17
@TomDonoghue (Member, Author) commented:

Thanks for digging into this one @ryanhammonds!

In terms of comparing our version to the Matlab version: ours is a minimal version. I don't think it's worth massively extending this algorithm right now (unless / until someone really needs it), so I vote for not trying to add MCMC etc. at the moment. To be clearer about our implementation, I've added some description to the docstring, and fleshed out the sub-function descriptions to better track what's going on.

As long as we're clear that this is a minimal implementation and we're confident about what it does, I think we should merge this as is. I've left a couple of review comments on the code for things I'm not clear on; once those are addressed, this should be good to go without major extension.

@ryanhammonds mentioned this pull request May 3, 2021
@ryanhammonds (Member) commented Jun 3, 2021

@TomDonoghue, I rebased onto main and resolved conflicts here. I also re-wrote the SWM implementation, with several modifications from the original to get things working. The main deviations from the published SWM algorithm are:

  1. Each iteration is similar to a neural net epoch, moving each window once, in random order. The original implementation does not guarantee even window sampling.
  2. New window acceptance is determined by increased correlation coefficients and reduced variance across windows (see the sketch after this list). Before, this was a cost measure that didn't work.
  3. Phase optimization / realignment is used to escape local minima.
  4. Low variance windows are removed to speed up runtime.

These changes allow the algorithm to scale to much longer timeseries.
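
As a rough illustration of deviation 2, the acceptance rule is along these lines (a sketch of the idea, not the merged code):

import numpy as np

def accept_new_window(other_windows, old_win, new_win):
    # Accept the proposed window if it increases the mean correlation with
    # the other windows and decreases the variance across windows
    others = np.asarray(other_windows)
    old_corr = np.mean([np.corrcoef(old_win, win)[0, 1] for win in others])
    new_corr = np.mean([np.corrcoef(new_win, win)[0, 1] for win in others])
    old_var = np.var(np.vstack([others, old_win]), axis=0).mean()
    new_var = np.var(np.vstack([others, new_win]), axis=0).mean()
    return new_corr > old_corr and new_var < old_var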

@TomDonoghue (Member, Author) left a comment

Hey @ryanhammonds - thanks for coming back to this one!

In terms of quick / practical code things, I:

  • removed the old plot_filtering.py tutorial file that was re-added here (it has since been split up in main)
  • fixed the doctest in plot_swm_pattern
  • fixed the SWM tutorial to run with the new version of the SWM function
  • did some quick linting of the code

In terms of the SWM implementation: I don't have any strong thoughts / ideas. You know more than me about this approach, so I'm happy to roll with what you propose here! I think as long as we're clear in the docs on the differences (which I think we are), it should be good.

Just a couple quick-check Qs:

  • the SWM function now returns the 2d array of all windows (where it used to return an average window). In terms of plotting and the tutorial, taking the average across this set of windows makes sense, right? Is that the expected behaviour?
  • Based on your understanding of the data, does the pattern found in the tutorial make sense to you? I'm not sure I expect the 'double peak' it currently finds. Can you have a quick look to see if you think the output of the example is good / representative?

Other than that, if you are happy with this implementation, and think it performs well (at least roughly similar to the Matlab version), then let's go for it, and finally merge this PR!

@ryanhammonds (Member) commented Jun 3, 2021

Thanks for helping clean the rebase up! As for the questions:

  • The 2d array is intended. I think it's a bit more flexible this way - for things like the std/var across windows (see the sketch after this list) or for clustering windows.
  • I pushed an update to the tutorial. The double peak was basically just noise and had very low variance. What was happening was that the window spacing param was large relative to the window length, and the default number of iterations (100) wasn't enough to search the wide space between windows to find a proper motif. I ended up just lowering the spacing to equal the window length to address this, and added a var_thresh to speed the search up.
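
For example, using the returned array (an illustration, assuming a shape of (n_windows, win_len)):

import numpy as np

windows = np.random.randn(20, 100)   # stand-in for the array returned by SWM

avg_window = windows.mean(axis=0)    # the average motif (the old return value)
std_window = windows.std(axis=0)     # variability across windows, per sample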

@TomDonoghue (Member, Author) commented:

@ryanhammonds - this all makes sense to me! I re-ran the tutorial, and it seems to pull out a reasonable peak.

This PR should be ready to merge, I think!

@ryanhammonds merged commit ea27979 into main on Jun 3, 2021
@TomDonoghue deleted the rhythm branch on June 3, 2021