-
Notifications
You must be signed in to change notification settings - Fork 62
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ENH] - Update rhythm code & docs #230
Conversation
I dug into this and also think that there is an issue with the current implementation. Eric's comment in the #193 ("You need to select from indices which are at least In Eric's code example in #193, the
should be "separated by twice the minimal spacing", so I think the indices should be:
Then, for any position (Xn), a new/random position (Xn') must be at least I'm going to have to revisit this because I could be completely wrong. I took a peek at the matlab implementation and it's going to take me a bit to parse through. |
@TomDonoghue I pushed an algorithm update in 54592dc. Here is a notebook I used for testing the update and comparing the results to the matlab implementation. It seems to recover average window better than the matlab version. Before the update, the example in the notebook would return an error. The version I pushed really only works for a small number of windows (i.e. short signals / large window length and spacing) and a small number of iterations. The computation time exponentially increases with the number of windows. I think this is because it brute forces distances (i.e. cost difference) between every combination of windows, rather than using Markov Chain Monte Carlo sampling like in the paper / matlab version. If we wanted to use MCMC sampling, I think we should add an external dependency unless we wanted to add at |
Thanks for digging into this one @ryanhammonds! In terms of comparing our version to the Matlab version - we have a minimal version. I don't think it's worth it right now to massively extend this algorithm (unless / until someone really needs it) - so I vote not trying to add MCMC etc at the moment. To be more clear about our implementation, I've added some description to the docstring, and fleshed out sub-function descriptions to try and better track what's going on. Being clear this is a minimal implementation, I think as long as we are confident about what it does, we should merge this as is. I've left a couple review comments on the code for things I'm not clear on - once those are done, then this should be good to go without major extension. |
This reverts commit 856a1fd.
@TomDonoghue, I rebased main and resolved conflicts here. I also re-wrote the swm implementation with several modifications from the original to get things working. The main deviations from the published swm algorithm are:
These changes allow the algorithm to scale to much longer timeseries. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @ryanhammonds - thanks for coming back to this one!
In terms of quick / practical code things, I:
- removed the old
plot_filtering.py
tutorial file (that has since been split up inmain
) that was re-added here. - fixed the doctest in
plot_swm_pattern
- fixed the SWM tutorial to run with the new version of the SWM function
- did some quick linting of the code
In terms of the SWM implementation: I don't have any strong thoughts / ideas. You know more than me about this approach, so I'm happy to roll with what you propose here! I think as long as we're clear in the docs on the differences (which I think we are), it should be good.
Just a couple quick-check Qs:
- the SWM function now returns the 2d array of all windows (when it used to return an average window). In terms of plotting and the tutorial, taking the average across this set of windows makes sense, right? Is that expected behaviour
- Based on your understanding of the data, does the pattern found in the tutorial make sense to you? I'm not sure I expect the 'double peak' it current finds. Can you have a quick look to see if you think the output of the example is good / representative?
Other than that, if you are happy with this implementation, and think it performs well (at least roughly similar to the Matlab version), then let's go for it, and finally merge this PR!
Thanks for helping clean the rebase up! As for the questions:
|
@ryanhammonds - this all makes sense to me! I re-ran the tutorial, and it seems to pull out a reasonable peak. This PR should be ready to merge, I think! |
This PR is for updates to the
rhythm
module, including code documentation, and tutorials.So far, it updates the tutorials (since they were a bit limited) and in code documentation.
During this PR, I noticed an issue I noticed in SWM, that was also mentioned by Eric in #193 about how it does the window spacing. I think we need to audit and update this code a bit.
@ryanhammonds : could you do a check through the SWM code algorithm (see paper here: https://github.com/bartgips/SWM, and Matlab implementation here: https://github.com/bartgips/SWM), and see what you think might need fixing? There are a couple notes in #193 about it as well.
From my check, I think at the window check should ensure samples are less than X samples apart from another (not greater than). I think there should perhaps also be an additional check to ensure windows are distinct time segments.
As a sidenote, I'm not really sure that SWM needs to return costs, so maybe we can update to make that an optional return, and move toward deprecating that output.