Tech-Audit #1: Code & Analysis #193
So I finished reading through the filter tutorial and submodule. I have a few comments, which I'll list in decreasing urgency.

Your method of implementing an IIR filter risks running into some serious numerical instability. In iir.py, `apply_iir_filter()` calls the `filtfilt` method. For high-order Butterworth filters, this could lead to disaster. Instead, you should use `sosfiltfilt()`. This requires only minor modifications: when calling `scipy.signal.butter()`, use `output='sos'`, and feed the result, instead of the `b` and `a` coefficient arrays, into `sosfiltfilt`. Here's a minimal working example, inspired by the filter tutorial, which shows how these two methods diverge when the Butterworth order is 5.

```python
import numpy as np
from neurodsp.filt import filter_signal
from neurodsp.sim import sim_combined
from neurodsp.utils import create_times
from neurodsp.plts.time_series import plot_time_series
from scipy.signal import butter, sosfiltfilt

# Set the random seed, for consistency simulating data
np.random.seed(0)

# General settings for simulations
fs = 1000
n_seconds = 5

# Generate a times vector, for plotting
times = create_times(n_seconds, fs)

# Set the frequency in our simulated signal
freq = 6

# Set up simulation for a signal with an oscillation + noise
components = {"sim_powerlaw": {"exponent": 0}, "sim_oscillation": {"freq": freq}}
variances = [0.1, 1]

# Simulate our signal
sig = sim_combined(n_seconds, fs, components, variances)

# Define a frequency range to filter the data
f_range = (4, 8)
butterworth_order = 5

# Bandpass filter the data, across the band of interest
sig_filt = filter_signal(
    sig, fs, "bandpass", f_range, filter_type="iir", butterworth_order=butterworth_order
)

# Now compare to using sosfiltfilt
sos = butter(butterworth_order, f_range, btype="bandpass", output="sos", fs=fs)
sig_filt_sos = sosfiltfilt(sos, sig)

plot_time_series(times, [sig, sig_filt_sos], ["Raw", "SOS Filtered"])
plot_time_series(times, [sig, sig_filt], ["Raw", "Filtered"])
```

I'm happy to incorporate these changes if you'd like. Just let me know if so. If you're curious as to why this helps the stability: "sos" stands for "second-order sections". It factors the transfer function into a product of rational functions whose numerators and denominators are quadratics, rather than higher-order polynomials. These factors can then be applied iteratively to get the result.
In fir.py, the `compute_filter_length()` function forces the filter length to be odd. In effect, this means you'll always be using a Type I filter. Is this intentional?
In both fir.py and iir.py, when you call `scipy.signal.firwin` (resp. `scipy.signal.butter`), you feed in normalized frequencies, by which I mean you've divided by the Nyquist frequency. In the case of `firwin`, the `nyq` keyword argument is deprecated, at least as of SciPy 1.5.0. You can now feed in the sampling rate directly! This is also the case for `butter()`: you can optionally pass the sampling rate as a keyword argument. I did this in my minimal working example in the comment above.
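For reference, here's a quick sketch (plain SciPy, not neurodsp code) showing that the two call styles produce identical coefficients; the specific numbers here are just illustrative:

```python
import numpy as np
from scipy.signal import firwin

fs = 1000
f_range = (4, 8)
numtaps = 1001

# Old style: normalize the cutoffs by the Nyquist frequency by hand
taps_norm = firwin(numtaps, [f / (fs / 2) for f in f_range], pass_zero=False)

# Newer style: pass the sampling rate directly via the `fs` keyword
taps_fs = firwin(numtaps, f_range, pass_zero=False, fs=fs)

print(np.allclose(taps_norm, taps_fs))  # True
```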
In iir.py, `design_iir_filter()` throws a warning if the pass_type is not bandstop. Why throw this warning? Butterworth filters can certainly be used as lowpass and highpass filters, even though their effective transition band can be quite large.
Just finished working through the time-frequency submodule. Here are some comments in decreasing order of urgency.

In hilbert.py, your instantaneous phase computation can generate a discontinuous phase curve. This is mildly problematic when you compute instantaneous frequency, as you're trying to differentiate a non-differentiable function. It looks like you fudge the instantaneous phase in the `freq_by_time()` function by adding 2*pi at all locations where the pairwise differences are negative. A more elegant way of addressing this would be to use `np.unwrap(np.angle(robust_hilbert(sig)))` inside `phase_by_time`. This will give you a monotone increasing phase. Further, when a user plots instantaneous phase and instantaneous frequency, it will be clearer that one is the derivative of the other. Here's a minimal working example to show you the difference between the current output and what using `np.unwrap` would give you:

```python
import numpy as np
from neurodsp.sim import sim_combined
from neurodsp.utils import create_times
from neurodsp.plts.time_series import plot_time_series
from neurodsp import timefrequency as tf
from matplotlib import pyplot as plt

# Set the random seed, for consistency simulating data
np.random.seed(0)

# General settings for simulations
fs = 1000
n_seconds = 5

# Generate a times vector, for plotting
times = create_times(n_seconds, fs)

# Set the frequency in our simulated signal
freq = 6

# Set up simulation for a signal with an oscillation + noise
components = {"sim_powerlaw": {"exponent": 0}, "sim_oscillation": {"freq": freq}}
variances = [0, 1]

# Simulate our signal
sig = sim_combined(n_seconds, fs, components, variances)

analytic_sig = tf.robust_hilbert(sig, increase_n=False)
instantaneous_phase = tf.phase_by_time(sig, fs)
instantaneous_phase_cont = np.unwrap(np.angle(analytic_sig))
instantaneous_frequency = tf.freq_by_time(sig, fs)

fig, axes = plt.subplots(2, 1, figsize=(8, 6))
fig2, axes2 = plt.subplots(2, 1, figsize=(8, 6))

plot_time_series(times, instantaneous_phase, ax=axes[0])
plot_time_series(times, instantaneous_frequency, ax=axes[1])
fig.suptitle("Misleading Differentiation of Discontinuous Phase", fontsize=16)

plot_time_series(times, instantaneous_phase_cont, ax=axes2[0])
plot_time_series(times, instantaneous_frequency, ax=axes2[1])
fig2.suptitle("Differentiation of Continuous Phase", fontsize=16)
```

I will incorporate this change with a small PR if you think it's worth including.
Inside wavelets.py, calling `compute_wavelet_transform()` returns a time-frequency diagram where time is on the vertical axis and frequency is on the horizontal axis. This goes against the conventions I typically see in time-frequency diagrams: usually time is on the x-axis and frequency is on the y-axis. This is probably just an aesthetic point, but users accustomed to calling wavelet transforms like `scipy.signal.cwt` may be confused by this change of convention.
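A trivial illustration of the orientation point (the array shapes here are hypothetical stand-ins, not the actual neurodsp output):

```python
import numpy as np

# Suppose a transform returns an array shaped (n_times, n_freqs)
mwt = np.zeros((5000, 30))

# The conventional orientation puts frequency on the rows (y-axis) and
# time on the columns (x-axis), i.e. (n_freqs, n_times)
mwt_conventional = mwt.T
print(mwt_conventional.shape)  # (30, 5000)
```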
Thank you @elybrand for the great, detailed comments! Here I'll do some quick replies to the points on filtering, and tag in lab members as needed to check anything.
Thanks for the working example, very useful to see the problem! And like, woah. To the best of my knowledge, the updates you propose make sense to me. Unless @rdgao has any more thoughts on this, I think we can move forward with your proposed updates, if you can start a PR for this. The only current implementation detail that comes to mind is that this update seems like it might have some downstream effects of requiring updates to how we manage filter coefficients in sub-functions such as
I'm going to pitch this one to @rdgao, because I'm not sure on the answer.
This, as I understand it, is basically suggesting to update to use newer API elements from
Yeah, as I understand it, the original warning was to suggest against what might be a somewhat un-ideal (though valid) filter design. Within neuroscience, my understanding is that IIRs are generally only really used for line-noise bandstop, and for other use cases FIRs have better properties for the task at hand. That said, as you say, other filter types are valid, and since we can't in general know the context, or check and warn for overall ideal properties, I agree that we don't need this warning, and can (and probably should) consider it up to the user to choose filters. We can still mention some recommendations in the tutorials. So, unless anyone disagrees, let's just drop this warning.
And some comments on
Thank you again for the example! As I understand this, there are two points here:
If I'm understanding, and we can separate these things, I think my first thoughts are:
If I'm understanding, and we try not to change the output: does this track? I think keeping a wrapped phase output is consistent with how people might use it.
Yes, this is a very good point. Although changing this would change the output, I think we can consider it basically a bug to be returning like this, and I would vote for updating the shape organization of the returned array. I don't think our wavelets are used all that much, and this at least shouldn't break any other internal code to update, so I vote for changing.
This is a very good point. I'll have another look over to see if changing the return type from coefficients to sos would wreak havoc downstream. If it does, I'll punt that PR to @ryanhammonds.
If the return of `phase_by_time` is unchanged, then it's a pretty easy fix. The hacks in `freq_by_time` can be addressed by calling `np.unwrap`. I will defer to your expertise on whether or not `phase_by_time` should be monotone or should wrap around. As you say, wrapping may be the convention in neuroscience. I also don't know yet what changing the output does downstream for other applications.
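For reference, here's a tiny standalone demo of what `np.unwrap` does to a wrapped phase ramp (plain NumPy, nothing neurodsp-specific):

```python
import numpy as np

# A linearly increasing phase, wrapped into (-pi, pi] by taking np.angle
true_phase = np.linspace(0, 4 * np.pi, 9)
wrapped = np.angle(np.exp(1j * true_phase))

# np.unwrap adds multiples of 2*pi wherever consecutive differences
# exceed pi in magnitude, recovering the continuous ramp
unwrapped = np.unwrap(wrapped)
print(np.allclose(unwrapped, true_phase))  # True
```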
just hopping on with some potential clarifications:
But we should absolutely use `unwrap()` in the instantaneous freq calculation. I don't know why the hell it's so janky currently, probably Scott's doing.
I actually realized the answer to this as I was writing out this comment. It turns out to be twofold. First, having an odd number of coefficients means that the linear phase shift (due to symmetry) is an integer shift. This, as you say, does in fact let you find a map between time stamps of the filtered and unfiltered data. What's more, Type II filters (even number of coefficients) are a poor choice for bandstop and highpass filters, because their transfer functions have a root at the Nyquist frequency z = -1 = e^{i pi}. Using a Type II highpass filter on the signal [1, -1, 1, -1], a signal whose lone frequency is the Nyquist frequency, would unfortunately give you [0, 0, 0, 0]. Since NeuroDSP more or less treats all pass types the same, it makes sense to use Type I filters, whose transfer functions do not have a zero at z = -1.
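To make the Type II point concrete, here is a small sketch (plain NumPy, not neurodsp code) using an arbitrary even-length symmetric filter of my choosing: the transfer function of any such filter vanishes at z = -1, so it annihilates the pure-Nyquist signal.

```python
import numpy as np

# An arbitrary Type II FIR filter: even number of taps, symmetric coefficients
h = np.array([1.0, 2.0, 2.0, 1.0])

# H(z) = sum_k h[k] z^{-k} evaluated at the Nyquist frequency z = -1
# reduces to the alternating sum of the coefficients, which is zero
# for any even-length symmetric h
H_nyq = np.sum(h * (-1.0) ** np.arange(len(h)))
print(H_nyq)  # 0.0

# Filtering the pure-Nyquist signal therefore gives zero (steady state)
x = np.array([1.0, -1.0, 1.0, -1.0])
y = np.convolve(x, h, mode="valid")
print(y)  # [0.]
```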
That makes sense. However, I just did a sanity check and I couldn't find anywhere in NeuroDSP that tries to correct for the phase shift with FIR filters. I spent a good amount of time trying to cook up signals which would break it and I couldn't. I went down this deep rabbit hole to see why the phase shift was visibly non-existent, and the best I can conjecture is that
lol
After reading through the sliding window matching code, I skimmed the Gips et al. paper, and it looks like the motifs are supposed to be disjoint, i.e. they don't overlap. If this is the case, then I believe there is a problem with `_find_new_window_idx()`:

```python
import numpy as np

np.random.seed(0)

def _find_new_window_idx(window_starts, spacing_n_samps, n_samp, tries_limit=1000):
    """Find a new sample for the starting window."""

    for n_try in range(tries_limit):
        # Generate a random sample & check how close it is to other window starts
        new_samp = np.random.randint(n_samp)
        dists = np.abs(window_starts - new_samp)
        if np.min(dists) > spacing_n_samps:
            break
    else:
        raise RuntimeError('SWM algorithm has difficulty finding a new window. '
                           'Try increasing the spacing parameter.')

    return new_samp

# Current windows:
#   [0 9] 10 11 12 [13 22]
# There are no feasible moves for either window, assuming
# they must be disjoint.
len_sig = 23
window_length = 10
n_samp = len_sig - window_length
window_starts = np.array([0, 13])
spacing_n_samps = 3

idx = _find_new_window_idx(window_starts, spacing_n_samps, n_samp)
print(idx)  # == 5
```

You'll have to feed in the window lengths to make sure you get a feasible index. You'll also have to compute the distance from a putative new window start index to the set of indices of each window.
Catching up on things here:

**Filtering**

**Time-Frequency**

**Rhythm**

What you point out generally makes sense to me, but overall I'm not too familiar with this algorithm.
At the very least, we'd have to write a function to verify whether a proposed starting index is valid. That kind of function wouldn't be too difficult to write. Once we have that, you could use rejection sampling, as is done now, until you get a valid index or exceed the maximum number of iterations.
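Something like the following could work as that validity check (a hypothetical sketch; the function name and signature are mine, not from neurodsp):

```python
import numpy as np

def window_start_is_feasible(new_start, window_starts, window_len, moved_idx=None):
    """Check that a proposed window start does not overlap any existing
    window (optionally excluding the window being moved)."""

    for idx, start in enumerate(window_starts):
        if idx == moved_idx:
            continue
        # Two windows of length `window_len` overlap exactly when their
        # start indices are fewer than `window_len` samples apart
        if abs(start - new_start) < window_len:
            return False

    return True

# With windows starting at 0 and 13 and window_len = 10, a proposed
# start of 5 overlaps both existing windows and should be rejected
print(window_start_is_feasible(5, np.array([0, 13]), 10))   # False
print(window_start_is_feasible(25, np.array([0, 13]), 10))  # True
```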
not me, and no idea about the Gips algorithm, probably better to ask Natalie, tbh.
This issue is part of a 'technical audit' of `neurodsp`, with @elybrand.

The first task / goal is to check through the code and, in particular, all the analysis functions. For the relevant sub-modules, we primarily want to check that everything, as implemented, is technically sound, and that the settings and implementations are appropriate for our purposes. It's also an opportunity to identify any places we could add extensions to functionality, and/or update and replace current approaches, if there appear to be better options available.

In the case of `filt`, `timefrequency`, and `spectral`, these sub-modules implement basic and standard DSP procedures for application to neuroscientific signals. The goal here is to check that these functions perform as expected, and look through for any possible issues with the current implementations - doing a check & audit for correctness.

In the case of `burst` and `rhythm`, these sub-modules implement particular algorithms that have been proposed for the analysis of neuroscientific data. Here, I think it makes sense to have a look at these functions and, in so far as you can, check that the implementations seem technically valid (that there are no mistakes in the calculations). This doesn't need to address whether these algorithms are ideal / well designed for their stated tasks (I think that's somewhat out of scope for this review).

How you go about doing this is largely up to you! As you mentioned, I think using some simulations and checking that outputs are all as expected makes sense (these can be collected and later used to augment our code tests). For anything that comes up, feel free to comment it here, start a chat on Slack, and/or open a new issue for anything we should address.