-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Lr ml #106
Conversation
gaitlink/lrd/_lr_optimizer.py
Outdated
from gaitlink.lrd import LrdUllrich | ||
|
||
|
||
# TODO: this might be removed in the future, but I think it makes life easier, as people might not be familiar with tpcp. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still don't like having that as a separate thing. I think it adds very little value and just removes flexibility.
But, I fully agree people need some guidance on how to do the optimization.
Can we do an example instead that shows step by step how to retrain/hyper para optimize the model?
That would also be great to have, in case we need to retrain any of the models in the future.
tests/test_lrd/test_ml.py
Outdated
algo = LrdUllrich(**test_params) | ||
data = pd.DataFrame([], columns=["acc_x", "acc_y", "acc_z", "gyr_x", "gyr_y", "gyr_z"]) | ||
ic_list = pd.DataFrame({"ic": []}) | ||
with pytest.raises(ValueError): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Empty input should not raise, but produce empty output.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we limit the number of models to the one that we actually need for now? A big problem will be reproducibility and usability of the models with future versions of sklearn. We need to somehow still ensure that the models can be retrained in the future. Hence, best to limit what data is required for that. See #89 for details. I think for now we only need the ms_project_all and ms_project_ms model.
tests/test_lrd/test_ml.py
Outdated
algo = LrdUllrich(**params) | ||
data = pd.DataFrame(np.zeros((100, 6)), columns=["acc_x", "acc_y", "acc_z", "gyr_x", "gyr_y", "gyr_z"]) | ||
ic_list = pd.DataFrame({"ic": []}) | ||
with pytest.raises(ValueError): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above
gaitlink/lrd/_lrd_ml.py
Outdated
"model": model, "scaler": scaler} | ||
|
||
|
||
def __init__(self, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use proper defaults instead of doing the None check later.
gaitlink/lrd/_lrd_ml.py
Outdated
self.scaler = MinMaxScaler() | ||
|
||
# Fit the scaler if it hasn't been fitted yet, otherwise just transform the data | ||
if not hasattr(self.scaler, 'scale_'): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be more explicit when using the sklearn is_fitted
helper
gaitlink/lrd/_lrd_ml.py
Outdated
ics = ics.copy() | ||
|
||
# Apply Butterworth filtering and extract the first and second derivatives. | ||
gyr = data[["gyr_x", "gyr_z"]].rename({"gyr_x": "v", "gyr_z": "ap"}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need this renaming? The explict names are not really used anywhere, right?
gaitlink/lrd/_lrd_ml.py
Outdated
# Squash the multi index | ||
signal_paras.columns = ["_".join(c) for c in signal_paras.columns] | ||
|
||
# shift the last IC by 3 samples to make the second derivative work |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we make that conditional? I.e. only if the last Ic is really to close to the end?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And we should document this as an edgecase in the docstring of the Algorithm
tests/test_lrd/test_ml.py
Outdated
from gaitlink.lrd import LrdUllrich | ||
|
||
|
||
class TestMetaLrdUllrich(TestAlgorithmMixin): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tests look great.
I would suggest we add a regression test for the two models that we will keep (see comment above), so that in case we retrain in the future, we see if the results change.
As an example for the regression test see here:
I would suggest to use the reference ICs as input to LR algo for the test.
gaitlink/lrd/_lrd_ml.py
Outdated
|
||
@classmethod | ||
def _load_from_file(cls, model_name): | ||
base_dir = Path(__file__).parent / os.getenv('MODEL_DIR', 'pretrained_models') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you want to load files that are part of a package this should be done using from importlib.resources import files
. When a package is installed, it can be quite difficult to figure out what the correct file path is then. We use that approach in data_transform._filter.py
. To make that work with joblib, use joblib.loads
instead of load
to load the binary blob instead of the file.
gaitlink/lrd/_lrd_ml.py
Outdated
smoothing_filter: BaseFilter = cf(ButterworthFilter(order = 4, cutoff_freq_hz = (0.5, 2), filter_type="bandpass")) | ||
) -> None: | ||
super().__init__() | ||
if model is None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the same way you set the ButterworthFilter as default you can also just set the other paras.
But in this case, as you want to have the defaults from the predefined parameters you can use set_defaults
utility. Have a look here on how to use it: https://github.com/mobilise-d/gaitlink/blob/89b32f4330e68df81f5db81737bfecd11bfc66db/gaitlink/wba/_wb_assembly.py#L127
gaitlink/lrd/_lrd_ml.py
Outdated
""" | ||
if isinstance(model, ClassifierMixin): | ||
return model | ||
raise TypeError(f"Unknown model type {type(model)}. The model must be of type {ClassifierMixin}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The curly braces around ClassiefierMixin
might do strange things. You want the name of the class there right?
gaitlink/lrd/_lrd_ml.py
Outdated
self.ic_list = ic_list | ||
self.sampling_rate_hz = sampling_rate_hz | ||
|
||
if data.empty: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can be simplified to:
if data.empty or ic_list.empty
gaitlink/lrd/_lrd_ml.py
Outdated
mapping = {0: "left", 1: "right"} | ||
prediction_per_gs = prediction_per_gs.replace(mapping) | ||
|
||
self.ic_lr_list_ = pd.DataFrame({"ic": self.ic_list.values.flatten(), "lr_label": prediction_per_gs["lr_label"]}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.value
is deprecated. Use to_numpy
gaitlink/lrd/_lrd_ml.py
Outdated
all_features = pd.concat(features, axis=0, ignore_index=True) if len(features) > 1 else features[0] | ||
|
||
try: | ||
check_is_fitted(self.scaler) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CHeck is fitted is actually called internally by transform. So the error would happen naturally.
However, thinking about this. What is the scenario where I have an unfitted scalar but a fitted model? Isn't an unfitted scalar a sign that something went wrong?
gaitlink/lrd/_utils.py
Outdated
import pandas as pd | ||
|
||
|
||
def extract_ref_data(datapoint): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this can be simpliefied using the iter_gs
utility and the reference_parameters_relative_to_wb_
property of the dataset.
imu_data = datapoint.data["LowerBack"]
ref = datapoint.reference_parameters_relative_to_wb_
return zip(*[(data, ref.ic_list.loc[wb.id]["ic"], ref.ic_list.loc[wb.id]["lr_label"]) for wb, data in iter_gs(imu_data, ref.wb_list)])
```
(have not tested, just from the top of my head)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we actually delete all the files we don't need?
tests/test_lrd/test_ml.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you run the tests locally? A bunch of them are not passing :)
tests/test_lrd/test_ml.py
Outdated
assert (output["lr_label"] == ["right", "left", "right"]).all() | ||
|
||
# TODO: this needs checking | ||
class testRegression: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple of notes:
- This will only be recognized as a valid test if the name is uppercase ->
TestRegression
- If you wan to you can also compress this into one test, by also paramterizing the params. So applying the
@pytest.mark.parametrize("config_name, config", [("msp_all", LrdUllrich.PredefinedParameters.msproject_all), ("msp_ms", LrdUllrich.PredefinedParameters.msproject_ms)
in addition to the existing paramterize will run the test for all combinations of datasets and configs. I would suggest including the conig name in the name of the snapshotfile (second argument to snapshot.assert_match), so that the file can be easily identified later. - To keep things consistent, use the GsIterator as done in the other examples
TODO: Squash-Merge! |
…eper configuration
No description provided.