
Lr ml #106

Merged · 31 commits merged into main from lr_ml · Apr 17, 2024

Conversation

AKuederle
Contributor

No description provided.

from gaitlink.lrd import LrdUllrich


# TODO: this might be removed in the future, but I think it makes life easier, as people might not be familiar with tpcp.

I still don't like having that as a separate thing. I think it adds very little value and just removes flexibility.

But, I fully agree people need some guidance on how to do the optimization.

Can we do an example instead that shows, step by step, how to retrain or hyperparameter-optimize the model?

That would also be great to have, in case we need to retrain any of the models in the future.
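A minimal sketch of what such a retraining example might cover, using plain sklearn. The feature extraction and the gaitlink/tpcp wrapping are omitted; the pipeline stages and the random stand-in data are illustrative, not the actual LrdUllrich internals.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

rng = np.random.default_rng(42)
X = rng.normal(size=(60, 6))     # stand-in for per-IC feature rows
y = rng.integers(0, 2, size=60)  # stand-in for left/right labels

# Retrain the scaler + classifier jointly and grid-search one hyperparameter.
pipe = Pipeline([("scaler", MinMaxScaler()), ("clf", SVC())])
grid = GridSearchCV(pipe, {"clf__C": [0.1, 1.0, 10.0]}, cv=3)
grid.fit(X, y)
print(grid.best_params_)
```

A real example would replace the random arrays with features extracted from the reference data and wrap the search in the tpcp optimization interface.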

algo = LrdUllrich(**test_params)
data = pd.DataFrame([], columns=["acc_x", "acc_y", "acc_z", "gyr_x", "gyr_y", "gyr_z"])
ic_list = pd.DataFrame({"ic": []})
with pytest.raises(ValueError):

Empty input should not raise, but produce empty output.
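A sketch of the suggested behaviour (the function name and result schema are assumed for illustration, not the actual gaitlink API): on empty input, return an empty result with the expected columns instead of raising.

```python
import pandas as pd

def predict_lr(data: pd.DataFrame, ic_list: pd.DataFrame) -> pd.DataFrame:
    if data.empty or ic_list.empty:
        # Empty in, empty out - same schema as the normal result.
        return pd.DataFrame({"ic": pd.Series(dtype=int),
                             "lr_label": pd.Series(dtype=object)})
    raise NotImplementedError("real prediction omitted in this sketch")

empty = predict_lr(pd.DataFrame(), pd.DataFrame({"ic": []}))
print(empty.empty, list(empty.columns))
```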


Can we limit the number of models to the one that we actually need for now? A big problem will be reproducibility and usability of the models with future versions of sklearn. We need to somehow still ensure that the models can be retrained in the future. Hence, best to limit what data is required for that. See #89 for details. I think for now we only need the ms_project_all and ms_project_ms model.

algo = LrdUllrich(**params)
data = pd.DataFrame(np.zeros((100, 6)), columns=["acc_x", "acc_y", "acc_z", "gyr_x", "gyr_y", "gyr_z"])
ic_list = pd.DataFrame({"ic": []})
with pytest.raises(ValueError):

Same as above

"model": model, "scaler": scaler}


def __init__(self,

Use proper defaults instead of doing the None check later.

self.scaler = MinMaxScaler()

# Fit the scaler if it hasn't been fitted yet, otherwise just transform the data
if not hasattr(self.scaler, 'scale_'):

Might be more explicit to use sklearn's check_is_fitted helper.
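What that could look like: `check_is_fitted` raises `NotFittedError` on an unfitted estimator, so the fit-or-transform decision becomes explicit instead of probing for the `scale_` attribute by hand. The wrapper function here is illustrative.

```python
import numpy as np
from sklearn.exceptions import NotFittedError
from sklearn.preprocessing import MinMaxScaler
from sklearn.utils.validation import check_is_fitted

def scale(features: np.ndarray, scaler: MinMaxScaler) -> np.ndarray:
    try:
        check_is_fitted(scaler)
    except NotFittedError:
        return scaler.fit_transform(features)  # first call: fit, then transform
    return scaler.transform(features)          # already fitted: transform only

scaler = MinMaxScaler()
out1 = scale(np.array([[0.0], [2.0]]), scaler)  # fits the scaler
out2 = scale(np.array([[1.0]]), scaler)         # reuses the existing fit
print(out1.ravel(), out2.ravel())
```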

ics = ics.copy()

# Apply Butterworth filtering and extract the first and second derivatives.
gyr = data[["gyr_x", "gyr_z"]].rename(columns={"gyr_x": "v", "gyr_z": "ap"})

Do we need this renaming? The explicit names are not really used anywhere, right?

# Squash the multi index
signal_paras.columns = ["_".join(c) for c in signal_paras.columns]

# shift the last IC by 3 samples to make the second derivative work

Can we make that conditional? I.e. only shift if the last IC is really too close to the end?
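A stdlib sketch of that conditional version. The margin of 3 samples is taken from the "shift by 3" in the diff; the function name and list-based IC representation are illustrative.

```python
def clamp_last_ic(ics: list[int], n_samples: int, margin: int = 3) -> list[int]:
    """Shift the last IC only when it sits too close to the signal end."""
    if ics and ics[-1] > n_samples - 1 - margin:
        ics = ics[:-1] + [n_samples - 1 - margin]
    return ics

print(clamp_last_ic([120, 998], 1000))  # last IC too close -> [120, 996]
print(clamp_last_ic([120, 500], 1000))  # far from the end -> unchanged
```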


And we should document this as an edge case in the docstring of the algorithm.

from gaitlink.lrd import LrdUllrich


class TestMetaLrdUllrich(TestAlgorithmMixin):

Tests look great.

I would suggest we add a regression test for the two models that we will keep (see comment above), so that in case we retrain in the future, we see if the results change.

As an example for the regression test see here:

https://github.com/mobilise-d/gaitlink/blob/1a750978690a991ac40c367d17f624b5c63f9071/tests/test_icd/test_icd_ionescu.py#L54

I would suggest using the reference ICs as input to the LR algo for the test.


@classmethod
def _load_from_file(cls, model_name):
base_dir = Path(__file__).parent / os.getenv('MODEL_DIR', 'pretrained_models')

If you want to load files that are part of a package, this should be done using from importlib.resources import files. When a package is installed, it can be quite difficult to figure out what the correct file path is otherwise. We use that approach in data_transform._filter.py. To make that work with joblib, open the resource as a binary file object and pass that to joblib.load, instead of passing a filesystem path.
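An illustrative, self-contained version of that pattern. A throwaway package is built on the fly so the snippet runs anywhere; in gaitlink the package and model file names would differ, and the opened resource would be fed to joblib.load rather than pickle.

```python
import pathlib
import pickle
import sys
import tempfile
from importlib.resources import files

# Build a throwaway package with a bundled binary blob standing in for a
# pretrained model file (names here are illustrative).
tmp = tempfile.mkdtemp()
pkg_dir = pathlib.Path(tmp) / "demo_pkg"
pkg_dir.mkdir()
(pkg_dir / "__init__.py").write_text("")
(pkg_dir / "model.bin").write_bytes(pickle.dumps({"coef": [1.0, 2.0]}))
sys.path.insert(0, tmp)

# Resolve the resource relative to the package, not the filesystem, so it
# keeps working after the package is installed (e.g. from a wheel).
with files("demo_pkg").joinpath("model.bin").open("rb") as f:
    model = pickle.load(f)
print(model["coef"])
```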

smoothing_filter: BaseFilter = cf(ButterworthFilter(order=4, cutoff_freq_hz=(0.5, 2), filter_type="bandpass"))
) -> None:
super().__init__()
if model is None:

In the same way you set the ButterworthFilter as default you can also just set the other paras.

But in this case, as you want to have the defaults from the predefined parameters, you can use the set_defaults utility. Have a look here to see how to use it: https://github.com/mobilise-d/gaitlink/blob/89b32f4330e68df81f5db81737bfecd11bfc66db/gaitlink/wba/_wb_assembly.py#L127

"""
if isinstance(model, ClassifierMixin):
return model
raise TypeError(f"Unknown model type {type(model)}. The model must be of type {ClassifierMixin}")

The curly braces around ClassifierMixin might do strange things. You want the name of the class there, right?
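Concretely: interpolating a class object into an f-string prints its repr (`<class '...'>`), while `__name__` gives just the class name. A local stand-in class is used here instead of importing sklearn's `ClassifierMixin`.

```python
class ClassifierMixin:  # stand-in for sklearn.base.ClassifierMixin
    pass

msg_repr = f"The model must be of type {ClassifierMixin}"
msg_name = f"The model must be of type {ClassifierMixin.__name__}"
print(msg_repr)
print(msg_name)
```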

self.ic_list = ic_list
self.sampling_rate_hz = sampling_rate_hz

if data.empty:

Can be simplified to:

if data.empty or ic_list.empty:

mapping = {0: "left", 1: "right"}
prediction_per_gs = prediction_per_gs.replace(mapping)

self.ic_lr_list_ = pd.DataFrame({"ic": self.ic_list.values.flatten(), "lr_label": prediction_per_gs["lr_label"]})

.values is discouraged by the pandas docs. Use .to_numpy() instead.
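The suggested replacement, with column names mirroring the diff above:

```python
import pandas as pd

ic_list = pd.DataFrame({"ic": [10, 40, 75]})
# .to_numpy() is the documented way to get the underlying array.
flat = ic_list.to_numpy().flatten()
print(flat.tolist())  # [10, 40, 75]
```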

all_features = pd.concat(features, axis=0, ignore_index=True) if len(features) > 1 else features[0]

try:
check_is_fitted(self.scaler)

check_is_fitted is actually called internally by transform, so the error would happen naturally.

However, thinking about this: what is the scenario where I have an unfitted scaler but a fitted model? Isn't an unfitted scaler a sign that something went wrong?

import pandas as pd


def extract_ref_data(datapoint):

I think this can be simplified using the iter_gs utility and the reference_parameters_relative_to_wb_ property of the dataset.

    imu_data = datapoint.data["LowerBack"]
    ref = datapoint.reference_parameters_relative_to_wb_

    return zip(
        *[
            (data, ref.ic_list.loc[wb.id]["ic"], ref.ic_list.loc[wb.id]["lr_label"])
            for wb, data in iter_gs(imu_data, ref.wb_list)
        ]
    )

(have not tested, just off the top of my head)


Can we actually delete all the files we don't need?


Can you run the tests locally? A bunch of them are not passing :)

assert (output["lr_label"] == ["right", "left", "right"]).all()

# TODO: this needs checking
class testRegression:

A couple of notes:

  • This will only be recognized as a valid test if the name is uppercase -> TestRegression
  • If you want to, you can also compress this into one test by also parametrizing the params. Applying @pytest.mark.parametrize("config_name, config", [("msp_all", LrdUllrich.PredefinedParameters.msproject_all), ("msp_ms", LrdUllrich.PredefinedParameters.msproject_ms)]) in addition to the existing parametrize will run the test for all combinations of datasets and configs. I would suggest including the config name in the name of the snapshot file (second argument to snapshot.assert_match), so that the file can be easily identified later.
  • To keep things consistent, use the GsIterator as done in the other examples

@AKuederle

TODO: Squash-Merge!

@AKuederle AKuederle merged commit 6eb8a8d into main Apr 17, 2024
9 checks passed
@AKuederle AKuederle deleted the lr_ml branch April 17, 2024 13:17