
Add "Performance Index" for GSD #122

Open
AKuederle opened this issue Mar 26, 2024 · 19 comments
Labels: enhancement (New feature or request)

Comments

@AKuederle
Contributor

In https://preprints.jmir.org/preprint/50035/accepted and https://pubmed.ncbi.nlm.nih.gov/37316858/ we use variations of the weighted performance index introduced in https://www.mdpi.com/1424-8220/20/22/6509.

It would be nice to also implement that here. @felixkluge, you already implemented a custom version for GSD. Would it be feasible to port this?

@AKuederle added the enhancement label on Mar 26, 2024
@felixkluge
Contributor

felixkluge commented Mar 26, 2024 via email

@AKuederle
Contributor Author

@felixkluge Any updates on this? :)

@felixkluge
Contributor

No, not yet. Do you have a timeline in mind?

@AKuederle
Contributor Author

As soon as possible, by the end of June at the latest ;)

@felixkluge
Contributor

felixkluge commented Jun 6, 2024

Micó-Amigo (https://pubmed.ncbi.nlm.nih.gov/37316858/) used:

| Domain | Type | Metric | Weight |
| --- | --- | --- | --- |
| Gait sequence duration accuracy | cost | Mean of relative absolute duration error | 0.122 |
| Gait sequence duration accuracy | cost | Std. dev. of relative absolute duration error | 0.122 |
| ICC | benefit | ICC of gait sequence duration | 0.196 |
| Performance | benefit | Specificity | 0.178 |
| Performance | benefit | Accuracy | 0.160 |
| Performance | benefit | Sensitivity | 0.117 |
| Performance | benefit | Positive predictive value | 0.105 |

Kluge et al. (https://doi.org/10.2196/50035) used (see also Multimedia Appendix 1):

| Domain | Type | Metric | Weight |
| --- | --- | --- | --- |
| Gait sequence duration accuracy | cost | Mean of relative absolute duration error | 0.104 |
| Gait sequence duration accuracy | cost | Std. dev. of relative absolute duration error | 0.104 |
| ICC | benefit | ICC of gait sequence duration | 0.167 |
| Performance | benefit | Specificity | 0.151 |
| Performance | benefit | Accuracy | 0.135 |
| Performance | benefit | Sensitivity | 0.100 |
| Performance | benefit | Positive predictive value | 0.089 |
| Gait sequence fragmentation | cost | Relative absolute # gait sequence error | 0.150 |
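
Side note: in both tables the weights sum to 1.000. So if every aggregated metric score $s_i$ is normalized to $[0, 1]$, the index itself stays in $[0, 1]$ (notation mine, not from the papers):

$$P = \sum_i w_i \, s_i , \qquad \sum_i w_i = 1 .$$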

@felixkluge
Contributor

The latter work added the relative absolute # gait sequence error for the following reason (based on a thought experiment):

Assumptions

  • 20 s recording
  • One 10 s reference gait sequence
  • Algorithm detects nine 1 s gait sequences within 10 s reference GS
  • No other gait detected (outside reference GS)

-> The fragmentation of gait is not captured by the previously used metrics (which all look fine here), but it is captured by the new one:

| Metric | Outcome |
| --- | --- |
| Sensitivity | 9 s / 10 s = 90 % |
| Specificity | 10 s / 10 s = 100 % |
| PPV | 9 s / 9 s = 100 % |
| Accuracy | (9 + 10) s / 20 s = 95 % |
| GS duration error | 9 s - 10 s = -1 s |
| Relative absolute GS duration error | 10 % |
| # GS error | 9 - 1 = 8 |
| Relative absolute # GS error | 8 / 1 = 800 % |
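
For illustration, a minimal Python sketch (my own, not code from the package) that reproduces the numbers above at 1 s resolution:

```python
# Thought experiment: 20 s recording, one 10 s reference gait sequence (GS),
# algorithm detects nine 1 s gait sequences inside the reference GS.
total_s = 20
ref_gs_durations = [10]      # one reference GS of 10 s
det_gs_durations = [1] * 9   # nine detected GS of 1 s each

tp = sum(det_gs_durations)                 # 9 s correctly labelled as gait
fn = sum(ref_gs_durations) - tp            # 1 s of gait missed
fp = 0                                     # nothing detected outside the reference GS
tn = total_s - sum(ref_gs_durations) - fp  # 10 s correctly labelled as non-gait

sensitivity = tp / (tp + fn)     # 0.90
specificity = tn / (tn + fp)     # 1.00
ppv = tp / (tp + fp)             # 1.00
accuracy = (tp + tn) / total_s   # 0.95

duration_error = sum(det_gs_durations) - sum(ref_gs_durations)        # -1 s
rel_abs_duration_error = abs(duration_error) / sum(ref_gs_durations)  # 0.10
n_gs_error = len(det_gs_durations) - len(ref_gs_durations)            # 8
rel_abs_n_gs_error = abs(n_gs_error) / len(ref_gs_durations)          # 8.00 (800 %)
```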

@felixkluge
Contributor

As the relative absolute # GS error is not scaled and normalized yet, it will be considered as a "cost" factor as defined by Bonci et al. (2020):

[screenshot: cost-criterion normalization formula from Bonci et al. (2020)]

@felixkluge
Contributor

In general, parameters that are not yet normalized need to be normalized:

[screenshot: normalization formulas from Bonci et al. (2020)]

This concerns the GS duration errors and the # GS errors.
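
For reference, a min-max normalization of this kind (reconstructed from the discussion below, not copied from the screenshots) maps a raw metric value $x_i$ into $[0, 1]$ as

$$\hat{x}_i^{\text{benefit}} = \frac{x_i - \min(x)}{\max(x) - \min(x)}, \qquad \hat{x}_i^{\text{cost}} = \frac{\max(x) - x_i}{\max(x) - \min(x)},$$

with $\min(x)$ and $\max(x)$ taken over the observed values; this data dependence is exactly what the discussion below is about.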

@felixkluge
Contributor

felixkluge commented Jun 6, 2024

@AKuederle: Please check 9fa1b27, I added a new example for the performance index calculation.

Basically, we define a dictionary that specifies, per metric: a) which underlying score to use, b) which normalization to apply (cost/benefit/none), c) which aggregation to apply (mean/std/other), and d) the respective weight.

The performance index is then the sum of all metrics defined this way, each scaled by its weight.
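
A minimal sketch of the described structure (hypothetical names and example weights; the actual implementation in 9fa1b27 may differ):

```python
import numpy as np

# One entry per metric: underlying score, normalization, aggregation, weight.
PERFORMANCE_INDEX_CONFIG = {
    "recall_mean": {
        "metric": "single_recall",
        "normalization": None,    # already defined on [0, 1]
        "aggregation": np.mean,
        "weight": 0.117,
    },
    "rel_abs_duration_error_mean": {
        "metric": "rel_abs_duration_error",
        "normalization": "cost",  # min-max normalize, then flip (lower is better)
        "aggregation": np.mean,
        "weight": 0.122,
    },
}


def min_max(values: np.ndarray) -> np.ndarray:
    # Assumes max > min; degenerate ranges would need special handling.
    return (values - values.min()) / (values.max() - values.min())


def performance_index(scores: dict[str, np.ndarray]) -> float:
    """Weighted sum of the aggregated (and optionally normalized) metrics."""
    index = 0.0
    for spec in PERFORMANCE_INDEX_CONFIG.values():
        values = np.asarray(scores[spec["metric"]], dtype=float)
        if spec["normalization"] == "benefit":
            values = min_max(values)
        elif spec["normalization"] == "cost":
            values = 1 - min_max(values)
        index += spec["weight"] * spec["aggregation"](values)
    return index
```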

@AKuederle
Contributor Author

Great! I am a little concerned by the normalization. Shouldn't we scale by the theoretical limits instead of the observed limits? Otherwise, the performance index is not at all comparable between groups of participants.

E.g., let's say in the HA population the worst accuracy was 0.8 and the best accuracy was 0.9; then 0.8 will be considered a 0 (aka the worst possible score).
And if in another cohort the range is 0.3-0.8, then 0.8 will be considered a 1, i.e., the best possible score. Or am I missing something?

@felixkluge
Contributor

felixkluge commented Jun 7, 2024

For the standard metrics (precision, recall, ...), which are defined on the interval [0, 1], normalization is set to None, so no normalization is performed and the original values are used:

```python
...
"recall_mean": {
    "metric": "single_recall",
    "normalization": None,
    "aggregation": lambda x: np.mean(x),
    "weight": 0.117,
},
...
```

It might not be the most elegant way, as recall is actually a "benefit" criterion. Maybe an additional parameter should be added to separate "normalization" (True/False) and "criterion_type" ("cost"/"benefit"). E.g.,

```python
"recall_mean": {
    "metric": "single_recall",
    "criterion": "benefit",
    "normalization": False,
    "aggregation": lambda x: np.mean(x),
    "weight": 0.117,
}
```

Regarding the other metrics (e.g., those based on duration), there are no theoretical limits, so the normalization will be based on the data itself.

@AKuederle: Please check 68858d1.

@AKuederle
Contributor Author

I think for the other values we also need to "hard code" the value ranges to make them comparable. So we need to define what we consider best and worst, also for things like the duration error.

@felixkluge
Contributor

felixkluge commented Jun 7, 2024

Yes, it is kind of problematic to have this dynamic (data-dependent) normalization range, but this is how it was originally implemented.

The current definition of the performance index has this limitation:

  • If the underlying data are the same (e.g., when comparing different algorithms on the same dataset), a comparison of performance is possible.
  • It is currently not advisable to compare performance indices computed on different datasets.

The challenge is that the duration error might be unbounded, so it will be hard to hard-code theoretical (especially upper) limits.

We could add a capping or a transformation function (e.g., sigmoid or 1 - exp(-x)). This would be a change compared to the current Mobilise-D validation, but better in the long run, I suppose.

What do you think?

@AKuederle
Contributor Author

Do you think a sigmoid is required, or would "capping" be sufficient? Let's say for the duration error we define: no error (0 s) -> 1, worst case (60 s) -> 0. If the error is larger than 60 s, it is just a zero? But of course that also heavily depends on the length of the recording etc. ... difficult ...

@felixkluge
Contributor

felixkluge commented Jun 7, 2024

For a duration error within [0, inf), f(x) = 1 - exp(-x) could make sense (as a sigmoid would rather cover (-inf, inf)).
Not sure whether the transformation or capping (with an arbitrary threshold) makes more sense.
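
A sketch of both options side by side (function names, the 60 s cap from the comment above, and the time constant tau_s are all illustrative assumptions, not agreed-upon values):

```python
import numpy as np


def capped_score(abs_error_s: np.ndarray, worst_case_s: float = 60.0) -> np.ndarray:
    """Capping: 0 s error -> 1, errors >= worst_case_s -> 0, linear in between."""
    return np.clip(1 - abs_error_s / worst_case_s, 0.0, 1.0)


def exp_score(abs_error_s: np.ndarray, tau_s: float = 20.0) -> np.ndarray:
    """Exponential transform mapping [0, inf) onto (0, 1]; the "benefit" version
    of the proposed cost f(x) = 1 - exp(-x / tau)."""
    return np.exp(-abs_error_s / tau_s)


errors = np.array([0.0, 10.0, 60.0, 120.0])
print(capped_score(errors))  # ~[1.00, 0.83, 0.00, 0.00]
print(exp_score(errors))     # ~[1.00, 0.61, 0.05, 0.00]
```

The exponential variant avoids the hard threshold but still needs a scale parameter, so neither option removes the arbitrariness completely.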

@felixkluge
Contributor

In addition, if I understand correctly, we would then no longer be in line with the cost/benefit definition of Bonci et al.

@AKuederle
Contributor Author

> For a duration error within [0, inf), f(x) = 1 - exp(-x) could make sense (as a sigmoid would rather cover (-inf, inf)). Not sure whether the transformation or capping (with an arbitrary threshold) makes more sense.

That seems reasonable!

> In addition, if I understand correctly, we would then no longer be in line with the cost/benefit definition of Bonci et al.

Is the higher-level definition not as simple as: benefit -> best value = 1, cost -> best value = 0?

So it is either the value or 1 - value after normalization. I think it makes sense to mentally separate the normalization and the cost/benefit transformation :)
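
A compact sketch of that mental separation (hypothetical function names; 1a5d055 may structure it differently):

```python
import numpy as np


def normalize(values: np.ndarray) -> np.ndarray:
    """Step 1: min-max normalization into [0, 1].
    Skipped for metrics that are already defined on [0, 1]."""
    return (values - values.min()) / (values.max() - values.min())


def apply_criterion(values: np.ndarray, criterion: str) -> np.ndarray:
    """Step 2: cost/benefit transformation of normalized values.
    benefit -> keep the value; cost -> 1 - value, so the best
    (i.e., lowest) cost also contributes 1 to the index."""
    return values if criterion == "benefit" else 1 - values
```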

@felixkluge
Contributor

Makes sense. I decoupled the two and added the different normalization methods in 1a5d055.

@felixkluge
Contributor

felixkluge commented Jun 10, 2024

Hi @AKuederle, I created a corresponding pull request. The discussed changes should be included now. Please check.
