
Add "Performance Index" for GSD #122

Open
AKuederle opened this issue Mar 26, 2024 · 19 comments
Labels: enhancement (New feature or request)

Comments

@AKuederle
Contributor

In https://preprints.jmir.org/preprint/50035/accepted and https://pubmed.ncbi.nlm.nih.gov/37316858/ we use variations of the weighted performance index introduced in https://www.mdpi.com/1424-8220/20/22/6509.

It would be nice to also implement that here. @felixkluge, you already implemented a custom version for GSD. Would it be feasible to port this?

@AKuederle added the enhancement label on Mar 26, 2024
@felixkluge
Contributor

felixkluge commented Mar 26, 2024 via email

@AKuederle
Contributor Author

@felixkluge Any updates on this? :)

@felixkluge
Contributor

No, not yet. Do you have a timeline in mind?

@AKuederle
Contributor Author

As soon as possible, by the end of June at the latest ;)

@felixkluge
Contributor

felixkluge commented Jun 6, 2024

Micó-Amigo (https://pubmed.ncbi.nlm.nih.gov/37316858/) used:

| Domain | Type | Metric | Weight |
| --- | --- | --- | --- |
| Gait sequence duration accuracy | cost | Mean of relative absolute duration error | 0.122 |
| Gait sequence duration accuracy | cost | Std. dev. of relative absolute duration error | 0.122 |
| ICC | benefit | ICC of gait sequence duration | 0.196 |
| Performance | benefit | Specificity | 0.178 |
| Performance | benefit | Accuracy | 0.160 |
| Performance | benefit | Sensitivity | 0.117 |
| Performance | benefit | Positive predictive value | 0.105 |

Kluge et al. (https://doi.org/10.2196/50035) used (see also Multimedia Appendix 1):

| Domain | Type | Metric | Weight |
| --- | --- | --- | --- |
| Gait sequence duration accuracy | cost | Mean of relative absolute duration error | 0.104 |
| Gait sequence duration accuracy | cost | Std. dev. of relative absolute duration error | 0.104 |
| ICC | benefit | ICC of gait sequence duration | 0.167 |
| Performance | benefit | Specificity | 0.151 |
| Performance | benefit | Accuracy | 0.135 |
| Performance | benefit | Sensitivity | 0.100 |
| Performance | benefit | Positive predictive value | 0.089 |
| Gait sequence fragmentation | cost | Relative absolute # gait sequence error | 0.150 |
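
Side note: in both tables the weights sum to 1.000. So if every aggregated metric score $s_i$ is normalized to $[0, 1]$, the index itself stays in $[0, 1]$ (notation mine, not from the papers):

$$P = \sum_i w_i \, s_i , \qquad \sum_i w_i = 1 .$$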

@felixkluge
Contributor

The latter work added the relative absolute # gait sequence error for the following reason (based on a thought experiment):

Assumptions

  • 20 s recording
  • One 10 s reference gait sequence
  • Algorithm detects nine 1 s gait sequences within 10 s reference GS
  • No other gait detected (outside reference GS)

-> The fragmentation of gait is not captured by the previously used metrics (which all look fine here), but it is captured by the new one:

| Metric | Outcome |
| --- | --- |
| Sensitivity | 9 s / 10 s = 90 % |
| Specificity | 10 s / 10 s = 100 % |
| PPV | 9 s / 9 s = 100 % |
| Accuracy | (9 + 10) s / 20 s = 95 % |
| GS duration error | 9 s - 10 s = -1 s |
| Relative absolute GS duration error | 10 % |
| # GS error | 9 - 1 = 8 |
| Relative absolute # GS error | 8 / 1 = 800 % |
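
For illustration, a minimal Python sketch (my own, not code from the package) that reproduces the numbers above at 1 s resolution:

```python
# Thought experiment: 20 s recording, one 10 s reference gait sequence (GS),
# algorithm detects nine 1 s gait sequences inside the reference GS.
total_s = 20
ref_gs_durations = [10]      # one reference GS of 10 s
det_gs_durations = [1] * 9   # nine detected GS of 1 s each

tp = sum(det_gs_durations)                 # 9 s correctly labelled as gait
fn = sum(ref_gs_durations) - tp            # 1 s of gait missed
fp = 0                                     # nothing detected outside the reference GS
tn = total_s - sum(ref_gs_durations) - fp  # 10 s correctly labelled as non-gait

sensitivity = tp / (tp + fn)     # 0.90
specificity = tn / (tn + fp)     # 1.00
ppv = tp / (tp + fp)             # 1.00
accuracy = (tp + tn) / total_s   # 0.95

duration_error = sum(det_gs_durations) - sum(ref_gs_durations)        # -1 s
rel_abs_duration_error = abs(duration_error) / sum(ref_gs_durations)  # 0.10
n_gs_error = len(det_gs_durations) - len(ref_gs_durations)            # 8
rel_abs_n_gs_error = abs(n_gs_error) / len(ref_gs_durations)          # 8.00 (800 %)
```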

@felixkluge
Contributor

As the relative absolute # GS error is not scaled and normalized yet, it will be considered as a "cost" factor as defined by Bonci et al. (2020):

[screenshot: cost-criterion normalization formula from Bonci et al. (2020)]

@felixkluge
Contributor

In general, parameters that are not yet normalized need to be normalized:

[screenshot: normalization formulas from Bonci et al. (2020)]

This concerns the GS duration errors and the # GS errors.
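
For reference, a min-max normalization of this kind (reconstructed from the discussion below, not copied from the screenshots) maps a raw metric value $x_i$ into $[0, 1]$ as

$$\hat{x}_i^{\text{benefit}} = \frac{x_i - \min(x)}{\max(x) - \min(x)}, \qquad \hat{x}_i^{\text{cost}} = \frac{\max(x) - x_i}{\max(x) - \min(x)},$$

with $\min(x)$ and $\max(x)$ taken over the observed values; this data dependence is exactly what the discussion below is about.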

@felixkluge
Contributor

felixkluge commented Jun 6, 2024

@AKuederle: Please check 9fa1b27, I added a new example for the performance index calculation.

Basically, we define a dictionary that specifies, per metric: a) which underlying score to use, b) which normalization to apply (cost/benefit/none), c) which aggregation to apply (mean/std/other), and d) the respective weight.

The performance index is then the sum of all metrics defined this way, each scaled by its weight.
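
A minimal sketch of the described structure (hypothetical names and example weights; the actual implementation in 9fa1b27 may differ):

```python
import numpy as np

# One entry per metric: underlying score, normalization, aggregation, weight.
PERFORMANCE_INDEX_CONFIG = {
    "recall_mean": {
        "metric": "single_recall",
        "normalization": None,    # already defined on [0, 1]
        "aggregation": np.mean,
        "weight": 0.117,
    },
    "rel_abs_duration_error_mean": {
        "metric": "rel_abs_duration_error",
        "normalization": "cost",  # min-max normalize, then flip (lower is better)
        "aggregation": np.mean,
        "weight": 0.122,
    },
}


def min_max(values: np.ndarray) -> np.ndarray:
    # Assumes max > min; degenerate ranges would need special handling.
    return (values - values.min()) / (values.max() - values.min())


def performance_index(scores: dict[str, np.ndarray]) -> float:
    """Weighted sum of the aggregated (and optionally normalized) metrics."""
    index = 0.0
    for spec in PERFORMANCE_INDEX_CONFIG.values():
        values = np.asarray(scores[spec["metric"]], dtype=float)
        if spec["normalization"] == "benefit":
            values = min_max(values)
        elif spec["normalization"] == "cost":
            values = 1 - min_max(values)
        index += spec["weight"] * spec["aggregation"](values)
    return index
```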

@AKuederle
Contributor Author

Great! I am a little concerned by the normalization. Shouldn't we scale by the theoretical limits instead of the observed limits? Otherwise, the performance index is not at all comparable between groups of participants.

E.g., let's say in the HA population the worst accuracy was 0.8 and the best accuracy was 0.9; then 0.8 will be considered a 0 (aka the worst possible score).
And if in another cohort the range is 0.3-0.8, then 0.8 will be considered a 1, i.e., the best possible score. Or am I missing something?

@felixkluge
Contributor

felixkluge commented Jun 7, 2024

For the standard metrics (precision, recall, ...), which are defined on the interval [0, 1], normalization is set to None, so no normalization is performed and the original values are used:

```python
...
"recall_mean": {
    "metric": "single_recall",
    "normalization": None,
    "aggregation": lambda x: np.mean(x),
    "weight": 0.117,
},
...
```

It might not be the most elegant way, as recall is actually a "benefit" criterion. Maybe an additional parameter should be added to separate "normalization" (True/False) and "criterion_type" ("cost"/"benefit"). E.g.,

```python
"recall_mean": {
    "metric": "single_recall",
    "criterion": "benefit",
    "normalization": False,
    "aggregation": lambda x: np.mean(x),
    "weight": 0.117,
}
```

Regarding the other metrics (e.g., those based on duration), there are no theoretical limits, so the normalization will be based on the data itself.

@AKuederle: Please check 68858d1.

@AKuederle
Contributor Author

I think for the other values we also need to "hard code" the value ranges to make them comparable. So we need to define what we consider best and worst, also for things like the duration error.

@felixkluge
Contributor

felixkluge commented Jun 7, 2024

Yes, it is kind of problematic to have this dynamic (data-dependent) normalization range, but this is how it was originally implemented.

The current definition of the performance index has this limitation:

  • If the underlying data are the same (e.g., when comparing different algorithms on the same dataset), a comparison of performance is possible.
  • It is currently not advisable to compare performance indices computed on different datasets.

The challenge is that the duration error might be unbounded, so it will be hard to hard-code theoretical (especially upper) limits.

We could add a capping or a transformation function (e.g., sigmoid or 1 - exp(-x)). This would be a change compared to the current Mobilise-D validation, but better in the long run, I suppose.

What do you think?

@AKuederle
Contributor Author

Do you think a sigmoid is required, or would "capping" be sufficient? Let's say for the duration error we define: no error (0 s) -> 1, worst case (60 s) -> 0. If the error is larger than 60 s, it is just a zero? But of course that also heavily depends on the length of the recording etc. ... difficult ...

@felixkluge
Contributor

felixkluge commented Jun 7, 2024

For a duration error within [0, inf), f(x) = 1 - exp(-x) could make sense (as a sigmoid would rather cover (-inf, inf)).
Not sure whether the transformation or capping (with an arbitrary threshold) makes more sense.
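
A sketch of both options side by side (function names, the 60 s cap from the comment above, and the time constant tau_s are all illustrative assumptions, not agreed-upon values):

```python
import numpy as np


def capped_score(abs_error_s: np.ndarray, worst_case_s: float = 60.0) -> np.ndarray:
    """Capping: 0 s error -> 1, errors >= worst_case_s -> 0, linear in between."""
    return np.clip(1 - abs_error_s / worst_case_s, 0.0, 1.0)


def exp_score(abs_error_s: np.ndarray, tau_s: float = 20.0) -> np.ndarray:
    """Exponential transform mapping [0, inf) onto (0, 1]; the "benefit" version
    of the proposed cost f(x) = 1 - exp(-x / tau)."""
    return np.exp(-abs_error_s / tau_s)


errors = np.array([0.0, 10.0, 60.0, 120.0])
print(capped_score(errors))  # ~[1.00, 0.83, 0.00, 0.00]
print(exp_score(errors))     # ~[1.00, 0.61, 0.05, 0.00]
```

The exponential variant avoids the hard threshold but still needs a scale parameter, so neither option removes the arbitrariness completely.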

@felixkluge
Contributor

In addition, if I understand correctly, we would then no longer be in line with the cost/benefit definition of Bonci et al.

@AKuederle
Contributor Author

> For a duration error within [0, inf), f(x) = 1 - exp(-x) could make sense (as a sigmoid would rather cover (-inf, inf)). Not sure whether the transformation or capping (with an arbitrary threshold) makes more sense.

That seems reasonable!

> In addition, if I understand correctly, we would then no longer be in line with the cost/benefit definition of Bonci et al.

Is the higher-level definition not as simple as: benefit -> best value = 1, cost -> best value = 0?

So it is either the value or 1 - value after normalization. I think it makes sense to mentally separate the normalization and the cost/benefit transformation :)
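
A compact sketch of that mental separation (hypothetical function names; 1a5d055 may structure it differently):

```python
import numpy as np


def normalize(values: np.ndarray) -> np.ndarray:
    """Step 1: min-max normalization into [0, 1].
    Skipped for metrics that are already defined on [0, 1]."""
    return (values - values.min()) / (values.max() - values.min())


def apply_criterion(values: np.ndarray, criterion: str) -> np.ndarray:
    """Step 2: cost/benefit transformation of normalized values.
    benefit -> keep the value; cost -> 1 - value, so the best
    (i.e., lowest) cost also contributes 1 to the index."""
    return values if criterion == "benefit" else 1 - values
```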

@felixkluge
Contributor

Makes sense. I decoupled the two and added the different normalization methods in 1a5d055.

@felixkluge
Contributor

felixkluge commented Jun 10, 2024

Hi @AKuederle, I created a corresponding pull request. The discussed changes should be included now. Please check.
