Refactor multiple warning levels for same data validation filter #461

dc-almeida · 2025-01-21T09:59:55Z

Closes #439
Allows a more compact syntax for multiple warning-level criteria (if the warning level is omitted, it defaults to error). Backwards compatible with the more verbose syntax.
Stores the criteria for each level in descending order of severity (useful for an upcoming PR that skips validation for lower warning levels if a higher one failed).
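
A minimal Python sketch of the two behaviours described above (defaulting to error, then ordering by descending severity). The `Severity` enum, its weights, the function name, and the keys `warning_level` and `upper_bound` are assumptions made for illustration; they are not taken from the code in this PR.

```python
from enum import IntEnum


class Severity(IntEnum):
    # Hypothetical levels and weights; the package's own names may differ.
    low = 1
    medium = 2
    high = 3
    error = 4


def normalize_criteria(criteria):
    """Fill in the default level and sort from most to least severe."""
    # A criterion without an explicit level is treated as an error.
    filled = [{"warning_level": "error", **c} for c in criteria]
    return sorted(filled, key=lambda c: Severity[c["warning_level"]], reverse=True)


criteria = [
    {"warning_level": "low", "upper_bound": 2.5},
    {"upper_bound": 5.0},  # no level given
]
ordered = normalize_criteria(criteria)
```

With this ordering, a later step can walk the list front to back and stop at the first level that fails.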

danielhuppmann · 2025-01-21T11:05:03Z

The PR looks good to me, but I'm wondering how the "skipping" will work in the follow-up PR?

The current approach splits one validation-item with multiple criteria/warning-levels into separate items. How will the validation know that an alternative variant of the same item was already triggered?

I'm wondering if a better solution wouldn't be the other direction:

- If the legacy format is used, translate the validation-item to the new multiple-criteria variant.
- Extend the validation to go over the criteria within a validation-item until one is triggered, and write that one to the log.
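
The two steps suggested above could look roughly like the following Python sketch. It uses plain dictionaries rather than the package's classes; the key names (`validation`, `upper_bound`, `lower_bound`, `warning_level`) and both function names are assumptions for illustration only.

```python
def to_multiple(item):
    """Wrap a legacy item (bounds at the top level) into the multi-criteria form."""
    if "validation" in item:
        return item  # already in the new format
    bound_keys = {"upper_bound", "lower_bound", "warning_level"}
    criterion = {k: v for k, v in item.items() if k in bound_keys}
    filters = {k: v for k, v in item.items() if k not in bound_keys}
    return {**filters, "validation": [criterion]}


def first_triggered(item, value):
    """Return the warning level of the first criterion that `value` violates."""
    for criterion in to_multiple(item)["validation"]:
        too_high = value > criterion.get("upper_bound", float("inf"))
        too_low = value < criterion.get("lower_bound", float("-inf"))
        if too_high or too_low:
            # Stop here: less severe criteria further down are not checked.
            return criterion.get("warning_level", "error")
    return None


legacy = {"variable": "Primary Energy", "upper_bound": 5.0}
multi = {
    "variable": "Primary Energy",
    "validation": [
        {"upper_bound": 5.0},
        {"upper_bound": 2.5, "warning_level": "low"},
    ],
}
```

Because both formats end up in the same shape, the validation loop only needs to handle one case.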

dc-almeida · 2025-01-21T14:00:03Z

Makes sense, and yes, makes the descending severity validation sequence easier to implement.

Question: in translating the legacy format to the new format, say the file contains multiple (separate) criteria for the same validation filter, but these may appear in any order in the file. Should it be checked whether the same type of item has been parsed already, and that item be updated instead of creating a new one? That is, even if they start as separate legacy items, they end up aggregated in one object.

danielhuppmann · 2025-01-21T14:05:03Z

> In translating the legacy format to the new format, say the file contains multiple (separate) criteria for the same validation filter, but these may appear in any order in the file.

Make it easy for us, implementation-wise:

- Don't worry about overlaps of criteria in the simple format. If there are multiple criteria items that check the same thing, there will be multiple warnings.
- Instead of actively sorting the multiple criteria in one validation item, just add a validation step that requires the sub-items to be given in descending order of severity.
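
A check of this kind, which rejects a wrongly ordered list instead of sorting it, might be sketched as follows. The level names in `SEVERITY_ORDER` and the function name are hypothetical and only serve the example.

```python
SEVERITY_ORDER = ["error", "high", "medium", "low"]  # hypothetical level names


def check_descending(levels):
    """Raise if warning levels are not listed from most to least severe."""
    ranks = [SEVERITY_ORDER.index(level) for level in levels]
    if ranks != sorted(ranks):
        raise ValueError(
            f"Warning levels must be given in descending order of severity: {levels}"
        )


check_descending(["error", "medium", "low"])  # passes silently
```

Raising at parse time keeps the validation loop simple, since it can then rely on the order in the file.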

danielhuppmann

Maybe I'm confused about the order of the validation criteria? But it would be useful to have another variant of the test where only the low-warning-level criterion is triggered...

tests/test_cli.py

tests/data/validation/validate_data/validate_warning_joined_asc.yaml

tests/test_validate_data.py

Co-authored-by: Daniel Huppmann <[email protected]>

dc-almeida · 2025-01-29T14:23:45Z

Fixed the warning-level skipping: the DataFrame is now updated to remove rows that were already flagged by another warning level of the same criteria, and it is reset for each new set of criteria. I had to use pandas to manipulate the IamDataFrame, since I couldn't find a clean and convenient method in pyam to match/drop rows.
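
The row-dropping idea can be illustrated on a plain pandas DataFrame. This is only a sketch under assumed column names and bounds; it does not reproduce the PR's code and does not involve an IamDataFrame.

```python
import pandas as pd

data = pd.DataFrame(
    {
        "region": ["World", "World", "World"],
        "year": [2010, 2020, 2030],
        "value": [1.0, 3.0, 6.0],
    }
)

# Criteria for one validation item, most severe first (hypothetical bounds).
criteria = [("error", 5.0), ("low", 2.5)]

remaining = data  # start from the full data for each new validation item
flagged = {}
for level, upper_bound in criteria:
    failed = remaining[remaining["value"] > upper_bound]
    flagged[level] = failed
    # Drop rows already reported so a lower level does not flag them again.
    remaining = remaining.drop(index=failed.index)
```

Here the 2030 row is reported once as an error and is not reported again at the low level, even though it also exceeds the lower bound.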

tests/test_validate_data.py

danielhuppmann

Looks good to me overall, two minor suggestions inline

phackstock

Two small comments below. The correct use of ErrorCollector should be a quick fix. After that, you can go ahead with the merge without another review from my side. Thanks for the work @dc-almeida.

nomenclature/processor/data_validator.py

phackstock · 2025-01-30T11:24:44Z

nomenclature/processor/data_validator.py

-                    fail_list.append(
-                        textwrap.indent(str(failed_validation), prefix="    ") + "\n"
+                per_item_df = df
+                for criterion in item.validation:


Too much for this PR, but as a future refactoring I'd suggest adding an apply function (or something along those lines) to DataValidationCriteriaMultiple. That way, this code would look something like this:

for item in self.criteria_items:
    item.apply(df)

and the rest is handled by DataValidationCriteriaMultiple.
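
One possible shape for such a refactoring is sketched below. `CriteriaItem` is a hypothetical stand-in, not the real DataValidationCriteriaMultiple; the column names and the (warning level, upper bound) tuples are likewise assumptions for the example.

```python
from dataclasses import dataclass, field

import pandas as pd


@dataclass
class CriteriaItem:
    """Hypothetical stand-in for a validation item with several criteria."""

    variable: str
    # (warning_level, upper_bound) pairs, most severe first
    validation: list = field(default_factory=list)

    def apply(self, df):
        """Return {warning_level: failing rows}, reporting each row only once."""
        remaining = df[df["variable"] == self.variable]
        report = {}
        for level, upper_bound in self.validation:
            failed = remaining[remaining["value"] > upper_bound]
            if not failed.empty:
                report[level] = failed
                remaining = remaining.drop(index=failed.index)
        return report


df = pd.DataFrame(
    {
        "variable": ["Primary Energy", "Primary Energy", "Emissions|CO2"],
        "value": [3.0, 6.0, 9.0],
    }
)
items = [CriteriaItem("Primary Energy", [("error", 5.0), ("low", 2.5)])]
reports = [item.apply(df) for item in items]
```

The caller is then reduced to the loop over items, and the per-level bookkeeping lives with the item itself.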

David Almeida added 2 commits January 20, 2025 17:33

Refactor multiple warning levels for same data validation filter

b01834b

Merge remote-tracking branch 'upstream/main' into feature/data-valida…

316cc9b

…tor-multiple-warning-refactor

dc-almeida requested a review from danielhuppmann January 21, 2025 09:59

dc-almeida added the enhancement New feature or request label Jan 21, 2025

dc-almeida self-assigned this Jan 21, 2025

dc-almeida marked this pull request as ready for review January 21, 2025 10:06

David Almeida added 2 commits January 24, 2025 11:34

Coerce legacy format criteria into new DataValidatorCriteriaMultiple …

d598857

…class

Stop validation after higher severity fail; update tests

78a61ad

danielhuppmann reviewed Jan 27, 2025

Remove debugging print

6737493

Co-authored-by: Daniel Huppmann <[email protected]>

dc-almeida mentioned this pull request Jan 29, 2025

Remove None defaults from RegionAggregationMapping #466

Merged

Fix warning level skipping; add third variant to test

4a411e9

dc-almeida requested a review from phackstock January 29, 2025 14:21

dc-almeida requested a review from danielhuppmann January 29, 2025 14:23

danielhuppmann reviewed Jan 29, 2025

danielhuppmann reviewed Jan 29, 2025

danielhuppmann approved these changes Jan 29, 2025

David Almeida added 2 commits January 29, 2025 15:59

Apply suggested changes

c0b65d9

Remove ErrorCollector

20c6b4b

phackstock mentioned this pull request Jan 30, 2025

Documentation for DataValidator #467

Open

phackstock approved these changes Jan 30, 2025

dc-almeida merged commit 6ec1e19 into IAMconsortium:main Jan 30, 2025
11 checks passed

dc-almeida deleted the feature/data-validator-multiple-warning-refactor branch January 30, 2025 15:33
