Refactor multiple warning levels for same data validation filter #461
Conversation
The PR looks good to me, but I'm wondering how the "skipping" will work in the follow-up PR? The current approach separates one validation-item with multiple criteria/warning-levels in separate items - how will the validation know that an alternative variant of the same item was already triggered? I'm wondering if a better solution wouldn't be the other direction:
Makes sense, and yes, that makes the descending-severity validation sequence easier to implement. One question: when translating the legacy format to the new format, the file may contain multiple (separate) criteria for the same validation filter, and these may appear in any order. Should we check whether an item of the same type has already been parsed and update it instead of creating a new one? That is, even if they start as separate legacy items, they end up aggregated in one object.
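The aggregation described above could be sketched roughly as follows. This is an illustrative sketch, not the PR's actual code; the keys (`variable`, `warning_level`, `validation`) and the default-to-error behaviour are assumptions based on the discussion:

```python
# Hypothetical sketch: merge separate legacy validation items that share the
# same filter into a single item with a list of warning-level criteria.
# Key names are illustrative, not the project's actual schema.
def aggregate_legacy_items(legacy_items):
    """Group legacy criteria by their filter so repeated filters are merged."""
    merged = {}
    for item in legacy_items:
        key = item["variable"]  # the shared validation filter
        level = item.get("warning_level", "error")  # default assumed here
        criteria = {
            k: v for k, v in item.items() if k not in ("variable", "warning_level")
        }
        if key not in merged:
            merged[key] = {"variable": key, "validation": []}
        # appending (rather than creating a new item) handles criteria that
        # appear in any order in the file
        merged[key]["validation"].append({"warning_level": level, **criteria})
    return list(merged.values())


legacy = [
    {"variable": "Primary Energy", "upper_bound": 10, "warning_level": "low"},
    {"variable": "Primary Energy", "upper_bound": 5},
]
aggregated = aggregate_legacy_items(legacy)
# both legacy items end up in one aggregated object
```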
Make it easy for us, implementation-wise:
Maybe I'm confused about the order of the validation criteria, but it would be useful to have another variant of the test where only the low-warning-level criterion is triggered...
Co-authored-by: Daniel Huppmann <[email protected]>
Fixed the warning-level skips: the DataFrame is now updated to remove rows already flagged for the same criterion (under a different warning level), and it is reset for each new criterion. I had to use pandas to manipulate the IamDataFrame, since I couldn't find a clean and convenient method in pyam to match/drop rows.
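The row-removal step could look something like the sketch below. This is a minimal illustration of the idea (not the PR's implementation), assuming the data has already been extracted to a plain pandas DataFrame with a `value` column:

```python
# Illustrative sketch: remove rows already flagged by a stricter warning
# level before checking the next (lower) level of the same criterion.
import pandas as pd

df = pd.DataFrame(
    {
        "model": ["m1", "m1", "m2"],
        "scenario": ["s1", "s2", "s1"],
        "value": [12.0, 6.0, 3.0],
    }
)

flagged = df[df["value"] > 10]      # rows caught by the stricter level
remaining = df.drop(flagged.index)  # only these rows are validated next
```

Dropping by index avoids re-matching on the metadata columns; for a fresh criterion, the loop would start again from the full DataFrame.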
Looks good to me overall, two minor suggestions inline
Two small comments below. The correct use of `ErrorCollector` should be a quick fix. After that, you can go ahead with the merge without another review from my side. Thanks for the work @dc-almeida.
```python
fail_list.append(
    textwrap.indent(str(failed_validation), prefix="  ") + "\n"
```

```python
per_item_df = df
for criterion in item.validation:
```
Too much for this PR, but as a future refactoring I'd suggest adding an `apply` function (or something along those lines) to `DataValidationCriteriaMultiple`. This way, this code would look something like:

```python
for item in self.criteria_items:
    item.apply(df)
```

and the rest is handled by `DataValidationCriteriaMultiple`.
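A hedged sketch of what such an `apply` method might look like, under the assumptions that criteria are stored most-severe-first and that each criterion is a simple upper-bound check on a `value` column (both simplifications; the real class and its criteria schema will differ):

```python
# Hypothetical sketch of the suggested refactoring: the per-item loop and
# the row-dropping logic move into the criteria container itself.
from dataclasses import dataclass, field

import pandas as pd


@dataclass
class DataValidationCriteriaMultiple:
    validation: list = field(default_factory=list)  # most severe first

    def apply(self, df):
        """Flag rows failing each criterion, skipping rows a stricter level caught."""
        flagged = []
        for criterion in self.validation:
            hits = df[df["value"] > criterion["upper_bound"]]
            flagged.append((criterion.get("warning_level", "error"), hits))
            df = df.drop(hits.index)  # lower levels never re-flag these rows
        return flagged


item = DataValidationCriteriaMultiple(
    validation=[
        {"warning_level": "error", "upper_bound": 10},
        {"warning_level": "low", "upper_bound": 5},
    ]
)
results = item.apply(pd.DataFrame({"value": [12.0, 6.0, 3.0]}))
```

The caller then only iterates over items and calls `apply`, as in the snippet above.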
Closes #439
Allows for a more compact writing of multiple warning-level criteria (if the warning level is omitted, it defaults to error). Backwards compatible with the more verbose syntax.
Stores the criteria for each level in descending order of severity (useful for an upcoming PR that skips validation for lower warning levels if a higher one has already failed).
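The descending-severity ordering can be achieved with a simple sort key. A minimal sketch, assuming a severity ranking like the one below (the level names and their ranking are illustrative):

```python
# Hypothetical severity ranking: lower number = more severe.
SEVERITY = {"error": 0, "high": 1, "medium": 2, "low": 3}

criteria = [
    {"warning_level": "low", "upper_bound": 5},
    {"warning_level": "error", "upper_bound": 10},
]
# sort in place so the most severe level is checked (and can skip others) first
criteria.sort(key=lambda c: SEVERITY[c.get("warning_level", "error")])
```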