Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Temporary workaround for pandas and xarray frequency changes #1540

Closed
wants to merge 2 commits into from

Conversation

aulemahal
Copy link
Collaborator

Pull Request Checklist:

  • This PR addresses an already opened issue (for bug fixes / features)
  • Tests for the changes have been added (for bug fixes / features)
    • (If applicable) Documentation has been added / updated (for bug fixes / features)
  • CHANGES.rst has been updated (with summary of main changes)
    • Link to issue (:issue:number) and pull request (:pull:number) has been added

This PR became a bit more complicated when I realized that xarray's latest release implements the "new" standards, but pandas 2.2 has not yet been released. And some of those changes are not implemented at all yet in pandas1. I.e. pd.tseries.frequencies.to_offset(xr.infer_freq(da)) sometimes raises an error.

What kind of change does this PR introduce?

  • New function fix_freq : it takes in a frequency code and translates it to the appropriate syntax. There are, sadly, three cases:
    • xarray >= 2023.11.0, pandas >= 2.2 : The "new" syntax (see below). Some changes are not yet supported by pandas 2.1.
    • xarray >= 2023.11.0, but pandas < 2.2 : The "inter[mediate]" syntax. Changes the subdaily codes, but not the coarser ones.
    • xarray < 2023.11.0, pandas < 2.2 : The "old" syntax, the one you know.

By « syntax » I mean the default codes. The "inter" version is supported by all previous xarray versions and pandas >= 1.0.

  • All hard-coded frequencies have been adapted for the "inter" syntax. (in the code, tests and notebooks)

  • fix_freq has been added everywhere conflict was possible

  • Raised the minimum pin on pandas so we can at least use the "inter" codes and not have a fourth category.

My goal here was to reduce the number of warnings for users. I wanted to avoid xclim's default to trigger user warnings or even errors. I don't think I achieved that goal, but I hope the transition will be smoother nonetheless.

Does this PR introduce a breaking change?

For sure. I think a normal "beginner" usage of xclim would be unchanged, but I'm not so sure about things like xscen or custom indicator modules. But some of that is broken anyway with xarray 2023.11.0 and would further break with pandas 2.2. Hopefully, we are able to control the damage with this PR and makes minimum breaking changes here only.

Other information:

The long term goal is to remove fix_freq when we pin pandas >= 2.2. This PR also introduces a few workarounds to remove at the moment, all denoted with "FIXME" comments.

Major differences

old inter new
AS-JAN AS-JAN YS-JAN2
A-JAN A-JAN YE-JAN
Q-DEC Q-DEC QE-DEC
M M ME
H h h
T min min
S s s

Footnotes

  1. and I was a reviewer on that PR in xarray. Woopsie!

  2. The "new" syntax is not yet supported by pandas (< 2.2), but xarray (>= 2023.11) will warn and return that in xr.infer_freq.

Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@github-actions github-actions bot added sdba Issues concerning the sdba submodule. docs Improvements to documenation indicators Climate indices and indicators labels Nov 28, 2023
Copy link
Collaborator

@Zeitsperre Zeitsperre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well played. What are the chances that we could push the freq fixing stuff upstream to xarray? This change must be affecting tons of users in addition to us.

Comment on lines +809 to +821
def _get_freq_version(version: str | None = None) -> str:
if version is None:
if Version("2023.11.0") <= Version(xr.__version__) and Version(
"2.2"
) <= Version(pd.__version__):
version = "new"
elif Version("2023.11.0") <= Version(xr.__version__) or Version(
"2.2"
) <= Version(pd.__version__):
version = "inter"
else:
version = "old"
return version
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the sake of not having version written 20 times, I would substitute it for convention.

base = base[:-1]
if base not in "YQAM":
raise ValueError(
f"Invalid frequency: {freq}. Base {base[:-1]} can't have an start (S) or end (E) anchor."
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
f"Invalid frequency: {freq}. Base {base[:-1]} can't have an start (S) or end (E) anchor."
f"Invalid frequency: {freq}. Base {base[:-1]} cannot have a start (S) or end (E) anchor."

anchor is not None and anchor not in _MONTH_ABBREVIATIONS.values()
):
raise ValueError(
f"Invalid frequency: {freq}, not fit for usage with xclim, xarray and cftime. "
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
f"Invalid frequency: {freq}, not fit for usage with xclim, xarray and cftime. "
f"Invalid frequency: `{freq}` not fit for usage with installed xclim, xarray, and cftime. "

base: str,
start_anchored: bool,
anchor: str | None,
version: str | None = None,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
version: str | None = None,
convention: str | None = None,

}


# TODO: This can be removed when we pin pandas >= 2.2
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can integrate a small helper function here to check what the required pins are?

Something like:

from importlib.metadata import distribution

for req in distribution("xclim").requires:
    if req.split(";")[0].startswith("pandas>=2.2"):
        raise FixMeOrSomethingPleaseException()

@github-actions github-actions bot added the approved Approved for additional tests label Nov 28, 2023
@aulemahal
Copy link
Collaborator Author

Closed as this seems to complicated. A drastic change in the pins seems cleaner.

@aulemahal aulemahal closed this Nov 29, 2023
@aulemahal aulemahal deleted the new-freq branch May 6, 2024 15:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Approved for additional tests docs Improvements to documenation indicators Climate indices and indicators sdba Issues concerning the sdba submodule.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Adopt new pandas frequency codes
2 participants