ENH: Control resampling at halfyear with origin #60928

Open
2 of 3 tasks
rwijtvliet opened this issue Feb 13, 2025 · 8 comments · May be fixed by #60946
Labels
Enhancement Frequency DateOffsets Resample resample method

Comments

@rwijtvliet

rwijtvliet commented Feb 13, 2025

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd
s1 = pd.Series(1, pd.date_range('2025', freq='D', periods=700)).resample('2QS-JAN').sum()
s2 = pd.Series(1, pd.date_range('2025-04', freq='D', periods=700)).resample('2QS-JAN').sum()

# s1 expectedly has timestamps in january and july
# s1
# 2025-01-01    181
# 2025-07-01    184
# 2026-01-01    181
# 2026-07-01    154
# Freq: 2QS-JAN, dtype: int64    # NB frequency

# but s2 unexpectedly has timestamps in april and october
# s2
# 2025-04-01    183
# 2025-10-01    182
# 2026-04-01    183
# 2026-10-01    152
# Freq: 2QS-JAN, dtype: int64    # NB frequency

s1.index.freq == s2.index.freq   # True

Issue Description

It seems there is no way to force where the period boundaries are when resampling at the 2-Quarter frequency. Resampling at 2QS-APR gives the same results for s1 and s2 as those shown above.

Expected Behavior

I'd expect the index of s2 to also have timestamps on the first of January and July.
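One possible workaround, sketched here rather than taken from the thread, is to bypass `resample` and group on a manually computed half-year start. It assumes calendar half-years beginning in January and July; `halfyear_start` is a hypothetical helper, not part of pandas.

```python
import numpy as np
import pandas as pd


def halfyear_start(index: pd.DatetimeIndex) -> np.ndarray:
    """Map each timestamp to Jan 1 or Jul 1 of its own year (hypothetical helper)."""
    # Months 1-6 belong to the January half-year, months 7-12 to the July one.
    month = np.where(index.month <= 6, 1, 7)
    parts = pd.DataFrame(
        {"year": index.year.to_numpy(), "month": month, "day": 1}
    )
    # pd.to_datetime assembles timestamps from year/month/day columns.
    return pd.to_datetime(parts).to_numpy()


s2 = pd.Series(1, pd.date_range("2025-04", freq="D", periods=700))

# Group on the computed labels instead of relying on the resample anchor.
halves = s2.groupby(halfyear_start(s2.index)).sum()
print(halves)
```

The labels now fall on January 1 and July 1 regardless of where the data starts; the first and last buckets are partial because the data begins in April and ends mid-half-year. The result has no `freq` set on its index, which is one trade-off of grouping by hand.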

Installed Versions

INSTALLED VERSIONS
------------------
commit : 0691c5c
python : 3.10.12
python-bits : 64
OS : Linux
OS-release : 6.9.3-76060903-generic
Version : #202405300957~1738770968~22.04~d5f7c84 SMP PREEMPT_DYNAMIC Wed F
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : C.UTF-8
LOCALE : en_US.UTF-8

pandas : 2.2.3
numpy : 1.26.4
pytz : 2024.2
dateutil : 2.9.0.post0
pip : 24.3.1
Cython : None
sphinx : 7.3.7
IPython : 8.29.0
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : 4.12.3
blosc : None
bottleneck : None
dataframe-api-compat : None
fastparquet : None
fsspec : None
html5lib : None
hypothesis : None
gcsfs : None
jinja2 : 3.1.4
lxml.etree : None
matplotlib : 3.9.2
numba : None
numexpr : None
odfpy : None
openpyxl : 3.1.5
pandas_gbq : None
psycopg2 : None
pymysql : None
pyarrow : None
pyreadstat : None
pytest : 8.3.3
python-calamine : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
tabulate : 0.9.0
xarray : None
xlrd : None
xlsxwriter : None
zstandard : None
tzdata : 2024.2
qtpy : None
pyqt5 : None

@rwijtvliet rwijtvliet added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Feb 13, 2025
@asifmohammed1

.

@rhshadrach
Member

Thanks for the report. I don't think the expectation is correct: it appears to me that pandas consistently resamples based on the first observation.

s = pd.Series(1, pd.date_range('2025-04-02', freq='D', periods=5))
print(s)
# 2025-04-02    1
# 2025-04-03    1
# 2025-04-04    1
# 2025-04-05    1
# 2025-04-06    1

print(s.resample('3D').sum())
# 2025-04-02    3
# 2025-04-05    2
# Freq: 3D, dtype: int64

As such, in your example, the first observation for s2 is in the April quarter, and pandas steps forward two quarters at a time from there.

You can control this for certain frequencies with origin, but it has no effect for quarters. It's not clear to me if that's because we cannot support it (e.g. it's ambiguous in certain cases), do not desire to support it (e.g. complexity), or just don't yet. Further investigations are welcome!
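To make the contrast concrete, here is a small sketch of `origin` shifting the bin edges for a fixed-width frequency such as `3D`. The quarterly comparison at the end only prints whether the two results match; per the comment above, `origin` is reported to have no effect there, so no output is asserted for it.

```python
import pandas as pd

s = pd.Series(1, pd.date_range("2025-04-02", freq="D", periods=5))

# Default: bins are anchored at midnight of the first observation's day.
default = s.resample("3D").sum()

# With an explicit origin, the 3-day bins are aligned to 2025-04-01 instead.
shifted = s.resample("3D", origin=pd.Timestamp("2025-04-01")).sum()
print(shifted)
# 2025-04-01    2
# 2025-04-04    3
# Freq: 3D, dtype: int64

# Same comparison for the half-year case from the original report.
half = pd.Series(1, pd.date_range("2025-04", freq="D", periods=700))
plain = half.resample("2QS-JAN").sum()
with_origin = half.resample("2QS-JAN", origin=pd.Timestamp("2025-01-01")).sum()
print(plain.equals(with_origin))
```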

@rhshadrach rhshadrach added Enhancement Needs Discussion Requires discussion from core team before further action Resample resample method Frequency DateOffsets and removed Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Feb 14, 2025
@rhshadrach rhshadrach changed the title BUG: resampling at halfyear interval gives unexpected frequency ENH: Control resampling at halfyear with origin Feb 14, 2025
@snitish
Contributor

snitish commented Feb 15, 2025

@rhshadrach we currently have a QuarterBegin offset where we can specify the starting month. Would adding a new HalfYearBegin offset, with a customizable starting month, solve OP's issue? It should be relatively simple to achieve this imo.

@snitish
Contributor

snitish commented Feb 15, 2025

Perhaps it's not a bad idea to add HalfYear offsets in general given we have Quarter and Year based offsets.
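For context, a short sketch of what the existing `QuarterBegin` offset does with the `2QS-JAN` alias. It uses only existing pandas calls (`to_offset`, `date_range`); the proposed half-year offset is not used here because it is not part of the pandas version in the report.

```python
import pandas as pd
from pandas.tseries.frequencies import to_offset

# "2QS-JAN" parses to a QuarterBegin offset with n=2 and startingMonth=1.
offset = to_offset("2QS-JAN")
print(type(offset).__name__, offset.n, offset.startingMonth)

# startingMonth=1 makes Jan/Apr/Jul/Oct 1 all valid anchor points, so a
# two-quarter step keeps whichever pair of months the range starts on.
from_jan = pd.date_range("2025-01-01", periods=3, freq=offset)
from_apr = pd.date_range("2025-04-01", periods=3, freq=offset)
print(from_jan.month.tolist())  # [1, 7, 1]
print(from_apr.month.tolist())  # [4, 10, 4]
```

This is why `2QS-JAN` alone cannot pin the boundaries to January and July: April 1 is just as valid a quarter start, and the two-quarter step carries on from there.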

@rhshadrach
Member

I'm positive on this, especially if it is a simple addition.

@snitish
Contributor

snitish commented Feb 15, 2025

take

@rhshadrach rhshadrach removed the Needs Discussion Requires discussion from core team before further action label Feb 15, 2025
@snitish snitish linked a pull request Feb 17, 2025 that will close this issue
6 tasks
@rwijtvliet
Author

rwijtvliet commented Feb 17, 2025

Thanks guys for this fast and constructive discussion. Very happy to see this get implemented; thanks @snitish for your contribution!

@snitish
Contributor

snitish commented Feb 17, 2025

@rwijtvliet you're welcome! However, please note - this change is still under review, so you might want to reopen the issue for tracking purposes (it'll be closed automatically after the PR is merged).

@rhshadrach rhshadrach reopened this Feb 17, 2025