Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

non-uniform datetime type #292

Open
JGuetschow opened this issue Nov 11, 2024 · 6 comments
Open

non-uniform datetime type #292

JGuetschow opened this issue Nov 11, 2024 · 6 comments
Labels
bug Something isn't working

Comments

@JGuetschow
Copy link
Contributor

JGuetschow commented Nov 11, 2024

Describe the bug

Depending on the function the format of the returned time coordinates seems to differ. While a dataset created with e.g. pd.daterange() has the type np.datatime64[ns] , the return value of test_ts.coords['time'].loc[{'time': [np.datetime64('1970-01-01')]}] has the datatype np.datetime64[D]

Failing Test
Below a test I constructed to show the behaviour. It fails because (I think) the internal date conversion of xr.polyfit and xr.polyval use the format of the np.datetime64 variables to convert the dates to integers used for the actual fitting and evaluation. The example below works if the dates are manually specified as having nanosecond resolution.

import numpy as np
import pandas as pd
import pytest
import xarray as xr
from primap2.tests.utils import assert_aligned_equal

def test_temp_polyval():
    test_ts = xr.DataArray(
        np.linspace(6, 12, 11),
        coords={"time": pd.date_range("1970-01-01", "1980-01-01", freq="YS")},
        dims="time",
        name="test_ts",
    )

    fit = test_ts.polyfit(dim='time', deg=1, skipna=True)
    value = xr.polyval(
        test_ts.coords['time'].loc[{'time': [np.datetime64('1970-01-01')]}],  # specify 'ns' for test to pass
        fit.polyfit_coefficients
    )

    value.name='test_ts' # for assertion

    assert_aligned_equal(
        test_ts.loc[{'time': [np.datetime64('1970-01-01')]}],  # specify 'ns' for test to pass
        value,
        rtol=1e-04,
    )

Expected behavior

It would be good if the date resolution is fixed without having to specify it manually. It would also be good if it's e.g. days and not nanoseconds (but not necessary unless nanoseconds has a negative impact on the fits)

System (please complete the following information):

Linux mint 22

Additional context

In the filling strategy code I'm currently writing I use nanosecond for now, because that seems to be the default behaviour of pd.date_range. But a pr.loc[{'time': ['1999-01-01]}] gives days back, so it's not a general solution to just be specific in the new code.

@JGuetschow JGuetschow added the bug Something isn't working label Nov 11, 2024
@JGuetschow
Copy link
Contributor Author

I think this could also be considered a bug in xarray, unless I'm doing something wrong.

@JGuetschow
Copy link
Contributor Author

The fix for the test above only works for the specific dates in the test. 1970-01-01 is special as it is zero for the internal time. If we replace 1970-1980 it by e.g. 1956-1966 the test fails.

@JGuetschow
Copy link
Contributor Author

I found the reason for the problem. It's not the time resolution as I first suspected but a coordinate recomputation in xr.polyfit. The function _floatize_x shifts the time coordinate such that the minimal value is 0 to avoid accuracy problems as np.datetime64 uses 64 bit integers which can not be represented by 64 bit floats with the necessary accuracy. All further computations are done with the shifted coordinates and thus the coefficients are also computed in relation to the shifted coordinates. As 1970 is zero this case worked while other dates did not work. The following code takes the shifting of _floatize_x into account:

def test_temp_polyval():
    test_ts = xr.DataArray(
        np.linspace(6, 12, 11),
        coords={"time": pd.date_range("1956-01-01", "1966-01-01", freq="YS")},
        dims="time",
        name="test_ts",
    )

    time_to_eval = np.datetime64('1957-01-01')

    fit = test_ts.polyfit(dim='time', deg=1, skipna=True)
    value = xr.polyval(
        test_ts.coords['time'].loc[{'time': [time_to_eval]}]-test_ts.coords['time'].data[0],
        fit.polyfit_coefficients
    )

    value.name='test_ts' # for assertion

    assert_aligned_equal(
        test_ts.loc[{'time': [time_to_eval]}],
        value,
        rtol=1e-03,
    )
    return None

I still think this is very weird behaviour. Maybe I missed a hint when reading the docs.

@JGuetschow
Copy link
Contributor Author

I've opened a second issue for the xr.polyfit problem (#293 )

@znichollscr
Copy link

I'd push this upstream to xarray. I can't believe this behaviour is intentional

@JGuetschow
Copy link
Contributor Author

After thinking a bit about the code in xr.polyfit, I think we should definitely try to move away from nanosecond precision to e.g. hours or minutes. This should improve the accuracy of the fits

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants