non-uniform datetime type #292

JGuetschow · 2024-11-11T11:09:53Z

Describe the bug

Depending on the function the format of the returned time coordinates seems to differ. While a dataset created with e.g. pd.daterange() has the type np.datatime64[ns] , the return value of test_ts.coords['time'].loc[{'time': [np.datetime64('1970-01-01')]}] has the datatype np.datetime64[D]

Failing Test
Below a test I constructed to show the behaviour. It fails because (I think) the internal date conversion of xr.polyfit and xr.polyval use the format of the np.datetime64 variables to convert the dates to integers used for the actual fitting and evaluation. The example below works if the dates are manually specified as having nanosecond resolution.

import numpy as np
import pandas as pd
import pytest
import xarray as xr
from primap2.tests.utils import assert_aligned_equal

def test_temp_polyval():
    test_ts = xr.DataArray(
        np.linspace(6, 12, 11),
        coords={"time": pd.date_range("1970-01-01", "1980-01-01", freq="YS")},
        dims="time",
        name="test_ts",
    )

    fit = test_ts.polyfit(dim='time', deg=1, skipna=True)
    value = xr.polyval(
        test_ts.coords['time'].loc[{'time': [np.datetime64('1970-01-01')]}],  # specify 'ns' for test to pass
        fit.polyfit_coefficients
    )

    value.name='test_ts' # for assertion

    assert_aligned_equal(
        test_ts.loc[{'time': [np.datetime64('1970-01-01')]}],  # specify 'ns' for test to pass
        value,
        rtol=1e-04,
    )

Expected behavior

It would be good if the date resolution is fixed without having to specify it manually. It would also be good if it's e.g. days and not nanoseconds (but not necessary unless nanoseconds has a negative impact on the fits)

System (please complete the following information):

Linux mint 22

Additional context

In the filling strategy code I'm currently writing I use nanosecond for now, because that seems to be the default behaviour of pd.date_range. But a pr.loc[{'time': ['1999-01-01]}] gives days back, so it's not a general solution to just be specific in the new code.

The text was updated successfully, but these errors were encountered:

JGuetschow · 2024-11-11T12:36:09Z

I think this could also be considered a bug in xarray, unless I'm doing something wrong.

JGuetschow · 2024-11-11T13:58:19Z

The fix for the test above only works for the specific dates in the test. 1970-01-01 is special as it is zero for the internal time. If we replace 1970-1980 it by e.g. 1956-1966 the test fails.

JGuetschow · 2024-11-11T14:50:55Z

I found the reason for the problem. It's not the time resolution as I first suspected but a coordinate recomputation in xr.polyfit. The function _floatize_x shifts the time coordinate such that the minimal value is 0 to avoid accuracy problems as np.datetime64 uses 64 bit integers which can not be represented by 64 bit floats with the necessary accuracy. All further computations are done with the shifted coordinates and thus the coefficients are also computed in relation to the shifted coordinates. As 1970 is zero this case worked while other dates did not work. The following code takes the shifting of _floatize_x into account:

def test_temp_polyval():
    test_ts = xr.DataArray(
        np.linspace(6, 12, 11),
        coords={"time": pd.date_range("1956-01-01", "1966-01-01", freq="YS")},
        dims="time",
        name="test_ts",
    )

    time_to_eval = np.datetime64('1957-01-01')

    fit = test_ts.polyfit(dim='time', deg=1, skipna=True)
    value = xr.polyval(
        test_ts.coords['time'].loc[{'time': [time_to_eval]}]-test_ts.coords['time'].data[0],
        fit.polyfit_coefficients
    )

    value.name='test_ts' # for assertion

    assert_aligned_equal(
        test_ts.loc[{'time': [time_to_eval]}],
        value,
        rtol=1e-03,
    )
    return None

I still think this is very weird behaviour. Maybe I missed a hint when reading the docs.

JGuetschow · 2024-11-12T07:12:01Z

I've opened a second issue for the xr.polyfit problem (#293 )

znichollscr · 2024-11-12T09:53:02Z

I'd push this upstream to xarray. I can't believe this behaviour is intentional

JGuetschow · 2024-11-12T13:53:16Z

After thinking a bit about the code in xr.polyfit, I think we should definitely try to move away from nanosecond precision to e.g. hours or minutes. This should improve the accuracy of the fits

JGuetschow added the bug Something isn't working label Nov 11, 2024

JGuetschow mentioned this issue Nov 12, 2024

xarray polyfit time coordinate treatment #293

Closed

JGuetschow added this to the Composite Source Generator milestone Nov 12, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

non-uniform datetime type #292

non-uniform datetime type #292

JGuetschow commented Nov 11, 2024 •

edited

Loading

JGuetschow commented Nov 11, 2024

JGuetschow commented Nov 11, 2024

JGuetschow commented Nov 11, 2024

JGuetschow commented Nov 12, 2024

znichollscr commented Nov 12, 2024

JGuetschow commented Nov 12, 2024

non-uniform datetime type #292

non-uniform datetime type #292

Comments

JGuetschow commented Nov 11, 2024 • edited Loading

JGuetschow commented Nov 11, 2024

JGuetschow commented Nov 11, 2024

JGuetschow commented Nov 11, 2024

JGuetschow commented Nov 12, 2024

znichollscr commented Nov 12, 2024

JGuetschow commented Nov 12, 2024

JGuetschow commented Nov 11, 2024 •

edited

Loading