
Error casting strings with large dates to Timestamp #7208

Open

ryzhyk opened this issue Feb 27, 2025 · 1 comment
Labels
enhancement Any new improvement worthy of an entry in the changelog

Comments

ryzhyk commented Feb 27, 2025

Describe the bug

Attempting to cast a string containing a valid ISO 8601 timestamp with a large, five-digit year (e.g., +10999-12-31T00:00:00) to a Timestamp type fails with:

Error parsing timestamp from '+10999-12-31T00:00:00': error parsing date

To Reproduce

use std::sync::Arc;

use arrow_array::{ArrayRef, StringArray};
use arrow_cast::display::FormatOptions;
use arrow_cast::{cast_with_options, CastOptions};
use arrow_schema::{DataType, TimeUnit};

#[test]
fn test_cast_string_with_large_timestamp_to_timestamp() {
    let array = Arc::new(StringArray::from(vec![Some("+10999-12-31T00:00:00")])) as ArrayRef;
    let to_type = DataType::Timestamp(TimeUnit::Second, None);
    // `safe: false` makes the kernel return an error instead of silently producing NULL.
    let options = CastOptions {
        safe: false,
        format_options: FormatOptions::default(),
    };
    // Panics with: Error parsing timestamp from '+10999-12-31T00:00:00': error parsing date
    let b = cast_with_options(&array, &to_type, &options).unwrap();
}
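For contrast (my addition, not part of the original report): appending the lines below inside the same test shows the identical cast succeeding for a four-digit year, which isolates the signed expanded-year form as the trigger:

// Extra check: the same cast with a 4-digit year parses fine,
// so only the signed 5-digit (expanded) year form fails.
let ok = Arc::new(StringArray::from(vec![Some("9999-12-31T00:00:00")])) as ArrayRef;
assert!(cast_with_options(&ok, &to_type, &options).is_ok());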

Expected behavior
The cast succeeds.
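Concretely, a sketch of the assertion a fixed cast could satisfy, appended to the test above. The epoch value is hand-computed (3_298_139 days × 86_400 s/day in the proleptic Gregorian calendar), so treat it as an assumption to verify rather than a known-good constant:

use arrow_array::{cast::AsArray, types::TimestampSecondType};

// +10999-12-31T00:00:00 UTC should be 284_959_209_600 seconds since the Unix
// epoch (hand-computed; verify independently before relying on it).
let ts = b.as_primitive::<TimestampSecondType>();
assert_eq!(ts.value(0), 284_959_209_600);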

Additional context

I think this issue is similar to #7073, but this one is for timestamps, whereas the other one affected dates.

I ran into this issue when parsing the metadata file of a Delta Lake table containing large dates. The table was created using Spark SQL, i.e., this issue occurs in the wild.

@ryzhyk ryzhyk added the bug label Feb 27, 2025
@tustvold tustvold added enhancement Any new improvement worthy of an entry in the changelog and removed bug labels Feb 27, 2025
@mbutrovich

I've actually been debugging a similar issue for DataFusion Comet and will open a related issue shortly. The problem may stem from the fact that Spark still defaults to writing INT96 for timestamps. In my case, we read large timestamp values back from a Parquet file, and arrow-rs coerces them into a Timestamp(TimeUnit::Nanosecond, None) by default, which cannot represent as large a date range as INT96.
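To put numbers on that range gap (my arithmetic, not from the comment): an i64 of nanoseconds since the Unix epoch spans only roughly 1677-09-21 through 2262-04-11, while second precision comfortably covers the year +10999:

// i64 nanoseconds since the epoch cover only ~292 years around 1970,
// so +10999 cannot be represented at TimeUnit::Nanosecond.
let ns_range_years = i64::MAX / 1_000_000_000 / 86_400 / 365;
assert_eq!(ns_range_years, 292);

// i64 seconds since the epoch cover ~292 billion years, far beyond +10999.
let s_range_years = i64::MAX / 86_400 / 365;
assert!(s_range_years > 292_000_000_000);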
