[BUG] fsspec version 2024.10.0 results in truncated S3 writes #5936

Open
2 tasks done
mhagel-theorem opened this issue Oct 29, 2024 · 3 comments
Labels: backlogged (For internal use. Reserved for contributor team workflow.), bug (Something isn't working)

Comments


mhagel-theorem commented Oct 29, 2024

Describe the bug

Versions:
flytekit: 1.13.11
flyteidl: 1.13.5
flytecore chart: 1.12.0

When passing files between Flyte tasks (both pickles and LightGBM binary datasets), fsspec version 2024.10.0 produces incomplete/truncated writes. We expect this to affect all file-based IO. Downstream tasks then fail with the corresponding incomplete-read error.

Reverting to fsspec version 2024.9.0 resolves all the issues we have had with incomplete writes and Flyte inter-task IO. If this can be reproduced, we recommend pinning Flyte's fsspec dependency to <=2024.9.0.
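As a stopgap while the pin is not in place, a task image could check the installed fsspec version at startup. The following is a minimal sketch, not part of flytekit: the helper names (`parse_calver`, `exceeds_pin`, `installed_fsspec_version`) are hypothetical, and the threshold simply encodes the 2024.9.0 version reported as working above. Whether releases after 2024.10.0 are affected is not established in this thread.

```python
from importlib import metadata

# Assumption: last fsspec release reported as working in this issue.
LAST_KNOWN_GOOD = (2024, 9, 0)


def parse_calver(version: str) -> tuple:
    """Turn '2024.10.0' into (2024, 10, 0), stopping at the first non-numeric part."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def exceeds_pin(version: str, pin: tuple = LAST_KNOWN_GOOD) -> bool:
    """Return True if `version` is newer than the last known-good release."""
    return parse_calver(version) > pin


def installed_fsspec_version():
    """Return the installed fsspec version string, or None if it is not installed."""
    try:
        return metadata.version("fsspec")
    except metadata.PackageNotFoundError:
        return None
```

A task entrypoint could call `exceeds_pin(installed_fsspec_version())` (after checking for `None`) and log a warning or fail fast before any file IO happens.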

Expected behavior

File-based writes complete in full; files are never left truncated.
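One way to check this expectation is to read a file back right after writing it and compare length and checksum. The sketch below is an illustration only, not the reporter's code: `write_and_verify` is a hypothetical helper, and it defaults to the built-in `open` on a local path. Passing `fsspec.open` as `opener` with an `s3://` URL should exercise the fsspec write path, but that is an assumption and has not been run against the setup described here.

```python
import hashlib


def write_and_verify(path, payload: bytes, opener=open) -> None:
    """Write `payload` to `path`, read it back, and raise if it is truncated or altered.

    `opener` must accept (path, mode) and return a context manager that
    yields a binary file object.
    """
    with opener(path, "wb") as f:
        f.write(payload)

    with opener(path, "rb") as f:
        stored = f.read()

    if len(stored) != len(payload):
        raise IOError(
            f"truncated write: expected {len(payload)} bytes, read back {len(stored)}"
        )
    if hashlib.sha256(stored).digest() != hashlib.sha256(payload).digest():
        raise IOError("content mismatch after write")
```

Running this with a payload of a few tens of megabytes under both fsspec 2024.9.0 and 2024.10.0 would show whether the truncation depends on the fsspec version alone or also on how Flyte uploads the file.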

Additional context to reproduce

No response

Screenshots

No response

Are you sure this issue hasn't been raised already?

  • Yes

Have you read the Code of Conduct?

  • Yes

Edit: Added versions for reproducibility

@mhagel-theorem mhagel-theorem added bug Something isn't working untriaged This issues has not yet been looked at by the Maintainers labels Oct 29, 2024

welcome bot commented Oct 29, 2024

Thank you for opening your first issue here! 🛠

@eapolinario eapolinario removed the untriaged This issues has not yet been looked at by the Maintainers label Oct 31, 2024
@eapolinario eapolinario self-assigned this Oct 31, 2024
@eapolinario eapolinario added the backlogged For internal use. Reserved for contributor team workflow. label Oct 31, 2024
@mhagel-theorem
Author

I am happy to investigate this a bit further/deeper when I have the time @eapolinario

@eapolinario
Contributor

@mhagel-theorem, thank you. It would help if you could share more details about your setup: other package versions, the shape of your dataframe, how you're actually using FlyteFile to read/write, which cloud provider, etc.

I'm more than happy to collaborate on this one. I tried to reproduce it but haven't been able to yet.
