Newest updates break DaskOfflineStore with S3 parquets #4753

Open
bjmccotter7192 opened this issue Nov 12, 2024 · 0 comments

Expected Behavior

In version 0.40.1, the Dask offline store read data_source.path directly from the FileSource and retrieved the data from S3 using a path like: s3://<your-bucket>/<file-name>

Current Behavior

Data retrieval now fails because the repo_path is prepended to the S3 URL.

Example:
/tmp/feast:s3//<your-bucket>/<file-name>

I believe this is caused by a recent change, #4624, which no longer treats the S3 URL as an absolute path.
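A minimal sketch of how this kind of mangling can happen, assuming the path resolution relies on a pathlib-style absolute-path check (this is an illustration, not the actual Feast code; the bucket and file names are hypothetical). A POSIX path object does not recognize `s3://` as a scheme, so the URL looks relative and gets joined onto the repo path:

```python
from pathlib import PurePosixPath

# Hypothetical values for illustration only.
repo_path = PurePosixPath("/tmp/feast")
source_path = PurePosixPath("s3://my-bucket/data.parquet")

# pathlib knows nothing about URL schemes: the string does not start
# with "/", so it is treated as a relative filesystem path.
print(source_path.is_absolute())  # False

# Joining a "relative" path onto the repo path produces a local path
# that does not exist; the "//" after the scheme is also collapsed.
joined = repo_path / source_path
print(joined)  # /tmp/feast/s3:/my-bucket/data.parquet
```

Any check of the form "if the path is not absolute, prepend repo_path" will therefore rewrite remote URLs unless they are detected first.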

Steps to reproduce

  • Rebuilt my environment with the latest tagged version, 0.41.3
  • Reran my get_historical_features call; it hung for a while, then failed with an error saying the file path does not exist

Specifications

  • Version: 0.41.3
  • Platform: Linux
  • Subsystem: Debian

Possible Solution

  • Revert that change, or add a flag that bypasses the breaking behavior
  • If storage_options is not None, read the parquet file directly
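One way the second idea could look, as a rough sketch only: `resolve_source_path` is a hypothetical helper, not an existing Feast function, and it uses the presence of a URL scheme (rather than storage_options) as the signal that the path is remote and should be passed through untouched.

```python
from pathlib import PurePosixPath
from typing import Optional
from urllib.parse import urlparse


def resolve_source_path(path: str, repo_path: Optional[str]) -> str:
    """Return the path to read, prefixing repo_path only for relative local paths.

    Hypothetical helper for illustration; not part of Feast.
    """
    scheme = urlparse(path).scheme
    # A scheme longer than one character (s3, gs, https, ...) marks a remote
    # URI. Single letters are skipped so Windows drive letters such as "C:"
    # are not mistaken for schemes.
    if len(scheme) > 1:
        return path
    if repo_path is None or PurePosixPath(path).is_absolute():
        return path
    return str(PurePosixPath(repo_path) / path)


print(resolve_source_path("s3://my-bucket/data.parquet", "/tmp/feast"))
print(resolve_source_path("data/driver_stats.parquet", "/tmp/feast"))
```

With a check like this, remote URLs reach the parquet reader unchanged, while relative local paths are still resolved against the repo path.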