-
Notifications
You must be signed in to change notification settings - Fork 42
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Change S3Client to use user-provided storage_options even in Studio #415
Change S3Client to use user-provided storage_options even in Studio #415
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #415 +/- ##
====================================
- Coverage 78% 72% -7%
====================================
Files 34 34
Lines 5056 5042 -14
====================================
- Hits 3964 3618 -346
- Misses 1092 1424 +332 |
Hi @grez72, Would you mind enabling "Allow edits by maintainers"? This will help us assist with any necessary updates directly. Thank you! |
Hi @bhimrazy, I don't see an "Allow edits by maintainers" option. Perhaps this is because I made the pull request from our organization (harvard-visionlab) and not from my private account (grez72)? Should I create a new issue from a personal-account fork so I can allow edits by maintainers, or is there a way to "Allow edits by maintainers" from our harvard-visionlab/litdata fork? |
Thanks for clarifying, @grez72! In that case, could you please assist with the final updates based on @tchaton's suggestion? Here’s a proposed change that should help integrate - if has_shared_credentials_file or not _IS_IN_STUDIO or self._storage_options:
+ if has_shared_credentials_file or not _IS_IN_STUDIO
self._client = boto3.client(
"s3",
**{
"config": botocore.config.Config(retries={"max_attempts": 1000, "mode": "adaptive"}),
**self._storage_options,
},
)
else:
provider = InstanceMetadataProvider(iam_role_fetcher=InstanceMetadataFetcher(timeout=3600, num_attempts=5))
credentials = provider.load()
self._client = boto3.client(
"s3",
aws_access_key_id=credentials.access_key,
aws_secret_access_key=credentials.secret_key,
aws_session_token=credentials.token,
config=botocore.config.Config(retries={"max_attempts": 1000, "mode": "adaptive"}),
+ **self._storage_options,
) |
I implemented the suggested change, but ran into a few issues when passing credentials as storage options like so:
First I encountered this error (which makes sense, my storage options also contain aws_access_key_id).
I next tried to merge the provider credentials and self._storage_options, like so:
But here I get a permissions error because provider.load() provides a I think maybe if the user passes storage options that include aws credentials (e.g., 'aws_access_key_id' or 'aws_session_token'), then none of the credentials returned by the provider are relevant, so maybe the following would be best? I'm not sure how the aws_session_token works (so maybe checking for that is not needed). If this seems reasonable I can make the change.
|
Yes, fine by me ;) |
What does this PR do?
This PR fixes an issue in the S3Client class where user-provided storage_options are ignored in the Lightning AI Studio environment. The client now uses user-supplied credentials when provided in storage_options.
Fixes #414