fix: Avoid AssumeRoleWithWebIdentity for each reconcile #1148

Merged · 1 commit · Nov 8, 2023

@chlunde (Collaborator) commented Feb 14, 2022

Description of your changes

Use a shared client/credentials to avoid calling AssumeRoleWithWebIdentity for each reconcile.

aws.Config (v1/v2) is thread-safe, checks token expiry, and handles renewal. To handle …

Fixes #828

I have:

  • Read and followed Crossplane's contribution process.
  • Run make reviewable test to ensure this PR is ready for review.

How has this code been tested

Ran it for one week. I'm opening this PR for discussion: how can we avoid a global variable?

As you can see below we get a significant reduction in the number of API calls to AWS:

[image: AWS API call volume]

@chlunde chlunde requested review from muvaf and haarchri February 14, 2022 07:59
@haarchri (Member)

Do we also need to adjust something for assumeRoleARN?

@chlunde (Collaborator, Author) commented Feb 14, 2022

@haarchri yeah, I think we can refactor it to share a bit more code, but first I wonder if there's a place to store this struct in a "less global" variable.

@chlunde (Collaborator, Author) commented Feb 14, 2022

@muvaf do you have any idea where we could store this state? Is a global variable like this OK if there are no other options? As you can see, this should give a performance boost and a cost reduction if all AWS security products (Detective/GuardDuty/CloudTrail) are enabled.

@haarchri (Member)

Tested your PR today in our environment and it looks really great. Once we clarify how to save the state for assumeRoleARN, adopting this would be perfect ;)

@muvaf (Member) commented Feb 20, 2022

I think this is a very good idea if we have the tools to get the caching right. A few questions:

  • Does the token in the config ever expire? How do we know when it does, so that we can request a new one without returning an auth error when we shouldn't?
  • Are we certain that there can only ever be a single identity per pod, so that we don't need a map? Not really important today, but just checking.
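If a pod ever needs several identities (for example, ProviderConfigs that set different assumeRoleARN values), the single global could become a map keyed by identity. A minimal sketch, assuming the role ARN is a sufficient key; `client`, `clientCache`, and `get` are hypothetical names, and the expensive credential setup is reduced to a counter.

```go
package main

import (
	"fmt"
	"sync"
)

// client is a placeholder for an AWS SDK config/client built for one identity.
type client struct{ roleARN string }

// clientCache keeps one shared client per identity (here: role ARN), so
// reconciles for different ProviderConfigs don't overwrite each other's state.
type clientCache struct {
	mu      sync.Mutex
	clients map[string]*client
	built   int // how many times the expensive construction ran
}

func (c *clientCache) get(roleARN string) *client {
	c.mu.Lock()
	defer c.mu.Unlock()
	if cl, ok := c.clients[roleARN]; ok { // lookups on a nil map are safe
		return cl
	}
	if c.clients == nil {
		c.clients = make(map[string]*client)
	}
	c.built++ // hypothetical: credentials would be assumed here
	cl := &client{roleARN: roleARN}
	c.clients[roleARN] = cl
	return cl
}

func main() {
	var cache clientCache
	for i := 0; i < 10; i++ {
		cache.get("arn:aws:iam::111111111111:role/a")
		cache.get("arn:aws:iam::222222222222:role/b")
	}
	fmt.Println(cache.built) // 2
}
```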

Is a global variable like this OK if there are no other options?

I think that's OK.

@haarchri (Member)

@chlunde, any chance to finalize this PR? I am happy to test this in our environment =)

@chlunde force-pushed the sdk-cache-creds branch 2 times, most recently from b1ee6d1 to dae0db7, October 11, 2022 21:01
@github-actions bot commented Sep 8, 2023

Crossplane does not currently have enough maintainers to address every issue and pull request. This pull request has been automatically marked as stale because it has had no activity in the last 90 days. It will be closed in 14 days if no further activity occurs. Adding a comment starting with /fresh will mark this PR as not stale.

@github-actions github-actions bot added the stale label Sep 8, 2023
@MisterMX (Collaborator) commented Sep 8, 2023

/fresh

@github-actions github-actions bot removed the stale label Sep 9, 2023
@kaessert
I rebased the changes in this PR and we're now running this version as an experiment. So far there have been no issues, and we're seeing roughly a 65% drop in AssumeRole requests:
[image: AssumeRole request volume]

@tkaesserfm
This version has been running in one of our clusters for almost 10 days and looks great. The token is refreshed properly, everything works, and there have been no restarts.

@chlunde @haarchri @MisterMX What are the options to proceed here? The change would save a good amount of money for everyone operating provider-aws at a larger scale, as we do.

I could open a fresh PR with the rebased changes and even work on finalizing it, but I don't want to steal the show from @chlunde 😋

@chlunde (Collaborator, Author) commented Oct 20, 2023

We've been running with this code for 1.5 years without issues, but we only use one ProviderConfig per provider. It would be nice if someone with a more complex setup, such as assuming roles in different accounts, could also verify it.

@chlunde chlunde marked this pull request as ready for review October 20, 2023 10:17
@chlunde chlunde requested a review from MisterMX October 20, 2023 10:17
@MisterMX MisterMX changed the title AWS SDK: Avoid AssumeRoleWithWebIdentity for each reconcile (for discussion) fix: Avoid AssumeRoleWithWebIdentity for each reconcile Nov 8, 2023
Use a shared client/credentials to avoid calling AssumeRoleWithWebIdentity.

aws.Config (v1/v2) is thread safe and checks token expiry/handles
renewal.

Signed-off-by: Carl Henrik Lunde <[email protected]>
Signed-off-by: Maximilian Blatt (external expert on behalf of DB Netz) <[email protected]>
@MisterMX (Collaborator) commented Nov 8, 2023

I made a little improvement by using two separate sync.Mutex for V1 and V2 so they can be accessed in parallel.
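A minimal sketch of that locking layout, using hypothetical names (`getV1`, `getV2`, `configV1`, `configV2`) rather than the provider's real identifiers: each SDK version's shared config is guarded by its own `sync.Mutex`, so a goroutine building the V1 config never blocks one that needs the V2 config.

```go
package main

import (
	"fmt"
	"sync"
)

// Hypothetical placeholders for the shared aws-sdk-go (V1) and aws-sdk-go-v2 configs.
type configV1 struct{ id int }
type configV2 struct{ id int }

var (
	muV1     sync.Mutex // guards sharedV1 only
	sharedV1 *configV1

	muV2     sync.Mutex // guards sharedV2 only
	sharedV2 *configV2
)

func getV1() *configV1 {
	muV1.Lock()
	defer muV1.Unlock()
	if sharedV1 == nil {
		sharedV1 = &configV1{id: 1} // built once, then shared
	}
	return sharedV1
}

func getV2() *configV2 {
	muV2.Lock()
	defer muV2.Unlock()
	if sharedV2 == nil {
		sharedV2 = &configV2{id: 2}
	}
	return sharedV2
}

func main() {
	var wg sync.WaitGroup
	for i := 0; i < 50; i++ {
		wg.Add(2)
		go func() { defer wg.Done(); getV1() }() // V1 and V2 callers never contend
		go func() { defer wg.Done(); getV2() }()
	}
	wg.Wait()
	fmt.Println(getV1().id, getV2().id) // 1 2
}
```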

I believe there is room for further improvement by using sync.RWMutex and RLocks for accessing the default config fields. However, this would make the synchronization with writers much more complicated, so I left it out for now.

@MisterMX (Collaborator) left a comment


LGTM. Thank you very much @chlunde!

@MisterMX MisterMX merged commit 040f3fe into crossplane-contrib:master Nov 8, 2023
9 checks passed
tektondeploy pushed a commit to gtn3010/provider-aws that referenced this pull request Mar 12, 2024
…ntity-auth-tokenfile

add web identity token configuration to ProviderConfig spec
Successfully merging this pull request may close these issues:

  • Excessive calls to AssumeRoleWithWebIdentity w/ IRSA
6 participants