Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor authentication #435

Merged
merged 41 commits into from
Aug 24, 2023
Merged

Refactor authentication #435

merged 41 commits into from
Aug 24, 2023

Conversation

rwegener2
Copy link
Contributor

@rwegener2 rwegener2 commented Aug 3, 2023

What is done

This PR refactors Earthdata authentication to make it more extensible to additional classes that may want to authenticate. Related discussion #433

How it was done

Here I'm continuing the discussion from #433. @JessicaS11 made the good point that instead of creating our own EarthdataAuth class we should just try to reference the auth object that comes back from earthaccess.login(). (Ex. by setting a self.auth = earthaccess.login() variable on all the classes that require authentication. Generally like this:

def Query():
    def __init__():
        self.auth = earthaccess.login()

I experimented with this but eventually realized the library would try to regenerate the objects very time self.auth.get_s3_credentials(daac="NSIDC") or self.auth.get_session() was accessed. For session that wouldn't matter so much because that is fast to access, but the s3 creds takes ~5 seconds to retrieve. This seemed like it might be a bit cumbersome over time. We could do separate attributes (self.session or self.s3login_credentials) and a @property getter method to get around this, but that was starting to feel like a lot of code to copy and paste into every class that needed auth. The other major problem is that I don't think we want to make users login when the Query object is created. I think we'd like to let them run Queries and search for data, and only require login once the user tries to .order_granules().

To avoid these challenges I created a mixin class. This differs from an inherited class in that it is built to extend the functionality of other classes with a small, specific set of attributes or functions. The EarthdataAuthMixin contains properties for auth, session, and s3login_credentials. These set themselves (login) the first time they are accessed, and reference the previously created object in subsequent accesses. A bonus of this mixin method is that I was able to keep the .earthdata_login() method, which should make it backwards compatible. The mixin could be added to any class that needs authentication and the user doesn't need to keep track of if s3 credentials or a regular session are needed.

I'm curious what others think of this technique. It's a half step more complicated than the self.auth=earthaccess.login() technique. It felt to me like we gained in modularity and functionality enough to warrant the complication, but I'd love additional opinions.

How it can be tested

Both the following code snippets should work:

import icepyx as ipx

region_a = ipx.Query('ATL06',[-45, 74, -44,75],['2019-11-30','2019-11-30'], \
                           start_time='00:00:00', end_time='23:59:59')

region_a.order_vars.avail()   # works without logging in!

region_a.order_granules()  # also works without logging in!

# try running these multiple times to notice that they store the previously created objects. Also notice that
# once `.session` creates the auth object, `.s3login_credentials` doesn't try to repeat logging in
region_a.session
region_a.s3login_credentials

region_a.earthdata_login()  # still works, even though it's not needed
from icepyx.core.variables import Variables

var_obj = Variables("file", path='./data/ATL06/processed_ATL06_20191201105502_10010505_006_01.h5',
          product='ATL06')

var_obj.auth
var_obj.session

Todo

  • how can we allow the user to regenerate new s3 creds after an hour has passed? Right now there isn't a way to do that.
  • tests
  • documentation - examples
  • documentation - api reference

@github-actions
Copy link

github-actions bot commented Aug 3, 2023

Binder 👈 Launch a binder notebook on this branch for commit 4564b3b

I will automatically update this comment whenever this PR is modified

Binder 👈 Launch a binder notebook on this branch for commit 16b8d3f

Binder 👈 Launch a binder notebook on this branch for commit 083426c

Binder 👈 Launch a binder notebook on this branch for commit edfa362

Binder 👈 Launch a binder notebook on this branch for commit 72c5347

Binder 👈 Launch a binder notebook on this branch for commit 434bbf2

Binder 👈 Launch a binder notebook on this branch for commit e06765a

Binder 👈 Launch a binder notebook on this branch for commit 9e1f745

Binder 👈 Launch a binder notebook on this branch for commit a2a455f

Binder 👈 Launch a binder notebook on this branch for commit 1aacb1d

Binder 👈 Launch a binder notebook on this branch for commit 50db05e

Binder 👈 Launch a binder notebook on this branch for commit 12e21ba

Binder 👈 Launch a binder notebook on this branch for commit 10bc734

Binder 👈 Launch a binder notebook on this branch for commit 8033ed4

Binder 👈 Launch a binder notebook on this branch for commit 6cdddbf

Binder 👈 Launch a binder notebook on this branch for commit b7b8b7e

Binder 👈 Launch a binder notebook on this branch for commit 1172e9e

Binder 👈 Launch a binder notebook on this branch for commit abd950a

Binder 👈 Launch a binder notebook on this branch for commit 90354e6

Binder 👈 Launch a binder notebook on this branch for commit 7fedc2d

Binder 👈 Launch a binder notebook on this branch for commit e26c9a1

Binder 👈 Launch a binder notebook on this branch for commit a31b092

Binder 👈 Launch a binder notebook on this branch for commit 3aab79a

Binder 👈 Launch a binder notebook on this branch for commit ade942f

Binder 👈 Launch a binder notebook on this branch for commit f052b76

Binder 👈 Launch a binder notebook on this branch for commit 63b0275

Binder 👈 Launch a binder notebook on this branch for commit 1796a2c

Binder 👈 Launch a binder notebook on this branch for commit 4d7687b

Binder 👈 Launch a binder notebook on this branch for commit 3838044

Binder 👈 Launch a binder notebook on this branch for commit 693a9fa

@JessicaS11
Copy link
Member

@betolink I'm curious if you have any thoughts on this approach for centralizing auth in icepyx so that it can easily be picked up across Query, Variables, and Read objects. Pulling out some relevant points from @rwegener2's great writeup above:

we should just try to reference the auth object that comes back from earthaccess.login(). (Ex. by setting a self.auth = earthaccess.login() variable on all the classes that require authentication

but...

the library would try to regenerate the objects very time self.auth.get_s3_credentials(daac="NSIDC") or self.auth.get_session() was accessed. For session that wouldn't matter so much because that is fast to access, but the s3 creds takes ~5 seconds to retrieve. This seemed like it might be a bit cumbersome over time.

She developed this Mixin approach, but we're still left with (assuming this isn't still outstanding in earthaccess too)

how can we allow the user to regenerate new s3 creds after an hour has passed? Right now there isn't a way to do that.

Do you have any thoughts?

@betolink
Copy link
Contributor

betolink commented Aug 5, 2023

Hi @JessicaS11 @rwegener2 this is still an outstanding issue with earthaccess the library tries to optimize and cache the s3fs session but we haven't implemented anything like this at the get_s3_credentials() level yet.

For icepyx, this mixing class looks good! maybe when we instantiate the class for the first time we could store the timestamp and return the same cached set of credentials until the time delta goes above say 50 minutes, a little like the get_s3fs_session() method.

@rwegener2
Copy link
Contributor Author

rwegener2 commented Aug 8, 2023

Thanks for the feedback @betolink!

I've cleaned up the PR and added:

  • some logic that refreshes the s3 token after an hour
  • some logic that checks if earthaccess.login() succeeded, and raises an error if it did not

The one question I had for @betolink:

  • In the event of failure does earthaccess.login() return the error message anywhere other than printing it to output? If it does I can pass that along to the user, but if not I think printed output is fine.

We can talk about if it makes sense to merge this / how the best way to that.

UPDATE: Just adding that after a conversation with Luis it sounds like earthaccess does not currently propagate the error message back to the user. I think it's fine for icepyx to raise the error and direct users to view the earthaccess output. No further action needed here.

@rwegener2 rwegener2 marked this pull request as ready for review August 8, 2023 15:27
icepyx/core/variables.py Outdated Show resolved Hide resolved
icepyx/core/query.py Outdated Show resolved Hide resolved
icepyx/core/auth.py Outdated Show resolved Hide resolved
icepyx/core/auth.py Outdated Show resolved Hide resolved
@rwegener2
Copy link
Contributor Author

rwegener2 commented Aug 14, 2023

Current status

  1. All code comments have been addressed
  2. I’ve written a few tests for the class

In writing those tests I did also end up changing a few lines of test_behind_NSIDC_API_login.py. There was a fixture in that class that writes a .netrc file. I moved that block to the pytest configuration so that the .netrc gets written before all the tests and can also be used by the new authentication class.

Todo

  1. update docs

I’m going to pause on this PR for the rest of the day, but the last thing I’ve got on the todo list is updating the documentation to reflect that the user is no longer required to explicitly login. After that I'll re-request review.

@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@JessicaS11
Copy link
Member

JessicaS11 commented Aug 23, 2023

@rwegener2 I'm running the basic data access notebook locally and... it fails. I've got valid authentication (it did show an error before bringing up the manual login box, but it did let me authenticate that way and I got valid credentials), but cannot actually do anything that's behind a login. The two failure cases I found are: region_a.order_vars.avail() and region_a.order_granules(), both of which are getting hung up in the requests stage.

UPDATE: I should have known better than to try and test icepyx code on a Wednesday. NSIDC system is down for maintenance (see banner at: https://nsidc.org/data/icesat-2). Wednesday is often their maintenance day...

@rwegener2
Copy link
Contributor Author

rwegener2 commented Aug 23, 2023

I'm running the basic data access notebook locally and... it fails.

Ok, I'm trying it out now. @JessicaS11

I also get a failure, but mine is happening on

#search for available granules and provide basic summary info about them
region_a.avail_granules()

due to AssertionError: Your search returned no results; try different search parameters. Are you changing something about the default search parameters to locate more data? I just opened IS2_data_access.ipynb and did Restart and Run All Cells. I wouldn't think that would be an error with this code, but if it's not expected I can dig into it.

@JessicaS11
Copy link
Member

I wouldn't think that would be an error with this code, but if it's not expected I can dig into it.

There are a few examples in there that are set up to fail to demo things for the user. With Run All Cells you're likely hitting up against one of those.

icepyx/core/auth.py Outdated Show resolved Hide resolved
icepyx/core/auth.py Outdated Show resolved Hide resolved
icepyx/core/auth.py Outdated Show resolved Hide resolved
icepyx/core/auth.py Outdated Show resolved Hide resolved
icepyx/core/auth.py Outdated Show resolved Hide resolved
icepyx/core/auth.py Outdated Show resolved Hide resolved
@rwegener2
Copy link
Contributor Author

I just:

  • merged the suggestion switching the s3token parameter in the earthdata_login function to None
  • added a sentence about unneeded parameters to all the Authentication Warning pop ups in the docs

Let me know if anything comes up while running the code @JessicaS11 🤞🏻

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like the features typo got reintroduced... what do you think about removing the "if you choose" language, since we'd ultimately like to discourage deprecated uses of the function.

Copy link
Contributor Author

@rwegener2 rwegener2 Aug 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shoot, sorry about that features typo. I'll push that change now.

I see what you're saying about the "if you choose" language. What do you think of:

If you are unable to remove earthdata_login() calls from your workflow, you will also note that certain inputs, such as earthdata_uid and email, are no longer required. e.g. region_a.earthdata_login(earthdata_uid, email) becomes region_a.earthdata_login()

If you have different phrasing I'm happy with whatever you prefer.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That looks great to me - it gently pushes folks towards removing it. If we want to mince words we could take out "you will also"... (science writing has gotten to me I guess). :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm glad you've got such a tuned eye for writing! Definitely better to be concise. Both those changes of phrase are pushed

Copy link
Member

@JessicaS11 JessicaS11 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for all your hard work on this, @rwegener2!

Please note that the uml will auto-rebuild after I approve and add a last commit (which will then require all the tests to re-pass before the merge button is available). And I think it's forced, but a squash merge would be greatly appreciated!

@rwegener2
Copy link
Contributor Author

Yay, soooo exiting! Thanks for all of your attentive review @JessicaS11! Merging now! 🎊

@rwegener2 rwegener2 merged commit 3d2ed4e into development Aug 24, 2023
1 check passed
@rwegener2 rwegener2 deleted the move_auth branch August 24, 2023 21:31
@JessicaS11 JessicaS11 linked an issue Oct 18, 2023 that may be closed by this pull request
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Modularizing authentication
3 participants