Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

added 3 packages to environment.yml #476

Merged
merged 4 commits into from
Aug 29, 2023
Merged

Conversation

ds4114
Copy link
Contributor

@ds4114 ds4114 commented Aug 2, 2023

I work with the LDEO McKinley group and wanted to recommend 3 commonly used packages: cdsapi, cmocean, xgboost

cdsapi
cmocean
xgboost
@github-actions
Copy link
Contributor

github-actions bot commented Aug 2, 2023

Binder 👈 Try on Mybinder.org!
Binder 👈 Try on Pangeo GCP Binder!
Binder 👈 Try on Pangeo AWS Binder!

@pangeo-bot
Copy link
Collaborator

/condalock
Automatically locking new conda environment, building, and testing images...

Copy link
Member

@weiji14 weiji14 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @ds4114! Thanks for opening this Pull Request to add these oceanography related packages, just have one comment for now 😄

@@ -98,6 +100,7 @@ dependencies:
- xclim
# - xcube>=0.9.1
- xesmf
- xgboost
Copy link
Member

@weiji14 weiji14 Aug 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

XGBoost is a machine learning algorithm, should this be added to the ml-notebook or pytorch-notebook instead? Are you using the CPU or GPU version of xgboost on the LDEO compute cluster?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi - Thanks for the quick response. We are using the CPU version. Sorry, Im new to pangeo - where do you think it would make the most sense? We have not been using deep learning with pytorch or tensorflow2 so far so I thought it maybe should go along with scikit-learn

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah ok, if you're just using the CPU version, it should be fine to add xgboost to pangeo-notebook, similar to scikit-learn as you said. Let us know though if you decide to use the GPU version at some point (not sure if the LDEO JupyterHub has GPU clusters?), and we can add it to either ml-notebook/pytorch-notebook next time.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made a mistake and cdsapi is not on conda-forge and would need to be pip installed. But I spoke with @jbusecke and we were thinking that the cdsapi probably should not be on this list at all as it is more for downloading and accessing data for ERA5 SST / sea level pressure / etc. We'll try to use pangeo forge or have that data accessible on GFS another way.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But cdsapi is on conda-forge no? See https://anaconda.org/conda-forge/cdsapi. Feel free to remove it though if you prefer to leave it out, and re-run /condalock afterwards.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The documentation for the package (https://cds.climate.copernicus.eu/api-how-to) described using pip but you are correct; it is there. I was encountering an issue when trying to install via conda forge on Pangeo that was unrelated.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we'll leave cdsapi in the docker image then?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would have to defer that question to you and @jbusecke. If up to me, I would say include it since I believe it is a very small package that would not add much weight, but it just might not fit with the categories of packages you are trying to include.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It makes sense to have cdsapi in the base notebook image. It is a general package for accessing ECMWF data.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, let's leave cdsapi in then (will keep it in pangeo-notebook, base-notebook should be as minimal as possible). I see that cdsapi only depends on a few packages like python, requests, and tqdm, so shouldn't be too hard to maintain. I'll merge in this PR after #479, just to double check that all the build tests pass.

Copy link
Member

@weiji14 weiji14 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks ok to me, but @scottyhq, we'll need to see how to handle the unrelated no space left on device error on pytorch-notebook (similar to #475 (comment)). Also, maybe we should check the docker image filesize if it's getting too big.

@weiji14
Copy link
Member

weiji14 commented Aug 29, 2023

/condalock

@weiji14 weiji14 merged commit 667192c into pangeo-data:master Aug 29, 2023
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants