-
Notifications
You must be signed in to change notification settings - Fork 92
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
added 3 packages to environment.yml #476
Conversation
cdsapi cmocean xgboost
/condalock |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @ds4114! Thanks for opening this Pull Request to add these oceanography related packages, just have one comment for now 😄
@@ -98,6 +100,7 @@ dependencies: | |||
- xclim | |||
# - xcube>=0.9.1 | |||
- xesmf | |||
- xgboost |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
XGBoost is a machine learning algorithm, should this be added to the ml-notebook
or pytorch-notebook
instead? Are you using the CPU or GPU version of xgboost on the LDEO compute cluster?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi - Thanks for the quick response. We are using the CPU version. Sorry, Im new to pangeo - where do you think it would make the most sense? We have not been using deep learning with pytorch or tensorflow2 so far so I thought it maybe should go along with scikit-learn
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah ok, if you're just using the CPU version, it should be fine to add xgboost to pangeo-notebook, similar to scikit-learn as you said. Let us know though if you decide to use the GPU version at some point (not sure if the LDEO JupyterHub has GPU clusters?), and we can add it to either ml-notebook
/pytorch-notebook
next time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made a mistake and cdsapi is not on conda-forge and would need to be pip installed. But I spoke with @jbusecke and we were thinking that the cdsapi probably should not be on this list at all as it is more for downloading and accessing data for ERA5 SST / sea level pressure / etc. We'll try to use pangeo forge or have that data accessible on GFS another way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But cdsapi
is on conda-forge no? See https://anaconda.org/conda-forge/cdsapi. Feel free to remove it though if you prefer to leave it out, and re-run /condalock
afterwards.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The documentation for the package (https://cds.climate.copernicus.eu/api-how-to) described using pip but you are correct; it is there. I was encountering an issue when trying to install via conda forge on Pangeo that was unrelated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So we'll leave cdsapi
in the docker image then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would have to defer that question to you and @jbusecke. If up to me, I would say include it since I believe it is a very small package that would not add much weight, but it just might not fit with the categories of packages you are trying to include.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It makes sense to have cdsapi in the base notebook image. It is a general package for accessing ECMWF data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, let's leave cdsapi
in then (will keep it in pangeo-notebook
, base-notebook
should be as minimal as possible). I see that cdsapi
only depends on a few packages like python
, requests
, and tqdm
, so shouldn't be too hard to maintain. I'll merge in this PR after #479, just to double check that all the build tests pass.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks ok to me, but @scottyhq, we'll need to see how to handle the unrelated no space left on device error on pytorch-notebook (similar to #475 (comment)). Also, maybe we should check the docker image filesize if it's getting too big.
/condalock |
I work with the LDEO McKinley group and wanted to recommend 3 commonly used packages: cdsapi, cmocean, xgboost