Can we have a pip installer for CMOR #768

Open
matthew-mizielinski opened this issue Jan 17, 2025 · 8 comments

@matthew-mizielinski

Some institutions are having issues with access to conda (including conda-forge). Could we have a pip installation option here?

@durack1
Contributor

durack1 commented Jan 18, 2025

@matthew-mizielinski can I poke the bear a little?

I wanted to understand the Anaconda licensing model and whether we/DOE/PCMDI also need to start being careful about our use. As we always host code/data under publicly accessible CC-BY (or equivalent) licenses, I think we're ok, no?

Is the MetOffice "commercial" license environment the issue for you?

I just took a peek at their terms of service and found their "free tier" conditions:

Anaconda Public Repository.

License Grant. Subject to the terms of this Agreement, Anaconda hereby grants You a non-exclusive, non-transferable license to: (1) Use the Public Repository in accordance to the number of licenses purchased in a non-commercial environment (unless otherwise specified in writing by Anaconda); (2) Access and use the Premium Repository for the sole purpose of internal development of software or models or for internal development of proprietary software packages; (3) Redistribute code files in source (if provided to You by Anaconda as source) and binary forms, with or without modification, subject to the requirements set forth below; and (4) Modify and create derivative works of sample source code delivered in the Premium Repository for internal use.

@durack1
Contributor

durack1 commented Jan 18, 2025

OK, I might have found a clearer definition of the issue - this summarises it well. So presumably, as LLNL has 8000 staff, we may also need to be careful. The PCMDI organization hosts the repo, and we are well under 200.

So, what changed?

The heart of the changes lies in Anaconda’s definition of "Organizational Use." According to the new licensing terms, any organization with 200 or more employees or contractors is now required to purchase a paid license to use Anaconda's software.

The licensing terms clarify, in particular, that the 200-employee threshold applies not only to for-profit companies but also to government entities and non-profit organizations.

This is a shift from previous policies that had allowed universities, research institutions, and other non-profits to use Anaconda without charge (or at least that’s what some thought). Currently, these organizations must obtain licenses if they exceed the 200-employee limit.

@matthew-mizielinski
Author

To confirm: institutional policies have had to change at a number of centres in response to Anaconda's change in approach to licensing, as linked above, with a range of approaches depending on local infrastructure. Given the risk of incurring significant cost, a number of centres are being cautious about supporting access, so alternative mechanisms for accessing Python packages would be valued.

@JamesAnstey

@matthew-mizielinski @durack1 thanks for starting this discussion. Yes, just to confirm, at CCCma we are being encouraged to move away from using conda altogether, due to the risk that ECCC employees (CCCma being part of ECCC, which is much larger than 200 people) may unintentionally incur large licensing costs by using packages from the default conda channel. So we would use a pip installer if one becomes available.

@durack1
Contributor

durack1 commented Jan 21, 2025

@matthew-mizielinski @JamesAnstey, @sashakames just mentioned miniforge and mamba (available here). Would they solve your issue?

@sashakames noted that much of LLNL development had been pushed in this direction, anything else to note here?

@clintseinen

Technical lead at the CCCma here - in short, @durack1, in the immediate term mamba and miniforge do give workarounds, and we are exploring that. The concern that I asked @JamesAnstey to flag here is that, depending on how each group handles these risks, some might just block conda installs altogether and adopt overly restrictive policies under which the conda install isn't an option.

That said, it's possible mamba would be allowed, but given that it's really an implementation of conda, I could see some less-informed higher-ups saying that it should be blocked too.

Outside of this licensing issue, I've also worked on HPC platforms where the sys-admins were adamant about not using conda at all.

In short, I'm not saying that this should be a high priority, but it would be great if there was a pip installer for instances like this - I've struggled through building from source myself (made extra tough by needing an old version of CMOR) and it was definitely non-trivial.

@matthew-mizielinski
Author

matthew-mizielinski commented Jan 27, 2025

I believe that miniforge is an alternative wrapper around a conda/miniconda installation, and mamba is conda with a faster, compiled environment resolution tool. As they may still connect to the default channels managed by Anaconda, where licenses may be needed, I think they work in the same way and could be caught up in any site-wide restriction. I wouldn't view these as a guaranteed way around the licensing issue noted above.

There are (moderately?) robust ways around this for organisations; the Met Office uses Artifactory to cache and manage everything that we access via conda (for operational purposes it is important that libraries and tools don't just disappear, breaking our systems, if someone retires them from conda).
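To make the channel concern concrete, here is a minimal sketch of how a site might flag conda channel entries that point at Anaconda-operated repositories so they can be reviewed. The function name `anaconda_channels` and the two lookup sets are hypothetical and deliberately incomplete; they assume that the `defaults` channel name and the `repo.anaconda.com` host are the entries of interest, and the sketch says nothing about whether a license is actually required.

```python
from urllib.parse import urlparse

# Assumption: illustrative, incomplete markers of Anaconda-operated channels.
ANACONDA_CHANNEL_NAMES = {"defaults"}
ANACONDA_HOSTS = {"repo.anaconda.com"}


def anaconda_channels(channels):
    """Return the channel entries that appear to point at Anaconda-operated repositories."""
    flagged = []
    for channel in channels:
        value = channel.strip().rstrip("/")
        # Bare channel names (e.g. "conda-forge") have no hostname.
        host = urlparse(value).hostname or ""
        if value.lower() in ANACONDA_CHANNEL_NAMES or host in ANACONDA_HOSTS:
            flagged.append(channel)
    return flagged
```

For example, passing the `channels` list read from a `.condarc` file, such as `["conda-forge", "defaults"]`, would return `["defaults"]` for a human to review.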

I agree with @clintseinen that this isn't necessarily a high priority, but I think a pip installer is a useful insurance policy, as the alternative (compiling it yourself) is both challenging and daunting for users who don't already have experience of this. (Side note: we stuck with the same version of CMOR for many years, until conda was made available to us, as we thought it would be very tricky to recompile.)

@mauzey1
Collaborator

mauzey1 commented Jan 27, 2025

I was able to make a wheel file for pip installation using the setup.py created in our current conda build pipeline. However, I had a few issues with it. My initial problem was that pip installing CMOR into a new Python environment didn't pull in dependencies like six, NumPy, and NetCDF4, so I needed to install those separately. Once I had those installed, I was able to run most of our Python tests successfully. One test that didn't work was our zstandard compression and quantization test. It runs into the following error:

======================================================================
ERROR: testZstandardCompression (__main__.TestCase.testZstandardCompression)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/mauzey1/Desktop/github/cmor/Test/test_cmor_zstandard_and_quantize.py", line 127, in testZstandardCompression
    zstd_shuffle_ta = zstd_shuffle_nc.variables['ta'][:]
                      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^
  File "src/netCDF4/_netCDF4.pyx", line 5079, in netCDF4._netCDF4.Variable.__getitem__
  File "src/netCDF4/_netCDF4.pyx", line 6051, in netCDF4._netCDF4.Variable._get
  File "src/netCDF4/_netCDF4.pyx", line 2164, in netCDF4._netCDF4._ensure_nc_success
RuntimeError: NetCDF: Filter error: undefined filter encountered

----------------------------------------------------------------------
Ran 2 tests in 1.865s

FAILED (errors=1)

I suspect this is an issue with the differences between the conda-forge NetCDF4 used in the build environment and the PyPI version that was installed via pip. Anyone have any advice on how to fix this?
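One way to narrow this down is to inspect the environment the wheel was installed into. The sketch below reports which of the dependencies named above are missing, and lists files that look like a zstandard filter plugin in the directories named by `HDF5_PLUGIN_PATH` (the environment variable HDF5 uses to locate dynamically loaded filter plugins). It assumes, without confirmation from this thread, that the "undefined filter" error comes from the plugin not being found. The helper names and the "zstd" filename match are hypothetical.

```python
import os
from importlib import metadata
from pathlib import Path


def missing_distributions(names):
    """Return the distribution names that are not installed in this environment."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


def zstd_plugin_candidates(plugin_path):
    """List files with 'zstd' in their name in an HDF5_PLUGIN_PATH-style string."""
    found = []
    for entry in plugin_path.split(os.pathsep):
        directory = Path(entry)
        # Skip empty entries and directories that do not exist.
        if entry and directory.is_dir():
            found.extend(sorted(p for p in directory.iterdir() if "zstd" in p.name.lower()))
    return found


if __name__ == "__main__":
    print("missing:", missing_distributions(["six", "numpy", "netCDF4"]))
    print("zstd plugin candidates:",
          zstd_plugin_candidates(os.environ.get("HDF5_PLUGIN_PATH", "")))
```

An empty candidate list would be consistent with the plugin not being visible to the pip-installed NetCDF4, though it would not prove that this is the cause.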
