
Fix for numpy2.0 compatibility #1341

Merged
PGijsbers merged 1 commit into fix/sklearn-test-compatibility from fix/numpy2compatible on Jul 3, 2024

Conversation

PGijsbers
Collaborator

NumPy 2.0 cleaned up its namespace, so we have to alias the removed names for the new version.

The problem is that this is not covered in CI (the tested versions do not support NumPy 2.0), and in general the tests need to be updated to support more recent scikit-learn versions that support NumPy 2.0 (1.5, maybe 1.4?). I don't know if I really have the time for that right now. Options:

  • Put a pin in the dependencies
  • Accept this as a best-effort, and hope that the tests that don't fail due to scikit-learn incompatibility sufficiently cover the code to raise any numpy1/2 issues
  • Wait until I find time to make tests support scikit-learn 1.5
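
The aliasing the fix describes can be sketched roughly as below. This is a minimal sketch, not the actual patch: `NUMPY_FLOAT` is a hypothetical name I introduce for illustration, and I assume the typical case of a top-level alias such as `np.float_` that NumPy 2.0 removed.

```python
import numpy as np

# Minimal sketch of a numpy 1.x / 2.x compatibility alias.
# NumPy 2.0 removed several top-level aliases (e.g. np.float_),
# so code that still uses them needs a version-dependent shim.
if np.lib.NumpyVersion(np.__version__) >= "2.0.0":
    NUMPY_FLOAT = np.float64  # np.float_ was removed in numpy 2.0
else:
    NUMPY_FLOAT = np.float_   # still available on numpy 1.x
```

Because `np.float_` was merely an alias for `np.float64` on 1.x, code using the shim behaves identically on both major versions.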

NumPy 2.0 cleaned up its namespace.
@PGijsbers PGijsbers added the testing and dependencies labels Jul 3, 2024
@codecov-commenter

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 28.20%. Comparing base (923b49d) to head (124e96f).

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #1341      +/-   ##
===========================================
- Coverage    32.94%   28.20%   -4.75%     
===========================================
  Files           38       38              
  Lines         5254     5255       +1     
===========================================
- Hits          1731     1482     -249     
- Misses        3523     3773     +250     


@PGijsbers PGijsbers changed the base branch from develop to fix/sklearn-test-compatibility July 3, 2024 14:08
@PGijsbers PGijsbers marked this pull request as ready for review July 3, 2024 14:08
@PGijsbers PGijsbers merged commit 9af2f62 into fix/sklearn-test-compatibility Jul 3, 2024
9 of 10 checks passed
@PGijsbers PGijsbers deleted the fix/numpy2compatible branch July 3, 2024 14:08
PGijsbers added a commit that referenced this pull request Jul 5, 2024
* Update 'sparse' parameter for OHE for sklearn >= 1.4
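
The rename behind this commit can be handled with a small version dispatch; a sketch under the usual assumption that `OneHotEncoder`'s `'sparse'` parameter was renamed to `'sparse_output'` in scikit-learn 1.2 and removed in 1.4 (the helper name is mine, not from the patch):

```python
def ohe_sparse_kwargs(sklearn_version: str) -> dict:
    """Return the OneHotEncoder keyword requesting dense output.

    'sparse' was renamed to 'sparse_output' in scikit-learn 1.2 and
    removed in 1.4, so pick the keyword based on the installed version.
    """
    major, minor = (int(part) for part in sklearn_version.split(".")[:2])
    if (major, minor) >= (1, 2):
        return {"sparse_output": False}
    return {"sparse": False}
```

Usage would then be `OneHotEncoder(**ohe_sparse_kwargs(sklearn.__version__))`.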

* Add compatibility or skips for sklearn >= 1.4

* Change 'auto' to 'sqrt' for sklearn>1.3 as 'auto' is deprecated

* Skip flaky test

It is unclear how to create the conditions under which the test is supposed to pass. Even after running the test suite 2-3 times, it does not seem to pass.

* Fix typo

* Ignore description comparison for newer scikit-learn

There are some minor changes to the docstrings. I am not sure it is useful to keep testing it this way, so for now I will disable the test on newer versions.

* Adjust for scikit-learn 1.3

The loss has been renamed.
The performance of the model also seems to have changed slightly
for the same seed. So I decided to compare with the lower fidelity
that was already used on Windows systems.

* Remove timeout and reruns to better investigate CI failures

* Fix typo in parameter name

* Add jobs for more recent scikit-learns

* Expand the matrix with all scikit-learn 1.x versions

* Fix for numpy2.0 compatibility (#1341)

NumPy 2.0 cleaned up its namespace.

* Rewrite matrix and update numpy compatibility

* Move comment in-line

* Stringify name of new step to see if that prevented the action

* Fix unspecified os for included jobs

* Fix typo in version pinning for numpy

* Fix version specification for sklearn skips

* Output final list of installed packages for debugging purposes

* Cap scipy version for older versions of scikit-learn

There is a breaking change to the way 'mode' works that breaks scikit-learn internals.

* Update parameter base_estimator to estimator for sklearn>=1.4
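
One way to support both parameter names without hard-coding version numbers is to inspect the constructor signature; a sketch (the helper name is mine, and this is only one possible approach, not necessarily the one the commit took):

```python
import inspect

def estimator_kwarg(cls, estimator) -> dict:
    """Build the keyword argument for passing a sub-estimator.

    Uses whichever parameter name the class accepts: 'estimator'
    (newer scikit-learn) or the older 'base_estimator'.
    """
    params = inspect.signature(cls.__init__).parameters
    name = "estimator" if "estimator" in params else "base_estimator"
    return {name: estimator}
```

A call site would then read `AdaBoostClassifier(**estimator_kwarg(AdaBoostClassifier, tree))` regardless of the installed scikit-learn.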

* Account for changes to sklearn interface in 1.4 and 1.5

* Non-strict reinstantiation requires different scikit-learn version

* Parameters were already changed in 1.4

* Fix race condition (I think)

It seems to me that run.evaluations is set only when the run is
fetched. Whether it has evaluations depends on server state.
So if the server has resolved the traces between the initial
fetch and the trace-check, you could be checking
len(run.evaluations) where evaluations is None.

* Use latest patch version of each minor release

* Convert numpy types back to builtin types

Scikit-learn or numpy changed the typing of the parameters
(seen in a masked array, not sure if also outside of that).
Convert these values back to Python builtins.
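
The conversion described here can be sketched with `np.generic.item()` (the function name is mine, for illustration):

```python
import numpy as np

def to_builtin(value):
    """Convert numpy scalar types back to Python builtins.

    np.generic is the common base class of all numpy scalars
    (np.int64, np.float64, np.str_, ...), and .item() returns
    the closest builtin equivalent. Non-numpy values pass through.
    """
    if isinstance(value, np.generic):
        return value.item()
    return value
```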

* Specify versions with * instead to allow for specific patches

* Flow_exists does not return None but False if the flow does not exist

* Update new version definitions also in the installation step

* Fix bug introduced in refactoring for np.generic support

We don't want to serialize the value as np.nan; we want to include the nan directly. It is an indication that the parameter was left unset.

* Add back the single-test timeout of 600s

* [skip ci] Add note to changelog

* Check that evaluations are present with None-check instead

The default behavior when no evaluations are present is for it to be None, so it makes sense to check for that instead. As far as I can tell, run.evaluations should always contain some items when it is not None, but I added an assert just in case.
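
The None-check plus assert described above can be sketched like this (the function name is mine; I only assume a `run` object with an `evaluations` attribute, as described in the commit message):

```python
def has_evaluations(run) -> bool:
    """True if the server has attached evaluations to this run.

    run.evaluations is None until the server has computed them, so
    calling len() on it directly can raise in that window. When it
    is not None it should contain items; assert that just in case.
    """
    if run.evaluations is None:
        return False
    assert len(run.evaluations) > 0, "non-None evaluations should not be empty"
    return True
```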

* Remove timeouts again

I suspect they "crash" workers. This of course introduces the risk of hanging processes... But I cannot reproduce the issue locally.
Labels
dependencies, testing
2 participants