-
Notifications
You must be signed in to change notification settings - Fork 129
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Speedup CI tests by bypassing CPython's difflib
for large string comparisons?
#1254
Comments
Bypassing these comparisons helps a little bit, but somehow only for some tests when I run them locally. Currently coffea's CI jobs take ~30min, which seems quite long. I'm currently developing integration-tests for awkward, uproot, vector, dask-awkward, and coffea. The CI jobs run all test suites and it takes in total ~35mins of which ~30mins are spent only in coffea's tests. Is there a way to speed them up? |
A lot of the time is spent in setting up and tearing down dask clients (and also running jobs in them). We could speed things up by using a single client over the whole test and changing our sample root files to use zstd instead of lzma (but maybe not by much...). Though perhaps the best compression for these little test files is no compression (unless they are suddenly no longer little)? I wouldn't skip doing these tests because they're largely representative of emergent bugs from integration and use cases, so they're important for catching nasty regressions. |
I'll profile some of the slower ones today and see if these hypotheses hold. |
Thanks! Not sure if that's somehow useful, but dask distributed uses a |
We may want think about changing some tests from:
to
A few tests, e.g. this one: https://github.com/scikit-hep/coffea/blob/master/tests/test_dataset_tools.py#L379-L380, have very large strings to compare.
pytest
uses CPython'sdifflib
to create a diff of the left and right value if used withassert
directly (this is super helpful for small strings).difflib
is unfortunately pretty slow for large strings especially with small differences, see pytest-dev/pytest#8998.I'm not sure if this is always triggered or only for failing tests. I had to tweak this to make a failing test run locally, otherwise I was stuck in a not-in-my-lifetime-ending for-loop in
difflib
.(this comes at the cost of not having a diff, but that's not useful to read for very large strings anyway)
The text was updated successfully, but these errors were encountered: