CPUSummary.jl v0.1.14 breaks CI of Trixi.jl on skylake-avx512 #6
Additional information:
|
Unfortunately, CPUSummary 0.1.8 did not work under wine (that is, they'd segfault Julia as soon as you I do have skylake-avx512 locally, so I probably just need to spend the time to figure out what is different in |
Unless you need to run Julia on wine, I suggest you pin CPUSummary 0.1.8. |
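For reference, pinning could look like the following minimal sketch. It assumes the standard Pkg API and that the currently active project is the one to change; adapt it if the restriction should instead go into a `[compat]` entry.

```julia
using Pkg

# Install exactly version 0.1.8 of CPUSummary into the active project ...
Pkg.add(name = "CPUSummary", version = "0.1.8")

# ... and pin it so that later `Pkg.update()` calls do not move it to v0.1.14.
Pkg.pin("CPUSummary")
```

Running `Pkg.free("CPUSummary")` later removes the pin again.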
One problem is that my check for "will Hwloc segfault Julia or throw an error" (CPUSummary.jl/src/CPUSummary.jl, lines 15 to 17 in d7c3676) almost always returns a false positive, even though it passes when run from the REPL. |
Okay, thanks!
Yeah, that's our current workaround at trixi-framework/Trixi.jl#1083 |
If using Hwloc is the problem, it seems weird that our CI reports |
That will also generally be inaccurate.
julia> using CPUSummary
julia> CPUSummary.USE_HWLOC
true
julia> isdefined(CPUSummary, :safe_topology_load!)
false
This is a far more reliable check. Therefore, look at |
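To get the same information into a CI log, the two checks from the REPL session above could be wrapped in a small script. This is only a sketch: `hwloc_status` is a hypothetical helper name, and `USE_HWLOC` and `safe_topology_load!` are internal names taken from this thread that may not exist in other CPUSummary.jl versions.

```julia
using CPUSummary

# Hypothetical helper (not part of CPUSummary.jl): collect both indicators
# discussed in this thread so they can be printed during a CI run.
function hwloc_status()
    # `USE_HWLOC` is an internal constant; guard the access in case it is absent.
    use_hwloc = isdefined(CPUSummary, :USE_HWLOC) ? CPUSummary.USE_HWLOC : missing
    # According to the comment above, this is the more reliable check.
    loaded = isdefined(CPUSummary, :safe_topology_load!)
    return (use_hwloc = use_hwloc, safe_topology_load_defined = loaded)
end

println(hwloc_status())
```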
Oh, okay. That's indeed |
We observed some specific problems when going from CPUSummary.jl v0.1.8 to v0.1.14 at Trixi.jl. Everything is fine with the old version of CPUSummary.jl. CI also passes with the new version unless the GitHub CI runner happens to use LLVM: libLLVM-12.0.1 (ORCJIT, skylake-avx512) (either ubuntu-latest or windows-latest).
I could reduce this problem at https://github.com/trixi-framework/TrixiDebug.jl. Using the latest version of CPUSummary.jl, CI fails on
- ubuntu-latest (e.g., https://github.com/trixi-framework/TrixiDebug.jl/runs/5492313195?check_suite_focus=true#step:6:357)
- windows-latest (e.g., https://github.com/trixi-framework/TrixiDebug.jl/runs/5492410761?check_suite_focus=true#step:6:356)
Restricting CPUSummary.jl to v0.1.8 lets CI pass on
- ubuntu-latest with LLVM: libLLVM-12.0.1 (ORCJIT, skylake-avx512) (https://github.com/trixi-framework/TrixiDebug.jl/runs/5493268766?check_suite_focus=true#step:6:358)
- windows-latest with LLVM: libLLVM-12.0.1 (ORCJIT, skylake-avx512) (https://github.com/trixi-framework/TrixiDebug.jl/runs/5493268841?check_suite_focus=true#step:6:357)
So far, we have not been able to reproduce this locally...
For context: We use some matrix multiplications based on matmul! from Octavian.jl. To me, it seems like these multiplications fail catastrophically, resulting in the errors shown in CI.
CC @sloede
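One way to test the suspicion about the multiplications directly would be to compare `Octavian.matmul!` against `LinearAlgebra.mul!` on the affected runner. The following is a minimal sketch, not the reduced example from TrixiDebug.jl; the 16×16 size and the random inputs are arbitrary assumptions.

```julia
using LinearAlgebra   # reference implementation: mul!
using Octavian        # implementation under suspicion: matmul!

# Arbitrary test matrices; size and contents are assumptions for illustration.
A = rand(16, 16)
B = rand(16, 16)
C_ref = similar(A)
C_oct = similar(A)

mul!(C_ref, A, B)              # C_ref = A * B via LinearAlgebra
Octavian.matmul!(C_oct, A, B)  # C_oct = A * B via Octavian.jl

# On a healthy setup the two results should agree up to rounding.
println("max abs difference: ", maximum(abs, C_oct .- C_ref))
println("results agree: ", C_oct ≈ C_ref)
```

If the difference is large only on runners reporting skylake-avx512, that would support the suspicion above.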