add detection for zen 5 #56967
src/processor_x86.cpp
Outdated
@@ -236,6 +237,7 @@ constexpr auto znver2 = znver1 | get_feature_masks(clwb, rdpid, wbnoinvd);
constexpr auto znver3 = znver2 | get_feature_masks(shstk, pku, vaes, vpclmulqdq);
constexpr auto znver4 = znver3 | get_feature_masks(avx512f, avx512cd, avx512dq, avx512bw, avx512vl, avx512ifma, avx512vbmi,
avx512vbmi2, avx512vnni, avx512bitalg, avx512vpopcntdq, avx512bf16, gfni, shstk, xsaves);
constexpr auto znver5 = znver4 | get_feature_masks(avxvnni, movdiri, movdir64b, avx512vp2intersect, /*prefetchi,*/ avxvnni);
I assume prefetchi needs to be added to src/features_x86.h, but I didn't know how.
Line 113 in 4750dc2
JL_FEATURE_DEF(avxvnni, 32 * 9 + 4, 120000)
Now you need to look in the CPU docs for how prefetchi is encoded.
From the "Processor Programming Reference" https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/programmer-references/57896.zip
https://github.com/llvm/llvm-project/blob/3edbe36c3eb01d1c35ac1761da108e3a493258ee/clang/lib/Headers/cpuid.h#L220 The bits are here, though you will need to add the // EAX=7,ECX=1: EDX branch IIUC.
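As a rough illustration of what reading that sub-leaf looks like, here is a minimal sketch using `__get_cpuid_count` from the GCC/Clang `<cpuid.h>` header. It assumes prefetchi is bit 14 of EDX for EAX=7, ECX=1 (the `bit_PREFETCHI` value `0x00004000` in the linked header, which is worth re-checking there). The names `bit_is_set` and `host_has_prefetchi` are hypothetical, and this is not how Julia's processor_x86.cpp queries CPUID.

```cpp
#include <cstdint>

#if (defined(__x86_64__) || defined(__i386__)) && (defined(__GNUC__) || defined(__clang__))
#include <cpuid.h>
#define SKETCH_HAVE_CPUID 1
#endif

// Assumption: prefetchi is reported in CPUID.(EAX=7,ECX=1):EDX bit 14.
constexpr unsigned kPrefetchiBit = 14;

// Hypothetical helper: test a single bit of a 32-bit register value.
constexpr bool bit_is_set(std::uint32_t reg, unsigned bit)
{
    return ((reg >> bit) & 1u) != 0;
}

// Hypothetical helper: returns true if the host CPU reports prefetchi.
// On non-x86 targets (or compilers without <cpuid.h>) it returns false.
bool host_has_prefetchi()
{
#ifdef SKETCH_HAVE_CPUID
    unsigned eax = 0, ebx = 0, ecx = 0, edx = 0;
    // Sub-leaf 0 of leaf 7 reports the highest valid sub-leaf in EAX.
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx) || eax < 1)
        return false;
    if (!__get_cpuid_count(7, 1, &eax, &ebx, &ecx, &edx))
        return false;
    return bit_is_set(edx, kPrefetchiBit);
#else
    return false;
#endif
}
```

The check of sub-leaf 0 first is a defensive choice: it avoids interpreting EDX from a sub-leaf the CPU does not implement.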
Thanks for the hints! What I don't get is where the 32 * 8, 32 * 9 etc. is coming from.
Is this the correct patch or are the 32 * 9 bits incorrect?
diff --git a/src/features_x86.h b/src/features_x86.h
index 2ecc8fee32..b817781404 100644
--- a/src/features_x86.h
+++ b/src/features_x86.h
@@ -113,6 +113,9 @@ JL_FEATURE_DEF(wbnoinvd, 32 * 8 + 9, 0)
JL_FEATURE_DEF(avxvnni, 32 * 9 + 4, 120000)
JL_FEATURE_DEF(avx512bf16, 32 * 9 + 5, 0)
+// EAX=7,ECX=1: EDX
+JL_FEATURE_DEF(prefetchi, 32 * 9 + 20, 0)
+
// EAX=0x14,ECX=0: EBX
JL_FEATURE_DEF(ptwrite, 32 * 10 + 4, 0)
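On where 32 * 8, 32 * 9 etc. come from: judging by the group comments in the file (for example `// EAX=0x14,ECX=0: EBX` heading the `32 * 10` entries), each `32 * N` appears to select the N-th 32-bit CPUID output register in a flat array, and the addend is the bit position inside that register. A minimal sketch of that indexing scheme follows; the helper names (`feature_word`, `feature_bit`, `test_feature`, `set_feature`) are hypothetical and not taken from the Julia sources.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helpers illustrating the "32 * word + bit" encoding.
constexpr std::size_t feature_word(std::size_t idx) { return idx / 32; }
constexpr unsigned feature_bit(std::size_t idx) { return static_cast<unsigned>(idx % 32); }

// Index values copied from the JL_FEATURE_DEF lines in the patch above.
constexpr std::size_t kAvxvnni = 32 * 9 + 4;
constexpr std::size_t kPtwrite = 32 * 10 + 4;

// Assumed layout: one uint32_t per CPUID output register, in group order.
template <std::size_t N>
bool test_feature(const std::array<std::uint32_t, N> &words, std::size_t idx)
{
    assert(feature_word(idx) < N);
    return ((words[feature_word(idx)] >> feature_bit(idx)) & 1u) != 0;
}

template <std::size_t N>
void set_feature(std::array<std::uint32_t, N> &words, std::size_t idx)
{
    assert(feature_word(idx) < N);
    words[feature_word(idx)] |= (std::uint32_t{1} << feature_bit(idx));
}
```

If that reading is right, a feature reported in a different register would need its own `32 * N` slot rather than reusing `32 * 9`, which avxvnni already occupies.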
I'm implementing it and maybe adding some comments
Won't we need to wait for #56130 to be merged before we can use Zen5 since that is only in LLVM 19?
Yes, to take full advantage of zen 5 features I believe LLVM 19 is needed, but this PR is still an improvement since we now fall back to the
src/features_x86.h
Outdated
JL_FEATURE_DEF(avx512vnniw, 32 * 4 + 2, 0)
JL_FEATURE_DEF(avx512fmaps, 32 * 4 + 3, 0)
JL_FEATURE_DEF(uintr, 32 * 4 + 5, 140000)
Isn't the last argument a note of which LLVM version introduced support?
As it turns out those were never implemented :)
@gbaraldi Good to merge? Would be nice to get into 1.12.
I believe so. Has someone actually tested it? 😄 I don't have a zen5.
I did actually try it on my zen 5 machine, and didn't encounter any issues so far:
julia> versioninfo()
Julia Version 1.12.0-DEV.1839
Commit 4750dc2e6f* (2025-01-06 10:44 UTC)
Platform Info:
OS: Linux (x86_64-redhat-linux)
CPU: 16 × AMD Ryzen AI 7 PRO 360 w/ Radeon 880M
WORD_SIZE: 64
LLVM: libLLVM-18.1.7 (ORCJIT, znver5)
Threads: 1 default, 0 interactive, 1 GC (on 16 virtual cores)
julia> @ccall jl_dump_host_cpu()::Cvoid
CPU: znver5
Features: sse3, pclmul, ssse3, fma, cx16, sse4.1, sse4.2, movbe, popcnt, aes, xsave, avx, f16c, rdrnd, fsgsbase, bmi, avx2, bmi2, avx512f, avx512dq, rdseed, adx, avx512ifma, clflushopt, clwb, avx512cd, sha, avx512bw, avx512vl, avx512vbmi, pku, avx512vbmi2, shstk, gfni, vaes, vpclmulqdq, avx512vnni, avx512bitalg, avx512vpopcntdq, rdpid, movdiri, movdir64b, avx512vp2intersect, sahf, lzcnt, sse4a, prfchw, mwaitx, xsaveopt, xsavec, xsaves, clzero, wbnoinvd, avxvnni, avx512bf16
That's still with LLVM 18 of course, where we fall back to znver4. Should I test this on top of #56130 as well?
ref llvm/llvm-project@149a150