add detection for zen 5 #56967

Merged
merged 3 commits into from
Jan 9, 2025

simeonschaub
Member

@@ -236,6 +237,7 @@ constexpr auto znver2 = znver1 | get_feature_masks(clwb, rdpid, wbnoinvd);
constexpr auto znver3 = znver2 | get_feature_masks(shstk, pku, vaes, vpclmulqdq);
constexpr auto znver4 = znver3 | get_feature_masks(avx512f, avx512cd, avx512dq, avx512bw, avx512vl, avx512ifma, avx512vbmi,
avx512vbmi2, avx512vnni, avx512bitalg, avx512vpopcntdq, avx512bf16, gfni, shstk, xsaves);
constexpr auto znver5 = znver4 | get_feature_masks(avxvnni, movdiri, movdir64b, avx512vp2intersect, /*prefetchi,*/ avxvnni);
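The `znverN = znverN-1 | get_feature_masks(...)` chain in this hunk builds each generation's feature set as a superset of the previous one. Below is a minimal sketch of that pattern. It assumes features are plain bit indices in a fixed-size bitset; `FeatureMask`, `make_feature_mask`, and `extends` are hypothetical names, not Julia's actual types. The sketch also shows that naming a feature twice in such a list is redundant but harmless, because OR-ing an already-set bit changes nothing.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <initializer_list>

// Illustrative size only (an assumption): 12 words of 32 feature bits.
constexpr std::size_t kNumFeatures = 32 * 12;
using FeatureMask = std::bitset<kNumFeatures>;

// Hypothetical analogue of get_feature_masks: set one bit per feature index.
FeatureMask make_feature_mask(std::initializer_list<std::size_t> indices) {
    FeatureMask mask;
    for (std::size_t i : indices) {
        assert(i < kNumFeatures);
        mask.set(i);  // setting an already-set bit is a no-op
    }
    return mask;
}

// True if every feature present in `older` is also present in `newer`.
bool extends(const FeatureMask &newer, const FeatureMask &older) {
    return (newer & older) == older;
}
```

In use, a derived mask such as `base | make_feature_mask({...})` always satisfies `extends(derived, base)`, which is the property the znver chain relies on.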
Member Author

I assume prefetchi needs to be added to src/features_x86.h, but I don't know how.

Member

The prefetchi definition needs to be added here, in src/features_x86.h near this line:
JL_FEATURE_DEF(avxvnni, 32 * 9 + 4, 120000)

Next, check the CPU documentation for how prefetchi is encoded, i.e. which CPUID leaf, register, and bit report it.

[Screenshot from the AMD "Processor Programming Reference": https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/programmer-references/57896.zip]

Member

The bits are defined here: https://github.com/llvm/llvm-project/blob/3edbe36c3eb01d1c35ac1761da108e3a493258ee/clang/lib/Headers/cpuid.h#L220. IIUC, though, you will need to add a new section for

// EAX=7,ECX=1: EDX
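The leaf and register named here can be queried directly. Below is a minimal sketch. It assumes a GCC or Clang toolchain on x86, where `<cpuid.h>` provides `__get_cpuid_count`. It also assumes, from our reading of the linked clang header, that PREFETCHI is bit 14 of EDX (`bit_PREFETCHI`, `0x00004000`); verify that bit against the header or vendor documentation before relying on it. `test_bit`, `cpuid_leaf7_sub1_edx`, and `host_has_prefetchi` are hypothetical names, not functions from Julia's sources.

```cpp
#include <cassert>
#include <cstdint>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

// Assumption: PREFETCHI = CPUID.(EAX=7,ECX=1):EDX bit 14 (bit_PREFETCHI in
// clang's cpuid.h). Check this before using it in a feature table.
constexpr unsigned kPrefetchiBit = 14;

// Pure helper: is bit `bit` set in a 32-bit register value?
constexpr bool test_bit(std::uint32_t reg, unsigned bit) {
    return ((reg >> bit) & 1u) != 0;
}

// Read EDX of CPUID leaf 7, sub-leaf 1. Returns false when the leaf is not
// available or when not compiling for x86.
bool cpuid_leaf7_sub1_edx(std::uint32_t &edx_out) {
#if defined(__x86_64__) || defined(__i386__)
    unsigned eax, ebx, ecx, edx;
    // __get_cpuid_count returns 0 if leaf 7 exceeds the maximum basic leaf.
    if (!__get_cpuid_count(7, 1, &eax, &ebx, &ecx, &edx))
        return false;
    edx_out = edx;
    return true;
#else
    (void)edx_out;
    return false;
#endif
}

// Combine the query with the (assumed) bit position.
bool host_has_prefetchi() {
    std::uint32_t edx = 0;
    return cpuid_leaf7_sub1_edx(edx) && test_bit(edx, kPrefetchiBit);
}
```

On non-x86 builds the query simply reports `false`, so the same code compiles everywhere.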

Member Author

simeonschaub commented Jan 6, 2025

Thanks for the hints! What I don't get is where the 32 * 8, 32 * 9, etc. come from.

Is this patch correct, or is the 32 * 9 part wrong?

diff --git a/src/features_x86.h b/src/features_x86.h
index 2ecc8fee32..b817781404 100644
--- a/src/features_x86.h
+++ b/src/features_x86.h
@@ -113,6 +113,9 @@ JL_FEATURE_DEF(wbnoinvd, 32 * 8 + 9, 0)
 JL_FEATURE_DEF(avxvnni, 32 * 9 + 4, 120000)
 JL_FEATURE_DEF(avx512bf16, 32 * 9 + 5, 0)
 
+// EAX=7,ECX=1: EDX
+JL_FEATURE_DEF(prefetchi, 32 * 9 + 20, 0)
+
 // EAX=0x14,ECX=0: EBX
 JL_FEATURE_DEF(ptwrite, 32 * 10 + 4, 0)
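The `32 * N + b` expressions appear to encode a flat feature index. In that reading, `N` selects a 32-bit word in the feature array, with one word per CPUID register group such as `// EAX=7,ECX=1: EDX`, and `b` is the bit within that register. This layout is inferred from how features_x86.h groups its entries and is not confirmed in this thread. Assuming it holds, the mapping can be sketched as follows; `feature_index`, `word_of`, `bit_of`, and `has_feature` are hypothetical helpers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Flat index of bit `bit` in 32-bit word `word` (same shape as `32 * 9 + 4`).
constexpr std::uint32_t feature_index(std::uint32_t word, std::uint32_t bit) {
    return 32 * word + bit;
}

// Inverse mapping: which word and which bit a flat index refers to.
constexpr std::uint32_t word_of(std::uint32_t index) { return index / 32; }
constexpr std::uint32_t bit_of(std::uint32_t index) { return index % 32; }

// Test a feature in an array of per-register words; e.g. words[9] would hold
// the register value recorded for the `32 * 9` group.
bool has_feature(const std::vector<std::uint32_t> &words, std::uint32_t index) {
    std::size_t w = word_of(index);
    assert(w < words.size());
    return ((words[w] >> bit_of(index)) & 1u) != 0;
}
```

Under this reading, two different CPUID registers cannot share the same `N`. Each register group needs its own word.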

Member

I'm implementing it now and will probably add some explanatory comments.

Contributor

imciner2 commented Jan 6, 2025

Won't we need to wait for #56130 to be merged before we can use Zen 5, since Zen 5 support only landed in LLVM 19?

simeonschaub
Member Author

Yes, I believe LLVM 19 is needed to take full advantage of Zen 5 features. This PR is still an improvement, though: on older LLVM versions we now fall back to the znver4 target instead of the generic one.
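One way to picture this fallback is as a walk along a chain of CPU targets until the code reaches one the LLVM in use knows about. Below is a minimal sketch of that idea. The `TargetSpec` table and its fields are hypothetical placeholders, not Julia's actual CPU table. Every version number is also a placeholder except znver5's LLVM 19 requirement, which comes from this thread.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical table entry: a target, the target to fall back to, and the
// oldest LLVM major version assumed to know the target.
struct TargetSpec {
    std::string name;
    std::string fallback;
    int min_llvm_major;
};

// Placeholder table; only znver5's LLVM 19 requirement is from the thread.
const std::vector<TargetSpec> kTargets = {
    {"generic", "generic", 0},
    {"znver4", "generic", 16},
    {"znver5", "znver4", 19},
};

const TargetSpec *find_target(const std::string &name) {
    for (const auto &t : kTargets)
        if (t.name == name)
            return &t;
    return nullptr;
}

// Follow fallbacks until reaching a target the given LLVM version supports.
std::string select_target(const std::string &detected, int llvm_major) {
    const TargetSpec *t = find_target(detected);
    if (!t)
        return "generic";  // unknown CPU: use the generic target
    while (t->min_llvm_major > llvm_major) {
        if (t->fallback == t->name)
            break;  // "generic" falls back to itself; stop there
        const TargetSpec *next = find_target(t->fallback);
        if (!next)
            return "generic";  // malformed table: be conservative
        t = next;
    }
    return t->name;
}
```

With `llvm_major = 18`, a detected znver5 resolves to znver4. This matches the LLVM 18 behaviour described later in the thread.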

Comment on lines 83 to 84
JL_FEATURE_DEF(avx512vnniw, 32 * 4 + 2, 0)
JL_FEATURE_DEF(avx512fmaps, 32 * 4 + 3, 0)
JL_FEATURE_DEF(uintr, 32 * 4 + 5, 140000)
Member

Isn't the last argument the LLVM version that introduced support for the feature?

Member

As it turns out those were never implemented :)
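For reference, the third `JL_FEATURE_DEF` argument in the lines quoted above reads like a minimum LLVM version. It seems to be packed as `major * 10000 + minor * 100 + patch`, so `120000` reads as 12.0.0 and `140000` as 14.0.0, with `0` meaning no minimum. That reading is an assumption drawn from the values in this thread. Under it, gating a feature on the LLVM in use could be sketched as below; `pack_llvm_version`, `llvm_major`, and `feature_available` are hypothetical helpers.

```cpp
#include <cassert>
#include <cstdint>

// Pack a version the way the feature table's values appear to be packed.
constexpr std::uint32_t pack_llvm_version(std::uint32_t major, std::uint32_t minor,
                                          std::uint32_t patch) {
    return major * 10000 + minor * 100 + patch;
}

// Recover the major version from a packed value.
constexpr std::uint32_t llvm_major(std::uint32_t packed) { return packed / 10000; }

// A feature with minimum version `min_ver` is usable when the LLVM in use is
// at least that new; min_ver == 0 therefore means "any version".
constexpr bool feature_available(std::uint32_t min_ver, std::uint32_t llvm_ver) {
    return llvm_ver >= min_ver;
}
```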

simeonschaub
Member Author

@gbaraldi Good to merge? It would be nice to get this into 1.12.

Member

gbaraldi commented Jan 8, 2025

I believe so. Has anyone actually tested it? 😄 I don't have a Zen 5 machine.

simeonschaub
Member Author

I did try it on my Zen 5 machine and haven't encountered any issues so far:

julia> versioninfo()
Julia Version 1.12.0-DEV.1839
Commit 4750dc2e6f* (2025-01-06 10:44 UTC)
Platform Info:
  OS: Linux (x86_64-redhat-linux)
  CPU: 16 × AMD Ryzen AI 7 PRO 360 w/ Radeon 880M
  WORD_SIZE: 64
  LLVM: libLLVM-18.1.7 (ORCJIT, znver5)
Threads: 1 default, 0 interactive, 1 GC (on 16 virtual cores)

julia> @ccall jl_dump_host_cpu()::Cvoid
CPU: znver5
Features: sse3, pclmul, ssse3, fma, cx16, sse4.1, sse4.2, movbe, popcnt, aes, xsave, avx, f16c, rdrnd, fsgsbase, bmi, avx2, bmi2, avx512f, avx512dq, rdseed, adx, avx512ifma, clflushopt, clwb, avx512cd, sha, avx512bw, avx512vl, avx512vbmi, pku, avx512vbmi2, shstk, gfni, vaes, vpclmulqdq, avx512vnni, avx512bitalg, avx512vpopcntdq, rdpid, movdiri, movdir64b, avx512vp2intersect, sahf, lzcnt, sse4a, prfchw, mwaitx, xsaveopt, xsavec, xsaves, clzero, wbnoinvd, avxvnni, avx512bf16
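Output like this can be sanity-checked by confirming that the features the reviewed diff adds for znver5 on top of znver4 all appear in the `Features:` line. Those features are `avxvnni`, `movdiri`, `movdir64b`, and `avx512vp2intersect`; `prefetchi` was still commented out in that diff. Below is a small sketch. It assumes the line is a comma-separated list as printed by `jl_dump_host_cpu`; `parse_features` and `missing_features` are hypothetical helpers.

```cpp
#include <cassert>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Split "Features: a, b, c" into {"a", "b", "c"}.
std::set<std::string> parse_features(const std::string &line) {
    std::set<std::string> out;
    std::string body = line;
    const std::string prefix = "Features:";
    if (body.compare(0, prefix.size(), prefix) == 0)
        body = body.substr(prefix.size());
    std::stringstream ss(body);
    std::string item;
    while (std::getline(ss, item, ',')) {
        // Trim the spaces that surround each comma-separated entry.
        auto first = item.find_first_not_of(' ');
        auto last = item.find_last_not_of(' ');
        if (first != std::string::npos)
            out.insert(item.substr(first, last - first + 1));
    }
    return out;
}

// Return the wanted features that are absent from `have`, in request order.
std::vector<std::string> missing_features(const std::set<std::string> &have,
                                          const std::vector<std::string> &want) {
    std::vector<std::string> missing;
    for (const auto &f : want)
        if (!have.count(f))
            missing.push_back(f);
    return missing;
}
```

If you pass the `Features:` line above to `parse_features`, `missing_features` comes back empty for those four names.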

That's still with LLVM 18, of course, where we fall back to znver4. Should I test this on top of #56130 as well?

simeonschaub merged commit 4250be8 into master on Jan 9, 2025
7 checks passed
simeonschaub deleted the sds/znver5 branch on January 9, 2025 at 10:37