Update AMD EPYC-5 tuning guide after review (#472)
* Update AMD EPYC-5 tuning guide after review
---------

Signed-off-by: Yousaf Kaukab <[email protected]>
Co-authored-by: Yousaf Kaukab <[email protected]>
Co-authored-by: Daria Vladykina <[email protected]>
3 people authored Dec 18, 2024
1 parent 5e00edf commit f413a3c
Showing 1 changed file with 38 additions and 38 deletions.
76 changes: 38 additions & 38 deletions xml/MAIN-SBP-AMD-EPYC-5-SLES15SP6.xml
@@ -87,8 +87,8 @@
</author>
<author>
<personname>
-<firstname>Brent</firstname>
-<surname>Hollingsworth</surname>
+<firstname>Kim</firstname>
+<surname>Naru</surname>
</personname>
<affiliation>
<jobtitle>Engineering Manager</jobtitle>
@@ -121,7 +121,7 @@
<abstract>

<para>The document at hand provides an overview of both the AMD EPYC™ 9005 Series
-Classic and AMD EPYC™ 9005 Series Dense Processors. It details how some
+Processors based on Zen5 and Zen5c cores. It details how some
computational-intensive workloads can be tuned on SUSE Linux Enterprise Server
15 SP6.</para>

@@ -143,19 +143,19 @@
<title>Overview</title>

<para>The AMD EPYC 9005 Series Processor is the 5th generation of the AMD EPYC server
-class processors family. It is based on the Zen 5 microarchitecture introduced in 2024.
-AMD EPYC 9005 Series classic processors supports up to 128 Zen5 cores (256 threads) whereas
-AMD EPYC 9005 Series Dense Processors support up to 192 Zen5c cores (384 threads). Both support 12
+class processor family. It is based on the Zen 5 microarchitecture introduced in 2024.
+AMD EPYC 9005 Series Processors based on Zen5 cores support up to 128 cores (256 threads) whereas
+AMD EPYC 9005 Series Processors based on Zen5c cores support up to 192 cores (384 threads). Both support 12
memory channels per socket. At the time of writing, 1-socket and 2-socket models are expected
to be available from Original Equipment Manufacturers (OEMs) in 2024. This document provides
-an overview of the AMD EPYC 9005 Series Classic Processor and how computational-intensive
+an overview of the AMD EPYC 9005 Series Processors based on Zen5 cores and how computational-intensive
workloads can be tuned on SUSE Linux Enterprise Server 15 SP6. Additional details about the
-AMD EPYC 9005 Series Dense Processor are provided where appropriate.</para>
+AMD EPYC 9005 Series Processors based on Zen5c cores are provided where appropriate.</para>

</sect1>

<sect1 xml:id="sec-epyc-architecture">
-<title>AMD EPYC 9005 Series Classic Processor architecture</title>
+<title>AMD EPYC 9005 Series Processor (Zen5 cores) architecture</title>

<para><emphasis role="italic">Symmetric multiprocessing (SMP)</emphasis> systems are those that
contain two or more physical processing cores. Each core may have two threads if <emphasis role="italic">Symmetric
@@ -245,14 +245,14 @@
</sect1>

<sect1 xml:id="sec-epyc9005-topology">
-<title>AMD EPYC 9005 Series Classic Processor topology</title>
+<title>AMD EPYC 9005 Series Processor (Zen5 cores) topology</title>

<para><xref linkend="fig-epyc-topology"/> below shows the topology of an example two socket machine
with a fully populated memory configuration generated by the <package>lstopo</package>
tool.</para>

<figure xml:id="fig-epyc-topology">
-<title>AMD EPYC 9005 Series Classic Processor Topology</title>
+<title>AMD EPYC 9005 Series Processors based on Zen5 cores Topology</title>
<mediaobject>
<imageobject role="fo">
<imagedata fileref="amd-epyc-5-topology.png" width="100%" format="PNG"/>
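The topology drawn by lstopo can also be cross-checked on a running system by grouping CPUs according to the L3 cache they share, which is the same per-CPU sysfs data the guide's taskset example reads from cache/index3/shared_cpu_list. The following is a minimal Python sketch, assuming the standard Linux layout /sys/devices/system/cpu/cpuN/cache/indexM/ with level and shared_cpu_list files; parse_cpu_list and l3_domains are hypothetical helper names, not part of any SUSE or AMD tool.

```python
from pathlib import Path


def parse_cpu_list(text: str) -> list[int]:
    """Expand a kernel CPU list such as "0-7,128-135" into sorted CPU numbers."""
    cpus: set[int] = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            start, end = part.split("-")
            cpus.update(range(int(start), int(end) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def l3_domains(sysfs_cpu: Path = Path("/sys/devices/system/cpu")) -> list[list[int]]:
    """Return one sorted CPU list per distinct L3 cache reported in sysfs."""
    domains: set[tuple[int, ...]] = set()
    # cpu[0-9]* skips non-CPU entries such as cpufreq/ and cpuidle/.
    for cache in sysfs_cpu.glob("cpu[0-9]*/cache/index[0-9]*"):
        level = cache / "level"
        # Check the cache level instead of assuming index3 is always the L3.
        if level.is_file() and level.read_text().strip() == "3":
            shared = (cache / "shared_cpu_list").read_text()
            domains.add(tuple(parse_cpu_list(shared)))
    return sorted(list(d) for d in domains)
```

Calling l3_domains() with the default path returns one CPU list per L3 cache; the length of each list shows how many CPUs (including SMT siblings) share that cache.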
@@ -311,11 +311,12 @@ node 0 1
</sect1>

<sect1 xml:id="sec-memory-zen5c-variant">
-<title>AMD EPYC 9005 Series Dense Processor</title>
+<title>AMD EPYC 9005 Series Processors (Zen5c cores)</title>

-<para>The AMD EPYC 9005 Series Dense Processor launched in 2024. While the fundamental
-microarchitecture is based on the <quote>Zen 5</quote> compute core, there are some
-important differences between it and the AMD EPYC 9005 Series Classic Processors. Both
+<para>The AMD EPYC 9005 Series Processors based on Zen5c cores launched in 2024. While the fundamental
+microarchitecture is based on the <quote>Zen 5</quote> compute core, the Zen5c core is optimized for density and efficiency:
+its physical layout takes less space and is designed to deliver more performance per watt.
+There are some other important differences between these processors and the AMD EPYC 9005 Series Processors based on Zen5 cores. Both
processors are socket-compatible, have the same number of memory channels and the
same number of I/O lanes. This means that the processors may be used interchangeably
on the same platform with the same limitation that dual-socket configurations must
@@ -326,27 +327,24 @@ node 0 1
<para>Despite the compatible ISA, the processors are physically different using a
manufacturing process focused on increased density for both the CPU core and the
physical cache. The L1 and L2 caches have the same capacity. The L3 cache capacity per core is half
-the capacity of the AMD EPYC 9005 Series Classic Processor as twice as many cores are placed on
-each CCDs. The basic CCX structure for both the AMD EPYC 9005 Series Dense and
-9005 Series Classic processor is similar but each CCD for the AMD EPYC 9005 Series
-Dense has 16 cores instead of 8. While the AMD EPYC 9005 Series Classic can have up to 16
-CCDs (with 1 CCX each) within a socket, the AMD EPYC 9005 Series Dense processor can have up to
-12 CCDs, each with 16 cores. This increases the maximum number of its cores per socket from 128
+the capacity of the AMD EPYC 9005 Series Processors based on Zen5 cores, because twice as many cores are placed on
+each CCD. The basic CCX structure in both processors is similar, but each Zen5c CCD
+has 16 cores instead of 8. While up to 16 CCDs (with 1 CCX each) with Zen5 cores can be placed in a package, only up to
+12 CCDs with Zen5c cores (each containing 16 cores) can be placed in a package. This increases the maximum number of cores per socket from 128
cores to 192. Finally, the <emphasis>Thermal Design Points (TDPs)</emphasis> differ
-for the AMD EPYC 9005 Series Dense processor, with different frequency scaling limits
-and generally a lower peak frequency. While each individual core may achieve less peak
-performance than the AMD EPYC 9005 Series Classic Processor, the total peak compute
+for Zen5c cores, with different frequency scaling limits
+and generally a lower peak frequency. While each individual Zen5c core may achieve less peak
+performance than the Zen5 core, the total peak compute
throughput available is higher due to the increased number of cores.</para>

<para>The intended use case and workloads determine which processor is superior. The
-key advantage of the AMD EPYC 9005 Series Dense Processor is packing more cores within
+key advantage of the AMD EPYC 9005 Series Processors based on Zen5c cores is packing more cores within
the same socket. This may benefit Cloud or HyperScale environments in that more
containers or virtual machines can use uncontested CPUs for their workloads within
the same physical machine. As a result, physical space in data centers can potentially
be reduced. It may also benefit some HPC workloads that are primarily CPU and memory bound.
For example, some HPC workloads scale to the number of available cores working on data sets that
-are too large to fit into a typical cache. For such workloads, the AMD EPYC 9005 Series Dense
-Processor may be ideal.</para>
+are too large to fit into a typical cache. For such workloads, the AMD EPYC 9005 Series Processors based on Zen5c cores may be ideal.</para>

</sect1>

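The core, thread and L3 figures in this section follow from simple arithmetic on the CCD layout. The following is a minimal Python sketch, assuming two hardware threads per core (SMT enabled) and the same L3 capacity on both CCD types, which is what "half the capacity ... as twice as many cores are placed on each CCD" implies; the Package class, its fields and l3_per_core_relative are illustrative names only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Package:
    """Illustrative model of one processor package (socket)."""

    core: str
    max_ccds: int       # CCDs that fit in one package
    cores_per_ccd: int  # cores on each CCD

    @property
    def max_cores(self) -> int:
        return self.max_ccds * self.cores_per_ccd

    @property
    def max_threads(self) -> int:
        # Assumes SMT is enabled, i.e. two hardware threads per core.
        return 2 * self.max_cores


def l3_per_core_relative(variant: Package, baseline: Package) -> float:
    """L3 capacity per core of `variant` relative to `baseline`.

    Assumes every CCD carries the same L3 capacity, so the per-core share
    scales inversely with the number of cores on a CCD.
    """
    return baseline.cores_per_ccd / variant.cores_per_ccd


ZEN5 = Package("Zen5", max_ccds=16, cores_per_ccd=8)
ZEN5C = Package("Zen5c", max_ccds=12, cores_per_ccd=16)

for pkg in (ZEN5, ZEN5C):
    print(f"{pkg.core}: {pkg.max_cores} cores, {pkg.max_threads} threads per socket")
print(f"Zen5c L3 per core vs Zen5: {l3_per_core_relative(ZEN5C, ZEN5):.2f}x")
```

Running it prints 128 cores and 256 threads for Zen5, 192 cores and 384 threads for Zen5c, and a per-core L3 ratio of 0.50x, matching the figures quoted in the Overview.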
@@ -904,8 +902,10 @@ epyc:~ # taskset -c `cat /sys/devices/system/cpu/cpu1/cache/index3/shared_cpu_li
CPU. There is a latency penalty when switching P-States, but the AMD EPYC 9005
Series Processor is capable of making fine-grained adjustments to reduce the likelihood
that the latency is a bottleneck. On SUSE Linux Enterprise
-Server, the AMD EPYC 9005 Series Processor uses the <command>acpi_cpufreq</command>
-driver by default. This allows P-states to be configured to match requested performance. However,
+Server 15 SP6, the cpufreq subsystem uses the <command>acpi_cpufreq</command> driver by default for AMD EPYC 9005 Series Processors.
+This may change in future SUSE Linux Enterprise Server releases,
+as work is in progress to enable the <command>amd-pstate</command> driver for AMD EPYC 9005 Series Processors.
+The cpufreq subsystem allows P-States to be configured to match requested performance. However,
this is limited in terms of the full capabilities of the hardware. It cannot boost
the frequency beyond the maximum stated frequencies, and if a target is specified,
then the highest frequency below the target will be used. A special case is if the
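Because the default driver may change between releases, it can be useful to confirm which cpufreq driver is actually active. The kernel reports it per CPU in the scaling_driver file; note that sysfs shows the driver name with a hyphen (acpi-cpufreq) rather than the module name acpi_cpufreq. The following is a minimal Python sketch, assuming the standard /sys/devices/system/cpu/cpuN/cpufreq/scaling_driver layout; cpufreq_drivers is a hypothetical helper name.

```python
from collections import Counter
from pathlib import Path


def cpufreq_drivers(sysfs_cpu: Path = Path("/sys/devices/system/cpu")) -> Counter:
    """Count how many CPUs report each cpufreq scaling driver."""
    drivers: Counter = Counter()
    # cpu[0-9]* matches cpu0, cpu1, ... but not the shared cpufreq/ directory.
    for path in sysfs_cpu.glob("cpu[0-9]*/cpufreq/scaling_driver"):
        drivers[path.read_text().strip()] += 1
    return drivers


if __name__ == "__main__":
    found = cpufreq_drivers()
    if not found:
        print("no cpufreq driver registered")
    for name, count in found.items():
        print(f"{name}: {count} CPUs")
```

On a system using the ACPI driver, every CPU is expected to report acpi-cpufreq; once the amd-pstate driver is in use, the same check reports amd-pstate or amd-pstate-epp instead.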
@@ -969,9 +969,9 @@ Pac. Die Core CPU Avg_M Busy% Bzy_M TSC_M IPC IRQ POLL C1 C2 POLL% C1
descriptors on a system with 512 CPUs. Although it's possible for an application
to increase the current limit, the version of <command>turbostat</command> that ships with SLE15-SP6
at the time of writing does not change this limit. As a result, it fails with error
-<emphasis role="italic">open failed: Too many open files.</emphasis> Fix for this
+<emphasis role="italic">open failed: Too many open files.</emphasis> A fix for this
issue is on the way. Meanwhile, <command>turbostat</command> can be run after locally changing
-this limit by running the command <command>ulimit -n 1029</command>
+this limit by running the command <command>ulimit -n 1029</command>.
</para>
</note>

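When turbostat is started from a Python script rather than from an interactive shell, the same workaround can be applied with the standard resource module: raise the soft RLIMIT_NOFILE limit before starting the child process, which inherits it. The following is a minimal sketch; raise_nofile_limit is a hypothetical helper, and unlike ulimit -n in bash (which sets both the soft and the hard limit when neither -S nor -H is given) it only raises the soft limit, capped at the current hard limit.

```python
import resource


def raise_nofile_limit(wanted: int = 1029) -> int:
    """Raise the soft RLIMIT_NOFILE to `wanted`, capped at the hard limit.

    Returns the soft limit in effect afterwards. Child processes started
    after this call (for example turbostat via subprocess) inherit it.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard != resource.RLIM_INFINITY:
        wanted = min(wanted, hard)
    if soft == resource.RLIM_INFINITY or soft >= wanted:
        return soft  # already high enough; never lower an existing limit
    resource.setrlimit(resource.RLIMIT_NOFILE, (wanted, hard))
    return wanted


if __name__ == "__main__":
    print("open-file soft limit:", raise_nofile_limit())
```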
@@ -1760,9 +1760,9 @@ epyc:~ # perf script
</sect1>

<sect1 xml:id="sec-tuning-zen5c-variant">
-<title>Tuning AMD EPYC 9005 Series Dense</title>
+<title>Tuning AMD EPYC 9005 Series Processors (Zen5c cores)</title>

-<para>As the AMD EPYC 9005 Series Classic and AMD EPYC Series Dense are ISA-compatible,
+<para>As the Zen5 and Zen5c cores are ISA-compatible,
no code tuning or compiler setting changes should be necessary. For Cloud environments,
partitioning or any binding of Virtual CPUs to Physical CPUs may need to be adjusted
to account for the increased number of cores. The additional cores may also allow
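One way to re-examine a static CPU partition for a larger core count is to hand out whole L3 domains, so that no two guests or containers share an L3 cache. The following is a minimal Python sketch of that one possible policy; partition_by_l3 is a hypothetical helper, and its input is assumed to be a list of CPU groups that share an L3 (for example, the distinct cache/index3/shared_cpu_list values from sysfs).

```python
def partition_by_l3(domains: list[list[int]], guests: int) -> list[list[int]]:
    """Assign whole L3 domains to `guests` partitions, round-robin.

    Each returned list is the sorted set of CPUs for one guest; no L3
    domain is split across guests.
    """
    if not 1 <= guests <= len(domains):
        raise ValueError(f"guests must be between 1 and {len(domains)}")
    plan: list[list[int]] = [[] for _ in range(guests)]
    for index, domain in enumerate(domains):
        plan[index % guests].extend(domain)
    return [sorted(cpus) for cpus in plan]


# Illustrative example: with the same number of guests, each guest receives
# twice as many CPUs when every L3 domain holds twice as many CPUs.
eight_per_l3 = [list(range(0, 8)), list(range(8, 16))]
sixteen_per_l3 = [list(range(0, 16)), list(range(16, 32))]
print([len(p) for p in partition_by_l3(eight_per_l3, 2)])    # [8, 8]
print([len(p) for p in partition_by_l3(sixteen_per_l3, 2)])  # [16, 16]
```

The resulting CPU lists can then be fed to whatever pinning mechanism the environment already uses; the point of the sketch is only that a partition sized for one L3 granularity changes when the number of CPUs per L3 domain changes.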
@@ -1778,8 +1778,8 @@ epyc:~ # perf script
<!-- comment: Seems a word was missing - is "partitioning" correct here?-->
may adjust automatically but any static partitioning should be re-examined.</para>

-<para>When configuring workloads for either the AMD EPYC 9005 Series Classic or the AMD
-EPYC 9005 Series Dense, the most important task is to set expectations. While super-linear
+<para>When configuring workloads for AMD EPYC 9005 Series Processors based on either Zen5 or Zen5c cores,
+the most important task is to set expectations. While super-linear
scaling is possible, it should not be expected. It may be possible to achieve super-linear
scaling in Cloud Environments for the number of instances hosted without performance
loss if individual containers or virtual machines are not utilising 100% of CPU. However,
@@ -1854,9 +1854,9 @@ epyc:~ # perf script
<sect1 xml:id="sec-conclusion">
<title>Conclusion</title>

-<para>The introduction of the AMD EPYC 9005 Series Classic and AMD EPYC 9005 Series Dense
-Processors continues to push the boundaries of what is possible for memory and
-IO-bound workloads with much higher bandwidth and available number of channels. A
+<para>The introduction of the AMD EPYC 9005 Series Processors continues to push
+the boundaries of what is possible for memory and
+IO-bound workloads with significantly higher bandwidth and an increased number of channels. A
properly configured and tuned workload can exceed the performance of many contemporary
off-the-shelf solutions even when fully customized. The symmetric and balanced nature
of the machine makes the task of tuning a workload considerably easier, given that