
Help with configs and reproduction of results #4

Open
ngsrinivas opened this issue Jan 15, 2021 · 6 comments

Comments

@ngsrinivas

Hi xdp team,

I'm attempting to replicate part of the results from the XDP CoNEXT '18 paper, starting with the packet-drop and forward-out-the-same-interface XDP programs. I'm using xdp_rxq_info for both, with actions XDP_DROP and XDP_TX, on kernel 5.4.
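
For concreteness, a sketch of the kind of invocation I mean (interface name is a placeholder; the option names below are from the samples/bpf version of the tool and may differ between kernel versions):

# drop every packet on the given interface
sudo ./xdp_rxq_info --dev <ifname> --action XDP_DROP
# bounce packets back out the same interface
sudo ./xdp_rxq_info --dev <ifname> --action XDP_TX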

I've been following instructions from https://github.com/tohojo/xdp-paper/blob/master/benchmarks/bench02_xdp_drop.org

I have a few questions. If I can provide any more info, please let me know.

(i) The XDP paper mentions kernel configurations such as disabling full preemption and retpoline mitigation. Would you happen to have (pointers to) any data that discusses the relative importance of these configs for the Mpps per core?

(ii) The retpoline config for the kernel is easy enough to find (CONFIG_RETPOLINE). Could you please point me to the config option that disables full preemption?

In general, if there is any chance you could provide a diff of your kernel config against the distribution's baseline config, I would be very grateful.

(iii) I couldn't find concrete documentation (with a quick Google search) on PCIe descriptor compression. Apologies if I'm missing something basic; I'd appreciate any pointers on how you accomplished this.

(iv) After following the NIC configuration (RSS, Ethernet flow control, NIC striding, etc., trying a few different RX ring sizes) and IRQ affinity settings described in the various scripts in the benchmarks folder (sketched below), I cannot seem to push single-core drop performance beyond 16 Mpps (the paper reports ~24 Mpps per core) at 64 bytes.

I'm using an AMD EPYC 7452 32-Core Processor (2312.611 MHz).

Do you think the clock frequency alone explains the discrepancy (the XDP paper reports experiments on a 3.6 GHz processor), or should I look elsewhere?
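
The tuning I applied amounts roughly to the following (interface name, IRQ number, and ring size are placeholders, and the exact ethtool options may vary by driver):

# reduce the NIC to a single combined channel so RSS lands on one queue
ethtool -L <ifname> combined 1
# disable Ethernet flow control (pause frames)
ethtool -A <ifname> rx off tx off
# try different RX ring sizes
ethtool -G <ifname> rx 512
# pin the NIC's RX IRQ to a single core
echo 2 > /proc/irq/<irq>/smp_affinity_list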

(v) Further, I'm finding that even with multiple cores, total Mpps across all cores tapers off at around the same value (~16.5 Mpps). Have you observed anything like this? Where should I look to pinpoint the problem?

I would be very grateful for any help with these. Thank you in advance.

Srinivas

@netoptimizer
Contributor

That is a lot of questions... I'll try to answer the ones I can in separate comments.

@netoptimizer
Contributor

(i) The XDP paper mentions kernel configurations such as disabling full preemption and retpoline mitigation. Would you happen to have (pointers to) any data that discusses the relative importance of these configs for the Mpps per core?

Today the kernel has fixes/workarounds for the overhead of retpoline (CONFIG_RETPOLINE).
Thus, the retpoline (compile-time) setting isn't that important for reproducing our results on today's kernels.

XDP was actually not as affected by CONFIG_RETPOLINE as other parts of the kernel.
One XDP area that was affected was DMA mappings combined with XDP_REDIRECT.
Measurements and fixes are documented here: https://github.com/xdp-project/xdp-project/tree/master/areas/dma
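
To check how your own kernel is built and which Spectre v2 mitigation is active at runtime, something like the following should work (the /boot/config path is distro-dependent):

# compile-time setting
grep CONFIG_RETPOLINE= /boot/config-$(uname -r)
# runtime mitigation status reported by the kernel
cat /sys/devices/system/cpu/vulnerabilities/spectre_v2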

@netoptimizer
Contributor

In general, if there is any chance you could provide a diff of your kernel config against the distribution's baseline config, I would be very grateful.

You have to do the diff yourself.

I found the kernel config on my testlab machine, and I have added/committed it to:
https://github.com/xdp-project/xdp-paper/tree/master/benchmarks/ (file: config-4.17.0-rc7-bpf-next-xdp-paper02)

I thought I had already done this, but I guess I forgot.
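
A quick way to produce the diff is the kernel's scripts/diffconfig helper, which compares two config files and prints only the options that differ (paths below are placeholders):

# run from a kernel source tree
./scripts/diffconfig /boot/config-$(uname -r) config-4.17.0-rc7-bpf-next-xdp-paper02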

@netoptimizer
Contributor

(ii) The retpoline config for the kernel is easy enough to find (CONFIG_RETPOLINE). Could you please point me to the config option that disables full preemption?

Hmm... did we write that we disabled preemption?

Looking at the config I just uploaded, it looks like we have enabled preemption:

CONFIG_PREEMPT=y
CONFIG_PREEMPT_COUNT=y
# CONFIG_DEBUG_PREEMPT is not set
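
For comparison on your own kernel, the preemption-related options can be checked with something like the following (the /boot/config path is distro-dependent; a kernel built with CONFIG_PREEMPT usually also shows "PREEMPT" in its version string):

# preemption model options in the running kernel's config
grep -E '^CONFIG_PREEMPT' /boot/config-$(uname -r)
# the version string of a fully preemptible kernel typically contains PREEMPT
uname -v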

@netoptimizer
Contributor

(iii) I couldn't find concrete documentation (with a quick Google search) on PCIe descriptor compression. Apologies if I'm missing something basic; I'd appreciate any pointers on how you accomplished this.

I guess you are talking about the driver priv-flags used, e.g., in benchmarks/bench01_baseline.org and bench02_xdp_drop.org:

ethtool --set-priv-flags DEVICE FLAG on|off
ethtool --show-priv-flags DEVICE
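
If you are on an mlx5 NIC as we were, the flag in question is, as far as I recall, the rx_cqe_compress priv-flag (completion-queue-entry compression), e.g.:

# enable RX CQE compression on an mlx5 device (device name is a placeholder)
ethtool --set-priv-flags <ifname> rx_cqe_compress on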

@netoptimizer
Contributor

(iv) After following the NIC configuration (RSS, Ethernet flow control, NIC striding, etc., trying a few different RX ring sizes) and IRQ affinity settings described in the various scripts in the benchmarks folder, I cannot seem to push single-core drop performance beyond 16 Mpps (the paper reports ~24 Mpps per core) at 64 bytes.

I'm using an AMD EPYC 7452 32-Core Processor (2312.611 MHz).

Do you think the clock frequency alone explains the discrepancy (the XDP paper reports experiments on a 3.6 GHz processor), or should I look elsewhere?

You need to investigate whether your CPU supports DDIO/DCA.

Maybe this is related to this issue: xdp-project/xdp-tutorial#170
The issue describes how you can test this...
...can you please help us answer whether DDIO works on AMD EPYC CPUs?

There is a scientific article about DDIO/DCA here: https://www.usenix.org/conference/atc20/presentation/farshin
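
A rough way to check is to watch last-level-cache misses on the core handling RX while the drop test runs; without DDIO/DCA the packet data has to be fetched from DRAM, so the LLC miss rate goes up noticeably (CPU number is a placeholder, and the generic event names below depend on your perf/CPU support):

# count LLC loads/misses on the RX core for 10 seconds
perf stat -C <cpu> -e LLC-loads,LLC-load-misses sleep 10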
