[Performance] Performance degradation with ZenDNN #5

Closed
aj-prime opened this issue Jun 12, 2024 · 3 comments

Comments

@aj-prime

Describe the issue

I followed the installation instructions described in section 4 of the README.

Processor Name: AMD EPYC 7V13 64-Core Processor (Azure Cloud)

Performance (QPS):
ZenDNN: 34
CPU: 77

To reproduce

CPU: python -m onnxruntime.transformers.benchmark -m bert-large-uncased --model_class AutoModel -p fp32 -i 3 -t 10 -b 24 -s 16 -n 96 -v --provider cpu

ZenDNN: python -m onnxruntime.transformers.benchmark -m bert-large-uncased --model_class AutoModel -p fp32 -i 3 -t 10 -b 24 -s 16 -n 96 -v --provider zendnn
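
The two commands above differ only in the `--provider` value. A minimal sketch for running them back to back with identical flags might look like the following. It assumes the `onnxruntime-zendnn` package is installed so that the `zendnn` provider is accepted; the helper names `build_benchmark_cmd` and `run_both` are hypothetical and not part of ONNX Runtime.

```python
import subprocess
import sys

# Flags copied verbatim from the two commands above; only --provider varies.
COMMON_ARGS = [
    "-m", "bert-large-uncased",
    "--model_class", "AutoModel",
    "-p", "fp32",
    "-i", "3",
    "-t", "10",
    "-b", "24",
    "-s", "16",
    "-n", "96",
    "-v",
]


def build_benchmark_cmd(provider):
    """Return the benchmark command line for one execution provider (hypothetical helper)."""
    return [
        sys.executable, "-m", "onnxruntime.transformers.benchmark",
        *COMMON_ARGS,
        "--provider", provider,
    ]


def run_both(providers=("cpu", "zendnn")):
    """Run the benchmark once per provider, stopping on the first failure."""
    for provider in providers:
        cmd = build_benchmark_cmd(provider)
        print("Running:", " ".join(cmd))
        subprocess.run(cmd, check=True)
```

Calling `run_both()` from the same shell session keeps the environment variables identical for both runs, so any difference in QPS comes from the provider alone.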

I also tried 64 threads, but that resulted in even worse performance.

Urgency

No response

Platform

Linux

OS Version

Ubuntu 20.04.4 LTS

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

onnxruntime-zendnn:1.17.0

ONNX Runtime API

Python

Architecture

X64

Execution Provider

Default CPU

Execution Provider Library Version

No response

Model File

No response

Is this a quantized model?

Unknown

@ajeet1203singh

Hello @aj-prime, could you let us know which environment variables you are using, and confirm that the "Optimal Memory Allocator Settings Specific to ONNXRT" section of the user guide was followed?

For this case our recommended settings are:
export GOMP_CPU_AFFINITY=0-63 &&
export OMP_NUM_THREADS=64 &&
export OMP_WAIT_POLICY=ACTIVE &&
export OMP_PROC_BIND=FALSE &&
export OMP_DYNAMIC=FALSE &&
export ZENDNN_MATMUL_ALGO=FP32:4 &&
export LD_PRELOAD=$ZENDNN_PARENT_FOLDER/openmp-10.0.1.src/runtime/src/libomp.so

These should be used with the transparent huge pages (THP) setting set to "always".
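
Before rerunning the benchmark, it can help to verify that the session actually matches these recommendations. The sketch below is one possible check, not an official ZenDNN tool: the function names are hypothetical, `LD_PRELOAD` is skipped because its path depends on the local `$ZENDNN_PARENT_FOLDER`, and the THP mode is read from the standard Linux sysfs file `/sys/kernel/mm/transparent_hugepage/enabled`.

```python
import os

# Values copied from the recommendation above. LD_PRELOAD is intentionally
# left out because its path depends on the local $ZENDNN_PARENT_FOLDER.
RECOMMENDED_ENV = {
    "GOMP_CPU_AFFINITY": "0-63",
    "OMP_NUM_THREADS": "64",
    "OMP_WAIT_POLICY": "ACTIVE",
    "OMP_PROC_BIND": "FALSE",
    "OMP_DYNAMIC": "FALSE",
    "ZENDNN_MATMUL_ALGO": "FP32:4",
}

# Standard Linux location of the transparent huge pages mode.
THP_PATH = "/sys/kernel/mm/transparent_hugepage/enabled"


def env_mismatches(env=None):
    """Map each variable that differs to a (recommended, actual) pair."""
    env = os.environ if env is None else env
    return {
        name: (want, env.get(name))
        for name, want in RECOMMENDED_ENV.items()
        if env.get(name) != want
    }


def parse_thp_mode(text):
    """Extract the active mode; the kernel brackets it, e.g. 'always [madvise] never'."""
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None


def current_thp_mode(path=THP_PATH):
    """Return the active THP mode, or None if the file cannot be read."""
    try:
        with open(path) as f:
            return parse_thp_mode(f.read())
    except OSError:
        return None
```

An empty result from `env_mismatches()` together with `current_thp_mode() == "always"` would indicate that the session matches the settings listed above.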

@aj-prime
Author

Thanks @ajeet1203singh. Applying the Optimal Memory Allocator Settings resolved the issue.

@lauthu

lauthu commented Aug 13, 2024

Hello @ajeet1203singh and @aj-prime, could you please share some numbers on the expected throughput improvement with ZenDNN 4.2?

I'm also trying to run the transformer benchmark and got a similar result (ZenDNN is slower than the CPU Execution Provider).
