
Releases: amd/ZenDNN

ZenDNN Release v5.0

15 Nov 11:39

The focus of the ZenDNN 5.0 release is on delivering support for Zen5 AMD EPYC™ architectures, as well as performance enhancements for generative LLM models through the PyTorch plug-in. Supported model architectures include Llama2, Llama3, Phi2, Phi3, Qwen, ChatGLM, and GPT. The release also delivers performance improvements to non-generative LLM models such as BERT.

The ZenDNN library can be used in the following frameworks through a plug-in:

  • TensorFlow v2.16 and later.
  • PyTorch v2.0 and later.

The ZenDNN library is integrated with ONNX Runtime v1.19.2.

The highlights of this release are as follows:

  • Support for the Zen5 family of AMD EPYC™ processors, codenamed Turin.
  • Compatibility with AOCL BLIS 5.0.
  • AMD EPYC™ specific enhancements to matmul operators and related fusions, specifically for BF16 precision.
  • An auto-tuning algorithm (BF16:0) specifically targeting generative LLM models.
  • Support for weight-only quantization (WOQ) with INT4 weights and BF16 activations for LLMs; ZenDNN 5.0 natively supports models optimized and exported using the AMD quantizer, Quark.
  • AMD EPYC™ specific enhancements for WOQ matmul operators and related fusions.
  • Performance enhancements targeted at generative LLM models using the function zentorch.llm.optimize( ) available in the
    ZenDNN PyTorch plug-in; this function contains additive AMD EPYC™ specific optimizations on top of the x86 optimizations available in
    ipex.llm.optimize().
  • An optimized Scaled Dot Product Attention (SDPA) operator in the PyTorch plug-in, including KV cache performance optimizations tailored to AMD EPYC™ cache architectures.
  • Support for BF16 precision for Recommender System models in the PyTorch plug-in.
  • Graph optimization and pattern matching improvements in the PyTorch plug-in.
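The zentorch.llm.optimize( ) entry point mentioned above is applied as an optimization pass over an already-loaded model. The sketch below is illustrative only: it assumes the zentorch plug-in, PyTorch, and the Hugging Face transformers library are installed, and the model name and generation arguments are examples, not part of the ZenDNN API.

```python
# Illustrative sketch of zentorch.llm.optimize() on a generative LLM.
# Assumptions: zentorch, PyTorch, and transformers are installed, and the
# (example) Llama-2 checkpoint is available locally or via the Hub.
try:
    import torch
    import zentorch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "meta-llama/Llama-2-7b-hf"  # example from the supported list
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.bfloat16
    ).eval()

    # Apply the AMD EPYC-specific LLM optimizations from the ZenDNN plug-in.
    model = zentorch.llm.optimize(model, dtype=torch.bfloat16)

    inputs = tokenizer("The AMD EPYC family", return_tensors="pt")
    with torch.inference_mode():
        output = model.generate(**inputs, max_new_tokens=32)
    print(tokenizer.decode(output[0], skip_special_tokens=True))
    optimized = True
except ImportError:
    # zentorch or transformers is not installed in this environment.
    optimized = False
```

Because the optimizations are additive over ipex.llm.optimize(), the rest of the inference loop (tokenization, generate, decode) is unchanged from a stock Hugging Face workflow.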

ZenDNN Release v4.2

21 May 13:13

The highlights of this release are as follows:

  • The ZenDNN library is based on oneDNN v2.6.3 and provides optimizations tailored to enable performant AI inference on AMD EPYC™ servers.

  • The ZenDNN library can be used in the following frameworks through a plug-in:

    • TensorFlow v2.16 and later
    • PyTorch v2.0 and later
  • The ZenDNN library is integrated with ONNX Runtime v1.17.0.

  • Support for performance-tuning environment variables
    The following environment variables have been added to tune performance:

    • Memory Pooling (Persistent Memory Caching)
      • ZENDNN_ENABLE_MEMPOOL for all TensorFlow models
      • MEMPOOL support added for BF16 TensorFlow models
    • Convolution Operation
      • ZENDNN_CONV_ALGO for all TensorFlow models
      • Added new options to ALGO paths
    • Matrix Multiplication Operation
      • ZENDNN_MATMUL_ALGO for TensorFlow, PyTorch, and ONNX Runtime models
      • Added new options, ALGO paths, and an experimental auto-tuner for TensorFlow
  • Embedding Bag and Embedding Operators

    • Support for the Embedding operator
    • AVX512 support for the Embedding and Embedding Bag kernels
    • Two new parallelization strategies for the Embedding and Embedding Bag operators: table threading and hierarchical threading
  • Matrix Multiplication (MatMul) Operators

    • MatMul post-ops computation with BLIS kernels
    • Weight caching for FP32 JIT and BLIS kernels
    • BLIS BF16 kernel support
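The tuning knobs listed above are plain environment variables set in the shell before launching the framework process. A minimal sketch follows; the variable names come from this release, but the specific values are illustrative — consult the ZenDNN user guide for the options valid in your version and framework.

```shell
# Set ZenDNN tuning knobs before launching inference.
# Values below are illustrative; see the ZenDNN user guide for valid options.
export ZENDNN_ENABLE_MEMPOOL=1     # memory pooling (persistent caching), TensorFlow
export ZENDNN_CONV_ALGO=3          # convolution algorithm path, TensorFlow
export ZENDNN_MATMUL_ALGO=FP32:4   # MatMul algorithm path (syntax is framework-specific)

# Then launch the workload in the same shell, e.g.:
# python run_tf_inference.py       # hypothetical inference script
```

Since these are read at runtime by the ZenDNN-enabled framework, different values can be benchmarked per model without rebuilding anything.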

ZenDNN Release v4.1

26 Sep 06:38

ZenDNN Release version v4.1

TensorFlow:
Integration with TensorFlow v2.12
Built with manylinux2014

PyTorch:
Integration with PyTorch v1.13
Built with manylinux2014

ONNX Runtime:
Integration with ONNX Runtime v1.15.1
Built with manylinux2014

ZenDNN Release v4.0

18 Jan 05:15

ZenDNN Release version v4.0

TensorFlow:
Integration with TensorFlow v2.10
Built with manylinux2014

PyTorch:
Integration with PyTorch v1.12
Built with manylinux2014

ONNX Runtime:
Integration with ONNX Runtime v1.12.1
Built with manylinux2014

ZenDNN Release v3.3

05 Jul 11:04

ZenDNN Release version v3.3

TensorFlow:
Integration with TensorFlow v2.9
--config=zendnn enabled
Built with manylinux2014

PyTorch:
Integration with PyTorch v1.11.0
Built with manylinux2014
Graph optimizations for CNNs

ZenDNN Release v3.2.5

23 May 05:23

ZenDNN Release version v3.2.5

TensorFlow:
Integration with TensorFlow v1.15

ZenDNN Release v3.2

22 Dec 05:14

ZenDNN Release version v3.2

TensorFlow:
Integration with TensorFlow v2.7
--config=zendnn enabled
Compiled with the GCC compiler
Filter caching support for convolution
Added a new path (in-place blocked direct convolution)
Added two new algorithms for the MatMul op
A few convolution fusions
Eager mode support
Bug fixes

PyTorch:
Integration with PyTorch v1.9.0
Built with manylinux2014
Graph optimizations for CNNs

ONNX Runtime:
Integration with ONNX Runtime v1.8.0
Graph optimization to accelerate ResNet variant models

ZenDNN Release v3.1

30 Aug 10:14

This is the first public release of ZenDNN.