Releases: amd/ZenDNN
ZenDNN Release v5.0
The focus of the ZenDNN 5.0 release is on delivering support for Zen5 AMD EPYC™ architectures, as well as performance enhancements for generative LLMs through the PyTorch plug-in. Supported model architectures include Llama2, Llama3, Phi2, Phi3, Qwen, ChatGLM, and GPT. The release also delivers performance improvements for non-generative models such as BERT.
The ZenDNN library can be used in the following frameworks through a plug-in:
- TensorFlow v2.16 and later.
- PyTorch v2.0 and later.
The ZenDNN library is integrated with ONNX Runtime v1.19.2.
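The minimum framework versions listed above can be checked before installing a plug-in. The following is a minimal sketch, not part of ZenDNN: the helper names are hypothetical, and the distribution names `tensorflow` and `torch` are assumed to be the usual PyPI names.

```python
import re
from importlib import metadata

# Minimum versions taken from the release notes; the distribution names
# ("tensorflow", "torch") are assumptions based on the usual PyPI names.
MINIMUM_VERSIONS = {
    "tensorflow": (2, 16),
    "torch": (2, 0),
}


def parse_version(text):
    """Return the leading numeric components of a version string as a tuple."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(installed, minimum):
    """Compare two version tuples, padding the shorter one with zeros."""
    width = max(len(installed), len(minimum))
    padded_installed = installed + (0,) * (width - len(installed))
    padded_minimum = minimum + (0,) * (width - len(minimum))
    return padded_installed >= padded_minimum


def check_frameworks(minimums=MINIMUM_VERSIONS):
    """Map each distribution to True/False, or None when it is not installed."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = parse_version(metadata.version(name))
        except metadata.PackageNotFoundError:
            report[name] = None
            continue
        report[name] = meets_minimum(installed, minimum)
    return report


if __name__ == "__main__":
    labels = {True: "ok", False: "too old", None: "not installed"}
    for name, status in check_frameworks().items():
        print(f"{name}: {labels[status]}")
```

Local version suffixes such as `+cpu` are ignored by the parser, so only the numeric release components are compared.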
The highlights of this release are as follows:
- Support for the Zen5 family of AMD EPYC™ processors, codenamed Turin.
- Compatibility with AOCL BLIS 5.0.
- AMD EPYC™ specific enhancements to matmul operators and related fusions, specifically for BF16 precision.
- An auto-tuning algorithm BF16:0 specifically targeting generative LLM models.
- Support for weight-only quantization (WOQ) with INT4 weights and BF16 activations for LLMs; ZenDNN 5.0 natively supports models optimized and exported using the AMD Quantizer Quark.
- AMD EPYC™ specific enhancements for WOQ matmul operators and related fusions.
- Performance enhancements targeted at generative LLM models using the function zentorch.llm.optimize() available in the ZenDNN PyTorch plug-in; this function contains additive AMD EPYC™ specific optimizations on top of the x86 optimizations available in ipex.llm.optimize().
- An optimized Scaled Dot Product Attention (SDPA) operator in the PyTorch plug-in, including KV cache performance optimizations tailored to AMD EPYC™ cache architectures.
- Support for BF16 precision for Recommender System models in the PyTorch plug-in.
- Graph optimization and pattern matching improvements in the PyTorch plug-in.
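As an illustration of how zentorch.llm.optimize() from the list above might be applied, here is a hedged sketch. The wrapper names are hypothetical. The argument list (`model`, `dtype=torch.bfloat16`) and the `"zentorch"` backend name for `torch.compile` are assumptions that should be checked against the plug-in documentation for the installed version. When the plug-in is not installed, the helper returns the model unchanged.

```python
import importlib.util


def zentorch_available():
    """Return True when both torch and the zentorch plug-in can be imported."""
    return all(
        importlib.util.find_spec(name) is not None
        for name in ("torch", "zentorch")
    )


def optimize_for_epyc(model):
    """Apply the zentorch LLM optimizations if possible, else return model as-is."""
    if not zentorch_available():
        # Fall back to the unmodified model when the plug-in is missing.
        return model

    import torch
    import zentorch  # importing the plug-in is assumed to register its backend

    # Assumed call shape: optimize(model, dtype=...) using BF16 precision.
    model = zentorch.llm.optimize(model, dtype=torch.bfloat16)
    # Assumed backend name for graph compilation through the plug-in.
    return torch.compile(model, backend="zentorch")
```

A caller would load a model with its usual framework code, pass it through `optimize_for_epyc`, and run inference on the returned object.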
ZenDNN Release v4.2
The highlights of this release are as follows:
- The ZenDNN library is based on oneDNN v2.6.3 and provides optimizations tailored to enable performant AI inference on AMD EPYC™ servers.
- The ZenDNN library can be used in the following frameworks through a plug-in:
  - TensorFlow v2.16 and later.
  - PyTorch v2.0 and later.
- The ZenDNN library is integrated with ONNX Runtime v1.17.0.
- Support for environment variables for tuning performance. The following environment variables have been added:
  - Memory Pooling (Persistent Memory Caching)
    - ZENDNN_ENABLE_MEMPOOL for all TensorFlow models
    - Added MEMPOOL support for BF16 models in TensorFlow
  - Convolution Operation
    - ZENDNN_CONV_ALGO for all TensorFlow models
    - Added new options to ALGO paths
  - Matrix Multiplication Operation
    - ZENDNN_MATMUL_ALGO for TensorFlow, PyTorch, and ONNX Runtime models
    - Added new options, ALGO paths, and an experimental version of auto-tuner for TensorFlow
- Embedding Bag and Embedding Operators
  - Support for the Embedding operator
  - AVX512 support for the Embedding and Embedding Bag kernels
  - Two new parallelization strategies for the Embedding and Embedding Bag operators, namely Table threading and Hierarchical threading
- Matrix Multiplication (MatMul) Operators
  - MatMul post-ops computation with BLIS kernels
  - Weight caching for FP32 JIT and BLIS kernels
  - BLIS BF16 kernel support
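The tuning variables named in this release are typically set before the framework is imported. The sketch below shows one way to do that from Python. The variable names come from the notes above, but the helper name is hypothetical and the values are placeholders (assumptions), not recommended settings; valid values should be taken from the ZenDNN user guide.

```python
import os

# Variable names come from the release notes. The values are placeholders
# (assumptions) and must be replaced with settings from the ZenDNN user guide.
ZENDNN_TUNING = {
    "ZENDNN_ENABLE_MEMPOOL": "1",
    "ZENDNN_CONV_ALGO": "1",
    "ZENDNN_MATMUL_ALGO": "1",
}


def apply_zendnn_env(settings, environ=os.environ, override=False):
    """Copy ZENDNN_* settings into environ and return the ones actually applied.

    Existing values are kept unless override is True, so settings exported
    in the shell take precedence by default.
    """
    applied = {}
    for name, value in settings.items():
        if not name.startswith("ZENDNN_"):
            raise ValueError(f"not a ZenDNN variable: {name!r}")
        if override or name not in environ:
            environ[name] = str(value)
            applied[name] = str(value)
    return applied


# Apply the settings first, then import the framework (for example
# TensorFlow) so that the library sees them at load time.
apply_zendnn_env(ZENDNN_TUNING)
```

Keeping shell-exported values by default lets the same script be tuned from the command line without code changes.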
ZenDNN Release v4.1
TensorFlow:
Integration with TensorFlow v2.12
Built with manylinux2014
PyTorch:
Integration with PyTorch v1.13
Built with manylinux2014
ONNX Runtime:
Integration with ONNX Runtime v1.15.1
Built with manylinux2014
ZenDNN Release v4.0
TensorFlow:
Integration with TensorFlow v2.10
Built with manylinux2014
PyTorch:
Integration with PyTorch v1.12
Built with manylinux2014
ONNX Runtime:
Integration with ONNX Runtime v1.12.1
Built with manylinux2014
ZenDNN Release v3.3
TensorFlow:
Integration with TensorFlow v2.9
--config=zendnn enabled
Built with manylinux2014
PyTorch:
Integration with PyTorch v1.11.0
Built with manylinux2014
Graph optimizations for CNNs
ZenDNN Release v3.2.5
TensorFlow:
Integration with TensorFlow v1.15
ZenDNN Release v3.2
TensorFlow:
Integration with TensorFlow v2.7
--config=zendnn enabled
Compiled with GCC
Filter caching support for Convolution
Added a new path (in-place blocked direct convolution)
Added two new algorithms for the MatMul op
A few convolution fusions
Eager support
Bug fixes
PyTorch:
Integration with PyTorch v1.9.0
Built with manylinux2014
Graph optimizations for CNNs
ONNX Runtime:
Integration with ONNX Runtime v1.8.0
Graph optimization to accelerate ResNet variant models
ZenDNN Release v3.1
This is the first public release of ZenDNN.