# ONNXim: Fast and Detailed Multi-core NPU Simulator
[![Docker Image CI](https://github.com/PSAL-POSTECH/ONNXim/actions/workflows/docker-image.yml/badge.svg)](https://github.com/PSAL-POSTECH/ONNXim/actions/workflows/docker-image.yml)

ONNXim is a fast cycle-level simulator that models multi-core NPUs for DNN inference. Its features include the following:
- Fast simulation speed.
- Support for modeling multi-core NPUs.
- Support for cycle-level simulation of network-on-chip (through [Booksim2](https://github.com/booksim/booksim2)) and memory (through [Ramulator](https://github.com/CMU-SAFARI/ramulator)), which is important for memory-bound operations of DNNs.
- Use of ONNX graphs as DNN model specification enabling simulation of DNNs implemented in different deep learning frameworks (e.g., PyTorch and TensorFlow).

## Requirements
### OS Distribution
* CentOS8 (Recommended)

*We have not tested ONNXim on other Linux distributions.*
### Python(>=3.8) Packages
* torch >= 1.10.1
* conan == 1.57.0
* onnxruntime >= 1.10.0
* torchvision >= 0.17.2 (Optional: for ONNX graph generation)
* optimum >= 1.19.0 (Optional: for ONNX graph generation)

### Other Dependencies
* cmake >= 3.22.1
* gcc >= 8.3


## ONNX Graph
ONNXim requires an ONNX graph file (`.onnx`) to simulate a DNN model. We provide a fused ResNet-18 model in the `models` directory as an example. To export a new DNN model to an ONNX graph, use the `scripts/generate_*_onnx.py` scripts as shown below.

For ResNet-50:
```
$ cd ONNXim
$ python3 ./scripts/generate_cnn_onnx.py --model resnet50
```

For GPT-2 and BERT:
```
$ cd ONNXim
$ python3 ./scripts/generate_transformer_onnx.py --model gpt2
$ python3 ./scripts/generate_transformer_onnx.py --model bert
```

# Getting Started
This section describes how to build and run ONNXim. There are two ways to run ONNXim: the container-based method and the manual build method.
## 1. Docker Image Method (Recommended)
```
$ git clone https://github.com/PSAL-POSTECH/ONNXim.git
$ cd ONNXim
$ docker run -it onnxim
```
Run the Docker image and simulate the ResNet-18 example.


## 2. Manual Method
### Installation
```
$ git clone https://github.com/PSAL-POSTECH/ONNXim.git
$ ./build/bin/Simulator --config ./configs/systolic_ws_8x8_c1_simple_noc.json --
```
![Demo](/img/ONNXim_demo.png)

------------
## Mapping
ONNXim uses a hierarchical tiling method that can handle large tensors.
If the mapping method is not specified, the tiling method from [Gemmini](https://github.com/ucb-bar/gemmini) [DAC'21] is used by default.

### Manual Mapping file (Optional)
You can specify the mapping by placing a `*.mapping` file in the same folder as the `*.onnx` file.

The mapping file consists of three parts:

1. Total Loop: `[T] N1 C3 M64 P112 Q112 S7 R7`
2. Outer Loop: `[O] N1 C1 M4 P5 Q6 S1 R1`
3. Inner Loop: `[I] N1 C3 M16 P23 Q22 S7 R7`

N: batch size, C: input channels, M: output channels, P: output rows, Q: output columns, S: kernel rows, R: kernel columns
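A mapping line can be parsed mechanically from this format. The following Python sketch is illustrative only (it is not part of ONNXim); it turns one line into a dictionary per part:

```python
import re

def parse_mapping_line(line):
    """Parse '[T] N1 C3 ... - [O] ... - [I] ...' into
    {'T': {dim: size, ...}, 'O': {...}, 'I': {...}}."""
    result = {}
    for part in line.split(" - "):
        tag_match = re.match(r"\[([TOI])\]\s+(.*)", part.strip())
        tag, dims = tag_match.group(1), tag_match.group(2)
        # Each dimension is a single capital letter followed by its extent.
        result[tag] = {m.group(1): int(m.group(2))
                      for m in re.finditer(r"([A-Z])(\d+)", dims)}
    return result

line = ("[T] N1 C3 M64 P112 Q112 S7 R7 - "
        "[O] N1 C1 M4 P5 Q6 S1 R1 - "
        "[I] N1 C3 M16 P23 Q22 S7 R7")
parsed = parse_mapping_line(line)
print(parsed["T"]["M"], parsed["O"]["M"], parsed["I"]["M"])  # 64 4 16
```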

This mapping is an example for ResNet-18. The Inner Loop gives the tile size that fits in the NPU core's scratchpad memory and accumulator.

```
[T] N1 C3 M64 P112 Q112 S7 R7 - [O] N1 C1 M4 P5 Q6 S1 R1 - [I] N1 C3 M16 P23 Q22 S7 R7
Expand All @@ -163,8 +162,13 @@ This mapping is an example of first convolution layer in ResNet18. Inner Loop is
[T] N1 C512 M512 P7 Q7 S3 R3 - [O] N1 C5 M5 P1 Q1 S1 R1 - [I] N1 C120 M112 P7 Q7 S3 R3
[T] N1 C512 M1000 - [O] N1 C1 M5 - [I] N1 C512 M248
```
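The three parts of each line are related: along every dimension, the Outer Loop count is the number of Inner Loop tiles needed to cover the Total Loop extent, i.e. outer = ceil(total / inner). This sketch (ours, for illustration) checks that relation on the first line above:

```python
import math

total = {"N": 1, "C": 3, "M": 64, "P": 112, "Q": 112, "S": 7, "R": 7}
outer = {"N": 1, "C": 1, "M": 4, "P": 5, "Q": 6, "S": 1, "R": 1}
inner = {"N": 1, "C": 3, "M": 16, "P": 23, "Q": 22, "S": 7, "R": 7}

for dim in total:
    # The outer count covers the total extent with inner-sized tiles
    # (the last tile along a dimension may be partial).
    assert outer[dim] == math.ceil(total[dim] / inner[dim]), dim
print("mapping is consistent")
```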
The input activations and weights are stored in the scratchpad memory, and the outputs are stored in the accumulator. By default, the simulator assumes a 256KB scratchpad and a 16KB accumulator; both sizes are configurable. Since the simulator uses double buffering, a tile can occupy only half of the scratchpad. Mappings are computed before the simulation starts.
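As a rough illustration (our arithmetic, not taken from ONNXim, and assuming fp32 operands, which is configuration-dependent), the weight tile implied by the first layer's Inner Loop fits comfortably in half of the default scratchpad:

```python
# Inner-loop tile of the first convolution layer: N1 C3 M16 P23 Q22 S7 R7
C, M, S, R = 3, 16, 7, 7
BYTES = 4                  # assuming fp32 operands (actual type depends on config)
SCRATCHPAD = 256 * 1024    # default scratchpad size (256KB)

weight_tile = C * M * S * R * BYTES  # weight tile resident in the scratchpad
# With double buffering, a single tile may only use half of the scratchpad.
assert weight_tile <= SCRATCHPAD // 2
print(f"weight tile: {weight_tile} bytes, half-scratchpad budget: {SCRATCHPAD // 2} bytes")
```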

------------
## Future Work
The current version supports only the GEMM, Conv, Attention, GeLU, and LayerNorm operations. Other operations will be supported in future versions.

## Citation
If you use ONNXim for your research, please cite the following paper.

TBA
