Step-by-Step

This example loads the MLPerf MobileBERT question-answering model and confirms its accuracy and speed on the SQuAD task.

Prerequisite

1. Environment

pip install neural-compressor
pip install -r requirements.txt

Note: refer to the list of validated ONNX Runtime versions.
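Before running, it can help to sanity-check that the required packages are importable. A minimal stdlib sketch (the module names are assumed from the pip commands above):

```python
from importlib.util import find_spec

def missing_packages(names):
    """Return the subset of package names that cannot be imported."""
    return [name for name in names if find_spec(name) is None]

# Modules installed by the pip commands above (names assumed).
required = ["onnx", "onnxruntime", "neural_compressor"]
print("missing:", missing_packages(required))
```

An empty `missing` list means the environment step completed successfully.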

2. Prepare Model

Download the pretrained BERT model; only its vocab.txt file is needed here.
Download the MLPerf MobileBERT model and convert it to an ONNX model with the tf2onnx tool:

python prepare_model.py --output_model="mobilebert_SQuAD.onnx"
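Once mobilebert_SQuAD.onnx has been exported, you can inspect its input signature before quantizing. A sketch using onnxruntime (the import is deferred so the snippet is self-contained; the model path is the one produced above):

```python
def print_model_inputs(model_path):
    """Print the name, dtype, and shape of each graph input."""
    # Deferred import: requires onnxruntime from the environment step.
    import onnxruntime as ort
    session = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])
    for inp in session.get_inputs():
        print(inp.name, inp.type, inp.shape)

# Usage: print_model_inputs("mobilebert_SQuAD.onnx")
```

The printed input names must match what the SQuAD preprocessing feeds to the session.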

3. Prepare Dataset

Download the SQuAD v1.1 dataset from the SQuAD dataset link.

Dataset directories:

squad
├── dev-v1.1.json
└── train-v1.1.json
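Both files follow the SQuAD v1.1 JSON schema (data → paragraphs → qas). A stdlib sketch that walks that structure, shown on a tiny hand-made record since the real file is downloaded separately:

```python
import json

def iter_qas(squad):
    """Yield (question, answer_texts) pairs from a SQuAD v1.1 dict."""
    for article in squad["data"]:
        for paragraph in article["paragraphs"]:
            for qa in paragraph["qas"]:
                yield qa["question"], [a["text"] for a in qa["answers"]]

# A minimal record in the same shape as dev-v1.1.json.
sample = json.loads("""{
  "version": "1.1",
  "data": [{
    "title": "Example",
    "paragraphs": [{
      "context": "ONNX is an open format for ML models.",
      "qas": [{
        "id": "0",
        "question": "What is ONNX?",
        "answers": [{"text": "an open format for ML models", "answer_start": 8}]
      }]
    }]
  }]
}""")

for question, answers in iter_qas(sample):
    print(question, "->", answers)
```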

Run

1. Quantization

Dynamic quantization:

# input_model: model path as *.onnx
bash run_quant.sh --input_model=/path/to/model \
                  --output_model=/path/to/model_tune \
                  --dataset_location=/path/to/SQuAD/dataset
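run_quant.sh drives the quantization for you; for orientation, dynamic quantization of an ONNX model can also be sketched directly with ONNX Runtime's quantization utility (the paths are placeholders, and the import is deferred so the sketch stays self-contained):

```python
def quantize_model_dynamic(input_model, output_model):
    """Dynamically quantize an ONNX model's weights to int8."""
    # Deferred import: requires the onnxruntime package from the environment step.
    from onnxruntime.quantization import quantize_dynamic, QuantType
    quantize_dynamic(input_model, output_model, weight_type=QuantType.QInt8)

# Usage: quantize_model_dynamic("mobilebert_SQuAD.onnx", "mobilebert_SQuAD_int8.onnx")
```

Dynamic quantization converts weights ahead of time but computes activation scales at runtime, so no calibration dataset is needed for the conversion itself.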

2. Benchmark

# input_model: model path as *.onnx
bash run_benchmark.sh --input_model=/path/to/model \
                      --dataset_location=/path/to/SQuAD/dataset \
                      --batch_size=batch_size \
                      --mode=performance # or accuracy
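Performance mode reports latency/throughput, which boils down to timing repeated inference calls after a warmup. A minimal stdlib sketch of that loop (the lambda is a stand-in workload; a real run would call the onnxruntime session instead):

```python
import time

def benchmark(predict, warmup=2, iterations=10):
    """Return average seconds per call after a short warmup."""
    for _ in range(warmup):
        predict()  # discard warmup runs so one-time costs are excluded
    start = time.perf_counter()
    for _ in range(iterations):
        predict()
    return (time.perf_counter() - start) / iterations

# Stand-in workload; replace with e.g. session.run(None, feeds) for the real model.
avg = benchmark(lambda: sum(range(10000)))
print(f"average latency: {avg * 1e6:.1f} us")
```

Comparing this number for the FP32 model against the quantized one shows the speedup from quantization.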