This example loads the MLPerf MobileBERT question-answering model and confirms its accuracy and speed on the SQuAD task.
pip install neural-compressor
pip install -r requirements.txt
Note: this example has been validated against specific ONNX Runtime versions; refer to the validated ONNX Runtime version list.
Download the pretrained BERT model; this example uses its vocab.txt
file for tokenization.
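The vocab.txt file drives WordPiece tokenization of the input questions and contexts. As a rough illustration of how such a vocabulary is used (the greedy longest-match logic and the tiny inline vocabulary below are a sketch for demonstration, not the example's own tokenizer):

```python
def wordpiece_tokenize(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first WordPiece split of a single word."""
    tokens, start = [], 0
    while start < len(word):
        end = len(word)
        piece = None
        while start < end:
            sub = word[start:end]
            if start > 0:
                sub = "##" + sub  # continuation pieces carry the ## prefix
            if sub in vocab:
                piece = sub
                break
            end -= 1
        if piece is None:
            return [unk]  # no subword matched: the whole word is unknown
        tokens.append(piece)
        start = end
    return tokens

# Tiny illustrative vocabulary; a real vocab.txt holds ~30k entries, one per line.
vocab = {"quant", "##iza", "##tion", "[UNK]"}
print(wordpiece_tokenize("quantization", vocab))  # ['quant', '##iza', '##tion']
```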
Download the MLPerf MobileBERT model and convert it to an ONNX model with the tf2onnx tool:
python prepare_model.py --output_model="mobilebert_SQuAD.onnx"
Download the SQuAD dataset (dev-v1.1.json and train-v1.1.json) from the SQuAD website.
Dataset directories:
squad
├── dev-v1.1.json
└── train-v1.1.json
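Evaluation reads question/answer pairs out of this nested JSON layout. A minimal sketch of walking the SQuAD v1.1 structure (the helper name and the inline sample record are illustrative, not part of the example code):

```python
import json

def iter_squad_examples(squad_dict):
    """Yield (id, question, context, answers) from a SQuAD v1.1 dict."""
    for article in squad_dict["data"]:
        for paragraph in article["paragraphs"]:
            context = paragraph["context"]
            for qa in paragraph["qas"]:
                answers = [a["text"] for a in qa["answers"]]
                yield qa["id"], qa["question"], context, answers

# Tiny inline record in the same shape as dev-v1.1.json.
sample = {"data": [{"title": "INC", "paragraphs": [{
    "context": "Neural Compressor quantizes models.",
    "qas": [{"id": "q1", "question": "What does it quantize?",
             "answers": [{"text": "models", "answer_start": 28}]}]}]}]}

for qid, question, context, answers in iter_squad_examples(sample):
    print(qid, answers)  # q1 ['models']

# With the real file: iter_squad_examples(json.load(open("squad/dev-v1.1.json")))
```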
Dynamic quantization:
bash run_quant.sh --input_model=/path/to/model \ # model path as *.onnx
--output_model=/path/to/model_tune \
--dataset_location=/path/to/SQuAD/dataset
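Dynamic quantization stores weights as int8 and computes activation scales at run time. The scale arithmetic it relies on can be sketched in plain Python (a conceptual illustration of symmetric per-tensor int8 quantization, not the code run_quant.sh invokes):

```python
def quantize_int8(values):
    """Symmetric per-tensor int8 quantization: x is approximated by scale * q."""
    scale = max(abs(v) for v in values) / 127.0 or 1.0  # avoid scale 0 for all-zero input
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    return [scale * v for v in q]

weights = [0.02, -0.37, 0.15, 0.008]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
# Round-trip error stays within half a quantization step (scale / 2).
assert all(abs(w - r) <= scale / 2 for w, r in zip(weights, restored))
```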
Benchmark:
bash run_benchmark.sh --input_model=/path/to/model \ # model path as *.onnx
--dataset_location=/path/to/SQuAD/dataset \
--batch_size=batch_size \
--mode=performance # or accuracy
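In accuracy mode, SQuAD predictions are scored by exact match and token-level F1 against the reference answers. A minimal sketch of the F1 computation (the normalization here is simplified relative to the official SQuAD evaluation script, which also strips articles and punctuation):

```python
from collections import Counter

def squad_f1(prediction, ground_truth):
    """Token-overlap F1 between a predicted and a reference answer."""
    pred_tokens = prediction.lower().split()
    gt_tokens = ground_truth.lower().split()
    common = Counter(pred_tokens) & Counter(gt_tokens)  # per-token overlap counts
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gt_tokens)
    return 2 * precision * recall / (precision + recall)

print(squad_f1("the intel neural compressor", "intel neural compressor"))  # ~0.857
```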