This example loads the MLPerf MobileBERT question-answering model and confirms its accuracy and speed on the SQuAD task.
pip install neural-compressor
pip install -r requirements.txt
Note: this example has been validated against specific ONNX Runtime versions; refer to the validated ONNX Runtime version list.
Download the pretrained BERT model; this example uses its vocab.txt
file for tokenization.
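The vocab.txt file drives WordPiece tokenization of the input questions and contexts. As a rough illustration of how such a vocabulary is used (the greedy longest-match logic and the tiny inline vocabulary below are a sketch for demonstration, not the example's own tokenizer):

```python
def wordpiece_tokenize(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first WordPiece split of a single word."""
    tokens, start = [], 0
    while start < len(word):
        end = len(word)
        piece = None
        while start < end:
            sub = word[start:end]
            if start > 0:
                sub = "##" + sub  # continuation pieces carry the ## prefix
            if sub in vocab:
                piece = sub
                break
            end -= 1
        if piece is None:
            return [unk]  # no subword matched: the whole word is unknown
        tokens.append(piece)
        start = end
    return tokens

# Tiny illustrative vocabulary; a real vocab.txt holds ~30k entries, one per line.
vocab = {"quant", "##iza", "##tion", "[UNK]"}
print(wordpiece_tokenize("quantization", vocab))  # ['quant', '##iza', '##tion']
```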
Download the MLPerf MobileBERT model and convert it to an ONNX model with the tf2onnx tool:
python prepare_model.py --output_model="mobilebert_SQuAD.onnx"
Download the SQuAD dataset (dev-v1.1.json and train-v1.1.json) from the SQuAD website.
Dataset directories:
squad
├── dev-v1.1.json
└── train-v1.1.json
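Evaluation reads question/answer pairs out of this nested JSON layout. A minimal sketch of walking the SQuAD v1.1 structure (the helper name and the inline sample record are illustrative, not part of the example code):

```python
import json

def iter_squad_examples(squad_dict):
    """Yield (id, question, context, answers) from a SQuAD v1.1 dict."""
    for article in squad_dict["data"]:
        for paragraph in article["paragraphs"]:
            context = paragraph["context"]
            for qa in paragraph["qas"]:
                answers = [a["text"] for a in qa["answers"]]
                yield qa["id"], qa["question"], context, answers

# Tiny inline record in the same shape as dev-v1.1.json.
sample = {"data": [{"title": "INC", "paragraphs": [{
    "context": "Neural Compressor quantizes models.",
    "qas": [{"id": "q1", "question": "What does it quantize?",
             "answers": [{"text": "models", "answer_start": 28}]}]}]}]}

for qid, question, context, answers in iter_squad_examples(sample):
    print(qid, answers)  # q1 ['models']

# With the real file: iter_squad_examples(json.load(open("squad/dev-v1.1.json")))
```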
Dynamic quantization:
bash run_quant.sh --input_model=/path/to/model \ # model path as *.onnx
--output_model=/path/to/model_tune \
--dataset_location=/path/to/SQuAD/dataset
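Dynamic quantization stores weights as int8 and computes activation scales at run time. The scale arithmetic it relies on can be sketched in plain Python (a conceptual illustration of symmetric per-tensor int8 quantization, not the code run_quant.sh invokes):

```python
def quantize_int8(values):
    """Symmetric per-tensor int8 quantization: x is approximated by scale * q."""
    scale = max(abs(v) for v in values) / 127.0 or 1.0  # avoid scale 0 for all-zero input
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    return [scale * v for v in q]

weights = [0.02, -0.37, 0.15, 0.008]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
# Round-trip error stays within half a quantization step (scale / 2).
assert all(abs(w - r) <= scale / 2 for w, r in zip(weights, restored))
```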
Benchmark:
bash run_benchmark.sh --input_model=/path/to/model \ # model path as *.onnx
--dataset_location=/path/to/SQuAD/dataset \
--batch_size=batch_size \
--mode=performance # or accuracy
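In accuracy mode, SQuAD predictions are scored by exact match and token-level F1 against the reference answers. A minimal sketch of the F1 computation (the normalization here is simplified relative to the official SQuAD evaluation script, which also strips articles and punctuation):

```python
from collections import Counter

def squad_f1(prediction, ground_truth):
    """Token-overlap F1 between a predicted and a reference answer."""
    pred_tokens = prediction.lower().split()
    gt_tokens = ground_truth.lower().split()
    common = Counter(pred_tokens) & Counter(gt_tokens)  # per-token overlap counts
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gt_tokens)
    return 2 * precision * recall / (precision + recall)

print(squad_f1("the intel neural compressor", "intel neural compressor"))  # ~0.857
```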