This example loads a language model and confirms its accuracy and speed on the WikiText-2 dataset.
pip install neural-compressor
pip install -r requirements.txt
Note: this example has been validated against a specific ONNX Runtime version.
Supported model identifiers from huggingface.co:
| Model Identifier |
|---|
| gpt2 |
| distilgpt2 |
Requires Python <= 3.8 and transformers==3.2.0.
python prepare_model.py --input_model=gpt2 --output_model=gpt2.onnx # or other supported model identifier
Download the WikiText-2 dataset (wikitext-2-raw).
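The scripts below point `--dataset_location` at the raw test split file (`wiki.test.raw`). As a rough sketch of how such an evaluation typically consumes it (not this repo's actual loader; the helper name and block size are assumptions), the file is read as one text stream, tokenized, and split into fixed-length blocks matching GPT-2's context window:

```python
from pathlib import Path

def chunk_token_ids(ids, block_size=1024):
    # GPT-2 has a fixed context length; language-model evaluation
    # commonly splits the full token stream into blocks of that size.
    return [ids[i:i + block_size] for i in range(0, len(ids), block_size)]

def load_raw_text(path):
    # WikiText-2 "raw" ships as plain UTF-8 text files.
    return Path(path).read_text(encoding="utf-8")

# Usage sketch (path and tokenizer are assumptions):
# text = load_raw_text("/path/to/wikitext-2-raw/wiki.test.raw")
# blocks = chunk_token_ids(tokenizer.encode(text))
```

The last partial block is kept here for simplicity; a real evaluation may drop or pad it.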
Quantize the model with static quantization:
bash run_quant.sh --dataset_location=/path/to/wikitext-2-raw/wiki.test.raw \
--input_model=path/to/model \ # model path as *.onnx
--output_model=path/to/model_tune
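Static quantization uses the calibration dataset to observe each tensor's value range, then fixes a scale and zero-point per tensor ahead of time. A minimal sketch of the affine (uint8) arithmetic involved, assuming per-tensor quantization (this is the general scheme, not Neural Compressor's internal code):

```python
def qparams(rmin, rmax, qmin=0, qmax=255):
    # Derive a fixed scale/zero-point from a calibrated float range.
    rmin, rmax = min(rmin, 0.0), max(rmax, 0.0)  # range must include 0
    scale = (rmax - rmin) / (qmax - qmin)
    zero_point = round(qmin - rmin / scale)
    return scale, zero_point

def quantize(x, scale, zp, qmin=0, qmax=255):
    # Map a float to the nearest representable integer, clamped to range.
    return max(qmin, min(qmax, round(x / scale) + zp))

def dequantize(q, scale, zp):
    # Recover an approximate float; error is bounded by scale / 2.
    return (q - zp) * scale
```

Because the scale and zero-point are computed once from calibration data rather than at inference time, static quantization avoids the runtime overhead of dynamic range computation.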
Benchmark the model:
bash run_benchmark.sh --dataset_location=/path/to/wikitext-2-raw/wiki.test.raw \
--input_model=path/to/model \ # model path as *.onnx
--batch_size=batch_size \
--mode=performance # or accuracy
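Accuracy for language models on WikiText is conventionally reported as perplexity: the exponential of the average negative log-likelihood per token (lower is better). A minimal sketch of the metric itself, assuming the per-token negative log-likelihoods are already available (not this repo's evaluation code):

```python
import math

def perplexity(neg_log_likelihoods):
    # exp of the mean per-token negative log-likelihood.
    return math.exp(sum(neg_log_likelihoods) / len(neg_log_likelihoods))

# A model that predicts uniformly over a 4-token vocabulary assigns each
# token probability 1/4, i.e. NLL = ln(4), giving perplexity exactly 4.
print(perplexity([math.log(4)] * 10))
```

Comparing this number before and after quantization shows how much accuracy the int8 model gives up.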