Step-by-Step

This example loads a language model and confirms its accuracy and speed on the WikiText dataset.

Prerequisite

1. Environment

pip install neural-compressor
pip install -r requirements.txt

Note: this example has been validated with specific ONNX Runtime versions; refer to the Neural Compressor documentation for the validated list.

2. Prepare Model

Supported model identifiers from huggingface.co:

| Model Identifier |
|------------------|
| gpt2             |
| distilgpt2       |
Requires Python <= 3.8 and transformers==3.2.0.

python prepare_model.py --input_model=gpt2 --output_model=gpt2.onnx  # or other supported model identifier

3. Prepare Dataset

Please download the WikiText-2 dataset (wikitext-2-raw).
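For orientation, the evaluation feeds wiki.test.raw to the model in fixed-length token windows. The sketch below illustrates that chunking, with plain whitespace splitting standing in for the GPT-2 BPE tokenizer so it stays self-contained; the function name and window size are illustrative, not part of this example's scripts.

```python
# Sketch: chunking raw WikiText into fixed-length evaluation windows.
# The real example uses the GPT-2 BPE tokenizer; whitespace splitting
# stands in for it here so the sketch is self-contained.

def make_windows(text, window=8, stride=8):
    """Split a token stream into fixed-size windows for evaluation."""
    tokens = text.split()  # stand-in for tokenizer.encode(text)
    return [tokens[i:i + window]
            for i in range(0, len(tokens), stride)
            if tokens[i:i + window]]

sample = "the quick brown fox jumps over the lazy dog " * 4  # 36 tokens
windows = make_windows(sample, window=8)
print(len(windows), len(windows[0]), len(windows[-1]))
```

With stride equal to the window size the windows do not overlap; the last window may be shorter than the rest.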

Run

1. Quantization

Quantize the model with static quantization (--input_model is the *.onnx model path):

bash run_quant.sh --dataset_location=/path/to/wikitext-2-raw/wiki.test.raw \
                  --input_model=path/to/model.onnx \
                  --output_model=path/to/model_tune

2. Benchmark

bash run_benchmark.sh --dataset_location=/path/to/wikitext-2-raw/wiki.test.raw \
                      --input_model=path/to/model.onnx \
                      --batch_size=batch_size \
                      --mode=performance # or accuracy
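A common accuracy metric for language models on WikiText is perplexity, exp of the mean negative log-likelihood per token; treating it as the metric here is an assumption (the run script defines the actual one). A minimal sketch, assuming per-token log-probabilities are available:

```python
import math

# Sketch: perplexity from per-token log-likelihoods.  Lower is better;
# comparing this number before and after quantization shows the
# accuracy cost of the lower-precision model.

def perplexity(token_log_probs):
    """exp(mean negative log-likelihood) over the evaluation tokens."""
    nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(nll)

# A model assigning uniform probability 1/4 to every token has
# perplexity 4 by construction.
uniform = perplexity([math.log(0.25)] * 10)
print(uniform)
```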