Exclude Conv op from quantization in ort-quantize.py
By default `quantize_dynamic` will replace `Conv` with `ConvInteger`.  RTen
doesn't support this operator yet and ONNX Runtime doesn't support the operator
with the particular combination of data types that `quantize_dynamic` generates
(u8 input, i8 weights). Hence omit this from the quantized op types for now.
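
For illustration, a minimal sketch (not part of this commit; the model paths are
placeholders) of the behaviour described above: with its default op set,
`quantize_dynamic` quantizes `Conv` and emits `ConvInteger`, while passing an
explicit `op_types_to_quantize` list, as this change does, leaves `Conv` in
floating point.

# Hypothetical illustration; "model.onnx" is a placeholder path.
from onnxruntime.quantization import quantize_dynamic

# Default op set: Conv is quantized and replaced by ConvInteger, which RTen
# does not implement and which ORT rejects for the u8-input / i8-weight
# combination that quantize_dynamic emits.
quantize_dynamic("model.onnx", "model.default.quant.onnx")

# Restricted op set: Conv is left as a float op in the output model.
quantize_dynamic(
    "model.onnx",
    "model.restricted.quant.onnx",
    op_types_to_quantize=["Gather", "Transpose", "MatMul"],
)
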
robertknight committed Jan 25, 2025
1 parent 046e743 commit 8e888d6
Showing 1 changed file with 23 additions and 0 deletions.
23 changes: 23 additions & 0 deletions tools/ort-quantize.py
@@ -10,9 +10,32 @@

output = args.output or args.input.replace(".onnx", ".quant.onnx")

# Quantized operation types we support.
#
# See https://github.com/microsoft/onnxruntime/blob/1fc9c4823d7c2e8f0d07a09315a0755dd7c58ef8/onnxruntime/python/tools/quantization/quantize.py#L828
# for the default list that ORT uses.
#
# See https://github.com/microsoft/onnxruntime/blob/1fc9c4823d7c2e8f0d07a09315a0755dd7c58ef8/onnxruntime/python/tools/quantization/registry.py#L66
# for registries of different ops that will be quantized depending on the
# quantization type.
op_types_to_quantize = [
    # Supported ops from `CommonOpsRegistry`. These support int8 types directly.
    #
    # There are other operators which support int8 types that we could list
    # here but don't because `quantize_dynamic` doesn't attempt to quantize them.
    "Gather",
    "Transpose",
    # Supported ops from `IntegerOpsRegistry`. These get replaced during
    # quantization.
    "MatMul",  # Replaced by MatMulInteger
    # "Conv" - Replaced by ConvInteger, which is not implemented yet.
    #
    # ConvInteger ops produced by `quantize_dynamic` also don't work in ORT
    # due to the input data type combination being unsupported.
    # See https://github.com/microsoft/onnxruntime/issues/15888 .
]

quantize_dynamic(
    args.input,
    output,
    op_types_to_quantize=op_types_to_quantize,
    # Avoid a saturation issue on x86-64 systems that don't support VNNI by
    # reducing the range of quantized values from 8 to 7 bits.
    #
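
As a quick sanity check (not part of this commit), the quantized model could be
inspected with the `onnx` package to confirm that no `ConvInteger` nodes were
produced. The sketch below assumes `onnx` is installed and `output` is the path
written by the call above.

import onnx

# Load the quantized model and collect the operator types it contains.
quantized = onnx.load(output)
op_types = {node.op_type for node in quantized.graph.node}

# Conv should remain a float op in the quantized model.
assert "ConvInteger" not in op_types
print("Op types in quantized model:", sorted(op_types))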
