Exclude Conv op from quantization in ort-quantize.py
By default `quantize_dynamic` will replace `Conv` with `ConvInteger`.  RTen
doesn't support this operator yet and ONNX Runtime doesn't support the operator
with the particular combination of data types that `quantize_dynamic` generates
(u8 input, i8 weights). Hence omit this from the quantized op types for now.
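
For illustration, a minimal sketch (not part of this commit; the model paths are
placeholders) of the behaviour described above: with its default op set,
`quantize_dynamic` quantizes `Conv` and emits `ConvInteger`, while passing an
explicit `op_types_to_quantize` list, as this change does, leaves `Conv` in
floating point.

# Hypothetical illustration; "model.onnx" is a placeholder path.
from onnxruntime.quantization import quantize_dynamic

# Default op set: Conv is quantized and replaced by ConvInteger, which RTen
# does not implement and which ORT rejects for the u8-input / i8-weight
# combination that quantize_dynamic emits.
quantize_dynamic("model.onnx", "model.default.quant.onnx")

# Restricted op set: Conv is left as a float op in the output model.
quantize_dynamic(
    "model.onnx",
    "model.restricted.quant.onnx",
    op_types_to_quantize=["Gather", "Transpose", "MatMul"],
)
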
robertknight committed Jan 25, 2025
1 parent 046e743 commit 8e888d6
Showing 1 changed file with 23 additions and 0 deletions.
23 changes: 23 additions & 0 deletions tools/ort-quantize.py
@@ -10,9 +10,32 @@

output = args.output or args.input.replace(".onnx", ".quant.onnx")

# Quantized operation types we support.
#
# See https://github.com/microsoft/onnxruntime/blob/1fc9c4823d7c2e8f0d07a09315a0755dd7c58ef8/onnxruntime/python/tools/quantization/quantize.py#L828
# for the default list that ORT uses.
#
# See https://github.com/microsoft/onnxruntime/blob/1fc9c4823d7c2e8f0d07a09315a0755dd7c58ef8/onnxruntime/python/tools/quantization/registry.py#L66
# for registries of different ops that will be quantized depending on the
# quantization type.
op_types_to_quantize = [
    # Supported ops from `CommonOpsRegistry`. These support int8 types directly.
    #
    # There are other operators which support int8 types that we could list
    # here but don't because `quantize_dynamic` doesn't attempt to quantize them.
    "Gather",
    "Transpose",
    # Supported ops from `IntegerOpsRegistry`. These get replaced during
    # quantization.
    "MatMul",  # Replaced by MatMulInteger
    # "Conv" - Replaced by ConvInteger, which is not implemented yet.
    #
    # ConvInteger ops produced by `quantize_dynamic` also don't work in ORT
    # due to the input data type combination being unsupported.
    # See https://github.com/microsoft/onnxruntime/issues/15888 .
]

quantize_dynamic(
    args.input,
    output,
    op_types_to_quantize=op_types_to_quantize,
    # Avoid a saturation issue on x86-64 systems that don't support VNNI by
    # reducing the range of quantized values from 8 to 7 bits.
    #
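
As a quick sanity check (not part of this commit), the quantized model could be
inspected with the `onnx` package to confirm that no `ConvInteger` nodes were
produced. The sketch below assumes `onnx` is installed and `output` is the path
written by the call above.

import onnx

# Load the quantized model and collect the operator types it contains.
quantized = onnx.load(output)
op_types = {node.op_type for node in quantized.graph.node}

# Conv should remain a float op in the quantized model.
assert "ConvInteger" not in op_types
print("Op types in quantized model:", sorted(op_types))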
