
Serve tf-trt converted model return error: NodeDef mentions attr 'max_batch_size' not in Op: name=TRTEngineOp #332

Open
biaochen opened this issue Apr 4, 2023 · 0 comments


biaochen commented Apr 4, 2023

I want to use TF-TRT to optimize a TF2 model and then serve it with Triton, but serving the optimized TF-TRT model fails. Here is the process:

  1. Following this tutorial (https://docs.nvidia.com/deeplearning/frameworks/tf-trt-user-guide/index.html#introduction), create a TF-TRT optimized model.
    I use the image nvcr.io/nvidia/tensorflow:22.07-tf2-py3 to run the code, and successfully created the native model and the converted model (a conversion sketch follows the tree below):
models/
├── native_saved_model
│   ├── assets
│   ├── keras_metadata.pb
│   ├── saved_model.pb
│   └── variables
│       ├── variables.data-00000-of-00001
│       └── variables.index
└── tftrt_saved_model
    ├── assets
    │   └── trt-serialized-engine.TRTEngineOp_000_000
    ├── saved_model.pb
    └── variables
        ├── variables.data-00000-of-00001
        └── variables.index
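
For reference, the conversion follows the pattern from the linked guide, roughly like this (a minimal sketch; the precision mode and the input_fn shape are assumptions, not my exact script):

from tensorflow.python.compiler.tensorrt import trt_convert as trt
import numpy as np

# Convert the native SavedModel (paths match the tree above).
params = trt.DEFAULT_TRT_CONVERSION_PARAMS._replace(precision_mode='FP32')
converter = trt.TrtGraphConverterV2(
    input_saved_model_dir='models/native_saved_model',
    conversion_params=params)
converter.convert()

# Pre-build the engine so it is serialized under assets/ (hence the
# trt-serialized-engine.TRTEngineOp_000_000 file in the tree above).
def input_fn():
    yield (np.zeros((1, 28, 28), dtype=np.float32),)

converter.build(input_fn=input_fn)
converter.save('models/tftrt_saved_model')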
  2. Copy the native and converted models into a model repository, and create the directory structure that Triton expects:
├── mnist
│   ├── 1
│   │   └── model.savedmodel
│   │       ├── assets
│   │       ├── keras_metadata.pb
│   │       ├── saved_model.pb
│   │       └── variables
│   │           ├── variables.data-00000-of-00001
│   │           └── variables.index
│   └── config.pbtxt
└── mnist_trt
    ├── 1
    │   └── model.savedmodel
    │       ├── assets
    │       │   └── trt-serialized-engine.TRTEngineOp_000_000
    │       ├── saved_model.pb
    │       └── variables
    │           ├── variables.data-00000-of-00001
    │           └── variables.index
    └── config.pbtxt

The native model is copied under mnist/1/model.savedmodel, with a config.pbtxt like this:

name: "mnist"
platform: "tensorflow_savedmodel"
max_batch_size : 0

The converted model is copied under mnist_trt/1/model.savedmodel, with the same config.pbtxt as above (see the sketch below).
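
For completeness, the mnist_trt config.pbtxt presumably differs only in the name field, i.e. something like:

name: "mnist_trt"
platform: "tensorflow_savedmodel"
max_batch_size : 0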

  3. Start the Triton server with the container nvcr.io/nvidia/tritonserver:22.07-py3 (roughly as sketched below); the log shows both models are loaded successfully.
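
For reference, the server is started roughly like this (the host path of the model repository and the port mapping are assumptions):

docker run --rm --gpus all \
  -p 8000:8000 -p 8001:8001 -p 8002:8002 \
  -v $PWD/model_repository:/models \
  nvcr.io/nvidia/tritonserver:22.07-py3 \
  tritonserver --model-repository=/models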

  4. Try to infer. The client code looks like this:

import tensorflow as tf
import numpy as np
import tritonclient.http as httpclient

# Setting up client
url = 'SERVER_IP:8000'
triton_client = httpclient.InferenceServerClient(url=url)
input1_shape = [1, 28, 28]
input1 = httpclient.InferInput("flatten_input", input1_shape, datatype="FP32")
input1_data = np.arange(1*28*28).reshape(1,28,28).astype(np.float32)
print('input1_data: ', input1_data)
input1.set_data_from_numpy(input1_data, binary_data=False)

test_output = httpclient.InferRequestedOutput("dense_1", binary_data=False, class_count=10)

# Querying the server
model_name="mnist"
results = triton_client.infer(model_name=model_name, inputs=[input1], outputs=[test_output])
print(results.as_numpy('dense_1'))

If model_name is mnist, the inference succeeds and prints the prediction result:

[['9575.137695:3' '9021.530273:2' '5957.917969:7' '-416.794525:5'
'-6797.246582:9' '-8895.693359:1' '-9928.074219:0' '-15507.916016:8'
'-22406.882812:6' '-29679.443359:4']]

However, after changing model_name to mnist_trt, the call fails with this error message:

tritonclient.utils.InferenceServerException: NodeDef mentions attr 'max_batch_size' not in Op<name=TRTEngineOp; signature=in_tensor: -> out_tensor:; attr=serialized_segment:string; attr=segment_func:func,default=[]; attr=InT:list(type),min=1,allowed=[DT_INT8, DT_HALF, DT_FLOAT, DT_INT32]; attr=OutT:list(type),min=1,allowed=[DT_INT8, DT_HALF, DT_FLOAT, DT_INT32]; attr=max_cached_engines_count:int,default=1; attr=workspace_size_bytes:int; attr=precision_mode:string,allowed=["FP32", "FP16", "INT8"]; attr=calibration_data:string,default=""; attr=use_calibration:bool,default=true; attr=input_shapes:list(shape),default=[]; attr=output_shapes:list(shape),default=[]; attr=segment_funcdef_name:string,default=""; attr=cached_engine_batches:list(int),default=[],min=0; attr=fixed_input_size:bool,default=true; attr=static_engine:bool,default=true>; NodeDef: {{node TRTEngineOp_000_000}}. (Check whether your GraphDef-interpreting binary is up to date with your GraphDef-generating binary.).
[[PartitionedCall/PartitionedCall/TRTEngineOp_000_000]]

I guess it may be a version mismatch issue?
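
To narrow this down, I plan to compare the TensorFlow build used for conversion with the TF backend shipped in the tritonserver:22.07-py3 container (listed in NVIDIA's framework support matrix). A quick check inside the conversion container, roughly:

import tensorflow as tf

# TF version and CUDA build used to generate the TF-TRT SavedModel.
print(tf.__version__)
print(tf.sysconfig.get_build_info().get("cuda_version"))

# The TRTEngineOp registered by the serving TF build has to accept every
# attribute written into the NodeDef at conversion time (the error above
# shows it does not know 'max_batch_size'), so the two builds need to match.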
