I have converted the Matterport implementation of Mask R-CNN (as a SavedModel) to a 16-bit TRT-optimized SavedModel (roughly as sketched below). I can see a ~100 ms improvement in inference time; however, I do not see any reduction in GPU memory consumption. Given that the original model is a 32-bit model and the optimized model is a 16-bit model, I was expecting some reduction in GPU memory consumption during inference.
I used:
TensorFlow 2.10.0
TensorRT 7.2.2.1
Colab Pro+
No one seems to discuss GPU memory consumption after optimization. Is inference time the only thing TF-TRT improves?
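For reference, here is a minimal sketch of the conversion I performed, assuming the model is already exported as a TF SavedModel; the paths and the input shape are illustrative:

```python
# Minimal TF-TRT FP16 conversion sketch (paths and input shape are assumptions).
import numpy as np
from tensorflow.python.compiler.tensorrt import trt_convert as trt

converter = trt.TrtGraphConverterV2(
    input_saved_model_dir="mask_rcnn_saved_model",  # hypothetical input path
    precision_mode="FP16",
)
converter.convert()

# Optionally pre-build engines for a representative input shape
# so they are not built lazily on the first inference call.
def input_fn():
    yield (np.zeros((1, 1024, 1024, 3), dtype=np.float32),)

converter.build(input_fn=input_fn)
converter.save("mask_rcnn_trt_fp16")  # hypothetical output path
```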
In general, TF-TRT focuses on inference performance, and unfortunately memory consumption is rarely improved. TensorRT itself does a much better job at memory reduction than TF-TRT if memory footprint is critical for your application.
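For reference, the native TensorRT path usually means exporting the model to ONNX (e.g. with tf2onnx) and building an FP16 engine directly. A rough sketch with the TensorRT Python API follows; the file names are illustrative, and a model as complex as Mask R-CNN may need extra plugins or graph surgery to parse:

```python
# Rough sketch: build a standalone FP16 TensorRT engine from an ONNX export.
# File names are illustrative; Mask R-CNN's custom ops may require plugins.
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

with open("mask_rcnn.onnx", "rb") as f:  # produced e.g. with tf2onnx
    if not parser.parse(f.read()):
        raise RuntimeError(parser.get_error(0))

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)     # request FP16 kernels/weights
config.max_workspace_size = 1 << 30       # 1 GiB workspace (TRT 7.x API)

engine = builder.build_engine(network, config)
with open("mask_rcnn_fp16.plan", "wb") as f:
    f.write(engine.serialize())
```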
@ncomly-nvidia @pjannaty Understood. However, if I am optimizing a 32-bit model and passing precision_mode='FP16' as one of the conversion parameters, my understanding is that the weights of the converted/optimized model should be FP16, in which case the model should now take roughly half the memory during inference. Am I missing something?
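One way to check this empirically is to measure peak device memory around a single inference for each SavedModel. A minimal sketch, assuming TF >= 2.5 and an illustrative path, signature key, and input shape; run it once per model in a fresh process so the two measurements don't interfere:

```python
# Sketch: report peak GPU memory for one inference on a SavedModel.
# Path, signature key, and input shape are illustrative assumptions.
import sys
import numpy as np
import tensorflow as tf

def peak_memory_mb(saved_model_dir, input_shape=(1, 1024, 1024, 3)):
    tf.config.experimental.reset_memory_stats("GPU:0")
    model = tf.saved_model.load(saved_model_dir)
    infer = model.signatures["serving_default"]
    infer(tf.constant(np.zeros(input_shape, dtype=np.float32)))
    return tf.config.experimental.get_memory_info("GPU:0")["peak"] / 1e6

if __name__ == "__main__":
    # e.g. python measure_mem.py mask_rcnn_trt_fp16
    print(f"peak GPU memory: {peak_memory_mb(sys.argv[1]):.1f} MB")
```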
It's hard to tell why TRT does not show memory usage reduction here.
We do have an experimental PR that you may want to try at your discretion to see whether it helps with this issue: tensorflow/tensorflow#55959