Evaluate lower onnx latency for Gdino #18

rhysdg · 2024-07-31T18:22:01Z

Currently we're looking at roughly 3x slower latency with ONNX.
Latency drops by almost half with TensorrtExecutionProvider compared to plain ONNX Runtime with the CUDA execution provider. Next up is an opset analysis etc.
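
For reference, a minimal sketch of how the provider comparison could be set up with ONNX Runtime. It is not the code used in this repo: `pick_providers`, `make_session` and the model path are hypothetical names, and the priority order is an assumption.

```python
# Preferred order: TensorRT first, then CUDA, then CPU as a last resort.
PREFERRED = (
    "TensorrtExecutionProvider",
    "CUDAExecutionProvider",
    "CPUExecutionProvider",
)


def pick_providers(available, preferred=PREFERRED):
    """Return the preferred providers that are actually available, in priority order."""
    return [name for name in preferred if name in available]


def make_session(model_path):
    """Create an InferenceSession using the best providers on this machine.

    model_path is a placeholder, e.g. a local export of the model.
    """
    # Imported lazily so pick_providers can be used without onnxruntime installed.
    import onnxruntime as ort

    providers = pick_providers(ort.get_available_providers())
    return ort.InferenceSession(model_path, providers=providers)
```

Passing a single-entry list such as `["CUDAExecutionProvider"]` as `providers` instead would give the plain CUDA baseline for comparison.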

rhysdg · 2024-07-31T22:26:34Z

Tracking at a comparable ~0.25s now with custom ops etc. Vanilla gdino in PyTorch is also 0.25s. Making progress
150ms is now achievable with the TensorRT execution provider after warmup
Worth noting that this is with an Ampere GPU. T4s in Colab have horrendous performance
FP16 takes a heavy hit in inference quality for TRT
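
A rough sketch of how an "after warmup" number like the above could be measured. This is an assumption about the method, not the script behind these figures: `time_after_warmup` and its defaults are hypothetical. The first few runs are discarded because they can include one-off setup cost (such as TensorRT engine building), and the median of the remaining runs is reported.

```python
import statistics
import time


def time_after_warmup(run, warmup=5, iters=20):
    """Call run() warmup times untimed, then return the median latency in seconds."""
    for _ in range(warmup):
        run()  # discarded: may include one-off setup cost

    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        run()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)
```

With an ONNX Runtime session this would be called as something like `time_after_warmup(lambda: session.run(None, inputs))`, where `session` and `inputs` are whatever the model expects.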

rhysdg added the bug label Jul 31, 2024

rhysdg self-assigned this Jul 31, 2024