Is your feature request related to a problem? Please describe.
GGUF is becoming the mainstream format for large-model compression and accelerated inference. Transformers currently supports loading T5 checkpoints from GGUF files, but inference itself is not accelerated: the quantized weights are converted to float32 when the model is loaded.
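For context, this is roughly what GGUF loading looks like today (a minimal sketch; the repo id and file name below are placeholders, not a real checkpoint):

```python
from transformers import AutoTokenizer, T5ForConditionalGeneration

# Placeholder repo id / file name -- substitute any T5 GGUF checkpoint.
repo_id = "some-org/t5-small-gguf"
filename = "t5-small.Q4_K_M.gguf"

tokenizer = AutoTokenizer.from_pretrained(repo_id, gguf_file=filename)
# The GGUF tensors are dequantized to float32 while the model is built,
# so the loaded model no longer benefits from the quantization.
model = T5ForConditionalGeneration.from_pretrained(repo_id, gguf_file=filename)

inputs = tokenizer("translate English to German: Hello.", return_tensors="pt")
print(tokenizer.decode(model.generate(**inputs)[0], skip_special_tokens=True))
```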
Describe the solution you'd like.
It would be very helpful if models distributed in GGUF format (such as T5 and the Flux transformer component) could not only be loaded from GGUF files but also run inference directly in that quantized format, instead of being converted to float32 first.
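Purely as an illustration, the requested behavior might look something like the hypothetical flag below. This parameter does not exist in Transformers today; it only sketches the intent of the request:

```python
# Hypothetical: keep the GGUF tensors quantized and run quantized kernels,
# instead of dequantizing everything to float32 at load time.
model = T5ForConditionalGeneration.from_pretrained(
    repo_id,
    gguf_file=filename,
    dequantize=False,  # hypothetical parameter, not part of the current API
)
```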
Describe alternatives you've considered.
Additional context.