KV cache APIs for LLM inference #21589
prashantaithal asked this question in API Q&A
Does ORT provide any APIs (Python or C++) or a KV cache implementation that would help optimize LLM inference?
Answered by carzh · Aug 13, 2024 · 1 reply
ORT GenAI provides a higher-level API that handles the KV cache for you.
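To see why a KV cache matters for decoding, here is a minimal NumPy sketch (not ORT's or ORT GenAI's implementation, just the general idea): at each decode step, the keys and values of earlier tokens are reused from a cache instead of being recomputed, and the result is identical to full recomputation.

```python
import numpy as np

def attention(q, K, V):
    # Scaled dot-product attention for a single query vector.
    scores = q @ K.T / np.sqrt(K.shape[-1])
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ V

rng = np.random.default_rng(0)
d = 8      # head dimension
steps = 5  # decode steps

# Per-token K/V/Q rows stand in for the model's projections of each token.
keys = rng.normal(size=(steps, d))
values = rng.normal(size=(steps, d))
queries = rng.normal(size=(steps, d))

# Without a cache: re-derive K and V for the whole prefix at every step.
no_cache = [attention(queries[t], keys[: t + 1], values[: t + 1])
            for t in range(steps)]

# With a cache: append only the new token's K/V row and reuse the rest.
K_cache = np.empty((0, d))
V_cache = np.empty((0, d))
with_cache = []
for t in range(steps):
    K_cache = np.vstack([K_cache, keys[t : t + 1]])
    V_cache = np.vstack([V_cache, values[t : t + 1]])
    with_cache.append(attention(queries[t], K_cache, V_cache))

# Cached decoding produces exactly the same outputs.
assert all(np.allclose(a, b) for a, b in zip(no_cache, with_cache))
```

In a real model this saves recomputing every layer's key/value projections over the whole prefix at each step; ORT GenAI's generation loop manages these cached tensors for you so you don't have to wire the cache inputs/outputs of the ONNX graph manually.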
Answer selected by prashantaithal