KV cache APIs for LLM inference #21589
prashantaithal asked this question in API Q&A
Does ORT provide any APIs (Python or C++) or a KV cache implementation that would help optimize LLM inference?
Answered by carzh · Aug 13, 2024 · 1 reply
ORT GenAI provides a higher-level API that handles the KV cache for you.
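To see why a KV cache matters for decoding, here is a minimal NumPy sketch (not ORT's or ORT GenAI's implementation, just the general idea): at each decode step, the keys and values of earlier tokens are reused from a cache instead of being recomputed, and the result is identical to full recomputation.

```python
import numpy as np

def attention(q, K, V):
    # Scaled dot-product attention for a single query vector.
    scores = q @ K.T / np.sqrt(K.shape[-1])
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ V

rng = np.random.default_rng(0)
d = 8      # head dimension
steps = 5  # decode steps

# Per-token K/V/Q rows stand in for the model's projections of each token.
keys = rng.normal(size=(steps, d))
values = rng.normal(size=(steps, d))
queries = rng.normal(size=(steps, d))

# Without a cache: re-derive K and V for the whole prefix at every step.
no_cache = [attention(queries[t], keys[: t + 1], values[: t + 1])
            for t in range(steps)]

# With a cache: append only the new token's K/V row and reuse the rest.
K_cache = np.empty((0, d))
V_cache = np.empty((0, d))
with_cache = []
for t in range(steps):
    K_cache = np.vstack([K_cache, keys[t : t + 1]])
    V_cache = np.vstack([V_cache, values[t : t + 1]])
    with_cache.append(attention(queries[t], K_cache, V_cache))

# Cached decoding produces exactly the same outputs.
assert all(np.allclose(a, b) for a, b in zip(no_cache, with_cache))
```

In a real model this saves recomputing every layer's key/value projections over the whole prefix at each step; ORT GenAI's generation loop manages these cached tensors for you so you don't have to wire the cache inputs/outputs of the ONNX graph manually.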
Answer selected by prashantaithal