Handling of context overflow in Local LLMs #33295

Open · wants to merge 5 commits into base: master

Conversation

glebashnik (Contributor):

I confirm that this contribution is made under the terms of the license found in the root directory of this repository's source tree and that I have the authority necessary to make this contribution on behalf of its copyright owner.

This is necessary for generate document augmentation, to make local LLM outputs more robust. Adds constraints and logic to avoid context overflow.

maxPromptTokens int default = 0

# Specifies what to do when the number of tokens in a prompt + maxTokens > contextSize / parallelRequests.
# NONE - default option, run inference no matter context and prompt sizes.
Member:

We should look into how users can detect when overflow occurs in combination with NONE, for instance via logging or a metric. If we do logging, we also need a mechanism for silencing the warnings when degraded execution is acceptable.
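For instance, something along these lines (a sketch only; overflowMetric and suppressOverflowWarnings are assumed names, not the actual implementation):

// Sketch: detect and surface context overflow under policy NONE.
int contextPerRequest = contextSize / parallelRequests;
int numRequestTokens = promptTokens.length + maxTokens;
if (numRequestTokens > contextPerRequest) {
    overflowMetric.add(1);  // assumed metric hook, e.g. for alerting on degraded execution
    if (!config.suppressOverflowWarnings()) {  // assumed flag for silencing the warning
        logger.log(Level.WARNING, () -> "Request tokens (" + numRequestTokens
                + ") exceed per-request context (" + contextPerRequest
                + "); policy NONE: running inference anyway");
    }
}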

# It can be used for better reproducibility of LLM completions.
# Still, it doesn't guarantee reproducibility, especially on different hardware.
# -1 means no random seed will be set.
randomSeed int default = -1
Member:

Consider renaming to seed. I assume the seed is only random if the value is set to -1, and fixed for all other values?
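For clarity, the semantics I'm assuming (a sketch; setSeed is a hypothetical setter, not the actual API):

// Sketch of the assumed semantics: -1 means "do not fix the seed".
if (randomSeed != -1) {
    inferenceParams.setSeed(randomSeed);  // fixed seed: repeatable sampling on the same hardware
}
// randomSeed == -1: no seed is set, so completions vary between runs.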


@Test
public void testContextOverflowPolicySkip() {
downloadFileIfMissing(SMALL_LLM_URL, SMALL_LLM_PATH);
Member:

Consider using @BeforeClass for shared initialization.
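For example (a sketch, assuming downloadFileIfMissing is or can be made static; JUnit 4's @BeforeClass requires a public static method):

// Sketch (JUnit 4): run shared setup once per class instead of in every test method.
// Requires: import org.junit.BeforeClass;
@BeforeClass
public static void setupModel() {
    downloadFileIfMissing(SMALL_LLM_URL, SMALL_LLM_PATH);
}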

}

var numRequestTokens = promptTokens.length + maxTokens;
// logger.info("Prompt tokens: " + promptTokens.length + ", max tokens: " + maxTokens + ", request tokens: " + numRequestTokens);
Member:

Consider replacing with logger.log(Level.DEBUG, () -> "Prompt ... instead.
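i.e. something like this (a sketch, assuming java.util.logging, where Level.FINE plays the role of DEBUG and the Supplier overload defers message construction):

// Sketch: the lambda builds the message only if the level is enabled.
logger.log(Level.FINE, () -> "Prompt tokens: " + promptTokens.length
        + ", max tokens: " + maxTokens
        + ", request tokens: " + numRequestTokens);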

// Small LLM tests use a quantized Mistral model, ca. 4.3 GB.
// It produces sensible completions that can be verified as part of the test.
private static final String SMALL_LLM_URL = "https://huggingface.co/lmstudio-community/Mistral-7B-Instruct-v0.3-GGUF/blob/main/Mistral-7B-Instruct-v0.3-Q4_K_M.gguf";
private static final String SMALL_LLM_PATH = "src/test/models/llm/Mistral-7B-Instruct-v0.3-Q4_K_M.gguf";
Member:
Will this file be ignored by git?

@glebashnik removed the request for review from arnej27959 on February 11, 2025.