Handling of context overflow in Local LLMs #33295

Open · wants to merge 5 commits into base: master

Conversation

glebashnik (Contributor):

I confirm that this contribution is made under the terms of the license found in the root directory of this repository's source tree and that I have the authority necessary to make this contribution on behalf of its copyright owner.

This is necessary for generate document augmentation, to make local LLM outputs more robust. Adds constraints and logic to avoid context overflow.

maxPromptTokens int default = 0

# Specifies what to do when the number of tokens in a prompt + maxTokens > contextSize / parallelRequests.
# NONE - default option, run inference no matter context and prompt sizes.
Member:

We should look into how users can detect when overflow occurs in combination with NONE, for instance via logging or a metric. If we do logging, we also need a mechanism for silencing the warnings when degraded execution is acceptable.
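For instance, something along these lines (a sketch only; overflowMetric and suppressOverflowWarnings are assumed names, not the actual implementation):

// Sketch: detect and surface context overflow under policy NONE.
int contextPerRequest = contextSize / parallelRequests;
int numRequestTokens = promptTokens.length + maxTokens;
if (numRequestTokens > contextPerRequest) {
    overflowMetric.add(1);  // assumed metric hook, e.g. for alerting on degraded execution
    if (!config.suppressOverflowWarnings()) {  // assumed flag for silencing the warning
        logger.log(Level.WARNING, () -> "Request tokens (" + numRequestTokens
                + ") exceed per-request context (" + contextPerRequest
                + "); policy NONE: running inference anyway");
    }
}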

# It can be used for better reproducibility of LLM completions.
# Still, it doesn't guarantee reproducibility, especially on different hardware.
# -1 means no random seed will be set.
randomSeed int default = -1
Member:

Consider renaming to seed. I assume the seed is only random if the value is set to -1, and fixed for all other values?
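For clarity, the semantics I'm assuming (a sketch; setSeed is a hypothetical setter, not the actual API):

// Sketch of the assumed semantics: -1 means "do not fix the seed".
if (randomSeed != -1) {
    inferenceParams.setSeed(randomSeed);  // fixed seed: repeatable sampling on the same hardware
}
// randomSeed == -1: no seed is set, so completions vary between runs.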


@Test
public void testContextOverflowPolicySkip() {
downloadFileIfMissing(SMALL_LLM_URL, SMALL_LLM_PATH);
Member:

Consider using @BeforeClass for shared initialization.
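For example (a sketch, assuming downloadFileIfMissing is or can be made static; JUnit 4's @BeforeClass requires a public static method):

// Sketch (JUnit 4): run shared setup once per class instead of in every test method.
// Requires: import org.junit.BeforeClass;
@BeforeClass
public static void setupModel() {
    downloadFileIfMissing(SMALL_LLM_URL, SMALL_LLM_PATH);
}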

}

var numRequestTokens = promptTokens.length + maxTokens;
// logger.info("Prompt tokens: " + promptTokens.length + ", max tokens: " + maxTokens + ", request tokens: " + numRequestTokens);
Member:

Consider replacing with logger.log(Level.DEBUG, () -> "Prompt ... instead.
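i.e. something like this (a sketch, assuming java.util.logging, where Level.FINE plays the role of DEBUG and the Supplier overload defers message construction):

// Sketch: the lambda builds the message only if the level is enabled.
logger.log(Level.FINE, () -> "Prompt tokens: " + promptTokens.length
        + ", max tokens: " + maxTokens
        + ", request tokens: " + numRequestTokens);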

// Small LLM tests use a quantized Mistral model, ca. 4.3 GB.
// It produces sensible completions that can be verified as part of the test.
private static final String SMALL_LLM_URL = "https://huggingface.co/lmstudio-community/Mistral-7B-Instruct-v0.3-GGUF/blob/main/Mistral-7B-Instruct-v0.3-Q4_K_M.gguf";
private static final String SMALL_LLM_PATH = "src/test/models/llm/Mistral-7B-Instruct-v0.3-Q4_K_M.gguf";
Member:
Will this file be ignored by git?

@glebashnik removed the request for review from arnej27959 on February 11, 2025.