Handling of context overflow in Local LLMs #33295
base: master
Conversation
…naming for config params
… LLMs. Added tests that verify outputs of a small LLM.
maxPromptTokens int default = 0

# Specifies what to do when the number of tokens in a prompt + maxTokens > contextSize / parallelRequests.
# NONE - default option, run inference no matter context and prompt sizes.
We should look into how users can detect when overflow occurs in combination with NONE, for instance via logging or a metric. If we do logging, we also need a mechanism for silencing the warnings if degraded execution is fine.
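A minimal sketch of how such a warning could be surfaced, assuming a java.util.logging Logger and the contextSize / parallelRequests parameters from the config above; the class and method names here are hypothetical, not the PR's actual code:

import java.util.logging.Logger;

class ContextOverflowWarner {

    private static final Logger logger = Logger.getLogger(ContextOverflowWarner.class.getName());

    private final int contextSize;       // model context window, from config
    private final int parallelRequests;  // concurrent requests sharing the context

    ContextOverflowWarner(int contextSize, int parallelRequests) {
        this.contextSize = contextSize;
        this.parallelRequests = parallelRequests;
    }

    // Logs a warning when a request would exceed the per-request token budget,
    // so users can detect degraded output under the NONE policy.
    void warnIfOverflow(int promptTokens, int maxTokens) {
        int numRequestTokens = promptTokens + maxTokens;
        int available = contextSize / parallelRequests;
        if (numRequestTokens > available) {
            logger.warning("Request of " + numRequestTokens + " tokens exceeds the "
                    + available + " tokens available per request; completion may be degraded");
        }
    }
}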
# It can be used for better reproducibility of LLM completions.
# Still, it doesn't guarantee reproducibility, especially on different hardware.
# -1 means no random seed will be set.
randomSeed int default = -1
Consider renaming to seed. I assume the seed is only random if the value is set to -1 and non-random for all other values?
@Test
public void testContextOverflowPolicySkip() {
    downloadFileIfMissing(SMALL_LLM_URL, SMALL_LLM_PATH);
Consider using @BeforeClass for shared initialization.
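A rough sketch of that suggestion, assuming JUnit 4 (matching @BeforeClass) and reusing the downloadFileIfMissing helper and the SMALL_LLM_URL / SMALL_LLM_PATH constants from this test class; the class name is made up:

import org.junit.BeforeClass;
import org.junit.Test;

public class LocalLLMContextOverflowTest {

    // Download the model once for the whole class instead of in every test method.
    @BeforeClass
    public static void downloadModel() {
        downloadFileIfMissing(SMALL_LLM_URL, SMALL_LLM_PATH);
    }

    @Test
    public void testContextOverflowPolicySkip() {
        // The model file is guaranteed to exist before any test runs.
    }
}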
}
var numRequestTokens = promptTokens.length + maxTokens;
// logger.info("Prompt tokens: " + promptTokens.length + ", max tokens: " + maxTokens + ", request tokens: " + numRequestTokens);
Consider replacing with logger.log(Level.DEBUG, () -> "Prompt ... instead.
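A small sketch of that lazy-logging pattern; it assumes a java.util.logging Logger (the commented-out line above uses logger.info), so it uses Level.FINE where the suggested Level.DEBUG would apply to a System.Logger:

import java.util.logging.Level;
import java.util.logging.Logger;

class PromptSizeLogging {

    private static final Logger logger = Logger.getLogger(PromptSizeLogging.class.getName());

    void logRequestSize(int promptTokens, int maxTokens) {
        var numRequestTokens = promptTokens + maxTokens;
        // The Supplier overload builds the message string only if this level is actually logged.
        logger.log(Level.FINE, () -> "Prompt tokens: " + promptTokens
                + ", max tokens: " + maxTokens
                + ", request tokens: " + numRequestTokens);
    }
}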
// Small LLM tests use a quantized Mistral model ca. 4.3 GB.
// It produces sensible completions which can be verified as part of the test.
private static final String SMALL_LLM_URL = "https://huggingface.co/lmstudio-community/Mistral-7B-Instruct-v0.3-GGUF/blob/main/Mistral-7B-Instruct-v0.3-Q4_K_M.gguf";
private static final String SMALL_LLM_PATH = "src/test/models/llm/Mistral-7B-Instruct-v0.3-Q4_K_M.gguf";
Will this file be ignored by git?
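If it is not already covered, one way to keep the downloaded model out of version control would be a .gitignore entry like the following (an assumption, not something this PR adds):

src/test/models/llm/*.gguf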
I confirm that this contribution is made under the terms of the license found in the root directory of this repository's source tree and that I have the authority necessary to make this contribution on behalf of its copyright owner.
This is necessary for generate document augmentation to make local LLM outputs more robust. Add constraints and logic to avoid context overflow.