use max_tokens to do better rate limit handling #4224
Conversation
PR Summary
This PR adds `max_tokens` parameters across LLM calls to better handle rate limits, particularly for smaller Claude models, following Anthropic's guidance on OTPM rate limit management.
- Added a `_should_restrict_tokens` utility in `utils.py` to conditionally apply token restrictions based on the LLM provider/model (see the sketch after this list)
- Increased timeout values in `agent_configs.py` to accommodate slower models like o1, though duplicated code blocks should be cleaned up
- Fixed role handling in `chat_llm.py`, where a `None` role was incorrectly defaulting to `None` instead of `'unknown'`
- Renamed `run_with_timeout` to `run_with_timeout_multiproc` in `timeout.py` for better clarity on its multiprocessing behavior
- Inconsistency in `custom_llm.py`: the `max_tokens` parameter is added but not used in the `_execute` method, which still uses the fixed `_max_output_tokens`
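For context, a minimal sketch of what a provider/model-gated restriction could look like. Only the name `_should_restrict_tokens` comes from the PR; the `LLMConfig` fields, the heuristic, and the model names below are illustrative assumptions, not the PR's actual code:

```python
from dataclasses import dataclass


@dataclass
class LLMConfig:
    model_provider: str
    model_name: str


def _should_restrict_tokens(llm_config: LLMConfig) -> bool:
    # Restrict max_tokens only where OTPM estimation bites hardest,
    # e.g. smaller Anthropic models with tight output-token budgets
    # (assumed heuristic, not the PR's actual condition).
    return (
        llm_config.model_provider == "anthropic"
        and "haiku" in llm_config.model_name.lower()
    )


# Usage: call sites pass max_tokens only when the restriction applies.
config = LLMConfig(model_provider="anthropic", model_name="claude-3-haiku")
max_tokens = 1024 if _should_restrict_tokens(config) else None
```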
18 file(s) reviewed, 1 comment(s)
```python
AGENT_DEFAULT_MAX_TOKENS_ANSWER_GENERATION = 1024
AGENT_MAX_TOKENS_ANSWER_GENERATION = int(
    os.environ.get("AGENT_MAX_TOKENS_INITIAL_ANSWER_GENERATION")
```
logic: Environment variable name `AGENT_MAX_TOKENS_INITIAL_ANSWER_GENERATION` doesn't match the config variable name `AGENT_MAX_TOKENS_ANSWER_GENERATION`
os.environ.get("AGENT_MAX_TOKENS_INITIAL_ANSWER_GENERATION") | |
os.environ.get("AGENT_MAX_TOKENS_ANSWER_GENERATION") |
LGTM
Description
Fixes https://linear.app/danswer/issue/DAN-1518/use-max-tokens-in-agent-search-where-possible
From https://docs.anthropic.com/en/api/rate-limits
"OTPM rate limits are estimated based on max_tokens at the beginning of each request, and the estimate is adjusted at the end of the request to reflect the actual number of output tokens used. If you’re hitting OTPM limits earlier than expected, try reducing max_tokens to better approximate the size of your completions."
How Has This Been Tested?
Tested in UI
Backporting (check the box to trigger backport action)
Note: You have to check that the action passes; otherwise, resolve the conflicts manually and tag the patches.