Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

use max_tokens to do better rate limit handling #4224

Merged
merged 3 commits into from
Mar 7, 2025

Conversation

evan-danswer
Copy link
Contributor

Description

Fixes https://linear.app/danswer/issue/DAN-1518/use-max-tokens-in-agent-search-where-possible

  • Some timeout bumps to make o1 happy
  • use max_tokens to make (especially the smaller) Claude models not give rate limit errors.

From https://docs.anthropic.com/en/api/rate-limits

"OTPM rate limits are estimated based on max_tokens at the beginning of each request, and the estimate is adjusted at the end of the request to reflect the actual number of output tokens used. If you’re hitting OTPM limits earlier than expected, try reducing max_tokens to better approximate the size of your completions."

How Has This Been Tested?

Tested in UI

Backporting (check the box to trigger backport action)

Note: You have to check that the action passes, otherwise resolve the conflicts manually and tag the patches.

  • This PR should be backported (make sure to check that the backport attempt succeeds)
  • [Optional] Override Linear Check

@evan-danswer evan-danswer requested a review from a team as a code owner March 6, 2025 22:25
Copy link

vercel bot commented Mar 6, 2025

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
internal-search ✅ Ready (Inspect) Visit Preview 💬 Add feedback Mar 6, 2025 11:25pm

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Summary

This PR adds max_tokens parameters across LLM calls to better handle rate limits, particularly for smaller Claude models, following Anthropic's guidance on OTPM rate limit management.

  • Added _should_restrict_tokens utility in utils.py to conditionally apply token restrictions based on LLM provider/model
  • Increased timeout values in agent_configs.py to accommodate slower models like o1, though duplicated code blocks should be cleaned up
  • Fixed role handling in chat_llm.py where None role was incorrectly defaulting to None instead of 'unknown'
  • Renamed run_with_timeout to run_with_timeout_multiproc in timeout.py for better clarity on multiprocessing behavior
  • Inconsistency in custom_llm.py where max_tokens parameter is added but not utilized in _execute method, still using fixed _max_output_tokens

18 file(s) reviewed, 1 comment(s)
Edit PR Review Bot Settings | Greptile


AGENT_DEFAULT_MAX_TOKENS_ANSWER_GENERATION = 1024
AGENT_MAX_TOKENS_ANSWER_GENERATION = int(
os.environ.get("AGENT_MAX_TOKENS_INITIAL_ANSWER_GENERATION")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: Environment variable name AGENT_MAX_TOKENS_INITIAL_ANSWER_GENERATION doesn't match the config variable name AGENT_MAX_TOKENS_ANSWER_GENERATION

Suggested change
os.environ.get("AGENT_MAX_TOKENS_INITIAL_ANSWER_GENERATION")
os.environ.get("AGENT_MAX_TOKENS_ANSWER_GENERATION")

Copy link
Contributor

@joachim-danswer joachim-danswer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@evan-danswer evan-danswer added this pull request to the merge queue Mar 7, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Mar 7, 2025
@evan-danswer evan-danswer added this pull request to the merge queue Mar 7, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Mar 7, 2025
@evan-danswer evan-danswer merged commit 0c29743 into main Mar 7, 2025
9 of 10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants