How to handle long conversation history #26
Comments
Thanks @cahuja1992.
1. As the history grows, summarizing the conversation would be useful. This makes sense, but it adds one more call to the LLM. At what point do you think it would make sense to start summarizing?
2. If a question only asks to format the previous response, or anything else that doesn't need further retrieval, shouldn't we detect whether retrieval is needed at all? We now have a triage function that does exactly this.
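A minimal sketch of what such a triage check could look like. In practice the triage function mentioned above would be an LLM call; here a simple keyword heuristic stands in, and the marker list and function name are hypothetical, not taken from the project.

```python
# Hypothetical triage sketch: decide whether a user turn needs fresh
# retrieval, or only reworks the previous answer (format, rephrase, ...).
# A real implementation would ask an LLM; this keyword check is a stand-in.

FOLLOW_UP_MARKERS = ("format", "rephrase", "summarize", "shorten", "as a table")

def needs_retrieval(question: str) -> bool:
    """Return False when the turn only reworks the previous response."""
    q = question.lower()
    return not any(marker in q for marker in FOLLOW_UP_MARKERS)

print(needs_retrieval("format the previous response as a table"))  # False
print(needs_retrieval("what is the refund policy?"))               # True
```

Skipping retrieval for such turns saves both the search round-trip and the extra prompt tokens the retrieved passages would consume.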
@placerda Since the chat history is passed into the triage step, we will eventually hit the token limit there. One idea: once we reach 90% of the token limit, spawn a chat-summarization thread so that the summary is ready for the next question.
@placerda Any thoughts on this approach? If we proactively keep summarizing the conversation, we don't even see a significant difference in latency.
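The proactive-summarization idea above could be sketched roughly as follows. Everything here is an assumption for illustration: the `TOKEN_LIMIT` value, the whitespace token counter (a real system would use the model's tokenizer, e.g. tiktoken), and the placeholder `summarize` function standing in for the LLM summarization call.

```python
import threading

TOKEN_LIMIT = 4096   # assumed context budget for the model
SUMMARIZE_AT = 0.9   # the 90% trigger threshold from the comment above

def count_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer.
    return len(text.split())

def summarize(turns: list[str]) -> str:
    # Placeholder for the LLM summarization call.
    return "SUMMARY: " + " ".join(turns)[:200]

class ChatHistory:
    def __init__(self) -> None:
        self.turns: list[str] = []
        self.summary: str | None = None
        self._worker: threading.Thread | None = None

    def add(self, turn: str) -> None:
        self.turns.append(turn)
        used = sum(count_tokens(t) for t in self.turns)
        # At 90% of the budget, summarize in a background thread so the
        # compact summary is ready by the time the next question arrives.
        if used >= SUMMARIZE_AT * TOKEN_LIMIT and self._worker is None:
            self._worker = threading.Thread(target=self._summarize)
            self._worker.start()

    def _summarize(self) -> None:
        self.summary = summarize(self.turns)

    def context(self) -> str:
        # Wait for an in-flight summarization before building the prompt.
        if self._worker is not None:
            self._worker.join()
        return self.summary if self.summary else "\n".join(self.turns)
```

Because the summarization runs between turns rather than on the critical path of a question, the extra LLM call overlaps with user think time, which is consistent with the observation that latency barely changes.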