
How to handle long conversation history #26

Open
cahuja1992 opened this issue Nov 17, 2023 · 3 comments

@cahuja1992

  1. As the history grows, summarizing the conversation would be useful.
  2. If a question only asks to reformat the previous response, or otherwise doesn't need further retrieval, shouldn't we detect whether retrieval is needed at all?
placerda transferred this issue from Azure/gpt-rag-orchestrator Nov 20, 2023
placerda transferred this issue from Azure/GPT-RAG Nov 20, 2023
@placerda
Collaborator

Thanks @cahuja1992

> 1. As the history grows, summarizing the conversation would be useful.

This makes sense, but it would add one more call to the LLM. At what point in the conversation do you think it would make sense to start summarizing?
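
One rough way to answer "how long": count the history's tokens and trigger summarization at a fraction of the model's context window. A minimal sketch, assuming tiktoken is available; the context-window size and helper names are illustrative, and the 90% figure is the one proposed later in this thread:

```python
import tiktoken

CONTEXT_WINDOW = 16_384   # illustrative token budget for the chat model
SUMMARIZE_AT = 0.9        # the 90% threshold proposed below

def history_tokens(history: list[dict]) -> int:
    """Rough token count of the chat history (role + content per turn)."""
    enc = tiktoken.encoding_for_model("gpt-3.5-turbo")
    return sum(
        len(enc.encode(turn["role"])) + len(enc.encode(turn["content"]))
        for turn in history
    )

def should_summarize(history: list[dict]) -> bool:
    """True once the history uses 90% of the token budget."""
    return history_tokens(history) >= SUMMARIZE_AT * CONTEXT_WINDOW
```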

> 2. If a question only asks to reformat the previous response, or otherwise doesn't need further retrieval, shouldn't we detect whether retrieval is needed at all?

Now we have a triage function that does exactly this.
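
For illustration, a generic version of such a triage step could ask the LLM itself whether retrieval is needed. This is a sketch, not the repo's actual triage function: the prompt, model name, and use of the OpenAI Python client are assumptions (GPT-RAG targets Azure OpenAI, whose client setup differs):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

TRIAGE_PROMPT = (
    "Answer strictly 'yes' or 'no': does the user's latest question require "
    "retrieving new documents, or can it be answered from the conversation "
    "alone (e.g. reformatting the previous response)?"
)

def needs_retrieval(history: list[dict], question: str) -> bool:
    """Illustrative triage call: classify whether the question needs retrieval."""
    messages = [
        {"role": "system", "content": TRIAGE_PROMPT},
        *history,
        {"role": "user", "content": question},
    ]
    reply = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
    return reply.choices[0].message.content.strip().lower().startswith("yes")
```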

@cahuja1992
Author

cahuja1992 commented Nov 28, 2023

@placerda Since the chat history is passed to the triage function, we will eventually hit the token limit there. One idea: once we reach 90% of the token limit, spawn a chat-summarization thread so the summary can be used for the next question.
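
A minimal sketch of this mechanism, reusing the `should_summarize` helper from the earlier sketch; the class, field names, and the choice to keep only the last four turns alongside the summary are all illustrative assumptions:

```python
import threading

class HistoryStore:
    """Holds the chat history plus an optional running summary."""

    def __init__(self):
        self.history: list[dict] = []
        self.summary: str | None = None
        self._lock = threading.Lock()

    def maybe_summarize(self, summarize_fn):
        """Spawn a background summarization thread at 90% of the token limit."""
        if should_summarize(self.history):  # helper from the earlier sketch
            threading.Thread(
                target=self._run, args=(summarize_fn,), daemon=True
            ).start()

    def _run(self, summarize_fn):
        text = summarize_fn(self.history)  # the one extra LLM call
        with self._lock:
            self.summary = text
            self.history = self.history[-4:]  # keep recent turns with the summary
```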

@cahuja1992
Author

> Since the chat history is passed to the triage function, we will eventually hit the token limit there. One idea: once we reach 90% of the token limit, spawn a chat-summarization thread so the summary can be used for the next question.

@placerda Any thoughts on this approach? If we proactively keep summarizing the conversation, we shouldn't even see a significant difference in latency.
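
Hypothetical wiring for the proactive variant: summarization is kicked off after each answered turn, off the request path, so the next triage call sees a short summary plus recent turns instead of the full history:

```python
store = HistoryStore()  # from the sketch above

def on_turn_completed(question: str, answer: str, summarize_fn):
    """Record the turn, then (maybe) summarize in the background."""
    store.history.append({"role": "user", "content": question})
    store.history.append({"role": "assistant", "content": answer})
    store.maybe_summarize(summarize_fn)  # non-blocking; summary is ready next turn
```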
