Fix error after second start_chat() for StatefulLLMPipeline #1684
base: master
Conversation
if (have_state) {
    m_model_runner.reset_state();
    m_model_runner.get_tensor("attention_mask").set_shape({1, 0});
}
You have a similar block below:
if (!m_tokenized_chat_history.empty()) {
reset_kv_state();
m_history = {};
m_templated_chat_history.clear();
m_tokenized_chat_history.clear();
}
I propose to change the condition of the old block to if (have_state) and reset everything there.
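A minimal sketch of what that merged block could look like, reusing the member names visible in the two snippets above (this is only an illustration of the suggestion, not the final patch):

// single condition: drop request state and all chat bookkeeping together
if (have_state) {
    reset_kv_state();
    m_model_runner.get_tensor("attention_mask").set_shape({1, 0});
    m_history = {};
    m_templated_chat_history.clear();
    m_tokenized_chat_history.clear();
}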
Should a similar thing be performed at the end of generate() when is_chat_conversation is false? It would be better to perform it at the beginning of generate() to improve exception safety, but we already do it at the end everywhere else, so let's stay consistent.
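For illustration only, a sketch of how such a non-chat reset at the end of generate() might look; the guard and placement are assumptions reusing names from the diff above, not code from this PR:

// end of generate(): non-chat requests also drop the request state
if (!is_chat_conversation && have_state) {
    m_model_runner.reset_state();
    m_model_runner.get_tensor("attention_mask").set_shape({1, 0});
}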
Should start_chat call stop_chat first?
Updated.
In the chat scenario we rely on the existing attention_mask, and when is_chat_conversation is false a new attention_mask is explicitly used; that is why it doesn't crash for non-chat.
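A rough sketch of the two paths being described (simplified, not the pipeline's actual code; tensor and member names follow the diff above):

if (is_chat_conversation) {
    // chat: keep extending the attention_mask tensor already held by the request,
    // so it must have been reset correctly by the previous start_chat()
    ov::Tensor attention_mask = m_model_runner.get_tensor("attention_mask");
    // ... append ones for the newly generated tokens ...
} else {
    // non-chat: a fresh attention_mask is built from the tokenized input,
    // so stale chat state cannot make this branch crash
    ov::Tensor attention_mask{ov::element::i64, input_ids.get_shape()};
    std::fill_n(attention_mask.data<int64_t>(), attention_mask.get_size(), 1);
    m_model_runner.set_tensor("attention_mask", attention_mask);
}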
if (have_state) {
    m_model_runner.reset_state();
    m_model_runner.get_tensor("attention_mask").set_shape({1, 0});
}
I think the request to add a test is valid. You can take the idea from #1674 (comment)
I think we should also cover the case when the user calls the next start_chat without stop_chat for the current one. IMO, it should automatically stop the current chat.
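A sketch of the behavior being asked for, assuming finish_chat() is the counterpart of start_chat() (member and method names here are illustrative, not the PR's code):

void StatefulLLMPipeline::start_chat(const std::string& system_message) {
    if (is_chat_conversation)
        finish_chat();              // implicitly stop the previous chat
    is_chat_conversation = true;
    // ... apply system_message to the chat template, clear histories, etc. ...
}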
Cases with a system_message, with a double start_chat, and with several chats one after another are added.
If I understand right, the tests now run for the PA backend; should a run for SDPA be added too?
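A rough C++ sketch of the scenarios those tests exercise through the public API (the model path and prompts are placeholders, not the actual test code added in this PR):

#include <string>
#include "openvino/genai/llm_pipeline.hpp"

int main() {
    std::string models_path = "<path to an exported model>";  // placeholder
    ov::genai::LLMPipeline pipe(models_path, "CPU");

    // chat with a system message
    pipe.start_chat("You are a helpful assistant.");
    pipe.generate("What is 2 + 2?", ov::genai::max_new_tokens(20));

    // second start_chat() without finish_chat() for the previous one
    pipe.start_chat();
    pipe.generate("And 3 + 3?", ov::genai::max_new_tokens(20));
    pipe.finish_chat();

    // several chats one after another
    pipe.start_chat();
    pipe.generate("Hello!", ov::genai::max_new_tokens(20));
    pipe.finish_chat();
}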
If I understand right, the tests now run for the PA backend; should a run for SDPA be added too?

They will be added as part of CVS-159925.
For now, you can check locally that the tests pass on the SDPA path by changing the default backend here:
std::string attention_backend = PA_BACKEND;
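For that local check, the default would be flipped roughly like this (assuming the SDPA constant follows the same naming convention, e.g. SDPA_BACKEND):

std::string attention_backend = SDPA_BACKEND;  // local change only, not to be committed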
Does the current PR fix #1663?
The reason looks like the point which Vladimir mentioned above: if execution is interrupted and generation does not complete correctly, this error will occur. I moved the resets to the beginning; this also fixes similar behavior for phi-3.
Force-pushed from 77c2688 to ac2474c.
Ticket: CVS-161902