
Fix error after second start_chat() for StatefulLLMPipeline #1684

Open · wants to merge 2 commits into master from phi3
Conversation

@sbalandi (Contributor) commented Feb 6, 2025

Ticket: CVS-161902

@sbalandi sbalandi requested review from Wovchena and ilya-lavrenov and removed request for Wovchena February 6, 2025 21:37
    if (have_state) {
        m_model_runner.reset_state();
        m_model_runner.get_tensor("attention_mask").set_shape({1, 0});
    }
Collaborator:

You have a similar block below:

    if (!m_tokenized_chat_history.empty()) {
        reset_kv_state();
        m_history = {};
        m_templated_chat_history.clear();
        m_tokenized_chat_history.clear();
    }

I propose to change the condition to if (have_state) for the old block and reset everything there.

Should a similar thing be performed at the end of generate() if is_chat_conversation is false? It would be better to perform it at the beginning of generate() to improve exception safety, but we already do that at the end everywhere, so stay consistent.
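A minimal sketch of the merged reset being proposed (member names come from the two snippets above; this is a sketch, not the final patch):

    if (have_state) {
        // reset the runner state (reset_kv_state() in the second block likely wraps this)
        m_model_runner.reset_state();
        m_model_runner.get_tensor("attention_mask").set_shape({1, 0});
        // and drop the accumulated chat bookkeeping in the same place
        m_history = {};
        m_templated_chat_history.clear();
        m_tokenized_chat_history.clear();
    }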

Contributor:

Should start_chat() call stop_chat() first?

Contributor Author:

Updated.
In the chat scenario we rely on the existing attention_mask, while when is_chat_conversation is false a new attention_mask is built explicitly; that is why it doesn't crash for non-chat.
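A rough illustration of the difference described above (member names and tensor handling are assumptions, not the actual pipeline code):

    ov::Tensor attention_mask;
    if (is_chat_conversation) {
        // chat: reuse the mask kept on the model runner between generate() calls,
        // so stale state left from a previous chat can break the next one
        attention_mask = m_model_runner.get_tensor("attention_mask");
    } else {
        // non-chat: a fresh mask is created for every call
        attention_mask = ov::Tensor(ov::element::i64, input_ids.get_shape());
        std::fill_n(attention_mask.data<int64_t>(), attention_mask.get_size(), 1);
    }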

    if (have_state) {
        m_model_runner.reset_state();
        m_model_runner.get_tensor("attention_mask").set_shape({1, 0});
    }
Collaborator:

I think the request to add a test is valid. You can take the idea from #1674 (comment)

Contributor:

I think we should also cover the case when the user calls the next start_chat() without stop_chat() for the current one.
IMO, it should automatically stop the current chat.
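A hedged reproduction sketch of that case (model path and exact expected output are assumptions; start_chat()/finish_chat() are used as in the repo's C++ chat samples):

    #include <iostream>
    #include "openvino/genai/llm_pipeline.hpp"

    int main() {
        // path to an exported model is illustrative
        ov::genai::LLMPipeline pipe("TinyLlama-1.1B-Chat-v1.0", "CPU");
        pipe.start_chat();
        std::cout << pipe.generate("What is 2 + 2?") << "\n";
        // second start_chat() without finish_chat(): should implicitly reset the chat
        pipe.start_chat();
        std::cout << pipe.generate("What is 2 + 2?") << "\n";
        pipe.finish_chat();
    }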

Contributor Author:

Cases with a system_message, with a double start_chat(), and with several chats one after another have been added.
If I understand correctly, the tests currently run for the PA backend; should a run be added for SDPA too?

Contributor:

If I understand correctly, the tests currently run for the PA backend; should a run be added for SDPA too?

They will be added as part of CVS-159925.
For now, you can check locally that the tests pass on the SDPA path by changing the default backend here:
std::string attention_backend = PA_BACKEND;
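For a local check, the change would look roughly like this (assuming an SDPA_BACKEND constant is defined alongside PA_BACKEND):

    // temporary local change to exercise the SDPA path; do not commit
    std::string attention_backend = SDPA_BACKEND;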

@ilya-lavrenov ilya-lavrenov added the labels bug (Something isn't working) and category: LLM (LLM pipeline (stateful, static)) Feb 7, 2025
@ilya-lavrenov ilya-lavrenov added this to the 2025.1 milestone Feb 7, 2025
@ilya-lavrenov (Contributor):
Does the current PR fix #1663?

@sbalandi (Contributor Author) commented Feb 7, 2025

Does the current PR fix #1663?

The reason looks like the point Vladimir mentioned above: if execution is interrupted and generate() does not complete correctly, this error will occur. I moved the resets to the beginning; this fixes similar behavior for phi-3.
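A rough sketch of that idea (not the actual diff; have_state and reset_kv_state() are taken from the snippets above, the rest is assumed):

    // at the start of generate(): drop anything left over from a previous,
    // possibly interrupted run before building new inputs
    if (!is_chat_conversation && have_state) {
        reset_kv_state();
        m_tokenized_chat_history.clear();
    }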

@sbalandi sbalandi force-pushed the phi3 branch 2 times, most recently from 77c2688 to ac2474c Compare February 7, 2025 19:25
Labels: bug (Something isn't working), category: LLM (LLM pipeline (stateful, static)), no-match-files