-
Notifications
You must be signed in to change notification settings - Fork 205
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Whisper pipeline: use parallel streamer #1642
Draft
as-suvorov
wants to merge
21
commits into
openvinotoolkit:master
Choose a base branch
from
as-suvorov:as/whisper_parallel_streamer
base: master
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
+813
−347
Draft
Changes from 10 commits
Commits
Show all changes
21 commits
Select commit
Hold shift + click to select a range
8a38e00
Use whisper streamer
as-suvorov 478fed1
Remove lock
as-suvorov 05bc82a
Fix python lock
as-suvorov b3ae260
Use async mode
as-suvorov c50dd9c
Use StreamerBase with tokens put
as-suvorov baced7b
correct typings
as-suvorov d16c8c0
Use threaded streamer
as-suvorov 403270e
Merge remote-tracking branch 'upstream/master' into as/whisper_parall…
as-suvorov 46bd423
Remove decoder.decode
as-suvorov c505063
Return if has no callback
as-suvorov 7881cb9
Switch condition
as-suvorov 02320ff
Early return for is_dropped
as-suvorov 5104bfc
Move cb to private
as-suvorov e70b743
Remove todods
as-suvorov 1a6dcb6
Add stream_tokens for icb
as-suvorov 78ad93f
Apply review comments
as-suvorov a3e451f
Fix iterable streamer
as-suvorov 520e758
Use sync queue
as-suvorov f559f44
Remove put tokens override
as-suvorov e4d5a6a
Add py tests
as-suvorov 8590765
Add threaded streamer tests
as-suvorov File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -41,3 +41,5 @@ def main(): | |
|
||
if "__main__" == __name__: | ||
main() | ||
|
||
# todo: check base streamer inheritance (gil lock) |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -18,6 +18,7 @@ class OPENVINO_GENAI_EXPORTS StreamerBase { | |
/// @brief put is called every time new token is decoded, | ||
/// @return bool flag to indicate whether generation should be stopped, if return true generation stops | ||
virtual bool put(int64_t token) = 0; | ||
virtual bool put(const std::vector<int64_t>& tokens) = 0; | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Explain in doc-strings what the new overload is for. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Doc-string added |
||
|
||
/// @brief end is called at the end of generation. It can be used to flush cache if your own streamer has one | ||
virtual void end() = 0; | ||
|
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That adds a requirements to the child classes to override one more method. I think it must have a default implementation.
IterableStreamer
only overrides single token version.openvino.genai/samples/python/text_generation/multinomial_causal_lm.py
Line 11 in 9575602
bool put(const std::vector<int64_t>& tokens)
remains 0 by defaultThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
default implementation will be suboptimal as it does not populate
m_tokens_cache
, which is a field of derived class.Do we need to have
m_tokens_cache
in interface as well?It will tell users "something" about possible implementation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A NotIimplementedException is enough.
No, a child class will end up with this field which may not be used.
end()
already suggests that caching is possible.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case such streamer cannot work with Spec Decoding, Prompt Look-up or even stop strings
It's not BW compatible as well, as in current PR we assume that such method
put(vector)
is availableThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. Then the need to override
IterableStreamer.put()
is even more important.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is the final decision is to add default sub-optimal implementation of
put(vector)
? While our streamer will override this method to have optimal implementationThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Maybe with a warning print