
Add a choice of how to end streaming from callback: STOP or CANCEL #1476

Open
sbalandi wants to merge 5 commits into master from callback

Conversation

sbalandi (Contributor) commented Jan 3, 2025

No description provided.
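
The title and the diff snippets quoted in the review below imply the feature: a streaming callback may now return a status (STOP or CANCEL) instead of a bool. A minimal usage sketch under that assumption (the model directory and the stop condition are hypothetical):

#include <iostream>
#include "openvino/genai/llm_pipeline.hpp"

int main() {
    ov::genai::LLMPipeline pipe("model_dir", "CPU");  // hypothetical model directory
    auto callback = [](std::string word) {
        std::cout << word << std::flush;
        bool user_interrupted = false;  // hypothetical stop condition
        // CANCEL: stop and drop the last prompt and generated tokens from history.
        // STOP: stop but keep them. RUNNING: continue generation.
        return user_interrupted ? ov::genai::StreamerRunningStatus::CANCEL
                                : ov::genai::StreamerRunningStatus::RUNNING;
    };
    pipe.generate("Why is the sky blue?", pipe.get_generation_config(), callback);
}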

github-actions bot added labels category: visual language (Visual language pipeline), category: continuous batching (Continuous batching), category: LLM (LLM pipeline: stateful, static), category: speculative decoding (Speculative decoding), category: GenAI C++ API (Changes in GenAI C++ public headers), no-match-files, and category: prompt lookup on Jan 3, 2025
sbalandi (Contributor, Author) commented Jan 3, 2025

TODO: add CANCEL for ContinuousBatching

ilya-lavrenov added this to the 2025.0 milestone on Jan 4, 2025
ilya-lavrenov self-assigned this on Jan 6, 2025
sbalandi force-pushed the callback branch 5 times, most recently from 454cdd9 to 1592ed0, on January 8, 2025 19:38
github-actions bot added labels category: Python API (Python API for GenAI) and category: samples (GenAI samples) on Jan 8, 2025
sbalandi force-pushed the callback branch 3 times, most recently from 10a755b to d18fe16, on January 8, 2025 22:19
sbalandi (Contributor, Author) commented Jan 8, 2025

> TODO: add CANCEL for ContinuousBatching

done

sbalandi marked this pull request as ready for review on January 8, 2025 22:43
sbalandi force-pushed the callback branch 3 times, most recently from 2758f6b to 03ca3ce, on January 9, 2025 21:56
ilya-lavrenov (Contributor) left a comment


Please add tests for the new functionality.

print(subword, end='', flush=True)
# The returned flag indicates whether generation should be stopped.
# False means continue generation.
return False
Contributor:

BTW, should we also support a callback w/o a return value?

E.g. when the user doesn't care about any stop / cancellation:

std::cout << word << std::flush;
// The returned flag indicates whether generation should be stopped.
// false means continue generation.
return false;
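
If the no-return-value option were adopted, usage might look like this sketch (hypothetical; it assumes the streamer variant accepts a void-returning callable):

auto print_only = [](std::string word) {
    std::cout << word << std::flush;
    // No return value: generation always runs to EOS / max_new_tokens.
};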

return ov::genai::StreamerRunningStatus::RUNNING;
Contributor:

Suggested change:
- return ov::genai::StreamerRunningStatus::RUNNING;
+ return ov::genai::StreamingStatus::CONTINUE;

@@ -30,6 +31,9 @@ struct EncodedGenerationResult {

// Status of generation
GenerationStatus m_status = GenerationStatus::RUNNING;

// Status of streaming
StreamerRunningStatus m_streaming_status = ov::genai::StreamerRunningStatus::UNDEF;
Contributor:

Maybe we can extend GenerationStatus? E.g. DROPPED_BY_HANDLE means STOP in its current implementation, while for CANCEL we can add a new value.

BTW, it looks like we can drop DROPPED_BY_PIPELINE as unused.
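
A sketch of what the extended enum could look like under this suggestion (value names and numbering beyond DROPPED_BY_HANDLE / DROPPED_BY_PIPELINE are assumptions):

enum class GenerationStatus {
    RUNNING = 0,   // generation in progress
    FINISHED = 1,  // finished normally (EOS or length limit)
    IGNORED = 2,   // request was not processed
    STOP = 3,      // replaces DROPPED_BY_HANDLE: callback asked to stop, results are kept
    CANCEL = 4     // new value: callback asked to cancel, last prompt and answer are dropped
    // DROPPED_BY_PIPELINE removed as unused, per the comment above
};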

Contributor:

More thoughts:

  • maybe we should deprecate the drop() method and introduce stop() instead
  • similarly for GenerationStatus
  • and extend both GenerationHandle and GenerationStatus with cancel() functionality

In this case the CB and LLM pipelines' logic / naming will be aligned.
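
A sketch of the proposed handle API under these points (signatures and the deprecation macro usage are assumptions):

class GenerationHandleImpl {
public:
    OPENVINO_DEPRECATED("Use stop() instead of drop()")
    void drop() { stop(); }

    void stop();    // end generation, keep generated tokens in history / KV cache
    void cancel();  // end generation, drop the last prompt and generated tokens from history
};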

@@ -15,7 +15,7 @@
RawPerfMetrics,
PerfMetrics,
StreamerBase,
get_version,
Contributor:

Can we keep it?

@@ -11,12 +11,17 @@ namespace genai {

class TextCallbackStreamer: public StreamerBase {
public:
StreamerRunningStatus streaming_status = StreamerRunningStatus::UNDEF;
Contributor:

As far as I can see, StreamerBase already contains this field?

CANCEL = 3 // Stop generation; drop the last prompt and all generated tokens from history. The KV cache keeps the history except for the last step.
};

using CallbackTypeVariant = std::variant<bool, StreamerRunningStatus>;
Contributor:

Suggested change:
- using CallbackTypeVariant = std::variant<bool, StreamerRunningStatus>;
+ using CallbackTypeVariant = std::variant<void, bool, StreamerRunningStatus>;

to support a callback which just "prints"
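
One wrinkle with the suggestion as written: std::variant cannot hold void (such a variant is ill-formed), so a compilable version of this idea would typically use std::monostate as the "returns nothing" alternative:

#include <variant>

// std::monostate stands in for a callback that returns nothing:
using CallbackTypeVariant = std::variant<std::monostate, bool, StreamerRunningStatus>;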

@@ -22,6 +34,10 @@ class OPENVINO_GENAI_EXPORTS StreamerBase {
/// @brief end is called at the end of generation. It can be used to flush cache if your own streamer has one
virtual void end() = 0;

virtual StreamerRunningStatus get_finish_streaming_reason() {
Contributor:

Suggested change:
- virtual StreamerRunningStatus get_finish_streaming_reason() {
+ StreamingStatus get_streaming_status() {

...
@typing.overload
-def generate(self, prompts: list[str], generation_config: list[GenerationConfig], streamer: typing.Callable[[str], bool] | StreamerBase | None = None) -> list[GenerationResult]:
+def generate(self, prompts: list[str], generation_config: list[GenerationConfig], streamer: typing.Callable[[str], bool] | typing.Callable[[str], ...] | StreamerBase | None = None) -> list[GenerationResult]:
Contributor:

Should we propagate StreamingStatus to the Python API, to use the enum instead of str?

andrei-kochin changed the milestone from 2025.0 to 2025.1 on Jan 13, 2025
/**
* @brief Base class for streamers. In order to use, inherit from this class and implement the put() and end() methods.
*
* @param m_tokenizer tokenizer
*/
class OPENVINO_GENAI_EXPORTS StreamerBase {
protected:
StreamerRunningStatus streaming_finish_status = StreamerRunningStatus::UNDEF;
Collaborator:

Suggested change:
- StreamerRunningStatus streaming_finish_status = StreamerRunningStatus::UNDEF;
+ StreamerRunningStatus m_streaming_finish_status = StreamerRunningStatus::UNDEF;

@@ -187,6 +187,9 @@ class ov::genai::VLMPipeline::VLMPipelineImpl {
SequenceGroup::Ptr sequence_group = std::make_shared<SequenceGroup>(request_id, prompt_ids, generation_config, block_size);
requests.push_back(sequence_group);

OPENVINO_ASSERT((!m_is_chat_conversation || !std::get_if<std::function<StreamerRunningStatus(std::string)>>(&streamer)),
"For chat mode, please, use Steamer as StreamerBase class or as callback with a bool return value.");
Collaborator:

Why is a StreamerRunningStatus return value prohibited for chat? You wrap it with TextCallbackStreamer below.


namespace ov {
namespace genai {

enum class StreamerRunningStatus {
UNDEF = 0, // Streaming has not been run
RUNNING = 1, // Continue running inference
Collaborator:

RUNNING and UNDEF seem to be equivalent. In that case you should keep only one of them. Moreover, a callback should never return UNDEF, so merging them fixes the API.
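
A sketch of the enum with UNDEF and RUNNING merged, per this comment (the StreamingStatus name comes from an earlier suggestion in this review; numbering is an assumption):

enum class StreamingStatus {
    RUNNING = 0,  // default state; also what a callback returns to continue generation
    STOP = 1,     // end generation, keep results in history / KV cache
    CANCEL = 2    // end generation, drop the last prompt and generated tokens from history
};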

sbalandi force-pushed the callback branch 8 times, most recently from f739e2a to 04bcc7f, on January 17, 2025 14:48
4 participants