
Make TextStreamer public & add unit-tests #1700

Open — wants to merge 9 commits into master from add_text_streamer_tests

Conversation

@pavel-esir (Contributor) commented Feb 10, 2025

  • The check for incomplete text should be done as early as possible so that m_decoded_lengths records that the text at this token's position is incomplete (expressed with -1).
  • Made TextStreamer public.
  • Added unit tests for TextStreamer, which use pybind to expose this object to Python for test purposes. TextCallbackStreamer remains private and visible only to developers.

Ticket: CVS-148635
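The first bullet can be illustrated with a minimal Python sketch (an assumption for illustration, not the library's actual C++ code; the list name mirrors the C++ member m_decoded_lengths, and the real implementation overwrites the last entry rather than appending):

```python
# Minimal sketch (not the actual implementation) of the incomplete-text
# bookkeeping described above. When a decoded chunk ends with U+FFFD (the
# replacement character), the last token did not finish a UTF-8 sequence,
# so its position is marked with -1 instead of a decoded length.

REPLACEMENT = "\ufffd"  # U+FFFD, shown as "�"

def record_decoded_length(decoded_lengths: list, text: str) -> None:
    """Append the decoded length, or -1 if the text is still incomplete."""
    if text.endswith(REPLACEMENT):
        decoded_lengths.append(-1)  # text at this token's position is incomplete
    else:
        decoded_lengths.append(len(text))

lengths: list = []
record_decoded_length(lengths, "Hello")        # complete text
record_decoded_length(lengths, "Hello\ufffd")  # truncated multi-byte character
print(lengths)  # [5, -1]
```

Doing this check as early as possible matters because later logic (e.g., deciding whether to flush text to the callback) consults these recorded lengths.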

@pavel-esir pavel-esir added the bug Something isn't working label Feb 10, 2025
@pavel-esir pavel-esir added this to the 2025.1 milestone Feb 10, 2025
@github-actions github-actions bot added category: LLM LLM pipeline (stateful, static) category: GHA CI based on Github actions category: cmake / build Cmake scripts no-match-files labels Feb 10, 2025
Resolved (outdated) review threads on: CMakeLists.txt, .github/workflows/windows.yml, .github/workflows/linux.yml
if (text.size() >= 3 && text.compare(text.size() - 3, 3, replacement) == 0) {
m_decoded_lengths[m_decoded_lengths.size() - 1] = -1;
// Don't print incomplete text
return set_streaming_status(on_finalized_subword_callback(res.str()));
Collaborator commented:
IterableStreamer needs a similar change. Please add ChunkStreamer to tests as well.

Contributor (author) replied:

  1. It turned out that IterableStreamer already handles this situation correctly.

  2. We discussed this with @as-suvorov; it looks like he will add such unit tests together with the implementation of TextStreamer with vectorized write.
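For context, the C++ hunk above compares the last three bytes of the byte-indexed std::string against the replacement character, which encodes as 0xEF 0xBF 0xBD in UTF-8. A hedged Python equivalent of just that check, operating on bytes the way the C++ code does:

```python
# U+FFFD encodes to exactly three bytes in UTF-8, which is why the C++
# check compares the last 3 bytes of the std::string.
REPLACEMENT_UTF8 = "\ufffd".encode("utf-8")  # b"\xef\xbf\xbd"

def ends_with_replacement(text: bytes) -> bool:
    # Mirrors: text.size() >= 3 && text.compare(text.size() - 3, 3, replacement) == 0
    return len(text) >= 3 and text[-3:] == REPLACEMENT_UTF8

print(ends_with_replacement("Hello".encode("utf-8")))  # False
print(ends_with_replacement(b"ab" + REPLACEMENT_UTF8))  # True
```

This is only a sketch of the condition; the surrounding control flow (recording -1 and returning without printing) is shown in the diff itself.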

@pavel-esir pavel-esir force-pushed the add_text_streamer_tests branch 3 times, most recently from 220815c to c30f1a8 Compare February 12, 2025 10:45
@github-actions github-actions bot added category: visual language Visual language pipeline category: continuous batching Continuous batching category: whisper Whisper pipeline category: speculative decoding Speculative decoding category: Python API Python API for GenAI category: samples GenAI samples category: GenAI C++ API Changes in GenAI C++ public headers category: prompt lookup labels Feb 12, 2025
@pavel-esir pavel-esir force-pushed the add_text_streamer_tests branch from 1667b9e to 8d9c781 Compare February 12, 2025 15:34
@github-actions github-actions bot removed the category: samples GenAI samples label Feb 12, 2025
@pavel-esir pavel-esir changed the title Fix TextCallbackStreamer add unit-tests Make TextStreamer public & add unit-tests Feb 12, 2025
@pavel-esir pavel-esir requested a review from Wovchena February 12, 2025 15:45
@@ -131,6 +143,11 @@ PYBIND11_MODULE(py_openvino_genai, m) {
.def(py::init<>())
.def("write", &StreamerBase::write, "Write is called every time new token is decoded. Returns a StreamingStatus flag to indicate whether generation should be stopped or cancelled", py::arg("token"))
.def("end", &StreamerBase::end, "End is called at the end of generation. It can be used to flush cache if your own streamer has one");

py::class_<TextStreamer, std::shared_ptr<TextStreamer>>(m, "TextStreamer", text_streamer_docstring)
Contributor commented:

I think we need to highlight that TextStreamerBase is a base class.

I would also create a dedicated py_text_streamer.cpp file for all streamers on the Python side.

# ("black-forest-labs/FLUX.1-dev", dict(subfolder="tokenizer")), # FLUX.1-dev has tokenizer in subfolder
]

# Chekc that fix for CVS-157216 works.
Contributor commented:

Suggested change
# Chekc that fix for CVS-157216 works.
# Check that fix for CVS-157216 works.

callback: User-defined callback function to process the decoded text, callback should return either boolean flag or StreamingStatus.

"""
def __init__(self, tokenizer: ..., callback: typing.Callable[[str], bool | StreamingStatus]) -> None:
Contributor commented:

The type `...` is unexpected.

The Tokenizer class should be defined before TextStreamer in PYBIND11_MODULE(py_openvino_genai, ..)

...
def end(self) -> None:
...
def write(self, arg0: int) -> StreamingStatus:
Contributor commented:

Please use an explicit argument name via py::arg to avoid automatic arg0 names.

__all__ = ['Adapter', 'AdapterConfig', 'AggregationMode', 'AutoencoderKL', 'CLIPTextModel', 'CLIPTextModelWithProjection', 'CacheEvictionConfig', 'ChunkStreamerBase', 'ContinuousBatchingPipeline', 'CppStdGenerator', 'DecodedResults', 'EncodedResults', 'FluxTransformer2DModel', 'GenerationConfig', 'GenerationResult', 'Generator', 'Image2ImagePipeline', 'ImageGenerationConfig', 'InpaintingPipeline', 'LLMPipeline', 'PerfMetrics', 'RawPerfMetrics', 'SD3Transformer2DModel', 'Scheduler', 'SchedulerConfig', 'StopCriteria', 'StreamerBase', 'StreamingStatus', 'T5EncoderModel', 'Text2ImagePipeline', 'TokenizedInputs', 'Tokenizer', 'TorchGenerator', 'UNet2DConditionModel', 'VLMPipeline', 'WhisperGenerationConfig', 'WhisperPerfMetrics', 'WhisperPipeline', 'WhisperRawPerfMetrics', 'draft_model', 'get_version', 'openvino', 'os', 'py_openvino_genai']
__version__: str
__all__ = ['Adapter', 'AdapterConfig', 'AggregationMode', 'AutoencoderKL', 'CLIPTextModel', 'CLIPTextModelWithProjection', 'CacheEvictionConfig', 'ChunkStreamerBase', 'ContinuousBatchingPipeline', 'CppStdGenerator', 'DecodedResults', 'EncodedResults', 'FluxTransformer2DModel', 'GenerationConfig', 'GenerationResult', 'Generator', 'Image2ImagePipeline', 'ImageGenerationConfig', 'InpaintingPipeline', 'LLMPipeline', 'PerfMetrics', 'RawPerfMetrics', 'SD3Transformer2DModel', 'Scheduler', 'SchedulerConfig', 'StopCriteria', 'StreamerBase', 'StreamingStatus', 'T5EncoderModel', 'Text2ImagePipeline', 'TextStreamer', 'TokenizedInputs', 'Tokenizer', 'TorchGenerator', 'UNet2DConditionModel', 'VLMPipeline', 'WhisperGenerationConfig', 'WhisperPerfMetrics', 'WhisperPipeline', 'WhisperRawPerfMetrics', 'draft_model', 'get_version', 'openvino', 'os', 'py_openvino_genai']
__version__: str = '2025.1.0.0-1760-6e22001f986-add_text_streamer_tests'
Contributor commented:

Having a version here is unexpected.

It is explicitly removed to avoid regenerating this file on every commit because of differing hashes.
