Fix Doc-Sum stream output format #1219

Closed

xiguiw wants to merge 2 commits

Conversation

@xiguiw (Collaborator) commented Jan 23, 2025

Keep Doc-Sum stream output aligned with v1.1 format

Fix issue:
opea-project/GenAIInfra#753

Type of change

List the type of change like below. Please delete options that are not relevant.

  • Breaking change (fix or feature that would break existing design and interface)

Dependencies

List any newly introduced third-party dependencies, if they exist.

Tests

Describe the tests that you ran to verify your changes.

@vrantala left a comment

This change makes the output v1.1 compatible, but the issue of the wrong first-token latency count remains. In v1.1, DocSum output contained only tokens, e.g.:

```
curl http://10.96.106.94:8888/v1/docsum -H "Content-Type: multipart/form-data" -F "type=text" -F "messages=" -F "files=@./pubmed_10.txt" -F "max_tokens=1024" -F "language=en" -F "stream=true"
data: b' \n\n'

data: b'The'

data: b' provided'

data: b' text'
```

Now the output is aligned with v1.1, but for some reason it still outputs lines that it should not emit as tokens. The next example shows the current output: the first three outputs are not token outputs, and the fourth line is the first real token. What are those three output lines, and why are they there?

```
curl http://10.110.23.165:8888/v1/docsum \
   -H "Content-Type: multipart/form-data" \
   -F "type=text" \
   -F "messages=" \
   -F "files=@./pubmed_10.txt" \
   -F "max_tokens=1024" \
   -F "language=en" \
   -F "stream=true"
data: {"ops":[{"op":"replace","path":"","value":{"id":"0d08e051-f332-4ee4-9ffa-af7c38a32d91","streamed_output":[],"final_output":null,"logs":{},"name":"StuffDocumentsChain","type":"chain"}}]}

data: {"ops":[{"op":"add","path":"/logs/LLMChain","value":{"id":"1338af3d-063f-4cac-a7d0-4e0c891dcf67","name":"LLMChain","type":"chain","tags":[],"metadata":{},"start_time":"2025-01-23T07:42:11.221+00:00","streamed_output":[],"streamed_output_str":[],"final_output":null,"end_time":null}}]}

data: {"ops":[{"op":"add","path":"/logs/HuggingFaceEndpoint","value":{"id":"fbc35b7c-9cc0-4ea7-8878-011feb75ea14","name":"HuggingFaceEndpoint","type":"llm","tags":[],"metadata":{},"start_time":"2025-01-23T07:42:11.225+00:00","streamed_output":[],"streamed_output_str":[],"final_output":null,"end_time":null}}]}

data: {"ops":[{"op":"add","path":"/logs/HuggingFaceEndpoint/streamed_output_str/-","value":" \n\n"},{"op":"add","path":"/logs/HuggingFaceEndpoint/streamed_output/-","value":" \n\n"}]}

data: {"ops":[{"op":"add","path":"/logs/HuggingFaceEndpoint/streamed_output_str/-","value":"The"},{"op":"add","path":"/logs/HuggingFaceEndpoint/streamed_output/-","value":"The"}]}
```
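(For context, the three non-token lines above appear to be LangChain run-log events: JSON Patch `ops` recording the chain and HuggingFaceEndpoint start, not generated text. Only entries that append to a `.../streamed_output_str/-` path carry token content. A rough client-side sketch under that assumption, with a hypothetical endpoint URL and file name, could skip the bookkeeping lines and time the first real token like this; it is an illustration, not part of this PR.)

```python
import json
import time

import requests

# Hypothetical endpoint and input file, for illustration only.
URL = "http://localhost:8888/v1/docsum"
DOC = "./pubmed_10.txt"


def iter_tokens(resp):
    """Yield only real token strings from the SSE stream.

    Chunks that are LangChain run-log JSON ("ops" patches) contribute a
    token only when they append to .../streamed_output_str/-; chain/LLM
    start events yield nothing and are skipped.
    """
    for raw in resp.iter_lines(decode_unicode=True):
        if not raw or not raw.startswith("data:"):
            continue
        payload = raw[len("data:"):].strip()
        if payload == "[DONE]":
            break
        try:
            event = json.loads(payload)
        except json.JSONDecodeError:
            yield payload  # plain v1.1-style token chunk
            continue
        for op in event.get("ops", []):
            if op.get("op") == "add" and op.get("path", "").endswith(
                "/streamed_output_str/-"
            ):
                yield op["value"]


start = time.time()
with open(DOC, "rb") as f, requests.post(
    URL,
    data={"type": "text", "messages": "", "max_tokens": "1024",
          "language": "en", "stream": "true"},
    files={"files": f},
    stream=True,
) as resp:
    for i, token in enumerate(iter_tokens(resp)):
        if i == 0:
            print(f"first real token after {time.time() - start:.2f}s")
        print(token, end="", flush=True)
```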

@xiguiw (Collaborator, Author) commented Jan 24, 2025

> Now the output is aligned with v1.1, but for some reason it still outputs lines that it should not emit as tokens. [...] What are those three output lines, and why are they there?

Yes, you are right. The stream output is not as expected: not only the format, but also the contents.
I'll investigate it. If I cannot get a complete fix, a workaround will be submitted.

Workaround to keep Doc-Sum stream output aligned with v1.1 format

This is a workaround to extract the tokens from the stream output.

Fix issue:
opea-project/GenAIInfra#753

Signed-off-by: Wang, Xigui <[email protected]>
@xiguiw (Collaborator, Author) commented Jan 24, 2025

The workaround extracts the LLM output tokens from the stream output. The extracted tokens are correct.

As I am not familiar with the stream output format, I believe there is a more mature solution that I have not found yet.
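A minimal sketch of the kind of extraction such a workaround performs is below. This is not the actual patch; the function names and the re-emitted `data:` framing are assumptions based on the v1.1 behavior described above.

```python
import json


def extract_tokens(chunk: str):
    """Return the token strings carried by one streamed chunk, if any.

    Run-log chunks that only record chain/LLM start events produce no
    tokens, so they are never passed downstream.
    """
    try:
        event = json.loads(chunk)
    except (TypeError, json.JSONDecodeError):
        # Already plain token text (v1.1-style stream).
        return [chunk] if chunk else []
    return [
        op["value"]
        for op in event.get("ops", [])
        if op.get("op") == "add"
        and op.get("path", "").endswith("/streamed_output_str/-")
    ]


def reformat_stream(chunks):
    """Re-emit only real tokens in the v1.1 'data: <token>' SSE format."""
    for chunk in chunks:
        for token in extract_tokens(chunk):
            yield f"data: {token}\n\n"
    yield "data: [DONE]\n\n"
```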

```
curl http://${host_ip}:9000/v1/docsum -X POST -d '{"messages": "Text Embeddings Inference (TEI) is a toolkit for deploying and serving open source text embeddings and sequence classification models. TEI enables high-performance extraction for the most popular models, including FlagEmbedding, Ember, GTE and E5.", "max_tokens":32, "language":"en", "stream":true}' -H 'Content-Type: application/json'
data:  Text

data:  Emb

data: ed

data: dings

data:  In

data: ference

data:  (

data: TE

data: I

data: )

data:  is

data:  a

data:  tool

data: kit

data:  facil

data: itating

data:  deployment

data:  and

data:  serving

data:  of

data:  open

data:  source

data:  text

data:  embed

data: dings

data:  and

data:  sequence

data:  classification

data:  models

data: ,

data:  offering

data:  efficient

data: [DONE]
```

@xiguiw (Collaborator, Author) commented Jan 24, 2025

Closing this and waiting for a complete fix.

@xiguiw xiguiw closed this Jan 24, 2025