Fix Doc-Sum stream output format #1219

Closed

xiguiw wants to merge 2 commits

Conversation

@xiguiw (Collaborator) commented Jan 23, 2025

Keep Doc-Sum stream output aligned with v1.1 format

Fix issue:
opea-project/GenAIInfra#753

Type of change

List the type of change like below. Please delete options that are not relevant.

  • Breaking change (fix or feature that would break existing design and interface)

Dependencies

List any newly introduced third-party dependencies, if they exist.

Tests

Describe the tests that you ran to verify your changes.

@vrantala left a comment

This change makes the output v1.1 compatible, but the issue of the wrong first-token latency count remains. In v1.1, DocSum output contained only tokens, e.g.:

```
curl http://10.96.106.94:8888/v1/docsum -H "Content-Type: multipart/form-data" -F "type=text" -F "messages=" -F "files=@./pubmed_10.txt" -F "max_tokens=1024" -F "language=en" -F "stream=true"
data: b' \n\n'

data: b'The'

data: b' provided'

data: b' text'
```

Now the output is aligned with v1.1, but for some reason it still outputs lines that it should not emit as tokens. The next example shows the current output: the first three outputs are not token outputs, and the fourth line is the first real token. What are those three output lines, and why are they there?

```
curl http://10.110.23.165:8888/v1/docsum \
   -H "Content-Type: multipart/form-data" \
   -F "type=text" \
   -F "messages=" \
   -F "files=@./pubmed_10.txt" \
   -F "max_tokens=1024" \
   -F "language=en" \
   -F "stream=true"
data: {"ops":[{"op":"replace","path":"","value":{"id":"0d08e051-f332-4ee4-9ffa-af7c38a32d91","streamed_output":[],"final_output":null,"logs":{},"name":"StuffDocumentsChain","type":"chain"}}]}

data: {"ops":[{"op":"add","path":"/logs/LLMChain","value":{"id":"1338af3d-063f-4cac-a7d0-4e0c891dcf67","name":"LLMChain","type":"chain","tags":[],"metadata":{},"start_time":"2025-01-23T07:42:11.221+00:00","streamed_output":[],"streamed_output_str":[],"final_output":null,"end_time":null}}]}

data: {"ops":[{"op":"add","path":"/logs/HuggingFaceEndpoint","value":{"id":"fbc35b7c-9cc0-4ea7-8878-011feb75ea14","name":"HuggingFaceEndpoint","type":"llm","tags":[],"metadata":{},"start_time":"2025-01-23T07:42:11.225+00:00","streamed_output":[],"streamed_output_str":[],"final_output":null,"end_time":null}}]}

data: {"ops":[{"op":"add","path":"/logs/HuggingFaceEndpoint/streamed_output_str/-","value":" \n\n"},{"op":"add","path":"/logs/HuggingFaceEndpoint/streamed_output/-","value":" \n\n"}]}

data: {"ops":[{"op":"add","path":"/logs/HuggingFaceEndpoint/streamed_output_str/-","value":"The"},{"op":"add","path":"/logs/HuggingFaceEndpoint/streamed_output/-","value":"The"}]}
```
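(For context, the three non-token lines above appear to be LangChain run-log events: JSON Patch `ops` recording the chain and HuggingFaceEndpoint start, not generated text. Only entries that append to a `.../streamed_output_str/-` path carry token content. A rough client-side sketch under that assumption, with a hypothetical endpoint URL and file name, could skip the bookkeeping lines and time the first real token like this; it is an illustration, not part of this PR.)

```python
import json
import time

import requests

# Hypothetical endpoint and input file, for illustration only.
URL = "http://localhost:8888/v1/docsum"
DOC = "./pubmed_10.txt"


def iter_tokens(resp):
    """Yield only real token strings from the SSE stream.

    Chunks that are LangChain run-log JSON ("ops" patches) contribute a
    token only when they append to .../streamed_output_str/-; chain/LLM
    start events yield nothing and are skipped.
    """
    for raw in resp.iter_lines(decode_unicode=True):
        if not raw or not raw.startswith("data:"):
            continue
        payload = raw[len("data:"):].strip()
        if payload == "[DONE]":
            break
        try:
            event = json.loads(payload)
        except json.JSONDecodeError:
            yield payload  # plain v1.1-style token chunk
            continue
        for op in event.get("ops", []):
            if op.get("op") == "add" and op.get("path", "").endswith(
                "/streamed_output_str/-"
            ):
                yield op["value"]


start = time.time()
with open(DOC, "rb") as f, requests.post(
    URL,
    data={"type": "text", "messages": "", "max_tokens": "1024",
          "language": "en", "stream": "true"},
    files={"files": f},
    stream=True,
) as resp:
    for i, token in enumerate(iter_tokens(resp)):
        if i == 0:
            print(f"first real token after {time.time() - start:.2f}s")
        print(token, end="", flush=True)
```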

@xiguiw (Collaborator, Author) commented Jan 24, 2025

> Now the output is aligned with v1.1, but for some reason it still outputs lines that it should not emit as tokens. [...] What are those three output lines, and why are they there?

Yes, you are right. The stream output is not as expected: not only the format, but also the contents.
I'll investigate it. If I cannot get a complete fix, a workaround will be submitted.

Workaround to keep Doc-Sum stream output aligned with v1.1 format

This is a workaround to extract the tokens from the stream output.

Fix issue:
opea-project/GenAIInfra#753

Signed-off-by: Wang, Xigui <[email protected]>
@xiguiw (Collaborator, Author) commented Jan 24, 2025

The workaround extracts the LLM output tokens from the stream output. The extracted tokens are correct.

As I am not familiar with the stream output format, I believe there is a more mature solution that I have not found yet.
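A minimal sketch of the kind of extraction such a workaround performs is below. This is not the actual patch; the function names and the re-emitted `data:` framing are assumptions based on the v1.1 behavior described above.

```python
import json


def extract_tokens(chunk: str):
    """Return the token strings carried by one streamed chunk, if any.

    Run-log chunks that only record chain/LLM start events produce no
    tokens, so they are never passed downstream.
    """
    try:
        event = json.loads(chunk)
    except (TypeError, json.JSONDecodeError):
        # Already plain token text (v1.1-style stream).
        return [chunk] if chunk else []
    return [
        op["value"]
        for op in event.get("ops", [])
        if op.get("op") == "add"
        and op.get("path", "").endswith("/streamed_output_str/-")
    ]


def reformat_stream(chunks):
    """Re-emit only real tokens in the v1.1 'data: <token>' SSE format."""
    for chunk in chunks:
        for token in extract_tokens(chunk):
            yield f"data: {token}\n\n"
    yield "data: [DONE]\n\n"
```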

```
curl http://${host_ip}:9000/v1/docsum -X POST -d '{"messages": "Text Embeddings Inference (TEI) is a toolkit for deploying and serving open source text embeddings and sequence classification models. TEI enables high-performance extraction for the most popular models, including FlagEmbedding, Ember, GTE and E5.", "max_tokens":32, "language":"en", "stream":true}' -H 'Content-Type: application/json'
data:  Text

data:  Emb

data: ed

data: dings

data:  In

data: ference

data:  (

data: TE

data: I

data: )

data:  is

data:  a

data:  tool

data: kit

data:  facil

data: itating

data:  deployment

data:  and

data:  serving

data:  of

data:  open

data:  source

data:  text

data:  embed

data: dings

data:  and

data:  sequence

data:  classification

data:  models

data: ,

data:  offering

data:  efficient

data: [DONE]
```

@xiguiw (Collaborator, Author) commented Jan 24, 2025

Closing this and waiting for a complete fix.

@xiguiw xiguiw closed this Jan 24, 2025