feat(huixiangdou): add chat_with_repo pipeline #362

Merged: 29 commits, Aug 20, 2024
Commits:
fe535bf
feat(llm_server_hybrid.py): support internlm2.5
tpoisonooo Aug 6, 2024
8d39558
Merge branch 'main' of https://github.com/internlm/huixiangdou into main
tpoisonooo Aug 6, 2024
f1077e9
Merge branch 'main' of https://github.com/internlm/huixiangdou into main
tpoisonooo Aug 7, 2024
3c663c1
Merge branch 'main' of https://github.com/internlm/huixiangdou into main
tpoisonooo Aug 12, 2024
e5bcc2e
Merge branch 'main' of https://github.com/internlm/huixiangdou into main
tpoisonooo Aug 12, 2024
d99b0ac
fix(primitive/faiss.py): save distance_strategy to pkl
tpoisonooo Aug 12, 2024
450ad2f
Merge branch 'main' of https://github.com/internlm/huixiangdou into main
tpoisonooo Aug 12, 2024
3ef2b57
Merge branch 'main' of https://github.com/internlm/huixiangdou into main
tpoisonooo Aug 13, 2024
8c690cc
Merge branch 'main' of https://github.com/internlm/huixiangdou into main
tpoisonooo Aug 14, 2024
024e328
fix(web_search.py): close async pyppeteer
tpoisonooo Aug 14, 2024
be679ef
Merge branch 'main' of https://github.com/internlm/huixiangdou into main
tpoisonooo Aug 14, 2024
1c37adc
Merge branch 'main' of https://github.com/internlm/huixiangdou into main
tpoisonooo Aug 15, 2024
7218743
update
tpoisonooo Aug 15, 2024
0088380
feat(service): add parallel pipeline
tpoisonooo Aug 13, 2024
722eb41
feat(service/parallel_pipeline.py): add parallel
tpoisonooo Aug 15, 2024
30444ea
feat(service): add parallel pipeline
tpoisonooo Aug 16, 2024
de88369
feat(config.ini): clean up
tpoisonooo Aug 16, 2024
ed23134
feat(huixiangdou/service): typo
tpoisonooo Aug 19, 2024
8f302f9
feat(service): gradio streaming chat
tpoisonooo Aug 19, 2024
81f0b69
feat(huixiangdou): test gradio passed
tpoisonooo Aug 20, 2024
6c9c300
tests(gradio.py): update
tpoisonooo Aug 20, 2024
4b007ac
update
tpoisonooo Aug 20, 2024
e97ec90
tests(service): gradio passed
tpoisonooo Aug 20, 2024
78d171c
tests(huixiangdou/main): test passed
tpoisonooo Aug 20, 2024
708aa23
update
tpoisonooo Aug 20, 2024
09a9652
ci(projects): update
tpoisonooo Aug 20, 2024
ce4c80e
style(llm_client.py): remove useless
tpoisonooo Aug 20, 2024
f4be1cd
update
tpoisonooo Aug 20, 2024
527d12b
update
tpoisonooo Aug 20, 2024
1 change: 1 addition & 0 deletions .github/scripts/doc_link_checker.py
@@ -58,6 +58,7 @@ def analyze_doc(home, path):
ref = ref[ref.find('#'):]
fullpath = os.path.join(home, ref)
if not os.path.exists(fullpath):
raise ValueError(fullpath)
problem_list.append(ref)
else:
continue
15 changes: 10 additions & 5 deletions README.md
@@ -30,12 +30,14 @@ English | [简体中文](README_zh.md)

</div>

HuixiangDou is a **group chat** assistant based on LLM (Large Language Model).
HuixiangDou is a **professional knowledge assistant** based on LLM.

Advantages:

1. Design a three-stage pipeline of preprocess, rejection and response to cope with group chat scenario, answer user questions without message flooding, see [2401.08772](https://arxiv.org/abs/2401.08772), [2405.02817](https://arxiv.org/abs/2405.02817), [Hybrid Retrieval](./docs/knowledge_graph_en.md) and [Precision Report](./evaluation/).
2. No training required, with CPU-only, 2G, 10G and 80G configuration
1. Design three-stage pipelines of preprocess, rejection and response
* `chat_in_group` copes with **group chat** scenario, answer user questions without message flooding, see [2401.08772](https://arxiv.org/abs/2401.08772), [2405.02817](https://arxiv.org/abs/2405.02817), [Hybrid Retrieval](./docs/knowledge_graph_en.md) and [Precision Report](./evaluation/)
* `chat_with_repo` for **real-time streaming** chat
2. No training required, with CPU-only, 2G, 10G, 20G and 80G configuration
3. Offers a complete suite of Web, Android, and pipeline source code, industrial-grade and commercially viable

Check out the [scenes in which HuixiangDou are running](./huixiangdou-inside.md) and join [WeChat Group](resource/figures/wechat.jpg) to try AI assistant inside.
@@ -46,6 +48,7 @@ If this helps you, please give it a star ⭐

Our Web version has been released to [OpenXLab](https://openxlab.org.cn/apps/detail/tpoisonooo/huixiangdou-web), where you can create knowledge base, update positive and negative examples, turn on web search, test chat, and integrate into Feishu/WeChat groups. See [BiliBili](https://www.bilibili.com/video/BV1S2421N7mn) and [YouTube](https://www.youtube.com/watch?v=ylXrT-Tei-Y) !

- \[2024/08\] `chat_with_repo` [pipeline](./huixiangdou/service/parallel_pipeline.py) 👍
- \[2024/07\] Image and text retrieval & Removal of `langchain` 👍
- \[2024/07\] [Hybrid Knowledge Graph and Dense Retrieval](./docs/knowledge_graph_en.md) improve 1.7% F1 score 🎯
- \[2024/06\] [Evaluation of chunksize, splitter, and text2vec model](./evaluation) 🎯
@@ -221,7 +224,9 @@ python3 -m huixiangdou.main --standalone
python3 -m huixiangdou.gradio
```

Or run a server to listen 23333:
https://github.com/user-attachments/assets/9e5dbb30-1dc1-42ad-a7d4-dc7380676554

Or run a server to listen 23333, default pipeline is `chat_with_repo`:
```bash
python3 -m huixiangdou.server
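# A hypothetical client call, for illustration only: the real route and JSON
# schema are defined in huixiangdou/server.py, so treat the endpoint and
# payload below as assumptions rather than the confirmed API.
# curl -X POST http://127.0.0.1:23333/inference \
#      -H "Content-Type: application/json" \
#      -d '{"text": "how to install mmpose ?"}'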

@@ -368,7 +373,7 @@ Contributors have provided [Android tools](./android) to interact with WeChat.
3. How to access other local LLM / After access, the effect is not ideal?

- Open [hybrid llm service](./huixiangdou/service/llm_server_hybrid.py), add a new LLM inference implementation.
- Refer to [test_intention_prompt and test data](./tests/test_intention_prompt.py), adjust prompt and threshold for the new model, and update them into [worker.py](./huixiangdou/service/worker.py).
- Refer to [test_intention_prompt and test data](./tests/test_intention_prompt.py), adjust prompt and threshold for the new model, and update them into [prompt.py](./huixiangdou/service/prompt.py).

4. What if the response is too slow/request always fails?

19 changes: 13 additions & 6 deletions README_zh.md
@@ -29,10 +29,12 @@

</div>

HuixiangDou is an LLM-based **group chat** knowledge assistant. Advantages:
HuixiangDou is an LLM-based professional knowledge assistant. Advantages:

1. Designs a three-stage pipeline of preprocess, rejection and response for the group chat scenario, answering questions without message flooding. For the essentials, see [2401.08772](https://arxiv.org/abs/2401.08772), [2405.02817](https://arxiv.org/abs/2405.02817), [Hybrid Retrieval](./docs/knowledge_graph_zh.md) and the [precision test on business data](./evaluation)
2. No training required, applicable to all industries; provides CPU-only, 2G, 10G and 80G configurations
1. Designs three-stage pipelines of preprocess, rejection and response:
* `chat_in_group` for the group chat scenario, answering questions without message flooding. See [2401.08772](https://arxiv.org/abs/2401.08772), [2405.02817](https://arxiv.org/abs/2405.02817), [Hybrid Retrieval](./docs/knowledge_graph_zh.md) and the [precision test on business data](./evaluation)
* `chat_with_repo` for the real-time chat scenario, with faster responses
2. No training required, applicable to all industries; provides CPU-only, 2G, 10G, 20G and 80G configurations
3. Provides a complete set of web frontend/backend, Android, and algorithm source code; industrial-grade, open source, and commercially usable

See [which scenarios HuixiangDou is already running in](./huixiangdou-inside.md); join the [WeChat group](resource/figures/wechat.jpg) to try the group chat assistant directly.
@@ -45,6 +47,7 @@

For video tutorials of the web version, see [BiliBili](https://www.bilibili.com/video/BV1S2421N7mn) and [YouTube](https://www.youtube.com/watch?v=ylXrT-Tei-Y).

- \[2024/08\] `chat_with_repo` [pipeline](./huixiangdou/service/parallel_pipeline.py)
- \[2024/07\] Image-text retrieval & removal of `langchain` 👍
- \[2024/07\] [Hybrid knowledge graph and dense retrieval, +1.7% F1](./docs/knowledge_graph_zh.md) 🎯
- \[2024/06\] [Evaluation of chunksize, splitter and text2vec models](./evaluation) 🎯
@@ -216,10 +219,14 @@ python3 -m huixiangdou.main --standalone
💡 You can also launch `gradio` to build a simple web UI, bound to port 7860 by default:

```bash
python3 -m huixiangdou.gradio
python3 -m huixiangdou.gradio
# if `llm_server_hybrid.py` is already running separately, you can use
# python3 -m huixiangdou.gradio --no-standalone
```

Or start the server, listening on port 23333:
https://github.com/user-attachments/assets/9e5dbb30-1dc1-42ad-a7d4-dc7380676554

Or start the server, listening on port 23333; the default pipeline is `chat_with_repo`:
```bash
python3 -m huixiangdou.server

@@ -364,7 +371,7 @@ python3 tests/test_query_gradio.py
3. How to integrate another local LLM / what if the results are poor after integration?

- Open [hybrid llm service](./huixiangdou/service/llm_server_hybrid.py) and add a new LLM inference implementation
- Refer to [test_intention_prompt and test data](./tests/test_intention_prompt.py), adjust the prompt and thresholds for the new model, and update them in [worker.py](./huixiangdou/service/worker.py)
- Refer to [test_intention_prompt and test data](./tests/test_intention_prompt.py), adjust the prompt and thresholds for the new model, and update them in [prompt.py](./huixiangdou/service/prompt.py)

4. What if responses are too slow or requests keep failing?

2 changes: 1 addition & 1 deletion config.ini
@@ -25,7 +25,7 @@ engine = "serper"
# For ddgs, see https://pypi.org/project/duckduckgo-search
# For serper, check https://serper.dev/api-key to get a free API key
serper_x_api_key = "YOUR-API-KEY-HERE"
domain_partial_order = ["openai.com", "pytorch.org", "readthedocs.io", "nvidia.com", "stackoverflow.com", "juejin.cn", "zhuanlan.zhihu.com", "www.cnblogs.com"]
domain_partial_order = ["arxiv.org", "openai.com", "pytorch.org", "readthedocs.io", "nvidia.com", "stackoverflow.com", "juejin.cn", "zhuanlan.zhihu.com", "www.cnblogs.com"]
save_dir = "logs/web_search_result"

[llm]
4 changes: 2 additions & 2 deletions docs/full_dev_en.md
@@ -74,6 +74,6 @@ The basic version may not perform well. You can enable these features to enhance

It is often unavoidable to adjust parameters with respect to business scenarios.

- Refer to [data.json](./tests/data.json) to add real data, run [test_intention_prompt.py](./tests/test_intention_prompt.py) to get suitable prompts and thresholds, and update them into [worker](./huixiangdou/service/worker.py).
- Adjust the [number of search results](./huixiangdou/service/worker.py) based on the maximum length supported by the model.
- Refer to [data.json](../tests/data.json) to add real data, run [test_intention_prompt.py](../tests/test_intention_prompt.py) to get suitable prompts and thresholds, and update them into [prompt.py](../huixiangdou/service/prompt.py).
- Adjust the [number of search results](../huixiangdou/service/serial_pipeline.py) based on the maximum length supported by the model.
- Update `web_search.domain_partial_order` in `config.ini` according to your scenarios.
4 changes: 2 additions & 2 deletions docs/full_dev_zh.md
@@ -73,6 +73,6 @@

Tuning parameters for your business scenario is often unavoidable.

- Refer to [data.json](./tests/data.json) to add real data, run [test_intention_prompt.py](./tests/test_intention_prompt.py) to obtain suitable prompts and thresholds, and update them in [worker](./huixiangdou/service/worker.py)
- Adjust the [number of search results](./huixiangdou/service/worker.py) to the maximum length the model supports
- Refer to [data.json](../tests/data.json) to add real data, run [test_intention_prompt.py](../tests/test_intention_prompt.py) to obtain suitable prompts and thresholds, and update them in [prompt.py](../huixiangdou/service/prompt.py)
- Adjust the [number of search results](../huixiangdou/service/serial_pipeline.py) to the maximum length the model supports
- Modify `web_search.domain_partial_order` in config.ini to match your scenario preference, i.e. the partial order of search results
2 changes: 1 addition & 1 deletion huixiangdou/__init__.py
@@ -6,7 +6,7 @@
from .service import FeatureStore # noqa E401
from .service import HybridLLMServer # noqa E401
from .service import WebSearch # noqa E401
from .service import Worker # noqa E401
from .service import SerialPipeline, ParallelPipeline # no E401
from .service import build_reply_text # noqa E401
from .service import llm_serve # noqa E401
from .version import __version__
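A minimal sketch of driving the renamed `ParallelPipeline` export directly. The constructor and `generate` keywords are taken from the `gradio.py` changes later in this PR; the work dir, config path and question are illustrative assumptions:

```python
import asyncio

from huixiangdou.primitive import Query
from huixiangdou.service import ParallelPipeline

async def main():
    # paths are illustrative; match them to your own workdir / config.ini
    assistant = ParallelPipeline(work_dir='workdir', config_path='config.ini')
    query = Query('how to install mmpose ?', None)  # text, optional image path
    # generate() is an async generator; sess.delta carries the streamed reply
    async for sess in assistant.generate(query=query, history=[],
                                         language='en', enable_web_search=False):
        print(sess.delta, end='', flush=True)

asyncio.run(main())
```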
2 changes: 1 addition & 1 deletion huixiangdou/frontend/wechat.py
@@ -845,7 +845,7 @@ def loop(self, worker):

def parse_args():
"""Parse args."""
parser = argparse.ArgumentParser(description='Worker.')
parser = argparse.ArgumentParser(description='wechat server.')
parser.add_argument('--work_dir',
type=str,
default='workdir',
150 changes: 121 additions & 29 deletions huixiangdou/gradio.py
@@ -4,19 +4,19 @@
import time
import pdb
from multiprocessing import Process, Value

import asyncio
import cv2
import gradio as gr
import pytoml
from loguru import logger

from typing import List
from huixiangdou.primitive import Query
from huixiangdou.service import ErrorCode, Worker, llm_serve, start_llm_server

from huixiangdou.service import ErrorCode, SerialPipeline, ParallelPipeline, llm_serve, start_llm_server
import json

def parse_args():
"""Parse args."""
parser = argparse.ArgumentParser(description='Worker.')
parser = argparse.ArgumentParser(description='SerialPipeline.')
parser.add_argument('--work_dir',
type=str,
default='workdir',
@@ -25,7 +25,7 @@ def parse_args():
'--config_path',
default='config.ini',
type=str,
help='Worker configuration path. Default value is config.ini')
help='SerialPipeline configuration path. Default value is config.ini')
parser.add_argument('--standalone',
action='store_true',
default=True,
@@ -37,50 +37,142 @@ def parse_args():
args = parser.parse_args()
return args

def predict(text, image):
language='en'
enable_web_search=False
pipeline='chat_with_repo'
main_args = None
paralle_assistant = None
serial_assistant = None

def on_language_changed(value:str):
global language
print(value)
language = value

def on_pipeline_changed(value:str):
global pipeline
print(value)
pipeline = value

def on_web_search_changed(value: str):
global enable_web_search
print(value)
if 'no' in value:
enable_web_search = False
else:
enable_web_search = True


def format_refs(refs: List[str]):
refs_filter = list(set(refs))
if len(refs) < 1:
return ''
text = ''
if language == 'zh':
text += '参考资料:\r\n'
else:
text += '**References:**\r\n'

for file_or_url in refs_filter:
text += '* {}\r\n'.format(file_or_url)
text += '\r\n'
return text


async def predict(text:str, image:str):
global language
global enable_web_search
global pipeline
global main_args
global serial_assistant
global paralle_assistant

with open('query.txt', 'a') as f:
f.write(json.dumps({'data': text}))
f.write('\n')

if image is not None:
filename = 'image.png'
image_path = os.path.join(args.work_dir, filename)
cv2.imwrite(image_path, image)
else:
image_path = None

assistant = Worker(work_dir=args.work_dir, config_path=args.config_path)
query = Query(text, image_path)
if 'chat_in_group' in pipeline:
if serial_assistant is None:
serial_assistant = SerialPipeline(work_dir=main_args.work_dir, config_path=main_args.config_path)
args = {'query':query, 'history': [], 'groupname':''}
pipeline = {'status': {}}
debug = dict()
stream_chat_content = ''
for sess in serial_assistant.generate(**args):
if len(sess.delta) > 0:
# start chat, display
stream_chat_content += sess.delta
yield stream_chat_content
else:
status = {
"state":str(sess.code),
"response": sess.response,
"refs": sess.references
}
pipeline['status'] = status
pipeline['debug'] = sess.debug

json_str = json.dumps(pipeline, indent=2, ensure_ascii=False)
yield json_str

pipeline = {'step': []}
debug = dict()
for sess in assistant.generate(query=query, history=[], groupname=''):
status = {
"state":str(sess.code),
"response": sess.response,
"refs": sess.references
}
else:
if paralle_assistant is None:
paralle_assistant = ParallelPipeline(work_dir=main_args.work_dir, config_path=main_args.config_path)
args = {'query':query, 'history':[], 'language':language}
args['enable_web_search'] = enable_web_search

print(status)
pipeline['step'].append(status)
pipeline['debug'] = sess.debug
sentence = ''
async for sess in paralle_assistant.generate(**args):
if sentence == '' and len(sess.references) > 0:
sentence = format_refs(sess.references)

json_str = json.dumps(pipeline, indent=2, ensure_ascii=False)
yield json_str
if len(sess.delta) > 0:
sentence += sess.delta
yield sentence

yield sentence

if __name__ == '__main__':
args = parse_args()
main_args = parse_args()

# start service
if args.standalone is True:
if main_args.standalone is True:
# hybrid llm serve
start_llm_server(config_path=args.config_path)
start_llm_server(config_path=main_args.config_path)

with gr.Blocks() as demo:
with gr.Blocks(theme=gr.themes.Soft(), title='HuixiangDou AI assistant', analytics_enabled=True) as demo:
with gr.Row():
gr.Markdown("""
#### [HuixiangDou](https://github.com/internlm/huixiangdou) AI assistant
""", label='Reply', header_links=True, line_breaks=True,)
with gr.Row():
input_question = gr.TextArea(label='Input the question.')
input_image = gr.Image(label='Upload Image.')
with gr.Column():
ui_pipeline = gr.Radio(["chat_with_repo", "chat_in_group"], label="Pipeline type", info="Group-chat is slow but accurate and safe, default value is `chat_with_repo`")
ui_pipeline.change(fn=on_pipeline_changed, inputs=ui_pipeline, outputs=[])
with gr.Column():
ui_language = gr.Radio(["en", "zh"], label="Language", info="Use `en` by default ")
ui_language.change(fn=on_language_changed, inputs=ui_language, outputs=[])
with gr.Column():
ui_web_search = gr.Radio(["no", "yes"], label="Enable web search", info="Disable by default ")
ui_web_search.change(on_web_search_changed, inputs=ui_web_search, outputs=[])

with gr.Row():
input_question = gr.TextArea(label='Input your question', placeholder='how to install mmpose ?', show_copy_button=True, lines=9)
input_image = gr.Image(label='[Optional] Image-text retrieval needs `config-multimodal.ini`')
with gr.Row():
run_button = gr.Button()
with gr.Row():
result = gr.TextArea(label='HuixiangDou pipline status', show_copy_button=True)
result = gr.Markdown('>Text reply or inner status callback here, depends on `pipeline type`', label='Reply', show_label=True, header_links=True, line_breaks=True, show_copy_button=True)
# result = gr.TextArea(label='Reply', show_copy_button=True, placeholder='Text Reply or inner status callback, depends on `pipeline type`')

run_button.click(predict, [input_question, input_image], [result])

demo.queue()
demo.launch(share=False, server_name='0.0.0.0', debug=True)
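For comparison with the async `chat_with_repo` branch above, the `chat_in_group` branch drives `SerialPipeline` as a plain synchronous generator. A minimal sketch distilled from the `predict` function above; paths and the question are illustrative:

```python
from huixiangdou.primitive import Query
from huixiangdou.service import SerialPipeline

# illustrative paths; match them to your own workdir / config.ini
assistant = SerialPipeline(work_dir='workdir', config_path='config.ini')
query = Query('how to install mmpose ?', None)  # text, optional image path

for sess in assistant.generate(query=query, history=[], groupname=''):
    if len(sess.delta) > 0:
        print(sess.delta, end='', flush=True)  # streamed reply text
    else:
        # intermediate sessions expose code / response / references,
        # the same fields predict() serializes to JSON above
        print(str(sess.code), sess.references)
```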
11 changes: 4 additions & 7 deletions huixiangdou/main.py
@@ -11,12 +11,12 @@
from loguru import logger
from termcolor import colored

from .service import ErrorCode, Worker, build_reply_text, start_llm_server
from .service import ErrorCode, SerialPipeline, build_reply_text, start_llm_server


def parse_args():
"""Parse args."""
parser = argparse.ArgumentParser(description='Worker.')
parser = argparse.ArgumentParser(description='SerialPipeline.')
parser.add_argument('--work_dir',
type=str,
default='workdir',
@@ -25,7 +25,7 @@ def parse_args():
'--config_path',
default='config.ini',
type=str,
help='Worker configuration path. Default value is config.ini')
help='SerialPipeline configuration path. Default value is config.ini')
parser.add_argument('--standalone',
action='store_true',
default=False,
@@ -191,7 +191,7 @@ def run():
with open(args.config_path, encoding='utf8') as f:
fe_config = pytoml.load(f)['frontend']
logger.info('Config loaded.')
assistant = Worker(work_dir=args.work_dir, config_path=args.config_path)
assistant = SerialPipeline(work_dir=args.work_dir, config_path=args.config_path)

fe_type = fe_config['type']
if fe_type == 'none':
@@ -209,8 +209,5 @@
f'unsupported fe_config.type {fe_type}, please read `config.ini` description.' # noqa E501
)

# server_process.join()


if __name__ == '__main__':
run()
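After the rename, the CLI entry point keeps the same flags; for instance, spelling out the defaults shown in `parse_args` above:

```bash
# --standalone also launches the hybrid LLM server in the same process
python3 -m huixiangdou.main --standalone --work_dir workdir --config_path config.ini
```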