diff --git a/README.md b/README.md
index 73ab5fa7..0edef52c 100644
--- a/README.md
+++ b/README.md
@@ -3,8 +3,8 @@
[简体中文](README_zh.md) | English
-[![GitHub license](https://img.shields.io/badge/license-BSD--3--Clause-brightgreen.svg)](./LICENSE)
-![CI](https://img.shields.io/github/actions/workflow/status/internml/huixiangdou/lint.yml?branch=master)
+[![GitHub license](https://img.shields.io/badge/license-BSD--3--Clause-brightgreen.svg?style=plastic)](./LICENSE)
+![CI](https://img.shields.io/github/actions/workflow/status/internlm/huixiangdou/lint.yml?branch=master&style=plastic)
@@ -17,13 +17,13 @@ View [HuixiangDou inside](./huixiangdou-inside.md).
# 📦 Hardware Requirements
-The following are the hardware requirements for running Fennel. It is suggested to follow the deployment process, starting with the basic version and gradually experiencing advanced features.
+The following are the hardware requirements for running HuixiangDou. It is suggested to follow this document, starting with the basic version and gradually working up to the advanced features.
-| Version | Hardware Requirements | Remarks |
-| :-: | :-: | :-: |
-| Basic Version | 20GB GPU memory, such as 3090 or above | Able to answer basic domain knowledge questions, zero cost operation |
-| Advanced Version | 40GB GPU memory, such as A100 | Able to answer source code level questions, zero cost operation |
-| Modified Version | 4GB graphics memory, such as 3050/2080ti | Using openai API to replace local LLM, basic development capability required, operation involves cost |
+| Version | GPU Memory Requirements | Remarks | Tested on |
+| :-: | :-: | :-: | :-: |
+| Basic Version | 20GB | Answers basic domain-knowledge questions; zero-cost operation | ![](https://img.shields.io/badge/linux%203090%2024G-passed-blue?style=for-the-badge) |
+| Advanced Version | 40GB | Answers source-code-level questions; zero-cost operation | ![](https://img.shields.io/badge/linux%20A100%2080G-passed-blue?style=for-the-badge) |
+| Modified Version | 4GB | Replaces the local LLM with the OpenAI API; requires basic development skills; running incurs API costs | ![](https://img.shields.io/badge/linux%201660ti%206GB-passed-blue?style=for-the-badge) |
# 🔥 Run
@@ -34,15 +34,17 @@ We will take lmdeploy & mmpose as examples to explain how to deploy the knowledg
```shell
# Download chat topics
mkdir repodir
-git clone https://github.com/openmmlab/mmpose --depth=1 repodir/mmpose
+git clone https://github.com/open-mmlab/mmpose --depth=1 repodir/mmpose
git clone https://github.com/internlm/lmdeploy --depth=1 repodir/lmdeploy
# Establish feature repository
cd HuixiangDou && mkdir workdir # Create working directory
python3 -m pip install -r requirements.txt # Install dependencies, python3.11 requires `conda install conda-forge::faiss-gpu`
-python3 service/feature_store.py repodir workdir # Save features from repodir to workdir
+python3 service/feature_store.py # Save features from repodir to workdir
```
+The first run will automatically download [text2vec-large-chinese](https://huggingface.co/GanymedeNil/text2vec-large-chinese). If the automatic download fails, you can download it manually and set the model path in `config.ini`.
+
After running, HuixiangDou can distinguish which user topics should be dealt with and which chitchats should be rejected. Please edit [good_questions](./resource/good_questions.json) and [bad_questions](./resource/bad_questions.json), and try your own domain knowledge (medical, finance, electricity, etc.).
```shell
@@ -72,7 +74,7 @@ x_api_key = "${YOUR-X-API-KEY}"
Please ensure that the GPU memory is over 20GB (such as 3090 or above). If the memory is low, please modify it according to the FAQ.
-The first run will automatically download the configuration of internlm2-7B and text2vec-large-chinese, please ensure network connectivity.
+The first run will automatically download the internlm2-7B model specified in the config; please ensure network connectivity.
* **Non-docker users**. If you **don't** use docker environment, you can start all services at once.
```shell
diff --git a/README_zh.md b/README_zh.md
index 9afe6743..4619bc45 100644
--- a/README_zh.md
+++ b/README_zh.md
@@ -3,8 +3,8 @@
简体中文 | [English](README.md)
-[![GitHub license](https://img.shields.io/badge/license-BSD--3--Clause-brightgreen.svg)](./LICENSE)
-![CI](https://img.shields.io/github/actions/workflow/status/internml/huixiangdou/lint.yml?branch=master)
+[![GitHub license](https://img.shields.io/badge/license-BSD--3--Clause-brightgreen.svg?style=plastic)](./LICENSE)
+![CI](https://img.shields.io/github/actions/workflow/status/internlm/huixiangdou/lint.yml?branch=master&style=plastic)
“茴香豆”是一个基于 LLM 的领域特定知识助手。特点:
@@ -18,11 +18,11 @@
以下是运行茴香豆的硬件需求。建议遵循部署流程,从基础版开始,逐渐体验高级特性。
-| 版本 | 硬件需求 | 备注 |
-| :-: | :-: | :-: |
-| 基础版 | 20G GPU 显存,如 3090 及以上 | 能够回答领域知识的基础问题,零成本运行 |
-| 高级版 | 40G 显存,如 A100 | 能够回答源码级问题,零成本运行 |
-| 魔改版 | 4G 显存,如 3050/2080ti | 用 openai API 替代本地 LLM,需要基础开发能力,运行需要费用 |
+| 版本 | 硬件需求 | 备注 | 已验证设备 |
+| :-: | :-: | :-: | :-: |
+| 基础版 | 20GB | 能够回答领域知识的基础问题,零成本运行 | ![](https://img.shields.io/badge/linux%203090%2024G-passed-blue?style=for-the-badge) |
+| 高级版 | 40GB | 能够回答源码级问题,零成本运行 | ![](https://img.shields.io/badge/linux%20A100%2080G-passed-blue?style=for-the-badge) |
+| 魔改版 | 4GB | 用 OpenAI API 替代本地 LLM,需要基础开发能力,运行需要费用 | ![](https://img.shields.io/badge/linux%201660ti%206GB-passed-blue?style=for-the-badge) |
# 🔥 运行
@@ -32,15 +32,17 @@
```shell
# 下载聊天话题
mkdir repodir
-git clone https://github.com/openmmlab/mmpose --depth=1 repodir/mmpose
+git clone https://github.com/open-mmlab/mmpose --depth=1 repodir/mmpose
git clone https://github.com/internlm/lmdeploy --depth=1 repodir/lmdeploy
# 建立特征库
cd HuixiangDou && mkdir workdir # 创建工作目录
python3 -m pip install -r requirements.txt # 安装依赖,python3.11 需要 `conda install conda-forge::faiss-gpu`
-python3 service/feature_store.py repodir workdir # 把 repodir 的特征保存到 workdir
+python3 service/feature_store.py # 把 repodir 的特征保存到 workdir
```
-运行结束后,茴香豆能够区分应该处理哪些用户话题,哪些闲聊应该拒绝。请编辑 [good_questions](./resource/good_questions.json) 和 [bad_questions](./resource/bad_questions.json),尝试自己的领域知识(医疗,金融,电力等)。
+首次运行将自动下载配置中的 [text2vec-large-chinese](https://huggingface.co/GanymedeNil/text2vec-large-chinese),如果自动下载失败,可以手动下载到本地,然后在 `config.ini` 设置模型路径。
+
+结束后,茴香豆能够区分应该处理哪些用户话题,哪些闲聊应该拒绝。请编辑 [good_questions](./resource/good_questions.json) 和 [bad_questions](./resource/bad_questions.json),尝试自己的领域知识(医疗,金融,电力等)。
```shell
# 接受技术话题
@@ -69,7 +71,7 @@ x_api_key = "${YOUR-X-API-KEY}"
请保证 GPU 显存超过 20GB(如 3090 及以上),若显存较低请按 FAQ 修改。
-首次运行将自动下载配置中的 internlm2-7B 和 text2vec-large-chinese,请保证网络畅通。
+首次运行将自动下载配置中的 internlm2-7B,请保证网络畅通。
* **非 docker 用户**。如果你**不**使用 docker 环境,可以一次启动所有服务。
```shell
diff --git a/service/feature_store.py b/service/feature_store.py
index ca1c3a4a..449c35ed 100644
--- a/service/feature_store.py
+++ b/service/feature_store.py
@@ -338,6 +338,8 @@ def preprocess(self, repo_dir: str, work_dir: str):
if file.endswith('.md') and 'mdb' not in file:
mds.append(os.path.join(root, file))
+        if len(mds) < 1:
+            raise Exception(f'cannot find any markdown file under {repo_dir}, please check the --repo_dir argument')
# copy each file to ./finetune-data/ with new name
for _file in mds:
tmp = _file.replace("/", "_")
@@ -398,10 +400,11 @@ def initialize(self,
def parse_args():
parser = argparse.ArgumentParser(
description='Feature store for processing directories.')
- parser.add_argument('work_dir', type=str, help='Working directory.')
+ parser.add_argument('--work_dir', type=str, default='workdir', help='Working directory.')
parser.add_argument(
- 'repo_dir',
+ '--repo_dir',
type=str,
+ default='repodir',
help='Root directory where the repositories are located.')
parser.add_argument(
'--good_questions',
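
The argparse change above (positional arguments replaced by optional flags with defaults) can be checked in isolation. Below is a minimal sketch reproducing only the two renamed flags, not the full `parse_args()`:

```python
import argparse

# Sketch of the new parse_args() defaults: both directories are now optional
# flags instead of positional arguments.
parser = argparse.ArgumentParser(
    description='Feature store for processing directories.')
parser.add_argument('--work_dir', type=str, default='workdir',
                    help='Working directory.')
parser.add_argument('--repo_dir', type=str, default='repodir',
                    help='Root directory where the repositories are located.')

# An empty argv resolves to the defaults, which is why the bare
# `python3 service/feature_store.py` invocation in the README now works.
args = parser.parse_args([])
print(args.work_dir, args.repo_dir)
```

This is why the README commands drop the `repodir workdir` positional arguments: omitting the flags falls back to `repodir` and `workdir` in the current directory.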