feat(docs): how to add readthedocs (#377)

* feat(docs): how to add readthedocs
* docs(README): update
InternLM · Aug 28, 2024 · 109616c
1 parent 008f1fa
Showing 8 changed files with 127 additions and 88 deletions.
diff --git a/README.md b/README.md
@@ -51,7 +51,7 @@ If this helps you, please give it a star ⭐
 
 Our Web version has been released to [OpenXLab](https://openxlab.org.cn/apps/detail/tpoisonooo/huixiangdou-web), where you can create a knowledge base, update positive and negative examples, turn on web search, test chat, and integrate into Feishu/WeChat groups. See [BiliBili](https://www.bilibili.com/video/BV1S2421N7mn) and [YouTube](https://www.youtube.com/watch?v=ylXrT-Tei-Y)!
 
-- \[2024/08\] `chat_with_repo` [pipeline](./huixiangdou/service/parallel_pipeline.py) 👍
+- \[2024/08\] [chat_with_readthedocs](https://huixiangdou.readthedocs.io/en/latest/), see [how to integrate](./docs/zh/doc_add_readthedocs.md) 👍
 - \[2024/07\] Image and text retrieval & Removal of `langchain` 👍
 - \[2024/07\] [Hybrid Knowledge Graph and Dense Retrieval](./docs/en/doc_knowledge_graph.md) improve 1.7% F1 score 🎯
 - \[2024/06\] [Evaluation of chunksize, splitter, and text2vec model](./evaluation) 🎯
@@ -132,8 +132,9 @@ Our Web version has been released to [OpenXLab](https://openxlab.org.cn/apps/det
 - WeChat([android](./docs/zh/doc_add_wechat_accessibility.md)/[wkteam](./docs/zh/doc_add_wechat_commercial.md))
 - Lark
 - [OpenXLab Web](https://openxlab.org.cn/apps/detail/tpoisonooo/huixiangdou-web)
-- [Gradio Demo](./huixiangdou/gradio.py)
+- [Gradio Demo](./huixiangdou/gradio_ui.py)
 - [HTTP Server](./huixiangdou/server.py)
+- [Read the Docs](./docs/zh/doc_add_readthedocs.md)
 
 </td>
 
@@ -227,7 +228,7 @@ python3 -m huixiangdou.main --standalone
 💡 Also run a simple Web UI with `gradio`:
 
 ```bash
-python3 -m huixiangdou.gradio
+python3 -m huixiangdou.gradio_ui
 ```
 
 <video src="https://github.com/user-attachments/assets/9e5dbb30-1dc1-42ad-a7d4-dc7380676554" ></video>
@@ -282,7 +283,7 @@ python3 -m huixiangdou.service.feature_store --config_path config-cpu.ini
 # Q&A test
 python3 -m huixiangdou.main --standalone --config_path config-cpu.ini
 # gradio UI
-python3 -m huixiangdou.gradio --config_path config-cpu.ini
+python3 -m huixiangdou.gradio_ui --config_path config-cpu.ini
 ```
 
 If you find the installation too slow, a pre-installed image is provided in [Docker Hub](https://hub.docker.com/repository/docker/tpoisonooo/huixiangdou/tags). Simply replace it when starting the docker.

diff --git a/README_zh.md b/README_zh.md
@@ -7,12 +7,12 @@
   <a href="resource/figures/wechat.jpg" target="_blank">
     <img alt="Wechat" src="https://img.shields.io/badge/wechat-robot%20inside-brightgreen?logo=wechat&logoColor=white" />
   </a>
-  <!-- <a href="https://huixiangdou.readthedocs.io/zh-cn/latest/" target="_blank">
-    <img alt="Readthedocs" src="https://img.shields.io/badge/readthedocs-chat%20with%20AI-brightgreen?logo=readthedocs&logoColor=white" />
-  </a> -->
   <a href="https://huixiangdou.readthedocs.io/zh-cn/latest/" target="_blank">
-    <img alt="Readthedocs" src="https://img.shields.io/badge/readthedocs-black?logo=readthedocs&logoColor=white" />
+    <img alt="Readthedocs" src="https://img.shields.io/badge/readthedocs-chat%20with%20AI-brightgreen?logo=readthedocs&logoColor=white" />
   </a>
+  <!-- <a href="https://huixiangdou.readthedocs.io/zh-cn/latest/" target="_blank">
+    <img alt="Readthedocs" src="https://img.shields.io/badge/readthedocs-black?logo=readthedocs&logoColor=white" />
+  </a> -->
   <a href="https://youtu.be/ylXrT-Tei-Y" target="_blank">
     <img alt="YouTube" src="https://img.shields.io/badge/YouTube-black?logo=youtube&logoColor=red" />
   </a>
@@ -50,7 +50,7 @@
 
 For a video tutorial of the Web version, see [BiliBili](https://www.bilibili.com/video/BV1S2421N7mn) and [YouTube](https://www.youtube.com/watch?v=ylXrT-Tei-Y).
 
-- \[2024/08\] `chat_with_repo` [pipeline](./huixiangdou/service/parallel_pipeline.py) 
+- \[2024/08\] [chat_with_readthedocs](https://huixiangdou.readthedocs.io/zh-cn/latest/), see [how to integrate](./docs/zh/doc_add_readthedocs.md)
 - \[2024/07\] Image and text retrieval & removal of `langchain` 👍
 - \[2024/07\] [Hybrid knowledge graph and dense retrieval, F1 improved by 1.7%](./docs/zh/doc_knowledge_graph.md) 🎯
 - \[2024/06\] [Evaluation of chunksize, splitter, and text2vec model](./evaluation) 🎯
@@ -131,8 +131,9 @@ Web 版视频教程见 [BiliBili](https://www.bilibili.com/video/BV1S2421N7mn)
 - WeChat([android](./docs/zh/doc_add_wechat_accessibility.md)/[wkteam](./docs/zh/doc_add_wechat_commercial.md))
 - Lark
 - [OpenXLab Web](https://openxlab.org.cn/apps/detail/tpoisonooo/huixiangdou-web)
-- [Gradio Demo](./huixiangdou/gradio.py)
+- [Gradio Demo](./huixiangdou/gradio_ui.py)
 - [HTTP Server](./huixiangdou/server.py)
+- [Read the Docs](./docs/zh/doc_add_readthedocs.md)
 
 </td>
 
@@ -225,9 +226,9 @@ python3 -m huixiangdou.main --standalone
 💡 You can also launch `gradio` to set up a simple Web UI, which binds to port 7860 by default:
 
 ```bash
-python3 -m huixiangdou.gradio 
+python3 -m huixiangdou.gradio_ui
 # If `llm_server_hybrid.py` is already running separately, you can use
-# python3 -m huixiangdou.gradio --no-standalone
+# python3 -m huixiangdou.gradio_ui --no-standalone
 ```
 
 <video src="https://github.com/user-attachments/assets/9e5dbb30-1dc1-42ad-a7d4-dc7380676554" ></video>
@@ -281,7 +282,7 @@ python3 -m huixiangdou.service.feature_store  --config_path config-cpu.ini
 # Q&A test
 python3 -m huixiangdou.main --standalone --config_path config-cpu.ini
 # gradio UI
-python3 -m huixiangdou.gradio --config_path config-cpu.ini
+python3 -m huixiangdou.gradio_ui --config_path config-cpu.ini
 ```
 
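 If installing the dependencies is too slow, [Docker Hub](https://hub.docker.com/repository/docker/tpoisonooo/huixiangdou/tags) provides an image with the dependencies pre-installed. Simply substitute it when starting docker.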

diff --git a/docs/en/index.rst b/docs/en/index.rst
@@ -29,6 +29,13 @@ We warmly welcome users' PRs and Issues!
    doc_architecture.md
    doc_rag_annotate_sft_data.md
 
+.. _readthedocs:
+.. toctree::
+   :maxdepth: 1
+   :caption: readthedocs Integration
+
+   doc_add_readthedocs.md
+
 .. _IMApplicaion:
 .. toctree::
    :maxdepth: 1

diff --git a/docs/zh/doc_add_readthedocs.md b/docs/zh/doc_add_readthedocs.md
@@ -0,0 +1,95 @@
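+# Implementing `chat_with_repo` on readthedocs
+
+This guide shows how to implement `chat_with_repo` on readthedocs at zero cost. For the result, see the [HuixiangDou readthedocs documentation](https://huixiangdou.readthedocs.io).
+
+The deployment diagram is shown below:
+
+<img src="https://github.com/user-attachments/assets/d15935fa-a8fa-49ed-9995-7549ab1f71dc" width="400">
+
+Where:
+* [readthedocs](https://readthedocs.io) hosts the Chinese and English documentation
+* [OpenXLab](https://openxlab.org.cn/apps) provides the https entry point (readthedocs cannot embed http content) and the CPU
+* [SiliconCloud](https://siliconflow.cn/siliconcloud) provides the text2vec, reranker, and LLM model APIs
+
+We use a custom readthedocs theme and add a button to it.
+
+1. Clicking the button creates an `iframe` that loads the https version of HuixiangDou
+2. https requires a domain that has passed review, so we use the random subdomain that OpenXLab provides
+3. GPU resources on OpenXLab are limited, so we use the free model APIs from SiliconCloud
+
+The steps are as follows.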
+
+## 1. Prepare the code and documents
+
+Suppose all mmpose documents serve as the knowledge base. Put the knowledge base into `repodir`:
+
+```bash
+cd HuixiangDou
+mkdir repodir
+git clone --depth=1 https://github.com/open-mmlab/mmpose repodir/mmpose
+# Remove the knowledge base's .git (not the HuixiangDou project's own .git)
+rm -rf repodir/mmpose/.git
+```
+
+Change the default configuration of `gradio_ui.py` to use `config-cpu.ini`:
+```python
+# huixiangdou/gradio_ui.py
+    parser.add_argument(
+        '--config_path',
+        default='config-cpu.ini',
+        type=str,
+..
+```
+
+Commit the knowledge base together with the HuixiangDou project to GitHub, for example the `for-openxlab-readthedocs` branch of [huixiangdou-readthedocs](https://github.com/tpoisonooo/huixiangdou-readthedocs/tree/for-openxlab-readthedocs).
+
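+## 2. Create an OpenXLab application
+
+Open [OpenXLab](https://openxlab.org.cn/apps) and create a `Gradio` application.
+
+1. Fill in the GitHub URL and branch name from the previous step
+2. Select CPU as the server
+
+After confirming, modify the application settings:
+
+* Change `自定义启动文件` (custom startup file) to `huixiangdou/gradio_ui.py`
+* Because the code is open source, the tokens need to be configured as environment variables. HuixiangDou uses the token in the configuration first; if none is found, it checks `SILICONCLOUD_TOKEN` and `LLM_API_TOKEN`, as shown:
+
+    <img src="https://github.com/user-attachments/assets/66291c65-1a5e-495a-aad6-e8962bef6bb6" width="400">
+
+
+Start the application. The first run takes **about 10 minutes** to build the feature store, after which you should see a gradio application. For example:
+
+```bash
+https://openxlab.org.cn/apps/detail/tpoisonooo/HuixiangDou-readthedocs
+```
+
+Press F12 in the browser and inspect the source to find the https address of this service:
+
+```html
+src="https://g-app-center-000704-0786-wrbqzpv.openxlab.space"
+```
+
+As long as the application data is not deleted, this address is **fixed**.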
+
+
+## 3. Use a custom readthedocs theme
+
+Assuming you already know the basics of readthedocs, you can copy the HuixiangDou docs directory directly:
+
+* the zh or en directory
+* requirements/doc.txt, which sets the custom theme
+
+The implementation of our custom theme is [here](https://github.com/tpoisonooo/pytorch_sphinx_theme/).
+Its main changes are:
+
+1. [layout.html](https://github.com/tpoisonooo/pytorch_sphinx_theme/blob/3db120b0f1e064425f37e98368dcea49972702e9/pytorch_sphinx_theme/layout.html#L324) creates a `chatButton` and an empty container
+2. An event is bound to `chatButton`. When the button is clicked, the empty container loads the https address, for example the one from earlier:
+
+    ```bash
+    https://g-app-center-000704-0786-wrbqzpv.openxlab.space
+    ```
+
+    You can adjust the styling to your liking in [theme.css](https://github.com/tpoisonooo/pytorch_sphinx_theme/blob/master/pytorch_sphinx_theme/static/css/theme.css).
+
+Finally, configure your own project on readthedocs.io and click `Build Version`.
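
The guide in `docs/zh/doc_add_readthedocs.md` says that HuixiangDou prefers the token from the configuration and, when none is found, checks the `SILICONCLOUD_TOKEN` and `LLM_API_TOKEN` environment variables. The lookup can be sketched roughly as below. This is an illustration only, not HuixiangDou's actual code: the function name `resolve_token`, its parameters, and the order in which the two variables are checked are assumptions.

```python
import os


def resolve_token(config_token, environ=None,
                  env_names=('SILICONCLOUD_TOKEN', 'LLM_API_TOKEN')):
    """Hypothetical helper: prefer the config token, else fall back to env vars.

    The order of `env_names` is an assumption; the guide only says both
    variables are checked when the configuration has no token.
    """
    environ = os.environ if environ is None else environ
    if config_token:
        return config_token
    for name in env_names:
        value = environ.get(name, '')
        if value:
            return value
    # Nothing configured anywhere: return an empty token.
    return ''


if __name__ == '__main__':
    # An empty config token falls back to whatever the environment provides.
    print(bool(resolve_token('')))
```

Keeping the token out of the committed configuration and supplying it through the OpenXLab application settings means the public repository never contains the secret.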
diff --git a/docs/zh/index.rst b/docs/zh/index.rst
@@ -29,6 +29,13 @@ HuixiangDou 上手路线
    doc_rag_annotate_sft_data.md
    doc_architecture.md
 
+.. _接入readthedocs:
+.. toctree::
+   :maxdepth: 1
+   :caption: readthedocs Integration
+
+   doc_add_readthedocs.md
+
 .. _接入即时通讯软件:
 .. toctree::
    :maxdepth: 1

diff --git a/huixiangdou/gradio.py b/huixiangdou/gradio_ui.py
rename from huixiangdou/gradio.py
rename to huixiangdou/gradio_ui.py
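
The renamed module is the one whose `--config_path` default the new guide asks you to switch to `config-cpu.ini`. A minimal sketch of that change follows, modelled on the parser removed from `tests/test_query_gradio.py` in this commit. It is an assumption that `gradio_ui.py` defines its flags the same way; the `argv` parameter is added here only to make the sketch easy to exercise.

```python
import argparse


def parse_args(argv=None):
    """Sketch of a CLI parser whose config defaults to the CPU configuration."""
    parser = argparse.ArgumentParser(description='Gradio WebUI (sketch).')
    parser.add_argument('--work_dir',
                        type=str,
                        default='workdir',
                        help='Working directory.')
    parser.add_argument('--config_path',
                        type=str,
                        default='config-cpu.ini',  # was 'config.ini' in the removed test parser
                        help='Configuration path. Defaults to config-cpu.ini.')
    return parser.parse_args(argv)


if __name__ == '__main__':
    # With no flags, the CPU configuration is selected.
    print(parse_args([]).config_path)
```

With the default changed, a hosting platform that launches the file without arguments picks up the CPU configuration, while `--config_path` still overrides it locally.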
diff --git a/requirements.txt b/requirements.txt
@@ -19,8 +19,8 @@ redis
 requests
 scikit-learn
 # See https://github.com/deanmalmgren/textract/issues/461
-textract @ git+https://github.com/tpoisonooo/textract@master
-# textract
+# textract @ git+https://github.com/tpoisonooo/textract@master
+textract
 texttable
 tiktoken
 torch>=2.0.0

diff --git a/tests/test_query_gradio.py b/tests/test_query_gradio.py
@@ -1,76 +1,4 @@
-import argparse
-import json
-import os
-import time
-from multiprocessing import Process, Value
-
-import cv2
-import gradio as gr
-import pytoml
 from loguru import logger
 
-from huixiangdou.primitive import Query
-from huixiangdou.service import ErrorCode, SerialPipeline, ParallelPipeline, llm_serve, start_llm_server
-
-def parse_args():
-    """Parse args."""
-    parser = argparse.ArgumentParser(description='SerialPipeline Gradio WebUI.')
-    parser.add_argument('--work_dir',
-                        type=str,
-                        default='workdir',
-                        help='Working directory.')
-    parser.add_argument(
-        '--config_path',
-        default='config.ini',
-        type=str,
-        help='SerialPipeline configuration path. Default value is config.ini')
-    parser.add_argument('--standalone',
-                        action='store_true',
-                        default=True,
-                        help='Auto deploy required Hybrid LLM Service.')
-    args = parser.parse_args()
-    return args
-
-
-def get_reply(text, image):
-    if image is not None:
-        filename = 'image.png'
-        image_path = os.path.join(args.work_dir, filename)
-        cv2.imwrite(image_path, image)
-    else:
-        image_path = None
-
-    assistant = SerialPipeline(work_dir=args.work_dir, config_path=args.config_path)
-    query = Query(text, image_path)
-
-    code, reply, references = assistant.generate(query=query,
-                                                 history=[],
-                                                 groupname='')
-    ret = dict()
-    ret['text'] = str(reply)
-    ret['code'] = int(code)
-    ret['references'] = references
-
-    return json.dumps(ret, indent=2, ensure_ascii=False)
-
-
 if __name__ == '__main__':
-    args = parse_args()
-
-    # start service
-    if args.standalone is True:
-        # hybrid llm serve
-        start_llm_server(config_path=args.config_path)
-
-    with gr.Blocks() as demo:
-        with gr.Row():
-            input_question = gr.Textbox(label='Input the question.')
-            input_image = gr.Image(label='Upload Image.')
-            with gr.Column():
-                result = gr.Textbox(label='Generate response.')
-                run_button = gr.Button()
-        run_button.click(fn=get_reply,
-                         inputs=[input_question, input_image],
-                         outputs=result)
-    logger.warning('This file would move to `huixiangdou.gradio`')
-    demo.launch(share=False, server_name='0.0.0.0', debug=True)
+    logger.warning('This file moved to `huixiangdou.gradio_ui`')