Make WebUI and API code cleaner (+ 1.5 fixes) (#703)
* rename webui.py to run_webui.py

* remove unused imports

* remove unused code

* move inference code and fix all warnings

* move web app code

* make code easier to read

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove unused function

* remove msgpack_api.py

* rename API files

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* finish updating the doc with the new file names

* finish updating the doc with the new file names

* fix CPU use in the API

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* refactor WebUI inference into a class with submodules

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* re-enable streaming in webui inference code

* generalize inference code in webui

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* minor fix

* make a single, unified inference engine class

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* minor fix

* clean up code

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* implement the new API structure (not yet working)

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* refactor API

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* minor fixes

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* reimplement chat endpoint

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Picus303 and pre-commit-ci[bot] authored Dec 7, 2024
1 parent 954cae1 commit 62eae26
Showing 45 changed files with 1,959 additions and 1,697 deletions.
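
The diffs below apply three renames consistently across the documentation and launch scripts. As a quick reference assembled from the changes (arguments and flags are otherwise unchanged):

```bash
# Entry-point renames in this commit; shown as comments for reference only.
# python tools/webui.py        ->  python tools/run_webui.py     (WebUI launcher)
# python -m tools.api          ->  python -m tools.api_server    (HTTP API server)
# python -m tools.post_api     ->  python -m tools.api_client    (API client)
```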
2 changes: 1 addition & 1 deletion .github/ISSUE_TEMPLATE/bug_report.yml
@@ -45,7 +45,7 @@ body:
description: |
Include detailed steps, screenshots, and logs. Use the correct markdown syntax for code blocks.
placeholder: |
1. Run the command `python -m tools.post_api -t "xxxxx"`
1. Run the command `python -m tools.api_client -t "xxxxx"`
2. Observe the console output error: `ModuleNotFoundError: No module named 'pyaudio'` (with screenshots or logs will be better)
validations:
required: true
2 changes: 1 addition & 1 deletion docs/en/index.md
@@ -185,7 +185,7 @@ pip install -e .[stable]
4. Configure environment variables and access WebUI

In the terminal inside the docker container, enter `export GRADIO_SERVER_NAME="0.0.0.0"` to allow external access to the gradio service inside docker.
Then in the terminal inside the docker container, enter `python tools/webui.py` to start the WebUI service.
Then in the terminal inside the docker container, enter `python tools/run_webui.py` to start the WebUI service.

If you're using WSL or MacOS, visit [http://localhost:7860](http://localhost:7860) to open the WebUI interface.
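
Taken together, the two steps from this hunk, run inside the Docker container's terminal, are:

```bash
# Inside the docker container (as described in docs/en/index.md above)
export GRADIO_SERVER_NAME="0.0.0.0"   # allow external access to the gradio service
python tools/run_webui.py             # start the WebUI with the renamed script
```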
10 changes: 5 additions & 5 deletions docs/en/inference.md
@@ -67,7 +67,7 @@ python tools/vqgan/inference.py \
We provide a HTTP API for inference. You can use the following command to start the server:

```bash
python -m tools.api \
python -m tools.api_server \
--listen 0.0.0.0:8080 \
--llama-checkpoint-path "checkpoints/fish-speech-1.5" \
--decoder-checkpoint-path "checkpoints/fish-speech-1.5/firefly-gan-vq-fsq-8x1024-21hz-generator.pth" \
@@ -78,10 +78,10 @@ python -m tools.api \
After that, you can view and test the API at http://127.0.0.1:8080/.

Below is an example of sending a request using `tools/post_api.py`.
Below is an example of sending a request using `tools/api_client.py`.

```bash
python -m tools.post_api \
python -m tools.api_client \
--text "Text to be input" \
--reference_audio "Path to reference audio" \
--reference_text "Text content of the reference audio" \
@@ -93,7 +93,7 @@ The above command indicates synthesizing the desired audio according to the refe
The following example demonstrates that you can use **multiple** reference audio paths and reference audio texts at once. Separate them with spaces in the command.

```bash
python -m tools.post_api \
python -m tools.api_client \
--text "Text to input" \
--reference_audio "reference audio path1" "reference audio path2" \
--reference_text "reference audio text1" "reference audio text2"\
@@ -109,7 +109,7 @@ The currently supported reference audio has a maximum total duration of 90 secon


!!! info
To learn more about available parameters, you can use the command `python -m tools.post_api -h`
To learn more about available parameters, you can use the command `python -m tools.api_client -h`

## GUI Inference
[Download client](https://github.com/AnyaCoder/fish-speech-gui/releases)
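
Assembling the renamed commands from this file, a minimal end-to-end sketch looks as follows; the checkpoint paths are the ones shown in the hunks above, the text and reference values are placeholders, and any flags truncated by the diff are listed by `python -m tools.api_client -h`:

```bash
# Start the HTTP API server (renamed module: tools.api_server)
python -m tools.api_server \
    --listen 0.0.0.0:8080 \
    --llama-checkpoint-path "checkpoints/fish-speech-1.5" \
    --decoder-checkpoint-path "checkpoints/fish-speech-1.5/firefly-gan-vq-fsq-8x1024-21hz-generator.pth"

# Send a request with the renamed client (placeholder text and paths)
python -m tools.api_client \
    --text "Text to be input" \
    --reference_audio "Path to reference audio" \
    --reference_text "Text content of the reference audio"
```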
2 changes: 1 addition & 1 deletion docs/en/start_agent.md
@@ -44,7 +44,7 @@ pip install -e .[stable]
To build fish-agent, please use the command below under the main folder:

```bash
python -m tools.api --llama-checkpoint-path checkpoints/fish-agent-v0.1-3b/ --mode agent --compile
python -m tools.api_server --llama-checkpoint-path checkpoints/fish-agent-v0.1-3b/ --mode agent --compile
```

The `--compile` args only support Python < 3.12 , which will greatly speed up the token generation.
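
Because the doc notes that `--compile` only works on Python < 3.12, one way to guard the flag at launch time is sketched below; the guard itself is illustrative and not part of this commit, while the server command is the one shown in the hunk above:

```bash
# Pass --compile only when the interpreter is older than 3.12 (illustrative guard)
COMPILE=""
if python -c 'import sys; sys.exit(0 if sys.version_info < (3, 12) else 1)'; then
    COMPILE="--compile"
fi
python -m tools.api_server \
    --llama-checkpoint-path checkpoints/fish-agent-v0.1-3b/ \
    --mode agent ${COMPILE}
```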
2 changes: 1 addition & 1 deletion docs/ja/index.md
@@ -184,7 +184,7 @@ pip install -e .[stable]
4. 環境変数の設定と WebUI へのアクセス

Docker コンテナ内のターミナルで、`export GRADIO_SERVER_NAME="0.0.0.0"` と入力して、外部から Docker 内の gradio サービスにアクセスできるようにします。
次に、Docker コンテナ内のターミナルで `python tools/webui.py` と入力して WebUI サービスを起動します。
次に、Docker コンテナ内のターミナルで `python tools/run_webui.py` と入力して WebUI サービスを起動します。

WSL または MacOS の場合は、[http://localhost:7860](http://localhost:7860) にアクセスして WebUI インターフェースを開くことができます。

8 changes: 4 additions & 4 deletions docs/ja/inference.md
@@ -67,7 +67,7 @@ python tools/vqgan/inference.py \
推論のための HTTP API を提供しています。次のコマンドを使用してサーバーを起動できます:

```bash
python -m tools.api \
python -m tools.api_server \
--listen 0.0.0.0:8080 \
--llama-checkpoint-path "checkpoints/fish-speech-1.5" \
--decoder-checkpoint-path "checkpoints/fish-speech-1.5/firefly-gan-vq-fsq-8x1024-21hz-generator.pth" \
@@ -78,10 +78,10 @@ python -m tools.api \
その後、`http://127.0.0.1:8080/`で API を表示およびテストできます。

以下は、`tools/post_api.py` を使用してリクエストを送信する例です。
以下は、`tools/api_client.py` を使用してリクエストを送信する例です。

```bash
python -m tools.post_api \
python -m tools.api_client \
--text "入力するテキスト" \
--reference_audio "参照音声へのパス" \
--reference_text "参照音声テキスト" \
@@ -91,7 +91,7 @@ python -m tools.post_api \
上記のコマンドは、参照音声の情報に基づいて必要な音声を合成し、ストリーミング方式で返すことを示しています。

!!! info
使用可能なパラメータの詳細については、コマンド` python -m tools.post_api -h `を使用してください
使用可能なパラメータの詳細については、コマンド` python -m tools.api_client -h `を使用してください

## WebUI 推論

2 changes: 1 addition & 1 deletion docs/ja/start_agent.md
@@ -47,7 +47,7 @@ pip install -e .[stable]
fish-agentを構築するには、メインフォルダで以下のコマンドを使用してください:

```bash
python -m tools.api --llama-checkpoint-path checkpoints/fish-agent-v0.1-3b/ --mode agent --compile
python -m tools.api_server --llama-checkpoint-path checkpoints/fish-agent-v0.1-3b/ --mode agent --compile
```

`--compile`引数はPython < 3.12でのみサポートされており、トークン生成を大幅に高速化します。
2 changes: 1 addition & 1 deletion docs/ko/index.md
@@ -185,7 +185,7 @@ pip install -e .[stable]
4. 환경 변수 설정 및 WebUI 접근

Docker 컨테이너 내부의 터미널에서 `export GRADIO_SERVER_NAME="0.0.0.0"`를 입력하여 Docker 내부에서 Gradio 서비스에 외부 접근을 허용합니다.
이후, 터미널에서 `python tools/webui.py` 명령어를 입력하여 WebUI 서비스를 시작합니다.
이후, 터미널에서 `python tools/run_webui.py` 명령어를 입력하여 WebUI 서비스를 시작합니다.

WSL 또는 macOS를 사용하는 경우 [http://localhost:7860](http://localhost:7860)에서 WebUI 인터페이스를 열 수 있습니다.

10 changes: 5 additions & 5 deletions docs/ko/inference.md
@@ -67,7 +67,7 @@ python tools/vqgan/inference.py \
추론을 위한 HTTP API를 제공하고 있습니다. 아래의 명령어로 서버를 시작할 수 있습니다:

```bash
python -m tools.api \
python -m tools.api_server \
--listen 0.0.0.0:8080 \
--llama-checkpoint-path "checkpoints/fish-speech-1.5" \
--decoder-checkpoint-path "checkpoints/fish-speech-1.5/firefly-gan-vq-fsq-8x1024-21hz-generator.pth" \
@@ -78,10 +78,10 @@ python -m tools.api \

이후, http://127.0.0.1:8080/ 에서 API를 확인하고 테스트할 수 있습니다.

아래는 `tools/post_api.py`를 사용하여 요청을 보내는 예시입니다.
아래는 `tools/api_client.py`를 사용하여 요청을 보내는 예시입니다.

```bash
python -m tools.post_api \
python -m tools.api_client \
--text "입력할 텍스트" \
--reference_audio "참고 음성 경로" \
--reference_text "참고 음성의 텍스트 내용" \
@@ -93,7 +93,7 @@ python -m tools.post_api \
다음 예시는 여러 개의 참고 음성 경로와 텍스트를 한꺼번에 사용할 수 있음을 보여줍니다. 명령에서 공백으로 구분하여 입력합니다.

```bash
python -m tools.post_api \
python -m tools.api_client \
--text "입력할 텍스트" \
--reference_audio "참고 음성 경로1" "참고 음성 경로2" \
--reference_text "참고 음성 텍스트1" "참고 음성 텍스트2"\
@@ -107,7 +107,7 @@ python -m tools.post_api \
`--reference_audio``--reference_text` 대신에 `--reference_id`(하나만 사용 가능)를 사용할 수 있습니다. 프로젝트 루트 디렉토리에 `references/<your reference_id>` 폴더를 만들어 해당 음성과 주석 텍스트를 넣어야 합니다. 참고 음성은 최대 90초까지 지원됩니다.

!!! info
제공되는 파라미터는 `python -m tools.post_api -h`를 사용하여 확인할 수 있습니다.
제공되는 파라미터는 `python -m tools.api_client -h`를 사용하여 확인할 수 있습니다.

## GUI 추론
[클라이언트 다운로드](https://github.com/AnyaCoder/fish-speech-gui/releases)
2 changes: 1 addition & 1 deletion docs/ko/start_agent.md
@@ -47,7 +47,7 @@ pip install -e .[stable]
fish-agent를 구축하려면 메인 폴더에서 아래 명령어를 사용하세요:

```bash
python -m tools.api --llama-checkpoint-path checkpoints/fish-agent-v0.1-3b/ --mode agent --compile
python -m tools.api_server --llama-checkpoint-path checkpoints/fish-agent-v0.1-3b/ --mode agent --compile
```

`--compile` 인자는 Python < 3.12에서만 지원되며, 토큰 생성 속도를 크게 향상시킵니다.
2 changes: 1 addition & 1 deletion docs/pt/index.md
@@ -181,7 +181,7 @@ pip install -e .[stable]
4. Configure as variáveis de ambiente e acesse a WebUI

No terminal do contêiner Docker, digite `export GRADIO_SERVER_NAME="0.0.0.0"` para permitir o acesso externo ao serviço gradio dentro do Docker.
Em seguida, no terminal do contêiner Docker, digite `python tools/webui.py` para iniciar o serviço WebUI.
Em seguida, no terminal do contêiner Docker, digite `python tools/run_webui.py` para iniciar o serviço WebUI.

Se estiver usando WSL ou MacOS, acesse [http://localhost:7860](http://localhost:7860) para abrir a interface WebUI.

8 changes: 4 additions & 4 deletions docs/pt/inference.md
@@ -67,7 +67,7 @@ python tools/vqgan/inference.py \
Fornecemos uma API HTTP para inferência. O seguinte comando pode ser usado para iniciar o servidor:

```bash
python -m tools.api \
python -m tools.api_server \
--listen 0.0.0.0:8080 \
--llama-checkpoint-path "checkpoints/fish-speech-1.5" \
--decoder-checkpoint-path "checkpoints/fish-speech-1.4/firefly-gan-vq-fsq-8x1024-21hz-generator.pth" \
@@ -78,10 +78,10 @@ python -m tools.api \
Depois disso, é possível visualizar e testar a API em http://127.0.0.1:8080/.

Abaixo está um exemplo de envio de uma solicitação usando `tools/post_api.py`.
Abaixo está um exemplo de envio de uma solicitação usando `tools/api_client.py`.

```bash
python -m tools.post_api \
python -m tools.api_client \
--text "Texto a ser inserido" \
--reference_audio "Caminho para o áudio de referência" \
--reference_text "Conteúdo de texto do áudio de referência" \
@@ -91,7 +91,7 @@ python -m tools.post_api \
O comando acima indica a síntese do áudio desejada de acordo com as informações do áudio de referência e a retorna em modo de streaming.

!!! info
Para aprender mais sobre parâmetros disponíveis, você pode usar o comando `python -m tools.post_api -h`
Para aprender mais sobre parâmetros disponíveis, você pode usar o comando `python -m tools.api_client -h`

## Inferência por WebUI

2 changes: 1 addition & 1 deletion docs/pt/start_agent.md
@@ -47,7 +47,7 @@ pip install -e .[stable]
Para construir o fish-agent, use o comando abaixo na pasta principal:

```bash
python -m tools.api --llama-checkpoint-path checkpoints/fish-agent-v0.1-3b/ --mode agent --compile
python -m tools.api_server --llama-checkpoint-path checkpoints/fish-agent-v0.1-3b/ --mode agent --compile
```

O argumento `--compile` só suporta Python < 3.12, o que aumentará muito a velocidade de geração de tokens.
2 changes: 1 addition & 1 deletion docs/zh/index.md
@@ -188,7 +188,7 @@ pip install -e .[stable]
4. 配置环境变量,访问 WebUI

在 docker 容器内的终端,输入 `export GRADIO_SERVER_NAME="0.0.0.0"` ,从而让外部可以访问 docker 内的 gradio 服务。
接着在 docker 容器内的终端,输入 `python tools/webui.py` 即可开启 WebUI 服务。
接着在 docker 容器内的终端,输入 `python tools/run_webui.py` 即可开启 WebUI 服务。

如果是 WSL 或者是 MacOS ,访问 [http://localhost:7860](http://localhost:7860) 即可打开 WebUI 界面。

10 changes: 5 additions & 5 deletions docs/zh/inference.md
@@ -73,7 +73,7 @@ python tools/vqgan/inference.py \
运行以下命令来启动 HTTP 服务:

```bash
python -m tools.api \
python -m tools.api_server \
--listen 0.0.0.0:8080 \
--llama-checkpoint-path "checkpoints/fish-speech-1.5" \
--decoder-checkpoint-path "checkpoints/fish-speech-1.5/firefly-gan-vq-fsq-8x1024-21hz-generator.pth" \
@@ -88,10 +88,10 @@ HF_ENDPOINT=https://hf-mirror.com python -m ...(同上)

随后, 你可以在 `http://127.0.0.1:8080/` 中查看并测试 API.

下面是使用`tools/post_api.py`发送请求的示例。
下面是使用`tools/api_client.py`发送请求的示例。

```bash
python -m tools.post_api \
python -m tools.api_client \
--text "要输入的文本" \
--reference_audio "参考音频路径" \
--reference_text "参考音频的文本内容" \
@@ -102,7 +102,7 @@ python -m tools.post_api \

下面的示例展示了, 可以一次使用**多个** `参考音频路径``参考音频的文本内容`。在命令里用空格隔开即可。
```bash
python -m tools.post_api \
python -m tools.api_client \
--text "要输入的文本" \
--reference_audio "参考音频路径1" "参考音频路径2" \
--reference_text "参考音频的文本内容1" "参考音频的文本内容2"\
@@ -117,7 +117,7 @@ python -m tools.post_api \
里面放上任意对音频与标注文本。 目前支持的参考音频最多加起来总时长90s。

!!! info
要了解有关可用参数的更多信息,可以使用命令`python -m tools.post_api -h`
要了解有关可用参数的更多信息,可以使用命令`python -m tools.api_client -h`

## GUI 推理
[下载客户端](https://github.com/AnyaCoder/fish-speech-gui/releases)
2 changes: 1 addition & 1 deletion docs/zh/start_agent.md
@@ -49,7 +49,7 @@ pip install -e .[stable]
你需要使用以下指令来构建 fish-agent

```bash
python -m tools.api --llama-checkpoint-path checkpoints/fish-agent-v0.1-3b/ --mode agent --compile
python -m tools.api_server --llama-checkpoint-path checkpoints/fish-agent-v0.1-3b/ --mode agent --compile
```

`--compile`只能在小于 3.12 版本的 Python 使用,这个功能可以极大程度上提高生成速度。
2 changes: 1 addition & 1 deletion entrypoint.sh
@@ -7,4 +7,4 @@ if [ "${CUDA_ENABLED}" != "true" ]; then
DEVICE="--device cpu"
fi

exec python tools/webui.py ${DEVICE}
exec python tools/run_webui.py ${DEVICE}
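
For context, this is how the updated entrypoint resolves the device flag when invoked directly (an illustration of the script above, not an addition from the commit):

```bash
# CUDA_ENABLED is the variable tested in entrypoint.sh above.
CUDA_ENABLED=false ./entrypoint.sh   # runs: python tools/run_webui.py --device cpu
CUDA_ENABLED=true  ./entrypoint.sh   # runs: python tools/run_webui.py
```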
2 changes: 1 addition & 1 deletion fish_speech/webui/manage.py
@@ -176,7 +176,7 @@ def change_infer(
p_infer = subprocess.Popen(
[
PYTHON,
"tools/webui.py",
"tools/run_webui.py",
"--decoder-checkpoint-path",
infer_decoder_model,
"--decoder-config-name",
2 changes: 1 addition & 1 deletion inference.ipynb
@@ -83,7 +83,7 @@
},
"outputs": [],
"source": [
"!python tools/webui.py \\\n",
"!python tools/run_webui.py \\\n",
" --llama-checkpoint-path checkpoints/fish-speech-1.4 \\\n",
" --decoder-checkpoint-path checkpoints/fish-speech-1.4/firefly-gan-vq-fsq-8x1024-21hz-generator.pth \\\n",
" # --compile"
2 changes: 1 addition & 1 deletion start.bat
@@ -82,7 +82,7 @@ if not "!flags!"=="" set "flags=!flags:~1!"
echo Debug: flags = !flags!

if "!mode!"=="api" (
%PYTHON_CMD% -m tools.api !flags!
%PYTHON_CMD% -m tools.api_server !flags!
) else if "!mode!"=="infer" (
%PYTHON_CMD% -m tools.webui !flags!
)