Move inference code to library dir (#801)

* move inference engine
* move file functions
* move schema
* minor fix
* minor fix
* move inference code
* update docs
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* minor fix
* [feature] retain the interface to support old version code
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* Update inference.ipynb to support 1.5
* Remove unused package

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Whale-Dolphin <[email protected]>
Co-authored-by: Whale and Dolphin <[email protected]>
fishaudio · Jan 10, 2025 · 4fc8dbd
1 parent d76b917

Showing 44 changed files with 1,505 additions and 3,016 deletions.
diff --git a/docs/en/inference.md b/docs/en/inference.md
@@ -23,8 +23,11 @@ huggingface-cli download fishaudio/fish-speech-1.5 --local-dir checkpoints/fish-
 !!! note
     If you plan to let the model randomly choose a voice timbre, you can skip this step.
 
+!!! warning "Future Warning"
+    We have kept the interface accessible from the original path (tools/vqgan/inference.py), but this interface may be removed in subsequent releases, so please change your code as soon as possible.
+
 ```bash
-python tools/vqgan/inference.py \
+python fish_speech/models/vqgan/inference.py \
     -i "paimon.wav" \
     --checkpoint-path "checkpoints/fish-speech-1.5/firefly-gan-vq-fsq-8x1024-21hz-generator.pth"
 ```
@@ -33,8 +36,11 @@ You should get a `fake.npy` file.
 
 ### 2. Generate semantic tokens from text:
 
+!!! warning "Future Warning"
+    We have kept the interface accessible from the original path (tools/llama/generate.py), but this interface may be removed in subsequent releases, so please change your code as soon as possible.
+
 ```bash
-python tools/llama/generate.py \
+python fish_speech/models/text2semantic/inference.py \
     --text "The text you want to convert" \
     --prompt-text "Your reference text" \
     --prompt-tokens "fake.npy" \
@@ -56,8 +62,11 @@ This command will create a `codes_N` file in the working directory, where N is a
 
 #### VQGAN Decoder
 
+!!! warning "Future Warning"
+    We have kept the interface accessible from the original path (tools/vqgan/inference.py), but this interface may be removed in subsequent releases, so please change your code as soon as possible.
+
 ```bash
-python tools/vqgan/inference.py \
+python fish_speech/models/vqgan/inference.py \
     -i "codes_0.npy" \
     --checkpoint-path "checkpoints/fish-speech-1.5/firefly-gan-vq-fsq-8x1024-21hz-generator.pth"
 ```
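
Note on the "Future Warning" blocks above: the commit keeps the old script paths working for now. The shim itself is not visible in this excerpt, so the following is only a minimal sketch of one way such a compatibility layer could be written. `deprecated_entry_point`, `new_main`, and `legacy_main` are hypothetical names used for illustration, not identifiers from the repository.

```python
import functools
import warnings


def deprecated_entry_point(old_path: str, new_path: str):
    """Wrap a callable so that using it via the old path emits a FutureWarning."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            warnings.warn(
                f"{old_path} is deprecated and may be removed in a later release; "
                f"use {new_path} instead.",
                FutureWarning,
                stacklevel=2,
            )
            return func(*args, **kwargs)

        return wrapper

    return decorator


# Hypothetical stand-in for the real entry point at the new location.
def new_main(input_path: str) -> str:
    return f"processed {input_path}"


# What a shim left behind at the old path could expose.
legacy_main = deprecated_entry_point(
    "tools/vqgan/inference.py",
    "fish_speech/models/vqgan/inference.py",
)(new_main)
```

Calling `legacy_main("paimon.wav")` behaves like `new_main` but also warns, which gives callers time to switch to the new path before the old one is removed.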

diff --git a/docs/ja/inference.md b/docs/ja/inference.md
@@ -23,8 +23,11 @@ huggingface-cli download fishaudio/fish-speech-1.5 --local-dir checkpoints/fish-
 !!! note
     モデルにランダムに音声の音色を選ばせる場合、このステップをスキップできます。
 
+!!! warning "将来のバージョンに関する警告"
+    元のパス（tools/vqgan/inference.py）からアクセスできるインターフェースは残していますが、このインターフェースは将来のいくつかのバージョンで削除される可能性があります。お早めにコードを変更してください。
+
 ```bash
-python tools/vqgan/inference.py \
+python fish_speech/models/vqgan/inference.py \
     -i "paimon.wav" \
     --checkpoint-path "checkpoints/fish-speech-1.5/firefly-gan-vq-fsq-8x1024-21hz-generator.pth"
 ```
@@ -33,8 +36,11 @@ python tools/vqgan/inference.py \
 
 ### 2. テキストからセマンティックトークンを生成する：
 
+!!! warning "将来のバージョンに関する警告"
+    元のパス（tools/llama/generate.py）からアクセスできるインターフェースは残していますが、このインターフェースは将来のいくつかのバージョンで削除される可能性があります。お早めにコードを変更してください。
+
 ```bash
-python tools/llama/generate.py \
+python fish_speech/models/text2semantic/inference.py \
     --text "変換したいテキスト" \
     --prompt-text "参照テキスト" \
     --prompt-tokens "fake.npy" \
@@ -56,8 +62,11 @@ python tools/llama/generate.py \
 
 #### VQGAN デコーダー
 
+!!! warning "将来のバージョンに関する警告"
+    元のパス（tools/vqgan/inference.py）からアクセスできるインターフェースは残していますが、このインターフェースは将来のいくつかのバージョンで削除される可能性があります。お早めにコードを変更してください。
+
 ```bash
-python tools/vqgan/inference.py \
+python fish_speech/models/vqgan/inference.py \
     -i "codes_0.npy" \
     --checkpoint-path "checkpoints/fish-speech-1.5/firefly-gan-vq-fsq-8x1024-21hz-generator.pth"
 ```

diff --git a/docs/ko/inference.md b/docs/ko/inference.md
@@ -23,8 +23,11 @@ huggingface-cli download fishaudio/fish-speech-1.5 --local-dir checkpoints/fish-
 !!! note
     모델이 음색을 무작위로 선택하도록 하려면 이 단계를 건너뛸 수 있습니다.
 
+!!! warning "향후 버전 경고"
+    원래 경로(tools/vqgan/inference.py)에서 접근할 수 있는 인터페이스는 유지했지만, 이 인터페이스는 향후 몇몇 버전에서 삭제될 수 있습니다. 가능한 한 빨리 코드를 변경하십시오.
+
 ```bash
-python tools/vqgan/inference.py \
+python fish_speech/models/vqgan/inference.py \
     -i "paimon.wav" \
     --checkpoint-path "checkpoints/fish-speech-1.5/firefly-gan-vq-fsq-8x1024-21hz-generator.pth"
 ```
@@ -33,8 +36,11 @@ python tools/vqgan/inference.py \
 
 ### 2. 텍스트에서 시맨틱 토큰 생성:
 
+!!! warning "향후 버전 경고"
+    원래 경로(tools/llama/generate.py)에서 접근할 수 있는 인터페이스는 유지했지만, 이 인터페이스는 향후 몇몇 버전에서 삭제될 수 있습니다. 가능한 한 빨리 코드를 변경하십시오.
+
 ```bash
-python tools/llama/generate.py \
+python fish_speech/models/text2semantic/inference.py \
     --text "변환할 텍스트" \
     --prompt-text "참고할 텍스트" \
     --prompt-tokens "fake.npy" \
@@ -56,8 +62,11 @@ python tools/llama/generate.py \
 
 #### VQGAN 디코더
 
+!!! warning "향후 버전 경고"
+    원래 경로(tools/vqgan/inference.py)에서 접근할 수 있는 인터페이스는 유지했지만, 이 인터페이스는 향후 몇몇 버전에서 삭제될 수 있습니다. 가능한 한 빨리 코드를 변경하십시오.
+
 ```bash
-python tools/vqgan/inference.py \
+python fish_speech/models/vqgan/inference.py \
     -i "codes_0.npy" \
     --checkpoint-path "checkpoints/fish-speech-1.5/firefly-gan-vq-fsq-8x1024-21hz-generator.pth"
 ```

diff --git a/docs/pt/inference.md b/docs/pt/inference.md
@@ -23,8 +23,11 @@ huggingface-cli download fishaudio/fish-speech-1.5 --local-dir checkpoints/fish-
 !!! note
     Se quiser permitir que o modelo escolha aleatoriamente um timbre de voz, pule esta etapa.
 
+!!! warning "Aviso de Versão Futura"
+    Mantivemos a interface acessível a partir do caminho original (tools/vqgan/inference.py), mas esta interface poderá ser removida em algumas versões futuras. Por favor, altere o seu código o mais breve possível.
+
 ```bash
-python tools/vqgan/inference.py \
+python fish_speech/models/vqgan/inference.py \
     -i "paimon.wav" \
     --checkpoint-path "checkpoints/fish-speech-1.5/firefly-gan-vq-fsq-8x1024-21hz-generator.pth"
 ```
@@ -33,8 +36,11 @@ Você deverá obter um arquivo `fake.npy`.
 
 ### 2. Gerar tokens semânticos a partir do texto:
 
+!!! warning "Aviso de Versão Futura"
+    Mantivemos a interface acessível a partir do caminho original (tools/llama/generate.py), mas esta interface poderá ser removida em algumas versões futuras. Por favor, altere o seu código o mais breve possível.
+
 ```bash
-python tools/llama/generate.py \
+python fish_speech/models/text2semantic/inference.py \
     --text "O texto que você deseja converter" \
     --prompt-text "Seu texto de referência" \
     --prompt-tokens "fake.npy" \
@@ -56,8 +62,11 @@ Este comando criará um arquivo `codes_N` no diretório de trabalho, onde N é u
 
 #### Decodificador VQGAN
 
+!!! warning "Aviso de Versão Futura"
+    Mantivemos a interface acessível a partir do caminho original (tools/vqgan/inference.py), mas esta interface poderá ser removida em algumas versões futuras. Por favor, altere o seu código o mais breve possível.
+
 ```bash
-python tools/vqgan/inference.py \
+python fish_speech/models/vqgan/inference.py \
     -i "codes_0.npy" \
     --checkpoint-path "checkpoints/fish-speech-1.5/firefly-gan-vq-fsq-8x1024-21hz-generator.pth"
 ```

diff --git a/docs/zh/inference.md b/docs/zh/inference.md
@@ -29,8 +29,11 @@ HF_ENDPOINT=https://hf-mirror.com huggingface-cli download fishaudio/fish-speech
 !!! note
     如果你打算让模型随机选择音色, 你可以跳过这一步.
 
+!!! warning "未来版本警告"
+    我们保留了从原来路径（tools/vqgan/inference.py）访问的接口，但是这个接口可能在之后几个版本被删除，请尽快更改你的代码。
+
 ```bash
-python tools/vqgan/inference.py \
+python fish_speech/models/vqgan/inference.py \
     -i "paimon.wav" \
     --checkpoint-path "checkpoints/fish-speech-1.5/firefly-gan-vq-fsq-8x1024-21hz-generator.pth"
 ```
@@ -39,8 +42,11 @@ python tools/vqgan/inference.py \
 
 ### 2. 从文本生成语义 token:
 
+!!! warning "未来版本警告"
+    我们保留了从原来路径（tools/llama/generate.py）访问的接口，但是这个接口可能在之后几个版本被删除，请尽快更改你的代码。
+
 ```bash
-python tools/llama/generate.py \
+python fish_speech/models/text2semantic/inference.py \
     --text "要转换的文本" \
     --prompt-text "你的参考文本" \
     --prompt-tokens "fake.npy" \
@@ -62,8 +68,11 @@ python tools/llama/generate.py \
 
 #### VQGAN 解码
 
+!!! warning "未来版本警告"
+    我们保留了从原来路径（tools/vqgan/inference.py）访问的接口，但是这个接口可能在之后几个版本被删除，请尽快更改你的代码。
+
 ```bash
-python tools/vqgan/inference.py \
+python fish_speech/models/vqgan/inference.py \
     -i "codes_0.npy" \
     --checkpoint-path "checkpoints/fish-speech-1.5/firefly-gan-vq-fsq-8x1024-21hz-generator.pth"
 ```

diff --git a/tools/inference_engine/__init__.py → fish_speech/inference_engine/__init__.py b/tools/inference_engine/__init__.py → fish_speech/inference_engine/__init__.py
@@ -6,18 +6,18 @@
 import torch
 from loguru import logger
 
-from fish_speech.models.vqgan.modules.firefly import FireflyArchitecture
-from fish_speech.text.chn_text_norm.text import Text as ChnNormedText
-from fish_speech.utils import autocast_exclude_mps, set_seed
-from tools.inference_engine.reference_loader import ReferenceLoader
-from tools.inference_engine.utils import InferenceResult, wav_chunk_header
-from tools.inference_engine.vq_manager import VQManager
-from tools.llama.generate import (
+from fish_speech.inference_engine.reference_loader import ReferenceLoader
+from fish_speech.inference_engine.utils import InferenceResult, wav_chunk_header
+from fish_speech.inference_engine.vq_manager import VQManager
+from fish_speech.models.text2semantic.inference import (
     GenerateRequest,
     GenerateResponse,
     WrappedGenerateResponse,
 )
-from tools.schema import ServeTTSRequest
+from fish_speech.models.vqgan.modules.firefly import FireflyArchitecture
+from fish_speech.text.chn_text_norm.text import Text as ChnNormedText
+from fish_speech.utils import autocast_exclude_mps, set_seed
+from fish_speech.utils.schema import ServeTTSRequest
 
 
 class TTSInferenceEngine(ReferenceLoader, VQManager):
@@ -72,7 +72,10 @@ def inference(self, req: ServeTTSRequest) -> Generator[InferenceResult, None, No
         if req.streaming:
             yield InferenceResult(
                 code="header",
-                audio=(sample_rate, wav_chunk_header(sample_rate=sample_rate)),
+                audio=(
+                    sample_rate,
+                    np.array(wav_chunk_header(sample_rate=sample_rate)),
+                ),
                 error=None,
             )
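
Note on the streaming branch above: the first `InferenceResult` carries a WAV header so a client can start playback before any audio segment arrives. The body of `wav_chunk_header` is not part of this diff, so the following is only a sketch of how a frame-less WAV header can be produced with the standard library. `wav_chunk_header_sketch` and its default arguments are assumptions, not the repository's implementation.

```python
import io
import struct
import wave


def wav_chunk_header_sketch(
    sample_rate: int = 44100, bit_depth: int = 16, channels: int = 1
) -> bytes:
    """Build a WAV header with zero audio frames (illustrative stand-in)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(bit_depth // 8)
        wav_file.setframerate(sample_rate)
    # Closing the writer emits the RIFF/WAVE header even though no frames were written.
    return buffer.getvalue()


header = wav_chunk_header_sketch(sample_rate=44100)
# Bytes 24..27 of a canonical PCM header hold the sample rate (little-endian).
(rate_in_header,) = struct.unpack_from("<L", header, 24)
```

A streaming consumer would write this header once and then append the raw PCM bytes of each later segment.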
 

diff --git a/tools/inference_engine/reference_loader.py → ...eech/inference_engine/reference_loader.py b/tools/inference_engine/reference_loader.py → ...eech/inference_engine/reference_loader.py
@@ -8,8 +8,13 @@
 from loguru import logger
 
 from fish_speech.models.vqgan.modules.firefly import FireflyArchitecture
-from tools.file import AUDIO_EXTENSIONS, audio_to_bytes, list_files, read_ref_text
-from tools.schema import ServeReferenceAudio
+from fish_speech.utils.file import (
+    AUDIO_EXTENSIONS,
+    audio_to_bytes,
+    list_files,
+    read_ref_text,
+)
+from fish_speech.utils.schema import ServeReferenceAudio
 
 
 class ReferenceLoader:
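
Note on the import changes in the hunks above: downstream code that still imports from the old `tools.*` modules needs the same renames. A minimal sketch of a helper that rewrites a single import line, using only the old-to-new module pairs visible in this diff; `MOVED_MODULES` and `rewrite_import` are hypothetical names, and the mapping is not claimed to be complete for the whole commit.

```python
import re

# Old module path -> new module path, as they appear in this diff.
MOVED_MODULES = {
    "tools.inference_engine": "fish_speech.inference_engine",
    "tools.llama.generate": "fish_speech.models.text2semantic.inference",
    "tools.schema": "fish_speech.utils.schema",
    "tools.file": "fish_speech.utils.file",
}

# Try longer module paths first; the lookahead prevents partial matches
# such as "tools.filesystem".
_PATTERN = re.compile(
    r"^(\s*(?:from|import)\s+)("
    + "|".join(re.escape(m) for m in sorted(MOVED_MODULES, key=len, reverse=True))
    + r")(?=[\s.]|$)"
)


def rewrite_import(line: str) -> str:
    """Return the line with a moved module prefix replaced, or unchanged."""
    return _PATTERN.sub(lambda m: m.group(1) + MOVED_MODULES[m.group(2)], line)
```

For example, `rewrite_import("from tools.schema import ServeTTSRequest")` yields the `fish_speech.utils.schema` form, while unrelated imports pass through untouched.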

diff --git a/tools/inference_engine/utils.py → fish_speech/inference_engine/utils.py b/tools/inference_engine/utils.py → fish_speech/inference_engine/utils.py
@@ -11,7 +11,7 @@
 @dataclass
 class InferenceResult:
     code: Literal["header", "segment", "error", "final"]
-    audio: Optional[Tuple[int, np.ndarray | bytes]]
+    audio: Optional[Tuple[int, np.ndarray]]
     error: Optional[Exception]
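
Note on `InferenceResult` above: callers are expected to branch on `code`. A minimal sketch of a consumer that drains the generator and returns the final audio, assuming `"final"` carries the complete result and `"error"` carries the exception. To stay dependency-free, the sketch redeclares the dataclass with `List[float]` in place of `np.ndarray`; `fake_inference` and `collect_final` are hypothetical helpers.

```python
from dataclasses import dataclass
from typing import Iterable, Generator, List, Literal, Optional, Tuple


@dataclass
class InferenceResult:
    code: Literal["header", "segment", "error", "final"]
    audio: Optional[Tuple[int, List[float]]]  # stand-in for Tuple[int, np.ndarray]
    error: Optional[Exception]


def fake_inference() -> Generator[InferenceResult, None, None]:
    """Hypothetical generator that mimics the shape of the engine's output."""
    yield InferenceResult("segment", (44100, [0.0, 0.1]), None)
    yield InferenceResult("segment", (44100, [0.2]), None)
    yield InferenceResult("final", (44100, [0.0, 0.1, 0.2]), None)


def collect_final(results: Iterable[InferenceResult]) -> Tuple[int, List[float]]:
    """Return (sample_rate, audio) from the final result, raising on errors."""
    for result in results:
        if result.code == "error":
            raise result.error or RuntimeError("inference failed")
        if result.code == "final" and result.audio is not None:
            return result.audio
    raise RuntimeError("generator ended without a final result")


sample_rate, audio = collect_final(fake_inference())
```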
 
 

diff --git a/tools/inference_engine/vq_manager.py → fish_speech/inference_engine/vq_manager.py b/tools/inference_engine/vq_manager.py → fish_speech/inference_engine/vq_manager.py