Move inference code to library dir (#801)
* move inference engine

* move file functions

* move schema

* minor fix

* minor fix

* move inference code

* update docs

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* minor fix

* [feature] retain the interface to support code written for older versions

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update inference.ipynb to support 1.5

* Remove unused package

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Whale-Dolphin <[email protected]>
Co-authored-by: Whale and Dolphin <[email protected]>
4 people authored Jan 10, 2025
1 parent d76b917 commit 4fc8dbd
Showing 44 changed files with 1,505 additions and 3,016 deletions.
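
The script moves documented in the diffs below amount to a small old-to-new path mapping. A minimal Python sketch of that mapping follows. `SCRIPT_MOVES` and `resolve_script` are hypothetical names used only for illustration; they are not part of the repository, and the compatibility shims this commit keeps at the old paths may work differently.

```python
import warnings

# Old script path -> new script path, as listed in the documentation diffs.
SCRIPT_MOVES = {
    "tools/vqgan/inference.py": "fish_speech/models/vqgan/inference.py",
    "tools/llama/generate.py": "fish_speech/models/text2semantic/inference.py",
}


def resolve_script(path: str) -> str:
    """Return the current location of a script, warning if `path` is a legacy one."""
    new_path = SCRIPT_MOVES.get(path)
    if new_path is None:
        # Not a known legacy path: hand it back unchanged.
        return path
    warnings.warn(
        f"{path} has moved to {new_path}; the old path may be removed in a later release",
        FutureWarning,
        stacklevel=2,
    )
    return new_path
```

A wrapper script could call `resolve_script` on the path it is about to run, so that existing automation keeps working while the warning is surfaced.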
15 changes: 12 additions & 3 deletions docs/en/inference.md
@@ -23,8 +23,11 @@ huggingface-cli download fishaudio/fish-speech-1.5 --local-dir checkpoints/fish-
!!! note
If you plan to let the model randomly choose a voice timbre, you can skip this step.

+!!! warning "Future Warning"
+We have kept the interface accessible from the original path (tools/vqgan/inference.py), but this interface may be removed in subsequent releases, so please change your code as soon as possible.
+
```bash
-python tools/vqgan/inference.py \
+python fish_speech/models/vqgan/inference.py \
-i "paimon.wav" \
--checkpoint-path "checkpoints/fish-speech-1.5/firefly-gan-vq-fsq-8x1024-21hz-generator.pth"
```
@@ -33,8 +36,11 @@ You should get a `fake.npy` file.

### 2. Generate semantic tokens from text:

+!!! warning "Future Warning"
+We have kept the interface accessible from the original path (tools/llama/generate.py), but this interface may be removed in subsequent releases, so please change your code as soon as possible.
+
```bash
-python tools/llama/generate.py \
+python fish_speech/models/text2semantic/inference.py \
--text "The text you want to convert" \
--prompt-text "Your reference text" \
--prompt-tokens "fake.npy" \
@@ -56,8 +62,11 @@ This command will create a `codes_N` file in the working directory, where N is a

#### VQGAN Decoder

+!!! warning "Future Warning"
+We have kept the interface accessible from the original path (tools/vqgan/inference.py), but this interface may be removed in subsequent releases, so please change your code as soon as possible.
+
```bash
-python tools/vqgan/inference.py \
+python fish_speech/models/vqgan/inference.py \
-i "codes_0.npy" \
--checkpoint-path "checkpoints/fish-speech-1.5/firefly-gan-vq-fsq-8x1024-21hz-generator.pth"
```
15 changes: 12 additions & 3 deletions docs/ja/inference.md
@@ -23,8 +23,11 @@ huggingface-cli download fishaudio/fish-speech-1.5 --local-dir checkpoints/fish-
!!! note
モデルにランダムに音声の音色を選ばせる場合、このステップをスキップできます。

+!!! warning "将来のバージョンに関する警告"
+元のパス(tools/vqgan/inference.py)からアクセスできるインターフェースは残していますが、このインターフェースは将来のいくつかのバージョンで削除される可能性があります。お早めにコードを変更してください。
+
```bash
-python tools/vqgan/inference.py \
+python fish_speech/models/vqgan/inference.py \
-i "paimon.wav" \
--checkpoint-path "checkpoints/fish-speech-1.5/firefly-gan-vq-fsq-8x1024-21hz-generator.pth"
```
@@ -33,8 +36,11 @@ python tools/vqgan/inference.py \

### 2. テキストからセマンティックトークンを生成する:

+!!! warning "将来のバージョンに関する警告"
+元のパス(tools/llama/generate.py)からアクセスできるインターフェースは残していますが、このインターフェースは将来のいくつかのバージョンで削除される可能性があります。お早めにコードを変更してください。
+
```bash
-python tools/llama/generate.py \
+python fish_speech/models/text2semantic/inference.py \
--text "変換したいテキスト" \
--prompt-text "参照テキスト" \
--prompt-tokens "fake.npy" \
@@ -56,8 +62,11 @@ python tools/llama/generate.py \

#### VQGAN デコーダー

+!!! warning "将来のバージョンに関する警告"
+元のパス(tools/vqgan/inference.py)からアクセスできるインターフェースは残していますが、このインターフェースは将来のいくつかのバージョンで削除される可能性があります。お早めにコードを変更してください。
+
```bash
-python tools/vqgan/inference.py \
+python fish_speech/models/vqgan/inference.py \
-i "codes_0.npy" \
--checkpoint-path "checkpoints/fish-speech-1.5/firefly-gan-vq-fsq-8x1024-21hz-generator.pth"
```
15 changes: 12 additions & 3 deletions docs/ko/inference.md
@@ -23,8 +23,11 @@ huggingface-cli download fishaudio/fish-speech-1.5 --local-dir checkpoints/fish-
!!! note
모델이 음색을 무작위로 선택하도록 하려면 이 단계를 건너뛸 수 있습니다.

+!!! warning "향후 버전 경고"
+원래 경로(tools/vqgan/inference.py)에서 접근할 수 있는 인터페이스는 유지했지만, 이 인터페이스는 향후 몇몇 버전에서 삭제될 수 있습니다. 가능한 한 빨리 코드를 변경하십시오.
+
```bash
-python tools/vqgan/inference.py \
+python fish_speech/models/vqgan/inference.py \
-i "paimon.wav" \
--checkpoint-path "checkpoints/fish-speech-1.5/firefly-gan-vq-fsq-8x1024-21hz-generator.pth"
```
@@ -33,8 +36,11 @@ python tools/vqgan/inference.py \

### 2. 텍스트에서 시맨틱 토큰 생성:

+!!! warning "향후 버전 경고"
+원래 경로(tools/llama/generate.py)에서 접근할 수 있는 인터페이스는 유지했지만, 이 인터페이스는 향후 몇몇 버전에서 삭제될 수 있습니다. 가능한 한 빨리 코드를 변경하십시오.
+
```bash
-python tools/llama/generate.py \
+python fish_speech/models/text2semantic/inference.py \
--text "변환할 텍스트" \
--prompt-text "참고할 텍스트" \
--prompt-tokens "fake.npy" \
@@ -56,8 +62,11 @@ python tools/llama/generate.py \

#### VQGAN 디코더

+!!! warning "향후 버전 경고"
+원래 경로(tools/vqgan/inference.py)에서 접근할 수 있는 인터페이스는 유지했지만, 이 인터페이스는 향후 몇몇 버전에서 삭제될 수 있습니다. 가능한 한 빨리 코드를 변경하십시오.
+
```bash
-python tools/vqgan/inference.py \
+python fish_speech/models/vqgan/inference.py \
-i "codes_0.npy" \
--checkpoint-path "checkpoints/fish-speech-1.5/firefly-gan-vq-fsq-8x1024-21hz-generator.pth"
```
15 changes: 12 additions & 3 deletions docs/pt/inference.md
@@ -23,8 +23,11 @@ huggingface-cli download fishaudio/fish-speech-1.5 --local-dir checkpoints/fish-
!!! note
Se quiser permitir que o modelo escolha aleatoriamente um timbre de voz, pule esta etapa.

+!!! warning "Aviso de Versão Futura"
+Mantivemos a interface acessível a partir do caminho original (tools/vqgan/inference.py), mas esta interface poderá ser removida em algumas versões futuras. Por favor, altere o seu código o mais breve possível.
+
```bash
-python tools/vqgan/inference.py \
+python fish_speech/models/vqgan/inference.py \
-i "paimon.wav" \
--checkpoint-path "checkpoints/fish-speech-1.5/firefly-gan-vq-fsq-8x1024-21hz-generator.pth"
```
@@ -33,8 +36,11 @@ Você deverá obter um arquivo `fake.npy`.

### 2. Gerar tokens semânticos a partir do texto:

+!!! warning "Aviso de Versão Futura"
+Mantivemos a interface acessível a partir do caminho original (tools/llama/generate.py), mas esta interface poderá ser removida em algumas versões futuras. Por favor, altere o seu código o mais breve possível.
+
```bash
-python tools/llama/generate.py \
+python fish_speech/models/text2semantic/inference.py \
--text "O texto que você deseja converter" \
--prompt-text "Seu texto de referência" \
--prompt-tokens "fake.npy" \
@@ -56,8 +62,11 @@ Este comando criará um arquivo `codes_N` no diretório de trabalho, onde N é u

#### Decodificador VQGAN

+!!! warning "Aviso de Versão Futura"
+Mantivemos a interface acessível a partir do caminho original (tools/vqgan/inference.py), mas esta interface poderá ser removida em algumas versões futuras. Por favor, altere o seu código o mais breve possível.
+
```bash
-python tools/vqgan/inference.py \
+python fish_speech/models/vqgan/inference.py \
-i "codes_0.npy" \
--checkpoint-path "checkpoints/fish-speech-1.5/firefly-gan-vq-fsq-8x1024-21hz-generator.pth"
```
15 changes: 12 additions & 3 deletions docs/zh/inference.md
@@ -29,8 +29,11 @@ HF_ENDPOINT=https://hf-mirror.com huggingface-cli download fishaudio/fish-speech
!!! note
如果你打算让模型随机选择音色, 你可以跳过这一步.

+!!! warning "未来版本警告"
+我们保留了从原来路径(tools/vqgan/inference.py)访问的接口,但是这个接口可能在之后几个版本被删除,请尽快更改你的代码。
+
```bash
-python tools/vqgan/inference.py \
+python fish_speech/models/vqgan/inference.py \
-i "paimon.wav" \
--checkpoint-path "checkpoints/fish-speech-1.5/firefly-gan-vq-fsq-8x1024-21hz-generator.pth"
```
@@ -39,8 +42,11 @@ python tools/vqgan/inference.py \

### 2. 从文本生成语义 token:

+!!! warning "未来版本警告"
+我们保留了从原来路径(tools/llama/generate.py)访问的接口,但是这个接口可能在之后几个版本被删除,请尽快更改你的代码。
+
```bash
-python tools/llama/generate.py \
+python fish_speech/models/text2semantic/inference.py \
--text "要转换的文本" \
--prompt-text "你的参考文本" \
--prompt-tokens "fake.npy" \
@@ -62,8 +68,11 @@ python tools/llama/generate.py \

#### VQGAN 解码

+!!! warning "未来版本警告"
+我们保留了从原来路径(tools/vqgan/inference.py)访问的接口,但是这个接口可能在之后几个版本被删除,请尽快更改你的代码。
+
```bash
-python tools/vqgan/inference.py \
+python fish_speech/models/vqgan/inference.py \
-i "codes_0.npy" \
--checkpoint-path "checkpoints/fish-speech-1.5/firefly-gan-vq-fsq-8x1024-21hz-generator.pth"
```
@@ -6,18 +6,18 @@
import torch
from loguru import logger

-from fish_speech.models.vqgan.modules.firefly import FireflyArchitecture
-from fish_speech.text.chn_text_norm.text import Text as ChnNormedText
-from fish_speech.utils import autocast_exclude_mps, set_seed
-from tools.inference_engine.reference_loader import ReferenceLoader
-from tools.inference_engine.utils import InferenceResult, wav_chunk_header
-from tools.inference_engine.vq_manager import VQManager
-from tools.llama.generate import (
+from fish_speech.inference_engine.reference_loader import ReferenceLoader
+from fish_speech.inference_engine.utils import InferenceResult, wav_chunk_header
+from fish_speech.inference_engine.vq_manager import VQManager
+from fish_speech.models.text2semantic.inference import (
    GenerateRequest,
    GenerateResponse,
    WrappedGenerateResponse,
)
-from tools.schema import ServeTTSRequest
+from fish_speech.models.vqgan.modules.firefly import FireflyArchitecture
+from fish_speech.text.chn_text_norm.text import Text as ChnNormedText
+from fish_speech.utils import autocast_exclude_mps, set_seed
+from fish_speech.utils.schema import ServeTTSRequest


class TTSInferenceEngine(ReferenceLoader, VQManager):
@@ -72,7 +72,10 @@ def inference(self, req: ServeTTSRequest) -> Generator[InferenceResult, None, No
if req.streaming:
yield InferenceResult(
code="header",
-audio=(sample_rate, wav_chunk_header(sample_rate=sample_rate)),
+audio=(
+    sample_rate,
+    np.array(wav_chunk_header(sample_rate=sample_rate)),
+),
error=None,
)
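
For reference, a header-only WAV chunk like the one yielded in this hunk can be built with the standard library. The sketch below is an assumption about what such a header looks like, not the project's `wav_chunk_header`. `wav_header_bytes` is a hypothetical stand-in, and the `np.array(...)` wrapping from the diff is left out so the example needs no NumPy.

```python
import io
import struct
import wave


def wav_header_bytes(sample_rate: int = 44100, bit_depth: int = 16, channels: int = 1) -> bytes:
    """Build a PCM WAV header with no audio frames (hypothetical stand-in helper)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(bit_depth // 8)
        wav_file.setframerate(sample_rate)
        # No frames are written; closing the writer still emits the header.
    return buffer.getvalue()


header = wav_header_bytes(sample_rate=44100)

# In the canonical PCM layout, channel count sits at byte 22 and sample rate at byte 24.
channels_in_header = struct.unpack_from("<H", header, 22)[0]
rate_in_header = struct.unpack_from("<I", header, 24)[0]
```

A streaming client can send such a header first and then append raw PCM segments as they are generated.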

@@ -8,8 +8,13 @@
from loguru import logger

from fish_speech.models.vqgan.modules.firefly import FireflyArchitecture
-from tools.file import AUDIO_EXTENSIONS, audio_to_bytes, list_files, read_ref_text
-from tools.schema import ServeReferenceAudio
+from fish_speech.utils.file import (
+    AUDIO_EXTENSIONS,
+    audio_to_bytes,
+    list_files,
+    read_ref_text,
+)
+from fish_speech.utils.schema import ServeReferenceAudio


class ReferenceLoader:
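
The import changes in these hunks follow a fixed prefix mapping, so downstream code could be migrated mechanically. The sketch below is one possible approach. `MODULE_MOVES` and `rewrite_imports` are hypothetical helpers, and the mapping lists only the moves visible in this excerpt.

```python
import re

# Old module prefix -> new module prefix, taken from the import changes shown above.
MODULE_MOVES = {
    "tools.inference_engine": "fish_speech.inference_engine",
    "tools.llama.generate": "fish_speech.models.text2semantic.inference",
    "tools.schema": "fish_speech.utils.schema",
    "tools.file": "fish_speech.utils.file",
}

# Match `from <old>` or `import <old>` at the start of a line, where <old> is
# followed by whitespace, a dot (submodule), or the end of the line.
_IMPORT_PATTERN = re.compile(
    r"^([ \t]*(?:from|import)[ \t]+)("
    + "|".join(re.escape(name) for name in MODULE_MOVES)
    + r")(?=[\s.]|$)",
    re.MULTILINE,
)


def rewrite_imports(source: str) -> str:
    """Rewrite legacy `tools.*` imports in `source` to their new locations."""
    return _IMPORT_PATTERN.sub(
        lambda match: match.group(1) + MODULE_MOVES[match.group(2)], source
    )
```

Only import statements are touched, so other mentions of the old module names in strings or comments are left alone.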
@@ -11,7 +11,7 @@
@dataclass
class InferenceResult:
code: Literal["header", "segment", "error", "final"]
-audio: Optional[Tuple[int, np.ndarray | bytes]]
+audio: Optional[Tuple[int, np.ndarray]]
error: Optional[Exception]
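
With `audio` narrowed to `np.ndarray`, consumers no longer need to branch on `bytes`. A hypothetical consumer might look like the sketch below. The `InferenceResult` here is a stand-in that mirrors the fields in the diff, with `Any` in place of `np.ndarray` to avoid a NumPy dependency, and `payloads_by_code` is an illustrative name, not a project function.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, List, Literal, Optional, Tuple


@dataclass
class InferenceResult:
    # Stand-in for the project's dataclass; `Any` replaces np.ndarray here.
    code: Literal["header", "segment", "error", "final"]
    audio: Optional[Tuple[int, Any]]
    error: Optional[Exception]


def payloads_by_code(results: Iterable[InferenceResult]) -> Dict[str, List[Any]]:
    """Group audio payloads by result code, re-raising the first reported error."""
    grouped: Dict[str, List[Any]] = {}
    for result in results:
        if result.code == "error":
            raise result.error if result.error is not None else RuntimeError("inference failed")
        if result.audio is not None:
            _sample_rate, payload = result.audio
            grouped.setdefault(result.code, []).append(payload)
    return grouped
```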


File renamed without changes.