Commit

v1.5 (#696)
* fix e2e_webui

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Agent: Streaming audio

* fix

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix text streaming

* [feature]: add tiktoken tokenizer to support v1.5

* v1.5 vq

* update docs

* [feature]: add agent inference

* [feature]: add decoder for API agent inference

* [fix]: use lengyue's fix to resolve inference bugs

* [fix]: fix inference errors with prompt audio

* [fix]: remove some unused tokens

* [fix]: fix some prompt bugs

* [fix]: fix the bug where the original audio speaks out the system prompt

* remove unused

* revert splitter

* remove unused

* remove unused ignore

* remove root conversation

* fix llama

* disable visualization

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: whaledolphin <[email protected]>
Co-authored-by: PoTaTo <[email protected]>
Co-authored-by: Whale and Dolphin <[email protected]>
5 people authored Dec 3, 2024
1 parent 2cb60a5 commit b951de3
Showing 25 changed files with 535 additions and 316 deletions.
4 changes: 2 additions & 2 deletions .pre-commit-config.yaml
@@ -20,6 +20,6 @@ repos:
- id: check-yaml
- id: check-json
- id: mixed-line-ending
args: ['--fix=lf']
args: ["--fix=lf"]
- id: check-added-large-files
args: ['--maxkb=5000']
args: ["--maxkb=5000"]
10 changes: 5 additions & 5 deletions docs/en/finetune.md
@@ -39,7 +39,7 @@ You need to convert your dataset into the above format and place it under `data`
Make sure you have downloaded the VQGAN weights. If not, run the following command:

```bash
huggingface-cli download fishaudio/fish-speech-1.4 --local-dir checkpoints/fish-speech-1.4
huggingface-cli download fishaudio/fish-speech-1.5 --local-dir checkpoints/fish-speech-1.5
```

You can then run the following command to extract semantic tokens:
@@ -48,7 +48,7 @@ You can then run the following command to extract semantic tokens:
python tools/vqgan/extract_vq.py data \
--num-workers 1 --batch-size 16 \
--config-name "firefly_gan_vq" \
--checkpoint-path "checkpoints/fish-speech-1.4/firefly-gan-vq-fsq-8x1024-21hz-generator.pth"
--checkpoint-path "checkpoints/fish-speech-1.5/firefly-gan-vq-fsq-8x1024-21hz-generator.pth"
```

!!! note
@@ -92,7 +92,7 @@ After the command finishes executing, you should see the `quantized-dataset-ft.p
Similarly, make sure you have downloaded the `LLAMA` weights. If not, run the following command:

```bash
huggingface-cli download fishaudio/fish-speech-1.4 --local-dir checkpoints/fish-speech-1.4
huggingface-cli download fishaudio/fish-speech-1.5 --local-dir checkpoints/fish-speech-1.5
```

Finally, you can start the fine-tuning by running the following command:
@@ -120,9 +120,9 @@ After training, you need to convert the LoRA weights to regular weights before p
```bash
python tools/llama/merge_lora.py \
--lora-config r_8_alpha_16 \
--base-weight checkpoints/fish-speech-1.4 \
--base-weight checkpoints/fish-speech-1.5 \
--lora-weight results/$project/checkpoints/step_000000010.ckpt \
--output checkpoints/fish-speech-1.4-yth-lora/
--output checkpoints/fish-speech-1.5-yth-lora/
```
!!! note
You may also try other checkpoints. We suggest using the earliest checkpoint that meets your requirements, as they often perform better on out-of-distribution (OOD) data.
2 changes: 1 addition & 1 deletion docs/en/index.md
@@ -179,7 +179,7 @@ pip install -e .[stable]
Make sure you are in the terminal inside the docker container, then download the required `vqgan` and `llama` models from our huggingface repository.

```bash
huggingface-cli download fishaudio/fish-speech-1.4 --local-dir checkpoints/fish-speech-1.4
huggingface-cli download fishaudio/fish-speech-1.5 --local-dir checkpoints/fish-speech-1.5
```

4. Configure environment variables and access WebUI
16 changes: 8 additions & 8 deletions docs/en/inference.md
@@ -15,7 +15,7 @@ Inference support command line, HTTP API and web UI.
Download the required `vqgan` and `llama` models from our Hugging Face repository.

```bash
huggingface-cli download fishaudio/fish-speech-1.4 --local-dir checkpoints/fish-speech-1.4
huggingface-cli download fishaudio/fish-speech-1.5 --local-dir checkpoints/fish-speech-1.5
```

### 1. Generate prompt from voice:
@@ -26,7 +26,7 @@ huggingface-cli download fishaudio/fish-speech-1.4 --local-dir checkpoints/fish-
```bash
python tools/vqgan/inference.py \
-i "paimon.wav" \
--checkpoint-path "checkpoints/fish-speech-1.4/firefly-gan-vq-fsq-8x1024-21hz-generator.pth"
--checkpoint-path "checkpoints/fish-speech-1.5/firefly-gan-vq-fsq-8x1024-21hz-generator.pth"
```

You should get a `fake.npy` file.
@@ -38,7 +38,7 @@ python tools/llama/generate.py \
--text "The text you want to convert" \
--prompt-text "Your reference text" \
--prompt-tokens "fake.npy" \
--checkpoint-path "checkpoints/fish-speech-1.4" \
--checkpoint-path "checkpoints/fish-speech-1.5" \
--num-samples 2 \
--compile
```
@@ -59,7 +59,7 @@ This command will create a `codes_N` file in the working directory, where N is a
```bash
python tools/vqgan/inference.py \
-i "codes_0.npy" \
--checkpoint-path "checkpoints/fish-speech-1.4/firefly-gan-vq-fsq-8x1024-21hz-generator.pth"
--checkpoint-path "checkpoints/fish-speech-1.5/firefly-gan-vq-fsq-8x1024-21hz-generator.pth"
```

## HTTP API Inference
@@ -69,8 +69,8 @@ We provide a HTTP API for inference. You can use the following command to start
```bash
python -m tools.api \
--listen 0.0.0.0:8080 \
--llama-checkpoint-path "checkpoints/fish-speech-1.4" \
--decoder-checkpoint-path "checkpoints/fish-speech-1.4/firefly-gan-vq-fsq-8x1024-21hz-generator.pth" \
--llama-checkpoint-path "checkpoints/fish-speech-1.5" \
--decoder-checkpoint-path "checkpoints/fish-speech-1.5/firefly-gan-vq-fsq-8x1024-21hz-generator.pth" \
--decoder-config-name firefly_gan_vq
```

@@ -120,8 +120,8 @@ You can start the WebUI using the following command:

```bash
python -m tools.webui \
--llama-checkpoint-path "checkpoints/fish-speech-1.4" \
--decoder-checkpoint-path "checkpoints/fish-speech-1.4/firefly-gan-vq-fsq-8x1024-21hz-generator.pth" \
--llama-checkpoint-path "checkpoints/fish-speech-1.5" \
--decoder-checkpoint-path "checkpoints/fish-speech-1.5/firefly-gan-vq-fsq-8x1024-21hz-generator.pth" \
--decoder-config-name firefly_gan_vq
```
> If you want to speed up inference, you can add the `--compile` parameter.
10 changes: 5 additions & 5 deletions docs/ja/finetune.md
@@ -39,7 +39,7 @@
VQGANの重みをダウンロードしたことを確認してください。まだダウンロードしていない場合は、次のコマンドを実行してください。

```bash
huggingface-cli download fishaudio/fish-speech-1.4 --local-dir checkpoints/fish-speech-1.4
huggingface-cli download fishaudio/fish-speech-1.5 --local-dir checkpoints/fish-speech-1.5
```

次に、次のコマンドを実行してセマンティックトークンを抽出できます。
@@ -48,7 +48,7 @@ huggingface-cli download fishaudio/fish-speech-1.4 --local-dir checkpoints/fish-
python tools/vqgan/extract_vq.py data \
--num-workers 1 --batch-size 16 \
--config-name "firefly_gan_vq" \
--checkpoint-path "checkpoints/fish-speech-1.4/firefly-gan-vq-fsq-8x1024-21hz-generator.pth"
--checkpoint-path "checkpoints/fish-speech-1.5/firefly-gan-vq-fsq-8x1024-21hz-generator.pth"
```

!!! note
@@ -92,7 +92,7 @@ python tools/llama/build_dataset.py \
同様に、`LLAMA`の重みをダウンロードしたことを確認してください。まだダウンロードしていない場合は、次のコマンドを実行してください。

```bash
huggingface-cli download fishaudio/fish-speech-1.4 --local-dir checkpoints/fish-speech-1.4
huggingface-cli download fishaudio/fish-speech-1.5 --local-dir checkpoints/fish-speech-1.5
```

最後に、次のコマンドを実行して微調整を開始できます。
@@ -120,9 +120,9 @@ python fish_speech/train.py --config-name text2semantic_finetune \
```bash
python tools/llama/merge_lora.py \
--lora-config r_8_alpha_16 \
--base-weight checkpoints/fish-speech-1.4 \
--base-weight checkpoints/fish-speech-1.5 \
--lora-weight results/$project/checkpoints/step_000000010.ckpt \
--output checkpoints/fish-speech-1.4-yth-lora/
--output checkpoints/fish-speech-1.5-yth-lora/
```
!!! note
他のチェックポイントを試すこともできます。要件を満たす最も早いチェックポイントを使用することをお勧めします。これらは通常、分布外(OOD)データでより良いパフォーマンスを発揮します。
2 changes: 1 addition & 1 deletion docs/ja/index.md
@@ -178,7 +178,7 @@ pip install -e .[stable]
Docker コンテナ内のターミナルにいることを確認し、huggingface リポジトリから必要な `vqgan` と `llama` モデルをダウンロードします。

```bash
huggingface-cli download fishaudio/fish-speech-1.4 --local-dir checkpoints/fish-speech-1.4
huggingface-cli download fishaudio/fish-speech-1.5 --local-dir checkpoints/fish-speech-1.5
```

4. 環境変数の設定と WebUI へのアクセス
16 changes: 8 additions & 8 deletions docs/ja/inference.md
@@ -15,7 +15,7 @@
必要な`vqgan`および`llama`モデルを Hugging Face リポジトリからダウンロードします。

```bash
huggingface-cli download fishaudio/fish-speech-1.4 --local-dir checkpoints/fish-speech-1.4
huggingface-cli download fishaudio/fish-speech-1.5 --local-dir checkpoints/fish-speech-1.5
```

### 1. 音声からプロンプトを生成する:
@@ -26,7 +26,7 @@ huggingface-cli download fishaudio/fish-speech-1.4 --local-dir checkpoints/fish-
```bash
python tools/vqgan/inference.py \
-i "paimon.wav" \
--checkpoint-path "checkpoints/fish-speech-1.4/firefly-gan-vq-fsq-8x1024-21hz-generator.pth"
--checkpoint-path "checkpoints/fish-speech-1.5/firefly-gan-vq-fsq-8x1024-21hz-generator.pth"
```

`fake.npy`ファイルが生成されるはずです。
@@ -38,7 +38,7 @@ python tools/llama/generate.py \
--text "変換したいテキスト" \
--prompt-text "参照テキスト" \
--prompt-tokens "fake.npy" \
--checkpoint-path "checkpoints/fish-speech-1.4" \
--checkpoint-path "checkpoints/fish-speech-1.5" \
--num-samples 2 \
--compile
```
@@ -59,7 +59,7 @@ python tools/llama/generate.py \
```bash
python tools/vqgan/inference.py \
-i "codes_0.npy" \
--checkpoint-path "checkpoints/fish-speech-1.4/firefly-gan-vq-fsq-8x1024-21hz-generator.pth"
--checkpoint-path "checkpoints/fish-speech-1.5/firefly-gan-vq-fsq-8x1024-21hz-generator.pth"
```

## HTTP API 推論
@@ -69,8 +69,8 @@ python tools/vqgan/inference.py \
```bash
python -m tools.api \
--listen 0.0.0.0:8080 \
--llama-checkpoint-path "checkpoints/fish-speech-1.4" \
--decoder-checkpoint-path "checkpoints/fish-speech-1.4/firefly-gan-vq-fsq-8x1024-21hz-generator.pth" \
--llama-checkpoint-path "checkpoints/fish-speech-1.5" \
--decoder-checkpoint-path "checkpoints/fish-speech-1.5/firefly-gan-vq-fsq-8x1024-21hz-generator.pth" \
--decoder-config-name firefly_gan_vq
```

@@ -99,8 +99,8 @@ python -m tools.post_api \

```bash
python -m tools.webui \
--llama-checkpoint-path "checkpoints/fish-speech-1.4" \
--decoder-checkpoint-path "checkpoints/fish-speech-1.4/firefly-gan-vq-fsq-8x1024-21hz-generator.pth" \
--llama-checkpoint-path "checkpoints/fish-speech-1.5" \
--decoder-checkpoint-path "checkpoints/fish-speech-1.5/firefly-gan-vq-fsq-8x1024-21hz-generator.pth" \
--decoder-config-name firefly_gan_vq
```
> 推論を高速化したい場合は、`--compile` パラメータを追加できます。
10 changes: 5 additions & 5 deletions docs/ko/finetune.md
@@ -38,7 +38,7 @@
VQGAN 가중치를 다운로드했는지 확인하세요. 다운로드하지 않았다면 아래 명령어를 실행하세요:

```bash
huggingface-cli download fishaudio/fish-speech-1.4 --local-dir checkpoints/fish-speech-1.4
huggingface-cli download fishaudio/fish-speech-1.5 --local-dir checkpoints/fish-speech-1.5
```

이후 시맨틱 토큰을 추출하기 위해 아래 명령어를 실행하세요:
@@ -47,7 +47,7 @@ huggingface-cli download fishaudio/fish-speech-1.4 --local-dir checkpoints/fish-
python tools/vqgan/extract_vq.py data \
--num-workers 1 --batch-size 16 \
--config-name "firefly_gan_vq" \
--checkpoint-path "checkpoints/fish-speech-1.4/firefly-gan-vq-fsq-8x1024-21hz-generator.pth"
--checkpoint-path "checkpoints/fish-speech-1.5/firefly-gan-vq-fsq-8x1024-21hz-generator.pth"
```

!!! note
@@ -91,7 +91,7 @@ python tools/llama/build_dataset.py \
마찬가지로, `LLAMA` 가중치를 다운로드했는지 확인하세요. 다운로드하지 않았다면 아래 명령어를 실행하세요:

```bash
huggingface-cli download fishaudio/fish-speech-1.4 --local-dir checkpoints/fish-speech-1.4
huggingface-cli download fishaudio/fish-speech-1.5 --local-dir checkpoints/fish-speech-1.5
```

마지막으로, 아래 명령어를 실행하여 파인튜닝을 시작할 수 있습니다:
@@ -119,9 +119,9 @@ python fish_speech/train.py --config-name text2semantic_finetune \
```bash
python tools/llama/merge_lora.py \
--lora-config r_8_alpha_16 \
--base-weight checkpoints/fish-speech-1.4 \
--base-weight checkpoints/fish-speech-1.5 \
--lora-weight results/$project/checkpoints/step_000000010.ckpt \
--output checkpoints/fish-speech-1.4-yth-lora/
--output checkpoints/fish-speech-1.5-yth-lora/
```

!!! note
2 changes: 1 addition & 1 deletion docs/ko/index.md
@@ -179,7 +179,7 @@ pip install -e .[stable]
Docker 컨테이너 내부의 터미널에서 아래 명령어를 사용하여 필요한 `vqgan` 및 `llama` 모델을 Huggingface 리포지토리에서 다운로드합니다.

```bash
huggingface-cli download fishaudio/fish-speech-1.4 --local-dir checkpoints/fish-speech-1.4
huggingface-cli download fishaudio/fish-speech-1.5 --local-dir checkpoints/fish-speech-1.5
```

4. 환경 변수 설정 및 WebUI 접근
16 changes: 8 additions & 8 deletions docs/ko/inference.md
@@ -15,7 +15,7 @@
필요한 `vqgan` 및 `llama` 모델을 Hugging Face 리포지토리에서 다운로드하세요.

```bash
huggingface-cli download fishaudio/fish-speech-1.4 --local-dir checkpoints/fish-speech-1.4
huggingface-cli download fishaudio/fish-speech-1.5 --local-dir checkpoints/fish-speech-1.5
```

### 1. 음성에서 프롬프트 생성:
@@ -26,7 +26,7 @@ huggingface-cli download fishaudio/fish-speech-1.4 --local-dir checkpoints/fish-
```bash
python tools/vqgan/inference.py \
-i "paimon.wav" \
--checkpoint-path "checkpoints/fish-speech-1.4/firefly-gan-vq-fsq-8x1024-21hz-generator.pth"
--checkpoint-path "checkpoints/fish-speech-1.5/firefly-gan-vq-fsq-8x1024-21hz-generator.pth"
```

이 명령을 실행하면 `fake.npy` 파일을 얻게 됩니다.
@@ -38,7 +38,7 @@ python tools/llama/generate.py \
--text "변환할 텍스트" \
--prompt-text "참고할 텍스트" \
--prompt-tokens "fake.npy" \
--checkpoint-path "checkpoints/fish-speech-1.4" \
--checkpoint-path "checkpoints/fish-speech-1.5" \
--num-samples 2 \
--compile
```
@@ -59,7 +59,7 @@ python tools/llama/generate.py \
```bash
python tools/vqgan/inference.py \
-i "codes_0.npy" \
--checkpoint-path "checkpoints/fish-speech-1.4/firefly-gan-vq-fsq-8x1024-21hz-generator.pth"
--checkpoint-path "checkpoints/fish-speech-1.5/firefly-gan-vq-fsq-8x1024-21hz-generator.pth"
```

## HTTP API 추론
@@ -69,8 +69,8 @@ python tools/vqgan/inference.py \
```bash
python -m tools.api \
--listen 0.0.0.0:8080 \
--llama-checkpoint-path "checkpoints/fish-speech-1.4" \
--decoder-checkpoint-path "checkpoints/fish-speech-1.4/firefly-gan-vq-fsq-8x1024-21hz-generator.pth" \
--llama-checkpoint-path "checkpoints/fish-speech-1.5" \
--decoder-checkpoint-path "checkpoints/fish-speech-1.5/firefly-gan-vq-fsq-8x1024-21hz-generator.pth" \
--decoder-config-name firefly_gan_vq
```

@@ -118,8 +118,8 @@ python -m tools.post_api \

```bash
python -m tools.webui \
--llama-checkpoint-path "checkpoints/fish-speech-1.4" \
--decoder-checkpoint-path "checkpoints/fish-speech-1.4/firefly-gan-vq-fsq-8x1024-21hz-generator.pth" \
--llama-checkpoint-path "checkpoints/fish-speech-1.5" \
--decoder-checkpoint-path "checkpoints/fish-speech-1.5/firefly-gan-vq-fsq-8x1024-21hz-generator.pth" \
--decoder-config-name firefly_gan_vq
```

10 changes: 5 additions & 5 deletions docs/pt/finetune.md
@@ -39,7 +39,7 @@ Você precisa converter seu conjunto de dados para o formato acima e colocá-lo
Certifique-se de ter baixado os pesos do VQGAN. Se não, execute o seguinte comando:

```bash
huggingface-cli download fishaudio/fish-speech-1.4 --local-dir checkpoints/fish-speech-1.4
huggingface-cli download fishaudio/fish-speech-1.5 --local-dir checkpoints/fish-speech-1.5
```

Em seguida, você pode executar o seguinte comando para extrair os tokens semânticos:
@@ -48,7 +48,7 @@ Em seguida, você pode executar o seguinte comando para extrair os tokens semân
python tools/vqgan/extract_vq.py data \
--num-workers 1 --batch-size 16 \
--config-name "firefly_gan_vq" \
--checkpoint-path "checkpoints/fish-speech-1.4/firefly-gan-vq-fsq-8x1024-21hz-generator.pth"
--checkpoint-path "checkpoints/fish-speech-1.5/firefly-gan-vq-fsq-8x1024-21hz-generator.pth"
```

!!! note
@@ -92,7 +92,7 @@ Após executar o comando, você deverá ver o arquivo `quantized-dataset-ft.prot
Da mesma forma, certifique-se de ter baixado os pesos do `LLAMA`. Se não, execute o seguinte comando:

```bash
huggingface-cli download fishaudio/fish-speech-1.4 --local-dir checkpoints/fish-speech-1.4
huggingface-cli download fishaudio/fish-speech-1.5 --local-dir checkpoints/fish-speech-1.5
```

E então, execute o seguinte comando para iniciar o ajuste fino:
@@ -120,9 +120,9 @@ Após o treinamento, é preciso converter os pesos do LoRA em pesos regulares an
```bash
python tools/llama/merge_lora.py \
--lora-config r_8_alpha_16 \
--base-weight checkpoints/fish-speech-1.4 \
--base-weight checkpoints/fish-speech-1.5 \
--lora-weight results/$project/checkpoints/step_000000010.ckpt \
--output checkpoints/fish-speech-1.4-yth-lora/
--output checkpoints/fish-speech-1.5-yth-lora/
```
!!! note
É possível também tentar outros checkpoints. Sugerimos usar o checkpoint que melhor atenda aos seus requisitos, pois eles geralmente têm um desempenho melhor em dados fora da distribuição (OOD).
2 changes: 1 addition & 1 deletion docs/pt/index.md
@@ -175,7 +175,7 @@ pip install -e .[stable]
Certifique-se de estar no terminal do contêiner Docker e, em seguida, baixe os modelos necessários `vqgan` e `llama` do nosso repositório HuggingFace.

```bash
huggingface-cli download fishaudio/fish-speech-1.4 --local-dir checkpoints/fish-speech-1.4
huggingface-cli download fishaudio/fish-speech-1.5 --local-dir checkpoints/fish-speech-1.5
```

4. Configure as variáveis de ambiente e acesse a WebUI
