Commit
Showing 8 changed files with 184 additions and 55 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,30 @@ | ||
name: ci | ||
on: | ||
  push: | ||
    branches: | ||
      - main | ||
|
||
permissions: | ||
  contents: write | ||
|
||
jobs: | ||
  deploy: | ||
    runs-on: ubuntu-latest | ||
    steps: | ||
      - uses: actions/checkout@v4 | ||
      - name: Configure Git Credentials | ||
        run: | | ||
          git config user.name github-actions[bot] | ||
          git config user.email 41898282+github-actions[bot]@users.noreply.github.com | ||
      - uses: actions/setup-python@v4 | ||
        with: | ||
          python-version: 3.x | ||
      - run: echo "cache_id=$(date --utc '+%V')" >> $GITHUB_ENV | ||
      - uses: actions/cache@v3 | ||
        with: | ||
          key: mkdocs-material-${{ env.cache_id }} | ||
          path: .cache | ||
          restore-keys: | | ||
            mkdocs-material- | ||
      - run: pip install mkdocs-material | ||
      - run: mkdocs gh-deploy --force |
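
The workflow above builds its cache key from `date --utc '+%V'`, the ISO 8601 week number. The key therefore changes once a week, and the `mkdocs-material-` restore key lets a run fall back to an older cache in between. A minimal Python sketch of that key scheme follows; the helper name `weekly_cache_key` is hypothetical and not part of the workflow.

```python
from datetime import datetime, timezone


def weekly_cache_key(now: datetime, prefix: str = "mkdocs-material-") -> str:
    """Return a cache key that changes once per ISO 8601 week (UTC)."""
    # isocalendar()[1] is the ISO week number, the value that
    # `date --utc '+%V'` prints; %V is zero-padded to two digits.
    week = now.astimezone(timezone.utc).isocalendar()[1]
    return f"{prefix}{week:02d}"


if __name__ == "__main__":
    # Key for the current week, as the workflow would compute it today.
    print(weekly_cache_key(datetime.now(timezone.utc)))
```

Because the key carries only the week number and not the year, a key from the same week of an earlier year would match again if that cache still existed.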
File renamed without changes
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
# Welcome to Fish Speech | ||
|
||
The English documentation is under construction. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,4 @@ | ||
--- | ||
template: redirect.html | ||
location: /zh/ | ||
--- |
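
The page above consists only of front matter: `template` names a custom template and `location` carries the redirect target. The template itself is not part of this diff, so the sketch below only illustrates how such a block can be read into key/value pairs. `parse_front_matter` is a hypothetical helper that handles flat `key: value` lines; it is not a full YAML parser and not how MkDocs itself reads metadata.

```python
def parse_front_matter(text: str) -> dict:
    """Read flat `key: value` pairs between two `---` delimiter lines."""
    lines = text.strip().splitlines()
    # A front-matter block must open with a `---` line.
    if not lines or lines[0].strip() != "---":
        return {}
    meta = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return meta  # closing delimiter reached
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return {}  # no closing delimiter: treat the block as absent


if __name__ == "__main__":
    page = "---\ntemplate: redirect.html\nlocation: /zh/\n---\n"
    print(parse_front_matter(page))
```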
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
mkdocs-material |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,85 @@ | ||
# Introduction | ||
|
||
This codebase is released under the BSD-3-Clause license, and all models are released under the CC-BY-NC-SA-4.0 license. Please refer to [LICENSE](LICENSE) for more details. | ||
|
||
<p align="center"> | ||
<img src="/assets/figs/diagram.png" width="75%"> | ||
</p> | ||
|
||
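## Disclaimer | ||
We do not accept any responsibility for illegal use of the codebase. Please refer to your local laws regarding the DMCA (Digital Millennium Copyright Act) and other related laws. | ||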
|
||
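## Requirements | ||
- GPU memory: 2GB (for inference), 16GB (for fine-tuning) | ||
- System: Linux (full functionality), Windows (inference only; `flash-attn` and `torch.compile` are not supported) | ||
|
||
Therefore, we strongly recommend that Windows users run the codebase under WSL2 or Docker. | ||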
|
||
## Setup | ||
```bash | ||
# Basic environment setup | ||
conda create -n fish-speech python=3.10 | ||
conda activate fish-speech | ||
conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia | ||
|
||
# Install flash-attn (Linux only) | ||
pip3 install ninja && MAX_JOBS=4 pip3 install flash-attn --no-build-isolation | ||
|
||
# Install fish-speech | ||
pip3 install -e . | ||
``` | ||
|
||
## Inference (command line) | ||
|
||
Download the required `vqgan` and `text2semantic` models from our Hugging Face repository. | ||
|
||
```bash | ||
wget https://huggingface.co/fishaudio/speech-lm-v1/resolve/main/vqgan-v1.pth -O checkpoints/vqgan-v1.pth | ||
wget https://huggingface.co/fishaudio/speech-lm-v1/resolve/main/text2semantic-400m-v0.2-4k.pth -O checkpoints/text2semantic-400m-v0.2-4k.pth | ||
``` | ||
|
||
### 1. [Optional] Generate a prompt from voice: | ||
```bash | ||
python tools/vqgan/inference.py -i paimon.wav --checkpoint-path checkpoints/vqgan-v1.pth | ||
``` | ||
|
||
You should get a `fake.npy` file. | ||
|
||
### 2. Generate semantic tokens from text: | ||
```bash | ||
python tools/llama/generate.py \ | ||
    --text "The text you want to convert" \ | ||
    --prompt-text "Your reference text" \ | ||
    --prompt-tokens "fake.npy" \ | ||
    --checkpoint-path "checkpoints/text2semantic-400m-v0.2-4k.pth" \ | ||
    --num-samples 2 \ | ||
    --compile | ||
``` | ||
|
||
This command creates a `codes_N` file in the working directory, where N is an integer starting from 0. | ||
You may want to use `--compile` to fuse CUDA kernels for faster inference (~30 tokens/second -> ~500 tokens/second). | ||
|
||
### 3. Generate vocals from semantic tokens: | ||
```bash | ||
python tools/vqgan/inference.py -i codes_0.npy --checkpoint-path checkpoints/vqgan-v1.pth | ||
``` | ||
|
||
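## Rust Data Server | ||
Because loading and shuffling the dataset is slow and memory-intensive, we use a Rust server to load and shuffle the data. The server is based on gRPC and can be built as follows: | ||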
|
||
```bash | ||
cd data_server | ||
cargo build --release | ||
``` | ||
|
||
## Changelog | ||
|
||
- 2023/12/17: Updated the `text2semantic` model to support phoneme-free mode. | ||
- 2023/12/13: Beta release, including the VQGAN model and a LLAMA-based language model (phoneme input only). | ||
|
||
## Acknowledgements | ||
- [VITS2 (daniilrobnikov)](https://github.com/daniilrobnikov/vits2) | ||
- [Bert-VITS2](https://github.com/fishaudio/Bert-VITS2) | ||
- [GPT VITS](https://github.com/innnky/gpt-vits) | ||
- [MQTTS](https://github.com/b04901014/MQTTS) | ||
- [GPT Fast](https://github.com/pytorch-labs/gpt-fast) |
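
The three inference steps in the page above chain together: step 1 encodes a reference recording into `fake.npy`, step 2 turns text into `codes_N` files, and step 3 decodes one of those files into audio. The sketch below only assembles the same commands as argument lists so they can be inspected or passed to `subprocess.run`. The function names are hypothetical; the script paths and flags are the ones used in the commands above.

```python
import shlex


def encode_prompt_cmd(wav_path: str, vqgan_ckpt: str) -> list:
    """Step 1 (optional): encode a reference recording into prompt tokens."""
    return ["python", "tools/vqgan/inference.py",
            "-i", wav_path, "--checkpoint-path", vqgan_ckpt]


def generate_tokens_cmd(text: str, prompt_text: str, prompt_tokens: str,
                        llama_ckpt: str, num_samples: int = 2,
                        compile_model: bool = True) -> list:
    """Step 2: generate semantic tokens from text."""
    cmd = ["python", "tools/llama/generate.py",
           "--text", text,
           "--prompt-text", prompt_text,
           "--prompt-tokens", prompt_tokens,
           "--checkpoint-path", llama_ckpt,
           "--num-samples", str(num_samples)]
    if compile_model:
        # Pass compile_model=False where torch.compile is unavailable.
        cmd.append("--compile")
    return cmd


def decode_tokens_cmd(codes_path: str, vqgan_ckpt: str) -> list:
    """Step 3: decode semantic tokens into audio."""
    return ["python", "tools/vqgan/inference.py",
            "-i", codes_path, "--checkpoint-path", vqgan_ckpt]


if __name__ == "__main__":
    # Print the step 3 command as a single shell-quoted line.
    print(shlex.join(decode_tokens_cmd("codes_0.npy", "checkpoints/vqgan-v1.pth")))
```

Since the requirements list `torch.compile` as unsupported on Windows, a Windows run would call `generate_tokens_cmd(..., compile_model=False)`.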
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,53 @@ | ||
site_name: Fish Speech | ||
repo_url: https://github.com/fishaudio/fish-speech | ||
|
||
theme: | ||
  name: material | ||
  language: en | ||
  features: | ||
    - navigation.instant | ||
    - navigation.instant.prefetch | ||
    - navigation.tracking | ||
    - search | ||
    - search.suggest | ||
    - search.highlight | ||
    - search.share | ||
|
||
  palette: | ||
    # Palette toggle for automatic mode | ||
    - media: "(prefers-color-scheme)" | ||
      toggle: | ||
        icon: material/brightness-auto | ||
        name: Switch to light mode | ||
|
||
    # Palette toggle for light mode | ||
    - media: "(prefers-color-scheme: light)" | ||
      scheme: default | ||
      toggle: | ||
        icon: material/brightness-7 | ||
        name: Switch to dark mode | ||
      primary: black | ||
      font: | ||
        code: Roboto Mono | ||
|
||
    # Palette toggle for dark mode | ||
    - media: "(prefers-color-scheme: dark)" | ||
      scheme: slate | ||
      toggle: | ||
        icon: material/brightness-4 | ||
        name: Switch to system preference | ||
      primary: black | ||
      font: | ||
        code: Roboto Mono | ||
|
||
extra: | ||
  homepage: https://speech.fish.audio | ||
  version: | ||
    provider: mike | ||
  alternate: | ||
    - name: English | ||
      link: /en/ | ||
      lang: en | ||
    - name: 中文 | ||
      link: /zh/ | ||
      lang: zh |
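
The `palette` list above defines three entries (automatic, light, dark), and the theme's toggle steps through them in the order they are listed, wrapping around after the last one. Each toggle's `name` therefore labels the entry that comes next. A small sketch of that cycling order follows; the `label` key and the `next_entry` helper are hypothetical and exist only for illustration.

```python
# Palette entries in the order they appear in the configuration above.
# The "label" key is an illustrative addition, not a theme option.
PALETTE = [
    {"media": "(prefers-color-scheme)", "label": "automatic"},
    {"media": "(prefers-color-scheme: light)", "scheme": "default", "label": "light"},
    {"media": "(prefers-color-scheme: dark)", "scheme": "slate", "label": "dark"},
]


def next_entry(index: int) -> dict:
    """Return the palette entry that the toggle at `index` switches to."""
    return PALETTE[(index + 1) % len(PALETTE)]


if __name__ == "__main__":
    for i, entry in enumerate(PALETTE):
        print(entry["label"], "->", next_entry(i)["label"])
```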