Fish TTS API Fails to Match Reference Audio Tone and Style #836

Open
6 tasks done
AshutoshMipax opened this issue Jan 17, 2025 · 0 comments
Labels
bug Something isn't working

Comments

AshutoshMipax commented Jan 17, 2025

Self Checks

  • This template is only for bug reports. For questions, please visit Discussions.
  • I have thoroughly reviewed the project documentation (installation, training, inference) but couldn't find information to solve my problem. (Documentation: English, 中文, 日本語, Portuguese (Brazil))
  • I have searched for existing issues, including closed ones.
  • I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
  • [FOR CHINESE USERS] Please be sure to submit the issue in English, otherwise it will be closed. Thank you! :)
  • Please do not modify this template and fill in all required fields.

Cloud or Self Hosted

Self Hosted (Source), Self Hosted (Docker)

Environment Details

Operating System: Windows 11 (fully updated)
Processor: Intel Core i5 13th Gen
GPU: NVIDIA RTX 4050
Python Version: 3.12
Relevant Libraries and Versions:
  • torch: 2.4.1
  • Gradio: 4.44.0
  • pydub: latest version installed via pip
  • ffmpeg: installed and accessible via system PATH (version: 2024-08-01-git)

Steps to Reproduce

Install Fish TTS and dependencies as per the documentation.
Run the following code to use the Fish TTS API:
(I have attached the full script at the end of this report as fish.py.txt.)

from gradio_client import Client, handle_file

client = Client("http://127.0.0.1:7860/")
result = client.predict(
    text="This is a test input.",
    normalize=True,
    reference_id="test_reference",
    reference_audio=handle_file(r"C:\Users\ashu4\Music\Sound\final_new_vocal.wav"),
    reference_text="",
    max_new_tokens=0,
    chunk_length=200,
    top_p=0.7,
    repetition_penalty=1.2,
    temperature=0.7,
    seed=0,
    use_memory_cache="on",
    api_name="/partial"
)
print(result)
Observe the results:
The generated audio chunks do not match the tone, speed, or style of the provided reference audio.
In some cases, the first chunk is synthesized as a female voice and the second as a male voice.
Stitch the chunks using the following code:
from pydub import AudioSegment

final_audio = AudioSegment.empty()
for chunk_path in ["chunk1.wav", "chunk2.wav"]:  # Replace with actual chunk paths
    final_audio += AudioSegment.from_file(chunk_path)
final_audio.export("final_output.wav", format="wav")
The final output is inconsistent and does not replicate the reference audio style.
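
To rule out the stitching step as a source of the inconsistency, the chunks can be checked for a common format before they are joined. The following is a minimal sketch using only the standard-library wave module instead of pydub; the function name stitch_wav_chunks is hypothetical, and it assumes the chunks are uncompressed PCM WAV files (wave.open raises wave.Error for other encodings). It does not address the voice mismatch itself, it only confirms that the chunks share channel count, sample width, and sample rate.

```python
import wave


def stitch_wav_chunks(chunk_paths, output_path):
    """Concatenate PCM WAV chunks, refusing to mix incompatible formats.

    Returns the shared (channels, sample_width_bytes, frame_rate) tuple.
    """
    expected = None
    frames = []
    for path in chunk_paths:
        with wave.open(path, "rb") as chunk:
            fmt = (chunk.getnchannels(), chunk.getsampwidth(), chunk.getframerate())
            if expected is None:
                expected = fmt  # the first chunk defines the reference format
            elif fmt != expected:
                raise ValueError(f"{path}: format {fmt} differs from {expected}")
            frames.append(chunk.readframes(chunk.getnframes()))

    if expected is None:
        raise ValueError("no chunks given")

    with wave.open(output_path, "wb") as out:
        out.setnchannels(expected[0])
        out.setsampwidth(expected[1])
        out.setframerate(expected[2])
        out.writeframes(b"".join(frames))
    return expected
```

Calling stitch_wav_chunks(["chunk1.wav", "chunk2.wav"], "final_output.wav") with the actual chunk paths either writes the joined file or raises ValueError naming the first chunk whose format differs.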

fish.py.txt

✔️ Expected Behavior

The generated audio should replicate the tone, speed, and style of the reference audio provided in the reference_audio parameter.
All audio chunks should be consistent in voice, tone, and style.

❌ Actual Behavior

The generated audio:
  • Does not match the tone, speed, or style of the provided reference audio.
  • Is inconsistent between chunks (e.g., one chunk is in a male voice, another in a female voice).

Running the same input in the Gradio UI gives far better results that match the reference audio, which suggests that the API may not be fully utilizing GPU resources or properly processing the reference audio.
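
Since the UI and the API give different results for the same input, one way to narrow this down is to compare the settings used in each path side by side. The sketch below is plain Python with no Fish TTS calls; diff_settings is a hypothetical helper, api_settings is copied from the call in this report, and ui_settings is a placeholder that must be filled in by hand from the values actually shown in the Gradio UI (the non-empty reference text here is only an illustration, not a confirmed difference).

```python
def diff_settings(api_settings, ui_settings):
    """Return {name: (api_value, ui_value)} for every setting that differs."""
    missing = object()  # sentinel so that None and "" count as real values
    diffs = {}
    for name in sorted(set(api_settings) | set(ui_settings)):
        api_value = api_settings.get(name, missing)
        ui_value = ui_settings.get(name, missing)
        if api_value != ui_value:
            diffs[name] = (
                "<unset>" if api_value is missing else api_value,
                "<unset>" if ui_value is missing else ui_value,
            )
    return diffs


# Values copied from the API call in this report.
api_settings = {
    "reference_text": "",
    "max_new_tokens": 0,
    "chunk_length": 200,
    "top_p": 0.7,
    "repetition_penalty": 1.2,
    "temperature": 0.7,
    "seed": 0,
}

# PLACEHOLDER (assumption): replace with the values actually set in the Gradio UI.
ui_settings = dict(api_settings, reference_text="<transcript typed into the UI>")

for name, (api_value, ui_value) in diff_settings(api_settings, ui_settings).items():
    print(f"{name}: api={api_value!r} ui={ui_value!r}")
```

Any setting this prints is a candidate to align in the API call before retesting.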
