Self Checks
This template is only for bug reports. For questions, please visit Discussions.
I have thoroughly reviewed the project documentation (installation, training, inference) but couldn't find information to solve my problem.
I have searched for existing issues, including closed ones.
I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
[FOR CHINESE USERS] Please make sure to submit issues in English; otherwise they will be closed. Thank you! :)
Please do not modify this template and fill in all required fields.
Cloud or Self Hosted
Self Hosted (Source), Self Hosted (Docker)
Environment Details
Operating System: Windows 11 (fully updated)
Processor: Intel Core i5 13th Gen
GPU: NVIDIA RTX 4050
Python Version: Python 3.12
Relevant Libraries and Versions:
torch: 2.4.1
Gradio: 4.44.0
pydub: Latest version installed via pip
ffmpeg: Installed and accessible via system PATH (version: 2024-08-01-git)
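Since "latest version" is hard to reproduce later, exact versions can be collected with a small script. This is a minimal sketch using only the standard library; the function name `collect_versions` and the package list are illustrative choices, not part of Fish TTS.

```python
import platform
import sys
from importlib import metadata


def collect_versions(packages=("torch", "gradio", "gradio_client", "pydub")):
    """Return the Python/OS versions plus the installed version of each package."""
    report = {
        "python": sys.version.split()[0],
        "os": platform.platform(),
    }
    for name in packages:
        try:
            # Reads the installed distribution metadata without importing the package.
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
    return report


if __name__ == "__main__":
    for key, value in collect_versions().items():
        print(f"{key}: {value}")
```

Running it inside the same environment that launches Fish TTS gives the values to paste into this section.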
Steps to Reproduce
Install Fish TTS and dependencies as per the documentation.
Run the following code to use the Fish TTS API:
The full script is attached at the end of this report (fish.py.txt).
from gradio_client import Client, handle_file

client = Client("http://127.0.0.1:7860/")
result = client.predict(
    text="This is a test input.",
    normalize=True,
    reference_id="test_reference",
    reference_audio=handle_file(r"C:\Users\ashu4\Music\Sound\final_new_vocal.wav"),
    reference_text="",
    max_new_tokens=0,
    chunk_length=200,
    top_p=0.7,
    repetition_penalty=1.2,
    temperature=0.7,
    seed=0,
    use_memory_cache="on",
    api_name="/partial",
)
print(result)
Observe the results:
The generated audio chunks do not match the tone, speed, or style of the provided reference audio.
In some cases, the first chunk is synthesized as a female voice and the second as a male voice.
Stitch the chunks using the following code:
from pydub import AudioSegment

final_audio = AudioSegment.empty()
for chunk_path in ["chunk1.wav", "chunk2.wav"]:  # Replace with actual chunk paths
    final_audio += AudioSegment.from_file(chunk_path)
final_audio.export("final_output.wav", format="wav")
The final output is inconsistent and does not replicate the reference audio style.
fish.py.txt
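To rule out the stitching step as a source of the inconsistency, the chunks can be checked for matching audio formats before they are joined. The following is a rough sketch using the standard-library `wave` module instead of pydub; it assumes the chunks are PCM WAV files, and the function name `stitch_wav_chunks` is hypothetical.

```python
import wave


def stitch_wav_chunks(chunk_paths, output_path):
    """Concatenate PCM WAV files after checking that their formats match."""
    expected = None
    frames = []
    for path in chunk_paths:
        with wave.open(path, "rb") as chunk:
            fmt = (chunk.getnchannels(), chunk.getsampwidth(), chunk.getframerate())
            if expected is None:
                expected = fmt
            elif fmt != expected:
                # A mismatch here would change speed or pitch after stitching.
                raise ValueError(f"{path} has format {fmt}, expected {expected}")
            frames.append(chunk.readframes(chunk.getnframes()))
    if expected is None:
        raise ValueError("no chunks given")
    with wave.open(output_path, "wb") as out:
        out.setnchannels(expected[0])
        out.setsampwidth(expected[1])
        out.setframerate(expected[2])
        out.writeframes(b"".join(frames))
    return expected  # (channels, sample width in bytes, sample rate)
```

If this raises no error and the stitched file still switches voices, the difference is already present in the individual chunks rather than introduced by the stitching.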
✔️ Expected Behavior
The generated audio should replicate the tone, speed, and style of the reference audio provided in the reference_audio parameter.
All audio chunks should be consistent in voice, tone, and style.
❌ Actual Behavior
The generated audio:
Does not match the tone, speed, or style of the provided reference audio.
Is inconsistent between chunks (e.g., one chunk is in a male voice, another in a female voice).
When the same input is run in the Gradio UI, the results are far better and match the reference audio. This suggests that the API call may not be processing the reference audio properly, or may not be fully utilizing GPU resources.
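One way to narrow down the difference between the UI and the API is to compare the settings side by side. The sketch below is only an illustration: `diff_params` is a hypothetical helper, the script values are copied from the call above, and the UI values are a placeholder that has to be filled in from what the Gradio UI actually shows.

```python
def diff_params(script_params, ui_params):
    """Return {name: (script_value, ui_value)} for every setting that differs."""
    missing = object()
    differences = {}
    for name in sorted(set(script_params) | set(ui_params)):
        a = script_params.get(name, missing)
        b = ui_params.get(name, missing)
        if a != b:
            differences[name] = (
                "<not set>" if a is missing else a,
                "<not set>" if b is missing else b,
            )
    return differences


# Values passed by the script in this report.
script_params = {
    "normalize": True,
    "reference_text": "",
    "max_new_tokens": 0,
    "chunk_length": 200,
    "top_p": 0.7,
    "repetition_penalty": 1.2,
    "temperature": 0.7,
    "seed": 0,
    "use_memory_cache": "on",
}

# PLACEHOLDER (assumption): replace with the values actually shown in the Gradio UI.
ui_params = dict(script_params, reference_text="<transcript typed in the UI>")

for name, (script_value, ui_value) in diff_params(script_params, ui_params).items():
    print(f"{name}: script={script_value!r} ui={ui_value!r}")
```

Any setting that shows up in the output is a candidate explanation for the differing results and can be aligned in the script before re-testing.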