Self Checks
This template is only for bug reports. For questions, please visit Discussions.
I have thoroughly reviewed the project documentation (installation, training, inference) but couldn't find information to solve my problem.
I have searched for existing issues, including closed ones.
I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
[FOR CHINESE USERS] Please be sure to submit issues in English; otherwise they will be closed. Thank you! :)
Please do not modify this template and fill in all required fields.

Cloud or Self Hosted
Self Hosted (Source)
Environment Details
macOS = 15.2 (Apple M3), python = 3.10, torch = 2.4.1
Steps to Reproduce
python tools/api_server.py
✔️ Expected Behavior
The server starts using MPS.
❌ Actual Behavior
When starting the server with MPS, application startup fails:
/Users/leaf/Demo/fish-speech/.venv/lib/python3.10/site-packages/torch/amp/autocast_mode.py:265: UserWarning: User provided device_type of 'cuda', but CUDA is not available. Disabling
warnings.warn(
INFO: Started server process [98040]
INFO: Waiting for application startup.
2024-12-27 15:43:48.722 | INFO | tools.server.model_manager:__init__:41 - mps is available, running on mps.
2024-12-27 15:43:57.305 | INFO | tools.llama.generate:load_model:682 - Restored model from checkpoint
2024-12-27 15:43:57.306 | INFO | tools.llama.generate:load_model:688 - Using DualARTransformer
2024-12-27 15:43:57.312 | INFO | tools.server.model_manager:load_llama_model:100 - LLAMA model loaded.
/Users/leaf/Demo/fish-speech/.venv/lib/python3.10/site-packages/vector_quantize_pytorch/vector_quantize_pytorch.py:445: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
@autocast(enabled = False)
/Users/leaf/Demo/fish-speech/.venv/lib/python3.10/site-packages/vector_quantize_pytorch/vector_quantize_pytorch.py:630: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
@autocast(enabled = False)
/Users/leaf/Demo/fish-speech/.venv/lib/python3.10/site-packages/vector_quantize_pytorch/finite_scalar_quantization.py:147: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
@autocast(enabled = False)
/Users/leaf/Demo/fish-speech/.venv/lib/python3.10/site-packages/vector_quantize_pytorch/lookup_free_quantization.py:209: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
@autocast(enabled = False)
2024-12-27 15:43:58.504 | INFO | tools.vqgan.inference:load_model:43 - Loaded model: <All keys matched successfully>
2024-12-27 15:43:58.505 | INFO | tools.server.model_manager:load_decoder_model:108 - Decoder model loaded.
2024-12-27 15:43:58.517 | INFO | tools.llama.generate:generate_long:789 - Encoded text: Hello world.
2024-12-27 15:43:58.519 | INFO | tools.llama.generate:generate_long:807 - Generating sentence 1/1 of sample 1/1
ERROR: Traceback (most recent call last):
  File "/Users/leaf/Demo/fish-speech/.venv/lib/python3.10/site-packages/kui/asgi/lifespan.py", line 36, in __call__
    await result
  File "/Users/leaf/Demo/fish-speech/tools/api_server.py", line 77, in initialize_app
    app.state.model_manager = ModelManager(
  File "/Users/leaf/Demo/fish-speech/tools/server/model_manager.py", line 66, in __init__
    self.warm_up(self.tts_inference_engine)
  File "/Users/leaf/Demo/fish-speech/tools/server/model_manager.py", line 122, in warm_up
    list(inference(request, tts_inference_engine))
  File "/Users/leaf/Demo/fish-speech/tools/server/inference.py", line 25, in inference_wrapper
    raise HTTPException(
baize.exceptions.HTTPException: (<HTTPStatus.INTERNAL_SERVER_ERROR: 500>, "'Expected elements.dtype() == test_elements.dtype() to be true, but got false. (Could this error message be improved? If so, please report an enhancement request to PyTorch.)'")
ERROR: Application startup failed. Exiting.
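I have not traced which call inside fish-speech hits this, but the message looks like the dtype check in torch.isin, which on the MPS backend (unlike CPU, which promotes) requires both tensors to share a dtype. A minimal sketch of that assumption on this machine:

import torch

# Assumption: the MPS kernel for torch.isin requires elements and
# test_elements to share a dtype, while the CPU path promotes instead.
# If so, this would explain why the same warm-up succeeds on CPU.
elements = torch.tensor([1, 2, 3], dtype=torch.int64, device="mps")
test_elements = torch.tensor([2, 3], dtype=torch.int32, device="mps")
torch.isin(elements, test_elements)  # RuntimeError: Expected elements.dtype() == test_elements.dtype() to be true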
When I manually modify /tools/server/model_manager.py:
# Check if MPS or CUDA is available
# if torch.backends.mps.is_available():
if False:
    self.device = "mps"
    logger.info("mps is available, running on mps.")
elif not torch.cuda.is_available():
    self.device = "cpu"
    logger.info("CUDA is not available, running on CPU.")
and run python tools/api_server.py again, it works:
/Users/leaf/Demo/fish-speech/.venv/lib/python3.10/site-packages/torch/amp/autocast_mode.py:265: UserWarning: User provided device_type of 'cuda', but CUDA is not available. Disabling
warnings.warn(
INFO: Started server process [98127]
INFO: Waiting for application startup.
2024-12-27 15:48:35.950 | INFO | tools.server.model_manager:__init__:44 - CUDA is not available, running on CPU.
2024-12-27 15:48:43.325 | INFO | tools.llama.generate:load_model:682 - Restored model from checkpoint
2024-12-27 15:48:43.325 | INFO | tools.llama.generate:load_model:688 - Using DualARTransformer
2024-12-27 15:48:43.328 | INFO | tools.server.model_manager:load_llama_model:100 - LLAMA model loaded.
/Users/leaf/Demo/fish-speech/.venv/lib/python3.10/site-packages/vector_quantize_pytorch/vector_quantize_pytorch.py:445: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
@autocast(enabled = False)
/Users/leaf/Demo/fish-speech/.venv/lib/python3.10/site-packages/vector_quantize_pytorch/vector_quantize_pytorch.py:630: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
@autocast(enabled = False)
/Users/leaf/Demo/fish-speech/.venv/lib/python3.10/site-packages/vector_quantize_pytorch/finite_scalar_quantization.py:147: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
@autocast(enabled = False)
/Users/leaf/Demo/fish-speech/.venv/lib/python3.10/site-packages/vector_quantize_pytorch/lookup_free_quantization.py:209: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
@autocast(enabled = False)
2024-12-27 15:48:44.036 | INFO | tools.vqgan.inference:load_model:43 - Loaded model: <All keys matched successfully>
2024-12-27 15:48:44.036 | INFO | tools.server.model_manager:load_decoder_model:108 - Decoder model loaded.
2024-12-27 15:48:44.042 | INFO | tools.llama.generate:generate_long:789 - Encoded text: Hello world.
2024-12-27 15:48:44.042 | INFO | tools.llama.generate:generate_long:807 - Generating sentence 1/1 of sample 1/1
3%|███▍ | 29/1023 [00:02<01:35, 10.37it/s]
2024-12-27 15:48:48.053 | INFO | tools.llama.generate:generate_long:861 - Generated 31 tokens in 4.01 seconds, 7.73 tokens/sec
2024-12-27 15:48:48.054 | INFO | tools.llama.generate:generate_long:864 - Bandwidth achieved: 4.93 GB/s
2024-12-27 15:48:48.066 | INFO | tools.inference_engine.vq_manager:decode_vq_tokens:20 - VQ features: torch.Size([8, 30])
2024-12-27 15:48:48.524 | INFO | tools.server.model_manager:warm_up:123 - Models warmed up.
2024-12-27 15:48:48.524 | INFO | __main__:initialize_app:88 - Startup done, listening server at http://127.0.0.1:8080
INFO: Application startup complete.
INFO: Uvicorn running on http://127.0.0.1:8080 (Press CTRL+C to quit)
And all functions work properly on CPU.
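As a less invasive workaround than editing the source, device selection could honor an environment override before probing MPS. This is only a sketch; the FISH_DEVICE variable is hypothetical, not an existing fish-speech option:

import os

import torch

def pick_device() -> str:
    # Hypothetical override (e.g. FISH_DEVICE=cpu) to skip MPS without
    # commenting out code; otherwise fall through to the usual probing.
    override = os.environ.get("FISH_DEVICE")
    if override:
        return override
    if torch.backends.mps.is_available():
        return "mps"
    if torch.cuda.is_available():
        return "cuda"
    return "cpu"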