Error when using MPS: "Expected elements.dtype() == test_elements.dtype() to be true, but got false." #789

Closed
LeafYeeXYZ opened this issue Dec 27, 2024 · 6 comments
Labels: bug (Something isn't working)

Self Checks

  • This template is only for bug reports. For questions, please visit Discussions.
  • I have thoroughly reviewed the project documentation (installation, training, inference) but couldn't find information to solve my problem.
  • I have searched for existing issues, including closed ones.
  • I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
  • [FOR CHINESE USERS] Please be sure to submit issues in English, otherwise they will be closed. Thank you! :)
  • Please do not modify this template and fill in all required fields.

Cloud or Self Hosted

Self Hosted (Source)

Environment Details

macOS 15.2 (Apple M3), Python 3.10, torch 2.4.1

Steps to Reproduce

  1. Run the command `python tools/api_server.py`

✔️ Expected Behavior

The server starts using MPS

❌ Actual Behavior

When using MPS:

/Users/leaf/Demo/fish-speech/.venv/lib/python3.10/site-packages/torch/amp/autocast_mode.py:265: UserWarning: User provided device_type of 'cuda', but CUDA is not available. Disabling
  warnings.warn(
INFO:     Started server process [98040]
INFO:     Waiting for application startup.
2024-12-27 15:43:48.722 | INFO     | tools.server.model_manager:__init__:41 - mps is available, running on mps.
2024-12-27 15:43:57.305 | INFO     | tools.llama.generate:load_model:682 - Restored model from checkpoint
2024-12-27 15:43:57.306 | INFO     | tools.llama.generate:load_model:688 - Using DualARTransformer
2024-12-27 15:43:57.312 | INFO     | tools.server.model_manager:load_llama_model:100 - LLAMA model loaded.
/Users/leaf/Demo/fish-speech/.venv/lib/python3.10/site-packages/vector_quantize_pytorch/vector_quantize_pytorch.py:445: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
  @autocast(enabled = False)
/Users/leaf/Demo/fish-speech/.venv/lib/python3.10/site-packages/vector_quantize_pytorch/vector_quantize_pytorch.py:630: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
  @autocast(enabled = False)
/Users/leaf/Demo/fish-speech/.venv/lib/python3.10/site-packages/vector_quantize_pytorch/finite_scalar_quantization.py:147: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
  @autocast(enabled = False)
/Users/leaf/Demo/fish-speech/.venv/lib/python3.10/site-packages/vector_quantize_pytorch/lookup_free_quantization.py:209: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
  @autocast(enabled = False)
2024-12-27 15:43:58.504 | INFO     | tools.vqgan.inference:load_model:43 - Loaded model: <All keys matched successfully>
2024-12-27 15:43:58.505 | INFO     | tools.server.model_manager:load_decoder_model:108 - Decoder model loaded.
2024-12-27 15:43:58.517 | INFO     | tools.llama.generate:generate_long:789 - Encoded text: Hello world.
2024-12-27 15:43:58.519 | INFO     | tools.llama.generate:generate_long:807 - Generating sentence 1/1 of sample 1/1
ERROR:    Traceback (most recent call last):
  File "/Users/leaf/Demo/fish-speech/.venv/lib/python3.10/site-packages/kui/asgi/lifespan.py", line 36, in __call__
    await result
  File "/Users/leaf/Demo/fish-speech/tools/api_server.py", line 77, in initialize_app
    app.state.model_manager = ModelManager(
  File "/Users/leaf/Demo/fish-speech/tools/server/model_manager.py", line 66, in __init__
    self.warm_up(self.tts_inference_engine)
  File "/Users/leaf/Demo/fish-speech/tools/server/model_manager.py", line 122, in warm_up
    list(inference(request, tts_inference_engine))
  File "/Users/leaf/Demo/fish-speech/tools/server/inference.py", line 25, in inference_wrapper
    raise HTTPException(
baize.exceptions.HTTPException: (<HTTPStatus.INTERNAL_SERVER_ERROR: 500>, "'Expected elements.dtype() == test_elements.dtype() to be true, but got false.  (Could this error message be improved?  If so, please report an enhancement request to PyTorch.)'")

ERROR:    Application startup failed. Exiting.

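For context, this exact check message appears to come from torch.isin: in torch 2.4 the MPS implementation requires `elements` and `test_elements` to share a dtype, while the CPU path promotes mixed dtypes silently. A minimal sketch that should reproduce the error on Apple Silicon, assuming that is indeed the origin (the int64/int32 pairing here is illustrative, not taken from the fish-speech code):

```python
import torch

# Sketch: torch.isin on the MPS backend (torch 2.4) checks that both
# tensors share a dtype before dispatching; the CPU backend instead
# promotes mixed dtypes, which is why the CPU fallback below works.
elements = torch.tensor([1, 2, 3], dtype=torch.int64, device="mps")
test_elements = torch.tensor([2], dtype=torch.int32, device="mps")

torch.isin(elements, test_elements)
# RuntimeError: Expected elements.dtype() == test_elements.dtype() to be true, but got false.
```
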
When I manually modify tools/server/model_manager.py to disable the MPS branch:

```python
        # Check if MPS or CUDA is available
        # if torch.backends.mps.is_available():
        if False:
            self.device = "mps"
            logger.info("mps is available, running on mps.")
        elif not torch.cuda.is_available():
            self.device = "cpu"
            logger.info("CUDA is not available, running on CPU.")
```

and run `python tools/api_server.py` again, it works:

/Users/leaf/Demo/fish-speech/.venv/lib/python3.10/site-packages/torch/amp/autocast_mode.py:265: UserWarning: User provided device_type of 'cuda', but CUDA is not available. Disabling
  warnings.warn(
INFO:     Started server process [98127]
INFO:     Waiting for application startup.
2024-12-27 15:48:35.950 | INFO     | tools.server.model_manager:__init__:44 - CUDA is not available, running on CPU.
2024-12-27 15:48:43.325 | INFO     | tools.llama.generate:load_model:682 - Restored model from checkpoint
2024-12-27 15:48:43.325 | INFO     | tools.llama.generate:load_model:688 - Using DualARTransformer
2024-12-27 15:48:43.328 | INFO     | tools.server.model_manager:load_llama_model:100 - LLAMA model loaded.
/Users/leaf/Demo/fish-speech/.venv/lib/python3.10/site-packages/vector_quantize_pytorch/vector_quantize_pytorch.py:445: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
  @autocast(enabled = False)
/Users/leaf/Demo/fish-speech/.venv/lib/python3.10/site-packages/vector_quantize_pytorch/vector_quantize_pytorch.py:630: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
  @autocast(enabled = False)
/Users/leaf/Demo/fish-speech/.venv/lib/python3.10/site-packages/vector_quantize_pytorch/finite_scalar_quantization.py:147: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
  @autocast(enabled = False)
/Users/leaf/Demo/fish-speech/.venv/lib/python3.10/site-packages/vector_quantize_pytorch/lookup_free_quantization.py:209: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
  @autocast(enabled = False)
2024-12-27 15:48:44.036 | INFO     | tools.vqgan.inference:load_model:43 - Loaded model: <All keys matched successfully>
2024-12-27 15:48:44.036 | INFO     | tools.server.model_manager:load_decoder_model:108 - Decoder model loaded.
2024-12-27 15:48:44.042 | INFO     | tools.llama.generate:generate_long:789 - Encoded text: Hello world.
2024-12-27 15:48:44.042 | INFO     | tools.llama.generate:generate_long:807 - Generating sentence 1/1 of sample 1/1
  3%|███▍                                                                                                                    | 29/1023 [00:02<01:35, 10.37it/s]
2024-12-27 15:48:48.053 | INFO     | tools.llama.generate:generate_long:861 - Generated 31 tokens in 4.01 seconds, 7.73 tokens/sec
2024-12-27 15:48:48.054 | INFO     | tools.llama.generate:generate_long:864 - Bandwidth achieved: 4.93 GB/s
2024-12-27 15:48:48.066 | INFO     | tools.inference_engine.vq_manager:decode_vq_tokens:20 - VQ features: torch.Size([8, 30])
2024-12-27 15:48:48.524 | INFO     | tools.server.model_manager:warm_up:123 - Models warmed up.
2024-12-27 15:48:48.524 | INFO     | __main__:initialize_app:88 - Startup done, listening server at http://127.0.0.1:8080
INFO:     Application startup complete.
INFO:     Uvicorn running on http://127.0.0.1:8080 (Press CTRL+C to quit)

And all functions work properly on CPU.
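
For anyone who wants to keep MPS instead of falling back to CPU, the generic fix pattern is to promote both operands to a common dtype before calling torch.isin. This is only a sketch of the pattern, not what #790 actually changed; `mask_banned_tokens`, `tokens`, and `banned` are hypothetical names standing in for wherever the failing call sits in the generation code:

```python
import torch

def mask_banned_tokens(tokens: torch.Tensor, banned: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: flag which entries of `tokens` appear in `banned`.

    The MPS backend of torch.isin rejects mismatched dtypes, so both
    tensors are promoted to a common dtype first. On CPU/CUDA this is
    behaviorally a no-op: it just makes explicit the promotion those
    backends already perform.
    """
    common = torch.promote_types(tokens.dtype, banned.dtype)
    return torch.isin(tokens.to(common), banned.to(common))
```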

@LeafYeeXYZ added the bug label on Dec 27, 2024
@LeafYeeXYZ (Author) commented:

Same error as in #779 when using the WebUI.

@AspadaX commented Dec 28, 2024:

Encountered the same issue on my M1 Pro Mac.

@acastry commented Dec 30, 2024:

Same on M1 Pro. I would like to find out how much running on MPS can accelerate the process.

@zutuanwang commented:

Same issue happens on a Mac mini M2.

@aarohc commented Jan 3, 2025:

Same with a Mac M3: works fine on CPU, but MPS is the trouble. Has anyone tried the API without the WebUI?

@Stardust-minus (Member) commented:

Fixed, see #790.
