Self Checks
This template is only for bug reports. For questions, please visit Discussions.
I have thoroughly reviewed the project documentation (installation, training, inference) but couldn't find information to solve my problem.
I have searched for existing issues, including closed ones.
I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
[FOR CHINESE USERS] Please be sure to submit issues in English; otherwise they will be closed. Thank you! :)
Please do not modify this template and fill in all required fields.

Cloud or Self Hosted
Self Hosted (Source)
Environment Details
macOS = 15.2 (Apple M3), python = 3.10, torch = 2.4.1
Steps to Reproduce
python tools/api_server.py
✔️ Expected Behavior
The server starts using MPS.
❌ Actual Behavior
When starting the server with MPS, application startup fails:
/Users/leaf/Demo/fish-speech/.venv/lib/python3.10/site-packages/torch/amp/autocast_mode.py:265: UserWarning: User provided device_type of 'cuda', but CUDA is not available. Disabling
warnings.warn(
INFO: Started server process [98040]
INFO: Waiting for application startup.
2024-12-27 15:43:48.722 | INFO | tools.server.model_manager:__init__:41 - mps is available, running on mps.
2024-12-27 15:43:57.305 | INFO | tools.llama.generate:load_model:682 - Restored model from checkpoint
2024-12-27 15:43:57.306 | INFO | tools.llama.generate:load_model:688 - Using DualARTransformer
2024-12-27 15:43:57.312 | INFO | tools.server.model_manager:load_llama_model:100 - LLAMA model loaded.
/Users/leaf/Demo/fish-speech/.venv/lib/python3.10/site-packages/vector_quantize_pytorch/vector_quantize_pytorch.py:445: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
@autocast(enabled = False)
/Users/leaf/Demo/fish-speech/.venv/lib/python3.10/site-packages/vector_quantize_pytorch/vector_quantize_pytorch.py:630: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
@autocast(enabled = False)
/Users/leaf/Demo/fish-speech/.venv/lib/python3.10/site-packages/vector_quantize_pytorch/finite_scalar_quantization.py:147: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
@autocast(enabled = False)
/Users/leaf/Demo/fish-speech/.venv/lib/python3.10/site-packages/vector_quantize_pytorch/lookup_free_quantization.py:209: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
@autocast(enabled = False)
2024-12-27 15:43:58.504 | INFO | tools.vqgan.inference:load_model:43 - Loaded model: <All keys matched successfully>
2024-12-27 15:43:58.505 | INFO | tools.server.model_manager:load_decoder_model:108 - Decoder model loaded.
2024-12-27 15:43:58.517 | INFO | tools.llama.generate:generate_long:789 - Encoded text: Hello world.
2024-12-27 15:43:58.519 | INFO | tools.llama.generate:generate_long:807 - Generating sentence 1/1 of sample 1/1
ERROR: Traceback (most recent call last):
  File "/Users/leaf/Demo/fish-speech/.venv/lib/python3.10/site-packages/kui/asgi/lifespan.py", line 36, in __call__
    await result
  File "/Users/leaf/Demo/fish-speech/tools/api_server.py", line 77, in initialize_app
    app.state.model_manager = ModelManager(
  File "/Users/leaf/Demo/fish-speech/tools/server/model_manager.py", line 66, in __init__
    self.warm_up(self.tts_inference_engine)
  File "/Users/leaf/Demo/fish-speech/tools/server/model_manager.py", line 122, in warm_up
    list(inference(request, tts_inference_engine))
  File "/Users/leaf/Demo/fish-speech/tools/server/inference.py", line 25, in inference_wrapper
    raise HTTPException(
baize.exceptions.HTTPException: (<HTTPStatus.INTERNAL_SERVER_ERROR: 500>, "'Expected elements.dtype() == test_elements.dtype() to be true, but got false. (Could this error message be improved? If so, please report an enhancement request to PyTorch.)'")
ERROR: Application startup failed. Exiting.
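I have not traced which call inside fish-speech hits this, but the message looks like the dtype check in torch.isin, which on the MPS backend (unlike CPU, which promotes) requires both tensors to share a dtype. A minimal sketch of that assumption on this machine:

import torch

# Assumption: the MPS kernel for torch.isin requires elements and
# test_elements to share a dtype, while the CPU path promotes instead.
# If so, this would explain why the same warm-up succeeds on CPU.
elements = torch.tensor([1, 2, 3], dtype=torch.int64, device="mps")
test_elements = torch.tensor([2, 3], dtype=torch.int32, device="mps")
torch.isin(elements, test_elements)  # RuntimeError: Expected elements.dtype() == test_elements.dtype() to be true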
When I manually modify /tools/server/model_manager.py:
# Check if MPS or CUDA is available
# if torch.backends.mps.is_available():
if False:
    self.device = "mps"
    logger.info("mps is available, running on mps.")
elif not torch.cuda.is_available():
    self.device = "cpu"
    logger.info("CUDA is not available, running on CPU.")
and run python tools/api_server.py again, it works:
/Users/leaf/Demo/fish-speech/.venv/lib/python3.10/site-packages/torch/amp/autocast_mode.py:265: UserWarning: User provided device_type of 'cuda', but CUDA is not available. Disabling
warnings.warn(
INFO: Started server process [98127]
INFO: Waiting for application startup.
2024-12-27 15:48:35.950 | INFO | tools.server.model_manager:__init__:44 - CUDA is not available, running on CPU.
2024-12-27 15:48:43.325 | INFO | tools.llama.generate:load_model:682 - Restored model from checkpoint
2024-12-27 15:48:43.325 | INFO | tools.llama.generate:load_model:688 - Using DualARTransformer
2024-12-27 15:48:43.328 | INFO | tools.server.model_manager:load_llama_model:100 - LLAMA model loaded.
/Users/leaf/Demo/fish-speech/.venv/lib/python3.10/site-packages/vector_quantize_pytorch/vector_quantize_pytorch.py:445: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
@autocast(enabled = False)
/Users/leaf/Demo/fish-speech/.venv/lib/python3.10/site-packages/vector_quantize_pytorch/vector_quantize_pytorch.py:630: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
@autocast(enabled = False)
/Users/leaf/Demo/fish-speech/.venv/lib/python3.10/site-packages/vector_quantize_pytorch/finite_scalar_quantization.py:147: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
@autocast(enabled = False)
/Users/leaf/Demo/fish-speech/.venv/lib/python3.10/site-packages/vector_quantize_pytorch/lookup_free_quantization.py:209: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
@autocast(enabled = False)
2024-12-27 15:48:44.036 | INFO | tools.vqgan.inference:load_model:43 - Loaded model: <All keys matched successfully>
2024-12-27 15:48:44.036 | INFO | tools.server.model_manager:load_decoder_model:108 - Decoder model loaded.
2024-12-27 15:48:44.042 | INFO | tools.llama.generate:generate_long:789 - Encoded text: Hello world.
2024-12-27 15:48:44.042 | INFO | tools.llama.generate:generate_long:807 - Generating sentence 1/1 of sample 1/1
3%|███▍ | 29/1023 [00:02<01:35, 10.37it/s]
2024-12-27 15:48:48.053 | INFO | tools.llama.generate:generate_long:861 - Generated 31 tokens in 4.01 seconds, 7.73 tokens/sec
2024-12-27 15:48:48.054 | INFO | tools.llama.generate:generate_long:864 - Bandwidth achieved: 4.93 GB/s
2024-12-27 15:48:48.066 | INFO | tools.inference_engine.vq_manager:decode_vq_tokens:20 - VQ features: torch.Size([8, 30])
2024-12-27 15:48:48.524 | INFO | tools.server.model_manager:warm_up:123 - Models warmed up.
2024-12-27 15:48:48.524 | INFO | __main__:initialize_app:88 - Startup done, listening server at http://127.0.0.1:8080
INFO: Application startup complete.
INFO: Uvicorn running on http://127.0.0.1:8080 (Press CTRL+C to quit)
And all functions work properly on CPU.
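As a less invasive workaround than editing the source, device selection could honor an environment override before probing MPS. This is only a sketch; the FISH_DEVICE variable is hypothetical, not an existing fish-speech option:

import os

import torch

def pick_device() -> str:
    # Hypothetical override (e.g. FISH_DEVICE=cpu) to skip MPS without
    # commenting out code; otherwise fall through to the usual probing.
    override = os.environ.get("FISH_DEVICE")
    if override:
        return override
    if torch.backends.mps.is_available():
        return "mps"
    if torch.cuda.is_available():
        return "cuda"
    return "cpu"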