[Lunar Lake] UR_RESULT_ERROR_DEVICE_LOST #780

Open
rpolyano opened this issue Feb 4, 2025 · 2 comments

rpolyano commented Feb 4, 2025

Describe the bug

Trying to load the openbmb/MiniCPM-o-2_6 model on XPU fails with the following error:

Native API failed. Native API returns: 20 (UR_RESULT_ERROR_DEVICE_LOST)
  File "...py/nightingale/server.py", line 49, in __init__
    self.model = self.model.to(self._device)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "...py/nightingale/server_test.py", line 10, in <module>
    service = MiniCPMService()
              ^^^^^^^^^^^^^^^^
RuntimeError: Native API failed. Native API returns: 20 (UR_RESULT_ERROR_DEVICE_LOST)

If I call .eval() on the model before moving it to the device (the commented-out line in the snippet below), my entire desktop session crashes and I am sent back to the login screen.

I have also tried this inside the docker.io/intel/intel-extension-for-pytorch:2.5.10-xpu Docker container, with the same result.

Full code snippet:

import enum
from io import BytesIO
from typing import NewType, TypeAlias, TypeVar
from grpc import ServicerContext
import torch
from transformers import AutoModelForCausalLM, AutoProcessor, AutoTokenizer, AutoModel
from PIL import Image


def _select_device() -> torch.device:
    if torch.cuda.is_available():
        return torch.device("cuda:0")
    if torch.xpu.is_available():
        return torch.device("xpu")
    if torch.mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")

class MiniCPMService:

    def __init__(self) -> None:
        super().__init__()

        self._device = _select_device()
        self._torch_dtype = (
            torch.bfloat16 if self._device.type != "cpu" else torch.float16
        )

        print(f'Running on {self._device} with dtype {self._torch_dtype}')

        self.model = AutoModel.from_pretrained(
            'openbmb/MiniCPM-o-2_6',
            trust_remote_code=True,
            attn_implementation='sdpa', # sdpa or flash_attention_2
            torch_dtype=torch.bfloat16,
            init_vision=True,
            init_audio=False,
            init_tts=False
        )


        self.model = self.model.to(self._device)
        # self.model = self.model.eval().to(self._device) # This eval() results in a full system crash
        self.tokenizer = AutoTokenizer.from_pretrained('openbmb/MiniCPM-o-2_6', trust_remote_code=True)

Versions

Traceback (most recent call last):
  File ".../collect_env.py", line 19, in <module>
    import intel_extension_for_pytorch as ipex
  File "/home/roman/.local/share/virtualenvs/nightingale-uqI8m8sk/lib/python3.12/site-packages/intel_extension_for_pytorch/__init__.py", line 147, in <module>
    from . import _dynamo
  File "/home/roman/.local/share/virtualenvs/nightingale-uqI8m8sk/lib/python3.12/site-packages/intel_extension_for_pytorch/_dynamo/__init__.py", line 4, in <module>
    from torch._inductor.compile_fx import compile_fx
  File "/home/roman/.local/share/virtualenvs/nightingale-uqI8m8sk/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 49, in <module>
    from torch._inductor.debug import save_args_for_compile_fx_inner
  File "/home/roman/.local/share/virtualenvs/nightingale-uqI8m8sk/lib/python3.12/site-packages/torch/_inductor/debug.py", line 26, in <module>
    from . import config, ir  # noqa: F811, this is needed
    ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/roman/.local/share/virtualenvs/nightingale-uqI8m8sk/lib/python3.12/site-packages/torch/_inductor/ir.py", line 77, in <module>
    from .runtime.hints import ReductionHint
  File "/home/roman/.local/share/virtualenvs/nightingale-uqI8m8sk/lib/python3.12/site-packages/torch/_inductor/runtime/hints.py", line 36, in <module>
    attr_desc_fields = {f.name for f in fields(AttrsDescriptor)}
                                        ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/roman/.pyenv/versions/3.12.8/lib/python3.12/dataclasses.py", line 1289, in fields
    raise TypeError('must be called with a dataclass type or instance') from None
TypeError: must be called with a dataclass type or instance
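
The traceback above shows collect_env.py failing while importing intel_extension_for_pytorch, so no version table was produced. As a possible workaround, installed versions can be read from package metadata without importing the packages at all. This is a minimal sketch: the function name installed_versions and the package list are assumptions for illustration, not part of the original report.

```python
from importlib import metadata

# Hypothetical list of distributions relevant to this report; adjust as needed.
PACKAGES = (
    "torch",
    "intel-extension-for-pytorch",
    "transformers",
    "triton",
    "pytorch-triton-xpu",
)


def installed_versions(packages=PACKAGES):
    """Return {distribution name: version string, or None if not installed}.

    Only package metadata is read, so a package that fails on import
    can still be listed.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

Running this in the same virtualenv (and again inside the Docker container) would give the maintainers the version pairing that the failed script could not report.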

louie-tsai self-assigned this Feb 5, 2025
louie-tsai (Contributor) commented

@rpolyano
Sorry for the late response.
Since you hit a device-lost error, could you use xpu-smi to check whether the right devices are visible inside the Docker container?
The instructions are here:
https://intel.github.io/xpumanager/smi_user_guide.html#discover-the-devices-in-this-machine
Thanks,
Louie
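
As a complement to xpu-smi (which reports what the driver sees), the same question can be asked from Python to see which devices PyTorch itself can reach. This is only a sketch: describe_xpu_devices is a hypothetical helper name, and it assumes a PyTorch build that exposes the torch.xpu module. It returns an empty list when torch or XPU support is missing.

```python
def describe_xpu_devices():
    """Return one description string per XPU device visible to PyTorch.

    Returns an empty list if torch is not installed, the build has no
    torch.xpu module, or no XPU device is available.
    """
    try:
        import torch
    except ImportError:
        return []

    xpu = getattr(torch, "xpu", None)
    if xpu is None or not xpu.is_available():
        return []

    return [
        f"xpu:{index} {xpu.get_device_name(index)}"
        for index in range(xpu.device_count())
    ]


if __name__ == "__main__":
    devices = describe_xpu_devices()
    print("\n".join(devices) if devices else "No XPU devices visible to PyTorch")
```

If xpu-smi lists the GPU but this prints no devices, the problem is more likely in the PyTorch or runtime setup than in the container's device passthrough.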

louie-tsai (Contributor) commented

@rpolyano
Moreover, your code depends on flash-attn, which does not support XPU or CPU:
https://pypi.org/project/flash-attn/
As long as the code depends on the flash-attn package, we might not be able to run it on XPU.
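
One way to keep flash-attn out of the XPU and CPU paths is to choose the attention implementation from the device type and from whether the package is installed. The sketch below is an assumption-laden illustration: pick_attn_implementation is a hypothetical helper, and it only picks the string passed as attn_implementation. The snippet in the report already passes 'sdpa', and this sketch cannot tell whether the model's remote code imports flash_attn regardless of that argument.

```python
import importlib.util


def pick_attn_implementation(device_type: str) -> str:
    """Return 'flash_attention_2' only on CUDA with flash_attn installed.

    Every other case (XPU, MPS, CPU, or flash_attn missing) falls back
    to 'sdpa'.
    """
    has_flash_attn = importlib.util.find_spec("flash_attn") is not None
    if device_type == "cuda" and has_flash_attn:
        return "flash_attention_2"
    return "sdpa"


if __name__ == "__main__":
    for device_type in ("cuda", "xpu", "cpu"):
        print(device_type, "->", pick_attn_implementation(device_type))
```

In the reported code this could be called as pick_attn_implementation(self._device.type) and the result passed to from_pretrained.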
