[iGPU] The device does not have the ext_intel_free_memory aspect #1352

Open
Stonepia opened this issue Feb 10, 2025 · 4 comments
Assignees
Labels: client dependency, component: driver, hw : LNL, hw : MTL (MTL platform), module: dependency bug (Problem is not caused by us, but caused by the library we use), os: Windows (Windows Platform)
Milestone: 2.8

Comments

@Stonepia
Contributor

Stonepia commented Feb 10, 2025

🐛 Describe the bug

This issue only happens on iGPUs on Windows; it passes on BMG (discrete GPU). The error message is below:

>>> import torch
>>> torch.xpu.mem_get_info()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\sdp\miniforge3\envs\tongsu_stock_pt\lib\site-packages\torch\xpu\memory.py", line 194, in mem_get_info
    return torch._C._xpu_getMemoryInfo(device)
RuntimeError: The device does not have the ext_intel_free_memory aspect
>>> torch.version.xpu
'20250000'
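
Until the driver supports this query, a user-side workaround is to treat free memory as unavailable and fall back to the device's total memory. Below is a minimal sketch against the public torch.xpu API; the safe_mem_get_info helper and its (0, total_memory) fallback are illustrative assumptions, not PyTorch behavior:

import torch

def safe_mem_get_info(device=0):
    # Hypothetical helper: on iGPUs lacking the ext_intel_free_memory aspect,
    # the free-memory query raises a RuntimeError, so report (0, total_memory)
    # taken from the device properties instead.
    try:
        return torch.xpu.mem_get_info(device)
    except RuntimeError:
        total = torch.xpu.get_device_properties(device).total_memory
        return 0, total

print(safe_mem_get_info())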

Versions

PyTorch version: 2.7.0.dev20250209+xpu
XPU used to build PyTorch: 20250000
Is XPU available: True
Intel GPU driver version:
  • 32.0.101.6458 (20250110000000.***+)
Intel GPU models onboard:
  • Intel(R) Arc(TM) 140V GPU (16GB)
Intel GPU models detected:
  • [0] _XpuDeviceProperties(name='Intel(R) Arc(TM) 140V GPU (16GB)', platform_name='Intel(R) oneAPI Unified Runtime over Level-Zero', type='gpu', driver_version='1.6.31441', total_memory=16900MB, max_compute_units=64, gpu_eu_count=64, gpu_subslice_count=8, max_work_group_size=1024, max_num_sub_groups=64, sub_group_sizes=[16 32], has_fp16=1, has_fp64=1, has_atomic64=1)
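
Most of the fields above can also be re-collected directly from Python; this is a minimal sketch using the standard torch.xpu query helpers (the Intel GPU driver version itself is reported by the OS/driver stack, not by this API):

import torch

# Print the build/runtime versions and enumerate the detected XPU devices.
print("PyTorch:", torch.__version__)
print("XPU used to build PyTorch:", torch.version.xpu)
print("Is XPU available:", torch.xpu.is_available())
for i in range(torch.xpu.device_count()):
    print(f"[{i}]", torch.xpu.get_device_properties(i))
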
pytorchmergebot pushed a commit to pytorch/pytorch that referenced this issue Feb 13, 2025
# Motivation
Friendly handle the runtime error message if the device doesn't support querying the available free memory. See intel/torch-xpu-ops#1352

Pull Request resolved: #146899
Approved by: https://github.com/EikanWang
@Stonepia
Contributor Author

More details for future reproduction:

// icpx demo.cpp -o demo.exe -fsycl
#include <sycl/sycl.hpp>

#include <cstdint>
#include <iostream>
#include <memory>
#include <vector>
 
namespace {
 
struct DevicePool {
  std::vector<std::unique_ptr<sycl::device>> devices;
  std::unique_ptr<sycl::context> context;
} gDevicePool;
 
void enumDevices(std::vector<std::unique_ptr<sycl::device>>& devices) {
  for (const auto& platform : sycl::platform::get_platforms()) {
    if (platform.get_backend() != sycl::backend::ext_oneapi_level_zero) {
        continue;
    }
    for (const auto& device : platform.get_devices()) {
      if (device.is_gpu()) {
        devices.push_back(std::make_unique<sycl::device>(device));
      }
    }
    break;
  }
}
 
inline void initGlobalDevicePoolState() {
  // Enumerate all GPU devices and record them.
  enumDevices(gDevicePool.devices);
  if (gDevicePool.devices.empty()) {
    return;
  }
  gDevicePool.context = std::make_unique<sycl::context>(
      gDevicePool.devices[0]->get_platform().ext_oneapi_get_default_context());
}
 
}
 
sycl::device& get_raw_device(int device) {
  return *gDevicePool.devices[device];
}
 
sycl::context& get_device_context() {
  return *gDevicePool.context;
}
 
int device_count() {
  return gDevicePool.devices.size();
}
 
int main() {
  initGlobalDevicePoolState();
  const auto count = device_count();
  std::cout << "device count is " << count << std::endl;
  if (count <= 0) {
    return 0;
  }
  for (auto i = 0; i < count; i++) {
    auto& device = get_raw_device(i);
    std::cout << i << "th device name is " << device.get_info<sycl::info::device::name>() << ", total memory is "
              << device.get_info<sycl::info::device::global_mem_size>()/1024./1024./1024. << " Gb.";
    if (device.has(sycl::aspect::ext_intel_free_memory)) {
      std::cout << " free device memory is " << device.get_info<sycl::ext::intel::info::device::free_memory>()/1024./1024./1024. << "Gb.";
    } else {
      // This happens on LNL, which lacks the sycl::aspect::ext_intel_free_memory aspect.
      std::cout << " ERROR: free device memory is not available.";
    }
    std::cout << std::endl;
  }
 
  std::cout << "finish!" << std::endl;
}

The output is below. You can see that the device does not support sycl::aspect::ext_intel_free_memory:

>demo.exe
device count is 1
0th device name is Intel(R) Arc(TM) 140V GPU (16GB), total memory is 16.5044 Gb. ERROR: free device memory is not available.
finish!

>sycl-ls
[level_zero:gpu][level_zero:0] Intel(R) oneAPI Unified Runtime over Level-Zero, Intel(R) Arc(TM) 140V GPU (16GB) 20.4.4 [1.6.31441]
[opencl:cpu][opencl:0] Intel(R) OpenCL, Intel(R) Core(TM) Ultra 9 288V 3.30GHz OpenCL 3.0 (Build 0) [2024.18.12.0.05_160000]
[opencl:gpu][opencl:1] Intel(R) OpenCL Graphics, Intel(R) Arc(TM) 140V GPU (16GB) OpenCL 3.0 NEO  [32.0.101.6458]

The UR trace shows the following output:

ZE ---> zeCommandListCreateImmediate(ZeContext, Device->ZeDevice, &ZeCommandQueueDesc, &ZeCommandListInit)
(.DeviceCount = 1, .phDevices = {000001E0CD323980}, .pProperties = nullptr, .phContext = 000001E0CE08AB80 (000001E0CE0AEC20)) -> UR_RESULT_SUCCESS;
device count is 1
0th device name is ---> urDeviceGetInfo(.hDevice = 000001E0CD323980, .propName = UR_DEVICE_INFO_NAME, .propSize = 0, .pPropValue = nullptr, .pPropSizeRet = 00000026E23BFC30 (33)) -> UR_RESULT_SUCCESS;
---> urDeviceGetInfo(.hDevice = 000001E0CD323980, .propName = UR_DEVICE_INFO_NAME, .propSize = 33, .pPropValue = 000001E0CE099E50 (Intel(R) Arc(TM) 140V GPU (16GB)), .pPropSizeRet = nullptr) -> UR_RESULT_SUCCESS;
Intel(R) Arc(TM) 140V GPU (16GB), total memory is ---> urDeviceGetInfoZE ---> zeDeviceGetMemoryProperties(ZeDevice, &Count, nullptr)
ZE ---> zeDeviceGetMemoryProperties(ZeDevice, &Count, PropertiesVector.data())
(.hDevice = 000001E0CD323980, .propName = UR_DEVICE_INFO_GLOBAL_MEM_SIZE, .propSize = 8, .pPropValue = 00000026E23BFD50 (17721458688), .pPropSizeRet = nullptr) -> UR_RESULT_SUCCESS;
16.5044 Gb.---> urDeviceGetInfoZE ---> zesDeviceEnumMemoryModules(ZesDevice, &MemCount, nullptr)
(.hDevice = 000001E0CD323980, .propName = UR_DEVICE_INFO_GLOBAL_MEM_FREE, .propSize = 0, .pPropValue = nullptr, .pPropSizeRet = 00000026E23BFD18 (0)) -> UR_RESULT_ERROR_UNSUPPORTED_ENUMERATION;
 ERROR: free device memory is not available.
finish!

@Stonepia
Contributor Author

The root cause of this issue is that the driver does not support this query yet. See GSD-10758 for the internal tracker.

In summary, this is not supported on integrated platforms (like LNL) at this time. UR_DEVICE_INFO_GLOBAL_MEM_FREE requires the Sysman memory modules to be reported by the device, which they are not, hence the unsupported return code from UR.

UR detects that 0 memory modules are reported and returns this error.

@Stonepia added the hw : MTL (MTL platform), module: dependency bug (Problem is not caused by us, but caused by the library we use), and os: Windows (Windows Platform) labels on Feb 20, 2025
@Stonepia changed the title from "[LNL] The device does not have the ext_intel_free_memory aspect" to "[iGPU] The device does not have the ext_intel_free_memory aspect" on Feb 20, 2025
@Stonepia
Contributor Author

This should not be a blocking issue, since PyTorch already warns users when the query is unsupported, so the overall user experience is not affected:

pytorch/pytorch#146899

@Stonepia
Contributor Author

Stonepia commented Feb 21, 2025

Additional reference: intel/compute-runtime#742

@daisyden added this to the 2.8 milestone on Feb 24, 2025