
Testing fails with segmentation fault #265

Open
mckees opened this issue Jan 24, 2025 · 4 comments

Comments

@mckees

mckees commented Jan 24, 2025

OS: Ubuntu 24.04, freshly provisioned
Hardware: i7-1185G7, i9-12900 (reproduced on both)
Level Zero version: 1.19.2
Setup:
I am working on getting the level-zero version in Ubuntu bumped to 1.19.2, which is why I'm installing from a PPA.

```sh
sudo apt install -y build-essential cmake
sudo add-apt-repository ppa:mckeesh/testing
sudo apt -y update
sudo apt install libze1=1.19.2-0ubuntu1
git clone https://github.com/oneapi-src/level-zero
cd level-zero/
git checkout v1.19.2
mkdir build
cd build/
cmake -DBUILD_L0_LOADER_TESTS=yes ..
make -j`nproc`
./bin/tests
```

Result:

Running main() from /home/ubuntu/level-zero/build/_deps/googletest-src/googletest/src/gtest_main.cc
[==========] Running 15 tests from 3 test suites.
[----------] Global test environment set-up.
[----------] 1 test from LoaderAPI
[ RUN      ] LoaderAPI.GivenLevelZeroLoaderPresentWhenCallingzeGetLoaderVersionsAPIThenValidVersionIsReturned
/home/ubuntu/level-zero/test/loader_api.cpp:26: Failure
Expected equality of these values:
  ZE_RESULT_SUCCESS
    Which is: 0
  zeInit(0)
    Which is: 2013265921

Found 1 versions

component.component_name: loader
component.component_lib_version.major: 1
component.spec_version: 65548
component.component_lib_name: loader

[  FAILED  ] LoaderAPI.GivenLevelZeroLoaderPresentWhenCallingzeGetLoaderVersionsAPIThenValidVersionIsReturned (0 ms)
[----------] 1 test from LoaderAPI (0 ms total)

[----------] 11 tests from LoaderInit
[ RUN      ] LoaderInit.GivenLevelZeroLoaderPresentWhenCallingZeInitDriversWithTypesUnsupportedWithFailureThenSupportedTypesThenSuccessReturned
Segmentation fault (core dumped)
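The `component.spec_version` field printed above is a packed API version rather than a plain counter. A minimal sketch of decoding it, assuming the same bit layout as the `ZE_MAKE_VERSION`, `ZE_MAJOR_VERSION` and `ZE_MINOR_VERSION` macros in `ze_api.h` (re-implemented here so no SDK is needed; `makeVersion`, `majorOf`, `minorOf` and `versionString` are illustrative names, not Level Zero API):

```cpp
#include <cstdint>
#include <string>

// Mirrors the packing of ZE_MAKE_VERSION / ZE_MAJOR_VERSION / ZE_MINOR_VERSION
// in ze_api.h: major version in the upper 16 bits, minor in the lower 16 bits.
constexpr std::uint32_t makeVersion(std::uint32_t major, std::uint32_t minor) {
    return (major << 16) | (minor & 0x0000ffffu);
}

constexpr std::uint32_t majorOf(std::uint32_t version) { return version >> 16; }
constexpr std::uint32_t minorOf(std::uint32_t version) { return version & 0x0000ffffu; }

// Renders a packed version as "major.minor", e.g. 65548 -> "1.12".
std::string versionString(std::uint32_t version) {
    return std::to_string(majorOf(version)) + "." + std::to_string(minorOf(version));
}

// Compile-time check of the value printed in the log above.
static_assert(makeVersion(1, 12) == 65548u, "65548 is API version 1.12");
```

With this layout, 65548 decodes to 1.12, and the 65547 reported in a later run decodes to 1.11.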
@rwmcguir
Contributor

For reference, see https://dgpu-docs.intel.com/driver/client/overview.html (the section on Ubuntu 24.04).

Two different things are happening here:

Error code 2013265921 (decimal) is ZE_RESULT_ERROR_UNINITIALIZED = 0x78000001:

    ZE_RESULT_ERROR_UNINITIALIZED = 0x78000001,     ///< [Validation] driver is not initialized

This indicates that the L0 Loader cannot find a working Intel(R) GPU or NPU driver on the system, that those drivers failed to find a valid device to attach to, or that the system did not load a kernel driver. Both CPUs you posted have valid hardware, so that isn't the issue, but are the kernel-mode drivers (KMDs) actually running, and is the user-mode driver installed?
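GoogleTest prints `ze_result_t` values in decimal, which makes them hard to match against the hex constants in `ze_api.h`. A small illustrative helper for reading them, assuming a hand-copied partial table of result codes (`zeResultName` and `toHex` are hypothetical names, not part of Level Zero):

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper (not part of Level Zero): names a few ze_result_t
// values, copied from ze_api.h, so the decimal numbers printed by
// GoogleTest can be read at a glance. Unknown values fall through.
std::string zeResultName(std::uint32_t value) {
    switch (value) {
        case 0x00000000u: return "ZE_RESULT_SUCCESS";
        case 0x00000001u: return "ZE_RESULT_NOT_READY";
        case 0x78000001u: return "ZE_RESULT_ERROR_UNINITIALIZED";
        case 0x78000003u: return "ZE_RESULT_ERROR_UNSUPPORTED_FEATURE";
        default:          return "(not in this partial table)";
    }
}

// Formats a value as 0x-prefixed, zero-padded hex: 2013265921 -> "0x78000001".
std::string toHex(std::uint32_t value) {
    char buf[11]; // "0x" + 8 hex digits + terminating NUL
    std::snprintf(buf, sizeof buf, "0x%08X", static_cast<unsigned>(value));
    return std::string(buf);
}
```

For example, `toHex(2013265921u)` gives `"0x78000001"` and `zeResultName(2013265921u)` gives `"ZE_RESULT_ERROR_UNINITIALIZED"`.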

First, check whether a user-mode driver (UMD) is installed, then re-test:
$ sudo apt install libze-intel-gpu1
(I am assuming your PPA has this package; if not, install the one Intel hosts.)

If that doesn't work, check dmesg to confirm that the Intel GPU is picked up during boot by i915.ko (and possibly also xekmd.ko if you have anything newer in a discrete slot).
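As a complement to reading dmesg, one can check whether the kernel module is currently loaded by scanning `/proc/modules`. A minimal sketch, assuming the GPU driver is built as a loadable module (a driver compiled into the kernel will not appear in `/proc/modules`) and assuming the upstream Xe driver's module name is `xe`; a loaded module also does not prove it bound to a device, so dmesg remains the more complete check. `moduleListed` and `readProcModules` are hypothetical names:

```cpp
#include <fstream>
#include <sstream>
#include <string>

// Returns true if `name` is the first field of any line in text formatted
// like /proc/modules ("name size refcount deps state address").
bool moduleListed(const std::string& procModules, const std::string& name) {
    std::istringstream lines(procModules);
    std::string line;
    while (std::getline(lines, line)) {
        std::istringstream fields(line);
        std::string first;
        if ((fields >> first) && first == name) return true;
    }
    return false;
}

// Reads /proc/modules; returns an empty string if it cannot be opened.
std::string readProcModules() {
    std::ifstream in("/proc/modules");
    if (!in) return {};
    std::ostringstream out;
    out << in.rdbuf();
    return out.str();
}
```

A caller might test `moduleListed(readProcModules(), "i915") || moduleListed(readProcModules(), "xe")`. Parsing is kept separate from file access so the matching logic can be checked against sample text.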

The segfault clearly shouldn't happen, but it is likely an error induced by the missing drivers.
We should address the segfault within the L0 Loader.

@mckees
Author

mckees commented Feb 6, 2025

Hmm, interesting. I tried moving to Ubuntu 24.10, since we have been doing more work on that release and have had validation teams check our stack. Here's what I'm seeing now:

ubuntu@hp-elite-mini-800-g9-desktop-pc-c29603:~/level-zero/build$ apt policy libze1 
libze1:
  Installed: 1.19.2.0-1076~24.10
  Candidate: 1.19.2.0-1076~24.10
  Version table:
 *** 1.19.2.0-1076~24.10 500
        500 https://ppa.launchpadcontent.net/kobuk-team/intel-graphics/ubuntu oracular/main amd64 Packages
        100 /var/lib/dpkg/status
     1.17.42-1 500
        500 http://archive.ubuntu.com/ubuntu oracular/universe amd64 Packages
ubuntu@hp-elite-mini-800-g9-desktop-pc-c29603:~/level-zero/build$ apt policy libze-intel-gpu1 
libze-intel-gpu1:
  Installed: 24.52.32224.5-1~24.10~ppa2
  Candidate: 24.52.32224.5-1~24.10~ppa2
  Version table:
 *** 24.52.32224.5-1~24.10~ppa2 500
        500 https://ppa.launchpadcontent.net/kobuk-team/intel-graphics/ubuntu oracular/main amd64 Packages
        100 /var/lib/dpkg/status
     24.35.30872.24-1 500
        500 http://archive.ubuntu.com/ubuntu oracular/universe amd64 Packages
ubuntu@hp-elite-mini-800-g9-desktop-pc-c29603:~/level-zero/build$ ./bin/zello_world 
Driver not initialized: ZE_RESULT_ERROR_UNINITIALIZED
Did NOT find matching ZE_DEVICE_TYPE_GPU device!

@mckees
Author

mckees commented Feb 6, 2025

Running the same tests as before, more tests now run after installing an updated libze-intel-gpu1, but the segfault and the uninitialized errors are still there:

Running main() from /home/ubuntu/level-zero/build/_deps/googletest-src/googletest/src/gtest_main.cc
[==========] Running 15 tests from 3 test suites.
[----------] Global test environment set-up.
[----------] 1 test from LoaderAPI
[ RUN      ] LoaderAPI.GivenLevelZeroLoaderPresentWhenCallingzeGetLoaderVersionsAPIThenValidVersionIsReturned
/home/ubuntu/level-zero/test/loader_api.cpp:26: Failure
Expected equality of these values:
  ZE_RESULT_SUCCESS
    Which is: 0
  zeInit(0)
    Which is: 2013265921

Found 2 versions

component.component_name: loader
component.component_lib_version.major: 1
component.spec_version: 65547
component.component_lib_name: loader

component.component_name: tracing layer
component.component_lib_version.major: 1
component.spec_version: 65547
component.component_lib_name: tracing layer

[  FAILED  ] LoaderAPI.GivenLevelZeroLoaderPresentWhenCallingzeGetLoaderVersionsAPIThenValidVersionIsReturned (24 ms)
[----------] 1 test from LoaderAPI (24 ms total)

[----------] 11 tests from LoaderInit
[ RUN      ] LoaderInit.GivenLevelZeroLoaderPresentWhenCallingZeInitDriversWithTypesUnsupportedWithFailureThenSupportedTypesThenSuccessReturned
/home/ubuntu/level-zero/test/loader_api.cpp:64: Failure
Expected equality of these values:
  ZE_RESULT_SUCCESS
    Which is: 0
  zeInitDrivers(&pCount, nullptr, &desc)
    Which is: 2013265921

/home/ubuntu/level-zero/test/loader_api.cpp:65: Failure
Expected: (pCount) > (0), actual: 0 vs 0

[  FAILED  ] LoaderInit.GivenLevelZeroLoaderPresentWhenCallingZeInitDriversWithTypesUnsupportedWithFailureThenSupportedTypesThenSuccessReturned (0 ms)
[ RUN      ] LoaderInit.GivenLevelZeroLoaderPresentWhenCallingZeInitDriversWithGPUTypeThenExpectPassWithGPUorAllOnly
/home/ubuntu/level-zero/test/loader_api.cpp:77: Failure
Expected equality of these values:
  ZE_RESULT_SUCCESS
    Which is: 0
  zeInitDrivers(&pCount, nullptr, &desc)
    Which is: 2013265921

/home/ubuntu/level-zero/test/loader_api.cpp:78: Failure
Expected: (pCount) > (0), actual: 0 vs 0

/home/ubuntu/level-zero/test/loader_api.cpp:81: Failure
Expected equality of these values:
  ZE_RESULT_SUCCESS
    Which is: 0
  zeInitDrivers(&pCount, nullptr, &desc)
    Which is: 2013265921

/home/ubuntu/level-zero/test/loader_api.cpp:82: Failure
Expected: (pCount) > (0), actual: 0 vs 0

/home/ubuntu/level-zero/test/loader_api.cpp:85: Failure
Expected equality of these values:
  ZE_RESULT_SUCCESS
    Which is: 0
  zeInitDrivers(&pCount, nullptr, &desc)
    Which is: 2013265921

/home/ubuntu/level-zero/test/loader_api.cpp:86: Failure
Expected: (pCount) > (0), actual: 0 vs 0

[  FAILED  ] LoaderInit.GivenLevelZeroLoaderPresentWhenCallingZeInitDriversWithGPUTypeThenExpectPassWithGPUorAllOnly (0 ms)
[ RUN      ] LoaderInit.GivenLevelZeroLoaderPresentWhenCallingZeInitDriversWithNPUTypeThenExpectPassWithNPUorAllOnly
/home/ubuntu/level-zero/test/loader_api.cpp:98: Failure
Expected equality of these values:
  ZE_RESULT_SUCCESS
    Which is: 0
  zeInitDrivers(&pCount, nullptr, &desc)
    Which is: 2013265921

/home/ubuntu/level-zero/test/loader_api.cpp:99: Failure
Expected: (pCount) > (0), actual: 0 vs 0

/home/ubuntu/level-zero/test/loader_api.cpp:102: Failure
Expected equality of these values:
  ZE_RESULT_SUCCESS
    Which is: 0
  zeInitDrivers(&pCount, nullptr, &desc)
    Which is: 2013265921

/home/ubuntu/level-zero/test/loader_api.cpp:103: Failure
Expected: (pCount) > (0), actual: 0 vs 0

/home/ubuntu/level-zero/test/loader_api.cpp:106: Failure
Expected equality of these values:
  ZE_RESULT_SUCCESS
    Which is: 0
  zeInitDrivers(&pCount, nullptr, &desc)
    Which is: 2013265921

/home/ubuntu/level-zero/test/loader_api.cpp:107: Failure
Expected: (pCount) > (0), actual: 0 vs 0

[  FAILED  ] LoaderInit.GivenLevelZeroLoaderPresentWhenCallingZeInitDriversWithNPUTypeThenExpectPassWithNPUorAllOnly (0 ms)
[ RUN      ] LoaderInit.GivenLevelZeroLoaderPresentWhenCallingZeInitDriversWithAnyTypeWithNullDriverAcceptingAllThenExpectatLeast1Driver
/home/ubuntu/level-zero/test/loader_api.cpp:119: Failure
Expected equality of these values:
  ZE_RESULT_SUCCESS
    Which is: 0
  zeInitDrivers(&pCount, nullptr, &desc)
    Which is: 2013265921

/home/ubuntu/level-zero/test/loader_api.cpp:120: Failure
Expected: (pCount) > (0), actual: 0 vs 0

/home/ubuntu/level-zero/test/loader_api.cpp:123: Failure
Expected equality of these values:
  ZE_RESULT_SUCCESS
    Which is: 0
  zeInitDrivers(&pCount, nullptr, &desc)
    Which is: 2013265921

/home/ubuntu/level-zero/test/loader_api.cpp:124: Failure
Expected: (pCount) > (0), actual: 0 vs 0

/home/ubuntu/level-zero/test/loader_api.cpp:127: Failure
Expected equality of these values:
  ZE_RESULT_SUCCESS
    Which is: 0
  zeInitDrivers(&pCount, nullptr, &desc)
    Which is: 2013265921

/home/ubuntu/level-zero/test/loader_api.cpp:128: Failure
Expected: (pCount) > (0), actual: 0 vs 0

/home/ubuntu/level-zero/test/loader_api.cpp:131: Failure
Expected equality of these values:
  ZE_RESULT_SUCCESS
    Which is: 0
  zeInitDrivers(&pCount, nullptr, &desc)
    Which is: 2013265921

/home/ubuntu/level-zero/test/loader_api.cpp:132: Failure
Expected: (pCount) > (0), actual: 0 vs 0

[  FAILED  ] LoaderInit.GivenLevelZeroLoaderPresentWhenCallingZeInitDriversWithAnyTypeWithNullDriverAcceptingAllThenExpectatLeast1Driver (0 ms)
[ RUN      ] LoaderInit.GivenLevelZeroLoaderPresentWhenCallingZeInitDriversThenzeInitThenBothCallsSucceedWithAllTypes
/home/ubuntu/level-zero/test/loader_api.cpp:145: Failure
Expected equality of these values:
  ZE_RESULT_SUCCESS
    Which is: 0
  zeInitDrivers(&pInitDriversCount, nullptr, &desc)
    Which is: 2013265921

/home/ubuntu/level-zero/test/loader_api.cpp:146: Failure
Expected: (pInitDriversCount) > (0), actual: 0 vs 0

/home/ubuntu/level-zero/test/loader_api.cpp:147: Failure
Expected equality of these values:
  ZE_RESULT_SUCCESS
    Which is: 0
  zeInit(0)
    Which is: 2013265921

/home/ubuntu/level-zero/test/loader_api.cpp:149: Failure
Expected: (pDriverGetCount) > (0), actual: 0 vs 0

[  FAILED  ] LoaderInit.GivenLevelZeroLoaderPresentWhenCallingZeInitDriversThenzeInitThenBothCallsSucceedWithAllTypes (0 ms)
[ RUN      ] LoaderInit.GivenLevelZeroLoaderPresentWhenCallingZeInitDriversThenzeInitThenBothCallsSucceedWithGPUTypes
/home/ubuntu/level-zero/test/loader_api.cpp:162: Failure
Expected equality of these values:
  ZE_RESULT_SUCCESS
    Which is: 0
  zeInitDrivers(&pInitDriversCount, nullptr, &desc)
    Which is: 2013265921

/home/ubuntu/level-zero/test/loader_api.cpp:163: Failure
Expected: (pInitDriversCount) > (0), actual: 0 vs 0

/home/ubuntu/level-zero/test/loader_api.cpp:164: Failure
Expected equality of these values:
  ZE_RESULT_SUCCESS
    Which is: 0
  zeInit(ZE_INIT_FLAG_GPU_ONLY)
    Which is: 2013265921

/home/ubuntu/level-zero/test/loader_api.cpp:166: Failure
Expected: (pDriverGetCount) > (0), actual: 0 vs 0

[  FAILED  ] LoaderInit.GivenLevelZeroLoaderPresentWhenCallingZeInitDriversThenzeInitThenBothCallsSucceedWithGPUTypes (0 ms)
[ RUN      ] LoaderInit.GivenZeInitDriversUnsupportedOnTheDriverWhenCallingZeInitDriversThenUninitializedReturned
/home/ubuntu/level-zero/test/loader_api.cpp:181: Failure
Expected equality of these values:
  ZE_RESULT_SUCCESS
    Which is: 0
  zeInit(0)
    Which is: 2013265921

/home/ubuntu/level-zero/test/loader_api.cpp:183: Failure
Expected: (pDriverGetCount) > (0), actual: 0 vs 0

[  FAILED  ] LoaderInit.GivenZeInitDriversUnsupportedOnTheDriverWhenCallingZeInitDriversThenUninitializedReturned (0 ms)
[ RUN      ] LoaderInit.GivenLevelZeroLoaderPresentWhenCallingZeInitDriversThenzeInitThenBothCallsSucceedWithNPUTypes
/home/ubuntu/level-zero/test/loader_api.cpp:196: Failure
Expected equality of these values:
  ZE_RESULT_SUCCESS
    Which is: 0
  zeInitDrivers(&pInitDriversCount, nullptr, &desc)
    Which is: 2013265921

/home/ubuntu/level-zero/test/loader_api.cpp:197: Failure
Expected: (pInitDriversCount) > (0), actual: 0 vs 0

/home/ubuntu/level-zero/test/loader_api.cpp:198: Failure
Expected equality of these values:
  ZE_RESULT_SUCCESS
    Which is: 0
  zeInit(ZE_INIT_FLAG_VPU_ONLY)
    Which is: 2013265921

/home/ubuntu/level-zero/test/loader_api.cpp:200: Failure
Expected: (pDriverGetCount) > (0), actual: 0 vs 0

[  FAILED  ] LoaderInit.GivenLevelZeroLoaderPresentWhenCallingZeInitDriversThenzeInitThenBothCallsSucceedWithNPUTypes (0 ms)
[ RUN      ] LoaderInit.GivenLevelZeroLoaderPresentWhenCallingzeInitThenZeInitDriversThenBothCallsSucceedWithAllTypes
/home/ubuntu/level-zero/test/loader_api.cpp:213: Failure
Expected equality of these values:
  ZE_RESULT_SUCCESS
    Which is: 0
  zeInit(0)
    Which is: 2013265921

/home/ubuntu/level-zero/test/loader_api.cpp:214: Failure
Expected equality of these values:
  ZE_RESULT_SUCCESS
    Which is: 0
  zeInitDrivers(&pInitDriversCount, nullptr, &desc)
    Which is: 2013265921

/home/ubuntu/level-zero/test/loader_api.cpp:215: Failure
Expected: (pInitDriversCount) > (0), actual: 0 vs 0

/home/ubuntu/level-zero/test/loader_api.cpp:217: Failure
Expected: (pDriverGetCount) > (0), actual: 0 vs 0

[  FAILED  ] LoaderInit.GivenLevelZeroLoaderPresentWhenCallingzeInitThenZeInitDriversThenBothCallsSucceedWithAllTypes (0 ms)
[ RUN      ] LoaderInit.GivenLevelZeroLoaderPresentWhenCallingzeInitThenZeInitDriversThenBothCallsSucceedWithGPUTypes
/home/ubuntu/level-zero/test/loader_api.cpp:230: Failure
Expected equality of these values:
  ZE_RESULT_SUCCESS
    Which is: 0
  zeInit(ZE_INIT_FLAG_GPU_ONLY)
    Which is: 2013265921

/home/ubuntu/level-zero/test/loader_api.cpp:231: Failure
Expected equality of these values:
  ZE_RESULT_SUCCESS
    Which is: 0
  zeInitDrivers(&pInitDriversCount, nullptr, &desc)
    Which is: 2013265921

/home/ubuntu/level-zero/test/loader_api.cpp:232: Failure
Expected: (pInitDriversCount) > (0), actual: 0 vs 0

/home/ubuntu/level-zero/test/loader_api.cpp:234: Failure
Expected: (pDriverGetCount) > (0), actual: 0 vs 0

[  FAILED  ] LoaderInit.GivenLevelZeroLoaderPresentWhenCallingzeInitThenZeInitDriversThenBothCallsSucceedWithGPUTypes (0 ms)
[ RUN      ] LoaderInit.GivenLevelZeroLoaderPresentWhenCallingzeInitThenZeInitDriversThenBothCallsSucceedWithNPUTypes
/home/ubuntu/level-zero/test/loader_api.cpp:247: Failure
Expected equality of these values:
  ZE_RESULT_SUCCESS
    Which is: 0
  zeInit(ZE_INIT_FLAG_VPU_ONLY)
    Which is: 2013265921

/home/ubuntu/level-zero/test/loader_api.cpp:248: Failure
Expected equality of these values:
  ZE_RESULT_SUCCESS
    Which is: 0
  zeInitDrivers(&pInitDriversCount, nullptr, &desc)
    Which is: 2013265921

/home/ubuntu/level-zero/test/loader_api.cpp:249: Failure
Expected: (pInitDriversCount) > (0), actual: 0 vs 0

/home/ubuntu/level-zero/test/loader_api.cpp:251: Failure
Expected: (pDriverGetCount) > (0), actual: 0 vs 0

[  FAILED  ] LoaderInit.GivenLevelZeroLoaderPresentWhenCallingzeInitThenZeInitDriversThenBothCallsSucceedWithNPUTypes (0 ms)
[----------] 11 tests from LoaderInit (0 ms total)

[----------] 3 tests from LoaderValidation
[ RUN      ] LoaderValidation.GivenLevelZeroLoaderPresentWhenCallingzeCommandListAppendMemoryCopyWithCircularDependencyOnEventsThenValidationLayerPrintsWarningOfDeadlock
/home/ubuntu/level-zero/test/loader_validation_layer.cpp:28: Failure
Expected equality of these values:
  ZE_RESULT_SUCCESS
    Which is: 0
  zeInit(ZE_INIT_FLAG_GPU_ONLY)
    Which is: 2013265921

/home/ubuntu/level-zero/test/loader_validation_layer.cpp:29: Failure
Expected equality of these values:
  ZE_RESULT_SUCCESS
    Which is: 0
  zeInitDrivers(&pCount, nullptr, &desc)
    Which is: 2013265921

/home/ubuntu/level-zero/test/loader_validation_layer.cpp:30: Failure
Expected: (pCount) > (0), actual: 0 vs 0

/home/ubuntu/level-zero/test/loader_validation_layer.cpp:72: Failure
Expected: (pDevice) != (nullptr), actual: NULL vs (nullptr)

Segmentation fault (core dumped)
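In this log, the last failure printed before the crash is the `(pDevice) != (nullptr)` check at loader_validation_layer.cpp:72. GoogleTest's `EXPECT_*` macros record a non-fatal failure and let the test keep running, whereas `ASSERT_*` macros return from the current test function, so one plausible reading (not confirmed in this thread) is that the test goes on to use a null device handle. Independently of any loader-side fix, here is a sketch of the fail-fast shape in plain C++; `Status`, `Device`, `findDevice` and `runBody` are hypothetical stand-ins, not Level Zero API:

```cpp
#include <cstdint>
#include <cstdio>

// Hypothetical stand-ins used only to show the control flow; these are NOT
// the ze_api.h types.
using Status = std::uint32_t;
constexpr Status kSuccess = 0;
constexpr Status kUninitialized = 0x78000001u; // same value as ZE_RESULT_ERROR_UNINITIALIZED

struct Device { int ordinal; };

// Hypothetical setup step: on failure it returns an error code and leaves
// *out untouched (nullptr), much like setup does when no driver is found.
Status findDevice(bool driverPresent, Device** out) {
    static Device gpu{0};
    if (!driverPresent) return kUninitialized;
    *out = &gpu;
    return kSuccess;
}

// Fail-fast shape: stop before any use of the handle if setup failed, rather
// than recording the failure and carrying on (which is what EXPECT_* does).
// Returns the device ordinal, or -1 if the test body was skipped.
int runBody(bool driverPresent) {
    Device* device = nullptr;
    if (findDevice(driverPresent, &device) != kSuccess || device == nullptr) {
        std::fprintf(stderr, "setup failed; not touching the device handle\n");
        return -1;
    }
    return device->ordinal; // safe: device is known to be non-null here
}
```

In a GoogleTest test, the equivalent would be to use `ASSERT_EQ` / `ASSERT_NE` for the setup checks whose results are dereferenced later.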

@mckees
Author

mckees commented Feb 7, 2025

Alright, I did two things to improve my situation:

  1. Installed the NPU UMD, which provides /usr/lib/x86_64-linux-gnu/libze_intel_vpu.so.1
  2. Ran the tests as root. This only became obviously necessary once strace showed that accessing /usr/lib/x86_64-linux-gnu/libze_intel_vpu.so.1 was failing.
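To confirm a permissions problem like the one strace surfaced without running the whole suite as root, one can probe the same path from an unprivileged process with POSIX `access(2)`. A minimal sketch; `readProblem` is a hypothetical helper name, and `access` checks against the process's real user and group IDs, which for an ordinary (non-setuid) process are the invoking user's:

```cpp
#include <unistd.h>  // access, R_OK (POSIX)
#include <cerrno>
#include <cstring>   // std::strerror
#include <string>

// Returns an empty string if the calling process's real user may read
// `path`; otherwise the errno text explaining why not (for example
// "Permission denied" or "No such file or directory").
std::string readProblem(const std::string& path) {
    if (access(path.c_str(), R_OK) == 0) return {};
    return std::strerror(errno);
}
```

Calling `readProblem("/usr/lib/x86_64-linux-gnu/libze_intel_vpu.so.1")` as the normal user and comparing against a root run would show whether the failure is a permission check or a missing file.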

After that, the tests appeared to get stuck, so I did some debugging and found that they were hanging in a call to zeCommandQueueSynchronize. I made the following change to get past that by reducing the timeout:

```diff
diff --git a/test/loader_validation_layer.cpp b/test/loader_validation_layer.cpp
index d04f795..2138d90 100644
--- a/test/loader_validation_layer.cpp
+++ b/test/loader_validation_layer.cpp
@@ -168,26 +180,36 @@ TEST(
     status = zeCommandQueueCreate(context, pDevice, &command_queue_description, &command_queue);
     EXPECT_EQ(ZE_RESULT_SUCCESS, status);
 
+    std::cout << "lvl 6" << std::endl;
+
     status = zeCommandQueueExecuteCommandLists(command_queue, 1, &command_list, nullptr);
     EXPECT_EQ(ZE_RESULT_SUCCESS, status);
+    std::cout << "lvl 6.1" << std::endl;
 
-    status = zeCommandQueueSynchronize(command_queue, UINT64_MAX);
+    status = zeCommandQueueSynchronize(command_queue, 10000000000);
     EXPECT_EQ(ZE_RESULT_SUCCESS, status);
```
From there, I was able to get through all the tests on the i9-12900.

However, on a Core Ultra 7 268V, the test LoaderValidation.GivenLevelZeroLoaderPresentWhenCallingzeCommandListAppendMemoryCopyWithCircularDependencyOnEventsThenValidationLayerPrintsWarningOfDeadlock still hangs at this point in ze_libapi.cpp:

```cpp
ze_result_t ZE_APICALL
zeMemFree(
    ze_context_handle_t hContext,                   ///< [in] handle of the context object
    void* ptr                                       ///< [in][release] pointer to memory to free
    )
{
    if(ze_lib::context->inTeardown) {
        return ZE_RESULT_ERROR_UNINITIALIZED;
    }

    auto pfnFree = ze_lib::context->zeDdiTable.load()->Mem.pfnFree;
    if( nullptr == pfnFree ) {
        if(!ze_lib::context->isInitialized)
            return ZE_RESULT_ERROR_UNINITIALIZED;
        else
            return ZE_RESULT_ERROR_UNSUPPORTED_FEATURE;
    }

    return pfnFree( hContext, ptr ); // <-- GETS STUCK HERE
}
```
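The `zeMemFree` wrapper above does very little before forwarding the call. Below is a simplified model of that dispatch path, assuming stand-in types whose field names echo `ze_libapi.cpp` but which are not the real loader types. It illustrates an inference from the code shown rather than a confirmed diagnosis: every early exit returns immediately, so a hang on the final line points at whatever function is installed in the DDI table (a layer such as the validation layer this test enables, or ultimately the driver), not at the wrapper's own checks.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical, simplified model of the loader's dispatch path. Field names
// echo ze_libapi.cpp, but these are NOT the real loader types.
enum class Result : std::uint32_t {
    Success = 0x00000000u,
    ErrorUninitialized = 0x78000001u,
    ErrorUnsupportedFeature = 0x78000003u,
};

using FreeFn = Result (*)(void* hContext, void* ptr);

struct MemDdi { FreeFn pfnFree = nullptr; };
struct DdiTable { MemDdi Mem; };

struct LibContext {
    bool inTeardown = false;
    bool isInitialized = false;
    std::atomic<const DdiTable*> zeDdiTable{nullptr};
};

// Same decision sequence as zeMemFree above: refuse during teardown, report a
// missing entry point, otherwise forward to the installed function.
// (Unlike the original, this sketch also tolerates a null table pointer.)
Result memFree(LibContext& ctx, void* hContext, void* ptr) {
    if (ctx.inTeardown) return Result::ErrorUninitialized;
    const DdiTable* table = ctx.zeDdiTable.load();
    FreeFn pfnFree = (table != nullptr) ? table->Mem.pfnFree : nullptr;
    if (pfnFree == nullptr) {
        return ctx.isInitialized ? Result::ErrorUnsupportedFeature
                                 : Result::ErrorUninitialized;
    }
    // Everything above returns immediately; any blocking happens inside
    // the forwarded call.
    return pfnFree(hContext, ptr);
}
```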
