Blackhole post-commit tests #3072

Manually triggered January 23, 2025 09:26
Status Failure
Total duration 6h 7m 3s
Artifacts 2

blackhole-post-commit.yaml

on: workflow_dispatch
build-docker-image / build-docker-image (19s)
static-checks / Run Pre-commit Hooks (27s)
static-checks / check-black (2s)
static-checks / check-spdx-licenses (54s)
static-checks / check-metal-kernel-count (6s)
static-checks / check-doc (1m 28s)
static-checks / check-forbidden-imports (5s)
static-checks / check-sweeps-workflow (7s)
static-checks / cmake-version (22s)
umd-unit-tests / blackhole P150-175 (4m 58s)
build-artifact / build-docker-image / build-docker-image
build-artifact / build-artifact (3m 31s)
build-wheels / build-wheel (3m 47s)
Matrix: cpp-unit-tests / cpp-unit-tests
Matrix: sd-unit-tests / cpp-unit-tests-slow-dispatch
Matrix: fd-unit-tests / fd-tests

Annotations

46 errors, 34 warnings, and 26 notices
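
Many of the annotation entries in this section repeat the same message across different jobs, so grouping them by message text makes the dominant failure easier to spot. The following is a minimal sketch of such a tally, not part of the workflow itself. The `SAMPLE` list holds only a few job/message pairs copied from the entries in this section (messages shortened to their first sentence), and the function name `tally_by_message` is hypothetical.

```python
from collections import Counter

# A small hand-copied sample of (job, message) pairs from the annotations.
# This is NOT the full set; messages are shortened to their first sentence.
SAMPLE = [
    ("umd-unit-tests / blackhole P150-175",
     "Process completed with exit code 1."),
    ("sd-unit-tests / blackhole P150-175 device",
     "The self-hosted runner: tt-metal-ci-vm-175 lost communication with the server."),
    ("sd-unit-tests / blackhole P150-175 eth",
     "The self-hosted runner: tt-metal-ci-vm-175 lost communication with the server."),
    ("sd-unit-tests / blackhole P150-175 api",
     "The operation was canceled."),
]


def tally_by_message(annotations):
    """Return (message, count) pairs, most frequent message first."""
    counts = Counter(message for _job, message in annotations)
    return counts.most_common()


if __name__ == "__main__":
    for message, count in tally_by_message(SAMPLE):
        print(f"{count:3d}  {message}")
```

Feeding the full annotation list into the same function would show how many jobs share each failure message.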
umd-unit-tests / blackhole P150-175
Process completed with exit code 1.
sd-unit-tests / blackhole P150-175 All C++
The runner has received a shutdown signal. This can happen when the runner service is stopped, or a manually started runner is canceled.
unsuccessful-reset-startup
Unable to reset board successfully, rebooting
sd-unit-tests / blackhole P150-175 All C++
The operation was canceled.
sd-unit-tests / blackhole P150-175 api
The runner has received a shutdown signal. This can happen when the runner service is stopped, or a manually started runner is canceled.
cards-not-detected-startup
tenstorrent module not in use, usually meaning the driver is not working or the cards aren't detected. Please let the infra team know to check the driver / cards
cards-not-detected-shutting-down-startup
This runner will now shutdown and refuse to run further jobs. Please let the infra team know by filing an issue with the CI job link and tagging them.
sd-unit-tests / blackhole P150-175 api
The operation was canceled.
sd-unit-tests / blackhole P150-175 device
The self-hosted runner: tt-metal-ci-vm-175 lost communication with the server. Verify the machine is running and has a healthy network connection. Anything in your workflow that terminates the runner process, starves it for CPU/Memory, or blocks its network access can cause this error.
sd-unit-tests / blackhole P150-175 debug_tools
The self-hosted runner: tt-metal-ci-vm-175 lost communication with the server. Verify the machine is running and has a healthy network connection. Anything in your workflow that terminates the runner process, starves it for CPU/Memory, or blocks its network access can cause this error.
sd-unit-tests / blackhole P150-175 dispatch
The self-hosted runner: tt-metal-ci-vm-175 lost communication with the server. Verify the machine is running and has a healthy network connection. Anything in your workflow that terminates the runner process, starves it for CPU/Memory, or blocks its network access can cause this error.
sd-unit-tests / blackhole P150-175 distributed
The self-hosted runner: tt-metal-ci-vm-175 lost communication with the server. Verify the machine is running and has a healthy network connection. Anything in your workflow that terminates the runner process, starves it for CPU/Memory, or blocks its network access can cause this error.
sd-unit-tests / blackhole P150-175 eth
The self-hosted runner: tt-metal-ci-vm-175 lost communication with the server. Verify the machine is running and has a healthy network connection. Anything in your workflow that terminates the runner process, starves it for CPU/Memory, or blocks its network access can cause this error.
sd-unit-tests / blackhole P150-175 llk
The runner has received a shutdown signal. This can happen when the runner service is stopped, or a manually started runner is canceled.
unsuccessful-reset-startup
Unable to reset board successfully, rebooting
sd-unit-tests / blackhole P150-175 llk
The operation was canceled.
sd-unit-tests / blackhole P150-175 stl
The runner has received a shutdown signal. This can happen when the runner service is stopped, or a manually started runner is canceled.
cards-not-detected-startup
tenstorrent module not in use, usually meaning the driver is not working or the cards aren't detected. Please let the infra team know to check the driver / cards
cards-not-detected-shutting-down-startup
This runner will now shutdown and refuse to run further jobs. Please let the infra team know by filing an issue with the CI job link and tagging them.
sd-unit-tests / blackhole P150-175 stl
The operation was canceled.
sd-unit-tests / blackhole P150-175 tools
The self-hosted runner: tt-metal-ci-vm-175 lost communication with the server. Verify the machine is running and has a healthy network connection. Anything in your workflow that terminates the runner process, starves it for CPU/Memory, or blocks its network access can cause this error.
sd-unit-tests / blackhole P150-175 user kernel path
The self-hosted runner: tt-metal-ci-vm-175 lost communication with the server. Verify the machine is running and has a healthy network connection. Anything in your workflow that terminates the runner process, starves it for CPU/Memory, or blocks its network access can cause this error.
cpp-unit-tests / All C++ blackhole P150-175
The self-hosted runner: tt-metal-ci-vm-175 lost communication with the server. Verify the machine is running and has a healthy network connection. Anything in your workflow that terminates the runner process, starves it for CPU/Memory, or blocks its network access can cause this error.
cpp-unit-tests / api blackhole P150-175
The self-hosted runner: tt-metal-ci-vm-175 lost communication with the server. Verify the machine is running and has a healthy network connection. Anything in your workflow that terminates the runner process, starves it for CPU/Memory, or blocks its network access can cause this error.
cpp-unit-tests / debug_tools blackhole P150-175
The self-hosted runner: tt-metal-ci-vm-175 lost communication with the server. Verify the machine is running and has a healthy network connection. Anything in your workflow that terminates the runner process, starves it for CPU/Memory, or blocks its network access can cause this error.
cpp-unit-tests / device blackhole P150-175
The self-hosted runner: tt-metal-ci-vm-175 lost communication with the server. Verify the machine is running and has a healthy network connection. Anything in your workflow that terminates the runner process, starves it for CPU/Memory, or blocks its network access can cause this error.
cpp-unit-tests / dispatch blackhole P150-175
The self-hosted runner: tt-metal-ci-vm-175 lost communication with the server. Verify the machine is running and has a healthy network connection. Anything in your workflow that terminates the runner process, starves it for CPU/Memory, or blocks its network access can cause this error.
cpp-unit-tests / dispatch multicmd queue blackhole P150-175
The self-hosted runner: tt-metal-ci-vm-175 lost communication with the server. Verify the machine is running and has a healthy network connection. Anything in your workflow that terminates the runner process, starves it for CPU/Memory, or blocks its network access can cause this error.
cpp-unit-tests / distributed blackhole P150-175
The self-hosted runner: tt-metal-ci-vm-175 lost communication with the server. Verify the machine is running and has a healthy network connection. Anything in your workflow that terminates the runner process, starves it for CPU/Memory, or blocks its network access can cause this error.
cpp-unit-tests / eth blackhole P150-175
The self-hosted runner: tt-metal-ci-vm-175 lost communication with the server. Verify the machine is running and has a healthy network connection. Anything in your workflow that terminates the runner process, starves it for CPU/Memory, or blocks its network access can cause this error.
cpp-unit-tests / llk blackhole P150-175
The self-hosted runner: tt-metal-ci-vm-175 lost communication with the server. Verify the machine is running and has a healthy network connection. Anything in your workflow that terminates the runner process, starves it for CPU/Memory, or blocks its network access can cause this error.
cpp-unit-tests / stl blackhole P150-175
The self-hosted runner: tt-metal-ci-vm-175 lost communication with the server. Verify the machine is running and has a healthy network connection. Anything in your workflow that terminates the runner process, starves it for CPU/Memory, or blocks its network access can cause this error.
cpp-unit-tests / tools blackhole P150-175
The self-hosted runner: tt-metal-ci-vm-175 lost communication with the server. Verify the machine is running and has a healthy network connection. Anything in your workflow that terminates the runner process, starves it for CPU/Memory, or blocks its network access can cause this error.
cpp-unit-tests / ttnn ccl cpp unit tests blackhole P150-175
The self-hosted runner: tt-metal-ci-vm-175 lost communication with the server. Verify the machine is running and has a healthy network connection. Anything in your workflow that terminates the runner process, starves it for CPU/Memory, or blocks its network access can cause this error.
cpp-unit-tests / ttnn cpp unit tests blackhole P150-175
The self-hosted runner: tt-metal-ci-vm-175 lost communication with the server. Verify the machine is running and has a healthy network connection. Anything in your workflow that terminates the runner process, starves it for CPU/Memory, or blocks its network access can cause this error.
cpp-unit-tests / ttnn tensor cpp unit tests blackhole P150-175
The self-hosted runner: tt-metal-ci-vm-175 lost communication with the server. Verify the machine is running and has a healthy network connection. Anything in your workflow that terminates the runner process, starves it for CPU/Memory, or blocks its network access can cause this error.
cpp-unit-tests / user kernel path blackhole P150-175
The self-hosted runner: tt-metal-ci-vm-175 lost communication with the server. Verify the machine is running and has a healthy network connection. Anything in your workflow that terminates the runner process, starves it for CPU/Memory, or blocks its network access can cause this error.
fd-unit-tests / eager trace tests blackhole P150-175
The self-hosted runner: tt-metal-ci-vm-175 lost communication with the server. Verify the machine is running and has a healthy network connection. Anything in your workflow that terminates the runner process, starves it for CPU/Memory, or blocks its network access can cause this error.
fd-unit-tests / eager unit tests 2 blackhole P150-175
The self-hosted runner: tt-metal-ci-vm-175 lost communication with the server. Verify the machine is running and has a healthy network connection. Anything in your workflow that terminates the runner process, starves it for CPU/Memory, or blocks its network access can cause this error.
fd-unit-tests / eager unit tests 3 blackhole P150-175
The self-hosted runner: tt-metal-ci-vm-175 lost communication with the server. Verify the machine is running and has a healthy network connection. Anything in your workflow that terminates the runner process, starves it for CPU/Memory, or blocks its network access can cause this error.
fd-unit-tests / eager unit tests 4 blackhole P150-175
The self-hosted runner: tt-metal-ci-vm-175 lost communication with the server. Verify the machine is running and has a healthy network connection. Anything in your workflow that terminates the runner process, starves it for CPU/Memory, or blocks its network access can cause this error.
fd-unit-tests / eager unit tests 5 blackhole P150-175
The self-hosted runner: tt-metal-ci-vm-175 lost communication with the server. Verify the machine is running and has a healthy network connection. Anything in your workflow that terminates the runner process, starves it for CPU/Memory, or blocks its network access can cause this error.
fd-unit-tests / eager unit tests 6 blackhole P150-175
The self-hosted runner: tt-metal-ci-vm-175 lost communication with the server. Verify the machine is running and has a healthy network connection. Anything in your workflow that terminates the runner process, starves it for CPU/Memory, or blocks its network access can cause this error.
fd-unit-tests / eager unit tests 7 blackhole P150-175
The self-hosted runner: tt-metal-ci-vm-175 lost communication with the server. Verify the machine is running and has a healthy network connection. Anything in your workflow that terminates the runner process, starves it for CPU/Memory, or blocks its network access can cause this error.
fd-unit-tests / eager unit tests 1 blackhole P150-175
The self-hosted runner: tt-metal-ci-vm-175 lost communication with the server. Verify the machine is running and has a healthy network connection. Anything in your workflow that terminates the runner process, starves it for CPU/Memory, or blocks its network access can cause this error.
fd-unit-tests / sweep blackhole P150-175
The self-hosted runner: tt-metal-ci-vm-175 lost communication with the server. Verify the machine is running and has a healthy network connection. Anything in your workflow that terminates the runner process, starves it for CPU/Memory, or blocks its network access can cause this error.
static-checks / check-black
ubuntu-latest pipelines will use ubuntu-24.04 soon. For more details, see https://github.com/actions/runner-images/issues/10636
static-checks / check-forbidden-imports
ubuntu-latest pipelines will use ubuntu-24.04 soon. For more details, see https://github.com/actions/runner-images/issues/10636
static-checks / check-metal-kernel-count
ubuntu-latest pipelines will use ubuntu-24.04 soon. For more details, see https://github.com/actions/runner-images/issues/10636
static-checks / check-sweeps-workflow
ubuntu-latest pipelines will use ubuntu-24.04 soon. For more details, see https://github.com/actions/runner-images/issues/10636
static-checks / Run Pre-commit Hooks
ubuntu-latest pipelines will use ubuntu-24.04 soon. For more details, see https://github.com/actions/runner-images/issues/10636
static-checks / Run Pre-commit Hooks
Unexpected input(s) 'fetch-refs', valid inputs are ['repository', 'ref', 'token', 'ssh-key', 'ssh-known-hosts', 'ssh-strict', 'ssh-user', 'persist-credentials', 'path', 'clean', 'filter', 'sparse-checkout', 'sparse-checkout-cone-mode', 'fetch-depth', 'fetch-tags', 'show-progress', 'lfs', 'submodules', 'set-safe-directory', 'github-server-url']
static-checks / Run Pre-commit Hooks
Unexpected input(s) 'fetch-refs', valid inputs are ['repository', 'ref', 'token', 'ssh-key', 'ssh-known-hosts', 'ssh-strict', 'ssh-user', 'persist-credentials', 'path', 'clean', 'filter', 'sparse-checkout', 'sparse-checkout-cone-mode', 'fetch-depth', 'fetch-tags', 'show-progress', 'lfs', 'submodules', 'set-safe-directory', 'github-server-url']
static-checks / check-spdx-licenses
ubuntu-latest pipelines will use ubuntu-24.04 soon. For more details, see https://github.com/actions/runner-images/issues/10636
static-checks / check-doc
ubuntu-latest pipelines will use ubuntu-24.04 soon. For more details, see https://github.com/actions/runner-images/issues/10636
build-wheels / build-wheel
Your workflow is using a version of actions/cache that is scheduled for deprecation, actions/cache@13aacd865c20de90d75de3b17ebe84f7a17d57d2. Please update your workflow to use either v3 or v4 of actions/cache to avoid interruptions. Learn more: https://github.blog/changelog/2024-12-05-notice-of-upcoming-releases-and-breaking-changes-for-github-actions/#actions-cache-v1-v2-and-actions-toolkit-cache-package-closing-down
build-wheels / build-wheel
Cache paths are empty. Please check the previous logs and make sure that the python version is specified
build-wheels / build-wheel
The `python-version` input is not set. The version of Python currently in `PATH` will be used.
unsuccessful-reset-attempt-startup
Unsuccessful board reset, trying again in 1 minute ...
unsuccessful-reset-attempt-startup
Unsuccessful board reset, trying again in 1 minute ...
unsuccessful-reset-attempt-startup
Unsuccessful board reset, trying again in 1 minute ...
unsuccessful-reset-attempt-startup
Unsuccessful board reset, trying again in 1 minute ...
unsuccessful-reset-attempt-startup
Unsuccessful board reset, trying again in 1 minute ...
unsuccessful-reset-attempt-startup
Unsuccessful board reset, trying again in 1 minute ...
unsuccessful-reset-attempt-startup
Unsuccessful board reset, trying again in 1 minute ...
unsuccessful-reset-attempt-startup
Unsuccessful board reset, trying again in 1 minute ...
unsuccessful-reset-attempt-startup
Unsuccessful board reset, trying again in 1 minute ...
unsuccessful-reset-attempt-startup
Unsuccessful board reset, trying again in 1 minute ...
sd-unit-tests / blackhole P150-175 FD2
Your workflow is using a version of actions/cache that is scheduled for deprecation, actions/cache@13aacd865c20de90d75de3b17ebe84f7a17d57d2. Please update your workflow to use either v3 or v4 of actions/cache to avoid interruptions. Learn more: https://github.blog/changelog/2024-12-05-notice-of-upcoming-releases-and-breaking-changes-for-github-actions/#actions-cache-v1-v2-and-actions-toolkit-cache-package-closing-down
sd-unit-tests / blackhole P150-175 FD2
No files were found with the provided path: ~/run-log/20250123104609_sys_logs.tar. No artifacts will be uploaded.
unsuccessful-reset-attempt-startup
Unsuccessful board reset, trying again in 1 minute ...
unsuccessful-reset-attempt-startup
Unsuccessful board reset, trying again in 1 minute ...
unsuccessful-reset-attempt-startup
Unsuccessful board reset, trying again in 1 minute ...
unsuccessful-reset-attempt-startup
Unsuccessful board reset, trying again in 1 minute ...
unsuccessful-reset-attempt-startup
Unsuccessful board reset, trying again in 1 minute ...
unsuccessful-reset-attempt-startup
Unsuccessful board reset, trying again in 1 minute ...
unsuccessful-reset-attempt-startup
Unsuccessful board reset, trying again in 1 minute ...
unsuccessful-reset-attempt-startup
Unsuccessful board reset, trying again in 1 minute ...
unsuccessful-reset-attempt-startup
Unsuccessful board reset, trying again in 1 minute ...
unsuccessful-reset-attempt-startup
Unsuccessful board reset, trying again in 1 minute ...
disk-usage-after-startup
Disk usage is 26 %
printing-smi-info-startup
Touching and printing out SMI info
attempting-reset-startup
Attempting to reset card(s). Sleeping first
reset-successful-startup
tt-smi reset was successful
hugepages-service-found-startup
Hugepages service found. Command returned with exit code 3. Restarting it so we can ensure hugepages are available
hugepages-setup-success-startup
Hugepages is now setup.
disk-usage-after-startup
Disk usage is 27 %
printing-smi-info-startup
Touching and printing out SMI info
attempting-reset-startup
Attempting to reset card(s). Sleeping first
reset-successful-startup
tt-smi reset was successful
hugepages-service-found-startup
Hugepages service found. Command returned with exit code 3. Restarting it so we can ensure hugepages are available
hugepages-setup-success-startup
Hugepages is now setup.
disk-usage-after-startup
Disk usage is 26 %
disk-usage-after-startup
Disk usage is 26 %
printing-smi-info-startup
Touching and printing out SMI info
attempting-reset-startup
Attempting to reset card(s). Sleeping first
reset-successful-startup
tt-smi reset was successful
hugepages-service-found-startup
Hugepages service found. Command returned with exit code 3. Restarting it so we can ensure hugepages are available
hugepages-setup-success-startup
Hugepages is now setup.
disk-usage-after-startup
Disk usage is 32 %
printing-smi-info-startup
Touching and printing out SMI info
attempting-reset-startup
Attempting to reset card(s). Sleeping first
reset-successful-startup
tt-smi reset was successful
hugepages-service-found-startup
Hugepages service found. Command returned with exit code 3. Restarting it so we can ensure hugepages are available
hugepages-setup-success-startup
Hugepages is now setup.
disk-usage-after-startup
Disk usage is 32 %
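
The startup notices above suggest a reset-and-retry sequence: a failed board reset is retried after a one-minute wait, and after repeated failures the runner reports "Unable to reset board successfully, rebooting". The runner's actual startup script is not shown on this page, so the following is only a sketch of that pattern under stated assumptions. The function `reset_with_retries`, its parameters, and the default of ten attempts (chosen because the retry notice appears in two runs of ten) are all hypothetical.

```python
import time


def reset_with_retries(reset_once, max_attempts=10, delay_seconds=60,
                       sleep=time.sleep):
    """Call reset_once() until it returns True or attempts run out.

    Returns True on success and False if every attempt failed. The caller
    decides what to do on failure (for example, reboot the machine).
    All names and defaults here are assumptions, not the real startup script.
    """
    for _attempt in range(max_attempts):
        if reset_once():
            print("reset was successful")
            return True
        print(f"Unsuccessful board reset, trying again in {delay_seconds} s ...")
        sleep(delay_seconds)
    print("Unable to reset board successfully")
    return False


if __name__ == "__main__":
    # Fake reset that fails twice, then succeeds; the sleep is stubbed out
    # so the example finishes immediately.
    outcomes = iter([False, False, True])
    ok = reset_with_retries(lambda: next(outcomes), max_attempts=5,
                            delay_seconds=60, sleep=lambda _seconds: None)
    print("final result:", ok)
```

Passing the sleep function as a parameter keeps the retry logic testable without waiting for real delays.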

Artifacts

Produced during runtime
Name                          Size
TTMetal_build_any             333 MB
eager-dist-ubuntu-22.04-any   330 MB