bots get stuck in job-runner on failing podman #7474

Open
martinpitt opened this issue Feb 24, 2025 · 1 comment

@martinpitt
Member

Our test queue has become very long and slow recently. Many of the bots often don't do any work any more. The container log usually ends with something like

+ timeout 12h ./run-queue
INFO:root:Consuming message with command: ./job-runner json {"repo": "rhinstaller/anaconda-webui", ...
INFO:lib.aio.job:Log: https://cockpit-logs.us-east-1.linodeobjects.com/pull-657-9f15d9d1-20250224-025303-fedora-42-boot-efi-other/log.html
Error: reading CIDFile: open /tmp/tmpkdsgqcpp/cidfile: no such file or directory

and there is either no corresponding podman container on the host at all (cockpit-tasks-1 is the only one), or it has exited long ago, like

c8d82bebd818  ghcr.io/cockpit-project/tasks:2025-02-22  python3 -c #!/usr...  4 hours ago  Exited (1) 4 hours ago              adoring_bose

I found some as old as 7 hours, which is when I mass-rebooted all bots.
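
To illustrate the failure mode, here is a minimal sketch of how a runner could stop waiting once the container command has exited without ever producing a cidfile. This is not the actual job-runner code, which is not shown in this issue; `wait_for_cid`, `ContainerStartError` and `poll_interval` are hypothetical names, and the sketch assumes the runner starts the container as an asyncio subprocess and learns the container ID from a cidfile.

```python
import asyncio
from pathlib import Path


class ContainerStartError(RuntimeError):
    """The container command exited before a cidfile with content appeared."""


async def wait_for_cid(proc, cidfile: Path, poll_interval: float = 0.1) -> str:
    """Return the container ID from `cidfile`, or raise if `proc` exits first.

    `proc` is an asyncio subprocess (hypothetically the container launcher).
    """
    exited = asyncio.ensure_future(proc.wait())
    try:
        while True:
            # Sample the exit state *before* reading the file: if the process
            # had already exited here, anything it wrote is visible below.
            done = exited.done()
            try:
                cid = cidfile.read_text().strip()
            except FileNotFoundError:
                cid = ""
            if cid:
                return cid
            if done:
                raise ContainerStartError(
                    f"command exited with status {exited.result()} "
                    f"without writing {cidfile}"
                )
            await asyncio.sleep(poll_interval)
    finally:
        # No-op if the wait already finished; otherwise stop the helper task.
        exited.cancel()
```

With podman, the command would presumably be something along the lines of `podman run --cidfile <path> ...`; the point of the sketch is only that the wait is bounded by the lifetime of the launching process, so a failed start surfaces as an error rather than an idle bot.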

@martinpitt martinpitt moved this to next in Pilot tasks Feb 24, 2025
@allisonkarlitskaya allisonkarlitskaya self-assigned this Feb 25, 2025
@martinpitt
Member Author

I haven't seen this any more in the last two days. It was somehow magically triggered by rhinstaller/anaconda-webui#655, but it has now disappeared just as mysteriously as it arrived.
