Users Suspending Jobsets When Using Kueue #788

Open
valayDave opened this issue Feb 20, 2025 · 0 comments

When JobSets are deployed in an environment with Kueue, users cannot suspend a JobSet that Kueue has admitted as a Workload. Kueue manages suspending and unsuspending JobSets, so when a user suspends one explicitly (i.e. patches the JobSet to suspend: true via kubectl), the change is effectively ignored and the JobSet keeps running: Kueue unsuspends it again in its next reconciliation cycle.
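To make the behavior concrete, here is a toy model of the interaction described above. It is only a sketch: the `JobSet` and `Workload` classes and the `reconcile` function are hypothetical stand-ins, not Kueue's actual code or API, and it assumes the controller derives the suspend flag purely from whether the Workload is admitted.

```python
from dataclasses import dataclass


@dataclass
class JobSet:
    """Toy stand-in for a JobSet; only the suspend flag is modeled."""
    name: str
    suspend: bool = True  # assumed: a queued JobSet starts out suspended


@dataclass
class Workload:
    """Toy stand-in for the Workload that tracks the JobSet."""
    admitted: bool = False


def reconcile(jobset: JobSet, workload: Workload) -> None:
    """Hypothetical reconcile step: admission state alone decides `suspend`."""
    jobset.suspend = not workload.admitted


def user_suspend(jobset: JobSet) -> None:
    """Models a user patching the JobSet to suspend: true."""
    jobset.suspend = True


js = JobSet(name="demo")
wl = Workload(admitted=True)

reconcile(js, wl)              # Workload is admitted, so the JobSet is unsuspended
user_suspend(js)               # the user suspends the JobSet by hand
after_patch = js.suspend       # the patch itself is applied

reconcile(js, wl)              # next reconciliation cycle
after_reconcile = js.suspend   # the controller has reverted the user's change
```

In this model the user's patch only survives until the next reconcile, because nothing in the Workload records that the user asked for a suspension.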

We currently handle this by crashing the containers in the pods (if the job is active) or deleting the JobSet (if it is not active). This strategy is not optimal, especially the deletion, since we need to keep those objects around for some time as an audit trail. Another alternative I explored was setting the replica count to 0, but we would rather not do that either, since it also interferes with the audit trail. Is there some way to explicitly crash the JobSet, both when it is pending and when it is running, so that Kueue doesn't interfere and treats it as terminated?
