When JobSets are deployed in an environment with Kueue, users cannot suspend JobSets that have been admitted as a Workload by Kueue. Kueue manages suspending and unsuspending JobSets, so any JobSet suspended by a user (i.e. the user explicitly patches the JobSet to `suspend: true` via kubectl) is ignored and keeps running, because Kueue unsuspends that JobSet in its next reconciliation cycle.
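For reference, the user-side suspend described above amounts to a JSON merge patch on the JobSet's `spec.suspend` field. The sketch below only builds that patch body and the equivalent kubectl command. It is a minimal illustration, assuming the standard JobSet CRD (singular resource name `jobset`); the helper `suspend_patch` and the JobSet name `my-jobset` are hypothetical.

```python
import json


def suspend_patch(suspend: bool = True) -> str:
    """Build a JSON merge-patch body that sets spec.suspend on a JobSet."""
    return json.dumps({"spec": {"suspend": suspend}})


def kubectl_suspend_command(name: str, suspend: bool = True) -> str:
    """Return the kubectl command a user might run to apply the patch."""
    return f"kubectl patch jobset {name} --type=merge -p '{suspend_patch(suspend)}'"


if __name__ == "__main__":
    # "my-jobset" is a hypothetical JobSet name used only for illustration.
    print(kubectl_suspend_command("my-jobset"))
```

As described in this issue, once Kueue has admitted the corresponding Workload, this change does not stick: Kueue sets `spec.suspend` back to `false` on its next reconciliation.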
The current way we handle this is by crashing the containers in the pods (if the job is active) or deleting the JobSet (if it's not active). This strategy is not optimal, especially the deletion part, since we need to keep those objects around for some time as an audit trail. Another alternative I have explored is setting the replica count to 0, but we'd prefer not to do that either, since it also interferes with the audit trail. Is there some way to explicitly crash the JobSet (both when it's pending and when it's running) such that Kueue doesn't interfere and considers it terminated?