Preemption doesn't happen for guaranteed resource #4028
Comments
I haven't run the experiment, but my first thought is that job1 also needs to borrow cpu: it requests 2 cpu, which the nominal quota of CQ1 does not provide. Preemption while borrowing (preemption.borrowWithinCohort) is not enabled, so this seems to be working as intended (WAI).
cc @gabesaba as related to scheduling
It's a little confusing that a job without any guaranteed resources can't be preempted by a job with a guaranteed resource. I would understand it if, for example, CQ2 had a cpu guarantee and CQ1 an nvidia.com/gpu guarantee, because then the scheduler could not decide which one wins without some resource priority algorithm.
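To make the borrowing argument concrete, here is a minimal sketch of the check described above. It is a simplified model and not Kueue's actual code: the function name `resources_needing_borrow` is hypothetical, a resource missing from the nominal quota is assumed to count as 0, and job1's gpu request of 1 is inferred from the later comment that its gpu usage would equal CQ1's nominal quota.

```python
def resources_needing_borrow(requests, nominal, usage=None):
    """Return {resource: amount} for every resource whose request would
    push the queue's usage above its nominal quota.

    A resource absent from `nominal` is treated as having quota 0
    (an assumption made for this sketch).
    """
    usage = usage or {}
    over = {}
    for name, amount in requests.items():
        total = usage.get(name, 0) + amount
        quota = nominal.get(name, 0)
        if total > quota:
            over[name] = total - quota
    return over


# CQ1 guarantees only one GPU; job1 also asks for 2 cpu.
cq1_nominal = {"nvidia.com/gpu": 1}
job1_requests = {"cpu": 2, "nvidia.com/gpu": 1}

print(resources_needing_borrow(job1_requests, cq1_nominal))
```

Under these assumptions the gpu request fits inside the nominal quota, but the cpu request does not, so job1 as a whole counts as a borrowing workload.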
What do you mean "job with guaranteed resource"? job1 requires "cpu" which is not provided by nominal quota of CQ1.
Due to the need to borrow "cpu" the
The gpu is provided by CQ1; cpu/ram are not. I expect Kueue to provide the gpu for job1 in best-effort mode. Let me explain what I mean: when we submit job2, the usage of CQ2 is above nominal quota for all resources. After job1 is submitted, gpu usage in CQ1 would equal CQ1's nominal quota, so Kueue should provide it if possible. Kueue should scan all CQs in the cohort, and if it finds a CQ (in our case CQ2) that has borrowed resources and whose usage would still be equal to or above its nominal quotas after preemption, it should preempt that workload (in our case job2). In other words, the resources requested by job1 are more guaranteed than those requested by job2.
If this should be handled by the borrowWithinCohort parameter, fine, but it currently supports only borrowing based on workload priority, not on quota conditions.
What happened: There are two CQs for teams, without guaranteed cpu/ram resources, and one CQ for capacity management. CQ1 is configured with nominalQuota=1 for the nvidia.com/gpu resource. I applied job2 to CQ2 (which has no guaranteed resources) and then job1 to CQ1, and preemption didn't happen for job1. Message from WL status
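The rule proposed in this comment can be sketched as follows. This only illustrates the suggestion; it is not how Kueue selects preemption targets. The `Workload` and `ClusterQueue` classes and `preemption_candidates` are hypothetical names, job2's requests are made-up values (the thread does not state them), and a resource missing from a queue's nominal quota is assumed to be 0.

```python
from dataclasses import dataclass, field


@dataclass
class Workload:
    name: str
    requests: dict  # resource name -> requested amount


@dataclass
class ClusterQueue:
    name: str
    nominal: dict  # resource name -> nominal quota
    admitted: list = field(default_factory=list)

    def usage(self):
        """Sum the requests of all admitted workloads per resource."""
        total = {}
        for wl in self.admitted:
            for res, amount in wl.requests.items():
                total[res] = total.get(res, 0) + amount
        return total


def preemption_candidates(cohort, requester):
    """Return (queue, workload) pairs that the proposed rule would allow
    the requesting queue to preempt.

    A workload qualifies when its queue is borrowing and, for every
    resource the workload uses, the queue's usage after removing it is
    still equal to or above the nominal quota.
    """
    candidates = []
    for cq in cohort:
        if cq.name == requester:
            continue
        usage = cq.usage()
        borrowing = any(used > cq.nominal.get(res, 0) for res, used in usage.items())
        if not borrowing:
            continue
        for wl in cq.admitted:
            if all(usage[res] - amount >= cq.nominal.get(res, 0)
                   for res, amount in wl.requests.items()):
                candidates.append((cq.name, wl.name))
    return candidates


# job2's requests are hypothetical; CQ2 has no nominal quota at all.
cq1 = ClusterQueue("CQ1", {"nvidia.com/gpu": 1})
cq2 = ClusterQueue("CQ2", {}, [Workload("job2", {"cpu": 2, "nvidia.com/gpu": 1})])

print(preemption_candidates([cq1, cq2], requester="CQ1"))
```

With these inputs the only candidate is job2 in CQ2: everything it uses is borrowed, so removing it does not take CQ2 below its nominal quota.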
What you expected to happen:
job1 will preempt job2.
How to reproduce it (as minimally and precisely as possible):
CQ1
CQ2
CQ3
job1
job2
Anything else we need to know?:
Environment:
Kubernetes version (kubectl version): v1.29.0
Kueue version (git describe --tags --dirty --always): 0.10.0
OS (cat /etc/os-release): Debian 11
Kernel (uname -a): 6.1.0-0.deb11.17-amd64