
Preemption doesn't happen for guaranteed resource #4028

Open
and-1 opened this issue Jan 21, 2025 · 5 comments
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

and-1 commented Jan 21, 2025

What happened: There are two CQs for teams, without guaranteed cpu/memory resources, and one CQ for capacity management. CQ1 is configured with nominalQuota=1 for the resource nvidia.com/gpu. I submit job2 to CQ2 (which has no guaranteed resources) and then job1 to CQ1, but preemption doesn't happen for job1. Message from the Workload status:

message: 'couldn''t assign flavors to pod set main: insufficient unused quota
      for cpu in flavor a100, 2 more needed, insufficient unused quota for memory
      in flavor a100, 2Gi more needed'

What you expected to happen:
job1 should preempt job2.

How to reproduce it (as minimally and precisely as possible):

CQ1

apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: "team1"
spec:
  cohort: default
  namespaceSelector: {}
  preemption:
    reclaimWithinCohort: Any
    borrowWithinCohort:
      policy: Never
    withinClusterQueue: LowerPriority
  queueingStrategy: BestEffortFIFO
  flavorFungibility:
    whenCanBorrow: TryNextFlavor
    whenCanPreempt: TryNextFlavor
  resourceGroups:
  - coveredResources: ["cpu", "memory", "nvidia.com/gpu"]
    flavors:
    - name: "a100"
      resources:
      - name: "cpu"
        nominalQuota: 0
      - name: "memory"
        nominalQuota: 0
      - name: "nvidia.com/gpu"
        nominalQuota: 1

CQ2

apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: "team2"
spec:
  cohort: default
  namespaceSelector: {}
  preemption:
    reclaimWithinCohort: Any
    borrowWithinCohort:
      policy: Never
    withinClusterQueue: LowerPriority
  queueingStrategy: BestEffortFIFO
  flavorFungibility:
    whenCanBorrow: TryNextFlavor
    whenCanPreempt: TryNextFlavor
  resourceGroups:
  - coveredResources: ["cpu", "memory", "nvidia.com/gpu"]
    flavors:
    - name: "a100"
      resources:
      - name: "cpu"
        nominalQuota: 0
      - name: "memory"
        nominalQuota: 0
      - name: "nvidia.com/gpu"
        nominalQuota: 0

CQ3

apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: "unallocated-resources"
spec:
  cohort: default
  namespaceSelector: {}
  resourceGroups:
  - coveredResources: ["cpu", "memory", "nvidia.com/gpu"]
    flavors:
    - name: "a100"
      resources:
      - name: "cpu"
        nominalQuota: 2
      - name: "memory"
        nominalQuota: 2Gi
      - name: "nvidia.com/gpu"
        nominalQuota: 1
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: "a100"
spec:
  nodeLabels:
    flavor.kueue.x-k8s.io/a100: "true"
  tolerations:
  - effect: NoSchedule
    key: flavor.kueue.x-k8s.io/a100
    operator: "Exists"
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
  name: "team1-lq"
spec:
  clusterQueue: "team1"
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
  name: "team2-lq"
spec:
  clusterQueue: "team2"

job1

apiVersion: batch/v1
kind: Job
metadata:
  name: job1
  labels:
    kueue.x-k8s.io/queue-name: team1-lq
spec:
  parallelism: 1
  completions: 1
  template:
    spec:
      containers:
      - name: dummy-job
        image: gcr.io/k8s-staging-perf-tests/sleep:v0.1.0
        args: ["10m"]
        resources:
          limits:
            cpu: 2
            memory: "2Gi"
            nvidia.com/gpu: 1
      restartPolicy: Never

job2

apiVersion: batch/v1
kind: Job
metadata:
  name: job2
  labels:
    kueue.x-k8s.io/queue-name: team2-lq
spec:
  parallelism: 1
  completions: 1
  template:
    spec:
      containers:
      - name: dummy-job
        image: gcr.io/k8s-staging-perf-tests/sleep:v0.1.0
        args: ["10m"]
        resources:
          limits:
            cpu: 2
            memory: "2Gi"
            nvidia.com/gpu: 1
      restartPolicy: Never

Anything else we need to know?:

Environment:

  • Kubernetes version (use kubectl version): v1.29.0
  • Kueue version (use git describe --tags --dirty --always): 0.10.0
  • OS (e.g: cat /etc/os-release): Debian 11
  • Kernel (e.g. uname -a): 6.1.0-0.deb11.17-amd64
and-1 added the kind/bug label Jan 21, 2025
mimowo (Contributor) commented Jan 21, 2025

I haven't run the experiment, but my first thought is that job1 also needs to borrow (cpu, because it requests 2 cpu, which is not covered by CQ1's nominal quota), and preemption while borrowing is not enabled (preemption.borrowWithinCohort is Never), so this seems WAI (working as intended).
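For context, borrowWithinCohort is the ClusterQueue preemption field that allows a workload to preempt even when admitting it requires borrowing. A minimal sketch of what CQ1's preemption block would look like with it enabled (the LowerPriority policy and the threshold value are illustrative, not taken from this issue):

```yaml
# Hypothetical variant of CQ1's preemption config: borrowWithinCohort
# enabled so job1 could preempt even though it must borrow cpu/memory.
preemption:
  reclaimWithinCohort: Any
  borrowWithinCohort:
    policy: LowerPriority      # instead of Never; required for preemption while borrowing
    maxPriorityThreshold: 100  # illustrative: only workloads at or below this priority are candidates
  withinClusterQueue: LowerPriority
```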

mimowo (Contributor) commented Jan 21, 2025

cc @gabesaba as this is related to scheduling

and-1 (Author) commented Jan 21, 2025

It's a little bit confusing that a job without any guaranteed resources can't be preempted by a job with a guaranteed resource. That would be expected, for example, in a case where CQ2 has a cpu guarantee and CQ1 an nvidia.com/gpu guarantee, and the scheduler could not decide which one wins without some resource-priority algorithm.
Also, I thought my case was handled by the preemption.reclaimWithinCohort parameter.

mimowo (Contributor) commented Jan 22, 2025

a job without any guaranteed resources can't be preempted by a job with a guaranteed resource.

What do you mean by "job with guaranteed resource"? job1 requires "cpu", which is not provided by CQ1's nominal quota.

Also, I thought my case was handled by the preemption.reclaimWithinCohort parameter.

Due to the need to borrow "cpu", job1 needs to borrow, and as such it cannot preempt, not even reclaim, unless preemption.borrowWithinCohort is enabled.

and-1 (Author) commented Jan 25, 2025

What do you mean by "job with guaranteed resource"? job1 requires "cpu", which is not provided by CQ1's nominal quota.

The GPU is provided by CQ1; cpu/memory are not. I expect Kueue to provide the GPU for job1 in best-effort mode. Let me try to explain what I mean: when we submit job2, CQ2's usage (for all resources) goes above its nominal quota. After we submit job1, CQ1's GPU usage equals its nominal quota, so the GPU should be provided by Kueue (if possible). Kueue should scan all CQs in the cohort, and if it finds a CQ (in our case CQ2) with borrowed resources whose usage would still be at or above its nominal quota after preemption, it should preempt that workload (in our case job2). In other words, job1 requests resources that are more guaranteed than the resources of job2.

Due to the need to borrow "cpu", job1 needs to borrow, and as such it cannot preempt, not even reclaim, unless preemption.borrowWithinCohort is enabled.

If this should be handled by the borrowWithinCohort parameter, OK, but right now borrowing preemption is supported only based on workload priority, not based on quota conditions.
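Since borrowWithinCohort preemption is keyed on workload priority, one possible workaround (a sketch assuming Kueue's WorkloadPriorityClass API; the class name and value here are illustrative) would be to give workloads from the team queues a higher workload priority than workloads submitted without guarantees:

```yaml
# Illustrative WorkloadPriorityClass; jobs from the team queues would
# reference it via the label kueue.x-k8s.io/priority-class: team-guaranteed
apiVersion: kueue.x-k8s.io/v1beta1
kind: WorkloadPriorityClass
metadata:
  name: team-guaranteed    # illustrative name
value: 1000                # higher than the default (0) that job2 would get
description: "Priority for jobs submitted against guaranteed quota"
```

Combined with a borrowWithinCohort policy of LowerPriority on CQ1, job1 would then be allowed to preempt job2 while borrowing cpu/memory.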
