Use pod group instead of PDB for gang scheduling #916
Now that kubernetes/enhancements#639 has been merged, I think it's time to replace PDB with PodGroup. In addition, it will also support phase/condition/status of PodGroup, e.g. unschedulable, which is helpful for lifecycle management of TFJob. |
If there are no objections, I'd like to create a PR for that :) |
@k82cn I know kube-batch uses the namespace as the queue by default, but it also supports creating a Queue CRD and specifying the queue in the pod group. It would be great if I could specify the queue for my TFJob. To do that we can add to field called |
I'm OK with that; for |
xref kubernetes-retired/kube-batch#465. @adam-marek and I talked about Queue support in tf-operator; it needs the community's feedback on Or maybe we can introduce an annotation as an alpha feature. @jlewi, WDYT? |
@richardsliu @johnugeorge WDYT? |
@zionwu Isn't the queue name part of the PodGroupSpec? https://github.com/kubernetes-sigs/kube-batch/blob/master/pkg/apis/scheduling/v1alpha1/types.go#L116 In that case, why do we need a separate queue field after we set the PodGroup for the TFJob? |
Agree with @johnugeorge on the queue name. I would like to keep k8s details out of the TFJob spec if possible. Ideally, TFJob should contain just TF-specific fields. |
@johnugeorge, @richardsliu, just like the PDB, the PodGroup will be created by TF-Operator. If the user can only specify the queue name in the PodGroup, the process to specify the queue name is:
I think the queue name should be decided when the PodGroup is created. Since the PodGroup is created by TF-Operator, the only way a user can specify the queue name is by setting it in the TFJob spec; TF-Operator can then set the queue name on the PodGroup accordingly. |
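The flow described in this comment can be sketched in Go as follows. This is only an illustration under stated assumptions: `PodGroup`, `PodGroupSpec`, and `TFJob` are simplified local stand-ins rather than the real kube-batch or tf-operator types, the `Queue` field on `TFJob` is hypothetical, and `newPodGroupForJob` is a made-up helper name.

```go
package main

import "fmt"

// PodGroupSpec is a local stand-in for the kube-batch PodGroupSpec.
// Only the two fields needed for this sketch are modeled.
type PodGroupSpec struct {
	MinMember int32  // minimum number of pods that must be schedulable together
	Queue     string // queue the group is submitted to
}

// PodGroup is a local stand-in for the PodGroup resource.
type PodGroup struct {
	Name      string
	Namespace string
	Spec      PodGroupSpec
}

// TFJob is a simplified stand-in for the TFJob resource.
// The Queue field is hypothetical; it is the addition discussed in the thread.
type TFJob struct {
	Name      string
	Namespace string
	Replicas  map[string]int32 // replica type -> replica count
	Queue     string
}

// newPodGroupForJob (hypothetical helper) builds the PodGroup the operator
// would create for a job, copying the queue name from the job.
func newPodGroupForJob(job TFJob) PodGroup {
	var total int32
	for _, n := range job.Replicas {
		total += n
	}
	return PodGroup{
		Name:      job.Name,
		Namespace: job.Namespace,
		Spec: PodGroupSpec{
			// Gang scheduling: require all replicas of the job.
			MinMember: total,
			// An empty value is passed through unchanged; this sketch
			// assumes the scheduler then applies its own default.
			Queue: job.Queue,
		},
	}
}

func main() {
	job := TFJob{
		Name:      "mnist",
		Namespace: "team-a",
		Replicas:  map[string]int32{"PS": 1, "Worker": 3},
		Queue:     "research",
	}
	fmt.Printf("%+v\n", newPodGroupForJob(job))
}
```

The point of the sketch is that the user never touches the PodGroup directly: the only input is the job, and the operator derives both the member count and the queue from it.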
Is pod group a standard resource in K8s? I am not sure what will happen if the user does not register the resource. |
Not for now; the CRD should be deployed together with kube-batch. If failed to create |
If tf-operator needs resource fair-sharing from the scheduler, |
@zionwu I am a little skeptical about adding non-TF fields to the TFJob API. We would then be forced to change the TFJob API whenever the upstream implementation changes. As @k82cn suggested, how about considering annotations for now? @gaocegege @richardsliu |
Would it be possible to introduce a field in https://github.com/kubeflow/tf-operator/blob/master/pkg/apis/common/v1beta1/common_types.go? For example something like "SchedulingPolicy". That way all operators can then share the same implementation. |
+1 ; in the previous version of gang-scheduling design doc, I proposed to introduce |
/area 0.5.0 |
@k82cn @richardsliu @gaocegege What about v1beta2: should we still ask users to add schedulerName to each replica (pod) spec, or should we switch to an annotation to indicate the scheduler? |
@ChanYiLin Ref #920. We decided to simplify the process. |
@k82cn Are you working on this? |
yes, i'm working on this one :) |
@k82cn Any updates on this? |
@k82cn Is this still on track to be finished in 0.5.0? We are trying to reach code complete by 3/15. |
@richardsliu, sorry for the delayed response. Let me try to submit a PR this week. |
@thandayuthapani @k82cn Any more pending work for this item? This was automatically closed by k8s bot. |
@richardsliu Yeah, on the v1beta2 side, it uses PodGroup instead of PDB for gang scheduling. |
@thandayuthapani Are you working on SchedulingPolicy? (From #916 (comment) ) |
@richardsliu , no more items for 0.5 release
@johnugeorge , we'll do some investigation after 0.5 release. |
@johnugeorge @k82cn @richardsliu Is there more work to be done before we close out this issue? If there's more work can we clarify what work is needed and if anyone is planning on taking it on? |
@k82cn Is there any more we want to do for 0.6? Note that if the scheduling policy changes need to go in the common API repo, then it can wait until after 0.7. |
It is proposed in common. |
/kind feature |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
The latest version of kube-batch uses pod group instead of PDB.
With the pod group, the user can specify the queue for a job, even when jobs are in different namespaces. With the PDB, we can't specify the queue; it defaults to the job's namespace.
PDB is still supported in kube-batch for backward compatibility; however, it will be removed when v1alpha1 is finalized. I think tf-operator should support pod group for more powerful scheduling.