KEP-672: Implement the DependsOn API (#740)
* Add DependsOn API

* Add JobSet controller changes

* Add integration tests for DependsOn CEL validation

* Add unit tests for DependsOn

* Add controller integration tests for DependsOn

* Fix go lint

* Add test case with DependsOn and StartupPolicy: AnyOrder
Improve API docs

* Test case when job-2 depends on job-1 and job-3 depends on job-2

* Add manifests to the make generate

* Add E2E test for the DependsOn API

* Rename var to DependencyReady and DependencyComplete
Rename func to dependencyReachedStatus

* Update docs and add DependsOn example

* Use startupProbe for launcher

* Remove DependsOn rules from the docs

* Add E2Es for Kubeflow usecases with DependsOn

* Refactor dependencyReachedStatus to accept rJob and rJobReplicas
Add info for Suspended Job

* Don't check idx in webhook
Improve docs

* Add comment for e2e

* Improve integration tests

* Run generate

* Update test/integration/controller/jobset_controller_test.go

---------

Co-authored-by: Abdullah Gharaibeh <[email protected]>
andreyvelich and ahg-g authored Jan 27, 2025
1 parent 7bbc954 commit ec94252
Showing 46 changed files with 1,644 additions and 168 deletions.
2 changes: 1 addition & 1 deletion Makefile
@@ -90,7 +90,7 @@ manifests: controller-gen ## Generate WebhookConfiguration, ClusterRole and Cust
paths="./pkg/..."

.PHONY: generate
-generate: controller-gen code-generator openapi-gen ## Generate code containing DeepCopy, DeepCopyInto, and DeepCopyObject method implementations and client-go libraries.
+generate: manifests controller-gen code-generator openapi-gen ## Generate code containing DeepCopy, DeepCopyInto, and DeepCopyObject method implementations and client-go libraries.
$(CONTROLLER_GEN) object:headerFile="hack/boilerplate.go.txt" paths="./api/..."
./hack/update-codegen.sh $(GO_CMD) $(PROJECT_DIR)/bin
./hack/python-sdk/gen-sdk.sh
41 changes: 41 additions & 0 deletions api/jobset/v1alpha2/jobset_types.go
@@ -79,6 +79,8 @@ const (
)

// JobSetSpec defines the desired state of JobSet
// +kubebuilder:validation:XValidation:rule="!(has(self.startupPolicy) && self.startupPolicy.startupPolicyOrder == 'InOrder' && self.replicatedJobs.exists(x, has(x.dependsOn)))",message="StartupPolicy and DependsOn APIs are mutually exclusive"
// +kubebuilder:validation:XValidation:rule="!(has(self.replicatedJobs[0].dependsOn))",message="DependsOn can't be set for the first ReplicatedJob"
type JobSetSpec struct {
// ReplicatedJobs is the group of jobs that will form the set.
// +listType=map
@@ -105,6 +107,7 @@ type JobSetSpec struct {
FailurePolicy *FailurePolicy `json:"failurePolicy,omitempty"`

// StartupPolicy, if set, configures in what order jobs must be started
// Deprecated: StartupPolicy is deprecated, please use the DependsOn API.
// +kubebuilder:validation:XValidation:rule="self == oldSelf",message="Value is immutable"
StartupPolicy *StartupPolicy `json:"startupPolicy,omitempty"`

@@ -230,8 +233,46 @@ type ReplicatedJob struct {
// Jobs names will be in the format: <jobSet.name>-<spec.replicatedJob.name>-<job-index>
// +kubebuilder:default=1
Replicas int32 `json:"replicas,omitempty"`

// DependsOn is an optional list that specifies the preceding ReplicatedJobs upon which
// the current ReplicatedJob depends. If specified, the ReplicatedJob will be created
// only after the referenced ReplicatedJobs reach their desired state.
// The order of ReplicatedJobs is defined by their enumeration in the slice.
// Note that the first ReplicatedJob in the slice cannot use the DependsOn API.
// Currently, only a single item is supported in the DependsOn list.
// If the JobSet is suspended, all active ReplicatedJobs will be suspended. When the JobSet is
// resumed, the Job sequence starts again.
// This API is mutually exclusive with the StartupPolicy API.
// +kubebuilder:validation:XValidation:rule="self == oldSelf",message="Value is immutable"
// +kubebuilder:validation:MaxItems=1
// +optional
// +listType=map
// +listMapKey=name
DependsOn []DependsOn `json:"dependsOn,omitempty"`
}

// DependsOn defines the dependency on the previous ReplicatedJob status.
type DependsOn struct {
// Name of the previous ReplicatedJob.
Name string `json:"name"`

// Status defines the condition for the ReplicatedJob. Only Ready or Complete status can be set.
// +kubebuilder:validation:Enum=Ready;Complete
Status DependsOnStatus `json:"status"`
}

type DependsOnStatus string

const (
// DependencyReady means the sum of the Ready, Succeeded, and Failed counters
// equals the number of child Jobs of the referenced ReplicatedJob.
DependencyReady DependsOnStatus = "Ready"

// DependencyComplete means the Succeeded counter
// equals the number of child Jobs of the referenced ReplicatedJob.
DependencyComplete DependsOnStatus = "Complete"
)

type Network struct {
// EnableDNSHostnames allows pods to be reached via their hostnames.
// Pods will be reachable using the fully qualified pod hostname:
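
Note on the two statuses above: the doc comments define Ready and Complete purely in terms of counters, so the check can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the controller code from this commit. The `jobCounters` struct, the `dependencyReached` helper, and the names "initializer" and "launcher" are hypothetical; only `DependsOn`, `DependsOnStatus`, `DependencyReady`, and `DependencyComplete` come from the diff.

```go
package main

import "fmt"

// DependsOnStatus mirrors the enum added in this commit.
type DependsOnStatus string

const (
	DependencyReady    DependsOnStatus = "Ready"
	DependencyComplete DependsOnStatus = "Complete"
)

// DependsOn mirrors the API type from the diff (JSON tags omitted).
type DependsOn struct {
	Name   string
	Status DependsOnStatus
}

// jobCounters is a hypothetical stand-in for the per-ReplicatedJob
// status counters that the doc comments refer to.
type jobCounters struct {
	Name      string
	Ready     int32
	Succeeded int32
	Failed    int32
}

// dependencyReached is a hypothetical helper (not the controller's
// dependencyReachedStatus). It reports whether the ReplicatedJob named in
// dep has reached the requested status, given its replica count.
func dependencyReached(dep DependsOn, replicas int32, statuses []jobCounters) bool {
	for _, s := range statuses {
		if s.Name != dep.Name {
			continue
		}
		switch dep.Status {
		case DependencyReady:
			// Ready + Succeeded + Failed must equal the number of child Jobs.
			return s.Ready+s.Succeeded+s.Failed == replicas
		case DependencyComplete:
			// Every child Job must have succeeded.
			return s.Succeeded == replicas
		}
		return false // unknown status value
	}
	return false // no counters recorded yet for the referenced ReplicatedJob
}

func main() {
	// Assumed counters: "initializer" has 2 child Jobs, "launcher" has 1.
	statuses := []jobCounters{
		{Name: "initializer", Succeeded: 2},
		{Name: "launcher", Ready: 1},
	}
	fmt.Println(dependencyReached(DependsOn{Name: "initializer", Status: DependencyComplete}, 2, statuses)) // true
	fmt.Println(dependencyReached(DependsOn{Name: "launcher", Status: DependencyReady}, 1, statuses))       // true
	fmt.Println(dependencyReached(DependsOn{Name: "launcher", Status: DependencyComplete}, 1, statuses))    // false
}
```

Because Failed Jobs count toward Ready, a dependency on Ready is satisfied even when some child Jobs of the referenced ReplicatedJob have failed, while Complete requires all of them to succeed.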
57 changes: 55 additions & 2 deletions api/jobset/v1alpha2/openapi_generated.go

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

20 changes: 20 additions & 0 deletions api/jobset/v1alpha2/zz_generated.deepcopy.go

48 changes: 48 additions & 0 deletions client-go/applyconfiguration/jobset/v1alpha2/dependson.go

20 changes: 17 additions & 3 deletions client-go/applyconfiguration/jobset/v1alpha2/replicatedjob.go

2 changes: 2 additions & 0 deletions client-go/applyconfiguration/utils.go

48 changes: 46 additions & 2 deletions config/components/crd/bases/jobset.x-k8s.io_jobsets.yaml
@@ -200,6 +200,43 @@ spec:
set.
items:
properties:
dependsOn:
description: |-
DependsOn is an optional list that specifies the preceding ReplicatedJobs upon which
the current ReplicatedJob depends. If specified, the ReplicatedJob will be created
only after the referenced ReplicatedJobs reach their desired state.
The order of ReplicatedJobs is defined by their enumeration in the slice.
Note that the first ReplicatedJob in the slice cannot use the DependsOn API.
Currently, only a single item is supported in the DependsOn list.
If the JobSet is suspended, all active ReplicatedJobs will be suspended. When the JobSet is
resumed, the Job sequence starts again.
This API is mutually exclusive with the StartupPolicy API.
items:
description: DependsOn defines the dependency on the previous
ReplicatedJob status.
properties:
name:
description: Name of the previous ReplicatedJob.
type: string
status:
description: Status defines the condition for the ReplicatedJob.
Only Ready or Complete status can be set.
enum:
- Ready
- Complete
type: string
required:
- name
- status
type: object
maxItems: 1
type: array
x-kubernetes-list-map-keys:
- name
x-kubernetes-list-type: map
x-kubernetes-validations:
- message: Value is immutable
rule: self == oldSelf
name:
description: |-
Name is the name of the entry and will be used as a suffix
@@ -8976,8 +9013,9 @@ spec:
- name
x-kubernetes-list-type: map
startupPolicy:
-description: StartupPolicy, if set, configures in what order jobs
-must be started
+description: |-
+StartupPolicy, if set, configures in what order jobs must be started
+Deprecated: StartupPolicy is deprecated, please use the DependsOn API.
properties:
startupPolicyOrder:
description: |-
@@ -9039,6 +9077,12 @@ spec:
minimum: 0
type: integer
type: object
x-kubernetes-validations:
- message: StartupPolicy and DependsOn APIs are mutually exclusive
rule: '!(has(self.startupPolicy) && self.startupPolicy.startupPolicyOrder
== ''InOrder'' && self.replicatedJobs.exists(x, has(x.dependsOn)))'
- message: DependsOn can't be set for the first ReplicatedJob
rule: '!(has(self.replicatedJobs[0].dependsOn))'
status:
description: JobSetStatus defines the observed state of JobSet
properties:
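
Note on the validation rules in this CRD: the two `x-kubernetes-validations` entries and the `maxItems: 1` constraint can be read as three plain checks. The sketch below restates them in Go for illustration only. The real enforcement is done by CEL in the API server, and `validateDependsOn` plus the trimmed-down types are hypothetical. One assumption to flag: `has(x.dependsOn)` is approximated here as `len(DependsOn) > 0`.

```go
package main

import (
	"errors"
	"fmt"
)

// Trimmed-down, hypothetical mirrors of the API types in the diff.
type DependsOn struct {
	Name   string
	Status string
}

type ReplicatedJob struct {
	Name      string
	DependsOn []DependsOn
}

type StartupPolicy struct {
	StartupPolicyOrder string
}

type JobSetSpec struct {
	ReplicatedJobs []ReplicatedJob
	StartupPolicy  *StartupPolicy
}

// validateDependsOn is a hypothetical helper that restates the CRD rules.
// Assumption: has(x.dependsOn) in CEL is approximated by len(...) > 0.
func validateDependsOn(spec JobSetSpec) error {
	anyDependsOn := false
	for _, rj := range spec.ReplicatedJobs {
		if len(rj.DependsOn) > 1 {
			// Mirrors +kubebuilder:validation:MaxItems=1.
			return errors.New("only a single item is supported in the DependsOn list")
		}
		if len(rj.DependsOn) > 0 {
			anyDependsOn = true
		}
	}
	// Mirrors the first CEL rule: only the InOrder startup policy conflicts.
	if spec.StartupPolicy != nil && spec.StartupPolicy.StartupPolicyOrder == "InOrder" && anyDependsOn {
		return errors.New("StartupPolicy and DependsOn APIs are mutually exclusive")
	}
	// Mirrors the second CEL rule. The length guard is an addition of this
	// sketch; the CEL rule indexes replicatedJobs[0] directly.
	if len(spec.ReplicatedJobs) > 0 && len(spec.ReplicatedJobs[0].DependsOn) > 0 {
		return errors.New("DependsOn can't be set for the first ReplicatedJob")
	}
	return nil
}

func main() {
	dep := []DependsOn{{Name: "job-1", Status: "Ready"}}

	valid := JobSetSpec{ReplicatedJobs: []ReplicatedJob{
		{Name: "job-1"},
		{Name: "job-2", DependsOn: dep},
	}}
	fmt.Println(validateDependsOn(valid)) // <nil>

	firstHasDep := JobSetSpec{ReplicatedJobs: []ReplicatedJob{
		{Name: "job-1", DependsOn: dep},
	}}
	fmt.Println(validateDependsOn(firstHasDep))

	withInOrder := valid
	withInOrder.StartupPolicy = &StartupPolicy{StartupPolicyOrder: "InOrder"}
	fmt.Println(validateDependsOn(withInOrder))
}
```

As written in the rule, a StartupPolicy with order AnyOrder does not conflict with DependsOn, which matches the "DependsOn and StartupPolicy: AnyOrder" test case listed in the commit message.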