KEP-2170: Deploy JobSet in kubeflow-system namespace #2388

Merged
merged 6 commits into from
Jan 27, 2025
2 changes: 0 additions & 2 deletions manifests/v2/base/manager/kustomization.yaml
@@ -1,4 +1,2 @@
 resources:
   - manager.yaml
-# TODO (andreyvelich): Move it to overlays once we copy the JobSet manifests.
-namespace: kubeflow-system
2 changes: 0 additions & 2 deletions manifests/v2/base/rbac/kustomization.yaml
@@ -2,5 +2,3 @@ resources:
   - role.yaml
   - role_binding.yaml
   - service_account.yaml
-# TODO (andreyvelich): Move it to overlays once we copy the JobSet manifests.
-namespace: kubeflow-system
@@ -1,4 +1,4 @@
 apiVersion: kustomize.config.k8s.io/v1beta1
 kind: Kustomization
 resources:
-  - torch-distributed.yaml
+  - torch_distributed.yaml
2 changes: 0 additions & 2 deletions manifests/v2/base/webhook/kustomization.yaml
@@ -10,5 +10,3 @@ patches:
       kind: ValidatingWebhookConfiguration
 configurations:
   - kustomizeconfig.yaml
-# TODO (andreyvelich): Move it to overlays once we copy the JobSet manifests.
-namespace: kubeflow-system
@@ -1,16 +1,23 @@
 apiVersion: kustomize.config.k8s.io/v1beta1
 kind: Kustomization
+
+# Namespace where all resources are deployed.
+namespace: kubeflow-system
+
 resources:
   - namespace.yaml
   - ../../base/crds
   - ../../base/manager
   - ../../base/rbac
   - ../../base/webhook
-  # TODO (andreyvelich): JobSet should support kubeflow-system namespace.
-  - https://github.com/kubernetes-sigs/jobset/releases/download/v0.6.0/manifests.yaml
+  - ../../third-party/jobset # Comment this line if JobSet is installed on the Kubernetes cluster.
+
+# Update the Kubeflow Training manager image tag.
 images:
   - name: kubeflow/training-operator-v2
     newTag: latest
+
+# Secret for the Kubeflow Training webhook.
 secretGenerator:
   - name: training-operator-v2-webhook-cert
     namespace: kubeflow-system
@@ -1,4 +1,4 @@
 apiVersion: kustomize.config.k8s.io/v1beta1
 kind: Kustomization
 resources:
-  - ../../base/runtimes/pre-training
+  - ../../base/runtimes/pretraining
19 changes: 0 additions & 19 deletions manifests/v2/overlays/standalone/kustomization.yaml

This file was deleted.

2 changes: 2 additions & 0 deletions manifests/v2/third-party/jobset/jobset_manager_config.yaml
@@ -0,0 +1,2 @@
+apiVersion: config.jobset.x-k8s.io/v1alpha1
+kind: Configuration
Contributor

I think just using the default configuration would be sufficient.

https://github.com/kubernetes-sigs/jobset/blob/main/config/components/manager/controller_manager_config.yaml

I guess a nil configuration should be fine but it looks a bit weird.

Contributor

Either way this is just a nit.

Everything LGTM now.

Member Author

Yeah, I'm keeping this empty config so that we can configure it in the future.
I think we can enable leader election later if we want to have it by default.
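
For reference, if leader election were enabled through this file later, the config might look roughly like the sketch below. It assumes the `leaderElection.leaderElect` field used in the upstream default configuration linked above. It is not part of this PR, and the field names should be checked against the JobSet version being deployed.

```yaml
# Hypothetical future contents of jobset_manager_config.yaml.
# Assumption: field names follow JobSet's upstream default manager config.
apiVersion: config.jobset.x-k8s.io/v1alpha1
kind: Configuration
leaderElection:
  leaderElect: true
```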

18 changes: 18 additions & 0 deletions manifests/v2/third-party/jobset/kustomization.yaml
@@ -0,0 +1,18 @@
+apiVersion: kustomize.config.k8s.io/v1beta1
+kind: Kustomization
+
+resources:
+  - https://github.com/kubernetes-sigs/jobset/releases/download/v0.7.3/manifests.yaml
+
+# Config for the JobSet manager.
+configMapGenerator:
+  - name: jobset-manager-config
+    files:
+      - jobset_manager_config.yaml
+    options:
+      disableNameSuffixHash: true
+
+# Add required patches.
+patchesStrategicMerge:
+  - patches/jobset_remove_namespace.yaml # Remove namespace from the JobSet release manifests.
Comment on lines +16 to +17
Member

Shall we also patch the namespace-related configuration outside of the .metadata field? For example: https://github.com/kubeflow/training-operator/pull/2382/files#r1914094979

It seems that we also need to patch the webhook configurations to ensure JobSet works correctly :)

Member Author

@andreyvelich Jan 27, 2025

I noticed that even without this config, kustomize correctly patches the JobSet manifests with:

  clientConfig:
    service:
      name: jobset-webhook-service
      namespace: kubeflow-system
      path: /validate--v1-pod

I think it works without this configuration because the JobSet manifests include this Service, which tells kustomize that the namespace in clientConfig needs to be patched.

apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/component: webhook
    app.kubernetes.io/created-by: jobset
    app.kubernetes.io/instance: webhook-service
    app.kubernetes.io/managed-by: kustomize
    app.kubernetes.io/name: service
    app.kubernetes.io/part-of: jobset
  name: jobset-webhook-service
  namespace: jobset-system
spec:
  ports:
    - port: 443
      protocol: TCP
      targetPort: 9443
  selector:
    control-plane: controller-manager

Member

SGTM!
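
If the namespace in `clientConfig` ever stopped being patched automatically, an explicit kustomize transformer configuration would be one possible fallback. The fragment below is a sketch that follows the convention kubebuilder scaffolds in its webhook `kustomizeconfig.yaml`. It is an assumption for illustration, not something this PR adds, and the file would need to be listed under `configurations:` in the kustomization that sets the namespace.

```yaml
# Hypothetical kustomizeconfig.yaml (not part of this PR).
# Tells the namespace transformer to also rewrite the Service namespace
# referenced from webhook clientConfig blocks.
namespace:
  - kind: MutatingWebhookConfiguration
    group: admissionregistration.k8s.io
    path: webhooks/clientConfig/service/namespace
    create: true
  - kind: ValidatingWebhookConfiguration
    group: admissionregistration.k8s.io
    path: webhooks/clientConfig/service/namespace
    create: true
```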

+  - patches/jobset_config_patch.yaml # Add custom manager config to the JobSet.
21 changes: 21 additions & 0 deletions manifests/v2/third-party/jobset/patches/jobset_config_patch.yaml
@@ -0,0 +1,21 @@
+apiVersion: apps/v1
+kind: Deployment
+metadata:
+  name: jobset-controller-manager
+  namespace: jobset-system
+spec:
+  template:
+    spec:
+      containers:
+        - name: manager
+          args:
+            - "--config=/jobset_manager_config.yaml"
+          volumeMounts:
+            - name: jobset-manager-config
+              mountPath: /jobset_manager_config.yaml
+              subPath: jobset_manager_config.yaml
+              readOnly: true
+      volumes:
+        - name: jobset-manager-config
+          configMap:
+            name: jobset-manager-config
@@ -1,4 +1,6 @@
+---
+$patch: delete
 apiVersion: v1
 kind: Namespace
 metadata:
-  name: kubeflow-system
+  name: jobset-system