Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mutate and Validate RayCluster on SecurityContext #574

Conversation

ChristianZaccaria
Copy link
Contributor

@ChristianZaccaria ChristianZaccaria commented Jun 19, 2024

Issue link

Jira: https://issues.redhat.com/browse/RHOAIENG-7638

What changes have been made

  • The Mutating Webhook will now add the required SecurityContext in the RayCluster's head and worker containers.
  • The Validating Webhook will now check for the required SecurityContext in the RayCluster's head and worker containers.

Verification steps

  1. Create Data science project in RHOAI
  2. Create and start workbench
  3. Find OpenShift namespace corresponding to the Data science project name, add labels to the namespace resource with value:
  • pod-security.kubernetes.io/enforce: restricted
  • pod-security.kubernetes.io/enforce-version: v1.24
  1. Clone SDK demo notebooks in Workbench
  2. Setup Kueue resources
  3. Run i.e. 0_basic_ray.ipynb Notebook, try to create Ray cluster
  4. RayCluster should start with or without the labels set in the Namespace resource.
  5. You may also check that the SecurityContext is added to both head and worker containers when the mentioned labels have been set.
  6. Attempt to create a RayCluster without the SecurityContext, and attempt to update the SecurityContext once the RayCluster is created. The ValidatingWebhook should reject these actions.

Checks

  • I've made sure the tests are passing.
  • Testing Strategy
    • Unit tests
    • Manual tests
    • Testing is not required for this change

Copy link

openshift-ci bot commented Jun 19, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from christianzaccaria. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ChristianZaccaria ChristianZaccaria force-pushed the podsec-context-raycluster branch from 1c53e5e to 1ba3eb2 Compare June 19, 2024 15:07
rayclusterlog.Error(err, "Failed to check namespace resource labels")
return err
}
if hasSecurityLabels {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not adding the security context unconditionally?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually wasn't sure if that's what we wanted, hence made it conditional. On a security point a view, you're right, I think we should always have this securityContext enabled. Should I make it unconditional, and also add to the ValidatingWebhook to ensure these are set?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think that should be added unconditionally. I'm not sure how it'll behave on non OpenShift clusters. We may make it only for OpenShift if it has to.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm interesting point. I'll make the changes and check the e2e tests. This should be sufficient to confirm its behaviour. Thanks Antonin!

@ChristianZaccaria ChristianZaccaria force-pushed the podsec-context-raycluster branch from c00e58b to bd0aff8 Compare June 20, 2024 11:49
@ChristianZaccaria ChristianZaccaria changed the title Add securityContext to RayCluster based on Namespace labels Mutate and Validate RayCluster on SecurityContext Jun 20, 2024
@ChristianZaccaria ChristianZaccaria force-pushed the podsec-context-raycluster branch from d5ce875 to 20d8abf Compare June 20, 2024 14:44
@Fiona-Waters
Copy link
Contributor

I've run this on my cluster:

  • I'm logged in as an admin user
  • I have not set the labels in the namespace
  • I can submit a raycluster via the SDK and the webhook successfully adds the security context in each container of ray head and worker pods
  • I tried to edit the yaml to update or remove the security context in the raycluster CR and was stopped from doing so.
  • I can submit a job but it is in pending state (ray dashboard) for a few minutes and then fails.
  • I get this error (using cluster_job_client demo notebook)

'Traceback (most recent call last):\n File "/tmp/ray/session_2024-06-21_04-50-03_677463_8/runtime_resources/working_dir_files/_ray_pkg_262c3c69a1596acc/mnist_fashion.py", line 84, in \n results = trainer.fit()\n File "/home/ray/anaconda3/lib/python3.9/site-packages/ray/train/base_trainer.py", line 633, in fit\n tuner = Tuner(\n File "/home/ray/anaconda3/lib/python3.9/site-packages/ray/tune/tuner.py", line 179, in init\n self._local_tuner = TunerInternal(**kwargs)\n File "/home/ray/anaconda3/lib/python3.9/site-packages/ray/tune/impl/tuner_internal.py", line 144, in init\n ) = self.setup_create_experiment_checkpoint_dir(\n File "/home/ray/anaconda3/lib/python3.9/site-packages/ray/tune/impl/tuner_internal.py", line 522, in setup_create_experiment_checkpoint_dir\n os.makedirs(experiment_path, exist_ok=True)\n File "/home/ray/anaconda3/lib/python3.9/os.py", line 215, in makedirs\n makedirs(head, exist_ok=exist_ok)\n File "/home/ray/anaconda3/lib/python3.9/os.py", line 225, in makedirs\n mkdir(name, mode)\nPermissionError: [Errno 13] Permission denied: '/home/ray/ray_results'\n'

@KPostOffice
Copy link
Collaborator

@ChristianZaccaria Should we close this then in lieu of opendatahub-io/kuberay#19 ?

@ChristianZaccaria
Copy link
Contributor Author

Closing in lieu of opendatahub-io/kuberay#19

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants