Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create an initcontainer for persistent home setup #1251

Merged
merged 3 commits into from
May 15, 2024
Merged

Conversation

dkwon17
Copy link
Collaborator

@dkwon17 dkwon17 commented Apr 16, 2024

What does this PR do?

The problem

Today, the persistent home setup occurs in the entrypoint [1] of the tooling container (assuming it is based on the UBI). Notably, the entrypoint runs stow [2] to make symbolic links for the home directory to be saved on the PVC.

If a poststart event depends on files or edits files located in the home directory, there is a race condition because the poststart event may start running before stow has finished running. Stow takes about 6 seconds to complete.

This race condition causes eclipse-che/che#22914, because the checode poststart event runs before the entrypoint script completes.

Proposed solution

To solve the race condition altogether, the entrypoint script must be complete before the post start events run. To do this, an initcontainer that runs the entrypoint script can be used.

What is the container image for the initcontainer?

To have a minimal effect on workspace startup times, the container image for the initcontainer should be the same image as the tooling container's image to avoid pulling an extra image. Assuming that the image is already pulled, I've observed no significant startup impact (see demo video in the next section) when running an extra container running the tooling image, even if the tooling image is large (quay.io/devfile/universal-developer-image:ubi8-latest has a compressed size of 2.9GB).

When thinking about it in the Docker perspective, running a container is only creating a writable layer on top of the already-pulled image layers, therefore if the image is already pulled the image size would not have a significant impact on container startup. I'm guessing this is what also happens in OpenShift/CRI-O, because adding the initcontainer did not increase workspace startup time significantly.

What issues does this PR fix or reference?

eclipse-che/che#22914
Fixes #1260

Demo: https://youtu.be/RQeKKJgaogM

In the demo above, the init container named init-persistent-home runs for than 9 seconds at the 0:20 second mark. Stow is only run once (because of this if statement [3]) to set up the new PVC.

So for subsequent workspace starts, the init-persistent-home init container takes less than a second to finish. Here is a second demo showing that:

https://youtu.be/hKnb1mgBSqA?si=yuJ3EKmj_dX0qHj0&t=20

Is it tested? How?

To test this PR with Eclipse Che:

Install DWO on your cluster by adding this catalog source and installing DWO from OperatorHub:

apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: devworkspace-operator-catalog
  namespace: openshift-marketplace
spec:
  sourceType: grpc
  image: quay.io/dkwon17/devworkspace-operator-index:init-container
  publisher: Red Hat
  displayName: DevWorkspace Operator Catalog
  updateStrategy:
    registryPoll:
      interval: 5m

After waiting a few minutes, the DWO from the new catalog should appear in OperatorHub. Install the one from DevWorkspace Operator Catalog.

image

Install Eclipse Che with chectl:

chectl server:deploy -p openshift --skip-devworkspace-operator

Enabled persistent user home by adding the following in the CheCluster CR:

spec:
  devEnvironments:
    persistUserHome:
      enabled: true

Create a workspace with the following repositories and confirm that the podman info command runs without errors after workspace startup (just like in the demo video):

PR Checklist

  • E2E tests pass (when PR is ready, comment /test v8-devworkspace-operator-e2e, v8-che-happy-path to trigger)
    • v8-devworkspace-operator-e2e: DevWorkspace e2e test
    • v8-che-happy-path: Happy path for verification integration with Che

Is it tested? How?

PR Checklist

  • E2E tests pass (when PR is ready, comment /test v8-devworkspace-operator-e2e, v8-che-happy-path to trigger)
    • v8-devworkspace-operator-e2e: DevWorkspace e2e test
    • v8-che-happy-path: Happy path for verification integration with Che

Copy link

openshift-ci bot commented Apr 16, 2024

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@dkwon17 dkwon17 force-pushed the persistenthome-init branch 2 times, most recently from e67e4bb to 28795c2 Compare April 17, 2024 22:54
@dkwon17
Copy link
Collaborator Author

dkwon17 commented Apr 17, 2024

Some cases to think about in general:

  • User is using a parent devfile (works, tested with this child devfile)
  • User's main container image does not have the /entrypoint.sh script
  • User's main container image has ["tail", "-f", "/dev/null"] in it's entry point, which would cause the init container to never terminate

@dkwon17 dkwon17 force-pushed the persistenthome-init branch from 28795c2 to 953df89 Compare April 17, 2024 23:26
@AObuchow
Copy link
Collaborator

AObuchow commented Apr 18, 2024

Amazing work so far David. I think this is a good proposed solution to fixing the race condition between the UDI's entrypoint and any post-start commands.

There are a lot of considerations to make so my thoughts are a bit all over the place, but here goes:

Should we create the home persistence init container component in DWO or Che Dashboard?

This is something you mentioned in your earlier draft. I think it makes the most sense for the init container component to be added in DWO:

  • We want to minimize the transformation of a devfile -> devworkspace, so that DWO's devfile bootstrap feature continues to work. If the Che-Dashboard was adding an extra component to the devworkspace for home persistence, this extra component would be lost when DWO restarts the devworkspace from the cloned devfile (as the bootstrap feature will use components from the devfile, not the devworkspace)
  • I'm not sure if the Che-Dashboard can currently "read" the Che Cluster CR in order to determine the home persistence settings? The dashboard would have to watch for changes in the Che Cluster CR or get re-deployed (with the configured home persistence settings) when the home persistence configuration changes.
  • There's already some level of coupling between the UDI and DWO for home persistence: mounting /home/user/ as a volume assumes that $HOME == /home/user/, which is the case for the UDI but not necessarily true for other user-provided images.
    • Since there's already coupling between DWO and the UDI, we should keep this the same rather than move the coupling to be between the UDI and Dashboard.

Should we call the main/init containers entrypoint? Or just the code relevant to home persistence (stow, cp)?

I don't think we want to call the UDI's entrypoint directly, as there's other code in it that is irrelevant to home persistence. Instead, we'll want to move the stow-related code into its own script and execute that.

This script could potentially reside in the UDI under a pre-determined name (e.g. /home-persistence-init.sh), or it could be injected by DWO as the initContainer's command. In the latter case, we could test whether stow is installed in the container before calling it.

Should we hope that the main/tooling component has stow installed? Or consider shipping a dedicated init container with stow installed.

Instead of depending on the first container component in the devworkspace being the UDI & having stow installed, we could ship a minimal image that has stow installed (similar to how we ship the project clone container).

This would simplify the inferInitContainer() step, but require us to productize a new container downstream that we'd be shipping.

For a first pass of your proposed change, I think it's fine to require users to use a main/tooling image based on the UDI (with stow installed, as well as tooling installed in /home/tooling/ and $HOME set to /home/user/). This is already the case for $HOME persistence to work as expected, anyhow.

EDIT: This idea I had of having a dedicated init container image that provides the stow binary and executes it doesn't make sense, as there is no way to mount the /home/tooling/ directory from the tooling/main container to the init container image. Ignore this point

@AObuchow
Copy link
Collaborator

AObuchow commented Apr 18, 2024

Some cases to think about in general:

* User's main container image does not have the /entrypoint.sh script

* User's main container image has `["tail", "-f", "/dev/null"]` in it's entry point, which would cause the init container to never terminate

Both of these can be resolved by having DWO call stow and cp directly (by default), rather than requiring them to be in the entrypoint. We could then let users configure the init container command through the DWOC if they don't want to use the default we provide.

Edit: one thing we'll have to consider if we move the stow step outside of the entrypoint, the stow step will now be executed before the kubedock step, which might break kubedock (stow is currently creating a symlink from /home/tooling/.local/bin/podman -> /home/user/.local/bin/podman). I'm hopeful this won't actually be a problem though, as /home/tooling/.local/bin/ should be on $PATH.

Copy link
Collaborator

@AObuchow AObuchow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some more feedback, great improvements :)

@dkwon17 dkwon17 force-pushed the persistenthome-init branch from f99d0e3 to 121f107 Compare April 22, 2024 20:07
@dkwon17
Copy link
Collaborator Author

dkwon17 commented Apr 22, 2024

Sorry, the PR still had rough work, I've still updated the PR recently with some of your feedback

Copy link
Collaborator

@AObuchow AObuchow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updates @dkwon17 :)

pkg/library/home/persistentHome.go Outdated Show resolved Hide resolved
pkg/library/home/persistentHome.go Outdated Show resolved Hide resolved
pkg/library/home/persistentHome.go Outdated Show resolved Hide resolved
@AObuchow
Copy link
Collaborator

@dkwon17 since we're adding new fields to the DWOC, you'll have to modify mergeConfig() and GetCurrentConfigString() in sync.go to account for the new fields.

Additionally, the CRDs will have to be re-generated by running the following and adding the resulting changes into a separate commit: make generate manifests fmt generate_default_deployment generate_olm_bundle_yaml.

Since we're still working through the details of this PR, these 2 steps can wait until we're set on the new DWOC fields.

@dkwon17 dkwon17 marked this pull request as ready for review April 26, 2024 16:52
@dkwon17 dkwon17 requested a review from ibuziuk as a code owner April 26, 2024 16:52
@openshift-ci openshift-ci bot requested a review from AObuchow April 26, 2024 16:52
Copy link
Collaborator

@AObuchow AObuchow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor observations but looking awesome, great work so far @dkwon17

pkg/library/home/persistentHome.go Outdated Show resolved Hide resolved
pkg/library/home/persistentHome.go Outdated Show resolved Hide resolved
@ibuziuk ibuziuk requested a review from svor April 29, 2024 13:37
Copy link

@svor svor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested as described in the description with che-plugin registry repo, works as expected.
Great job @dkwon17

screenshot-eclipse-che apps ci-ln-jt06z1b-76ef8 origin-ci-int-aws dev rhcloud com-2024 04 30-13_08_24

@openshift-ci openshift-ci bot added the lgtm label Apr 30, 2024
@tolusha
Copy link
Contributor

tolusha commented May 2, 2024

@dkwon17
I've noticed, that after enabling persistUserHome, all content from /home/user is vanished that it has been there before

@dkwon17
Copy link
Collaborator Author

dkwon17 commented May 2, 2024

@tolusha could you provide a reproducer? Does this happen when enabling persistUserHome and starting an existing workspace?

Copy link

openshift-ci bot commented May 2, 2024

@tolusha: changing LGTM is restricted to collaborators

In response to this:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@AObuchow
Copy link
Collaborator

AObuchow commented May 9, 2024

@dkwon17 your latest changes look good to me :) I still need to do some final manual testing on a cluster but I think this is just about ready to merge.

My 2 last thoughts on this PR:

  • We need to verify this doesn't break the Podman wrapper in the UDI (i.e. kubedock) since now Stow is being invoked before the podman wrapper is symlinked. However, as we briefly discussed on call, I don't think this is an issue since /home/tooling/ (where the Podman wrapper exists) is on PATH
  • It'd be nice to squash some of the commits from this PR together to create an easier-to-follow git log. My suggestion is that you could probably just have ~3 commits: one for the feature implementation, one for changing the CRDs/resources and one for adding tests. I know some commits (like this one) change both the implementation and test cases, so don't worry if these remain their own separate commits.

@dkwon17
Copy link
Collaborator Author

dkwon17 commented May 10, 2024

Thanks for the feedback, I was able to verify the kubedock functionality [1] in the UDI for this PR

[1] https://access.redhat.com/documentation/en-us/red_hat_openshift_dev_spaces/3.9/html/release_notes_and_known_issues/new-features#enhancement-crw-3367

@AObuchow
Copy link
Collaborator

Thanks for the feedback, I was able to verify the kubedock functionality [1] in the UDI for this PR

[1] access.redhat.com/documentation/en-us/red_hat_openshift_dev_spaces/3.9/html/release_notes_and_known_issues/new-features#enhancement-crw-3367

@dkwon17 awesome, thank you for verifying this. I'll do some final testing and have this merged.

Signed-off-by: David Kwon <[email protected]>
@dkwon17
Copy link
Collaborator Author

dkwon17 commented May 11, 2024

Sounds good @AObuchow , do you want me to squash the commits and force push?

@AObuchow
Copy link
Collaborator

@dkwon17 yes please squash your commits then force push if you get a chance. I know you're going to be on PTO however, so I can do this for you.

@dkwon17 dkwon17 force-pushed the persistenthome-init branch from 3e55cc3 to 5a27b78 Compare May 13, 2024 05:32
@openshift-ci openshift-ci bot removed the lgtm label May 13, 2024
@AObuchow
Copy link
Collaborator

@dkwon17 thanks for squashing your commits. Will do final review & have this merged by Wednesday morning :)

@AObuchow
Copy link
Collaborator

Note to myself: need to create a DWO issue before this is merged for tracking in the 0.28 release miletstones

Copy link
Collaborator

@AObuchow AObuchow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested in Che on an OpenShift 4.15 cluster with a few different devfiles and tried restarting workspaces as well -- works as expected :)

Looks good to me :) Great job @dkwon17 🥳

cp -n /home/tooling/.viminfo /home/user/.viminfo
cp -n /home/tooling/.bashrc /home/user/.bashrc
cp -n /home/tooling/.bash_profile /home/user/.bash_profile
touch $STOW_COMPLETE
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'd be nice to log here that the init script completed here, but that can be added in a future PR. We could also check whether stow completed successfully and report that here.

Copy link

openshift-ci bot commented May 14, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: AObuchow, dkwon17, ibuziuk, svor, tolusha

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@AObuchow AObuchow merged commit d913c07 into main May 15, 2024
8 checks passed
@AObuchow AObuchow deleted the persistenthome-init branch May 15, 2024 14:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add initContainer for initializing persistentHome when $HOME persistence enabled
5 participants