-
Notifications
You must be signed in to change notification settings - Fork 55
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Create an initcontainer for persistent home setup #1251
Conversation
Skipping CI for Draft Pull Request. |
e67e4bb
to
28795c2
Compare
Some cases to think about in general:
|
28795c2
to
953df89
Compare
Amazing work so far David. I think this is a good proposed solution to fixing the race condition between the UDI's entrypoint and any post-start commands. There are a lot of considerations to make so my thoughts are a bit all over the place, but here goes: Should we create the home persistence init container component in DWO or Che Dashboard?This is something you mentioned in your earlier draft. I think it makes the most sense for the init container component to be added in DWO:
Should we call the main/init containers entrypoint? Or just the code relevant to home persistence (stow, cp)?I don't think we want to call the UDI's entrypoint directly, as there's other code in it that is irrelevant to home persistence. Instead, we'll want to move the stow-related code into its own script and execute that. This script could potentially reside in the UDI under a pre-determined name (e.g. Should we hope that the main/tooling component has stow installed? Or consider shipping a dedicated init container with stow installed.Instead of depending on the first container component in the devworkspace being the UDI & having stow installed, we could ship a minimal image that has stow installed (similar to how we ship the project clone container). This would simplify the For a first pass of your proposed change, I think it's fine to require users to use a main/tooling image based on the UDI (with stow installed, as well as tooling installed in /home/tooling/ and $HOME set to /home/user/). This is already the case for $HOME persistence to work as expected, anyhow. EDIT: This idea I had of having a dedicated init container image that provides the stow binary and executes it doesn't make sense, as there is no way to mount the |
Both of these can be resolved by having DWO call stow and cp directly (by default), rather than requiring them to be in the entrypoint. We could then let users configure the init container command through the DWOC if they don't want to use the default we provide. Edit: one thing we'll have to consider if we move the stow step outside of the entrypoint, the stow step will now be executed before the kubedock step, which might break kubedock (stow is currently creating a symlink from |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some more feedback, great improvements :)
f99d0e3
to
121f107
Compare
Sorry, the PR still had rough work, I've still updated the PR recently with some of your feedback |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the updates @dkwon17 :)
@dkwon17 since we're adding new fields to the DWOC, you'll have to modify Additionally, the CRDs will have to be re-generated by running the following and adding the resulting changes into a separate commit: Since we're still working through the details of this PR, these 2 steps can wait until we're set on the new DWOC fields. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some minor observations but looking awesome, great work so far @dkwon17
pkg/library/home/testdata/persistent-home/no-home-vm-when-disabled.yaml
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tested as described in the description with che-plugin registry repo, works as expected.
Great job @dkwon17
@dkwon17 |
@tolusha could you provide a reproducer? Does this happen when enabling |
@tolusha: changing LGTM is restricted to collaborators In response to this: Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
@dkwon17 your latest changes look good to me :) I still need to do some final manual testing on a cluster but I think this is just about ready to merge. My 2 last thoughts on this PR:
|
Thanks for the feedback, I was able to verify the kubedock functionality [1] in the UDI for this PR |
Signed-off-by: David Kwon <[email protected]>
Signed-off-by: dkwon17 <[email protected]>
@dkwon17 awesome, thank you for verifying this. I'll do some final testing and have this merged. |
Signed-off-by: David Kwon <[email protected]>
Sounds good @AObuchow , do you want me to squash the commits and force push? |
@dkwon17 yes please squash your commits then force push if you get a chance. I know you're going to be on PTO however, so I can do this for you. |
3e55cc3
to
5a27b78
Compare
@dkwon17 thanks for squashing your commits. Will do final review & have this merged by Wednesday morning :) |
Note to myself: need to create a DWO issue before this is merged for tracking in the 0.28 release miletstones |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tested in Che on an OpenShift 4.15 cluster with a few different devfiles and tried restarting workspaces as well -- works as expected :)
Looks good to me :) Great job @dkwon17 🥳
cp -n /home/tooling/.viminfo /home/user/.viminfo | ||
cp -n /home/tooling/.bashrc /home/user/.bashrc | ||
cp -n /home/tooling/.bash_profile /home/user/.bash_profile | ||
touch $STOW_COMPLETE |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It'd be nice to log here that the init script completed here, but that can be added in a future PR. We could also check whether stow completed successfully and report that here.
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: AObuchow, dkwon17, ibuziuk, svor, tolusha The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
What does this PR do?
The problem
Today, the persistent home setup occurs in the entrypoint [1] of the tooling container (assuming it is based on the UBI). Notably, the entrypoint runs stow [2] to make symbolic links for the home directory to be saved on the PVC.
If a poststart event depends on files or edits files located in the home directory, there is a race condition because the poststart event may start running before stow has finished running. Stow takes about 6 seconds to complete.
This race condition causes eclipse-che/che#22914, because the checode poststart event runs before the entrypoint script completes.
Proposed solution
To solve the race condition altogether, the entrypoint script must be complete before the post start events run. To do this, an initcontainer that runs the entrypoint script can be used.
What is the container image for the initcontainer?
To have a minimal effect on workspace startup times, the container image for the initcontainer should be the same image as the tooling container's image to avoid pulling an extra image. Assuming that the image is already pulled, I've observed no significant startup impact (see demo video in the next section) when running an extra container running the tooling image, even if the tooling image is large (quay.io/devfile/universal-developer-image:ubi8-latest has a compressed size of 2.9GB).
When thinking about it in the Docker perspective, running a container is only creating a writable layer on top of the already-pulled image layers, therefore if the image is already pulled the image size would not have a significant impact on container startup.
I'm guessingthis is what also happens in OpenShift/CRI-O, because adding the initcontainer did not increase workspace startup time significantly.What issues does this PR fix or reference?
eclipse-che/che#22914
Fixes #1260
Demo: https://youtu.be/RQeKKJgaogM
In the demo above, the init container named
init-persistent-home
runs for than 9 seconds at the 0:20 second mark. Stow is only run once (because of this if statement [3]) to set up the new PVC.So for subsequent workspace starts, the
init-persistent-home
init container takes less than a second to finish. Here is a second demo showing that:https://youtu.be/hKnb1mgBSqA?si=yuJ3EKmj_dX0qHj0&t=20
Is it tested? How?
To test this PR with Eclipse Che:
Install DWO on your cluster by adding this catalog source and installing DWO from OperatorHub:
After waiting a few minutes, the DWO from the new catalog should appear in OperatorHub. Install the one from
DevWorkspace Operator Catalog
.Install Eclipse Che with chectl:
Enabled persistent user home by adding the following in the CheCluster CR:
Create a workspace with the following repositories and confirm that the
podman info
command runs without errors after workspace startup (just like in the demo video):PR Checklist
/test v8-devworkspace-operator-e2e, v8-che-happy-path
to trigger)v8-devworkspace-operator-e2e
: DevWorkspace e2e testv8-che-happy-path
: Happy path for verification integration with CheIs it tested? How?
PR Checklist
/test v8-devworkspace-operator-e2e, v8-che-happy-path
to trigger)v8-devworkspace-operator-e2e
: DevWorkspace e2e testv8-che-happy-path
: Happy path for verification integration with Che