-
Notifications
You must be signed in to change notification settings - Fork 109
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve Linux Helix containers usability #1374
base: main
Are you sure you want to change the base?
Conversation
bd41e6a
to
558b6a6
Compare
- give `helixbot` user UID 1000 in Ubuntu 24.04 and 24.10 images - upgrade `pip` Python package in Debian 11 images - work around https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1003252 - that bug caused problems installing `psutil` package - always provide and activate a `venv` for `helixbot` user - always install `helix-scripts` module and dependencies in `venv` - some images installed this module in the system Python environment - where possible, do this installation as `helixbot` user - bump some Python OS package versions - avoid some older versions e.g. `python3.8` - link new version to /usr/bin/python - use `python -m pip` instead of `pip` to avoid version issues - install `sudo` package in Mariner 2.0 and Azure Linux 3.0 images - try to get `sudo` working in Debian 11 images - nits: - remove `helixbot2` users; they no longer help anything - remove `virtualenv` module and comments it - remove `helix-scripts` wheels after installing them - make some syntax a bit more consistent
558b6a6
to
5c38ffb
Compare
build is green and I've done some local validation. ready for review… |
@@ -76,8 +77,14 @@ RUN ARCH=$TARGETARCH \ | |||
&& dpkg -i libmsquic* \ | |||
&& rm libmsquic* | |||
|
|||
# Move user (probably ubuntu) with UID 1000 to UID 2000 to avoid conflict with helixbot | |||
RUN user_id=$(id -un 1000) && \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This strikes me as an undesirable solution, at least in the long-term. This approach is committing us to including all of this /etc/passwd
management over the long-term in our Ubuntu Dockerfiles. Ideally, we'd change the uid we expect.
Node images use this same uid. They appear to be skipping this problem by not supporting Ubuntu.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dougbu, for reference, here's the script devcontainers uses to change UIDs... https://github.com/devcontainers/cli/blob/main/scripts/updateUID.Dockerfile
Is anything hardcoded to reference the helixbot
user, or could you use the built-in ubuntu user instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems a lot easier to just delete the ubuntu
user. This is what the devcontainer Dockerfile does for .NET: https://github.com/devcontainers/images/blob/4828efe3844f92a9802aba9831b3567d79f9c958/src/dotnet/.devcontainer/Dockerfile#L8
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm hoping we can migrate to a new UID over time.
You can see that we tried to avoid this problem with our official images, with our app
user. For clarity, we developed this solution MANY years later, so had the benefit of better guidance.
$ docker run --rm mcr.microsoft.com/dotnet/runtime-deps cat /etc/passwd | tail -n 1
app:x:1654:1654::/home/app:/bin/sh
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there's a lot to unpack here but I'd like to avoid the long-term considerations for a bit longer. much of the infrastructure in Helix will need to change as we go further and it's not clear whether UID alignment will be required even mid-term. for now, I'm dealing w/ a few exceptions (Azure Linux 3 hosts, Ubuntu 24.* containers) rather than doing anything broader
background here is a number of attempts I made e.g., to expand file permissions in the hosts and containers, update default permissions, et cetera. but kept coming back to the unfortunate conclusion we need to align UIDs for now. without that alignment Helix work items fail or (worse) succeed but mask problems reporting tests and so on
the script you referenced @lbussell is messy and focuses entirely on /etc/passwd and /etc/groups, files that mean less and less these days. plus that script only updates the discovered user's home directory, not other files the user may own. usermod
handles the user's home automatically
deleting the ubuntu
user sounds great to me. any objections to that approach in another iteration❓
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that's fine for the near-term.
@@ -44,8 +44,9 @@ RUN dnf upgrade --refresh -y \ | |||
openssl-devel \ | |||
perl-FindBin \ | |||
procps-ng \ | |||
python3 \ | |||
python3-devel \ | |||
python3.12 \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Versioning pinning Python is a new practice. It would be good to include why this is needed and what other Dockerfile maintainers should do going forward, in the PR description.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I'm not sure why this is being done. The version associated with python3
is 3.9 which is the same version that exists in some other Helix images. So why does this need to be higher?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Versioning pinning Python is a new practice
not really. it was used already in src/opensuse/15.6/helix/amd64/Dockerfile
the background is fairly simple: OS vendors sometimes have Python dependencies of their own or leave the Python packages alone after a release. this means our software can easily fall behind Python's deprecations and we have to support a broader range of Python versions than is manageable
the versioned packages move forward and are sometimes backported to prior releases, reducing our costs. see below for more about Centos Stream 9 in particular (bumping Python elsewhere in this PR e.g. on Debian 11 was required)
here and in src/centos-stream/9/helix/amd64/Dockerfile, this update avoided some messier workarounds for problems installing helix-scripts
dependencies when using Python 3.9. I don't remember the exact details but believe this was the most straightforward fix
that said
- 3.9 will be out of support in October IIRC
- I'll try the
python -m pip install --upgrade pip
workaround used in src/debian/11/helix/arm64v8/Dockerfile (where 3.9 is the newest available Python)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be good to document in this repo (not in this issue):
- Min required Python version
- Encouraged patterns for that
@@ -52,5 +53,5 @@ USER helixbot | |||
# Install Helix Dependencies | |||
ENV VIRTUAL_ENV=/home/helixbot/.vsts-env | |||
RUN python3 -m venv $VIRTUAL_ENV | |||
ENV PATH="$VIRTUAL_ENV/bin:$PATH" | |||
ENV PATH="${VIRTUAL_ENV}/bin:$PATH" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I notice inconsistent usage of this syntax. In the AlmaLinux Dockerfile, it used ${PATH}
but here it does not. I think they should all be consistent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure. expect in the next iteration (likely tomorrow at best)
@@ -29,6 +30,7 @@ RUN tdnf install --setopt tsflags=nodocs --refresh -y \ | |||
python3 \ | |||
python3-pip \ | |||
shadow-utils \ | |||
sudo \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What requires sudo
? These images have been working for a long time without it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sudo
worked in the majority of the Helix containers and some teams use it in their testing scenarios. I also kept hitting issues in the updated containers as I tried to understand problems while I worked through iterations of this PR. so, consistency, muscle memory, and customer scenarios all play into this
we can discuss removing sudo
where that's an option later. doing so will make some things more difficult around investigations and enhancements in this repo e.g., on OSes whose package managers can only be run as root — even when querying but it may not break the world
@@ -44,8 +44,9 @@ RUN dnf upgrade --refresh -y \ | |||
openssl-devel \ | |||
perl-FindBin \ | |||
procps-ng \ | |||
python3 \ | |||
python3-devel \ | |||
python3.12 \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I'm not sure why this is being done. The version associated with python3
is 3.9 which is the same version that exists in some other Helix images. So why does this need to be higher?
@@ -76,8 +77,14 @@ RUN ARCH=$TARGETARCH \ | |||
&& dpkg -i libmsquic* \ | |||
&& rm libmsquic* | |||
|
|||
# Move user (probably ubuntu) with UID 1000 to UID 2000 to avoid conflict with helixbot | |||
RUN user_id=$(id -un 1000) && \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems a lot easier to just delete the ubuntu
user. This is what the devcontainer Dockerfile does for .NET: https://github.com/devcontainers/images/blob/4828efe3844f92a9802aba9831b3567d79f9c958/src/dotnet/.devcontainer/Dockerfile#L8
helixbot
user UID 1000 in Ubuntu 24.04 and 24.10 imagespip
Python package in Debian 11 imagespsutil
packagevenv
forhelixbot
userhelix-scripts
module and dependencies invenv
helixbot
userpython3.8
python -m pip
instead ofpip
to avoid version issuessudo
package in Mariner 2.0 and Azure Linux 3.0 imagessudo
working in Debian 11 imageshelixbot2
users; they no longer help anythingvirtualenv
module and comments ithelix-scripts
wheels after installing them