Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve Linux Helix containers usability #1374

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

dougbu
Copy link
Member

@dougbu dougbu commented Mar 2, 2025

  • give helixbot user UID 1000 in Ubuntu 24.04 and 24.10 images
  • upgrade pip Python package in Debian 11 images
  • always provide and activate a venv for helixbot user
  • always install helix-scripts module and dependencies in venv
    • some images installed this module in the system Python environment
    • where possible, do this installation as helixbot user
  • bump some Python OS package versions
    • avoid some older versions e.g. python3.8
    • link new version to /usr/bin/python
    • use python -m pip instead of pip to avoid version issues
  • install sudo package in Mariner 2.0 and Azure Linux 3.0 images
  • try to get sudo working in Debian 11 images
  • nits:
    • remove helixbot2 users; they no longer help anything
    • remove virtualenv module and comments it
    • remove helix-scripts wheels after installing them
    • make some syntax a bit more consistent

@dougbu dougbu requested review from a team as code owners March 2, 2025 02:20
@dougbu dougbu force-pushed the dougbu/helix.cleanup branch 5 times, most recently from bd41e6a to 558b6a6 Compare March 3, 2025 03:04
- give `helixbot` user UID 1000 in Ubuntu 24.04 and 24.10 images
- upgrade `pip` Python package in Debian 11 images
  - work around https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1003252
  - that bug caused problems installing `psutil` package
- always provide and activate a `venv` for `helixbot` user
- always install `helix-scripts` module and dependencies in `venv`
  - some images installed this module in the system Python environment
  - where possible, do this installation as `helixbot` user
- bump some Python OS package versions
  - avoid some older versions e.g. `python3.8`
  - link new version to /usr/bin/python
  - use `python -m pip` instead of `pip` to avoid version issues
- install `sudo` package in Mariner 2.0 and Azure Linux 3.0 images
- try to get `sudo` working in Debian 11 images
- nits:
  - remove `helixbot2` users; they no longer help anything
  - remove `virtualenv` module and comments it
  - remove `helix-scripts` wheels after installing them
  - make some syntax a bit more consistent
@dougbu dougbu force-pushed the dougbu/helix.cleanup branch from 558b6a6 to 5c38ffb Compare March 3, 2025 05:17
@dougbu dougbu requested a review from richlander March 3, 2025 06:37
@dougbu
Copy link
Member Author

dougbu commented Mar 3, 2025

build is green and I've done some local validation. ready for review…

@@ -76,8 +77,14 @@ RUN ARCH=$TARGETARCH \
&& dpkg -i libmsquic* \
&& rm libmsquic*

# Move user (probably ubuntu) with UID 1000 to UID 2000 to avoid conflict with helixbot
RUN user_id=$(id -un 1000) && \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This strikes me as an undesirable solution, at least in the long-term. This approach is committing us to including all of this /etc/passwd management over the long-term in our Ubuntu Dockerfiles. Ideally, we'd change the uid we expect.

Node images use this same uid. They appear to be skipping this problem by not supporting Ubuntu.

https://hub.docker.com/_/node/

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dougbu, for reference, here's the script devcontainers uses to change UIDs... https://github.com/devcontainers/cli/blob/main/scripts/updateUID.Dockerfile

Is anything hardcoded to reference the helixbot user, or could you use the built-in ubuntu user instead?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems a lot easier to just delete the ubuntu user. This is what the devcontainer Dockerfile does for .NET: https://github.com/devcontainers/images/blob/4828efe3844f92a9802aba9831b3567d79f9c958/src/dotnet/.devcontainer/Dockerfile#L8

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm hoping we can migrate to a new UID over time.

You can see that we tried to avoid this problem with our official images, with our app user. For clarity, we developed this solution MANY years later, so had the benefit of better guidance.

$ docker run --rm mcr.microsoft.com/dotnet/runtime-deps cat /etc/passwd | tail -n 1
app:x:1654:1654::/home/app:/bin/sh

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there's a lot to unpack here but I'd like to avoid the long-term considerations for a bit longer. much of the infrastructure in Helix will need to change as we go further and it's not clear whether UID alignment will be required even mid-term. for now, I'm dealing w/ a few exceptions (Azure Linux 3 hosts, Ubuntu 24.* containers) rather than doing anything broader

background here is a number of attempts I made e.g., to expand file permissions in the hosts and containers, update default permissions, et cetera. but kept coming back to the unfortunate conclusion we need to align UIDs for now. without that alignment Helix work items fail or (worse) succeed but mask problems reporting tests and so on

the script you referenced @lbussell is messy and focuses entirely on /etc/passwd and /etc/groups, files that mean less and less these days. plus that script only updates the discovered user's home directory, not other files the user may own. usermod handles the user's home automatically


deleting the ubuntu user sounds great to me. any objections to that approach in another iteration❓

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's fine for the near-term.

@@ -44,8 +44,9 @@ RUN dnf upgrade --refresh -y \
openssl-devel \
perl-FindBin \
procps-ng \
python3 \
python3-devel \
python3.12 \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Versioning pinning Python is a new practice. It would be good to include why this is needed and what other Dockerfile maintainers should do going forward, in the PR description.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I'm not sure why this is being done. The version associated with python3 is 3.9 which is the same version that exists in some other Helix images. So why does this need to be higher?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Versioning pinning Python is a new practice

not really. it was used already in src/opensuse/15.6/helix/amd64/Dockerfile

the background is fairly simple: OS vendors sometimes have Python dependencies of their own or leave the Python packages alone after a release. this means our software can easily fall behind Python's deprecations and we have to support a broader range of Python versions than is manageable

the versioned packages move forward and are sometimes backported to prior releases, reducing our costs. see below for more about Centos Stream 9 in particular (bumping Python elsewhere in this PR e.g. on Debian 11 was required)


here and in src/centos-stream/9/helix/amd64/Dockerfile, this update avoided some messier workarounds for problems installing helix-scripts dependencies when using Python 3.9. I don't remember the exact details but believe this was the most straightforward fix

that said

  1. 3.9 will be out of support in October IIRC
  2. I'll try the python -m pip install --upgrade pip workaround used in src/debian/11/helix/arm64v8/Dockerfile (where 3.9 is the newest available Python)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to document in this repo (not in this issue):

  • Min required Python version
  • Encouraged patterns for that

@@ -52,5 +53,5 @@ USER helixbot
# Install Helix Dependencies
ENV VIRTUAL_ENV=/home/helixbot/.vsts-env
RUN python3 -m venv $VIRTUAL_ENV
ENV PATH="$VIRTUAL_ENV/bin:$PATH"
ENV PATH="${VIRTUAL_ENV}/bin:$PATH"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I notice inconsistent usage of this syntax. In the AlmaLinux Dockerfile, it used ${PATH} but here it does not. I think they should all be consistent.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure. expect in the next iteration (likely tomorrow at best)

@@ -29,6 +30,7 @@ RUN tdnf install --setopt tsflags=nodocs --refresh -y \
python3 \
python3-pip \
shadow-utils \
sudo \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What requires sudo? These images have been working for a long time without it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sudo worked in the majority of the Helix containers and some teams use it in their testing scenarios. I also kept hitting issues in the updated containers as I tried to understand problems while I worked through iterations of this PR. so, consistency, muscle memory, and customer scenarios all play into this

we can discuss removing sudo where that's an option later. doing so will make some things more difficult around investigations and enhancements in this repo e.g., on OSes whose package managers can only be run as root — even when querying but it may not break the world

@@ -44,8 +44,9 @@ RUN dnf upgrade --refresh -y \
openssl-devel \
perl-FindBin \
procps-ng \
python3 \
python3-devel \
python3.12 \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I'm not sure why this is being done. The version associated with python3 is 3.9 which is the same version that exists in some other Helix images. So why does this need to be higher?

@@ -76,8 +77,14 @@ RUN ARCH=$TARGETARCH \
&& dpkg -i libmsquic* \
&& rm libmsquic*

# Move user (probably ubuntu) with UID 1000 to UID 2000 to avoid conflict with helixbot
RUN user_id=$(id -un 1000) && \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems a lot easier to just delete the ubuntu user. This is what the devcontainer Dockerfile does for .NET: https://github.com/devcontainers/images/blob/4828efe3844f92a9802aba9831b3567d79f9c958/src/dotnet/.devcontainer/Dockerfile#L8

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants