[Bug] elastic-agent-complete >= 8.16.0 chowns /usr/share/elastic-agent/.pki #6684

Open
renzedj opened this issue Jan 31, 2025 · 3 comments
Labels: bug (Something isn't working), Team:Elastic-Agent-Control-Plane (Label for the Agent Control Plane team)

renzedj commented Jan 31, 2025

I initially posted this as an issue in the elastic-synthetics repo, but I think it probably belongs here in elastic-agent, so I'm posting it here and cancelling it there.

I am using Elastic Synthetics for application monitoring, and I'm having an issue with the image elastic-agent-complete >= 8.16.0.

We use self-signed certs on many of the internal sites we monitor, so I have to add the CAs for these certs to the browser store (nssdb) in order to run browser journeys. The store is located at /usr/share/elastic-agent/.pki. The nssdb must be owned by the browser user in order to be used for browser journeys, which means that it must be chowned to elastic-agent:elastic-agent.

This worked well for elastic-agent-complete < 8.16.0. However, starting with 8.16.0, elastic-agent appears to chown everything in /usr/share/elastic-agent to root:root at startup. This includes /usr/share/elastic-agent/.pki, which means that the browser cannot use the certificates from nssdb.

I posted this to the discussion forums, but have not received a response.

To replicate:

  1. Add certificates to nssdb in elastic-agent-complete image.
  2. Start elastic-agent-complete.
  3. Validate ownership of /usr/share/elastic-agent/.pki and its contents.
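Step 3 can be checked with a small helper along these lines. This is only a sketch: the function name list_not_owned_by is made up for illustration, and it relies on nothing beyond the standard find -user test.

```shell
#!/bin/sh
# list_not_owned_by DIR OWNER
# Print every path under DIR that is NOT owned by OWNER.
# OWNER may be a user name or a numeric uid (hypothetical helper, not part of elastic-agent).
list_not_owned_by() {
  dir=$1
  owner=$2
  find "$dir" ! -user "$owner" -print
}

# Example, run inside the container (paths taken from the issue text):
#   list_not_owned_by /usr/share/elastic-agent/.pki elastic-agent
# Empty output means every file and directory is owned by elastic-agent.
```

On an affected image the helper would be expected to list the .pki directory and its contents, since they end up owned by root.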

Workaround

I found that when I get a shell into the elastic-agent-complete container and reset the ownership and permissions of /usr/share/elastic-agent/.pki to the correct values, browser tests immediately start again and run correctly until the pod stops. The replacement pod, of course, comes up with the incorrect ownership and permissions again. As a result, I added the following workaround to a custom docker-entrypoint script:

# Reset ownership and permissions once the elastic-agent process has started
if [ -d /usr/share/elastic-agent/.pki/ ]
then
  (
    while true
    do
      sleep 15
      # Match either the "container" or the "otel" subcommand
      if pgrep -f "elastic-agent container" > /dev/null || pgrep -f "elastic-agent otel" > /dev/null
      then
        chown -R elastic-agent:elastic-agent /usr/share/elastic-agent/.pki/
        # find -exec avoids the quoting pitfalls of piping file names into xargs
        find /usr/share/elastic-agent/.pki/ -type d -exec chmod 0700 {} +
        find /usr/share/elastic-agent/.pki/ -type f -exec chmod 0600 {} +
        exit 0
      fi
    done
  ) &
fi

This forks a background subshell that checks every 15s whether elastic-agent container (or elastic-agent otel) is running. Once it is, it resets ownership and permissions for /usr/share/elastic-agent/.pki to the correct values and exits.
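One thing the loop does not handle is the agent never starting, in which case the subshell polls forever. A bounded variant could look roughly like the sketch below. The helper name wait_until and the 300-second limit in the usage comment are my own assumptions, not something from the elastic-agent image.

```shell
#!/bin/sh
# wait_until TIMEOUT INTERVAL CMD [ARGS...]
# Re-run CMD every INTERVAL seconds until it succeeds (returns 0)
# or until at least TIMEOUT seconds have been spent sleeping (returns 1).
# Hypothetical helper for illustration only.
wait_until() {
  timeout=$1
  interval=$2
  shift 2
  waited=0
  while ! "$@" > /dev/null 2>&1
  do
    if [ "$waited" -ge "$timeout" ]; then
      return 1
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
  return 0
}

# Possible use in the entrypoint (same pgrep pattern as the workaround):
#   wait_until 300 15 pgrep -f "elastic-agent container" &&
#     chown -R elastic-agent:elastic-agent /usr/share/elastic-agent/.pki/
```

The process check is passed in as a command, so the same helper works for either the container or the otel subcommand.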

@elasticmachine
Contributor

Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)

@cmacknz
Member

cmacknz commented Jan 31, 2025

The chown was introduced in #4925 and lets the container work properly when run as root or elastic-agent (or another non-root user).

You could potentially also work around this by running the container as a non-root user and then explicitly giving it the capabilities it needs to run synthetics. Then the chown would be to the user you specify and not root.

@pkoutsovasilis might have a better suggestion or way to handle this scenario.

@pkoutsovasilis
Contributor

pkoutsovasilis commented Jan 31, 2025

UPDATE: OK, after reading the issue description again, I have to say that this seems kinda entangled: the browser journeys are invoked as the container's default user, namely elastic-agent, but the actual elastic-agent process is running as root. Is that even possible?! I have to investigate what does the user switching. In the meantime, do try running the elastic-agent container without root; if you don't remove any capabilities from the container, the elastic-agent process will be essentially the same in terms of capabilities when invoked as root vs. non-root. Or try to utilise the $XDG_DATA_HOME env var, as the nssdb looks for $XDG_DATA_HOME/.pki when $HOME/.pki does not exist.

Hey @renzedj 👋 Just echoing the same message as @cmacknz: this functionality of elastic-agent was introduced to mitigate some ownership inconsistencies that could lead to execution problems for certain components of elastic-agent. However, this feature affects only files under /usr/share/elastic-agent, so my initial thought is: can /usr/share/elastic-agent/.pki live under a different directory that is not part of /usr/share/elastic-agent, e.g. under the home dir of the browser user?
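To make the $XDG_DATA_HOME suggestion a bit more concrete, here is a rough sketch of the lookup order as described in this comment: $HOME/.pki first, then $XDG_DATA_HOME/.pki. I have not verified that order against NSS or the browser, so treat it as an assumption. The function name resolve_pki_dir is made up for illustration.

```shell
#!/bin/sh
# resolve_pki_dir
# Print the .pki directory that would be consulted under the lookup order
# described in the comment (an unverified assumption):
#   1. $HOME/.pki, if that directory exists
#   2. otherwise $XDG_DATA_HOME/.pki, if XDG_DATA_HOME is set
# Returns 1 if neither applies. Hypothetical helper for illustration only.
resolve_pki_dir() {
  if [ -d "${HOME}/.pki" ]; then
    printf '%s\n' "${HOME}/.pki"
  elif [ -n "${XDG_DATA_HOME:-}" ]; then
    printf '%s\n' "${XDG_DATA_HOME}/.pki"
  else
    return 1
  fi
}
```

If that order holds, pointing XDG_DATA_HOME at a directory outside /usr/share/elastic-agent (and not shipping a $HOME/.pki) would keep the store out of reach of the startup chown.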
