[Bug] elastic-agent-complete >= 8.16.0 chowns /usr/share/elastic-agent/.pki #6684

renzedj · 2025-01-31T16:30:48Z

I initially posted this as an issue in the elastic-synthetics repo, but I think it probably belongs here in elastic-agent, so I'm posting it here and cancelling it there.

I am using Elastic Synthetics for application monitoring, and I'm having an issue with the image elastic-agent-complete >= 8.16.0.

We use self-signed certs on many of the internal sites we monitor, so I have to add CAs for these certs to the browser store (nssdb) in order to run browser journeys. The nssdb is located at /usr/share/elastic-agent/.pki. It must be owned by the browser user in order to be used for browser journeys, which means that it must be chowned to elastic-agent:elastic-agent.
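For readers who have not done this before, adding a CA to an nssdb might look roughly like the sketch below. It assumes NSS's certutil is available when the image is built, and that the database lives in an nssdb subdirectory of .pki; the function name, nickname, and file path are hypothetical and not taken from my actual setup.

```shell
# Sketch only: add a CA certificate to an NSS database with certutil.
# Assumes certutil (NSS tools) is installed; names and paths are placeholders.
add_ca_to_nssdb() {
  nssdb_dir=$1   # e.g. /usr/share/elastic-agent/.pki/nssdb (assumed layout)
  nickname=$2    # label for the CA inside the database
  ca_file=$3     # PEM-encoded CA certificate

  # Fail early if the certificate file cannot be read
  [ -r "$ca_file" ] || { echo "cannot read CA file: $ca_file" >&2; return 1; }

  mkdir -p "$nssdb_dir" || return 1

  # Create an empty database (no password) if one does not exist yet
  if [ ! -f "$nssdb_dir/cert9.db" ]; then
    certutil -d "sql:$nssdb_dir" -N --empty-password || return 1
  fi

  # "C,," marks the certificate as a trusted CA for issuing server certs
  certutil -d "sql:$nssdb_dir" -A -t "C,," -n "$nickname" -i "$ca_file"
}
```

A call would look like `add_ca_to_nssdb /usr/share/elastic-agent/.pki/nssdb internal-ca /tmp/internal-ca.pem`, followed by the chown to elastic-agent:elastic-agent described above.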

This worked well for elastic-agent-complete < 8.16.0. However, starting with 8.16.0, elastic-agent appears to chown everything in /usr/share/elastic-agent to root:root at startup. This includes /usr/share/elastic-agent/.pki, which means that the browser cannot use the certificates from nssdb.

I posted this to the discussion forums, but have not received a response.

To replicate:

Add certificates to nssdb in elastic-agent-complete image.
Start elastic-agent-complete.
Validate ownership of /usr/share/elastic-agent/.pki and its contents.
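The validation step can be done with something like the following sketch. It assumes GNU find and stat are present in the container; the function name is made up for illustration.

```shell
# Sketch only: print "owner:group mode path" for every entry under a directory.
# Assumes GNU coreutils stat (-c format option).
report_ownership() {
  dir=${1:-/usr/share/elastic-agent/.pki}
  find "$dir" -exec stat -c '%U:%G %a %n' {} +
}
```

Running `report_ownership` in a shell inside the container before and after elastic-agent starts shows whether the entries have changed from elastic-agent:elastic-agent to root:root.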

Workaround

I found that when I get a shell into the elastic-agent-complete container and reset /usr/share/elastic-agent/.pki ownership and permissions to the correct values, browser tests immediately start again and run correctly until the pod stops; the replacement pod of course has the incorrect permissions and ownership. As a result, I added the following workaround to a custom docker-entrypoint script:

# Set ownership after the elastic-agent process starts
if [ -d /usr/share/elastic-agent/.pki/ ]
then
  (
    while true
    do
      sleep 15
      if pgrep -f "elastic-agent container" > /dev/null || pgrep -f "elastic-agent otel" > /dev/null
      then
        chown -R elastic-agent:elastic-agent /usr/share/elastic-agent/.pki/
        # Directories get 0700, files get 0600; -exec avoids xargs quoting issues
        find /usr/share/elastic-agent/.pki/ -type d -exec chmod 0700 {} +
        find /usr/share/elastic-agent/.pki/ -type f -exec chmod 0600 {} +
        exit 0
      fi
    done
  ) &
fi

This forks a background subshell that checks every 15s (starting after an initial 15s sleep) whether elastic-agent container or elastic-agent otel is running. Once one of them is, it resets ownership and permissions for /usr/share/elastic-agent/.pki to the correct values and exits.


elasticmachine · 2025-01-31T18:18:52Z

Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)

cmacknz · 2025-01-31T18:47:01Z

The chown was introduced in #4925 and lets the container work properly when run as root or elastic-agent (or another non-root user).

You could potentially also work around this by running the container as a non-root user and then explicitly giving it the capabilities it needs to run synthetics. Then the chown would be to the user you specify and not root.

@pkoutsovasilis might have a better suggestion or way to handle this scenario.

pkoutsovasilis · 2025-01-31T19:49:49Z

UPDATE: ok, after reading the issue description again, I have to say that this seems kinda entangled: the browser journeys are invoked as the container default user, namely elastic-agent, but the actual elastic-agent process is running as the root user. Is that even possible?! I have to investigate who does the user switching. In the meantime, do try to run the elastic-agent container without root; if you don't remove any capabilities from the container, the elastic-agent process will be essentially the same in terms of capabilities when invoked as root vs non-root. Or try to utilise the $XDG_DATA_HOME env var, as the nssdb looks for $XDG_DATA_HOME/.pki when $HOME/.pki does not exist.

hey @renzedj 👋 Just echoing the same message as @cmacknz: this functionality of elastic-agent was introduced to mitigate some ownership inconsistencies that could lead to execution problems for certain components of elastic-agent. However, this feature affects only files under /usr/share/elastic-agent, so my initial thinking is whether /usr/share/elastic-agent/.pki can live under a different directory that is not part of /usr/share/elastic-agent, e.g. under the home dir of the browser user?

renzedj added the bug Something isn't working label Jan 31, 2025

renzedj mentioned this issue Jan 31, 2025

[Bug] elastic-agent-complete >= 8.16.0 chowns /usr/share/elastic-agent/.pki elastic/synthetics#994

Closed

cmacknz added the Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team label Jan 31, 2025
