I initially posted this as an issue in the elastic-synthetics repo, but I think it probably belongs here in elastic-agent, so I'm posting it here and cancelling it there.
I am using Elastic Synthetics for application monitoring, and I'm having an issue with the image elastic-agent-complete >= 8.16.0.
We use self-signed certs on many of the internal sites we monitor, so I have to add the CAs for these certs to the browser store (nssdb) in order to run browser journeys. The nssdb is located at /usr/share/elastic-agent/.pki. It must be owned by the browser user in order to be used for browser journeys, which means that it must be chowned to elastic-agent:elastic-agent.
This worked well for elastic-agent-complete < 8.16.0. However, starting with 8.16.0, elastic-agent appears to chown everything in /usr/share/elastic-agent to root:root at startup. This includes /usr/share/elastic-agent/.pki, which means that the browser cannot use the certificates from the nssdb.
I posted this to the discussion forums, but have not received a response.
To replicate:
1. Add certificates to the nssdb in the elastic-agent-complete image.
2. Start elastic-agent-complete.
3. Validate ownership of /usr/share/elastic-agent/.pki and its contents.
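For the validation step, something like the following could be run in a shell inside the container. It is only a sketch: the function name `list_wrong_owner` is made up for illustration, and it assumes `find` is available in the image. It prints every path whose user or group differs from the expected one, so empty output means ownership is as expected.

```shell
#!/bin/sh
# Hypothetical helper (not part of elastic-agent): list every path under a
# directory whose user or group differs from the expected one.
# Usage: list_wrong_owner <dir> <user> <group>
list_wrong_owner() {
    # -user/-group accept either a name or a numeric ID
    find "$1" \( ! -user "$2" -o ! -group "$3" \) -print
}
```

For example, `list_wrong_owner /usr/share/elastic-agent/.pki elastic-agent elastic-agent` should print nothing on an image < 8.16.0 and, if the behaviour described above holds, print the whole tree on >= 8.16.0.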
Workaround
I found that when I get a shell into the elastic-agent-complete container and reset /usr/share/elastic-agent/.pki ownership and permissions to the correct values, browser tests immediately start again and run correctly until the pod stops; the replacement pod of course has the incorrect permissions and ownership. As a result, I added the following workaround to a custom docker-entrypoint script:
# Set ownership after the elastic-agent process starts
if [ -d /usr/share/elastic-agent/.pki/ ]
then
    (
        # Poll every 15s in a background subshell until elastic-agent is running
        while true
        do
            sleep 15
            if pgrep -f "elastic-agent container" > /dev/null || pgrep -f "elastic-agent otel" > /dev/null
            then
                # Restore the ownership and modes the browser user needs
                chown -R elastic-agent:elastic-agent /usr/share/elastic-agent/.pki/
                find /usr/share/elastic-agent/.pki/ -type d -exec chmod 0700 {} +
                find /usr/share/elastic-agent/.pki/ -type f -exec chmod 0600 {} +
                exit 0
            fi
        done
    ) &
fi
This forks a background subshell that checks every 15s whether an elastic-agent container (or elastic-agent otel) process is running. Once one is, it resets ownership and permissions for /usr/share/elastic-agent/.pki to the correct values and exits.
The chown was introduced in #4925 and lets the container work properly when run as root or elastic-agent (or another non-root user).
You could potentially also work around this by running the container as a non-root user and then explicitly giving it the capabilities it needs to run synthetics. Then the chown would be to the user you specify and not root.
@pkoutsovasilis might have a better suggestion or way to handle this scenario.
UPDATE: OK, after reading the issue description again, I have to say that this seems kinda entangled: the browser journeys are invoked as the container's default user, namely elastic-agent, but the actual elastic-agent process is running as root. Is that even possible?! I have to investigate what does the user switching. In the meantime, do try running the elastic-agent container without root; if you don't remove any capabilities from the container, the elastic-agent process will be essentially the same in terms of capabilities whether invoked as root or non-root. Or try to utilise the $XDG_DATA_HOME env var, as the nssdb looks for $XDG_DATA_HOME/.pki when $HOME/.pki does not exist.
Hey @renzedj 👋 Just echoing the same message as @cmacknz: this functionality of elastic-agent was introduced to mitigate some ownership inconsistencies that could lead to execution problems for certain components of elastic-agent. However, this feature affects only files under /usr/share/elastic-agent, so my initial thinking is: could /usr/share/elastic-agent/.pki live under a different directory that is not part of /usr/share/elastic-agent, e.g. under the home dir of the browser user?
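If the nssdb were moved outside /usr/share/elastic-agent as suggested, a custom entrypoint could do the copy roughly as follows. This is an untested sketch: `relocate_nssdb` is a made-up helper name, and the 0700/0600 modes are simply the ones used in the workaround above. Whether the browser actually picks up the new location (for example via $HOME or $XDG_DATA_HOME) is an assumption that would need verifying.

```shell
#!/bin/sh
# Hypothetical helper: copy an NSS database directory to a new location
# and restrict its modes (directories 0700, files 0600).
# Usage: relocate_nssdb <source_dir> <target_dir>
relocate_nssdb() {
    src=$1
    dst=$2
    mkdir -p "$dst" || return 1
    # Copy the contents of src (including dotfiles) into dst
    cp -Rp "$src"/. "$dst"/ || return 1
    find "$dst" -type d -exec chmod 0700 {} +
    find "$dst" -type f -exec chmod 0600 {} +
}
```

A call such as `relocate_nssdb /usr/share/elastic-agent/.pki /opt/browser-pki` (the target path is purely illustrative) would be followed by a `chown -R elastic-agent:elastic-agent` on the target if the entrypoint runs as root, since the copy is otherwise owned by whoever ran it.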