Sometimes at service startup logs agent gets permission denied #943
Comments
Do you know when it started happening?
It's strange that it's sporadic behavior. Is it possible that the files within the |
I think sometime in July, but I can't be sure. There's been a bit of a habit of "restart the server and it will be fine", and I'm only really looking into it now.
It is possible they were created after the ACL was set, but the ACL definitely applies to the files currently there:
If I restart the agent a few times, sometimes it will work and sometimes it won't, but in all cases the /var/log/messages file is the same file (confirmed by checking that the inode number of the file is unchanged). |
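A minimal sketch of the inode check described above, in Python, for anyone who wants to repeat it across restarts. The helper name `describe_log_file` is hypothetical and not part of the agent. It reports the inode of a file and whether the current user can read it. Note that `os.access` tests the permissions of the user running the script, not those of the agent process, so the readable flag is only meaningful when the script runs as `cwagent`.

```python
import os


def describe_log_file(path):
    """Return (inode, readable) for path, as seen by the current user."""
    st = os.stat(path)
    # os.access uses the real uid/gid of this process, not the agent's.
    return st.st_ino, os.access(path, os.R_OK)


if __name__ == "__main__":
    # The path below is the one mentioned in this thread; adjust as needed.
    for p in ("/var/log/messages",):
        try:
            print(p, describe_log_file(p))
        except OSError as exc:
            print(p, "error:", exc)
```

Running this before and after each agent restart and comparing the printed inode would show whether the file was replaced in between.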
If possible can you try the following cases so we can gather more insight about the issue:
|
This happened again this morning in our staging environment so I took the opportunity to try 1. With the runas cwagent removed it never has the permission denied error, and in the log I can see it's deciding to start as root (and a ps shows it running as root)
I'll try 2. in the next day or so |
Ok, I tried to, my user data is identical to before, but with this cwagent config:
With this config it works every time, although given that it's running as root I am not surprised. It does make me think it's more likely to be an Amazon Linux problem rather than a CloudWatch agent problem. I think I will try upgrading to Amazon Linux 2023 next, and will report back on my findings. |
Going down the Amazon Linux 2023 path is painful due to #382, but I will persist and try to replicate in my dev env, even if I don't end up shipping to production. |
Having tried Amazon Linux 2023, the errors remain when using the cwagent user and a file ACL. |
I'm back to thinking it's to do with the log agent now. I just modified my user data to completely ditch the file ACLs and just add cwagent to the adm group, and make the logs have group adm with group read (I put the effort into making sure log rotate was OK too). Using just basic group permissions still generates the same permission denied errors. My user data now, instead of the facl commands, has:
The permission in /var/log
And in /var/log/audit
With the same cwagent config (with runas cwagent), the symptoms are exactly the same. Sometimes, on restarting the daemon, it receives permission denied
and so on forever or until the next restart. I think it's especially interesting that one of the log files was actually opened successfully (/var/log/dmesg) and all the others receive permission denied errors (the logs continue to |
For the moment I'm going to remediate the issue by making the server "self-healing" and add the following to our userdata:
|
Hi, |
For what it is worth, this is happening to me too. I had a configuration which had been running fine for perhaps years. In the summer I was making upgrades, e.g. a flip to Ubuntu:22.04, and I probably also absorbed the latest AMI. The service I was running would periodically get stuck (I don't think this was related to AWS), so I used CloudWatch to look for a log message indicating it was sick. I'd then use a Lambda to kill the sick server and have it restart. But then I found that periodically the server would stop emitting log messages, meaning I didn't know it was stuck. I didn't root cause this issue properly, and instead created another alarm to check I was receiving log messages; if I was not, I would kill the server and restart it. I've since upgraded another ECS task to Ubuntu 22.04 and found the same problem is happening with that service too. If I log in to the server, I find cwagent struggling to read log files.
But sudo -u cwagent has no problem at all tailing the logs:
Killing the agent the first time didn't fix the issue:
Killing it a second time did
Between these attempts I made no changes to the permissions. I conclude there must be a software fault in the CloudWatch Logs agent and recommend further investigation. In the meantime I'll amend my cwagent to run as root, which is not really a preferred configuration, and I propose this issue should be re-opened. FYI I happen to work for Amazon, but this project is not related to my Amazon employment. But if AWS tech staff want to contact me, you'll find me in the corporate directory. |
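Since `sudo -u cwagent` can read the logs while the running agent cannot, one thing that could be compared between a working and a failing start is the set of supplementary groups the agent process actually holds. The following is a diagnostic sketch only, assuming Linux with /proc mounted. The function names are hypothetical, and nothing in this thread confirms that group membership is the cause.

```python
def parse_groups(status_text):
    """Extract supplementary group IDs from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("Groups:"):
            # The line looks like "Groups:<tab>4 993 "; skip the label.
            return {int(g) for g in line.split()[1:]}
    return set()


def process_groups(pid):
    """Read the supplementary groups of a running process from /proc."""
    with open(f"/proc/{pid}/status") as fh:
        return parse_groups(fh.read())


def missing_groups(pid, expected_gids):
    """Return the expected gids that the running process does not hold."""
    return set(expected_gids) - process_groups(pid)
```

For example, `missing_groups(agent_pid, {grp.getgrnam("adm").gr_gid})` (with `import grp` and `agent_pid` set to the agent's PID) returning a non-empty set on a failing start, and an empty set on a working one, would suggest the process did not pick up the adm group. The primary group is reported on the separate `Gid:` line and is not covered by this sketch.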
I don't seem to be able to re-open the issue, so have created #1140 and referenced this issue. |
Describe the bug
Sometimes when starting up, the logs agent fails to read files with permission denied. Restarting the service repeatedly will sometimes produce permission denied errors, and sometimes run OK.
Steps to reproduce
I can't reliably reproduce it, but our setup is:
Using the amazon-linux-2 AMI amzn2-ami-hvm-2.0.20231020.1-x86_64-gp2 (currently ami-0a123353df8e77189 in eu-west-1). Launch an instance with the following user data:
What did you expect to see?
The server starts up and the CloudWatch Logs agent ships logs.
What did you see instead?
The logs agent sometimes receives permission denied on the log files:
If I restart the service, sometimes it will be able to open the files, sometimes not. I tried following the process with strace, but it only shows a simple permission denied error on the log files in question (as reflected by the error log above).
What version did you use?
amazon-cloudwatch-agent RPM package 1.300028.1-1.amzn2, although it's been happening for a while
What config did you use?
See above in user data and log output
Environment
Amazon Linux 2
Additional context
We're controlling its access using an ACL (as seen in the user data script): adding the cwagent user to the adm group, and giving the adm group read access with the setfacl commands.