Microk8s constantly spawning new processes when idle #2007
I notice that calico does spawn containers every now and then, like those calico-ipam ones.
@balchua the thing is: I want to use (micro)k8s to run my software, not to constantly max out 1+ CPUs. It seems a bit excessive, but maybe my expectation is wrong and that is normal behavior? Perhaps my setup is botched? I can't really tell. The docs describe MicroK8s as lightweight; hogging CPU does not feel lightweight to me.
Hi @knittl, I agree with you on that, although the Kubernetes control plane usually takes some resources on the node. You can set up a worker-only node to run your apps, with a separate control plane, to avoid resource contention with your apps. I just wrote up the steps on how to set up a worker-only node with MicroK8s.
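For reference, a rough sketch of that worker-only flow (the join address and token are placeholders taken from your own `microk8s add-node` output, and the `--worker` flag is only available in newer MicroK8s releases):

```bash
# On the existing control-plane node: print a join command with a fresh token
microk8s add-node

# On the machine that should only run workloads: join as a worker-only node,
# so no control-plane services (apiserver, scheduler, etc.) run there.
# The address and token below are placeholders from the add-node output.
microk8s join 192.168.1.10:25000/<token> --worker
```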
@balchua I totally understand that k8s will require resources to run :) It just feels a little much for my use case: a small number of deployments, i.e. 14 deployments with 18 pods in total, already causes the server to have a load average of 5–10 most of the time. Stopping MicroK8s makes the load go away. The server is a VPS with the following specs: an 8-core Intel Xeon CPU at 2.2 GHz, 32 GB RAM (8 GB in use), and plenty of storage (1 TB).
Are you running multiple nodes? I am running my 3-node control plane on 2 CPU x 4 GB machines, with 1 worker node. Running Prometheus, Linkerd, the dashboard and my own workload doesn't bring my load average to 5 or 10.
No, it is a single-node setup. No Prometheus, no dashboard (listing all pods confirms this). The load average currently is (1, 5, 15 minutes): 3.55, 4.33, 3.69.
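A quick way to run this kind of check on a single-node MicroK8s install (the exact commands are assumed here, not quoted from the comment above):

```bash
# All pods across all namespaces; on this setup only the user deployments
# and the default MicroK8s/calico system pods are expected to show up
microk8s kubectl get pods --all-namespaces

# Load average over the last 1, 5 and 15 minutes
uptime
```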
This is strange. I've not seen a load average this high on good HW specs. A good way to check is to measure the load average on a bare-bones MicroK8s, i.e. with no workloads scheduled or running except for whatever comes by default with MicroK8s. Then try adding your workloads one by one, measuring the load average in between.
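A minimal sketch of that kind of measurement, assuming a plain bash loop and a 10-minute sampling interval (both are arbitrary choices, not something from this thread):

```bash
#!/bin/bash
# Log the load average every 10 minutes; run this first against a bare
# MicroK8s install, then keep it running while deploying workloads one by one.
while true; do
  echo "$(date -Is) $(cat /proc/loadavg)" >> "$HOME/loadavg.log"
  sleep 600
done
```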
Another thing: if using top or htop to look at the load average, IMHO as long as the load average is below the number of cores, it should be fine. So in your case a load average of 3.5 on an 8-core system is acceptable.
A load average of 3 out of 8 cores is acceptable in theory, yes. But the system feels sluggish when connected via SSH, which it does not when MicroK8s is not running. And considering the server is not actively doing anything, it still seems too high (i.e. 3 is too much for doing nothing).
I finally found some time to analyze this a bit further. I restarted all pods this morning so that they are comparable. Afterwards, I set up a cronjob that goes over all running pods and reads out their CPU usage (accumulated user and system CPU time) as reported by the kernel's per-container accounting; a rough sketch of that kind of script is shown below. The output format of my file is one line per pod and sample. After less than 2 hours of running pods, the calico-node pods top the list.
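A sketch of such a sampling script, assuming cgroup v1 and the usual kubepods hierarchy (the real paths, file names and log format differ per setup and are not taken from the original comment):

```bash
#!/bin/bash
# Append the accumulated user/system CPU time of every container under the
# kubepods cgroup to a log file. cpuacct.stat contains two lines,
# "user <ticks>" and "system <ticks>", in USER_HZ ticks.
ts=$(date -Is)
for f in /sys/fs/cgroup/cpu,cpuacct/kubepods*/*/*/cpuacct.stat; do
  echo "$ts $f $(tr '\n' ' ' < "$f")" >> "$HOME/pod-cpu.log"
done
```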
The metrics are taken at 10-minute intervals, with the trend for calico-node as follows (+9500 user / +8000 system):
The next non-calico pod is one of my "real" applications, which currently sits at user=4000 and system=400. It is a Spring Boot application, so most of the CPU was spent during startup (after all pods had started, it showed around user=2000). The trend of the Spring Boot application is not as steep (+1700 user / +300 system in the same period):
Is this normal/expected? There are currently 22 pods running (4 in the kube-system namespace).
Probably related: #1567. I also took a look with execsnoop-bpfcc on a totally freshly installed system (via `snap install microk8s --classic`); this is what I'm getting from the log after a short amount of time.
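For anyone wanting to reproduce this, execsnoop comes from the bcc tools; a minimal sketch on Ubuntu (package name as shipped there):

```bash
# execsnoop-bpfcc is part of the bpfcc-tools package on Ubuntu/Debian
sudo apt install bpfcc-tools

# Trace every exec() on the host; on a supposedly idle single-node cluster
# this shows which commands MicroK8s keeps spawning
sudo execsnoop-bpfcc
```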
@knittl: yes!
I have a load average of 1 on a 4-vCPU system with a completely idle MicroK8s instance, and I find this not really acceptable for something which is called "micro...". execsnoop-bpfcc at least is telling me that this is not something which was built with efficiency in mind. Is this a bug, or is this the same old "so what? we have enough RAM/CPU today!" developer story?
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
no, not stale
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Not stale, not completed. This is still something that we are actively engaging with and trying to improve further. Some notes:
Further, the MicroK8s team is working on moving most of this logic out of bash scripts and into the cluster-agent service, which means that we don't spawn new processes in a loop every 5 seconds without reason. Hope this communicates our team's work to the people subscribed to this issue.
Can someone please reopen this and stop these fu..ing stale bots? They suck! Addendum: @neoaggelos, thank you!
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Why has this been closed? It does not look like it has been resolved! ヾ( ・`⌓´・)ノ゙
I have a VPS server running MicroK8s (previously 1.19/stable, now upgraded to 1.20/stable while troubleshooting). My server's load is constantly high (5-10), with `kube-apiserver` appearing as the top CPU consumer.
Digging a bit more, using `execsnoop-bpfcc` I found lots of new processes being constantly spawned by microk8s: every other second, up to 50 new processes per second. The server is basically idle. That can't be right, can it?
Quick and dirty stats with `grep microk8s | cut -d' ' -f1 | uniq -c` on the output of `execsnoop-bpfcc`:
But some processes excluded by this look like they were spawned by microk8s too, judging by the PPIDs.
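A reconstruction of that quick-and-dirty counting (the capture file and the extra sort are additions for readability, not part of the original pipeline):

```bash
# Record exec() events for a while, then stop with Ctrl-C
sudo execsnoop-bpfcc > /tmp/execs.txt

# Count how often each command name (column 1 of the execsnoop output)
# appears on lines mentioning microk8s; the sort ranks them by frequency
grep microk8s /tmp/execs.txt | cut -d' ' -f1 | sort | uniq -c | sort -rn
```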
One of the processes which I find a lot is the following:
This process is spawned at least once per second, always with a new, unique id (a 64 character hex-string).
The logs of containerd always contain the same 4-6 lines repeated, again with changing ids:
Inspection report attached: inspection-report-20210212_211908.tar.gz