Reducing Boot Time in SONiC by Replacing Process manager #1922
/azp run
No pipelines are associated with this pull request.
Initialization performance analysis revealed that supervisord and supervisorctl contribute significantly to boot time, consuming roughly 20% of the total initialization period. This suggests that migrating away from these Python-based tools could improve boot performance. Python applications generally start more slowly than native binaries, because the interpreter and imported modules must load before any work begins.


How long was perf sampling the system during boot-up? Could you please share the testing methodology?
Side note: `supervisorctl` shouldn't be invoked during startup, as its usage was replaced with the `supervisord-dependent-startup` plugin.

### 4.5 Supervisord to Runit config translation

The option we are choosing is to use a Python script to automate the conversion of existing supervisord configurations into the runit format. The script runs as part of the Docker entrypoint and transforms the provided supervisord configuration into runit service directories. This eases migration for Docker applications that use Jinja2-templated configuration files alongside traditional supervisord.conf files. Other options exist as well, such as static sv scripts.
The script generating configs from j2 templates at boot will itself be heavy on CPU; how do you plan to mitigate that? Can the configs be generated at build time?

A supervisord configuration defines a program named orchagent. This program, /usr/bin/orchagent.sh, depends on the portsyncd service. The conversion script translates this dependency into a runit run script for the orchagent service. The generated /etc/service/orchagent/run script waits for portsyncd to reach a running state before executing /usr/bin/orchagent.sh, so the dependency is met before the orchagent process starts.
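
The translation described above could look roughly like the sketch below. It is an illustration only, not the script from this PR: the function names are hypothetical, the `dependent_startup_wait_for` key is assumed to follow the supervisord-dependent-startup plugin format, and the use of `sv check` as the wait mechanism is an assumption about how the generated run script might block on a dependency.

```python
import configparser
import os


def parse_programs(conf_text):
    """Return {program_name: {"command": str, "wait_for": [dependency names]}}."""
    # Only "=" is treated as a delimiter so values such as
    # "portsyncd:running" are not split on the colon.
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.read_string(conf_text)
    programs = {}
    for section in parser.sections():
        if not section.startswith("program:"):
            continue  # skip [supervisord], [eventlistener:...], etc.
        name = section.split(":", 1)[1]
        # Assumed format: space-separated "name:state" pairs.
        # This sketch ignores the requested state and keeps only the name.
        wait_for = parser.get(section, "dependent_startup_wait_for", fallback="")
        deps = [item.split(":", 1)[0] for item in wait_for.split()]
        programs[name] = {
            "command": parser.get(section, "command"),
            "wait_for": deps,
        }
    return programs


def render_run_script(command, wait_for):
    """Build the text of a runit ./run script for one program."""
    lines = ["#!/bin/sh"]
    for dep in wait_for:
        # `sv check` exits non-zero if the dependency is not up in time;
        # the run script then exits and runsv retries it shortly after.
        lines.append(f"sv check {dep} >/dev/null || exit 1")
    lines.append(f"exec {command}")
    return "\n".join(lines) + "\n"


def write_service_dirs(programs, root="/etc/service"):
    """Create <root>/<name>/run for every parsed program."""
    for name, prog in programs.items():
        service_dir = os.path.join(root, name)
        os.makedirs(service_dir, exist_ok=True)
        run_path = os.path.join(service_dir, "run")
        with open(run_path, "w") as handle:
            handle.write(render_run_script(prog["command"], prog["wait_for"]))
        os.chmod(run_path, 0o755)  # runsv requires ./run to be executable


# Hypothetical input mirroring the orchagent example in the text.
EXAMPLE_CONF = """
[program:orchagent]
command=/usr/bin/orchagent.sh
dependent_startup=true
dependent_startup_wait_for=portsyncd:running
"""
```

Calling `write_service_dirs(parse_programs(EXAMPLE_CONF))` would produce an `/etc/service/orchagent/run` script that checks portsyncd and then execs `/usr/bin/orchagent.sh`. Note that "up" in runit only means the run script's process is alive unless the dependency's service directory also provides a `./check` script.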
How do you determine a running state with `runit`? Please check https://supervisord.org/subprocess.html#process-states to ensure the same behaviour, including the `startsecs=` parameter.
Side note: did you investigate why `orchagent` has a dependency on `portsyncd`? The whole idea of synchronizing processes using a `supervisord` process state like `RUNNING` does not seem to guarantee any synchronization.
Similarly, a common pattern across multiple containers is to start all processes after `rsyslogd` reaches the `RUNNING` state in `supervisord`, which does not actually guarantee that `rsyslogd` is able to receive syslog messages.
Does runit (or the alternatives) have something like systemd's sd_notify?
Also, please check sonic-net/sonic-buildimage#13765 for more info on the delays caused by how supervisor works.

### 4.6 Enabling Runit as the process manager

To enable runit as the process manager, create an empty file named /etc/runit-manager and then trigger a configuration reload. This can be achieved by executing the command config reload or by reloading the device.
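
As a rough illustration of the marker-file switch described above, an entrypoint could pick the process manager as sketched here. The function names and both command lines are assumptions made for the example; the source does not say where this check runs or how either manager is launched.

```python
import os

# Marker path taken from the text above.
RUNIT_MARKER = "/etc/runit-manager"


def select_process_manager(marker_path=RUNIT_MARKER):
    """Return "runit" if the marker file exists, otherwise "supervisord"."""
    return "runit" if os.path.exists(marker_path) else "supervisord"


def entrypoint_command(manager):
    """Return the argv used to start the chosen manager (illustrative only)."""
    if manager == "runit":
        # Assumed: runsvdir supervising the generated service directories.
        return ["runsvdir", "-P", "/etc/service"]
    # Assumed supervisord location; adjust to the actual image layout.
    return ["/usr/local/bin/supervisord"]
```

An entrypoint would then end with something like `os.execvp(cmd[0], cmd)` so that the chosen manager replaces the entrypoint process.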
Why support both at the same time?

## 7. Warmboot and Fastboot Design Impact
N/A
This has an impact on warm and fast boot; please include reboot and upgrade timing.

Any open issues or action items will be tracked here. This may include tasks like benchmarking different process managers, developing the configuration conversion tool, and updating the init process.

Currently, runit offers no equivalent of supervisord's supervisor-proc-exit-listener for syslog alerting based on process states. This is a functionality gap we need to address.
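
One possible way to narrow this gap, sketched under assumptions and not part of this PR, is for the conversion script to also emit a runit `./finish` script that logs to syslog whenever the service's run script exits. The function name, the syslog tag, and the message wording are hypothetical; the sketch relies on runsv passing the exit code and wait status to `./finish` and on the standard `logger` utility being present in the container.

```python
def render_finish_script(service_name, priority="daemon.err"):
    """Build the text of a runit ./finish script that reports exits to syslog."""
    # $1 and $2 are left unexpanded here; the shell fills them in at run time.
    message = (
        f"Process '{service_name}' exited "
        "(exit code $1, wait status $2)"
    )
    return (
        "#!/bin/sh\n"
        "# runsv passes ./run's exit code as $1 (-1 if it did not exit normally)\n"
        "# and the low byte of its wait status as $2.\n"
        f'logger -p {priority} -t runit-finish "{message}"\n'
    )
```

Unlike supervisor-proc-exit-listener, this fires on every exit of the run script, including intentional stops, so filtering for critical processes and expected exits would still need to be designed.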
How do supervisord process states map to runit?

{
    "PROCESS_MANAGER": {
        "runit": {
            "enabled": "true"
How does this relate to the file-based configuration described in 4.6?

Restarting a docker container based on the auto-restart attribute in the Feature table is not currently handled and will be addressed later.
Container lifetime is not controlled by the init process inside the container; could you please clarify?



Potential replacement process managers will be evaluated based on criteria such as speed, resource consumption, and ease of integration with SONiC:
Has `systemd` been evaluated? I realize that it is heavier in terms of code size and feature set (and possible impact to disk space), but I'm hoping that the fact that it is compiled code (compared to Python-based supervisord) still shows an improvement while preserving features.
Reducing Boot Time in SONiC by Replacing Process manager
What we did:
Replaced the current process manager (e.g., `supervisord`) in SONiC with a more efficient alternative, Runit.
Why we did it:
To improve startup speed, this design optimizes service initialization by replacing the existing process manager with a higher-performance alternative. This is particularly important for switches that use the ASIC's internal CPU to run SONiC.
Support added:
Replaced the Supervisord process manager with Runit, which monitors all processes as supervisord does.