This repository has been archived by the owner on Aug 30, 2021. It is now read-only.

Make startup scripts and service files more friendly to modern process supervision #111

Open
space88man opened this issue Oct 25, 2019 · 12 comments

@space88man

The rabbitmq-server launch script runs multiple downstream scripts to start epmd and beam as long-running processes. This goes against modern process supervision, which wants epmd in the foreground and beam and epmd run as two separate services.

epmd -daemon enables epmd to escape process supervision suites that do not capture /usr/sbin/rabbitmq-server in a cgroup.

Observations:

  1. systemd: the cleaner, recommended way is to run epmd as one service and beam as another, with the beam unit declaring Requires= or After= on the epmd unit.

As it stands, systemd barely manages to tame epmd -daemon, and only because of cgroups.

  2. Tried the /usr/sbin/rabbitmq-server script in a Docker container running s6 as the process supervisor. epmd escapes the supervisor by double forking when run with -daemon.
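
The escape described above can be reproduced without epmd at all. What follows is a minimal sketch, assuming Linux with /proc mounted: an intermediate `sh -c` stands in for a daemonizing launcher, starts a background `sleep`, and exits straight away. The variable names are made up for illustration. The orphaned `sleep` ends up with a new parent (PID 1, or the nearest subreaper), which is why a supervisor that only tracks its direct children loses sight of it.

```shell
#!/bin/sh
# Illustration only (Linux, needs /proc). The intermediate shell plays the
# role of a daemonizing launcher: it starts a background child, prints its
# own PID and the child's PID, then exits immediately.
set -- $(sh -c 'sleep 30 >/dev/null 2>&1 & echo "$$ $!"')
launcher_pid=$1
orphan_pid=$2

# Ask the kernel who the orphan's parent is now that the launcher is gone.
orphan_ppid=$(sed -n 's/^PPid:[[:space:]]*//p' "/proc/$orphan_pid/status")

echo "launcher PID (already exited): $launcher_pid"
echo "orphan PID: $orphan_pid, parent PID is now: $orphan_ppid"

# Clean up the stand-in process.
kill "$orphan_pid"
```

The reported parent PID is no longer the launcher's, which matches the epmd row in the ps listings later in this thread.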

Suggestions:

  1. Split epmd off into a separate service file and don't use -daemon
  2. Have a more direct command line that runs /usr/lib64/erlang/...beam.smp. The daemon script seems to go through enormous contortions to launch beam.smp: lots of runuser, checking for UID/GID, etc.
    Will something like
    ExecStart=/usr/lib64/erlang/erts-10.5.3/bin/beam.smp -W w -A 64 ... just work?
    However, I am not sure where all the -MHas, -MBas, etc. parameters come from.
@michaelklishin michaelklishin changed the title [RFE] Make daemon script simpler to work with modern process supervision [RFE] Make daemon script more friendly to modern process supervision Oct 25, 2019
@michaelklishin
Member

FYI, most of the script is migrated to Erlang on the rabbitmq-server-script-replacement branch.

@michaelklishin michaelklishin changed the title [RFE] Make daemon script more friendly to modern process supervision Make daemon script more friendly to modern process supervision Oct 25, 2019
@lukebakken
Contributor

lukebakken commented Oct 25, 2019

Considerations to take into account:

  • Not all of our supported operating systems use systemd
  • Erlang packaging includes the VM (beam.smp) and epmd - they aren't separate packages.
  • The general practice in the Erlang world is for applications to also start epmd if it's not already started. If you pass -name or -sname to your erl executable, it will start epmd for you.
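
To see whether an epmd instance is already running before a node starts, one option is `epmd -names`, which asks the local epmd for its registered node names and fails if it cannot connect. A minimal sketch; the `EPMD_BIN` variable and the `epmd_is_running` helper are names invented here for illustration:

```shell
#!/bin/sh
# EPMD_BIN is a hypothetical override so the check can point at a specific
# ERTS install, for example /usr/lib64/erlang/erts-10.5.3/bin/epmd.
EPMD_BIN=${EPMD_BIN:-epmd}

# Succeeds if a local epmd answers the names query, fails otherwise
# (including when the epmd binary cannot be found).
epmd_is_running() {
    "$EPMD_BIN" -names >/dev/null 2>&1
}

if epmd_is_running; then
    echo "epmd is already running; a new node will register with it"
else
    echo "epmd is not running (or not on PATH); the first distributed node would start it"
fi
```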

@space88man
Author

space88man commented Oct 26, 2019

"Not all of our supported operating systems use systemd" - other process supervisors in containers can lose sight of epmd. Can it be started without the -daemon argument? (Systemd actually manages to capture it due to cgroups - at least on EL7 it manages to clean up epmd/beam.smp and the erl children inet_gethost processes).

I remember that in the EL6 days /etc/init.d/rabbitmq-server stop didn't always clean up everything correctly.

@michaelklishin michaelklishin changed the title Make daemon script more friendly to modern process supervision Startup scripts: make daemon script more friendly to modern process supervision Oct 27, 2019
@michaelklishin
Member

@space88man let's keep this issue a little bit more focused. Given that RabbitMQ supports a variety of platforms that do not use systemd and most of the scripts are moving to Erlang, what are some of the specific changes that you would like to see in the RPM package?

@michaelklishin michaelklishin changed the title Startup scripts: make daemon script more friendly to modern process supervision Make startup scripts and service files more friendly to modern process supervision Oct 27, 2019
@michaelklishin michaelklishin transferred this issue from rabbitmq/rabbitmq-server Oct 27, 2019
@michaelklishin
Member

Moved to the packaging repo as it currently seems to fit best here.

@space88man
Author

rabbitmq-epmd.service:

[Unit]
Description=Erlang Port Mapper Daemon
After=syslog.target network.target

[Service]
User=rabbitmq
Group=rabbitmq
WorkingDirectory=/var/lib/rabbitmq
ExecStart=/usr/lib64/erlang/erts-10.5.3/bin/epmd

[Install]
WantedBy=rabbitmq.target

rabbitmq-server.service:

[Unit]
Description=RabbitMQ broker
After=rabbitmq-epmd.service
Requires=rabbitmq-epmd.service

[Service]
Type=notify
User=rabbitmq
Group=rabbitmq
UMask=0027
NotifyAccess=all
TimeoutStartSec=3600
LimitNOFILE=32768
Restart=on-failure
RestartSec=10
WorkingDirectory=/var/lib/rabbitmq
ExecStart=/usr/sbin/rabbitmq-server
ExecStop=/usr/sbin/rabbitmqctl shutdown
SuccessExitStatus=69

[Install]
WantedBy=rabbitmq.target

rabbitmq.target:

[Unit]
Description=RabbitMQ Broker Target
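
The part of these unit files that matters is that rabbitmq-server.service names rabbitmq-epmd.service under both Requires= and After=. In systemd, Requires= pulls the other unit in and After= orders the two, so both lines are needed to get "start epmd first". A rough way to check a unit file for this, assuming a hypothetical helper called `declares_dependency` and a scratch copy of the [Unit] section:

```shell
#!/bin/sh
# declares_dependency UNIT_FILE DEPENDENCY
# Succeeds only if UNIT_FILE names DEPENDENCY on both a Requires= line
# and an After= line. This is a plain text check, not a systemd query.
declares_dependency() {
    grep '^Requires=' "$1" | grep -Fq -- "$2" &&
        grep '^After=' "$1" | grep -Fq -- "$2"
}

# Scratch copy of the [Unit] section from rabbitmq-server.service above.
unit_file=$(mktemp)
cat > "$unit_file" <<'EOF'
[Unit]
Description=RabbitMQ broker
After=rabbitmq-epmd.service
Requires=rabbitmq-epmd.service
EOF

if declares_dependency "$unit_file" rabbitmq-epmd.service; then
    echo "rabbitmq-epmd.service is required and ordered first"
else
    echo "dependency on rabbitmq-epmd.service is incomplete"
fi
```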

@lukebakken
Contributor

@space88man If you have a specific problem, or issue you have seen due to how RabbitMQ currently starts, that would be useful information for us.

@michaelklishin
Member

@space88man sorry but we would not consider a change unless we understand it. Why should we adopt those unit files? What are the risks?

@space88man
Author

space88man commented Oct 29, 2019

TL;DR: to work nicely with process supervisors (supervisord/s6 etc) don't launch epmd with -daemon.

@michaelklishin @lukebakken - This issue is intended to address process supervisors like supervisord, s6 which will be unable to manage epmd, given the way it is currently launched.

I'd like to clarify that the current rabbitmq:

  • works perfectly with systemd. Unlike other process supervisors systemd is able to manage epmd due to the enclosing cgroup.
  • except for epmd (due to -daemon), works with other process supervisors

Process supervisor problem with /usr/sbin/rabbitmq-server
@lukebakken Specific problem: epmd is not correctly managed.
The key issue with supervisord/s6 etc is that "Programs meant to be run under supervisor should not daemonize themselves. Instead, they should run in the foreground. They should not detach from the terminal from which they are started." (Taken from the supervisord docs.)

Supervisor: https://github.com/just-containers/s6-overlay. A service in s6 is just an executable supervised by a monitoring process s6-supervise. Every service has its own long-running monitor that lies between it and PID 1, so the service is not intended to be a direct child of PID 1. The monitor never dies but the service main and child processes are expected to die when the service is down.

Configure a service rabbitmq in s6 and give the launch script as /usr/sbin/rabbitmq-server. (This example is in a container to remove all the noise from other OS processes.)

UID          PID    PPID  C STIME TTY          TIME CMD
root           1       0  0 01:32 pts/0    00:00:00 s6-svscan -t0 /var/run/s6/services
root          27       1  0 01:32 pts/0    00:00:00 s6-supervise s6-fdholderd
root        2457       1  0 01:53 pts/0    00:00:00 s6-supervise rabbitmq
rabbitmq    2458    2457  0 01:53 ?        00:00:00 /bin/sh /usr/lib/rabbitmq/bin/rabbitmq-server
rabbitmq    2666       1  0 01:53 ?        00:00:00 /usr/lib64/erlang/erts-10.5.3/bin/epmd -daemon
rabbitmq    2773    2458 34 01:53 ?        00:00:02 /usr/lib64/erlang/erts-10.5.3/bin/beam.smp -W w -A 256 -MBas ageffcbf -MHas ageffcbf -MBlm
rabbitmq    3093    2773  1 01:53 ?        00:00:00 erl_child_setup 1048576
rabbitmq    3147    3093  0 01:53 ?        00:00:00 inet_gethost 4
rabbitmq    3148    3147  0 01:53 ?        00:00:00 inet_gethost 4

Observations:

  • all rabbitmq processes are children of s6-supervise 2457 except epmd which escapes and goes to PID 1
  • try to stop the service: the s6 command is s6-svc -d /run/s6/services/rabbitmq
UID          PID    PPID  C STIME TTY          TIME CMD
root           1       0  0 01:32 pts/0    00:00:00 s6-svscan -t0 /var/run/s6/services
root          27       1  0 01:32 pts/0    00:00:00 s6-supervise s6-fdholderd
root        2457       1  0 01:53 pts/0    00:00:00 s6-supervise rabbitmq
rabbitmq    2666       1  0 01:53 ?        00:00:00 /usr/lib64/erlang/erts-10.5.3/bin/epmd -daemon

Problem: epmd survives as it was reparented to PID 1 by -daemon.

Simplest solution
Is there a way to launch epmd without the -daemon option from /usr/sbin/rabbitmq-server or /usr/lib/rabbitmq/bin/rabbitmq-server?

Two service solution
Configure a separate service for epmd (without -daemon). The process tree looks like this:

UID          PID    PPID  C STIME TTY          TIME CMD
root           1       0  0 01:32 pts/0    00:00:00 s6-svscan -t0 /var/run/s6/services
root          27       1  0 01:32 pts/0    00:00:00 s6-supervise s6-fdholderd
root        2457       1  0 01:53 pts/0    00:00:00 s6-supervise rabbitmq
root        3538       1  0 02:23 pts/0    00:00:00 s6-supervise epmd
rabbitmq    3558    3538  0 02:23 ?        00:00:00 /usr/lib64/erlang/erts-10.5.3/bin/epmd
rabbitmq    3561    2457  1 02:23 ?        00:00:00 /bin/sh /usr/lib/rabbitmq/bin/rabbitmq-server
rabbitmq    3876    3561 67 02:23 ?        00:00:02 /usr/lib64/erlang/erts-10.5.3/bin/beam.smp -W w -A 256 -MBas ageffcbf -MHas ageffcbf -MBlm
rabbitmq    4196    3876  2 02:23 ?        00:00:00 erl_child_setup 1048576
rabbitmq    4250    4196  0 02:23 ?        00:00:00 inet_gethost 4
rabbitmq    4251    4250  0 02:23 ?        00:00:00 inet_gethost 4

Notice epmd is contained under s6-supervise 3538. Both services can be stopped cleanly.

# try to stop services cleanly. In s6 lingo, the commands are:
# s6-svc -d /run/s6/services/rabbitmq
# s6-svc -d /run/s6/services/epmd
UID          PID    PPID  C STIME TTY          TIME CMD
root           1       0  0 01:32 pts/0    00:00:00 s6-svscan -t0 /var/run/s6/services
root          27       1  0 01:32 pts/0    00:00:00 s6-supervise s6-fdholderd
root        2457       1  0 01:53 pts/0    00:00:00 s6-supervise rabbitmq
root        3538       1  0 02:23 pts/0    00:00:00 s6-supervise epmd

Of course, for this to work properly, s6 (or any other process supervisor or service manager) would need to use service dependencies and declare that the rabbitmq service depends on the epmd service.
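
For s6, one way to approximate that dependency is to have the rabbitmq run script wait for the epmd service before starting the broker. The sketch below writes two run scripts into a scratch directory so they can be inspected before being copied into place. The `SVC_ROOT` and `SCAN_DIR` variables, the directory layout, the use of `s6-setuidgid`, and the 10 second timeout are all assumptions for illustration, not something shipped by RabbitMQ or s6-overlay. Note that `s6-svwait -u` only waits for the epmd process to be up, not for it to be listening.

```shell
#!/bin/sh
# Sketch only: generate s6 service directories for epmd and rabbitmq.
svc_root=${SVC_ROOT:-$(mktemp -d)}      # where the run scripts are written
scan_dir=${SCAN_DIR:-/run/s6/services}  # live service directory used in the examples above

mkdir -p "$svc_root/epmd" "$svc_root/rabbitmq"

# epmd: started with no daemonizing flag, so it stays in the foreground
# as a child of its s6-supervise process.
cat > "$svc_root/epmd/run" <<'EOF'
#!/bin/sh
exec s6-setuidgid rabbitmq /usr/lib64/erlang/erts-10.5.3/bin/epmd
EOF

# rabbitmq: wait (up to 10000 ms) until the epmd service is up,
# then replace this shell with the broker launch script.
cat > "$svc_root/rabbitmq/run" <<EOF
#!/bin/sh
s6-svwait -u -t 10000 "$scan_dir/epmd" || exit 1
exec /usr/sbin/rabbitmq-server
EOF

chmod +x "$svc_root/epmd/run" "$svc_root/rabbitmq/run"
echo "service directories written to $svc_root"
```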

In my previous post, I used systemd as it was easiest to demonstrate the dependency relationship: rabbitmq-server.service (beam.smp) depends on rabbitmq-epmd.service (epmd) and must be started after it. @michaelklishin there is no risk: this is merely an explicit declaration that the epmd process must be started first.

@space88man
Author

space88man commented Oct 29, 2019

@michaelklishin For RPM/systemd based systems, let me try to show the intention of the two service proposal.

  1. epmd is a standalone service;
# systemctl start rabbitmq-epmd
[root@525856dd7915 system]# ps -ef
UID          PID    PPID  C STIME TTY          TIME CMD
root           1       0  0 05:59 ?        00:00:00 /sbin/init
root          16       1  0 05:59 ?        00:00:00 /usr/lib/systemd/systemd-journald
dbus          23       1  0 05:59 ?        00:00:00 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation --syslog-only
rabbitmq    7089       1  0 06:29 ?        00:00:00 /usr/lib64/erlang/erts-10.5.3/bin/epmd
  2. rabbitmq is a separate service; since epmd (its dependency) is already started, rabbitmq can be run:
[root@525856dd7915 system]# systemctl start rabbitmq-server; ps -ef
UID          PID    PPID  C STIME TTY          TIME CMD
root           1       0  0 05:59 ?        00:00:00 /sbin/init
root          16       1  0 05:59 ?        00:00:00 /usr/lib/systemd/systemd-journald
dbus          23       1  0 05:59 ?        00:00:00 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation --syslog-only
rabbitmq    7089       1  0 06:29 ?        00:00:00 /usr/lib64/erlang/erts-10.5.3/bin/epmd
rabbitmq    7093       1 24 06:31 ?        00:00:02 /usr/lib64/erlang/erts-10.5.3/bin/beam.smp -W w -A 256 -MBas ageffcbf -MHas ageffcbf -MBlmbcs 512 -MHlmbcs 51
rabbitmq    7727    7093  0 06:31 ?        00:00:00 erl_child_setup 32768
rabbitmq    7780    7727  0 06:31 ?        00:00:00 inet_gethost 4
rabbitmq    7781    7780  0 06:31 ?        00:00:00 inet_gethost 4
  3. Suppose the user for some reason forgot to start epmd:
UID          PID    PPID  C STIME TTY          TIME CMD
root           1       0  0 05:59 ?        00:00:00 /sbin/init
root          16       1  0 05:59 ?        00:00:00 /usr/lib/systemd/systemd-journald
dbus          23       1  0 05:59 ?        00:00:00 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation --syslog-only

When the user tries to start rabbitmq-server, it will work(!) because rabbitmq-epmd is declared as an explicit dependency.

[root@525856dd7915 system]# systemctl start rabbitmq-server; ps -ef
UID          PID    PPID  C STIME TTY          TIME CMD
root           1       0  0 05:59 ?        00:00:00 /sbin/init
root          16       1  0 05:59 ?        00:00:00 /usr/lib/systemd/systemd-journald
dbus          23       1  0 05:59 ?        00:00:00 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation --syslog-only
root         719       0  0 05:59 pts/1    00:00:00 /bin/bash
rabbitmq    8026       1 12 06:34 ?        00:00:02 /usr/lib64/erlang/erts-10.5.3/bin/beam.smp -W w -A 256 -MBas ageffcbf -MHas ageffcbf -MBlmbcs 512 -MHlmbcs 51
rabbitmq    8234       1  0 06:34 ?        00:00:00 /usr/lib64/erlang/erts-10.5.3/bin/epmd
rabbitmq    8660    8026  0 06:34 ?        00:00:00 erl_child_setup 32768
rabbitmq    8713    8660  0 06:34 ?        00:00:00 inet_gethost 4
rabbitmq    8714    8713  0 06:34 ?        00:00:00 inet_gethost 4
root        8723     719  0 06:34 pts/1    00:00:00 ps -ef

Actually, this would work anyway as Erlang has the autostart epmd capability — I am just being explicit here.

  4. Cleaning up example:
# Continued from 3...
# since epmd is started as a dependency, when rabbitmq is stopped epmd is cleaned up as well

[root@525856dd7915 system]# systemctl stop rabbitmq-server
[root@525856dd7915 system]# ps -ef
UID          PID    PPID  C STIME TTY          TIME CMD
root           1       0  0 05:59 ?        00:00:00 /sbin/init
root          16       1  0 05:59 ?        00:00:00 /usr/lib/systemd/systemd-journald
dbus          23       1  0 05:59 ?        00:00:00 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation --syslog-only
root         719       0  0 05:59 pts/1    00:00:00 /bin/bash
root        9094     719  0 06:37 pts/1    00:00:00 ps -ef
  5. Cleaning up separate services. Suppose epmd and rabbitmq are started separately as in 2. rabbitmq can be stopped gracefully without impacting epmd.
# initial state epmd and rabbitmq start separately
[root@525856dd7915 system]# systemctl start rabbitmq-epmd
[root@525856dd7915 system]# systemctl start rabbitmq-server
[root@525856dd7915 system]# ps -ef
UID          PID    PPID  C STIME TTY          TIME CMD
root           1       0  0 05:59 ?        00:00:00 /sbin/init
root          16       1  0 05:59 ?        00:00:00 /usr/lib/systemd/systemd-journald
dbus          23       1  0 05:59 ?        00:00:00 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation --syslog-only
root         719       0  0 05:59 pts/1    00:00:00 /bin/bash
rabbitmq   10962       1  0 06:46 ?        00:00:00 /usr/lib64/erlang/erts-10.5.3/bin/epmd
rabbitmq   10965       1 44 06:46 ?        00:00:02 /usr/lib64/erlang/erts-10.5.3/bin/beam.smp -W w -A 256 -MBas ageffcbf -MHas ageffcbf -MBlmbcs 512 -MHlmbcs 51
rabbitmq   11599   10965  0 06:46 ?        00:00:00 erl_child_setup 32768
rabbitmq   11652   11599  0 06:46 ?        00:00:00 inet_gethost 4
rabbitmq   11653   11652  0 06:46 ?        00:00:00 inet_gethost 4
root       11659     719  0 06:46 pts/1    00:00:00 ps -ef

# stop rabbitmq-server; here epmd is unaffected
[root@525856dd7915 system]# systemctl stop rabbitmq-server; ps -ef
UID          PID    PPID  C STIME TTY          TIME CMD
root           1       0  0 05:59 ?        00:00:00 /sbin/init
root          16       1  0 05:59 ?        00:00:00 /usr/lib/systemd/systemd-journald
dbus          23       1  0 05:59 ?        00:00:00 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation --syslog-only
root         719       0  0 05:59 pts/1    00:00:00 /bin/bash
rabbitmq   10962       1  0 06:46 ?        00:00:00 /usr/lib64/erlang/erts-10.5.3/bin/epmd
root       11827     719  0 06:47 pts/1    00:00:00 ps -ef

This makes me think of a separate question: do people ever run multiple instances of rabbitmq with a single instance of epmd? Then this suggestion might work well in that case.

# hypothetical unit file rabbitmq-server@.service which templates
# an instance of rabbitmq
systemctl start rabbitmq-epmd
systemctl start rabbitmq-server@instance1
systemctl start rabbitmq-server@instance2
# etc etc all sharing the single epmd process

@lukebakken
Contributor

lukebakken commented Oct 29, 2019

Thanks for the explanations.

do people ever run multiple instances of rabbitmq with a single instance of epmd?

Only in development environments.

@michaelklishin @dumbbell this seems like a 4.0 feature, should we choose to undertake it.

@dumbbell
Member

From a systemd point of view, @space88man is right: epmd(1) should be managed separately because it requires privileges, can run from a user account, and open TCP ports which are different from and unrelated to RabbitMQ.

RabbitMQ's mission was never to manage epmd(1). We relied on the way Erlang works for a long time: the first Erlang node to start with or enable distribution implicitly starts epmd(1) if it's missing. Therefore that instance of epmd(1) inherits the user & environment of that Erlang node. If we take a host running both RabbitMQ and Ejabberd as an example, depending on the first service to start, epmd(1) will run under different conditions.

Anyway, as said above, RabbitMQ shouldn't do anything with epmd(1) management IMHO, this is out of scope. However, our Erlang RPM package can probably do something if that's the package in question.

For instance, the Erlang Debian package installs the following epmd.service file:

[Unit]
Description=Erlang Port Mapper Daemon
After=network.target
Requires=epmd.socket

[Service]
ExecStart=/usr/bin/epmd -systemd
Type=simple
StandardOutput=journal
StandardError=journal
User=epmd
Group=epmd

[Install]
Also=epmd.socket
WantedBy=multi-user.target

Would it help to do the same in our Erlang RPM package?
