Service startup notifications
It is easy for a process supervision suite to know when a service that was up
is now down: the long-lived process implementing the service is dead. The
supervisor, running as the daemon's parent, is instantly notified via a SIGCHLD.
When this happens, s6-supervise sends a 'd' event
to its ./event fifodir, so every subscriber
knows that the service is down. All is well.
It is much trickier for a process supervision suite to know when a service
that was down is now up. The supervisor forks and execs the
daemon, and knows when the exec has succeeded; but after that point, it's all
up to the daemon itself. Some daemons do a lot of initialization work before
they're actually ready to serve, and it is impossible for the supervisor to
know exactly when the service is really ready.
s6-supervise sends a 'u' event to its
./event fifodir when it successfully
spawns the daemon, but any subscriber
reacting to 'u' is subject to a race condition - the service provided by the
daemon may not be ready yet.
Reliable startup notifications need support from the daemons themselves.
Daemons should do two things to signal the outside world that they are
ready:
- Update a state file, so other processes can get a snapshot
of the daemon's state
- Send an event to processes waiting for a state change.
This is complex to implement in every single daemon, so s6 provides
tools to make it easier for daemon authors, without any need to link
against the s6 library or use any s6-specific construct:
daemons can simply write a line to a file descriptor of their choice,
then close that file descriptor, when they're ready to serve. This is
a generic mechanism that some daemons already implement.
s6 supports that mechanism natively: when the
service directory for the daemon contains
a valid notification-fd file, the daemon's supervisor, i.e. the
s6-supervise program, will properly catch
the daemon's message, update the status file (supervise/status),
then notify all the subscribers
with a 'U' event, meaning that the service is now up and ready.
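To make the convention concrete, here is a minimal Python sketch of the receiving side: it reads from the read end of a pipe until it sees a newline, and treats that newline as the readiness signal. It only illustrates the line-based protocol described above and is not how s6-supervise is actually implemented; the function name wait_for_ready_line is hypothetical.

```python
import os


def wait_for_ready_line(read_fd: int) -> bool:
    """Return True once a newline arrives on read_fd, False on EOF without one.

    Hypothetical helper: it mimics the convention (a line, then close),
    not the real s6-supervise code.
    """
    while True:
        chunk = os.read(read_fd, 512)
        if not chunk:
            # The writer closed the descriptor without ever sending a newline.
            return False
        if b"\n" in chunk:
            # Anything before the newline is ignored; only the newline matters.
            return True


if __name__ == "__main__":
    r, w = os.pipe()
    os.write(w, b"ready\n")  # what a daemon would do once it can serve
    os.close(w)
    print(wait_for_ready_line(r))
    os.close(r)
```

A daemon that exits or closes the descriptor before writing a newline is never reported as ready in this sketch.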
This method should really be implemented in every long-running
program providing a service. When that is not the case, it is impossible
to provide reliable startup notifications, and subscribers must then
make do with the unreliable 'u' events provided by s6-supervise.
Unfortunately, a lot of long-running programs do not offer that
functionality; instead, they provide a way to poll them: an external
program that runs and checks whether the service is ready. This is a
bad mechanism, for several reasons. Nevertheless, until all daemons
are patched to notify their own readiness, s6 provides a way to run
such a check program to poll for readiness, and route its result into
the s6 notification system: s6-notifyoncheck.
How to use a check program with s6 (i.e. readiness checking via polling)
- Let's say you have a daemon foo, started under s6 via a
/run/service/foo service directory, and that it comes with a
foo-check program that behaves differently depending on whether
foo is ready or not.
- Create an executable script /run/service/foo/data/check
that calls foo-check. Make sure this script exits 0 when
foo is ready and nonzero when it's not.
- In your /run/service/foo/run script that starts foo,
instead of executing into foo, execute into
s6-notifyoncheck foo. Read the
s6-notifyoncheck page if you need to
give it options to tune the polling.
- Run echo 3 > /run/service/foo/notification-fd. If file descriptor
3 is already open when your run script executes foo, replace 3 with
a file descriptor you know is not already open.
- That's it.
  - Your check script will be automatically invoked by
    s6-notifyoncheck, until it succeeds.
  - s6-notifyoncheck will send the
    readiness notification to the file descriptor given in the notification-fd
    file.
  - s6-supervise will receive it and will
    mark foo as ready.
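Conceptually, the polling performed in this setup looks like the following Python sketch: run the check repeatedly until it exits 0, then write a newline to the notification descriptor and close it. This is an illustration only, not the s6-notifyoncheck implementation; the function name, the polling interval and the attempt limit are assumptions chosen for the example.

```python
import os
import subprocess
import sys
import time


def poll_then_notify(check_argv, notif_fd, interval=0.05, max_attempts=100):
    """Run check_argv until it exits 0, then signal readiness on notif_fd.

    Returns True if readiness was notified, False if attempts ran out.
    All parameter names and defaults are hypothetical.
    """
    for _ in range(max_attempts):
        if subprocess.run(check_argv).returncode == 0:
            os.write(notif_fd, b"\n")  # the newline is the readiness signal
            os.close(notif_fd)
            return True
        time.sleep(interval)
    os.close(notif_fd)  # give up without ever writing a newline
    return False


if __name__ == "__main__":
    r, w = os.pipe()
    # Stand-in for ./data/check: a command that succeeds immediately.
    ok = poll_then_notify([sys.executable, "-c", "raise SystemExit(0)"], w)
    print(ok, os.read(r, 1))
    os.close(r)
```

The sketch also shows why polling is suboptimal: readiness is only noticed at the next check, and every check costs a process spawn.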
How to design a daemon so it uses the s6 mechanism without resorting to polling (i.e. readiness notification)
The s6-notifyoncheck mechanism was
made to accommodate daemons that provide a check program but do not notify
readiness themselves; it works, but is suboptimal.
If you are writing the foo daemon, here is how you can make things better:
- Readiness notification should be optional, so you should guard all
the following with a run-time option to foo.
- Assume that a file descriptor other than 0, 1 or 2 will already be open
when your daemon starts. You can hardcode 3 (or 4), or you can make it configurable via a command line
option. See for instance the -D notif option to the
mdevd program. It
really doesn't matter what this number is; the important thing is that your
daemon knows that this fd is already open, and is not using it for another
purpose.
- Do nothing with this file descriptor until your daemon is ready.
- When your daemon is ready, write a newline to this file descriptor.
  - If you like, you may write other data before the newline, just in
    case it is printed to the terminal. It is not necessary, and it is best to
    keep that data short. If the line is read by
    s6-supervise, it will be entirely ignored;
    only the newline is important.
- Then close that file descriptor.
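The steps above can be sketched in Python as follows. This is a minimal illustration under assumptions: the option name --notify-fd and the helper notify_ready are hypothetical, and a real daemon would call the helper only after its initialization (binding sockets, loading configuration, and so on) has completed.

```python
import argparse
import os


def notify_ready(fd: int, message: bytes = b"") -> None:
    """Write an optional short message plus a newline to fd, then close it."""
    os.write(fd, message + b"\n")
    os.close(fd)


def main(argv=None) -> None:
    parser = argparse.ArgumentParser(prog="foo")
    # Hypothetical option: readiness notification is off unless requested.
    parser.add_argument("--notify-fd", type=int, default=None)
    args = parser.parse_args(argv)

    # ... daemon initialization would happen here ...

    if args.notify_fd is not None:
        # Do nothing with the descriptor before this point.
        notify_ready(args.notify_fd, b"ready")

    # ... the main service loop would follow ...
```

With this sketch, calling main(["--notify-fd", "3"]) assumes that file descriptor 3 was opened by the parent process before the daemon started; without the option, no notification is attempted.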
The user who then makes foo run under s6 just has to do the
following:
- Write 3, or the file descriptor the foo daemon uses
to notify readiness, to the /run/service/foo/notification-fd file.
- In the /run/service/foo/run script, invoke foo
with the option that activates the readiness notification. If foo
makes the notification fd configurable, the user needs to make sure that
the number that is given to this option is the same as the number that is
written in the notification-fd file.
- And that is all. Do not use s6-notifyoncheck
in this case, because you do not need to poll to know whether foo
is ready; instead, foo will directly communicate its readiness to
s6-supervise, and that is a much more efficient
mechanism.
What does s6-supervise do with this readiness information?
- s6-supervise maintains a readiness
state for other programs to read. You can check for it, for instance, via
the s6-svstat program.
- s6-supervise also broadcasts the
readiness event to programs that are waiting for it - for instance the
s6-svwait program. This can be used to
make sure that other programs only start when the daemon is ready. For
instance, the
s6-rc service manager uses
that mechanism to bring sets of services up or down: a service starts as
soon as all its dependencies are ready, but never earlier.
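As an illustration of a consumer of this readiness information, the following Python sketch blocks until a service is up and ready by running s6-svwait with the -U option, optionally with -t to give a timeout in milliseconds. It assumes s6-svwait is on the PATH and reuses the /run/service/foo example directory; the helper names are hypothetical.

```python
import subprocess


def svwait_ready_argv(servicedir: str, timeout_ms: int = 0) -> list:
    """Build an s6-svwait command line that waits for 'up and ready'."""
    argv = ["s6-svwait", "-U"]
    if timeout_ms > 0:
        # Without -t, s6-svwait waits indefinitely.
        argv += ["-t", str(timeout_ms)]
    return argv + [servicedir]


def wait_until_ready(servicedir: str, timeout_ms: int = 0) -> bool:
    """Return True if s6-svwait exited 0, i.e. the service became ready."""
    argv = svwait_ready_argv(servicedir, timeout_ms)
    return subprocess.run(argv).returncode == 0


if __name__ == "__main__":
    print(svwait_ready_argv("/run/service/foo", 5000))
```

A program that depends on foo could call wait_until_ready("/run/service/foo", 5000) before connecting, rather than reacting to the unreliable 'u' event.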