Service startup notifications

It is easy for a process supervision suite to know when a service that was up is now down: the long-lived process implementing the service is dead. The supervisor, running as the daemon's parent, is instantly notified via SIGCHLD. When this happens, s6-supervise sends a 'd' event to its ./event fifodir, so every subscriber knows that the service is down. All is well.

It is much trickier for a process supervision suite to know when a service that was down is now up. The supervisor forks and execs the daemon, and knows when the exec has succeeded; but after that point, it's all up to the daemon itself. Some daemons do a lot of initialization work before they're actually ready to serve, and it is impossible for the supervisor to know exactly when the service is really ready. s6-supervise sends a 'u' event to its ./event fifodir when it successfully spawns the daemon, but any subscriber reacting to 'u' is subject to a race condition - the service provided by the daemon may not be ready yet.

Reliable startup notifications need support from the daemons themselves. Daemons should do two things to signal the outside world that they are ready:

  1. Update a state file, so other processes can get a snapshot of the daemon's state.
  2. Send an event to processes waiting for a state change.
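
The two steps above can be sketched as follows. This is only an illustration of the idea, not the way s6 implements it: the helper name publish_ready, the "ready" state file contents, and the use of one plain FIFO per waiting process are all assumptions made for the example, and the real fifodir mechanism is more involved.

```python
import os
import tempfile


def publish_ready(state_path, fifo_paths):
    """Hypothetical helper: record readiness, then wake up waiting processes.

    Returns the number of waiters that were actually notified.
    """
    # Step 1: update the state file atomically, so that a process taking a
    # snapshot sees either the old state or the new one, never a partial write.
    directory = os.path.dirname(os.path.abspath(state_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    os.fchmod(fd, 0o644)  # mkstemp creates the file 0600; let others read it
    with os.fdopen(fd, "w") as f:
        f.write("ready\n")  # assumed format, for illustration only
    os.replace(tmp_path, state_path)

    # Step 2: send a one-byte event to every process currently waiting.
    notified = 0
    for path in fifo_paths:
        try:
            # Non-blocking open fails if nobody is reading this FIFO.
            wfd = os.open(path, os.O_WRONLY | os.O_NONBLOCK)
        except OSError:
            continue  # no waiter on this FIFO, skip it
        try:
            os.write(wfd, b"U")
            notified += 1
        finally:
            os.close(wfd)
    return notified
```

Updating the state file before sending the event matters: a process woken up by the event can then read the state file and is guaranteed to find the new state there.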

This is complex to implement in every single daemon, so s6 provides tools to make it easier for daemon authors, without any need to link against the s6 library or use any s6-specific construct. When a daemon is ready to serve, it can simply write a line to a file descriptor of its choice, then close that file descriptor. This is a generic mechanism that some daemons already implement. The administrator can then run the daemon under s6-notifywhenup, which will catch the daemon's message, update a state file itself, then notify all the subscribers with a 'U' event, meaning that the service is now up.

Note that there is still a small race condition remaining: if the daemon writes a line then instantly dies, and the supervisor picks up the death before the s6-notifywhenup program picks up the line, it is possible for the event sequence written to the fifodir to be wrong - 'd' before 'U'. This should be extremely rare, but unfortunately the race condition is unavoidable. The only way to be absolutely race-free is to have the daemon perform its readiness notification itself, which requires specific support.
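
The "write a line, then close" mechanism can be sketched as follows, from both ends of the pipe. This is a minimal sketch under stated assumptions: the function names notify_ready and wait_for_ready are hypothetical, the daemon is assumed to have been handed an already open, writable file descriptor, and the watcher side only illustrates the general idea of waiting for a newline. It is not the implementation of s6-notifywhenup.

```python
import os


def notify_ready(fd):
    """Daemon side: announce readiness on fd, then close it.

    Any line works; only the terminating newline matters in this sketch.
    """
    os.write(fd, b"\n")
    os.close(fd)


def wait_for_ready(fd):
    """Watcher side: block until the daemon announces readiness.

    Returns True once a newline has been read, or False if the daemon
    closed the descriptor (or died) without ever writing a full line.
    """
    while True:
        chunk = os.read(fd, 4096)
        if not chunk:
            return False  # end of file before any newline
        if b"\n" in chunk:
            return True
```

A daemon would call notify_ready once, after its initialization is complete, for instance right after its listening socket starts accepting connections. Closing the descriptor afterwards lets the watcher tell a daemon that never became ready apart from one that is still initializing.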

This method should really be implemented in every long-running program providing a service. When it is not, it is impossible to provide reliable startup notifications, and subscribers have to be content with the unreliable 'u' events provided by s6-supervise.