summaryrefslogtreecommitdiff
path: root/doc/notifywhenup.html
blob: 6f9b06bc4a0279df4ba27236875b1c815aab1f06 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
<html>
  <head>
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
    <meta http-equiv="Content-Language" content="en" />
    <title>s6: service startup notifications</title>
    <meta name="Description" content="s6: service startup notifications" />
    <meta name="Keywords" content="s6 ftrig notification notifier writer libftrigw ftrigw startup U up svwait s6-svwait" />
    <!-- <link rel="stylesheet" type="text/css" href="http://skarnet.org/default.css" /> -->
  </head>
<body>

<p>
<a href="index.html">s6</a><br />
<a href="http://skarnet.org/software/">Software</a><br />
<a href="http://skarnet.org/">skarnet.org</a>
</p>

<h1> Service startup notifications </h1>

<p>
 It is easy for a process supervision suite to know when a service that was <em>up</em>
is now <em>down</em>: the long-lived process implementing the service is dead. The
supervisor, running as the daemon's parent, is instantly notified via a SIGCHLD.
When it happens, <a href="s6-supervise.html">s6-supervise</a> sends a 'd' event
to its <tt>./event</tt> <a href="fifodir.html">fifodir</a>, so every subscriber
knows that the service is down. All is well.
</p>

<p>
 It is much trickier for a process supervision suite to know when a service
that was <em>down</em> is now <em>up</em>. The supervisor forks and execs the
daemon, and knows when the exec has succeeded; but after that point, it's all
up to the daemon itself. Some daemons do a lot of initialization work before
they're actually ready to serve, and it is impossible for the supervisor to
know exactly <em>when</em> the service is really ready.
<a href="s6-supervise.html">s6-supervise</a> sends a 'u' event to its
<tt>./event</tt> <a href="fifodir.html">fifodir</a> when it successfully
spawns the daemon, but any subscriber
reacting to 'u' is subject to a race condition - the service provided by the
daemon may not be ready yet.
</p>

<p>
 Reliable startup notifications need support from the daemons themselves.
Daemons should do two things to signal the outside world that they are
ready:
</p>

<ol>
 <li> Update a state file, so other processes can get a snapshot
of the daemon's state </li>
 <li> Send an event to processes waiting for a state change. </li>
</ol>

<p>
 This is complex to implement in every single daemon, so s6 provides
tools to make it easier for daemon authors, without any need to link
against the s6 library or use any s6-specific construct:
 daemons can simply write a line to a file descriptor of their choice,
then close that file descriptor, when they're ready to serve. This is
a generic mechanism that some daemons already implement.
The administrator can
then run the daemon under <a href="s6-notifywhenup.html">s6-notifywhenup</a>,
which will properly catch the daemon's message and update a state file
itself, then notify all the subscribers
with a 'U' event, meaning that the service is now up. <br />
 Note that there is <em>still</em> a small race condition remaining:
if the daemon writes a line then instantly dies, and the supervisor
picks up the death before the <a href="s6-notifywhenup.html">s6-notifywhenup</a>
program picks up the line, it is possible for the event sequence written
to the fifodir to be wrong - 'd' before 'U'. This should be extremely
rare, but unfortunately the race condition is unavoidable. The only
way to be absolutely race-free is to have the daemon perform its
readiness notification itself, which requires specific support.
</p>

<p>
 This method should really be implemented in every long-running
program providing a service. When it is not the case, it's impossible
to provide reliable startup notifications, and subscribers should then
be content with the unreliable 'u' events provided by s6-supervise.
</p>

</body>
</html>