doc/s6-supervise.html


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172

<html>
  <head>
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
    <meta http-equiv="Content-Language" content="en" />
    <title>s6: the s6-supervise program</title>
    <meta name="Description" content="s6: the s6-supervise program" />
    <meta name="Keywords" content="s6 command s6-supervise servicedir supervision supervise" />
    <!-- <link rel="stylesheet" type="text/css" href="http://skarnet.org/default.css" /> -->
  </head>
<body>

<p>
<a href="index.html">s6</a><br />
<a href="http://skarnet.org/software/">Software</a><br />
<a href="http://skarnet.org/">skarnet.org</a>
</p>

<h1> The s6-supervise program </h1>

<p>
s6-supervise monitors a long-lived process (or <em>service</em>), making sure it
stays alive, sending notifications to registered processes when it dies, and
providing an interface to control its state. s6-supervise is designed to be the
last non-leaf branch of a <em>supervision tree</em>, the supervised process
being a leaf.
</p>

<h2> Interface </h2>

<pre>
     s6-supervise <em>servicedir</em>
</pre>

<ul>
 <li> s6-supervise switches to the <em>servicedir</em>
<a href="servicedir.html">service directory</a>. </li>
 <li> It exits 100 if another s6-supervise process is already monitoring this service. </li>
 <li> If the <tt>./event</tt> <a href="fifodir.html">fifodir</a> does not exist,
s6-supervise creates it and allows subscriptions to it from processes having the same
effective group id as the s6-supervise process.
If it already exists, it uses it as is, without modifying the subscription rights. </li>
 <li> It <a href="libs6/ftrigw.html">sends</a> a <tt>'s'</tt> event to <tt>./event</tt>. </li>
 <li> If the default service state is up, s6-supervise spawns <tt>./run</tt>. </li>
 <li> s6-supervise sends a <tt>'u'</tt> event to <tt>./event</tt> whenever it
successfully spawns <tt>./run</tt>. </li>
 <li> When <tt>./run</tt> dies, s6-supervise sends a <tt>'d'</tt> event to <tt>./event</tt>.
It then spawns <tt>./finish</tt> if it exists.
<tt>./finish</tt> will have <tt>./run</tt>'s exit code as first argument, or 256 if
<tt>./run</tt> was signaled; it will have the number of the signal that killed <tt>./run</tt>
as second argument, or an undefined number if <tt>./run</tt> was not signaled. </li>
 <li> By default, <tt>./finish</tt> must exit in less than 5 seconds. If it takes more than that,
s6-supervise kills it with a SIGKILL. This can be configured via the
<tt>./timeout-finish</tt> file, see the description in the
<a href="servicedir.html">service directory page</a>. </li>
 <li> When <tt>./finish</tt> dies (or is killed),
s6-supervise sends a <tt>'D'</tt> event to <tt>./event</tt>. Then
it restarts <tt>./run</tt> unless it has been told not to. </li>
 <li> If <tt>./finish</tt> exits 125, then s6-supervise sends a <tt>'O'</tt> event
to <tt>./event</tt> <em>before</em> the <tt>'D'</tt>, and it
<strong>does not restart the service</strong>, as if <tt>s6-svc -O</tt> had
been called. This can be used to signify permanent failure to start the service. </li>
 <li> There is a minimum 1-second delay between two <tt>./run</tt> spawns, to avoid busylooping
if <tt>./run</tt> exits too quickly. </li>
 <li> When killed or asked to exit, it waits for the service to go down one last time, then
sends a <tt>'x'</tt> event to <tt>./event</tt> before exiting 0. </li>
</ul>

<h2> Options </h2>

<p>
 s6-supervise does not support options, because it is normally not run
manually via a command line; it is usually launched by its own
supervisor, <a href="s6-svscan.html">s6-svscan</a>.
 However, the behaviour of an instance of s6-supervise can be tuned via
various configuration files in the service directory. These files, and
what they do, are listed on the
<a href="servicedir.html">service directory documentation page</a>.
</p>

<h2> Readiness notification support </h2>

<p>
 If the <a href="servicedir.html">service directory</a> contains a valid
<tt>notification-fd</tt> file when the service is started, or restarted,
s6-supervise creates and listens to an additional pipe from the service
for <a href="notifywhenup.html">readiness notification</a>. When the
notification occurs, s6-supervise updates the <tt>./supervise/status</tt>
file accordingly, then sends
a <tt>'U'</tt> event to <tt>./event</tt>.
</p>

<p>
 If the service is logged, i.e. if the service directory has a
<tt>log</tt> subdirectory that is also a service directory, and the
s6-supervise process has been launched by
that is also <a href="s6-svscan.html">s6-svscan</a>, then by default
the service's stdout goes into the logging pipe. If you set
<tt>notification-fd</tt> to 1, the logging pipe will be overwritten
by the notification pipe, which is probably not what you want. Instead,
if your daemon writes a notification message to its stdout, you should
set <tt>notification-fd</tt> to (for instance) 3, and redirect outputs
in your run script. For instance, to redirect stderr to the logger and
stdout to a <tt>notification-fd</tt> set to 3, you would start your
daemon as <tt>fdmove -c 2 1 fdmove 1 3 prog...</tt> (in execline), or
<tt>exec 2&gt;&amp;1 1&gt;&amp;3 3&lt;&amp;- prog...</tt> (in shell).
</p>

<h2> Signals </h2>

<p>
 s6-supervise reacts to the following signals:
</p>

<ul>
 <li> SIGTERM: bring down the service and exit, as if a
<a href="s6-svc.html">s6-svc -xd</a> command had been received </li>
 <li> SIGHUP: exit as soon as the service stops, as if a
<a href="s6-svc.html">s6-svc -x</a> command had been received </li>
 <li> SIGQUIT: close stdin, stdout and stderr and exit as soon as
the service stops, as if a
<a href="s6-svc.html">s6-svc -X</a> command had been received </li>
</ul>

<h2> Usage notes </h2>

<ul>
 <li> s6-supervise is a long-lived process. It normally runs forever, from the system's
boot scripts, until shutdown time; it should not be killed or told to exit. If you have
no use for a service, just turn it off; the s6-supervise process does not hurt. </li>
 <li> Even in boot scripts, s6-supervise should normally not be run directly. It's
better to have a collection of <a href="servicedir.html">service directories</a> in a
single <a href="scandir.html">scan directory</a>, and just run
<a href="s6-svscan.html">s6-svscan</a> on that scan directory. s6-svscan will spawn
the necessary s6-supervise processes, and will also take care of logged services. </li>
 <li> You can use <a href="s6-svc.html">s6-svc</a> to send commands to the s6-supervise
process; mostly to change the service state and send signals to the monitored
process. </li>
 <li> You can use <a href="s6-svok.html">s6-svok</a> to check whether s6-supervise
is successfully running. </li>
 <li> You can use <a href="s6-svstat.html">s6-svstat</a> to check the status of a
service. </li>
 <li> s6-supervise maintains internal information inside the <tt>./supervise</tt>
subdirectory of <em>servicedir</em>. <em>servicedir</em> itself can be read-only,
but both <em>servicedir</em><tt>/supervise</tt> and <em>servicedir</em><tt>/event</tt>
need to be read-write. </li>
</ul>

<h2> Implementation notes </h2>

<ul>
 <li> s6-supervise tries its best to stay alive and running despite possible
system call failures. It will write to its standard error everytime it encounters a
problem. However, unlike <a href="s6-svscan.html">s6-svscan</a>, it will not go out
of its way to stay alive; if it encounters an unsolvable situation, it will just
die. </li>
 <li> Unlike other "supervise" implementations, s6-supervise is a fully asynchronous
state machine. That means that it can read and process commands at any time, even
when the machine is in trouble (full process table, for instance). </li>
 <li> s6-supervise <em>does not use malloc()</em>. That means it will <em>never leak
memory</em>. <small>However, s6-supervise uses opendir(), and most opendir()
implementations internally use heap memory - so unfortunately, it's impossible to
guarantee that s6-supervise does not use heap memory at all.</small> </li>
 <li> s6-supervise has been carefully designed so every instance maintains as little
data as possible, so it uses a very small
amount of non-sharable memory. It is not a problem to have several
dozens of s6-supervise processes, even on constrained systems: resource consumption
will be negligible. </li>
</ul>

</body>
</html>