The s6-permafailon program
s6-permafailon is a program meant to be used
in the ./finish script of a
service directory supervised by
s6-supervise. It
reads and analyses the death tally of the service (i.e. the record of
the recent deaths of the service's process), and if the death tally
matches a given pattern, it causes permanent failure
of the service, i.e. it tells the supervisor not to
restart it.
Interface
s6-permafailon secs deathcount events prog...
- s6-permafailon must have the service directory of the
tested service as its current directory. This is the default if it is
called from the finish script of the service.
- It reads the death tally of the service, which is
maintained by s6-supervise.
- If the supervised process has died at least deathcount
times in the last secs seconds with a cause listed in
events, then s6-permafailon exits 125.
- Else s6-permafailon execs into prog....
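The decision described above can be sketched in Python as follows. This is a minimal model under stated assumptions, not s6's code: the `Death` record and the `should_permafail` and `permafailon` functions are hypothetical, the on-disk death tally format maintained by s6-supervise is not modelled, and counting a death exactly secs seconds old as still inside the window is an assumption.

```python
import os
import sys
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass(frozen=True)
class Death:
    """Hypothetical death-tally entry (NOT s6-supervise's on-disk format)."""
    when: float                   # time of death, in seconds
    code: Optional[int] = None    # exit code, if the process exited
    signal: Optional[int] = None  # signal number, if it was killed


def should_permafail(tally: Iterable[Death], now: float, secs: float,
                     deathcount: int,
                     matches: Callable[[Death], bool]) -> bool:
    """True if at least `deathcount` deaths within the last `secs` seconds
    have a cause accepted by `matches`.

    Assumption: a death exactly `secs` seconds old still counts.
    """
    hits = sum(1 for d in tally if now - d.when <= secs and matches(d))
    return hits >= deathcount


def permafailon(tally: Iterable[Death], now: float, secs: float,
                deathcount: int, matches: Callable[[Death], bool],
                prog: List[str]) -> None:
    """Hypothetical driver: exit 125 on a match, else exec into prog."""
    if should_permafail(tally, now, secs, deathcount, matches):
        sys.exit(125)         # 125: permanent failure, if we are the finish script
    os.execvp(prog[0], prog)  # otherwise, chainload into the rest of the line
```

Keeping the test (`should_permafail`) separate from the process handling (`permafailon`) makes the counting rule easy to check without exiting or exec'ing.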
events is a comma-separated list of events. An event can be
one of the following:
- An exit code, which is an integer between 0 and 255. Example: 1
- An exit code interval, which is two exit codes separated by a dash. Example: 1-50
- A signal name, or a signal number preceded by "SIG". Examples: SIGTERM, sigabrt, sig11
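A parser for this events syntax might look like the sketch below. It assumes, from the examples, that signal names are case-insensitive and include the SIG prefix; it looks names up in Python's `signal` module, so the accepted names depend on the platform; and it rejects reversed intervals such as 5-3. The real program's behaviour on such edge cases may differ. `parse_events` and `cause_matches` are hypothetical names.

```python
import signal
from typing import FrozenSet, Optional, Tuple


def _exit_code(text: str) -> int:
    """Parse one exit code, which must be an integer from 0 to 255."""
    if not (text.isascii() and text.isdigit()) or int(text) > 255:
        raise ValueError(f"not an exit code: {text!r}")
    return int(text)


def parse_events(spec: str) -> Tuple[FrozenSet[int], FrozenSet[int]]:
    """Split an events list into (exit codes, signal numbers)."""
    codes, sigs = set(), set()
    for tok in spec.split(","):
        if tok.lower().startswith("sig"):
            rest = tok[3:]
            if rest.isascii() and rest.isdigit():
                sigs.add(int(rest))                 # e.g. sig11
            else:
                try:                                # e.g. SIGTERM, sigabrt
                    sigs.add(int(signal.Signals["SIG" + rest.upper()]))
                except KeyError:
                    raise ValueError(f"unknown signal: {tok!r}") from None
        elif "-" in tok:                            # e.g. 1-50
            lo_text, hi_text = tok.split("-", 1)
            lo, hi = _exit_code(lo_text), _exit_code(hi_text)
            if lo > hi:
                raise ValueError(f"reversed interval: {tok!r}")
            codes.update(range(lo, hi + 1))
        else:                                       # e.g. 1
            codes.add(_exit_code(tok))
    return frozenset(codes), frozenset(sigs)


def cause_matches(codes, sigs, code: Optional[int], sig: Optional[int]) -> bool:
    """A death by signal matches on `sigs`; a normal exit matches on `codes`."""
    if sig is not None:
        return sig in sigs
    return code in codes
```

Expanding intervals into a set of exit codes keeps matching a simple membership test; with at most 256 exit codes, the set stays small.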
Usage
- s6-supervise detects when the ./finish
script of its service exits 125, and stops respawning the service. So, if the
./finish script is a chain-loading command line starting with (or containing)
an s6-permafailon invocation, then when
s6-permafailon exits 125, the ./finish script also
exits 125 (because it is the same process), and the service is marked as
failing permanently.
- The ./finish script is naturally a chain-loading
command line if it is written in the
execline language. It
can also be made into a chain-loading command line from a shell script by using
exec s6-permafailon secs deathcount events rest-of-chainloading-cmdline...
- Multiple invocations of s6-permafailon can be chained, in order
to test several death patterns.
- If a permanent failure is triggered and secs is high, it is
possible that when the administrator manually launches the service again,
the next death triggers a permanent failure again. If this is not wanted,
the administrator should clear the death tally with the
s6-svdt-clear command.
- The current death tally can be viewed via the s6-svdt
command.
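Why exec matters in a shell ./finish script can be checked without any s6 binaries. In the sketch below, a plain `sh -c 'exit 125'` stands in for an s6-permafailon that found a matching death pattern, and the Python parent stands in for s6-supervise reading the finish script's exit code. It assumes a POSIX sh on the PATH; `finish_status` is a hypothetical helper.

```python
import subprocess


def finish_status(script: str) -> int:
    """Run `script` with sh, as a shell ./finish script would be run, and
    return the exit code that its parent (standing in for s6-supervise) sees."""
    return subprocess.run(["sh", "-c", script]).returncode


# `sh -c 'exit 125'` is a stand-in for s6-permafailon deciding on
# permanent failure; no s6 binaries are involved.

# With exec, the stand-in replaces the shell: same process, same exit code.
with_exec = finish_status("exec sh -c 'exit 125'")

# Without exec, the shell outlives the stand-in and exits with the status
# of its own last command (`true`), so the 125 is lost.
without_exec = finish_status("sh -c 'exit 125'; true")
```

With exec, the stand-in's 125 becomes the script's exit code. Without exec, the shell keeps running after the stand-in exits, and the script's exit code is that of its last command, so the permanent-failure signal never reaches the supervisor.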
Example
s6-permafailon 60 5 1,101-103,SIGSEGV,SIGBUS prog...
will exit 125 if the service has died at least 5 times in the last 60 seconds with
an exit code of 1, 101, 102 or 103, or from a SIGSEGV or a SIGBUS. Else it will
chainload into the prog... command line.
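The example can be worked through with a hypothetical tally. The sketch below stores each death as a (time, exit code, signal) tuple, which is an illustrative representation rather than s6's death tally format, and hard-codes the example's events list and thresholds.

```python
import signal

# Events from the example: exit codes 1 and 101-103, SIGSEGV, SIGBUS.
CODES = {1, 101, 102, 103}
SIGNALS = {int(signal.SIGSEGV), int(signal.SIGBUS)}
SECS, DEATHCOUNT = 60, 5


def count_matching(deaths, now):
    """Count deaths within the last SECS seconds whose cause is listed.

    Each death is a hypothetical (time, exit_code, signal) tuple in which
    exactly one of exit_code and signal is None.
    """
    n = 0
    for when, code, sig in deaths:
        if now - when > SECS:
            continue  # too old: outside the window
        listed = (sig in SIGNALS) if sig is not None else (code in CODES)
        if listed:
            n += 1
    return n


now = 1000.0
deaths = [
    (930.0, 1, None),                    # 70 s ago: outside the window
    (950.0, 1, None),                    # matches (exit code 1)
    (960.0, 102, None),                  # matches (in 101-103)
    (970.0, None, int(signal.SIGSEGV)),  # matches
    (980.0, None, int(signal.SIGTERM)),  # SIGTERM is not listed
    (990.0, 2, None),                    # exit code 2 is not listed
    (995.0, None, int(signal.SIGBUS)),   # matches
]
# 4 matching deaths < DEATHCOUNT, so s6-permafailon would chainload.
permafail = count_matching(deaths, now) >= DEATHCOUNT
```

Only four deaths count here: the one 70 seconds ago is too old, and the SIGTERM and exit code 2 deaths are not listed. One more matching death inside the window, for example an exit code 103 at time 999, brings the count to 5 and would make s6-permafailon exit 125.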