3 files changed, 520 insertions, 0 deletions
diff --git a/doc/design/concepts.html b/doc/design/concepts.html
new file mode 100644
index 0000000..b5f1268
--- /dev/null
+++ b/doc/design/concepts.html
@@ -0,0 +1,251 @@
+<!doctype html>
+<html lang="en">
+<head>
+    <meta charset="utf-8">
+    <meta name="viewport" content="width=device-width, initial-scale=1.0">
+    <meta name="description" content="s6-rc: service management concepts">
+    <meta name="keywords" content="s6-rc service management concepts dependencies unix administration root laurent bercot ska skarnet supervision init system boot service systemd alternative" />
+    <title>s6-rc: service management concepts</title>
+    <link rel="stylesheet" href="/css/pure/pure-min.css">
+    <!-- <link rel="stylesheet" href="https://unpkg.com/purecss@2.0.5/build/pure-min.css" integrity="sha384-G9DpmGxRIF6tpgbrkZVcZDeIomEU22LgTguPAI739bbKytjPE/kHTK5YxjJAAEXC" crossorigin="anonymous"> -->
+    <link rel="stylesheet" href="/layouts/side-menu/styles.css">
+</head>
+<body>
+
+<div id="layout">
+  <!-- Menu toggle -->
+  <a href="#menu" id="menuLink" class="menu-link">
+    <!-- Hamburger icon -->
+    <span></span>
+  </a>
+
+  <div id="menu">
+    <div class="pure-menu">
+      <a class="pure-menu-heading" style="text-transform:none;" href="/">skarnet.com</a>
+      <ul class="pure-menu-list">
+        <li class="pure-menu-item"> <a href="/" class="pure-menu-link">Home</a> </li>
+        <li class="pure-menu-item"> <a href="/projects/" class="pure-menu-link">Projects</a> </li>
+        <li class="pure-menu-item"> <a href="/contact/" class="pure-menu-link">Contact</a> </li>
+        <li class="pure-menu-item"> <a href="//skarnet.org/" class="pure-menu-link">skarnet.org</a> </li>
+      </ul>
+    </div>
+  </div>
+
+  <div id="main">
+    <div class="header">
+      <h1> s6-rc: service management concepts </h1>
+        <h2> The foundations for a solid design </h2>
+    </div>
+
+    <div class="content">
+
+      <p>
+      </p>
+
+      <h2 class="content-subhead" id="toc"> Table of contents </h2>
+
+      <ul>
+        <li> <a href="#toc">Table of contents</a> </li>
+        <li> <a href="#states">Service states, machine states</a> </li>
+        <li> <a href="#transitions">Transitions</a> </li>
+        <li> <a href="#dependencies">Dependencies</a> </li>
+        <li> <a href="#servicesets">Live set, working set</a> </li>
+      </ul>
+
+      <h2 class="content-subhead" id="states"> Service states, machine states </h2>
+
+      <p>
+        The job of a service manager is to bring the machine from one state, the
+        <em>current state</em>, to another, the <em>wanted state</em>,
+        either at boot time or at the administrator's request. The process by which
+        the machine moves from the <em>current state</em> to the <em>wanted state</em>
+        is called a <em>transition</em>.
+      </p>
+
+      <p>
+        The state of a machine is defined by the services that are running on it.
+        A service can be in one of two states: <em>up</em> or <em>down</em>. Some service
+        managers like to define other states, such as "started" or "failed", but
+        these are not real states as seen by an external user: a web browser does
+        not care whether the web server has been "started" or has "failed"; all it
+        sees is whether it is <em>up</em> or <em>down</em>.
+      </p>
+
+      <p>
+        (The previous sentence is not totally accurate. What a web browser sees is
+        whether the web server is <em>up and ready</em>: readiness is defined by the
+        ability for a service to... provide service. A service can be <em>up</em> but
+        not <em>ready</em> yet when it is in the process of initializing itself. We
+        will explore readiness in more detail later; for now, you can consider that
+        <em>up</em> means <em>up and ready</em>, unless explicitly stated otherwise.)
+      </p>
+
+      <p>
+        The machine's <em>current state</em> is a set of service states. For instance,
+        at boot time, the machine's <em>current state</em> is "all the services are
+        <em>down</em>", and the machine's <em>wanted state</em> is "a certain set of
+        services are <em>up</em>". (We name this certain set of services the
+        <em>top bundle</em>; more on that later.)
+      </p>
+
+      <h2 class="content-subhead" id="transitions"> Transitions </h2>
+
+      <p>
+        Since a machine state is a set of service states, it follows directly that
+        a machine's <em>transition</em> is a set of service <em>transitions</em> from
+        their <em>current state</em> to their <em>wanted state</em>. If the machine is
+        bringing a set of services up, it is called an <em>up transition</em> &mdash; and
+        every service in the set undergoes an <em>up transition</em>;
+        if the machine is bringing a set of services down, then it is called a <em>down
+        transition</em>, and services in the set undergo a <em>down transition</em> as well.
+        Note that every possible machine transition can be seen as a <em>down transition</em>
+        followed by an <em>up transition</em>, and being able to reason separately about sets of
+        <em>down transitions</em> and about sets of <em>up transitions</em> is a very useful
+        property that we will rely on heavily.
+      </p>
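+      <p>
+        As a rough illustration (not s6-rc code), if a machine state is modelled as the
+        set of services that are <em>up</em>, this decomposition amounts to two set
+        differences. The function name <tt>plan_transition</tt> is hypothetical, and the
+        sketch deliberately ignores dependencies and readiness:
+      </p>

```python
# Illustrative model only: a machine state is the set of services that are up.
def plan_transition(current, wanted):
    """Split a machine transition into a down part and an up part.

    Returns (services to bring down, services to bring up).
    """
    to_stop = current - wanted    # up now, but not wanted up
    to_start = wanted - current   # wanted up, but not up yet
    return to_stop, to_start


# At boot time everything is down, so the whole transition is an up transition.
boot_down, boot_up = plan_transition(set(), {"syslog", "network", "sshd"})

# A later change of wanted state mixes both kinds of transition.
later_down, later_up = plan_transition({"syslog", "network", "sshd"},
                                       {"syslog", "network", "httpd"})
```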
+
+      <p>
+        A service transition can succeed, in which case the machine's <em>current state</em>
+        changes, getting closer to the <em>wanted state</em>, or it can fail.
+        When it fails, what the service manager does depends on certain factors:
+      </p>
+
+      <ul>
+       <li>
+        If the failure can be identified as <em>permanent</em>, then attempting the transition
+        again is pointless. In that case the transition permanently fails, which means
+        the machine state transition fails too: the machine will never reach its <em>wanted
+        state</em>. That does not mean other service transitions stop; they continue, and the
+        machine state ends up as close as possible to the <em>wanted state</em>, but it will
+        not reach it, and the user is informed of the failure.
+       </li>
+
+       <li>
+        If the failure can be identified as <em>temporary</em>, then the transition can be
+        retried. The delay between two attempts, as well as the maximum number of attempts,
+        depends on what the administrator has configured for the service: it is the
+        <em>retry policy</em>. If the transition has still not succeeded after the defined
+        maximum number of attempts, then the failure becomes permanent and the user is
+        informed.
+       </li>
+      </ul>
+
+      <p>
+        The way to identify permanent and temporary failures depends on the service, and is
+        configured as part of the <em>retry policy</em>.
+      </p>
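+      <p>
+        The retry behaviour described above could be sketched as follows. This is a
+        minimal model, not the s6-rc implementation: the names <tt>RetryPolicy</tt>,
+        <tt>run_transition</tt>, <tt>TemporaryFailure</tt> and <tt>PermanentFailure</tt>
+        are hypothetical, and we assume that the service-specific classification of
+        failures is expressed by which exception the attempt raises.
+      </p>

```python
import time
from dataclasses import dataclass


class TemporaryFailure(Exception):
    """The attempt failed, but trying again may succeed (assumed classification)."""


class PermanentFailure(Exception):
    """The attempt failed and retrying is pointless (assumed classification)."""


@dataclass
class RetryPolicy:
    max_attempts: int      # maximum number of attempts before giving up
    delay_seconds: float   # delay between two attempts


def run_transition(attempt, policy, sleep=time.sleep):
    """Run one service transition under a retry policy.

    Returns True on success, False if the failure is (or has become) permanent.
    """
    for n in range(1, policy.max_attempts + 1):
        try:
            attempt()
            return True
        except PermanentFailure:
            return False
        except TemporaryFailure:
            if n < policy.max_attempts:
                sleep(policy.delay_seconds)
    # Attempts exhausted: the temporary failure becomes permanent.
    return False
```

+      <p>
+        The <tt>sleep</tt> parameter exists only so that the delay can be replaced in
+        tests; a real service manager would not block while waiting to retry.
+      </p>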
+
+      <p>
+        As a special engineering note, that is unsatisfying from a theoretical point of view
+        (because it makes our concepts asymmetrical) but <em>vital</em> where real-life services
+        are concerned, let us mention right away that <strong>down transitions should never
+        fail</strong>. Except in very specific, very rare cases, it should always be possible
+        to successfully stop a service: as far as services are concerned, <em>death is always
+        an out</em>. Allowing down transitions to fail leads to ridiculous issues like
+        <a href="https://github.com/systemd/systemd/issues/12967">systemd being unable to
+        shut down a system</a>. This should never happen: when a user wants their system off,
+        <em>they want it off</em>, and fighting against that will only cause frustration and
+        plug-pulling.
+      </p>
+
+      <h3 class="content-subsubhead"> Parallelism </h3>
+
+      <p>
+        A traditional serial service manager performs all <em>transitions</em> one after
+        another, in a sequence; this is not efficient, because if a transition spends some
+        time waiting, or even doing CPU-intensive computations on one core while other cores
+        are available, then time is wasted if other transitions could be taking place during
+        that time. A good service manager is able to perform transitions <em>in parallel</em>,
+        to make the best use of the machine's available resources.
+      </p>
+
+      <p>
+        In order to perform transitions in parallel, the service manager must know which
+        transitions are independent (so they can be performed at the same time without
+        influencing one another) and which ones can only be done in a sequence. That means
+        that the administrator must provide the service manager with a list of
+        <em>dependencies</em> between services.
+      </p>
+
+      <h2 class="content-subhead" id="dependencies"> Dependencies </h2>
+
+      <p>
+        At a very basic level, a <em>dependency</em> from service <tt>B</tt> to service <tt>A</tt>
+        means that <tt>B</tt> can only be <em>up</em> when <tt>A</tt> is <em>up</em>; and so,
+        <tt>B</tt> should only be brought up once <tt>A</tt> is already up. For instance, a web
+        server should only be brought up when the database hosting its content is itself up.
+      </p>
+
+      <p>
+        A service <tt>C</tt> that has nothing to do with <tt>A</tt> or <tt>B</tt> can be brought
+        up whenever &mdash; in particular, it can be brought up in parallel with <tt>A</tt> or
+        <tt>B</tt>, without being bound by their state in any way.
+      </p>
+
+      <p>
+        If a service <tt>D</tt> depends on <tt>B</tt>, and <tt>A</tt> depends on <tt>D</tt>, then
+        the dependencies are invalid: there is a <em>dependency cycle</em>,
+        <tt>D</tt> &rarr; <tt>B</tt> &rarr; <tt>A</tt> &rarr; <tt>D</tt>. This configuration must
+        be rejected by the service manager.
+      </p>
+
+      <p>
+        On the other hand, if <tt>D</tt> and <tt>E</tt> both depend on <tt>B</tt>, and <tt>F</tt>
+        depends on both <tt>D</tt> and <tt>E</tt>, it is not a cycle, and it is acceptable: the
+        service manager will first bring <tt>A</tt> up, then <tt>B</tt>, then <tt>D</tt> and <tt>E</tt>
+        in parallel, then <tt>F</tt> once both <tt>D</tt> and <tt>E</tt> are up.
+      </p>
+
+      <p>
+        This shows that the acceptable structure for a list of dependencies is a <em>directed
+        acyclic graph</em>, or DAG. When we talk about the list of dependencies, we should say
+        <em>the dependency DAG</em>, but that is a bit obscure, so we'll just talk about the
+        <em>dependency graph</em>.
+      </p>
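+      <p>
+        The example with services <tt>A</tt> to <tt>F</tt> can be reproduced with Python's
+        standard <tt>graphlib</tt> module (Python 3.9 or later). This is only a sketch of
+        the idea: the function name <tt>startup_batches</tt> is hypothetical, and grouping
+        services into successive batches is a simplification, since a real service manager
+        can start each service as soon as its own dependencies are up.
+      </p>

```python
from graphlib import CycleError, TopologicalSorter


def startup_batches(deps):
    """Group services into batches that can be brought up in parallel.

    `deps` maps each service to the set of services it depends on.
    Raises graphlib.CycleError if the dependencies contain a cycle.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()                      # cycle detection happens here
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # all dependencies already up
        batches.append(ready)
        sorter.done(*ready)
    return batches


# B depends on A; D and E depend on B; F depends on D and E; C is independent.
deps = {"A": set(), "B": {"A"}, "C": set(),
        "D": {"B"}, "E": {"B"}, "F": {"D", "E"}}
print(startup_batches(deps))   # [['A', 'C'], ['B'], ['D', 'E'], ['F']]

# Adding "A depends on D" creates the cycle D -> B -> A -> D.
cyclic = {"B": {"A"}, "D": {"B"}, "A": {"D"}}
try:
    startup_batches(cyclic)
    cycle_rejected = False
except CycleError:
    cycle_rejected = True
```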
+
+      <p>
+        One of the most important aspects of a service manager is validation of the <em>dependency
+        graph</em>. If the dependency graph is invalid, then the service manager cannot do its
+        job of bringing services up or down in the proper order. If this validation happens at
+        boot time, when the service manager starts, and the graph happens to be invalid, then what
+        should the service manager do?
+      </p>
+
+      <p>
+        Boot time is the <em>worst</em> possible time to detect errors, especially in low-level
+        software such as a service manager, because the machine is not fully operational yet and
+        the administrator may not have many tools to fix the problem. In particular, if the
+        network services are started by the service manager, dependency graph validation happens
+        before the network is operational, and if it fails, the machine has no network. Nobody
+        wants that.
+      </p>
+
+      <p>
+        Consequently, dependency graph validation must be done <em>before</em> boot time.
+        A service set must be checked and validated while the machine is already running and
+        functional, before it is rebooted. It must be possible to <em>guarantee bootability</em>
+        of a service set once it has been checked.
+      </p>
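+      <p>
+        An offline check of this kind might look like the sketch below, run on a working
+        machine before the service set is ever used for booting. The function name
+        <tt>validate_service_set</tt> is hypothetical, and the check for dependencies on
+        undefined services is our own assumption about what such validation would
+        reasonably include; only cycle rejection is required by the text above.
+      </p>

```python
from graphlib import CycleError, TopologicalSorter


def validate_service_set(deps):
    """Return a list of problems found in a dependency graph.

    `deps` maps each service to the set of services it depends on.
    An empty list means the graph passed both checks.
    """
    problems = []

    # Assumed extra check: every dependency must name a defined service.
    for service in sorted(deps):
        for dep in sorted(deps[service]):
            if dep not in deps:
                problems.append(f"{service}: depends on undefined service {dep}")

    # Required check: the graph must be acyclic.
    try:
        TopologicalSorter(deps).prepare()
    except CycleError as err:
        cycle = err.args[1]   # list of nodes forming the detected cycle
        problems.append("dependency cycle: " + " -> ".join(map(str, cycle)))

    return problems
```

+      <p>
+        The point of returning a list rather than stopping at the first error is that
+        the administrator gets the full report while the machine is still functional.
+      </p>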
+
+      <p>
+        This is why a service manager must have both <em>offline tools</em> and <em>online
+        tools</em>, and keep two separate sets of services: the <em>live set</em> and the
+        <em>working set</em>.
+      </p>
+
+      <h2 class="content-subhead" id="servicesets"> Live set, working set </h2>
+
+      <p>
+        (The prototype version of s6-rc uses the concept of <em>service databases</em>;
+        there is one <em>live service database</em> and all the others are, implicitly,
+        <em>working service databases</em>. We change the terminology here, at the same time
+        as we refine the concept.)
+      </p>
+
+    </div>
+  </div>
+</div>
+
+<script src="/js/ui.js"></script>
+</body>
+</html>
diff --git a/doc/design/index.html b/doc/design/index.html
new file mode 100644
index 0000000..42e8a7d
--- /dev/null
+++ b/doc/design/index.html
@@ -0,0 +1,66 @@
+<!doctype html>
+<html lang="en">
+<head>
+    <meta charset="utf-8">
+    <meta name="viewport" content="width=device-width, initial-scale=1.0">
+    <meta name="description" content="the s6 ecosystem: the s6-rc service manager">
+    <meta name="keywords" content="s6-rc service manager engine management dependencies unix administration root laurent bercot ska skarnet supervision init system boot service systemd alternative" />
+    <title>the s6 ecosystem: the s6-rc service manager</title>
+    <link rel="stylesheet" href="/css/pure/pure-min.css">
+    <!-- <link rel="stylesheet" href="https://unpkg.com/purecss@2.0.5/build/pure-min.css" integrity="sha384-G9DpmGxRIF6tpgbrkZVcZDeIomEU22LgTguPAI739bbKytjPE/kHTK5YxjJAAEXC" crossorigin="anonymous"> -->
+    <link rel="stylesheet" href="/layouts/side-menu/styles.css">
+</head>
+<body>
+
+<div id="layout">
+  <!-- Menu toggle -->
+  <a href="#menu" id="menuLink" class="menu-link">
+    <!-- Hamburger icon -->
+    <span></span>
+  </a>
+
+  <div id="menu">
+    <div class="pure-menu">
+      <a class="pure-menu-heading" style="text-transform:none;" href="/">skarnet.com</a>
+      <ul class="pure-menu-list">
+        <li class="pure-menu-item"> <a href="/" class="pure-menu-link">Home</a> </li>
+        <li class="pure-menu-item"> <a href="/projects/" class="pure-menu-link">Projects</a> </li>
+        <li class="pure-menu-item"> <a href="/contact/" class="pure-menu-link">Contact</a> </li>
+        <li class="pure-menu-item"> <a href="//skarnet.org/" class="pure-menu-link">skarnet.org</a> </li>
+      </ul>
+    </div>
+  </div>
+
+  <div id="main">
+    <div class="header">
+      <h1> s6-rc </h1>
+        <h2> A powerful and reliable service management engine </h2>
+    </div>
+
+    <div class="content">
+
+      <p>
+      </p>
+
+      <h2 class="content-subhead" id="toc"> Table of contents </h2>
+
+      <ul>
+        <li> <a href="#toc">Table of contents</a> </li>
+        <li> <a href="#concepts">Service management concepts</a> </li>
+        <li> <a href="#programs">s6-rc programs and their roles</a> </li>
+      </ul>
+
+      <h2 class="content-subhead" id="concepts"> Service management concepts </h2>
+
+      <p>
+        <a href="concepts.html">This page</a> explains a few essential concepts
+        underlying the design of the s6-rc service manager.
+      </p>
+
+    </div>
+  </div>
+</div>
+
+<script src="/js/ui.js"></script>
+</body>
+</html>
diff --git a/doc/design/services.html b/doc/design/services.html
new file mode 100644
index 0000000..b6275af
--- /dev/null
+++ b/doc/design/services.html
@@ -0,0 +1,203 @@
+<!doctype html>
+<html lang="en">
+<head>
+    <meta charset="utf-8">
+    <meta name="viewport" content="width=device-width, initial-scale=1.0">
+    <meta name="description" content="s6-rc: services">
+    <meta name="keywords" content="s6-rc service management services dependencies unix administration root laurent bercot ska skarnet supervision init system boot systemd alternative" />
+    <title>s6-rc: services</title>
+    <link rel="stylesheet" href="/css/pure/pure-min.css">
+    <!-- <link rel="stylesheet" href="https://unpkg.com/purecss@2.0.5/build/pure-min.css" integrity="sha384-G9DpmGxRIF6tpgbrkZVcZDeIomEU22LgTguPAI739bbKytjPE/kHTK5YxjJAAEXC" crossorigin="anonymous"> -->
+    <link rel="stylesheet" href="/layouts/side-menu/styles.css">
+</head>
+<body>
+
+<div id="layout">
+  <!-- Menu toggle -->
+  <a href="#menu" id="menuLink" class="menu-link">
+    <!-- Hamburger icon -->
+    <span></span>
+  </a>
+
+  <div id="menu">
+    <div class="pure-menu">
+      <a class="pure-menu-heading" style="text-transform:none;" href="/">skarnet.com</a>
+      <ul class="pure-menu-list">
+        <li class="pure-menu-item"> <a href="/" class="pure-menu-link">Home</a> </li>
+        <li class="pure-menu-item"> <a href="/projects/" class="pure-menu-link">Projects</a> </li>
+        <li class="pure-menu-item"> <a href="/contact/" class="pure-menu-link">Contact</a> </li>
+        <li class="pure-menu-item"> <a href="//skarnet.org/" class="pure-menu-link">skarnet.org</a> </li>
+      </ul>
+    </div>
+  </div>
+
+  <div id="main">
+    <div class="header">
+      <h1> s6-rc: services </h1>
+        <h2> The basic building block </h2>
+    </div>
+
+    <div class="content">
+
+      <p>
+      </p>
+
+      <h2 class="content-subhead" id="toc"> Table of contents </h2>
+
+      <ul>
+        <li> <a href="#toc">Table of contents</a> </li>
+        <li> <a href="#stypes">Service types</a> </li>
+        <li> <a href="#instances">Dynamic instantiation</a> </li>
+      </ul>
+
+      <h2 class="content-subhead" id="stypes"> Service types </h2>
+
+      <p>
+        In the most general sense, a <em>service</em> is a basic unit that can undergo a
+        transition; but not all services can be handled the same way. Services
+        are divided into several categories, which we call <em>types</em>; they
+        are the following.
+      </p>
+
+      <ol>
+       <li> <strong>Longrun</strong>.
+
+        <p>
+          A <em>longrun</em> is the "traditional" definition of a service,
+          implemented by a <em>long-lived process</em>, a.k.a. a daemon. As a first
+          approximation, it means that when the daemon is alive, the service is up,
+          and when the daemon is not present, the service is down. Longruns are the
+          most common type of service, and the main reason why it's a good thing for
+          a service manager to work in tandem with a process supervisor: the details
+          of keeping the daemon alive, monitoring its readiness, etc. are delegated
+          to the process supervisor, which abstracts some complexity away from the
+          service manager.
+        </p>
+       </li>
+
+       <li> <strong>Oneshot</strong>.
+
+        <p>
+          A <em>oneshot</em> is a service that represents a state change in the
+          machine, but that does not need a daemon because the state is maintained by
+          the kernel. For instance, "mounting a filesystem" and "setting a sysctl" are
+          oneshots: the service is considered <em>up</em> when the filesystem is mounted
+          or the sysctl has been set, and <em>down</em> when the filesystem is
+          unmounted or the sysctl has its default value. Note that it's generally
+          meaningless to revert a sysctl (and in most cases it's also a bad idea to try
+          and unmount filesystems before the very end of a shutdown procedure), so it is
+          quite common for the <em>down transition</em> of a oneshot to be a no-op: after
+          the first time the service has been brought up, the state basically never
+          changes.
+        </p>
+
+        <p>
+          <em>Longruns</em> and <em>oneshots</em> are collectively called <em>atomic
+          services</em>. They are the core service types, the ones that actually do the
+          work. Other service types are just convenience tools around them.
+        </p>
+       </li>
+
+       <li> <strong>External</strong>.
+
+        <p>
+          An <em>external</em> is a service that is not handled by the
+          service manager itself, but by a system that is external to it. It is a way for
+          the service manager to delegate complex subsystems to other programs such as a
+          network manager. The service manager does not know how to perform transitions
+          for an external; it knows nothing about it but its name.
+        </p>
+
+        <p>
+          It is impossible to set the <em>wanted state</em> of an <em>external</em>: such
+          a service has to be triggered entirely outside of the service manager. All the
+          service manager does is receive events that inform it of the external's <em>current
+          state</em>.
+        </p>
+
+        <p>
+          Consequently, an <em>external</em> does not have any dependencies. It is, however,
+          possible for a service to depend on an external &mdash; that is their intended use,
+          gating the transition of another service on the reception of an external event.
+        </p>
+       </li>
+
+       <li> <strong>Bundle</strong>.
+
+        <p>
+          A <em>bundle</em> is a pseudo-service representing a set of services: it is used
+          to implement service conjunction (<tt>AND</tt>). When a
+          bundle is <em>wanted up</em>, it means that <em>all</em> the services it
+          contains are <em>wanted up</em>. A bundle's <em>current state</em> is <em>up</em>
+          if <em>all</em> the services it contains are up, and it is <em>down</em> otherwise.
+        </p>
+        <p>
+          However, when a bundle is <em>wanted down</em>, it also means that <em>all</em>
+          (and not just one!) of the services it contains are <em>wanted down</em>, so take
+          care when explicitly bringing down bundles.
+        </p>
+       </li>
+
+       <li> <strong>Virtual</strong>.
+
+        <p>
+          A <em>virtual</em> is a pseudo-service representing a set of services, but used for
+          disjunction (<tt>OR</tt>): rather than meaning "all the services in the set", it means
+          "one of the services in the set". A virtual's <em>current state</em> is <em>up</em>
+          if at least one of the services it represents is <em>up</em>, and <em>down</em>
+          otherwise.
+        </p>
+       </li>
+      </ol>
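+      <p>
+        The difference between the <em>current state</em> of a bundle and that of a
+        virtual can be stated in two lines. The sketch below is illustrative only: the
+        function names and the example service names are hypothetical, and the set
+        <tt>up</tt> stands for the atomic services that are currently up.
+      </p>

```python
def bundle_is_up(contents, up):
    """A bundle (AND) is up only if every service it contains is up."""
    return all(service in up for service in contents)


def virtual_is_up(alternatives, up):
    """A virtual (OR) is up if at least one service it represents is up."""
    return any(service in up for service in alternatives)


# Hypothetical machine where only sshd and postfix are currently up.
up = {"sshd", "postfix"}

remote_access = {"sshd", "httpd"}   # hypothetical bundle contents
mail_agent = {"postfix", "exim"}    # hypothetical virtual alternatives

bundle_state = bundle_is_up(remote_access, up)    # httpd is down
virtual_state = virtual_is_up(mail_agent, up)     # postfix is up
```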
+
+      <h2 class="content-subhead" id="instances"> Dynamic instantiation </h2>
+
+
+
+
+      <ul>
+       <li> <strong>Dynamically instantiated longrun</strong>.
+
+        <p>
+          A <em>dynamically instantiated longrun</em>, or <em>DIL</em>, is a template for
+          an indeterminate number of <em>longruns</em> that all follow the same model,
+          and that differ by one parameter, the <em>instance parameter</em>. They are used
+          to implement sets of similar services that the user will want to start on
+          demand: for instance, a set of gettys. A <em>DIL</em> is identified by a
+          <tt>@</tt> at the end of the service name; anything that follows the <tt>@</tt>
+          is the <em>instance parameter</em>. For instance, <tt>getty@</tt> can be the name
+          of the <em>DIL</em> spawning the gettys, and <tt>getty@tty2</tt> can be a
+          dynamic instance of <tt>getty@</tt> with <tt>tty2</tt> as the <em>instance
+          parameter</em>.
+        </p>
+
+        <p>
+          (It is possible to define a regular, <em>static</em> (as opposed to dynamically
+          instantiated), <tt>getty@tty1</tt> service even if
+          the <tt>getty@</tt> DIL exists: in that case, <tt>getty@tty1</tt> will always
+          refer to the static service and it will be impossible to spawn a <tt>getty@</tt>
+          instance with <tt>tty1</tt> as an instance parameter. This can be a good way to
+          ensure that specific "instances" are special-cased.)
+        </p>
+
+        <p>
+          However, DILs have a strong limitation: only dynamically instantiated services
+          can depend on them, and only <em>with the same instance parameter</em>. In other
+          words: <tt>B</tt> cannot depend on <tt>A@</tt>, only <tt>B@</tt> can depend on
+          <tt>A@</tt>, and that means that for any <tt>x</tt>, <tt>B@x</tt> depends on
+          <tt>A@x</tt>.
+        </p>
+       </li>
+      </ul>
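+      <p>
+        The naming and dependency rules above can be sketched as follows. This is an
+        illustration under assumptions, not s6-rc code: the function names
+        <tt>resolve</tt> and <tt>instance_dependencies</tt>, the returned tuples and the
+        way service sets are passed in are all hypothetical.
+      </p>

```python
def resolve(name, static_services, dils):
    """Decide what a service name refers to.

    `static_services` is the set of statically defined service names;
    `dils` is the set of DIL names, each ending with "@".
    A static service always wins over a DIL instance of the same name.
    """
    if name in static_services:
        return ("static", name, None)
    template, sep, param = name.partition("@")
    if sep and param and template + "@" in dils:
        return ("instance", template + "@", param)
    raise KeyError(name)


def instance_dependencies(name, dil_deps):
    """Dependencies of a dynamic instance such as "B@x".

    `dil_deps` maps a DIL name to the set of DIL names it depends on.
    Each dependency is instantiated with the same instance parameter.
    """
    template, _, param = name.partition("@")
    return {dep + param for dep in dil_deps.get(template + "@", set())}


static_services = {"getty@tty1"}   # special-cased static service
dils = {"getty@"}

dynamic = resolve("getty@tty2", static_services, dils)
special = resolve("getty@tty1", static_services, dils)
needs = instance_dependencies("B@x", {"B@": {"A@"}})
```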
+
+    </div>
+  </div>
+</div>
+
+<script src="/js/ui.js"></script>
+</body>
+</html>