The tipideed program
tipideed is the binary that actually does what you want from
a web server package: it serves files over HTTP.
Interface
tipideed [ -f cdbfile ] [ -d basedir ] [ -R ] [ -U ]
- tipideed reads a stream of HTTP (1.0 or 1.1) requests on its stdin, and tries
to fulfill them, sending answers to stdout and logs to stderr.
- tipideed itself only speaks plaintext HTTP. HTTPS is supported, but the TLS
layer must be handled upstream by a program such as s6-tlsd.
- tipideed stays alive until the client closes the connection, or times out,
or (in HTTP 1.1) sends a request with a Connection: close header; or an error
occurs that makes it nonsensical to keep the connection open.
- The documents it serves must be in subdirectories of its
current working directory, one subdirectory for every domain it hosts.
Common usage
tipideed is intended to be run under a TCP super-server such as s6-tcpserver,
for plaintext HTTP, or s6-tlsserver, for HTTPS. It delegates to the
super-server the job of binding and listening to the socket, accepting
connections, spawning a separate process to handle a given connection, and
potentially establishing a TLS tunnel with the client for secure communication.
As such, a command line for tipideed, running as user www, listening
on address ${ip}, would typically look like this, for HTTP:
s6-envuidgid www s6-tcpserver -U -- ${ip} 80 s6-tcpserver-access -- tipideed
or, for HTTPS:
s6-envuidgid www env KEYFILE=/path/to/private/key CERTFILE=/path/to/certificate s6-tlsserver -U -- ${ip} 443 tipideed
Most users will want to run these command lines as services, i.e. daemons
run in the background when the machine starts. The examples/ subdirectory
of the tipidee package provides service templates to help you run tipideed under
OpenRC, s6 and s6-rc.
Exit codes
- 0
- Clean exit. There was a successful series of HTTP exchanges
that either tipideed or the client decided to end in a way that is
permitted by HTTP.
- 1
- Illicit client behaviour. tipideed exited because it could
not serve the client in good faith.
- 2
- Illicit CGI script behaviour. tipideed exited because the invoked
CGI script made it impossible to continue. Before exiting, tipideed likely has
sent a 502 (Bad Gateway) response to the client.
- 100
- Bad usage. tipideed was run in an incorrect way: bad command
line options, or missing environment variables, etc.
- 101
- Cannot happen. This signals a bug in tipideed, and comes with an
error message asking you to report the bug. Please do so, on the
skaware mailing-list.
- 102
- Misconfiguration. tipideed found something in its configuration
data or in the document layout that it does not like. This can happen, for
instance, when a document is a symbolic link pointing outside of the server's
root.
- 111
- System call failed. This usually signals an issue with the
underlying operating system. Before exiting, if in the middle of processing
a request, tipideed likely has sent a 500 (Internal Server Error) response to
the client.
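As an illustration, a supervisor or wrapper script could translate these codes into the categories listed above. The following is a minimal sketch; the helper names describe_tipideed_exit and tipideed_exit_needs_attention are hypothetical and not part of tipidee, and which codes deserve an administrator's attention is a judgement call made for this example only.

```shell
# Map a tipideed exit code to the category documented above.
# describe_tipideed_exit is a hypothetical helper name, not part of tipidee.
describe_tipideed_exit() {
  case "$1" in
    0)   echo "clean exit" ;;
    1)   echo "illicit client behaviour" ;;
    2)   echo "illicit CGI script behaviour" ;;
    100) echo "bad usage" ;;
    101) echo "cannot happen (bug in tipideed, please report)" ;;
    102) echo "misconfiguration" ;;
    111) echo "system call failed" ;;
    *)   echo "undocumented exit code" ;;
  esac
}

# Assumption of this sketch: codes that point at the server side
# (CGI scripts, usage, bugs, configuration, system) are worth a look,
# while 0 and 1 are part of normal life on the Web.
tipideed_exit_needs_attention() {
  case "$1" in
    2|100|101|102|111) return 0 ;;
    *) return 1 ;;
  esac
}
```

For instance, describe_tipideed_exit 102 prints "misconfiguration", and tipideed_exit_needs_attention 1 returns false.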
Environment variables
Reading - mandatory
tipideed expects the following variables in its environment, and will exit
with an error message if they are undefined. When tipideed is run under
s6-tcpserver or s6-tlsserver, these variables are automatically set by the
super-server. This is the way tipidee gets its network information without
having to perform network operations itself.
- PROTO
- The network protocol, normally TCP.
- TCPLOCALIP
- The IP address the server is bound to. It will be passed as SERVER_ADDR
to CGI scripts.
- TCPLOCALPORT
- The port the server is bound to. It will be passed as SERVER_PORT
to CGI scripts unless the requested URI explicitly mentions a different port.
- TCPREMOTEIP
- The IP address of the client. It will be passed as REMOTE_ADDR
to CGI scripts.
- TCPREMOTEPORT
- The port of the client socket. It will be passed as REMOTE_PORT
to CGI scripts.
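When testing tipideed by hand, outside of a super-server, these variables have to be provided manually. The sketch below checks for them before launching anything; missing_tipideed_vars is a hypothetical helper, not part of tipidee, and it treats an empty value as missing, which is an assumption of this example (the text above only mentions undefined variables).

```shell
# Print the name of every mandatory variable that is unset or empty.
# missing_tipideed_vars is a hypothetical helper, not part of tipidee.
# The body runs in a subshell so that its scratch variables do not leak.
missing_tipideed_vars() (
  for name in PROTO TCPLOCALIP TCPLOCALPORT TCPREMOTEIP TCPREMOTEPORT; do
    eval "value=\${$name-}"          # indirect lookup of the variable named $name
    [ -n "$value" ] || echo "$name"
  done
)

# Possible by-hand setup (addresses and ports are made up):
#   export PROTO=TCP TCPLOCALIP=127.0.0.1 TCPLOCALPORT=80 \
#          TCPREMOTEIP=127.0.0.1 TCPREMOTEPORT=54321
#   missing_tipideed_vars    # prints nothing when everything is set
```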
Reading - optional
tipideed can function without these variables, but if they're present, it
uses them to get more information. They're typically obtained by calling
s6-tcpserver-access before tipideed in the s6-tcpserver command line.
(For HTTPS, s6-tlsserver calls it implicitly.)
- TCPLOCALHOST
- The default domain name associated to the local IP address. It will be
passed as SERVER_NAME to CGI scripts when the requested URI does
not mention a Host, i.e. in HTTP/1.0 requests without a full request URL.
If this variable is absent, the default will be set to the local IP address
itself (between square brackets if IPv6).
- TCPREMOTEHOST
- The domain name associated to the IP address of the client. It will
be passed as REMOTE_HOST to CGI scripts; if absent, the value of
TCPREMOTEIP will be used instead.
- TCPREMOTEINFO
- The name provided by an IDENT server running on the client, if any.
This is obsolete and not expected to be present; but if present, it will
be passed as REMOTE_IDENT to CGI scripts.
- SSL_PROTOCOL
- The version of the TLS protocol used to encrypt communications between
the client and the server. If present, tipideed will assume that the client
connection is secure, and will pass HTTPS=on to CGI scripts;
otherwise, it will assume it is running plaintext HTTP.
Writing
When spawning a CGI or NPH script, tipideed clears all the variables described
above, so the passed environment is as close as possible to the environment of
the super-server; and it adds all the variables that are required by the
CGI 1.1 specification. As an exception, it does not add PATH_TRANSLATED, which
cannot be used by CGI scripts in a portable way.
Options
- -f file
- Use file as the compiled configuration database, typically obtained
by running tipidee-config -o file.
The default is /etc/tipidee.conf.cdb; /etc may be something
else if the --sysconfdir option has been given to configure at
build time.
- -d docroot
- Change the working directory to docroot before serving. Default
is serving from the current working directory. Note that documents need to
be located in subdirectories of docroot, one subdirectory
per virtual domain tipideed is serving.
- -R
- chroot. If the underlying operating system has the chroot() system call,
use it before serving. This always happens after opening
the configuration database, after changing the working directory,
and before dropping privileges. The idea is that chrooting helps
with security, but the configuration database should be located outside of the
document space.
- -U
- Drop root privileges. If this option is given, tipideed expects two
additional environment variables, UID and GID, containing the uid and gid
it should run as; it will drop its privileges to $UID:$GID before serving.
This option is mainly useful when paired with -R, because chrooting
can only be performed as root, so root privileges need to be kept all the
way to tipideed then dropped after tipideed has chrooted. In a non-chrooted
setup, it is simpler and more secure to run the super-server with
the -U option instead: root privileges will be dropped as soon as
the super-server has bound to its socket, and all the subsequent operations,
including the spawning of tipideed processes, are performed as a normal user.
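To make the -R and -U pairing concrete, here is a sketch of what a chrooted plaintext HTTP command line could look like. It only prints the command line so it can be reviewed before use. The user www, the address 192.0.2.1 and the document root /var/www are illustrative assumptions, not values mandated by tipidee.

```shell
# Print (do not run) a chrooted HTTP command line.
# All three values below are illustrative assumptions.
user=www
ip=192.0.2.1
docroot=/var/www

# No -U on s6-tcpserver here: root must be kept all the way to tipideed so
# that it can chroot (-R); tipideed then drops to $UID:$GID (-U), which
# s6-envuidgid has put in the environment.
chroot_cmdline="s6-envuidgid $user s6-tcpserver -- $ip 80 s6-tcpserver-access -- tipideed -d $docroot -R -U"
echo "$chroot_cmdline"
```

Compare with the non-chrooted command line given earlier, where -U is given to s6-tcpserver instead and tipideed takes no option.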
Document root
The way to organize your documents so they can be served by tipideed
may look a little weird, but there's a logic to it.
tipideed serves documents from subdirectories of its working directory,
and these subdirectories are named according to the host and
the port of the request.
- A request for https://example.com:1234/doc/u/ment
will result in a lookup in the filesystem for
./example.com:1234/doc/u/ment.
- A request for https://example.com/doc/u/ment
will result in a lookup in the filesystem for
./example.com:443/doc/u/ment.
The fact that the port is always specified allows you to have
different document sets for the same host on different ports:
more flexibility.
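The naming rule can be sketched as a small shell function. docroot_lookup is a hypothetical helper written for this page, not something tipidee ships; it only mirrors the host:port naming described above, assuming port 80 for http and 443 for https when the URL gives none, and it does not handle IPv6 literals or any of the later resolution steps (index files, symbolic links, CGI).

```shell
# docroot_lookup SCHEME HOST[:PORT] PATH
# Print the filesystem path, relative to tipideed's working directory,
# that a request would be looked up under.
# Hypothetical helper; the body runs in a subshell to keep variables local.
docroot_lookup() (
  scheme=$1
  authority=$2
  reqpath=$3
  case "$authority" in
    *:*) ;;                                   # port given explicitly in the URL
    *) case "$scheme" in
         https) authority="$authority:443" ;; # default port for HTTPS
         *)     authority="$authority:80" ;;  # default port for plaintext HTTP
       esac ;;
  esac
  echo "./$authority$reqpath"
)
```

With this, docroot_lookup https example.com /doc/u/ment prints ./example.com:443/doc/u/ment, matching the second example above.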
However, most of the time, you don't want different
document sets for different ports. You want the same document
sets for ports 80 and 443, and that's it. And you don't want
to have both a domain example.com:80 section and a
domain example.com:443 section in your /etc/tipidee.conf,
with duplicate information.
That is why you are allowed to make your document roots
symbolic links, and resource attributes declared in
the configuration file are always looked up with the
canonical path. In other words, the common case
would be:
- Have your document root in ./example.com, a
real directory.
- Declare your resource attributes under a
domain example.com section in your configuration file.
- Have a ./example.com:80 symlink pointing to
example.com, if you want to serve example.com
under plaintext HTTP.
- Have a ./example.com:443 symlink pointing to
example.com, if you want to serve example.com
under HTTPS.
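The four steps above translate to a handful of commands. This sketch builds the layout in a scratch directory created with mktemp, which stands in for the real working directory of tipideed; the index.html content is made up.

```shell
# Build the common-case layout in a scratch directory.
# The scratch directory is a stand-in for tipideed's real working directory.
docroot=$(mktemp -d)
cd "$docroot" || exit 1

mkdir example.com                    # the real document root
ln -s example.com example.com:80     # served under plaintext HTTP
ln -s example.com example.com:443    # served under HTTPS

# A made-up document, reachable through both symlinks.
echo '<p>hello</p>' > example.com/index.html
```

Both ./example.com:80/index.html and ./example.com:443/index.html now resolve to the same file, whose canonical path is under ./example.com.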
This system allows you to share documents across virtual hosts
without fear of misconfiguration. You can symlink any document
under example.com to any name under example.org;
if the path via example.com is the canonical path, then
your resource will still get the correct attributes, defined in a
domain example.com section, even if it is accessed via an
example.org URL. You will not inadvertently expose source
code for CGI scripts, for instance.
You can do wild things with symbolic links. However, anything
that does not resolve to a file in a document root under tipideed's
current working directory will be rejected. If an attacker symlinks
your /etc/passwd file, tipideed will keep it safe.
HTTP/1.0 does not have the concept of virtual hosts. For HTTP/1.0
requests that do not provide a full URL, tipideed will use the value
it reads from the TCPLOCALHOST variable, which is normally the result
of a reverse DNS lookup on the server's address. You can override the
lookup and provide your own value by giving the -l option to
s6-tcpserver-access or s6-tlsserver.
If TCPLOCALHOST does not exist or is empty, a fallback value of
@ (at) will be used. So if you aren't calling s6-tcpserver-access
at all, your documents will most likely be accessible for HTTP/1.0 clients under
@:80 or @:443.
Logging
- tipideed uses stderr for all its logging. All its log lines are prefixed
with "tipideed: pid pid: ", where the second pid is the numeric process ID
of the tipideed instance writing the line.
- The log lines continue with "fatal: " for fatal error messages (meaning
that tipideed exits right after writing the message), or "warning: " for
warnings (meaning that tipideed continues operating after writing the message).
In normal operation, you should not see any fatal or warning line.
- In normal operation, tipideed can log informational lines, and the continuing
prefix is "info: ". It can potentially log:
- One line when it starts (i.e. a client has connected)
- Up to three lines for every request:
- One when the request is received
- One when a suitable resource is found. In rare cases
(namely: the resource is a CGI script answering with a local redirection),
there may be more than one resource line.
- One when an answer is sent
- One line when it exits normally
- What to log is configured via the log directive in the configuration file.
By default, only a minimal request line and an answer line are printed.
- The log format is designed to be readable by a human, but still
easily processable by automation. For instance, the regular prefix structure
makes it easy for s6-log
to select different lines to send them to various backends for archiving or
processing.
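As an example of how the regular prefix can be exploited, the sketch below extracts the severity of each log line with sed, as a stand-in for what a log processor would select on. tipideed_severity is a hypothetical helper name, and the message texts in the usage example are made up; only the prefix structure comes from the description above.

```shell
# Read tipideed log lines on stdin; for each line carrying the documented
# prefix, print its severity (fatal, warning or info). Other lines are dropped.
# tipideed_severity is a hypothetical helper name, not part of tipidee.
tipideed_severity() {
  # The pattern is anchored at the start of the line, so a message that
  # happens to contain ": fatal: " further on cannot be misclassified.
  sed -E -n 's/^tipideed: pid [0-9]+: (fatal|warning|info): .*/\1/p'
}

# Usage, with made-up messages:
#   printf '%s\n' 'tipideed: pid 4242: info: made-up message' \
#                 'some unrelated line' | tipideed_severity
```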
Detailed operation
- tipideed reads its compiled configuration file. Then:
- If the -d option has been given, it changes its working directory.
- If the -R option has been given, it chroots to its current directory.
- If the -U option has been given, it drops root privileges.
- It checks that its environment is valid, and that its configuration has
some minimal defaults it can use.
- tipideed listens to a stream of HTTP requests on its standard input. For every
HTTP request:
- It parses the request line and checks it's HTTP/1.0 or 1.1
- It parses the headers into a quick access structure
- It checks header consistency with the request
- If the method is OPTIONS * or TRACE, it answers here
and continues the loop
- It reads the request body, if any
- It checks in its configuration if a redirection has been defined for
the wanted resource or a prefix (by directory) of the wanted resource. If it's
the case, it answers with that redirection and continues the loop.
- It looks for a suitable resource in the filesystem, completing the
request with index files if necessary, or extracting the CGI PATH_INFO if
necessary
- It uses the canonical path of the resource in the filesystem to look
for resource attributes in its configuration. (Is this a CGI script? An NPH
script? Does it have a customized Content-Type? etc.)
- If the method is a targeted OPTIONS, it answers here and
continues the loop
- If the resource is a CGI script:
- If it is an NPH script, tipideed execs into the script (possibly
after spawning a helper child if there is a request body to feed to the script)
with the appropriate environment;
and the connection will close when the script exits.
- Else, tipideed spawns the CGI script as a child with the appropriate
environment, feeds it the request body if any, reads its output, and answers
the client.
- If a problem occurs server-side, the client will receive a 502
answer ("Bad Gateway"), and tipideed will write an error message to
its stderr, so that administrators can see what went wrong with their setup.
tipideed trusts its CGI scripts more than its clients, but it does not give
them its full trust either — lots of sites are running third-party
backends.
- Else, the resource is a regular ("static") file, and tipideed serves
it on its stdout, to the client.
- tipideed exits on EOF (when the client closes the connection), or when
the client times out before sending a request, or after tipideed receives a
single HTTP/1.0 request, or when it has executed into an NPH script, or when
it has answered a request with a Connection: close header. It also
exits when it encounters an error making it likely that the client will have
no use for the connection anymore anyway and exiting is simpler and cheaper;
in which case tipideed adds Connection: close to its last answer.
Performance considerations
On systems that implement posix_spawn(), the s6-tcpserver super-server (and
the s6-tlsserver one as well, since both use the same underlying program)
uses it instead of fork(), and that partly alleviates the performance penalty
usually associated with servers that spawn one process per connection.
One of tipidee's stated goals is to explore what kind of performance is achievable for
a fully compliant Web server within the limits of that model. To that effect, tipideed
is meant to be fast. It should serve static files as fast as any server out
there, especially on Linux (or other systems supporting splice()) where it
uses zero-copy transfer. CGI performance should be limited by the performance of the
CGI script itself, never by tipideed.
tipideed itself does not use fork() if the system supports posix_spawn(),
with one exception that you will not hit; and if you do, fork() will not
be the bottleneck. (Can you guess which case it is, without looking at the code?)
tipideed does not parse its configuration file itself, delegating the task to the
offline tipidee-config program and directly mapping
a binary file instead. To parse a client request, it uses a deterministic finite
automaton, only reading the request once, and only backtracking in pathological cases.
This should streamline request processing as much as possible.
If you have benchmarks or results of comparative testing of tipideed against
other Web servers, please share them on the skaware mailing-list.
Notes
- tipideed sometimes answers 400, or even does not answer at all
(it just exits), when receiving some malformed or weirdly paced
client requests, despite what the HTTP RFC says.
This is on purpose. HTTP servers are very heavily solicited, they can run
very hot, the Web is a cesspool of badly written bots and bad actors, and every
legitimate browser knows how to speak HTTP properly and without abusing
corner cases in the protocol.
It makes no sense to try to follow the book to the letter, expending
precious resources, when the client can't even be bothered to pretend
it's legit. Knowing when to exit early is crucial for good resource
management.
- tipideed does not have any configuration defaults baked in; it
reads all its configuration values, including the defaults, from the cdb
file created by tipidee-config. That
is why having such a cdb file is mandatory for tipideed to run.
- tipideed is pronounced tipi-deed. You can say
tipi-dee-dee, but only if you're the type of person who also says
PC computer, NIC card or ATM machine.
- tipidee is the name of the package, the software suite
implementing a Web server. tipideed is the name of the program
doing the HTTP serving part.