author | Laurent Bercot <ska-skaware@skarnet.org> | 2023-09-21 05:57:24 +0000 |
---|---|---|
committer | Laurent Bercot <ska@appnovation.com> | 2023-09-21 05:57:24 +0000 |
commit | 0251ba5cc54cdd24092e442ab7ec364b97d42601 (patch) | |
tree | 56dfd48ce39c1958c889daf1d1196571bf82981a /doc | |
parent | 3d334dca671898241732dbc0ef6838b768308da7 (diff) | |
download | tipidee-0251ba5cc54cdd24092e442ab7ec364b97d42601.tar.xz |
More doc, complete?
Signed-off-by: Laurent Bercot <ska@appnovation.com>
Diffstat (limited to 'doc')
-rw-r--r-- | doc/future.html | 104 |
-rw-r--r-- | doc/index.html | 30 |
-rw-r--r-- | doc/quickstart.html | 11 |
-rw-r--r-- | doc/tipideed.html | 180 |
4 files changed, 309 insertions, 16 deletions
diff --git a/doc/future.html b/doc/future.html new file mode 100644 index 0000000..1a8c3e5 --- /dev/null +++ b/doc/future.html @@ -0,0 +1,104 @@ +<html> + <head> + <meta name="viewport" content="width=device-width, initial-scale=1.0" /> + <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /> + <meta http-equiv="Content-Language" content="en" /> + <title>tipidee: the future</title> + <meta name="Description" content="tipidee: the future" /> + <meta name="Keywords" content="tipidee future features roadmap support extensions" /> + <!-- <link rel="stylesheet" type="text/css" href="//skarnet.org/default.css" /> --> + </head> +<body> + +<p> +<a href="index.html">tipidee</a><br /> +<a href="//skarnet.org/software/">Software</a><br /> +<a href="//skarnet.org/">skarnet.org</a> +</p> + +<h1> tipidee: the future </h1> + +<p> + tipidee is fully functional, and you are encouraged to use it; however, it +is not yet considered <em>complete</em>. There are some optional features +of HTTP that would be nice to have, and that may be implemented at some point +down the line. +</p> + +<h2> Ranges </h2> + +<p> + <a href="https://datatracker.ietf.org/doc/html/rfc9110#section-14">Ranges</a> +are a useful part of HTTP when you are serving big files and connections may +be interrupted and restarted: supporting the <tt>Range:</tt> header can save +bandwidth, if the client only asks for the parts of the files that it's still +missing. +</p> + +<p> + It hasn't been implemented in tipidee yet because parsing the <tt>Range:</tt> +header is rather complex, and serving parts of files (as opposed to full files +sequentially) also requires some extra coding that wasn't deemed worth it for +an initial release. +</p> + +<h2> HTTP Basic Authentication </h2> + +<p> + HTTP Basic Auth is ubiquitous; and even +<a href="https://git.busybox.net/busybox/tree/networking/httpd.c#n120">busybox httpd</a> +implements it. It sounds silly not to have it; it would be good to add to tipidee. 
+</p> + +<p> + However, how to implement HTTP basic auth in a secure way is not entirely obvious. +Credentials should not be stored under the document root; passwords should not +be stored in plain text; the credentials database should have more restrictive +permissions than the configuration database; and the credentials database +should be easily regenerated. +</p> + +<p> + I'm leaning towards a cdb credentials database, distinct from the configuration +file; but this requires a <em>second</em> offline text file processor, for the +credentials file, and adding support for a <em>second</em> cdb mapping in various +places in <a href="tipideed.html">tipideed</a>. That was more complexity than I +wanted for an initial release; it's not urgent, it can wait. +</p> + +<h2> ETags </h2> + +<p> +<a href="https://datatracker.ietf.org/doc/html/rfc9110#field.etag">ETags</a> are +opaque identifiers for a particular version of a resource; clients can use them to cache data, and only +download resources they do not already have. Like ranges, ETag support can save bandwidth. +</p> + +<p> + The problem is that creating ETags is pretty resource-intensive on the server +side. You have to maintain an ETag database, and update it any time a document +changes; alternatively, you have to dynamically hash a whole resource before +deciding if you're serving it or not. Both paths are riddled with traps and +design challenges, and neither is appealing to a server like tipidee aiming at +simplicity and efficiency. ETag support may come one day, but it won't be soon. +</p> + +<h2> FastCGI </h2> + +<p> + If tipidee compares to big Web servers performance-wise, which is the expectation, +it is quite possible that the performance bottleneck becomes the CGI protocol +itself, i.e. the need to spawn an additional process for a dynamic request. +In that case, it would be useful to support other methods of communicating with +dynamic backends.
+</p> + +<p> + A module system, or embedding language-specific support into +<a href="tipideed.html">tipideed</a>, is out of the question, because it goes against +the design principles of tipidee; however, FastCGI support sounds like a possible +path to more performance. +</p> + +</body> +</html> diff --git a/doc/index.html b/doc/index.html index 4a2b9b7..b30b01a 100644 --- a/doc/index.html +++ b/doc/index.html @@ -81,8 +81,11 @@ on what I want from a web server, which is: <ul> <li> Usability with HTTPS without the need to entangle the code with a given TLS library (which means delegating the TLS layer to a super-server -and not performing the socket work itself) </li> - <li> Support for HTTP 1.1, not only 1.0 </li> +and not performing the socket work itself. This is important: tying your +Web server to a TLS library makes it more difficult to maintain, more +difficult to secure, more difficult to build, and more difficult to +package and distribute. </li> + <li> Support for HTTP 1.1, with persistent connections, and not only 1.0 </li> <li> Support for real CGI, not only NPH </li> </ul> @@ -95,8 +98,10 @@ similar sites that need an <em>intermediary</em> web server. <h3> And why "tipidee"? </h3> <p> - Because <em>h-t-t-p-d</em> is pretty tedious to say out loud. -Only keeping the last three syllables makes it easier. + Because <em>h-t-t-p-d</em> is already pretty tedious to say out loud, and +other web servers have a nasty habit of <em>adding</em> to it; it's much +nicer to make it shorter. And yes, you can take that as an indication of what +is going on with the code, too. </p> <h2> Installation </h2> @@ -118,9 +123,14 @@ information via environment variables. It also defers to tools such as to provide access control and connection fine-tuning. And if you want to run an HTTPS server, you'll need something like <a href="//skarnet.org/software/s6-networking/s6-tlsserver.html">s6-tlsserver</a> -to manage the TLS transport layer. 
So, installing -<a href="//skarnet.org/software/s6-networking/">s6-networking</a> will make -your life easier in many ways. </li> +to manage the TLS transport layer. It <em>will</em> make +your life easier. + <ul> + <li> Also, when built with BearSSL, +<a href="//skarnet.org/software/s6-networking/s6-tlsserver.html">s6-tlsserver</a> +basically gives you a TLS tunnel <em>for free</em>. Bearly any RAM use. +Don't take my word for it; try it out for yourself. </li> + </ul> </li> </ul> <h3> Licensing </h3> @@ -182,6 +192,12 @@ the previous versions of tipidee and the current one. </li> <li><a href="tipidee.conf.html">The <tt>/etc/tipidee.conf</tt> file format</a></li> </ul> +<h3> Design notes </h3> + +<ul> +<li> <a href="future.html">Features that may appear in future versions of tipidee</a> </li> +</ul> + <h2> Related resources </h2> <ul> diff --git a/doc/quickstart.html b/doc/quickstart.html index a3e8519..40586a9 100644 --- a/doc/quickstart.html +++ b/doc/quickstart.html @@ -54,7 +54,7 @@ two services. Or four if you want to serve on both IPv4 and IPv6 addresses. </li> for all the domains you're serving. </li> <li> Assuming you want to run the server as user <tt>www</tt>, and your local IP address is ${ip}, the basic command line for an HTTP service is: -<tt>s6-envuidgid www s6-tcpserver -U -- ${ip} 80 s6-tcpserver-access -- tipideed</tt>. +<tt>s6-envuidgid www s6-tcpserver -U ${ip} 80 s6-tcpserver-access tipideed</tt>. <ul> <li> <a href="//skarnet.org/software/s6/s6-envuidgid.html">s6-envuidgid</a> puts the uid and gid of user <tt>www</tt> into the environment, for <tt>s6-tcpserver</tt> @@ -125,14 +125,15 @@ IPv4 and IPv6, over HTTP and HTTPS, which makes 8 services. Plus one for each of these services. Plus a supervisor for every service and every logger — for a whopping total of 64 long-running processes just for its web server functionality; and it's still not even noticeable, the -amount of resources it consumes is negligible. So, don't worry about it.
+amount of resources it consumes is negligible. So, don't worry about it; +all your resources are still available for the serving itself. </p> <p> Note that this allows you to run different instances of -<a href="tipideed.html">tipideed</a> with different configurations, if -you need it. Use the <tt>-f</tt> option to specify a different config -file for <a href="tipideed.html">tipideed</a>. +<a href="tipideed.html">tipideed</a>, on different sockets, with different +configurations, if you need it. Use the <tt>-f</tt> option to specify a +different config file in your instances. </p> </body> diff --git a/doc/tipideed.html b/doc/tipideed.html index b11a63c..0c34af5 100644 --- a/doc/tipideed.html +++ b/doc/tipideed.html @@ -23,7 +23,9 @@ a web server package: it serves files over HTTP. </p> +<div id="interface"> <h2> Interface </h2> +</div> <pre> tipideed [ -v <em>verbosity</em> ] [ -f <em>cdbfile</em> ] [ -d <em>basedir</em> ] [ -R ] [ -U ] @@ -42,7 +44,9 @@ occurs that makes it nonsensical to keep the connection open. </li> current working directory, one subdirectory for every domain it hosts. </li> </ul> +<div id="commonusage"> <h2> Common usage </h2> +</div> <p> tipideed is intended to be run under a TCP super-server such as @@ -81,11 +85,13 @@ of the tipidee package provides service templates to help you run tipideed under <a href="//skarnet.org/software/s6-rc/">s6-rc</a>. </p> +<div id="exitcodes"> <h2> Exit codes </h2> +</div> <dl> - <dt> 0 </dt> <dd> Clean exit. The client closed the connection after a stream of -HTTP exchanges. </dd> + <dt> 0 </dt> <dd> Clean exit. There was a successful stream of HTTP exchanges +that the client decided to end. </dd> <dt> 1 </dt> <dd> Illicit client behaviour. tipideed exited because it could not serve the client in good faith. </dd> <dt> 2 </dt> <dd> Illicit CGI script behaviour. tipideed exited because the invoked @@ -96,12 +102,18 @@ line options, or missing environment variables, etc.
</dd> <dt> 101 </dt> <dd> Cannot happen. This signals a bug in tipideed, and comes with an error message asking you to report the bug. Please do so, on the <a href="//skarnet.org/lists/#skaware">skaware mailing-list</a>. </dd> + <dt> 102 </dt> <dd> Misconfiguration. tipideed found something in its configuration +data or in the document layout that it does not like. This can happen, for +instance, when a document is a symbolic link pointing outside of the server's +root. </dd> <dt> 111 </dt> <dd> System call failed. If this happens while serving a request, tipideed likely has sent a 500 (Internal Server Error) response to the client before exiting. </dd> </dl> +<div id="environment"> <h2> Environment variables </h2> +</div> <h3> Reading - mandatory </h3> @@ -173,11 +185,13 @@ otherwise, it will assume it is running plaintext HTTP. </dd> so the passed environment is as close as possible to the environment of the super-server; and it adds all the variables that are required by the <a href="https://datatracker.ietf.org/doc/html/rfc3875#section-4.1">CGI 1.1 -specification</a>. It does not add PATH_TRANSLATED, which CGI scripts should -not rely on. +specification</a>. As an exception, it does not add PATH_TRANSLATED, which +cannot be used by CGI scripts in a portable way. </p> +<div id="options"> <h2> Options </h2> +</div> <dl> <dt> -v <em>verbosity</em> </dt> @@ -218,9 +232,150 @@ the super-server has bound to its socket, and all the subsequent operations, including the spawning of tipideed processes, are performed as a normal user. </dd> </dl> +<div id="docroot"> +<h2> Document root </h2> +</div> + +<p> + The way to organize your documents so they can be served by tipideed +may look a little weird, but there's a logic to it. +</p> + +<p> + tipideed serves documents from subdirectories of its working directory, +and these subdirectories are named according to the host <em>and</em> +the port of the request. 
+</p> + +<ul> + <li> A request for <tt>https://example.com:1234/doc/u/ment</tt> +will result in a lookup in the filesystem for +<tt>./example.com:1234/doc/u/ment</tt>. </li> + <li> A request for <tt>https://example.com/doc/u/ment</tt> +will result in a lookup in the filesystem for +<tt>./example.com:443/doc/u/ment</tt>. </li> +</ul> + +<p> +The fact that the port is always specified allows you to have +different document sets for the same host on different ports: +more flexibility. +</p> + +<p> + However, most of the time, you <em>don't</em> want different +document sets for different ports. You want the same document +sets for ports 80 and 443, and that's it. And you don't want +to have both a <tt>domain example.com:80</tt> section and a +<tt>domain example.com:443</tt> section in your +<a href="tipidee.conf.html">/etc/tipidee.conf</a>, with +duplicate information. +</p> + +<p> + That is why you are allowed to make your document roots +<em>symbolic links</em>, and resource attributes declared in +the configuration file are always looked up with the +<em>canonical path</em>. In other words, the common case +would be: +</p> + +<ul> + <li> Have your document root in <tt>./example.com</tt>, a +real directory. </li> + <li> Declare your resource attributes under a +<tt>domain example.com</tt> section in your configuration file. </li> + <li> Have a <tt>./example.com:80</tt> symlink pointing to +<tt>example.com</tt>, if you want to serve <tt>example.com</tt> +under plaintext HTTP. </li> + <li> Have a <tt>./example.com:443</tt> symlink pointing to +<tt>example.com</tt>, if you want to serve <tt>example.com</tt> +under HTTPS. </li> +</ul> + +<p> + This system allows you to share documents across virtual hosts +without fear of misconfiguration.
You can symlink any document +under <tt>example.com</tt> to any name under <tt>example.org</tt>; +if the path via <tt>example.com</tt> is the canonical path, then +your resource will still get the correct attributes, defined in a +<tt>domain example.com</tt> section, even if it is accessed via an +<tt>example.org</tt> URL. You will not inadvertently expose source +code for CGI scripts, for instance. +</p> + +<p> + You can do wild things with symbolic links. However, anything +that does not resolve to a file in a document root under tipideed's +current working directory will be rejected. If an attacker symlinks +your <tt>/etc/passwd</tt> file, tipideed will keep it safe. +</p> + + +<div id="details"> <h2> Detailed operation </h2> +</div> + +<ul> + <li> tipideed reads its <a href="tipidee-config.html">compiled</a> +configuration file. Then: + <ul> + <li> If the <tt>-d</tt> option has been given, it changes its working directory. </li> + <li> If the <tt>-R</tt> option has been given, it chroots to its current directory. </li> + <li> If the <tt>-U</tt> option has been given, it drops root privileges. </li> + </ul> </li> + <li> It checks that its environment is valid, and that its configuration has +some minimal defaults it can use. </li> + <li> tipideed listens to a stream of HTTP requests on its standard input. For every +HTTP request: + <ul> + <li> It parses the request line and checks that it is HTTP/1.0 or 1.1 </li> + <li> It parses the headers into a quick-access structure </li> + <li> It checks header consistency with the request </li> + <li> If the method is <tt>OPTIONS *</tt> or <tt>TRACE</tt>, it answers here +and continues the loop </li> + <li> It reads the request body, if any </li> + <li> It checks in its configuration if a redirection has been defined for +the wanted resource or a prefix (by directory) of the wanted resource. If that is +the case, it answers with that redirection and continues the loop.
</li> + <li> It looks for a suitable resource in the filesystem, completing the +request with index files if necessary, or subtracting the CGI PATH_INFO if +necessary </li> + <li> It uses the canonical path of the resource in the filesystem to look +for resource attributes in its configuration. (Is this a CGI script? an NPH +script? Does it have a customized Content-Type? etc.) </li> + <li> If the method is a targeted <tt>OPTIONS</tt>, it answers here and +continues the loop </li> + <li> If the resource is a CGI script: + <ul> + <li> If it is an NPH script, tipideed execs into the script (possibly +after spawning a helper child if there is a request body to feed to the script) +with the appropriate environment; +and the connection will close when the script exits. </li> + <li> Else, tipideed spawns the CGI script as a child with the appropriate +environment, feeds it the request body if any, reads its output, and answers +the client. </li> + <li> If a problem occurs server-side, the client will receive a 502 +answer ("Bad Gateway"), <em>and</em> tipideed will write an error message to +its stderr, so that administrators can see what went wrong with their setup. +tipideed trusts its CGI scripts more than its clients, but it does not give +them its full trust either — lots of sites are running third-party +backends. </li> + </ul> </li> + <li> Else, the resource is a regular ("static") file, and tipideed serves +it on its stdout, to the client. </li> + </ul> </li> + <li> tipideed exits on EOF (when the client closes the connection), or after +a single HTTP/1.0 request, or when it has answered a request with a +<tt>Connection: close</tt> header, or when it encounters an error where it is +likely that the client will have no use for the connection anymore anyway +and exiting is simpler and cheaper — in which case tipideed adds +<tt>Connection: close</tt> to its last answer.
</li> +</ul> +<div id="performance"> <h2> Performance considerations </h2> +</div> <p> On systems that implement @@ -264,12 +419,29 @@ other Web servers, please share them on the <a href="//skarnet.org/lists/#skaware">skaware mailing-list</a>. </p> +<div id="notes"> <h2> Notes </h2> +</div> <ul> + <li> tipideed sometimes answers 400, or even does not answer at all +(it just exits), when receiving some malformed or weirdly paced +client requests, despite what the +<a href="https://datatracker.ietf.org/doc/html/rfc9112">HTTP RFC</a> says. +This is on purpose. HTTP servers are in high demand and can run +very hot, the Web is a cesspool of bots and bad actors, and every +legitimate browser knows how to speak HTTP properly and without abusing +corner cases in the protocol. +It makes no sense to try to follow the book to the letter, expending +precious resources, when the client can't even be bothered to pretend +it's legit. Knowing when to exit early is crucial for good resource +management. </li> <li> <tt>tipideed</tt> is pronounced <em>tipi-deed</em>. You can say <em>tipi-dee-dee</em>, but only if you're the type of person who also says <em>PC computer</em>, <em>NIC card</em> or <em>ATM machine</em>. </li> + <li> <tt>tipidee</tt> is the name of the <em>package</em>, the software suite +implementing a Web server. <tt>tipideed</tt> is the name of the <em>program</em> +doing the HTTP serving part. </li> </ul> </body>
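The host-and-port mapping described in the "Document root" section of tipideed.html can be illustrated with a short C sketch. This is not tipideed's code: the helper name `docroot_path` and its parameters are hypothetical, and it assumes the Host value has already been validated and separated from any explicit port in the URL.

```c
#include <assert.h>
#include <stdio.h>   /* snprintf */
#include <string.h>

/* Hypothetical helper (not tipideed's code): build the path, relative to
   the server's working directory, that a request is looked up at.
   host:  Host value without a port, e.g. "example.com"
   port:  explicit port from the request, or 0 if none was given
   https: nonzero if the connection is TLS (default port 443), else 80
   path:  request path, starting with '/'
   Returns the length written, or -1 if the result does not fit. */
static int docroot_path(char *out, size_t outlen, const char *host,
                        unsigned int port, int https, const char *path)
{
  int n;
  assert(out != NULL && host != NULL && path != NULL);
  if (port == 0) port = https ? 443 : 80;  /* the port is always explicit */
  n = snprintf(out, outlen, "./%s:%u%s", host, port, path);
  if (n < 0 || (size_t)n >= outlen) return -1;  /* error or truncation */
  return n;
}
```

With this mapping, `./example.com:80` and `./example.com:443` can both be symbolic links to a single real `./example.com` directory, which is the common setup that section recommends.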
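The rule that anything not resolving under tipideed's current working directory is rejected could be sketched as follows, assuming POSIX `realpath()` and `getcwd()`. The helper names are hypothetical and this is not how tipideed is necessarily written; it only covers the containment check, and does not verify that the target is a regular file.

```c
#define _XOPEN_SOURCE 700   /* exposes realpath() and getcwd() */
#include <assert.h>
#include <limits.h>         /* PATH_MAX */
#include <stdlib.h>         /* realpath() */
#include <string.h>
#include <unistd.h>         /* getcwd() */

/* Hypothetical helper: nonzero if the canonical path `resolved` lies
   strictly below the directory `root`. The boundary check keeps
   "/srv/www" from accepting "/srv/wwwevil/x". */
static int path_is_under(const char *root, const char *resolved)
{
  size_t len;
  assert(root != NULL && resolved != NULL);
  len = strlen(root);
  if (len && root[len - 1] == '/') len--;  /* "/" becomes "", "/a/" becomes "/a" */
  return strncmp(root, resolved, len) == 0
      && resolved[len] == '/' && resolved[len + 1] != '\0';
}

/* Hypothetical helper: canonicalize `request_path` (relative to the
   current directory) into `canon`, following every symbolic link, and
   accept it only if the result is below the current working directory.
   Returns 0 if accepted, -1 if the path cannot be resolved or escapes. */
static int resolve_inside_cwd(const char *request_path, char canon[PATH_MAX])
{
  char cwd[PATH_MAX];
  if (!getcwd(cwd, sizeof cwd)) return -1;
  if (!realpath(request_path, canon)) return -1;  /* missing, loop, etc. */
  return path_is_under(cwd, canon) ? 0 : -1;
}
```

Because the check runs on the canonical path, a symlink inside a document root that points to `/etc/passwd` resolves to `/etc/passwd` and is refused, however many links it goes through.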
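Two of the exit conditions listed under "Detailed operation" depend only on the request: an HTTP/1.0 request, and a request carrying `Connection: close`. A hypothetical C helper making that decision might look like this. It assumes the header in question is the client's `Connection:` field, which RFC 9110 defines as a comma-separated list of case-insensitive tokens, and it does not model the other exit conditions (EOF, errors).

```c
#define _POSIX_C_SOURCE 200809L  /* strncasecmp() */
#include <assert.h>
#include <stddef.h>
#include <strings.h>             /* strncasecmp() */

/* Returns 1 if the Connection header value contains the token "close"
   (compared case-insensitively), 0 otherwise. NULL means no header. */
static int connection_has_close(const char *value)
{
  if (value == NULL) return 0;
  while (*value) {
    const char *start;
    /* skip list separators and optional whitespace */
    while (*value == ',' || *value == ' ' || *value == '\t') value++;
    start = value;
    /* advance to the end of the current token */
    while (*value && *value != ',' && *value != ' ' && *value != '\t') value++;
    if (value - start == 5 && strncasecmp(start, "close", 5) == 0) return 1;
  }
  return 0;
}

/* Hypothetical helper: should the server exit after answering this request?
   http_minor is the minor version of an HTTP/1.x request line (0 or 1). */
static int must_close_after(int http_minor, const char *connection)
{
  assert(http_minor == 0 || http_minor == 1);
  return http_minor == 0 || connection_has_close(connection);
}
```

Matching whole tokens rather than substrings matters: a value such as `closed` must not end the connection.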
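future.html points out that parsing `Range:` is a large part of what makes range support costly. Even the simplest case, a single byte range as defined in RFC 9110 section 14, has three syntactic forms and two distinct failure modes. The following C sketch handles only that case. It is not part of tipidee, the function names are hypothetical, and it deliberately rejects things a complete parser would have to accept, such as multiple ranges, whitespace, and a range unit not written in lowercase (range units are case-insensitive).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Parse a run of decimal digits into *out. Returns a pointer just past
   the digits, or NULL if there are no digits or the value overflows. */
static const char *parse_u64(const char *s, uint64_t *out)
{
  const char *start = s;
  uint64_t v = 0;
  while (*s >= '0' && *s <= '9') {
    unsigned int d = (unsigned int)(*s - '0');
    if (v > (UINT64_MAX - d) / 10) return NULL;  /* would overflow */
    v = v * 10 + d;
    s++;
  }
  if (s == start) return NULL;
  *out = v;
  return s;
}

/* Hypothetical: parse a Range: value holding exactly one byte range,
   for a file of `size` bytes. On success, stores the inclusive interval
   in *first and *last and returns 0. Returns -1 if the value is not a
   single well-formed "bytes=" range, -2 if it is unsatisfiable. */
static int parse_single_range(const char *value, uint64_t size,
                              uint64_t *first, uint64_t *last)
{
  const char *s;
  uint64_t a, b;
  assert(value != NULL && first != NULL && last != NULL);
  if (strncmp(value, "bytes=", 6) != 0) return -1;
  s = value + 6;
  if (*s == '-') {                      /* "bytes=-N": the last N bytes */
    s = parse_u64(s + 1, &b);
    if (s == NULL || *s != '\0') return -1;
    if (b == 0 || size == 0) return -2;
    *first = b >= size ? 0 : size - b;
    *last = size - 1;
    return 0;
  }
  s = parse_u64(s, &a);                 /* "bytes=A-" or "bytes=A-B" */
  if (s == NULL || *s++ != '-') return -1;
  if (*s == '\0') b = UINT64_MAX;       /* open-ended: up to the end */
  else {
    s = parse_u64(s, &b);
    if (s == NULL || *s != '\0' || b < a) return -1;
  }
  if (a >= size) return -2;
  *first = a;
  *last = b >= size ? size - 1 : b;     /* clamp to the last byte */
  return 0;
}
```

A return of -2 would map to a 416 (Range Not Satisfiable) answer; on -1, a server may simply ignore the header and serve the full file, which RFC 9110 allows. Serving the selected bytes, and multipart answers for several ranges, would still be separate work on top of this.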