From f316a2ed52195135a35e32d7096e876357c48c69 Mon Sep 17 00:00:00 2001
From: Laurent Bercot
+ Why not just use /bin/sh ?
+
+ Security
+
+ One of the most frequent sources of security problems in programs
+is parsing. Parsing is a complex operation, and it is easy to
+make mistakes while designing and implementing a parser. (See
+what Dan Bernstein says
+on the subject, section 5.)
+
+ But shells parse all the time. Worse, the essence
+of the shell is parsing: the parser and the runner are intimately
+interleaved and cannot be clearly separated, thanks to the
+specification.
+Even worse, the
+shell sometimes has to perform double parsing, for instance
+after parameter expansion. This can lead to atrocities like
+
+zork="foo ; echo bar"
+touch $zork
+
+not doing what you would like them to do, even in that simple
+case. (zsh has a sane behaviour by
+default, at the expense of explicitly breaking the spec.)
+
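+ A minimal sketch of the problem, assuming a POSIX-like sh such as
+dash or bash with the default IFS. The count_args helper is a
+hypothetical name introduced only for this illustration. Note that the
+semicolon is not re-read as a command separator here: it simply becomes
+an argument of its own, because the expanded value is split into fields.
+

```shell
#!/bin/sh
# count_args is a hypothetical helper: it prints how many arguments it received.
count_args() { printf '%s\n' "$#"; }

zork="foo ; echo bar"

# Unquoted expansion: the value is split on whitespace after expansion,
# so the command receives four separate words: foo, ;, echo and bar.
unquoted=$(count_args $zork)

# Quoted expansion: the value is passed through as one single word.
quoted=$(count_args "$zork")

printf 'unquoted: %s argument(s)\n' "$unquoted"
printf 'quoted: %s argument(s)\n' "$quoted"
```

+ With touch instead of count_args, the unquoted form would create
+four files rather than one.
+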
+execlineb parses the script only once: when
+reading it. The parser has been designed to be simple and systematic,
+to reduce the potential for bugs - which you just cannot do
+with a shell. After execlineb has split up the script into
+words, no other parsing phase will happen, unless the user explicitly
+requires it. Positional parameters, when
+used, are never split, even if they contain spaces or newlines, unless
+the user explicitly requires it. Users control exactly what
+is split, what is done, and how.
+
+
+ The shell language was designed to make scripts portable across various
+versions of Unix. But it is actually really hard to write a portable shell
+script. There are dozens of distinct
+sh flavours, not even counting the openly incompatible
+csh approach and its various tcsh-like followers.
+The ash, bash, ksh and zsh shells
+all behave differently, even when they are
+run in their so-called compatibility mode. From what I have
+seen in various experiments, only zsh is able to follow the
+specification to the letter, at the expense of being big and complex to
+configure. This is a source of endless problems for shell script writers,
+who should be able to assume that a script will run everywhere,
+but cannot in practice. Even a simple utility like test
+cannot be used safely with the standardized options, because most shells
+come with a builtin test that does not respect the
+specification to the letter. And let's not get started on echo,
+which has its own set of problems. Rich Felker has
+a page listing tricks
+to use to write portable shell scripts. Writing a portable script should
+not be that hard.
+
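+ To illustrate the echo problem: whether echo prints a string such as
+-n or interprets backslash sequences depends on the shell's builtin.
+One commonly recommended workaround is printf with an explicit format
+string. The sketch below assumes a POSIX printf; say is a hypothetical
+helper name, not a standard utility.
+

```shell
#!/bin/sh
# say is a hypothetical helper: print each argument verbatim, one per line.
# The format string is fixed, so arguments are never read as options
# and backslashes inside them are not interpreted.
say() { printf '%s\n' "$@"; }

opt_like='-n'     # some echo builtins would treat this as an option
escaped='a\tb'    # some echo builtins would expand \t into a tab

first=$(say "$opt_like")
second=$(say "$escaped")

say "first: $first" "second: $second"
```
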
+execline scripts are portable. There is no
+complex syntax with opportunity to have an undefined or nonportable
+behaviour. The execline package is portable across platforms:
+there is no reason for vendors or distributors to fork their own
+incompatible version.
+ Scripts will
+not break from one machine to another; if they do,
+it's not a "portability problem", it's a bug. You are then encouraged
+to find the program that is responsible for the different behaviour,
+and send a bug report to the program author - including me, if the
+relevant program is part of the execline distribution.
+
+ A long-standing problem with Unix scripts is the shebang line, which
+requires an absolute path to the interpreter. Scripts are only portable
+as is if the interpreter can be found at the same absolute path on every
+system. With /bin/sh, it is almost the case (Solaris
+manages to get it wrong by having a non-POSIX shell as /bin/sh
+and requiring something like #!/usr/xpg4/bin/sh to get a POSIX
+shell to interpret your script). Other scripting languages are not so
+lucky: perl can be /bin/perl, /usr/bin/perl,
+/usr/local/bin/perl or something else entirely. For those cases,
+some people advocate the use of env: #!/usr/bin/env perl.
+But first, env can only find interpreters that can be found via the
+user's PATH environment variable, which defeats the purpose of having an
+absolute path in the shebang line in the first place; and second, this only
+displaces the problem: the env utility does not
+have a guaranteed absolute path. /usr/bin/env is the usual
+convention, but not a strong guarantee: it is valid for systems to have
+/bin/env instead, for instance.
+
+execline suffers from the same issues. #!/bin/execlineb ?
+#!/usr/bin/execlineb ? This is the only portability problem that
+you will find with execline, and it is common to every scripting language.
+
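+ A rough sketch of how one might check where an interpreter actually
+lives on a given machine before writing a shebang line. The
+find_interp helper and the list of directories it searches are
+assumptions made for this illustration, not a standard.
+

```shell
#!/bin/sh
# find_interp NAME (hypothetical helper): print the first executable
# called NAME found in a few conventional directories.
# Returns non-zero if none of them contains it.
find_interp() {
  for dir in /bin /usr/bin /usr/local/bin; do
    if [ -x "$dir/$1" ]; then
      printf '%s\n' "$dir/$1"
      return 0
    fi
  done
  return 1
}

sh_path=$(find_interp sh) || sh_path='not found'
env_path=$(find_interp env) || env_path='not found'

printf 'sh:  %s\n' "$sh_path"
printf 'env: %s\n' "$env_path"
```

+ The result differs from one system to another, which is exactly why
+a hardcoded path in the shebang line is fragile.
+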
+ The real solution to this portability problem is a convention that
+guarantees fixed absolute paths for executables, which the FHS does not do.
+The slashpackage convention is
+such an initiative, and is well-designed; but as with every
+convention, it only works if everyone follows it, and unfortunately,
+slashpackage has not
+found many followers. Nevertheless, like every skarnet.org package, execline
+can be configured to follow the slashpackage convention.
+
+
+ I originally wanted a shell that could be used on an embedded system.
+Even the ash shell seemed big, so I thought of writing my
+own. Hence I had a look at the
+sh
+specification... and ran away screaming.
+This specification
+is insane. It goes against every good programming
+practice; it seems to have been designed only to give headaches
+to wannabe sh implementors.
+
+ POSIX cannot really be blamed for that: it only standardizes existing, historical
+behaviour. One can argue whether it is a good idea to standardize atrocious
+behaviour for historical reasons, as is the case with the infamous
+gets
+function, but this is the way it is.
+
+ The fact remains that modern shells have to be compatible with that historical
+nonsense and that makes them big and complex at best, or incompatible and riddled
+with bugs at worst.
+An OpenBSD developer, when asked about the OpenBSD /bin/sh, said to me:
+"It works, but it's far from not being a nightmare".
+
+ Nobody should have
+nightmare-like software on their system. Unix is simple. Unix
+was designed to be simple. And if, as Dennis Ritchie said, "it takes a
+genius to understand the simplicity", that's because incompetent people
+took advantage of Unix's huge flexibility to write insanely crappy or
+complex software. System administrators can only do a decent job when
+they understand how the programs they run are supposed to work. People
+are slowly starting to grasp this (or are they? We finally managed
+to get rid of sendmail and BIND, but GNU/Linux users seem happy to
+welcome the era of D-Bus and systemd. Will we ever learn?) - but even
+sh, a seemingly simple and basic Unix program, is hard to
+understand when you lift the cover.
+
+ So I decided to forgo sh entirely and take a new approach. So far it
+has been working.
+ The execline specification is simple, and,
+as I hope to have shown, easy to implement without too many bugs or
+glitches.
+
+
+ Since it was made to run on an embedded system, execline was
+designed to be light in memory usage. And it is.
+
+ You can have hundreds of execline scripts running simultaneously on an
+embedded box. Not exactly possible with a shell.
+
+ For scripts that do not require many computations that a shell can do
+without calling external programs,
+execline is faster than the shell.
+Unlike sh's
+parser, the execline parser is simple and
+straightforward; actually, it's more of a lexer than a parser, because
+the execline language has been designed to be LL(1) - keep it simple,
+stupid.
+execline scripts get analysed and launched practically without delay.
+
+ + +