From f316a2ed52195135a35e32d7096e876357c48c69 Mon Sep 17 00:00:00 2001
From: Laurent Bercot
Date: Thu, 18 Sep 2014 20:03:23 +0000
Subject: initial commit: rc for execline-2.0.0.0
---
 doc/dieshdiedie.html | 278 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 278 insertions(+)
 create mode 100644 doc/dieshdiedie.html

(limited to 'doc/dieshdiedie.html')

diff --git a/doc/dieshdiedie.html b/doc/dieshdiedie.html
new file mode 100644
index 0000000..dc3c661
--- /dev/null
+++ b/doc/dieshdiedie.html
@@ -0,0 +1,278 @@
+ + + + + execline: why execline and not sh + + + + + + +execline
+Software
+skarnet.org

+ +

Why not just use `/bin/sh` ?

+ + + +

Security

+ +

+ One of the most frequent sources of security problems in programs +is parsing. Parsing is a complex operation, and it is easy to +make mistakes while designing and implementing a parser. (See +what Dan Bernstein says +on the subject, section 5.) +

+ +

+ But shells parse all the time. Worse, the essence +of the shell is parsing: the parser and the runner are intimately +interleaved and cannot be clearly separated, thanks to the +specification. +Even worse, the +shell sometimes has to perform double parsing, for instance +after parameter expansion. This can lead to atrocities like +

+zork="foo ; echo bar"
+touch $zork
+

not doing what you would like them to do, even in that simple +case. (zsh has a sane behaviour by +default, at the expense of explicitly breaking the spec.) +
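To make the problem concrete, here is a minimal sketch, assuming a POSIX `sh` with the default `IFS`. The helper `count_args` is a hypothetical name introduced only for this illustration; it reports how many arguments a command such as `touch` would actually receive.

```sh
# Count how many arguments a command would receive after expansion.
count_args() { echo "$#"; }

zork="foo ; echo bar"

# Unquoted: the expanded value goes through field splitting, so the
# command receives four separate words: foo, ;, echo and bar.
unquoted=$(count_args $zork)

# Quoted: the expansion stays a single word that contains spaces and a
# semicolon.
quoted=$(count_args "$zork")

echo "unquoted: $unquoted, quoted: $quoted"
```

Under those assumptions, `touch $zork` neither creates one file named `foo ; echo bar` nor runs `echo bar`: it is handed four file names. Quoting the expansion is what keeps it a single word.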

+ +

+execlineb parses the script only once: when +reading it. The parser has been designed to be simple and systematic, +to reduce the potential for bugs - which you just cannot do +with a shell. After execlineb has split up the script into +words, no other parsing phase will happen, unless the user explicitly +requires it. Positional parameters, when +used, are never split, even if they contain spaces or newlines, unless +the user explicitly requires it. Users control exactly what +is split, what is done, and how. +

+ + +

Portability

+ +

+ The shell language was designed to make scripts portable across various +versions of Unix. But it is actually really hard to write a portable shell +script. There are dozens of distinct +sh flavours, not even counting the openly incompatible +csh approach and its various tcsh-like followers. +The ash, bash, ksh and zsh shells +all behave differently, even when they are +run in their so-called compatibility mode. From what I have +seen in various experiments, only zsh is able to follow the +specification to the letter, at the expense of being big and complex to +configure. This is a source of endless problems for shell script writers, +who should be able to assume that a script will run everywhere, +but cannot in practice. Even a simple utility like test +cannot be used safely with the normalized options, because most shells +come with a builtin test that does not respect the +specification to the letter. And let's not get started on echo, +which has its own set of problems. Rich Felker has +a page listing tricks +for writing portable shell scripts. Writing a portable script should +not be that hard. +
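One commonly cited instance of the `echo` problem, added here as an illustration rather than taken from the original text: implementations disagree on whether a leading `-n` or a backslash sequence is data or an instruction. The usual workaround, sketched below, is `printf` with an explicit format string. The variable names are illustrative only.

```sh
# A string that some echo implementations treat as an option, not as data.
tricky="-n"

# printf with an explicit format prints its argument verbatim, followed by
# a newline, whatever the argument looks like.
safe_output=$(printf '%s\n' "$tricky")

echo "printf printed: $safe_output"
```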

+ +

+execline scripts are portable. There is no +complex syntax with opportunity to have an undefined or nonportable +behaviour. The execline package is portable across platforms: +there is no reason for vendors or distributors to fork their own +incompatible version. + Scripts will +not break from one machine to another; if they do, +it's not a "portability problem", it's a bug. You are then encouraged +to find the program that is responsible for the different behaviour, +and send a bug-report to the program author - including me, if the +relevant program is part of the execline distribution. +

+ +

+ A long-standing problem with Unix scripts is the shebang line, which +requires an absolute path to the interpreter. Scripts are only portable +as is if the interpreter can be found at the same absolute path on every +system. With /bin/sh, it is almost the case (Solaris +manages to get it wrong by having a non-POSIX shell as /bin/sh +and requiring something like #!/usr/xpg4/bin/sh to get a POSIX +shell to interpret your script). Other scripting languages are not so +lucky: perl can be /bin/perl, /usr/bin/perl, +/usr/local/bin/perl or something else entirely. For those cases, +some people advocate the use of env: #!/usr/bin/env perl. +But first, env can only find interpreters that can be found via the +user's PATH environment variable, which defeats the purpose of having an +absolute path in the shebang line in the first place; and second, this only +displaces the problem: the env utility does not +have a guaranteed absolute path. /usr/bin/env is the usual +convention, but not a strong guarantee: it is valid for systems to have +/bin/env instead, for instance. +
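The lookup that a fixed shebang path cannot do for you can be sketched as follows. `first_executable` is a hypothetical helper written for this illustration; the candidate paths are the ones named in the paragraph above.

```sh
# Print the first argument that names an executable file; fail if none does.
first_executable() {
  for candidate in "$@"; do
    if [ -x "$candidate" ] && [ ! -d "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1
}

# Where does env live on this machine?
env_path=$(first_executable /usr/bin/env /bin/env) || env_path="(not found)"
echo "env lives at: $env_path"
```

A script can run such a check once it has started, but the kernel does no such search when it reads the `#!` line, which is why the path written there has to be right on every target system.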

+ +

+execline suffers from the same issue. #!/bin/execlineb ? +#!/usr/bin/execlineb ? This is the only portability problem that +you will find with execline, and it is common to every scripting language. +

+ +

+ The real solution to this portability problem is a convention that +guarantees fixed absolute paths for executables, which the FHS does not do. +The slashpackage convention is +such an initiative, and is well-designed; but as with every +convention, it only works if everyone follows it, and unfortunately, +slashpackage has not +found many followers. Nevertheless, like every skarnet.org package, execline +can be configured to follow the slashpackage convention. +

+ + +

Simplicity

+ +

+ I originally wanted a shell that could be used on an embedded system. +Even the ash shell seemed big, so I thought of writing my +own. Hence I had a look at the +sh +specification... and ran away screaming. +This specification +is insane. It goes against every good programming +practice; it seems to have been designed only to give headaches +to wannabe sh implementors. +

+ +

+ POSIX cannot really be blamed for that: it only normalizes existing, historical +behaviour. One can argue whether it is a good idea to normalize atrocious +behaviour for historical reasons, as is the case with the infamous +gets +function, but this is the way it is. +

+ +

+ The fact remains that modern shells have to be compatible with that historical +nonsense and that makes them big and complex at best, or incompatible and ridden +with bugs at worst. +An OpenBSD developer said to me, when asked about the OpenBSD /bin/sh: +"It works, but it's far from not being a nightmare". +

+ +

+ Nobody should have +nightmare-like software on their system. Unix is simple. Unix +was designed to be simple. And if, as Dennis Ritchie said, "it takes a +genius to understand the simplicity", that's because incompetent people +took advantage of the huge Unix flexibility to write insanely crappy or +complex software. System administrators can only do a decent job when +they understand how the programs they run are supposed to work. People +are slowly starting to grasp this (or are they ? We finally managed +to get rid of sendmail and BIND, but GNU/Linux users seem happy to +welcome the era of D-Bus and systemd. Will we ever learn ?) - but even +sh, a seemingly simple and basic Unix program, is hard to +understand when you lift the cover. +

+ +

+ So I decided to forgo sh entirely and take a new approach. So far it +has been working. + The execline specification is simple, and, +as I hope to have shown, easy to implement without too many bugs or +glitches. +

+ + +

Performance

+ +

+ Since it was made to run on an embedded system, execline was +designed to be light in memory usage. And it is. +

+ +

No overhead due to interactive support.
No overhead due to unneeded features. Since every command performs +its task then executes another command, all occupied resources are instantly +freed. By contrast, a shell stays in memory during the whole execution +time.
Very limited use of the C library. Only the C interface to the +kernel's system calls, and some very basic functions like malloc(), +are used in the C library. In addition to avoiding the horrible interfaces +like stdio and the legacy libc bugs, this approach makes it easy +to statically compile execline - you will want to do that on an embedded +system, or just to gain performance.
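The "performs its task then executes another command" behaviour in the list above can be imitated with the shell's own `exec` builtin. This is only an analogy, not execline code: the outer `sh` prints its process ID and then replaces itself with a second `sh`, which prints its own.

```sh
# The outer sh prints its PID, then uses exec to replace itself with a
# second sh that prints its own PID. No fork happens in between.
pids=$(sh -c 'echo "$$"; exec sh -c "echo \$\$"')

first_pid=$(printf '%s\n' "$pids" | sed -n '1p')
second_pid=$(printf '%s\n' "$pids" | sed -n '2p')

echo "before exec: $first_pid, after exec: $second_pid"
```

Both numbers are the same: the second program runs in the process of the first, whose memory image is gone by then. Nothing of the previous program stays resident while the next one runs.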

+ +

+ You can have hundreds of execline scripts running simultaneously on an +embedded box. Not exactly possible with a shell. +

+ +

+ For scripts that do not require many computations that a shell can do +without calling external programs, + execline is faster than the shell. +Unlike sh's +parser, the execline parser is simple and +straightforward; actually, it's more of a lexer than a parser, because +the execline language has been designed to be LL(1) - keep it simple, +stupid. +execline scripts get analysed and launched practically without delay. +

+ + +

+ The best use case of execline is a linear, straightforward script, a +simple command line that does not require the shell's processing power. +In that case, execline will skip the shell's overhead and win big time +on resource usage and execution speed.
For longer scripts that fork a few commands, with a bit of +control flow, on average, an execline script will run at roughly the +same speed as the equivalent shell script, while using fewer resources.
The worst use case of execline is when the shell is used as a +programming language, and the script loops over complex internal constructs +that execline is unable to replicate without forking. In that case, +execline will waste a lot of time in fork/exec system calls that the +shell does not have to perform, and be noticeably slower. execline has +been designed as a scripting language, not as a programming +language: it is efficient at being the glue that ties together programs +doing a job, not at implementing a program's logic.
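The worst case can be felt from inside the shell itself. The sketch below, with function names invented for this illustration, runs the same counting loop twice: once with shell builtins only, and once spawning the external `expr` program on every iteration, which is roughly the cost profile of a language that has to fork and exec for each step.

```sh
# Count up to $1 using only shell builtins: no process is created.
count_builtin() {
  i=0
  while [ "$i" -lt "$1" ]; do
    i=$((i + 1))
  done
  echo "$i"
}

# Same loop, but each increment forks and executes the external expr.
count_forking() {
  i=0
  while [ "$i" -lt "$1" ]; do
    i=$(expr "$i" + 1)
  done
  echo "$i"
}

echo "builtin: $(count_builtin 200), forking: $(count_forking 200)"
```

Both functions return the same result; wrapping each call in `time` on your own system shows how much of the second one is spent in fork/exec rather than in counting.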

+ + +

execline limitations

+ +

execline can only handle scripts that fit in one argv. +Unix systems have a limit on the argv+envp size; +execline cannot execute scripts that are bigger than this limit.
execline commands do not perform signal handling. It is not +possible to trap signals inside an execline script. If you want to trap +signals, write a specific C program, or use a shell.
Due to the execline design, maintaining a state is +difficult. Information has to transit via environment variables or +temporary files, which makes commands like +loopwhilex a bit painful to handle.
Despite all its problems, the shell's main advantage (apart from +being available on every Unix platform, that is) is that it +is often convenient. Shell constructs can be terse and short, +where execline constructs will be verbose and lengthy.
An execline script is generally heavier on execve() than +the average shell script - notably in programs where the shell can +use builtins. This can lead to a performance loss, especially when +executed programs make numerous calls to the dynamic linker: the system +ends up spending a lot of time resolving dynamic symbols. If it is a +concern to you, you should try and statically compile the +execline package, to eliminate the dynamic resolution costs. Unless +you're heavily looping around execve(), +the remaining costs will be negligible.
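For the first limitation in the list above, the argv+envp limit can be queried with the standard `getconf` utility. This is a small sketch; the variable name is illustrative, and the value reported differs from one system to another.

```sh
# Upper bound, in bytes, on the combined size of the arguments and the
# environment that can be passed to an exec call on this system.
arg_max=$(getconf ARG_MAX)

echo "ARG_MAX on this system: $arg_max bytes"
```

A script has to fit under that bound together with the current environment, so a large environment leaves less room for the script.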

+ + + -- cgit v1.2.3
