diff options
author | Laurent Bercot <ska-skaware@skarnet.org> | 2014-09-18 20:03:23 +0000 |
---|---|---|
committer | Laurent Bercot <ska-skaware@skarnet.org> | 2014-09-18 20:03:23 +0000 |
commit | f316a2ed52195135a35e32d7096e876357c48c69 (patch) | |
tree | 5f4486b9a5a213a69e66ef574d6bc643a207981c /doc/grammar.html | |
download | execline-f316a2ed52195135a35e32d7096e876357c48c69.tar.xz |
initial commit: rc for execline-2.0.0.0
Diffstat (limited to 'doc/grammar.html')
-rw-r--r-- | doc/grammar.html | 160 |
1 files changed, 160 insertions, 0 deletions
diff --git a/doc/grammar.html b/doc/grammar.html new file mode 100644 index 0000000..6c26dbd --- /dev/null +++ b/doc/grammar.html @@ -0,0 +1,160 @@ +<html> + <head> + <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /> + <meta http-equiv="Content-Language" content="en" /> + <title>execline: language design and grammar</title> + <meta name="Description" content="execline: language design and grammar" /> + <meta name="Keywords" content="execline language design grammar script shell" /> + <!-- <link rel="stylesheet" type="text/css" href="http://skarnet.org/default.css" /> --> + </head> +<body> + +<p> +<a href="index.html">execline</a><br /> +<a href="http://skarnet.org/software/">Software</a><br /> +<a href="http://skarnet.org/">skarnet.org</a> +</p> + +<h1> The <tt>execline</tt> language design and grammar </h1> + +<a name="principles" /> +<h2> <tt>execline</tt> principles </h2> + +<p> + Here are some basic Unix facts: +</p> + +<ul> + <li> Unix programs are started with the <tt>execve()</tt> +system call, which takes 3 arguments: the command name (which +we won't discuss here because it's redundant in most cases), +the command line <em>argv</em>, which specifies the program name and its +arguments, and the environment <em>envp</em>. </li> + <li> The <em>argv</em> structure makes it easy to read some +arguments at the beginning of <em>argv</em>, perform some action, +then <tt>execve()</tt> into the rest of <em>argv</em>. For +instance, the <tt>nice</tt> command works that way: +<pre> nice -10 echo blah </pre> will read <tt>nice</tt> and <tt>-10</tt> +from the argv, change the process' <em>nice</em> value, then exec into +the command <tt>echo blah</tt>. This is called +<a href"http://en.wikipedia.org/wiki/Chain_loading">chain loading</a> +by some people, and <a href="http://www.faqs.org/docs/artu/ch07s02.html"> +Bernstein chaining</a> by others. </li> + <li> The purpose of the environment is to preserve some state across +<tt>execve()</tt> calls. This state is usually small: most programs +keep their information in the filesystem. </li> + <li> A <em>script</em> is basically a text file whose meaning is a +sequence of actions, i.e. calls to Unix programs, with some control +over the execution flow. You need a program to interpret your script. +Traditionally, this program is <tt>/bin/sh</tt>: scripts are written +in the <em>shell</em> language. </li> + <li> The shell reads and interprets the script command after command. +That means it must preserve a state, and stay in memory while the +script is running. </li> + <li> Standard shells have lots of built-in features and commands, so +they are big. Spawning (i.e. <tt>fork()</tt>ing then <tt>exec()</tt>ing) +a shell script takes time, because the shell program itself must be +initialized. For simple programs like <tt>nice -10 echo blah</tt>, +a shell is overpowered - we only need a way to make an <em>argv</em> +from the "<tt>nice -10 echo blah</tt>" string, and <tt>execve()</tt> +into that <em>argv</em>. </li> + <li> Unix systems have a size limit for <em>argv</em>+<em>envp</em>, +but it is high. POSIX states that this limit must not be inferior to +4 KB - and most simple scripts are smaller than that. Modern systems +have a much higher limit: for instance, it is 64 KB on FreeBSD-4.6, +and 128 KB on Linux. </li> +</ul> + +<p> + Knowing that, and wanting lightweight and efficient scripts, I +wondered: "Why should the interpreter stay in memory while the script +is executing ? Why not parse the script once and for all, put +it all into one <em>argv</em>, and just execute into that <em>argv</em>, +relying on external commands (which will be called from within the +script) to control the execution flow ?" +</p> + +<p> <tt>execline</tt> was born. </p> + +<ul> + <li> <tt>execline</tt> is the first script language to rely +<em>entirely</em> on chain loading. An execline script is a +single <em>argv</em>, made of a chain of programs designed to +perform their action then <tt>exec()</tt> into the next one. </li> + <li> The <a href="execlineb.html">execlineb</a> command is a +<em>launcher</em>: it reads and parses a text file, converting it +to an <em>argv</em>, then executes into that <em>argv</em>. It +does nothing more. </li> + <li> Straightforward scripts like <tt>nice -10 echo blah</tt> +will be run just as they are, without the shell overhead. +Here is what the script could look like: +<pre> +#!/command/execlineb -P +nice -10 +echo blah +</pre> + </li> + <li> More complex scripts will include calls to other <tt>execline</tt> +commands, which are meant to provide some control over the process state +and execution flow from inside an <em>argv</em>. </li> +</ul> + +<a name="grammar" /> +<h2> Grammar of an execline script </h2> + +<p> +An execline script can be parsed as follows: +</p> + +<pre> + <instruction> = <> | external options <arglist> <instruction> | builtin options <arglist> <blocklist> <instruction> + <arglist> = <> | arg <arglist> + <blocklist> = <> | <block> <blocklist> + <block> = { <arglist> } | { <instrlist> } + <instrlist> = <> | <instruction> <instrlist> +</pre> + +<p> +(This grammar is ambivalent, but much simpler to understand than the +non-ambivalent ones.) +</p> + +<ul> + <li> An execline script is valid if it reduces to an +<em>instruction</em>. </li> + <li> The empty <em>instruction</em> is the same as the <tt>true</tt> +command: when an execline component must exec into the empty +instruction, it exits 0. </li> + <li> Basically, every non-empty <em>instruction</em>, be it +"<em>builtin</em>" - an execline command - or "<em>external</em>" +- a program such as <tt>echo</tt> or <tt>cp</tt> - takes a number of +arguments, the <em>arglist</em>, then executes into a (possibly empty) +<em>instruction</em>. </li> + <li> Some <em>builtin</em>s are special because they also take a +non-empty <em>blocklist</em> after their <em>arglist</em>. For instance, +the <a href="foreground.html">foreground</a> command takes an empty +<em>arglist</em> and one <em>block</em>: <pre> + #!/command/execlineb -P + foreground { sleep 1 } echo blah +</pre> is a valid <a href="execlineb.html">execlineb</a> script. +The <a href="foreground.html">foreground</a> command uses the +<tt>sleep 1</tt> <em>block</em> then execs into the +remaining <tt>echo blah</tt> <em>instruction</em>. </li> +</ul> + +<a name="features"></a> +<h2> execline features </h2> + +<p> + <tt>execline</tt> commands can perform some transformations on +their <em>argv</em>, to emulate some aspects of a shell. Here are +descriptions of these features: +</p> + +<ul> + <li> <a href="el_semicolon.html">Block management</a> </li> + <li> <a href="el_substitute.html">Variable substitution</a> </li> +</ul> + +</body> +</html> |