mawk (1) Linux Manual Page
NAME
mawk - pattern scanning and text processing language
SYNOPSIS
mawk [-W option] [-F value] [-v var=value] [--] 'program text' [file ...]
mawk [-W option] [-F value] [-v var=value] [-f program-file] [--] [file ...]
DESCRIPTION
mawk is an interpreter for the AWK Programming Language. The AWK language is useful for manipulation of data files, text retrieval and processing, and for prototyping and experimenting with algorithms. mawk is a new awk, meaning it implements the AWK language as defined in Aho, Kernighan and Weinberger, The AWK Programming Language, Addison-Wesley Publishing, 1988 (hereafter referred to as the AWK book). mawk conforms to the POSIX 1003.2 (draft 11.3) definition of the AWK language, which contains a few features not described in the AWK book, and mawk provides a small number of extensions.
An AWK program is a sequence of pattern {action} pairs and function definitions. Short programs are entered on the command line, usually enclosed in ' ' to avoid shell interpretation. Longer programs can be read in from a file with the -f option. Data input is read from the list of files on the command line or from standard input when the list is empty. The input is broken into records as determined by the record separator variable, RS. Initially, RS = "\n" and records are synonymous with lines. Each record is compared against each pattern and if it matches, the program text for {action} is executed.
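As a minimal sketch of the pattern {action} model, the commands below feed three lines on standard input to a one-line program. The sample data and the shell variable out are invented for illustration, and the generic command name awk is used; substitute mawk where it is installed under that name.

```shell
# Three input lines; with the default RS each line is one record.
# The pattern /an/ selects the records containing "an"; the action prints
# the record number (NR) followed by the whole record ($0).
out=$(printf 'apple\nbanana\nmango\n' | awk '/an/ { print NR ": " $0 }')
echo "$out"
```

Only the second and third records match, so only those two are printed.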
OPTIONS
-F value       sets the field separator, FS, to value.
-f file        Program text is read from file instead of from the command line. Multiple -f options are allowed.
-v var=value   assigns value to program variable var.
--             indicates the unambiguous end of options.
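The following sketch combines two of these options. The input lines and the variable name prefix are hypothetical, and awk stands in for whichever POSIX-compatible implementation is installed.

```shell
# -F: sets FS to ":" so each record is split on colons.
# -v assigns the string "user=" to the program variable prefix
# before the program starts running.
out=$(printf 'root:0\ndaemon:1\n' | awk -F: -v prefix='user=' '{ print prefix $1 }')
echo "$out"
```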
The above options will be available with any POSIX compatible implementation of AWK. Implementation specific options are prefaced with -W. mawk provides these:
-W dump          writes an assembler like listing of the internal representation of the program to stdout and exits 0 (on successful compilation).
-W exec file     Program text is read from file and this is the last option. This is a useful alternative to -f on systems that support the #! "magic number" convention for executable scripts. Those implicitly pass the pathname of the script itself as the final parameter, and expect no more than one "-" option on the #! line. Because mawk can combine multiple -W options separated by commas, you can use this option when an additional -W option is needed.
-W help          prints a usage message to stderr and exits (same as "-W usage").
-W interactive   sets unbuffered writes to stdout and line buffered reads from stdin. Records from stdin are lines regardless of the value of RS.
-W posix_space   forces mawk not to consider '\n' to be space.
-W random=num    calls srand with the given parameter (and overrides the auto-seeding behavior).
-W sprintf=num   adjusts the size of mawk's internal sprintf buffer to num bytes. More than rare use of this option indicates mawk should be recompiled.
-W usage         prints a usage message to stderr and exits (same as "-W help").
-W version       mawk writes its version and copyright to stdout and compiled limits to stderr and exits 0.
mawk accepts abbreviations for any of these options, e.g., “-W v” and “-Wv” both tell mawk to show its version.
mawk allows multiple -W options to be combined by separating the options with commas, e.g., -Wsprint=2000,posix. This is useful for executable #! "magic number" invocations in which only one argument is supported, e.g., -Winteractive,exec.
THE AWK LANGUAGE
1. Program structure
An AWK program is a sequence of pattern {action} pairs and user function definitions.
A pattern can be:
BEGIN
END
expression
expression , expression
One, but not both, of pattern {action} can be omitted. If {action} is omitted it is implicitly { print }. If pattern is omitted, then it is implicitly matched. BEGIN and END patterns require an action.
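The defaults for an omitted pattern or action can be sketched as follows. The sample data and the shell variable names are assumptions made for the example, and awk stands in for the installed implementation.

```shell
data='alpha 1
beta 22
gamma 3'

# Pattern only: the missing action defaults to { print }.
only_pattern=$(printf '%s\n' "$data" | awk '$2 > 2')

# Action only: the missing pattern matches every record.
only_action=$(printf '%s\n' "$data" | awk '{ print $1 }')

# END requires an action; here it reports the number of records read.
count=$(printf '%s\n' "$data" | awk 'END { print NR }')

printf '%s\n---\n%s\n---\n%s\n' "$only_pattern" "$only_action" "$count"
```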
Statements are terminated by newlines, semi-colons or both. Groups of statements such as actions or loop bodies are blocked via { ... } as in C. The last statement in a block doesn't need a terminator. Blank lines have no meaning; an empty statement is terminated with a semi-colon. Long statements can be continued with a backslash, \. A statement can be broken without a backslash after a comma, left brace, &&, ||, do, else, the right parenthesis of an if, while or for statement, and the right parenthesis of a function definition. A comment starts with # and extends to, but does not include, the end of line.
The following statements control program flow inside blocks.
if ( expr ) statement
if ( expr ) statement else statement
while ( expr ) statement
do statement while ( expr )
for ( opt_expr ; opt_expr ; opt_expr ) statement
for ( var in array ) statement
continue
break
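A short sketch exercising several of these statements inside a BEGIN action, so no input is read. All variable names are illustrative, and awk stands in for the installed implementation.

```shell
out=$(awk 'BEGIN {
    # for ( opt_expr ; opt_expr ; opt_expr ) statement
    for (i = 1; i <= 6; i++) {
        if (i == 2) continue    # skip to the next iteration
        if (i == 5) break       # leave the loop entirely
        sum += i                # adds 1, 3 and 4
    }
    # do statement while ( expr ): the body runs at least once
    do { n++ } while (n < 3)
    # for ( var in array ) statement visits each index once
    seen["a"] = 1; seen["b"] = 1
    for (k in seen) count++
    print sum, n, count
}')
echo "$out"
```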
2. Data types, conversion and comparison
There are two basic data types, numeric and string. Numeric constants can be integer like -2, decimal like 1.08, or in scientific notation like -1.1e4 or .28E-3. All numbers are represented internally and all computations are done in floating point arithmetic. So for example, the expression 0.2e2 == 20 is true and true is represented as 1.0.
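The comparison above can be checked directly. This sketch assumes the default output format for numbers; the variable names eq and small are invented for the example, and awk stands in for the installed implementation.

```shell
# eq holds the result of the comparison; adding 0.5 shows that the
# result is an ordinary number that takes part in arithmetic.
out=$(awk 'BEGIN { eq = (0.2e2 == 20); small = .28E-3 * 1000; print eq, eq + 0.5, small }')
echo "$out"
```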
String constants are enclosed in double quotes.
"This is a string with a newline at the end.\n"
Strings can be continued across a line by escaping (\) the newline. The following escape sequences are recognized.
\\     \
\"     "
\a     alert, ascii 7
\b     backspace, ascii 8
\t     tab, ascii 9
\n     newline, ascii 10
\v     vertical tab, ascii 11
\f     formfeed, ascii 12
\r     carriage return, ascii 13
\ddd   1, 2 or 3 octal digits for ascii ddd
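A small sketch of a string constant that uses several of these escapes. The text of the string is invented for illustration, and awk stands in for the installed implementation.

```shell
# \t is a tab, \n a newline, \" a double quote, \\ a single backslash,
# and \101 is octal for ascii 65, the letter A.
out=$(awk 'BEGIN { print "col1\tcol2\n\"quoted\" back\\slash \101" }')
echo "$out"
```

The string prints as two lines: the first holds col1 and col2 separated by a tab, and the second ends with the letter A.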

From Section 11 Paragraph 3:
“The fields are placed in A[1], A[2], …, A[n] and split() returns n, the number of fields which is the number of matches plus one.”
This is misleading or redundant. All that needs to be said here is that the field numbering starts with one and ends with “n”.
The fact that the number of matches +1 equals the number of fields is evident, in my opinion.
Cheers
Axel