
awk(C)


awk, oawk, nawk -- pattern scanning and processing language

Syntax

awk [ -F <field_sep> ] [ -v <varname>=<value> ] ...
{{-f <srcfile> | -e '<progtext>' } ... | -We <srcfile> | '<progtext>' }
{ <varname>=<value> | <file> } ...

Description

awk is an interpreted pattern-matching language with a wide range of applications. See Chapter 13, ``Using awk'' in the SCO OpenServer Operating System User's Guide for a complete discussion of its use.

You can enter an awk program '<progtext>' directly on the command line, enclosing it in single quotes so that the shell does not interpret it.


-e
optional flag used if the complete program is to be passed as a single argument on the command line.

-f
fetch the program from srcfile; this may be more convenient for longer awk programs.

You can specify multiple -e programs and -f files and can use both forms together; all of the specified programs will be concatenated together (with intervening newlines) to form the program that is executed. This is similar to the -e and -f options in sed(C).
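The following sketch splits one program across two hypothetical source files, setup.awk and main.awk, and passes both with -f, so that awk joins them into a single program. It uses only -f, on the assumption that -e may not be available in other awk implementations.

```shell
# Hypothetical source files, created in a scratch directory.
dir=$(mktemp -d)
printf 'BEGIN { FS = ":" }\n' > "$dir/setup.awk"   # first part: set the field separator
printf '{ print $2 }\n' > "$dir/main.awk"          # second part: the main rule

# Both files are concatenated into one program before it runs.
echo 'alpha:beta:gamma' | awk -f "$dir/setup.awk" -f "$dir/main.awk"
# prints: beta
```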


-We and -Wexec
special means of passing the name of a srcfile. -We must be the last option on the command line: the argument that follows it is taken as the srcfile, and any remaining arguments are passed to the program instead of being interpreted as options to awk. Thus the following two commands are equivalent:
   awk -We srcfile [arg ...]
   awk -f srcfile -- [arg ...]
awk, like many utilities, allows the use of "--" to terminate the option list.

-We is mostly used in an executable awk program file that uses the #! mechanism documented on the exec(M) manual page. For awk, the #! line typically looks like this:

   #!/usr/bin/awk -f
Such a line passes the following arguments to awk:
   -f srcfile [args ...]
where [args ...] are the arguments with which the program is invoked. This works properly if only filenames are passed. However, if the first argument begins with '-', awk interprets it as one of its own options rather than passing it to the program. This prevents an awk program from accepting POSIX-style options, which are introduced with '-'.

The -We construct avoids these problems. If a program begins with:

   #!/usr/bin/awk -We
the following is passed to awk:
   -We srcfile [arg ...]
which is equivalent to:
   awk -f srcfile -- [arg ...]
The -We form is needed because the #! mechanism appends the srcfile name after the arguments given on the #! line, so there is no way to place a "--" after srcfile. Because -We implies the "--", awk does not attempt to interpret any of the following arguments as options to itself.

For example, passing a -q option to a program that begins with "#!/usr/bin/awk -f" makes awk abort with an error, because -q is not an awk option. But passing -q to a program that begins with the following line works correctly:

   #!/usr/bin/awk -We
With the -We option, -q is stored in ARGV[1], where it can be interpreted by the awk program as required.

-Wexec is a synonym for -We.
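Since -We is specific to this awk, the sketch below uses the equivalent form awk -f srcfile -- [arg ...] to show how a program can handle its own option. The -q option and the program text are hypothetical. Deleting ARGV[1] after reading the option keeps awk from trying to open "-q" as an input file; when no file operands remain, awk reads the standard input.

```shell
# Hypothetical program: -q suppresses the per-line output.
prog=$(mktemp)
cat > "$prog" <<'EOF'
BEGIN {
    if (ARGV[1] == "-q") {   # an option for the program, not for awk
        quiet = 1
        delete ARGV[1]       # otherwise awk would open "-q" as a file
    }
}
!quiet { print NR ": " $0 }
END    { print NR " lines" }
EOF

# "--" ends awk's own options, so -q reaches the program in ARGV[1].
printf 'a\nb\n' | awk -f "$prog" -- -q   # prints: 2 lines
printf 'a\nb\n' | awk -f "$prog"         # prints: 1: a, 2: b, 2 lines
```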

The remaining command-line options to awk are:


-F field_sep
Set the field separator to be field_sep. field_sep can be any extended regular expression. You can also specify the field separator as a single character; this sets the field separator to be that character.

awk -F t is a special case that sets the field separator to a tab. (The field separator can also be changed within an awk program using the variable FS.)
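A short sketch of -F with a single character and with an extended regular expression; the sample data is made up for illustration.

```shell
# A single-character separator:
printf 'root:x:0\nbin:x:1\n' | awk -F: '{ print $1, $3 }'
# prints: root 0
#         bin 1

# An extended regular expression: split on either "," or ";"
echo 'a,b;c' | awk -F '[,;]' '{ print NF, $3 }'
# prints: 3 c
```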


argument
name of a file to be read, or an assignment of the form varname=value. Input files are read in the order given; if no files are given on the command line or argument is ``-'', awk reads from the standard input.

-v varname=value
Set the value of variables you are going to use in the awk program. varname is the variable name and value is its initial value. -v assignments take place before awk executes any statements of the program, including those associated with BEGIN.

Variable assignments that follow a filename take place after that file has been processed and before the next specified file is read.

An assignment placed before the first file argument is processed after the BEGIN actions. An assignment placed after the final file argument is processed before the END actions.

If there are no input files, awk executes the assignments before processing the standard input.
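The sketch below traces when each kind of assignment takes effect, using two scratch files (f1 and f2 are hypothetical names). The -v value is visible in BEGIN, while each b=... operand is applied only when awk reaches it in the argument list; the final one is applied before END.

```shell
dir=$(mktemp -d)
printf 'one\n' > "$dir/f1"
printf 'two\n' > "$dir/f2"

# Print the value of b at each stage of the run.
order_prog='
BEGIN { print "BEGIN: a=" a " b=" b }
      { print $0 ": b=" b }
END   { print "END: b=" b }'

awk -v a=1 "$order_prog" b=2 "$dir/f1" b=3 "$dir/f2" b=4
# prints: BEGIN: a=1 b=
#         one: b=2
#         two: b=3
#         END: b=4
```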

What awk does with your program

After awk checks the syntax of your program, it reads each record (generally, each line) of the input and attempts to match it against the patterns specified in the program. For each pattern in the program, there may be an associated action performed when an input record matches the pattern. Actions can be made up of a single action statement, like print, or of a combination of statements.

A pattern-action statement has the form:

pattern { action }

Either pattern or action may be omitted. If there is no action with a pattern, the matching line is printed. If there is no pattern with an action, the action is performed on every input line.
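A minimal illustration of the three combinations, using made-up log lines: a pattern with no action, an action with no pattern, and a full pattern-action statement plus an END rule.

```shell
log='ok start
error disk
ok done
error net'

printf '%s\n' "$log" | awk '/error/'              # pattern only: prints matching lines
printf '%s\n' "$log" | awk '{ print $2 }'         # action only: runs on every line
printf '%s\n' "$log" | awk '/error/ { n++ } END { print n }'   # prints: 2
```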

Programming conventions

Pattern-action statements, and individual statements within actions, generally begin on a new line.

The opening brace ``{'' must be on the same line as the pattern for which the actions should be performed. Multiple action statements may appear on a single line if they are separated by semicolons ``;''.

A newline can be hidden with a backslash ``\'', so you can use backslash-newline to continue a long line.

Comments in awk are introduced by a number sign ``#'' and end with the end of the line. Comments can appear anywhere in a line.

Blank lines and whitespace (blanks and tabs) in an awk program are ignored.
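A small program that exercises these conventions: a comment line, two statements separated by a semicolon, an opening brace on the same line as its pattern, and a backslash-newline continuation. The program text is kept in a shell variable for reuse.

```shell
count_prog='
# Comments run from "#" to the end of the line.
{ words += NF; lines++ }       # two statements on one line, separated by ";"
END {
    print "lines: " lines \
          ", words: " words    # backslash-newline continues the statement
}'

printf 'a b\nc\n' | awk "$count_prog"
# prints: lines: 2, words: 3
```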

Fields, records, and built-in variables

awk presumes that each field in a record is separated by whitespace, and that each record consists of one line of input. Both of these defaults can be modified.

You can change the field separator on the command line, as discussed earlier, using the -F field_sep option. You can also reset the value of the input field separator variable FS from within your awk program. FS can be set to any regular expression. Setting FS to a single blank is a special case that restores its default behavior:

   BEGIN { FS = " " }
The BEGIN in this example is a special pattern that matches before the first record is read; this is the mechanism awk provides for doing introductory processing.

Setting FS to a single blank is nearly equivalent to:

   BEGIN { FS = "[ \t]+" }
That is, setting FS to a single blank tells awk to regard any combination of blanks and tabs (any whitespace) as a field separator. The difference is that with the default single blank, leading whitespace (before the first field) is ignored. Once you set the input field separator to anything other than a single blank, including "[ \t]+", leading whitespace is no longer ignored and produces an empty first field.
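The difference in leading whitespace handling can be seen with a line that starts with blanks; the brackets in the output only mark where $1 begins and ends.

```shell
line='  a  b'

# Default FS (" "): leading blanks are ignored.
printf '%s\n' "$line" | awk '{ print NF ": [" $1 "]" }'
# prints: 2: [a]

# FS = "[ \t]+": the leading blanks now end an empty first field.
printf '%s\n' "$line" | awk 'BEGIN { FS = "[ \t]+" } { print NF ": [" $1 "]" }'
# prints: 3: []
```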

awk normally treats each line of input as a complete record. To get awk to recognize multiline records, set RS to the null string:

   BEGIN { RS = "" }
Now, awk will presume that records are separated by one or more blank lines. When you reset RS like this to use multiline records, newline is always considered a field separator, no matter what the value of FS is. To restore the default record separator, reset RS to a newline:
   { RS = "\n" }
You can address any field in the input record using the syntax $1, $2, etc., where $1 is the first field in a record, $2 is the second field, and so on. The entire record is referred to as $0.

Fields can also be referred to with expressions, for example relative to the number of fields. For a five-field record:

   $(NF - 2)
would refer to the third field. The NF in this example is a built-in variable awk provides that counts the number of fields in a current record. (Thus, $NF refers to the last field in the current record.)
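For instance, on a five-field record:

```shell
echo 'a b c d e' | awk '{ print NF, $(NF - 2), $NF }'
# prints: 5 c e
```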

The following list shows all the built-in variables in awk:

 +---------+--------------------------------------------------------+
 |Variable | Meaning                                                |
 +---------+--------------------------------------------------------+
 |ARGC     | number of command-line arguments plus 1                |
 +---------+--------------------------------------------------------+
 |ARGV     | array of command-line arguments (ARGV[0 ... ARGC-1])   |
 +---------+--------------------------------------------------------+
 |CONVFMT  | format for converting numbers to strings (default:     |
 |         | "%.6g"; see printf(S)). Not used by print, which uses  |
 |         | OFMT.                                                  |
 +---------+--------------------------------------------------------+
 |ENVIRON  | array of environment variables, indexed by the name of |
 |         | the variable                                           |
 +---------+--------------------------------------------------------+
 |FILENAME | name of current input file                             |
 +---------+--------------------------------------------------------+
 |FNR      | input record number in current file                    |
 +---------+--------------------------------------------------------+
 |FS       | input field separator (default: any whitespace)        |
 +---------+--------------------------------------------------------+
 |NF       | number of fields in current input record               |
 +---------+--------------------------------------------------------+
 |NR       | number of records read so far                          |
 +---------+--------------------------------------------------------+
 |OFMT     | output format for numbers (default: "%.6g"; see        |
 |         | printf(S)). Used by print.                             |
 +---------+--------------------------------------------------------+
 |OFS      | output field separator (default: blank)                |
 +---------+--------------------------------------------------------+
 |ORS      | output record separator (default: newline)             |
 +---------+--------------------------------------------------------+
 |RLENGTH  | length of string matched by match                      |
 +---------+--------------------------------------------------------+
 |RS       | input record separator (default: newline)              |
 +---------+--------------------------------------------------------+
 |RSTART   | index of first character matched by match              |
 +---------+--------------------------------------------------------+
 |SUBSEP   | separates multiple subscripts in array elements        |
 |         | (default: "\034")                                      |
 +---------+--------------------------------------------------------+
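Two sketches of built-in variables at work. The first shows print formatting a non-integer number with OFMT, while string concatenation converts it with CONVFMT; the second shows FNR restarting for each file while NR keeps counting (f1 and f2 are hypothetical scratch files).

```shell
awk 'BEGIN {
    OFMT = "%.2f"; CONVFMT = "%.3f"
    x = 3.14159
    print x           # number: formatted with OFMT
    print x ""        # concatenation converts with CONVFMT first
}'
# prints: 3.14
#         3.142

dir=$(mktemp -d)
printf 'a\nb\n' > "$dir/f1"
printf 'c\n' > "$dir/f2"
awk '{ print FNR, NR, $0 }' "$dir/f1" "$dir/f2"
# prints: 1 1 a
#         2 2 b
#         1 3 c
```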

Patterns

Patterns can be any of the following:

BEGIN
END
/expr/
relational expression
pattern1 && pattern2
pattern1 || pattern2
(pattern)
!pattern
pattern1,pattern2

BEGIN and END match before the first line is read, and after the last line has been read, respectively.

All other patterns can contain extended regular expressions. See regexp(M) for the pattern-matching syntax of extended regular expressions. (In the following discussion, extended regular expressions will be referred to simply as regular expressions.)

You can create a string matching pattern using a regular expression in one of three ways:


/regexpr/
This will match the current record if regexpr is contained anywhere in the current record.

expression ~ /regexpr/
This will match if regexpr is contained anywhere in the string value of expression.

expression !~ /regexpr/
This will match if regexpr is not contained anywhere in the string value of expression.
A relational expression is made up of two numeric or string expressions compared with one of the following operators:

 +---------+--------------------------+
 |Operator | Meaning                  |
 +---------+--------------------------+
 |<        | less than                |
 +---------+--------------------------+
 |<=       | less than or equal to    |
 +---------+--------------------------+
 |>        | greater than             |
 +---------+--------------------------+
 |>=       | greater than or equal to |
 +---------+--------------------------+
 |==       | equal to                 |
 +---------+--------------------------+
 |!=       | not equal to             |
 +---------+--------------------------+

awk performs the comparison numerically if both operands are numeric, or if one is numeric and the other is a string with a numeric value. Otherwise, both arguments are converted to strings and compared character by character according to the sort (collation) order of the current locale (as determined by the environment variables LANG and LC_COLLATE). One string is less than another if it would appear earlier in the collation order.
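A sketch of the two kinds of comparison: input fields that look like numbers compare numerically, while two string constants compare character by character, so "10" sorts before "9".

```shell
echo '10 9' | awk '{
    num = ($1 > $2)       # fields with numeric values: compared as numbers
    str = ("10" > "9")    # string constants: compared as strings
    print num, str
}'
# prints: 1 0
```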

Patterns can be joined using the logical operators && (AND) and || (OR). When patterns are joined like this, the pattern matches the current record if the entire pattern evaluates to true (non-zero or non-null). A pattern can be negated using the ! logical NOT operator. Parentheses may be used for grouping patterns.

The following pattern-matching expressions are available:


pattern1 && pattern2
This matches a record when both pattern1 and pattern2 match the record.

pattern1 || pattern2
This matches a record when either pattern1 or pattern2 matches the record.

!pattern
This means the expression ``does not match pattern.'' That is, !pattern matches every record that is not matched by pattern.

pattern1, pattern2
This defines a matching range. The accompanying action is performed for all records from a record that matches pattern1 through the next record that matches pattern2, inclusive. (The action is performed for the lines matching pattern1 and pattern2, as well as all the lines in between.) After a range ends, awk looks for pattern1 again, so a range can match more than once.
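A sketch of a range pattern and of a combined pattern using &&, ! and relational expressions; the input lines are made up.

```shell
# Range: from a line matching /START/ through the next line matching /END/.
printf 'intro\nSTART\none\nEND\noutro\n' | awk '/START/, /END/'
# prints: START
#         one
#         END

# Combined: values from 2 to 4, excluding 3.
printf '%s\n' 1 2 3 4 5 | awk '$1 >= 2 && $1 <= 4 && !($1 == 3)'
# prints: 2
#         4
```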

Actions

The actual work your awk program does occurs in the action part of the program.

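Action statements can be made up of variables, assignments, and arithmetic, relational, logical, and conditional expressions, together with the flow-of-control, output, and input statements described in the following sections.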

Variables in awk are not explicitly declared; they simply spring into existence when they are first used. awk determines from the context whether a variable is numeric or string. Numeric variables are automatically initialized to 0; string variables are automatically initialized to the empty string (""). (See ``Number or string'' below, and Chapter 13, ``Using awk'' in the SCO OpenServer Operating System User's Guide for more information about variable types and type coercion in awk.)

Values are assigned to variables in the usual way in awk. For example:

   a = 100
creates a numeric variable a with the value ``100''. You can assign several variables in a single statement. For example:
   water = oil = "wet"
creates two string variables, water and oil, and sets them both to contain the string ``wet''.

Assignment operators are evaluated from right to left.

The following assignment operators are available; the shorthand assignment notation is borrowed from the C programming language:

 +---------+-----------------------------------------------------------------+
 |Operator | Meaning                                                         |
 +---------+-----------------------------------------------------------------+
 |a=b      | set a equal to b                                                |
 +---------+-----------------------------------------------------------------+
 |a+=b     | set a equal to a + b                                            |
 +---------+-----------------------------------------------------------------+
 |a-=b     | set a equal to a - b                                            |
 +---------+-----------------------------------------------------------------+
 |a*=b     | set a equal to a * b                                            |
 +---------+-----------------------------------------------------------------+
 |a/=b     | set a equal to a / b                                            |
 +---------+-----------------------------------------------------------------+
 |a%=b     | set a equal to a % b; a becomes the remainder of a divided by b |
 +---------+-----------------------------------------------------------------+
 |a^=b     | set a equal to a ^ b; a is raised to the power b                |
 +---------+-----------------------------------------------------------------+

awk offers the usual arithmetic operators: ``+'' (add), ``-'' (subtract), ``*'' (multiply), ``/'' (divide), ``%'' (modulo; divide and give remainder), ``^'' (exponentiation; ``**'' is a synonym). The unary ``+'' (plus) and ``-'' (minus) are also available.

All arithmetic in awk is done in floating point.
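A sketch of the shorthand assignment and arithmetic operators. Because arithmetic is done in floating point, 7 / 2 yields 3.5 rather than 3.

```shell
awk 'BEGIN {
    a = 10
    a += 5; a *= 2; a %= 7       # 15, then 30, then 30 % 7 = 2
    print a
    print 7 / 2, 7 % 3, 2 ^ 10   # floating-point division gives 3.5
    water = oil = "wet"          # right to left: oil first, then water
    print water, oil
}'
# prints: 2
#         3.5 1 1024
#         wet wet
```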

Relational expressions in action statements use the same operators as relational expressions in patterns; consult the relational operators table in ``Patterns'' above.

The logical AND and logical OR (&& and ||) are also available, as well as the logical NOT (!, as in !expr).

There is also a conditional operator, ``?:'':

expression1 ? expression2 : expression3

expression1 is evaluated, and if it is non-empty and non-zero, the whole expression has the value of expression2. Otherwise, it has the value of expression3.

Variables can be incremented using prefix or postfix notation, as in C. x++ and ++x are both equivalent to x = x + 1, and both x-- and --x are equivalent to x = x-1. The difference between prefix (++x) and postfix (x++) is when x assumes its new value. In prefix notation, x is immediately incremented; in postfix notation, the current value of x is used and then x is incremented.
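A sketch contrasting postfix and prefix increment, followed by the conditional operator.

```shell
awk 'BEGIN {
    x = 5
    a = x++          # postfix: a gets 5, then x becomes 6
    b = ++x          # prefix: x becomes 7, then b gets 7
    size = (x > 6) ? "big" : "small"
    print a, b, x, size
}'
# prints: 5 7 7 big
```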

Parentheses can be used to alter the order of evaluation in arithmetic and relational expressions.

The following table of precedence shows all the available action statement operators and the order in which they are evaluated. The table is in decreasing order of precedence; operators higher in the table are evaluated before operators lower in the table.

 +-----------------------+-------------------------------------------+
 |Operator               | Meaning                                   |
 +-----------------------+-------------------------------------------+
 |$                      | field                                     |
 +-----------------------+-------------------------------------------+
 |++ --                  | increment, decrement (prefix and postfix) |
 +-----------------------+-------------------------------------------+
 |^                      | exponentiation (** is a synonym)          |
 +-----------------------+-------------------------------------------+
 |!                      | logical negation                          |
 +-----------------------+-------------------------------------------+
 |+ -                    | unary plus, unary minus                   |
 +-----------------------+-------------------------------------------+
 |* / %                  | multiply, divide, mod                     |
 +-----------------------+-------------------------------------------+
 |+ -                    | add, subtract                             |
 +-----------------------+-------------------------------------------+
 |(no explicit operator) | string concatenation                      |
 +-----------------------+-------------------------------------------+
 |< <= > >= != ==        | relationals                               |
 +-----------------------+-------------------------------------------+
 |~ !~                   | regular expression match, negated match   |
 +-----------------------+-------------------------------------------+
 |in                     | array membership                          |
 +-----------------------+-------------------------------------------+
 |&&                     | logical AND                               |
 +-----------------------+-------------------------------------------+
 |||                     | logical OR                                |
 +-----------------------+-------------------------------------------+
 |?:                     | conditional expression                    |
 +-----------------------+-------------------------------------------+
 |= += -= *= /= %= ^=    | assignment                                |
 +-----------------------+-------------------------------------------+

All of these operators are evaluated from left to right (they are left-associative), except for the assignment operators, the conditional expression operator, and exponentiation, which are evaluated from right to left (they are right-associative).
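A few expressions whose results depend on the precedence and associativity rules in the table above (** is left out here, since it may not be accepted by every awk).

```shell
awk 'BEGIN {
    print 2 ^ 3 ^ 2      # right-associative: 2 ^ (3 ^ 2) = 512
    print -2 ^ 2         # ^ binds tighter than unary minus: -(2 ^ 2)
    print 1 " " 2 + 3    # + binds tighter than concatenation: 1 " " 5
}'
# prints: 512
#         -4
#         1 5
```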

Arrays

One-dimensional arrays are available in awk. Like other variables in awk, arrays and array elements do not need to be declared; they come into existence upon their first use.

awk allows you to use strings as array subscripts; arrays that do this are called associative arrays. This lets you group together data quite simply.

For example, a data file lists employee names, department names, and the number of sick days the employee has taken:

   Steve           Engineering     2
   Chris           Engineering     1
   Susannah        Documentation   0
   Vipin           Sales           2
   Connie          Marketing       3
   Matt            Documentation   1
   Nancy           Sales           1
   Nigel           Documentation   0
The first field, $1, contains the employee name; the second field, $2, contains the department, and the third field, $3, contains the number of sick days for that employee.

To accumulate the number of sick days in each department:

   { sickness[$2] += $3 }
This creates the array sickness, which uses the values in the second field (``Engineering'', ``Documentation'', ``Sales'', and ``Marketing'') as its subscripts. The sick day totals in the third field are then collected under the appropriate subscript.
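The accumulation above can be completed with an END rule that prints each department's total. The END rule and the for (d in sickness) loop are additions for illustration; because that loop visits subscripts in an unspecified order, the output is piped through sort.

```shell
# The sample data from the text.
sick_days='Steve           Engineering     2
Chris           Engineering     1
Susannah        Documentation   0
Vipin           Sales           2
Connie          Marketing       3
Matt            Documentation   1
Nancy           Sales           1
Nigel           Documentation   0'

# Sum field 3 per department (field 2), then print each total.
dept_totals() {
    awk '{ sickness[$2] += $3 }
         END { for (d in sickness) print d, sickness[d] }'
}

printf '%s\n' "$sick_days" | dept_totals | LC_ALL=C sort
# prints: Documentation 1
#         Engineering 3
#         Marketing 3
#         Sales 3
```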


for (i in arr) statement
This construct does statement for every subscript i in the array arr. Subscripts are visited in an unspecified, implementation-dependent order. If the value of i is changed within statement, unpredictable results may occur.

split(string,arr,fs)
This function splits string into elements of the array arr, using fs as the field separator (if fs is omitted, the current value of FS is used). The first component of string is stored in arr[1], the second in arr[2], and so on. The return value is the number of fields.

delete arr[subscript]
Elements can be deleted from an array with this delete statement. After this is done, arr[subscript] no longer exists.

awk does not support multi-dimensional arrays, but this can be simulated by using a list of subscripts; see Chapter 13, ``Using awk'' in the SCO OpenServer Operating System User's Guide for details.
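A sketch of split, delete, the in operator, and a simulated two-dimensional subscript. With grid[1, 2], the two subscripts are joined with SUBSEP into a single string key.

```shell
awk 'BEGIN {
    n = split("2024-01-15", part, "-")   # part[1]="2024", part[2]="01", part[3]="15"
    print n, part[1], part[3]
    delete part[2]
    if (2 in part) print "part[2] exists"; else print "part[2] deleted"
    grid[1, 2] = "x"                     # key is "1" SUBSEP "2"
    if ((1, 2) in grid) print "grid[1,2] =", grid[1, 2]
}'
# prints: 3 2024 15
#         part[2] deleted
#         grid[1,2] = x
```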


Flow of control

awk uses branching and looping statements borrowed from the C programming language. In all the following constructs, a single statement can be replaced by a statement list enclosed in braces { }.

Each statement in a statement list should begin on a new line or after a semicolon.

The following constructs are available:


if (expression) statement1 [else statement2]
If expression is non-zero and non-empty, do statement1; otherwise, do statement2. The ``else statement2'' is optional. If there are several ifs together with an else, the else belongs with the nearest preceding if.

while (expression) statement
When expression is non-zero and non-empty, statement is executed.

for (expression1; expression2; expression3) statement
This is a generalized form of the while statement. It is the same as:

   expression1
   while (expression2) {
      statement
      expression3
   }

All three expressions are optional. This form is often used to loop based on the value of a counter: expression1 initializes the counter, expression2 is the test, and expression3 increments the counter. While expression2 is non-empty and non-zero, statement is executed.

do statement while (expression)

statement is repeatedly executed until expression becomes null or 0.

The break, continue, and next statements can be used to break out of loops that would otherwise keep going. break drops out of the innermost while, for, or do loop. continue causes the next iteration of the loop to begin. Execution will go to the test expression in a while or do loop, and to expression3 in a for loop. next reads the next record and starts the main input loop again.

exit skips any remaining input and goes straight to the END actions, if there are any. If exit occurs in an END action, the program itself exits. If a numeric expression is given after exit, its value is taken as the exit status of the awk program.
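A sketch combining a for loop with continue, next, and exit with a status; the input numbers are made up. The line 7 is never read, because exit stops input processing, but the END action still runs.

```shell
status=0
out=$(printf '3\n# skip me\n5\n7\n' | awk '
/^#/ { next }                    # skip to the following record
{
    sum = 0
    for (i = 1; i <= $1; i++) {  # for (expression1; expression2; expression3)
        if (i % 2 == 0)
            continue             # skip even numbers
        sum += i
    }
    print $1 ": odd sum", sum
}
$1 >= 5 { exit 3 }               # stop reading input; END still runs
END { print "done" }') || status=$?

printf '%s\n' "$out"             # 3: odd sum 4 / 5: odd sum 9 / done
echo "exit status: $status"      # exit status: 3
```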

Output

The print and printf statements are used to write output in awk. These are as follows:

print expression1,expression2, ...,expressionn
This prints the string value of each expression separated by the output field separator, followed by the output record separator. Without the commas, the expressions are concatenated. Any non-integer numeric arguments are converted to strings using the built-in variable OFMT; integer values are converted using the %d format as defined for printf(S).

print by itself is an abbreviation for print $0.

To print an empty line use:

print ""

printf format, expression1, expression2, ... , expressionn
The printf function in awk is like printf(S) in the C programming language. format can be made up of ordinary characters, which are printed as-is; escaped special characters, such as tab (\t) or newline (\n); and format keyletters that specify how to print the expressions following the format. Each keyletter is introduced by a ``%'', which can be followed, before the keyletter, by an instruction to left-justify the expression in its field (``-''), a width specification, and a precision. If the field width or precision is specified as ``*'', then the next argument in the expression list is converted to an integer and used as the field width or precision value.

The expressions are matched to the keyletters in order: the first keyletter formats expression1, the second formats expression2, and so on.


If a print or printf statement includes an expression with the greater-than operator (>), this expression should be enclosed in parentheses to avoid confusion between the greater-than operator and redirection into a file. For example:
   { print $0 $2 > $3 }
This statement means ``print the record and then field 2 into a file named by field 3,'' while:
   { print $0 ($2 > $3) }
means ``print the record, followed by a 1 if field 2 is greater than field 3, or a 0 if it is not.''

printf keyletters are:

 +----------+--------------------------------------------------------+
 |Keyletter | Prints expression as                                   |
 +----------+--------------------------------------------------------+
 |%c        | the ASCII character referred to by the least           |
 |          | significant 8 bits of the numeric value of expression; |
 |          | truncates expression to the nearest integer. If the    |
 |          | argument is a string, the first character of the       |
 |          | string is used.                                        |
 +----------+--------------------------------------------------------+
 |%d        | a decimal integer; truncates expression to the nearest |
 |          | integer                                                |
 +----------+--------------------------------------------------------+
 |%e        | scientific notation using the form [-]d.ddddddE[+-]dd  |
 +----------+--------------------------------------------------------+
 |%f        | decimal notation using the form [-]ddd.dddddd          |
 +----------+--------------------------------------------------------+
 |%g        | the shorter of e or f conversion, with non-significant |
 |          | zeros suppressed                                       |
 +----------+--------------------------------------------------------+
 |%o        | an unsigned octal number                               |
 +----------+--------------------------------------------------------+
 |%s        | a string                                               |
 +----------+--------------------------------------------------------+
 |%x        | an unsigned hexadecimal number                         |
 +----------+--------------------------------------------------------+
 |%%        | prints a ``%'', no expression is converted             |
 +----------+--------------------------------------------------------+
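A sketch of several keyletters with width, precision, and left-justification. The 0 flag used with %03d is taken from C's printf(S) and is not described above.

```shell
awk 'BEGIN {
    # "-" left-justifies; 5.2 is width 5, precision 2; 03 pads with zeros.
    printf "%-6s|%5.2f|%03d|%o|%x|%c|%c|%%\n", "ab", 3.14159, 7, 8, 255, 65, "hello"
    printf "%e\n", 1234.5
}'
# prints: ab    | 3.14|007|10|ff|A|h|%
#         1.234500e+03
```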

The following escape sequences are recognized within regular expressions and strings:

 +----------------+-----------------+
 |Escape sequence | Meaning         |
 +----------------+-----------------+
 |\\              | backslash       |
 +----------------+-----------------+
 |\a              | alert or bell   |
 +----------------+-----------------+
 |\b              | backspace       |
 +----------------+-----------------+
 |\f              | formfeed        |
 +----------------+-----------------+
 |\n              | newline         |
 +----------------+-----------------+
 |\r              | carriage return |
 +----------------+-----------------+
 |\t              | horizontal tab  |
 +----------------+-----------------+
 |\v              | vertical tab    |
 +----------------+-----------------+
 |\ddd            | octal value ddd |
 +----------------+-----------------+

Backslash followed by any other character is replaced by that character.

Output can be redirected into files using:

> filename

and

>> filename

A file is opened only once, the first time it is named in a redirection; later output to the same filename goes to the already open file. With the first form, a filename that already exists is truncated when it is opened, and a filename that does not exist is created. The second form appends output to filename.

To send output to a pipe, use:

| command-line

where command-line is the command line to which you want to send the output. Filenames and command lines can be expressions, variables, or literal filenames or command lines. If you want to use a literal filename or command line, you must enclose it in double quotes, otherwise awk will treat it as a variable.

There is a limit to how many files and pipes an awk program can have open at once (see ``Limitations'' below). Use the close function to close files or pipes:

close(filename)
close(command-line)

where filename or command-line is the open file or pipe.
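
As a sketch of output pipes and close, the program below sends the first field of each line through sort and then prints a trailer. Closing the pipe makes awk wait for sort to finish, so the trailer follows the sorted lines; without the close, the trailer could appear first. The string given to close must match the command string used with | exactly. This assumes a POSIX shell with awk and sort; the function name sort_then_trailer is hypothetical.

```shell
# Hypothetical helper: sort field 1 through a pipe, then print a trailer.
sort_then_trailer() {
    awk '{ print $1 | "sort" }
         END {
             close("sort")              # wait for sort to write its output
             print "-- end of list --"
         }'
}

printf 'pear\napple\nfig\n' | sort_then_trailer
```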

Input

awk provides the getline function to read in successive lines of input from a file or a pipe.

getline
getline by itself takes the next record of input as $0 and sets NF, NR, and FNR.

getline <file
The next record from file becomes $0; NF is set.

getline var
The next record of input is placed in var; NR and FNR are set.

getline var <file
The next record in file is placed in var.

command | getline
The output of command is piped to getline. $0 and NF are set.

command | getline var
The output of command is piped to getline and stored in var.
All forms of getline return 1 for successful input, 0 for end of file, and -1 for an error.
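
A minimal sketch of the command | getline form: the program runs a command, reads its first line of output into $0, and reports the first field and NF. It assumes a POSIX shell with awk; the function name first_field_of is hypothetical.

```shell
# Hypothetical helper: read the first output line of a command into $0.
first_field_of() {
    awk -v cmd="$1" 'BEGIN {
        if ((cmd | getline) > 0)   # sets $0 and NF
            print $1, NF
        close(cmd)
    }'
}

first_field_of 'echo alpha beta gamma'   # prints: alpha 3
```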

To read input from a file until the file runs out, use:

   while ( ( getline x < file ) > 0) { ... }
The ``> 0'' is needed so that the loop also stops when getline returns -1 for an error (for example, if file cannot be opened). Without it, the while loop would treat -1 as true, since it is nonzero, and would never terminate.
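
The loop above can be sketched as a complete program that counts the lines of a file from a BEGIN action. Because the test uses ``> 0'', a file that cannot be opened yields a count of 0 instead of an endless loop. This assumes a POSIX shell with awk; the function name count_lines is hypothetical.

```shell
# Hypothetical helper: count the lines in the file named by $1.
count_lines() {
    awk -v file="$1" 'BEGIN {
        n = 0
        while ((getline line < file) > 0)   # stops on 0 (EOF) and -1 (error)
            n++
        close(file)
        print n
    }'
}
```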

Functions

The following arithmetic functions are built into awk:

 +-----------+--------------------------------------------------+
 |Function   | Returns                                          |
 +-----------+--------------------------------------------------+
 |atan2(y,x) | arctangent of y/x in the range -pi to pi         |
 +-----------+--------------------------------------------------+
 |cos(x)     | cosine of x, with x in radians                   |
 +-----------+--------------------------------------------------+
 |exp(x)     | exponential function of x, e^x                   |
 +-----------+--------------------------------------------------+
 |int(x)     | integer part of x, truncated toward 0            |
 +-----------+--------------------------------------------------+
 |log(x)     | natural (base e) logarithm of x                  |
 +-----------+--------------------------------------------------+
 |rand()     | random number r, where 0 <= r < 1                |
 +-----------+--------------------------------------------------+
 |sin(x)     | sine of x, with x in radians                     |
 +-----------+--------------------------------------------------+
 |sqrt(x)    | square root of x                                 |
 +-----------+--------------------------------------------------+
 |srand()    | set the seed for rand() from the time of day     |
 +-----------+--------------------------------------------------+
 |srand(x)   | x is new seed for rand()                         |
 +-----------+--------------------------------------------------+
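
The sketch below exercises a few of these functions: atan2(0, -1) yields pi, int truncates toward 0 for negative arguments too, and srand with a fixed seed makes rand repeatable within one awk implementation. It assumes a POSIX shell with awk; the function names pi5, trunc, and die_roll are hypothetical.

```shell
# Hypothetical helpers built on awk arithmetic functions.
pi5()      { awk 'BEGIN { printf "%.5f\n", atan2(0, -1) }'; }   # atan2(0,-1) is pi
trunc()    { awk -v x="$1" 'BEGIN { print int(x) }'; }           # toward 0
die_roll() { awk -v seed="$1" 'BEGIN { srand(seed); print int(rand() * 6) + 1 }'; }

pi5            # 3.14159
trunc -3.7     # -3
die_roll 42    # a value from 1 to 6; the same seed repeats it
```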

The string functions are:


gsub(r,s,t)
Globally substitutes the string s for the regular expression r in the string t. If t is omitted, substitutions are made in the current record ($0). The number of substitutions is returned.

The & character has special meaning in the s substitution string: it is replaced with the text in the t string that is matched by the regular expression r. For example, if the var variable contains the string ``fob'', the following substitutions have these results:

 Statement                Resulting value of var
 ------------------------------------------------
 sub (/[bo]/,"x&y",var)   fxoyb
 gsub(/[bo]/,"x&y",var)   fxoyxby

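To get a literal &, precede it with a backslash (\) in the substitution string. In a string constant, write "\\&": the first backslash is consumed when the constant is read, leaving \& in the substitution string.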


index(s,t)
Returns the position in string s where string t first occurs, or 0 if it does not occur at all.

length(s)
Returns the length of its argument taken as a string, or of the whole record if there is no argument.

match(s,re)
Returns the position in string s where the regular expression re first occurs, or 0 if it does not occur at all. RSTART is set to the starting position (which is the same as the returned value), and RLENGTH is set to the length of the matched string; if there is no match, RSTART is 0 and RLENGTH is -1.

split(s,a,fs)
Splits the string s into array elements a[1], a[2], ..., a[n], and returns n. The separation is done with the regular expression fs, or with the field separator FS if fs is not given.

sprintf(format, expression, expression, ...)
Formats the expressions according to the printf format and returns the resulting string.

sub(r,s,t)
Substitutes the string s in place of the first instance of the regular expression r in string t and returns the number of substitutions. If t is omitted, awk substitutes in the current record ($0).

The & character has special meaning in the s substitution string: it is replaced with the text in the t string that is matched by the regular expression r. See the gsub description for examples.


substr(s,p)
Returns the suffix of s starting at position p.

substr(s,p,n)
Returns the n-character substring of s that begins at position p.

tolower(s)
Returns a copy of the string s with uppercase letters converted to lowercase.

toupper(s)
Returns a copy of the string s with lowercase letters converted to uppercase.
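
The sketch below applies the string functions described above to fixed inputs, including the sub and gsub cases from the table and a literal & written as "\\&". Each print is commented with the output it produces. It assumes a POSIX shell with awk; the function name string_demo and all sample strings are hypothetical.

```shell
# Hypothetical helper: apply several awk string functions to fixed inputs.
string_demo() {
    awk 'BEGIN {
        var = "fob"; sub(/[bo]/, "x&y", var); print var          # fxoyb
        var = "fob"; n = gsub(/[bo]/, "x&y", var); print var, n  # fxoyxby 2
        s = "salt pepper"; gsub(/ /, " \\& ", s); print s        # salt & pepper
        print index("banana", "nan"), length("banana")           # 3 6
        if (match("foo123bar", /[0-9]+/)) print RSTART, RLENGTH  # 4 3
        n = split("a:b:c", parts, ":"); print n, parts[1], parts[n]  # 3 a c
        print substr("awkward", 1, 3), substr("awkward", 4)      # awk ward
        print toupper("awk"), tolower("AWK")                     # AWK awk
        print sprintf("%05.1f", 3.14159)                         # 003.1
    }'
}

string_demo
```
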

awk provides the system function for running commands:

system(command-line)

This executes command-line and returns its exit status.
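
A minimal sketch of system: the program runs a command and prints the status that system returns. A successful command gives 0; the exact nonzero value for a failing command may differ between awk implementations. This assumes a POSIX shell with awk; the function name run_status is hypothetical.

```shell
# Hypothetical helper: run a command with system() and print its status.
run_status() {
    awk -v cmd="$1" 'BEGIN { status = system(cmd); print "status", status }'
}

run_status true     # status 0
run_status false    # a nonzero status
```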

You can define your own functions in awk. The syntax for this is:

function name(parameter-list) {
statements
}

name is the name of the function. parameter-list is a comma-separated list of variable names that receive the arguments with which the function is called; statements are the action statements that make up the body of the function.

Function definitions can appear anywhere a pattern-action statement can appear. Recursion is permitted within user-defined functions; that is, a function may call itself directly or indirectly.

Variables passed to functions as arguments are copied, and the function manipulates the copy; that is, these variables are passed by value. The exception is arrays, which are passed by reference: the function manipulates the caller's actual array elements, so array elements can be permanently altered, created, or deleted within a function.

Missing function arguments are set to null; extra arguments are ignored.

To define a return value for your function, you must include a statement:

return expression

where expression is the value you want your function to return. expression here is optional; if you leave it out, control will be returned to the caller of the function, but the return value will be undefined. The return statement itself is optional as well.

The formal parameters of a function (the argument list) are local to that function, but any other variables are global. You can use the argument list as a way of creating variables local only to the function; like other variables in awk these will be automatically initialized with null values.
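
The rules above can be sketched in one program: a recursive function, an array passed by reference, a scalar passed by value, and an extra parameter used as a local variable. It assumes a POSIX shell with awk; the function names func_demo, fact, fill, and bump are hypothetical.

```shell
# Hypothetical helper: recursion, array-by-reference, scalar-by-value,
# and the extra-parameter idiom for local variables.
func_demo() {
    awk '
    # Recursive factorial.
    function fact(n) { return (n <= 1) ? 1 : n * fact(n - 1) }

    # Arrays are passed by reference, so fill() changes the array
    # passed by the caller. The extra parameter i is a local variable.
    function fill(arr, n,    i) { for (i = 1; i <= n; i++) arr[i] = i * i }

    # Scalars are passed by value, so bump() changes only its copy.
    function bump(x) { x++; return x }

    BEGIN {
        print fact(5)                            # 120
        fill(sq, 3); print sq[1], sq[2], sq[3]   # 1 4 9
        v = 10; w = bump(v); print v, w          # 10 11
        i = "global"; fill(tmp, 2); print i      # global
    }'
}

func_demo
```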

Number or string?

In awk, variables come into being when they are used; there is no declaration of a variable. Therefore, you do not declare the type of a variable as a string or a number. Instead, awk assumes the type of a variable from its context.

In an assignment statement, such as v=e, the type of v becomes the type of e. When the context is ambiguous, awk determines the types when the program runs.

In comparisons, awk compares both operands as numbers if both are numeric, or if one is numeric and the other is a numeric string. Otherwise, the operands are compared as strings. (A string is greater than another string if it comes later in the sort sequence, and less than another string if it comes earlier in the sort sequence.)

All field variables are of type string; in addition, each field can be considered to have a numeric value (that is, the numeric value of a string). The numeric value of a string is the value of the longest prefix of a string that looks numeric. For example, if a field contains the string ``123abc'', the numeric value of this would be 123.

The value of a variable in awk is initially 0 or the string "".

You can force a variable of one type to become another type; this is known as type coercion. To force a number to a string, use:

number ""

This concatenates the null string to number.

To force a string to a number, use:

string + 0

For more information about variable types, see Chapter 13, ``Using awk'' in the SCO OpenServer Operating System User's Guide.
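
The sketch below illustrates these comparison and coercion rules on one input line. Fields that look numeric compare as numbers, string constants compare as strings, concatenating "" forces a string comparison, adding 0 takes the numeric prefix of a string, and an unassigned variable equals both 0 and "". It assumes a POSIX shell with awk; the function name type_demo is hypothetical.

```shell
# Hypothetical helper: show number-versus-string comparisons.
type_demo() {
    echo "10 9 123abc" | awk '{
        print ($1 < $2)              # numeric-looking fields: 10 < 9 is false
        print ("10" < "9")           # string constants compare as strings
        print (($1 "") < ($2 ""))    # concatenating "" forces a string compare
        print $3 + 0                 # numeric value of "123abc" is 123
        a = (x == 0); b = (x == "")  # x was never assigned
        print a, b
    }'
}

type_demo
```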

Exit values

awk returns the following values:

0
successful completion

>0
an error occurred
The value returned may be altered within an awk program by using the exit statement.
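
A minimal sketch of setting the exit value with the exit statement. An exit in a main rule still runs the END action, which can then choose the final status. It assumes a POSIX shell with awk; the function names exit_with and has_match are hypothetical.

```shell
# Hypothetical helper: exit with the status given in $1.
exit_with() {
    awk -v code="$1" 'BEGIN { exit code }'
}

# Hypothetical helper: exit 0 if some input line matches the regular
# expression in $1, 1 otherwise.
has_match() {
    awk -v pat="$1" '$0 ~ pat { found = 1; exit } END { exit !found }'
}

exit_with 3 || echo "exit status $?"      # exit status 3
printf 'a\nb\n' | has_match b && echo found
```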

Examples

The following examples are all individual awk programs; to try them out, you will need to put them in a file and call the file with awk -f, or enclose them in single quotes on the awk command line.

To print lines longer than 72 characters:

   length > 72
To print only the first two fields in the opposite order:
   { print $2, $1 }
To print the same, with input fields separated by commas and/or blanks and tabs:
   BEGIN	{ FS = ",[ \t]*|[ \t]+" }
   	{ print $2, $1 }
To add up the first column, and print the sum and the average:
   	{ s += $1 }
   END	{if ( NR > 0 )  print "sum is",  s, " average is", s/NR }
To print fields in reverse order (on separate lines):
   { for (i = NF; i > 0; --i) print $i }
To print all lines between start/stop pairs:
   /start/, /stop/
To print all lines whose first field is different from the previous one:
   $1 != prev { print; prev = $1 }
To simulate echo(C):
   BEGIN 	{
   	for (i = 1; i < ARGC; i++)
   	         printf "%s ", ARGV[i]
   	printf "\n"
   	exit
   	}
To simulate env(C):
   BEGIN 	{
   	for (e in ENVIRON)
   		print e "=" ENVIRON[e]
   	}
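
The sketch below runs the sum-and-average example both ways described at the start of this section: from a file with awk -f, and inline in single quotes. It assumes a POSIX shell with awk and mktemp; the function names sum_avg_file and sum_avg_inline are hypothetical.

```shell
# Hypothetical helper: write the program to a temporary file, run it
# with awk -f, and remove the file afterwards.
sum_avg_file() {
    prog=$(mktemp) || return 1
    printf '%s\n' '{ s += $1 }' \
        'END { if (NR > 0) print "sum is", s, " average is", s/NR }' > "$prog"
    awk -f "$prog"
    status=$?
    rm -f "$prog"
    return "$status"
}

# Hypothetical helper: the same program given inline in single quotes.
sum_avg_inline() {
    awk '{ s += $1 } END { if (NR > 0) print "sum is", s, " average is", s/NR }'
}

printf '1\n2\n3\n4\n' | sum_avg_file      # sum is 10  average is 2.5
printf '1\n2\n3\n4\n' | sum_avg_inline    # same output
```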

Limitations

The following limits exist in this implementation of awk:

 100 fields
3000 characters per input record
3000 characters per output record
3000 characters per field
3000 characters per printf string
 400 characters per literal string or regular expression
 250 characters per character class
  55 open files or pipes
double precision floating point

Numbers are limited to what can be represented on your machine; numbers outside this range will have string values only.

Input whitespace is not preserved on output if fields are involved.

func is an obsolete synonym for function.

Differences between versions

The -We and -Wexec options are provided for awk beginning with Release Supplement 506A for SCO OpenServer Release 5.0.6. These options were first introduced in mawk, a GPL awk interpreter by Michael D. Brennan.

The awk provided on SCO OpenServer systems is based on the so-called ``new awk'' described in the 1988 book The AWK Programming Language. When the new awk was introduced, some systems provided it as nawk and the older one as oawk. SCO OpenServer retains the nawk and oawk names, but both are linked to awk for backward compatibility. We recommend that you use the awk name rather than nawk or oawk.

Known incompatibilities between the current version of awk and older awks include:

See also

ed(C), exec(M), grep(C), lex(CP), printf(S), regexp(M), sed(C)

Chapter 13, ``Using awk'' in the SCO OpenServer Operating System User's Guide

Alfred V. Aho, Brian W. Kernighan, and Peter J. Weinberger,
The AWK Programming Language, Addison-Wesley, 1988.


© 2003 Caldera International, Inc. All rights reserved.
SCO OpenServer Release 5.0.7 -- 11 February 2003