Using awk

Multiline records and the getline function

awk's facility for automatically breaking its input into records that are more than one line long is not adequate for all tasks. For example, if records are not separated by blank lines, but by something more complicated, setting RS to null does not work. In such cases, it is necessary to manage the splitting of each record into fields in the program. Here are some suggestions.

Use the function getline to read input either from the current input or from a file or pipe, by using redirection in a manner analogous to printf. By itself, getline fetches the next input record and performs the normal field-splitting operations on it. It sets NF, NR, and FNR. getline returns 1 if there was a record present, 0 if the end-of-file was encountered, and -1 if some error occurred (such as failure to open a file).
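
The return values described above can be observed with a small experiment. This is a minimal sketch, assuming a POSIX-style awk on the PATH; the two-line input and the variable names r1 and r2 are invented for illustration.

```shell
# Feed awk two records; the action runs on the first one only.
out=$(printf 'a\nb\n' | awk 'NR == 1 {
    r1 = (getline)    # reads "b": a record was present, so 1
    r2 = (getline)    # input is exhausted: 0, and $0 keeps its old value
    print r1, r2, $0, NR
}')
echo "$out"
```

The first call succeeds and advances NR; the second finds end-of-file, so $0 and NR keep the values left by the first call.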

To illustrate, suppose we have input data consisting of multiline records, each of which begins with a line beginning with START and ends with a line beginning with STOP. The following awk program processes these multiline records, a line at a time, putting the lines of the record into consecutive entries of an array:

f[1] f[2] ... f[nf]

Once the line containing STOP is encountered, the record can be processed from the data in the f array:

   /^START/ {
             f[nf=1] = $0
             while (getline && $0 !~ /^STOP/)
                  f[++nf] = $0
             # now process the data in f[1]...f[nf]
   }
Notice that this code uses the fact that && evaluates its operands left to right and stops as soon as one is false. The same job can also be done by the following program:
   /^START/ && nf==0    { f[nf=1] = $0; next }
   nf >= 1 && !/^STOP/  { f[++nf] = $0 }
   /^STOP/              { # now process the data in f[1] ... f[nf]
                          ...
                          nf = 0
                        }
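
To see the getline-based START/STOP approach run end to end, here is a minimal sketch. The sample input and the "processing" step (joining the collected lines with "|") are assumptions made for illustration, and the loop compares the return value of getline against 0 rather than testing it bare.

```shell
# Two invented START/STOP records on standard input.
out=$(printf 'START a\nx\ny\nSTOP\nSTART b\nz\nSTOP\n' |
awk '
/^START/ {
    f[nf = 1] = $0
    # Compare against 0 so an error return (-1) ends the loop too.
    while ((getline) > 0 && $0 !~ /^STOP/)
        f[++nf] = $0
    # Process the record: here, just join f[1]...f[nf] with "|".
    rec = f[1]
    for (i = 2; i <= nf; i++)
        rec = rec "|" f[i]
    print nf ": " rec
}')
echo "$out"
```

Each output line shows how many lines were collected for a record, followed by the lines themselves; the STOP line ends the loop and is not stored.
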
The following statement reads the next record into the variable x:
   getline x
No field splitting is done and NF is not set; $0 is left unchanged, but NR and FNR are incremented.
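
A short experiment makes the effect of getline x visible. This is a sketch under the assumption of a POSIX-style awk; the two input lines are made up.

```shell
out=$(printf 'one two three\nfour five\n' | awk 'NR == 1 {
    getline x              # the second record goes into x
    print "x=" x
    print "record=" $0     # the current record is still the first line
    print "NF=" NF         # still the field count of the first line
    print "NR=" NR         # the record counter has advanced
}')
echo "$out"
```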

This statement reads from file instead of the current input:

   getline <"file"
It has no effect on NR or FNR, but field splitting is performed, and NF is set.

The following statement gets the next record from file into x:

   getline x <"file"
In this case, no splitting is done, and NF, NR, and FNR are untouched.
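
The following sketch reads one line from a side file while a main input record is current, to show that NR and NF keep the values of the main input. The scratch file created with mktemp and its contents are assumptions for illustration; close() is called so the file could be reread from the start later.

```shell
tmp=$(mktemp)                        # hypothetical scratch file
printf 'alpha\nbeta\n' > "$tmp"
out=$(printf 'main one\n' | awk -v file="$tmp" '{
    getline x < file                 # first line of the side file
    print "x=" x, "NR=" NR, "NF=" NF
    close(file)
}')
rm -f "$tmp"
echo "$out"
```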

Note that if a filename is an expression, it should be in parentheses for evaluation:

   while ( getline x < (ARGV[1] ARGV[2]) ) {  ... }
(See "Command-line arguments" for a discussion of ARGV.) The parentheses are needed because < has higher precedence than concatenation. Without them, a statement such as the following reads from the file tmp, not from a file whose name is tmp followed by the value of FILENAME:
   getline x < "tmp" FILENAME
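
As a sketch of the parenthesized form, the following builds a filename by concatenation and reads from it. The directory, the names base and suffix, and the file contents are all hypothetical.

```shell
dir=$(mktemp -d)                          # hypothetical scratch directory
printf 'from tmp.data\n' > "$dir/tmp.data"
out=$(awk -v base="$dir/tmp" 'BEGIN {
    suffix = ".data"
    # Parentheses force the concatenation to happen before the redirection.
    if ((getline x < (base suffix)) > 0)
        print x
    close(base suffix)
}')
rm -rf "$dir"
echo "$out"
```
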
Also, if you use this getline statement form, a statement like the following loops forever if the file cannot be read:
   while ( getline x < file ) { ... }
This is because getline returns -1, not zero, if an error occurs, and awk treats any nonzero value as true. A better way to write this test is as follows:
   while ( (getline x < file) > 0 ) { ... }
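
The difference can be checked against a file that cannot be opened. This is a minimal sketch; the path below is an assumption and is simply expected not to exist.

```shell
out=$(awk 'BEGIN {
    file = "/nonexistent/getline-demo"    # assumed not to exist
    status = (getline x < file)           # the open fails
    n = 0
    while ((getline x < file) > 0)        # guarded loop ends at once
        n++
    print "status=" status, "n=" n
}')
echo "$out"
```

Because the loop condition requires a value greater than zero, the error return ends the loop immediately instead of spinning forever.
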
It is also possible to pipe the output of another command directly into getline. For example, the following statement executes who and pipes its output into getline:
   while ("who" | getline)
        n++
Each iteration of the while loop reads one more line and increments the variable n, so after the while loop terminates, n contains a count of the number of users. Similarly, the following statement pipes the output of date into the variable d, thus setting d to the current date:
   "date" | getline d
The table below summarizes the getline function.

getline function

Form                 Sets
getline              $0, NF, NR, FNR
getline var          var, NR, FNR
getline <file        $0, NF
getline var <file    var
cmd | getline        $0, NF
cmd | getline var    var
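
The two pipe forms in the table can be tried with commands whose output is predictable. In this sketch the echo commands are stand-ins for who and date, and the strings they print are invented; close() is used so that each command could be run again later in the same program.

```shell
out=$(awk 'BEGIN {
    listcmd = "echo a; echo b; echo c"    # stand-in for who
    while ((listcmd | getline) > 0)       # one iteration per output line
        n++
    close(listcmd)

    datecmd = "echo sample-date"          # stand-in for date
    datecmd | getline d                   # first output line goes into d
    close(datecmd)

    print "n=" n, "d=" d
}')
echo "$out"
```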

© 2003 Caldera International, Inc. All rights reserved.
SCO OpenServer Release 5.0.7 -- 11 February 2003