Using awk

Field variables

Because most of the work you do with awk involves processing records, awk provides a compact notation for identifying fields. The fields of the current record are referred to by the field variables $1, $2, ..., $NF. Field variables share all of the properties of other variables: they can be used in arithmetic or string operations, and they can have values assigned to them. So, for example, you can divide the second field of the file countries by 1000 to convert the area from thousands to millions of square miles:

   { $2 /= 1000; print }
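
As a rough illustration, the sketch below runs this program on a single tab-separated line. The sample line and the shell variable out are assumptions made for this example, not the actual contents of the file countries.

```shell
# Illustrative input only: one tab-separated record laid out like a line
# of the file countries (name, area, population, continent).
out=$(printf 'Canada\t3852\t24\tNorth America\n' |
  awk '{ $2 /= 1000; print }')

# Assigning to $2 makes awk rebuild the record from its fields, joined
# by the output field separator (a single space by default).
printf '%s\n' "$out"
```

Because the default field separator is used here, the tabs in the input come out as single spaces once the record has been rebuilt.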

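You can also assign a new string to a field. The following rule rewrites the fourth field of each matching record; on its own it produces no output, so a program that uses it also needs a print action: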

   $4 == "Africa"   { $4 = "South" }
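
One way to try this rule is sketched below, with a print action added so that the effect is visible. The two input lines, the tab separators, and the shell variable out are assumptions made for this example.

```shell
# Illustrative input only: two tab-separated records.
out=$(printf 'Sudan\t968\t19\tAfrica\nCanada\t3852\t24\tNorth America\n' |
  awk 'BEGIN { FS = OFS = "\t" }          # split and rejoin on tabs
       $4 == "Africa" { $4 = "South" }    # the rule from the text
       { print }                          # print every record')

printf '%s\n' "$out"
```

Only the record whose fourth field is exactly ``Africa'' is changed; the other record is printed as it was read.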

Fields can also be selected by expressions: $(NF-1), for instance, is the second-to-last field of the current record. For example:

   $4 ~ /Asia/ { print $(NF-1) }

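This program prints the second-to-last field (the population) of each record in the file countries whose fourth field contains the string ``Asia''. The parentheses are required: $NF-1 is parsed as ($NF)-1, which subtracts 1 from the numeric value of the last field. Because ``Asia'' has the numeric value 0, omitting the parentheses prints ``-1'' for each matching record.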
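
The difference can be checked with a small sketch like the following. The input line and the shell variable names with_parens and without_parens are assumptions made for this example.

```shell
# Illustrative input only: one tab-separated record with four fields.
with_parens=$(printf 'China\t3692\t866\tAsia\n' |
  awk '$4 ~ /Asia/ { print $(NF-1) }')    # field number NF-1, here $3

without_parens=$(printf 'China\t3692\t866\tAsia\n' |
  awk '$4 ~ /Asia/ { print $NF-1 }')      # value of $NF, minus 1

printf '%s\n%s\n' "$with_parens" "$without_parens"
```

The first command prints the third field of the record; the second prints ``-1'', because the last field is not numeric and is treated as 0 in the subtraction.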

A field variable referring to a nonexistent field, for example, $(NF+1), has as its initial value the empty string. A new field can be created, however, by assigning a value to it. For example, the following program invoked on the file countries creates a fifth field giving the population density:

   BEGIN  { FS = OFS = "\t" }
          { $5 = 1000 * $3 / $2; print }

This program adds a fifth column to the output. For Canada, the new field reads ``6.23053''.
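
A minimal sketch of this program on one line of input follows. The area and population figures are assumed values, chosen to be consistent with the result quoted above, and the shell variable out is a name introduced for this example.

```shell
# Illustrative input only: the figures are assumed, not taken from the
# file countries itself.
out=$(printf 'Canada\t3852\t24\tNorth America\n' |
  awk 'BEGIN  { FS = OFS = "\t" }
              { $5 = 1000 * $3 / $2; print }')

# Assigning to $5 extends the record from four fields to five.
printf '%s\n' "$out"
```

Setting OFS to a tab as well as FS keeps the output tab-separated when awk rebuilds the record after the assignment.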

The number of fields may vary from record to record, but there is a limit of 100 fields per record.