# Gnu Awk - Part 8 (HPR Show 2438)

## Introduction

This is the eighth episode of the “Learning Awk” series that b-yeezi and I are doing.

## Recap of the last episode

• The `while` loop: tests a condition and performs commands while the test returns true

• The `do while` loop: performs commands after the `do`, then tests afterwards, repeating the commands while the test is true.

• The `for` loop (type 1): initialises a variable, performs a test, and increments the variable all together, performing commands while the test is true.

• The `for` loop (type 2): sets a variable to successive indices of an array, performing a collection of commands for each index.

These types of loops were demonstrated by examples in the last episode.

Note that the example for ‘`do while`’ was an infinite loop (perhaps as a test of the alertness of the audience!):

```
#!/usr/bin/awk -f
BEGIN {

    i=2;
    do {
        print "The square of ", i, " is ", i*i;
        i = i + 1
    }
    while (i != 2)

    exit;
}
```

The condition in the `while` is always true:

```
The square of  2  is  4
The square of  3  is  9
The square of  4  is  16
The square of  5  is  25
The square of  6  is  36
The square of  7  is  49
The square of  8  is  64
The square of  9  is  81
The square of  10  is  100
...
The square of  1269630  is  1611960336900
The square of  1269631  is  1611962876161
The square of  1269632  is  1611965415424
The square of  1269633  is  1611967954689
The square of  1269634  is  1611970493956
...
```

The variable `i` is set to 2, the `print` is executed, then `i` is set to 3. The test “`i != 2`” is true at that point and stays true ad infinitum, because `i` only ever increases and so never returns to 2.
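One way to make such a loop stop, as a minimal sketch only: test against an upper limit rather than for inequality with the starting value. The limit of 5 and the shell function name `squares` are assumptions made for this illustration and are not part of the original example.

```shell
# 'squares' is a hypothetical wrapper so the awk program can be run from a shell
squares() {
    awk 'BEGIN {
        i = 2
        do {
            print "The square of", i, "is", i*i
            i = i + 1
        } while (i <= 5)    # assumed limit: the test becomes false once i is 6
    }'
}

squares
```

With this test the body runs for 2, 3, 4 and 5, and the last line printed is `The square of 5 is 25`.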

## Some more statements

We will come back to loops later in this episode, but first this seems like a good point to describe another statement: the `switch` statement.

### The `switch` statement

This is specific to `gawk`, and is not available when `gawk` is run in compatibility mode (the `--traditional` option), which is used when compatibility with non-GNU `awk` is required. The `switch` statement in `gawk` is very similar to the one in `C` and many other languages.

The layout of the `switch` statement is as follows:

`switch` (expression) {
`case` value:
case-body
`default`:
default-body
}

The ‘`expression`’ part is an expression, which returns a numeric or string result. The ‘`value`’ part after the `case` is a numeric or string constant or a regular expression.

The `expression` is evaluated and the result matched against the case `value`s in turn. If there is a match the `case-body` statements are executed. If there is no match the `default-body` statements are executed, provided a `default` part has been supplied.

The following example is included as one of the files associated with this show, called `switch_example.awk`:

```
#!/usr/bin/awk -f

#
# Example of the use of 'switch' in GNU Awk.
#
# Should be run against the data file 'file1.txt' included with the second
# show in the series: http://hackerpublicradio.org/eps/hpr2129/file1.txt
#
NR > 1 {
    printf "The %s is classified as: ",$1

    switch ($1) {
        case "apple":
            print "a fruit, pome"
            break
        case "banana":
        case "grape":
        case "kiwi":
            print "a fruit, berry"
            break
        case "strawberry":
            print "not a true fruit, pseudocarp"
            break
        case "plum":
            print "a fruit, drupe"
            break
        case "pineapple":
            print "a fruit, fused berries (syncarp)"
            break
        case "potato":
            print "a vegetable, tuber"
            break
        default:
            print "[unclassified]"
    }
}
```

The result of running this script against the “fruit” file presented in show 2129 is the following (`switch_example.out`):

```
The apple is classified as: a fruit, pome
The banana is classified as: a fruit, berry
The strawberry is classified as: not a true fruit, pseudocarp
The grape is classified as: a fruit, berry
The apple is classified as: a fruit, pome
The plum is classified as: a fruit, drupe
The kiwi is classified as: a fruit, berry
The potato is classified as: a vegetable, tuber
The pineapple is classified as: a fruit, fused berries (syncarp)
```

What this simple example does is:

• It ignores the first line of the file (a header)
• It prints the first field (the name of a fruit - mostly) in the string “The %s is classified as:”. There is no newline so whatever is printed next is appended to the line.
• It uses the first field in a `switch` statement. Each `case` is an exact match with the contents of the field. If there is a match a `print` statement is used to print out the botanical classification. If there are no matches then the `default` instance would print “[unclassified]”, but that doesn’t happen in this example.
• All `print` statements are followed by `break`. If the `break` were missing, execution would fall through into the body of the next `case`, and so on until a `break` or the end of the `switch` was reached. This can be desirable in some instances. See the next section for a discussion of `break`.
• Note that banana, grape and kiwi are all botanically classified as a berry, so there are three `case` parts associated with one `print`.

### The `break` statement

This statement is mainly for “breaking out of” a `for`, `while` or `do-while` loop, though, as we have seen, it can also interrupt the flow of execution in a `switch` statement. Outside of these statements `break` has no meaning, and `gawk` reports it as an error.

In a loop a `break` statement is often used where it’s not possible to determine the number of iterations of the loop beforehand. Invoking `break` terminates only the innermost enclosing loop. This matters when loops are nested, since any outer loop carries on running.
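The nested case can be sketched like this. The function name `pairs` and the loop limits are assumptions chosen purely for illustration.

```shell
# 'pairs' is a hypothetical wrapper around a two-level loop
pairs() {
    awk 'BEGIN {
        for (i = 1; i <= 3; i++) {
            for (j = 1; j <= 3; j++) {
                if (j > i)
                    break          # leaves only the inner (j) loop
                printf "%d,%d ", i, j
            }
            # execution resumes here after the break; the outer loop carries on
        }
        print ""
    }'
}

pairs
```

The output is `1,1 2,1 2,2 3,1 3,2 3,3`, showing that the outer loop still runs all three times even though the inner loop is cut short on the first two passes.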

The following example (available for download as `divisor.awk`) is from the Gnu Awk manual and shows a method of finding the smallest divisor:

```
#!/usr/bin/awk -f

# find smallest divisor of num
{
    num = $1

    #
    # Make an infinite loop using the for loop
    #
    for (divisor = 2; ; divisor++) {
        #
        # If the number is divisible by 'divisor' then we're done
        #
        if (num % divisor == 0) {
            printf "Smallest divisor of %d is %d\n", num, divisor
            break
        }

        #
        # If the value of 'divisor' has got too large the number has no
        # divisors and is therefore a prime number
        #
        if (divisor * divisor > num) {
            printf "%d is prime\n", num
            break
        }
    }
}
```

I have added some comments to this script to (hopefully) make it clearer.

Running this in a pipeline, with the number to be tested sent to it on standard input, results in the following type of output (`divisor.out`):

```
$ echo 67 | ./divisor.awk
67 is prime
$ echo 69 | ./divisor.awk
Smallest divisor of 69 is 3
```

### The `continue` statement

This is similar to `break` in that it is used in a `for`, `while` or `do-while` loop. It is not relevant in `switch` statements however.

Invoking `continue` skips the rest of the body of the enclosing loop and begins the next cycle. In a `for` loop the increment part is still performed before the test is made again.

The following example (available for download as `continue_example.awk`) is from the Gnu Awk manual and demonstrates a possible use of `continue`:

```
#!/usr/bin/awk -f

#
# Loop, printing numbers from 0-20, except for 5
# (From the GNU Awk User's Guide)
#
BEGIN {
    for (x = 0; x <= 20; x++) {
        if (x == 5)
            continue
        printf "%d ", x
    }
    print ""
}
```
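The same idea can be applied to the fields of an input record. The following is a rough sketch, not taken from the manual: the function name `sum_numbers` and the sample input are assumptions. It adds up the fields that are whole numbers and uses `continue` to pass over the rest.

```shell
# 'sum_numbers' is a hypothetical wrapper; it reads records from standard input
sum_numbers() {
    awk '{
        total = 0
        for (f = 1; f <= NF; f++) {
            if ($f !~ /^[0-9]+$/)
                continue           # not a whole number: go on to the next field
            total += $f
        }
        print total
    }'
}

echo "apple 4 red 10 x7" | sum_numbers
```

This prints `14`. Only the fields `4` and `10` are added. The `f++` part of the `for` still runs after each `continue`, so the loop moves on to the next field rather than getting stuck.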

### The `next` statement

This statement is not related to loops in the same way as `break` and `continue` but to the main record processing cycle of Awk. The `next` statement causes Awk to stop processing the current input record and go on to the next one.

As we know from earlier episodes in this series, Awk reads records from its input stream and applies rules to them. The `next` statement stops the execution of further rules for the current record, and moves on to the next one.
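As a minimal sketch of this behaviour, the three rules below are tried in order for every record, and `next` in either of the first two prevents the third from being reached. The function name `show_rules` and the sample input are assumptions for illustration.

```shell
# 'show_rules' is a hypothetical wrapper; it reads records from standard input
show_rules() {
    awk '
        /^#/    { next }               # rule 1: comment line, skip remaining rules
        NF == 0 { next }               # rule 2: blank line, likewise
                { print NR ": " $0 }   # rule 3: only reached by other lines
    '
}

printf '%s\n' '# header' 'apple red' '' 'plum purple' | show_rules
```

Only records 2 and 4 get as far as the third rule, so the output is `2: apple red` followed by `4: plum purple`.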

The following example (available for download as `next_example.awk`) demonstrates a use of `next`:

```
#!/usr/bin/awk -f

#
# Skip the first line of the file (a header)
#
NR == 1 { next }

#
# If field 2 (colour) is less than 6 characters then save it with its line
# number and skip it
#
length($2) < 6 {
    skip[NR] = $0
    next
}

#
# It's not the header and the colour name is at least 6 characters, so print
# the line
#
{
    print
}

#
# At the end show what was skipped
#
END {
    printf "\nSkipped:\n"
    for (n in skip)
        print n": "skip[n]
}
```

• The script uses `next` in the first rule to avoid the first line of the file (a header).
• The second rule skips lines where the colour name is fewer than 6 characters long, but it also saves that line in an array called `skip` using the line number as the key (index).
• The third rule prints anything it sees, but it will not be invoked if either rule 1 or rule 2 causes it to be skipped.
• Finally, an `END` rule prints the contents of the array. Awk does not guarantee the order in which `for (n in skip)` visits the indices, so the skipped lines will not necessarily appear in line-number order.

Running this with the file we have used many times before, `file1.txt`, results in the following output (`next_example.out`):

```
$ next_example.awk file1.txt
banana     yellow 6
grape      purple 10
plum       purple 2
pineapple  yellow 5

Skipped:
2: apple      red    4
4: strawberry red    3
6: apple      green  8
8: kiwi       brown  4
9: potato     brown  9
```