Bash Tips - 15 (HPR Show 2699)

Dave Morriss


Pitfalls for the unwary Bash loop user

This is the fifteenth episode covering useful tips for Bash users. In the last episode we looked at the 'for' loop, and prior to that we looked at 'while' and 'until' loops. In this one I want to look at some of the loop-related issues that can trip up the unwary user.

Loops in Bash are extremely useful, and they are not at all difficult to use in their basic forms. However, there are some perhaps less than obvious issues that can result in unexpected behaviour.

Feeding a loop from a pipe

What is a pipeline?

Bash contains a feature known as a pipeline which is a sequence of one or more commands separated by a vertical bar ('|') control operator where the output of one command is connected to the input of another. We will spend some time on this subject (and related areas) later in this series of Bash Tips, but for now I want to explain enough for this particular episode.

The series of commands and '|' control characters is called a pipeline. The connection of one command to another is called a pipe.

A typical example is:

$ echo "Hello World" | sed -e 's/^.\+$/\U&/'
HELLO WORLD

Here the string "Hello World" is piped to 'sed' which replaces all characters on the line by their upper case versions.

What is happening here is that the 'echo' command writes the arguments it has been given on the standard output channel and the pipe passes this data to the standard input channel for 'sed' to consume, and it in turn writes to its standard output (the terminal) and the transformed version is displayed.
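The 'sed' command above relies on GNU extensions ('\+' and '\U'). As a minimal sketch of the same kind of pipe using 'tr' instead, assuming only a POSIX 'tr', the following should behave the same way. The variable name 'result' is just for illustration and is not part of the original example:

```bash
#!/bin/bash
# Capture the output of the pipeline so it can be inspected as well as shown.
# 'echo' writes to the pipe and 'tr' reads from it, converting to upper case.
result=$(echo "Hello World" | tr '[:lower:]' '[:upper:]')
echo "$result"    # prints HELLO WORLD
```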

One of the key characteristics of the pipeline is that each command is executed in its own subshell. This is a separate process within the operating system which inherits settings from the parent shell (process) that created (spawned) it, but it cannot affect the parent environment. In particular, changes made to variables in the subshell cannot be passed back to the parent.
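A small sketch may make the subshell behaviour easier to see. It assumes Bash's default settings (in particular that the 'lastpipe' option is not enabled), and the variable name 'value' is made up for the demonstration:

```bash
#!/bin/bash
# 'value' is a hypothetical variable used only for this demonstration
value="set in the parent"

# The group on the right of the pipe runs in a subshell, so this
# assignment changes only the subshell's copy of 'value'
echo "some data" | { read -r line; value="changed in the subshell"; }

# Back in the parent shell the original value is untouched
echo "$value"
```

With default settings this should report "set in the parent", because the assignment happened in a separate process.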

We’ll look at pipelines in more detail in later shows in the Bash Tips sub-series.

Piping into a loop

One of the common scenarios where data is piped to a loop is where the output from the 'ls' command is being processed. For example:

ls *.mp3 | while read name; do echo "$name"; done

Although not really a pipeline issue, it is a bad idea to use 'ls' like this because the output it produces is meant to be displayed, and there are often settings, aliases or defaults which cause 'ls' to add extra characters and colour codes to the file names. [1]

This type of pipeline can work if you ensure that you are using plain 'ls' and not an alias, as shown [2]:

$ unalias ls
$ ls *.mp3 | while read name; do echo "$name"; done
astonish.mp3
birettas.mp3
dizzying.mp3
fabled.mp3
neckline.mp3
overtone.mp3
salamis.mp3
skunked.mp3
sniffing.mp3
theorize.mp3

Regardless of this, the advice is usually to avoid the use of 'ls' in this context.

(Note that this example does nothing useful since 'ls' itself can list files. More realistically, instead of the 'echo' such a loop might run a program or script on each of these files to do some useful work.)

Problems arise as a consequence of the loop running in a subshell when you want to work with variables in the loop. For example, you might want to count the files:

$ count=0
$ ls *.mp3 | while read name; do ((count++)); done
$ echo "$count"
0

The count is zero – why?

The answer is that the 'count' variable being incremented in the loop is a copy of the one set to zero before the pipeline. Its value is being incremented in the subshell running the 'while' command, but it is discarded when the pipeline ends. Bash cannot pass back the value from the subshell.
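One way to convince yourself of this is to print the counter from inside the same subshell, before it is discarded. The following is only a sketch: it uses 'printf' to stand in for 'ls' (so it does not depend on any files existing), and it assumes the default Bash settings where 'lastpipe' is off:

```bash
#!/bin/bash
count=0

# 'printf' stands in for 'ls' here and writes three lines to the pipe.
# The braces group the loop and the 'echo' so both run in the same subshell.
printf '%s\n' one.mp3 two.mp3 three.mp3 | {
    while read -r name; do
        count=$((count + 1))
    done
    echo "Inside the subshell: $count"
}

# The parent's variable was never changed
echo "In the parent shell: $count"
```

The first message should show 3 and the second 0, since only the subshell's copy was incremented.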

A similar case was highlighted by clacke in the comments to show 2651 (Community News for September 2018):

items=()
produce_items | while read item; do items+=( "$item" ); done
do_stuff_with "${items[@]}"

Here, 'items' is an array (a subject we’ll be looking at soon in a forthcoming episode). It is assumed that 'produce_items' is a program or function that generates individual strings or numbers which are read by the 'read' in the loop and appended to the array. Then 'do_stuff_with' deals with all of the elements of the array.

This is what clacke says about it:

"items" gets updated just fine, in a subshell, and then after the pipe has finished executing, execution continues in the parent shell where the array is still empty.

What looks like instances of the same array outside and inside the loop are in fact separate arrays.

Avoiding the pipe pitfalls

We looked at the subject of process substitution in show 6 of the Bash Tips series, episode 2045 (and also briefly considered the pipe problem which we’ve just examined in detail).

In that show we saw that the loop could be provided with data for a 'read' command by such a process:

$ unalias ls
$ count=0
$ while read name; do ((count++)); done < <(ls *.mp3)
$ echo "$count"
10

Here the 'while' loop runs in the parent process reading lines from the separate process running the 'ls' command. This time the count is correct because we’re not counting in the subshell of a pipeline and expecting the result to be available to the parent process. [3]

The example clacke mentioned could also be remodelled as:

items=()
while read item; do items+=( "$item" ); done < <(produce_items)
do_stuff_with "${items[@]}"
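To try the two forms side by side, here is a minimal runnable sketch. The functions 'produce_items' and 'do_stuff_with' are not defined anywhere in clacke's comment, so the versions below are hypothetical stand-ins written purely for this test, as are the array names 'piped' and 'substituted':

```bash
#!/bin/bash
# Hypothetical stand-ins for the two commands used in the example
produce_items() { printf '%s\n' "first item" "second item" "third item"; }
do_stuff_with() { echo "Received $# item(s)"; }

# Form 1: the loop is fed from a pipe, so it runs in a subshell
piped=()
produce_items | while read -r item; do piped+=( "$item" ); done

# Form 2: the loop is fed by process substitution, so it runs in the parent
substituted=()
while read -r item; do substituted+=( "$item" ); done < <(produce_items)

do_stuff_with "${piped[@]}"
do_stuff_with "${substituted[@]}"
```

With default Bash settings the first call should report 0 items and the second 3.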

The downloadable script in bash15_ex1.sh demonstrates a simplified version of the above example using the (now probably infamous) /usr/share/dict/words:

$ cat bash15_ex1.sh
#!/bin/bash

#-------------------------------------------------------------------------------
# Example 1 for Bash Tips show 15 - a working example similar to clacke's
# problem example in the comments to HPR episode 2651
#-------------------------------------------------------------------------------

#
# Initialise an array
#
items=()

#
# Populate the array with random words
#
while read -r item; do
    items+=( "$item" )
done < <(grep -E -v "'s$" /usr/share/dict/words | shuf -n 5)

#
# Print the array with word numbers
#
for ((i = 0, j = 1; i < ${#items[@]}; i++, j++)); do
    echo "$j: ${items[$i]}"
done

Invoking the script results in a list of random words:

1: thruways
2: crimsoning
3: destructing
4: cadaver
5: pocketknives

It’s also possible to do something similar using a 'for' loop as in the following downloadable example bash15_ex2.sh:

$ cat bash15_ex2.sh
#!/bin/bash

#-------------------------------------------------------------------------------
# Example 2 for Bash Tips show 15 - you can also use a 'for' loop to load an
# array
#-------------------------------------------------------------------------------

#
# Initialise an array
#
items=()

#
# Populate the array with random words
#
for word in $(grep -E -v "'s$" /usr/share/dict/words | shuf -n 5); do
    items+=( "$word" )
done

#
# Print the array with word numbers
#
for ((i = 0, j = 1; i < ${#items[@]}; i++, j++)); do
    echo "$j: ${items[$i]}"
done

I will leave you to try this one out; the result is the same as example 1 (with different words).
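One difference between the two approaches is worth knowing about. A 'for' loop over a command substitution splits the output into words, whereas 'while read' takes a line at a time. With single dictionary words this makes no difference, but it matters once a line can contain spaces. The sketch below assumes the default 'IFS', and 'produce_lines' is a made-up function used only to generate test data:

```bash
#!/bin/bash
# Hypothetical data source: two lines, the first containing a space
produce_lines() { printf '%s\n' "two words" "single"; }

# The 'for' loop sees the unquoted command substitution split into words
by_word=()
for w in $(produce_lines); do
    by_word+=( "$w" )
done

# The 'while' loop reads whole lines
by_line=()
while IFS= read -r line; do
    by_line+=( "$line" )
done < <(produce_lines)

echo "for loop:   ${#by_word[@]} elements"
echo "while loop: ${#by_line[@]} elements"
```

Here the 'for' loop should collect three elements and the 'while' loop two.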

Using find instead of ls

Another improvement to the earlier file counting example would be to avoid the use of 'ls' and instead use 'find'. This command (and a number of others in the GNU Findutils manual) warrants a whole show or set of shows because it is so full of features, but for now we’ll just look at how it can be used in this context.

The typical way of using 'find' is like this:

find directory options

For example, to find all files in the current directory with a suffix of '.mp3' use:

find . -name '*.mp3' -print

The '-name' option defines a glob pattern to match the files we need returned. This must be quoted, otherwise Bash may expand it on the command line before 'find' runs, and we want 'find' to do the matching. The '-print' option causes each matching file to be reported. In this case the path of the file (relative to the nominated or defaulted directory) is also reported.

$ find . -name '*.mp3' -print
./theorize.mp3
./neckline.mp3
./sniffing.mp3
./fabled.mp3
./birettas.mp3
./salamis.mp3
./overtone.mp3
./dizzying.mp3
./skunked.mp3
./astonish.mp3

Unlike 'ls' the 'find' command does not sort the files.

One other difference from 'ls' is that 'find' will search any subdirectories as well. The following example makes a sub-directory called 'subdir' and creates a file within it. The 'find' command limits the search to files that begin with 'a' or 'i' for brevity:

$ mkdir subdir
$ touch subdir/ignorethisfile.mp3
$ find . -name '[ai]*.mp3' -print
./subdir/ignorethisfile.mp3
./astonish.mp3

Another option '-maxdepth' can be used to limit searches to the current directory (this option must precede '-name'):

$ find . -maxdepth 1 -name '[ai]*.mp3' -print
./astonish.mp3

So, using 'find' rather than 'ls' the earlier example might be:

$ count=0
$ while read name; do ((count++)); done < <(find . -maxdepth 1 -name "*.mp3")
$ echo "$count"
10
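Reading 'find' output line by line still assumes that no file name contains a newline. A common way round this, sketched below, is to have 'find' separate names with a null character using '-print0' and to read them with 'read -d ""'. This assumes a 'find' that supports '-maxdepth' and '-print0' (GNU Findutils does) and a 'mktemp' that accepts '-d'. The temporary directory, the file names and the counter names are invented for the test:

```bash
#!/bin/bash
# Make a scratch directory holding three files, one with a newline in its name
tmpdir=$(mktemp -d)
touch "$tmpdir/plain.mp3" "$tmpdir/with space.mp3" "$tmpdir/with"$'\n'"newline.mp3"

# Newline-separated: the awkward file name is read as two separate lines
by_line=0
while IFS= read -r name; do
    by_line=$((by_line + 1))
done < <(find "$tmpdir" -maxdepth 1 -name '*.mp3' -print)

# Null-separated: each file name is read exactly once
by_null=0
while IFS= read -r -d '' name; do
    by_null=$((by_null + 1))
done < <(find "$tmpdir" -maxdepth 1 -name '*.mp3' -print0)

echo "Counting lines:           $by_line"
echo "Counting null-terminated: $by_null"

rm -r "$tmpdir"
```

The line-based count should come out as 4 even though there are only 3 files, while the null-separated count should be 3.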

Using extglob-enabled extended patterns

Finally, let’s look at how the patterns available when the 'extglob' option is turned on can help to find files in a loop.

Since doing show 2293, where I looked at extended pattern matching features and the 'extglob' option enabled by the 'shopt' command, I have been using this capability a lot. As I mentioned in the show, my Debian system has 'extglob' enabled by default as part of the Bash completion extension. If your operating system does not do this you can set the option as described in show 2293.

The following example uses the files mentioned above where the sub-directory created earlier is still present. It uses a 'for' loop with the pattern '+(i|sa|t)*.mp3' which selects files beginning with 'i', with 'sa' and with 't'. Note that the second case contains two letters which is not something we can specify with simple glob patterns:

$ for f in +(i|sa|t)*.mp3; do echo "$f"; done
salamis.mp3
theorize.mp3

No files beginning with 'i' were returned. The only such file is in the sub-directory, so we can see that, unlike 'find' in its default form, this pattern does not descend into sub-directories.

Note also that the files are sorted this time and do not have the directory './' on the front.
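In a script 'extglob' is not normally on, so it has to be enabled before the line containing the pattern is read by Bash. The sketch below shows this in a self-contained form. It builds a few of the test files in a scratch directory created with 'mktemp -d'; the directory and the 'matches' array are my own additions for the demonstration:

```bash
#!/bin/bash
# Enable extended patterns; this must happen before Bash parses the pattern
shopt -s extglob

# Recreate a few of the test files, including the one in a sub-directory
tmpdir=$(mktemp -d)
mkdir "$tmpdir/subdir"
touch "$tmpdir"/{astonish,salamis,theorize}.mp3 "$tmpdir/subdir/ignorethisfile.mp3"

# Collect the matching names, stripping the directory part
matches=()
for f in "$tmpdir"/+(i|sa|t)*.mp3; do
    matches+=( "${f##*/}" )
done

printf '%s\n' "${matches[@]}"

rm -r "$tmpdir"
```

This should list 'salamis.mp3' and 'theorize.mp3' in that order, and nothing from the sub-directory.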

This is a good way to process files in a loop in some circumstances. For more complex requirements the big guns of the 'find' command are often needed.

Future topics

There are other issues related to those we have examined here that need to be looked at in future episodes. For example:

  • A guide to arrays in Bash; types of arrays, how to initialise them and how to access them
  • More about the 'find' command
  • The features of the 'read' command

We will cover these topics in upcoming episodes of Bash Tips.


  1. Also, Unix and Linux filenames can contain a wide range of characters which lead to complications which 'ls' doesn’t help with.

  2. In case it is of interest, a group of 10 dummy *.mp3 files were generated for testing here. This was done by the following loop:

    for w in $(grep -E -v "'s$" /usr/share/dict/words | grep -E '^.{3,8}$' | shuf -n 10); do
        touch "${w}.mp3"
    done

    Inside the command substitution the first 'grep' removes all possessive forms of words. The second one matches words between 3 and 8 characters in length, and 'shuf' then extracts 10 random words from all of that. The 'touch' command creates an empty file with the suffix '.mp3' using each word as the filename.

  3. It didn’t occur to me at the time, but the process substitution would be the better place to unalias 'ls'. Using <(unalias ls; ls *.mp3) means the alias is only removed in the sub-process, not the main login process.