Bash Tips - 16 (HPR Show 2709)

Dave Morriss


Arrays in Bash

This is the first of a small group of shows on the subject of arrays in Bash. It is also the sixteenth show in the Bash Tips sub-series.

We have encountered Bash arrays at various points throughout this sub-series, and have even seen a number of examples, but the subject has never been examined in detail. This group of shows intends to make good this deficiency.

Types of arrays

Bash offers two types of arrays: indexed and associative.

Both types are one-dimensional. Indexed arrays are indexed by non-negative integers starting at zero. The indices do not have to be contiguous (indexed arrays are sparse). Associative arrays are indexed by strings (like the equivalent in Awk). Neither type has a fixed maximum size, and the elements of both hold strings.
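To illustrate what "sparse" means here, this is a minimal sketch using a made-up array called 'sparse'. It relies on two expansions not otherwise covered in these notes: '${#name[@]}' (the number of elements set) and '${!name[*]}' (the subscripts in use).

```bash
# 'sparse' is a hypothetical array name used only for illustration
unset sparse
sparse[2]='two'
sparse[10]='ten'

count=${#sparse[@]}      # number of elements actually set
indices="${!sparse[*]}"  # the subscripts in use, joined by spaces

echo "Elements: $count"    # Elements: 2
echo "Indices:  $indices"  # Indices:  2 10
```

Only two elements exist even though the highest subscript is 10; nothing is stored for the gaps.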

Creating arrays

There are several ways of creating arrays in Bash, and the methods differ slightly between the two types.

Creating an indexed array

It is possible to declare an indexed array just by using a command of the form:

name[subscript]=value

For example:

fruits[0]='apple'
fruits[1]='pear'

Here 'fruits' is the array name, and the subscripts of the elements being initialised are 0 and 1. The values being set are the strings to the right of each equals sign.

The subscript must be a number or an expression which evaluates to a number:

i=0
fruits[$i]='apple'
fruits[$((++i))]='pear'
fruits[$((++i))]='grape'

Here variable 'i' starts at zero and then is incremented for each successive array element (using arithmetic expansion - see episode hpr1951).
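Because the subscript of an indexed array is itself evaluated as an arithmetic expression, the same result can probably be had without the '$' or the '$((...))' wrapper. This is only a sketch of an alternative, not a replacement for the form above:

```bash
unset fruits i
i=0
fruits[i]='apple'      # variable name is evaluated arithmetically
fruits[i+1]='pear'     # so is a full expression
fruits[i+2]='grape'

echo "${fruits[0]} ${fruits[1]} ${fruits[2]}"   # apple pear grape
```

Note that 'i' is not changed in this version, whereas the '++i' form increments it as a side effect.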

The same effect can be achieved using a compound assignment of the general form:

name=([subscript1]=value1 [subscript2]=value2)

So, rewriting the last example:

i=0
fruits=([$i]='apple' [$((++i))]='pear' [$((++i))]='grape')

However, the '[subscript]=' part is optional and the whole thing could be written as:

fruits=('apple' 'pear' 'grape')

Using this format, each element is given an index one greater than the last index assigned to by the statement. Indexing starts at zero.
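The "last index plus one" rule is easiest to see when explicit and implicit subscripts are mixed in one statement. In this small sketch the subscripts in use are listed with '${!fruits[*]}', which is not otherwise covered in these notes:

```bash
unset fruits
# 'apple' gets index 0, 'pear' is placed at 5, so 'grape' follows at 6
fruits=('apple' [5]='pear' 'grape')

indices="${!fruits[*]}"
echo "$indices"          # 0 5 6
echo "${fruits[6]}"      # grape
```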

It is even possible to append to an already populated array thus:

fruits+=('banana') # append to existing data in an array

Note the use of the '+=' operator here. A common mistake is to try to add to an array using the plain '=' for the assignment:

fruits=('banana') # clear the array and start again

This will empty the array and write 'banana' to the zero indexed (first) element.
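The following sketch puts the two operators side by side, counting the elements with '${#fruits[@]}'. It also shows a related slip that is easy to make: using '+=' without the parentheses, which appends to the string in element zero rather than adding a new element.

```bash
unset fruits
fruits=('apple' 'pear')

fruits+=('banana')        # append a new element
echo "${#fruits[@]}"      # 3

fruits+='s'               # no parentheses: string append to element zero
echo "${fruits[0]}"       # apples
echo "${#fruits[@]}"      # still 3

fruits=('banana')         # plain '=' replaces the whole array
echo "${#fruits[@]}"      # 1
```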

Another way to define an indexed array is with the 'declare' builtin command:

declare -a name

The '-a' option specifies that 'name' is an indexed array as opposed to other types of variables that can be created with this command.

There are some interesting features in the 'declare' builtin command in the context of arrays which we will look at in a later show.

Creating an associative array

As we have seen, with indexed arrays the indices can be derived implicitly (as sequential numbers), but associative arrays use strings as their indices, so these have to be defined explicitly.

Unlike an indexed array, an associative array has to be declared explicitly before it can be used:

declare -A capitals

Then the following syntax initialises an element:

name[subscript]=value

The subscript does not need to be quoted even if it contains a space, but other characters in subscripts may need quotes. For example, none of the following subscripts need to be quoted:

declare -A capitals
capitals[England]='London'
capitals[Scotland]='Edinburgh'
capitals[Wales]='Cardiff'
capitals[Northern Ireland]='Belfast'
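To see why the explicit declaration matters, here is a sketch of what happens without it. The array name 'caps' is made up for this illustration so as not to disturb 'capitals'. With no 'declare -A', Bash creates an indexed array and evaluates each subscript arithmetically; an unset variable name evaluates to zero, so every assignment lands on element zero.

```bash
# 'caps' is a hypothetical name; note there is no 'declare -A' here
unset caps England Scotland
caps[England]='London'
caps[Scotland]='Edinburgh'   # overwrites element 0

echo "${#caps[@]}"    # 1
echo "${caps[0]}"     # Edinburgh
```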

As before the same effect can be achieved using a compound assignment, but, unlike the indexed array, the subscript cannot be omitted:

declare -A capitals
capitals=([England]='London' [Scotland]='Edinburgh' [Wales]='Cardiff' [Northern Ireland]='Belfast')

It is also possible to populate the array at declaration time:

declare -A capitals=([England]='London' [Scotland]='Edinburgh' [Wales]='Cardiff' [Northern Ireland]='Belfast')

Subscripts containing characters that are special to the shell, such as square brackets, do need to be quoted:

declare -A chars
chars['[']='open square bracket'
chars[']']='close square bracket'

Accessing array elements

A simple way of visualising the contents of either type of array is by using 'declare -p'. This generates a string in the form of a command, which can be used to rebuild the array if needed.

For example:

$ declare -p fruits capitals chars
declare -a fruits=([0]="apple" [1]="pear" [2]="grape" [3]="banana")
declare -A capitals=(["Northern Ireland"]="Belfast" [England]="London" [Wales]="Cardiff" [Scotland]="Edinburgh")
declare -A chars=(["["]="open square bracket" ["]"]="close square bracket")

Note that the ordering of associative array elements is arbitrary. Note also that the 'Northern Ireland' subscript is quoted by Bash, as of course are the subscripts in the 'chars' array.
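As a sketch of the rebuilding idea, the output of 'declare -p' can be saved in a variable and run later with 'eval'. The variable name 'saved' is made up for this example, and 'eval' should only ever be given text from a source you trust.

```bash
unset capitals
declare -A capitals=([England]='London' [Scotland]='Edinburgh')

saved=$(declare -p capitals)   # capture the rebuilding command as a string
unset capitals                 # the array is now gone

eval "$saved"                  # run the saved command to recreate it
echo "${capitals[Scotland]}"   # Edinburgh
```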

The usual way to access array elements is with the following syntax:

${name[subscript]}

Do not omit the curly brackets

The curly brackets (braces) are required to avoid conflicts with Bash’s filename expansion operators. The expression: '$fruits[1]' will be parsed as the contents of a variable called 'fruits' followed by the glob range expression containing the digit '1'.

For the arrays we’ve been using so far, these are the sort of results we get when the braces are omitted:

$ echo $fruits[1]
apple[1]
$ echo $capitals[1]
[1]
$ ls $fruits[1]
ls: cannot access 'apple[1]': No such file or directory

When an array name is used without a subscript it is interpreted as the element with index zero. For an indexed array this may return an actual value, but for an associative array it depends on whether there’s an element with the string ‘0’ as a subscript.

$ declare -A hash=([a]=42 [b]=97 [0]='What is this?')
$ echo $hash
What is this?

With curly brackets

Using the braces we see:

$ echo "${fruits[1]}"
pear
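The subscript inside the braces does not have to be a literal. The sketch below, which rebuilds small versions of the arrays so that it stands alone, looks up elements using variables ('country' and 'i' are names chosen for this example):

```bash
unset capitals fruits
declare -A capitals=([England]='London' ['Northern Ireland']='Belfast')
fruits=('apple' 'pear' 'grape')

country='Northern Ireland'
i=2

echo "${capitals[$country]}"   # Belfast
echo "${fruits[i]}"            # grape
echo "${fruits[i-1]}"          # pear
```

For the associative array the key is expanded as a string, so the '$' is needed; for the indexed array the subscript is evaluated arithmetically.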

Accessing all elements of an array

There are two special subscripts that return all elements of an array. These are '@' and '*'.

For example:

$ echo "${fruits[@]}"
apple pear grape banana

The difference between '@' and '*' is only apparent when the expression is written in double quotes:

  • '*' - the array elements are returned as a single word, joined by the first character of the 'IFS' variable (usually a space; see footnote 1).
  • '@' - the array elements are returned as a list of words.

This can be seen when expanding an array in a loop.

The downloadable script in bash16_ex1.sh demonstrates this (especially for fans of /usr/share/dict/words):

$ cat bash16_ex1.sh
#!/bin/bash

#-------------------------------------------------------------------------------
# Example 1 for Bash Tips show 16: the difference between '*' and '@' as array
# subscripts
#-------------------------------------------------------------------------------

#
# Initialise an array
#
declare -a words

#
# Populate the array. Omit capitalised words and the weird possessives.
# [Note: there are better ways of populating arrays as we'll see in a later
# show]
#
for word in $(grep -E -v "(^[A-Z]|'s$)" /usr/share/dict/words | shuf -n 5); do
    words+=( "$word" )
done

#
# Report the array using '*' as the index
#
echo 'Using "${words[*]}"'
for word in "${words[*]}"; do
    echo "$word"
done

#
# Report the array using '@' as the index
#
echo 'Using "${words[@]}"'
for word in "${words[@]}"; do
    echo "$word"
done

Invoking the script results in the array of random words being reported in two ways:

Using "${words[*]}"
fatalistic rashes processioned grottoes abusively
Using "${words[@]}"
fatalistic
rashes
processioned
grottoes
abusively
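The part played by 'IFS' and by the double quotes can be sketched more directly. In this example 'count_words' is a made-up helper that simply reports how many arguments it received, and one array element deliberately contains a space:

```bash
unset fruits
fruits=('apple' 'pear' 'grape' 'banana split')

# Change IFS only inside the command substitution's subshell
joined=$(IFS=,; echo "${fruits[*]}")
echo "$joined"               # apple,pear,grape,banana split

count_words() { echo "$#"; } # hypothetical helper: prints its argument count

count_words "${fruits[*]}"   # 1  (one word)
count_words "${fruits[@]}"   # 4  (one word per element)
count_words ${fruits[@]}     # 5  (unquoted: 'banana split' is split in two)
```

Setting 'IFS' inside the '$(...)' keeps the change from leaking into the rest of the script.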

  1. I knew I’d talked about 'IFS' before as I was recording the audio but forgot which show it was. Have a look at the long notes for hpr2045 if you want more information.