Bash Tips - 17 (HPR Show 2719)

Dave Morriss


Arrays in Bash

This is the second of a small group of shows on the subject of arrays in Bash. It is also the seventeenth show in the Bash Tips sub-series.

In the last show we saw the two types of arrays, and learned about the multiple ways of creating them and populating them. We also looked at how array elements and entire arrays are accessed.

Now we want to continue looking at array access and some of the various parameter expansion operations available.

Negative indices with indexed arrays

When we looked at indexed array subscripts in the last episode we only considered positive numbers (and the '*' and '@' special subscripts). It is also possible to use negative numbers which index relative to the end of the array. The index '-1' means the last element, '-2' the penultimate, and so forth.

The downloadable script in bash17_ex1.sh demonstrates a use of negative indices:

#!/bin/bash

#-------------------------------------------------------------------------------
# Example 1 for Bash Tips show 17: Negative indices
#-------------------------------------------------------------------------------

#
# Seed the Fibonacci sequence in an indexed array
#
declare -a fib=(0 1 1)

#
# Populate the rest up to (and including) index 20
#
for ((i = 3; i <= 20; i++)); do
    fib[$i]=$((fib[i-2]+fib[i-1]))
done

#
# Show the whole array
#
echo "Fibonacci sequence"
echo "${fib[*]}"
echo

#
# Print a few elements working backwards
#
for i in {-1..-4}; do
    echo "fib[$i] = ${fib[$i]}"
done

exit

The script seeds an indexed array called 'fib' with the start of the Fibonacci sequence. Each later element of this sequence is the sum of the previous two, and that is what the 'for' loop computes, up to and including index 20 (the 21st element, since indices start at zero).

Note that in the 'for' loop an arithmetic expansion expression is used: $((fib[i-2]+fib[i-1])) which does not require dollar signs or curly brackets inside it.
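To illustrate the point about arithmetic expansion, here is a minimal sketch (not part of the downloadable script; the variable names 'short' and 'long' are made up for the illustration) which computes the same sum both ways:

```bash
#!/bin/bash

# Sketch only: compare the two ways of writing the subscripted sum
declare -a fib=(0 1 1 2 3)
i=5

# Inside $(( )) array elements and variables need no '$' or '{}'
short=$((fib[i-2]+fib[i-1]))

# The fully decorated form is legal too, just noisier
long=$(( ${fib[$i-2]} + ${fib[$i-1]} ))

echo "short=$short long=$long"
```

Both forms should give the same value, so the shorter one is usually preferred for readability.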

The script prints all of the generated numbers then picks out the last four to demonstrate negative indexing.

Invoking the script results in the following:

Fibonacci sequence
0 1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987 1597 2584 4181 6765

fib[-1] = 6765
fib[-2] = 4181
fib[-3] = 2584
fib[-4] = 1597
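
One detail worth knowing, taken from the Bash manual rather than from the script above: a negative index is counted back from one more than the highest index in use, not from the number of elements. In an array with gaps this means a negative index can land on an element that was never set. A minimal sketch, using a made-up array called 'sparse':

```bash
#!/bin/bash

# Sketch only: 'sparse' is a hypothetical array with a gap in its indices
declare -a sparse=([0]=first [5]=last)

last="${sparse[-1]}"    # index 5
gap="${sparse[-2]}"     # index 4, which was never set, so this is empty
first="${sparse[-6]}"   # index 0

printf 'last=[%s] gap=[%s] first=[%s]\n' "$last" "$gap" "$first"
```

With an array that has no gaps, such as 'fib' above, this distinction does not arise.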

Concatenating arrays

There is no special syntax to concatenate one array to another. The simplest way to do this is using a command of the form:

array1=( "${array2[@]}" "${array3[@]}" )

The expression "${array2[@]}", as we already know, returns the entirety of 'array2' as a list of words. Effectively the parentheses are filled with the contents of each array as a list of separate words.
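
The double quotes matter here. The following sketch (the array names and contents are invented for the illustration) shows what can happen to elements containing spaces if the quotes are left out:

```bash
#!/bin/bash

# Sketch only: the array names here are made up for illustration
declare -a cities=("New York" Paris)
declare -a towns=("St Ives")

# Quoted: each element stays one word
quoted=( "${cities[@]}" "${towns[@]}" )

# Unquoted: elements are split on whitespace
unquoted=( ${cities[@]} ${towns[@]} )

echo "quoted has ${#quoted[@]} elements, unquoted has ${#unquoted[@]}"
```

The quoted form keeps the three original elements, whereas the unquoted form ends up with five because "New York" and "St Ives" are each split in two.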

It is also possible to append an array to an already filled array thus:

array1+=( "${array4[@]}" )

The downloadable script in bash17_ex2.sh demonstrates array concatenation [1]:

#!/bin/bash

#-------------------------------------------------------------------------------
# Example 2 for Bash Tips show 17: Array concatenation
#-------------------------------------------------------------------------------

#
# Make three indexed arrays
#
declare -a a1 a2 a3

#
# Seed the random number generator
#
RANDOM=$(date +%N)

#
# Place 10 random numbers between 1..100 into the arrays a1 and a2
#
for ((i=1; i<=10; i++)); do
    a1+=( $(( ( RANDOM % 100 ) + 1 )) )
    a2+=( $(( ( RANDOM % 100 ) + 1 )) )
done

#
# Show the results
#
echo "a1: ${a1[*]}"
echo "a2: ${a2[*]}"

#
# Concatenate a1 and a2 into a3 and show the result
#
a3=( "${a1[@]}" "${a2[@]}" )
echo "a3: ${a3[*]}"

Note the use of the special 'RANDOM' variable which generates a (pseudo) random integer between 0 and 32767 on each access. To ensure the random sequence is not the same on each use, the generator can be seeded, which is what the command RANDOM=$(date +%N) does. The '%N' format, provided by GNU date, returns the nanoseconds part of the current time.
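
As a small aside on seeding, this sketch (separate from the example script; the seed value 42 is arbitrary) suggests why a varying seed is wanted: assigning the same seed twice should reproduce the same sequence.

```bash
#!/bin/bash

# Sketch only: seed with a fixed (arbitrary) value to show repeatability
RANDOM=42
first_run="$RANDOM $RANDOM $RANDOM"

RANDOM=42
second_run="$RANDOM $RANDOM $RANDOM"

if [[ $first_run == "$second_run" ]]; then
    echo "Same seed, same sequence"
fi
```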

Invoking the script results in the following:

a1: 34 9 70 68 5 36 63 2 11 84
a2: 13 87 58 10 33 88 35 51 45 83
a3: 34 9 70 68 5 36 63 2 11 84 13 87 58 10 33 88 35 51 45 83
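
The example script only uses the first form of concatenation. A minimal sketch of the appending form follows, reusing the names 'array1' and 'array4' from the syntax shown earlier with invented contents. It also prints the indices of the result using "${!array1[*]}", an expansion covered later in these notes:

```bash
#!/bin/bash

# Sketch only: array names follow the syntax example earlier in the notes
declare -a array1=(a b c)
declare -a array4=([3]=x [7]=y)

# Append the values of array4 to array1
array1+=( "${array4[@]}" )

echo "values:  ${array1[*]}"
echo "indices: ${!array1[*]}"
```

Only the values are copied, so the original indices of 'array4' (3 and 7) are not preserved. The appended elements simply follow on from the end of 'array1', giving indices 0 to 4.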

Parameter expansion operations and arrays

Back in episode 1648 in 2014 I described most of the Bash parameter expansion operations available, some in the context of arrays. Now I want to visit these again as well as a few more.

Substring expansion

This performs two different functions:

  • sub-strings can be selected from strings
  • array element subsets can be extracted from arrays

The syntax of this feature is:

${parameter:offset}
${parameter:offset:length}

Both the offset and length are arithmetic expressions. A negative offset counts backwards from the end of the string (or from the end of the indexed array). It must be separated from the colon by a space, otherwise Bash sees ':-' and interprets it as the 'use default value' expansion. A negative length is only permitted with strings, not arrays; it is treated as a position counted back from the end of the string rather than as a number of characters. If length is omitted the remainder of the string or array after offset is returned.
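
The need for the space before a negative offset can be seen in this minimal sketch (the variable names are invented for the illustration):

```bash
#!/bin/bash

# Sketch only: 'word' is just an illustrative variable
word='saturn'

with_space="${word: -3}"     # substring expansion: last 3 characters
with_parens="${word:(-3)}"   # parentheses also keep ':' and '-' apart
no_space="${word:-3}"        # 'use default value' expansion; word is set,
                             # so the whole string comes back

echo "$with_space $with_parens $no_space"
```

The first two forms return 'urn', while the third returns 'saturn' because 'word' has a value and the default of '3' is never used.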

When used with a single array element it is possible to extract parts of the string [2]:

$ declare -a planets=(mercury venus earth mars jupiter saturn uranus neptune)
$ echo "${planets[4]:2:3}" # middle letters of 'jupiter'
pit
$ echo "${planets[5]: -3:2}" # first two of the last 3 letters of 'saturn'
ur
$ echo "${planets[5]: -3}" # last 3 letters of 'saturn'
urn
$ echo "${planets[6]:1:-1}" # start at letter 1 up to but not including the last letter
ranu

When used with the entirety of an indexed array (subscript '@' or '*') then array elements are extracted:

$ echo "${planets[@]:1:3}"
venus earth mars
$ echo "${planets[@]: -3:2}" # count back 3 from the end, display 2 elements
saturn uranus
$ echo "${planets[@]: -3}"
saturn uranus neptune

As mentioned, the length may not be negative when using substring expansion to select indexed array elements.
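
One possible workaround, sketched here under the assumption that the array has no gaps in its indices, is to compute a positive length from the element count. The name 'middle' is made up for the illustration:

```bash
#!/bin/bash

# Sketch only: compute the length instead of using a negative one
declare -a planets=(mercury venus earth mars jupiter saturn uranus neptune)

# From index 1 up to but not including the last element
middle=( "${planets[@]:1:${#planets[@]}-2}" )

echo "${middle[*]}"
```

Since length is an arithmetic expression, the element count minus two (one for the skipped first element, one for the dropped last element) can be written directly in the expansion.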

Experiments have shown that elements can also be extracted from associative arrays with substring expansion, though since the element order is not defined the results may not be reliable.


Note: You might want to skip this section since it discusses behaviour which the Bash manual describes as undefined for associative arrays, and which therefore shouldn’t be used in production.

The downloadable script in bash17_ex3.sh demonstrates a use of “substring expansion” with associative arrays:

#!/bin/bash

#-------------------------------------------------------------------------------
# Example 3 for Bash Tips show 17: Using "substring expansion" to extract
# associative array elements
#-------------------------------------------------------------------------------

#
# Make two indexed arrays each containing 10 letters. Note: this is not the
# best way to do this!
#
declare -a a1=( $(echo {a..j}) )
declare -a a2=( $(echo {k..t}) )

#
# Build an associative array using one set of letters as subscripts and the
# other as the values
#
declare -A hash
for ((i=0; i<10; i++)); do
    hash[${a1[$i]}]="${a2[$i]}"
done

#
# Display the associative array contents
#
echo "Contents of associative array 'hash'"
for key in "${!hash[@]}"; do
    printf '%s=%s\n' "hash[$key]" "${hash[$key]}"
done
echo

#
# Walk the associative array printing pairs of values
#
echo "Pairs of values from array 'hash'"
for ((i=1; i<10; i+=2)); do
    printf '%d: %s\n' "$i" "${hash[*]:$i:2}"
done

The two indexed arrays 'a1' and 'a2' are filled with a series of 10 letters and these are then used to build the test associative array 'hash'. This array is printed by the script to show what we did.

Note that we used the expression "${!hash[@]}" which returns a list of the subscripts for the 'hash' array. We’ll look at this in more detail shortly.

Note also the use of "${hash[*]:$i:2}", with '*', in the final 'printf'. This joins the two array elements returned into one word, so a single '%s' in the 'printf' format prints both values.
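
The effect of '*' versus '@' on 'printf' can be seen in this small sketch. It is separate from the example script and uses an indexed array so that the element order is defined:

```bash
#!/bin/bash

# Sketch only: an indexed array is used so the element order is defined
declare -a planets=(mercury venus earth mars)

# '@' gives two words, so printf applies the format twice
at_result=$(printf '[%s]' "${planets[@]:1:2}")

# '*' gives one word, so the format is applied once
star_result=$(printf '[%s]' "${planets[*]:1:2}")

echo "$at_result"
echo "$star_result"
```

With '@' the result is '[venus][earth]', and with '*' it is '[venus earth]'.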

The final loop in the script uses substring expansion to display pairs of array elements. It does this successfully, but it may well be that more complex examples will not work.

Invoking the script results in the following:

Contents of associative array 'hash'
hash[a]=k
hash[b]=l
hash[c]=m
hash[d]=n
hash[e]=o
hash[f]=p
hash[g]=q
hash[h]=r
hash[i]=s
hash[j]=t

Pairs of values from array 'hash'
1: k l
3: m n
5: o p
7: q r
9: s t

I tried another experiment like the previous one, this time using random words. I found this one worked too.

The downloadable script in bash17_ex4.sh contains this experiment. I will leave it for you to investigate further if this interests you.


List keys (indices or subscripts)

This expansion gives access to the indices, subscripts or keys of arrays. The syntax is:

${!name[@]}
${!name[*]}

If name is an array variable, expands to the list of array indices (keys) assigned in name. If name is not an array, expands to 0 if name is set and null otherwise. When '@' is used and the expansion appears within double quotes, each key expands to a separate word.

This is used in bash17_ex3.sh and bash17_ex4.sh to enable the associative arrays to be printed with their keys. The 'for' loop uses:

for key in "${!hash[@]}"; do

The choice between '*' and '@' (when the expansion is written in double quotes) determines whether the keys are returned as one concatenated word or as a series of separate words.
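
A minimal sketch of this difference follows. The helper function 'count_words' and the array 'sparse' are invented for the illustration; the function just reports how many arguments it received:

```bash
#!/bin/bash

# Sketch only: 'count_words' is a hypothetical helper that reports how many
# arguments it was given
count_words() { echo "$#"; }

declare -a sparse=([2]=b [5]=e [9]=i)

at_count=$(count_words "${!sparse[@]}")     # one word per key
star_count=$(count_words "${!sparse[*]}")   # all keys in one word

echo "keys: ${!sparse[*]}"
echo "'@' gives $at_count words, '*' gives $star_count"
```

The example also shows that for an indexed array with gaps only the indices actually in use (2, 5 and 9 here) are listed.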

Length of string or array

We saw this expansion in show 1648. The syntax is:

${#parameter}

${#name[@]}
${#name[*]}

In the case where parameter is a simple variable this returns the length of the contents (i.e. the length of the string produced by expanding the parameter).

$ veggie='kohlrabi'
$ echo "${#veggie}"
8

For an array with a subscript of '*' or '@' it returns the number of elements in the array:

$ declare -a vegs=(celeriac artichoke asparagus)
$ echo "${#vegs[@]}"
3
$ echo "${vegs} ${#vegs}"
celeriac 8

Note that using just the name of an indexed array (without a subscript) refers to element '[0]', as we discussed last episode, so "${#vegs}" returns the length of that first element.
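
Two further cases, sketched here as an illustration (the names 'second_len', 'sparse' and 'count' are made up): the length of a specific element, and the element count of an array with gaps in its indices.

```bash
#!/bin/bash

# Sketch only: length of one element, and count of a sparse array
declare -a vegs=(celeriac artichoke asparagus)

# A numeric subscript gives the length of that element ('artichoke')
second_len="${#vegs[1]}"

# The count is the number of elements set, not the highest index plus one
declare -a sparse=([2]=x [9]=y)
count="${#sparse[@]}"

echo "second_len=$second_len count=$count"
```

Here 'second_len' is 9 and 'count' is 2, even though the highest index in 'sparse' is 9.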

Removing leading or trailing parts that match a pattern

Again we looked at these in show 1648. There are four syntaxes listed in the manual:

${parameter#word}
${parameter##word}

${parameter%word}
${parameter%%word}

In these, word is a glob pattern (or an extglob pattern if 'extglob' is enabled). The forms using one or two '#' characters after the parameter remove leading characters and those using one or two '%' characters remove trailing characters.

The significance of the single versus the double '#' and '%' is that in the single case the shortest leading/trailing pattern is matched. In the double case the longest leading/trailing pattern is matched.
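
As a quick illustration of shortest versus longest matching on a simple string, here is a minimal sketch using a made-up file path (all variable names are invented for the example):

```bash
#!/bin/bash

# Sketch only: 'file' is a made-up path
file='/home/user/archive.tar.gz'

short_lead="${file#*.}"     # shortest leading match of '*.'  -> tar.gz
long_lead="${file##*.}"     # longest leading match of '*.'   -> gz
short_trail="${file%.*}"    # shortest trailing match of '.*' -> /home/user/archive.tar
long_trail="${file%%.*}"    # longest trailing match of '.*'  -> /home/user/archive

printf '%s\n' "$short_lead" "$long_lead" "$short_trail" "$long_trail"
```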

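The downloadable script in bash17_ex5.sh demonstrates removing leading and trailing parts that match patterns in a variety of ways. The expansions are applied to a whole array using the '@' subscript, in which case the trimming is applied to each element in turn: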

#!/bin/bash

#-------------------------------------------------------------------------------
# Example 5 for Bash Tips show 17: Trimming leading or trailing parts
#-------------------------------------------------------------------------------

#
# Make an indexed array of root vegetables
#
declare -a vegs=(celeriac artichoke asparagus parsnip mangelwurzel daikon turnip)
printf '%s\n\n' "${vegs[*]}"

#
# Demonstrate some trimming
#
echo "1. Removing the first character:"
echo "${vegs[@]#?}"

echo "2. Removing characters up to and including the first vowel:"
echo "${vegs[@]#*[aeiou]}"

echo "3. Removing characters up to and including the last vowel:"
printf '[%s] ' "${vegs[@]##*[aeiou]}"
echo

echo "4. Using an extglob pattern to remove several different leading patterns:"
shopt -s extglob
echo "${vegs[@]#@(cele|arti|aspa|mangel)}"

echo "5. Removing the last character:"
echo "${vegs[@]%?}"

echo "6. Removing from the last vowel to the end:"
echo "${vegs[@]%[aeiou]*}"

echo "7. Removing from the first vowel to the end:"
printf '[%s] ' "${vegs[@]%%[aeiou]*}"
echo

echo "8. Using an extglob pattern to remove several different trailing patterns:"
echo "${vegs[@]%@(iac|oke|gus|nip|zel)}"

Note the use of 'printf' in the script. This is used to enclose the results of the trimming in square brackets in order to make the results clearer. In some cases the trimming has removed the entirety of the string which would have been harder to see if this hadn’t been done.

Invoking the script results in the following:

celeriac artichoke asparagus parsnip mangelwurzel daikon turnip

1. Removing the first character:
eleriac rtichoke sparagus arsnip angelwurzel aikon urnip
2. Removing characters up to and including the first vowel:
leriac rtichoke sparagus rsnip ngelwurzel ikon rnip
3. Removing characters up to and including the last vowel:
[c] [] [s] [p] [l] [n] [p] 
4. Using an extglob pattern to remove several different leading patterns:
riac choke ragus parsnip wurzel daikon turnip
5. Removing the last character:
celeria artichok asparagu parsni mangelwurze daiko turni
6. Removing from the last vowel to the end:
celeri artichok asparag parsn mangelwurz daik turn
7. Removing from the first vowel to the end:
[c] [] [] [p] [m] [d] [t] 
8. Using an extglob pattern to remove several different trailing patterns:
celer artich aspara pars mangelwur daikon tur

  1. As I was recording the audio I realised I hadn’t discussed associative array concatenation. There is no simple way to concatenate these types of arrays. However, we will look at a way of doing this in the next episode.

  2. I added the last example after realising, while recording the audio, that there was no example of a negative length.