Useful Bash functions - part 4 (HPR Show 2483)

Dave Morriss


Overview

This is the fourth show about the Bash functions I use, and it may be the last unless I come up with something else that I think might be of general interest.

There is only one function to look at this time, but it’s fairly complex so needs an entire episode devoted to it.

As before, it would be interesting to receive feedback on this function, and it would be great if other Bash users contributed ideas of their own.

The range_parse function

The purpose of this function is to read a string containing a range or ranges of numbers and turn it into the actual numbers intended. For example, a range like 1-3 means the numbers 1, 2 and 3.

I use this a lot. It’s really helpful when writing a script to select from a list. The script can show the list with a number against each item, then ask the script user to select which items they want to be deleted, or moved or whatever.

For example, I manage the podcasts I am listening to this way. I usually have two or three players with playlists on them. When the battery on one needs charging I can pick up another and continue listening to whatever is on there. I have a script that knows which playlists are on which player, and it asks me which episode I am listening to by listing all the playlists. I answer with a range. Another script then asks which of the episodes that I was listening to have finished. It then deletes the episodes I have heard.

Parsing a collection of ranges, then, is not particularly difficult, even in Bash, though dealing with some of the potential problems complicates matters a bit.

The function range_parse takes three arguments:

  1. The maximum value allowed in the range (the minimum is fixed at 1)
  2. The string containing the range expression itself
  3. The name of the variable to receive the result

An example of using the function might be:

$ source range_parse.sh
$ range_parse 10 '1-4,7,3,7' parsed
$ echo $parsed
1 2 3 4 7

The function has dealt with the repetition of 7 and the fact that the 3 is already in the range 1-4. It has also sorted the result, which is returned as a string that can be placed in an array or used in a for loop.
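
As a minimal sketch of those two uses, the following assumes that parsed already holds the string shown above (it is set by hand here so the fragment runs on its own). The variable names n and chosen are made up for the illustration.

```bash
#!/bin/bash

# Assumption: this is the string range_parse left in 'parsed' above
parsed='1 2 3 4 7'

# Unquoted on purpose: word splitting breaks the string at the spaces
for n in $parsed; do
    echo "Item $n selected"
done

# Or load the numbers into an array, one element per number
read -r -a chosen <<< "$parsed"
echo "${#chosen[@]} items, the first is ${chosen[0]}"
```

The for loop relies on the default IFS, so it would behave differently in a script that has changed IFS.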

Algorithm

The method used for processing the range presented to the function is fairly simple:

  1. The range string is stripped of spaces
  2. It is checked to ensure that the characters it contains are digits, commas and hyphens. If not then the function ends with an error
  3. The comma-separated elements are selected one by one
    • Elements consisting of groups of digits (i.e. numbers) are stored away for later
    • If the element contains a hyphen then it is checked to ensure it consists of two groups of digits separated by the hyphen, and it is split up and the range of numbers between its start and end is determined
    • The results of the step-by-step checking of elements are accumulated for the next stage
  4. The accumulated elements are checked to ensure they are each in range. Any that are not are rejected and an error message produced showing what was rejected.
  5. Finally all of the acceptable items are sorted and any duplicates removed and returned as a list in a string. If any errors occurred in the analysis of the range the function returns a ‘false’ value to the caller, otherwise ‘true’ is returned. This allows it to be used where a true/false value is expected, such as in an if statement, if desired.

Analysis of function

Here is the function itself, which may be downloaded from the HPR website as range_parse.sh. The line numbers quoted in the notes that follow count from the first line of this listing:

#===  FUNCTION  ================================================================
#         NAME: range_parse
#  DESCRIPTION: Parse a comma-separated list of numbers and "number-number"
#               ranges such as '1,3,5-7,9'
#   PARAMETERS: 1 - maximum limit of the range
#               2 - entered range expression (e.g. 1-3,7,14)
#               3 - name of the variable to receive the result
#      RETURNS: Writes a list of values to the nominated variable and returns
#               0 (true) if the range parsed, and 1 (false) if not
#===============================================================================
function range_parse {
    local max=${1?range_parse: arg1 missing}
    local range=${2?range_parse: arg2 missing}
    local -n result=${3?range_parse: arg3 missing}

    local item selection sel err msg i exitcode=0

    #
    # Remove spaces from the range
    #
    range=${range// /}

    #
    # Check for invalid characters
    #
    if [[ $range =~ [^0-9,-] ]]; then
        echo "Invalid range: $range"
        return 1
    fi

    #
    # Slice up the sub-ranges separated by commas and turn all n-m expressions
    # into the intermediate values. Trim the trailing space from the
    # concatenation.
    #
    until [[ -z $range ]]; do
        #
        # Get a comma-separated item
        #
        if [[ $range =~ [,] ]]; then
            item=${range%%,*}
            range=${range#*,}
        else
            item=$range
            range=
        fi

        #
        # Look for a 'number-number' expression
        #
        if [[ $item =~ [-] ]]; then
            if [[ $item =~ ^([0-9]{1,})-([0-9]{1,})$ ]]; then
                item=$(eval "echo {${item/-/..}}")
            else
                echo "Invalid sequence: ${item}"
                item=
                exitcode=1
            fi
        fi
        selection+="$item "
    done

    #
    # Check for out of bounds problems, sort the values and make unique
    #
    if [[ -n $selection ]]; then

        #
        # Validate the resulting range
        #
        for i in $selection; do
            if [[ $i -lt 1 || $i -gt $max ]]; then
                err+="$i "
            else
                sel+="$i "
            fi
        done

        #
        # Report any out of range errors
        #
        if [[ ${err+"${err}"} ]]; then
            msg="$(for i in ${err}; do echo "$i"; done | sort -un)"
            msg="${msg//$'\n'/ }"
            printf "Value(s) out of range: %s\n" "${msg}"
            exitcode=1
        fi

        #
        # Rebuild the selection after having removed errors
        #
        selection=
        if [[ ${sel+"${sel}"} ]]; then
            selection="$(for i in ${sel}; do echo "$i"; done | sort -un)"
            selection="${selection//$'\n'/ }"
        fi
    fi

    #
    # Return the result
    #
    result="$selection"

    return $exitcode
}
  • Line 11: There are two ways of declaring a function in Bash. The function name may be followed by a pair of parentheses and then the body of the function (usually enclosed in curly braces). Alternatively the word function is followed by the function name, optional parentheses and the function body. There is no significant difference between the two methods.

  • Lines 12 and 13: The first two arguments for the function are stored in local variables max (the maximum permitted number in the range) and range (the string holding the range expression to parse). In both cases we use the parameter expansion feature which halts the script with an error message if these arguments are not supplied.

  • Line 14: Here local -n is used for the local variable result which is to hold the name of a variable external to the function which will receive the result of parsing the expression. Using the -n option makes it a nameref; a reference to another variable. The definition in the Bash manual is as follows:

Whenever the nameref variable is referenced, assigned to, unset, or has its attributes modified (other than using or changing the nameref attribute itself), the operation is actually performed on the variable specified by the nameref variable’s value. A nameref is commonly used within shell functions to refer to a variable whose name is passed as an argument to the function.

There is more to talk about with nameref variables, but we will leave that for another time.

  • Line 16: Some other variables local to the function are declared here, and one (exitcode) is given an initial value.

  • Line 21: Here all spaces are being removed from the range list in variable range.

  • Lines 26 to 29: In this test the range variable is checked against a regular expression which matches any character other than the digits 0-9, a comma and a hyphen. These are the only characters allowed in the range list. If the match succeeds then an invalid character is present, so an error message is written and the function returns with a ‘false’ value.

  • Lines 36-61: This is the loop which chops up the range list into its component parts. Each time it iterates a comma-separated element is removed from the range variable, which grows shorter, and the test:

    until [[ -z $range ]]

    will become true when nothing is left.
    • Lines 40-46: This if statement looks to see if the range variable contains a comma, using a regular expression.
      • If it does a variable called item is filled with the characters of range up to the first comma. Then range is set to its previous contents without the part up to the first comma.
      • If there was no comma then item is set to the entirety of range and range is emptied. This is because this must be the last (or only) element.
    • Lines 51-59: At this point the element in item is either a plain number or a range expression of the form ‘number-number’. This pair of nested if statements determines if it is the latter and attempts to expand the range. The outer if tests item against a regular expression consisting of a hyphen, and if the result is true the inner if is invoked (see the footnote at the end of these notes).
      • Line 52: compares the contents of item against a more complex regular expression. This one looks for one or more digits, a hyphen, and one or more digits.
        • If found then item is edited to replace the hyphen by a pair of dots. This is inside braces as the argument to an echo statement. So, given 1-5 in item the echo will be given {1..5}, a brace expansion expression. The echo is the command of an eval statement (needed to actually execute the expansion), and this is inside a command substitution. The result should be that item is filled with the numbers from the expansion so 1-5 becomes ‘1 2 3 4 5’!
        • If the regular expression does not match then this is not a valid range, so this is reported in the else branch and item is cleared of its contents. Also, since we want this error reported to the caller we set exitcode to 1 for later use.
      • Line 60: Here a variable called selection is being used to accumulate the successive contents of item on each iteration. We use the += form of assignment to make it easier to do this accumulation. Notice that a trailing space is added to ensure none of the numbers collide with one another in the string.
  • Lines 66-97: This is an if statement which tests to see if the variable selection contains anything. If it does then the contents are validated.
    • Lines 71-77: This is a loop which cycles through the numbers in the variable. It is a feature of this form of the for loop that it operates on a list of space-separated items, and that’s what selection contains.
      • Lines 72-76: This if statement checks each number to ensure that it is in range between 1 and the value in the variable max.
        • If it is not in range then the number is appended to the variable err
        • If it is in range it is appended to the variable sel
    • Lines 82-87: This if statement tests to determine whether there is anything in the err variable. If it contains anything then there have been one or more errors, so we want to report this. The test used here seems very strange. The reason for it is discussed below in the Explanations section, explanation 1.
      • Line 83: The variable msg is filled with the list of errors. This is done with a command substitution expression where a for loop is used to list the numbers in err using an echo command and these are piped to the sort command. The sort command makes what it receives unique and sorts the lines numerically. This rather involved pipeline is needed because sort requires a series of lines, and these are provided by the echo. This deals with the possible duplication of the errors and the fact that they are not necessarily in any particular order.
      • Line 84: Because the process of sorting the erroneous numbers and making them unique has added newlines to them all we use this statement to remove them. This is an example of parameter expansion, and in this one the entire string is scanned for a pattern and each one is replaced by a space. There is a problem with replacing newlines in a string however, since there is no simple way to represent them. Here we use $'\n' to do this. See the Explanations section below (explanation 2) for further details.
      • Lines 85 and 86: The string of erroneous numbers is printed here and exitcode is set to 1 so the function can flag that there has been an error when it exits. It doesn’t exit at this point though, since some uses will simply ignore the returned value and carry on regardless.
    • Lines 92-96: At this point we have extracted all the valid numbers and stored them in sel and we want to sort them and make them unique as we did with err before returning the result to the caller. We start by emptying the variable selection in anticipation.
      • Line 93: This if statement checks that the sel variable actually contains anything. This test uses the unusual construct ${sel+"${sel}"}, which was explained for an earlier test. (See explanation 1 in the Explanations section below).
      • Lines 94 and 95: These rebuild selection by extracting the numbers from sel, sorting them and making them unique, and then removing the newlines this process has added. See the notes for lines 82-87 above and explanation 2 below.
  • Line 102: Here the variable result is set to the contents of selection. Now, since result is a nameref variable containing the name of a variable passed in when the range_parse function was called it is that variable that receives the result.

  • Line 104: Here the function returns to the caller. The value returned is whatever is in exitcode. By default this is zero, but if any sort of error has occurred it will have been set to 1, as discussed earlier.
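
The way lines 12 to 14 and line 102 work together can be seen in a much smaller function. This is only a sketch: set_greeting, target and message are names invented for the illustration and are not part of range_parse.sh. It assumes Bash 4.3 or later, where namerefs are available.

```bash
#!/bin/bash

# Hypothetical function: stores a greeting in the variable named by argument 2
function set_greeting {
    local name=${1?set_greeting: arg1 missing}
    local -n target=${2?set_greeting: arg2 missing}

    # Assigning to the nameref actually assigns to the caller's variable
    target="Hello, $name"
}

set_greeting World message
echo "$message"         # prints: Hello, World
```

One thing to be aware of with this technique is that the caller should not pass a variable with the same name as the nameref itself (target here, result in range_parse), because Bash then reports a circular name reference.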

Explanations

  1. The expression ${err+"${err}"} (see Lines 82-87 above), also ${sel+"${sel}"} (see Line 93 above): This strange expression is needed because a variable declared with local, but not given a value, remains unset until something is assigned to it.

    In all of my scripts I include the line set -o nounset (set -u is equivalent) which has the result of treating the use of unset variables in parameter expansion as a fatal error. The trouble is that either err or sel might be unset in this function in some circumstances, and referring to them in an ordinary test would then stop the function with an error. It should be possible to test a variable to see whether it is unset without the function crashing!

    This expression is a case of a parameter expansion of the ${parameter:+word} type, but without the colon. If the parameter is unset it returns a null string. If the parameter is set it returns word, which here is the parameter’s own contents (and these may be null). Either way it does so without triggering the unset variable alarm.

    I don’t like resorting to “magic” solutions like this but it seems to be a viable way of avoiding this issue.

  2. The expression $'\n' (see Line 84 above): This is an example of ANSI-C quoting. See the GNU Bash Reference Manual in the ANSI-C Quoting section for the full details.

    The construct must be written as $'string' which is expanded to whatever characters are in the string with certain backslash sequences being replaced according to the ANSI-C standard. This allows characters such as newline (\n) and carriage return (\r) as well as Unicode characters to be easily inserted. For example echo $'\U2192' produces → (in a browser and in many terminals).
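
Both explanations can be tried out in a short script. This is a sketch under the assumption that nounset is switched on, as described above. The functions check_err and sort_unique are invented for the demonstration and simply isolate the two constructs used in range_parse.

```bash
#!/bin/bash
set -o nounset

# Hypothetical helper: reports whether its local variable has any contents
function check_err {
    local err                   # declared but not assigned, so still unset
    [[ $# -gt 0 ]] && err=$1    # only given a value if an argument is passed

    # A plain [[ -n $err ]] would abort here under nounset when err is unset
    if [[ ${err+"${err}"} ]]; then
        echo "err contains: $err"
    else
        echo "err is unset or empty"
    fi
}

# Hypothetical helper: sort the arguments numerically, drop duplicates and
# join what is left with spaces
function sort_unique {
    local list n
    list="$(for n in "$@"; do echo "$n"; done | sort -un)"

    # $'\n' is the ANSI-C quoted newline; every one is replaced by a space
    echo "${list//$'\n'/ }"
}

check_err               # prints: err is unset or empty
check_err '11 12'       # prints: err contains: 11 12
sort_unique 7 3 7 1     # prints: 1 3 7
```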

Possible improvements

This function has been around the block for quite a few years. I wrote it originally for a script I developed at work in the 2000s and have been refining and using it in many other projects since. Preparing it for this episode has resulted in some further refinements!

  • The initial space removal means that '7,1-5' and '7 , 1 - 5 ' are identical as far as the algorithm is concerned. It also means that '4 2', which might have been written that way because a comma was omitted, is treated as '42' which might be a problem.

  • The command substitutions which sort lists of numbers and make them unique have to make use of the sort command. Ideally I’d like to avoid using external programs in my Bash scripts, but trying to do this type of thing in Bash where sort does a fine job seems a little extreme!

  • The reporting of all of the numbers which are out of range could lead to a slightly bizarre error report if called with arguments such as 20 '5-200' (where the second zero was added in error). Everything from 21-200 will be reported as an error! The function could be cleverer in this regard.
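
On the second point, one possible way of avoiding sort is sketched below. It is not what range_parse does. It relies on the fact that Bash lists the indices of an indexed array in ascending numeric order, so using each number as an index both removes duplicates and sorts. The function name unique_sorted is made up, and the sketch assumes its arguments are non-negative whole numbers.

```bash
#!/bin/bash

# Hypothetical pure-Bash replacement for the 'sort -un' pipeline
function unique_sorted {
    local -a seen=()
    local n

    for n in "$@"; do
        # 10# forces base 10 so that a value like 08 is not taken as octal
        seen[10#$n]=1
    done

    # The indices of an indexed array are listed in ascending order
    echo "${!seen[@]}"
}

unique_sorted 7 3 7 1 10 2      # prints: 1 2 3 7 10
```

Whether this is clearer than calling sort is a matter of taste, but it does avoid starting an external program.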

Examples of use

Simple command line usage

$ source range_parse.sh
$ range_parse 10 '1-3,9,7' mylist
$ echo "$mylist"
1 2 3 7 9

$ range_parse 10 '9-6,1,11' mylist
Value(s) out of range: 11
$ echo "$mylist"
1 6 7 8 9

$ range_parse 10 1,,2 somevar
$ echo "$somevar"
1 2

The range_parse function does not care what order the numbers and ranges are organised in the comma-separated list. It does not care about range overlaps either, nor does it care about empty items in the list. It flags items which are out of range but still prepares a final list.

A simple demo script

The simple script called range_demo.sh, which may be downloaded from the HPR website, is as follows:

#!/bin/bash -

#
# Test script to run the range_parse function
#

set -o nounset                              # Treat unset variables as an error

#
# Source the function. In a real script you'd want to provide a path and check
# the file is actually there.
#
source range_parse.sh

#
# Call range_parse with the first two arguments provided to this script. Save
# the output in the variable 'parsed'. The function is called in an 'if'
# statement such that it takes different action depending on whether the
# parsing was successful or not.
#
if range_parse "$1" "$2" parsed; then
    echo "Success"
    echo "Parsed list: ${parsed}"
else
    echo "Failure"
fi

exit

An example call might be:

$ ./range_demo.sh 10 1,9-7,2
Success
Parsed list: 1 2 7 8 9
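
The comment in the demo script mentions that a real script would provide a path and check that the file is there. A minimal sketch of such a check might look like this. The function load_library is a made-up name, and the idea that range_parse.sh sits in the same directory as the calling script is an assumption.

```bash
#!/bin/bash

# Hypothetical helper: source a file of functions only if it can be read
function load_library {
    local lib=${1?load_library: arg1 missing}

    if [[ ! -r $lib ]]; then
        echo "Unable to read $lib" >&2
        return 1
    fi

    # Functions defined by the sourced file become available to the caller
    source "$lib"
}

# Assumed layout: the library lives alongside the script that uses it
# load_library "$(dirname "${BASH_SOURCE[0]}")/range_parse.sh" || exit 1
```

The call is commented out so the fragment can be run on its own without range_parse.sh being present.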

If you download these files and test the function and find any errors please let me know!!


  1. Why do it this way? I did a double-take while preparing these notes wondering why I had organised the logic here in this way.

    The first part of the loop is concerned with getting the next item from a comma-separated list. At that point the contents of $item are either a bare number or a 'number-number' range. The differentiator between the two is a hyphen, so checking for that character allows the complex regular expression on line 52 to be omitted if it is not there.

    If you can think of a better way of doing this please let me know in the comments or by email.
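
One possible alternative, offered only as a sketch, is to let a single regular expression do the work and pick the two numbers out of BASH_REMATCH. The range is then expanded with an arithmetic for loop rather than eval and brace expansion. The function expand_item and its variable names are invented here. It is not a drop-in replacement, since it writes its result to standard output and its error message to standard error.

```bash
#!/bin/bash

# Hypothetical helper: expand one comma-separated item into its numbers
function expand_item {
    local item=$1 first last step=1 n out=

    if [[ $item =~ ^([0-9]+)-([0-9]+)$ ]]; then
        # 10# forces base 10 so leading zeros are not treated as octal
        first=$((10#${BASH_REMATCH[1]}))
        last=$((10#${BASH_REMATCH[2]}))

        # Count downwards for a reversed range such as 9-6
        (( first > last )) && step=-1

        for (( n = first; n != last + step; n += step )); do
            out+="$n "
        done
    elif [[ $item =~ ^[0-9]*$ ]]; then
        # A bare number, or an empty item as in '1,,2'
        out=$item
    else
        echo "Invalid sequence: $item" >&2
        return 1
    fi

    # Trim the trailing space left by the loop
    echo "${out% }"
}

expand_item 1-5         # prints: 1 2 3 4 5
expand_item 9-6         # prints: 9 8 7 6
expand_item 7           # prints: 7
```

Like the brace expansion in the original, a reversed range such as 9-6 is expanded in descending order, and the later sorting stage would put the numbers back in ascending order.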