Some additional Bash tips (HPR Show 1951)

Dave Morriss


Expansion

As we saw in the last episode (1903), there are seven types of expansion, which are applied to the command line in the following order:

  • Brace expansion (we looked at this subject in episode 1884)
  • Tilde expansion (seen in episode 1903)
  • Parameter and variable expansion (this was covered in episode 1648)
  • Command substitution (seen in episode 1903)
  • Arithmetic expansion
  • Word splitting
  • Pathname expansion

There is also another, process substitution, which on systems that can implement it is performed at the same time as tilde, parameter, variable and arithmetic expansion and command substitution.

We will look at one more of these expansion types in this episode but since there is a lot to cover, we'll continue this subject in a later episode.

Note

For this episode I have changed the convention I am using for indicating commands and their output to the following:

$ echo "Message"
Message

The line beginning with a $ is the command that is typed and the rest is what is returned by the command.

It was pointed out to me that there was ambiguity in the examples in previous episodes, for which I apologise.

Arithmetic expansion

This form of expansion evaluates an arithmetic expression and returns the result. The format is:

$((expression))

So an example might be:

$ echo $((42/5))
8

This is integer arithmetic; the fractional part is simply thrown away.
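
To make the "thrown away" part concrete, here is a small sketch that also looks at the remainder operator and at a negative operand. The variable names (q, r, nq, nr) are just ones I have chosen for illustration.

```shell
#!/usr/bin/env bash
# Integer division discards the fractional part; % gives the remainder.
q=$((42/5))      # quotient
r=$((42%5))      # remainder
echo "42/5 = $q remainder $r"

# With a negative operand the result is truncated towards zero,
# it is not rounded down to the next lower integer.
nq=$((-42/5))
nr=$((-42%5))
echo "-42/5 = $nq remainder $nr"
```

This should report a quotient of 8 with remainder 2, and then -8 with remainder -2.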

To digress: if you want the full fractional answer then using the bc command would probably be wiser. This was covered in Dann Washko's "Linux in the Shell" series in HPR show number 1202.

For example, using bc in command substitution as in:

$ echo $(echo "scale=2; 42/5" | bc)
8.40

The "scale=2" is required to make bc output the result with two decimal places. By default the scale is zero, so bc would return just 8.

Note that using echo to report the result of this command sequence is not normally useful. It is used here just to demonstrate the point. Writing something like the following makes more sense in a script:

$ res=$(echo "scale=2; 42/5" | bc)
$ echo $res
8.40
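
If bc is not available, a similar result can be approximated with integer arithmetic alone by working in hundredths. This is only a sketch of that alternative technique, not a replacement for bc: it assumes non-negative values and it truncates rather than rounds. The names num, den and scaled are my own.

```shell
#!/usr/bin/env bash
# Pure-Bash sketch: compute 42/5 to two decimal places by scaling.
# Assumes non-negative operands; the result is truncated, not rounded.
num=42
den=5
scaled=$(( num * 100 / den ))     # 840, that is 8.40 expressed in hundredths

# Split the scaled value into whole and fractional parts.
printf -v res '%d.%02d' $(( scaled / 100 )) $(( scaled % 100 ))
echo "$res"
```

For the values used here the script prints 8.40.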

The expressions allowed by Bash in arithmetic expansion include the use of variables. Normally these variables are written as just the plain name without the leading '$', though adding this is permitted. For example:

$ x=42
$ echo $((x/5))
8
$ echo $(($x/5))
8

There are potential pitfalls with using the '$' however, as we will see. The expression is subject to variable expansion, so in the second example above $x becomes 42 and the expression resolves to 42/5.

If a variable is null or unset (and is used without the leading '$') then it evaluates to zero. This is another reason not to use the parameter substitution method.
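
The following sketch contrasts the two ways of referring to an unset variable. The variable y and the result names are hypothetical. The failing case is run in a subshell so that the error message can be discarded and the script carries on.

```shell
#!/usr/bin/env bash
unset y

# Referenced by name, the unset variable counts as 0.
by_name=$((y*2))
echo "by name: $by_name"

# With a leading $, the expansion leaves just "*2" as the expression,
# which is a syntax error. The subshell contains the failure.
if ( : $(($y*2)) ) 2>/dev/null; then
    with_dollar="ok"
else
    with_dollar="error"
fi
echo "with \$: $with_dollar"
```

The first form gives 0, while the second is reported as an error.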

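The value of a variable is evaluated as an arithmetic expression when it is referenced. An integer is used directly, but a text string that looks like a name is treated as the name of another variable, and if that variable is unset the result is zero, as in the first example below. A string that is not a valid expression causes an error.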

$ str="A"
$ echo $((str*2))
0

$ str="0xA"
$ echo $((str*2))
20

Bash also interprets non-decimal numerical constants (as in the second example). For a start, any number beginning with a zero is taken to be octal, and hexadecimal numbers are denoted by a leading 0x or 0X.
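
Returning to the text string case for a moment: the manual page extract at the end of these notes says that the value of a variable is evaluated as an arithmetic expression when it is referenced. The sketch below, which reuses the names str and A from the first example, suggests that the zero result comes from A being looked up as a variable that happens to be unset.

```shell
#!/usr/bin/env bash
unset A
str="A"
before=$((str*2))    # A is unset, so str evaluates to 0
A=5
after=$((str*2))     # now str leads to A, which holds 5
echo "before=$before after=$after"
```

Once A has a value the same expression returns 10 rather than 0.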

Be aware that the way in which octal constants are written can lead to unexpected outcomes:

$ x=010
$ echo $((x))
8
$ x=018
$ echo $((x))
bash: 018: value too great for base (error token is "018")
$ printf -v x "%03d\n" 19
$ echo $((x))
bash: 019: value too great for base (error token is "019")
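
One way around this, using the base#number notation, is to state base 10 explicitly. This is a minimal sketch and it is one of the few places where the leading '$' is wanted inside the expression, because the digits themselves have to follow the 10#. It assumes the value has no sign in front of it.

```shell
#!/usr/bin/env bash
printf -v x '%03d' 19     # x now holds the string 019

# Prefixing 10# makes Bash read the digits as base 10 rather than octal.
y=$((10#$x))
echo "$x -> $y"
```

This prints "019 -> 19".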

There is also a complete system of defining numbers with bases between 2 and 64. Such numbers are written as:

base#number

If the 'base#' is omitted then base 10 is used (or the octal and hexadecimal conventions above may be used).

As in hexadecimal numbers, other characters are used to show the digits of other bases. These are, in order, 'a' to 'z', 'A' to 'Z', '@' and '_'.
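
Here are a few constants in different bases, evaluated with arithmetic expansion. The variable names are only for illustration. The case-insensitivity for bases up to 36 is taken from the manual page extract at the end of these notes.

```shell
#!/usr/bin/env bash
bin=$((2#1010))     # binary 1010 is 10
oct=$((8#17))       # octal 17 is 15
lower=$((36#z))     # z is the last digit of base 36, so 35
upper=$((36#Z))     # in bases up to 36 the case does not matter, also 35
big=$((64#Z))       # in base 64 the uppercase letters follow the lowercase ones
echo "$bin $oct $lower $upper $big"
```

The output should be "10 15 35 35 61".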

The contexts in which these number formats are understood by Bash are limited. Consider the following:

$ x=16#F
$ echo $x
16#F
$ x=0xF
$ echo $x
0xF

Bash has not converted these values, but has treated them like strings.

It is possible to declare a variable as an integer (and set its value) thus:

$ declare -i II=16#F
$ echo $II
15

In this case the base 16 number has been converted.
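
The integer attribute applies to later assignments as well, so whatever is assigned is evaluated as an arithmetic expression. A short sketch, with hypothetical variable names, comparing this with an ordinary variable:

```shell
#!/usr/bin/env bash
declare -i total
total=2+3*4          # evaluated as an expression
first=$total
total=0x1F           # hexadecimal constant
second=$total

plain=2+3*4          # ordinary variable: this is just a string
echo "$first $second $plain"
```

This prints "14 31 2+3*4".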

There is also a let command that will evaluate such numeric constants:

$ let x=16#F
$ echo $x
15

Alternatively, using arithmetic expansion syntax causes interpretation to take place:

$ x=16#F
$ echo $((x))
15

The following loop could be used to examine the decimal values 0..63 of the 64 digits used in base 64 notation. I have written it here as a short Bash script which could be placed in a file:

#!/usr/bin/env bash

for x in {0..9} {a..z} {A..Z} @ _; do
    n="64#$x"
    echo "$n=$((n))"
done

The script reports values like:

64#0=0
64#1=1
64#2=2
.
.
64#Y=60
64#Z=61
64#@=62
64#_=63

There is more that can be said about this, but I will leave you to explore. I could possibly talk about this subject in another episode if there is any interest.

Examples of Arithmetic Evaluation

The way in which the arithmetic expression in an arithmetic expansion is interpreted is defined in the Bash manpage under the ARITHMETIC EVALUATION heading. A copy of this is included at the end of these notes.

The use of arithmetic evaluation in Bash is quite powerful but has some problems. I could devote a whole episode to this subject, but I will restrict myself in this episode. I prepared a few examples of some of the operators which I hope will give some food for thought.

Pre- and post-increment and decrement

These operators increment or decrement the contents of a variable by 1. The pre- and post- effects control whether the operation is performed before or after the value is used.

Here are some examples of the effects:

$ val=87
$ echo "$((++val)) $val"
88 88
$ echo "$((--val)) $val"
87 87
$ echo "$((val++)) $val"
87 88
$ echo "$((val--)) $val"
88 87

Note that the pre- and post-increment and decrement operators need variables, not numbers. That means that the following is acceptable, as we saw:

$ myvar=128
$ echo $((++myvar))
129

However, placing a $ in front of the variable name causes substitution to take place and its contents to be used, which is either not acceptable or leads to unwanted effects:

$ myvar=128
$ echo $((++myvar))
129
$ echo $(($myvar--))
bash: 129--: syntax error: operand expected (error token is "-")

Also be aware that the following expression is not illegal, but is potentially confusing:

$ myvar=128
$ echo $((--$myvar))
128

It substitutes the value of myvar which then has two '-' signs in front of it: --128. The effect is the same as -(-128), in other words, the two minus signs cancel one another. Plenty of scope for confusion!
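
A more everyday use of these operators is keeping a counter. The sketch below numbers some words with pre-increment; the word list and variable names are made up for the example.

```shell
#!/usr/bin/env bash
count=0
for word in alpha beta gamma; do
    # Pre-increment: count is raised first, then its new value is used.
    echo "$((++count)): $word"
done
echo "count is now $count"
```

The words are numbered 1 to 3, and count is left holding 3 because the expansion changes the variable in the current shell.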

Unary minus

$ int=16
$ echo $((-int))
-16

This example just turns 16 into minus 16, as you would expect. We already saw this when discussing the pre-decrement operator.

Exponentiation

$ echo $((int**2))
256

Here we compute 16 squared.

More complex expressions with parentheses

$ echo $(( (int**2 + 3) % 5 ))
4

This adds 3 to the result of 16 squared (giving 259) then returns the remainder after division by 5. We need the parentheses to prevent the % (remainder) operator applying to the 3.
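
To see what the parentheses are doing, the same expression can be evaluated with and without them. The names without and with are mine.

```shell
#!/usr/bin/env bash
int=16
without=$(( int**2 + 3 % 5 ))     # % binds tighter than +, so this is 256 + (3 % 5)
with=$(( (int**2 + 3) % 5 ))      # the sum is formed first, then the remainder is taken
echo "without parentheses: $without"
echo "with parentheses:    $with"
```

Without the parentheses the result is 259; with them it is 4.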

Bitwise shift, bitwise OR

$ echo $((int>>1))
8

$ echo $((int<<1))
32

$ echo $(( (int<<1) | 8 ))
40

$ printf "%#x\n" $(( (int<<1) | 8 ))
0x28

  1. Since 16 is binary 10000, shifting it to the right once returns 1000 which is the binary representation of decimal 8.
  2. Shifting 16 to the left once returns 100000 which is 32 in decimal.
  3. Taking 16 shifted left 1 (32) and binary OR'ing 8 to it is the same as binary 100000 OR 01000 which is 101000, which is decimal 40.
  4. The same calculation printed as hexadecimal is 28, which can be visualised in binary as 0010 1000. The printf format %#x prints numbers in hexadecimal with a leading 0x.
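
Shifts and bitwise operators are often combined to treat a number as a set of on/off flags. The following is only a sketch of that idea; the bit positions and variable names are arbitrary choices. It also uses the bitwise AND (&) and bitwise negation (~) operators from the list in the manual page extract.

```shell
#!/usr/bin/env bash
flags=0
flags=$(( flags | (1<<3) ))       # set bit 3, giving 8
set_val=$flags
is_set=$(( (flags>>3) & 1 ))      # test bit 3, giving 1
flags=$(( flags | (1<<5) ))       # set bit 5 as well, 8 + 32 = 40
both=$flags
flags=$(( flags & ~(1<<3) ))      # clear bit 3 again, leaving 32
printf '%d %d %d %d\n' "$set_val" "$is_set" "$both" "$flags"
```

This prints "8 1 40 32".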

Conditional operator

The conditional operator is similar to the equivalent in C and many other languages but only operates with integer values.

$ myvar=$((int<<3))
$ msg=('under 100' 'between 100 and 200' 'over 200')
$ range=$((myvar>100?$((myvar>200?2:1)):0))
$ echo "myvar=$myvar, range=$range, message: ${msg[$range]}"
myvar=128, range=1, message: between 100 and 200

Here myvar is set to 16 shifted left 3 places, which is the same as multiplying it by 2 three times, resulting in 128.

We declare an array msg which holds three text strings (index 0, 1 and 2).

Then range is set to the result of a complex expression. If myvar is greater than 100 then the second arithmetic expansion is used which tests to see if myvar is greater than 200. If it is then the result returned is 2, otherwise 1 is returned. If the value of myvar is 100 or less then 0 is returned.

So a value of 0 means "under 100", 1 means "between 100 and 200" and 2 means "over 200". The echo reports the values of myvar and range and uses range to index the appropriate element of the array msg (we looked at array indexing in episode 1684).

$ myvar=$((int<<4))
$ range=$((myvar>100?$((myvar>200?2:1)):0))
$ echo "myvar=$myvar, range=$range, message: ${msg[$range]}"
myvar=256, range=2, message: over 200

Here we set myvar to 16 shifted left 4 places, or in other words 16 times 2**4 (16 times 16), which is 256. We then recalculate the value of range and use the same echo as before.

These are not particularly robust examples of conditional expressions, but hopefully they serve to make the point.
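
The inner test does not have to be a second arithmetic expansion; plain parentheses inside a single expansion seem to work just as well. Here is a sketch of that variation, wrapped in a function so it can be tried with several values. The function name classify is hypothetical.

```shell
#!/usr/bin/env bash
msg=('under 100' 'between 100 and 200' 'over 200')

# classify: print 0, 1 or 2 according to the size of its argument
classify() {
    local v=$1
    echo $(( v>100 ? (v>200 ? 2 : 1) : 0 ))
}

for myvar in 64 128 256; do
    range=$(classify "$myvar")
    echo "myvar=$myvar, range=$range, message: ${msg[$range]}"
done
```

The three values fall into ranges 0, 1 and 2 respectively. Note that a value of exactly 100 gives 0 since the test is "greater than".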

Assignment

Variable assignments may be performed in these arithmetic expressions. The assignment operators also include combinations with arithmetic operators:

$ echo $((x=12#20))
24
$ echo $((x*=2))
48
$ echo $((x%=5))
3

In this example x is set to 20 in base 12 ("two zero base 12") which is decimal 24. This is then multiplied by 2 and saved back into x, then the remainder of division by 5 is saved in x.

$ echo $((b=2#1000))
8
$ echo $((b|=2#10))
10

Here the number 1000 in base 2 ("one zero zero zero base 2") is saved in b, which is decimal 8. This is then bitwise OR'ed with 10 in base 2 ("one zero base 2") which is 2. The result saved in b is decimal 10 (or 1010 in base 2).
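
The combined assignment operators are handy for accumulating a result in a loop. In this sketch the ':' command is used simply as somewhere to put the expansion, since only its side effect on total is wanted. The variable names are my own.

```shell
#!/usr/bin/env bash
total=0
for n in {1..10}; do
    # The : command does nothing; it is here only so the expansion is evaluated.
    : $(( total += n ))
done
echo "$total"
```

The sum of the numbers 1 to 10 is printed, which is 55.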

There is no simple way of printing binary numbers in Bash. If you have difficulty in visualising them, you can use bc as follows:

$ echo "obase=2;$b" | bc
1010

The obase variable in bc defines the output number base.
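
If you would rather stay within Bash, the shift and bitwise AND operators can be used to build the binary digits one at a time. This is a rough sketch, not a built-in feature; the function name tobin is hypothetical and it assumes a non-negative integer argument.

```shell
#!/usr/bin/env bash
# tobin: print the binary form of a non-negative integer
tobin() {
    local n=$1 bits=""
    if [ "$n" -eq 0 ]; then
        echo 0
        return
    fi
    while [ "$n" -gt 0 ]; do
        bits="$(( n & 1 ))$bits"     # prepend the lowest bit
        n=$(( n >> 1 ))              # then shift it away
    done
    echo "$bits"
}

tobin 10
tobin 40
```

This prints 1010 and 101000, matching the values discussed above.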


Manual Page Extracts

EXPANSION

Expansion is performed on the command line after it has been split into words. There are seven kinds of expansion performed: brace expansion, tilde expansion, parameter and variable expansion, command substitution, arithmetic expansion, word splitting, and pathname expansion.

The order of expansions is: brace expansion; tilde expansion, parameter and variable expansion, arithmetic expansion, and command substitution (done in a left-to-right fashion); word splitting; and pathname expansion.

On systems that can support it, there is an additional expansion available: process substitution. This is performed at the same time as tilde, parameter, variable, and arithmetic expansion and command substitution.

Only brace expansion, word splitting, and pathname expansion can change the number of words of the expansion; other expansions expand a single word to a single word. The only exceptions to this are the expansions of "$@" and "${name[@]}" as explained above (see PARAMETERS).

Brace Expansion

See the notes for HPR show 1884.

Tilde Expansion

See the notes for HPR show 1903.

Parameter Expansion

See the notes for HPR show 1648.

Command Substitution

See the notes for HPR show 1903.

Arithmetic Expansion

Arithmetic expansion allows the evaluation of an arithmetic expression and the substitution of the result. The format for arithmetic expansion is:

$((expression))

The old format $[expression] is deprecated and will be removed in upcoming versions of bash.

The expression is treated as if it were within double quotes, but a double quote inside the parentheses is not treated specially. All tokens in the expression undergo parameter and variable expansion, command substitution, and quote removal. The result is treated as the arithmetic expression to be evaluated. Arithmetic expansions may be nested.

The evaluation is performed according to the rules listed below under ARITHMETIC EVALUATION. If expression is invalid, bash prints a message indicating failure and no substitution occurs.


ARITHMETIC EVALUATION

The shell allows arithmetic expressions to be evaluated, under certain circumstances (see the let and declare builtin commands and Arithmetic Expansion). Evaluation is done in fixed-width integers with no check for overflow, though division by 0 is trapped and flagged as an error. The operators and their precedence, associativity, and values are the same as in the C language. The following list of operators is grouped into levels of equal-precedence operators. The levels are listed in order of decreasing precedence.

id++ id--
    variable post-increment and post-decrement
++id --id
    variable pre-increment and pre-decrement
- +
    unary minus and plus
! ~
    logical and bitwise negation
**
    exponentiation
* / %
    multiplication, division, remainder
+ -
    addition, subtraction
<< >>
    left and right bitwise shifts
<= >= < >
    comparison
== !=
    equality and inequality
&
    bitwise AND
^
    bitwise exclusive OR
|
    bitwise OR
&&
    logical AND
||
    logical OR
expr?expr:expr
    conditional operator
= *= /= %= += -= <<= >>= &= ^= |=
    assignment
expr1 , expr2
    comma

Shell variables are allowed as operands; parameter expansion is performed before the expression is evaluated. Within an expression, shell variables may also be referenced by name without using the parameter expansion syntax. A shell variable that is null or unset evaluates to 0 when referenced by name without using the parameter expansion syntax. The value of a variable is evaluated as an arithmetic expression when it is referenced, or when a variable which has been given the integer attribute using declare -i is assigned a value. A null value evaluates to 0. A shell variable need not have its integer attribute turned on to be used in an expression.

Constants with a leading 0 are interpreted as octal numbers. A leading 0x or 0X denotes hexadecimal. Otherwise, numbers take the form [base#]n, where the optional base is a decimal number between 2 and 64 representing the arithmetic base, and n is a number in that base. If base# is omitted, then base 10 is used. When specifying n, the digits greater than 9 are represented by the lowercase letters, the uppercase letters, @, and _, in that order. If base is less than or equal to 36, lowercase and uppercase letters may be used interchangeably to represent numbers between 10 and 35.

Operators are evaluated in order of precedence. Sub-expressions in parentheses are evaluated first and may override the precedence rules above.