Response to show 2720 (HPR Show 2736)

Some suggestions on how to improve a Bash script

Dave Morriss


Table of Contents

Introduction

On January 4th 2019 Ken Fallon did a great show entitled hpr2720 :: Download youtube channels using the rss feeds where he presented a Bash script called youtube-rss.bash for managing YouTube downloads through RSS feeds.

Ken said he welcomed constructive feedback

When I see a Bash script these days I usually find myself looking for ways to rewrite it to make it fit in with what I have been learning while doing my Bash Tips sub-series. Either that or I find it’s got some better ideas than I’ve been using which I have to find out about.

I also spend time going over my own old scripts (I was writing them in the 1990’s in some cases) and trying to incorporate newer Bash features.

Suffice it to say that I spotted some areas for improvement in Ken’s script and thought this might be the way to share my thoughts about them. We’re low on shows as I write this, so that gave me more motivation to make a show rather than add a comment or send Ken an email.

Apology: I’m still suffering from the aftermath of some flu-like illness so have had to edit coughing fits out of the audio at various points. If you detect any remnants then I’m sorry!

General issues

There are a few uses of while loops in the script that I would be inclined to rewrite, but as it stands the script does what is wanted without these changes.

I loaded the script into Vim where I have the ShellCheck tool configured to examine Bash scripts. It found many issues, some fairly trivial, but a few were a bit more serious.

Click on the thumbnail to get an idea of what ShellCheck reported.

Vim with ShellCheck

Here are some of the reported issues:

  • [34] ShellCheck doesn’t like "${logfile}".$(/bin/date +%Y%m%d%H%M%S) but would be happy with "${logfile}.$(/bin/date +%Y%m%d%H%M%S)". It’s not clever enough to know that the date call will not generate spaces.

  • [39,48] Using read without the -r option means that any backslash characters in the input will not be handled properly so ShellCheck reports this.

  • [39,48] Personally, as alluded to above, I now try to avoid loops of the form:

      command_generating_data | while read -r var; do
          do_things
      done

    That is because the loop body is run in a sub-process which cannot communicate back to the main script.

    I would much rather write it as:

      while read -r var; do
          do_things
      done < <(command_generating_data)

    because the loop body can affect variables in the main script if needed.

In Ken’s script these are not serious issues since these are ShellCheck warnings and no loop needs to pass variables to the main script. However, as far as the loops are concerned, it would be all too easy to enhance them in the future to pass back values forgetting that it will not work. I have made this mistake myself on more than one occasion!

Specifics

I want to highlight a few instances which I would be strongly inclined to rewrite. I have referred to them by line within the youtube-rss.bash downloadable from show 2720.

Line 50

  if [ "$( grep "${thisvideo}" "${logfile}" | wc -l )" -eq 0 ]

This if command uses grep to search $logfile for $thisvideo and wants to know if there is (at least) one match, so the output is passed to wc to count the lines and the resulting number from the command substitution is checked to see if it is zero – i.e. there was no match.

However, since grep returns a true/false answer for cases such as this, the following would do the same but simpler:

  if ! grep -q "${thisvideo}" "${logfile}"

This time grep performs the same test but its result is reversed by the '!'. The -q option tells it not to produce any output.

Line 68

    if [[ ! -z "${skipcrap}" && $( echo ${title} | egrep -i "${skipcrap}" | wc -l ) -ne 0 ]]

In the script shown in the notes for 2720 there’s a skipcrap variable, but this has been commented out, so ShellCheck objects to this of course. I re-enabled it for testing. In general though this if command is unnecessarily convoluted. It can be rewritten thus:

    if [[ -n "${skipcrap}" ]] && echo "${title}" | grep -E -q -i "${skipcrap}"

The changes are:

  • Rather than checking if the length of string skipcrap is zero then negating it, better to use the non-zero test from the start -n.

  • Since this -n test needs to be in [] or [[]] but the rest does not we organise things that way.

  • The second half of the test after && can be a pipeline since the result of the final part is what is considered in the test. In this case non-standard egrep is replaced by grep -E and the test is performed quietly with only the true/false result being of importance.

A further change might be to do away with echo "${title}" | grep -E -q -i "${skipcrap}" and use a Bash regular expression. Here grep is being used as a regular expression tool, comparing title with skipcrap. The latter is a regular expression itself (since that’s what grep -E needs), so everything is ready to go:

    if [[ -n "${skipcrap}" && $title =~ $skipcrap ]]

This passes the ShellCheck test and when I run it from the command line it seems to work:

$ skipcrap="fail |react |live |Best Pets|BLOOPERS|Kids Try"
$ title='The Best Pets in the World'
$ if [[ -n "${skipcrap}" && $title =~ $skipcrap ]]; then echo "Crap"; else echo "Non-crap"; fi
Crap

The downside is that the test is case-sensitive whereas the grep version used -i which made it case-insensitive. This can be overcome – in a slightly less elegant way perhaps – by forcing everything to lowercase for the test:

    if [[ -n "${skipcrap}" && ${title,,} =~ ${skipcrap,,} ]]

Line 88

  cat "${logfile}_todo" | ${youtubedl} --batch-file - --ignore-errors --no-mtime --restrict-filenames --format mp4 -o "${savepath}"'/%(uploader)s/%(upload_date)s-%(title)s⋄%(id)s.%(ext)s'

Shellcheck's complaint about this is that it’s a useless cat. Here’s the full message:

SC2002: Useless cat. Consider 'cmd < file | ..' or 'cmd file | ..' instead.

The commands can be rewritten as one of the following:

  ${youtubedl} --batch-file - --ignore-errors --no-mtime --restrict-filenames --format mp4 -o "${savepath}"'/%(uploader)s/%(upload_date)s-%(title)s⋄%(id)s.%(ext)s' < "${logfile}_todo"

  < "${logfile}_todo" ${youtubedl} --batch-file - --ignore-errors --no-mtime --restrict-filenames --format mp4 -o "${savepath}"'/%(uploader)s/%(upload_date)s-%(title)s⋄%(id)s.%(ext)s'

See the Wikipedia article on cat for more information (also UUOC and demoggification).

Conclusions

Bash is powerful but does some things in an obscure way.

Shellcheck is both a very helpful tool, helping to catch scripting errors, but can also be something of an irritant when it nags about things. I am preparing another HPR episode about its use with Vim as well as the possible use of other checkers.

Apologies to Ken if it seemed as if I was making excessive criticisms of his scripts. What I intended was constructive criticism of course.