Bash snippet - extglob and scp (HPR Show 2317)

Dave Morriss



The Problem

Following on from my last show on filename expansion, concentrating on extended patterns and the extglob option, I was asked a question by Jon Kulp in the comment section.

Jon was using ‘ls *(*.mp3|*.ogg)’ to find all OGG and MP3 files in a directory which also held other files. However, when he wanted to copy this subset of files elsewhere he had problems using this expression in an scp command.

Having done some investigating to help solve this, I thought I’d share what I found in an HPR episode, and this is the show.

Test Environment

On one of my Raspberry Pis (rpi4) I made some empty test files for the purposes of this show:

$ mkdir scptest
$ touch scptest/{a..c}{00..10}.{mkd,mp3,ogg}
$ ls -x -w 80 scptest/
a00.mkd  a00.mp3  a00.ogg  a01.mkd  a01.mp3  a01.ogg  a02.mkd  a02.mp3  a02.ogg
a03.mkd  a03.mp3  a03.ogg  a04.mkd  a04.mp3  a04.ogg  a05.mkd  a05.mp3  a05.ogg
.
.
.
c05.mkd  c05.mp3  c05.ogg  c06.mkd  c06.mp3  c06.ogg  c07.mkd  c07.mp3  c07.ogg
c08.mkd  c08.mp3  c08.ogg  c09.mkd  c09.mp3  c09.ogg  c10.mkd  c10.mp3  c10.ogg

So, we have made files with the extensions mkd, ogg and mp3 and these are shown with an ls command.

If we move into the directory and use Jon’s glob pattern (with the extglob option enabled), we see just the mp3 and ogg files:

$ cd scptest/
$ ls -x -w 80 *(*.mp3|*.ogg)
a00.mp3  a00.ogg  a01.mp3  a01.ogg  a02.mp3  a02.ogg  a03.mp3  a03.ogg  a04.mp3
a04.ogg  a05.mp3  a05.ogg  a06.mp3  a06.ogg  a07.mp3  a07.ogg  a08.mp3  a08.ogg
.
.
.
c05.mp3  c05.ogg  c06.mp3  c06.ogg  c07.mp3  c07.ogg  c08.mp3  c08.ogg  c09.mp3
c09.ogg  c10.mp3  c10.ogg
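For anyone wanting to reproduce this on one machine, here is a minimal Bash sketch. The scratch directory from mktemp is an assumption and not part of the original setup. Note that extglob has to be switched on before Bash parses the line containing the pattern: an interactive shell may already have it on, but a script starts with it off.

```shell
#!/usr/bin/env bash
# Sketch: recreate the test files in a scratch directory and count what
# the extended glob *(*.mp3|*.ogg) matches.
shopt -s extglob                  # must be set before the pattern is parsed

scratch=$(mktemp -d)              # hypothetical scratch area, not scptest/
touch "$scratch"/{a..c}{00..10}.{mkd,mp3,ogg}

cd "$scratch" || exit 1
all=( * )                         # every file: 3 letters x 11 numbers x 3 extensions
wanted=( *(*.mp3|*.ogg) )         # only names ending in .mp3 or .ogg
printf 'all files: %d, mp3/ogg: %d\n' "${#all[@]}" "${#wanted[@]}"
cd - >/dev/null && rm -rf "$scratch"
```

This should report 99 files in total, of which 66 match the pattern.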

What Works

I ran the following command on rpi4 to copy selected files from the scptest directory to another Raspberry Pi called rpi5 where I have created a directory called test for the purpose. I have copied my ssh key to that machine already so no password is prompted for.

$ scp *(*.mp3|*.ogg) dave@rpi5:test/

a00.mp3                                                          100%    0     0.0KB/s   00:00
a00.ogg                                                          100%    0     0.0KB/s   00:00
a01.mp3                                                          100%    0     0.0KB/s   00:00
a01.ogg                                                          100%    0     0.0KB/s   00:00
.
.
.
c10.mp3                                                          100%    0     0.0KB/s   00:00
c10.ogg                                                          100%    0     0.0KB/s   00:00

All of the requested (empty) files were copied.

What Fails

If I try the equivalent from the other host, pulling the files from rpi4 to rpi5, I don’t get what I might expect:

$ scp dave@rpi4:scptest/*(*.mp3|*.ogg) .
bash: -c: line 0: syntax error near unexpected token `('
bash: -c: line 0: `scp -f scptest/*(*.mp3|*.ogg)'

Running the command again with the -v option shows that the line ‘scp -f scptest/*(*.mp3|*.ogg)’ is being executed on rpi4, and this is what causes the error. The conclusion is that scp itself is doing something that’s not compatible with this expression.

My later investigations revealed that extglob is apparently off when this command is being executed, but more of this anon.
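The same error can be reproduced on a single machine, because a non-interactive ‘bash -c’ (roughly what the remote end runs on behalf of scp) starts with extglob off. A small sketch; the pattern is only parsed, so no matching files are needed:

```shell
#!/usr/bin/env bash
# Without extglob the parser rejects the pattern before anything runs
if bash -c 'echo *(*.mp3|*.ogg)' 2>/dev/null; then
  echo "parsed without extglob"
else
  echo "syntax error without extglob (as seen via scp)"
fi

# With -O extglob set at startup, the same command line parses fine
if bash -O extglob -c 'echo *(*.mp3|*.ogg)' >/dev/null; then
  echo "parsed with -O extglob"
fi
```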

Alternatives

First try - attempting to use extended globs

I found an article about this issue on StackExchange with a very comprehensive (if impenetrable) answer.

The answer points out that scp simply hands the filename (or expression) to the remote machine, where it’s interpreted by the remote user’s login shell. This could be any shell.

The answer suggests that the remote filename could be a command for the remote system, but that doesn’t seem to be the case in my very simple test:

$ scp dave@rpi4:'ls' .
scp: ls: No such file or directory

This is probably too naive to work as it stands, however.

It is suggested that the following command will work, though. Note that the string passed to ‘scp’ contains a newline before the word ‘bash’. This is necessary for the command to work:

$ LC_SCPFILES='scptest/*(*.mp3|*.ogg)' scp -o SendEnv=LC_SCPFILES "dave@rpi4:</dev/null
  bash -O extglob -c 'exec scp -f -- \$LC_SCPFILES';exit" .
a00.mp3                                           100%    0     0.0KB/s   00:00
a00.ogg                                           100%    0     0.0KB/s   00:00
a01.mp3                                           100%    0     0.0KB/s   00:00
a01.ogg                                           100%    0     0.0KB/s   00:00
.
.
.

This does work, though understanding why is a challenge.

A more manageable solution is the following function based on the same idea:

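safer_scp() (
  # First argument is the remote spec, e.g. user@host:pattern
  file=$1; shift
  # Path/pattern part (after the first colon), exported for SendEnv
  export LC_SCPFILES="${file#*:}"
  # Host part, then a "filename" that is really extra shell code for the remote end
  exec scp -o SendEnv=LC_SCPFILES "${file%%:*}:</dev/null
    bash -O extglob -c 'exec scp -f -- \$LC_SCPFILES';exit" "$@"
)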

You might want to skip this part since it gets into deep deep Bash and scp magic!

This all hinges on the fact that in this case scp works by doing the following:

  1. It connects to the remote machine using the remote username and host name. It does this using ssh, creating a “tunnel” between the two and running a shell at the remote end.

  2. Over the tunnel it issues a command to be run on the remote machine which consists of scp -f FILENAME. The -f option runs scp in “remote” mode. This option is undocumented but can be seen in the source code.

  3. The remote shell expands the filename or glob expression, and the remote scp then copies the resulting file (or files) back to the local end.

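The safer_scp function takes advantage of these features. Note that the body of a function can be any compound command. A series of commands enclosed in parentheses is such a compound command, BUT it executes in a sub-shell, whereas the more usual compound command in braces does not. Experimentation showed that without a body in parentheses, running the function disconnects from the remote machine! The reason is the ‘exec’ in the function, which replaces the shell running it: with a brace body that is the login shell itself, so the session ends when scp finishes, whereas with parentheses only the sub-shell is replaced. The sub-shell also stops the exported ‘LC_SCPFILES’ from lingering in the calling shell.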
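The difference between the two kinds of function body is easy to see in isolation. In this sketch the function names are made up for the demonstration, and each case runs in a child ‘bash -c’ so that the demonstration itself survives the ‘exec’:

```shell
#!/usr/bin/env bash
# Brace body: exec replaces the calling shell, so "after" is never printed
brace_out=$(bash -c 'f() { exec true; }; f; echo after')

# Parenthesised body: exec replaces only the sub-shell, so the caller carries on
paren_out=$(bash -c 'g() ( exec true ); g; echo after')

printf 'brace body: [%s]\nparen body: [%s]\n' "$brace_out" "$paren_out"

# The sub-shell also keeps exported variables from leaking back to the caller
h() ( export DEMO_VAR=inside )
h
printf 'DEMO_VAR after h: [%s]\n' "${DEMO_VAR-}"
```

The brace version prints nothing after the call, the parenthesised one prints "after", and DEMO_VAR is still unset once h returns.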

In the function the variable ‘file’ is set to the first argument. This is then removed from the function argument list with ‘shift’.

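The variable ‘LC_SCPFILES’ is set to the part of the ‘file’ variable after the first colon (the expansion ${file#*:} removes the shortest leading match of ‘*:’), and is exported so that it is in the environment of the ‘scp’ command which follows.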
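The two parameter expansions the function relies on can be tried on their own. A minimal sketch, using Jon’s remote specification as sample input: ‘#*:’ strips everything up to and including the first colon, and ‘%%:*’ strips everything from the first colon onwards.

```shell
#!/usr/bin/env bash
file='dave@rpi4:scptest/*(*.mp3|*.ogg)'   # sample argument, quoted so nothing expands
path_part="${file#*:}"     # shortest leading match of '*:' removed
host_part="${file%%:*}"    # longest trailing match of ':*' removed
printf 'host: %s\npath: %s\n' "$host_part" "$path_part"
```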

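The ‘exec’ command replaces the currently executing (sub-)shell with the command which follows it. That command is an ‘scp’ command which passes the environment variable ‘LC_SCPFILES’ to the remote end (using the -o option with ‘SendEnv=LC_SCPFILES’). This only works if the remote sshd is willing to accept the variable. Debian-based systems such as Raspbian usually ship an sshd_config containing ‘AcceptEnv LANG LC_*’, which is presumably why a name beginning with LC_ was chosen.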

The arguments to ‘scp’ are two strings. The first is:

"${file%%:*}:</dev/null
    bash -O extglob -c 'exec scp -f -- \$LC_SCPFILES';exit"

The second argument consists of the remaining arguments to safer_scp ("$@").

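The first argument expands variable ‘file’, keeping only the part before the first colon (the expansion ${file%%:*} removes the longest trailing match of ‘:*’). It then appends a colon, so that the local scp still sees a ‘host:path’ argument, followed by ‘</dev/null’ (which will redirect input from /dev/null at the remote end) and a newline.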

The rest of the string invokes Bash, setting the ‘extglob’ option with the -O option and reading the following string as a command as specified by the -c option. The command is a further ‘exec’ which runs ‘scp’.

This instance of scp uses the undocumented option -f (as mentioned earlier). This tells scp that it is running as the remote instance.

The -- (double hyphen) is a convention to tell a program that the options have ended. This protects the following filename (in variable LC_SCPFILES) from possibly being interpreted as options.
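To see exactly what the first argument looks like, it can be built and printed locally. This sketch uses Jon’s remote specification as sample input. Note that the ‘\$’ leaves a literal ‘$LC_SCPFILES’ in the string and that the string contains a real newline:

```shell
#!/usr/bin/env bash
file='dave@rpi4:scptest/*(*.mp3|*.ogg)'   # sample input
# Build the first argument exactly as safer_scp does
arg="${file%%:*}:</dev/null
    bash -O extglob -c 'exec scp -f -- \$LC_SCPFILES';exit"
printf '%s\n' "$arg"
```

This prints ‘dave@rpi4:</dev/null’ on the first line and the indented ‘bash -O extglob -c ...;exit’ command on the second.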

So, going back to the entire string being handed to the first scp, this does the following:

  • It receives the username and host string (as in dave@rpi4) with a colon at the end. The rest of the remote file specification starts with ‘</dev/null’. The remote shell treats this as an input redirection rather than a filename, so the usual remote ‘scp -f’ runs with no file arguments and its input taken from /dev/null, and exits without consuming anything from the connection.
  • The part after the newline is then executed. It runs Bash with extglob on and invokes another scp which simulates the one which is normally run, but is now guaranteed to be running under Bash with extglob on. This then sends the file or files back to the local end after expanding the extended glob pattern in variable LC_SCPFILES.
  • The exit after the Bash process ensures the process invoked at the remote end shuts down.
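The remote side of all this can be simulated on one machine without any network. This is only a sketch under some assumptions: that the remote login shell is Bash, that scp sends the command ‘scp -f PATH’ to it (as seen with -v earlier), and that a fake ‘scp’ script, placed first on PATH, stands in for the real one by just reporting how many arguments it received:

```shell
#!/usr/bin/env bash
# Local simulation of the remote end of safer_scp; no network is involved.
sim=$(mktemp -d)
mkdir "$sim/bin" "$sim/scptest"
touch "$sim"/scptest/{a..c}{00..10}.{mkd,mp3,ogg}

# Fake scp: report the number of arguments instead of copying anything
printf '%s\n' '#!/bin/sh' 'echo "fake scp got $# args"' > "$sim/bin/scp"
chmod +x "$sim/bin/scp"

# The "path" part that safer_scp hands to the real scp (note the newline)
path='</dev/null
    bash -O extglob -c '\''exec scp -f -- $LC_SCPFILES'\'';exit'

# Roughly what the remote sshd runs: the login shell with -c "scp -f PATH",
# plus the LC_SCPFILES variable passed over by SendEnv
out=$(cd "$sim" && PATH="$sim/bin:$PATH" LC_SCPFILES='scptest/*(*.mp3|*.ogg)' \
        bash -c "scp -f $path")
printf '%s\n' "$out"
rm -rf "$sim"
```

The first line of output comes from the normal remote scp, which receives only ‘-f’; the second comes from the extglob-enabled replacement, which receives ‘-f’, ‘--’ and the 66 matching files, 68 arguments in all.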

This complex set of events compensates for deficiencies of scp and allows extended glob patterns to be passed through. However, it’s still error-prone, as will be seen later.

The function does actually work, but it’s so obscure and reliant on what seem like edge conditions or hidden features I don’t think it should be used.


Second try - just use simpler globs

If the requirement is to use an extended glob expression in the solution then this one will not suit. However, if the goal is to copy files, then it will!

$ scp dave@rpi4:scptest/*.{mp3,ogg} .
a00.mp3                                           100%    0     0.0KB/s   00:00
a01.mp3                                           100%    0     0.0KB/s   00:00
a02.mp3                                           100%    0     0.0KB/s   00:00
a03.mp3                                           100%    0     0.0KB/s   00:00
.
.
.

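This does the job. Note that the brace expansion is performed by the local shell, so scp actually receives two remote arguments, ‘dave@rpi4:scptest/*.mp3’ and ‘dave@rpi4:scptest/*.ogg’ (consistent with the mp3 files arriving first in the listing). Each of these is a simple glob pattern which does not rely on extglob being on at the remote end. It would only depend on the remote account using Bash if the pattern itself used Bash-specific features.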
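A quick way to confirm that the braces are handled locally is to give the same word to ‘printf’ instead of ‘scp’. A sketch; it assumes there is no local directory called ‘dave@rpi4:scptest’, so the globs are left unexpanded, just as they are when handed to scp:

```shell
#!/usr/bin/env bash
# Brace expansion happens first, then (failed) pathname expansion leaves
# the unmatched globs untouched, so two separate words come out
words=( dave@rpi4:scptest/*.{mp3,ogg} )
printf '%s\n' "${words[@]}"
```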

Third try - use ‘rsync’ with a filter

I have never encountered this issue with ‘scp’ myself when moving files around between servers. I do a lot of file moving both for myself and as an HPR “janitor”. The reason I haven’t seen it is that I usually use ‘rsync’.

There is a way of using rsync to achieve what was wanted here, though it does not use extended glob patterns.

The ‘rsync’ command can be told to copy files from a directory, including those that match a pattern and to exclude the rest. This is done with filters.

The ‘rsync’ command is very powerful and hard to master. In fact there is scope for a whole HPR series on its intricacies. However, we’ll just restrict ourselves to the use of filters here to solve this problem.

Here’s what I do:

  1. Make a filter stored in a file
  2. Run ‘rsync’ with the filter

Making a filter file

I created a file called ‘.rsync_test’:

$ cat .rsync_test
+ *.mp3
+ *.ogg
- *

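Lines beginning with ‘+’ are rules for inclusion. Those beginning with ‘-’ are exclusions. The order is significant: rsync acts on the first rule that matches a name, so the final ‘- *’ only catches files which have not already been included.

These rules tell ‘rsync’ to include all files ending ‘.mp3’ and ‘.ogg’. Anything else is to be excluded.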

Running rsync with the filter

The command would be:

$ rsync -vaP -e ssh --filter=". .rsync_test" dave@rpi4:scptest/ test/
receiving incremental file list
./
a00.mp3
              0 100%    0.00kB/s    0:00:00 (xfr#1, to-chk=65/67)
a00.ogg
              0 100%    0.00kB/s    0:00:00 (xfr#2, to-chk=64/67)
a01.mp3
              0 100%    0.00kB/s    0:00:00 (xfr#3, to-chk=63/67)
a01.ogg
              0 100%    0.00kB/s    0:00:00 (xfr#4, to-chk=62/67)
a02.mp3
              0 100%    0.00kB/s    0:00:00 (xfr#5, to-chk=61/67)
a02.ogg
              0 100%    0.00kB/s    0:00:00 (xfr#6, to-chk=60/67)
.
.
.
c10.mp3
              0 100%    0.00kB/s    0:00:00 (xfr#65, to-chk=1/67)
c10.ogg
              0 100%    0.00kB/s    0:00:00 (xfr#66, to-chk=0/67)

sent 1,310 bytes  received 3,809 bytes  10,238.00 bytes/sec
total size is 0  speedup is 0.00

The options are:

-vaP        select verbose mode (v), archive mode (a, shorthand for many
            other options) and show progress (P)

-e ssh      use ssh to transfer files

--filter=". .rsync_test"    use a filter

The filter expression is ‘. .rsync_test’ where the leading ‘.’ is short for ‘merge’ and tells rsync to read filter rules from the file.

The arguments are:

dave@rpi4:scptest/          the remote host and directory to copy from
test/                       the local directory to copy to

It is a good idea to use the ‘-n’ option when setting up such a command, to check that everything works as it should, before running it for real. This option turns on ‘dry-run’ mode where the process is run without actually copying anything.

You don’t have to use the filter file. The following command does the same:

$ rsync -vaP -e ssh -f "+ *.mp3" -f "+ *.ogg" -f "- *" dave@rpi4:scptest/ test/

Here ‘-f’ is the short form of ‘--filter’.

I prefer the filter file myself.

Caution

The ‘rsync’ tool is a beast and needs careful treatment! Things to be aware of if you want to go further than this simple guide:

  • ‘rsync’ will traverse a directory hierarchy when asked to be recursive, and the -a option used here implies -r
  • the presence of a trailing slash on the source directory makes it transfer the contents of the directory. Without it the directory itself and its contents will be copied
  • ‘rsync’ compares source and destination files. By default a file whose size and modification time already match at the destination is skipped. If the source copy differs, ‘rsync’ updates the destination copy, transferring only the differences where it can

Another digression

Since I am already well off the rails with this episode I thought I’d go looking at another area commented on by clacke in the context of show 2293.

You are probably aware that file names containing spaces (and other unusual characters) can be difficult to use with commands and programs in Unix and Linux. The question was how scp would behave. I thought I’d do some experimentation with filenames containing spaces.


You might want to skip this part since it gets into more of the guts of scp

I created a file on rpi4 called “what a horrible filename.txt” and tried to pull it across to rpi5. In each case I used the -v option to scp in order to see all the details of what was going on. Be warned that this generates a lot of output.

  1. scp -v dave@rpi4:'scptest/what a horrible filename.txt' test/
    This is normally one way of dealing with filenames containing spaces, but it fails here. The local shell removes the quotes, so the remote shell receives the bare filename and splits it at the spaces.

  2. scp -v dave@rpi4:'scptest/what\ a\ horrible\ filename.txt' test/
    Another way of protecting spaces is to escape each of them with a backslash. This time I have used these inside the string. This works. The quotes are removed but the backslashes remain to protect the spaces.

  3. scp -v dave@rpi4:"scptest/what\ a\ horrible\ filename.txt" test/
    Double quotes are equivalent to single ones in this context, so this works in the same way as example 2.

  4. scp -v dave@rpi4:scptest/what\ a\ horrible\ filename.txt test/
    This is normally another way that spaces can be protected, but this one fails because the backslashes are removed in the first pass. It is logically equivalent to example 1.

  5. scp -v dave@rpi4:scptest/what\\ a\\ horrible\\ filename.txt test/
    Since the local shell removes quotes and backslashes first time round, we’ll try doubling the backslashes. This does not work: each ‘\\’ becomes a single literal backslash, but the spaces are then unprotected, so the local shell splits the argument into several separate words before scp even sees it.

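  6. scp -v dave@rpi4:scptest/what\\\ a\\\ horrible\\\ filename.txt test/
    Since the last test failed we’ll try trebling the backslashes. This works, rather counter-intuitively I find. The local shell turns the first two backslashes of each group into one literal backslash and uses the third to protect the space, so scp receives exactly the same string as in example 2.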

  7. scp -v dave@rpi4:'"scptest/what a horrible filename.txt"' test/
    Enclosing one sort of quotes in another should work, and indeed it does. Nested quotes are another solution. However, they must be different types of quotes - single inside double or vice versa.
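The pattern behind these results is that the filename passes through two shells: the local one, when the command is typed, and the remote one, when scp hands over ‘scp -f PATH’. A sketch of both steps, using a hypothetical helper ‘remote_words’ that asks a local ‘bash -c’ to split a path the way a Bash login shell at the remote end would, printing each resulting word in brackets:

```shell
#!/usr/bin/env bash
# remote_words SPEC: show how a remote Bash would split "scp -f SPEC".
# SPEC must be given as scp receives it, i.e. after local quote removal.
remote_words() {
  bash -f -c "printf '[%s]\n' $1"    # -f: no filename expansion, just splitting
}

# Examples 1/4: quotes or backslashes already removed locally, so 4 words
remote_words 'scptest/what a horrible filename.txt'
# Examples 2/3: backslashes survive the local shell, so 1 word
remote_words 'scptest/what\ a\ horrible\ filename.txt'
# Example 7: inner double quotes survive the local shell, so 1 word
remote_words '"scptest/what a horrible filename.txt"'

# Example 5: \\<space> gives a literal backslash but an unprotected space,
# so the local shell already splits the argument into four words
ex5=( dave@rpi4:scptest/what\\ a\\ horrible\\ filename.txt )
printf '[%s]\n' "${ex5[@]}"

# Example 6: locally, \\\<space> becomes \<space>, the same as example 2
ex6=scptest/what\\\ a\\\ horrible\\\ filename.txt
printf '%s\n' "$ex6"
```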

You might wonder how the safer_scp function we saw earlier deals with such filenames. I could not get it to transfer the file using any of these formats.

However, by modifying it slightly (removing the backslash in front of $LC_SCPFILES) it worked:

$ safer_scp() (
>   file=$1; shift
>   export LC_SCPFILES="${file#*:}"
>   exec scp -o SendEnv=LC_SCPFILES "${file%%:*}:</dev/null
>     bash -O extglob -c 'exec scp -f -- $LC_SCPFILES';exit" "$@"
> )

$ safer_scp dave@rpi4:'scptest/what\ a\ horrible\ filename.txt' test/
what a horrible filename.txt                  100%    0     0.0KB/s   00:00

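I wasn’t clear what the backslash was for anyway! It stops the local shell expanding $LC_SCPFILES inside the double-quoted string, so that the variable is expanded by the remote Bash from the environment sent with SendEnv. A value obtained by expansion is not parsed again, so the backslashes in it are taken literally and the spaces split it into several words. Without the backslash the value is pasted into the command text locally, and the remote Bash then parses the backslash escapes just as it would in a typed command.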
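The effect of that backslash can be seen by building the inner command both ways and letting a local Bash split the value each way. This sketch uses a sample value only, and ‘bash -f’ turns off filename expansion so that only word splitting is shown:

```shell
#!/usr/bin/env bash
LC_SCPFILES='scptest/what\ a\ horrible\ filename.txt'   # sample value

with_bs="exec scp -f -- \$LC_SCPFILES"     # expanded later, remotely
without_bs="exec scp -f -- $LC_SCPFILES"   # expanded now, locally
printf '%s\n' "$with_bs" "$without_bs"

export LC_SCPFILES
# Value expanded at run time: backslashes stay literal, spaces split it
split_remote=$(bash -f -c 'printf "[%s]\n" $LC_SCPFILES')
# Value pasted in as command text: backslashes protect the spaces
split_local=$(bash -f -c "printf '[%s]\n' $LC_SCPFILES")
printf '%s\n' "$split_remote" "$split_local"
```

One consequence worth knowing before relying on the modified function is that the value is now pasted inside the single-quoted ‘bash -c’ string, so a pattern containing a single quote would break the remote command, and the SendEnv option no longer plays any part.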

This modified function passed all of the tests of plain filenames and glob patterns which I tried. I am still not sure that I’d use it myself though.


Conclusion

The scp command is built on the original BSD Unix command rcp. I don’t know if this is why it has the quirks we have looked at here, but it does seem to suffer some deficiencies. However, I find it useful and usable most of the time.

Using rsync solves a number of the problems scp shows, though it has its own shortcomings. I think a good working knowledge of scp and rsync is important in a Sysadmin’s toolkit and can be of great use to all Unix/Linux users.