Introduction to sed - part 3 (HPR Show 1997)

Dave Morriss


Introduction

In the last episode we looked at sed at a more advanced level. We looked at all of the command-line options which we will cover in this series and examined the s command in much more detail. We covered many more details of regular expressions.

In this episode we will look at more sed commands and how to use them.

Commands

So far we have concentrated on the s command. There are many more commands within sed. Most commands operate on lines or ranges of lines. We will look first at how such line addressing is achieved.

Selecting lines

The following table summarises the available addressing methods for reference; longer explanations are then given.

Address        Explanation
number         Matches the numbered line in the input
first~step     Matches every stepth line starting with line first
$              Matches the last line of the input
/regexp/       Selects any line matching the regexp
addr1,addr2    Selects lines from the first to the second address inclusively

We will look at these addresses in detail, but to give examples as we go we need to look ahead at one of the commands we'll be examining in more detail later. We'll use the p command (as opposed to the p flag of the s command we saw in the last episode). This just prints the line or lines we have addressed. It only makes sense if the -n option is used to turn off auto-printing; otherwise every line is printed anyway and the addressed lines appear twice.

Selecting a line by number

This form of address just consists of a number, and matches the line with that number in the input stream. So, to print just the first line of a file the following would suffice:

$ sed -ne '1p' sed_demo2.txt

Remember that normally sed treats all of the input files as one continuous stream, so that line number will match just once.

If sed is run with either of the -i or the -s options, multiple input files are treated separately. In this example, there will be two instances of line number 5:

$ sed -sne '5p' sed_demo1.txt sed_demo2.txt
HPR" for more information.
HPR" for more information.

Selecting every nth line starting at a number

This is a GNU extension which allows the addressing method to specify a starting line number and the size of the step to the next line number. Lines are selected by adding n times the size of the step to the starting point, where n is 0, 1, 2 and so on.

So, 1~2 means line 1, line 1+1*2=3, line 1+2*2=5, and so on for every odd numbered line.

Specifying 2~3 means line 2, line 2+1*3=5, line 2+2*3=8, and so on for every third line.

There is an example of using this addressing form with one of the demonstration files below in Example 2.
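
As a quick check of the arithmetic above, the following sketch feeds the numbers 1 to 10 to sed, so that the content of each line is its own line number. The use of seq to generate the input and the variable names are assumptions made purely for illustration, and GNU sed is needed since this addressing form is a GNU extension.

```shell
# Assumes GNU sed: first~step is a GNU extension.
# seq is used only to generate numbered demonstration input.
odd=$(seq 1 10 | sed -n '1~2p')      # start at line 1, step 2
third=$(seq 1 10 | sed -n '2~3p')    # start at line 2, step 3

printf '%s\n' "$odd"
printf '%s\n' "$third"
```

The first command should select lines 1, 3, 5, 7 and 9, and the second lines 2, 5 and 8.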

Selecting the last line of the file

The '$' symbol as an address matches the last line of the file, or more accurately, the last line of the stream of data read by sed, which when presented with multiple files means the last line of the last file.

As with the discussion for single line addressing, if sed is run with either of the -i or the -s options, multiple input files are treated separately and every file will have a last line.

See Example 3 below for a demonstration of this type of addressing.

Selecting by regular expression

An address of the form '/regexp/' selects every line in the input stream which matches the regexp.

$ sed -ne '/HPR/p' sed_demo1.txt
Hacker Public Radio (HPR) is an Internet Radio show (podcast) that releases
shows every weekday Monday through Friday. HPR has a long lineage going back to
HPR" for more information.
What differentiates HPR from other podcasts is that the shows are

Alternative delimiters

Normally the delimiter for a regexp is the '/' character, and we have used this exclusively throughout the series so far. If the regexp needs to contain this delimiter then it needs to be preceded by a backslash.

However, it is possible to use alternative delimiters, which is useful in this type of circumstance. The first instance of the alternative delimiter must be preceded by a backslash.

The following regexp examples have the same effect:

/etc\/passwd/
\#etc/passwd#

This is particularly useful when the regexp contains multiple slashes which all need to be escaped.
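
To see the two forms side by side, here is a small sketch that runs both against the same input. The three path-like input lines are hypothetical and are not taken from the demonstration files.

```shell
# Hypothetical input lines resembling paths; not from the demo files.
input='/etc/passwd
/etc/group
/usr/bin/sed'

# Default delimiter: every '/' inside the regexp must be escaped.
escaped=$(printf '%s\n' "$input" | sed -n '/^\/etc\//p')

# Alternative delimiter '#': introduced with a backslash, no escaping needed.
alternative=$(printf '%s\n' "$input" | sed -n '\#^/etc/#p')

printf '%s\n' "$alternative"
```

Both commands should select the same two lines, but the second regexp is rather easier to read.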

Note: In the case of the s command it is not necessary to precede the first alternative delimiter with a backslash. Indeed, if a backslash is used, an error is produced, and a backslash seems to be a valid delimiter! This does not seem to be documented, but it is presumably because in the s command the 's' is expected to be followed by a delimiter, whereas a regexp delimiter is harder for a parser to recognise. The following examples all work except the one followed by an error message:

$ sed -ne 's|HPR|Banana|p' sed_demo1.txt

$ sed -ne 's\|HPR|Banana|p' sed_demo1.txt
sed: -e expression #1, char 15: unterminated `s' command

$ sed -ne 's\HPR\Banana\p' sed_demo1.txt

$ sed -ne 'spHPRpBananapp' sed_demo1.txt

Empty regular expressions

Another feature of regular expressions we have not looked at before is the case where the regexp in an address or an s command is empty. This is sed's way of representing the last regular expression that matched.

The following example uses this feature in three s commands. The first changes the first space on each line to an asterisk, the second changes the second space to an underscore and the third changes the third space to a plus sign. The second and third s commands have empty regexps so they use the previous matching one. The example shows the effect on the first line of the file:

$ sed -e 's/ /*/;s//_/;s//+/' sed_demo1.txt
Hacker*Public_Radio+(HPR) is an Internet Radio show (podcast) that releases
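
A common use of this feature is to combine a regexp address with an s command that has an empty regexp, so that the pattern only needs to be written once. The following is a minimal sketch of that idiom; the input text is hypothetical and is not one of the demonstration files.

```shell
# Hypothetical input; not one of the demo files.
input='HPR is a podcast
no match here
HPR again, HPR twice'

# The address /HPR/ selects the lines; the empty regexp in s// reuses it.
result=$(printf '%s\n' "$input" | sed -e '/HPR/s//Hacker Public Radio/')

printf '%s\n' "$result"
```

Since there is no g flag, only the first 'HPR' on each matching line is replaced.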

The GNU Manual warns that the use of modifiers in these cases can be problematic:

Note that modifiers to regular expressions are evaluated when the regular
expression is compiled, thus it is invalid to specify them together with
the empty regular expression.

Modifiers

There are two modifiers available which change the way in which regular expressions in addresses behave. These are both GNU extensions.

We have already seen the I and i flags in the context of the s command which make the regexp case insensitive. There is also an I modifier for address regexps, though there is no i equivalent. This modifier has the same effect:

$ sed -ne '/hpr/Ip' sed_demo1.txt
Hacker Public Radio (HPR) is an Internet Radio show (podcast) that releases
shows every weekday Monday through Friday. HPR has a long lineage going back to
HPR" for more information.
What differentiates HPR from other podcasts is that the shows are

The second modifier is M which affects text in the pattern space containing multiple newlines. We will not be looking at this in detail in this episode, but will examine it in the next.

Selecting an address range

The address range allows a sed script to match the lines in the input data from a starting position to (and including) an ending position. The range is written as two addresses (of the types we have seen so far) separated by a comma:

$ sed -ne '1,3p' sed_demo1.txt
Hacker Public Radio (HPR) is an Internet Radio show (podcast) that releases
shows every weekday Monday through Friday. HPR has a long lineage going back to
Radio FreeK America, Binary Revolution Radio & Infonomicon, and it is a direct

This simply prints lines 1 to 3. Note that, as before, we used the -n option to prevent automatic printing.

$ sed -ne '/^We/,$p' sed_demo1.txt
We also allow for a series of shows so that host(s) can go into more
detail on a topic.

This example prints from a line beginning with 'We' up to the end of the file (the next line in this case).

$ sed -ne '/^What/,/^produced/p' sed_demo1.txt
What differentiates HPR from other podcasts is that the shows are
produced by the community - fellow listeners like you. There is no

This example prints from a line beginning with 'What' to a line beginning with 'produced' (the next line in fact).
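
A range made from two regexps can match more than once in the input, and a range whose ending address is never found continues to the end of the input. The sketch below illustrates both points using hypothetical input with two START markers but only one END marker.

```shell
# Hypothetical input with two START markers but only one END marker.
input='a
START
b
END
c
START
d'

# The range opens at each START and closes at the next END,
# or at the end of the input if no END is found.
result=$(printf '%s\n' "$input" | sed -n '/START/,/END/p')

printf '%s\n' "$result"
```

The lines 'a' and 'c' fall outside both ranges and so are not printed.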

There are some GNU sed address range extensions which can be found in the GNU Manual. We will not be looking at these in this series.

Negating an address match

All of the address types we have seen in this section can be "negated". For example, using a line number and negating it tells sed to match all lines but the selected line. Negation is achieved by adding a '!' character after the address.

See Example 1 below for an example of line number negation.

The addressing form matching every nth line starting at a specific line can also be negated. So, for example, the following command would print all the odd-numbered lines in this 13-line file:

$ sed -ne '2~2!p' sed_demo1.txt

Whereas without negation it would print all the even-numbered lines.

Negating the '$' (last line of file) means all lines except the last line. Negating a regular expression means all lines that do not match. So, for example, the following example will display all lines that do not contain a capital letter:

$ sed -ne '/[A-Z]/!p' sed_demo1.txt

This next example matches the same lines, but rather than just printing them it replaces every first letter of a word with the capital equivalent:

$ sed -ne '/[A-Z]/!s/\b\w/\u&/gp' sed_demo1.txt
Restrictions On How Long The Show Can Be, Nor On The Topic You Can
Detail On A Topic.

This emphasises how addresses can be associated with many of the commands that sed uses. Note that the 'p' used here is the flag to the s command, and, as such it will only print lines on which substitution has taken place.

If negation is used with an address range, then it applies to the range. It is not possible to negate the individual addresses in the range. The effect is to match all lines outside the range. So, the following example, instead of matching the two lines in the file as the un-negated form did, will match the rest of the file:

$ sed -ne '/^What/,/^produced/!p' sed_demo1.txt
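
The effect of negation is perhaps easiest to see with numbered input. In this sketch seq generates the numbers 1 to 6 (an assumption made for illustration), and we negate first a range and then the '$' address.

```shell
# seq is used only to generate numbered demonstration input.
outside=$(seq 1 6 | sed -n '2,4!p')   # everything outside lines 2 to 4
not_last=$(seq 1 6 | sed -n '$!p')    # everything except the last line

printf '%s\n' "$outside"
printf '%s\n' "$not_last"
```

The first command should print lines 1, 5 and 6, and the second lines 1 to 5.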

Comments in scripts

It is possible to add comments to a sed script. This makes most sense when the sed commands are in a file. Just like in many scripting and programming languages the '#' character begins a comment, and the comment continues to the end of the line (to the newline).

As with many other scripting languages running under Unix or Linux, if the command file begins with a specially formatted comment line and the file is made executable, the file may be directly invoked from the command line:

$ cat > demo.sed
#!/bin/sed -f
# Be 1337
s/Hacker/H4x0r/g
CTRL-D
$ chmod u+x demo.sed
$ ./demo.sed sed_demo1.txt
H4x0r Public Radio (HPR) is an Internet Radio show (podcast) that releases

In this example the cat command is used to redirect what is typed on STDIN into a file. The end of the data on STDIN is signalled by pressing CTRL-D. The file contains the comment which makes it a sed script, and another indicating what it does, followed by a single s command. The chmod command makes the resulting file executable, and then it is invoked to process the file sed_demo1.txt. (Note, only the first line of output is shown here.)
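
A commented command file can also be passed to sed with the -f option rather than being made executable. The following is a sketch of that approach; it uses a here-document and mktemp instead of typing at the terminal, and the file contents and input text are hypothetical.

```shell
# Write a hypothetical command file to a temporary location.
script=$(mktemp)
cat > "$script" <<'EOF'
# Replace the first 'Radio' on each line
s/Radio/R4d10/
# Delete lines that are empty
/^$/d
EOF

# Run the commented script against three lines of hypothetical input.
result=$(printf 'Hacker Public Radio\n\nRadio FreeK America\n' | sed -f "$script")
rm -f "$script"

printf '%s\n' "$result"
```

The comment lines are ignored by sed, so only the s and d commands take effect.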

The Quit command

This command, which consists of a lower-case q, causes sed to exit. It can be preceded by a single address meaning "exit when this line is reached". In GNU sed the command can be followed by an exit code.

The current pattern space is printed unless the -n option was selected.

For example, as a variant of the last example, the following script would edit the first three lines then quit:

$ sed -ne 's/Radio/R4d10/gp;3q' sed_demo1.txt
Hacker Public R4d10 (HPR) is an Internet R4d10 show (podcast) that releases
R4d10 FreeK America, Binary Revolution R4d10 & Infonomicon, and it is a direct

Of course, the same effect can be achieved by adding an address range to the s command:

$ sed -ne '1,3s/Radio/R4d10/gp' sed_demo1.txt

In general, the use of q to quit has the advantage that it stops processing and exits. If sed was reading a very large file and the work it was asked to do was completed relatively early in the file, stopping it from reading the rest of the file might be advantageous in terms of speed.
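
The exit code mentioned earlier can be seen in the following sketch, which assumes GNU sed. The input comes from seq and is kept short here; the saving in time only becomes noticeable with much larger input. The variable names are assumptions made for illustration.

```shell
# Assumes GNU sed: the exit code after q is a GNU extension.
# Quit at line 2, printing lines 1 and 2, and exit with status 5.
status=0
two=$(seq 1 20 | sed '2q5') || status=$?

printf '%s\n' "$two"
printf 'exit status: %s\n' "$status"
```

Because -n was not used, the pattern space for line 2 is printed before sed exits.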

Delete the pattern space

This command, which consists of a lower-case d, deletes the pattern space and causes sed to start the next cycle by reading the next line.

The command may be preceded by any of the various types of addresses. The effect it has is to omit the lines in question from the output stream.

For example, to omit (delete) all lines beginning with 'H' the following command would suffice:

$ sed -e '/^H/d' sed_demo1.txt

Alternatively, to delete all lines that do not begin with 'H':

$ sed -e '/^H/!d' sed_demo1.txt

Note how we have negated the address match here.
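
Another common use of d is to remove empty lines. The sketch below shows this alongside the negated form, using a few lines of hypothetical input rather than the demonstration file.

```shell
# Hypothetical input containing two empty lines.
input='Hacker Public Radio

shows every weekday

HPR for more information'

no_blank=$(printf '%s\n' "$input" | sed -e '/^$/d')   # drop empty lines
only_h=$(printf '%s\n' "$input" | sed -e '/^H/!d')    # keep lines starting with H

printf '%s\n' "$no_blank"
printf '%s\n' "$only_h"
```

Note that -n is not needed here since we are relying on auto-printing to output whatever has not been deleted.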

Print the pattern space

This command is equivalent to the p flag used with the s command, but is stand-alone. It consists of a lower-case p which may be preceded by any of the various types of addresses.

The command is normally only useful with the -n option to the sed command. Without this option the relevant line(s) are printed twice: once by the p command and once by auto-printing.

For example, to print lines 1-5 of a file:

$ sed -ne '1,5p' sed_demo1.txt

This is equivalent to the command: head -5 sed_demo1.txt
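
The effect of leaving out the -n option can be seen in this small sketch, which runs the same p command with and without it. The three-line input is hypothetical.

```shell
# Hypothetical three-line input.
with_n=$(printf 'one\ntwo\nthree\n' | sed -n '2p')    # only line 2
without_n=$(printf 'one\ntwo\nthree\n' | sed '2p')    # line 2 appears twice

printf '%s\n' "$with_n"
printf '%s\n' "$without_n"
```

With -n only 'two' is printed; without it all three lines are printed and 'two' is duplicated.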

Read the next line of input

The n command is mainly relevant to more complex scripts. It is rare to see examples of its use in the simpler scripts we have seen so far. We will start looking at these and other less commonly used sed commands in the next episode.

The following text is from the GNU Manual:

If auto-print is not disabled, print the pattern space, then, regardless,
replace the pattern space with the next line of input. If there is no more
input then sed exits without processing any more commands.

An example of using the n command is in the next section.
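
Before that, here is a minimal sketch of n on its own. The numbered input from seq is an assumption made for illustration; an even number of lines is used so that n always has a next line to read.

```shell
# With -n: n reads the next line without printing, then p prints it,
# so only the even-numbered lines appear.
even=$(seq 1 6 | sed -n 'n;p')

# Without -n: n prints the current line before reading the next,
# and d then deletes that next line, leaving the odd-numbered lines.
odd=$(seq 1 6 | sed 'n;d')

printf '%s\n' "$even"
printf '%s\n' "$odd"
```

The first command should print 2, 4 and 6, and the second 1, 3 and 5.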

Grouping commands

If it is necessary to perform several sed commands on a given input line or set of lines then there needs to be a means of grouping them together. This can be achieved by enclosing the commands between '{' and '}' characters.

The following example shows the same address range we have used before, but for every matching line a group of commands is performed. An s command adds a greater than sign and a tab at the start of the line, then a p command prints the result:

$ sed -ne '/^What/,/^produced/{s/^/>\t/;p}' sed_demo1.txt
>       What differentiates HPR from other podcasts is that the shows are
>       produced by the community - fellow listeners like you. There is no

The next example is a fairly useless demonstration. It shows a command file with two groups associated with regular expression addresses. The first regexp matches a line that contains the letter 'a' followed by 'b' with between 1 and 5 characters in between, followed in turn by 'c' with between 1 and 5 characters in between. The second is similar but matches only 'a' and 'b'.

The first group uses four s commands to mark the line with 'G1:' to show it was processed by group 1 and to highlight all the 'abc' characters, print the result and move to the next line. The second group does the same but with 'G2:' and for 'a' and 'b'.

/a.\{1,5\}b.\{1,5\}c/{
    s/^/G1: /
    s/a/[a]/g
    s/b/[b]/g
    s/c/[c]/g
    p
    n
}
/a.\{1,5\}b/{
    s/^/G2: /
    s/a/[a]/g
    s/b/[b]/g
    p
    n
}

The n commands here ensure that the same line is not processed by both of the groups.

Running this, assuming the commands are in the file demo2.sed, we get:

$ sed -nf demo2.sed sed_demo1.txt
G2: restrictions on how long the show c[a]n [b]e, nor on the topic you c[a]n
G1: wh[a]t topi[c]s h[a]ve [b]een [c]overed so f[a]r just h[a]ve [a] look [a]t our Ar[c]hive.

The command file is included so you can experiment with it if you want.

Examples

Example 1

In this example there are two ways of printing all but line 1 of a file:

$ sed -ne '1!p' sed_demo1.txt

$ sed -ne '2,$p' sed_demo1.txt

In the first case the address is '1!' meaning all lines but line number 1.

The alternative way is to specify an address of '2,$', meaning line 2 to the end of the file.

Example 2

This time we use the first~step form of addressing:

$ nl -w3 -ba sed_demo1.txt | sed -ne '1~5p'
  1  Hacker Public Radio (HPR) is an Internet Radio show (podcast) that releases
  6
 11  what topics have been covered so far just have a look at our Archive.

The nl command numbers lines in a file. The -w3 option sets the width of these numbers to 3 columns, and -ba requests that even blank lines be numbered (the default is not to do this). We then feed the numbered file to sed and ask for the first line and every 5th line after to be printed.

Note that if we used '1~5!p', negating the address, we would see all lines except 1, 6 and 11.

Example 3

Here we demonstrate the $ address - last line of input:

$ sed -ne '$p' sed_demo1.txt sed_demo2.txt
contribute one show a year.

$ sed -sne '$p' sed_demo1.txt sed_demo2.txt
detail on a topic.
contribute one show a year.

In the first case we just get the last line of the second file because sed sees the two files as one continuous stream.

In the second case, on the other hand, because the -s option has been included, sed sees each file as separate so we get the last line of each.

Example 4

The regexp form of addressing is demonstrated here:

$ sed -ne '/long/p' sed_demo1.txt
shows every weekday Monday through Friday. HPR has a long lineage going back to
restrictions on how long the show can be, nor on the topic you can
cover as long as they "are of interest to Hackers". If you want to see

The regexp 'long' is enclosed in (forward) slash characters as we have seen in all of the examples so far. There will be times when it is more convenient to change the delimiter, and as we have seen, preceding the alternative character with a backslash (\) is required:

$ sed -ne '\#long#p' sed_demo1.txt

Whatever is used as a delimiter, if it occurs in the regexp it needs to be preceded by a backslash.

Example 5

The address range can use any of the addresses we have discussed. A start and end address are separated by a comma:

$ sed -ne '1,/hpr/Ip' sed_demo1.txt
Hacker Public Radio (HPR) is an Internet Radio show (podcast) that releases
shows every weekday Monday through Friday. HPR has a long lineage going back to

Here we have started at line number 1 and continued to the next line containing 'HPR'. We specified it in lower case but qualified it with a capital 'I' to signal case-insensitivity.

$ sed -ne '/^what/I,/^PRODUCED/Ip' sed_demo1.txt
What differentiates HPR from other podcasts is that the shows are
produced by the community - fellow listeners like you. There is no
what topics have been covered so far just have a look at our Archive.
We also allow for a series of shows so that host(s) can go into more
detail on a topic.

This example indicates that the I modifier can be applied to both regular expressions in an address range. Note how the '^what' regexp matches twice and '^PRODUCED' matches once. So, the first two lines are from the first match and the last three are from the second match.

The following way of doing the same thing might make this clearer:

$ awk '{printf "%-75s (%d)\n",$0,NR}' sed_demo1.txt | sed -ne '/^what/I,/^PRODUCED/Ip'
What differentiates HPR from other podcasts is that the shows are           (7)
produced by the community - fellow listeners like you. There is no          (8)
what topics have been covered so far just have a look at our Archive.       (11)
We also allow for a series of shows so that host(s) can go into more        (12)
detail on a topic.                                                          (13)

Here an awk command is used to add line numbers to the ends of the lines (where they are not in the way), before passing them to sed. This lets you see that lines 7 and 8 were part of the first address range and 11-13 were the second range. The second range did not find an instance of '^PRODUCED' in any form by the time the last line (13) was reached.

Example 6

This example is a further development of the various corrections we applied to the file sed_demo2.txt in the earlier episodes.

$ cat example_6.sed
/^ *$/!{
    s/is no/are no/
    s/topic\b/topics/
    s/"are of/are "of/
    s/(like/(such as/
}

The file contains a single group of commands controlled by an address using a regular expression. The regexp matches blank lines (lines containing nothing but zero or more spaces), and the '!' negates the address, so the group applies to all lines in the file which are not blank.

The contents of the group are all s commands which correct the various grammatical errors in the text we have already seen and a few others.

The commands can be invoked thus:

$ sed -f example_6.sed sed_demo2.txt

This file of sed commands is available on the HPR site.

Example 7

In this example we build an executable sed script in a file:

$ cat > example_7.sed
#!/bin/sed -nf
/^.\{75,80\}$/{
    s/$/     /
    s/^\(.\{80\}\).*/|\1|/
    p
}
CTRL-D

The script consists of a single regexp address which controls a group of three commands. The regexp matches any line which contains between 75 and 80 characters. The first member of the group is an s command that adds 5 spaces to the end of each line to make sure each line is at least 80 characters long. The second s command matches the first 80 characters and replaces them with the captured characters preceded and followed by a vertical bar. Characters beyond 80 are discarded. The third command is a p which prints the edited line. The initial comment line starting with the '#!' characters sets the -n and -f options ensuring that only matching lines will be printed.

The created file needs to be made executable and then it can be invoked as shown below:

$ chmod u+x example_7.sed
$ ./example_7.sed sed_demo1.txt
|Hacker Public Radio (HPR) is an Internet Radio show (podcast) that releases     |
|shows every weekday Monday through Friday. HPR has a long lineage going back to |
|Radio FreeK America, Binary Revolution Radio & Infonomicon, and it is a direct  |
|continuation of Twatech radio. Please listen to StankDawg's "Introduction to    |

This executable sed script is available on the HPR site.