sed is an editor which expects to read a stream of text, apply some action to the text and send it to another stream. It filters and transforms the text along the way according to instructions provided to it. These instructions are referred to as a sed script.
The name "sed" comes from Stream Editor, and sed was developed from 1973 to 1974 as a Unix utility by Lee E. McMahon of Bell Labs. GNU sed added several new features, including better documentation, though most of it is only available on the command line through the info command. The full manual is, of course, available on the web.
The sed command is usually invoked with a sed script and an input file on the command line. You might see:
$ sed -e 's/old/new/' infile > outfile
In this example the -e option introduces the sed script, which is enclosed in single quotation marks. The file infile is read and edited. The result is written to standard output, which in this case is being redirected to a file called outfile.
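Here is a minimal sketch of that invocation. It assumes a scratch directory created with mktemp and a hypothetical two-line infile invented for illustration. It runs the same command and shows what ends up in outfile:

```shell
# Work in a throwaway directory so no real files are touched
cd "$(mktemp -d)" || exit 1

# Hypothetical two-line input file
printf 'old news\nsomething old, something new\n' > infile

# -e introduces the script; sed writes to standard output,
# which the shell redirects into outfile
sed -e 's/old/new/' infile > outfile

cat outfile
# new news
# something new, something new
```

Note that infile itself is left unchanged; only the redirected copy in outfile is edited.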
In this episode the sed examples are often applied to a small file of text containing the following lines, copied from the "about" page on the HPR site:
Hacker Public Radio (HPR) is an Internet Radio show (podcast) that releases shows every weekday Monday through Friday. HPR has a long lineage going back to Radio FreeK America, Binary Revolution Radio & Infonomicon, and it is a direct continuation of Twatech radio. Please listen to StankDawg's "Introduction to HPR" for more information. What differentiates HPR from other podcasts is that the shows are produced by the community - fellow listeners like you. There is no restrictions on how long the show can be, nor on the topic you can cover as long as they "are of interest to Hackers". If you want to see what topics have been covered so far just have a look at our Archive. We also allow for a series of shows so that host(s) can go into more detail on a topic.
This file, sed_demo1.txt, is available on the HPR site.
If the input file is missing, sed expects its input to come from standard input, so you might see a pipeline such as:
$ wc -l sed_demo1.txt | sed -e 's/ .*$//'
The wc command counts the lines in sed_demo1.txt and normally reports the number and the filename:
$ wc -l sed_demo1.txt
13 sed_demo1.txt
We remove the filename using sed, leaving just the number: 13. We'll be looking at how this sed example works later.
Using wc as shown below, with the file redirected to its standard input, is a simpler way of solving this problem, since wc then has no filename to report:
$ wc -l < sed_demo1.txt
13
Options
Some of the most frequently used options to the sed command are:
- -e SCRIPT or --expression=SCRIPT: Defines sed commands to be executed (the sed "script"). There can be multiple such options.
- -f SCRIPT-FILE or --file=SCRIPT-FILE: Defines a file of sed commands. There can be multiple files, and these can be combined with scripts on the command line as well.
- --help: Displays help information and exits.
If no -e, --expression, -f or --file option is given, then the first non-option argument is taken as the sed script to interpret. All remaining arguments are names of input files; if no input files are specified, then the standard input is read.
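A short sketch of these option forms follows. The sample line is invented, and because no input file is named, sed reads it from standard input. Only the short -e form is used here. The long --expression and --file spellings are GNU sed options and may not be available in other implementations:

```shell
# Hypothetical sample line
text='one two three'

# A single script introduced with -e
a=$(printf '%s\n' "$text" | sed -e 's/one/1/')

# Two -e options: both scripts are applied, in the order given
b=$(printf '%s\n' "$text" | sed -e 's/one/1/' -e 's/two/2/')

# No -e or -f: the first non-option argument is taken as the script
c=$(printf '%s\n' "$text" | sed 's/one/1/')

printf '%s\n' "$a" "$b" "$c"
# 1 two three
# 1 2 three
# 1 two three
```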
How sed works
We will just look at the basics of how
sed uses commands to process incoming data in this episode. We will look into this subject in more depth in later episodes.
As mentioned under Options, sed takes in commands or scripts from the command line or from files, and stores them.
It then processes the data it has been given through input files or piped to it on STDIN. It reads this input one line at a time, placing it in what is referred to as the pattern space.
Then sed runs the saved commands on the pattern space. Commands can also be made conditional, but we'll leave those details until a later episode. The commands may change the data in the pattern space.
Once all the commands have been executed the contents of the pattern space are printed, the pattern space cleared and the next line is read.
The printing of the pattern space is the default behaviour but can be overridden as we will see in a later episode.
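The sketch below illustrates this cycle using three invented input lines. The same command runs once on each line placed in the pattern space, and each result is printed before the next line is read:

```shell
# 's/e/E/' runs once per line, on whatever is in the pattern space
out=$(printf 'one\ntwo\nthree\n' | sed -e 's/e/E/')
printf '%s\n' "$out"
# onE
# two
# thrEe
```

The line "two" contains no 'e', so the command changes nothing and the line is printed exactly as it was read.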
Simple sed scripts (the s command)
Probably the most important sed command is the s (substitute) command. It has the structure:
s/REGEXP/REPLACEMENT/FLAGS
Its purpose is to look for a pattern (REGEXP) and, if found, to replace it (with REPLACEMENT). The real power of sed (and of other parts of Linux and Unix) lies in the type of pattern called a regular expression (regexp for short).
We need to look at the fundamentals of regular expressions to appreciate the sophistication of what can be done.
The FLAGS part is used to modify the behaviour of the command. We'll look at one commonly-used flag in this episode but will reserve the full range for later episodes.
Simple Regular Expressions
Regular expressions are patterns which are used to match a string. We will begin by looking at some of the simplest forms.
A regular expression is a sort of language in which certain characters have special meanings. The following table shows some of the simpler metacharacters used by sed. We will look into these in more detail in a later episode.
| Expression | Meaning |
|---|---|
| an ordinary character | A single ordinary character matches itself |
| . | Matches any single character |
| * | Matches a sequence of zero or more instances of the preceding item |
| [list] | Matches any single character in list: for example, [aeiou] matches any one lowercase vowel |
| [^list] | A leading '^' reverses the meaning of list, so that it matches any single character not in list |
| ^ | Matches the beginning of the line (anchors the search at the start) |
| $ | Matches the end of the line (anchors the search at the end) |
Simple character matching
The simplest form of match is where a particular sequence of characters is being searched for. So, the regexp 'abc' matches any string which contains the characters 'abc' in that order. For example:
s/abc/def/
This will find the first occurrence of 'abc' and will change it to 'def'.
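A quick check on some invented input shows two things. Only the first 'abc' on each line is changed, and a line with no match is printed unaltered:

```shell
out=$(printf 'abcabc\nxabcy\nxyz\n' | sed -e 's/abc/def/')
printf '%s\n' "$out"
# defabc
# xdefy
# xyz
```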
Matching arbitrary characters
Using the '.' (dot) character, which matches any character, we could search for and change 'abc', 'aac' or any other three-character string beginning with 'a' and ending with 'c' like so:
s/a.c/def/
If it is necessary to indicate an actual '.' character then it needs to be escaped by preceding it with a '\' (backslash) character. This indicates that its special regexp meaning is not to be used in this instance.
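The following sketch contrasts the unescaped and escaped dot, using made-up sample text:

```shell
# '.' matches any single character, so 'abc' is the first match here
a=$(echo 'abc aac a-c' | sed -e 's/a.c/def/')

# Unescaped, '1.2' also matches '1x2', which comes first on the line
b=$(echo 'version 1x2 or 1.2' | sed -e 's/1.2/one-point-two/')

# Escaped, '1\.2' matches only a literal full stop between the digits
c=$(echo 'version 1x2 or 1.2' | sed -e 's/1\.2/one-point-two/')

printf '%s\n' "$a" "$b" "$c"
# def aac a-c
# version one-point-two or 1.2
# version 1x2 or one-point-two
```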
Zero or more of the preceding
Using the '*' character we can match sequences of variable length. So, if it is necessary to match 'bc', 'abc', 'aabc' or 'aaabc', for example, then the following could be used:
s/a*bc/def/
What this indicates is that the 'a' can occur zero or more times, followed by the 'bc'. So, the '*' indicates that we are searching for zero or more instances of the preceding item.
If it is necessary to indicate an actual '*' character then it needs to be escaped by preceding it with a '\' (backslash) character. This indicates that its special regexp meaning is not to be used in this instance.
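The following sketch, again on invented input, shows 'a*bc' matching runs of zero or more 'a' characters. It also shows why a literal asterisk needs escaping. Unescaped, '2*3' means "zero or more 2s followed by 3", so the first match it finds in the text '2*3' is just the final '3':

```shell
# 'a*bc' matches 'bc' preceded by zero or more 'a' characters
out=$(printf 'bc\nabc\naabc\naaabc\nxbc\n' | sed -e 's/a*bc/def/')

# '\*' matches a literal asterisk
escaped=$(echo '2*3' | sed -e 's/2\*3/6/')

# Unescaped, '*' applies to the preceding '2' instead
unescaped=$(echo '2*3' | sed -e 's/2*3/6/')

printf '%s\n' "$out" "$escaped" "$unescaped"
# def
# def
# def
# def
# xdef
# 6
# 2*6
```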
Matching characters in or not in a set
Using the '[list]' expression we can match one of the characters in the given list. So, for example, to match 'c' followed by any vowel, followed by 't', and replace it with 'dog', we could use:
s/c[aeiou]t/dog/
This will find the first instance of 'cat', 'cet', 'cit', 'cot' or 'cut' on each line and will replace it with 'dog'.
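Here is a sketch on invented input. Exactly one character from the list must appear between 'c' and 't'. Without the g flag (covered later in this episode), only the first match on a line is replaced:

```shell
out=$(printf 'cat\ncot and cut\ncart\n' | sed -e 's/c[aeiou]t/dog/')
printf '%s\n' "$out"
# dog
# dog and cut
# cart
```

The word "cart" is untouched because the character after 'a' is 'r', not 't'.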
The other form of this expression, '[^list]', matches any character not in the given list. For example:
s/([^)]*)/(example 1)/
This is a common type of expression, used in sed and anywhere else you might find regular expressions. Here we are matching an open parenthesis, followed by any characters which are not a close parenthesis, followed by a close parenthesis. We replace what we find with the text '(example 1)'. This regexp will match any number of enclosed characters, including zero. Note that the open and the close parentheses must be on the same line in this example. Of course, sed is a line-orientated editor.
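Here is a sketch on invented input, including a line with empty parentheses and a pair split across two lines. For comparison, it also shows what the greedy '.*' would do. That version runs on to the last close parenthesis on the line:

```shell
text=$(printf 'See (the first note) and (another)\nEmpty () here\nopen (here\nclosed) there')

# '[^)]*' stops at the first close parenthesis
a=$(printf '%s\n' "$text" | sed -e 's/([^)]*)/(example 1)/')

# '.*' matches as much as possible, up to the last ')' on the line
b=$(echo 'See (the first note) and (another)' | sed -e 's/(.*)/(example 1)/')

printf '%s\n' "$a" "$b"
# See (example 1) and (another)
# Empty (example 1) here
# open (here
# closed) there
# See (example 1)
```

The split pair on the third and fourth lines is left alone, because sed only ever sees one line at a time.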
The list can simply be a list of characters, as we have seen, but it can also be a range such as 0-9, meaning all the digits from 0 to 9 inclusive. So this is a way of specifying a digit within a given range, for example:
s/A[4-6]/An/
This will replace 'A4', 'A5' or 'A6' with 'An'.
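The following sketch, on invented input, shows a range inside a list and the full 0-9 digit range:

```shell
# '[4-6]' matches a single digit from 4 to 6; first match per line only
a=$(printf 'A4 A5 A6\nA3 A7\nA9 then A5\n' | sed -e 's/A[4-6]/An/')

# '[0-9]' matches any single digit
b=$(echo 'room 101' | sed -e 's/[0-9]/#/')

printf '%s\n' "$a" "$b"
# An A5 A6
# A3 A7
# A9 then An
# room #01
```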
Anchoring at start or end of line
The character '^' (circumflex), when it occurs at the start of a regexp, indicates the start of a line. If it is used anywhere else it indicates the '^' character itself (though we have just seen it used for another purpose at the start of a list).
The character '$' (dollar sign), when it occurs at the end of a regexp, indicates the end of a line. If it is used anywhere else it indicates the '$' character itself.
If the sequence 'abc' must start at the beginning of the line, then use:
s/^abc/def/
If it must be at the end of the line, then this regexp would be needed:
s/abc$/def/
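Here is a sketch of both anchors on an invented line. It also shows that neither anchored regexp matches when 'abc' sits in the middle of the line:

```shell
# '^' anchors at the start of the line, '$' at the end
start_match=$(echo 'abc abc' | sed -e 's/^abc/def/')
end_match=$(echo 'abc abc' | sed -e 's/abc$/def/')

# Neither anchored regexp matches 'abc' in the middle of a line
no_match=$(echo 'xabcx' | sed -e 's/^abc/def/' -e 's/abc$/def/')

printf '%s\n' "$start_match" "$end_match" "$no_match"
# def abc
# abc def
# xabcx
```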
Replacement in the s command
The replacement used in the s command can be more complex than we have seen so far. We will go into more detail about what can be done here later in the series, but for now we'll look at the & character.
The & character denotes the whole of the text matched by the REGEXP part of the command.
So, to append 'def' to 'abc', the command would be:
s/abc/&def/
This can be seen as replacing 'abc' with 'abcdef'.
If a literal '&' is required then it needs to be escaped with a backslash:
s/fruit/apples \& pears/
Otherwise, undesirable consequences will result, because the unescaped & is replaced by the matched text 'fruit':
$ echo "Eat your fruit!" | sed -e 's/fruit/apples & pears/'
Eat your apples fruit pears!
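Here is a sketch of the & character in use, on invented input. The second command shows why & is handy. The regexp can match different strings, and & always stands for whichever one was actually found:

```shell
# '&' stands for the whole matched text
a=$(echo 'abc' | sed -e 's/abc/&def/')

# With a list in the regexp, '&' is whatever was matched ('cot' here)
b=$(echo 'cot and cat' | sed -e 's/c[aeiou]t/[&]/')

# '\&' produces a literal ampersand
c=$(echo 'Eat your fruit!' | sed -e 's/fruit/apples \& pears/')

printf '%s\n' "$a" "$b" "$c"
# abcdef
# [cot] and cat
# Eat your apples & pears!
```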
Flags and the s command
The flag we will examine this time is g. This causes the replacement to be applied to all matches, not just the first. So, for example:
s/abc/def/g
This means that all instances of the sequence 'abc' will be replaced with 'def' in the current line. Without it, as we saw earlier, just the first instance will be replaced.
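The following sketch compares the command with and without the g flag on an invented line. The last case shows that g applies within each line, while every line is still processed separately:

```shell
line='abc xabc abcabc'

# Without g: only the first match on the line is replaced
first=$(echo "$line" | sed -e 's/abc/def/')

# With g: every match on the line is replaced
all=$(echo "$line" | sed -e 's/abc/def/g')

# g works line by line
lines=$(printf 'abc abc\nabc\n' | sed -e 's/abc/def/g')

printf '%s\n' "$first" "$all" "$lines"
# def xabc abcabc
# def xdef defdef
# def def
# def
```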
Using sed commands in a file
As we saw in the Options section, sed can take its commands from a file (as well as from the command line). Commands can be formatted one per line, in which case the end of each line separates one command from another. There can be multiple commands per line, in which case they are separated by semicolons.
One way of using commands in a file might be the following:
$ sed -f - sed_demo1.txt <<END
s/\./!/g
s/community/Community/
END
This uses the Bash shell's heredoc feature. It is directly equivalent to using a quoted list of commands, one per line:
$ sed -e 's/\./!/g
s/community/Community/' sed_demo1.txt
In general it is better to create a sed command file in the same way you would create any other text file, for example with an editor. Giving the file an extension of '.sed' will help to remind you what it is.
$ cat commands.sed
s/\./!/g
s/community/Community/
$ sed -f commands.sed sed_demo1.txt
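Here is a self-contained sketch of the command-file approach. It assumes a scratch directory and a hypothetical two-line demo.txt standing in for sed_demo1.txt. It also runs the same two commands on one line, separated by a semicolon, to show that the results are identical:

```shell
cd "$(mktemp -d)" || exit 1

# Hypothetical two-line stand-in for sed_demo1.txt
printf 'Made by the community.\nJoin in. Have fun.\n' > demo.txt

# Write the two commands to commands.sed, one per line
printf '%s\n' 's/\./!/g' 's/community/Community/' > commands.sed

from_file=$(sed -f commands.sed demo.txt)

# The same commands on the command line, separated by a semicolon
from_cmdline=$(sed -e 's/\./!/g;s/community/Community/' demo.txt)

printf '%s\n' "$from_file"
# Made by the Community!
# Join in! Have fun!
```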
Examples
Example 1
$ wc -l sed_demo1.txt | sed -e 's/ .*$//'
This is a rather artificial example, as we have already seen, but we know that the wc command returns the number of lines followed by the filename when run in this way:
13 sed_demo1.txt
This output is passed to sed, which runs the script s/ .*$//. This replaces the first space, and any characters that follow it up to the end of the line, with nothing, thereby deleting them. This leaves the number of lines as the final result.
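The following sketch compares the two approaches on a hypothetical three-line file. It assumes GNU coreutils wc, which prints the count with no leading padding. The BSD wc found on macOS pads the count with leading spaces. In that case s/ .*$// would delete everything, including the number:

```shell
cd "$(mktemp -d)" || exit 1

# Hypothetical three-line file
printf 'one\ntwo\nthree\n' > lines.txt

# GNU wc prints "3 lines.txt"; sed deletes from the first space onwards
via_sed=$(wc -l lines.txt | sed -e 's/ .*$//')

# Reading from standard input, wc has no filename to print
via_redirect=$(wc -l < lines.txt)

printf '%s\n' "$via_sed" "$via_redirect"
# 3
# 3
```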
Example 2
$ sed -e 's/is no/are no/' sed_demo1.txt
This fixes the fragment "There is no restrictions", replacing it with "There are no restrictions" in sed_demo1.txt. Note that the word "restrictions" is on the next line of the file, and since sed works one line at a time, it cannot be included in the regexp.
Of course, we cannot just change 'is' to 'are' because there are many uses of this letter sequence throughout the file. That is why we make it more specific by using the regexp 'is no'.
We are not permanently changing the file with this command, but you can isolate and display the changes by adding a call to
grep in a pipeline as follows:
$ sed -e 's/is no/are no/' sed_demo1.txt | grep -A1 "are no"
produced by the community - fellow listeners like you. There are no
restrictions on how long the show can be, nor on the topic you can
The -A option to grep displays a number of lines of context after each matching line; the number chosen here is one.
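Here is a sketch of the same pipeline on a hypothetical three-line excerpt, written in the style of sed_demo1.txt:

```shell
# Hypothetical three-line excerpt
text=$(printf 'fellow listeners like you. There is no\nrestrictions on how long the show can be,\nnor on the topic you can cover.')

# Fix the grammar, then show the changed line plus one line after it
out=$(printf '%s\n' "$text" | sed -e 's/is no/are no/' | grep -A1 'are no')
printf '%s\n' "$out"
# fellow listeners like you. There are no
# restrictions on how long the show can be,
```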
We will look at how
sed can alter a file and save the results back to it in a later episode.
Example 3
$ sed -e 's/is no/are no/' -e 's/topic /topics /' sed_demo1.txt
This fixes the same fragment as Example 2, but also sorts out the phrase "the topic you can cover". The change is needed because of the use of the word "they" later in the sentence. We include the space in the target regexp because the word "topics" occurs later in the file.
We will look at this more in later shows in this series, but a
sed script can consist of multiple commands, and these can be separated by semi-colons. So, the following way of writing the earlier command in this example is exactly equivalent:
$ sed -e 's/is no/are no/;s/topic /topics /' sed_demo1.txt
Example 4
$ sed -e 's/Hacker /Hobby /;s/Hackers/Hobbyists/' sed_demo1.txt
There is one instance of "Hacker" and one of "Hackers" in the text. We don't want "Hackers" to be turned into "Hobbys", so we differentiate the two instances as shown.
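Here is a sketch on an invented two-line excerpt. It compares the two-command version with a naive single command:

```shell
text=$(printf 'Hacker Public Radio\nof interest to Hackers\n')

# The trailing space stops the first command from matching "Hackers"
good=$(printf '%s\n' "$text" | sed -e 's/Hacker /Hobby /;s/Hackers/Hobbyists/')

# A naive single command also rewrites the start of "Hackers"
naive=$(printf '%s\n' "$text" | sed -e 's/Hacker/Hobby/')

printf '%s\n' "$good" "$naive"
# Hobby Public Radio
# of interest to Hobbyists
# Hobby Public Radio
# of interest to Hobbys
```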
Example 5
$ sed -e 's/is no/are no/;s/topic /topics /;s/\. /.  /;s/ /#/g' sed_demo1.txt
This final example applies the earlier grammatical corrections, replaces the first single space after a full stop on each line with two spaces, and (perversely) turns all spaces into hash marks. This last stage uses the g flag to process all spaces.
This example shows that each of the commands is applied to each line in turn, and that it is possible to accumulate many commands to make a complex script. We have already seen how scripts can be more conveniently executed from a file, and we will examine this subject more deeply in a forthcoming episode in this series.
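Here is a sketch of the whole chain on one invented line. It also shows the intermediate result before the final s/ /#/g command, so the two spaces after the full stop are visible:

```shell
line='There is no restrictions. Any topic you like.'

# Grammar fixes plus one space after the first full stop widened to two
before=$(echo "$line" | sed -e 's/is no/are no/;s/topic /topics /;s/\. /.  /')

# The same commands followed by turning every space into '#'
out=$(echo "$line" | sed -e 's/is no/are no/;s/topic /topics /;s/\. /.  /;s/ /#/g')

printf '%s\n' "$before" "$out"
# There are no restrictions.  Any topics you like.
# There#are#no#restrictions.##Any#topics#you#like.
```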
- HPR "About" page: http://hackerpublicradio.org/about.php
- "Sed - An Introduction and Tutorial" by Bruce Barnett: http://www.grymoire.com/Unix/Sed.html
- Example file for processing: http://hackerpublicradio.org/eps/hpr1976/sed_demo1.txt (extracted from http://hackerpublicradio.org/about.php)