Hacker Public Radio

Your ideas, projects, opinions - podcasted.

New episodes Monday through Friday.


hpr2184 :: Gnu Awk - Part 5

In this episode, I describe how to use regular expressions with Awk.

Hosted by b-yeezi on 2016-12-15. Flagged as Clean and released under a CC-BY-SA license.

Part of the series: Learning Awk

Episodes about using Awk, the text manipulation language. It comes in various forms called awk, nawk, mawk and gawk, but the standard version on Linux is GNU Awk (gawk). It's a programming language optimised for the manipulation of delimited text.

GNU AWK - Part 5

Regular Expressions in AWK

The syntax for testing a value against a regular expression in AWK is as follows, where word can be any expression, such as a field like $1:

word ~ /match/

Or for not matching, use the following:

word !~ /match/

Remember the following file from the previous episodes:

name       color  amount
apple      red    4
banana     yellow 6
strawberry red    3
grape      purple 10
apple      green  8
plum       purple 2
kiwi       brown  4
potato     brown  9
pineapple  yellow 5

We can run the following AWK program against that file:

$1 ~ /p[elu]/ {print $0}

We will get the following output:

apple      red    4
grape      purple 10
apple      green  8
plum       purple 2
pineapple  yellow 5

Here is another example, which matches on the second field:

$2 ~ /e{2}/ {print $0}

This produces the output:

apple      green  8
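
The !~ operator is not exercised by the examples so far, so here is a minimal sketch of it run from a shell. It assumes an awk binary on the PATH; the variable names fruit_file and no_e and the temporary file location are made up for this illustration.

```shell
# Recreate the sample file in a temporary location (the path is arbitrary).
fruit_file=$(mktemp)
cat > "$fruit_file" <<'EOF'
name       color  amount
apple      red    4
banana     yellow 6
strawberry red    3
grape      purple 10
apple      green  8
plum       purple 2
kiwi       brown  4
potato     brown  9
pineapple  yellow 5
EOF

# NR > 1 skips the header row; !~ keeps rows whose color contains no "e".
no_e=$(awk 'NR > 1 && $2 !~ /e/ {print $1}' "$fruit_file")
echo "$no_e"

rm -f "$fruit_file"
```

Only the two brown rows survive the filter, so this prints kiwi and potato.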

Regular expression basics

Certain characters have special meaning when using regular expressions.

Anchors

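  • ^ - beginning of the string being matched (for a whole record, the beginning of the line)
  • $ - end of the string being matched (for a whole record, the end of the line)
  • \` - beginning of a string (gawk only; gawk does not support the Perl-style \A)
  • \' - end of a string (gawk only; gawk does not support the Perl-style \z)
  • \y - on a word boundary (gawk only; \b is the backspace character in AWK, so it cannot be used for this)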
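
The ^ and $ anchors can be tried out against individual fields, where they anchor to the start and end of the field rather than the whole line. This is a small sketch using a few rows of the fruit data; the shell variable names are invented for the example, and only anchors that work in any POSIX awk are used.

```shell
# A few rows in the same layout as the fruit file (name, color, amount).
rows='apple red 4
strawberry red 3
grape purple 10
kiwi brown 4'

# ^ anchors to the start of the field: names beginning with "gr".
starts_gr=$(printf '%s\n' "$rows" | awk '$1 ~ /^gr/ {print $1}')

# $ anchors to the end of the field: colors ending in "n".
ends_n=$(printf '%s\n' "$rows" | awk '$2 ~ /n$/ {print $1}')

# Both anchors together require the whole field to be exactly "red".
exact_red=$(printf '%s\n' "$rows" | awk '$2 ~ /^red$/ {print $1}')

echo "starts with gr: $starts_gr"
echo "color ends in n: $ends_n"
echo "color is exactly red:"
echo "$exact_red"
```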

Characters

  • [ad] - a or d
  • [a-d] - any character a through d
  • [^a-d] - not any character a through d
  • \w - any word character: a letter, digit, or underscore (gawk only)
  • \s - any white-space character (gawk only)
  • [0-9] - any digit (gawk does not support \d)

The capital versions, \W and \S, are negations.

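Or, you can reference characters the POSIX standard way. These class names are only valid inside a bracket expression, so a digit is matched with [[:digit:]], not [:digit:]: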

  • [:alnum:] - Alphanumeric characters
  • [:alpha:] - Alphabetic characters
  • [:blank:] - Space and TAB characters
  • [:cntrl:] - Control characters
  • [:digit:] - Numeric characters
  • [:graph:] - Characters that are both printable and visible (a space is printable but not visible, whereas an ‘a’ is both)
  • [:lower:] - Lowercase alphabetic characters
  • [:print:] - Printable characters (characters that are not control characters)
  • [:punct:] - Punctuation characters (characters that are not letters, digits, control characters, or space characters)
  • [:space:] - Space characters (such as space, TAB, and formfeed, to name a few)
  • [:upper:] - Uppercase alphabetic characters
  • [:xdigit:] - Characters that are hexadecimal digits
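
As a rough illustration of bracket expressions, the sketch below uses a POSIX class (wrapped in an outer pair of brackets) and a negated set. The sample rows and shell variable names are assumptions made for this example.

```shell
# A few rows in the same layout as the fruit file (name, color, amount).
rows='apple red 4
grape purple 10
plum purple 2
kiwi brown 4'

# Two [[:digit:]] classes between anchors: amounts with exactly two digits.
two_digit=$(printf '%s\n' "$rows" | awk '$3 ~ /^[[:digit:]][[:digit:]]$/ {print $1}')

# [^ag] negates the set: names whose first letter is neither "a" nor "g".
not_a_or_g=$(printf '%s\n' "$rows" | awk '$1 ~ /^[^ag]/ {print $1}')

echo "two-digit amount: $two_digit"
echo "first letter not a or g:"
echo "$not_a_or_g"
```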

Quantifiers

  • . - match any single character (not a quantifier itself, but often combined with one, as in .+)
  • + - match preceding one or more times
  • * - match preceding zero or more times
  • ? - match preceding zero or one time
  • {n} - match preceding exactly n times
  • {n,} - match preceding n or more times
  • {n,m} - match preceding between n and m times

Grouped Matches

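  • (...) - Parentheses are used for grouping, so that a quantifier or alternation applies to the whole group
  • | - Alternation: matches either the expression on its left or the one on its right; parentheses limit how far it reaches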
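
Here is a small sketch of grouping and alternation, again on an invented word list with made-up variable names. The first pattern uses | inside a group; the second applies a quantifier to a whole group.

```shell
# A short word list, one word per line.
words='apple
plum
kiwi
banana'

# (ki|pl) matches either "ki" or "pl"; ^ then applies to both alternatives.
ki_or_pl=$(printf '%s\n' "$words" | awk '/^(ki|pl)/')

# (an)+ repeats the two-letter group as a unit, not just the letter "n".
repeated_an=$(printf '%s\n' "$words" | awk '/^b(an)+a$/')

echo "starts with ki or pl:"
echo "$ki_or_pl"
echo "b, repeated an, a: $repeated_an"
```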

Replacement

  • The sub function replaces the match with the replacement string. It applies only to the first match.
  • The gsub function replaces every match.
  • The gensub function (gawk only) substitutes in a similar way to sub and gsub, but with extra functionality: it returns the changed string instead of modifying the target, it lets you choose which match to replace, and it supports backreferences to parenthesized groups.
  • The & character in the replacement string stands for the matched text. To insert a literal & character, escape it as \&, which is written "\\&" inside a string constant.
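
The difference between sub and gsub, and the two uses of &, can be sketched on a single line of input. The input strings and variable names are invented for this example; with no third argument, both functions operate on the whole record ($0).

```shell
line='apple pineapple'

# sub replaces only the first match in the record.
first_only=$(printf '%s\n' "$line" | awk '{ sub(/apple/, "nut"); print }')

# gsub replaces every match in the record.
every=$(printf '%s\n' "$line" | awk '{ gsub(/apple/, "nut"); print }')

# & in the replacement stands for whatever text the regex matched.
wrapped=$(printf '%s\n' "$line" | awk '{ gsub(/p+/, "[&]"); print }')

# "\\&" in the string constant yields a literal ampersand.
literal=$(printf '%s\n' 'salt and pepper' | awk '{ sub(/and/, "\\&"); print }')

echo "$first_only"   # nut pineapple
echo "$every"        # nut pinenut
echo "$wrapped"      # a[pp]le [p]inea[pp]le
echo "$literal"      # salt & pepper
```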

Example:

{ sub(/apple/, "nut", $1);
    print $1}

The output is:

name
nut
banana
strawberry
grape
nut
plum
kiwi
potato
pinenut

Another example:

{ sub(/.+(pp|rr)/, "test-&", $1);
    print $1}

This produces the following output:

name
test-apple
banana
test-strawberry
grape
test-apple
plum
kiwi
potato
test-pineapple


Comments

Comment #1 posted on 2016-12-15T01:00:28Z by Clinton Roy

Lots of useful info, great notes as well :)

There were a few times where the plosive Ps made it hard to listen to. What recording setup are you using?

Comment #2 posted on 2016-12-16T00:15:54Z by b-yeezi

:re Lots of useful info

Yes I know. I don't always use that Plantronics USB headset because of that reason, but it does the best at reducing background noise. I have to remember to position it correctly and do some tests before recording.
