Hacker Public Radio


hpr2114 :: Gnu Awk - Part 1

An introduction to the awk text parsing tool

Hosted by b-yeezi on 2016-09-08. This episode is flagged as Explicit and is released under a CC-BY-SA license.

Part of the series: Learning Awk

Episodes about using Awk, the text manipulation language. It comes in various forms called awk, nawk, mawk and gawk, but the standard version on Linux is GNU Awk (gawk). It's a programming language optimised for the manipulation of delimited text.

Introduction to Awk

Awk is a powerful text parsing tool for Unix and Unix-like systems.

The basic syntax is:

awk [options] 'pattern {action}' file

Here is a simple example file that we will be using, called file1.txt:

name       color  amount
apple      red    4
banana     yellow 6
strawberry red    3
grape      purple 10
apple      green  8
plum       purple 2
kiwi       brown  4
potato     brown  9
pineapple  yellow 5

First command:

awk '{print $2}' file1.txt

As you can see, the “print” command displays whatever follows it. In this case we are showing the second column using “$2”. To display the whole line, with all of its columns, use “$0”.

This example will output:

color
red
yellow
red
purple
green
purple
brown
brown
yellow

Second command:

awk '$2=="yellow"{print $1}' file1.txt

This will output:

banana
pineapple

As you can see, the command selects the rows whose second column equals “yellow”, but prints only column 1.
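
Both halves of the 'pattern {action}' syntax are optional, which a short experiment can show. The sketch below recreates file1.txt in a scratch directory (the workdir variable is only an illustration, not part of the episode) and runs one command with only a pattern and one with only an action.

```shell
# Recreate the sample file from the notes in a scratch directory.
workdir=$(mktemp -d)
cat > "$workdir/file1.txt" <<'EOF'
name       color  amount
apple      red    4
banana     yellow 6
strawberry red    3
grape      purple 10
apple      green  8
plum       purple 2
kiwi       brown  4
potato     brown  9
pineapple  yellow 5
EOF

# Pattern only: with no {action}, awk prints each matching line in full.
yellow_rows=$(awk '$2=="yellow"' "$workdir/file1.txt")
echo "$yellow_rows"

# Action only: with no pattern, the action runs on every line,
# including the header row.
names=$(awk '{print $1}' "$workdir/file1.txt")
echo "$names"
```

The first command prints the banana and pineapple rows exactly as they appear in the file; the second prints the first column of all ten lines.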

Field separator

By default, awk uses white space as the field separator. You can change this by using the -F option. For instance, file1.csv looks like this:

name,color,amount
apple,red,4
banana,yellow,6
strawberry,red,3
grape,purple,10
apple,green,8
plum,purple,2
kiwi,brown,4
potato,brown,9
pineapple,yellow,5

A command similar to the previous one:

awk -F"," '$2=="yellow" {print $1}' file1.csv

will still output:

banana
pineapple
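
The -F option is not limited to commas. As a rough sketch, the commands below feed a few made-up rows (not one of the episode's files) to awk on standard input, first separated by semicolons and then by tabs.

```shell
# Hypothetical semicolon-delimited rows, passed in on standard input.
result=$(printf '%s\n' 'name;color;amount' 'banana;yellow;6' 'kiwi;brown;4' |
  awk -F';' '$2=="yellow" {print $1}')
echo "$result"

# Hypothetical tab-delimited rows; -F'\t' tells awk to split on tabs.
tab_result=$(printf 'apple\tred\t4\ngrape\tpurple\t10\n' |
  awk -F'\t' '$2=="purple" {print $1}')
echo "$tab_result"
```

The semicolon is quoted so that the shell does not treat it as a command separator.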

Regular expressions work as well. The ~ operator tests whether a field matches the pattern between the slashes:

awk '$2 ~ /p.+p/ {print $0}' file1.txt

This returns:

grape      purple 10
plum       purple 2
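
A couple of further regular expression patterns may help here. This sketch assumes the same file1.txt, recreated in a scratch directory (the workdir variable is illustrative only), and shows an anchored match and an alternation.

```shell
# Recreate the sample file from the notes in a scratch directory.
workdir=$(mktemp -d)
cat > "$workdir/file1.txt" <<'EOF'
name       color  amount
apple      red    4
banana     yellow 6
strawberry red    3
grape      purple 10
apple      green  8
plum       purple 2
kiwi       brown  4
potato     brown  9
pineapple  yellow 5
EOF

# ^ anchors the match to the start of the field: names beginning with "p".
p_names=$(awk '$1 ~ /^p/ {print $1}' "$workdir/file1.txt")
echo "$p_names"

# Alternation with | between ^ and $: the color is exactly "red" or "green".
red_green=$(awk '$2 ~ /^(red|green)$/ {print $1, $2}' "$workdir/file1.txt")
echo "$red_green"
```

The first command lists plum, potato and pineapple; grape is skipped because the "p" is not at the start of the name.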

Numbers are interpreted automatically:

awk '$3>5 {print $1, $2}' file1.txt

Will output:

name color
banana yellow
grape purple
apple green
potato brown
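
The header row shows up in the output above because its third field, "amount", is not a number, so awk falls back to a string comparison, and letters typically sort after digits. One common way to force a numeric comparison is to add 0 to the field. The sketch below assumes the same sample data, recreated in a scratch directory (the workdir variable is illustrative only).

```shell
# Recreate the sample file from the notes in a scratch directory.
workdir=$(mktemp -d)
cat > "$workdir/file1.txt" <<'EOF'
name       color  amount
apple      red    4
banana     yellow 6
strawberry red    3
grape      purple 10
apple      green  8
plum       purple 2
kiwi       brown  4
potato     brown  9
pineapple  yellow 5
EOF

# $3+0 converts the field to a number; the text "amount" becomes 0,
# so the header row no longer passes the test.
big_amounts=$(awk '$3+0 > 5 {print $1, $2}' "$workdir/file1.txt")
echo "$big_amounts"
```

With the header filtered out, only the four data rows with an amount above 5 remain.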

Using output redirection, you can write your results to a file. For example:

awk -F, '$3>5 {print $1, $2}' file1.csv > output.txt

This writes the matching rows to output.txt instead of displaying them in the terminal.

Here’s a cool trick! You can automatically split a file into multiple files grouped by column. For example, if I want to split file1.txt into multiple files by color, here is the command.

awk '{print > ($2 ".txt")}' file1.txt

This will produce files named yellow.txt, red.txt, etc., plus a color.txt that holds the header row. In upcoming episodes, we will show how to improve the outputs.
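
To try the split without cluttering the current directory, something like the following could be used. It is only a sketch: the scratch directory and the workdir variable are assumptions, and the output file name is wrapped in parentheses, which the gawk manual recommends because some awk implementations parse an unparenthesized concatenation after > differently.

```shell
# Recreate the sample file from the notes in a scratch directory.
workdir=$(mktemp -d)
cat > "$workdir/file1.txt" <<'EOF'
name       color  amount
apple      red    4
banana     yellow 6
strawberry red    3
grape      purple 10
apple      green  8
plum       purple 2
kiwi       brown  4
potato     brown  9
pineapple  yellow 5
EOF

# Run awk inside the scratch directory so the per-color files land there.
# Each line is written to a file named after its second column.
(cd "$workdir" && awk '{print > ($2 ".txt")}' file1.txt)

# List the generated files, then show the rows that ended up in brown.txt.
ls "$workdir"
brown_rows=$(cat "$workdir/brown.txt")
echo "$brown_rows"
```

brown.txt ends up holding the kiwi and potato rows, and the header row lands in its own color.txt.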

Resources

  1. http://www.theunixschool.com/p/awk-sed.html
  2. http://www.tecmint.com/category/awk-command/
  3. http://linux.die.net/man/1/awk

Coming up

  • More options
  • Built-in Variables
  • Arithmetic operations
  • Awk language and syntax
