
Hacker Public Radio

Your ideas, projects, opinions - podcasted.



hpr1501 :: AWK

A cursory introduction to the AWK programming language


Hosted by laindir on 2014-05-05. Flagged as Clean and released under a CC-BY-SA license.
AWK, text processing, rule, pattern, action, regular expression.
The show is available on the Internet Archive at: https://archive.org/details/hpr1501


Duration: 00:19:25

Programming 101.

A series focusing on concepts and the basics of programming

First of all, a correction. In the podcast, I mistakenly refer to one of the coauthors of the language as Kevin Weinberger. My humblest apologies to Mr. Weinberger, whose actual first name is Peter. I also neglected to mention one of AWK's most interesting features: its automatic field splitting. I hope to submit a follow-up podcast soon to rectify these two glaring mistakes.

AWK is a loosely typed, interpreted programming language. Many tasks common in a UNIX programming environment, such as reading files, looping over input, matching regular expressions, and splitting strings into fields, have been abstracted and are presented to the programmer as native parts of the language. This makes AWK ideal for text processing.

The basic structure of an AWK program is a list of rules. Each rule is made up of an optional pattern and an optional action, though at least one of the two must be present. If the pattern matches, the corresponding action is run. When AWK starts up, it loads the supplied program text, runs any rules with the special BEGIN pattern, then opens in turn each file supplied on the command line (or reads stdin if no files are given or a - is specified). Each file is split into records based on the value of the RS (record separator) variable, which is a newline by default. AWK then loops through the records, splits each one into fields based on the value of the FS (field separator) variable, and tests it against each rule in the program in order. An empty pattern matches all records, so actions with no pattern run for every record. A rule with no action prints the current record.
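The rule structure described above can be sketched as follows. This is only an illustration, assuming a POSIX-compatible awk is on the PATH; the sample input and the shell variable names (input, rules_out, split_out) are made up for the example.

```shell
# Hypothetical sample input; the names and numbers are made up.
input='alice 30
bob 25
carol 41'

# Three rules: a BEGIN rule, an action with no pattern (runs for every
# record), and a pattern with no action (prints each matching record).
rules_out=$(printf '%s\n' "$input" | awk '
BEGIN { print "start" }
{ print "seen: " $1 }
$2 > 28
')
printf '%s\n' "$rules_out"

# Setting RS and FS in a BEGIN rule changes how input is split:
# here records end at ";" and fields are separated by ",".
split_out=$(printf '%s' 'a,b;c,d' | awk 'BEGIN { RS = ";"; FS = "," } { print $2 }')
printf '%s\n' "$split_out"
```

The first program prints "start" once, then for each record prints a "seen:" line, followed by the record itself when its second field exceeds 28. The second program prints the second field of each ";"-terminated record.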

The operator most distinctive to AWK is $ (field access). When followed by an integer literal, a variable holding an integer value, or any other expression that evaluates to one, it returns the corresponding field of the current record, counting from 1 up to NF, the special variable holding the number of fields. $0 returns the entire record. If the supplied integer is greater than NF, the field is treated as an uninitialized variable, which in AWK acts as either the empty string or the number 0, depending on the context in which it is referenced.
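A short sketch of field access, again assuming a POSIX-compatible awk; the input line and the shell variable names are invented for illustration.

```shell
# Hypothetical one-line input with three whitespace-separated fields.
line='one two three'

# $1 is the first field, $NF the last, NF the field count, $0 the whole record.
fields_out=$(printf '%s\n' "$line" | awk '{ print $1; print $NF; print NF; print $0 }')
printf '%s\n' "$fields_out"

# The field number can come from a variable holding an integer.
var_out=$(printf '%s\n' "$line" | awk '{ n = 2; print $n }')
printf '%s\n' "$var_out"

# A field beyond NF behaves like an uninitialized variable:
# the empty string in string context, 0 in numeric context.
beyond_out=$(printf '%s\n' "$line" | awk '{ print "[" $5 "]"; print $5 + 0 }')
printf '%s\n' "$beyond_out"
```

The last command prints an empty pair of brackets and then 0, showing the dual string and numeric reading of a nonexistent field.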

The most common type of pattern used in AWK (excepting, perhaps, the empty pattern) is a regular expression literal: a regular expression enclosed in forward slashes. This syntax is inherited from ed, the standard text editor, and has been passed down all the way to JavaScript. In AWK, a regular expression literal standing alone as a pattern is shorthand for $0 ~ /regex/, where ~ is the regular expression match operator. In other words, the pattern matches when the string $0, the current record, matches the supplied regular expression.
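To illustrate the shorthand, the sketch below runs the same match both ways. It assumes a POSIX-compatible awk; the sample input and variable names are hypothetical. The third command goes slightly beyond the notes above by applying ~ to a single field rather than to $0.

```shell
# Hypothetical sample input.
input='apple pie
banana split
cherry tart'

# A bare regular expression literal as the pattern...
short_out=$(printf '%s\n' "$input" | awk '/an/')

# ...is shorthand for an explicit match against the whole record.
long_out=$(printf '%s\n' "$input" | awk '$0 ~ /an/')

# The ~ operator also works on individual fields.
field_out=$(printf '%s\n' "$input" | awk '$2 ~ /^t/ { print $1 }')

printf '%s\n' "$short_out" "$long_out" "$field_out"
```

The first two commands select the same record, since only "banana split" contains "an"; the third prints the first field of the record whose second field starts with "t".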

