Hacker Public Radio


hpr1501 :: AWK

A cursory introduction to the AWK programming language

Hosted by laindir on 2014-05-05. Flagged as Clean and released under a CC-BY-SA license.

Part of the series: Programming 101

A series focusing on concepts and the basics of programming

First of all, a correction. In the podcast, I mistakenly refer to one of the coauthors of the language as Kevin Weinberger. My humblest apologies to Mr. Weinberger, whose actual first name is Peter. I also neglected to mention one of AWK's most interesting features: its automatic field splitting. I hope to submit a follow-up podcast soon to rectify these two glaring mistakes.

AWK is a loosely typed, interpreted programming language. Many tasks that are common in a UNIX programming environment, such as reading files, looping over input, matching regular expressions, and splitting strings into fields, have been abstracted and are presented to the programmer as native parts of the language. This makes AWK ideal for text processing.

The basic structure of an AWK program is a list of rules. Each rule is made up of an optional pattern and an optional action, though at least one of the two must be present. If the pattern matches, the corresponding action is run. When AWK starts up, it loads the supplied program text, runs any rules with the special BEGIN pattern, then opens, in turn, each file supplied on the command line (or reads stdin if no files are given, or wherever a - appears in place of a file name). Each file is split into records based on the value of the RS (record separator) variable. AWK then loops through each record, splits it into fields based on the value of the FS (field separator) variable, and tests the record against each rule in the program, in order. An omitted pattern matches all records, so actions with no pattern run for every record. An omitted action causes the current record to be printed; note that this differs from an explicitly empty action, { }, which does nothing.
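To make the rule structure concrete, here is a minimal sketch wrapped in a shell function so it can be run as is. The sample data and the names sample, rules_demo, and count are made up for illustration. It also uses the END pattern, which these notes do not cover; END is the counterpart of BEGIN and runs once after all input has been read.

```sh
# Hypothetical sample input: one record per line, fields separated by colons.
sample='alice:30
bob:25
carol:41'

rules_demo() {
    printf '%s\n' "$sample" | awk '
        BEGIN   { FS = ":"; print "start" }   # runs once, before any input is read
        $2 > 28                               # pattern with no action: prints matching records
                { count++ }                   # action with no pattern: runs for every record
        END     { print count " records" }    # runs once, after all input is consumed
    '
}

rules_demo
```

Note that the pattern-only rule ends at the newline. An opening brace on the following line starts a new, separate rule, which is why the counter is incremented for all three records rather than only the two that match.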

The operator most distinctive to AWK is the $ (field access) operator. When followed by an integer literal or a variable holding an integer value, it returns the corresponding field of the current record (fields are numbered from 1 up to NF, the special variable holding the number of fields). $0 returns the entire record. If the supplied integer is greater than NF, the field is treated like an uninitialized variable, which in AWK acts as either the empty string or the number 0, depending on the context in which it is referenced.
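A short sketch of the field access operator, assuming the default field separator (runs of blanks). The function name fields_demo, the variable n, and the input line are hypothetical.

```sh
fields_demo() {
    printf 'one two three\n' | awk '{
        n = 2
        print $1          # first field
        print $n          # field whose number is held in a variable
        print $NF         # last field, because NF holds the field count
        print NF          # the field count itself
        print "[" $5 "]"  # beyond NF: empty in a string context
        print $5 + 0      # beyond NF: zero in a numeric context
    }'
}

fields_demo
```

Reading $5 here does not change NF. Only assigning to a field beyond NF extends the record.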

The most common type of pattern used in AWK (excepting, perhaps, the empty pattern) is a regular expression literal. It consists of a regular expression enclosed in forward slashes. This syntax is inherited from ed, the standard text editor, and has been passed down all the way to JavaScript. In AWK, a regular expression literal, alone as a pattern, is shorthand for $0 ~ /regex/, where ~ is the regular expression match operator. In other words, the rule fires when the string $0, the current record, matches the supplied regular expression.
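The shorthand can be checked with a small sketch. The three function names and the input lines below are hypothetical. The first two functions should print the same records, and the third shows the same ~ operator applied to a single field instead of the whole record.

```sh
# Hypothetical sample input shared by all three functions.
menu() {
    printf 'apple pie\nbanana split\ncherry pie\n'
}

# A bare regular expression literal as the pattern, with no action.
regex_demo() {
    menu | awk '/pie$/'
}

# The long form that the bare literal stands for.
explicit_demo() {
    menu | awk '$0 ~ /pie$/'
}

# The match operator applied to one field, with an explicit action.
field_demo() {
    menu | awk '$1 ~ /^b/ { print $2 }'
}

regex_demo
explicit_demo
field_demo
```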

