hpr2852 :: Gnu Awk - Part 16

Winding up the Gnu Awk series

Hosted by Dave Morriss on 2019-07-09. Flagged as Explicit and released under a CC-BY-SA license.
Tags: Gnu Awk, advanced features. Comments: 4.
The show is available on the Internet Archive at: https://archive.org/details/hpr2852

Duration: 00:42:44

Part of the series: Learning Awk.

Episodes about using Awk, the text manipulation language. It comes in various forms called awk, nawk, mawk and gawk, but the standard version on Linux is GNU Awk (gawk). It's a programming language optimised for the manipulation of delimited text.

Introduction

This is the sixteenth and final episode of the 'Learning Awk' series which is being produced by b-yeezi (BY) and Dave Morriss (DM).

We are using this as an opportunity to have a round-table discussion about the series, about Awk, and about where we recommend listeners go from here. Including this one, we have produced 16 episodes covering the features most likely to be used in pipelines on the command line or in simple shell and awk scripts.

Note that although the HPR site will list this episode as having a single host, in fact it has two! Plans are afoot to enhance the HPR database so we can eventually indicate this properly.

Topics Discussed

The series
- Started in 2016 (first show released 2016-07-13)
- Finishing in 2019
- 16 episodes in total

Why are we finishing the series?
- We have probably reached the limit of what is useful on the command line or in shell scripts or even in manageable-sized Awk scripts
- Awk shows its limitations as we go on and doesn’t compare well with more modern text processing languages

Our personal experiences with Awk
- BY:
  - Started with sed and awk when first moving to Linux in 2011
  - (ongoing) Exploring and cleaning client data
  - (ongoing) Personal scripts when adding Python or another tool would be overkill
- DM:
  - Working with VAX/VMS in the 1980s. There were no very good text processing features built in, so Gnu Awk (and sed) was a great way to handle the data we were using to generate accounts for new students each year. We could easily spot bad records and do some data validation (for example, impossible dates of birth).
  - In the late 1980s and early 1990s more Unix systems came on the scene running HP-UX, Ultrix, SunOS, Solaris, OSF/1 and Tru64 UNIX, and awk was very much used there.
  - Later still we moved to Linux; initially Fedora but later RHEL, and of course awk figured in the list of tools there as well.
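
As an illustration of the kind of data validation DM describes, here is a minimal sketch. The record layout (id, name, date of birth as YYYY-MM-DD, comma-separated), the cut-off year 1900 and the function name check_dob are all assumptions made for this example and are not taken from the episode.

```shell
# Hypothetical student records: id,name,date-of-birth (YYYY-MM-DD)
check_dob() {
    awk -F, '
    {
        # Split the third field into year, month and day
        n = split($3, d, "-")
        # Flag records with the wrong shape or out-of-range values
        if (n != 3 || d[1] < 1900 || d[2] < 1 || d[2] > 12 || d[3] < 1 || d[3] > 31)
            printf "line %d: bad date of birth: %s\n", NR, $0
    }'
}

printf '%s\n' '1,Alice,1970-02-14' '2,Bob,1971-13-01' '3,Carol,unknown' | check_dob
# line 2: bad date of birth: 2,Bob,1971-13-01
# line 3: bad date of birth: 3,Carol,unknown
```

This only checks ranges; it would not catch a date such as 30 February, which needs per-month day limits.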

What have we left out? Why?
- User-defined functions are pretty clunky and hard to use
- Multi-dimensional arrays: other languages do this better
- Internationalization: assumes you’re writing big awk programs
- The gawk debugger: quite clever but probably overkill for this series
- Extensions written in C and C++: some come with gawk and look quite good, but this subject is out of scope
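
To give a flavour of the first two items, here is a small sketch rather than anything covered in the series. It shows two standard Awk behaviours: a function's local variables have to be declared as extra parameters, and a subscript such as grid[NR, i] is really a single string key joined with SUBSEP. The names row_totals, row_total and grid are invented for this example.

```shell
row_totals() {
    awk '
    # Locals must be declared as extra parameters (here: i and total)
    function row_total(arr, row, ncols,    i, total) {
        for (i = 1; i <= ncols; i++)
            total += arr[row, i]
        return total
    }
    {
        # grid[NR, i] is stored under one string key: NR SUBSEP i
        for (i = 1; i <= NF; i++)
            grid[NR, i] = $i
        printf "row %d total %d\n", NR, row_total(grid, NR, NF)
    }'
}

printf '%s\n' '1 2 3' '10 20' | row_totals
# row 1 total 6
# row 2 total 30
```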

What to use as an alternative to Awk?
- DM moved from gawk to Perl (version 4) in the 1980s and later to Perl version 5. This might have engendered an awky, Bashy mindset that’s hard to shake off. Not the recommended place to start these days.
- BY moved from gawk to Python and R for large projects. For interactive Bashy exploration, BY moved to XSV, q, and csvkit for most use cases.
- These tools have built-in convenience features, like accounting for headers, data types, and file encodings
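
By contrast, in Awk the header row has to be handled by hand. The sketch below shows one common approach: read the header on the first line, build a name-to-column lookup, then skip it. The function name sum_column and the sample data are made up for illustration.

```shell
sum_column() {
    # $1 is the header name of the column to total
    awk -F, -v col="$1" '
    NR == 1 {
        # Map each header name to its field number, then skip the header
        for (i = 1; i <= NF; i++) idx[$i] = i
        next
    }
    { total += $(idx[col]) }
    END { print total + 0 }'
}

printf '%s\n' 'name,qty' 'apple,3' 'pear,4' | sum_column qty
# 7
```

Note that this does no checking that the requested column exists, and it does not cope with quoted fields containing commas, which is the sort of detail the dedicated CSV tools take care of.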

What’s next?
- It is planned to turn the notes for this series into a combined document which will be available on the HPR site and on archive.org. There is no timescale for this at the moment

Links

GNU Awk User’s Guide
- Internationalization with gawk

A proof that Unix utility sed is Turing complete
Mutagen - discussed as an alternative way to access audio metadata (tags) from Python
XSV
csvkit
Run SQL on CSV files with q

Links to all of the shows in this series on HPR:
- Gnu Awk - Part 1 - episode 2114
- Gnu Awk - Part 2 - episode 2129
- Gnu Awk - Part 3 - episode 2143
- Gnu Awk - Part 4 - episode 2163
- Gnu Awk - Part 5 - episode 2184
- Gnu Awk - Part 6 - episode 2238
- Gnu Awk - Part 7 - episode 2330
- Gnu Awk - Part 8 - episode 2438
- Gnu Awk - Part 9 - episode 2476
- Gnu Awk - Part 10 - episode 2526
- Gnu Awk - Part 11 - episode 2554
- Gnu Awk - Part 12 - episode 2610
- Gnu Awk - Part 13 - episode 2804
- Gnu Awk - Part 14 - episode 2816
- Gnu Awk - Part 15 - episode 2824
- Gnu Awk - Part 16 - episode 2852

Comments

Comment #1 posted on 2019-07-09 08:46:47 by tuturto

thanks

Thank you for the series and the wrap-up episode. It's been a pleasure to follow the series and learn about awk. I don't use awk myself, but it's always good to know that there are plenty of tools to choose from when there's a specific need.

Comment #2 posted on 2019-07-09 10:47:55 by Hipstre

Thank You!

Thank you for the series, you guys! It was great. I learned more than I wanted to. I tried hard to not learn, but you made me. Not just about awk, but about programming, information theory, and data structures, history, bash, etc...

Comment #3 posted on 2019-07-09 14:25:28 by norrist

HPR Epic

This series will stand out as one of the highlights of HPR. Thank you b-yeezi and Dave Morriss.

Comment #4 posted on 2019-07-13 17:08:55 by Dave Morriss

Many thanks for the kind words

We had a lot of fun putting the series together. I certainly found out more about awk than I knew before, and I think the same sentiment was expressed by my collaborator b-yeezi.

There's nothing quite like telling others about a thing to make you understand it better. ;-)
