
Hacker Public Radio

Your ideas, projects, opinions - podcasted.

New episodes Monday through Friday.


hpr2091 :: Everyday Unix/Linux Tools for data processing

In this episode, I give some examples of common and uncommon tools for processing data files


Hosted by b-yeezi on 2016-08-08 is flagged as Clean and is released under a CC-BY-SA license.

Here are some of the tools I use to process and clean data from all manner of customers:

detox

The detox utility renames files to make them easier to work with. It removes spaces and other such annoyances. It’ll also translate or clean up Latin-1 (ISO 8859-1) characters encoded in 8-bit ASCII, Unicode characters encoded in UTF-8, and CGI-escaped characters.
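To make the idea of "cleaning" a filename concrete, here is a minimal shell sketch of that kind of renaming. It is not detox's own algorithm: `sanitize_name` and `sanitize_dir` are hypothetical helpers that only turn spaces into underscores and drop anything outside a conservative character set, and they do none of the Latin-1, UTF-8 or CGI-escape translation described above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of detox): print a "safe" version of a name.
# Spaces become underscores; anything outside A-Z a-z 0-9 . _ - is dropped.
sanitize_name() {
    printf '%s' "$1" | tr ' ' '_' | tr -cd 'A-Za-z0-9._-'
}

# Hypothetical helper: rename every entry in a directory (default: the
# current one) whose sanitized name differs from its current name.
sanitize_dir() {
    local dir="${1:-.}" path base clean
    for path in "$dir"/*; do
        [ -e "$path" ] || continue          # skip the unexpanded glob
        base=$(basename -- "$path")
        clean=$(sanitize_name "$base")
        # Never rename to an empty name or overwrite an existing file.
        if [ -n "$clean" ] && [ "$clean" != "$base" ] && [ ! -e "$dir/$clean" ]; then
            mv -- "$path" "$dir/$clean"
        fi
    done
}
```

Calling `sanitize_dir ~/incoming` would rename the entries of that (assumed) directory in place. With detox itself, `detox -n` does a dry run that reports what would be renamed without changing anything, which is worth running first on customer data.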

See other episodes for great sed information. I like to use it to remove DOS end-of-line and end-of-file characters:

sed -i 's/^M//g' *.txt   # ^M is a literal carriage return: type Ctrl-V, then Ctrl-M

or

sed -i 's/\r//g' *.txt
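A small sketch of how the carriage-return cleanup can be wrapped and checked. It assumes GNU sed (for `-i` without a backup suffix and for the `\r` escape); `strip_cr` and `count_cr_lines` are hypothetical helper names, not standard commands.

```shell
#!/usr/bin/env bash
# Assumes GNU sed. Remove a trailing carriage return from every line of
# each file given as an argument, editing the files in place.
strip_cr() {
    sed -i 's/\r$//' "$@"
}

# Print how many lines of a file still contain a carriage return.
# grep exits non-zero when the count is 0, so "|| true" keeps the
# helper usable under "set -e".
count_cr_lines() {
    grep -c "$(printf '\r')" "$1" || true
}
```

Running `count_cr_lines data.txt` before and after `strip_cr data.txt` (with `data.txt` standing in for any file) shows whether the file had DOS line endings and confirms they are gone. This variant anchors the match with `$`, so unlike `s/\r//g` it leaves a carriage return in the middle of a line alone. The dos2unix tool listed below does the same conversion without a regular expression.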

Command-line tools

  • ack
  • awk
  • detox
  • grep
  • pandoc
  • pdftotext -layout
  • sed
  • unix2dos and dos2unix
  • wget
  • curl
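As an illustration of how two of the tools above combine, the following sketch uses grep and awk to tally the values in one column of a delimited file. `count_by_column` is a hypothetical helper, and it assumes a simple comma-separated file with a single header row and no quoted fields that contain commas.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count how often each value appears in one column of
# a simple comma-separated file. Blank lines and the header row are skipped.
# Usage: count_by_column FILE COLUMN_NUMBER
count_by_column() {
    grep -v '^[[:space:]]*$' "$1" |
        awk -F',' -v col="$2" '
            NR > 1 { n[$col]++ }                       # tally the chosen column
            END    { for (k in n) print k "," n[k] }   # emit value,count pairs
        ' |
        sort
}
```

For example, `count_by_column customers.csv 2` (with `customers.csv` as a stand-in filename) prints one `value,count` line per distinct value in the second column, which is a quick way to spot misspelled or inconsistent categories before loading the data elsewhere.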

R libraries

  • RCurl
  • XML
  • rvest
  • tm
  • xlsx

Python libraries

Vim tricks

  • buffer searches (:vim /pattern/ ##)
  • Ack plugin
  • bufdo (:bufdo %s/pattern/replace/ge | update)

Other tools


Comments


Comment #1 posted on 2016-08-09T00:46:44Z by Jonathan Kulp

Ack!

Thanks this is a genius tool. Never heard of it before.

Comment #2 posted on 2016-08-17T16:55:35Z by Ken Fallon

I love detox

detox -vr *

wow what an excellent tool.

Comment #3 posted on 2016-08-19T16:30:03Z by Dave Morriss

Thanks for mentioning 'ack'

Wow! I had never encountered 'ack' before. It's amazing.

I have written a bunch of Bash scripts to work with a PostgreSQL database (yes, I know, it's a bit like wearing a hair shirt; self mortification), and I found I could do things like:

ack --shell --pager=more psql .

There's no other easy way to do this that I know of.

Thanks very much for pointing this one out.

Comment #4 posted on 2016-08-21T14:53:50Z by ivor

Interesting

I always love vim tips. So I got pulled in looking at the buffer search. Then I noticed the other tools mentioned. Most of them I know about and use all that are relevant to me very frequently. So now I'm going to subscribe...
