
Hacker Public Radio

Your ideas, projects, opinions - podcasted.

New episodes Monday through Friday.


hpr2091 :: Everyday Unix/Linux Tools for data processing

In this episode, I give some examples of common and uncommon tools for processing data files.


Hosted by b-yeezi on 2016-08-08. This episode is flagged as Clean and is released under a CC-BY-SA license.
Listen in ogg, spx, or mp3 format.

Here are some of the tools I use to process and clean data from all manner of customers:

detox

The detox utility renames files to make them easier to work with. It removes spaces and other such annoyances. It'll also translate or clean up Latin-1 (ISO 8859-1) characters encoded in 8-bit ASCII, Unicode characters encoded in UTF-8, and CGI escaped characters.
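detox is driven by configurable filter sequences, so its exact output depends on the sequence in use, and `detox -n -v` previews the renames without changing anything. As a rough illustration only, here is a sketch of the most common effect in plain POSIX shell: spaces and a few shell-unfriendly characters become underscores. The function name `detox_ish` and the chosen character set are hypothetical; this is not how detox itself is implemented.

```shell
#!/bin/sh
# Hypothetical helper: a rough, detox-like rename (NOT the real detox).
# Turns spaces and the characters & ( ) ' " into underscores,
# then squeezes runs of underscores down to one.
detox_ish() {
  for f in "$@"; do
    [ -e "$f" ] || continue
    dir=$(dirname -- "$f")
    base=$(basename -- "$f")
    # 6 characters in the first set map to 6 underscores in the second.
    clean=$(printf '%s' "$base" | tr " &()'\"" '______' | tr -s '_')
    # Only rename when the name changes and the target does not already exist.
    if [ "$clean" != "$base" ] && [ ! -e "$dir/$clean" ]; then
      mv -- "$f" "$dir/$clean"
      printf '%s -> %s\n' "$f" "$dir/$clean"
    fi
  done
}
```

For example, `detox_ish *` in a directory containing `My Report (final).txt` renames it to `My_Report_final_.txt`. It skips a rename rather than overwrite an existing file.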

See other HPR episodes for great sed information. I like to remove DOS end-of-line and end-of-file characters:

sed -i 's/^M//g' *.txt

(here ^M is a literal carriage return, typed in the shell as Ctrl-V followed by Ctrl-M)

or

sed -i 's/\r//g' *.txt

Command-line tools

  • ack
  • awk
  • detox
  • grep
  • pandoc
  • pdftotext -layout
  • sed
  • unix2dos and dos2unix
  • wget
  • curl
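As a hedged illustration of how a few of these tools chain together, this sketch cleans a small, made-up CSV export: sed strips the trailing carriage returns, grep drops blank lines, and awk totals a numeric column. The file name `customers.csv`, its two-column layout, and the helper name `csv_total` are assumptions for the example, and `\r` in the sed expression assumes GNU sed.

```shell
#!/bin/sh
# Hypothetical helper: clean a DOS-formatted CSV and total its 2nd column.
#   sed  removes the carriage return at the end of each line (GNU sed \r),
#   grep drops the lines that are now empty,
#   awk  skips the header row and sums the second comma-separated field.
csv_total() {
  sed 's/\r$//' "$1" \
    | grep -v '^$' \
    | awk -F, 'NR > 1 { total += $2 } END { print "total:", total }'
}

# Made-up sample input: DOS line endings and a blank line in the middle.
printf 'name,amount\r\nalice,10\r\n\r\nbob,32\r\n' > customers.csv
csv_total customers.csv    # prints: total: 42
```

Keeping each step as its own stage makes it easy to drop one out and inspect the intermediate output while debugging a messy file.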

R libraries

  • RCurl
  • XML
  • rvest
  • tm
  • xlsx

Python libraries

Vim tricks

  • buffer searches (:vim /pattern/ ##)
  • Ack plugin
  • bufdo (:bufdo %s/pattern/replace/ge | update)

Other tools


Comments


Comment #1 posted on 2016-08-09T00:46:44Z by Jonathan Kulp

Ack!

Thanks, this is a genius tool. I'd never heard of it before.

Comment #2 posted on 2016-08-17T16:55:35Z by Ken Fallon

I love detox

detox -vr *

Wow, what an excellent tool.

Comment #3 posted on 2016-08-19T16:30:03Z by Dave Morriss

Thanks for mentioning 'ack'

Wow! I had never encountered 'ack' before. It's amazing.

I have written a bunch of Bash scripts to work with a PostgreSQL database (yes, I know, it's a bit like wearing a hair shirt; self-mortification), and I found I could do things like:

ack --shell --pager=more psql .

There's no other easy way to do this that I know of.

Thanks very much for pointing this one out.

Comment #4 posted on 2016-08-21T14:53:50Z by ivor

Interesting

I always love vim tips, so I got pulled in looking at the buffer search. Then I noticed the other tools mentioned. I know most of them, and I use the ones relevant to me very frequently. So now I'm going to subscribe...
