In this episode, I give some examples of common and uncommon tools for processing data files.
Hosted by b-yeezi on 2016-08-08. This episode is flagged as Clean and is released under a CC-BY-SA license.
Here are some of the tools I use to process and clean data from all manner of customers:
The detox utility renames files to make them easier to work with. It removes spaces and other such annoyances. It’ll also translate or clean up Latin-1 (ISO 8859-1) characters encoded in 8-bit ASCII, Unicode characters encoded in UTF-8, and CGI-escaped characters.
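As a rough illustration of that workflow, here is a minimal sketch that previews and then applies a rename on a scratch directory. The file name and the `workdir` variable are made up for the example, and the `-n` (dry run) and `-v` (verbose) flags are assumed to behave as described in the detox man page on your system. The exact name detox produces depends on its version and configuration, so none is claimed here.

```shell
# Scratch directory with an awkward file name (hypothetical example data).
workdir=$(mktemp -d)
touch "$workdir/My Report (final).txt"

if command -v detox >/dev/null 2>&1; then
    # -n: dry run, only report what would be renamed.
    detox -n "$workdir"/*
    # -v: rename for real and print each change.
    detox -v "$workdir"/*
else
    echo "detox is not installed; nothing renamed" >&2
fi

# Show the resulting file names.
ls "$workdir"
```

Running the dry run first is a cheap way to check the new names before anything on disk changes.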
See other episodes for great sed information. I like to remove DOS end of line and end of file characters:
sed -i 's/
sed -i 's/\r//g' *.txt
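A minimal sketch of that cleanup in one pass, assuming the end-of-file character in question is the DOS Ctrl-Z marker (byte 0x1A) and that GNU sed is available (for `-i` without a suffix and for the `\r` and `\x1a` escapes). The function name `clean_dos_file` and the demo file are hypothetical, not part of the episode.

```shell
# Strip DOS carriage returns and Ctrl-Z (0x1A) markers in place.
# Assumption: GNU sed; other sed implementations may not accept
# these escapes or a bare -i.
clean_dos_file() {
    sed -i -e 's/\r//g' -e 's/\x1a//g' "$@"
}

# Demo on a throwaway file: two CRLF-terminated lines plus a trailing Ctrl-Z.
demo=$(mktemp)
printf 'alpha\r\nbeta\r\n\032' > "$demo"

clean_dos_file "$demo"

# Dump the bytes to confirm no \r or 032 remains.
od -c "$demo"
```

Because `-i` overwrites the files it is given, it is safer to try this on copies of customer data first.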
- pdftotext -layout
- unix2dos and dos2unix
- buffer searches (:vim /pattern/ ##)
- Ack plugin
- bufdo (:bufdo %s/pattern/replace/ge | update)
Comment #1 posted on 2016-08-09T00:46:44Z by Jonathan Kulp
Thanks this is a genius tool. Never heard of it before.
Comment #2 posted on 2016-08-17T16:55:35Z by Ken Fallon
I love detox
detox -vr *
wow what an excellent tool.
Comment #3 posted on 2016-08-19T16:30:03Z by Dave Morriss
Thanks for mentioning 'ack'
Wow! I had never encountered 'ack' before. It's amazing.
I have written a bunch of Bash scripts to work with a PostgreSQL database (yes, I know, it's a bit like wearing a hair shirt; self mortification), and I found I could do things like:
ack --shell --pager=more psql .
There's no other easy way to do this that I know of.
Thanks very much for pointing this one out.
Comment #4 posted on 2016-08-21T14:53:50Z by ivor
I always love vim tips. So I got pulled in looking at the buffer search. Then I noticed the other tools mentioned. Most of them I know about and use all that are relevant to me very frequently. So now I'm going to subscribe...