In this episode, I give some examples of common and uncommon tools for processing data files.
Hosted by b-yeezi on 2016-08-08. Flagged as Clean and released under a CC-BY-SA license.
Here are some of the tools I use to process and clean data from all manner of customers:
The detox utility renames files to make them easier to work with. It removes spaces and other such annoyances. It'll also translate or clean up Latin-1 (ISO 8859-1) characters encoded in 8-bit ASCII, Unicode characters encoded in UTF-8, and CGI-escaped characters.
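To give a feel for the kind of renaming involved, here is a minimal shell sketch. It is not detox's actual algorithm: `sanitize_name` is a hypothetical helper that only swaps spaces for underscores and drops anything outside a small set of safe ASCII characters, and it does none of the Latin-1, UTF-8 or CGI-escape translation that detox performs.

```shell
# Hypothetical helper, NOT part of detox: print a simplified "safe" file name.
sanitize_name() {
    # Spaces become underscores; every byte outside A-Z, a-z, 0-9, '.', '_' and '-' is deleted.
    printf '%s' "$1" | tr ' ' '_' | LC_ALL=C tr -cd 'A-Za-z0-9._-'
}

# Show the cleaned name without renaming anything.
sanitize_name 'my file (1).txt'   # prints my_file_1.txt
```

For real work detox is the better choice, since it translates characters where this sketch simply throws them away.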
See other episodes for great sed information. I like to remove DOS end-of-line and end-of-file characters:
sed -i 's/
sed -i 's/\r//g' *.txt
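A quick way to try this without touching real files is to run the same idea as a filter on standard input. The sketch below makes a few assumptions: GNU sed (which understands `\r`), that the "end of file character" means the DOS Ctrl-Z byte (0x1A, octal 032), and a hypothetical function name `strip_dos`.

```shell
# Hypothetical filter: read DOS-style text on stdin, write Unix-style text on stdout.
strip_dos() {
    # Remove a carriage return only at the end of a line (GNU sed syntax),
    # then delete any Ctrl-Z bytes (octal 032), assumed here to be the DOS EOF marker.
    sed 's/\r$//' | tr -d '\032'
}

# Inspect the result byte by byte; no \r or \032 should remain.
printf 'id,name\r\n1,alice\r\n\032' | strip_dos | od -c
```

Anchoring the pattern with `$` leaves any carriage return in the middle of a line alone, which is a slightly more conservative choice than the `g` form above.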
- pdftotext -layout
- unix2dos and dos2unix
- buffer searches (:vim /pattern/ ##)
- Ack plugin
- bufdo (:bufdo %s/pattern/replace/ge | update)
Comment #1 posted on 2016-08-09T00:46:44Z by Jonathan Kulp
Thanks this is a genius tool. Never heard of it before.
Comment #2 posted on 2016-08-17T16:55:35Z by Ken Fallon
I love detox
detox -vr *
wow what an excellent tool.
Comment #3 posted on 2016-08-19T16:30:03Z by Dave Morriss
Thanks for mentioning 'ack'
Wow! I had never encountered 'ack' before. It's amazing.
I have written a bunch of Bash scripts to work with a PostgreSQL database (yes, I know, it's a bit like wearing a hair shirt; self mortification), and I found I could do things like:
ack --shell --pager=more psql .
There's no other easy way to do this that I know of.
Thanks very much for pointing this one out.
Comment #4 posted on 2016-08-21T14:53:50Z by ivor
I always love vim tips. So I got pulled in looking at the buffer search. Then I noticed the other tools mentioned. Most of them I know about and use all that are relevant to me very frequently. So now I'm going to subscribe...