Why use plain text?
- Portability
- Use with Unix tools
- Use with Ranger
Ranger for the win
- Ranger is a free console file manager that gives you greater flexibility and a good overview of your files without having to leave your *nix console. It visualizes the directory tree in two dimensions: the directory hierarchy along one axis and the lists of files along the other, with a preview to the right so you know where you’ll be going.
- The scope functionality is where converting to text pays off. Located at $HOME/.config/ranger/scope.sh, scope is the feature that allows for file preview from inside the console. Text files are highlighted based on their file extension; for non-text files, different converters can be used to coerce the file into a text representation. Some converters are available out of the box, but the configuration is written in such a way that any text can be presented in the preview screen.
- The basic format of the scope switch statement is as follows:
    case "$extension" in
        # OpenDocument text and presentation files
        odt|odp)
            try odt2txt "$path" && { dump | trim | fmt -s -w "$width"; exit 0; };;
    esac
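To see how a branch like this behaves outside Ranger, here is a minimal, self-contained sketch. It is not Ranger's actual scope.sh: the `try`, `dump` and `trim` functions are hypothetical stand-ins for the helpers the snippet calls, the `preview` wrapper, the `txt|md` branch and the values of `maxln` and `width` are assumptions made for illustration, and it returns instead of exiting so it can be sourced into a shell.

```shell
#!/usr/bin/env bash
# Sketch only: try, dump and trim are hypothetical stand-ins for the helpers
# that the scope snippet calls; Ranger's own definitions may differ.
maxln=20   # assumed maximum number of preview lines
width=60   # assumed preview width in columns

try()  { output=$("$@"); }            # run a command and capture its stdout
dump() { printf '%s\n' "$output"; }   # print whatever try captured
trim() { head -n "$maxln"; }          # cut the preview down to maxln lines

preview() {
    local path=$1
    local extension=${path##*.}       # text after the last dot
    case "$extension" in
        odt|odp)
            # Only succeeds if odt2txt is installed and can read the file.
            try odt2txt "$path" && { dump | trim | fmt -s -w "$width"; return 0; };;
        txt|md)
            # Plain text needs no converter, so cat stands in for one.
            try cat "$path" && { dump | trim | fmt -s -w "$width"; return 0; };;
    esac
    return 1                          # no branch produced a preview
}
```

Calling `preview notes.txt` prints the trimmed, wrapped text. When a converter is missing or fails, the `&&` short-circuits, the branch ends without output, and the function returns 1.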
- atool
- caca-utils
- poppler-utils
- catdoc
- catppt
- odt2txt
- ods2tsv
- docx2txt
- xlsx2csv
- mediainfo
- lynx/w3m/elinks
- highlight
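Before relying on previews, it can help to check which of these programs are actually installed. The following is a small sketch; `missing_tools` is a hypothetical helper name, not part of Ranger.

```shell
#!/usr/bin/env bash
# Print each named program that is not found on PATH, one per line.
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}
```

For example, `missing_tools atool odt2txt docx2txt xlsx2csv mediainfo highlight` lists whichever of those are absent. Some entries in the list above are package names rather than commands, so the binary to check for may differ (assumed here: `pdftotext` for poppler-utils and `img2txt` for caca-utils).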
Comments
Comment #1 posted on 2018-09-13T05:00:37Z by Ken Fallon
WOW
Those that I know I use literally every day. Can't wait to try the rest out.
Please do a deep dive series on each. No pressure.
Comment #2 posted on 2018-09-14T11:23:51Z by Beeza
Value of text conversion
I'm a big fan of plain text and CSV files, as they are probably the formats that will last conceptually forever - unlike the Office formats we use today (including ODS/ODT etc). You may lose the layout information but the "meat" is always preserved.
The PDF to Text converters only work with documents which have been generated from a WP application. Scans of a printed document generally only produce an embedded JPG image.
A few years ago I created a system that employed many of the commands you mention in your episode to convert a document into pure ASCII text, then create a non-repeating list of all the words it contains, along with an instance count (using SQL). By applying this to the contents of a document library the database was used to populate a "search by keyword" system for that library.
Populating the database from several hundred Word and PDF documents took only a couple of minutes. The subsequent keyword searches were very fast and produced a list of relevant documents ranked by the number of instances of the keyword. It was very easy to combine keywords using SQL "AND" and "OR" qualifiers.
Comment #3 posted on 2018-09-15T11:49:36Z by Jonas
Ranger, etc.
I'm a die hard vimmer and have never heard of Ranger. I'm looking forward to using it more. I asked a couple of my online Linuxey buddies and they used it years ago when they had less substantial machines. I still love the command line stuff even with my best machines. Everything is super quick in the terminal.
Thanks for the mention and your great shows.
I need to explore jq for sure. I work with a database that saves a couple columns in JSON. It would be nice to query the exports in a more friendly way.
Comment #4 posted on 2018-09-15T15:34:53Z by Dave Morriss
Great show
I installed Ranger after listening to your show 1756 (https://hackerpublicradio.org/eps.php?id=1756) but never used it and completely forgot about it. I was surprised to find it on my system and have been playing about with it a lot since listening to this show.
I'm a long-time text and command-line user but I tend to use Midnight Commander for the times I want to do a lot of file searching and manipulation, though I have to admit I use Dolphin sometimes in two-pane mode when I'm doing things like copying files off an SD card. I shall add Ranger to the mix too I think.
I agree with Ken: we need shows about all of the tools in your list!
Anyway, this was a very welcome episode. Thanks.
Comment #5 posted on 2018-09-20T03:07:44Z by clacke
Q
Never heard of Q before. Very cool! I will very likely find use for this.
Not a very googlable name, but I found it here: https://harelba.github.io/q/