In the show "hpr1943 :: HPR AudioBook Club 11.5 - Interview with David Collins-Rivera" pokey asked if there was a way to get the duration of media files. The following three options spring to mind immediately.

fix_tags

The first option is fix_tags and was written by our own Dave Morriss. I did a blog post on how to install it some time ago. Now while it is intended to manipulate the metadata in media files, I asked Dave to include the length in seconds as well.

$ fix_tags intro.flac 
intro.flac
album     : Hacker Public Radio
artist    : YOUR NAME HERE
comment   : 
genre     : Podcast
length    : 00:00:39 (39 sec)
title     : YOUR SHOW NAME HERE
track     : 0
year      : 0

The next command filters for the word length and prints the second field, using the parentheses as delimiters. Finally, sed removes the surplus " sec". The "\" at the end of each line is a line continuation: it is only there to make the commands readable, and bash joins the lines back into a single command.

$ fix_tags *flac | \
awk -F '\\(|\\)' '/length/ {print $2}' | \
sed 's/ sec//g'
39
60
17
11
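
To see what the awk step does on its own, here is a minimal sketch that runs it against a sample line copied from the fix_tags output above, so it works without fix_tags installed. The function name extract_seconds is made up for illustration, and the bracket expression [()] is just another way of writing the same two delimiters. It also folds the sed step into awk.

```shell
# Hypothetical helper: read fix_tags-style output on stdin and print
# the number inside the parentheses on each "length" line.
extract_seconds() {
    # Split fields on "(" or ")"; field 2 is then "39 sec".
    awk -F '[()]' '/length/ { sub(/ sec$/, "", $2); print $2 }'
}

# Sample line copied from the fix_tags output shown earlier.
printf '%s\n' 'length    : 00:00:39 (39 sec)' | extract_seconds
```

For the sample line this prints 39, the same number the awk and sed pair produces.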

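I'm going to use bc to add them up, so I need them on a single line rather than each on its own line. The easiest way to do that was to wrap them in a $() command substitution (HPR episode 1903) and echo the result. Because the substitution is unquoted, bash splits its output into words and echo prints them separated by single spaces.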

$ echo $(fix_tags *flac |  \
awk -F '\\(|\\)' '/length/ {print $2}' | \
sed 's/ sec//g' )
39 60 17 11

Now I can use sed to replace each space with a + sign.

$ echo $(fix_tags *flac | \
awk -F '\\(|\\)' '/length/ {print $2}' | \
sed 's/ sec//g' ) | \
sed 's/ /+/g'
39+60+17+11
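
As an aside, the echo and sed steps can be combined into one. This is a minimal sketch, assuming a POSIX paste is available (it is part of GNU coreutils). The function name join_with_plus is made up, and the numbers are the sample values from above.

```shell
# Hypothetical helper: join all lines on stdin into one "+"-separated line.
join_with_plus() {
    # -s joins all input lines into one; -d+ uses "+" as the separator.
    paste -s -d+ -
}

printf '%s\n' 39 60 17 11 | join_with_plus
```

This prints 39+60+17+11, ready to be handed to bc.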

Now pipe that to bc and we find the total number of seconds.

$ echo $(fix_tags *flac | \
awk -F '\\(|\\)' '/length/ {print $2}' | \
sed 's/ sec//g' ) | \
sed 's/ /+/g' | bc
127
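
If bc is not installed, awk can add the column up directly, without building the expression first. This is a sketch of an alternative rather than what the commands above do, and sum_lines is a hypothetical name.

```shell
# Hypothetical helper: add up the first field of every line on stdin.
sum_lines() {
    # "total + 0" makes sure a number is printed even for empty input.
    awk '{ total += $1 } END { print total + 0 }'
}

printf '%s\n' 39 60 17 11 | sum_lines
```

With the four sample values this prints 127.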

Now that I have seconds, I can hand them to date. The -d argument displays the time described by a string rather than 'now', and a leading @ tells date to treat that string as seconds since the epoch (1970-01-01 UTC). I also use -u to print Coordinated Universal Time (UTC); otherwise the result would be off by your local time zone offset. Finally, the formatting option %T prints the time in the ISO 8601 style %H:%M:%S.

The reason I put the \ in front of the date command is that I alias date to my liking, and the \ means please use the unaliased date.

$ \date -ud @$(echo $(fix_tags *flac | \
awk -F '\\(|\\)' '/length/ {print $2}' | \
sed 's/ sec//g' ) | \
sed 's/ /+/g' | bc )  +"%T"
00:02:07

When I run this on the current selection of podcasts on my Sansa, we get:

$ time \
\date --utc --date="@$(echo $(fix_tags *mp3 *ogg 2>/dev/null | \
awk -F '\\(|\\)' '/length/ {print $2}' | \
sed 's/ sec//g' ) | \
sed 's/ /+/g' | bc )"  +"%T"
03:09:49

real    0m2.953s
user    0m1.079s
sys     0m1.936s
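
One thing to keep in mind with the date approach is that %T prints a time of day, so a total of 24 hours or more wraps around past 00:00:00. Below is a minimal sketch of doing the conversion with shell arithmetic instead, which keeps counting hours. seconds_to_hms is a hypothetical helper, not part of any of the tools above.

```shell
# Hypothetical helper: format a whole number of seconds as HH:MM:SS.
seconds_to_hms() {
    # $1 is the number of seconds; hours are not capped at 23.
    printf '%02d:%02d:%02d\n' \
        "$(( $1 / 3600 ))" "$(( $1 % 3600 / 60 ))" "$(( $1 % 60 ))"
}

seconds_to_hms 127      # 00:02:07
seconds_to_hms 90000    # 25:00:00
```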

mediainfo

Next up is mediainfo which provides a lot of information on media files.

$ mediainfo intro.flac 
General
Complete name                            : intro.flac
Format                                   : FLAC
Format/Info                              : Free Lossless Audio Codec
File size                                : 3.26 MiB
Duration                                 : 39s 50ms
Overall bit rate mode                    : Variable
Overall bit rate                         : 701 Kbps
Album                                    : Hacker Public Radio
Track name                               : YOUR SHOW NAME HERE
Performer                                : YOUR NAME HERE
Genre                                    : Podcast
Comment                                  : http://hackerpublicradio.org

Audio
Format                                   : FLAC
Format/Info                              : Free Lossless Audio Codec
Duration                                 : 39s 50ms
Bit rate mode                            : Variable
Bit rate                                 : 701 Kbps
Channel(s)                               : 1 channel
Sampling rate                            : 44.1 KHz
Bit depth                                : 24 bits
Stream size                              : 3.26 MiB (100%)
Writing library                          : libFLAC 1.3.0 (UTC 2013-05-26)

There are a few issues with the standard Duration field. First, the fact that it is formatted for humans makes it difficult to do calculations on. Second, it is listed under the same field name in both the General and Audio sections.

Adding the --full argument gives a lot more information, so I will filter the output to the Duration field.

$ mediainfo --full intro.flac | grep Duration
Duration                                 : 39050
Duration                                 : 39s 50ms
Duration                                 : 39s 50ms
Duration                                 : 39s 50ms
Duration                                 : 00:00:39.050
Duration                                 : 39050
Duration                                 : 39s 50ms
Duration                                 : 39s 50ms
Duration                                 : 39s 50ms
Duration                                 : 00:00:39.050

Although the duration is now available in a variety of formats, the output still isn't very useful: the value is either in ISO 8601 time format (HH:MM:SS), split into components, or in milliseconds, and we still have two sections. We could just sort and print the unique values, but then there is the possibility of collapsing two or more different files that have exactly the same duration.

It's a lot easier to deal with plain seconds. To get around these issues we can ask mediainfo to export XML, then use xmlstarlet to select only the Audio track and, within it, only the first Duration element, which holds the value in milliseconds. We then use sed to trim the last three characters from each line, which truncates the milliseconds to whole seconds.

$ mediainfo --full --Output=XML *.flac | \
xmlstarlet sel -T -t -m "Mediainfo/File/track[@type='Audio']/Duration[1]" -v "." -n - | \
sed 's/.\{3\}$//'
39
60
17
11
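
The sed trick relies on every value having at least four digits. A three-digit value such as 500 would be deleted entirely, and anything shorter would be left untouched and read as seconds. A hedged alternative is to divide by 1000, sketched here on sample numbers so it runs without mediainfo. ms_to_seconds is a hypothetical name.

```shell
# Hypothetical helper: convert milliseconds on stdin to whole seconds.
ms_to_seconds() {
    # int() truncates, matching what the sed command does for
    # values of 1000 ms and up.
    awk '{ print int($1 / 1000) }'
}

printf '%s\n' 39050 60000 500 | ms_to_seconds
```

This prints 39, 60 and 0 on separate lines.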

Now we do the same tricks as before to get it on one line, replace the spaces with +, and add it up with bc.

$ echo $(mediainfo --full --Output=XML *.flac | \
xmlstarlet sel -T -t -m "Mediainfo/File/track[@type='Audio']/Duration[1]" -v "." -n - | \
sed 's/.\{3\}$//') | \
sed 's/ /+/g' | bc
127

And again we can convert it to a readable time format.

$ \date -ud @$(echo $(mediainfo --full --Output=XML *.flac | \
 xmlstarlet sel -T -t -m "Mediainfo/File/track[@type='Audio']/Duration[1]" -v "." -n - | \
 sed 's/.\{3\}$//') | \
 sed 's/ /+/g' | bc)  +"%T"
00:02:07

Now to check the current podcast list again.

$ time \date -ud @$(echo $(mediainfo --full --Output=XML *mp3 *ogg | \
xmlstarlet sel -T -t -m "Mediainfo/File/track[@type='Audio']/Duration[1]" -v "." -n - | \
sed 's/.\{3\}$//') | \
sed 's/ /+/g' | bc)  +"%T"
03:09:49

real    0m0.623s
user    0m0.520s
sys     0m0.131s

ffprobe

The last option is to use ffprobe from the ffmpeg team. It prints information about the file specified by the -i argument to STDERR, so we need to redirect that to STDOUT before we can filter on only the Duration.

$ ffprobe -i intro.flac 2>&1 | grep Duration
  Duration: 00:00:39.05, start: 0.000000, bitrate: 700 kb/s

Here we get an ISO 8601 time, which makes it more difficult to process. We can convert it easily enough with the date command, but you need to precede it with the day the epoch started, or date will assume you mean today and return all the seconds from 1970-01-01 until now plus the duration you are asking about.

$ \date -ud 1970-01-01T00:00:39.05 +%s
39
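
The same conversion can be done with plain arithmetic, which avoids the epoch trick altogether. This is a minimal sketch, assuming the input always has the HH:MM:SS.ss shape shown above. hms_to_seconds is a hypothetical helper.

```shell
# Hypothetical helper: convert HH:MM:SS(.ss) to whole seconds.
hms_to_seconds() {
    # Split on ":" and truncate the fractional part, as date +%s does.
    printf '%s\n' "$1" | awk -F: '{ print int($1 * 3600 + $2 * 60 + $3) }'
}

hms_to_seconds 00:00:39.05   # 39
hms_to_seconds 03:09:49.00   # 11389
```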

We first need to extract the duration and then remove the trailing "," before date can convert it to seconds.

$  \date -ud 1970-01-01T$(ffprobe -i intro.flac 2>&1 | \
grep Duration | \
awk '{print $2}'| \
sed 's/,//g' ) +%s
39

That's fine for single files but pretty useless for multiple ones. In fact we need to create a loop and process each one individually.

$ for i in *flac;do \date -ud 1970-01-01T$(ffprobe -i "$i" 2>&1 | \
grep Duration | \
awk '{print $2}'| \
sed 's/,//g' ) +%s;done
39
60
17
11

Now again we enclose that and echo it to the bc command.

$ echo $(for i in *flac;do  \date -ud 1970-01-01T$(ffprobe -i "$i" 2>&1 | \
grep Duration | awk '{print $2}'| \
sed 's/,//g' ) +%s;done) | \
sed 's/ /+/g' | bc
127

And from there we can again get the human readable iso8601 time format.

$ \date -ud @$(echo $(for i in *flac;do  \date -ud 1970-01-01T$(ffprobe -i "$i" 2>&1 | \
grep Duration | awk '{print $2}'| \
sed 's/,//g' ) +%s;done) | \
sed 's/ /+/g' | bc)  +"%T"
00:02:07

So let's see how that compares.

$ time \date -ud @$(echo $(for i in *mp3 *ogg;\
do  \date -ud 1970-01-01T$(ffprobe -i "$i" 2>&1 | \
grep Duration | awk '{print $2}'| \
sed 's/,//g' ) +%s;done) | \
sed 's/ /+/g' | bc)  +"%T"
03:09:49

real    0m4.187s
user    0m1.210s
sys     0m4.252s
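
ffprobe can also print just the duration in seconds, which removes the need for grep, awk, sed and the epoch conversion. The sketch below assumes an ffprobe build that supports -show_entries and the default writer options used here, so treat it as a sketch under those assumptions rather than a tested replacement. probe_seconds and total_seconds are hypothetical helper names. The file name is quoted so that names containing spaces work, and the adding up is done in the loop with shell arithmetic.

```shell
# Print the duration of one file in seconds, e.g. 39.050000.
# Assumes ffprobe supports -show_entries and the "default" writer.
probe_seconds() {
    ffprobe -v error -show_entries format=duration \
        -of default=noprint_wrappers=1:nokey=1 "$1"
}

# Add up the whole seconds of every file given as an argument.
total_seconds() {
    total=0
    for f in "$@"; do
        s=$(probe_seconds "$f" 2>/dev/null)
        s=${s%.*}                      # drop the fractional part
        case $s in
            ''|*[!0-9]*) continue ;;   # skip anything that is not a number
        esac
        total=$(( total + s ))
    done
    echo "$total"
}

# Usage: total_seconds *.flac
```

The result is a plain number of seconds, which can be passed to date -ud @... +%T exactly as before.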

Conclusions

Use whatever approach you like or have installed. I am sure that the steps I took here could be improved, and some might even argue that I wrote them deliberately poorly so community members would get so annoyed that they would send in their own shows. Of course I would never do such a thing.

Links