Metadata in MP3, Opus/Ogg/FLAC/Speex, and other audio files.
Today's episode discusses (and encourages) the use of metadata tags in audio files.
Most of the episode is spent on id3v2.3 (metadata for mp3 files) and vorbiscomments (metadata for opus, ogg vorbis, flac, and speex files), and how to mix them, though metadata in webm/matroska, windows media, and wav files is briefly discussed as well.
This episode's files have also been crafted with substantially more metadata than the ID3v1 set of tags that HPR normally limits itself to, to serve as examples.
Listeners to the opus, ogg (vorbis), or speex versions will also have access to chapter markings if your playback software recognizes standard vorbiscomment chapter metadata. (No chapter markings in the mp3, as support for it is extremely sparse, and I've not yet even managed to find a tool for making mp3 chapters that actually works - the Java utility I mention in the episode crashes on me without starting...)
All metadata conforms to the published standards, so your playback software should at best fully use it all, or at worst simply ignore it. If your player software actually DOES have a real problem with this file, I would very much like to know!
If there's anything wrong with the metadata, blame Epicanis, not HPR (I did the metadata myself).
If you hear or see any errors in this episode, please tell me. I'll issue appropriate corrections in subsequent episodes. If I'm a big enough screwup with this one, I could even do a small episode on "everything I got wrong in my metadata episode". I don't THINK there should be more than a few minor errors or omissions here, though.
ERRATA: In chapter 18 (at 34:53) there is one small error: oggenc does NOT transfer attached pictures from flac input (though it DOES transfer all vorbiscomment metadata). FLAC stores attached pictures in a separate metadata structure, so oggenc misses them.
opusenc - at least in recent beta versions - DOES appear to transfer the attached pictures as well as the vorbiscomments, though. Another reason to upgrade to opus, I suppose...
Comment #1 posted on 2013-12-05 14:27:43 by Ace Frahm
At 17:51, you incorrectly state that each person heard on the track should have their own Artist tag. Artist is a singleton tag, according to the specification.
Each performer should have their own PERFORMER tag, and the ARTIST tag would simply summarize all PERFORMER and ENSEMBLE (that's a band name, or orchestra, or group, etc.) tags, for dumber software/hardware that can't easily show them separately.
Comment #2 posted on 2013-12-05 16:11:08 by Ace Frahm
From Hacker Public Radio 1393
Failure to include metadata tags is like sending e-Mail without any text on the subject line
The 2 Most Important Meta-tag Systems
ID3 Version 2.3
Kid3 3.0 is a post-encoding metadata editor available on many platforms. Has a command-line version too (kid3-cli).
PuddleTag on Linux for sure, Mac unofficially & possibly on Windows, supports multiple attached pictures & modern file formats, good for editing whole directories at once.
Linux command line tag editors
MPG123-ID3dump for .mp3 files including attached pictures, comes with MPG123 command line audio player
ID3TDD supports multiple pictures, but tags them all as 3-Front Cover
vorbiscomment, but you must generate the METADATA_BLOCK_PICTURE yourself. Package includes ogginfo tool which displays ogg vorbis metadata
opustools package encoder & decoder, opusinfo displays metadata but doesn't dump pictures
exiftool mostly used for photos, but can display metadata from pretty much every media file except for .opus
There are lots of media file formats, but the only one that uses ID3v1 or ID3v2.3 metadata tagging system is the .mp3 file format.
The LAME .mp3 encoder appears to accept only 1 attached picture on the command line.
.mp4 is an object-oriented file format, kind of like a special version of Quicktime format
Quicktime isn't a "filetype", it's a framework. But it gets used like a filetype.
The .mp4 specification includes an ID3 data object you could put an entire ID3 header into
.m4a is the audio version of a .mp4 filetype
You might see .m4a files with this ID3 data object populated by a valid ID3 object
.m4a files typically come from iTunes, though, and Apple uses an undocumented proprietary format for metadata, so you probably won't normally see the ID3 object on a .m4a file from Apple
There are 2 or 3 other undocumented metadata formats you might run into as well (I don't know what they are.)
ID3 Version 1 is an UNRELATED bad old format with SERIOUS LIMITATIONS.
All metadata is crammed into a specialized 128B data structure at the end of an .mp3 file.
Because the 128B sits at the end of the .mp3 file, crappy players that did not understand it would probably just interpret it as more sound and play it as a bit of noise; if your player crashed on the metadata, it would at least do so AFTER you got to hear the file
30B each of title, artist, album, comment
1B genre code number, which of course, limits the genres to 2^8 = 256 labels, that need to be looked up in a table to find the definition.
There are ~141 genre codes defined by ID3v1
None of them are "podcast"
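As a rough sketch of the ID3v1 layout described above, the 128 bytes can be unpacked in Python like this. The 3-byte "TAG" marker and the 4-byte year field are part of ID3v1 but are not listed in the notes above; the function name parse_id3v1 and the sample values are made up for illustration.

```python
import struct

# ID3v1 layout:
# "TAG" + title(30) + artist(30) + album(30) + year(4) + comment(30) + genre(1)
# which adds up to 128 bytes.
ID3V1 = struct.Struct("<3s30s30s30s4s30sB")


def parse_id3v1(data: bytes):
    """Return the ID3v1 fields found in the last 128 bytes of data, or None."""
    if len(data) < ID3V1.size:
        return None
    marker, title, artist, album, year, comment, genre = ID3V1.unpack(
        data[-ID3V1.size:]
    )
    if marker != b"TAG":
        return None

    def text(raw: bytes) -> str:
        # Fields are fixed width and padded with NULs (sometimes spaces).
        return raw.split(b"\x00", 1)[0].decode("latin-1").rstrip()

    return {
        "title": text(title),
        "artist": text(artist),
        "album": text(album),
        "year": text(year),
        "comment": text(comment),
        "genre": genre,  # an index into the genre table, not a name
    }


# Hypothetical example: some fake "audio" bytes followed by a hand-built tag.
# Genre 12 is "Other" in the standard table (there is no "podcast" entry).
fake_tag = ID3V1.pack(b"TAG", b"Metadata in audio files", b"Epicanis",
                      b"Hacker Public Radio", b"2013", b"", 12)
tag = parse_id3v1(b"\xff\xfb" * 100 + fake_tag)
print(tag["title"], "/", tag["artist"], "/ genre", tag["genre"])
```

Note how little room there is: anything longer than 30 bytes simply does not fit, which is the "SERIOUS LIMITATIONS" point in a nutshell.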
ID3 Version 2.3
Completely different than ID3v1
A whole bunch (~75?) of special data fields, each with their own special data structure at the beginning of the file
Each field has a special 4-character code to identify it, such as TCON for genre or TIT2 for title
The (~75?) special data fields use (~5-6) different KINDS of special data structures
Of these (~75?) special data fields, 39 fields use the text-class kind of special data structure
The text-based data fields have the same structure
Except for comments, which has its own structure
And except for the "involved persons list", which is a catch-all text field for stuffing in all the names & roles for everyone else whose role isn't defined in one of the other special fields.
When you stuff multiple entries into a text field, you separate them with a forward slash '/'
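To make the text-frame idea a bit more concrete, here is a minimal Python sketch of one ID3v2.3 text frame: a 10-byte frame header (4-character code, 32-bit big-endian size, 2 flag bytes) followed by an encoding byte and the text. It only handles ISO-8859-1 text, and the helper names are invented for this example. TPE1 (lead performer) is a real frame code but is not one of the codes named in the notes above.

```python
import struct

# ID3v2.3 frame header: 4-character ID, 32-bit big-endian body size, 2 flag bytes.
FRAME_HEADER = struct.Struct(">4sIH")


def build_text_frame(frame_id, values):
    """Build an ID3v2.3 text frame; multiple values are joined with '/'."""
    body = b"\x00" + "/".join(values).encode("latin-1")  # 0x00 = ISO-8859-1
    return FRAME_HEADER.pack(frame_id.encode("ascii"), len(body), 0) + body


def parse_text_frame(frame):
    """Return (frame_id, list_of_values) for one ISO-8859-1 text frame."""
    frame_id, size, _flags = FRAME_HEADER.unpack(frame[:FRAME_HEADER.size])
    body = frame[FRAME_HEADER.size:FRAME_HEADER.size + size]
    if body[0] != 0:
        raise ValueError("this sketch only handles ISO-8859-1 text")
    return frame_id.decode("ascii"), body[1:].decode("latin-1").split("/")


# Hypothetical values, purely for illustration.
frame = build_text_frame("TPE1", ["Epicanis", "Some Guest"])
print(parse_text_frame(frame))
```

The obvious weakness of the '/' separator is that a name which itself contains a slash gets split in two when read back.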
Aside from the text-based special data fields above, the only other frame anybody normally uses is the "attached picture" field.
Not just a copy of a .jpg file or whatever image format
Specifies a MIME type of picture data
Has a free-form text description of the picture data
You can have multiple "attached picture" fields
Except for 2 "file icon" attached picture types, one copy each only
Has a number code to indicate what the picture is supposed to be
Liner notes art
Record publisher logo
Image of the silk-screened CD art on the disc it came from
A brightly colored fish (the ogg format uses a cartoon fish for its logo, picture type 17)
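The "attached picture" (APIC) frame described above can be sketched the same way. This is only an illustration of the ID3v2.3 layout (encoding byte, NUL-terminated MIME type, picture type byte, NUL-terminated description, then the raw image), assuming ISO-8859-1 text throughout; the helper names and the fake image bytes are made up.

```python
import struct

# ID3v2.3 frame header: 4-character ID, 32-bit big-endian body size, 2 flag bytes.
FRAME_HEADER = struct.Struct(">4sIH")


def build_apic_frame(image, mime, picture_type, description):
    """Wrap raw image bytes in an ID3v2.3 APIC frame (ISO-8859-1 text only)."""
    body = (
        b"\x00"                                     # text encoding: ISO-8859-1
        + mime.encode("latin-1") + b"\x00"          # MIME type, NUL-terminated
        + bytes([picture_type])                     # e.g. 3 = front cover
        + description.encode("latin-1") + b"\x00"   # free-form description
        + image                                     # the picture file itself
    )
    return FRAME_HEADER.pack(b"APIC", len(body), 0) + body


def parse_apic_frame(frame):
    """Return (mime, picture_type, description, image_bytes)."""
    _frame_id, size, _flags = FRAME_HEADER.unpack(frame[:FRAME_HEADER.size])
    body = frame[FRAME_HEADER.size:FRAME_HEADER.size + size]
    mime_end = body.index(b"\x00", 1)
    mime = body[1:mime_end].decode("latin-1")
    picture_type = body[mime_end + 1]
    desc_end = body.index(b"\x00", mime_end + 2)
    description = body[mime_end + 2:desc_end].decode("latin-1")
    return mime, picture_type, description, body[desc_end + 1:]


# Not a real image, just placeholder bytes for the example.
image = b"\x89PNG\x00not a real image"
apic = build_apic_frame(image, "image/png", 17, "A brightly colored fish")
mime, picture_type, description, data = parse_apic_frame(apic)
print(mime, picture_type, description, len(data), "bytes")
```

So the frame is more than a copy of the image file: the MIME type, type code and description all travel with it.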
"Content Type" = genre
The genre field is text in ID3v2, not a number code like ID3v1
But the ID3v2 specification still suggests adding the ID3v1 genre code number to this field
Text field TXXX
You can have as many TXXX fields as you want, so long as the descriptions are different
A key name
A string value
Could be used to include vorbiscomments
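Here is a small Python sketch of how a TXXX frame could carry a vorbiscomment-style key and value. It assumes the ID3v2.3 layout (encoding byte, NUL-terminated description used as the key, then the value) and ISO-8859-1 text only; the function names and the PERFORMER example are just for illustration.

```python
import struct

# ID3v2.3 frame header: 4-character ID, 32-bit big-endian body size, 2 flag bytes.
FRAME_HEADER = struct.Struct(">4sIH")


def build_txxx_frame(key, value):
    """Build a TXXX frame whose description is the key name."""
    body = (
        b"\x00"                              # text encoding: ISO-8859-1
        + key.encode("latin-1") + b"\x00"    # description, i.e. the key name
        + value.encode("latin-1")            # the string value
    )
    return FRAME_HEADER.pack(b"TXXX", len(body), 0) + body


def parse_txxx_frame(frame):
    """Return (key, value) from a single ISO-8859-1 TXXX frame."""
    _frame_id, size, _flags = FRAME_HEADER.unpack(frame[:FRAME_HEADER.size])
    # Skip the 10-byte header and the 1-byte encoding marker.
    payload = frame[FRAME_HEADER.size + 1:FRAME_HEADER.size + size]
    key, _sep, value = payload.partition(b"\x00")
    return key.decode("latin-1"), value.decode("latin-1")


txxx = build_txxx_frame("PERFORMER", "Some Guest")
print(parse_txxx_frame(txxx))
```

Because each TXXX frame must have a different description, a repeated vorbiscomment key (several PERFORMER tags, say) would need some extra convention on top of this.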
ID3 Version 2.4
Don't bother using ID3v2.4
Not widely used
If Windows won't read your files' tags, maybe someone tagged them with the ID3v2.4 format instead of the ID3v2.3 format.
Mostly a few backwards-incompatible renamings of a few tags
A few obscure new tags
When you stuff multiple entries into a text field, you separate them with a NULL, instead of the forward slash '/' used by ID3v2.3
There was a 2005 method of stuffing another ID3 header into the first one to make chapter tags, but this was made 5 years after ID3v2.4, which isn't used much anyway, and only the BBC ever used it with their own player software, so you should never try to use this either. If you have to do some archaeology on an old BBC file, you might need to know this. Otherwise, just use vorbiscomments if you want to make an "enhanced podcast" with images that show up during playback like a slideshow, based on the chapter tags.
.mp3 format only uses ID3 format metadata tags
All the other file formats we care about use Vorbiscomments
All printable characters, must be text characters, no non-printable characters or control codes
You could search vorbiscomments with grep
Tag key names are case insensitive
You can create your own key names
All tags are OPTIONAL; you can have an ogg file with NO tags present and it will still be compliant
But there is a recommended standard for common metadata
Singleton tags should only appear once
If one of these tags appears more than once (a non-compliant mistake), its last appearance should be displayed if there is only room to display one instance of the tag.
Genre should be TEXT not a number
You might put comments in the DESCRIPTION field, or make your own "comment" tag, although "comment" isn't in the recommended standard. You could put the same data in both places, although you're duplicating the data.
ISRC tag="International Standard Recording Code", a special tracking code for commercial audio recordings
The chapter comments proposed tags are very similar to codes fed to .matroska tools to create tags for those files
Replay gain tags could be used/set by user's player software to select a relative playback volume for track adjustment, if you like.
Location supports geo-tagging the track, although what this means isn't clear. GEO_LOCATION
Is it where the track was recorded?
Is it the location referred to in the content?
Is it the location where the intended audience is?
Is it a tag that specifies where user's device should be when it automatically starts to play?
Is it a bunch of waypoints of recordings of chapters you took at different scenic locations in a travelogue?
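Going back to the basic vorbiscomment rules above (plain text, case-insensitive key names, tags that may repeat), here is a minimal Python sketch of the comment block itself. It assumes the layout from the Vorbis comment specification: a length-prefixed vendor string, a comment count, then length-prefixed UTF-8 "KEY=value" entries, all lengths being 32-bit little-endian. The function names and sample tags are invented, and the Ogg packet framing around the block is left out.

```python
import struct


def build_comment_block(vendor, comments):
    """Serialize a vendor string plus a list of (key, value) pairs."""
    vendor_raw = vendor.encode("utf-8")
    out = struct.pack("<I", len(vendor_raw)) + vendor_raw
    out += struct.pack("<I", len(comments))
    for key, value in comments:
        entry = f"{key}={value}".encode("utf-8")
        out += struct.pack("<I", len(entry)) + entry
    return out


def parse_comment_block(block):
    """Return (vendor, tags), where tags maps UPPERCASE keys to value lists."""
    pos = 0

    def read_string():
        nonlocal pos
        (length,) = struct.unpack_from("<I", block, pos)
        pos += 4
        raw = block[pos:pos + length]
        pos += length
        return raw.decode("utf-8")

    vendor = read_string()
    (count,) = struct.unpack_from("<I", block, pos)
    pos += 4
    tags = {}
    for _ in range(count):
        key, _sep, value = read_string().partition("=")
        # Key names are case-insensitive, so normalize them; keep every value.
        tags.setdefault(key.upper(), []).append(value)
    return vendor, tags


# Hypothetical tags, purely for illustration.
block = build_comment_block("example encoder", [
    ("title", "Metadata in audio files"),
    ("artist", "Epicanis"),
    ("ARTIST", "Some Guest"),
    ("GENRE", "Podcast"),
])
vendor, tags = parse_comment_block(block)
print(vendor, tags)
```

Since the entries are stored as ordinary text, this is also why grep can find them in the file.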
Attached pictures are a pain.
A visible picture is obviously not text-encoded (other than ASCII art). Not human readable.
Shouldn't be in the metadata anyway, should be an independent file inside the container
But ogg files don't support encapsulation of picture format files in the container
And .mp3 files have been including the binary encoded album art pictures for so long, it is standard practice in .mp3 format
Encoding a picture inside a vorbiscomments tag involves encoding it as printable text characters.
e-Mail programs have to do this kind of thing too, encoding pictures as text
5 or 10 years ago ( right now is 2013-12-05 ) people were doing this with an obsolete field called "COVERART" with the contents of the field being nothing more than the base64 encoded .jpg or .png file
Don't do this now, no one will ever see the cover art
Nobody ever implemented using this field
It was replaced long ago by an officially documented structure
METADATA_BLOCK_PICTURE is the correct vorbiscomments tag name for a picture.
A complete Base64 encoded data structure, includes
Picture Type number code ( similar to ID3 )
.flac uses vorbiscomments for its metadata
Except for attached pictures
Unlike .vorbis, .speex & .opus files, .flac files are not inside .ogg containers. .flac is its own container format.
.flac has its own attached picture block, very similar to .mp3 files
.flac also calls this tag "METADATA_BLOCK_PICTURE"
But it does not have the same format as the vorbiscomments METADATA_BLOCK_PICTURE tag!
.vorbis, .speex & .opus files
Don't have a special metadata block just for attached pictures
These build a .flac METADATA_BLOCK_PICTURE structure, then Base64-encode it into text that can be used as a valid vorbiscomment METADATA_BLOCK_PICTURE tag.
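A hedged Python sketch of that picture round trip: build the binary FLAC picture structure (all integers 32-bit big-endian), then Base64 it into a vorbiscomment line. The function names are made up, the image bytes are a placeholder, and the width, height, depth and color values must be supplied by the caller because this sketch does not inspect the image.

```python
import base64
import struct


def flac_picture_block(image, mime, picture_type, description="",
                       width=0, height=0, depth=0, colors=0):
    """Build the binary FLAC picture structure (without the FLAC block header)."""
    mime_raw = mime.encode("ascii")
    desc_raw = description.encode("utf-8")
    return (
        struct.pack(">I", picture_type)                  # same codes as ID3 APIC
        + struct.pack(">I", len(mime_raw)) + mime_raw
        + struct.pack(">I", len(desc_raw)) + desc_raw
        + struct.pack(">IIII", width, height, depth, colors)
        + struct.pack(">I", len(image)) + image
    )


def picture_comment(image, mime, picture_type, description="", **dimensions):
    """Return a 'METADATA_BLOCK_PICTURE=...' line for a vorbiscomment header."""
    block = flac_picture_block(image, mime, picture_type, description, **dimensions)
    return "METADATA_BLOCK_PICTURE=" + base64.b64encode(block).decode("ascii")


# Placeholder bytes standing in for a real cover image.
image = b"\xff\xd8 not a real jpeg \xff\xd9"
comment = picture_comment(image, "image/jpeg", 3, "Front cover",
                          width=300, height=300, depth=24)
print(comment[:60] + "...")
```

Base64 makes the data roughly a third larger, which is part of why attached pictures in vorbiscomments are "a pain".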
The .flac & .opus LINUX command line file encoders allow you to include as many attached pictures as you want as switches
The LINUX command line ogg vorbis encoder does not allow you to include multiple pictures
BUT, the ogg vorbis encoder does accept .flac files for input
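It will transfer the .flac file's metadata to the finished .vorbis file, supposedly including any extra pictures that were already in the .flac file's metadata (but per the ERRATA above, oggenc carries over the vorbiscomments and NOT the attached pictures)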
So if you make .flac files with complete metadata as the source to work from, you can generate .opus & .vorbis files without editing the metadata further
.mp4 is an Apple Computer Company format.
If you wanted to create an "enhanced podcast" that shows pictures at certain times specified by chapter markers, you'd have to use special iTunes tags with .mp4 files to make it work, normally only on Apple hardware, but Winamp can also read this format and shows the pictures on a Windows box.
No one else knows how to make them, as this is not documented well or supported on most other players.
You should just use Ogg Vorbis with vorbiscomments that have chapter markers instead.
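For what vorbiscomment chapter markers look like in practice, here is a small Python sketch that generates them. It assumes the proposed chapter extension format (CHAPTERxxx=hh:mm:ss.mmm plus CHAPTERxxxNAME) and the CHAPTERxxxURL tag mentioned in the comments further down; the function name, chapter names and URL are invented for the example.

```python
def chapter_comments(chapters):
    """Turn (start_seconds, name, url_or_None) tuples into vorbiscomment lines."""
    lines = []
    for number, (start, name, url) in enumerate(chapters, start=1):
        total_ms = round(start * 1000)
        hours, rest = divmod(total_ms, 3_600_000)
        minutes, rest = divmod(rest, 60_000)
        seconds, millis = divmod(rest, 1000)
        prefix = f"CHAPTER{number:03d}"
        lines.append(f"{prefix}={hours:02d}:{minutes:02d}:{seconds:02d}.{millis:03d}")
        lines.append(f"{prefix}NAME={name}")
        if url:
            lines.append(f"{prefix}URL={url}")
    return lines


# Hypothetical chapter list; 2093 seconds is 34:53.
for line in chapter_comments([
    (0, "Introduction", None),
    (2093, "Encoders", "https://example.org/encoders"),
]):
    print(line)
```

Lines like these could then be written into a file's tags with whatever vorbiscomment editor you prefer.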
The lowest common denominator for audio files
Usually lossless PCM audio
Simple in structure, widely supported
.wav files support metadata, but they are badly documented
Audacity can include a limited set of tags in a .wav both as an "info chunk", whatever that is, AND as an ID3 tag
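As for what an "info chunk" is: a .wav file is a RIFF file, which is a series of chunks (4-character ID, 32-bit little-endian size, data padded to an even length), and the metadata lives in a LIST chunk of type INFO holding small sub-chunks such as INAM (title) and IART (artist). A minimal Python sketch, with made-up function names and a hand-built fake file rather than a real recording:

```python
import struct


def riff_info_tags(data):
    """Collect the LIST/INFO sub-chunks of a RIFF/WAVE file into a dict."""
    if data[:4] != b"RIFF" or data[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    tags = {}
    pos = 12
    while pos + 8 <= len(data):
        chunk_id, size = struct.unpack_from("<4sI", data, pos)
        body = data[pos + 8:pos + 8 + size]
        if chunk_id == b"LIST" and body[:4] == b"INFO":
            sub = 4
            while sub + 8 <= len(body):
                key, length = struct.unpack_from("<4sI", body, sub)
                raw = body[sub + 8:sub + 8 + length]
                tags[key.decode("ascii")] = raw.rstrip(b"\x00").decode("latin-1")
                sub += 8 + length + (length & 1)  # chunks are padded to even size
        pos += 8 + size + (size & 1)
    return tags


def chunk(chunk_id, body):
    """Helper for the example: serialize one RIFF chunk with even padding."""
    pad = b"\x00" if len(body) & 1 else b""
    return chunk_id + struct.pack("<I", len(body)) + body + pad


# Build a tiny fake .wav: empty format and audio data, plus an INFO list.
info = b"INFO" + chunk(b"INAM", b"Metadata episode\x00") + chunk(b"IART", b"Epicanis\x00")
wav_body = b"WAVE" + chunk(b"fmt ", bytes(16)) + chunk(b"data", bytes(8)) + chunk(b"LIST", info)
wav = b"RIFF" + struct.pack("<I", len(wav_body)) + wav_body
print(riff_info_tags(wav))
```

The set of INFO keys is small and fixed-looking, which fits the "limited set of tags" that Audacity offers for .wav export.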
.webm is a special file format version of .matroska
.matroska metadata is even worse than ID3
Uses ~100 rigidly defined tag names
.webm uses ~70 of those .matroska tags
The tags are heavily video-related, seems to presume the .matroska files will only contain movies
Supposed to be object-oriented
Burying some tags inside other tags, such as a "character" tag inside an "actor" tag
"thanks to" tag is a catch-all for stuff that couldn't go anywhere else
This metadata is tacked onto the end of the file so in theory you don't ever need to re-encode the video file if you need to change the metadata
Streaming media won't get the metadata until the entire file is played, unless the whole file is being buffered to the end before playback
.webm doesn't support attached pictures at all
.matroska has limited support for attached pictures
Allows a large and a small version of a "banner graphic"
Allows a large and a small version of an "album art graphic"
.webm audio files only exist as afterthought to video
You could make one with GNU MediaGoblin
Useful only as a "test"
.asf or .wma audio files are bad, obscure, Windows media file formats
All of these Windows media files are really just .asf format, similar to the way .m4a & .m4v are really just .mp4 format.
Metadata is limited
5 different metadata "objects"
Can contain different kinds of metadata
"content description object", a very small set of pre-defined metadata fields, 64kB each
"content branding" object
Limited to a single banner image
URL for copyright warning stored online
"extended content description" object
Random other metadata
Seems to be an "extended metadata content" object that can refer to just one file inside the .asf container, rather than to all of them at once
"metadata library object"
No browsers automatically display audio or video metadata by default, built-in. The web designer must write code to include this on the page.
Comment #3 posted on 2013-12-06 19:04:43 by Epicanis
Ace Frahm says:
"At 17:51, you incorrectly state that each person heard on the track should have their own Artist tag. Artist is a singleton tag, according to the specification."
I dispute this - if you go directly to the actual specification at https://xiph.org/vorbis/doc/v-comment.html, you will find this:
"Field names are not required to be unique (occur once) within a comment header. As an example, assume a track was recorded by three well know artists; the following is permissible, and encouraged:
ARTIST=Dizzy Gillespie
ARTIST=Sonny Rollins
ARTIST=Sonny Stitt"
Worth a mention as a "point of contention" in a followup though - I'm less inclined to give broken old software that doesn't correctly support the specification a pass, but it is true that an awful lot of software (including VLC) is still stuck in "one value per tag" mode.
Comment #4 posted on 2013-12-06 20:22:10 by Epicanis
Good summary/questions, Ace Frahm!
Seems like you have enough questions there for a short followup all by yourself!
Just to hit a couple of random ones here (I'll probably try to follow up in audio in the "opus codec" episode once I've gotten caught up to that):
By "3 or 4 others" (metadata formats) besides id3 and vorbiscomments I was referring to RIFF INFO chunks (.wav metadata), webm/matroska tags, .wma microsoft-screwy-thing, and the undocumented special iTunes thing.
I did count right around 75 individual id3v2.3 field (/frame/tag) names, though I didn't go back to confirm the exact number.
Minor point - I need to enunciate better I think - it's "id3ted" rather than "id3tdd".
Good question about the geolocation - it's "any of those that are relevant". For photographs, the geolocation is always assumed to be "where the photographer was standing when the picture was taken" and there aren't too many cases where any other interpretation makes much sense. (In a telephoto of an obvious landmark it might make sense to geotag with the location of the landmark instead, or for an image of a map, it might make sense to geotag the center of where the map represents). As far as I know, photo geotagging only supports a single geolocation per image as well [not necessarily counting geoTIFF], so options are limited.
With vorbiscomments explicitly designed to support multiples of all tags, the way I think of it is you put in a geo_location tag for any location that is relevant to the recording: imagine that someone wants to generate openstreetmap (or Google Maps or whatever) pages with markers that go with the audio files. The way I figure it, the geo_location tags should provide the locations of all of the markers that the hypothetical link-maker would want to show. (It's probably worth proposing a "description" addition to the geo_location tag now that you mention this, though: something like "geo_location;35.1592;-98.4422;;Nowhere, OK used as example location")
Also, thanks for teaching ME something - I usually tend to think of playback as something that doesn't need an internet connection, so I feel stupid for only thinking of web links to relevant pages (what I used the "chapter###url" tags for, e.g. the id3v2.3 chapter's URL should have been a link to the specification online) and somehow completely missed using it to pull a slideshow from the internet while playing. Now I have to try that... (For me, chapter marks are more about having convenient "jump to:" points in the audio.) It's worth noting that doing "fetch pictures from the internet" like that also makes it a way to put "web bugs" in audio files...
(Finally: I should say I don't necessarily disagree with your contention of how one SHOULD use the "artist" tag...well, actually I do but less strongly and not because that's not what the specification says. The cases where it matters seem like they wouldn't come up TOO often in practice. The examples that come to mind are mostly things like "Daryl Hall and John Oates" and "Paul Simon and Art Garfunkel" and "William Gilbert and Arthur Sullivan", all of whom are so well known as duos that "Hall and Oates", "Simon and Garfunkel", and "Gilbert and Sullivan" are practically all single-word names (kind of like that "colladoody" video game people were going on about...) and make sense as a "singlet". On the other hand, there's "Ebony and Ivory" (Stevie Wonder and Paul McCartney?), where I think it would definitely be more appropriate to follow the specification and give each "artist" their own tag. I just think it's more consistent to use all of the tags the same as per specification, and more people using the tags correctly will encourage playback software to support it correctly.)
Comment #5 posted on 2013-12-08 10:56:39 by Ken Fallon
Metadata should not be included in the file
Very informative show but I would argue that only metadata necessary for the playout of the file should be included in the file itself. Everything else should be included in a separate file that itself is dedicated to holding the metadata. That file might include locations where other items might be found, eg, a url to the poster location, or a link to the wikipedia article.
The main reason for this would be to keep the playout device as simple, and therefore as cheap, as possible. Your argument that there is enough space in a media file to hold all the metadata breaks down once you start adding metadata in multiple languages, or extending it to add reviews, sleeve notes, wikipedia articles etc. At that point the text begins to get very significant indeed and runs the risk of being outdated very quickly.
Fields like Title/Artist/Album/Track Number/Length should be included, as they are (usually, not always) the same in every language, and give playout devices enough information to display something useful, while another, more complex, independent process gathers the richer metadata from all over the web.
In an ideal world a single additional field with a global unique identifier would be enough to identify the work to the player, and allow it to go out and locate the metadata file, which in turn would link to other sources of information.
Comment #6 posted on 2013-12-08 18:56:22 by Epicanis
Admins Against Tagging...
Egad, Ken_Fallon, I have to say I strongly and FUNDAMENTALLY disagree there. I guess I'll HAVE to record a followup now...
Wait. That was your plan all along, wasn't it? You sly devil!
Your argument essentially goes to the point of opposing local storage of files or information at all. What you're describing sounds like the user just has a big literally-meaningless (locally) number, and his/her/its media player is meant to send that big meaningless number to one or more places scattered on the general internet to beg for relevant "content". (Your description doesn't take it that far, but once you're down to nothing but an audio and/or video stream and have gotten rid of everything else meaningful, and have mandated an internet connection to get it, why even keep the media itself locally?)
I'll save the rest for a followup episode, so you win this round. :-)
Comment #7 posted on 2013-12-09 12:09:43 by Ken Fallon
I'm not an admin
First, these are my own opinions. Second, you completely misinterpreted my arguments, so first allow *me* to record a show stating my case and you can then record another arguing your point.
That way we get 3 shows !!!
Comment #8 posted on 2013-12-09 14:59:14 by Epicanis
"Charismatic Cult Leader"?
"El Presidente"? "Colonel"?...
(I actually have no idea what the [dis]organizational chart for HPR looks like, but I was equating your level of access with some form of administrative-level authority, even if you use it only sparingly and judiciously...)
"first allow *me* to record a show stating my case and you can then record another arguing your point.
That way we get 3 shows !!!"
Ah, HA! I knew it! Will your nefarious schemes never end?
Sounds like a plan!
(I should clarify that my leap to "against tagging" is largely based on an argument that any piece of information that is not actually ATTACHED to a file doesn't really count as "tagging", and, yes, I am intentionally engaging in a sort of "inflatio ad absurdum" there - I want more shows on this topic, too!)
Comment #9 posted on 2013-12-19 13:01:28 by pokey
Well, I liked it.