The short history of tagging

The past

Once upon a time, there were some giant companies that, with the failure of the 4-channel battle fresh in mind, formed an expert group with the mission to invent tomorrow's technology in sound compression. Fortunately, they did. The format, named MPEG Layer 3 or MP3 for short, took advantage of the fact that our ears are not nearly as good as we generally believe them to be, and thus omitted frequencies that we wouldn't hear anyway. They also made the format suitable for streaming by letting the sound be represented in small, individually compressed blocks of audio data. Each block had a header containing some information relevant to the decoding process. As they ended up with a few bits too many, they used them for some additional information such as a 'copyright' bit and a 'private' bit.

Since MP3 offered such outstanding compression and still very good sound quality, it was soon adopted as the de facto standard for digital music. The lack of any way to include textual information in the files was, however, painfully obvious. Suddenly, someone (Eric Kemp alias NamkraD) had a vision of a fixed-size 128-byte tag that would reside at the end of the audio file. It would include title, artist, album, year, genre and a comment field. Someone, possibly the very same someone, implemented this and everyone was happy. Soon afterwards, Michael Mutschler, the author of MP3ext, extended this tag, called ID3, to also include which track on the CD the music originated from. He used the last two bytes of the comment field for this and named his variant ID3 v1.1.

The present

The ID3 v1.1 tag still had some obvious limitations and drawbacks, though. It supported only a few fields of information, and those were limited to 30 characters, making it impossible to correctly describe "The Hitchhiker's Guide to the Galaxy from BBC Radio" as well as "P.I. Tchaikovsky's Nutcracker Suite Op. 71 a, Ouverture miniature danses caractéristiques by The New Philharmonic Orchestra, London, conducted by Laurence Siegel". Since the ID3 v1.1 tag sits at the end of the audio file, it is also the last thing to arrive when the file is being streamed. The fixed size of 128 bytes also makes it impossible to extend further. That's why Martin Nilsson and several others thought that a new ID3 tag would be appropriate.

The new ID3 tag is named ID3v2 and is currently in a state of 'informal standard'. That is, since fewer and fewer improvements and additions were being made, Martin Nilsson decided to proclaim the draft a standard (an informal one, since no standardization body has approved this decision). ID3v2 is often followed by its revision number; for example, the current informal standard is ID3v2.4.0.

What is ID3(v1)?

The audio formats MPEG Layer I, Layer II and Layer III (MP3) have no native way of saving information about the contents, except for some simple yes/no parameters like "private", "copyrighted" and "original home" (meaning this is the original file and not a copy). A solution to this problem was introduced with the program "Studio3" by Eric Kemp alias NamkraD in 1996. By adding a small chunk of extra data at the end of the file, one could get the MP3 file to carry information about the audio and not just the audio itself.

The placement of the tag, as the data was called, was probably chosen because there was little chance that it would disturb decoders. To make the tag easy to detect, a fixed size of 128 bytes was chosen. The tag has the following layout (as hinted by the scheme to the right):

Song title    30 characters
Artist        30 characters
Album         30 characters
Year          4 characters
Comment       30 characters
Genre         1 byte

If you sum the sizes of all these fields, you see that 30+30+30+4+30+1 equals 125 bytes, not 128 bytes. The missing three bytes can be found at the very beginning of the tag, before the song title. These three bytes are always "TAG" and identify the block as an ID3 tag. The easiest way to find an ID3v1/1.1 tag is to look for the word "TAG" 128 bytes from the end of a file.
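The layout above maps directly onto a fixed-size record. The following is a minimal Python sketch, not a reference implementation, of locating and decoding an ID3v1 tag in the last 128 bytes of a file's contents. The function name `parse_id3v1`, the dictionary keys and the use of Latin-1 for decoding are assumptions made for illustration; the tag itself does not declare a character encoding.

```python
import struct

# "TAG" + title(30) + artist(30) + album(30) + year(4) + comment(30) + genre(1)
ID3V1_LAYOUT = struct.Struct("<3s30s30s30s4s30sB")
assert ID3V1_LAYOUT.size == 128


def parse_id3v1(data: bytes):
    """Return the ID3v1 fields found at the end of `data`, or None if absent."""
    if len(data) < ID3V1_LAYOUT.size:
        return None
    marker, title, artist, album, year, comment, genre = ID3V1_LAYOUT.unpack(
        data[-ID3V1_LAYOUT.size:]
    )
    if marker != b"TAG":
        return None

    def text(field: bytes) -> str:
        # Fields are zero-padded, so stop at the first zero byte.
        # Decoding as Latin-1 is an assumption of this sketch.
        return field.split(b"\x00", 1)[0].decode("latin-1")

    return {
        "title": text(title),
        "artist": text(artist),
        "album": text(album),
        "year": text(year),
        "comment": text(comment),
        "genre": genre,  # index into the predefined genre list
    }
```

With a real file, one would read its contents (or just seek to 128 bytes before the end) and pass the bytes to `parse_id3v1`; a return value of `None` means no "TAG" marker was found at that position.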

Since not all artists have a 30-character name, any bytes left over after the information is entered in a field should be filled with the binary value 0. You might also think that you cannot write much in the genre field, it being only one byte in size, but it is cleverer than that. The byte value you enter in the genre field corresponds to an entry in a predefined list. The list that Eric Kemp created had 80 entries, ranging from 0 to 79.
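The padding rule and the one-byte genre index can be sketched from the writing side as well. The helper names `pad_field` and `build_id3v1` are hypothetical, silently truncating over-long text is a choice made for this sketch, and the three genre names shown are an excerpt from the commonly published list rather than something stated in this text.

```python
# Excerpt only; the original list runs from 0 to 79.
# These names come from the commonly published list (an assumption here).
GENRE_EXCERPT = {0: "Blues", 1: "Classic Rock", 2: "Country"}


def pad_field(text: str, size: int) -> bytes:
    """Encode `text`, cut it to `size` bytes and fill the rest with zero bytes."""
    raw = text.encode("latin-1")[:size]  # Latin-1 and truncation are assumptions
    return raw.ljust(size, b"\x00")


def build_id3v1(title: str, artist: str, album: str, year: str,
                comment: str, genre: int) -> bytes:
    """Assemble a 128-byte ID3v1 tag from its fields."""
    if not 0 <= genre <= 255:
        raise ValueError("genre must fit in one byte")
    return (
        b"TAG"
        + pad_field(title, 30)
        + pad_field(artist, 30)
        + pad_field(album, 30)
        + pad_field(year, 4)
        + pad_field(comment, 30)
        + bytes([genre])  # index into the predefined genre list
    )


tag = build_id3v1("Hello", "Someone", "Somewhere", "1996", "", 0)
assert len(tag) == 128
```

Appending the result to the end of an audio file yields a file that an ID3v1 reader can pick up by checking the last 128 bytes.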

What is ID3v1.1?

ID3v1 may well be easy to implement for programmers, but it sure is frustrating for those with their own, creative ideas. Since the ID3v1 tag had a fixed size and no space marked "Reserved for future use", there isn't really room for that much improvement, if you want to maintain compatibility with existing software.

One who found a way out was Michael Mutschler, who made a quite clever improvement on ID3v1. Since all non-filled fields must be padded with zeroed bytes, it's a good assumption that all ID3v1 readers will stop reading a field when they encounter a zeroed byte. If the second-to-last byte of a field is zeroed and the last one isn't, we have an extra byte to fill with information. As the comment field is too short to write anything useful in anyway, the ID3v1.1 standard declares that this field should be 28 characters, that the next byte should always be zero and that the last byte before the genre byte should contain which track on the CD this music comes from.
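The rule described above can be expressed as a small check on the 30-byte comment field. This is a sketch under the assumptions already used for ID3v1 (zero-padded fields, Latin-1 text); the function name `split_comment` is hypothetical.

```python
def split_comment(field: bytes):
    """Split a 30-byte ID3v1 comment field into (comment, track).

    `track` is None when the field does not follow the ID3v1.1 convention.
    """
    if len(field) != 30:
        raise ValueError("the comment field is always 30 bytes")

    def text(raw: bytes) -> str:
        # Stop at the first zero byte, as an ID3v1 reader would.
        return raw.split(b"\x00", 1)[0].decode("latin-1")

    # ID3v1.1: byte 29 (index 28) is zero and byte 30 (index 29) is not.
    if field[28] == 0 and field[29] != 0:
        return text(field[:28]), field[29]
    return text(field), None


v11 = b"Live recording".ljust(28, b"\x00") + b"\x00" + bytes([7])
assert split_comment(v11) == ("Live recording", 7)
```

An ID3v1 reader that knows nothing about the extension still sees a valid comment, because it stops at the first zero byte and never reaches the track number.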

What is ID3v2?

ID3v2 is a new tagging system that lets you put enriching and relevant information about your audio files within them. In more down-to-earth terms, ID3v2 is a chunk of data prepended to the binary audio data. Each ID3v2 tag holds one or more smaller chunks of information, called frames. These frames can contain any kind of information and data you could think of, such as title, album, performer, website, lyrics, equalizer presets, pictures etc. The block scheme to the right is an example of what the layout of a typical ID3v2-tagged audio file may look like.

One of the design goals was that ID3v2 should be very flexible and expandable. It is very easy to add new functions to the ID3v2 tag because, just like in HTML, all parsers will ignore any information they don't recognize. Since each frame can be 16MB and the entire tag can be 256MB, you'll probably never again be in the same situation as when you tried to write a useful comment in the old ID3 tag while being limited to 30 characters.
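The two limits correspond to 28 and 24 bits of size information. The sketch below relies on details taken from the ID3v2 specifications rather than from this text: the tag header is 10 bytes ("ID3", two version bytes, one flags byte, four size bytes), and each size byte carries only seven bits (a "synchsafe" integer). The function names are hypothetical.

```python
def decode_synchsafe(raw: bytes) -> int:
    """Decode a synchsafe integer: seven payload bits per byte, top bit zero."""
    value = 0
    for byte in raw:
        if byte & 0x80:
            raise ValueError("top bit must be zero in a synchsafe byte")
        value = (value << 7) | byte
    return value


def id3v2_tag_size(header: bytes):
    """Return the tag size stored in an ID3v2 header, or None if not a header.

    The stored size does not include the 10-byte header itself.
    """
    if len(header) < 10 or header[:3] != b"ID3":
        return None
    return decode_synchsafe(header[6:10])


# Four bytes of seven bits each give 28 bits: a 256 MB ceiling for the tag.
assert decode_synchsafe(b"\x7f\x7f\x7f\x7f") == 2**28 - 1
assert 2**28 == 256 * 1024 * 1024
# A 24-bit size field gives the 16 MB ceiling quoted for a frame.
assert 2**24 == 16 * 1024 * 1024
```

Because the tag sits at the start of the file, a reader can inspect the first 10 bytes, learn the tag size, and either parse the frames or skip straight to the audio.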

Speaking of characters, ID3v2 supports Unicode, so even if you use the Bopomofo character set you'll be able to write in your native language. You can also state which language you're writing in, so that one file might contain, for example, the same lyrics in several languages.

Even though the tag supports a lot of byte-consuming capabilities like inline pictures and even the possibility to include any other file, ID3v2 still tries to use the bytes as efficiently as possible. If you convert an ID3v1 tag to an ID3v2 tag, it is even likely that the new tag will be smaller. If you convert an ID3v1 tag where all fields are full (that is, all 30 characters are used in every field) to an ID3v2 tag, it will be 56 bytes bigger. This is the worst-case scenario for ID3v1 to ID3v2 conversion.
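The 56-byte figure can be reproduced with a little arithmetic. The accounting below is a reconstruction, not a statement of how the figure was originally derived. It assumes the ID3v2.2 layout (a 10-byte tag header, 6-byte frame headers, one text-encoding byte per text frame), a comment frame with a 3-byte language code and an empty, zero-terminated description, and a genre stored as a 4-character reference such as "(79)".

```python
TAG_HEADER = 10    # "ID3" + version (2) + flags (1) + size (4)
FRAME_HEADER = 6   # assumed ID3v2.2: 3-byte frame id + 3-byte size
ENCODING = 1       # text-encoding byte at the start of a text frame
LANGUAGE = 3       # language code in a comment frame
TERMINATOR = 1     # zero byte ending the (empty) comment description


def text_frame(chars: int) -> int:
    """Size of a text frame holding `chars` one-byte characters."""
    return FRAME_HEADER + ENCODING + chars


def comment_frame(chars: int) -> int:
    """Size of a comment frame with an empty description."""
    return FRAME_HEADER + ENCODING + LANGUAGE + TERMINATOR + chars


id3v1_size = 128
id3v2_size = (
    TAG_HEADER
    + 3 * text_frame(30)   # title, artist, album
    + text_frame(4)        # year
    + comment_frame(30)    # comment
    + text_frame(4)        # genre as "(nn)", an assumption
)

assert id3v2_size == 184
assert id3v2_size - id3v1_size == 56
```

The same helpers show why a typical conversion shrinks: an ID3v1 tag always costs 128 bytes, while the ID3v2 frames only cost as much as the text they actually hold.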

Since it's so easy to implement new functionality into ID3v2, one can hope that we'll see a lot of creative uses for ID3v2 in the future. E.g. there is a built-in system for rating the music and counting how often you listen to a file, just to mention some brainstorm results that are included. This feature can be used to build playlists that play your favourite songs more often than others.

Main features of ID3v2
  • The ID3v2 tag is a container format, just like IFF or PNG files, allowing new frames (chunks) as evolution proceeds.
  • Residing at the beginning of the audio file makes it suitable for streaming.
  • Has an 'unsynchronization scheme' to prevent ID3v2-incompatible players from attempting to play the tag.
  • Maximum tag size is 256 megabytes and maximum frame size is 16 megabytes.
  • Is byte-conservative and can compress data, which keeps the files small.
  • The tag supports Unicode.
  • Isn't entirely focused on musical audio, but also other types of audio.
  • Has several new text fields such as composer, conductor, media type, BPM, copyright message, etc. and the possibility to design your own as you see fit.
  • Can contain lyrics as well as music-synced lyrics (karaoke) in almost any language.
  • Is able to contain volume, balance, equalizer and reverb settings.
  • Could be linked to CD-databases such as CDDB.
  • Is able to contain images and just about any file you want to include.
  • Supports enciphered information, linked information and weblinks.
  • And more...
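The 'unsynchronization scheme' in the list above can be illustrated with a short sketch. It follows the rule as described in the ID3v2 specifications, not in this text: an MPEG decoder looks for a byte 0xFF followed by a byte whose top three bits are set, so a zero byte is inserted after any 0xFF that would otherwise form such a pattern, and also after any 0xFF that is already followed by 0x00, so that the change can be undone. The function names are hypothetical, and the special case of a tag ending in 0xFF is not handled here.

```python
def unsynchronise(data: bytes) -> bytes:
    """Insert a zero byte wherever tag data could look like an MPEG sync."""
    out = bytearray()
    for i, byte in enumerate(data):
        out.append(byte)
        if byte == 0xFF and i + 1 < len(data):
            following = data[i + 1]
            # 0xE0 and above means the top three bits are set (false sync).
            # A following 0x00 is also stuffed so decoding stays unambiguous.
            if following >= 0xE0 or following == 0x00:
                out.append(0x00)
    return bytes(out)


def resynchronise(data: bytes) -> bytes:
    """Undo unsynchronise(): drop the zero byte that follows each 0xFF."""
    out = bytearray()
    i = 0
    while i < len(data):
        out.append(data[i])
        if data[i] == 0xFF and i + 1 < len(data) and data[i + 1] == 0x00:
            i += 1  # skip the stuffed zero byte
        i += 1
    return bytes(out)


sample = b"\x01\xff\xe3\x10\xff\x00\x42"
assert resynchronise(unsynchronise(sample)) == sample
```

A player that does not understand ID3v2 then finds no frame sync inside the tag and keeps scanning until it reaches the real audio data.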