A Digital Audio Primer

October 13, 2008

(What the common person should know about their MP3 players)

(Stripped of all the technical junk you don’t need to know. Techies: keep walking.)

File Formats, and what’s the difference?

Why use compressed file formats? The answer is simple: space. The CDs you buy store their audio uncompressed, at full quality. Imagine copying a full CD straight to your hard drive. We’d be talking 500-700 megabytes per disc. That means only about 30 CDs would fit on your 20 gigabyte iPod. That would be pretty disappointing. So we had to find a way to make the files smaller.
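If you want to check that math yourself, here is a rough sketch in Python. The CD numbers (44,100 samples per second, 16 bits per sample, two channels) are the standard ones. The 60-minute album length and the function names are just assumptions picked for illustration.

```python
# Rough size of uncompressed CD audio, and how many albums fit on a player.
SAMPLE_RATE = 44_100      # samples per second (CD standard)
BYTES_PER_SAMPLE = 2      # 16 bits per sample
CHANNELS = 2              # stereo

def cd_audio_megabytes(minutes):
    """Uncompressed size in megabytes (1 MB = 1,000,000 bytes)."""
    bytes_per_second = SAMPLE_RATE * BYTES_PER_SAMPLE * CHANNELS
    return bytes_per_second * minutes * 60 / 1_000_000

def albums_that_fit(player_gigabytes, album_minutes):
    """Whole albums that fit on a player of the given size."""
    player_megabytes = player_gigabytes * 1000
    return int(player_megabytes // cd_audio_megabytes(album_minutes))

if __name__ == "__main__":
    print(round(cd_audio_megabytes(60)))   # about 635 MB for a 60-minute album
    print(albums_that_fit(20, 60))         # about 31 albums on a 20 GB player
```

A 60-minute album lands right in the 500-700 megabyte range, and a 20 gigabyte player holds about 31 of them.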

The answer: the MP3 format. (Yes, there were other compression formats before it, but this is just the high points.)

How does an MP3 file work, conceptually? The “compression” takes the form of removing data to shrink the file size. How much survives the trimming is set by the bitrate: the amount of data the file gets to spend on each second of music, measured in kilobits per second (kbps). The higher the bitrate, the less data lost; the lower the bitrate, the more data trimmed away. Think of bitrate as the amount of the good stuff left.

128 kbps = not much left

320 kbps = barely trimmed, about as good as MP3 gets.

So why not always use 320? Every file takes up more than twice the space on your hard drive that it would at 128 (2.5 times as much, to be exact).
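The size difference is easy to work out, because bitrate is just data per second. Here is a small sketch; the four-minute song and the function name are made up for the example.

```python
def mp3_megabytes(bitrate_kbps, minutes):
    """Approximate MP3 size in megabytes for a constant bitrate.

    Ignores tags and other small overhead, so treat it as a ballpark.
    """
    total_bits = bitrate_kbps * 1000 * minutes * 60
    return total_bits / 8 / 1_000_000

if __name__ == "__main__":
    small = mp3_megabytes(128, 4)
    large = mp3_megabytes(320, 4)
    print(round(small, 2))          # 3.84
    print(round(large, 2))          # 9.6
    print(round(large / small, 2))  # 2.5
```

So a four-minute song is a bit under 4 megabytes at 128 and a bit under 10 at 320.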

So what is lost when the file is trimmed? Take this sound wave (A graphical representation of what the sound looks like):

A Basic Music Soundwave

The blue section in the middle is the easiest for us to hear. As you get closer to the top and bottom of the wave, it becomes harder for our ears to discern (Think about the light spectrum, ultraviolet on one end and infrared on the other).

We have trouble hearing the furthest parts of the spectrum.

So naturally, this is the best part to cut. A good way to visualize it (the real algorithms are considerably more complex) is like this:

As you trim it down, the sound becomes less full, more tinny/metallic/shallow/etc.

Now let’s talk about VBR, or Variable Bit Rate MP3s. It is exactly what it sounds like: the bitrate changes from moment to moment to preserve as much sound as possible while cutting as much data as possible. More cutting, with less loss. Here’s a way to envision VBR (of course the algorithm is even more complex, but let’s just think about it conceptually):

See, if there is a moment in the song with only a single speaking voice, a wider range can be cut without much damage (maybe down to 128). If you have a drum set and a guitar, less can be cut (maybe down to 256). A violin, a flute, an oboe, and a bass would probably stay at 320. As the MP3 plays, the bitrate changes, hence: Variable.
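To see what that does to file size, here is a toy calculation. The three sections and their lengths are invented for the example, and a real encoder picks a bitrate for each tiny slice of audio rather than for whole sections, so treat this as a sketch of the idea only.

```python
def average_bitrate_kbps(segments):
    """Time-weighted average bitrate for (seconds, bitrate_kbps) pairs."""
    total_seconds = sum(seconds for seconds, _ in segments)
    total_kilobits = sum(seconds * kbps for seconds, kbps in segments)
    return total_kilobits / total_seconds

def size_megabytes(segments):
    """Approximate file size in megabytes for the same pairs."""
    total_kilobits = sum(seconds * kbps for seconds, kbps in segments)
    return total_kilobits * 1000 / 8 / 1_000_000

# Hypothetical song: spoken intro, drums and guitar, then a busy finale.
vbr_song = [(30, 128), (120, 256), (60, 320)]
# The same 210 seconds encoded at a constant 320 kbps.
cbr_song = [(210, 320)]

if __name__ == "__main__":
    print(average_bitrate_kbps(vbr_song))      # 256.0
    print(round(size_megabytes(vbr_song), 2))  # 6.72
    print(round(size_megabytes(cbr_song), 2))  # 8.4
```

The busy finale still gets the full 320, but the file as a whole comes out about 20% smaller than it would at a constant 320.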

Transcoding: This is a process of horrible badness. Let’s examine this cycle of musical destruction. We start with a CD. The data on this CD is in the purest state we can get (technically).

-We decide to rip the CD to MP3 (See, now you know why we call it ripping: we are forcibly removing data and only keeping what we need.) A bitrate of 256 kbps sounds good enough. Let’s say that we lost approximately 20 percent of the total data. That’s fine, we can still listen to the remaining 80% without a problem in our headphones (but I wouldn’t recommend playing it through a massive club system; you’ll hear the difference).

-Now, we want to burn these MP3s for our friend as an audio CD that he can listen to in his car. The CD we burn for him will hold the same 80% of the original data that we found perfectly satisfactory. It will be just fine.

-Now this friend of ours, he has no idea that we just burned the MP3s instead of a copy of the original CD for him (I recommend writing on the CD you burn what the bitrate was, but that only helps if your friend already knows, or has read this article.)

-Here’s where the trouble starts. Our friend decides he wants to listen to this CD on his MP3 player. So he rips the CD into MP3s. So what’s the big deal? His computer has no idea that this audio came from 256 kbps MP3s, and not pure CD audio. So our friend re-rips (transcodes) the music back into MP3s, cutting the already cut data again. He’s now ripped another 20% of the information from our already-reduced-by-20% files. He’s left with roughly 64% of the original data (80% of 80%), masquerading as a full 80%. (The files will proudly proclaim themselves to be 256 kbps, but what is inside no longer lives up to that number.) The cycle continues.
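Here is the cycle as plain arithmetic. Keep in mind that the 20% figure is only this article’s illustrative number. Real encoders do not throw away a fixed percentage on every pass, so this is a sketch of the trend, not a measurement.

```python
def remaining_fraction(generations, loss_per_rip=0.20):
    """Fraction of the original data left after repeated lossy rips.

    loss_per_rip is the article's made-up 20%, applied on every pass.
    """
    return (1 - loss_per_rip) ** generations

if __name__ == "__main__":
    for generation in range(5):
        percent = round(remaining_fraction(generation) * 100)
        print(generation, "rips:", percent, "% left")
```

Under that assumption, one rip leaves 80%, two leave 64%, three leave about 51%, and four leave about 41%.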


Other Formats:

Lossy – Like MP3 encoding, these formats compress data using the “trim what isn’t necessary” method.

Apple’s pick – AAC – the format itself is an industry standard rather than an Apple invention, but the files Apple sells can carry Digital Rights Management (which means that if you don’t follow the rules, they can take your music away). Slightly more efficient than MP3 at compressing data, but not by a massive amount. It’s not the container that is fancy on these, it’s the locks.

Microsoft’s version – WMA – without getting too deep in the details, think of these as Microsoft’s answer to MP3, one that makes Microsoft money. The sound quality is a smidge better for the same file size as MP3s, but not enough that you would want to convert your entire collection to it.

Lossless – Unlike MP3, these formats compress the data without losing any of it (it will always sound exactly like the CD did). Think of it like installing a closet organizer that allows you to fit twice as much stuff in the same closet. These codecs just reorganize the data into a smaller package. On average, the files are half the size. (Half is not amazing if you are short on drive space. You are still looking at 250-350 megs per CD.) The beauty of this type of compression: the damage from transcoding cannot happen. You can rip and burn all day.
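You can watch the “nothing is lost” part happen with a few lines of Python. This sketch uses zlib, a general-purpose compressor that ships with Python, as a stand-in; it is not an audio codec like the ones below, and the test tone is generated by the script (and is far more repetitive than real music, so it shrinks far more than an album would). The point is only that the bytes come back exactly as they went in.

```python
import math
import struct
import zlib

def make_tone(seconds=1, sample_rate=44_100, frequency=441):
    """Raw 16-bit mono samples of a simple tone, returned as bytes."""
    count = int(seconds * sample_rate)
    samples = [
        int(round(20_000 * math.sin(2 * math.pi * frequency * n / sample_rate)))
        for n in range(count)
    ]
    return struct.pack("<%dh" % count, *samples)

original = make_tone()
packed = zlib.compress(original)     # smaller package, same contents
restored = zlib.decompress(packed)   # unpack it again

if __name__ == "__main__":
    print(len(original), "bytes before,", len(packed), "bytes after")
    print(restored == original)      # True
```

Compress and decompress as many times as you like and the result never changes. That is why the transcoding cycle described earlier does no harm with lossless files.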

Apple’s version – Apple Lossless (stored in M4A files) – Apple claims files will be 40-60% smaller than the original (CD) data. This statement is pretty much true.

Microsoft’s version – WMA Lossless – Microsoft claims a startling 20-40% smaller, but in most testing it turns out to be in the same 40-60% category as Apple. Imagine that.

Open Source Version – FLAC – The same results as both formats above. So why use FLAC? Well, the maker of any software you use that can play WMA files probably paid to license that codec. Even MP3 money goes to patent holders. FLAC is free for anyone to use.

Audio File Type Summary:

It doesn’t really matter which lossy codec you use, as long as you acknowledge that it is lossy. If you choose to use lossless (for the true audiophile for whom storage is not a problem, or for archival purposes), know the limitations of each file type. If you use Linux, FLAC is your best bet, since getting Windows and Mac proprietary codecs to work can be a headache.

If you find audio interesting, a good place to start is Wikipedia. You can get a more in-depth explanation, but stop reading once they get to the math or patent rights sections. There is an endless supply of technical information online on this subject, but a lot of it is impenetrable if you don’t already know about it.

-- James Diemer

