Sound file formats


Martin McBride, 2016-12-12
Tags: sample rate, bit depth, compression, bit rate, mp3, wav
Categories: data representation, sound

There are many different sound file formats. They all store sound as a sequence of quantised samples (as described in recording sound). The main difference is in the type of compression offered.

MP3 is a common format for music. It uses lossy compression to achieve a small file size, but with some loss of quality.

WAV is another common format. It is often used to store uncompressed sound data. It can be used to store compressed sound data, but that is less usual.

Most file formats allow you to choose various parameters which affect the file size and quality.

Sample rate

To digitise a sound wave, we sample it thousands of times per second. Here is our original wave:

[Figure: the original sound wave]

This shows the effect of sampling it:

[Figure: the sampled wave, with the reconstructed wave shown in green]

The green wave shows what happens when we reconstruct the original wave from the samples. It is close to the original, but not exactly the same. The difference adds a small amount of unwanted noise to the sound.

Here is what happens if we halve the sample rate:

[Figure: the wave sampled at half the rate]

Since we are sampling less often, the green wave is less accurate than before, so the signal has a greater amount of noise.

A sample rate of about 8,000 samples per second is just about adequate for a telephone. CDs use a sample rate of 44,100 samples per second. High end professional digital audio equipment can use sample rates of 96,000 or even 192,000 samples per second. The greater the sample rate, the larger the file.
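As a rough illustration of how the sample rate drives the amount of data, the sketch below samples a pure 440 Hz tone at two rates. The function name `sample_sine` and the choice of a sine wave are assumptions made for this example only; real recordings are far more complex.

```python
import math

def sample_sine(frequency_hz, sample_rate, duration_ms):
    """Return samples of a sine wave taken sample_rate times per second."""
    # Integer arithmetic avoids rounding surprises when counting samples.
    count = sample_rate * duration_ms // 1000
    return [math.sin(2 * math.pi * frequency_hz * n / sample_rate)
            for n in range(count)]

# Ten milliseconds of a 440 Hz tone at the CD rate and at half that rate.
cd_rate = sample_sine(440, 44100, 10)
half_rate = sample_sine(440, 22050, 10)

# Halving the sample rate halves the number of values we have to store.
print(len(cd_rate), len(half_rate))
```

The same stretch of sound needs about half as many values at the lower rate, which is why the file gets smaller and the reconstruction gets coarser.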

Bit depth

Digital sound is also quantised - the sound level cannot take just any value; it is restricted to a fixed set of values. Here is the effect of quantisation on the signal:

[Figure: the effect of quantisation on the signal]

Once again, this introduces noise to the signal. If we use fewer levels, the signal is even more distorted:

[Figure: quantisation with fewer levels]

A bit depth of 8 bits allows for 256 different levels, and uses 1 byte per sample - this creates quite a poor quality sound. A bit depth of 16 bits allows for 65536 different levels, and uses 2 bytes per sample - this creates a good quality sound (provided a suitable sample rate is used). High end equipment uses 24 or 32 bit samples.
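The link between bit depth and the number of levels can be sketched in a few lines. The function `quantise` below is a hypothetical helper written for this example; it assumes the signal lies between -1.0 and 1.0 and that the levels are evenly spaced, which is a simplification of what real formats do.

```python
def quantise(value, bit_depth):
    """Snap a value between -1.0 and 1.0 to the nearest of 2**bit_depth levels."""
    levels = 2 ** bit_depth
    step = 2.0 / (levels - 1)            # gap between neighbouring levels
    index = round((value + 1.0) / step)  # which level is closest
    return index * step - 1.0

for bits in (8, 16):
    error = abs(quantise(0.3, bits) - 0.3)
    print(bits, "bits:", 2 ** bits, "levels, error", error)
```

The error can never be more than half a step, so each extra bit of depth halves the worst case error.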

The overall quality is determined by the sample rate and the bit depth, and ideally they should be matched. For example, there is no point using a sample rate of 100,000 per second if the bit depth is only 8 bits - the low bit depth would degrade the quality so much that the high sample rate would be pointless.

CD quality uses 44,100 samples per second, with a bit depth of 16, which is a sensible choice for pretty good audio quality.

Stereo

In a stereo system, the left and right speakers (or the left and right earpieces of a pair of headphones) play slightly different sounds. This can give the illusion of sounds coming from different parts of the room.

A stereo sound file contains two separate sound streams, both in the same file. When uncompressed, it will be twice as big as the equivalent mono (non-stereo) file.
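One way to see the size difference is to write a second of silence as a mono WAV and as a stereo WAV, using Python's standard `wave` module and an in-memory buffer. The helper name `wav_size` is an assumption for this sketch, and silence is used only to keep the example short.

```python
import io
import wave

def wav_size(channels, seconds=1, sample_rate=44100, bytes_per_sample=2):
    """Write silent PCM audio to an in-memory WAV file and return its size in bytes."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(bytes_per_sample)
        wav_file.setframerate(sample_rate)
        # One frame holds one sample for every channel.
        frame_count = sample_rate * seconds
        wav_file.writeframes(bytes(frame_count * channels * bytes_per_sample))
    return len(buffer.getvalue())

mono_size = wav_size(channels=1)
stereo_size = wav_size(channels=2)
print(mono_size, stereo_size)
```

The stereo file holds twice as much sound data. The total file size is very slightly less than double, because both files carry the same small header.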

Compression

Sound files can get very large. A 16 bit, 44,100 sample rate stereo file occupies about 176KB for every second of audio (44,100 samples x 2 bytes x 2 channels). A 3 minute song would occupy over 30MB! The ability to compress audio files is very useful.
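The arithmetic behind these figures is short enough to check directly. The function name `uncompressed_bytes` is made up for this example, and it ignores the small header that a real file would also contain.

```python
def uncompressed_bytes(sample_rate, bit_depth, channels, seconds):
    """Size of raw sound data in bytes, ignoring any file header."""
    bytes_per_sample = bit_depth // 8
    return sample_rate * bytes_per_sample * channels * seconds

per_second = uncompressed_bytes(44100, 16, 2, 1)       # 176,400 bytes
three_minutes = uncompressed_bytes(44100, 16, 2, 180)  # 31,752,000 bytes
print(per_second, three_minutes)
```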

Lossless compression compacts the sound data without losing any data. It relies on the fact that most sound waves are fairly smooth and often predictable, so they can be encoded in a more efficient way. Typically the file will be 2 or 3 times smaller, but the decompressed data will be exactly the same as the initial data.
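The key property of lossless compression, that you get back exactly what you put in, can be demonstrated with Python's general purpose `zlib` module. This is only a stand-in: `zlib` is not an audio codec, and the pure 440 Hz test tone used here is far more predictable than real music, so the compression ratio it achieves says nothing about real audio formats.

```python
import math
import struct
import zlib

# One second of a 440 Hz tone as 16 bit samples (an artificial test signal).
samples = [round(20000 * math.sin(2 * math.pi * 440 * n / 44100))
           for n in range(44100)]
raw = struct.pack("<%dh" % len(samples), *samples)

packed = zlib.compress(raw)
restored = zlib.decompress(packed)

print(len(raw), "bytes before,", len(packed), "bytes after")
print("identical after decompression:", restored == raw)
```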

Lossy compression takes a more aggressive approach, and actively throws away information to make the data smaller. This may seem extreme, but the algorithm analyses the sound and only throws away those parts of the signal that our ears won't really notice. In this way it can make the data 10 or more times smaller. The decompressed data will not be exactly the same as the original, but it will sound almost the same to the human ear.

Often people use an uncompressed or lossless format (such as WAV or FLAC) while editing sound, to avoid the accumulated effect of saving the sound multiple times with lossy compression. Once the final sound is ready, it can then be saved as an MP3 or similar to be distributed on the Web.

Bit rate

The term bit rate has two slightly different meanings. For raw data, the bit rate is equal to:

bit_rate = sample_rate*bit_depth

So if we have a sample rate of 44,100 and a bit depth of 16, the bit rate is about 700Kb/s (notice it is Kb/s for kilobits, not KB/s which would be kilobytes).

For a stereo signal, we would double this, because there are 2 separate signals.
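The raw bit rate calculation can be written as a small function. The name `raw_bit_rate` and the `channels` parameter are assumptions for this sketch; the result is in bits per second.

```python
def raw_bit_rate(sample_rate, bit_depth, channels=1):
    """Bit rate of uncompressed sound, in bits per second."""
    return sample_rate * bit_depth * channels

mono = raw_bit_rate(44100, 16)                # 705,600 bits per second
stereo = raw_bit_rate(44100, 16, channels=2)  # 1,411,200 bits per second
print(mono, stereo)
```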

For compressed data, the bit rate is often used as a target. So for example, we might decide that we want to create an MP3 which is encoded at 128Kb/s. This tells us approximately how big a 3 minute song will be:

128*3*60 = 23,040 kilobits

23,040 kilobits = 2,880 kilobytes = 2.9MB approx

Comparing this with the previous calculation, where a 3 minute uncompressed song occupied about 30MB, you can see that we are getting a compression of over 10 times.
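The comparison can be checked with a short calculation. The variable names are invented for this example, a kilobit is taken as 1,000 bits, and the result is only an estimate, since a real MP3 file also contains headers and may not hit its target bit rate exactly.

```python
seconds = 3 * 60

# Uncompressed: 44,100 samples per second, 2 bytes per sample, 2 channels.
uncompressed_size = 44100 * 2 * 2 * seconds   # 31,752,000 bytes

# Compressed: a target of 128 kilobits per second, converted to bytes.
target_bit_rate = 128 * 1000
compressed_size = target_bit_rate * seconds // 8   # 2,880,000 bytes

ratio = uncompressed_size / compressed_size
print(compressed_size, "bytes, about", round(ratio, 1), "times smaller")
```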

Copyright (c) Axlesoft Ltd 2020