Bit depth in audio production is the number of binary digits used to represent each audio sample.
The bit depth directly determines the resolution of each sample and, with it, the dynamic range of the audio. Professional audio engineers typically work at 24 bits, since that depth meets modern requirements and standards.
Higher bit depths offer no practical benefit beyond additional dynamic range that no playback system or human listener could reproduce or perceive. If you plan to release your music on CD, you can still keep the project at 24 bits, then lower the bit depth on export and apply dither and noise shaping.
Bit Depth Explained
In computing and digital communication, the basic unit of information is called a bit. In digital audio engineering, bit depth is the number of bits stored in each sample. If you picture the analog sound wave as a smooth curve, the bit depth determines how finely the amplitude of each sampled point on that curve can be measured: more bits means more possible amplitude values per sample. Modern DAWs and audio converters use PCM technology to transform analog sound waves into digital signals and convert them back. PCM stands for pulse-code modulation; it samples the amplitude of the signal at regular intervals.
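To make the idea concrete, here is a minimal Python sketch of PCM: sample a continuous signal at regular intervals and round each amplitude to the nearest value a given bit depth can represent. The function names are illustrative, not from any particular audio library.

```python
import math

def pcm_sample(signal, sample_rate, duration, bit_depth):
    """Sample a continuous signal at regular intervals (PCM) and
    quantize each amplitude to the given bit depth."""
    max_level = 2 ** (bit_depth - 1) - 1   # e.g. 127 for 8-bit signed
    n_samples = int(sample_rate * duration)
    samples = []
    for n in range(n_samples):
        t = n / sample_rate                # sampling instant
        x = signal(t)                      # analog amplitude in [-1.0, 1.0]
        samples.append(round(x * max_level))  # round to nearest level
    return samples

# A 440 Hz sine tone sampled at 8 kHz with 8-bit depth
tone = lambda t: math.sin(2 * math.pi * 440 * t)
samples = pcm_sample(tone, sample_rate=8000, duration=0.01, bit_depth=8)
print(len(samples), min(samples), max(samples))
```

Every value in the output list is an integer between -127 and 127; a higher bit depth would simply widen that range, allowing finer amplitude steps.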
The basics of bit depth
Bit depth is the number of binary digits encoded in each sample, so in essence it defines the amplitude resolution of the waveform. That resolution grows exponentially: each added bit doubles the number of values a sample can take. The important thing to remember is that bit depth does not affect the frequency response; that is determined by the sample rate.
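The doubling is easy to verify: an n-bit sample can encode 2^n distinct amplitude values. A quick sketch:

```python
def amplitude_levels(bit_depth):
    """Number of distinct amplitude values an n-bit sample can encode."""
    return 2 ** bit_depth

# Each added bit doubles the count, so resolution grows exponentially
for bits in (8, 16, 24):
    print(f"{bits}-bit audio: {amplitude_levels(bits):,} levels")
```

8-bit audio has only 256 levels, 16-bit has 65,536, and 24-bit has 16,777,216.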
Since bit depth directly controls the resolution of a waveform, it affects exactly two things. First, it sets the dynamic range of the audio: the higher the bit depth, the greater the dynamic range. Second, it determines the signal-to-noise ratio, abbreviated SNR in professional audio engineering.
The dynamic range
Dynamic range is the difference between the quietest and the loudest points of your audio. It is measured in decibels: the higher the number, the greater the difference between the quietest and loudest parts.
The dynamic range of human hearing is roughly 140 dB, though this figure varies with frequency and from listener to listener. A depth of 16 bits, the CD audio standard, gives a dynamic range of about 96 dB. 24-bit digital audio has a theoretical range of about 144 dB, which covers human hearing capabilities. But one thing needs clarifying: while the meaning of the "loudest part" is fairly obvious, the "quietest part" is trickier.
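Those figures follow from a simple rule of thumb: each bit contributes about 6.02 dB of dynamic range, i.e. 20*log10(2^n) dB for n bits. A quick check in Python:

```python
import math

def dynamic_range_db(bit_depth):
    """Theoretical PCM dynamic range: 20*log10(2**n), about 6.02 dB per bit."""
    return 20 * math.log10(2 ** bit_depth)

print(f"16-bit: {dynamic_range_db(16):.1f} dB")  # CD audio standard
print(f"24-bit: {dynamic_range_db(24):.1f} dB")
```

This prints roughly 96.3 dB for 16 bits and 144.5 dB for 24 bits, matching the commonly quoted 96 dB and 144 dB figures.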
Dealing with the quietest parts
Let’s say we are recording an acoustic piano. Obviously, the loudest part of our audio would be the loudest note we can perform. Naturally, one might assume that the quietest part would be the quietest note we can perform, or even the noise of the hammers. That assumption is as far from the truth as it gets. Since we are talking about audio recording specifically, the quietest part of the piano recording is actually the noise floor of the microphones and other analog equipment in the chain. In audio engineering, therefore, the dynamic range represents the difference between the useful signal and the unwanted noise.
Unfortunately, every piece of analog equipment, including the A/D converters in your audio interface, has a certain noise floor. Even if you have no external audio hardware, your computer still contains converters that are partly analog. The good news is that if you record your parts at a healthy level and a high bit depth, the noise floor will not be audible.
Analog equipment noise is a challenge only when you are trying to capture a very gentle, quiet performance. But what if you do not record any live instruments at all? Then you should not have any problem with noise, right?
Well, not exactly. Using virtual instruments and synthesizers does spare you from dealing with microphones and preamps. But as it turns out, digital audio can introduce noise of its own. This noise comes from quantization error, which is inherent to digital signal processing.
Quantization is the process of mapping input values from a large set to output values in a smaller set. In digital audio, quantization is performed by analog-to-digital converters and by various algorithms embedded in software: the continuous analog amplitude of a signal is converted into a finite set of digital values. Quantization typically involves rounding or truncation, which creates a difference between the value of the input signal and its quantized counterpart. That difference is called the quantization error.
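A short Python sketch makes the error visible: quantize a sine wave at different bit depths and measure the worst-case rounding error. The `quantize` helper is illustrative, not from any library.

```python
import math

def quantize(x, bit_depth):
    """Round an amplitude in [-1.0, 1.0] to the nearest n-bit level."""
    levels = 2 ** (bit_depth - 1) - 1   # e.g. 32767 for 16-bit
    return round(x * levels) / levels

# The worst-case rounding error shrinks as bit depth grows
for bits in (8, 16, 24):
    worst = max(abs(math.sin(n / 100) - quantize(math.sin(n / 100), bits))
                for n in range(628))
    print(f"{bits}-bit worst-case quantization error: {worst:.2e}")
```

At 8 bits the worst error is on the order of 4e-3 of full scale; at 16 bits it drops to about 1.5e-5, and at 24 bits to about 6e-8. Each extra bit halves the maximum error.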
Typically, in audio production we are dealing with a very large number of samples, so the quantization errors form a sequence that behaves like a random signal with no production value: noise. In general, a higher bit depth means lower quantization noise power. But since the industry delivery standards are 16 and 24 bits, exporting a project at a lower bit depth than it was produced at re-quantizes the audio and adds fresh quantization noise. To tame this noise, audio engineers use dithering and noise shaping.
Dithering and noise shaping
Dither is a low-level noise added to a signal to randomize its quantization error. Without dither, the quantization error is correlated with the signal, which produces predictable, audible harmonic distortion. Dithering decorrelates the error from the signal, replacing that distortion with a constant, benign noise floor.
Noise shaping alters the spectral shape of the quantization error: it lowers the noise power in frequency bands where the noise would be objectionable and raises it in bands where it is less audible. Noise shaping works by feeding the quantization error back through a filter loop, which makes it possible to shape the error spectrum. Most noise-shaping algorithms used in audio processing are based on a model of the absolute threshold of hearing, which can render the quantization noise effectively inaudible.
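Here is a toy Python sketch of TPDF (triangular) dither, the most common flavor, applied to 8-bit quantization; noise shaping is omitted for brevity, and all names are illustrative. It shows the key payoff: a constant signal below one quantization step vanishes without dither but survives (on average) with it.

```python
import random

LEVELS = 2 ** 7 - 1          # 8-bit signed: 127 positive steps
LSB = 1.0 / LEVELS           # size of one quantization step

def quantize(x):
    return round(x * LEVELS) / LEVELS

def dithered_quantize(x, rng):
    # TPDF dither: the difference of two uniform randoms has a
    # triangular distribution spanning +/- 1 LSB
    noise = (rng.random() - rng.random()) * LSB
    return quantize(x + noise)

rng = random.Random(42)
signal = 0.3 * LSB           # a constant signal below one step

plain = sum(quantize(signal) for _ in range(10000)) / 10000
dithered = sum(dithered_quantize(signal, rng) for _ in range(10000)) / 10000
print(f"true level:         {signal:.6f}")
print(f"undithered average: {plain:.6f}")     # rounds to zero every time
print(f"dithered average:   {dithered:.6f}")  # close to the true level
```

Without dither the sub-LSB signal is silently truncated to zero; with dither its average level is preserved inside the noise, which is exactly why quiet reverb tails and fades survive a dithered bit-depth reduction.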
What bit depth to use?
Since audio engineers typically prefer not to dig too deep into mathematical equations, it is useful to know the practical implications of bit depth. It may be obvious to point out, but bigger does not always mean better. A higher bit depth gives you a greater dynamic range, but it also increases CPU load. Moreover, samples recorded at a greater bit depth take up more space, and since they are eventually loaded into your RAM, you may run out of memory while working on a fairly large project.
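The size penalty is easy to estimate, because uncompressed PCM size is just sample rate times bytes per sample times channels times duration. A quick sketch (the helper name is illustrative):

```python
def audio_size_mb(sample_rate, bit_depth, channels, seconds):
    """Uncompressed PCM size: bit_depth/8 bytes per sample per channel,
    sample_rate samples per second."""
    total_bytes = sample_rate * (bit_depth // 8) * channels * seconds
    return total_bytes / (1024 * 1024)

# A 4-minute stereo track at 44.1 kHz, at common bit depths
for bits in (16, 24, 32):
    print(f"{bits}-bit: {audio_size_mb(44100, bits, 2, 240):.1f} MB")
```

The same track grows from roughly 40 MB at 16 bits to about 61 MB at 24 bits and 81 MB at 32 bits, so the cost scales linearly with bit depth.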
As we already know, the dynamic range of human hearing is approximately 140 dB, and a depth of 24 bits gives a theoretical dynamic range of 144 dB, which already covers everything we can hear. Most electronic equipment cannot reproduce more than about 120 dB of dynamic range anyway, so it would be pointless to strain your computer for something that cannot be reproduced. Ultimately, it all comes down to industry standards and individual project requirements.
Modern television broadcasts require an audio bit depth of 24 bits, and the same goes for streaming services and YouTube. If you intend to release your audio on CD, it has to be 16 bits. But since CD is arguably a dying format and most people no longer use physical media for audio other than USB drives, there is rarely a reason to work at that depth.
On the other hand, vinyl has come back with a banger, and a lot of music is pressed on it these days. Vinyl offers a dynamic range of roughly 70 dB, give or take, so 16-bit audio would be sufficient. Nevertheless, it is wise to keep headroom for additional post-processing, so a bit depth of 24 is usually used for vinyl as well. As we can clearly see, a bit depth of 24 is the sweet spot.