This article is meant to help newcomers understand some of the basic principles of digital audio recording and effects processing. It wasn’t written for audiophiles or scientists, so if you are one, please look past some of my guerrilla tactics when it comes to explaining this stuff.
That being said:
Digital editors, depending on their presets, will take snapshots of the incoming analog audio signal at whatever rate you tell them to. If I set my editor to record at 44100, I am telling it to take snapshots of – or capture, or sample – the incoming audio signal at a rate of 44,100 times per second. Hence “44.1k,” as it is often referred to in short.
What this means is that for every second of recording, the editor captures 44,100 tiny cross-section images of the audio, building up a representation of the sound wave during that one second. It then takes all of the seconds – all of those collections of samples – and lines them up in a linear fashion to represent the entire recording, however long it may be. What audio editing software allows us to do is see a graphical image of what that collection of samples looks like once it is all put together.
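The snapshot idea above can be sketched in a few lines of Python. This is a toy illustration, not how an editor actually works internally: it “samples” a mathematically generated 440 Hz tone (an assumed test frequency, not anything from the article) 44,100 times to build one second of audio.

```python
import math

SAMPLE_RATE = 44_100  # snapshots per second, the CD standard
FREQ = 440.0          # an arbitrary test tone (A above middle C)

# One second of digital audio is simply a list of 44,100 measurements
# of the waveform's amplitude, taken at evenly spaced instants in time.
one_second = [
    math.sin(2 * math.pi * FREQ * n / SAMPLE_RATE)
    for n in range(SAMPLE_RATE)
]

print(len(one_second))  # 44100 snapshots for one second of audio
```

An editor’s waveform display is essentially a drawing of a list like this, one second after another.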
The human ear can’t hear frequencies much above 20,000 Hz, and sampling theory (the Nyquist theorem) tells us that a 44.1 kHz sample rate can faithfully capture frequencies up to half that rate – 22,050 Hz. That is why digital audio sampled at 44.1k doesn’t leave any audible artifacts from the sampling process itself. All CD players play back at this rate. This is often why people rave about how good things sound when they hear audio on CD.
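The “half the sample rate” rule is simple enough to express directly. A small sketch, using the standard Nyquist relationship, shows the highest capturable frequency for the rates discussed in this article:

```python
def nyquist_limit(sample_rate_hz: int) -> float:
    """Highest frequency a given sample rate can represent (Nyquist theorem)."""
    return sample_rate_hz / 2

# The common rates mentioned in this article:
for rate in (44_100, 48_000, 96_000, 192_000):
    print(f"{rate:>7} Hz sampling captures frequencies up to {nyquist_limit(rate):>8.0f} Hz")
```

Since human hearing tops out around 20,000 Hz, anything sampled at 44.1 kHz or above already covers the full audible range.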
DVDs, on the other hand, often employ a sample rate of 48,000 Hz, which takes even more frequent snapshots of the audio to accommodate the multiple layers of effects movie studios use in their recordings during action sequences.
When you go to a high-tech movie theater, the sound can be overwhelming at times. None of this would be possible without more bandwidth – a denser stream of samples. That is why audio will often be recorded at exceptionally high sample rates – like 192,000 times per second – and then downsampled by any number of processes to a more manageable, exportable rate. But sample rate is only half of the picture: it governs the range of frequencies you can capture, while bit depth governs how finely each snapshot measures the signal.
Spreading a group of recorded samples over a palette of 16 bits (the bit depth of CDs) will give you an audible result that is sufficient for CD recordings, and is quite standard practice. But directly recording an audio signal at 44,100 Hz with a bit depth of 16 bits means you are recording your source audio at the bare minimum resolution for CD-quality sound. Any manipulation of the subsequent recording via DSP (Digital Signal Processing) will have immediate, generally degrading effects on the source audio. The effects will noticeably change the sound of the source and degrade it to a point where most skilled ears can hear the difference quickly.
When we know we want a clearer, more true-to-life representation of our voice, we record at a higher-resolution setting – something like 48,000 Hz, 16-bit. But even this is considered inadequate by some. If all you were doing was recording a very high-quality incoming voice signal (speaking, not singing), this would probably be sufficient. But when you know you are going to manipulate the end product by using some sort of processing – whether it be Noise Reduction, EQ (Equalization), Reverb, Chorusing, Delay, Compression or any other process – you will want not only to record at a higher sample rate; you will also want to provide a bigger palette for those samples to spread out across.
We increase the size of this palette by increasing the bit depth we use when we record. What this means is that when we begin to process the audio with any of the above-mentioned effects, we will be applying that process to a much larger pool of data.
In essence, the process will affect each little group of data to a smaller degree in order to achieve the end product, which in turn will give us a more audibly pleasing recording than if we applied the same effect to a lesser amount of data.
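A quick back-of-envelope calculation makes the “bigger palette” concrete. Each extra bit of depth doubles the number of amplitude steps available, and by a standard rule of thumb adds roughly 6.02 dB of dynamic range:

```python
# Rule of thumb: dynamic range (in dB) is about 6.02 dB per bit of depth.
def bit_depth_stats(bits: int) -> tuple[int, float]:
    steps = 2 ** bits             # number of distinct amplitude values
    dynamic_range_db = 6.02 * bits
    return steps, dynamic_range_db

for bits in (16, 24, 32):
    steps, db = bit_depth_stats(bits)
    print(f"{bits}-bit: {steps:,} amplitude steps, ~{db:.0f} dB of dynamic range")
```

Going from 16-bit to 24-bit multiplies the number of amplitude steps by 256, which is why processing has so much more room to work in before its fingerprints become audible.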
No matter how we process the audio, the process itself is going to be the same. It swings the same size hammer whether it is altering a small amount of data or a large one.
It’s just that when the process is applied to the larger sampling of data – we tend to notice it less.
Think about it like this:
If I took a gallon of paint and poured it into a bathtub full of water, no matter how I try to fix it or clean the water up – it is going to look like a bathtub full of paint. If I took that same gallon of paint and poured it into an Olympic size swimming pool and allowed it to spread out, I would eventually be able to see the bottom of the pool, if I even lost sight of it at all.
Now apply the analogy to digital audio and processing:
Your audio – when recorded at 44100 – 16 bit – is the bathtub. Signal processing is the paint. You may be able to get away with a thimble full of paint.
Your audio when recorded at 48000 – 16 bit – is just a bigger bathtub and you might get away with a shot glass of paint.
Your audio when recorded at 96000 – 24 bit – is the Olympic size pool. You can probably pour in the whole damn gallon and depending on the color and density of the paint, people with bad eyesight might not even notice.
Your audio when recorded at 192,000 – 32 bit – is the ocean. People have been pouring crap into it for centuries and we are just now starting to notice.
Not only is this sampling rate far, far too much for anything we would ever do as voice over talent, it is one that I have never recorded at. Even as an experiment. There is just no need and you will chew through disk space unnecessarily.
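The disk-space point is easy to verify with arithmetic. Assuming uncompressed mono PCM (one channel, which is typical for raw voice-over tracks), a minute of audio costs sample rate × bytes per sample × 60 seconds:

```python
def mb_per_minute(sample_rate: int, bit_depth: int, channels: int = 1) -> float:
    """Storage cost of uncompressed PCM audio, in decimal megabytes per minute."""
    bytes_per_second = sample_rate * (bit_depth // 8) * channels
    return bytes_per_second * 60 / 1_000_000

for rate, depth in [(44_100, 16), (48_000, 24), (96_000, 24), (192_000, 32)]:
    print(f"{rate:>7} Hz / {depth}-bit mono: {mb_per_minute(rate, depth):.1f} MB per minute")
```

A minute at 192 kHz/32-bit weighs in at nearly nine times the size of the same minute at CD quality – with no audible benefit for spoken voice.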
Needless to say, 48 kHz at 24-bit (or at minimum 16-bit) should be your target sample rate and bit depth when it comes to recording voice over, whether you are processing the audio or not. This should be the method used for any kind of professional production.
Recording at 96/24 and then downsampling is, in my opinion, too much work for a simple internet audition. For auditions, recording at 48 kHz in 24- or 16-bit will allow you to encode your audio as an MP3 without fuss – and is more than sufficient, considering the upload engine used on some of the VO database sites takes that audio and processes it further.
Also note that when working with high-res audio in a multi-track environment, any of your production elements that you want to insert into the project will have to be up-sampled in order to play nicely with the other kids in the multi-track.
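To give a feel for what upsampling does, here is a deliberately naive sketch using linear interpolation. Real editors use filtered resampling algorithms to avoid artifacts, so this is an illustration of the concept only, not what any particular editor does:

```python
def upsample_linear(samples: list[float], from_rate: int, to_rate: int) -> list[float]:
    """Toy resampler: estimates new samples by linear interpolation between
    neighbors. Illustrative only -- production resamplers use proper filtering."""
    ratio = from_rate / to_rate
    out_len = int(len(samples) * to_rate / from_rate)
    out = []
    for i in range(out_len):
        pos = i * ratio                       # position in the original sample grid
        left = int(pos)
        right = min(left + 1, len(samples) - 1)
        frac = pos - left
        out.append(samples[left] * (1 - frac) + samples[right] * frac)
    return out

clip = [0.0, 1.0, 0.0, -1.0]                  # four samples at 44.1 kHz
upsampled = upsample_linear(clip, 44_100, 88_200)
print(len(upsampled))  # 8 -- twice as many samples for the same slice of time
```

Doubling the rate doubles the sample count: the new in-between values are estimates, which is why converting production elements up to match a high-res session adds work without adding real detail.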
So that you don’t lose all of your multi-track project data when trying to save or close the project, remember this little tip: inserting MP3-encoded audio into a multi-track session full of 96/24 WAV or AIFF audio has been known to crash editors like Adobe Audition, and may crash others as well.
If you would like to add to this or discuss it a bit further, there is a section to leave comments at the bottom of this article.