Ever tried to get that ‘phone futz’ sound without a dedicated plugin? In this article Damian Kearns explains that there’s an unconventional approach to emulating that ‘phone sound’ in a digital audio workstation and it might well be superior to the way you’re currently doing it.
Phoning It In
A couple of years ago, I gained a new corporate client. For this client, I am tasked with recording voiceover for phone trees (where prerecorded voiceovers guide you through a company’s or institution’s phone directory options) and I confess, I have a lot of fun during these sessions. I’m typically paired with my good friend, Erica Kourous. We while away the hours reading and recording scripts and making each other giggle to keep things light during what can sometimes be lengthy sessions filled with corporate technobabble.
When I looked at the delivery requirements on the first couple of jobs, it struck me that I had never known phone tree audio is typically delivered to North American clients as 8 kHz, 16 bit .wav files or encoded using a μ-law codec. The frequency response of the files the clients require is typically 300 Hz-3400 Hz but can go as wide as 50 Hz to 4000 Hz these days, due to the adoption of a slightly more modern ITU-T standard known as G.711.1. This standard actually allows for even wider frequency response by adding a wideband extension layer on top of the core codec but really, for phone audio, we don’t need anything above 4 kHz to comprehend what’s said by a human or an AI voice emulation.
After I finished recording my first phone tree session, I stumbled upon the fact that Pro Tools actually allows me to directly export files as 8 kHz, 16 bit .wav, which is really cool and very handy. Other DAWs like Logic don’t go down quite that low but don’t worry, freeware like Audacity does, and if you ever find yourself having to deliver voiceover phone tree files, Audacity has a cornucopia of audio file formats to choose from when exporting. Audacity shames all the major commercial DAW software in this regard, probably owing to its open source development. It even offers sessions right up to a 384 kHz sample rate!
Why does any of this matter in relation to making a good sounding phone futz for use in a music or post mix? Well, signal flow is everything in audio so acknowledging the stages involved in making real phone audio helps us understand how to mimic it in our creative endeavors. By the way, for all you non-North American and non-Japan-based audio engineers, A-law is your codec of choice over μ-law if you are ever tasked with this sort of work for real.
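To make the codec stage a little less abstract, here is a minimal Python sketch of μ-law companding. It uses the continuous textbook curve with μ = 255. Real G.711 encoders use a piecewise-linear approximation of that curve, so the function names and the simple 8 bit rounding step below are illustrative assumptions, not a bit-exact codec.

```python
import math

MU = 255  # companding constant for the mu-law curve

def mu_law_compress(x):
    """Map a sample in [-1.0, 1.0] onto the continuous mu-law curve."""
    x = max(-1.0, min(1.0, x))
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def mu_law_expand(y):
    """Inverse of mu_law_compress."""
    y = max(-1.0, min(1.0, y))
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)

def mu_law_round_trip(x, bits=8):
    """Compress, round to a `bits`-wide code, then expand again (illustrative only)."""
    levels = 2 ** (bits - 1) - 1  # steps available per polarity
    y = round(mu_law_compress(x) * levels) / levels
    return mu_law_expand(y)
```

The point of the curve is that quiet samples get a bigger share of the available codes than loud ones: a sample at 1% of full scale already lands more than a fifth of the way up the compressed range.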
Talk Technobabble To Me
Since most DAWs won’t allow us to record directly at 8 kHz (and our talent would probably ruin our careers if we tried), we’ve got to record at a higher bit depth and sampling frequency. This is often 16 bit and 44.1 kHz for music and most typically 24 bit and 48 kHz for TV post and most film projects.
Now, the bit depth actually needs to be mentioned to understand what happens when we produce a phone file for commercial use. This bit depth can end up being as low as 13 bits, which would substantially increase our noise floor from a 24 bit original file, but perhaps the sampling rate matters more to our understanding of a good phone futz.
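For a rough sense of scale, the usual rule of thumb for an ideal linear quantizer fed a full-scale sine wave is a signal-to-noise ratio of about 6.02 dB per bit plus 1.76 dB. The sketch below simply evaluates that rule for the bit depths mentioned here. It ignores dither and the companding a real phone codec applies, so read the numbers as ballpark figures only.

```python
def quantization_snr_db(bits):
    """Theoretical SNR of an ideal linear quantizer with a full-scale sine input."""
    return 6.02 * bits + 1.76

for bits in (13, 16, 24):
    print(f"{bits} bit: about {quantization_snr_db(bits):.1f} dB")
```

By this estimate, going from 24 bits down to 13 bits raises the noise floor by roughly 66 dB.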
The Nyquist theorem states that to accurately reproduce an audio signal, the sampling frequency must be at least 2x the highest frequency we are hoping to reproduce. So 48 kHz means we get up to 24 kHz and 44.1 kHz gets us up to 22.05 kHz. Usually, there’s a high frequency brickwall filter employed so we top out around 20 kHz, where the filter stops higher frequencies from entering the digital audio we’re recording. But what happens if the telephone sampling rate is typically only about 8 kHz?
4 kHz is then our highest frequency and a steep brickwall filter is employed to prevent aliasing, which happens when frequencies above the Nyquist limit aren’t filtered out and fold back into the audio as unwanted artefacts. If you don’t know about this stuff, read an article like this one. It’s worth 5 minutes of your time. In short, some smart people figured out that if a frequency isn’t sampled at least twice per cycle, we can end up with frequencies and artefacts we don’t want so it’s best to omit them from our audio recordings and streams.
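The folding behaviour is easy to put numbers on. This small sketch (the function name is just an illustrative choice) works out where a pure tone would land if it were sampled without any anti-aliasing filter in front of the converter.

```python
def alias_frequency(freq_hz, sample_rate_hz):
    """Frequency a pure tone lands on after sampling with no anti-alias filter."""
    nyquist = sample_rate_hz / 2
    folded = freq_hz % sample_rate_hz
    return sample_rate_hz - folded if folded > nyquist else folded

for tone in (3000, 5000, 7000, 9000):
    print(tone, "Hz ->", alias_frequency(tone, 8000), "Hz")
```

At an 8 kHz sampling rate, a 3 kHz tone stays put, but a 5 kHz tone comes back as 3 kHz and a 7 kHz tone comes back as 1 kHz, which is exactly the kind of unwanted content the brickwall filter is there to prevent.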
Let’s get going on this futz. If you don’t have your own audio, Erica has us covered. These downloadable files ought to help test things out and since the script she’s reading is mine, you’re free to test away. In this article, I’ll be using either free, stock or cheap software to demonstrate how easy it is to get a good, authentic sounding phone effect.
Bye Bye, Fidelity
Step 1, after duplicating the track or playlist containing the original un-futzed audio, is creating that 8 kHz, 16 bit file. In Pro Tools, a user can press SHIFT+COMMAND+K to export. If you don’t have a DAW that will export like this, check out Audacity. In the bottom left corner you can dynamically change the sampling frequency of the session, then export your audio with the new sampling rate. What’s also cool about this program is that you can select a preference to dither down or not. This has audible implications that might well be desirable in some cases.
Regardless of how you create it, when you bring the 8 kHz, 16 bit file back into your DAW, it sounds like phone audio. It has an indefinable quality that simply using EQ filters and dynamic range compression cannot recreate. It’s gritty. It’s already mostly futzed. That’s thanks to resampling at such a low frequency; lots of samples dropped and lots of frequencies we’re used to hearing very clearly become a lot less defined. So much added noise and distortion. Think of taking a RAW photo on a DSLR and reducing it to a 100 kB file. Same idea.
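For anyone curious what an export like this involves under the hood, here is a minimal pure-Python sketch of a 48 kHz to 8 kHz conversion: low-pass filter first, keep every sixth sample, then round to 16 bit integers. It is not what Pro Tools or Audacity actually run. The windowed-sinc filter, the 101 tap length, the cutoff at 45% of the new sample rate and the lack of dither are all assumptions chosen to keep the example short.

```python
import math

def lowpass_kernel(cutoff_hz, sample_rate, taps=101):
    """Hamming-windowed sinc low-pass kernel, normalized to unity gain at DC."""
    fc = cutoff_hz / sample_rate  # cutoff as a fraction of the sample rate
    mid = (taps - 1) / 2
    kernel = []
    for n in range(taps):
        t = n - mid
        sinc = 2 * fc if t == 0 else math.sin(2 * math.pi * fc * t) / (math.pi * t)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (taps - 1))
        kernel.append(sinc * window)
    total = sum(kernel)
    return [k / total for k in kernel]

def downsample(samples, src_rate=48000, dst_rate=8000, taps=101):
    """Filter below the new Nyquist limit, then keep one sample in every `step`."""
    if src_rate % dst_rate:
        raise ValueError("this sketch only handles integer ratios")
    step = src_rate // dst_rate
    kernel = lowpass_kernel(0.45 * dst_rate, src_rate, taps)
    half = taps // 2  # taps should be odd so the kernel has a centre sample
    out = []
    for centre in range(0, len(samples), step):
        acc = 0.0
        for k, coeff in enumerate(kernel):
            i = centre + k - half
            if 0 <= i < len(samples):
                acc += coeff * samples[i]
        out.append(acc)
    return out

def to_int16(samples):
    """Round floats in [-1.0, 1.0] to 16 bit integers (no dither applied)."""
    return [max(-32768, min(32767, round(s * 32767))) for s in samples]
```

Feeding `downsample` a list of floats recorded at 48 kHz returns one sixth as many samples. Skipping the filter and simply keeping every sixth sample would let everything above 4 kHz alias down into the voice band.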
Here’s Erica, reading the script I wrote specifically for this article but now the audio has been reduced in quality through that 8 kHz, 16 bit export. Without employing a steep filter below 300 Hz, she sounds a bit muddy in the 8 kHz file. No problem, we’ll call up a filter that we can automate so that we can thin her out or thicken her back up during the mix. This will help with proximity perspective. A high cut filter could be used to pull a phone out of a pocket or something like that, so applying the filters after the 8 kHz export works extremely well to intentionally create presence or reduce intelligibility.
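As an illustration of the filtering stage, the sketch below implements a single second-order high-pass filter using the widely published Audio EQ Cookbook formulas. The 300 Hz default and the Q of 0.7071 are assumptions, and one stage only gives a 12 dB per octave slope, so a genuinely steep filter would cascade several of these or use a plugin's steeper setting.

```python
import math

def highpass_coefficients(cutoff_hz, sample_rate, q=0.7071):
    """Second-order high-pass coefficients, normalized so that a0 = 1."""
    w0 = 2 * math.pi * cutoff_hz / sample_rate
    alpha = math.sin(w0) / (2 * q)
    cos_w0 = math.cos(w0)
    a0 = 1 + alpha
    b = [(1 + cos_w0) / 2 / a0, -(1 + cos_w0) / a0, (1 + cos_w0) / 2 / a0]
    a = [-2 * cos_w0 / a0, (1 - alpha) / a0]
    return b, a

def highpass(samples, cutoff_hz=300.0, sample_rate=8000):
    """Run samples through the filter (direct form I)."""
    b, a = highpass_coefficients(cutoff_hz, sample_rate)
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b[0] * x + b[1] * x1 + b[2] * x2 - a[0] * y1 - a[1] * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out
```

Raising `cutoff_hz` thins the voice out and lowering it thickens the voice back up, which is the parameter you would automate in a mix.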
Distortion Is Compression
Step 2 is less obvious than it might first appear but adding distortion to something is a harsh but viable option for dynamic range limiting. Check out the picture above to see the before and after result of adding distortion to audio; it’s a lot less dynamic on the right hand side of the photo. It also sounds more like a phone speaker. I didn’t actually add much distortion at all but thanks to where I set the clipping threshold, all those wide dynamic peaks are now flat as a pancake.
In this example, I’ve just used a standard distortion plugin available to Pro Tools users to clip my audio a bit. Any distortion will do, so long as it can be adjusted to keep the clips ‘soft’. By this I mean, just lopping the tops off the loud peaks. Again, I’d automate this in real-time during a mix to add variance to the sound and emphasize any loud voices that might be coming over the phone.
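To see why clipping behaves like dynamic range limiting, here is a small sketch comparing a hard clip with a tanh-based soft clip, plus a crest factor measurement (peak level relative to RMS level). The tanh curve and the 0.5 threshold are illustrative assumptions. They are not the algorithm of any particular plugin.

```python
import math

def hard_clip(x, threshold=0.5):
    """Flatten anything beyond the threshold."""
    return max(-threshold, min(threshold, x))

def soft_clip(x, threshold=0.5):
    """Round peaks off with a tanh curve that never exceeds the threshold."""
    return threshold * math.tanh(x / threshold)

def crest_factor_db(samples):
    """Peak-to-RMS ratio in dB; lower values mean less dynamic audio."""
    peak = max(abs(s) for s in samples)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(peak / rms)
```

Running a sine wave through either clipper lowers its crest factor, which is the numerical version of those peaks going flat as a pancake.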
EQ Or Compress If You Must
As for Step 3, some people really like to EQ and compress their phone futzes. I do this too sometimes if I need to smooth out mix levels or carve out a pocket for phone audio. It can be fun to squash someone’s upper mid frequencies as a phone is going into a pocket or being flung off a bridge. In this case, EQ or compression can be as effective as they are corrective.
Reverb?
Step 4 is all about putting the phone in a space. Use as needed or desired. If you don’t need reverb, your futz is already done! Only thing left to do is look up the word ‘futz’ in the dictionary.
Futzing It Up
Programs like McDSP’s Futzbox or Audio Ease’s Speakerphone 2 are regular workflow options for me because of their layered approaches to creating and manipulating futzes. Occasionally, I work with subcontractors who don’t have either of these programs, so what I get from them has band-limited EQ applied, as well as heavy dynamic range compression. This works, sort of, but not really. As we saw and heard with the 8 kHz downsampling and distortion, it’s more realistic to drop samples and clip audio to mimic a phone.
In the absence of Futzbox or Speakerphone 2, or as an alternative, I use my method of exporting and importing back 8 kHz files because of the added dimension the lack of samples gives to the futz. With the various stages of real-time effects processing I employ, I have some cool options for mixing. For even more fun, I add Pixelator from Joey Sturgis Tones to bitcrush and make things sound like interference is happening. This layer is also best automated in real-time because sometimes clients, like audio engineers, want to play around with the shape and amount of effects processing. There is a wide range of bit crushing software out there so if you’re into it, I’m sure you have your own favourite tool.
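For readers who haven't met bit crushing before, the basic idea fits in a few lines: round each sample to a small number of levels and hold each value for several samples. This is a generic sketch with hypothetical parameter names (`bits`, `hold`), not a description of how Pixelator or any other commercial plugin works internally.

```python
def bitcrush(samples, bits=6, hold=4):
    """Quantize to `bits` of resolution and repeat each value for `hold` samples."""
    levels = 2 ** (bits - 1)
    out = []
    held = 0.0
    for i, s in enumerate(samples):
        if i % hold == 0:
            # take a new reading, rounded to the nearest available level
            held = max(-1.0, min(1.0, round(s * levels) / levels))
        out.append(held)
    return out
```

Lower `bits` values add more grit and higher `hold` values add more of that stepped, digital interference character, so both are natural candidates for automation.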
If you’re into capturing impulse responses, IRs are another interesting way to produce a futz. Here are a couple of cheap options for using IRs.
Waves’ IR-L can import impulses from a .wav file simply and cheaply enough and you can even vary the decay. If you own iZotope’s Trash 2, you can use the same .wav files you might in the Waves plugin, with very different results. For Trash 2, here’s a cool ‘hack’ to add IRs to the program:
Band-limit some pink noise to 300 Hz to 4 kHz so no other frequencies are involved in the IR creation. Play the band-limited pink noise through a phone. I do this by uploading the pink noise to Dropbox or my Music library. Record it from the desired perspective with your choice of mic. Trim this newly recorded futzed pink noise file and export it at whatever sampling frequency works for you. I’ve done a variety of these files at various lengths, some with room decay included and others with the decay trimmed off to dry up the model. Name each and every file you make so it’s clear which one is what. Then, locate your Trash 2 “Impulses” folder.
On a Mac, it’s in Users/Username/Documents/iZotope/Trash 2/Impulses. On my system, I created a subfolder in there called “Damian”. This then shows up in Trash 2 when I hit the Convolve button. This is where I can apply my new IR convolution. As I said before, these same files work elsewhere; in Waves IR-L and probably even Altiverb, though I haven’t checked this last option out.
If you’re having trouble with Trash 2 loading your IRs, this article explains how to reload the IRs so Trash 2 will read them.
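Convolution itself, the operation these IR loaders perform, is simple to write down even if real plugins use much faster FFT-based methods. The sketch below is the slow, direct form in plain Python, with `dry` and `impulse` as hypothetical lists of float samples. It is fine for short test signals but far too slow for real sessions.

```python
def convolve(dry, impulse):
    """Direct-form convolution of a dry signal with an impulse response."""
    out = [0.0] * (len(dry) + len(impulse) - 1)
    for i, d in enumerate(dry):
        if d == 0.0:
            continue  # silent samples contribute nothing
        for j, h in enumerate(impulse):
            out[i + j] += d * h
    return out
```

Every sample of the dry voice triggers a scaled copy of the recorded response, which is how the tone of the phone speaker gets stamped onto clean audio.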
I’m Hanging Up Now
As with many things ‘audio’, there’s an easy way to do something and then there are other more creative or expensive ways to do the same thing well or better. As a guy who used a hardware futz device early in my career, I can tell you that was about as easy as it gets. But just like anything I do in a mix, I try to find the best sounding approach that works perfectly for the given situation. Far be it from me to ever phone it in (which also works as a futz!).
Photo by Pixabay