Netflix have just announced that they are improving the quality of the audio that they deliver to their streaming service subscribers.
Not long after Scott Kramer joined Netflix as Manager, Sound Technology | Creative Technologies & Infrastructure, he and the Duffer brothers were reviewing ‘Stranger Things 2’ in a living room environment, as the brothers like to check how viewers will experience their work. At one point in the first episode, they found that a car chase scene didn’t sound as crisp as it had on the mixing stage.
Even though Scott was new in the post, he is reported as saying, “A lot of it was mushy.” Words like “mushy” and “smeared” are ones that Scott and his team found themselves using to describe audio that just isn’t quite as crisp as it should be.
Stranger Things is a very popular series on Netflix, and Scott quickly realised this was something that needed to be “made right”. Netflix pulled in their engineering teams, determined to fix it no matter how much effort it took. The solution was to deliver a higher audio bitrate for Stranger Things 2, but rather than fix just this one series, they have been working hard to roll out improved audio more broadly.
It was an interesting example of the Netflix culture at work, doing what was needed to support their creative partners. Watch this video to hear the story from the perspective of their staff, including Scott Kramer.
Who Will Benefit?
Netflix tell us that most TV devices that support 5.1 or Dolby Atmos are capable of receiving better sound. Depending on your device and bandwidth capabilities, the bitrate you receive may vary:
5.1: From 192 kbps up to 640 kbps
Dolby Atmos: From 448 kbps up to 768 kbps for subscribers to their Premium plan
Netflix do expect these bitrates to evolve over time as they get more efficient with their encoding techniques.
For subscribers with bandwidth or device limitations, Netflix have also engineered an adaptive feature, which they tell us is similar to what they already do on the video side.
As you can see from the data rates above, the new Netflix high-quality sound feature is not lossless, but it is what they describe as “perceptually transparent”: listeners can’t reliably tell it apart from the studio master. Based on internal listening tests, listening test results provided by Dolby, and scientific studies, Scott and his team determined that Dolby Digital Plus becomes perceptually transparent at 640 kbps and above. Going any higher brings diminishing returns, with more and more bandwidth spent on smaller and smaller quality improvements.
Having settled on 640 kbps, which Netflix say is a 10:1 compression ratio compared with a 24-bit/48 kHz 5.1-channel studio master, they set up a bitrate ladder for 5.1-channel audio ranging from 192 up to 640 kbps. This spans what Netflix describe as “good” audio through to “transparent” audio.
At the same time, they revisited their Dolby Atmos bitrates and increased the highest offering to 768 kbps.
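As a rough check on that 10:1 figure, the bitrate of an uncompressed master is simply bit depth × sample rate × channel count. The short Python sketch below makes three assumptions: the 5.1 master is plain PCM, it has six channels (five full-range plus LFE), and container overhead can be ignored. Under those assumptions, 640 kbps works out to about 10.8:1, consistent with Netflix’s rounded 10:1. The 192 kbps bottom rung works out to 36:1.

```python
# Assumptions: the 5.1 studio master is uncompressed PCM with 6 channels
# (five full-range plus LFE), 24-bit samples at 48 kHz; container overhead ignored.
BITS_PER_SAMPLE = 24
SAMPLE_RATE_HZ = 48_000
CHANNELS = 6


def pcm_kbps(bits_per_sample, sample_rate_hz, channels):
    """Uncompressed PCM bitrate in kbps (1 kbps = 1000 bits per second)."""
    return bits_per_sample * sample_rate_hz * channels / 1000


master_kbps = pcm_kbps(BITS_PER_SAMPLE, SAMPLE_RATE_HZ, CHANNELS)
print(f"Studio master: {master_kbps:.0f} kbps")  # Studio master: 6912 kbps

for encoded_kbps in (192, 640):  # bottom and top of the 5.1 bitrate ladder
    print(f"{encoded_kbps} kbps -> {master_kbps / encoded_kbps:.1f}:1")
# 192 kbps -> 36.0:1
# 640 kbps -> 10.8:1
```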
As Netflix put it in their tech blog post:
“Sound helps to tell the story subconsciously, shaping our experience through subtle cues like the sharpness of a phone ring or the way a very dense flock of bird chirps can increase anxiety in a scene. Although variances in sound can be nuanced, the impact on the viewing and listening experience is often measurable.
And perhaps most of all, our “studio quality” sound is faithful to what the mixers are creating on the mix stage. For many years in the film and television industry, creatives would spend days on the stage perfecting the mix only to have it significantly degraded by the time it was broadcast to viewers. Sometimes critical sound cues might even be lost to the detriment of the story. By delivering studio quality sound, we’re preserving the creative intent from the mix stage.”
How Much Better Does It Sound?
Unfortunately, we haven’t been privy to the tests that Netflix have undertaken, but they have run some demos for journalists, including Lauren Goode from wired.com. We will let Lauren describe her experience in her own words…
“I was surrounded by at least 10 ATC (read: very expensive) speakers, while a large monitor in front of me offered a visual representation of the audio upgrade I was about to experience. A Netflix engineer played a simple, 20-second track of studio applause and toggled between the near-perfect studio master track, a track encoded at the old bit rate of 192 Kbps, and a track encoded at the new optimal bit rate of 640 Kbps.
I’m not an audio engineer or even much of an audiophile, but the applause track streamed at 640 Kbps sounded uncannily close to the studio master track and offered a crispness that the 192 Kbps stream couldn’t match.
Then Netflix’s engineers showed off an even more extreme example. They streamed the scene from Stranger Things in which the character Will first discovers the Upside Down. In that clip, an insect buzzes in the background, a small noise that adds volumes to the eeriness of the scene. When the audio was streamed at 192 Kbps, the insect was muffled, almost muted. On the higher-quality streams, it was much more obvious. And sure, the highly controlled demo I got was done on an optimal audio setup, but that bug sounded like it was buzzing right by my ear.”
Lauren’s experience, especially with the buzzing insect, leads us to ask how much this will do for dialog intelligibility. If it makes a noticeable difference there, that will be great news.
Adaptive Streaming For Audio
Since Netflix launched their streaming service, they have used static audio streaming at a constant bitrate. This approach picks the audio bitrate once, based on network conditions at the start of playback, and keeps it for the rest of the session. Bizarrely, they spent years optimising their adaptive streaming engine, but until now they only applied it to video.
With adaptive streaming for audio, the audio quality can now adjust to the available bandwidth during playback, just as video does.
For example, consider a session where throughput suddenly drops. Netflix can now select a higher audio bitrate while network conditions support it and then gracefully switch the audio bitrate down when throughput falls. By maintaining healthy audio and video buffer levels, they avoid the dreaded spinning wheel that denotes a ‘rebuffer event’. Interestingly, they claim they can now maintain a higher video bitrate than in a similar case using static audio streaming.
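Here is a toy Python sketch of joint audio/video bitrate selection. It is not the algorithm Netflix use, whose details the article doesn’t describe. Apart from the 192 and 640 kbps audio endpoints quoted above, everything here is hypothetical: the ladder rungs, the 0.8/0.5 safety factors and the 10-second low-buffer threshold. The sketch spends only a fraction of the measured throughput and shrinks that fraction when the buffer runs low. Among the pairs that fit, it prefers the highest video rung and then the highest audio rung. Passing a single-rung audio ladder reproduces the static-audio baseline.

```python
from dataclasses import dataclass

# Hypothetical ladders in kbps. Only the 192 and 640 kbps endpoints of the 5.1 audio
# ladder come from the article; the other audio rungs and all video rungs are made up.
AUDIO_LADDER_KBPS = (192, 256, 384, 448, 640)
VIDEO_LADDER_KBPS = (1050, 2350, 4300, 5800, 8000)


@dataclass(frozen=True)
class Selection:
    audio_kbps: int
    video_kbps: int


def select_bitrates(throughput_kbps, buffer_s, audio_ladder=AUDIO_LADDER_KBPS,
                    low_buffer_s=10.0):
    """Toy joint audio/video selector (hypothetical, not Netflix's algorithm)."""
    # Spend only part of the measured throughput; be more cautious when the
    # buffer is low, so downloads keep ahead of playback and avoid a rebuffer.
    safety = 0.8 if buffer_s >= low_buffer_s else 0.5
    budget = throughput_kbps * safety

    # Every (video, audio) pair whose combined bitrate fits the budget.
    fitting = [(v, a) for v in VIDEO_LADDER_KBPS for a in audio_ladder if v + a <= budget]
    if not fitting:
        # Nothing fits: fall back to the lowest rungs.
        return Selection(audio_kbps=audio_ladder[0], video_kbps=VIDEO_LADDER_KBPS[0])

    # Tuples compare element by element: highest video first, then highest audio.
    video, audio = max(fitting)
    return Selection(audio_kbps=audio, video_kbps=video)


# Same throughput, adaptive audio vs. audio fixed at 640 kbps (the static baseline).
print(select_bitrates(7500, buffer_s=20))
# Selection(audio_kbps=192, video_kbps=5800)
print(select_bitrates(7500, buffer_s=20, audio_ladder=(640,)))
# Selection(audio_kbps=640, video_kbps=4300)
```

Consider 7,500 kbps of throughput with a healthy buffer. The adaptive version keeps video at 5,800 kbps by dropping audio to 192 kbps, whereas the static 640 kbps baseline has to settle for 4,300 kbps video. That mirrors, in miniature, Netflix’s claim about maintaining higher video bitrates. A real player would trade audio and video quality off more carefully than this strict video-first rule, and would also avoid switching too often.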
It Wasn’t A Matter Of Just Turning Up The Data Rates
You might be wondering why we are talking about this now. After all, Stranger Things 2 came out quite a while ago. The answer is that Netflix had to be sure these changes would work out in the field. With all their subscribers, Netflix have hundreds of millions of TV devices in the field, with different CPU, network and memory profiles, and adaptive audio had never been certified on them. Would these devices even support switching between audio streams during playback?
To find out, they tested adaptive audio switching on all Netflix-supported devices. They also added adaptive audio testing to their certification process so that every newly certified device can benefit from it.
Once they knew that adaptive streaming for audio was achievable on most TV devices, they had to answer the following questions as part of the algorithm design process:
How could they guarantee that they would improve subjective audio quality without degrading video quality, and vice versa?
How could they guarantee that they wouldn’t introduce additional rebuffers or increase the startup delay with high-quality audio?
How could they guarantee that the algorithm would gracefully handle devices with different performance characteristics?
They answered these questions through experimentation, fine-tuning the adaptive audio streaming algorithm to get the best possible audio quality without harming the video experience. In all, it took them a year to answer these questions and roll out adaptive audio streaming on the majority of TV devices, and now we can all enjoy the results.