We have become aware that a growing number of people believe that Object Based Audio means Dolby Atmos and Dolby Atmos means Object Based Audio. Although Dolby Atmos is an application of object based audio, it is by no means the only one. In this article we are going to explore some of the other applications of object based audio.

What Is Object Based Audio?

Let’s start with Dolby Atmos. A Dolby Atmos mix is different from what has gone before. In a conventional mix, the positional information is effectively baked into the linear tracks that make up the mix. Formats like Dolby Atmos often combine linear tracks, called bed tracks in the Dolby Atmos workflow, with individual sounds called objects, where each object has accompanying metadata that tells the system where in the soundfield to position that sound. This lets you represent the content in a way that suits the playback environment, whether it is a huge cinema theatre or someone’s front room. Even better, if the delivery is just stereo or binaural, the system can use the metadata to produce a mix for that delivery format. The Dolby system takes the metadata and, combined with its knowledge of the speakers attached to it, plays back the full programme, bed tracks and objects, so that the mixer’s intent is respected as well as the space it is being played back in.
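To make the bed-plus-objects idea more concrete, here is a minimal Python sketch of how an object’s position metadata could be used to render it into a stereo output. It assumes a hypothetical `AudioObject` carrying a single azimuth value and uses a textbook constant-power pan law. It is not how Dolby’s renderer works; that is proprietary and handles full 3D speaker layouts.

```python
import math
from dataclasses import dataclass


@dataclass
class AudioObject:
    name: str
    samples: list[float]  # mono PCM samples, same length as the bed
    azimuth_deg: float    # hypothetical metadata: -30 = hard left, +30 = hard right


def pan_gains(azimuth_deg: float, spread_deg: float = 30.0) -> tuple[float, float]:
    """Constant-power (left, right) gains for an object at the given azimuth."""
    p = max(-1.0, min(1.0, azimuth_deg / spread_deg))  # normalise to -1..+1
    theta = (p + 1.0) * math.pi / 4.0                   # map to 0..pi/2
    return math.cos(theta), math.sin(theta)


def render_stereo(bed_left, bed_right, objects):
    """Mix positioned objects into a stereo bed using their position metadata."""
    left, right = list(bed_left), list(bed_right)
    for obj in objects:
        gl, gr = pan_gains(obj.azimuth_deg)
        for i, s in enumerate(obj.samples):
            left[i] += gl * s
            right[i] += gr * s
    return left, right


# A single object panned hard left into a silent stereo bed.
bee = AudioObject("bee", [0.5, 0.5, 0.5], azimuth_deg=-30.0)
left, right = render_stereo([0.0] * 3, [0.0] * 3, [bee])
```

The constant-power law keeps the object’s perceived loudness roughly steady as it moves across the image, because the squared left and right gains always sum to one. The same object and metadata could equally be fed to a renderer for a different speaker layout, which is the point of keeping position separate from the audio.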

Object Based Audio Is More Than Dolby Atmos

But object based audio is not just about Dolby Atmos and DTS:X. Object based audio can also be used to deliver content in a way that lets end users adjust the balance between its elements themselves. MPEG-H Audio, for example, offers interactive and immersive sound using audio objects, height channels and Higher-Order Ambisonics, for other types of distribution including OTT services, digital radio, music streaming, VR, AR and web content. Personalised audio delivery systems built on standards such as MPEG-H Audio and Dolby’s AC-4 are now being offered, enabling end users to choose what they want to hear or not hear. In tennis, for example, maybe you don’t want to hear the shrieks from a player? You will have the option to turn them down.
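Personalisation of this kind, such as turning down a tennis player’s shrieks, comes down to applying a listener-chosen gain to each object before the objects are summed. The sketch below assumes hypothetical object names and a plain dictionary of per-object gains in decibels; real MPEG-H Audio or AC-4 decoders expose this through their own interactivity controls, which this does not attempt to reproduce.

```python
def db_to_linear(db: float) -> float:
    """Convert a gain in decibels to a linear amplitude factor."""
    return 10.0 ** (db / 20.0)


def personalised_mix(objects: dict[str, list[float]],
                     user_gains_db: dict[str, float]) -> list[float]:
    """Sum the object signals after applying the listener's own gain to each one.

    Objects the listener has not touched play at the default level (0 dB).
    """
    length = len(next(iter(objects.values())))
    out = [0.0] * length
    for name, samples in objects.items():
        gain = db_to_linear(user_gains_db.get(name, 0.0))
        for i, s in enumerate(samples):
            out[i] += gain * s
    return out


# Hypothetical tennis broadcast: the viewer turns the player vocalisations down by 20 dB.
objects = {
    "commentary": [0.2, 0.2],
    "court_fx": [0.1, 0.1],
    "player_vocalisations": [0.5, 0.5],
}
mix = personalised_mix(objects, {"player_vocalisations": -20.0})
```

Because the objects arrive separately, the attenuation only touches the vocalisations; commentary and court effects are unaffected, which would be impossible once everything had been baked into a single channel-based mix.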

Audiences want to enjoy our programmes everywhere. With mobile devices, they might start watching or listening to a programme at home and then finish the rest on the bus. Object-based media allows the mixer to specify different audio mixes for different environments - if people are listening on the move, with object based audio you can make sure that the sound is just right for them - whatever their surroundings.

Object Based Media - A Definition

With that in mind, let’s take a look at a broader definition of Object based media…

Object-based media allows the content of programmes to change according to the requirements of each individual audience member. The ‘objects’ refer to the different assets that are used to make a piece of content. These could be large objects: the audio and video used for a scene in a drama – or small objects, like an individual frame of video, a caption, or a sound effect. By breaking down a piece of media into separate objects, attaching meaning to them, and describing how they can be rearranged, a programme can be changed to reflect the context of an individual consumer.

But this workflow requires a paradigm shift on the part of content creators.

Audio becomes an object when it is accompanied by metadata that completely describes its existence, position and function. An audio object can therefore be the sound of a bee flying over your head, the crowd noise, or the commentary for a sporting event in any language. All of this remains fully adjustable at the consumer’s end to suit their specific listening environment, needs and preferences, regardless of the device.
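As a rough illustration of metadata that describes an object’s existence, position and function, here is a sketch using hypothetical Python dataclasses. The field names and coordinate conventions are assumptions made for illustration; they are far simpler than, and not taken from, the ITU Audio Definition Model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Position:
    azimuth_deg: float     # hypothetical convention: 0 = front, positive = left
    elevation_deg: float   # 0 = ear height, 90 = directly overhead
    distance: float = 1.0  # normalised distance from the listener


@dataclass
class ObjectMetadata:
    name: str
    start_s: float                  # existence: when the object is present...
    end_s: float                    # ...and when it stops
    position: Position              # position: where it sits in the soundfield
    function: str                   # function: e.g. "effect", "crowd", "commentary"
    language: Optional[str] = None  # only meaningful for speech objects

    def is_active(self, t: float) -> bool:
        return self.start_s <= t < self.end_s


def active_objects(objects: list[ObjectMetadata], t: float) -> list[str]:
    """Names of the objects that exist at time t (in seconds)."""
    return [o.name for o in objects if o.is_active(t)]


scene = [
    ObjectMetadata("bee", 2.0, 5.0, Position(0.0, 80.0), "effect"),
    ObjectMetadata("crowd", 0.0, 600.0, Position(0.0, 0.0, 5.0), "crowd"),
    ObjectMetadata("commentary_en", 0.0, 600.0, Position(0.0, 0.0),
                   "commentary", "en"),
]
```

With this structure, a playback device could ask which objects exist at a given moment, where to place them, and what role they play, for example to offer a commentary language choice based on the `function` and `language` fields.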

Here in the UK, the BBC has been leading the way in object based audio research. It contributed to a new ITU recommendation, ITU-R BS.2125 ‘A serial representation of the Audio Definition Model’, published in February 2019, which lays out a specification for metadata that can be used to describe object-based, scene-based and channel-based audio. Even so, the BBC R&D team are under no illusions about the amount of work involved and, more importantly, the change in mindset needed. BBC R&D senior research engineer Andrew Mason explains…

“People’s interest in object-based broadcasting varies enormously depending on their level of understanding of it. In some areas, for example BBC Radio Engineering, it is the focus of a significant amount of effort, designing the next generation of radio broadcasting infrastructure. The impact on production areas – both TV and radio – is still modest, being limited at the moment to an underpinning technology for binaural productions, many of which have now been aired or published on the BBC website. [Meanwhile] the interest of programme commissioners and programme makers in the possibilities of personalisation – for speech/music balance control, as an example – is still being developed.”

Multi-Dimensional Audio And MPEG-H

Another development I heard about at AES NY 2018 is DTS’s licence-free open platform MDA (Multi-Dimensional Audio).

MDA has been developed to give the industry a way to create, export, author, store and render content in a format that goes beyond closed proprietary systems. MDA offers uncompressed PCM objects plus metadata, with the aim of becoming as universally accepted, and as easy and common to work with, as channel-based PCM audio is now.

MDA looks set to make it possible to carry different spatial audio formats in a single delivery format, opening up what has been a largely proprietary delivery landscape dominated by one brand. At a session at AES NY 2018, with representatives from both DTS and Dolby on the platform, there appeared to be a desire from both organisations to work together more closely to deliver spatial content to the consumer.

Another key enabler for delivering object based audio to the consumer has been the development of the MPEG-H Audio standard. MPEG-H Audio is already on-air in Korea and the US (ATSC 3.0), Europe (DVB UHD), and China.

One of the main organisations behind MPEG-H is Fraunhofer IIS, the German research institute. MPEG-H Audio is an audio system designed to deliver format-agnostic object based audio. At IBC 2018, Fraunhofer IIS demonstrated an end-to-end production-to-consumer system that included MPEG-H monitoring units for real-time monitoring and content authoring, post-production tools, MPEG-H Audio real-time broadcast encoders, and decoders in professional and consumer receivers. Fraunhofer IIS technical standards and business development senior manager Adrian Murtaza explained that with MPEG-H it is possible to offer…

“immersive sound to increase the realism and immersion in the scene, [as well as] the use of audio objects to enable interactivity. This means viewers can personalise a programme’s audio mix, for instance by switching between different languages, enhancing hard-to-understand dialogue, or adjusting the volume of the commentator in sports broadcasts.”

The view amongst industry experts is that along with Dolby’s new AC-4 format, which natively supports the Dolby Atmos immersive audio technology, MPEG-H is expected to have a significant impact on broadcast delivery services over the coming few years.

Object Based Audio Offers Opportunities For Smaller Companies Too

Object based audio is not just providing opportunities for the big boys like Dolby and DTS. Salsa Sound is a small business that has grown out of research initiatives at Salford University, just down the road from me. I have already referred to some of the Salford work on dialog intelligibility, where a team studied whether adding relevant sound cues improves intelligibility for both people with normal hearing and people with a range of hearing impairments. They used four experimental conditions, testing recognition of a particular keyword such as ‘sword’.

Machine Learning And Object Based Audio

Back to object based audio and the work of the team at Salsa Sound. Founders Ben Shirley and Rob Oldfield have developed a set of automatic mixing tools that work in both channel-based and object-based workflows. They have focused on live sport, where their machine learning engine automatically creates a mix of the on-pitch sounds without any additional equipment, services or human input, freeing sound supervisors up to create better mixes.

As someone who pioneered surround audio for sport here in the UK, mixing the first live football (soccer) broadcast in the UK in Dolby Pro Logic, I am very aware of the challenges. Around that time there was a move by production to have a more immediate and upfront sound for football. This meant many more microphones around the stadium. However, with so many more microphones we could no longer work with a relatively static FX mix. We couldn’t leave all the mics faded up; there was just too much noise and not enough focus. What I ended up doing was laying the pitch out across the desk, leaving most of the mics partially faded up, and then following the ball around the pitch, fading up the mic closest to the ball to try to get the close ball kicks that production were looking for. This made mixing a football match a full-on job, so a technology that could take away some of the intensity and free me up to be creative would be very appealing. Rob Oldfield explains…
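The manual technique described above, keeping most mics partially faded up and pushing up the one closest to the ball, can be written as a simple rule. This is a minimal sketch assuming hypothetical mic names, pitch coordinates in metres and fixed fader levels; it is not Salsa Sound’s method, whose details are not given here.

```python
import math


def follow_ball_gains(mic_positions: dict[str, tuple[float, float]],
                      ball_xy: tuple[float, float],
                      base_db: float = -15.0,
                      focus_db: float = 0.0) -> dict[str, float]:
    """Fader level (dB) for each pitch mic: every mic sits partially up at
    base_db, and the mic nearest the ball is pushed up to focus_db."""
    nearest = min(mic_positions,
                  key=lambda name: math.dist(mic_positions[name], ball_xy))
    return {name: focus_db if name == nearest else base_db
            for name in mic_positions}


# Hypothetical mic layout on a 105 m x 68 m pitch (x along the touchline).
mics = {
    "goal_left": (0.0, 34.0),
    "touchline_halfway": (52.5, 0.0),
    "goal_right": (105.0, 34.0),
}
levels = follow_ball_gains(mics, ball_xy=(90.0, 30.0))
```

A practical automated mix would crossfade between mics as the ball moves rather than switching instantly, to avoid audible jumps; the hard switch here just mirrors the rule of thumb a mixer follows by hand.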

“Our solutions not only create a mix for a channel-based world, but also allow for the individual objects to be broadcast separately with accompanying metadata from our optimised triangulation procedure which places all of the sounds in 3D space – even in a high noise environment – which helps facilitate immersive and interactive applications.”
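Salsa Sound’s optimised triangulation procedure is not described in detail, but the general idea of placing a sound from the times it reaches different mics can be illustrated with a textbook time-difference-of-arrival (TDOA) grid search. The function below is an illustrative assumption, not their algorithm: it assumes known mic positions, clean arrival times and a fixed speed of sound, whereas real stadium audio would need robust delay estimation (for example cross-correlation between mic signals) to cope with crowd noise.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 degrees C (assumed constant)


def locate_by_tdoa(mics, arrival_times, width=105.0, height=68.0, step=0.5):
    """Grid-search the pitch for the point whose predicted time differences of
    arrival (relative to mic 0) best match the measured ones, by least squares.

    mics: list of (x, y) mic positions in metres.
    arrival_times: time in seconds at which each mic heard the sound. The
    emission time is unknown, which is why only differences are compared.
    """
    best, best_err = None, float("inf")
    for ix in range(int(width / step) + 1):
        for iy in range(int(height / step) + 1):
            point = (ix * step, iy * step)
            d_ref = math.dist(point, mics[0])
            err = 0.0
            for m in range(1, len(mics)):
                predicted = (math.dist(point, mics[m]) - d_ref) / SPEED_OF_SOUND
                measured = arrival_times[m] - arrival_times[0]
                err += (predicted - measured) ** 2
            if err < best_err:
                best, best_err = point, err
    return best


# Simulate a ball kick at (30, 20) emitted at t = 1.0 s, heard by four corner mics.
corner_mics = [(0.0, 0.0), (105.0, 0.0), (0.0, 68.0), (105.0, 68.0)]
kick = (30.0, 20.0)
times = [1.0 + math.dist(kick, m) / SPEED_OF_SOUND for m in corner_mics]
estimate = locate_by_tdoa(corner_mics, times)
```

An exhaustive grid search is slow but easy to follow; production systems would typically solve the same least-squares problem with an iterative or closed-form method.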

What they have achieved with machine learning is a twofold solution. First, they have been able to identify where the ball is on the pitch and use this to automate the mixing of all the field mics. Secondly, the machine learning technology has been taught to identify not only the ball but also how hard it is being kicked, and to add automated ball kick foley on the fly, at last giving us the impact we have been struggling to achieve.

Simplifying The Mix For Listening In Challenging Environments

Another application of object based audio and personalised audio delivery is the research of Lauren Ward, a postgraduate audio engineering researcher at Salford University with a passion for broadcast accessibility. Lauren’s research has looked at a methodology in which the different audio objects in a piece of content are scored for how important each one is to the narrative. If an object is essential to the story, like the dialog or a door opening, it is scored as essential. Other sounds, like ambiences and music, which add to the narrative but without which you could still follow the story, are given progressively lower scores.

A single control then lets you adjust from a full normal mix through to an essential-only mix for people who are very hard of hearing. I had a chance to try this out on a visit to Salford University and found it very simple and intuitive, and scoring the objects would be very easy to do during the production process. You could use a rating system like the one Avid has in Pro Tools for rating clips: it would be very easy to add a narrative importance rating to the production process and then embed that metadata in the delivery stream.
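A single simplification control driven by narrative importance metadata could work along the lines of this sketch. The 0.0 to 1.0 importance scale and the fade-out curve are hypothetical choices for illustration, not the mapping used in Lauren Ward’s research: essential objects are never attenuated, and less important objects fade to silence earlier as the control moves from the full mix towards the essential-only mix.

```python
def importance_gain(importance: float, control: float) -> float:
    """Linear gain for one object under a single simplification control.

    importance: 0.0 (least important) to 1.0 (essential), set in production.
    control: 0.0 = full normal mix, 1.0 = essential-only mix.
    """
    if importance >= 1.0:
        return 1.0  # essential objects are never attenuated
    # Hypothetical curve: less important objects reach silence earlier.
    # Importance 0.0 is silent by control = 0.5, importance 0.9 by 0.95,
    # and every non-essential object is silent at control = 1.0.
    fade_point = (1.0 + importance) / 2.0
    return max(0.0, 1.0 - control / fade_point)


def simplified_mix(objects: list[tuple[list[float], float]],
                   control: float) -> list[float]:
    """objects: (samples, importance) pairs of equal length; returns the summed mix."""
    out = [0.0] * len(objects[0][0])
    for samples, importance in objects:
        g = importance_gain(importance, control)
        for i, s in enumerate(samples):
            out[i] += g * s
    return out


# Hypothetical drama scene: dialog (essential), a door (important), music (least important).
scene = [([0.3, 0.3], 1.0), ([0.2, 0.2], 0.7), ([0.4, 0.4], 0.0)]
```

With the control at 0.0 all three objects play at full level; by 0.5 the music has gone while the door is still partly audible; at 1.0 only the dialog remains. The listener only ever touches one control, and the production team’s importance scores decide the order in which things drop away.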

The single control interface is so much simpler than other personalised options where multiple level controls are presented for each of the objects like commentary, FXs, home crowd, away crowd etc.

The overall system could be a series of button options to choose which commentary feed the consumer wants and which crowd, home or away, plus one control that provides continuously variable simplification based on the ‘importance’ metadata.
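The button options could be handled as a selection step before the importance control is applied. In this hypothetical sketch, objects that share a group (such as the two crowd feeds) are alternatives, and the listener’s choices pick one variant per group. The names, groups and the fall-back to the first variant when nothing is chosen are all assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Obj:
    name: str
    group: Optional[str] = None    # objects sharing a group are alternatives, e.g. "crowd"
    variant: Optional[str] = None  # which alternative this is, e.g. "home" or "away"


def select_objects(objects: list[Obj], choices: dict[str, str]) -> list[str]:
    """Apply the listener's button choices: keep every ungrouped object and
    exactly one variant per group. A group with no choice uses its first variant."""
    selected, defaulted = [], set()
    for o in objects:
        if o.group is None:
            selected.append(o.name)
        elif o.group in choices:
            if o.variant == choices[o.group]:
                selected.append(o.name)
        elif o.group not in defaulted:
            selected.append(o.name)
            defaulted.add(o.group)
    return selected


programme = [
    Obj("pitch_fx"),
    Obj("commentary_main", "commentary", "main"),
    Obj("commentary_alt", "commentary", "alternative"),
    Obj("crowd_home", "crowd", "home"),
    Obj("crowd_away", "crowd", "away"),
]
```

Separating the discrete choices from the continuous simplification control keeps the interface small: a couple of buttons plus one knob, rather than a fader per object.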

This system works both for people who are hard of hearing, by ‘simplifying’ the mix so that the key narrative elements are clearly audible and not masked by less essential elements, and for people listening to content in challenging environments like the commute to work.

Object based audio offers consumers a lot more control, and it also gives content providers the technology to deliver one stream of object based content and then use the metadata to render the most appropriate version for the hardware the consumer is using to play back the content.

Over To You

Are you working with object based audio? What are the challenges in deciding what should be objects and what should remain beds in Dolby Atmos? Moving on to personalisation, how do you go about deciding what goes where? Do share your thoughts and observations in the comments section below.