Should the audio in augmented reality and immersive content merely meet audience expectations, or challenge them? Where do you draw the line between feeling like you’re in reality and feeling like you’re in a fantastical world?
Matthew Smith is a renowned video game developer and former audio director at Rockstar North, best known for his work on a variety of high-profile game titles.
Guns Sound As You’d Expect
That was the soul-crushing verdict on the audio in one review of Grand Theft Auto: San Andreas - the first video game I worked on, and one I had dedicated countless hours to, tweaking and tuning the array of weapons available to Carl Johnson. It felt like a year of my life wasted: nobody appreciated the craft and effort we’d all put in, and the audio details were taken for granted.
Pictures Taking The Credit? The Power Of Suggestion
But then another review heaped praise on the graphics, singling out the great-looking lightning during one specific mission. Except I knew there was no lightning: we’d only added “fake” thunder to make the scene feel more dramatic.
That’s when I first realised the subtle but awesome power of audio, a power that comes partly from the fact that people typically don’t notice it at all. We’d made the thunder sound so good that someone thought they saw lightning!
That kick-started my love of interactive audio, and over the following decade, working on GTA and Red Dead Redemption, we made good use of that power. A line of dialogue saying “this feels scary” tells you something is coming, but doesn’t really affect you. Drop the birdsong out of the ambience and make the wind whistle through the trees, and you actually feel tense.
Beyond driving mood, audio also provides a huge amount of information and context without occupying much mental bandwidth or precious screen real estate. Back to weapons: I remember the effort we put into making gunshots sound appropriately near or far, and as if they came from inside a building or from open space. That was a nice detail in single-player, but it’s in multiplayer that it really paid off. You subconsciously felt where the danger was coming from, and how much attention to pay to it, without cluttering the screen and without actively thinking about it. Guns sound as you’d expect...
At Krotos, we’re fascinated by the use of sound in cutting-edge creative projects, and nowhere more so than in Augmented Reality. Current early-stage AR efforts are starting to demonstrate its potential, but they’re very much just annotating the real world rather than interacting with it, or at most placing fairly static virtual objects so they don’t fall through the floor.
A New Generation
But as the technology advances, and as a generation grows up that has never lived without AR and has almost never seen the world un-augmented (just as today’s 20-year-olds have never known life without the mobile internet), we’ll go far beyond that, to an augmented world where what’s “real” and what’s not is almost impossible to tell apart, and to a large degree irrelevant.
And in that world, the augmented elements (people, user interfaces, artwork, virtual assistants) will all feel broken if they don’t interact believably, both coherently and intuitively, with each other, with the physical world, and with you, the user.
And that’s a multi-faceted problem. The lighting on an object will need to reflect, so to speak, reality; the object will need to respond physically to other objects; and it will need to fit into, and affect, the sonic landscape. You can make the effort to learn any new user interface, but we’re all born with millions of years of evolution that have taught us to instinctively recognise impossibly nuanced patterns in the way the world looks, behaves, and sounds. A virtual world that makes use of that instinct can be orders of magnitude richer and more complex, without needing to be learnt at all.
Integrating seamlessly with the real audio world will involve many elements. An early focus has been on spatialized audio in AR/VR, a problem very similar to ones we’ve had to solve in 3D video games for years. A virtual object will also need to sound appropriate for its surroundings: echoing in a tunnel, or sounding deadened through a wall. Again, these are similar challenges to those faced by complex video games.
But beyond that, the object will need to interact with the physical world. If a virtual ball bounces on a physical table, what should that sound like? It depends on the physical properties of both the ball and the table, and unlike in a video game, where someone designed the table, this is a real-world object that could be made of anything.
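For readers curious about what “spatialized” means in practice, the near/far and through-a-wall cues described above can be roughly sketched as a chain of gain and filter stages applied to a mono sound. The Python below is a minimal illustration under simple assumptions: an inverse-distance rolloff, constant-power stereo panning, and a one-pole low-pass filter plus a fixed gain cut as a stand-in for a wall. The function names and values (an 800 Hz cutoff, a 0.5 wall gain) are hypothetical; production game and AR audio engines use HRTFs, reverb and much richer propagation models.

```python
import math


def distance_gain(distance_m: float, ref_distance_m: float = 1.0) -> float:
    """Inverse-distance rolloff: gain halves (about -6 dB) each time the distance
    doubles. Sources closer than ref_distance_m are clamped to a gain of 1."""
    return ref_distance_m / max(distance_m, ref_distance_m)


def constant_power_pan(azimuth_rad: float) -> tuple[float, float]:
    """Map azimuth (-pi/2 = hard left, 0 = centre, +pi/2 = hard right) to
    (left, right) gains whose squares sum to 1, keeping loudness steady."""
    a = max(-math.pi / 2, min(math.pi / 2, azimuth_rad))
    theta = (a + math.pi / 2) / 2  # 0 .. pi/2
    return math.cos(theta), math.sin(theta)


def occlude(samples: list[float], cutoff_hz: float = 800.0,
            sample_rate: int = 48_000, wall_gain: float = 0.5) -> list[float]:
    """Crude occlusion: a one-pole low-pass (walls absorb highs more than lows)
    followed by a broadband gain cut. Values are illustrative, not measured."""
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    out, y = [], 0.0
    for x in samples:
        y += alpha * (x - y)  # move part-way towards the input, smoothing fast changes
        out.append(wall_gain * y)
    return out


def render_source(samples: list[float], distance_m: float, azimuth_rad: float,
                  behind_wall: bool = False) -> tuple[list[float], list[float]]:
    """Render a mono source to (left, right) channels, applying distance,
    direction and an optional wall between the source and the listener."""
    if behind_wall:
        samples = occlude(samples)
    g = distance_gain(distance_m)
    gl, gr = constant_power_pan(azimuth_rad)
    return [s * g * gl for s in samples], [s * g * gr for s in samples]
```

For example, `render_source(samples, distance_m=2.0, azimuth_rad=0.3, behind_wall=True)` returns channels that are quieter than the input, weighted slightly to the right, and duller in tone.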
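One standard way to make the ball-on-table question concrete is modal synthesis: an impact excites a handful of the surface’s resonant modes, each heard as a decaying sine wave, so the struck material sets the pitches and how long they ring, while the striking object and the impact speed set how strongly each mode is excited. The sketch below is a toy Python illustration, not a description of how Rockstar or Krotos tackle this; the mode tables, the `hardness` parameter and the `impact` function are invented for the example. Note that it simply assumes the table’s modes are known in advance, which is exactly what an AR app can’t assume about a real-world table.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Mode:
    freq_hz: float      # resonant frequency of the struck surface
    decay_per_s: float  # exponential decay rate: higher means a "deader" material
    amp: float          # relative strength of this mode


# Hypothetical mode tables, hand-picked for illustration rather than measured.
MATERIALS = {
    "wood": [Mode(220.0, 30.0, 1.0), Mode(580.0, 45.0, 0.5), Mode(1130.0, 60.0, 0.25)],
    "glass": [Mode(1500.0, 4.0, 1.0), Mode(3900.0, 6.0, 0.6), Mode(7200.0, 9.0, 0.3)],
}


def impact(surface: str, hardness: float, velocity: float,
           duration_s: float = 0.5, sample_rate: int = 48_000) -> list[float]:
    """Modal synthesis of one impact on `surface`.

    hardness (0..1) describes the striking object: mode k is scaled by
    hardness**k, so a soft ball mostly excites the lowest mode while a hard
    one also brings out the upper modes. velocity scales overall loudness.
    """
    n = int(duration_s * sample_rate)
    out = [0.0] * n
    for k, m in enumerate(MATERIALS[surface]):
        excite = m.amp * velocity * hardness ** k
        w = 2.0 * math.pi * m.freq_hz / sample_rate  # radians per sample
        for i in range(n):
            t = i / sample_rate
            out[i] += excite * math.exp(-m.decay_per_s * t) * math.sin(w * i)
    return out
```

With these invented tables, `impact("glass", 0.9, 1.0)` keeps ringing long after `impact("wood", 0.9, 1.0)` has died away, because the glass modes decay far more slowly.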
Video calls are an obvious application of AR. Current tech still feels stilted and awkward, but having the other person feel like they’re physically in the same room as you will surely help break down that wall and make conversations more natural.
But a clunky 3D avatar half-intersecting your coffee table, and sounding like it’s in an empty subway station rather than your living room, will immediately put up another wall, and be no better than today’s stilted interruption-fest.
Why Are Video Calls So Awkward?
Lag is part of it, of course, but many of the non-verbal cues our subconscious picks up on are lost when the other person doesn’t share your space. The human brain is astonishingly good at picking up on tiny details; it’s why the uncanny valley exists, and why incredibly nuanced details in virtual worlds actually matter.
It’s What You Don’t Notice That’s Important
Designing believable physical interactions (collisions, scrapes, fractures) for virtual objects in a large game is a bewilderingly hard problem. We invested a massive amount of time in solving it at Rockstar, and if you listen to the recent Red Dead Redemption, you’ll hear an amazingly complex and varied array of sonic interactions. All of that work means the player doesn’t consciously notice the sound of a half-full wooden barrel rolling over gravel and bumping into a brick wall, the liquid sloshing inside. It’s the not noticing that contributes so much to the immersion.
That level of detail is only possible in games with budgets running to hundreds of millions of dollars and half-decade-long development schedules. And it’s far more challenging still in an AR world, where you can never design the environment your app will be used in. AR takes hard design-time problems and turns them into even harder run-time problems.
We believe AR has the potential to transform people’s lives, and that looking back in 20 years, it will seem as important as the internet or smartphones. We also believe it shouldn’t take massive budgets and glacial development timescales for developers to contribute to it.
We want to put the power to make immersive experiences, tools, and stories (things we can’t even imagine yet) in the hands of everyone with a creative vision. We want to take the grunt work out of adding audio to AR projects, and let developers focus on functionality and creativity.
No Magic Bullet
We don’t have a product that magically does that yet. It’s early days for AR, and we don’t know exactly which problems need to be solved, or what other pieces of the puzzle need to be in place. But we have a number of important building blocks, and by speaking to some of the big players in the field, we’re beginning to focus on the most pressing problems and create concrete solutions.
The technology behind Reformer Pro is one of those building blocks, and an important one. I’ve spent well over a decade working in interactive audio, including developing plenty of innovative technologies, and I was simply amazed by Reformer when the team first demoed it to me. You don’t need to understand how it works to see how incredibly useful and powerful it is, but behind the scenes the tech is super-impressive, and my mind still boggles at the potential applications beyond the core product.
At its heart, Reformer takes one audio “performance” - say someone speaking, or a pattern of drum beats, or the sound of someone with a limp dragging their bad leg over a dirt track - and then recreates that same performance with a different palette of sounds - say a lion instead of a human voice, the components of a car crash instead of the drums, or cowboy boots through snow instead of sneakers over dirt.
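Reformer’s internals aren’t covered in this piece, so the following is only a toy Python illustration of the general idea of re-performing one recording with a different palette, not Krotos’s algorithm. It finds the moments where the input’s energy jumps (onsets) and how loud each one is, then triggers a sound from the new palette at each onset, scaled to match. The frame size, jump ratio, noise floor and all function names are hypothetical assumptions, and a convincing result would need far more than onset-triggered sample replacement.

```python
import math


def frame_rms(samples: list[float], frame: int = 512) -> list[float]:
    """RMS level of each complete, non-overlapping frame."""
    return [math.sqrt(sum(x * x for x in samples[i:i + frame]) / frame)
            for i in range(0, len(samples) - frame + 1, frame)]


def detect_onsets(samples: list[float], frame: int = 512, ratio: float = 2.0,
                  floor: float = 1e-3) -> list[tuple[int, float]]:
    """Return (start_sample, level) for every frame whose RMS exceeds a noise
    floor and jumps by more than `ratio` over the previous frame."""
    onsets, prev = [], 0.0
    for k, level in enumerate(frame_rms(samples, frame)):
        if level > floor and level > ratio * prev:
            onsets.append((k * frame, level))
        prev = level
    return onsets


def reperform(performance: list[float], palette: list[list[float]],
              frame: int = 512) -> list[float]:
    """Re-render `performance` by placing palette sounds at its onsets, cycling
    through the palette and scaling each sound's peak to the onset level."""
    out = [0.0] * len(performance)
    for n, (start, level) in enumerate(detect_onsets(performance, frame)):
        sound = palette[n % len(palette)]
        peak = max((abs(x) for x in sound), default=0.0) or 1.0
        for i, x in enumerate(sound[: len(out) - start]):  # trim at the end
            out[start + i] += x * level / peak
    return out
```

Fed a recorded footstep pattern and a palette of snow-crunch samples, this toy version would preserve only the timing and rough loudness of the steps, which is a small part of what makes a performance feel like the original.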
Let Sound Do The Heavy Lifting
It’s truly magical to hear it work, and I’m sure you can immediately imagine how it could play a role in AR. You bang your hands down on the table in excitement on an AR Skype call, and the person you’re talking to hears the water glass on their own table jiggle in front of them.
You shoot the space pirates over your shoulder without looking, and know you scored a direct hit, because the explosion reverberates through your kitchen the way your brain innately knows it should, half of the debris tinkling on your marble tiles and half bouncing on the wooden floor of your living room. You don’t need to turn around or glance at the radar; you don’t even think about it. You just know you hit it just in time, and that the burning wreckage isn’t too close to your feet.
You’re walking to a conference and it starts raining, and you hear the raindrops on your husband’s jacket as he walks beside you, even though he’s 1,000 miles away in your living room, on the other end of a FaceTime call. It’s that feeling of presence, as much as the words he says, that makes you feel close.
There’ll be plenty of “wow, listen to that” moments in AR, such as Childish Gambino swaggering through your living room rather than staying glued to your smartphone screen, or the X Factor panel sitting on your sofa and judging your singing (heaven forbid). But those aren’t what’s important for audio in AR.
Stand Out By Not Being Noticed
What’s important is the 99% of the time when people simply don’t notice it at all, yet it sells them on the reality, the physicality, and the believability of this brave new world.
Augmented Reality sounds as you’d expect.
What Do You Think?
Matthew Smith is a video game developer, former audio director at Rockstar North and a non-executive director at Krotos Audio, best known for his work on a variety of high-profile titles including Grand Theft Auto V, Max Payne 3, L.A. Noire, Red Dead Redemption, Midnight Club: Los Angeles, Grand Theft Auto IV and Grand Theft Auto: San Andreas.
Do you agree with Matthew’s view on the state of audio for Augmented Reality and Immersive Content? Please leave your opinion in the comments section below.