Adobe recently demonstrated one of those software products which periodically appear and open up genuinely new possibilities. I posted 18 months ago about plugins which do something not possible using traditional hardware. The plugins in that post were for the most part extensions of existing processes like EQ or noise reduction. However, as Alan Sallabank highlighted in this story about Adobe Voco, which includes an Adobe demo, Voco appears to be genuinely breaking new ground. Instead of using editing techniques to manipulate existing dialogue to change the meaning of spoken words, Voco offers the possibility of synthesising new words and phrases simply by typing, based on around 20 minutes of a person's speech. See the demonstration of Voco below:
The benefits of such a system are clear to see. Apart from inevitably heralding a new age of comedy (our public figures are really going to hate this software, which is going to turn everyone into a skilled impressionist), common workflow tasks such as last-minute changes, which might previously have necessitated a call back for a voice artist, will now be simple to fix in software. I’m not sure what this might mean for the incomes of voice artists but I’d be surprised if they came out of this as winners. Legal and profanity issues will be easily fixed before broadcast. But even the most casual observer will be aware of the parallel with Photoshop, which has made it impossible to take a still image as proof of anything and left many people wary of images posted online as potential fakes. As technology makes audio manipulation which was previously difficult, and often impossible, potentially routine, it will inevitably have an effect on our perception of evidence and therefore of truth.
The Adobe system needs to “learn” the characteristics of the speaker to be synthesised. The figure given in the demo was 20 minutes of dialogue from which to learn. Assuming this can be general speech and not specific tutorial phrases, finding 20 minutes of clean dialogue from whichever politician you wish to incriminate shouldn’t present too much of an issue!
In their demonstration Adobe were quick to confront the issue of ethics. They referred to a watermarking system. Detail was limited but hopefully it is a system which will be suitably resistant to removal. I immediately thought of DRM and SCMS as failed attempts to control what people do with audio recordings, attempts which couldn’t survive a simple analogue transfer. I’m sure Adobe’s watermarks are more sophisticated than that, but I’d be surprised if detecting tampered audio of this kind didn’t develop into a new area of forensic audio.
The concerns around faked audio have gone mainstream. The BBC recently reported on Voco.
The faking of still images is entirely routine. We now assume images of celebrities to be heavily doctored and, while this is indirectly harmful to many, once the “truth” represented by a photograph has become as arbitrary as it now is, blatant manipulation of the assumed truth of an image becomes commonplace and unremarkable.
A certain Mr Trump was recently involved in a perfect example of how damaging an audio recording can be. But in this “post truth” era, when we can’t trust text, still images or even audio recordings not to have been doctored, what does reliable evidence look like? Video perhaps?
Unfortunately video, while presenting significant barriers to manipulation and currently enjoying its status as too difficult to fake for most people to bother, isn’t far behind, and recent developments such as Face2Face show just how astonishing the results can be. Combined with the techniques found in Voco, changing dialogue in video projects in post could become routine. As with any early iteration of a technology, the output has a slightly fake feel but, in the same way as the Money For Nothing video led to Toy Story, I don’t think it’s going to be very long before we won’t be able to trust video any more than we can trust a still image today.
Of course there are positives: this will be a godsend for dubbing films into multiple languages. But part of me would be happy to stick with subtitles if it means I can still believe the news…