Amid the discussion of its image manipulation software at Adobe MAX (Sneak Peeks), Adobe unveiled Project VoCo. This incredible software lets you change the words in a piece of recorded dialogue simply by editing text.
This simple user interface belies an incredibly powerful speech manipulation engine. Not only can you edit dialogue by changing text, you can also generate words that didn't exist in the original recording.
In 2014, Andy Moorer shared his Visual Speech Editor project, which laid some of the groundwork for Project VoCo.
Adobe Audition has been featuring synthesized speech technology in the Generate Speech function since last year, which enables any TTS-compatible voice installed and licensed on the system to be used for generating speech directly in the waveform and multitrack environments.
Zeyu Jin’s Project VoCo builds on these concepts to provide what could be an incredible dialogue editing tool, and it has caught the attention of much of the industry for a variety of reasons.
Don't Talk Too Long
Project VoCo can't just generate convincing dialogue out of thin air: it needs around 20 minutes of the subject talking in order to form some kind of "voice print". This all sounds very 1960s science fiction, I know, but it's absolutely incredible, as this video from Adobe MAX (Sneak Peeks) shows:
Obviously, technology this powerful could be very dangerous in the wrong hands. As Jordan Peele says in the video, just as they are working very hard to make it sound perfect, they're working equally hard to make it detectable, through some form of watermarking.
This concern has led Adobe to release the following statement:
Project Voco was shown at Adobe MAX as a first look of forward looking technologies from Adobe’s research labs and may or may not be released as a product or product feature. No ship date has been announced.
I'm blown away by this leap in research, but we'll probably have to wait a while before I'm writing any hands-on reviews!