October 25, 2019

The Coming Age of Fake Faces and Voices

As AI and machine learning become better at reproducing human likenesses and speech, we wonder how society and the creative industries will cope once the technology becomes widespread. We look at the possible ramifications of Deepfakes and the lesser-known Adobe speech engine VoCo, dubbed “Photoshop for the voice”.

Deepfakes and VoCo

By now, the Internet is no stranger to Deepfakes, whether through hearing about their baser use cases or laughing our way through “re-cast” scenes from iconic films. The technology uses multiple images or footage of a person’s face to train an animated model that can be superimposed atop the original. But few seem to be aware of a similar and, arguably, more powerful technology: fake voices. When it was announced in 2016, VoCo was touted as Adobe’s “Photoshop for voice”, and while updates have been sparse since, similar platforms such as Lyrebird have stepped in.
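The swap described above is commonly built as an autoencoder with one shared encoder and a separate decoder per identity: at swap time, person A's face is encoded and then reconstructed with person B's decoder. Here is a minimal, untrained NumPy sketch of that wiring; all the names, dimensions and (random) weights are illustrative assumptions, not the actual code behind any Deepfake tool:

```python
import numpy as np

rng = np.random.default_rng(0)

def layer(n_in, n_out):
    # Random, untrained weights -- just to show the data flow.
    return rng.standard_normal((n_in, n_out)) * 0.01

FACE_DIM, LATENT_DIM = 64 * 64, 128        # flattened 64x64 face crop

W_enc = layer(FACE_DIM, LATENT_DIM)        # encoder shared by both identities
W_dec_a = layer(LATENT_DIM, FACE_DIM)      # decoder trained on person A's faces
W_dec_b = layer(LATENT_DIM, FACE_DIM)      # decoder trained on person B's faces

def encode(face):
    # Shared encoding: pose/expression information, ideally identity-agnostic.
    return np.tanh(face @ W_enc)

def swap_a_to_b(face_a):
    # The trick: encode A's face, reconstruct it with B's decoder,
    # yielding B's likeness in A's pose and expression.
    return encode(face_a) @ W_dec_b

face_a = rng.standard_normal(FACE_DIM)
swapped = swap_a_to_b(face_a)
print(swapped.shape)  # (4096,)
```

In a real system the encoder and both decoders are trained jointly on thousands of aligned face crops of each person; this sketch only shows why one shared latent space makes the swap possible.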

To get a feel for what VoCo can do, check out the video of the technology’s debut at Adobe MAX 2016. It shows the speech engine replicating the voice of comedian Keegan-Michael Key to make him “say” some funny but embarrassing things he never actually said, all from only about 20 minutes of his recorded speech, while his comedy partner Jordan Peele co-hosted. Coincidentally, Peele also made a PSA in which he provided the voice of a deepfaked President Obama in an effort to underscore a renewed need for media literacy in the age of Deepfakes.

Misinformation, Echo Chambers and Social Fallout

We’re continuing to keep a finger on the pulse of big data’s potential to amplify narratives, sway conversations and change culture for better or worse. Unfortunately, in the age of fake news, fact-checking is playing a losing game of cat and mouse with dubious content and straight-up misinformation.

We’ve always used a combination of technology and creativity, well-intentioned or malicious, to shape reality, whether that means “cheating” shots to get a certain look on a budget or doctoring media for libelous ends. Yet every generation has also had experts who keep us informed of how these things are done. What’s most worrying is that the tech is improving just as we’ve stopped listening: even when shown evidence against their beliefs, people dig in their heels and defend them.

Social media’s information silos and echo chambers threaten to become even worse once the average tech-savvy netizen is able to Deepfake and VoCo-lize with ease. When we lose that much more of our ability to trust our senses (something we’ve already been losing of late), even the most engaged of us grow despondent about the state of the world and eager to just shut everything off.

The Potential Creative Outcomes

All said, it would be cynical to conclude that the only uses for these technologies are nefarious ones. “Hate the player, not the game,” as they say, and we see a lot of potential for Deepfakes and VoCo to assist artists and creative workers.

For creatives providing their likenesses or voices and the people processing them, we see this new dynamic going one of several ways:

  • Quick Fixes: Not unlike Photoshop’s content-aware tools, Deepfake and VoCo-like technology can help patch up severe mistakes that can’t be fixed by conventionally editing the source material. This should lower the cost of reshoots and other production expenses, as Adobe originally claimed for VoCo.

As always, getting things done right the first time will prevail, and for that someone will still be thankful not to have to Deepfake- or VoCo-correct hours of poorly captured footage. Besides, the corrected result still might not replace the real thing (which is why practical film effects still have an edge over CGI in many cases).

  • Updated Terms: We imagine contracts will need updating down the line to prevent someone from creating derivative content from the images provided for a given project. For instance, an agency could create advertising materials out of video footage of us from, say, a music video, so long as we’ve signed off on it.

But as the legal stance on Deepfakes and similar content catches up, we could see the addition of key clauses stipulating something to the effect of: “the client shall not create new material generated by AI trained on the artist’s likeness, voice or previous work.” Or, if we allowed it, we could negotiate to be compensated according to how much content is generated, priced against a portion of our day rate (we’re going to assume Susan Bennett, the original voice of Siri, was paid handsomely for her efforts).
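One hypothetical way a clause like that could be priced is to charge a fraction of the day rate, pro-rated by how much synthetic content gets generated. The rate structure below is entirely our own invention, just to make the idea concrete:

```python
def generated_content_fee(day_rate, rate_fraction, generated_minutes,
                          session_minutes=480):
    """Hypothetical pricing: charge `rate_fraction` of the artist's day
    rate, pro-rated by minutes of AI-generated content relative to a
    standard 8-hour (480-minute) session."""
    return day_rate * rate_fraction * generated_minutes / session_minutes

# E.g. a $1,200 day rate, a 50% rate for AI-generated use,
# and 96 minutes of generated content:
fee = generated_content_fee(1200, 0.5, 96)
print(fee)  # 120.0
```

Whether the pro-rating runs on minutes, impressions or revenue would itself be a negotiating point; the function only illustrates the "paid per unit generated" shape of the deal.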

  • Composite People: If Generated Photos’ 100,000 Faces project (which produced that many portraits through machine learning) has taught us anything, it’s that AI is getting better and better at generating realistic likenesses of people, at least in portrait form. We can and should protect the rights to our unique selves and to content generated from them, but what if we contribute less than a thousandth of a generated person’s body or voice? Perhaps we could be entitled to a thousandth of the royalties, depending on the platform!
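That "thousandth of the royalties" idea is just a pro-rata split: each contributor to a composite person takes a share of the royalty pool proportional to how much of the generated likeness traces back to them. A sketch, with made-up names and numbers:

```python
def per_source_royalty(total_royalty, weights):
    """Hypothetical pro-rata split: each contributor to a composite
    person receives a share of `total_royalty` proportional to their
    weight in the generated likeness."""
    total_weight = sum(weights.values())
    return {name: total_royalty * w / total_weight
            for name, w in weights.items()}

# 1,000 equal contributors to a composite face, splitting a $10,000 pool:
shares = per_source_royalty(10_000, {f"artist_{i}": 1 for i in range(1000)})
print(shares["artist_0"])  # 10.0
```

The hard part, of course, isn't the arithmetic but attributing the weights: today's generative models don't report how much any one training face contributed to an output.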

The Takeaway: A Re-Shuffling of the Creative Landscape

All in all, we still don’t know how much machine-generated personalities will change the creative landscape, but we doubt the result will be a clear-cut net positive or negative. Take our previous example of digital clothing collections made for the ’gram: in cases like these, the designer keeps their job, the pattern maker loses theirs, and the 3D modeler posing outfits onto customer photos gains a new one.

Even once we reach the stage where fully posable, photorealistic digital people speak with text-to-speech that nails personality, we predict the most respected work and its creators will continue to pride themselves on employing, connecting with and collaborating with real humans who can think for themselves, rather than with models that simply do or say what they’re programmed to.
