The Zao app, by Changsha Shenduronghe Network Technology Co Ltd, was released on the Chinese iTunes store a week ago and was popularized in a tweet by Allan Xia.
It is not currently available through iTunes in the U.S., but with a bit of hard work I was finally able to install a copy. I was concerned that the capabilities of the app might be exaggerated, but it actually exceeded my expectations. As a novelty app, it is fascinating. As an indicator of the current state and future of deepfakes, it is a moment of titanic proportions.
A year ago, when the machine learning tool FakeApp was released, a decent deepfake took tens of hours and some fairly powerful hardware to generate. The idea of creating one in less than 30 seconds on a standard smartphone seemed remote at the time, if not impossible.
The Zao app also does some nice things I've never gotten to work well with deepfakes/faceswap or DeepFaceLab, for instance handling facial hair.
… or even no hair. (This is also a freaky way to see what you’ll look like in 15-20 years.)
What is particularly striking is the way it handles movement and multiple face angles, as in this scene from Trainspotting and one with a young Obi-Wan Kenobi. In the very first scene, it even skips over several faces and automatically targets the particular one you specify. (In other snippets that include multiple characters, the Zao app lets you choose which face you want to swap out.)
All this indicates that the underlying algos are quite different from the autoencoder-based ones of last year. I have some ideas about how they have managed to generate deepfakes so quickly and from a much smaller set of data.
Back in the day, deepfakes required a sample of 500 source faces and 500 target faces to train the model. In general, the source images were random frames pulled from videos posted online. For the Zao app, there is a ten-second process in which selfies are taken of you in a few different poses: mouth closed, mouth open, head raised, head turned to the left, and blinking. By ensuring that the source images are the "correct" source images rather than random ones, they are able to make that side of the equation much more efficient.
While there is a nice selection of "target" videos and GIFs for face swapping, it is still a limited number (I'd guess about 200). Additionally, there is no way to upload your own videos (as far as I could tell with the app running on one phone in one hand and Bing Translator running on a second phone in the other; the app is almost entirely in Simplified Chinese). The limited number of short target videos may simply be part of a curation process to make sure that the face angles suit the technique: mostly facing forward and with good lighting. I suspect, though, that the quantity is limited because the makers of the Zao app have also spent a good amount of time feature-mapping the faces in order to facilitate the process. It's a clever sleight of hand, combined with amazing technology, used to create a social app people are afraid of.
The deeper story is that deepfakes are here to stay, and they have gotten really, really good over the past year. And deepfakes are like a box of chocolates. You can try to hide them because they are potentially bad for you. Or you can try to understand them better in order to 1) educate others about the capabilities of deepfakes and 2) find ways to spot them, either through heuristics or computer vision (CV) algorithms.
Consider what happened with Photoshopping. We all know how powerful this technology is and how easy it is these days to fake an image. But we don't worry much about it today because we all know it can be done. It is not a mysterious process anymore.
Making people more aware of this tech, even popularizing it as a way of normalizing and then trivializing it, may be the best way to head off a deepfake October surprise in the 2020 U.S. elections. Because make no mistake: we will all be seeing a lot of deepfakes in October 2020.