Last year I started to explore deepfakes. Initially I bumped into a low-quality real-time video stream deepfake project that was available on GitHub. Later on, in connection with a possible project, I started to explore high-quality deepfakes that leverage neural networks (artificial intelligence).
In this post I will go through some of the things I’ve learned about actual deepfakes and the ecosystem around them. When I write about deepfakes here, I am referring to faked video, in contrast to e.g. fake still pictures. If you are interested in deepfakes, maybe you’ll pick up some tips on how to start learning more about them. I am not a deepfake expert, so it is possible that some things stated here are not 100% correct. I kept non-technical readers in mind while writing this post. I could have written much more about deepfakes, but I am hoping that by keeping it shorter you will last long enough to read the post through.
At the end of the post there is a sneak peek of a video in which I used, in real time, a deepfake model I created of a quite well-known person. It is not the best quality I could have achieved, and I will explain why in more detail later in the post.
Deepfake use cases
Deepfakes have been around for some years. Mostly they have been seen either in publicity stunts or used for malicious purposes. Lately, though, some more or less beneficial use cases have been popping up.
You might have heard of Synthesia. Synthesia’s service creates a deepfake avatar of the customer. The customer can then order videos of themselves giving informative speeches etc. without physically taking part in the video creation process. Articles about the service have gone viral several times during the last year: recently the Wall Street Journal ran an article about it, and a bit earlier a well-known figure here in Finland showcased the service on Twitter. I don’t see any reason why Synthesia couldn’t have started out using the same software’s (DeepFaceLab) code base that I used for creating a deepfake. Of course, if that were the case, they have at least later forked the code and developed it to fit their needs. The point is, the tech is out there for anyone to take.
Disney has also been developing deepfake technology and has even patented a technology for creating high-resolution 1024px deepfakes. One of their use cases is apparently to be able to resurrect deceased actors in films.
I will not go into the malicious use cases of deepfakes here, but there are plenty: from porn to propaganda to all sorts of fraud.
Deepfake software
Two of the most popular applications for creating deepfakes are DeepFaceLab and FaceSwap. Both are open source: you can find the code bases on GitHub and freely modify them for your own needs or contribute to the existing projects.
From my perspective, the main difference between the two is that DeepFaceLab also has a “sister” application called DeepFaceLive. DeepFaceLive makes things more interesting because with it you can fake a person on a video stream in real time. That means the people who see the stream at the other end see a totally different person who moves and makes all the same expressions as the actual person in front of the camera. The deepfake model for DeepFaceLive has to be created (trained) with DeepFaceLab first.
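To give a rough idea of what the real-time part means in practice, below is a minimal sketch of the kind of loop such a tool runs: grab a frame from the webcam, swap the face, show the result, repeat. This is not DeepFaceLive’s actual code; the swap_face function is just a placeholder standing in for the trained deepfake model.

```python
# A minimal sketch of a real-time face-swapping loop, using OpenCV for the
# camera and the display window. swap_face() is a placeholder for the model.
import cv2

def swap_face(frame):
    # Placeholder: a real implementation would detect the face, align it,
    # run it through the trained model and blend the result back into the frame.
    return frame

cap = cv2.VideoCapture(0)                     # webcam as the source stream
while True:
    ok, frame = cap.read()
    if not ok:
        break
    faked = swap_face(frame)                  # every single frame gets swapped
    cv2.imshow("faked stream", faked)
    if cv2.waitKey(1) & 0xFF == ord("q"):     # press q to quit
        break
cap.release()
cv2.destroyAllWindows()
```

The whole illusion depends on this loop keeping up with the camera, something like 25-30 swapped frames per second, which is exactly where the GPU requirements come from.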
It took me perhaps some tens of hours to understand the technical basics of DeepFaceLab and training deepfakes. I’m a programmer; for a non-technical learner it would likely take longer. While learning this I also learned more about graphics processing units (GPUs from here on). GPUs are essential for training and running AI models. Bear in mind that when I was learning this stuff in late summer and early autumn of 2022, AI had not yet materialised into the biggest thing since the invention of the internet. Since then I’ve been learning some more about training and running AI models, which is what deepfakes are too.
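If you want to check what your own machine could manage, a quick sanity check like the one below tells you whether a training framework can see a CUDA-capable GPU at all. I use PyTorch here purely as an illustration; as far as I know DeepFaceLab itself runs on its own TensorFlow-based stack.

```python
# Quick check of whether a CUDA-capable GPU is visible to PyTorch.
import torch

if torch.cuda.is_available():
    print("GPU found:", torch.cuda.get_device_name(0))
else:
    print("No GPU found - training would fall back to the much slower CPU")
```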
If you want to create high-quality deepfake videos, the process is a bit different from creating a model for real-time video stream usage. When creating the videos you can, in a way, train the model separately for each target video. That way you can affect the end result even frame by frame, which makes the quality much better than it would be if faked in real time. With DeepFaceLab the best resolution you can achieve is 640px. With the other deepfake application, FaceSwap, you can get up to 1024px, but it cannot be used for live streaming. The better quality deepfakes you want, the more powerful a GPU you need. Powerful GPUs are expensive, with prices ranging from roughly 1,000 to 5,000 €. That narrows down the group of people able to train high-quality deepfake models.
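As a concrete example of where that per-video workflow starts, the snippet below splits a target video into individual frames with OpenCV. As far as I know DeepFaceLab does the same thing with ffmpeg under the hood; target.mp4 and the frames directory are just placeholder names.

```python
# Split a video into individual frames, one image file per frame. The face
# extraction, training and merging stages then work on these frame images.
import os
import cv2

os.makedirs("frames", exist_ok=True)
cap = cv2.VideoCapture("target.mp4")
count = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    cv2.imwrite(f"frames/{count:06d}.png", frame)
    count += 1
cap.release()
print(f"extracted {count} frames")
```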
Training the deepfake model for real-time usage
For a project, we were interested in running a deepfake model in real time with DeepFaceLive. That is why I set out to train a model that would be usable in real-time streaming. For test purposes the target was a famous Finnish person. For training I used video material that I found on the internet.
Training the deepfake model took maybe 2-3 weeks, during which I was also learning the process from tutorials. For the vast majority of that time, though, I was not actively doing anything; it was the GPU that was training the neural network on the target person’s facial features.
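For the curious, here is a heavily simplified sketch of what the GPU is actually grinding on during those weeks: a shared encoder plus one decoder per person, trained to reconstruct each person’s face from a common representation. The swap then amounts to encoding person A’s face and decoding it with person B’s decoder. The real DeepFaceLab models (SAEHD and friends) are far more elaborate; the layer sizes below are made up and random tensors stand in for the aligned face crops.

```python
# A toy version of the shared-encoder / two-decoder idea behind deepfakes.
import torch
import torch.nn as nn

def down(cin, cout):   # halve the spatial resolution
    return nn.Sequential(nn.Conv2d(cin, cout, 3, stride=2, padding=1), nn.ReLU())

def up(cin, cout):     # double the spatial resolution
    return nn.Sequential(nn.ConvTranspose2d(cin, cout, 4, stride=2, padding=1), nn.ReLU())

encoder   = nn.Sequential(down(3, 32), down(32, 64), down(64, 128))
decoder_a = nn.Sequential(up(128, 64), up(64, 32),
                          nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid())
decoder_b = nn.Sequential(up(128, 64), up(64, 32),
                          nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid())

params = list(encoder.parameters()) + list(decoder_a.parameters()) + list(decoder_b.parameters())
optimizer = torch.optim.Adam(params, lr=1e-4)
loss_fn = nn.L1Loss()

# In real training these would be batches of aligned face crops of the two
# people; random tensors stand in for them so the sketch runs as-is.
faces_a = torch.rand(8, 3, 64, 64)
faces_b = torch.rand(8, 3, 64, 64)

for step in range(100):   # real training runs for days or weeks, not 100 steps
    loss = (loss_fn(decoder_a(encoder(faces_a)), faces_a) +
            loss_fn(decoder_b(encoder(faces_b)), faces_b))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# The "fake": person A's expressions pushed through person B's decoder.
swapped = decoder_b(encoder(faces_a))
```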
I must say creating the model did feel a bit creepy. I stared for some tens of hours at tens of thousands of close-up frames of the person’s face and different facial expressions, and as if that was not enough, I even happened to bump into the person on the street during the process.
The deepfake model ended up being quite good, given that training had to be stopped earlier than needed for external reasons and that it was trained with a graphics card from 2017 (an NVIDIA GTX 1080 Ti). With a new top-tier NVIDIA card I could have trained a better-quality model with better settings.
You can also make videos with DeepFaceLive, but the quality is not as good as what you get when making them specifically for a particular video with DeepFaceLab. Below is an example of such a video, faked in real time with DeepFaceLive using the model I created. I could have used a better-quality source video with a higher pixel density, but with the old GPU that would have required the use of e.g. Adobe After Effects, which is not in my area of expertise. The clip is simply filmed with my mobile phone from the computer screen, which reduces the quality even more. The lag is there because the GPU cannot keep up with faking the source video. Still, the video gives you a glimpse of the quality you can achieve in a live stream even with a graphics card from 2017. The original clip is from a TEDx talk. You probably recognize who the deepfaked person is.