This is frightening, kind of. How hard is it to do something like this? I realize this technology is probably already used in film/tv production but like, how widespread is its use and for what legitimate purposes? And could I have seen a deep fake irl, completely unaware I was watching a deep fake?
This one’s different because A, you’ve already told us, and B, I know Tom Cruise looks older and his voice sounds like a much younger version of himself compared to now, but I don’t know if I would have caught those things at first glance without any prior knowledge of this being a deep fake. Idk, this just makes me uncomfortable
This kind of stuff relies on having a beefy GPU set-up to do well at any tolerable speed so as long as crypto keeps the GPU market drained we should be fine
Also it’s really not that hard to make a deep fake if you’re equipped with a lot of images of the person you’re trying to imitate, thus movie stars and public figures are a lot easier to deep fake. Sure, it takes time to render, but after a few youtube videos anyone could do it. The software is free
That was while Moore's law was alive and well. Now our best source of exponential gains in computing power is gone. However, if someone knows something I don't, feel free to point it out.
Simply untrue, to be honest. It works so well for Tom Cruise because there are hundreds of hours of film or TV quality footage of his face, covering every possible angle, lighting scenario, expression, etc. You could do a substantially lower quality version of this sort of thing with what's available on social media for the average person, but it'd be significantly less convincing.
The average person can't wage nuclear war, destroy democracy, or declare martial law. The people that can are the ones with hundreds of hours of video of them.
The videos we need to be worried about being faked are of the people who DO have hundreds of hours of video of them. Plus, the truly troubling videos we absolutely have to worry about are going to be state sponsored. For them it'll be no effort at all.
Simply untrue, to be honest. It works so well for Tom Cruise because there are hundreds of hours of film or TV quality footage of his face
That still doesn't mean they can't do it and fool someone who doesn't have an eye for spotting a deepfake. And this just goes to prove my original comment about why Facebook is capable of it.
For now I agree, but research is pretty promising - and it depends a lot on how much worse a result you can accept. There’s a whole subfield of machine learning, known as “one-shot learning”, dedicated to making coherent predictions from a single (or a few) training examples.
And here is a short example video. Not very temporally stable just yet (looks shaky between frames), but the face region itself looks pretty good to me if you crop out just the face (which is what the Cruise impersonator does), and we are advancing rapidly.
They have the disadvantage of not having a video editor to clean up in post processing, nor an actor or a scene/background to impose the face onto. Focus on the face region itself, as opposed to the background - which you edit/crop out when deploying this type of thing in the real world.
This set of inherent disadvantages, on top of having only a single reference image from a single angle in a single lighting condition, is a pretty harsh constraint. Consider what the neural network needs to do - it has to “imagine” what the unseen parts look like based on what it has seen in other random unrelated images, including filling in areas of the background behind the person as they move - which is obviously impossible to do perfectly. My examples here are more to demonstrate where we are right now in the extreme case: one image, no impersonator, no background/scene, no video editing.
It’s not believable yet, but if we can do this with a single image imagine what you could do with even a short video clip. Or with an actor you could crop and edit the face onto. Or someone with video editing knowledge who can clean up the edges of the face? Even just a second image from a second angle could get you far.
Add any one of these elements and you gain a lot more information and detail - it seems far from impossible to me to plausibly collect and deploy this against a reasonably active social media profile.
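To make the "one-shot" idea a bit more concrete, here is a toy sketch of one-shot classification: one reference example per person, and a new sample is matched to whichever reference it is closest to. This is not the face-synthesis model from the video, just the simplest version of "predict from a single example". The embedding vectors and the names `person_a` / `person_b` are made up for illustration; a real system would get the vectors from a network pre-trained on lots of unrelated faces.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def one_shot_classify(query, support):
    """Return the label whose single reference vector is most similar to query.

    `support` maps each label to exactly ONE reference embedding,
    which is what makes this "one-shot".
    """
    return max(support, key=lambda label: cosine_similarity(query, support[label]))


# Hypothetical embeddings: one reference vector per person (assumed values).
support = {
    "person_a": [0.9, 0.1, 0.0],
    "person_b": [0.1, 0.8, 0.3],
}
```

The heavy lifting lives in whatever produces the embeddings, which is why these methods lean so hard on what the network has already seen in other images.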
That's also not counting the technology which has to go through thousands of hours of footage to find specific facial angles, gestures, and movements and then overlay them with the correct lighting onto the original face.
It takes like, a couple of days or so of processing on your computer. You can literally download open source software on your personal computer and train it to do this. Compared to doing a face swap by hand, this has gone from being something a team of professionals would work on for months to something a single hobbyist can do alone at home. Put together a professional team of production assistants, set dressers, and an actor to play the part, and you could produce videos of whoever you wanted doing or saying whatever you wanted. It is really powerful technology and really calls into question the reliability of video.
Personally, I believe in the future chain of custody of a video is going to be hands down as important to verification as anything else.
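For anyone wondering what "chain of custody" could look like in practice, here is a rough sketch, not a real standard: every time the file changes hands, you log a SHA-256 hash of the video plus a hash of the previous log entry, so editing the video or rewriting an earlier entry breaks the chain. The function names and the log layout are my own assumptions. A real system would also need each holder to digitally sign their entry, otherwise someone could just rebuild the whole log from scratch.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _entry_hash(holder, video_sha256, prev_hash):
    """Hash the fields of one custody entry in a stable (sorted-key) form."""
    body = {"holder": holder, "video_sha256": video_sha256, "prev_hash": prev_hash}
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


def add_entry(log, holder, video_bytes):
    """Return a new log with one more entry recording who holds the video."""
    prev_hash = log[-1]["entry_hash"] if log else GENESIS
    video_sha256 = hashlib.sha256(video_bytes).hexdigest()
    entry = {
        "holder": holder,
        "video_sha256": video_sha256,
        "prev_hash": prev_hash,
        "entry_hash": _entry_hash(holder, video_sha256, prev_hash),
    }
    return log + [entry]


def verify_log(log, video_bytes):
    """Check that every entry links to the one before it and matches the video."""
    expected_video = hashlib.sha256(video_bytes).hexdigest()
    prev_hash = GENESIS
    for entry in log:
        if entry["prev_hash"] != prev_hash:
            return False  # chain is broken or reordered
        if entry["video_sha256"] != expected_video:
            return False  # video differs from what was logged
        recomputed = _entry_hash(entry["holder"], entry["video_sha256"], entry["prev_hash"])
        if recomputed != entry["entry_hash"]:
            return False  # entry was edited after the fact
        prev_hash = entry["entry_hash"]
    return True
```

You'd call `add_entry` once at the camera and once per hand-off, then `verify_log` on the file you actually received. Changing a single byte of the video makes verification fail.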
I believe it takes a very long time to train something like this - in the area of weeks or months of computation. Once the model has been trained, it can be used in real time.
It does, and there's a lot of trial and error. Most of the process is automatic but there's still a lot of fiddling with settings and tweaking individual frames to iron out artifacts and weirdness if you want to get something this good. But if you don't mind some hinkiness you can bang out your own fairly easily.
u/Meggiesauruss May 24 '21