Exploiting Prediction Error Inconsistencies through LSTM-based Classifiers to Detect Deepfake Videos
The ability of artificial intelligence techniques to synthesize entirely new videos or to alter the facial expressions in existing ones has been effectively demonstrated in the literature. The identification of this new class of threats, generally known as Deepfakes but encompassing several distinct techniques, is fundamental in multimedia forensics: such manipulated content can easily undermine and distort public opinion about a specific person or event. Thus, in this paper, a new technique able to distinguish synthetically generated portrait videos from natural ones is introduced, exploiting inconsistencies in the prediction error that arise during the re-encoding phase. In particular, features based on the inter-frame prediction error are investigated jointly with a Long Short-Term Memory (LSTM) network able to learn the temporal correlation among consecutive frames. Preliminary results demonstrate that this sequence-based approach to distinguishing between original and manipulated videos achieves promising performance.
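As a rough illustration of the kind of feature the abstract describes, the sketch below computes a per-frame proxy for the inter-frame prediction error. Note the simplification: the paper's features derive from the codec's motion-compensated prediction residual during re-encoding, whereas here (purely as an assumption for illustration) the residual is approximated by the mean absolute difference between consecutive grayscale frames; the resulting scalar sequence is the sort of input one would feed to an LSTM classifier.

```python
import numpy as np

def prediction_error_features(frames):
    """Per-frame proxy for inter-frame prediction error.

    frames: array of shape (T, H, W), a grayscale video.
    Returns an array of shape (T-1,) with one scalar per frame pair:
    the mean absolute difference between consecutive frames. This is
    a simplified stand-in for the codec's prediction residual, not
    the authors' exact pipeline.
    """
    frames = frames.astype(np.float64)
    residuals = np.abs(np.diff(frames, axis=0))  # shape (T-1, H, W)
    return residuals.mean(axis=(1, 2))

# Tiny synthetic example: a static video has zero residual energy,
# while a video whose content changes every frame does not.
static = np.ones((5, 8, 8))
changing = np.stack([np.full((8, 8), float(t)) for t in range(5)])
print(prediction_error_features(static))    # → [0. 0. 0. 0.]
print(prediction_error_features(changing))  # → [1. 1. 1. 1.]
```

In a full system, one such feature vector (possibly with richer per-block statistics) would be extracted per frame and the resulting temporal sequence passed to the LSTM, which learns the correlation among consecutive frames that the abstract refers to.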