Author: Guan, Kaili
Title: Learning spatiotemporal information for deepfake detection
Advisors: Wang, Yi (EEE)
Degree: M.Sc.
Year: 2024
Department: Department of Electrical and Electronic Engineering
Pages: 49 pages, color illustrations
Language: English
Abstract: In recent years, deepfakes have become increasingly widespread on the Internet, raising serious public concerns about network security and drawing a growing number of academic researchers to the field. As the threat deepfakes pose to society mounts, the number of people engaged in deepfake detection research has grown substantially. Although existing approaches perform satisfactorily, they suffer from a significant drawback: heavy computational demand. For instance, the majority of current video-level deepfake detection methods rely on 3D CNNs, which process data effectively but have remarkably high computational requirements.
This dissertation proposes an effective methodology that integrates the Thumbnail Layout (TALL), ResNet101, Temporal Encoding (TE), and Graph Convolutional Network (GCN) strategies. The Thumbnail Layout approach extracts frames from a video and assembles them into a predefined layout, a process that retains both spatial and temporal information. Because TALL is model-independent, ResNet is introduced as the backbone: it offers strong feature extraction capacity, relatively fast inference, and lower computational requirements than the Swin Transformer. Combining the Thumbnail Layout with ResNet thus facilitates feature extraction and a deeper understanding of spatial information. Temporal Encoding is then added to the extracted features, enabling the network to better locate each feature along the time axis. The result is fed into a GCN, which more precisely captures the subtle differences in lighting, shadow, and expression within deepfakes and provides a more comprehensive understanding of temporal information. The proposed approach is named TALL-ResNet101-TE-GCN. Experiments on the FaceForensics++ dataset show that TALL-ResNet101-TE-GCN improves deepfake detection ability compared with TALL-ResNet101.
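
To make the pipeline concrete, here is a minimal PyTorch sketch of the TALL-ResNet101-TE-GCN idea as the abstract describes it. The 2x2 thumbnail grid, the layer sizes, the sinusoidal temporal encoding, and the chain graph over frame positions are illustrative assumptions, not the thesis's exact configuration.

import math
import torch
import torch.nn as nn
from torchvision.models import resnet101

class TallResNetTEGCN(nn.Module):
    """Toy TALL-ResNet101-TE-GCN pipeline for binary real/fake classification."""

    def __init__(self, num_frames=4, feat_dim=2048, hidden=512):
        super().__init__()
        assert num_frames == 4, "this sketch assumes a 2x2 thumbnail grid"
        # ResNet101 backbone without its pooling/classification head
        # (weights=None here; pretrained weights would normally be loaded).
        backbone = resnet101(weights=None)
        self.backbone = nn.Sequential(*list(backbone.children())[:-2])
        # One GCN layer, H' = ReLU(A_hat @ H @ W), plus a binary classifier.
        self.gcn_weight = nn.Linear(feat_dim, hidden, bias=False)
        self.classifier = nn.Linear(hidden, 2)
        # Fixed sinusoidal temporal encoding, one vector per frame position.
        pe = torch.zeros(num_frames, feat_dim)
        pos = torch.arange(num_frames, dtype=torch.float32).unsqueeze(1)
        div = torch.exp(torch.arange(0, feat_dim, 2, dtype=torch.float32)
                        * (-math.log(10000.0) / feat_dim))
        pe[:, 0::2] = torch.sin(pos * div)
        pe[:, 1::2] = torch.cos(pos * div)
        self.register_buffer("temporal_encoding", pe)
        # Symmetrically normalized adjacency A_hat for a temporal chain
        # graph with self-loops: frame t is linked to frames t-1 and t+1.
        adj = torch.eye(num_frames)
        for t in range(num_frames - 1):
            adj[t, t + 1] = adj[t + 1, t] = 1.0
        d_inv_sqrt = torch.diag(adj.sum(dim=1).pow(-0.5))
        self.register_buffer("a_hat", d_inv_sqrt @ adj @ d_inv_sqrt)

    @staticmethod
    def make_tall_image(frames):
        # frames: (B, 4, 3, 112, 112) -> one (B, 3, 224, 224) image per clip,
        # thumbnails placed left-to-right, top-to-bottom so the layout keeps
        # temporal order alongside spatial content.
        top = torch.cat([frames[:, 0], frames[:, 1]], dim=3)
        bottom = torch.cat([frames[:, 2], frames[:, 3]], dim=3)
        return torch.cat([top, bottom], dim=2)

    def forward(self, frames):
        tall = self.make_tall_image(frames)   # (B, 3, 224, 224)
        fmap = self.backbone(tall)            # (B, 2048, 7, 7) for 224 input
        b, c, h, w = fmap.shape
        # Pool each quadrant of the feature map back into a per-frame node.
        quads = [fmap[:, :, :h // 2, :w // 2], fmap[:, :, :h // 2, w // 2:],
                 fmap[:, :, h // 2:, :w // 2], fmap[:, :, h // 2:, w // 2:]]
        nodes = torch.stack([q.mean(dim=(2, 3)) for q in quads], dim=1)
        nodes = nodes + self.temporal_encoding            # temporal encoding
        nodes = torch.relu(self.a_hat @ self.gcn_weight(nodes))  # GCN layer
        return self.classifier(nodes.mean(dim=1))         # (B, 2) logits

model = TallResNetTEGCN()
clip = torch.randn(2, 4, 3, 112, 112)  # batch of 2 clips, 4 frames each
print(model(clip).shape)               # torch.Size([2, 2])

In this sketch, the four frames of a clip are tiled into one 224x224 thumbnail image, ResNet101 produces a feature map from it, each quadrant of that map is pooled back into a per-frame node vector, a temporal position code is added, and a single GCN layer propagates information along the frame chain before classification.
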
Rights: All rights reserved
Access: restricted access

Files in This Item:
8295.pdf (Adobe PDF, 7.01 MB): For All Users (off-campus access for PolyU Staff & Students only)


Please use this identifier to cite or link to this item: https://theses.lib.polyu.edu.hk/handle/200/13889