Enhancing Deepfake Video Verification Using Spatial-Temporal Long-Distance Attention and Weak Supervision
B. Rajesh1, Guttula Kavya Sri2, Neelam Sai Swetha3, Kodur Sravanthi4, Obul Reddy Puli5
1Assistant Professor, Dept. of Information Technology, SV College of Engineering, Tirupati, India.
2,3,4,5B.Tech, Dept. of Information Technology, SV College of Engineering, Tirupati, India.
Email: 1bondirajesh88@gmail.com, 2kavyasreegutthula@gmail.com,
3swethaneelam2805@gmail.com, 4kodurusravanthi2004@gmail.com,
5obulpuli414@gmail.com
Corresponding Author*: B. Rajesh
ABSTRACT: With the rapid advancement of deepfake technologies, detecting highly realistic forged facial videos has become increasingly critical yet challenging. Existing detection methods mainly treat this task as a binary classification problem, often relying on fragile, specific semantic or local artifacts and lacking effective global context modeling. This paper reformulates deepfake detection as a fine-grained classification problem in which subtle, localized differences between real and fake faces must be captured. To address these limitations, a novel spatial-temporal model is proposed that integrates a long-distance attention mechanism designed to assemble global spatial and temporal information. The spatial module detects generation artifacts within individual frames by recalibrating shallow texture features, while the temporal module captures inter-frame inconsistencies by guiding mid-level semantic features with motion residuals across consecutive frames. This dual-attention approach leverages non-overlapping image patches and trainable global forgery templates to highlight critical forged regions. Extensive experiments on public datasets demonstrate that the proposed method significantly outperforms state-of-the-art approaches, achieving robust accuracy even under heavy compression and in cross-dataset settings. The design's weakly supervised nature enhances adaptability and interpretability, making it a promising direction for future deepfake video detection systems.

KEYWORDS: deepfake technologies, spatial-temporal model, attention mechanism, binary classification
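The dual-attention idea summarized above can be illustrated with a minimal sketch. The code below is not the authors' implementation: all dimensions, the linear projection, and the templates are hypothetical stand-ins (random arrays in place of learned weights). It shows the three ingredients the abstract names: non-overlapping patch tokens, long-distance (all-pairs) spatial attention within a frame, and motion residuals between consecutive frames scored against trainable global forgery templates.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions, assumed purely for illustration.
T, H, W, C = 4, 8, 8, 3   # frames, height, width, channels
P = 4                     # side of each non-overlapping P x P patch
D = 16                    # token embedding dimension
K = 2                     # number of trainable global forgery templates

video = rng.standard_normal((T, H, W, C))

def to_patches(frame):
    """Split one frame into non-overlapping P x P patches, flattened."""
    patches = frame.reshape(H // P, P, W // P, P, C).swapaxes(1, 2)
    return patches.reshape(-1, P * P * C)            # (num_patches, P*P*C)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Hypothetical learned parameters (random stand-ins here).
W_embed = rng.standard_normal((P * P * C, D)) * 0.1  # patch -> token projection
templates = rng.standard_normal((K, D)) * 0.1        # "global forgery templates"

def spatial_attention(frame):
    """Long-distance spatial attention: every patch attends to every patch,
    so an artifact anywhere in the frame can influence each token."""
    tokens = to_patches(frame) @ W_embed             # (N, D)
    attn = softmax(tokens @ tokens.T / np.sqrt(D))   # (N, N) all-pairs weights
    return attn @ tokens                             # globally mixed features

# Temporal cue: motion residuals between consecutive frames.
residuals = np.abs(np.diff(video, axis=0))           # (T-1, H, W, C)

def temporal_guidance(residual):
    """Score residual patches against the templates; high scores flag regions
    whose inter-frame motion resembles template forgery patterns."""
    tokens = to_patches(residual) @ W_embed          # (N, D)
    scores = softmax(tokens @ templates.T, axis=0)   # (N, K), over patches
    return scores.max(axis=1)                        # per-patch saliency

spatial_feats = spatial_attention(video[0])
saliency = temporal_guidance(residuals[0])
print(spatial_feats.shape, saliency.shape)           # prints (4, 16) (4,)
```

In a trained model, `W_embed` and `templates` would be learned end-to-end, and the per-patch saliency would reweight mid-level semantic features rather than be read out directly; this sketch only shows the data flow the abstract describes.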