Adaptive Multi-Modal Driver Drowsiness Detection Using Temporal Convolutional Networks
Authors:
A. Karunamurthy, Associate Professor, Dept. of CSE, Sri Manakula Vinayagar Engineering College (SMVEC), Puducherry, India. karunamurthy26@gmail.com (corresponding author)
P. Gurumoorthy, PG Student, Dept. of MCA, Sri Manakula Vinayagar Engineering College (SMVEC), Puducherry, India. gurumoorthy0809@gmail.com
Abstract
Driver drowsiness remains a leading cause of road accidents, yet existing detection systems often apply rigid thresholds to isolated physiological signals, which leads to high false-positive rates and poor generalization across individuals. We propose a multi-modal adaptive framework that integrates facial landmark detection, eye and mouth aspect ratios, and head pose estimation into a unified temporal model for real-time fatigue scoring. The system first extracts facial key points using a MobileNetV3 backbone with a coordinate regression head, and from these points computes geometric ratios, namely the Eye Aspect Ratio (EAR) and the Mouth Aspect Ratio (MAR). Head pose angles are estimated in parallel by solving the Perspective-n-Point (PnP) problem. Instead of applying fixed thresholds to each metric, we feed the continuous per-frame feature sequence, including a blink frequency derived from the temporal derivative of the EAR, into a Temporal Convolutional Network (TCN) with dilated causal convolutions. The TCN captures long-range temporal dependencies that distinguish transient distractions from genuine drowsiness patterns. A fully connected layer with sigmoid activation then outputs a scalar fatigue score between zero and one, interpreted as the probability of a drowsy state. This continuous score replaces conventional binary alert logic, allowing the system to respond to complex multivariate fatigue signatures such as simultaneous yawning and head nodding. As a result, the method reduces false alarms caused by natural facial expressions and environmental factors, and the end-to-end learning framework eliminates manual threshold tuning. Experimental results show that temporally integrating multiple modalities significantly improves detection accuracy over single-metric approaches. The framework is computationally efficient and suitable for real-time deployment in embedded driver monitoring systems.
Keywords: Driver drowsiness detection · Temporal convolutional network · Facial landmark detection · Eye aspect ratio · Mouth aspect ratio · Head pose estimation · Multimodal learning · Real-time driver monitoring
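The geometric features named in the abstract can be illustrated with a short sketch. The EAR below follows the widely used six-landmark definition, EAR = (|p2 - p6| + |p3 - p5|) / (2 |p1 - p4|), with p1 and p4 as the eye corners. The abstract does not give the exact landmark layout or MAR formula, so the eight-point inner-lip MAR is an assumed, commonly seen variant. The blink counter estimates dEAR/dt by finite differences and counts one blink per sharp closing onset. The function names and the `drop` threshold of -1.0 EAR units per second are hypothetical illustration choices, not values from the paper.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def _dist(a: Point, b: Point) -> float:
    """Euclidean distance between two 2-D landmarks."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def eye_aspect_ratio(eye: Sequence[Point]) -> float:
    """EAR for six eye landmarks ordered p1..p6 (p1 and p4 are the corners)."""
    p1, p2, p3, p4, p5, p6 = eye
    return (_dist(p2, p6) + _dist(p3, p5)) / (2.0 * _dist(p1, p4))


def mouth_aspect_ratio(mouth: Sequence[Point]) -> float:
    """Assumed MAR variant: eight inner-lip landmarks m1..m8 (m1 and m5 are the corners)."""
    m1, m2, m3, m4, m5, m6, m7, m8 = mouth
    vertical = _dist(m2, m8) + _dist(m3, m7) + _dist(m4, m6)
    return vertical / (2.0 * _dist(m1, m5))


def blink_rate(ear: Sequence[float], fps: float, drop: float = -1.0) -> float:
    """Blinks per minute from the finite-difference derivative of an EAR series.

    A blink is counted at each frame where dEAR/dt first falls below `drop`
    (a hypothetical threshold); the counter re-arms once the EAR rises again.
    """
    blinks = 0
    closing = False
    for prev, cur in zip(ear, ear[1:]):
        d_ear = (cur - prev) * fps  # derivative in EAR units per second
        if d_ear < drop and not closing:
            blinks += 1
            closing = True
        elif d_ear > 0:
            closing = False
    minutes = len(ear) / fps / 60.0
    return blinks / minutes if minutes > 0 else 0.0
```

For example, a one-second clip at 30 fps containing two brief eye closures yields a rate of about 120 blinks per minute. In the proposed framework these values are not thresholded into alerts. They form channels of the per-frame feature sequence passed to the temporal model.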
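A PnP solver returns the head rotation relative to the camera. With OpenCV, for instance, `cv2.solvePnP` yields a rotation vector that `cv2.Rodrigues` converts to a 3×3 matrix. The sketch below covers only the step that follows: turning that matrix into pitch, yaw and roll angles that can serve as per-frame head pose features. The Euler convention here is an assumption, since the abstract does not specify one. It uses R = Rz(roll) · Ry(yaw) · Rx(pitch) in the camera frame, so head nodding shows up mainly as oscillations in pitch. The function names are hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def rotation_from_euler(pitch: float, yaw: float, roll: float) -> Matrix:
    """Build R = Rz(roll) @ Ry(yaw) @ Rx(pitch); angles in radians."""
    cx, sx = math.cos(pitch), math.sin(pitch)
    cy, sy = math.cos(yaw), math.sin(yaw)
    cz, sz = math.cos(roll), math.sin(roll)
    return [
        [cz * cy, cz * sy * sx - sz * cx, cz * sy * cx + sz * sx],
        [sz * cy, sz * sy * sx + cz * cx, sz * sy * cx - cz * sx],
        [-sy, cy * sx, cy * cx],
    ]


def euler_from_rotation(r: Matrix) -> Tuple[float, float, float]:
    """Recover (pitch, yaw, roll) in radians under the same ZYX convention.

    Valid for |yaw| < 90 degrees; near +/-90 degrees the decomposition hits
    gimbal lock and pitch and roll are no longer separable.
    """
    pitch = math.atan2(r[2][1], r[2][2])
    yaw = math.atan2(-r[2][0], math.hypot(r[2][1], r[2][2]))
    roll = math.atan2(r[1][0], r[0][0])
    return pitch, yaw, roll
```

Composing and then decomposing a rotation returns the original angles. This round-trip offers a quick sanity check when wiring the solver output into the feature pipeline.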
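The following is a minimal plain-Python forward pass showing what "dilated causal convolutions followed by a sigmoid-activated fully connected layer" means. A causal convolution with kernel size k and dilation d computes y[t] = b + Σ_i w[i] · x[t − i·d], zero-padded on the left, so each output sees only current and past frames. Stacking dilations 1, 2, 4 with k = 3 gives a receptive field of 1 + (k − 1)(1 + 2 + 4) = 15 frames. Several details are illustrative assumptions rather than the paper's architecture: the six input channels (EAR, MAR, blink rate, pitch, yaw, roll), the hidden width, the dilation schedule, and the random untrained weights. The class name `TinyTCN` is hypothetical. It also omits the residual connections and normalization that standard TCN blocks usually include. A real system would train such a network end to end in a deep learning framework.

```python
import math
import random
from typing import List, Sequence

Seq = List[List[float]]  # indexed as [channel][time]


def causal_dilated_conv(x: Seq, w: List[List[List[float]]], b: List[float], dilation: int) -> Seq:
    """y[o][t] = b[o] + sum_c sum_i w[o][c][i] * x[c][t - i*dilation], left zero-padded."""
    steps, k = len(x[0]), len(w[0][0])
    out = []
    for o in range(len(w)):
        row = []
        for t in range(steps):
            acc = b[o]
            for c in range(len(x)):
                for i in range(k):
                    src = t - i * dilation
                    if src >= 0:  # only current and past frames: causal
                        acc += w[o][c][i] * x[c][src]
            row.append(acc)
        out.append(row)
    return out


def relu(x: Seq) -> Seq:
    return [[max(0.0, v) for v in row] for row in x]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class TinyTCN:
    """Untrained toy TCN: dilated causal conv stack -> last time step -> FC -> sigmoid."""

    def __init__(self, in_ch: int, hidden: int, kernel: int = 3,
                 dilations: Sequence[int] = (1, 2, 4), seed: int = 0) -> None:
        rng = random.Random(seed)
        self.layers = []
        ch = in_ch
        for d in dilations:
            w = [[[rng.gauss(0.0, 0.3) for _ in range(kernel)] for _ in range(ch)]
                 for _ in range(hidden)]
            self.layers.append((w, [0.0] * hidden, d))
            ch = hidden
        self.fc_w = [rng.gauss(0.0, 0.3) for _ in range(hidden)]
        self.fc_b = 0.0
        self.receptive_field = 1 + (kernel - 1) * sum(dilations)

    def fatigue_score(self, x: Seq) -> float:
        """Score in (0, 1) for the most recent frame of a [channel][time] sequence."""
        h = x
        for w, b, d in self.layers:
            h = relu(causal_dilated_conv(h, w, b, d))
        last = [row[-1] for row in h]
        z = self.fc_b + sum(wi * hi for wi, hi in zip(self.fc_w, last))
        return sigmoid(z)
```

Because the convolutions are causal, the score for the latest frame is unaffected by frames older than the receptive field. Feeding only the last 15 frames therefore gives the same result as feeding the full clip. This property lets a deployed monitor keep a fixed-size rolling buffer instead of the entire history.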