AI-Powered Mental Health Diagnostics Using Multimodal Data: A Deep Learning-Based Approach
Harsh Kumar1*, Rishabh Raikwar2, Rakshit Jaryal3, Adarsh Kumar4, Aryan Koundal5, Navjot Singh Talwandi6
1,2,3,4,5,6*Department of AIT CSE, Chandigarh University, Gharuan, Mohali, 140413, Punjab, India.
*Corresponding author(s). E-mail(s): harsh2017himani@gmail.com;
Contributing authors: rishabhraikwarssc@gmail.com; rakshjaryal@gmail.com; kumaradarsh040604@gmail.com; aryankoundal2005@gmail.com; navjot.e17908@cumail.in;
Abstract
In recent years, the growing prevalence of mental health disorders has highlighted the urgent need for efficient, scalable, and accurate diagnostic tools. Traditional methods often rely on self-reporting and clinical interviews, which can be limited by subjectivity and accessibility. This research presents an AI-powered system for mental health diagnostics using multimodal data, integrating natural language (text), audio (speech tone and pitch), and visual (facial expression and micro-movements) inputs. By combining these modalities, the system captures a more comprehensive understanding of emotional and psychological states. Deep learning models, including Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), and Transformer-based architectures, are employed for feature extraction and classification. The model is trained and validated on benchmark datasets such as DAIC-WOZ and AVEC. The results show enhanced diagnostic performance over unimodal systems, offering potential as a non-invasive, real-time, and accessible tool for early mental health screening and support. This work contributes toward ethical, AI-assisted mental healthcare.
Keywords: Artificial Intelligence, Mental Health, Multimodal Data, Deep Learning, Emotion Recognition, NLP, Speech Analysis, Facial Expression
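To make the multimodal fusion idea in the abstract concrete, the following is a minimal sketch, not the paper's actual architecture, of late fusion over three modality encoders: pooled Transformer text embeddings, MFCC-style audio frames summarized by a GRU (an RNN), and per-frame facial CNN features averaged over time. All names and dimensions (MultimodalScreener, text_dim, hidden, etc.) are illustrative assumptions.

```python
import torch
import torch.nn as nn

class MultimodalScreener(nn.Module):
    """Illustrative late-fusion classifier over text, audio, and video features."""

    def __init__(self, text_dim=768, audio_dim=40, video_dim=128, hidden=128):
        super().__init__()
        # Text: pooled Transformer embeddings (e.g., from BERT) -> dense projection.
        self.text_proj = nn.Linear(text_dim, hidden)
        # Audio: frame-level features (e.g., MFCCs) summarized by a GRU.
        self.audio_rnn = nn.GRU(audio_dim, hidden, batch_first=True)
        # Video: per-frame CNN facial features, mean-pooled over time.
        self.video_proj = nn.Linear(video_dim, hidden)
        # Fusion head: concatenate modality vectors, then classify (2 classes).
        self.classifier = nn.Sequential(
            nn.Linear(3 * hidden, hidden), nn.ReLU(), nn.Linear(hidden, 2)
        )

    def forward(self, text_emb, audio_seq, video_feats):
        t = torch.relu(self.text_proj(text_emb))                   # (B, hidden)
        _, h = self.audio_rnn(audio_seq)                           # h: (1, B, hidden)
        a = h.squeeze(0)                                           # (B, hidden)
        v = torch.relu(self.video_proj(video_feats.mean(dim=1)))   # pool frames
        return self.classifier(torch.cat([t, a, v], dim=-1))      # logits (B, 2)

# Usage with random tensors standing in for pre-extracted features.
model = MultimodalScreener()
logits = model(
    torch.randn(4, 768),        # text embeddings
    torch.randn(4, 100, 40),    # 100 audio frames of 40 MFCCs
    torch.randn(4, 50, 128),    # 50 video frames of facial features
)
print(logits.shape)  # torch.Size([4, 2])
```

Late fusion is only one of several possible designs; the abstract's mention of Transformer-based architectures is equally compatible with cross-modal attention, so this sketch should be read as a baseline rather than the reported system.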