Real-Time Multimodal Emotion Recognition
1. Mrs. S. Bhargavi, 2. Mrs. B. Siva Kumari, 3. Ms. M. Swathi, 4. Ms. K. Kalyani, 5. Ms. Ch. Rajasri
- Faculty of Electronics and Communication Engineering, Bapatla Women’s Engineering College, Bapatla, Andhra Pradesh, India.
- Faculty of Electronics and Communication Engineering, Bapatla Women’s Engineering College, Bapatla, Andhra Pradesh, India.
- Student of Electronics and Communication Engineering, Bapatla Women’s Engineering College, Bapatla, Andhra Pradesh, India.
- Student of Electronics and Communication Engineering, Bapatla Women’s Engineering College, Bapatla, Andhra Pradesh, India.
- Student of Electronics and Communication Engineering, Bapatla Women’s Engineering College, Bapatla, Andhra Pradesh, India.
ABSTRACT:
Safeguarding the well-being of women and children is a challenging research problem, and multimodal emotion recognition is a formidable task within this domain. The field of Human-Computer Interaction (HCI) relies heavily on multimodal data, encompassing audio, video, text, facial expressions, body motion, bio-signals, and physiological data, to predict the safety of women and children, and substantial research effort has been devoted to this cause. To build an optimal multimodal emotion recognition model that integrates visual, textual, auditory, and video modalities, a novel deep learning framework is proposed. The framework comprises comprehensive data analysis, feature extraction, and model-level fusion. Feature-extractor networks are tailored specifically to the visual, textual, auditory, and video data, and at the model level an effective multimodal emotion recognition model is devised that synthesizes information from images, text, voice, and video. The proposed models perform strongly on three benchmark multimodal datasets, namely IEMOCAP, the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS), and the Surrey Audio-Visual Expressed Emotion database (SAVEE), achieving prediction accuracies of 96%, 97%, and 97%, respectively. Comparative analysis against existing emotion recognition models further validates the efficacy of the proposed approach. Multimodal emotion recognition of this kind holds promise for predicting threats to women's and children's safety.
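The abstract does not disclose the exact network architecture, so the following is only a minimal PyTorch sketch of the general idea of model-level fusion it describes: one feature-extractor network per modality, with the extracted features concatenated and fed to a shared emotion classifier. All class names, layer sizes, input shapes (48x48 face crops, 40-dimensional acoustic features, tokenized transcripts), and the 8-class output are illustrative assumptions, not the authors' design.

```python
# Minimal sketch of model-level multimodal fusion (hypothetical; not the paper's architecture).
import torch
import torch.nn as nn

class VisualEncoder(nn.Module):
    """Small CNN over face crops (assumed 1x48x48 grayscale frames)."""
    def __init__(self, out_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Flatten(),
            nn.Linear(32 * 12 * 12, out_dim), nn.ReLU(),  # 48 -> 24 -> 12 after two pools
        )

    def forward(self, frames):
        return self.net(frames)

class AudioEncoder(nn.Module):
    """MLP over fixed-length acoustic features (assumed 40 MFCC statistics per clip)."""
    def __init__(self, in_dim=40, out_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 256), nn.ReLU(),
            nn.Linear(256, out_dim), nn.ReLU(),
        )

    def forward(self, feats):
        return self.net(feats)

class TextEncoder(nn.Module):
    """GRU over token-embedded transcripts; final hidden state is the text feature."""
    def __init__(self, vocab_size=10000, emb_dim=64, out_dim=128):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.gru = nn.GRU(emb_dim, out_dim, batch_first=True)

    def forward(self, tokens):
        _, hidden = self.gru(self.emb(tokens))
        return hidden.squeeze(0)

class FusionEmotionNet(nn.Module):
    """Model-level fusion: concatenate per-modality features, then classify emotions."""
    def __init__(self, num_classes=8):  # e.g., the 8 RAVDESS emotion labels
        super().__init__()
        self.visual = VisualEncoder()
        self.audio = AudioEncoder()
        self.text = TextEncoder()
        self.head = nn.Sequential(
            nn.Linear(128 * 3, 256), nn.ReLU(), nn.Dropout(0.3),
            nn.Linear(256, num_classes),
        )

    def forward(self, frames, mfcc, tokens):
        fused = torch.cat(
            [self.visual(frames), self.audio(mfcc), self.text(tokens)], dim=1
        )
        return self.head(fused)

# Smoke test on random inputs (batch of 4).
model = FusionEmotionNet()
logits = model(
    torch.randn(4, 1, 48, 48),            # face crops
    torch.randn(4, 40),                    # acoustic features
    torch.randint(0, 10000, (4, 20)),      # token ids
)
print(logits.shape)  # torch.Size([4, 8])
```

Concatenation is only one fusion choice; attention-weighted or gated fusion over the same per-modality features is a common alternative when one modality is noisy or missing.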
Index Terms: Facial Expression Recognition, Deep Learning, Multimodal, Women's Safety, Audio-Visual Media, Fusion.