Real-Time Multimodal Emotion Recognition
1. Mrs. S. Bhargavi, 2. Mrs. B. Siva Kumari, 3. Ms. M. Swathi, 4. Ms. K. Kalyani, 5. Ms. Ch. Rajasri
- 1,2 Faculty of Electronics and Communication Engineering, Bapatla Women's Engineering College, Bapatla, Andhra Pradesh, India.
- 3,4,5 Students of Electronics and Communication Engineering, Bapatla Women's Engineering College, Bapatla, Andhra Pradesh, India.
ABSTRACT:
Safeguarding the well-being of women and children is a challenging research problem, and multimodal emotion recognition is a formidable task within this domain. Human-Computer Interaction (HCI) relies heavily on multimodal data, including audio, video, text, facial expressions, body motion, bio-signals, and physiological measurements, to predict the safety of women and children, and substantial research effort has been dedicated to this goal. To build an optimal multimodal emotion recognition model that integrates visual, textual, auditory, and video modalities, a novel deep learning framework is proposed. The framework comprises comprehensive data analysis, feature extraction, and model-level fusion: dedicated feature extractor networks are tailored to visual, textual, auditory, and video data, and at the model level an effective multimodal emotion recognition model synthesizes information from images, text, voice, and video. The proposed models perform strongly on three benchmark multimodal datasets, namely IEMOCAP, the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS), and Surrey Audio-Visual Expressed Emotion (SAVEE), achieving prediction accuracies of 96%, 97%, and 97%, respectively. Comparative analysis with existing emotion recognition models further validates the efficacy of the proposed approach. Multimodal emotion recognition of this kind holds promise for predicting the safety of women and children.
Index Terms: Facial Expression Recognition, Deep Learning, Multimodal, Women's Safety, Audio-Visual Media, Fusion.
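To make the model-level fusion described in the abstract concrete, the sketch below shows one common way to fuse pre-extracted features from several modalities: each modality passes through its own small encoder, and the resulting embeddings are concatenated before a shared classifier. This is a minimal illustration only, assuming PyTorch; the feature dimensions for `visual`, `audio`, `text`, and `video`, the hidden widths, and the eight emotion classes are hypothetical placeholders, not the paper's actual extractor networks or fusion design.

```python
# Illustrative sketch of model-level fusion; architecture details are
# assumptions for clarity, not the authors' published model.
import torch
import torch.nn as nn


class ModalityEncoder(nn.Module):
    """Maps one modality's feature vector into a shared embedding space."""

    def __init__(self, in_dim: int, emb_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 256),
            nn.ReLU(),
            nn.Linear(256, emb_dim),
            nn.ReLU(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


class MultimodalEmotionClassifier(nn.Module):
    """Model-level fusion: encode each modality, concatenate, classify."""

    def __init__(self, dims: dict, num_emotions: int = 8, emb_dim: int = 128):
        super().__init__()
        self.encoders = nn.ModuleDict(
            {name: ModalityEncoder(d, emb_dim) for name, d in dims.items()}
        )
        self.classifier = nn.Sequential(
            nn.Linear(emb_dim * len(dims), 256),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(256, num_emotions),
        )

    def forward(self, inputs: dict) -> torch.Tensor:
        # Fuse at the model level by concatenating per-modality embeddings.
        fused = torch.cat(
            [self.encoders[name](x) for name, x in inputs.items()], dim=-1
        )
        return self.classifier(fused)  # raw logits over emotion classes


# Hypothetical pre-extracted feature sizes per modality.
dims = {"visual": 512, "audio": 128, "text": 300, "video": 1024}
model = MultimodalEmotionClassifier(dims)
batch = {name: torch.randn(4, d) for name, d in dims.items()}
logits = model(batch)  # shape: (4, 8)
```

Concatenation is the simplest fusion choice; attention-based or weighted fusion schemes are common alternatives when one modality should dominate for certain inputs.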