Speech Emotion Recognition Using Deformable Convolutional Neural Networks
- Create Date 17 May 2025
- Last Updated 17 May 2025
A. Pramod Reddy1, Kura Abhiram2, Kunchem Rakesh3, M. Sai Kiran4, M Naresh5
1-5 Department of CSE, TKR College of Engineering & Technology
2-5 B.Tech Students
ABSTRACT
Speech Emotion Recognition (SER) enhances human-computer interaction by enabling machines to detect and respond to emotional cues in speech. This project proposes a deep learning-based SER system using Deformable Convolutional Neural Networks (DCNNs), which dynamically adjust receptive fields to better capture nuanced speech patterns often missed by standard CNNs. It leverages three benchmark datasets—RAVDESS, CREMA-D, and TESS—providing a rich and diverse emotional speech corpus. Preprocessed audio is transformed into MFCCs and Mel-spectrograms, which are stacked to form a dual-channel input for the DCNN. The model accurately classifies eight core emotions across varied speakers and conditions. Results show DCNNs significantly outperform conventional CNNs, highlighting their potential in applications like virtual assistants, mental health tools, and customer service systems.
Keywords: SER, DCNN, TESS, CREMA-D, MFCC
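To make the phrase "dynamically adjust receptive fields" concrete, the following is a minimal sketch of how a single output value of a 3x3 deformable convolution can be computed: each kernel tap is shifted by a fractional offset, and the input is read at that position by bilinear interpolation. It is an illustration of the general technique, not the authors' implementation. The function names, the toy 5x5 feature map, the averaging kernel, and the hand-set offsets are all assumptions made for the example; in a trained network the offsets would be predicted by a learned layer.

```python
import math


def bilinear_sample(feature, y, x):
    """Read a 2-D feature map at fractional (y, x); outside the map counts as 0."""
    h, w = len(feature), len(feature[0])
    y0, x0 = math.floor(y), math.floor(x)
    value = 0.0
    # Blend the four integer grid points surrounding (y, x).
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= yy < h and 0 <= xx < w:
                weight = (1 - abs(y - yy)) * (1 - abs(x - xx))
                value += weight * feature[yy][xx]
    return value


def deformable_conv_at(feature, kernel, offsets, cy, cx):
    """One output value of a 3x3 deformable convolution centred at (cy, cx).

    offsets[k] = (dy, dx) shifts kernel tap k away from its regular grid
    position. Here they are passed in by hand (hypothetical values).
    """
    total = 0.0
    k = 0
    for ky in (-1, 0, 1):
        for kx in (-1, 0, 1):
            dy, dx = offsets[k]
            sample = bilinear_sample(feature, cy + ky + dy, cx + kx + dx)
            total += kernel[ky + 1][kx + 1] * sample
            k += 1
    return total


# Toy stand-in for a spectrogram patch: value = 10 * row + column.
feature = [[10 * r + c for c in range(5)] for r in range(5)]
kernel = [[1 / 9] * 3 for _ in range(3)]  # simple averaging kernel

# With zero offsets this reduces to a standard 3x3 convolution.
regular = deformable_conv_at(feature, kernel, [(0.0, 0.0)] * 9, 2, 2)

# Shifting every tap half a column to the right moves the receptive field.
shifted = deformable_conv_at(feature, kernel, [(0.0, 0.5)] * 9, 2, 2)

print(regular, shifted)
```

With all offsets set to zero the sampling grid is the usual fixed 3x3 window, so the layer behaves like a conventional CNN layer; non-zero offsets let the same kernel read from positions that better follow the pattern in the input.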