EmoTeluNet: A Deep Learning Architecture for Telugu Speech Emotion Recognition
- Create Date 24 May 2025
- Last Updated 24 May 2025
Authors:
M. Venkata Ramana
Department of Artificial Intelligence and Data Science, Central University of Andhra Pradesh, Ananthapuramu, India. Email: mvramana5767@gmail.com
Dr. C. Krishna Priya, Assistant Professor
Department of Artificial Intelligence and Data Science, Central University of Andhra Pradesh, Ananthapuramu, India. Email: krishnapriyarams@cuap.edu.in
Abstract—Speech Emotion Recognition (SER) is pivotal for advancing human-centric artificial intelligence, yet regional languages such as Telugu, spoken by over 80 million people, lack robust SER frameworks. This paper introduces Deep Telugu Emotion, a deep learning framework for recognizing emotions in Telugu speech. We curated a novel dataset of Telugu emotional speech and evaluated six neural network models: Artificial Neural Network (ANN), Multi-Layer Perceptron (MLP), Bidirectional Long Short-Term Memory (BiLSTM), Attention-based BiLSTM, Convolutional Recurrent Neural Network (CRNN), and 1D Convolutional Neural Network (CNN1D). Features such as Mel-Frequency Cepstral Coefficients (MFCCs), chroma, and spectral contrast were extracted to train the models. ANN and MLP tied for the highest test accuracy at 84.21%, followed by Attention-based BiLSTM at 81.58%. BiLSTM, CNN1D, and CRNN recorded accuracies of 78.95%, 76.32%, and 50.00%, respectively. This framework establishes a benchmark for Telugu SER, highlights the efficacy of feedforward models for regional-language applications, and paves the way for empathetic AI systems.
Index Terms—Speech Emotion Recognition, Telugu, Deep Learning, Neural Networks, MFCC, Attention Mechanism
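The abstract reports test accuracies but not the size of the test set. As a rough arithmetic aid, the sketch below searches for the smallest test-set size at which every reported accuracy equals a whole number of correct predictions after rounding to two decimals. This is an inference from the published percentages, not a figure stated by the authors; the function names and the `max_n` search bound are hypothetical choices made for illustration.

```python
# Test accuracies (percent) exactly as reported in the abstract.
REPORTED = {
    "ANN": 84.21,
    "MLP": 84.21,
    "Attention BiLSTM": 81.58,
    "BiLSTM": 78.95,
    "CNN1D": 76.32,
    "CRNN": 50.00,
}


def correct_count(accuracy_pct, n):
    """Return k if k/n rounds to accuracy_pct at two decimals, else None."""
    k = round(accuracy_pct * n / 100)
    if round(100 * k / n, 2) == round(accuracy_pct, 2):
        return k
    return None


def smallest_consistent_test_size(accuracies, max_n=1000):
    """Smallest n at which every accuracy matches some k/n after rounding.

    max_n is an arbitrary search bound (an assumption, not from the paper).
    """
    accuracies = list(accuracies)
    for n in range(1, max_n + 1):
        if all(correct_count(a, n) is not None for a in accuracies):
            return n
    return None


n = smallest_consistent_test_size(REPORTED.values())
for model, acc in REPORTED.items():
    print(f"{model}: {correct_count(acc, n)}/{n} correct ({acc:.2f}%)")
print(f"One test utterance corresponds to {100 / n:.2f} percentage points")
```

Under this check the smallest consistent size is 38 utterances, though larger test sets could produce the same rounded percentages. If the test set really is that small, a single utterance shifts accuracy by about 2.6 percentage points, so the gaps between adjacent models would amount to one prediction each.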