EmoTeluNet: A Deep Learning Architecture for Telugu Speech Emotion Recognition

Create Date 24 May 2025
Last Updated 24 May 2025

Authors:

M. Venkata Ramana

Department of Artificial Intelligence and Data Science, Central University of Andhra Pradesh, Ananthapuramu, India. Email: mvramana5767@gmail.com

Dr. C. Krishna Priya, Assistant Professor

Department of Artificial Intelligence and Data Science, Central University of Andhra Pradesh, Ananthapuramu, India. Email: krishnapriyarams@cuap.edu.in

Abstract—Speech Emotion Recognition (SER) is pivotal for advancing human-centric artificial intelligence, yet regional languages such as Telugu, spoken by over 80 million people, lack robust SER frameworks. This paper introduces Deep Telugu Emotion, a deep learning framework designed to recognize emotions in Telugu speech. We curated a novel dataset of Telugu emotional speech and evaluated six neural network models: Artificial Neural Network (ANN), Multi-Layer Perceptron (MLP), Bidirectional Long Short-Term Memory (BiLSTM), Attention-based BiLSTM, Convolutional Recurrent Neural Network (CRNN), and 1D Convolutional Neural Network (CNN1D). Features such as Mel-Frequency Cepstral Coefficients (MFCCs), chroma, and spectral contrast were extracted to train the models. Experimental results show that ANN and MLP each achieved the highest test accuracy, 84.21%, followed by Attention-based BiLSTM at 81.58%. BiLSTM, CNN1D, and CRNN recorded accuracies of 78.95%, 76.32%, and 50.00%, respectively. This framework establishes a benchmark for Telugu SER, highlighting the efficacy of feedforward models for regional language applications and paving the way for empathetic AI systems.

Index Terms—Speech Emotion Recognition, Telugu, Deep Learning, Neural Networks, MFCC, Attention Mechanism
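
The abstract reports test accuracies but not the size of the test set. As a rough reading aid, the sketch below searches for the smallest number of test utterances at which every reported percentage corresponds to a whole number of correct predictions. It is an arithmetic consistency check under stated assumptions, not something taken from the paper: the helper names `correct_count` and `smallest_consistent_test_size` are hypothetical, and it assumes that all six models were scored on the same test set and that accuracies were rounded to two decimals.

```python
# Reported test accuracies (percent), copied from the abstract.
REPORTED = {
    "ANN": 84.21,
    "MLP": 84.21,
    "Attention BiLSTM": 81.58,
    "BiLSTM": 78.95,
    "CNN1D": 76.32,
    "CRNN": 50.00,
}


def correct_count(accuracy_pct, n_test, tol=0.005):
    """Return k if k / n_test matches accuracy_pct at two decimals, else None.

    Hypothetical helper: tol is half of the last reported digit (0.01 %).
    """
    k = round(accuracy_pct * n_test / 100)
    if abs(100 * k / n_test - accuracy_pct) < tol:
        return k
    return None


def smallest_consistent_test_size(reported, max_n=1000):
    """Find the smallest n_test for which every accuracy maps to an integer count.

    Assumption (not stated in the source): one shared test set for all models.
    """
    for n in range(1, max_n + 1):
        counts = {model: correct_count(acc, n) for model, acc in reported.items()}
        if all(c is not None for c in counts.values()):
            return n, counts
    return None, {}


n_test, counts = smallest_consistent_test_size(REPORTED)
print("smallest consistent test-set size:", n_test)
for model, k in counts.items():
    print(f"{model}: {k}/{n_test} correct")
```

Under these assumptions the search returns 38 utterances, with 32, 31, 30, 29, and 19 correct predictions matching the reported 84.21%, 81.58%, 78.95%, 76.32%, and 50.00%. Multiples of 38 fit the figures equally well, so this is only a lower bound. If the test set really is this small, adjacent models in the ranking would differ by a single utterance.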
