Unified Human Activity and Sign Recognition System
- Create Date 17 December 2025
- Last Updated 17 December 2025
Divyashree Nayak1, Hamsini M2, Harshitha R3, Nikita Shantappa Biradar4
1Student, Dept. of Information Science Engineering, AMC Engineering College, Karnataka, India
2Student, Dept. of Information Science Engineering, AMC Engineering College, Karnataka, India
3Student, Dept. of Information Science Engineering, AMC Engineering College, Karnataka, India
4Student, Dept. of Information Science Engineering, AMC Engineering College, Karnataka, India
Abstract
Human Activity Recognition (HAR) and Sign Language Recognition (SLR) play a crucial role in advancing intelligent human–computer interaction systems. However, most existing approaches address these tasks independently, which limits their real-world usability and scalability. To overcome this limitation, this paper presents a unified deep learning–based framework that integrates Human Activity Recognition (HAR), Sign Language Recognition (SLR), and American Sign Language (ASL) gesture classification into a single, cohesive system. The proposed framework utilizes video-based datasets for HAR and SLR stored in .npy format, along with an image-based ASL dataset in .jpg format. For effective feature extraction, 3D Convolutional Neural Networks (3D CNNs) are employed to capture spatio-temporal patterns from video data, while MobileNetV2 is used for lightweight and efficient feature extraction from image-based gestures. The extracted features are further processed using Long Short-Term Memory (LSTM) networks to model temporal dependencies and improve recognition accuracy. To enhance societal applicability, the system provides real-time feedback through a text-to-speech module and generates automated alerts via Telegram for detected activities or recognized signs. Experimental evaluations show high accuracy, stable convergence, and efficient performance across all tasks, demonstrating the robustness of the unified architecture. This work contributes to the development of an intelligent multimodal recognition system that enhances accessibility and bridges communication gaps between humans and machines.
Key Words: Human Activity Recognition (HAR), Sign Language Recognition (SLR), American Sign Language (ASL), Deep Learning, 3D CNN, MobileNetV2, LSTM, Text-to-Speech, Telegram Alert.
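The two learning components the abstract names — 3D convolution for spatio-temporal features and an LSTM for temporal modeling — can be illustrated with a minimal NumPy sketch. This is a toy illustration under assumed shapes (an 8-frame grayscale clip, a per-frame mean-pooling step, a 4-unit LSTM), not the authors' implementation, which the paper builds on 3D CNN, MobileNetV2, and LSTM layers in a deep-learning framework.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def conv3d_valid(video, kernel):
    """'Valid' single-channel 3D convolution over a (T, H, W) clip --
    the basic spatio-temporal operation a 3D CNN layer applies."""
    t, h, w = kernel.shape
    T, H, W = video.shape
    out = np.zeros((T - t + 1, H - h + 1, W - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            for k in range(out.shape[2]):
                out[i, j, k] = np.sum(video[i:i + t, j:j + h, k:k + w] * kernel)
    return out

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step; gate pre-activations are stacked as
    [input, forget, candidate, output] along the first axis."""
    n = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    i = sigmoid(z[:n])            # input gate
    f = sigmoid(z[n:2 * n])       # forget gate
    g = np.tanh(z[2 * n:3 * n])   # candidate cell values
    o = sigmoid(z[3 * n:])        # output gate
    c = f * c_prev + i * g        # update cell state
    h = o * np.tanh(c)            # emit hidden state
    return h, c

# Toy pipeline: one 3D-conv feature map, mean-pooled per frame, fed to an LSTM.
rng = np.random.default_rng(0)
clip = rng.standard_normal((8, 16, 16))           # assumed 8-frame clip
feat = conv3d_valid(clip, rng.standard_normal((3, 3, 3)))
frames = feat.mean(axis=(1, 2)).reshape(-1, 1)    # one scalar feature per step
n_hidden, n_in = 4, 1
W = rng.standard_normal((4 * n_hidden, n_in))
U = rng.standard_normal((4 * n_hidden, n_hidden))
b = np.zeros(4 * n_hidden)
h = np.zeros(n_hidden)
c = np.zeros(n_hidden)
for x in frames:                                  # unroll over time
    h, c = lstm_step(x, h, c, W, U, b)
```

In the full system, the final hidden state would feed a classification head, and the predicted label would drive the text-to-speech and Telegram alert modules.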