Sign Language to Speech Conversion Using Machine Learning
Authors:
M. Bharath1, Ch. Deviprasad2, S. Abhishek Goud3, G. Mayank4, K. Likhith5, Mr. E. Kiran Kumar6
1-5 BTech (CSE) Students, Sphoorthy Engineering College, Hyderabad.
6 Assistant Professor, Department of CSE, Sphoorthy Engineering College, Hyderabad.
ABSTRACT
Sign language constitutes the principal mode of communication for millions of hearing-impaired individuals worldwide; however, its limited comprehension among the general population perpetuates a significant communication divide. This study introduces a computationally efficient, real-time, vision-based sign language recognition framework that translates hand gestures into both textual and auditory outputs.
The proposed methodology leverages MediaPipe-based 3D hand landmark extraction, which is subsequently transformed into a background-invariant skeletal representation, thereby eliminating environmental dependencies such as illumination variability and visual clutter. A two-tier hierarchical classification paradigm is employed, wherein a Convolutional Neural Network (CNN) performs coarse-grained classification across structural gesture groups, followed by deterministic geometric inference for fine-grained character discrimination.
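The landmark-normalization step described above can be illustrated with a minimal sketch. The function below assumes a 21 x 3 array of 3D hand landmarks of the kind produced by a detector such as MediaPipe Hands; the specific normalization shown (wrist-centering plus scale normalization) is one plausible way to obtain a background-invariant skeletal representation, not necessarily the exact transform used in this work.

```python
import numpy as np

def normalize_landmarks(landmarks: np.ndarray) -> np.ndarray:
    """Convert raw 3D hand landmarks (21 x 3) into a position- and
    scale-invariant skeletal representation.

    The result no longer depends on where the hand appears in the
    frame or how large it is, which removes environmental factors
    such as camera distance and framing.
    """
    # Translate so the wrist (landmark 0 in the MediaPipe convention)
    # sits at the origin.
    pts = landmarks - landmarks[0]
    # Scale so the farthest landmark from the wrist has unit distance.
    scale = np.linalg.norm(pts, axis=1).max()
    if scale > 0:
        pts = pts / scale
    return pts
```

Because the classifier then operates on these normalized coordinates rather than raw pixels, illumination and background clutter have no direct influence on its input.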
Empirical evaluations demonstrate 99.1% accuracy in controlled settings and 96.9% in unconstrained environments, significantly outperforming traditional RGB-based models. The system executes in real time at approximately 50 FPS on standard CPU hardware and integrates multilingual translation with text-to-speech synthesis.
This framework presents a scalable, cost-effective, and highly accessible solution for assistive communication technologies, with substantial implications for inclusive human-computer interaction systems.
Keywords
Sign Language Recognition, Deep Learning, MediaPipe, CNN, Human-Computer Interaction, Assistive Technology, Real-Time Systems