Image-to-Text-Speech Converter

Published 6 June 2025

Updated 6 June 2025

Prathamesh Bothe, Dipali Bhusari

PG Student, Department of MCA, Trinity Academy of Engineering, Pune, India

Guide, Department of MCA, Trinity Academy of Engineering, Pune, India

ABSTRACT

This paper presents the development of an Image-to-Text-Speech Converter, a Python-based assistive system designed to extract and vocalize text from images. The primary goal of the project is to support visually impaired individuals and those with reading difficulties by converting printed or digital text into clear, audible speech. The system integrates Optical Character Recognition (OCR) using Tesseract with Text-to-Speech (TTS) synthesis using tools like pyttsx3 or gTTS. Preprocessing techniques such as grayscale conversion, noise reduction, and thresholding are applied through OpenCV to enhance OCR accuracy across various image conditions.
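The preprocessing and OCR stages named above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the `opencv-python` and `pytesseract` packages plus a local Tesseract install, and the function names, the median-blur kernel size, and the choice of Otsu thresholding are assumptions made for the example.

```python
def normalize_text(raw: str) -> str:
    """Collapse the line breaks and repeated spaces that OCR output often contains."""
    return " ".join(raw.split())


def image_to_text(image_path: str, lang: str = "eng") -> str:
    """Preprocess an image with OpenCV, then extract its text with Tesseract."""
    # Imported lazily so this module still loads where the OCR stack is absent.
    import cv2
    import pytesseract

    image = cv2.imread(image_path)
    if image is None:  # cv2.imread returns None instead of raising on failure
        raise FileNotFoundError(f"Could not read image: {image_path}")

    # Grayscale conversion
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    # Noise reduction (3x3 median filter; the kernel size is an assumption)
    denoised = cv2.medianBlur(gray, 3)
    # Thresholding (Otsu picks the cut-off automatically; also an assumption)
    _, binary = cv2.threshold(
        denoised, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU
    )

    return normalize_text(pytesseract.image_to_string(binary, lang=lang))
```

A call such as `image_to_text("page.png")`, with a hypothetical file name, would return the recognized text as a single whitespace-normalized string ready to pass to a speech engine.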

The converter provides a simple and user-friendly interface, supporting multiple image formats and capable of functioning both online and offline. It effectively bridges the gap between visual content and auditory accessibility, making it suitable for use in reading printed documents, signboards, or labels. This paper explores the system’s architecture, implementation details, and practical applications, while also addressing the challenges encountered and potential areas for future enhancement, including real-time camera integration and multilingual support.
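One way the online and offline operation mentioned above could work is to choose the speech backend by connectivity, since gTTS needs network access while pyttsx3 drives the speech engine installed on the local machine. The sketch below rests on that assumption; the selection rule, the function names, and the default output path are hypothetical and are not taken from the paper.

```python
def pick_tts_backend(online: bool) -> str:
    """Return the name of the TTS library to use (assumed selection rule)."""
    return "gTTS" if online else "pyttsx3"


def speak(text: str, online: bool = False, out_path: str = "speech.mp3") -> str:
    """Vocalize text and return the name of the backend that was used."""
    if not text.strip():
        raise ValueError("No text to speak")

    backend = pick_tts_backend(online)
    if backend == "gTTS":
        # Imported lazily: only needed when running online.
        from gtts import gTTS
        gTTS(text=text, lang="en").save(out_path)  # writes an MP3 file
    else:
        import pyttsx3
        engine = pyttsx3.init()
        engine.say(text)
        engine.runAndWait()  # blocks until playback finishes
    return backend
```

In this sketch the online path only saves an audio file, so playing it back would need a separate step, whereas the offline path speaks immediately.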
