Implementation of the Image Text to Speech Conversion in the Desired Language by Translating with Raspberry Pi

Create Date 25 June 2025
Last Updated 25 June 2025

N. MALLISHWARI, B.Tech III- ECE (226F1A0453),

Department of Electronics and Communication Engineering, Pallavi Engineering College, Survey No.209, Swathi Residency Road KUNTLOOR, Hayathnagar, Kuntloor Village, Hayathnagar_Khalsa, Hyderabad, Telangana 501505.

Dr. K. NAVEEN KUMAR, M.Tech., Ph.D., MISTE., MIAEng., MIETE

Professor & HOD of Department of Electronics and Communication Engineering, Pallavi Engineering College, Survey No.209, Swathi Residency Road KUNTLOOR, Hayathnagar, Kuntloor Village, Hayathnagar_Khalsa, Hyderabad, Telangana 501505.

Mr. B. LAXMAN, M.Tech

Professor & HOD of Electronics and Communication Engineering, Pallavi Engineering College, Survey No.209, Swathi Residency Road KUNTLOOR, Hayathnagar, Kuntloor Village, Hayathnagar_Khalsa, Hyderabad, Telangana 501505.

B. ARTHISHA, B. Tech III- ECE (226F1A0408),

Department of Electronics and Communication Engineering, Pallavi Engineering College, Survey No.209, Swathi Residency Road KUNTLOOR, Hayathnagar, Kuntloor Village, Hayathnagar_Khalsa, Hyderabad, Telangana 501505.

CH. BHARATH, B. Tech III- ECE (226F1A0412),

Department of Electronics and Communication Engineering, Pallavi Engineering College, Survey No.209, Swathi Residency Road KUNTLOOR, Hayathnagar, Kuntloor Village, Hayathnagar_Khalsa, Hyderabad, Telangana 501505.

M. RAJU, B. Tech III- ECE (236F1A404),

Department of Electronics and Communication Engineering, Pallavi Engineering College, Survey No.209, Swathi Residency Road KUNTLOOR, Hayathnagar, Kuntloor Village, Hayathnagar_Khalsa, Hyderabad, Telangana 501505.

ABSTRACT:

The main problem in communication is the language barrier between the communicators. This device is intended for people who do not know English and want English text translated into their native language. The novel component of this research work is the speech output, which is available in 53 different languages translated from English. The paper is based on a prototype that lets the user hear the contents of text images in the desired language. It extracts the text from the image, translates it, and converts the result to speech in the language the user selects. This is done with a Raspberry Pi and a camera module, using the Tesseract OCR (optical character recognition) engine, the Microsoft Translator, and the Google Speech API (application programming interface) as the text-to-speech engine. Travelers can use the device to hear English text in their own language, and it can also be used by the visually impaired.
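The three stages described above (text extraction, translation, speech output) can be pictured as a simple chain. The following is only a minimal sketch under stated assumptions, not the authors' implementation: the function name `image_to_translated_speech` and its parameters are hypothetical, and each stage is passed in as a plain callable so that the flow can be tried without a camera, a translator account, or a network connection.

```python
from typing import Callable


def image_to_translated_speech(
    image_path: str,
    target_lang: str,
    ocr: Callable[[str], str],
    translate: Callable[[str, str], str],
    speak: Callable[[str, str], None],
) -> str:
    """Run extract -> translate -> speak and return the translated text.

    All three stage callables are hypothetical stand-ins: `ocr` maps an image
    path to English text, `translate` maps (text, language code) to translated
    text, and `speak` voices (text, language code).
    """
    english_text = ocr(image_path).strip()
    if not english_text:
        # Nothing to translate or speak, so stop before calling later stages.
        raise ValueError(f"no text recognised in {image_path}")
    translated = translate(english_text, target_lang)
    speak(translated, target_lang)
    return translated


if __name__ == "__main__":
    # Stand-in stages (assumptions for illustration only).
    demo_dictionary = {("EXIT", "es"): "SALIDA"}
    spoken = []
    result = image_to_translated_speech(
        "sign.jpg",  # hypothetical file name
        "es",
        ocr=lambda path: " EXIT\n",
        translate=lambda text, lang: demo_dictionary.get((text, lang), text),
        speak=lambda text, lang: spoken.append((text, lang)),
    )
    print(result, spoken)
```

Keeping the stages as injected callables means the translation service or the speech engine could be swapped without touching the control flow, which may be convenient on a device that is sometimes offline.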

Image Text to Speech (ITTS) conversion is an assistive technology that bridges the gap between visual and auditory information, making printed text accessible to visually impaired individuals. This project focuses on developing a system that captures text from images using a camera module, processes the image to extract text using Optical Character Recognition (OCR), and then converts the recognized text into audible speech using Text-to-Speech (TTS) synthesis. The implementation utilizes a Raspberry Pi as the core processing unit, integrated with a webcam for image capture and a speaker for audio output. The Python-based system employs libraries such as Tesseract OCR for text extraction and pyttsx3 or gTTS for speech generation.
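As a rough illustration of the Python side, the sketch below pairs Tesseract (through the `pytesseract` wrapper) with gTTS. It is an assumption-laden example rather than the project's code: the helper names `clean_ocr_text`, `extract_text`, and `speak_to_file`, the grayscale conversion, and the output file name are all hypothetical choices. The third-party imports are placed inside the functions, so the text-cleaning helper can be tried on a machine where Tesseract and gTTS are not installed.

```python
import re


def clean_ocr_text(raw: str) -> str:
    """Tidy raw OCR output so a speech engine reads it as continuous prose."""
    # Rejoin words split by a hyphen at a line break. This is a simplification:
    # a genuine hyphenated compound broken at that point loses its hyphen.
    text = re.sub(r"-\s*\n\s*", "", raw)
    # Collapse newlines and repeated spaces into single spaces.
    text = re.sub(r"\s+", " ", text)
    return text.strip()


def extract_text(image_path: str) -> str:
    """OCR an image file; needs Pillow, pytesseract, and the Tesseract binary."""
    from PIL import Image
    import pytesseract

    with Image.open(image_path) as img:
        # Grayscale conversion is an assumed preprocessing step.
        raw = pytesseract.image_to_string(img.convert("L"), lang="eng")
    return clean_ocr_text(raw)


def speak_to_file(text: str, lang: str = "en", out_path: str = "speech.mp3") -> str:
    """Synthesize speech to an MP3 file with gTTS (requires network access)."""
    from gtts import gTTS

    gTTS(text=text, lang=lang).save(out_path)
    return out_path


if __name__ == "__main__":
    print(clean_ocr_text("Image text to spe-\nech   conversion\n"))
```

On the device, the MP3 written by `speak_to_file` would still have to be played through the speaker by an audio player. If the offline pyttsx3 route is preferred instead, its `init()`, `say()`, and `runAndWait()` calls speak directly without producing a file.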

Key Words: Raspberry Pi (any model, e.g., Raspberry Pi), Camera Module / USB Webcam, Speaker / Audio Output, MicroSD Card, GPIO (if using external controls), Python, Bash / Shell Script, Linux (Raspberry Pi OS), Open Source Libraries, Automation Script, Crontab (for scheduling tasks)
