Images to Audio Story Generator by Using Python

Notification

Announcement!

ISJEM Invites papers for various areas like engineering, Management, Science & other multi discplinary subjects. Please submit your paper for review.

ISJEM assigns a digital object identifier (DOI) to each published paper, making it easier for the paper to be cited in various major databases like Google Scholar, ResearchGate, Academia.edu, etc…

ISJEM takes 24–48 hours to publish a research paper. Within 24 hours, the submitted paper will be reviewed and notified of its status, and it will be published once the processing fee is successfully received.

Images to Audio Story Generator by Using Python

Version

File Size 458.35 KB

Downloads 0

Files 1

Published 21 June 2026

Updated 21 June 2026

Images to Audio Story Generator by Using Python

1 SATISH VARADA,2VAKADA MAHENDRA REDDY

1Assistant Professor, 22MCA Final Semester, Master of Computer Applications, Sanketika Vidya Parishad Engineering College, Vishakhapatnam, Andhra Pradesh, India

Abstract

The Image to Audio Story Generator is a Python-based AI application developed using Streamlit that converts uploaded images into engaging audio stories. The project integrates computer vision, natural language processing, and text-to-speech technologies to create an automated storytelling experience. The system uses the Salesforce BLIP image captioning model to analyze images and generate descriptive text. The extracted image description is then processed using the OpenAI GPT-3.5 Turbo model to generate creative and context-aware short stories. Finally, the generated story is converted into audio format using the Google Text-to-Speech (gTTS) library, allowing users to listen to the AI-generated narrative. The application is built entirely in Python and provides a simple and user-friendly interface through Streamlit for uploading images, viewing generated stories, and playing audio output. This project demonstrates the practical integration of AI models, deep learning, and multimedia processing to enhance storytelling, creativity, and accessibility.

IndexTerms: Image Captioning; Audio Story Generation, Artificial Intelligence (AI), Natural Language Processing (NLP), Text-to-Speech (TTS), Generative AI.