Images to Audio Story Generator by Using Python
Images to Audio Story Generator by Using Python
1 SATISH VARADA,2VAKADA MAHENDRA REDDY
1Assistant Professor, 22MCA Final Semester, Master of Computer Applications, Sanketika Vidya Parishad Engineering College, Vishakhapatnam, Andhra Pradesh, India
Abstract
The Image to Audio Story Generator is a Python-based AI application developed using Streamlit that converts uploaded images into engaging audio stories. The project integrates computer vision, natural language processing, and text-to-speech technologies to create an automated storytelling experience. The system uses the Salesforce BLIP image captioning model to analyze images and generate descriptive text. The extracted image description is then processed using the OpenAI GPT-3.5 Turbo model to generate creative and context-aware short stories. Finally, the generated story is converted into audio format using the Google Text-to-Speech (gTTS) library, allowing users to listen to the AI-generated narrative. The application is built entirely in Python and provides a simple and user-friendly interface through Streamlit for uploading images, viewing generated stories, and playing audio output. This project demonstrates the practical integration of AI models, deep learning, and multimedia processing to enhance storytelling, creativity, and accessibility.
IndexTerms: Image Captioning; Audio Story Generation, Artificial Intelligence (AI), Natural Language Processing (NLP), Text-to-Speech (TTS), Generative AI.