Aawaz Aur Drishya Ka Setu: Podcast Audio-To-Image Generation using Automatic Speech Recognition, Summarization, and Generative AI
Dr. Vijayalaxmi Mekali, Akash S, Amruth CK, C Nagendra Reddy, G Akash
Department of Computer Science and Engineering
K. S. Institute of Technology, Bengaluru – 560109, India
vijayalaxmimekali@ksit.edu.in
{1KS23CS007, 1KS23CS010, 1KS23CS032, 1KS23CS044}@ksit.edu.in
Abstract- Podcasts have become a popular medium for sharing knowledge, storytelling, education, and entertainment. Because they rely almost entirely on audio, however, listeners have no visual cues for identifying key topics or following long conversations. This project, “Aawaz aur Drishya ka Setu” (Hindi for “a bridge between sound and visuals”), introduces an AI-based system that transforms podcast audio into meaningful visual representations. The system first transcribes podcast speech using automatic speech recognition and divides the transcript into coherent segments. Natural Language Processing is then applied to each segment to summarize its content, extract keywords, and identify its emotional tone. Based on the detected context and emotion, AI image generation models create visuals that represent the scene and mood of each segment. Color theory is also applied to the generated visuals to strengthen emotional connection and storytelling. The proposed system is intended to enhance podcast accessibility, user engagement, and content presentation for creators, educational platforms, media applications, and hearing-impaired users.
Keywords- Podcast Visualization, Speech Recognition, Natural Language Processing, Image Generation, Emotion Detection, Multimodal Learning.