Aawaz Aur Drishya Ka Setu: Podcast Audio-To-Image Generation using Automatic Speech Recognition, Summarization, and Generative AI



Published 11 May 2026

Updated 11 May 2026


Dr. Vijayalaxmi Mekali, Akash S, Amruth CK, C Nagendra Reddy, G Akash

Department of Computer Science and Engineering

K. S. Institute of Technology, Bengaluru – 560109, India
vijayalaxmimekali@ksit.edu.in

{1KS23CS007, 1KS23CS010, 1KS23CS032, 1KS23CS044}@ksit.edu.in

Abstract: Podcasts have become one of the most popular platforms for sharing knowledge, storytelling, education, and entertainment. However, because podcasts rely mainly on audio, listeners can find it difficult to grasp key topics visually and to follow long conversations. This project, “Aawaz aur Drishya ka Setu” (“the bridge between sound and vision”), introduces an AI-based system that transforms podcast audio into meaningful visual representations. The system converts podcast speech into text using automatic speech recognition (ASR) and divides the transcript into coherent sections. Natural language processing (NLP) is then used to summarize each section, extract its keywords, and identify its emotional tone. Based on the detected context and emotion, AI image generation models create visuals that depict the podcast's scenes and mood. Color theory is also applied so that the visuals reinforce the emotional tone and support storytelling. The proposed system enhances podcast accessibility, user engagement, and content presentation for creators, educational platforms, media applications, and hearing-impaired users.

Keywords: Podcast Visualization, Speech Recognition, Natural Language Processing, Image Generation, Emotion Detection, Multimodal Learning.
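The text stages of the pipeline described in the abstract, between speech recognition and image generation, can be sketched as follows. This is a minimal, stdlib-only sketch under stated assumptions, not the authors' implementation: ASR is assumed to have already produced a plain-text transcript; sections are fixed groups of three sentences; summarization is a simple frequency-based extractive method (swapped in, since the paper's summarizer is not specified here); keywords are the most frequent non-stopword tokens; emotion comes from a tiny hand-written lexicon rather than a trained classifier; and the emotion-to-palette table is an illustrative color-theory mapping. All names (`segment_transcript`, `summarize`, `extract_keywords`, `detect_emotion`, `build_prompt`, `EMOTION_LEXICON`, `EMOTION_PALETTE`) are hypothetical. The resulting prompt string would be passed to an external text-to-image model, which is not shown.

```python
import re
from collections import Counter

# Hypothetical stopword list; a real system would use a fuller, curated one.
STOPWORDS = {"a", "an", "and", "are", "every", "for", "from", "i", "in", "is",
             "it", "of", "on", "or", "so", "that", "the", "this", "to", "was",
             "we", "with", "you"}

# Hypothetical emotion lexicon standing in for a trained emotion classifier.
EMOTION_LEXICON = {
    "joy": {"happy", "excited", "love", "great", "wonderful"},
    "sadness": {"sad", "lost", "grief", "lonely", "miss"},
    "fear": {"afraid", "scared", "danger", "worried", "risk"},
    "anger": {"angry", "unfair", "furious", "hate"},
}

# Hypothetical color-theory mapping from detected emotion to a palette hint.
EMOTION_PALETTE = {
    "joy": "warm yellows and oranges",
    "sadness": "muted blues and greys",
    "fear": "dark purples and deep shadows",
    "anger": "saturated reds",
    "neutral": "soft natural tones",
}


def split_sentences(text):
    """Split on whitespace that follows '.', '!' or '?'."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(text):
    """Lowercase word tokens (letters and apostrophes only)."""
    return re.findall(r"[a-z']+", text.lower())


def segment_transcript(transcript, sentences_per_section=3):
    """Group consecutive ASR sentences into fixed-size sections."""
    sentences = split_sentences(transcript)
    return [" ".join(sentences[i:i + sentences_per_section])
            for i in range(0, len(sentences), sentences_per_section)]


def word_counts(section):
    """Frequencies of non-stopword tokens in a section."""
    return Counter(t for t in tokenize(section) if t not in STOPWORDS)


def summarize(section):
    """Extractive summary: the sentence whose words are most frequent overall."""
    counts = word_counts(section)
    # Counter returns 0 for missing keys, so stopwords contribute nothing.
    return max(split_sentences(section),
               key=lambda s: sum(counts[t] for t in tokenize(s)))


def extract_keywords(section, k=3):
    """The k most frequent non-stopword tokens (ties keep first-seen order)."""
    return [word for word, _ in word_counts(section).most_common(k)]


def detect_emotion(section):
    """Emotion with the most lexicon hits, or 'neutral' if there are none."""
    tokens = tokenize(section)
    scores = {emotion: sum(t in words for t in tokens)
              for emotion, words in EMOTION_LEXICON.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "neutral"


def build_prompt(section):
    """Combine summary, keywords, mood and palette into a text-to-image prompt."""
    emotion = detect_emotion(section)
    return (f"{summarize(section)} Key elements: "
            f"{', '.join(extract_keywords(section))}. "
            f"Mood: {emotion}. Palette: {EMOTION_PALETTE[emotion]}.")


if __name__ == "__main__":
    transcript = ("Climbing the mountain in winter means facing ice. "
                  "The mountain ice was thin and the climber was scared. "
                  "Every step on the ice was a risk. "
                  "The team reached the summit. "
                  "Everyone on the team was happy and excited about the view. "
                  "The view from the summit was wonderful.")
    # Each prompt would be sent to an external image model (not shown).
    for section in segment_transcript(transcript):
        print(build_prompt(section))
```

Keeping each stage in its own function means a trained emotion classifier or an abstractive summarizer could replace the lexicon or the extractive step without changing how the final prompt is assembled.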

International Scientific Journal of Engineering and Management

