GENERATING IMAGES FROM TEXT USING FINE-TUNED BERT IN A GAN FRAMEWORK
Published: 20 May 2025
Authors:
Mr. P. Chandrashekar¹, Salluri Sampath², Y. Shireesha³, Vemula Karthik⁴, Somu Sidharth Reddy⁵
¹ Assistant Professor, Department of Computer Science and Engineering, TKR College of Engineering and Technology.
²,³,⁴,⁵ UG Scholars, Department of Computer Science and Engineering, TKR College of Engineering and Technology, Medbowli, Meerpet.
ABSTRACT: Multimodal learning has recently gained significant attention for its potential to enhance AI performance and enable diverse applications. Text-to-image generation, a key multimodal task, seeks to produce images that semantically align with textual descriptions. Traditional approaches based on Generative Adversarial Networks (GANs) rely on text encoders pre-trained on image-text pairs; such encoders often fail to capture rich semantic detail for unseen text, limiting the fidelity of the generated images. This study proposes a text-to-image generation model that uses a fine-tuned BERT model as the text encoder. By fine-tuning BERT on a large text corpus, the model captures deeper textual semantics, leading to improved image generation quality. Experimental evaluation on a multimodal benchmark dataset demonstrates that the proposed approach outperforms baseline methods both quantitatively and qualitatively.
KEYWORDS: Multimodal Learning, Generative Adversarial Network (GAN), BERT Fine-Tuning, Natural Language Processing (NLP), Fréchet Inception Distance (FID), Inception Score (IS)
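The core idea described in the abstract is to condition a GAN generator on a sentence embedding produced by a fine-tuned BERT encoder. The following is a minimal PyTorch sketch of that conditioning step only, not the authors' actual architecture: the generator, its layer sizes, and the variable names are illustrative assumptions, and a random tensor stands in for the 768-dimensional BERT [CLS] embedding that a real pipeline would obtain from a fine-tuned `BertModel`.

```python
import torch
import torch.nn as nn


class ConditionalGenerator(nn.Module):
    """Toy GAN generator conditioned on a text embedding.

    The text embedding (e.g. a fine-tuned BERT [CLS] vector) is
    concatenated with a noise vector and mapped to an RGB image.
    """

    def __init__(self, text_dim=768, noise_dim=100, img_size=64):
        super().__init__()
        self.img_size = img_size
        self.net = nn.Sequential(
            nn.Linear(text_dim + noise_dim, 256),
            nn.ReLU(),
            nn.Linear(256, 3 * img_size * img_size),
            nn.Tanh(),  # pixel values in [-1, 1], as is conventional for GANs
        )

    def forward(self, text_emb, noise):
        # Condition on the text by concatenating embedding and noise.
        x = torch.cat([text_emb, noise], dim=1)
        out = self.net(x)
        return out.view(-1, 3, self.img_size, self.img_size)


# Stand-in for a fine-tuned BERT sentence embedding (batch of 2).
batch, text_dim, noise_dim = 2, 768, 100
text_emb = torch.randn(batch, text_dim)
noise = torch.randn(batch, noise_dim)

gen = ConditionalGenerator(text_dim, noise_dim)
images = gen(text_emb, noise)
print(images.shape)  # torch.Size([2, 3, 64, 64])
```

In a full system the discriminator would receive the same text embedding alongside real or generated images, so that the adversarial loss rewards semantic alignment between caption and image, which is what the fine-tuned encoder is intended to improve.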