GENERATING IMAGES FROM TEXT USING FINE-TUNED BERT IN A GAN FRAMEWORK
Published: 20 May 2025
Authors:
Mr. P. Chandrashekar¹, Salluri Sampath², Y. Shireesha³, Vemula Karthik⁴, Somu Sidharth Reddy⁵
¹ Assistant Professor, Department of Computer Science and Engineering, TKR College of Engineering and Technology.
²,³,⁴,⁵ UG Scholars, Department of Computer Science and Engineering, TKR College of Engineering and Technology, Medbowli, Meerpet.
ABSTRACT: Multimodal learning has recently gained significant attention for its potential to enhance AI performance and enable diverse applications. Text-to-image generation, a key multimodal task, seeks to produce images that semantically align with textual descriptions. Traditional approaches based on Generative Adversarial Networks (GANs) rely on text encoders pre-trained on image-text pairs; such encoders often fail to capture rich semantic detail for unseen text, limiting the fidelity of the generated images. This study proposes a text-to-image generation model that uses a fine-tuned BERT model as the text encoder. By fine-tuning BERT on a large text corpus, the model captures deeper textual semantics, leading to improved image generation quality. Experimental evaluation on a multimodal benchmark dataset demonstrates that the proposed approach outperforms baseline methods both quantitatively and qualitatively.
KEYWORDS: Multimodal Learning, Generative Adversarial Network (GAN), BERT Fine-Tuning, Natural Language Processing (NLP), Fréchet Inception Distance (FID), Inception Score (IS)
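The core idea described in the abstract is to condition a GAN generator on a sentence embedding produced by a fine-tuned BERT encoder. The following is a minimal PyTorch sketch of that conditioning step only, not the authors' actual architecture: the generator, its layer sizes, and the variable names are illustrative assumptions, and a random tensor stands in for the 768-dimensional BERT [CLS] embedding that a real pipeline would obtain from a fine-tuned `BertModel`.

```python
import torch
import torch.nn as nn


class ConditionalGenerator(nn.Module):
    """Toy GAN generator conditioned on a text embedding.

    The text embedding (e.g. a fine-tuned BERT [CLS] vector) is
    concatenated with a noise vector and mapped to an RGB image.
    """

    def __init__(self, text_dim=768, noise_dim=100, img_size=64):
        super().__init__()
        self.img_size = img_size
        self.net = nn.Sequential(
            nn.Linear(text_dim + noise_dim, 256),
            nn.ReLU(),
            nn.Linear(256, 3 * img_size * img_size),
            nn.Tanh(),  # pixel values in [-1, 1], as is conventional for GANs
        )

    def forward(self, text_emb, noise):
        # Condition on the text by concatenating embedding and noise.
        x = torch.cat([text_emb, noise], dim=1)
        out = self.net(x)
        return out.view(-1, 3, self.img_size, self.img_size)


# Stand-in for a fine-tuned BERT sentence embedding (batch of 2).
batch, text_dim, noise_dim = 2, 768, 100
text_emb = torch.randn(batch, text_dim)
noise = torch.randn(batch, noise_dim)

gen = ConditionalGenerator(text_dim, noise_dim)
images = gen(text_emb, noise)
print(images.shape)  # torch.Size([2, 3, 64, 64])
```

In a full system the discriminator would receive the same text embedding alongside real or generated images, so that the adversarial loss rewards semantic alignment between caption and image, which is what the fine-tuned encoder is intended to improve.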