Generate Corresponding Image from Text Description Using Modified GAN-CLS Algorithm

GONG Fuzhou, XIA Zigeng

Journal of Systems Science and Complexity, 2025

Abstract

Automatically synthesizing images or text has become a useful research area in artificial intelligence. Generative adversarial networks (GANs), proposed by Goodfellow et al. in 2014, make this task more efficient by using deep neural networks (DNNs). We consider generating images that correspond to a single-sentence input text description using a GAN. Specifically, we analyze the GAN-CLS algorithm, an advanced GAN method proposed by Reed et al. in 2016. In this paper, we show a theoretical problem with this algorithm and correct it by modifying the objective function of the model. Experiments on the Oxford-102 and CUB datasets support our theoretical results. Since our modification is an idea that can be applied to improve this whole class of GAN models, we test it on two models, GAN-CLS and AttnGANGPT. In both models, our modified algorithm is more stable and generates more plausible images than the original algorithm. Some of the generated images also match the input texts better, and the modified algorithm performs better on quantitative indicators, including FID and inception score. Finally, we discuss future applications of our modification idea, especially in the area of large language models.
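For reference, the original GAN-CLS discriminator, as described by Reed et al. (2016), scores three kinds of pairs: a real image with its matching text ($s_r$), a real image with a mismatching text ($s_w$, the negative example), and a generated image with the matching text ($s_f$). Its objective is $L_D = \log s_r + \tfrac{1}{2}\left(\log(1 - s_w) + \log(1 - s_f)\right)$, and the generator maximizes $\log s_f$. The sketch below computes these as batch-averaged losses to minimize. It assumes the discriminator already outputs probabilities in (0, 1), and the function names are hypothetical. It shows only the unmodified GAN-CLS objective that this paper analyzes, not the authors' corrected version, whose exact form the abstract does not state.

```python
import math
from typing import Sequence

EPS = 1e-12  # clamp probabilities away from 0 so log() stays finite


def _safe_log(p: float) -> float:
    """Natural log with a small floor to avoid log(0)."""
    return math.log(max(p, EPS))


def gan_cls_discriminator_loss(
    s_r: Sequence[float], s_w: Sequence[float], s_f: Sequence[float]
) -> float:
    """Negative GAN-CLS discriminator objective, averaged over a batch.

    s_r: D(real image, matching text) probabilities
    s_w: D(real image, mismatching text) probabilities (negative examples)
    s_f: D(generated image, matching text) probabilities
    """
    n = len(s_r)
    if n == 0 or len(s_w) != n or len(s_f) != n:
        raise ValueError("score sequences must be non-empty and equal length")
    total = 0.0
    for r, w, f in zip(s_r, s_w, s_f):
        # log s_r + 1/2 * (log(1 - s_w) + log(1 - s_f)), to be maximized by D
        total += _safe_log(r) + 0.5 * (_safe_log(1.0 - w) + _safe_log(1.0 - f))
    return -total / n  # negate so the discriminator minimizes it


def gan_cls_generator_loss(s_f: Sequence[float]) -> float:
    """Negative of log s_f averaged over a batch (generator maximizes log s_f)."""
    if len(s_f) == 0:
        raise ValueError("s_f must be non-empty")
    return -sum(_safe_log(f) for f in s_f) / len(s_f)


if __name__ == "__main__":
    # A discriminator that is unsure about everything (all scores 0.5)
    print(gan_cls_discriminator_loss([0.5], [0.5], [0.5]))  # 2 * ln 2
    print(gan_cls_generator_loss([0.5]))  # ln 2
```

A perfect discriminator ($s_r = 1$, $s_w = s_f = 0$) drives this loss to zero. Because the mismatched-text term and the fake-image term share a single weight of one half, raising $s_w$ penalizes the discriminator exactly as raising $s_f$ does.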

Key words

Deep learning / Generative adversarial networks / Negative examples / Text-to-image synthesis

Cite this article

GONG Fuzhou, XIA Zigeng. Generate Corresponding Image from Text Description Using Modified GAN-CLS Algorithm. Journal of Systems Science and Complexity, 2025.

Funding

This research was supported by the National Natural Science Foundation of China (No.12288201).