Coder Social home page Coder Social logo

awesome-text-to-image's Introduction

Awesome Text_to_Image papers

Awesome

A collection of resources on text-to-image synthesis task.

Content

1.Description

  • In the last few decades, the fields of Computer Vision (CV) and Natural Language Processing (NLP) have been made several major technological breakthroughs in deep learning research. Recently, researchers appear interested in combining semantic information and visual information in these traditionally independent fields. A number of studies have been conducted on the text-to-image synthesis techniques that transfer input textual description (keywords or sentences) into realistic images.

  • Papers, codes and datasets for the text-to-image task are available here.

2.Quantitative Evaluation Metrics

3.Datasets

  • Caltech-UCSD Bird(CUB)

    Caltech-UCSD Birds-200-2011 (CUB-200-2011) is an extended version of the CUB-200 dataset, with roughly double the number of images per class and new part location annotations.

    • Detailed information (Images): ⇒ [Paper] [Website]
      • Number of different categories: 200 (Training: 150 categories. Testing: 50 categories.)
      • Number of bird images: 11,788
      • Annotations per image: 15 Part Locations, 312 Binary Attributes, 1 Bounding Box, Ground-truth Segmentation
    • Detailed information (Text Descriptions): ⇒ [Paper] [Website]
      • Descriptions per image: 10 Captions
  • Oxford-102 Flower

    Oxford-102 Flower is a 102 category dataset, consisting of 102 flower categories. The flowers are chosen to be flower commonly occurring in the United Kingdom. The images have large scale, pose and light variations.

    • Detailed information (Images): ⇒ [Paper] [Website]
      • Number of different categories: 102 (Training: 82 categories. Testing: 20 categories.)
      • Number of flower images: 8,189
    • Detailed information (Text Descriptions): ⇒ [Paper] [Website]
      • Descriptions per image: 10 Captions
  • MS-COCO

    COCO is a large-scale object detection, segmentation, and captioning dataset.

    • Detailed information (Images & Text Descriptions): ⇒ [Paper] [Website]
      • Number of images: 120k (Training: 80k. Testing: 40k.)
      • Descriptions per image: 5 Captions
  • Multi-Modal-CelebA-HQ

    Multi-Modal-CelebA-HQ is a large-scale face image dataset for text-to-image-generation, text-guided image manipulation, sketch-to-image generation, GANs for face generation and editing, image caption, and VQA.

    • Detailed information (Images & Text Descriptions): ⇒ [Paper] [Website] [Download]
      • Number of images (from Celeba-HQ): 30,000 (Training: 24,000. Testing: 6,000.)
      • Descriptions per image: 10 Captions
    • Detailed information (Masks):
      • Number of masks (from Celeba-Mask-HQ): 30,000 (512 x 512)
    • Detailed information (Sketches):
      • Number of Sketches: 30,000 (512 x 512)
    • Detailed information (Image with transparent background):
      • Not fully uploaded

4.Paper With Code

  • Survey

    • (2021) Adversarial Text-to-Image Synthesis: A Review, Stanislav Frolov et al. [Paper]
    • (2019) A Survey and Taxonomy of Adversarial Neural Networks for Text-to-Image Synthesis, Jorge Agnese et al. [Paper]
  • 2021

    • (arXiv preprint 2021) Cross-Modal Contrastive Learning for Text-to-Image Generation, Han Zhang et al. [Paper]
    • ⭐(arXiv preprint 2021) CogView: Mastering Text-to-Image Generation via Transformers, Ming Ding et al. [Paper] [Code] [Demo Website(Chinese)]
    • (CVPR 2021) TediGAN: Text-Guided Diverse Image Generation and Manipulation, Weihao Xia et al. [Paper] [Code] [Dataset] [Colab] [Video]
    • ⭐(arXiv preprint 2021) Zero-Shot Text-to-Image Generation, Aditya Ramesh et al. [Paper] [Code] [Blog] [Model Card] [Colab] [Code(Pytorch)]
    • (Pattern Recognition 2021) Unsupervised text-to-image synthesis, Yanlong Dong et al. [Paper]
    • (WACV 2021) Faces a la Carte: Text-to-Face Generation via Attribute Disentanglement, Tianren Wang et al. [Paper]
    • (WACV 2021) Text-to-Image Generation Grounded by Fine-Grained User Attention, Jing Yu Koh et al. [Paper]
    • (arXiv preprint 2021) Cross-Modal Contrastive Learning for Text-to-Image Generation, Han Zhang et al. [Paper]
  • 2020

    • (WIREs Data Mining and Knowledge Discovery 2020) A survey and taxonomy of adversarial neural networks for text-to-image synthesis, Jorge Agnese et al. [Paper]
    • (TPAMI 2020) Semantic Object Accuracy for Generative Text-to-Image Synthesis, Tobias Hinz et al. [Paper] [Code]
    • (TIP 2020) KT-GAN: Knowledge-Transfer Generative Adversarial Network for Text-to-Image Synthesis, Hongchen Tan et al. [Paper]
    • (ACM Trans 2020) End-to-End Text-to-Image Synthesis with Spatial Constrains, Min Wang et al. [Paper]
    • (Neural Networks) Image manipulation with natural language using Two-sided Attentive Conditional Generative Adversarial Network, DaweiZhu et al. [Paper]
    • (IEEE Access 2020) TiVGAN: Text to Image to Video Generation With Step-by-Step Evolutionary Generator, Doyeon Kim et al. [Paper]
    • (IEEE Access 2020) Dualattn-GAN: Text to Image Synthesis With Dual Attentional Generative Adversarial Network, Yali Cai et al. [Paper]
    • (ECCV 2020) CPGAN: Content-Parsing Generative Adversarial Networks for Text-to-Image Synthesis, Jiadong Liang et al. [Paper] [Code]
    • (CVPR 2020) RiFeGAN: Rich Feature Generation for Text-to-Image Synthesis From Prior Knowledge, Jun Cheng et al. [Paper]
    • (CVPR 2020) CookGAN: Causality based Text-to-Image Synthesis, Bin Zhu et al. [Paper]
    • (CVPR 2020 - Workshop) SegAttnGAN: Text to Image Generation with Segmentation Attention, Yuchuan Gou et al. [Paper]
    • (IVPR 2020) PerceptionGAN: Real-world Image Construction from Provided Text through Perceptual Understanding, Kanish Garg et al. [Paper]
    • (COLING 2020) Leveraging Visual Question Answering to Improve Text-to-Image Synthesis, Stanislav Frolov et al. [Paper]
    • (IRCDL 2020) Text-to-Image Synthesis Based on Machine Generated Captions, Marco Menardi et al. [Paper]
    • (arXiv preprint 2020) TIME: Text and Image Mutual-Translation Adversarial Networks, Bingchen Liu et al. [Paper]
    • (arXiv preprint 2020) DF-GAN: Deep fusion generative adversarial networks for Text-to-Image synthesis, Ming Tao et al. [Paper] [Code]
    • (arXiv preprint 2020) MPG: A Multi-ingredient Pizza Image Generator with Conditional StyleGANs, Fangda Han et al. [Paper]
  • 2019

    • (IEEE TCSVT 2019) Bridge-GAN: Interpretable Representation Learning for Text-to-image Synthesis, Mingkuan Yuan et al. [Paper] [Code]
    • (AAAI 2019) Perceptual Pyramid Adversarial Networks for Text-to-Image Synthesis, Minfeng Zhu et al. [Web]
    • (AAAI 2019) Adversarial Learning of Semantic Relevance in Text to Image Synthesis, Miriam Cha et al. [Web]
    • (NIPS 2019) Learn, Imagine and Create: Text-to-Image Generation from Prior Knowledge, Tingting Qiao et al. [Paper] [Code]
    • (NIPS 2019) Controllable Text-to-Image Generation, Bowen Li et al. [Paper] [Code]
    • (CVPR 2019) DM-GAN: Dynamic Memory Generative Adversarial Networks for Text-to-Image Synthesis, Minfeng Zhu et al. [Paper] [Code]
    • (CVPR 2019) Object-driven Text-to-Image Synthesis via Adversarial Training, Wenbo Li et al. [Paper] [Code]
    • (CVPR 2019) MirrorGAN: Learning Text-to-image Generation by Redescription, Tingting Qiao et al. [Paper] [Code]
    • (CVPR 2019) Text2Scene: Generating Abstract Scenes from Textual Descriptions, Fuwen Tan et al. [Paper] [Code]
    • (CVPR 2019) Semantics Disentangling for Text-to-Image Generation, Guojun Yin et al. [Paper] [Website]
    • (CVPR 2019) Text Guided Person Image Synthesis, Xingran Zhou et al. [Paper]
    • (ICCV 2019) Semantics-Enhanced Adversarial Nets for Text-to-Image Synthesis, Hongchen Tan et al. [Paper]
    • (ICCV 2019) Dual Adversarial Inference for Text-to-Image Synthesis, Qicheng Lao et al. [Paper]
    • (ICCV 2019) Tell, Draw, and Repeat: Generating and Modifying Images Based on Continual Linguistic Instruction, Alaaeldin El-Nouby et al. [Paper] [Code]
    • (BMVC 2019) MS-GAN: Text to Image Synthesis with Attention-Modulated Generators and Similarity-aware Discriminators, Fengling Mao et al. [Paper]
    • (arXiv preprint 2019) GILT: Generating Images from Long Text, Ori Bar El et al. [Paper] [Code]
  • 2018

    • (TPAMI 2018) StackGAN++: Realistic Image Synthesis with Stacked Generative Adversarial Networks, Han Zhang et al. [Paper] [Code]
    • (BMVC 2018) MC-GAN: Multi-conditional Generative Adversarial Network for Image Synthesis, Hyojin Park et al. [Paper] [Code]
    • (CVPR 2018) AttnGAN: Fine-grained text to image generation with attentional generative adversarial networks, Tao Xu et al. [Paper] [Code]
    • (CVPR 2018) Photographic Text-to-Image Synthesis with a Hierarchically-nested Adversarial Network, Zizhao Zhang et al. [Paper] [Code]
    • (CVPR 2018) Inferring Semantic Layout for Hierarchical Text-to-Image Synthesis, Seunghoon Hong et al. [Paper]
    • (CVPR 2018) Image Generation from Scene Graphs, Justin Johnson et al. [Paper] [Code]
    • (NIPS 2018) Text-adaptive generative adversarial networks: Manipulating images with natural language, Seonghyeon Nam et al. [Paper] [Code]
    • (ICLR 2018 - Workshop) ChatPainter: Improving Text to Image Generation using Dialogue, Shikhar Sharma et al. [Paper]
    • (ACMMM 2018) Text-to-image Synthesis via Symmetrical Distillation Networks, Mingkuan Yuan et al. [Paper]
    • (WACV 2018) C4Synth: Cross-Caption Cycle-Consistent Text-to-Image Synthesis, K. J. Joseph et al. [Paper]
    • (arXiv preprint 2018) Text to Image Synthesis Using Generative Adversarial Networks, Cristian Bodnar. [Paper]
    • (arXiv preprint 2018) Text-to-image-to-text translation using cycle consistent adversarial networks, Satya Krishna Gorti et al. [Paper] [Code]
  • 2017

    • (ICCV 2017) StackGAN: Text to photo-realistic image synthesis with stacked generative adversarial networks, Han Zhang et al. [Paper] [Code]
    • (ICIP 2017) I2T2I: Learning Text to Image Synthesis with Textual Data Augmentation, Hao Dong et al. [Paper] [Code]
    • (MLSP 2017) Adversarial nets with perceptual losses for text-to-image synthesis, Miriam Cha et al. [Paper]
  • 2016

    • (ICML 2016) Generative Adversarial Text to Image Synthesis, Scott Reed et al. [Paper] [Code]
    • (NIPS 2016) Learning What and Where to Draw, Scott Reed et al. [Paper] [Code]

5. Other Related Works

  • Label-set → Semantic maps

    • (ECCV 2020) Controllable image synthesis via SegVAE, Yen-Chi Cheng et al. [Paper] [Code]
  • Text+Image → Image

    • (arXiv preprint 2021) StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery, Or Patashnik et al. [Paper] [Code]
    • (arXiv preprint 2021) Paint by Word, David Bau et al. [Paper]
    • ⭐(arXiv preprint 2021) Zero-Shot Text-to-Image Generation, Aditya Ramesh et al. [Paper] [Code] [Blog] [Model Card] [Colab]
    • (NIPS 2020) Lightweight Generative Adversarial Networks for Text-Guided Image Manipulation, Bowen Li et al. [Paper]
    • (CVPR 2020) ManiGAN: Text-Guided Image Manipulation, Bowen Li et al. [Paper] [Code]
    • (ACMMM 2020) Text-Guided Neural Image Inpainting, Lisai Zhang et al. [Paper] [Code]
    • (ACMMM 2020) Describe What to Change: A Text-guided Unsupervised Image-to-Image Translation Approach, Yahui Liu et al. [Paper]
  • Layout → Image

    • (CVPR 2021) Context-Aware Layout to Image Generation with Enhanced Object Appearance, Sen He et al. [Paper] [Code]
  • Speech → Image

    • (IEEE Journal of Selected Topics in Signal Processing 2020) Direct Speech-to-Image Translation, Jiguo Li et al. [Paper] [Code] [Project]
  • Text → Visual Retrieval

    • (CVPR 2021) T2VLAD: Global-Local Sequence Alignment for Text-Video Retrieval, Xiaohan Wang et al. [Paper]
    • (CVPR 2021) Thinking Fast and Slow: Efficient Text-to-Visual Retrieval with Transformers, Antoine Miech et al. [Paper]
  • Text → Video

    • (arXiv preprint 2021) GODIVA: Generating Open-DomaIn Videos from nAtural Descriptions, Chenfei Wu et al. [Paper]
    • (arXiv preprint 2021) Text2Video: Text-driven Talking-head Video Synthesis with Phonetic Dictionary, Sibo Zhang et al. [Paper]
    • (IEEE Access 2020) TiVGAN: Text to Image to Video Generation With Step-by-Step Evolutionary Generator, DOYEON KIM et al. [Paper]
    • (IJCAI 2019) Conditional GAN with Discriminative Filter Generation for Text-to-Video Synthesis, Yogesh Balaji et al. [Paper]
    • (IJCAI 2019) IRC-GAN: Introspective Recurrent Convolutional GAN for Text-to-video Generation, Kangle Deng et al. [Paper]
    • (AAAI 2018) Video Generation From Text, Yitong Li et al. [Paper]
    • (ACMMM 2017) To create what you tell: Generating videos from captions, Yingwei Pan et al. [Paper]

Contact Me

awesome-text-to-image's People

Contributors

yutong-zhou-cv avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.