Analysis-of-Faces-and-Facial-Details-Generated-by-Text-to-Image-Models

Overview

In this project, we reproduced the results of Borji and extended the analysis to finer facial details, namely the eyes and mouth. We also added a new diffusion model, ERNIE-ViLG, which reports a state-of-the-art zero-shot FID score of 6.75 on MS-COCO, outperforming Google’s Imagen. We find that, despite this state-of-the-art result, ERNIE-ViLG has the highest FID score (lower is better) in every category. DALL-E 2 struggles more with generating natural eyes, while Midjourney falls short on mouth details. We explore possible reasons and include some analysis of the model.

Instructions

I do not recommend rerunning any of the code that generates or extracts images. The datasets needed to reproduce the results in the report are listed under the Dataset section below, and you can run the notebook directly to regenerate the plots. If you are interested in how the FID scores were calculated, go here to see which datasets were used. From there, you can trace how each dataset was produced in the corresponding Jupyter notebook.

Codes

  1. Code for extracting face images from the COCO dataset.
  2. Code for extracting the eye area from face images.
  3. Code for extracting the mouth from face images.
  4. Code for generating face images with ERNIE-ViLG. You will not be able to generate images without a valid API key and secret key.
  5. Code for calculating FID scores.
  6. Code for plotting FID scores.
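
For readers unfamiliar with the metric computed in item 5, the following is a minimal sketch of the Fréchet distance that underlies FID. It is not the code used in this repository. It assumes the image features (normally Inception activations) have already been extracted into two NumPy arrays, and the function name `frechet_distance` is hypothetical.

```python
import numpy as np


def frechet_distance(feats_a, feats_b):
    """Frechet distance between Gaussians fitted to two feature sets.

    feats_a, feats_b: arrays of shape (n_samples, n_features).
    Assumption: features were extracted beforehand (FID normally uses
    Inception pooling-layer activations); this sketch does not do that.
    """
    feats_a = np.asarray(feats_a, dtype=np.float64)
    feats_b = np.asarray(feats_b, dtype=np.float64)

    # Fit a Gaussian (mean and covariance) to each feature set.
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)

    # Tr((cov_a cov_b)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of cov_a @ cov_b, which are real and non-negative for
    # positive semi-definite inputs. Clip small negative numerical noise.
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()

    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * trace_sqrt)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    real = rng.normal(size=(500, 8))
    fake = rng.normal(loc=0.5, size=(500, 8))
    print("distance(real, real):", frechet_distance(real, real))
    print("distance(real, fake):", frechet_distance(real, fake))
```

A lower value means the two feature distributions are closer, which is why a higher FID indicates generated images that are further from the real ones.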

Dataset

  1. GFW: face image dataset provided by Borji.
  2. COCO: face images extracted from COCO, and the same face images resized to 100 by 100 for FID calculation against the real GFW face images.
  3. Real images from GFW: face images, eye images, mouth images.
  4. ERNIE-ViLG: extracted face images after manual filtering, eye images, mouth images, face images resized to 100 by 100.
  5. Stable Diffusion: extracted face images, eye images, mouth images.
  6. DALL-E 2: extracted face images, eye images, mouth images.
  7. Midjourney: extracted face images, eye images, mouth images.
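
The 100 by 100 resizing mentioned above could be done along the following lines. This is a sketch assuming Pillow is installed. The function name `resize_images`, the directory arguments, and the choice of bicubic resampling are illustrative assumptions and are not taken from the repository's notebooks.

```python
from pathlib import Path

from PIL import Image


def resize_images(src_dir, dst_dir, size=(100, 100)):
    """Resize every PNG/JPEG image in src_dir and save it under dst_dir.

    Assumption: images are stretched to `size` without preserving the
    aspect ratio; the original notebooks may handle this differently.
    Returns the number of images written.
    """
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)

    count = 0
    for path in sorted(src_dir.iterdir()):
        if path.suffix.lower() not in {".png", ".jpg", ".jpeg"}:
            continue  # skip non-image files
        with Image.open(path) as img:
            resized = img.convert("RGB").resize(size, Image.BICUBIC)
            resized.save(dst_dir / path.name)
        count += 1
    return count
```

Resizing both sets to the same resolution keeps the comparison consistent, since FID features are sensitive to image size and resampling.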
