Analysis of Faces and Facial Details Generated by Text-to-Image Models

Overview

In this project, we reproduce the results of Borji and extend the analysis to facial details, specifically eyes and mouths. We also include a new diffusion model, ERNIE-ViLG, which achieves a state-of-the-art zero-shot FID score of 6.75 on MS-COCO, outperforming Google's Imagen. We find that, despite being a state-of-the-art text-to-image model, ERNIE-ViLG has the highest (that is, worst, since lower FID is better) FID score in every category. DALL-E 2 struggles more with generating natural eyes, while Midjourney falls short on mouth details. We explore possible reasons and include some analysis of the model.

Instructions

I do not recommend rerunning any of the code that generates or extracts images. The datasets needed to reproduce the results in the report are listed under the Dataset section below, and you can run the plotting notebook directly to regenerate the plots. If you are interested in how the FID scores were calculated, see the code for calculating FID scores, which shows which datasets were used. From there, you can trace how each dataset was produced in the corresponding Jupyter notebook.

Code

Code for extracting face images from the COCO dataset.
Code for extracting the eye region from face images.
Code for extracting the mouth region from face images.
Code for generating face images with ERNIE-ViLG: you cannot generate images without a valid API key and secret key.
Code for calculating FID scores.
Code for plotting FID scores.
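
For readers who want to see what an FID score measures without opening the notebook, here is a minimal sketch of the Fréchet distance between two sets of feature vectors. It is an illustration under stated assumptions, not the code used in this repository: it assumes the features (normally Inception-v3 activations) have already been extracted into arrays with one row per image, and the function name frechet_distance is hypothetical. The distance is ||mu_a - mu_b||^2 + Tr(C_a + C_b - 2 (C_a C_b)^(1/2)), where mu and C are the mean and covariance of each feature set.

```python
import numpy as np


def frechet_distance(feats_a, feats_b):
    """Frechet distance between Gaussians fitted to two feature sets.

    Each input is an array of shape (n_samples, n_features).
    Assumption: features were extracted beforehand (e.g. by Inception-v3).
    """
    feats_a = np.asarray(feats_a, dtype=np.float64)
    feats_b = np.asarray(feats_b, dtype=np.float64)

    # Mean and covariance of each feature set.
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.atleast_2d(np.cov(feats_a, rowvar=False))
    cov_b = np.atleast_2d(np.cov(feats_b, rowvar=False))

    # Tr((cov_a cov_b)^(1/2)) is the sum of the square roots of the
    # eigenvalues of cov_a @ cov_b, which are real and non-negative
    # up to numerical noise for covariance matrices.
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()

    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * trace_sqrt)
```

A lower value means the two feature distributions are closer, so a model with the highest FID against the real GFW images is the one whose outputs differ most from real faces.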

Dataset

GFW: face image dataset provided by Borji.
COCO: face images extracted from COCO, resized to 100 by 100 for FID calculation against the real GFW face images.
Real images from GFW: face images, eye images, mouth images.
ERNIE-ViLG: face images extracted after manual filtering, eye images, mouth images, and face images resized to 100 by 100.
Stable Diffusion: extracted face images, eye images, mouth images.
DALL-E 2: extracted face images, eye images, mouth images.
Midjourney: extracted face images, eye images, mouth images.
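
Several of the datasets above are resized to 100 by 100 before FID calculation. The sketch below shows one way such a resize could be done with Pillow. It is only an assumption about the procedure: the function names, the bicubic resampling filter, the PNG file pattern, and the directory arguments are all hypothetical and may differ from what the notebooks actually do.

```python
from pathlib import Path

from PIL import Image


def resize_face(img, size=100):
    """Return an RGB copy of img resized to size x size pixels.

    Assumption: bicubic resampling, and the aspect ratio is not preserved.
    """
    return img.convert("RGB").resize((size, size), Image.BICUBIC)


def resize_folder(src_dir, dst_dir, size=100):
    """Resize every PNG in src_dir and save it under dst_dir (hypothetical paths)."""
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    count = 0
    for path in sorted(src.glob("*.png")):
        with Image.open(path) as img:
            resize_face(img, size).save(dst / path.name)
        count += 1
    return count
```

Resizing both sides of a comparison to the same resolution keeps image size itself from contributing to the FID difference.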

Repository: xiaonazhou/analysis-of-faces-and-facial-details-generated-by-text-to-image-models