
HPS v2: Benchmarking Text-to-Image Generative Models


This is the official repository for the paper: Human Preference Score v2: A Solid Benchmark for Evaluating Human Preferences of Text-to-Image Synthesis.

Updates

Overview


Human Preference Dataset v2 (HPD v2): a large-scale (798k preference choices / 430k images), well-annotated dataset of human preference choices on images generated by text-to-image generative models.

Human Preference Score v2 (HPS v2): a preference prediction model trained on HPD v2. HPS v2 can be used to compare images generated with the same prompt. We also provide a fair, stable, and easy-to-use set of evaluation prompts for text-to-image generative models.

The HPS v2 benchmark

The HPS v2 benchmark evaluates models' capability of generating images in 4 styles: Animation, Concept-art, Painting, and Photo.

The benchmark is actively updated; email us at [email protected] or raise an issue if you feel your model/method should be included in this benchmark!

Model Animation Concept-art Painting Photo Averaged
Dreamlike Photoreal 2.0 28.24 27.60 27.59 27.99 27.86
SDXL Refiner 0.9 28.45 27.66 27.67 27.46 27.80
Realistic Vision 28.22 27.53 27.56 27.75 27.77
SDXL Base 0.9 28.42 27.63 27.60 27.29 27.73
Deliberate 28.13 27.46 27.45 27.62 27.67
ChilloutMix 27.92 27.29 27.32 27.61 27.54
MajicMix Realistic 27.88 27.19 27.22 27.64 27.48
Openjourney 27.85 27.18 27.25 27.53 27.45
DeepFloyd-XL 27.64 26.83 26.86 27.75 27.27
Epic Diffusion 27.57 26.96 27.03 27.49 27.26
Stable Diffusion v2.0 27.48 26.89 26.86 27.46 27.17
Stable Diffusion v1.4 27.26 26.61 26.66 27.27 26.95
DALL·E 2 27.34 26.54 26.68 27.24 26.95
Versatile Diffusion 26.59 26.28 26.43 27.05 26.59
CogView2 26.50 26.59 26.33 26.44 26.47
VQGAN + CLIP 26.44 26.53 26.47 26.12 26.39
DALL·E mini 26.10 25.56 25.56 26.12 25.83
Latent Diffusion 25.73 25.15 25.25 26.97 25.78
FuseDream 25.26 25.15 25.13 25.57 25.28
VQ-Diffusion 24.97 24.70 25.01 25.71 25.10
LAFITE 24.63 24.38 24.43 25.81 24.81
GLIDE 23.34 23.08 23.27 24.50 23.55

Quick Start

Installation

# Method 1: install from PyPI
pip install hpsv2

# Method 2: install locally
git clone https://github.com/tgxs002/HPSv2.git
cd HPSv2
python -m pip install . 

# Optional: checkpoint and images will be downloaded here
# default: ~/.cache/hpsv2/
export HPS_ROOT=/your/cache/path

After installation, the following sections show how to compare images generated from the same prompt, reproduce our benchmark results, and evaluate your own text-to-image model.

We also provide command line interfaces for debugging purposes.

Image Comparison

You can score and compare several images generated by the same prompt by running the following code:

import hpsv2

result = hpsv2.score(imgs_path, '<prompt>') 
# imgs_path is a list of image paths, with the images generated by the same prompt

Note: Comparison is only meaningful for images generated by the same prompt.
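
For example, to pick the best of several candidates, here is a minimal sketch (the image file names below are hypothetical, and it assumes score returns one score per image in the same order as imgs_path):

import hpsv2

# Hypothetical candidate images, all generated from the same prompt.
imgs_path = ["candidate_0.jpg", "candidate_1.jpg", "candidate_2.jpg"]
prompt = "A cat with two horns on its head"

# Assumption: hpsv2.score returns one score per image, aligned with imgs_path.
scores = hpsv2.score(imgs_path, prompt)
best_score, best_image = max(zip(scores, imgs_path))
print("Best image:", best_image, "with score", best_score)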

Benchmark Reproduction

We also provide the images generated by the models in our benchmark for evaluation. You can easily download the data and evaluate the models by running the following code.

import hpsv2

print(hpsv2.available_models) # Models for which pre-generated benchmark images are available
hpsv2.evaluate_benchmark('<model_name>')
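
For instance, a minimal sketch (assuming every name in available_models can be passed directly to evaluate_benchmark) that scores all models with published benchmark images:

import hpsv2

# Assumption: each entry of available_models is a valid argument to evaluate_benchmark.
for model_name in hpsv2.available_models:
    print("Evaluating", model_name)
    hpsv2.evaluate_benchmark(model_name)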

Custom Evaluation

To evaluate your own text-to-image generative model, you can prepare the images for evaluation based on the benchmark prompts we provide by running the following code:

import os
import hpsv2

# Get benchmark prompts (<style> = all, anime, concept-art, paintings, photo)
all_prompts = hpsv2.benchmark_prompts('all')

# Iterate over the benchmark prompts to generate images
for style, prompts in all_prompts.items():
    os.makedirs(os.path.join("<image_path>", style), exist_ok=True)
    for idx, prompt in enumerate(prompts):
        image = TextToImageModel(prompt)
        # TextToImageModel is the model you want to evaluate
        image.save(os.path.join("<image_path>", style, f"{idx:05d}.jpg"))
        # <image_path> is the folder used to store generated images; it is later passed to hpsv2.evaluate().
        # Image names follow the pattern '00xxx.jpg', with 'xxx' ranging from '000' to '799',
        # one image per benchmark prompt.
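
As a concrete illustration only, TextToImageModel above could be backed by a Hugging Face diffusers pipeline (diffusers is not a dependency of this repo, and the model name below is just an example):

from diffusers import StableDiffusionPipeline

# Hypothetical backend for TextToImageModel (assumes the diffusers package and
# the runwayml/stable-diffusion-v1-5 weights are available).
pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5").to("cuda")

def TextToImageModel(prompt):
    # Returns a PIL image supporting .save(), as expected by the loop above.
    return pipe(prompt).images[0]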

And then run the following code to conduct evaluation:

import hpsv2

hpsv2.evaluate("<image_path>")
# <image_path> is the same folder as <image_path> in the previous part

Preference Model Evaluation

Evaluating HPS v2's correlation with human preference choices:

Model Acc. on ImageReward test set (%) Acc. on HPD v2 test set (%) Acc. on new test set (%)
Aesthetic Score Predictor 57.4 76.8 -
ImageReward 65.1 74.0 -
HPS 61.2 77.6 -
PickScore 62.9 79.8 -
Single Human 65.3 78.1 65.4*
HPS v2 65.7 83.3 73.2*

* The new test set is another test set annotated similarly to the HPD v2 test set, except that images are generated from 10 better models (Dreamlike Photoreal 2.0, SDXL Refiner 0.9, Realistic Vision, SDXL Base 0.9, Deliberate, ChilloutMix, MajicMix Realistic, Openjourney, DeepFloyd-XL, Epic Diffusion).

The HPS v2 checkpoint can be downloaded from here. The model and a live demo are also hosted on 🤗 Hugging Face here.

Run the following commands to evaluate the HPS v2 model on HPD v2 test set and ImageReward test set (Need to install the package hpsv2 first):

# evaluate on HPD v2 test set
python evaluate.py --data-type test --data-path /path/to/HPD --image-path /path/to/image_folder

# evaluate on ImageReward test set
python evaluate.py --data-type ImageReward --data-path /path/to/IR --image-path /path/to/image_folder

Human Preference Dataset v2

The prompts in our dataset are sourced from DiffusionDB and MSCOCO Captions. Prompts from DiffusionDB are first cleaned by ChatGPT to remove biased function words. Human annotators were tasked with ranking images generated by different text-to-image generative models from the same prompt. In total, there are about 798k pairwise comparisons over 430k images and 107k prompts: 645k pairs for the training split and 153k pairs for the test split.

Image sources of HPD v2:

Source # of images
CogView2 73697
DALL·E 2 101869
GLIDE (mini) 400
Stable Diffusion v1.4 101869
Stable Diffusion v2.0 101869
LAFITE 400
VQ-GAN+CLIP 400
VQ-Diffusion 400
FuseDream 400
COCO Captions 28272

Currently, the test data can be downloaded from here. You can inspect the test data at https://tgxs002.github.io/hpd_test_vis/.

The training dataset will be released soon. Once unzipped, you should get a folder with the following structure:

HPD
---- train/
-------- {image_id}.jpg
---- test/
-------- {image_id}.jpg
---- train.json
---- test.json
---- benchmark/
-------- benchmark_imgs/
------------ {model_id}/
---------------- {image_id}.jpg
-------- drawbench/
------------ {model_id}/
---------------- {image_id}.jpg
-------- anime.json
-------- concept-art.json
-------- paintings.json
-------- photo.json
-------- drawbench.json

The annotation file, train.json, is organized as:

[
    {
        'human_preference': list[int], # 1 for preference
        'prompt': str,
        'file_path': list[str],
        'user_hash': str,
    },
    ...
]
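
For illustration, a minimal sketch (assuming a local HPD folder with the train.json layout above, where human_preference holds a 1 at the index of the preferred image) that prints the preferred image of the first few pairs:

import json

# Assumption: HPD/train.json follows the layout shown above.
with open("HPD/train.json") as f:
    pairs = json.load(f)

for pair in pairs[:5]:
    # human_preference contains a 1 at the index of the preferred image in file_path.
    preferred = pair["file_path"][pair["human_preference"].index(1)]
    print(pair["prompt"], "->", preferred)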

The annotation file, test.json, is organized as:

[
    {
        'prompt': str,
        'image_path': list[str],
        'rank': list[int], # averaged ranking result for image at the same index in image_path,
        'raw_annotation': list[{'rank', 'user_hash'}]  # raw ranking result from each annotator
    },
    ...
]
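
Similarly, a minimal sketch (assuming the test.json layout above and that a lower averaged rank means a more preferred image) that prints the top-ranked image for the first prompt:

import json

# Assumption: HPD/test.json follows the layout shown above; lower rank = more preferred.
with open("HPD/test.json") as f:
    annotations = json.load(f)

entry = annotations[0]
best_index = min(range(len(entry["rank"])), key=lambda i: entry["rank"][i])
print(entry["prompt"], "->", entry["image_path"][best_index])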

The benchmark prompt files, e.g. anime.json, contain plain prompts. The corresponding image can be found in the folder of the corresponding model by indexing the prompt.

Command Line Interface

Evaluating Text-to-image Generative Models using HPS v2

The generated images in our experiments can be downloaded from here.

The following commands reproduce the benchmark table and our results on DrawBench reported in the paper (Need to install the package hpsv2 first):

# HPS v2 benchmark (for more than one model)
python evaluate.py --data-type benchmark_all --data-path /path/to/HPD/benchmark --image-path /path/to/benchmark_imgs

# HPS v2 benchmark (for a single model)
python evaluate.py --data-type benchmark --data-path /path/to/HPD/benchmark --image-path /path/to/benchmark_imgs/${model_name}

# DrawBench
python evaluate.py --data-type drawbench --data-path /path/to/HPD/benchmark --image-path /path/to/drawbench_imgs

Scoring Single Generated Image and Corresponding Prompt

We provide one example image in the assets directory of this repo. The corresponding prompt is "A cat with two horns on its head".

Run the following command to score the single generated image against the corresponding prompt (Need to install the package hpsv2 first):

python score.py --image-path assets/demo_image.jpg --prompt 'A cat with two horns on its head'
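
Equivalently, a minimal sketch of the same check from Python, using the demo image and prompt above (it assumes hpsv2.score accepts a list of image paths, as in the Image Comparison section):

import hpsv2

# Score the single demo image against its prompt.
print(hpsv2.score(["assets/demo_image.jpg"], "A cat with two horns on its head"))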

Train Human Preference Predictor

To train your own human preference predictor, change the corresponding paths in configs/controller.sh and run one of the following commands:

# if you are running locally
bash configs/HPSv2.sh train 8 local
# if you are running on slurm
bash configs/HPSv2.sh train 8 ${quota_type}

BibTeX

@article{wu2023human,
  title={Human Preference Score v2: A Solid Benchmark for Evaluating Human Preferences of Text-to-Image Synthesis},
  author={Wu, Xiaoshi and Hao, Yiming and Sun, Keqiang and Chen, Yixiong and Zhu, Feng and Zhao, Rui and Li, Hongsheng},
  journal={arXiv preprint arXiv:2306.09341},
  year={2023}
}
