
mlperf_sd_inference's Introduction

Deprecation notice

This repo is no longer in use or maintained; the benchmark code now lives in the MLCommons inference repo.

Stable Diffusion Inference


Welcome to the experimental Stable Diffusion inference repository. This repository aims to test and evaluate Stable Diffusion as a new benchmark candidate for MLPerf inference.


Getting Started

1. Setting Up the Environment

To set up the environment, build and launch the container using the following commands:

docker build . -t sd_mlperf_inference
docker run --rm -it --gpus=all -v ${PWD}:/workspace sd_mlperf_inference bash

Note: the subsequent commands are assumed to be executed inside this container.

2. Dataset Overview

This repository uses the COCO 2014 validation set for image generation and for computing FID and CLIP scores. COCO (Common Objects in Context) is a diverse dataset widely used for object detection, segmentation, and captioning tasks. Its validation set contains over 40,000 images with more than 200,000 captions.

For benchmarking, we use a random subset of {TBD} images and their associated captions, selected with a preset seed of {TBD}. Since the focus is on generating images from captions and then computing scores, downloading the entire COCO dataset is unnecessary. The files required for the benchmark are already part of the repository (a short inspection sketch follows the list):

  • captions.tsv: Processed COCO 2014 validation annotations, with 40,504 prompts and their respective image IDs. Used for image generation and CLIP scoring.
  • captions_5k.tsv: The benchmark annotations (a 5,000-prompt subset of captions.tsv). Used for image generation and CLIP scoring.
  • val2014.npz: Precomputed Inception statistics of the COCO 2014 validation set. Used for FID scoring.
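For a quick sanity check, the TSV files can be inspected with pandas. This is a minimal sketch; the exact column names are not documented here, so check df.columns rather than relying on any particular schema:

import pandas as pd

# Load the benchmark captions (tab-separated).
df = pd.read_csv("captions_5k.tsv", sep="\t")

print(len(df))     # expect 5,000 rows, one prompt per row
print(df.columns)  # the actual schema (image IDs and captions)
print(df.head())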

For details on file generation, refer to Appendix A.

3. Image Generation

Execute the main.py script to generate images:

# --model-id: xl for SD-XL, xlr for SD-XL + Refiner
# --precision: fp16, bf16 or fp32
python main.py \
    --model-id xl \
    --guidance 8.0 \
    --precision fp16 \
    --scheduler euler \
    --steps 20 \
    --latent-path latents.pt

For additional execution options:

python main.py --help
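For a sense of what such a script does under the hood, here is a rough equivalent using the Hugging Face diffusers pipeline. This is a sketch assuming main.py wraps the standard SDXL base pipeline; the checkpoint ID and prompt below are illustrative, not taken from the repo:

import torch
from diffusers import StableDiffusionXLPipeline, EulerDiscreteScheduler

# Load SDXL in fp16 and switch to the Euler scheduler
# (mirrors --precision fp16 --scheduler euler).
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")
pipe.scheduler = EulerDiscreteScheduler.from_config(pipe.scheduler.config)

# Fixed initial latents make runs reproducible (mirrors --latent-path).
# For 1024x1024 SDXL output the latent shape is (1, 4, 128, 128).
latents = torch.load("latents.pt").to(device="cuda", dtype=torch.float16)

image = pipe(
    "a photo of an astronaut riding a horse",  # illustrative; the benchmark reads prompts from captions_5k.tsv
    guidance_scale=8.0,      # mirrors --guidance 8.0
    num_inference_steps=20,  # mirrors --steps 20
    latents=latents,
).images[0]
image.save("example.png")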

4. Compute FID Score

# --batch-size: batch size for the Inception network; keep it at 1
# --subset-size: validation subset size; omit this argument to score the full dataset
# --shuffle-seed: seed used for the random subset selection
# positional arguments: the ground-truth (COCO 2014 validation) statistics file,
# then the folder with the generated images
python fid/fid_score.py \
    --batch-size 1 \
    --subset-size 35000 \
    --shuffle-seed 2023 \
    ./val2014.npz \
    ./output

For more options:

python fid/fid_score.py --help
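For reference, FID is the Fréchet distance between two Gaussians fitted to Inception-v3 features of the real and generated image sets. A minimal sketch of the core formula follows, simplified from the standard pytorch-fid style computation; this script's exact code may differ:

import numpy as np
from scipy import linalg

def frechet_distance(mu1, sigma1, mu2, sigma2):
    # d^2 = ||mu1 - mu2||^2 + Tr(sigma1 + sigma2 - 2 * sqrt(sigma1 @ sigma2))
    diff = mu1 - mu2
    covmean = linalg.sqrtm(sigma1 @ sigma2)
    if np.iscomplexobj(covmean):
        covmean = covmean.real  # small imaginary parts are numerical noise
    return diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2.0 * np.trace(covmean)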

5. Compute CLIP Score

# --subset-size: validation subset size; omit this argument to score the full dataset
# --shuffle-seed: seed used for the random subset selection
# --tsv-file: captions file
# --image-folder: folder with the generated images
# --device: device on which the CLIP model runs (cpu or cuda)
python clip/clip_score.py \
    --subset-size 35000 \
    --shuffle-seed 2023 \
    --tsv-file captions_5k.tsv \
    --image-folder ./output \
    --device cuda

For more options:

python clip/clip_score.py --help
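For reference, the CLIP score of an image-prompt pair is commonly computed as 100 times the cosine similarity (clamped at zero) between the CLIP image and text embeddings. A minimal sketch with Hugging Face transformers; the checkpoint below is an assumption, and clip/clip_score.py may use a different one:

import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-large-patch14")  # assumed checkpoint
processor = CLIPProcessor.from_pretrained("openai/clip-vit-large-patch14")

image = Image.open("output/example.png")
inputs = processor(text=["a photo of an astronaut riding a horse"],
                   images=image, return_tensors="pt", padding=True)

with torch.no_grad():
    img_emb = model.get_image_features(pixel_values=inputs["pixel_values"])
    txt_emb = model.get_text_features(input_ids=inputs["input_ids"],
                                      attention_mask=inputs["attention_mask"])

# Normalize, then take the scaled, clamped cosine similarity.
img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
txt_emb = txt_emb / txt_emb.norm(dim=-1, keepdim=True)
score = 100 * torch.clamp((img_emb * txt_emb).sum(dim=-1), min=0)
print(score.item())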

Appendix A: Generating Dataset Files

To create the captions.tsv, captions_5k.tsv and val2014.npz files:

  1. Download the COCO 2014 validation set:
scripts/coco-2014-validation-download.sh
  2. Process the downloaded annotations (provided in JSON format):
# --input-captions-file: input annotations file
# --output-tsv-file: output annotations
# --allow-duplicate-images: pick one prompt per image
python process-coco-annotations.py \
    --input-captions-file {PATH_TO_COCO_ANNOTATIONS_FILE} \
    --output-tsv-file captions.tsv \
    --allow-duplicate-images
  3. Select a pseudo-random captions subset:
# --seed: random number generator seed
# --subset-size: subset size
# --input-captions-file: input annotations file
# --output-captions-file: output annotations
python subset_generator.py \
    --seed 2023 \
    --subset-size 5000 \
    --input-captions-file captions.tsv \
    --output-captions-file captions_5k.tsv
  4. Generate ground-truth statistics:
# --batch-size: Inception network batch size
# --save-stats: input folder with the coco images
# positional argument: the output file
python fid/fid_score.py \
    --batch-size 1 \
    --save-stats {COCO_2014_IMAGES} \
    val2014.npz
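As a quick check after step 4, the resulting statistics file can be inspected; the key names here are an assumption based on the pytorch-fid convention:

import numpy as np

stats = np.load("val2014.npz")
print(stats.files)  # expected: ['mu', 'sigma'] in the pytorch-fid convention
print(stats["mu"].shape, stats["sigma"].shape)  # e.g. (2048,) and (2048, 2048) for Inception-v3 pool features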


mlperf_sd_inference's Issues

Consider more images for FID

This is more of a discussion: should we consider more generated images per prompt when calculating FID? The generated images can be very different if the seed changes, so should we do some aggregation for FID? We are trying to understand the implications of considering one generated image per prompt.

Imaginary component error in FID code when using fewer images (<2048)

When we run the FID script with fewer generated images (typically < 2048), the covariance matrix has complex entries, and the script raises the following error.

Traceback (most recent call last):
  File "fid_score.py", line 260, in <module>
    args.dims)
  File "fid_score.py", line 248, in calculate_fid_given_paths
    fid_value = calculate_frechet_distance(m1, s1, m2, s2)
  File "fid_score.py", line 184, in calculate_frechet_distance
    raise ValueError('Imaginary component {}'.format(m))
ValueError: Imaginary component 0.6888144870124462

This is a bug in the FID code, as mentioned here. @ahmadki, please let us know how you are able to calculate valid FID scores with far fewer samples (100 samples, for instance).
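For context, the upstream pytorch-fid implementation that this check appears to derive from tolerates small imaginary parts and retries near-singular covariance products with a small diagonal offset. The following is a sketch of that upstream workaround, not this repo's fix; the stand-in covariance matrices are illustrative:

import numpy as np
from scipy import linalg

# Stand-in covariance matrices; in fid_score.py these come from Inception features.
rng = np.random.default_rng(0)
a = rng.standard_normal((100, 16))
b = rng.standard_normal((100, 16))
sigma1, sigma2 = np.cov(a, rowvar=False), np.cov(b, rowvar=False)

eps = 1e-6
covmean, _ = linalg.sqrtm(sigma1 @ sigma2, disp=False)
if not np.isfinite(covmean).all():
    # Near-singular product: add a small offset to the diagonals and retry.
    offset = np.eye(sigma1.shape[0]) * eps
    covmean = linalg.sqrtm((sigma1 + offset) @ (sigma2 + offset))
if np.iscomplexobj(covmean):
    # Tolerate numerical noise; raise only on a genuinely large imaginary part.
    if not np.allclose(np.diagonal(covmean).imag, 0, atol=1e-3):
        raise ValueError("Imaginary component {}".format(np.max(np.abs(covmean.imag))))
    covmean = covmean.real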
