zxl502 / diffusiondb

This project forked from poloclub/diffusiondb

A large-scale text-to-image prompt gallery dataset based on Stable Diffusion

Home Page: https://poloclub.github.io/diffusiondb

License: MIT License

Python 99.98% HTML 0.02%

DiffusionDB

DiffusionDB is the first large-scale text-to-image prompt dataset. It contains 2 million images generated by Stable Diffusion using prompts and hyperparameters specified by real users. The unprecedented scale and diversity of this human-actuated dataset provide exciting research opportunities in understanding the interplay between prompts and generative models, detecting deepfakes, and designing human-AI interaction tools to help users more easily use these models.

Get Started

DiffusionDB is available at 🤗 Hugging Face Datasets.

Dataset Structure

We use a modularized file structure to distribute DiffusionDB. The 2 million images in DiffusionDB are split into 2,000 folders, where each folder contains 1,000 images and a JSON file that links these 1,000 images to their prompts and hyperparameters.

./
├── images
│   ├── part-000001
│   │   ├── 3bfcd9cf-26ea-4303-bbe1-b095853f5360.png
│   │   ├── 5f47c66c-51d4-4f2c-a872-a68518f44adb.png
│   │   ├── 66b428b9-55dc-4907-b116-55aaa887de30.png
│   │   ├── 99c36256-2c20-40ac-8e83-8369e9a28f32.png
│   │   ├── f3501e05-aef7-4225-a9e9-f516527408ac.png
│   │   ├── [...]
│   │   └── part-000001.json
│   ├── part-000002
│   ├── part-000003
│   ├── part-000004
│   ├── [...]
│   └── part-002000
└── metadata.parquet
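Each part folder follows a regular pattern. Its name is a six-digit zero-padded part number from 000001 to 002000. It holds PNG images named by UUID version 4 and one JSON file named after the folder. The sketch below checks an unzipped folder against that pattern. It is a minimal example under those assumptions: `check_part_folder` and `is_uuid4_png` are hypothetical helpers and are not part of the DiffusionDB code.

```python
import re
import uuid
from pathlib import Path

# Folder names look like part-000001 ... part-002000 (six digits, zero-padded).
PART_NAME = re.compile(r"part-(\d{6})")


def is_uuid4_png(filename: str) -> bool:
    """Return True if filename has the form '<uuid4>.png'."""
    if not filename.endswith(".png"):
        return False
    stem = filename[: -len(".png")]
    try:
        parsed = uuid.UUID(stem)
    except ValueError:
        return False
    # Require version 4 and the canonical lowercase, hyphenated spelling.
    return parsed.version == 4 and str(parsed) == stem


def check_part_folder(folder: Path) -> list[str]:
    """Return the problems found in one unzipped part folder (empty list if none)."""
    problems = []
    match = PART_NAME.fullmatch(folder.name)
    if match is None or not 1 <= int(match.group(1)) <= 2000:
        problems.append(f"unexpected folder name: {folder.name}")
    json_path = folder / f"{folder.name}.json"
    if not json_path.is_file():
        problems.append(f"missing {json_path.name}")
    for path in sorted(folder.iterdir()):
        if path == json_path:
            continue
        if not is_uuid4_png(path.name):
            problems.append(f"unexpected file: {path.name}")
    return problems
```

An empty result means the folder matches the layout. Any other entry names the file or folder that does not fit.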

The sub-folders are named part-000001 through part-002000, and each image has a unique name generated by UUID Version 4. The JSON file in each sub-folder has the same name as the sub-folder. Each image is a PNG file. The JSON file contains key-value pairs mapping image filenames to their prompts and hyperparameters. For example, below is the key-value pair for the image f3501e05-aef7-4225-a9e9-f516527408ac.png in part-000001.json.

{
  "f3501e05-aef7-4225-a9e9-f516527408ac.png": {
    "p": "geodesic landscape, john chamberlain, christopher balaskas, tadao ando, 4 k, ",
    "se": 38753269,
    "c": 12.0,
    "st": 50,
    "sa": "k_lms"
  }
}

The data fields are:

  • key: Unique image name
  • p: Prompt
  • se: Random seed
  • c: CFG Scale (guidance scale)
  • st: Steps
  • sa: Sampler
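The short keys keep the JSON files compact, but they are easy to confuse when working with many records. The sketch below flattens one part's JSON into a list of records with descriptive names. It is a minimal example. `load_part_records` is a hypothetical helper, and the descriptive names are chosen here to mirror the columns of metadata.parquet.

```python
import json

# Short keys used in each part-XXXXXX.json file, mapped to descriptive names
# (chosen to match the metadata.parquet column names).
FIELD_NAMES = {"p": "prompt", "se": "seed", "c": "cfg", "st": "step", "sa": "sampler"}


def load_part_records(json_text: str) -> list[dict]:
    """Turn the {image_name: {short_key: value}} mapping into flat records."""
    mapping = json.loads(json_text)
    records = []
    for image_name, fields in mapping.items():
        record = {"image_name": image_name}
        for short_key, value in fields.items():
            # Keep any unknown key under its original name instead of dropping it.
            record[FIELD_NAMES.get(short_key, short_key)] = value
        records.append(record)
    return records
```

Suppose a part was unzipped into part-000001/ as in Method 2 below. You could then call `load_part_records(open("part-000001/part-000001.json").read())`.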

At the top level of DiffusionDB, we include a metadata table in Parquet format, metadata.parquet. This table has seven columns (image_name, prompt, part_id, seed, step, cfg, and sampler) and 2 million rows, where each row represents an image. The seed, step, and cfg columns store the same hyperparameters as the se, st, and c fields in the JSON files. We chose Parquet because it is column-based: researchers can efficiently query individual columns (e.g., prompts) without reading the entire table. Below are five random rows from the table.

| image_name | prompt | part_id | seed | step | cfg | sampler |
|---|---|---|---|---|---|---|
| 49f1e478-ade6-49a8-a672-6e06c78d45fc.png | ryan gosling in fallout 4 kneels near a nuclear bomb | 1643 | 2220670173 | 50 | 7.0 | 8 |
| b7d928b6-d065-4e81-bc0c-9d244fd65d0b.png | A beautiful robotic woman dreaming, cinematic lighting, soft bokeh, sci-fi, modern, colourful, highly detailed, digital painting, artstation, concept art, sharp focus, illustration, by greg rutkowski | 87 | 51324658 | 130 | 6.0 | 8 |
| 19b1b2f1-440e-4588-ba96-1ac19888c4ba.png | bestiary of creatures from the depths of the unconscious psyche, in the style of a macro photograph with shallow dof | 754 | 3953796708 | 50 | 7.0 | 8 |
| d34afa9d-cf06-470f-9fce-2efa0e564a13.png | close up portrait of one calico cat by vermeer. black background, three - point lighting, enchanting, realistic features, realistic proportions. | 1685 | 2007372353 | 50 | 7.0 | 8 |
| c3a21f1f-8651-4a58-a4d4-7500d97651dc.png | a bottle of jack daniels with the word medicare replacing the word jack daniels | 243 | 1617291079 | 50 | 7.0 | 8 |

To save space, we use an integer to encode the sampler in the table above.

| Sampler | Integer Value |
|---|---|
| ddim | 1 |
| plms | 2 |
| k_euler | 3 |
| k_euler_ancestral | 4 |
| k_heun | 5 |
| k_dpm_2 | 6 |
| k_dpm_2_ancestral | 7 |
| k_lms | 8 |
| others | 9 |
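The integer codes can be translated back to sampler names with a plain lookup table. This is a minimal sketch with the following assumptions:

- `SAMPLER_NAMES`, `decode_sampler`, and `encode_sampler` are hypothetical names.
- Code 5 is taken to denote the k_heun sampler.
- Sending any sampler name that is not listed to 9 ("others") is an assumption based on the table's last row.

```python
# Integer codes used in the sampler column of metadata.parquet.
# Assumption: code 5 is the k_heun sampler.
SAMPLER_NAMES = {
    1: "ddim",
    2: "plms",
    3: "k_euler",
    4: "k_euler_ancestral",
    5: "k_heun",
    6: "k_dpm_2",
    7: "k_dpm_2_ancestral",
    8: "k_lms",
    9: "others",
}
SAMPLER_CODES = {name: code for code, name in SAMPLER_NAMES.items()}


def decode_sampler(code: int) -> str:
    """Map an integer sampler code to its name; raise ValueError on unknown codes."""
    try:
        return SAMPLER_NAMES[int(code)]
    except KeyError:
        raise ValueError(f"unknown sampler code: {code}") from None


def encode_sampler(name: str) -> int:
    """Map a sampler name (as in the JSON "sa" field) to its integer code.

    Assumption: names missing from the table map to 9 ("others").
    """
    return SAMPLER_CODES.get(name, 9)
```

With pandas, the same dictionary can decode a whole column at once, for example `metadata_df["sampler"].map(SAMPLER_NAMES)`.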

Loading DiffusionDB

DiffusionDB is large (1.6TB)! However, thanks to the modularized file structure, you can load only as many images (with their prompts and hyperparameters) as you need. The example-loading.ipynb notebook demonstrates three methods to load a subset of DiffusionDB. Below is a short summary.

Method 1: Use the Hugging Face Datasets Loader

You can use the Hugging Face Datasets library to easily load prompts and images from DiffusionDB. We pre-defined 16 DiffusionDB subsets (configurations) based on the number of instances. You can see all subsets in the Dataset Preview.

from datasets import load_dataset

# Load the dataset with the `random_1k` subset
dataset = load_dataset('poloclub/diffusiondb', 'random_1k')

Method 2: Manually Download the Data

Each zip file in DiffusionDB has a URL of the following form, where {xxxxxx} ranges from 000001 to 002000. You can therefore write a script to download as many zip files as your task needs.

https://huggingface.co/datasets/poloclub/diffusiondb/resolve/main/images/part-{xxxxxx}.zip

from urllib.request import urlretrieve
import shutil

# Download part-000001.zip
part_id = 1
part_url = f'https://huggingface.co/datasets/poloclub/diffusiondb/resolve/main/images/part-{part_id:06}.zip'
urlretrieve(part_url, f'part-{part_id:06}.zip')

# Unzip part-000001.zip
shutil.unpack_archive(f'part-{part_id:06}.zip', f'part-{part_id:06}')
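The single-part example above extends naturally to a batch of parts. The sketch below builds each URL from the pattern above, rejects part ids outside 1 to 2000, and unzips every part into its own folder. It is a hypothetical helper, not the authors' script. The names `part_url` and `download_parts` and the default destination folder are illustrative choices.

```python
import shutil
from pathlib import Path
from urllib.request import urlretrieve

# URL pattern from the section above; part ids run from 1 to 2000.
URL_TEMPLATE = "https://huggingface.co/datasets/poloclub/diffusiondb/resolve/main/images/part-{:06}.zip"
NUM_PARTS = 2000


def part_url(part_id: int) -> str:
    """Return the zip URL for one part, rejecting ids outside 1..2000."""
    if not 1 <= part_id <= NUM_PARTS:
        raise ValueError(f"part_id must be in 1..{NUM_PARTS}, got {part_id}")
    return URL_TEMPLATE.format(part_id)


def download_parts(part_ids, dest_dir="diffusiondb-images", keep_zip=False):
    """Download and unzip each requested part into dest_dir/part-XXXXXX/.

    Parts whose folder already exists are skipped, so the function can be re-run
    after an interrupted download (delete any partially extracted folder first).
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    extracted = []
    for part_id in part_ids:
        folder = dest / f"part-{part_id:06}"
        if not folder.is_dir():
            zip_path = dest / f"part-{part_id:06}.zip"
            urlretrieve(part_url(part_id), zip_path)
            shutil.unpack_archive(zip_path, folder)
            if not keep_zip:
                zip_path.unlink()  # free disk space once the images are extracted
        extracted.append(folder)
    return extracted
```

For example, `download_parts(range(1, 11))` would fetch the first ten parts, which hold 10,000 images.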

Method 3: Use metadata.parquet (Text Only)

If your task does not require images, you can access all 2 million prompts and hyperparameters in the metadata.parquet table.

from urllib.request import urlretrieve
import pandas as pd

# Download the parquet table
table_url = 'https://huggingface.co/datasets/poloclub/diffusiondb/resolve/main/metadata.parquet'
urlretrieve(table_url, 'metadata.parquet')

# Read the table using Pandas
metadata_df = pd.read_parquet('metadata.parquet')
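The table is also useful when you do want images, but only for some prompts. You can filter the table first and then use its part_id column to decide which zip files to fetch. The helper below is a hypothetical sketch, not part of the DiffusionDB scripts. It assumes the zip URL pattern shown in Method 2 and groups the selected images so that each zip file is downloaded only once.

```python
from collections import defaultdict

# Zip URL pattern from Method 2 (part ids 1..2000, zero-padded to six digits).
URL_TEMPLATE = "https://huggingface.co/datasets/poloclub/diffusiondb/resolve/main/images/part-{:06}.zip"


def images_by_part(rows):
    """Group (image_name, part_id) pairs by part.

    Returns a dict, ordered by part id, mapping each needed zip URL to the
    sorted image names to extract from that zip.
    """
    grouped = defaultdict(set)
    for image_name, part_id in rows:
        grouped[int(part_id)].add(image_name)
    return {
        URL_TEMPLATE.format(part_id): sorted(names)
        for part_id, names in sorted(grouped.items())
    }
```

A typical use would be `subset = metadata_df[metadata_df["prompt"].str.contains("calico cat", case=False)]` followed by `images_by_part(zip(subset["image_name"], subset["part_id"]))`. When memory is tight, `pd.read_parquet("metadata.parquet", columns=["image_name", "prompt", "part_id"])` reads only the columns this step needs.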

Dataset Creation

We collected all images from the official Stable Diffusion Discord server. Please read our research paper for details. The collection code is included in ./scripts/.

Data Removal

If you find any harmful images or prompts in DiffusionDB, you can use this Google Form to report them. Similarly, if you are a creator of an image included in this dataset, you can use the same form to let us know if you would like to remove your image from DiffusionDB. We will closely monitor this form and update DiffusionDB periodically.

Credits

DiffusionDB was created by Jay Wang, Evan Montoya, David Munechika, Alex Yang, Ben Hoover, and Polo Chau.

Citation

@article{wangDiffusionDBLargescalePrompt2022,
  title = {{{DiffusionDB}}: {{A}} Large-Scale Prompt Gallery Dataset for Text-to-Image Generative Models},
  author = {Wang, Zijie J. and Montoya, Evan and Munechika, David and Yang, Haoyang and Hoover, Benjamin and Chau, Duen Horng},
  year = {2022},
  journal = {arXiv:2210.14896 [cs]},
  url = {https://arxiv.org/abs/2210.14896}
}

Licensing

The DiffusionDB dataset is available under the CC0 1.0 License. The Python code in this repository is available under the MIT License.

Contact

If you have any questions, feel free to open an issue or contact Jay Wang.

Contributors

xiaohk, alexanderhyang, evan-eng, davidmunechika, bhoov
