steven-xiong / attention-mask-control

This project forked from oppo-mente-lab/attention-mask-control

code for paper "Compositional Text-to-Image Synthesis with Attention Map Control of Diffusion Models"

License: MIT License

Shell 1.05% Python 98.95%

Code for paper: "Compositional Text-to-Image Synthesis with Attention Map Control of Diffusion Models"

[Paper]

Requirements

A suitable conda environment named AMC can be created and activated with:

conda env create -f environment.yaml
conda activate AMC

Data Preparation

First, download the COCO dataset from here. We use COCO2014 in the paper. Then process the data with this script:

python coco_preprocess.py \
    --coco_image_path /YOUR/COCO/PATH/train2014 \
    --coco_caption_file /YOUR/COCO/PATH/annotations/captions_train2014.json \
    --coco_instance_file /YOUR/COCO/PATH/annotations/instances_train2014.json \
    --output_dir /YOUR/DATA/PATH
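
Before running the preprocessing script, it can help to confirm that the three inputs it expects are actually present. The following is a minimal sketch, not part of this repository: the helper name `missing_coco_inputs` is hypothetical, and it assumes the COCO2014 download is laid out exactly as in the command above (a `train2014` image folder and an `annotations` folder under one root).

```python
from pathlib import Path


def missing_coco_inputs(coco_root):
    """Return the expected COCO2014 inputs that are absent under coco_root.

    Hypothetical helper: it only mirrors the three paths passed to
    coco_preprocess.py above and does not inspect file contents.
    """
    root = Path(coco_root)
    expected = [
        root / "train2014",
        root / "annotations" / "captions_train2014.json",
        root / "annotations" / "instances_train2014.json",
    ]
    return [str(path) for path in expected if not path.exists()]


# Placeholder root taken from the command above; replace with a real path.
for path in missing_coco_inputs("/YOUR/COCO/PATH"):
    print(f"missing: {path}")
```

An empty result means all three paths exist; it says nothing about whether the files are complete or valid.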

Training

Before training, update the following settings in train_boxnet.sh:

  • ROOT_DIR: directory where all results are saved.
  • webdataset_base_urls: /YOUR/DATA/PATH/{xxx-xxx}.tar
  • model_path: path to the Stable Diffusion v1-5 checkpoint.
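
The `{xxx-xxx}` in `webdataset_base_urls` is a placeholder for a range of shard files. WebDataset-style shard URLs are commonly written with a numeric brace range such as `{00000..00002}`; the sketch below shows how such a pattern maps to individual `.tar` files. This is an illustration under assumptions: the shard names and the helper `expand_shards` are hypothetical, and the actual names written by coco_preprocess.py are not stated in this README, so check the output directory before filling in the range.

```python
import re


def expand_shards(pattern):
    """Expand one numeric {start..end} range in a shard URL pattern.

    Hypothetical helper for illustration; it handles a single range and
    keeps the zero padding of the start index.
    """
    match = re.search(r"\{(\d+)\.\.(\d+)\}", pattern)
    if match is None:
        return [pattern]  # no range present: treat as a single shard
    start, end = match.group(1), match.group(2)
    width = len(start)
    prefix, suffix = pattern[:match.start()], pattern[match.end():]
    return [
        prefix + str(index).zfill(width) + suffix
        for index in range(int(start), int(end) + 1)
    ]


# Example shard range; the numbering here is an assumption, not the repo's.
for shard in expand_shards("/YOUR/DATA/PATH/{00000..00002}.tar"):
    print(shard)
```

Listing the expanded paths this way makes it easy to spot a range that points at shards that do not exist.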

You can train the BoxNet through this script:

sh train_boxnet.sh

Text-to-Image Synthesis

With a trained BoxNet, you can start the Text-to-Image Synthesis with:

python test_pipeline_onestage.py \
    --stable_model_path /stable-diffusion-v1-5/checkpoint \
    --boxnet_model_path /TRAINED/BOXNET/CKPT \
    --output_dir /YOUR/SAVE/DIR

All test prompts are stored in test_prompts.json.
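
When launching the synthesis script from Python rather than a shell, passing the arguments as a list avoids shell line-continuation issues altogether. The sketch below only assembles and prints the command; the function name `build_synthesis_command` is hypothetical, and the three flags are the ones shown in the command above.

```python
import shlex


def build_synthesis_command(stable_model_path, boxnet_model_path, output_dir):
    """Assemble the test_pipeline_onestage.py invocation as an argument list.

    Hypothetical helper: it uses only the three flags documented above.
    """
    return [
        "python", "test_pipeline_onestage.py",
        "--stable_model_path", stable_model_path,
        "--boxnet_model_path", boxnet_model_path,
        "--output_dir", output_dir,
    ]


# Placeholder paths copied from the README; replace them before running.
cmd = build_synthesis_command(
    "/stable-diffusion-v1-5/checkpoint",
    "/TRAINED/BOXNET/CKPT",
    "/YOUR/SAVE/DIR",
)
print(shlex.join(cmd))  # shell-quoted form, handy for logging
```

To actually run it, the list can be handed to `subprocess.run(cmd, check=True)` from the repository root.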

TODOs

  • Release data preparation code
  • Release inference code
  • Release training code
  • Release demo
  • Release checkpoint

Acknowledgements

This implementation builds on the diffusers library, the Fengshenbang-LM codebase, and the DETR codebase.
