
ControlCom-Image-Composition

This is the official repository for the following research paper:

ControlCom: Controllable Image Composition using Diffusion Model [arXiv]

Bo Zhang, Yuxuan Duan, Jun Lan, Yan Hong, Huijia Zhu, Weiqiang Wang, Li Niu

Part of our ControlCom has been integrated into our image composition toolbox libcom https://github.com/bcmi/libcom. You are welcome to visit and try it \(^▽^)/

Table of Contents

Demo

The online demo of image composition can be found here.

Task Definition

In our controllable image composition model, we unify four tasks in one model using a 2-dimensional binary indicator vector, in which the first (resp., second) dimension indicates whether to adjust the foreground illumination (resp., pose) to be compatible with the background: 1 means making the adjustment and 0 means keeping it unchanged. Therefore, (0,0) corresponds to image blending, (1,0) to image harmonization, (0,1) to view synthesis, and (1,1) to generative composition.
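The indicator convention above can be sketched as a simple lookup. The task names mirror those used by the test script later in this README, but this mapping is only an illustration of the convention; the actual repository code may encode the vector differently.

```python
# Hypothetical mapping from task name to the 2-dim binary indicator
# (illumination, pose) described above. 1 = adjust, 0 = keep unchanged.
# Illustration only; the repository may implement this differently.
TASK_TO_INDICATOR = {
    "blending":      (0, 0),  # keep illumination and pose unchanged
    "harmonization": (1, 0),  # adjust illumination only
    "viewsynthesis": (0, 1),  # adjust pose only
    "composition":   (1, 1),  # adjust both illumination and pose
}

def indicator_for(task: str) -> tuple:
    """Look up the (illumination, pose) indicator for a task name."""
    try:
        return TASK_TO_INDICATOR[task]
    except KeyError:
        raise ValueError(f"unknown task: {task!r}")
```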

Our method can selectively adjust a subset of foreground attributes. Previous methods may adjust the foreground color or pose unexpectedly, and even unreasonably, when the foreground illumination and pose are already compatible with the background. In the left part, the foreground pose is already compatible with the background, yet previous methods make unnecessary adjustments. In the right part, the foreground illumination is already compatible with the background, yet previous methods adjust the foreground color in an undesirable manner.

The (0,0) and (1,0) versions, which keep the foreground pose unchanged, are very robust and generally well-behaved, although some tiny details may be lost or altered. The (0,1) and (1,1) versions, which change the foreground pose, are less robust and may produce results with distorted structures or noticeable artifacts. For foreground pose variation, we recommend the more robust ObjectStitch.

Network Architecture

Our method is built upon stable diffusion and the network architecture is shown as follows.

FOSCom Dataset

  • Download link:
  • Description:
    • This dataset is built upon the existing Foreground Object Search dataset.
    • Each background image within this dataset comes with a manually annotated bounding box. These bounding boxes are suitable for placing one object from a specified category.
    • The resultant dataset consists of 640 pairs of backgrounds and foregrounds. This dataset is utilized in our user study and qualitative comparison.

Code and Model

1. Dependencies

  • Python == 3.8.5

  • Pytorch == 1.10.1

  • Pytorch-lightning == 1.9.0

  • Run

    cd ControlCom-Image-Composition
    pip install -r requirements.txt
    cd src/taming-transformers
    pip install -e .

2. Download Models

  • Please download the following files to the checkpoints folder to create the following file tree:

    checkpoints/
    ├── ControlCom_blend_harm.pth
    ├── ControlCom_view_comp.pth
    └── openai-clip-vit-large-patch14
        ├── config.json
        ├── merges.txt
        ├── preprocessor_config.json
        ├── pytorch_model.bin
        ├── tokenizer_config.json
        ├── tokenizer.json
        └── vocab.json
  • openai-clip-vit-large-patch14 (Huggingface | ModelScope): The foreground encoder of our ControlCom is built on a pretrained CLIP model.

  • ControlCom_blend_harm.pth (Huggingface | ModelScope): This model is finetuned for 20 epochs specifically for the tasks of image blending and harmonization. Therefore, when the task argument is set to "blending" or "harmonization" in the following test code, this checkpoint will be loaded.

  • ControlCom_view_comp.pth (Huggingface | ModelScope): This model is enhanced on viewpoint transformation through finetuning for several epochs on an additional multi-viewpoint dataset, i.e., MVImgNet. When the task argument is set to "viewsynthesis" or "composition" in the following test code, this checkpoint will be loaded. Note that this checkpoint can also be used for "blending" and "harmonization". If you wish to use one checkpoint for all four tasks, we recommend this one.
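The task-to-checkpoint pairing described above can be sketched as follows. This is an illustrative helper, not the repository's actual loading code; the checkpoint paths assume the file tree from step 2.

```python
# Hypothetical helper showing which checkpoint serves which task,
# per the descriptions above. Not the repository's actual logic.
def checkpoint_for_task(task: str) -> str:
    """Return the checkpoint path expected to be loaded for a task."""
    if task in ("blending", "harmonization"):
        # finetuned specifically for blending and harmonization
        return "checkpoints/ControlCom_blend_harm.pth"
    if task in ("viewsynthesis", "composition"):
        # finetuned on MVImgNet; also usable for all four tasks
        return "checkpoints/ControlCom_view_comp.pth"
    raise ValueError(f"unknown task: {task!r}")
```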

3. Inference on examples

  • To perform image composition using our model, you can use scripts/inference.py. For example,

    python scripts/inference.py \
    --task harmonization \
    --outdir results \
    --testdir examples \
    --num_samples 1 \
    --sample_steps 50 \
    --gpu 0
    

    or simply run:

    sh test.sh
    

The images under the examples folder are obtained from the COCOEE dataset.

4. Inference on your data

  • Please refer to the examples folder for data preparation:
    • keep the same filenames for each pair of data.
    • either the mask_bbox folder or the bbox folder is sufficient.
    • the foreground_mask folder is optional but recommended for better composite results.
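Since each pair of inputs must share a filename stem, a quick sanity check like the following can catch mismatches before running inference. This is a minimal sketch, not part of the repository; the folder names you pass in should match your own data layout (e.g., the bbox or mask_bbox and foreground_mask folders mentioned above).

```python
from pathlib import Path

# Minimal sanity check (assumption: pairs are matched by filename stem,
# as the data-preparation notes above require).
def unmatched_stems(dir_a: str, dir_b: str) -> list:
    """Return filename stems present in dir_a but missing from dir_b."""
    stems_a = {p.stem for p in Path(dir_a).iterdir() if p.is_file()}
    stems_b = {p.stem for p in Path(dir_b).iterdir() if p.is_file()}
    return sorted(stems_a - stems_b)
```

Running it on each pair of folders should return an empty list when the data is prepared correctly.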

5. Training code

Notes: certain sensitive information has been removed, since the model training was conducted within a company. To start training, you will need to prepare your own training data and modify the code according to your requirements.

Experiments

We show our results using four types of indicators.

Evaluation

The quantitative results and evaluation code can be found here.

Acknowledgements

This code borrows heavily from Paint-By-Example. We also appreciate the contributions of Stable Diffusion.

Citation

If you find this work or code is helpful in your research, please cite:

@article{zhang2023controlcom,
  title={Controlcom: Controllable image composition using diffusion model},
  author={Zhang, Bo and Duan, Yuxuan and Lan, Jun and Hong, Yan and Zhu, Huijia and Wang, Weiqiang and Niu, Li},
  journal={arXiv preprint arXiv:2308.10040},
  year={2023}
}

Other Resources


controlcom-image-composition's Issues

The inputs of UNet

Hi, I am interested in your excellent work. Could you tell me how you unify the inputs of the UNet with shape 64×64? The local embedding shape is 256×1024, the global embedding shape is 768, and the indicator shape is 2×2.

Jittery images after 3-4 generations

Hi team,
We are trying to generate pest-infested plant images. The first and second generations work well, but after the third one we start to get noisy and jittery images. The actual background looked similar to the attached image, though not identical.
Is there a solution or insight you can provide for the aforementioned problem?

Release of pretrained checkpoint

In the training config file ControlCom_train/configs/finetune_paint.yaml, the pretrained model path points to pretrained_models/paint-11channels.ckpt. Are there plans to release this checkpoint? Thanks!
