Home Page: https://thaoshibe.github.io/visii/
visii's Introduction

VISII - Visual Instruction Inversion 👀

./assets/images/teaser.png
Visii learns an instruction from a before → after image pair, then applies it to new images to perform the same edit.

👀 Visual Instruction Inversion: Image Editing via Image Prompting (NeurIPS 2023)
Thao Nguyen, Yuheng Li, Utkarsh Ojha, Yong Jae Lee
🦡 University of Wisconsin-Madison

TL;DR: A framework for inverting visual prompts into editing instructions for text-to-image diffusion models.

ELI5 👧: You show the machine how to perform a task (with a pair of images), and it replicates your edit. For example, it can learn your drawing style and use it to create a new drawing 🎨.
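The idea can be sketched as a toy optimization problem. This is purely illustrative: the real method optimizes a soft text instruction through a frozen InstructPix2Pix diffusion model, while here a simple linear map stands in for the editing model:

```python
import numpy as np

# Toy stand-in for the frozen editing model: it applies an "instruction"
# vector to an image by adding a fixed linear projection of it.
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 3))          # maps a 3-d instruction to an 8-pixel edit

def edit(image, ins):
    return image + W @ ins           # frozen "editing model"

true_ins = np.array([0.5, -1.0, 2.0])
before = rng.normal(size=8)
after = edit(before, true_ins)       # the given before -> after example pair

# Invert the instruction by gradient descent on the MSE between
# edit(before, ins) and the given after image.
ins = np.zeros(3)
lr = 0.05
for _ in range(5000):
    residual = edit(before, ins) - after       # d(MSE)/d(edit output)
    grad = W.T @ residual                      # chain rule through the linear map
    ins -= lr * grad

# The recovered instruction transfers to a brand-new image.
new_image = rng.normal(size=8)
transferred = edit(new_image, ins)
```

Because the editing model is frozen, the only free parameter is the instruction itself, which is exactly what makes the recovered `ins` reusable on unseen images.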

result

🔗 Jump to: Requirements | Quickstart | Visii + InstructPix2Pix | Visii + ControlNet | BibTeX | 🧚 Go Crazy 🧚

Requirements

This script has been tested on an NVIDIA RTX 3090 with Python 3.7, PyTorch 1.13.0, and diffusers.

pip install -r requirements.txt

Quickstart

Visual Instruction Inversion with InstructPix2Pix.

# optimize <ins> (default checkpoint)
python train.py --image_folder ./images --subfolder painting1
# test <ins>
python test.py
# hybrid instruction: <ins> + "a husky" (default checkpoint)
python test.py --hybrid_ins True --prompt "a husky" --guidance_scale 10

Result images will be saved in the ./result folder.

Before:
before
After:
after
Test:
test

Visii learns the editing instruction from a dog → watercolor dog example, then applies it to a new image to perform the same edit. You can also concatenate new information to achieve new effects: dog → watercolor husky.

Different photos are generated from different noises.
<ins>
<ins> + "a husky" 🐶
<ins> + "a squirrel" 🐿️
<ins> + "a tiger" 🐯
<ins> + "a rabbit" 🐰
<ins> + "a blue jay" 🐦
<ins> + "a polar bear" 🐻‍❄️
<ins> + "a badger" 🦡
on & on ...

โš ๏ธ If you're not getting the quality that you want... You might tune the guidance_scale.

<ins> + "a poodle": From left to right: Increase the guidance scale (4, 6, 8, 10, 12, 14)
Starbucks Logo

🧚🧚🧚 Inspired by this Reddit post, we tested Visii + InstructPix2Pix with the Starbucks and Gandour logos.

Before:
before
After:
after

Test:
test
<ins> + "Wonder Woman"
ours
<ins> + "Scarlet Witch"
ours
<ins> + "Daenerys Targaryen"
ours
<ins> + "Neytiri in Avatar"
ours
<ins> + "She-Hulk"
ours
<ins> + "Maleficent"
ours

(If you're still not getting the quality that you want, you might tune the InstructPix2Pix parameters. See Tips or Optimizing Progress for more details.)

Visual Instruction Inversion

1. Prepare before-after images: A basic structure for the image folder is shown below. {image_name}_0.png denotes the before image; {image_name}_1.png denotes the after image.

By default, we use 0_0.png as the before image and 0_1.png as the after image. 1_0.png is the test image.

{image_folder}
└───{subfolder}
    │   0_0.png # before image
    │   0_1.png # after image
    │   1_0.png # test image

Check ./images/painting1 for example folder structure.
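The naming convention can also be resolved programmatically. A minimal sketch (find_pairs is our illustrative helper, not part of the repo):

```python
from pathlib import Path

def find_pairs(subfolder):
    """Collect (before, after) pairs and test images from the
    {image_name}_0.png / {image_name}_1.png naming convention."""
    folder = Path(subfolder)
    # "0_0" -> name "0"; strip the trailing "_0" / "_1" suffix.
    befores = {p.stem[:-2]: p for p in folder.glob("*_0.png")}
    afters = {p.stem[:-2]: p for p in folder.glob("*_1.png")}
    pairs = {name: (befores[name], afters[name])
             for name in befores if name in afters}
    # A before image with no matching after image is a test image.
    tests = [p for name, p in befores.items() if name not in afters]
    return pairs, tests
```

For the default layout above, this yields one training pair (0_0.png, 0_1.png) and one test image (1_0.png).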

2. Instruction Optimization: Check the ./configs/ip2p_config.yaml for more details of hyper-parameters and settings.

Visii + InstructPix2Pix
# optimize <ins> (default checkpoint)
python train.py --image_folder ./images --subfolder painting1
# test <ins>
python test.py --log_folder ip2p_painting1_0_0.png
# hybrid instruction: <ins> + "a squirrel" (default checkpoint)
python test_concat.py --prompt "a husky"
Visii + ControlNet!

We plugged Visii with ControlNet 1.1 InstructPix2Pix.

# optimize <ins> (default checkpoint)
python train_controlnet.py --image_folder ./images --subfolder painting1
# test <ins>
python test_controlnet.py --log_folder controlnet_painting1_0_0.png

Optimizing Progress

By default, we use the lowest MSE checkpoint (./logs/{foldername}/best.pth) as the final instruction.

Sometimes, the best.pth checkpoint might not yield the best result.

If you want to use a different checkpoint, you can specify it using the --checkpoint_number argument.

A visualization of the optimization progress is saved in ./logs/{foldername}/eval_100.png. You can inspect it to visually select the best checkpoint for testing.
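Selecting the lowest-MSE checkpoint is simply a minimum over the logged MSE values. An illustrative sketch (the iteration → MSE mapping shown here is an assumed format, not the repo's actual log layout):

```python
# Pick the checkpoint iteration with the lowest MSE from an
# iteration -> MSE log (layout assumed for illustration).
def best_checkpoint(mse_log):
    return min(mse_log, key=mse_log.get)

mse_log = {100: 0.041, 200: 0.033, 800: 0.019, 900: 0.021}
print(best_checkpoint(mse_log))  # prints 800, the lowest-MSE iteration
```

If best.pth disappoints, this is the value you would pass as --checkpoint_number instead.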

# test <ins> (with specified checkpoint)
python test.py --log_folder ip2p_painting1_0_0.png --checkpoint_number 800
# hybrid instruction: <ins> + "a husky" (with specified checkpoint)
python test_concat.py --prompt "a husky" --checkpoint_number 800
From left to right: [Before, After, Iter 0, Iter 100, ..., Iter 900].
  • Side note: before and after images should be aligned for better results.

Acknowledgement

Our code is based on InstructPix2Pix, Hard Prompts Made Easy, Imagic, and Textual Inversion. You might also check the awesome Visual Prompting via Image Inpainting. Thank you! 🙇‍♀️

Photo credit: Bo the Shiba & Mam the Cat 🐕🐈.

BibTeX

@inproceedings{nguyen2023visual,
  title={Visual Instruction Inversion: Image Editing via Image Prompting},
  author={Thao Nguyen and Yuheng Li and Utkarsh Ojha and Yong Jae Lee},
  booktitle={Thirty-seventh Conference on Neural Information Processing Systems},
  year={2023},
  url={https://openreview.net/forum?id=l9BsCh8ikK}
}


visii's Issues

ControlNet and Checkpoint

Hi, thanks for your work. I have some questions as follows:

  1. How do I run the ControlNet variant with the following command? There is no file named train_controlnet.py in the repo:
python train_controlnet.py --image_folder ./images --subfolder painting1
  2. Could you release a checkpoint trained on the combination of the CleanInstructPix2Pix subset and other data, so that we could reproduce the results of Figure 7, the main result of the paper?

Question about reproducing the results

Hi, I have a question about reproducing the results.

I trained a model with this command:
"python train.py --image_folder ./images --subfolder painting1"

Then I tested it with this command:
"python test.py --prompt 'a husky'"

However, the results I got are quite different from what the GitHub README suggests.

Which options can I change to achieve the same output?

Thank you!

Clean-instructpix2pix dataset

Hi, what do you mean by the clean-instructpix2pix dataset? I see the reference is the original paper, but it doesn't mention a clean-instructpix2pix dataset. Could you explain more and give a link?

Estimated release timeline?

Hey, thanks for the great paper -- it's a super interesting idea. Do you have an estimated timeline for when you plan to release the code?

Regarding the checkpoint issue

Dear authors,

I hope this message finds you well. I'm currently attempting to replicate the Visual Instruction Inversion project you shared on GitHub. However, I couldn't locate any checkpoints related to InstructPix2Pix within the provided codebase.

While going through the 'train.py' file, I noticed that the default checkpoint should be placed in the './logs/' directory. Regrettably, I couldn't find this directory within the project.

I wanted to inquire whether this code includes the checkpoint section. If not, would it be possible for me to request a checkpoint from you to facilitate my replication process?

Thank you sincerely for your time and consideration. I eagerly await your response.

Best regards,
Zou Ling

questions about metrics

Hi author, thanks for your team's contribution.

I would like to ask about calculating metrics during training. Training is usually interspersed with a validation step; do you compute the evaluation metrics during validation? That seems time-consuming, so I'm wondering how you schedule evaluation during the training process.

Questions about implementation

Hello,

I'm trying to implement your paper on my own. I'm confused with the "reusing identical noises during inference" part.

During training, since we are randomly sampling timestep t, there could be multiple or no noises sampled for a specific t. For example, with T=1000, during training I could have sampled t=200 twice and never sampled t=100. During inference time, what noises should I use for t=200 and t=100?

Which inference method do you use? If I use deterministic ODE solvers such as DDIM, where no noise is needed during backward process, what should I do?
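One common workaround (not necessarily what the paper does) is to make the noise a deterministic function of the timestep, so training and inference always agree no matter how often each t was sampled:

```python
import numpy as np

def noise_for_timestep(t, shape, base_seed=0):
    """Deterministic noise for timestep t: the same t always yields
    the same sample, whether queried during training or inference."""
    rng = np.random.default_rng(base_seed + t)
    return rng.standard_normal(shape)

# Sampling t=200 twice (or never sampling t=100 during training)
# is no longer a problem: the noise is a pure function of t.
a = noise_for_timestep(200, (4, 4))
b = noise_for_timestep(200, (4, 4))
assert (a == b).all()
```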

Do you have an estimated time for code release?

Thank you!
