garibida / renoise-inversion

Official Implementation for "ReNoise: Real Image Inversion Through Iterative Noising"

Home Page: https://garibida.github.io/ReNoise-Inversion/

CSS 0.03% Python 99.97%
diffusion-models image-editing inversion lcm lcm-lora sdxl-turbo stable-diffusion text-to-image

renoise-inversion's Issues

Configuration for models other than SDXL Turbo

Great work!

When I use SDXL Turbo with the configurations you provided, I get good results. For example:
image
But when I try other models, including SDXL (not Turbo), the results seem bad. Here is the relevant code I used:

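from PIL import Image
from src.eunms import Model_Type, Scheduler_Type
from src.utils.enums_utils import get_pipes
from src.config import RunConfig
from main import run as invert

model_type = Model_Type.SDXL
scheduler_type = Scheduler_Type.DDIM

device = 'cuda'
pipe_inversion, pipe_inference = get_pipes(model_type, scheduler_type, device=device)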

input_image = Image.open("example_images/lion.jpeg").convert("RGB").resize((512, 512))
prompt = "a lion"

config = RunConfig(
    model_type=model_type,
    scheduler_type=scheduler_type,
    num_inversion_steps=50,
    num_inference_steps=50,
    num_renoise_steps=1,
    perform_noise_correction=False,
)

edit_img, inv_latent, noise, all_latents = invert(
    input_image,
    prompt,
    config,
    pipe_inversion=pipe_inversion,
    pipe_inference=pipe_inference,
    do_reconstruction=True,
    edit_prompt="a tiger in the field",
)

I believe each model needs its own specific configuration. I tried to modify the configuration according to the appendix, as you can see, but it still did not work. Here is the result:
image

Thanks in advance!

Inversion with guidance scale > 1.0

Hi,

I noticed that in the default run config, the guidance scale for reconstruction is set to 0, and the examples use a guidance scale of 1.0 for inference. I tried setting the guidance scale to the usual value of 7 and got this error:

Traceback (most recent call last):
  File "/mnt/USER/ReNoise-Inversion-main/inversion_example_sd.py", line 30, in <module>
    _, inv_latent, _, all_latents = invert(input_image,
  File "/mnt/USER/ReNoise-Inversion-main/main.py", line 44, in run
    res = pipe_inversion(prompt = prompt,
  File "/mnt/USER/ReNoise-Inversion-main/src/pipes/sd_inversion_pipeline.py", line 153, in __call__
    latents = inversion_step(self,
  File "/mnt/USER/ReNoise-Inversion-main/src/renoise_inversion.py", line 146, in inversion_step
    noise_pred = noise_regularization(noise_pred, noise_pred_optimal, lambda_kl=pipe.cfg.noise_regularization_lambda_kl, lambda_ac=pipe.cfg.noise_regularization_lambda_ac, num_reg_steps=pipe.cfg.noise_regularization_num_reg_steps, num_ac_rolls=pipe.cfg.noise_regularization_num_ac_rolls, generator=generator)
  File "/mnt/USER/ReNoise-Inversion-main/src/renoise_inversion.py", line 11, in noise_regularization
    l_kld = patchify_latents_kl_divergence(_var, noise_pred_optimal)
  File "/mnt/USER/ReNoise-Inversion-main/src/renoise_inversion.py", line 69, in patchify_latents_kl_divergence
    kl = latents_kl_divergence(x0, x1).sum()
  File "/mnt/USER/ReNoise-Inversion-main/src/renoise_inversion.py", line 82, in latents_kl_divergence
    torch.log((var1 + EPSILON) / (var0 + EPSILON))
RuntimeError: The size of tensor a (512) must match the size of tensor b (256) at non-singleton dimension 0

Is it possible to use ReNoise inversion with classifier-free guidance (guidance scale > 1)? Also, why do the examples use a guidance value of 0 during inversion and 1.0 during inference?
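
For context on the question above: with classifier-free guidance, pipelines commonly run the model on a doubled batch (an unconditional half followed by a prompt-conditioned half) and then blend the two halves into a single prediction. The sketch below illustrates that blend in plain Python on nested lists. It is not the repository's code, and the names `split_cfg_batch` and `apply_guidance` are hypothetical. Whether the 512-versus-256 mismatch in the traceback comes from such a doubled batch reaching `noise_regularization` is an assumption; the traceback alone does not confirm it.

```python
def split_cfg_batch(batch):
    """Split a doubled batch into (unconditional, conditional) halves.

    Assumes the common layout [uncond_0..uncond_n, cond_0..cond_n].
    """
    if len(batch) % 2 != 0:
        raise ValueError("expected an even-sized batch (uncond half + cond half)")
    half = len(batch) // 2
    return batch[:half], batch[half:]


def apply_guidance(batch, guidance_scale):
    """Blend the halves: uncond + scale * (cond - uncond), element by element."""
    uncond, cond = split_cfg_batch(batch)
    return [
        [u + guidance_scale * (c - u) for u, c in zip(u_row, c_row)]
        for u_row, c_row in zip(uncond, cond)
    ]


# Toy "noise predictions": one unconditional row, then one conditional row.
doubled = [[0.0, 1.0, 2.0], [1.0, 1.0, 4.0]]

guided = apply_guidance(doubled, guidance_scale=7.0)
print(len(doubled), len(guided))  # the blended result has half as many rows
print(guided)
```

In this toy setup, anything compared against the doubled batch sees twice the leading dimension of anything compared against the blended result, which is the kind of 2x gap the error message reports. A scale of 1.0 returns the conditional half unchanged, and a scale of 0 returns the unconditional half.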

Thank you!

Issue with Image Reconstruction Using LCM Scheduler

Thank you for sharing your code.

I am currently facing an issue when trying to reconstruct images using the LCM scheduler.
Below is the code that I'm working with:

from PIL import Image
from src.eunms import Model_Type, Scheduler_Type
from src.utils.enums_utils import get_pipes
from src.config import RunConfig
from main import run as invert

model_type = Model_Type.LCM_SDXL
scheduler_type = Scheduler_Type.LCM

device = 'cuda'
pipe_inversion, pipe_inference = get_pipes(model_type, scheduler_type, device=device)

input_image = Image.open("example_images/lion.jpeg").convert("RGB").resize((512, 512))
prompt = "a lion in the field"

config = RunConfig(model_type = model_type,
                    scheduler_type = scheduler_type)

rec_img, inv_latent, noise, all_latents = invert(input_image,
                                                 prompt,
                                                 config,
                                                 pipe_inversion=pipe_inversion,
                                                 pipe_inference=pipe_inference,
                                                 do_reconstruction=True)

When using the same code with model_type set to SDXL_Turbo and scheduler_type to EULER, everything works fine.
However, when using LCM, I encounter an error stating that MyCLMScheduler has no attribute step_and_update_noise.
Could you please advise on how to address this issue?
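
One quick way to narrow down an error like this is to check, before running inversion, which methods a scheduler object actually exposes. The helper below is a minimal sketch using only the standard library. `missing_methods` and the two stand-in classes are hypothetical and are not part of the repository; only the method name `step_and_update_noise` comes from the error message above.

```python
def missing_methods(obj, required):
    """Return the names in `required` that `obj` does not expose as callables."""
    return [name for name in required if not callable(getattr(obj, name, None))]


class SchedulerWithRenoise:  # hypothetical stand-in, not a class from the repo
    def step(self):
        pass

    def step_and_update_noise(self):
        pass


class SchedulerWithoutRenoise:  # hypothetical stand-in, not a class from the repo
    def step(self):
        pass


required = ["step", "step_and_update_noise"]
print(missing_methods(SchedulerWithRenoise(), required))
print(missing_methods(SchedulerWithoutRenoise(), required))
```

Running the same check on the scheduler attached to the real inversion pipeline would show whether the method is absent from the class itself or whether a different scheduler class was instantiated than expected.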

Additionally, I have some questions regarding the inversion process with LCM:

The inversion process with LCM isn't discussed in the paper. Specifically, lcm_scheduler.py lines 185-188 detail the main process, but I'm unclear on where the equations used are derived from. Could you provide some clarity or point me towards relevant resources that explain this?

Thank you for your help and I look forward to your response.

Image reconstruction fails

Hi, very good work, and thanks for open-sourcing the code!
I used inversion_example_sdxl.py to reconstruct the lion image:
image

but I get results like this:
image

All settings are at their defaults.
Could you please give me some suggestions?
Thanks.
