bloc97 commented on July 20, 2024

Hi, unfortunately there isn't any reference for the "ad hoc" method I used to compensate for CFG, but I can give a quick explanation. If you have more questions, we can discuss this further.

Because the ODEs used in diffusion models are somewhat sensitive to initial conditions, using the CFG "vector" at t-1 to invert and find the latent at t does not give the correct answer. This shows up in the fact that it is not always possible to invert a generated image back to its latent when the CFG is high. The correct answer comes from finding the CFG vector at t that gives the correct t-1 latent, but since we do not know the latent at t in the first place, how can we find that CFG vector?
One solution is a gradient descent approximation: first use the wrong CFG vector (at t-1) to get an approximation of the latent at t, then do a forward diffusion pass (a sampling step from t back to t-1) to re-obtain the latent at t-1. We can then compute the difference and use gradient descent on the CFG vector.
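
To make the mismatch concrete, here is one way to write it down. The notation is my own assumption and not taken from the code: w is the CFG scale, c the prompt conditioning, and a_t, b_t are the per-step coefficients of a deterministic sampler whose update is affine in the latent and the noise prediction (as in DDIM). The timestep argument of the model is suppressed.

```latex
\begin{align}
\tilde{\epsilon}(x) &= \epsilon_\theta(x, \varnothing)
  + w\,\bigl(\epsilon_\theta(x, c) - \epsilon_\theta(x, \varnothing)\bigr)
  && \text{(CFG-guided prediction)} \\
x_{t-1} &= a_t\, x_t + b_t\, \tilde{\epsilon}(x_t)
  && \text{(sampling step)} \\
\hat{x}_t &= \frac{x_{t-1} - b_t\, \tilde{\epsilon}(x_{t-1})}{a_t}
  && \text{(naive inversion, guidance taken at } t-1\text{)} \\
r(\hat{x}_t) &= a_t\, \hat{x}_t + b_t\, \tilde{\epsilon}(\hat{x}_t) - x_{t-1}
  && \text{(difference after re-sampling)}
\end{align}
```

An exact inverse would need the guided prediction evaluated at x_t instead of x_{t-1} in the third line. The two differ more as w grows, which is consistent with inversion failing at high CFG, and the residual r in the last line is the difference the correction tries to drive to zero.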

In my simple implementation, I assume that the latent landscape near our point of interest (the latent at t) is a convex and smooth function (which is most likely wrong), so I do gradient descent directly on the latent at t, using the difference between the ground truth and the predicted latent at t-1. (The numerically correct method would be to backprop through the model twice, but that would be too slow.) The solution provided here is literally an approximation of an approximation, but it works quite well for images generated by Stable Diffusion. In my tests, images that were produced with a CFG of up to 5.5 can be inverted reasonably well. For real images, the results are satisfactory in most cases up to a CFG of 4.5, but some images cannot be inverted at all.
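
As a rough illustration of this kind of correction loop, here is a minimal toy sketch in Python. It is not the code from the repository: `guided_eps` is a hypothetical stand-in for the CFG-guided model prediction, the coefficients `A` and `B` and the `step_size` are made-up values, and the latent is a single float rather than a tensor. It only shows the idea of starting from the naive inversion and nudging the latent at t by the difference between the re-sampled and the known latent at t-1.

```python
import math

# Hypothetical coefficients of one deterministic sampling step (t -> t-1).
A, B = 1.02, -0.05


def guided_eps(x, cfg):
    """Toy stand-in for a CFG-guided noise prediction (not a real model)."""
    eps_uncond = math.tanh(x)
    eps_cond = math.tanh(x - 0.5)
    return eps_uncond + cfg * (eps_cond - eps_uncond)


def sample_step(x_t, cfg):
    """One sampling step from the latent at t to the latent at t-1."""
    return A * x_t + B * guided_eps(x_t, cfg)


def naive_invert(x_prev, cfg):
    """Invert the step using the guidance evaluated at t-1 (the 'wrong' vector)."""
    return (x_prev - B * guided_eps(x_prev, cfg)) / A


def refined_invert(x_prev, cfg, iters=20, step_size=1.0):
    """Start from the naive estimate, then correct the latent at t directly."""
    x_t = naive_invert(x_prev, cfg)
    for _ in range(iters):
        # Difference between the re-sampled t-1 latent and the known one.
        residual = sample_step(x_t, cfg) - x_prev
        # Use the difference itself as the descent direction, which relies
        # on the landscape being locally well behaved (the assumption above).
        x_t -= step_size * residual
    return x_t


if __name__ == "__main__":
    cfg = 5.0
    for x_true in (-1.5, 0.2, 0.9):
        x_prev = sample_step(x_true, cfg)
        naive_err = abs(naive_invert(x_prev, cfg) - x_true)
        refined_err = abs(refined_invert(x_prev, cfg) - x_true)
        print(f"x_t={x_true:+.2f} naive_err={naive_err:.2e} refined_err={refined_err:.2e}")
```

In this toy the loop converges because the step is close to the identity map. With a real model there is no such guarantee, which matches the observation that some images cannot be inverted at all.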

As for the magic number, it was found empirically. If tless is not used, the result sometimes diverges when re-diffusing the inverted latent, and you get a completely grey image.

from crossattentioncontrol.
