
Comments (5)

ljleb commented on August 29, 2024

All merge methods are key-level. I omitted weighted sum and add difference as they are trivial.

  • slerp(a, b): spherical linear interpolation. Normalize A and B, slerp between them, then recover a proper norm by interpolating between the norm of A and the norm of B.
  • perpendicular_component(a, b): intended to be used in delta space: c + perpendicular_component(a - c, b - c). It extracts the component of one delta that is perpendicular to the other, so that add difference only contributes orthogonal information.
  • geometric_sum(a, b): AND gate. It works either in delta space or directly on model weights. It's equivalent to a weighted sum in log space. For any corresponding parameters in A and B: if A or B is 0, the result is 0; and if A and B have the same value, that value is returned.
  • add_cosine_a(a, b), add_cosine_b(a, b): ported from supermerger; I'm not sure how effective they are.
  • ties_sum(a, b, c, ...): implementation of TIES: https://arxiv.org/abs/2306.01708
  • tensor_sum(a, b): copy parameters from A and from B, using a window over dimension 0 to decide which model to pick the weights from. Ported from supermerger.
  • top_k_tensor_sum(a, b): reorder the parameters of A into the order of the parameters of B (call this reordered weight C). Then determine a mask selecting the top-k weights in A to discard, and replace them with the values from C at the corresponding indices.
  • train_difference(a, b, c): original supermerger train difference, except that I found a better filter metric. hako-mikan/sd-webui-supermerger#264 (reply in thread)
  • multiply_quotient(a, b, c): train difference in log space. It tries to make the equation $\frac{AB}{C}$ work. Without the dissimilarity filter, it completely breaks down. Otherwise, the resulting model gives outputs very similar to train difference, although still different. Alpha can be brought up to 4 before it starts breaking down with NaNs at generation. @John-WL came up with the idea and I found a way to implement it.
  • distribution_crossover(a, b, c): reorder the weights of A and B into the order of C. Apply a crossover filter between A and B. A contributes the low end of the model, B contributes the high end. Then, reorder the merged weights back in the order of C.
  • crossover(a, b): n-dimensional crossover between A and B (in the case A and B are conv layers, the spatial dimensions stay on their axis). A contributes the low end, B contributes the high end. The weights are not reordered or reshaped. The filter should be isotropic, but honestly I'm not an expert in filter modelling so this might need to be verified.
  • rotate(a, b): find an orthogonal transform Q that minimizes the Frobenius norm between AQ and B, then return $A^{'}Q^{\alpha}$ with $\alpha \in [0,1]$ the alignment factor. $A^{'}$ is the weighted sum of A and $BQ^T$, which effectively interpolates the relationship between the neurons of A and those of B, oriented towards A. Unlike the other methods, this one works on the "neurons" of A and B. A "neuron" here is just shorthand for "all parameters that contribute to a single weighted-sum operation during inference" (matrix multiplication can be seen as one weighted sum per output value). This is highly inspired by OPT https://opt-training.github.io/
  • clip(a, b): weight clipping, with the option to soften the clip bounds using multiple models.
  • dropout(a, b, c, ...): implementation of DARE, but with many models as input. I went a bit experimental with it by adding parameters that control how the Bernoulli mask is created.
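As a concrete illustration of the slerp variant described above (normalize, interpolate on the hypersphere, then restore an interpolated norm), here is a minimal NumPy sketch. The function name and formulation are mine, not the actual sd-mecha implementation:

```python
import numpy as np

def slerp(a, b, t):
    """Spherical linear interpolation of two flattened weight tensors,
    rescaled so the result's norm is the linear interpolation of the
    norms of a and b."""
    norm_a, norm_b = np.linalg.norm(a), np.linalg.norm(b)
    a_n, b_n = a / norm_a, b / norm_b
    dot = np.clip(np.dot(a_n, b_n), -1.0, 1.0)
    omega = np.arccos(dot)
    if np.isclose(omega, 0.0):
        # nearly parallel: fall back to plain linear interpolation
        merged = (1 - t) * a_n + t * b_n
    else:
        merged = (np.sin((1 - t) * omega) * a_n
                  + np.sin(t * omega) * b_n) / np.sin(omega)
    # restore a proper norm by interpolating the input norms
    return merged * ((1 - t) * norm_a + t * norm_b)
```

At t=0 this returns A exactly, at t=1 it returns B, and in between the result stays off the straight chord between them while its norm interpolates linearly.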
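The geometric_sum behavior (weighted sum in log space; zero if either input is zero; identity when inputs agree) can likewise be sketched in NumPy. The sign handling below is an assumption on my part — components whose signs disagree are zeroed, matching the AND-gate intuition — and the real implementation may treat signs differently:

```python
import numpy as np

def geometric_sum(a, b, alpha=0.5):
    """AND-gate style merge: interpolate magnitudes in log space,
    i.e. |a|^(1-alpha) * |b|^alpha, keeping the shared sign.
    Assumption: components with disagreeing signs are zeroed out."""
    same_sign = np.sign(a) == np.sign(b)
    mag = np.abs(a) ** (1 - alpha) * np.abs(b) ** alpha
    return np.where(same_sign, np.sign(a) * mag, 0.0)
```

With this convention, if A or B is 0 the result is 0 (zero has a sign of its own), and wherever A and B agree, that value is passed through unchanged.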

I apologize if this is too much text to read. Let me know if I can clarify anything.

from sd-webui-model-mixer.

ljleb commented on August 29, 2024

There is a more up-to-date set of merge method implementations:
https://github.com/ljleb/sd-mecha/blob/main/sd_mecha/merge_methods.py

Everything marked as @convert_to_recipe is a merge method.

I can explain any/all of them if you want, as I came up with most of them. You can decide whether any are worth implementing. rotate is the slowest one: it takes ~1h on SDXL and ~9 minutes on SD1.5.
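For reference, the alignment step of rotate is the classic orthogonal Procrustes problem, solvable with a single SVD. Below is a minimal NumPy sketch under stated assumptions: the function name is hypothetical, the fractional power of Q is taken through an eigendecomposition, and the weighted-sum step that produces $A^{'}$ is omitted:

```python
import numpy as np

def rotate_merge(a, b, alpha=1.0):
    """Align a towards b with the orthogonal Procrustes solution
    Q = argmin_Q ||a @ Q - b||_F over orthogonal Q, then apply a
    fractional power Q**alpha. alpha=0 leaves a unchanged; alpha=1
    rotates a fully into b's neuron orientation."""
    u, _, vt = np.linalg.svd(a.T @ b)
    q = u @ vt
    if alpha == 1.0:
        return a @ q
    # fractional matrix power via the (complex) eigendecomposition of Q
    w, v = np.linalg.eig(q)
    q_alpha = (v * w ** alpha) @ np.linalg.inv(v)
    return (a @ q_alpha).real
```

When B is exactly A times some rotation, the Procrustes solution recovers that rotation, so alpha smoothly interpolates between A's and B's neuron orientations.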


wkpark commented on August 29, 2024

These routines are tensor-level, which means they could be applied to model-mixer easily.
But we also need to check each algorithm, its meaning, and its speed.


wkpark commented on August 29, 2024

Thank you for the information!


wkpark commented on August 29, 2024

Please see #35

