
freeu's Introduction

Wowchemy Website Builder

Academic Template for Hugo

The Hugo Academic Resumé Template empowers you to create your job-winning online resumé and showcase your academic publications.

Check out the latest demo of what you'll get in less than 10 minutes, or view the showcase.

Wowchemy makes it easy to create a beautiful website for free. Edit your site in Markdown, Jupyter, or RStudio (via Blogdown), generate it with Hugo, and deploy with GitHub or Netlify. Customize anything on your site with widgets, themes, and language packs.

Crowd-funded open-source software

To help us develop this template and software sustainably under the MIT license, we ask all individuals and businesses that use it to help support its ongoing maintenance and development via sponsorship.


freeu's People

Contributors

chenyangsi


freeu's Issues

FreeU with LoRA Stack

Since there is no Discussions tab available, I am asking a question here.

In ComfyUI, should we put the FreeU node before or after the LoRA stack?

How to use it in SimpleCrossAttnUpBlock2D?

I've tried to change your code to support SimpleCrossAttnUpBlock2D, but the shapes don't seem to match up. How can I do this? Thanks! (A guard sketch follows the traceback below.)

  File "/usr/local/lib/python3.9/dist-packages/gradio/routes.py", line 523, in run_predict
    output = await app.get_blocks().process_api(
  File "/usr/local/lib/python3.9/dist-packages/gradio/blocks.py", line 1437, in process_api
    result = await self.call_function(
  File "/usr/local/lib/python3.9/dist-packages/gradio/blocks.py", line 1109, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "/usr/local/lib/python3.9/dist-packages/anyio/to_thread.py", line 33, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/usr/local/lib/python3.9/dist-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
    return await future
  File "/usr/local/lib/python3.9/dist-packages/anyio/_backends/_asyncio.py", line 807, in run
    result = context.run(func, *args)
  File "/usr/local/lib/python3.9/dist-packages/gradio/utils.py", line 865, in wrapper
    response = f(*args, **kwargs)
  File "/home/ubuntu/mimesis-ml-gan-backend/app.py", line 128, in generate
    image = pipe(image=input_image,
  File "/usr/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/ubuntu/mimesis-ml-gan-backend/src/diffusions/kandinsky/pipeline_kandinsky_img2img_scheduler.py", line 125, in __call__
    noise_pred = self.unet(
  File "/usr/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/lib/python3.9/site-packages/diffusers/models/unet_2d_condition.py", line 1020, in forward
    sample = upsample_block(
  File "/usr/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ubuntu/mimesis-ml-gan-backend/free_lunch_utils.py", line 166, in forward
    hidden_states = torch.cat([hidden_states, res_hidden_states], dim=1)
RuntimeError: Tensors must have same number of dimensions: got 3 and 4
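
A possible reading of this failure (my interpretation, not the authors' statement): the patched forward in free_lunch_utils.py assumes 4-D (B, C, H, W) feature maps, while SimpleCrossAttnUpBlock2D in the Kandinsky UNet can hand it a 3-D (B, sequence, C) tensor, which is why torch.cat sees tensors with 3 and 4 dimensions. A minimal defensive sketch, where fourier_filter stands for whatever filtering callable the patch uses:

def maybe_fourier_filter(x, threshold, scale, fourier_filter):
    # Apply the filter only to 4-D (B, C, H, W) feature maps; pass any
    # other shape (e.g. 3-D attention features) through unchanged.
    if x.ndim != 4:
        return x
    return fourier_filter(x, threshold=threshold, scale=scale)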

Running FreeU on SM_52 cards

Hello, I'm trying to use FreeU with SD1.5 on a Titan X 12 GB card and I get this message:
RuntimeError: cuFFT doesn't support signals of half type with compute capability less than SM_53, but the device containing input half tensor only has SM_52

The strange thing is that it runs perfectly when used with SDXL, so I was wondering about it. (A possible workaround sketch follows the version list below.)
(I'm using diffusers.)

  • diffusers version: 0.27.0
  • Platform: Linux-5.15.0-94-generic-x86_64-with-glibc2.29
  • Python version: 3.8.0
  • PyTorch version (GPU?): 2.2.1+cu121 (True)
  • Huggingface_hub version: 0.21.4
  • Transformers version: 4.38.2
  • Accelerate version: 0.27.2
  • xFormers version: not installed
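
A possible workaround, offered only as a hedged sketch: since cuFFT below compute capability SM_53 cannot transform half-precision signals, the FFT round-trip can be done in float32 and the result cast back, along the lines of the Fourier_filter quoted later on this page:

import torch
import torch.fft as fft

def fourier_filter_fp32(x, threshold, scale):
    # cuFFT rejects half tensors below compute capability SM_53,
    # so upcast to float32 around the FFT and cast back at the end.
    x_freq = fft.fftshift(fft.fftn(x.float(), dim=(-2, -1)), dim=(-2, -1))
    B, C, H, W = x_freq.shape
    mask = torch.ones((B, C, H, W), device=x.device)
    crow, ccol = H // 2, W // 2
    mask[..., crow - threshold:crow + threshold,
              ccol - threshold:ccol + threshold] = scale
    x_freq = fft.ifftshift(x_freq * mask, dim=(-2, -1))
    return fft.ifftn(x_freq, dim=(-2, -1)).real.to(x.dtype)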

Details of the hyperparameters

Thanks for this interesting work. While trying to implement it, I found that some key hyperparameters are missing, for example:

  • What are the radius and radius threshold for the skip features?
  • Why use only half of the channels of the backbone feature x?
  • What are default recommended values for s and b? Are they the same for different models?

Thanks!

What are your UNetModel class and timestep_embedding function?

This question may be a bit silly. When I read your code, I found that your class inherits from UNetModel and uses the function timestep_embedding, but you did not open-source them. Could you please open-source them? It would make it easier to try FreeU.
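
A hedged pointer (my reading of the code, not something the authors have confirmed): the patch quoted later on this page edits ldm/modules/diffusionmodules/openaimodel.py from the Stable Diffusion latent-diffusion codebase, so both pieces appear to be importable from there rather than from this repo:

from ldm.modules.diffusionmodules.openaimodel import UNetModel
from ldm.modules.diffusionmodules.util import timestep_embedding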

You should look at the input blocks

The results from FreeU (with default values) often have colors that are far too intense, frequently causing the kind of burning that is specifically meant to be avoided. Applying your method to the input_blocks provides extra detail without the cost of color burning. I have made a mod of this called FreeU Advanced for ComfyUI that does this (among other experiments), if you want to take a look. (A rough sketch of the idea follows.)
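
A rough sketch of the idea (an illustration of the description above, not FreeU Advanced's actual code; the stage channel count and the 1.1 factor are assumptions): mirror the decoder-side backbone scaling from the patch quoted later on this page onto the encoder loop:

# inside UNetModel.forward, encoder side (illustrative only)
for module in self.input_blocks:
    h = module(h, emb, context)
    if h.shape[1] == 640:              # an illustrative stage to modulate
        h[:, :320] = h[:, :320] * 1.1  # gentler than the decoder factors
    hs.append(h)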

Threshold of factor s

The factor s is applied with threshold = 1, which seems to make s1 and s2 meaningless at inference.
Is this intended, or is it just a little bug? (See the note below on what threshold = 1 actually covers.)
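
For reference, my reading of the Fourier_filter quoted later on this page: with threshold = 1 the mask slice covers the 2x2 block at the centre of the shifted spectrum, so s1 and s2 still rescale the very lowest-frequency components rather than doing nothing at all:

H, W = 8, 8                             # e.g. the deepest stage at 512x512
crow, ccol = H // 2, W // 2
rows = list(range(crow - 1, crow + 1))  # [3, 4]
cols = list(range(ccol - 1, ccol + 1))  # [3, 4]
print(rows, cols)                       # the four spectrum entries scaled by s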

FreeU doesn't work with ZLUDA on auto1111

Hello! I'm not sure if it's an issue with ZLUDA (it probably is), but FreeU doesn't seem to work at all with it. Whenever I try to generate an image with FreeU activated, it returns an error:

*** Error completing request
*** Arguments: ('task(4ctd2qw8925ax5d)', <gradio.routes.Request object at 0x0000020FAF26F460>, "a test to showcase that freeU doesn't work", '', [], 20, 'Euler a', 1, 1, 7, 1152, 768, False, 0.7, 2, 'Latent', 0, 0, 0, 'Use same checkpoint', 'Use same sampler', '', '', [], 0, False, '', 0.8, -1, False, -1, 0, 0, 0, False, 'MultiDiffusion', False, True, 1024, 1024, 96, 96, 48, 4, 'None', 2, False, 10, 1, 1, 64, False, False, False, False, False, 0.4, 0.4, 0.2, 0.2, '', '', 'Background', 0.2, -1.0, False, 0.4, 0.4, 0.2, 0.2, '', '', 'Background', 0.2, -1.0, False, 0.4, 0.4, 0.2, 0.2, '', '', 'Background', 0.2, -1.0, False, 0.4, 0.4, 0.2, 0.2, '', '', 'Background', 0.2, -1.0, False, 0.4, 0.4, 0.2, 0.2, '', '', 'Background', 0.2, -1.0, False, 0.4, 0.4, 0.2, 0.2, '', '', 'Background', 0.2, -1.0, False, 0.4, 0.4, 0.2, 0.2, '', '', 'Background', 0.2, -1.0, False, 0.4, 0.4, 0.2, 0.2, '', '', 'Background', 0.2, -1.0, False, 'DemoFusion', True, 128, 64, 4, 2, False, 10, 1, 1, 64, False, True, 3, 1, 1, False, 3072, 192, True, True, True, False, True, 0, 1, 0, 'Version 2', 1.2, 0.9, 0, 0.5, 0, 1, 1.4, 0.2, 0, 0.5, 0, 1, 1, 1, 0, 0.5, 0, 1, False, False, 'positive', 'comma', 0, False, False, 'start', '', 1, '', [], 0, '', [], 0, '', [], True, False, False, False, False, False, False, 0, False) {}
    Traceback (most recent call last):
      File "F:\SD-Zluda\modules\call_queue.py", line 57, in f
        res = list(func(*args, **kwargs))
      File "F:\SD-Zluda\modules\call_queue.py", line 36, in f
        res = func(*args, **kwargs)
      File "F:\SD-Zluda\modules\txt2img.py", line 110, in txt2img
        processed = processing.process_images(p)
      File "F:\SD-Zluda\modules\processing.py", line 787, in process_images
        res = process_images_inner(p)
      File "F:\SD-Zluda\modules\processing.py", line 1015, in process_images_inner
        samples_ddim = p.sample(conditioning=p.c, unconditional_conditioning=p.uc, seeds=p.seeds, subseeds=p.subseeds, subseed_strength=p.subseed_strength, prompts=p.prompts)
      File "F:\SD-Zluda\modules\processing.py", line 1351, in sample
        samples = self.sampler.sample(self, x, conditioning, unconditional_conditioning, image_conditioning=self.txt2img_image_conditioning(x))
      File "F:\SD-Zluda\modules\sd_samplers_kdiffusion.py", line 239, in sample
        samples = self.launch_sampling(steps, lambda: self.func(self.model_wrap_cfg, x, extra_args=self.sampler_extra_args, disable=False, callback=self.callback_state, **extra_params_kwargs))
      File "F:\SD-Zluda\modules\sd_samplers_common.py", line 261, in launch_sampling
        return func()
      File "F:\SD-Zluda\modules\sd_samplers_kdiffusion.py", line 239, in <lambda>
        samples = self.launch_sampling(steps, lambda: self.func(self.model_wrap_cfg, x, extra_args=self.sampler_extra_args, disable=False, callback=self.callback_state, **extra_params_kwargs))
      File "F:\SD-Zluda\venv\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
        return func(*args, **kwargs)
      File "F:\SD-Zluda\repositories\k-diffusion\k_diffusion\sampling.py", line 145, in sample_euler_ancestral
        denoised = model(x, sigmas[i] * s_in, **extra_args)
      File "F:\SD-Zluda\venv\lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
        return self._call_impl(*args, **kwargs)
      File "F:\SD-Zluda\venv\lib\site-packages\torch\nn\modules\module.py", line 1520, in _call_impl
        return forward_call(*args, **kwargs)
      File "F:\SD-Zluda\modules\sd_samplers_cfg_denoiser.py", line 237, in forward
        x_out = self.inner_model(x_in, sigma_in, cond=make_condition_dict(cond_in, image_cond_in))
      File "F:\SD-Zluda\venv\lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
        return self._call_impl(*args, **kwargs)
      File "F:\SD-Zluda\venv\lib\site-packages\torch\nn\modules\module.py", line 1520, in _call_impl
        return forward_call(*args, **kwargs)
      File "F:\SD-Zluda\repositories\k-diffusion\k_diffusion\external.py", line 112, in forward
        eps = self.get_eps(input * c_in, self.sigma_to_t(sigma), **kwargs)
      File "F:\SD-Zluda\repositories\k-diffusion\k_diffusion\external.py", line 138, in get_eps
        return self.inner_model.apply_model(*args, **kwargs)
      File "F:\SD-Zluda\modules\sd_models_xl.py", line 44, in apply_model
        return self.model(x, t, cond)
      File "F:\SD-Zluda\venv\lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
        return self._call_impl(*args, **kwargs)
      File "F:\SD-Zluda\venv\lib\site-packages\torch\nn\modules\module.py", line 1561, in _call_impl
        result = forward_call(*args, **kwargs)
      File "F:\SD-Zluda\modules\sd_hijack_utils.py", line 18, in <lambda>
        setattr(resolved_obj, func_path[-1], lambda *args, **kwargs: self(*args, **kwargs))
      File "F:\SD-Zluda\modules\sd_hijack_utils.py", line 32, in __call__
        return self.__orig_func(*args, **kwargs)
      File "F:\SD-Zluda\repositories\generative-models\sgm\modules\diffusionmodules\wrappers.py", line 28, in forward
        return self.diffusion_model(
      File "F:\SD-Zluda\venv\lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
        return self._call_impl(*args, **kwargs)
      File "F:\SD-Zluda\venv\lib\site-packages\torch\nn\modules\module.py", line 1561, in _call_impl
        result = forward_call(*args, **kwargs)
      File "F:\SD-Zluda\modules\sd_unet.py", line 91, in UNetModel_forward
        return original_forward(self, x, timesteps, context, *args, **kwargs)
      File "F:\SD-Zluda\repositories\generative-models\sgm\modules\diffusionmodules\openaimodel.py", line 997, in forward
        h = th.cat([h, hs.pop()], dim=1)
      File "F:\SD-Zluda\extensions\sd-webui-freeu\lib_free_u\unet.py", line 67, in free_u_cat_hijack
        h_skip = filter_skip(
      File "F:\SD-Zluda\extensions\sd-webui-freeu\lib_free_u\unet.py", line 99, in filter_skip
        x_freq = torch.fft.fftn(x.to(fft_device).float(), dim=(-2, -1))
    RuntimeError: cuFFT error: CUFFT_INTERNAL_ERROR

I'm running auto1111 on an AMD GPU (RX 6800) with the DirectML fork of auto1111 and ZLUDA installed on top, if that helps. I'm not sure what other details I can provide to help troubleshoot the issue, so feel free to ask :) (A workaround sketch follows.)
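
One hedged workaround sketch (not the extension's official fix): the traceback shows lib_free_u/unet.py already routing the tensor through an fft_device before the FFT, so if ZLUDA's cuFFT shim keeps raising CUFFT_INTERNAL_ERROR, a CPU fallback along these lines might help:

import torch
import torch.fft as fft

def fftn_with_cpu_fallback(x, dim=(-2, -1)):
    try:
        return fft.fftn(x.float(), dim=dim)
    except RuntimeError:
        # cuFFT failed (e.g. under ZLUDA); compute on the CPU and move back
        return fft.fftn(x.cpu().float(), dim=dim).to(x.device)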

UNetModel

I saved the code in a .py file, but I get an error when ComfyUI starts:

FileNotFoundError: [Errno 2] No such file or directory: 'C:\ComfyUI_windows_portable\ComfyUI\custom_nodes\FreeU\__init__.py'
Cannot import C:\ComfyUI_windows_portable\ComfyUI\custom_nodes\FreeU module for custom nodes: [Errno 2] No such file or directory: 'C:\ComfyUI_windows_portable\ComfyUI\custom_nodes\FreeU\__init__.py'

The node is available and works in ComfyUI... but I'm not sure I'm getting 100% of the function.
class Free_UNetModel(UNetModel)
There is no UNetModel.py file anywhere, in the ComfyUI or A1111 folders. (A layout sketch follows.)
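
For context, a hedged sketch of the standard ComfyUI custom-node layout (general ComfyUI convention, not the FreeU authors' packaging; the module and class names below are hypothetical): the folder needs an __init__.py that exports NODE_CLASS_MAPPINGS, which is exactly the file the loader reports as missing:

# C:\ComfyUI_windows_portable\ComfyUI\custom_nodes\FreeU\__init__.py
from .freeu_node import FreeU_Node  # hypothetical module and class names

NODE_CLASS_MAPPINGS = {"FreeU": FreeU_Node}
NODE_DISPLAY_NAME_MAPPINGS = {"FreeU": "FreeU"}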

Some feedback

Disclaimer: I'm not an AI researcher so I could've done something wrong.

So I got it working after some minor changes and import fixes. Now when I run it at 512x768 resolution, the issue is: RuntimeError: cuFFT only supports dimensions whose sizes are powers of two when computing in half precision, but got a signal size of[12, 8]. I suppose that's the corresponding layer size, which becomes rectangular because of the base resolution. 512x512 should result in an [8, 8] array here, and then it all works fine.

As expected, running A1111 with --no-half makes it work but the speed is much worse.
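
An alternative to --no-half, offered as an untested sketch (my assumption, not verified here): zero-pad the feature map so its spatial sizes are powers of two before the half-precision FFT, then crop back afterwards; this keeps fp16 speed at the cost of a slightly approximated filter:

import torch.nn.functional as F

def next_pow2(n):
    return 1 << (n - 1).bit_length()

def pad_to_pow2(x):
    # x: (B, C, H, W); returns the padded tensor and the original size
    # so the result can be cropped back after the inverse FFT.
    B, C, H, W = x.shape
    ph, pw = next_pow2(H) - H, next_pow2(W) - W
    return F.pad(x, (0, pw, 0, ph)), (H, W)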

I used the parameters for SD1.4 and simply hardcoded them to quickly test whether it works at all. On the fine-tuned model epiCRealism naturalSin, FreeU makes the images worse: they become more saturated and the skin texture turns to plastic (maybe because we suppress exactly the high-frequency features?). It starts looking more like the base models or the early fine-tuned models:

Original: [image attached]

FreeU: [image attached]

AnimateDiff doesn't seem to work in --no-half mode; it throws a CUDA error. So we're limited to 512x512. Same symptoms of oversaturation; the skin-quality change doesn't show due to grain and artifacts. However, FreeU added a third hand. I tried two slightly different prompts.

Original: [images 02086-751880090, 02085-751880090]

FreeU: [images 02087-751880090, 02084-751880090]

The faces are garbled in all cases, but to be honest I much prefer the results without FreeU. The colors are better, the anatomy is better, and the skirt is much more detailed and moves more naturally.

My patch, applied to stable-diffusion-webui/repositories/stable-diffusion-stability-ai:

diff --git a/ldm/modules/diffusionmodules/openaimodel.py b/ldm/modules/diffusionmodules/openaimodel.py
index cc3875c..ede0b5a 100644
--- a/ldm/modules/diffusionmodules/openaimodel.py
+++ b/ldm/modules/diffusionmodules/openaimodel.py
@@ -4,6 +4,7 @@ import math
 import numpy as np
 import torch as th
 import torch.nn as nn
+import torch.fft as fft
 import torch.nn.functional as F
 
 from ldm.modules.diffusionmodules.util import (
@@ -418,6 +419,24 @@ class Timestep(nn.Module):
         return timestep_embedding(t, self.dim)
 
 
+def Fourier_filter(x, threshold, scale):
+    # FFT
+    x_freq = fft.fftn(x, dim=(-2, -1))
+    x_freq = fft.fftshift(x_freq, dim=(-2, -1))
+
+    B, C, H, W = x_freq.shape
+    mask = th.ones((B, C, H, W)).cuda()
+
+    crow, ccol = H // 2, W // 2
+    mask[..., crow - threshold:crow + threshold, ccol - threshold:ccol + threshold] = scale
+    x_freq = x_freq * mask
+
+    # IFFT
+    x_freq = fft.ifftshift(x_freq, dim=(-2, -1))
+    x_filtered = fft.ifftn(x_freq, dim=(-2, -1)).real
+
+    return x_filtered
+
 class UNetModel(nn.Module):
     """
     The full UNet model with attention and timestep embedding.
@@ -798,8 +817,24 @@ class UNetModel(nn.Module):
             hs.append(h)
         h = self.middle_block(h, emb, context)
         for module in self.output_blocks:
-            h = th.cat([h, hs.pop()], dim=1)
-            h = module(h, emb, context)
+            if True:
+                hs_ = hs.pop()
+
+                # --------------- FreeU code -----------------------
+                # Only operate on the first two stages
+                if h.shape[1] == 1280:
+                    h[:,:640] = h[:,:640] * 1.2
+                    hs_ = Fourier_filter(hs_, threshold=1, scale=0.9)
+                if h.shape[1] == 640:
+                    h[:,:320] = h[:,:320] * 1.4
+                    hs_ = Fourier_filter(hs_, threshold=1, scale=0.2)
+                # ---------------------------------------------------------
+
+                h = th.cat([h, hs_], dim=1)
+                h = module(h, emb, context)
+            else:
+                h = th.cat([h, hs.pop()], dim=1)
+                h = module(h, emb, context)
         h = h.type(x.dtype)
         if self.predict_codebook_ids:
             return self.id_predictor(h)

The simplest way to switch between FreeU and vanilla is to change if True: to if False:. Again, it's just a hack to test if it works.
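
A slightly cleaner toggle (my suggestion, reusing the exact factors from the patch above): gate the FreeU branch on a module-level flag so it can be flipped without re-editing the loop:

FREEU_ENABLED = True  # module-level flag; set False for vanilla behaviour

# inside UNetModel.forward, replacing the `if True:` block from the patch
for module in self.output_blocks:
    hs_ = hs.pop()
    if FREEU_ENABLED:
        if h.shape[1] == 1280:  # first up stage
            h[:, :640] = h[:, :640] * 1.2
            hs_ = Fourier_filter(hs_, threshold=1, scale=0.9)
        if h.shape[1] == 640:   # second up stage
            h[:, :320] = h[:, :320] * 1.4
            hs_ = Fourier_filter(hs_, threshold=1, scale=0.2)
    h = th.cat([h, hs_], dim=1)
    h = module(h, emb, context)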

In conclusion, if everything is correct on my end, it's probably not worth it for the best fine-tuned models. On the contrary, to make it work in all cases you have to run at full 32-bit precision with a roughly 3x slowdown and get images that look worse than without it. The base models surely benefit from it, but honestly, who uses them except researchers and LoRA trainers?

I hope I made a mistake somewhere and these results are all wrong. After all, I just copied the part that differs from the original code and fixed the errors to make it work, but who knows.

How do you create the figures in your research paper?

Could you please show me how Figures 2 and 3 in the paper were created? For example, the figure of the denoising process (Figure 2) and the relative log amplitudes of the Fourier transform for intermediate diffusion steps (Figure 3).

Is there code for the experiment?

Hello Si,
I am very interested in this work. I would also like to know whether this result holds in other models as well, so I would appreciate it if you could provide the details and code of the experiment.
Sincerely, Jiahui Li

Hyperparameter range

Thanks for sharing your hyperparameter values. I am trying to apply FreeU to a diffusion-based TTS model, which is my field of interest.

I have a question because I had difficulty setting hyperparameter ranges during my experiments.
How did you set the range of the skip factors and backbone factors?

Some questions about FreeU code

Thank you for your excellent and intriguing work! However, I still have some questions regarding the application of FreeU to downstream tasks:
1. Since FreeU doesn't have any parameters to train, do I only need to incorporate the FreeU component during reverse denoising, rather than including it in the training phase? (A minimal sketch follows this list.)
2. Have you ever experimented with applying FreeU to sequence-generation tasks? If so, could you provide insights on how to set the values for b and s?
3. The code includes FreeU operations only in the first two up blocks. Have you explored adding FreeU operations to all up blocks? What outcomes or results can be expected from such an approach?
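
On question 1, a hedged sketch: since FreeU is training-free, it only needs to be switched on at sampling time. diffusers ships FreeU enable/disable helpers on its pipelines; the factor values below are the SD1.x ones used in the patch earlier on this page:

import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
pipe.enable_freeu(s1=0.9, s2=0.2, b1=1.2, b2=1.4)  # reverse denoising only
image = pipe("a photo of an astronaut riding a horse").images[0]
pipe.disable_freeu()  # restore the vanilla UNet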
