
sdxs's Introduction

SDXS: Real-Time One-Step Latent Diffusion Models with Image Conditions

Project | Paper | SDXS-512-0.9 | SDXS-512-DreamShaper | SDXS-512-DreamShaper-Anime | SDXS-512-DreamShaper-Sketch | SDXS-512-DreamShaper-Demo | SDXS-512-DreamShaper-Anime-Demo | SDXS-512-DreamShaper-Sketch-Demo

Yuda Song, Zehao Sun, Xuanwu Yin

We present two models, SDXS-512 and SDXS-1024, which achieve inference speeds of approximately 100 FPS (30x faster than SD v1.5) and 30 FPS (60x faster than SDXL), respectively, on a single GPU. If image generation is limited to 1 second, SDXL can use only 16 NFEs (number of function evaluations) and produces a slightly blurry image, whereas SDXS-1024 can generate 30 clear images.
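The throughput figures above can be turned into per-image latencies and time-budget counts with simple arithmetic. The Python sketch below uses only the FPS and speedup numbers quoted in the text; the helper names are illustrative. The implied baseline FPS values are derived, not measured, and they depend on the baseline step counts, which are not stated here.

```python
# Back-of-the-envelope arithmetic for the quoted throughput figures.
# FPS and speedup values come from the text; helper names are illustrative.

def latency_ms(fps: float) -> float:
    """Per-image latency in milliseconds at a given throughput."""
    return 1000.0 / fps

def images_in_budget(fps: float, budget_s: float = 1.0) -> int:
    """Number of whole images that fit in a time budget."""
    return int(fps * budget_s)

def implied_baseline_fps(fps: float, speedup: float) -> float:
    """Baseline throughput implied by a reported speedup factor (derived, not measured)."""
    return fps / speedup

for name, fps, speedup, baseline in [
    ("SDXS-512", 100.0, 30.0, "SD v1.5"),
    ("SDXS-1024", 30.0, 60.0, "SDXL"),
]:
    print(
        f"{name}: {latency_ms(fps):.1f} ms/image, "
        f"{images_in_budget(fps)} images in 1 s, "
        f"implied {baseline} throughput ~{implied_baseline_fps(fps, speedup):.2f} FPS"
    )
```

For example, 30 FPS corresponds to roughly 33 ms per image, which is why 30 images fit into the 1-second budget.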

Our method can also be used to train ControlNet, which opens up applications in image-conditioned control and enables efficient image-to-image translation.

🔥News

⚡️Demo

Create a new environment:

conda create -n sdxs

Activate the new environment:

conda activate sdxs

Install requirements:

conda install python=3.10 pytorch=2.2.1 torchvision torchaudio pytorch-cuda=11.8 xformers=0.0.25 -c pytorch -c nvidia -c xformers
pip install -r requirements.txt

Run text-to-image demo:

python demo.py

Run anime-style text-to-image (LoRA) demo:

python demo_anime.py

Run sketch-to-image (ControlNet) demo:

python demo_sketch.py

💡Train

DMD2 has released its training code, and its training scheme is identical to that of the new version of SDXS, so you can refer to it. Unfortunately, the SDXS training code is not permitted to be open-sourced and will most likely not be updated again.

✒️Method

Model Acceleration

We train an extremely lightweight image decoder to mimic the output of the original VAE decoder, using a combination of an output distillation loss and a GAN loss. We also use a block removal distillation strategy to efficiently transfer knowledge from the original U-Net to a more compact version.
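The two objectives described above can be written down schematically. This is a hedged formalization, not the paper's exact losses: the notation, the image-space distance d, the non-saturating GAN generator term, the weight λ, and the purely output-level form of the block-removal loss are all assumptions for illustration.

```latex
% Tiny decoder: match the frozen VAE decoder's output, plus an adversarial term.
% z: latent; Dec_tiny: lightweight decoder being trained; Dec_VAE: original (frozen) decoder;
% d: an image-space distance (assumed); \mathcal{D}: discriminator; \lambda: loss weight (assumed).
\mathcal{L}_{\text{dec}} = \mathbb{E}_{z}\big[\, d\big(\mathrm{Dec}_{\text{tiny}}(z),\ \mathrm{Dec}_{\text{VAE}}(z)\big) \big]
  + \lambda\, \mathbb{E}_{z}\big[ -\log \mathcal{D}\big(\mathrm{Dec}_{\text{tiny}}(z)\big) \big]

% Block removal (output-level form assumed): the compact U-Net (student)
% is trained to reproduce the predictions of the original U-Net (teacher).
\mathcal{L}_{\text{BR}} = \mathbb{E}_{x_t, t, c}\big[\, \lVert \epsilon_{\text{student}}(x_t, t, c) - \epsilon_{\text{teacher}}(x_t, t, c) \rVert_2^2 \big]
```

A feature-level matching term between intermediate U-Net activations could be added to the block-removal objective; whether SDXS does so is not stated here.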

SDXS is far more efficient than its base models, generating 512x512 images at 100 FPS and 1024x1024 images at 30 FPS on a single GPU.

Text-to-Image

To reduce the NFEs, we propose straightening the sampling trajectory and quickly fine-tuning the multi-step model into a one-step model, replacing the distillation loss function with our proposed feature matching loss. We then extend the Diff-Instruct training strategy, using the gradient of the feature matching loss in place of the gradient provided by score distillation for the latter half of the timesteps.
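A feature matching loss compares the activations of a frozen feature extractor on two inputs, layer by layer, instead of comparing the outputs directly. The minimal Python sketch below assumes a generic form: a weighted sum over layers of the mean squared error between flattened features. The per-layer weights, the MSE distance, and the function names are assumptions for illustration, not the exact SDXS loss.

```python
from typing import Sequence

def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return sum((x - y) * (x - y) for x, y in zip(a, b)) / len(a)

def feature_matching_loss(
    student_feats: Sequence[Sequence[float]],
    target_feats: Sequence[Sequence[float]],
    weights: Sequence[float],
) -> float:
    """Weighted sum over layers of the MSE between student and target features.

    Each entry of student_feats / target_feats is the flattened activation of
    one layer of a (hypothetical) frozen feature extractor.
    """
    if not (len(student_feats) == len(target_feats) == len(weights)):
        raise ValueError("need one feature pair and one weight per layer")
    return sum(w * mse(s, t) for w, s, t in zip(weights, student_feats, target_feats))

# Toy usage with two "layers" of features.
student = [[0.0, 1.0], [2.0, 2.0, 2.0]]
target = [[0.0, 0.0], [2.0, 2.0, 5.0]]
print(feature_matching_loss(student, target, weights=[1.0, 0.5]))  # 2.0
```

Here layer 1 contributes 1.0 * 0.5 and layer 2 contributes 0.5 * 3.0. Weighting layers separately lets coarse and fine features be balanced independently.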

Despite substantial reductions in both model size and the number of sampling steps, SDXS-512 still follows prompts better than SD v1.5. The same holds for SDXS-1024.

Image-to-Image

We extend our training strategy to ControlNet by adding the pretrained ControlNet to the score function.

We demonstrate its effectiveness for image-to-image translation with ControlNet, specifically for transformations conditioned on Canny edges and depth maps.
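The ControlNet training described above relies on adding a pretrained ControlNet to the score function. In the standard ControlNet design, the control branch produces residual feature maps that are added element-wise to the U-Net's skip connections (to my understanding, diffusers exposes this through the down_block_additional_residuals and mid_block_additional_residual arguments of UNet2DConditionModel). The toy Python sketch below shows only that addition step on flattened lists; add_control_residuals and conditioning_scale are hypothetical names, and this is not the SDXS training code.

```python
from typing import List, Sequence

Features = List[List[float]]

def add_control_residuals(
    skip_feats: Sequence[Sequence[float]],
    control_residuals: Sequence[Sequence[float]],
    conditioning_scale: float = 1.0,
) -> Features:
    """Inject ControlNet outputs into U-Net skip features by element-wise addition.

    skip_feats: one flattened feature map per U-Net skip connection.
    control_residuals: the matching ControlNet outputs (same shapes).
    conditioning_scale: global strength of the control signal (hypothetical name).
    """
    if len(skip_feats) != len(control_residuals):
        raise ValueError("need one control residual per skip connection")
    out: Features = []
    for feat, res in zip(skip_feats, control_residuals):
        if len(feat) != len(res):
            raise ValueError("feature and residual shapes differ")
        out.append([f + conditioning_scale * r for f, r in zip(feat, res)])
    return out

skips = [[1.0, 2.0], [3.0]]
residuals = [[0.5, -0.5], [1.0]]
print(add_control_residuals(skips, residuals, conditioning_scale=2.0))  # [[2.0, 1.0], [5.0]]
```

Because ControlNet's output layers are zero-initialized, a freshly initialized control branch produces all-zero residuals and leaves the U-Net's features unchanged.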

Citation

If you find this work useful for your research, please cite our paper:

@article{song2024sdxs,
  author    = {Yuda Song and Zehao Sun and Xuanwu Yin},
  title     = {SDXS: Real-Time One-Step Latent Diffusion Models with Image Conditions},
  journal   = {arXiv},
  year      = {2024},
}

Acknowledgment: the demo code is based on https://github.com/GaParmar/img2img-turbo.


sdxs's Issues

Number of steps and strength other than 1 not working

Hello, I am trying SDXS with the AutoPipelineForImage2Image pipeline, but I noticed that if I set the number of steps greater than 1, the results get worse, and if I set strength to a value smaller than 1, it raises an error. Is this behavior normal?

Thanks in advance,

Joan
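A hedged note on the strength error: in diffusers-style img2img pipelines, to my understanding, strength selects how far into the noise schedule the input image is pushed, and only about int(num_inference_steps * strength) denoising steps are actually run. The Python function below is a hypothetical re-implementation of that arithmetic (effective_img2img_steps is an illustrative name, not a diffusers API). With num_inference_steps=1 and strength < 1 it yields 0 steps, which would be consistent with an error.

```python
def effective_img2img_steps(num_inference_steps: int, strength: float) -> int:
    """Denoising steps run by an img2img pipeline (assumed diffusers-style arithmetic).

    The input image is noised to an intermediate timestep chosen by `strength`,
    and only the remaining part of the schedule is denoised.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be in [0, 1]")
    init_timestep = min(int(num_inference_steps * strength), num_inference_steps)
    t_start = max(num_inference_steps - init_timestep, 0)
    return num_inference_steps - t_start

for steps, strength in [(1, 1.0), (1, 0.5), (2, 0.5), (4, 0.75)]:
    print(steps, strength, "->", effective_img2img_steps(steps, strength))
```

This arithmetic does not explain the quality drop with more steps; one plausible reason is that SDXS was distilled as a one-step model, so running it from intermediate timesteps is outside what it was trained for.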

Cannot get results comparable to the original DreamShaper model

Thanks for releasing the model. Really interested in your work.
However, using the same prompt, I CANNOT get results comparable to the original DreamShaper model. Here are the examples:

sdxs-512-dreamshaper / large vae / fp16: [image]

dreamshaper / 4 steps / cfg 3: [image]

dreamshaper / 8 steps / cfg 3: [image]

Can you help me with this problem? Should I change other parameters during inference?

How to access the LAION-5B prompts?

Thanks for your impressive work! I noticed that you used LAION-5B prompts to generate data. Could you share how to access them?

New model distillation?

Is this similar to the (in my view, silly) distillation methods such as LCM, TCG, LIGHT, and TURBO in mainland China? They are only suited to generating portrait photos, and the low CFG makes the many art styles supported by SDXL itself completely unusable.

ControlNet Release?

Are there any plans to release a ControlNet for SDXS?
If not, I'm curious whether adapting a standard ControlNet to match its architecture would be feasible.

SSIM parameters

Hi, very interesting work indeed. Since you apply the SSIM loss in latent space, how did you choose its parameters, specifically data_range and window_size? Looking forward to your answer, thank you!
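To illustrate what the two parameters in this question control, here is a minimal pure-Python sketch of the standard SSIM formula. Assumptions: a uniform (box) window instead of the usual Gaussian, population statistics, K1 = 0.01 and K2 = 0.03, and the mean over all fully valid windows. data_range (L) sets the stabilizing constants C1 = (K1 * L)^2 and C2 = (K2 * L)^2, and window_size sets the region over which local means, variances, and covariance are computed. Latents are not confined to a fixed pixel range like [0, 1], so L is not implied by the data format. This says nothing about the values the authors actually used.

```python
from typing import List, Tuple

Image = List[List[float]]

def _window_stats(x: Image, y: Image, i: int, j: int, w: int) -> Tuple[float, ...]:
    """Means, variances and covariance of x and y over one w-by-w window."""
    xs = [x[r][c] for r in range(i, i + w) for c in range(j, j + w)]
    ys = [y[r][c] for r in range(i, i + w) for c in range(j, j + w)]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    vx = sum((a - mx) * (a - mx) for a in xs) / n
    vy = sum((b - my) * (b - my) for b in ys) / n
    cxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / n
    return mx, my, vx, vy, cxy

def ssim(x: Image, y: Image, data_range: float, window_size: int = 7,
         k1: float = 0.01, k2: float = 0.03) -> float:
    """Mean SSIM over all fully valid windows (uniform window, no padding)."""
    h, w = len(x), len(x[0])
    if (len(y), len(y[0])) != (h, w):
        raise ValueError("inputs must have the same shape")
    if window_size > min(h, w):
        raise ValueError("window_size is larger than the input")
    c1 = (k1 * data_range) ** 2  # stabilizes the luminance (mean) term
    c2 = (k2 * data_range) ** 2  # stabilizes the contrast/structure term
    scores = []
    for i in range(h - window_size + 1):
        for j in range(w - window_size + 1):
            mx, my, vx, vy, cxy = _window_stats(x, y, i, j, window_size)
            num = (2 * mx * my + c1) * (2 * cxy + c2)
            den = (mx * mx + my * my + c1) * (vx + vy + c2)
            scores.append(num / den)
    return sum(scores) / len(scores)

a = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
b = [[0.0] * 3 for _ in range(3)]
print(ssim(a, a, data_range=1.0, window_size=3))  # 1.0
print(ssim(a, b, data_range=1.0, window_size=3))
print(ssim(a, b, data_range=10.0, window_size=3))
```

For the same pair of inputs, a larger data_range inflates C1 and C2 and pushes the score toward 1, so an overestimated range makes the loss less sensitive. A smaller window_size makes the statistics more local but noisier.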

How to distill ControlNet?

I am interested in making ControlNet lightweight and wonder how to do it. Is there a training script?

Fine tuning

Thank you for this great work. I need to train models for image translation tasks (no text conditioning needed). Do you plan to release fine-tuning instructions, or can I integrate LoRA or DreamBooth methods into your code? Do you have any plans to release the source code?
