fastdiffusion

Useful resources

Additional papers

  • Diffusion Models Beat GANs on Image Synthesis, Dhariwal & Nichol 2021.

    Proposes architecture improvements over the 2021 state of the art (DDPM and DDIM) that could give some insight when we write models from scratch. It also introduces classifier guidance to improve conditional image synthesis. Classifier-free guidance has since largely replaced it, but using a classifier still looks like the natural first thing to try for conditional generation.

  • Fast Sampling of Diffusion Models with Exponential Integrator.

    Introduces the DEIS scheduler. The authors claim excellent sampling results with as few as 12 steps. I haven't read it yet.
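
The guidance idea noted under Dhariwal & Nichol can be illustrated with the classifier-free variant that later replaced it. This is only a minimal sketch under stated assumptions: the function name `cfg_combine`, the use of plain Python lists for noise predictions, and the `guidance_scale` parameter name are hypothetical and are not taken from either paper's code.

```python
def cfg_combine(eps_uncond, eps_cond, guidance_scale):
    """Blend unconditional and conditional noise predictions.

    Assumed formulation of classifier-free guidance:
        eps = eps_uncond + guidance_scale * (eps_cond - eps_uncond)
    Both inputs are flat lists of floats of equal length (an
    illustrative stand-in for model output tensors).
    """
    if len(eps_uncond) != len(eps_cond):
        raise ValueError("predictions must have the same length")
    return [u + guidance_scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


# Hypothetical predictions from one model called with and without conditioning.
eps_uncond = [0.0, 1.0]
eps_cond = [1.0, 3.0]
guided = cfg_combine(eps_uncond, eps_cond, guidance_scale=2.0)
```

With a scale of 0 the result is the unconditional prediction, with 1 it is the conditional one, and larger values push further along the direction from unconditional to conditional.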

Application-oriented papers

Some of these tricks could be effective / didactic.

Improvements on simple diffusion

  • Better denoising autoencoder (diffusion model)
    • Unet
    • Attention
    • P2 weighting
    • EMA
    • Self-conditioning
  • Predict noise / gradient (Score based diffusion)
  • Latent diffusion (does not have to be a unet)
    • Attention
  • Better loss functions
    • Perceptual + MSE + GAN (in the VAE)
  • Preconditioning/scaling inputs and outputs
  • Other crappifiers
  • Data augmentation
  • Better samplers / optimisers
  • Initialisers such as pixelshuffle
  • Learnable blur
  • Blur noise
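
As a rough illustration of the "predict noise" item in the list above, here is a minimal sketch of a DDPM-style noising step and the matching noise-prediction loss. The helper names `add_noise` and `noise_prediction_loss`, the flat-list image stand-in, and the example `alpha_bar` value are assumptions made for illustration only. A real model would replace the "perfect predictor" used at the end.

```python
import math
import random


def add_noise(x0, alpha_bar, rng):
    """Return (x_t, eps) with x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps.

    x0 is a flat list of floats standing in for an image, alpha_bar is the
    cumulative signal fraction in [0, 1], and eps is standard Gaussian noise.
    """
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    signal_scale = math.sqrt(alpha_bar)
    noise_scale = math.sqrt(1.0 - alpha_bar)
    x_t = [signal_scale * x + noise_scale * e for x, e in zip(x0, eps)]
    return x_t, eps


def noise_prediction_loss(eps_pred, eps):
    """Mean squared error between the predicted and the true noise."""
    return sum((p - e) ** 2 for p, e in zip(eps_pred, eps)) / len(eps)


rng = random.Random(0)
x0 = [0.5, -1.0, 2.0]
x_t, eps = add_noise(x0, alpha_bar=0.25, rng=rng)

# A perfect predictor would output eps itself, so its loss is zero.
perfect_loss = noise_prediction_loss(eps, eps)
```

The training target here is the noise `eps`, not the clean input `x0`. Given `x_t`, `eps`, and `alpha_bar`, the clean input can be recovered by inverting the first formula.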

Applications

  • Style transfer
  • Super-res
  • Colorisation
  • Remove jpeg noise
  • Remove watermarks
  • Deblur
  • CycleGAN / Pix2Pix -> change subject/location/weather/etc

Diffusion Applications and Demos

  • Stable Diffusion fine-tuning (for specific styles or domains).

    • Pokemon fine-tuning.

    • Japanese Stable Diffusion code demo. They had to fine-tune the text embeddings too because the tokenizer was different.

  • Stable Diffusion morphing / videos. Code by @nateraw based on a gist by @karpathy.

  • Image Variations. Demo, with links to code. Use the CLIP image embeddings as conditioning for the generation, instead of the text embeddings. This requires fine-tuning of the model because, as far as I understand it, the text and image embeddings are not aligned in the embedding space. CLOOB doesn't have this limitation, but I heard (source: Boris Dayma from a conversation with Katherine Crowson) that attempting to train a diffusion model with CLOOB conditioning instead of CLIP produced less variety of results.

  • Image-to-image generation. Demo: sketch -> image.
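
For the morphing / video item above, one common way to move between two starting latents is spherical linear interpolation (slerp). The sketch below is an assumption-laden illustration, not a description of the linked code: the function name `slerp`, the flat-list latents, and the fallback threshold are hypothetical choices.

```python
import math


def slerp(t, v0, v1, eps=1e-7):
    """Spherically interpolate between two non-zero vectors.

    t = 0 returns v0 and t = 1 returns v1. Vectors are flat lists of
    floats standing in for latent tensors.
    """
    dot = sum(a * b for a, b in zip(v0, v1))
    norm0 = math.sqrt(sum(a * a for a in v0))
    norm1 = math.sqrt(sum(b * b for b in v1))
    # Clamp to guard against rounding pushing the cosine outside [-1, 1].
    cos_theta = max(-1.0, min(1.0, dot / (norm0 * norm1)))
    theta = math.acos(cos_theta)
    sin_theta = math.sin(theta)
    if sin_theta < eps:
        # Nearly parallel or antiparallel: fall back to linear interpolation.
        return [(1.0 - t) * a + t * b for a, b in zip(v0, v1)]
    w0 = math.sin((1.0 - t) * theta) / sin_theta
    w1 = math.sin(t * theta) / sin_theta
    return [w0 * a + w1 * b for a, b in zip(v0, v1)]


# Hypothetical two-dimensional "latents"; each frame would be decoded separately.
start, end = [1.0, 0.0], [0.0, 1.0]
frames = [slerp(i / 4, start, end) for i in range(5)]
```

Unlike a straight linear blend, the intermediate vectors for these two unit-length inputs keep unit length, which is the usual motivation for preferring slerp when latents are drawn from a Gaussian.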

Style Transfer

Other model ideas

  • Latent space models
    • Imagenet
    • CLIP
    • Noisy CLIP

Contributors

jantic, johnowhitaker, jph00, pcuenca, seem, tcapelle, tmabraham
