rmgogogo / nano-aigc

Nano versions of generative models, for fun. No SOTA here, nano first.

Python 34.59% Jupyter Notebook 65.41%
conditional-vae diffusion vae variational-autoencoder clip conditional-diffusion bert huggingface ddim latent-diffusion

nano-aigc's Introduction

AIGC Generative Models

For NLP generative models such as GPT, see https://github.com/rmgogogo/nano-transformers

This repo focuses more on generative models in general. GPT may still be tried here.

This repo uses PyTorch.

VAE

python vae.py --train --epochs 10 --predict

Conditional VAE

python cvae.py --train --epochs 10 --predict

Diffusion

python diffusion.py --train --epochs 100 --predict

On a Mac Mini M1, training takes a little over an hour (1:17:16).

Conditional Diffusion

python conditional_diffusion.py --train --epochs 100 --predict

CLIP

python clip.py --train --epochs 10 --predict

CLIP Pro

A pro version of CLIP. It uses the BERT text encoder with real text. The image side is a nano VAE while the BERT encoder produces a 768-d vector, and there are only 10 digits, so a batch has a high probability of containing the same digit more than once, in which case CLIP's contrastive loss cannot work well. A smaller batch would help, but small batches have their own problems. As a result the performance is not good. However, it is good enough as a demo to convey the essence.

python clip_pro.py --train --epochs 10 --predict
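
The batch-collision problem described above can be put into numbers. The following is a minimal sketch in plain Python, not code from this repo, and it assumes labels are drawn uniformly at random from the 10 digits. The function name prob_duplicate_label is made up for illustration.

```python
from math import perm

def prob_duplicate_label(batch_size, num_classes=10):
    """Probability that a batch of uniformly random labels repeats a label."""
    if batch_size > num_classes:
        return 1.0  # pigeonhole: more samples than classes forces a repeat
    # P(all distinct) = n * (n - 1) * ... * (n - b + 1) / n^b
    p_all_distinct = perm(num_classes, batch_size) / num_classes ** batch_size
    return 1.0 - p_all_distinct

for b in (2, 4, 8, 16):
    print(b, round(prob_duplicate_label(b), 3))
# 2 0.1
# 4 0.496
# 8 0.982
# 16 1.0
```

Even a batch of 8 almost always contains a repeated digit, so some off-diagonal image/text pairs that the contrastive loss treats as negatives are in fact matches.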

VQ VAE

python vqvae.py --train --epochs 100 --predict

The codebook size is 32, so all the possibilities are displayed here. This sample quantizes the whole z; in a real case, VQ is applied to parts of it.
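
As a rough illustration of quantizing the whole z against a 32-entry codebook, here is a plain-Python sketch of the nearest-neighbour lookup. It is not the repo's PyTorch code. The latent size of 8, the random codebook, and the name quantize are assumptions for the example.

```python
import random

def quantize(z, codebook):
    """Return (index, entry) of the codebook entry closest to z (squared L2)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    index = min(range(len(codebook)), key=lambda i: sq_dist(z, codebook[i]))
    return index, codebook[index]

rng = random.Random(0)
latent_dim = 8  # assumed for illustration
codebook = [[rng.gauss(0.0, 1.0) for _ in range(latent_dim)] for _ in range(32)]
z = [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]

index, z_q = quantize(z, codebook)
print(index, z_q)  # the decoder would receive z_q instead of z
```

Because the whole z maps to one of 32 entries, the decoder can only ever produce 32 distinct outputs, which is why every possibility can be displayed.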

The initial codebook:

The learned codebook:

DDIM (Faster Diffusion Generation)

Generation is 50 times faster.

python diffusion.py --predict --ddim
python conditional_diffusion.py --predict --ddim
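
One way to read the speed-up: DDIM samples along a strided subset of the training timesteps instead of visiting every one. The sketch below assumes 1000 training steps and 20 sampling steps, which gives a factor of 50. Those two numbers and the helper name ddim_timesteps are assumptions, not values taken from diffusion.py.

```python
def ddim_timesteps(train_steps, sample_steps):
    """Evenly strided subsequence of the training timesteps, highest first."""
    if train_steps % sample_steps != 0:
        raise ValueError("this sketch expects sample_steps to divide train_steps")
    stride = train_steps // sample_steps
    return list(range(train_steps - 1, -1, -stride))

steps = ddim_timesteps(1000, 20)
print(len(steps), steps[:3], steps[-1])  # 20 [999, 949, 899] 49
print(1000 // len(steps))                # 50 times fewer denoising calls
```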

Latent Diffusion

Based on the VAE with a latent size of 8, it runs diffusion in the latent space. However, since the VAE latent space is already one where random noise decodes to something sensible, and it is highly compressed (8 numbers), diffusion in the latent space did not work as well as expected. It is mainly for demo purposes.
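
For reference, the forward (noising) half of diffusion on an 8-number latent can be sketched in plain Python as below. The linear beta schedule from 1e-4 to 0.02 over 1000 steps is an assumption borrowed from common DDPM setups, and the function names are made up; the repo's actual schedule may differ.

```python
import math
import random

def linear_betas(steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances (an assumed schedule, steps >= 2)."""
    return [beta_start + (beta_end - beta_start) * t / (steps - 1)
            for t in range(steps)]

def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    out, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        out.append(prod)
    return out

def noise_latent(z0, t, alpha_bars, rng):
    """Sample z_t from N(sqrt(alpha_bar_t) * z0, (1 - alpha_bar_t) * I)."""
    signal = math.sqrt(alpha_bars[t])
    sigma = math.sqrt(1.0 - alpha_bars[t])
    return [signal * z + sigma * rng.gauss(0.0, 1.0) for z in z0]

rng = random.Random(0)
alpha_bars = cumulative_alphas(linear_betas(1000))
z0 = [rng.gauss(0.0, 1.0) for _ in range(8)]  # stand-in for a VAE latent
z_mid = noise_latent(z0, 500, alpha_bars, rng)
z_end = noise_latent(z0, 999, alpha_bars, rng)
print(z_mid)
print(z_end)  # almost pure noise: alpha_bars[-1] is close to 0
```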

GAN

A GAN with a simple conv net, so it is a DCGAN.

Patches VQ VAE

Split each image into smaller 4x4-pixel images, so a 28x28 image gives 7x7 patches.

Train VQ VAE for the patches.

It works like a tokenizer that gives each patch an identifier, so an image can be represented as a 7x7 sequence of tokens. Later we can implement ViT on top of it.

Comparing the Patches VQ VAE with VQ-VAE or VAE, we find that the image is sharper. However, at the boundary between two patches, some additional low-pass filtering may be needed to make it smoother.
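
The patch split itself can be sketched in a few lines of plain Python. This assumes a 28x28 input stored as a list of rows. The name patchify is hypothetical, and the repo's PyTorch code may do this differently.

```python
def patchify(image, patch=4):
    """Split an HxW image (list of rows) into a grid of patch x patch blocks."""
    h, w = len(image), len(image[0])
    if h % patch or w % patch:
        raise ValueError("image size must be a multiple of the patch size")
    return [
        [[row[c:c + patch] for row in image[r:r + patch]]
         for c in range(0, w, patch)]
        for r in range(0, h, patch)
    ]

# Dummy 28x28 image whose pixel value encodes its position.
image = [[r * 28 + c for c in range(28)] for r in range(28)]
grid = patchify(image)
print(len(grid), len(grid[0]))              # 7 7
print(len(grid[0][0]), len(grid[0][0][0]))  # 4 4
```

Each of the 49 patches would then be passed through the VQ VAE, and the index of its nearest codebook entry becomes that patch's token.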

The codebook is trained and looks good.

GPT2

GPT2 based on a toy dataset (simple math).

python gpt2.py --train --epochs 400 --predict --input "1 + 1 ="
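
The toy dataset is not shown here, so the following is only a guess at what "simple math" examples matching the "1 + 1 =" prompt could look like. The format, the operand range, and the name make_examples are all assumptions.

```python
import random

def make_examples(n, max_operand=9, seed=0):
    """Generate n strings of the form 'a + b = c' (hypothetical format)."""
    rng = random.Random(seed)
    examples = []
    for _ in range(n):
        a = rng.randint(0, max_operand)
        b = rng.randint(0, max_operand)
        examples.append(f"{a} + {b} = {a + b}")
    return examples

examples = make_examples(5)
for line in examples:
    print(line)
```

With data in this shape, the prompt "1 + 1 =" asks the model to continue the line with the sum.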

LLaMA

python llama.py --train --epochs 400 --predict --input "1 + 1 ="

Gemma

python gemma.py --train --epochs 400 --predict --input "1 + 1 ="

DiT

(1)

Split the image into patches, VQ each patch to tokenize the image into discrete tokens, and then get each token's vector via an embedding. Train GPT to predict the tokens, which finally generates the image.

Diffusion Transformer paper: https://arxiv.org/pdf/2212.09748.pdf

(2)

Split the image into patches by using a Conv layer to get the token vectors directly.
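
A Conv whose kernel size equals its stride applies one shared linear map to each non-overlapping patch, which is why it can produce token vectors directly. The plain-Python sketch below spells that out for a single-channel image with no bias and no padding. In PyTorch this would typically be an nn.Conv2d with kernel_size and stride both set to the patch size. The name patch_embed and the toy filters are assumptions.

```python
def patch_embed(image, filters, patch=4):
    """Strided convolution written out by hand.

    filters: list of `dim` kernels, each patch x patch.
    Returns one `dim`-sized token vector per patch, in row-major order.
    """
    h, w = len(image), len(image[0])
    tokens = []
    for r in range(0, h, patch):
        for c in range(0, w, patch):
            tokens.append([
                sum(f[i][j] * image[r + i][c + j]
                    for i in range(patch) for j in range(patch))
                for f in filters
            ])
    return tokens

image = [[1.0] * 28 for _ in range(28)]
# Three toy filters filled with 1, 2 and 3, so the token dimension is 3.
filters = [[[float(k)] * 4 for _ in range(4)] for k in (1, 2, 3)]
tokens = patch_embed(image, filters)
print(len(tokens), tokens[0])  # 49 [16.0, 32.0, 48.0]
```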
