
marcus-arcadius / japanese-stable-diffusion


This project forked from rinnakk/japanese-stable-diffusion


Japanese Stable Diffusion is a Japanese specific latent text-to-image diffusion model capable of generating photo-realistic images given any text input.

License: Other

Python 16.40% Jupyter Notebook 83.60%


Japanese Stable Diffusion


Japanese Stable Diffusion is a Japanese-specific latent text-to-image diffusion model.

This model was trained using the powerful text-to-image model Stable Diffusion. Many thanks to CompVis, Stability AI and LAION for their public release.

Table of Contents
News
Model Details
Usage
Citation
License

News

September 9, 2022

Model Details

Why Japanese Stable Diffusion?

Stable Diffusion is a very powerful text-to-image model, not only in terms of quality but also in terms of computational cost. Because Stable Diffusion was trained on an English dataset, non-English users need to either translate their prompts into English or use them as-is. Surprisingly, Stable Diffusion can sometimes generate nice images even from non-English prompts. So, why do we need a language-specific Japanese Stable Diffusion?

Because we want a model that understands our culture, identity, and unique expressions such as slang. For example, one famous piece of Japanglish is "salary man", which means a businessman, typically imagined wearing a suit. Stable Diffusion cannot correctly understand such uniquely Japanese words because Japanese is not its target language.

So, we made a language-specific version of Stable Diffusion! Compared with the original Stable Diffusion, Japanese Stable Diffusion can:

  • Generate Japanese-style images
  • Understand Japanglish
  • Understand uniquely Japanese onomatopoeia
  • Understand Japanese proper nouns

Figure: sample output for the caption "サラリーマン 油絵", meaning "salary man, oil painting"

Training

Japanese Stable Diffusion was trained starting from Stable Diffusion and has the same architecture and the same number of parameters. However, it is not simply Stable Diffusion fully fine-tuned on Japanese datasets, because Stable Diffusion was trained on an English dataset and the CLIP tokenizer is designed primarily for English. To build a Japanese-specific model based on Stable Diffusion, we used two training stages inspired by PITI.

  1. Train a Japanese-specific text encoder, using our Japanese tokenizer, from scratch while keeping the latent diffusion model fixed. This stage is expected to map Japanese captions into Stable Diffusion's latent space.
  2. Fine-tune the text encoder and the latent diffusion model jointly. This stage is expected to further improve the generation of Japanese-style images.
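The two stages above essentially differ in which components receive gradient updates. The following is a hypothetical, minimal sketch of that switch and is not rinna's training code: `StageConfig`, `STAGES` and `apply_stage` are illustrative names, and it assumes that "latent diffusion model" refers to the UNet and that both components are PyTorch modules exposing `torch.nn.Module.requires_grad_`.

```python
from dataclasses import dataclass
from typing import Protocol


class Trainable(Protocol):
    """Anything with a PyTorch-style requires_grad_ method (e.g. torch.nn.Module)."""

    def requires_grad_(self, requires_grad: bool = True) -> "Trainable": ...


@dataclass(frozen=True)
class StageConfig:
    """Which components are updated in one training stage (hypothetical)."""

    name: str
    train_text_encoder: bool
    train_unet: bool


# Hypothetical schedule mirroring the two stages described above.
STAGES = (
    # Stage 1: Japanese text encoder trained from scratch, diffusion model (UNet) fixed.
    StageConfig("stage1_text_encoder", train_text_encoder=True, train_unet=False),
    # Stage 2: text encoder and diffusion model fine-tuned jointly.
    StageConfig("stage2_joint", train_text_encoder=True, train_unet=True),
)


def apply_stage(stage: StageConfig, text_encoder: Trainable, unet: Trainable) -> list[str]:
    """Enable or disable gradients per component and return the names being trained."""
    text_encoder.requires_grad_(stage.train_text_encoder)
    unet.requires_grad_(stage.train_unet)
    return [
        name
        for name, flag in (("text_encoder", stage.train_text_encoder), ("unet", stage.train_unet))
        if flag
    ]
```

With real `torch.nn.Module` objects, one would then typically pass only the parameters that still have `requires_grad=True` to the optimizer for each stage.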

We used the following dataset for training the model:

  • Approximately 100 million images with Japanese captions, including the Japanese subset of LAION-5B.

Usage


First, install our package as follows. It is a modified version of 🤗's Diffusers library for running Japanese Stable Diffusion.

pip install git+https://github.com/rinnakk/japanese-stable-diffusion

You need to accept the model license before downloading or using the weights. To do so, visit the model card, read the license, and tick the checkbox if you agree.

You have to be a registered user of the 🤗 Hugging Face Hub, and you'll also need to use an access token for the code to work. For more information on access tokens, please refer to this section of the documentation.

huggingface-cli login

Running the pipeline with the k_lms scheduler:

import torch
from torch import autocast
from diffusers import LMSDiscreteScheduler
from japanese_stable_diffusion import JapaneseStableDiffusionPipeline

model_id = "rinna/japanese-stable-diffusion"
device = "cuda"
# Use the K-LMS scheduler here instead
scheduler = LMSDiscreteScheduler(beta_start=0.00085, beta_end=0.012, beta_schedule="scaled_linear", num_train_timesteps=1000)
pipe = JapaneseStableDiffusionPipeline.from_pretrained(model_id, scheduler=scheduler, use_auth_token=True)
pipe = pipe.to(device)

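prompt = "猫の肖像画 油絵"  # "portrait of a cat, oil painting"
with autocast("cuda"):
    # Take the first generated image (a PIL image) from the pipeline output
    image = pipe(prompt, guidance_scale=7.5)["sample"][0]

image.save("output.png")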
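The scheduler arguments in the example above (`beta_start=0.00085`, `beta_end=0.012`, `beta_schedule="scaled_linear"`, `num_train_timesteps=1000`) define Stable Diffusion's noise schedule. As a rough, dependency-free sketch of our own (not code from this package), the "scaled_linear" schedule spaces the square roots of the betas linearly and squares them, and the K-LMS scheduler converts the cumulative product of alphas into noise levels sigma = sqrt((1 - alpha_bar) / alpha_bar). Diffusers computes this with float32 tensors, so its values may differ slightly in the last digits.

```python
import math


def scaled_linear_betas(beta_start: float, beta_end: float, num_train_timesteps: int) -> list[float]:
    """Betas whose square roots are evenly spaced between sqrt(beta_start) and sqrt(beta_end)."""
    lo, hi = math.sqrt(beta_start), math.sqrt(beta_end)
    step = (hi - lo) / (num_train_timesteps - 1)
    return [(lo + i * step) ** 2 for i in range(num_train_timesteps)]


def lms_sigmas(betas: list[float]) -> list[float]:
    """Noise level sigma_t = sqrt((1 - alpha_bar_t) / alpha_bar_t) for each training timestep."""
    sigmas = []
    alpha_bar = 1.0
    for beta in betas:
        alpha_bar *= 1.0 - beta  # running cumulative product of alpha_t = 1 - beta_t
        sigmas.append(math.sqrt((1.0 - alpha_bar) / alpha_bar))
    return sigmas


# Same values as the LMSDiscreteScheduler call in the usage example.
betas = scaled_linear_betas(beta_start=0.00085, beta_end=0.012, num_train_timesteps=1000)
sigmas = lms_sigmas(betas)
print(f"sigma range: {sigmas[0]:.4f} .. {sigmas[-1]:.4f}")
```

The sampler starts from the largest sigma (almost pure noise) and steps down towards the smallest one.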

Note: JapaneseStableDiffusionPipeline is almost the same as diffusers' StableDiffusionPipeline, but adds a few lines to initialize our models properly.
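The `guidance_scale=7.5` argument controls classifier-free guidance. Assuming this pipeline behaves like diffusers' StableDiffusionPipeline, as the note says, each denoising step blends an unconditional noise prediction with a prompt-conditioned one as eps_uncond + s * (eps_text - eps_uncond). The sketch below applies that formula to plain Python lists standing in for the tensors the real pipeline uses; `classifier_free_guidance` is a hypothetical helper, not part of the package.

```python
def classifier_free_guidance(
    eps_uncond: list[float], eps_text: list[float], guidance_scale: float
) -> list[float]:
    """Blend unconditional and text-conditioned noise predictions element-wise."""
    if len(eps_uncond) != len(eps_text):
        raise ValueError("predictions must have the same shape")
    return [u + guidance_scale * (t - u) for u, t in zip(eps_uncond, eps_text)]


# guidance_scale=1.0 returns the text-conditioned prediction unchanged;
# larger values push the result further away from the unconditional prediction.
print(classifier_free_guidance([0.0, 1.0], [1.0, 1.0], guidance_scale=7.5))  # → [7.5, 1.0]
```

In diffusers' StableDiffusionPipeline, guidance is only applied when `guidance_scale` is greater than 1, and the unconditional prediction comes from an empty prompt.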

Japanese Stable Diffusion pipelines also include

Citation

@InProceedings{Rombach_2022_CVPR,
    author    = {Rombach, Robin and Blattmann, Andreas and Lorenz, Dominik and Esser, Patrick and Ommer, Bj\"orn},
    title     = {High-Resolution Image Synthesis With Latent Diffusion Models},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2022},
    pages     = {10684-10695}
}

@misc{japanese_stable_diffusion,
    author       = {Shing, Makoto and Sawada, Kei},
    title        = {Japanese Stable Diffusion},
    howpublished = {\url{https://github.com/rinnakk/japanese-stable-diffusion}},
    month        = {September},
    year         = {2022},
}

License

The CreativeML OpenRAIL M license is an Open RAIL M license, adapted from the work that BigScience and the RAIL Initiative are jointly carrying out in the area of responsible AI licensing. See also the article about the BLOOM Open RAIL license, on which our license is based.
