
osediff's Introduction

One-Step Effective Diffusion Network for Real-World Image Super-Resolution

¹The Hong Kong Polytechnic University, ²OPPO Research Institute

[paper]


🔥 News

  • [2024.07] Release OSEDiff-SD21base.
  • [2024.06] This repo is created.

🎬 Overview

overview

🔧 Dependencies and Installation

  1. Clone repo

    git clone https://github.com/cswry/OSEDiff.git
    cd OSEDiff
  2. Install dependent packages

    conda create -n OSEDiff python=3.10 -y
    conda activate OSEDiff
    pip install --upgrade pip
    pip install -r requirements.txt
  3. Download Models

Dependent Models

⚡ Quick Inference

python test_osediff.py \
-i preset/datasets/test_dataset/input \
-o preset/datasets/test_dataset/output \
--osediff_path preset/models/osediff.pkl \
--pretrained_model_name_or_path SD21BASE_PATH \
--ram_ft_path DAPE_PATH \
--ram_path RAM_PATH

📷 Results

benchmark

Quantitative Comparisons (click to expand)

Visual Comparisons (click to expand)

📧 Contact

If you have any questions, please feel free to contact: [email protected]

🎓 Citations

@article{wu2024one,
  title={One-Step Effective Diffusion Network for Real-World Image Super-Resolution},
  author={Wu, Rongyuan and Sun, Lingchen and Ma, Zhiyuan and Zhang, Lei},
  journal={arXiv preprint arXiv:2406.08177},
  year={2024}
}

osediff's Issues

when trying to adapt to sdxl I am unable to properly add time_ids to UNet2DConditionModel

    elif self.config.addition_embed_type == "text_time":
        # SDXL - style
        if "text_embeds" not in added_cond_kwargs:
            raise ValueError(
                f"{self.__class__} has the config param `addition_embed_type` set to 'text_time' which requires the keyword argument `text_embeds` to be passed in `added_cond_kwargs`"
            )
        text_embeds = added_cond_kwargs.get("text_embeds")
        if "time_ids" not in added_cond_kwargs:
            raise ValueError(
                f"{self.__class__} has the config param `addition_embed_type` set to 'text_time' which requires the keyword argument `time_ids` to be passed in `added_cond_kwargs`"
            )
        time_ids = added_cond_kwargs.get("time_ids")
        time_embeds = self.add_time_proj(time_ids.flatten())
        time_embeds = time_embeds.reshape((text_embeds.shape[0], -1))
        add_embeds = torch.concat([text_embeds, time_embeds], dim=-1)
        add_embeds = add_embeds.to(emb.dtype)
        aug_emb = self.add_embedding(add_embeds)
        
      
        model_pred = self.unet(
            lq_latent,
            self.timesteps,
            encoder_hidden_states=prompt_embeds,
            added_cond_kwargs={
                "text_embeds": prompt_embeds,
                "time_ids": self.timesteps,
            },
        ).sample
        
        
Here `self.timesteps` is only a placeholder for `time_ids`; it is not the correct value.
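For context, a minimal sketch of how `time_ids` is usually assembled for SDXL-style UNets. In the diffusers SDXL pipelines, `time_ids` is not the diffusion timestep but a six-value size-conditioning row (original height and width, crop top-left, target height and width), and `text_embeds` is the pooled output of the second text encoder, a 2-D `(batch, dim)` tensor, rather than the per-token `prompt_embeds`. That matches the `reshape((text_embeds.shape[0], -1))` in the excerpt above. The helper name `build_sdxl_time_ids` and the example sizes are hypothetical, and this is not OSEDiff code.

```python
from typing import List, Tuple


def build_sdxl_time_ids(
    original_size: Tuple[int, int],
    crop_top_left: Tuple[int, int],
    target_size: Tuple[int, int],
    batch_size: int = 1,
) -> List[List[int]]:
    """Return one 6-value micro-conditioning row per batch item.

    Row layout: (orig_h, orig_w, crop_top, crop_left, target_h, target_w).
    """
    row = list(original_size) + list(crop_top_left) + list(target_size)
    return [list(row) for _ in range(batch_size)]


# Example sizes are assumptions; use the real image sizes in practice.
time_ids = build_sdxl_time_ids((512, 512), (0, 0), (512, 512), batch_size=2)

# Assumed usage: convert to a tensor on the latent's device and dtype, then
# pass it together with the pooled text embedding (not prompt_embeds):
#   added_cond_kwargs = {
#       "text_embeds": pooled_prompt_embeds,
#       "time_ids": torch.tensor(time_ids, device=..., dtype=...),
#   }
```

With an SDXL checkpoint, `encoder_hidden_states` would also need the concatenated hidden states of both text encoders, which the SD2.1 setup in this repo does not produce.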

Is LoRA training necessary?

Is LoRA training necessary? What would happen if it were replaced with full-parameter fine-tuning? What is your view on this?

SDXL-Turbo?

Will OSEDiff support SDXL-Turbo?

Thanks, and sorry for the many questions.

huggingface_hub.errors.HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': '/home/notebook/data/group/LowLevelLLM/LLM/bert-base-uncased'. Use `repo_type` argument if needed.

Hi,
Thank you for the wonderful work.
On Windows, this error occurs when running the code:

python test_osediff.py -i input -o output --osediff_path preset/models/osediff.pkl --pretrained_model_name_or_path preset/models/stable-diffusion-2-1-base/ --ram_ft_path preset/models/DAPE.pth --ram_path preset/models/ram_swin_large_14m.pth
Traceback (most recent call last):
  File "C:\Users\Miki\OSEDiff\test_osediff.py", line 68, in <module>
    DAPE = ram(pretrained=args.ram_path,
  File "C:\Users\Miki\OSEDiff\ram\models\ram_lora.py", line 329, in ram
    model = RAMLora(**kwargs)
  File "C:\Users\Miki\OSEDiff\ram\models\ram_lora.py", line 109, in __init__
    self.tokenizer = init_tokenizer()
  File "C:\Users\Miki\OSEDiff\ram\models\utils.py", line 132, in init_tokenizer
    tokenizer = BertTokenizer.from_pretrained('/home/notebook/data/group/LowLevelLLM/LLM/bert-base-uncased', local_files_only=True)
  File "C:\Users\Miki\anaconda3\envs\osediff\lib\site-packages\transformers\tokenization_utils_base.py", line 1770, in from_pretrained
    resolved_vocab_files[file_id] = cached_file(
  File "C:\Users\Miki\anaconda3\envs\osediff\lib\site-packages\transformers\utils\hub.py", line 409, in cached_file
    resolved_file = hf_hub_download(
  File "C:\Users\Miki\anaconda3\envs\osediff\lib\site-packages\huggingface_hub\utils\_validators.py", line 106, in _inner_fn
    validate_repo_id(arg_value)
  File "C:\Users\Miki\anaconda3\envs\osediff\lib\site-packages\huggingface_hub\utils\_validators.py", line 154, in validate_repo_id
    raise HFValidationError(
huggingface_hub.errors.HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': '/home/notebook/data/group/LowLevelLLM/LLM/bert-base-uncased'. Use `repo_type` argument if needed.



As a temporary workaround, I use utils.py from your previous repo:
https://github.com/cswry/SeeSR/blob/main/ram/models/utils.py

And everything works. :)
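The traceback shows `init_tokenizer` passing a hardcoded Linux path to `BertTokenizer.from_pretrained` with `local_files_only=True`, which fails on any machine where that directory does not exist. A minimal sketch of one possible guard, assuming the public `bert-base-uncased` Hub id is an acceptable fallback; the helper `resolve_tokenizer_source` is hypothetical and not part of the repo.

```python
import os
from typing import Tuple


def resolve_tokenizer_source(
    local_dir: str, hub_id: str = "bert-base-uncased"
) -> Tuple[str, bool]:
    """Choose where to load the tokenizer from.

    Returns (name_or_path, local_files_only): the local directory if it
    exists on this machine, otherwise the Hub id with downloads allowed.
    """
    if local_dir and os.path.isdir(local_dir):
        return local_dir, True
    return hub_id, False


source, local_only = resolve_tokenizer_source(
    "/home/notebook/data/group/LowLevelLLM/LLM/bert-base-uncased"
)
# Assumed usage inside init_tokenizer():
#   tokenizer = BertTokenizer.from_pretrained(source, local_files_only=local_only)
```

The fallback needs network access (or a populated Hugging Face cache) the first time it runs.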

Windows error: expected str, bytes or os.PathLike object, not NoneType

Hi, has anyone else run into this error?
include_dir += [os.path.join(os.environ.get("CUDA_PATH"), "include")]
File "C:\Users\Demo-NT\AppData\Local\anaconda3\envs\OSEDIFF\lib\ntpath.py", line 104, in join
path = os.fspath(path)
TypeError: expected str, bytes or os.PathLike object, not NoneType
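The `TypeError` indicates that `os.environ.get("CUDA_PATH")` returned `None`, meaning the `CUDA_PATH` environment variable is not set on that machine, so `os.path.join` received `None`. The traceback does not show which package the failing line belongs to, so the following is only a sketch of a defensive version of that lookup, with a hypothetical helper name.

```python
import os
from typing import List, Mapping, Optional


def cuda_include_dirs(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the CUDA include directory as a list, or [] if CUDA_PATH is unset."""
    env = os.environ if env is None else env
    cuda_path = env.get("CUDA_PATH")
    if not cuda_path:
        # No CUDA toolkit advertised through the environment; skip the path
        # instead of passing None to os.path.join.
        return []
    return [os.path.join(cuda_path, "include")]


include_dir: List[str] = []
include_dir += cuda_include_dirs()
```

Setting `CUDA_PATH` to the CUDA toolkit install directory before launching Python would likely avoid the error without any code change.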

About the implementation of the method

Hello!
I also emailed you this question, but you seem to have missed it.
I read your paper, One-Step Effective Diffusion Network for Real-World Image Super-Resolution, found it very interesting, and tried to reproduce it from the paper.
However, I found the pseudocode in the appendix (Algorithm 1) a little confusing.
Based on my understanding of the paper, E_\phi and E_\theta in line 2 should be E_\phi' and E_\phi respectively, since E_\phi is the pretrained model and should not be re-initialized.
Likewise, E_\theta and E_\theta' in line 13 should be E_\phi and E_\phi', and E_\theta' in line 14 should be E_\phi', which would be consistent with the symbols used in Eq. 7.
Am I right about this? Thank you!
I also have another question. Is the frozen regularizer used in the VSD loss exactly the pretrained model, i.e. SD2.1? And is the trainable regularizer initialized from the pretrained model with LoRA? If so, I would expect the VSD loss to be almost 0 at the beginning of training.
I am not sure my understanding is correct, so please correct me if it is not.

When will you release the training code?

I'm really curious about the effectiveness of the VSD loss, and I want to retrain this model to demonstrate it. It would be very kind of you to release the training code as soon as possible. Thanks.

Will the training code be released?

Thanks for your wonderful work!!!
The balance between effectiveness and efficiency in OSEDiff is very striking!!!
I trained our network with the VSD loss, but my training code may differ from yours, and the results are terrible. Will you release the training code in the future?
Looking forward to your reply!

questions about the output.

Hello, I'm sorry to bother you. When I use landscape or flower images, there is no output result.

For human images, the colors of the eyes and lips become vivid. Is there any way to adjust this?
Capture
Capture2

fp16 inference + tile option?

Would it be possible to implement these?
I can only upscale images up to 640x480 pixels (PC specs: RTX 3090 24GB, 64GB RAM, Ryzen 7950X).
Thanks
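To make the tile request concrete, here is a rough sketch of how overlapping tile boxes could be computed so that each tile fits in memory on its own. It is not taken from the repo: the function names, the 512-pixel tile size and the 64-pixel overlap are all assumptions, and blending the overlapping regions back together (as well as fp16 casting) is left out.

```python
from typing import List, Tuple


def tile_starts(length: int, tile: int, overlap: int) -> List[int]:
    """Start offsets of tiles covering [0, length) with the given overlap."""
    if tile <= overlap:
        raise ValueError("tile must be larger than overlap")
    if length <= tile:
        return [0]
    stride = tile - overlap
    starts = list(range(0, length - tile, stride))
    starts.append(length - tile)  # last tile sits flush with the far edge
    return starts


def tile_boxes(
    width: int, height: int, tile: int = 512, overlap: int = 64
) -> List[Tuple[int, int, int, int]]:
    """Return (left, top, right, bottom) boxes in row-major order."""
    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in tile_starts(height, tile, overlap)
        for x in tile_starts(width, tile, overlap)
    ]


boxes = tile_boxes(640, 480)
```

Each box would be cropped from the input, run through the model separately, and pasted into the output canvas at the scaled coordinates.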

Output results not as good as SeeSR?

I thought the output would be better, or at least the same but faster. I tried many of the images I had tested SeeSR with, and all the OSEDiff results are worse. :(

Input:
image

SeeSR: (Using SD-Turbo)
image

OSEDiff:
Star Trek Ds9 - 6x13 - Far Beyond The Stars x264 ac3 03
Facial details are not very good and the texture of the wall is gone. Only the gray part of the outfit is much better. I'm not sure why the result isn't as good.

Thank you for your hard work on this. I just don't understand why it isn't as good as SeeSR. It is faster, however!

Question about MANIQA Metric

Great job! But when I reproduced it, I found a significant difference in the MANIQA metric. May I ask which version of MANIQA you are using? Also, could you provide the code used for metric evaluation?
