cswry / osediff

osediff's Introduction

One-Step Effective Diffusion Network for Real-World Image Super-Resolution

Rongyuan Wu^1,2,* Lingchen Sun^1,2,* Zhiyuan Ma^1,* Lei Zhang^1,2,†

¹The Hong Kong Polytechnic University, ²OPPO Research Institute

[paper]

🔥 News

[2024.07] Release OSEDiff-SD21base.
[2024.06] This repo is created.

🎬 Overview

🔧 Dependencies and Installation

Clone repo

git clone https://github.com/cswry/OSEDiff.git
cd OSEDiff

Install dependent packages

conda create -n OSEDiff python=3.10 -y
conda activate OSEDiff
pip install --upgrade pip
pip install -r requirements.txt

Download Models

Dependent Models

⚡ Quick Inference

python test_osediff.py \
-i preset/datasets/test_dataset/input \
-o preset/datasets/test_dataset/output \
--osediff_path preset/models/osediff.pkl \
--pretrained_model_name_or_path SD21BASE_PATH \
--ram_ft_path DAPE_PATH \
--ram_path RAM_PATH

📷 Results

Quantitative Comparisons

Visual Comparisons

📧 Contact

If you have any questions, please feel free to contact: [email protected]

🎓Citations

@article{wu2024one,
  title={One-Step Effective Diffusion Network for Real-World Image Super-Resolution},
  author={Wu, Rongyuan and Sun, Lingchen and Ma, Zhiyuan and Zhang, Lei},
  journal={arXiv preprint arXiv:2406.08177},
  year={2024}
}

osediff's Issues

When trying to adapt to SDXL, I am unable to properly add time_ids to UNet2DConditionModel

    elif self.config.addition_embed_type == "text_time":
        # SDXL - style
        if "text_embeds" not in added_cond_kwargs:
            raise ValueError(
                f"{self.__class__} has the config param `addition_embed_type` set to 'text_time' which requires the keyword argument `text_embeds` to be passed in `added_cond_kwargs`"
            )
        text_embeds = added_cond_kwargs.get("text_embeds")
        if "time_ids" not in added_cond_kwargs:
            raise ValueError(
                f"{self.__class__} has the config param `addition_embed_type` set to 'text_time' which requires the keyword argument `time_ids` to be passed in `added_cond_kwargs`"
            )
        time_ids = added_cond_kwargs.get("time_ids")
        time_embeds = self.add_time_proj(time_ids.flatten())
        time_embeds = time_embeds.reshape((text_embeds.shape[0], -1))
        add_embeds = torch.concat([text_embeds, time_embeds], dim=-1)
        add_embeds = add_embeds.to(emb.dtype)
        aug_emb = self.add_embedding(add_embeds)
        
      
        model_pred = self.unet(
            lq_latent,
            self.timesteps,
            encoder_hidden_states=prompt_embeds,
            added_cond_kwargs={
                "text_embeds": prompt_embeds,
                "time_ids": self.timesteps,
            },
        ).sample
        
        
Here self.timesteps is only a placeholder for time_ids; it is not the correct value.
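For reference, the diffusers SDXL pipelines fill `added_cond_kwargs` differently from the call above: `text_embeds` is the pooled output of the second text encoder (a 2-D tensor, not the per-token `prompt_embeds`), and `time_ids` carries six size-conditioning values per sample: original size, crop top-left, and target size. The following is a minimal sketch of building those six values; the helper name `build_sdxl_time_ids` and the 512x512 sizes are illustrative assumptions, not part of OSEDiff.

```python
def build_sdxl_time_ids(original_size, crop_top_left, target_size, batch_size=1):
    """Return one row of six SDXL micro-conditioning values per sample.

    Each argument is a (height, width) or (top, left) pair, following the
    order used by the diffusers SDXL pipelines:
    original_size + crop_top_left + target_size.
    """
    row = list(original_size) + list(crop_top_left) + list(target_size)
    if len(row) != 6:
        raise ValueError("each argument must be a pair of integers")
    # Copy the row so that every sample in the batch owns its own list.
    return [row[:] for _ in range(batch_size)]


# Hypothetical example: two 512x512 inputs with no cropping.
time_ids = build_sdxl_time_ids((512, 512), (0, 0), (512, 512), batch_size=2)
print(time_ids)
```

Before the UNet call, the nested list would be converted to a tensor on the same device and dtype as the embeddings (for example with `torch.tensor(time_ids, ...)`) and passed as `time_ids`, while the pooled text embedding goes in `text_embeds`.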

Is LoRA training necessary?

Is LoRA training necessary? What would happen if it were changed to full-parameter fine-tuning? What is your view on this?

SDXL-Turbo?

Will OSEDiff support SDXL-Turbo?

Thanks. Sorry for so many questions.

huggingface_hub.errors.HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': '/home/notebook/data/group/LowLevelLLM/LLM/bert-base-uncased'. Use `repo_type` argument if needed.

Hi,
Thank you for the wonderful work.
On Windows, this error occurs during code execution:

python test_osediff.py -i input -o output --osediff_path preset/models/osediff.pkl --pretrained_model_name_or_path preset/models/stable-diffusion-2-1-base/ --ram_ft_path preset/models/DAPE.pth --ram_path preset/models/ram_swin_large_14m.pth
Traceback (most recent call last):
  File "C:\Users\Miki\OSEDiff\test_osediff.py", line 68, in <module>
    DAPE = ram(pretrained=args.ram_path,
  File "C:\Users\Miki\OSEDiff\ram\models\ram_lora.py", line 329, in ram
    model = RAMLora(**kwargs)
  File "C:\Users\Miki\OSEDiff\ram\models\ram_lora.py", line 109, in __init__
    self.tokenizer = init_tokenizer()
  File "C:\Users\Miki\OSEDiff\ram\models\utils.py", line 132, in init_tokenizer
    tokenizer = BertTokenizer.from_pretrained('/home/notebook/data/group/LowLevelLLM/LLM/bert-base-uncased', local_files_only=True)
  File "C:\Users\Miki\anaconda3\envs\osediff\lib\site-packages\transformers\tokenization_utils_base.py", line 1770, in from_pretrained
    resolved_vocab_files[file_id] = cached_file(
  File "C:\Users\Miki\anaconda3\envs\osediff\lib\site-packages\transformers\utils\hub.py", line 409, in cached_file
    resolved_file = hf_hub_download(
  File "C:\Users\Miki\anaconda3\envs\osediff\lib\site-packages\huggingface_hub\utils\_validators.py", line 106, in _inner_fn
    validate_repo_id(arg_value)
  File "C:\Users\Miki\anaconda3\envs\osediff\lib\site-packages\huggingface_hub\utils\_validators.py", line 154, in validate_repo_id
    raise HFValidationError(
huggingface_hub.errors.HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': '/home/notebook/data/group/LowLevelLLM/LLM/bert-base-uncased'. Use `repo_type` argument if needed.

As a temporary solution, I use utils.py from your previous repo:
https://github.com/cswry/SeeSR/blob/main/ram/models/utils.py

And everything works. :)
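The traceback shows the cause: `init_tokenizer` passes a hard-coded Linux directory, and when that directory does not exist, transformers treats the string as a Hub repo id, which then fails validation. A minimal sketch of a fallback is shown below, assuming the standard `bert-base-uncased` Hub id is an acceptable substitute; the helper name `resolve_tokenizer_source` and the `BERT_PATH` environment variable are hypothetical.

```python
import os


def resolve_tokenizer_source(local_dir=None, hub_id="bert-base-uncased"):
    """Pick a tokenizer source that `from_pretrained` can accept.

    Returns `local_dir` only when it is an existing directory; otherwise
    falls back to a Hub repo id so the string is never an invalid path.
    """
    if local_dir and os.path.isdir(local_dir):
        return local_dir
    return hub_id


# Hypothetical usage inside init_tokenizer (BERT_PATH is an assumed variable):
#   source = resolve_tokenizer_source(os.environ.get("BERT_PATH"))
#   tokenizer = BertTokenizer.from_pretrained(source)
missing = "/home/notebook/data/group/LowLevelLLM/LLM/bert-base-uncased-missing"
print(resolve_tokenizer_source(missing))
```

When the fallback to the Hub id is taken, `local_files_only=True` would only work if the tokenizer is already in the local cache, so that argument would need to be dropped for a first download.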

Windows error: expected str, bytes or os.PathLike object, not NoneType

Hi, has anyone else run into this error?
include_dir += [os.path.join(os.environ.get("CUDA_PATH"), "include")]
File "C:\Users\Demo-NT\AppData\Local\anaconda3\envs\OSEDIFF\lib\ntpath.py", line 104, in join
path = os.fspath(path)
TypeError: expected str, bytes or os.PathLike object, not NoneType
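The failing line calls `os.path.join` on `os.environ.get("CUDA_PATH")`, which returns `None` when the `CUDA_PATH` environment variable is not set, so the usual remedy is to install the CUDA toolkit or set `CUDA_PATH` to its install directory. The sketch below only illustrates the failure mode with a guarded version of that lookup; the function name `cuda_include_dirs` and the `CUDA_HOME` fallback are assumptions, not code from the library in the traceback.

```python
import os


def cuda_include_dirs(env=None):
    """Return the CUDA include directory list, or [] if CUDA is not configured.

    `env` defaults to os.environ; passing a dict makes the function testable.
    """
    env = os.environ if env is None else env
    # CUDA_PATH is what the traceback reads; CUDA_HOME is an assumed fallback.
    cuda_home = env.get("CUDA_PATH") or env.get("CUDA_HOME")
    if not cuda_home:
        return []
    return [os.path.join(cuda_home, "include")]


print(cuda_include_dirs({}))  # no CUDA variables set, so no TypeError is raised
```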

Question: Is the timestep at the diffusion input fixed directly to T (the largest timestep)?

Is the timestep at the diffusion input fixed directly to T (the largest timestep)?

May I ask when the author will upload the complete code?

About the implementation of the method

Hello!
I've also emailed you the same question, but you seem to have missed it.
I've read your paper, One-Step Effective Diffusion Network for Real-World Image Super-Resolution. I found it very interesting and tried to reproduce it by following the paper.
However, I found the pseudocode provided in the appendix (Algorithm 1) a little confusing.
Based on my understanding of the paper, I think E_\phi and E_\theta in line 2 should be E_\phi' and E_\phi respectively, since E_\phi is the pretrained model and we shouldn't re-initialize it.
E_\theta and E_\theta' in line 13 should also be E_\phi and E_\phi', and E_\theta' in line 14 should be E_\phi', which is consistent with the symbols used in Eq. 7.
Is my reading correct? Thank you!
I also have another question. Is the frozen regularizer used in the VSD loss exactly the pretrained model, i.e. SD2.1? And is the trainable regularizer initialized from the pretrained model with LoRA? If so, I think the VSD loss would be almost 0 at the beginning of training?
I am not sure my understanding is correct, so please correct me if it is not.

How are metrics such as PSNR, SSIM and FID calculated?

Hi, I would like to ask whether there is calculation code for PSNR, SSIM, FID, LPIPS, DISTS, MUSIQ and the other evaluation metrics. Could you share it? Thank you.
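Of the metrics listed, PSNR is the only one with a short closed form: 10 * log10(MAX^2 / MSE). The sketch below computes it over flat sequences of pixel values purely as an illustration; it is not the authors' evaluation code, and the color space, cropping, and bit depth used for the reported numbers are not stated here. The function name `psnr` and the default `max_value=255.0` are assumptions.

```python
import math


def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    if len(reference) != len(restored) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixel values.
    mse = sum((a - b) ** 2 for a, b in zip(reference, restored)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# Every pixel is off by exactly 1, so MSE = 1.
print(psnr([10, 20, 30, 40], [11, 21, 31, 41]))
```

The learned or distribution-based metrics (LPIPS, DISTS, MUSIQ, FID) depend on pretrained networks and cannot be reproduced with a formula like this, so their values can vary with the implementation and weights used.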

About the model folder

Hello. Great job.
May I ask what has been added to autoencoder_kl.py and unet_2d_condition.py in the models folder compared to Diffusers?

Question: Able to make this temporally stable for video input?

SeeSR is amazing, and OSEDiff looks even better. Would be incredible if it could also process video frames and stay consistent between frames while improving the input image/frame.

Thank you for all the amazing work!

When will you release training codes?

I'm really curious about the effectiveness of the VSD loss, and I want to retrain this model for a demonstration. It would be very kind of you to release the training code as soon as possible. Thanks.

Question: SD3 support

Will either OSEDiff or SeeSR at some point support SD3?

Thank you.

Will the training code be released?

Thanks for your wonderful work!!!
The balance between effectiveness and efficiency in OSEDiff is very impressive!!!
I trained our network with the VSD loss, but my training code may differ from yours, and the results are terrible. Would you release the training code in the future?
Hoping for your reply!

Questions about the output

Hello, I'm sorry to bother you. When I use landscape or flower images, there is no output result.

For human images, the colors of the eyes and lips become vivid. Is there any way to adjust this?

Code and weights

Hi, could you please share the code and weights? Thx! @cswry

fp16 inference + tile option?

Is it possible to implement?
I can only upscale images up to 640x480 pixels. (PC specs: RTX 3090 24GB, 64GB RAM, Ryzen 7950X)
Thanks
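A tile option usually means splitting the input into overlapping windows, processing each window separately so that peak memory depends on the tile size rather than the image size, and blending the overlaps when stitching. The sketch below covers only the coordinate computation, under the assumption of square tiles with a fixed overlap; `tile_starts`, `tile_boxes`, and the default sizes are hypothetical and not taken from OSEDiff.

```python
def tile_starts(length, tile, overlap):
    """Start offsets of overlapping tiles that cover [0, length) along one axis."""
    if tile <= overlap:
        raise ValueError("tile must be larger than overlap")
    if length <= tile:
        return [0]  # a single tile already covers the axis
    stride = tile - overlap
    starts = list(range(0, length - tile, stride))
    starts.append(length - tile)  # last tile is aligned to the far edge
    return starts


def tile_boxes(width, height, tile=512, overlap=32):
    """Return (left, top, right, bottom) boxes in row-major order."""
    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in tile_starts(height, tile, overlap)
        for x in tile_starts(width, tile, overlap)
    ]


# A 640x480 image with 512-pixel tiles needs two horizontally overlapping tiles.
print(tile_boxes(640, 480))
```

Aligning the last tile to the far edge keeps every tile the same size, at the cost of a larger overlap on the final tile; the overlapping regions would still need to be blended (for example with a weighted average) to avoid visible seams.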

Excited but hope issues from SeeSR are addressed

Primarily this one: results looking like a painting, and color banding in the results: cswry/SeeSR#50

Output results not as good as SeeSR?

Thought the output should be better, or at least the same but faster? Tried with many images that I tested SeeSR with and all results with OSEDiff are worse. :(

Input:

SeeSR: (Using SD-Turbo)

OSEDiff:

Facial details are not very good and texture of the wall is gone. Only the gray part of the outfit is much better. Not sure why it isn't as good?

Thank you for your hard work on this, just not understanding why it isn't as good as SeeSR. It is faster however!

Question about MANIQA Metric

Great job! But when I reproduced it, I found a significant difference in the MANIQA metric. May I ask which version of MANIQA you are using? Also, could you provide the code for the metric testing?