siliconflow / onediff

OneDiff: An out-of-the-box acceleration library for diffusion models.

Home Page: https://github.com/siliconflow/onediff/wiki

Python 97.59% Dockerfile 0.03% Shell 2.12% Jupyter Notebook 0.26%
comfyui diffusers pytorch sdxl stable-diffusion aigc-serving comfyui-workflow cuda inference-engine lcm

onediff's People

Contributors

ccssu, chengzeyi, clackhan, daquexian, doombeaker, eltociear, flowingsun007, fpzh2011, hank0626, haoyang9804, hjchen2, isidentical, jackalcooper, jackwensen, kopyl, levi131, lijunliangtg, lixiang007666, marigoold, onaugust1, strint


onediff's Issues

Do you support acceleration on the P100?

Describe the bug

This is my performance with oneflow diffusers on a P100 (screenshot omitted): 1.68 it/s.

But with the official stable-diffusion, I get 2.32 it/s with the PLMS sampler (screenshot omitted).

Is the P100 card not supported yet?

Thank you so much!

Reproduction

No response

Logs

No response

System Info

  • diffusers version: 0.4.0.dev0
  • Platform: Linux-3.10.0-514.26.2.el7.x86_64-x86_64-with-glibc2.10
  • Python version: 3.8.5
  • PyTorch version (GPU?): 1.13.0 (True)
  • Huggingface_hub version: 0.10.1
  • Transformers version: 4.23.0.dev0
  • Using GPU in script?:
  • Using distributed or parallel set-up in script?:

Dreambooth training with oneflow to improve training time

Hi,

I tried to run inference on a dreambooth model with oneflow. Oneflow cuts the inference time in half compared to normal inference, which is amazing. Then I tried to train a dreambooth model with Oneflow to reduce the training time, but my training gets stuck here:

accelerator = Accelerator(
    gradient_accumulation_steps=args.gradient_accumulation_steps,
    mixed_precision=args.mixed_precision,
    log_with=args.report_to,
    logging_dir=logging_dir,
)

and then it times out after 30 minutes.

I used

import oneflow as torch

Kindly guide me on how to train it with oneflow.
thanks

The unet recompiles every time img2img is used

Describe the bug

Every time img2img is used, the unet recompiles, which makes it impossible to put into real use.
My guess is that this is because the size and aspect ratio of the initial image passed in by the user differ on each call.
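If that guess is right, one workaround is to normalize every init image to a single resolution so the compiled graph is reused. A minimal sketch, assuming the recompilation is indeed shape-triggered (the fixed size and model path are placeholders):

import oneflow as torch
from PIL import Image
from diffusers import OneFlowStableDiffusionImg2ImgPipeline

FIXED_SIZE = (512, 512)  # pick one resolution and stick to it

pipe = OneFlowStableDiffusionImg2ImgPipeline.from_pretrained(
    "./stable-diffusion-v1-5", revision="fp16", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")

def run(prompt, init_image):
    # Resizing every init image to one fixed shape keeps the input tensor
    # meta identical across calls, so the compiled unet graph can be reused.
    init_image = init_image.resize(FIXED_SIZE)
    return pipe(prompt=prompt, image=init_image, strength=0.75).images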

Reproduction

No response

Logs

No response

System Info

Oneflow + cuda11.6

oneflow._softmax_backward_data is not implemented

Description

Traceback (most recent call last):
File "/home/user/Software/Anaconda/envs/test/lib/python3.8/site-packages/transformers/utils/import_utils.py", line 1050, in _get_module
return importlib.import_module("." + module_name, self.__name__)
File "/home/user/Software/Anaconda/envs/test/lib/python3.8/importlib/__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
File "<frozen importlib._bootstrap>", line 991, in _find_and_load
File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 783, in exec_module
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "/home/user/Software/Anaconda/envs/test/lib/python3.8/site-packages/transformers/models/clip/modeling_clip.py", line 27, in <module>
from ...modeling_utils import PreTrainedModel
File "/home/user/Software/Anaconda/envs/test/lib/python3.8/site-packages/transformers/modeling_utils.py", line 41, in <module>
from .generation_utils import GenerationMixin
File "/home/user/Software/Anaconda/envs/test/lib/python3.8/site-packages/transformers/generation_utils.py", line 61, in <module>
from .pytorch_utils import torch_int_div
File "/home/user/Software/Anaconda/envs/test/lib/python3.8/site-packages/transformers/pytorch_utils.py", line 19, in <module>
from torch import _softmax_backward_data, nn
File "<frozen importlib._bootstrap>", line 1039, in _handle_fromlist
File "/home/user/Software/Anaconda/envs/test/lib/python3.8/site-packages/oneflow/mock_torch/__init__.py", line 42, in __getattr__
raise NotImplementedError(self.module.__name__ + "." + name + error_msg)
NotImplementedError: oneflow._softmax_backward_data is not implemented, please submit an issue at
'https://github.com/Oneflow-Inc/oneflow/issues' including the log information of the error, the
minimum reproduction code, and the system information.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/home/user/Desktop/yao/projects/Text2img/AIdraw/en/utils/scheduling_ddim_oneflow.py", line 9, in <module>
from diffusers.configuration_utils import ConfigMixin, register_to_config
File "/home/user/Software/Anaconda/envs/test/lib/python3.8/site-packages/diffusers/__init__.py", line 22, in <module>
from transformers import CLIPTextModel, CLIPFeatureExtractor
File "<frozen importlib._bootstrap>", line 1039, in _handle_fromlist
File "/home/user/Software/Anaconda/envs/test/lib/python3.8/site-packages/transformers/utils/import_utils.py", line 1041, in __getattr__
value = getattr(module, name)
File "/home/user/Software/Anaconda/envs/test/lib/python3.8/site-packages/transformers/utils/import_utils.py", line 1040, in __getattr__
module = self._get_module(self._class_to_module[name])
File "/home/user/Software/Anaconda/envs/test/lib/python3.8/site-packages/transformers/utils/import_utils.py", line 1052, in _get_module
raise RuntimeError(
RuntimeError: Failed to import transformers.models.clip.modeling_clip because of the following error (look up to see its traceback):
oneflow._softmax_backward_data is not implemented, please submit an issue at
'https://github.com/Oneflow-Inc/oneflow/issues' including the log information of the error, the
minimum reproduction code, and the system information.

test_pipelines_oneflow_graph_load out of host memory error in WSL

The running environment is WSL2 Ubuntu 20.04; neither the host nor WSL2 is running any other CUDA programs.

ubuntu@DESKTOP-531RKJN:~$ python3 diffusers/tests/test_pipelines_oneflow_graph_load.py
libibverbs not available, ibv_fork_init skipped

==> Try to run graph save...
==> get_pipe  try to run
get_pipe  cuda mem before  1301.5
Fetching 12 files: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 12/12 [00:00<00:00, 56488.94it/s]
get_pipe  run time  15.074813842773438
get_pipe  cuda mem after  1301.5
get_pipe  cuda mem diff  0.0
<== get_pipe  finish run

==> pipe_to_cuda  try to run
pipe_to_cuda  cuda mem before  1301.5
pipe_to_cuda  run time  1.1066811084747314
pipe_to_cuda  cuda mem after  4061.5
pipe_to_cuda  cuda mem diff  2760.0
<== pipe_to_cuda  finish run

==> config_graph  try to run
config_graph  cuda mem before  4061.5
config_graph  run time  1.5735626220703125e-05
config_graph  cuda mem after  4061.5
config_graph  cuda mem diff  0.0
<== config_graph  finish run

sd init time  16.18261170387268 s.
==> text_to_image  try to run
text_to_image  cuda mem before  4061.5
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 50/50 [00:09<00:00,  5.53it/s]
W20230210 00:32:48.454388  8336 cudnn_conv_util.cpp:102] Currently available alogrithm (algo=1, require memory=7472256, idx=1) meeting requirments (max_workspace_size=1073741824, determinism=0) is not fastest. Fastest algorithm (1) requires memory 1074922512
text_to_image  run time  9.699114561080933
text_to_image  cuda mem after  8125.5
text_to_image  cuda mem diff  4064.0
<== text_to_image  finish run

==> text_to_image  try to run
text_to_image  cuda mem before  8125.5
/home/ubuntu/.local/lib/python3.8/site-packages/oneflow/nn/modules/module.py:152: UserWarning: Interpolate() is called in a nn.Graph, but not registered into a nn.Graph.
  warnings.warn(
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 50/50 [00:15<00:00,  3.17it/s]
text_to_image  run time  23.669822216033936
text_to_image  cuda mem after  9561.5
text_to_image  cuda mem diff  1436.0
<== text_to_image  finish run

====> diff  0.0023254268
st init and run time  49.55777668952942 s.
==> save_pipe_sch  try to run
save_pipe_sch  cuda mem before  9561.5
terminate called after throwing an instance of 'oneflow::RuntimeException'
  what():  Error: out of memory
Error message from /home/ci-user/runners/release/_work/oneflow/oneflow/oneflow/core/vm/op_call_instruction_policy.cpp:209
        OpCallInstructionUtil::Compute(this, instruction): copy:OpCall:s_d2h

  File "/home/ci-user/runners/release/_work/oneflow/oneflow/oneflow/core/vm/op_call_instruction_policy.cpp", line 209, in Compute
    OpCallInstructionUtil::Compute(this, instruction)
  File "/home/ci-user/runners/release/_work/oneflow/oneflow/oneflow/core/vm/op_call_instruction_policy.cpp", line 41, in Compute
    AllocateOutputBlobsMemory(op_call_instruction_policy, allocator, instruction)
  File "/home/ci-user/runners/release/_work/oneflow/oneflow/oneflow/core/vm/op_call_instruction_policy.cpp", line 89, in AllocateOutputBlobsMemory
    blob_object->TryAllocateBlobBodyMemory(allocator)
  File "/home/ci-user/runners/release/_work/oneflow/oneflow/oneflow/core/eager/eager_blob_object.cpp", line 100, in TryAllocateBlobBodyMemory
    allocator->Allocate(&dptr, required_body_bytes)
  File "/home/ci-user/runners/release/_work/oneflow/oneflow/oneflow/core/vm/bin_allocator.h", line 392, in Allocate
    AllocateBlockToExtendTotalMem(aligned_size)
  File "/home/ci-user/runners/release/_work/oneflow/oneflow/oneflow/core/vm/bin_allocator.h", line 305, in AllocateBlockToExtendTotalMem
    backend_->Allocate(&mem_ptr, final_allocate_bytes)
  File "/home/ci-user/runners/release/_work/oneflow/oneflow/oneflow/core/vm/ep_backend_host_allocator.cpp", line 25, in Allocate
    ep_device_->AllocPinned(allocation_options_, reinterpret_cast<void**>(mem_ptr), size)
Error Type: oneflow.ErrorProto.runtime_error
  File "/home/ci-user/runners/release/_work/oneflow/oneflow/oneflow/core/vm/op_call_instruction_policy.cpp", line 209, in operator()
    
Error Type: oneflow.ErrorProto.runtime_error
You can set ONEFLOW_DEBUG or ONEFLOW_PYTHON_STACK_GETTER to 1 to get the Python stack of the error.
Aborted
ubuntu@DESKTOP-531RKJN:~$ 

Originally posted by @MirrorCY in https://github.com/Oneflow-Inc/diffusers/issues/75#issuecomment-1424482749

not working with xformers?

Describe the bug

All kinds of problems, depending on which version of xformers is installed. I tried the master dev branch, v0.0.13, and v0.0.12; all raised exceptions. I believe the main problem is that the inputs fail dtype verification: they carry oneflow.float32 where torch.float32 is expected. Which version of xformers is the current dev version of oneflow tested against?
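For illustration, a minimal sketch of the suspected mismatch (my assumption about the root cause; requires both torch and oneflow installed):

import torch
import oneflow

x = oneflow.ones(2, 2)
print(x.dtype)                   # oneflow.float32
print(x.dtype == torch.float32)  # False: oneflow and torch dtypes are unrelated objects,
                                 # so any dtype check against torch.float32 fails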

Reproduction

No response

Logs

No response

System Info

python3.10, cu117, the latest oneflow version.

TypeError: Cannot interpret 'oneflow.float64' as a data type

Describe the bug

When running the fp16 version described in https://github.com/Oneflow-Inc/diffusers/wiki/How-to-Run-OneFlow-Stable-Diffusion, I hit this error.
The fp32 version, however, works well.

Reproduction

import oneflow as torch
from diffusers import OneFlowStableDiffusionPipeline
import timeit

pipe = OneFlowStableDiffusionPipeline.from_pretrained(
    "./stable-diffusion-v1-4",
    revision="fp16",
    torch_dtype=torch.float16,
)

pipe = pipe.to("cuda")

start = timeit.default_timer()
prompt = "a photo of an astronaut riding a horse on mars"
with torch.autocast("cuda"):
    images = pipe(prompt).images
    for i, image in enumerate(images):
        image.save(f"{prompt}-of-{i}.png")

end = timeit.default_timer()
print('Running time: %s Seconds' % (end - start))

Logs

loaded library: /usr/lib/x86_64-linux-gnu/libibverbs.so.1
[oneflow] [vae] diffusers.OneFlowAutoencoderKL
[diffusers] [tokenizer] transformers.CLIPTokenizer
[oneflow] [unet] diffusers.OneFlowUNet2DConditionModel
[oneflow] [safety_checker] stable_diffusion.OneFlowStableDiffusionSafetyChecker
[oneflow] [scheduler] diffusers.OneFlowPNDMScheduler
[diffusers] [feature_extractor] transformers.CLIPFeatureExtractor
[oneflow] [text_encoder] transformers.OneFlowCLIPTextModel
[oneflow] compiling unet beforehand to make sure the progress bar is more accurate
[oneflow] [elapsed(s)] [unet compilation] 25.13586367602693
 98%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████▋  | 50/51 [00:02<00:00, 22.99it/s]
Traceback (most recent call last):
  File "demo.py", line 17, in <module>
    images = pipe(prompt).images
  File "/opt/conda/lib/python3.8/site-packages/oneflow/autograd/autograd_mode.py", line 154, in wrapper
    return func(*args, **kwargs)
  File "/code/diffusion/diffusers/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_oneflow.py", line 321, in __call__
    latents = self.scheduler.step(noise_pred, t, latents, **extra_step_kwargs).prev_sample
  File "/code/diffusion/diffusers/src/diffusers/schedulers/scheduling_pndm_oneflow.py", line 223, in step
    return self.step_plms(model_output=model_output, timestep=timestep, sample=sample, return_dict=return_dict)
  File "/code/diffusion/diffusers/src/diffusers/schedulers/scheduling_pndm_oneflow.py", line 338, in step_plms
    prev_sample = self._get_prev_sample(sample, timestep, prev_timestep, model_output)
  File "/code/diffusion/diffusers/src/diffusers/schedulers/scheduling_pndm_oneflow.py", line 362, in _get_prev_sample
    if (alpha_prod_t_prev.dtype == torch.float64):
TypeError: Cannot interpret 'oneflow.float64' as a data type
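The failing line compares the dtype of alpha_prod_t_prev (apparently a numpy value here) against an oneflow dtype, and numpy cannot interpret the latter. A short sketch that reproduces the same message (an assumption about the root cause, not confirmed in the thread):

import numpy as np
import oneflow

# numpy tries to coerce the argument to a numpy dtype and fails:
np.dtype(oneflow.float64)  # TypeError: Cannot interpret 'oneflow.float64' as a data type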

System Info

  • diffusers version: 0.4.0.dev0
  • Platform: Linux-4.18.0-2.4.3.x86_64-x86_64-with-glibc2.10
  • Python version: 3.8.8
  • PyTorch version (GPU?): 1.10.0+cu111 (True)
  • Huggingface_hub version: 0.10.1
  • Transformers version: 4.23.0.dev0
  • Using GPU in script?: True
  • Using distributed or parallel set-up in script?:

Running img2img: ValueError: torch.float16 needs to be of type `torch.dtype`, e.g. `torch.float16`, but is <class 'torch.dtype'>.

Describe the bug

When running img2img, loading the model raises an error.

Reproduction

import requests
import torch
from PIL import Image
from io import BytesIO

#from diffusers import StableDiffusionImg2ImgPipeline
from diffusers import OneFlowStableDiffusionImg2ImgPipeline as StableDiffusionImg2ImgPipeline

# load the pipeline

device = "cuda"
model_id_or_path = "stabilityai/stable-diffusion-2-1"
pipe = StableDiffusionImg2ImgPipeline.from_pretrained(model_id_or_path, torch_dtype=torch.float16)

# or download via git clone https://huggingface.co/runwayml/stable-diffusion-v1-5
# and pass model_id_or_path="./stable-diffusion-v1-5".

generator = torch.Generator("cuda").manual_seed(42)
pipe = pipe.to(device)

# let's download an initial image

url = "https://raw.githubusercontent.com/CompVis/stable-diffusion/main/assets/stable-samples/img2img/sketch-mountains-input.jpg"

response = requests.get(url)
init_image = Image.open(BytesIO(response.content)).convert("RGB")
init_image = init_image.resize((768, 512))

prompt = "A fantasy landscape, trending on artstation"

images = pipe(prompt=prompt, generator=generator, image=init_image, strength=0.75, guidance_scale=7.5).images
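A likely cause, judging from the error message (my assumption, not confirmed in the thread): the script imports stock PyTorch, so torch_dtype arrives as a torch.dtype while the OneFlow pipeline expects oneflow's own float16. A minimal sketch of the adjusted loading code:

import oneflow as torch  # oneflow's torch-compatible namespace instead of stock PyTorch
from diffusers import OneFlowStableDiffusionImg2ImgPipeline as StableDiffusionImg2ImgPipeline

# torch.float16 now resolves to oneflow.float16, which the OneFlow pipeline accepts
pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")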

Logs

Fetching 16 files: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 16/16 [00:00<00:00, 25940.81it/s]
The config attributes {'upcast_attention': True} were passed to OneFlowUNet2DConditionModel, but are not expected and will be ignored. Please verify your config.json configuration file.
Traceback (most recent call last):
  File "sd-i.py", line 13, in <module>
    pipe = StableDiffusionImg2ImgPipeline.from_pretrained(model_id_or_path, torch_dtype=torch.float16)
  File "/root/zsf/oneflow/diffusers/src/diffusers/pipeline_oneflow_utils.py", line 706, in from_pretrained
    loaded_sub_model = load_method(os.path.join(cached_folder, name), **loading_kwargs)
  File "/root/zsf/oneflow/diffusers/src/diffusers/modeling_oneflow_utils.py", line 518, in from_pretrained
    raise ValueError(
ValueError: torch.float16 needs to be of type `torch.dtype`, e.g. `torch.float16`, but is <class 'torch.dtype'>.

System Info

.......

Inconsistent with diffusers

Using the same seed and code, the generated results still do not match those from huggingface diffusers, and they look relatively fuzzy.

Produced a 3-legged astronaut image with the given docker script

Describe the bug

I copy-and-pasted the docker script from wiki and tested on A100.
The inference speed was as fast as you claimed, but the result does not look good.
The prompt was the default one, "a photo of an astronaut riding a horse on mars".
As shown in this image, there are two astronauts in the image. One of them has 3 legs while the other is riding a twisted motorcycle.
Do you have any idea why it's not producing a good image?
(result image omitted: a photo of an astronaut riding a horse on mars-0)

Reproduction

No response

Logs

No response

System Info

I used your docker image

“ImportError: cannot import name 'OneFlowStableDiffusionPipeline' from 'diffusers'” ”TypeError: invalid dtype object: only floating-point types are supported as the default type”

After the diffusers repo was updated, almost none of the original tests examples can run.

Following the new examples https://github.com/Oneflow-Inc/diffusers/blob/main/examples/text_to_image_sd2.py and
https://github.com/Oneflow-Inc/diffusers/blob/oneflow-fork/tests/test_pipelines_oneflow_graph_load.py, I changed the import style from

from diffusers import (
    OneFlowStableDiffusionPipeline as StableDiffusionPipeline,
    OneFlowEulerDiscreteScheduler as EulerDiscreteScheduler,
)
from diffusers import utils

to

from onediff import OneFlowStableDiffusionPipeline as StableDiffusionPipeline
from diffusers import EulerDiscreteScheduler

from diffusers import utils

After that, execution reports an error:

==> Try to run graph save...
==> function  get_pipe  try to run...
get_pipe  cuda mem before  2854.75  MB
get_pipe  host mem before  1729.0  MB
Fetching 12 files: 100%|████████████████████████████████████████████████████████████| 12/12 [00:00<00:00, 49490.31it/s]
<frozen importlib._bootstrap>:283: DeprecationWarning: the load_module() method is deprecated and slated for removal in Python 3.12; use exec_module() instead
E
======================================================================
ERROR: test_sd_graph_save_and_load (__main__.OneFlowPipeLineGraphSaveLoadTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/zhaodi/work/test.py", line 171, in test_sd_graph_save_and_load
    _test_sd_graph_save_and_load(True, f0 ,f1, f2)
  File "/home/zhaodi/work/test.py", line 76, in _test_sd_graph_save_and_load
    sch, pipe = get_pipe()
  File "/home/zhaodi/work/test.py", line 28, in new_fn
    out = fn(*args, **kwargs)
  File "/home/zhaodi/work/test.py", line 72, in get_pipe
    sd_pipe = StableDiffusionPipeline.from_pretrained(
  File "/home/zhaodi/miniconda3/lib/python3.10/site-packages/diffusers/pipelines/pipeline_utils.py", line 739, in from_pretrained
    loaded_sub_model = load_method(os.path.join(cached_folder, name), **loading_kwargs)
  File "/home/zhaodi/miniconda3/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2325, in from_pretrained
    dtype_orig = cls._set_default_torch_dtype(torch_dtype)
  File "/home/zhaodi/miniconda3/lib/python3.10/site-packages/transformers/modeling_utils.py", line 1109, in _set_default_torch_dtype
    torch.set_default_dtype(dtype)
  File "/home/zhaodi/miniconda3/lib/python3.10/site-packages/torch/__init__.py", line 395, in set_default_dtype
    _C._set_default_dtype(d)
TypeError: invalid dtype object: only floating-point types are supported as the default type

----------------------------------------------------------------------
Ran 1 test in 8.609s

Linux
diffusers 0.12.1
onediff 0.1.0 /home/zhaodi/onediff/src

No module named 'oneflow.utils.checkpoint'

Describe the bug

oneflow 0.8.0
diffusers 0.4.0.dev0
When I ran OneFlowStableDiffusionPipeline locally, I hit this problem:

Traceback (most recent call last):
File "test_of.py", line 2, in <module>
from diffusers import OneFlowStableDiffusionPipeline
File "/mnt/yinlong/project/disco/one-flow/diffusers/src/diffusers/__init__.py", line 21, in <module>
from .models import AutoencoderKL, UNet2DConditionModel, UNet2DModel, VQModel
File "/mnt/yinlong/project/disco/one-flow/diffusers/src/diffusers/models/__init__.py", line 28, in <module>
from .unet_2d_condition_oneflow import OneFlowUNet2DConditionModel
File "/mnt/yinlong/project/disco/one-flow/diffusers/src/diffusers/models/unet_2d_condition_oneflow.py", line 6, in <module>
import oneflow.utils.checkpoint
ModuleNotFoundError: No module named 'oneflow.utils.checkpoint'

What is the problem? Is this a bug?

Reproduction

No response

Logs

No response

System Info

The command failed too, with the same problem as above.

Dynamic Resolution Compilation

Hello, is it possible to compile the model for dynamic resolution generation rather than static, similar to TensorRT's dynamic shapes?
I see in the code that the compilation call is made either when the model hasn't been compiled yet OR when the request is for a different resolution than the already compiled one.

Running img2img reports: Exception msg InferDataType Failed. Expected kFloat, but got kFloat16

Describe the bug

OneFlowStableDiffusionImg2ImgPipeline does not seem to be adapted yet; the data types do not match.

Reproduction

import oneflow as torch
from PIL import Image  # import added; the original report omitted it
from diffusers import OneFlowStableDiffusionPipeline as StableDiffusionPipeline, OneFlowDPMSolverMultistepScheduler as DPMSolverMultistepScheduler, OneFlowStableDiffusionImg2ImgPipeline as StableDiffusionImg2ImgPipeline

model_id = "./stable-diffusion-2-model"
scheduler = DPMSolverMultistepScheduler.from_pretrained(model_id, subfolder="scheduler")
img2img = StableDiffusionImg2ImgPipeline.from_pretrained(model_id, scheduler=scheduler, torch_dtype=torch.float16)
img2img = img2img.to("cuda")

image = Image.open("init.jpg")  # any RGB init image; not given in the original report

r = img2img(prompt="a cat", image=image, num_inference_steps=25)

Logs

[ERROR](GRAPH:UNetGraph_0:UNetGraph) building graph got error.
ERROR:root:Internal Error: Exception msg InferDataType Failed. Expected kFloat, but got kFloat16

System Info

...

AttributeError: 'OneFlowStableDiffusionImg2ImgPipeline' object has no attribute 'graph_compile_cache'

Describe the bug

I compiled a graph with OneFlowStableDiffusionPipeline following https://github.com/Oneflow-Inc/diffusers/blob/oneflow-fork/tests/test_pipelines_oneflow_graph_load.py, but using it with OneFlowStableDiffusionImg2ImgPipeline raises an error.

Reproduction

import os  # imports below added for completeness; the original report only showed the pipeline import
import numpy as np
from PIL import Image
import oneflow as torch
from diffusers import OneFlowStableDiffusionImg2ImgPipeline
from diffusers import DDIMScheduler  # scheduler import assumed; not shown in the original report

scheduler = DDIMScheduler(beta_start=0.00085, beta_end=0.012,
                            beta_schedule="scaled_linear", clip_sample=False, set_alpha_to_one=False,
                            steps_offset=1)
sd_pipe = OneFlowStableDiffusionImg2ImgPipeline.from_pretrained(
      "3_pipe_file_path", scheduler=scheduler, revision="fp16", torch_dtype=torch.float16
  )
sd_pipe.to("cuda:0")

sd_pipe.set_graph_compile_cache_size(5)
sd_pipe.enable_graph_share_mem()

sd_pipe.load_graph("1_graph_save_path", compile_unet=True, compile_vae=False)

prompt = "Pale green clouds,a castle with a garden full of flowers is above the clouds ,light effect,by Makoto Shinkai and Claude Monet,trending on behance,8K"

image = 'data/init_images/1.jpg'
img = sd_pipe(
      prompt,
      image=image,
      strength=0.8,
      height=512,
      width=512,
      num_inference_steps=50,
      guidance_scale=10,
      compile_unet=True,
      compile_vae=False,
      num_images_per_prompt=1,
      eta=0.,
      generator=None,
      output_type="np",
  ).images

img_out = os.path.join('data/outputs/img2img/test', "%s_%s.%s" % (1, 1, 'jpg'))
Image.fromarray(img.astype(np.uint8)).save(img_out)

Logs

No response

System Info

Linux, diffusers=0.10.0.dev, oneflow=0.9.1

num_images_per_prompt cannot be changed at runtime

Describe the bug

An error is raised when num_images_per_prompt is changed at runtime.

Reproduction

The code below is OK:

import oneflow as torch
from diffusers import (
    OneFlowStableDiffusionPipeline as DiffusionPipeline,
    OneFlowDPMSolverMultistepScheduler as DPMSolverMultistepScheduler,
)
model_id = "stabilityai/stable-diffusion-2"

# Use the Euler scheduler here instead
scheduler = DPMSolverMultistepScheduler.from_pretrained(model_id, subfolder="scheduler")
pipe = DiffusionPipeline.from_pretrained(model_id, scheduler=scheduler, revision="fp16", torch_dtype=torch.float16)
pipe = pipe.to("cuda")

prompt = "a photo of an astronaut riding a horse on mars"
images = pipe(prompt, height=768, width=768, num_images_per_prompt=1).images
print(len(images))

images = pipe(prompt, height=768, width=768, num_images_per_prompt=1).images # **same** num_images_per_prompt
print(len(images))

The code below, in which num_images_per_prompt changes at runtime, raises an error:

import oneflow as torch
from diffusers import (
    OneFlowStableDiffusionPipeline as DiffusionPipeline,
    OneFlowDPMSolverMultistepScheduler as DPMSolverMultistepScheduler,
)
model_id = "stabilityai/stable-diffusion-2"

# Use the Euler scheduler here instead
scheduler = DPMSolverMultistepScheduler.from_pretrained(model_id, subfolder="scheduler")
pipe = DiffusionPipeline.from_pretrained(model_id, scheduler=scheduler, revision="fp16", torch_dtype=torch.float16)
pipe = pipe.to("cuda")

prompt = "a photo of an astronaut riding a horse on mars"
images = pipe(prompt, height=768, width=768, num_images_per_prompt=1).images
print(len(images))

images = pipe(prompt, height=768, width=768, num_images_per_prompt=2).images # **NOT same** num_images_per_prompt, will raise an error
print(len(images))

Error says:

RuntimeError: nn.Graph ONLY accepts static inputs tensor meta, please check whether your input tensor meta each step is the same as the input of first call graph.
The excepted tensor meta is: shape=(2,4,96,96), dtype=oneflow.float16, device=cuda:0, but the actual tensor meta is: shape=(4,4,96,96), dtype=oneflow.float16, device=cuda:0
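A workaround consistent with the error message (my sketch, not from the thread), continuing from the snippet above with pipe and prompt already defined: keep the tensor meta fixed by holding num_images_per_prompt constant and looping for extra images.

images = []
for _ in range(2):
    # one image per call keeps the compiled graph's static input shape unchanged
    images += pipe(prompt, height=768, width=768, num_images_per_prompt=1).images
print(len(images))  # 2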

Logs

No response

System Info

python -m oneflow --doctor
loaded library: /usr/lib/x86_64-linux-gnu/libibverbs.so.1
path: ['/usr/local/miniconda3/envs/py3.10.8/lib/python3.10/site-packages/oneflow']
version: 0.8.1+cu112.git.2a86da23
git_commit: 2a86da23
cmake_build_type: Release
rdma: True
mlir: True

multi_head_attention error

Describe the bug

Run demo with docker:
docker run --rm -it --gpus all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 -v ${HF_HOME}:${HF_HOME} -v ${PWD}:${PWD} -w ${PWD} -e HF_HOME=${HF_HOME} -e HUGGING_FACE_HUB_TOKEN=${HUGGING_FACE_HUB_TOKEN} oneflowinc/oneflow-sd:cu112 python3 /demos/oneflow-t2i.py --prompt "a photo of a cat riding a horse on mars"

Reproduction

No response

Logs

WARNING: CUDA Minor Version Compatibility mode ENABLED.
  Using driver version 470.42.01 which has support for CUDA 11.4.  This container
  was built with CUDA 11.8 and will be run in Minor Version Compatibility mode.
  CUDA Forward Compatibility is preferred over Minor Version Compatibility for use
  with this container but was unavailable:
  [[System has unsupported display driver / cuda driver combination (CUDA_ERROR_SYSTEM_DRIVER_MISMATCH) cuInit()=803]]
  See https://docs.nvidia.com/deploy/cuda-compatibility/ for details.

loaded library: /usr/lib/x86_64-linux-gnu/libibverbs.so.1
Fetching 16 files: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 16/16 [00:00<00:00, 2110.81it/s]
[oneflow] [text_encoder] transformers.OneFlowCLIPTextModel
[oneflow] [unet] diffusers.OneFlowUNet2DConditionModel
[oneflow] [safety_checker] stable_diffusion.OneFlowStableDiffusionSafetyChecker
[diffusers] [feature_extractor] transformers.CLIPFeatureExtractor
[oneflow] [scheduler] diffusers.OneFlowPNDMScheduler
[diffusers] [tokenizer] transformers.CLIPTokenizer
ftfy or spacy is not installed using BERT BasicTokenizer instead of ftfy.
[oneflow] [vae] diffusers.OneFlowAutoencoderKL
[oneflow] compiling unet beforehand to make sure the progress bar is more accurate
[oneflow] [elapsed(s)] [unet compilation] 23.832770048989914
  0%|                                                                                                                                                                                       | 0/51 [00:00<?, ?it/s]F20221108 02:54:25.671985   157 fused_multi_head_attention_inference_kernel.cu:150] UNIMPLEMENTED
*** Check failure stack trace: ***
    @     0x7f35f204bf6a  google::LogMessage::Fail()
    @     0x7f35f204c252  google::LogMessage::SendToLog()
    @     0x7f35f204bad7  google::LogMessage::Flush()
    @     0x7f35f204e649  google::LogMessageFatal::~LogMessageFatal()
    @     0x7f35e6042d62  oneflow::user_op::(anonymous namespace)::DispatchArchTag<>()
    @     0x7f35eba61e2a  oneflow::user_op::(anonymous namespace)::DispatchArchTag<>()
    @     0x7f35eba584c5  oneflow::user_op::(anonymous namespace)::DispatchCutlassFmha()
    @     0x7f35eba65c4a  oneflow::user_op::(anonymous namespace)::FusedMultiHeadAttentionInferenceKernel::Compute()
    @     0x7f35eac67523  oneflow::UserKernel::ForwardUserKernel()
    @     0x7f35eac676a4  oneflow::UserKernel::ForwardDataContent()
    @     0x7f35eac35747  oneflow::Kernel::Forward()
    @     0x7f35eac360c0  oneflow::Kernel::Launch()
    @     0x7f35eacda00d  oneflow::(anonymous namespace)::LightActor<>::ProcessMsg()
    @     0x7f35eb23dfd0  oneflow::Thread::PollMsgChannel()
    @     0x7f35eb23e348  _ZNSt6thread11_State_implINS_8_InvokerISt5tupleIJZN7oneflow6ThreadC4ERKNS3_8StreamIdEEUlvE_EEEEE6_M_runEv
    @     0x7f35f20609af  execute_native_thread_routine
    @     0x7f36c8db1609  start_thread
    @     0x7f36c8b70133  clone

System Info

Docker version: 20.10.14
Image: oneflowinc/oneflow-sd:cu112

Increasing the batch size may not improve performance, is that expected?

We use OneFlowStableDiffusionPipeline to run a pretrained model. It's a huge performance improvement over the original.

However, increasing the batch size to improve the average time per image does not work: GPU memory usage goes up, but the average time per image actually increases as well.

I tried using a list of prompts (batched_prompt = [prompt for _ in range(num_images_per_prompt)]),

or passing in the num_images_per_prompt parameter.

According to the callback, all the images are generated in parallel, but the total time never goes down.
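For reference, a minimal sketch of the kind of per-image timing involved (assumptions: a warm-up call per batch size so graph compilation is excluded, and the standard example model):

import time
import oneflow as torch
from diffusers import OneFlowStableDiffusionPipeline

pipe = OneFlowStableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", revision="fp16", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")
prompt = "a photo of an astronaut riding a horse on mars"

for batch in (1, 2, 4):
    prompts = [prompt] * batch
    pipe(prompts)  # warm-up so compilation for this shape is not timed
    torch.cuda.synchronize()
    t0 = time.time()
    pipe(prompts)
    torch.cuda.synchronize()
    print(batch, "images:", (time.time() - t0) / batch, "s per image")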

Any advice is super appreciated.

AttributeError: module 'torch' has no attribute 'mock_torch'

Describe the bug

I got an error while following https://github.com/Oneflow-Inc/diffusers/wiki/How-to-Run-OneFlow-Stable-Diffusion.

Reproduction

No response

Logs

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-32-9857761ca7ed> in <module>
      2 from diffusers import OneFlowStableDiffusionPipeline
      3 
----> 4 pipe = OneFlowStableDiffusionPipeline.from_pretrained(
      5     "CompVis/stable-diffusion-v1-4",
      6     use_auth_token=True,

/usr/local/lib/python3.8/dist-packages/diffusers/pipeline_oneflow_utils.py in from_pretrained(cls, pretrained_model_name_or_path, **kwargs)
    714                 class_candidates = {c: class_obj for c in importable_classes.keys()}
    715             else:
--> 716                 with torch.mock_torch.enable():
    717                     # else we just import it from the library.
    718                     library = importlib.import_module(library_name)

AttributeError: module 'torch' has no attribute 'mock_torch'

System Info

Colab.
Tried almost every version until 0.7.0.dev0.

  • diffusers version: 0.7.0.dev0
  • Platform: Linux-5.10.147+-x86_64-with-glibc2.29
  • Python version: 3.8.10
  • PyTorch version (GPU?): 1.13.1+cu116 (True)
  • Huggingface_hub version: 0.12.0
  • Transformers version: 4.26.0
  • Using GPU in script?:
  • Using distributed or parallel set-up in script?:

CUDA initialization error

Describe the bug

Error when running OneFlow Stable Diffusion without docker; I got an error like:

loaded library: /usr/lib/x86_64-linux-gnu/libibverbs.so.1
W20221108 11:30:13.097517 41659 cuda_device_descriptor_class.cpp:48] initialization error

Reproduction

Code is the same as the provided example

Logs

loaded library: /usr/lib/x86_64-linux-gnu/libibverbs.so.1
W20221108 11:30:13.097517 41659 cuda_device_descriptor_class.cpp:48] initialization error
Fetching 16 files: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████| 16/16 [00:00<00:00, 17067.36it/s]
[oneflow] [unet] diffusers.OneFlowUNet2DConditionModel
[oneflow] [text_encoder] transformers.OneFlowCLIPTextModel
[oneflow] [safety_checker] stable_diffusion.OneFlowStableDiffusionSafetyChecker
[oneflow] [vae] diffusers.OneFlowAutoencoderKL
[diffusers] [feature_extractor] transformers.CLIPFeatureExtractor
[diffusers] [tokenizer] transformers.CLIPTokenizer
ftfy or spacy is not installed using BERT BasicTokenizer instead of ftfy.
[oneflow] [scheduler] diffusers.OneFlowPNDMScheduler
E20221108 11:30:41.811245 41659 cuda_device_manager_factory.cpp:65] Failed to get cuda runtime version: initialization error
F20221108 11:30:41.811539 41659 scheduler.cpp:125] Check failed: err : initialization error (3) 
*** Check failure stack trace: ***
    @     0x7f29565f0f6a  google::LogMessage::Fail()
    @     0x7f29565f1252  google::LogMessage::SendToLog()
    @     0x7f29565f0ad7  google::LogMessage::Flush()
    @     0x7f29565f3649  google::LogMessageFatal::~LogMessageFatal()
    @     0x7f294ef4e2f0  oneflow::boxing::collective::ExecutorImpl::Init()
    @     0x7f294ef4f6b5  oneflow::boxing::collective::Scheduler::Impl::Impl()
    @     0x7f294ef4fc93  oneflow::boxing::collective::Scheduler::Scheduler()
    @     0x7f294e6fa0b2  oneflow::MultiClientSessionContext::TryInit()
    @     0x7f294e6fab5e  oneflow::MultiClientSessionContext::TryInit()
    @     0x7f2a1e564dcf  (unknown)
    @     0x7f2a1e3f8f79  (unknown)
    @           0x4ffdb7  cfunction_call
    @           0x4f95eb  _PyObject_MakeTpCall.localalias
    @           0x50c73f  method_vectorcall
    @           0x4f4d0c  _PyEval_EvalFrameDefault
    @           0x5001ff  _PyFunction_Vectorcall
    @           0x4f0913  _PyEval_EvalFrameDefault
    @           0x50c44e  method_vectorcall
    @           0x4f4d0c  _PyEval_EvalFrameDefault
    @           0x4f893d  _PyObject_FastCallDictTstate.localalias
    @           0x509bc8  slot_tp_init
    @           0x4f963b  _PyObject_MakeTpCall.localalias
    @           0x4f4fcb  _PyEval_EvalFrameDefault
    @           0x5001ff  _PyFunction_Vectorcall
    @           0x4f89ed  _PyObject_FastCallDictTstate.localalias
    @           0x509bc8  slot_tp_init
    @           0x4f9956  type_call
    @           0x50d0d9  PyObject_Call
    @           0x4f2c32  _PyEval_EvalFrameDefault
    @           0x50c44e  method_vectorcall
    @           0x4f1592  _PyEval_EvalFrameDefault
    @           0x599fe2  _PyEval_Vector
Aborted (core dumped)

System Info

  • diffusers version: 0.4.0.dev0
  • Platform: Linux-4.15.0-55-generic-x86_64-with-glibc2.27
  • Python version: 3.10.6
  • PyTorch version (GPU?): 1.12.1 (True)
  • Huggingface_hub version: 0.10.1
  • Transformers version: 4.23.0.dev0
  • Using GPU in script?: yes
  • Using distributed or parallel set-up in script?: no
  • CUDA version: 10.2
  • CUDA Driver version: 440.33.01

Stuck at loading library

Describe the bug

Stuck on loaded library: /lib/x86_64-linux-gnu/libibverbs.so.1

Reproduction

No response

Logs

No response

System Info

My cuda and GPU info:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.47.03    Driver Version: 510.47.03    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            On   | 00000000:00:1E.0 Off |                    0 |
| N/A   39C    P8    15W /  70W |      0MiB / 15360MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

"import error: oneflow.cuda.amp.GradScaler is not implemented" and "TypeError: embedding(): argument 'indices' (position 2) must be Tensor, not Tensor"

I set things up following the Without Docker instructions at https://github.com/Oneflow-Inc/diffusers/wiki/How-to-Run-OneFlow-Stable-Diffusion. All steps completed, but running from diffusers import OneFlowStableDiffusionPipeline reports an error:

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/zhaodi/diffusers/src/diffusers/__init__.py", line 22, in <module>
    from transformers import CLIPTextModel, CLIPFeatureExtractor
  File "<frozen importlib._bootstrap>", line 1075, in _handle_fromlist
  File "/home/zhaodi/miniconda3/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1101, in __getattr__
    value = getattr(module, name)
  File "/home/zhaodi/miniconda3/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1100, in __getattr__
    module = self._get_module(self._class_to_module[name])
  File "/home/zhaodi/miniconda3/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1112, in _get_module
    raise RuntimeError(
RuntimeError: Failed to import transformers.models.clip.modeling_clip because of the following error (look up to see its traceback):
oneflow.cuda.amp.GradScaler is not implemented, please submit an issue at  
'https://github.com/Oneflow-Inc/oneflow/issues' including the log information of the error, the 
minimum reproduction code, and the system information.

The import error can be fixed with this method: https://github.com/Oneflow-Inc/diffusers/issues/104#issuecomment-1434151151

I want to run this example: https://github.com/Oneflow-Inc/diffusers/blob/oneflow-fork/tests/test_pipelines_oneflow_graph_load.py
So, following https://github.com/Oneflow-Inc/diffusers/issues/104#issuecomment-1434151151, I imported the transformers module before from diffusers import ..., but a new error was triggered at runtime.
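For clarity, a minimal sketch of that import order (my reading of the linked comment: transformers must be imported, and thus bound to real torch, before anything triggers oneflow's mocked torch):

import transformers  # imported first, before oneflow's torch mock can intercept it
import oneflow as torch
from diffusers import OneFlowStableDiffusionPipeline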

sd init time  373.8917450904846 s.
==> function  text_to_image  try to run...
text_to_image  cuda mem before  4336.75  MB
text_to_image  host mem before  9173.0  MB
E
======================================================================
ERROR: test_sd_graph_save_and_load (__main__.OneFlowPipeLineGraphSaveLoadTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/zhaodi/work/test.py", line 175, in test_sd_graph_save_and_load
    _test_sd_graph_save_and_load(True, f0 ,f1, f2)
  File "/home/zhaodi/work/test.py", line 141, in _test_sd_graph_save_and_load
    no_g_images = text_to_image(prompt, (i, j), prefix=f"is_save_{str(is_save)}-", with_graph=False)
  File "/home/zhaodi/work/test.py", line 32, in new_fn
    out = fn(*args, **kwargs)
  File "/home/zhaodi/work/test.py", line 117, in text_to_image
    images = pipe(
  File "/home/zhaodi/oneflow/python/oneflow/autograd/autograd_mode.py", line 154, in wrapper
    return func(*args, **kwargs)
  File "/home/zhaodi/diffusers/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_oneflow.py", line 620, in __call__
    text_embeddings = self._encode_prompt(
  File "/home/zhaodi/diffusers/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_oneflow.py", line 393, in _encode_prompt
    text_embeddings = self.text_encoder(text_input_ids.to(device), attention_mask=attention_mask)
  File "/home/zhaodi/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/zhaodi/miniconda3/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 816, in forward
    return self.text_model(
  File "/home/zhaodi/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/zhaodi/miniconda3/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 712, in forward
    hidden_states = self.embeddings(input_ids=input_ids, position_ids=position_ids)
  File "/home/zhaodi/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/zhaodi/miniconda3/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 227, in forward
    inputs_embeds = self.token_embedding(input_ids)
  File "/home/zhaodi/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/zhaodi/miniconda3/lib/python3.10/site-packages/torch/nn/modules/sparse.py", line 160, in forward
    return F.embedding(
  File "/home/zhaodi/miniconda3/lib/python3.10/site-packages/torch/nn/functional.py", line 2210, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
TypeError: embedding(): argument 'indices' (position 2) must be Tensor, not Tensor

----------------------------------------------------------------------
Ran 1 test in 374.084s

FAILED (errors=1)

System Info

Linux
oneflow 0.9.1.dev20230216+cu117
transformers 4.26.1
diffusers 0.10.0.dev0
huggingface-hub 0.12.0

AttributeError: 'NoneType' object has no attribute 'net' - Custom Stable Diffusion Pipeline

Hello, I am trying to replicate the work of the Stable Diffusion Pipeline so I can have more control over it.
More specifically, I want to only load specific (compiled) Unets and VAEs.

To do this I've written the following script:

https://gist.github.com/chavinlo/79776f50006698e477796c4c58083623

Everything goes well until inference on the unet is attempted, more precisely at L248: https://gist.github.com/chavinlo/79776f50006698e477796c4c58083623#file-test-py-L248

The model thinks it's not compiled, tries to compile it, and returns the following error:

Traceback (most recent call last):
  File "/root/node/test_the_test.py", line 5, in <module>
    engine(json.load(open('/root/node/cfg/basic.json')))
  File "/root/env/lib/python3.10/site-packages/oneflow/autograd/autograd_mode.py", line 154, in wrapper
    return func(*args, **kwargs)
  File "/root/node/test.py", line 248, in engine
    noise_pred = unet_graph(latent_model_input, t, text_embeddings)
  File "/root/diffusers/src/diffusers/oneflow_graph_compile_cache.py", line 61, in __call__
    self.compile(*args, **kwargs)
  File "/root/diffusers/src/diffusers/oneflow_graph_compile_cache.py", line 31, in compile
    self.graph_._compile_from_shared(*args, **kwargs)
  File "/root/env/lib/python3.10/site-packages/oneflow/nn/graph/graph.py", line 872, in _compile_from_shared
    self._shared_graph._forward_job_proto.net.op
AttributeError: 'NoneType' object has no attribute 'net'

It's trying to use _graph; that object seems to have the net attribute along with the rest of what it's looking for,
but the script looks it up as "_shared_graph".

(screenshot omitted)

What it's looking for: (screenshot omitted)

On the right is the line where it breaks; on the left is the graph_ object above. (debugger screenshots omitted)

So first, when unet_graph() is called, graph_._forward_job_proto is None, but by the end of it self._shared_graph is None and the prior graph raises an AttributeError. Then _compile_from_shared is called again; this time self.graph_ is net. Then everything turns into a NameError where self is not defined, and unet_graph is called again; it finally fails with graph_ being net and _shared_graph raising an AttributeError.

This is the lpw.py code if needed:
https://gist.github.com/chavinlo/9e4c5c6c8e0f82f882a04a3fe4e54d88
note that it's different from the other issue I created

Any suggestions?

No module named 'oneflow.utils.checkpoint'

Describe the bug

File "/root/diffusers/src/diffusers/models/unet_2d_condition_oneflow.py", line 6, in
import oneflow.utils.checkpoint

Reproduction

import oneflow as torch
from diffusers import OneFlowStableDiffusionPipeline
pipe = OneFlowStableDiffusionPipeline.from_pretrained(
    local_model_path,
    use_auth_token=True,
    revision="fp16",
    torch_dtype=torch.float16,
)

Logs

Traceback (most recent call last):
  File "demo_inference.py", line 8, in <module>
    from diffusers import StableDiffusionPipeline
  File "/root/diffusers/src/diffusers/__init__.py", line 21, in <module>
    from .models import AutoencoderKL, UNet2DConditionModel, UNet2DModel, VQModel
  File "/root/diffusers/src/diffusers/models/__init__.py", line 28, in <module>
    from .unet_2d_condition_oneflow import OneFlowUNet2DConditionModel
  File "/root/diffusers/src/diffusers/models/unet_2d_condition_oneflow.py", line 6, in <module>
    import oneflow.utils.checkpoint
ModuleNotFoundError: No module named 'oneflow.utils.checkpoint'

System Info

Python version: 3.8.13 (default, Mar 28 2022, 11:38:47)
[GCC 7.5.0]
OS platform: Linux-4.19.0-19-amd64-x86_64-with-glibc2.17
OS architecture: x86_64
Torch version: 1.12.1+cu102
Cuda available: True
Cuda version: 10.2
CuDNN version: 7605
Number of GPUs available: 1
transformers version: 4.23.0.dev0

The results generated in my non-docker environment look very wrong

Describe the bug

(result screenshots omitted)
These generated results are clearly problematic and do not match the huggingface diffusers results. Did I set something up wrong?

Reproduction

import oneflow as torch
import time
from diffusers import OneFlowStableDiffusionPipeline

pipe = OneFlowStableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    use_auth_token=True,
    revision="fp16",
    torch_dtype=torch.float16,
)

pipe = pipe.to("cuda")

prompt = "a photo of an astronaut riding a horse on mars"

for i in range(10):
    torch.cuda.synchronize()
    sampler_time = time.time()
    with torch.autocast("cuda"):
        images = pipe(prompt).images
    torch.cuda.synchronize()
    sampler1_time = time.time()
    print('loop_time:', sampler1_time - sampler_time)
    for j, image in enumerate(images):
        image.save(f"{prompt}-of-{j}-{i}.png")

Logs

No response

System Info

diffusers-0.4.0.dev0
transformers-4.23.0.dev0
python3.8.13
pytorch 1.13.0a0+d0d6b1f
cuda 11.8

When will fused_multi_head_attention_inference support attn mask?

Description

Hi,
I have a stable diffusion project that requires an attention mask and would like to use oneflow for acceleration. I see that xformers already supports it, but it's still too slow compared to oneflow. So when will fused_multi_head_attention_inference support an attention mask?

AttributeError: module transformers has no attribute CLIPImageProcessor

Describe the bug

When I use a model locally trained with dreambooth, the following error occurs.

AttributeError: module transformers has no attribute CLIPImageProcessor

Reproduction

In the case of a model locally trained with dreambooth, model_index.json includes the following:

{
  "_class_name": "StableDiffusionPipeline",
  "_diffusers_version": "0.10.2",
  "feature_extractor": [
    "transformers",
    "CLIPImageProcessor"
  ],
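A possible workaround (my assumption: the pinned Transformers 4.23.0.dev0 predates the CLIPImageProcessor class, whose older name is CLIPFeatureExtractor), sketched as a script that rewrites model_index.json; the path is a placeholder:

import json

path = "./my-dreambooth-model/model_index.json"  # hypothetical local model path
with open(path) as f:
    cfg = json.load(f)

# fall back to the older class name that Transformers 4.23 still provides
if cfg.get("feature_extractor", [None, None])[1] == "CLIPImageProcessor":
    cfg["feature_extractor"][1] = "CLIPFeatureExtractor"

with open(path, "w") as f:
    json.dump(cfg, f, indent=2)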

Logs

Traceback (most recent call last):
  File "oneflow-test.py", line 17, in <module>
    pipe = StableDiffusionPipeline.from_pretrained(
  File "/src/diffusers/src/diffusers/pipeline_oneflow_utils.py", line 657, in from_pretrained
    class_obj = getattr(library, class_name)
  File "/src/transformers/src/transformers/utils/import_utils.py", line 1043, in __getattr__
    raise AttributeError(f"module {self.__name__} has no attribute {name}")
AttributeError: module transformers has no attribute CLIPImageProcessor
Segmentation fault

System Info

  • diffusers version: 0.10.0.dev0
  • Platform: Linux-5.10.16.3-microsoft-standard-WSL2-x86_64-with-glibc2.10
  • Python version: 3.8.13
  • PyTorch version (GPU?): 1.13.0a0+d0d6b1f (True)
  • Huggingface_hub version: 0.11.1
  • Transformers version: 4.23.0.dev0
  • Using GPU in script?:
  • Using distributed or parallel set-up in script?:

fail to run txt2img

Describe the bug

when I try:

from diffusers import StableDiffusionPipeline, OneFlowStableDiffusionPipeline
import oneflow as torch
# from diffusers import DDIMScheduler

model_path = "./xdiffusion"  
prompt = "a cute girl, blue eyes, brown hair"

pipe = OneFlowStableDiffusionPipeline.from_pretrained(
        model_path, 
        # scheduler=DDIMScheduler(
        #     beta_start=0.00085,
        #     beta_end=0.012,
        #     beta_schedule="scaled_linear",
        #     clip_sample=False,
        #     set_alpha_to_one=True,
        # )
    )

def dummy(images, **kwargs):
    return images, False
pipe.safety_checker = dummy
pipe = pipe.to("cuda")
image = pipe(prompt, num_inference_steps=30).images[0]  
image.save(f"output.png")

I got an error:
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 (positional 0) and cpu:0 (positional 1)!

Reproduction

No response

Logs

(ldm) [root@VM-0-3-centos models]# python test.py
libibverbs not available, ibv_fork_init skipped
[oneflow] [vae] diffusers.OneFlowAutoencoderKL
[diffusers] [tokenizer] transformers.CLIPTokenizer
[oneflow] [unet] diffusers.OneFlowUNet2DConditionModel
[oneflow] [safety_checker] stable_diffusion.OneFlowStableDiffusionSafetyChecker
[diffusers] [feature_extractor] transformers.CLIPFeatureExtractor
[oneflow] [text_encoder] transformers.OneFlowCLIPTextModel
[oneflow] [scheduler] diffusers.OneFlowDDIMScheduler
[oneflow] compiling unet beforehand to make sure the progress bar is more accurate
[oneflow] [elapsed(s)] [unet compilation] 32.545996212400496
  0%|                                                                                                                             | 0/30 [00:00<?, ?it/s]
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /www/models/test.py:23 in <module>                                                               │
│                                                                                                  │
│   20 │   return images, False                                                                    │
│   21 pipe.safety_checker = dummy                                                                 │
│   22 pipe = pipe.to("cuda")                                                                      │
│ ❱ 23 image = pipe(prompt, num_inference_steps=30).images[0]                                      │
│   24 image.save(f"output.png")                                                                   │
│                                                                                                  │
│ /root/.conda/envs/ldm/lib/python3.8/site-packages/oneflow/autograd/autograd_mode.py:154 in       │
│ wrapper                                                                                          │
│                                                                                                  │
│   151 │   def __call__(self, func):                                                              │
│   152 │   │   def wrapper(*args, **kwargs):                                                      │
│   153 │   │   │   with AutoGradMode(False):                                                      │
│ ❱ 154 │   │   │   │   return func(*args, **kwargs)                                               │
│   155 │   │                                                                                      │
│   156 │   │   return wrapper                                                                     │
│   157                                                                                            │
│                                                                                                  │
│ /www/models/diffusers/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_oneflow │
│ .py:345 in __call__                                                                              │
│                                                                                                  │
│   342 │   │   │   if isinstance(self.scheduler, LMSDiscreteScheduler):                           │
│   343 │   │   │   │   latents = self.scheduler.step(noise_pred, i, latents, **extra_step_kwarg   │
│   344 │   │   │   else:                                                                          │
│ ❱ 345 │   │   │   │   latents = self.scheduler.step(noise_pred, t, latents, **extra_step_kwarg   │
│   346 │   │   │   torch._oneflow_internal.profiler.RangePop()                                    │
│   347 │   │                                                                                      │
│   348 │   │   # scale and decode the image latents with vae                                      │
│                                                                                                  │
│ /www/models/diffusers/src/diffusers/schedulers/scheduling_ddim_oneflow.py:259 in step            │
│                                                                                                  │
│   256 │   │                                                                                      │
│   257 │   │   # 3. compute predicted original sample from predicted noise also called            │
│   258 │   │   # "predicted x_0" of formula (12) from https://arxiv.org/pdf/2010.02502.pdf        │
│ ❱ 259 │   │   pred_original_sample = (sample - beta_prod_t ** (0.5) * model_output) / alpha_pr   │
│   260 │   │                                                                                      │
│   261 │   │   # 4. Clip "predicted x_0"                                                          │
│   262 │   │   if self.config.clip_sample:                                                        │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 (positional 0) and cpu:0 (positional 1)!

System Info

centos7.6
cuda11.3
python3.8

Does sd2 support multi-GPU inference?

Hi,
Thanks for open-sourcing this. On a T4 card I compared AITemplate and oneflow; oneflow is more stable. But even with the large speedup from the optimizations, generating a single prompt (768*768, steps=20) still takes about 5s. Can performance be improved with multiple GPUs?

InpaintPipelineLegacy cannot be used

Describe the bug

Test images:
https://image.netfrp.com/uploads/63cd2778d8ad9.png
https://image.netfrp.com/uploads/63cd2799f0868.png

Reproduction

No response

Logs

Traceback (most recent call last):
  File "/root/tasker-anime_txt2img/run.py", line 102, in execute_task
    result = inpaint(model, prompt, negative_prompt, img_url, mask_url, seed, strength, scale, steps)
  File "/root/tasker-anime_txt2img/api.py", line 251, in inpaint
    drawer.draw(
  File "/root/tasker-anime_txt2img/drawer_oneflow.py", line 191, in draw
    result = pipe(**params)
  File "/root/.conda/envs/ai/lib/python3.10/site-packages/oneflow/autograd/autograd_mode.py", line 154, in wrapper
    return func(*args, **kwargs)
  File "/root/diffusers-oneflow-fork/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_inpaint_legacy_oneflow.py", line 565, in __call__
    device = self._execution_device
  File "/root/diffusers-oneflow-fork/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_inpaint_legacy_oneflow.py", line 279, in _execution_device
    if self.device != torch.device("meta") or not hasattr(self.unet, "_hf_hook"):
RuntimeError: Expected one of cpu, cuda device type at start of device string: meta


System Info

diffusers-oneflow  
transformer-oneflow  
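
A hedged monkey-patch sketch for the error above (the class name OneFlowStableDiffusionInpaintPipelineLegacy is assumed from the pipeline file name, and the patch simply bypasses the accelerate-offload check, so it is only safe when no CPU-offload hooks are in use — oneflow has no "meta" device type, which is what trips the comparison):

from diffusers import OneFlowStableDiffusionInpaintPipelineLegacy  # class name assumed

def _execution_device(self):
    # skip the torch.device("meta") comparison that oneflow rejects
    # and report the pipeline's plain device instead
    return self.device

OneFlowStableDiffusionInpaintPipelineLegacy._execution_device = property(_execution_device)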

oneflow.utils.hooks is not implemented

Describe the bug

Installed like this:

git clone https://github.com/Oneflow-Inc/diffusers.git
cd diffusers
python3 -m pip install -e .[oneflow]

Reproduction

No response

Logs

Traceback (most recent call last):
  File "/home/terrance/.local/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1110, in _get_module
    return importlib.import_module("." + module_name, self.__name__)
  File "/usr/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/home/terrance/.local/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 27, in <module>
    from ...modeling_utils import PreTrainedModel
  File "/home/terrance/.local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 83, in <module>
    from accelerate import __version__ as accelerate_version
  File "/home/terrance/.local/lib/python3.10/site-packages/accelerate/__init__.py", line 7, in <module>
    from .accelerator import Accelerator
  File "/home/terrance/.local/lib/python3.10/site-packages/accelerate/accelerator.py", line 27, in <module>
    import torch.utils.hooks as hooks
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 674, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 571, in module_from_spec
  File "/home/terrance/.local/lib/python3.10/site-packages/oneflow/mock_torch/__init__.py", line 88, in create_module
    raise NotImplementedError(oneflow_mod_fullname + error_msg)
NotImplementedError: oneflow.utils.hooks is not implemented, please submit an issue at  
'https://github.com/Oneflow-Inc/oneflow/issues' including the log information of the error, the 
minimum reproduction code, and the system information.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/terrance/oneflow/test_diffusion.py", line 2, in <module>
    from diffusers import OneFlowStableDiffusionPipeline
  File "/home/terrance/oneflow/diffusers/src/diffusers/__init__.py", line 22, in <module>
    from transformers import CLIPTextModel, CLIPFeatureExtractor
  File "<frozen importlib._bootstrap>", line 1075, in _handle_fromlist
  File "/home/terrance/.local/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1101, in __getattr__
    value = getattr(module, name)
  File "/home/terrance/.local/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1100, in __getattr__
    module = self._get_module(self._class_to_module[name])
  File "/home/terrance/.local/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1112, in _get_module
    raise RuntimeError(
RuntimeError: Failed to import transformers.models.clip.modeling_clip because of the following error (look up to see its traceback):
oneflow.utils.hooks is not implemented, please submit an issue at  
'https://github.com/Oneflow-Inc/oneflow/issues' including the log information of the error, the 
minimum reproduction code, and the system information.

System Info

cuda 10.2

Custom Schedulers cannot be changed at runtime

Describe the bug

They simply can't. I am following the example provided in the Wiki.

Reproduction

I have tried loading the scheduler like this:

diffusers.OneFlowDPMSolverMultistepScheduler.from_config(sch_source, subfolder="scheduler", algorithm_type="dpmsolver")

following the example on the Wiki

then, passing it to the pipe like this:

  if scheduler in schedulers:
      scheduler_obj = schedulers[scheduler]
      scheduler_obj.set_timesteps(steps, device=device)
      #scheduler_obj is a OneFlowDPMSolverMultistepScheduler object.
  else:
      imgq('fail', f'scheduler {scheduler} not found')
      continue
  with torch.autocast("cuda"):
      pipe.scheduler = scheduler_obj
      image = pipe(
          prompt=prompt,
          num_inference_steps=steps,
          guidance_scale=cfg,
          negative_prompt=negative_prompt,
          height=image_height,
          width=image_width,
          generator=torch.Generator().manual_seed(seed)
      )[0][0]

Logs

Traceback (most recent call last):
  File "/workspace/node/threads/base.py", line 77, in image_generator
    image = pipe(
  File "/workspace/env/lib/python3.10/site-packages/oneflow/autograd/autograd_mode.py", line 154, in wrapper
    return func(*args, **kwargs)
  File "/workspace/diffusers/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_oneflow.py", line 676, in __call__
    noise_pred = unet_graph(latent_model_input, t, text_embeddings)
  File "/workspace/diffusers/src/diffusers/oneflow_graph_compile_cache.py", line 63, in __call__
    return self.graph_(*args, **kwargs)
  File "/workspace/env/lib/python3.10/site-packages/oneflow/nn/graph/graph.py", line 250, in __call__
    return self.__run(*args, **kwargs)
  File "/workspace/env/lib/python3.10/site-packages/oneflow/nn/graph/graph.py", line 1481, in __run
    oneflow._oneflow_internal.nn.graph.RunLazyNNGraph(
oneflow._oneflow_internal.exception.RuntimeError: Error: nn.Graph ONLY accepts static inputs tensor meta, please check whether your input tensor meta each step is the same as the input of first call graph.
The excepted tensor meta is: shape=(), dtype=oneflow.int64, device=cpu:0, but the actual tensor meta is: shape=(), dtype=oneflow.float64, device=cpu:0. The input index is 1.

System Info

Using OneFlow latest version
Python 3.10.10
Torch 1.13.1
CUDA 12.0
Driver 525.78
A100-SXM4-80GB
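
A hedged workaround sketch: nn.Graph pins the tensor meta of the first call, and the error above says the graph was compiled with int64 timesteps while the newly assigned scheduler yields float64 ones. Casting the scheduler's timesteps to the compiled dtype (or compiling a separate graph per timestep dtype) should keep the meta stable; the cast below is an assumption, not an official API, and may cost some precision for float-timestep schedulers:

import oneflow as torch

scheduler_obj.set_timesteps(steps, device=device)
# assumption: the first graph compilation saw int64 timesteps
scheduler_obj.timesteps = scheduler_obj.timesteps.to(torch.int64)
pipe.scheduler = scheduler_obj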

Assertion Error: "assert og_torch.cuda.is_initialized() is False" Stable Diffusion Pipeline

Hello, I am trying to add Prompt extension and weighting by slightly modifying the Stable Diffusion Pipeline.
I do this by replacing the pipeline._encode_prompt with lpw_pipe._encode_prompt.
This is the lpw script: https://gist.github.com/chavinlo/b7ebc7e7dea59e311dab564fd452ff3c#file-lpw-py-L393

import oneflow as torch
import torch as og_torch
from .lpw import LongPromptWeightingPipeline

#load the text_model and tokenizer to be used on LPW
text_model = CLIPTextModel.from_pretrained(default_model, subfolder="text_encoder")
tokenizer_model = CLIPTokenizer.from_pretrained(default_model, subfolder="tokenizer")
text_model = text_model.to("cuda")
lpw_pipe = LongPromptWeightingPipeline(text_model, tokenizer_model, prompt_multiplier)

...
#Here I load multiple models from a configuration file.
pipe_map = dict()
for model in config['models']:
    print("Loading model:", model['model_path'])
    tmp_pipe = OneFlowStableDiffusionPipeline.from_pretrained(
        pretrained_model_name_or_path=model['model_path'],
        use_auth_token=True,
        torch_dtype=torch.float16
        )
    tmp_pipe.to("cuda")
    tmp_pipe._encode_prompt = lpw_pipe._encode_prompt
    tmp_pipe.enable_graph_share_mem()
    tmp_prompt = "Anime girl, beautiful"
    tmp_neg_prompt = "Disgusting, Horrible"
    for resolution in resultant_resolutions:
        print("Doing resolution:", resolution)
        with torch.autocast("cuda"):
            tmp_pipe(
                prompt=tmp_prompt,
                negative_prompt=tmp_neg_prompt,
                height=resolution[1],
                width=resolution[0]
            )
    pipe_map[model['alias']] = tmp_pipe

In normal circumstances it exits due to an AssertionError on assert og_torch.cuda.is_initialized() is False @ https://github.com/Oneflow-Inc/diffusers/blob/oneflow-fork/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_oneflow.py#L709

If this assertion is removed, it goes through but uses 3 times the VRAM per resolution round.

Here's the complete script: https://gist.github.com/chavinlo/d8005ebda6499853891c9edae8765b4b
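
A minimal sketch of one way around the assertion, assuming it exists to guarantee that PyTorch never initializes CUDA: keep the PyTorch-backed LPW text encoder on the CPU so og_torch.cuda.is_initialized() stays False (slower prompt encoding, but no extra VRAM). default_model, prompt_multiplier, and LongPromptWeightingPipeline are the names from the script above:

from transformers import CLIPTextModel, CLIPTokenizer
from lpw import LongPromptWeightingPipeline  # path assumed from the gist above

text_model = CLIPTextModel.from_pretrained(default_model, subfolder="text_encoder")
text_model = text_model.to("cpu")  # encode prompts on the CPU; only oneflow touches CUDA
tokenizer_model = CLIPTokenizer.from_pretrained(default_model, subfolder="tokenizer")
lpw_pipe = LongPromptWeightingPipeline(text_model, tokenizer_model, prompt_multiplier)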

How can I use a cached model?

Describe the bug

when I try:

import os
import oneflow as torch
from diffusers import OneFlowStableDiffusionPipeline

pipe = OneFlowStableDiffusionPipeline.from_pretrained(
    '~/.cache/huggingface/diffusers/models--CompVis--stable-diffusion-v1-4/',
    revision="fp16",
    torch_dtype=torch.float16
)

Error

OSError: We couldn't connect to 'https://huggingface.co' to load this model, couldn't find it in the cached files and it looks like ~/.cache/huggingface/diffusers/models--CompVis--stable-diffusion-v1-4/ is not the path to a directory containing a model_index.json file.
Checkout your internet connection or see how to run the library in offline mode at 'https://huggingface.co/docs/diffusers/installation#offline-mode'.

In addition

when I try:

pipe = OneFlowStableDiffusionPipeline.from_pretrained(
    './stable-diffusion-v1-4',  # download from huggingface diffusers model
    revision="fp16",
    torch_dtype=torch.float16
)

Error

RuntimeError: Error(s) in loading state_dict for OneFlowAutoencoderKL:
	While copying the parameter "encoder.conv_in.weight", an exception occurred : 

	Traceback (most recent call last):
	  File "/opt/conda/envs/ldm/lib/python3.8/site-packages/oneflow/nn/module.py", line 788, in _load_from_state_dict
	    param.copy_(input_param)
	  File "/opt/conda/envs/ldm/lib/python3.8/site-packages/oneflow/framework/tensor.py", line 341, in _copy
	    _copy_from_numpy_to_eager_local_tensor(self, other)
	  File "/opt/conda/envs/ldm/lib/python3.8/site-packages/oneflow/framework/tensor.py", line 290, in _copy_from_numpy_to_eager_local_tensor
	    assert np_arr.dtype == flow.convert_oneflow_dtype_to_numpy_dtype(
	AssertionError

Reproduction

No response

Logs

No response

System Info

  • Platform: Linux-4.19.91-x86_64-with-glibc2.10
  • Python version: 3.8.5
  • PyTorch version (GPU?): 1.11.0 (True)
  • Huggingface_hub version: 0.10.1
  • Transformers version: 4.23.0.dev0
  • oneflow:0.8.1+cu112.git.5610333
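
A hedged note on the first error above: the Hugging Face cache keeps model_index.json under snapshots/<commit-hash>/, not at the models--CompVis--... root, so pointing from_pretrained at a snapshot directory (or simply at the repo id with the cache already populated) usually works. A sketch, assuming exactly one cached snapshot:

import os
import oneflow as torch
from diffusers import OneFlowStableDiffusionPipeline

snap_root = os.path.expanduser(
    "~/.cache/huggingface/diffusers/models--CompVis--stable-diffusion-v1-4/snapshots"
)
snapshot = os.path.join(snap_root, os.listdir(snap_root)[0])  # assumed: one snapshot dir
pipe = OneFlowStableDiffusionPipeline.from_pretrained(snapshot, torch_dtype=torch.float16)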

[Performance Issue]: Takes a long time after a change in width and height to previous request

Brief Description

I am using oneflow with stable diffusion. If I generate the results in 512x512, it can generate the result in 1 second. If I change the width and height, it will generate the next result in ~10 seconds. Then it will generate normally afterwards on the same dimensions. So, a change in width and height causes the model to slow down for the first inference on the new dimensions.

Device and Context

A100 40 GB.

Benchmark

Normal inference: ~1 second
Inference after change in dimensions (for first time): ~10 seconds

Alternatives

No response
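
A hedged mitigation sketch, assuming the set of serving resolutions is known ahead of time: each new (height, width) pair triggers a one-off graph compilation, so running a throwaway prompt per resolution at startup moves the ~10 s cost out of the request path. pipe stands for the already-loaded pipeline:

import oneflow as torch

resolutions = [(512, 512), (512, 768), (768, 768)]  # hypothetical serving set
for width, height in resolutions:
    with torch.autocast("cuda"):
        pipe("warm-up", height=height, width=width, num_inference_steps=2)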

AttributeError: module 'torch' has no attribute 'mock_torch'

Describe the bug

This question is similar to #87, but it seems I'm not facing the same root error. I followed @daquexian's solution and installed diffusers using pip install instead of pip install -e, but it does not work.

Besides, I can reproduce the issue using a py script instead of IPython or other similar notebooks.

Judging from the Reproduction section, it seems the error may come from the pollution caused by mock_torch.enable?

Any advice is super appreciated.

Reproduction

import oneflow as torch
import torch as og_torch
from diffusers import (
    OneFlowStableDiffusionPipeline as StableDiffusionPipeline,
    OneFlowEulerDiscreteScheduler as EulerDiscreteScheduler,
    OneFlowDPMSolverMultistepScheduler as DPMSolverMultistepScheduler
)
MODEL_ID = "/path/to/a/local/checkpoint"
scheduler = EulerDiscreteScheduler.from_pretrained(MODEL_ID, subfolder="scheduler")
# scheduler = DPMSolverMultistepScheduler.from_config(MODEL_ID, subfolder="scheduler") # also reproducible
diffusion_t2m = StableDiffusionPipeline.from_pretrained(MODEL_ID, scheduler=scheduler, revision="fp16", torch_dtype=torch.float16) # passing og_torch.float16 also reproducible

I pasted the above code to a py script named as test_loader.py.

First, let me add some print statements in diffusers/pipeline_oneflow_utils.py around torch.mock_torch.enable:

print(f"BEFORE: type: {torch} library_name {library_name} pid {os.getpid()} ppid {os.getppid()}")
traceback.print_stack()
with torch.mock_torch.enable():
    print(f"IN: type: {torch} library_name {library_name} pid {os.getpid()} ppid {os.getppid()}")
    # else we just import it from the library.
    library = importlib.import_module(library_name)

    class_obj = getattr(library, class_name)

This is the output

BEFORE: type: <module 'torch' (<oneflow.mock_torch.OneflowImporter object at 0x7f620662d220>)> library_name transformers pid 3134 ppid 95
  File "test_loader.py", line 10, in <module>
    diffusion_t2m = StableDiffusionPipeline.from_pretrained(MODEL_ID, scheduler=scheduler, revision="fp16", torch_dtype=torch.float16)
  File "/opt/conda/lib/python3.8/site-packages/diffusers/pipeline_oneflow_utils.py", line 719, in from_pretrained
    traceback.print_stack()
IN: type: <module 'torch' (<oneflow.mock_torch.OneflowImporter object at 0x7f620662d220>)> library_name transformers pid 3134 ppid 95
BEFORE: type: <module 'torch' from '/opt/conda/lib/python3.8/site-packages/torch/__init__.py'> library_name diffusers pid 3134 ppid 95
  File "test_loader.py", line 10, in <module>
    diffusion_t2m = StableDiffusionPipeline.from_pretrained(MODEL_ID, scheduler=scheduler, revision="fp16", torch_dtype=torch.float16)
  File "/opt/conda/lib/python3.8/site-packages/diffusers/pipeline_oneflow_utils.py", line 719, in from_pretrained
    traceback.print_stack()
Traceback (most recent call last):
  File "test_loader.py", line 10, in <module>
    diffusion_t2m = StableDiffusionPipeline.from_pretrained(MODEL_ID, scheduler=scheduler, revision="fp16", torch_dtype=torch.float16)
  File "/opt/conda/lib/python3.8/site-packages/diffusers/pipeline_oneflow_utils.py", line 720, in from_pretrained
    with torch.mock_torch.enable():
AttributeError: module 'torch' has no attribute 'mock_torch'

And then I imported oneflow as oneflow (😯)

Here is my new header:

...
import numpy as np
import oneflow as torch
import torch as og_torch
import oneflow as oneflow
import traceback

import diffusers

Then I modified the mock_torch.enable() call site as follows:

print(f"BEFORE: type: {torch} library_name {library_name} pid {os.getpid()} ppid {os.getppid()}")
traceback.print_stack()
# with torch.mock_torch.enable():
with oneflow.mock_torch.enable():
    print(f"IN: type: {torch} library_name {library_name} pid {os.getpid()} ppid {os.getppid()}")
    # else we just import it from the library.
    library = importlib.import_module(library_name)

    class_obj = getattr(library, class_name)

Then I executed the test script again, and it finally works:

BEFORE: type: <module 'torch' (<oneflow.mock_torch.OneflowImporter object at 0x7f63c677c1c0>)> library_name diffusers pid 3204 ppid 95
  File "test_loader.py", line 10, in <module>
    diffusion_t2m = StableDiffusionPipeline.from_pretrained(MODEL_ID, scheduler=scheduler, revision="fp16", torch_dtype=torch.float16)
  File "/opt/conda/lib/python3.8/site-packages/diffusers/pipeline_oneflow_utils.py", line 719, in from_pretrained
    traceback.print_stack()
IN: type: <module 'torch' (<oneflow.mock_torch.OneflowImporter object at 0x7f63c677c1c0>)> library_name diffusers pid 3204 ppid 95
BEFORE: type: <module 'torch' from '/opt/conda/lib/python3.8/site-packages/torch/__init__.py'> library_name transformers pid 3204 ppid 95
  File "test_loader.py", line 10, in <module>
    diffusion_t2m = StableDiffusionPipeline.from_pretrained(MODEL_ID, scheduler=scheduler, revision="fp16", torch_dtype=torch.float16)
  File "/opt/conda/lib/python3.8/site-packages/diffusers/pipeline_oneflow_utils.py", line 719, in from_pretrained
    traceback.print_stack()
IN: type: <module 'torch' (<oneflow.mock_torch.OneflowImporter object at 0x7f63c677c1c0>)> library_name transformers pid 3204 ppid 95
BEFORE: type: <module 'torch' from '/opt/conda/lib/python3.8/site-packages/torch/__init__.py'> library_name transformers pid 3204 ppid 95
  File "test_loader.py", line 10, in <module>
    diffusion_t2m = StableDiffusionPipeline.from_pretrained(MODEL_ID, scheduler=scheduler, revision="fp16", torch_dtype=torch.float16)
  File "/opt/conda/lib/python3.8/site-packages/diffusers/pipeline_oneflow_utils.py", line 719, in from_pretrained
    traceback.print_stack()
IN: type: <module 'torch' (<oneflow.mock_torch.OneflowImporter object at 0x7f63c677c1c0>)> library_name transformers pid 3204 ppid 95
BEFORE: type: <module 'torch' from '/opt/conda/lib/python3.8/site-packages/torch/__init__.py'> library_name diffusers pid 3204 ppid 95
  File "test_loader.py", line 10, in <module>
    diffusion_t2m = StableDiffusionPipeline.from_pretrained(MODEL_ID, scheduler=scheduler, revision="fp16", torch_dtype=torch.float16)
  File "/opt/conda/lib/python3.8/site-packages/diffusers/pipeline_oneflow_utils.py", line 719, in from_pretrained
    traceback.print_stack()
IN: type: <module 'torch' (<oneflow.mock_torch.OneflowImporter object at 0x7f63c677c1c0>)> library_name diffusers pid 3204 ppid 95
The config attributes {'class_embed_type': None, 'mid_block_type': 'UNetMidBlock2DCrossAttn', 'resnet_time_scale_shift': 'default', 'upcast_attention': False} were passed to OneFlowUNet2DConditionModel, but are not expected and will be ignored. Please verify your config.json configuration file.

Logs

No response

System Info

  • diffusers version: 0.10.0.dev0 ( commit sha ea94536539aa1f17511b83a85fa08e3b9f989411 )
  • Platform: Linux-3.10.0-1160.62.1.el7.x86_64-x86_64-with-glibc2.10
  • Python version: 3.8.12
  • PyTorch version (GPU?): 1.12.0+cu116 (True)
  • Huggingface_hub version: 0.12.0
  • Transformers version: 4.26.0
  • Using GPU in script?: yes
  • Using distributed or parallel set-up in script?: all tried
root@f1be73ed83cb:/tmp# pip freeze | grep -E 'oneflow|torch'
oneflow==0.9.0
onnx @ file:///opt/pytorch/pytorch/third_party/onnx
pytorch-quantization==2.1.2
torch==1.12.0+cu116
torch-tensorrt @ file:///opt/pytorch/torch_tensorrt/py/dist/torch_tensorrt-1.1.0a0-cp38-cp38-linux_x86_64.whl
torchaudio==0.12.0+cu116
torchtext==0.13.0
torchvision==0.13.0+cu116
root@f1be73ed83cb:/tmp# nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Thu_Feb_10_18:23:41_PST_2022
Cuda compilation tools, release 11.6, V11.6.112
Build cuda_11.6.r11.6/compiler.30978841_0
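
Condensing the workaround found above (a sketch of the reporter's local patch to diffusers/pipeline_oneflow_utils.py, not an upstream fix): bind oneflow under its own name, so the mock context manager stays reachable even after the name torch has been rebound to real PyTorch inside the loader. library_name and class_name come from the surrounding from_pretrained machinery:

import importlib
import oneflow  # bound under its own name, immune to later rebinding of `torch`

with oneflow.mock_torch.enable():
    library = importlib.import_module(library_name)
    class_obj = getattr(library, class_name)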

run error

Describe the bug

Copying the conv_out weight and bias raises an AssertionError.

Reproduction

While copying the parameter "conv_out.weight", an exception occurred :

    Traceback (most recent call last):
      File "/home/kang/anaconda3/envs/kyt/lib/python3.8/site-packages/oneflow/nn/module.py", line 788, in 

load_from_state_dict
param.copy
(input_param)
File "/home/kang/anaconda3/envs/kyt/lib/python3.8/site-packages/oneflow/framework/tensor.py", line 341,
in _copy
_copy_from_numpy_to_eager_local_tensor(self, other)
File "/home/kang/anaconda3/envs/kyt/lib/python3.8/site-packages/oneflow/framework/tensor.py", line 290,
in _copy_from_numpy_to_eager_local_tensor
assert np_arr.dtype == flow.convert_oneflow_dtype_to_numpy_dtype(
AssertionError
.
While copying the parameter "conv_out.bias", an exception occurred :

    Traceback (most recent call last):
      File "/home/kang/anaconda3/envs/kyt/lib/python3.8/site-packages/oneflow/nn/module.py", line 788, in 

load_from_state_dict
param.copy
(input_param)
File "/home/kang/anaconda3/envs/kyt/lib/python3.8/site-packages/oneflow/framework/tensor.py", line 341,
in _copy
_copy_from_numpy_to_eager_local_tensor(self, other)
File "/home/kang/anaconda3/envs/kyt/lib/python3.8/site-packages/oneflow/framework/tensor.py", line 290,
in _copy_from_numpy_to_eager_local_tensor
assert np_arr.dtype == flow.convert_oneflow_dtype_to_numpy_dtype(
AssertionError

Logs

No response

System Info

diffusers-cli env
loaded library: /usr/lib/x86_64-linux-gnu/libibverbs.so.1

Copy-and-paste the text below in your GitHub issue and FILL OUT the two last points.

  • diffusers version: 0.4.0.dev0
  • Platform: Linux-5.3.0-050300rc1-generic-x86_64-with-glibc2.10
  • Python version: 3.8.5
  • PyTorch version (GPU?): 1.10.1+cu111 (True)
  • Huggingface_hub version: 0.10.1
  • Transformers version: 4.23.0.dev0
  • Using GPU in script?:
  • Using distributed or parallel set-up in script?:

ImportError: cannot import name 'OneFlowStableDiffusionPipeline' from 'diffusers' (unknown location)

Describe the bug

Installed 'diffusers' by following:
git clone https://github.com/Oneflow-Inc/diffusers.git
cd diffusers
python3 -m pip install -e .[oneflow]
but when running the example, it raises ImportError: cannot import name 'OneFlowStableDiffusionPipeline' from 'diffusers' (unknown location).

Reproduction

No response

Logs

Python 3.7.3 (default, Jan 22 2021, 20:04:44) 
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from diffusers import OneFlowStableDiffusionPipeline
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: cannot import name 'OneFlowStableDiffusionPipeline' from 'diffusers' (unknown location)

System Info

cuda version: 11.3
torch version: 1.10.0
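
A hedged diagnostic sketch: "unknown location" usually means Python resolved diffusers as an empty namespace package, e.g. when the interpreter is started from inside the cloned diffusers/ directory, or when pip installed into a different interpreter than the one running (the log above shows Python 3.7.3). Checking where the import actually resolves should narrow it down:

import sys
import diffusers

print(sys.executable)                        # interpreter actually running this script
print(getattr(diffusers, "__file__", None))  # None for an empty namespace package
print(getattr(diffusers, "__version__", None))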

Cannot find OneFlowStableDiffusionPipeline

Describe the bug

Python 3.10.0 (default, Mar 3 2022, 09:58:08) [GCC 7.5.0]
Type 'copyright', 'credits' or 'license' for more information
IPython 8.6.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: from diffusers import OneFlowStableDiffusionPipeline
╭───────────────────────────── Traceback (most recent call last) ──────────────────────────────╮
│ <ipython-input-1>:1 in <module> │
╰──────────────────────────────────────────────────────────────────────────────────────────────╯
ImportError: cannot import name 'OneFlowStableDiffusionPipeline' from 'diffusers'
(/home/arthur/miniconda3/envs/dd/lib/python3.10/site-packages/diffusers/__init__.py)

In [2]: import diffusers

In [3]: diffusers.__version__
Out[3]: '0.12.1'

In [4]:

Reproduction

No response

Logs

No response

System Info

diffusers 0.12.1
ubuntu 20.04
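
A hedged check: diffusers 0.12.1 here is the stock PyPI build, which does not ship the OneFlow pipelines; the import should resolve into the Oneflow-Inc/diffusers checkout instead. If the path below points into site-packages, uninstalling the PyPI build and reinstalling via the git clone / pip install -e .[oneflow] steps shown earlier should fix it:

import diffusers

print(diffusers.__version__)  # the stock PyPI build if this prints 0.12.1
print(diffusers.__file__)     # should sit inside the cloned oneflow fork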

env error

Describe the bug

docker image: oneflowinc/oneflow-sd:cu112

nvcc -V

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0

The CUDA version does not match!

Reproduction

0

Logs

0

System Info

0
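
A hedged note: inside the image, nvcc reports the CUDA toolkit bundled with the base image (11.8 here), while the cu112 tag describes the CUDA build of the oneflow wheel, which ships its own runtime libraries; the two can differ without error as long as the host driver supports both. A quick check of what the wheel itself reports:

import oneflow

print(oneflow.__version__)  # expected to contain +cu112 for the cu112 wheel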
