showlab / datasetdm

[NeurIPS 2023] DatasetDM: Synthesizing Data with Perception Annotations Using Diffusion Models

Home Page: https://weijiawu.github.io/DatasetDM_page/

Python 99.12% Shell 0.88%

datasetdm's Issues

Confusion about the quality of the generated images

Thank you for this excellent work! I have a question.

I noticed that if I feed a GPT-4-generated prompt directly into Stable Diffusion v1.5, the resulting images may be of poor quality (e.g. missing objects or incorrect object relations). Do you apply any post-processing to the generated images? None is shown in the architecture diagram (Fig. 2) of the paper.

I look forward to your answer. Thank you.

prompt txt files

Hello, thanks for your work. What are the prompt txt files, and how can I obtain them?

Thanks in advance.

I am a little confused about the design of the pixel decoder. The paper says: "the pixel decoder consists of several straightforward up-sampling layers. Each layer comprises four types of computations 1) 1x1 conv 2) up sample 3) concat 4) mixed conv".

After upsampling to a spatial size of 64x64 and fusing the 64x64 feature from the UNet, how do you obtain the per-pixel embedding at, say, 512x512? Do you simply add three more layers using 1), 2) and 4), without 3)?

Looking forward to your reply.
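
The arithmetic behind "three more layers" can be checked with a small sketch. It assumes each up-sampling layer doubles the spatial size, which the question implies but the repository may not follow. The helper name `upsampling_steps` is hypothetical and is not part of DatasetDM.

```python
def upsampling_steps(src: int, dst: int, factor: int = 2) -> int:
    """Count the `factor`x up-sampling layers needed to grow src to dst."""
    if src <= 0 or dst < src:
        raise ValueError("expected 0 < src <= dst")
    steps, size = 0, src
    while size < dst:
        size *= factor
        steps += 1
    if size != dst:
        raise ValueError(f"{dst} is not {src} times a power of {factor}")
    return steps


# Assumption: 2x per layer, starting from the 64x64 UNet feature.
n_layers = upsampling_steps(64, 512)
sizes = [64 * 2 ** i for i in range(n_layers + 1)]
print(n_layers, sizes)  # 3 [64, 128, 256, 512]
```

Under that assumption, going from 64x64 to 512x512 takes exactly three further 2x layers. Whether those layers skip the concat step is a design question only the authors can confirm.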

Error when training on the NYU dataset

python train_depth_NYU.py --save_name Train_250_images_t1_attention_transformer_NYU_10layers_NOAug \
    --config /workspace/my_code/DatasetDM/config/NYU/NYU_Depth.yaml \
    --image_limitation 250

Traceback (most recent call last):
File "/root/anaconda3/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/workspace/my_code/DatasetDM/ptp_utils.py", line 212, in text2image
latents = diffusion_step_DDM(Unet, scheduler, controller, latents, context, t, guidance_scale, low_resource)
File "/workspace/my_code/DatasetDM/ptp_utils.py", line 141, in diffusion_step_DDM
noise_pred = unet(latents_input, t, encoder_hidden_states=context)["sample"]
File "/root/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "/workspace/my_code/DatasetDM/model/unet.py", line 153, in forward
reshape_h=h.reshape(int(h.size()[0]/2),int(h.size()[1]*2),h.size()[2],h.size()[3])
IndexError: tuple index out of range
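
The IndexError means `h` has fewer than four dimensions when the reshape runs, so one of the `h.size()[i]` lookups falls off the end of the shape tuple. A minimal diagnostic sketch follows. The helper `paired_batch_shape` is hypothetical (not part of the repository), and the even-batch check is an added assumption; it mirrors the target shape of the failing line but raises a descriptive error instead.

```python
def paired_batch_shape(shape):
    """Target shape for (N, C, H, W) -> (N/2, 2C, H, W).

    Mirrors the reshape on the failing line, but reports the actual
    shape when the input is not 4-D instead of raising IndexError.
    """
    shape = tuple(shape)
    if len(shape) != 4:
        raise ValueError(
            f"expected a 4-D (N, C, H, W) shape, got {len(shape)}-D: {shape}"
        )
    n, c, h, w = shape
    if n % 2 != 0:
        # Assumption: the batch holds two halves that are merged channel-wise.
        raise ValueError(f"batch size {n} cannot be split in two")
    return (n // 2, c * 2, h, w)


print(paired_batch_shape((2, 1280, 8, 8)))  # (1, 2560, 8, 8)
```

Since `torch.Size` is a tuple subclass, calling `paired_batch_shape(h.size())` just before the reshape would show which shape the UNet block actually produced.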

How are prompts supplied in the depth task?

For both the KITTI and NYU datasets the prompt is set to "a photo of",
and I don't see any prompt substitution in the training scripts.

Accelerating training with multiple GPUs

Thanks for your great work! Is there a way to train the model in parallel on multiple GPUs so that training is faster?

File not found in the training code

Thanks for your great work.
When I run the training code ./tools/train_depth_NYU.py, the module it needs to import cannot be found.
In ./tools/train_depth_NYU.py, line 25:
from model.diffusers.models.unet_2d_condition import UNet2DConditionModel
and in model/unet.py, line 9:
from model.diffusers.models.unet_2d_condition import UNet2DConditionModel

Looking forward to your reply.

About prompt in NYU dataset

I am not very familiar with depth estimation, but I am curious why class information is not added to the prompt in the NYU data-reading code, while it is present for both VOC and COCO.

DatasetDM/dataset/nyudepthv2.py

Line 88 in 7545a68

"prompt":"a photo of "}

DatasetDM/dataset/VOC2012.py

Line 218 in 7545a68

prompt = prompt_templates[0].format(self.classes_zero_shot[select_class])
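
To make the difference between the two quoted lines concrete, here is a minimal sketch. The contents of `prompt_templates` and the helper `class_prompt` are assumptions for illustration; the repository's actual template strings may differ.

```python
# Hypothetical template list; the real one in VOC2012.py may differ.
prompt_templates = ["a photo of a {}"]


def class_prompt(class_name: str) -> str:
    """Class-conditioned prompt, in the style of the VOC line above."""
    return prompt_templates[0].format(class_name)


# Fixed prompt, as in the quoted NYU line: no class name is substituted.
fixed_prompt = "a photo of "

print(repr(class_prompt("dog")))  # 'a photo of a dog'
print(repr(fixed_prompt))         # 'a photo of '
```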

Weights about Depth Estimation

Your work is fantastic. When will you be able to release the weights for depth estimation?

No module named 'torch.distributed.algorithms.join'
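
A quick way to check whether the installed environment provides this module is sketched below. To my knowledge the submodule only ships with newer PyTorch releases, so an older installation is a likely cause, but treat that as an assumption to verify against your own version. The helper name `module_available` is hypothetical.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if `name` can be located without fully importing it."""
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package (e.g. `torch`) is itself missing.
        return False


print(module_available("torch.distributed.algorithms.join"))
```

If this prints False while `module_available("torch")` is True, comparing `torch.__version__` with the version the repository was developed against would be the next step.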

Question about the added noise

Thanks for your work.
In the training stage, why is one step of noise added when training the P-decoder?
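
For readers unfamiliar with what "adding one step of noise" does, the sketch below applies the standard DDPM forward process, x_t = sqrt(a_t) * x_0 + sqrt(1 - a_t) * eps, to a toy vector. It does not explain the authors' motivation. The value 0.999 for the cumulative alpha at t = 1 is a made-up illustration, and `add_noise` is a hypothetical helper, not the repository's function.

```python
import math
import random


def add_noise(x0, alpha_bar_t, rng):
    """DDPM forward step: x_t = sqrt(a) * x0 + sqrt(1 - a) * eps, eps ~ N(0, 1)."""
    a = math.sqrt(alpha_bar_t)
    b = math.sqrt(1.0 - alpha_bar_t)
    return [a * v + b * rng.gauss(0.0, 1.0) for v in x0]


rng = random.Random(0)
x0 = [0.5, -0.5, 0.25]
# Assumed value: at t = 1 the cumulative alpha is close to 1.
x1 = add_noise(x0, 0.999, rng)
print(x1)
```

With a cumulative alpha this close to 1, x_1 stays near x_0, so the noised latent still closely represents the real image.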

question about coco dataset code

Thanks for your great work. I have a question about f_classes.append(1) at line 519 of dataset/COCO.py. Why is it not f_classes.append(classe)?

        f_classes = []
        masks = []
        for idx,(classe, segm) in enumerate(zip(classes,segms)):
            poly_mask = polygons_to_bitmask(segm, *image.shape[:2])
            if poly_mask.sum()<500:
                continue
#                 if classe!=class_id:
#                     continue
            f_classes.append(1)
            masks.append(poly_mask)
#             if len(f_classes)>0:
#                 break

PreTrained Weights

@weijiawu thanks for sharing this wonderful work. I have a few questions:

Could you please share your pretrained weights for the Cityscapes or KITTI dataset?
How long does it take to train the model from scratch on Cityscapes or KITTI?
Can the model be trained on a custom dataset? If so, what changes to the repo are needed at a high level?

Thanks in advance

gt_classes in VOC2012.py

Hello, I have a question: in VOC2012.py, why is gt_classes set to 1 rather than select_class?

instances = {}
mapper_classes = [1]
instances["gt_classes"] = torch.tensor(mapper_classes, dtype=torch.int64)
masks = []
masks.append(mask == 1)

https://github.com/showlab/DatasetDM/blob/7545a68f35d8aad37f7963c326ae7c759a9a89d2/dataset/VOC2012.py#L233C2-L234C82

./DataDiffusion/COCO_Train_5_images_t1_10layers_NoClass/Image/

Hello, thanks for your work. When I run the command "sh ./script/augmentation_coco.sh" in the terminal, the following error occurs: "FileNotFoundError: [Errno 2] No such file or directory: './DataDiffusion/COCO_Train_5_images_t1_10layers_NoClass/Image/'". How can I solve it?

Thanks in advance.
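
The error says the directory the augmentation script reads from does not exist. The sketch below checks for it up front. It assumes that an earlier data-generation step is what writes into that folder, which the issue does not confirm, and `require_dir` is a hypothetical helper, not part of the repository.

```python
from pathlib import Path


def require_dir(path: str) -> Path:
    """Return `path` as a Path, or fail early with a clearer message."""
    p = Path(path)
    if not p.is_dir():
        raise FileNotFoundError(
            f"{p} does not exist; run the step that generates the "
            "synthetic images first, or point the script at the right folder"
        )
    return p


# Usage, with the path from the error message:
# image_dir = require_dir("./DataDiffusion/COCO_Train_5_images_t1_10layers_NoClass/Image/")
```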

Main quantitative result.

First of all, thank you for this outstanding work.
I noticed that the main results in the paper compare against baselines trained on a small number of real images (100, 400, etc.). It would be even more convincing to compare the baseline and DatasetDM with training data of the same magnitude (real vs. synthetic images).

Training not working.

Hello, thanks for your work. I tried following the tutorial, but when I download the weights using git lfs from the official huggingface repo, there is no config.json associated. I think it may be due to the diffusers version being outdated for the huggingface space.

Could you please provide more details about how to retrieve the correct weights for stable diffusion to be used in your project?

Thanks in advance.

How to get sd ckpt pretrain file?

In data_generation_NYU_depth.sh, the script sets --sd_ckpt './models/ldm/stable-diffusion-v1/stable_diffusion.ckpt'. How can I get this ckpt file?
I could not find a corresponding ckpt file at https://huggingface.co/CompVis/stable-diffusion-v1-4/tree/main

KeyError for the cross-attention map in the cross-attention module

Running sh ./script/train_semantic_Cityscapes.sh to train for semantic segmentation fails with:

File "/root/project_yuxuan/DatasetDM/model/segment/transformer_decoder.py", line 813, in prepare_features
attention_maps_8s = aggregate_attention(attention_store, 8, ("up", "mid", "down"), True, select,prompts=prompts)
File "/root/project_yuxuan/DatasetDM/model/segment/transformer_decoder.py", line 532, in aggregate_attention
for item in attention_maps[f"{location}_{'cross' if is_cross else 'self'}"]:
KeyError: 'up_cross'
According to the paper, this code belongs to the transformer decoder part of the perception decoder.
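
A KeyError for 'up_cross' means the attention store holds no entry under that key when `aggregate_attention` runs. One way to narrow this down is to print which keys are present. The sketch below assumes the store is a plain dict keyed by strings such as "up_cross", as the traceback suggests; `get_attention_maps` is a hypothetical helper.

```python
def get_attention_maps(store: dict, location: str, is_cross: bool):
    """Look up attention maps and list the available keys on failure."""
    key = f"{location}_{'cross' if is_cross else 'self'}"
    if key not in store:
        raise KeyError(
            f"{key!r} not in attention store; available keys: {sorted(store)}"
        )
    return store[key]


# Toy store for illustration only.
store = {"down_cross": [1, 2], "mid_cross": [3]}
print(get_attention_maps(store, "down", True))  # [1, 2]
```

If the listed keys are empty or lack the "up" entries, the attention controller was probably not registered on those UNet blocks, which can happen when the diffusers version differs from the one the repository expects.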

About the weights of pre-trained diffusion model

Hi,

Thanks for your great work!
Could you provide some guidance about the weights of the pre-trained diffusion models that you used for training and generation? In the code, the paths of the pre-trained models are defined as "./dataset/ckpts/imagenet/xxx". It is not clear to me how I can get these weights. I would also like to know why the weights of pre-trained Stable Diffusion models are not used directly here.

Best,
Jing

P-decoder weights

Great work! Could you share the P-decoder weights?

Ran out of input

I really appreciate your great work.

When I try to torch.load the weights (https://drive.google.com/u/2/uc?id=12BzF51jwKpwmUb-jDyB5LHRrlk6DnfxR&export=download), the error "Ran out of input" occurs. Do you know the reason?

Thank you in advance.
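
"Ran out of input" is the message Python's pickle raises when it reads an empty or truncated stream, so an incomplete download is a plausible cause. That is a guess, since the issue does not show the file. The sketch below reproduces the message and inspects a file before loading; `describe_checkpoint` is a hypothetical helper.

```python
import os
import pickle


def describe_checkpoint(path: str) -> str:
    """Report basic facts about a downloaded checkpoint file."""
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        head = f.read(16)
    if size == 0:
        return "empty file: the download did not complete"
    if head.lstrip().startswith(b"<"):
        return "looks like an HTML page rather than a checkpoint"
    return f"{size} bytes, starts with {head[:4]!r}"


# Unpickling an empty byte string reproduces the same error message.
try:
    pickle.loads(b"")
except EOFError as err:
    print(err)  # Ran out of input
```

If the helper reports an empty file or an HTML page, downloading the weights again (for example through a browser) would be the first thing to try.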

About Diffuser unet_2d_blocks.py

Great work! I have a question about the code. When I use the COCO instance weights to generate images, the error "Tensors must have same number of dimensions: got 3 and 4" is raised in model/diffusers/models/unet_2d_blocks.py. I am using the latest version of the diffusers code with Stable Diffusion 1.4 and torch 1.10 + cu111. The error message is:

File "/home/xhm/Desktop/Code/Diffusion/DatasetDM/parallel_generate_Instance_COCO_AnyClass.py", line 412, in sub_processor
images_here, x_t = ptp_utils.text2image(unet,vae,tokenizer,text_encoder,scheduler, prompts, controller, num_inference_steps=NUM_DIFFUSION_STEPS, guidance_scale=5, generator=g_cpu, low_resource=LOW_RESOURCE, Train=False)
File "/home/xhm/anaconda3/envs/datasetdm/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
return func(*args, **kwargs)
File "/home/xhm/Desktop/Code/Diffusion/DatasetDM/ptp_utils.py", line 212, in text2image
latents = diffusion_step_DDM(Unet, scheduler, controller, latents, context, t, guidance_scale, low_resource)
File "/home/xhm/Desktop/Code/Diffusion/DatasetDM/ptp_utils.py", line 141, in diffusion_step_DDM
noise_pred = unet(latents_input, t, encoder_hidden_states=context)["sample"]
File "/home/xhm/anaconda3/envs/datasetdm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/xhm/Desktop/Code/Diffusion/DatasetDM/model/unet.py", line 144, in forward
sample, up_samples = upsample_block(
File "/home/xhm/anaconda3/envs/datasetdm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/xhm/Desktop/Code/Diffusion/DatasetDM/model/diffusers/models/unet_2d_blocks.py", line 2202, in forward
hidden_states = torch.cat([hidden_states, res_hidden_states], dim=1)
RuntimeError: Tensors must have same number of dimensions: got 3 and 4

I'm sorry that I can't provide a screenshot because of my computer. Thank you so much!
