showlab / datasetdm Goto Github PK

[NeurIPS2023] DatasetDM:Synthesizing Data with Perception Annotations Using Diffusion Models

Home Page: https://weijiawu.github.io/DatasetDM_page/

Python 99.12% Shell 0.88%

datasetdm's Issues

./DataDiffusion/COCO_Train_5_images_t1_10layers_NoClass/Image/

Hello, thanks for your work. When I run commands “sh ./script/augmentation_coco.sh” on the terminal， Such an error has occurred “FileNotFoundError: [Errno 2] No such file or directory: './DataDiffusion/COCO_Train_5_images_t1_10layers_NoClass/Image/”. How should I solve it.

Thanks in advance.

About prompt in NYU dataset

I am not very familiar with depth estimation, but I am curious why class information is not added to prompt in NYU data reading code, while it is present in both VOC and COCO.

DatasetDM/dataset/nyudepthv2.py

Line 88 in 7545a68

"prompt":"a photo of "}

DatasetDM/dataset/VOC2012.py

Line 218 in 7545a68

prompt = prompt_templates[0].format(self.classes_zero_shot[select_class])

Training not working.

Hello, thanks for your work. I tried following the tutorial, but when I download the weights using git lfs from the official huggingface repo, there is no config.json associated. I think it may be due to the diffusers version being outdated for the huggingface space.

Could you please provide more details about how to retrieve the correct weights for stable diffusion to be used in your project?

Thanks in advance.

gt_classes in VOC2012.py

你好，有个问题想请教一下，在VOC2012.py中gt_classes为什么是1而不是select_class？

instances = {}
mapper_classes = [1]
instances["gt_classes"] = torch.tensor(mapper_classes, dtype=torch.int64)
masks = []
masks.append(mask == 1)

https://github.com/showlab/DatasetDM/blob/7545a68f35d8aad37f7963c326ae7c759a9a89d2/dataset/VOC2012.py#L233C2-L234C82 #

In the train code，file not found，

Thanks for your great work.
When I run the training code ./tools/train_depth_NYU.py, I find that the file that needs to be imported cannot be found，
in ./tools/train_depth_NYU.py file line 25
from model.diffusers.models.unet_2d_condition import UNet2DConditionModel
and in model/unet.py line9
from model.diffusers.models.unet_2d_condition import UNet2DConditionModel

Looking forward to your reply.

dataset mode not defined while training

while training the dataset, i find out some question that maybe due to some code wrong or miss, the traceback are below.

Sorry for bothering you these time.

Weights about Depth Estimation

Your work is fantastic. I would like to ask when you can release the weights for depth estimation

Error in init_latent in ptp_utils.py during training

When I use the command sh ./script/train_semantic_VOC.sh to initiate training.

In train_semantic_voc.py, the latents obtained from latents = vae.encode(image.to(device)).latent_dist.sample().detach() have the dimensions ([1, 4, 32, 32]).

In the function call:
images_here, x_t = ptp_utils.text2image(unet, vae, tokenizer, text_encoder, noise_scheduler, prompts, controller, latent=start_code, num_inference_steps=NUM_DIFFUSION_STEPS, guidance_scale=5, generator=g_cpu, low_resource=LOW_RESOURCE, Train=True)
the parameter latent=start_code has the dimensions ([1, 4, 32, 32]).

Eventually, an error occurs in the function def init_latent(latent, unet, height, width, generator, batch_size) in ptp_utils.py with the following message:
RuntimeError: The expanded size of the tensor (64) must match the existing size (32) at non-singleton dimension 3. Target sizes: [1, 4, 64, 64]. Tensor sizes: [1, 4, 32, 32]

Is it expected that the dimensions of latents obtained from vae.encode(image.to(device)).latent_dist.sample().detach() should be ([1, 4, 64, 64])?

Sorry for bothering you these time.

adapt the synthetic data to Mask2Fomer model

Now i have succeeded in generating synthetic data and data augmentation. The result is image with mask in txt mode, but the Mask2Former needs the json mode. I wonder how you solve this problem, you change the mode of the txt file or there are some other methods to link these two models.

generate_instance_coco

where can i find the checkpoint/Train_10_images_t1_attention_transformer_COCO_10layers_NoClass/latest_checkpoint.pth

How did you supply prompts in depth task?

In both kitti and nyu datasets prompt set as "a photo of"
And i don't see any substitution of prompt in training scripts

About Diffuser unet_2d_blocks.py

Great work！！ I have a question about code. When I use COCO Instance weights to generate images, the error "Tensors must have same number of dimensions: got 3 and 4" will be reported in model/diffusers/models/unet_2d_blocks.py. I am using the latest version of diffusers code and stable diffusion-1.4 , torch-1.10 + cu111 . The following is my error message:

File "/home/xhm/Desktop/Code/Diffusion/DatasetDM/parallel_generate_Instance_COCO_AnyClass.py", line 412, in sub_processor
images_here, x_t = ptp_utils.text2image(unet,vae,tokenizer,text_encoder,scheduler, prompts, controller, num_inference_steps=NUM_DIFFUSION_STEPS, guidance_scale=5, generator=g_cpu, low_resource=LOW_RESOURCE, Train=False)
File "/home/xhm/anaconda3/envs/datasetdm/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
return func(*args, **kwargs)
File "/home/xhm/Desktop/Code/Diffusion/DatasetDM/ptp_utils.py", line 212, in text2image
latents = diffusion_step_DDM(Unet, scheduler, controller, latents, context, t, guidance_scale, low_resource)
File "/home/xhm/Desktop/Code/Diffusion/DatasetDM/ptp_utils.py", line 141, in diffusion_step_DDM
noise_pred = unet(latents_input, t, encoder_hidden_states=context)["sample"]
File "/home/xhm/anaconda3/envs/datasetdm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/xhm/Desktop/Code/Diffusion/DatasetDM/model/unet.py", line 144, in forward
sample, up_samples = upsample_block(
File "/home/xhm/anaconda3/envs/datasetdm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/xhm/Desktop/Code/Diffusion/DatasetDM/model/diffusers/models/unet_2d_blocks.py", line 2202, in forward
hidden_states = torch.cat([hidden_states, res_hidden_states], dim=1)
RuntimeError: Tensors must have same number of dimensions: got 3 and 4

I'm really sorry, I can't take a screenshot due to my computer.Thank you so much！！！

在VOC2012.py中gt_classes为什么是1而不是select_class？

你好，有个问题想请教一下，在VOC2012.py中gt_classes为什么是1而不是select_class？

instances = {}
mapper_classes = [1]
instances["gt_classes"] = torch.tensor(mapper_classes, dtype=torch.int64)
masks = []
masks.append(mask == 1)

https://github.com/showlab/DatasetDM/blob/7545a68f35d8aad37f7963c326ae7c759a9a89d2/dataset/VOC2012.py#L233C2-L234C82 #

exception when trying to generate data

tried running:
sh script/data_generation_coco_instance.sh

and got this error:

Traceback (most recent call last):
File "/whistler/miniconda3/envs/DatasetDM/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
self.run()
File "/whistler/miniconda3/envs/DatasetDM/lib/python3.8/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/whistler/DatasetDM/tools/parallel_generate_Instance_COCO_AnyClass.py", line 284, in sub_processor
scheduler = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4", use_auth_token=MY_TOKEN).to(device).scheduler
File "/whistler/DatasetDM/model/diffusers/pipeline_utils.py", line 377, in from_pretrained
load_method = getattr(class_obj, load_method_name)
TypeError: getattr(): attribute name must be string

doing data augmentation with coco2017 dataset but no image or mask generate

The code works well and without mistake, but after finishing the task, i can only find two directories are made without anything generates. i'm checking the solutions on the internet but find none of them matches the problem. I wonder if anyone solved this question. or before doing the augmentation is there anything else have to do.

sh ./script/augmentation_coco.sh

here is the code.

P-decoder weights

greate work! could you offer the p-decoder weights?

How to get sd ckpt pretrain file?

In data_generation_NYU_depth.sh, --sd_ckpt './models/ldm/stable-diffusion-v1/stable_diffusion.ckpt', how to get this ckpt file。
No corresponding ckpt file found in the https://huggingface.co/CompVis/stable-diffusion-v1-4/tree/main

Pixel decoder design

Dear author,

Thanks for your work.

I have a little confusion about the design of the pixel decoder. “the pixel decoder consists of several straightforward up-sampling layers. Each layer comprises four types of computations 1）1x1 conv 2）up sample 3）concat 4）mixed conv”

I want to confirm after upsampling to the spatial size of 64x64 and fusing the 64x64 feature from the unet, how to get the per-pixel embedding say 512x512? Just do three more layers using 1) 2) and 4) without 3)?

Looking forward to your reply.

PreTrained Weights

@weijiawu thanks for sharing this wonderful lwork just had few queries

can you please share ur pretrained weights for cityscapes or kitti dataset
How much time does take to train th emodel from scratch for cityscapes or kitti dataset
Can we train the model with custom dataseet ? if so what are the overview changes to be made in the repo

Thanks in advance

Ran out of input

I really appreciate your great work.

When I try to torch.load of weights (https://drive.google.com/u/2/uc?id=12BzF51jwKpwmUb-jDyB5LHRrlk6DnfxR&export=download), the error "Ran out of input" occurs. Do you know the reason?

Thank you in advance.

Why is mapper_classes = [1] in VOC2012.py?

In VOC2012.py,
since mapper_classes = [1]
and dataset_dict["classes_str"] = [self.classes[el] for el in mapper_classes],
this means dataset_dict["classes_str"] is only [aeroplane].

In the code for generating semantic segmentation labels in train_semantic_voc.py,
outputs = seg_model(diffusion_features, controller, prompts, tokenizer, text_embeddings)
the parameter 'text_imbedding' is generated by 'class_name'
class_name is dataset_dict["classes_str"],
which means class_name is [aeroplane]. Is this correct?

Shouldn't text_embedding be generated from batch["prompt"]?

Main quantitative result.

First of all, thank you for coming up with such an outstanding job.
In addition, I saw that the main results in the paper are compared with the baseline trained with a small amount of real images(100,400, et.al). It will be a better job if there a comparison between baseline and DatasetDM in the case of training data of the same magnitude (real v.s.synthetic image).

No module named 'torch.distributed.algorithms.join

About the weights of pre-trained diffusion model

Hi,

Thanks for your great work!
Could you provide some guidance about the weights of pre-trained diffusion models which you used for training and generating? In the code, the pathes of pre-trained models are defined by "./dataset/ckpts/imagenet/xxx". I am not very clear about how I can get these weights. And I also want to know why not use the weights of pre-trained stable diffusion models directly here.

Best,
Jing

prepare NYU dataset

I follow MED to prepare the dataset on ./data and the folder structure is as follows.

│ ├── nyu
│ │ ├── basement_0001a
│ │ ├── basement_0001b
│ │ ├── ... (all scene names)
│ │ ├── split_file.txt

The nyu depth of the paper requires the following structure. What code should I refer to?

 nyudepthv2/
sync/
official_splits/
	test/
nyu_class_list.json
train_list.txt
test_list.txt

Accelerate training via training on multiple gpus

Thanks for your great work! May I wonder whether there is a way to train the model parallelly on multiple gpus so that the training could be speeded up?

Can't reproduce the results in other COCO-format instance segmentation dataset

Thanks for your great job!
I tried to train a P-Decoder for instance segmentation tasks in another COCO-format dataset and the mask results of P-Decoder seemed very poor. The dataset I used is CIS(Construction Instance Segmentation Dataset) and the total training epochs are 5000. Here is some results of P-Decoder in different stages(epochs), all of them seem to fail in annotating.

I just adjust the code for dataset loading, and all the training codes remains the same.
Are there extra tricks you used in the training process like warm-up? Thank you in advance!

The cross attention module part I faced the problem of cross attention map

In the command: sh ./script/train_semantic_Cityscapes.sh to train the semantic result

File "/root/project_yuxuan/DatasetDM/model/segment/transformer_decoder.py", line 813, in prepare_features
attention_maps_8s = aggregate_attention(attention_store, 8, ("up", "mid", "down"), True, select,prompts=prompts)
File "/root/project_yuxuan/DatasetDM/model/segment/transformer_decoder.py", line 532, in aggregate_attention
for item in attention_maps[f"{location}{'cross' if is_cross else 'self'}"]:
KeyError: 'up_cross'
I found that in the paper . it is the transformer decoder part of preception decoder.

prompt txt files

Hello, thanks for your work. What is prompt txt files and how to obtain them.

Thanks in advance.

question about coco dataset code

Thanks for your great job. I have question about f_classes.append(1) in dataset/COCO.py 519 line. Why there is not f_classes.append(classe)?

        f_classes = []
        masks = []
        for idx,(classe, segm) in enumerate(zip(classes,segms)):
            poly_mask = polygons_to_bitmask(segm, *image.shape[:2])
            if poly_mask.sum()<500:
                continue
#                 if classe!=class_id:
#                     continue
            f_classes.append(1)
            masks.append(poly_mask)
#             if len(f_classes)>0:
#                 break

Train nyu dataset suffer error

python train_depth_NYU.py --save_name Train_250_images_t1_attention_transformer_NYU_10layers_NOAug
--config /workspace/my_code/DatasetDM/config/NYU/NYU_Depth.yaml
--image_limitation 250

Traceback (most recent call last):
File "/root/anaconda3/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/workspace/my_code/DatasetDM/ptp_utils.py", line 212, in text2image
latents = diffusion_step_DDM(Unet, scheduler, controller, latents, context, t, guidance_scale, low_resource)
File "/workspace/my_code/DatasetDM/ptp_utils.py", line 141, in diffusion_step_DDM
noise_pred = unet(latents_input, t, encoder_hidden_states=context)["sample"]
File "/root/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "/workspace/my_code/DatasetDM/model/unet.py", line 153, in forward
reshape_h=h.reshape(int(h.size()[0]/2),int(h.size()[1]*2),h.size()[2],h.size()[3])
IndexError: tuple index out of range

Confusion about the quality of the generated images

Thank you for providing such excellent work! I have a question I would like to consult with you.

I noticed that if I input the GPT4 generated prompt directly into the Stable Diffusion v1.5, the resulting images may not be of good quality (e.g. missing objects, object relation errors). Have you done any post-processing on the generated images? (Not mentioned in the paper architecture diagram Fig.2)

I'm looking forward to your answers to these questions. Thank you.

CUDA out of memory - on 12 GB GPU

Are there any ways to optimize the process so that the OOM error can be avoided?

add noise problem

thanks for your work.
in trainng stage,why add 1 step noise to train P-decoder

showlab / datasetdm Goto Github PK

datasetdm's Issues

Recommend Projects

Recommend Topics

Recommend Org