showlab / datasetdm
[NeurIPS 2023] DatasetDM: Synthesizing Data with Perception Annotations Using Diffusion Models
Home Page: https://weijiawu.github.io/DatasetDM_page/
Hello, thanks for your work. When I run the command sh ./script/augmentation_coco.sh in the terminal, the following error occurs: FileNotFoundError: [Errno 2] No such file or directory: './DataDiffusion/COCO_Train_5_images_t1_10layers_NoClass/Image/'. How should I solve it?
Thanks in advance.
I am not very familiar with depth estimation, but I am curious why class information is not added to the prompt in the NYU data-loading code, while it is present in both VOC and COCO.
See DatasetDM/dataset/nyudepthv2.py, line 88 and line 218 (commit 7545a68).
Hello, thanks for your work. I tried following the tutorial, but when I download the weights using git lfs from the official Hugging Face repo, no config.json comes with them. I think this may be because the diffusers version used by the Hugging Face space is outdated.
Could you please provide more details about how to retrieve the correct Stable Diffusion weights for use with your project?
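For reference, this is the standard diffusers loading I am attempting (a generic sketch; the local clone path below is my own, not something from the repo's docs):

# generic diffusers loading sketch; "./stable-diffusion-v1-4" is my own
# local git-lfs clone path, not a path defined by DatasetDM
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("./stable-diffusion-v1-4")
# loading by hub id only works when the repo layout includes the
# per-component config.json files that from_pretrained expects:
# pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")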
Thanks in advance.
Hello, I have a question: in VOC2012.py, why is gt_classes 1 rather than select_class?
instances = {}
mapper_classes = [1]
instances["gt_classes"] = torch.tensor(mapper_classes, dtype=torch.int64)
masks = []
masks.append(mask == 1)
Thanks for your great work.
When I run the training code ./tools/train_depth_NYU.py, I find that the module to be imported cannot be found,
in ./tools/train_depth_NYU.py, line 25:
from model.diffusers.models.unet_2d_condition import UNet2DConditionModel
and in model/unet.py, line 9:
from model.diffusers.models.unet_2d_condition import UNet2DConditionModel
Looking forward to your reply.
Your work is fantastic. May I ask when you will release the weights for depth estimation?
When I use the command sh ./script/train_semantic_VOC.sh to start training, in train_semantic_voc.py the latents obtained from latents = vae.encode(image.to(device)).latent_dist.sample().detach() have the dimensions ([1, 4, 32, 32]).
In the function call:
images_here, x_t = ptp_utils.text2image(unet, vae, tokenizer, text_encoder, noise_scheduler, prompts, controller, latent=start_code, num_inference_steps=NUM_DIFFUSION_STEPS, guidance_scale=5, generator=g_cpu, low_resource=LOW_RESOURCE, Train=True)
the parameter latent=start_code has the dimensions ([1, 4, 32, 32]).
Eventually, an error occurs in the function def init_latent(latent, unet, height, width, generator, batch_size) in ptp_utils.py with the following message:
RuntimeError: The expanded size of the tensor (64) must match the existing size (32) at non-singleton dimension 3. Target sizes: [1, 4, 64, 64]. Tensor sizes: [1, 4, 32, 32]
Is it expected that the dimensions of latents obtained from vae.encode(image.to(device)).latent_dist.sample().detach() should be ([1, 4, 64, 64])?
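For context, here is a minimal sketch of why I suspect the input resolution (my own reasoning, not the repo's code): the SD v1 VAE downsamples by a factor of 8, so the latent size follows directly from the image size.

# minimal sketch (not DatasetDM code): the SD v1 VAE downsamples by 8,
# so a 512x512 image encodes to [1, 4, 64, 64] and a 256x256 image to
# [1, 4, 32, 32], which would explain the mismatch I see
import torch
from diffusers import AutoencoderKL

vae = AutoencoderKL.from_pretrained("CompVis/stable-diffusion-v1-4", subfolder="vae")

with torch.no_grad():
    print(vae.encode(torch.randn(1, 3, 512, 512)).latent_dist.sample().shape)  # [1, 4, 64, 64]
    print(vae.encode(torch.randn(1, 3, 256, 256)).latent_dist.sample().shape)  # [1, 4, 32, 32]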
Sorry to bother you again.
I have now succeeded in generating synthetic data and doing data augmentation. The result is images with masks in txt format, but Mask2Former needs json format. I wonder how you solved this problem: did you convert the txt files, or is there some other method to link the two models?
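To illustrate, this is the kind of txt-to-json conversion I am considering (my own sketch with pycocotools, not code from this repo; mask_to_coco_ann is a hypothetical helper):

# my own sketch (pycocotools), not DatasetDM code: turn one binary
# instance mask into a COCO-style annotation dict for a json file
import numpy as np
from pycocotools import mask as mask_utils

def mask_to_coco_ann(binary_mask, image_id, ann_id, category_id):
    rle = mask_utils.encode(np.asfortranarray(binary_mask.astype(np.uint8)))
    rle["counts"] = rle["counts"].decode("utf-8")  # make RLE json-serializable
    return {
        "id": ann_id,
        "image_id": image_id,
        "category_id": category_id,
        "segmentation": rle,
        "bbox": mask_utils.toBbox(rle).tolist(),
        "area": float(mask_utils.area(rle)),
        "iscrowd": 0,
    }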
Where can I find checkpoint/Train_10_images_t1_attention_transformer_COCO_10layers_NoClass/latest_checkpoint.pth?
In both the KITTI and NYU datasets the prompt is set to "a photo of",
and I don't see any substitution of the prompt in the training scripts.
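To be concrete, this is the kind of substitution I expected to find (hypothetical; scene_class is my own placeholder, not a variable in the repo):

scene_class = "kitchen"                  # hypothetical placeholder value
prompt = "a photo of"                    # what the kitti/nyu loaders set
prompt = f"a photo of a {scene_class}"   # the class-aware variant I expected, as in VOC/COCO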
Great work!! I have a question about the code. When I use the COCO Instance weights to generate images, the error "Tensors must have same number of dimensions: got 3 and 4" is reported in model/diffusers/models/unet_2d_blocks.py. I am using the latest version of the diffusers code with Stable Diffusion 1.4 and torch 1.10 + cu111. The following is my error message:
File "/home/xhm/Desktop/Code/Diffusion/DatasetDM/parallel_generate_Instance_COCO_AnyClass.py", line 412, in sub_processor
images_here, x_t = ptp_utils.text2image(unet,vae,tokenizer,text_encoder,scheduler, prompts, controller, num_inference_steps=NUM_DIFFUSION_STEPS, guidance_scale=5, generator=g_cpu, low_resource=LOW_RESOURCE, Train=False)
File "/home/xhm/anaconda3/envs/datasetdm/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
return func(*args, **kwargs)
File "/home/xhm/Desktop/Code/Diffusion/DatasetDM/ptp_utils.py", line 212, in text2image
latents = diffusion_step_DDM(Unet, scheduler, controller, latents, context, t, guidance_scale, low_resource)
File "/home/xhm/Desktop/Code/Diffusion/DatasetDM/ptp_utils.py", line 141, in diffusion_step_DDM
noise_pred = unet(latents_input, t, encoder_hidden_states=context)["sample"]
File "/home/xhm/anaconda3/envs/datasetdm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/xhm/Desktop/Code/Diffusion/DatasetDM/model/unet.py", line 144, in forward
sample, up_samples = upsample_block(
File "/home/xhm/anaconda3/envs/datasetdm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/xhm/Desktop/Code/Diffusion/DatasetDM/model/diffusers/models/unet_2d_blocks.py", line 2202, in forward
hidden_states = torch.cat([hidden_states, res_hidden_states], dim=1)
RuntimeError: Tensors must have same number of dimensions: got 3 and 4
I'm really sorry, I can't attach a screenshot from my computer. Thank you so much!!!
I tried running:
sh script/data_generation_coco_instance.sh
and got this error:
Traceback (most recent call last):
File "/whistler/miniconda3/envs/DatasetDM/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
self.run()
File "/whistler/miniconda3/envs/DatasetDM/lib/python3.8/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/whistler/DatasetDM/tools/parallel_generate_Instance_COCO_AnyClass.py", line 284, in sub_processor
scheduler = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4", use_auth_token=MY_TOKEN).to(device).scheduler
File "/whistler/DatasetDM/model/diffusers/pipeline_utils.py", line 377, in from_pretrained
load_method = getattr(class_obj, load_method_name)
TypeError: getattr(): attribute name must be string
The code runs without errors, but after the task finishes I can only find two directories created, with nothing generated inside them. I have been checking for solutions on the internet but found none that match this problem. I wonder if anyone has solved this, or whether there is anything else that has to be done before the augmentation.
Here is the command:
sh ./script/augmentation_coco.sh
Great work! Could you offer the P-decoder weights?
In data_generation_NYU_depth.sh there is --sd_ckpt './models/ldm/stable-diffusion-v1/stable_diffusion.ckpt'. How can I get this ckpt file?
No corresponding ckpt file is found at https://huggingface.co/CompVis/stable-diffusion-v1-4/tree/main.
Dear author,
Thanks for your work.
I am a little confused about the design of the pixel decoder. The paper says: "the pixel decoder consists of several straightforward up-sampling layers. Each layer comprises four types of computations: 1) 1x1 conv, 2) up-sample, 3) concat, 4) mixed conv."
I want to confirm: after upsampling to a spatial size of 64x64 and fusing the 64x64 feature from the UNet, how do you get the per-pixel embedding at, say, 512x512? Just three more layers using 1), 2), and 4), without 3)?
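Here is a minimal sketch of my understanding (UpLayer and all channel sizes are my own assumptions, not names from the repo):

# my own reading as a sketch; one pixel-decoder layer as I understand it:
# 1) 1x1 conv -> 2) up-sample -> 3) concat with UNet feature -> 4) mixed conv
import torch
import torch.nn as nn
import torch.nn.functional as F

class UpLayer(nn.Module):
    def __init__(self, in_ch, skip_ch, out_ch):
        super().__init__()
        self.reduce = nn.Conv2d(in_ch, out_ch, kernel_size=1)         # 1) 1x1 conv
        self.mix = nn.Conv2d(out_ch + skip_ch, out_ch, 3, padding=1)  # 4) mixed conv

    def forward(self, x, skip=None):
        x = self.reduce(x)
        x = F.interpolate(x, scale_factor=2, mode="bilinear", align_corners=False)  # 2) up-sample
        if skip is not None:
            x = torch.cat([x, skip], dim=1)                           # 3) concat
        return self.mix(x)

# from 64x64 up to 512x512, my guess is three more such layers with
# skip=None (skip_ch=0), i.e. steps 1), 2), 4) without 3)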
Looking forward to your reply.
@weijiawu, thanks for sharing this wonderful work! I just had a few queries.
Thanks in advance.
I really appreciate your great work.
When I try to torch.load the weights (https://drive.google.com/u/2/uc?id=12BzF51jwKpwmUb-jDyB5LHRrlk6DnfxR&export=download), the error "Ran out of input" occurs. Do you know the reason?
Thank you in advance.
In VOC2012.py,
since mapper_classes = [1]
and dataset_dict["classes_str"] = [self.classes[el] for el in mapper_classes],
this means dataset_dict["classes_str"] is only [aeroplane].
In the code for generating semantic segmentation labels in train_semantic_voc.py,
outputs = seg_model(diffusion_features, controller, prompts, tokenizer, text_embeddings)
the parameter text_embeddings is generated from class_name,
class_name is dataset_dict["classes_str"],
which means class_name is [aeroplane]. Is this correct?
Shouldn't text_embedding be generated from batch["prompt"]?
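For reference, my understanding of how that embedding is produced (a generic CLIP text-encoder sketch; this is my reading, not the exact DatasetDM code):

# generic sketch of building text_embeddings from class_name with a CLIP
# text encoder; my reading, not the exact repo code
from transformers import CLIPTokenizer, CLIPTextModel

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")

class_name = ["aeroplane"]  # dataset_dict["classes_str"] from above
tokens = tokenizer(class_name, padding="max_length",
                   max_length=tokenizer.model_max_length,
                   truncation=True, return_tensors="pt")
text_embeddings = text_encoder(tokens.input_ids)[0]  # shape [1, 77, 768]
# my question: should class_name here be batch["prompt"] instead?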
First of all, thank you for coming up with such an outstanding work.
In addition, I saw that the main results in the paper are compared with baselines trained on a small number of real images (100, 400, etc.). It would be even better if there were a comparison between the baseline and DatasetDM with training data of the same magnitude (real vs. synthetic images).
Hi,
Thanks for your great work!
Could you provide some guidance about the weights of the pre-trained diffusion models you used for training and generation? In the code, the paths of the pre-trained models are defined as "./dataset/ckpts/imagenet/xxx". I am not very clear about how to get these weights. I also want to know why the weights of pre-trained Stable Diffusion models are not used directly here.
Best,
Jing
I followed MED to prepare the dataset under ./data, and the folder structure is as follows.
│ ├── nyu
│ │ ├── basement_0001a
│ │ ├── basement_0001b
│ │ ├── ... (all scene names)
│ │ ├── split_file.txt
The NYU depth setup in the paper requires the following structure. What code should I refer to?
nyudepthv2/
├── sync/
├── official_splits/
│   └── test/
├── nyu_class_list.json
├── train_list.txt
└── test_list.txt
Thanks for your great work! May I ask whether there is a way to train the model in parallel on multiple GPUs so that training can be sped up?
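For example, something like the generic torch DistributedDataParallel sketch below is what I have in mind (my own code, not from this repo; train_script.py is a placeholder):

# generic DDP sketch (not DatasetDM code); launch with
#   torchrun --nproc_per_node=<num_gpus> train_script.py
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def wrap_ddp(model):
    dist.init_process_group(backend="nccl")
    local_rank = dist.get_rank() % torch.cuda.device_count()
    torch.cuda.set_device(local_rank)
    return DDP(model.cuda(local_rank), device_ids=[local_rank])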
Thanks for your great job!
I tried to train a P-Decoder for instance segmentation on another COCO-format dataset, and the mask results of the P-Decoder seemed very poor. The dataset I used is CIS (Construction Instance Segmentation), and the total number of training epochs is 5000. Here are some results of the P-Decoder at different stages (epochs); all of them seem to fail at annotating.
I only adjusted the code for dataset loading; all the training code remains the same.
Are there extra tricks you used in the training process, like warm-up? Thank you in advance!
When I run the command sh ./script/train_semantic_Cityscapes.sh to train the semantic model, I get:
File "/root/project_yuxuan/DatasetDM/model/segment/transformer_decoder.py", line 813, in prepare_features
attention_maps_8s = aggregate_attention(attention_store, 8, ("up", "mid", "down"), True, select,prompts=prompts)
File "/root/project_yuxuan/DatasetDM/model/segment/transformer_decoder.py", line 532, in aggregate_attention
for item in attention_maps[f"{location}{'cross' if is_cross else 'self'}"]:
KeyError: 'up_cross'
I found that in the paper this is the transformer decoder part of the perception decoder.
Hello, thanks for your work. What are the prompt txt files and how can I obtain them?
Thanks in advance.
Thanks for your great job. I have a question about f_classes.append(1) in dataset/COCO.py, line 519. Why is it not f_classes.append(classe)?
f_classes = []
masks = []
for idx, (classe, segm) in enumerate(zip(classes, segms)):
    poly_mask = polygons_to_bitmask(segm, *image.shape[:2])
    if poly_mask.sum() < 500:
        continue
    # if classe != class_id:
    #     continue
    f_classes.append(1)
    masks.append(poly_mask)
    # if len(f_classes) > 0:
    #     break
python train_depth_NYU.py --save_name Train_250_images_t1_attention_transformer_NYU_10layers_NOAug \
    --config /workspace/my_code/DatasetDM/config/NYU/NYU_Depth.yaml \
    --image_limitation 250
Traceback (most recent call last):
File "/root/anaconda3/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/workspace/my_code/DatasetDM/ptp_utils.py", line 212, in text2image
latents = diffusion_step_DDM(Unet, scheduler, controller, latents, context, t, guidance_scale, low_resource)
File "/workspace/my_code/DatasetDM/ptp_utils.py", line 141, in diffusion_step_DDM
noise_pred = unet(latents_input, t, encoder_hidden_states=context)["sample"]
File "/root/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "/workspace/my_code/DatasetDM/model/unet.py", line 153, in forward
reshape_h=h.reshape(int(h.size()[0]/2),int(h.size()[1]*2),h.size()[2],h.size()[3])
IndexError: tuple index out of range
Thank you for providing such excellent work! I have a question I would like to consult you about: are there any ways to optimize the process so that the OOM error can be avoided?
I'm looking forward to your answer. Thank you.
Thanks for your work.
In the training stage, why is one step of noise added to train the P-decoder?
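My understanding of the step I am asking about, as a sketch (using the diffusers scheduler API; t = 1 is my assumption based on the "t1" naming in the scripts, not verified):

# sketch of my understanding, not DatasetDM code; t = 1 is my assumption
import torch
from diffusers import DDPMScheduler

scheduler = DDPMScheduler(num_train_timesteps=1000)
latents = torch.randn(1, 4, 64, 64)   # stand-in for VAE-encoded latents
noise = torch.randn_like(latents)
t = torch.tensor([1])                 # a single, early denoising step
noisy_latents = scheduler.add_noise(latents, noise, t)
# as I read it, the UNet features extracted from noisy_latents at this
# one step are what the P-decoder is trained on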