showlab / datasetdm
[NeurIPS2023] DatasetDM: Synthesizing Data with Perception Annotations Using Diffusion Models
Home Page: https://weijiawu.github.io/DatasetDM_page/
Thank you for providing such excellent work! I have a question I would like to consult with you.
I'm looking forward to your answers to these questions. Thank you.
Hello, thanks for your work. What are the prompt txt files, and how can I obtain them?
Thanks in advance.
Dear author,
Thanks for your work.
I am a little confused about the design of the pixel decoder: "the pixel decoder consists of several straightforward up-sampling layers. Each layer comprises four types of computations: 1) 1x1 conv, 2) up-sample, 3) concat, 4) mixed conv."
I want to confirm: after up-sampling to a spatial size of 64x64 and fusing the 64x64 feature from the UNet, how do you obtain the per-pixel embedding at, say, 512x512? Do you just apply three more layers using 1), 2), and 4), without 3)?
Looking forward to your reply.
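The arithmetic behind this question can be sketched as follows, assuming each up-sampling layer doubles the spatial size. The factor of 2 and the function name are assumptions for illustration and are not taken from the DatasetDM code. Under that assumption, going from 64x64 to 512x512 takes three doublings.

```python
def upsample_steps(src_size: int, dst_size: int, factor: int = 2) -> int:
    """Count how many up-sampling layers of a given factor map src_size to dst_size."""
    if src_size <= 0 or factor < 2:
        raise ValueError("src_size must be positive and factor at least 2")
    steps, size = 0, src_size
    while size < dst_size:
        size *= factor
        steps += 1
    if size != dst_size:
        # Also covers dst_size < src_size, which up-sampling cannot reach.
        raise ValueError(f"{dst_size} is not reachable from {src_size} by x{factor} steps")
    return steps


print(upsample_steps(64, 512))  # 64 -> 128 -> 256 -> 512
```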
python train_depth_NYU.py --save_name Train_250_images_t1_attention_transformer_NYU_10layers_NOAug
--config /workspace/my_code/DatasetDM/config/NYU/NYU_Depth.yaml
--image_limitation 250
Traceback (most recent call last):
File "/root/anaconda3/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/workspace/my_code/DatasetDM/ptp_utils.py", line 212, in text2image
latents = diffusion_step_DDM(Unet, scheduler, controller, latents, context, t, guidance_scale, low_resource)
File "/workspace/my_code/DatasetDM/ptp_utils.py", line 141, in diffusion_step_DDM
noise_pred = unet(latents_input, t, encoder_hidden_states=context)["sample"]
File "/root/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "/workspace/my_code/DatasetDM/model/unet.py", line 153, in forward
reshape_h=h.reshape(int(h.size()[0]/2),int(h.size()[1]*2),h.size()[2],h.size()[3])
IndexError: tuple index out of range
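One way to narrow down this kind of IndexError is to check the tensor rank before the reshape. The line at model/unet.py:153 reads h.size()[3], so it needs a 4-D tensor; a 3-D h would fail at that index. The sketch below works on plain shape tuples rather than tensors, and the helper name is hypothetical; it is not part of the repository.

```python
from typing import Tuple


def halve_batch_double_channels(shape: Tuple[int, ...]) -> Tuple[int, int, int, int]:
    """Mirror reshape(N/2, C*2, H, W) on a shape tuple, with an explicit rank check."""
    if len(shape) != 4:
        raise ValueError(f"expected a 4-D (N, C, H, W) shape, got {len(shape)}-D: {shape}")
    n, c, h, w = shape
    if n % 2 != 0:
        # Stricter than int(N/2) in the traceback line: an odd batch cannot be split evenly.
        raise ValueError(f"batch size {n} cannot be halved evenly")
    return (n // 2, c * 2, h, w)


print(halve_batch_double_channels((2, 320, 64, 64)))
```

Printing h.shape just before the failing line would show whether the input reaching the UNet has the expected four dimensions.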
In both the KITTI and NYU datasets, the prompt is set to "a photo of",
and I don't see any substitution of the prompt in the training scripts.
Thanks for your great work! Is there a way to train the model in parallel on multiple GPUs to speed up training?
Thanks for your great work.
When I run the training code ./tools/train_depth_NYU.py, a module that it imports cannot be found.
In ./tools/train_depth_NYU.py, line 25:
from model.diffusers.models.unet_2d_condition import UNet2DConditionModel
and in model/unet.py, line 9:
from model.diffusers.models.unet_2d_condition import UNet2DConditionModel
Looking forward to your reply.
I am not very familiar with depth estimation, but I am curious why class information is not added to the prompt in the NYU data-reading code, while it is present for both VOC and COCO.
DatasetDM/dataset/nyudepthv2.py
Line 88 in 7545a68
Line 218 in 7545a68
Your work is fantastic. May I ask when you can release the weights for depth estimation?
Thanks for your work.
In the training stage, why is one step of noise added to train the P-decoder?
Thanks for your great work. I have a question about f_classes.append(1)
in dataset/COCO.py, line 519. Why is it not f_classes.append(classe)?
f_classes = []
masks = []
for idx, (classe, segm) in enumerate(zip(classes, segms)):
    poly_mask = polygons_to_bitmask(segm, *image.shape[:2])
    if poly_mask.sum() < 500:
        continue
    # if classe != class_id:
    #     continue
    f_classes.append(1)
    masks.append(poly_mask)
    # if len(f_classes) > 0:
    #     break
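To make the question concrete, the sketch below contrasts the two variants on toy data: appending the constant 1 yields class-agnostic labels, while appending classe keeps each instance's category. Masks are represented only by their pixel areas so that the example stays self-contained. The function and its parameters are hypothetical stand-ins, not the code in dataset/COCO.py.

```python
from typing import List, Tuple


def filter_instances(classes: List[int], areas: List[int],
                     min_area: int = 500,
                     class_agnostic: bool = True) -> Tuple[List[int], List[int]]:
    """Drop instances smaller than min_area and build the label list."""
    labels, kept = [], []
    for idx, (classe, area) in enumerate(zip(classes, areas)):
        if area < min_area:
            continue  # same threshold as poly_mask.sum() < 500 in the snippet
        # class_agnostic=True mirrors f_classes.append(1); False mirrors append(classe).
        labels.append(1 if class_agnostic else classe)
        kept.append(idx)
    return labels, kept


print(filter_instances([3, 17, 5], [1200, 100, 800]))
print(filter_instances([3, 17, 5], [1200, 100, 800], class_agnostic=False))
```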
@weijiawu thanks for sharing this wonderful work. I just had a few queries.
Thanks in advance
Hello, I have a question: in VOC2012.py, why is gt_classes set to 1 rather than select_class?
instances = {}
mapper_classes = [1]
instances["gt_classes"] = torch.tensor(mapper_classes, dtype=torch.int64)
masks = []
masks.append(mask == 1)
Hello, thanks for your work. When I run the command "sh ./script/augmentation_coco.sh" in the terminal, the following error occurs: "FileNotFoundError: [Errno 2] No such file or directory: './DataDiffusion/COCO_Train_5_images_t1_10layers_NoClass/Image/'". How should I solve it?
Thanks in advance.
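A quick check before running the augmentation script can show whether the generated-image directory exists. This is a sketch using only the standard library. The helper name is an assumption, and it presumes, without confirmation from the authors, that the Image folder is written by an earlier data-generation step.

```python
from pathlib import Path


def count_generated_images(image_dir: str) -> int:
    """Return the number of files in image_dir, or raise a clearer error if it is missing."""
    path = Path(image_dir)
    if not path.is_dir():
        raise FileNotFoundError(
            f"{path} does not exist; run the data-generation step that writes it first"
        )
    return sum(1 for p in path.iterdir() if p.is_file())
```

For example, count_generated_images("./DataDiffusion/COCO_Train_5_images_t1_10layers_NoClass/Image/") would either report how many files are present or fail with a message naming the missing folder.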
First of all, thank you for such outstanding work.
In addition, I saw that the main results in the paper are compared with a baseline trained on a small number of real images (100, 400, etc.). It would be even better to include a comparison between the baseline and DatasetDM with training data of the same magnitude (real vs. synthetic images).
Hello, thanks for your work. I tried following the tutorial, but when I download the weights using git lfs from the official huggingface repo, there is no config.json associated. I think it may be due to the diffusers version being outdated for the huggingface space.
Could you please provide more details about how to retrieve the correct weights for stable diffusion to be used in your project?
Thanks in advance.
In data_generation_NYU_depth.sh, --sd_ckpt is set to './models/ldm/stable-diffusion-v1/stable_diffusion.ckpt'. How can I get this ckpt file?
I could not find a corresponding ckpt file at https://huggingface.co/CompVis/stable-diffusion-v1-4/tree/main
When I run the command sh ./script/train_semantic_Cityscapes.sh to train the semantic model, I get the following error:
File "/root/project_yuxuan/DatasetDM/model/segment/transformer_decoder.py", line 813, in prepare_features
attention_maps_8s = aggregate_attention(attention_store, 8, ("up", "mid", "down"), True, select,prompts=prompts)
File "/root/project_yuxuan/DatasetDM/model/segment/transformer_decoder.py", line 532, in aggregate_attention
for item in attention_maps[f"{location}_{'cross' if is_cross else 'self'}"]:
KeyError: 'up_cross'
In the paper, I found that this is the transformer decoder part of the perception decoder.
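The KeyError means the attention store has no entry under 'up_cross' when aggregate_attention runs. Below is a minimal sketch of a defensive lookup, assuming the store is a dict keyed by '<location>_<cross|self>' as the error message suggests. The function is illustrative and is not the repository's aggregate_attention.

```python
from typing import Dict, List, Sequence


def collect_maps(attention_maps: Dict[str, List], locations: Sequence[str],
                 is_cross: bool) -> List:
    """Gather stored maps per location, reporting missing keys instead of raising KeyError."""
    kind = "cross" if is_cross else "self"
    keys = [f"{loc}_{kind}" for loc in locations]
    missing = [k for k in keys if k not in attention_maps]
    if missing:
        raise RuntimeError(
            f"attention store lacks {missing}; available keys: {sorted(attention_maps)}"
        )
    out: List = []
    for key in keys:
        out.extend(attention_maps[key])
    return out
```

An error message that lists the available keys makes it easier to tell whether the store is empty, for instance because no attention maps were recorded during the forward pass, or whether the keys are merely named differently.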
Hi,
Thanks for your great work!
Could you provide some guidance on the weights of the pre-trained diffusion models that you used for training and generation? In the code, the paths of the pre-trained models are defined as "./dataset/ckpts/imagenet/xxx". I am not sure how to get these weights. I would also like to know why the weights of the pre-trained Stable Diffusion models are not used directly here.
Best,
Jing
Great work! Could you share the P-decoder weights?
I really appreciate your great work.
When I try to load the weights (https://drive.google.com/u/2/uc?id=12BzF51jwKpwmUb-jDyB5LHRrlk6DnfxR&export=download) with torch.load, the error "Ran out of input" occurs. Do you know the reason?
Thank you in advance.
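"Ran out of input" is the message Python's pickle module gives for an EOFError on empty or truncated data, so one plausible cause is an incomplete download. This is a guess, not a confirmed diagnosis. The sketch below reproduces the message with the standard library and adds a size check that could be run before torch.load. The helper name and the 1024-byte threshold are hypothetical.

```python
import io
import os
import pickle


def looks_truncated(path: str, min_bytes: int = 1024) -> bool:
    """Heuristic: a checkpoint smaller than min_bytes is probably an incomplete download."""
    return os.path.getsize(path) < min_bytes


# Unpickling an empty stream raises EOFError with the same message.
try:
    pickle.load(io.BytesIO(b""))
except EOFError as err:
    print(err)
```

If the downloaded file turns out to be only a few bytes or kilobytes, downloading it again is a reasonable first step.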
Great work!! I have a question about the code. When I use the COCO Instance weights to generate images, the error "Tensors must have same number of dimensions: got 3 and 4" is reported in model/diffusers/models/unet_2d_blocks.py. I am using the latest version of the diffusers code with stable-diffusion-1.4 and torch-1.10 + cu111. The following is my error message:
File "/home/xhm/Desktop/Code/Diffusion/DatasetDM/parallel_generate_Instance_COCO_AnyClass.py", line 412, in sub_processor
images_here, x_t = ptp_utils.text2image(unet,vae,tokenizer,text_encoder,scheduler, prompts, controller, num_inference_steps=NUM_DIFFUSION_STEPS, guidance_scale=5, generator=g_cpu, low_resource=LOW_RESOURCE, Train=False)
File "/home/xhm/anaconda3/envs/datasetdm/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
return func(*args, **kwargs)
File "/home/xhm/Desktop/Code/Diffusion/DatasetDM/ptp_utils.py", line 212, in text2image
latents = diffusion_step_DDM(Unet, scheduler, controller, latents, context, t, guidance_scale, low_resource)
File "/home/xhm/Desktop/Code/Diffusion/DatasetDM/ptp_utils.py", line 141, in diffusion_step_DDM
noise_pred = unet(latents_input, t, encoder_hidden_states=context)["sample"]
File "/home/xhm/anaconda3/envs/datasetdm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/xhm/Desktop/Code/Diffusion/DatasetDM/model/unet.py", line 144, in forward
sample, up_samples = upsample_block(
File "/home/xhm/anaconda3/envs/datasetdm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/xhm/Desktop/Code/Diffusion/DatasetDM/model/diffusers/models/unet_2d_blocks.py", line 2202, in forward
hidden_states = torch.cat([hidden_states, res_hidden_states], dim=1)
RuntimeError: Tensors must have same number of dimensions: got 3 and 4
I'm really sorry, I can't take a screenshot because of my computer. Thank you so much!!!
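torch.cat requires all inputs to have the same number of dimensions, so the message points to a 3-D hidden_states meeting a 4-D res_hidden_states, or the reverse. The rule can be sketched on shape tuples as below. The helper is hypothetical and only mimics the check; it does not call PyTorch, and its error text is not PyTorch's.

```python
from typing import Tuple


def cat_shape(a: Tuple[int, ...], b: Tuple[int, ...], dim: int = 1) -> Tuple[int, ...]:
    """Return the shape of concatenating two tensors of shapes a and b along dim."""
    if len(a) != len(b):
        raise ValueError(f"rank mismatch: {len(a)}-D {a} vs {len(b)}-D {b}")
    for i, (x, y) in enumerate(zip(a, b)):
        if i != dim and x != y:
            raise ValueError(f"sizes differ outside dim {dim}: {a} vs {b}")
    out = list(a)
    out[dim] = a[dim] + b[dim]
    return tuple(out)


print(cat_shape((2, 1280, 8, 8), (2, 1280, 8, 8)))
```

Printing hidden_states.shape and res_hidden_states.shape just before the failing torch.cat would show which of the two tensors has lost a dimension.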