
datasetdm's People

Contributors

weijiawu

datasetdm's Issues

Confusion about the quality of the generated images

Thank you for this excellent work! I have a question I would like to ask.

  • I noticed that if I feed a GPT-4-generated prompt directly into Stable Diffusion v1.5, the resulting images are often of poor quality (e.g. missing objects or incorrect object relations). Did you apply any post-processing to the generated images? None is shown in the architecture diagram in the paper (Fig. 2).

I look forward to your answer. Thank you.

prompt txt files

Hello, thanks for your work. What are the prompt txt files, and how can I obtain them?

Thanks in advance.

Pixel decoder design

Dear author,

Thanks for your work.

I am a little confused about the design of the pixel decoder: “the pixel decoder consists of several straightforward up-sampling layers. Each layer comprises four types of computations: 1) 1x1 conv, 2) up-sample, 3) concat, 4) mixed conv”.

I want to confirm: after up-sampling to a spatial size of 64x64 and fusing the 64x64 feature from the UNet, how do you obtain the per-pixel embedding at, say, 512x512? Do you simply apply three more layers using 1), 2) and 4), without 3)?

Looking forward to your reply.
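
The arithmetic behind the "three more layers" in this question can be sketched as follows. This is only an illustration that assumes every layer doubles the spatial size; `upsample_schedule` is a hypothetical helper, not a function from the DatasetDM code, and it says nothing about how the authors actually build the decoder.

```python
from typing import List


def upsample_schedule(start: int, target: int, factor: int = 2) -> List[int]:
    """Return the spatial sizes reached by repeated up-sampling.

    Assumption: every layer scales the feature map by `factor`.
    """
    if start <= 0 or target < start or factor < 2:
        raise ValueError("need 0 < start <= target and factor >= 2")
    sizes = []
    size = start
    while size < target:
        size *= factor
        sizes.append(size)
    if size != target:
        raise ValueError(
            f"{target} is not reachable from {start} with factor {factor}"
        )
    return sizes
```

Under that assumption, `upsample_schedule(64, 512)` lists three intermediate sizes (128, 256, 512), which matches the three extra layers the question proposes.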

Error when training on the NYU dataset

python train_depth_NYU.py --save_name Train_250_images_t1_attention_transformer_NYU_10layers_NOAug \
    --config /workspace/my_code/DatasetDM/config/NYU/NYU_Depth.yaml \
    --image_limitation 250

Traceback (most recent call last):
  File "/root/anaconda3/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/workspace/my_code/DatasetDM/ptp_utils.py", line 212, in text2image
    latents = diffusion_step_DDM(Unet, scheduler, controller, latents, context, t, guidance_scale, low_resource)
  File "/workspace/my_code/DatasetDM/ptp_utils.py", line 141, in diffusion_step_DDM
    noise_pred = unet(latents_input, t, encoder_hidden_states=context)["sample"]
  File "/root/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/workspace/my_code/DatasetDM/model/unet.py", line 153, in forward
    reshape_h=h.reshape(int(h.size()[0]/2),int(h.size()[1]*2),h.size()[2],h.size()[3])
IndexError: tuple index out of range
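
The failing line indexes `h.size()` up to position 3, so the error suggests that `h` has fewer than four dimensions at that point. A minimal sketch of a shape check that would surface this earlier, using plain tuples in place of tensors; `split_batch_shape` is a hypothetical helper and not part of the repository.

```python
from typing import Sequence, Tuple


def split_batch_shape(shape: Sequence[int]) -> Tuple[int, int, int, int]:
    """Mirror the reshape from model/unet.py on a shape tuple.

    Dimension 0 is halved and dimension 1 is doubled, as in the traceback.
    """
    if len(shape) != 4:
        raise ValueError(
            f"expected a 4-D (N, C, H, W) shape, got {len(shape)}-D: {tuple(shape)}"
        )
    n, c, h, w = shape
    if n % 2 != 0:
        raise ValueError(f"batch size {n} cannot be halved evenly")
    return (n // 2, c * 2, h, w)
```

Printing `h.size()` just before the reshape would show which shape actually arrives there.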

File not found in the training code

Thanks for your great work.
When I run the training code ./tools/train_depth_NYU.py, a module that it imports cannot be found.
Line 25 of ./tools/train_depth_NYU.py:
    from model.diffusers.models.unet_2d_condition import UNet2DConditionModel
and line 9 of model/unet.py:
    from model.diffusers.models.unet_2d_condition import UNet2DConditionModel

Looking forward to your reply.
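
Whether Python can resolve `model.diffusers...` depends on the working directory and on `sys.path`. A small sketch for checking this from the directory where the script is launched; `module_available` is a hypothetical helper, and the cause of the missing module in this issue is not confirmed.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if `name` can be located on the current sys.path."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # Raised when a parent package (e.g. `model`) cannot be imported.
        return False
```

For example, `module_available("model.diffusers.models.unet_2d_condition")` returning False from the repository root would indicate that the `model/diffusers` package is absent or not importable there.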

add noise problem

Thanks for your work.
In the training stage, why do you add one step of noise to train the P-Decoder?

Question about the COCO dataset code

Thanks for your great work. I have a question about f_classes.append(1) at line 519 of dataset/COCO.py. Why is it not f_classes.append(classe)?

        f_classes = []
        masks = []
        for idx, (classe, segm) in enumerate(zip(classes, segms)):
            poly_mask = polygons_to_bitmask(segm, *image.shape[:2])
            if poly_mask.sum() < 500:
                continue
            # if classe != class_id:
            #     continue
            f_classes.append(1)
            masks.append(poly_mask)
            # if len(f_classes) > 0:
            #     break
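
The difference between the two variants can be illustrated with a small stand-alone sketch. It uses pixel counts instead of real masks and made-up class ids; `filter_instances` is a hypothetical function, not code from dataset/COCO.py. With `class_agnostic=True` every kept instance gets label 1, as in the quoted snippet; with `class_agnostic=False` the original class id is kept.

```python
from typing import List, Sequence, Tuple


def filter_instances(
    classes: Sequence[int],
    areas: Sequence[int],
    min_area: int = 500,
    class_agnostic: bool = True,
) -> Tuple[List[int], List[int]]:
    """Drop small instances and return (labels, kept_areas)."""
    labels: List[int] = []
    kept: List[int] = []
    for classe, area in zip(classes, areas):
        if area < min_area:
            continue  # same threshold as poly_mask.sum() < 500
        labels.append(1 if class_agnostic else classe)
        kept.append(area)
    return labels, kept
```

Whether the class-agnostic label in the repository is intentional is exactly what this issue asks; the sketch only shows what each choice produces.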

PreTrained Weights

@weijiawu thanks for sharing this wonderful work. I have a few questions:

  1. Could you please share your pretrained weights for the Cityscapes or KITTI dataset?
  2. How long does it take to train the model from scratch on the Cityscapes or KITTI dataset?
  3. Can the model be trained on a custom dataset? If so, what changes need to be made in the repo?

Thanks in advance.

./DataDiffusion/COCO_Train_5_images_t1_10layers_NoClass/Image/

Hello, thanks for your work. When I run the command "sh ./script/augmentation_coco.sh" in the terminal, the following error occurs: "FileNotFoundError: [Errno 2] No such file or directory: './DataDiffusion/COCO_Train_5_images_t1_10layers_NoClass/Image/'". How can I solve it?

Thanks in advance.
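
One way to narrow this down is to check whether the directory exists and contains files before running the script. A minimal sketch, assuming the path from the error message is relative to the current working directory; `describe_dir` is a hypothetical helper, not part of the repository.

```python
from pathlib import Path


def describe_dir(path: str) -> str:
    """Report whether `path` is missing, not a directory, or how many files it holds."""
    p = Path(path)
    if not p.exists():
        return f"missing: {p.resolve()}"
    if not p.is_dir():
        return f"not a directory: {p.resolve()}"
    n_files = sum(1 for child in p.iterdir() if child.is_file())
    return f"ok: {n_files} file(s)"
```

Calling `describe_dir("./DataDiffusion/COCO_Train_5_images_t1_10layers_NoClass/Image/")` from the directory where the script is launched shows the absolute path that Python is actually looking for.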

Main quantitative result.

First of all, thank you for this outstanding work.
I noticed that the main results in the paper are compared with a baseline trained on a small number of real images (100, 400, etc.). It would be even better to include a comparison between the baseline and DatasetDM with training data of the same magnitude (real vs. synthetic images).

Training not working.

Hello, thanks for your work. I tried following the tutorial, but when I download the weights with git lfs from the official Hugging Face repo, there is no associated config.json. I think this may be because the diffusers version is outdated relative to the Hugging Face repo.

Could you please provide more details about how to retrieve the correct weights for stable diffusion to be used in your project?

Thanks in advance.
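
To see which files a downloaded checkpoint actually contains, a short check like the following may help. It assumes the weights are expected in the diffusers folder layout (a `model_index.json` at the root and one config file per component); whether DatasetDM expects exactly this layout is an assumption, and `missing_files` is a hypothetical helper.

```python
from pathlib import Path
from typing import List, Sequence

# Files found in a diffusers-format Stable Diffusion checkpoint.
# Assumption: the project loads its weights from this layout.
EXPECTED_FILES = (
    "model_index.json",
    "unet/config.json",
    "vae/config.json",
    "text_encoder/config.json",
    "scheduler/scheduler_config.json",
)


def missing_files(
    checkpoint_dir: str, expected: Sequence[str] = EXPECTED_FILES
) -> List[str]:
    """Return the expected relative paths that are absent under checkpoint_dir."""
    root = Path(checkpoint_dir)
    return [rel for rel in expected if not (root / rel).is_file()]
```

An empty list means every listed file is present; any entries returned point at the component whose configuration was not downloaded.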

KeyError for the cross-attention map in the cross-attention module

When I run sh ./script/train_semantic_Cityscapes.sh to train for semantic segmentation, I get the following error:

  File "/root/project_yuxuan/DatasetDM/model/segment/transformer_decoder.py", line 813, in prepare_features
    attention_maps_8s = aggregate_attention(attention_store, 8, ("up", "mid", "down"), True, select,prompts=prompts)
  File "/root/project_yuxuan/DatasetDM/model/segment/transformer_decoder.py", line 532, in aggregate_attention
    for item in attention_maps[f"{location}_{'cross' if is_cross else 'self'}"]:
KeyError: 'up_cross'

According to the paper, this code is the transformer decoder part of the perception decoder.
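
The KeyError means that the attention store passed to `aggregate_attention` has no 'up_cross' entry at that point. A sketch of a lookup that reports which keys are available, which can help to tell an empty store from a differently named one; the key pattern follows the traceback, while `collect_maps` itself is hypothetical and not part of the repository.

```python
from typing import Any, Dict, List, Sequence


def collect_maps(
    attention_maps: Dict[str, List[Any]],
    from_where: Sequence[str],
    is_cross: bool,
) -> List[Any]:
    """Gather the stored maps for each location, failing with a descriptive error."""
    kind = "cross" if is_cross else "self"
    out: List[Any] = []
    for location in from_where:
        key = f"{location}_{kind}"
        if key not in attention_maps:
            raise KeyError(
                f"{key!r} not in attention store; "
                f"available keys: {sorted(attention_maps)}"
            )
        out.extend(attention_maps[key])
    return out
```

If the reported list of available keys is empty, no attention maps were recorded before aggregation, which is one possible (unconfirmed) cause of the error above.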

About the weights of the pre-trained diffusion model

Hi,

Thanks for your great work!
Could you provide some guidance on the weights of the pre-trained diffusion models that you used for training and generation? In the code, the paths of the pre-trained models are given as "./dataset/ckpts/imagenet/xxx". It is not clear to me how to obtain these weights. I would also like to know why the weights of the pre-trained Stable Diffusion models are not used directly here.

Best,
Jing

About diffusers unet_2d_blocks.py

Great work! I have a question about the code. When I use the COCO Instance weights to generate images, the error "Tensors must have same number of dimensions: got 3 and 4" is raised in model/diffusers/models/unet_2d_blocks.py. I am using the latest version of the diffusers code with stable diffusion-1.4 and torch-1.10 + cu111. The error message is as follows:

  File "/home/xhm/Desktop/Code/Diffusion/DatasetDM/parallel_generate_Instance_COCO_AnyClass.py", line 412, in sub_processor
    images_here, x_t = ptp_utils.text2image(unet,vae,tokenizer,text_encoder,scheduler, prompts, controller, num_inference_steps=NUM_DIFFUSION_STEPS, guidance_scale=5, generator=g_cpu, low_resource=LOW_RESOURCE, Train=False)
  File "/home/xhm/anaconda3/envs/datasetdm/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
    return func(*args, **kwargs)
  File "/home/xhm/Desktop/Code/Diffusion/DatasetDM/ptp_utils.py", line 212, in text2image
    latents = diffusion_step_DDM(Unet, scheduler, controller, latents, context, t, guidance_scale, low_resource)
  File "/home/xhm/Desktop/Code/Diffusion/DatasetDM/ptp_utils.py", line 141, in diffusion_step_DDM
    noise_pred = unet(latents_input, t, encoder_hidden_states=context)["sample"]
  File "/home/xhm/anaconda3/envs/datasetdm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/xhm/Desktop/Code/Diffusion/DatasetDM/model/unet.py", line 144, in forward
    sample, up_samples = upsample_block(
  File "/home/xhm/anaconda3/envs/datasetdm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/xhm/Desktop/Code/Diffusion/DatasetDM/model/diffusers/models/unet_2d_blocks.py", line 2202, in forward
    hidden_states = torch.cat([hidden_states, res_hidden_states], dim=1)
RuntimeError: Tensors must have same number of dimensions: got 3 and 4

I'm sorry that I can't provide a screenshot because of my computer. Thank you so much!
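
The message indicates that one of the two tensors passed to `torch.cat` is 3-D and the other is 4-D. The rule that concatenation enforces can be sketched on plain shape tuples; `cat_shape` is a hypothetical helper for illustration, and the shapes in the usage note are made up rather than taken from the model.

```python
from typing import Sequence, Tuple


def cat_shape(a: Sequence[int], b: Sequence[int], dim: int = 1) -> Tuple[int, ...]:
    """Return the shape produced by concatenating shapes a and b along dim."""
    if len(a) != len(b):
        raise ValueError(
            f"Tensors must have same number of dimensions: got {len(a)} and {len(b)}"
        )
    for i, (x, y) in enumerate(zip(a, b)):
        if i != dim and x != y:
            raise ValueError(f"sizes differ in dimension {i}: {x} vs {y}")
    out = list(a)
    out[dim] = a[dim] + b[dim]
    return tuple(out)
```

For instance, `cat_shape((2, 1280, 8, 8), (2, 1280, 8, 8))` is valid, while `cat_shape((2, 64, 1280), (2, 1280, 8, 8))` fails in the same way as the traceback. Printing `hidden_states.shape` and `res_hidden_states.shape` before the call would show which tensor has the unexpected rank.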
