smca-detr's People

Contributors

gaopengcuhk

smca-detr's Issues

Question about the FLOPs in Table 1

Hi, why are the FLOPs much smaller than DETR's? Since SMCA uses multi-level features, it should intuitively have higher FLOPs.

And what operations make the model with lower FLOPs have a slower inference speed?
For example, SMCA-DC5: 153 GFLOPs, 0.100 s vs. DETR: 187 GFLOPs, 0.079 s.

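For anyone trying to reproduce the timing locally, here is a minimal GPU latency-measurement sketch (my own, not from the repository; it assumes a built model and one preprocessed batch samples already on the GPU, and only illustrates the usual warm-up + synchronize pattern):

    import torch

    # hypothetical setup: `model` is the detector, `samples` one preprocessed batch on the GPU
    model.eval()
    starter = torch.cuda.Event(enable_timing=True)
    ender = torch.cuda.Event(enable_timing=True)

    with torch.no_grad():
        for _ in range(10):               # warm-up so kernel selection/caching does not skew timing
            model(samples)
        torch.cuda.synchronize()

        starter.record()
        for _ in range(50):
            model(samples)
        ender.record()
        torch.cuda.synchronize()          # wait for all queued kernels before reading the timer

    print(starter.elapsed_time(ender) / 50, "ms per forward pass")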

About the multi-scale SMCA code

Hello, author. I looked at your multi-scale code: in the multi-scale code, the transformer encoder layers are implemented entirely with convolutions, without any attention. What is the reasoning behind this design?

Some questions about the SMCA module and the code

Hi, I'm very interested in your work on the new decoder for DETR. I have some questions about your code:

  1. When I debugged your source code, I didn't find the encoder components described in the paper, such as the intra-scale self-attention and multi-scale self-attention, nor the scale-selection network in the decoder.

  2. Do the 'type1'-'type4' parameters in the code only change how the Gaussian-like weight map is generated?

  3. In the decoder layer of the Transformer, the forward process is:

    out = self.norm4(tgt + query_pos)
    point_sigmoid_offset = self.point2(out)

    Does the point_sigmoid_offset parameter correspond to s_w and s_h in the paper?

    if self.layer_index == 0:
        point_sigmoid_ref_inter = self.point1(out)
        point_sigmoid_ref = point_sigmoid_ref_inter.sigmoid()
        point_sigmoid_ref = (h_w - 0) * point_sigmoid_ref / 32
        point_sigmoid_ref = point_sigmoid_ref.repeat(1, 1, 8)  # [100, bs, 2] -> [100, bs, 16]
    else:
        point_sigmoid_ref = point_ref_previous
    point = point_sigmoid_ref + point_sigmoid_offset

Does the point_sigmoid_ref parameter correspond to c_w and c_h in the paper? Why do these two parameters need to be added?

    distance = (point.unsqueeze(1) - grid.unsqueeze(0)).pow(2)

This step corresponds to (i - c_w)^2 + (j - c_h)^2 in G(i, j) in the paper;

    if self.dynamic_scale == "type1":
        scale = 1
        distance = distance.sum(-1) * scale
    elif self.dynamic_scale == "type2":   # for type2: apply another linear projection to out
        scale = self.point3(out)          # [100, bs, 256] -> [100, bs, 8]
        scale = scale * scale
        scale = scale.reshape(tgt_len, -1).unsqueeze(1)
        distance = distance.sum(-1) * scale
    elif self.dynamic_scale == "type3":
        scale = self.point3(out)
        scale = scale * scale
        scale = scale.reshape(tgt_len, -1, 2).unsqueeze(1)
        distance = (distance * scale).sum(-1)
    elif self.dynamic_scale == "type4":
        scale = self.point3(out)
        scale = scale * scale
        scale = scale.reshape(tgt_len, -1, 3).unsqueeze(1)
        distance = torch.cat([distance, torch.prod(distance, dim=-1, keepdim=True)], dim=-1)
        distance = (distance * scale).sum(-1)
    # generate the Gaussian-like weight map
    gaussian = -(distance - 0).abs() / self.smooth

According to these operations in the code, it seems that G(i, j) from the paper cannot be obtained directly; I don't understand what these steps mean.

  4. In addition, the paper says that log G_i needs to be added when generating the co-attention weight map in the decoder's co-attention, but the code does the following:

    attn_output_weights = attn_output_weights + gaussian[0].permute(2, 0, 1)
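(For reference, a tiny self-contained sketch of what that addition amounts to; this is my own paraphrase with made-up shapes, not the repository's code. Since gaussian is -distance / smooth, it already plays the role of log G, so adding it to the pre-softmax logits is equivalent to multiplying the softmaxed co-attention weights by a Gaussian-like map G:)

    import torch

    L, N, d = 100, 1200, 256                     # made-up sizes: queries, flattened H*W keys, channels
    q = torch.randn(L, d)
    k = torch.randn(N, d)
    distance = torch.rand(L, N)                  # stands in for (i - c_w)^2 + (j - c_h)^2
    smooth = 8.0                                 # plays the role of the smoothing denominator

    log_g = -distance / smooth                   # log of an unnormalized Gaussian-like map G
    logits = q @ k.t() / d ** 0.5                # standard dot-product co-attention logits
    weights = torch.softmax(logits + log_g, -1)  # same as modulating softmax(logits) by G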

I sincerely hope you can help me solve these problems. Thanks!

ACT module release?

I'm sorry to ask, but the code for the paper "End-to-End Object Detection with Adaptive Clustering Transformer" is not publicly available, is that right?

40.38 AP for ResNet50 single level feature

Hi,

I trained the model with 8 nodes, each of which has 8 GPUs, but I can only obtain 38.46 AP. I wonder if you have trained these models with multiple nodes and a larger batch size, 8 x 16?

Pre-trained Model on VG

Hi,

Thanks for your work. I noticed in your repository description that you may have experiments on the VG dataset, and I was wondering if you would have a pre-trained model on VG available to share. Thank you for your attention.

Kind regards,
Romero

OOM for ResNet50-DC5 even with 32 GB GPUs

Hi,

I tried to run ResNet50-DC5 for SMCA-DETR on 8 GPUs with 32 GB memory each. It shows an OOM error when using ResNet50-DC5 with batch size 16. Why does it use so much more memory than the other models?
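A rough back-of-the-envelope estimate (mine, not from the authors) of why DC5 is so memory-hungry: the DC5 backbone removes the last stride, so C5 has stride 16 instead of 32, the encoder sees roughly 4x as many tokens, and the self-attention maps grow with the square of the token count:

    # hypothetical padded training resolution
    H, W = 800, 1216
    tokens_r50 = (H // 32) * (W // 32)            # ~950 tokens with the standard stride-32 C5
    tokens_dc5 = (H // 16) * (W // 16)            # ~3800 tokens with the dilated stride-16 C5
    attn_ratio = (tokens_dc5 / tokens_r50) ** 2   # attention maps roughly 16x larger
    print(tokens_r50, tokens_dc5, attn_ratio)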

How to use target GPU for training?

[error screenshots omitted]

Hello, author. I would like to use specific GPUs for multi-GPU training. Why is an error reported here? How can I specify which GPUs to use for multi-GPU training? Do I need to modify the code?
I hope to receive your reply.
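(Not an official answer, but one common way to pin training to particular GPUs without changing the code is to restrict the visible devices before CUDA is initialized; the GPU indices below are hypothetical:)

    import os
    os.environ["CUDA_VISIBLE_DEVICES"] = "2,3"   # expose only physical GPUs 2 and 3 to this process

    import torch
    print(torch.cuda.device_count())             # now 2; they appear as cuda:0 and cuda:1

The same variable can also be set in the shell when launching the distributed training script, with --nproc_per_node matching the number of visible GPUs.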

IndexError: list index out of range when running inference

Hello,

When I run a modified d2go version of the code, it shows an index error. Have you seen this error before?

I wonder if my h_w is not set correctly? I am confused about samples[0] and samples[1] (the targets). It seems h_w is a concatenation of the original images? Then why do we need samples[1] to obtain h_w? So I changed the code to the following:

    h_w = torch.stack([torch.tensor([inst.shape[-2] for inst in samples]),
                       torch.tensor([inst.shape[-1] for inst in samples])], dim=-1)

About multi-scale SMCA

Hello! Can you send an unpolished version of multi-scale SMCA to me? I want to know how it is implemented.

An error occurred while using ACT

Hello, I have come across the following issue while using ACT, and I would greatly appreciate some assistance:

File "/usr/local/miniconda3/lib/python3.8/site-packages/ACT-0.0.0-py3.8-linux-x86_64.egg/ACT/ada_clustering_attention.py", line 112, in forward
q_groups, q_counts = self._create_clusters(queries, self.q_hashes)
File "/usr/local/miniconda3/lib/python3.8/site-packages/ACT-0.0.0-py3.8-linux-x86_64.egg/ACT/ada_clustering_attention.py", line 103, in _create_clusters
groups, counts = ada_cluster(hashes, n_hashes=n_hashes)
TypeError: ada_cluster() got an unexpected keyword argument 'n_hashes'

Also, while reviewing the code in the ACT folder, I noticed that the variable "q_hashes" in ACT/extensions/__init__.py is not being used.

multiscale implementation

Hi, I checked the implementation, but I did not find the code implementing the multi-scale part. Would you like to add it?

Question about using multiple GPUs for training

Hi, I want to try using 4 GPUs to train your model DMS_MH_GMCA_resnet50 myself, but it reports a warning like this:
[screenshot of the warning omitted]

Will this warning affect the accuracy of training?
Thanks!

ACT model FLOPs computation

Dear sir,
Thanks for your work on the ACT module in the transformer. I tried to integrate the ACT module into my task and modified the network following the pattern of SMCA-DETR/Adaptive_Cluster_Transformer/. My task is an object detection task modified from DETR. After the ACT module was added, the overall training time decreased by 2 hours, and mAP dropped by only about 0.6, as the paper reports. However, when computing FLOPs, the total FLOPs of the network increased after the ACT module was added. I think it may be an error in my calculation code. My FLOPs calculation code is modified from the original DETR FLOPs calculation code, and I don't know if there is any extra consideration needed for the attention calculation. I would appreciate it if you could share the FLOPs calculation code for the ACT module or point out what went wrong with my calculation code.
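(For cross-checking, a minimal FLOPs-count sketch with fvcore; this is not the authors' script, build_model/args are placeholders for however the model is constructed, and custom ops such as the adaptive clustering attention may be skipped or mis-counted unless explicit op handles are registered for them:)

    import torch
    from fvcore.nn import FlopCountAnalysis

    model = build_model(args)                # hypothetical: however the ACT/DETR model is built
    model.eval()
    img = torch.randn(1, 3, 800, 1216)       # one dummy input image

    flops = FlopCountAnalysis(model, img)
    print(flops.total() / 1e9, "GFLOPs")
    print(flops.by_module())                 # per-module breakdown, useful for spotting double counting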

Training COCO metrics look good, but when the trained model is validated using --eval mode, only garbage values are printed

Thank you for the amazing work.
I trained the model for 50 epochs, and during training the periodic eval seemed reasonable and kept improving.

[screenshot of the periodic eval metrics omitted]

But then, when I use the trained model on the same validation data with the --eval argument and set the path to "checkpoint.pth" in the d2/configs/detr_256_6_6_torchvision.yaml config, I get all zeros.

My ground-truth val.json is in (x, y, w, h) format.
After investigating more, I found that the DataLoader loads ground-truth boxes in normalized form, but the predicted output boxes are in unnormalized form, because of the following lines from the evaluate function in engine.py:

    orig_target_sizes = torch.stack([t["orig_size"] for t in targets], dim=0)
    results = postprocessors['bbox'](outputs, orig_target_sizes)

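(For context, the upstream DETR-style bbox postprocessor essentially converts the normalized (cx, cy, w, h) predictions into absolute (x1, y1, x2, y2) pixels of the original image; a paraphrased sketch, not this repository's exact code:)

    import torch

    def rescale_boxes(out_bbox, target_sizes):
        # out_bbox: [batch, queries, 4] normalized (cx, cy, w, h); target_sizes: [batch, 2] as (h, w)
        cx, cy, w, h = out_bbox.unbind(-1)
        boxes = torch.stack([cx - 0.5 * w, cy - 0.5 * h, cx + 0.5 * w, cy + 0.5 * h], dim=-1)
        img_h, img_w = target_sizes.unbind(1)
        scale = torch.stack([img_w, img_h, img_w, img_h], dim=1)
        return boxes * scale[:, None, :]     # absolute xyxy in original-image coordinates

The COCO evaluator then compares these absolute boxes against the original val.json annotations (also absolute xywh), not against the normalized boxes produced by the DataLoader.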
Even after changing the predicted bboxes to normalized form (or un-normalizing the ground-truth bboxes in the DataLoader class), I still get all zeros in the COCO detection metrics.

I am not sure why, even though the training script calls the same function for the periodic evaluation, using the --eval argument on the same data produces a completely different result.

Am I using a different model instead of the trained one? I changed the weight field in d2/configs/detr_256_6_6_torchvision.yaml to point to the model's "checkpoint.pth".

Please let me know if you need any additional information.
Thanks.

About input format

At this line of code:

if isinstance(samples[0], (list, torch.Tensor)):
    samples[0] = nested_tensor_from_tensor_list(samples[0])
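    # note: this conversion only runs when samples[0] is still a plain list / Tensor;
    # an input that is already a NestedTensor skips this branch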

Since samples is already a NestedTensor, why do we still need nested_tensor_from_tensor_list?

Besides, have you tested the d2 version of SMCA? It does not work... the input is quite different from the original DETR.
