smca-detr's People

Contributors

gaopengcuhk

smca-detr's Issues

Question about the FLOPs in Table 1

Hi, why are the FLOPs much smaller than DETR's? Since SMCA uses multi-level features, it should intuitively have higher FLOPs.

And what operations make the model with lower FLOPs have a slower inference speed?
For example, SMCA-DC5: 153 GFLOPs, 0.100 s vs. DETR: 187 GFLOPs, 0.079 s.

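For anyone trying to reproduce the timing locally, here is a minimal GPU latency-measurement sketch (my own, not from the repository; it assumes a built model and one preprocessed batch samples already on the GPU, and only illustrates the usual warm-up + synchronize pattern):

    import torch

    # hypothetical setup: `model` is the detector, `samples` one preprocessed batch on the GPU
    model.eval()
    starter = torch.cuda.Event(enable_timing=True)
    ender = torch.cuda.Event(enable_timing=True)

    with torch.no_grad():
        for _ in range(10):               # warm-up so kernel selection/caching does not skew timing
            model(samples)
        torch.cuda.synchronize()

        starter.record()
        for _ in range(50):
            model(samples)
        ender.record()
        torch.cuda.synchronize()          # wait for all queued kernels before reading the timer

    print(starter.elapsed_time(ender) / 50, "ms per forward pass")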

About the multi-scale SMCA code

Hello, author. I looked at your multi-scale code: in the multi-scale code, the transformer encoder layers are implemented entirely with convolutions, without any attention. What is the reasoning behind this design?

Some questions about the SMCA module and the code

Hi, I'm very interested in your work on the new decoder for DETR. I have some questions about your code:

  1. When I debugged your source code, I didn't find the encoder components described in the paper, such as the intra-scale self-attention and multi-scale self-attention, nor the scale-selection network in the decoder.

  2. Do the 'type1'-'type4' parameters in the code only change how the Gaussian-like weight map is generated?

  3. In the decoder layer of the Transformer, the forward process is:

    out = self.norm4(tgt + query_pos)
    point_sigmoid_offset = self.point2(out)

    Does the point_sigmoid_offset parameter correspond to s_w and s_h in the paper?

    if self.layer_index == 0:
        point_sigmoid_ref_inter = self.point1(out)
        point_sigmoid_ref = point_sigmoid_ref_inter.sigmoid()
        point_sigmoid_ref = (h_w - 0) * point_sigmoid_ref / 32
        point_sigmoid_ref = point_sigmoid_ref.repeat(1, 1, 8)  # [100, bs, 2] -> [100, bs, 16]
    else:
        point_sigmoid_ref = point_ref_previous
    point = point_sigmoid_ref + point_sigmoid_offset

Does the point_sigmoid_ref parameter correspond to c_w and c_h in the paper? Why do these two parameters need to be added?

    distance = (point.unsqueeze(1) - grid.unsqueeze(0)).pow(2)

This step corresponds to (i - c_w)^2 + (j - c_h)^2 in G(i, j) in the paper;

    if self.dynamic_scale == "type1":
        scale = 1
        distance = distance.sum(-1) * scale
    elif self.dynamic_scale == "type2":   # for type2: apply another linear projection to out
        scale = self.point3(out)          # [100, bs, 256] -> [100, bs, 8]
        scale = scale * scale
        scale = scale.reshape(tgt_len, -1).unsqueeze(1)
        distance = distance.sum(-1) * scale
    elif self.dynamic_scale == "type3":
        scale = self.point3(out)
        scale = scale * scale
        scale = scale.reshape(tgt_len, -1, 2).unsqueeze(1)
        distance = (distance * scale).sum(-1)
    elif self.dynamic_scale == "type4":
        scale = self.point3(out)
        scale = scale * scale
        scale = scale.reshape(tgt_len, -1, 3).unsqueeze(1)
        distance = torch.cat([distance, torch.prod(distance, dim=-1, keepdim=True)], dim=-1)
        distance = (distance * scale).sum(-1)
    # generate the Gaussian-like weight map
    gaussian = -(distance - 0).abs() / self.smooth

According to these operations in the code, it seems that G(i, j) from the paper cannot be obtained directly; I don't understand what these steps mean.

  4. In addition, the paper says that log G_i needs to be added when generating the co-attention weight map in the decoder's co-attention, but the code does the following:

    attn_output_weights = attn_output_weights + gaussian[0].permute(2, 0, 1)
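(For reference, a tiny self-contained sketch of what that addition amounts to; this is my own paraphrase with made-up shapes, not the repository's code. Since gaussian is -distance / smooth, it already plays the role of log G, so adding it to the pre-softmax logits is equivalent to multiplying the softmaxed co-attention weights by a Gaussian-like map G:)

    import torch

    L, N, d = 100, 1200, 256                     # made-up sizes: queries, flattened H*W keys, channels
    q = torch.randn(L, d)
    k = torch.randn(N, d)
    distance = torch.rand(L, N)                  # stands in for (i - c_w)^2 + (j - c_h)^2
    smooth = 8.0                                 # plays the role of the smoothing denominator

    log_g = -distance / smooth                   # log of an unnormalized Gaussian-like map G
    logits = q @ k.t() / d ** 0.5                # standard dot-product co-attention logits
    weights = torch.softmax(logits + log_g, -1)  # same as modulating softmax(logits) by G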

I sincerely hope you can help me solve these problems. Thanks!

ACT module release?

I'm sorry to ask, but the code for the paper "End-to-End Object Detection with Adaptive Clustering Transformer" is not publicly available, is that right?

40.38 AP for ResNet50 single level feature

Hi,

I trained the model with 8 nodes, each of which has 8 GPUs, but I can only obtain 38.46 AP. I wonder if you have trained these models with multiple nodes and a larger batch size, 8 x 16?

Pre-trained Model on VG

Hi,

Thanks for your work. I noticed in your repository description that you may have experiments on the VG dataset, and I was wondering if you would have a pre-trained model on VG available to share. Thank you for your attention.

Kind regards,
Romero

OOM for ResNet50-DC5 even with 32 GB GPUs

Hi,

I tried to run ResNet50-DC5 for SMCA-DETR on 8 GPUs with 32 GB memory each. It shows an OOM error when using ResNet50-DC5 with batch size 16. Why does it use so much more memory than the other models?
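A rough back-of-the-envelope estimate (mine, not from the authors) of why DC5 is so memory-hungry: the DC5 backbone removes the last stride, so C5 has stride 16 instead of 32, the encoder sees roughly 4x as many tokens, and the self-attention maps grow with the square of the token count:

    # hypothetical padded training resolution
    H, W = 800, 1216
    tokens_r50 = (H // 32) * (W // 32)            # ~950 tokens with the standard stride-32 C5
    tokens_dc5 = (H // 16) * (W // 16)            # ~3800 tokens with the dilated stride-16 C5
    attn_ratio = (tokens_dc5 / tokens_r50) ** 2   # attention maps roughly 16x larger
    print(tokens_r50, tokens_dc5, attn_ratio)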

How to use target GPU for training?

[error screenshots omitted]

Hello, author. I would like to use specific GPUs for multi-GPU training. Why is an error reported here? How can I specify which GPUs to use for multi-GPU training? Do I need to modify the code?
I hope to receive your reply.
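(Not an official answer, but one common way to pin training to particular GPUs without changing the code is to restrict the visible devices before CUDA is initialized; the GPU indices below are hypothetical:)

    import os
    os.environ["CUDA_VISIBLE_DEVICES"] = "2,3"   # expose only physical GPUs 2 and 3 to this process

    import torch
    print(torch.cuda.device_count())             # now 2; they appear as cuda:0 and cuda:1

The same variable can also be set in the shell when launching the distributed training script, with --nproc_per_node matching the number of visible GPUs.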

IndexError: list index out of range when running inference

Hello,

When I run a modified d2go version of the code, it shows an index error. Have you seen this error before?

I wonder if my h_w is not set correctly? I am confused about samples[0] and samples[1] (the targets). It seems h_w is a concatenation of the original images? Then why do we need samples[1] to obtain h_w? So I changed the code to the following:

    h_w = torch.stack([torch.tensor([inst.shape[-2] for inst in samples]),
                       torch.tensor([inst.shape[-1] for inst in samples])], dim=-1)

About multi-scale SMCA

Hello! Can you send an unpolished version of multi-scale SMCA to me? I want to know how it is implemented.

An error occurred while using ACT

Hello, I have come across the following issue while using ACT, and I would greatly appreciate some assistance:

File "/usr/local/miniconda3/lib/python3.8/site-packages/ACT-0.0.0-py3.8-linux-x86_64.egg/ACT/ada_clustering_attention.py", line 112, in forward
q_groups, q_counts = self._create_clusters(queries, self.q_hashes)
File "/usr/local/miniconda3/lib/python3.8/site-packages/ACT-0.0.0-py3.8-linux-x86_64.egg/ACT/ada_clustering_attention.py", line 103, in _create_clusters
groups, counts = ada_cluster(hashes, n_hashes=n_hashes)
TypeError: ada_cluster() got an unexpected keyword argument 'n_hashes'

Also, while reviewing the code in the ACT folder, I noticed that the variable "q_hashes" in ACT/extensions/__init__.py is not being used.

multiscale implementation

Hi, I checked the implementation, but I did not find the code implementing the multi-scale part. Would you like to add it?

Question about using multiple GPUs for training

Hi, I want to try using 4 GPUs to train your model DMS_MH_GMCA_resnet50 myself, but it reports a warning like this:
[screenshot of the warning omitted]

Will this warning affect the accuracy of training?
Thanks!

ACT model FLOPs computation

Dear sir,
Thanks for your work on the ACT module in the transformer. I tried to integrate the ACT module into my task and modified the network following the pattern of SMCA-DETR/Adaptive_Cluster_Transformer/. My task is an object detection task modified from DETR. After the ACT module was added, the overall training time decreased by 2 hours, and mAP dropped by only about 0.6, as the paper reports. However, when computing FLOPs, the total FLOPs of the network increased after the ACT module was added. I think it may be an error in my calculation code. My FLOPs calculation code is modified from the original DETR FLOPs calculation code, and I don't know if there is any extra consideration needed for the attention calculation. I would appreciate it if you could share the FLOPs calculation code for the ACT module or point out what went wrong with my calculation code.
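(For cross-checking, a minimal FLOPs-count sketch with fvcore; this is not the authors' script, build_model/args are placeholders for however the model is constructed, and custom ops such as the adaptive clustering attention may be skipped or mis-counted unless explicit op handles are registered for them:)

    import torch
    from fvcore.nn import FlopCountAnalysis

    model = build_model(args)                # hypothetical: however the ACT/DETR model is built
    model.eval()
    img = torch.randn(1, 3, 800, 1216)       # one dummy input image

    flops = FlopCountAnalysis(model, img)
    print(flops.total() / 1e9, "GFLOPs")
    print(flops.by_module())                 # per-module breakdown, useful for spotting double counting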

Training COCO metrics look good, but when the trained model is validated using --eval mode, only garbage values are printed

Thank you for the amazing work.
I trained the model for 50 epochs, and during training the periodic eval seemed reasonable and kept improving.

[screenshot of the periodic eval metrics omitted]

But then, when I use the trained model on the same validation data with the --eval argument and set the path to "checkpoint.pth" in the d2/configs/detr_256_6_6_torchvision.yaml config, I get all zeros.

My ground-truth val.json is in (x, y, w, h) format.
After investigating more, I found that the DataLoader loads ground-truth boxes in normalized form, but the predicted output boxes are in unnormalized form, because of the following lines from the evaluate function in engine.py:

    orig_target_sizes = torch.stack([t["orig_size"] for t in targets], dim=0)
    results = postprocessors['bbox'](outputs, orig_target_sizes)

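(For context, the upstream DETR-style bbox postprocessor essentially converts the normalized (cx, cy, w, h) predictions into absolute (x1, y1, x2, y2) pixels of the original image; a paraphrased sketch, not this repository's exact code:)

    import torch

    def rescale_boxes(out_bbox, target_sizes):
        # out_bbox: [batch, queries, 4] normalized (cx, cy, w, h); target_sizes: [batch, 2] as (h, w)
        cx, cy, w, h = out_bbox.unbind(-1)
        boxes = torch.stack([cx - 0.5 * w, cy - 0.5 * h, cx + 0.5 * w, cy + 0.5 * h], dim=-1)
        img_h, img_w = target_sizes.unbind(1)
        scale = torch.stack([img_w, img_h, img_w, img_h], dim=1)
        return boxes * scale[:, None, :]     # absolute xyxy in original-image coordinates

The COCO evaluator then compares these absolute boxes against the original val.json annotations (also absolute xywh), not against the normalized boxes produced by the DataLoader.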
Even after changing the predicted bboxes to normalized form (or un-normalizing the ground-truth bboxes in the DataLoader class), I still get all zeros in the COCO detection metrics.

I am not sure why, even though the training script calls the same function for the periodic evaluation, using the --eval argument on the same data produces a completely different result.

Am I using a different model instead of the trained one? I changed the weight field in d2/configs/detr_256_6_6_torchvision.yaml to point to the model's "checkpoint.pth".

Please let me know if you need any additional information.
Thanks.

About input format

At this line of code:

if isinstance(samples[0], (list, torch.Tensor)):
    samples[0] = nested_tensor_from_tensor_list(samples[0])
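    # note: this conversion only runs when samples[0] is still a plain list / Tensor;
    # an input that is already a NestedTensor skips this branch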

Since samples is already a NestedTensor, why do we still need nested_tensor_from_tensor_list?

Besides, have you tested the d2 version of SMCA? It does not work... the input is quite different from the original DETR.
