smca-detr's Issues
Does SMCA use warmup?
Does SMCA use warmup? How many iterations does it take?
Question about the FLOPs in Table 1
About the multi-scale SMCA code
Hello, I read your multi-scale code. In the multi-scale code, the transformer encoder layers are implemented entirely with convolutions, without any attention. What is the reasoning behind this choice?
Some question about the SMCA module and the code
Hi, I'm very interested in your work on the new decoder for DETR. I have some questions about your code:
1. When I debugged your source code, I did not find the encoder side of the FPN, such as the intra-scale self-attention and multi-scale self-attention mentioned in the paper, nor the scale-selection network in the decoder.
2. Do the 'type1'-'type4' options in the code only change how the Gaussian-like weight maps are generated?
3. In the decoder layer of the Transformer, the forward pass is:

    out = self.norm4(tgt + query_pos)
    point_sigmoid_offset = self.point2(out)
Does the point_sigmoid_offset parameter correspond to sw and sh in the paper?

    if self.layer_index == 0:
        point_sigmoid_ref_inter = self.point1(out)
        point_sigmoid_ref = point_sigmoid_ref_inter.sigmoid()
        point_sigmoid_ref = (h_w - 0) * point_sigmoid_ref / 32
        point_sigmoid_ref = point_sigmoid_ref.repeat(1, 1, 8)  # [100, bs, 2] -> [100, bs, 16]
    else:
        point_sigmoid_ref = point_ref_previous
    point = point_sigmoid_ref + point_sigmoid_offset
Does the point_sigmoid_ref parameter correspond to cw and ch in the paper? Why do these two parameters need to be added together?

    distance = (point.unsqueeze(1) - grid.unsqueeze(0)).pow(2)

This step corresponds to (i - cw)^2 + (j - ch)^2 in G(i, j) in the paper.
    if self.dynamic_scale == "type1":
        scale = 1
        distance = distance.sum(-1) * scale
    elif self.dynamic_scale == "type2":  # for type2: apply another linear projection to out
        scale = self.point3(out)  # [100, bs, 256] -> [100, bs, 8]
        scale = scale * scale
        scale = scale.reshape(tgt_len, -1).unsqueeze(1)
        distance = distance.sum(-1) * scale
    elif self.dynamic_scale == "type3":
        scale = self.point3(out)
        scale = scale * scale
        scale = scale.reshape(tgt_len, -1, 2).unsqueeze(1)
        distance = (distance * scale).sum(-1)
    elif self.dynamic_scale == "type4":
        scale = self.point3(out)
        scale = scale * scale
        scale = scale.reshape(tgt_len, -1, 3).unsqueeze(1)
        distance = torch.cat([distance, torch.prod(distance, dim=-1, keepdim=True)], dim=-1)
        distance = (distance * scale).sum(-1)
    # generate Gaussian-like weight map
    gaussian = -(distance - 0).abs() / self.smooth
Based on these operations, it seems that G(i, j) from the paper is never computed explicitly; what do these steps mean?
4. In addition, the paper says that log Gi' is added when generating the co-attention weight map in the decoder's co-attention, but the code does:

    attn_output_weights = attn_output_weights + gaussian[0].permute(2, 0, 1)
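For what it's worth, the log-domain form in the code and the exponential form in the paper are consistent: adding -distance / smooth to the attention logits before the softmax is equivalent to multiplying the post-softmax weights by exp(-distance / smooth), which is exactly a Gaussian-like map. A minimal sketch of this equivalence (the shapes and the smooth constant here are illustrative assumptions, not the repo's exact values):

```python
import torch

def gaussian_logits(point, grid, smooth=8.0):
    # point: [Q, 2] reference centers, grid: [HW, 2] pixel coordinates
    # squared distance (i - cw)^2 + (j - ch)^2 for every (query, location) pair
    distance = (point.unsqueeze(1) - grid.unsqueeze(0)).pow(2).sum(-1)  # [Q, HW]
    return -distance / smooth  # log G(i, j), added to attention logits

Q, H, W = 4, 8, 8
ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
grid = torch.stack([xs, ys], dim=-1).reshape(-1, 2).float()  # [HW, 2]
point = torch.rand(Q, 2) * torch.tensor([W, H])              # assumed random centers

logits = torch.zeros(Q, H * W)                                 # dummy co-attention logits
weights = (logits + gaussian_logits(point, grid)).softmax(-1)  # additive (log-domain) form

# equivalent multiplicative view: softmax(l + log g) == softmax(l) * g, renormalized
g = gaussian_logits(point, grid).exp()
weights2 = (logits.softmax(-1) * g) / (logits.softmax(-1) * g).sum(-1, keepdim=True)
print(torch.allclose(weights, weights2, atol=1e-6))
```

So the code works with log G directly, which is numerically safer than exponentiating and multiplying.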
I sincerely hope you can help me resolve these questions. Thanks!
ACT module release?
I'm sorry to ask, but the code for the paper "End-to-End Object Detection with Adaptive Clustering Transformer" is not publicly available, is that right?
40.38 AP for ResNet50 single level feature
Hi,
I trained the model on 8 nodes with 8 GPUs each, but can only obtain 38.46 AP. Did you train those models with multiple nodes and a larger batch size (8 × 16)?
When will the code be released?
Hello, I'm currently doing some research on single-scale SMCA, so I would like to know when the code will be released. Thanks.
multi_head_attention_forward: gaussian[0].permute(2, 0, 1) RuntimeError: number of dims don't match in permute
In multi_head_attention_forward, the call gaussian[0].permute(2, 0, 1) raises:

    RuntimeError: number of dims don't match in permute
Is the shape of h_w [1, bs, 2], for example torch.Size([1, 10, 2])? I got this error when running your code.
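For context, permute(2, 0, 1) requires the tensor to be exactly 3-D; if gaussian[0] comes out 2-D (for example because h_w has an unexpected shape upstream), PyTorch raises exactly this error. A minimal reproduction with made-up shapes, not the repo's actual ones:

```python
import torch

g3 = torch.rand(100, 2, 64)       # 3-D: permute(2, 0, 1) is valid
print(g3.permute(2, 0, 1).shape)  # torch.Size([64, 100, 2])

g2 = torch.rand(100, 64)          # 2-D: one dimension short of the permutation
failed = False
try:
    g2.permute(2, 0, 1)
except RuntimeError:
    failed = True                 # "number of dims don't match in permute"
print(failed)                     # True
```

So the first thing to check is the number of dimensions of gaussian[0] right before the permute.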
pretrained model download link
What are the modifications compared with the original DETR?
Hi, just wondering: apart from the GaussianAttention and transformer.py, did anything else change compared with the original DETR?
Could you release the code for the Knowledge Distillation loss in the ACT paper?
Pre-trained Model on VG
Hi,
Thanks for your work. I noticed in your repository description that you may have experiments on the VG dataset, and I was wondering if you would have a pre-trained model on VG available to share. Thank you for your attention.
Kind regards,
Romero
OOM for ResNet50-DC5 even for 32GB GPUs
Hi,
I tried to run SMCA-DETR with ResNet50-DC5 on 8 GPUs with 32 GB memory each. It shows an OOM error with batch size 16. Why does it use so much more memory than the other models?
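For intuition, DC5 removes the last stride, so the encoder sees a feature map at 1/16 instead of 1/32 resolution: 4× more tokens, and self-attention memory scales with the square of the token count, so roughly 16× more attention memory. A back-of-envelope sketch with an assumed input resolution:

```python
def encoder_tokens(h, w, stride):
    # number of spatial tokens fed to the transformer encoder
    return (h // stride) * (w // stride)

h, w = 800, 1216                   # a typical COCO training resolution (assumed)
n_r50 = encoder_tokens(h, w, 32)   # standard ResNet50: 1/32 feature map
n_dc5 = encoder_tokens(h, w, 16)   # DC5: 1/16 feature map
print(n_r50, n_dc5)                # 950 3800
print((n_dc5 ** 2) / (n_r50 ** 2))  # 16.0: quadratic blow-up in attention memory
```

This quadratic growth in the encoder's self-attention map is the usual reason DC5 variants OOM at batch sizes that the standard backbone handles.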
Seeking help with the visualization images
It's a nice job. Could you tell me how to plot the co-attention visualizations shown in Figures 2/3 of your paper? I want to visualize some of my own images with your repo. Thanks very much.
How to use target GPU for training?
IndexError: list index out of range when running inference
Hello,
When I run a modified d2go version of the code, it shows an index error. Have you seen this error before?
I wonder if my h_w is not set correctly. I am confused about samples[0] and samples[1] (the target): it seems h_w is a concatenation of the original image sizes, so why do we need samples[1] to obtain h_w? I changed the code to the following:

    h_w = torch.stack([torch.tensor([inst.shape[-2] for inst in samples]), torch.tensor([inst.shape[-1] for inst in samples])], dim=-1)
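If it helps, here is a self-contained version of that fix, assuming samples is a plain list of CHW image tensors (the variable names and image sizes are mine, not the repo's):

```python
import torch

# hypothetical batch of two variable-sized CHW images
samples = [torch.rand(3, 480, 640), torch.rand(3, 600, 800)]

# per-image (height, width) pairs, shape [bs, 2]
h_w = torch.stack(
    [torch.tensor([img.shape[-2] for img in samples]),
     torch.tensor([img.shape[-1] for img in samples])],
    dim=-1,
)
print(h_w)  # tensor([[480, 640], [600, 800]])
```

The key point is that h_w only needs the spatial sizes of the input images, so it can be built from samples[0] alone.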
About multi-scale SMCA
Hello! Could you send me an unpolished version of multi-scale SMCA? I want to understand how it is implemented.
An error occurred while using ACT
Hello, I ran into the following issue while using ACT, and I would greatly appreciate some assistance:

    File "/usr/local/miniconda3/lib/python3.8/site-packages/ACT-0.0.0-py3.8-linux-x86_64.egg/ACT/ada_clustering_attention.py", line 112, in forward
      q_groups, q_counts = self._create_clusters(queries, self.q_hashes)
    File "/usr/local/miniconda3/lib/python3.8/site-packages/ACT-0.0.0-py3.8-linux-x86_64.egg/ACT/ada_clustering_attention.py", line 103, in _create_clusters
      groups, counts = ada_cluster(hashes, n_hashes=n_hashes)
    TypeError: ada_cluster() got an unexpected keyword argument 'n_hashes'

Also, while reviewing the code in the ACT folder, I noticed that the variable q_hashes in ACT/extensions/init.py is not used.
How to use target GPU for training?
multiscale implementation
Hi, I checked the implementation, but I did not find the code for the multi-scale part. Could you add it?
Question about using multiple GPUs for training
ACT model compute flops
Dear sir,
Thanks for your work on the ACT model. I tried to integrate the ACT module into my task, modifying the network following the pattern of SMCA-DETR/Adaptive_Cluster_Transformer/. My task is an object detection task modified from DETR. After the ACT module was added, the overall training time decreased by 2 hours, and the mAP drop was only 0.6, consistent with the paper. However, when computing FLOPs, the total FLOPs of the network increased after the ACT module was added. I think this may be an error in my computation code: my FLOP calculation code is modified from the original DETR FLOPs calculation code, and I don't know whether the attention computation needs extra consideration. I would appreciate it if you could share the FLOP calculation code for the ACT module, or point out what went wrong with my calculation.
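As a sanity check on the accounting (this is not the repo's FLOP counter), the attention term alone should shrink when N keys are adaptively clustered into P prototypes: roughly 2·N²·d multiply-adds for full attention versus about 2·N·P·d after clustering, plus a clustering overhead that a generic hook-based counter may fail to attribute. A back-of-envelope sketch with made-up numbers:

```python
def attn_flops(n_q, n_k, d):
    # QK^T plus attn @ V, counted as multiply-adds
    return 2 * n_q * n_k * d

N, d, P = 1064, 256, 300       # hypothetical token count, head dim, prototype count
full = attn_flops(N, N, d)      # vanilla attention over all N keys
clustered = attn_flops(N, P, d)  # attention over P cluster prototypes
print(full, clustered)
print(clustered / full)          # == P / N, since FLOPs are linear in the key count
```

If your counter reports the clustered version as *more* expensive, it is likely double-counting the attention inside the custom ACT kernel or measuring the clustering step at full precision.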
Training COCO metrics seem good, but when the trained model is validated using --eval mode, only garbage values are printed.
Thank you for the amazing work.
I have trained the model for 50 epochs, and during training the periodic eval seems reasonable and keeps improving.
But when I use the trained model on the same validation data with the --eval argument, defining the path to "checkpoint.pth" in the d2/configs/detr_256_6_6_torchvision.yaml config, I get all zeros.
My ground truth val.json is in (xywh) format.
After investigating more, I found that the DataLoader loads ground truth boxes in normalized form, but the predicted output boxes are unnormalized, because of the following lines from the evaluate function in engine.py:

    orig_target_sizes = torch.stack([t["orig_size"] for t in targets], dim=0)
    results = postprocessors['bbox'](outputs, orig_target_sizes)

Even after fixing the predicted boxes to be normalized (or unnormalizing the ground truth boxes in the DataLoader class), I still get all zeros in the COCO detection metrics.
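For reference, what DETR's bbox post-processor does with orig_target_sizes is roughly the following: convert normalized cxcywh boxes to absolute xyxy by scaling with the original image size. A simplified sketch, not the exact repo code:

```python
import torch

def denormalize_boxes(boxes_cxcywh, orig_size):
    # boxes_cxcywh: [N, 4] in [0, 1]; orig_size: (h, w) of the unresized image
    cx, cy, w, h = boxes_cxcywh.unbind(-1)
    xyxy = torch.stack([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2], dim=-1)
    img_h, img_w = orig_size
    return xyxy * torch.tensor([img_w, img_h, img_w, img_h], dtype=xyxy.dtype)

boxes = torch.tensor([[0.5, 0.5, 0.2, 0.4]])  # one normalized cxcywh box
print(denormalize_boxes(boxes, (100, 200)))   # tensor([[ 80.,  30., 120.,  70.]])
```

COCO evaluation expects predictions in absolute coordinates against each image's original size, so mismatched "orig_size" fields in the targets (rather than the normalization itself) are a common cause of all-zero metrics.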
Though the training script calls the same function for periodic evaluation, why does using the --eval argument on the same data produce a completely different result? Am I using a different model instead of the trained one? I changed the weight field in the d2/configs/detr_256_6_6_torchvision.yaml config to the model's "checkpoint.pth".
Please let me know if you need any additional information.
Thanks.
About input format
At this line of code:

    if isinstance(samples[0], (list, torch.Tensor)):
        samples[0] = nested_tensor_from_tensor_list(samples[0])

Since samples is already a NestedTensor, why is nested_tensor_from_tensor_list still needed? Besides, have you tested the d2 version of SMCA? It does not work... the input is quite different from the original DETR.
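For background, nested_tensor_from_tensor_list in DETR essentially pads a list of variable-sized images to a common shape and records a padding mask, so a guard like the isinstance check above only matters when raw tensors (not an already-built NestedTensor) can still arrive. A minimal sketch of the padding idea, simplified from DETR's utility and not the repo's exact code:

```python
import torch

def pad_to_common(images):
    # images: list of CHW tensors with varying H, W
    max_h = max(img.shape[-2] for img in images)
    max_w = max(img.shape[-1] for img in images)
    batch = torch.zeros(len(images), images[0].shape[0], max_h, max_w)
    mask = torch.ones(len(images), max_h, max_w, dtype=torch.bool)  # True = padding
    for i, img in enumerate(images):
        c, h, w = img.shape
        batch[i, :, :h, :w] = img   # copy image into the top-left corner
        mask[i, :h, :w] = False     # mark valid (non-padded) pixels
    return batch, mask

imgs = [torch.rand(3, 4, 6), torch.rand(3, 5, 5)]
batch, mask = pad_to_common(imgs)
print(batch.shape, mask.shape)  # torch.Size([2, 3, 5, 6]) torch.Size([2, 5, 6])
```

If the d2 pipeline already delivers a padded batch plus mask, re-wrapping it is redundant, which may be the source of the input mismatch.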
Code for calculating FLOPs
Hi, could you share the code for calculating FLOPs?