gjylt / doubleattentionnet

PyTorch implementation of Double Attention Net

Python 100.00%

doubleattentionnet's Issues

two bugs

A convolution that recovers the channel number from C_m to C is missing.
A skip connection at the very end is missing (vital for performance).
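
The two points above can be illustrated with a minimal sketch. This is not the repository's code: the class name `DoubleAttentionSketch`, the attribute `conv_out`, and the choice of softmax axes are assumptions made for illustration only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DoubleAttentionSketch(nn.Module):
    """Hypothetical double attention block with an output conv and a skip connection."""

    def __init__(self, in_channels, c_m, c_n):
        super().__init__()
        self.conv_a = nn.Conv2d(in_channels, c_m, kernel_size=1)  # features A
        self.conv_b = nn.Conv2d(in_channels, c_n, kernel_size=1)  # attention maps B
        self.conv_v = nn.Conv2d(in_channels, c_n, kernel_size=1)  # attention vectors V
        # Point 1: 1x1 convolution restoring the channel count from c_m to C.
        self.conv_out = nn.Conv2d(c_m, in_channels, kernel_size=1)

    def forward(self, x):
        b, c, h, w = x.shape
        feat_a = self.conv_a(x).flatten(2)                     # (b, c_m, h*w)
        attn_b = F.softmax(self.conv_b(x).flatten(2), dim=-1)  # normalized over positions
        attn_v = F.softmax(self.conv_v(x).flatten(2), dim=1)   # normalized over c_n
        gathered = feat_a @ attn_b.transpose(1, 2)             # (b, c_m, c_n)
        distributed = gathered @ attn_v                        # (b, c_m, h*w)
        out = self.conv_out(distributed.reshape(b, -1, h, w))  # (b, C, h, w)
        # Point 2: skip connection adding the block input back.
        return x + out


block = DoubleAttentionSketch(in_channels=16, c_m=8, c_n=4)
x = torch.randn(2, 16, 5, 7)
y = block(x)  # same shape as x, so the block can sit between existing layers
```

With the output convolution in place, the input and output shapes match, which is what makes the residual addition possible.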

Wrong softmax for attention maps?

I think this line is wrong.
softmaxB = self.softmax(tmpB).view( batch, self.c_n, self.K*h*w ).permute( 0, 2, 1) #batch, self.K*h*w, self.c_n

It should be:
softmaxB = F.softmax(tmpB, dim = -1).view( batch, self.c_n, self.K*h*w )
This means the softmax is taken over the self.K * h * w axis (i.e. over each attention map).
Any thoughts?
Solved.
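
To make the difference between the two normalization axes concrete, here is a small check. The sizes are illustrative, and the shape of `tmpB` is assumed to be (batch * c_n, K * h * w), as the `view` call in the quoted line suggests; this is a sketch and not the repository's code.

```python
import torch
import torch.nn.functional as F

batch, c_n, K, h, w = 2, 3, 1, 4, 5  # illustrative sizes, not from the repository
tmpB = torch.randn(batch * c_n, K * h * w)

# Normalize each attention map over its K*h*w positions.
softmaxB = F.softmax(tmpB, dim=-1).view(batch, c_n, K * h * w)

# Every map sums to 1 across positions.
map_sums = softmaxB.sum(dim=-1)  # shape (batch, c_n)

# For contrast: softmax over the c_n axis normalizes across maps at each position.
across_maps = F.softmax(tmpB.view(batch, c_n, K * h * w), dim=1)
position_sums = across_maps.sum(dim=1)  # shape (batch, K*h*w)
```

Passing `dim` explicitly avoids relying on the implicit axis that `nn.Softmax()` picks when it is constructed without one.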

Application to temporal data

I understand that in the original paper, the authors apply the double attention block to video data. From reading the paper, I understand how to apply the double attention block between 2D conv layers, such that higher-level features are weighted and combined with lower-level features.

I can't figure out how this implementation would apply to a 5D temporal input -- Batch, Time, Height, Width, Channels. I understand that the first step, feature gathering, involves a dimension reduction, 1x1 convolutions, softmax, and bilinear pooling. Should the data be reshaped to be (B, H, W, CxT)? That seems to be my inclination from the paper -- "where each b is a dhw-dimensional row vector" -- it seems that the output of the gathering stage is dxhxw size, and doesn't incorporate the input channel size because the conv is 1x1x1.

Thoughts?
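
One possible reading of the question, offered as a sketch and not as the author's answer: keep time as its own axis, use 1x1x1 convolutions, and flatten time, height and width together so that each attention map is a dhw-dimensional row. The sizes and the names `conv_a`, `conv_b`, `conv_v` are assumptions. Note that `nn.Conv3d` expects channels-first input, so the (B, T, H, W, C) layout is permuted first.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

B, T, H, W, C = 2, 4, 6, 6, 16  # illustrative sizes
c_m, c_n = 8, 5                 # hypothetical reduced channel counts

x = torch.randn(B, T, H, W, C)
x = x.permute(0, 4, 1, 2, 3)    # channels-first (B, C, T, H, W), as nn.Conv3d expects

conv_a = nn.Conv3d(C, c_m, kernel_size=1)
conv_b = nn.Conv3d(C, c_n, kernel_size=1)
conv_v = nn.Conv3d(C, c_n, kernel_size=1)

feat_a = conv_a(x).flatten(2)                     # (B, c_m, T*H*W)
attn_b = F.softmax(conv_b(x).flatten(2), dim=-1)  # each row is a dhw-dimensional map
attn_v = F.softmax(conv_v(x).flatten(2), dim=1)   # (B, c_n, T*H*W)

gathered = feat_a @ attn_b.transpose(1, 2)        # (B, c_m, c_n)
z = (gathered @ attn_v).reshape(B, c_m, T, H, W)  # back to a 5D feature map
```

Under this reading the data is not reshaped to (B, H, W, CxT); the time axis is merged with the spatial axes only inside the attention computation.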

Why do you need to divide the batch size by self.K?

Hi,

I am wondering why you need to make the batch size smaller.

batch = int(b / self.K)  # why do we need this line???
tmpA = A.view(batch, self.K, self.c_m, h * w).permute(0, 2, 1, 3).view(batch, self.c_m, self.K * h * w)
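
What the quoted `view`/`permute` chain does to the data can be traced on a small tensor. One reading, which is an assumption and not something the source confirms, is that the first dimension holds groups of K consecutive items (for example K frames of one clip), so that attention spans all K * h * w positions of a group. The sizes are illustrative, and `reshape` replaces the final `view` because the permuted tensor is not contiguous when K > 1, in which case `view` may fail.

```python
import torch

b, K, c_m, h, w = 6, 3, 2, 2, 2  # illustrative sizes; b must be divisible by K
A = torch.arange(b * c_m * h * w, dtype=torch.float32).view(b, c_m, h, w)

batch = b // K  # number of groups of K consecutive items

# Same chain as the quoted line, with reshape as the last step (copies if needed).
tmpA = A.view(batch, K, c_m, h * w).permute(0, 2, 1, 3).reshape(batch, c_m, K * h * w)

# Channel 0 of group 0 holds channel 0 of items 0..K-1 laid end to end.
expected = torch.cat([A[k, 0].flatten() for k in range(K)])
```

With K = 1 the chain leaves the batch size unchanged and only flattens the spatial dimensions.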

What do c_m and c_n mean? Does the dimension change when the tensor passes through the layer?

My question is what c_m and c_n mean, as shown in the above picture. Another question is whether the dimensions of the tensor change when it passes through the double attention layer.
Thank you so much for your kind help.
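
A shape trace may help with both questions. In this sketch, random tensors stand in for the outputs of the three 1x1 convolutions, and the sizes are illustrative. It assumes c_m is the number of feature channels being gathered and c_n is the number of attention maps, which is how the names are used in the code quoted in the other issues.

```python
import torch

b, C, h, w = 2, 16, 4, 4  # illustrative input size
c_m, c_n = 8, 5           # c_m: feature channels, c_n: number of attention maps

A = torch.randn(b, c_m, h * w)                          # stands in for the convA output
B = torch.softmax(torch.randn(b, c_n, h * w), dim=-1)   # convB output after softmax
V = torch.softmax(torch.randn(b, c_n, h * w), dim=1)    # convV output after softmax

G = torch.einsum("bmp,bnp->bmn", A, B)  # gather: (b, c_m, c_n) global descriptors
Z = torch.einsum("bmn,bnp->bmp", G, V)  # distribute: (b, c_m, h*w)
Z = Z.reshape(b, c_m, h, w)             # spatial size kept, channels are c_m instead of C
```

So the spatial size is preserved, while the channel count becomes c_m unless c_m equals C or a further 1x1 convolution maps it back to C.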
