haomood / bilinear-cnn
PyTorch implementation of bilinear CNN for fine-grained image recognition
License: GNU General Public License v3.0
Hi, is step 2's network based on step 1's FC parameters, or does it just train a VGG-16 net from scratch?
Hi, I am confused about something in your code. The mean and std of the data normalization transform in your code are [(0.485, 0.456, 0.406), (0.229, 0.224, 0.225)]. But I computed the mean and std of the training data and got [(0.4856, 0.4994, 0.4324), (0.1817, 0.1811, 0.1927)]. When I used the mean and std I computed, the test accuracy I got was lower than yours. At first I thought maybe you used the mean and std of the whole dataset, but when I computed those I got a result very close to what I had before. So can you tell me how you got the mean and std of the data normalization transform in your code? Thank you!
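(For context: (0.485, 0.456, 0.406) and (0.229, 0.224, 0.225) are the standard ImageNet statistics that torchvision's pretrained models, including VGG-16, were trained with, rather than statistics of the fine-grained dataset itself. Below is a minimal sketch of one way to compute per-channel mean/std yourself; the dataset path and the 448 × 448 size are assumptions, not taken from the repo.)
```
import torch
import torchvision

dataset = torchvision.datasets.ImageFolder(
    'data/cub200/train',  # hypothetical path to the training images
    transform=torchvision.transforms.Compose([
        torchvision.transforms.Resize((448, 448)),
        torchvision.transforms.ToTensor(),
    ]))
loader = torch.utils.data.DataLoader(dataset, batch_size=64, num_workers=4)

n = 0
s = torch.zeros(3)   # running sum per channel
sq = torch.zeros(3)  # running sum of squares per channel
for images, _ in loader:
    n += images.numel() // 3           # pixels per channel in this batch
    s += images.sum(dim=(0, 2, 3))
    sq += (images ** 2).sum(dim=(0, 2, 3))
mean = s / n
std = (sq / n - mean ** 2).sqrt()      # std, not variance
print(mean, std)
```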
Hi, I just used your BCNN class as a module, but what I get is the same classification result for different images.
Is there something wrong?
Here is the output of the predicted class and the truth class:
predict_class: tensor([61, 61, 61, 61, 61, 61, 61, 61], device='cuda:0')
truth_class: tensor([180, 151, 187, 33, 70, 36, 109, 54], device='cuda:0')
The training process:
data = data.to(opt.device)
label = label.to(opt.device)
optimizer.zero_grad()            # clear the old gradients
score = bcnn_model(data)         # forward pass
loss = criterion(score, label)
loss.backward()                  # backward pass
optimizer.step()                 # update parameters
Hello, were you able to achieve the result reported in the paper, if I may ask?
Regarding lines 129-130 in bilinear_cnn_fc.py, I'm confused about the magic numbers in
Normalize(mean=(0.485, 0.456, 0.406),
          std=(0.229, 0.224, 0.225))
Where are these numbers from?
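For what it's worth, those numbers match the ImageNet channel statistics that torchvision's pretrained models expect. A typical preprocessing pipeline then looks something like this (the 448 × 448 size is the one used in the B-CNN paper, assumed here rather than copied from the repo):
```
import torchvision.transforms as transforms

preprocess = transforms.Compose([
    transforms.Resize((448, 448)),   # B-CNN input size from the paper
    transforms.ToTensor(),
    # ImageNet statistics, reused because the backbone is an
    # ImageNet-pretrained VGG-16.
    transforms.Normalize(mean=(0.485, 0.456, 0.406),
                         std=(0.229, 0.224, 0.225)),
])
```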
Hi Hao,
Thank you for a neat implementation.
I wonder whether training with the hyperparameters written in the README
--base_lr 1e-2 \
--batch_size 64 --epochs 25 --weight_decay 1e-5 \
--model "model.pth"
gives 84.17% test accuracy?
I used exactly the commands you provide in the README:
Step 1.
$ CUDA_VISIBLE_DEVICES=0,1,2,3 ./src/bilinear_cnn_fc.py --base_lr 1.0 \
--batch_size 64 --epochs 55 --weight_decay 1e-8 \
| tee "[fc-] base_lr_1.0-weight_decay_1e-8-epoch_.log"
Step 2.
$ CUDA_VISIBLE_DEVICES=0,1,2,3 ./src/bilinear_cnn_all.py --base_lr 1e-2 \
--batch_size 64 --epochs 25 --weight_decay 1e-5 \
--model "model.pth" \
| tee "[all-] base_lr_1e-2-weight_decay_1e-5-epoch_.log"
I trained the step 1 model and got 76.67% accuracy on the test set. I used it as initialization for the step 2 model and fine-tuned all the layers further, but the accuracy saturates at 76.61% and doesn't improve.
Are there any extra tricks to get the desired performance?
Hi Hao,
First of all, thanks for the excellent implementation. I have used the code here as a reference for my own implementations.
In the original paper (http://vis-www.cs.umass.edu/bcnn/docs/bcnn_iccv15.pdf) the authors use a signed square-root operation, something like:
X = torch.mul(torch.sign(X), torch.sqrt(torch.abs(X) + 1e-5))
instead of the plain square root you used: X = torch.sqrt(X + 1e-5)
Was there a particular reason for this?
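For comparison, a sketch of both variants side by side (not the repo's code):
```
import torch

def signed_sqrt(X, eps=1e-5):
    # Paper's version: y = sign(x) * sqrt(|x|); well defined for negative x.
    return torch.sign(X) * torch.sqrt(torch.abs(X) + eps)

def plain_sqrt(X, eps=1e-5):
    # Repo's version: only valid when every entry of X is non-negative.
    return torch.sqrt(X + eps)
```
One possible explanation for why the plain version works here: the pooled features come from relu5_3, so they are non-negative, and every entry of the pooled outer product X Xᵀ is a sum of products of non-negative numbers. The sign term is then a no-op and the two variants coincide.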
X = torch.bmm(X, torch.transpose(X, 1, 2)) / (28**2) # Bilinear
Feature map A has size (C, M) and feature map B has size (C, N).
The paper says:
"If fA and fB extract features of size C × M and C × N respectively, then Φ(I) is of size M × N."
But with your implementation the result is C × C.
However, the experiments section of the paper also seems to report your 512 × 512, i.e. C × C.
I am confused; I hope you can clarify this. Thank you.
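One way to reconcile the notation: in the paper the bilinear combination is taken per location l, fA(l, I)ᵀ fB(l, I), where fA(l, I) has size C × M. Here each location contributes a 1 × 512 feature (C = 1, M = 512), so each per-location outer product is 512 × 512, and these are pooled over locations; the code's C × C (channels × channels) is the paper's M × N with M = N = 512. A small shape check, assuming the symmetric setup of this repo:
```
import torch

B, C, H, W = 2, 512, 28, 28          # batch, channels, feature-map size
X = torch.randn(B, C, H, W)          # stands in for relu5_3 features
X = X.view(B, C, H * W)              # (B, 512, 784): one column per location
phi = torch.bmm(X, X.transpose(1, 2)) / (H * W)
print(phi.shape)                     # torch.Size([2, 512, 512])
```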
Thank you very much for your code! But where can I find the model for fine-tuning? Or does it need to be trained by myself?
Hi, thank you for your code.
I wonder why there's only one feature extractor in the BCNN class.
I thought a B-CNN has two feature extractors; can you explain, please?
Thank you
May I ask what the respective roles of torch.sqrt and F.normalize are?
Thank you.
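For reference, the paper applies two normalization steps to the pooled bilinear feature: an element-wise (signed) square root, then l2 normalization of the flattened vector. A minimal sketch of how the two calls line up with those steps (shapes assumed from the repo's 512-channel setup):
```
import torch
import torch.nn.functional as F

phi = torch.rand(2, 512, 512)       # pooled bilinear features (non-negative)
phi = phi.view(phi.size(0), -1)     # flatten to (B, 512*512)
phi = torch.sqrt(phi + 1e-5)        # square-root normalization
phi = F.normalize(phi)              # l2-normalize each sample (p=2, dim=1)
```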
Hi! After 2 epochs the backward pass runs out of memory :( The first epoch is okay, but it crashes on the second one. It seems that it stores the graph or something; I changed some things, but it still crashes:
```
for X, y in self._train_loader:
    # Data.
    X = X.cuda()
    y = y.cuda()
    # Forward pass.
    score = self._net(X)
    loss = self._criterion(score, y.long())
    with torch.no_grad():
        epoch_loss += loss.item()
        # Prediction.
        prediction = torch.argmax(score, dim=1)
        num_total += y.size(0)
        num_correct += torch.sum(prediction == y.long()).item()
    # Backward pass.
    self._optimizer.zero_grad()  # clear the existing gradients
    loss.backward()
    self._optimizer.step()
    total_batches += 1
    del X, y, score, loss, prediction
```
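For what it's worth, a self-contained illustration of the "stored graph" failure mode this question suspects (hypothetical toy model, not the repo's code); note that the snippet above already uses loss.item(), which avoids this particular trap:
```
import torch

net = torch.nn.Linear(10, 2)
optimizer = torch.optim.SGD(net.parameters(), lr=0.1)
criterion = torch.nn.CrossEntropyLoss()

epoch_loss = 0.0
for _ in range(100):
    X = torch.randn(8, 10)
    y = torch.randint(0, 2, (8,))
    loss = criterion(net(X), y)
    # epoch_loss += loss       # BAD: keeps every iteration's graph alive
    epoch_loss += loss.item()  # OK: stores a plain float, the graph is freed
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```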
In the forward function of the BCNN class, the bilinear operation is
X = torch.bmm(X, torch.transpose(X, 1, 2)) / (28**2) # Bilinear
Why does the result of the matrix multiplication need to be divided by (28 ** 2)?
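For context, assuming the 448 × 448 inputs from the paper: the relu5_3 feature map is 512 × 28 × 28, so 28² = 784 is the number of spatial locations. The line computes Φ = (1/784) Σₗ xₗ xₗᵀ, i.e. it averages the per-location outer products rather than just summing them; the division is a normalization by the number of pooled locations, not part of the bilinear product itself.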
@HaoMood, hello, and thank you very much for your work.
When I ran your code, the test accuracy of step 1 was 76%, and the best saved model was vgg_16_epoch_21.pth.
But in step 2, loading this vgg_16_epoch_21.pth model gives a train accuracy of 1% and a test accuracy of 0.
What could be the problem?
Hi, thanks a lot for your code! Everything works well when I only use one GPU by setting CUDA_VISIBLE_DEVICES=0 (for example), but when I use multiple GPUs by setting CUDA_VISIBLE_DEVICES=0,1 (for example), the process becomes a zombie: it is not actually training, but it still holds the GPU and CPU resources. Worse, you cannot even kill it with "kill -9 PID"; the only option is a reboot. Have you come across this issue before? Thanks a lot!
How do we obtain the model.pth file for fine-tuning all the layers?
In your README, I see you used 4 GPUs. How much GPU memory was used in total in your step 1?
Hi, this is a concise and useful implementation of bilinear CNNs. However, the paper says that an
"elementwise signed square-root (y ← sign(x)√|x|) and l2 normalization is applied to the matrix A",
which means the result should be multiplied by the sign. But this code uses just "X = torch.sqrt(X + 1e-5)".
Am I missing something? And even though this is not exactly the same, I got the same result (84.2%), which suggests it is still correct?