Comments (47)

Geeshang avatar Geeshang commented on August 13, 2024

The new download link includes the "AttentionCrop" layer.
https://1drv.ms/u/s!Ak3_TuLyhThpkxifVPt-w8e-axc5

q5390498 avatar q5390498 commented on August 13, 2024

thank you very very much!

Actasidiot avatar Actasidiot commented on August 13, 2024

I'm not familiar with Caffe. It seems that the authors haven't released the code for the training process.
Has anyone implemented the alternating training process mentioned in this paper?

q5390498 avatar q5390498 commented on August 13, 2024

@kawhiwang I am sorry, I cannot open the link "https://1drv.ms/u/s!Ak3_TuLyhThpkxifVPt-w8e-axc5". Have you downloaded these files? Can you share them on Baidu Yun?

Actasidiot avatar Actasidiot commented on August 13, 2024

Of course.
You can download it from http://pan.baidu.com/s/1pL6vS63.
If you are also going to implement the training process, I hope to get your help.

clover978 avatar clover978 commented on August 13, 2024

@kawhiwang @q5390498 I am also working on this problem. Did you make any progress on how to train RA_CNN?

lizh0019 avatar lizh0019 commented on August 13, 2024

Has anyone tested the performance on the 200-bird dataset? I have created the standard lmdb (short side 448) and tested the test images (cropped to 448*448) using the given caffemodel and deploy.prototxt, but the accuracy is about 1/200, i.e. random guessing. I have tried my best to check the whole process but haven't found any bug. Can anyone help?
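
For the record, the preprocessing I mean is roughly this (a sketch, not my exact script; variable names are illustrative):

import cv2

im = cv2.imread(path)                          # path: one CUB test image
h, w = im.shape[:2]
scale = 448.0 / min(h, w)                      # resize so the short side is 448
im = cv2.resize(im, (int(round(w * scale)), int(round(h * scale))))
h, w = im.shape[:2]
y0, x0 = (h - 448) // 2, (w - 448) // 2        # then take a 448x448 center crop
im = im[y0:y0 + 448, x0:x0 + 448]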

clover978 avatar clover978 commented on August 13, 2024

@lizh0019 How do you define the labels in the dataset? The labels in CUB are [1, 2, ..., 200]; you should shift them to [0, 1, ..., 199]. I tested 500 samples on CUB-200-2011; I don't remember the exact result, but I am sure it exceeds 80%. Besides, I directly resized the images to a fixed size (448*448) rather than cropping; anyway, I don't think that makes much difference.
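
Concretely, the shift is just something like this (a sketch; image_class_labels.txt is the standard CUB-200-2011 annotation file with one "image_id class_id" pair per line):

with open('CUB_200_2011/image_class_labels.txt') as f:
    # class_id is 1-based in the file; subtract 1 to get Caffe labels in [0, 199]
    labels = {int(img_id): int(cls) - 1
              for img_id, cls in (line.split() for line in f)}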

Actasidiot avatar Actasidiot commented on August 13, 2024

@clover978
Do you directly resize the images in CUB_200_2011/images to [448*448]?
Do we need to crop the birds out of the background according to bounding_boxes.txt?

lizh0019 avatar lizh0019 commented on August 13, 2024

@clover978 Thanks! I applied an offset of 1 (modulo 200) to the labels, and the accuracy is around 85%. But it seems we need significant modifications to deploy.prototxt for training? Have you ever tried retraining the caffemodel?

clover978 avatar clover978 commented on August 13, 2024

@kawhiwang Yes, I just resize the original image to [448*448]. The performance is 80%~85%; I think it would be better to keep the aspect ratio of the images, like @lizh0019 does. As for the bounding box, I don't think we should use that information: according to the paper, RA-CNN does not need bounding-box annotations in training or testing.

clover978 avatar clover978 commented on August 13, 2024

@lizh0019 Given the caffemodel, we can load and parse it, which means we can then generate the train_val.prototxt. Actually, I have done this, and the train_val.prototxt is not too different from deploy.prototxt. The problem is that I don't know how to initialize the network, so I didn't manage to train from scratch.

clover978 avatar clover978 commented on August 13, 2024

@lizh0019 https://gist.github.com/clover978/1969d4648458cf876af92b3507856d70
I made it a gist: how to load and parse a caffemodel file. I think the result should be identical to the prototxt file used for training.
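
The core of it is something like this (a sketch; paths are placeholders, and old models may use the 'layers' field instead of 'layer'):

from caffe.proto import caffe_pb2

net_param = caffe_pb2.NetParameter()
with open('RA_CNN.caffemodel', 'rb') as f:
    net_param.ParseFromString(f.read())     # a caffemodel is a binary NetParameter

for layer in net_param.layer:
    del layer.blobs[:]                      # drop the weights, keep the layer defs
with open('train_val_generated.prototxt', 'w') as f:
    f.write(str(net_param))                 # protobuf text format == prototxt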

lizh0019 avatar lizh0019 commented on August 13, 2024

@clover978 Thanks a lot! I noticed that all the lr_mult and decay_mult values in the generated train_val.prototxt are 0.0. According to the paper's "Training strategy" section, I think the authors might have a script that generates different train_val.prototxt files with different lr_mult and decay_mult values: keep some at 0.0 and the others at 1.0, swap them in the next super-iteration (each super-iteration is a whole training process), and repeat many such super-iterations?

clover978 avatar clover978 commented on August 13, 2024

@lizh0019 The lr_mult and decay_mult are set from experience; I referred to vgg_train_val.prototxt. As for the training process, we share the same idea: to be exact, freeze the parameters of the APN and the VGG alternately. The prototxt should not differ from the generated one too much; we just need to modify lr_mult and decay_mult.
I was stuck at the next step: the scale1 stem takes images of size 448*448, but VGG's input is 224*224, so I fine-tuned another VGG on the CUB dataset, but the accuracy is around 0.5, while according to the paper it should be 0.79. The input of the scale2 and scale3 stems is 224*224, so a fine-tuned VGG with a 224*224 input size is also needed, but I didn't try it.
I am not sure whether I made any mistakes. If you spot any, or you get a different result when repeating the training process, please let me know.
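
For the freezing itself, the prototxt fragment would look something like this (the layer shown is just an example from the VGG stem):

layer {
  name: "conv5_4"
  type: "Convolution"
  bottom: "conv5_3"
  top: "conv5_4"
  # lr_mult/decay_mult = 0 freeze this layer for the current super-iteration;
  # set them back to 1 (and zero the APN layers instead) in the next one
  param { lr_mult: 0 decay_mult: 0 }   # filters
  param { lr_mult: 0 decay_mult: 0 }   # bias
  convolution_param { num_output: 512 pad: 1 kernel_size: 3 }
}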

lizh0019 avatar lizh0019 commented on August 13, 2024

@clover978 You mean you fine-tuned a VGG model with 448x448 input from the pretrained VGG-19 model? I tried this by just changing the train_val.prototxt input size from 224x224 to 448x448, and the accuracy is about 0.79; conv5_4's output size is 512*14*14 (was 512*7*7).

clover978 avatar clover978 commented on August 13, 2024

@lizh0019 Great! It seems there were some bugs in my early work. BTW, what's the performance of the fine-tuned model with 224*224 input size? If it works as expected, I think it's possible to start training from those pre-trained models.
Besides, can I talk to you via some IM software like QQ to debug my fine-tuning process? Thanks.

lizh0019 avatar lizh0019 commented on August 13, 2024

@clover978 I am still not clear on how to use the pre-trained VGG for scales 2 and 3. The layer names in scale 2 are something like "conv2_2_A"; how can we borrow the weights of "conv2_2" from the pretrained model?

clover978 avatar clover978 commented on August 13, 2024

@lizh0019 It's similar to a siamese network.
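
Something like standard net surgery should do it (a sketch; file names are placeholders and the layer list needs extending to every renamed layer):

import caffe

vgg = caffe.Net('vgg19_deploy.prototxt', 'vgg19.caffemodel', caffe.TEST)
racnn = caffe.Net('ra_cnn_train_val.prototxt', caffe.TRAIN)

for src, dst in [('conv2_2', 'conv2_2_A')]:    # extend for every shared layer
    for i in range(len(vgg.params[src])):      # i = 0: weights, i = 1: bias
        racnn.params[dst][i].data[...] = vgg.params[src][i].data
racnn.save('ra_cnn_scale2_init.caffemodel')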

lizh0019 avatar lizh0019 commented on August 13, 2024

@clover978 Hi Caffe_Expert, can you send an email to my Hotmail so we can exchange WeChat or QQ contacts?

clover978 avatar clover978 commented on August 13, 2024

@lizh0019 That's weird. I have sent an email to [email protected]. It seems you didn't receive it.

QQQYang avatar QQQYang commented on August 13, 2024

It seems that the generated train_val.prototxt does not include a loss layer. In the original paper, the whole loss consists of two parts. Do we need to create a new loss layer to calculate the pairwise ranking loss?
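
From my reading of the paper, the ranking part for one sample at two adjacent scales is something like this (a sketch; the margin value is my guess, not confirmed):

def rank_loss(pt_coarse, pt_fine, margin=0.05):
    # pt_*: softmax probability of the ground-truth class at scale s and s+1;
    # the loss pushes the finer scale to beat the coarser one by `margin`
    return max(0.0, pt_coarse - pt_fine + margin)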

liangzimei avatar liangzimei commented on August 13, 2024

@clover978 Hi, I used your script 'net.py' to parse RA_CNN.caffemodel. However, the output net structure doesn't contain any loss layer. Do you know why? Thanks...

clover978 avatar clover978 commented on August 13, 2024

@liangzimei Sorry, I didn't get a loss layer either. I didn't remember that I had modified the generated prototxt until you reminded me; I'll delete my incorrect comment. It seems that implementing the rank loss is still an open problem, and we need more information from the author.

simo23 avatar simo23 commented on August 13, 2024

Hi everyone,
I'm trying to replicate the 0.79 accuracy result. Thanks for the great details you are sharing. I just have some questions, if anyone can help:

@lizh0019 Can you share the details of your training process that achieves 0.79 accuracy? You said that you modified the prototxt to change the input from 224 to 448, but do you mean the VGG net trained by the VGG authors, or the RA-CNN model?

The standard VGG has max-pool layers that pass a 7x7x512 feature map (from conv5) to the FC layers. So if I want to use the authors' weights on a 448x448 input, the way I think is correct, also based on the VGG paper, is to:

  • Feed in the 448x448 image, which will give a 14x14x512 feature map
  • Then apply the 7x7x512 FC layer 4 times, once per quadrant of the 14x14 map (top-left, top-right, bottom-left, bottom-right)
  • Average the 200 class-score outputs of the 4 parts to get the final score

The VGG authors convert the 7x7x512 FC into CONV layers, so this step comes for free, but then we need to implement the averaging of the 4 scores, right? I read this in the VGG paper https://arxiv.org/abs/1409.1556v6, section 3.2 "Testing", where they say:

"The result is a class score map with the number of channels equal to the number of classes, and a variable spatial resolution, dependent on the input image size. Finally, to obtain a fixed-size vector of class scores for the image, the class score map is spatially averaged (sum-pooled)."
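
In numpy terms, that final step would be something like this (a sketch; the shapes assume a 448x448 input, and random data stands in for the real score map):

import numpy as np

# class-score map from the fully-convolutionalized VGG: (num_classes, H', W')
score_map = np.random.rand(200, 2, 2)
final_scores = score_map.mean(axis=(1, 2))   # spatial average (sum-pooling)
pred_class = int(final_scores.argmax())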

Lastly, can you share the train_val.prototxt?

Thanks, Andrea

super-wcg avatar super-wcg commented on August 13, 2024

@clover978 Could you send your QQ number to my e-mail [email protected]? I want to get the train_prototxt, and I want to ask you some questions.

chenfeima avatar chenfeima commented on August 13, 2024

@clover978 I also need the train_prototxt, and I want to know how you got it.

chenfeima avatar chenfeima commented on August 13, 2024

@lizh0019 How did you get 85% accuracy? I ran the test net and only got 83%.

chenfeima avatar chenfeima commented on August 13, 2024

@kawhiwang Hello, I retrained the model, and scale2's accuracy is lower than scale1's. However, according to the paper, scale2's accuracy should be higher than scale1's. I want to know whether the "AttentionCrop" layer is the one given by the author? If you have retrained the model, can you help me?

Michael-Jing avatar Michael-Jing commented on August 13, 2024

@chenfeima The AttentionCrop layer is given by the author; it's in the layers folder. By the way, can you share your train_prototxt with me?

chenfeima avatar chenfeima commented on August 13, 2024

@Michael-Jing I really need your help. My QQ number is 1691767172.

Michael-Jing avatar Michael-Jing commented on August 13, 2024

@chenfeima Sorry, I don't normally use QQ. Also, Fu has published a paper called Multi-Attention CNN, and its released code doesn't use custom layers; you can check that out.

chenfeima avatar chenfeima commented on August 13, 2024

@Michael-Jing I have known about the new paper for a long time. Maybe retraining one model teaches you more than examining ten other people's caffemodels. Ha-ha.

cs2103t avatar cs2103t commented on August 13, 2024

@clover978 @lizh0019
Hi, may I ask how you create the lmdb?

import cv2
import caffe

# single_file, label, counter, txn (an lmdb write transaction) defined elsewhere
im = cv2.imread(single_file)                   # BGR, HxWxC
im = cv2.resize(im, (448, 448))
im = im.transpose((2, 0, 1))                   # HWC -> CHW, as Caffe expects
im_dat = caffe.io.array_to_datum(im, label)    # label must be an int in [0, 199]
txn.put(str(counter), im_dat.SerializeToString())

This is how I read the images and store them to lmdb. However, I can't reproduce the accuracy. Can you help me with this? Thank you very much in advance.

chenfeima avatar chenfeima commented on August 13, 2024

@cs2103t Where do your labels come from? From this code alone, your labels would all be zero.

cs2103t avatar cs2103t commented on August 13, 2024

@chenfeima Thank you for the quick reply! I got the label from the folder name. As the folders are indexed from 1 to 200, I can just use that, right? Currently I only get random-guess accuracy.
subfolder = subdir.split('/')[1]          # e.g. '001.Black_footed_Albatross'
label = int(subfolder.split('.')[0])      # gives 1..200

chenfeima avatar chenfeima commented on August 13, 2024

@cs2103t Maybe you need to shift the labels into the range [0, 199].

cs2103t avatar cs2103t commented on August 13, 2024

@chenfeima So I just subtract 1 from every index? I tried that too, but I still get random-guess accuracy.

cocowf avatar cocowf commented on August 13, 2024

@Michael-Jing Sorry, I can't find the AttentionCrop layer in the folder. Can you help me?

Michael-Jing avatar Michael-Jing commented on August 13, 2024

@cocowf
It's located in caffe/src/caffe/layers.

Michael-Jing avatar Michael-Jing commented on August 13, 2024

@cocowf
Yes, you should compile Caffe. As for the loss function, I don't have any knowledge to share on that; someone else here may be able to help.

 avatar commented on August 13, 2024

Hi everyone, I'm trying to read the images from Caltech-UCSD Birds 200, but there aren't any images, only a single 659 MB file?...

ouceduxzk avatar ouceduxzk commented on August 13, 2024

For those who want to reproduce the work, let's collaborate. Here is my initial work: https://github.com/ouceduxzk/Fine_Grained_Classification/tree/master/RA-CNN

zhiAung avatar zhiAung commented on August 13, 2024

Who has a PyTorch version?

22wei22 avatar 22wei22 commented on August 13, 2024

Who has a PyTorch version? Thanks.
