Hello Luo, when I pretrain the VSEFCmodel, the vse_loss doesn't converge well, j

the retrieval loss doesn't converge well about disccaptioning HOT 11 OPEN

ruotianluo commented on August 19, 2024

the retrieval loss doesn't converge well

from disccaptioning.

Comments (11)

ruotianluo commented on August 19, 2024

Pair loss is worse, and VSEAttModel gives worse results too.

ruotianluo commented on August 19, 2024

That's very common in the first several epochs. Try training a little longer, or just restart the training.
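
For context, a retrieval loss of this kind is often a bidirectional hinge ranking loss over an image-caption similarity matrix. The following is a minimal sketch under that assumption. The function name `contrastive_loss`, the `margin` default, and the `max_violation` switch are illustrative and are not taken from the disccaptioning code.

```python
def contrastive_loss(scores, margin=0.2, max_violation=False):
    """Bidirectional hinge ranking loss over a square similarity matrix.

    scores[i][j] is the similarity between image i and caption j;
    the diagonal holds the matching (positive) pairs.
    NOTE: illustrative sketch, not the repository's implementation.
    """
    n = len(scores)
    total = 0.0
    for i in range(n):
        pos = scores[i][i]
        # Image i as the query: every other caption j is a negative.
        cost_caption = [max(0.0, margin + scores[i][j] - pos)
                        for j in range(n) if j != i]
        # Caption i as the query: every other image j is a negative.
        cost_image = [max(0.0, margin + scores[j][i] - pos)
                      for j in range(n) if j != i]
        if max_violation:
            # Keep only the hardest negative in each direction.
            total += max(cost_caption, default=0.0) + max(cost_image, default=0.0)
        else:
            # Sum over all negatives.
            total += sum(cost_caption) + sum(cost_image)
    return total


# Toy similarity matrices (made-up values for illustration).
well_separated = [[1.0, 0.0], [0.0, 1.0]]
confused = [[0.5, 0.5], [0.5, 0.5]]
print(contrastive_loss(well_separated))
print(contrastive_loss(confused))
```

When positives and negatives score alike, every pair violates the margin and the loss stays high. When the positives clearly win, the loss drops to zero.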

qq283215389 commented on August 19, 2024

OK, thanks a lot. What about the other VSE model (VSEAttModel) and the "pair loss", whose results aren't shown in your CVPR 2018 paper "Discriminability objective for training descriptive captions"?

qq283215389 commented on August 19, 2024

Thanks! If the retrieval model performs better (as in the paper "Stacked Cross Attention for Image-Text Matching"), can we get a better result for the captioning model?

ruotianluo commented on August 19, 2024

I think it's very likely.

qq283215389 commented on August 19, 2024

Hello Luo,
This is my result from pre-training the retrieval model after running "run_fc_con.sh". It still differs from the retrieval-model result presented in your paper.
Result:
Average i2t Recall: 53.9
Image to text: 29.9 59.2 72.6 4.0 19.6
Average t2i Recall: 42.3
Text to image: 20.6 46.5 59.8 7.0 40.8
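
A quick sanity check on these numbers, assuming each row lists R@1, R@5, R@10, median rank, and mean rank in that order. That column order is an assumption based on a common convention in retrieval evaluation scripts; the log itself does not label the columns. Under that reading, the "Average" lines would be the mean of the first three entries. The helper name `average_recall` is hypothetical.

```python
# Values copied from the log above.
# ASSUMPTION: columns are R@1, R@5, R@10, median rank, mean rank.
i2t = [29.9, 59.2, 72.6, 4.0, 19.6]
t2i = [20.6, 46.5, 59.8, 7.0, 40.8]


def average_recall(row):
    """Mean of the first three entries (assumed to be R@1, R@5, R@10)."""
    return round(sum(row[:3]) / 3, 1)


print(average_recall(i2t))
print(average_recall(t2i))
```

Both values agree with the reported averages (53.9 and 42.3), which is consistent with the assumed column order.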

ruotianluo commented on August 19, 2024

Did you download my pretrained model? Does it perform better, matching what's reported in the paper?
https://drive.google.com/open?id=1oQ_O-O2KoSQv1xdBPKaIOGt-VW0gS-42
These are my training curves, to give you a hint.

qq283215389 commented on August 19, 2024

I may have found the problem: I used a size of 7x7 for the COCO fc features. I think you used 14x14 for the COCO fc features?

ruotianluo commented on August 19, 2024

The fc feature doesn't have spatial dimensions; it's a vector.
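
To illustrate the distinction, here is a minimal sketch assuming the fc feature comes from global average pooling over a CNN feature map, which is a common setup but is not confirmed in this thread. The sizes and the name `global_average_pool` are hypothetical; real features would have far more channels.

```python
# Hypothetical sizes: a 14x14 spatial grid with 4 channels.
H, W, C = 14, 14, 4

# A toy spatial feature map: one C-dimensional vector per grid location.
att_feat = [[[float(h * W + w + c) for c in range(C)]
             for w in range(W)]
            for h in range(H)]


def global_average_pool(feature_map):
    """Average over all spatial positions, leaving one value per channel."""
    height, width = len(feature_map), len(feature_map[0])
    channels = len(feature_map[0][0])
    pooled = [0.0] * channels
    for row in feature_map:
        for cell in row:
            for c, value in enumerate(cell):
                pooled[c] += value
    return [total / (height * width) for total in pooled]


fc_feat = global_average_pool(att_feat)
print(len(att_feat), len(att_feat[0]), len(att_feat[0][0]))  # grid height, width, channels
print(len(fc_feat))  # one value per channel, no spatial axes
```

Under this assumption, a grid size such as 7x7 or 14x14 would only affect the spatial feature map. The pooled vector has the same length either way.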

qq283215389 commented on August 19, 2024

I found that other papers use Karpathy's split for COCO, while your paper uses Rama's split. Is the test data the same? Why can you compare your results with the results in self-critical?

ruotianluo commented on August 19, 2024

The splits are different. The self-critical result is from my implementation on Rama's split. We use Rama's split because we need to compare our results to Rama's.
