ruotianluo / disccaptioning

Code for Discriminability objective for training descriptive captions(CVPR 2018)

Python 97.73% Shell 1.32% HTML 0.95%

disccaptioning's Issues

ValueError: sampler should be an instance of torch.utils.data.Sampler

pytorch 1.0.0

bash eval.sh att_d1 test

Traceback (most recent call last):
File "eval.py", line 146, in <module>
vars(opt))
File "/content/DiscCaptioning/eval_utils.py", line 92, in eval_split
data = loader.get_batch(split)
File "/content/DiscCaptioning/dataloader.py", line 137, in get_batch
ix, tmp_wrapped = self._prefetch_process[split].get()
File "/content/DiscCaptioning/dataloader.py", line 256, in get
self.reset()
File "/content/DiscCaptioning/dataloader.py", line 235, in reset
collate_fn=lambda x: x[0]))
File "/usr/local/lib/python2.7/dist-packages/torch/utils/data/dataloader.py", line 805, in __init__
batch_sampler = BatchSampler(sampler, batch_size, drop_last)
File "/usr/local/lib/python2.7/dist-packages/torch/utils/data/sampler.py", line 146, in __init__
.format(sampler))
ValueError: sampler should be an instance of torch.utils.data.Sampler, but got sampler=[......]
Terminating BlobFetcher

Evaluation error: KeyError: 'att_masks' (no att_masks in data)

File "/home/jzheng/PycharmProjects/DiscCaptioning/eval_utils.py", line 114, in eval_split
data['att_masks'][np.arange(loader.batch_size) * loader.seq_per_img]]
KeyError: 'att_masks'
There is no att_masks key in the data dict, nor are there labels or masks keys. Am I missing something?

I'm testing on the val2014 dataset.

How to train the TopDown model?

When I try to run the TopDown model, I get the following error:

File "/home/code/DiscCaptioning/models/AttModel.py", line 476, in forward
att_lstm_input = torch.cat([prev_h, fc_feats, xt], 1)
RuntimeError: zero-dimensional tensor (at position 0) cannot be concatenated

Could you please tell me which parts of the code I need to modify so that I can train the TopDown model?

Why is the evaluation result so bad (Bleu_4 is 0.000, METEOR is 0.009)? Also, how do I generate captions on a custom dataset?

tokenization...
PTBTokenizer tokenized 99344 tokens at 721579.12 tokens per second.
PTBTokenizer tokenized 16786 tokens at 239467.86 tokens per second.
setting up scorers...
computing Bleu score...
{'reflen': 15132, 'guess': [15165, 13543, 11921, 10299], 'testlen': 15165, 'correct': [30, 0, 0, 0]}
ratio: 1.00218080888
Bleu_1: 0.002
Bleu_2: 0.000
Bleu_3: 0.000
Bleu_4: 0.000
computing METEOR score...
METEOR: 0.009
computing Rouge score...
ROUGE_L: 0.002
computing CIDEr score...
CIDEr: 0.001
computing SPICE score...
Parsing reference captions
Parsing test captions
SPICE evaluation took: 2.144 s
SPICE: 0.002
loss: {'loss': tensor(31.6388, device='cuda:0'), 'cap_xe': tensor(31.6419, device='cuda:0'), 'retrieval_loss_greedy': tensor(7.4241, device='cuda:0'), 'retrieval_sc_loss': tensor(1.00000e-03 * -3.1324, device='cuda:0'), 'loss_vse': tensor(0., device='cuda:0'), 'loss_cap': tensor(31.6419, device='cuda:0'), 'retrieval_loss': tensor(7.6047, device='cuda:0')}
{u'SPICE_Object': '0.006404463463649654', u'SPICE_Cardinality': '0.0', u'SPICE_Attribute': '0.0', 'CIDEr': '0.001079661462843171', u'SPICE_Size': '0.0', 'Bleu_4': 1.04439324421061e-15, 'Bleu_3': 2.3054219540753186e-14, 'Bleu_2': 1.208598304910465e-11, 'Bleu_1': 0.001978239366963272, u'SPICE_Color': '0.0', 'ROUGE_L': '0.001795472073475935', 'METEOR': 0.009059195566343728, u'SPICE_Relation': '0.0', 'SPICE': '0.0024048127567198488'}
Terminating BlobFetcher

Is this evaluating the image caption model? It looks like the retrieval model.
image 474190: woods conditioner china memorial scraper sash bringing woods interstate sunroof distant
image 277907: woods pairs china listed want listed bringing woods crowd
image 43033: woods hanging service woods peep dinosaurs cooking wonder
image 542103: woods conditioner china memorial gooey bringing cooking gain woody adorable
image 356116: woods majestically rice bringing cooking gain woody woods peep
image 538581: woods hanging service woods windsurfer dinosaurs cooking weeds woody woods windsurfer
image 359354: woods hanging effects woods silver dinosaurs woods silver
image 457146: woods captive honk bringing retrieve china woods holds
image 75305: woods majestically honk lots woods goofing woody woods silver
image 249968: woods troll honk bringing cooking fir china woods bubble foreheads
image 480451: woods hanging catchers woods tightly hollow bringing woods tightly hitting
image 379596: woods hangings china pouches want pouches bringing woods goofing
image 322362: woods patch benched honk bringing woods holds woody woods overgrowth gains
image 495233: woods conditioner china memorial honk bringing woods caddy musical woods overgrowth gains
image 366948: woods conditioner china lipstick rice dinosaurs woods mirrors
image 332833: woods burrito levels honk bringing cooking lock woody woods keypad
image 512346: woods hanging service woods draining buddhist dinosaurs woods peek
evaluating validation preformance... 2049/5000 (31.236956)
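The scorer counts in the log are enough to recompute Bleu_1 by hand, which makes the scale of the problem concrete: only 30 of 15165 predicted unigrams match a reference. The sketch below uses the standard BLEU definition (n-gram precision times a brevity penalty). It is not the scorer's actual code; the non-zero higher-order values in the log suggest the scorer adds small smoothing constants, which are omitted here.

```python
from __future__ import division  # keeps "/" as true division on Python 2 as well

import math

# Counts copied from the scorer output in the log above.
stats = {
    "reflen": 15132,
    "testlen": 15165,
    "guess": [15165, 13543, 11921, 10299],
    "correct": [30, 0, 0, 0],
}

# Modified n-gram precision for n = 1..4: matched n-grams / predicted n-grams.
precisions = [c / g for c, g in zip(stats["correct"], stats["guess"])]

# Length ratio, printed as "ratio:" in the log.
ratio = stats["testlen"] / stats["reflen"]

# Standard brevity penalty: 1 when the predictions are not shorter than the references.
brevity_penalty = 1.0 if ratio >= 1.0 else math.exp(1.0 - 1.0 / ratio)

bleu_1 = brevity_penalty * precisions[0]

print("unigram precision: %.6f" % precisions[0])
print("ratio: %.6f" % ratio)
print("Bleu_1: %.3f" % bleu_1)
```

A unigram precision this low, together with the sampled captions above, points to the predicted words being almost unrelated to the references rather than to a scoring problem.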

FileNotFoundError: [Errno 2] No such file or directory: 'cider/data/cocotalk_fc\\391895.npy'

After spending a whole afternoon fixing 87 bugs in this repo one by one, I was finally able to start training, but, not surprisingly, I got stuck on another issue. This time I have no idea how to fix it, as there are barely any related Q&As on Google (e.g. What is 391895.npy? Where can I find it? What can replace it? How can I bypass this seemingly simple FileNotFoundError?). Could anyone kindly give me a hint on how to sort this out? Many thanks.

The code to trigger this issue (after following all the previous steps): bash run_fc_con.sh
The error message:

DataLoader loading json file: data/cocotalk.json
vocab size is 9487
DataLoader loading h5 file: data/cocotalk_fc data/cocobu_att data/cocotalk_label.h5
max sequence length in data is 16
read 123287 image features
assigned 113287 images to split train
assigned 5000 images to split val
assigned 5000 images to split test

Traceback (most recent call last):
File "train.py", line 242, in <module>
train(opt)
File "train.py", line 109, in train
data = loader.get_batch('train')
File "D:\Project\DiscCaptioning\dataloader.py", line 138, in get_batch
ix, tmp_wrapped = self._prefetch_process[split].get()
File "D:\Project\DiscCaptioning\dataloader.py", line 264, in get
tmp = self.split_loader.next()
File "D:\Anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 521, in __next__
data = self._next_data()
File "D:\Anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 561, in _next_data
data = self._dataset_fetcher.fetch(index) # may raise StopIteration
File "D:\Anaconda3\lib\site-packages\torch\utils\data\_utils\fetch.py", line 44, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "D:\Anaconda3\lib\site-packages\torch\utils\data\_utils\fetch.py", line 44, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "D:\Project\DiscCaptioning\dataloader.py", line 205, in __getitem__
return (np.load(os.path.join("cider/" + self.input_fc_dir, str(self.info['images'][ix]['id']) + '.npy')), np.zeros((1,1)), ix)
File "D:\Anaconda3\lib\site-packages\numpy\lib\npyio.py", line 428, in load
fid = open(os_fspath(file), "rb")
FileNotFoundError: [Errno 2] No such file or directory: 'cider/data/cocotalk_fc\\391895.npy'
Terminating BlobFetcher

(After downloading and unzipping cocotalk_fc.tar from Google Drive, I obtain a folder cocotalk_fc with the following structure:
cocotalk_fc (the root folder obtained from unzipping) -> cocotalk_fc (a binary file)
After moving it under the directory cider, the full path looks like this:
DiscCaptioning -> cider -> cocotalk_fc (the root folder after unzipping) -> cocotalk_fc (a binary file)
Simply renaming the binary cocotalk_fc file to 391895 or 391895.npy does not help either; it still throws the same error, so I'm stuck.)
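The traceback shows how the loader builds the path: it joins "cider/" + input_fc_dir with the image id plus ".npy", so 391895 is an image id and the loader expects one .npy file per image inside cider/data/cocotalk_fc, not inside cider/cocotalk_fc. A minimal sketch for checking what is actually on disk follows; the helper names are hypothetical and not part of the repo, and the "cider/" prefix is copied from the traceback.

```python
import os


def expected_feature_path(input_fc_dir, image_id, prefix="cider/"):
    # Mirrors the expression in the traceback:
    # os.path.join("cider/" + self.input_fc_dir, str(image_id) + '.npy')
    return os.path.join(prefix + input_fc_dir, str(image_id) + ".npy")


def describe_feature_dir(input_fc_dir, prefix="cider/"):
    """Hypothetical helper: summarize what exists at the location the loader reads."""
    root = prefix + input_fc_dir
    if not os.path.isdir(root):
        return "missing directory: %s" % root
    names = os.listdir(root)
    npy_files = [name for name in names if name.endswith(".npy")]
    return "%s holds %d entries, %d of them .npy files" % (
        root, len(names), len(npy_files))


if __name__ == "__main__":
    # "data/cocotalk_fc" is the feature directory printed in the log above.
    print(expected_feature_path("data/cocotalk_fc", 391895))
    print(describe_feature_dir("data/cocotalk_fc"))
```

If the directory is missing or holds no .npy files, the features are either in a different location or still packed; a single binary file named cocotalk_fc may itself be an archive that has not been extracted, although the thread does not confirm this.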

Similar work

I think your work is very similar to "Deep Reinforcement Learning-based Image Captioning with Embedding Reward". What are the differences between the two?

What's the difference between Rama's split and Karpathy's split?

In Rama's paper, the authors say that they use the split from Karpathy's paper,
but your README.md mentions that it is different from the standard Karpathy split.

What are the differences between these two splits? Thanks!

Make sure the vse opt are the same !!!!!

After training the retrieval model with "bash run_fc_con.sh", I tried to pretrain the captioning model with "bash run_att.sh". However, it did not succeed and produced the following output:

DataLoader loading json file: data/cocotalk.json
vocab size is 9487
DataLoader loading h5 file: data/cocotalk_fc data/cocobu_att data/cocotalk_label.h5
max sequence length in data is 16
read 123287 image features
assigned 113287 images to split train
assigned 5000 images to split val
assigned 5000 images to split test
Make sure the vse opt are the same !!!!!
Make sure the vse opt are the same !!!!!
Make sure the vse opt are the same !!!!!
Make sure the vse opt are the same !!!!!
...

key caption_generator.core.a2c.weight in model.state_dict() not in loaded state_dict
key caption_generator.core.h2h.bias in model.state_dict() not in loaded state_dict
key caption_generator.att_embed.0.weight in model.state_dict() not in loaded state_dict
...
key caption_generator.core.h2h.weight in model.state_dict() not in loaded state_dict
Read data: 0.360612869263
/home/jzheng/PycharmProjects/DiscCaptioning/misc/utils.py:123: UserWarning: volatile was removed (Variable.volatile is always False)
if isinstance(x, Variable) and volatile!=x.volatile:
/home/jzheng/PycharmProjects/DiscCaptioning/models/AttModel.py:514: UserWarning: Implicit dimension choice for softmax has been deprecated. Change the call to include dim=X as an argument.
weight = F.softmax(dot) # batch * att_size
/home/jzheng/PycharmProjects/DiscCaptioning/models/AttModel.py:125: UserWarning: Implicit dimension choice for log_softmax has been deprecated. Change the call to include dim=X as an argument.
output = F.log_softmax(self.logit(output))
/home/jzheng/PycharmProjects/DiscCaptioning/models/AttModel.py:131: UserWarning: invalid index of a 0-dim tensor. This will be an error in PyTorch 0.5. Use tensor.item() to convert a 0-dim tensor to a Python number
self._loss['xe'] = loss.data[0]
/home/jzheng/PycharmProjects/DiscCaptioning/models/JointModel.py:122: UserWarning: invalid index of a 0-dim tensor. This will be an error in PyTorch 0.5. Use tensor.item() to convert a 0-dim tensor to a Python number
self._loss['loss_cap'] = loss_cap.data[0]
iter 0 (epoch 0), train_loss = 9.190, time/batch = 0.157
loss_cap = 9.190 loss = 9.190 cap_xe = 9.190 loss_vse = 0.000
Read data: 0.176112890244
iter 1 (epoch 0), train_loss = 8.774, time/batch = 0.129
loss_cap = 8.774 loss = 8.774 cap_xe = 8.774 loss_vse = 0.000
Read data: 0.179361104965
iter 2 (epoch 0), train_loss = 8.381, time/batch = 0.116
loss_cap = 8.381 loss = 8.381 cap_xe = 8.381 loss_vse = 0.000
...

Because of this error, the model is not learning.
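The "key ... not in loaded state_dict" lines read as a report of parameters that exist in the current model but are absent from the loaded checkpoint. Every key listed in the log starts with caption_generator., which would be consistent with loading a checkpoint that lacks these captioning weights, and the same log shows train_loss falling from 9.190 to 8.381 over three iterations. The sketch below reproduces that kind of report with plain Python lists; it is an illustration under these assumptions, not the repo's loading code.

```python
def missing_keys(model_keys, checkpoint_keys):
    """Return the model keys that the checkpoint does not provide, in model order."""
    available = set(checkpoint_keys)
    return [key for key in model_keys if key not in available]


# Key names starting with "caption_generator." are copied from the log above.
# "vse.example.weight" is a made-up placeholder for a retrieval-model parameter.
model_keys = [
    "vse.example.weight",
    "caption_generator.core.a2c.weight",
    "caption_generator.core.h2h.bias",
    "caption_generator.att_embed.0.weight",
    "caption_generator.core.h2h.weight",
]
checkpoint_keys = ["vse.example.weight"]

for key in missing_keys(model_keys, checkpoint_keys):
    print("key %s in model.state_dict() not in loaded state_dict" % key)
```

Parameters reported this way would keep their fresh initialization and be trained from scratch, so the messages alone do not show that training has failed.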

Unable to download pretrained models

Thanks for the implementation.

I downloaded the pre-trained models, but what I get is a single file rather than the folder that eval.sh expects, and the file is in a format I cannot open.

I think the file gets corrupted when downloaded from the drive.

IOError: [Errno 20] Not a directory: 'log_att_d1/infos_att_d1.pkl'
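"Not a directory" here means that log_att_d1 exists as a regular file, so infos_att_d1.pkl cannot be looked up inside it. One possibility, not confirmed in this thread, is that the download is an archive that still needs extracting. A minimal sketch using only the standard library to tell these cases apart (the helper name is hypothetical):

```python
import os
import tarfile
import zipfile


def classify_download(path):
    """Hypothetical helper: report whether a path is a directory, an archive, or neither."""
    if os.path.isdir(path):
        return "directory"
    if not os.path.isfile(path):
        return "missing"
    if tarfile.is_tarfile(path):
        return "tar archive"
    if zipfile.is_zipfile(path):
        return "zip archive"
    return "unknown file"


if __name__ == "__main__":
    # "log_att_d1" is the path from the error message above.
    print(classify_download("log_att_d1"))
```

If the result is "tar archive" or "zip archive", extracting it to a directory named log_att_d1 would give eval.sh the layout it is looking for; "unknown file" would instead support the corrupted-download theory.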

python: can't open file 'scripts/prepro_ngrams.py': [Errno 2] No such file or directory

To train the model ourselves, do we have to run the pre-processing for the self-critical model? If so, I think a Python file is missing from this repo.

f30k-caption?

Hello, where can I download the f30k-caption data referenced in your annFile = 'f30k-caption/annotations/dataset_flickr30k.json'? When I use Flickr30k, the eval code fails. Is there eval code for Flickr30k?

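How many sentences are used per image?

Hello! While running your code, I found that the opt file defaults to one sentence per image, yet the final joint training script explicitly specifies one sentence per image, while the other two scripts do not specify it. I find this confusing. In your implementation, how many sentences per image did you use for training the self-retrieval model, for pretraining the caption model, and for the final joint training, respectively?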

Which are the negatives for the retrieval model?

Are the negatives used for the retrieval model all the other images in the batch? Your paper mentions B images; how many images is that?

AttributeError: 'Namespace' object has no attribute 'evaluation_retrieval'

While I was training with the command "bash run_fc_con.sh", the training was terminated because of the following error:

Traceback (most recent call last):
File "train.py", line 250, in <module>
train(opt)
File "train.py", line 163, in train
if opt.evaluation_retrieval:
AttributeError: 'Namespace' object has no attribute 'evaluation_retrieval'
Terminating BlobFetcher

However, I don't see an "evaluation_retrieval" argument in the opt.py file.
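Two common ways to get past an AttributeError on an argparse Namespace are to define the missing option or to read it with a default. The sketch below shows both. Only the name evaluation_retrieval comes from the traceback; the type, default value and help text are assumptions, and the intended behavior of the flag is not documented in this thread.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser()
    # Assumed definition: the option name is from the traceback,
    # while type, default and help text are guesses.
    parser.add_argument("--evaluation_retrieval", type=int, default=0,
                        help="assumed: non-zero enables retrieval evaluation")
    return parser


# Option 1: define the argument so the attribute always exists.
opt = build_parser().parse_args([])
print(opt.evaluation_retrieval)

# Option 2: read defensively when the option may be absent from the Namespace.
old_opt = argparse.Namespace()
if getattr(old_opt, "evaluation_retrieval", 0):
    print("retrieval evaluation requested")
else:
    print("retrieval evaluation skipped")
```

Option 2 leaves the training script's behavior unchanged when the flag is not supplied, which is the safer choice while the author's intended default is unknown.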

infos_att_d1.pkl

Thanks for your work! Could you provide "infos_att_d1.pkl" for us?

Running bash run_att_d.sh fails without it.

att_masks

Excuse me, would you mind explaining what att_masks is used for?

Issues in FCModel

When I try to train the retrieval model using bash run_fc_con.sh, a problem occurs before the model is saved. Line 147 of FCModel.py, xt = self.new_img_embed(fc_feats[k:k+1], fc_feats_d.chunk(batch_size)[k]).expand(beam_size, self.input_encoding_size), raises an error saying that there is no attribute named new_img_embed. fc_feats_d also looks wrong, because PyCharm reports it as an "Unresolved reference".

After switching --caption_model fc to --caption_model att2in2 in run_fc_con.sh, the issue goes away, so I think the inference code of the FC model could be wrong.

BTW, (1) when training the retrieval model, why do you need to use the caption model to generate captions?
(2) The image features are extracted before training the model, so I think you do not fine-tune the CNN, right?
