
disccaptioning's Issues

ValueError: sampler should be an instance of torch.utils.data.Sampler

PyTorch 1.0.0

bash eval.sh att_d1 test

Traceback (most recent call last):
File "eval.py", line 146, in <module>
vars(opt))
File "/content/DiscCaptioning/eval_utils.py", line 92, in eval_split
data = loader.get_batch(split)
File "/content/DiscCaptioning/dataloader.py", line 137, in get_batch
ix, tmp_wrapped = self._prefetch_process[split].get()
File "/content/DiscCaptioning/dataloader.py", line 256, in get
self.reset()
File "/content/DiscCaptioning/dataloader.py", line 235, in reset
collate_fn=lambda x: x[0]))
File "/usr/local/lib/python2.7/dist-packages/torch/utils/data/dataloader.py", line 805, in __init__
batch_sampler = BatchSampler(sampler, batch_size, drop_last)
File "/usr/local/lib/python2.7/dist-packages/torch/utils/data/sampler.py", line 146, in __init__
.format(sampler))
ValueError: sampler should be an instance of torch.utils.data.Sampler, but got sampler=[......]
Terminating BlobFetcher
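In the PyTorch version in this traceback, `BatchSampler.__init__` rejects any `sampler` that is not an instance of `torch.utils.data.Sampler`. The object passed as `sampler` when `reset()` builds the DataLoader is a plain list (elided as `sampler=[......]` in the message), which triggers this ValueError. Below is a minimal sketch of one workaround. It assumes the list holds the dataset indices to visit, in order, and wraps that list in a small `Sampler` subclass. `IndexListSampler` and `remaining_indices` are hypothetical names and are not part of the repo.

```python
import torch.utils.data as tud


class IndexListSampler(tud.Sampler):
    """Yield a fixed list of dataset indices in order (hypothetical helper).

    Subclassing Sampler satisfies the isinstance check that BatchSampler
    performs in older PyTorch releases such as the one in the traceback.
    """

    def __init__(self, indices):
        # Sampler.__init__ is deliberately not called: its signature differs
        # across PyTorch versions and it stores nothing this class needs.
        self.indices = list(indices)

    def __iter__(self):
        return iter(self.indices)

    def __len__(self):
        return len(self.indices)


# Usage sketch (remaining_indices is hypothetical):
# tud.DataLoader(dataset, batch_size=1, shuffle=False,
#                sampler=IndexListSampler(remaining_indices),
#                collate_fn=lambda x: x[0])
```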

Evaluation error: KeyError: 'att_masks' (no att_masks in data)

File "/home/jzheng/PycharmProjects/DiscCaptioning/eval_utils.py", line 114, in eval_split
data['att_masks'][np.arange(loader.batch_size) * loader.seq_per_img]]
KeyError: 'att_masks'
There's no att_masks key in the data dict, and no labels or masks keys either. Am I missing something?

I'm testing on the val2014 dataset.
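The failing line indexes `data['att_masks']` without checking that the key exists. Suppose the loader in use does not produce attention masks, or sets them to `None` when they are not needed (an assumption about this codebase, not something the issue confirms). In that case, a guarded lookup avoids the KeyError. The index expression from the traceback, `np.arange(batch_size) * seq_per_img`, picks the first of the `seq_per_img` consecutive rows that belong to each image. Below is a minimal sketch. `pick_first_per_image` is a hypothetical helper.

```python
import numpy as np


def pick_first_per_image(data, key, batch_size, seq_per_img):
    """Return data[key] reduced to one row per image, or None if absent.

    Rows are assumed to be grouped as seq_per_img consecutive rows per image,
    so rows 0, seq_per_img, 2 * seq_per_img, ... are each image's first row.
    """
    value = data.get(key)
    if value is None:
        # Missing key or explicit None: the caller treats the field as optional.
        return None
    return value[np.arange(batch_size) * seq_per_img]
```

The same guard applies to `labels` and `masks` if those keys are also missing from the evaluation batch.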

How to train the TopDown model?

When I try to run the TopDown model, I get the following error:

File "/home/code/DiscCaptioning/models/AttModel.py", line 476, in forward
att_lstm_input = torch.cat([prev_h, fc_feats, xt], 1)
RuntimeError: zero-dimensional tensor (at position 0) cannot be concatenated

Would you please tell me which parts of the code I need to modify so that I can train the TopDown model?
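`torch.cat` refuses zero-dimensional tensors, so at least one of `prev_h`, `fc_feats` or `xt` at line 476 is a scalar rather than a `(batch, features)` matrix. The issue does not show which one. A quick way to find out is to print each input's `.shape` just before the call.

The sketch below encodes, as a pure-Python check on shape tuples, the rules `torch.cat` enforces:

- no input may be zero-dimensional;
- all inputs must have the same number of dimensions;
- every size must match except along `dim`.

It ignores PyTorch's legacy special case for empty 1-D tensors. `check_cat_shapes` is a hypothetical helper.

```python
def check_cat_shapes(shapes, dim):
    """Return the concatenated shape, or raise ValueError naming the bad input."""
    if not shapes:
        raise ValueError("need at least one input")
    for i, shape in enumerate(shapes):
        if len(shape) == 0:
            raise ValueError(f"input {i} is zero-dimensional")
    ndim = len(shapes[0])
    if not -ndim <= dim < ndim:
        raise ValueError(f"dim {dim} out of range for {ndim}-D inputs")
    dim %= ndim  # normalise negative dims, as torch.cat does
    for i, shape in enumerate(shapes):
        if len(shape) != ndim:
            raise ValueError(f"input {i} has {len(shape)} dims, expected {ndim}")
        for d in range(ndim):
            if d != dim and shape[d] != shapes[0][d]:
                raise ValueError(
                    f"input {i} has size {shape[d]} at dim {d}, "
                    f"expected {shapes[0][d]}")
    return tuple(sum(s[d] for s in shapes) if d == dim else shapes[0][d]
                 for d in range(ndim))


# Usage sketch inside forward():
# check_cat_shapes([tuple(t.shape) for t in (prev_h, fc_feats, xt)], 1)
```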

How come the evaluation result is so bad? Bleu_4 is 0.000 and METEOR is 0.009. BTW, how do I generate captions on a custom dataset?

tokenization...
PTBTokenizer tokenized 99344 tokens at 721579.12 tokens per second.
PTBTokenizer tokenized 16786 tokens at 239467.86 tokens per second.
setting up scorers...
computing Bleu score...
{'reflen': 15132, 'guess': [15165, 13543, 11921, 10299], 'testlen': 15165, 'correct': [30, 0, 0, 0]}
ratio: 1.00218080888
Bleu_1: 0.002
Bleu_2: 0.000
Bleu_3: 0.000
Bleu_4: 0.000
computing METEOR score...
METEOR: 0.009
computing Rouge score...
ROUGE_L: 0.002
computing CIDEr score...
CIDEr: 0.001
computing SPICE score...
Parsing reference captions
Parsing test captions
SPICE evaluation took: 2.144 s
SPICE: 0.002
loss: {'loss': tensor(31.6388, device='cuda:0'), 'cap_xe': tensor(31.6419, device='cuda:0'), 'retrieval_loss_greedy': tensor(7.4241, device='cuda:0'), 'retrieval_sc_loss': tensor(1.00000e-03 *
-3.1324, device='cuda:0'), 'loss_vse': tensor(0., device='cuda:0'), 'loss_cap': tensor(31.6419, device='cuda:0'), 'retrieval_loss': tensor(7.6047, device='cuda:0')}
{u'SPICE_Object': '0.006404463463649654', u'SPICE_Cardinality': '0.0', u'SPICE_Attribute': '0.0', 'CIDEr': '0.001079661462843171', u'SPICE_Size': '0.0', 'Bleu_4': 1.04439324421061e-15, 'Bleu_3': 2.3054219540753186e-14, 'Bleu_2': 1.208598304910465e-11, 'Bleu_1': 0.001978239366963272, u'SPICE_Color': '0.0', 'ROUGE_L': '0.001795472073475935', 'METEOR': 0.009059195566343728, u'SPICE_Relation': '0.0', 'SPICE': '0.0024048127567198488'}
Terminating BlobFetcher
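The near-zero BLEU scores follow directly from the counts in the log. Only 30 of the 15,165 generated unigrams match a reference, and no bigram or longer n-gram does. The sketch below recomputes cumulative corpus BLEU from that dictionary. It assumes the smoothing constants used in coco-caption's `bleu_scorer.py` (`tiny = 1e-15`, `small = 1e-9`). Those constants explain why Bleu_2 to Bleu_4 are reported as values around 1e-11 to 1e-15 rather than exactly 0.

```python
import math


def corpus_bleu_from_counts(stats, n=4, tiny=1e-15, small=1e-9):
    """Recompute cumulative BLEU-1..n from aggregated n-gram counts.

    Running product of smoothed n-gram precisions, k-th root for BLEU-k,
    and a brevity penalty only when the candidates are shorter than the
    references (ratio < 1).
    """
    scores = []
    product = 1.0
    for k in range(n):
        product *= (stats['correct'][k] + tiny) / (stats['guess'][k] + small)
        scores.append(product ** (1.0 / (k + 1)))
    ratio = (stats['testlen'] + tiny) / (stats['reflen'] + small)
    if ratio < 1:
        penalty = math.exp(1 - 1 / ratio)
        scores = [s * penalty for s in scores]
    return scores, ratio


# Counts copied from the log above.
stats = {'reflen': 15132, 'guess': [15165, 13543, 11921, 10299],
         'testlen': 15165, 'correct': [30, 0, 0, 0]}
scores, ratio = corpus_bleu_from_counts(stats)
print(ratio)   # about 1.0022, so no brevity penalty applies
print(scores)  # Bleu_1 is about 30 / 15165; Bleu_2..4 are smoothing artifacts
```

In other words, the metric is not miscomputed. The generated captions share almost no words with the references, which is consistent with the sample captions shown in the follow-up comment.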

Is this evaluating the image caption model? It looks like the retrieval model.
image 474190: woods conditioner china memorial scraper sash bringing woods interstate sunroof distant
image 277907: woods pairs china listed want listed bringing woods crowd
image 43033: woods hanging service woods peep dinosaurs cooking wonder
image 542103: woods conditioner china memorial gooey bringing cooking gain woody adorable
image 356116: woods majestically rice bringing cooking gain woody woods peep
image 538581: woods hanging service woods windsurfer dinosaurs cooking weeds woody woods windsurfer
image 359354: woods hanging effects woods silver dinosaurs woods silver
image 457146: woods captive honk bringing retrieve china woods holds
image 75305: woods majestically honk lots woods goofing woody woods silver
image 249968: woods troll honk bringing cooking fir china woods bubble foreheads
image 480451: woods hanging catchers woods tightly hollow bringing woods tightly hitting
image 379596: woods hangings china pouches want pouches bringing woods goofing
image 322362: woods patch benched honk bringing woods holds woody woods overgrowth gains
image 495233: woods conditioner china memorial honk bringing woods caddy musical woods overgrowth gains
image 366948: woods conditioner china lipstick rice dinosaurs woods mirrors
image 332833: woods burrito levels honk bringing cooking lock woody woods keypad
image 512346: woods hanging service woods draining buddhist dinosaurs woods peek
evaluating validation preformance... 2049/5000 (31.236956)

FileNotFoundError: [Errno 2] No such file or directory: 'cider/data/cocotalk_fc\\391895.npy' Terminating BlobFetcher

After spending a whole afternoon fixing 87 bugs in this repo one by one, I was finally able to start training, but, not surprisingly, I got stuck on another issue. This time I have no idea how to fix it, as there is barely any related Q&A on Google. For example: What is 391895.npy? Where can I find it? What could replace it? How can I get around this seemingly simple FileNotFoundError? Could anyone kindly give me a hint on how to sort out this issue? Many thanks.

The command that triggers this issue (after following all the previous steps): bash run_fc_con.sh
The error message:

DataLoader loading json file: data/cocotalk.json
vocab size is 9487
DataLoader loading h5 file: data/cocotalk_fc data/cocobu_att data/cocotalk_label.h5
max sequence length in data is 16
read 123287 image features
assigned 113287 images to split train
assigned 5000 images to split val
assigned 5000 images to split test

Traceback (most recent call last):
File "train.py", line 242, in <module>
train(opt)
File "train.py", line 109, in train
data = loader.get_batch('train')
File "D:\Project\DiscCaptioning\dataloader.py", line 138, in get_batch
ix, tmp_wrapped = self._prefetch_process[split].get()
File "D:\Project\DiscCaptioning\dataloader.py", line 264, in get
tmp = self.split_loader.next()
File "D:\Anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 521, in __next__
data = self._next_data()
File "D:\Anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 561, in _next_data
data = self._dataset_fetcher.fetch(index) # may raise StopIteration
File "D:\Anaconda3\lib\site-packages\torch\utils\data\_utils\fetch.py", line 44, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "D:\Anaconda3\lib\site-packages\torch\utils\data\_utils\fetch.py", line 44, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "D:\Project\DiscCaptioning\dataloader.py", line 205, in __getitem__
return (np.load(os.path.join("cider/" + self.input_fc_dir, str(self.info['images'][ix]['id']) + '.npy')), np.zeros((1,1)), ix)
File "D:\Anaconda3\lib\site-packages\numpy\lib\npyio.py", line 428, in load
fid = open(os_fspath(file), "rb")
FileNotFoundError: [Errno 2] No such file or directory: 'cider/data/cocotalk_fc\\391895.npy'
Terminating BlobFetcher

(After downloading and extracting cocotalk_fc.tar from Google Drive, I get a folder cocotalk_fc with this structure:
cocotalk_fc (the root folder obtained from extraction) -> cocotalk_fc (a binary file)
After I move it under the cider directory, the path down this branch looks like this:
DiscCaptioning -> cider -> cocotalk_fc (the root folder after extraction) -> cocotalk_fc (a binary file)
Simply renaming the binary cocotalk_fc file to 391895 or 391895.npy doesn't help either; it still throws the same error. So I'm stuck.)
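According to the traceback, line 205 of dataloader.py builds the path as `os.path.join("cider/" + self.input_fc_dir, str(image_id) + '.npy')`. The error message shows that `input_fc_dir` is `data/cocotalk_fc`. So the loader expects one `.npy` file per COCO image id inside `cider/data/cocotalk_fc/` (for example `cider/data/cocotalk_fc/391895.npy`). The layout described above, `cider/cocotalk_fc/cocotalk_fc`, matches neither the directory nor the file names.

Below is a minimal sketch that reproduces that path construction and lists which expected files are missing. `expected_fc_path` and `missing_fc_files` are hypothetical helpers. The `cider/` prefix and directory name come from the traceback.

```python
import os


def expected_fc_path(image_id, input_fc_dir="data/cocotalk_fc", root="cider/"):
    """Rebuild the path that dataloader.py (line 205) passes to np.load."""
    return os.path.join(root + input_fc_dir, str(image_id) + ".npy")


def missing_fc_files(image_ids, input_fc_dir="data/cocotalk_fc", root="cider/"):
    """Return the expected per-image feature paths that do not exist on disk."""
    paths = (expected_fc_path(i, input_fc_dir, root) for i in image_ids)
    return [p for p in paths if not os.path.isfile(p)]


# Usage sketch, run from the DiscCaptioning directory:
# print(missing_fc_files([391895]))
```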

Similar work

I think your work is very similar to "Deep Reinforcement Learning-based Image Captioning with Embedding Reward". What is the difference between your work and theirs?

Make sure the vse opt are the same !!!!!

After training the retrieval model with "bash run_fc_con.sh", I pretrain the captioning model with "bash run_att.sh". However, it fails with the following error:

DataLoader loading json file: data/cocotalk.json
vocab size is 9487
DataLoader loading h5 file: data/cocotalk_fc data/cocobu_att data/cocotalk_label.h5
max sequence length in data is 16
read 123287 image features
assigned 113287 images to split train
assigned 5000 images to split val
assigned 5000 images to split test
Make sure the vse opt are the same !!!!!
Make sure the vse opt are the same !!!!!
Make sure the vse opt are the same !!!!!
Make sure the vse opt are the same !!!!!
...

key caption_generator.core.a2c.weight in model.state_dict() not in loaded state_dict
key caption_generator.core.h2h.bias in model.state_dict() not in loaded state_dict
key caption_generator.att_embed.0.weight in model.state_dict() not in loaded state_dict
...
key caption_generator.core.h2h.weight in model.state_dict() not in loaded state_dict
Read data: 0.360612869263
/home/jzheng/PycharmProjects/DiscCaptioning/misc/utils.py:123: UserWarning: volatile was removed (Variable.volatile is always False)
if isinstance(x, Variable) and volatile!=x.volatile:
/home/jzheng/PycharmProjects/DiscCaptioning/models/AttModel.py:514: UserWarning: Implicit dimension choice for softmax has been deprecated. Change the call to include dim=X as an argument.
weight = F.softmax(dot) # batch * att_size
/home/jzheng/PycharmProjects/DiscCaptioning/models/AttModel.py:125: UserWarning: Implicit dimension choice for log_softmax has been deprecated. Change the call to include dim=X as an argument.
output = F.log_softmax(self.logit(output))
/home/jzheng/PycharmProjects/DiscCaptioning/models/AttModel.py:131: UserWarning: invalid index of a 0-dim tensor. This will be an error in PyTorch 0.5. Use tensor.item() to convert a 0-dim tensor to a Python number
self._loss['xe'] = loss.data[0]
/home/jzheng/PycharmProjects/DiscCaptioning/models/JointModel.py:122: UserWarning: invalid index of a 0-dim tensor. This will be an error in PyTorch 0.5. Use tensor.item() to convert a 0-dim tensor to a Python number
self._loss['loss_cap'] = loss_cap.data[0]
iter 0 (epoch 0), train_loss = 9.190, time/batch = 0.157
loss_cap = 9.190 loss = 9.190 cap_xe = 9.190 loss_vse = 0.000
Read data: 0.176112890244
iter 1 (epoch 0), train_loss = 8.774, time/batch = 0.129
loss_cap = 8.774 loss = 8.774 cap_xe = 8.774 loss_vse = 0.000
Read data: 0.179361104965
iter 2 (epoch 0), train_loss = 8.381, time/batch = 0.116
loss_cap = 8.381 loss = 8.381 cap_xe = 8.381 loss_vse = 0.000
...

Because of this error, the model is not learning.
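The `key ... in model.state_dict() not in loaded state_dict` lines list parameters that the model defines but the loaded checkpoint does not contain. Whether that is a problem depends on which checkpoint `run_att.sh` loads, and the issue does not show this. A checkpoint saved from the retrieval stage alone would not be expected to contain `caption_generator.*` weights, but that is an assumption about this repo's workflow.

Below is a small sketch that lists the mismatch in both directions and groups it by top-level module. It works on any iterables of key names, for example `model.state_dict().keys()` and the keys of the dict returned by `torch.load(...)`. `diff_state_dict_keys` and `summarize_by_prefix` are hypothetical helpers.

```python
from collections import Counter


def diff_state_dict_keys(model_keys, loaded_keys):
    """Return (missing, unexpected) parameter names, each sorted.

    missing:    in the model but not in the checkpoint (left uninitialised
                by the checkpoint)
    unexpected: in the checkpoint but not used by the model
    """
    model_keys, loaded_keys = set(model_keys), set(loaded_keys)
    return sorted(model_keys - loaded_keys), sorted(loaded_keys - model_keys)


def summarize_by_prefix(keys):
    """Count keys per top-level submodule, e.g. 'caption_generator'."""
    return Counter(k.split('.', 1)[0] for k in keys)
```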

Unable to download pretrained models

Thanks for the implementation.

I downloaded the pretrained models, but the downloaded file is not the folder that eval.sh expects, and it is in a format I cannot open.

I think the file gets corrupted after downloading from the drive.

IOError: [Errno 20] Not a directory: 'log_att_d1/infos_att_d1.pkl'
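`[Errno 20] Not a directory` means that `log_att_d1` exists but is a regular file, so `log_att_d1/infos_att_d1.pkl` cannot be opened. The downloaded file may actually be an archive of the checkpoint folder. That is an assumption, since the issue does not say what format the download is. If it is an archive, Python's standard library can detect and extract it.

Below is a minimal sketch. `unpack_if_archive` is a hypothetical helper. Rename the downloaded file first (for example to `log_att_d1.download`) so that the extracted `log_att_d1/` directory does not collide with it.

```python
import os
import tarfile
import zipfile


def unpack_if_archive(path, dest):
    """Extract `path` into `dest` if it is a tar or zip archive.

    Returns 'tar' or 'zip' on success, or None if `path` is not a regular
    file or is not a recognised archive.
    """
    if not os.path.isfile(path):
        return None
    if tarfile.is_tarfile(path):
        # Use the safer 'data' extraction filter where this Python has it
        # (3.12+ and security backports); otherwise use the default.
        kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        with tarfile.open(path) as tf:
            tf.extractall(dest, **kwargs)
        return "tar"
    if zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as zf:
            zf.extractall(dest)
        return "zip"
    return None


# Usage sketch, after renaming the download:
# unpack_if_archive("log_att_d1.download", ".")
```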

f30k-caption?

Hello, where can I download the f30k-caption data referenced in your annFile = 'f30k-caption/annotations/dataset_flickr30k.json'? When I use Flickr30k, the eval code fails. Is there eval code for Flickr30k?

How many sentences are used per image?

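Hello! While running your code, I noticed that the opt file uses one sentence per image by default. However, the final joint-training script explicitly specifies one sentence per image, while the other two scripts don't specify it. I'm confused by this. In your implementation, how many sentences per image did you use for each stage: training the retrieval model, pretraining the caption model, and the final joint training?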

Traceback (most recent call last): File "train.py", line 250, in <module> train(opt) File "train.py", line 163, in train if opt.evaluation_retrieval: AttributeError: 'Namespace' object has no attribute 'evaluation_retrieval' Terminating BlobFetcher

While I was training with the command "bash run_fc_con.sh", training was terminated because of the following error:

Traceback (most recent call last):
File "train.py", line 250, in <module>
train(opt)
File "train.py", line 163, in train
if opt.evaluation_retrieval:
AttributeError: 'Namespace' object has no attribute 'evaluation_retrieval'
Terminating BlobFetcher

However, I don't see an "evaluation_retrieval" argument in opt.py.
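`train.py` reads `opt.evaluation_retrieval`, but according to this report `opt.py` never defines it, so the parsed `Namespace` has no such attribute. There are two minimal workarounds, sketched below:

- register the flag in the argument parser, or
- read it at the call site with `getattr` and a default.

The integer type and the default `0` (retrieval evaluation off) are guesses, not values taken from the repo.

```python
import argparse


def build_parser():
    # Hypothetical excerpt: the real opt.py defines many more arguments.
    parser = argparse.ArgumentParser()
    parser.add_argument('--evaluation_retrieval', type=int, default=0,
                        help='assumed flag: 1 to also evaluate retrieval, 0 to skip')
    return parser


opt = build_parser().parse_args([])

# Alternative at the call site in train.py: tolerate a missing attribute.
if getattr(opt, 'evaluation_retrieval', 0):
    print('running retrieval evaluation')
```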

infos_att_d1.pkl

Thanks for your work! Could you provide "infos_att_d1.pkl" for us?

att_masks

Excuse me, would you mind explaining what att_masks is for?
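As a general illustration, and not a description of this repo's exact code: when images have different numbers of region features, the attention features are zero-padded to a common length. An attention mask then marks which positions are real regions (1) and which are padding (0), so that padded positions receive no attention weight.

Below is a minimal NumPy sketch of one common way to apply such a mask: compute a softmax over the scores, zero the padded positions, and renormalise. It assumes every row has at least one real region; otherwise the final division is by zero.

```python
import numpy as np


def masked_attention_weights(scores, att_masks):
    """Attention weights over regions that ignore padded positions.

    scores:    (batch, att_size) unnormalised attention scores
    att_masks: (batch, att_size), 1 for real regions and 0 for padding
    """
    # Numerically stable softmax over the region axis.
    shifted = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(shifted)
    weights /= weights.sum(axis=1, keepdims=True)
    # Zero out padding, then renormalise so each row sums to 1 again.
    weights = weights * att_masks
    return weights / weights.sum(axis=1, keepdims=True)
```

This gives the same result as setting the padded scores to negative infinity before the softmax. The multiply-and-renormalise form avoids handling infinities explicitly.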

Issues in FCModel

When I try to train the retrieval model using bash run_fc_con.sh, a problem occurs just before the model is saved. Specifically, line 147 of FCModel.py, xt = self.new_img_embed(fc_feats[k:k+1], fc_feats_d.chunk(batch_size)[k]).expand(beam_size, self.input_encoding_size), causes the issue: it reports that there is no attribute named new_img_embed. In addition, fc_feats_d is undefined; PyCharm flags it as an "Unresolved reference".

However, after switching --caption_model fc to --caption_model att2in2 in run_fc_con.sh, the issue goes away, so I think the FC model's inference code may be wrong.

BTW, (1) when training the retrieval model, why do you need to use the caption model to generate captions?
(2) The image features are extracted before training, so I assume you do not fine-tune the CNN, right?

Training curve of reinforcement learning

When I train the model with RL using run_att_d.sh, the CIDEr score drops significantly.
I saw that you provided the training curve of the VSE model (#11).
Would you be willing to provide the training curve of the RL stage as well?
Thank you very much :D

the retrieval loss doesn't converge well

Hello, Luo,
when I pretrain the VSEFCmodel, the vse_loss doesn't converge well; it stays around 51.2. Is there a mistake in my experiments? What was your vse_loss when you pretrained the VSEFCmodel?
