
This repo has been synced up with self-critical.pytorch. (The old master is archived in the old-master branch.)

License: MIT License



An Image Captioning codebase

This is a codebase for image captioning research.

It supports:

  • training with cross-entropy loss and self-critical (CIDEr-optimized) training
  • ResNet features and precomputed bottom-up features
  • the COCO and Flickr30k datasets

A simple demo Colab notebook is available here.

Requirements

  • Python 3
  • PyTorch 1.3+ (along with torchvision) (tested with 1.13)
  • cider (included as a submodule)
  • coco-caption (included as a submodule; remember to follow the initialization steps in coco-caption/README.md)
  • yacs
  • lmdbdict
  • Optional: pytorch-lightning (Tested with 2.0)

Install

If you have difficulty running the training scripts in tools, you can try installing this repo as a Python package:

python -m pip install -e .

Pretrained models

Check out MODEL_ZOO.md.

If you only want to run evaluation, follow this section after downloading the pretrained models (along with the pretrained resnet101 or the precomputed bottom-up features; see data/README.md).

Train your own network on COCO/Flickr30k

Prepare data.

We now support both Flickr30k and COCO. See details in data/README.md. (Note: the later sections assume the COCO dataset; adapting them to Flickr30k should be trivial.)

Start training

$ python tools/train.py --id fc --caption_model newfc --input_json data/cocotalk.json --input_fc_dir data/cocotalk_fc --input_att_dir data/cocotalk_att --input_label_h5 data/cocotalk_label.h5 --batch_size 10 --learning_rate 5e-4 --learning_rate_decay_start 0 --scheduled_sampling_start 0 --checkpoint_path log_fc --save_checkpoint_every 6000 --val_images_use 5000 --max_epochs 30

or

$ python tools/train.py --cfg configs/fc.yml --id fc

The train script will dump checkpoints into the folder specified by --checkpoint_path (default: log_$id/). By default, only the best-performing checkpoint on validation and the latest checkpoint are saved, to conserve disk space. You can also set --save_history_ckpt to 1 to save every checkpoint.

To resume training, point --start_from to the path containing infos.pkl and model.pth (usually you can just set --start_from and --checkpoint_path to the same directory).
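For example, to resume the fc run above from its own checkpoint directory:

$ python tools/train.py --cfg configs/fc.yml --id fc --start_from log_fc --checkpoint_path log_fc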

To inspect the training and validation curves, use TensorBoard. The loss histories are automatically dumped into --checkpoint_path.

The command above uses scheduled sampling; you can set --scheduled_sampling_start to -1 to turn scheduled sampling off.
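For reference, scheduled sampling occasionally feeds the model its own previous prediction instead of the ground-truth token during training. A minimal sketch of the idea (not the repo's exact implementation; ss_prob stands for the current sampling probability):

    import torch

    def choose_next_input(gt_tokens, prev_logprobs, ss_prob):
        # With probability ss_prob per sequence, replace the ground-truth
        # token with a token sampled from the model's previous-step output.
        #   gt_tokens:     (batch,) ground-truth tokens for this timestep
        #   prev_logprobs: (batch, vocab) log-probs from the previous step
        #   ss_prob:       float in [0, 1], annealed upward during training
        use_sample = torch.rand(gt_tokens.size(0)) < ss_prob
        sampled = torch.multinomial(prev_logprobs.exp(), 1).squeeze(1)
        return torch.where(use_sample, sampled, gt_tokens)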

If you'd like to evaluate BLEU/METEOR/CIDEr scores during training in addition to the validation cross-entropy loss, use the --language_eval 1 option, but don't forget to pull the coco-caption submodule.

All arguments can also be specified in a yaml file passed via --cfg. Command-line arguments override conflicting settings in the cfg file.
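For reference, yacs-style merging behaves roughly like this (a minimal sketch, not the repo's actual opts handling):

    from yacs.config import CfgNode as CN

    cfg = CN(new_allowed=True)     # code defaults
    cfg.learning_rate = 5e-4
    cfg.batch_size = 10

    cfg.merge_from_file('configs/fc.yml')       # values from --cfg
    cfg.merge_from_list(['batch_size', 16])     # command-line flags win last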

For more options, see opts.py.

Train using self critical

First, preprocess the dataset and build the cache used for calculating CIDEr scores:

$ python scripts/prepro_ngrams.py --input_json data/dataset_coco.json --dict_json data/cocotalk.json --output_pkl data/coco-train --split train

Then, copy the model pretrained with cross entropy. (Copying is not mandatory; it is just a backup.)

$ bash scripts/copy_model.sh fc fc_rl

Then

$ python tools/train.py --id fc_rl --caption_model newfc --input_json data/cocotalk.json --input_fc_dir data/cocotalk_fc --input_att_dir data/cocotalk_att --input_label_h5 data/cocotalk_label.h5 --batch_size 10 --learning_rate 5e-5 --start_from log_fc_rl --checkpoint_path log_fc_rl --save_checkpoint_every 6000 --language_eval 1 --val_images_use 5000 --self_critical_after 30 --cached_tokens coco-train-idxs --max_epochs 50 --train_sample_n 5

or

$ python tools/train.py --cfg configs/fc_rl.yml --id fc_rl

You will see a huge boost in CIDEr score :).

A few notes on training: starting self-critical training after 30 epochs, the CIDEr score rises to about 1.05 after 600k iterations (including the 30 epochs of cross-entropy pretraining).
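For reference, the core of the self-critical objective: the reward of a sampled caption minus the reward of the greedily decoded caption weights the sample's log-probability. A minimal sketch (shapes and names are illustrative, not the repo's API):

    import torch

    def scst_loss(sample_logprobs, sample_reward, greedy_reward, mask):
        # sample_logprobs: (batch, seq_len) log-probs of the sampled caption
        # sample_reward:   (batch,) CIDEr of the sampled caption
        # greedy_reward:   (batch,) CIDEr of the greedy baseline caption
        # mask:            (batch, seq_len) 1 for real words, 0 for padding
        advantage = (sample_reward - greedy_reward).unsqueeze(1)
        loss = -advantage * sample_logprobs * mask
        return loss.sum() / mask.sum()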

Generate image captions

Evaluate on raw images

Note: this doesn't work for models trained with bottom-up features. Place all your images of interest into a folder, e.g. blah, and run the eval script:

$ python tools/eval.py --model model.pth --infos_path infos.pkl --image_folder blah --num_images 10

This tells the eval script to run on up to 10 images from the given folder. If you have a big GPU you can speed up evaluation by increasing batch_size. Use --num_images -1 to process all images. The eval script will create a vis.json file inside the vis folder, which can then be visualized with the provided HTML interface:

$ cd vis
$ python -m http.server

Now visit localhost:8000 in your browser and you should see your predicted captions.

Evaluate on Karpathy's test split

$ python tools/eval.py --dump_images 0 --num_images 5000 --model model.pth --infos_path infos.pkl --language_eval 1 

The default split to evaluate is test. The default inference method is greedy decoding (--sample_method greedy); to sample from the posterior, set --sample_method sample.

Beam Search. Beam search can improve performance over greedy decoding by roughly 5%, at some extra cost. To turn on beam search, use --beam_size N with N greater than 1.
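For reference, one step of a generic beam search keeps the N best-scoring continuations across all current beams (a simplified sketch, not the repo's beam_search implementation):

    import torch

    def beam_step(beam_scores, word_logprobs, beam_size):
        # beam_scores:   (beam_size,) cumulative log-prob of each beam
        # word_logprobs: (beam_size, vocab) next-word log-probs per beam
        vocab = word_logprobs.size(1)
        cand = beam_scores.unsqueeze(1) + word_logprobs   # score every continuation
        scores, flat = cand.view(-1).topk(beam_size)      # keep the best beam_size
        # Recover which beam and which word each winner came from
        return [(int(i) // vocab, int(i) % vocab, float(s))
                for i, s in zip(flat, scores)]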

Evaluate on COCO test set

$ python tools/eval.py --input_json cocotest.json --input_fc_dir data/cocotest_bu_fc --input_att_dir data/cocotest_bu_att --input_label_h5 none --num_images -1 --model model.pth --infos_path infos.pkl --language_eval 0

You can download the preprocessed files cocotest.json, cocotest_bu_att and cocotest_bu_fc from link.

Miscellanea

Using CPU. The code currently uses the GPU by default; there is no option to switch. If someone really needs a CPU model, please open an issue; I can potentially create a CPU checkpoint and modify eval.py to run the model on CPU. However, there is no point in using CPUs to train the model.

Train on another dataset. Porting should be trivial if you can create a file like dataset_coco.json for your own dataset; the format is sketched below.
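For reference, dataset_coco.json follows the Karpathy-split format: a top-level "images" list in which each entry carries a split tag and its tokenized captions. A minimal sketch of one entry (field values are illustrative):

    import json

    entry = {
        "filepath": "val2014",            # subfolder holding the image
        "filename": "COCO_val2014_000000391895.jpg",
        "cocoid": 391895,                 # unique image id
        "split": "train",                 # train / val / test
        "sentences": [
            {"raw": "A man riding a bike.",
             "tokens": ["a", "man", "riding", "a", "bike"]},
        ],
    }
    with open("dataset_mine.json", "w") as f:
        json.dump({"dataset": "coco", "images": [entry]}, f)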

Live demo. Not currently supported. Pull requests are welcome.

For more advanced features:

Check out ADVANCED.md.

Reference

If you find this repo useful, please consider citing (no obligation at all):

@article{luo2018discriminability,
  title={Discriminability objective for training descriptive captions},
  author={Luo, Ruotian and Price, Brian and Cohen, Scott and Shakhnarovich, Gregory},
  journal={arXiv preprint arXiv:1803.04376},
  year={2018}
}

Of course, please also cite the original papers of the models you use (you can find references in the model files).

Acknowledgements

Thanks to the original neuraltalk2 and the awesome PyTorch team.

Contributors

clu8, evarin, gujiuxiang, hexiang-hu, raoyongming, ruotianluo


Issues

Doubt in criterion

Hey, as far as I can see you are creating 1-indexed labels, and those same labels go into the criterion. Don't you need 0-indexed labels for PyTorch's tensor gather function?

Some confusion about adaptive attention model

First, thanks so much for contributing such great code.
However, I have some questions after reviewing the code of the adaptive attention model. According to the paper "Knowing When to Look", the LSTM only receives the word vector x_t and the previous hidden state h_{t-1}, not the image vector, but your code feeds the image vector into the LSTM as well.
Would you please explain this?

When beam_size > 1, it shows size mismatch error.

When beam_size > 1, it shows the following error:

    Traceback (most recent call last):
      File "/home/mh/workspace/MyImageCaptioning/MyTrain.py", line 346, in <module>
        train()
      File "/home/mh/workspace/MyImageCaptioning/MyTrain.py", line 263, in train
        val_loss, predictions, lang_stats = eval_utils.eval_split(model, in_model, vb_model, jj_nn_model, crit, val_loader, cl_loss, epoch, eval_kwargs)
      File "/home/mh/workspace/MyImageCaptioning/eval_utils.py", line 148, in eval_split
        seq, _ = model.sample(_features, eval_kwargs)
      File "/home/mh/workspace/MyImageCaptioning/models/ShowTellModel.py", line 134, in sample
        return self.sample_beam(fc_feats, opt)
      File "/home/mh/workspace/MyImageCaptioning/models/ShowTellModel.py", line 113, in sample_beam
        xt = self.img_embed(fc).expand(beam_size, self.input_encoding_size)
      File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 325, in __call__
        result = self.forward(*input, **kwargs)
      File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/linear.py", line 55, in forward
        return F.linear(input, self.weight, self.bias)
      File "/usr/local/lib/python3.5/dist-packages/torch/nn/functional.py", line 835, in linear
        return torch.addmm(bias, input, weight.t())
    RuntimeError: size mismatch at /pytorch/torch/lib/THC/generic/THCTensorMathBlas.cu:243

I checked the torch.expand() documentation and found that the expanded tensor's non-singleton dimensions must match the original tensor. I am not sure whether this is the main reason for the error.
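For reference, torch.expand can only broadcast size-1 dimensions; a minimal demonstration of the constraint this traceback hits:

    import torch

    x = torch.randn(1, 512)
    ok = x.expand(3, 512)      # works: a size-1 dim broadcasts to 3

    y = torch.randn(2, 512)
    try:
        y.expand(3, 512)       # fails: non-singleton dims must match exactly
    except RuntimeError as e:
        print(e)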

CPU issue

I changed eval.py to run on CPU but ran into some errors. Can you provide a CPU version? Thanks in advance.

Teacher Forcing

Has teacher forcing been used to train this model, and if so, to what degree? I am training my own show-attend-tell model, and when I use teacher forcing at every step it overfits terribly. What do you think I should do?
Thanks so much!

Train on Flickr8k Dataset

I want to train on the Flickr8k dataset. An error occurs in prepro_feats.py when I use dataset_flickr8k.json: it complains that there is no 'cocoid'. How do I solve this? Do I need to generate my own .json? Thanks~

The performance on MS COCO Val5000

Hi, very good job. Can you report the final performance this code achieves on the MS COCO validation-5000 split, compared to Karpathy's neuraltalk2?

Pre trained vectors

Hi, I'm making my own version of an image captioning model; I haven't gone through your code in detail yet. I was wondering whether you used pre-trained word vectors for this task or just a one-hot encoding representation?
And do you think pre-trained word vectors make a substantial impact on training time and accuracy?

A bug in 'fc' model when using GRU?

There is a bug in the 'fc' model with GRU as the rnn_type; the error is as follows. I know the bug is caused by line 35 of FCModel.py, but I don't know how to fix it. Any help will be appreciated.

    neuraltalk2.pytorch/models/FCModel.py", line 35, in forward
      next_c = forget_gate * state[1][-1] + in_gate * in_transform
    File "/home//anaconda2/lib/python2.7/site-packages/torch/autograd/variable.py", line 76, in __getitem__
      return Index.apply(self, key)
    File "/home//anaconda2/lib/python2.7/site-packages/torch/autograd/_functions/tensor.py", line 16, in forward
      result = i.index(ctx.index)
    IndexError: index 1 is out of range for dimension 0 (of size 1)

TensorBoard Problem

Dear @ruotianluo,
When I run train.py on the MS-COCO dataset, I hit the following error:

    Traceback (most recent call last):
      File "train.py", line 204, in <module>
        train(opt)
      File "train.py", line 152, in train
        for k,v in lang_stats.items():
    AttributeError: 'NoneType' object has no attribute 'items'
    Terminating BlobFetcher

The error occurs while the evaluation process is running (i.e., the save_checkpoint step).
When I commented out the # Write validation result into summary part of train.py, everything ran correctly, except that I can't see the chart in TensorBoard.

One problem during the training process

The preprocessing steps are both OK, but when I am training the model, after epoch 0:

evaluating validation preformance... -1/5000 (2.649871)
Traceback (most recent call last):
  File "train.py", line 204, in <module>
    train(opt)
  File "train.py", line 152, in train
    for k,v in lang_stats.items():
AttributeError: 'NoneType' object has no attribute 'items'
Terminating BlobFetcher

It seems that lines 138-140 of eval_utils.py hit the problem:

    lang_stats = None
    if lang_eval == 1:
        lang_stats = language_eval(dataset, predictions, eval_kwargs['id'], split)

I am new to PyTorch.

Python-3 support

Hi @ruotianluo ,
I looked at the coco-caption codebase.
It seems that we need some modifications, like xrange -> range etc., to port the code to Python 3.

So, I'd like to know whether you have any plans to port the pycoco tools to Python 3, so that we can use this codebase for training models in Python 3?

A question about evaluation funciton

Sorry again to bother you, but I can't understand the code in the class LanguageModelCriterion.
What does the mask do?
I don't understand the loss calculation process; is it computing the posterior probability?
Could you explain or point me to some references? Many thanks!
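For reference, the mask zeroes out the loss on padded positions so that each caption contributes only its real words. A minimal sketch of such a masked criterion (not necessarily the repo's exact code):

    import torch

    def masked_lm_loss(logprobs, target, mask):
        # logprobs: (batch, seq_len, vocab) log-softmax outputs
        # target:   (batch, seq_len) gold word indices
        # mask:     (batch, seq_len) 1 for real words, 0 for padding
        nll = -logprobs.gather(2, target.unsqueeze(2)).squeeze(2)
        return (nll * mask).sum() / mask.sum()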

I met an error during training...

xw@xw:~/ImageCaptioning.pytorch-master$ python train.py --id st --caption_model show_tell --input_json data/cocotalk.json --input_fc_dir data/cocotalk_fc --input_att_dir data/cocotalk_att --input_label_h5 data/cocotalk_label.h5 --batch_size 10 --learning_rate 5e-4 --learning_rate_decay_start 0 --scheduled_sampling_start 0 --checkpoint_path log_st --save_checkpoint_every 6000 --val_images_use 5000 --max_epochs 25
...
evaluating validation preformance... 4989/5000 (2.672655)
image 324313: a man is sitting on a bed with a laptop
image 46616: a man is riding a skateboard on a ramp
image 285832: a living room with a couch and a table
image 496718: a man is holding a cell phone while standing in a park
image 398209: a living room with a couch and a table
image 568041: a living room with a couch and a table
image 206596: a man is playing tennis on a tennis court
image 451949: a man is holding a skateboard in a park
image 203138: a man in a suit and tie is holding a cell phone
image 296759: a close up of a person holding a hot dog
evaluating validation preformance... -1/5000 (2.669259)
    Traceback (most recent call last):
      File "train.py", line 204, in <module>
        train(opt)
      File "train.py", line 152, in train
        for k,v in lang_stats.items():
    AttributeError: 'NoneType' object has no attribute 'items'
    Terminating BlobFetcher

Some code missing?

Running python scripts/prepro_labels.py --input_json .../dataset_coco.json --output_json data/cocotalk.json --output_h5 data/cocotalk fails with the following error:

Traceback (most recent call last):
  File "scripts/prepro_labels.py", line 192, in <module>
    main(params)
  File "scripts/prepro_labels.py", line 138, in main
    imgs = imgs['images']
TypeError: list indices must be integers, not str

It seems that some code is missing.

UnboundLocalError: local variable 'resnet' referenced before assignment

python eval.py --model topdown/model-best.pth --infos_path topdown/infos_td-best.pkl --image_folder images --num_images 5

Traceback (most recent call last):
  File "eval.py", line 114, in <module>
    'cnn_model': opt.cnn_model})
  File "/home/demobin/work/github/neuraltalk2.pytorch/dataloaderraw.py", line 37, in __init__
    resnet = getattr(resnet, self.cnn_model)()
UnboundLocalError: local variable 'resnet' referenced before assignment

Error in eval

I am encountering the following error. Can somebody help me resolve this issue?

python eval.py --model no_finetune/att2in/model-best.pth --infos_path no_finetune/att2in/infos_a2i-best.pkl --image_folder ../images/ --num_images 5
    DataLoaderRaw loading images from folder: ../images/
    0
    listing all images in directory ../images/
    DataLoaderRaw found 4 images
    Traceback (most recent call last):
      File "eval.py", line 122, in <module>
        vars(opt))
      File "/home/prince/imageCaptioning/ImageCaptioning.pytorch/eval_utils.py", line 102, in eval_split
        seq, _ = model.sample(fc_feats, att_feats, eval_kwargs)
      File "/home/prince/imageCaptioning/ImageCaptioning.pytorch/models/Att2inModel.py", line 197, in sample
        return self.sample_beam(fc_feats, att_feats, opt)
      File "/home/prince/imageCaptioning/ImageCaptioning.pytorch/models/Att2inModel.py", line 186, in sample_beam
        self.done_beams[k] = self.beam_search(state, logprobs, tmp_fc_feats, tmp_att_feats, tmp_p_att_feats, opt=opt)
      File "/home/prince/imageCaptioning/ImageCaptioning.pytorch/models/CaptionModel.py", line 105, in beam_search
        state)
      File "/home/prince/imageCaptioning/ImageCaptioning.pytorch/models/CaptionModel.py", line 50, in beam_step
        candidate_logprob = beam_logprobs_sum[q] + local_logprob
    RuntimeError: Expected object of type torch.FloatTensor but found type torch.cuda.FloatTensor for argument #3 'other'

I have a GPU and I have confirmed that torch is using it.

Train on other dataset

@ruotianluo Thank you for your fantastic code. As you mentioned, to train on another dataset one must create a file like dataset_coco.json. Would you please explain the format of the dataset_coco.json file?

Performances of each model and self-critical

Dear @ruotianluo,
Thank you for your fantastic code. I have three questions:
1. I trained the top-down model on my own dataset. Would you please list the performance of each model in the models folder on the COCO dataset? From the documentation I gather the top-down model performs best.
2. I can't find where the image size is handled when using ResNet, so I resize images to 512*512 as input.
3. What is self-critical training? I can't find the relevant parameters; I only see "Att2in model in self-critical" in Att2inModel.
Thank you again. I hope you don't mind so many questions and my poor English. Looking forward to communicating with you!

About using CPU for evaluation

Hi~
Thanks for sharing the code. I have trained a GPU model on my own dataset, and it really helps a lot.
However, I now need to evaluate the model on another machine that only has a CPU. Could you provide some guidance on how to convert the GPU model to a CPU checkpoint and how to evaluate with the CPU model? Thanks a lot!

ValueError: sampler should be an instance of torch.utils.data.Sampler.

A solution, adapted from other code:

class SubsetSampler(torch.utils.data.sampler.Sampler):
    def __init__(self, indices):
        self.indices = indices

    def __iter__(self):
        return (self.indices[i] for i in range(len(self.indices)))

    def __len__(self):
        return len(self.indices)

and

sampler=SubsetSampler(self.dataloader.split_ix[self.split][self.dataloader.iterators[self.split]:])

Benchmarks

Cross-entropy loss (CIDEr score on the validation set, no beam search, 25 epochs):
fc 0.92
att2in 0.95
att2in2 0.99
topdown 1.01

(self-critical training lives in https://github.com/ruotianluo/self-critical.pytorch)
Self-critical training (self-critical after 25 epochs; suggestion: don't start self-critical too late):
att2in 1.12
topdown 1.12

Test split (beam size 5):
cross entropy:
topdown: 1.07

self-critical:
topdown:
Bleu_1: 0.779 Bleu_2: 0.615 Bleu_3: 0.467 Bleu_4: 0.347 METEOR: 0.269 ROUGE_L: 0.561 CIDEr: 1.143
att2in2:
Bleu_1: 0.777 Bleu_2: 0.613 Bleu_3: 0.465 Bleu_4: 0.347 METEOR: 0.267 ROUGE_L: 0.560 CIDEr: 1.156

About the topdown model

Dear @ruotianluo ,
I was wondering whether your pretrained top-down model uses Faster R-CNN (bottom-up attention) features or ResNet features.

About att_feature

Hi, I am new to image captioning. I want to know what att_feats refers to here. Could anyone explain it? I couldn't find it in the original paper...

UnboundLocalError: local variable 'cnn_optimizer' referenced before assignment

Hi, ruotian. Thank you for the fantastic code. But when I try to finetune the CNN using your with_finetune branch, I get the error below:

    Traceback (most recent call last):
      File "train.py", line 254, in <module>
        train(opt)
      File "train.py", line 151, in train
        cnn_optimizer.zero_grad()
    UnboundLocalError: local variable 'cnn_optimizer' referenced before assignment


It slowed down as training progressed

During training, I notice it slows down, and it also affects other TensorFlow jobs (from 0.2s per batch → 1.2s per batch → 1.8s per batch). Once I stop training, the other jobs' speed returns to 0.2s per batch. What could cause this?

CIDEr decreases during the first 6000 iterations of self-critical training

Hi, I followed the instructions to train the model, taking the fc model as an example. I trained it with cross-entropy loss for 25 epochs (336,000 iterations), and the CIDEr on the validation set is 0.92. Then I further trained the model with SCST, and its CIDEr on the validation set is 0.89 at iteration 342,000. Why does CIDEr decrease during the first 6000 iterations of SCST (self-critical sequence training)?

Performance

Hi ruotian,
Have you tested the results on a standard benchmark? I am curious about it.
Thanks !

AttributeError: 'Namespace' object has no attribute 'use_att'

python eval.py --dump_images 0 --num_images 5000 --model topdown/model-best.pth --infos_path topdown/infos_td-best.pkl --language_eval 1

Traceback (most recent call last):
  File "eval.py", line 109, in <module>
    loader = DataLoader(opt)
  File "/home/demobin/work/github/neuraltalk2.pytorch/dataloader.py", line 42, in __init__
    self.use_att = opt.use_att
AttributeError: 'Namespace' object has no attribute 'use_att'

How to control the number of threads?

Hi, ruotian:

Thanks for your awesome code reproducing 'self-critical sequence training'.

I have a question about how to control the number of threads. When I run the code, all threads are used, and it occupies a lot of CPU resources.

Thank you!

Generate soft attention pictures of each word

As the paper says, "As the model generates each word, its attention changes to reflect the relevant parts of the image." I'd like to generate the soft-attention picture for each word, but I've met some problems.
Can the eval.py script do this, or how should I implement it?

Best regards.

Evaluation: AttributeError: 'Namespace' object has no attribute 'caption_model'

When running eval.py on Python 2.7, I get this error:

    File "eval.py", line 99, in <module>
      model = models.setup(opt)
    File "/path/to/neuraltalk2.pytorch/models.py", line 16, in setup
      if opt.caption_model == 'show_tell':
    AttributeError: 'Namespace' object has no attribute 'caption_model'

It looks like the caption_model argument is missing from the argument parser in eval.py, causing an error when models.py attempts to access it.

I see that the model settings are in opts.py. Are we somehow meant to import these?

I cannot reach the score mentioned in the readme.md

Hi, I use the following parameters (ShowAndTell):
CNN: resnet152
LSTM: 2 layers
The other parameters are the same as you mentioned, but the CIDEr score is only 0.681. When I change resnet152 to InceptionV4, the CIDEr score is only 0.651.
Both are far from the 0.84 you mention in the training section.
Can you give me some advice on this score? I have tried lots of different parameters, but the score is still low.

Potential issue when using multi-GPU

Hey, thanks for this amazing repo.

I was reading through your code, and I think there might be a potential issue when using multiple GPUs with torch.nn.DataParallel.

Particularly, you break out of the loop when all sentences have reached the end (by checking their sum):

# https://github.com/ruotianluo/neuraltalk2.pytorch/blob/master/models/AttModel.py#L90-L91
if i >= 1 and seq[:, i].data.sum() == 0:
    break

# I am using AttModel.py as an example, but it should be the same to other models

When forward passing the model, the data in a mini-batch is divided and sent to individual GPUs, and there can be a case where the output on one GPU is shorter than on the others. This results in an error when collecting the outputs from all GPUs, since their dimensions mismatch.

Is there any particular reason why you break out early, instead of letting the for loop run to the end?
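For reference, a common fix is to make every replica return tensors of the same length, either by letting the loop run to the full length or by padding each GPU's output before gathering. A minimal sketch of the padding approach (pad_to_length is a hypothetical helper, not the repo's code):

    import torch

    def pad_to_length(seq, max_len, pad_value=0):
        # Pad (batch, t) along dim 1 so every DataParallel replica
        # returns (batch, max_len) regardless of where it stopped early.
        if seq.size(1) >= max_len:
            return seq[:, :max_len]
        pad = seq.new_full((seq.size(0), max_len - seq.size(1)), pad_value)
        return torch.cat([seq, pad], dim=1)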

Is it possible to detect anomalies with neuraltalk?

Hi, @rym9005023 @gujiuxiang @ruotianluo

Is it possible to detect anomalies with neuraltalk?

I have converted my 1-D signals to images, and I want to feed these images into the neuraltalk network for anomaly-signal detection.

I will just train the texts "Normal" and "Abnormal" for anomaly captioning.

Is that possible?

Thanks in advance.

Failed to generate and save fc and att features to .h5 files for my own dataset

    processing 0/279 (0.00% done)
    Traceback (most recent call last):
      File "/home/jzheng/PycharmProjects/ImageCaptioning_Skyler/scripts/prepro_feats_sky.py", line 119, in <module>
        main(params)
      File "/home/jzheng/PycharmProjects/ImageCaptioning_Skyler/scripts/prepro_feats_sky.py", line 89, in main
        (2048,), dtype="float")
      File "/usr/local/lib/python2.7/dist-packages/h5py/_hl/group.py", line 119, in create_dataset
        self[name] = dset
      File "/usr/local/lib/python2.7/dist-packages/h5py/_hl/group.py", line 287, in __setitem__
        h5o.link(obj.id, self.id, name, lcpl=lcpl, lapl=self._lapl)
      File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
      File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
      File "h5py/h5o.pyx", line 202, in h5py.h5o.link
    RuntimeError: Unable to create link (name already exists)

Some bugs found in use

eval.py
line 79: opt.input_fc_h5 = infos['opt'].input_fc_h5 needs to change to opt.input_fc_dir = infos['opt'].input_fc_dir
line 80: opt.input_att_h5 = infos['opt'].input_att_h5 needs to change to opt.input_att_dir = infos['opt'].input_att_dir

dataloaderraw.py
line 104: img = img.concatenate((img, img, img), axis=2) needs to change to img = np.concatenate((img, img, img), axis=2)

About the batch_size

If I set --batch_size=16 and seq_per_img=5, the actual batch size is 80, right?
