
labert's People

Contributors

bearcatt


labert's Issues

codes of VLP

Hi!
Thank you for the amazing work on LaBERT! I was wondering whether you would release the code for Length-Controllable VLP. If I try my own implementation, would adding a length-level embedding to the input word embeddings suffice?
By the way, where can I find the weights for length-aware VLP or AoANet?
Thanks!
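
For readers attempting their own implementation, a minimal sketch of the "length-level embedding added to the word embedding" idea asked about above might look like the following; the module names, sizes, and number of length levels are assumptions, not the paper's actual configuration.

    import torch
    import torch.nn as nn

    class LengthAwareEmbedding(nn.Module):
        # Hypothetical sketch: sum word, position, and length-level embeddings.
        # All sizes and the number of length buckets are placeholders.
        def __init__(self, vocab_size, hidden_size, max_positions=128, num_length_levels=4):
            super().__init__()
            self.word_embeddings = nn.Embedding(vocab_size, hidden_size)
            self.position_embeddings = nn.Embedding(max_positions, hidden_size)
            self.length_embeddings = nn.Embedding(num_length_levels, hidden_size)
            self.layer_norm = nn.LayerNorm(hidden_size)

        def forward(self, token_ids, length_level):
            # token_ids: (batch, seq_len); length_level: (batch,) target length bucket index
            positions = torch.arange(token_ids.size(1), device=token_ids.device).unsqueeze(0)
            embeddings = (self.word_embeddings(token_ids)
                          + self.position_embeddings(positions)
                          + self.length_embeddings(length_level).unsqueeze(1))
            return self.layer_norm(embeddings)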

About datasets

Thanks for your work! Can I use my own dataset with this code? Thank you!

id2captions_test.json file

Hi!

The evaluation uses the data/id2captions_test.json and data/caption_results.json files, but I could not find where to obtain them. Could you share these files with us?

Thanks,

can't find the region_bbox.h5

Thanks for your amazing work! I downloaded the dataset from the Baidu Netdisk link you provided, but when I run the code, line 88 of dataset.py cannot find region_bbox.h5. Is it included in the dataset you provide?

Refinement Steps

Hi,
Great work, thanks for sharing the code.

I could not find the number of refinement steps T in the configuration or the code. Does this mean the code is for autoregressive generation only (Table 1 rather than Table 2)? If I am mistaken, could you point me to the parameter that corresponds to T?

Thanks!
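
As background for the question above, a generic mask-predict style refinement loop, where T is the number of refinement steps, might look like the sketch below. The model interface, mask_id, and re-masking schedule are assumptions for illustration, not this repository's actual decoding code.

    import torch

    def refine(model, token_ids, mask_id, T):
        # Generic mask-predict refinement sketch (not this repository's decoder).
        # model(token_ids) is assumed to return logits of shape (batch, seq_len, vocab);
        # token_ids starts fully masked and is re-predicted for T steps.
        seq_len = token_ids.size(1)
        for t in range(T):
            logits = model(token_ids)                  # (batch, seq_len, vocab)
            probs, preds = logits.softmax(-1).max(-1)  # per-position confidence and argmax
            token_ids = preds
            # Re-mask the least confident positions; fewer are re-masked each step.
            num_mask = int(seq_len * (T - 1 - t) / T)
            if num_mask > 0:
                low_conf = probs.argsort(dim=-1)[:, :num_mask]
                token_ids = token_ids.scatter(1, low_conf, mask_id)
        return token_ids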

Your GPU Spec for Training

Thanks for the great work!
What GPU setup did you use for the training in your paper (e.g., GPU model, number of GPUs, and total training hours)?

Inference on raw images

Hi,
Thanks for sharing your interesting work on image captioning. I wanted to run the pretrained model on a few of my own images as a test. Could you confirm whether it is this or this that I need to use to create the bounding boxes and features for my images?
Thanks.

Question about number of region_spatial features

Hi @bearcatt,
Thanks for sharing your code. I ran into one question here about region_spatial: what do the 5th and 6th features represent (region_spatial[:, :, [5, 6]])?
Moreover, the paper describes five localization features (four location coordinates and one relative area; see the implementation details section), but in your code here the last dimension has six features. Can you help me understand this part?
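
For reference, one common way to build six per-region geometry features (normalized corners plus normalized width and height) is sketched below; this is only an assumption for illustration, not a statement of what region_spatial actually contains in this repository.

    import numpy as np

    def box_geometry_features(boxes, img_w, img_h):
        # One possible six-dimensional layout (an assumption, not this repo's):
        # normalized corners [x1/W, y1/H, x2/W, y2/H] plus normalized width and height;
        # a relative-area feature would be the product of the last two columns.
        x1, y1, x2, y2 = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
        w = (x2 - x1) / img_w
        h = (y2 - y1) / img_h
        return np.stack([x1 / img_w, y1 / img_h, x2 / img_w, y2 / img_h, w, h], axis=1)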

Question about preparing the COCO data

On the VLP page, do I only need to download the 95 GB and 79 GB files to start training, or do I need to download other data as well?

codes for AoA

Hi~
Your LaBERT code is amazing. I wonder whether you would release the code implementing non-autoregressive, length-controllable image captioning on AoA, as announced in the paper.
Thanks!

Dataset format

Hi!
Thank you for your last reply.

I have a question about the dataset. I downloaded the COCO data (part 1 and part 2) from this link and renamed the files: coco_detection_vg_100dets_gvd_checkpoint_trainval_cls(0to999).h5 -> cls(0to999).h5, coco_detection_vg_100dets_gvd_checkpoint_trainval_feat(0to999).h5 -> feat(0to999).h5, and coco_detection_vg_thresh0.2_feat_gvd_checkpoint_trainvaltest.h5 -> region_bbox.h5.
However, an error occurred. (Terminal command: python -m torch.distributed.launch --nproc_per_node=1 --master_port=4396 train.py save_dir result/ samples_per_gpu 1)
(error screenshot attached)

I added print statements to train.py at line 90 and dataset.py at line 86:

train.py

    print(pred_scores, gt_token_ids)
    loss = criterion(pred_scores, gt_token_ids)

dataset.py

    print(name)
    with h5py.File(osp.join(self.root, f'feat{name[-3:]}.h5'), 'r') as features, \
         h5py.File(osp.join(self.root, f'cls{name[-3:]}.h5'), 'r') as classes, \
         h5py.File(osp.join(self.root, f'region_bbox.h5'), 'r') as bboxes:

Is it right to set up the dataset like this? If not, could you explain the dataset guidelines in more detail?
Thank you for providing a good model. :)
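
For anyone reproducing the renaming described in this issue, a minimal sketch is shown below; the directory path and file-name prefix are assumptions taken from the issue text, and region_bbox.h5 (renamed from the trainvaltest file) still has to be handled separately.

    import glob
    import os
    import os.path as osp

    DATA_ROOT = 'data/features'  # placeholder; point this at the downloaded .h5 files

    # Strip the long VLP prefix so the files match the f'feat{...}.h5' / f'cls{...}.h5'
    # names that dataset.py opens. The prefix is copied from the issue text above and
    # is an assumption about how the downloaded files are actually named.
    PREFIX = 'coco_detection_vg_100dets_gvd_checkpoint_trainval_'

    for path in glob.glob(osp.join(DATA_ROOT, PREFIX + '*.h5')):
        os.rename(path, osp.join(DATA_ROOT, osp.basename(path).replace(PREFIX, '')))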

Data Download Problem

Hi, I am interested in your great work and wish to build on it in further experiments, but I have a problem when trying to download the MSCOCO data.
Since the OneDrive link provided in VLP cannot be accessed from China without a VPN, it is difficult for me to prepare the data on my Ubuntu machine.
Do you have any advice on how to solve this problem? Or could you please provide an alternative download link for the MSCOCO data that is accessible from China?
Thank you!

MS COCO dataset download issue from the link provided [VLP github page]

Hey! I am really interested in your work and would love to dig into it more.
I am facing problems downloading the dataset. Can you please guide me on where and how I can get the 123k-image MS COCO dataset with the splits of 113,287 images for training, 5,000 for validation, and 5,000 for offline evaluation?

The link provided for the dataset leads to the VLP GitHub page. The dataset there is a combination of COCO and VQA (95 GB and 72 GB), which I am unable to use. Can you please suggest a way to download only the MS COCO dataset?

Missing key(s) in state_dict

Hi,

Thanks for sharing this excellent work.
I encountered an issue when trying to test the inference step.
I downloaded the pretrained generator.pth and bert.pth from Google Drive.
The error message is attached below.
Could you give me a hint on how to solve the problem?
Thanks in advance.

Traceback (most recent call last):
  File "inference.py", line 140, in <module>
    g_checkpointer.load(config.model_path, True)
  File "/data-2/home/xingzheng.xz/Research/LaBERT/utils/checkpointer.py", line 47, in load
    self.model.load_state_dict(checkpoint.pop("model"))
  File "/home/xingzheng.xz/miniconda3/envs/omninet/lib/python3.6/site-packages/torch/nn/modules/module.py", line 847, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for Generator:
  Missing key(s) in state_dict: "classifier.decoder.bias", "embedding_layer.position_ids".
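
A generic workaround for this class of mismatch (missing keys caused by differing PyTorch or library versions) is to load the state dict non-strictly and inspect what was skipped; the sketch below is not the maintainer's fix, and the checkpoint-path handling is an assumption about the checkpoint layout.

    import torch

    def load_non_strict(model, checkpoint_path):
        # Generic workaround sketch, not the maintainer's fix: tolerate missing or
        # unexpected keys and print them so they can be checked by hand. Whether
        # "classifier.decoder.bias" and "embedding_layer.position_ids" are safe to
        # leave at their default initialization depends on the versions in use.
        checkpoint = torch.load(checkpoint_path, map_location='cpu')
        state_dict = checkpoint.get('model', checkpoint)
        missing, unexpected = model.load_state_dict(state_dict, strict=False)
        print('missing keys:', missing)
        print('unexpected keys:', unexpected)
        return model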
