facebookresearch / connect-caption-and-trace

78.0 9.0 7.0 8.22 MB

A unified framework to jointly model images, text, and human attention traces.

Python 99.99% Shell 0.01%

connect-caption-and-trace's Issues

No module named 'captioning.data'

I couldn't find the custom dataloader in the source code.
The error occurs at line 20 of connect-caption-and-trace/tools/train.py (commit d015988):

from captioning.data.dataloader import DataLoader
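
One way to narrow down an import error like this is to ask Python where, or whether, it can resolve each level of the package. The sketch below uses only the standard library. The helper name `locate_module` is hypothetical, and the idea that the repository root may be missing from the import path is an assumption, not a confirmed diagnosis.

```python
import importlib.util


def locate_module(name):
    """Return the file Python would import `name` from, or None if unresolved."""
    try:
        spec = importlib.util.find_spec(name)
    except ImportError:
        # Raised when a parent package (e.g. `captioning`) cannot be imported.
        return None
    if spec is None:
        return None
    return spec.origin


if __name__ == "__main__":
    # Check each level of the dotted path named in the error message.
    for name in ("captioning", "captioning.data", "captioning.data.dataloader"):
        print(name, "->", locate_module(name))
```

If `captioning` resolves but `captioning.data` does not, the checkout may be missing that directory. If nothing resolves, running the script from the repository root is worth trying first.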

More details for creating dense word-to-box alignment.

Hi, @zihangm,

Thanks for your nice work!

I am curious about how the dense word-to-box alignment in Section 3.1 of your paper was created. I compared your released coco_LN_trace_box data with the original Localized Narratives (LN) annotations and found that the number of trace segments for a given image differs. For example, for image 322944 in the coco_val split, your released data has 13 trace segments while the original release has 18. Did you apply extra rules to filter or merge the original trace segments for better alignment during preprocessing?

Since I can't find the related preprocessing code in the repo, I would appreciate it if you could share some details.

Thanks,
Jianjie
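
A comparison like the one described above can be reproduced with a short script. This is a minimal sketch that assumes the original annotations are stored as JSON Lines, with an `image_id` field and a `traces` field holding a list of segments per record. The function name `count_trace_segments` and the sample records are made up for illustration. It only counts segments and says nothing about how the released data was filtered or merged.

```python
import json
from collections import defaultdict


def count_trace_segments(jsonl_lines):
    """Map image_id -> list of segment counts, one entry per annotation record."""
    counts = defaultdict(list)
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        # Each element of "traces" is assumed to be one trace segment.
        counts[str(record["image_id"])].append(len(record["traces"]))
    return dict(counts)


if __name__ == "__main__":
    # Two made-up records, not real Localized Narratives data.
    sample = [
        json.dumps({"image_id": "1", "traces": [[{"x": 0.1, "y": 0.2, "t": 0.0}], []]}),
        json.dumps({"image_id": "1", "traces": [[{"x": 0.5, "y": 0.5, "t": 1.0}]]}),
    ]
    print(count_trace_segments(sample))
```

Keeping one count per record, not a single total, makes it easier to spot whether a mismatch comes from several annotations of the same image or from segments being merged.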

Detectron2 Preprocessing

Hi, I'm having trouble following the steps for "(5) Image features (with bounding boxes) extracted by a Mask-RCNN pretrained on Visual Genome", specifically the step "Prepare COCO-style annotations for Visual Genome".

Could you elaborate on these steps? I think the problem is that the linked repository has been deprecated, and the newer installation no longer allows the same steps.

I would really appreciate detailed preprocessing instructions. Alternatively, if you could provide the preprocessed features directly so that the results can be reproduced, that would be amazing as well.

Thank you for the amazing work!
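
For readers stuck on the same step, the general shape of a COCO-style annotation file can be sketched as follows. The input layout is an assumption: a list of entries with `image_id` and `objects`, where each object has `x`, `y`, `w`, `h` and `names`. Image sizes are passed in separately. The function name `vg_to_coco` and the `file_name` pattern are hypothetical, and this is not the repository's own preprocessing.

```python
def vg_to_coco(vg_entries, image_sizes):
    """Build a COCO-style dict from Visual-Genome-like object entries.

    vg_entries: list of {"image_id": int, "objects": [{"x", "y", "w", "h", "names"}]}
    image_sizes: {image_id: (width, height)}
    """
    category_ids = {}
    images, annotations = [], []
    for entry in vg_entries:
        image_id = entry["image_id"]
        width, height = image_sizes[image_id]
        images.append({"id": image_id, "width": width, "height": height,
                       "file_name": f"{image_id}.jpg"})  # hypothetical naming
        for obj in entry["objects"]:
            name = obj["names"][0].strip().lower()
            # Category ids are assigned in order of first appearance, starting at 1.
            cat_id = category_ids.setdefault(name, len(category_ids) + 1)
            annotations.append({
                "id": len(annotations) + 1,
                "image_id": image_id,
                "category_id": cat_id,
                "bbox": [obj["x"], obj["y"], obj["w"], obj["h"]],  # COCO uses [x, y, w, h]
                "area": obj["w"] * obj["h"],
                "iscrowd": 0,
            })
    categories = [{"id": cid, "name": name} for name, cid in category_ids.items()]
    return {"images": images, "annotations": annotations, "categories": categories}
```

A pretrained detector expects its own fixed category list, so the ids produced here are placeholders that would need to be mapped to that list.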

About the storage location of the results (like the generated caption and trace)

Hi, @zihangm,

Thanks for your excellent work!
I have successfully run the code, but I can't find where the results are stored. After training, only one new folder called "eval_result" appears, and it contains only the image captions for the evaluation split.

I would appreciate it if you could share with me the right way to use this code.

Thanks,
yuhu

How to get npz files from the tsv files for step 5 (feature extraction using detectron2)?

Hi!
I am using the precomputed features by Peter Anderson mentioned in your README to get features for the MS COCO dataset. These features are stored as tsv files, but the files required for training are in npz format. I get the following error while training:

python tools/train.py --language_eval 0 --id transformer_LN_coco --caption_model transformer --input_json data/coco_LN.json --input_att_dir test2014/ --input_box_dir data/coco_LN_trace_box --input_label_h5 data/coco_LN_label.h5 --batch_size 30 --learning_rate 5e-4 --learning_rate_decay_start 0 --scheduled_sampling_start 100 --learning_rate_decay_every 3 --save_checkpoint_every 1000 --max_epochs 30 --max_length 225 --seq_per_img 1 --use_box 1 --use_trace 1 --input_trace_dir data/coco_LN_trace_box --use_trace_feat 0 --beam_size 1 --val_images_use -1 --num_layers 2 --task c_joint_t --eval_task caption --dataset_choice=coco

Terminal output:

Warning: coco-caption not available
cider or coco-caption missing
DataLoader loading json file: data/coco_LN.json
vocab size is 8370
DataLoader loading h5 file: data/cocotalk_fc test2014/ data/coco_LN_trace_box data/coco_LN_label.h5
max sequence length in data is 225
read 123287 image features
assigned 118287 images to split train
assigned 5000 images to split val
assigned 5000 images to split test
<class 'captioning.models.TransformerModel_mitr.TransformerModel'>
Traceback (most recent call last):
  File "/Users/mayankamedhe/connect-caption-and-trace/train.py", line 329, in <module>
    train(opt)
  File "/Users/mayankamedhe/connect-caption-and-trace/train.py", line 172, in train
    data = loader.get_batch('train')
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/mayankamedhe/connect-caption-and-trace/captioning/data/dataloader.py", line 424, in get_batch
    data = next(self.iters[split])
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/mayankamedhe/anaconda3/lib/python3.11/site-packages/torch/utils/data/dataloader.py", line 633, in __next__
    data = self._next_data()
           ^^^^^^^^^^^^^^^^^
  File "/Users/mayankamedhe/anaconda3/lib/python3.11/site-packages/torch/utils/data/dataloader.py", line 677, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/mayankamedhe/anaconda3/lib/python3.11/site-packages/torch/utils/data/_utils/fetch.py", line 51, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/mayankamedhe/anaconda3/lib/python3.11/site-packages/torch/utils/data/_utils/fetch.py", line 51, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
            ~~~~~~~~~~~~^^^^^
  File "/Users/mayankamedhe/connect-caption-and-trace/captioning/data/dataloader.py", line 344, in __getitem__
    att_feat = self.att_loader.get(str(self.info['images'][ix]['id']))
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/mayankamedhe/connect-caption-and-trace/captioning/data/dataloader.py", line 76, in get
    f_input = open(os.path.join(self.db_path, key + self.ext), 'rb').read()
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: 'test2014/462632.npz'

Could you tell me how to get the npz files required for training? Thanks a lot!
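
The last frame of the traceback shows the loader opening `<image id>.npz` inside the directory passed as `--input_att_dir`, so that directory needs one npz file per image. A conversion from the tsv files could look roughly like the sketch below. It rests on several assumptions: the six-column layout with base64-encoded float32 arrays, the array key `feat`, and the function names `decode_row` and `convert_tsv`, which are all illustrative and not taken from this repository.

```python
import base64
import csv
import os

import numpy as np

# Column order assumed for the bottom-up-attention TSV files.
FIELDNAMES = ["image_id", "image_w", "image_h", "num_boxes", "boxes", "features"]


def decode_row(row):
    """Decode one TSV row into (image_id, boxes, features)."""
    num_boxes = int(row["num_boxes"])
    boxes = np.frombuffer(base64.b64decode(row["boxes"]), dtype=np.float32)
    feats = np.frombuffer(base64.b64decode(row["features"]), dtype=np.float32)
    return row["image_id"], boxes.reshape(num_boxes, -1), feats.reshape(num_boxes, -1)


def convert_tsv(tsv_path, out_dir):
    """Write one <image_id>.npz per row, storing features under the key 'feat'."""
    csv.field_size_limit(2**31 - 1)  # feature columns exceed the default field limit
    os.makedirs(out_dir, exist_ok=True)
    written = []
    with open(tsv_path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t", fieldnames=FIELDNAMES)
        for row in reader:
            image_id, _boxes, feats = decode_row(row)
            out_path = os.path.join(out_dir, image_id + ".npz")
            np.savez_compressed(out_path, feat=feats)  # key name is an assumption
            written.append(out_path)
    return written
```

After conversion, `--input_att_dir` would point at the output directory instead of the raw image folder. The key the loader actually reads should be checked against `captioning/data/dataloader.py` before relying on this.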

box_feats, trace_feats dimension size 5

Hi,

I am trying to reproduce the model and have two questions. I saw that box_feats (the bounding boxes of the object proposals) and trace_feats (the bounding boxes of the traces) each have 5 dimensions.

Could you elaborate on what each dimension means?
Specifically, what does the 5th dimension refer to?

Also, is the bounding box expressed with a width and height, or with a second pair of x, y coordinates, i.e.
(x, y, w, h, ?) or (x1, y1, x2, y2, ?)?

Thank you!
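
The two candidate layouts in the question differ only by a simple conversion, sketched below. The third function shows one plausible 5-value encoding: corner coordinates scaled to [0, 1] plus the box area relative to the image. That fifth value is a guess for illustration. Nothing in this thread confirms what the repository stores, and all function names are hypothetical.

```python
def xywh_to_xyxy(box):
    """Convert (x, y, w, h) to corner form (x1, y1, x2, y2)."""
    x, y, w, h = box
    return (x, y, x + w, y + h)


def xyxy_to_xywh(box):
    """Convert corner form (x1, y1, x2, y2) to (x, y, w, h)."""
    x1, y1, x2, y2 = box
    return (x1, y1, x2 - x1, y2 - y1)


def normalized_box_with_area(box_xyxy, image_w, image_h):
    """Hypothetical 5-value encoding: normalized corners plus relative area."""
    x1, y1, x2, y2 = box_xyxy
    nx1, ny1 = x1 / image_w, y1 / image_h
    nx2, ny2 = x2 / image_w, y2 / image_h
    area = (nx2 - nx1) * (ny2 - ny1)  # fraction of the image covered by the box
    return (nx1, ny1, nx2, ny2, area)
```

A quick check on real data is whether the third and fourth values are always at least as large as the first and second. If they are, the corner form is the more likely layout.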

Some questions about dataset

Hi,
I'm interested in the data files "(2) h5 file containing caption labels (DATASET_LN_label.h5)" and "The trace labels extracted from Localized Narratives (DATASET_LN_trace_box/)".

How are these data files generated?
What is fc_feat in the model's input?
Which of the image features provided by Peter Anderson are suitable for this task?
