connect-caption-and-trace's Issues

More details for creating dense word-to-box alignment.

Hi, @zihangm,

Thanks for your nice work!

I am curious about the details of how the dense word-to-box alignment in Section 3.1 of your paper is created. I compared your released coco_LN_trace_box data with the original released LN dataset annotations and found that the number of trace segments for a given image differs. For example, for the image with id 322944 in the coco_val split, your released data contains 13 trace segments while the original data contains 18. Did you apply any extra rules for filtering or merging the original trace segments during preprocessing to get a better alignment?

Since I can't find the related preprocessing code in the repo, I would appreciate it if you could share your experience.

Thanks,
Jianjie
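
For anyone who wants to reproduce the comparison described above, here is a minimal sketch that counts the trace segments per annotation in the original Localized Narratives JSON Lines files. It assumes each line is a JSON object with an `image_id` field and a `traces` field holding a list of segments (each segment a list of `{x, y, t}` points). The function name `count_trace_segments` is hypothetical, and the layout of the repo's released coco_LN_trace_box files is not covered here.

```python
import json


def count_trace_segments(jsonl_lines, image_id):
    """Return the number of trace segments for every annotation of `image_id`.

    `jsonl_lines` is any iterable of JSON strings (for example an open file).
    One image can have several annotations, so a list of counts is returned.
    """
    counts = []
    for line in jsonl_lines:
        if not line.strip():
            continue  # skip blank lines
        ann = json.loads(line)
        # Compare as strings because ids may be stored as str or int.
        if str(ann["image_id"]) == str(image_id):
            counts.append(len(ann["traces"]))
    return counts


if __name__ == "__main__":
    # Tiny in-memory example; real use would pass an open .jsonl file instead.
    example = [
        json.dumps({"image_id": "322944",
                    "traces": [[{"x": 0.1, "y": 0.2, "t": 0.0}], []]}),
    ]
    print(count_trace_segments(example, 322944))
```

Running the same count over the released data (once its format is known) would show whether whole segments were dropped or merged.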

Detectron2 Preprocessing

Hi, I'm having trouble following the steps for "(5) Image features (with bounding boxes) extracted by a Mask-RCNN pretrained on Visual Genome", specifically the step "Prepare COCO-style annotations for Visual Genome".

Could you elaborate on these steps? I think the problem is that the linked repository has been deprecated, and the newer installation no longer allows the same steps.

I would really appreciate detailed preprocessing instructions. Alternatively, if you could provide the preprocessed features directly so that the results can be reproduced, that would be amazing as well.

Thank you for the amazing work!
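
As a rough illustration of what "COCO-style annotations for Visual Genome" could look like, the sketch below converts entries shaped like Visual Genome's objects.json (an `image_id` plus a list of objects with `x`, `y`, `w`, `h` and `names`) into a COCO detection dictionary. This is not the repo's actual preprocessing: the function name `vg_to_coco`, the `image_sizes` and `vocab` arguments, and the `file_name` pattern are all assumptions made for the example.

```python
def vg_to_coco(vg_objects, image_sizes, vocab):
    """Build a COCO-style dict from Visual Genome style object entries.

    vg_objects:  list of {"image_id": int, "objects": [{"x", "y", "w", "h", "names"}]}
    image_sizes: {image_id: (width, height)} (hypothetical helper input)
    vocab:       list of class names to keep; other objects are skipped
    """
    cat_ids = {name: i + 1 for i, name in enumerate(vocab)}  # COCO ids start at 1
    images, annotations = [], []
    for entry in vg_objects:
        image_id = entry["image_id"]
        width, height = image_sizes[image_id]
        images.append({
            "id": image_id,
            "width": width,
            "height": height,
            "file_name": f"{image_id}.jpg",  # assumed naming scheme
        })
        for obj in entry["objects"]:
            # Use the first name that is part of the kept vocabulary.
            name = next((n for n in obj["names"] if n in cat_ids), None)
            if name is None:
                continue
            annotations.append({
                "id": len(annotations) + 1,
                "image_id": image_id,
                "category_id": cat_ids[name],
                "bbox": [obj["x"], obj["y"], obj["w"], obj["h"]],  # COCO uses x, y, w, h
                "area": obj["w"] * obj["h"],
                "iscrowd": 0,
            })
    categories = [{"id": i, "name": n} for n, i in cat_ids.items()]
    return {"images": images, "annotations": annotations, "categories": categories}
```

The resulting dictionary can be written with `json.dump` and registered with a detection framework that reads COCO-format JSON.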

How to get npz files from the tsv files for step 5 (feature extraction using detectron2)?

Hi!
I am using the pre-extracted features by Peter Anderson mentioned in your README to get features for the MS COCO dataset. These features are stored as tsv files, but the files required for training are in npz format. I get the following error while training:

python tools/train.py --language_eval 0 --id transformer_LN_coco \
    --caption_model transformer --input_json data/coco_LN.json \
    --input_att_dir test2014/ --input_box_dir data/coco_LN_trace_box \
    --input_label_h5 data/coco_LN_label.h5 --batch_size 30 \
    --learning_rate 5e-4 --learning_rate_decay_start 0 \
    --scheduled_sampling_start 100 --learning_rate_decay_every 3 \
    --save_checkpoint_every 1000 --max_epochs 30 --max_length 225 \
    --seq_per_img 1 --use_box 1 --use_trace 1 \
    --input_trace_dir data/coco_LN_trace_box --use_trace_feat 0 \
    --beam_size 1 --val_images_use -1 --num_layers 2 \
    --task c_joint_t --eval_task caption --dataset_choice=coco

Terminal output:

Warning: coco-caption not available
cider or coco-caption missing
DataLoader loading json file: data/coco_LN.json
vocab size is 8370
DataLoader loading h5 file: data/cocotalk_fc test2014/ data/coco_LN_trace_box data/coco_LN_label.h5
max sequence length in data is 225
read 123287 image features
assigned 118287 images to split train
assigned 5000 images to split val
assigned 5000 images to split test
<class 'captioning.models.TransformerModel_mitr.TransformerModel'>
Traceback (most recent call last):
  File "/Users/mayankamedhe/connect-caption-and-trace/train.py", line 329, in <module>
    train(opt)
  File "/Users/mayankamedhe/connect-caption-and-trace/train.py", line 172, in train
    data = loader.get_batch('train')
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/mayankamedhe/connect-caption-and-trace/captioning/data/dataloader.py", line 424, in get_batch
    data = next(self.iters[split])
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/mayankamedhe/anaconda3/lib/python3.11/site-packages/torch/utils/data/dataloader.py", line 633, in __next__
    data = self._next_data()
           ^^^^^^^^^^^^^^^^^
  File "/Users/mayankamedhe/anaconda3/lib/python3.11/site-packages/torch/utils/data/dataloader.py", line 677, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/mayankamedhe/anaconda3/lib/python3.11/site-packages/torch/utils/data/_utils/fetch.py", line 51, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/mayankamedhe/anaconda3/lib/python3.11/site-packages/torch/utils/data/_utils/fetch.py", line 51, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
            ~~~~~~~~~~~~^^^^^
  File "/Users/mayankamedhe/connect-caption-and-trace/captioning/data/dataloader.py", line 344, in __getitem__
    att_feat = self.att_loader.get(str(self.info['images'][ix]['id']))
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/mayankamedhe/connect-caption-and-trace/captioning/data/dataloader.py", line 76, in get
    f_input = open(os.path.join(self.db_path, key + self.ext), 'rb').read()
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: 'test2014/462632.npz'

Could you tell me how to get the npz files required for training? Thanks a lot!
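
The traceback shows the loader looking for one `<image_id>.npz` file per image in the directory passed as `--input_att_dir`. Below is a minimal sketch of converting a bottom-up-attention TSV file into such per-image files. It relies on several assumptions that are not confirmed in this thread: that the TSV columns are `image_id, image_w, image_h, num_boxes, boxes, features` with base64-encoded float32 arrays, and that the loader reads the array stored under the key `feat`. The function names are hypothetical.

```python
import base64
import csv
import os

import numpy as np

# Assumed column order of the bottom-up-attention TSV files.
FIELDNAMES = ["image_id", "image_w", "image_h", "num_boxes", "boxes", "features"]

# The base64 feature column is far larger than csv's default field limit.
csv.field_size_limit(2**31 - 1)


def decode_row(row):
    """Decode one TSV row into (image_id, boxes[N, 4], features[N, D])."""
    num_boxes = int(row["num_boxes"])
    boxes = np.frombuffer(base64.b64decode(row["boxes"]), dtype=np.float32)
    feats = np.frombuffer(base64.b64decode(row["features"]), dtype=np.float32)
    return row["image_id"], boxes.reshape(num_boxes, 4), feats.reshape(num_boxes, -1)


def tsv_to_npz(tsv_path, att_dir):
    """Write one <image_id>.npz per TSV row into att_dir; return the row count."""
    os.makedirs(att_dir, exist_ok=True)
    count = 0
    with open(tsv_path, newline="") as f:
        reader = csv.DictReader(f, delimiter="\t", fieldnames=FIELDNAMES)
        for row in reader:
            image_id, _boxes, feats = decode_row(row)
            # 'feat' is the assumed key name; np.savez_compressed appends '.npz'.
            np.savez_compressed(os.path.join(att_dir, image_id), feat=feats)
            count += 1
    return count
```

With files like `test2014/462632.npz` in place, the `FileNotFoundError` above should no longer be raised for that image, provided the key name and feature layout match what the loader expects.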

box_feats, trace_feats dimension size 5

Hi,

I was attempting to reproduce the model and have two questions. I saw that box_feats (the bounding boxes of object proposals) and trace_feats (the bounding boxes of traces) each have 5 dimensions.

Could you elaborate on what each dimension means?
Specifically what is the 5th dimension? What does this value refer to?

Also, is the bounding box expressed as a corner plus width and height, or as two corner points, i.e.
(x, y, w, h, ?) or (x1, y1, x2, y2, ?)?

Thank you!
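
To make the two candidate layouts concrete, here is a small sketch that converts between them and builds one plausible 5-dimensional box vector: normalized corner coordinates plus the fraction of the image area the box covers. That the fifth value is an area ratio is only an assumption for illustration; the thread does not confirm what the repo stores, and both function names are hypothetical.

```python
def xywh_to_xyxy(x, y, w, h):
    """Convert (x, y, width, height) to corner format (x1, y1, x2, y2)."""
    return (x, y, x + w, y + h)


def box_to_5d(x1, y1, x2, y2, img_w, img_h):
    """Return [x1, y1, x2, y2, area] with every value normalized to [0, 1].

    The fifth entry is the box area as a fraction of the image area
    (an assumed convention, not confirmed for this repo).
    """
    nx1, ny1 = x1 / img_w, y1 / img_h
    nx2, ny2 = x2 / img_w, y2 / img_h
    area = (nx2 - nx1) * (ny2 - ny1)
    return [nx1, ny1, nx2, ny2, area]


if __name__ == "__main__":
    corners = xywh_to_xyxy(0, 0, 320, 240)
    print(box_to_5d(*corners, img_w=640, img_h=480))
```

One quick check on real data: in corner format the third and fourth values are never smaller than the first and second, which does not have to hold for width and height.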

Some questions about dataset

Hi,
I'm interested in the data files ‘(2) h5 file containing caption labels (DATASET_LN_label.h5)’ and ‘The trace labels extracted from Localized Narratives (DATASET_LN_trace_box/)’.

  1. How are these data files generated?
  2. What is fc_feat in the model's input?
  3. Which of the image features provided by Peter Anderson are suitable for this task?
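
On question 2, a common convention in captioning codebases that use region features is for fc_feat to be a single global image vector obtained by averaging the per-region attention features. Whether this repo follows that convention is an assumption; the sketch below only illustrates the idea, and `mean_pool_fc` is a hypothetical name.

```python
import numpy as np


def mean_pool_fc(att_feat):
    """Average region features of shape [num_regions, dim] into one [dim] vector."""
    att_feat = np.asarray(att_feat, dtype=np.float32)
    if att_feat.ndim != 2:
        raise ValueError("expected a 2-D array of shape [num_regions, dim]")
    return att_feat.mean(axis=0)
```

For example, `mean_pool_fc` applied to the array stored in one per-image npz file would give a candidate fc feature for that image.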
