
TAP: Text-Aware Pre-training

TAP: Text-Aware Pre-training for Text-VQA and Text-Caption

by Zhengyuan Yang, Yijuan Lu, Jianfeng Wang, Xi Yin, Dinei Florencio, Lijuan Wang, Cha Zhang, Lei Zhang, and Jiebo Luo

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2021, Oral

Introduction

We propose Text-Aware Pre-training (TAP) for Text-VQA and Text-Caption tasks. For more details, please refer to our paper.

Citation

@inproceedings{yang2021tap,
  title={TAP: Text-Aware Pre-training for Text-VQA and Text-Caption},
  author={Yang, Zhengyuan and Lu, Yijuan and Wang, Jianfeng and Yin, Xi and Florencio, Dinei and Wang, Lijuan and Zhang, Cha and Zhang, Lei and Luo, Jiebo},
  booktitle={CVPR},
  year={2021}
}

Prerequisites

  • Python 3.6

  • PyTorch 1.4.0

  • Please refer to requirements.txt, or install via:

    python setup.py develop
    

Installation

  1. Clone the repository

    git clone https://github.com/microsoft/TAP.git
    cd TAP
    python setup.py develop
    
  2. Data

  • Please refer to the README in the data folder.

Training

  1. Train the model by running the code from the main folder. Use the --pretrain flag to enable pre-training mode; otherwise, the main QA/Captioning losses are used to optimize the model. Example yml files are in the configs folder, and detailed configs are included with the released models.

    Pre-training:

    python -m torch.distributed.launch --nproc_per_node $num_gpu tools/run.py --pretrain --tasks vqa --datasets $dataset --model $model --seed $seed --config configs/vqa/$dataset/"$pretrain_yml".yml --save_dir save/$pretrain_savedir training_parameters.distributed True
    
    # for example
    python -m torch.distributed.launch --nproc_per_node 4 tools/run.py --pretrain --tasks vqa --datasets m4c_textvqa --model m4c_split --seed 13 --config configs/vqa/m4c_textvqa/tap_base_pretrain.yml --save_dir save/m4c_split_pretrain_test training_parameters.distributed True
    

    Fine-tuning:

    python -m torch.distributed.launch --nproc_per_node $num_gpu tools/run.py --tasks vqa --datasets $dataset --model $model --seed $seed --config configs/vqa/$dataset/"$refine_yml".yml --save_dir save/$refine_savedir --resume_file save/$pretrain_savedir/$savename/best.ckpt training_parameters.distributed True
    
    # for example
    python -m torch.distributed.launch --nproc_per_node 4 tools/run.py --tasks vqa --datasets m4c_textvqa --model m4c_split --seed 13 --config configs/vqa/m4c_textvqa/tap_refine.yml --save_dir save/m4c_split_refine_test --resume_file save/pretrained/textvqa_tap_base_pretrain.ckpt training_parameters.distributed True
    
  2. Evaluate the model by running the code from the main folder. Select the val or test set with --run_type.

    python -m torch.distributed.launch --nproc_per_node $num_gpu tools/run.py --tasks vqa --datasets $dataset --model $model --config configs/vqa/$dataset/"$refine_yml".yml --save_dir save/$refine_savedir --run_type val --resume_file save/$refine_savedir/$savename/best.ckpt training_parameters.distributed True
    
    # for example
    python -m torch.distributed.launch --nproc_per_node 4 tools/run.py --tasks vqa --datasets m4c_textvqa --model m4c_split --config configs/vqa/m4c_textvqa/tap_refine.yml --save_dir save/m4c_split_refine_test --run_type val --resume_file save/finetuned/textvqa_tap_base_best.ckpt training_parameters.distributed True
    
  3. Captioning evaluation.

    python projects/M4C_Captioner/scripts/textcaps_eval.py --set val --pred_file YOUR_VAL_PREDICTION_FILE
    

Performance and Pre-trained Models

Please check the detailed experiment settings in our paper.

Model checkpoints (~17 GB) can be downloaded with azcopy:

path/to/azcopy copy https://tapvqacaption.blob.core.windows.net/data/save <local_path>/save --recursive

Please refer to the README in the data folder for detailed instructions on downloading with azcopy.

Text-VQA          TAP      TAP** (with extra data)
TextVQA           49.91    54.71
STVQA             45.29    50.83

Text-Captioning   TAP      TAP** (with extra data)
TextCaps          105.05   109.16

Credits

The project is built on top of the MMF (Pythia) framework.


tap's Issues

Text Caption

Hello,
Great work, kudos.
How can I run TAP for the Text-Caption task? I see no instructions for running the code on the Text-Caption task.

Thank you

Different --model for using `textvqa_tap_ocrcc_best.ckpt`?

Hello devs,

Thank you for publishing this work, and for sharing these resources!

I was trying to run the evaluation code for TextVQA that is mentioned in the README.
I can successfully run the following using textvqa_tap_base_best.ckpt

python tools/run.py --tasks vqa --datasets m4c_textvqa --model m4c_split --config configs/vqa/m4c_textvqa/tap_refine.yml --save_dir save/m4c_split_refine_test --run_type val --resume_file save/finetuned/textvqa_tap_base_best.ckpt

I believe this returns the results without the additional (OCR-CC) data. I think the OCR-CC checkpoint is saved under save/finetuned/textvqa_tap_ocrcc_best.ckpt.

However, when I use the ocrcc checkpoint, it fails while loading the checkpoint...

2022-03-10T21:56:35 INFO: Loading datasets
2022-03-10T21:56:37 INFO: Fetching fastText model for OCR processing
2022-03-10T21:56:37 INFO: Loading fasttext model now from /usr1/home/ptejaswi/TAP/pythia/.vector_cache/wiki.en.bin
2022-03-10T21:56:47 INFO: Finished loading fasttext model
2022-03-10T21:56:50 INFO: CUDA Device 0 is: GeForce GTX TITAN X
2022-03-10T21:56:54 INFO: Torch version is: 1.8.1+cu101
2022-03-10T21:56:54 INFO: Loading checkpoint
2022-03-10T21:56:55 ERROR: Error(s) in loading state_dict for M4C:
	Missing key(s) in state_dict: "text_bert.encoder.layer.0.attention.self.query.weight", "text_bert.encoder.layer.0.attention.self.query.bias", "text_bert.encoder.layer.0.attention.self.key.weight", "text_bert.encoder.layer.0.attention.self.key.bias", "text_bert.encoder.layer.0.attention.self.value.weight", "text_bert.encoder.layer.0.attention.self.value.bias", "text_bert.encoder.layer.0.attention.output.dense.weight", "text_bert.encoder.layer.0.attention.output.dense.bias", "text_bert.encoder.layer.0.attention.output.LayerNorm.weight", "text_bert.encoder.layer.0.attention.output.LayerNorm.bias", "text_bert.encoder.layer.0.intermediate.dense.weight", "text_bert.encoder.layer.0.intermediate.dense.bias", "text_bert.encoder.layer.0.output.dense.weight", "text_bert.encoder.layer.0.output.dense.bias", "text_bert.encoder.layer.0.output.LayerNorm.weight", "text_bert.encoder.layer.0.output.LayerNorm.bias", "text_bert.encoder.layer.1.attention.self.query.weight", "text_bert.encoder.layer.1.attention.self.query.bias", "text_bert.encoder.layer.1.attention.self.key.weight", "text_bert.encoder.layer.1.attention.self.key.bias", "text_bert.encoder.layer.1.attention.self.value.weight", "text_bert.encoder.layer.1.attention.self.value.bias", "text_bert.encoder.layer.1.attention.output.dense.weight", "text_bert.encoder.layer.1.attention.output.dense.bias", "text_bert.encoder.layer.1.attention.output.LayerNorm.weight", "text_bert.encoder.layer.1.attention.output.LayerNorm.bias", "text_bert.encoder.layer.1.intermediate.dense.weight", "text_bert.encoder.layer.1.intermediate.dense.bias", "text_bert.encoder.layer.1.output.dense.weight", "text_bert.encoder.layer.1.output.dense.bias", "text_bert.encoder.layer.1.output.LayerNorm.weight", "text_bert.encoder.layer.1.output.LayerNorm.bias", "text_bert.encoder.layer.2.attention.self.query.weight", "text_bert.encoder.layer.2.attention.self.query.bias", "text_bert.encoder.layer.2.attention.self.key.weight", "text_bert.encoder.layer.2.attention.self.key.bias", "text_bert.encoder.layer.2.attention.self.value.weight", "text_bert.encoder.layer.2.attention.self.value.bias", "text_bert.encoder.layer.2.attention.output.dense.weight", "text_bert.encoder.layer.2.attention.output.dense.bias", "text_bert.encoder.layer.2.attention.output.LayerNorm.weight", "text_bert.encoder.layer.2.attention.output.LayerNorm.bias", "text_bert.encoder.layer.2.intermediate.dense.weight", "text_bert.encoder.layer.2.intermediate.dense.bias", "text_bert.encoder.layer.2.output.dense.weight", "text_bert.encoder.layer.2.output.dense.bias", "text_bert.encoder.layer.2.output.LayerNorm.weight", "text_bert.encoder.layer.2.output.LayerNorm.bias".

Do I need to change the --model argument passed to run.py? At the moment it is --model m4c_split.
This is the command to reproduce the above error:

python tools/run.py --tasks vqa --datasets m4c_textvqa --model m4c_split --config configs/vqa/m4c_textvqa/tap_refine.yml --save_dir save/m4c_orcc_refine_test --run_type val --resume_file save/finetuned/textvqa_tap_ocrcc_best.ckpt
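A quick way to narrow this down is to compare the parameter keys stored in each checkpoint with those the instantiated model expects. Below is a minimal debugging sketch (not from the repo) that assumes the .ckpt files are standard PyTorch checkpoints keeping their weights under a "model" entry; adjust if they are stored at the top level.

# Hypothetical debugging helper: compare the parameter keys of two checkpoints.
# Assumes the .ckpt files are plain torch checkpoints with weights under "model";
# fall back to the top level otherwise.
import torch

def load_state_dict(path):
    ckpt = torch.load(path, map_location="cpu")
    return ckpt.get("model", ckpt) if isinstance(ckpt, dict) else ckpt

base = load_state_dict("save/finetuned/textvqa_tap_base_best.ckpt")
ocrcc = load_state_dict("save/finetuned/textvqa_tap_ocrcc_best.ckpt")

print("only in base:", sorted(set(base) - set(ocrcc))[:10])
print("only in ocrcc:", sorted(set(ocrcc) - set(base))[:10])
# If the text_bert.* keys are absent from the ocrcc checkpoint, the model
# configuration (e.g. the number of TextBERT layers) likely differs between
# the two runs, so a different config or --model setting would be needed.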

Error during fine-tuning of the base model

Hi, I encountered an error when trying to further fine-tune the base model. During the validation check, there is a warning: Token indices sequence length is longer than the specified maximum sequence length for this model (599 > 512). Running this sequence through the model will result in indexing errors
How do I fix this?
Log:

2022-04-15T21:01:58 INFO: m4c_textvqa:, 41100/41100, train/total_loss: 0.6898 (0.6971), train/m4c_textvqa/m4c_decoding_bce_with_mask: 0.6898 (0.6971), train/m4c_textvqa/textvqa_accuracy: 0.8406 (0.8330), val/total_loss: 6.9965, val/m4c_textvqa/m4c_decoding_bce_with_mask: 6.9965, val/m4c_textvqa/textvqa_accuracy: 0.4969, max mem: 6524.0, lr: 0., time: 01m 07s 802ms, eta: 
2022-04-15T21:01:58 INFO: Stepping into final validation check
2022-04-15T21:01:58 INFO: Evaluation time. Running on full validation set...
Token indices sequence length is longer than the specified maximum sequence length for this model (599 > 512). Running this sequence through the model will result in indexing errors
intemediate model saving skipped. utiles/checkpoint, 41101
2022-04-15T21:05:21 INFO: m4c_textvqa: full val:, 41101/41100, val/total_loss: 6.5700, val/m4c_textvqa/m4c_decoding_bce_with_mask: 6.5700, val/m4c_textvqa/textvqa_accuracy: 0.4969, validation time: 04m 31s 394ms, best iteration: 41000, best val/m4c_textvqa/textvqa_accuracy: 0.499082
2022-04-15T21:05:21 INFO: Restoring checkpoint
2022-04-15T21:05:23 INFO: Starting inference on test set
  0%|                                                                                                                 | 0/180 [00:00<?, ?it/s]2022-04-15T21:05:25 WARNING: /home/cybertron/TAP/pythia/modules/losses.py:93: UserWarning: Sample list has not field 'targets', are you sure that your ImDB has labels? you may have wanted to run with --evalai_inference 1
  "Sample list has not field 'targets', are you "

2022-04-15T21:05:25 WARNING: /home/cybertron/TAP/pythia/modules/losses.py:93: UserWarning: Sample list has not field 'targets', are you sure that your ImDB has labels? you may have wanted to run with --evalai_inference 1
  "Sample list has not field 'targets', are you "

Microsoft OCR data could not be found

Hello, I would like to know whether the provided data includes IMDB files and extracted features corresponding to Microsoft OCR? I did not find the corresponding files; could you please clearly point out the path of each data item?
The provided IMDB files all seem to correspond to Rosetta OCR.

A KeyError needs to be solved -- urgent!

Error raised:
File "/home/lianjunliang/anaconda3/envs/TAP/pythia/datasets/vqa/m4c_textvqa/dataset.py", line 112, in load_item
[self.object_clsname[x] for x in features['image_info_0']['objects']]
KeyError: 'objects'

We printed features; here is what it contains:
2022-07-20T19:53:09 INFO: Starting training...
******************** {'image_feature_0': tensor([[0.6171, 0.5207, 0.0000, ..., 6.4700, 0.0000, 0.0000],
[0.0000, 0.0000, 0.0000, ..., 2.4846, 0.0000, 0.0000],
[0.0000, 0.0000, 1.2065, ..., 0.0000, 0.0000, 0.0000],
...,
[0.0000, 0.0000, 1.0070, ..., 5.8578, 5.1698, 0.0000],
[4.3617, 0.0000, 0.0000, ..., 0.0000, 7.6311, 0.0000],
[0.0000, 0.0000, 1.4914, ..., 0.0000, 3.2282, 0.0000]]), 'image_info_0': {'max_features': tensor(100)}, 'image_feature_1': tensor([[0.6171, 0.5207, 0.0000, ..., 6.4700, 0.0000, 0.0000],
[0.0000, 0.0000, 0.0000, ..., 2.4846, 0.0000, 0.0000],
[0.0000, 0.0000, 1.2065, ..., 0.0000, 0.0000, 0.0000],
...,
[0.0000, 0.0000, 1.0070, ..., 5.8578, 5.1698, 0.0000],
[4.3617, 0.0000, 0.0000, ..., 0.0000, 7.6311, 0.0000],
[0.0000, 0.0000, 1.4914, ..., 0.0000, 3.2282, 0.0000]]), 'image_info_1': {'max_features': tensor(100)}}
2022-07-20T19:53:11 ERROR: Caught KeyError in DataLoader worker process 0.
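As a generic sanity check (not part of the original report), one can inspect which fields the extracted feature info files actually contain; the path below is a placeholder, and the info files are assumed to store a pickled dict inside the .npy.

# Hypothetical check: does the feature info file contain the 'objects' field
# that dataset.py indexes? The path is a placeholder, not taken from the repo.
import numpy as np

info_path = "data/feat_resx/textvqa/train/example_info.npy"  # placeholder
info = np.load(info_path, allow_pickle=True).item()          # info file stores a dict

print(sorted(info.keys()))
if "objects" not in info:
    print("'objects' is missing: the features were likely extracted without "
          "object class labels, or the configured feature folder is wrong.")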

Link has expired

Now (2023/12/09) I find that the address provided in data/README.md is invalid. Could someone please provide a new dataset download address? Thanks.

Reproduction of checkpoint results

Dear authors:

I downloaded the model checkpoints (~17 GB) and evaluated the model using the following command:

python tools/run.py --tasks vqa --datasets m4c_textvqa --model m4c_split --config configs/vqa/m4c_textvqa/tap_refine.yml --save_dir save/m4c_split_refine_test --run_type val --resume_file save/finetuned/textvqa_tap_base_best.ckpt

I got the following results:

2022-03-24T11:13:42 INFO: m4c_textvqa: full val:, 41000/24000, val/total_loss: 7.9873, val/m4c_textvqa/m4c_decoding_bce_with_mask: 7.9873, val/m4c_textvqa/textvqa_accuracy: 0.4413

And I found an error message during the evaluation:

Token indices sequence length is longer than the specified maximum sequence length for this model (599 > 512). Running this sequence through the model will result in indexing errors

In my opinion, the accuracy should be 0.4991, as shown in the results table in the README.

What is wrong with my setup? Is it related to the error I encountered?

By the way, when I use the OCR-CC checkpoint save/finetuned/textvqa_tap_ocrcc_best.ckpt, the accuracy is 0.4934 (it should be 0.5471), and I see the same error as mentioned above.

The GPU and PyTorch versions are as follows:

2022-03-24T11:09:34 INFO: CUDA Device 0 is: Tesla V100-SXM2-16GB
2022-03-24T11:09:37 INFO: Torch version is: 1.4.0

Hope to get your response

Thanks

Demo script?

Thanks for sharing the source code.
Is there any script to check the Text-VQA result on a single image, or do you have any plans for one?
Thanks,

Validation Accuracy different from paper

Hi, the validation accuracy I calculated for the fine-tuned models is different from the paper.
Command:

python -m torch.distributed.launch --nproc_per_node 2  tools/run.py --tasks vqa --datasets m4c_textvqa --model m4c_split \
--config $config \
--save_dir $folder \
--run_type val \
--resume_file $finetuned_model \
training_parameters.distributed True 

I observed that changing the batch size results in different values.

                                 Val acc (batch size 32)   Val acc (batch size 128)   In paper
TextVQA TAP (base)               49.87                     49.53                      49.91
TextVQA TAP (additional data)    54.31                     54.13                      54.71

No targets for training

When I train the VQA model I get the warning "Sample list has not field 'targets', are you sure that your ImDB has labels? you may have wanted to run with --evalai_inference 1"

I executed the same command as mentioned:
python -m torch.distributed.launch --nproc_per_node 4 tools/run.py --tasks vqa --datasets m4c_textvqa --model m4c_split --seed 13 --config configs/vqa/m4c_textvqa/tap_refine.yml --save_dir save/m4c_split_refine_test --resume_file save/pretrained/textvqa_tap_base_pretrain.ckpt training_parameters.distributed True

Can you provide additional details on this and on how to train the model with the targets? And can you point out where the targets and predictions are compared to compute the loss?

About download errors when using azcopy

Hi, when I used the azcopy command to download the data, the connection was always reset by the remote host partway through. I have tried for two weeks but the error still remains...
So, are there any alternative ways to download the data? Thanks a lot!

Error in running pretrain because of torch.distributed

Hi,
I installed the environment as follows:
Python 3.8
PyTorch/CUDA installed with: conda install pytorch torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia
GPU: 1x GeForce RTX 3090 (24 GB GPU RAM)

I'm trying to run pre-training with the command below:
python -m torch.distributed.launch --nproc_per_node 1 tools/run.py --pretrain --tasks vqa --datasets m4c_textvqa --model m4c_split --seed 13 --config configs/vqa/m4c_textvqa/tap_base_pretrain.yml --save_dir save/m4c_split_pretrain_test training_parameters.distributed True

but I encounter the error shown in the attached screenshot.

Could you help me to resolve this problem?
Is this error because of using 1 GPU?
Do I need to change the initial value of some parameter (like local_rank)?
Could this error be due to a lack of GPU memory?
It is very important to me to solve this problem.

Question about reproducing results

Hi!
I reproduced TAP (w/o extra data) and the final accuracy is about 46.2% on the validation set, but the paper reports 49.91% on the val set. Are there any details that I missed, or what could be the reason?
Due to insufficient memory, I could only set the batch size to 32, which differs from the 128 used in the paper.
Thanks a lot!

Error when downloading the data

failed to perform copy command due to error: cannot start job due to error: cannot list files due to reason read tcp 192.168.0.178:51805->52.239.247.100:443: read: connection reset by peer.

TAP pre-training on TextCaps, TextVQA, and ST-VQA for the TextVQA downstream task

From the README examples, I can see how to pre-train on just the TextVQA dataset, but I have no idea how to pre-train with extra ST-VQA and TextCaps data for the TextVQA downstream task.

By reading the newest mmf documentation, I found examples of training with an extra ST-VQA config file, so I can now successfully pre-train on the combination of TextVQA and ST-VQA. I created an m4c_combo folder under configs/vqa with new m4c_combo_pretrain.yml and m4c_combo_refine.yml files. In each yml file, I include m4c_base_pretrain.yml or m4c_base_refine.yml respectively, and then insert extra ST-VQA entries in the image_features and imdb_files fields.

But when I add the extra TextCaps dataset config, I get a missing 'question_id' error in a dictionary during the dataset-loading stage. I guess the reason may be that the TextCaps imdb files only have a 'caption' field and no 'question_id' field.

So could you provide multi-dataset pre-training examples or configuration files? I would really appreciate your help, and I believe this will also be helpful for the future Text-VQA research community.
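As a stopgap, one untested workaround based on the guess above is to give every TextCaps imdb entry a dummy question_id so the shared loading code finds the field it expects. The path and the imdb layout (a numpy object array of dicts) are assumptions, not taken from the repo.

# Untested workaround sketch: add a dummy 'question_id' to every TextCaps imdb
# entry so code written for the VQA imdbs does not raise a KeyError.
# The path and imdb layout are assumptions.
import numpy as np

imdb_path = "data/imdb/m4c_textcaps/imdb_train.npy"  # placeholder path
imdb = np.load(imdb_path, allow_pickle=True)

for idx, entry in enumerate(imdb):
    if isinstance(entry, dict) and "question_id" not in entry:
        entry["question_id"] = idx  # dummy id, only needed to satisfy the loader

np.save("data/imdb/m4c_textcaps/imdb_train_with_qid.npy", imdb)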

Error when pre-training on a user-defined dataset

Hi:

I want to use TAP to pre-train a model on my own dataset, and I prepared the dataset following your data format.

But when I try to pre-train the model in the distributed setting (using only one GPU is fine), I encounter the following error:

2022-04-15T14:13:50 INFO: m4c_textvqa:, 73100/96000, train/total_loss: 1.6139 (2.9855), train/m4c_textvqa/pretrainonly_m4c_decoding_bce_with_mask: 1.6139 (2.9855), train/m4c_textvqa/maskpred_accuracy: 0.8486 (0.7797), val/total_loss: 4.3474, val/m4c_textvqa/pretrainonly_m4c_decoding_bce_with_mask: 4.3474 (4.3474), val/m4c_textvqa/maskpred_accuracy: 0.7328, max mem: 7456.0, lr: 0.00001, time: 02m 47s 324ms, eta: 10h 43m 43s 839ms
2022-04-15T14:13:50 INFO: Batch Size of one GPU:16
2022-04-15T14:14:40 ERROR: Expected to have finished reduction in the prior iteration before starting a new one. This error indicates that your module has parameters that were not used in producing loss. You can enable unused parameter detection by (1) passing the keyword argument find_unused_parameters=True to torch.nn.parallel.DistributedDataParallel; (2) making sure all forward function outputs participate in calculating loss. If you already have done the above two steps, then the distributed data parallel module wasn't able to locate the output tensors in the return value of your module's forward function. Please include the loss function and the structure of the return value of forward of your module when reporting this issue (e.g. list, dict, iterable). (prepare_for_backward at /pytorch/torch/csrc/distributed/c10d/reducer.cpp:514)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x33 (0x7ff58f8d1193 in /home/pai/envs/vqa/lib/python3.6/site-packages/torch/lib/libc10.so)
frame #1: c10d::Reducer::prepare_for_backward(std::vector<at::Tensor, std::allocator<at::Tensor> > const&) + 0x731 (0x7ff5dae6ff81 in /home/pai/envs/vqa/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
frame #2: <unknown function> + 0xa0f14a (0x7ff5dae5c14a in /home/pai/envs/vqa/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
frame #3: <unknown function> + 0x2961c4 (0x7ff5da6e31c4 in /home/pai/envs/vqa/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
frame #4: _PyCFunction_FastCallDict + 0x262 (0x56330c484562 in /home/pai/envs/vqa/bin/python)
frame #5: <unknown function> + 0x183135 (0x56330c4b0135 in /home/pai/envs/vqa/bin/python)
...

The training loss drops as expected, but after many iterations (73100 in the case above), the error occurs. This is very strange, since this kind of error should happen before training starts.

Have you ever encountered this problem, or could you help me solve it?

Thanks very much.

Kang
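For reference, the change suggested by the error message corresponds to constructing DistributedDataParallel with find_unused_parameters=True. The snippet below is a generic PyTorch sketch of that flag, not the repo's actual model-building code, and assumes it is launched with torch.distributed.launch so the process-group environment variables are set.

# Generic PyTorch sketch of the fix suggested by the error message above.
# This is not the repo's model-building code.
import torch
import torch.distributed as dist
import torch.nn as nn

dist.init_process_group(backend="nccl")  # env vars set by torch.distributed.launch
model = nn.Linear(10, 2).cuda()
model = nn.parallel.DistributedDataParallel(
    model,
    device_ids=[torch.cuda.current_device()],
    # Tolerate parameters that receive no gradient in some iterations,
    # e.g. heads that are unused during certain pre-training steps.
    find_unused_parameters=True,
)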

What is the val set for pre-training?

Hi! After downloading the OCR-CC features, I found that there are only feature files for the training set, but the IMDB file contains information about the val set, and the tap_base_pretrain.yml file requires val and test set entries. What should be filled in for this part?
Thanks a lot!

Fail to get the data: AuthenticationErrorDetail: Issuer validation failed. Issuer did not match.

After I ran azcopy login to authorize a user identity and it finally showed INFO: Login succeeded., I tried to run the download command azcopy copy https://tapvqacaption.blob.core.windows.net/data/data ./ --recursive.
But I got a 401 failure; the detailed error information is as follows.

INFO: Scanning...
INFO: Authenticating to source using Azure AD
INFO: Any empty folders will not be processed, because source and/or destination doesn't have full folder support

failed to perform copy command due to error: cannot start job due to error: cannot list files due to reason -> github.com/Azure/azure-storage-blob-go/azblob.newStorageError, /home/vsts/go/pkg/mod/github.com/!azure/[email protected]/azblob/zc_storage_error.go:42
===== RESPONSE ERROR (ServiceCode=InvalidAuthenticationInfo) =====
Description=Server failed to authenticate the request. Please refer to the information in the www-authenticate header.
RequestId:3cd53a6d-601e-00c3-71ff-675565000000
Time:2021-06-23T07:13:31.2032025Z, Details:
   AuthenticationErrorDetail: Issuer validation failed. Issuer did not match.
   Code: InvalidAuthenticationInfo
   GET https://tapvqacaption.blob.core.windows.net/data?comp=list&delimiter=%2F&include=metadata&prefix=data%2F&restype=container&timeout=901
   Authorization: REDACTED
   User-Agent: [AzCopy/10.11.0 Azure-Storage/0.13 (go1.15; linux)]
   X-Ms-Client-Request-Id: [2c0efb91-40c3-4dd0-4634-aebdd3eeda04]
   X-Ms-Version: [2019-12-12]
   --------------------------------------------------------------------------------
   RESPONSE Status: 401 Server failed to authenticate the request. Please refer to the information in the www-authenticate header.
   Content-Length: [402]
   Content-Type: [application/xml]
   Date: [Wed, 23 Jun 2021 07:13:31 GMT]
   Server: [Microsoft-HTTPAPI/2.0]
   Www-Authenticate: [Bearer authorization_uri=https://login.microsoftonline.com/72f988bf-86f1-41af-91ab-2d7cd011db47/oauth2/authorize resource_id=https://storage.azure.com]
   X-Ms-Error-Code: [InvalidAuthenticationInfo]
   X-Ms-Request-Id: [3cd53a6d-601e-00c3-71ff-675565000000]

I wonder which step I did wrong and what I should do to download the data.

About the number of OCR tokens in the ST-VQA dataset

Hi!
I found that the number of words detected by OCR for some images in the ST-VQA dataset is inconsistent with the corresponding number of features.
For example, the number of features in 'feat_resx/stvqa/train/imageNet/n03196217_7957.npy' is 33, while the number of OCR words in the corresponding 'ocr_feat_resx/stvqa_conf/train/imageNet/n03196217_7957_info.npy' is 55. The two numbers do not match, and about 2000 images in the train set have this problem.

TextVQA accuracy

Hello, I have the prediction json file for TextVQA; can you tell me how to compute the accuracy? Only the eval code for TextCaps is currently provided.
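For reference, TextVQA uses the VQA-style soft accuracy over 10 human answers; in its commonly used simplified form, a predicted answer scores min(#annotators who gave it / 3, 1), averaged over questions. A minimal sketch is below; the json field names ('question_id', 'answer', 'answers') are assumptions about the prediction and annotation files, not the repo's exact format.

# Illustrative sketch of the simplified VQA-style soft accuracy used for TextVQA.
# Field names are assumptions about the prediction/annotation json files.
import json

with open("val_predictions.json") as f:
    predictions = {p["question_id"]: p["answer"] for p in json.load(f)}

with open("textvqa_val_annotations.json") as f:
    annotations = json.load(f)  # each entry: a question_id and 10 human answers

scores = []
for ann in annotations:
    pred = predictions.get(ann["question_id"], "").strip().lower()
    humans = [a.strip().lower() for a in ann["answers"]]
    scores.append(min(humans.count(pred) / 3.0, 1.0))

print("TextVQA soft accuracy: {:.4f}".format(sum(scores) / max(len(scores), 1)))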

Request for OCR-CC information (image IDs)

Hello @zyang-ur, and all

Thanks for this work, it is quite interesting.

I'm trying to obtain the OCR-CC dataset but due to my constraints, I can't download the 1.7TB dataset.
However, I have the CC dataset and it would be possible for me to obtain the subset of images that are in OCR-CC.

Could you please share the image IDs of CC that were used to construct OCR-CC?

Thanks in advance!

VQA output

Is it possible to visualize the output of the VQA and captioning models, i.e., what answers/captions the models are producing?

How to convert from multi-GPU distributed training to a single GPU

This project appears to require multiple GPUs. How can I change the source code to run on a single-GPU system? When I run this code on a system with a single GPU (GeForce GTX Ti), I get an error and cannot run the project.
python -m torch.distributed.launch --nproc_per_node 1 tools/run.py --pretrain --tasks vqa --datasets m4c_textvqa --model m4c_split --seed 13 --config configs/vqa/m4c_textvqa/tap_base_pretrain.yml --save_dir save/m4c_split_pretrain_test training_parameters.distributed True

Hang when calculating validation accuracy

Hi, I ran a .sh script to calculate validation accuracy for a few models.
The code hangs after calculating validation accuracy for a model (the hang lasts for more than 30 minutes). I have to press CTRL+C to break the hang so that the script continues calculating validation accuracy for the remaining models (the hang occurs for each subsequent calculation too). How can I fix this?

The terminal output after CTRL+C:

Token indices sequence length is longer than the specified maximum sequence length for this model (599 > 512). Running this sequence through the model will result in indexing errors
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████| 40/40 [02:37<00:00,  3.95s/it]
2022-06-06T00:21:54 INFO: Key current_iteration is not present in registry, returning default value of None
2022-06-06T00:21:54 INFO: m4c_textvqa: full val:, 0/4000, val/total_loss: 38.3987, val/m4c_textvqa/m4c_decoding_bce_with_mask: 38.3987, val/m4c_textvqa/textvqa_accuracy: 0.2572
^CTraceback (most recent call last):
  File "/home/cybertron/anaconda3/envs/tap/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/cybertron/anaconda3/envs/tap/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/cybertron/anaconda3/envs/tap/lib/python3.6/site-packages/torch/distributed/launch.py", line 246, in <module>
    main()
  File "/home/cybertron/anaconda3/envs/tap/lib/python3.6/site-packages/torch/distributed/launch.py", line 239, in main
    process.wait()
  File "/home/cybertron/anaconda3/envs/tap/lib/python3.6/subprocess.py", line 1477, in wait
    (pid, sts) = self._try_wait(0)
  File "/home/cybertron/anaconda3/envs/tap/lib/python3.6/subprocess.py", line 1424, in _try_wait
    (pid, sts) = os.waitpid(self.pid, wait_flags)
KeyboardInterrupt
