
yiconghong / recurrent-vln-bert


Code of the CVPR 2021 Oral paper: A Recurrent Vision-and-Language BERT for Navigation

License: Other

Python 99.16% Shell 0.84%
vision-and-language-navigation cvpr2021 cvpr-oral transformer bert pre-trained-model vision-and-language


recurrent-vln-bert's Issues

Mismatch between weights?

Hi there,

Congratulations on your CVPR paper and on releasing your code. I was wondering whether you could clarify the structure of the checkpoints you released. I'm interested in the OSCAR version of your model and tried to load it; however, it looks like the following parameters cannot be found:

'img_projection.weight', 'img_projection.bias'

I inspected the VLNBert class in the file vlnbert_OSCAR.py and it looks like there is no module called img_projection; instead, there seems to be one in the vlnbert_PREVALENT.py file. In addition, even in the original OSCAR codebase I cannot find any mention of an img_projection layer (https://github.com/microsoft/Oscar/blob/master/oscar/modeling/modeling_bert.py). Could you please verify that the released model checkpoints are correct and correspond to the intended models?
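For reference, a minimal way to surface the mismatch, assuming the released file is (or wraps) a plain PyTorch state dict, is to list the offending keys directly:

import torch

ckpt = torch.load('best_val_unseen', map_location='cpu')  # path to the downloaded OSCAR checkpoint
# The wrapper keys below are an assumption; skip this line if the file is already a flat state dict.
state_dict = ckpt['vln_bert']['state_dict'] if 'vln_bert' in ckpt else ckpt

print([k for k in state_dict if 'img_projection' in k])
# -> ['img_projection.weight', 'img_projection.bias'] would confirm that the checkpoint carries
#    a projection layer that the VLNBert class in vlnbert_OSCAR.py does not define.
# Alternatively, model.load_state_dict(state_dict, strict=False) reports missing and unexpected
# keys instead of raising.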

Thanks,
Alessandro

How to download panoramic images?

Thank you very much for your excellent work!

Could you please share how to download panoramic images and obtain visualizations for navigation?

I really appreciate your help!

Why don't you use the 'speaker' during training?

Hi! I don't see any code related to the 'speaker', a useful way to do data augmentation for R2R. I am wondering why you removed the speaker part from your code. Or have you run experiments showing that using a speaker doesn't work well with your method?
Thanks a lot!

Unable to test code

Hello Yicong,

Can you please add a section to the README about using the Matterport3DSimulator Docker image with your code? The documentation is missing details on where to put the ResNet zip, the Prevalent JSON, and the PyTorch model, and it is unclear how Matterport3DSimulator works with your code.

Thanks

A request for the REVERIE code

Hi Yicong. I want to cite your work on REVERIE. Could you please provide the code and features? I have sent you an e-mail. Thanks a lot!

Why split instructions?

Hi Yicong,

Thanks for open-sourcing your code!

I wonder why you split instructions in r2r_src/env.py, lines 129 to 142:

# Split multiple instructions into separate entries
for j, instr in enumerate(item['instructions']):
    try:
        new_item = dict(item)
        new_item['instr_id'] = '%s_%d' % (item['path_id'], j)
        new_item['instructions'] = instr

        ''' BERT tokenizer '''
        instr_tokens = tokenizer.tokenize(instr)
        padded_instr_tokens, num_words = pad_instr_tokens(instr_tokens, args.maxInput)
        new_item['instr_encoding'] = tokenizer.convert_tokens_to_ids(padded_instr_tokens)

        if new_item['instr_encoding'] is not None:  # Filter the wrong data
            self.data.append(new_item)
            scans.append(item['scan'])
    except:
        continue

This is done for the original path-instruction data but not for prevalent_aug.json, and I wonder why. I understand that the instructions in the original data are a bit long, but if you split them into separate VLN episodes while the desired path is always the complete path, how can an agent (or a human) possibly do that?
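For reference, each entry in the original R2R annotation files looks roughly like this (field names from R2R_train.json; the values are only illustrative), which is what the loop above iterates over:

item = {
    'path_id': 4332,                        # illustrative id
    'scan': '17DRP5sb8fy',
    'heading': 4.055,
    'path': ['viewpoint_id_1', 'viewpoint_id_2', 'viewpoint_id_3'],   # the complete trajectory
    'instructions': [
        'Walk past the kitchen and stop at the sofa.',        # three independent instructions,
        'Head straight, then wait next to the couch.',        # each written for the whole path
        'Go down the hallway and stop by the seating area.',
    ],
}
# After the loop, this becomes three entries with instr_id '4332_0', '4332_1' and '4332_2',
# each paired with a single instruction string and the same full path.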

Best,
Jason

How to reduce the training time?

Hi @YicongHong,
Thank you so much for the great work. I reproduced Recurrent VLN-BERT on the R2R and REVERIE datasets following the README. During training, it takes about 2,500 minutes on a single GPU to run 200,000 iterations, and even with two GPUs this time does not drop at all, which confuses me. Did training also take this long for you? Why does adding GPUs not increase the speed, and what limits it? Is there any way to speed up training other than reducing the number of iterations?

A Request for Code of REVERIE

Hi Yicong, I am interested in citing your work on REVERIE and would appreciate it if you could share the code and features with me. I have already sent an email to you regarding this matter. Thank you very much for your assistance!

Could you provide the object features for the REVERIE task?

Hi Yicong, I'm very impressed by the Recurrent VLN-BERT work and want to reproduce it for the REVERIE task. But I noticed that the current repo only seems to provide the related files for the R2R task, e.g., the view features. So I'm opening this issue to ask whether you could release the object features (from Faster R-CNN) used in the REVERIE task. Thanks in advance!

REVERIE

How do you train on the REVERIE dataset?

The vocab size

Hi Yicong,

Thanks for your great work!
I found that the vocab size of R2R is 991, but the vocab size of the Prevalent augmented data is 1101. Additionally, the Prevalent instructions are generated by a speaker model trained on the R2R dataset. Do you have any idea about this?

Thanks,

ModuleNotFoundError: No module named 'transformers.pytorch_transformers'

Hello, I was trying to run the model with bash run/test_agent.bash as instructed in your README, but I get the following error:

Optimizer: Using AdamW
To use data.metrics please install scikit-learn. See https://scikit-learn.org/stable/index.html
Traceback (most recent call last):
  File "r2r_src/train.py", line 13, in <module>
    from agent import Seq2SeqAgent
  File "/Recurrent-VLN-BERT/r2r_src/agent.py", line 21, in <module>
    import model_OSCAR, model_PREVALENT
  File "/Recurrent-VLN-BERT/r2r_src/model_OSCAR.py", line 7, in <module>
    from vlnbert.vlnbert_init import get_vlnbert_models
  File "/Recurrent-VLN-BERT/r2r_src/vlnbert/vlnbert_init.py", line 3, in <module>
    from transformers.pytorch_transformers import (BertConfig, BertTokenizer)
ModuleNotFoundError: No module named 'transformers.pytorch_transformers'

I have transformers and pytorch-transformers installed, as well as the old pytorch-pretrained-bert, and I am unsure what is causing this. Any help? Thanks in advance.
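For what it's worth, one workaround, assuming the problem is only the import path and that the standalone pytorch-transformers package is installed (pip install pytorch-transformers), is to change the import at line 3 of r2r_src/vlnbert/vlnbert_init.py:

# r2r_src/vlnbert/vlnbert_init.py, line 3 -- a possible workaround, not necessarily the intended setup
# from transformers.pytorch_transformers import (BertConfig, BertTokenizer)
from pytorch_transformers import BertConfig, BertTokenizer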

Details about the no init. OSCAR model

Hi Yicong, I wonder how you initialize the "no init." OSCAR model to get the results reported in the paper. Did you initialize all the parameters randomly, or did you use some pretrained weights, e.g., initializing the language part with BERT pretrained weights?

Version of the Matterport3D Simulator

Hi, Yicong

When I try to configure the environment for Recurrent-VLN-BERT, I find that only the old version of the Matterport3D Simulator supports this code, because the new version has changed its API (e.g., sim.makeAction) to support parallel (batched) navigation. Maybe you could note this in the README.
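For illustration, the API difference looks roughly like this (a sketch from memory of the two simulator versions, not an exact reproduction of either; the episode values are placeholders):

import MatterSim

scan_id, viewpoint_id = '17DRP5sb8fy', 'some_viewpoint_id'   # placeholder episode parameters
heading, elevation = 0.0, 0.0

sim = MatterSim.Simulator()
# ... configure camera/rendering options and initialise the simulator here ...

# Old-style API, which this codebase expects: scalar arguments, a single state back
sim.newEpisode(scan_id, viewpoint_id, heading, elevation)
sim.makeAction(0, 1.0, 0.0)        # (navigable-location index, heading change, elevation change)
state = sim.getState()

# Newer (current master) API: calls are batched, so the arguments are lists and
# getState() returns a list of states
sim.newEpisode([scan_id], [viewpoint_id], [heading], [elevation])
sim.makeAction([0], [1.0], [0.0])
state = sim.getState()[0]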

Specify license for the code

Hello,

Thanks again for your codebase; it was very useful indeed, and congratulations on your accepted paper. I was wondering whether you could please add a license to your codebase so that it's clear how this code can be used by third parties.

Thanks,
Alessandro

The data file R2R_test.json wasn't used when testing?

Hi Yicong! I have reproduced this codebase. When I ran run/test_agent.bash, I noticed that the data file R2R_test.json wasn't used in the test. So I set the parameter 'submit' to 1 and rewrote the file 'id_paths.json' to run the test without any other changes, and I got the following results.

Optimizer: Using AdamW
Namespace(IMAGENET_FEATURES='img_features/ResNet-152-imagenet.tsv', angle_feat_size=128, aug=None, batchSize=16, description='VLNBERT-test-Prevalent', dropout=0.5, epsilon=0.1, featdropout=0.4, feature_size=2048, features='places365', feedback='sample', gamma=0.9, ignoreid=-100, iters=300000, load='snap/VLNBERT-PREVALENT-final/state_dict/best_val_unseen', loadOptim=False, log_dir='snap/VLNBERT-test-Prevalent', lr=1e-05, maxAction=15, maxInput=80, ml_weight=0.2, name='VLNBERT-test-Prevalent', normalize_loss='total', optim='adamW', optimizer=<class 'torch.optim.adamw.AdamW'>, submit=1, teacher='final', teacher_weight=1.0, test_only=0, train='validlistener', vlnbert='prevalent', weight_decay=0.0, zero_init=False)

Start loading the image feature ... (~50 seconds)
Finish Loading the image feature from img_features/ResNet-152-places365.tsv in 54.7334 seconds
The feature size is 2048
Loading navigation graphs for 61 scans
R2RBatch loaded with 14039 instructions, using splits: train
The feature size is 2048
Loading navigation graphs for 59 scans
R2RBatch loaded with 1501 instructions, using splits: val_train_seen
The feature size is 2048
Loading navigation graphs for 56 scans
R2RBatch loaded with 1021 instructions, using splits: val_seen
The feature size is 2048
Loading navigation graphs for 11 scans
R2RBatch loaded with 2349 instructions, using splits: val_unseen
The feature size is 2048
Loading navigation graphs for 18 scans
R2RBatch loaded with 4173 instructions, using splits: test

Initalizing the VLN-BERT model ...
Loaded the listener model at iter 114000 from snap/VLNBERT-PREVALENT-final/state_dict/best_val_unseen
result length 1501
Env name: val_train_seen, nav_error: 0.8354, oracle_error: 0.6634, steps: 5.1845, lengths: 10.0276, success_rate: 0.9394, oracle_rate: 0.9520, spl: 0.9124
result length 1021
Env name: val_seen, nav_error: 2.8968, oracle_error: 1.9405, steps: 5.5436, lengths: 11.1379, success_rate: 0.7228, oracle_rate: 0.7826, spl: 0.6775
result length 2349
Env name: val_unseen, nav_error: 3.9255, oracle_error: 2.5431, steps: 6.1243, lengths: 12.0028, success_rate: 0.6279, oracle_rate: 0.7024, spl: 0.5688
result length 4173
Env name: test, nav_error: 9.0420, oracle_error: 0.0000, steps: 6.1107, lengths: 12.3490, success_rate: 0.0357, oracle_rate: 1.0000, spl: 0.0000

I am really shocked by the results on the test data. Did I make a mistake somewhere, and if so, where?

Failed to build Matterport3D Simulator

Hi Yicong,

This is not directly related to your code, but I've spent hours trying to follow the Matterport3DSimulator repo to build it, and I ran into issues both with and without Docker.

With Docker, MatterSim can be built, but it is only available for the system Python; since I use Anaconda on the lab server, importing MatterSim fails in my Anaconda environment.

Without Docker, the build failed: line 59 of src/lib/NavGraph.cpp errors because CV_LOAD_IMAGE_ANYDEPTH is not defined in scope. I only downloaded matterport_skybox_images, and this might be the problem (although the README in Matterport3DSimulator says matterport_skybox_images is all you need to build and run the simulator). What data did you download from the Matterport3D dataset?

Best,
Jason

R2R Test Unseen

Thank you so much for the great work. I reproduced Recurrent VLN-BERT on the R2R dataset following the README, but I only get results for the validation sets. How can I get the results on Test Unseen?

Is it possible to have a branch for REVERIE?

Hello,

Thanks a lot for maintaining your open-source code!

As mentioned in #9, is it possible to make the models and code for REVERIE available? I would like to make a fair comparison with your approach.
