
yiconghong / recurrent-vln-bert


Code of the CVPR 2021 Oral paper: A Recurrent Vision-and-Language BERT for Navigation

License: Other

Python 99.16% Shell 0.84%
vision-and-language-navigation cvpr2021 cvpr-oral transformer bert pre-trained-model vision-and-language


recurrent-vln-bert's Issues

Mismatch between weights?

Hi there,

Congratulations on your CVPR paper and on releasing your code. I was wondering whether you could clarify the structure of the checkpoints you released. I'm interested in the OSCAR version of your model and tried to load it; however, it looks like the following parameters cannot be found:

'img_projection.weight', 'img_projection.bias'

I inspected the VLNBert class in the file vlnbert_OSCAR.py and it looks like there is no module called img_projection; instead, there seems to be one in the vlnbert_PREVALENT.py file. In addition, even in the original OSCAR codebase I cannot find any mention of an img_projection layer (https://github.com/microsoft/Oscar/blob/master/oscar/modeling/modeling_bert.py). Could you please verify that the released model checkpoints are correct and correspond to the intended models?
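For reference, a minimal way to surface the mismatch, assuming the released file is (or wraps) a plain PyTorch state dict, is to list the offending keys directly:

import torch

ckpt = torch.load('best_val_unseen', map_location='cpu')  # path to the downloaded OSCAR checkpoint
# The wrapper keys below are an assumption; skip this line if the file is already a flat state dict.
state_dict = ckpt['vln_bert']['state_dict'] if 'vln_bert' in ckpt else ckpt

print([k for k in state_dict if 'img_projection' in k])
# -> ['img_projection.weight', 'img_projection.bias'] would confirm that the checkpoint carries
#    a projection layer that the VLNBert class in vlnbert_OSCAR.py does not define.
# Alternatively, model.load_state_dict(state_dict, strict=False) reports missing and unexpected
# keys instead of raising.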

Thanks,
Alessandro

How to download panoramic images?

Thank you very much for your excellent work!

Could you please share how to download panoramic images and obtain visualizations for navigation?

I really appreciate your help!

Why don't you use the 'speaker' during training?

Hi! I don't see any code related to the 'speaker', a useful way to do data augmentation for R2R. I am wondering why you removed the speaker part from your code. Or have you run experiments showing that using a speaker doesn't work well with your method?
Thanks a lot!

Unable to test code

Hello Yicong,

Can you please add a section to the README about using the Matterport3DSimulator Docker image with your code? The documentation is missing details on where to put the ResNet zip, the Prevalent JSON, and the PyTorch model, and it is unclear how Matterport3DSimulator works with your code.

Thanks

A request for the REVERIE code

Hi Yicong. I want to cite your work on REVERIE. Could you please provide the code and features? I have sent you an e-mail. Thanks a lot!

Why split instructions?

Hi Yicong,

Thanks for open-sourcing your code!

I wonder why you split instructions in r2r_src/env.py, lines 129 to 142:

# Split multiple instructions into separate entries
for j, instr in enumerate(item['instructions']):
    try:
        new_item = dict(item)
        new_item['instr_id'] = '%s_%d' % (item['path_id'], j)
        new_item['instructions'] = instr

        ''' BERT tokenizer '''
        instr_tokens = tokenizer.tokenize(instr)
        padded_instr_tokens, num_words = pad_instr_tokens(instr_tokens, args.maxInput)
        new_item['instr_encoding'] = tokenizer.convert_tokens_to_ids(padded_instr_tokens)

        if new_item['instr_encoding'] is not None:  # Filter the wrong data
            self.data.append(new_item)
            scans.append(item['scan'])
    except:
        continue

This is done for the original path-instruction data but not for prevalent_aug.json, and I wonder why. I understand that the instructions in the original data are a bit long, but if you split them into separate VLN episodes while the desired path is always the complete path, how can an agent (or a human) possibly do that?
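For reference, each entry in the original R2R annotation files looks roughly like this (field names from R2R_train.json; the values are only illustrative), which is what the loop above iterates over:

item = {
    'path_id': 4332,                        # illustrative id
    'scan': '17DRP5sb8fy',
    'heading': 4.055,
    'path': ['viewpoint_id_1', 'viewpoint_id_2', 'viewpoint_id_3'],   # the complete trajectory
    'instructions': [
        'Walk past the kitchen and stop at the sofa.',        # three independent instructions,
        'Head straight, then wait next to the couch.',        # each written for the whole path
        'Go down the hallway and stop by the seating area.',
    ],
}
# After the loop, this becomes three entries with instr_id '4332_0', '4332_1' and '4332_2',
# each paired with a single instruction string and the same full path.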

Best,
Jason

How to reduce the training time?

Hi @YicongHong,
Thank you so much for the great work. I reproduced Recurrent VLN-BERT on the R2R and REVERIE datasets following the README. During training, it takes about 2,500 minutes on a single GPU to run 200,000 iterations, and even with two GPUs this time does not drop at all, which confuses me. Did training also take this long for you? Why does adding GPUs not increase the speed, and what limits it? Is there any way to speed up training other than reducing the number of iterations?

A Request for Code of REVERIE

Hi Yicong, I am interested in citing your work on REVERIE and would appreciate it if you could share the code and features with me. I have already sent an email to you regarding this matter. Thank you very much for your assistance!

Could you provide the object features for the REVERIE task?

Hi Yicong, I'm very impressed by the Recurrent VLN-BERT work and want to reproduce it for the REVERIE task. But I noticed that the current repo only seems to provide the related files for the R2R task, e.g., the view features. So I'm opening this issue to ask whether you could release the object features (from Faster R-CNN) used in the REVERIE task. Thanks in advance!

REVERIE

How do you train on the REVERIE dataset?

The vocab size

Hi Yicong,

Thanks for your great work!
I found that the vocab size of R2R is 991, but the vocab size of the Prevalent augmented data is 1101. Additionally, the Prevalent instructions are generated by a speaker model trained on the R2R dataset. Do you have any idea about this?

Thanks,

ModuleNotFoundError: No module named 'transformers.pytorch_transformers'

Hello, I was trying to run the model with bash run/test_agent.bash as instructed in your README, but I get the following error:

Optimizer: Using AdamW
To use data.metrics please install scikit-learn. See https://scikit-learn.org/stable/index.html
Traceback (most recent call last):
  File "r2r_src/train.py", line 13, in <module>
    from agent import Seq2SeqAgent
  File "/Recurrent-VLN-BERT/r2r_src/agent.py", line 21, in <module>
    import model_OSCAR, model_PREVALENT
  File "/Recurrent-VLN-BERT/r2r_src/model_OSCAR.py", line 7, in <module>
    from vlnbert.vlnbert_init import get_vlnbert_models
  File "/Recurrent-VLN-BERT/r2r_src/vlnbert/vlnbert_init.py", line 3, in <module>
    from transformers.pytorch_transformers import (BertConfig, BertTokenizer)
ModuleNotFoundError: No module named 'transformers.pytorch_transformers'

I have transformers and pytorch-transformers installed, as well as the old pytorch-pretrained-bert, and I am unsure what is causing this. Any help? Thanks in advance.
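For what it's worth, one workaround, assuming the problem is only the import path and that the standalone pytorch-transformers package is installed (pip install pytorch-transformers), is to change the import at line 3 of r2r_src/vlnbert/vlnbert_init.py:

# r2r_src/vlnbert/vlnbert_init.py, line 3 -- a possible workaround, not necessarily the intended setup
# from transformers.pytorch_transformers import (BertConfig, BertTokenizer)
from pytorch_transformers import BertConfig, BertTokenizer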

Details about the no init. OSCAR model

Hi Yicong, I wonder how you initialize the "no init." OSCAR model to get the results reported in the paper. Did you initialize all the parameters randomly, or did you use some pretrained weights, e.g., initializing the language part with BERT pretrained weights?

Version of the Matterport3D Simulator

Hi, Yicong

When I try to configure the environment for Recurrent-VLN-BERT, I find that only the old version of the Matterport3D Simulator supports this code, because the new version has changed its API (e.g., sim.makeAction) to support parallel (batched) navigation. Maybe you could note this in the README.
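For illustration, the API difference looks roughly like this (a sketch from memory of the two simulator versions, not an exact reproduction of either; the episode values are placeholders):

import MatterSim

scan_id, viewpoint_id = '17DRP5sb8fy', 'some_viewpoint_id'   # placeholder episode parameters
heading, elevation = 0.0, 0.0

sim = MatterSim.Simulator()
# ... configure camera/rendering options and initialise the simulator here ...

# Old-style API, which this codebase expects: scalar arguments, a single state back
sim.newEpisode(scan_id, viewpoint_id, heading, elevation)
sim.makeAction(0, 1.0, 0.0)        # (navigable-location index, heading change, elevation change)
state = sim.getState()

# Newer (current master) API: calls are batched, so the arguments are lists and
# getState() returns a list of states
sim.newEpisode([scan_id], [viewpoint_id], [heading], [elevation])
sim.makeAction([0], [1.0], [0.0])
state = sim.getState()[0]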

Specify license for the code

Hello,

Thanks again for your codebase; it was very useful indeed, and congratulations on your accepted paper. I was wondering whether you could please add a license to your codebase so that it's clear how this code can be used by third parties.

Thanks,
Alessandro

The data file R2R_test.json wasn't used when testing?

Hi Yicong! I have reproduced this codebase. When I ran run/test_agent.bash, I noticed that the data file R2R_test.json wasn't used in the test. So I set the parameter 'submit' to 1 and rewrote the file 'id_paths.json' to run the test without any other changes, and I got the following results.

Optimizer: Using AdamW
Namespace(IMAGENET_FEATURES='img_features/ResNet-152-imagenet.tsv', angle_feat_size=128, aug=None, batchSize=16, description='VLNBERT-test-Prevalent', dropout=0.5, epsilon=0.1, featdropout=0.4, feature_size=2048, features='places365', feedback='sample', gamma=0.9, ignoreid=-100, iters=300000, load='snap/VLNBERT-PREVALENT-final/state_dict/best_val_unseen', loadOptim=False, log_dir='snap/VLNBERT-test-Prevalent', lr=1e-05, maxAction=15, maxInput=80, ml_weight=0.2, name='VLNBERT-test-Prevalent', normalize_loss='total', optim='adamW', optimizer=<class 'torch.optim.adamw.AdamW'>, submit=1, teacher='final', teacher_weight=1.0, test_only=0, train='validlistener', vlnbert='prevalent', weight_decay=0.0, zero_init=False)

Start loading the image feature ... (~50 seconds)
Finish Loading the image feature from img_features/ResNet-152-places365.tsv in 54.7334 seconds
The feature size is 2048
Loading navigation graphs for 61 scans
R2RBatch loaded with 14039 instructions, using splits: train
The feature size is 2048
Loading navigation graphs for 59 scans
R2RBatch loaded with 1501 instructions, using splits: val_train_seen
The feature size is 2048
Loading navigation graphs for 56 scans
R2RBatch loaded with 1021 instructions, using splits: val_seen
The feature size is 2048
Loading navigation graphs for 11 scans
R2RBatch loaded with 2349 instructions, using splits: val_unseen
The feature size is 2048
Loading navigation graphs for 18 scans
R2RBatch loaded with 4173 instructions, using splits: test

Initalizing the VLN-BERT model ...
Loaded the listener model at iter 114000 from snap/VLNBERT-PREVALENT-final/state_dict/best_val_unseen
result length 1501
Env name: val_train_seen, nav_error: 0.8354, oracle_error: 0.6634, steps: 5.1845, lengths: 10.0276, success_rate: 0.9394, oracle_rate: 0.9520, spl: 0.9124
result length 1021
Env name: val_seen, nav_error: 2.8968, oracle_error: 1.9405, steps: 5.5436, lengths: 11.1379, success_rate: 0.7228, oracle_rate: 0.7826, spl: 0.6775
result length 2349
Env name: val_unseen, nav_error: 3.9255, oracle_error: 2.5431, steps: 6.1243, lengths: 12.0028, success_rate: 0.6279, oracle_rate: 0.7024, spl: 0.5688
result length 4173
Env name: test, nav_error: 9.0420, oracle_error: 0.0000, steps: 6.1107, lengths: 12.3490, success_rate: 0.0357, oracle_rate: 1.0000, spl: 0.0000

I am really shocked by the results on the test data. Did I make a mistake somewhere, and if so, where?

Failed to build Matterport3D Simulator

Hi Yicong,

This is not directly related to your code, but I've spent hours trying to follow the Matterport3DSimulator repo to build it, and I ran into issues both with and without Docker.

With Docker, MatterSim can be built, but it is only available for the system Python; since I use Anaconda on the lab server, importing MatterSim fails in my Anaconda environment.

Without Docker, the build failed: line 59 of src/lib/NavGraph.cpp errors because CV_LOAD_IMAGE_ANYDEPTH is not defined in scope. I only downloaded matterport_skybox_images, and this might be the problem (although the README in Matterport3DSimulator says matterport_skybox_images is all you need to build and run the simulator). What data did you download from the Matterport3D dataset?

Best,
Jason

R2R Test Unseen

Thank you so much for the great work. I reproduced Recurrent VLN-BERT on the R2R dataset following the README, but I only get results for the validation sets. How can I get the results on Test Unseen?

Is it possible to have a branch for REVERIE?

Hello,

Thanks a lot for maintaining your open-source code!

As mentioned in #9, is it possible to make the models and code for REVERIE available? I would like to make a fair comparison with your approach.
