yiconghong / recurrent-vln-bert
Code of the CVPR 2021 Oral paper: A Recurrent Vision-and-Language BERT for Navigation
License: Other
Hi there,
Congratulations on your CVPR paper and on releasing your code. I was wondering whether you could clarify the structure of the checkpoints you released. I'm interested in the OSCAR version of your model and tried to load it. However, it looks like the following parameters cannot be found:
'img_projection.weight', 'img_projection.bias'
I inspected the VLNBert class in the file vlnbert_OSCAR.py, and it does not seem to contain a module called img_projection. Instead, there seems to be one in the vlnbert_PREVALENT.py file. In addition, even in the original OSCAR codebase I cannot find any mention of an img_projection layer (https://github.com/microsoft/Oscar/blob/master/oscar/modeling/modeling_bert.py). Could you please verify that the released model checkpoints are correct and refer to the correct models?
Thanks,
Alessandro
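One way to see exactly which parameter names disagree is to compare the model's keys with the checkpoint's keys before loading. The following is a minimal sketch in plain Python: `diff_keys` is a hypothetical helper and the key names are toy values. In PyTorch the two inputs would typically come from `model.state_dict().keys()` and from the dictionary returned by `torch.load(...)`; how the released checkpoint file is laid out is not confirmed here.

```python
def diff_keys(model_keys, checkpoint_keys):
    """Return (missing, unexpected) parameter names, each sorted.

    missing:    names the model defines but the checkpoint lacks
    unexpected: names the checkpoint contains but the model lacks
    """
    model_keys, checkpoint_keys = set(model_keys), set(checkpoint_keys)
    missing = sorted(model_keys - checkpoint_keys)
    unexpected = sorted(checkpoint_keys - model_keys)
    return missing, unexpected


# Toy key names for illustration only (not taken from the real checkpoint).
model_keys = ["bert.embeddings.word_embeddings.weight", "action_head.weight"]
checkpoint_keys = model_keys + ["img_projection.weight", "img_projection.bias"]

missing, unexpected = diff_keys(model_keys, checkpoint_keys)
print("missing:", missing)
print("unexpected:", unexpected)
```

A non-empty result on either side narrows the question down to whether the checkpoint or the model class is the mismatched half.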
Thank you very much for your excellent work!
Could you please share how to download panoramic images and obtain visualizations for navigation?
I really appreciate your help!
#15 Hi Yicong, I have the same request. Could you send me the download link? My email is [email protected]. Thanks!
Hi! I don't see any code for the 'speaker', which is a useful way to do data augmentation for R2R. I am wondering why you removed the speaker part from your code. Or have you run experiments showing that the speaker doesn't work well with your method?
Thanks a lot!
Hello Yicong,
Can you please add a section to the README about using the Matterport3DSimulator Docker image with your code? The documentation is missing details on where to put the ResNet zip, the PREVALENT JSON, and the PyTorch model. It is unclear how Matterport3DSimulator works with your code.
Thanks
Hi Yicong. I want to cite your work on REVERIE. Could you please provide the code and features? I have sent you an e-mail. Thanks a lot!
Hi Yicong,
Thanks for open-sourcing your code!
I wonder why you split the instructions in /r2r_src/env.py, lines 129 to 142:
    # Split multiple instructions into separate entries
    for j, instr in enumerate(item['instructions']):
        try:
            new_item = dict(item)
            new_item['instr_id'] = '%s_%d' % (item['path_id'], j)
            new_item['instructions'] = instr
            ''' BERT tokenizer '''
            instr_tokens = tokenizer.tokenize(instr)
            padded_instr_tokens, num_words = pad_instr_tokens(instr_tokens, args.maxInput)
            new_item['instr_encoding'] = tokenizer.convert_tokens_to_ids(padded_instr_tokens)
            if new_item['instr_encoding'] is not None: # Filter the wrong data
                self.data.append(new_item)
                scans.append(item['scan'])
        except:
            continue
This is done for the original path-instruction data but not for prevalent_aug.json. I wonder why you do this. I understand that the instructions in the original data are a bit long, but if you split them into separate VLN jobs while the desired path is always the complete path, how can an agent (or a human) possibly follow them?
Best,
Jason
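To make the effect of the quoted loop concrete, here is a minimal sketch with the tokenizer step left out. `split_instructions` and the toy item are hypothetical. The loop emits one entry per element of `instructions`, and every entry keeps the item's full `path`. If, as in the public R2R annotations, each element is a complete instruction for the whole path rather than a fragment of one long instruction, then each resulting entry is a self-contained navigation task.

```python
def split_instructions(item):
    """Turn one item with several instructions into one entry per instruction."""
    entries = []
    for j, instr in enumerate(item["instructions"]):
        new_item = dict(item)  # shallow copy: 'path' and 'scan' are carried over
        new_item["instr_id"] = "%s_%d" % (item["path_id"], j)
        new_item["instructions"] = instr  # a single string replaces the list
        entries.append(new_item)
    return entries


# Toy item for illustration only.
item = {
    "path_id": 42,
    "scan": "scanA",
    "path": ["v1", "v2", "v3"],
    "instructions": [
        "Walk past the sofa and stop at the stairs.",
        "Go by the couch, then wait near the staircase.",
        "Pass the sofa and stop by the steps.",
    ],
}

entries = split_instructions(item)
for e in entries:
    print(e["instr_id"], e["path"], e["instructions"])
```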
I have reproduced this codebase. I am writing to request the model and code for the REVERIE branch, which I would like to use as a baseline. I have also sent you an email.
Looking forward to your reply! Thanks in advance!
Hi @YicongHong,
Thank you so much for your great work. I reproduced Recurrent VLN-BERT on the R2R and REVERIE datasets following readme.md. During training, it takes about 2,500 minutes for a single GPU to run 200,000 iterations. Even with two GPUs, this time does not drop at all, which confuses me. Did training take this long for you as well? Why does adding GPUs not increase the speed? What limits the training speed? Is there any way to speed it up other than reducing the number of iterations?
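For reference, the reported figures work out to 0.75 seconds per iteration. The sketch below does that arithmetic and shows one way to measure where the time goes inside an iteration. `PhaseTimer` is a hypothetical helper, and the phase names and workloads are placeholders. Which phase actually dominates in this codebase is not known from the thread.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


def seconds_per_iteration(total_minutes, iterations):
    """Average wall-clock seconds spent per training iteration."""
    return total_minutes * 60.0 / iterations


class PhaseTimer:
    """Accumulate wall-clock time per named phase of the training loop."""

    def __init__(self):
        self.totals = defaultdict(float)

    @contextmanager
    def phase(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            self.totals[name] += time.perf_counter() - start


print(seconds_per_iteration(2500, 200_000))  # 0.75

timer = PhaseTimer()
for _ in range(3):
    with timer.phase("environment"):
        sum(range(10_000))  # placeholder for simulator steps and feature lookup
    with timer.phase("model"):
        sum(range(1_000))   # placeholder for the forward and backward pass
print(dict(timer.totals))
```

If most of the time turned out to sit outside the model phase, adding GPUs would not be expected to help much, but that would need to be measured rather than assumed.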
Hi Yicong, I am interested in citing your work on REVERIE and would appreciate it if you could share the code and features with me. I have already sent an email to you regarding this matter. Thank you very much for your assistance!
Hi Yicong, I'm very impressed by the Recurrent VLN-BERT work and want to reproduce it for the REVERIE task. But I noticed that the current repo seems to provide files only for the R2R task, e.g. the view features. So I am opening this issue to ask whether you could release the object features (from Faster R-CNN) used in the REVERIE task. Thanks in advance!
How do you train on the REVERIE dataset?
Hi Yicong,
Thanks for your great work!
I found that the vocab size of R2R is 991, but the vocab size of the PREVALENT augmented data is 1101, even though the PREVALENT instructions are generated by a speaker model trained on the R2R dataset. Do you have any idea why?
Thanks,
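A quick way to investigate a vocabulary mismatch like this is to list the words that occur in one set of instructions but not in the other. The sketch below uses a simple lowercase regex tokenizer, which is an assumption and not necessarily the tokenization behind the 991 and 1101 counts. The sentences are toy data.

```python
import re


def vocab(instructions):
    """Collect the set of lowercase alphanumeric tokens in the instructions."""
    words = set()
    for sentence in instructions:
        words.update(re.findall(r"[a-z0-9]+", sentence.lower()))
    return words


# Toy instructions for illustration only.
r2r_instructions = ["Walk past the sofa.", "Turn left at the stairs."]
aug_instructions = ["Walk past the couch.", "Turn left at the staircase."]

extra = sorted(vocab(aug_instructions) - vocab(r2r_instructions))
print(len(vocab(r2r_instructions)), len(vocab(aug_instructions)), extra)
```

Looking at the extra words directly can show whether they are genuinely new terms or tokenization differences such as punctuation or casing.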
I cannot download the augmented data.
Hi Yicong,
Following https://github.com/peteanderson80/Matterport3DSimulator/blob/master/tasks/R2R/data/download.sh does not produce R2R_val_train_seen.json. Where can I get it?
Best,
Jason
Hello, I was trying to run the model with bash run/test_agent.bash as instructed in your README, but I get this error:
    Optimizer: Using AdamW
    To use data.metrics please install scikit-learn. See https://scikit-learn.org/stable/index.html
    Traceback (most recent call last):
      File "r2r_src/train.py", line 13, in <module>
        from agent import Seq2SeqAgent
      File "/Recurrent-VLN-BERT/r2r_src/agent.py", line 21, in <module>
        import model_OSCAR, model_PREVALENT
      File "/Recurrent-VLN-BERT/r2r_src/model_OSCAR.py", line 7, in <module>
        from vlnbert.vlnbert_init import get_vlnbert_models
      File "/Recurrent-VLN-BERT/r2r_src/vlnbert/vlnbert_init.py", line 3, in <module>
        from transformers.pytorch_transformers import (BertConfig, BertTokenizer)
    ModuleNotFoundError: No module named 'transformers.pytorch_transformers'
I have transformers and pytorch-transformers installed, as well as the old pytorch-pretrained-bert, and I am unsure what is causing this. Any help? Thanks in advance.
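The import in the traceback only resolves if a top-level `transformers` package contains a `pytorch_transformers` subpackage. As far as I know, the pip packages `transformers` and `pytorch-transformers` each install a separate top-level module (`transformers` and `pytorch_transformers`), so having both installed would not by itself create that nested path. The sketch below checks which candidate module names are importable in the current environment. `first_importable` is a hypothetical helper, and whether the repository expects a particular source checkout is an assumption to verify against its README.

```python
import importlib.util


def first_importable(candidates):
    """Return the first module name that can be located, or None."""
    for name in candidates:
        try:
            if importlib.util.find_spec(name) is not None:
                return name
        except (ImportError, AttributeError, ValueError):
            # Raised when a parent package is missing or is not a package.
            continue
    return None


candidates = [
    "transformers.pytorch_transformers",  # the path the traceback asks for
    "pytorch_transformers",               # top-level module of pytorch-transformers
    "transformers",                       # top-level module of transformers
]
print(first_importable(candidates))
```

If only the second or third name resolves, the environment has the libraries but not under the nested path that the import statement uses.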
Hi Yicong, I wonder how you initialized the "no init." OSCAR model to get the results reported in the paper. Did you initialize all the parameters randomly, or did you use some pretrained weights, e.g., initialize the language part with pretrained BERT weights?
When I run on the R2R_test dataset, the nDTW computation raises a KeyError. So I guess id_paths.json doesn't contain the R2R_test data. Is that right?
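One way to confirm a guess like this is to list which path ids appear in the results but have no entry in the ground-truth file. The sketch below assumes that `id_paths.json` loads into a dictionary keyed by path-id strings, which is not confirmed here, and relies on the `'%s_%d' % (path_id, j)` format of `instr_id` used in env.py. Both helper names are hypothetical.

```python
def path_id_of(instr_id):
    """Strip the trailing instruction index from an id like '12_0'."""
    return instr_id.rsplit("_", 1)[0]


def missing_path_ids(instr_ids, id_paths):
    """Path ids referenced by instr_ids that have no ground-truth entry."""
    return sorted({path_id_of(i) for i in instr_ids} - set(id_paths))


# Toy data for illustration only; the real file would be read with json.load.
id_paths = {"12": ["v1", "v2", "v3"]}
instr_ids = ["12_0", "12_1", "99_0"]
print(missing_path_ids(instr_ids, id_paths))
```

An evaluation loop could skip nDTW for the ids returned here instead of failing on the lookup.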
Hi, Yicong
When I tried to set up the environment for Recurrent-VLN-BERT, I found that only the old version of the Matterport3D Simulator supports this code, because the new version has changed its API (for example, 'sim.makeAction') to support parallel navigation. Maybe you could add a note about this to the README.
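If the difference is that the older simulator takes one action per call while the newer one expects lists with one entry per parallel agent, a thin adapter might bridge the two. This is an assumption about the API change rather than something confirmed in this thread, so the sketch uses a stand-in simulator class instead of the real MatterSim bindings. `BatchedSimAdapter` and `FakeBatchedSim` are hypothetical names.

```python
class BatchedSimAdapter:
    """Offer a scalar-style makeAction on top of a list-based simulator."""

    def __init__(self, sim):
        self.sim = sim

    def makeAction(self, index, heading, elevation):
        # Wrap each scalar in a one-element list (assumed batched signature).
        self.sim.makeAction([index], [heading], [elevation])


class FakeBatchedSim:
    """Stand-in simulator so the sketch runs without MatterSim installed."""

    def __init__(self):
        self.calls = []

    def makeAction(self, indices, headings, elevations):
        self.calls.append((indices, headings, elevations))


fake = FakeBatchedSim()
BatchedSimAdapter(fake).makeAction(1, 0.5, 0.0)
print(fake.calls)
```

Other calls that changed between versions would need the same treatment, so pinning the simulator to the older version may be the simpler route.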
Hello,
Thanks again for your codebase. It has been very useful indeed, and congratulations on your accepted paper. I was wondering whether you could please add a license to your codebase so that it is clear how this code can be used by third parties.
Thanks,
Alessandro
Hi Yicong! I have reproduced this codebase. When I ran run/test_agent.bash, I noticed that the data file R2R_test.json wasn't used by the test. So I set the parameter 'submit' to 1 and rewrote the file 'id_paths.json' to run the test, without any other change, and I got the following results.
`Optimizer: Using AdamW
Namespace(IMAGENET_FEATURES='img_features/ResNet-152-imagenet.tsv', angle_feat_size=128, aug=None, batchSize=16, description='VLNBERT-test-Prevalent', dropout=0.5, epsilon=0.1, featdropout=0.4, feature_size=2048, features='places365', feedback='sample', gamma=0.9, ignoreid=-100, iters=300000, load='snap/VLNBERT-PREVALENT-final/state_dict/best_val_unseen', loadOptim=False, log_dir='snap/VLNBERT-test-Prevalent', lr=1e-05, maxAction=15, maxInput=80, ml_weight=0.2, name='VLNBERT-test-Prevalent', normalize_loss='total', optim='adamW', optimizer=<class 'torch.optim.adamw.AdamW'>, submit=1, teacher='final', teacher_weight=1.0, test_only=0, train='validlistener', vlnbert='prevalent', weight_decay=0.0, zero_init=False)
Start loading the image feature ... (~50 seconds)
Finish Loading the image feature from img_features/ResNet-152-places365.tsv in 54.7334 seconds
The feature size is 2048
Loading navigation graphs for 61 scans
R2RBatch loaded with 14039 instructions, using splits: train
The feature size is 2048
Loading navigation graphs for 59 scans
R2RBatch loaded with 1501 instructions, using splits: val_train_seen
The feature size is 2048
Loading navigation graphs for 56 scans
R2RBatch loaded with 1021 instructions, using splits: val_seen
The feature size is 2048
Loading navigation graphs for 11 scans
R2RBatch loaded with 2349 instructions, using splits: val_unseen
The feature size is 2048
Loading navigation graphs for 18 scans
R2RBatch loaded with 4173 instructions, using splits: test
Initalizing the VLN-BERT model ...
Loaded the listener model at iter 114000 from snap/VLNBERT-PREVALENT-final/state_dict/best_val_unseen
result length 1501
Env name: val_train_seen, nav_error: 0.8354, oracle_error: 0.6634, steps: 5.1845, lengths: 10.0276, success_rate: 0.9394, oracle_rate: 0.9520, spl: 0.9124
result length 1021
Env name: val_seen, nav_error: 2.8968, oracle_error: 1.9405, steps: 5.5436, lengths: 11.1379, success_rate: 0.7228, oracle_rate: 0.7826, spl: 0.6775
result length 2349
Env name: val_unseen, nav_error: 3.9255, oracle_error: 2.5431, steps: 6.1243, lengths: 12.0028, success_rate: 0.6279, oracle_rate: 0.7024, spl: 0.5688
result length 4173
Env name: test, nav_error: 9.0420, oracle_error: 0.0000, steps: 6.1107, lengths: 12.3490, success_rate: 0.0357, oracle_rate: 1.0000, spl: 0.0000`
I am really shocked by the results on the test data. Did I make a mistake somewhere? If so, where?
Hi Yicong,
This is not directly related to your code, but I've spent hours trying to build the simulator by following the Matterport3DSimulator repo, and I ran into issues both with and without Docker.
With Docker, MatterSim builds, but it is only available to the system Python. Since I use Anaconda on the lab server, importing MatterSim fails in my Anaconda environment.
Without Docker, the build fails with an error at line 59 of src/lib/NavGraph.cpp: CV_LOAD_IMAGE_ANYDEPTH is not defined in the scope. I only downloaded matterport_skybox_images, and this might be the problem (however, the readme.md in Matterport3DSimulator says matterport_skybox_images is what you need to get the simulator to build and work). Which data did you download from the Matterport3D dataset?
Best,
Jason
Thank you so much for your great work. I reproduced Recurrent VLN-BERT on the R2R dataset following readme.md.
However, I only get results for the validation sets. How can I get results on Test Unseen?
How can I see a visualization of the navigation results, as shown in the paper? Have you made any videos of the agent's navigation process?
Hello,
Thanks a lot for maintaining your open-source code!
As mentioned in #9, is it possible to have models and code available for REVERIE? I would like to have a fair comparison of your approach.