mrzihan / hnr-vln

Official implementation of Lookahead Exploration with Neural Radiance Representation for Continuous Vision-Language Navigation (CVPR'24 Highlight).


hnr-vln's Introduction

Lookahead Exploration with Neural Radiance Representation for Continuous Vision-Language Navigation

Zihan Wang, Xiangyang Li, Jiahao Yang, Yeqi Liu, Junjie Hu, Ming Jiang and Shuqiang Jiang

Vision-and-language navigation (VLN) enables an agent to navigate to a remote location in 3D environments by following natural language instructions. At each navigation step, the agent selects from possible candidate locations and then makes the move. For better navigation planning, the lookahead exploration strategy aims to effectively evaluate the agent's next action by accurately anticipating the future environment of candidate locations. To this end, some existing works predict RGB images for future environments, but this strategy suffers from image distortion and high computational cost. To address these issues, we propose the pre-trained hierarchical neural radiance representation model (HNR), which produces multi-level semantic features for future environments that are more robust and efficient than pixel-wise RGB reconstruction. Furthermore, with the predicted future environmental representations, our lookahead VLN model can construct the navigable future path tree and select the optimal path branch via efficient parallel evaluation. Extensive experiments on the VLN-CE datasets confirm the effectiveness of our method.


TODOs

  • Release the pre-training code of the Hierarchical Neural Radiance Representation Model.
  • Release the checkpoints of the Hierarchical Neural Radiance Representation Model.
  • Tidy the pre-training code for easy execution.
  • Release the fine-tuning code of the Lookahead VLN Model.
  • Release the checkpoints of the Lookahead VLN Model.

Issues

For training speed, see Issue #7.

To load only a few scenes for efficient debugging, see Issue #4.

Requirements

  1. Install Habitat simulator: follow instructions from ETPNav and VLN-CE.
  2. Download the Habitat-Matterport 3D Research Dataset (HM3D) from habitat-matterport-3dresearch:
    hm3d-train-habitat-v0.2.tar
    hm3d-val-habitat-v0.2.tar
    
  3. Download annotations (PointNav, VLN-CE) and trained models from Baidu Netdisk or TeraBox.
  4. Download pre-trained waypoint predictor from link.
  5. Install torch_kdtree for K-nearest feature search from torch_kdtree (a usage sketch follows this list).
    git clone https://github.com/thomgrand/torch_kdtree
    cd torch_kdtree
    git submodule init
    git submodule update
    pip3 install .
    
  6. Install tinycudann for faster multi-layer perceptrons (MLPs) from tiny-cuda-nn (see the MLP sketch after this list).
    pip3 install git+https://github.com/NVlabs/tiny-cuda-nn/#subdirectory=bindings/torch
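
A minimal usage sketch of torch_kdtree for K-nearest feature search, referenced from step 5. The build_kd_tree/query calls follow that repository's README; the tensor shapes and variable names are illustrative and not taken from this codebase, so check the torch_kdtree documentation if the signatures differ.

    import torch
    from torch_kdtree import build_kd_tree

    # Reference points (e.g. positions at which features are stored) and query points on the GPU.
    points_ref = torch.rand(10000, 3, device="cuda")
    points_query = torch.rand(256, 3, device="cuda")

    # Build the GPU KD-tree once, then search the K nearest neighbours of each query point.
    kdtree = build_kd_tree(points_ref)
    dists, inds = kdtree.query(points_query, nr_nns_searches=8)  # both of shape (256, 8)

    # inds can then be used to gather the K nearest stored feature vectors for each query.

And a minimal sketch of a fully fused MLP built with tiny-cuda-nn's PyTorch bindings, referenced from step 6. The layer sizes and feature dimensions here are illustrative, not the ones used by the HNR model.

    import torch
    import tinycudann as tcnn

    # A small fully fused MLP: 32-dim input features -> 16-dim output features.
    mlp = tcnn.Network(
        n_input_dims=32,
        n_output_dims=16,
        network_config={
            "otype": "FullyFusedMLP",
            "activation": "ReLU",
            "output_activation": "None",
            "n_neurons": 64,       # FullyFusedMLP supports 16/32/64/128 neurons per layer
            "n_hidden_layers": 2,
        },
    )

    x = torch.rand(4096, 32, device="cuda")
    y = mlp(x)  # shape (4096, 16), computed by the fused CUDA kernels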
    

Pre-train the HNR model

bash run_r2r/nerf.bash train 2345

Evaluate the HNR model

Evaluate the cosine similarity between the HNR model's predicted features and the CLIP model's ground truth features.

bash run_r2r/nerf.bash eval 2345
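
For reference, a minimal sketch of the cosine-similarity measurement itself, assuming the predicted and ground-truth features have already been extracted; the tensor names and the feature dimension below are illustrative, not the ones hard-coded in the evaluation script.

    import torch
    import torch.nn.functional as F

    # predicted: region features produced by the HNR model for future views.
    # target:    CLIP features extracted from the corresponding ground-truth views.
    predicted = torch.rand(100, 768)
    target = torch.rand(100, 768)

    cos_sim = F.cosine_similarity(predicted, target, dim=-1)  # one score per view
    print(f"mean cosine similarity: {cos_sim.mean().item():.4f}")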

Set Visualization to True in line 68 of HNR-VLN/NeRF/ss_trainer_ETP.py to visualize and save the images predicted by the HNR model.

Citation

@InProceedings{Wang_lookahead,
    author    = {Wang, Zihan and Li, Xiangyang and Yang, Jiahao and Liu, Yeqi and Hu, Junjie and Jiang, Ming and Jiang, Shuqiang},
    title     = {Lookahead Exploration with Neural Radiance Representation for Continuous Vision-Language Navigation},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024},
    pages     = {13753-13762}
}

Acknowledgments

Our code is based on ETPNav, nerf-pytorch and torch_kdtree. Thanks for their great work!

hnr-vln's Issues

Code Release Time

Hi, thanks for this fantastic work. I'm very interested in the source code and its release date; I wonder when it will be available?
Thanks!

computer server problem

When I try to build the environment on my server, I run into the problem below while installing the Python packages (tinycudann, torch_kdtree, pyliblzfse). I have worked on this problem for about a week and tried all the methods proposed online, but they do not work. Did you meet this problem? Thanks.

NUM_ENVIRONMENTS setting

This is really nice work! But there are some questions I want to ask.
I found that when I set NUM_ENVIRONMENTS > 1, the code fails to run. Is that because NUM_ENVIRONMENTS can only be 1 in this method? And by the way, how much time does the training procedure take?

MLP training problem

Hi! I wonder whether the fine-tuning stage of the MLP for volume rendering happens before task execution, or whether it is fine-tuned online while navigating. In other words, does the agent map and navigate simultaneously, or is the MLP fine-tuned in a specific scene before the language navigation task in that scene?

conda environment problem

Dear author,
Thanks for your great work. When I was creating a conda environment following requirements.txt, I found that some packages are pinned with @filepath/ instead of a version. Could you provide a file without the file paths? Thanks so much.

ConnectionResetError: [Errno 104] Connection reset by peer in multiprocessing/connection.py (VectorEnv.__del__)

Dear author,
When I try to run "run_nerf.py", it shows the following problems:

    File "/home/user/dxf/HNR-VLN-main1/habitat-lab/habitat/core/embodied_task.py", line 275, in _init_entities
      task=self,
    File "/home/user/dxf/HNR-VLN-main1/habitat-lab/habitat/tasks/vln/vln.py", line 60, in __init__
      self.observation_space = spaces.Discrete(0)
    File "/home/user/anaconda3/envs/hvln37/lib/python3.7/site-packages/gym/spaces/discrete.py", line 36, in __init__
      assert n > 0, "n (counts) have to be positive"
    AssertionError: n (counts) have to be positive

and

    buf = self.recv_bytes()
    File "/home/user/anaconda3/envs/hvln37/lib/python3.7/multiprocessing/connection.py", line 216, in recv_bytes
      buf = self._recv_bytes(maxlength)
    File "/home/user/anaconda3/envs/hvln37/lib/python3.7/multiprocessing/connection.py", line 407, in _recv_bytes
      buf = self._recv(4)
    File "/home/user/anaconda3/envs/hvln37/lib/python3.7/multiprocessing/connection.py", line 379, in _recv
      chunk = read(handle, remaining)
    ConnectionResetError: [Errno 104] Connection reset by peer
    Exception ignored in: <function VectorEnv.__del__ at 0x7f11486e6200>
    Traceback (most recent call last):
      File "/home/user/dxf/HNR-VLN-main1/habitat-lab/habitat/core/vector_env.py", line 588, in __del__
        self.close()

Do you have any suggestions?

Missing module 'habitat_extensions.discrete_planner'

Hi,
When I try to run this code, I get the following problem:

    from habitat_extensions.discrete_planner import DiscretePathPlanner
    ModuleNotFoundError: No module named 'habitat_extensions.discrete_planner'

Could you provide this file?
