
yiconghong / discrete-continuous-vln


Code and Data of the CVPR 2022 paper: Bridging the Gap Between Learning in Discrete and Continuous Environments for Vision-and-Language Navigation

License: MIT License

Languages: Python 99.29%, Shell 0.71%
Topics: computer-vision, cvpr2022, deep-learning, embodied-ai, vision-and-language, vision-and-language-navigation, visual-navigation

discrete-continuous-vln's Issues

Question about the imitation learning strategy in the paper

Hi Yicong,

I noticed that the imitation learning loss used in the code base is essentially the cross-entropy loss between the predicted action and the oracle action, where the oracle action is obtained by selecting the candidate waypoint closest to the goal. However, this oracle action might not be optimal, because the closest waypoint is sometimes not on the ground-truth path (the reference path in the dataset), as in the following screenshot:

[Screenshot, 2024-03-07]

Such an oracle is likely to cause the agent to loop around the area.
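To make the concern concrete, here is a minimal sketch of the situation described above. It is not taken from the repository: the function names, the toy coordinates, and the use of straight-line (Euclidean) distance are all assumptions for illustration, and the actual code base may measure distance differently. The sketch picks the oracle as the candidate closest to the goal, computes the cross-entropy against it, and separately measures how far each candidate lies from the reference path.

```python
import numpy as np


def closest_to_goal_oracle(candidates, goal):
    """Index of the candidate with the smallest straight-line distance to the goal.

    Assumption: Euclidean distance stands in for whatever distance the
    real oracle uses.
    """
    dists = np.linalg.norm(candidates - goal, axis=1)
    return int(np.argmin(dists))


def distance_to_path(point, path):
    """Shortest distance from a point to a polyline given as an (N, 2) array."""
    best = np.inf
    for a, b in zip(path[:-1], path[1:]):
        ab = b - a
        denom = float(ab @ ab)
        # Project the point onto the segment and clamp to its endpoints.
        t = 0.0 if denom == 0.0 else float(np.clip((point - a) @ ab / denom, 0.0, 1.0))
        best = min(best, float(np.linalg.norm(point - (a + t * ab))))
    return best


def cross_entropy(logits, target):
    """Cross-entropy of one categorical prediction against a target index."""
    shifted = logits - logits.max()  # subtract the max for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return float(-log_probs[target])


# Hypothetical layout: the reference path detours around an obstacle,
# while the goal is close to the agent in a straight line.
reference_path = np.array([[0.0, 0.0], [2.0, 0.0], [2.0, 4.0], [0.0, 4.0]])
goal = reference_path[-1]
candidates = np.array([
    [0.0, 1.0],  # candidate 0: heads straight toward the goal, off the path
    [1.0, 0.0],  # candidate 1: lies on the reference path
])

oracle = closest_to_goal_oracle(candidates, goal)
off_path = [distance_to_path(c, reference_path) for c in candidates]
loss = cross_entropy(np.array([0.2, 1.5]), oracle)  # made-up agent logits

print("oracle index:", oracle)
print("distance of each candidate to the reference path:", off_path)
print("imitation loss against the oracle:", loss)
```

In this toy layout the closest-to-goal oracle selects candidate 0, which sits off the reference path, while candidate 1 lies exactly on it, so the loss rewards the agent for choosing the off-path waypoint.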

As the waypoint predictor shows very good results, I wonder if you can comment on how the waypoint predictor manages to avoid the above issue.

Many thanks!
Andy

Question about the training strategy

Hi Yicong, thanks for releasing the code of the Discrete-to-Continuous work. I have been reading papers in the VLN field, including recent work on VLN-CE, and one detail has confused me for a long time. In discrete VLN, almost all recent works adopt a mixed IL + RL training strategy for better performance. However, most later works in VLN-CE, including your Discrete-to-Continuous work, instead use a simpler IL training scheme without any RL. I wonder why researchers gave up the effective IL + RL strategy. Is it just a conventional choice following the first VLN-CE work, or are there other reasons? I would really appreciate it if you could share your thoughts.

Question about the action space in the Habitat setting

I have been reading your paper lately and looking at the code to try to reproduce the experiments. In ss_trainer_VLNBERT.py, either action 0 (stop) or action 4 (HIGHTOLOW) is executed through env.step at the end of each step. Can you explain why? Can you also tell me what the HIGHTOLOW action does? Does it lower the viewpoint from the top down?

Debug Problem

Thanks for releasing the code of the dcvln; it is very interesting work. I followed the instructions to run the source code, and it works perfectly. But when I try to debug the code with PyCharm 2018 Pro, it gets stuck at the following call in ss_trainer_CMA.py, line 361, which is very weird:

    self.envs = construct_envs(
        self.config,
        get_env_class(self.config.ENV_NAME),
        episodes_allowed=episode_ids,
        auto_reset_done=False
    )

Specifically, when I use Debug in the PyCharm IDE and run the code step by step, it always hangs on this line with no response at all.
My guess is that the problem is related to multiprocessing; maybe a subprocess is waiting for interaction or something similar. I stepped inside and found that habitat.VectorEnv() does not respond. Could this be a problem with our computer or IDE? Does debug mode work well on your side?
I'm looking forward to your reply!
