
Comments (5)

YicongHong commented on August 28, 2024

Hey Jackie,
I am not an expert in RL, but I would like to share some of my experience training VLN agents. Please correct me if I am wrong.

In discrete VLN, where the agent has a panoramic view and travels with high-level actions, it is easy to define and quantify rewards (e.g., progress-based or path-fidelity-based). I think RL in this case serves the learning of view selection. With IL as a stabilizer, the agent can explore and learn from mistakes in the early training stage while maintaining a very stable learning curve.

However, in the continuous setting there are two main concerns. (1) It is hard to define rewards for low-level actions: think about how to shape a reward when the agent turns 15 degrees, or about how we can only assign a very small reward when the agent moves forward a very short distance -- which is a very weak learning signal. (2) Computational cost: running in continuous environments requires significantly more compute than in discrete ones -- for the simulator to render the scenes and for the agent to learn a much larger state space. Sadly, this is the main reason I can't afford to run RL in my experiments.
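To make concern (1) concrete, here is a rough sketch of a progress-based reward with made-up constants (this is not code from our repo): a discrete hop between viewpoints cuts the geodesic distance to the goal by meters, while a 0.25m forward step cuts it by at most 0.25 and a 15-degree turn cuts it by zero.

```python
def progress_reward(prev_geo_dist, curr_geo_dist, done, success,
                    success_bonus=2.0, slack=-0.01):
    """Illustrative progress-based reward: reduction in geodesic distance
    to the goal, a small per-step slack penalty, and a terminal bonus.
    All constants here are made up for this sketch."""
    r = (prev_geo_dist - curr_geo_dist) + slack
    if done:
        r += success_bonus if success else -success_bonus
    return r

# Discrete hop between panoramic viewpoints: a strong signal.
print(progress_reward(8.0, 6.0, done=False, success=False))   # 1.99
# Continuous 0.25m forward step: a much weaker signal.
print(progress_reward(8.0, 7.75, done=False, success=False))  # 0.24
# 15-degree turn: no distance change, only the slack penalty.
print(progress_reward(8.0, 8.0, done=False, success=False))   # -0.01
```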
My colleague Zun Wang @wz0919 tried RL + IL + low-level actions for continuous VLN (without much compute or tuning) and got very bad results. But for RL + IL + waypoint predictor, compared to the scheduled sampling in our paper, RL + IL performs slightly better while the training time is similar.
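For reference, the RL + IL mix is conceptually just a weighted sum of a policy-gradient term and a cross-entropy term against the teacher's actions. A minimal sketch (the 0.2 weight is a common choice in VLN work, not necessarily what Zun used):

```python
import torch
import torch.nn.functional as F

def mixed_rl_il_loss(logits, teacher_actions, log_probs, advantages,
                     il_weight=0.2):
    # IL term: cross-entropy against the teacher (shortest-path) actions.
    il_loss = F.cross_entropy(logits, teacher_actions)
    # RL term: A2C-style policy gradient with precomputed advantages.
    rl_loss = -(log_probs * advantages.detach()).mean()
    return rl_loss + il_weight * il_loss

# Toy shapes: a batch of 8 decision steps with 4 candidate actions each.
logits = torch.randn(8, 4, requires_grad=True)
teacher = torch.randint(0, 4, (8,))
# In practice the log-probs come from actions *sampled* by the policy;
# we reuse the teacher indices here only to keep the demo short.
logp = torch.log_softmax(logits, -1).gather(1, teacher[:, None]).squeeze(1)
mixed_rl_il_loss(logits, teacher, logp, torch.randn(8)).backward()
```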

But what if we have sufficient resources? I believe that with sufficient compute, great ideas can be pushed to another level. Take a look at the Waypoint Models (64 GPUs), as well as many of the papers in object-goal navigation, such as THDA (256 GPUs), which only uses DD-PPO. Having said that, I also want to mention our Recurrent VLN-BERT: increasing the batch size and re-attending to the instruction at each step can achieve much higher performance ;).
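The re-attend part simply means letting the recurrent agent state query the instruction tokens again at every decision step; a toy sketch of the idea (not the actual Recurrent VLN-BERT code):

```python
import torch
import torch.nn as nn

class StepwiseInstructionAttention(nn.Module):
    """Toy version of re-attending to the instruction at each step."""
    def __init__(self, dim=768, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, state, instr_tokens):
        # state: (B, 1, D) recurrent agent state; instr_tokens: (B, L, D)
        ctx, _ = self.attn(state, instr_tokens, instr_tokens)
        return state + ctx  # language-conditioned state for the next action

layer = StepwiseInstructionAttention()
out = layer(torch.randn(2, 1, 768), torch.randn(2, 40, 768))  # (2, 1, 768)
```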
On the other hand, you might be interested in the recent Habitat-Web -- with a sufficient amount of IL data, an agent can learn exploration well via imitation.

Happy to learn more about your thoughts. Cheers.


YESAndy commented on August 28, 2024


Hi Yicong,

Thank you for your reply! Yes, I tried larger batch sizes (8, 16, 32, ...), but they ate up a large amount of CPU memory (my computer has 24 GB of RAM), so a larger batch size caused the training to crash. I also checked the GPU utilization, which was low (~40%), so I re-created a new conda env and re-installed habitat-sim using conda (I had built it from source before because I didn't realize my Python version was not compatible with habitat-sim=0.1.7 ;))). I also revised some of the code to reduce CPU memory consumption, along the lines of the sketch below. And now the training time is finally down to 1.5 hr/epoch 😫.
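For anyone hitting the same wall, the knobs I touched were of this kind; a generic PyTorch sketch with dummy data, not the repo's actual loader:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Dummy stand-in data; the real code streams observations from habitat-sim.
data = TensorDataset(torch.randn(256, 3, 64, 64), torch.randint(0, 4, (256,)))

# Fewer workers and a small prefetch_factor shrink the CPU-side buffers
# (each worker keeps prefetch_factor batches in RAM); pinned memory makes
# host-to-GPU copies cheaper.
loader = DataLoader(data, batch_size=16, num_workers=2,
                    prefetch_factor=2, pin_memory=True)

device = "cuda" if torch.cuda.is_available() else "cpu"
for obs, act in loader:
    obs = obs.to(device, non_blocking=True)
    break  # one iteration, just to show the transfer pattern
```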


Jackie-Chou commented on August 28, 2024

Thanks Yicong! It's so generous of you to share all these ideas and materials with me; I really appreciate it, sincerely! It is sad to know that RL in the continuous environment is unaffordable for now, but I agree with you that great ideas like your VLN-BERT could be pushed to another level given sufficient resources. Thanks again, and I'm looking forward to seeing more great ideas from you!


YESAndy commented on August 28, 2024


Hi Yicong,

Very impressive discussion. Just a quick follow-up question: you mentioned in the paper that you used an Nvidia 3090 and a batch size of 64; I wonder how long your training time for VLN-BERT was?

We have a 4090 GPU and a batch size of 4, but the training time is extremely long (3 hr/epoch) TAT. I already checked that the Habitat-sim version is the same as yours and that it is installed with CUDA (see the check below). Do you have any suggestions regarding this issue?
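For completeness, this is the kind of check I ran; I am assuming habitat_sim.cuda_enabled is the right flag for a CUDA-enabled build:

```python
import torch
import habitat_sim

# Quick environment sanity check before profiling the training loop.
print("habitat-sim:", habitat_sim.__version__)      # expecting 0.1.7
print("sim CUDA build:", habitat_sim.cuda_enabled)  # assumed flag name
print("torch CUDA:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
```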

Many thanks!!!!!!!


YicongHong commented on August 28, 2024

Hi Andy,

Thanks! The "single Nvidia 3090 and batch size of 64" is for training the waypoint predictor. For the navigation model, please see the Appendix for more details. For VLNBERT, the training takes about 3.5 days to complete 50 epochs using batch size 16 on a single 3090 GPU.
I guess the speed difference is primarily due to the hardware; can you try to fit more samples (a larger batch size)?
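If CPU memory is what blocks a larger batch, gradient accumulation gives a larger effective batch without holding more samples at once; a generic sketch with a toy model, not code from our repo:

```python
import torch
import torch.nn as nn

# Toy model/data standing in for the navigation model and episode batches.
model = nn.Linear(16, 4)
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
batches = [(torch.randn(4, 16), torch.randint(0, 4, (4,))) for _ in range(8)]

accum = 4  # effective batch = 4 micro-batches x 4 samples = 16
opt.zero_grad()
for i, (x, y) in enumerate(batches):
    loss = nn.functional.cross_entropy(model(x), y) / accum
    loss.backward()            # gradients accumulate across micro-batches
    if (i + 1) % accum == 0:
        opt.step()
        opt.zero_grad()
```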

Cheers,
Yicong

