This is the official code for the following paper published in IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2023: Value-Based Subgoal Discovery and Path Planning for Reaching Long-Horizon Goals
This work addresses the challenge of training autonomous agents to reach long-horizon goals in spatial traversal tasks. It introduces a novel planning method called "Learning Subgoal Graph using Value-based Subgoal Discovery and Automatic Pruning" (LSGVP). Unlike existing methods, LSGVP uses a subgoal discovery heuristic based on cumulative reward, resulting in sparse subgoals that align with higher cumulative reward paths. Additionally, LSGVP includes an automatic pruning mechanism to remove erroneous connections between subgoals, particularly those across obstacles. As a result, LSGVP outperforms other methods in terms of achieving higher cumulative rewards and goal-reaching success rates in spatial traversal tasks.
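To make the two core ideas above concrete, here is a minimal illustrative sketch: states on high-cumulative-reward paths become sparse subgoals, and subgoal connections with low Q-values (e.g. pairs separated by an obstacle) are pruned automatically. All function names, the toy numbers, and the threshold are illustrative assumptions, not the paper's actual API.

```python
def discover_subgoals(state_values, top_k=2):
    """Value-based discovery: pick the top_k states with the highest
    estimated cumulative reward (illustrative heuristic)."""
    return sorted(state_values, key=state_values.get, reverse=True)[:top_k]

def prune_edges(edges, q_values, threshold=0.5):
    """Automatic pruning: drop subgoal connections whose learned
    Q-value falls below the threshold (illustrative cutoff)."""
    return [e for e in edges if q_values.get(e, 0.0) >= threshold]

# Toy example: state "c" lies behind an obstacle, so the Q-value of the
# edge crossing it is low and the edge gets pruned.
values = {"a": 0.9, "b": 0.7, "c": 0.2}
subgoals = discover_subgoals(values)                 # ["a", "b"]
edges = [("a", "b"), ("a", "c")]
q = {("a", "b"): 0.8, ("a", "c"): 0.1}
print(prune_edges(edges, q))                         # [("a", "b")]
```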
Requirements:
- Python >= 3.5.0
- tensorflow==2.1.0
- tf-agents==0.4.0
- tensorflow-probability==0.9.0
In end-to-end training and testing, the following three phases are executed in the same program run:
- Pre-training of the RL agent policy and universal Q-value function (defined in `agent.py`, `actor_critic.py`, and `train_agent.py`).
- Subgoal graph learning and pruning using the universal Q-value function learned in phase 1 (refer to `LSGVP.py`).
- Testing with various long-horizon goals, using the learned RL agent and subgoal graph (refer to `test_goalseeking.py`).
Running the end-to-end program is simple; just call:

`python main.py`

The file `main.py` contains the parameters/arguments and the calls to the training, subgoal graph construction, and testing functions.