vhzy / explore-eqa

This project is forked from stanford-iliad/explore-eqa.

Public release for "Explore until Confident: Efficient Exploration for Embodied Question Answering"

Home Page: https://explore-eqa.github.io/

Explore until Confident: Efficient Exploration for Embodied Question Answering

Allen Z. Ren, Jaden Clark, Anushri Dixit, Masha Itkina, Anirudha Majumdar, Dorsa Sadigh

Princeton University, Stanford University, Toyota Research Institute

Project webpage: https://explore-eqa.github.io/

Installation

Set up the conda environment (Linux, Python 3.9):

conda env create -f environment.yml
conda activate explore-eqa
pip install -e .

Install the latest version of Habitat-Sim (headless with no Bullet physics) with:

conda install habitat-sim headless -c conda-forge -c aihabitat
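
To confirm the headless build imports cleanly, here is a quick sanity check run from the activated environment (this snippet is just a check, not part of the repo):

```python
# Sanity check: the headless Habitat-Sim build should import without a display,
# and this particular build is expected to report Bullet physics as disabled.
import habitat_sim

print("habitat-sim version:", habitat_sim.__version__)
print("built with Bullet physics:", habitat_sim.built_with_bullet)
```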

Set up Prismatic VLM with the submodule:

cd prismatic-vlms && pip install -e .

Download the train split (hm3d-train-habitat-v0.2.tar) of the HM3D dataset here. You will be asked to request access first.
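
After extracting the tarball, you can roughly sanity-check the layout. The snippet below assumes the Habitat distribution's per-scene folders containing *.basis.glb meshes; the path is a placeholder, so adjust it (and the glob pattern) to what you actually downloaded:

```python
# Rough sanity check of the extracted HM3D train split.
# Assumes per-scene folders containing *.basis.glb meshes (Habitat distribution);
# verify against your actual download.
from pathlib import Path

scene_data_path = Path("/path/to/hm3d/train")  # same path used as scene_data_path later

scene_dirs = sorted(p for p in scene_data_path.iterdir() if p.is_dir())
print(f"Found {len(scene_dirs)} scene folders")
for scene_dir in scene_dirs[:3]:
    print(scene_dir.name, "->", [m.name for m in scene_dir.glob("*.basis.glb")])
```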

(Optional) For running CLIP-based exploration:

cd CLIP && pip install -e .

Dataset

We release the HM-EQA dataset, which includes 500 questions about 267 scenes from the HM3D dataset. The questions are available in data/.
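
For a quick look at the questions, a minimal sketch is below; the filename data/questions.csv and the CSV schema are assumptions, so check the files actually shipped in data/:

```python
# Minimal sketch for browsing the HM-EQA questions shipped under data/.
# The filename and schema are assumptions; inspect data/ for the exact files.
import csv

with open("data/questions.csv", newline="") as f:
    questions = list(csv.DictReader(f))

print(f"Loaded {len(questions)} questions")
print(questions[0])  # print one record to see the available fields
```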

Usage

First, specify scene_data_path in the config files with the path to the downloaded HM3D train split, and set hf_token to your Hugging Face user access token. Running any of the scripts below for the first time will download the VLM weights, which assumes access to a GPU with sufficient VRAM for the chosen VLM.
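
If you prefer to set these two fields programmatically, here is a small PyYAML sketch. The key names come from the instructions above; the top-level placement, placeholder values, and choice of config file are assumptions (editing the YAML by hand works just as well, and note that PyYAML round-tripping drops comments):

```python
# Fill in scene_data_path and hf_token in a config file before the first run.
# Key names follow the instructions above; values and top-level placement are
# placeholders/assumptions. Note: yaml.safe_dump drops comments from the file.
import yaml

cfg_path = "cfg/vlm_exp.yaml"
with open(cfg_path) as f:
    cfg = yaml.safe_load(f)

cfg["scene_data_path"] = "/path/to/hm3d/train"
cfg["hf_token"] = "hf_..."  # your Hugging Face user access token

with open(cfg_path, "w") as f:
    yaml.safe_dump(cfg, f, sort_keys=False)
```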

Run our method (VLM-semantic exploration) in Habitat-Sim:

python run_vlm_exp.py -cf cfg/vlm_exp.yaml

Run CLIP-based exploration in Habitat-Sim:

python run_clip_exp.py -cf cfg/clip_exp.yaml

Load a scene (with the question from our dataset) in Habitat-Sim:

python test_scene.py -cf cfg/test_scene.yaml
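
If you want to poke at a scene outside the repo's scripts, below is a minimal sketch using the Habitat-Sim Python API directly (independent of test_scene.py); the scene path and sensor settings are illustrative:

```python
# Minimal sketch: load one HM3D scene directly with the Habitat-Sim Python API.
# The scene path is illustrative; point it at one of the *.basis.glb files in
# your extracted HM3D train split.
import habitat_sim

sim_cfg = habitat_sim.SimulatorConfiguration()
sim_cfg.scene_id = "/path/to/hm3d/train/<scene-folder>/<scene>.basis.glb"

# One agent with a single RGB camera.
rgb_spec = habitat_sim.CameraSensorSpec()
rgb_spec.uuid = "color"
rgb_spec.sensor_type = habitat_sim.SensorType.COLOR
rgb_spec.resolution = [480, 640]  # height, width

agent_cfg = habitat_sim.agent.AgentConfiguration()
agent_cfg.sensor_specifications = [rgb_spec]

sim = habitat_sim.Simulator(habitat_sim.Configuration(sim_cfg, [agent_cfg]))
obs = sim.get_sensor_observations()
print("RGB observation shape:", obs["color"].shape)
sim.close()
```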

Scripts

We also share a few scripts in the repository that might be helpful.

Acknowledgement

The CLIP-based exploration uses the CLIP multi-scale relevancy extractor from Semantic Abstraction.
