
This project forked from eric-ai-lab/aerial-vision-and-dialog-navigation


Codebase of the ACL 2023 (Findings) Paper "Aerial Vision-and-Dialog Navigation"

Home Page: https://sites.google.com/view/aerial-vision-and-dialog/home


Aerial Vision-and-Dialog Navigation

The ability to converse with humans and follow natural language commands is crucial for intelligent unmanned aerial vehicles (a.k.a. drones). It can relieve people's burden of holding a controller all the time, allow multitasking, and make drone control more accessible for people with disabilities or with their hands occupied. To this end, we introduce Aerial Vision-and-Dialog Navigation (AVDN), to navigate a drone via natural language conversation. We build a drone simulator with a continuous photorealistic environment and collect a new AVDN dataset of over 3k recorded navigation trajectories with asynchronous human-human dialogs between commanders and followers. The commander provides initial navigation instruction and further guidance by request, while the follower navigates the drone in the simulator and asks questions when needed. During data collection, followers' attention on the drone's visual observation is also recorded. Based on the AVDN dataset, we study the tasks of aerial navigation from (full) dialog history and propose an effective Human Attention Aided Transformer model (HAA-Transformer), which learns to predict both navigation waypoints and human attention.

Todos:

  • Data released
  • Train code uploaded
  • Inference code uploaded and checkpoint released
  • Eval.ai challenge setup
  • Dataset format explanation in detail

AVDN Challenge and Leaderboard

Based on the AVDN dataset, we are hosting an ICCV 2023 Challenge (co-located at the ICCV 2023 CLVL workshop) for the Aerial Navigation from Dialog History (ANDH) task on Eval.ai: https://eval.ai/web/challenges/challenge-page/2049/overview

Download Data

Download xView data

Our AVDN dataset uses satellite images from the xView dataset. Follow the instructions at https://challenge.xviewdataset.org/data-download to download the xView dataset.

Then copy the xView images into the AVDN directory (this assumes the xView images are in ./XVIEW_images):

mkdir -p Aerial-Vision-and-Dialog-Navigation/datasets/AVDN/train_images

cp -r XVIEW_images/*.tif Aerial-Vision-and-Dialog-Navigation/datasets/AVDN/train_images/
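As a quick sanity check after the copy, a minimal sketch like the one below counts the .tif files that landed in train_images. The helper name count_tif_images is hypothetical and not part of the repository; the path is the one used in the commands above.

```python
from pathlib import Path


def count_tif_images(image_dir):
    """Return the number of .tif files directly inside image_dir (0 if it is missing)."""
    directory = Path(image_dir)
    if not directory.is_dir():
        return 0
    # The suffix comparison is case-insensitive, so .TIF files are counted too.
    return sum(
        1
        for path in directory.iterdir()
        if path.is_file() and path.suffix.lower() == ".tif"
    )


if __name__ == "__main__":
    # Path taken from the commands above; adjust it if your checkout lives elsewhere.
    train_images = "Aerial-Vision-and-Dialog-Navigation/datasets/AVDN/train_images"
    print(f"{count_tif_images(train_images)} .tif images found in {train_images}")
```

A count of zero usually means the source directory name or the working directory differs from what the cp command assumed.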

Download AVDN datasets (https://sites.google.com/view/aerial-vision-and-dialog/home):

mkdir -p Aerial-Vision-and-Dialog-Navigation/datasets/AVDN/annotations

gdown 1xUHnrYaNGe_IBG7W1ecaf6U2cyuBfYLr -O Aerial-Vision-and-Dialog-Navigation/datasets/AVDN/annotations/train_data.json

gdown 1mtT3AVJQNEbjKkH6aINX3kj7ROADkBET -O Aerial-Vision-and-Dialog-Navigation/datasets/AVDN/annotations/val_seen_data.json

gdown 17fVSHmuB3EFHkfNRZle6kgVcvZcumsJr -O Aerial-Vision-and-Dialog-Navigation/datasets/AVDN/annotations/val_unseen_data.json

gdown 14BijI07ukKCSDh3T_RmUG83z6Oa75M-U -O Aerial-Vision-and-Dialog-Navigation/datasets/AVDN/annotations/test_unseen_data.json
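Once the four annotation files are downloaded, it can help to confirm that each one parses as JSON before starting a long training run. The sketch below is an illustration, not repository code: summarize_annotations is a hypothetical helper, and it deliberately assumes nothing about the annotation schema, reporting only the top-level type and length of each file.

```python
import json
from pathlib import Path

# File names taken from the gdown commands above.
ANNOTATION_FILES = (
    "train_data.json",
    "val_seen_data.json",
    "val_unseen_data.json",
    "test_unseen_data.json",
)


def summarize_annotations(annotation_dir):
    """Map each expected file name to a short description of its top-level JSON value."""
    summary = {}
    for name in ANNOTATION_FILES:
        path = Path(annotation_dir) / name
        if not path.is_file():
            summary[name] = "missing"
            continue
        try:
            with path.open(encoding="utf-8") as handle:
                data = json.load(handle)
        except (json.JSONDecodeError, UnicodeDecodeError):
            # A failed download can leave an HTML page or a partial file behind.
            summary[name] = "not valid JSON"
            continue
        # No schema is assumed: only the top-level type and length are reported.
        size = len(data) if isinstance(data, (list, dict)) else 1
        summary[name] = f"{type(data).__name__} with {size} top-level entries"
    return summary


if __name__ == "__main__":
    annotations = "Aerial-Vision-and-Dialog-Navigation/datasets/AVDN/annotations"
    for name, description in summarize_annotations(annotations).items():
        print(f"{name}: {description}")
```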

Training and Evaluation

Download pre-trained xview-yolov3 weights and configuration file

mkdir -p Aerial-Vision-and-Dialog-Navigation/datasets/AVDN/pretrain_weights

gdown 1Ke-pA5jpq1-fsEwAch_iRCtJHx6rQc-Z -O Aerial-Vision-and-Dialog-Navigation/datasets/AVDN/pretrain_weights/best.pt

gdown 1n6RMWcHAbS6DA7BBug6n5dyN6NPjiPjh -O Aerial-Vision-and-Dialog-Navigation/datasets/AVDN/pretrain_weights/yolo_v3.cfg

Download the training checkpoints corresponding to the experiments in the AVDN paper

mkdir -p Aerial-Vision-and-Dialog-Navigation/datasets/AVDN/et_haa/ckpts/

mkdir -p Aerial-Vision-and-Dialog-Navigation/datasets/AVDN/lstm_haa/ckpts/

gdown 1fA6ckLVA-gsiOmWmOMkqJggTLbiJpFBI -O Aerial-Vision-and-Dialog-Navigation/datasets/AVDN/et_haa/ckpts/best_val_unseen

gdown 1RYjo_vc5m5ZRUcjIFojZjke8RhlfX90I -O Aerial-Vision-and-Dialog-Navigation/datasets/AVDN/lstm_haa/ckpts/best_val_unseen
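With the annotations, YOLOv3 files, and checkpoints in place, a short check of the directory layout can catch a download that failed or wrote an empty file. The following is a minimal sketch under the assumption that the files sit exactly where the -O arguments above put them; find_missing_or_empty is a hypothetical name, not a function from this repository.

```python
from pathlib import Path

# Relative paths copied from the -O arguments of the gdown commands above.
EXPECTED_FILES = (
    "annotations/train_data.json",
    "annotations/val_seen_data.json",
    "annotations/val_unseen_data.json",
    "annotations/test_unseen_data.json",
    "pretrain_weights/best.pt",
    "pretrain_weights/yolo_v3.cfg",
    "et_haa/ckpts/best_val_unseen",
    "lstm_haa/ckpts/best_val_unseen",
)


def find_missing_or_empty(avdn_root):
    """Return the expected relative paths that are absent or zero bytes long."""
    root = Path(avdn_root)
    problems = []
    for relative in EXPECTED_FILES:
        path = root / relative
        if not path.is_file() or path.stat().st_size == 0:
            problems.append(relative)
    return problems


if __name__ == "__main__":
    avdn_root = "Aerial-Vision-and-Dialog-Navigation/datasets/AVDN"
    problems = find_missing_or_empty(avdn_root)
    if problems:
        print("Missing or empty files:")
        for relative in problems:
            print(f"  {relative}")
    else:
        print("All expected files are present.")
```

The check only looks at presence and size; it does not verify that a checkpoint actually loads.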

Install requirements

pip install torch==1.11.0+cu113 -f https://download.pytorch.org/whl/torch_stable.html

pip install torchvision==0.12.0+cu113 -f https://download.pytorch.org/whl/torch_stable.html

pip install -r requirements.txt
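To confirm that the environment matches the pinned versions, a sketch along these lines reads the installed package metadata without importing torch itself. The function names are hypothetical. The comparison ignores the local build tag (the part after "+"), so a CPU-only build of the same release would pass; that choice is an assumption made here for simplicity.

```python
from importlib import metadata

# Pins copied from the pip commands above.
PINNED = {"torch": "1.11.0+cu113", "torchvision": "0.12.0+cu113"}


def base_version(version):
    """Drop the local build tag, e.g. '1.11.0+cu113' -> '1.11.0'."""
    return version.split("+", 1)[0]


def check_pins(installed):
    """Return {package: message} for packages that are absent or differ from the pin.

    installed maps a package name to its version string, or None if not installed.
    """
    report = {}
    for package, pinned in PINNED.items():
        found = installed.get(package)
        if found is None:
            report[package] = f"not installed (expected {pinned})"
        elif base_version(found) != base_version(pinned):
            report[package] = f"found {found}, expected {pinned}"
    return report


def installed_versions():
    """Look up the installed version of each pinned package via package metadata."""
    versions = {}
    for package in PINNED:
        try:
            versions[package] = metadata.version(package)
        except metadata.PackageNotFoundError:
            versions[package] = None
    return versions


if __name__ == "__main__":
    problems = check_pins(installed_versions())
    if problems:
        for package, message in problems.items():
            print(f"{package}: {message}")
    else:
        print("torch and torchvision match the pinned releases.")
```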

Run training or evaluation:

The script scripts/avdn_paper/run_et_haa.sh contains the commands for training and evaluating the Human Attention Aided Transformer (HAA-Transformer) model.

The script scripts/avdn_paper/run_lstm_haa.sh contains the commands for training and evaluating the Human Attention Aided LSTM (HAA-LSTM) model.

cd Aerial-Vision-and-Dialog-Navigation/src

# For Human Attention Aided Transformer model
bash scripts/avdn_paper/run_et_haa.sh 

# For Human Attention Aided LSTM model
bash scripts/avdn_paper/run_lstm_haa.sh 

If you find this work useful, please cite:

@inproceedings{fan-etal-2023-aerial,
    title = "Aerial Vision-and-Dialog Navigation",
    author = "Fan, Yue  and
      Chen, Winson  and
      Jiang, Tongzhou  and
      Zhou, Chun  and
      Zhang, Yi  and
      Wang, Xin Eric",
    booktitle = "Findings of the Association for Computational Linguistics: ACL 2023",
    month = jul,
    year = "2023",
    address = "Toronto, Canada",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.findings-acl.190",
    doi = "10.18653/v1/2023.findings-acl.190",
    pages = "3043--3061",
}
