
eric-ai-lab / aerial-vision-and-dialog-navigation


Codebase of ACL 2023 Findings "Aerial Vision-and-Dialog Navigation"

Home Page: https://sites.google.com/view/aerial-vision-and-dialog/home

Python 99.26% Shell 0.74%
aerial-imagery drone-navigation navigation vision-and-language vln

aerial-vision-and-dialog-navigation's Introduction

Aerial Vision-and-Dialog Navigation

The ability to converse with humans and follow natural language commands is crucial for intelligent unmanned aerial vehicles (a.k.a. drones). It can relieve people's burden of holding a controller all the time, allow multitasking, and make drone control more accessible for people with disabilities or with their hands occupied. To this end, we introduce Aerial Vision-and-Dialog Navigation (AVDN), the task of navigating a drone via natural language conversation. We build a drone simulator with a continuous photorealistic environment and collect a new AVDN dataset of over 3k recorded navigation trajectories with asynchronous human-human dialogs between commanders and followers. The commander provides the initial navigation instruction and further guidance by request, while the follower navigates the drone in the simulator and asks questions when needed. During data collection, the followers' attention on the drone's visual observation is also recorded. Based on the AVDN dataset, we study the tasks of aerial navigation from (full) dialog history and propose an effective Human Attention Aided Transformer model (HAA-Transformer), which learns to predict both navigation waypoints and human attention.

Todos:

  • Data released
  • Train code uploaded
  • Inference code uploaded and checkpoint released
  • Eval.ai challenge setup
  • Dataset format explanation in detail

AVDN Challenge and Leaderboard

Based on the AVDN dataset, we are hosting an ICCV 2023 Challenge (co-located with the ICCV 2023 CLVL workshop) for the Aerial Navigation from Dialog History (ANDH) task on Eval.ai: https://eval.ai/web/challenges/challenge-page/2049/overview

Download Data

Download xView data

Our AVDN dataset uses satellite images from the xView dataset. Follow the instructions at https://challenge.xviewdataset.org/data-download to download the xView dataset.

Then copy the xView images into the AVDN directory (assuming the xView images are located at ./XVIEW_images):

mkdir -p Aerial-Vision-and-Dialog-Navigation/datasets/AVDN/train_images

cp -r XVIEW_images/*.tif Aerial-Vision-and-Dialog-Navigation/datasets/AVDN/train_images/
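
As a quick sanity check that the copy succeeded, you can count the copied GeoTIFF images (a minimal sketch; the directory path simply follows the commands above):

```
# Sanity check: count the xView GeoTIFF images copied into the AVDN image folder.
# The path follows the mkdir/cp commands above; adjust it if your layout differs.
from pathlib import Path

image_dir = Path("Aerial-Vision-and-Dialog-Navigation/datasets/AVDN/train_images")
tif_files = sorted(image_dir.glob("*.tif"))
print(f"Found {len(tif_files)} .tif images in {image_dir}")
```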

Download AVDN datasets

Download the AVDN annotation files (see also the project page: https://sites.google.com/view/aerial-vision-and-dialog/home):

mkdir -p Aerial-Vision-and-Dialog-Navigation/datasets/AVDN/annotations

gdown 1xUHnrYaNGe_IBG7W1ecaf6U2cyuBfYLr -O Aerial-Vision-and-Dialog-Navigation/datasets/AVDN/annotations/train_data.json

gdown 1mtT3AVJQNEbjKkH6aINX3kj7ROADkBET -O Aerial-Vision-and-Dialog-Navigation/datasets/AVDN/annotations/val_seen_data.json

gdown 17fVSHmuB3EFHkfNRZle6kgVcvZcumsJr -O Aerial-Vision-and-Dialog-Navigation/datasets/AVDN/annotations/val_unseen_data.json

gdown 14BijI07ukKCSDh3T_RmUG83z6Oa75M-U -O Aerial-Vision-and-Dialog-Navigation/datasets/AVDN/annotations/test_unseen_data.json
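
To confirm that the annotation files downloaded correctly, a minimal sketch along these lines can be used (it only checks that each file parses as JSON and reports its top-level size; the detailed dataset format is documented separately, see the Todos above):

```
# Minimal check that each AVDN annotation split downloaded and parses as JSON.
# File names follow the gdown commands above.
import json
from pathlib import Path

ann_dir = Path("Aerial-Vision-and-Dialog-Navigation/datasets/AVDN/annotations")
for split in ["train_data", "val_seen_data", "val_unseen_data", "test_unseen_data"]:
    with open(ann_dir / f"{split}.json") as f:
        data = json.load(f)
    print(f"{split}: {len(data)} top-level entries")
```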

Training and Evaluation

Download pre-trained xview-yolov3 weights and configuration file

mkdir -p Aerial-Vision-and-Dialog-Navigation/datasets/AVDN/pretrain_weights

gdown 1Ke-pA5jpq1-fsEwAch_iRCtJHx6rQc-Z -O Aerial-Vision-and-Dialog-Navigation/datasets/AVDN/pretrain_weights/best.pt

gdown 1n6RMWcHAbS6DA7BBug6n5dyN6NPjiPjh -O Aerial-Vision-and-Dialog-Navigation/datasets/AVDN/pretrain_weights/yolo_v3.cfg
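
Optionally, you can verify that the downloaded weights file is a readable PyTorch checkpoint (a rough check only; the checkpoint's internal layout is an assumption here, and the detector architecture itself comes from yolo_v3.cfg):

```
# Rough check that the downloaded YOLOv3 checkpoint can be deserialized by torch.
# We only inspect its type and (if it is a dict) a few of its keys.
import torch

ckpt_path = "Aerial-Vision-and-Dialog-Navigation/datasets/AVDN/pretrain_weights/best.pt"
ckpt = torch.load(ckpt_path, map_location="cpu")
print(type(ckpt))
if isinstance(ckpt, dict):
    print(list(ckpt.keys())[:10])
```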

Download the training checkpoints corresponding to the experiments in the AVDN paper

mkdir -p Aerial-Vision-and-Dialog-Navigation/datasets/AVDN/et_haa/ckpts/

mkdir -p Aerial-Vision-and-Dialog-Navigation/datasets/AVDN/lstm_haa/ckpts/

gdown 1fA6ckLVA-gsiOmWmOMkqJggTLbiJpFBI -O Aerial-Vision-and-Dialog-Navigation/datasets/AVDN/et_haa/ckpts/best_val_unseen

gdown 1RYjo_vc5m5ZRUcjIFojZjke8RhlfX90I -O Aerial-Vision-and-Dialog-Navigation/datasets/AVDN/lstm_haa/ckpts/best_val_unseen

Install requirements

pip install torch==1.11.0+cu113 -f https://download.pytorch.org/whl/torch_stable.html

pip install torchvision==0.12.0+cu113 -f https://download.pytorch.org/whl/torch_stable.html

pip install -r requirements.txt
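
After installation, it may help to confirm that the pinned versions and the CUDA build are visible (a minimal sketch):

```
# Quick environment check for the pinned torch/torchvision versions (cu113 builds).
import torch
import torchvision

print("torch:", torch.__version__)              # expected 1.11.0+cu113
print("torchvision:", torchvision.__version__)  # expected 0.12.0+cu113
print("CUDA available:", torch.cuda.is_available())
```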

Run training or evaluation:

The script scripts/avdn_paper/run_et_haa.sh includes commands for training and evaluating the Human Attention Aided Transformer (HAA-Transformer) model.

The script scripts/avdn_paper/run_lstm_haa.sh includes commands for training and evaluating the Human Attention Aided LSTM (HAA-LSTM) model.

cd Aerial-Vision-and-Dialog-Navigation/src

# For Human Attention Aided Transformer model
bash scripts/avdn_paper/run_et_haa.sh 

# For Human Attention Aided LSTM model
bash scripts/avdn_paper/run_lstm_haa.sh 

If you find this useful, please cite

@inproceedings{fan-etal-2023-aerial,
    title = "Aerial Vision-and-Dialog Navigation",
    author = "Fan, Yue  and
      Chen, Winson  and
      Jiang, Tongzhou  and
      Zhou, Chun  and
      Zhang, Yi  and
      Wang, Xin Eric",
    booktitle = "Findings of the Association for Computational Linguistics: ACL 2023",
    month = jul,
    year = "2023",
    address = "Toronto, Canada",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.findings-acl.190",
    doi = "10.18653/v1/2023.findings-acl.190",
    pages = "3043--3061",
}

aerial-vision-and-dialog-navigation's People

Contributors

eric-xw, uefan


aerial-vision-and-dialog-navigation's Issues

Forward Direction in visualize_sub_traj.py

Dear authors,
From the ./datasets/README.md, I know the meaning of "gt_path_corners". Therefore, I draw the corner[0] in a white circle and the corner[1] in a black circle using these lines of code.

```
for i, corner in enumerate(pos_list):
    if i==(len(pos_list)-1):
        cv2.circle(im_resized, corner[0], color=(255, 255, 255), radius=16, thickness=-1)
        cv2.circle(im_resized, corner[1], color=(0, 0, 0), radius=16, thickness=-1)
```

This is the output picture (not reproduced here).

I think the white circle stands for "front left" and the black circle stands for "front right". I am confused because the Forward Direction arrow doesn't point to the midpoint of the white and black circles.
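
For readers reproducing this, a small helper for drawing the midpoint mentioned above (a hedged sketch, assuming each corner is an (x, y) pixel pair; it is not part of visualize_sub_traj.py):

```
# Draw the midpoint of two pixel points (e.g. corner[0] and corner[1]) as a gray circle,
# so it can be compared against the Forward Direction arrow.
import cv2

def draw_midpoint(img, p0, p1, radius=16):
    mid = (int(round((p0[0] + p1[0]) / 2)), int(round((p0[1] + p1[1]) / 2)))
    cv2.circle(img, mid, radius=radius, color=(128, 128, 128), thickness=-1)
    return mid
```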

Hello, could you provide Google Drive links for the checkpoints and pre-trained files? I am unable to access them via gdown.

Traceback (most recent call last):
  File "/root/miniconda3/bin/gdown", line 8, in <module>
    sys.exit(main())
  File "/root/miniconda3/lib/python3.8/site-packages/gdown/cli.py", line 156, in main
    filename = download(
  File "/root/miniconda3/lib/python3.8/site-packages/gdown/download.py", line 161, in download
    res = sess.get(url, stream=True, verify=verify)
  File "/root/miniconda3/lib/python3.8/site-packages/requests/sessions.py", line 555, in get
    return self.request('GET', url, **kwargs)
  File "/root/miniconda3/lib/python3.8/site-packages/requests/sessions.py", line 542, in request
    resp = self.send(prep, **send_kwargs)
  File "/root/miniconda3/lib/python3.8/site-packages/requests/sessions.py", line 655, in send
    r = adapter.send(request, **kwargs)
  File "/root/miniconda3/lib/python3.8/site-packages/requests/adapters.py", line 516, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='drive.google.com', port=443): Max retries exceeded with url: /uc?id=1Ke-pA5jpq1-fsEwAch_iRCtJHx6rQc-Z (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f355786fdc0>: Failed to establish a new connection: [Errno 110] Connection timed out'))

About dialog

When the drone flies to different areas, should the simulator or dataset provide different dialog histories or content? According to my understanding, in AVDN, whether the drone goes from starting point a to b or to c, it will receive the same question/dialog and the same answer, because the dialog content is not adjusted according to the drone's position (only to the time step T in the paper). Is that understanding correct?

How to understand calculating gt_next_pos_ratio code?

Dear authors:
I have some difficulty understanding these lines of code. I know the meaning of teacher_a[i][0], current_pos, and corners, but it is hard for me to see why gt_next_pos_ratio should be computed this way. Can you explain it in detail?

_net_next_pos = 1e5*(teacher_a[i][0] - current_pos)
_net_y = np.round(1e5*((corners[i][0] + corners[i][1])/2 - current_pos)).astype(np.int)
_net_x = np.round(1e5*((corners[i][1] + corners[i][2])/2 - current_pos)).astype(np.int)
A = np.mat([[_net_x[0],_net_y[0]],[_net_x[1],_net_y[1]]])
b = np.mat([_net_next_pos[0],_net_next_pos[1]]).T
r = np.linalg.solve(A,b)
gt_next_pos_ratio = [r[0,0], r[1,0]]
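
One possible reading of the computation above, inferred only from the code itself and offered as a hedged sketch rather than an authoritative answer: the two midpoint vectors act as a local basis for the current view, and gt_next_pos_ratio expresses the displacement to the next waypoint in that basis (the 1e5 scaling cancels because it is applied to both sides of the linear system):

```
# Hedged re-statement of the computation above (not the repository's code):
# solve r_x * x_hat + r_y * y_hat = next_pos - current_pos, where x_hat and y_hat
# are the vectors from the current position to the midpoints of two view edges.
import numpy as np

def next_pos_ratio(next_pos, current_pos, corners):
    cur = np.asarray(current_pos, dtype=float)
    c = [np.asarray(p, dtype=float) for p in corners]
    y_hat = (c[0] + c[1]) / 2 - cur   # toward the midpoint of corners[0] and corners[1]
    x_hat = (c[1] + c[2]) / 2 - cur   # toward the midpoint of corners[1] and corners[2]
    A = np.stack([x_hat, y_hat], axis=1)              # columns are the local basis vectors
    r = np.linalg.solve(A, np.asarray(next_pos, dtype=float) - cur)
    return [r[0], r[1]]                               # [r_x, r_y], i.e. gt_next_pos_ratio
```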

By the way, a small question: in my understanding, the origin is at the upper-left corner of the image, the positive x-axis points to the right, and the positive y-axis points down. Is this correct?

yolo_v3.cfg file missing

Hello,
It is great work and I appreciate it. When I use a default yolov3 cfg file from the internet, I face problems such as:

Traceback (most recent call last):
  File "/home/pc4193/zeyn/random_repos/Aerial-Vision-and-Dialog-Navigation/src/xview_et/main.py", line 314, in <module>
    main()
  File "/home/pc4193/zeyn/random_repos/Aerial-Vision-and-Dialog-Navigation/src/xview_et/main.py", line 310, in main
    valid(args, val_envs, val_full_traj_envs, rank=rank)
  File "/home/pc4193/zeyn/random_repos/Aerial-Vision-and-Dialog-Navigation/src/xview_et/main.py", line 257, in valid
    agent = agent_class(args, rank=rank)
  File "/home/pc4193/zeyn/random_repos/Aerial-Vision-and-Dialog-Navigation/src/xview_et/agent.py", line 141, in __init__
    self.vision_model.load_state_dict(state)
  File "/home/pc4193/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 2041, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for Darknet:
	size mismatch for module_list.81.conv_81.weight: copying a param with shape torch.Size([650, 1024, 1, 1]) from checkpoint, the shape in current model is torch.Size([255, 1024, 1, 1]).
	size mismatch for module_list.81.conv_81.bias: copying a param with shape torch.Size([650]) from checkpoint, the shape in current model is torch.Size([255]).
	size mismatch for module_list.93.conv_93.weight: copying a param with shape torch.Size([650, 512, 1, 1]) from checkpoint, the shape in current model is torch.Size([255, 512, 1, 1]).
	size mismatch for module_list.93.conv_93.bias: copying a param with shape torch.Size([650]) from checkpoint, the shape in current model is torch.Size([255]).
	size mismatch for module_list.105.conv_105.weight: copying a param with shape torch.Size([650, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([255, 256, 1, 1]).
	size mismatch for module_list.105.conv_105.bias: copying a param with shape torch.Size([650]) from checkpoint, the shape in current model is torch.Size([255]).

Even if I change the value 255 to 650, I face different problems. Am I doing something wrong, or is it because of the cfg file?
If so, would you be able to share the yolo_v3.cfg file, please?
Thanks in advance.

Error returned during Eval.ai server testing

Dear authors:
I encountered an uncommon error while submitting test results on Eval.ai, shown in an error screenshot (not reproduced here).
Can you explain why this is happening? I have generated the test trajectory in the correct manner.

Time for reopening the test server of the ANDH task.

Hello,

AVDN is a valuable and challenging task in embodied artificial intelligence, and we are highly interested in it. We would like to kindly ask when the test server for the ANDH and ANDH-Full tasks will be reopened, so that we can assess our model's performance.

Thank you!

About gp and oracle_gp metric

Dear authors:
I find an inconsistency when computing the gp and oracle_gp metrics. In line 347, gt_path[-2] is used as the final destination, but in line 349, gt_path[-1] is used. I think using gt_path[-1] is the reasonable choice. Is this a bug, or is it written that way on purpose?

scores['gp'] = gt_net_lengths - \
    np.linalg.norm(path[-1] - gt_path[-2])*11.13*1e4
scores['oracle_gp'] = gt_net_lengths - \
    np.min([np.linalg.norm(path[x] - gt_path[-1]) for x in range(len(path))])*11.13*1e4

Some confusion about the function gps_to_img_coords

Dear authors:
I want to know the purpose of the variable lng_ratio in this code:

def gps_to_img_coords(self, gps, ob):
    gps_botm_left = ob['gps_botm_left']
    gps_top_right = ob['gps_top_right']
    lng_ratio = ob['lng_ratio']
    lat_ratio = ob['lat_ratio']
    return int(round((gps[1] - gps_botm_left[1]) / lat_ratio)), int(round((gps_top_right[0] - gps[0]) / lat_ratio))

I know that gps[1] and gps_botm_left[1] stand for longitude, so it is confusing why lat_ratio is used as the divisor here.

How can I resolve a network connection error encountered during training? OSError: We couldn't connect to 'https://huggingface.co' to load this model, couldn't find it in the cached files and it looks like bert-base-uncased is not the path to a directory containing a config.json file.

Downloading: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 28.0/28.0 [00:00<00:00, 32.6kB/s]
Traceback (most recent call last):
  File "/root/miniconda3/lib/python3.8/site-packages/transformers/configuration_utils.py", line 601, in _get_config_dict
    resolved_config_file = cached_path(
  File "/root/miniconda3/lib/python3.8/site-packages/transformers/utils/hub.py", line 283, in cached_path
    output_path = get_from_cache(
  File "/root/miniconda3/lib/python3.8/site-packages/transformers/utils/hub.py", line 553, in get_from_cache
    raise ValueError(
ValueError: Connection error, and we cannot find the requested files in the cached path. Please try again or make sure your Internet connection is on.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "xview_et/main.py", line 314, in <module>
    main()
  File "xview_et/main.py", line 305, in main
    train_env, train_full_traj_env, val_envs, val_full_traj_envs = build_dataset(args, rank=rank)
  File "xview_et/main.py", line 30, in build_dataset
    tok = get_tokenizer(args)
  File "xview_et/main.py", line 26, in get_tokenizer
    tokenizer = AutoTokenizer.from_pretrained(cfg_name)
  File "/root/miniconda3/lib/python3.8/site-packages/transformers/models/auto/tokenization_auto.py", line 511, in from_pretrained
    config = AutoConfig.from_pretrained(
  File "/root/miniconda3/lib/python3.8/site-packages/transformers/models/auto/configuration_auto.py", line 680, in from_pretrained
    config_dict, _ = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "/root/miniconda3/lib/python3.8/site-packages/transformers/configuration_utils.py", line 553, in get_config_dict
    config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "/root/miniconda3/lib/python3.8/site-packages/transformers/configuration_utils.py", line 634, in _get_config_dict
    raise EnvironmentError(
OSError: We couldn't connect to 'https://huggingface.co' to load this model, couldn't find it in the cached files and it looks like bert-base-uncased is not the path to a directory containing a config.json file.
Checkout your internet connection or see how to run the library in offline mode at 'https://huggingface.co/docs/transformers/installation#offline-mode'.

Ended flag in teacher action

Dear authors,
I see that you set the ended flag using np.array([0,0], dtype=np.float32) in this code.

if ended[i] or progress[i] > 0.5:
    teacher_a[i][0] = np.array([0,0], dtype=np.float32)
    continue

However, when computing the loss in line 665, this if condition will always be True, since the ended data is of numpy array type and will never be of type(-100).
# Compute loss
for i in range(len(obs)):
    # if the function teacher_action determines that the current view is the final position, no action should be made
    if type(target[i][0]) != type(-100):
        cuda_gt_next_pos_ratio = torch.from_numpy(target[i][0]).cuda()
        ml_loss += self.progress_regression(pred_next_pos_ratio[i,:], cuda_gt_next_pos_ratio)
        ml_loss += self.progress_regression((torch.atan2(pred_next_pos_ratio[i,0], pred_next_pos_ratio[i,1]+1e-5*np.random.rand(1)[0]) /3.14159 + 2) / 2 %1 ,

This may result in losses being computed for trajectories that have already ended.
