Coder Social home page Coder Social logo

sekewei / overpassnl Goto Github PK

View Code? Open in Web Editor NEW

This project forked from raphael-sch/overpassnl

0.0 0.0 0.0 1.15 MB

Natural Language to Overpass Query Language

Home Page: https://arxiv.org/pdf/2308.16060.pdf

Shell 0.67% Python 99.03% Dockerfile 0.30%

overpassnl's Introduction

Code for the preprint: Staniek et al, "Text-to-OverpassQL: A Natural Language Interface for Complex Geodata Querying of OpenStreetMap"

Live Demo: https://overpassnl.schumann.pub/

Dataset

The main dataset is found in the following files:

dataset.{train,dev,test}.nl
dataset.{train,dev,test}.query
dataset.{train,dev,test}.bbox

where .nl are the natural language inputs, .query are the Overpass queries and .bbox is used during evaluation for queries that use the {{bbox}} variable/shortcut. This ensures that they are evaluated in an area where the gold query returns results.

The following files are used to determine the difficulty of evaluation instances (Figure 5 in paper):

dataset.{dev,test}.difficulty_{len_nl, len_query, num_results, train_sim_nl train_sim_oqo, xml_components}

where train_sim_oqo is used to determine the 333 hard instances in dataset.{dev,test}.hard.{nl,query}.

Evaluation

Download the exact OpenStreetMap database we used for the evaluation in the paper here [306 GB]
Unzip the file (381 GB unziped) such that the folder structure is: evaluation/overpass_clone_db/db
Install docker and docker-compose and start the container with the Overpass API.

cd evaluation/docker
docker-compose up

This will start the Overpass API as a docker service and expose it at http://localhost:12346/api
If you see permission or file not found errors for db/osm3s_v0.7.57_areas or db/osm3s_v0.7.57_osm_base be sure to set correct execution permission to those files
You can test the Overpass API with:

curl -g 'http://localhost:12346/api/interpreter?data=[out:json];area[name="London"];out;'

If this returns an appropriate json output, you are set for the evaluation.

cd evaluation
pip install -r requirements.txt
python run_evaluation.py --ref_file ../dataset/dataset.dev --model_output_file ../models/outputs/byt5-base_com08/evaluation/preds_dev_beams4_com08_byt5_base.txt 

This will take around 5 hours depending on how many query results were cached in previous runs. Be sure to change the default arguments in the evaluation script if you use a different port for the Overpass API.
The evaluation results will be written to results_execution...txt and results_oqs...txt in the same dir as the model_output_file.

OverpassT5

Download the model config and weights here and place the files into models/outputs/byt5-base_com08/
Then run the following commands to generate the output queries.

cd evaluation
pip install -r requirements.txt
python inference_t5.py --exp_name com08 --model_name byt5-base --data_dir ../dataset --num_beams 4 --splits dev test 

Finetuning

To finetune your own model use the train_t5.py script.

cd evaluation
pip install -r requirements.txt
python train_t5.py --exp_name default --data_dir ../dataset --model_name google/byt5-base  --gradient_accumulation_steps 4
python inference_t5.py --exp_name default --model_name byt5-base --data_dir ../dataset --num_beams 4 

Python

Recommended Python version is 3.10 for all scripts.

References

The demo front-end is a fork of https://github.com/rowheat02/osm-gpt
We thank the https://overpass-turbo.eu/ community and Martin Raifer

Citation

Please cite the following paper:

@article {staniek-2023-overpassnl,
 title = "Text-to-OverpassQL: A Natural Language Interface for Complex Geodata Querying of OpenStreetMap",
 author = "Michael Staniek and Raphael Schumann and Maike Züfle and Stefan Riezler",
 year = "2023",
 publisher = "arXiv",
 eprint = "2308.16060" 
}

overpassnl's People

Contributors

raphael-sch avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.