
duorat's Introduction

ServiceNow completed its acquisition of Element AI on January 8, 2021. All references to Element AI in the materials that are part of this project should refer to ServiceNow.

DuoRAT


This repository contains the implementation of the DuoRAT model as described in the accompanying technical report. Using this code you can:

  • train and evaluate models on the Spider dataset
  • evaluate models on other single-database text2sql datasets
  • use trained models to perform text2sql parsing for any SQLite database or a CSV file

Setup

Download data and third party submodules: git submodule update --init

Download the Spider dataset:

bash scripts/download_and_preprocess_spider.sh

This script downloads Spider, splits the examples by database, and writes the resulting JSON files to data/database/.
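
To sanity-check the result, you can peek at the generated files from Python. This is a minimal sketch; the per-database layout data/database/<db_id>/examples.json is an assumption based on the paths used in the Michigan evaluation commands later in this README.

import json
from pathlib import Path

db_dirs = sorted(p for p in Path("data/database").iterdir() if p.is_dir())
print(len(db_dirs), "database directories, e.g.", db_dirs[0].name)
examples = json.load(open(db_dirs[0] / "examples.json"))
print(len(examples), "examples for", db_dirs[0].name)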

Now, create the docker image:

make build-image

Create a directory to save models, and run an interactive container:

mkdir logdir
nvidia-docker run -it -u $(id -u ${USER}) --name my_duorat --rm -v $PWD/logdir:/logdir -v $PWD/data/:/app/data duorat

(please disregard the "I have no name!" warning)

Running the code

Train the model:

python scripts/train.py --config configs/duorat/duorat-finetune-bert-large.jsonnet --logdir /logdir/duorat-bert

This script will first run a preprocessing step that creates the following files in the specified --logdir:

  • target_vocab.pkl: model's target vocabulary, obtained from the training set
  • train.pkl, val.pkl: preprocessed training and validation sets (tokenization, schema-linking and converting the output SQL into a sequence of actions)
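
If you want to verify that preprocessing finished, the sketch below inspects these files. It assumes they are standard pickles and that the duorat package is importable (e.g. when run from /app inside the container); the exact object types inside are not documented here.

import pickle

with open("/logdir/duorat-bert/train.pkl", "rb") as f:
    train = pickle.load(f)
print(type(train))
print(len(train) if hasattr(train, "__len__") else "object has no length")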

Training will further save a number of files in the (mounted) log directory /logdir/duorat-bert: the config that was used (config-{date}.json), model checkpoints (model_{best/last}_checkpoint), logs (log.txt), and the inference outputs (output-{step}). If your GPU does not have enough memory to run the model, you can try --config configs/duorat/duorat-12G.jsonnet instead. Note that caching of the preprocessed input tensors will significantly speed up training after the second epoch. During training, inference is run on the dev set periodically. Here is how you can run inference manually:

python scripts/infer.py --logdir /logdir/duorat-bert --output /logdir/duorat-bert/my_inference_output
python scripts/eval.py --config configs/duorat/duorat-good-no-bert.jsonnet --section val --inferred /logdir/duorat-bert/my_inference_output --output /logdir/duorat-bert/my_inference_output.eval

To look at evaluation results:

>>> import json
>>> d = json.load(open('<PATH FOR EVAL OUTPUT>')) 
>>> print(d['total_scores']['all']['exact']) # should be ~0.69

Inference on new databases

Simply run

python scripts/interactive.py --logdir /logdir/duorat-bert --db-id [your_db]

[your_db] must be the path to either an SQLite database file or a CSV file. Type a question and the model will convert it into an SQL query, which will then be executed on your database.
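
If you want to execute a generated query yourself, outside of the interactive script, Python's built-in sqlite3 module is enough. A minimal sketch; the database path and the query below are made-up placeholders:

import sqlite3

conn = sqlite3.connect("my_database.sqlite")   # placeholder: path to your SQLite file
query = "SELECT count(*) FROM singer"          # placeholder: a query produced by the model
print(conn.execute(query).fetchall())
conn.close()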

A batch mode inference script is also available: scripts/infer_questions.py.

New transition system

This codebase makes it possible to implement and use your own transition system with this model, i.e., given a grammar, to parse SQL into a tree representation and a sequence of actions. See the readme in duorat/asdl/ (adapted from tranX).
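
Conceptually, a transition system defines a mapping between SQL strings, typed parse trees, and linearized decoder actions. The sketch below is a hypothetical illustration of that contract only, not the actual duorat/asdl API; consult the readme there for the real classes.

# Hypothetical illustration of the transition-system contract; not the duorat/asdl API.
class TransitionSystem:
    def __init__(self, grammar):
        self.grammar = grammar  # an ASDL grammar describing the SQL language

    def surface_to_ast(self, sql: str):
        """Parse an SQL string into a typed tree conforming to the grammar."""
        raise NotImplementedError

    def get_actions(self, ast):
        """Linearize the tree into the sequence of actions the decoder predicts."""
        raise NotImplementedError

    def ast_to_surface(self, ast) -> str:
        """Render a tree back into an executable SQL string."""
        raise NotImplementedError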

Evaluation on Text2SQL datasets other than Spider

Our code supports model evaluation on other Text2SQL datasets using the data from text2sql-data. We follow the methodology proposed by Suhr et al., 2020.

To run the download and conversion code:

Step 1

(for now this works for only 5 out of 8 datasets; ATIS, Scholar, and Advising are still TODO)

Build and start the MySQL docker container (do it outside of the interactive my_duorat container):

bash scripts/mysql_docker_build_and_run.sh

Step 2

Download and convert the dataset of interest, e.g. for GeoQuery:

bash scripts/download_michigan.sh geo

For IMDB, Yelp and Academic this might take a while.

Step 3

Edit data/michigan.libsonnet to include only the datasets that you have downloaded.

Step 4

Infer and evaluate the queries for all questions:

python scripts/infer_questions.py --logdir /logdir/duorat-bert --data-config data/michigan.libsonnet --questions data/database/geo_test/examples.json --output-google /logdir/duorat-bert/inferred_geo.json
python scripts/evaluation_google.py --predictions_filepath /logdir/duorat-bert/inferred_geo.json --output_filepath /logdir/duorat-bert/output_geo.json --cache_filepath data/database/geo_test/geo_cache.json --timeout 180
python scripts/filter_results.py /logdir/duorat-bert/output_geo.json

You might want to increase the timeout if your system produces queries that are correct but slow to execute.

Who we are

Acknowledgements

This implementation was originally based on the seq2struct codebase. Further model development followed the RAT-SQL paper in many aspects.

How to cite

@article{scholak_duorat_2020,
  title = {{DuoRAT}: {Towards} {Simpler} {Text}-to-{SQL} {Models}},
  author = {Scholak, Torsten and Li, Raymond and Bahdanau, Dzmitry and de Vries, Harm and Pal, Chris},
  year = {2020},
  journal = {arXiv:2010.11119 [cs]},
}

duorat's People

Contributors

duyvuleo, raymondli0, servicenowresearch, tscholak

duorat's Issues

input token tensor has been truncated

When I train DuoRAT with a slightly tweaked duorat-finetune-bert-large.jsonnet, I receive a number of warnings during training (~5 warnings per 10 steps). I want to confirm whether they are expected. Thank you.

2020-12-20 00:45:45 WARNING: input token tensor has been truncated to 512 tokens, original length was 516 tokens
2020-12-20 00:45:53 WARNING: input token tensor has been truncated to 512 tokens, original length was 1367 tokens
2020-12-20 00:45:56 WARNING: source length exceeds maximum source length, 398 > 200, skipping
2020-12-20 00:45:57 WARNING: input token tensor has been truncated to 512 tokens, original length was 524 tokens
2020-12-20 00:45:58 WARNING: input token tensor has been truncated to 512 tokens, original length was 1362 tokens
2020-12-20 00:46:02 WARNING: source length exceeds maximum source length, 393 > 200, skipping
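
For context, the 512-token warnings reflect the hard positional limit of BERT-style encoders, while the "maximum source length" warnings come from the model's own configured limit (200 here), which causes overly long training examples to be skipped. A minimal sketch of the first limit, assuming the Hugging Face transformers package (the input text is made up):

from transformers import BertTokenizerFast

tok = BertTokenizerFast.from_pretrained("bert-large-uncased")
long_text = " ".join(["word"] * 600)  # deliberately longer than 512 subwords
ids = tok(long_text, truncation=True, max_length=512)["input_ids"]
print(len(ids))  # 512 -- everything beyond the limit is dropped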

Running git submodule update --init

Hi, I've followed the instructions in the README, and I seem to have trouble running git submodule update --init. Is it possible that third_party/spider is not public?

Thanks!

git@github.com: Permission denied (publickey).
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.
fatal: clone of 'git@github.com:ElementAI/spider.git' into submodule path '/media/disk1/tomerwolfson/duorat/third_party/spider' failed
Failed to clone 'third_party/spider'. Retry scheduled
Cloning into '/media/disk1/tomerwolfson/duorat/third_party/spider'...
git@github.com: Permission denied (publickey).
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.
fatal: clone of 'git@github.com:ElementAI/spider.git' into submodule path '/media/disk1/tomerwolfson/duorat/third_party/spider' failed
Failed to clone 'third_party/spider' a second time, aborting

Using float 16 in training?

I've noticed that in training some tensors are of the float16 dtype, whereas in validation I only see float32. Is that in line with what you see? Is this intentional? I haven't found the part of the code that causes the float16 conversion; if there is such a conversion, could you please point me to where it is in the code?

Nonstandard Beam Search (?)

It's super minor, but I noticed that in the beam search implementation

https://github.com/ElementAI/duorat/blob/6fba4c3f08d372465780dea0a2198a650dd407f4/duorat/utils/beam_search.py#L59

it collects all the finished hypotheses into an array, expands the rest of the hypotheses, and takes the top K - len(finished). As a result, a finished hypothesis never drops out of the final returned results, and the last candidate is effectively found by greedy decoding towards the end.

I believe a more standard (and perhaps better) implementation is to keep the finished hypotheses in the beam, expand the other hypotheses, and take the top-K hypotheses from the union of the finished hypotheses and the other candidates. In this way, sub-optimal already-finished hypotheses can be excluded.

I do not believe it would substantially change the results in the paper, but I just wanted to point it out. Or maybe both versions are legitimate and I was simply unaware of this one before.

(Reference: Algo 1 in this paper)
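
For concreteness, here is a minimal, hypothetical sketch of the "standard" variant described above (not duorat's implementation; Hypothesis and expand are made-up stand-ins):

from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Hypothesis:
    tokens: List[int]
    score: float
    finished: bool

def beam_step(beam: List[Hypothesis],
              expand: Callable[[Hypothesis], List[Hypothesis]],
              k: int) -> List[Hypothesis]:
    # Finished hypotheses stay in the candidate pool and compete with new expansions,
    # so a weak finished hypothesis can still be pushed out of the beam.
    pool = [h for h in beam if h.finished]
    for h in beam:
        if not h.finished:
            pool.extend(expand(h))
    return sorted(pool, key=lambda h: h.score, reverse=True)[:k]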

How to get around 69 exact_match performance?

Hi authors,

Thanks for the excellent work. I really struggled to get >60 exact_match performance with the RAT-SQL code (from Microsoft); its training was really unstable.

Switching to DuoRAT saves me a lot of time.

For the initial results, I trained DuoRAT with two configurations (duorat-finetune-bert-large & duorat-new-db-content), with only one slight modification, batch_size=4 (instead of 9), due to OOM issues on my GPU (16GB V100).

For duorat-finetune-bert-large, after 55000 steps, I got:

[2020-10-28T08:01:05] Step 5000 stats, val: loss = 0.12590350662727393, easy_exact = 0.7620967741935484, medium_exact = 0.49327354260089684, hard_exact = 0.40804597701149425, extra_exact = 0.2469879518072289, all_exact = 0.5038684719535783
[2020-10-28T09:22:16] Step 10000 stats, val: loss = 0.14725026781031253, easy_exact = 0.7782258064516129, medium_exact = 0.594170403587444, hard_exact = 0.5172413793103449, extra_exact = 0.3614457831325301, all_exact = 0.5880077369439072
[2020-10-28T10:44:27] Step 15000 stats, val: loss = 0.14249609349245057, easy_exact = 0.842741935483871, medium_exact = 0.6547085201793722, hard_exact = 0.5, extra_exact = 0.3253012048192771, all_exact = 0.620889748549323
[2020-10-28T12:04:45] Step 20000 stats, val: loss = 0.15154095876752985, easy_exact = 0.8306451612903226, medium_exact = 0.6636771300448431, hard_exact = 0.5, extra_exact = 0.3795180722891566, all_exact = 0.6305609284332688
[2020-10-28T13:26:16] Step 25000 stats, val: loss = 0.15309250173276034, easy_exact = 0.8790322580645161, medium_exact = 0.6838565022421524, hard_exact = 0.5574712643678161, extra_exact = 0.3674698795180723, all_exact = 0.6586073500967118
[2020-10-28T14:46:39] Step 30000 stats, val: loss = 0.15393706009051325, easy_exact = 0.8548387096774194, medium_exact = 0.6053811659192825, hard_exact = 0.47126436781609193, extra_exact = 0.3433734939759036, all_exact = 0.6005802707930368
[2020-10-28T16:06:07] Step 35000 stats, val: loss = 0.16426039838576167, easy_exact = 0.8669354838709677, medium_exact = 0.6771300448430493, hard_exact = 0.4942528735632184, extra_exact = 0.3433734939759036, all_exact = 0.6382978723404256
[2020-10-28T17:26:28] Step 40000 stats, val: loss = 0.1535649892186829, easy_exact = 0.8467741935483871, medium_exact = 0.647982062780269, hard_exact = 0.5229885057471264, extra_exact = 0.3373493975903614, all_exact = 0.6247582205029013
[2020-10-28T18:46:05] Step 45000 stats, val: loss = 0.16708454437467415, easy_exact = 0.8629032258064516, medium_exact = 0.6636771300448431, hard_exact = 0.5057471264367817, extra_exact = 0.3373493975903614, all_exact = 0.632495164410058
[2020-10-28T20:05:48] Step 50000 stats, val: loss = 0.1519994481004652, easy_exact = 0.8629032258064516, medium_exact = 0.625560538116592, hard_exact = 0.5229885057471264, extra_exact = 0.3433734939759036, all_exact = 0.6199226305609284
[2020-10-28T21:25:47] Step 55000 stats, val: loss = 0.17041306306087417, easy_exact = 0.8306451612903226, medium_exact = 0.6614349775784754, hard_exact = 0.5229885057471264, extra_exact = 0.35542168674698793, all_exact = 0.6295938104448743

For duorat-new-db-content, I got:

[2020-10-28T10:05:47] Step 5000 stats, val: loss = 0.12354135784776286, easy_exact = 0.7983870967741935, medium_exact = 0.5650224215246636, hard_exact = 0.42528735632183906, extra_exact = 0.25301204819277107, all_exact = 0.5473887814313346
[2020-10-28T11:33:05] Step 10000 stats, val: loss = 0.14531567805974493, easy_exact = 0.8306451612903226, medium_exact = 0.5986547085201793, hard_exact = 0.5172413793103449, extra_exact = 0.22289156626506024, all_exact = 0.5802707930367504
[2020-10-28T12:53:33] Step 15000 stats, val: loss = 0.15475997000768765, easy_exact = 0.8387096774193549, medium_exact = 0.6681614349775785, hard_exact = 0.4827586206896552, extra_exact = 0.3674698795180723, all_exact = 0.6295938104448743
[2020-10-28T14:18:32] Step 20000 stats, val: loss = 0.16852696785951896, easy_exact = 0.8064516129032258, medium_exact = 0.6412556053811659, hard_exact = 0.4942528735632184, extra_exact = 0.3674698795180723, all_exact = 0.6121856866537717
[2020-10-28T15:39:26] Step 25000 stats, val: loss = 0.1531622735157472, easy_exact = 0.7741935483870968, medium_exact = 0.6143497757847534, hard_exact = 0.5229885057471264, extra_exact = 0.39759036144578314, all_exact = 0.6025145067698259
[2020-10-28T16:59:41] Step 30000 stats, val: loss = 0.14458048883357177, easy_exact = 0.8467741935483871, medium_exact = 0.647982062780269, hard_exact = 0.5517241379310345, extra_exact = 0.37349397590361444, all_exact = 0.6353965183752418
[2020-10-28T18:20:00] Step 35000 stats, val: loss = 0.14305388022302443, easy_exact = 0.8669354838709677, medium_exact = 0.7174887892376681, hard_exact = 0.4885057471264368, extra_exact = 0.41566265060240964, all_exact = 0.6663442940038685
[2020-10-28T19:39:42] Step 40000 stats, val: loss = 0.1363988071367076, easy_exact = 0.8911290322580645, medium_exact = 0.7040358744394619, hard_exact = 0.5402298850574713, extra_exact = 0.3493975903614458, all_exact = 0.6644100580270793
[2020-10-28T20:59:41] Step 45000 stats, val: loss = 0.1358668204875234, easy_exact = 0.8709677419354839, medium_exact = 0.6345291479820628, hard_exact = 0.4827586206896552, extra_exact = 0.3855421686746988, all_exact = 0.625725338491296
[2020-10-28T22:19:24] Step 50000 stats, val: loss = 0.14011963381750844, easy_exact = 0.8790322580645161, medium_exact = 0.679372197309417, hard_exact = 0.5, extra_exact = 0.3433734939759036, all_exact = 0.6431334622823984

Do they look correct? Do I need to train for a bit longer to get to around 69?

Thanks!

Problems loading jsonnet files

Hey there!

I am trying to run the DuoRAT code from the Docker container, and I am experiencing issues with loading the jsonnet files:

root@duorat:/app# python scripts/train.py --config configs/duorat/duorat-finetune-bert-large.jsonnet --logdir logdir/duorat-bert
Traceback (most recent call last):
  File "scripts/train.py", line 519, in <module>
    main()
  File "scripts/train.py", line 486, in main
    config = json.loads(_jsonnet.evaluate_file(args.config))
RuntimeError: RUNTIME ERROR: couldn't open import "../../data/train.libsonnet": no match locally or in the Jsonnet library paths.
	configs/duorat/duorat-base.libsonnet:5:17-52	object <anonymous>
	configs/duorat/duorat-base.libsonnet:(4:11)-(7:6)	object <anonymous>
	During manifestation	

And when I sanity-check and look in the configs directory, I am able to find these files:

root@duorat:/app# ls configs/duorat/
duorat-12G.jsonnet     duorat-bert.jsonnet  duorat-finetune-bert-base.jsonnet                  duorat-finetune-bert-large.jsonnet   duorat-good-no-bert.jsonnet             duorat-new-db-content.jsonnet
duorat-base.libsonnet  duorat-dev.jsonnet   duorat-finetune-bert-large-attention-maps.jsonnet  duorat-good-no-bert-no-from.jsonnet  duorat-new-db-content-no-whole.jsonnet

I have of course already googled around to see what the source of this error could be, but I can't seem to find anything helpful enough. Would you advise maybe that I try to hardcode this, or do you have any intuition as to why I might be getting this error?

-- Edit with more details

I suspect the problem may have to do with this command, where logdir and data are mounted into the container:

nvidia-docker run -it -u $(id -u ${USER}) --name my_duorat --rm -v $PWD/logdir:/logdir -v $PWD/data/:/app/data duorat

Because I am not running the Docker container locally, I did not use this command. Rather, I created an image on Google Cloud Platform and then spun it up as an interactive bash session to work with.
I tried making symlinks and also using mount to reproduce the mounts from the docker command, but that hasn't fixed the problem. For example:

root@duorat:/app# mount $PWD/data/ /app/data
mount: /app/data/: mount point does not exist.

Thanks so much!

License for commercial use

Dear authors,

Thanks for the great work.

I really like the codebase. I adopted this code and played around with it for several months. It worked very well compared to other codebases.

Unfortunately, this codebase is not licensed for commercial use. That might be due to your company policy, but it is restrictive.

Is there any chance of a more open license so that it can be used commercially? Just out of curiosity.

Thanks.

docker: Error response from daemon: Unknown runtime specified nvidia.

I got this error when running the following command:

nvidia-docker run -it -u $(id -u ${USER}) --name my_duorat --rm -v $PWD/logdir:/logdir -v $PWD/data/:/app/data duorat

And I fixed this issue by replacing nvidia-docker with docker:

docker run -it -u $(id -u ${USER}) --name my_duorat --rm -v $PWD/logdir:/logdir -v $PWD/data/:/app/data duorat

Not sure whether it is a typo.
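
For what it's worth, on recent Docker versions with the NVIDIA Container Toolkit installed, GPU access can also be requested from plain docker via the --gpus flag, which avoids the legacy nvidia runtime entirely:

docker run --gpus all -it -u $(id -u ${USER}) --name my_duorat --rm -v $PWD/logdir:/logdir -v $PWD/data/:/app/data duorat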

Pre-trained model

Could you please release a pre-trained model for the final DuoRAT system (which gets 69.9 ± 0.8% accuracy on the dev set of Spider)?

asdl license

Hello,

First, thanks for publishing this, great work!

The license for the entire repository is non-commercial, yet the asdl sub-package contains a mixture of MIT code and your additions (presumably under the same non-commercial license). Would it be possible to re-license at least the asdl utility package as MIT? It (a) already builds upon open-source MIT code, and (b) is not the core contribution of the project. That would be much appreciated by the community and would help with referencing this work and building upon it.

Requirements

Hi, I wonder if the requirements are the same as RAT-SQL's. Could you please add a requirements.txt to the GitHub repo?
