RYANSQL

Introduction

Source code for RYANSQL, a text-to-SQL system for complex, cross-domain databases.

Reference Paper: Choi et al., RYANSQL: Recursively Applying Sketch-based Slot Fillings for Complex Text-to-SQL in Cross-Domain Databases, 2020

The system was submitted to the SPIDER leaderboard. The system and its slightly improved version, RYANSQL v2, were ranked second and fourth as of February 2020.

The system does NOT use any database records, which makes it better suited to real-world company applications.

Requirements

Python 3
TensorFlow 1.14
nltk

Install

Download the BERT pretrained model. You only need to download the model, not the whole git repository. The system uses the BERT-Large, Uncased (Whole Word Masking) model. Unzip the downloaded file.

Download the SPIDER dataset from https://yale-lily.github.io/spider. Unzip the downloaded file.

Train

Run:

python src/trainer.py [BERT_DIR] [SPIDER_DATASET_DIR]

An example is:

python src/trainer.py ./wwm_uncased_L-24_H-1024_A-16 ./spider

Training takes about a day on a single Tesla V100 GPU. The dev set performance reported during training is the exact slot matching accuracy, which also counts ordering; for the final model it ranges between 55% and 57%.

The required files of the SPIDER dataset are: tables.json, train_spider.json, train_others.json, plus dev.json for testing.
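Since training takes about a day, it may be worth confirming that the dataset directory is complete before starting. The following is a minimal sketch of such a check; the helper name `missing_spider_files` is hypothetical and not part of the RYANSQL source, and the sketch assumes the four files sit directly inside the dataset directory.

```python
from pathlib import Path

# File names taken from the list above; their location directly under
# the dataset directory is an assumption of this sketch.
TRAIN_FILES = ("tables.json", "train_spider.json", "train_others.json", "dev.json")


def missing_spider_files(dataset_dir, required=TRAIN_FILES):
    """Return the required file names that are absent from dataset_dir."""
    root = Path(dataset_dir)
    return [name for name in required if not (root / name).is_file()]


if __name__ == "__main__":
    import sys

    # Usage: python check_spider.py ./spider  (script name is hypothetical)
    missing = missing_spider_files(sys.argv[1])
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All required files found.")
```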

Evaluate

Clone the Spider repository (https://github.com/taoyds/spider) and add its local directory to Python's sys.path.
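One possible way to add the cloned directory to sys.path from within Python is sketched below. The helper `add_to_sys_path` and the example location `./spider_repo` are hypothetical; setting the PYTHONPATH environment variable before launching the script should work equally well.

```python
import sys
from pathlib import Path


def add_to_sys_path(directory):
    """Prepend directory to sys.path (once) and return its absolute path."""
    path = str(Path(directory).expanduser().resolve())
    if path not in sys.path:
        sys.path.insert(0, path)
    return path


if __name__ == "__main__":
    # "./spider_repo" is a placeholder for wherever the Spider repository was cloned.
    print(add_to_sys_path("./spider_repo"))
```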

Run:

python src/actual_test.py [MODEL_PATH] [BERT_DIR] [SPIDER_DATASET_DIR] [OUT_FILE]

to generate the SQL statements for the development set. The generated output file can then be evaluated using SPIDER's evaluation script.
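As a rough sketch, the evaluation script in the Spider repository can be launched from Python as shown below. The flags follow the usage documented in the Spider repository (`--gold`, `--pred`, `--etype`, `--db`, `--table`); the helper `build_eval_command`, the gold file name `dev_gold.sql`, and the `database` subdirectory are assumptions about the local layout, so check them against your own checkout.

```python
import subprocess
import sys
from pathlib import Path


def build_eval_command(spider_repo, dataset_dir, pred_file,
                       gold_name="dev_gold.sql", etype="match"):
    """Build the argument list for Spider's evaluation.py (paths are assumptions)."""
    repo = Path(spider_repo)
    data = Path(dataset_dir)
    return [
        sys.executable, str(repo / "evaluation.py"),
        "--gold", str(data / gold_name),      # gold SQL file (assumed name)
        "--pred", str(pred_file),             # [OUT_FILE] written by actual_test.py
        "--etype", etype,                     # "match", "exec", or "all"
        "--db", str(data / "database"),       # database directory (assumed name)
        "--table", str(data / "tables.json"),
    ]


if __name__ == "__main__":
    # All three paths are placeholders.
    cmd = build_eval_command("./spider_repo", "./spider", "./out.sql")
    subprocess.run(cmd, check=True)
```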

With the final model, the evaluation script reports between 64% and 66%. This is higher than the score shown during training, since the ordering of conditions, which the training-time metric counts, is not important for an actual SQL statement.
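The gap between the two numbers can be illustrated with a toy comparison. This is only a sketch of the idea, not the logic of the trainer or of the SPIDER evaluation script; the condition tuples and both function names are made up for the example.

```python
from collections import Counter


def ordered_match(pred_conditions, gold_conditions):
    """Strict comparison: the same conditions in the same order."""
    return list(pred_conditions) == list(gold_conditions)


def unordered_match(pred_conditions, gold_conditions):
    """Order-insensitive comparison: the same conditions in any order."""
    return Counter(pred_conditions) == Counter(gold_conditions)


# Hypothetical WHERE conditions as (column, operator, value) tuples.
GOLD = [("age", ">", "30"), ("city", "=", "'Seoul'")]
PRED = [("city", "=", "'Seoul'"), ("age", ">", "30")]

if __name__ == "__main__":
    print(ordered_match(PRED, GOLD))    # False: same conditions, different order
    print(unordered_match(PRED, GOLD))  # True: order is ignored
```

A prediction like `PRED` is counted as wrong under the strict comparison but as correct once ordering is ignored, which is consistent with the higher score from the evaluation script.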

The required SPIDER dataset files are tables.json, for database schema information, and dev.json, for the development dataset.

Contact

[email protected]
