Coder Social home page Coder Social logo

tshprecher / antlr_psql Goto Github PK

View Code? Open in Web Editor NEW
76.0 5.0 31.0 6.35 MB

ANTLR v4 grammar for Postgres SQL v10 (work in progress)

License: MIT License

ANTLR 24.62% Shell 0.95% Python 1.93% Java 6.15% PLpgSQL 17.47% PLSQL 1.48% SQLPL 3.83% HTML 2.65% TSQL 40.91%
grammar antlr antlr4 postgresql postgres psql sql

antlr_psql's Introduction

WORK IN PROGRESS

Antlr4 grammar for PostgresSQL v10.

This project is in development. See /test_coverage.html for a breakdown of test results by query type. Development is test driven, with tests generated by scraping sql commands directly from the source postgres repo. See the scripts section below for details.

Setup

This project was built on MacOs v.10.14 using

  • Maven v.3.3.9
  • Python v.2.7.13
  • Java v.1.8.0_144

Maven and Java are used for building and testing. Some scripts use Python.

Generating the parser

To generate the Java parser code used for testing, run mvn compile.

Testing

Test queries live in src/test/resources categorized by query type. Each has been scraped from code in the source postgres repo and each query type has a Java test harness. For example, running mvn -Dtest=SelectCommandTest test runs tests against all SELECT queries. To run tests against all queries, run mvn test.

Most tests fail simply because this project is incomplete. The development process has been test -> fail -> pick a failing query -> fix it. Addressing the root cause of a single failing test often fixes others, although diminishing returns applies as total accuracy increases.

Note that since each test lives in its own file and filesystems have minimum page sizes (e.g. 4K), the size of this repo can be over 80MB. Having each test in single file has been helpful for debugging. Test queries may be consolidated into fewer files when this project reaches a stable state.

Scripts

Generating test files

As mentioned, tests have been generated from the official postgres repo (https://github.com/postgres/postgres). To regenerate them, fetch the repository at the desired version and run ./scripts/gen_psql_tests src/test/regress/sql /tmp/my_local_psql_tests/ where src/test/regress/sql is a directory in the source postgres project.

Parsing a specific query

Sometimes it is useful to see exactly where the parser fails on a specific query. You could use IntelliJ Idea Plugin for ANTLR v4 for testing. Also there's a script for that. To see where the parser fails on selectttt 1, run ./scripts/parse "selectttt 1" stmt. The first argument is the query and the second the grammar rule, with stmt being the root rule defined in PostgreSQLParser.g4.

Generate test coverage report

The /test_coverage.html test coverage report can be generated from piping a script that runs all the tests and filters out summary lines into one that formats those lines into an html table. Simply run ./scripts/get_test_output | ./scripts/format_test_results > test_coverage.html. Note that this can take a minute through the entire suite.

./scripts/get_test_output also fills failed.log with the names of currently failed scripts. Sometimes small changes in parser could cause a decrease of the success rates in the coverage report. scripts/print_uncommitted_failed_sql prints to console a code of the statements that previously were success and now have status 'failed'. In normal way the length of failed.log should only decrease.

antlr_psql's People

Contributors

mebelousov avatar tshprecher avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar

antlr_psql's Issues

Exception when running the example parse shell script from the README

Running the example parse shell script from the README produces an exception

git clone https://github.com/tshprecher/antlr_psql.git
cd antlr_psql
mvn compile
./scripts/parse "selectttt 1" stmt

Exception in thread "main" java.lang.StringIndexOutOfBoundsException: String index out of range: 0
	at java.lang.String.charAt(String.java:658)
	at org.antlr.v4.Tool.handleArgs(Tool.java:207)
	at org.antlr.v4.Tool.<init>(Tool.java:195)
	at org.antlr.v4.Tool.main(Tool.java:164)

[TOKENS]
  IDENTIFIER           'selectttt'
  INTEGER_LITERAL      '1'
  EOF                  '<EOF>'

[PARSE-TREE]
  (stmt
    (abort_stmt
      (identifier selectttt)))

java -version
openjdk version "1.8.0_212"
OpenJDK Runtime Environment (AdoptOpenJDK)(build 1.8.0_212-b03)
OpenJDK 64-Bit Server VM (AdoptOpenJDK)(build 25.212-b03, mixed mode)
  • also get the same exception for a syntactically correct SQL statement.
./scripts/parse "select 1" stmt
Exception in thread "main" java.lang.StringIndexOutOfBoundsException: String index out of range: 0
	at java.lang.String.charAt(String.java:658)
	at org.antlr.v4.Tool.handleArgs(Tool.java:207)
	at org.antlr.v4.Tool.<init>(Tool.java:195)
	at org.antlr.v4.Tool.main(Tool.java:164)

[TOKENS]
  SELECT               'select'
  INTEGER_LITERAL      '1'
  EOF                  '<EOF>'

[PARSE-TREE]
  (stmt
    (select_stmt select
      (selector_clause
        (column_list
          (expr 1)))))

properly handle the `WITH` clauses in operations

The script to generate psql tests has a hack to support queries starting with WITH, but WITH is not a type of query. We should be smarter about how we slot queries by properly slotting SELECT, UPDATE, etc... queries with a WITH statement.

Handle some low hanging fruit

@mebelousov, since we're both committed to getting this done by the end of the year, I think we should start segmenting some of the low hanging fruit so we don't both work on the same things. This week, I'd like to knock the following completions:

LISTEN | 0/3 (0.00%)
LOCK | 0/34 (0.00%)
MOVE | 1/10 (10.00%)
NOTIFY | 0/2 (0.00%)
PREPARE | 0/46 (0.00%)
REASSIGN OWNED | 0/4 (0.00%)
REINDEX | 0/6 (0.00%)
RELEASE SAVEPOINT | 0/9 (0.00%)
RESET | 0/49 (0.00%)
REVOKE | 2/77 (2.60%)
ROLLBACK | 0/12 (0.00%)
ROLLBACK PREPARED | 0/4 (0.00%)
ROLLBACK TO SAVEPOINT | 0/10 (0.00%)
SAVEPOINT | 0/22 (0.00%)
SECURITY LABEL | 0/8 (0.00%)

They should be easy and bring lots more green into the test coverage status. I'm creating this issue so we don't end up working on the same things.

Status of the project

I am interested in contributing to this project. I have written several SQL parsers and optimisers over many years. I have also written my own ODBMS back in the day when Postgres started out.

I use antlr regularly.

Cheers

Steve T
([email protected])

Do a pass on performance

Once the grammar is correct, I'd like to do a pass over the tests to improve performance. We've seen high variance in the time it takes to run tests based on seemingly innocuous changes. If there's some recursion would could avoid, we should.

This would involve adding timing data to each test and check for outliers.

psql required for Generating Test Cases

While trying to generate test cases (run ./scripts/gen_psql_tests src/test/regress/sql /tmp/my_local_psql_tests/), the script fails at psql -U postgres -f /tmp/query.sql, since it tries to connect with psql. It is required as the script connects with database to check for any syntactical errors.

create_table_stmt defination

I see there are some motheds is not completed, such as create_table_stmt, create_table_stmt_as and so on. could you provider the implementes of those mothed;

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.