Extensible SQL Lexer and Parser for Rust

License Version Build Status Coverage Status Gitter Chat

This crate contains a lexer and parser for SQL that conforms to the ANSI/ISO SQL standard and other dialects. It is used as a foundation for SQL query engines, vendor-specific parsers, and various SQL analysis tools.

Example

To parse a simple SELECT statement:

use sqlparser::dialect::GenericDialect;
use sqlparser::parser::Parser;

let sql = "SELECT a, b, 123, myfunc(b) \
           FROM table_1 \
           WHERE a > b AND b < 100 \
           ORDER BY a DESC, b";

let dialect = GenericDialect {}; // or AnsiDialect, or your own dialect ...

let ast = Parser::parse_sql(&dialect, sql).unwrap();

println!("AST: {:?}", ast);

This outputs

AST: [Query(Query { ctes: [], body: Select(Select { distinct: false, projection: [UnnamedExpr(Identifier("a")), UnnamedExpr(Identifier("b")), UnnamedExpr(Value(Long(123))), UnnamedExpr(Function(Function { name: ObjectName(["myfunc"]), args: [Identifier("b")], filter: None, over: None, distinct: false }))], from: [TableWithJoins { relation: Table { name: ObjectName(["table_1"]), alias: None, args: [], with_hints: [] }, joins: [] }], selection: Some(BinaryOp { left: BinaryOp { left: Identifier("a"), op: Gt, right: Identifier("b") }, op: And, right: BinaryOp { left: Identifier("b"), op: Lt, right: Value(Long(100)) } }), group_by: [], having: None }), order_by: [OrderByExpr { expr: Identifier("a"), asc: Some(false) }, OrderByExpr { expr: Identifier("b"), asc: None }], limit: None, offset: None, fetch: None })]

Features

The following optional crate features are available:

  • serde: Adds Serde support by implementing Serialize and Deserialize for all AST nodes.
  • visitor: Adds a Visitor capable of recursively walking the AST tree.

Syntax vs Semantics

This crate provides only a syntax parser; it tries to avoid applying any SQL semantics and accepts queries that specific databases would reject, even when using that database's specific Dialect. For example, CREATE TABLE(x int, x int) is accepted by this crate, even though most SQL engines will reject this statement due to the repeated column name x.

This crate avoids semantic analysis because it varies drastically between dialects and implementations. If you want to do semantic analysis, feel free to use this project as a base.
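Because the parser stays purely syntactic, checks like rejecting the duplicated column name above are the caller's job. Below is a minimal sketch of such a post-parse check; note that `ColumnDef` here is a hypothetical stand-in type for illustration, not the crate's actual AST:

```rust
use std::collections::HashSet;

// Hypothetical stand-in for a column definition in a CREATE TABLE AST.
struct ColumnDef {
    name: String,
}

// Returns the first duplicated column name, if any.
fn find_duplicate_column(columns: &[ColumnDef]) -> Option<&str> {
    let mut seen = HashSet::new();
    columns
        .iter()
        .map(|c| c.name.as_str())
        // `insert` returns false when the name was already present.
        .find(|name| !seen.insert(*name))
}

fn main() {
    // Mirrors the CREATE TABLE(x int, x int) example: the parser accepts
    // it, and a semantic layer built on top can flag the duplicate.
    let cols = vec![
        ColumnDef { name: "x".into() },
        ColumnDef { name: "x".into() },
    ];
    assert_eq!(find_duplicate_column(&cols), Some("x"));
}
```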

Preserves Syntax Round Trip

This crate allows users to recover the original SQL text (with comments removed, normalized whitespace and keyword capitalization), which is useful for tools that analyze and manipulate SQL.

This means that other than comments, whitespace and the capitalization of keywords, the following should hold true for all SQL:

// Parse SQL
let ast = Parser::parse_sql(&GenericDialect, sql).unwrap();

// The original SQL text can be generated from the AST
assert_eq!(ast[0].to_string(), sql);

There are still some cases in this crate where different SQL statements with seemingly similar semantics are represented by the same AST. We welcome PRs to fix such issues and to distinguish different syntaxes in the AST.

SQL compliance

SQL was first standardized in 1987, and revisions of the standard have been published regularly since. Most revisions have added significant new features to the language, and as a result no database claims to support the full breadth of features. This parser currently supports most of the SQL-92 syntax, plus some syntax from newer versions that have been explicitly requested, plus some MSSQL, PostgreSQL, and other dialect-specific syntax. Whenever possible, the online SQL:2016 grammar is used to guide what syntax to accept.

Unfortunately, stating anything more specific about compliance is difficult. There is no publicly available test suite that can assess compliance automatically, and doing so manually would strain the project's limited resources. Still, we are interested in eventually supporting the full SQL dialect, and we are slowly building out our own test suite.

If you are assessing whether this project will be suitable for your needs, you'll likely need to experimentally verify whether it supports the subset of SQL that you need. Please file issues about any unsupported queries that you discover. Doing so helps us prioritize support for the portions of the standard that are actually used. Note that if you urgently need support for a feature, you will likely need to write the implementation yourself. See the Contributing section for details.

Command line

This crate contains a CLI program that can parse a file and dump the results as JSON:

$ cargo run --features json_example --example cli FILENAME.sql [--dialectname]

Users

This parser is currently being used by the DataFusion query engine, LocustDB, Ballista, GlueSQL, Opteryx, PRQL, Qrlew, JumpWire, and ParadeDB.

If your project is using sqlparser-rs feel free to make a PR to add it to this list.

Design

The core expression parser uses the Pratt Parser design, which is a top-down operator-precedence (TDOP) parser, while the surrounding SQL statement parser is a traditional, hand-written recursive descent parser. Eli Bendersky has a good tutorial on TDOP parsers, if you are interested in learning more about the technique.

We prefer this design pattern over parser generators for the following reasons:

  • Code is simple to write and can be concise and elegant
  • Performance is generally better than code generated by parser generators
  • Debugging is much easier with hand-written code
  • It is far easier to extend and make dialect-specific extensions compared to using a parser generator
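The core idea can be illustrated with a self-contained toy: a Pratt parser for arithmetic, where each infix operator advertises a binding power and the parse loop continues while the next operator binds more tightly than the current minimum. This is a sketch of the technique only, not the crate's actual implementation:

```rust
// Minimal Pratt (top-down operator-precedence) expression parser over
// integer literals, `+`, and `*`.
#[derive(Clone, Copy, Debug)]
enum Token {
    Num(i64),
    Plus,
    Star,
    Eof,
}

struct Parser {
    tokens: Vec<Token>,
    pos: usize,
}

impl Parser {
    fn peek(&self) -> Token {
        *self.tokens.get(self.pos).unwrap_or(&Token::Eof)
    }

    fn next(&mut self) -> Token {
        let t = self.peek();
        self.pos += 1;
        t
    }

    // Binding power of the next infix operator; 0 stops the loop.
    fn precedence(&self) -> u8 {
        match self.peek() {
            Token::Plus => 10,
            Token::Star => 20,
            _ => 0,
        }
    }

    // Consume a prefix (here, just a literal), then greedily attach
    // infix operators that bind more tightly than `min_prec`.
    fn parse_expr(&mut self, min_prec: u8) -> i64 {
        let mut lhs = match self.next() {
            Token::Num(n) => n,
            t => panic!("unexpected token {:?}", t),
        };
        while self.precedence() > min_prec {
            let prec = self.precedence();
            let op = self.next();
            let rhs = self.parse_expr(prec);
            lhs = match op {
                Token::Plus => lhs + rhs,
                Token::Star => lhs * rhs,
                _ => unreachable!(),
            };
        }
        lhs
    }
}

fn main() {
    // 1 + 2 * 3 evaluates as 1 + (2 * 3) because `*` binds tighter.
    let mut p = Parser {
        tokens: vec![
            Token::Num(1), Token::Plus, Token::Num(2), Token::Star, Token::Num(3),
        ],
        pos: 0,
    };
    assert_eq!(p.parse_expr(0), 7);
}
```

The real crate builds an AST rather than evaluating, but the shape of the loop is the same: a precedence lookup per token, and a recursive call whose minimum precedence enforces associativity.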

Supporting custom SQL dialects

This is a work in progress, but we have some notes on writing a custom SQL parser.
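To give a flavor of what a dialect hook looks like, here is a self-contained toy version of a dialect trait with the is_identifier_start / is_identifier_part methods shown elsewhere in this document. This sketch defines its own trait for illustration; the crate's real Dialect trait has additional methods:

```rust
// Toy stand-in for the crate's Dialect trait: a dialect decides which
// characters may begin and continue an unquoted identifier.
trait Dialect {
    fn is_identifier_start(&self, ch: char) -> bool;
    fn is_identifier_part(&self, ch: char) -> bool;
}

struct MyDialect;

impl Dialect for MyDialect {
    fn is_identifier_start(&self, ch: char) -> bool {
        ch.is_ascii_alphabetic() || ch == '_'
    }

    fn is_identifier_part(&self, ch: char) -> bool {
        // Allowing `$` in identifier bodies, as some dialects do.
        ch.is_ascii_alphanumeric() || ch == '_' || ch == '$'
    }
}

fn main() {
    let d = MyDialect;
    assert!(d.is_identifier_start('a'));
    assert!(!d.is_identifier_start('1')); // identifiers can't start with a digit
    assert!(d.is_identifier_part('$'));
}
```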

Contributing

Contributions are highly encouraged! However, the bandwidth we have to maintain this crate is limited. Please read the following sections carefully.

New Syntax

PRs that add support for, or fix a bug in, a feature from the SQL standard or a popular RDBMS (such as Microsoft SQL Server or PostgreSQL) will likely be accepted after a brief review. Any SQL feature that is dialect-specific should be parsed by both the relevant Dialect as well as GenericDialect.

Major API Changes

The current maintainers do not plan for any substantial changes to this crate's API. PRs proposing major refactors are not likely to be accepted.

Testing

While we hope to review PRs in a reasonably timely fashion, it may take a week or more. In order to speed the process, please make sure the PR passes all CI checks, and includes tests demonstrating your code works as intended (and to avoid regressions). Remember to also test error paths.

PRs without tests will not be reviewed or merged. Since CI ensures that cargo test, cargo fmt, and cargo clippy pass, you should run all three commands locally before submitting your PR.

Filing Issues

If you are unable to submit a patch, feel free to file an issue instead. Please try to include:

  • some representative examples of the syntax you wish to support or fix;
  • the relevant bits of the SQL grammar, if the syntax is part of SQL:2016; and
  • links to documentation for the feature for a few of the most popular databases that support it.

Unfortunately, if you need support for a feature, you will likely need to implement it yourself, or file a sufficiently well-described ticket that another member of the community can do so. Our goal as maintainers is to facilitate the integration of various features from various contributors, but not to provide the implementations ourselves, as we simply don't have the resources.

Licensing

All code in this repository is licensed under the Apache Software License 2.0.

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be licensed as above, without any additional terms or conditions.

sqlparser-rs's Issues

Remove copyright from headers

The file headers are very inconsistent right now. We should probably have CI fail if files don't have ASL headers so that this code is kept in compliance with ASL and could potentially be contributed to an Apache project in the future. We should certainly remove Copyright 2018 Grove Enterprises LLC from all headers.

Support join associativity

Alternate join associativity can be forced with parentheses, as in:

SELECT * FROM a NATURAL JOIN (b NATURAL JOIN c);

The current parser doesn't support it, nor is the design of the SQLSelect struct particularly conducive to it.

[mssql] Support DECLARE statements

I am running into a few errors trying to parse MsSql.

The following sql

DECLARE @Var1 INTEGER;

reports the following error:

ParserError("Unexpected keyword \"DECLARE\" at the beginning of a statement")

This is a pretty normal thing to appear in T-SQL scripts in my experience, though I may be misunderstanding why this is an error.

Implement SQL writer

It is often useful to be able to generate SQL from the AST. This opens up possibilities for SQL manipulation or translation. The writer should recursively walk the AST and write SQL to a writer / output stream.

A secondary benefit of having a writer is that it often simplifies writing unit tests for the parser, since a SQL statement can be parsed, re-written, and the output compared with the input (ignoring whitespace differences).
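This kind of recursive writer maps naturally onto Rust's Display trait, which is what the round-trip assertion earlier in this document relies on. A sketch using a made-up three-variant AST (the crate's real AST is much richer):

```rust
use std::fmt;

// Toy AST: the real crate implements Display on its AST nodes so that
// to_string() regenerates SQL text.
enum Expr {
    Ident(String),
    Number(i64),
    BinaryOp {
        left: Box<Expr>,
        op: &'static str,
        right: Box<Expr>,
    },
}

impl fmt::Display for Expr {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Expr::Ident(s) => write!(f, "{}", s),
            Expr::Number(n) => write!(f, "{}", n),
            // Recursing via `{}` walks the tree depth-first.
            Expr::BinaryOp { left, op, right } => write!(f, "{} {} {}", left, op, right),
        }
    }
}

fn main() {
    let e = Expr::BinaryOp {
        left: Box::new(Expr::Ident("a".into())),
        op: ">",
        right: Box::new(Expr::Number(100)),
    };
    assert_eq!(e.to_string(), "a > 100");
}
```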

Move postgres specific code into postgres dialect

The recent PR for postgres support has led to postgres specific parsing and writing code creeping into the base sqlparser module. We need to move the postgres specific code into the postgres dialect module.

Thoughts on maintainership

Hi @andygrove! 👋

First and foremost: thanks for all the work on this SQL parser! (With a big shoutout to @nickolay.) It's saved us a lot of time at @MaterializeInc, where we're using the parser in a SQL streaming dataflow engine.

I just got permission to open source our fork (https://github.com/MaterializeInc/sqlparser), which has a number of fixes on top of master. I've opened PRs for the first round of patches, but there's already a bunch of patches I haven't yet PR'd, and there's a bunch more improvements that we have planned.

We'd like to upstream as many patches as possible, but I'm nervous about the amount by which our fork is diverging already. (For example, improving standards compliance for CREATE TABLE is a rather invasive refactor: MaterializeInc/sqlparser@5fa7fe6.) I'm more than happy to do the work of extracting and PRing the upstreamable bits from our fork, but the work is exponential in the number of open, interdependent PRs, and I've already accumulated quite a few.

Since you mentioned your time is limited these days, I was wondering if you'd be comfortable letting me and @nickolay merge PRs without your approval. Totally understand if you'd rather build up to that level of trust, but I wanted to offer to step up as a maintainer to reduce the burden on you. I'd be happy to limit myself to those PRs which fix uncontroversial deficiencies, too, leaving some of the larger questions around supporting multiple dialects to when you have more time.

Parser mishandles quoted column names

The tokenizer nicely handles quoted column names, independent of dialect, e.g., [column name], "column name", and `column name` are all correctly recognized, and the unquoted identifier column name is stored in the value slot of the corresponding Word. But then the parser's as_ident() restores the quotes by calling to_string() instead of just returning the value of self.value. I can't speak to all 7 lines that invoke this method, but this seems like the wrong implementation, at least when parsing column names and identifiers. If there are no objections, I'll submit a pull request to fix this.

CREATE INDEX?

Not sure if I'm misusing things or misunderstood (it does look like index operations of any kind are not defined in the SQL specs I've peeked at), but is there a particular reason that it's currently not possible to parse a CREATE INDEX/DROP INDEX statement?

SELECT TOP N syntax (i/o LIMIT) for MSSQL

This is minor but seemed like potentially a good first issue, and the README mentioned you were interested in PRs to add support for RDBMS like MS SQL Server:

SQL Server syntax for limiting the result set is SELECT TOP N ... or SELECT TOP(N) ... optionally followed by PERCENT and/or WITH TIES -- see Microsoft Docs link. I don't see TOP in the keyword list and was able to create a failing test for parsing this. Thinking of submitting a PR to add support for this syntax.

Support case sensitive parsing

So by standard SQL queries are not case sensitive. But for me, this is problematic because I like to encode metadata information into database structures (like table names, column names, etc.) so that database introspection can work well. But then this means I have to quote all names all the time in all SQL queries.

So I would love if this package could support case sensitive parsing (and unparsing, #18). This could allow one to read (and rewrite) queries without changing the case.

I am mentioning this because as this package becomes more SQL standardized, it would be useful to keep some extensions available (like case sensitivity).

Thoughts on SQL Extensions

I've been playing around with a trait based solution to allowing dialects to carry their own extension definitions. Would something along these lines be accepted?

Obviously the example is a toy, and there would need to be some additional functions defined on the trait to be able to extract the AST from the boxed Trait, but I thought it was better to get feedback before rushing off and implementing.

Precedence of NOT isn't quite right

It took me a while to come up with this example, and it's not totally realistic, but it's probably still worth fixing. Basically, the precedence of NOT LIKE is mishandled, because the precedence of NOT is fixed at its unary operator precedence, when it needs to vary based on whether the NOT keyword is followed by LIKE/BETWEEN/IN/etc.

Concretely, SELECT a LIKE b IS NULL parses correctly, with the LIKE binding more tightly than the IS NULL:

SQLIsNull(
    SQLBinaryExpr {
        left: SQLIdentifier(
            "a"
        ),
        op: Like,
        right: SQLIdentifier(
            "b"
        )
    }
)

But when NOT LIKE is substituted for LIKE, the IS NULL incorrectly binds more tightly than the NOT LIKE:

SQLBinaryExpr {
    left: SQLIdentifier(
        "a"
    ),
    op: NotLike,
    right: SQLIsNull(
        SQLIdentifier(
            "b"
        )
    )
}

Support `proc-macro` TokenStream

I'm currently writing a proc macro that uses SQL to generate code that works with the storage pattern in specs. As is, I have to call display on the TokenStream, which converts it to a raw string. This means I'm undoing the tokenization only for this parser to re-tokenize it, when it should be possible for this parser to use the TokenStream directly.

This is low priority for me, as this issue only causes excess compilation time, but after creating this proc macro, I can add it to my own to-do.

Add support for prepared queries.

Add support for prepared queries.

let sql = "SELECT * FROM users WHERE name = $1";
AST: Err(
    ParserError(
        "Did not expect Char(\'$\') at the beginning of an expression"
    )
)

Release schedule

Extracting discussion from #105:

We should also talk about a release schedule at some point. The last release was 2 months ago, apparently.

We have at least one user who's requested a new release. Some options:

  • Follow the Rust approach of releasing every n weeks, regardless of what's ready and what's not.
  • Aim for a regular cadence, but tie the release to the readiness of certain issues/features/whatever.
  • Release when the list of unreleased features piles up sufficiently; basically, whenever users are asking for it.

@nickolay @andygrove thoughts?

SQL:2016

There's a new version of the standard, so it'd be great to catch up with it.

Add support for LIMIT prepared query

Hi, I am using sqlparser 0.4.0 and rust version 1.36.0

Sql Query
SELECT "emp"."id" FROM "emp" WHERE "emp"."id" = $1 LIMIT $2 -- binds: [123, 5]

Parse error
ParserError("Expected literal int, found: $2")

Used Dialect:

impl Dialect for AckoPostgresSqlDialect {
    fn is_identifier_start(&self, ch: char) -> bool {
        (ch >= 'a' && ch <= 'z')
            || (ch >= 'A' && ch <= 'Z')
            || (ch == '@')
            || ch == '$'
            || ch == '_'
    }

    fn is_identifier_part(&self, ch: char) -> bool {
        (ch >= 'a' && ch <= 'z')
            || (ch >= 'A' && ch <= 'Z')
            || (ch >= '0' && ch <= '9')
            || (ch == '@')
            || ch == '$'
            || ch == '_'
    }
}

Add AddColumn in AlterTableOperation.

The ALTER TABLE statement seems to lack support for adding a new column to an existing table.

pub enum AlterTableOperation {
    AddConstraint(TableConstraint),
    DropConstraint { name: Ident },
}

Add Support for Postgres JSON prepared query

I am using sqlparser 0.4.0 with rust version 1.36.0
Query:
SELECT "emp"."id", "emp"."data" FROM "emp" WHERE id = $1 AND data->>'id'=$2 -- binds: ["emp_name", "123"]

Error:
ParserError("Expected an expression, found: >")

Dialect Used:

impl Dialect for MyPostgresSqlDialect {
    fn is_identifier_start(&self, ch: char) -> bool {
        (ch >= 'a' && ch <= 'z')
            || (ch >= 'A' && ch <= 'Z')
            || (ch == '@')
            || ch == '$'
            || ch == '_'
    }

    fn is_identifier_part(&self, ch: char) -> bool {
        (ch >= 'a' && ch <= 'z')
            || (ch >= 'A' && ch <= 'Z')
            || (ch >= '0' && ch <= '9')
            || (ch == '@')
            || ch == '$'
            || ch == '_'
    }
}

Even when I add the characters - and > to is_identifier_part, it gives the error: ParserError("Expected end of statement, found: 'id'")

No prefix parser for keyword NOT

Expressions with NOT don't seem to parse, returning No prefix parser for keyword NOT instead.
E.g. SELECT count(0) FROM default WHERE NOT passenger_count <> 1

Design suggestion: change ASTNode variants to contain concrete structs

As is, the example below can be a necessary evil. Using separate, concrete structs allows the data to be passed as a single value.

fn main() {
    // ...
    match sql_ast {
        SQLSelect { projection, relation, joins, selection, 
                    order_by, group_by, having, limit } =>  {
            select(projection, relation, joins, selection,
                   order_by, group_by, having, limit, &mut out_ts);
        },
        _ => {}
    }
}

fn select(projection: Vec<ASTNode>, relation: Option<Box<ASTNode>>, joins: Vec<Join>,
          selection: Option<Box<ASTNode>>, order_by: Option<Vec<SQLOrderByExpr>>,
          group_by: Option<Vec<ASTNode>>, having: Option<Box<ASTNode>>, limit: Option<Box<ASTNode>>,
          out_ts: &mut TokenStream) {
    // ...
}

Support ANSI SQL datetime / interval literals

See:

An open question is whether the date/time values should store the string values intact or parse the strings and store them as chrono values as in the code I removed in https://github.com/andygrove/sqlparser-rs/pull/61 (andygrove@de177f1) - I'm leaning towards the former, so that a user of this library can choose the preferred representation.

Add support for parsing placeholders

Maybe I missed something, but I think placeholders are not yet supported and cannot be parsed. I would like to ask if this could be added.

Ideally, with a few extensions beyond a regular parser:

  • In regular SQL queries, only SELECT, INSERT, and UPDATE statements can have placeholders. But for my use case it would be great if other statements could have them as well, like CREATE VIEW, which takes a query that could itself have placeholders, because CREATE VIEW is specially processed in my case. (See mit-pdos/noria#142 for more information.)
  • Common SQL databases support position-based placeholders; I would love it if some way of having keyword-based ones could be added. This maps very well to how one can provide values with a dict structure instead of by position.
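For illustration, recognizing a positional placeholder like $1 during tokenization takes very little code. The helper below is hypothetical (not the crate's API): it checks for a $ followed by digits and returns the placeholder index plus the number of bytes consumed:

```rust
// Hypothetical tokenizer helper: scan a positional placeholder such as
// `$1` at the start of `input`, returning (index, bytes_consumed).
fn scan_placeholder(input: &str) -> Option<(u32, usize)> {
    let rest = input.strip_prefix('$')?;
    let digits: String = rest
        .chars()
        .take_while(|c| c.is_ascii_digit())
        .collect();
    if digits.is_empty() {
        // A bare `$` (or `$foo`) is not a positional placeholder.
        return None;
    }
    Some((digits.parse().ok()?, 1 + digits.len()))
}

fn main() {
    // `$1 LIMIT` yields placeholder 1, consuming 2 bytes ("$1").
    assert_eq!(scan_placeholder("$1 LIMIT"), Some((1, 2)));
    assert_eq!(scan_placeholder("$42"), Some((42, 3)));
    assert_eq!(scan_placeholder("a"), None);
}
```

Keyword-based placeholders (e.g. $name) would need a different rule, since the text after $ would then follow identifier syntax rather than digit syntax.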

PostgreSQL RETURNING Data From Modified Rows

Postgresql RETURNING

INSERT INTO users (firstname, lastname) VALUES ('Joe', 'Cool') RETURNING id;

UPDATE products SET price = price * 1.10
  WHERE price <= 99.99
  RETURNING name, price AS new_price;

DELETE FROM products
  WHERE obsoletion_date = 'today'
  RETURNING *;
Err(ParserError("Expected end of statement, found: RETURNING"))
