derekstride / tree-sitter-sql Goto Github PK

View Code? Open in Web Editor NEW

146.0 6.0 47.0 43.42 MB

SQL grammar for tree-sitter

Home Page: http://derek.stride.host/tree-sitter-sql/

License: MIT License

JavaScript 87.47% Scheme 7.53% Shell 0.50% C 4.50%

tree-sitter tree-sitter-parser sql

tree-sitter-sql's Introduction

tree-sitter-sql

A general/permissive SQL grammar for tree-sitter.

Installation

We don't commit the generated parser files to the main branch. Instead, you can find them on the gh-pages branch. We're open to feedback & encourage you to open an issue to discuss any problems.

They are also hosted on the GitHub pages site and available for download here: github://derekstride/tree-sitter-sql/gh-pages.tar.gz.

Plugin maintainers ensure to specify the HEAD (or a specific revision) of the gh-pages branch when integrating with this project.

Step 1: Download the parser files

Using git

git clone https://github.com/DerekStride/tree-sitter-sql.git
cd tree-sitter-sql
git checkout gh-pages

Using curl

curl -LO https://github.com/DerekStride/tree-sitter-sql/archive/refs/heads/gh-pages.tar.gz
tar -xzf gh-pages.tar.gz
cd tree-sitter-sql-gh-pages

Step 2: Compile the Parser

Tree-sitter parsers need to be compiled as a shared-object / dynamic-library, you can enable this by passing the -shared & -fPIC flags to your compiler.

cc -shared -fPIC -I./src src/parser.c src/scanner.c -o sql.so

Using cargo

cargo add tree-sitter-sequel

Using npm

npm i @derekstride/tree-sitter-sql

Using pip

pip install tree-sitter-sql

Development

See CONTRIBUTING.md for documentation on how to set up the project for development.

Features

For a complete list of features see the the tests

References

Wikipedia#SQL_syntax - I consulted wikipedia for naming conventions, though I may not have been strict early on in the prototyping.
Phoenix Language Reference - A reference diagram.
SQLite's railroad diagram for expr - Another reference diagram.
Postgresql syntax documentation
Mariadb syntax documentation

Other projects

tree-sitter-sql's People

Contributors

Stargazers

Watchers

Forkers

muskanmahajan37 ffengill dmfay cfabianski zchee jaredramirez matthias-q mattmassicotte kristijanhusak immanuelhume lillymottershead alexkreidler pleshevskiy wishdev leoniephiline alexklapheke abalabahaha trevor-clarke langston-barrett treywood tjdevries toyrockets bombardier200 thecoolwinter antoineb ds-cbo blinxen leamingrad dgmcdona max-trepos barnettben eklmv rien serbanrobu ttitcombe evrensen467 ianks guilhas07 yabutan nvrxq pipedreamhq grpse emanuelturis ctcampbell herringtondarkholme vokturz

tree-sitter-sql's Issues

Comments are not terminated by newline

Example:

--
-- Table structure for table 'tx_digibordauthentication_activity_log'
--
CREATE TABLE tx_digibordauthentication_activity_log
(
  uid         int(11) unsigned    NOT NULL auto_increment,
);

Adding a space to the -- above CREATE does not help.

Adding any letter does help, however:

--
-- Table structure for table 'tx_digibordauthentication_activity_log'
-- X
CREATE TABLE tx_digibordauthentication_activity_log
(
  uid         int(11) unsigned    NOT NULL auto_increment,
);

Could the problem be this regex being eager-matching / greedily-matching?

https://github.com/DerekStride/tree-sitter-sql/blob/main/grammar.js#L331

Could a lazy /.*?\n/ do the trick?

CASE WHEN returns errors

Hi,
I was trying to find the cause of the following example query:

WITH test_data AS (
SELECT 
    *
FROM
    (VALUES 
        (true, true, false), 
        (false, true, true), 
        (true, true, true)) 
    AS t (a,b,c)
)
SELECT 
    CASE WHEN a THEN 'A' END AS A,
    CASE WHEN b THEN 'B' END AS B,
    CASE WHEN c THEN 'C' END AS C
FROM test_data;

This returns errors in CASE WHEN a THEN 'A' END AS A after the when. There must be some error in the second choice on the following section (L914)

choice(
  // simplified CASE x WHEN
  seq(
    $._expression,
    $.keyword_when,
    $._expression,
    $.keyword_then,
    $._expression,
    repeat(
      seq(
        $.keyword_when,
        $._expression,
        $.keyword_then,
        $._expression,
      )
    ),
  ),
  // standard CASE WHEN x, where x must be a predicate
  seq(
    $.keyword_when,
    $.predicate,
    $.keyword_then,
    $._expression,
    repeat(
      seq(
        $.keyword_when,
        $.predicate,
        $.keyword_then,
        $._expression,
      )
    ),
  ),
),

Can you take a look?

In the second choice, we expect a predicate after the when. However, it can also be a boolean column. Can we handle this case at all?

Invalid node type for language sql

When opening SQL files I am getting the following error:

E5108: Error executing lua /usr/share/nvim/runtime/lua/vim/treesitter/query.lua:261: query: invalid node type at position 315 for language sql
stack traceback:
        [C]: in function '_ts_parse_query'
        /usr/share/nvim/runtime/lua/vim/treesitter/query.lua:261: in function 'get_query'
        /usr/share/nvim/runtime/lua/vim/treesitter/highlighter.lua:50: in function 'new'
        /usr/share/nvim/runtime/lua/vim/treesitter/highlighter.lua:194: in function 'get_query'
        ...tart/playground/lua/nvim-treesitter-playground/utils.lua:52: in function 'fn'
        /usr/share/nvim/runtime/lua/vim/treesitter/languagetree.lua:245: in function 'for_each_tree'
        ...tart/playground/lua/nvim-treesitter-playground/utils.lua:39: in function 'get_hl_groups_at_position'
        ...rt/playground/lua/nvim-treesitter-playground/printer.lua:11: in function 'get_hl_group_for_node'
        ...rt/playground/lua/nvim-treesitter-playground/printer.lua:34: in function 'flatten_node'
        ...rt/playground/lua/nvim-treesitter-playground/printer.lua:77: in function 'process'
        ...t/playground/lua/nvim-treesitter-playground/internal.lua:693: in function 'update'
        ...t/playground/lua/nvim-treesitter-playground/internal.lua:148: in function 'toggle_hl_groups'
        [string ":lua"]:1: in main chunk

I am not sure when this started to happen. The specific dialect I am using is MySQL.

Insert overwrite with and without partition aren't parsed properly

Hi, the following queries aren't parsed properly. Please have a look.
sparksql syntax reference

insert overwrite data_base_name.tb_name_01 
partition ( key1 = '20230522' ,key2 = 'bb')
SELECT *
  FROM 
data_base_name.tb_name_00
;

insert overwrite data_base_name.tb_name_02
SELECT
  a1.col_a,
  a1.col_b
FROM 
(
  SELECT
    *
  FROM
    data_base_name.tb_name_01 
  WHERE
    key1 >= 'bb'
) a1

two queries:

just the second query

LICENSE

Hi, I see you mention MIT in the metadata, but there's no actual license text with attribution to copy. Can you add a LICENSE file to this repo? Thanks!

Question: ANSI SQL?

Hi Derek,

I had a few questions regarding your grammar:

I was wondering what source grammar you used to build the grammar file
Is this sql grammar ANSI SQL or some other dialect
How did you start writing this? Did you start with the grammar.js and was everything else generated using the parser generator?

Keyword `CHANGE` is missing (`ALTER TABLE`)

ALTER TABLE `tbl`
CHANGE `field_old` `field_new` INT;

Feature request: `RENAME TABLE tbl_old TO tbl_new`

RENAME TABLE `tbl_old` TO `tbl_new`;

Documentation

https://mariadb.com/kb/en/rename-table/

Error

Syntax

RENAME TABLE[S] [IF EXISTS] tbl_name 
  [WAIT n | NOWAIT]
  TO new_tbl_name
    [, tbl_name2 TO new_tbl_name2] ...

Adjustments

New $._rename_statement, or directly $.rename_table at:
https://github.com/DerekStride/tree-sitter-sql/blob/main/grammar.js#L484-L488

Definition of rename_table, similar to, but simpler than alter_table.

Missing floating numbers

The parser does not handle 1.1 .1 or 1.

Float should be parsed in statements like:

select
1.1 as f
where 1.1 = 1.1

select
a * 1.1
from b

I will submit a PR, but I am struggling to put the node at the right location.

Misuse of `alias()`?

I noticed that while creating syntax queries for hive statements.

I found that some uses of alias() (see https://tree-sitter.github.io/tree-sitter/creating-parsers#the-grammar-dsl) appear questionable to me, and I wanted to check with you.

1. Doubts: `(identifier)` for builtin functions

In both the following builtin functions, the function name keyword is replaced in the AST with identifier. I just wanted to check if that's really intentional?

Isn't an (identifier) rather identifying a application-domain-specific item, such as a table or field - rather than a builtin SQL feature keyword?

At first sight, not only (identifier) is wrong in this case, but the entire field('name', ...) should just be replaced by $.keyword_cast and $.keyword_count.

However, field('name', ...) gives compatibility with the generic (invocation) (

tree-sitter-sql/grammar.js

Lines 1378 to 1381 in 393e0d3

    
           invocation: $ => seq( 
        
             field('name', $.identifier), 
        
             paren_list(field('parameter', $._expression)), 
        
           ),

), where a field is necessary to extract the function name -- so it's probably only the (identifier) alias that is to be removed.

✅ Done so in e4e43ba and eac9da2

1.1. `CAST`

tree-sitter-sql/grammar.js

Lines 1341 to 1350 in 393e0d3

    
           cast: $ => seq( 
        
             field('name', alias($.keyword_cast, $.identifier)), 
        
             '(', 
        
             seq( 
        
               field('parameter', $._expression), 
        
               $.keyword_as, 
        
               $._type, 
        
             ), 
        
             ')', 
        
           ),

✅ Fixed in e4e43ba fix: Remove identifier alias from cast builtin function - PR #102

1.2. `COUNT`

tree-sitter-sql/grammar.js

Lines 1357 to 1362 in 393e0d3

    
           count: $ => seq( 
        
             field('name', alias($.keyword_count, $.identifier)), 
        
             '(', 
        
             $._aggregate_expression, 
        
             ')', 
        
           ),

✅ Fixed in eac9da2 fix: Remove identifier alias from count builtin function - PR #102

2. Misuse? String literals as `(identifier)`

tree-sitter-sql/grammar.js

Lines 980 to 988 in 393e0d3

    
           storage_location: $ => prec.right( 
        
               seq( 
        
                   $.keyword_location, 
        
                   field('path', alias($._literal_string, $.identifier)), 
        
                   optional( 
        
                       seq( 
        
                           $.keyword_cached, 
        
                           $.keyword_in, 
        
                           field('pool', alias($._literal_string, $.identifier)),

Shouldn't these rather be alias($._literal_string, $.literal)?

✅ Fixed in 4216ecf fix: Replace identifier with literal for hive storage location and cache pool - PR #102

';', '"', '\n' and '/path/data' in the Hive CREATE TABLE test end up as identifier tree item, while they clearly are literals:

2.1 Update: `';'`, `'"'` and `'\n'` do not show up in the AST at all

tree-sitter-sql/test/corpus/create.txt

Lines 967 to 1029 in 393e0d3

    
           ================================================================================ 
        
           Create hive table 
        
           ================================================================================ 
        
           CREATE EXTERNAL TABLE tab (col int) 
        
           PARTITIONED BY (col int) 
        
           SORT BY (col) 
        
           ROW FORMAT DELIMITED FIELDS TERMINATED BY ';' ESCAPED BY '"' 
        
           LINES TERMINATED BY '\n' 
        
           STORED AS PARQUET 
        
           LOCATION '/path/data' 
        
           CACHED IN 'pool1' WITH REPLICATION = 2 
        
           -------------------------------------------------------------------------------- 
        
           (program 
        
             (statement 
        
               (create_table 
        
                 (keyword_create) 
        
                 (keyword_external) 
        
                 (keyword_table) 
        
                 (table_reference 
        
                   name: (identifier)) 
        
                 (column_definitions 
        
                   (column_definition 
        
                     name: (identifier) 
        
                     type: (int (keyword_int)))) 
        
                 (table_partition 
        
                   (keyword_partitioned) 
        
                   (keyword_by) 
        
                   (column_definitions 
        
                     (column_definition 
        
                       name: (identifier) 
        
                       type: (int (keyword_int))))) 
        
                 (table_sort 
        
                   (keyword_sort) 
        
                   (keyword_by) 
        
                   (identifier)) 
        
                 (row_format 
        
                   (keyword_row) 
        
                   (keyword_format) 
        
                   (keyword_delimited) 
        
                   (keyword_fields) 
        
                   (keyword_terminated) 
        
                   (keyword_by) 
        
                   (keyword_escaped) 
        
                   (keyword_by) 
        
                   (keyword_lines) 
        
                   (keyword_terminated) 
        
                   (keyword_by)) 
        
                 (stored_as 
        
                   (keyword_stored) 
        
                   (keyword_as) 
        
                   (keyword_parquet)) 
        
                 (storage_location 
        
                   (keyword_location) 
        
                   path: (identifier) 
        
                   (keyword_cached) 
        
                   (keyword_in) 
        
                   pool: (identifier) 
        
                   (keyword_with) 
        
                   (keyword_replication) 
        
                   value: (literal)))))

✅ Fixed in d2f0f66 _ feat: Add AST-fields for row terminated/escaped-by and lines terminated-by-_ PR #102

Field names from: https://spark.apache.org/docs/latest/sql-ref-syntax-hive-format.html

Missing syntax: `ALTER TABLE ... MODIFY [ COLUMN ] ...`

https://mariadb.com/kb/en/alter-table/#modify-column

Postgres support

Hi,

Are there any plans to support PostgreSQL? Basic query works, but when I try to alias something and use double quotes, parsing fails.
For example:

select title as "Post title" from posts;

Parses it like this:

statement [0, 0] - [0, 40]
  select [0, 0] - [0, 28]
    keyword_select [0, 0] - [0, 6]
    ERROR [0, 7] - [0, 15]
      field [0, 7] - [0, 12]
        name: identifier [0, 7] - [0, 12]
      keyword_as [0, 13] - [0, 15]
    select_expression [0, 16] - [0, 28]
      literal [0, 16] - [0, 28]
  from [0, 29] - [0, 39]
    keyword_from [0, 29] - [0, 33]
    relation [0, 34] - [0, 39]
      table_reference [0, 34] - [0, 39]
        name: identifier [0, 34] - [0, 39]

Not using double quotes fixes it:

select title as Post_title from posts;

Result:

statement [0, 0] - [0, 38]
  select [0, 0] - [0, 26]
    keyword_select [0, 0] - [0, 6]
    select_expression [0, 7] - [0, 26]
      field [0, 7] - [0, 12]
        name: identifier [0, 7] - [0, 12]
      keyword_as [0, 13] - [0, 15]
      alias: identifier [0, 16] - [0, 26]
  from [0, 27] - [0, 37]
    keyword_from [0, 27] - [0, 31]
    relation [0, 32] - [0, 37]
      table_reference [0, 32] - [0, 37]
        name: identifier [0, 32] - [0, 37]

Invalid node type when using dadbod-ui

I cant seem to get treesitter highlighting working when writing queries inside vim-dadbod-ui

I keep getting the following error:

Between expression or parentheses within case when context are not parsed correctly

Tree-sitter-sql is really great, while I found some legal queries aren't paresed correctly. Would you mind checking them?
The problems occur when 'between expression' or 'parentheses' are surrounded by case when statement, here are two examples:

SELECT 
a.*,
case when a.col BETWEEN 0 AND 1 then 1
when a.col >=1 then 0
end as col2
FROM tb_name a
;


SELECT
  a.*,
  CASE WHEN a.col_1 >= 100 THEN 0 ELSE 1 END AS col_a,
  CASE WHEN (a.col_1 >= 100) THEN 0 ELSE 1 END AS col_b
  CASE WHEN ((a.col_0 >= 1) AND (a.col_1 >= 100)) THEN 0
  END AS col_c
FROM
  tb_name a
WHERE
  ((a.col_0 >= 1) AND (a.col_1 >= 100))
;

Other parser implementations

I didn't see your project when I started my parser implementation https://github.com/m-novikov/tree-sitter-sql
I added you to the readme, but maybe we should join forces somehow :)

SQL file lose syntax

Hello, I have a problem with syntax highlighting in an SQL file. I created an issue in the nvim-treesitter repository, but I was advised to come here. I am attaching a screenshot of what it looks like in my nvim 9.

Also here is a link to the issue in nvim-treesiter repo - there is more information about my problem.
nvim-treesitter/nvim-treesitter#4668

Issues with precedence implementing `NOT IN`

What

Implementing NOT IN as mentioned in #126, is causing some errors with precedence. I think the root cause might be from the between expression added in #117. It has a rule seq($._expression, $.keyword_and, $._expression) but the binary_expression node is also defined as seq($._expression, $.keyword_and, $._expression).

Some investigation & work is required to detangle the precedence issues and implement NOT IN.

Some more MariaDB syntax missing or inconsistent highlight

Using 4fe05b2

CREATE TABLE publications(
    `uid` INT AUTO_INCREMENT NOT NULL,
    `pid` INT DEFAULT 0 NOT NULL,
    `tstamp` INT UNSIGNED DEFAULT 0 NOT NULL,
    `crdate` INT UNSIGNED DEFAULT 0 NOT NULL,
    `cruser_id` INT UNSIGNED DEFAULT 0 NOT NULL,
    `deleted` SMALLINT UNSIGNED DEFAULT 0 NOT NULL,
    `hidden` SMALLINT UNSIGNED DEFAULT 0 NOT NULL,
    `notes` TEXT DEFAULT NULL,
    `code` VARCHAR(3) DEFAULT '' NOT NULL,
    `name` VARCHAR(100) DEFAULT '' NOT NULL,
    `frequency` SMALLINT UNSIGNED DEFAULT 0 NOT NULL,
    `editions` INT UNSIGNED DEFAULT 0 NOT NULL,
    `subscription_types` INT UNSIGNED DEFAULT 0 NOT NULL,
    `methods` INT UNSIGNED DEFAULT 0 NOT NULL,
    INDEX `parent`(pid),
    PRIMARY KEY(uid)
) DEFAULT CHARACTER SET utf8mb4 COLLATE `utf8mb4_unicode_ci` ENGINE = InnoDB;

NOT is sometimes lit, sometimes it isn't. ✅ Fixed by #73
UNSIGNED might be something to highlight as keyword? ✅ Fixed by #88
INDEX should be a keyword like KEY ✅ Fixed by #77
DEFAULT CHARACTER SET, COLLATE and ENGINE all keywords. ✅ Fixed by #77

CREATE TABLE some_table(
    `code` VARCHAR(9) DEFAULT '' NOT NULL,
    `name` VARCHAR(100) DEFAULT '' NOT NULL,
    `published_at` DATE DEFAULT '0000-00-00' NOT NULL,
    INDEX `parent`(pid),
    INDEX `parent_publication`(publication, hidden, deleted),
    INDEX `range`(publication, published_at),
    PRIMARY KEY(uid)
) DEFAULT CHARACTER SET utf8mb4 COLLATE `utf8mb4_unicode_ci` ENGINE = InnoDB;

All string literals, such as '' and '0000-00-00' should probably be yellow, as the varchar lengths are, and as other string literals are, e.g. in UPDATE ... SET field = 'string literal'. ✅ Fixed by #73
INDEX etc. as above. ✅ Fixed by #77

ALTER TABLE `lesson` ADD editions INT DEFAULT 0 NOT NULL;

INT is not recognized as keyword in this position. ✅ Fixed by #80
integer DEFAULT values are white (not highlighted as literal), where I would have expected them to be yellow (literal). ✅ Fixed by #73

DROP INDEX `parent_method` ON site;
CREATE INDEX `parent_method` ON site (pid, deleted, parent_method, hidden);

INDEX after DROP is not recognized as keyword. ✅ Fixed by #96

UPDATE site
  SET publication = 1
  WHERE type = 'method' AND identifier = "some_identifier";

String literals without highlight in UPDATE ... SET ... WHERE ... in WHERE-position. (Either quote style.) ✅ Fixed - must be caused by a previous parse error in the same file.
The numeric literal in SET-position should also be yellow rather than white. ✅ Fixed by #73

ALTER TABLE site 
  RENAME COLUMN access_groups TO subscription_types;

TABLE in ALTER TABLE not recognized as keyword. ✅ Fixed - must be caused by a previous parse error in the same file.

ALTER TABLE `fe_groups` ADD `tx_digibordauthentication_code` VARCHAR(10) DEFAULT '' NOT NULL;
ALTER TABLE `fe_groups` ADD `tx_digibordauthentication_access_model` INT UNSIGNED DEFAULT 0 NOT NULL;
ALTER TABLE `fe_groups` ADD `tx_digibordauthentication_allow_historical` INT UNSIGNED DEFAULT 0 NOT NULL;
ALTER TABLE `fe_groups` ADD `tx_digibordauthentication_allow_media_archive` INT UNSIGNED DEFAULT 0 NOT NULL;
ALTER TABLE `fe_groups` ADD `tx_digibordauthentication_seasonal_access_first` SMALLINT UNSIGNED DEFAULT 1 NOT NULL;
ALTER TABLE `fe_groups` ADD `tx_digibordauthentication_seasonal_access_last` SMALLINT UNSIGNED DEFAULT 1 NOT NULL;
ALTER TABLE `fe_groups` ADD `tx_digibordauthentication_seasonal_restrict_auth` SMALLINT UNSIGNED DEFAULT 1 NOT NULL;
ALTER TABLE `fe_groups` ADD tx_digibordauthentication_notes TEXT DEFAULT NULL;
ALTER TABLE `fe_groups` ADD tx_digibordauthentication_publication INT UNSIGNED DEFAULT 0 NOT NULL;

CREATE INDEX `publication` ON `fe_groups` (pid, tx_digibordauthentication_publication);

ALTER TABLE `fe_groups` CHANGE `tx_digibordauthentication_site` `tx_digibordauthentication_site` INT UNSIGNED DEFAULT 0 NOT NULL;

Inconsistent highlight for ADD ✅ Fixed by #80
No highlight for ON at CREATE INDEX ... ON ✅ Fixed by #80 (parse error in the ALTER TABLE above)
No highlight for CHANGE in ALTER TABLE ... CHANGE ✅ Fixed by #99
Datatypes (except for TEXT!) and UNSIGNED not recognized as highlight (already mentioned further above). ✅ Fixed by #80 (parse error in the ALTER TABLE above) + ✅ Fixed by #99
Default values not recognized as literals (already mentioned further above). ✅ Fixed by #73

ALTER TABLE some_table 
ADD some_field INT UNSIGNED NOT NULL DEFAULT '0'
AFTER some_other_field;

AFTER not highlighted. ✅ Fixed by #99
INT UNSIGNED and '0' as mentioned above.

INT ✅ Fixed by #80
UNSIGNED ✅ Fixed by #88
'0' ✅ Fixed by #73

Please keep the generated parser files in the repository

The upcoming Emacs 29 can be optionally compiled with the tree-sitter library, and comes with a user command that assists users in building and installing the grammar libraries they need. However, that command assumes the C/C++ parser and scanner source files are available in the corresponding repositories, so as to avoid requiring Emacs users to have too many development tools installed.

The recent removal of the generated parser files from the repository will cause that Emacs command to fail; users will be unable to build and install the tree-sitter grammar for SQL using your repository, unless they install quite a few more tools and build the library manually, something that is a significant obstacle for many Emacs users.

So please add those generated files back to the repository. The Emacs user community will thank you.

Thanks in advance.

`CREATE TABLE|VIEW|MATERIALIZED VIEW` do not accept CTE as select statement

Create table as 
with cte1 as (
    select a from b
)
select * from cte1

Does currently not work. Same for VIEW and MATERIALIZED VIEW

P.S. I will work later today on this problem.

Postgres `CURRENT_TIMESTAMP` function

see #120 for more context

`UNSIGNED` is missing

Causing queries like this to fail:

SELECT CAST("1" AS UNSIGNED INTEGER);

Example source: https://mariadb.com/kb/en/cast/ .

Tree:

'unsigned' does not appear in grammar.js at all and should probably be a keyword and part of the numeric type declarations.

Parsing nested CTEs

I found that the following valid PostgreSQL query involving nested CTEs is not parsed correctly by this grammar:

WITH top_cte AS (
  WITH nested_cte AS (
    SELECT 1 as one, 2 as two
  )
  SELECT one, two
  FROM nested_cte
)
SELECT *
FROM top_cte;

Received AST:

(program
  (ERROR
    (statement
      (keyword_with)
      (cte
        (identifier)
        (keyword_as)
        (ERROR
          (keyword_as))
        (statement
          (select
            (keyword_select)
            (select_expression
              (term
                (literal)
                (keyword_as)
                (identifier))
              (term
                (literal)
                (keyword_as)
                (identifier))))))
      (select
        (keyword_select)
        (select_expression
          (term
            (field
              (identifier)))
          (term
            (field
              (identifier)))))
      (from
        (keyword_from)
        (relation
          (object_reference
            (identifier))))))
  (statement
    (select
      (keyword_select)
      (select_expression
        (term
          (all_fields))))
    (from
      (keyword_from)
      (relation
        (object_reference
          (identifier))))))

This SO answer says nested CTEs are not valid, at least for T-SQL. I found that PostgreSQL has no problem executing that query, though:

Bug: support SELECT from more than two tables

As of right now a select like

SELECT *
FROM 
    Table1 a,
    Table2 b,
    ...
    TableN z
WHERE...

will result in an error on the table list. It does recognize both the first and the last table with its name and table_alias just fine but it does not recognize the table(s) in between.

Feature Request: Support for parameter placeholders

Hi!

In my daily work I have to deal with literal SQL queries that contain parameter placeholders in 99% of the cases.
Is it possible/planned/not planned to add support for them? The drivers I use expect ? as placeholder although there are other syntaxes.

Or is this something that can possibly be even added at the level of queries?

Cheers
Manuel

Make `COLUMN` optional in `ALTER TABLE t ADD`

I will try to prepare my own PR for this.

Make $.keyword_column optional in https://github.com/DerekStride/tree-sitter-sql/blob/main/grammar.js#L660

Support for Hive specific statements

Hi,
would it be possible to support Hive specific commands such STORED AS PARQUET or COMPUTE STATS?
This SQL extensions are supported by many databases these days (Redshift, Impala, Snowflake, Datafusion)

I would create a PR for that, if you are ok with that and once I have collected more examples.

Cheers,
Matt

T-SQL Alter Table

TSQL:

ALTER TABLE Class_Session
ADD
type varchar(255) NOT NULL DEFAULT ('online');

Error:

Expected [A-Za-z0-9_:] or [A-Za-z0-9_] but " " found. (sql)

Expected:
No error, this is valid sql

`keyword_language` is unused, renderet to `parser.c` as `aux_sym_*`

While preparing helix-editor/helix#7387 I noticed that a (keyword_language) query makes tree-sitter fail parsing with a NodeType error.

Upon investigation, I found that:

The symbol is defined in grammar.json.

But it is not defined in node-types.json.

In parser.c, it is defined as aux_sym_keyword_language_token1. 🤔 (see also the codegen)

(keyword_language) appears to be unused.

I think it can be removed - or shall it be used for function_language?

`PARTITIONED BY` column_definitions for impala/hive is not correct

CREATE TABLE tab
PARTITIONED BY (col1, col2) -- <-- ERROR
STORED AS PARQUET AS
SELECT
    col1,
    col2,
    col3
FROM tab_2

The columns definitions list is the same as for a create table statement, however for the partitioning it is not necessary to add the datatype.

select statements enclosed in parentheses next to 'union' aren't parsed proplerly

Hi, the following query isn't parsed correctly, please have a look.

-- CREATE TABLE tb AS
(
  SELECT 1 as col
)
UNION ALL
(
  SELECT 2 as col
);

Support for floating point literals

_number is defined as integer:

    _number: _ => /\d+/,

tree-sitter-sql/grammar.js

Line 1488 in 286e10c

_number: _ => /\d+/,

This causes a parse error in queries containing floating point numbers:

SELECT 3.5;

For a fix, the question is if _number should add an optional (?:\.\d+)? or if a secondary number type should be added.

$._number is used in quite a few places. Some of these places should probably allow floating point numbers (e.g. ranges), while others should not (like params to parametric_type().

`ERROR` on `COUNT(DISTINCT tbl.field)`

Tree-sitter syntax error for tbl and field in:

SELECT
  COUNT(DISTINCT `tbl`.`field`)
FROM
  whatever;

Postgres Array function is not parsed correctly

select array(select 'x')

Is a valid SQL statement in Postgres. The odd thing is, that it does not seems to be documented.

Since we have the array node for statements like this

select array['x']

This actually gives the same result.

Quoted Table Indentifiers not parsed correctly

Statements like:

select b."A" from b

Result in the following AST:

statement [0, 0] - [0, 19]
  select [0, 0] - [0, 19]
    keyword_select [0, 0] - [0, 6]
    select_expression [0, 7] - [0, 19]
      term [0, 7] - [0, 19]
        value: field [0, 7] - [0, 19]
          table_alias: identifier [0, 7] - [0, 8]
          ERROR [0, 9] - [0, 17]
            identifier [0, 9] - [0, 12]
            keyword_from [0, 13] - [0, 17]
          name: identifier [0, 18] - [0, 19]

It is valid SQL syntax and necessary, when column names contain capital letters.

Parsing PostgreSQL JSON operators

I've noticed PostgreSQL JSON operators are not parsed correctly.

For example, the following query has parsing errors:

SELECT "user"->>'login' AS username FROM users;

Parsed AST:

(program
  (statement
    (select
      (keyword_select)
      (select_expression
        (term
          (binary_expression
            (literal)
            (ERROR)
            (literal))
          (keyword_as)
          (identifier))))
    (from
      (keyword_from)
      (relation
        (object_reference
          (identifier))))))

Question: Is recursion supported in this grammar?

Hello, we were wondering if this grammar supports recursion (especially for multiple queries)?

For example:

select * from foo where bar > (select foobar from barfoo);

Will something like this be supported by the grammar?

`DROP INDEX` is missing from `_drop_statement`

Only DROP TABLE and DROP VIEW are currently supported:

https://github.com/DerekStride/tree-sitter-sql/blob/main/grammar.js#L732-L757

`NOT IN` precedence issues

As discussed in #126 NOT IN is currently not implemented

Negative integers are not parsed

Issue is _number being used as choice in literal, but also as part of _decimal_number.

    literal: $ => prec(2,
      choice(
        $._number,
        $._decimal_number,
        $._literal_string,
        $.keyword_true,
        $.keyword_false,
        $.keyword_null,
      ),
    ),
    _double_quote_string: _ => seq('"', /[^"]*/, '"'),
    _literal_string: $ => choice(
      seq("'", /[^']*/, "'"),
      $._double_quote_string,
    ),
    _number: _ => /\d+/,
    _decimal_number: $ => choice(
        seq(optional("-"), ".", $._number),
        seq(optional("-"), $._number, ".", $._number),
        seq(optional("-"), $._number, "."),
    ),

Therefore, \d+ is correct, and a new anonymous option _integral_number must be added, using seq(optional("-"), $._number).

Usage of parenthesis is wrong

In $.list we are using paren_list() which wraps parenthesis around a comma separated list. Therefore, statements like:

select 1 + (2 * 3, 2 * 3)

are parsed, even though they are not allows in SQL (I have just tested it against Postgres. I am sure it is wrong in any SQL dialect)

Here is the AST for this statement:

(program [0, 0] - [1, 0]
  (statement [0, 0] - [0, 25]
    (select [0, 0] - [0, 25]
      (keyword_select [0, 0] - [0, 6])
      (select_expression [0, 7] - [0, 25]
        (term [0, 7] - [0, 25]
          value: (binary_expression [0, 7] - [0, 25]
            left: (literal [0, 7] - [0, 8])
            right: (list [0, 11] - [0, 25]
              (binary_expression [0, 12] - [0, 17]
                left: (literal [0, 12] - [0, 13])
                right: (literal [0, 16] - [0, 17]))
              (binary_expression [0, 19] - [0, 24]
                left: (literal [0, 19] - [0, 20])
                right: (literal [0, 23] - [0, 24])))))))))

Update the repo to not require checking in the parser files

Right now everything generated via tree-sitter generate gets checked in, I'd like to change it so that only the grammar.js is checked into the repo and the artifacts can be generated by the tree-sitter tool but consumers of this parser.

SQL highlighting issue in helix-editor, using tree-sitter-sql

The original bug report can be found at helix-editor/helix#5186

Other than in the bug report, I am using now helix at latest master commit (edd0ba7 right now), but that makes no difference.

I updated tree-sitter-sql by overriding https://github.com/helix-editor/helix/blob/8e10c592b90383a90b7e4a9335d323f8fab99ab5/languages.toml#L1435-L1437
followed by rebuilding grammers:

$ hx --grammar fetch                                                                                                                                                                                                                                                                                    
Fetching 128 grammars
127 up to date git grammars
1 updated grammars
        sql now on 6adc9578afb5c361858213adae5e2de990e46cb1


$ hx --grammar build                                                                                                                                                                                                                                                                                    
Building 128 grammars
127 grammars already built
1 grammars built now
        ["sql"]

however, the highlighting still fails, which makes me conclude this will rather be a tree-sitter-sql bug. (Or two, actually.)

Summary

Example SQL:

INSERT INTO ... SET ... syntax:

INSERT INTO some_table
  field = "String does not get highlight in INSERT SET syntax";

helix edd0ba7 with tree-sitter-sql 9d98029: String value does not get highlight:

INSERT INTO ... (...) VALUES (...) syntax with multiple values records:

INSERT INTO some_table
  (field)
VALUES
  ("String value"),
  ("String value");

helix edd0ba7 with tree-sitter-sql 9d98029: Values of second record do not get highlight:

GROUP_CONCAT with SEPARATOR:

SELECT GROUP_CONCAT(uid SEPARATOR ",") 
FROM some_table 
GROUP BY some_field;

SELECT GROUP_CONCAT(uid SEPARATOR ",") 
FROM some_table 
GROUP BY some_field;

helix edd0ba7 with tree-sitter-sql 9d98029: Separator string appears to not be expected (only "," should be yellow):

Refactor: remove `count` node

count is just a special invocation node, so we could simplify this. There is no special reason for having both.

This change requires downstream changes in the highlights.scm

Keywords `AFTER` is missing; `FIRST` and `AFTER` not yet part of `ALTER TABLE`

https://mariadb.com/kb/en/alter-table/

Used in ALTER TABLE ... ADD [COLUMN] ... [FIRST | AFTER col_name] ...

Dbt Jinja

Hi there,
I am using this Treesitter grammar for a while now in neovim. I work a lot with dbt, which consists of sql files with some jinja blocks inside. Is there any change to get this included in the grammar (or at least ignored).
I have some (pg)SQL files which look really weird, e.g. the Syntax Highlighting stops in the middle of the file and TS reports and error (even without jinja blocks near the error location).

Support `BEGIN ... END` blocks

BEGIN ... END blocks are used in MySQL and MSSQL.

Here's a minimal example (which GitHub also fails to highlight properly):

BEGIN
    CREATE TABLE FOO (bar INT);
END;

keyword_begin [0, 0] - [0, 5]
ERROR [1, 4] - [2, 3]
  keyword_table [1, 11] - [1, 16]

Multiline comments

Hi Derek! C-style multi-line comments are not parsed as a single (marginalia). I noticed it in the Helix editor which uses this project as its sql grammar source by default.

I took a look at https://github.com/m-novikov/tree-sitter-sql/blob/main/grammar.js#L1194 and saw that multiline comments were using seq("/*", /[^*]*\*+([^/*][^*]*\*+)*/, "/"). What do you think of updating the repo to support this? I'll be happy to test it out and open a PR.

`group_concat` as generic `invocation`

In 8c0ca86 group_concat was introduced. However, this function exists in other dialects was well with a different internal structure (aka the parameters of the list)

e.g.

select
    group_concat(col1, ',')
from tab

This statement is not parsed correctly. I think group_concat should a generic node of type invocation

Alternatively, implement the complete internal expression syntax. It looks quite similar between MariaDB, MySQL and Impala. In that example above, the optional separator is missing.

generated files and workflows

We need to check in the generated parser files to support downstream use, but that gets us into merge conflicts every time someone cuts a new branch before someone else lands theirs. It's only happened a few times so far, but if the pace picks up we should consider an action that can build those on pull requests and squash merge.

We could commit after build+test on the branch in progress, but that seems like a bit much.

	invocation: $ => seq(
	field('name', $.identifier),
	paren_list(field('parameter', $._expression)),
	),

	cast: $ => seq(
	field('name', alias($.keyword_cast, $.identifier)),
	'(',
	seq(
	field('parameter', $._expression),
	$.keyword_as,
	$._type,
	),
	')',
	),

	count: $ => seq(
	field('name', alias($.keyword_count, $.identifier)),
	'(',
	$._aggregate_expression,
	')',
	),

	storage_location: $ => prec.right(
	seq(
	$.keyword_location,
	field('path', alias($._literal_string, $.identifier)),
	optional(
	seq(
	$.keyword_cached,
	$.keyword_in,
	field('pool', alias($._literal_string, $.identifier)),

	================================================================================
	Create hive table
	================================================================================

	CREATE EXTERNAL TABLE tab (col int)
	PARTITIONED BY (col int)
	SORT BY (col)
	ROW FORMAT DELIMITED FIELDS TERMINATED BY ';' ESCAPED BY '"'
	LINES TERMINATED BY '\n'
	STORED AS PARQUET
	LOCATION '/path/data'
	CACHED IN 'pool1' WITH REPLICATION = 2

	--------------------------------------------------------------------------------

	(program
	(statement
	(create_table
	(keyword_create)
	(keyword_external)
	(keyword_table)
	(table_reference
	name: (identifier))
	(column_definitions
	(column_definition
	name: (identifier)
	type: (int (keyword_int))))
	(table_partition
	(keyword_partitioned)
	(keyword_by)
	(column_definitions
	(column_definition
	name: (identifier)
	type: (int (keyword_int)))))
	(table_sort
	(keyword_sort)
	(keyword_by)
	(identifier))
	(row_format
	(keyword_row)
	(keyword_format)
	(keyword_delimited)
	(keyword_fields)
	(keyword_terminated)
	(keyword_by)
	(keyword_escaped)
	(keyword_by)
	(keyword_lines)
	(keyword_terminated)
	(keyword_by))
	(stored_as
	(keyword_stored)
	(keyword_as)
	(keyword_parquet))
	(storage_location
	(keyword_location)
	path: (identifier)
	(keyword_cached)
	(keyword_in)
	pool: (identifier)
	(keyword_with)
	(keyword_replication)
	value: (literal)))))

derekstride / tree-sitter-sql Goto Github PK

tree-sitter-sql's Introduction

tree-sitter-sql

Installation

Step 1: Download the parser files

Step 2: Compile the Parser

Using cargo

Using npm

Using pip

Development

Features

References

Other projects

tree-sitter-sql's People

Contributors

Stargazers

Watchers

Forkers

tree-sitter-sql's Issues

Documentation

Error

Syntax

Adjustments

1. Doubts: (identifier) for builtin functions

1.1. CAST

1.2. COUNT

2. Misuse? String literals as (identifier)

2.1 Update: ';', '"' and '\n' do not show up in the AST at all

What

Summary

Recommend Projects

Recommend Topics

Recommend Org

1. Doubts: `(identifier)` for builtin functions

1.1. `CAST`

1.2. `COUNT`

2. Misuse? String literals as `(identifier)`

2.1 Update: `';'`, `'"'` and `'\n'` do not show up in the AST at all