Coder Social home page Coder Social logo

derekstride / tree-sitter-sql Goto Github PK

View Code? Open in Web Editor NEW
146.0 6.0 47.0 43.42 MB

SQL grammar for tree-sitter

Home Page: http://derek.stride.host/tree-sitter-sql/

License: MIT License

JavaScript 87.47% Scheme 7.53% Shell 0.50% C 4.50%
tree-sitter tree-sitter-parser sql

tree-sitter-sql's Introduction

tree-sitter-sql

Build/test GitHub Pages npm package version

A general/permissive SQL grammar for tree-sitter.

Installation

We don't commit the generated parser files to the main branch. Instead, you can find them on the gh-pages branch. We're open to feedback & encourage you to open an issue to discuss any problems.

They are also hosted on the GitHub pages site and available for download here: github://derekstride/tree-sitter-sql/gh-pages.tar.gz.

Plugin maintainers ensure to specify the HEAD (or a specific revision) of the gh-pages branch when integrating with this project.

Step 1: Download the parser files

Using git

git clone https://github.com/DerekStride/tree-sitter-sql.git
cd tree-sitter-sql
git checkout gh-pages

Using curl

curl -LO https://github.com/DerekStride/tree-sitter-sql/archive/refs/heads/gh-pages.tar.gz
tar -xzf gh-pages.tar.gz
cd tree-sitter-sql-gh-pages

Step 2: Compile the Parser

Tree-sitter parsers need to be compiled as a shared-object / dynamic-library, you can enable this by passing the -shared & -fPIC flags to your compiler.

cc -shared -fPIC -I./src src/parser.c src/scanner.c -o sql.so

Using cargo

cargo add tree-sitter-sequel

Using npm

npm i @derekstride/tree-sitter-sql

Using pip

pip install tree-sitter-sql

Development

See CONTRIBUTING.md for documentation on how to set up the project for development.

Features

For a complete list of features see the the tests

References

Other projects

tree-sitter-sql's People

Contributors

abalabahaha avatar antoineb avatar blinxen avatar bombardier200 avatar derekstride avatar dgmcdona avatar dmfay avatar ds-cbo avatar eklmv avatar grpse avatar guilhas07 avatar herringtondarkholme avatar immanuelhume avatar kristijanhusak avatar leamingrad avatar leoniephiline avatar matthias-q avatar mattmassicotte avatar trevor-clarke avatar treywood avatar wishdev avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar

tree-sitter-sql's Issues

Comments are not terminated by newline

Example:

--
-- Table structure for table 'tx_digibordauthentication_activity_log'
--
CREATE TABLE tx_digibordauthentication_activity_log
(
  uid         int(11) unsigned    NOT NULL auto_increment,
);

image

Adding a space to the -- above CREATE does not help.

Adding any letter does help, however:

--
-- Table structure for table 'tx_digibordauthentication_activity_log'
-- X
CREATE TABLE tx_digibordauthentication_activity_log
(
  uid         int(11) unsigned    NOT NULL auto_increment,
);

image

Could the problem be this regex being eager-matching / greedily-matching?

https://github.com/DerekStride/tree-sitter-sql/blob/main/grammar.js#L331

Could a lazy /.*?\n/ do the trick?

CASE WHEN returns errors

Hi,
I was trying to find the cause of the following example query:

WITH test_data AS (
SELECT 
    *
FROM
    (VALUES 
        (true, true, false), 
        (false, true, true), 
        (true, true, true)) 
    AS t (a,b,c)
)
SELECT 
    CASE WHEN a THEN 'A' END AS A,
    CASE WHEN b THEN 'B' END AS B,
    CASE WHEN c THEN 'C' END AS C
FROM test_data;

This returns errors in CASE WHEN a THEN 'A' END AS A after the when. There must be some error in the second choice on the following section (L914)

choice(
  // simplified CASE x WHEN
  seq(
    $._expression,
    $.keyword_when,
    $._expression,
    $.keyword_then,
    $._expression,
    repeat(
      seq(
        $.keyword_when,
        $._expression,
        $.keyword_then,
        $._expression,
      )
    ),
  ),
  // standard CASE WHEN x, where x must be a predicate
  seq(
    $.keyword_when,
    $.predicate,
    $.keyword_then,
    $._expression,
    repeat(
      seq(
        $.keyword_when,
        $.predicate,
        $.keyword_then,
        $._expression,
      )
    ),
  ),
),

Can you take a look?

In the second choice, we expect a predicate after the when. However, it can also be a boolean column. Can we handle this case at all?

Invalid node type for language sql

When opening SQL files I am getting the following error:

E5108: Error executing lua /usr/share/nvim/runtime/lua/vim/treesitter/query.lua:261: query: invalid node type at position 315 for language sql
stack traceback:
        [C]: in function '_ts_parse_query'
        /usr/share/nvim/runtime/lua/vim/treesitter/query.lua:261: in function 'get_query'
        /usr/share/nvim/runtime/lua/vim/treesitter/highlighter.lua:50: in function 'new'
        /usr/share/nvim/runtime/lua/vim/treesitter/highlighter.lua:194: in function 'get_query'
        ...tart/playground/lua/nvim-treesitter-playground/utils.lua:52: in function 'fn'
        /usr/share/nvim/runtime/lua/vim/treesitter/languagetree.lua:245: in function 'for_each_tree'
        ...tart/playground/lua/nvim-treesitter-playground/utils.lua:39: in function 'get_hl_groups_at_position'
        ...rt/playground/lua/nvim-treesitter-playground/printer.lua:11: in function 'get_hl_group_for_node'
        ...rt/playground/lua/nvim-treesitter-playground/printer.lua:34: in function 'flatten_node'
        ...rt/playground/lua/nvim-treesitter-playground/printer.lua:77: in function 'process'
        ...t/playground/lua/nvim-treesitter-playground/internal.lua:693: in function 'update'
        ...t/playground/lua/nvim-treesitter-playground/internal.lua:148: in function 'toggle_hl_groups'
        [string ":lua"]:1: in main chunk

I am not sure when this started to happen. The specific dialect I am using is MySQL.

Insert overwrite with and without partition aren't parsed properly

Hi, the following queries aren't parsed properly. Please have a look.
sparksql syntax reference

insert overwrite data_base_name.tb_name_01 
partition ( key1 = '20230522' ,key2 = 'bb')
SELECT *
  FROM 
data_base_name.tb_name_00
;

insert overwrite data_base_name.tb_name_02
SELECT
  a1.col_a,
  a1.col_b
FROM 
(
  SELECT
    *
  FROM
    data_base_name.tb_name_01 
  WHERE
    key1 >= 'bb'
) a1

two queries:
image

just the second query
image

LICENSE

Hi, I see you mention MIT in the metadata, but there's no actual license text with attribution to copy. Can you add a LICENSE file to this repo? Thanks!

Question: ANSI SQL?

Hi Derek,

I had a few questions regarding your grammar:

  1. I was wondering what source grammar you used to build the grammar file
  2. Isย this sql grammar ANSI SQL or some other dialect
  3. How did you start writing this? Did you start with the grammar.js and was everything else generated using the parser generator?

Missing floating numbers

The parser does not handle 1.1 .1 or 1.

Float should be parsed in statements like:

select
1.1 as f
where 1.1 = 1.1

or

select
a * 1.1
from b

I will submit a PR, but I am struggling to put the node at the right location.

Misuse of `alias()`?

I noticed that while creating syntax queries for hive statements.

I found that some uses of alias() (see https://tree-sitter.github.io/tree-sitter/creating-parsers#the-grammar-dsl) appear questionable to me, and I wanted to check with you.

1. Doubts: (identifier) for builtin functions

In both the following builtin functions, the function name keyword is replaced in the AST with identifier. I just wanted to check if that's really intentional?

Isn't an (identifier) rather identifying a application-domain-specific item, such as a table or field - rather than a builtin SQL feature keyword?

At first sight, not only (identifier) is wrong in this case, but the entire field('name', ...) should just be replaced by $.keyword_cast and $.keyword_count.

However, field('name', ...) gives compatibility with the generic (invocation) (

tree-sitter-sql/grammar.js

Lines 1378 to 1381 in 393e0d3

invocation: $ => seq(
field('name', $.identifier),
paren_list(field('parameter', $._expression)),
),
), where a field is necessary to extract the function name -- so it's probably only the (identifier) alias that is to be removed.

โœ… Done so in e4e43ba and eac9da2

1.1. CAST

tree-sitter-sql/grammar.js

Lines 1341 to 1350 in 393e0d3

cast: $ => seq(
field('name', alias($.keyword_cast, $.identifier)),
'(',
seq(
field('parameter', $._expression),
$.keyword_as,
$._type,
),
')',
),

โœ… Fixed in e4e43ba fix: Remove identifier alias from cast builtin function - PR #102

1.2. COUNT

tree-sitter-sql/grammar.js

Lines 1357 to 1362 in 393e0d3

count: $ => seq(
field('name', alias($.keyword_count, $.identifier)),
'(',
$._aggregate_expression,
')',
),

โœ… Fixed in eac9da2 fix: Remove identifier alias from count builtin function - PR #102

2. Misuse? String literals as (identifier)

tree-sitter-sql/grammar.js

Lines 980 to 988 in 393e0d3

storage_location: $ => prec.right(
seq(
$.keyword_location,
field('path', alias($._literal_string, $.identifier)),
optional(
seq(
$.keyword_cached,
$.keyword_in,
field('pool', alias($._literal_string, $.identifier)),

Shouldn't these rather be alias($._literal_string, $.literal)?

โœ… Fixed in 4216ecf fix: Replace identifier with literal for hive storage location and cache pool - PR #102

';', '"', '\n' and '/path/data' in the Hive CREATE TABLE test end up as identifier tree item, while they clearly are literals:

grafik

See also LOCATION under https://spark.apache.org/docs/latest/sql-ref-syntax-ddl-create-table-hiveformat.html#parameters

2.1 Update: ';', '"' and '\n' do not show up in the AST at all

================================================================================
Create hive table
================================================================================
CREATE EXTERNAL TABLE tab (col int)
PARTITIONED BY (col int)
SORT BY (col)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ';' ESCAPED BY '"'
LINES TERMINATED BY '\n'
STORED AS PARQUET
LOCATION '/path/data'
CACHED IN 'pool1' WITH REPLICATION = 2
--------------------------------------------------------------------------------
(program
(statement
(create_table
(keyword_create)
(keyword_external)
(keyword_table)
(table_reference
name: (identifier))
(column_definitions
(column_definition
name: (identifier)
type: (int (keyword_int))))
(table_partition
(keyword_partitioned)
(keyword_by)
(column_definitions
(column_definition
name: (identifier)
type: (int (keyword_int)))))
(table_sort
(keyword_sort)
(keyword_by)
(identifier))
(row_format
(keyword_row)
(keyword_format)
(keyword_delimited)
(keyword_fields)
(keyword_terminated)
(keyword_by)
(keyword_escaped)
(keyword_by)
(keyword_lines)
(keyword_terminated)
(keyword_by))
(stored_as
(keyword_stored)
(keyword_as)
(keyword_parquet))
(storage_location
(keyword_location)
path: (identifier)
(keyword_cached)
(keyword_in)
pool: (identifier)
(keyword_with)
(keyword_replication)
value: (literal)))))

โœ… Fixed in d2f0f66 _ feat: Add AST-fields for row terminated/escaped-by and lines terminated-by-_ PR #102

Field names from: https://spark.apache.org/docs/latest/sql-ref-syntax-hive-format.html

Postgres support

Hi,

Are there any plans to support PostgreSQL? Basic query works, but when I try to alias something and use double quotes, parsing fails.
For example:

select title as "Post title" from posts;

Parses it like this:

statement [0, 0] - [0, 40]
  select [0, 0] - [0, 28]
    keyword_select [0, 0] - [0, 6]
    ERROR [0, 7] - [0, 15]
      field [0, 7] - [0, 12]
        name: identifier [0, 7] - [0, 12]
      keyword_as [0, 13] - [0, 15]
    select_expression [0, 16] - [0, 28]
      literal [0, 16] - [0, 28]
  from [0, 29] - [0, 39]
    keyword_from [0, 29] - [0, 33]
    relation [0, 34] - [0, 39]
      table_reference [0, 34] - [0, 39]
        name: identifier [0, 34] - [0, 39]

Not using double quotes fixes it:

select title as Post_title from posts;

Result:

statement [0, 0] - [0, 38]
  select [0, 0] - [0, 26]
    keyword_select [0, 0] - [0, 6]
    select_expression [0, 7] - [0, 26]
      field [0, 7] - [0, 12]
        name: identifier [0, 7] - [0, 12]
      keyword_as [0, 13] - [0, 15]
      alias: identifier [0, 16] - [0, 26]
  from [0, 27] - [0, 37]
    keyword_from [0, 27] - [0, 31]
    relation [0, 32] - [0, 37]
      table_reference [0, 32] - [0, 37]
        name: identifier [0, 32] - [0, 37]

Between expression or parentheses within case when context are not parsed correctly

Tree-sitter-sql is really great, while I found some legal queries aren't paresed correctly. Would you mind checking them?
The problems occur when 'between expression' or 'parentheses' are surrounded by case when statement, here are two examples:

SELECT 
a.*,
case when a.col BETWEEN 0 AND 1 then 1
when a.col >=1 then 0
end as col2
FROM tb_name a
;


SELECT
  a.*,
  CASE WHEN a.col_1 >= 100 THEN 0 ELSE 1 END AS col_a,
  CASE WHEN (a.col_1 >= 100) THEN 0 ELSE 1 END AS col_b
  CASE WHEN ((a.col_0 >= 1) AND (a.col_1 >= 100)) THEN 0
  END AS col_c
FROM
  tb_name a
WHERE
  ((a.col_0 >= 1) AND (a.col_1 >= 100))
;


SQL file lose syntax

Hello, I have a problem with syntax highlighting in an SQL file. I created an issue in the nvim-treesitter repository, but I was advised to come here. I am attaching a screenshot of what it looks like in my nvim 9.
image
Also here is a link to the issue in nvim-treesiter repo - there is more information about my problem.
nvim-treesitter/nvim-treesitter#4668

Issues with precedence implementing `NOT IN`

What

Implementing NOT IN as mentioned in #126, is causing some errors with precedence. I think the root cause might be from the between expression added in #117. It has a rule seq($._expression, $.keyword_and, $._expression) but the binary_expression node is also defined as seq($._expression, $.keyword_and, $._expression).

Some investigation & work is required to detangle the precedence issues and implement NOT IN.

Some more MariaDB syntax missing or inconsistent highlight

Using 4fe05b2

CREATE TABLE publications(
    `uid` INT AUTO_INCREMENT NOT NULL,
    `pid` INT DEFAULT 0 NOT NULL,
    `tstamp` INT UNSIGNED DEFAULT 0 NOT NULL,
    `crdate` INT UNSIGNED DEFAULT 0 NOT NULL,
    `cruser_id` INT UNSIGNED DEFAULT 0 NOT NULL,
    `deleted` SMALLINT UNSIGNED DEFAULT 0 NOT NULL,
    `hidden` SMALLINT UNSIGNED DEFAULT 0 NOT NULL,
    `notes` TEXT DEFAULT NULL,
    `code` VARCHAR(3) DEFAULT '' NOT NULL,
    `name` VARCHAR(100) DEFAULT '' NOT NULL,
    `frequency` SMALLINT UNSIGNED DEFAULT 0 NOT NULL,
    `editions` INT UNSIGNED DEFAULT 0 NOT NULL,
    `subscription_types` INT UNSIGNED DEFAULT 0 NOT NULL,
    `methods` INT UNSIGNED DEFAULT 0 NOT NULL,
    INDEX `parent`(pid),
    PRIMARY KEY(uid)
) DEFAULT CHARACTER SET utf8mb4 COLLATE `utf8mb4_unicode_ci` ENGINE = InnoDB;

image

  1. NOT is sometimes lit, sometimes it isn't. โœ… Fixed by #73
  2. UNSIGNED might be something to highlight as keyword? โœ… Fixed by #88
  3. INDEX should be a keyword like KEY โœ… Fixed by #77
  4. DEFAULT CHARACTER SET, COLLATE and ENGINE all keywords. โœ… Fixed by #77

CREATE TABLE some_table(
    `code` VARCHAR(9) DEFAULT '' NOT NULL,
    `name` VARCHAR(100) DEFAULT '' NOT NULL,
    `published_at` DATE DEFAULT '0000-00-00' NOT NULL,
    INDEX `parent`(pid),
    INDEX `parent_publication`(publication, hidden, deleted),
    INDEX `range`(publication, published_at),
    PRIMARY KEY(uid)
) DEFAULT CHARACTER SET utf8mb4 COLLATE `utf8mb4_unicode_ci` ENGINE = InnoDB;

image

  1. All string literals, such as '' and '0000-00-00' should probably be yellow, as the varchar lengths are, and as other string literals are, e.g. in UPDATE ... SET field = 'string literal'. โœ… Fixed by #73
  2. INDEX etc. as above. โœ… Fixed by #77

ALTER TABLE `lesson` ADD editions INT DEFAULT 0 NOT NULL;

image

  1. INT is not recognized as keyword in this position. โœ… Fixed by #80
  2. integer DEFAULT values are white (not highlighted as literal), where I would have expected them to be yellow (literal). โœ… Fixed by #73

DROP INDEX `parent_method` ON site;
CREATE INDEX `parent_method` ON site (pid, deleted, parent_method, hidden);

image

  1. INDEX after DROP is not recognized as keyword. โœ… Fixed by #96

UPDATE site
  SET publication = 1
  WHERE type = 'method' AND identifier = "some_identifier";

image

  1. String literals without highlight in UPDATE ... SET ... WHERE ... in WHERE-position. (Either quote style.) โœ… Fixed - must be caused by a previous parse error in the same file.
  2. The numeric literal in SET-position should also be yellow rather than white. โœ… Fixed by #73

ALTER TABLE site 
  RENAME COLUMN access_groups TO subscription_types;

image

  1. TABLE in ALTER TABLE not recognized as keyword. โœ… Fixed - must be caused by a previous parse error in the same file.

ALTER TABLE `fe_groups` ADD `tx_digibordauthentication_code` VARCHAR(10) DEFAULT '' NOT NULL;
ALTER TABLE `fe_groups` ADD `tx_digibordauthentication_access_model` INT UNSIGNED DEFAULT 0 NOT NULL;
ALTER TABLE `fe_groups` ADD `tx_digibordauthentication_allow_historical` INT UNSIGNED DEFAULT 0 NOT NULL;
ALTER TABLE `fe_groups` ADD `tx_digibordauthentication_allow_media_archive` INT UNSIGNED DEFAULT 0 NOT NULL;
ALTER TABLE `fe_groups` ADD `tx_digibordauthentication_seasonal_access_first` SMALLINT UNSIGNED DEFAULT 1 NOT NULL;
ALTER TABLE `fe_groups` ADD `tx_digibordauthentication_seasonal_access_last` SMALLINT UNSIGNED DEFAULT 1 NOT NULL;
ALTER TABLE `fe_groups` ADD `tx_digibordauthentication_seasonal_restrict_auth` SMALLINT UNSIGNED DEFAULT 1 NOT NULL;
ALTER TABLE `fe_groups` ADD tx_digibordauthentication_notes TEXT DEFAULT NULL;
ALTER TABLE `fe_groups` ADD tx_digibordauthentication_publication INT UNSIGNED DEFAULT 0 NOT NULL;

CREATE INDEX `publication` ON `fe_groups` (pid, tx_digibordauthentication_publication);

ALTER TABLE `fe_groups` CHANGE `tx_digibordauthentication_site` `tx_digibordauthentication_site` INT UNSIGNED DEFAULT 0 NOT NULL;

image

  1. Inconsistent highlight for ADD โœ… Fixed by #80
  2. No highlight for ON at CREATE INDEX ... ON โœ… Fixed by #80 (parse error in the ALTER TABLE above)
  3. No highlight for CHANGE in ALTER TABLE ... CHANGE โœ… Fixed by #99
  4. Datatypes (except for TEXT!) and UNSIGNED not recognized as highlight (already mentioned further above). โœ… Fixed by #80 (parse error in the ALTER TABLE above) + โœ… Fixed by #99
  5. Default values not recognized as literals (already mentioned further above). โœ… Fixed by #73

ALTER TABLE some_table 
ADD some_field INT UNSIGNED NOT NULL DEFAULT '0'
AFTER some_other_field;

grafik

  1. AFTER not highlighted. โœ… Fixed by #99
  2. INT UNSIGNED and '0' as mentioned above.
  • INT โœ… Fixed by #80
  • UNSIGNED โœ… Fixed by #88
  • '0' โœ… Fixed by #73

Please keep the generated parser files in the repository

The upcoming Emacs 29 can be optionally compiled with the tree-sitter library, and comes with a user command that assists users in building and installing the grammar libraries they need. However, that command assumes the C/C++ parser and scanner source files are available in the corresponding repositories, so as to avoid requiring Emacs users to have too many development tools installed.

The recent removal of the generated parser files from the repository will cause that Emacs command to fail; users will be unable to build and install the tree-sitter grammar for SQL using your repository, unless they install quite a few more tools and build the library manually, something that is a significant obstacle for many Emacs users.

So please add those generated files back to the repository. The Emacs user community will thank you.

Thanks in advance.

`UNSIGNED` is missing

Causing queries like this to fail:

SELECT CAST("1" AS UNSIGNED INTEGER);

Example source: https://mariadb.com/kb/en/cast/ .

Tree:

image

'unsigned' does not appear in grammar.js at all and should probably be a keyword and part of the numeric type declarations.

Parsing nested CTEs

I found that the following valid PostgreSQL query involving nested CTEs is not parsed correctly by this grammar:

WITH top_cte AS (
  WITH nested_cte AS (
    SELECT 1 as one, 2 as two
  )
  SELECT one, two
  FROM nested_cte
)
SELECT *
FROM top_cte;

Received AST:

(program
  (ERROR
    (statement
      (keyword_with)
      (cte
        (identifier)
        (keyword_as)
        (ERROR
          (keyword_as))
        (statement
          (select
            (keyword_select)
            (select_expression
              (term
                (literal)
                (keyword_as)
                (identifier))
              (term
                (literal)
                (keyword_as)
                (identifier))))))
      (select
        (keyword_select)
        (select_expression
          (term
            (field
              (identifier)))
          (term
            (field
              (identifier)))))
      (from
        (keyword_from)
        (relation
          (object_reference
            (identifier))))))
  (statement
    (select
      (keyword_select)
      (select_expression
        (term
          (all_fields))))
    (from
      (keyword_from)
      (relation
        (object_reference
          (identifier))))))

This SO answer says nested CTEs are not valid, at least for T-SQL. I found that PostgreSQL has no problem executing that query, though:

image

Bug: support SELECT from more than two tables

As of right now a select like

SELECT *
FROM 
    Table1 a,
    Table2 b,
    ...
    TableN z
WHERE...

will result in an error on the table list. It does recognize both the first and the last table with its name and table_alias just fine but it does not recognize the table(s) in between.

Feature Request: Support for parameter placeholders

Hi!

In my daily work I have to deal with literal SQL queries that contain parameter placeholders in 99% of the cases.
Is it possible/planned/not planned to add support for them? The drivers I use expect ? as placeholder although there are other syntaxes.

Or is this something that can possibly be even added at the level of queries?

Cheers
Manuel

Support for Hive specific statements

Hi,
would it be possible to support Hive specific commands such STORED AS PARQUET or COMPUTE STATS?
This SQL extensions are supported by many databases these days (Redshift, Impala, Snowflake, Datafusion)

I would create a PR for that, if you are ok with that and once I have collected more examples.

Cheers,
Matt

T-SQL Alter Table

TSQL:

ALTER TABLE Class_Session
ADD
type varchar(255) NOT NULL DEFAULT ('online');

Error:

Expected [A-Za-z0-9_:] or [A-Za-z0-9_] but " " found. (sql)

Expected:
No error, this is valid sql

image

`keyword_language` is unused, renderet to `parser.c` as `aux_sym_*`

While preparing helix-editor/helix#7387 I noticed that a (keyword_language) query makes tree-sitter fail parsing with a NodeType error.

Upon investigation, I found that:

The symbol is defined in grammar.json.

But it is not defined in node-types.json.

In parser.c, it is defined as aux_sym_keyword_language_token1. ๐Ÿค” (see also the codegen)

(keyword_language) appears to be unused.

I think it can be removed - or shall it be used for function_language?

Support for floating point literals

_number is defined as integer:

    _number: _ => /\d+/,

_number: _ => /\d+/,

This causes a parse error in queries containing floating point numbers:

SELECT 3.5;

For a fix, the question is if _number should add an optional (?:\.\d+)? or if a secondary number type should be added.

$._number is used in quite a few places. Some of these places should probably allow floating point numbers (e.g. ranges), while others should not (like params to parametric_type().

Postgres Array function is not parsed correctly

select array(select 'x')

Is a valid SQL statement in Postgres. The odd thing is, that it does not seems to be documented.

Since we have the array node for statements like this

select array['x']

This actually gives the same result.

Quoted Table Indentifiers not parsed correctly

Statements like:

select b."A" from b

Result in the following AST:

statement [0, 0] - [0, 19]
  select [0, 0] - [0, 19]
    keyword_select [0, 0] - [0, 6]
    select_expression [0, 7] - [0, 19]
      term [0, 7] - [0, 19]
        value: field [0, 7] - [0, 19]
          table_alias: identifier [0, 7] - [0, 8]
          ERROR [0, 9] - [0, 17]
            identifier [0, 9] - [0, 12]
            keyword_from [0, 13] - [0, 17]
          name: identifier [0, 18] - [0, 19]

It is valid SQL syntax and necessary, when column names contain capital letters.

Parsing PostgreSQL JSON operators

I've noticed PostgreSQL JSON operators are not parsed correctly.

For example, the following query has parsing errors:

SELECT "user"->>'login' AS username FROM users;

Parsed AST:

(program
  (statement
    (select
      (keyword_select)
      (select_expression
        (term
          (binary_expression
            (literal)
            (ERROR)
            (literal))
          (keyword_as)
          (identifier))))
    (from
      (keyword_from)
      (relation
        (object_reference
          (identifier))))))

Question: Is recursion supported in this grammar?

Hello, we were wondering if this grammar supports recursion (especially for multiple queries)?

For example:

select * from foo where bar > (select foobar from barfoo);

Will something like this be supported by the grammar?

Negative integers are not parsed

Issue is _number being used as choice in literal, but also as part of _decimal_number.

    literal: $ => prec(2,
      choice(
        $._number,
        $._decimal_number,
        $._literal_string,
        $.keyword_true,
        $.keyword_false,
        $.keyword_null,
      ),
    ),
    _double_quote_string: _ => seq('"', /[^"]*/, '"'),
    _literal_string: $ => choice(
      seq("'", /[^']*/, "'"),
      $._double_quote_string,
    ),
    _number: _ => /\d+/,
    _decimal_number: $ => choice(
        seq(optional("-"), ".", $._number),
        seq(optional("-"), $._number, ".", $._number),
        seq(optional("-"), $._number, "."),
    ),

Therefore, \d+ is correct, and a new anonymous option _integral_number must be added, using seq(optional("-"), $._number).

Usage of parenthesis is wrong

In $.list we are using paren_list() which wraps parenthesis around a comma separated list. Therefore, statements like:

select 1 + (2 * 3, 2 * 3)

are parsed, even though they are not allows in SQL (I have just tested it against Postgres. I am sure it is wrong in any SQL dialect)

Here is the AST for this statement:

(program [0, 0] - [1, 0]
  (statement [0, 0] - [0, 25]
    (select [0, 0] - [0, 25]
      (keyword_select [0, 0] - [0, 6])
      (select_expression [0, 7] - [0, 25]
        (term [0, 7] - [0, 25]
          value: (binary_expression [0, 7] - [0, 25]
            left: (literal [0, 7] - [0, 8])
            right: (list [0, 11] - [0, 25]
              (binary_expression [0, 12] - [0, 17]
                left: (literal [0, 12] - [0, 13])
                right: (literal [0, 16] - [0, 17]))
              (binary_expression [0, 19] - [0, 24]
                left: (literal [0, 19] - [0, 20])
                right: (literal [0, 23] - [0, 24])))))))))

Update the repo to not require checking in the parser files

Right now everything generated via tree-sitter generate gets checked in, I'd like to change it so that only the grammar.js is checked into the repo and the artifacts can be generated by the tree-sitter tool but consumers of this parser.

SQL highlighting issue in helix-editor, using tree-sitter-sql

The original bug report can be found at helix-editor/helix#5186

Other than in the bug report, I am using now helix at latest master commit (edd0ba7 right now), but that makes no difference.

I updated tree-sitter-sql by overriding https://github.com/helix-editor/helix/blob/8e10c592b90383a90b7e4a9335d323f8fab99ab5/languages.toml#L1435-L1437
followed by rebuilding grammers:

$ hx --grammar fetch                                                                                                                                                                                                                                                                                    
Fetching 128 grammars
127 up to date git grammars
1 updated grammars
        sql now on 6adc9578afb5c361858213adae5e2de990e46cb1


$ hx --grammar build                                                                                                                                                                                                                                                                                    
Building 128 grammars
127 grammars already built
1 grammars built now
        ["sql"]

however, the highlighting still fails, which makes me conclude this will rather be a tree-sitter-sql bug. (Or two, actually.)

Summary

Example SQL:

INSERT INTO ... SET ... syntax:

INSERT INTO some_table
  field = "String does not get highlight in INSERT SET syntax";

helix edd0ba7 with tree-sitter-sql 9d98029: String value does not get highlight:

image

INSERT INTO ... (...) VALUES (...) syntax with multiple values records:

INSERT INTO some_table
  (field)
VALUES
  ("String value"),
  ("String value");

helix edd0ba7 with tree-sitter-sql 9d98029: Values of second record do not get highlight:

image

GROUP_CONCAT with SEPARATOR:

SELECT GROUP_CONCAT(uid SEPARATOR ",") 
FROM some_table 
GROUP BY some_field;

SELECT GROUP_CONCAT(uid SEPARATOR ",") 
FROM some_table 
GROUP BY some_field;

helix edd0ba7 with tree-sitter-sql 9d98029: Separator string appears to not be expected (only "," should be yellow):

image

Refactor: remove `count` node

count is just a special invocation node, so we could simplify this. There is no special reason for having both.

This change requires downstream changes in the highlights.scm

Dbt Jinja

Hi there,
I am using this Treesitter grammar for a while now in neovim. I work a lot with dbt, which consists of sql files with some jinja blocks inside. Is there any change to get this included in the grammar (or at least ignored).
I have some (pg)SQL files which look really weird, e.g. the Syntax Highlighting stops in the middle of the file and TS reports and error (even without jinja blocks near the error location).

Support `BEGIN ... END` blocks

BEGIN ... END blocks are used in MySQL and MSSQL.

Here's a minimal example (which GitHub also fails to highlight properly):

BEGIN
    CREATE TABLE FOO (bar INT);
END;
keyword_begin [0, 0] - [0, 5]
ERROR [1, 4] - [2, 3]
  keyword_table [1, 11] - [1, 16]

Multiline comments

Hi Derek! C-style multi-line comments are not parsed as a single (marginalia). I noticed it in the Helix editor which uses this project as its sql grammar source by default.

image

I took a look at https://github.com/m-novikov/tree-sitter-sql/blob/main/grammar.js#L1194 and saw that multiline comments were using seq("/*", /[^*]*\*+([^/*][^*]*\*+)*/, "/"). What do you think of updating the repo to support this? I'll be happy to test it out and open a PR.

`group_concat` as generic `invocation`

In 8c0ca86 group_concat was introduced. However, this function exists in other dialects was well with a different internal structure (aka the parameters of the list)

e.g.

select
    group_concat(col1, ',')
from tab

This statement is not parsed correctly. I think group_concat should a generic node of type invocation

Alternatively, implement the complete internal expression syntax. It looks quite similar between MariaDB, MySQL and Impala. In that example above, the optional separator is missing.

generated files and workflows

We need to check in the generated parser files to support downstream use, but that gets us into merge conflicts every time someone cuts a new branch before someone else lands theirs. It's only happened a few times so far, but if the pace picks up we should consider an action that can build those on pull requests and squash merge.

We could commit after build+test on the branch in progress, but that seems like a bit much.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.