tree-sitter-grammars / tree-sitter-xml

XML & DTD grammars for tree-sitter

License: MIT License

Python 5.38% JavaScript 24.54% Rust 8.30% Scheme 9.43% C 33.65% C++ 2.19% Makefile 9.63% Swift 4.27% Go 2.61%
dtd parser tree-sitter xml

tree-sitter-xml's People

Contributors

amaanq, charles-zablit, observeroftime, simonacca

tree-sitter-xml's Issues

feature: match start and end tags

Did you check the tree-sitter docs?

Is your feature request related to a problem? Please describe.

<From>Jani</from> is invalid in XML but the parser allows it.

Describe the solution you'd like

The parser should ensure that start tags and end tags have the same name.

Describe alternatives you've considered

Just leave it to the language server/linter.

Additional context

Specification: https://www.w3.org/TR/xml/#sec-starttags
Example: https://www.w3schools.com/xml/note_error.xml
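The alternative mentioned above (leaving the check to a linter) could be sketched roughly as follows. This is a minimal illustration, not part of the grammar: it assumes the tag names have already been extracted from the STag and ETag nodes into a flat list of events, and the function name and event format are hypothetical.

```python
def find_mismatched_tags(events):
    """Return (expected, found) pairs for end tags that do not close the open start tag.

    `events` is an iterable of (kind, name) tuples, where kind is "start" or "end".
    This event format is an assumption made for the sketch.
    """
    stack = []       # names of the currently open start tags
    mismatches = []
    for kind, name in events:
        if kind == "start":
            stack.append(name)
        elif kind == "end":
            # An end tag with nothing open is reported with expected=None.
            expected = stack.pop() if stack else None
            if expected != name:
                mismatches.append((expected, name))
    return mismatches


# <From>Jani</from> from the issue: names differ only in case.
print(find_mismatched_tags([("start", "From"), ("end", "from")]))
# [('From', 'from')]
```

XML names are case-sensitive, so the comparison is an exact string match.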

Installation instructions missing

I'd like to try out this nice project but didn't manage to install it by following the official docs. The command tree-sitter generate needs to be run from the directories that contain grammar.js, which is not obvious to new users. I also haven't found out how to install the parser, even on the cited page.
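As a small aid for the point about grammar.js, the sketch below lists the directories in a local checkout that contain a grammar.js file, which are the places tree-sitter generate would be run from. The function name and the choice to skip node_modules are assumptions made for this illustration.

```python
from pathlib import Path


def grammar_dirs(repo_root):
    """Return the directories under repo_root that directly contain a grammar.js file."""
    root = Path(repo_root)
    return sorted(
        p.parent
        for p in root.rglob("grammar.js")
        if "node_modules" not in p.parts  # skip vendored dependencies
    )


if __name__ == "__main__":
    # "." is a placeholder for the path to a local checkout of the repository.
    for directory in grammar_dirs("."):
        print(directory)
```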

Filter `AttValue` with `#any-of?`

Did you check existing issues?

  • I have read all the tree-sitter docs if it relates to using the parser
  • I have searched the existing issues

Tree-Sitter CLI Version, if relevant (output of tree-sitter --version)

tree-sitter 0.22.2

Describe the bug

Hey Devs 👋🏻,

This might be more of an understanding issue than a bug, because I'm new to the tree-sitter syntax.
I'm using tree-sitter-xml 0.6.1 alongside tree-sitter 0.22.2 in a Rust project,
in which I want to parse an XML file like this:

   ...
    <import>
        <module name="module1" />
        <module name="module2" />
        <module name="module3" />
        <module name="module4" />
        ...
    </import>
   ... 

My goal is to check if certain modules are listed which I try with this query:

(element
    (_
        (Name) @name
        (Attribute
            (AttValue) @module_name
        )
    )
    (#eq? @name "module")
    (#any-of? @module_name "module1" "module3" "module4")
)

This query produces no matches. Without the (#any-of? @module_name "module1" "module3" "module4") predicate I get all modules.

I expected it to match only "module1", "module3" and "module4".

Thanks in advance 😄

Steps To Reproduce/Bad Parse Tree

see in the description

Expected Behavior/Parse Tree

see in the description

Repro

see in the description
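One possible explanation, not confirmed in the issue, is that the text of the AttValue node includes the surrounding quote characters, so an exact comparison against module1 never succeeds while the unquoted Name comparison does. The sketch below only simulates that string comparison in plain Python; it does not call tree-sitter, and both helper names are hypothetical.

```python
def any_of(captured_text, candidates):
    """Mimic an exact string comparison of a capture's text against a list of candidates."""
    return captured_text in candidates


def strip_quotes(att_value):
    """Remove one pair of matching surrounding quotes, if present."""
    if len(att_value) >= 2 and att_value[0] == att_value[-1] and att_value[0] in "\"'":
        return att_value[1:-1]
    return att_value


wanted = ["module1", "module3", "module4"]
raw = '"module1"'  # assumed text of the AttValue node, quotes included

print(any_of(raw, wanted))                # False
print(any_of(strip_quotes(raw), wanted))  # True
```

If that assumption holds, the candidates in the query would need to carry the quotes as well, or the comparison would have to be done in application code after stripping them.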

bug: longer XML tags are errors

Did you check existing issues?

  • I have read all the tree-sitter docs if it relates to using the parser
  • I have searched the existing issues

Tree-Sitter CLI Version, if relevant (output of tree-sitter --version)

4e8a9e654f90834be77a80a33264a54434a2ead3

Describe the bug

When using longer XML tags, the parser seems to generate an incorrect tree (a mismatch?).

Note: I am using the C API.

Steps To Reproduce/Bad Parse Tree

Here's the bad parse tree:

(ERROR)(1:1 - 2:48)
    (prolog)(1:1 - 2:1)
        (XMLDecl)(1:1 - 1:39)
            (<?)(1:1 - 1:3)
            (xml)(1:3 - 1:6)
            (version)(1:7 - 1:14)
            (=)(1:14 - 1:15)
            (")(1:15 - 1:16)
            (VersionNum)(1:16 - 1:19)
            (")(1:19 - 1:20)
            (encoding)(1:21 - 1:29)
            (=)(1:29 - 1:30)
            (")(1:30 - 1:31)
            (EncName)(1:31 - 1:36)
            (")(1:36 - 1:37)
            (?>)(1:37 - 1:39)
    (STag)(2:1 - 2:21)
        (<)(2:1 - 2:2)
        (Name)(2:2 - 2:20)
        (>)(2:20 - 2:21)
    (content)(2:21 - 2:27)
        (CharData)(2:21 - 2:27)
    (</)(2:27 - 2:29)
    (Name)(2:29 - 2:47)
    (>)(2:47 - 2:48)

Expected Behavior/Parse Tree

(document)(1:1 - 2:44)
    (prolog)(1:1 - 2:1)
        (XMLDecl)(1:1 - 1:39)
            (<?)(1:1 - 1:3)
            (xml)(1:3 - 1:6)
            (version)(1:7 - 1:14)
            (=)(1:14 - 1:15)
            (")(1:15 - 1:16)
            (VersionNum)(1:16 - 1:19)
            (")(1:19 - 1:20)
            (encoding)(1:21 - 1:29)
            (=)(1:29 - 1:30)
            (")(1:30 - 1:31)
            (EncName)(1:31 - 1:36)
            (")(1:36 - 1:37)
            (?>)(1:37 - 1:39)
    (root: element)(2:1 - 2:44)
        (STag)(2:1 - 2:19)
            (<)(2:1 - 2:2)
            (Name)(2:2 - 2:18)
            (>)(2:18 - 2:19)
        (content)(2:19 - 2:25)
            (CharData)(2:19 - 2:25)
        (ETag)(2:25 - 2:44)
            (</)(2:25 - 2:27)
            (Name)(2:27 - 2:43)
            (>)(2:43 - 2:44)

Repro

<?xml version="1.1" encoding="UTF-8"?>
<exampleofaverylong>foobar</exampleofaverylong>
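The positions in the two listings can be checked by hand against the repro. The sketch below recomputes the spans for the second line, assuming the listings use 1-based columns with an exclusive end (as the one-character `<` token at 2:1 - 2:2 suggests). The function name is hypothetical and the regular expression only handles a single `<name>text</name>` element.

```python
import re


def tag_spans(line, row):
    """Compute 1-based, end-exclusive spans for a single <name>text</name> line."""
    m = re.fullmatch(r"<([^<>/\s]+)>([^<]*)</([^<>/\s]+)>", line)
    if m is None:
        raise ValueError("expected a single <name>text</name> element")

    def span(start, end):
        # Convert 0-based string offsets to the 1-based columns used in the listings.
        return f"{row}:{start + 1} - {row}:{end + 1}"

    return {
        "STag": span(0, m.end(1) + 1),
        "STag Name": span(m.start(1), m.end(1)),
        "CharData": span(m.start(2), m.end(2)),
        "ETag": span(m.start(3) - 2, len(line)),
        "ETag Name": span(m.start(3), m.end(3)),
    }


for node, position in tag_spans("<exampleofaverylong>foobar</exampleofaverylong>", 2).items():
    print(f"{node}: {position}")
```

For the 18-character name in the repro this gives a Name span of 2:2 - 2:20 and an element ending at 2:48, which agrees with the offsets in the bad parse tree. The offsets in the expected tree (Name at 2:2 - 2:18, element ending at 2:44) correspond to a 16-character name, so that listing appears to have been produced from a shorter tag; only its node structure should be compared.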

bug: Error parsing `]` in CharData

Did you check existing issues?

  • I have read all the tree-sitter docs if it relates to using the parser
  • I have searched the existing issues

Tree-Sitter CLI Version, if relevant (output of tree-sitter --version)

tree-sitter 0.22.6 (b40f342067a89cd6331bf4c27407588320f3c263)

Describe the bug

Parsing the following document

<test>]</test>

results in the following tree:

(ERROR [0, 0] - [1, 0]
  (STag [0, 0] - [0, 6]
    (Name [0, 1] - [0, 5]))
  (content [0, 6] - [1, 0]
    (CharData [0, 6] - [1, 0])))

Probably a false positive happening in the scanner when looking for CDATA delimiters.

Steps To Reproduce/Bad Parse Tree

echo '<test>]</test>' > test.xml
tree-sitter parse test.xml

Expected Behavior/Parse Tree

(document [0, 0] - [1, 0]
  root: (element [0, 0] - [0, 14]
    (STag [0, 0] - [0, 6]
      (Name [0, 1] - [0, 5]))
    (content [0, 6] - [0, 7]
      (CharData [0, 6] - [0, 7]))
    (ETag [0, 7] - [0, 14]
      (Name [0, 9] - [0, 13]))))

Repro

No response
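For reference, the XML 1.0 CharData production excludes only `<`, `&` and the sequence `]]>`, so a lone `]` is legal character data. The sketch below encodes that rule directly and cross-checks the repro with Python's standard-library parser; the helper name is hypothetical, and this is an illustration of the specification, not of the grammar's scanner.

```python
import re
import xml.etree.ElementTree as ET


def is_char_data(text):
    """CharData per XML 1.0 production [14]: no '<' or '&', and no ']]>' sequence."""
    return re.search(r"[<&]", text) is None and "]]>" not in text


print(is_char_data("]"))      # True
print(is_char_data("a]]>b"))  # False

# The standard-library parser accepts the document from the issue.
print(ET.fromstring("<test>]</test>").text)  # ]
```").text == "]"
</test>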

bug: parameter-entity references not fully supported

Did you check existing issues?

  • I have read all the tree-sitter docs if it relates to using the parser
  • I have searched the existing issues

Tree-Sitter CLI Version, if relevant (output of tree-sitter --version)

No response

Describe the bug

PE references are not parsed in all allowed contexts.

Steps To Reproduce/Bad Parse Tree

See https://github.com/tree-sitter-grammars/tree-sitter-xml/actions/runs/7993455735#summary-21829321669

Expected Behavior/Parse Tree

Most DTD nodes should be replaceable with PE references.

Repro

No response
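To make the term concrete: a parameter-entity reference has the form `%Name;` and may stand in for parts of a DTD. The sketch below finds such references in a hypothetical DTD fragment. The fragment is invented for illustration and is not taken from the linked CI run, and the regular expression is an ASCII-only approximation of the XML Name production.

```python
import re

# ASCII-only approximation of Name; the real production allows many more characters.
PE_REFERENCE = re.compile(r"%([A-Za-z_:][A-Za-z0-9_:.\-]*);")


def pe_references(dtd_text):
    """Return the names of all parameter-entity references found in dtd_text."""
    return PE_REFERENCE.findall(dtd_text)


# Hypothetical DTD fragment: a parameter entity declared once, then referenced
# inside an attribute-list declaration.
dtd = """<!ENTITY % common.attrs "id ID #IMPLIED">
<!ELEMENT note (#PCDATA)>
<!ATTLIST note %common.attrs;>
"""

print(pe_references(dtd))  # ['common.attrs']
```

The declaration on the first line is not reported because the `%` there is followed by whitespace, which marks a declaration and not a reference.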

bug: nvim-treesitter[xml] compile error

Did you check existing issues?

  • I have read all the tree-sitter docs if it relates to using the parser
  • I have searched the existing issues

Tree-Sitter CLI Version, if relevant (output of tree-sitter --version)

No response

Describe the bug

I'm getting a compile error for nvim-treesitter[xml] in scanner.c, in the function 'string push': the compiler sees an implicit declaration of the function "max". There is a macro in the file that should define it on Windows, but it doesn't seem to take effect.

Steps To Reproduce/Bad Parse Tree

  1. Add nvim-treesitter to neovim config
  2. Start neovim

Expected Behavior/Parse Tree

I expect nvim-treesitter to compile, but it doesn't.

Repro

No response

feature: support Wasm build for compiling as a Zed extension

Did you check the tree-sitter docs?

Is your feature request related to a problem? Please describe.

I am trying to submit an extension for Zed, but the build process fails with the error:
Error: Failed to instantiate wasm module: language version 12 is too old for wasm

Describe the solution you'd like

Wasm build to compile correctly

Describe alternatives you've considered

No response

Additional context

issue: zed-industries/extensions#590 (comment)
proposed Zed extension: https://github.com/sweetppro/zed-xml

Split into two repos?

I'm trying to add XML support to the Emacs world, but I cannot since your repo isn't a standard tree-sitter-[language] kind of repository. I think it would be nice if you could split this into two repos so you have one language per repo. 🤔

Thanks!
