tinypg's People

Contributors

dunbaratu, hvacengi, jissai, tomspilman

tinypg's Issues

Change the boilerplate comment to make it clearer not to edit Parser.cs, ParseTree.cs, Scanner.cs

TinyPG spits out Parser.cs, ParseTree.cs, and Scanner.cs with this stock comment on top:

// Generated by TinyPG v1.3 available at www.codeproject.com

We should make our version of TinyPG spit out a more verbose, clearer comment that tells people looking at our project:
1 - DO NOT EDIT THIS FILE. IT IS AUTOGENERATED BY A PROGRAM CALLED TINYPG.
2 - Where to get TinyPG (our version of it, in our GitHub home).
3 - How to run TinyPG to regenerate these files.
4 - That the real change is made by editing the kRISC.tpg file.

This is because we've gotten PRs multiple times from people trying to change the parser by editing these files directly. We should make it clearer what's happening.
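As a rough illustration of the four points above, here is a minimal Python sketch that assembles such a header as C#-style line comments. The function name `build_header` and its three parameters are hypothetical, and the placeholder strings stand in for the real fork URL and command line, which this issue does not specify. TinyPG itself is not written in Python, so this only shows the shape of the text.

```python
def build_header(tool_url, grammar_file, regen_command):
    """Build a warning header for autogenerated files (hypothetical helper)."""
    lines = [
        "DO NOT EDIT THIS FILE. IT IS AUTOGENERATED BY A PROGRAM CALLED TINYPG.",
        f"Get TinyPG from: {tool_url}",
        f"To regenerate this file, run: {regen_command}",
        f"To change the parser, edit {grammar_file} instead.",
    ]
    # Emit each line as a C#-style single-line comment.
    return "\n".join("// " + line for line in lines)


# Placeholders only: the real URL and command are not given in this issue.
header = build_header(
    tool_url="<URL of our TinyPG fork>",
    grammar_file="kRISC.tpg",
    regen_command="<command that runs TinyPG>",
)
print(header)
```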

We could also perhaps move these files under a folder called "Autogenerated", but that's more a change for the kOS project than for the TinyPG project. I mention it here for reference.

[performance] Regex matches inefficiently find irrelevant hits that get culled out right away.

This issue in the kOS project, KSP-KOS/KOS#2135, seems to imply that TinyPG itself can be edited to improve the regex performance of its scanner.

Example Text:

set   ident  to 1234 * sqrt(5432.1).[EOF]
             ^
             |
             |
    Imagine the Scanner's startpos is currently here
    because the scanner has already tokenized this much
    so far:
        set[whitespace skipped]ident[whitespace skipped]

That means the substring of the input file the scanner hasn't consumed yet is this:

to 1234 * sqrt(5432.1).[EOF]
^

And the zeroth position of that substring is where the caret is.

The Scanner currently does this in a for loop, inside LookAhead():

  • For each scantoken rule (regex pattern) defined in the grammar file:
    • Try to find a match within the remaining substring (to 1234 * sqrt(5432.1).[EOF] in the above example).
    • If a match is found AND that match started at index 0 and it is longer than the longest match so far:
      • Then this becomes the new match so far.
  • If no matches were found in the above loop, issue an error message - unexpected character.
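The loop above can be sketched in Python roughly as follows. This is an illustration of the described behavior, not TinyPG's generated C# code: the rule names come from the example in this issue, but the regex patterns, the `RULES` list, and the `look_ahead` function are assumptions made up for the sketch.

```python
import re

# Hypothetical token rules; the real patterns live in the grammar file.
RULES = [
    ("TO", re.compile(r"to\b")),
    ("INTEGER", re.compile(r"[0-9]+")),
    ("MULTIPLY", re.compile(r"\*")),
    ("IDENTIFIER", re.compile(r"\b(?!to\b)[A-Za-z_][A-Za-z0-9_]*")),
]


def look_ahead(remaining):
    """Return (rule name, matched text) for the longest match at index 0."""
    best_name, best_text = None, ""
    for name, pattern in RULES:
        # search() scans the whole remaining text, so it can return a
        # hit far past index 0 that the next line then discards.
        m = pattern.search(remaining)
        if m and m.start() == 0 and len(m.group()) > len(best_text):
            best_name, best_text = name, m.group()
    if best_name is None:
        raise SyntaxError(f"unexpected character {remaining[:1]!r}")
    return best_name, best_text


print(look_ahead("to 1234 * sqrt(5432.1)."))
```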

But notice the condition "that match started at index 0" above. Only matches that start at index 0 count, but the way this is implemented is to find matches at higher indices and then immediately throw them away. This is very inefficient, as discovered by @tsholmes.

For example, if the scanner was looking at the above example, the rule to match INTEGER will find a hit at index 3 on 1234, but since that's not at index 0, it will be thrown out. The rule to match MULTIPLY will find a hit on the substring * at index 8, but since that's not at index 0, it doesn't count and gets thrown out. It will also find a hit for IDENTIFIER on the substring sqrt at index 10, but since that's not at index 0, it doesn't count and gets thrown out. And so on for every other rule. The only match that doesn't get thrown away is the one for the keyword TO, which is kept because it was at index zero.
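To make those indices concrete, this small Python sketch reports where an unanchored search first hits for each rule on the remaining text. The patterns are assumptions chosen to behave like the rules named above, not the actual grammar definitions, and `first_hits` is a hypothetical helper.

```python
import re

REMAINING = "to 1234 * sqrt(5432.1)."

# Hypothetical patterns standing in for the grammar's scantoken rules.
RULES = {
    "TO": r"to\b",
    "INTEGER": r"[0-9]+",
    "MULTIPLY": r"\*",
    "IDENTIFIER": r"\b(?!to\b)[A-Za-z_][A-Za-z0-9_]*",
}


def first_hits(text):
    """Map each rule name to (start index, matched text) of its first hit."""
    hits = {}
    for name, pattern in RULES.items():
        m = re.search(pattern, text)
        hits[name] = (m.start(), m.group()) if m else None
    return hits


for name, hit in first_hits(REMAINING).items():
    print(name, hit)
```

Only the TO hit starts at index 0; the other three are work the scanner does and then discards.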

If you imagine a large file, this is a lot of matching that just gets thrown away right away.

By inserting an implied caret ("^") into the regex before running Regex.Match(), the Match routine itself can be told not to bother with any matches that don't start at index zero. Then instead of getting the match and immediately throwing it away, it just won't find the match in the first place.
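A minimal Python sketch of the anchoring idea follows, again with made-up patterns and helper names (`anchor`, `look_ahead_anchored`) rather than TinyPG's real code. One assumption worth flagging: the sketch wraps each pattern in a non-capturing group before prepending the caret, because otherwise a pattern containing alternation such as `a|b` would only have its first branch anchored.

```python
import re

# Hypothetical patterns standing in for the grammar's scantoken rules.
RULES = {
    "TO": r"to\b",
    "INTEGER": r"[0-9]+",
    "MULTIPLY": r"\*",
    "IDENTIFIER": r"\b(?!to\b)[A-Za-z_][A-Za-z0-9_]*",
}


def anchor(pattern):
    """Prepend an implied caret; the group keeps alternations fully anchored."""
    return re.compile("^(?:" + pattern + ")")


ANCHORED = {name: anchor(p) for name, p in RULES.items()}


def look_ahead_anchored(remaining):
    """Longest match at index 0; hits at other indices are never found."""
    best_name, best_text = None, ""
    for name, pattern in ANCHORED.items():
        m = pattern.search(remaining)  # can only succeed at index 0
        if m and len(m.group()) > len(best_text):
            best_name, best_text = name, m.group()
    return best_name, best_text


print(look_ahead_anchored("to 1234 * sqrt(5432.1)."))
```

With the caret in place, the check that the match starts at index 0 becomes redundant, since the regex engine rejects the other candidates itself.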
