antlr4-autosuggest-js's People

Contributors

oranoran, sergey-behavox, wisebird

antlr4-autosuggest-js's Issues

Stack overflow Issue when fetching suggestions

Hi @oranoran,

I am using the following grammar file for testing. If the user types 'SHOW EMPLOYEE ', the suggestions should be 'FOR' and 'WHERE'. Instead, the autosuggest function goes into an infinite loop and eventually causes a stack overflow.

grammar autocomplete;

query
: query_stmt EOF
;

query_stmt
: start_keyword entity_name ( filter_name expr )?
;

expr
: column_name
| expr operator_exp literal_value
| expr logical_exp expr
;

operator_exp
:OPERATOR
;

logical_exp
:K_AND
;

entity_name
: any_name
;

column_name
: any_name
;

any_name
: IDENTIFIER
| STRING_LITERAL
;

filter_name
: K_FOR
| K_WHERE
;

start_keyword
: K_SHOW
| K_SELECT
;

literal_value
: NUMERIC_LITERAL
| IDENTIFIER
| STRING_LITERAL
;

K_SHOW : S H O W;
K_SELECT : S E L E C T;
K_AND : A N D;
K_FOR : F O R;
K_WHERE : W H E R E;

OPERATOR
: ('=' | '!=' | '>=' | '<=' )
;

IDENTIFIER
: '"' (~'"' | '""')* '"'
| '`' (~'`' | '``')* '`'
| '[' ~']'* ']'
| [a-zA-Z_] [a-zA-Z_0-9]*
;

STRING_LITERAL
: '\'' ( ~'\'' | '\'\'' )* '\''
;

NUMERIC_LITERAL
: DIGIT+ ( '.' DIGIT* )? ( E [-+]? DIGIT+ )?
| '.' DIGIT+ ( E [-+]? DIGIT+ )?
;

SPACES
: [ \u000B\t\r\n] -> channel(HIDDEN)
;

UNEXPECTED_CHAR
: .
;

fragment DIGIT : [0-9];

fragment A : [aA];
fragment B : [bB];
fragment C : [cC];
fragment D : [dD];
fragment E : [eE];
fragment F : [fF];
fragment G : [gG];
fragment H : [hH];
fragment I : [iI];
fragment J : [jJ];
fragment K : [kK];
fragment L : [lL];
fragment M : [mM];
fragment N : [nN];
fragment O : [oO];
fragment P : [pP];
fragment Q : [qQ];
fragment R : [rR];
fragment S : [sS];
fragment T : [tT];
fragment U : [uU];
fragment V : [vV];
fragment W : [wW];
fragment X : [xX];
fragment Y : [yY];
fragment Z : [zZ];

Massive loop on autosuggest

EDIT: It is not an infinite loop. It is actually a massive loop that suggests every possible combination of upper-case and lower-case letters.

First of all, thanks for this. It is amazing work (ANTLR4 is hard).

I am trying to integrate autosuggest into the Ace Editor to provide suggestions for a grammar based on a small subset of SQL.

The issue I am seeing in a basic POC is that the call to the autosuggest function goes into a massive loop:

POC code

        const suggester = autosuggester.autosuggester(HatsLexer, HatsParser);
        let suggestions = suggester.autosuggest('SELE');
        console.log(suggestions);

Loop caused between these functions:

TokenSuggester.prototype._suggestViaLexerTransition

if (trans.isEpsilon) {
    this._suggest(tokenSoFar, trans.target, remainingText);
}

and

TokenSuggester.prototype._suggest

Logs:

SUGGEST: tokenSoFar=GROmaScHe remainingText= lexerState=140
tokenSuggester.js?8de6:26 SUGGEST: tokenSoFar=GROmaScHe remainingText= lexerState=16
tokenSuggester.js?8de6:26 SUGGEST: tokenSoFar=GROmaScHe remainingText= lexerState=158
tokenSuggester.js?8de6:26 SUGGEST: tokenSoFar=GROmaScHe remainingText= lexerState=89
tokenSuggester.js?8de6:26 SUGGEST: tokenSoFar=GROmaScHe remainingText= lexerState=172
tokenSuggester.js?8de6:26 SUGGEST: tokenSoFar=GROmaScHe remainingText= lexerState=91
tokenSuggester.js?8de6:26 SUGGEST: tokenSoFar=GROmaScH remainingText= lexerState=137
tokenSuggester.js?8de6:26 SUGGEST: tokenSoFar=GROmaScH remainingText= lexerState=63
tokenSuggester.js?8de6:26 SUGGEST: tokenSoFar=GROmaScH remainingText= lexerState=240
tokenSuggester.js?8de6:26 SUGGEST: tokenSoFar=GROmaScHE remainingText= lexerState=241
tokenSuggester.js?8de6:26 SUGGEST: tokenSoFar=GROmaScHE remainingText= lexerState=64
tokenSuggester.js?8de6:26 SUGGEST: tokenSoFar=GROmaScHE remainingText= lexerState=118
tokenSuggester.js?8de6:26 SUGGEST: tokenSoFar=GROmaScHE remainingText= lexerState=77

I am pretty new to ANTLR4 and I am trying to work out what is going on in this case. Any help would be great. Thanks
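
The size of the loop described in the edit note can be estimated without the library. When keywords are written with per-letter fragments such as `fragment S : [sS];`, every letter offers two branches, so a keyword of n letters has 2^n spellings. The sketch below is plain JavaScript and only illustrates that count; `caseVariants` is a hypothetical helper, not part of antlr4-autosuggest.

```javascript
// Illustrative only: enumerate every upper/lower-case spelling of a keyword,
// mimicking what a suggester would explore if it followed both branches of
// each per-letter fragment (for example `fragment S : [sS];`).
function caseVariants(keyword) {
    let variants = [''];
    for (const ch of keyword) {
        const lower = ch.toLowerCase();
        const upper = ch.toUpperCase();
        // Characters without case have a single form and do not double the count.
        const choices = lower === upper ? [ch] : [lower, upper];
        const next = [];
        for (const prefix of variants) {
            for (const choice of choices) {
                next.push(prefix + choice);
            }
        }
        variants = next;
    }
    return variants;
}

console.log(caseVariants('select').length); // 64
console.log(caseVariants('group').length);  // 32
```

Normalising each candidate to a single case before collecting it would collapse the 64 spellings of a six-letter keyword into one, which is why a case preference matters for grammars written this way.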

Not able to make correct suggestion

Hi @oranoran,

I am using the grammar file below for testing. If the user types 'SHOW EMPLOYEE' or 'SELECT EMPLOYEE', the suggestions should be 'FOR' and 'WHERE'. At the moment the result is empty.

grammar autocomplete;

query
: query_stmt EOF
;

query_stmt
: start_keyword literal_name filter_name
;

filter_name
: 'FOR'
| 'WHERE'
;

start_keyword
: 'SHOW'
| 'SELECT'
;

literal_name
: IDENTIFIER
;

IDENTIFIER: LETTER (LETTER | [0-9])*;

SPACES
: [ \u000B\t\r\n] -> channel(HIDDEN)
;

UNEXPECTED_CHAR
: .
;

fragment DIGIT : [0-9];

fragment LETTER : [a-zA-Z] ;

Not able to make correct suggestion

Hi,

I am using the following Test.G4, which is a very simple grammar for variable declaration.

grammar Test;

file: (varDecl)+ EOF;

varDecl
: type ID '=' NUMBER ';'
;

type: 'float' | 'int' | 'decimal' ; // user-defined types

ID : LETTER (LETTER | [0-9])* ;

NUMBER: DIGIT+;

fragment LETTER : [a-zA-Z] ;

fragment DIGIT : [0-9];

SPACES
: [ \u000B\t\r\n] -> channel(HIDDEN)
;

Ideally, if a user types "int a", the expected suggestion is '=', but no suggestion is found. Is anything wrong here?

Autocompletion not working when token has duplicate letters

Hey, I like the idea of this module - but I'm running into a bit of an issue with a simple auto-completion scenario.

I haven't taken a look at the source code yet but the npm module isn't generating any suggestions for an empty string and the following grammar:

grammar Expr;

prog:	SELECT ;

SELECT : S E L E C T;

WS: [ \t\n] -> channel(HIDDEN);

fragment S : [sS];
fragment E : [eE];
fragment L : [lL];
fragment C : [cC];
fragment T : [tT];

With empty input, I would have expected an auto suggestion of select, but no suggestion is given:

{
    "input": "",
    "errors": [],
    "suggestions": []
}

However, if I change the grammar to remove the repeated E letter within SELECT it works as expected:

prog:	SELECT ;

- SELECT : S E L E C T;
+ SELECT : S E L C T;

With the above change to the grammar, the autocomplete now suggests the initial token as expected, without the additional e of course:

{
    "input": "",
    "errors": [],
    "suggestions": [
        "selct"
    ]
}

I am using ANTLR 4.7.1, and the code is mostly from the README:

import { ExprLexer } from './gen/ExprLexer';
import { ExprParser } from './gen/ExprParser';
import ErrorAggregator from './error-aggregator';
import * as autosuggest from 'antlr4-autosuggest';

export function extract(input) {
    const errorAggregator = new ErrorAggregator();
    const autosuggester = autosuggest.autosuggester(ExprLexer, ExprParser, 'LOWER');

    let suggestions = autosuggester.autosuggest(input);

    console.log(JSON.stringify({
        input: input,
        errors: errorAggregator.getErrors(),
        suggestions: suggestions
    }, null, 4));
}

export default extract;

I can create a failing test / provide an example test project to help with debugging, just let me know how I can help 👍
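
One possible reading of this symptom, offered as a guess that nothing in the report confirms, is cycle protection that remembers a shared fragment for the whole search instead of only while that fragment is being walked. The toy model below does not use the library or ANTLR. It treats a token rule as a list of fragment names and compares the two bookkeeping styles; all names in it are illustrative.

```javascript
// Toy model, not the library's code: a token rule is a list of fragment
// names, and each fragment is a shared sub-rule that every use re-enters.
const SELECT_RULE = ['S', 'E', 'L', 'E', 'C', 'T'];

// Variant A: a fragment stays marked for the whole search, so the second
// use of fragment E is mistaken for a cycle and the walk gives up.
function expandWithGlobalVisited(rule) {
    const visited = new Set();
    let text = '';
    for (const fragment of rule) {
        if (visited.has(fragment)) {
            return null; // abandoned: looks like a loop
        }
        visited.add(fragment);
        text += fragment.toLowerCase();
    }
    return text;
}

// Variant B: a fragment is marked only while it is being walked, so using
// it again later in the same rule is allowed.
function expandWithScopedVisited(rule) {
    const active = new Set();
    let text = '';
    for (const fragment of rule) {
        // In this flat model the check cannot fire; it marks where a guard
        // against true re-entry would sit in a recursive walk.
        if (active.has(fragment)) {
            return null;
        }
        active.add(fragment);
        text += fragment.toLowerCase();
        active.delete(fragment); // leaving the fragment clears the mark
    }
    return text;
}

console.log(expandWithGlobalVisited(SELECT_RULE));               // null
console.log(expandWithGlobalVisited(['S', 'E', 'L', 'C', 'T'])); // selct
console.log(expandWithScopedVisited(SELECT_RULE));               // select
```

Variant A reproduces both observations from the report: no result for S E L E C T and "selct" once the repeated letter is removed. That match is suggestive only and would need to be checked against the actual traversal code.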

Add TypeScript bindings

This project needs TypeScript bindings so it can be used more easily in TypeScript projects.

Infinite loop in case of long input text

I am getting an infinite loop with the following grammar:

grammar Query;

/* Parser rules */
orexpression
    : andexpression ( OR andexpression )*
    ;

andexpression
    : notexpression ( notexpression | AND notexpression )*
    ;

notexpression
    : (NOT)? searchterm
    ;

searchterm
    : TERM
    | QUOTEDTERM
    | LEFT_PAREN orexpression RIGHT_PAREN
    | linkexpression
    ;

linkexpression
    : LINK LEFT_BRACE linkinfo RIGHT_BRACE LEFT_PAREN orexpression RIGHT_PAREN
    ;

linkinfo
    : QUOTEDTERM ':' TERM
    ;

query:
  orexpression EOF;

/* Lexer rules */
AND : 'AND' ;
OR  : 'OR' ;
NOT : 'NOT' ;
LINK: 'LINK' ;
LEFT_PAREN   : '(' ;
RIGHT_PAREN  : ')' ;
LEFT_BRACE   : '{' ;
RIGHT_BRACE  : '}' ;
QUOTEDTERM   : '"' ~('"')* '"' ;
UNTERMINATED_QUOTEDTERM : '"' ~('"')* ;
NONSPECIALCHAR : ~(' '|'\t'|'"' | '\u00A0' | '(' | ')') ;
TERM : (NONSPECIALCHAR|QUOTEDTERM) (NONSPECIALCHAR|QUOTEDTERM)+ ;
WS  : [ \t\u00A0] -> skip;
ErrorChar : . ;

and with the following input:

'LINK { \"Is Version Of\" : ARTICLE } ( title:test ) OR LINK { \"Is Version Of\" : ARTICLE } ( title:test ) OR LINK { \"Is Version Of\" : ARTICLE } ( title:test ) OR LINK '

Note that token suggestion works with a shortened input:

'LINK { \"Is Version Of\" : ARTICLE } ( title:test ) OR LINK { \"Is Version Of\" : ARTICLE } ( title:test ) OR LINK '

Thanks

Stack overflow for recursive grammar

I am getting a stack overflow with the following grammar:

clause
    : clause AND clause
    | action
    ;

action  : 'action' ;

AND : 'AND' ;

with the input 'action AND'.

Debug info:

TOKENS FOUND IN FIRST PASS:
[@-1,0:5='action',<1>,1:0]
[@-1,7:9='AND',<2>,1:7]
UNTOKENIZED:  
Parser rule names: clause, action
  State: 0 (type: RuleStartState)
    State: 4 (type: BasicState)
      State: 5 (type: BasicState)
        State: 2 (type: RuleStartState)
          State: 15 (type: BasicState)
            State: 16 (type: BasicState)
              State: 3 (type: RuleStopState)
                State: 6 (type: BasicState)
                  State: 12 (type: StarLoopEntryState)
                    State: 10 (type: StarBlockStartState)
                      State: 7 (type: BasicState)
                        State: 8 (type: BasicState)
                          State: 9 (type: BasicState)
Suggesting tokens for rule numbers: 1
SUGGEST: tokenSoFar= remainingText=  lexerState=1
SUGGEST: tokenSoFar= remainingText=  lexerState=5
NONMATCHING LEXER TOKEN: a remaining= 
                    State: 13 (type: LoopEndState)
                      State: 1 (type: RuleStopState)
                        State: 11 (type: BlockEndState)
                          State: 14 (type: StarLoopbackState)
                            State: 12 (type: StarLoopEntryState)
                              State: 10 (type: StarBlockStartState)
                                State: 7 (type: BasicState)
                                  State: 8 (type: BasicState)
                                    State: 9 (type: BasicState)
Suggesting tokens for rule numbers: 1
SUGGEST: tokenSoFar= remainingText=  lexerState=1
SUGGEST: tokenSoFar= remainingText=  lexerState=5
NONMATCHING LEXER TOKEN: a remaining= 
                              State: 13 (type: LoopEndState)
                                State: 1 (type: RuleStopState)
                                  State: 11 (type: BlockEndState)
                                    State: 14 (type: StarLoopbackState)
                                      State: 12 (type: StarLoopEntryState)
                                        State: 10 (type: StarBlockStartState)
                                          State: 7 (type: BasicState)
                                            State: 8 (type: BasicState)
                                              State: 9 (type: BasicState)
...
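
The trace above keeps returning to state 12 through 13, 1, 11 and 14. A minimal sketch of a walk that stops when it meets a state already on its current path is given below. The edge table is read off the indentation of the debug output and is heavily simplified: real ATN transitions carry labels and rule-call context, so this is not the library's traversal and not a verified fix.

```javascript
// Edges read off the indentation of the debug output (simplified: real ATN
// transitions also carry labels and rule-call context).
const EDGES = {
    12: [10, 13],
    10: [7],
    7: [8],
    8: [9],
    9: [],
    13: [1],
    1: [11],
    11: [14],
    14: [12], // loops back to the StarLoopEntryState
};

// Depth-first walk that refuses to re-enter a state that is already on the
// current path, so the 12 -> ... -> 14 -> 12 cycle is followed only once.
function walk(state, onPath = new Set(), order = []) {
    if (onPath.has(state)) {
        return order; // cycle detected: stop instead of recursing forever
    }
    onPath.add(state);
    order.push(state);
    for (const next of EDGES[state]) {
        walk(next, onPath, order);
    }
    onPath.delete(state);
    return order;
}

console.log(walk(12)); // [ 12, 10, 7, 8, 9, 13, 1, 11, 14 ]
```

Without the `onPath` check the same walk recurses until the call stack is exhausted, which matches the shape of the reported overflow.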

Expose matched rule name

I wonder if it's possible to expose the matching rule name as part of the autosuggest API. For example, consider the following SQL input string that we would like to provide suggestions for:

SELECT * from <cursor>

If we've defined the following grammar:

grammar Expr;

prog:
    SELECT
    selection
    FROM table
    ;

selection:
    '*'
    ;

table: IDENTIFIER  ;

// Keywords
SELECT : [sS] [eE] [lL] [eE] [cC] [tT] ;
FROM : [fF] [rR] [oO] [mM] ;

IDENTIFIER: [a-zA-Z] [a-zA-Z_0-9]+ ;

WS: [ \t\n] -> channel(HIDDEN);

From the cursor position, I can see that the grammar expects to parse the table rule next, which happens to be an IDENTIFIER token.

Currently, if we use this library, we get the following list of suggestions:

{
    "input": "select * from ",
    "errors": [],
    "suggestions": [
        "aa",
        "ab",
        "ac",
        "ad",
        "ae",
        "af",
        // ... etc etc ...
        "zr",
        "zs",
        "zt",
        "zu",
        "zv",
        "zw",
        "zx",
        "zy",
        "zz"
    ]
}

This module generates a list of valid identifiers. If it exposed the matched rule name as part of its API, we could see that the matching rule is table and apply our own domain knowledge to provide a richer suggestion list, for example by looking at the database schema and suggesting the available table names instead of guessing valid identifiers.

Let me know your thoughts! 👍
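
A sketch of how a caller might use such an API follows. The result shape `{ rule, suggestions }` is hypothetical and is not something the library returns. `refineSuggestions`, the provider map, and the table names are likewise invented for illustration.

```javascript
// Hypothetical result shape: `rule` is the requested feature, not an existing
// field. `providers` maps a parser rule name to a function that returns
// domain-specific values for that rule.
function refineSuggestions(result, providers) {
    const provider = providers.get(result.rule);
    // Fall back to the generated suggestions when no provider is registered.
    return provider ? provider() : result.suggestions;
}

const providers = new Map([
    // Stand-in for a schema lookup; the table names are made up.
    ['table', () => ['employees', 'departments']],
]);

console.log(refineSuggestions({ rule: 'table', suggestions: ['aa', 'ab'] }, providers));
// [ 'employees', 'departments' ]
console.log(refineSuggestions({ rule: 'prog', suggestions: ['select'] }, providers));
// [ 'select' ]
```

Keeping the fallback means keyword suggestions continue to work unchanged, and only rules with a registered provider are replaced by domain values.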

Not autosuggesting the values after brute force

Thanks again for this code, it is really great.

I am having an issue integrating it with my local JS parser (using ANTLR4 4.9.3).

I have a SQL-like grammar (a sort of subset). When I run autosuggest, it starts what looks like a brute-force search and in the end does not find the proper suggestion. For example, when I start typing S for SELECT, this is what I see from console.log:

....
SUGGEST: tokenSoFar=DESCH remainingText= lexerState=131
debug.js?c20a:2 SUGGEST: tokenSoFar=DESCH remainingText= lexerState=63
debug.js?c20a:2 SUGGEST: tokenSoFar=DESCH remainingText= lexerState=137
debug.js?c20a:2 SUGGEST: tokenSoFar=DESCH remainingText= lexerState=63
debug.js?c20a:2 SUGGEST: tokenSoFar=DESC remainingText= lexerState=169
debug.js?c20a:2 SUGGEST: tokenSoFar=DESC remainingText= lexerState=28
debug.js?c20a:2 SUGGEST: tokenSoFar=DESC remainingText= lexerState=174
debug.js?c20a:2 SUGGEST: tokenSoFar=DESC remainingText= lexerState=30
debug.js?c20a:2       Not following visited 58->(1) 9
debug.js?c20a:2       Not following visited 109->(1) 29
debug.js?c20a:2       Not following visited 58->(1) 9
debug.js?c20a:2       Not following visited 109->(1) 29
debug.js?c20a:2 DROPPING non-parseable suggestion: GROM
debug.js?c20a:2 DROPPING non-parseable suggestion: GROMA
debug.js?c20a:2 DROPPING non-parseable suggestion: GROMAND
debug.js?c20a:2 DROPPING non-parseable suggestion: GROMANDELIT
debug.js?c20a:2 DROPPING non-parseable suggestion: GROMANDECT
debug.js?c20a:2 DROPPING non-parseable suggestion: GROMANDEC
debug.js?c20a:2 DROPPING non-parseable suggestion: GROMANDE
....

In the end it suggests only:

['=', '!=', ')', ',']

I cannot tell whether the issue is in the lexer definition or whether I am doing something wrong. It would be great to get some help.

Cannot read property 'ruleToStartState' of undefined in TokenSuggester

Hi @oranoran,

There is an issue with the TokenSuggester function shown below. The atn property is not available on the lexer file that antlr4 (v4.7) generates. It appears that the lexer files in your project's testGrammars folder currently have an atn property defined, which I believe was added manually, so the test cases do not fail.

If we use 'this._lexer._interp.atn.ruleToStartState' instead of 'this._lexer.atn.ruleToStartState', it should work properly in all cases.

TokenSuggester.prototype._findLexerStateByRuleNumber = function (ruleNumber) {
    return this._lexer.atn.ruleToStartState.slice(ruleNumber, ruleNumber + 1)[0];
};

The same applies to the function below.

TokenSuggester.prototype._toLexerState = function (parserState) {
    var lexerState = this._lexer.atn.states.find((x) => { return (x.stateNumber === parserState.stateNumber); });
    if (lexerState == null) {
        debug('No lexer state matches parser state ' + parserState + ', not suggesting completions.');
    }
    return lexerState;
};
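
A defensive variant of the change proposed in this report could look for the ATN in both places. This is a sketch based only on the two property paths named above (`atn` and `_interp.atn`). `resolveLexerAtn` is a hypothetical helper, and the standalone `findLexerStateByRuleNumber` mirrors the prototype method only loosely.

```javascript
// Sketch: resolve the lexer ATN whether it sits on the lexer itself (as the
// report says the project's test grammars have it) or on the interpreter
// object `_interp` (where the report says generated lexers expose it).
function resolveLexerAtn(lexer) {
    if (lexer.atn) {
        return lexer.atn;
    }
    if (lexer._interp && lexer._interp.atn) {
        return lexer._interp.atn;
    }
    throw new Error('Lexer exposes no ATN on `atn` or `_interp.atn`');
}

// Same lookup as the original method, written as a free function so it can
// be tried with plain objects.
function findLexerStateByRuleNumber(lexer, ruleNumber) {
    return resolveLexerAtn(lexer).ruleToStartState[ruleNumber];
}

// Plain objects standing in for a generated lexer and a hand-patched one.
const generatedLexer = { _interp: { atn: { ruleToStartState: ['s0', 's1'] } } };
const patchedLexer = { atn: { ruleToStartState: ['p0'] } };

console.log(findLexerStateByRuleNumber(generatedLexer, 1)); // s1
console.log(findLexerStateByRuleNumber(patchedLexer, 0));   // p0
```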

Browser-compatible

Hey,
thanks for the awesome work and inspiration.

I experimented with this library and it looks good.
However, it has a dependency that blocks its use in web projects: debug, which is purely Node.js.

Removing it should not be a big deal, as antlr4 itself is browser-compatible.
Moreover, it does not seem to be a crucial library for the project.
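
If the dependency were dropped as suggested, a few lines of plain JavaScript could stand in for it. The sketch below is one possible shape under that assumption. `createLogger`, its `enabled` flag, and the `sink` parameter are illustrative names, not part of the library or of the debug package.

```javascript
// Sketch of a dependency-free stand-in for a debug logger that runs in both
// Node.js and browsers. All names here are illustrative.
function createLogger(namespace, enabled, sink = console.log) {
    if (!enabled) {
        return function noop() {}; // logging disabled: calls do nothing
    }
    return function log(...args) {
        // Prefix every message with the namespace, then pass it to the sink.
        sink(`[${namespace}]`, ...args);
    };
}

const debugLog = createLogger('autosuggest', true);
debugLog('SUGGEST: lexerState=', 5); // prints: [autosuggest] SUGGEST: lexerState= 5
```

Passing the sink as a parameter keeps the helper testable and lets a web project route messages somewhere other than the console.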
