oranoran / antlr4-autosuggest-js

JavaScript auto-suggest engine for ANTLR4 grammars

License: MIT License

JavaScript 99.97% Shell 0.03%

antlr4-autosuggest-js's Issues

Integrating autosuggest with HTML component

Hello, how do I connect this autosuggest with an HTML input component?
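
One possible way to wire this up, as a minimal sketch rather than the project's documented approach: the `autosuggester(Lexer, Parser)` and `autosuggest(text)` calls are the ones shown in other issues on this page, while `makeInputHandler`, `renderSuggestions`, `MyLexer`, `MyParser`, and the element ids are hypothetical names introduced for illustration.

```javascript
// Hypothetical helper: builds an "input" event handler around a suggester.
// Assumption: `suggester` exposes autosuggest(text) and returns an array of strings.
function makeInputHandler(suggester, renderSuggestions) {
    return function onInput(event) {
        const text = event.target.value;
        let suggestions = [];
        try {
            suggestions = suggester.autosuggest(text);
        } catch (err) {
            // Assumption: on failure, show nothing rather than interrupt typing.
            suggestions = [];
        }
        renderSuggestions(suggestions);
        return suggestions;
    };
}

// Browser usage (assumed markup: <input id="query" list="query-suggestions">
// and <datalist id="query-suggestions">):
//
// const suggester = autosuggest.autosuggester(MyLexer, MyParser);
// const datalist = document.getElementById('query-suggestions');
// document.getElementById('query').addEventListener('input',
//     makeInputHandler(suggester, (items) => {
//         datalist.replaceChildren(...items.map((s) => {
//             const option = document.createElement('option');
//             option.value = s;
//             return option;
//         }));
//     }));
```

Keeping the handler independent of the DOM makes it easy to swap the `<datalist>` for a custom dropdown. How each suggestion should be merged into the text already typed depends on what the suggester returns for partial tokens, which this page does not specify.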

Stack overflow Issue when fetching suggestions

Hi @oranoran,

I am using the following grammar file for testing. If the user types 'SHOW EMPLOYEE ', the suggestions should be 'FOR' and 'WHERE'. Instead, the autosuggest function goes into an infinite loop and eventually causes a stack overflow.

grammar autocomplete;

query
: query_stmt EOF
;

query_stmt
: start_keyword entity_name ( filter_name expr )?
;

expr
: column_name
| expr operator_exp literal_value
| expr logical_exp expr
;

operator_exp
:OPERATOR
;

logical_exp
:K_AND
;

entity_name
: any_name
;

column_name
: any_name
;

any_name
: IDENTIFIER
| STRING_LITERAL
;

filter_name
: K_FOR
| K_WHERE
;

start_keyword
: K_SHOW
| K_SELECT
;

literal_value
: NUMERIC_LITERAL
| IDENTIFIER
| STRING_LITERAL
;

K_SHOW : S H O W;
K_SELECT : S E L E C T;
K_AND : A N D;
K_FOR : F O R;
K_WHERE : W H E R E;

OPERATOR
: ('=' | '!=' | '>=' | '<=' )
;

IDENTIFIER
: '"' (~'"' | '""')* '"'
| '`' (~'`' | '``')* '`'
| '[' ~']'* ']'
| [a-zA-Z_] [a-zA-Z_0-9]*
;

STRING_LITERAL
: '\'' ( ~'\'' | '\'\'' )* '\''
;

NUMERIC_LITERAL
: DIGIT+ ( '.' DIGIT* )? ( E [-+]? DIGIT+ )?
| '.' DIGIT+ ( E [-+]? DIGIT+ )?
;

SPACES
: [ \u000B\t\r\n] -> channel(HIDDEN)
;

UNEXPECTED_CHAR
: .
;

fragment DIGIT : [0-9];

fragment A : [aA];
fragment B : [bB];
fragment C : [cC];
fragment D : [dD];
fragment E : [eE];
fragment F : [fF];
fragment G : [gG];
fragment H : [hH];
fragment I : [iI];
fragment J : [jJ];
fragment K : [kK];
fragment L : [lL];
fragment M : [mM];
fragment N : [nN];
fragment O : [oO];
fragment P : [pP];
fragment Q : [qQ];
fragment R : [rR];
fragment S : [sS];
fragment T : [tT];
fragment U : [uU];
fragment V : [vV];
fragment W : [wW];
fragment X : [xX];
fragment Y : [yY];
fragment Z : [zZ];

Massive loop on autosuggest

EDIT: It is not an infinite loop; it is a massive loop that suggests every possible combination of upper- and lower-case letters.

First of all, thanks for this; it is amazing work (ANTLR4 is hard).

I am trying to integrate autosuggest into the Ace Editor to provide suggestions for a SQL-based grammar (a small subset).

In a basic POC, the call to the autosuggest function goes into a massive loop:

POC code

        const suggester = autosuggester.autosuggester(HatsLexer, HatsParser);
        let suggestions = suggester.autosuggest('SELE');
        console.log(suggestions);

Loop caused between these functions:

TokenSuggester.prototype._suggestViaLexerTransition

if (trans.isEpsilon) {
    this._suggest(tokenSoFar, trans.target, remainingText);
}

and

TokenSuggester.prototype._suggest

Logs:

SUGGEST: tokenSoFar=GROmaScHe remainingText= lexerState=140
tokenSuggester.js?8de6:26 SUGGEST: tokenSoFar=GROmaScHe remainingText= lexerState=16
tokenSuggester.js?8de6:26 SUGGEST: tokenSoFar=GROmaScHe remainingText= lexerState=158
tokenSuggester.js?8de6:26 SUGGEST: tokenSoFar=GROmaScHe remainingText= lexerState=89
tokenSuggester.js?8de6:26 SUGGEST: tokenSoFar=GROmaScHe remainingText= lexerState=172
tokenSuggester.js?8de6:26 SUGGEST: tokenSoFar=GROmaScHe remainingText= lexerState=91
tokenSuggester.js?8de6:26 SUGGEST: tokenSoFar=GROmaScH remainingText= lexerState=137
tokenSuggester.js?8de6:26 SUGGEST: tokenSoFar=GROmaScH remainingText= lexerState=63
tokenSuggester.js?8de6:26 SUGGEST: tokenSoFar=GROmaScH remainingText= lexerState=240
tokenSuggester.js?8de6:26 SUGGEST: tokenSoFar=GROmaScHE remainingText= lexerState=241
tokenSuggester.js?8de6:26 SUGGEST: tokenSoFar=GROmaScHE remainingText= lexerState=64
tokenSuggester.js?8de6:26 SUGGEST: tokenSoFar=GROmaScHE remainingText= lexerState=118
tokenSuggester.js?8de6:26 SUGGEST: tokenSoFar=GROmaScHE remainingText= lexerState=77

I am fairly new to ANTLR4 and am trying to work out what is going on in this case. Any help would be great. Thanks.
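
To illustrate why enumerating case combinations grows so quickly, here is a minimal sketch. It is not the library's code; `casings` is a hypothetical helper. When every letter of a keyword is defined through a fragment such as `[sS]`, each letter offers two choices, so a keyword of n letters has 2^n spellings.

```javascript
// Enumerates every upper/lower-case spelling of a keyword.
// Illustration only: this is not how the library walks the lexer ATN.
function casings(keyword) {
    let results = [''];
    for (const ch of keyword) {
        const lower = ch.toLowerCase();
        const upper = ch.toUpperCase();
        // Characters without a case distinction contribute a single option.
        const options = lower === upper ? [ch] : [lower, upper];
        const next = [];
        for (const prefix of results) {
            for (const option of options) {
                next.push(prefix + option);
            }
        }
        results = next;
    }
    return results;
}

console.log(casings('or'));            // [ 'or', 'oR', 'Or', 'OR' ]
console.log(casings('select').length); // 64
```

A grammar with dozens of such keywords multiplies this cost per keyword. Code in another issue on this page passes 'LOWER' as a third argument to `autosuggester`; whether that avoids the enumeration in this setup is not confirmed here.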

Not able to make correct suggestion

Hi @oranoran,

I am using the grammar file below for testing. If the user types 'SHOW EMPLOYEE' or 'SELECT EMPLOYEE', the suggestions should be 'FOR' and 'WHERE'; however, the result is currently empty.

grammar autocomplete;

query
: query_stmt EOF
;

query_stmt
: start_keyword literal_name filter_name
;

filter_name
: 'FOR'
| 'WHERE'
;

start_keyword
: 'SHOW'
| 'SELECT'
;

literal_name
: IDENTIFIER
;

IDENTIFIER: LETTER (LETTER | [0-9])*;

SPACES
: [ \u000B\t\r\n] -> channel(HIDDEN)
;

UNEXPECTED_CHAR
: .
;

fragment DIGIT : [0-9];

fragment LETTER : [a-zA-Z] ;

Not able to make correct suggestion

Hi,

I am using the following Test.G4, which is a very simple grammar for variable declaration.

grammar Test;

file: (varDecl)+ EOF;

varDecl
: type ID '=' NUMBER ';'
;

type: 'float' | 'int' | 'decimal' ; // user-defined types

ID : LETTER (LETTER | [0-9])* ;

NUMBER: DIGIT+;

fragment LETTER : [a-zA-Z] ;

fragment DIGIT : [0-9];

SPACES
: [ \u000B\t\r\n] -> channel(HIDDEN)
;

Ideally, if a user types "int a", the expected suggestion is '=', but no suggestion is found. Is anything wrong here?

Autocompletion not working when token has duplicate letters

Hey, I like the idea of this module - but I'm running into a bit of an issue with a simple auto-completion scenario.

I haven't taken a look at the source code yet but the npm module isn't generating any suggestions for an empty string and the following grammar:

grammar Expr;

prog:	SELECT ;

SELECT : S E L E C T;

WS: [ \t\n] -> channel(HIDDEN);

fragment S : [sS];
fragment E : [eE];
fragment L : [lL];
fragment C : [cC];
fragment T : [tT];

When there is no input, I would have expected an auto suggestion of select, but no suggestion is given:

{
    "input": "",
    "errors": [],
    "suggestions": []
}

However, if I change the grammar to remove the repeated E letter within SELECT it works as expected:

prog:	SELECT ;

- SELECT : S E L E C T;
+ SELECT : S E L C T;

With the above change to the grammar, the autocomplete now suggests the initial token as expected, without the additional e of course:

{
    "input": "",
    "errors": [],
    "suggestions": [
        "selct"
    ]
}

I am using ANTLR 4.7.1, and the code is mostly from the README:

import { ExprLexer } from './gen/ExprLexer';
import { ExprParser } from './gen/ExprParser';
import ErrorAggregator from './error-aggregator';
import * as autosuggest from 'antlr4-autosuggest';

export function extract(input) {
    const errorAggregator = new ErrorAggregator();
    const autosuggester = autosuggest.autosuggester(ExprLexer, ExprParser, 'LOWER');

    let suggestions = autosuggester.autosuggest(input);

    console.log(JSON.stringify({
        input: input,
        errors: errorAggregator.getErrors(),
        suggestions: suggestions
    }, null, 4));
}

export default extract;

I can create a failing test / provide an example test project to help with debugging, just let me know how I can help 👍

Add TypeScript bindings

This project needs TypeScript bindings so it can be used more easily in TypeScript projects.

Infinite loop in case of long input text

I am getting an infinite loop for the following grammar:

grammar Query;

/* Parser rules */
orexpression
    : andexpression ( OR andexpression )*
    ;

andexpression
    : notexpression ( notexpression | AND notexpression )*
    ;

notexpression
    : (NOT)? searchterm
    ;

searchterm
    : TERM
    | QUOTEDTERM
    | LEFT_PAREN orexpression RIGHT_PAREN
    | linkexpression
    ;

linkexpression
    : LINK LEFT_BRACE linkinfo RIGHT_BRACE LEFT_PAREN orexpression RIGHT_PAREN
    ;

linkinfo
    : QUOTEDTERM ':' TERM
    ;

query:
  orexpression EOF;

/* Lexer rules */
AND : 'AND' ;
OR  : 'OR' ;
NOT : 'NOT' ;
LINK: 'LINK' ;
LEFT_PAREN   : '(' ;
RIGHT_PAREN  : ')' ;
LEFT_BRACE   : '{' ;
RIGHT_BRACE  : '}' ;
QUOTEDTERM   : '"' ~('"')* '"' ;
UNTERMINATED_QUOTEDTERM : '"' ~('"')* ;
NONSPECIALCHAR : ~(' '|'\t'|'"' | '\u00A0' | '(' | ')') ;
TERM : (NONSPECIALCHAR|QUOTEDTERM) (NONSPECIALCHAR|QUOTEDTERM)+ ;
WS  : [ \t\u00A0] -> skip;
ErrorChar : . ;

The loop occurs with the following input: 'LINK { \"Is Version Of\" : ARTICLE } ( title:test ) OR LINK { \"Is Version Of\" : ARTICLE } ( title:test ) OR LINK { \"Is Version Of\" : ARTICLE } ( title:test ) OR LINK

Note that with a shorter input, 'LINK { \"Is Version Of\" : ARTICLE } ( title:test ) OR LINK { \"Is Version Of\" : ARTICLE } ( title:test ) OR LINK , token suggestion works.

Thanks

Stack overflow for recursive grammar

I am getting a stack overflow for the following grammar:

clause
    : clause AND clause
    | action
    ;

action  : 'action' ;

AND : 'AND' ;

The overflow occurs with the input: action AND

Debug info:

TOKENS FOUND IN FIRST PASS:
[@-1,0:5='action',<1>,1:0]
[@-1,7:9='AND',<2>,1:7]
UNTOKENIZED:  
Parser rule names: clause, action
  State: 0 (type: RuleStartState)
    State: 4 (type: BasicState)
      State: 5 (type: BasicState)
        State: 2 (type: RuleStartState)
          State: 15 (type: BasicState)
            State: 16 (type: BasicState)
              State: 3 (type: RuleStopState)
                State: 6 (type: BasicState)
                  State: 12 (type: StarLoopEntryState)
                    State: 10 (type: StarBlockStartState)
                      State: 7 (type: BasicState)
                        State: 8 (type: BasicState)
                          State: 9 (type: BasicState)
Suggesting tokens for rule numbers: 1
SUGGEST: tokenSoFar= remainingText=  lexerState=1
SUGGEST: tokenSoFar= remainingText=  lexerState=5
NONMATCHING LEXER TOKEN: a remaining= 
                    State: 13 (type: LoopEndState)
                      State: 1 (type: RuleStopState)
                        State: 11 (type: BlockEndState)
                          State: 14 (type: StarLoopbackState)
                            State: 12 (type: StarLoopEntryState)
                              State: 10 (type: StarBlockStartState)
                                State: 7 (type: BasicState)
                                  State: 8 (type: BasicState)
                                    State: 9 (type: BasicState)
Suggesting tokens for rule numbers: 1
SUGGEST: tokenSoFar= remainingText=  lexerState=1
SUGGEST: tokenSoFar= remainingText=  lexerState=5
NONMATCHING LEXER TOKEN: a remaining= 
                              State: 13 (type: LoopEndState)
                                State: 1 (type: RuleStopState)
                                  State: 11 (type: BlockEndState)
                                    State: 14 (type: StarLoopbackState)
                                      State: 12 (type: StarLoopEntryState)
                                        State: 10 (type: StarBlockStartState)
                                          State: 7 (type: BasicState)
                                            State: 8 (type: BasicState)
                                              State: 9 (type: BasicState)
...
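
The log above revisits the same states (12, 10, 7, 8, 9, ...) at ever deeper nesting. A common guard against that kind of runaway recursion is to record which states have already been explored. The sketch below is an illustration under stated assumptions, not the library's fix: `transitions` is a made-up graph whose cycle loosely mirrors the log, and `reachableStates` is a hypothetical helper.

```javascript
// Hypothetical graph: each state number maps to the states reachable from it.
// The cycle 12 -> 10 -> 7 -> 11 -> 14 -> 12 loosely mirrors the loop in the log.
const transitions = {
    12: [10, 13],
    10: [7],
    7: [11],
    11: [14],
    14: [12],
    13: [],
};

// Depth-first walk that refuses to re-enter a state it has already explored.
function reachableStates(start, graph) {
    const visited = new Set();
    const order = [];
    function visit(state) {
        if (visited.has(state)) {
            return; // already explored: do not recurse again
        }
        visited.add(state);
        order.push(state);
        for (const next of graph[state] || []) {
            visit(next);
        }
    }
    visit(start);
    return order;
}

console.log(reachableStates(12, transitions)); // [ 12, 10, 7, 11, 14, 13 ]
```

A real parser ATN is harder than this, because the same state can legitimately be reached again in a different rule-invocation context, so a production guard would probably need to key on more than the state number. The "Not following visited" lines in a later log on this page suggest that some guard of this kind exists in the version used there.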

Expose matched rule name

I wonder if it's possible to expose the matching rule name as part of the autosuggest API. For example, consider the following SQL input string that we would like to provide suggestions for:

SELECT * from <cursor>

If we've defined the following grammar:

grammar Expr;

prog:
    SELECT
    selection
    FROM table
    ;

selection:
    '*'
    ;

table: IDENTIFIER  ;

// Keywords
SELECT : [sS] [eE] [lL] [eE] [cC] [tT] ;
FROM : [fF] [rR] [oO] [mM] ;

IDENTIFIER: [a-zA-Z] [a-zA-Z_0-9]+ ;

WS: [ \t\n] -> channel(HIDDEN);

From the cursor position, I can see that the grammar expects to parse the table rule next, which happens to be an IDENTIFIER token.

Currently, if we use this library, we get the following list of suggestions:

{
    "input": "select * from ",
    "errors": [],
    "suggestions": [
        "aa",
        "ab",
        "ac",
        "ad",
        "ae",
        "af",
        // ... etc etc ...
        "zr",
        "zs",
        "zt",
        "zu",
        "zv",
        "zw",
        "zx",
        "zy",
        "zz"
    ]
}

This module generates a list of valid identifiers. If it also exposed the matched rule name as part of its API, we could see that the matching rule is table and apply our own domain knowledge to provide a richer suggestion list, for example by looking at the database schema and suggesting the available table names instead of guessing valid identifiers.

Let me know your thoughts! 👍
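
To make the request concrete, here is a sketch of how a caller might use such information if it were available. Everything in it is hypothetical: nothing on this page says the library returns rule names, and the `{ text, ruleName }` shape, `refineSuggestions`, and the table names are invented for illustration.

```javascript
// Hypothetical shape: each suggestion carries the parser rule it was found under.
// This is the feature being requested, not an existing API.
function refineSuggestions(ruleAwareSuggestions, providers) {
    const refined = [];
    for (const { text, ruleName } of ruleAwareSuggestions) {
        const provider = providers[ruleName];
        // Use domain-specific values when a provider exists for the rule;
        // otherwise keep the generated suggestion unchanged.
        const items = provider ? provider() : [text];
        for (const item of items) {
            if (!refined.includes(item)) {
                refined.push(item);
            }
        }
    }
    return refined;
}

const fromLibrary = [
    { text: 'aa', ruleName: 'table' },
    { text: 'ab', ruleName: 'table' },
];
const providers = { table: () => ['customers', 'orders'] }; // made-up schema

console.log(refineSuggestions(fromLibrary, providers)); // [ 'customers', 'orders' ]
```

Suggestions whose rule has no provider (keywords, for instance) pass through untouched, so the caller only overrides the rules it knows something about.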

Not autosuggesting the values after brute force

Thanks again for this code; it is really great.

I am having an issue integrating it with my local JS parser (using ANTLR4 4.9.3).

I have a SQL-like grammar (a sort of subset). When I run autosuggest, it starts doing a kind of brute-force search and in the end does not find the proper suggestion. For example, when I start typing S for SELECT, this is what I see in the console.log output:

....
SUGGEST: tokenSoFar=DESCH remainingText= lexerState=131
debug.js?c20a:2 SUGGEST: tokenSoFar=DESCH remainingText= lexerState=63
debug.js?c20a:2 SUGGEST: tokenSoFar=DESCH remainingText= lexerState=137
debug.js?c20a:2 SUGGEST: tokenSoFar=DESCH remainingText= lexerState=63
debug.js?c20a:2 SUGGEST: tokenSoFar=DESC remainingText= lexerState=169
debug.js?c20a:2 SUGGEST: tokenSoFar=DESC remainingText= lexerState=28
debug.js?c20a:2 SUGGEST: tokenSoFar=DESC remainingText= lexerState=174
debug.js?c20a:2 SUGGEST: tokenSoFar=DESC remainingText= lexerState=30
debug.js?c20a:2       Not following visited 58->(1) 9
debug.js?c20a:2       Not following visited 109->(1) 29
debug.js?c20a:2       Not following visited 58->(1) 9
debug.js?c20a:2       Not following visited 109->(1) 29
debug.js?c20a:2 DROPPING non-parseable suggestion: GROM
debug.js?c20a:2 DROPPING non-parseable suggestion: GROMA
debug.js?c20a:2 DROPPING non-parseable suggestion: GROMAND
debug.js?c20a:2 DROPPING non-parseable suggestion: GROMANDELIT
debug.js?c20a:2 DROPPING non-parseable suggestion: GROMANDECT
debug.js?c20a:2 DROPPING non-parseable suggestion: GROMANDEC
debug.js?c20a:2 DROPPING non-parseable suggestion: GROMANDE
....

In the end it suggests only:

['=', '!=', ')', ',']

I cannot tell whether the issue is in the lexer definition or whether I am doing something wrong. Any help would be appreciated.

Cannot read property 'ruleToStartState' of undefined in TokenSuggester

Hi @oranoran,

There is an issue with the TokenSuggester function below. The atn property is not available on the lexer file that is auto-generated by antlr4 (v4.7). It appears that the lexer files in your project's testGrammars folder currently have an atn property defined, which I believe was added manually, so the test cases do not fail.

If we use 'this._lexer._interp.atn.ruleToStartState' instead of 'this._lexer.atn.ruleToStartState' then it should work properly in all the cases.

TokenSuggester.prototype._findLexerStateByRuleNumber = function (ruleNumber) {
    return this._lexer.atn.ruleToStartState.slice(ruleNumber, ruleNumber + 1)[0];
};

The same applies to the function below.

TokenSuggester.prototype._toLexerState = function (parserState) {
    var lexerState = this._lexer.atn.states.find((x) => { return (x.stateNumber === parserState.stateNumber); });
    if (lexerState == null) {
        debug('No lexer state matches parser state ' + parserState + ', not suggesting completions.');
    }
    return lexerState;
};
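
A minimal sketch of the proposed change, written as standalone functions so it can be tried without the library. `getLexerAtn` is a hypothetical helper, and the fallback order (`lexer.atn` first, then `lexer._interp.atn`) is an assumption meant to keep hand-patched test lexers working alongside freshly generated ones.

```javascript
// Returns the lexer's ATN, preferring `lexer.atn` and falling back to
// `lexer._interp.atn` as proposed in this issue.
function getLexerAtn(lexer) {
    if (lexer.atn) {
        return lexer.atn;
    }
    if (lexer._interp && lexer._interp.atn) {
        return lexer._interp.atn;
    }
    throw new Error('No ATN found on lexer');
}

// Equivalent of _findLexerStateByRuleNumber, using the helper above.
function findLexerStateByRuleNumber(lexer, ruleNumber) {
    return getLexerAtn(lexer).ruleToStartState[ruleNumber];
}

// Equivalent of _toLexerState, returning undefined when nothing matches.
function toLexerState(lexer, parserState) {
    return getLexerAtn(lexer).states.find(
        (x) => x.stateNumber === parserState.stateNumber
    );
}
```

Throwing a descriptive error when neither property exists replaces the opaque "Cannot read property 'ruleToStartState' of undefined" with a message that points at the actual cause.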

Browser-compatible

Hey,
thanks for the awesome work and inspiration.

I experimented with this library and it looks good.
However, it has a dependency that blocks its use in web projects: debug, which is purely Node.js.

This should not be a big deal to fix, as antlr4 itself is browser-compatible.
Moreover, debug does not seem to be a crucial library for the project.
