oranoran / antlr4-autosuggest-js
JavaScript auto-suggest engine for ANTLR4 grammars
License: MIT License
Hello, how do I connect this autosuggest with an HTML input component?
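One way to wire this up, as a rough sketch: listen for the element's `input` event, pass the current text to `autosuggest`, and hand the result to a render callback. Only the `autosuggest(text)` call comes from the usage shown elsewhere on this page; `attachAutosuggest` and `render` are hypothetical names introduced for illustration, and the sketch assumes the call returns an array of strings.

```javascript
// Hypothetical helper: connects any element-like object (something with
// addEventListener/removeEventListener and a `value` property) to a suggester.
// Assumption: suggester.autosuggest(text) returns an array of strings.
function attachAutosuggest(inputEl, suggester, render) {
  const onInput = () => {
    // Recompute suggestions from the current text and pass them on.
    render(suggester.autosuggest(inputEl.value));
  };
  inputEl.addEventListener('input', onInput);
  // Return a function that detaches the listener again.
  return () => inputEl.removeEventListener('input', onInput);
}
```

In a browser, `inputEl` would typically be the result of `document.querySelector` on the input, and `render` would fill a dropdown or a `datalist`. For large grammars it is probably worth debouncing the handler so the suggester does not run on every keystroke.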
Hi @oranoran,
I am using the following grammar file for testing. If the user types 'SHOW EMPLOYEE ' then the suggestions should be 'FOR' and 'WHERE'; however, the autosuggest function goes into an infinite loop and eventually causes a stack overflow.
grammar autocomplete;
query
: query_stmt EOF
;
query_stmt
: start_keyword entity_name ( filter_name expr )?
;
expr
: column_name
| expr operator_exp literal_value
| expr logical_exp expr
;
operator_exp
:OPERATOR
;
logical_exp
:K_AND
;
entity_name
: any_name
;
column_name
: any_name
;
any_name
: IDENTIFIER
| STRING_LITERAL
;
filter_name
: K_FOR
| K_WHERE
;
start_keyword
: K_SHOW
| K_SELECT
;
literal_value
: NUMERIC_LITERAL
| IDENTIFIER
| STRING_LITERAL
;
K_SHOW : S H O W;
K_SELECT : S E L E C T;
K_AND : A N D;
K_FOR : F O R;
K_WHERE : W H E R E;
OPERATOR
: ('=' | '!=' | '>=' | '<=' )
;
IDENTIFIER
: '"' (~'"' | '""')* '"'
| '`' (~'`' | '``')* '`'
| '[' ~']'* ']'
| [a-zA-Z_] [a-zA-Z_0-9]*
;
STRING_LITERAL
: '\'' ( ~'\'' | '\'\'' )* '\''
;
NUMERIC_LITERAL
: DIGIT+ ( '.' DIGIT* )? ( E [-+]? DIGIT+ )?
| '.' DIGIT+ ( E [-+]? DIGIT+ )?
;
SPACES
: [ \u000B\t\r\n] -> channel(HIDDEN)
;
UNEXPECTED_CHAR
: .
;
fragment DIGIT : [0-9];
fragment A : [aA];
fragment B : [bB];
fragment C : [cC];
fragment D : [dD];
fragment E : [eE];
fragment F : [fF];
fragment G : [gG];
fragment H : [hH];
fragment I : [iI];
fragment J : [jJ];
fragment K : [kK];
fragment L : [lL];
fragment M : [mM];
fragment N : [nN];
fragment O : [oO];
fragment P : [pP];
fragment Q : [qQ];
fragment R : [rR];
fragment S : [sS];
fragment T : [tT];
fragment U : [uU];
fragment V : [vV];
fragment W : [wW];
fragment X : [xX];
fragment Y : [yY];
fragment Z : [zZ];
EDIT: It is not an infinite loop; it is actually a massive loop that suggests every possible combination of upper-case and lower-case letters.
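To make the scale of that concrete: a keyword spelled with case-insensitive fragments such as `S E L E C T` matches two characters per position, so enumerating every spelling grows as 2^n in the keyword length. The sketch below only lists those spellings. It illustrates the combinatorics and is not the library's algorithm; `caseVariants` is a hypothetical name.

```javascript
// Enumerate every upper/lower-case spelling of a keyword, mimicking what a
// lexer rule built from fragments like `fragment S : [sS];` can match.
function caseVariants(keyword) {
  let variants = [''];
  for (const ch of keyword) {
    const lower = ch.toLowerCase();
    const upper = ch.toUpperCase();
    // Letters contribute two choices; other characters contribute one.
    const choices = lower === upper ? [ch] : [lower, upper];
    const next = [];
    for (const prefix of variants) {
      for (const c of choices) next.push(prefix + c);
    }
    variants = next;
  }
  return variants;
}

console.log(caseVariants('FOR').length);    // 8
console.log(caseVariants('SELECT').length); // 64
```

Five keywords of this kind already yield a few hundred spellings on their own, before any combination with the rest of the grammar.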
First of all, thanks for this; it is amazing work (ANTLR4 is hard).
I am trying to integrate the autosuggest into the Ace Editor, to provide autosuggest for a SQL-based grammar (a small subset).
The issue I am getting in a basic POC is that the call to the autosuggest function goes into a massive loop:
POC code
const suggester = autosuggester.autosuggester(HatsLexer, HatsParser);
let suggestions = suggester.autosuggest('SELE');
console.log(suggestions);
Loop caused between these functions:
TokenSuggester.prototype._suggestViaLexerTransition
if (trans.isEpsilon) {
this._suggest(tokenSoFar, trans.target, remainingText);
}
and
TokenSuggester.prototype._suggest
Logs:
SUGGEST: tokenSoFar=GROmaScHe remainingText= lexerState=140
tokenSuggester.js?8de6:26 SUGGEST: tokenSoFar=GROmaScHe remainingText= lexerState=16
tokenSuggester.js?8de6:26 SUGGEST: tokenSoFar=GROmaScHe remainingText= lexerState=158
tokenSuggester.js?8de6:26 SUGGEST: tokenSoFar=GROmaScHe remainingText= lexerState=89
tokenSuggester.js?8de6:26 SUGGEST: tokenSoFar=GROmaScHe remainingText= lexerState=172
tokenSuggester.js?8de6:26 SUGGEST: tokenSoFar=GROmaScHe remainingText= lexerState=91
tokenSuggester.js?8de6:26 SUGGEST: tokenSoFar=GROmaScH remainingText= lexerState=137
tokenSuggester.js?8de6:26 SUGGEST: tokenSoFar=GROmaScH remainingText= lexerState=63
tokenSuggester.js?8de6:26 SUGGEST: tokenSoFar=GROmaScH remainingText= lexerState=240
tokenSuggester.js?8de6:26 SUGGEST: tokenSoFar=GROmaScHE remainingText= lexerState=241
tokenSuggester.js?8de6:26 SUGGEST: tokenSoFar=GROmaScHE remainingText= lexerState=64
tokenSuggester.js?8de6:26 SUGGEST: tokenSoFar=GROmaScHE remainingText= lexerState=118
tokenSuggester.js?8de6:26 SUGGEST: tokenSoFar=GROmaScHE remainingText= lexerState=77
I am pretty new to ANTLR4 and I am trying to work out what is going on in this case. Any help would be great. Thanks.
Hi @oranoran,
I am using the grammar file below for testing. If the user types 'SHOW EMPLOYEE' or 'SELECT EMPLOYEE' then the suggestions should be 'FOR' and 'WHERE'; however, the result is empty at the moment.
grammar autocomplete;
query
: query_stmt EOF
;
query_stmt
: start_keyword literal_name filter_name
;
filter_name
: 'FOR'
| 'WHERE'
;
start_keyword
: 'SHOW'
| 'SELECT'
;
literal_name
: IDENTIFIER
;
IDENTIFIER: LETTER (LETTER | [0-9])*;
SPACES
: [ \u000B\t\r\n] -> channel(HIDDEN)
;
UNEXPECTED_CHAR
: .
;
fragment DIGIT : [0-9];
fragment LETTER : [a-zA-Z] ;
Hi,
I am using the following Test.G4, which is a very simple grammar for variable declaration.
grammar Test;
file: (varDecl)+ EOF;
varDecl
: type ID '=' NUMBER ';'
;
type: 'float' | 'int' | 'decimal' ; // user-defined types
ID : LETTER (LETTER | [0-9])* ;
NUMBER: DIGIT+;
fragment LETTER : [a-zA-Z] ;
fragment DIGIT : [0-9];
SPACES
: [ \u000B\t\r\n] -> channel(HIDDEN)
;
Ideally, if a user types "int a", the expected suggestion would be '=', but it is not finding any suggestions. Is anything wrong here?
Hey, I like the idea of this module - but I'm running into a bit of an issue with a simple auto-completion scenario.
I haven't taken a look at the source code yet but the npm module isn't generating any suggestions for an empty string and the following grammar:
grammar Expr;
prog: SELECT ;
SELECT : S E L E C T;
WS: [ \t\n] -> channel(HIDDEN);
fragment S : [sS];
fragment E : [eE];
fragment L : [lL];
fragment C : [cC];
fragment T : [tT];
When there is no input, I would have expected an auto suggestion of select, but no suggestion is given:
{
"input": "",
"errors": [],
"suggestions": []
}
However, if I change the grammar to remove the repeated E letter within SELECT, it works as expected:
prog: SELECT ;
- SELECT : S E L E C T;
+ SELECT : S E L C T;
With the above change to the grammar, the autocomplete now suggests the initial token as expected, without the additional e of course:
{
"input": "",
"errors": [],
"suggestions": [
"selct"
]
}
I am using ANTLR 4.7.1, and the code is mostly from the README:
import { ExprLexer } from './gen/ExprLexer';
import { ExprParser } from './gen/ExprParser';
import ErrorAggregator from './error-aggregator';
import * as autosuggest from 'antlr4-autosuggest';
export function extract(input) {
const errorAggregator = new ErrorAggregator();
const autosuggester = autosuggest.autosuggester(ExprLexer, ExprParser, 'LOWER');
let suggestions = autosuggester.autosuggest(input);
console.log(JSON.stringify({
input: input,
errors: errorAggregator.getErrors(),
suggestions: suggestions
}, null, 4));
}
export default extract;
I can create a failing test / provide an example test project to help with debugging, just let me know how I can help.
This project needs TypeScript bindings so it can be used more easily in TypeScript projects.
I am getting an infinite loop for the following grammar:
grammar Query;
/* Parser rules */
orexpression
: andexpression ( OR andexpression )*
;
andexpression
: notexpression ( notexpression | AND notexpression )*
;
notexpression
: (NOT)? searchterm
;
searchterm
: TERM
| QUOTEDTERM
| LEFT_PAREN orexpression RIGHT_PAREN
| linkexpression
;
linkexpression
: LINK LEFT_BRACE linkinfo RIGHT_BRACE LEFT_PAREN orexpression RIGHT_PAREN
;
linkinfo
: QUOTEDTERM ':' TERM
;
query:
orexpression EOF;
/* Lexer rules */
AND : 'AND' ;
OR : 'OR' ;
NOT : 'NOT' ;
LINK: 'LINK' ;
LEFT_PAREN : '(' ;
RIGHT_PAREN : ')' ;
LEFT_BRACE : '{' ;
RIGHT_BRACE : '}' ;
QUOTEDTERM : '"' ~('"')* '"' ;
UNTERMINATED_QUOTEDTERM : '"' ~('"')* ;
NONSPECIALCHAR : ~(' '|'\t'|'"' | '\u00A0' | '(' | ')') ;
TERM : (NONSPECIALCHAR|QUOTEDTERM) (NONSPECIALCHAR|QUOTEDTERM)+ ;
WS : [ \t\u00A0] -> skip;
ErrorChar : . ;
and with the input 'LINK { \"Is Version Of\" : ARTICLE } ( title:test ) OR LINK { \"Is Version Of\" : ARTICLE } ( title:test ) OR LINK { \"Is Version Of\" : ARTICLE } ( title:test ) OR LINK'.
Note that with a shortened input, 'LINK { \"Is Version Of\" : ARTICLE } ( title:test ) OR LINK { \"Is Version Of\" : ARTICLE } ( title:test ) OR LINK', token suggestion works.
Thanks
I am getting a stack overflow for the following grammar:
clause
: clause AND clause
| action
;
action : 'action' ;
AND : 'AND' ;
with the input 'action AND'.
Debug info:
TOKENS FOUND IN FIRST PASS:
[@-1,0:5='action',<1>,1:0]
[@-1,7:9='AND',<2>,1:7]
UNTOKENIZED:
Parser rule names: clause, action
State: 0 (type: RuleStartState)
State: 4 (type: BasicState)
State: 5 (type: BasicState)
State: 2 (type: RuleStartState)
State: 15 (type: BasicState)
State: 16 (type: BasicState)
State: 3 (type: RuleStopState)
State: 6 (type: BasicState)
State: 12 (type: StarLoopEntryState)
State: 10 (type: StarBlockStartState)
State: 7 (type: BasicState)
State: 8 (type: BasicState)
State: 9 (type: BasicState)
Suggesting tokens for rule numbers: 1
SUGGEST: tokenSoFar= remainingText= lexerState=1
SUGGEST: tokenSoFar= remainingText= lexerState=5
NONMATCHING LEXER TOKEN: a remaining=
State: 13 (type: LoopEndState)
State: 1 (type: RuleStopState)
State: 11 (type: BlockEndState)
State: 14 (type: StarLoopbackState)
State: 12 (type: StarLoopEntryState)
State: 10 (type: StarBlockStartState)
State: 7 (type: BasicState)
State: 8 (type: BasicState)
State: 9 (type: BasicState)
Suggesting tokens for rule numbers: 1
SUGGEST: tokenSoFar= remainingText= lexerState=1
SUGGEST: tokenSoFar= remainingText= lexerState=5
NONMATCHING LEXER TOKEN: a remaining=
State: 13 (type: LoopEndState)
State: 1 (type: RuleStopState)
State: 11 (type: BlockEndState)
State: 14 (type: StarLoopbackState)
State: 12 (type: StarLoopEntryState)
State: 10 (type: StarBlockStartState)
State: 7 (type: BasicState)
State: 8 (type: BasicState)
State: 9 (type: BasicState)
...
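The repeating state sequence in the log (12, 10, 7, 8, 9, then 13, 1, 11, 14 and back to 12) resembles a traversal that follows a loop-back edge without remembering where it has been. As a generic illustration only (this is not the library's code, and the graph and names below are made up), a depth-first walk can carry the set of states on the current path and refuse to re-enter them:

```javascript
// Generic sketch: walk a state graph depth-first, skipping any state that is
// already on the current path so that cycles cannot recurse forever.
// `edges` maps a state number to the state numbers it can move to.
function reachableStates(edges, start) {
  const order = [];
  const onPath = new Set();
  function visit(state) {
    if (onPath.has(state)) return; // cycle detected: do not follow
    onPath.add(state);
    order.push(state);
    for (const next of edges[state] || []) visit(next);
    onPath.delete(state);
  }
  visit(start);
  return order;
}

// A toy graph with a loop-back edge (14 -> 12), loosely modelled on the log.
const toyEdges = { 12: [10, 13], 10: [7], 7: [8], 8: [9], 9: [11], 11: [14], 14: [12], 13: [] };
console.log(reachableStates(toyEdges, 12));
```

Whether a guard like this is enough for a left-recursive rule such as `clause : clause AND clause` depends on how the parser ATN encodes the recursion, so treat it as a starting point for investigation rather than a fix.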
I wonder if it's possible to expose the matching rule name as part of the autosuggest API. For example, consider the following SQL input string that we would like to provide suggestions for:
SELECT * from <cursor>
If we've defined the following grammar:
grammar Expr;
prog:
SELECT
selection
FROM table
;
selection:
'*'
;
table: IDENTIFIER ;
// Keywords
SELECT : [sS] [eE] [lL] [eE] [cC] [tT] ;
FROM : [fF] [rR] [oO] [mM] ;
IDENTIFIER: [a-zA-Z] [a-zA-Z_0-9]+ ;
WS: [ \t\n] -> channel(HIDDEN);
I can see from where the cursor is that the grammar would expect to parse the table rule next, which happens to be an IDENTIFIER token.
Currently if we use this library we'll get given the following autosuggestions list:
{
"input": "select * from ",
"errors": [],
"suggestions": [
"aa",
"ab",
"ac",
"ad",
"ae",
"af",
// ... etc etc ...
"zr",
"zs",
"zt",
"zu",
"zv",
"zw",
"zx",
"zy",
"zz"
]
}
This module generates a list of valid identifiers, but if it exposed the matched rule name as part of its API, we could see that the matching rule name is table, and we could apply our own domain knowledge to provide a richer auto suggestion list, for example by looking at the database schema and suggesting the available table names instead of guessing valid identifiers.
Let me know your thoughts!
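To illustrate what a caller could do with that information, here is a sketch. It assumes a hypothetical result shape in which each suggestion carries the parser rule it came from (`{ text, rule }`); the library does not expose this today, which is the point of the request. `enrichSuggestions`, `providers`, and `schema` are likewise made up.

```javascript
// Hypothetical: suppose each suggestion were reported together with the
// parser rule that produced it, e.g. { text: 'aa', rule: 'table' }.
// A caller could then swap generated identifiers for real names.
function enrichSuggestions(suggestions, providers) {
  const result = new Set();
  for (const { text, rule } of suggestions) {
    const provider = providers[rule];
    if (provider) {
      // Domain knowledge wins over generated placeholder identifiers.
      for (const name of provider()) result.add(name);
    } else {
      result.add(text);
    }
  }
  return [...result];
}

const schema = { tables: ['employees', 'departments'] }; // made-up schema
const raw = [
  { text: 'aa', rule: 'table' },
  { text: 'ab', rule: 'table' },
];
console.log(enrichSuggestions(raw, { table: () => schema.tables }));
```

Keeping the providers keyed by rule name means keyword suggestions such as SELECT or FROM pass through untouched, while identifier-like rules are answered from the schema.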
Thanks again for this code; it is really great.
I am having an issue integrating it with my local JS parser (using ANTLR4 4.9.3).
I have a SQL-like grammar (a sort of subset). When I run the autosuggest it starts doing a sort of brute-force search, and in the end it does not find the proper autosuggest value. For example, when I start typing S for SELECT, this is what I see in the console.log:
....
SUGGEST: tokenSoFar=DESCH remainingText= lexerState=131
debug.js?c20a:2 SUGGEST: tokenSoFar=DESCH remainingText= lexerState=63
debug.js?c20a:2 SUGGEST: tokenSoFar=DESCH remainingText= lexerState=137
debug.js?c20a:2 SUGGEST: tokenSoFar=DESCH remainingText= lexerState=63
debug.js?c20a:2 SUGGEST: tokenSoFar=DESC remainingText= lexerState=169
debug.js?c20a:2 SUGGEST: tokenSoFar=DESC remainingText= lexerState=28
debug.js?c20a:2 SUGGEST: tokenSoFar=DESC remainingText= lexerState=174
debug.js?c20a:2 SUGGEST: tokenSoFar=DESC remainingText= lexerState=30
debug.js?c20a:2 Not following visited 58->(1) 9
debug.js?c20a:2 Not following visited 109->(1) 29
debug.js?c20a:2 Not following visited 58->(1) 9
debug.js?c20a:2 Not following visited 109->(1) 29
debug.js?c20a:2 DROPPING non-parseable suggestion: GROM
debug.js?c20a:2 DROPPING non-parseable suggestion: GROMA
debug.js?c20a:2 DROPPING non-parseable suggestion: GROMAND
debug.js?c20a:2 DROPPING non-parseable suggestion: GROMANDELIT
debug.js?c20a:2 DROPPING non-parseable suggestion: GROMANDECT
debug.js?c20a:2 DROPPING non-parseable suggestion: GROMANDEC
debug.js?c20a:2 DROPPING non-parseable suggestion: GROMANDE
....
In the end it suggests only:
['=', '!=', ')', ',']
I cannot tell whether the issue is in the lexer definition or whether I am doing something wrong. It would be great to get some help.
Hi @oranoran,
There is an issue with the function of TokenSuggester shown below. The atn property is not available in the lexer file that is auto-generated by antlr4 (v4.7). It appears that in your project the lexer files in the testGrammars folder currently have the atn property defined, which I believe was added manually, so the test cases are not failing.
If we use 'this._lexer._interp.atn.ruleToStartState' instead of 'this._lexer.atn.ruleToStartState' then it should work properly in all cases.
TokenSuggester.prototype._findLexerStateByRuleNumber = function (ruleNumber) {
return this._lexer.atn.ruleToStartState.slice(ruleNumber, ruleNumber + 1)[0];
};
The same applies to the function below.
TokenSuggester.prototype._toLexerState = function (parserState) {
var lexerState = this._lexer.atn.states.find((x) => { return (x.stateNumber === parserState.stateNumber); });
if (lexerState == null) {
debug('No lexer state matches parser state ' + parserState + ', not suggesting completions.');
}
return lexerState;
};
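A defensive variant of that lookup, as a sketch: prefer `lexer.atn` when it exists and fall back to `lexer._interp.atn` otherwise. Both property paths are taken from the report above; `getLexerAtn` and the standalone `findLexerStateByRuleNumber` are hypothetical helpers, and which path is populated presumably depends on the ANTLR version that generated the lexer.

```javascript
// Hypothetical helper: find the ATN on a lexer instance regardless of
// whether it is attached directly or only via the interpreter.
function getLexerAtn(lexer) {
  if (lexer.atn) return lexer.atn;
  if (lexer._interp && lexer._interp.atn) return lexer._interp.atn;
  throw new Error('No ATN found on lexer');
}

// Start-state lookup routed through the helper; for valid, non-negative
// rule numbers this matches the slice-based lookup in the original function.
function findLexerStateByRuleNumber(lexer, ruleNumber) {
  return getLexerAtn(lexer).ruleToStartState[ruleNumber];
}
```

Centralising the lookup in one helper means `_toLexerState` and `_findLexerStateByRuleNumber` would not each need their own version check.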
Hey,
thanks for awesome work and inspiration.
I experimented with this library and looks good.
However, it has a dependency that blocks its usage in web projects: debug, which is purely Node.js.
It should not be a big deal to fix, as antlr4 itself is browser-compatible.
Moreover, it does not seem like a crucial library for the project.
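If the dependency were made optional, a small stand-in could cover the same need. The sketch below is an assumption about how that might look, not a patch to the library: `createDebug` is a made-up factory that returns a namespaced logger and stays silent unless enabled, using only `console.log`, which exists in both Node.js and browsers.

```javascript
// Hypothetical replacement for a logging dependency: returns a logger that
// prefixes messages with a namespace and is a no-op unless enabled.
// `sink` is injectable so the output can be redirected or captured.
function createDebug(namespace, enabled = false, sink = console.log) {
  if (!enabled) return () => {};
  return (...args) => sink(`${namespace}:`, ...args);
}

const debug = createDebug('autosuggest', true);
debug('SUGGEST: tokenSoFar=', 'SEL');
```

Because a disabled logger is a plain no-op function, call sites would not need to change.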