Comments (4)
This is an example of a more general thing which we currently allow in nearly all cases without attempting to mark invalidity: one assignment expression following another without a semicolon or (if applicable) a possible ASI opportunity. For example:
Although less surprising visually, that example is illustrating the same thing as 0b0123. For 2 2, a space is needed to observe the problem because 22 would lex as a single decimal token. Lexically, 0b0123 is valid source text: it tokenizes as a BinaryIntegerLiteral "0b01" followed by a DecimalLiteral "23" without a hitch. Whitespace isn’t required to appear between tokens and there’s no lookahead assertion after BinaryDigit or anything. But then in the syntactic grammar, both 2 2 and 0b0123 will end up being invalid anyway, and for the same reason, which is that one number token followed by another doesn’t match any production in ES.
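A quick way to see how an engine treats these inputs is to ask it to compile them. This is only a sketch: `parses` is a helper name made up for illustration, and it relies on `new Function` compiling its argument as a function body without running it.

```javascript
// Returns true when `source` compiles as a function body, false on SyntaxError.
// `parses` is a hypothetical helper name used only for this illustration.
function parses(source) {
  try {
    new Function(source); // compiles the body, never executes it
    return true;
  } catch (err) {
    if (err instanceof SyntaxError) return false;
    throw err; // anything else is unexpected
  }
}

const results = {
  twoNumbers: parses('2 2'),               // number followed by number
  binaryThenDecimal: parses('0b0123'),     // the reported case
  decimalThenIdentifier: parses('123abc'), // the existing special case
  withLineBreak: parses('2\n2'),           // a line break gives ASI a chance
  withSemicolon: parses('2; 2'),           // explicit separator
};

console.log(results);
```

The first three inputs are rejected while the last two are accepted, which is the distinction the syntax definition currently does not try to draw.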
So the broader problem has to do with deciding that an expression (or at least, whatever would seem to start one) isn’t legal if the last thing matched was itself an expression. The current ‘allowances’ occur in various ways. For example, we aren’t requiring expression statements to be followed by a semicolon (or an ASI opportunity), and we aren’t requiring array element assignment expressions to be separated by at least one comma.
IIRC, I was originally reluctant to pursue marking ‘unexpected expression continuations’ as invalid for two reasons. One is just the heuristic nature of expression matching in sublime syntax. Though hopefully the expression contexts ultimately implement the same logic (to the extent possible) as the real expression grammar, it’s still tough to say confidently ‘that’s definitely wrong’ when AE contexts didn’t find a way to continue, yet the next token is an ‘AE component’. This is closely related to the second, and primary, reason: ASI. There’s really very little in ES Sublime that attempts to address ASI (much less address it correctly) except to occasionally throw our hands in the air and say ‘well, anything could happen here,’ which is what occurs in these examples. The ASI algorithm + absence of cross-line matching in sublime-syntax ... not best pals.
There’s a precedent for special casing stuff like 0b0123. It’s unambiguously a typo and addressing it doesn’t need to entail wading into the ASI swamp. The existing similar case is that we are expressly disallowing matching 123abc as a decimal followed by an identifier, even though lexically that is correct (for the exact same reason that 0b0123 is). This is done by including {{idEnd}} at the end of the decimal pattern. The same could be done for the binary and octal (and hex) patterns.
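To illustrate the effect of such a guard outside of Sublime, here is a rough JavaScript approximation. The `idEnd` string below is an assumption standing in for the real {{idEnd}} variable, whose actual definition in the syntax file may differ; the pattern names are also made up.

```javascript
// Assumed stand-in for {{idEnd}}: no identifier character may follow.
const idEnd = String.raw`(?![$_\p{ID_Continue}])`;

// Without the guard, the binary pattern happily stops after "0b01".
const binaryLoose = /^0[bB][01]+/u;

// With the guard, the match is refused when digits or letters follow directly.
const binaryGuarded = new RegExp(String.raw`^0[bB][01]+${idEnd}`, 'u');

const loose = binaryLoose.exec('0b0123');      // matches the "0b01" prefix
const guarded = binaryGuarded.exec('0b0123');  // no match at all
const spaced = binaryGuarded.exec('0b01 23');  // a space ends the token cleanly

console.log(loose && loose[0], guarded, spaced && spaced[0]);
```

Because digits count as identifier-continue characters, backtracking to a shorter run of binary digits does not help the guarded pattern, so the whole literal is rejected rather than split.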
That would fix the specific cases reported here, but on reviewing the bigger problem, I find that solution feels pretty weak. It’s not just a band-aid which doesn’t prevent the majority of similar cases from occurring — it also doesn’t actually produce the correct output. The existing {{idEnd}} in the decimal pattern should be removed, too, since it causes 123abc to be marked as invalid, yet the real invalid token here is only abc. Failing to scope the first of the two tokens as valid obscures where the mistake really happens. The SyntaxError messages in Firefox communicate this correctly:
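The alternative scoping argued for here (keep 123 valid, flag only abc) might look roughly like the sketch below. The function name and the scope names are hypothetical and are not taken from the actual syntax definition.

```javascript
// Splits a decimal literal from any identifier characters glued onto it.
// Scope names are placeholders; the real definition may use different ones.
function scopeNumberThenTail(text) {
  const match = /^(\d+)([$_\p{ID_Continue}]*)/u.exec(text);
  if (match === null) return [];
  const [, number, tail] = match;
  const tokens = [{ text: number, scope: 'constant.numeric' }];
  if (tail !== '') {
    // Only the trailing part is marked as the mistake.
    tokens.push({ text: tail, scope: 'invalid.illegal' });
  }
  return tokens;
}

console.log(scopeNumberThenTail('123abc'));
console.log(scopeNumberThenTail('123'));
```

With this split the highlighting points at the token that actually triggers the error instead of painting the whole run as invalid.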
So the questions now are sorta ... was I correct to be wary of / give up on asserting ‘this is wrong’ upon encountering another AE right after bottoming out in ae_AFTER_POSTFIX? Is more accurate ASI simulation intractable?
I just had a go at correcting the handling in cases within array literals. This one is quite simple:
It’s also easy to get to this:
...but what’s hard is getting to that without also getting to this:
I suspect tackling ASI in a more legit way is possible. Some of the existing ASI-related logic has clear room for improvement, correction, and unification. I’ll continue playing around with different strategies (which likely will involve meta_include_prototype: false, since linebreaks within comments count for ASI) for a bit. If it turns out to be too tough right now, sprinkling some {{idEnd}} dust on the numeric patterns might be alright as a stopgap.
from ecmascript-sublime.
(tagging both 'bug' and 'enhancement' wouldn’t quite capture the nature of this issue like bughancement does)
from ecmascript-sublime.
Just an update:
I dug into the ASI question more today. The logic can be generalized to address the whole category in theory, but it still fails under most common circumstances because all the constructs to which ASI can apply also (implicitly) have meta_include_prototype: true. That means they’ll consume whitespace (including newlines) and comments (including newlines) greedily. Putting meta_include_prototype in the asi scope effectively does nothing, then, and the only cases you can really handle right are those where the token which follows an ‘asi newline’ does so immediately (since you can use ^, in that case).
If we could use arbitrary length lookbehinds, this could be addressed with pretty good accuracy without big changes (‘am I, the unexpected token, preceded by a newline followed by zero or more whitespace & inline comment tokens, or what appears to be the tail end of a multiline comment? if so, asi can be considered applicable; pop’). Sublime requires lookbehinds to have fixed length (probably for good reason), so that isn’t an option.
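For what it’s worth, JavaScript’s own regex engine does allow variable-length lookbehinds, so the check described in that parenthetical can at least be prototyped there. This is a heuristic sketch only, not something Sublime can run; the names are made up, and ‘tail end of a multiline comment’ is approximated as a line that contains no /* and ends in */.

```javascript
// Zero-width, sticky pattern: "is this position preceded by a newline,
// optionally the tail of a multiline comment, then spaces/inline comments?"
const PRECEDED_BY_ASI_NEWLINE =
  /(?<=\n(?:(?:(?!\/\*)[^\n])*\*\/)?(?:[ \t]|\/\*(?:(?!\*\/)[^\n])*\*\/)*)/y;

// `index` is the offset of the unexpected token within `source`.
function asiCouldApplyAt(source, index) {
  PRECEDED_BY_ASI_NEWLINE.lastIndex = index; // sticky: test only at index
  return PRECEDED_BY_ASI_NEWLINE.test(source);
}

console.log(asiCouldApplyAt('2 2', 2));              // same line
console.log(asiCouldApplyAt('2\n2', 2));             // directly after a newline
console.log(asiCouldApplyAt('2 /* c */ 2', 10));     // inline comment only
console.log(asiCouldApplyAt('2 /* a\n b */ 2', 13)); // comment spans a newline
```

The pattern has no fixed width, which is exactly why the same idea cannot be written as a lookbehind in a sublime-syntax file.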
A solution would seem to need to get pushed up to all the points where expressions might or might not continue. All such scopes would have to have meta_include_prototype: false and would add an include of a new scope that, on the basis of newline-including would-be-proto tokens, may shift into additional new ‘asi could happen’ and ‘asi cannot happen’ scopes. I’m not sure how realistic this is — I think it would need to extend its tendrils through everything — though it kinda makes sense that this would be so, since the ASI algorithm is essentially ‘those non-syntactic tokens? they’re syntactic now, maybe’.
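The ‘asi could happen’ / ‘asi cannot happen’ split can be pictured as a small forward scan over the would-be-proto tokens. The sketch below is plain JavaScript rather than sublime-syntax, uses made-up names, and only covers the line-break rule (it ignores the end-of-input and closing-brace cases of ASI).

```javascript
const LINE_TERMINATORS = '\n\r\u2028\u2029';

// Starting where an expression ended, skip whitespace and comments and
// report whether any line terminator was crossed along the way.
function skipTrivia(source, start) {
  let i = start;
  let sawNewline = false;
  while (i < source.length) {
    const ch = source[i];
    if (LINE_TERMINATORS.includes(ch)) {
      sawNewline = true;
      i += 1;
    } else if (ch === ' ' || ch === '\t') {
      i += 1;
    } else if (source.startsWith('//', i)) {
      // A line comment stops before its line terminator, handled above.
      while (i < source.length && !LINE_TERMINATORS.includes(source[i])) i += 1;
    } else if (source.startsWith('/*', i)) {
      const close = source.indexOf('*/', i + 2);
      const end = close === -1 ? source.length : close + 2;
      // A block comment containing a line break counts as a line break.
      if (/[\n\r\u2028\u2029]/.test(source.slice(i, end))) sawNewline = true;
      i = end;
    } else {
      break; // reached the next real token
    }
  }
  return { nextIndex: i, asiPossible: sawNewline };
}

console.log(skipTrivia('2 2', 1));
console.log(skipTrivia('2 /* a\n b */ 2', 1));
```

In the syntax definition this bookkeeping would have to live in contexts rather than in a variable, which is where the ‘tendrils through everything’ cost comes from.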
from ecmascript-sublime.
Really interesting getting this deep into the syntax -- I wonder if this will be yet another aspect of the language that ultimately motivates us to transition to a mostly generated syntax def. We could easily experiment with the feasibility of recursively generating contexts for situations like this.
By the way, I ended up creating this API/extension for generating syntaxes to streamline my work with the LinkedData syntaxes package. The API/extension is largely informed by the techniques I picked up from you and working on this repo (e.g., else_pop, other_illegal, etc). Perhaps the most convenient feature of this library tho is the switch/goto directives, which create auto-lookahead variables for all the possible regexes that can match in the context to which they are transitioning. Overall, it brings the syntax much closer to the productions of a PEG parser.
from ecmascript-sublime.