Comments (13)
I might have more questions than answers…
- Do we really expect to be able to live without new lines in a string? BTW, we also have `\r` around in mozilla-central.
- What about trying to promote FTL as a file format for other uses, where Fluent is not used as the technology driving the project but just as a parser? Is that excluded as a potential scenario? If it's not, supporting Unicode and new lines seems needed.
- IMO all special characters should be treated equally. If `[` is a special character, it should be escaped as `\[`. Think of regular expressions, for example.
  Also, the idea of having to write something like `{ "[" }` for displaying one character makes me really want to ┻━┻ ︵ヽ(`Д´)ノ︵ ┻━┻
- Displaying `t` when I write `\t`, because `\t` is not recognized as a known escape sequence, doesn't sound like a good idea. Here are a few possible scenarios:
  - `Code:\t { foo }`: I wanted to create a tab. Displaying `t` is bad; I should get rid of the whole escape sequence.
  - `Go to c:\documents`: I wanted to write a literal `\`, so it should have been `Go to c:\\documents`. Again, displaying `c:documents` doesn't seem like a good idea.
  - `%Spx \u00D7 %Spx`: I wanted to display `%Spx × %Spx`. Displaying `%Spx u00D7 %Spx` is an awful result.
  Maybe dropping the string altogether with an error is a better option.
from fluent.
Last week we met in person and briefly discussed these issues with @Pike, @zbraniecki and @flodolo. Here are the key takeaways from that conversation:
- We don't have to answer all questions right now.
- Prefer to be flexible: don't normalize by default.
- In the future, allow bindings to configure the context's behavior wrt. normalization.
- Unicode escapes are a safety valve.
- Parsing `\n` to `n` isn't helpful and produces unexpected behavior.
With that in mind, I'd like to suggest a minimal specification for our current purposes.
- Escape sequences are only allowed in `text` and `quoted-text`.
- Newlines are preserved by the parser. This allows proper serialization.
- Known escape sequences are: `\\` for the literal backslash, `\"` for the literal double quote, `\{` for the literal opening brace and `\u` followed by 4 hex digits for Unicode code points. Representing code points from outside of the Basic Multilingual Plane is made possible with surrogate pairs (two `\uXXXX` sequences). Using the actual character is encouraged, however.
- Any other escaped characters result in a parsing error. (We might relax this to producing warnings and parsing to a space for instance, but let's start with a stricter approach.)
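To make the strict rules above concrete, here is a minimal Python sketch of how a parser might resolve escapes under this proposal. The function name `unescape_strict` and the choice of `ValueError` for the parsing error are assumptions made for illustration; they are not taken from any Fluent implementation.

```python
import re

# Hypothetical helper illustrating the strict proposal; not part of Fluent.
# A backslash followed by "u" + 4 hex digits, any single character, or end of input.
_ESCAPE = re.compile(r"\\(u[0-9a-fA-F]{4}|.|\Z)", re.DOTALL)

# The three single-character escapes named in the proposal.
_SIMPLE = {"\\": "\\", '"': '"', "{": "{"}


def unescape_strict(raw: str) -> str:
    """Resolve known escapes; raise ValueError for anything else."""

    def replace(match: re.Match) -> str:
        body = match.group(1)
        if body in _SIMPLE:
            return _SIMPLE[body]
        if len(body) == 5 and body[0] == "u":
            return chr(int(body[1:], 16))
        raise ValueError(f"unknown escape sequence: \\{body}")

    text = _ESCAPE.sub(replace, raw)
    # Two \uXXXX escapes forming a surrogate pair are still two separate
    # code units at this point; a UTF-16 round trip joins them into one
    # code point. An unpaired surrogate fails to decode (also a ValueError).
    return text.encode("utf-16-le", "surrogatepass").decode("utf-16-le")
```

With this sketch, `%Spx \u00D7 %Spx` resolves to `%Spx × %Spx`, while `Code:\t { foo }` is rejected instead of silently displaying `t`.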
from fluent.
A draft proposal:
- Escape sequences are only allowed in the `text` and `quoted-text` productions.
- Known escapes are: `\\`, `\*`, `\[`, `\{`, `\uXXXX`, `\t`, `\n`, `\"`.
  - Do we actually need `\t` and `\n`? The whole syntax was designed to make it easy to use white-space and those characters could be written literally, if needed.
- Unicode sequences are only valid with four characters: `\u0020` is valid, `\u20` is not. This is the same as in JavaScript.
  - ES2015 introduced Unicode code point escapes which are written as `\u{XXXXXXX}` with any number of `X`, thus allowing representing code points from outside of the Basic Multilingual Plane without resorting to using surrogate pairs. Non-BMP code points are very rare but the question still remains: are we okay with using surrogate pairs for them if we go with the `\uXXXX` proposal?
- Escaping any other character returns the character itself and the character is parsed as normal; `\a` results in `a` and `\` at the end of line results in the EOL character which is parsed as normal. This is different from JavaScript which has a special case for `\EOL` which is called `LineContinuation` and technically is not an escape sequence.
  For instance, in the following example, the escaped EOL results in a real EOL and it ends the value part of the `a` variant:
      foo = {
          *[a] AAA\
          [b] BBB
      }
@Pike, @zbraniecki — I'd love to hear your thoughts on this. Thanks!
from fluent.
sgtm! I'd not do `\n`, `\t` until we have a use case.
from fluent.
I woke up this morning and I had another idea: what if we tried to use the `{ "x" }` pattern as much as we can? The following is a counter-proposal to the one above.
Firstly, let's talk about the backslash. In a more extreme version of the proposal, it can (a) become a regular literal character. Or, it could (b) escape any character to itself, taking it out of the parse flow.
- In (a) we need a new solution for Unicode escapes. Perhaps a new literal: `Foo { U+10000 } bar`?
- In (b) we need four exceptions: `\\`, `\"`, `\uXXXX` and `\EOL` because we don't want to take the EOL out of the parse flow.
- Or, (c) we could introduce the `U+10000` literals and leave backslash for escaping only the `"` and `EOL` (and the `\` itself). This makes sense: these are the characters used to end a string.
Special characters occurring in `text` can be escaped by putting them in placeables. `quoted-text` doesn't allow more placeables, so the following are valid and unambiguous: `{ "{" }`, `{ "[" }` etc. The one exception is the double quote `"` itself. In (a) we don't have a way to put it in a `quoted-text`.
So the question boils down to: how much do we want to limit the `quoted-text` production? It is mostly used in `call-expression` and I like the idea of keeping the arguments very simple. But maybe we don't want to limit them too much in case we'd like to have things like `WRAP(brand-name, char: "\"")` or `LIST(users, separator: "\uXXXX")` in the future.
from fluent.
After more thought I'd like to go back to the first proposal and also make it simpler.
- Escape sequences are only allowed in `text` and `quoted-text`.
- Known escape sequences are: `\\` for the literal backslash, `\"` for the literal double quote and `\{` for the literal opening brace.
- Any other escaped characters result in the literal character being added to the text content of the production. So, `\a` is `a` and `\EOL` is `EOL`.
- Other special characters like `[` can be written as `{ "[" }` if they happen to be at the beginning of the line and should be part of the text content.
- Using Unicode in FTL is encouraged and as such, we don't offer the `\uXXXX` sequence at all.
@Pike, @zbraniecki - mind taking another look at this, please?
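As a rough illustration of the simplified rules in this proposal, the Python sketch below treats a backslash followed by any character as that character. The name `unescape_lenient` is hypothetical, and leaving a lone backslash at the very end of the input untouched is an assumption of this sketch, since the proposal does not cover that case.

```python
import re


def unescape_lenient(raw: str) -> str:
    """Hypothetical sketch: a backslash followed by any character
    (including a newline) yields that character unchanged."""
    # re.DOTALL lets "." match the EOL, so an escaped EOL stays a real EOL.
    # A trailing backslash with nothing after it is left as-is (assumption).
    return re.sub(r"\\(.)", r"\1", raw, flags=re.DOTALL)
```

Under these rules `c:\\documents` becomes `c:\documents`, but `c:\documents` becomes `c:documents` and `\u00D7` becomes `u00D7`, which are the fall-through results questioned earlier in the thread.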
from fluent.
I'm not sure if doing `\n` -> `n` is a good idea. Or maybe that's something we can warn about in a linter step? It seems like such a ubiquitous assumption that that'd be a newline. The same goes for other fall-through escapes. We do have such a warning in compare-locales for .properties, too. Rambling.
For unicode escapes, I've just toyed around with the unicode hex keyboard on the mac. Interestingly, you need to enter surrogate pairs to get to 𝌆, 8 keystrokes away. I wonder if @flodolo or @TheoChevalier have opinions on this as people that actually have to type that unicode stuff.
Apart from that, the latest proposal sounds fine to me.
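For reference, the surrogate pair behind a character like 𝌆 (U+1D306) follows from the standard UTF-16 arithmetic. The Python sketch below is only an illustration; the helper name `to_surrogate_escapes` is made up for this example.

```python
def to_surrogate_escapes(char: str) -> str:
    """Return the \\uXXXX form of a single character, using a UTF-16
    surrogate pair for code points outside the Basic Multilingual Plane."""
    code_point = ord(char)
    if code_point < 0x10000:
        # BMP code points fit in a single \uXXXX escape.
        return f"\\u{code_point:04X}"
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)   # top 10 bits of the offset
    low = 0xDC00 + (offset & 0x3FF)  # bottom 10 bits of the offset
    return f"\\u{high:04X}\\u{low:04X}"


print(to_surrogate_escapes("\U0001D306"))  # \uD834\uDF06
```

That is two four-digit sequences, which matches the 8 keystrokes mentioned above.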
from fluent.
I don’t think not being able to use `\uXXXX` would be a problem, but I guess people would have to try once to discover it’s not supported? Would using it produce a syntax error?
from fluent.
> Do we really expect to be able to live without new lines in a string? BTW, we also have \r around in mozilla-central
New-lines are supported by the syntax natively. You don't need to escape them, just write them as normal:
    foo =
        Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed
        aliquam dui quis nibh rutrum semper. Vestibulum a enim eget
        orci imperdiet tincidunt nec mattis leo. Aenean faucibus ligula
        turpis, eu tincidunt lorem malesuada eget.
> Also, the idea of having to write something like { "[" } for displaying one character makes me really want to ┻━┻ ︵ヽ(`Д´)ノ︵ ┻━┻
That would be only necessary if `[` happens to be at the beginning of a multiline value. Otherwise, it's not special. Consider:
    foo = Foo [ Bar ]
    bar =
        Lorem ipsum dolor sit amet, consectetur adipiscing elit.
        { "[" } bar ].
> Code:\t { foo }
What is the advantage of writing `\t` rather than typing the tab character?
from fluent.
> New-lines are supported by the syntax natively.
Uh, forgot that multiline strings have a different syntax. So, that's not a problem.
> That would be only necessary if [ happens to be at the beginning of a multiline value.
Aren't we asking too much then to localizers working on these files? I know we would prefer them to use tools, where this would be automated and transparent, but it seems to add a lot of complexity.
Can you think of other languages where escaping depends on the position of the character?
> What is the advantage of writing \t rather than typing the tab character?
None, but my understanding is that we're considering the case where someone wrote the string assuming `\t` (or `\n`, `\r`, etc.) would be converted to a tab or a newline, and how to deal with that.
from fluent.
> What is the advantage of writing \t rather than typing the tab character?
Some editors define `tab` behavior to jump between inputs.
from fluent.
@stasm is there anything left in this issue?
from fluent.
No, I forgot to close this issue. And to tag Syntax Spec 0.3 back in April. D'oh. Thanks.
from fluent.