Comments (13)

flodolo commented on May 21, 2024

I might have more questions than answers…

  1. Do we really expect to be able to live without new lines in a string? BTW, we also have \r around in mozilla-central

  2. What about trying to promote FTL as a file format for other uses, where Fluent is not used as the technology driving the project but just as a parser? Is that excluded as a potential scenario? If it's not, supporting unicode and new lines seems needed.

  3. IMO all special characters should be treated equally. If [ is a special character, it should be escaped as \[. Think regular expressions for example.
    Also, the idea of having to write something like { "[" } for displaying one character makes me really want to ┻━┻ ︵ヽ(`Д´)ノ︵ ┻━┻

  4. Displaying t when I write \t, because \t is not recognized as a known escape sequence, doesn't sound like a good idea. Here are a few possible scenarios:

  • Code:\t { foo }: I wanted to create a tab, displaying t is bad, I should get rid of the whole escape sequence.
  • Go to c:\documents: I wanted to write a literal \, so it should have been Go to c:\\documents. Again, displaying c:documents doesn't seem like a good idea.
  • %Spx \u00D7 %Spx: I wanted to display %Spx × %Spx, displaying %Spx u00D7 %Spx, awful result.

Maybe dropping the string altogether with an error is a better option.

from fluent.

stasm commented on May 21, 2024

Last week we met in person and briefly discussed this issue with @Pike, @zbraniecki and @flodolo. Here are the key takeaways from that conversation:

  • We don't have to answer all questions right now.
  • Prefer to be flexible: don't normalize by default.
  • In the future, allow bindings to configure the context's behavior wrt. normalization.
  • Unicode escapes are a safety valve.
  • Parsing \n to n isn't helpful and produces unexpected behavior.

With that in mind, I'd like to suggest a minimal specification for our current purposes.

  • Escape sequences are only allowed in text and quoted-text.
  • Newlines are preserved by the parser. This allows proper serialization.
  • Known escape sequences are: \\ for the literal backslash, \" for the literal double quote, \{ for the literal opening brace and \u followed by 4 hex digits for Unicode code points. Representing code points from outside of the Basic Multilingual Plane is made possible with surrogate pairs (two \uXXXX sequences). Using the actual character is encouraged, however.
  • Any other escaped characters result in a parsing error. (We might relax this to producing warnings and parsing to a space for instance, but let's start with a stricter approach.)
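
The strict rules listed above can be sketched as a small unescaping function. This is only an illustration under stated assumptions: `unescape_strict` and `EscapeError` are hypothetical names, not part of any Fluent implementation, and the sketch handles only the text content of a single string, not the surrounding grammar.

```python
import re

# Hypothetical sketch of the strict proposal; not taken from a Fluent parser.
# A backslash may be followed by \uXXXX, by any single character, or by
# nothing at all (a trailing backslash).
_ESCAPE = re.compile(r'\\(u[0-9a-fA-F]{4}|.|$)', re.DOTALL)


class EscapeError(ValueError):
    """Raised for any escape sequence outside the known set."""


def unescape_strict(raw):
    def replace(match):
        body = match.group(1)
        if body in ('\\', '"', '{'):
            # Known single-character escapes map to the character itself.
            return body
        if len(body) == 5 and body[0] == 'u':
            # \uXXXX: exactly four hex digits, possibly half of a surrogate pair.
            return chr(int(body[1:], 16))
        raise EscapeError('unknown escape sequence: ' + repr('\\' + body))

    text = _ESCAPE.sub(replace, raw)
    try:
        # Round-trip through UTF-16 so that two \uXXXX surrogates
        # are combined into one non-BMP code point.
        return text.encode('utf-16', 'surrogatepass').decode('utf-16')
    except UnicodeDecodeError:
        raise EscapeError('lone surrogate in \\uXXXX escapes')


print(unescape_strict(r'%Spx \u00D7 %Spx'))  # %Spx × %Spx
```

With this sketch, `Code:\t { foo }` and `\u20` both raise `EscapeError` instead of silently producing `t` or `u20`.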

stasm commented on May 21, 2024

A draft proposal:

  • Escape sequences are only allowed in the text and quoted-text productions.

  • Known escapes are: \\, \*, \[, \{, \uXXXX, \t, \n, \".

    • Do we actually need \t and \n? The whole syntax was designed to make it easy to use white-space and those characters could be written literally, if needed.
  • Unicode sequences are only valid with exactly four hex digits: \u0020 is valid, \u20 is not. This is the same as in JavaScript.

    • ES2015 introduced Unicode code point escapes which are written as \u{XXXXXXX} with any number of X, making it possible to represent code points from outside of the Basic Multilingual Plane without resorting to surrogate pairs. Non-BMP code points are very rare but the question still remains: are we okay with using surrogate pairs for them if we go with the \uXXXX proposal?
  • Escaping any other character returns the character itself and the character is parsed as normal; \a results in a and \ at the end of line results in the EOL character which is parsed as normal. This is different from JavaScript which has a special case for \EOL which is called LineContinuation and technically is not an escape sequence.

    For instance, in the following example, the escaped EOL results in a real EOL and it ends the value part of the a variant:

    foo = {
           *[a] AAA\
            [b] BBB
        }
    

@Pike, @zbraniecki — I'd love to hear your thoughts on this. Thanks!

zbraniecki commented on May 21, 2024

sgtm! I'd not do \n, \t until we have a use case.

stasm commented on May 21, 2024

I woke up this morning and I had another idea: what if we tried to use the { "x" } pattern as much as we can? The following is a counter-proposal to the one above.

Firstly, let's talk about the backslash. In a more extreme version of the proposal, it can (a) become a regular literal character. Or, it could (b) escape any character to itself, taking it out of the parse flow.

  • In (a) we need a new solution for Unicode escapes. Perhaps a new literal: Foo { U+10000 } bar?

  • In (b) we need four exceptions: \\, \", \uXXXX and \EOL because we don't want to take the EOL out of the parse flow.

  • Or, (c) we could introduce the U+10000 literals and leave backslash for escaping only the " and EOL (and the \ itself). This makes sense: these are the characters used to end a string.

Special characters occurring in text can be escaped by putting them in placeables. quoted-text doesn't allow more placeables, so the following are valid and unambiguous: { "{" }, { "[" } etc. The one exception is the double quote " itself. In (a) we don't have a way to put it in a quoted-text.

So the question boils down to: how much do we want to limit the quoted-text production? It is mostly used in call-expression and I like the idea of keeping the arguments very simple. But maybe we don't want to limit them too much in case we'd like to have things like WRAP(brand-name, char: "\"") or LIST(users, separator: "\uXXXX") in the future.

stasm commented on May 21, 2024

After more thought I'd like to go back to the first proposal and also make it simpler.

  • Escape sequences are only allowed in text and quoted-text.
  • Known escape sequences are: \\ for the literal backslash, \" for the literal double quote and \{ for the literal opening brace.
  • Any other escaped characters result in the literal character being added to the text content of the production. So, \a is a and \EOL is EOL.
  • Other special characters like [ can be written as { "[" } if they happen to be at the beginning of the line and should be part of the text content.
  • Using Unicode in FTL is encouraged and as such, we don't offer the \uXXXX sequence at all.
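
To make the effect of this simpler proposal concrete, here is a minimal sketch of the fall-through rule: every backslash is dropped and the character after it is kept. The name `unescape_lenient` is hypothetical, and the sketch models only the resulting text content, not the fact that an escaped { no longer opens a placeable.

```python
import re


def unescape_lenient(raw):
    # Hypothetical sketch: a backslash followed by any character, including
    # a newline, yields that character. A lone trailing backslash has nothing
    # to escape and is left untouched by this sketch.
    return re.sub(r'\\(.)', lambda match: match.group(1), raw, flags=re.DOTALL)


# The three scenarios raised earlier in the thread:
print(unescape_lenient(r'Code:\t { foo }'))     # Code:t { foo }
print(unescape_lenient(r'Go to c:\documents'))  # Go to c:documents
print(unescape_lenient(r'%Spx \u00D7 %Spx'))    # %Spx u00D7 %Spx
```

Only the correctly escaped `Go to c:\\documents` keeps its backslash, which is the behavior the earlier comment objected to.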

@Pike, @zbraniecki - mind taking another look at this, please?

Pike commented on May 21, 2024

I'm not sure if doing \n->n is a good idea. Or maybe that's something we can warn about in a linter step? It seems like such a ubiquitous assumption that that'd be a newline. And other fall-through escapes. We do have such a warning in compare-locales for .properties, too. Rambling.
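
A linter step of that kind could look roughly like the sketch below. It is not how compare-locales works; `find_fallthrough_escapes` and `KNOWN_ESCAPES` are hypothetical names, and the known set is assumed to be the three escapes from the latest proposal.

```python
import re

# Assumption: the known escapes are \\, \" and \{ as in the latest proposal.
KNOWN_ESCAPES = {'\\', '"', '{'}


def find_fallthrough_escapes(raw):
    """Return (offset, sequence) pairs for escapes that would fall through."""
    warnings = []
    for match in re.finditer(r'\\(.)', raw, flags=re.DOTALL):
        char = match.group(1)
        if char not in KNOWN_ESCAPES:
            warnings.append((match.start(), '\\' + char))
    return warnings


print(find_fallthrough_escapes(r'Line one\nLine two'))  # [(8, '\\n')]
```

Because matches do not overlap, an escaped backslash followed by n is not reported: the pair of backslashes is consumed first and the n is plain text.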

For Unicode escapes, I've just toyed around with the Unicode hex keyboard on the Mac. Interestingly, you need to enter surrogate pairs to get to 𝌆, 8 keystrokes away. I wonder if @flodolo or @TheoChevalier have opinions on this as people who actually have to type that Unicode stuff.
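
For reference, the eight hex digits come from the standard UTF-16 surrogate arithmetic. The sketch below shows it for 𝌆 (U+1D306); the function names are made up for illustration.

```python
def to_surrogate_pair(code_point):
    """Split a non-BMP code point into its UTF-16 high and low surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError('not a supplementary-plane code point')
    offset = code_point - 0x10000
    # The top 10 bits go into the high surrogate, the bottom 10 into the low.
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


def as_u_escapes(char):
    """Spell one character as \\uXXXX, using a surrogate pair if needed."""
    code_point = ord(char)
    if code_point < 0x10000:
        return '\\u{:04X}'.format(code_point)
    high, low = to_surrogate_pair(code_point)
    return '\\u{:04X}\\u{:04X}'.format(high, low)


print(as_u_escapes('\U0001D306'))  # \uD834\uDF06
```

A BMP character such as × needs only one sequence, `\u00D7`.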

Apart from that, the latest proposal sounds fine to me.

TheoChevalier commented on May 21, 2024

I don’t think not being able to use \uXXXX would be a problem, but I guess people would have to try once to discover it’s not supported? Would using it produce a syntax error?

stasm commented on May 21, 2024

> Do we really expect to be able to live without new lines in a string? BTW, we also have \r around in mozilla-central

New-lines are supported by the syntax natively. You don't need to escape them, just write them as normal:

foo =
    Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed
    aliquam dui quis nibh rutrum semper. Vestibulum a enim eget
    orci imperdiet tincidunt nec mattis leo. Aenean faucibus ligula
    turpis, eu tincidunt lorem malesuada eget.

> Also, the idea of having to write something like { "[" } for displaying one character makes me really want to ┻━┻ ︵ヽ(`Д´)ノ︵ ┻━┻

That would be only necessary if [ happens to be at the beginning of a multiline value. Otherwise, it's not special. Consider:

foo = Foo [ Bar ]
bar =
    Lorem ipsum dolor sit amet, consectetur adipiscing elit.
    { "[" } bar ].
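
Tools that write FTL could apply this rule automatically. The following is a rough sketch of such a step, assuming that [ is the only character that needs protecting at the start of a line; `protect_line_starts` is a hypothetical helper, not an existing serializer API.

```python
def protect_line_starts(lines):
    """Wrap a leading [ in a placeable on each line of a multiline value."""
    protected = []
    for line in lines:
        if line.startswith('['):
            # Only the first character is special; the rest of the line
            # is kept exactly as written.
            line = '{ "[" }' + line[1:]
        protected.append(line)
    return protected


value = [
    'Lorem ipsum dolor sit amet, consectetur adipiscing elit.',
    '[ bar ].',
]
print(protect_line_starts(value)[1])  # { "[" } bar ].
```

A [ in the middle of a line, as in Foo [ Bar ], is left alone.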

> Code:\t { foo }

What is the advantage of writing \t rather than typing the tab character?

flodolo commented on May 21, 2024

> New-lines are supported by the syntax natively.

Uh, forgot that multiline strings have a different syntax. So, that's not a problem.

> That would be only necessary if [ happens to be at the beginning of a multiline value.

Aren't we asking too much then to localizers working on these files? I know we would prefer them to use tools, where this would be automated and transparent, but it seems to add a lot of complexity.
Can you think of other languages where escaping depends on the position of the character?

> What is the advantage of writing \t rather than typing the tab character?

None, but my understanding is that we're considering the case where someone wrote the string assuming \t (or \n, \r, etc.) would be converted to a tab or a newline, and how to deal with that.

zbraniecki commented on May 21, 2024

> What is the advantage of writing \t rather than typing the tab character?

Some editors define tab behavior to jump between inputs.

zbraniecki commented on May 21, 2024

@stasm is there anything left in this issue?

stasm commented on May 21, 2024

No, I forgot to close this issue. And to tag Syntax Spec 0.3 back in April. D'oh. Thanks.
