Support unicode escapes (fastparse, closed)

lihaoyi commented on June 8, 2024
Support unicode escapes

from fastparse.

Comments (7)

sirthias commented on June 8, 2024

I agree that we should try to get away w/o preprocessing if at all possible.

As to anyOf and noneOf: If these only contain 7-bit ASCII chars you should replace them with CharPredicate instances defined on the companion object. Performance will be much better.
See the CharacterClasses definition in the akka-http header parser for inspiration.

> is there any way to override the parsing over every single character or string

Yes, you can override the implicit conversion from String and/or Char.
See the Handling Whitespace section of the README for an example.
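
To illustrate why a precomputed predicate can beat scanning a list of characters for 7-bit ASCII input, here is a minimal sketch in plain Scala. It is not parboiled2's code; `AsciiSet` and `AsciiPredicate` are hypothetical names made up for this example. The idea shown is to pack set membership into two 64-bit masks so that each test is a shift and a bitwise AND.

```scala
object AsciiSet {
  /** Hypothetical membership test for 7-bit ASCII chars, backed by two 64-bit masks. */
  final class AsciiPredicate private (low: Long, high: Long) extends (Char => Boolean) {
    def apply(c: Char): Boolean = {
      val code = c.toInt
      if (code < 64) (low & (1L << code)) != 0L          // chars 0..63 live in the low mask
      else if (code < 128) (high & (1L << (code - 64))) != 0L // chars 64..127 in the high mask
      else false                                          // anything beyond ASCII never matches
    }
  }

  object AsciiPredicate {
    /** Builds the masks once, up front, from the given set of characters. */
    def apply(chars: String): AsciiPredicate = {
      require(chars.forall(_ < 128), "only 7-bit ASCII characters are supported")
      var low = 0L
      var high = 0L
      chars.foreach { c =>
        val code = c.toInt
        if (code < 64) low |= 1L << code
        else high |= 1L << (code - 64)
      }
      new AsciiPredicate(low, high)
    }
  }
}
```

With this sketch, `AsciiSet.AsciiPredicate("0123456789abcdefABCDEF")` gives a reusable hex-digit test whose cost does not depend on how many characters are in the set.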

propensive commented on June 8, 2024

This might be difficult to get right given that Unicode escaping is a
pre-process step in the compiler. How would you parse this, for example?

"\u005c"

Scalac equates it to a single backslash before parsing, which escapes the
"closing" double-quote, and thus it fails to parse.

Though "\u005c" parses just fine, and is equal to """\u005c""". But I can
only say that because Unicode escapes are the only kinds of escape that
work inside triple-quoted strings...

Incidentally, Erik Osheim did some work removing Unicode escaping as a
pre-processor step in the compiler, and moving it into the single-quoted
string parser to try to remove most of the unintuitive corner cases. This
should make it into the Typelevel fork at some point...

Cheers,
Jon

On 29 November 2014 at 20:51, Mathias wrote:

> I agree that we should try to get away w/o preprocessing if at all
> possible.
>
> As to anyOf and noneOf: If these only contain 7-bit ASCII chars you
> should replace them with CharPredicate instances defined on the companion
> object. Performance will be much better.
> See the CharacterClasses definition
> https://github.com/akka/akka/blob/release-2.3-dev/akka-http-core/src/main/scala/akka/http/model/parser/CharacterClasses.scala
> in the akka-http header parser for inspiration.
>
> > is there any way to override the parsing over every single character or
> > string
>
> Yes, you can override the implicit conversion from String and/or Char.
> See the Handling Whitespace
> https://github.com/sirthias/parboiled2#handling-whitespace section of
> the README for an example.

Jon Pretty | @propensive

lihaoyi commented on June 8, 2024

Maybe the right thing to do is to preprocess unicode escapes, and purposely leave the source positions all wrong.

sirthias commented on June 8, 2024

If you implement that simple pre-processing at the ParserInput level, building a simple translation map as you go, you might be able to have your cake and eat it too.
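
A rough sketch of what such a translation map could look like, in plain Scala with no parboiled2 dependency. Everything here (`EscapeMap`, `Decoded`, `decode`, `origIndex`) is a hypothetical name made up for illustration, and the escape rule is an assumption modelled on the Java-style one: a backslash preceded by an even number of backslashes, then one or more `u`, then four hex digits. The decoder records, for every decoded character, the offset in the raw input where it started, so error positions reported against the decoded text can be translated back.

```scala
object EscapeMap {
  /** Decoded text plus, for each decoded char, its starting offset in the raw input. */
  final case class Decoded(text: String, origIndex: Vector[Int])

  private def isHex(c: Char): Boolean = Character.digit(c, 16) >= 0

  def decode(input: String): Decoded = {
    val out = new StringBuilder
    val idx = Vector.newBuilder[Int]
    var i = 0
    var backslashes = 0 // number of contiguous raw backslashes just before position i
    while (i < input.length) {
      val c = input.charAt(i)
      var consumed = false
      if (c == '\\' && backslashes % 2 == 0) {
        // Candidate escape: skip the run of 'u' characters after the backslash.
        var j = i + 1
        while (j < input.length && input.charAt(j) == 'u') j += 1
        val hasU = j > i + 1
        if (hasU && j + 4 <= input.length &&
            (j until j + 4).forall(k => isHex(input.charAt(k)))) {
          out += Integer.parseInt(input.substring(j, j + 4), 16).toChar
          idx += i          // the decoded char maps back to the escape's backslash
          i = j + 4
          backslashes = 0   // the raw char before the new position is a hex digit
          consumed = true
        }
      }
      if (!consumed) {
        // Ordinary character: copy it through and keep the backslash count current.
        out += c
        idx += i
        backslashes = if (c == '\\') backslashes + 1 else 0
        i += 1
      }
    }
    Decoded(out.toString, idx.result())
  }
}
```

Feeding it Jon's eight-character example (a double quote, the escape for a backslash, a double quote) yields three decoded characters whose `origIndex` is `Vector(0, 1, 7)`, so a failure reported at the decoded closing quote can still be attributed to raw offset 7. A real integration would wrap the decoded text in a ParserInput and apply the map when formatting errors; that part is left out here.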

paulp commented on June 8, 2024

My approach was to honor unicode escapes where I must (that is, it won't parse if I don't) and ignore them otherwise (in strings and comments). That's much closer to the behavior I think is sane, and I doubt that the power of unicode escapes to open and close strings and comments is something which requires support.

Not claiming this is especially performant or anything, just a point of reference. https://github.com/paulp/scala-parser/blob/0a1e476c712d2ba/parser/src/main/scala/Basic.scala#L24

lihaoyi commented on June 8, 2024

Unicode escapes are now supported in strings. I'm going to just punt on this in general as a #wontfix. In all the dozen projects I parsed, I think I found exactly 4 unicode escapes that don't fall in a string, all of which are in Scalac test files. Not worth my time to support this ^_^

propensive commented on June 8, 2024

Are they supported in characters, e.g. '\u0000'? That, I think, would clear up all of the other useful cases.
