Comments (10)

jamesdbrock commented on September 27, 2024

@paf31 introduces StringLike #36 to “support more efficient string representations.”

What kind of representations? The only thing I can think of is something like CatList<String>, which could be a more efficient “string” representation if the “string” is a large document that is being edited, or is being read lazily in chunks from a large file.

from purescript-parsing.

jamesdbrock commented on September 27, 2024

We should combine parsing and string-parsers #69

jamesdbrock commented on September 27, 2024

https://github.com/Thimoteus/SandScript/wiki/2.-Parsing-recursively

jamesdbrock commented on September 27, 2024

Here's a parsing monad.

https://github.com/natefaubion/purescript-language-cst-parser/blob/main/src/PureScript/CST/Parser/Monad.purs

Here's a CPS PureScript parsing monad.

https://github.com/cdepillabout/cabal2nixWithoutIFD/blob/main/purescript-parser-combinator/src/Parsec.purs

Adapted from

https://github.com/jonascarpay/alloy/blob/master/src/Parser/Parsec.hs

Adapted from Parsec.

jamesdbrock commented on September 27, 2024

I would want this library to be as full-featured as possible, and to have these properties (which we mostly already have):

  • Stack-safe
  • Auto-backtracking (if a parser fails then it consumes no input)
  • Monad-transformable
  • Input streams extendable, with built-in support for
    • String UCS-2 Big-Endian
    • String UCS-2 Little-Endian
    • String UTF-16 Big-Endian
    • String UTF-16 Little-Endian
    • Uint8Array UTF-8
    • DataView
    • forall token. List<token>

Of the UTF-16 encodings, Node.js Buffer supports only UTF-16 Little-Endian (utf16le): https://nodejs.org/api/buffer.html#buffer_buffers_and_character_encodings

https://mathiasbynens.be/notes/javascript-encoding

https://kevin.burke.dev/kevin/node-js-string-encoding/

jamesdbrock commented on September 27, 2024

The purescript-string-parsers Text.Parsing.StringParser.CodePoints module makes these design decisions:

  • Use a cursor in units of code points.
  • Return a Char.

A better design would be to

  • Use a cursor in units of code units (and increment by two for astral characters)
  • Return a CodePoint.

purescript-contrib/purescript-string-parsers#48

We could use this getWholeChar function.

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/charAt#getting_whole_characters

Or actually, CodePoints.uncons might suffice:

https://github.com/purescript/purescript-strings/blob/157e372a23e4becd594d7e7bff6f372a6f63dd82/src/Data/String/CodePoints.purs#L191-L191
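A minimal sketch of the code-unit cursor idea, assuming CodePoints.uncons is used to read the next code point. The name nextCodePoint is hypothetical and is not part of purescript-strings or purescript-string-parsers. The sketch slices the input with CodeUnits.drop for simplicity, so it illustrates the cursor arithmetic rather than any performance benefit.

```purescript
module Example.CodeUnitCursor where

import Prelude

import Data.Maybe (Maybe)
import Data.String.CodePoints (CodePoint)
import Data.String.CodePoints as CodePoints
import Data.String.CodeUnits as CodeUnits

-- | Hypothetical helper, not part of any library discussed here.
-- | `pos` is a cursor measured in UTF-16 code units.
-- | Returns the code point that starts at `pos` together with the cursor
-- | just past it, or `Nothing` at the end of the input.
nextCodePoint :: Int -> String -> Maybe { codePoint :: CodePoint, next :: Int }
nextCodePoint pos input = do
  { head } <- CodePoints.uncons (CodeUnits.drop pos input)
  -- An astral code point occupies two code units, anything else one.
  let width = CodeUnits.length (CodePoints.singleton head)
  pure { codePoint: head, next: pos + width }
```

For an input that begins with an astral character, nextCodePoint 0 should report next = 2, and a call at that new position reads the following character.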

jamesdbrock commented on September 27, 2024

Actually, maybe there is no performance improvement to be had with a cursor-based parser state?

purescript/purescript-strings#120

jamesdbrock commented on September 27, 2024

We need instructions for how to parse a String with Regex, then switch to Parser, then switch back to Regex. This should be a supported use case, considering that Parser is 100× slower than Regex.

Also support the opposite case, with a parseRegex :: Regex -> ParserT String m (Array String).
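One way the Regex-to-Parser handoff might look, as a sketch only. matchPrefix is a hypothetical helper, not an existing function in purescript-parsing, and it assumes the Regex is anchored with ^ and built without the global flag, so any match starts at the beginning of the input. It relies on Data.String.Regex.match from purescript-strings.

```purescript
module Example.RegexPrefix where

import Prelude

import Data.Array.NonEmpty as NonEmptyArray
import Data.Maybe (Maybe)
import Data.String.CodeUnits as CodeUnits
import Data.String.Regex (Regex, match)

-- | Hypothetical helper. Assumes `re` is anchored with `^` and has no
-- | global flag, so a match always starts at the beginning of `input`.
-- | Returns the matched prefix and the remaining input.
matchPrefix :: Regex -> String -> Maybe { matched :: String, rest :: String }
matchPrefix re input = do
  groups <- match re input
  -- The first element is the whole match; later elements are capture groups.
  matched <- NonEmptyArray.head groups
  pure { matched: matched, rest: CodeUnits.drop (CodeUnits.length matched) input }
```

The rest field could then be given to a Parser, and whatever input that Parser leaves unconsumed could be passed back to matchPrefix.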

jamesdbrock commented on September 27, 2024

More package properties

  • Does not release (does not free the memory of) input already consumed.
  • Does not allow for continuation of more input received (like Attoparsec).

jamesdbrock commented on September 27, 2024

instance altParserT :: Monad m => Alt (ParserT s m) where
  alt p1 p2 = (ParserT <<< ExceptT <<< StateT) \(s@(ParseState i p _)) -> do
    Tuple e s'@(ParseState _ _ c') <- runStateT (runExceptT (unwrap p1)) (ParseState i p false)
    case e of
      Left _
        | not c' -> runStateT (runExceptT (unwrap p2)) s
      _ -> pure (Tuple e s')

I think the purpose of the consumed flag is to do something like this?

Note that if p succeeds without consuming input the second alternative is favored if it consumes input. This implements the “longest match” rule.

https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/parsec-paper-letter.pdf p.11

Except that, the way I read this Alt instance, it favors p2 if p1 failed and consumed no input.
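To make that reading concrete, here is a toy model of the branching in the Alt instance quoted above. Toy and altToy are hypothetical names, and the model is a simplification: it drops the monad transformer stack and the remaining input, and keeps only the result and the consumed flag.

```purescript
module Example.AltModel where

import Prelude

import Data.Either (Either(..))
import Data.Tuple (Tuple(..))

-- | Toy model, not the library's types: a parser maps an input string to a
-- | result plus a flag saying whether it consumed input.
type Toy a = String -> Tuple (Either String a) Boolean

-- | Mirrors the case analysis of the quoted Alt instance.
altToy :: forall a. Toy a -> Toy a -> Toy a
altToy p1 p2 input =
  case p1 input of
    -- p1 failed without consuming input: run p2 on the original input.
    Tuple (Left _) false -> p2 input
    -- p1 succeeded, or failed after consuming input: p2 is never run.
    result -> result
```

In this model, a p1 that succeeds without consuming input wins outright, so p2 is never tried. That differs from the behavior the Parsec paper passage describes.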
