

Parsing

Maintainers: jamesdbrock, robertdp, chtenb

A monadic parser combinator library based on Haskell’s Parsec.

Installation

Install parsing with Spago:

$ spago install parsing

Quick start

Here is a basic tutorial introduction to monadic parsing with this package.

Parsers

A parser turns a string into a data structure. Parsers in this library have the type Parser s a, where s is the type of the input string, and a is the type of the data which the parser will produce on success. Parser s is a monad. It’s defined in the module Parsing.

Monads can be used to provide context for a computation, and that’s how we use them in monadic parsing. The context provided by the Parser s monad is the parser’s current location in the input string. Parsing starts at the beginning of the input string.

Parsing requires two more capabilities: alternative and failure.

We need alternative to be able to choose what kind of thing we’re parsing depending on the input which we encounter. This is provided by the <|> “alt” operator of the Alt typeclass instance of the Parser s monad. The expression p_left <|> p_right will first try the p_left parser and if that fails and consumes no input then it will try the p_right parser.

We need failure in case the input stream is not parseable. This is provided by the fail function, which calls the throwError function of the MonadThrow typeclass instance of the Parser s monad.
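As a small illustration of how alternative and failure combine (the parser `yesNo` is a sketch invented for this example, not part of the library):

```purescript
module Example where

import Prelude
import Control.Alt ((<|>))
import Parsing (Parser, fail)
import Parsing.String (string)

-- Accept "yes" or "no"; report a custom error for anything else.
yesNo :: Parser String Boolean
yesNo =
  (string "yes" *> pure true)
    <|> (string "no" *> pure false)
    <|> fail "expected yes or no"
```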

To run a parser, call the function runParser :: s -> Parser s a -> Either ParseError a in the Parsing module, and supply it with an input string and a parser. If the parse succeeds then the result is Right a and if the parse fails then the result is Left ParseError.

Primitive parsers

Each type of input string needs primitive parsers. Primitive parsers for input string type String are in the Parsing.String module. For example, the primitive char :: Char -> Parser String Char parser will exactly match one literal character and then advance by one position in the input string.

We can use these primitive parsers to write other String parsers.

Writing a parser

Here is a parser ayebee :: Parser String Boolean which will accept only two input strings: "ab" or "aB". It will return true if the b character is uppercase. It will return false if the b character is lowercase. It will fail with a ParseError if the input string is anything else.

ayebee :: Parser String Boolean
ayebee = do
  _ <- char 'a'
  b <- char 'b' <|> char 'B'
  pure (b == 'B')

We can run the parser ayebee like so

runParser "aB" ayebee

and then the parser will succeed and return Right true.

More parsers

There are other String parsers in the module Parsing.String.Basic, for example the parser letter :: Parser String Char which will accept any single alphabetic letter.

Parser combinators

Parser combinators are in this package in the module Parsing.Combinators.

A parser combinator is a function which takes a parser as an argument and returns a new parser. The many combinator, for example, will repeat a parser as many times as it can. So the parser many letter will have type Parser String (Array Char).

Running the parser

runParser "aBabaB" (many ayebee)

will return Right [true, false, true].

Stack-safety

Starting with v9.0.0, all parsers and combinators in this package are always stack-safe.

Recursion

For the most part, we can just write recursive parsers (parsers defined in terms of themselves) and they will work as we expect.

In some cases like this:

aye :: Parser String Char
aye = char 'a' *> aye

we might get a compile-time CycleInDeclaration error which looks like this:

  The value of aye is undefined here, so this reference is not allowed.


See https://github.com/purescript/documentation/blob/master/errors/CycleInDeclaration.md for more information,
or to contribute content related to this error.

This is happening because we tried to call aye recursively “at a point where such a reference would be unavailable because of strict evaluation.”

The best way to solve this is to stick a Control.Lazy.defer in front of the parser to break the cycle.

aye :: Parser String Char
aye = defer \_ -> char 'a' *> aye
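The same pattern scales to deeper recursion. Here is a sketch (the `parens` parser is illustrative, not from this package's API) of a self-referential parser for balanced nesting:

```purescript
import Prelude
import Control.Alt ((<|>))
import Control.Lazy (defer)
import Parsing (Parser)
import Parsing.String (char)

-- Match balanced nesting such as "", "()" or "(())".
-- defer breaks the cycle in the self-reference, as described above.
parens :: Parser String Unit
parens = defer \_ ->
  (char '(' *> parens <* char ')') <|> pure unit
```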

Resources

There are lots of other great monadic parsing tutorials on the internet.

Related Packages

Documentation

parsing documentation is stored in a few places:

  1. Module documentation is published on Pursuit.
  2. Written documentation is kept in the docs directory.
  3. Usage examples can be found in the test suite.

If you get stuck, there are several ways to get help:

Contributing

You can contribute to parsing in several ways:

  1. If you encounter a problem or have a question, please open an issue. We'll do our best to work with you to resolve or answer it.

  2. If you would like to contribute code, tests, or documentation, please read the contributor guide. It's a short, helpful introduction to contributing to this library, including development instructions.

  3. If you have written a library, tutorial, guide, or other resource based on this package, please share it on the PureScript Discourse! Writing libraries and learning resources are a great way to help this library succeed.

purescript-parsing's People

Contributors

bentongxyz, cdepillabout, cryogenian, dependabot[bot], dretch, epost, felixschl, fsoikin, garyb, jamesdbrock, jdegoes, joneshf, jordanmartinez, justinwoo, kl0tl, matoruru, maxdeviant, maybejustjames, mlang, mstream, natefaubion, paf31, rndnoise, robertdp, ryanartecona, safareli, th-awake, thomashoneyman, tmcgilchrist, triggernz


purescript-parsing's Issues

Reconsider alternative instance for Transformer typeclasses

When I first used the MonadState instance for ParserT I had ParserT s (State s') a and I was expecting it to be defined as in most transformers:

(MonadState s m) => MonadState s (ParserT s' m)

but instead it's defined as:

(Monad m) => MonadState (ParseState s) (ParserT s m)
-- `MonadThrow` and `MonadError` are defined in the same fashion.

So my question is: why is it like that? I think if we had the usual Monad* definitions then ParserT would be more useful in transformer stacks (we could expose functions from these typeclasses under different names, so that users could still manipulate ParserT's internal structure).

manyTill_ combinator

We should have the super-useful manyTill_ combinator.

Combinators.manyTill_

manyTill_ :: forall e m a s. Monad m => ParserT s m a -> ParserT s m e -> ParserT s m (Tuple (List a) e)

Also

someTill_ :: forall e m a s. Monad m => ParserT s m a -> ParserT s m e -> ParserT s m (Tuple (List a) e)
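For illustration, a hypothetical use of the proposed combinator (the `lineComment` parser and its helpers are made-up names, assuming `manyTill_` exists as specified above):

```purescript
-- Sketch: collect the characters of a line comment, keeping both
-- the body and the terminator character that ended it.
lineComment :: Parser String (Tuple (List Char) Char)
lineComment = string "--" *> manyTill_ anyChar (char '\n')
```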

Add `eof` for Token parsers?

Something like this:

eof :: forall a. Parser (List a) Unit
eof =
  get >>= \(input :: List a) ->
    unless (null input) (fail "expected eof")

I guess it would go into `Text.Parsing.Parser.Token`?

By the way, is it right that we need that type annotation on input? Without it, type unification fails.

Argument order in sepBy is ambiguous

Here is the type

sepBy :: forall m s a sep. Monad m => ParserT s m a -> ParserT s m sep -> ParserT s m (List a)

Usually when parsing, the separator parser produces the same type of value as the value parser, i.e. the type variables a and sep end up being the same type, which means a user can write incorrect code, for example:
sepBy separatorParser valueParser vs sepBy valueParser separatorParser

The user needs to look at the type variables to understand the order in which the arguments should be passed. But if we change the type to:

sepBy :: forall m s a sep. Monad m => ParserT s m a -> ParserT s m Unit -> ParserT s m (List a)

then the incorrect code above will not compile. As a result it would be far less likely to mix up the argument order (unless you want an Array Unit as the result).

The downside would be the need for an extra $> unit or *> pure unit at the end of the separator parser.
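Under the proposed signature, callers would discard the separator's result explicitly, e.g. with `void` — a sketch with made-up parser names (`fields`, `field`):

```purescript
-- The separator must now have result type Unit, so swapping the
-- two arguments no longer typechecks.
fields :: Parser String (List String)
fields = field `sepBy` void (char ',')
```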

Don't name files *.purs.hs?

Everyone who uses anything that depends on this has to add more rules to their Gruntfile to handle it and it's not an obvious solution.

Can we have more typesafety w.r.t. many-like combinators and infinite loops?

Is your change request related to a problem? Please describe.
many p will get stuck in a loop if p possibly doesn't consume any input but still succeeds.

One could argue that the parser is bogus in that case. However, it would be nice to have some compile time safety guarantees for this, because a hanging application isn't fun to deal with, even if it occurs while developing.

Some solution directions that cross my mind

  • Utilizing the typesystem in some way that makes it impossible to pass a parser that possibly consumes no input but still succeeds.
  • Changing the semantics of many such that it stops (or fails?) whenever the position of the parser state hasn't moved after applying p.

Q: thoughts on takeSome and takeMany

Recently I used List.Lazy and my expectation was that the parser take 2 (many (char '0')) would try to match the character 0 at most 2 times. But it actually tries to match 0 until it fails. So if the input to the parser is 0000000 it will consume all of the 0s and the result will be a list of two 0s.

To solve this issue I have implemented takeSome and takeMany for normal List:

-- | Attempt a computation `n` times, requiring at least one success.
-- |
-- | The `Lazy` constraint is used to generate the result lazily, to ensure
-- | termination.
takeSome :: forall f a. Alternative f => Lazy (f (List a)) => Int -> f a -> f (List a)
takeSome 0 _ = pure Nil
takeSome n v = Cons <$> v <*> defer (\_ -> takeMany (n - 1) v)

-- | Attempt a computation `n` times, returning as many successful results
-- | as possible (possibly zero).
-- |
-- | The `Lazy` constraint is used to generate the result lazily, to ensure
-- | termination.
takeMany :: forall f a. Alternative f => Lazy (f (List a)) => Int -> f a -> f (List a)
takeMany 0 _ = pure Nil
takeMany n v = takeSome n v <|> pure Nil
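With these definitions, the earlier example would behave as hoped — a sketch (assuming ParserT's Alternative and Lazy instances satisfy the constraints):

```purescript
-- At most two '0' characters are attempted, instead of consuming
-- the whole run of zeros.
twoZeros :: Parser String (List Char)
twoZeros = takeMany 2 (char '0')
```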

If Infinity was member of Int then:

some = takeSome Infinity
many = takeMany Infinity

This issue is not strictly related to Parser but I think it could be useful for parsers.
What do you think about defining these functions for List/Array/...?

We could have an internal definition of take{Some,Many} which takes a Number (to allow use of Infinity) and define some, many, takeSome and takeMany in terms of it.

takeSome = toNumber >>> takeSome_
some = takeSome_ Infinity
takeMany = toNumber >>> takeMany_
many = takeMany_ Infinity

option description

Is this description correct?

-- | Provide a default result in the case where a parser fails without consuming input.
option :: forall m s a. Monad m => a -> ParserT s m a -> ParserT s m a
option a p = p <|> pure a

The way I read the Alt instance,

instance altParserT :: Monad m => Alt (ParserT s m) where
  alt p1 p2 = (ParserT <<< ExceptT <<< StateT) \(s@(ParseState i p _)) -> do
    Tuple e s'@(ParseState _ _ c') <- runStateT (runExceptT (unwrap p1)) (ParseState i p false)
    case e of
      Left _
        | not c' -> runStateT (runExceptT (unwrap p2)) s
      _ -> pure (Tuple e s')

I think this description should be

-- | Provide a default result in the case where a parser fails or consumes no input.

I don't see anything in the test suite which clarifies this. Need to test.

CodePoints uncons? Deprecate drop?

I just noticed that instance stringStringLike uses CodeUnits for uncons.

instance stringLikeString :: StringLike String where
  uncons = SCU.uncons
  drop = S.drop

Doesn't that mean that anyChar will be wrong for astral characters?

anyChar = do
  input <- gets \(ParseState input _ _) -> input
  case uncons input of

Also, the drop member of StringLike is now unused?

parsing failure context label

It would be nice if we could get some context with parse failures, like Megaparsec label

In ParserT, fail is implemented as throwError

-- | Fail with a message.
fail :: forall m s a. Monad m => String -> ParserT s m a
fail message = failWithPosition message =<< position

-- | Fail with a message and a position.
failWithPosition :: forall m s a. Monad m => String -> Position -> ParserT s m a
failWithPosition message pos = throwError (ParseError message pos)

This suggests a cheap implementation strategy for label, using catchError :: forall a. m a -> (e -> m a) -> m a.

-- | If parsing fails inside this context, then prepend the String to the error String.
label :: forall m s a. Monad m => String -> ParserT s m a -> ParserT s m a
label preerror p = catchError p go
  where
  go (ParseError err pos) = failWithPosition (preerror <> err) pos

Actually come to think of it, anyone can use this label function without library support. Maybe I'll just do that.
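A hypothetical use of that label function (the `number` and `digits` parsers are assumed names, not library functions):

```purescript
-- Prepend context to whatever error `digits` produces.
number :: Parser String String
number = label "while reading a number: " digits
```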

I suppose if we wanted to make this more official and support it in the library, then we could change the error type to ParseError (NonEmptyList String) Position and cons the errors up as we fail out of the parser.

Export `digit`

It's referred to in the documentation, but as far as I can tell, it is only actually defined in Main (and therefore can't be imported by users of this library).

Could it go in Text.Parsing.Parsers.String?

Combine purescript-string-parsers and purescript-parsing

Not totally sure, but I think we can somehow generalize the current encoding of the parser such that the parser from ps-string-parsers would be an instance of it.

What I see in ps-string-parsers is that it uses a different Position type which is basically a cursor into the input string. But what if we want the same performance benefits (of not allocating new states) when we have tokens as an array, or as some structure where moving the "cursor" (position) is cheap?

We could do a similar thing to Stream here to make that possible, but maybe we can combine cursor-based streams and non-cursor-based streams and have one parser lib.

related #62 #58

lookAhead consume on failure?

-- | Parse a phrase, without modifying the consumed state or stream position.
lookAhead :: forall s a m. Monad m => ParserT s m a -> ParserT s m a
lookAhead p = (ParserT <<< ExceptT <<< StateT) \s -> do
  Tuple e _ <- runStateT (runExceptT (unwrap p)) s
  pure (Tuple e s)

This seems different from the way that Parsec and Megaparsec work? They consume input in case of failure.

https://hackage.haskell.org/package/megaparsec-9.2.0/docs/Text-Megaparsec.html#v:lookAhead

https://hackage.haskell.org/package/parsec-3.1.15.0/docs/Text-Parsec.html#v:lookAhead

See also purescript-contrib/purescript-string-parsers#73

`sepEndBy` behaves incorrectly (?)

elemA = PS.string "A"
elemB = PS.string "B"
sep = PS.string ";"
parse = flip P.runParser do Tuple <$> PC.sepEndBy elemA sep <*> elemB

This now fails:

parse "A;B"

Prior to v6 it succeeded. Copying the old implementation of sepEndBy locally and implementing parse with that restores the original behaviour.

I'd say this behaviour is wrong too, it's not just that it's different now - as far as I can tell, if there's a trailing separator, and then any other input follows, the combinator fails now.

rest primitive String parser

I needed this rest parser today so I wrote something like it. It's pretty common and I think we should include it in the primitive parsers for Text.Parsing.Parser.String.

-- | Parse and return the rest of the input. Always succeeds.
rest :: forall m. Monad m => ParserT String m String
rest = do
  ParseState input position _ <- get
  set $ ParseState "" (updatePosString position input) true
  pure input
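For illustration, a sketch of how rest might be used (the `afterColon` parser is a made-up name; `manyTill` is the existing combinator):

```purescript
-- Consume everything up to and including the first ':' and
-- return the remainder of the input verbatim.
afterColon :: Parser String String
afterColon = manyTill anyChar (char ':') *> rest
```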

Except we can't write it for StringLike. Are we sure that the StringLike typeclass is a good idea?

rest :: forall m s. StringLike s => Monad m => ParserT s m s

Notes on performance

This is what the benchmarks currently look like:

Text.Parsing.StringParser.CodeUnits

StringParser.runParser parse23Units
mean   = 10.10 ms
stddev = 1.13 ms
min    = 9.46 ms
max    = 24.07 ms

Text.Parsing.Parser.String

runParser parse23
mean   = 44.20 ms
stddev = 6.38 ms
min    = 42.25 ms
max    = 113.16 ms

Data.String.Regex

Regex.match pattern23
mean   = 728.23 μs
stddev = 339.32 μs
min    = 613.72 μs
max    = 2.97 ms

I would like to reduce that 4× slowness between Parser.String and StringParser.CodeUnits .

The difference could be due to:

  1. CodePoint rather than Char. Everything goes through the anyCodePoint parser since #119 , but I benchmarked it at the time and it didn't make any difference.
  2. String tail state. Every time the parser advances by one character, we uncons the input string and save the tail as the new state. I tried changing that to only keeping a codeunit index into the input string on this branch and it didn't make any difference. https://github.com/jamesdbrock/purescript-parsing/tree/cursor
  3. Parsing.Parser.String input position tracking with Pos { line :: Int, column :: Int}. I tried changing that to Pos Int on this branch and it didn't make any difference. https://github.com/jamesdbrock/purescript-parsing/tree/cursor
  4. Monad transformers. When I look at the benchmark profiling, it looks like most of the time is spent in Control.Monad.State.Trans.bind and Text.Parsing.Parser.Combinators.tryRethrow. So this might be the entire problem, but improving this won't be easy.

Return NonEmpty types for combinators requiring at least one match

It would be nice to not have to check for Nil from combinators that require at least one match.

The fact that they require at least one match could be encoded in the type as either NonEmptyList a or NonEmpty List a. (not sure which would be preferable)


Environment

  • PureScript 0.12.5
  • Pulp 12.4.2
  • purescript-parsing 5.0.3

Current behavior

These functions in Text.Parsing.Parser.Combinators currently produce a (possibly empty) List:

-- | Parse phrases delimited by a separator, requiring at least one match.
sepBy1 :: forall m s a sep. Monad m => ParserT s m a -> ParserT s m sep -> ParserT s m (List a)

-- | Parse phrases delimited and optionally terminated by a separator, requiring at least one match.
sepEndBy1 :: forall m s a sep. Monad m => ParserT s m a -> ParserT s m sep -> ParserT s m (List a)

-- | Parse phrases delimited and terminated by a separator, requiring at least one match.
endBy1 :: forall m s a sep. Monad m => ParserT s m a -> ParserT s m sep -> ParserT s m (List a)

As a workaround, I defined a modified/alternate version of sepBy1 locally:

-- | Parse phrases delimited by a separator, requiring at least one match.
sepBy1Nel :: forall m s a sep. Monad m => ParserT s m a -> ParserT s m sep -> ParserT s m (NonEmptyList a)
sepBy1Nel p sep = do
  a <- p
  as <- List.many $ sep *> p
  pure $ NonEmptyList (a :| as)

I can then use NonEmpty.unsnoc and not worry about how to handle the impossible Nothing result from List.unsnoc
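For example (with a hypothetical `ident` parser):

```purescript
-- Comma-separated identifiers, with non-emptiness in the type.
idents :: Parser String (NonEmptyList String)
idents = ident `sepBy1Nel` char ','
```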

Here's a live example: https://github.com/ccap/purescript-ccap-codegen/blob/2c09022f7f37a5494bf58be8f28354d7071c774a/src/Ccap/Codegen/Parser.purs#L89

(admittedly, we're not using NonEmpty as pervasively as we could...)

Expected behavior

When a list is guaranteed to be non-empty, indicate it in the type.

I would be happy to open a PR if it would be welcome. If so, do you have a preference between changing the types of the existing combinators or adding new ones (to maintain backwards compatibility)?

Rounding errors when parsing floats

I was running some generative tests on my code and I ran into an issue with high precision floats. For example:

> x = 0.7531531167929774
> runParser (show x) testTokenParser.float
(Right 0.7531531167929775)

Note the final 5 instead of 4. The problem seems to be in the code that parses the decimal part of a number. Indeed, the following equivalent code gives the same erroneous result:

> digits = [7,5,3,1,5,3,1,1,6,7,9,2,9,7,7,4]
> foldr (\d f -> (f + toNumber d) / 10.0) 0.0 digits
0.7531531167929775

I'm not sure about the best way to modify fraction so as to avoid rounding errors. Maybe using the native parseFloat?
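One possible direction, sketched here as an untested assumption rather than a fix, is to accumulate the digits into a single value and divide once, so rounding only happens at the final division instead of at every step:

```purescript
-- Sketch: one division at the end instead of one per digit.
-- Assumes `pow` from Data.Number (or Math.pow in older package sets)
-- and `foldl`/`length` from Data.Foldable / Data.Array.
fractionValue :: Array Int -> Number
fractionValue ds =
  foldl (\acc d -> acc * 10.0 + toNumber d) 0.0 ds
    / pow 10.0 (toNumber (Array.length ds))
```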

TypeError: 'undefined' is not a function (evaluating 'unParserT(p)(s)')

Apologises for the vague error details, but I'm having trouble digging much deeper than this. The error seems to be arising somewhere in Text.Parsing.Parser

The error message in Safari is this

undefined is not a function (evaluating 'unParserT(p)(s)')

The error message in chrome is this

TypeError: unParserT(...) is not a function

The stack trace seemed to be pointing here, but the only call of unParserT with the parameters p & s mentioned in the Safari error message seems to be here, here, and here.

I'd imagine it might help seeing the code where the error originated from, so in my own code it seems to be appearing somewhere in this function here, which looks like this (see label, I included the surrounding code for context).

  --
  -- internal link             [[hello]]
  -- internal link with text   [[hello|Greetings]]
  -- external link             [[[hello]]]
  -- external link with text   [[[hello|Greetings]]]
  --
  linkAtom :: WikiTextParser WikiAtom
  linkAtom = textLink Internal
         <|> textLink External where

    textLink target = try (openingDelimiter linkDelimiter) *> do
      destination <- word <?> "link destination"
      linkLabel   <- label <?> "text label"
      pure (HyperTextAtom target destination linkLabel) where

      linkDelimiter :: Delimiter
      linkDelimiter = targetToDelimiter target

      --
      -- The problematic function
      --
      label :: WikiTextParser (Maybe.Maybe [WikiAtom])
      label = optionMaybe (try (pipe *> body)) <* closer where
        closer = closingDelimiter linkDelimiter
        body = try (skipSpace (anyText `manyTill` (lookAhead closer)))
        pipe = try (skipSpace (nextType PipeType))

The type of the Parser I'm using is just

type WikiTextParser a = Parser [WikiToken] a

This code is being triggered by this test, which isn't in source code at the moment but it's in this module if that's helpful

it "internal link with label \"[[hello|world]]\"" do
  doc <- wikiText [
    OpeningDelimiter DeLink, Word "hello",
    Pipe, Word "greetings",
    ClosingDelimiter DeLink, EndOfInput]
  doc @?= [Paragraph [
    HyperTextAtom Internal "hello" (Maybe.Just [WordAtom "world"])
  ]]

What it's meant to be doing is parsing tokens for [[hello|world]] (which is a wikitext link, with "display text"), it seems to be choking on the |world part, as it seems to parse [[hello]] (a wikitext link without "display text") just fine (according to the tests).

Not sure if the error is on the library side or my own, I'm just using the standard parser type, without defining any unsafe code in that library. But if it is an issue other people are having hopefully there's enough information here to show where the issue is.

Implementation of `string` is not ideal

string :: forall s m. StringLike s => Monad m => String -> ParserT s m String
string str = do
  input <- gets \(ParseState input _ _) -> input
  case indexOf (wrap str) input of
    Just 0 -> do
      modify_ \(ParseState _ position _) ->
        ParseState (drop (length str) input)
          (updatePosString position str)
          true
      pure str
    _ -> fail ("Expected " <> show str)

This will perform a linear scan to find a match anywhere, before checking if that match is at index 0. It should probably just take whatever the length of the desired token is and check for eq.

Maximum call stack size exceeded

Hi,

I was wondering if anyone has an idea why large strings causes stack limit exceptions. While working on my bbcode parser which uses purescript-parsing, I ran into the issue that big forum posts cause exceptions. The strings in these forum posts are not continuous, such as in these tests I wrote. So it could be an entirely different issue in those specific cases. However, here is some code that causes stack limit exceptions via large strings:

https://github.com/adarqui/cardiac-arrest/blob/master/experiments/parsing/big-strings/purescript/purescript-parsing

more specifically:
https://github.com/adarqui/cardiac-arrest/blob/master/experiments/parsing/big-strings/purescript/purescript-parsing/test/Main.purs

Here is the example output from those tests:

* Running tests...
→ Suite: test big strings
  → Testing string of size: 1024
→ Testing string of size: 2048
→ Testing string of size: 3072
☠ Failed: simple strings because Maximum call stack size exceeded
→ Testing string of size: 1024
→ Testing string of size: 2048
→ Testing string of size: 3072

1 test failed:

In "test big strings":
  In "simple strings":
    Maximum call stack size exceeded

→ Testing string of size: 1024
→ Testing string of size: 2048
→ Testing string of size: 3072

So somewhere around ~3072 bytes, purescript-parsing throws a stack limit exception.

Thanks!

Library needs a quick start in the README

Is your change request related to a problem? Please describe.
As described in the documentation section of the Library Guidelines, Contributors libraries are expected to have in their README a short summary of the library's purpose, installation instructions, a quick start with a minimal usage example, and links to the documentation and contributing guide.

This library currently doesn't have a completed quick start in the README.

Describe the solution you'd like
The library needs a quick start section after the installation instructions. machines is a good example of a library with a quick start.

Additional context
See the Governance repository for more information about requirements in the Contributors organization.

`try` is broken

https://github.com/purescript-contrib/purescript-parsing/blob/master/src/Text/Parsing/Parser/Combinators.purs#L71-L75 According to this, when a parser fails, it is not restoring the initial input and position, but instead always propagates it from the inner parser.

I think this is largely masked for most String parsers because satisfy uses try https://github.com/purescript-contrib/purescript-parsing/blob/master/src/Text/Parsing/Parser/String.purs#L65 when it shouldn't. If backtracking is opt-in, none of the combinators should be using it, right?

If you look at the very first test https://github.com/purescript-contrib/purescript-parsing/blob/master/test/Main.purs#L417-L420 the expected result is wrong. many can match 0 items, so this example should always pass with Nil with o left in the input.

match combinator

We should have the super-useful match combinator.

To write this combinator for Text.Parsing.Parser.String, I think we will need some more members of StringLike.

class StringLike s where
  drop :: Int -> s -> s
  indexOf :: Pattern -> s -> Maybe Int
  null :: s -> Boolean
  uncons :: s -> Maybe { head :: Char, tail :: s }

Maybe if we have

length :: s -> Int
length = Data.String.length

take :: Int -> s -> s
take = Data.String.take

Then we can write

match :: forall m s a.  ParserT s m a -> ParserT s m (Tuple a s)
match p = do
  ParseState input1 _ _ <- get
  x <- p
  ParseState input2 _ _ <- get
  pure $ Tuple x $ take (length input1 - length input2) input1
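A hypothetical use, assuming some `number` parser producing an Int:

```purescript
-- Return the parsed value together with the exact slice of input
-- that the parser consumed.
numberWithSource :: Parser String (Tuple Int String)
numberWithSource = match number
```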

Generalize StringLike?

Currently StringLike is implemented by String, and it looks like it was intended to only allow structures which contain some Chars.

But I think it could easily be changed to support arbitrary containers of arbitrary data.
For example, one might define some sum type data Color = Red | Blue | Orange ... and want to parse a List Color into some other structure, or maybe even into itself, and just validate that the colors in the List are in the correct order or whatever.

But StringLike is limiting this kind of use.
What I think is that the class could be changed to:
What i think is that the class could be changed to:

class StringLike f where
  drop :: forall a. Int -> f a -> f a
  indexOf :: forall a. f a -> f a -> Maybe Int
  null :: forall a. f a -> Boolean
  uncons :: forall a. f a -> Maybe { head :: a, tail :: f a }

This way one could implement StringLike for List/Array ... and for example anyChar will change to

anyChar :: forall f s m. StringLike f => Monad m => ParserT (f s) m s

add oneOfAs

oneOfAs :: forall s m a. (StringLike s, Monad m) => Array (Tuple Char a) -> ParserT s m a

(I haven't seen such a function, so I came up with this name; it could be changed.)

This will allow to change something like this:

digit ∷ ∀ m. Monad m ⇒ P.ParserT String m Int
digit = do
  char ← PS.oneOf ['0','1','2','3','4','5','6','7','8','9']
  case char of
    '0' → pure 0
    '1' → pure 1
    '2' → pure 2
    '3' → pure 3
    '4' → pure 4
    '5' → pure 5
    '6' → pure 6
    '7' → pure 7
    '8' → pure 8
    '9' → pure 9
    _ → P.fail "Incorrect digit, impossible situation"

to

digit ∷ ∀ m. Monad m ⇒ P.ParserT String m Int
digit = PS.oneOfAs $
  [ Tuple '0' 0
  , Tuple '1' 1
  , Tuple '2' 2
  , Tuple '3' 3
  , Tuple '4' 4
  , Tuple '5' 5
  , Tuple '6' 6
  , Tuple '7' 7
  , Tuple '8' 8
  , Tuple '9' 9]
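One possible implementation sketch (untested; assumes `choice` from the combinators module and `char` from the String module):

```purescript
-- Try each (character, result) pair in order; the first character
-- that matches yields its associated value.
oneOfAs :: forall s m a. StringLike s => Monad m => Array (Tuple Char a) -> ParserT s m a
oneOfAs table = choice (map (\(Tuple c a) -> char c $> a) table)
```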

`satisfy` and `when` return incorrect error locations

Both of these consume a token, and then throw an error when the predicate fails. This means that the error location will always point to the next token, and not the token to which we apply the predicate. Perhaps try needs to reannotate the error location?

Documentation directory needs a tutorial

Is your change request related to a problem? Please describe.
As described in the documentation section of the Library Guidelines, Contributors libraries are expected to have some documentation in the docs directory -- specifically, at least a short tutorial that expands on the quick start in the README.

This library currently doesn't have comprehensive documentation in the docs directory.

Describe the solution you'd like
At least a short tutorial needs to be added to the docs directory, or other documentation as described in this Divio article.

The argonaut-codecs docs directory has a good example of expanded documentation for a Contributor library. But it would even be useful to add something considerably smaller and shorter to this library.

Additional context
See the Governance repository for more information about requirements in the Contributors organization.

empty string parser consumed?

This applies to the previous implementation as well, but if passed the empty string won't this mark the input as consumed without actually consuming anything?

This is not feedback, just a discussion point.

Originally posted by @robertdp in #119 (comment)

lazy errors

We should probably change all error messages to be lazy like this with defer.

I chose to make error messages deferred to avoid the cost of generating the error messages until they're needed. Feels like a small thing, but the waste just bothers me too much.

Originally posted by @fsoikin in #127 (comment)

Problem parsing escaped characters.

I'm trying to parse some tab-separated integers, and I think I've run into a bug. To give a simplified example, if I run these three expressions:

import Text.Parsing.Parser.Language (haskell)

x = runParser "123-456" (tuple3 <$> haskell.integer <*> char '-' <*> haskell.integer)

y = runParser "123\t456" (tuple3 <$> haskell.integer <*> char '\t' <*> haskell.integer)

z = runParser "123\n456" (tuple3 <$> haskell.integer <*> char '\n' <*> haskell.integer)

The first one succeeds, but the second two both fail.

Am I misunderstanding something about how escape-characters are parsed, or have I found a bug?

(For comparison, similar code run against purescript-string-parsers works for me just fine...)
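For what it's worth, a plausible (unconfirmed) explanation: haskell.integer is a lexeme parser, and lexeme parsers skip trailing whitespace, which includes '\t' and '\n', so the separator is already gone by the time char runs. A sketch of a non-lexeme digit parser that sidesteps this:

```purescript
import Data.Array (some)
import Data.String.CodeUnits (fromCharArray)
import Text.Parsing.Parser (Parser)
import Text.Parsing.Parser.Token (digit)

-- Parse a run of digits without skipping the whitespace after them,
-- so a following `char '\t'` still sees the tab:
digits :: Parser String String
digits = fromCharArray <$> some digit
```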

getting `Expected Expected` in errors

If you run:
> runParser "0" (sequence [P.char '0', P.char '0'])

You will get:
(Left (ParseError "Expected Expected '0'" (Position { line: 1, column: 2 })))

I didn't investigate this hard enough, so I don't know the reason yet.

Float parser does not parse negative numbers

Describe the bug
Token parser created by (makeTokenParser emptyDef).float does not parse negative numbers.

To Reproduce
In repl, import Text.Parsing.Parser, Text.Parsing.Parser.Language and Text.Parsing.Parser.Token.
Evaluate runParser "(-6.0)" (makeTokenParser emptyDef).float (parentheses don't matter).

Expected behavior
Right (-6.0)

Actual behavior
(Left (ParseError "Expected float" (Position { line: 1, column: 1 })))

Additional context
I think the parser should be able to parse negative numbers; the integer parser already does. Perhaps add a sign-aware parser?

Implement `Compactable` and `Filterable` for `ParserT`?

Yesterday I noticed that a Filterable instance could be useful for ParserT.

Let's say you want to sometimes discard a parse based on some condition or predicate. For example, suppose you have parseCSV :: Parser String (List (List String)), which parses CSV input into a list of lines (each line being a list of strings, one per value between commas or newlines). The first row of a CSV is usually just the column names, so say I want to remove that line using tail :: forall a. List a -> Maybe (List a). I could then use filterMap :: forall a b f. Filterable f => (a -> Maybe b) -> f a -> f b to remove it like this: filterMap tail parseCSV. The result would be a parser that ignores the first line of the CSV and only yields the rest. It would also fail if the CSV input has only one line. Additionally, a custom error message can be provided like this (I think): filterMap tail parseCSV <?> "CSV does not contain any data, only column names!"

The implementation would be similar to the Filterable instance for Maybe and Either, in the sense that it's only got one item to filter (as opposed to a list).

I could write up a PR for this, just wanted to hear some opinions first.
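To make the proposal concrete, a sketch of what the filterMap part could look like as a standalone function (the error message and the standalone name are assumptions; the other Filterable members are omitted):

```purescript
import Data.Maybe (Maybe(..))
import Text.Parsing.Parser (ParserT, fail)

-- Keep the result when `f` returns Just, otherwise fail the parse:
filterMapParser
  :: forall s m a b
   . Monad m
  => (a -> Maybe b)
  -> ParserT s m a
  -> ParserT s m b
filterMapParser f p = p >>= \a -> case f a of
  Just b -> pure b
  Nothing -> fail "filtered out"
```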

Natural transformation is too strong a constraint for hoistParserT?

If you remove the type declaration from hoistParserT, the compiler will infer this type:

hoistParserT :: ∀ b n s a m.
  (   m (Tuple (Either ParseError a) (ParseState s))
   -> n (Tuple (Either ParseError b) (ParseState s))
  )
   -> ParserT s m a -> ParserT s n b

This is much more useful than (m ~> n) -> ParserT s m a -> ParserT s n a.
For example, if you want a function like this:
todo :: ∀ s m x. x -> ParserT s (State x) Unit -> ParserT s m x
you won't find anything that fits except hoistParserT, but you can't push the x out of State. If the type were weakened, it would be possible to write this:

todo :: ∀ s m x. x -> ParserT s (State x) Unit -> ParserT s m x
todo i p = hoistParserT unState p
  where
    unState :: ∀ z y m. Applicative m => State x (Tuple (Either y Unit) z) -> m (Tuple (Either y x) z)
    unState s = case runState s i of
      Tuple (Tuple e state) res -> pure (Tuple (e $> res) state)

So I'm proposing weakening the type signature, as otherwise one would need to copy hoistParserT and change the signature, which I think is not desirable.


If the reason for using a natural transformation is to not expose the internal structure of ParserT, then we can massage it a bit:

hoistParserT :: ∀ b n s a m.
  ( ∀ e s'. m (Tuple (Either e a) s')
   -> n (Tuple (Either e b) s')
  )
   -> ParserT s m a -> ParserT s n b

or something like this:

-- name is arbitrary
newtype Internal s a = Internal (Tuple (Either ParseError a) (ParseState s))
-- derive a Functor instance

hoistParserT :: ∀ s m a n b. (m (Internal s a) -> n (Internal s b)) -> ParserT s m a -> ParserT s n b
-- or don't expose Internal at all and change the type to:

hoistParserT :: ∀ s m a n b f. Functor f => (m (f a) -> n (f b)) -> ParserT s m a -> ParserT s n b
hoistParserT f (ParserT m) = ParserT (mapExceptT (mapStateT g) m)
  where
  g = (map Internal) >>> f >>> (map unInternal)

unInternal (Internal x) = x

-- this way the `todo` implementation will look like this
todo :: ∀ s m x. x -> ParserT s (State x) Unit -> ParserT s m x
todo i p = hoistParserT unState p
  where
    unState :: ∀ f m. Functor f => Applicative m => State x (f Unit) -> m (f x)
    unState s = case runState s i of
      -- I like this approach, as this line is much simpler than in the previous implementation
      Tuple f res -> pure (f $> res)

If hoist is not suitable for this new type, then maybe call it something else? For example, mapParserT, and change hoistParserT:

+mapParserT f (ParserT m) = ParserT (mapExceptT (mapStateT f) m)
hoistParserT :: (m ~> n) -> ParserT s m ~> ParserT s n
-hoistParserT f (ParserT m) = ParserT (mapExceptT (mapStateT f) m)
+hoistParserT = mapParserT

v7 proposals

Here are some things I would like to see in v7 of this package.

The target design space for this package should be similar to MegaParsec: intended for users who prefer correctness and feature-completeness to speed. Anyone who wants speed in a V8 runtime environment will use the built-in Regex.

Text.Parsing.Parser

-- | Contains the remaining input and current position.
data ParseState s = ParseState s Position Boolean

Change the definition of ParseState so that we can have cursor-based state in parsers, and so that line-column state is optional.

Tracking the newline-based line and column position is an important feature, but it’s expensive and rarely used. I would like to make it optional.

  1. I'd like to switch to a cursor-based state for String parsers, instead of a state which tracks “the remaining input”.

Do we need the Boolean “consumed flag” in the ParseState? As far as I can tell it is set but never tested; does anything actually care what its value is?

  1. Make the Position zero-based. #94

The proposed definition, with position tracking moved into a user-chosen state parameter:

data ParseState s state = ParseState s state

Text.Parsing.Parser.Combinators

  1. Add combinators manyTill, many1Till_ #108

Text.Parsing.Parser.String

  1. UTF-16 correctness. We should always handle UTF-16 surrogate pairs correctly, and that means always treating tokens as CodePoint instead of CodeUnit. #109
  2. Delete the StringLike typeclass. Has anyone ever created an instance of this class for a type other than String?
  3. Add combinator match #107

Text.Parsing.Parser.DataView

  1. Add DataView parsing to this package? rowtype-yoga/purescript-parsing-dataview#10

Module names

  1. Remove the Text. prefix from all module names.

Text.Parsing.Parser.ArrayBuffer

I want to parse the raw Uint8 bytes of a JavaScript ArrayBuffer. You know, like the ByteString instance for Megaparsec Streams, or Data.Attoparsec.ByteString.

Is there a good way to do this in PureScript? I can't find one. If it doesn't exist, maybe we could add a new module to purescript-parsing called Text.Parsing.Parser.ArrayBuffer? It seems like purescript-parsing was designed to be extended in this way.

This module would be a peer of Text.Parsing.Parser.String and Text.Parsing.Parser.Token, and similarly constructed.

Or maybe this would be better as a separate library, since it would add some new dependencies:

https://pursuit.purescript.org/packages/purescript-arraybuffer
https://pursuit.purescript.org/packages/purescript-arraybuffer-types

Thoughts?
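To make the idea concrete, a type-level sketch of what such a module's primitives might look like, mirroring the String primitives (every name and signature below is hypothetical, including the MonadEffect constraint for DataView reads):

```purescript
module Text.Parsing.Parser.ArrayBuffer where

-- Match any single byte, analogous to `anyChar` for String input:
anyUint8 :: forall m. MonadEffect m => ParserT DataView m UInt

-- Match one exact byte, analogous to `char`:
uint8 :: forall m. MonadEffect m => UInt -> ParserT DataView m UInt
```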
