fastparse's Introduction

FastParse

Join the chat at https://gitter.im/lihaoyi/Ammonite

This is where the code for the FastParse parsing library lives! If you want to use FastParse, you will probably want to check out the documentation:

If you use FastParse and like it, you will probably enjoy the following book by the Author:

Hands-on Scala uses FastParse extensively throughout the book, with the entirety of Chapter 19: Parsing Structured Text dedicated to the library and Chapter 20: Implementing a Programming Language making heavy use of it. Hands-on Scala is a great way to level up your skills in Scala in general and FastParse in particular.

For a good hands-on tutorial working through the basics of how to use this library, check out the following blog post:

This readme also contains some developer docs, in case you intend to work on the fastparse repo rather than just use it as a library.

Developer Docs

The core of FastParse lives in the fastparse/ folder. It is a cross-built Scala-JVM/Scala.js codebase, with almost everything shared between the two platforms in fastparse/src/ and minor differences in fastparse/src-js/ and fastparse/src-jvm/.

The three subprojects scalaparse/, pythonparse/ and cssparse/ are FastParse parsers for those respective languages. They are all usable as standalone libraries, and also serve as extensive test suites and use cases for FastParse itself. Each of these projects clones and parses large quantities of code from GitHub as part of its own test suite.

perftests/ contains performance tests for the main projects in the library, including ScalaParse, PythonParse and CssParse. readme/ contains the documentation site, which includes several live demos of FastParse parsers compiled to Scala.js. These all live in demo/.

Common Commands

Note: you should use mill 0.11 or later.

  • mill -w "fastparse.jvm[2.12.10].test" runs the main testsuite. If you're hacking on FastParse, this is often where you want to go

  • You can run the other suites via fastparse.js, scalaparse.jvm, etc. if you wish, but I typically don't and leave that to CI unless I'm actively working on the sub-project

  • You can use mill -w "fastparse.jvm[_].test" to run it under different Scala versions, but again I usually don't bother

  • mill __.test.test is the aggregate test-all command, but is pretty slow. You can use mill "__.jvm[2.12.17].test" to run all tests only under JVM/Scala-2.12, which is much faster and catches most issues

  • mill demo.fullOpt && sbt readme/run builds the documentation site, which can then be found at readme/target/scalatex/index.html

Contribution Guidelines

  • If you're not sure if something is a bug or not, ask on Gitter first =)
  • All code PRs should come with: a meaningful description, inline comments for important things, unit tests, and a green build
  • Non-trivial changes, including bug fixes, should appear in the changelog. Feel free to add your name and link to your github profile!
  • New features should be added to the relevant parts of the documentation
  • To a large extent, FastParse is designed so that you can extend it in your own code without needing to modify the core. If you want to add features, be prepared to argue why it should be built-in and not just part of your own code.
  • It's entirely possible your changes won't be merged, or will get ripped out later. This is also the case for my changes, as the Author!
  • Even a rejected/reverted PR is valuable! It helps explore the solution space, and know what works and what doesn't. For every line in the repo, at least three lines were tried, committed, and reverted/refactored, and more than 10 were tried without committing.
  • Feel free to send Proof-Of-Concept PRs that you don't intend to get merged.
  • No binary or source compatibility is guaranteed between any releases. FastParse is still in the 0.x.y phase of development, which means it's still under rapid development and things do change. On the other hand, upgrading is usually trivial, and I don't expect existing functionality to go away

License

The MIT License (MIT)

Copyright (c) 2014 Li Haoyi ([email protected])

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

fastparse's People

Contributors

aborg0, alain-bearez, alexarchambault, byf, chengpohi, dwijnand, gitter-badger, ichoran, jeremydhoon, lefou, lihaoyi, lolgab, martinsenne, martinweindel, mgzuber, olafurpg, reid-spencer, rklaehn, robmwalsh, rspier, runtologist, russwyte, sethtisue, stanch, taisukeoe, thesamet, triggernz, ttsymlov, vovapolu, xieyuheng

fastparse's Issues

compiling StringIn takes too long

90 seconds to compile StringIn with a 17 character string is way too long.

println("string length,time to compile StringIn (ms)")

for (subset <- "aaaaaaaaaaaaaaaaa".tails.toSeq.reverse) {
  val start = System.currentTimeMillis()
  StringIn(subset)
  val end = System.currentTimeMillis()
  println(s"${subset.size},${end - start}")
}

Output:

string length,time to compile StringIn (ms)
0,16
1,5
2,1
3,2
4,4
5,10
6,16
7,17
8,23
9,52
10,79
11,155
12,438
13,1274
14,3264
15,10069
16,29726
17,90797

monadic Parsed

It would be very useful to have some monadic functions to run on the parsed result, much as we do with Try: things like map, recover, orElse, getOrElse, etc.
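
To make the request concrete, here is a minimal sketch of what such combinators could look like. The Res, Ok and Err types below are hypothetical stand-ins written for this illustration; they are not FastParse's own result classes, and the method names are simply borrowed from Try.

```scala
object MonadicResult {
  // Hypothetical stand-in for a parse result; NOT FastParse's actual type.
  sealed trait Res[+T] {
    // Transform the parsed value, leaving failures untouched.
    def map[U](f: T => U): Res[U] = this match {
      case Ok(v, i) => Ok(f(v), i)
      case e: Err   => e
    }

    // Extract the value, or fall back to a default on failure.
    def getOrElse[U >: T](default: => U): U = this match {
      case Ok(v, _) => v
      case _: Err   => default
    }

    // Keep a success, or try an alternative result on failure.
    def orElse[U >: T](alt: => Res[U]): Res[U] = this match {
      case Ok(_, _) => this
      case _: Err   => alt
    }

    // Turn selected failures back into successes.
    def recover[U >: T](f: PartialFunction[Err, U]): Res[U] = this match {
      case e: Err if f.isDefinedAt(e) => Ok(f(e), e.index)
      case other                      => other
    }
  }

  final case class Ok[+T](value: T, index: Int) extends Res[T]
  final case class Err(message: String, index: Int) extends Res[Nothing]
}
```

With these definitions, MonadicResult.Ok("42", 2).map(_.toInt).getOrElse(0) evaluates to 42, while the same chain starting from an Err falls through to the default.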

Include parsers for standard (Java) types

I often have to parse primitive Java/Scala types (Int, Boolean, Double), and while for some types, the parser is written quickly, the parser for floating point numbers is particularly complicated.

One great thing about the Scala stdlib parser library was its inclusion of JavaTokenParsers, I think.

I suggest adding parsers for just the primitive Java/Scala types. If you want to include them, I'm happy to supply a pull request. Maybe they could reside in a package object under fastparse.parsers.javatokens. I'm not sure whether to supply only the parsers, or also the mapping to the number types.

The needed parsers would be:

  • integral: Byte, Int, Long
  • decimal: Float, Double
  • boolean: Boolean

Parser doesn't generate any output

I have this code:

import fastparse._
object playGround extends App{

  someParsing()

  def someParsing() = {
    val ID =
      P(!CharIn('0' to '9') ~ (
          CharIn('0' to '9').rep ~
          CharIn('a' to 'z').rep ~
          CharIn('A' to 'Z').rep ~
          "-".rep ~ "_".rep).rep ~ " "
      )

    ID.parse("a ")
  }
}

and when I run it in SBT, it just hangs and doesn't return anything!
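
One plausible cause, which the report itself does not confirm: every element inside the outer .rep is itself a .rep that may match nothing, so once the parser reaches the space the inner sequence keeps succeeding without consuming input and the outer repetition never ends. The sketch below is plain Scala, independent of FastParse; Step, lowers and rep are hypothetical names for this illustration only. It shows a repetition loop that stops as soon as an iteration makes no progress.

```scala
object RepGuard {
  // A toy "parser": given the input and a start offset, returns the new offset on success.
  type Step = (String, Int) => Option[Int]

  // Matches zero or more lowercase letters; always succeeds, possibly consuming nothing.
  val lowers: Step = (s, i) => Some(i + s.drop(i).takeWhile(_.isLower).length)

  // Repeats `p` until it fails or succeeds without consuming any input.
  def rep(p: Step): Step = (s, i) => {
    var pos = i
    var going = true
    while (going) {
      p(s, pos) match {
        case Some(next) if next > pos => pos = next      // progress was made, keep going
        case _                        => going = false   // failure or zero-width match: stop
      }
    }
    Some(pos)
  }
}
```

Without the next > pos guard, rep(lowers) would spin forever on "ab " once it reaches the space, which matches the symptom described above. In the reported grammar, requiring at least one character per iteration would have the same effect.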

CharIn whitespace problem

I found that overriding WhitespaceApi also affects CharIn's behavior. Is this a bug?

import fastparse.WhitespaceApi
import fastparse.noApi._

val White = WhitespaceApi.Wrapper{
  import fastparse.all._
  NoTrace(" ".rep)
}
import White._

val x = P("abc" ~ CharIn("def").rep.!)
x.parse("abcdd d")

// result: res0: fastparse.core.Result[String] = Success(dd d,7)

scalaparseJVM/test fails on master with OutOfMemoryError

noticed in the context of the Scala community build:

[fastparse] Checking Dir target/repos/scala
[fastparse:error] <console>:2: warning: Detected apparent refinement of Unit; are you missing an '=' sign?
[fastparse:error]     def f1(a: T): Unit { }
[fastparse:error]                        ^
[fastparse:error] java.lang.OutOfMemoryError: GC overhead limit exceeded
[fastparse:error]   at java.util.zip.ZipFile.getInflater(ZipFile.java:455)
[fastparse:error]   at java.util.zip.ZipFile.getInputStream(ZipFile.java:374)
[fastparse:error]   at java.util.jar.JarFile.getInputStream(JarFile.java:447)
[fastparse:error]   at sun.misc.URLClassPath$JarLoader$2.getInputStream(URLClassPath.java:940)
[fastparse:error]   at sun.misc.Resource.cachedInputStream(Resource.java:77)
[fastparse:error]   at sun.misc.Resource.getByteBuffer(Resource.java:160)
[fastparse:error]   at java.net.URLClassLoader.defineClass(URLClassLoader.java:454)
[fastparse:error]   at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
[fastparse:error]   at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
[fastparse:error]   at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
[fastparse:error]   at java.security.AccessController.doPrivileged(Native Method)
[fastparse:error]   at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
[fastparse:error]   at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
[fastparse:error]   at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
[fastparse:error]   at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
[fastparse:error]   at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
[fastparse:error]   at scala.concurrent.impl.ExecutionContextImpl$AdaptedForkJoinTask.exec(ExecutionContextImpl.scala:121)
[fastparse:error]   at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
[fastparse:error]   at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
[fastparse:error]   at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
[fastparse:error]   at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
[fastparse:error] java.lang.OutOfMemoryError: GC overhead limit exceeded
[fastparse:error] java.lang.OutOfMemoryError: GC overhead limit exceeded
[fastparse:error] java.lang.OutOfMemoryError: GC overhead limit exceeded

Should I just crank up the heap size, or do you think there's a real regression here that you want to investigate?

Parser.rep does not handle max=0 correctly.

Description and Reproduction

When using .rep with min = 0 and max = 0

println( P("  ".rep(min=0, max=0) ~ End).parse("  "))
println( P("  ".rep(min=1, max=1) ~ End).parse("    "))

the result is

Success((),2)
Failure(End:1:3 ..."  ")

The second parser and its result are correct.
The first parser, with min=0 and max=0, should not return a Success result, because

  • max is not fulfilled (with a successful parse, the "determined" number of repetitions is 1, which is larger than a max of 0)

Proof (of consistency)

println( P("" ~ End).parse("  ")) 

delivers

Failure(End:1:1 ..."  ")

as expected.

Expected result after fix

  • First parser should return a Failure.
  • No input is consumed

Support for completion

Hi there, how easy would it be to adapt the library to support tab completion for e.g. a REPL grammar, similarly to what is supported by the sbt parsers? Thanks

Logged indexes seem to be off

haoyi-haoyi@ object Foo{
               import fastparse.all._
               val plus = P( "+" )
               val num = P( CharIn('0' to '9').rep(1) ).!.map(_.toInt)
               val side = P( "(" ~! expr ~! ")" | num ).log()
               val expr: P[Int] = P( side ~ plus ~ side ).map{case (l, r) => l + r}.log()
             }
haoyi-haoyi@ Foo.expr.parse("(1+(2+3x))+4").asInstanceOf[fastparse.core.Result.Failure].index
+expr:0
  +side:0
    +expr:1
      +side:1
      -side:1:Success(2)
      +side:3
        +expr:4
          +side:4
          -side:4:Success(5)
          +side:6
          -side:6:Success(7)
        -expr:4:Success(7)
      -side:3:Failure(side:3 / ")":3 ..."(2+3x))+4", cut)
    -expr:1:Failure(expr:1 / side:3 / ")":1 ..."1+(2+3x))+", cut)
  -side:0:Failure(side:0 / expr:1 / side:3 / ")":0 ..."(1+(2+3x))", cut)
-expr:0:Failure(expr:0 / side:0 / expr:1 / side:3 / ")":0 ..."(1+(2+3x))", cut)
res76: Int = 7

This should probably be ")":7 rather than ")":0. The final index seems right, so something is funky in the logging

true stream parsing

Hi @vovapolu . I heard you are hacking on FastParse this summer. One thing that is similar to the changes you are doing would be support for true stream parsing. I actually opened #39 for this last year. It worked, but we never investigated the performance implications. There may be some refinements for speedup needed to avoid the control flow via exceptions. Maybe you want to take a crack at it :)?

One concrete use case I have personally would be improving Scala error messages with a streaming error message parser. Combining https://github.com/cvogt/cbt/ with https://github.com/cvogt/scalac-cosmetics/

Infinite loop while constructing ParseError

I have a problem where I get stuck in an infinite loop while constructing ParseError. The actual parsing completes and results in an error, but when I then try to construct ParseError it gets stuck in a loop while building the error trace. I'm using FastParse version 0.3.4.

It looks like it's getting stuck here. The stack looks like this while it's stuck:

      at scala.collection.generic.Growable$class.loop$1(Growable.scala:54)
      at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:57)
      at scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:183)
      at scala.collection.immutable.List.$colon$colon$colon(List.scala:128)
      at fastparse.parsers.Combinators$Either.rec$4(Combinators.scala:444)
      at fastparse.parsers.Combinators$Either.parseRec(Combinators.scala:447)
      at fastparse.parsers.Combinators$Rule.res$lzycompute$1(Combinators.scala:142)
      - locked <0x961> (a fastparse.parsers.Combinators$Rule)
      at fastparse.parsers.Combinators$Rule.fastparse$parsers$Combinators$Rule$$res$1(Combinators.scala:142)
      at fastparse.parsers.Combinators$Rule.parseRec(Combinators.scala:147)
      at fastparse.parsers.Combinators$Either.rec$4(Combinators.scala:439)
      at fastparse.parsers.Combinators$Either.parseRec(Combinators.scala:447)
      at fastparse.parsers.Combinators$Capturing.parseRec(Combinators.scala:22)
      at fastparse.parsers.Combinators$Rule.res$lzycompute$1(Combinators.scala:142)
      - locked <0x964> (a fastparse.parsers.Combinators$Rule)
      at fastparse.parsers.Combinators$Rule.fastparse$parsers$Combinators$Rule$$res$1(Combinators.scala:142)
      at fastparse.parsers.Combinators$Rule.parseRec(Combinators.scala:147)
      at fastparse.parsers.Combinators$Either.rec$4(Combinators.scala:439)
      at fastparse.parsers.Combinators$Either.parseRec(Combinators.scala:447)
      at fastparse.parsers.Combinators$Rule.res$lzycompute$1(Combinators.scala:142)
      at fastparse.parsers.Combinators$Rule.fastparse$parsers$Combinators$Rule$$res$1(Combinators.scala:142)
      at fastparse.parsers.Combinators$Rule.parseRec(Combinators.scala:147)
      at fastparse.parsers.Combinators$Repeat.rec$3(Combinators.scala:373)
      at fastparse.parsers.Combinators$Repeat.parseRec(Combinators.scala:409)
      at fastparse.parsers.Combinators$Capturing.parseRec(Combinators.scala:22)
      at fastparse.parsers.Combinators$Rule.res$lzycompute$1(Combinators.scala:142)
      at fastparse.parsers.Combinators$Rule.fastparse$parsers$Combinators$Rule$$res$1(Combinators.scala:142)
      at fastparse.parsers.Combinators$Rule.parseRec(Combinators.scala:147)
      at fastparse.WhitespaceApi$CustomSequence.parseRec(WhitespaceApi.scala:32)
      at fastparse.WhitespaceApi$CustomSequence.parseRec(WhitespaceApi.scala:26)
      at fastparse.parsers.Combinators$Rule.res$lzycompute$1(Combinators.scala:142)
      at fastparse.parsers.Combinators$Rule.fastparse$parsers$Combinators$Rule$$res$1(Combinators.scala:142)
      at fastparse.parsers.Combinators$Rule.parseRec(Combinators.scala:147)
      at fastparse.parsers.Combinators$Sequence$Flat.rec$1(Combinators.scala:247)
      at fastparse.parsers.Combinators$Sequence$Flat.parseRec(Combinators.scala:268)
      at fastparse.parsers.Combinators$Either.rec$4(Combinators.scala:439)
      at fastparse.parsers.Combinators$Either.parseRec(Combinators.scala:447)
      at fastparse.parsers.Combinators$Capturing.parseRec(Combinators.scala:22)
      at fastparse.parsers.Combinators$Rule.res$lzycompute$1(Combinators.scala:142)
      at fastparse.parsers.Combinators$Rule.fastparse$parsers$Combinators$Rule$$res$1(Combinators.scala:142)
      at fastparse.parsers.Combinators$Rule.parseRec(Combinators.scala:147)
      at fastparse.WhitespaceApi$CustomSequence.parseRec(WhitespaceApi.scala:32)
      at fastparse.WhitespaceApi$CustomSequence.parseRec(WhitespaceApi.scala:26)
      at fastparse.WhitespaceApi$CustomSequence.parseRec(WhitespaceApi.scala:26)
      at fastparse.WhitespaceApi$CustomSequence.parseRec(WhitespaceApi.scala:26)
      at fastparse.parsers.Combinators$Rule.res$lzycompute$1(Combinators.scala:142)
      at fastparse.parsers.Combinators$Rule.fastparse$parsers$Combinators$Rule$$res$1(Combinators.scala:142)
      at fastparse.parsers.Combinators$Rule.parseRec(Combinators.scala:147)
      at fastparse.parsers.Combinators$Either.rec$4(Combinators.scala:439)
      at fastparse.parsers.Combinators$Either.parseRec(Combinators.scala:447)
      at fastparse.parsers.Combinators$Rule.res$lzycompute$1(Combinators.scala:142)
      at fastparse.parsers.Combinators$Rule.fastparse$parsers$Combinators$Rule$$res$1(Combinators.scala:142)
      at fastparse.parsers.Combinators$Rule.parseRec(Combinators.scala:147)
      at fastparse.parsers.Combinators$Repeat.rec$3(Combinators.scala:373)
      at fastparse.parsers.Combinators$Repeat.parseRec(Combinators.scala:409)
      at fastparse.parsers.Combinators$Capturing.parseRec(Combinators.scala:22)
      at fastparse.parsers.Combinators$Rule.res$lzycompute$1(Combinators.scala:142)
      at fastparse.parsers.Combinators$Rule.fastparse$parsers$Combinators$Rule$$res$1(Combinators.scala:142)
      at fastparse.parsers.Combinators$Rule.parseRec(Combinators.scala:147)
      at fastparse.WhitespaceApi$CustomSequence.parseRec(WhitespaceApi.scala:32)
      at fastparse.WhitespaceApi$CustomSequence.parseRec(WhitespaceApi.scala:26)
      at fastparse.parsers.Combinators$Rule.res$lzycompute$1(Combinators.scala:142)
      at fastparse.parsers.Combinators$Rule.fastparse$parsers$Combinators$Rule$$res$1(Combinators.scala:142)
      at fastparse.parsers.Combinators$Rule.parseRec(Combinators.scala:147)
      at fastparse.parsers.Combinators$Either.rec$4(Combinators.scala:439)
      at fastparse.parsers.Combinators$Either.parseRec(Combinators.scala:447)
      at fastparse.parsers.Combinators$Rule.res$lzycompute$1(Combinators.scala:142)
      at fastparse.parsers.Combinators$Rule.fastparse$parsers$Combinators$Rule$$res$1(Combinators.scala:142)
      at fastparse.parsers.Combinators$Rule.parseRec(Combinators.scala:147)
      at fastparse.parsers.Combinators$Repeat.rec$3(Combinators.scala:373)
      at fastparse.parsers.Combinators$Repeat.parseRec(Combinators.scala:409)
      at fastparse.parsers.Combinators$Capturing.parseRec(Combinators.scala:22)
      at fastparse.parsers.Combinators$Rule.res$lzycompute$1(Combinators.scala:142)
      at fastparse.parsers.Combinators$Rule.fastparse$parsers$Combinators$Rule$$res$1(Combinators.scala:142)
      at fastparse.parsers.Combinators$Rule.parseRec(Combinators.scala:147)
      at fastparse.WhitespaceApi$CustomSequence.parseRec(WhitespaceApi.scala:32)
      at fastparse.WhitespaceApi$CustomSequence.parseRec(WhitespaceApi.scala:26)
      at fastparse.parsers.Combinators$Rule.res$lzycompute$1(Combinators.scala:142)
      at fastparse.parsers.Combinators$Rule.fastparse$parsers$Combinators$Rule$$res$1(Combinators.scala:142)
      at fastparse.parsers.Combinators$Rule.parseRec(Combinators.scala:147)
      at fastparse.parsers.Combinators$Sequence$Flat.rec$1(Combinators.scala:247)
      at fastparse.parsers.Combinators$Sequence$Flat.parseRec(Combinators.scala:268)
      at fastparse.parsers.Combinators$Either.rec$4(Combinators.scala:439)
      at fastparse.parsers.Combinators$Either.parseRec(Combinators.scala:447)
      at fastparse.parsers.Combinators$Capturing.parseRec(Combinators.scala:22)
      at fastparse.parsers.Combinators$Rule.res$lzycompute$1(Combinators.scala:142)
      at fastparse.parsers.Combinators$Rule.fastparse$parsers$Combinators$Rule$$res$1(Combinators.scala:142)
      at fastparse.parsers.Combinators$Rule.parseRec(Combinators.scala:147)
      at fastparse.WhitespaceApi$CustomSequence.parseRec(WhitespaceApi.scala:32)
      at fastparse.WhitespaceApi$CustomSequence.parseRec(WhitespaceApi.scala:26)
      at fastparse.WhitespaceApi$CustomSequence.parseRec(WhitespaceApi.scala:26)
      at fastparse.WhitespaceApi$CustomSequence.parseRec(WhitespaceApi.scala:26)
      at fastparse.parsers.Combinators$Rule.res$lzycompute$1(Combinators.scala:142)
      at fastparse.parsers.Combinators$Rule.fastparse$parsers$Combinators$Rule$$res$1(Combinators.scala:142)
      at fastparse.parsers.Combinators$Rule.parseRec(Combinators.scala:147)
      at fastparse.parsers.Combinators$Either.rec$4(Combinators.scala:439)
      at fastparse.parsers.Combinators$Either.parseRec(Combinators.scala:447)
      at fastparse.parsers.Combinators$Rule.res$lzycompute$1(Combinators.scala:142)
      at fastparse.parsers.Combinators$Rule.fastparse$parsers$Combinators$Rule$$res$1(Combinators.scala:142)
      at fastparse.parsers.Combinators$Rule.parseRec(Combinators.scala:147)
      at fastparse.parsers.Combinators$Repeat.rec$3(Combinators.scala:373)
      at fastparse.parsers.Combinators$Repeat.parseRec(Combinators.scala:409)
      at fastparse.parsers.Combinators$Capturing.parseRec(Combinators.scala:22)
      at fastparse.parsers.Combinators$Rule.res$lzycompute$1(Combinators.scala:142)
      at fastparse.parsers.Combinators$Rule.fastparse$parsers$Combinators$Rule$$res$1(Combinators.scala:142)
      at fastparse.parsers.Combinators$Rule.parseRec(Combinators.scala:147)
      at fastparse.WhitespaceApi$CustomSequence.parseRec(WhitespaceApi.scala:32)
      at fastparse.WhitespaceApi$CustomSequence.parseRec(WhitespaceApi.scala:26)
      at fastparse.parsers.Combinators$Rule.res$lzycompute$1(Combinators.scala:142)
      at fastparse.parsers.Combinators$Rule.fastparse$parsers$Combinators$Rule$$res$1(Combinators.scala:142)
      at fastparse.parsers.Combinators$Rule.parseRec(Combinators.scala:147)
      at fastparse.parsers.Combinators$Sequence$Flat.rec$1(Combinators.scala:247)
      at fastparse.parsers.Combinators$Sequence$Flat.parseRec(Combinators.scala:268)
      at fastparse.parsers.Combinators$Either.rec$4(Combinators.scala:439)
      at fastparse.parsers.Combinators$Either.parseRec(Combinators.scala:447)
      at fastparse.parsers.Combinators$Capturing.parseRec(Combinators.scala:22)
      at fastparse.parsers.Combinators$Rule.res$lzycompute$1(Combinators.scala:142)
      at fastparse.parsers.Combinators$Rule.fastparse$parsers$Combinators$Rule$$res$1(Combinators.scala:142)
      at fastparse.parsers.Combinators$Rule.parseRec(Combinators.scala:147)
      at fastparse.WhitespaceApi$CustomSequence.parseRec(WhitespaceApi.scala:32)
      at fastparse.WhitespaceApi$CustomSequence.parseRec(WhitespaceApi.scala:26)
      at fastparse.WhitespaceApi$CustomSequence.parseRec(WhitespaceApi.scala:26)
      at fastparse.WhitespaceApi$CustomSequence.parseRec(WhitespaceApi.scala:26)
      at fastparse.parsers.Combinators$Rule.res$lzycompute$1(Combinators.scala:142)
      at fastparse.parsers.Combinators$Rule.parseRec(Combinators.scala:147)
      at fastparse.core.Parsed$TracedFailure$.apply(Parsing.scala:199)
      at fastparse.core.Parsed$Failure$Extra$Impl.traced$lzycompute(Parsing.scala:116)
      - locked <0x982> (a fastparse.core.Parsed$Failure$Extra$Impl)
      at fastparse.core.Parsed$Failure$Extra$Impl.traced(Parsing.scala:116)
      at fastparse.core.ParseError.<init>(Parsing.scala:34)

To reproduce:

Clone this and run the tests

As discussed on Gitter on 29-02-2016, the solution might be to provide a flag to ParseError that is plumbed down into failure.extra.traced and disables the construction of traceParsers, or to call .distinct on traceParsers at every step to stop it from blowing up to infinity in these cases.

Feature request: error recovery

Would it be possible to implement error recovery, like parboiled1 does (trying to synchronize the input by adding or deleting tokens from the token stream)? This would allow many more uses for fastparse, like creating very powerful editors for DSLs, in combination with scala.js and CodeMirror.

parsing a Stream[Char]

Any plans to support this? How hard would it be, what would be good places to look, if I wanted to add it?

I have a use case where I want to parse a stream of lines that are not succeeded by a line break, but preceded by one. There can be a significant wait between the individual lines, so I need to parse and process a line before its terminating newline is sent. Most line-based streaming stuff breaks on that, unfortunately, so I imagine a Stream[Char] would be the right thing here.

Also see tpolecat/atto#11 which supports Streams of lines, not chars if I understand correctly

Typo in documentation

Under "Writing Parsers" - "Capturing":

captureOpt is a Parser[Opt[String]]

That should be

captureOpt is a Parser[Option[String]]

Write a cookbook of common patterns and techniques

Not everything that's useful lives in the library as a primitive, operator or class.

Some patterns aren't used widely enough, while others are difficult to encapsulate in a helper that's generic enough to be used in all cases, and others are so abstract that they're more developer workflows than code. Nevertheless, these are things that have been learned while writing Scalaparse/Pythonparse/Scalatex and other parsers, and are worth writing down somewhere so others can learn from them.

Here are a few from gitter:

Scoped cuts

Cuts are a nice feature for generating error information and for simplifying parsing, but they appear to be global in nature. That is, you can't (as far as I can tell) provide a scope for the cut that essentially says: for this parsing branch you treat this as a cut, but if you bail out on this entire branch, there's still another one to consider.

A motivating example: data may usually be in a format that can be efficiently represented, such as an array of ints, but may (rarely) contain valid yet not-efficiently-represented data (e.g. a struct). Scoped cuts can allow you to get precise information when it really must be an array of ints without having to write a second parsing routine to handle the case where non-Int input is okay.

Of course there are ways to do this: do lax parsing, then filter by sub-parsing the captured string using cuts. This, however, is inelegant and inefficient.

There is an additional consideration which is if one has scoped cuts, can you "cut more deeply" to make a cut that will escape one or more levels of scoping (or perhaps can escape scopes of particular names)? I do not yet have an opinion on whether this is a good idea.

The minimal syntax would look something like

val Num = CharsWhile(c => c >= '0' && c <= '9').!.map(_.toInt)
val Str = ("\"" ~ CharsWhile(c => c != '"').! ~ "\"")
val NumArray = Num.rep(1, " " ~! Pass)
val AnyArray = Scoped(NumArray) | (Num | Str).rep(1, " " ~! Pass)

Unexpected string capturing of Lookahead

val aa: P[String] = P("aa").!
val aaLA: P[(String, String)] = (&(aa)).! ~ aa

I'd have expected this to succeed, but it fails: val Success(("aa", "aa"), _) = aaLA.parse("aa")

Instead this succeeds: val Success(("", "aa"), _) = aaLA.parse("aa")

capture the whole parsed string along side with captured substrings.

(a.! ~ b.! ~c.!).! is a Parser[String], ignoring the inner captures. There should be a way to capture all 4 things like in a regex ((...)(...)(...)).

Apparently "it should be trivial to implement by taking the source code for Capture and making it append to a tuple instead of replacing the innards".

Just recording it here, hopefully I can contribute this some time.

Missing git tags for Releases.

It appears git tags are missing for the following releases that are listed in the changelog:
0.1.6 - 0.1.7
0.3.1

and the following versions in maven:
0.1.2 - 0.1.7
0.3.0 - 0.3.1

Either problem

Please try running this two ways.

  1. target = numberFirst
  2. target = textFirst

The output should be the same for both but it is not.

  1. numberFirst:

    ><

  2. textFirst:

    >Some text<

object BugReport extends App {
  object MyParser {
    val number = P(CharPred( (ch:Char) => {ch.isDigit || ch == '.'}).rep.! )
    val text = P(CharPred( (ch:Char) => {ch.isSpaceChar || ch.isLetterOrDigit}).rep.! )
    val textFirst = (text | number).!
    val numberFirst = (number | text).!
    def target = numberFirst
    def parseItem(str: String) = target.parse(str)
  }

  val input = "Some text"
  MyParser.parseItem(input) match {
    case Result.Success(res, _) =>
      println(">" + res + "<")
    case x => println("Could not parse the input string:" + x)
  }
}

Obtain line number and column

For debugging purposes, I would like to assign the line number and the column to my tree nodes. How do I obtain this information?
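
One way to approach this, sketched here in plain Scala without relying on any FastParse API: if a character offset into the input is available (for example the index reported on a failure), the line and column can be derived from the input string itself. The object and method names below are hypothetical, and the sketch assumes lines are separated by '\n'.

```scala
object LineCol {
  /** Converts a 0-based character offset into a 1-based (line, column) pair. */
  def lineCol(input: String, index: Int): (Int, Int) = {
    require(index >= 0 && index <= input.length, s"index $index out of range")
    val before = input.substring(0, index)
    // Each newline before the offset starts a new line.
    val line = before.count(_ == '\n') + 1
    // lastIndexOf returns -1 when there is no newline, which makes the column 1-based.
    val col = index - before.lastIndexOf('\n')
    (line, col)
  }
}
```

For example, LineCol.lineCol("ab\ncd", 4) gives (2, 2), since offset 4 is the 'd' in the second line. Scanning the prefix on every call is linear in the offset; for many lookups on a large input, precomputing the offsets of all newlines once would be a reasonable refinement.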

Should flatMap intersperse whitespace when using WhitespaceApi ?

We want to parse the following: the first line contains the size, the second line an integer sequence of that size. We use WhitespaceApi to define spaces and tabs as whitespace; line endings are considered significant and have their own parser.

We first parse the size, then the repeated sequence of integers using flatMap. However, flatMap does not eat whitespace between the first parser and the second parser. Written without an explicit whitespace token (see Scala code below), the following will parse:

2
3 4

but not the following:

2
  3 4

Code below:

import fastparse.WhitespaceApi

object Test extends App {

  // whitespace contains spaces and tabs
  val White = WhitespaceApi.Wrapper{
    import fastparse.all._
    NoTrace(CharsWhile(" \t".contains(_)).?)
  }

  import White._
  import fastparse.noApi._

  // line endings

  val lineEnding: P[Unit] = P("\r".? ~ "\n")

  // non-negative integer
  val nnInt: P[Int] = P( CharIn('0'to'9').repX(1).!.map(_.toInt) )

  // sized sequence of integers, separated by whitespace
  def seqInt(n: Int): P[Seq[Int]] = nnInt.rep(min=n, max=n)

  // size followed by sized sequence
  val sizeAndSeqInt: P[Seq[Int]] = (nnInt ~ lineEnding).flatMap( n => seqInt(n) )

  val sizeAndSeqInt1: P[Seq[Int]] = (nnInt ~ lineEnding).flatMap( n => Pass ~ seqInt(n) )

  sizeAndSeqInt.parse("2\n3 4").get // Success

  sizeAndSeqInt1.parse("2\n 3 4").get // Success

  sizeAndSeqInt.parse("2\n 3 4").get // Failure

}

The behavior is different from the Scala parser combinators and should be documented (or the flatMap semantics changed, if it makes sense).

Support unicode escapes

They're dumb but we probably have to support them

@sirthias is there any way to override the parsing over every single character or string to check for these silly \u0123 thingies? Maybe by modifying my current wspStr and wspChar thingies? I suppose I'd need to get rid of anyOf or noneOf because those don't support the stupid unicode escapes either.

I don't want to do a pre-processing stage if I can reasonably avoid it. Preprocessing will destroy all the source locations and require elaborate gymnastics to get them back.
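For reference, the "gymnastics" a preprocessing stage would need can be sketched roughly as follows. This is only an illustration in plain Scala, not anything from FastParse or parboiled: the object name UnicodeEscapes and the method decode are made up here, and the sketch deliberately ignores the finer rules (repeated u characters, escaped backslashes). It decodes each escape and records, for every output character, the index in the original source where it started, so that positions in the decoded text can be mapped back.

```scala
object UnicodeEscapes {
  /** Decodes backslash-u escapes (exactly four hex digits) and returns the
    * decoded text together with, for each output character, the index in
    * the original source at which that character began.
    * Simplified: repeated 'u' characters and escaped backslashes are not handled.
    */
  def decode(src: String): (String, Vector[Int]) = {
    val out = new StringBuilder
    val origin = Vector.newBuilder[Int]
    var i = 0
    while (i < src.length) {
      val isEscape =
        src.startsWith("\\u", i) &&
          i + 6 <= src.length &&
          src.substring(i + 2, i + 6).forall(c => Character.digit(c, 16) >= 0)
      if (isEscape) {
        // Six source characters collapse into one output character.
        out.append(Integer.parseInt(src.substring(i + 2, i + 6), 16).toChar)
        origin += i
        i += 6
      } else {
        out.append(src.charAt(i))
        origin += i
        i += 1
      }
    }
    (out.toString, origin.result())
  }
}
```

An index reported against the decoded text can then be translated with a lookup in the returned vector, at the cost of carrying that vector around everywhere a source location is needed.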

better negations

Right now "!" is the most confusing sign in FastParse, as it is used both for negation and for capturing. Moreover, !(something) does not provide any output, so to write something very common and trivial, like an "everything but space" parser, I have to do something like this:

val notSpace = (!" ").flatMap(v => AnyChar)  
val stringWithoutSpaces = P( notSpace.rep.! ) 

I would be happy to have a less verbose way to do this.

value | is not a member of String in Scala 2.10

It seems that the ParserApi implicit does not get invoked in Scala 2.10. The same source code compiles with Scala 2.11.

Sample build.sbt:

scalaVersion := "2.10.6"

libraryDependencies += "com.lihaoyi" %% "fastparse" % "0.3.4"

Hello.scala:

import fastparse.all._

object Hello {
  val t: P[String] = P("boo" | "bar")
}

sbt compile:

[info] Compiling 1 Scala source to /tmp/fp/target/scala-2.10/classes...
[error] /tmp/fp/Hello.scala:4: value | is not a member of String
[error]   val t: P[String] = P("boo" | "bar")
[error]                              ^
[error] one error found
[error] (compile:compile) Compilation failed
[error] Total time: 1 s, completed Dec 27, 2015 5:36:13 PM

Stackoverflow while compiling

I keep getting StackOverflowErrors while trying to compile scala-parser. I get them with both Java 7 (OS X) and Java 8 (Ubuntu 12.04, 14.04), going back as far as commit a0c39cd (I didn't try further back), on a clean config (Ivy cache cleared).

The sbt compile output looks like this:

Loading /usr/share/sbt/bin/sbt-launch-lib.bash
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=256m; support was removed in 8.0
[info] Set current project to scala-parser (in build file:/home/test/tmp/scala-parser/)
[info] Compiling 8 Scala sources to /home/test/tmp/scala-parser/target/scala-2.11/classes...
java.lang.StackOverflowError
        at java.lang.ClassLoader.defineClass1(Native Method)
        at java.lang.ClassLoader.defineClass(ClassLoader.java:760)
        at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
        at java.net.URLClassLoader.defineClass(URLClassLoader.java:455)
        at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:367)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        at scala.tools.nsc.typechecker.Namers$Namer.typeErrorHandler(Namers.scala:111)
        at scala.tools.nsc.typechecker.Namers$Namer.typeSig(Namers.scala:1539)
        at scala.tools.nsc.typechecker.Namers$Namer$$anonfun$monoTypeCompleter$1$$anonfun$apply$1.apply$mcV$sp(Namers.scala:778)
        at scala.tools.nsc.typechecker.Namers$Namer$$anonfun$monoTypeCompleter$1$$anonfun$apply$1.apply(Namers.scala:777)
        at scala.tools.nsc.typechecker.Namers$Namer$$anonfun$monoTypeCompleter$1$$anonfun$apply$1.apply(Namers.scala:777)
        at scala.tools.nsc.typechecker.Namers$Namer.scala$tools$nsc$typechecker$Namers$Namer$$logAndValidate(Namers.scala:1565)
        at scala.tools.nsc.typechecker.Namers$Namer$$anonfun$monoTypeCompleter$1.apply(Namers.scala:777)
        at scala.tools.nsc.typechecker.Namers$Namer$$anonfun$monoTypeCompleter$1.apply(Namers.scala:769)
        at scala.tools.nsc.typechecker.Namers$$anon$1.completeImpl(Namers.scala:1681)
        at scala.tools.nsc.typechecker.Namers$LockingTypeCompleter$class.complete(Namers.scala:1689)
        at scala.tools.nsc.typechecker.Namers$$anon$1.complete(Namers.scala:1679)

...

        at scala.reflect.internal.Symbols$Symbol.initialize(Symbols.scala:1628)
        at scala.tools.nsc.typechecker.Typers$Typer.typed1(Typers.scala:4911)
        at scala.tools.nsc.typechecker.Typers$Typer.runTyper$1(Typers.scala:5295)
        at scala.tools.nsc.typechecker.Typers$Typer.scala$tools$nsc$typechecker$Typers$Typer$$typedInternal(Typers.scala:5322)
        at scala.tools.nsc.typechecker.Typers$Typer.body$2(Typers.scala:5269)
        at scala.tools.nsc.typechecker.Typers$Typer.typed(Typers.scala:5273)
        at scala.tools.nsc.typechecker.Typers$Typer.typedByValueExpr(Typers.scala:5351)
        at scala.tools.nsc.typechecker.Typers$Typer.scala$tools$nsc$typechecker$Typers$Typer$$typedStat$1(Typers.scala:2977)
        at scala.tools.nsc.typechecker.Typers$Typer$$anonfun$60.apply(Typers.scala:3081)
        at scala.tools.nsc.typechecker.Typers$Typer$$anonfun$60.apply(Typers.scala:3081)
        at scala.collection.immutable.List.loop$1(List.scala:172)
        at scala.collection.immutable.List.mapConserve(List.scala:188)
        at scala.tools.nsc.typechecker.Typers$Typer.typedStats(Typers.scala:3081)
        at scala.tools.nsc.typechecker.Typers$Typer.typedBlock(Typers.scala:2340)
        at scala.tools.nsc.typechecker.Typers$Typer$$anonfun$typedOutsidePatternMode$1$1.apply(Typers.scala:5217)
        at scala.tools.nsc.typechecker.Typers$Typer$$anonfun$typedOutsidePatternMode$1$1.apply(Typers.scala:5217)
        at scala.tools.nsc.typechecker.Typers$Typer.typedOutsidePatternMode$1(Typers.scala:5216)
        at scala.tools.nsc.typechecker.Typers$Typer.typedInAnyMode$1(Typers.scala:5252)
        at scala.tools.nsc.typechecker.Typers$Typer.typed1(Typers.scala:5259)
        at scala.tools.nsc.typechecker.Typers$Typer.runTyper$1(Typers.scala:5295)
        at scala.tools.nsc.typechecker.Typers$Typer.scala$tools$nsc$typechecker$Typers$Typer$$typedInternal(Typers.scala:5322)
        at scala.tools.nsc.typechecker.Typers$Typer.body$2(Typers.scala:5269)
        at scala.tools.nsc.typechecker.Typers$Typer.typed(Typers.scala:5273)
        at scala.tools.nsc.typechecker.Typers$Typer.typed(Typers.scala:5362)
        at scala.tools.nsc.typechecker.Typers$Typer.computeType(Typers.scala:5453)
        at scala.tools.nsc.typechecker.Namers$Namer.assignTypeToTree(Namers.scala:876)
[error] (compile:compile) java.lang.StackOverflowError
[error] Total time: 4 s, completed 6 mars 2015 10:16:29

Would you have any idea of what could cause that? Am I the only one getting these errors?

Syntax error from a typo in `def` is not reported if it's not type annotated.

I found a bug in syntax error reporting: a typo in def, val or var is not reported as an error when the method or variable has no type annotation. I assume it should be reported as a syntax error even without a type annotation.

Here are examples of a method with a typo in def. The behavior for a variable with a typo in val or var is the same.

scala> val p0 = scalaparse.Scala.CompilationUnit
p0: fastparse.P0 = CompilationUnit

//Reporting typo of `def`, if return type is annotated
scala> p0.parse("object A{def i:Int = 1}",0,true)
res24: fastparse.core.Result[Unit] = Success((), 23)

scala> p0.parse("object A{de i:Int = 1}",0,true)
res25: fastparse.core.Result[Unit] = Failure(CompilationUnit:0 / Body:0 / TopStatSeq:0 / TopStat:0 / Tmpl:0 / ObjDef:0 / DefTmpl:8 / TmplBody:8 / }:17 / "}":18 ..."= 1}", true)

//Not reporting typo of `def`, if return type is NOT annotated
scala> p0.parse("object A{def i = 1}",0,true)
res26: fastparse.core.Result[Unit] = Success((), 19)

scala> p0.parse("object A{de i = 1}",0,true)
res27: fastparse.core.Result[Unit] = Success((), 18)

Add an upper boundary to rep

It's currently possible to specify a minimum number of repetitions for repeated sequences. It'd be great to also have a maximum number of repetitions.

My use case is parsing HTTP grammar, where for example language tags are defined as:

language-tag  = primary-tag *( "-" subtag )
primary-tag   = 1*8ALPHA
subtag        = 1*8ALPHA

I haven't yet found a sane way to express that with fastparse, but maybe I'm missing something?
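To make the requested semantics concrete, here is a hand-written sketch in plain Scala of what a bounded repetition would do for this grammar. None of these names (BoundedRep, repChars, languageTag) exist in fastparse; they are invented for illustration, and the functions work on raw indices rather than on parser objects.

```scala
object BoundedRep {
  // ALPHA as used in the grammar above: ASCII letters only.
  def isAlpha(c: Char): Boolean =
    (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')

  /** Greedily consumes between `min` and `max` characters satisfying `p`,
    * starting at `from`. Returns the index after the last consumed character,
    * or None if fewer than `min` characters matched.
    */
  def repChars(in: String, from: Int, min: Int, max: Int)(p: Char => Boolean): Option[Int] = {
    var i = from
    while (i < in.length && i - from < max && p(in.charAt(i))) i += 1
    if (i - from >= min) Some(i) else None
  }

  /** language-tag = primary-tag *( "-" subtag ), each tag being 1*8ALPHA.
    * Returns the end index of the longest matching prefix, if any.
    */
  def languageTag(in: String): Option[Int] =
    repChars(in, 0, 1, 8)(isAlpha).map { primaryEnd =>
      var end = primaryEnd
      var more = true
      while (more) {
        val next =
          if (end < in.length && in.charAt(end) == '-') repChars(in, end + 1, 1, 8)(isAlpha)
          else None
        next match {
          case Some(e) => end = e      // consumed "-" plus a subtag
          case None    => more = false // no further subtag; stop without consuming "-"
        }
      }
      end
    }
}
```

With this sketch, a nine-letter run matches only its first eight characters, which is the behavior an upper bound on rep would need to provide.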

Logged indexes seem to be off

haoyi-haoyi@ object Foo{
               import fastparse.all._
               val plus = P( "+" )
               val num = P( CharIn('0' to '9').rep(1) ).!.map(_.toInt)
               val side = P( "(" ~! expr ~! ")" | num ).log()
               val expr: P[Int] = P( side ~ plus ~ side ).map{case (l, r) => l + r}.log()
             }
haoyi-haoyi@ Foo.expr.parse("(1+(2+3x))+4").asInstanceOf[fastparse.core.Result.Failure].index
+expr:0
  +side:0
    +expr:1
      +side:1
      -side:1:Success(2)
      +side:3
        +expr:4
          +side:4
          -side:4:Success(5)
          +side:6
          -side:6:Success(7)
        -expr:4:Success(7)
      -side:3:Failure(side:3 / ")":3 ..."(2+3x))+4", cut)
    -expr:1:Failure(expr:1 / side:3 / ")":1 ..."1+(2+3x))+", cut)
  -side:0:Failure(side:0 / expr:1 / side:3 / ")":0 ..."(1+(2+3x))", cut)
-expr:0:Failure(expr:0 / side:0 / expr:1 / side:3 / ")":0 ..."(1+(2+3x))", cut)
res76: Int = 7

This should probably be ")":7 rather than ")":0. The final index seems right, so something is funky in the logging

Add helpers to get line/col of error message

index is enough for machines but hard to read for humans. It's trivial to get the line/col via

val lines = f.input.take(f.index).lines.toVector
val line = lines.length 
val col = lines.last.length

But it's probably worth chucking this (or some more-optimized version of it) into the Failure object for everybody's convenience
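A slightly more defensive variant could look like the sketch below. It is plain Scala with no FastParse dependency: the names LineCol, lineCol, input and index are illustrative only, it counts just '\n' as a line break, and it returns 1-based positions. It also covers index 0 and an index sitting right after a newline, which are edge cases for the lines.last approach.

```scala
object LineCol {
  /** Returns the 1-based (line, column) of `index` within `input`.
    * Only '\n' counts as a line break; a preceding '\r' stays on the earlier line.
    */
  def lineCol(input: String, index: Int): (Int, Int) = {
    require(index >= 0 && index <= input.length, s"index $index out of range")
    val prefix = input.substring(0, index)
    val line = prefix.count(_ == '\n') + 1
    // lastIndexOf returns -1 when there is no newline, which makes the
    // first line's columns start at 1 as well.
    val col = index - prefix.lastIndexOf('\n')
    (line, col)
  }
}
```

For a failure value f, this would be called as LineCol.lineCol(f.input, f.index), assuming f.input is a String as in the snippet above.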

Implement repN and repExactN

Implement repN and repExactN instances that collect either (N or more) or N instances of the given parser, by analogy with rep, rep1

`pythonparseJVM:test` fails on master

[info] Failures:
[info] 90/93   pythonparse.ProjectTests.ansible     
[info]              java.lang.Exception: pythonparse/jvm/target/repos/ansible/lib/ansible/parsing/vault/__init__.py
[info] pythonparse.ProjectTests$.check(ProjectTests.scala:49)
[info] pythonparse.ProjectTests$$anonfun$16$$anonfun$apply$9.apply(ProjectTests.scala:61)
[info] pythonparse.ProjectTests$$anonfun$16$$anonfun$apply$9.apply(ProjectTests.scala:53)
[info] Tests: 93
[info] Passed: 92
[info] Failed: 1

I noticed this happening in the Scala community build: https://scala-ci.typesafe.com/job/scala-2.11.x-jdk8-integrate-community-build/123/consoleFull

Doesn't build under windows

> compile
Generating Scalatex Sources...
[info] Compiling 3 Scala sources to C:\dev\prj\fastparse\readme\target\scala-2.11\classes...
[error] C:\dev\prj\fastparse\readme\target\scala-2.11\src_managed\main\scalatex\Main.scala:5: invalid escape character
[error]             wd = ammonite.ops.Path("C:\dev\prj\fastparse"),
[error]                                        ^
[error] C:\dev\prj\fastparse\readme\target\scala-2.11\src_managed\main\scalatex\Main.scala:5: invalid escape character
[error]             wd = ammonite.ops.Path("C:\dev\prj\fastparse"),
[error]                                            ^
[error] C:\dev\prj\fastparse\readme\target\scala-2.11\src_managed\main\scalatex\Main.scala:6: invalid escape character
[error]             output = ammonite.ops.Path("C:\dev\prj\fastparse\readme\target\scalatex"),
[error]                                            ^
[error] C:\dev\prj\fastparse\readme\target\scala-2.11\src_managed\main\scalatex\Main.scala:6: invalid escape character
[error]             output = ammonite.ops.Path("C:\dev\prj\fastparse\readme\target\scalatex"),
[error]                                                ^
[error] C:\dev\prj\fastparse\readme\target\scala-2.11\src_managed\main\scalatex\Main.scala:6: invalid escape character
[error]             output = ammonite.ops.Path("C:\dev\prj\fastparse\readme\target\scalatex"),
[error]                                                                            ^
[error] C:\dev\prj\fastparse\readme\target\scala-2.11\src_managed\main\scalatex\Readme.scala:7: invalid escape character
[error]   def apply(): Frag = _root_.scalatex.twf("C:\dev\prj\fastparse\readme\Readme.scalatex")
[error]                                               ^
[error] C:\dev\prj\fastparse\readme\target\scala-2.11\src_managed\main\scalatex\Readme.scala:7: invalid escape character
[error]   def apply(): Frag = _root_.scalatex.twf("C:\dev\prj\fastparse\readme\Readme.scalatex")
[error]                                                   ^
[error] C:\dev\prj\fastparse\readme\target\scala-2.11\src_managed\main\scalatex\Readme.scala:7: invalid escape character
[error]   def apply(): Frag = _root_.scalatex.twf("C:\dev\prj\fastparse\readme\Readme.scalatex")
[error]                                                                        ^
[error] 8 errors found
[error] (readme/compile:compileIncremental) Compilation failed
[error] Total time: 2 s, completed 01-Jun-2015 11:27:18

Derives from lihaoyi/Scalatex#9.
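The errors above come from a Windows path being pasted into generated source as a string literal without escaping its backslashes. As a minimal sketch of the kind of escaping a code generator would need (not the actual Scalatex fix; the names LiteralEscape and toLiteral are hypothetical):

```scala
object LiteralEscape {
  /** Renders `s` as a double-quoted Scala string literal,
    * escaping backslashes first and then double quotes.
    */
  def toLiteral(s: String): String =
    "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\""
}
```

Doubling each backslash means a path such as C:\dev\prj\fastparse is emitted with escaped separators, so the compiler no longer sees \d or \p as escape sequences.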

Allow failure within a map call

There are some cases where we need to fail gracefully within a map call. As an example, I have some code that parses a charset name and needs to turn it into an instance of java.nio.charset.Charset. The name itself might be valid syntactically but not actually be the name of a valid Charset, which I can only discover in the map call.

I think what I'm really asking for is for map to be able to return a failing parser - that is, for Parser to have a flatMap method.
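To illustrate the shape of what is being asked for, here is a sketch built on a toy parser type. Toy, pass, fail and charsetName are invented for this example and are not FastParse's API; only java.nio.charset.Charset and scala.util.Try are real library classes. The point is that flatMap lets the function decide, after seeing the parsed value, whether to continue with a succeeding or a failing parser.

```scala
import java.nio.charset.Charset
import scala.util.Try

object FailInFlatMap {
  // Toy parser for illustration only: consumes a prefix of the input and
  // returns the value plus the remaining input, or None on failure.
  final case class Toy[A](run: String => Option[(A, String)]) {
    def map[B](f: A => B): Toy[B] =
      Toy(in => run(in).map { case (a, rest) => (f(a), rest) })
    def flatMap[B](f: A => Toy[B]): Toy[B] =
      Toy(in => run(in).flatMap { case (a, rest) => f(a).run(rest) })
  }

  def pass[A](a: A): Toy[A] = Toy(in => Some((a, in)))
  def fail[A]: Toy[A] = Toy(_ => None)

  // Consumes a non-empty run of characters allowed in a charset name (simplified).
  val charsetName: Toy[String] = Toy { in =>
    val name = in.takeWhile(c => c.isLetterOrDigit || "-_.:+".contains(c))
    if (name.isEmpty) None else Some((name, in.drop(name.length)))
  }

  // The name is syntactically fine, but the lookup can still fail;
  // flatMap turns that into a parse failure instead of an exception.
  val charset: Toy[Charset] = charsetName.flatMap { name =>
    Try(Charset.forName(name)).toOption match {
      case Some(cs) => pass[Charset](cs)
      case None     => fail[Charset]
    }
  }
}
```

With plain map, the only options inside the function would be to throw or to return a wrapper such as Option that every caller then has to unpack.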

lookahead does not capture

There appears to be no provision for using the captured result of a lookahead. In a context sensitive parser it may be better to jump ahead and get some value used in building the correct context parser. Is there some reason lookahead needs to throw away captured values?

Use case: In YAML a Scalar Block indentation needs to be detected by looking at the indentation of first non-empty line. The complicating factor is that empty lines are not just thrown away, and how they are parsed depends on the indentation of the block.

So doing something like the following:

def BlockScalar(indent:Int) = // builds a parser using the indent
val block_scalar = block_scalar_header ~ 
             &((" ".rep ~ "\n").rep ~ " ".!.rep.map(_.length)).flatMap(BlockScalar)

Support for predicate tests to indicate Success or Failure for parsers of T

If this feature already exists I apologize for missing it.

It would be nice to have a built in construct for parsers with predicates like this:

import fastparse._

object Parser {
  def predicated[T](parser: Parser[T])(pred: T => Boolean): Parser[T] = P {
    parser.flatMap(x => if (pred(x)) Pass.map(_ => x) else Fail)
  }
  val pDigits: Parser[Int] = P(CharIn('0' to '9').rep(1).!.map(_.toInt))
  val pEven: Parser[Int] = predicated(pDigits)({x => x % 2 == 0})
  val pOdd: Parser[Int] = predicated(pDigits)({x => x % 2 != 0})
}

object Run extends App {
  import Parser._
  println(pEven.parse("123"))
  println(pEven.parse("124"))
  println(pOdd.parse("123"))
  println(pOdd.parse("124"))
}

Run...
Failure(predicated:0 / Fail:3 ..."", false)
Success(124, 3)
Success(123, 3)
Failure(predicated:0 / Fail:3 ..."", false)

Cuts are visually indistinct and typo-prone

~! "foo" and ~!" foo" and ~ !"foo" all parse, and all do different things, despite being nearly indistinguishable visually.

Either negation or the cut operator should change. Negation could be Not, or cut could be ~> or somesuch. (~> because it visually suggests it's one way, and it parses with the same precedence as ~!.)

StringIn takes a very long time with longer strings

Tested on 0.2.1:

import java.text.SimpleDateFormat
import java.util.Date

sealed trait Val
case class Var(value : String) extends Val
val startTime = new Date()
val vars = List("\r\n", "\n", "hl", "SA", "avxavxavxavxavxavx").toSeq
val variables : Parser[Var] = P ( StringIn(vars:_*).! ).map(Var)
val Result.Success(myVars, _) = variables.parse("SA")
val endTime = new Date()
val dateDiff = new SimpleDateFormat("mm:ss").format(new Date(endTime.getTime - startTime.getTime))
println (s"myVars = $myVars")
println (s"took $dateDiff to parse")

results in the following output:

myVars = Var(SA)
took 03:11 to parse

This only happens with long strings; shorter ones are processed almost instantly. I've tried various combinations of the long string, and it always results in an exceedingly long parsing time.

Weird errors trying to start using fastparse

Hi!

I'm very new to Scala and a total newbie with fastparse.

I've created a very simple SBT project and tried to compile and run some simple parsers from the web examples, but I get a lot of compilation errors on sbt like this:

[error] /home/freinn/parsertests/src/main/scala/Main.scala:30: No implicit view available from String => fastparse.core.Parser[V].
[error] val ab = P( "a".rep ~ "b" )
[error] ^
[error] one error found
error Compilation failed

Or another one:

[error] /home/freinn/tecsisaDSL/src/main/scala/Main.scala:22: value ~ is not a member of String
[error] val ab = P("a" ~ "b")
[error] ^
[error] /home/freinn/tecsisaDSL/src/main/scala/Main.scala:41: too many arguments for method println: (x: Any)Unit
[error] println(ParserSet.val1, ParserSet.val2)
[error] ^
[error] two errors found

I'm using 0.3.7 with the following line on build.sbt:

libraryDependencies += "com.lihaoyi" %% "fastparse" % "0.3.7"

I did the import (import fastparse.all._) in the beginning of my file and created an object ParserSet { ... } with all the copied/pasted code from the web examples.

Enhance log() to display on Success the text (or a summarized version of it) that a rule processed.

Currently, log() displays debugging information only when a Failure object is returned.
It would be helpful for log() to also display, for Success objects, the text (or a summarized version of it) that a rule processed, for a couple of reasons:

  • If an error arises from a rule over-generalizing and accepting too much input, a Success object is returned; it is very hard to locate this type of error without seeing the text the Success object processed;
  • The strings from Success objects make it easier to see where you are looking in the parse.

These original text strings should not pollute the visual space with too much information, though, which would make log() output hard to read. Thus, the strings should:

  • be summarized if excessively long, yet contain enough info to be useful
  • be on one line (i.e. remove newlines or 'vertical whitespace')
  • be easy to scan, so breaking on whitespace would be helpful when possible

I've written a patch that does this, by using regexes to:

  • change all whitespace strings to a single space
  • then summarize a string if longer than 49 chars:
    • require a certain amount of characters from the beginning and end
    • and then break on next whitespace possible.

Although the string may be up to 49 chars long, in practice it is shorter than that due to breaking on whitespace. Here are some sample summarizations:

"Sticks and stones may break my bones but names will never hurt me."
0:66 Success: "sticks and"..."will never hurt me."

"I'm going to go to this shop to go shopping while she goes shopping at that shop."
0:81 Success: "I'm going to"..."shopping at that shop."

"We really may not be all that hungry since we ate a lot already."
0:64 Success: "we really may"..."ate a lot already."

"I've been studying a parser combinator library for scala because it might be useful for my projects."
0:100 Success: "I've been studying"..."for my projects."

"A newspaper reported that the store is going to plan new studies on the project."
0:80 Success: "a newspaper"..."on the project."
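The steps described above can be sketched roughly as follows. This is not the patch itself: apart from the 49-character limit, the thresholds (HeadMin, TailMin) and all names are assumptions made for illustration, so the exact cut points will differ from the samples above.

```scala
object LogSummary {
  val MaxLen = 49   // from the description above
  val HeadMin = 10  // assumed: minimum characters kept from the start
  val TailMin = 15  // assumed: minimum characters kept from the end

  /** Collapses whitespace runs to single spaces and, for long strings,
    * keeps a head and a tail that both end/start on a word boundary.
    */
  def summarize(text: String): String = {
    val flat = text.replaceAll("\\s+", " ").trim
    if (flat.length <= MaxLen) "\"" + flat + "\""
    else {
      // Extend the head forward to the next space at or after HeadMin.
      val headBreak = flat.indexOf(' ', HeadMin)
      // Extend the tail backward to the nearest space at or before the TailMin mark.
      val tailBreak = flat.lastIndexOf(' ', flat.length - TailMin)
      val headEnd = if (headBreak == -1) HeadMin else headBreak
      val tailStart = if (tailBreak == -1) flat.length - TailMin else tailBreak + 1
      // If word boundaries would make head and tail overlap, fall back to hard cuts.
      val (h, t) =
        if (headEnd >= tailStart) (HeadMin, flat.length - TailMin)
        else (headEnd, tailStart)
      "\"" + flat.substring(0, h) + "\"...\"" + flat.substring(t) + "\""
    }
  }
}
```

Breaking on whitespace keeps whole words at both ends, which is what makes the summaries easy to scan in the log output.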

Here is how the Success strings look in debugging a program with log(). The example program is a simple NLP chunker with an input sentence:
"A newspaper reported that the firm plans new studies on the project."

Without Success strings:

+s:0
  +clause:0
    +np:0
      +adjP:2
      -adjP:2:Failure(adjP:1:3 / adj:1:3 / ws:1:6 / (CharIn(" \t\n.;:?!").rep(1) | &(",") | End):1:6 ..."newspaper ")
      +pp:12
      -pp:12:Failure(pp:1:13 / prep:1:13 / StringIn("about", "above", "according to", "across", "after", "against", "around", "at", "before", "behind", "below", "beneath", "beside", "besides", "between", "beyond", "by", "by way of", "down", "during", "except", "for", "from", "in", "in addition to", "in front of", "in place of", "in regard to", "in spite of", "inside", "instead of", "into", "like", "near", "of", "off", "on", "on account of", "out", "out of", "outside", "over", "through", "throughout", "till", "to", "toward", "under", "until", "up", "upon", "with", "without"):1:13 ..."reported t")
    -np:0:Success(12)
    +vp:12
      +vConj:12
      -vConj:12:Success(21)
      +pp:21
      -pp:21:Failure(pp:1:22 / prep:1:22 / StringIn("about", "above", "according to", "across", "after", "against", "around", "at", "before", "behind", "below", "beneath", "beside", "besides", "between", "beyond", "by", "by way of", "down", "during", "except", "for", "from", "in", "in addition to", "in front of", "in place of", "in regard to", "in spite of", "inside", "instead of", "into", "like", "near", "of", "off", "on", "on account of", "out", "out of", "outside", "over", "through", "throughout", "till", "to", "toward", "under", "until", "up", "upon", "with", "without"):1:22 ..."that the f")
      +np:21
        +adjP:26
        -adjP:26:Failure(adjP:1:27 / adj:1:27 / StringIn("big", "small", "fast", "slow", "new", "old", "next", "red", "blue", "green", "orange", "yellow", "white", "black", "grey", "silver", "gold", "good", "bad", "great", "awful", "cool", "awesome", "worthless", "useful", "clever", "smart", "dumb", "stupid", "ridiculous", "fun", "interesting", "boring", "hungry", "thirsty", "firm"):1:27 ..."the firm p")
      -np:21:Failure(np:1:22 / (det.? ~ Logged(adjP,adjP,<function1>).? ~ n.rep(1) ~ Logged(pp,pp,<function1>).? | pronoun):1:22 ..."that the f")
    -vp:12:Success(21)
  -clause:0:Success(21)
  +clauseConnector:21
  -clauseConnector:21:Success(26)
  +clause:26
    +np:26
      +adjP:30
      -adjP:30:Success(35)
      +pp:41
      -pp:41:Failure(pp:1:42 / prep:1:42 / StringIn("about", "above", "according to", "across", "after", "against", "around", "at", "before", "behind", "below", "beneath", "beside", "besides", "between", "beyond", "by", "by way of", "down", "during", "except", "for", "from", "in", "in addition to", "in front of", "in place of", "in regard to", "in spite of", "inside", "instead of", "into", "like", "near", "of", "off", "on", "on account of", "out", "out of", "outside", "over", "through", "throughout", "till", "to", "toward", "under", "until", "up", "upon", "with", "without"):1:42 ..."new studie")
    -np:26:Success(41)
    +vp:41
      +vConj:41
      -vConj:41:Failure(vConj:1:42 / ("to" ~ ws ~ adv.rep ~ infinitive | "going ".? ~ "to" ~ ws ~ adv.rep ~ infinitive ~ presentParticiple.? | modalAuxiliary ~ adv.rep ~ infinitive | (("do" | "did" | "will") ~ (ws ~ adv).? | ("don't" | "didn't" | "won't") ~ ws) ~ infinitive | have ~ adv.rep ~ pastParticiple | be ~ adv.rep ~ presentParticiple | have ~ adv.rep ~ "been " ~ adv.rep ~ presentParticiple | presentTense | pastTense):1:42 ..."new studie")
    -vp:41:Failure(vp:1:42 / vConj:1:42 / ("to" ~ ws ~ adv.rep ~ infinitive | "going ".? ~ "to" ~ ws ~ adv.rep ~ infinitive ~ presentParticiple.? | modalAuxiliary ~ adv.rep ~ infinitive | (("do" | "did" | "will") ~ (ws ~ adv).? | ("don't" | "didn't" | "won't") ~ ws) ~ infinitive | have ~ adv.rep ~ pastParticiple | be ~ adv.rep ~ presentParticiple | have ~ adv.rep ~ "been " ~ adv.rep ~ presentParticiple | presentTense | pastTense):1:42 ..."new studie")
    +copulaP:41
    -copulaP:41:Failure(copulaP:1:42 / be:1:42 / StringIn("am", "are", "is", "was", "were", "will be", "be", "'m", "'s", "'re", "'ll"):1:42 ..."new studie")
  -clause:26:Failure(clause:1:27 / (Logged(vp,vp,<function1>) | Logged(copulaP,copulaP,<function1>)):1:42 ..."the firm p")
-s:0:Failure(s:1:1 / End:1:22 ..."a newspape")

With Success strings:

+s:0
  +clause:0
    +np:0
      +adjP:2
      -adjP:2:Failure(adjP:1:3 / adj:1:3 / ws:1:6 / (CharIn(" \t\n.;:?!").rep(1) | &(",") | End):1:6 ..."newspaper ")
      +pp:12
      -pp:12:Failure(pp:1:13 / prep:1:13 / StringIn("about", "above", "according to", "across", "after", "against", "around", "at", "before", "behind", "below", "beneath", "beside", "besides", "between", "beyond", "by", "by way of", "down", "during", "except", "for", "from", "in", "in addition to", "in front of", "in place of", "in regard to", "in spite of", "inside", "instead of", "into", "like", "near", "of", "off", "on", "on account of", "out", "out of", "outside", "over", "through", "throughout", "till", "to", "toward", "under", "until", "up", "upon", "with", "without"):1:13 ..."reported t")
    -np:0:12 Success: "a newspaper "
    +vp:12
      +vConj:12
      -vConj:12:21 Success: "reported "
      +pp:21
      -pp:21:Failure(pp:1:22 / prep:1:22 / StringIn("about", "above", "according to", "across", "after", "against", "around", "at", "before", "behind", "below", "beneath", "beside", "besides", "between", "beyond", "by", "by way of", "down", "during", "except", "for", "from", "in", "in addition to", "in front of", "in place of", "in regard to", "in spite of", "inside", "instead of", "into", "like", "near", "of", "off", "on", "on account of", "out", "out of", "outside", "over", "through", "throughout", "till", "to", "toward", "under", "until", "up", "upon", "with", "without"):1:22 ..."that the f")
      +np:21
        +adjP:26
        -adjP:26:Failure(adjP:1:27 / adj:1:27 / StringIn("big", "small", "fast", "slow", "new", "old", "next", "red", "blue", "green", "orange", "yellow", "white", "black", "grey", "silver", "gold", "good", "bad", "great", "awful", "cool", "awesome", "worthless", "useful", "clever", "smart", "dumb", "stupid", "ridiculous", "fun", "interesting", "boring", "hungry", "thirsty", "firm"):1:27 ..."the firm p")
      -np:21:Failure(np:1:22 / (det.? ~ Logged(adjP,adjP,<function1>).? ~ n.rep(1) ~ Logged(pp,pp,<function1>).? | pronoun):1:22 ..."that the f")
    -vp:12:21 Success: "reported "
  -clause:0:21 Success: "a newspaper reported "
  +clauseConnector:21
  -clauseConnector:21:26 Success: "that "
  +clause:26
    +np:26
      +adjP:30
      -adjP:30:35 Success: "firm "
      +pp:41
      -pp:41:Failure(pp:1:42 / prep:1:42 / StringIn("about", "above", "according to", "across", "after", "against", "around", "at", "before", "behind", "below", "beneath", "beside", "besides", "between", "beyond", "by", "by way of", "down", "during", "except", "for", "from", "in", "in addition to", "in front of", "in place of", "in regard to", "in spite of", "inside", "instead of", "into", "like", "near", "of", "off", "on", "on account of", "out", "out of", "outside", "over", "through", "throughout", "till", "to", "toward", "under", "until", "up", "upon", "with", "without"):1:42 ..."new studie")
    -np:26:41 Success: "the firm plans "
    +vp:41
      +vConj:41
      -vConj:41:Failure(vConj:1:42 / ("to" ~ ws ~ adv.rep ~ infinitive | "going ".? ~ "to" ~ ws ~ adv.rep ~ infinitive ~ presentParticiple.? | modalAuxiliary ~ adv.rep ~ infinitive | (("do" | "did" | "will") ~ (ws ~ adv).? | ("don't" | "didn't" | "won't") ~ ws) ~ infinitive | have ~ adv.rep ~ pastParticiple | be ~ adv.rep ~ presentParticiple | have ~ adv.rep ~ "been " ~ adv.rep ~ presentParticiple | presentTense | pastTense):1:42 ..."new studie")
    -vp:41:Failure(vp:1:42 / vConj:1:42 / ("to" ~ ws ~ adv.rep ~ infinitive | "going ".? ~ "to" ~ ws ~ adv.rep ~ infinitive ~ presentParticiple.? | modalAuxiliary ~ adv.rep ~ infinitive | (("do" | "did" | "will") ~ (ws ~ adv).? | ("don't" | "didn't" | "won't") ~ ws) ~ infinitive | have ~ adv.rep ~ pastParticiple | be ~ adv.rep ~ presentParticiple | have ~ adv.rep ~ "been " ~ adv.rep ~ presentParticiple | presentTense | pastTense):1:42 ..."new studie")
    +copulaP:41
    -copulaP:41:Failure(copulaP:1:42 / be:1:42 / StringIn("am", "are", "is", "was", "were", "will be", "be", "'m", "'s", "'re", "'ll"):1:42 ..."new studie")
  -clause:26:Failure(clause:1:27 / (Logged(vp,vp,<function1>) | Logged(copulaP,copulaP,<function1>)):1:42 ..."the firm p")
-s:0:Failure(s:1:1 / End:1:22 ..."a newspape")

In the bottom version, it is much easier to follow the parse and ID the problem, which is not a Failure object, but the Success object: -np:26:41 Success: "the firm plans " (i.e. an NP: "plans that are firm", rather than NP "the firm" + V "plans").

Pattern-matching on Result produces a (spurious?) warning

When my code tries to match on the result of a parse, such as:

    dumpfileP.parse(mySQL) match {
      case Result.Success(stmts, _) => stmts
      case Result.Failure(parser, index) => displayFailure(...)
    }

The Scala compiler consistently kicks out a warning:

[warn] /home/jducoeur/GitHub/Querki/querki/scalajvm/app/querki/imexport/MySQLImport.scala:143: The outer reference in this type test cannot be checked at run time.
[warn]       case Result.Success(stmts, _) => stmts

This warning may be correct -- I'm honestly unsure -- but I don't care about it, and definitely don't want to see it. My coding standards are "no warnings", so this is getting in the way of using FastParse for production code.

This can be worked around by using isInstanceOf + asInstanceOf, but that's rather boilerplatey. A more idiomatic-Scala solution would be preferable.
