
parsercombinator.jl's People

Contributors

alyst, andrewcooke, carlolucibello, iainnz, jtrakk, kristofferc, oxinabox, richardreeve, sbromberger, simonschoelly, tkelman, ztangent

parsercombinator.jl's Issues

simple test with UTF8-String fails

On Julia v0.4.3 I ran Pkg.add("ParserCombinator"); using ParserCombinator
and Pkg.installed("ParserCombinator") reports v"1.7.4".

Then the following test fails:

parse_one("€", p".")   # any non-ASCII character
ERROR: BoundsError: attempt to access 3-element Array{UInt8,1}:
0xe2
0x82
0xac
at index [4]
in schedule_and_wait at task.jl:343
in consume at task.jl:259
in once at ~/.julia/v0.4/ParserCombinator/src/core/parsers.jl:182
in single_result at ~/.julia/v0.4/ParserCombinator/src/core/parsers.jl:193
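The three bytes in the error are the UTF-8 encoding of "€", and index 4 is one past its end. A plausible reading (an assumption, not confirmed against the package source) is that the parser steps to the position after the character and then reads it without an end-of-input check. A minimal sketch in current Julia (1.x), independent of ParserCombinator, of how stepping by character with `nextind` lands on that position:

```julia
# Plain Julia 1.x, no ParserCombinator: "€" is one Char but three UTF-8 bytes.
s = "€"
bytes = codeunits(s)                 # byte view of the string
@show collect(bytes)                 # the same three bytes as in the error

# Stepping by character: nextind jumps over the whole 3-byte encoding.
i = firstindex(s)                    # 1
j = nextind(s, i)                    # ncodeunits(s) + 1, i.e. 4
@show j

# Index 4 is a valid "end of input" position but not a readable one, so a
# scanner must test for the end before dereferencing it.
at_end(str, k) = k > ncodeunits(str)
@show at_end(s, j)

# Skipping that check produces the same kind of error as in the report.
try
    bytes[j]
catch err
    @show err isa BoundsError
end
```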

GML parsing error: Underscore in key name

I have a GML file that has underscores in key names:


        graph [
            directed 1
            id 42
            label "splice graph of s-exons"
        
                node [
                    id 1
                    label "start"
                    conservation 100.0
                    transcript_fraction 100.0
                    genes "ENSBTAG00000007876,ENSG00000107643,ENSGGOG00000011771,ENSMMUG00000004060,ENSMODG00000002193,ENSMUSG00000021936,ENSOANG00000012095,ENSRNOG00000020155,ENSSSCG00000010380,ENSXETG00000021691"            
                ]
.
.
.

This is failing with the following error:

ParserError{Int64}("Expected ] at (11,21)\n                    transcript_fraction 100.0\n                    ^\n", 253)

Stacktrace:
 [1] check_channel_state at ./channels.jl:125 [inlined]
 [2] take_unbuffered(::Channel{Any}) at ./channels.jl:327
 [3] take! at ./channels.jl:315 [inlined]
 [4] iterate(::Channel{Any}, ::Nothing) at ./channels.jl:395
 [5] iterate at ./channels.jl:394 [inlined]
 [6] once at /home/elin/.julia/packages/ParserCombinator/Rc0cd/src/core/parsers.jl:184 [inlined]
 [7] #single_result#36 at /home/elin/.julia/packages/ParserCombinator/Rc0cd/src/core/parsers.jl:192 [inlined]
 [8] (::getfield(ParserCombinator, Symbol("#kw##single_result#38")))(::NamedTuple{(:debug,),Tuple{Bool}}, ::getfield(ParserCombinator, Symbol("#single_result#38")){getfield(ParserCombinator, Symbol("##single_result#36#37")){Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}},UnionAll}}, ::String, ::Trace) at ./none:0
 [9] #parse_raw#6(::Bool, ::Function, ::String) at /home/elin/.julia/packages/ParserCombinator/Rc0cd/src/gml/GML.jl:80
 [10] #parse_raw at ./none:0 [inlined]
 [11] #parse_dict#9(::Bool, ::Array{Symbol,1}, ::Bool, ::Function, ::String) at /home/elin/.julia/packages/ParserCombinator/Rc0cd/src/gml/GML.jl:162
 [12] parse_dict(::String) at /home/elin/.julia/packages/ParserCombinator/Rc0cd/src/gml/GML.jl:162
 [13] loadgml(::IOStream, ::String) at /home/elin/.julia/packages/GraphIO/IpSAL/src/GML/Gml.jl:33
 [14] loadgraph at /home/elin/.julia/packages/GraphIO/IpSAL/src/GML/Gml.jl:95 [inlined]
 [15] #119 at /home/elin/.julia/packages/LightGraphs/HsNig/src/persistence/common.jl:15 [inlined]
 [16] #open#310(::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::Function, ::getfield(LightGraphs, Symbol("##119#120")){String,GraphIO.GML.GMLFormat}, ::String, ::Vararg{String,N} where N) at ./iostream.jl:369
 [17] open at ./iostream.jl:367 [inlined]
 [18] loadgraph(::String, ::String, ::GraphIO.GML.GMLFormat) at /home/elin/.julia/packages/LightGraphs/HsNig/src/persistence/common.jl:14
 [19] top-level scope at In[17]:1

I think that the problem is that the key regex doesn't support underscores:

"[a-zA-Z][a-zA-Z0-9]*"

Is there a reason to disallow underscores (or other symbols) in key names?
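A small comparison, in plain Julia, of the key pattern quoted above against a hypothetical relaxed pattern that allows underscores after the first letter. The `^...$` anchors are added here so each pattern must match a whole key; how the GML grammar applies the pattern inside the parser is not shown.

```julia
strict_key  = r"^[a-zA-Z][a-zA-Z0-9]*$"    # key pattern quoted in the issue
relaxed_key = r"^[a-zA-Z][a-zA-Z0-9_]*$"   # hypothetical: also allows underscores

for key in ["label", "conservation", "transcript_fraction"]
    println(rpad(key, 20), " strict=", occursin(strict_key, key),
            " relaxed=", occursin(relaxed_key, key))
end
```

Of the keys in the sample file, only `transcript_fraction` is rejected by the quoted pattern, which is consistent with the error pointing at that line.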

I can't execute the Example

I guess this example is out of date, because I get an error:
"ERROR: LoadError: syntax: extra token "Node" after end of expression"
How can I make a struct that reads the expression and gives me the result of the calculation?

TagBot trigger issue

This issue is used to trigger TagBot; feel free to unsubscribe.

If you haven't already, you should update your TagBot.yml to include issue comment triggers.
Please see this post on Discourse for instructions and more details.

If you'd like for me to do this for you, comment TagBot fix on this issue.
I'll open a PR within a few hours, please be patient!

can't define matcher?

I'm trying to define a grammar with a loop, so I'm using Delayed(). I get an error when trying to redefine the matcher for a delayed rule:

ERROR: `convert` has no method matching convert(::Type{Nullable{Matcher}}, ::Alt)

Here's the grammar:

expr         = Delayed()
doubley      = p"[-+]?[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?[dD]" > (x -> float64(x[1:end-1]))
floaty_dot   = p"[-+]?[0-9]*\.[0-9]+([eE][-+]?[0-9]+)?[Ff]" > (x -> float32(x[1:end-1]))
floaty_nodot = p"[-+]?[0-9]*[0-9]+([eE][-+]?[0-9]+)?[Ff]" > (x -> float32(x[1:end-1]))
floaty       = floaty_dot | floaty_nodot
expr.matcher = doubley | floaty

I'm basically copying what you have in calc.jl.

wrong handling of strings

Consider the following program:

s = """
digraph "g" {
}
"""
ast = DOT.parse_dot(s)[1]

s = """
digraph g {
}
"""
ast = DOT.parse_dot(s)[1]

Their ids should be different, but currently the parser does not parse the quotes; it just ignores them.

(Long-term) integration with Flow.jl?

I had a thought. Mike Innes's Flow.jl is looking promising: https://github.com/MikeInnes/Flow.jl. The package seems general enough to deal with any kind of Julia code, which is far more powerful than alternatives like TensorFlow.

Something that sort of bugs me when using ParserCombinator as a CFG-parser is the syntax. I sort of dislike writing, and even more dislike reading,

x = Delayed()
y = Star(x)
x.matcher = Seq(e"(", y, e")")

or similar. It would be cool if that could be written

matcher(@flow function()
    x = Seq(e"(", Star(x), e")")
end)

letting Flow.jl figure out what the graph looks like. What are your thoughts? I'd be happy to work on a prototype when/if I get the time.

fail to parse a dot program

This parser fails to parse the following DOT program, which is an example from https://graphviz.org/Gallery/directed/datastruct.html

digraph g {
fontname="Helvetica,Arial,sans-serif"
node [fontname="Helvetica,Arial,sans-serif"]
edge [fontname="Helvetica,Arial,sans-serif"]
graph [
rankdir = "LR"
];
node [
fontsize = "16"
shape = "ellipse"
];
edge [
];
"node0" [
label = "<f0> 0x10ba8| <f1>"
shape = "record"
];
"node1" [
label = "<f0> 0xf7fc4380| <f1> | <f2> |-1"
shape = "record"
];
"node2" [
label = "<f0> 0xf7fc44b8| | |2"
shape = "record"
];
"node3" [
label = "<f0> 3.43322790286038071e-06|44.79998779296875|0"
shape = "record"
];
"node4" [
label = "<f0> 0xf7fc4380| <f1> | <f2> |2"
shape = "record"
];
"node5" [
label = "<f0> (nil)| | |-1"
shape = "record"
];
"node6" [
label = "<f0> 0xf7fc4380| <f1> | <f2> |1"
shape = "record"
];
"node7" [
label = "<f0> 0xf7fc4380| <f1> | <f2> |2"
shape = "record"
];
"node8" [
label = "<f0> (nil)| | |-1"
shape = "record"
];
"node9" [
label = "<f0> (nil)| | |-1"
shape = "record"
];
"node10" [
label = "<f0> (nil)| <f1> | <f2> |-1"
shape = "record"
];
"node11" [
label = "<f0> (nil)| <f1> | <f2> |-1"
shape = "record"
];
"node12" [
label = "<f0> 0xf7fc43e0| | |1"
shape = "record"
];
"node0":f0 -> "node1":f0 [
id = 0
];
"node0":f1 -> "node2":f0 [
id = 1
];
"node1":f0 -> "node3":f0 [
id = 2
];
"node1":f1 -> "node4":f0 [
id = 3
];
"node1":f2 -> "node5":f0 [
id = 4
];
"node4":f0 -> "node3":f0 [
id = 5
];
"node4":f1 -> "node6":f0 [
id = 6
];
"node4":f2 -> "node10":f0 [
id = 7
];
"node6":f0 -> "node3":f0 [
id = 8
];
"node6":f1 -> "node7":f0 [
id = 9
];
"node6":f2 -> "node9":f0 [
id = 10
];
"node7":f0 -> "node3":f0 [
id = 11
];
"node7":f1 -> "node1":f0 [
id = 12
];
"node7":f2 -> "node8":f0 [
id = 13
];
"node10":f1 -> "node11":f0 [
id = 14
];
"node10":f2 -> "node12":f0 [
id = 15
];
"node11":f2 -> "node1":f0 [
id = 16
];
}

Trie-based matcher for `Alt(Equals.(a_long_list)...)`

Hi,
In the grammar I am writing, there are a few places where I need to match against one of a large selection of constants, which I will call a_long_list.
It might have 20 elements, or it might have 200.
In theory, I'm sure there are use cases for matching against one of thousands, or tens of thousands.

Alt(Equals.(a_long_list)...) works, but I understand that it will be O(n*m),
for n the length of the list and m the maximum length of any element of that list.
That honestly isn't too bad, I think.

But I figure it can be done better.
If a Trie is used, I think this becomes just O(m).
I might be screwing up my math here, but I think that in the process of finding the longest match, one automatically finds all the shorter matches, which can be saved in case backoff is required.
So there is no need to re-step through the source if one fails.

I've started working on this. I have the Trie stuff working to return which strings match; I just need to link it into a matcher with the trampoline machinery (Success/Fail/Execute).

What I currently propose is a matcher:

EqualsOneOf(values; greedy::Bool=true)

Here, values are the values that it could be equal to.
If greedy is false, this matches shortest-first, and if it has to back off, it gives the second-shortest string in values that matches, and so on.
This is the natural order for a Trie.

If greedy is true, it matches longest-first (the longest string in values first), and then, if it needs to back off from that, matches the second-longest, and so forth.
This is accomplished by collecting all the values that match as Trie keys and then reversing the order (which I guess does make this O(n+m)).

This is a bit less expressive than Alt(Equals.(a_long_list)...), since that lets you choose a priority for the matchers, not just longest-first or shortest-first.
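A minimal sketch of the lookup described above, assuming a hand-rolled character trie; the names `TrieNode`, `build_trie`, and `prefix_matches` are hypothetical and are neither ParserCombinator's API nor `DataStructures.Trie`. One pass over the input collects every value that matches at the start position, shortest-first, and reversing gives the greedy order. Wiring it into the Success/Fail/Execute trampoline is not shown.

```julia
# Hypothetical character trie; each node records whether some value ends there.
mutable struct TrieNode
    children::Dict{Char,TrieNode}
    terminal::Bool
end
TrieNode() = TrieNode(Dict{Char,TrieNode}(), false)

function build_trie(values)
    root = TrieNode()
    for v in values
        node = root
        for c in v
            node = get!(() -> TrieNode(), node.children, c)
        end
        node.terminal = true
    end
    return root
end

# Walk the input once from index i, recording every value that matches there.
# Matches are found shortest-first; greedy=true reverses them to longest-first,
# so a caller can back off to the next entry without rescanning the input.
function prefix_matches(root::TrieNode, s::AbstractString, i::Int; greedy::Bool=true)
    matches = String[]
    node = root
    k = i
    while k <= ncodeunits(s)
        child = get(node.children, s[k], nothing)
        child === nothing && break
        node = child
        node.terminal && push!(matches, String(s[i:k]))
        k = nextind(s, k)
    end
    return greedy ? reverse(matches) : matches
end

root = build_trie(["in", "int", "integer", "float"])
greedy_order   = prefix_matches(root, "integer x", 1)                # ["integer", "int", "in"]
shortest_order = prefix_matches(root, "integer x", 1; greedy=false)  # ["in", "int", "integer"]
```

Each step is one dictionary lookup per input character, so the walk is bounded by the length of the longest value rather than by the number of values.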

What do you think?
When I have it working, should I make a PR?

One issue is that right now, Tries only support strings.
JuliaCollections/DataStructures.jl#220

ParserCombinator is slow

I did some performance tests for reading a graph from a file using my package FatGraphs:

Pkg.clone("https://github.com/CarloLucibello/FatGraphs.jl")

For comparison, I write the graph in a simple text format (Pajek .net). Each of the following functions was run twice to exclude compilation time:

```julia
julia> g = Graph(100,1000,seed=1)
Graph{Int64}(100, 1000)

julia> @time writegraph("test.net",g)
  0.243075 seconds (37.02 k allocations: 1.487 MB)
1

julia> @time readgraph("test.net")
  0.003040 seconds (18.54 k allocations: 550.281 KB)
Graph{Int64}(100, 1000)
```
Notice how performance degrades when **reading** from a .dot or .gml file, which relies on ParserCombinator:
```julia
julia> @time writegraph("test.dot",g)
  0.001826 seconds (15.23 k allocations: 664.031 KB)
1

julia> @time readgraph("test.dot")
  2.408855 seconds (1.22 M allocations: 49.666 MB, 0.56% gc time)
Graph{Int64}(100, 1000)


julia> @time writegraph("test.gml",g)
  0.001426 seconds (16.83 k allocations: 789.031 KB)
1

julia> @time readgraph("test.gml")
  1.024898 seconds (511.11 k allocations: 18.279 MB, 0.64% gc time)
Graph{Int64}(100, 1000)
```
Perhaps ParserCombinator has some severe type-instability issues. Can those be avoided?

Bye,
Carlo 

0.6 deprecations

This is probably a big job, but:

WARNING: produce is now deprecated. Use Channels for inter-task communication.
Stacktrace:
 [1] depwarn(::String, ::Symbol) at ./deprecated.jl:64
 [2] produce(::Array{Any,1}) at ./deprecated.jl:884
 [3] #producer#27(::Bool, ::Function, ::ParserCombinator.NoCache{String,Int64}, ::ParserCombinator.Seq!) at /Users/seth/.julia/v0.6/ParserCombinator/src/core/parsers.jl:141
 [4] (::ParserCombinator.#kw##producer)(::Array{Any,1}, ::ParserCombinator.#producer, ::ParserCombinator.NoCache{String,Int64}, ::ParserCombinator.Seq!) at ./<missing>:0
 [5] (::ParserCombinator.##29#30{Bool,ParserCombinator.Seq!,ParserCombinator.NoCache{String,Int64}})() at /Users/seth/.julia/v0.6/ParserCombinator/src/core/parsers.jl:171
while loading /Users/seth/.julia/v0.6/LightGraphs/test/persistence/persistence.jl, in expression starting on line 106
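A minimal sketch of the Channel-based producer pattern that the warning recommends, in plain Julia 1.x; the values are stand-ins for successive parse results and this is not ParserCombinator's code. (The GML stack trace on this page, with its `channels.jl` frames under `once`, suggests later versions of the package did move to Channels.)

```julia
# The function passed to Channel runs in its own task; each put! hands one
# value to the consumer (the role produce used to play), and the channel
# is closed automatically when the function returns.
results = Channel() do ch
    for r in ([1], [1, 2], [1, 2, 3])   # stand-ins for successive parse results
        put!(ch, r)
    end
end

first_result = take!(results)   # takes the place of consume(task)
rest = collect(results)         # drains the remaining values until the channel closes
```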

Can't test GML

Getting

ERROR: LoadError: LoadError: UndefVarError: Parsers not defined

on Pkg.test("ParserCombinator"). Not sure how to import ParserCombinator.Parsers.GML.

It may have to do with the fact that some OSes (like OSX) are case-insensitive, and you've got both Parsers.jl and parsers.jl.

ETA: confirmed - when I rename parsers.jl we're all good.
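A small helper (hypothetical, plain Julia) that flags file names differing only in case, which is the situation described in this issue. Run it per directory, since only names in the same directory can clash.

```julia
# Group names by their lowercase form; any group with more than one entry
# would collide on a case-insensitive file system.
function case_collisions(names)
    groups = Dict{String,Vector{String}}()
    for n in names
        push!(get!(() -> String[], groups, lowercase(n)), n)
    end
    return [g for g in values(groups) if length(g) > 1]
end

# The two names from this issue, plus one that does not collide.
clashes = case_collisions(["Parsers.jl", "parsers.jl", "GML.jl"])
# For a real directory: case_collisions(readdir(dir))
```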

× or * instead of +?

Firstly, thanks for this package! I'm playing around with it and it's exceptionally easy to use.

This is a very minor matter (as it's just syntax), but it seems the Seq combinator used here is a Cartesian product (correct me if I'm wrong). Consequently, wouldn't * or × be a more natural choice of symbols? What are your thoughts?

Weird issue with regex and Eos()

I'm testing a simple Boolean regex:

julia> parse_one("false", p"([Tt][Rr][Uu][Ee])|([Ff][Aa][Ll][Ss][Ee])"+Eos())
1-element Array{Any,1}:
 "false"

works fine, but this is weird:

julia> parse_one("true", p"([Tt][Rr][Uu][Ee])|([Ff][Aa][Ll][Ss][Ee])"+Eos())
ERROR: ParserCombinator.ParserException("cannot parse")
 in once at /Users/john/.julia/v0.5/ParserCombinator/src/core/parsers.jl:184
 [inlined code] from /Users/john/.julia/v0.5/ParserCombinator/src/core/parsers.jl:169
 in single_result at /Users/john/.julia/v0.5/ParserCombinator/src/core/parsers.jl:193
 in eval at ./boot.jl:264

Without the Eos() it works:

julia> parse_one("true", p"([Tt][Rr][Uu][Ee])|([Ff][Aa][Ll][Ss][Ee])")
1-element Array{Any,1}:
 "true"

Any ideas?

P.S. Thanks for a great library!

v1.7.2 breaks DOT parsing

Pkg.update() to 1.7.2 and now getting

julia> Pkg.test("LightGraphs")
INFO: Testing LightGraphs
INFO: Recompiling stale cache file /Users/seth/.julia/lib/v0.5/AutoHashEquals.ji for module AutoHashEquals.
INFO: Recompiling stale cache file /Users/seth/.julia/lib/v0.5/LightGraphs.ji for module LightGraphs.
INFO: Recompiling stale cache file /Users/seth/.julia/lib/v0.5/ParserCombinator.ji for module ParserCombinator.
running /Users/seth/.julia/v0.5/LightGraphs/test/operators.jl ...
running /Users/seth/.julia/v0.5/LightGraphs/test/graphdigraph.jl ...
running /Users/seth/.julia/v0.5/LightGraphs/test/persistence.jl ...
ERROR: LoadError: LoadError: ParserCombinator.ParserException("cannot parse")
 in once at /Users/seth/.julia/v0.5/ParserCombinator/src/core/parsers.jl:184
 [inlined code] from /Users/seth/.julia/v0.5/ParserCombinator/src/core/parsers.jl:169
 in single_result at /Users/seth/.julia/v0.5/ParserCombinator/src/core/parsers.jl:193
 in parse_dot at /Users/seth/.julia/v0.5/ParserCombinator/src/dot/DOT.jl:216
 in readdot at /Users/seth/.julia/v0.5/LightGraphs/src/persistence/dot.jl:26
 in readdot at /Users/seth/.julia/v0.5/LightGraphs/src/persistence/dot.jl:25
 in include at ./boot.jl:261
 in include_from_node1 at ./loading.jl:392
 [inlined code] from /Users/seth/.julia/v0.5/LightGraphs/test/runtests.jl:81
 in anonymous at ./no file:4294967295
 in include at ./boot.jl:261
 in include_from_node1 at ./loading.jl:392
 in process_options at ./client.jl:277
 in _start at ./client.jl:377
while loading /Users/seth/.julia/v0.5/LightGraphs/test/persistence.jl, in expression starting on line 45
while loading /Users/seth/.julia/v0.5/LightGraphs/test/runtests.jl, in expression starting on line 78

Any ideas? I'll do some digging also.

Edit: also happening on 0.4...

funccall always parsed instead of funcdef because it's shorter

Hello and thank you for this great library. I'm trying to parse a language with fat arrow funcdefs as in:

a() => 1

However, they are never used, because funccall matches a shorter version of the same input:

a()

I've verified they both work as removing funccall allows funcdef to work again. Is there any way to set precedence or otherwise allow funcdef to be used? Thanks for any advice or feedback.

Here are the relevant parser definitions:

arglist = (name | (name + E","))[0:end]
funccall = name + E"(" + arglist + E")" |> FuncCall

paramlist = ((name | assign) | ((name | assign) + E","))[0:end]
funcbody = stmt | (whitespacereq + stmt)[1:end]
funcdef = name + E"(" + paramlist + E")" + E"=>" + E"\n"[0:end] + funcbody |> FuncDef
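A general technique for this situation, sketched with toy parsers in plain Julia rather than ParserCombinator's own `Alt` (the helpers `lit`, `seq`, and `alt` are hypothetical): in an ordered choice, list the longer alternative first so the shorter one is only a fallback. How ParserCombinator's backtracking interacts with the rest of this grammar is not shown in the issue, so this only illustrates the ordering idea.

```julia
# Toy parsers (hypothetical): each takes the input string and a start index,
# and returns the index just past the match, or nothing on failure.
lit(t) = (s, i) -> startswith(s[i:end], t) ? i + ncodeunits(t) : nothing

# Sequence: run parsers one after another, failing if any of them fails.
seq(ps...) = (s, i) -> begin
    for p in ps
        i = p(s, i)
        i === nothing && return nothing
    end
    return i
end

# Ordered choice: return the result of the first alternative that succeeds.
alt(ps...) = (s, i) -> begin
    for p in ps
        j = p(s, i)
        j === nothing || return j
    end
    return nothing
end

funccall = seq(lit("a"), lit("("), lit(")"))
funcdef  = seq(funccall, lit(" => "), lit("1"))

short_first = alt(funccall, funcdef)   # succeeds after "a()", never reaches funcdef
long_first  = alt(funcdef, funccall)   # tries the longer form first
```

With funcdef listed first, `long_first("a() => 1", 1)` consumes the whole input, while a bare `"a()"` still falls through to funccall.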

Wrong Floating number regex

The current implementation

PFloat64() = Parse(p"-?(\d*\.?\d+|\d+\.\d*)([eE]\d+)?", Float64)

is missing the optional sign for the exponent: "-?(\d*\.?\d+|\d+\.\d*)([eE]-?\d+)?"
An implementation of PFloat16 is also missing.
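A quick check, in plain Julia, of the pattern quoted above against the proposed fix, plus a variant using `[-+]?` that also accepts an explicit plus sign in the exponent (that variant is an addition here, not from the issue). The `^...$` anchors force a whole-token match for the comparison.

```julia
old_float    = r"^-?(\d*\.?\d+|\d+\.\d*)([eE]\d+)?$"      # pattern quoted in the issue
new_float    = r"^-?(\d*\.?\d+|\d+\.\d*)([eE]-?\d+)?$"    # fix proposed in the issue
signed_float = r"^-?(\d*\.?\d+|\d+\.\d*)([eE][-+]?\d+)?$" # also accepts "e+10"

for t in ["1.5", "1.5e10", "1.5e-10", "1.5e+10"]
    println(rpad(t, 8), " old=", occursin(old_float, t),
            " new=", occursin(new_float, t), " signed=", occursin(signed_float, t))
end

# Once the token is matched, Base can convert it directly.
value = parse(Float64, "1.5e-10")
```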

abandoned or live?

Hi, I was wondering whether this project is alive or has been abandoned. I wanted to try to use a combinator-based parser in Julia and found this one. However, there haven't been any git commits for a while. I've hit a snag where StarPlus tends to get stuck in stupidly deep recursions, and I have also found the error reporting a bit difficult to follow, but I don't really want to end up as the only user/developer of a library.

Alternative? Maintenance?

Is there an alternative to this package, or any will to make it work with current Julia?

The Example yields:

┌ Info: Precompiling ParserCombinator [fae87a5f-d1ad-5cf0-8f61-c941e1580b46]
└ @ Base loading.jl:1260
syntax: extra token "Node" after end of expression
