Coder Social home page Coder Social logo

prataprc / goparsec Goto Github PK

View Code? Open in Web Editor NEW
275.0 14.0 23.0 1.48 MB

Parser combinator in Go. If there are any cross platform issues or backward compatibility issues, please reach out.

Home Page: http://prataprc.github.io/astquery.io

License: Other

Go 99.44% Makefile 0.56%
parser-combinators parser golang parsercombinator parsec

goparsec's People

Contributors

gitter-badger avatar joneshf avatar leovailati avatar prataprc avatar ptxmac avatar tokenshift avatar tompaton avatar wiiittttt avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

goparsec's Issues

AST Terminals and Non-Terminals.

Create a new AST struct that supports all combinators, in addition to that it can treat Non-terminals as a separate interface.

Allow Nodify functions to decide a match.

In the present implementation, standard combinators like AND, OR etc.. try to match the input text with parser function, and if there is a match it dispatches the Nodify function subscribed for that combinator.

Proposed solution is to allow the Nodify callback to return a nil if it decides that the matching nodes, though they succeed the combinator rule, fall under exceptional cases. In such cases, the standard combinator should proceed in such a manner that the match is a failure. And importantly I should not consume the scanned text.

Unicode support.

Default Scanner (SimpleScanner) and Terminal parser functions should have unicode support, along with utf8 encoding.

NewScanner with SetWsPattern does not skip whitespace when used with And

I was expecting that two should be matched since it should escape the whitespace but it did not. Also the whitespace should be already be set by default which means that there should be no need to call SetWSPattern

	s := NewScanner([]byte("one two three")).SetWSPattern(" ")
	v, e := p(s)
	if v == nil {
		t.Errorf("End() didn't match %q", e)
	}```


Error message:
End() didn't match &{"one two three" '\x00' '\x01' map["^ ":%!q(*regexp.Regexp=&{^  0xc0003bef00 <nil> 0 65536 []  [] 0 0 0 2 false 4 1 false})] " " %!q(bool=false)}

There are few places in tokeniser where it calls SkipWS but not used that Scanner returned by that function. For example
https://github.com/prataprc/goparsec/blob/master/tokeniser.go#L85

Feature Request: Expose ability to set whitespace pattern for SimpleScanner.SkipWS()

It seems like if you want to use the SimpleScanner, but override the default whitespace pattern, you need to copy all of the code for the scanner locally and change the pattern. It would be great if there was a simple way to change the pattern via the packages API (either a global variable, public struct member, etc.)

If there is currently a way to do this already, please let me know.

Tab completion.

Investigate whether goparsec can be used for tab completion when used for parsing text inputs, like in command line or web forms.

Thoughts on an assertion library?

Your tests are littered with assertions that look like this:

node, _ := ast.Parsewith(y, s)
if node.GetValue() != "onetwothree" {
	t.Errorf("unexpected: %v", node.GetValue())
} else if _, ok := node.GetChildren()[2].(MaybeNone); ok == false {
	t.Errorf("expected MaybeNone")
}

they could look like this:

node, _ := ast.Parsewith(y, s)
require.Equal(t, "onetwothree", node.GetValue())
require.IsType(t, MaybeNone{}, node.GetChildren()[2])

which is much, much easier to read.

Personally I'm a fan of https://github.com/stretchr/testify. Unlike some of the other "testing frameworks" it is still just using the standard go testing signatures.

Review downstream pull request from 5b7abca

a. parsec_test.go:TestStrEOF(), changes where removed check whether it needs to be brought back.
b. terminal.go:String(), implementation has changed this seem to correlate with (a.)
c. terminal.go:Token(): check on the implication of using SkipAny() instead of SkipWS().
d. terminal.go:TokenOrdChoice().

How to write parser to extract elements from an array

Hi,

I am writing a service where I have to parse a array of bytes. Now, different portions of the array contain different fields which I need to extract. I am wondering if parser-combinator is a good candidate to approach this.

For eg- if I have array of 10 elements. 1-5 signify field1, 6-7 is field2, 8-10 is field3. And I want to pass the array and get those 3 fields. How would I do it using this library ?

P.S. - Sorry if this is a n00b question. I am trying with familiarize myself with the concept.

Thanks !

Implement Atom() parser similar to Token().

@leovailati demonstrated that MatchString() method, that matches with i/p string
byte-by-byte, is lot faster than the Match() method from Scanner.

Token() parser takes a regular expression, and matches it with i/p token.
Similarly implement an Atom() parser that uses MatchString() that can match faster
than Token's Match().

Project status; Panic in parsec.String() with unterminated string

Hi @prataprc, first off: is this project still maintained? Im guessing that the Travis CI is no longer working and that the version of go (1.13) means that it wont build with modern toolchain?

If I run an unterminated string through the String() parser, it panics with runtime error: index out of range [6] with length 6:

data := []byte(`"hello`)
scanner := parsec.NewScanner(data)
node, _ := parsec.String()(scanner)
print(node)

I think the issue is in tokenizer.go, in method scanString (suggested changes in comments):

func scanString(txt []byte) (tok []byte, readn int) {
	if len(txt) < 2 {
		return nil, 0
	}

	e := 1
	for txt[e] != '"' {
		c := txt[e]
		if c == '\\' || c == '"' || c < ' ' {
			break
		}
		if c < utf8.RuneSelf {
			e++
			// SUGGESTED CHANGE 1
			// if e >= len(txt) {
			// 	return nil, 0
			// }
			continue
		}
		r, size := utf8.DecodeRune(txt[e:])
		if r == utf8.RuneError && size == 1 {
			return nil, 0
		}
		e += size

		// SUGGESTED CHANGE 2
		// if e >= len(txt) {
		// 	return nil, 0
		// }
	}

	// ...

Is there an appetite for bringing this project up-to-date and fixing this issue?

And(x, Maybe(And(y, z))) is broken

Consider this simplified grammar for addition:

Add ::= Int ("+" Int)?

Trying to express it with goparsec as follows:

package test

import (
	"testing"
	"github.com/prataprc/goparsec"
)

func TestParsec(t *testing.T) {
	intP := goparsec.Int()
	plusP := goparsec.Token("+", "op")
	addP := goparsec.And(nil, intP, goparsec.Maybe(nil, goparsec.And(nil, plusP, intP)))
	//addP := goparsec.OrdChoice(nil, goparsec.And(nil, intP, plusP, intP), intP) // (1)
	//addP := goparsec.And(nil, intP, plusP, intP) // (2)
	ast := goparsec.NewAST("expr", 32)
	s := goparsec.NewScanner([]byte("1 + 2"))
	_, err := ast.Parsewith(addP, s)
	if err != nil {
		t.Fatal(err)
	}
}

Gives the following error:

=== RUN   TestParsec
--- FAIL: TestParsec (0.00s)
panic: interface conversion: []parsec.ParsecNode is not parsec.Queryable: missing method GetAttribute [recovered]
	panic: interface conversion: []parsec.ParsecNode is not parsec.Queryable: missing method GetAttribute

goroutine 6 [running]:
testing.tRunner.func1.1(0x52cb80, 0xc00007cdb0)
	/nix/store/i14hzzs9ww364b5zl4c4g503rarwk09g-go-1.14.4/share/go/src/testing/testing.go:940 +0x2f5
testing.tRunner.func1(0xc000136120)
	/nix/store/i14hzzs9ww364b5zl4c4g503rarwk09g-go-1.14.4/share/go/src/testing/testing.go:943 +0x3f9
panic(0x52cb80, 0xc00007cdb0)
	/nix/store/i14hzzs9ww364b5zl4c4g503rarwk09g-go-1.14.4/share/go/src/runtime/panic.go:969 +0x166
PATH/vendor/github.com/prataprc/goparsec.(*AST).Parsewith(0xc00008df38, 0xc00007cb70, 0x583140, 0xc00007a230, 0xc00007cb70, 0x55, 0x37e, 0x4c0c30)
	PATH/vendor/github.com/prataprc/goparsec/ast.go:90 +0xae
PATH/test.TestParsec(0xc000136120)
	PATH/parser_test.go:19 +0x31e
testing.tRunner(0xc000136120, 0x55f4d0)
	/nix/store/i14hzzs9ww364b5zl4c4g503rarwk09g-go-1.14.4/share/go/src/testing/testing.go:991 +0xdc
created by testing.(*T).Run
	/nix/store/i14hzzs9ww364b5zl4c4g503rarwk09g-go-1.14.4/share/go/src/testing/testing.go:1042 +0x357

Process finished with exit code 1

Replacing the parser with (1) gives the same error, replacing with (2) works, but it is not equivalent (obviously).

When parsec.String() attempts to scan at the end of the input buffer, a panic occurs.

I have a grammar of the form (simplified):

expr = term+
term = ident | string

Where ident = parsec.Ident() and string = parsec.String(). term+ is implemented using parsec.Many(). Terms are repeatedly consumed from the input until the buffer is completely consumed; at which point parsec.String() panics at terminals.go:24, due to the cursor having passed the end of the buffer.

Minor: Comments are wrong in JSON example

// False is alias for string type denoting JSON `null`
type False string

// Num is alias for string type denoting JSON `null`
type Num string

// String is alias for string type denoting JSON `null`
type String string

Error reporting

Is there a way to report errors when input cannot be parsed?

how to define recursive grammar?

var (
    UnaryExpr  = Ord(And(LPAREN, Expr, RPAREN))
    Expr       = Ord(UnaryExpr, BinaryExpr)
)

as you see, UnaryExpr and Expr refer to each other.
this library won't work because when we define UnaryExpr, Expr is still uninitialized.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.