lulpeg's Introduction

LuLPeg

/ˈluːɛlpɛɡ/ https://github.com/pygy/LuLPeg/

A pure Lua port of LPeg, Roberto Ierusalimschy's Parsing Expression Grammars library.

This version of LuLPeg emulates LPeg v0.12.

See http://www.inf.puc-rio.br/~roberto/lpeg/ for the original and its documentation.

Feedback most welcome.

Usage:

lulpeg.lua is a drop-in replacement for LPeg and re.

local lulpeg = require"lulpeg"
local re = lulpeg.re

-- from here use LuLPeg as you would use LPeg.

pattern = lulpeg.C(lulpeg.P"A" + "B") ^ 0
print(pattern:match"ABA") --> "A" "B" "A"

If you plan to fall back on LuLPeg when LPeg is not present, putting the following at the top level of your program will make the substitution transparent:

local success, lpeg = pcall(require, "lpeg")
lpeg = success and lpeg or require"lulpeg":register(not _ENV and _G)

:register(tbl) sets package.loaded.lpeg and package.loaded.re to their LuLPeg counterparts. If a table is provided, it also populates that table with the lpeg and re fields.
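The fallback above can also be wrapped in a small helper that reports which backend was picked. This is only a sketch: load_peg is a hypothetical name, and the assumption that :register() returns the library table is inferred from the one-liner above, not from separate documentation.

```lua
-- Hypothetical helper: prefer LPeg, fall back on LuLPeg, tolerate neither.
local function load_peg()
  local ok, lib = pcall(require, "lpeg")
  if ok then return lib, "lpeg" end
  ok, lib = pcall(require, "lulpeg")
  if ok then
    -- Same call as in the snippet above; assumed to return the library.
    return lib:register(not _ENV and _G), "lulpeg"
  end
  return nil, "none"
end

local lpeg, backend = load_peg()
print("PEG backend:", backend)
```

Knowing the backend name can help when a bug report depends on whether LPeg or LuLPeg was actually in use.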

Compatibility:

Lua 5.1, 5.2 and LuaJIT are supported.

Main differences with LPeg:

This section assumes that you are familiar with LPeg and its official documentation.

LuLPeg passes most of the LPeg test suite: 6093 assertions succeed, 70 fail.

None of the failures are caused by semantic differences. They are related to grammar and pattern error checking, stack handling, and garbage collection of Cmt capture values.

LuLPeg does not check for infinite loops in patterns, reference errors in grammars, or stray references outside of grammars. If you want that kind of feedback, it should not be used for grammar development at the moment, only as a substitute once you have got the grammar right.

Barring bugs, all grammars accepted by LPeg should work with LuLPeg, with the following caveats:

  • The LuLPeg stack is the Lua call stack. lpeg.setmaxstack(n) is a dummy function, present for compatibility. LuLPeg patterns are compiled to Lua functions, using a parser combinator approach. For example, C(P"A" + P"B"):match"A" pushes at most three functions on the call stack: one for the C capture, one for the + choice, and one for the P"A". If P"A" had failed, it would have been popped and P"B" would have been pushed.

  • LuLPeg doesn't do any tail call elimination at the moment. Grammars that implement finite automata with long loops, and that run fine with LPeg, may trigger stack overflows. This point is high on my TODO list.

The next example works fine in LPeg, but blows up in LuLPeg.

-- create a grammar for a simple DFA for even number of 0s and 1s
--
--  ->1 <---0---> 2
--    ^           ^
--    |           |
--    1           1
--    |           |
--    V           V
--    3 <---0---> 4
--
-- this grammar should keep no backtracking information

local m = require"lpeg" -- swap in require"lulpeg" to reproduce the overflow

p = m.P{
  [1] = '0' * m.V(2) + '1' * m.V(3) + -1,
  [2] = '0' * m.V(1) + '1' * m.V(4),
  [3] = '0' * m.V(4) + '1' * m.V(1),
  [4] = '0' * m.V(3) + '1' * m.V(2),
}

assert(p:match(string.rep("00", 10000)))
assert(p:match(string.rep("01", 10000)))
assert(p:match(string.rep("011", 10000)))
assert(not p:match(string.rep("011", 10000) .. "1"))
assert(not p:match(string.rep("011", 10001)))

  • During match time, LuLPeg may keep some garbage longer than needed, and certainly longer than what LPeg does, including the values produced by match-time captures (Cmt()). Not all garbage is kept around, though, and all of it is released after match() returns. In all cases, pattern / 0 and Cmt(pattern, function() return true end) discard all captures made by pattern.
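To make the first two caveats concrete, here is a minimal parser combinator sketch in plain Lua. It is not LuLPeg's code: lit, choice, capture and count_as are hypothetical names, and the matcher signature (subject, position, capture list) is an assumption made for illustration. It shows how nested patterns become nested Lua calls, and why a rule that recurses once per input character can exhaust the Lua call stack when the recursive call is not a tail call.

```lua
-- Each constructor returns a matcher: function(subject, pos, caps) -> next position or nil.
local function lit(s)
  return function(subject, pos)
    if subject:sub(pos, pos + #s - 1) == s then
      return pos + #s
    end
    return nil
  end
end

-- Ordered choice: try a, and only if it fails try b.
local function choice(a, b)
  return function(subject, pos, caps)
    return a(subject, pos, caps) or b(subject, pos, caps)
  end
end

-- Simple capture: record the substring matched by p.
local function capture(p)
  return function(subject, pos, caps)
    local next_pos = p(subject, pos, caps)
    if next_pos then caps[#caps + 1] = subject:sub(pos, next_pos - 1) end
    return next_pos
  end
end

-- Rough equivalent of C(P"A" + P"B"): capture -> choice -> lit, three frames deep.
local pattern = capture(choice(lit"A", lit"B"))
local caps = {}
local next_pos = pattern("B", 1, caps)
print(next_pos, caps[1]) --> 2	B

-- A rule that recurses once per character, outside tail position,
-- needs one stack frame per character of input.
local function count_as(subject, pos)
  if subject:sub(pos, pos) ~= "a" then return 0 end
  return 1 + count_as(subject, pos + 1)
end

local ok = pcall(count_as, string.rep("a", 1000000), 1)
print(ok) --> false (stack overflow)
```

Since the overflow is an ordinary Lua error, wrapping a match in pcall is one way to fail gracefully on very long inputs.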

Locales

LuLPeg only supports the basic C locale, defined as follows (in LPeg terms):

locale["cntrl"] = R"\0\31" + "\127"
locale["digit"] = R"09"
locale["lower"] = R"az"
locale["print"] = R" ~" -- 0x20 to 0x7e
locale["space"] = S" \f\n\r\t\v" -- \f == form feed (for a printer), \v == vtab
locale["upper"] = R"AZ"

locale["alpha"]  = locale["lower"] + locale["upper"]
locale["alnum"]  = locale["alpha"] + locale["digit"]
locale["graph"]  = locale["print"] - locale["space"]
locale["punct"]  = locale["graph"] - locale["alnum"]
locale["xdigit"] = locale["digit"] + R"af" + R"AF"
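As a sanity check on the derived classes, the definitions above can be mirrored with plain byte sets, without any PEG library. The helpers below (range, chars, union, diff, size) are illustrative names only, not part of LPeg or LuLPeg.

```lua
-- Build a set of byte values for an inclusive character range.
local function range(lo, hi)
  local set = {}
  for b = lo:byte(), hi:byte() do set[b] = true end
  return set
end

-- Build a set of byte values from the characters of a string.
local function chars(s)
  local set = {}
  for i = 1, #s do set[s:byte(i)] = true end
  return set
end

local function union(a, b)
  local set = {}
  for k in pairs(a) do set[k] = true end
  for k in pairs(b) do set[k] = true end
  return set
end

local function diff(a, b)
  local set = {}
  for k in pairs(a) do
    if not b[k] then set[k] = true end
  end
  return set
end

local function size(set)
  local n = 0
  for _ in pairs(set) do n = n + 1 end
  return n
end

local digit = range("0", "9")
local alpha = union(range("a", "z"), range("A", "Z"))
local alnum = union(alpha, digit)
local printable = range(" ", "~")       -- 0x20 to 0x7e
local space = chars(" \f\n\r\t\v")
local graph = diff(printable, space)    -- only " " is both printable and space
local punct = diff(graph, alnum)

print(size(printable), size(graph), size(punct)) --> 95	94	32
```

The 32 members of punct are the usual ASCII punctuation characters, which is what the C locale is expected to give.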

Pattern identity and factorization:

The following is true with LuLPeg, but not with LPeg:

assert( P"A" == P"A" )

assert( P"A"*"B" == P"AB" )

assert( C(P"A") == C(P"A") )

assert( C(P"A") + C(P"B") == C(P"A" + "B") )
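One way such identities can arise is by interning patterns in a cache and normalizing them at construction time. The sketch below only illustrates that idea; it is not taken from the LuLPeg source, and the table layout and the seq helper are assumptions.

```lua
-- Weak-valued cache: an interned pattern can be collected once nothing else uses it.
local cache = setmetatable({}, { __mode = "v" })

-- Hypothetical constructor: the same string always yields the same table.
local function P(s)
  local pattern = cache[s]
  if not pattern then
    pattern = { kind = "literal", value = s }
    cache[s] = pattern
  end
  return pattern
end

-- Hypothetical sequence constructor: two literals are merged into one literal.
local function seq(a, b)
  if a.kind == "literal" and b.kind == "literal" then
    return P(a.value .. b.value)
  end
  return { kind = "sequence", a, b }
end

local ab = seq(P"A", P"B")
print(P"A" == P"A", ab == P"AB") --> true	true
```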

re.lua:

re.lua can be accessed as follows:

lulpeg = require"lulpeg"
re = lulpeg.re

If you call lulpeg:register(), you can also require"re" as you would with LPeg.

No auto-globals in Lua 5.1:

In Lua 5.1, LPeg creates globals when require"lpeg" and require"re" are called, as per the module() pattern. You can emulate that behaviour by passing the global table to lulpeg:register(), or, obviously, by creating the globals yourself :).

For Lua 5.1 sandboxes without proxies:

If you want to use LuLPeg in a Lua 5.1 sandbox that doesn't provide newproxy() and/or debug.setmetatable(), the #pattern syntax will not work for lookahead patterns. We provide the L() function as a fallback. Replace #pattern with L(pattern) in your grammar and it will work as expected.
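The reason behind this limitation is most likely that Lua 5.1 does not honour the __len metamethod on plain tables, only on userdata, which is what newproxy() and debug.setmetatable() give access to. The following check, which uses no LuLPeg code, shows the behaviour of the interpreter it runs on:

```lua
-- A table with a __len metamethod.
local proxy = setmetatable({}, { __len = function() return 42 end })

-- Lua 5.1 (and LuaJIT without 5.2 compatibility) ignores __len on tables
-- and returns the raw length, 0. Lua 5.2 and later return 42.
local len = #proxy
print(_VERSION, len)
```

When this prints 0 and no userdata proxy can be created, an overloaded # operator cannot be provided by a pure Lua library, hence the explicit L(pattern) form.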

"Global" mode for exploration:

LuLPeg:global(_G or _ENV) sets LuLPeg as the __index of the current environment, sparing you from aliasing each LPeg command manually. This is useful if you want to explore LPeg at the command line, for example.

require"lulpeg":global(_G or _ENV)

pattern = Ct(C(P"A" + "B") ^ 0)
pattern:match"BABAB" --> {"B", "A", "B", "A", "B"}
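The __index mechanism that global mode relies on can be tried in isolation. In this sketch, lib is a stand-in table rather than LuLPeg itself, and env is a private table instead of the real global environment, so nothing global is modified.

```lua
-- Stand-in for the library; the real one exposes P, C, Ct, etc.
local lib = { P = function(s) return "pattern:" .. s end }

-- Stand-in for the environment table.
local env = { print = print }
setmetatable(env, { __index = lib })

print(env.P("A"))          --> pattern:A  (missing key falls through to lib)
print(rawget(env, "P"))    --> nil        (nothing was copied into env)

env.P = "mine"
print(env.P)               --> mine       (your own definitions take precedence)
```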

UTF-8

The preliminary version of this library supported UTF-8 out of the box, but bitrot crept into that part of the code. I can look into it on request, though.

Performance:

LuLPeg with Lua 5.1 and 5.2 is ~100 times slower than the original.

With LuaJIT in JIT mode, it is from ~2 to ~10 times slower. The exact performance is unpredictable. Tiny changes in code, not necessarily related to the grammar, or a different subject string, can have a 5x impact. LPeg grammars are branchy by nature, and this kind of code doesn't lend itself very well to tracing JIT compilation. Furthermore, LuaJIT uses probabilistic/speculative heuristics to choose what to compile. These are influenced by the memory layout, among other things, hence the unpredictability.

LuaJIT with the JIT compiler turned off is ~50 times slower than LPeg.
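Given that variability, it is worth timing your own grammar on your own inputs. Below is a minimal timing harness based on os.clock; bench is a hypothetical helper, and the string.find call is only a placeholder for whatever pattern:match(subject) call you want to measure.

```lua
-- Average CPU time per call of fn, in seconds.
local function bench(fn, iterations)
  local start = os.clock()
  for _ = 1, iterations do fn() end
  return (os.clock() - start) / iterations
end

local subject = string.rep("AB", 1000)

-- Placeholder workload; replace with e.g. function() return pattern:match(subject) end
local per_call = bench(function() return subject:find("^[AB]*$") end, 100)
print(("%.6f s per call"):format(per_call))
```

Under LuaJIT, running the workload a few times before measuring gives the tracing compiler a chance to kick in, and repeating the measurement helps expose the run-to-run spread.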

License:

Copyright (C) Pierre-Yves Gerardy. Released under the Romantic WTF Public License.

The re.lua module and the test suite (tests/lpeg...tests.lua) are part of the original LPeg distribution, released under the MIT license.

See the LICENSE file for the details.

lulpeg's People

Contributors

aleclarson, khatskevich, pygy, pzduniak, seriane, serprex, solemnwarning, tst2005

lulpeg's Issues

Some tests fail with LuLPeg?

Hello it's me again,

I see you have your own way to pack lulpeg.lua.
I tried to pack the LuLPeg components (src/*.lua) with my own lua-aio util.
To compare your lulpeg.lua to my lulpeg-aio.lua, I ran the LPeg 0.12 test suite.

I got the same result with both, which is good...
... but some tests fail.
Unfortunately I cannot compare LuLPeg to LPeg, because I only have LPeg v0.10, which completely fails the 0.12 tests.

  • Does LuLPeg fail some of the LPeg 0.12 tests?
  • Is this known?
  • Do you get the same result?

lulpeg/tests$ for lua in luajit lua5.1 lua5.2; do for lulpeg in lulpeg lulpeg-aio; do echo "testing $lulpeg with $lua"; $lua lpeg.0.12.test.lua $lulpeg |tail -1; done; done
testing lulpeg with luajit
OK      Success:        6093    Failures:       70
testing lulpeg-aio with luajit
OK      Success:        6093    Failures:       70
testing lulpeg with lua5.1
OK      Success:        6093    Failures:       70
testing lulpeg-aio with lua5.1
OK      Success:        6093    Failures:       70
testing lulpeg with lua5.2
OK      Success:        6092    Failures:       71
testing lulpeg-aio with lua5.2
OK      Success:        6092    Failures:       71

UTF-8 support.

Changed the charset to UTF-8 and fixed the utf8.char function, but I still get an error.

[GCompute GLua STEAM_0:1:513332898 (ekau) -> STEAM_0:1:513332898 (ekau)]
@!!! Match !!!@	userdata: 0x4db81898
------------------------------
set

Print pattern
set_Repr	table: 0x40986880	1072,1073,1074,1075,1076,1077,1078,1079,1080,1081,1082,1083,1084,1085,1086,1087,1088,1089,1090,1091,1092,1093,1094,1095,1096,1097,1098,1099,1100,1101,1102,1103
S( "абвгдежзийклмнопрстуфхцчшщъыьэюя" )
--- /pprint

!!! Done Matching !!! success: 	false	final position	1	final cap index	1	#caps	0
Failed
nil

print(package.loaded.lulpeg.R("ая"):dmatch("е"))

Tag current version

Hi. I've imported your rockspec in tarantool rocks as the scm-1 version.
I would also like to release something like 0.1.0-1.rockspec, but there are no tags in your repo.
Could you please tag the latest version so that it can be referenced as 0.1?

installation error

Hi,

I'm trying to install LuLPeg with Lua 5.3 and I'm getting the following error:

./make.sh 
lua: ./init.lua:41: attempt to call a nil value (global 'map')
stack traceback:
	./init.lua:41: in main chunk
	[C]: in function 'require'
	../scripts/pack.lua:3: in main chunk
	[C]: in ?

Upgrade to Lpeg 1.0

Currently Lpeg 0.12 is emulated.
1.0 has been released (mostly bug fixes)

leg test suite fail with lua 5.2 + LuLPeg

Hello,

I tested leg, which is made to be used with LPeg.
All tests passed with:

  • lua 5.1 + LPeg
  • lua 5.1 + LuLPeg
  • luajit(5.1) + LPeg
  • luajit(5.1) + LuLPeg
  • lua 5.2 + LPeg

Except:

  • lua 5.2 + LuLPeg

lua5.2: ./lulpeg.lua:786: rule 'ID' used outside a grammar
stack traceback:
        [C]: in function 'error'
        ./lulpeg.lua:786: in function 'seq1'
        [string "Sequence"]:4: in function 'matcher'
        ./lulpeg.lua:495: in function <./lulpeg.lua:480>
        (...tail calls...)
        test_parser.lua:34: in function 'TEST'
        test_parser.lua:54: in main chunk
        [C]: in function 'dofile'
        test.lua:12: in main chunk
        [C]: in ?

Install error: permission denied (with sudo)

sudo -H luarocks install lulpeg:

Installing https://luarocks.org/lulpeg-0.1.0-1.src.rock
scripts/make.sh
sh: scripts/make.sh: Permission denied

Error: Build error: Failed building.

Nondeterministic behaviour when parsing

Hello

I'm using LuLPeg to parse a C-like template language. I'm not sure if this is down to an undiagnosed problem in my code or in LuLPeg itself, but if I run my test suite repeatedly it usually passes but sometimes fails (I have seen random CI failures from this too). As far as I've been able to debug so far, when there are multiple possible matches (i.e. a + b + c) in the grammar, it is testing the earlier options, but sometimes choosing later ones instead.

My code is here: https://github.com/solemnwarning/rehex/blob/c160470aca8966fb8949dd8eff3321e9894f71b6/plugins/binary-template/parser.lua

I've been testing it using the parser_spec.lua in the same directory as follows:

ok=0; for i in $(seq 1 100); do busted parser_spec.lua && ok=$[$ok+1]; done; echo ok = $ok

It usually fails 5-10% of the time when using LuLPeg, but so far I haven't been able to make it fail once using LPeg.

EDIT: I have also seen it get stuck in either recursion or an infinite loop a few times, but that is very rare.

Adding number of replacement in gsub function of re module

Hello. It would be very useful if you added a new parameter to the gsub function of the re module to limit the number of replacements made in a line. Python's regexp library and Lua's string module both have this feature, so I would like to have it in your library too. Thanks.
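For reference, this is the behaviour of the standard string.gsub that the request alludes to: its optional fourth argument caps the number of substitutions, and the second return value reports how many were made. The example uses only the Lua standard library and says nothing about how re.gsub could implement the same thing.

```lua
-- Replace at most two commas; the third one is left untouched.
local result, count = string.gsub("a,b,c,d", ",", ";", 2)
print(result, count) --> a;b;c,d	2
```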

Lua 5.3 Compatibility

There are a few issues when running under Lua 5.3, i.e. loadstring has become load, and there are a few other issues (i.e. running the tests for typedlua fails in 5.3 but succeeds in 5.2). I have a version of typedlua here that will automatically use LuLPeg instead of lpeg.

I tried running the tests to give you a more concrete error report but I'm not sure how to execute some of them.
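For the loadstring part, a common portability shim is to pick whichever function is available. In Lua 5.1 loadstring compiles strings, while from Lua 5.2 on load accepts strings directly and loadstring may be absent. The name compile is arbitrary, and this sketch is not a patch against the LuLPeg source.

```lua
-- Prefer loadstring where it exists (Lua 5.1, LuaJIT), else use load (Lua 5.2+).
local compile = loadstring or load

local chunk = assert(compile("return 1 + 1"))
print(chunk()) --> 2
```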

LPEGLabel support

Are there any plans to implement the improvements from the subsequent research project LPEGLabel by the authors of LPEG?

In the paper about it they show a significant increase in expressiveness and capabilities with the features of LPEGLabel.

Error with glua

This may be completely outside of your realm of interest, but I am working on a project that uses Moonscript and Gopher-Lua (Golang based Lua Interpreter) for plugins. I have my reasons for doing this, but I've created a single Lua script bundle for the Moonscript interpreter (using LuLPeg in place of LPeg) using Amalgamation and some personal tricks which works with both Lua and LuaJIT. This bundle is great for me since it has no deps and makes deployment way simpler. The problem is that it isn't working for Gopher-Lua! I will include the error below, I would love any input you may have as to how I can fix this issue.

Here are the files I am working with, running should be as simple as lua testers.lua | lua (swap the lua version as you see fit, I am trying to get Gopher-Lua to work which can be found here: https://github.com/yuin/gopher-lua#standalone-interpreter)
files.zip

panic: @/usr/share/lua/5.3/lulpeg.lua:1577: bad argument #1 to load (function expected, got string)
stack traceback:
	[G]: in function 'load'
	@/usr/share/lua/5.3/lulpeg.lua:1577: in function 'printers'
	@/usr/share/lua/5.3/lulpeg.lua:2839: in function 'LuLPeg'
	@/usr/share/lua/5.3/lulpeg.lua:2848: in function <@/usr/share/lua/5.3/lulpeg.lua:2794>
	@/usr/share/lua/5.3/lulpeg.lua:22: in function <@/usr/share/lua/5.3/lulpeg.lua:15>
	(tailcall): ?
	[G]: in function 'require'
	./moon-bundle.lua:2970: in function <./moon-bundle.lua:0>
	[G]: in function 'require'
	<string>:1: in main chunk
	[G]: ?

goroutine 1 [running]:
main.main()
	/home/rucuriousyet/go/src/gitlab.com/stackmesh/stackd/plugin/lua/glua.go:48 +0x878
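The error message suggests that this interpreter's load follows Lua 5.1 semantics and only accepts a reader function. A version-agnostic way to compile a string is to feed it through such a reader, which every Lua version from 5.1 on accepts. load_source is a hypothetical helper shown for illustration, not something LuLPeg provides.

```lua
-- Compile a source string through a reader function that yields it once.
local function load_source(src, chunkname)
  local sent = false
  local function reader()
    if sent then return nil end -- nil signals the end of the chunk
    sent = true
    return src
  end
  return load(reader, chunkname)
end

local chunk = assert(load_source("return 6 * 7", "=example"))
print(chunk()) --> 42
```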

Lunamark and LuLPeg

I'm currently trying to write a markdown parser in pure lua with luaJIT, and have chosen lunamark as the dependency. It has lpeg as dependency and so here I am.

The issue is likely a mix of things, but the issue I am having seems to come down to a call like (P("=")^1+P("-")^1) returning nil.

This is the callstack:

../luajit.exe: .\lunamark/lulpeg.lua:2255: attempt to call a nil value
stack traceback:
 .\lunamark/lulpeg.lua:2255: in function 'factorize_choice'
        .\lunamark/lulpeg.lua:2476: in function '__add'
        .\lunamark\reader\markdown.lua:914: in function 'new'
        test.lua:3: in main chunk
        [C]: at 0x7ff60cec47d0

the markdown lua referred to above is here

At the end of the day, that markdown.lua just localizes all the LuLPeg.P and other functions to their capital letter equivalent, i.e. P = LuLPeg.P, and at line 916, Parsers.dash and Parsers.Equals are P("-") and P("=").

The only other gotcha in all this is I'm using a pure lua utf8 library to polyfill the 5.3 utf8 library.

Do you have any suggestions? Sadly I'm not smart about lpeg at all.

Ambiguous syntax error

line 2023:
;(options.compiler or compiler)(Builder, LL)

gives me the error ambiguous syntax (function call x new statement) near '('
