ply's Issues

Ply keeps regenerating parsetab.py even though nothing has changed

Create a file plytest.py with the following contents:

#!/usr/bin/python3 -tt

import ply.lex as lex
import ply.yacc as yacc

tokens = ['NAME']

t_NAME = r'[a-zA-Z]+'

def p_word(t):
    'word : NAME'
    t[0] = t[1]

lex.lex()
parser = yacc.yacc()
result = parser.parse('abcd')

Then do this:

mkdir subdir; cd subdir
../plytest.py

Ply will generate parsetab.py in the subdir. But if you run plytest.py again, it will regenerate parsetab.py and overwrite the old one. The produced files are always different. A sample diff between two runs looks like this.

< _lr_action_items = {'NAME':([0,],[1,]),'$end':([1,2,],[-1,0,]),}

---
> _lr_action_items = {'$end':([1,2,],[-1,0,]),'NAME':([0,],[1,]),}

The same happens when you try to write output in a different directory with yacc.yacc(outputdir='some_dir').

If you run the script with ./plytest.py it will work.
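
For what it's worth, the varying output is consistent with PLY writing the tables by iterating over an unsorted dict; a sketch illustrating the effect, assuming dict iteration order varies between runs (e.g. under hash randomization):

    # The emitted _lr_action_items literal depends on dict key order,
    # which is not guaranteed to be stable across interpreter invocations.
    items = {'NAME': ([0], [1]), '$end': ([1, 2], [-1, 0])}
    print(','.join(items))          # e.g. NAME,$end on one run, $end,NAME on another
    print(','.join(sorted(items)))  # deterministic: $end,NAME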

lexer.input() does not reset the state stack

If a lexer is using push_state() / pop_state(), that stack is not erased when a new input is specified via the input() method.

    def input(self,s):
        # Pull off the first character to see if s looks like a string
        c = s[:1]
        if not isinstance(c,StringTypes):
            raise ValueError("Expected a string")
        self.lexdata = s
        self.lexpos = 0
        self.lexlen = len(s)
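        # note: self.lexstatestack and the current lexer state are left untouched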

In fact, it is pretty difficult to unwind it without accessing lex internals, so I have something like this in my code for now:

        if hasattr(self._lexer, "lexstatestack"):
            while len(self._lexer.lexstatestack) > 0:
                self._lexer.pop_state()
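
A wrapper sketch that fully resets the lexer before feeding it new input (it relies on the internal lexstatestack attribute shown above; begin() is PLY's public way to switch states):

    def reset_and_input(lexer, data):
        lexer.begin('INITIAL')        # drop back to the initial state
        del lexer.lexstatestack[:]    # clear anything left by push_state()
        lexer.input(data)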

Typo in code snippet in documentation

The following example is given:

# C or C++ comment (ignore)    
def t_ccode_comment(t):
    r'(/\*(.|\n)*?*/)|(//.*)'
    pass

The regex contains an error: there is a backslash missing in front of the end of the C comment. It should be:

r'(/\*(.|\n)*?\*/)|(//.*)'

Precedence Issues

I may be missing something, but I think there is a problem with precedence when there is more than one terminal in a rule. I have sample code that tries various permutations using either the first, the last, or both terminals in the precedence list, and I don't get the results I expect. Right associativity never seems to be honored, and I sometimes see other problems as well.
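
For reference, PLY follows the usual yacc convention: a rule's precedence comes from its rightmost terminal unless overridden with %prec. A sketch of an explicit override (the symbols are illustrative, not from the reporter's grammar):

    precedence = (
        ('left', 'PLUS', 'MINUS'),
        ('right', 'UMINUS'),              # pseudo-token that exists only for %prec
    )

    def p_expr_uminus(p):
        'expr : MINUS expr %prec UMINUS'  # rule precedence taken from UMINUS, not MINUS
        p[0] = -p[2]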

token lineno lost on error recovery

In a grammar where NL is the token returned for newline characters and SOME_KW is a keyword, I have some recovery rules of the form:

widget : SOME_KW error NL

Which, after accepting SOME_KW and encountering a syntax error, consumes tokens up to NL. So far, so good. However, the action:

errorAtLine = p.lineno(1)

returns None instead of the line number associated with SOME_KW -- the line number appears to get lost during error recovery, but it is there on successful actions. This leads me to believe error recovery is wiping it out. For my current purpose, I am using p.lineno(3) as a work-around for picking up a line number near the syntax error.

Sorry, no test case right now... I'll try to make a cut down test case in the near future.
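
A condensed sketch of the rule and workaround described above (widget, SOME_KW, and NL are the names from this report; enabling parser.parse(..., tracking=True) may also help, since PLY only maintains full position information when tracking is on):

    def p_widget_error(p):
        'widget : SOME_KW error NL'
        # p.lineno(1) can come back empty after recovery;
        # fall back to the NL token's line as an approximation
        errorAtLine = p.lineno(1) or p.lineno(3)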

Trouble with yacc

I recently installed ply 3.4. On trying the yacc example from the docs, I got this error:

calc > 1

Traceback (most recent call last):
  File "C:\Users\karuna\Desktop\Jython\Python\My Modules\PYTML\html_ast.py", line 61, in <module>
    result = parser.parse(s)
  File "C:\Python2.7 For Chintoo\lib\site-packages\ply\yacc.py", line 265, in parse
    return self.parseopt_notrack(input,lexer,debug,tracking,tokenfunc)
  File "C:\Python2.7 For Chintoo\lib\site-packages\ply\yacc.py", line 881, in parseopt_notrack
    lexer = lex.lexer
AttributeError: 'module' object has no attribute 'lexer'

Why is this happening?
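
Judging from the traceback, parse() is falling back to the module-level lex.lexer, which only exists once lex.lex() has been called. A sketch of the two usual fixes (s and parser as in the report):

    import ply.lex as lex

    lexer = lex.lex()                      # build the lexer before parsing, or...
    result = parser.parse(s, lexer=lexer)  # ...hand a lexer to parse() explicitly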

Error in lexer matching rules

I have several rules that match my input, but instead of returning the first matching rule, PLY returns the broadest rule every time. Adjusting the rule ordering does not seem to help, and I have checked that my input string has no stray characters in it.

    tokens = (
            'FORMFEED','PAGE','ACCOUNTS','ENDSTATEMENT','START','VALIDLINE',
    )

    ##
    ## Regexes for use in tokens
    ##
    ##

    FORMFEED  = r'\f'
    PAGE      = r'\s+STATEMENT PAGE \#: 1\s*'
    ACCOUNTS  = r'=+ S H A R E  A C C O U N T S =+'
    ENDSTATEMENT = r'<\d+>=+ E N D   O F   S T A T E M E N T =+'
    VALIDLINE = r'[\S \t]+'
    START     = r'[\x00]+[ ]+'

    ##
    ## Lexer states
    ##
    states = (
    )

    # Newlines
    def t_NEWLINE(self, t):
        r'\n+'
        t.lexer.lineno += t.value.count("\n")

    @TOKEN(START)
    def t_START(self, t):
        return t

    @TOKEN(PAGE)
    def t_PAGE(self, t):
        return t

    @TOKEN(ACCOUNTS)
    def t_ACCOUNTS(self, t):
        return t

    @TOKEN(ENDSTATEMENT)
    def t_ENDSTATEMENT(self, t):
        return t

    @TOKEN(VALIDLINE)
    def t_VALIDLINE(self, t):
        return t

    @TOKEN(FORMFEED)
    def t_FORMFEED(self, t):
        return t

When I give it the input string:

=============================== S H A R E A C C O U N T S ===============================

I expect to receive an ACCOUNTS token. Instead I get a VALIDLINE token. Putting the lexer into debug mode shows that the master regex is:

'(?P<t_NEWLINE>\\n+)|(?P<t_START>[\\x00]+[ ]+)|(?P<t_PAGE>\\s+STATEMENT PAGE \\#: 1\\s*)|(?P<t_ACCOUNTS>=+ S H A R E A C C O U N T S =+)|(?P<t_ENDSTATEMENT><\\d+>=+ E N D O F S T A T E M E N T =+)|(?P<t_VALIDLINE>.+)|(?P<t_FORMFEED>\\f)'

I tested that regex in Python directly and got the expected matches. Why is PLY returning the wrong token?

error in validate_file

When a lexer is built from a class instance (as in §4.14 of the documentation), the LexerReflect object calls its validate_rules() method (by default, optimize=0), which ultimately calls validate_file(f) on each file.

validate_file checks for redefinition of rule functions by parsing the entire Python file, but it should restrict itself to the class namespace. As it stands, it produces incorrect errors when several classes (possibly with the same set of 't_xxx' rules) are defined in the same Python file (see the example below).

Fix: I'd suggest getting rid of validate_file altogether; the whole check seems a bit odd.

Example: plytest.py

from ply import lex

class lexer1(object):
    def __init__(self):
        self.tokens = ('regulars',)
        self.t_ignore = ' '

    def t_regulars(self, t):
        r'\w+'
        return t

    def t_ANY_error(self, t):
        print "Illegal character '%s'" % t.value[0]
        t.lexer.skip(1)

    def build(self, **kargs):
        self._lexer = lex.lex(module=self, **kargs)

    def test(self, data):
        self._lexer.input(data)
        while 1:
            tok = self._lexer.token()
            if not tok: break
            print tok

class lexer2(object):
    def __init__(self):
        self.tokens = ('regulars',)
        self.t_ignore = ' '

    def t_regulars(self, t):
        r'[0-9]+'
        return t

    def t_ANY_error(self, t):
        print "Illegal character '%s'" % t.value[0]
        t.lexer.skip(1)

    def build(self, **kargs):
        self._lexer = lex.lex(module=self, **kargs)

    def test(self, data):
        self._lexer.input(data)
        while 1:
            tok = self._lexer.token()
            if not tok: break
            print tok

l1 = lexer1()
l2 = lexer2()

l1.build()

Release 3.5 any time soon?

Hi, there are quite a lot of commits since the last release in 2011. Is there a plan to make a 3.5 release with these fixes any time soon?

Ways of defining tokens seem to have different effects.

Suppose you have the following tokens defined:

tokens = (
    'THIS',
    'ANYTHING'
)

t_THIS = r'this'
t_ANYTHING = r'[a-z]+'

Any string is tokenized as an "anything" token. At first I thought this was to be expected, but then I tried the following.

tokens = (
    'THIS',
    'ANYTHING'
)

def t_THIS(t):
    r'this'
    return t

def t_ANYTHING(t):
    r'[a-z]+'
    return t

Like this, any "this" string gets tokenized as a "this" token because that definition comes first, while anything else gets tokenized as "anything".

If I switch the two rules around, nothing gets tokenized as a "this" token, which seems to make sense and is also what happens when I do the same thing with flex.

Is this the expected behavior and I am missing something, or is it a bug? Even if it is not really a bug, should it be working like this?
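
For reference, this matches PLY's documented behavior: function-defined tokens are added to the master regex in the order they appear in the file, while string-defined tokens are sorted by decreasing regular-expression length, so the longer t_ANYTHING pattern wins. The usual idiom sidesteps ordering entirely by remapping keywords after the broad match; a sketch:

    reserved = {'this': 'THIS'}

    def t_ANYTHING(t):
        r'[a-z]+'
        t.type = reserved.get(t.value, 'ANYTHING')  # remap keywords to their own type
        return t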

Lexer lextokens changed from dict to set, breaks oe-lite

Hi,

With the release of ply-3.6 we have run into an issue when using it in the OE-lite project. See this issue: oe-lite/core#27

Basically, lextokens was changed from a dict to a set (although the comment in the code still says dict ;) ), which breaks our code.

We are likely using an internal that we should not mess with as a consumer of Ply; however, we need some way of accessing this information to make things work for OE-lite. We can of course just work around it, converting either a set or a dict to the list of tokens we need (as sketched below), but I would like to ask for your opinion to make a better solution possible.
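
A version-agnostic sketch of that workaround (iterating a dict yields its keys, so the same expression covers both representations):

    token_names = sorted(lexer.lextokens)  # works whether lextokens is a dict (<= 3.5) or a set (3.6+)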

Br,
Kim

Install/deployment methodology best practices

I am struggling with how best to write a setup.py to deploy a tool built using ply. The problem is how to deal with the parser output files. The fact that my tool uses ply under the hood should be transparent to the user, so I want to avoid leaving ply-dust all over the user's workspace.

In principle, it is simple enough to call my tool once at the end of the install script to generate the parse tables. The question is where they should go. It would be nice to let them land in the package directory along with the other support modules, but that directory name then needs to be permanently patched into the executable somehow.

My grammar is small enough that I could simply turn off caching, but I still get the parser.out file, even with debug=0, and a message about "Generating LALR tables."
edit: OK, so that was a PEBKAC; I was confused about debug=0 during generation versus usage. With debug=0 during generation, the output and the "Generating..." message can be suppressed. Still, this is only a solution for small grammars.

A third approach may be to create the parse tables in $HOME/.foo, but that has its own issues of making sure the path exists, etc.

What are best practices for deploying ply?


Edit #2 -- Well, I have what seems like a solution, but I'd like feedback -- it seems very warty.

  • Added a --runonceoninstall option to my application.

  • setup.py does subprocess.call(['mytool', '--runonceoninstall']) at the end

  • my parsing module does

    _cacheDir = dirname(__file__) # Find out where this module lives during install.

    _parser = yacc.yacc(debug=0,outputdir=_cacheDir) # Cache files to install directory.

  • my main driver causes my parser to get imported, which triggers the yacc.yacc() call above

  • the --runonceoninstall option does import parsetab, which causes the creation of parsetab.pyc

This appears to work, but seems crufty. Suggestions welcome.
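
One alternative sketch, using yacc's picklefile option so all table state lands in a single predictable file inside the package (the path handling is illustrative):

    import os
    from ply import yacc

    _table = os.path.join(os.path.dirname(__file__), 'parser.pickle')
    _parser = yacc.yacc(debug=False, picklefile=_table)  # written on first build, reused afterwards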

Reduce conflicts on generated yacctab.py

I notice that the latest version of ply (actually, current master) puts the full path of yacctab.py in its header, something it didn't do in previous versions and does not do for lextab.py.

Consider:

--- a/src/slimit/yacctab.py
+++ b/src/slimit/yacctab.py
@@ -1,330 +1,330 @@

-# yacctab.py
+# /home/lele/wip/slimit/src/slimit/yacctab.py
 # This file is automatically generated. Do not edit.
-_tabversion = '3.2'
+_tabversion = '3.5'

This may introduce noisy/spurious conflicts when two different people rebuild the file.

As a side note, maybe it would be nicer to have similar headers in the generated files; the lextab.py one looks like this:

# lextab.py. This file automatically created by PLY (version 3.6). Don't edit!
_tabversion   = '3.5'

Literals should be valid token types

I have a ply.lex class in which I've defined "{" and "}" as literals via the "literals" attribute, but I also want to attach an action to these tokens to match nesting.

I thought I would be able to do it as follows, but it complains about an unknown token type. Am I doing this correctly? I'd also like to avoid having a named token for something that really is a literal.

class Lexer(object):
    def __init__(self):
        self.lexer = ply.lex.lex(module=self)
        self.nesting = 0

    literals = [ '{', '}' ]

    def t_LBRACE(self, t):
        r'\{'
        self.nesting += 1
        t.type = '{'
        return t

    def t_RBRACE(self, t):
        r'\}'
        self.nesting -= 1   # decrement on a closing brace
        t.type = '}'
        return t

I'm not sure if this is intended design, or an oversight, so I thought I'd report it.

optionally omit or trim path info stored in table files

Parser production rules in PLY table files include path information (the p.file attribute). This could be a security issue, because:

  1. generating and including tables in a distribution archive makes the path public. So one either has to move to a neutral location (like /tmp) in order to produce a distribution, or must not include the parser table files (which requires unnecessary case selection in setup.py), or discloses their local directory structure.
  2. at installation time, on a multi-user system, if the tables are generated before copying to site-packages, they will contain path information for the user that performed the installation. Others may have read access to site-packages, but not to that user's directories. (Thanks to @slivingston for describing this use case.)

Reading through PLY's source code, the file path information is used only for error reporting, not for parsing. It would be convenient if yacc.yacc offered an option to either:

  • omit the path from the table file, or
  • trim the path, keeping only the file name, so that error reporting is affected less.

In any case, the paths reported for an installed package whose tables were generated before copying to site-packages will show incorrect path prefixes (preceding the package directory name).

Cannot install ply 3.5 via pip

Error message is:

$ pip install ply==3.5
Collecting ply==3.5
  Could not find a version that satisfies the requirement ply==3.5 (from versions: 3.4)
  Some externally hosted files were ignored as access to them may be unreliable (use --allow-external ply to allow).
  No matching distribution found for ply==3.5

Missing pickle file (or perhaps a misspelled file path)

When ply (a dependency of the Todoflow Python module) calls its internal function yacc(..., picklefile=...), it passes a picklefile that does not exist: /Users/my_account/.todoflow_parsetab.pickle. Please see my simple script below and the error output. Notice that in the last line of the error output there is a u in front of the file path; this may be related, but I haven't yet found out where it gets added...

FYI I am running El Cap public beta, ply (3.6), Todoflow (4.0.1). Perhaps this is instead a Todoflow issue?

todoflow_now.py:

#!/usr/local/bin/python
# return query of tasks to be added to reminders Now list

import sys
import os
import time

#sys.path.append('/usr/local/lib/python2.7/site-packages/')
import todoflow

curDate = time.strftime("%Y-%m-%d")
todos = todoflow.from_path('~/Dropbox/Notes/test.taskpaper')

#def get_project(iterable):

for item in todos.search('((@today or @focus or @overdue or @next or @due <= ' + curDate + ' ) and not @done)+d'):
    print item
    #item.tag('@done', curDate)
    #item.remove_tag('@today')

error messages:

$ todoflow_now.py 
Traceback (most recent call last):
  File "/Users/my_account/bin/todoflow_now.py", line 8, in <module>
    import todoflow
  File "/usr/local/lib/python2.7/site-packages/todoflow/__init__.py", line 3, in <module>
    from .todoflow import from_text, from_path, from_paths, from_dir
  File "/usr/local/lib/python2.7/site-packages/todoflow/todoflow.py", line 4, in <module>
    from .parser import parse
  File "/usr/local/lib/python2.7/site-packages/todoflow/parser.py", line 4, in <module>
    from .todos import Todos, Node
  File "/usr/local/lib/python2.7/site-packages/todoflow/todos.py", line 5, in <module>
    from .querying_parser import parse as parse_query
  File "/usr/local/lib/python2.7/site-packages/todoflow/querying_parser.py", line 247, in <module>
    debug=0, write_tables=0,
  File "/usr/local/lib/python2.7/site-packages/ply/yacc.py", line 3242, in yacc
    read_signature = lr.read_pickle(picklefile)
  File "/usr/local/lib/python2.7/site-packages/ply/yacc.py", line 1986, in read_pickle
    in_f = open(filename, 'rb')
IOError: [Errno 2] No such file or directory: u'/Users/my_account/.todoflow_parsetab.pickle'

yacc: Publish the current state to the error callback

I have a use-case where I am using a ply-generated grammar for a programmable tab-completion infrastructure.

This is the one: https://github.com/balabit/syslog-ng/blob/master/modules/python/pylib/syslogng/debuggercli/completerlang.py

Basically, whenever the user presses tab to request completion, I parse the currently edited string according to a grammar, and upon the occurrence of the injected TAB token I inspect the current state of the parser.

This way I get all the terminals and rules that apply at the specific point where the user pressed TAB. This information can then be used to collect the potential completions, making it pretty easy to implement tab completion for LALR(1) grammars.

Right now, "state" is not published anywhere, it is a local variable within the parse() function, which I look up using sys._getframe(). This is of course fragile and was recently broken due as the part that called the error handler function was extracted into a separate function in ply 3.6.

I see that statestack is published on the parser, and its last element should be equal to the current state, but I am not entirely sure.

If I got some help I would be willing to contribute a patch.
Thanks.
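
A sketch of reading the state from the parser object instead of the stack frame (statestack is still an internal attribute, so this is version-fragile too, and it is only meaningful while a parse is in progress):

    def p_error(tok):
        current_state = parser.statestack[-1]   # LALR state at the point of the error
        # parser.action[current_state] maps acceptable terminals to parser actions,
        # which is enough to enumerate completion candidates

    parser = yacc.yacc()   # must be constructed after p_error is defined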

Python 3.5 compatibility

Several testlex.py failures under Python 3.5:

% python3.5 testlex.py
....F/usr/lib/python3.5/unittest/case.py:625: ResourceWarning: unclosed file <_io.BufferedReader name=4>
  outcome.errors.clear()
FFF.............F.F..................
======================================================================
FAIL: test_lex_opt_alias (__main__.LexBuildOptionTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "testlex.py", line 541, in test_lex_opt_alias
    self.assert_(pymodule_out_exists("aliastab.pyo"))
AssertionError: False is not true

======================================================================
FAIL: test_lex_optimize (__main__.LexBuildOptionTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "testlex.py", line 390, in test_lex_optimize
    self.assert_(pymodule_out_exists("lextab.pyo"))
AssertionError: False is not true

======================================================================
FAIL: test_lex_optimize2 (__main__.LexBuildOptionTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "testlex.py", line 443, in test_lex_optimize2
    self.assert_(pymodule_out_exists("opt2tab.pyo"))
AssertionError: False is not true

======================================================================
FAIL: test_lex_optimize3 (__main__.LexBuildOptionTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "testlex.py", line 494, in test_lex_optimize3
    self.assert_(pymodule_out_exists("lexdir/sub/calctab.pyo"))
AssertionError: False is not true

======================================================================
FAIL: test_lex_re1 (__main__.LexErrorWarningTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "testlex.py", line 161, in test_lex_re1
    contains=True))
AssertionError: False is not true

======================================================================
FAIL: test_lex_re3 (__main__.LexErrorWarningTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "testlex.py", line 179, in test_lex_re3
    contains=True))
AssertionError: False is not true

----------------------------------------------------------------------
Ran 42 tests in 1.595s

FAILED (failures=6)

Chromium build error with ply-3.6

Attempting to build the Chromium web browser using ply-3.6 fails with the following error:

[351/17437] cd ../../third_party/WebKit/Source/bindings/scripts; python blink_idl_parser.py ../../../../../out/Release/gen/blink/bindings/scripts
FAILED: cd ../../third_party/WebKit/Source/bindings/scripts; python blink_idl_parser.py ../../../../../out/Release/gen/blink/bindings/scripts
Traceback (most recent call last):
  File "blink_idl_parser.py", line 456, in <module>
    sys.exit(main(sys.argv))
  File "blink_idl_parser.py", line 452, in main
    parser = BlinkIDLParser(outputdir=outputdir, rewrite_tables=True)
  File "blink_idl_parser.py", line 428, in __init__
    picklefile=picklefile)
  File "/usr/lib64/python2.7/site-packages/ply/yacc.py", line 3242, in yacc
    read_signature = lr.read_pickle(picklefile)
  File "/usr/lib64/python2.7/site-packages/ply/yacc.py", line 1986, in read_pickle
    in_f = open(filename, 'rb')
IOError: [Errno 2] No such file or directory: '../../../../../out/Release/gen/blink/bindings/scripts/parsetab.pickle'

The relevant script is here:

https://chromium.googlesource.com/chromium/blink/+/master/Source/bindings/scripts/blink_idl_parser.py

I think it passes a picklefile to yacc.yacc() which is not expected to exist yet; ply-3.4 allowed this without error and, I believe, wrote the picklefile for future use.

Does that sound right? Or is this script doing something wrong?

Test failures with Python 3.3

Some tests fail randomly with Python 3.3 due to hash randomization. The test suite might need to be run multiple times to reproduce all the failures.

$ python3.3 testyacc.py
======================================================================
FAIL: test_yacc_inf (__main__.YaccErrorWarningTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "testyacc.py", line 209, in test_yacc_inf
    "Token 'NUMBER' defined, but not used\n"
AssertionError: False is not true

======================================================================
FAIL: test_yacc_prec1 (__main__.YaccErrorWarningTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "testyacc.py", line 368, in test_yacc_prec1
    "Precedence rule 'left' defined for unknown symbol '+'\n"
AssertionError: False is not true

======================================================================
FAIL: test_yacc_rr (__main__.YaccErrorWarningTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "testyacc.py", line 288, in test_yacc_rr
    "Generating LALR tables\n"
AssertionError: False is not true

======================================================================
FAIL: test_yacc_rr_unused (__main__.YaccErrorWarningTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "testyacc.py", line 299, in test_yacc_rr_unused
    "no p_error() function is defined\n"
AssertionError: False is not true

======================================================================
FAIL: test_yacc_unused (__main__.YaccErrorWarningTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "testyacc.py", line 336, in test_yacc_unused
    "yacc_unused.py:62: Symbol 'COMMA' used, but not defined as a token or a rule\n"
AssertionError: False is not true

----------------------------------------------------------------------

Unit tests fail when using the regex module instead of re

I tried replacing "import re" with "import regex as re" in lex.py and yacc.py for good measure (I saw that regex was slated for Python 3.4 and that the module supports concurrency), but I got two errors in the lex unit tests (python testlex.py):

FAIL: test_lex_re1 (__main__.LexErrorWarningTests)

Traceback (most recent call last):
  File "testlex.py", line 143, in test_lex_re1
    "Invalid regular expression for rule 't_NUMBER'. unbalanced parenthesis\n"))
AssertionError: False is not true

FAIL: test_lex_re3 (__main__.LexErrorWarningTests)

Traceback (most recent call last):
  File "testlex.py", line 155, in test_lex_re3
    "Invalid regular expression for rule 't_POUND'. unbalanced parenthesis\n"
AssertionError: False is not true

According to the maintainer of regex, this is because the exception messages differ: regex reports "missing )" instead of "unbalanced parenthesis".

Is it correct for the unit tests to rely on the message strings?

Missing trailing context?

In lex/flex, there is a 'trailing context' pattern, match/require, such as

a/n

which matches only the a, and only if it is followed by n. So the a in and, can, another is matched, but the a in alone, mask is not.

I found that this feature is missing in PLY. Since I rely heavily on it, please consider adding it.
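
A possible workaround sketch using a lookahead assertion, which Python's re module supports and which, like trailing context, does not consume the following text:

    def t_A(t):
        r'a(?=n)'   # match 'a' only when the next character is 'n'; the 'n' is not consumed
        return t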

Does ignoring comments work in Python 3.2?

Hi,

Discarding comments by returning None does not seem to work. If I add:

def t_COMMENT(t):
    r'\#.*'
    pass
    # No return value. Token discarded

and run the example from the documentation with Python 3.2, I get:

Traceback (most recent call last):
  File "./parsing/calc.py", line 115, in <module>
    yacc.parse(s)
  File "./parsing/ply/yacc.py", line 303, in parse
    return self.parseopt_notrack(input,lexer,debug,tracking,tokenfunc)
  File "./parsing/ply/yacc.py", line 1095, in parseopt_notrack
    tok = call_errorfunc(self.errorfunc, errtoken, self)
  File "./parsing/ply/yacc.py", line 196, in call_errorfunc
    r = errorfunc(token)
  File "./parsing/calc.py", line 106, in p_error
    print("Syntax error at '%s'" % t.value)
AttributeError: 'NoneType' object has no attribute 'value'

The same is true for a simple '\n' rule. Ideas?

Thanks!
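
Judging from the traceback, the failure is not in t_COMMENT itself: p_error is being called with t set to None (PLY does this at end of input), and the handler dereferences t.value unconditionally. A defensive sketch:

    def p_error(t):
        if t is None:
            print("Syntax error at end of input")
        else:
            print("Syntax error at '%s'" % t.value)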

XOREQUAL pattern needs escaping

As the title says, the t_XOREQUAL pattern in ctokens.py needs to escape the ^ character for the pattern to work correctly
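
Presumably the fix is a one-character change; outside a character class, an unescaped ^ is an anchor rather than a literal caret:

    t_XOREQUAL = r'\^='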

lex crashes python when given a bad regex

To Reproduce:
$ cat > crash.py << EOF
import lex

tokens = ('STAR',)
t_STAR = '*'

lex.lex(optimize=1)
EOF
$ python3 --version; uname -a
Python 3.2.3rc2
Linux ariel-linux2 3.2.0-2-686-pae #1 SMP Mon May 21 18:24:12 UTC 2012 i686 GNU/Linux
$ python3 crash.py
Fatal Python error: Cannot recover from stack overflow.
Aborted
Expected result: a clean exception.
Actual result: Python aborts with a fatal error. Note that no exception is raised and no Python traceback is printed.

pip installation with python3 fails

When installing with "pip install ply" on Python 2, pip correctly downloads and installs PLY 3.4.
However, doing the same with Python 3 downloads and tries to install PLY 2.5, which is not Python 3 compatible, so the installation fails.

TypeError in lex

With the 3.6 update, I see the following TypeError when slimit makes a call to ply:

line 893, in lex
    if '.' not in lextab:
TypeError: argument of type 'module' is not iterable

Errors in the testsuite in testlex.py and python 2.7.5 on RHEL-7

matej@mitmanek: test$ python testlex.py
.F..FFF...................................
======================================================================
FAIL: test_lex_many_tokens (__main__.LexBuildOptionTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "testlex.py", line 544, in test_lex_many_tokens
    self.assert_(os.path.exists("manytab.py"))
AssertionError: False is not true

======================================================================
FAIL: test_lex_opt_alias (__main__.LexBuildOptionTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "testlex.py", line 487, in test_lex_opt_alias
    self.assert_(os.path.exists("aliastab.py"))
AssertionError: False is not true

======================================================================
FAIL: test_lex_optimize (__main__.LexBuildOptionTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "testlex.py", line 342, in test_lex_optimize
    self.assert_(os.path.exists("lextab.py"))
AssertionError: False is not true

======================================================================
FAIL: test_lex_optimize2 (__main__.LexBuildOptionTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "testlex.py", line 396, in test_lex_optimize2
    self.assert_(os.path.exists("opt2tab.py"))
AssertionError: False is not true

----------------------------------------------------------------------
Ran 42 tests in 0.101s

FAILED (failures=4)
matej@mitmanek: test$ python -V
Python 2.7.5
matej@mitmanek: test$

Using ply 3.4 installed from PyPI.

Please make the parsetab.py files reproducible

Whilst working on the Debian reproducible builds effort, I noticed that python-ply generates parsetab.py files with non-deterministic contents.

I first had a quick go at fixing this by adding a bunch of sorts inside write_table, but looking deeper into the data structures, it appears that "more" determinism is needed to ensure that the states are consistently numbered across builds. There are a whole bunch of iterations over dict items() throughout the table generation which, as you are no doubt aware, are non-deterministic. I'm sure some of these are harmless from a reproducibility point of view, so simply adding sorted() everywhere would be a total mess.

Of course, one solution would be to wontfix this and simply decree that these files are non-deterministic, but then Debian etc. would not be able to ship these useful optimisations, as they would render the package unreproducible.
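
A minimal sketch of the output-side change described above (the state-numbering issue would still need sorting earlier, at table-construction time):

    # Emit dict-shaped table entries in sorted key order so the generated
    # source text is byte-identical across interpreter runs.
    for name in sorted(table):
        f.write('%r: %r,\n' % (name, table[name]))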

UnicodeDecodeError on pip install

When using "pip install ply" on Python 3.3, a UnicodeDecodeError is thrown.

$ python --version
Python 3.3.0
$ pip install PLY
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/lib/python3.3/threading.py", line 639, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.3/threading.py", line 596, in run
    self._target(*self._args, **self._kwargs)
  File "/home/danilo/.virtualenvs/pycoolc/lib/python3.3/site-packages/pip-1.2.1-py3.3.egg/pip/index.py", line 245, in _get_queued_page
    page = self._get_page(location, req)
  File "/home/danilo/.virtualenvs/pycoolc/lib/python3.3/site-packages/pip-1.2.1-py3.3.egg/pip/index.py", line 337, in _get_page
    return HTMLPage.get_page(link, req, cache=self.cache)
  File "/home/danilo/.virtualenvs/pycoolc/lib/python3.3/site-packages/pip-1.2.1-py3.3.egg/pip/index.py", line 466, in get_page
    inst = cls(u(contents), real_url, headers)
  File "/home/danilo/.virtualenvs/pycoolc/lib/python3.3/site-packages/pip-1.2.1-py3.3.egg/pip/backwardcompat.py", line 44, in u
    return s.decode('utf-8')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf8 in position 10831: invalid start byte

Downloading/unpacking PLY
  Downloading ply-3.4.tar.gz (138kB): 138kB downloaded
  Running setup.py egg_info for package PLY

Installing collected packages: PLY
  Running setup.py install for PLY

Successfully installed PLY
Cleaning up...

The installation itself works fine though. Not sure whether this is a PLY or a pip issue.

lexer interface specification?

Is there a description of the lexer interface required by the PLY parser? I am interested in combining the PLY parser layer with the vera++ tool (which already has a tokenizer) and am wondering what methods/properties I need to provide so that PLY's parser could use a "foreign" lexer.
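
For reference, PLY's parser only needs an object with a token() method that returns token objects carrying type, value, lineno, and lexpos attributes, and None at end of input. A minimal adapter sketch (names are illustrative):

    class ForeignLexer(object):
        """Adapts an existing token stream for use with yacc."""
        def __init__(self, toks):
            self._it = iter(toks)   # each item needs .type/.value/.lineno/.lexpos
        def token(self):
            return next(self._it, None)   # None tells the parser the input is exhausted

    # result = parser.parse(lexer=ForeignLexer(vera_tokens))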

sdist package on PyPI

Is there any reason why no sdist package has been uploaded to PLY's PyPI page? I tried adding ply to my requirements list, but when I try to install my package with pip (under Python 3), all hell breaks loose because it tries to download ply 2.5, which seemingly doesn't support Python 3.

Can I define a lexer in the form of a class?

Instead of defining regex rules in individual functions with names like t_<some_name>,
can't we define these functions in a Python class and then pass a class instance to lex.lex()?

I'm new to writing lexers/parsers as part of my course on compilers, so I might as well ask:
can this be done in the current implementation?
And if not, would it even be fruitful and beneficial in some way?
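
This is supported: lex.lex(module=...) accepts any object whose attributes define the rules, so a class instance works (see also the class-based example earlier on this page). A minimal sketch:

    import ply.lex as lex

    class MyLexer(object):
        tokens = ('NUMBER',)
        t_ignore = ' \t'

        def t_NUMBER(self, t):
            r'\d+'
            t.value = int(t.value)
            return t

        def t_error(self, t):
            t.lexer.skip(1)

        def build(self, **kwargs):
            self.lexer = lex.lex(module=self, **kwargs)
            return self.lexer

    lexer = MyLexer().build()
    lexer.input('42')
    print(lexer.token())   # LexToken(NUMBER,42,1,0)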

t_INTEGER regex in cpp.py

To match "1UL" as a single integer, the t_CPP_INTEGER regex in ply/ply/cpp.py shoud be

r'(((((0x)|(0X))[0-9a-fA-F]+)|(\d+))([uU][lL]|[lL][uU]|[uU]|[lL])?)'

Note the different order of [uU][lL] vs. [uU]: the two-character suffix alternatives must come first, otherwise the match stops after a single suffix character.

Partial ES6 Transpiler

Hi all. I wrote an ES6 transpiler using PLY, which I've been using in production code for several years now. I'm planning on switching to a different transpiler (so I don't have to keep maintaining my own), but I wanted to let the PLY developers know I really love PLY and express my deepest gratitude. I even implemented its lexical scanner API in JavaScript.

PLY was my first experience with bottom-up compiler-compilers. I found the tutorial on the website extremely helpful. At the time, I had found myself suddenly in need of an ES6 transpiler. I searched the web in a panic, only to discover (it's funny, but true) that every single transpiler website was down that day, which made me (erroneously) think they were not being maintained.

That was when I turned to PLY. I managed to get something that served my basic needs up and running in, if memory serves, two weeks or so. The sheer speed at which I could work in PLY was awe-inspiring. Of course I ran into difficult grammar issues like everyone else, though it helped that the ES6 spec had plenty of tips on making JavaScript's weird grammar work in a bottom-up parser.

Anyway, my actual transpiler code (especially the AST manipulation stuff) is horrible, but I thought you might want to know that someone found your project (quite) useful.

Newcomer has a problem with ply....

Hi,
Sorry, but I could not find any other place to write about my issue.
I am new to compilers and am trying to learn using PLY. It seems very easy and handy, so thank you for your great project. :)
The question is that I cannot build a correct parser for input like the following:

BG = 12 mg/dL
timeout = None

The part of the parser related to this input is:

reserved = {
    'bg' : 'BG',
    'timeout' : 'TIMEOUT',
}

t_STRING = r'[a-zA-Z]\w*'
t_BG_VALUE = r'\d+\s*mg\s*/\s*dL'
t_TIMEOUT_VALUE = r'None|(30\s*sec)|([1-3]\s*min)'

def p_configuration_step(t):
    '''configuration_step : BG ASSIGN BG_VALUE
                          | TIMEOUT ASSIGN TIMEOUT_VALUE'''
    node = ASTNode(NodeTypes.ConfigurationStep, t[1])
    node.insertChild(0, ASTNode(NodeTypes.ConfigurationValue, t[3]))
    t[0] = node

Whatever I do, I get

Syntax error on timeout at line 1

Please advise.

skype: akeshmiri
email: [email protected]
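
One likely cause, following the PLY documentation's handling of reserved words: a reserved dict has no effect on its own, and a broad rule like t_STRING will happily swallow the word timeout before any TIMEOUT token can be produced. A sketch, assuming the rules above:

    def t_STRING(t):
        r'[a-zA-Z]\w*'
        t.type = reserved.get(t.value.lower(), 'STRING')  # 'timeout' -> TIMEOUT, 'BG' -> BG
        return t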

raise exception in a p_ function

I am using ply and don't understand why raising SyntaxError in a p_ function does nothing.

Only a raise inside p_error works. Where am I going wrong? Thank you; please look into this.

shift/reduce conflicts: resolving by precedence

Hello,

In parser.out I have state 8, where a shift/reduce conflict is resolved by precedence and the reduce rule is used. However, I would really expect the shift to be preferred.

precedence = (
    ('left', 'WS'),
)


 state 8

     (11) expr -> ID . WS ID
     (16) expr -> ID .

     SEMICOL         reduce using rule 16 (expr -> ID .)
     NEWLINE         reduce using rule 16 (expr -> ID .)
     $end            reduce using rule 16 (expr -> ID .)
     RCURLY          reduce using rule 16 (expr -> ID .)
     WS              reduce using rule 16 (expr -> ID .)

   ! WS                  [ shift and go to state 16 ]
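
For reference (standard yacc semantics, which PLY follows): when the rule and the lookahead token have equal precedence, 'left' associativity resolves the conflict as a reduce and 'right' resolves it as a shift. A sketch of a precedence table that makes state 8 shift instead:

    precedence = (
        ('right', 'WS'),   # equal-precedence shift/reduce conflicts on WS now shift
    )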

Literals empty list causes IndexError

If I set:

literals = []

I get:

Traceback (most recent call last):
  File "Translator.py", line 379, in <module>
    lexer = lex(outputText)
  File "Translator.py", line 105, in lex
    lexer = lex.lex(reflags=re.VERBOSE, module=LexerPassThrough)
  File "ply/lex.py", line 920, in lex
IndexError: list index out of range

Below is a suggested possible patch.

*** lex.py.old  Wed Dec 21 10:44:55 2011
--- lex.py  Wed Dec 21 10:45:28 2011
***************
*** 917,923 ****

      # Get literals specification
      if isinstance(linfo.literals,(list,tuple)):
!         lexobj.lexliterals = type(linfo.literals[0])().join(linfo.literals)
      else:
          lexobj.lexliterals = linfo.literals

--- 917,926 ----

      # Get literals specification
      if isinstance(linfo.literals,(list,tuple)):
!         if linfo.literals:
!             lexobj.lexliterals = type(linfo.literals[0])().join(linfo.literals)
!         else:
!             lexobj.lexliterals = ""
      else:
          lexobj.lexliterals = linfo.literals

I could also see calling iter(linfo.literals) , which would be more Pythonic than an explicit check for list/tuple, but that's maybe going too far.

Warnings when using with slimit package

I get the following warnings when I use the slimit package.

WARNING: Couldn't write lextab module <module 'slimit.lextab' from '/usr/lib/python2.7/site-packages/slimit/lextab.pyc'>. Won't overwrite existing lextab module
WARNING: yacc table file version is out of date
WARNING: Token 'IMPORT' defined, but not used
WARNING: Token 'BLOCK_COMMENT' defined, but not used
WARNING: Token 'ENUM' defined, but not used
WARNING: Token 'EXTENDS' defined, but not used
WARNING: Token 'LINE_COMMENT' defined, but not used
WARNING: Token 'LINE_TERMINATOR' defined, but not used
WARNING: Token 'CONST' defined, but not used
WARNING: Token 'EXPORT' defined, but not used
WARNING: Token 'CLASS' defined, but not used
WARNING: Token 'SUPER' defined, but not used
WARNING: There are 10 unused tokens
WARNING: Couldn't create <module 'slimit.yacctab' from '/usr/lib/python2.7/site-packages/slimit/yacctab.pyc'>. Won't overwrite existing tabmodule

I get this with ply 3.7 and 3.8. How can I fix this?

Don't use docstring when @TOKEN decorator is being used

I think it's an elegant solution to enter the lexer regexes in the docstrings of functions, but as stated in the documentation, this has some drawbacks. When the @TOKEN decorator is used, there should be no reason to set __doc__ rather than some other attribute (such as 'regex') on the function. I'll attach a patch to show how it might be solved differently. Setting another attribute will allow users of ply (me :) to write a lexer that works with "python -O", if they want to.

Thanks for a very nice framework, I'll let you know if I make something cool with it :)

yacc start keyword and parsetab caching

I think I found a bug with parse table caching and the start keyword to
yacc.yacc().

This script illustrates the problem:

""" Nasty behavior for start=
"""

tokens = ['FOO', 'BAR']

t_FOO = r'foo'
t_BAR = r'bar'


def p_foo_bar(p):
    ' foo_bar : FOO BAR'
    p[0] = 'have foobar'


def p_bar(p):
    ' bar : BAR '
    p[0] = 'have bar'


if __name__ == '__main__':
    import os
    from ply import lex, yacc
    lex.lex()
    # Remove written parsed tables
    if os.path.exists('parsetab.py'):
        os.unlink('parsetab.py')
    if os.path.exists('parsetab.pyc'):
        os.unlink('parsetab.pyc')
    # Generate a parser with non-default start rule
    parser = yacc.yacc(start='bar')             # no error if commenting
    assert parser.parse('bar') == 'have bar'    # out these two lines
    # Generate a parser with default start rule and another tabmodule
    parser = yacc.yacc(start='foo_bar', tabmodule='another')
    # This works
    assert parser.parse('foobar') == 'have foobar'
    # Generate a parser with default start rule and tabmodule
    parser = yacc.yacc(start='foo_bar')
    # The following fails with "yacc: Syntax error at line 1, token=FOO"
    assert parser.parse('foobar') == 'have foobar'

Investigating further, I think what is happening is that the changes to the
start symbol around line 3129 of yacc.py get written to the parsetab module, but
they do not change the signature of the parsetab module. When yacc.yacc()
gets called with another start symbol (or the default), it reads the lex /
yacc symbols from the relevant module or class, checks the signature, detects
that the signature matches the cached parsetab signature, and uses the cached
parsetab, even though the specified (or default) start symbol differs from the
start symbol in the previously written parsetab. This can be very confusing,
because the actual start symbol used will depend on which one got written first.

It wasn't clear to me what the right fix is. I wonder whether yacc() should
record the start symbol in the grammar symbols before checking the signatures,
something like:

diff --git a/ply/yacc.py b/ply/yacc.py
index f70439e..e50d81c 100644
--- a/ply/yacc.py
+++ b/ply/yacc.py
@@ -3054,6 +3054,10 @@ def yacc(method='LALR', debug=yaccdebug, module=None, tabmodule=tab_module, star
     else:
         pdict = get_caller_module_dict(2)

+    # Set start symbol if specified
+    if start is not None:
+        pdict['start'] = start
+
     # Collect parser information from the dictionary
     pinfo = ParserReflect(pdict,log=errorlog)
     pinfo.get_all()

This changes the result of pinfo.signature(), so it forces yacc() to
regenerate the parsetab module unless the explicit start symbol is the same.

Thanks a lot for Ply; I have had good use of it.
