idank / bashlex
Python parser for bash
License: GNU General Public License v3.0
It looks like bashlex has problems parsing case statements. Please try passing the following to parser.parse:
case "$1" in
start)
start
;;
stop)
stop
;;
*)
echo $"Usage: $0 {start|stop}"
exit 1
esac
I get the following error message:
Traceback (most recent call last):
File "ttt.py", line 12, in <module>
trees = parser.parse(s)
File "/home/joe/.local/lib/python3.10/site-packages/bashlex/parser.py", line 610, in parse
parts = [p.parse()]
File "/home/joe/.local/lib/python3.10/site-packages/bashlex/parser.py", line 691, in parse
tree = theparser.parse(lexer=self.tok, context=self)
File "/home/joe/.local/lib/python3.10/site-packages/bashlex/yacc.py", line 439, in parse
p.callable(pslice)
File "/home/joe/.local/lib/python3.10/site-packages/bashlex/parser.py", line 401, in p_pattern
handleNotImplemented(p, 'pattern')
File "/home/joe/.local/lib/python3.10/site-packages/bashlex/parser.py", line 17, in handleNotImplemented
raise NotImplementedError('type = {%s}, token = {%s}' % (type, p[1]))
NotImplementedError: type = {pattern}, token = {start}
I hope you can help here.
Best regards
Parsing fails for
if [[ -f "../build/tmp/dklm/klm_exports.h" ]]
(I plan to keep working on this, but I wanted to make everyone aware of it first.)
The README indicates that the following should work:
>>> bashlex.split('cat <(echo "a $(echo b)") | tee')
['cat', '<(echo "a $(echo b)")', '|', 'tee']
However, when I run it, I get a generator (see #13) that cannot be converted to a list without raising an error. For example:
>>> list(bashlex.split('cat <(echo "a $(echo b)") | tee'))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "bashlex/tokenizer.py", line 1176, in split
doublequoted, 0, 0)
File "bashlex/subst.py", line 225, in _expandwordinternal
node, sindex[0] = _extractprocesssubst(parserobj, string, tindex)
File "bashlex/subst.py", line 61, in _extractprocesssubst
node, si = _parsedolparen(parserobj, string, sindex)
File "bashlex/subst.py", line 31, in _parsedolparen
copiedps = copy.copy(parserobj.parserstate)
AttributeError: 'tokenizer' object has no attribute 'parserstate'
Here's an online demo. Note that the command can be simplified to $(echo) or `echo` (the latter raises a slightly different error).
The following bash code throws a parsing error
let
X=1
ParsingError: unexpected token '\n' (position 3)
Any plans to fix this?
bashlex/utils.py:3: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated since Python 3.3, and in 3.9 it will stop working
class typedset(collections.MutableSet):
bashlex/utils.py:51
bashlex/utils.py:51: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated since Python 3.3, and in 3.9 it will stop working
class frozendict(collections.Mapping):
-- Docs: https://docs.pytest.org/en/latest/warnings.html
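The warnings above come from importing the abstract base classes from collections rather than collections.abc. A minimal sketch of a version-tolerant import follows; the frozendict class here is a simplified stand-in written for illustration, not the actual class in bashlex/utils.py.

```python
# Prefer collections.abc (Python 3.3+); fall back for very old interpreters.
try:
    from collections.abc import Mapping, MutableSet
except ImportError:  # Python 2 only
    from collections import Mapping, MutableSet


class frozendict(Mapping):
    """Illustrative read-only mapping (hypothetical, not bashlex's own code)."""

    def __init__(self, *args, **kwargs):
        self._data = dict(*args, **kwargs)

    def __getitem__(self, key):
        return self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


fd = frozendict(a=1, b=2)
print(fd["a"], len(fd), isinstance(fd, Mapping))
```

Because Mapping only supplies read methods, the stand-in has no __setitem__, so item assignment on it raises TypeError.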
I just wanted to say thanks for making this library. It's been super-useful while I've been implementing a feature in cibuildwheel. Parsing is super hard, and this seems to nail it! :)
By the way, have you ever tried (or do you know of any project that tries) to execute the AST, or even just evaluate CommandNodes, CommandsubstitutionNodes and ParameterNodes? I'm working on something that does that at the moment :)
/usr/lib/python3.5/site-packages/bashlex/__init__.py in <module>()
----> 1 import parser, tokenizer
ImportError: No module named 'tokenizer'
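This error is what Python 3 produces for an implicit relative import: inside a package, "import tokenizer" looks for a top-level module rather than a sibling. A minimal sketch of the explicit form follows, using a throwaway package named demo_pkg (a hypothetical name created only for this demonstration, unrelated to bashlex's real layout).

```python
import importlib
import os
import sys
import tempfile

with tempfile.TemporaryDirectory() as root:
    pkg = os.path.join(root, "demo_pkg")
    os.mkdir(pkg)
    # A sibling module inside the package.
    with open(os.path.join(pkg, "tokenizer.py"), "w") as f:
        f.write("NAME = 'tokenizer'\n")
    # Explicit relative import: works on Python 3 (and on 2.5+).
    with open(os.path.join(pkg, "__init__.py"), "w") as f:
        f.write("from . import tokenizer\n")

    sys.path.insert(0, root)
    try:
        demo_pkg = importlib.import_module("demo_pkg")
        imported_name = demo_pkg.tokenizer.NAME
    finally:
        sys.path.remove(root)

print(imported_name)
```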
Attempting to parse a script with an array declaration fails upon encountering the opening parenthesis, i.e. (.
The following bashlex information was provided by pip:
$ pip show bashlex
Name: bashlex
Version: 0.18
Summary: Python parser for bash
Home-page: https://github.com/idank/bashlex.git
Author: Idan Kamara
Author-email: [email protected]
License: GPLv3+
Location: /home/user/.local/lib/python3.10/site-packages
Requires:
Required-by:
In a Python interactive session with the following setup:
Python 3.10.6 (main, Mar 10 2023, 10:55:28) [GCC 11.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import bashlex
Running the bashlex.parse function with the string declare -a CMDS=() produces the following output:
>>> bashlex.parse('declare -a CMDS=()')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/user/.local/lib/python3.10/site-packages/bashlex/parser.py", line 610, in parse
parts = [p.parse()]
File "/home/user/.local/lib/python3.10/site-packages/bashlex/parser.py", line 691, in parse
tree = theparser.parse(lexer=self.tok, context=self)
File "/home/user/.local/lib/python3.10/site-packages/bashlex/yacc.py", line 537, in parse
tok = self.errorfunc(errtoken)
File "/home/user/.local/lib/python3.10/site-packages/bashlex/parser.py", line 548, in p_error
raise errors.ParsingError('unexpected token %r' % p.value,
bashlex.errors.ParsingError: unexpected token '(' (position 16)
Without the parentheses, parsing succeeds:
>>> bashlex.parse('declare -a CMDS')
[CommandNode(parts=[WordNode(parts=[] pos=(0, 7) word='declare'), WordNode(parts=[] pos=(8, 10) word='-a'), WordNode(parts=[] pos=(11, 15) word='CMDS')] pos=(0, 15))]
It's independent of the declare keyword:
>>> bashlex.parse('CMDS=()')
bashlex.errors.ParsingError: unexpected token '(' (position 5)
The error occurs when appending to the array as well:
>>> bashlex.parse('CMDS+=("init")')
bashlex.errors.ParsingError: unexpected token '(' (position 6)
Parsing parentheses is not in itself the issue:
>>> bashlex.parse('(env)')
[CompoundNode(list=[ReservedwordNode(pos=(0, 1) word='('), CommandNode(parts=[WordNode(parts=[] pos=(1, 4) word='env')] pos=(1, 4)), ReservedwordNode(pos=(4, 5) word=')')] pos=(0, 5) redirects=[])]
The lexer seems to recognize arrays as WordNodes:
>>> bashlex.parse('ARRAY[1]=init')
[CommandNode(parts=[WordNode(parts=[] pos=(0, 13) word='ARRAY[1]=init')] pos=(0, 13))]
>>> bashlex.parse('echo ${ARRAY[*]}')
[CommandNode(parts=[WordNode(parts=[] pos=(0, 4) word='echo'), WordNode(parts=[ParameterNode(pos=(5, 16) value='ARRAY[*]')] pos=(5, 16) word='${ARRAY[*]}')] pos=(0, 16))]
>>> bashlex.parse('unset ARRAY[1]')
[CommandNode(parts=[WordNode(parts=[] pos=(0, 5) word='unset'), WordNode(parts=[] pos=(6, 14) word='ARRAY[1]')] pos=(0, 14))]
It just seems to have issues recognizing array sets when performing assignments.
To reproduce do:
import bashlex
bashlex.parser.parse('echo $(pwd && pwd)')
Results in ParsingError: unexpected token ')' (position 10)
Parsing also fails for 'echo $(pwd || pwd)' and 'echo $(pwd & pwd)', but 'echo $(pwd ; pwd)' parses just fine.
The tokenizer fails to properly parse files in which lines are continued with a trailing backslash.
For example:
for hook in \
/etc/* \
/lib/* \
/etc/*
do
echo hook
done
Results in the following exception:
Exception has occurred: ParsingError (note: full exception trace is shown but execution is paused at: <module>)
unexpected token '/etc/*' (position 15)
File "/bashlex/bashlex/parser.py", line 589, in p_error
raise errors.ParsingError('unexpected token %r' % p.value,
File "/bashlex/bashlex/yacc.py", line 1107, in parseopt_notrack
tok = self.errorfunc(errtoken)
File "/bashlex/bashlex/yacc.py", line 277, in parse
return self.parseopt_notrack(input,lexer,debug,tracking,tokenfunc,context)
File "/bashlex/bashlex/parser.py", line 733, in parse
tree = theparser.parse(lexer=self.tok, context=self)
File "/bashlex/bashlex/parser.py", line 652, in parse
parts = [p.parse()]
File "/bashlex/example.py", line 4, in <module> (Current frame)
parts = bashlex.parse(script)
Parsing a file with blank lines between statements is not supported. For the following script:
echo "Line 1"

echo "Line 3"
The sample program (the one in the README) generates the following error:
Traceback (most recent call last):
File "sp.py", line 4, in <module>
parts = bashlex.parse(open(sys.argv[1]).read())
File "/usr/local/lib/python2.7/dist-packages/bashlex/parser.py", line 595, in parse
part = _parser(s[index:], strictmode=strictmode).parse()
File "/usr/local/lib/python2.7/dist-packages/bashlex/parser.py", line 641, in parse
tree = theparser.parse(lexer=self.tok, context=self)
File "/usr/local/lib/python2.7/dist-packages/bashlex/yacc.py", line 277, in parse
return self.parseopt_notrack(input,lexer,debug,tracking,tokenfunc,context)
File "/usr/local/lib/python2.7/dist-packages/bashlex/yacc.py", line 1079, in parseopt_notrack
tok = self.errorfunc(errtoken)
File "/usr/local/lib/python2.7/dist-packages/bashlex/parser.py", line 539, in p_error
p.lexer.source, p.lexpos)
bashlex.errors.ParsingError: unexpected token 'echo' (position 1)
Hey,
I am trying to traverse the bashlex.ast.node tree to find something specific, such as whether the bash command writes something to a temp directory or deletes a file.
So far I have been checking manually:
if (tree[0].tree[i].word) == 'rm' :
return command
But I assume this is not the right way to traverse the bashlex AST. What if I need to find the commands that write to the temp directory?
Can you shed some light on how I can efficiently traverse the AST and fulfill the above requirement?
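One way to approach this is a generic depth-first walk that checks each node's kind instead of indexing into fixed positions. The sketch below is duck-typed and runs without bashlex: the stand-in nodes are built with SimpleNamespace and only mimic the attributes visible in the dumps on this page (kind, parts, list, redirects, output, heredoc, word, type). The command attribute on command substitution nodes is an assumption, and the helper names are hypothetical. bashlex itself also provides a visitor base class, bashlex.ast.nodevisitor, which may be a better fit in real code.

```python
from types import SimpleNamespace


def iter_nodes(node):
    """Yield node and all of its descendants, depth-first."""
    yield node
    # Attributes that hold lists of child nodes.
    for attr in ("parts", "list", "redirects"):
        for child in getattr(node, attr, None) or []:
            yield from iter_nodes(child)
    # Attributes that hold a single child node (or a non-node value).
    for attr in ("output", "heredoc", "command"):
        child = getattr(node, attr, None)
        if hasattr(child, "kind"):
            yield from iter_nodes(child)


def commands_named(trees, name):
    """Yield the word list of every command whose first word is `name`."""
    for tree in trees:
        for n in iter_nodes(tree):
            if n.kind == "command":
                words = [p.word for p in n.parts if p.kind == "word"]
                if words and words[0] == name:
                    yield words


def writes_under(trees, prefix):
    """Yield redirect targets (> or >>) whose path starts with `prefix`."""
    for tree in trees:
        for n in iter_nodes(tree):
            if n.kind == "redirect" and n.type in (">", ">>"):
                target = getattr(n.output, "word", None)
                if target and target.startswith(prefix):
                    yield target


def node(kind, **attrs):
    # Stand-in for a bashlex AST node, for demonstration only.
    return SimpleNamespace(kind=kind, **attrs)


def word(text):
    return node("word", word=text, parts=[])


# Hand-built trees shaped like: rm -f a.txt ; echo hi > /tmp/out
demo = [
    node("command", parts=[word("rm"), word("-f"), word("a.txt")]),
    node("command", parts=[
        word("echo"), word("hi"),
        node("redirect", input=None, type=">", output=word("/tmp/out"), heredoc=None),
    ]),
]

rm_commands = list(commands_named(demo, "rm"))
tmp_writes = list(writes_under(demo, "/tmp/"))
print(rm_commands, tmp_writes)
```

With bashlex installed, the list returned by bashlex.parse(...) would take the place of demo. Note that this only catches writes made through redirections; commands such as cp or tee that write to a path given as an argument would need their own checks.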
As a noob to Linux, I hated seeing tutorials online that shows which commands to use, but gave arguments along with them and didn't explain what the option meant. I use Explainshell everyday to learn more about commands and now feel a lot more comfortable running tutorial commands knowing exactly what they do.
I actually wondered if there was something like this, only wondered for a few week before finding this though. Just wanted to say thank you :)
Hello,
I am packaging bashlex as a conda package but the license file is not available in the PyPI tarball.
Could you please include it in the next release?
xref: conda-forge/staged-recipes#4401
Best regards,
Sebastian
To facilitate broader coverage of the analyzer, it would be good for the parser to add "unimplemented nodes" to the AST rather than raising an error. This can be done as follows:
$ git-diff bashlex/parser.py
...
+from mezcla import system
+
+ADD_UNIMPLEMENTED_NODE = system.getenv_bool("ADD_UNIMPLEMENTED_NODE", False,
+ "Add unimplemented nodes to parse tree")
+
from bashlex import yacc, tokenizer, state, ast, subst, flags, errors, heredoc
def _partsspan(parts):
@@ -13,14 +19,21 @@ precedence = (
)
def handleNotImplemented(p, type):
- if len(p) == 2:
+ if ADD_UNIMPLEMENTED_NODE:
+ parts = _makeparts(p)
+ p[0] = ast.node(kind='unimplemented', parts=parts, pos=_partsspan(parts))
+ elif len(p) == 2:
raise NotImplementedError('type = {%s}, token = {%s}' % (type, p[1]))
else:
raise NotImplementedError('type = {%s}, token = {%s}, parts = {%s}' % (type, p[1], p[2]))
This way, a parse tree can still be recovered even though a particular construct is not supported:
$ ADD_UNIMPLEMENTED_NODE=1 python -c 'import bashlex; print(bashlex.parse("case fu in esac")[0].dump())'
UnimplementedNode(pos=(0, 15), parts=[
ReservedwordNode(pos=(0, 4), word='case'),
WordNode(pos=(5, 7), word='fu'),
ReservedwordNode(pos=(8, 10), word='in'),
ReservedwordNode(pos=(11, 15), word='esac'),
])
I can add a pull request for this if you want.
Currently the p_arith_command and _extractcommandsubst functions raise a NotImplementedError when an arithmetic expression is found. In bash-master/make_cmd line 430, the make_arith_command function simply sets the .value attribute to the string, sets the flags to zero, gives the node the type cm_arith, and sets the redirects to null. Would adding this implementation to the p_arith_command function be an acceptable fix? subst.py would also need to be changed to implement these functions. If you just call _parsedolparen on the arithmetic expression, the parsing seems to work just fine. The parens shouldn't be parsed as nodes and the node type should be 'arith_cmd', but those are easy fixes. Is there something I am missing as to why these aren't implemented?
I am trying to parse the following bash script (simplified example) and print the produced AST as JSON using bashlex 0.12:
function a {
a;
}
# Comment
But it fails:
File "/usr/local/lib/python2.7/dist-packages/bashlex/parser.py", line 614, in parse
part = _parser(s[index:], strictmode=strictmode).parse()
File "/usr/local/lib/python2.7/dist-packages/bashlex/parser.py", line 682, in parse
tree = theparser.parse(lexer=self.tok, context=self)
File "/usr/local/lib/python2.7/dist-packages/bashlex/yacc.py", line 277, in parse
return self.parseopt_notrack(input,lexer,debug,tracking,tokenfunc,context)
File "/usr/local/lib/python2.7/dist-packages/bashlex/yacc.py", line 1079, in parseopt_notrack
tok = self.errorfunc(errtoken)
File "/usr/local/lib/python2.7/dist-packages/bashlex/parser.py", line 539, in p_error
p.lexer.source, p.lexpos)
bashlex.errors.ParsingError: unexpected token '\n' (position 10)
A trivial workaround is to wrap the code in any other construct, the simplest being a set of curly braces. Then everything works just fine:
{
function a {
a;
}
# Comment
}
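The same wrapping can be applied programmatically before handing a script to the parser. This is a minimal sketch of the workaround described above; wrap_in_braces is a hypothetical helper name, and the call to bashlex.parse is left out so the snippet runs without bashlex installed.

```python
def wrap_in_braces(script):
    """Wrap a script in a { ... } group, as in the manual workaround."""
    # Drop trailing newlines so the closing brace follows the last line directly.
    return "{\n" + script.rstrip("\n") + "\n}"


script = "function a {\n  a;\n}\n# Comment\n"
wrapped = wrap_in_braces(script)
print(wrapped)
```

Keep in mind that node positions reported for the wrapped text are shifted by the two characters of the added "{" line relative to the original script.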
Of course I can live with the workaround but I think it would be great if you took a look at it.
Thanks a lot for the great job you've done!
https://www.gnu.org/software/bash/manual/html_node/ANSI_002dC-Quoting.html
Expected result:
>>> list(bashlex.split("echo $'hello'"))
['echo', 'hello']
>>> list(bashlex.split("echo $'hello\\nworld'"))
['echo', 'hello\nworld'] # notice \\n becomes a real newline character \n
Actual result (bashlex 0.15):
>>> list(bashlex.split("echo $'hello'"))
['echo', '$hello']
>>> list(bashlex.split("echo $'hello\\nworld'"))
['echo', '$hellonworld']
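To make the expected behaviour concrete, here is a minimal sketch of decoding the text between $' and '. It covers only a subset of the escapes in the Bash manual page linked above (single-character escapes, \xHH and octal \nnn); \u, \U and \cx are deliberately left out. decode_ansi_c is a hypothetical helper, not part of bashlex.

```python
import re

# Single-character escapes from the Bash manual's ANSI-C quoting section.
_SIMPLE = {
    "a": "\a", "b": "\b", "e": "\x1b", "E": "\x1b", "f": "\f",
    "n": "\n", "r": "\r", "t": "\t", "v": "\v",
    "\\": "\\", "'": "'", '"': '"', "?": "?",
}

# A backslash followed by \xHH, up to three octal digits, or any one character.
_ESCAPE = re.compile(r"\\(x[0-9A-Fa-f]{1,2}|[0-7]{1,3}|.)", re.DOTALL)


def decode_ansi_c(body):
    """Decode the contents of a $'...' string (subset of Bash's escapes)."""
    def replace(match):
        esc = match.group(1)
        if esc[0] == "x" and len(esc) > 1:
            return chr(int(esc[1:], 16))
        if esc[0] in "01234567":
            return chr(int(esc, 8) & 0xFF)
        # Unknown escapes are kept unchanged, backslash included.
        return _SIMPLE.get(esc, "\\" + esc)
    return _ESCAPE.sub(replace, body)


decoded = decode_ansi_c(r"hello\nworld")
print(repr(decoded))
```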
Quoting Bash Guide for Beginners :: 10.2.1. Creating arrays:
Array variables may also be created using compound assignments in this format:
ARRAY=(value1 value2 ... valueN)
I have lots of scripts with such statements:
ARRAY=('value1' 'value2')
However they raise a ParsingError:
File "/usr/local/lib/python2.7/dist-packages/bashlex/parser.py", line 682, in parse
tree = theparser.parse(lexer=self.tok, context=self)
File "/usr/local/lib/python2.7/dist-packages/bashlex/yacc.py", line 277, in parse
return self.parseopt_notrack(input,lexer,debug,tracking,tokenfunc,context)
File "/usr/local/lib/python2.7/dist-packages/bashlex/yacc.py", line 1079, in parseopt_notrack
tok = self.errorfunc(errtoken)
File "/usr/local/lib/python2.7/dist-packages/bashlex/parser.py", line 539, in p_error
p.lexer.source, p.lexpos)
bashlex.errors.ParsingError: unexpected token '(' (position 6)
Hi,
I was about to make a pull request for a command substitution that returns a wrong position when the commands inside it contain spaces, e.g. $(foo ). But then I discovered another problem that I couldn't figure out how to fix: parsing fails when there is a semicolon in the list of commands.
Example: parsing $(foo;) fails with the error below. Can you tell me where the problem lies? It is a list, but I couldn't find where to fix it. It does not happen with the other form of command substitution, `command;`. I know the two forms are treated differently in the parser.
File "python3.6/site-packages/bashlex/parser.py", line 605, in parse
parts = [p.parse()]
File "python3.6/site-packages/bashlex/parser.py", line 686, in parse
tree = theparser.parse(lexer=self.tok, context=self)
File "python3.6/site-packages/bashlex/yacc.py", line 277, in parse
return self.parseopt_notrack(input,lexer,debug,tracking,tokenfunc,context)
File "python3.6/site-packages/bashlex/yacc.py", line 998, in parseopt_notrack
p.callable(pslice)
File "python3.6/site-packages/bashlex/parser.py", line 157, in p_simple_command_element
p[0] = [_expandword(parserobj, p.slice[1])]
File "python3.6/site-packages/bashlex/parser.py", line 137, in _expandword
doublequoted, 0, 0)
File "python3.6/site-packages/bashlex/subst.py", line 271, in _expandwordinternal
node, sindex[0] = _paramexpand(parserobj, string, sindex[0])
File "python3.6/site-packages/bashlex/subst.py", line 165, in _paramexpand
return _extractcommandsubst(parserobj, string, zindex + 1)
File "python3.6/site-packages/bashlex/subst.py", line 55, in _extractcommandsubst
node, si = _parsedolparen(parserobj, string, sindex)
File "python3.6/site-packages/bashlex/subst.py", line 42, in _parsedolparen
node, endp = _recursiveparse(parserobj, base, sindex, tokenizerargs)
File "python3.6/site-packages/bashlex/subst.py", line 23, in _recursiveparse
node = p.parse()
File "python3.6/site-packages/bashlex/parser.py", line 686, in parse
tree = theparser.parse(lexer=self.tok, context=self)
File "python3.6/site-packages/bashlex/yacc.py", line 277, in parse
return self.parseopt_notrack(input,lexer,debug,tracking,tokenfunc,context)
File "python3.6/site-packages/bashlex/yacc.py", line 1079, in parseopt_notrack
tok = self.errorfunc(errtoken)
File "python3.6/site-packages/bashlex/parser.py", line 543, in p_error
p.lexer.source, p.lexpos)
bashlex.errors.ParsingError: unexpected token ')' (position 4)
Could you please also include a wheel when releasing bashlex? Only the classic ".egg" is provided, so this triggers a SDist install (I don't think .egg's are used anymore, setuptools-scm is the only other package I know still also providing an egg). See reasons listed here: https://pythonwheels.com for why wheels are nice, even for pure-python packages (faster, better security, pre-generates .pyc's, etc). Thank you!
pip wheel . will make one, or use pip install build && python -m build (may be best with a pyproject.toml too, which is also a good idea, but I think it works in legacy mode for simple packages).
I wonder if it is possible to convert the parsed AST back into a valid bash script? Since the grammar is already there, in theory nothing stops it from doing so, right?
This happens on explainshell.com.
For the following code:
code = '''cat << EOF
abc
def
EOF'''
ret = bashlex.parse(code)
print(ret[0].dump())
we got:
CommandNode(pos=(0, 9), parts=[
WordNode(pos=(0, 3), word='cat'),
RedirectNode(heredoc=
HeredocNode(pos=(10, 21), value='abc\ndef\nEOF'), output=
WordNode(pos=(6, 9), word='EOF'), pos=(4, 21), type='<<'),
])
that's fine so far.
but for the code:
code = '''function foo () {
cat << EOF
abc
def
EOF
}'''
ret = bashlex.parse(code)
print(ret[0].dump())
we got:
FunctionNode(pos=(0, 40), parts=[
ReservedwordNode(pos=(0, 8), word='function'),
WordNode(pos=(9, 12), word='foo'),
ReservedwordNode(pos=(12, 13), word='('),
ReservedwordNode(pos=(13, 14), word=')'),
CompoundNode(list=[
ReservedwordNode(pos=(15, 16), word='{'),
ListNode(pos=(17, 39), parts=[
CommandNode(pos=(17, 26), parts=[
WordNode(pos=(17, 20), word='cat'),
RedirectNode(heredoc=
HeredocNode(pos=(31, 38), value='def\nEOF'), output=
WordNode(pos=(23, 26), word='EOF'), pos=(21, 26), type='<<'),
]),
OperatorNode(op='\n', pos=(26, 27)),
CommandNode(pos=(27, 30), parts=[
WordNode(pos=(27, 30), word='abc'),
]),
OperatorNode(op='\n', pos=(30, 39)),
]),
ReservedwordNode(pos=(39, 40), word='}'),
], pos=(15, 40)),
])
In this case, abc is no longer part of the heredoc; it comes out as a standalone CommandNode.
A space between \ and \n causes the tokenizer not to treat the \ as an independent, removable character, as it would if there were no space. Bash treats these the same, so it makes sense for the parser to do so as well.
I'm seeing a strange bug with variable assignments
>>> list(bashlex.split("PATH=\"$PATH:/usr/local/bin/\""))
['PATH="$PATH:/usr/local/bin/"']
^ ^
# note the quote marks /
>>> list(bashlex.split("PATH2=\"$PATH:/usr/local/bin/\""))
['PATH2=$PATH:/usr/local/bin/']
# the quote marks are gone!
In the above example, it seems to be the number in the env var name that triggers the removal of quotes.
The following example shows that a preceding variable assignment with a number in its name triggers the different quote behaviour.
>>> list(bashlex.split("VAR_ABC=1 PATH=\"$PATH:/usr/local/bin/\""))
['VAR_ABC=1', 'PATH="$PATH:/usr/local/bin/"']
^ ^
# note the quote marks /
>>> list(bashlex.split("VAR_123=1 PATH=\"$PATH:/usr/local/bin/\""))
['VAR_123=1', 'PATH=$PATH:/usr/local/bin/']
# the quote marks are gone!
Retaining the quotes is desirable for my use case. I can work around it, so I'm just wondering whether this is a bug in bashlex or some strange bash behaviour.
Parsing a file with comments is not supported. For the following script:
# A comment
echo "A script with a comment"
The sample program (the one in the README) generates the following error:
Traceback (most recent call last):
File "sp.py", line 4, in <module>
parts = bashlex.parse(open(sys.argv[1]).read())
File "/usr/local/lib/python2.7/dist-packages/bashlex/parser.py", line 582, in parse
parts = [p.parse()]
File "/usr/local/lib/python2.7/dist-packages/bashlex/parser.py", line 641, in parse
tree = theparser.parse(lexer=self.tok, context=self)
File "/usr/local/lib/python2.7/dist-packages/bashlex/yacc.py", line 277, in parse
return self.parseopt_notrack(input,lexer,debug,tracking,tokenfunc,context)
File "/usr/local/lib/python2.7/dist-packages/bashlex/yacc.py", line 1079, in parseopt_notrack
tok = self.errorfunc(errtoken)
File "/usr/local/lib/python2.7/dist-packages/bashlex/parser.py", line 539, in p_error
p.lexer.source, p.lexpos)
bashlex.errors.ParsingError: unexpected token 'echo' (position 12)
I'm using bashlex to parse build log files to extract compilation commands. I've just realized that when single-line strings with comments are passed to the parser, it fails, raising the exception below:
Traceback (most recent call last):
File "/bin/compiledb", line 11, in <module>
load_entry_point('compiledb', 'console_scripts', 'compiledb')()
File "/usr/lib/python3.7/site-packages/click/core.py", line 722, in __call__
return self.main(*args, **kwargs)
File "/usr/lib/python3.7/site-packages/click/core.py", line 697, in main
rv = self.invoke(ctx)
File "/usr/lib/python3.7/site-packages/click/core.py", line 1043, in invoke
return Command.invoke(self, ctx)
File "/usr/lib/python3.7/site-packages/click/core.py", line 895, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/lib/python3.7/site-packages/click/core.py", line 535, in invoke
return callback(*args, **kwargs)
File "/usr/lib/python3.7/site-packages/click/decorators.py", line 17, in new_func
return f(get_current_context(), *args, **kwargs)
File "/home/nick/projects/compiledb/compiledb-generator/compiledb/cli.py", line 74, in cli
done = generate(infile, outfile, build_dir, exclude_files, verbose, overwrite, not no_strict)
File "/home/nick/projects/compiledb/compiledb-generator/compiledb/__init__.py", line 78, in generate
r = generate_json_compdb(infile, proj_dir=build_dir, verbose=verbose, exclude_files=exclude_files)
File "/home/nick/projects/compiledb/compiledb-generator/compiledb/__init__.py", line 34, in generate_json_compdb
result = parse_build_log(instream, proj_dir, exclude_files, verbose)
File "/home/nick/projects/compiledb/compiledb-generator/compiledb/parser.py", line 103, in parse_build_log
commands = CommandProcessor.process(line, working_dir)
File "/home/nick/projects/compiledb/compiledb-generator/compiledb/parser.py", line 163, in process
trees = bashlex.parser.parse(line)
File "/home/nick/sandbox/bashlex/bashlex/parser.py", line 611, in parse
ef.visit(parts[-1])
File "/home/nick/sandbox/bashlex/bashlex/ast.py", line 35, in visit
k = n.kind
AttributeError: 'NoneType' object has no attribute 'kind'
Patch coming..
Hi,
I am using bashlex, installed system wide (in a container, but that's not the issue), and when I try to import it as an unprivileged user, it shows some errors:
# sudo -u user python -c "import bashlex"
Unable to create '/usr/lib/python3.7/site-packages/bashlex/parsetab.py'
[Errno 13] Permission denied: '/usr/lib/python3.7/site-packages/bashlex/parsetab.py'
Tracking down the issue, it seems that https://github.com/idank/bashlex/blob/master/bashlex/yacc.py#L3291 is the call that writes this file.
It would be good for array assignments to be flagged as unimplemented when the new proceedonerror flag is enabled. This way, a complete AST can still be generated.
Currently, array assignment leads to a parsing error:
$ snippet='num=2 arr=(1 2 3)'
$ python -c "import bashlex; print(''.join(p.dump() for p in bashlex.parse('$snippet', proceedonerror=0)))"
Traceback (most recent call last):
...
File "/usr/local/misc/programs/python/bashlex/bashlex/parser.py", line 587, in p_error
raise errors.ParsingError('unexpected token %r' % p.value,
bashlex.errors.ParsingError: unexpected token '(' (position 10)
It would be better to add an unimplemented node to the AST:
$ python -c "import bashlex; print(''.join(p.dump() for p in bashlex.parse('$snippet', proceedonerror=1)))"
CommandNode(pos=(0, 17), parts=[
AssignmentNode(pos=(0, 5), word='num=2'),
UnimplementedNode(pos=(6, 17), word='arr=(1 2 3)'),
])
This can be implemented as follows (see attachment for complete diff):
--- a/bashlex/flags.py
+++ b/bashlex/flags.py
@@ -52,4 +52,5 @@ word = enum.Enum('wordflags', [
+ 'UNIMPLEMENTED', # word uses unimplemented feature (e.g., array)
--- a/bashlex/parser.py
+++ b/bashlex/parser.py
@@ -173,6 +173,8 @@ def p_simple_command_element(p):
+ if (p.slice[1].flags & flags.word.UNIMPLEMENTED):
+ p[0][0].kind = 'unimplemented'
@@ -720,6 +722,7 @@ class _parser(object):
+ proceedonerror=proceedonerror,
--- a/bashlex/tokenizer.py
+++ b/bashlex/tokenizer.py
@@ -199,7 +199,8 @@ eoftoken = token(tokentype.EOF, None)
- lastreadtoken=None, tokenbeforethat=None, twotokensago=None):
+ lastreadtoken=None, tokenbeforethat=None, twotokensago=None,
+ proceedonerror=None):
@@ -232,6 +233,7 @@ class tokenizer(object):
+ self._proceedonerror = proceedonerror
@@ -391,7 +393,7 @@ class tokenizer(object):
- d['dollar_present'] = d['quoted'] = d['pass_next_character'] = d['compound_assignment'] = False
+ d['dollar_present'] = d['quoted'] = d['pass_next_character'] = d['compound_assignment'] = d['unimplemented'] = False
@@ -467,6 +469,19 @@ class tokenizer(object):
+ def handlecompoundassignment():
+ # note: only finds matching parenthesis, so parsing can proceed
+ handled = False
+ if self._proceedonerror:
+ ttok = self._parse_matched_pair(None, '(', ')')
+ if ttok:
+ tokenword.append(c)
+ tokenword.extend(ttok)
+ d['compound_assignment'] = True
+ d['unimplemented'] = True
+ handled = True
+ return handled
+
@@ -512,6 +527,8 @@ class tokenizer(object):
+ elif c == '(' and handlecompoundassignment():
+ gotonext = True
@@ -573,7 +590,7 @@ class tokenizer(object):
- if d['compound_assignment'] and tokenword[-1] == ')':
+ if d['compound_assignment'] and tokenword.value[-1] == ')':
@@ -581,6 +598,10 @@ class tokenizer(object):
+ if d['compound_assignment']:
+ tokenword.flags.add(wordflags.ASSIGNARRAY)
+ if d['unimplemented']:
+ tokenword.flags.add(wordflags.UNIMPLEMENTED)
unimplemented-array-node-diff.txt
I can work this into a pull request if desired. I wasn't quite sure of the best way to handle the flags, so suggestions would be welcome. For example, I was going to use parser flags, but they seemed more related to internal state than final attribute.
Hi,
The current release, 0.16, came out in September 2021; since then you made a change to fix blank-line handling that has never been released.
Could you publish a new release, 0.17?
Line 42 in 9017528
(root) # pip install bashlex
LICENSE put at /usr/local/LICENSE
(root) # pip install --user bashlex
LICENSE put at /root/.local/LICENSE
(venv) $ pip install bashlex
LICENSE put at venv/LICENSE
None of these locations seems related to bashlex at first glance.
I think this is too aggressive. Considering that the LICENSE file is included in the tarball, and that after installing bashlex it can be found in site-packages/bashlex-0.13.dist-info/LICENSE, is this still necessary?
I checked the tests and there is a test case for comments, but in my application, when I tried to parse a comment (e.g. # foo), it failed:
File "lib/python3.6/site-packages/bashlex/parser.py", line 611, in parse
ef.visit(parts[-1])
File "lib/python3.6/site-packages/bashlex/ast.py", line 35, in visit
k = n.kind
builtins.AttributeError: 'NoneType' object has no attribute 'kind'
I used the latest version 0.14.
Working off of the base for PR #71 so this will be relevant after that PR is merged.
Multiple newlines at the end of an input trigger an "Unexpected EOF" error at line 546, in p_error.
Minimal example:
from bashlex import parse
parts = parse('cmd1\n\n')
This is not the case for a single newline at the end of the file (as of PR #71).
There's no tag for 0.12, which is mentioned on PyPI.
Hey idank, what precise version of bash did you use to build this? How difficult is it to redo or update?
>>> bashlex.parse('cmd1\ncmd2 \ncmd3\n')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "bashlex/parser.py", line 614, in parse
part = _parser(s[index:], strictmode=strictmode).parse()
File "bashlex/parser.py", line 682, in parse
tree = theparser.parse(lexer=self.tok, context=self)
File "bashlex/yacc.py", line 277, in parse
return self.parseopt_notrack(input,lexer,debug,tracking,tokenfunc,context)
File "bashlex/yacc.py", line 1079, in parseopt_notrack
tok = self.errorfunc(errtoken)
File "bashlex/parser.py", line 539, in p_error
p.lexer.source, p.lexpos)
bashlex.errors.ParsingError: unexpected token 'cmd3' (position 1)
When bashlex is used for the first time, it prints:
WARNING: Token 'COND_ERROR' defined, but not used
WARNING: There is 1 unused token
If an assignment is used after local, global, or export, it is treated as a word node, not an assignment node.
Hi,
I am using bashlex to parse some shell commands, and I encountered a problem with command arguments that are enclosed in quotes '...': the word node does not include the surrounding quotes.
Example:
$ awk '{print $0};' /tmp/test
The dump of the tree node outputs only {print $0};. The correct token should be '{print $0};'
CommandNode(pos=(0, 25), parts=[
WordNode(pos=(0, 3), word='awk'),
WordNode(pos=(4, 15), word='print $0;'),
WordNode(pos=(16, 25), word='/tmp/test'),
])
I want to change the parser/tokenizer, and if you can point me to where I should make the change, I would be glad to do it.
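Until the word itself carries the quotes, one possible workaround is to slice the original source with each node's pos span, which covers the raw text including quotes. The sketch below uses the positions copied from the dump above and assumes the dump was produced from the command without braces, since its overall span of (0, 25) matches that length; it runs without bashlex.

```python
# Command assumed to correspond to the dump above (25 characters long).
source = "awk 'print $0;' /tmp/test"

# (start, end) spans of the three WordNodes, copied from the dump.
word_positions = [(0, 3), (4, 15), (16, 25)]

# Slicing by position recovers the raw text, quotes included.
raw_words = [source[start:end] for start, end in word_positions]
print(raw_words)
```

With a real tree, the spans would come from each WordNode's pos attribute instead of a hand-written list.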
I found a problem with multi-digit numbers after the $ sign. Bashlex only returns the first digit, as you can see in this example:
[ParameterNode(pos=(879, 881) value='1')] pos=(879, 883) word='$124')
It should be value='124' not only '1'
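For comparison, Bash itself treats only a single digit after $ as a positional parameter; positions with more than one digit must be written with braces, as in ${124}. Under that rule $124 expands as ${1} followed by the literal text 24. A minimal sketch of the rule, with split_positional as a hypothetical helper name:

```python
import re

# Either ${digits} or a single digit directly after the dollar sign.
_POSITIONAL = re.compile(r"\$(\{[0-9]+\}|[0-9])")


def split_positional(word):
    """Return (parameter name, remaining literal text), or None if no match."""
    match = _POSITIONAL.match(word)
    if match is None:
        return None
    name = match.group(1).strip("{}")
    return name, word[match.end():]


unbraced = split_positional("$124")
braced = split_positional("${124}")
print(unbraced, braced)
```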
The following seems to parse just fine:
foo && bar
However when I try to parse this:
foobar=$(foo && bar)
I get the following error:
bashlex.errors.ParsingError: unexpected token ')' (position 10)
The same goes for ||.
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/___/.local/lib/python3.11/site-packages/bashlex/parser.py", line 610, in parse
parts = [p.parse()]
^^^^^^^^^
File "/home/___/.local/lib/python3.11/site-packages/bashlex/parser.py", line 691, in parse
tree = theparser.parse(lexer=self.tok, context=self)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/___/.local/lib/python3.11/site-packages/bashlex/yacc.py", line 439, in parse
p.callable(pslice)
File "/home/___/.local/lib/python3.11/site-packages/bashlex/parser.py", line 167, in p_simple_command_element
p[0] = [_expandword(parserobj, p.slice[1])]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/___/.local/lib/python3.11/site-packages/bashlex/parser.py", line 145, in _expandword
parts, expandedword = subst._expandwordinternal(parser,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/___/.local/lib/python3.11/site-packages/bashlex/subst.py", line 271, in _expandwordinternal
node, sindex[0] = _paramexpand(parserobj, string, sindex[0])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/___/.local/lib/python3.11/site-packages/bashlex/subst.py", line 165, in _paramexpand
return _extractcommandsubst(parserobj, string, zindex + 1)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/___/.local/lib/python3.11/site-packages/bashlex/subst.py", line 55, in _extractcommandsubst
node, si = _parsedolparen(parserobj, string, sindex)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/___/.local/lib/python3.11/site-packages/bashlex/subst.py", line 42, in _parsedolparen
node, endp = _recursiveparse(parserobj, base, sindex, tokenizerargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/___/.local/lib/python3.11/site-packages/bashlex/subst.py", line 23, in _recursiveparse
node = p.parse()
^^^^^^^^^
File "/home/___/.local/lib/python3.11/site-packages/bashlex/parser.py", line 691, in parse
tree = theparser.parse(lexer=self.tok, context=self)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/___/.local/lib/python3.11/site-packages/bashlex/yacc.py", line 537, in parse
tok = self.errorfunc(errtoken)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/___/.local/lib/python3.11/site-packages/bashlex/parser.py", line 548, in p_error
raise errors.ParsingError('unexpected token %r' % p.value,
bashlex.errors.ParsingError: unexpected token ')' (position 10)
Hope this helps, thanks!
bashlex.errors.ParsingError: unexpected token '(' (position 7)
May be caused by the fact that '!' is interpreted as WORD instead of BANG.
Hi,
I have problems parsing a whole test file. The problem is that bashlex chokes on empty lines and comments. Could you please fix it? I would really like to use it.
Thanks
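Until this is fixed, one possible workaround is to strip such lines before handing the text to the parser. This is a rough sketch only: the function name is made up, and it assumes the script contains no here-documents or multi-line quoted strings, where a blank line or a line starting with # is real content.

```python
def drop_blank_and_comment_lines(source):
    """Remove empty lines and lines whose first non-blank character is '#'.

    Assumption: no here-documents or multi-line quoted strings.
    Trailing comments after a command are left untouched.
    """
    kept = []
    for line in source.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith('#'):
            continue  # blank line or comment-only line (this includes a shebang)
        kept.append(line)
    return '\n'.join(kept)

script = "#!/bin/sh\n\n# say hello\necho hello\n\necho bye  # trailing comment stays\n"
print(drop_blank_and_comment_lines(script))
```

The cleaned string can then be passed to the parser in place of the raw file contents. Whether that avoids every failure depends on what else is in the file.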
If a variable name has a number in it, the parser treats the whole assignment as a single word. In bash that is correct only when the first character is a digit: 2all=something is not treated as a variable assignment, but a2ll=something is. The parser currently treats both of these as not being assignment statements.
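The rule bash applies can be sketched with a regular expression: a name starts with a letter or underscore and continues with letters, digits or underscores, and an assignment word is such a name followed by = (or +=). The helper below is hypothetical, is not bashlex code, and ignores array subscripts such as a[1]=x.

```python
import re

# Bash identifier followed by '=' or '+='. Array subscripts are ignored here.
_ASSIGNMENT = re.compile(r'^[A-Za-z_][A-Za-z0-9_]*\+?=')

def is_assignment_word(word):
    """Return True if `word` starts with a valid bash name and '=' or '+='."""
    return _ASSIGNMENT.match(word) is not None

print(is_assignment_word('a2ll=something'))  # True
print(is_assignment_word('2all=something'))  # False
```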
Last one was in 2016. Without the additions in the most recent commit, bashlex fails to install for me because it tries to install enum34, which gets used over the standard enum.
Collecting compiledb
Downloading https://files.pythonhosted.org/packages/20/b8/b0912c8198baf67ebba62c46d21bbb16f03ff072eee782ee659dd11520ee/compiledb-0.9.8.tar.gz
Collecting click (from compiledb)
Downloading https://files.pythonhosted.org/packages/f8/5c/f60e9d8a1e77005f664b76ff8aeaee5bc05d0a91798afd7f53fc998dbc47/Click-7.0.tar.gz (286kB)
100% |████████████████████████████████| 286kB 5.8MB/s
Collecting bashlex (from compiledb)
Using cached https://files.pythonhosted.org/packages/e6/83/8f35a0a430908e5c964fbf31a8e46fbac125d1bbf066a1e26110c618a3ff/bashlex-0.12.tar.gz
Collecting enum34 (from bashlex->compiledb)
Downloading https://files.pythonhosted.org/packages/bf/3e/31d502c25302814a7c2f1d3959d2a3b3f78e509002ba91aea64993936876/enum34-1.1.6.tar.gz (40kB)
100% |████████████████████████████████| 40kB 8.5MB/s
Complete output from command python setup.py egg_info:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/Users/a3/.local/share/pythons/c/lib/python3.7/site-packages/setuptools/__init__.py", line 6, in <module>
import distutils.core
File "/usr/local/Cellar/python/3.7.2_2/Frameworks/Python.framework/Versions/3.7/lib/python3.7/distutils/core.py", line 16, in <module>
from distutils.dist import Distribution
File "/usr/local/Cellar/python/3.7.2_2/Frameworks/Python.framework/Versions/3.7/lib/python3.7/distutils/dist.py", line 9, in <module>
import re
File "/usr/local/Cellar/python/3.7.2_2/Frameworks/Python.framework/Versions/3.7/lib/python3.7/re.py", line 143, in <module>
class RegexFlag(enum.IntFlag):
AttributeError: module 'enum' has no attribute 'IntFlag'
----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /private/var/folders/lz/tm467dx170g12t9bg6mg9h8w0000gn/T/pip-install-vbmmlqvd/enum34/
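The usual way to avoid this is to depend on enum34 only on interpreters that lack the standard enum module, which was added in Python 3.4. A minimal sketch, assuming a setuptools-style setup.py; the helper name is hypothetical, and this is not necessarily how the bashlex commit does it:

```python
import sys

def install_requires(version_info=sys.version_info):
    """Hypothetical helper: require enum34 only where stdlib enum is missing."""
    requirements = []
    if tuple(version_info[:2]) < (3, 4):
        requirements.append('enum34')
    return requirements

print(install_requires((2, 7)))  # ['enum34']
print(install_requires((3, 7)))  # []
```

The same condition can also be written declaratively as an environment marker in the requirement string: 'enum34; python_version < "3.4"'.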
Parsing an if statement crashes if you try anything along the lines of [[ 0 -eq 0 ]]. The bug arises because state 41 has no transition to the states that parse words in the test.