bblfsh / python-client
Babelfish Python client
Home Page: https://doc.bblf.sh/using-babelfish/clients.html
License: Apache License 2.0
I tried to run bblfsh from Python on this file: https://github.com/INRIA/spoon/blob/master/src/test/resources/noclasspath/IsoEncoding.java
As you can see, it is a file with non-standard (non-UTF-8) characters.
If I run
import bblfsh
client = bblfsh.BblfshClient("0.0.0.0:9234")
res = client.parse("IsoEncoding.java")
I get
Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/google/protobuf/internal/type_checkers.py", line 182, in CheckValue
proposed_value = proposed_value.decode('utf-8')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 69: invalid continuation byte
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Applications/PyCharm.app/Contents/helpers/pydev/pydevd.py", line 1599, in <module>
globals = debugger.run(setup['file'], None, None, is_module)
File "/Applications/PyCharm.app/Contents/helpers/pydev/pydevd.py", line 1026, in run
pydev_imports.execfile(file, globals, locals) # execute the script
File "/Applications/PyCharm.app/Contents/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "/Users/k/sourced/ast2vec/how_to_use_ast2vec.py", line 4, in <module>
res = client.parse("../bad_file/IsoEncoding.java")
File "/usr/local/lib/python3.6/site-packages/bblfsh/client.py", line 56, in parse
language=self._scramble_language(language))
File "/usr/local/lib/python3.6/site-packages/google/protobuf/internal/python_message.py", line 526, in init
setattr(self, field_name, field_value)
File "/usr/local/lib/python3.6/site-packages/google/protobuf/internal/python_message.py", line 662, in field_setter
new_value = type_checker.CheckValue(new_value)
File "/usr/local/lib/python3.6/site-packages/google/protobuf/internal/type_checkers.py", line 187, in CheckValue
(proposed_value))
ValueError: b'package my.pack;\n\npublic class IsoEncoding {\n\n private String entr\xe9e;\n public IsoEncoding(String maCha\xeene) {\n this.entr\xe9e = maCha\xeene;\n }\n\n private String r\xe9cup\xe9rerUneEntr\xe9e() {\n return this.entr\xe9e;\n }\n}' has type bytes, but isn't valid UTF-8 encoding. Non-UTF-8 strings must be converted to unicode objects before being added.
Is this the expected behavior for non-UTF-8 files?
If so, I think the bblfsh client should provide a better error message.
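As a possible workaround (a sketch only, assuming the file is ISO-8859-1, as its name suggests): read the raw bytes, decode them with the real encoding, and re-encode as UTF-8 before sending them to the server. The `contents=` keyword of `BblfshClient.parse` is used below on the assumption that it accepts source text directly.

```python
# Sketch: convert ISO-8859-1 bytes to valid UTF-8 before parsing.
raw = b"private String entr\xe9e;"   # ISO-8859-1 bytes, invalid as UTF-8
text = raw.decode("iso-8859-1")      # decode with the file's real encoding
utf8 = text.encode("utf-8")          # re-encode as valid UTF-8
print(text)

# Hypothetical usage against a running bblfshd:
# res = client.parse("IsoEncoding.java", contents=utf8)
```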
Allow using XPath functions whose return values are not node lists. This is already supported in libuast; it needs support in the clients.
I ran pip install bblfsh on RHEL 7.4 and got:
Failed to build bblfsh
Installing collected packages: bblfsh, chardet, certifi, urllib3, idna, requests
Running setup.py install for bblfsh ... error
Complete output from command /opt/rh/rh-python35/root/usr/bin/python3 -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-phlmbc69/bblfsh/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-kkd6eyh4-record/install-record.txt --single-version-externally-managed --compile:
running install
running build
running build_py
creating build
creating build/lib.linux-x86_64-3.5
creating build/lib.linux-x86_64-3.5/bblfsh
copying bblfsh/__main__.py -> build/lib.linux-x86_64-3.5/bblfsh
copying bblfsh/launcher.py -> build/lib.linux-x86_64-3.5/bblfsh
copying bblfsh/aliases.py -> build/lib.linux-x86_64-3.5/bblfsh
copying bblfsh/sdkversion.py -> build/lib.linux-x86_64-3.5/bblfsh
copying bblfsh/__init__.py -> build/lib.linux-x86_64-3.5/bblfsh
copying bblfsh/client.py -> build/lib.linux-x86_64-3.5/bblfsh
copying bblfsh/test.py -> build/lib.linux-x86_64-3.5/bblfsh
creating build/lib.linux-x86_64-3.5/bblfsh/gopkg
copying bblfsh/gopkg/__init__.py -> build/lib.linux-x86_64-3.5/bblfsh/gopkg
creating build/lib.linux-x86_64-3.5/bblfsh/github
copying bblfsh/github/__init__.py -> build/lib.linux-x86_64-3.5/bblfsh/github
creating build/lib.linux-x86_64-3.5/bblfsh/gopkg/in
copying bblfsh/gopkg/in/__init__.py -> build/lib.linux-x86_64-3.5/bblfsh/gopkg/in
creating build/lib.linux-x86_64-3.5/bblfsh/gopkg/in/bblfsh
copying bblfsh/gopkg/in/bblfsh/__init__.py -> build/lib.linux-x86_64-3.5/bblfsh/gopkg/in/bblfsh
creating build/lib.linux-x86_64-3.5/bblfsh/gopkg/in/bblfsh/sdk
copying bblfsh/gopkg/in/bblfsh/sdk/__init__.py -> build/lib.linux-x86_64-3.5/bblfsh/gopkg/in/bblfsh/sdk
creating build/lib.linux-x86_64-3.5/bblfsh/gopkg/in/bblfsh/sdk/v1
copying bblfsh/gopkg/in/bblfsh/sdk/v1/__init__.py -> build/lib.linux-x86_64-3.5/bblfsh/gopkg/in/bblfsh/sdk/v1
creating build/lib.linux-x86_64-3.5/bblfsh/gopkg/in/bblfsh/sdk/v1/uast
copying bblfsh/gopkg/in/bblfsh/sdk/v1/uast/generated_pb2.py -> build/lib.linux-x86_64-3.5/bblfsh/gopkg/in/bblfsh/sdk/v1/uast
copying bblfsh/gopkg/in/bblfsh/sdk/v1/uast/__init__.py -> build/lib.linux-x86_64-3.5/bblfsh/gopkg/in/bblfsh/sdk/v1/uast
creating build/lib.linux-x86_64-3.5/bblfsh/gopkg/in/bblfsh/sdk/v1/protocol
copying bblfsh/gopkg/in/bblfsh/sdk/v1/protocol/generated_pb2.py -> build/lib.linux-x86_64-3.5/bblfsh/gopkg/in/bblfsh/sdk/v1/protocol
copying bblfsh/gopkg/in/bblfsh/sdk/v1/protocol/__init__.py -> build/lib.linux-x86_64-3.5/bblfsh/gopkg/in/bblfsh/sdk/v1/protocol
copying bblfsh/gopkg/in/bblfsh/sdk/v1/protocol/generated_pb2_grpc.py -> build/lib.linux-x86_64-3.5/bblfsh/gopkg/in/bblfsh/sdk/v1/protocol
creating build/lib.linux-x86_64-3.5/bblfsh/github/com
copying bblfsh/github/com/__init__.py -> build/lib.linux-x86_64-3.5/bblfsh/github/com
creating build/lib.linux-x86_64-3.5/bblfsh/github/com/gogo
copying bblfsh/github/com/gogo/__init__.py -> build/lib.linux-x86_64-3.5/bblfsh/github/com/gogo
creating build/lib.linux-x86_64-3.5/bblfsh/github/com/gogo/protobuf
copying bblfsh/github/com/gogo/protobuf/__init__.py -> build/lib.linux-x86_64-3.5/bblfsh/github/com/gogo/protobuf
creating build/lib.linux-x86_64-3.5/bblfsh/github/com/gogo/protobuf/gogoproto
copying bblfsh/github/com/gogo/protobuf/gogoproto/gogo_pb2.py -> build/lib.linux-x86_64-3.5/bblfsh/github/com/gogo/protobuf/gogoproto
copying bblfsh/github/com/gogo/protobuf/gogoproto/__init__.py -> build/lib.linux-x86_64-3.5/bblfsh/github/com/gogo/protobuf/gogoproto
running build_ext
curl -SL https://github.com/bblfsh/libuast/releases/download/v1.6.0/libuast-v1.6.0.tar.gz | tar xz
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 611 0 611 0 0 2395 0 --:--:-- --:--:-- --:--:-- 2405
100 21511 100 21511 0 0 42866 0 --:--:-- --:--:-- --:--:-- 42866
mv libuast-v1.6.0 libuast
cp -a libuast/src bblfsh/libuast
rm -rf libuast
building 'bblfsh.pyuast' extension
creating build/temp.linux-x86_64-3.5
creating build/temp.linux-x86_64-3.5/bblfsh
creating build/temp.linux-x86_64-3.5/bblfsh/libuast
g++ -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -D_GNU_SOURCE -fPIC -fwrapv -I/opt/rh/rh-python35/root/usr/include -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -D_GNU_SOURCE -fPIC -fwrapv -fPIC -Ibblfsh/libuast/ -I/usr/local/include -I/usr/local/include/libxml2 -I/usr/include -I/usr/include/libxml2 -I/opt/rh/rh-python35/root/usr/include/python3.5m -c bblfsh/pyuast.c -o build/temp.linux-x86_64-3.5/bblfsh/pyuast.o -std=c++11
g++ -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -D_GNU_SOURCE -fPIC -fwrapv -I/opt/rh/rh-python35/root/usr/include -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -D_GNU_SOURCE -fPIC -fwrapv -fPIC -Ibblfsh/libuast/ -I/usr/local/include -I/usr/local/include/libxml2 -I/usr/include -I/usr/include/libxml2 -I/opt/rh/rh-python35/root/usr/include/python3.5m -c bblfsh/libuast/uast.cc -o build/temp.linux-x86_64-3.5/bblfsh/libuast/uast.o -std=c++11
In file included from bblfsh/libuast/uast.cc:4:0:
bblfsh/libuast/uast_private.h:4:25: fatal error: libxml/tree.h: No such file or directory
#include <libxml/tree.h>
^
compilation terminated.
error: command 'g++' failed with exit status 1
----------------------------------------
Command "/opt/rh/rh-python35/root/usr/bin/python3 -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-phlmbc69/bblfsh/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-kkd6eyh4-record/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /tmp/pip-build-phlmbc69/bblfsh/
Additionally, the libxml2-dev package is installed on the system.
This is an enhancement suggestion.
Sometimes I don't need the full list, and returning an iterator instead of a list would be more Pythonic.
To discuss.
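To illustrate the proposal, here is a minimal sketch (Node is a namedtuple standing in for the real protobuf message) of a generator that yields matching nodes lazily instead of materializing a list:

```python
from collections import namedtuple

# Stand-in for the protobuf Node: just a token and a children list.
Node = namedtuple("Node", "token children")

def iter_tokens(root):
    """Yield nodes with a non-empty token, lazily, in pre-order."""
    stack = [root]
    while stack:
        node = stack.pop()
        if node.token:
            yield node
        stack.extend(reversed(node.children))

tree = Node("a", [Node("", [Node("b", [])]), Node("c", [])])
print([n.token for n in iter_tokens(tree)])  # pre-order tokens
```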
Original report by @AnneshaChowdhury: src-d/tmsc#10
Hi,
I am having a hard time with the installation of tmsc: I have been trying to get it up and running but keep hitting problems. I am installing inside a venv on Windows 10.
This is a snippet of the error I have been getting:
pyuast.cc
bblfsh/pyuast.cc(481): error C2059: syntax error: '.'
bblfsh/pyuast.cc(502): error C2143: syntax error: missing ';' before '}'
bblfsh/pyuast.cc(504): error C4430: missing type specifier - int assumed. Note: C++ does not support default-int
bblfsh/pyuast.cc(504): error C2040: 'ctx': 'int' differs in levels of indirection from 'Uast *'
bblfsh/pyuast.cc(504): error C2065: 'iface': undeclared identifier
bblfsh/pyuast.cc(505): error C2059: syntax error: 'return'
bblfsh/pyuast.cc(506): error C2059: syntax error: '}'
bblfsh/pyuast.cc(506): error C2143: syntax error: missing ';' before '}'
error: command 'C:\\Program Files (x86)\\Microsoft Visual Studio\\2017\\Community\\VC\\Tools\\MSVC\\14.11.25503\\bin\\HostX86\\x86\\cl.exe' failed with exit status 2
----------------------------------------
Command "c:\progra~1\python36\venv~1\scripts\python.exe -u -c "import setuptools, tokenize;__file__='C:\\Users\\Annesha\\AppData\\Local\\Temp\\pip-install-ur9kqskb\\bblfsh\\setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record C:\Users\Annesha\AppData\Local\Temp\pip-record-jp280ri4\install-record.txt --single-version-externally-managed --compile --install-headers c:\progra~1\python36\venv~1\include\site\python3.6\bblfsh" failed with error code 1 in C:\Users\Annesha\AppData\Local\Temp\pip-install-ur9kqskb\bblfsh\c\
Any help is appreciated!
Hi,
>>> for node in bblfsh.iterator(uasts[100], bblfsh.TreeOrder.POSITION_ORDER):
... print(node)
...
Segmentation fault (core dumped)
All the other iteration orders work well.
Following the installation notes in the README fails.
pip3 install bblfsh
leads to the error:
pip3 install bblfsh
Collecting bblfsh
Downloading bblfsh-0.1.0.tar.gz
Complete output from command python setup.py egg_info:
make: *** No rule to make target 'deps'. Stop.
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/tmp/pip-build-m2t1aa3i/bblfsh/setup.py", line 16, in <module>
subprocess.check_output(['make', 'deps'])
File "/usr/lib/python3.5/subprocess.py", line 626, in check_output
**kwargs).stdout
File "/usr/lib/python3.5/subprocess.py", line 708, in run
output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['make', 'deps']' returned non-zero exit status 2
----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-build-m2t1aa3i/bblfsh/
and
git clone https://github.com/bblfsh/client-python.git
cd client-python
make install
leads to this error:
Cloning into 'client-python'...
remote: Counting objects: 280, done.
remote: Compressing objects: 100% (17/17), done.
remote: Total 280 (delta 11), reused 21 (delta 8), pack-reused 255
Receiving objects: 100% (280/280), 131.22 KiB | 0 bytes/s, done.
Resolving deltas: 100% (132/132), done.
make: *** No rule to make target `install'. Stop.
I was able to install it using
git clone https://github.com/bblfsh/client-python.git
cd client-python
pip3 install .
import bblfsh
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python3.5/dist-packages/bblfsh/__init__.py", line 1, in <module>
from bblfsh.client import BblfshClient
File "/usr/local/lib/python3.5/dist-packages/bblfsh/client.py", line 6, in <module>
from bblfsh.aliases import (ParseRequest, NativeParseRequest, VersionRequest,
File "/usr/local/lib/python3.5/dist-packages/bblfsh/aliases.py", line 11, in <module>
"bblfsh.gopkg.in.bblfsh.sdk.%s.uast.generated_pb2" % VERSION)
File "/usr/lib/python3.5/importlib/__init__.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
ImportError: No module named 'bblfsh.gopkg'
import importlib
import bblfsh
m = importlib.import_module('bblfsh.gopkg.in.bblfsh.sdk.v1.protocol.generated_pb2_grpc')
print(id(m.generated__pb2.gopkg_dot_in_dot_bblfsh_dot_sdk_dot_v1_dot_uast_dot_generated__pb2.Node))
print(id(bblfsh.Node))
m = importlib.import_module('bblfsh.gopkg.in.bblfsh.sdk.v1.uast.generated_pb2')
print(id(m.Node))
The first printed number is different from the rest. That's because bblfsh/gopkg/in/bblfsh/sdk/v1/protocol/generated_pb2_grpc.py does
import generated_pb2 as generated__pb2
As soon as the module is imported from a different PYTHONPATH entry, the classes get duplicated: although they are really the same, Python treats them as different. Hence isinstance() returns False on 3.7, etc.
Suggested fix: remove all the tricks with PYTHONPATH and make all the imports absolute everywhere, starting with bblfsh.
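The underlying problem can be demonstrated without bblfsh at all: loading the same source file under two different module names produces two distinct class objects, so isinstance() checks across them fail. A self-contained illustration:

```python
import importlib.util
import os
import sys
import tempfile

# Write a tiny module to disk, then load it under two different names.
with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
    f.write("class Node:\n    pass\n")
    path = f.name

def load_as(name):
    spec = importlib.util.spec_from_file_location(name, path)
    mod = importlib.util.module_from_spec(spec)
    sys.modules[name] = mod
    spec.loader.exec_module(mod)
    return mod

a = load_as("pkg_a")
b = load_as("pkg_b")
print(a.Node is b.Node)              # False: two separate class objects
print(isinstance(a.Node(), b.Node))  # False as well
os.unlink(path)
```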
I see there is version 2.12.7 in the GitHub releases, but when I try to install it:
Could not find a version that satisfies the requirement bblfsh<3.0,>=2.12.7 (from lookout-sdk==0.2.0) (from versions: 0.0.4, 0.0.5, 0.1.0, 0.2.0, 0.2.1, 0.2.3, 1.0.0, 1.0.1, 1.1.0, 2.0.0, 2.1.0, 2.2.1, 2.3.0, 2.3.1, 2.4.0, 2.4.1, 2.5.1, 2.6.0, 2.6.1, 2.8.0, 2.8.1, 2.8.2, 2.9.0, 2.9.1, 2.9.2, 2.9.3, 2.9.4, 2.9.5, 2.9.6, 2.9.8, 2.9.10, 2.9.12, 2.9.13, 2.9.14, 2.10.0, 2.10.1, 2.11.0, 2.11.1, 2.11.2, 2.12.0, 2.12.1, 2.12.2, 2.12.3, 2.12.4, 2.12.5, 2.12.6)
At https://pypi.org/search/?q=bblfsh I see only 2.12.6 as well.
It doesn't make sense to have a Makefile when we have a setup.py file. Also, we could remove the .pb files that were added to avoid a dependency on protoc from the pip package. Concretely:
- Move the Makefile targets to setup.py.
- Add a --release switch to setup.py that generates the .pb files; it needs protoc and should be called before the pip install.
- pip install should check whether the .pb files are generated and, if not, retrieve the right release zip from GitHub and extract the files from there before installing.
I'm using the Python client to parse a Java source code file and bblfsh.filter to check the condition of a rule.
The return type of this filter function is an iterator, and I want to convert the results back to UAST nodes.
How do I do it? Please advise.
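A hedged note on what the question touches: in this client, filter yields node messages that are themselves subtrees of the UAST, so no conversion should be needed. The sketch below mimics that shape with a hypothetical NodeStub type (not the real protobuf Node):

```python
from collections import namedtuple

# NodeStub is a hypothetical stand-in for the protobuf Node message;
# the point is that each filtered result is itself a full subtree.
NodeStub = namedtuple("NodeStub", "internal_type children")

leaf = NodeStub("Literal", [])
root = NodeStub("File", [leaf])

def fake_filter(root, internal_type):
    # Mimics the filter API: yields matching nodes lazily.
    stack = [root]
    while stack:
        n = stack.pop()
        if n.internal_type == internal_type:
            yield n
        stack.extend(n.children)

results = list(fake_filter(root, "Literal"))
print(results[0].internal_type)  # each result is already a UAST subtree
```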
I have installed libxml2 from brew, but pip install bblfsh fails with this error:
In file included from bblfsh/libuast/uast.cc:4:
bblfsh/libuast/uast_private.h:4:10: fatal error: 'libxml/tree.h' file not found
#include <libxml/tree.h>
^~~~~~~~~~~~~~~
1 error generated.
error: command 'g++' failed with exit status 1
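A possible cause (an assumption, since the report does not say): Homebrew installs libxml2 keg-only, so its headers are not on the default include path. A sketch of pointing the build at them, assuming a default Homebrew prefix:

```shell
# libxml2 is keg-only in Homebrew: its headers live under the keg,
# not /usr/local/include, so tell the compiler where to find them.
export CFLAGS="-I$(brew --prefix libxml2)/include/libxml2 ${CFLAGS:-}"
pip install bblfsh
```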
I suggest adding a new roles module with role info and at least two functions:
def role_id(role_name):
    return DESCRIPTOR.enum_types_by_name["Role"].values_by_name[role_name].number

def role_name(role_id):
    return DESCRIPTOR.enum_types_by_name["Role"].values_by_number[role_id].name
Also, it would be good to have constants with the role IDs: no need to go to the documentation to check which roles exist, or to try to remember names (and you get hints from the IDE :) ).
We have such constants in ast2vec:
https://github.com/src-d/ast2vec/blob/master/ast2vec/bblfsh_roles.py#L18
Can make a PR if you approve.
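To make the suggestion concrete, here is a self-contained sketch of such a roles module. The role numbers in the dict are made up for illustration; the real mapping would be built from the generated protobuf DESCRIPTOR, as in the functions above:

```python
# Illustrative numbers only; the real mapping would come from
# DESCRIPTOR.enum_types_by_name["Role"].
_ROLE_BY_NAME = {"IDENTIFIER": 1, "QUALIFIED": 2, "LITERAL": 3}
_ROLE_BY_ID = {v: k for k, v in _ROLE_BY_NAME.items()}

def role_id(role_name):
    return _ROLE_BY_NAME[role_name]

def role_name(role_id):
    return _ROLE_BY_ID[role_id]

# Module-level constants, so IDEs can autocomplete them:
globals().update(_ROLE_BY_NAME)

print(role_name(role_id("IDENTIFIER")))  # IDENTIFIER
```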
Hi,
I noticed that comments have some unexpected brackets (and symbols; should # be included in the token?):
616|[# noqa]| # noqa
| # noqa
Code to reproduce:
import bblfsh

client = bblfsh.BblfshClient("0.0.0.0:9432")
file_loc = "location/of/file.py"
# read content
with open(file_loc, "r") as f:
    content = f.read()
# extract uast
uast = client.parse(file_loc).uast
# select nodes with tokens and sort them by position
nodes = []
for node in bblfsh.iterator(uast, bblfsh.TreeOrder.PRE_ORDER):
    if node.token:
        nodes.append(node)
nodes = list(sorted(nodes, key=lambda n: n.start_position.offset))
# print token position, token, select source by position information
for n in nodes:
    print(n.start_position.offset, n.token,
          content[n.start_position.offset:n.start_position.offset + len(n.token)],
          content[n.start_position.offset:n.end_position.offset + 1],
          sep="|")
The source code I used (originally in a collapsed details block):
import argparse
import os
import tempfile
import unittest

import sourced.ml.tests.models as paths
from sourced.ml.models import Topics
from sourced.ml.cmd import bigartm2asdf


class TopicsTests(unittest.TestCase):
    def setUp(self):
        self.model = Topics().load(source=paths.TOPICS)

    def test_dump(self):
        res = self.model.dump()
        self.assertEqual(res, """320 topics, 1000 tokens
First 10 tokens: ['ulcancel', 'domainlin', 'trudi', 'fncreateinstancedbaselin', 'wbnz', 'lmultiplicand', 'otronumero', 'qxln', 'gvgq', 'polaroidish']
Topics: unlabeled
non-zero elements: 6211 (0.019409)""")  # noqa

    def test_props(self):
        self.assertEqual(len(self.model), 320)
        self.assertEqual(len(self.model.tokens), 1000)
        self.assertIsNone(self.model.topics)
        zt = self.model[0]
        self.assertEqual(len(zt), 8)
        self.assertEqual(zt[0][0], "olcustom")
        self.assertAlmostEqual(zt[0][1], 1.23752e-06, 6)

    def test_label(self):
        with self.assertRaises(ValueError):
            self.model.label_topics([1, 2, 3])
        with self.assertRaises(TypeError):
            self.model.label_topics(list(range(320)))
        self.model.label_topics([str(i) for i in range(320)])
        self.assertEqual(self.model.topics[0], "0")

    def test_save(self):
        with tempfile.NamedTemporaryFile(prefix="sourced.ml-topics-test-") as f:
            self.model.save(f.name)
            new = Topics().load(f.name)
            self.assertEqual(self.model.tokens, new.tokens)
            self.assertEqual((self.model.matrix != new.matrix).getnnz(), 0)

    def test_bigartm2asdf(self):
        with tempfile.NamedTemporaryFile(prefix="sourced.ml-topics-test-") as f:
            args = argparse.Namespace(
                input=os.path.join(os.path.dirname(__file__), paths.TOPICS_SRC),
                output=f.name)
            bigartm2asdf(args)
            model = Topics().load(f.name)
            self.assertEqual(len(model), 320)
            self.assertEqual(len(model.tokens), 1000)


if __name__ == "__main__":
    unittest.main()
As a result, we can notice several tokens without position information:
0|argparse|import a|i
0|os|im|i
0|tempfile|import a|i
0|unittest|import a|i
0|sourced.ml.tests.models|import argparse
import |i
0|paths|impor|i
0|sourced.ml.models|import argparse
i|i
0|Topics|import|i
0|sourced.ml.cmd|import argpars|i
0|bigartm2asdf|import argpa|i
0|source|import|i
0|!=|im|i
0|prefix|import|i
0|input|impor|i
0|output|import|i
0|prefix|import|i
0|==|im|i
184|TopicsTests|TopicsTests|TopicsTests
I am using the most recent bblfsh client and executing the following code:
import bblfsh
client = bblfsh.BblfshClient("0.0.0.0:9432")
uast = client.parse("/path/to/src-d/gemini/gemini/python/setup.py").uast
list(bblfsh.filter(uast, "//@roleLiteral"))
I get a SIGSEGV. gdb bt output:
#0 PyFilter (self=<optimized out>, args=<optimized out>) at bblfsh/pyuast.c:168
#1 0x00000000004e10ef in PyCFunction_Call () at ../Objects/methodobject.c:109
#2 0x00000000005240b4 in call_function (oparg=<optimized out>, pp_stack=0x7fffffffbde0) at ../Python/ceval.c:4705
#3 PyEval_EvalFrameEx () at ../Python/ceval.c:3236
#4 0x000000000052cf19 in _PyEval_EvalCodeWithName () at ../Python/ceval.c:4018
#5 0x000000000052dbcf in PyEval_EvalCodeEx () at ../Python/ceval.c:4039
#6 PyEval_EvalCode (co=<optimized out>, globals=<optimized out>, locals=<optimized out>) at ../Python/ceval.c:777
#7 0x0000000000543075 in builtin_exec_impl.isra.11 (locals=
{'_i2': 'client = bblfsh.BblfshClient("0.0.0.0:9432")', 'bblfsh': <module at remote 0x7ffff3285d68>, '_sh': <module at remote 0x7ffff475def8>, '_iii': 'client = bblfsh.BblfshClient("0.0.0.0:9432")', '_': <Node at remote 0x7ffff4b973f8>, 'In': ['', 'import bblfsh', 'client = bblfsh.BblfshClient("0.0.0.0:9432")', 'uast = client.parse("setup.py").uast', 'uast', 'list(bblfsh.filter(uast, "//@roleLiteral"))'], '_oh': {4: <...>}, '_i3': 'uast = client.parse("setup.py").uast', '__package__': None, '_ii': 'uast = client.parse("setup.py").uast', 'Out': {...}, 'client': <BblfshClient(_channel=<Channel(_channel=<grpc._cython.cygrpc.Channel at remote 0x7ffff4b744a8>, _connectivity_state=<_ChannelConnectivityState(callbacks_and_connectivities=[[<function at remote 0x7ffff329cb70>, <ChannelConnectivity(_name_='READY', __objclass__=<EnumMeta(__module__='grpc', __new__=<function at remote 0x7ffff5d0d7b8>, _member_map_={'TRANSIENT_FAILURE': <ChannelConnectivity(_name_='TRANSIENT_FAILURE', __objclass__=<...>, _value_=(3, 'tran...(truncated),
globals={'_i2': 'client = bblfsh.BblfshClient("0.0.0.0:9432")', 'bblfsh': <module at remote 0x7ffff3285d68>, '_sh': <module at remote 0x7ffff475def8>, '_iii': 'client = bblfsh.BblfshClient("0.0.0.0:9432")', '_': <Node at remote 0x7ffff4b973f8>, 'In': ['', 'import bblfsh', 'client = bblfsh.BblfshClient("0.0.0.0:9432")', 'uast = client.parse("setup.py").uast', 'uast', 'list(bblfsh.filter(uast, "//@roleLiteral"))'], '_oh': {4: <...>}, '_i3': 'uast = client.parse("setup.py").uast', '__package__': None, '_ii': 'uast = client.parse("setup.py").uast', 'Out': {...}, 'client': <BblfshClient(_channel=<Channel(_channel=<grpc._cython.cygrpc.Channel at remote 0x7ffff4b744a8>, _connectivity_state=<_ChannelConnectivityState(callbacks_and_connectivities=[[<function at remote 0x7ffff329cb70>, <ChannelConnectivity(_name_='READY', __objclass__=<EnumMeta(__module__='grpc', __new__=<function at remote 0x7ffff5d0d7b8>, _member_map_={'TRANSIENT_FAILURE': <ChannelConnectivity(_name_='TRANSIENT_FAILURE', __objclass__=<...>, _value_=(3, 'tran...(truncated), source=<code at remote 0x7ffff202e150>) at ../Python/bltinmodule.c:957
#8 builtin_exec () at ../Python/clinic/bltinmodule.c.h:275
#9 0x00000000004e10ef in PyCFunction_Call () at ../Objects/methodobject.c:109
#10 0x00000000005240b4 in call_function (oparg=<optimized out>, pp_stack=0x7fffffffc0b0) at ../Python/ceval.c:4705
#11 PyEval_EvalFrameEx () at ../Python/ceval.c:3236
#12 0x000000000052cf19 in _PyEval_EvalCodeWithName () at ../Python/ceval.c:4018
#13 0x0000000000528b3f in fast_function (nk=<optimized out>, na=<optimized out>, n=<optimized out>, pp_stack=0x7fffffffc2c0, func=<optimized out>)
at ../Python/ceval.c:4813
#14 call_function (oparg=<optimized out>, pp_stack=0x7fffffffc2c0) at ../Python/ceval.c:4730
Code:
     static PyObject *PyFilter(PyObject *self, PyObject *args)
 150 {
 151   PyObject *obj = NULL;
 152   const char *query = NULL;
 153   if (!PyArg_ParseTuple(args, "Os", &obj, &query))
 154     return NULL;
 155
 156   Nodes *nodes = UastFilter(ctx, obj, query);
 157   if (!nodes) {
 158     char *error = LastError();
 159     PyErr_SetString(PyExc_RuntimeError, error);
 160     free(error);
 161     return NULL;
 162   }
 163   int len = NodesSize(nodes);
 164   PyObject *list = PyList_New(len);
 165
 166   for (int i = 0; i < len; i++) {
 167     PyObject *node = (PyObject *)NodeAt(nodes, i);
>168     Py_INCREF(node);
 169     PyList_SET_ITEM(list, i, node);
 170   }
 171   NodesFree(nodes);
 172   return PySeqIter_New(list);
 173 }
(gdb) p node
$1 = 0x0
(gdb) py-bt
Traceback (most recent call first):
<built-in method filter of module object at remote 0x7ffff3285ef8>
File "<ipython-input-5-f6a6d28734b6>", line 1, in <module>
<built-in method exec of module object at remote 0x7ffff7fbc5e8>
File "/usr/local/lib/python3.5/dist-packages/IPython/core/interactiveshell.py", line 2881, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
Hi,
I noticed that the link https://doc.bblf.sh/user/language-clients.html is not working anymore.
It should probably be replaced with https://doc.bblf.sh/using-babelfish/language-clients.html.
With this file:
https://gist.github.com/juanjux/5d7f8b736fde2bbc8635c5f4f689b573
It crashes while trying to iterate over it:
import bblfsh
client = bblfsh.BblfshClient("0.0.0.0:9432")
uast = client.parse(filename="issue_broken_bblfsh_a.py").uast
it = bblfsh.iterator(uast, bblfsh.TreeOrder.PRE_ORDER)
next(it)
It crashes on the first call to the iterator's next(). The uast object itself can be accessed and printed without problems.
Hi,
I noticed that bblfsh.filter causes high memory consumption.
Here is a script to reproduce it:
from collections import defaultdict, deque

import bblfsh

IDENTIFIER = bblfsh.role_id("IDENTIFIER")
QUALIFIED = bblfsh.role_id("QUALIFIED")
LITERAL = bblfsh.role_id("LITERAL")


def uast2sequence(root):
    sequence = []
    nodes = defaultdict(deque)
    stack = [root]
    nodes[id(root)].extend(root.children)
    while stack:
        if nodes[id(stack[-1])]:
            child = nodes[id(stack[-1])].popleft()
            nodes[id(child)].extend(child.children)
            stack.append(child)
        else:
            sequence.append(stack.pop())
    return sequence


def filter_bblfsh(n_times=1000,
                  py_path="/home/egor/workspace/spark-2.2.0-bin-hadoop2.7/python/pyspark/cloudpickle.py",
                  java_path="/home/egor/workspace/spark-2.2.0-bin-hadoop2.7/examples/src/main/java/org/apache/spark/examples/JavaHdfsLR.java"):
    client = bblfsh.BblfshClient("0.0.0.0:9432")
    py_uast = client.parse(py_path).uast
    java_uast = client.parse(java_path).uast
    XPATH = "//*[@roleIdentifier and not(@roleQualified)]"
    for i in range(n_times):
        bblfsh.filter(py_uast, XPATH)
        bblfsh.filter(java_uast, XPATH)


def filter_alternative(n_times=1000,
                       py_path="/home/egor/workspace/spark-2.2.0-bin-hadoop2.7/python/pyspark/cloudpickle.py",
                       java_path="/home/egor/workspace/spark-2.2.0-bin-hadoop2.7/examples/src/main/java/org/apache/spark/examples/JavaHdfsLR.java"):
    client = bblfsh.BblfshClient("0.0.0.0:9432")
    py_uast = client.parse(py_path).uast
    java_uast = client.parse(java_path).uast
    for i in range(n_times):
        list(filter(lambda node: IDENTIFIER in node.roles and QUALIFIED not in node.roles,
                    uast2sequence(py_uast)))
        list(filter(lambda node: IDENTIFIER in node.roles and QUALIFIED not in node.roles,
                    uast2sequence(java_uast)))


if __name__ == "__main__":
    import sys
    if int(sys.argv[1]) == 0:
        print("bblfsh")
        filter_bblfsh(n_times=int(sys.argv[2]))
    else:
        print("bblfsh-free")
        filter_alternative(n_times=int(sys.argv[2]))
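The uast2sequence traversal used in the alternative can be exercised on its own with a stand-in node type (a namedtuple mimicking just the token and children fields of the protobuf Node); it visits children before their parent:

```python
from collections import defaultdict, deque, namedtuple

# Stand-in for the protobuf Node message.
N = namedtuple("N", "token children")

def uast2sequence(root):
    # Same traversal as in the script: iterative DFS that emits a node
    # only after all of its children have been emitted (post-order).
    sequence = []
    nodes = defaultdict(deque)
    stack = [root]
    nodes[id(root)].extend(root.children)
    while stack:
        if nodes[id(stack[-1])]:
            child = nodes[id(stack[-1])].popleft()
            nodes[id(child)].extend(child.children)
            stack.append(child)
        else:
            sequence.append(stack.pop())
    return sequence

root = N("a", [N("b", []), N("c", [])])
print([n.token for n in uast2sequence(root)])  # children first, then root
```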
And some measurements (surprisingly, the plain Python alternative is ~20 times faster than the bblfsh client's filter):
egor@egor-sourced:~/workspace$ /usr/bin/time -v python3 ml/sourced/ml/utils/misc.py 0 100; /usr/bin/time -v python3 ml/sourced/ml/utils/misc.py 0 200; /usr/bin/time -v python3 ml/sourced/ml/utils/misc.py 0 400
bblfsh
Command being timed: "python3 ml/sourced/ml/utils/misc.py 0 100"
User time (seconds): 13.19
System time (seconds): 0.09
Percent of CPU this job got: 97%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:13.61
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 119948
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 38584
Voluntary context switches: 606
Involuntary context switches: 49
Swaps: 0
File system inputs: 0
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
bblfsh
Command being timed: "python3 ml/sourced/ml/utils/misc.py 0 200"
User time (seconds): 26.68
System time (seconds): 0.15
Percent of CPU this job got: 98%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:27.19
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 188672
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 63460
Voluntary context switches: 1146
Involuntary context switches: 115
Swaps: 0
File system inputs: 0
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
bblfsh
Command being timed: "python3 ml/sourced/ml/utils/misc.py 0 400"
User time (seconds): 54.72
System time (seconds): 0.22
Percent of CPU this job got: 99%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:55.22
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 326392
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 113651
Voluntary context switches: 2382
Involuntary context switches: 164
Swaps: 0
File system inputs: 0
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
egor@egor-sourced:~/workspace$ /usr/bin/time -v python3 ml/sourced/ml/utils/misc.py 1 100; /usr/bin/time -v python3 ml/sourced/ml/utils/misc.py 1 200; /usr/bin/time -v python3 ml/sourced/ml/utils/misc.py 1 400
bblfsh-free
Command being timed: "python3 ml/sourced/ml/utils/misc.py 1 100"
User time (seconds): 0.86
System time (seconds): 0.03
Percent of CPU this job got: 70%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:01.27
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 37548
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 12861
Voluntary context switches: 103
Involuntary context switches: 7
Swaps: 0
File system inputs: 0
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
bblfsh-free
Command being timed: "python3 ml/sourced/ml/utils/misc.py 1 200"
User time (seconds): 1.50
System time (seconds): 0.01
Percent of CPU this job got: 80%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:01.88
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 37172
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 15766
Voluntary context switches: 123
Involuntary context switches: 25
Swaps: 0
File system inputs: 0
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
bblfsh-free
Command being timed: "python3 ml/sourced/ml/utils/misc.py 1 400"
User time (seconds): 2.69
System time (seconds): 0.03
Percent of CPU this job got: 87%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:03.11
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 37292
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 21603
Voluntary context switches: 191
Involuntary context switches: 38
Swaps: 0
File system inputs: 0
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
import bblfsh.github.com.gogo.protobuf.gogoproto.gogo_pb2
fails with
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python3.7/site-packages/bblfsh/github/com/gogo/protobuf/gogoproto/gogo_pb2.py", line 670, in <module>
google_dot_protobuf_dot_descriptor__pb2.EnumOptions.RegisterExtension(goproto_enum_prefix)
File "/usr/local/lib/python3.7/site-packages/google/protobuf/internal/python_message.py", line 751, in RegisterExtension
cls.DESCRIPTOR.file.pool.AddExtensionDescriptor(extension_handle)
File "/usr/local/lib/python3.7/site-packages/google/protobuf/descriptor_pool.py", line 264, in AddExtensionDescriptor
extension.containing_type.full_name, extension.number))
AssertionError: Extensions "gogoproto.goproto_enum_prefix" and "gogoproto.goproto_enum_prefix" both try to extend message type "google.protobuf.EnumOptions" with field number 62001.
Reproducible exclusively on Python 3.7.0 final (works fine with beta3). How to reproduce:
docker run -it --rm python:3.7.0-stretch bash
pip3 install bblfsh
python3 -c "import bblfsh.github.com.gogo.protobuf.gogoproto.gogo_pb2"
Related to protocolbuffers/protobuf#2533, though not quite the same root cause.
It looks like the top-level import of bblfsh
is not visible to the import system, so the module gets imported twice. The workaround is to comment out that assertion inside protobuf.
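One way to confirm the double-import theory is to look for two entries in sys.modules that point at the same module file. This is a hypothetical diagnostic sketch, not part of bblfsh; it simulates the duplicate with the stdlib json module so it runs without bblfsh installed:

```python
import sys

def duplicate_modules():
    """Return pairs of names in sys.modules that share the same __file__ --
    the symptom of one module being imported under two different names."""
    seen = {}
    dups = []
    for name, mod in list(sys.modules.items()):
        path = getattr(mod, "__file__", None)
        if path is None:
            continue
        if path in seen and seen[path] != name:
            dups.append((seen[path], name))
        else:
            seen.setdefault(path, name)
    return dups

# Simulate a double import by registering json under a second name.
import json
sys.modules["json_alias"] = json
dups = duplicate_modules()
```

If gogo_pb2 shows up twice in such a listing under Python 3.7, that would explain why protobuf tries to register the same extension twice.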
Since it is hard to get at the DESCRIPTOR and Node objects, can we expose them from bblfsh directly?
I suggest adding something like this to __main__.py:
import importlib
DESCRIPTOR = importlib.import_module(
    "bblfsh.gopkg.in.bblfsh.sdk.%s.uast.generated_pb2" % bblfsh.sdkversion.VERSION).DESCRIPTOR
Node = importlib.import_module(
    "bblfsh.gopkg.in.bblfsh.sdk.%s.uast.generated_pb2" % bblfsh.sdkversion.VERSION).Node
Then everyone can just do from bblfsh import Node, DESCRIPTOR.
Can make a PR if you approve.
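For illustration, the same dynamic-import-and-re-export pattern with a stdlib module (the module name here is a placeholder standing in for the versioned bblfsh path, so the sketch runs without bblfsh installed):

```python
import importlib

# Stand-in for the proposed pattern: the real call would resolve
# "bblfsh.gopkg.in.bblfsh.sdk.%s.uast.generated_pb2" % VERSION at runtime.
module_name = "json"  # hypothetical placeholder module
JSONDecoder = importlib.import_module(module_name).JSONDecoder

# The re-exported name is the very same object as the directly imported one.
import json
```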
The recent bblfsh release adds a Version() API. Should we support it here?
I installed the most recent version 2.8 and I get:
In [1]: import bblfsh
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-1-e99ccc08549f> in <module>()
----> 1 import bblfsh
/usr/local/lib/python3.5/dist-packages/bblfsh/__init__.py in <module>()
----> 1 from bblfsh.client import BblfshClient
2 from bblfsh.pyuast import filter, iterator
3 from bblfsh.aliases import *
4
5 class TreeOrder:
/usr/local/lib/python3.5/dist-packages/bblfsh/client.py in <module>()
4 import grpc
5
----> 6 from bblfsh.aliases import ParseRequest, NativeParseRequest, VersionRequest, ProtocolServiceStub
7 from bblfsh.sdkversion import VERSION
8
/usr/local/lib/python3.5/dist-packages/bblfsh/aliases.py in <module>()
24
25 ParseResponse = importlib.import_module(
---> 26 "bblfsh.gopkg.in.bblfsh.sdk.%s.protocol.generated_pb2" % VERSION).ParseResponse
27
28 NativeParseResponse = importlib.import_module(
/usr/lib/python3.5/importlib/__init__.py in import_module(name, package)
124 break
125 level += 1
--> 126 return _bootstrap._gcd_import(name[level:], package, level)
127
128
/usr/local/lib/python3.5/dist-packages/bblfsh/gopkg/in/bblfsh/sdk/v1/protocol/generated_pb2.py in <module>()
101 message_type=None, enum_type=None, containing_type=None,
102 is_extension=False, extension_scope=None,
--> 103 options=None, file=DESCRIPTOR),
104 _descriptor.FieldDescriptor(
105 name='language', full_name='gopkg.in.bblfsh.sdk.v1.protocol.NativeParseRequest.language', index=1,
TypeError: __new__() got an unexpected keyword argument 'file'
Hi,
This is a follow-up to #92, which had to be fixed by #94.
I checked again: there is a memory leak, and quite a large one.
Script to reproduce:
import bblfsh
from bblfsh import filter as filter_uast

def filter_bblfsh(n_times=100, file="/home/egor/workspace/code-annotation/java_code.java"):
    for i in range(n_times):
        extract_functions_from_uast(file=file)

client = bblfsh.BblfshClient("0.0.0.0:9432")
filter_uast = filter_uast
FUNC_XPATH = "//*[@roleFunction and @roleDeclaration]"
FUNC_NAME_XPATH = "/*[@roleFunction and @roleIdentifier and @roleName] " \
                  "| /*/*[@roleFunction and @roleIdentifier and @roleName]"

def extract_functions_from_uast(file="/home/egor/workspace/code-annotation/java_code.java"):
    uast = client.parse(file).uast
    allfuncs = list(filter_uast(uast, FUNC_XPATH))
    internal = set()
    for func in allfuncs:
        if id(func) in internal:
            continue
        for sub in filter_uast(func, FUNC_XPATH):
            if sub != func:
                internal.add(id(sub))
    for f in allfuncs:
        if id(f) not in internal:
            name = "+".join(n.token for n in filter_uast(f, FUNC_NAME_XPATH))

if __name__ == "__main__":
    import sys
    import resource
    before = resource.getrusage(resource.RUSAGE_SELF)
    file = "/home/egor/workspace/code-annotation/java_code.java"
    if len(sys.argv) == 3:
        file = sys.argv[2]
    filter_bblfsh(n_times=int(sys.argv[1]))
    after = resource.getrusage(resource.RUSAGE_SELF)
    print('Memory increased by: %d%%' % int(100 * ((after[2] / before[2]) - 1)))
and the results of measurements:
egor@egor-sourced:~/workspace/ml$ python3 test_filter_libuast.py 100
Memory increased by: 2061%
egor@egor-sourced:~/workspace/ml$ python3 test_filter_libuast.py 200
Memory increased by: 4109%
egor@egor-sourced:~/workspace/ml$ python3 test_filter_libuast.py 400
Memory increased by: 8158%
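RUSAGE only reports resident-set size; tracemalloc can confirm whether the growth comes from Python-level allocations. The sketch below is generic and bblfsh-free (a list-appending leak stands in for the filter call, so no server is needed):

```python
import tracemalloc

def measure_growth(fn, n_times=100):
    """Net traced-memory growth in bytes over n_times calls of fn."""
    tracemalloc.start()
    fn()  # warm-up, so one-time allocations are not counted
    before, _ = tracemalloc.get_traced_memory()
    for _ in range(n_times):
        fn()
    after, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return after - before

leaky_store = []
# A leaking call retains memory on every iteration...
grown = measure_growth(lambda: leaky_store.append(bytearray(1024)))
# ...while a well-behaved call frees what it allocates.
steady = measure_growth(lambda: bytearray(1024))
```

Substituting extract_functions_from_uast for the lambda would show whether the growth is linear in the number of filter calls, as the percentages above suggest.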
Java code that I used:
/*
* ScreenSlicer (TM)
* Copyright (C) 2013-2015 Machine Publishers, LLC
* [email protected] | screenslicer.com | machinepublishers.com
* Cincinnati, Ohio, USA
*
* You can redistribute this program and/or modify it under the terms of the GNU Affero General Public
* License version 3 as published by the Free Software Foundation.
*
* ScreenSlicer is made available under the terms of the GNU Affero General Public License version 3
* with the following clarification and special exception:
*
* Linking ScreenSlicer statically or dynamically with other modules is making a combined work
* based on ScreenSlicer. Thus, the terms and conditions of the GNU Affero General Public License
* version 3 cover the whole combination.
*
* As a special exception, Machine Publishers, LLC gives you permission to link unmodified versions
* of ScreenSlicer with independent modules to produce an executable, regardless of the license
* terms of these independent modules, and to copy, distribute, and make available the resulting
* executable under terms of your choice, provided that you also meet, for each linked independent
* module, the terms and conditions of the license of that module. An independent module is a module
* which is not derived from or based on ScreenSlicer. If you modify ScreenSlicer, you may not
* extend this exception to your modified version of ScreenSlicer.
*
* "ScreenSlicer", "jBrowserDriver", "Machine Publishers", and "automatic, zero-config web scraping"
* are trademarks of Machine Publishers, LLC.
*
* This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without
* even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* Affero General Public License version 3 for more details.
*
* You should have received a copy of the GNU Affero General Public License version 3 along with this
* program. If not, see <http://www.gnu.org/licenses/>.
*
* For general details about how to investigate and report license violations, please see:
* <https://www.gnu.org/licenses/gpl-violation.html> and email the author: [email protected]
*/
package com.screenslicer.core.scrape;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.select.NodeVisitor;
import com.screenslicer.api.datatype.HtmlNode;
import com.screenslicer.core.scrape.neural.NeuralNetManager;
import com.screenslicer.core.scrape.type.ComparableNode;
import com.screenslicer.core.util.NodeUtil;
public class Extract {
private static final int SCORE_PENALTY = 10000;
private static final int TWICE = 2;
private static HashMap<Element, ComparableNode[]> nodesCache = new HashMap<Element, ComparableNode[]>();
private static class TrainingData {
private final ComparableNode target;
private int finalMisses = 0;
private boolean winner = false;
private int winnerDistance = 0;
private ComparableNode best = null;
public TrainingData(ComparableNode target) {
this.target = target;
}
}
private static ComparableNode best(ComparableNode[] nodes, Integer[][] comparisonCache, Collection<Node> ignore, TrainingData trainingData) {
int ignoreSize = ignore == null ? 0 : ignore.size();
if (nodes.length - ignoreSize == 1) {
if (ignore == null || ignore.isEmpty()) {
return nodes[0];
}
for (int i = 0; i < nodes.length; i++) {
if (!ignore.contains(nodes[i])) {
return nodes[i];
}
}
}
if (comparisonCache == null) {
comparisonCache = new Integer[nodes.length][nodes.length];
}
int adjustedLen = nodes.length - ignore.size();
for (int failMax = 0; failMax < adjustedLen; failMax++) {
Map<ComparableNode, Integer> winners = new HashMap<ComparableNode, Integer>();
for (int i = 0; i < nodes.length; i++) {
if (ignore != null && ignore.contains(nodes[i].node())) {
continue;
}
boolean found = true;
int fail = 0;
for (int j = 0; j < nodes.length; j++) {
if (ignore != null && ignore.contains(nodes[j].node())) {
continue;
}
if (nodes[j] != null) {
if (comparisonCache[i][j] == null) {
int result = nodes[i].compare(nodes[j]);
if (result != -1) {
++fail;
}
comparisonCache[i][j] = new Integer(result);
comparisonCache[j][i] = new Integer(result * (-1));
} else if (comparisonCache[i][j].intValue() != -1) {
++fail;
}
if (fail > failMax) {
found = false;
break;
}
}
}
if (found) {
if (failMax == 0) {
if (trainingData != null) {
trainingData.winner = trainingData.target.equals(nodes[i]);
trainingData.finalMisses = trainingData.winner ? 0 : 1;
trainingData.winnerDistance = 0;
trainingData.best = nodes[i];
}
return nodes[i];
}
winners.put(nodes[i], i);
}
}
if (winners.size() == 1) {
ComparableNode ret = winners.keySet().toArray(new ComparableNode[1])[0];
if (trainingData != null) {
trainingData.winner = trainingData.target.equals(ret);
trainingData.finalMisses = trainingData.winner ? 0 : 1;
trainingData.winnerDistance = failMax;
trainingData.best = ret;
}
return ret;
}
if (!winners.isEmpty()) {
int targetIndex = -1;
ComparableNode[] winnersArray = winners.keySet().toArray(new ComparableNode[0]);
if (trainingData != null) {
trainingData.winnerDistance = failMax;
if (winners.containsKey(trainingData.target)) {
for (int i = 0; i < winnersArray.length; i++) {
if (trainingData.target.equals(winnersArray[i])) {
targetIndex = i;
break;
}
}
}
if (targetIndex == -1) {
trainingData.finalMisses = winners.size();
trainingData.winner = false;
}
}
for (int i = 0; i < winnersArray.length; i++) {
boolean found = true;
for (int j = 0; j < winnersArray.length; j++) {
if (i != j) {
int iCache = winners.get(winnersArray[i]);
int jCache = winners.get(winnersArray[j]);
if (comparisonCache[iCache][jCache] == null) {
int result = winnersArray[i].compare(winnersArray[j]);
comparisonCache[iCache][jCache] = new Integer(result);
comparisonCache[jCache][iCache] = new Integer(result * (-1));
}
if (comparisonCache[iCache][jCache].intValue() != -1) {
found = false;
if (i != targetIndex) {
break;
} else if (trainingData != null) {
++trainingData.finalMisses;
}
}
}
}
if (found) {
if (trainingData != null) {
trainingData.best = winnersArray[i];
}
if (targetIndex == i && trainingData != null) {
trainingData.finalMisses = 0;
trainingData.winner = true;
return trainingData.target;
} else if (targetIndex == -1 || targetIndex < i) {
if (trainingData != null) {
trainingData.winner = false;
}
return winnersArray[i];
}
} else if (targetIndex == i
&& trainingData != null
&& trainingData.best != null) {
trainingData.winner = false;
return trainingData.best;
}
}
return null;
}
}
return null;
}
public static ComparableNode[] trainInit(Element body, int page, int thread) {
ComparableNode[] nodesArray = performInternal(body, page, null, null, null, thread);
nodesCache.put(body, nodesArray);
return nodesArray;
}
public static int train(Element body, int page, ComparableNode target, int targetIndex, int thread) {
ComparableNode[] nodesArray = null;
nodesArray = nodesCache.get(body);
int score = 0;
if (NeuralNetManager.instance(thread).isMulti()) {
int votes = 0;
final int majority = (NeuralNetManager.instance(thread).multiSize() / TWICE) + 1;
boolean won = false;
ComparableNode fallback = null;
int[] distances = new int[NeuralNetManager.instance(thread).multiSize()];
int curDistance = 0;
Map<ComparableNode, Integer> votesMap = new HashMap<ComparableNode, Integer>();
if (targetIndex < 0) {
for (int i = 0; i < nodesArray.length; i++) {
if (nodesArray[i].equals(target)) {
targetIndex = i;
break;
}
}
}
while (NeuralNetManager.instance(thread).hasNext()) {
Integer[][] comparisonCache = new Integer[nodesArray.length][nodesArray.length];
int distance = 0;
for (int i = 0; i < nodesArray.length; i++) {
if (!target.equals(nodesArray[i])) {
int result = target.compare(nodesArray[i]);
if (result != -1) {
++distance;
}
comparisonCache[targetIndex][i] = new Integer(result);
comparisonCache[i][targetIndex] = new Integer(result * (-1));
}
}
TrainingData trainingData = new TrainingData(target);
ComparableNode tmp = best(nodesArray, comparisonCache, null, trainingData);
if (tmp != null) {
fallback = tmp;
}
if (trainingData.best != null) {
if (!votesMap.containsKey(trainingData.best)) {
votesMap.put(trainingData.best, new Integer(1));
} else {
votesMap.put(trainingData.best,
new Integer(votesMap.get(trainingData.best).intValue() + 1));
}
}
distance = (distance - trainingData.winnerDistance) + trainingData.finalMisses;
NeuralNetManager.instance(thread).next();
if (trainingData.winner) {
++votes;
}
if (votes == majority) {
won = true;
break;
}
distances[curDistance++] = distance;
}
NeuralNetManager.instance(thread).resetNext();
if (!won) {
int maxVotes = 0;
ComparableNode maxComparableNode = null;
for (Map.Entry<ComparableNode, Integer> entry : votesMap.entrySet()) {
if (entry.getValue().intValue() == maxVotes) {
maxComparableNode = null;
} else if (entry.getValue().intValue() > maxVotes) {
maxVotes = entry.getValue().intValue();
maxComparableNode = entry.getKey();
}
}
if (maxComparableNode == null) {
maxComparableNode = fallback;
}
if (!target.equals(maxComparableNode)) {
int totalDistance = 0;
Arrays.sort(distances);
for (int i = 0; i < majority; i++) {
totalDistance += distances[i];
}
score += totalDistance + SCORE_PENALTY;
}
}
} else {
int distance = 0;
Integer[][] comparisonCache = new Integer[nodesArray.length][nodesArray.length];
if (targetIndex < 0) {
for (int i = 0; i < nodesArray.length; i++) {
if (nodesArray[i].equals(target)) {
targetIndex = i;
break;
}
}
}
for (int i = 0; i < nodesArray.length; i++) {
if (!target.equals(nodesArray[i])) {
int result = target.compare(nodesArray[i]);
if (result != -1) {
++distance;
}
comparisonCache[targetIndex][i] = new Integer(result);
comparisonCache[i][targetIndex] = new Integer(result * (-1));
}
}
TrainingData trainingData = new TrainingData(target);
best(nodesArray, comparisonCache, null, trainingData);
score += (distance - trainingData.winnerDistance) + trainingData.finalMisses;
score += trainingData.winner ? 0 : SCORE_PENALTY;
}
return score;
}
private static ComparableNode[] performInternal(final Element body, final int page,
final HtmlNode matchResult, final HtmlNode matchParent, final Collection<Node> ignore, int thread) {
final Map<Node, ComparableNode> nodes = new HashMap<Node, ComparableNode>();
if (body != null) {
body.traverse(new NodeVisitor() {
@Override
public void head(Node node, int depth) {
int nonEmptyChildren = 0;
for (Node child : node.childNodes()) {
if (!NodeUtil.isEmpty(child)) {
nonEmptyChildren++;
}
}
if (!NodeUtil.isEmpty(node)
&& NodeUtil.isContent(node, matchResult, matchParent) && nonEmptyChildren > 0) {
nodes.put(node, new ComparableNode(node, matchResult, matchParent, thread));
}
}
@Override
public void tail(Node node, int depth) {}
});
}
return nodes.values().toArray(new ComparableNode[0]);
}
public static class Cache {
public ComparableNode[] nodesCache = null;
public Integer[][][] comparisonCache = null;
}
public static List<Node> perform(Element body, int page, Collection<Node> ignore,
HtmlNode matchResult, HtmlNode matchParent, Cache cache, int thread) {
Map<ComparableNode, Integer> votes = new LinkedHashMap<ComparableNode, Integer>();
if (cache == null) {
cache = new Cache();
}
if (cache.nodesCache == null) {
cache.nodesCache = performInternal(body, page, matchResult, matchParent, ignore, thread);
cache.comparisonCache = new Integer[NeuralNetManager.instance(thread).multiSize()]
[cache.nodesCache.length][cache.nodesCache.length];
}
final int majority = (NeuralNetManager.instance(thread).multiSize() / TWICE) + 1;
Node best = null;
int cur = 0;
NeuralNetManager.instance(thread).resetNext();
while (NeuralNetManager.instance(thread).hasNext()) {
ComparableNode winner = best(cache.nodesCache, cache.comparisonCache[cur++],
new HashSet<Node>(ignore), null);
NeuralNetManager.instance(thread).next();
if (winner != null) {
if (!votes.containsKey(winner)) {
votes.put(winner, new Integer(1));
} else {
votes.put(winner, new Integer(votes.get(winner).intValue() + 1));
}
if (votes.get(winner).intValue() == majority) {
best = winner.node();
break;
}
}
}
if (best == null) {
int bestVotes = 0;
List<Node> bestNodes = new ArrayList<Node>();
for (Map.Entry<ComparableNode, Integer> entry : votes.entrySet()) {
int val = entry.getValue().intValue();
if (val >= bestVotes) {
if (val > bestVotes) {
bestVotes = val;
bestNodes.clear();
}
bestNodes.add(entry.getKey().node());
}
}
return bestNodes;
}
return Arrays.asList(new Node[] { best });
}
}
Hi,
I found strange behavior of the language
argument in client.parse:
when you parse files with and without this argument and then select files with a specific language, you get a different number of files.
Code to reproduce:
import argparse
import glob
import os
import bblfsh
from bblfsh.client import NonUTF8ContentException
def prepare_files(folder, client, language, use_lang=True):
    files = []
    # collect filenames with full path
    filenames = glob.glob(folder, recursive=True)
    for file in filenames:
        if not os.path.isfile(file):
            continue
        try:
            # TODO (Egor): figure out why `language` argument changes number of files significantly
            if use_lang:
                res = client.parse(file, language)
            else:
                res = client.parse(file)
        except NonUTF8ContentException:
            # skip files that can't be parsed because of UTF-8 decoding errors.
            continue
        if res.status == 0 and res.language.lower() == language.lower():
            files.append("")
    return files

def test_client(args):
    client = bblfsh.BblfshClient(args.bblfsh)
    files = prepare_files(args.input, client, args.language, args.use_lang)
    print("Number of files: %s" % (len(files)))

def create_parser():
    parser = argparse.ArgumentParser()
    parser.add_argument("-i", "--input", required=True, type=str,
                        help="Path to folder with source code - "
                             "should be in a format compatible with glob "
                             "(ends with **/* and surrounded by quotes. Ex: `path/**/*`).")
    parser.add_argument("--bblfsh", default="0.0.0.0:9432",
                        help="Babelfish server's address.")
    # I'm using javascript for experiments
    parser.add_argument("-l", "--language", default="javascript",
                        help="Programming language to use.")
    parser.add_argument("-u", "--use-lang", action="store_true",
                        help="If lang in client.parse should be used.")
    return parser

def main():
    parser = create_parser()
    args = parser.parse_args()
    client = bblfsh.BblfshClient(args.bblfsh)
    files = prepare_files(args.input, client, args.language, args.use_lang)
    print("Number of files %s with args %s" % (len(files), args))

if __name__ == "__main__":
    main()
and results:
egor@egor-sourced:~/workspace/style-analyzer$ python3 lookout/style/format/test_client.py -i '/home/egor/workspace/tmp/freeCodeCamp/**/*'
Number of files 187 with args Namespace(bblfsh='0.0.0.0:9432', input='/home/egor/workspace/tmp/freeCodeCamp/**/*', language='javascript', use_lang=False)
egor@egor-sourced:~/workspace/style-analyzer$ python3 lookout/style/format/test_client.py -i '/home/egor/workspace/tmp/freeCodeCamp/**/*' -u
Number of files 258 with args Namespace(bblfsh='0.0.0.0:9432', input='/home/egor/workspace/tmp/freeCodeCamp/**/*', language='javascript', use_lang=True)
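To pinpoint the discrepancy, prepare_files could append the filename instead of an empty string, and the two runs could then be diffed. A minimal, bblfsh-free sketch of that comparison (the file names below are made up):

```python
def diff_results(with_lang, without_lang):
    """Report the files accepted in only one of the two runs."""
    with_lang, without_lang = set(with_lang), set(without_lang)
    return {
        "only_with_lang": sorted(with_lang - without_lang),
        "only_without_lang": sorted(without_lang - with_lang),
    }

# 258 vs 187 accepted files means the first set should be strictly larger.
d = diff_results(["a.js", "b.js", "c.min.js"], ["a.js", "b.js"])
```

Inspecting the files that appear only in one set would show whether the explicit language argument forces parsing of files the auto-detection would skip, or vice versa.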
Similar to bblfsh/scala-client#68, it would be really useful to have this feature in the Python client.
Right now we (the ML team) keep a hardcoded list of languages: https://github.com/src-d/ml/blob/master/sourced/ml/cmd_entries/args.py#L36-L40 and have to update it every time.
bblfsh/pyuast.c: At top level:
bblfsh/pyuast.c:179:27: error: storage size of ‘module_def’ isn’t known
static struct PyModuleDef module_def = {
In requirements.txt:
grpcio>=1.3,<=2.0
grpcio-tools>=1.3,<2.0
In setup.py:
install_requires=["grpcio==1.10.0", "grpcio-tools==1.10.0", "docker", "protobuf>=3.4.0"],
Now that version 1.11.0 of grpcio
is out, this causes conflicts during the coverage check for ml, among other things.
Could you either verify that 1.11 is supported and update setup.py, or modify the requirements to ensure we use version 1.10?
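One possible interim fix, assuming 1.11 has not yet been verified, is to make both files agree on the 1.10 series:

```
# requirements.txt and setup.py install_requires (sketch)
grpcio>=1.10,<1.11
grpcio-tools>=1.10,<1.11
```

This keeps pip from silently picking up a new minor release until it has been tested.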
Hi,
I found that the Python driver lacks position information for several types of tokens.
import bblfsh

client = bblfsh.BblfshClient("0.0.0.0:9432")
file_loc = "location/of/file.py"

# read content
with open(file_loc, "r") as f:
    content = f.read()

# extract uast
uast = client.parse(file_loc).uast

# select nodes with tokens and sort them by position
nodes = []
for node in bblfsh.iterator(uast, bblfsh.TreeOrder.PRE_ORDER):
    if node.token:
        nodes.append(node)
nodes = list(sorted(nodes, key=lambda n: n.start_position.offset))

# print token position, token, select source by position information
for n in nodes:
    print(n.start_position.offset, n.token,
          content[n.start_position.offset:n.start_position.offset + len(n.token)],
          content[n.start_position.offset:n.end_position.offset + 1],
          sep="|")
The source code I used is below:
import argparse
import os
import tempfile
import unittest

import sourced.ml.tests.models as paths
from sourced.ml.models import Topics
from sourced.ml.cmd import bigartm2asdf


class TopicsTests(unittest.TestCase):
    def setUp(self):
        self.model = Topics().load(source=paths.TOPICS)

    def test_dump(self):
        res = self.model.dump()
        self.assertEqual(res, """320 topics, 1000 tokens
First 10 tokens: ['ulcancel', 'domainlin', 'trudi', 'fncreateinstancedbaselin', 'wbnz', 'lmultiplicand', 'otronumero', 'qxln', 'gvgq', 'polaroidish']
Topics: unlabeled
non-zero elements: 6211 (0.019409)""")  # noqa

    def test_props(self):
        self.assertEqual(len(self.model), 320)
        self.assertEqual(len(self.model.tokens), 1000)
        self.assertIsNone(self.model.topics)
        zt = self.model[0]
        self.assertEqual(len(zt), 8)
        self.assertEqual(zt[0][0], "olcustom")
        self.assertAlmostEqual(zt[0][1], 1.23752e-06, 6)

    def test_label(self):
        with self.assertRaises(ValueError):
            self.model.label_topics([1, 2, 3])
        with self.assertRaises(TypeError):
            self.model.label_topics(list(range(320)))
        self.model.label_topics([str(i) for i in range(320)])
        self.assertEqual(self.model.topics[0], "0")

    def test_save(self):
        with tempfile.NamedTemporaryFile(prefix="sourced.ml-topics-test-") as f:
            self.model.save(f.name)
            new = Topics().load(f.name)
            self.assertEqual(self.model.tokens, new.tokens)
            self.assertEqual((self.model.matrix != new.matrix).getnnz(), 0)

    def test_bigartm2asdf(self):
        with tempfile.NamedTemporaryFile(prefix="sourced.ml-topics-test-") as f:
            args = argparse.Namespace(
                input=os.path.join(os.path.dirname(__file__), paths.TOPICS_SRC),
                output=f.name)
            bigartm2asdf(args)
            model = Topics().load(f.name)
            self.assertEqual(len(model), 320)
            self.assertEqual(len(model.tokens), 1000)


if __name__ == "__main__":
    unittest.main()
As a result, we notice several tokens without position information:
0|argparse|import a|i
0|os|im|i
0|tempfile|import a|i
0|unittest|import a|i
0|sourced.ml.tests.models|import argparse
import |i
0|paths|impor|i
0|sourced.ml.models|import argparse
i|i
0|Topics|import|i
0|sourced.ml.cmd|import argpars|i
0|bigartm2asdf|import argpa|i
0|source|import|i
0|!=|im|i
0|prefix|import|i
0|input|impor|i
0|output|import|i
0|prefix|import|i
0|==|im|i
184|TopicsTests|TopicsTests|TopicsTests
Some of them are imports, like:
0|argparse|import a|i
0|os|im|i
some are operators:
0|==|im|i
0|!=|im|i
some are arguments:
0|source|import|i
0|prefix|import|i
0|input|impor|i
0|output|import|i
0|prefix|import|i
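A quick, bblfsh-free way to flag such tokens programmatically: if the token text does not occur in the source at the reported offset, the position information is bogus. The sample line below is hypothetical:

```python
def position_is_consistent(offset, token, content):
    """True if `token` actually occurs at `offset` in `content`."""
    return content[offset:offset + len(token)] == token

src = "import argparse\n"
ok = position_is_consistent(7, "argparse", src)   # real position of the token
bad = position_is_consistent(0, "argparse", src)  # driver reported offset 0
```

Running this check over the sorted node list above would separate the genuinely positioned tokens from the ones the driver left at offset 0.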
How to reproduce:
wget https://raw.githubusercontent.com/lumoslabs/aleph/master/vendor/assets/javascripts/ace/mode-pgsql.js
in python:
import bblfsh
client = bblfsh.BblfshClient("0.0.0.0:9432")
client.parse("mode-pgsql.js")
And you will get:
ERROR:root:Exception deserializing message!
Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/grpc/_common.py", line 87, in _transform
return transformer(message)
google.protobuf.message.DecodeError: Error parsing message
Traceback (most recent call last):
File "/home/k/.local/lib/python3.5/site-packages/IPython/core/interactiveshell.py", line 2910, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-33-6f0fa647fcd6>", line 1, in <module>
resp = bf.parse("/home/k/sourced/workdir/mode-pgsql.js")
File "/usr/local/lib/python3.5/dist-packages/bblfsh/client.py", line 74, in parse
return self._stub.Parse(request, timeout=timeout)
File "/usr/local/lib/python3.5/dist-packages/grpc/_channel.py", line 487, in call
return _end_unary_response_blocking(state, call, False, deadline)
File "/usr/local/lib/python3.5/dist-packages/grpc/_channel.py", line 437, in _end_unary_response_blocking
raise _Rendezvous(state, None, None, deadline)
grpc._channel._Rendezvous: <_Rendezvous of RPC that terminated with (StatusCode.INTERNAL, Exception deserializing response!)>
versions:
REPOSITORY TAG IMAGE ID CREATED SIZE
bblfsh/bblfshd latest sha256:fb5eb6936c67c20f451daddc09a4648cc26da978041af36f9f70b246f71d804d 13 days ago 173MB
| javascript | docker://bblfsh/javascript-driver:latest | dev-adcd1b4 | beta | 11 days | alpine | 1.9 | 8.9.3-r0 |
I am not sure whether it is related to client-python or to the JavaScript driver...
I get the following output when running pip install -U bblfsh:
https://gist.github.com/zurk/c327a3923529aab58a246ef23deb148a
Hi,
I found that the client returns an empty response, without any warning or error, when code with a wrong file extension is sent to it.
Code to reproduce:
import bblfsh
client = bblfsh.BblfshClient("0.0.0.0:9432")
print(client.parse("java_code.py").uast)
The sample is attached:
java_code.py.zip
The release was today: https://github.com/grpc/grpc/releases/tag/v1.10.0
issue: grpc/grpc#14573
run:
mkdir test
cd test
virtualenv -p python3 .venv-py3
source .venv-py3/bin/activate
pip install bblfsh
You will get an error from the grpcio installation:
Command "/Users/k/test/.venv-py3/bin/python3.6 -u -c "import setuptools, tokenize;__file__='/private/var/folders/v5/3pvllllj4l1dvx7s_2qkq0_40000gn/T/pip-build-y50usx77/grpcio/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /var/folders/v5/3pvllllj4l1dvx7s_2qkq0_40000gn/T/pip-5owjmdjl-record/install-record.txt --single-version-externally-managed --compile --install-headers /Users/k/test/.venv-py3/bin/../include/site/python3.6/grpcio" failed with error code 1 in /private/var/folders/v5/3pvllllj4l1dvx7s_2qkq0_40000gn/T/pip-build-y50usx77/grpcio/
full log file:
grpcio.logs.zip
I suggest pinning the grpcio package version. I am also fine with any other solution; let's discuss.
With newlines included:
class Repo2nBOW(Repo2Base):
    @property
    def id2vec(self):
        return self._id2vec
This produces two instances of role 141,
but only with client-python; it works in the integration tests and with other ways of producing the UAST, so the bug is probably in this project.
Currently the sdk/generated.proto
and uast/generated.proto
files are included in the repository. To correctly support SDK versioning, they should instead be downloaded from the gopkg.in redirect URL.
I saw in travis.yml that it is only tested on 3.6 and 3.7, but on PyPI it is listed in the 2.7 category. Is Python 2.7 supported?
In the clients guide:
The client API's differ to adapt to their language specific idioms, the following [codes] [shows] several simple examples with the Go, Python and Scala clients that [parsers] a file and [applies] a filter to return all the simple identifiers.
codes: maybe it could be replaced with "code snippets", for example.
shows: should be "show".
parsers -> parse
applies -> apply
As I'm not a native English speaker, please double-check anything non-trivial I point out, in case I am wrong.
I ran the following code and got:
import bblfsh
client = bblfsh.BblfshClient("0.0.0.0:9432")
print(client.version())
elapsed {
}
version: "v2.11.0"
build {
seconds: -62135596800
}
print(client.supported_languages())
[name: "JavaScript"
language: "javascript"
version: "dev-adcd1b4"
status: "beta"
features: "ast"
features: "uast"
features: "roles"
]
The version string has a negative number of seconds and empty "elapsed".
The javascript driver version is reported to be "dev-adcd1b4", which does not help to check that it is 1.2.0 (which we strictly require since the newer versions do not work for us).
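Given that, a client-side guard can do no better than reject version strings it cannot compare. A hypothetical sketch (the "v1.2.0" and "dev-…" formats are taken from the output above):

```python
def driver_version_ok(reported, required=(1, 2, 0)):
    """True if `reported` is a release tag at least `required`.
    Strings like "dev-adcd1b4" are rejected: they cannot be compared."""
    if not reported.startswith("v"):
        return False
    try:
        parts = tuple(int(p) for p in reported[1:].split("."))
    except ValueError:
        return False
    return parts >= required

ok = driver_version_ok("v1.2.0")
dev = driver_version_ok("dev-adcd1b4")
```

Until drivers report a proper release version, a strict "must be 1.2.0" requirement simply cannot be enforced from the client.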
Hi,
I measured the time for the same query that I used in #100, and I think it is very suspicious that it is so slow.
Measurements:
egor@egor-sourced:~/workspace/ml$ time python3 test_filter_libuast.py 100 sourced/ml/__main__.py
Memory increased by: 2099%
real 0m32.773s
user 0m25.209s
sys 0m0.425s
egor@egor-sourced:~/workspace/ml$ time python3 test_filter_libuast.py 200 sourced/ml/__main__.py
Memory increased by: 4130%
real 1m22.348s
user 1m5.700s
sys 0m0.776s
egor@egor-sourced:~/workspace/ml$ time python3 test_filter_libuast.py 400 sourced/ml/__main__.py
Memory increased by: 8275%
real 2m49.173s
user 2m19.193s
sys 0m1.528s
Currently the tests pull the latest
server image, which points to the last versioned tag. This means that changes in the SDK or server can't be properly tested in client-python until a new tag is released.
Hi. I am new to bblfsh and was wondering if I could convert a UAST into Python's AST structure defined by the ast
module. Thanks!
If you extract the UAST for a=None
and run this command:
bblfsh.filter(uast, "//*[@roleAssignment]")
you will get an empty list.
The same happens for the "//*[@roleNull]"
query.
P.S. UAST structure:
# Token Internal Role Roles Tree
|| Module FILE
1 || Assign ┣ BINARY, ASSIGNMENT, EXPRESSION
1 |a| Name ┃ ┣ LEFT, IDENTIFIER, EXPRESSION
1 |<nil>| NoneLiteral ┗ ┗ LITERAL, NULL, EXPRESSION, RIGHT
This might be linked to issue 89. When I looked for this kind of error on Slack, I found a thread from a couple of days ago where Egor had this issue, and @smola referenced it. However, after looking at it, I do not think it is the same, since that one seems to concern JavaScript files.
I was trying to run apollo
bags on 4G of files, with the following languages specified: Bash, Java, Python, Ruby. Then, during deserialization, I got this error log:
org.apache.spark.api.python.PythonException: Traceback (most recent call last):
File "/opt/spark/python/lib/pyspark.zip/pyspark/worker.py", line 177, in main
process()
File "/opt/spark/python/lib/pyspark.zip/pyspark/worker.py", line 172, in process
serializer.dump_stream(func(split_index, iterator), outfile)
File "/opt/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 268, in dump_stream
vs = list(itertools.islice(iterator, batch))
File "/usr/local/lib/python3.4/dist-packages/sourced/ml/transformers/basic.py", line 186, in deserialize_uast
File "/usr/local/lib/python3.4/dist-packages/sourced/ml/transformers/basic.py", line 186, in <listcomp>
google.protobuf.message.DecodeError: Error parsing message
at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRDD.scala:193)
at org.apache.spark.api.python.PythonRunner$$anon$1.next(PythonRDD.scala:156)
at org.apache.spark.api.python.PythonRunner$$anon$1.next(PythonRDD.scala:152)
at org.apache.spark.InterruptibleIterator.next(InterruptibleIterator.scala:40)
at org.apache.spark.storage.memory.MemoryStore.putIteratorAsBytes(MemoryStore.scala:372)
at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1055)
at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1029)
at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:969)
at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1029)
at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:760)
at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:334)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:285)
at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:63)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.api.python.PairwiseRDD.compute(PythonRDD.scala:395)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
at org.apache.spark.scheduler.Task.run(Task.scala:108)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
The error popped at this line; as you can see, a fix has already been implemented to circumvent this for now, but still.
Since I had not seen this error before, I tried specifying only one language at a time to see if it was a driver problem. After several trials, I was surprised to see that it was the Python driver causing it (surprised because, when testing on another batch of siva files that I knew contained Python files, this error did not appear). Looking in the executor logs, I found errors like these:
18/04/04 14:11:56 WARN Bblfsh: FATAL os/src/shell/micropython/tests/pyboard.py: Traceback (most recent call last):
File "/usr/lib/python3.6/site-packages/python_driver/requestprocessor.py", line 139, in process_request
'\n------ Python3 errors:\n%s' % codeinfo['py3_ast_errors']
Exception: Errors produced trying to get an AST for both Python versions
------ Python2 errors:
[b'Traceback (most recent call last):\n File "<string>", line 1, in <module>\n File "/usr/lib/python2.7/site-packages/pydetector/ast2dict.py", line 19, in ast2dict\n return visitor.parse()\n File "/usr/lib/python2.7/site-packages/pydetector/ast2dict.py", line 45, in parse\n tree = ast.parse(self.codestr, mode=\'exec\')\n File "/usr/lib/python2.7/ast.py", line 37, in parse\n return compile(source, filename, mode, PyCF_ONLY_AST)\n File "<unknown>", line 1\n ../tools/pyboard.py\n ^\nSyntaxError: invalid syntax\n']
------ Python3 errors:
['Traceback (most recent call last):\n File "/usr/lib/python3.6/site-packages/pydetector/ast_checks.py", line 53, in check_ast\n current_ast = ast2dict(code)\n File "/usr/lib/python3.6/site-packages/pydetector/ast2dict.py", line 19, in ast2dict\n return visitor.parse()\n File "/usr/lib/python3.6/site-packages/pydetector/ast2dict.py", line 45, in parse\n tree = ast.parse(self.codestr, mode=\'exec\')\n File "/usr/lib/python3.6/ast.py", line 35, in parse\n return compile(source, filename, mode, PyCF_ONLY_AST)\n File "<unknown>", line 1\n ../tools/pyboard.py\n ^\nSyntaxError: invalid syntax\n']
So yeah, I don't have much more info than this. It is not insanely urgent, as we have a workaround at the moment, but it is still pretty annoying.
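The driver tracebacks above both fail on the literal string `../tools/pyboard.py`, which suggests the driver received a file path as its source code rather than the file's contents. Feeding that string to `ast.parse` reproduces the same `SyntaxError` locally — a minimal repro, not the driver code itself:

```python
import ast

source = "../tools/pyboard.py"  # a path string, not valid Python source
try:
    ast.parse(source, mode="exec")
except SyntaxError as exc:
    # Same failure mode as in the driver logs: "invalid syntax" at line 1.
    print("SyntaxError at line", exc.lineno)
```

If that diagnosis is right, the corrupt protobuf messages and the driver errors would share a root cause upstream of the driver, in whatever hands it the request content.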
Hi,
I tried to call https://github.com/bblfsh/client-python/blob/master/bblfsh/client.py#L103:
client.supported_languages()
and it gives me
_Rendezvous Traceback (most recent call last)
<ipython-input-11-50b543ce587c> in <module>()
----> 1 client.supported_languages()
/usr/local/lib/python3.5/dist-packages/bblfsh/client.py in supported_languages(self)
102
103 def supported_languages(self):
--> 104 sup_response = self._stub.SupportedLanguages(SupportedLanguagesRequest())
105 return sup_response.languages
106
/usr/local/lib/python3.5/dist-packages/grpc/_channel.py in __call__(self, request, timeout, metadata, credentials)
512 def __call__(self, request, timeout=None, metadata=None, credentials=None):
513 state, call, = self._blocking(request, timeout, metadata, credentials)
--> 514 return _end_unary_response_blocking(state, call, False, None)
515
516 def with_call(self, request, timeout=None, metadata=None, credentials=None):
/usr/local/lib/python3.5/dist-packages/grpc/_channel.py in _end_unary_response_blocking(state, call, with_call, deadline)
446 return state.response
447 else:
--> 448 raise _Rendezvous(state, None, None, deadline)
449
450
_Rendezvous: <_Rendezvous of RPC that terminated with:
status = StatusCode.UNIMPLEMENTED
details = "unknown method SupportedLanguages"
debug_error_string = "{"created":"@1536337572.354363571","description":"Error received from peer","file":"src/core/lib/surface/call.cc","file_line":1083,"grpc_message":"unknown method SupportedLanguages","grpc_status":12}"
>
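`StatusCode.UNIMPLEMENTED` here most likely means the server you are talking to simply does not expose the SupportedLanguages RPC (i.e. the bblfshd version is older than the client), so the real fix is to upgrade the server. Until then, a caller can degrade gracefully. A sketch of that pattern using stand-in classes so it is self-contained — real code would catch `grpc.RpcError` and compare `e.code()` against `grpc.StatusCode.UNIMPLEMENTED` instead:

```python
# Stand-ins for grpc.StatusCode / grpc.RpcError, purely for illustration.
class StatusCode:
    UNIMPLEMENTED = "UNIMPLEMENTED"

class RpcError(Exception):
    def __init__(self, status):
        self._status = status

    def code(self):
        return self._status

def supported_languages(stub_call):
    """Return the server's language list, or [] if the RPC is missing."""
    try:
        return stub_call()
    except RpcError as e:
        if e.code() == StatusCode.UNIMPLEMENTED:
            # Old server: the SupportedLanguages RPC does not exist there.
            return []
        raise

def old_server():
    raise RpcError(StatusCode.UNIMPLEMENTED)

print(supported_languages(old_server))  # → []
```

Returning an empty list keeps callers working against old servers while still re-raising any unrelated RPC failure.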
I updated the bblfsh-client version to bblfsh-2.12.1, but in Python it still reports this client version:
elapsed {
}
version: "v2.4.1"
build {
seconds: -62135596800
}