
python-client's Issues

UnicodeDecodeError in client.parse()

I am trying to run bblfsh from Python on this file: https://github.com/INRIA/spoon/blob/master/src/test/resources/noclasspath/IsoEncoding.java
As you can see, the file contains non-UTF-8 characters.

If I run

import bblfsh

client = bblfsh.BblfshClient("0.0.0.0:9234")
res = client.parse("IsoEncoding.java")

I get

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/google/protobuf/internal/type_checkers.py", line 182, in CheckValue
    proposed_value = proposed_value.decode('utf-8')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 69: invalid continuation byte

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Applications/PyCharm.app/Contents/helpers/pydev/pydevd.py", line 1599, in <module>
    globals = debugger.run(setup['file'], None, None, is_module)
  File "/Applications/PyCharm.app/Contents/helpers/pydev/pydevd.py", line 1026, in run
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "/Applications/PyCharm.app/Contents/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "/Users/k/sourced/ast2vec/how_to_use_ast2vec.py", line 4, in <module>
    res = client.parse("../bad_file/IsoEncoding.java")
  File "/usr/local/lib/python3.6/site-packages/bblfsh/client.py", line 56, in parse
    language=self._scramble_language(language))
  File "/usr/local/lib/python3.6/site-packages/google/protobuf/internal/python_message.py", line 526, in init
    setattr(self, field_name, field_value)
  File "/usr/local/lib/python3.6/site-packages/google/protobuf/internal/python_message.py", line 662, in field_setter
    new_value = type_checker.CheckValue(new_value)
  File "/usr/local/lib/python3.6/site-packages/google/protobuf/internal/type_checkers.py", line 187, in CheckValue
    (proposed_value))
ValueError: b'package my.pack;\n\npublic class IsoEncoding {\n\n    private String entr\xe9e;\n    public IsoEncoding(String maCha\xeene) {\n        this.entr\xe9e = maCha\xeene;\n    }\n\n    private String r\xe9cup\xe9rerUneEntr\xe9e() {\n        return this.entr\xe9e;\n    }\n}' has type bytes, but isn't valid UTF-8 encoding. Non-UTF-8 strings must be converted to unicode objects before being added.

Is this the expected behavior for non-UTF-8 files?
If so, I think the bblfsh client should raise a clearer error message.
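A possible workaround until the client produces a clearer error is to transcode the file to UTF-8 yourself and pass the text in directly. This is only a sketch: it assumes your client version's parse() accepts a contents= keyword, and "latin-1" is a guess at the file's real encoding.

```python
# Workaround sketch: transcode a non-UTF-8 source file before parsing.
# The contents= keyword and the "latin-1" fallback are assumptions.
def read_as_utf8(path, fallback="latin-1"):
    with open(path, "rb") as f:
        raw = f.read()
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # latin-1 maps every possible byte, so this decode cannot fail.
        return raw.decode(fallback)

# client = bblfsh.BblfshClient("0.0.0.0:9432")
# res = client.parse("IsoEncoding.java", contents=read_as_utf8("IsoEncoding.java"))
```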

'pip install bblfsh' fails on RHEL 7.4

I ran pip install bblfsh on RHEL 7.4 and I get:

Failed to build bblfsh
Installing collected packages: bblfsh, chardet, certifi, urllib3, idna, requests
Running setup.py install for bblfsh ... error
Complete output from command /opt/rh/rh-python35/root/usr/bin/python3 -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-phlmbc69/bblfsh/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-kkd6eyh4-record/install-record.txt --single-version-externally-managed --compile:
    running install
    running build
    running build_py
    creating build
    creating build/lib.linux-x86_64-3.5
    creating build/lib.linux-x86_64-3.5/bblfsh
    copying bblfsh/__main__.py -> build/lib.linux-x86_64-3.5/bblfsh
    copying bblfsh/launcher.py -> build/lib.linux-x86_64-3.5/bblfsh
    copying bblfsh/aliases.py -> build/lib.linux-x86_64-3.5/bblfsh
    copying bblfsh/sdkversion.py -> build/lib.linux-x86_64-3.5/bblfsh
    copying bblfsh/__init__.py -> build/lib.linux-x86_64-3.5/bblfsh
    copying bblfsh/client.py -> build/lib.linux-x86_64-3.5/bblfsh
    copying bblfsh/test.py -> build/lib.linux-x86_64-3.5/bblfsh
    creating build/lib.linux-x86_64-3.5/bblfsh/gopkg
    copying bblfsh/gopkg/__init__.py -> build/lib.linux-x86_64-3.5/bblfsh/gopkg
    creating build/lib.linux-x86_64-3.5/bblfsh/github
    copying bblfsh/github/__init__.py -> build/lib.linux-x86_64-3.5/bblfsh/github
    creating build/lib.linux-x86_64-3.5/bblfsh/gopkg/in
    copying bblfsh/gopkg/in/__init__.py -> build/lib.linux-x86_64-3.5/bblfsh/gopkg/in
    creating build/lib.linux-x86_64-3.5/bblfsh/gopkg/in/bblfsh
    copying bblfsh/gopkg/in/bblfsh/__init__.py -> build/lib.linux-x86_64-3.5/bblfsh/gopkg/in/bblfsh
    creating build/lib.linux-x86_64-3.5/bblfsh/gopkg/in/bblfsh/sdk
    copying bblfsh/gopkg/in/bblfsh/sdk/__init__.py -> build/lib.linux-x86_64-3.5/bblfsh/gopkg/in/bblfsh/sdk
    creating build/lib.linux-x86_64-3.5/bblfsh/gopkg/in/bblfsh/sdk/v1
    copying bblfsh/gopkg/in/bblfsh/sdk/v1/__init__.py -> build/lib.linux-x86_64-3.5/bblfsh/gopkg/in/bblfsh/sdk/v1
    creating build/lib.linux-x86_64-3.5/bblfsh/gopkg/in/bblfsh/sdk/v1/uast
    copying bblfsh/gopkg/in/bblfsh/sdk/v1/uast/generated_pb2.py -> build/lib.linux-x86_64-3.5/bblfsh/gopkg/in/bblfsh/sdk/v1/uast
    copying bblfsh/gopkg/in/bblfsh/sdk/v1/uast/__init__.py -> build/lib.linux-x86_64-3.5/bblfsh/gopkg/in/bblfsh/sdk/v1/uast
    creating build/lib.linux-x86_64-3.5/bblfsh/gopkg/in/bblfsh/sdk/v1/protocol
    copying bblfsh/gopkg/in/bblfsh/sdk/v1/protocol/generated_pb2.py -> build/lib.linux-x86_64-3.5/bblfsh/gopkg/in/bblfsh/sdk/v1/protocol
    copying bblfsh/gopkg/in/bblfsh/sdk/v1/protocol/__init__.py -> build/lib.linux-x86_64-3.5/bblfsh/gopkg/in/bblfsh/sdk/v1/protocol
    copying bblfsh/gopkg/in/bblfsh/sdk/v1/protocol/generated_pb2_grpc.py -> build/lib.linux-x86_64-3.5/bblfsh/gopkg/in/bblfsh/sdk/v1/protocol
    creating build/lib.linux-x86_64-3.5/bblfsh/github/com
    copying bblfsh/github/com/__init__.py -> build/lib.linux-x86_64-3.5/bblfsh/github/com
    creating build/lib.linux-x86_64-3.5/bblfsh/github/com/gogo
    copying bblfsh/github/com/gogo/__init__.py -> build/lib.linux-x86_64-3.5/bblfsh/github/com/gogo
    creating build/lib.linux-x86_64-3.5/bblfsh/github/com/gogo/protobuf
    copying bblfsh/github/com/gogo/protobuf/__init__.py -> build/lib.linux-x86_64-3.5/bblfsh/github/com/gogo/protobuf
    creating build/lib.linux-x86_64-3.5/bblfsh/github/com/gogo/protobuf/gogoproto
    copying bblfsh/github/com/gogo/protobuf/gogoproto/gogo_pb2.py -> build/lib.linux-x86_64-3.5/bblfsh/github/com/gogo/protobuf/gogoproto
    copying bblfsh/github/com/gogo/protobuf/gogoproto/__init__.py -> build/lib.linux-x86_64-3.5/bblfsh/github/com/gogo/protobuf/gogoproto
    running build_ext
curl -SL https://github.com/bblfsh/libuast/releases/download/v1.6.0/libuast-v1.6.0.tar.gz | tar xz
      % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                     Dload  Upload   Total   Spent    Left  Speed
    100   611    0   611    0     0   2395      0 --:--:-- --:--:-- --:--:--  2405
    100 21511  100 21511    0     0  42866      0 --:--:-- --:--:-- --:--:-- 42866
    mv libuast-v1.6.0 libuast
    cp -a libuast/src bblfsh/libuast
    rm -rf libuast
    building 'bblfsh.pyuast' extension
    creating build/temp.linux-x86_64-3.5
    creating build/temp.linux-x86_64-3.5/bblfsh
    creating build/temp.linux-x86_64-3.5/bblfsh/libuast
    g++ -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -D_GNU_SOURCE -fPIC -fwrapv -I/opt/rh/rh-python35/root/usr/include -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -D_GNU_SOURCE -fPIC -fwrapv -fPIC -Ibblfsh/libuast/ -I/usr/local/include -I/usr/local/include/libxml2 -I/usr/include -I/usr/include/libxml2 -I/opt/rh/rh-python35/root/usr/include/python3.5m -c bblfsh/pyuast.c -o build/temp.linux-x86_64-3.5/bblfsh/pyuast.o -std=c++11
    g++ -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -D_GNU_SOURCE -fPIC -fwrapv -I/opt/rh/rh-python35/root/usr/include -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -D_GNU_SOURCE -fPIC -fwrapv -fPIC -Ibblfsh/libuast/ -I/usr/local/include -I/usr/local/include/libxml2 -I/usr/include -I/usr/include/libxml2 -I/opt/rh/rh-python35/root/usr/include/python3.5m -c bblfsh/libuast/uast.cc -o build/temp.linux-x86_64-3.5/bblfsh/libuast/uast.o -std=c++11
    In file included from bblfsh/libuast/uast.cc:4:0:
    bblfsh/libuast/uast_private.h:4:25: fatal error: libxml/tree.h: No such file or directory
     #include <libxml/tree.h>
                             ^
    compilation terminated.
    error: command 'g++' failed with exit status 1

    ----------------------------------------
Command "/opt/rh/rh-python35/root/usr/bin/python3 -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-phlmbc69/bblfsh/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-kkd6eyh4-record/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /tmp/pip-build-phlmbc69/bblfsh/

Additionally, the libxml2-dev package is installed on the system.

Error on Windows installation

Original report by @AnneshaChowdhury: src-d/tmsc#10

Hi,
I am having a hard time installing tmsc; I keep running into problems trying to get it up. I am installing inside a 'venv' on Windows 10.
This is the snippet of the error I have been getting.

    pyuast.cc
    bblfsh/pyuast.cc(481): error C2059: syntax error: '.'
    bblfsh/pyuast.cc(502): error C2143: syntax error: missing ';' before '}'
    bblfsh/pyuast.cc(504): error C4430: missing type specifier - int assumed. Note: C++ does not support default-int
    bblfsh/pyuast.cc(504): error C2040: 'ctx': 'int' differs in levels of indirection from 'Uast *'
    bblfsh/pyuast.cc(504): error C2065: 'iface': undeclared identifier
    bblfsh/pyuast.cc(505): error C2059: syntax error: 'return'
    bblfsh/pyuast.cc(506): error C2059: syntax error: '}'
    bblfsh/pyuast.cc(506): error C2143: syntax error: missing ';' before '}'
    error: command 'C:\\Program Files (x86)\\Microsoft Visual Studio\\2017\\Community\\VC\\Tools\\MSVC\\14.11.25503\\bin\\HostX86\\x86\\cl.exe' failed with exit status 2

    ----------------------------------------
Command "c:\progra~1\python36\venv~1\scripts\python.exe -u -c "import setuptools, tokenize;__file__='C:\\Users\\Annesha\\AppData\\Local\\Temp\\pip-install-ur9kqskb\\bblfsh\\setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record C:\Users\Annesha\AppData\Local\Temp\pip-record-jp280ri4\install-record.txt --single-version-externally-managed --compile --install-headers c:\progra~1\python36\venv~1\include\site\python3.6\bblfsh" failed with error code 1 in C:\Users\Annesha\AppData\Local\Temp\pip-install-ur9kqskb\bblfsh\c\ 

Any help is appreciated!

[bug] POSITION_ORDER iterator is broken

Hi,

>>> for node in bblfsh.iterator(uasts[100], bblfsh.TreeOrder.POSITION_ORDER):
...     print(node)
... 
Segmentation fault (core dumped)

All the other iterators work fine.
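Until the iterator is fixed, positional order can be emulated in pure Python: collect the nodes with a working traversal order and sort them by offset. A minimal sketch, assuming node objects carry start_position/end_position with an offset field, as in the protobuf Node:

```python
def position_ordered(nodes):
    """Sort UAST nodes by source position: start offset, then end offset."""
    return sorted(nodes, key=lambda n: (n.start_position.offset,
                                        n.end_position.offset))

# usage (assumed API):
# nodes = position_ordered(bblfsh.iterator(uast, bblfsh.TreeOrder.PRE_ORDER))
```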

Installation problems v0.1.0

When I try to follow the installation notes in the README, it fails.

pip3 install bblfsh leads to the error:

pip3 install bblfsh
Collecting bblfsh
  Downloading bblfsh-0.1.0.tar.gz
    Complete output from command python setup.py egg_info:
    make: *** No rule to make target 'deps'.  Stop.
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-build-m2t1aa3i/bblfsh/setup.py", line 16, in <module>
        subprocess.check_output(['make', 'deps'])
      File "/usr/lib/python3.5/subprocess.py", line 626, in check_output
        **kwargs).stdout
      File "/usr/lib/python3.5/subprocess.py", line 708, in run
        output=stdout, stderr=stderr)
    subprocess.CalledProcessError: Command '['make', 'deps']' returned non-zero exit status 2

    ----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-build-m2t1aa3i/bblfsh/

and

git clone https://github.com/bblfsh/client-python.git
cd client-python
make install

leads to the following error:

Cloning into 'client-python'...
remote: Counting objects: 280, done.
remote: Compressing objects: 100% (17/17), done.
remote: Total 280 (delta 11), reused 21 (delta 8), pack-reused 255
Receiving objects: 100% (280/280), 131.22 KiB | 0 bytes/s, done.
Resolving deltas: 100% (132/132), done.
make: *** No rule to make target `install'.  Stop.

I was able to install it using

git clone https://github.com/bblfsh/client-python.git
cd client-python
pip3 install .

No module named 'bblfsh.gopkg' for v2.12.2

import bblfsh
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.5/dist-packages/bblfsh/__init__.py", line 1, in <module>
    from bblfsh.client import BblfshClient
  File "/usr/local/lib/python3.5/dist-packages/bblfsh/client.py", line 6, in <module>
    from bblfsh.aliases import (ParseRequest, NativeParseRequest, VersionRequest,
  File "/usr/local/lib/python3.5/dist-packages/bblfsh/aliases.py", line 11, in <module>
    "bblfsh.gopkg.in.bblfsh.sdk.%s.uast.generated_pb2" % VERSION)
  File "/usr/lib/python3.5/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
ImportError: No module named 'bblfsh.gopkg'

Different Node classes in the same package

import importlib
import bblfsh

m = importlib.import_module('bblfsh.gopkg.in.bblfsh.sdk.v1.protocol.generated_pb2_grpc')
print(id(m.generated__pb2.gopkg_dot_in_dot_bblfsh_dot_sdk_dot_v1_dot_uast_dot_generated__pb2.Node))
print(id(bblfsh.Node))
m = importlib.import_module('bblfsh.gopkg.in.bblfsh.sdk.v1.uast.generated_pb2')
print(id(m.Node))

The first printed number is different from the rest. That's because bblfsh/gopkg/in/bblfsh/sdk/v1/protocol/generated_pb2_grpc.py does

import generated_pb2 as generated__pb2

As soon as the module is imported from a different PYTHONPATH entry, the classes are duplicated: although they come from the same code, they compare as different objects. Hence isinstance() returns False on 3.7, etc.

Suggested fix: remove all the tricks with PYTHONPATH and make every import absolute everywhere, prefixed with `bblfsh.`.
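The underlying Python behavior can be demonstrated without bblfsh at all: loading the same module file under two different names yields two distinct class objects, which is exactly why the isinstance() checks break. A self-contained sketch:

```python
import importlib.util
import os
import tempfile

def load_as(name, path):
    """Load the module at `path` under the module name `name`."""
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "mod.py")
    with open(path, "w") as f:
        f.write("class Node:\n    pass\n")
    a = load_as("pkg_a.mod", path)
    b = load_as("pkg_b.mod", path)
    print(a.Node is b.Node)              # False: same code, different classes
    print(isinstance(a.Node(), b.Node))  # False, as in the report
```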

The latest version is not found on PyPI

I see version 2.12.7 in the releases on GitHub, but when I try to install it:

Could not find a version that satisfies the requirement bblfsh<3.0,>=2.12.7 (from lookout-sdk==0.2.0) (from versions: 0.0.4, 0.0.5, 0.1.0, 0.2.0, 0.2.1, 0.2.3, 1.0.0, 1.0.1, 1.1.0, 2.0.0, 2.1.0, 2.2.1, 2.3.0, 2.3.1, 2.4.0, 2.4.1, 2.5.1, 2.6.0, 2.6.1, 2.8.0, 2.8.1, 2.8.2, 2.9.0, 2.9.1, 2.9.2, 2.9.3, 2.9.4, 2.9.5, 2.9.6, 2.9.8, 2.9.10, 2.9.12, 2.9.13, 2.9.14, 2.10.0, 2.10.1, 2.11.0, 2.11.1, 2.11.2, 2.12.0, 2.12.1, 2.12.2, 2.12.3, 2.12.4, 2.12.5, 2.12.6)

On https://pypi.org/search/?q=bblfsh I also only see 2.12.6.

Remove the `Makefile`, make `setup.py` retrieve the `.pb` files from release

It doesn't make sense to have a `Makefile` when we have a `setup.py`. Also, we could remove the `.pb` files that were added to avoid a dependency on `protoc` in the pip package.

  • Move the functionality of the Makefile to the setup.py.
  • A --release switch should be added to setup.py to generate the .pb files.
  • The travis build should install protoc and call --release before the pip install.
  • The normal build called by pip install should check whether the `.pb` files are generated and, if not, retrieve the right release zip from GitHub and extract the files from there before installing.

Failed to install on macOS

I have installed libxml2 from Homebrew,
but pip install bblfsh fails with this error:

In file included from bblfsh/libuast/uast.cc:4:
bblfsh/libuast/uast_private.h:4:10: fatal error: 'libxml/tree.h' file not found
#include <libxml/tree.h>
         ^~~~~~~~~~~~~~~
1 error generated.
error: command 'g++' failed with exit status 1

[feature request] new roles submodule with role_id and role_names functions.

I suggest adding a new roles module with role information and at least two functions:

def role_id(role_name):
    return DESCRIPTOR.enum_types_by_name["Role"].values_by_name[role_name].number

def role_name(role_id):
    return DESCRIPTOR.enum_types_by_name["Role"].values_by_number[role_id].name

Also, it would be good to have constants for the role IDs.
Then there is no need to check the documentation for which roles exist or to remember the names (and you get hints from the IDE :) ).

We have such constants in ast2vec:
https://github.com/src-d/ast2vec/blob/master/ast2vec/bblfsh_roles.py#L18

Can make a PR if you approve.
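For illustration, here is a sketch of what the proposed module could look like. It uses a small stand-in role table: the real names and numbers would come from the generated protobuf DESCRIPTOR, and the ids below are hypothetical.

```python
# Stand-in for DESCRIPTOR.enum_types_by_name["Role"]; ids are hypothetical.
_ROLE_TABLE = {"IDENTIFIER": 1, "QUALIFIED": 2, "LITERAL": 3}
_ROLE_NAMES = {v: k for k, v in _ROLE_TABLE.items()}

def role_id(role_name):
    """Map a role name like "LITERAL" to its numeric id."""
    return _ROLE_TABLE[role_name]

def role_name(role_id):
    """Map a numeric role id back to its name."""
    return _ROLE_NAMES[role_id]

# Module-level constants so the IDE can autocomplete them:
globals().update(_ROLE_TABLE)

print(role_name(role_id("LITERAL")))  # LITERAL
```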

[bug] comments have unexpected brackets

Hi,
I noticed that comments have some unexpected brackets (and symbols: should # be included in the token?):

616|[# noqa]| # noqa
| # noqa

Code to reproduce:

import bblfsh

client = bblfsh.BblfshClient("0.0.0.0:9432")
file_loc = "location/of/file.py"

# read content
with open(file_loc, "r") as f:
    content = f.read()

# extract uast
uast = client.parse(file_loc).uast

# select nodes with tokens and sort them by position
nodes = []
for node in bblfsh.iterator(uast, bblfsh.TreeOrder.PRE_ORDER):
    if node.token:
        nodes.append(node)
nodes = list(sorted(nodes, key=lambda n: n.start_position.offset))

# print token position, token, select source by position information
for n in nodes:
    print(n.start_position.offset, n.token,
              content[n.start_position.offset:n.start_position.offset + len(n.token)],
              content[n.start_position.offset:n.end_position.offset + 1],
              sep="|")

The source code I used follows (it was in a collapsed details block):

import argparse
import os
import tempfile
import unittest

import sourced.ml.tests.models as paths
from sourced.ml.models import Topics
from sourced.ml.cmd import bigartm2asdf


class TopicsTests(unittest.TestCase):
    def setUp(self):
        self.model = Topics().load(source=paths.TOPICS)

    def test_dump(self):
        res = self.model.dump()
        self.assertEqual(res, """320 topics, 1000 tokens
First 10 tokens: ['ulcancel', 'domainlin', 'trudi', 'fncreateinstancedbaselin', 'wbnz', 'lmultiplicand', 'otronumero', 'qxln', 'gvgq', 'polaroidish']
Topics: unlabeled
non-zero elements: 6211  (0.019409)""")  # noqa

    def test_props(self):
        self.assertEqual(len(self.model), 320)
        self.assertEqual(len(self.model.tokens), 1000)
        self.assertIsNone(self.model.topics)
        zt = self.model[0]
        self.assertEqual(len(zt), 8)
        self.assertEqual(zt[0][0], "olcustom")
        self.assertAlmostEqual(zt[0][1], 1.23752e-06, 6)

    def test_label(self):
        with self.assertRaises(ValueError):
            self.model.label_topics([1, 2, 3])
        with self.assertRaises(TypeError):
            self.model.label_topics(list(range(320)))
        self.model.label_topics([str(i) for i in range(320)])
        self.assertEqual(self.model.topics[0], "0")

    def test_save(self):
        with tempfile.NamedTemporaryFile(prefix="sourced.ml-topics-test-") as f:
            self.model.save(f.name)
            new = Topics().load(f.name)
            self.assertEqual(self.model.tokens, new.tokens)
            self.assertEqual((self.model.matrix != new.matrix).getnnz(), 0)

    def test_bigartm2asdf(self):
        with tempfile.NamedTemporaryFile(prefix="sourced.ml-topics-test-") as f:
            args = argparse.Namespace(
                input=os.path.join(os.path.dirname(__file__), paths.TOPICS_SRC),
                output=f.name)
            bigartm2asdf(args)
            model = Topics().load(f.name)
            self.assertEqual(len(model), 320)
            self.assertEqual(len(model.tokens), 1000)


if __name__ == "__main__":
    unittest.main()

As a result, we can notice several tokens without position information:

0|argparse|import a|i
0|os|im|i
0|tempfile|import a|i
0|unittest|import a|i
0|sourced.ml.tests.models|import argparse
import |i
0|paths|impor|i
0|sourced.ml.models|import argparse
i|i
0|Topics|import|i
0|sourced.ml.cmd|import argpars|i
0|bigartm2asdf|import argpa|i
0|source|import|i
0|!=|im|i
0|prefix|import|i
0|input|impor|i
0|output|import|i
0|prefix|import|i
0|==|im|i
184|TopicsTests|TopicsTests|TopicsTests

SIGSEGV on filtering //@roleLiteral

I am using the most recent bblfsh client and executing the following code:

import bblfsh
client = bblfsh.BblfshClient("0.0.0.0:9432")
uast = client.parse("/path/to/src-d/gemini/gemini/python/setup.py").uast
list(bblfsh.filter(uast, "//@roleLiteral"))

I get a SIGSEGV. gdb backtrace:

#0  PyFilter (self=<optimized out>, args=<optimized out>) at bblfsh/pyuast.c:168
#1  0x00000000004e10ef in PyCFunction_Call () at ../Objects/methodobject.c:109
#2  0x00000000005240b4 in call_function (oparg=<optimized out>, pp_stack=0x7fffffffbde0) at ../Python/ceval.c:4705
#3  PyEval_EvalFrameEx () at ../Python/ceval.c:3236
#4  0x000000000052cf19 in _PyEval_EvalCodeWithName () at ../Python/ceval.c:4018
#5  0x000000000052dbcf in PyEval_EvalCodeEx () at ../Python/ceval.c:4039
#6  PyEval_EvalCode (co=<optimized out>, globals=<optimized out>, locals=<optimized out>) at ../Python/ceval.c:777
#7  0x0000000000543075 in builtin_exec_impl.isra.11 (locals=
    {'_i2': 'client = bblfsh.BblfshClient("0.0.0.0:9432")', 'bblfsh': <module at remote 0x7ffff3285d68>, '_sh': <module at remote 0x7ffff475def8>, '_iii': 'client = bblfsh.BblfshClient("0.0.0.0:9432")', '_': <Node at remote 0x7ffff4b973f8>, 'In': ['', 'import bblfsh', 'client = bblfsh.BblfshClient("0.0.0.0:9432")', 'uast = client.parse("setup.py").uast', 'uast', 'list(bblfsh.filter(uast, "//@roleLiteral"))'], '_oh': {4: <...>}, '_i3': 'uast = client.parse("setup.py").uast', '__package__': None, '_ii': 'uast = client.parse("setup.py").uast', 'Out': {...}, 'client': <BblfshClient(_channel=<Channel(_channel=<grpc._cython.cygrpc.Channel at remote 0x7ffff4b744a8>, _connectivity_state=<_ChannelConnectivityState(callbacks_and_connectivities=[[<function at remote 0x7ffff329cb70>, <ChannelConnectivity(_name_='READY', __objclass__=<EnumMeta(__module__='grpc', __new__=<function at remote 0x7ffff5d0d7b8>, _member_map_={'TRANSIENT_FAILURE': <ChannelConnectivity(_name_='TRANSIENT_FAILURE', __objclass__=<...>, _value_=(3, 'tran...(truncated), 
    globals={'_i2': 'client = bblfsh.BblfshClient("0.0.0.0:9432")', 'bblfsh': <module at remote 0x7ffff3285d68>, '_sh': <module at remote 0x7ffff475def8>, '_iii': 'client = bblfsh.BblfshClient("0.0.0.0:9432")', '_': <Node at remote 0x7ffff4b973f8>, 'In': ['', 'import bblfsh', 'client = bblfsh.BblfshClient("0.0.0.0:9432")', 'uast = client.parse("setup.py").uast', 'uast', 'list(bblfsh.filter(uast, "//@roleLiteral"))'], '_oh': {4: <...>}, '_i3': 'uast = client.parse("setup.py").uast', '__package__': None, '_ii': 'uast = client.parse("setup.py").uast', 'Out': {...}, 'client': <BblfshClient(_channel=<Channel(_channel=<grpc._cython.cygrpc.Channel at remote 0x7ffff4b744a8>, _connectivity_state=<_ChannelConnectivityState(callbacks_and_connectivities=[[<function at remote 0x7ffff329cb70>, <ChannelConnectivity(_name_='READY', __objclass__=<EnumMeta(__module__='grpc', __new__=<function at remote 0x7ffff5d0d7b8>, _member_map_={'TRANSIENT_FAILURE': <ChannelConnectivity(_name_='TRANSIENT_FAILURE', __objclass__=<...>, _value_=(3, 'tran...(truncated), source=<code at remote 0x7ffff202e150>) at ../Python/bltinmodule.c:957
#8  builtin_exec () at ../Python/clinic/bltinmodule.c.h:275
#9  0x00000000004e10ef in PyCFunction_Call () at ../Objects/methodobject.c:109
#10 0x00000000005240b4 in call_function (oparg=<optimized out>, pp_stack=0x7fffffffc0b0) at ../Python/ceval.c:4705
#11 PyEval_EvalFrameEx () at ../Python/ceval.c:3236
#12 0x000000000052cf19 in _PyEval_EvalCodeWithName () at ../Python/ceval.c:4018
#13 0x0000000000528b3f in fast_function (nk=<optimized out>, na=<optimized out>, n=<optimized out>, pp_stack=0x7fffffffc2c0, func=<optimized out>)
    at ../Python/ceval.c:4813
#14 call_function (oparg=<optimized out>, pp_stack=0x7fffffffc2c0) at ../Python/ceval.c:4730

Code:

static PyObject *PyFilter(PyObject *self, PyObject *args)
150  {
151    PyObject *obj = NULL;
152    const char *query = NULL;
153    if (!PyArg_ParseTuple(args, "Os", &obj, &query))
154      return NULL;
155
156    Nodes *nodes = UastFilter(ctx, obj, query);
157    if (!nodes) {
158      char *error = LastError();
159      PyErr_SetString(PyExc_RuntimeError, error);
160      free(error);
161      return NULL;
162    }
163    int len = NodesSize(nodes);
164    PyObject *list = PyList_New(len);
165
166    for (int i = 0; i < len; i++) {
167      PyObject *node = (PyObject *)NodeAt(nodes, i);
>168     Py_INCREF(node);
169      PyList_SET_ITEM(list, i, node);
170    }
171    NodesFree(nodes);
172    return PySeqIter_New(list);
173  }
(gdb) p node
$1 = 0x0

So NodeAt() returned NULL and Py_INCREF(node) crashed on the null pointer.

py-bt:

(gdb) py-bt
Traceback (most recent call first):
  <built-in method filter of module object at remote 0x7ffff3285ef8>
  File "<ipython-input-5-f6a6d28734b6>", line 1, in <module>
  <built-in method exec of module object at remote 0x7ffff7fbc5e8>
  File "/usr/local/lib/python3.5/dist-packages/IPython/core/interactiveshell.py", line 2881, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)

[broken doc] link to doc.bblf.sh is broken

Hi,
I noticed that the link to https://doc.bblf.sh/user/language-clients.html no longer works.
It should probably be replaced with https://doc.bblf.sh/using-babelfish/language-clients.html.

[memory leakage?] bblfsh.filter consumes more and more memory

Hi,
I noticed that bblfsh.filter causes high memory consumption.
Here is a script to reproduce it:

from collections import defaultdict, deque

import bblfsh


IDENTIFIER = bblfsh.role_id("IDENTIFIER")
QUALIFIED = bblfsh.role_id("QUALIFIED")
LITERAL = bblfsh.role_id("LITERAL")


def uast2sequence(root):
    sequence = []
    nodes = defaultdict(deque)
    stack = [root]
    nodes[id(root)].extend(root.children)
    while stack:
        if nodes[id(stack[-1])]:
            child = nodes[id(stack[-1])].popleft()
            nodes[id(child)].extend(child.children)
            stack.append(child)
        else:
            sequence.append(stack.pop())
    return sequence


def filter_bblfsh(n_times=1000,
                  py_path="/home/egor/workspace/spark-2.2.0-bin-hadoop2.7/python/pyspark/cloudpickle.py",
                  java_path="/home/egor/workspace/spark-2.2.0-bin-hadoop2.7/examples/src/main/java/org/apache/spark/examples/JavaHdfsLR.java"):
    import bblfsh

    client = bblfsh.BblfshClient("0.0.0.0:9432")
    py_uast = client.parse(py_path).uast
    java_uast = client.parse(java_path).uast
    XPATH = "//*[@roleIdentifier and not(@roleQualified)]"

    for i in range(n_times):
        bblfsh.filter(py_uast, XPATH)
        bblfsh.filter(java_uast, XPATH)


def filter_alternative(n_times=1000,
                  py_path="/home/egor/workspace/spark-2.2.0-bin-hadoop2.7/python/pyspark/cloudpickle.py",
                  java_path="/home/egor/workspace/spark-2.2.0-bin-hadoop2.7/examples/src/main/java/org/apache/spark/examples/JavaHdfsLR.java"):
    import bblfsh

    client = bblfsh.BblfshClient("0.0.0.0:9432")
    py_uast = client.parse(py_path).uast
    java_uast = client.parse(java_path).uast

    for i in range(n_times):
        list(filter(lambda node: IDENTIFIER in node.roles and QUALIFIED not in node.roles,
               uast2sequence(py_uast)))
        list(filter(lambda node: IDENTIFIER in node.roles and QUALIFIED not in node.roles,
               uast2sequence(java_uast)))


if __name__ == "__main__":
    import sys
    if int(sys.argv[1]) == 0:
        print("bblfsh")
        filter_bblfsh(n_times=int(sys.argv[2]))
    else:
        print("bblfsh-free")
        filter_alternative(n_times=int(sys.argv[2]))
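One way to narrow the leak down is tracemalloc: it tracks Python-level allocations only, so if the process RSS keeps growing across iterations while the tracemalloc peak stays flat, the leak is on the C extension side rather than in Python objects. A sketch (the helper name is mine, not part of bblfsh):

```python
import tracemalloc

def python_alloc_peak(fn, n_times=100):
    """Run fn() n_times and return the peak Python-heap usage in bytes.

    Allocations made by C extensions outside the Python allocator are
    invisible here, which is exactly what makes the comparison useful.
    """
    tracemalloc.start()
    for _ in range(n_times):
        fn()
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak

# e.g. python_alloc_peak(lambda: bblfsh.filter(py_uast, XPATH))
```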

and some measurements (surprisingly, the plain Python filtering is ~20 times faster than the bblfsh client's filter):

egor@egor-sourced:~/workspace$ /usr/bin/time -v python3 ml/sourced/ml/utils/misc.py 0 100; /usr/bin/time -v python3 ml/sourced/ml/utils/misc.py 0 200; /usr/bin/time -v python3 ml/sourced/ml/utils/misc.py 0 400
bblfsh
	Command being timed: "python3 ml/sourced/ml/utils/misc.py 0 100"
	User time (seconds): 13.19
	System time (seconds): 0.09
	Percent of CPU this job got: 97%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 0:13.61
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 119948
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 0
	Minor (reclaiming a frame) page faults: 38584
	Voluntary context switches: 606
	Involuntary context switches: 49
	Swaps: 0
	File system inputs: 0
	File system outputs: 0
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 0
bblfsh
	Command being timed: "python3 ml/sourced/ml/utils/misc.py 0 200"
	User time (seconds): 26.68
	System time (seconds): 0.15
	Percent of CPU this job got: 98%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 0:27.19
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 188672
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 0
	Minor (reclaiming a frame) page faults: 63460
	Voluntary context switches: 1146
	Involuntary context switches: 115
	Swaps: 0
	File system inputs: 0
	File system outputs: 0
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 0
bblfsh
	Command being timed: "python3 ml/sourced/ml/utils/misc.py 0 400"
	User time (seconds): 54.72
	System time (seconds): 0.22
	Percent of CPU this job got: 99%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 0:55.22
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 326392
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 0
	Minor (reclaiming a frame) page faults: 113651
	Voluntary context switches: 2382
	Involuntary context switches: 164
	Swaps: 0
	File system inputs: 0
	File system outputs: 0
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 0
egor@egor-sourced:~/workspace$ /usr/bin/time -v python3 ml/sourced/ml/utils/misc.py 1 100; /usr/bin/time -v python3 ml/sourced/ml/utils/misc.py 1 200; /usr/bin/time -v python3 ml/sourced/ml/utils/misc.py 1 400
bblfsh-free
	Command being timed: "python3 ml/sourced/ml/utils/misc.py 1 100"
	User time (seconds): 0.86
	System time (seconds): 0.03
	Percent of CPU this job got: 70%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 0:01.27
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 37548
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 0
	Minor (reclaiming a frame) page faults: 12861
	Voluntary context switches: 103
	Involuntary context switches: 7
	Swaps: 0
	File system inputs: 0
	File system outputs: 0
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 0
bblfsh-free
	Command being timed: "python3 ml/sourced/ml/utils/misc.py 1 200"
	User time (seconds): 1.50
	System time (seconds): 0.01
	Percent of CPU this job got: 80%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 0:01.88
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 37172
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 0
	Minor (reclaiming a frame) page faults: 15766
	Voluntary context switches: 123
	Involuntary context switches: 25
	Swaps: 0
	File system inputs: 0
	File system outputs: 0
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 0
bblfsh-free
	Command being timed: "python3 ml/sourced/ml/utils/misc.py 1 400"
	User time (seconds): 2.69
	System time (seconds): 0.03
	Percent of CPU this job got: 87%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 0:03.11
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 37292
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 0
	Minor (reclaiming a frame) page faults: 21603
	Voluntary context switches: 191
	Involuntary context switches: 38
	Swaps: 0
	File system inputs: 0
	File system outputs: 0
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 0

Import explodes on Python 3.7

import bblfsh.github.com.gogo.protobuf.gogoproto.gogo_pb2

fails with

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.7/site-packages/bblfsh/github/com/gogo/protobuf/gogoproto/gogo_pb2.py", line 670, in <module>
    google_dot_protobuf_dot_descriptor__pb2.EnumOptions.RegisterExtension(goproto_enum_prefix)
  File "/usr/local/lib/python3.7/site-packages/google/protobuf/internal/python_message.py", line 751, in RegisterExtension
    cls.DESCRIPTOR.file.pool.AddExtensionDescriptor(extension_handle)
  File "/usr/local/lib/python3.7/site-packages/google/protobuf/descriptor_pool.py", line 264, in AddExtensionDescriptor
    extension.containing_type.full_name, extension.number))
AssertionError: Extensions "gogoproto.goproto_enum_prefix" and "gogoproto.goproto_enum_prefix" both try to extend message type "google.protobuf.EnumOptions" with field number 62001.

Reproducible exclusively on Python 3.7.0 final (works fine with beta3). How to reproduce:

docker run -it --rm python:3.7.0-stretch bash
pip3 install bblfsh
python3 -c "import bblfsh.github.com.gogo.protobuf.gogoproto.gogo_pb2"

Related to protocolbuffers/protobuf#2533, but not quite the same issue.

It looks like the top-level bblfsh import is not visible to the import system, so the module gets imported twice. The workaround is to comment out that assertion inside protobuf.
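The double-import effect described above can be reproduced without bblfsh or protobuf at all: the same source file, loaded under two different module names, runs its top-level side effects twice, which is exactly what makes protobuf's global extension registry complain. A minimal sketch (the `_ext_registry` name is made up to mimic protobuf's registry):

```python
import builtins
import importlib.util
import os
import tempfile

# Stand-in for protobuf's global extension registry.
builtins._ext_registry = []

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "mod.py")
    with open(path, "w") as f:
        # The module's top-level code "registers an extension" on import.
        f.write("import builtins\nbuiltins._ext_registry.append('ext')\n")
    for name in ("mod_a", "mod_b"):  # same file, two distinct module identities
        spec = importlib.util.spec_from_file_location(name, path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)

print(len(builtins._ext_registry))  # the side effect ran twice
```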

[feature request] easy import of DESCRIPTOR, Node

Since it is hard to get the DESCRIPTOR and Node objects, can we add them to bblfsh directly?
I suggest adding something like this to __main__.py:

import importlib

DESCRIPTOR = importlib.import_module(
        "bblfsh.gopkg.in.bblfsh.sdk.%s.uast.generated_pb2" % bblfsh.sdkversion.VERSION).DESCRIPTOR
Node = importlib.import_module(
        "bblfsh.gopkg.in.bblfsh.sdk.%s.uast.generated_pb2" % bblfsh.sdkversion.VERSION).Node

Then everyone can just do from bblfsh import Node, DESCRIPTOR.

Can make a PR if you approve.
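The re-export pattern proposed above can be illustrated in isolation; this sketch uses json.decoder as a stand-in for the dynamically named generated bblfsh module:

```python
import importlib

# Resolve a module by its dotted name at runtime, then re-export one of its
# attributes at the top level -- the same shape as the proposal above.
_generated = importlib.import_module("json.decoder")
JSONDecoder = _generated.JSONDecoder

# Consumers can now use the re-exported name directly.
print(JSONDecoder().decode('{"a": 1}'))
```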

import bblfsh fails on 2.8.0

I installed the most recent version 2.8 and I get:

In [1]: import bblfsh
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-1-e99ccc08549f> in <module>()
----> 1 import bblfsh

/usr/local/lib/python3.5/dist-packages/bblfsh/__init__.py in <module>()
----> 1 from bblfsh.client import BblfshClient
      2 from bblfsh.pyuast import filter, iterator
      3 from bblfsh.aliases import *
      4 
      5 class TreeOrder:

/usr/local/lib/python3.5/dist-packages/bblfsh/client.py in <module>()
      4 import grpc
      5 
----> 6 from bblfsh.aliases import ParseRequest, NativeParseRequest, VersionRequest, ProtocolServiceStub
      7 from bblfsh.sdkversion import VERSION
      8 

/usr/local/lib/python3.5/dist-packages/bblfsh/aliases.py in <module>()
     24 
     25 ParseResponse = importlib.import_module(
---> 26     "bblfsh.gopkg.in.bblfsh.sdk.%s.protocol.generated_pb2" % VERSION).ParseResponse
     27 
     28 NativeParseResponse = importlib.import_module(

/usr/lib/python3.5/importlib/__init__.py in import_module(name, package)
    124                 break
    125             level += 1
--> 126     return _bootstrap._gcd_import(name[level:], package, level)
    127 
    128 

/usr/local/lib/python3.5/dist-packages/bblfsh/gopkg/in/bblfsh/sdk/v1/protocol/generated_pb2.py in <module>()
    101       message_type=None, enum_type=None, containing_type=None,
    102       is_extension=False, extension_scope=None,
--> 103       options=None, file=DESCRIPTOR),
    104     _descriptor.FieldDescriptor(
    105       name='language', full_name='gopkg.in.bblfsh.sdk.v1.protocol.NativeParseRequest.language', index=1,

TypeError: __new__() got an unexpected keyword argument 'file'

[memory leakage] when use filter

Hi,
Follow-up to #92, which was supposed to be fixed by #94.
I checked again: there is still a memory leak, and quite a large one.
Script to reproduce:

import bblfsh
from bblfsh import filter as filter_uast


def filter_bblfsh(n_times=100, file="/home/egor/workspace/code-annotation/java_code.java"):
    for i in range(n_times):
        extract_functions_from_uast(file=file)


client = bblfsh.BblfshClient("0.0.0.0:9432")
filter_uast = filter_uast
FUNC_XPATH = "//*[@roleFunction and @roleDeclaration]"
FUNC_NAME_XPATH = "/*[@roleFunction and @roleIdentifier and @roleName] " \
                  "| /*/*[@roleFunction and @roleIdentifier and @roleName]"
    
    
def extract_functions_from_uast(file="/home/egor/workspace/code-annotation/java_code.java"):
    uast = client.parse(file).uast
    allfuncs = list(filter_uast(uast, FUNC_XPATH))
    internal = set()
    for func in allfuncs:
        if id(func) in internal:
            continue
        for sub in filter_uast(func, FUNC_XPATH):
            if sub != func:
                internal.add(id(sub))
    for f in allfuncs:
        if id(f) not in internal:
            name = "+".join(n.token for n in filter_uast(f, FUNC_NAME_XPATH))


if __name__ == "__main__":
    import sys
    import resource

    before = resource.getrusage(resource.RUSAGE_SELF)
    file = "/home/egor/workspace/code-annotation/java_code.java"
    if len(sys.argv) == 3:
        file = sys.argv[2]
    filter_bblfsh(n_times=int(sys.argv[1]))
    after = resource.getrusage(resource.RUSAGE_SELF)
    print('Memory increased by: %d%%' % int(100 * ((after[2] / before[2]) - 1)))

and the measurement results:

egor@egor-sourced:~/workspace/ml$ python3 test_filter_libuast.py 100
Memory increased by: 2061%
egor@egor-sourced:~/workspace/ml$ python3 test_filter_libuast.py 200
Memory increased by: 4109%
egor@egor-sourced:~/workspace/ml$ python3 test_filter_libuast.py 400
Memory increased by: 8158%
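One way to narrow down where the growth comes from: tracemalloc only tracks allocations made through Python's allocator, so if RSS keeps growing while tracemalloc's numbers stay flat, the leak is on the native side (libuast), not in Python objects. A minimal sketch of the technique, with plain Python allocations standing in for the filter calls:

```python
import tracemalloc

tracemalloc.start()
snapshot_before = tracemalloc.take_snapshot()

# Python-level allocations: these ARE visible to tracemalloc; a leak inside
# a C extension would not be.
data = [bytes(10_000) for _ in range(100)]

snapshot_after = tracemalloc.take_snapshot()
stats = snapshot_after.compare_to(snapshot_before, "lineno")
grew = sum(s.size_diff for s in stats)
print(grew > 0)
```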

The Java code I used:

/* 
 * ScreenSlicer (TM)
 * Copyright (C) 2013-2015 Machine Publishers, LLC
 * [email protected] | screenslicer.com | machinepublishers.com
 * Cincinnati, Ohio, USA
 *
 * You can redistribute this program and/or modify it under the terms of the GNU Affero General Public
 * License version 3 as published by the Free Software Foundation.
 *
 * ScreenSlicer is made available under the terms of the GNU Affero General Public License version 3
 * with the following clarification and special exception:
 *
 *   Linking ScreenSlicer statically or dynamically with other modules is making a combined work
 *   based on ScreenSlicer. Thus, the terms and conditions of the GNU Affero General Public License
 *   version 3 cover the whole combination.
 *
 *   As a special exception, Machine Publishers, LLC gives you permission to link unmodified versions
 *   of ScreenSlicer with independent modules to produce an executable, regardless of the license
 *   terms of these independent modules, and to copy, distribute, and make available the resulting
 *   executable under terms of your choice, provided that you also meet, for each linked independent
 *   module, the terms and conditions of the license of that module. An independent module is a module
 *   which is not derived from or based on ScreenSlicer. If you modify ScreenSlicer, you may not
 *   extend this exception to your modified version of ScreenSlicer.
 *
 * "ScreenSlicer", "jBrowserDriver", "Machine Publishers", and "automatic, zero-config web scraping"
 * are trademarks of Machine Publishers, LLC.
 * 
 * This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without
 * even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
 * Affero General Public License version 3 for more details.
 * 
 * You should have received a copy of the GNU Affero General Public License version 3 along with this
 * program. If not, see <http://www.gnu.org/licenses/>.
 * 
 * For general details about how to investigate and report license violations, please see:
 * <https://www.gnu.org/licenses/gpl-violation.html> and email the author: [email protected]
 */
package com.screenslicer.core.scrape;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.select.NodeVisitor;

import com.screenslicer.api.datatype.HtmlNode;
import com.screenslicer.core.scrape.neural.NeuralNetManager;
import com.screenslicer.core.scrape.type.ComparableNode;
import com.screenslicer.core.util.NodeUtil;

public class Extract {
  private static final int SCORE_PENALTY = 10000;
  private static final int TWICE = 2;
  private static HashMap<Element, ComparableNode[]> nodesCache = new HashMap<Element, ComparableNode[]>();

  private static class TrainingData {
    private final ComparableNode target;
    private int finalMisses = 0;
    private boolean winner = false;
    private int winnerDistance = 0;
    private ComparableNode best = null;

    public TrainingData(ComparableNode target) {
      this.target = target;
    }
  }

  private static ComparableNode best(ComparableNode[] nodes, Integer[][] comparisonCache, Collection<Node> ignore, TrainingData trainingData) {
    int ignoreSize = ignore == null ? 0 : ignore.size();
    if (nodes.length - ignoreSize == 1) {
      if (ignore == null || ignore.isEmpty()) {
        return nodes[0];
      }
      for (int i = 0; i < nodes.length; i++) {
        if (!ignore.contains(nodes[i])) {
          return nodes[i];
        }
      }
    }
    if (comparisonCache == null) {
      comparisonCache = new Integer[nodes.length][nodes.length];
    }
    int adjustedLen = nodes.length - ignore.size();
    for (int failMax = 0; failMax < adjustedLen; failMax++) {
      Map<ComparableNode, Integer> winners = new HashMap<ComparableNode, Integer>();
      for (int i = 0; i < nodes.length; i++) {
        if (ignore != null && ignore.contains(nodes[i].node())) {
          continue;
        }
        boolean found = true;
        int fail = 0;
        for (int j = 0; j < nodes.length; j++) {
          if (ignore != null && ignore.contains(nodes[j].node())) {
            continue;
          }
          if (nodes[j] != null) {
            if (comparisonCache[i][j] == null) {
              int result = nodes[i].compare(nodes[j]);
              if (result != -1) {
                ++fail;
              }
              comparisonCache[i][j] = new Integer(result);
              comparisonCache[j][i] = new Integer(result * (-1));
            } else if (comparisonCache[i][j].intValue() != -1) {
              ++fail;
            }
            if (fail > failMax) {
              found = false;
              break;
            }
          }
        }
        if (found) {
          if (failMax == 0) {
            if (trainingData != null) {
              trainingData.winner = trainingData.target.equals(nodes[i]);
              trainingData.finalMisses = trainingData.winner ? 0 : 1;
              trainingData.winnerDistance = 0;
              trainingData.best = nodes[i];
            }
            return nodes[i];
          }
          winners.put(nodes[i], i);
        }
      }
      if (winners.size() == 1) {
        ComparableNode ret = winners.keySet().toArray(new ComparableNode[1])[0];
        if (trainingData != null) {
          trainingData.winner = trainingData.target.equals(ret);
          trainingData.finalMisses = trainingData.winner ? 0 : 1;
          trainingData.winnerDistance = failMax;
          trainingData.best = ret;
        }
        return ret;
      }
      if (!winners.isEmpty()) {
        int targetIndex = -1;
        ComparableNode[] winnersArray = winners.keySet().toArray(new ComparableNode[0]);
        if (trainingData != null) {
          trainingData.winnerDistance = failMax;
          if (winners.containsKey(trainingData.target)) {
            for (int i = 0; i < winnersArray.length; i++) {
              if (trainingData.target.equals(winnersArray[i])) {
                targetIndex = i;
                break;
              }
            }
          }
          if (targetIndex == -1) {
            trainingData.finalMisses = winners.size();
            trainingData.winner = false;
          }
        }
        for (int i = 0; i < winnersArray.length; i++) {
          boolean found = true;
          for (int j = 0; j < winnersArray.length; j++) {
            if (i != j) {
              int iCache = winners.get(winnersArray[i]);
              int jCache = winners.get(winnersArray[j]);
              if (comparisonCache[iCache][jCache] == null) {
                int result = winnersArray[i].compare(winnersArray[j]);
                comparisonCache[iCache][jCache] = new Integer(result);
                comparisonCache[jCache][iCache] = new Integer(result * (-1));
              }
              if (comparisonCache[iCache][jCache].intValue() != -1) {
                found = false;
                if (i != targetIndex) {
                  break;
                } else if (trainingData != null) {
                  ++trainingData.finalMisses;
                }
              }
            }
          }
          if (found) {
            if (trainingData != null) {
              trainingData.best = winnersArray[i];
            }
            if (targetIndex == i && trainingData != null) {
              trainingData.finalMisses = 0;
              trainingData.winner = true;
              return trainingData.target;
            } else if (targetIndex == -1 || targetIndex < i) {
              if (trainingData != null) {
                trainingData.winner = false;
              }
              return winnersArray[i];
            }
          } else if (targetIndex == i
              && trainingData != null
              && trainingData.best != null) {
            trainingData.winner = false;
            return trainingData.best;
          }
        }
        return null;
      }
    }
    return null;
  }

  public static ComparableNode[] trainInit(Element body, int page, int thread) {
    ComparableNode[] nodesArray = performInternal(body, page, null, null, null, thread);
    nodesCache.put(body, nodesArray);
    return nodesArray;
  }

  public static int train(Element body, int page, ComparableNode target, int targetIndex, int thread) {
    ComparableNode[] nodesArray = null;
    nodesArray = nodesCache.get(body);
    int score = 0;
    if (NeuralNetManager.instance(thread).isMulti()) {
      int votes = 0;
      final int majority = (NeuralNetManager.instance(thread).multiSize() / TWICE) + 1;
      boolean won = false;
      ComparableNode fallback = null;
      int[] distances = new int[NeuralNetManager.instance(thread).multiSize()];
      int curDistance = 0;
      Map<ComparableNode, Integer> votesMap = new HashMap<ComparableNode, Integer>();
      if (targetIndex < 0) {
        for (int i = 0; i < nodesArray.length; i++) {
          if (nodesArray[i].equals(target)) {
            targetIndex = i;
            break;
          }
        }
      }
      while (NeuralNetManager.instance(thread).hasNext()) {
        Integer[][] comparisonCache = new Integer[nodesArray.length][nodesArray.length];
        int distance = 0;
        for (int i = 0; i < nodesArray.length; i++) {
          if (!target.equals(nodesArray[i])) {
            int result = target.compare(nodesArray[i]);
            if (result != -1) {
              ++distance;
            }
            comparisonCache[targetIndex][i] = new Integer(result);
            comparisonCache[i][targetIndex] = new Integer(result * (-1));
          }
        }
        TrainingData trainingData = new TrainingData(target);
        ComparableNode tmp = best(nodesArray, comparisonCache, null, trainingData);
        if (tmp != null) {
          fallback = tmp;
        }
        if (trainingData.best != null) {
          if (!votesMap.containsKey(trainingData.best)) {
            votesMap.put(trainingData.best, new Integer(1));
          } else {
            votesMap.put(trainingData.best,
                new Integer(votesMap.get(trainingData.best).intValue() + 1));
          }
        }
        distance = (distance - trainingData.winnerDistance) + trainingData.finalMisses;
        NeuralNetManager.instance(thread).next();
        if (trainingData.winner) {
          ++votes;
        }
        if (votes == majority) {
          won = true;
          break;
        }
        distances[curDistance++] = distance;
      }
      NeuralNetManager.instance(thread).resetNext();
      if (!won) {
        int maxVotes = 0;
        ComparableNode maxComparableNode = null;
        for (Map.Entry<ComparableNode, Integer> entry : votesMap.entrySet()) {
          if (entry.getValue().intValue() == maxVotes) {
            maxComparableNode = null;
          } else if (entry.getValue().intValue() > maxVotes) {
            maxVotes = entry.getValue().intValue();
            maxComparableNode = entry.getKey();
          }
        }
        if (maxComparableNode == null) {
          maxComparableNode = fallback;
        }
        if (!target.equals(maxComparableNode)) {
          int totalDistance = 0;
          Arrays.sort(distances);
          for (int i = 0; i < majority; i++) {
            totalDistance += distances[i];
          }
          score += totalDistance + SCORE_PENALTY;
        }
      }
    } else {
      int distance = 0;
      Integer[][] comparisonCache = new Integer[nodesArray.length][nodesArray.length];
      if (targetIndex < 0) {
        for (int i = 0; i < nodesArray.length; i++) {
          if (nodesArray[i].equals(target)) {
            targetIndex = i;
            break;
          }
        }
      }
      for (int i = 0; i < nodesArray.length; i++) {
        if (!target.equals(nodesArray[i])) {
          int result = target.compare(nodesArray[i]);
          if (result != -1) {
            ++distance;
          }
          comparisonCache[targetIndex][i] = new Integer(result);
          comparisonCache[i][targetIndex] = new Integer(result * (-1));
        }
      }
      TrainingData trainingData = new TrainingData(target);
      best(nodesArray, comparisonCache, null, trainingData);
      score += (distance - trainingData.winnerDistance) + trainingData.finalMisses;
      score += trainingData.winner ? 0 : SCORE_PENALTY;
    }
    return score;
  }

  private static ComparableNode[] performInternal(final Element body, final int page,
      final HtmlNode matchResult, final HtmlNode matchParent, final Collection<Node> ignore, int thread) {
    final Map<Node, ComparableNode> nodes = new HashMap<Node, ComparableNode>();
    if (body != null) {
      body.traverse(new NodeVisitor() {
        @Override
        public void head(Node node, int depth) {
          int nonEmptyChildren = 0;
          for (Node child : node.childNodes()) {
            if (!NodeUtil.isEmpty(child)) {
              nonEmptyChildren++;
            }
          }
          if (!NodeUtil.isEmpty(node)
              && NodeUtil.isContent(node, matchResult, matchParent) && nonEmptyChildren > 0) {
            nodes.put(node, new ComparableNode(node, matchResult, matchParent, thread));
          }
        }

        @Override
        public void tail(Node node, int depth) {}
      });
    }
    return nodes.values().toArray(new ComparableNode[0]);
  }

  public static class Cache {
    public ComparableNode[] nodesCache = null;
    public Integer[][][] comparisonCache = null;
  }

  public static List<Node> perform(Element body, int page, Collection<Node> ignore,
      HtmlNode matchResult, HtmlNode matchParent, Cache cache, int thread) {
    Map<ComparableNode, Integer> votes = new LinkedHashMap<ComparableNode, Integer>();
    if (cache == null) {
      cache = new Cache();
    }
    if (cache.nodesCache == null) {
      cache.nodesCache = performInternal(body, page, matchResult, matchParent, ignore, thread);
      cache.comparisonCache = new Integer[NeuralNetManager.instance(thread).multiSize()]
          [cache.nodesCache.length][cache.nodesCache.length];
    }
    final int majority = (NeuralNetManager.instance(thread).multiSize() / TWICE) + 1;
    Node best = null;
    int cur = 0;
    NeuralNetManager.instance(thread).resetNext();
    while (NeuralNetManager.instance(thread).hasNext()) {
      ComparableNode winner = best(cache.nodesCache, cache.comparisonCache[cur++],
          new HashSet<Node>(ignore), null);
      NeuralNetManager.instance(thread).next();
      if (winner != null) {
        if (!votes.containsKey(winner)) {
          votes.put(winner, new Integer(1));
        } else {
          votes.put(winner, new Integer(votes.get(winner).intValue() + 1));
        }
        if (votes.get(winner).intValue() == majority) {
          best = winner.node();
          break;
        }
      }
    }
    if (best == null) {
      int bestVotes = 0;
      List<Node> bestNodes = new ArrayList<Node>();
      for (Map.Entry<ComparableNode, Integer> entry : votes.entrySet()) {
        int val = entry.getValue().intValue();
        if (val >= bestVotes) {
          if (val > bestVotes) {
            bestVotes = val;
            bestNodes.clear();
          }
          bestNodes.add(entry.getKey().node());
        }
      }
      return bestNodes;
    }
    return Arrays.asList(new Node[] { best });
  }
}

`language` argument in `client.parse` changes the number of files significantly

Hi,
I found strange behavior of the language argument in client.parse.
When you parse files with and without this argument and then select the files of a specific language, you get a different number of files.
Code to reproduce:

import argparse
import glob
import os

import bblfsh
from bblfsh.client import NonUTF8ContentException


def prepare_files(folder, client, language, use_lang=True):
    files = []

    # collect filenames with full path
    filenames = glob.glob(folder, recursive=True)

    for file in filenames:
        if not os.path.isfile(file):
            continue
        try:
            # TODO (Egor): figure out why `language` argument changes number of files significantly
            if use_lang:
                res = client.parse(file, language)
            else:
                res = client.parse(file)
        except NonUTF8ContentException:
            # skip files that can't be parsed because of UTF-8 decoding errors.
            continue
        if res.status == 0 and res.language.lower() == language.lower():
            files.append("")
    return files


def test_client(args):
    client = bblfsh.BblfshClient(args.bblfsh)
    files = prepare_files(args.input, client, args.language, args.use_lang)
    print("Number of files: %s" % (len(files)))


def create_parser():
    parser = argparse.ArgumentParser()
    parser.add_argument("-i", "--input", required=True, type=str,
                        help="Path to folder with source code - "
                             "should be in a format compatible with glob (ends with**/* "
                             "and surrounded by quotes. Ex: `path/**/*`).")
    parser.add_argument("--bblfsh", default="0.0.0.0:9432",
                        help="Babelfish server's address.")
    # I'm using javascript for experiments
    parser.add_argument("-l", "--language", default="javascript",
                        help="Programming language to use.")
    parser.add_argument("-u", "--use-lang", action="store_true",
                        help="If lang in client.parse should be used.")
    return parser


def main():
    parser = create_parser()
    args = parser.parse_args()
    client = bblfsh.BblfshClient(args.bblfsh)
    files = prepare_files(args.input, client, args.language, args.use_lang)
    print("Number of files %s with args %s" % (len(files), args))


if __name__ == "__main__":
    main()

and results:

egor@egor-sourced:~/workspace/style-analyzer$ python3 lookout/style/format/test_client.py -i '/home/egor/workspace/tmp/freeCodeCamp/**/*'  
Number of files 187 with args Namespace(bblfsh='0.0.0.0:9432', input='/home/egor/workspace/tmp/freeCodeCamp/**/*', language='javascript', use_lang=False)
egor@egor-sourced:~/workspace/style-analyzer$ python3 lookout/style/format/test_client.py -i '/home/egor/workspace/tmp/freeCodeCamp/**/*'  -u
Number of files 258 with args Namespace(bblfsh='0.0.0.0:9432', input='/home/egor/workspace/tmp/freeCodeCamp/**/*', language='javascript', use_lang=True)

New grpcio causes conflict

In requirements.txt:

grpcio>=1.3,<=2.0
grpcio-tools>=1.3,<2.0

In setup.py:

 install_requires=["grpcio==1.10.0", "grpcio-tools==1.10.0", "docker", "protobuf>=3.4.0"],

Now that version 1.11.0 of grpcio is out, this causes conflicts during the coverage check for ml, among other things.

Could you either check that 1.11 is supported and update setup.py, or modify the requirements to ensure we use version 1.10?
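To keep this from regressing, a tiny consistency check could compare the pins from the two sources; the dictionaries below are copied from this report (and are intentionally inconsistent, so the check flags both packages):

```python
# Hypothetical CI check: pins as declared in setup.py vs. requirements.txt.
setup_pins = {"grpcio": "==1.10.0", "grpcio-tools": "==1.10.0"}
requirements_pins = {"grpcio": ">=1.3,<=2.0", "grpcio-tools": ">=1.3,<2.0"}

# Any package whose specifier differs between the two files is a conflict.
conflicts = sorted(name for name in setup_pins
                   if requirements_pins.get(name) != setup_pins[name])
print(conflicts)
```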

[feature request] add position information

Hi,
I found that the python-driver lacks position information for several types of tokens.

import bblfsh

client = bblfsh.BblfshClient("0.0.0.0:9432")
file_loc = "location/of/file.py"

# read content
with open(file_loc, "r") as f:
    content = f.read()

# extract uast
uast = client.parse(file_loc).uast

# select nodes with tokens and sort them by position
nodes = []
for node in bblfsh.iterator(uast, bblfsh.TreeOrder.PRE_ORDER):
    if node.token:
        nodes.append(node)
nodes = list(sorted(nodes, key=lambda n: n.start_position.offset))

# print token position, token, select source by position information
for n in nodes:
    print(n.start_position.offset, n.token,
              content[n.start_position.offset:n.start_position.offset + len(n.token)],
              content[n.start_position.offset:n.end_position.offset + 1],
              sep="|")

The source code I used is in details

import argparse
import os
import tempfile
import unittest

import sourced.ml.tests.models as paths
from sourced.ml.models import Topics
from sourced.ml.cmd import bigartm2asdf


class TopicsTests(unittest.TestCase):
    def setUp(self):
        self.model = Topics().load(source=paths.TOPICS)

    def test_dump(self):
        res = self.model.dump()
        self.assertEqual(res, """320 topics, 1000 tokens
First 10 tokens: ['ulcancel', 'domainlin', 'trudi', 'fncreateinstancedbaselin', 'wbnz', 'lmultiplicand', 'otronumero', 'qxln', 'gvgq', 'polaroidish']
Topics: unlabeled
non-zero elements: 6211  (0.019409)""")  # noqa

    def test_props(self):
        self.assertEqual(len(self.model), 320)
        self.assertEqual(len(self.model.tokens), 1000)
        self.assertIsNone(self.model.topics)
        zt = self.model[0]
        self.assertEqual(len(zt), 8)
        self.assertEqual(zt[0][0], "olcustom")
        self.assertAlmostEqual(zt[0][1], 1.23752e-06, 6)

    def test_label(self):
        with self.assertRaises(ValueError):
            self.model.label_topics([1, 2, 3])
        with self.assertRaises(TypeError):
            self.model.label_topics(list(range(320)))
        self.model.label_topics([str(i) for i in range(320)])
        self.assertEqual(self.model.topics[0], "0")

    def test_save(self):
        with tempfile.NamedTemporaryFile(prefix="sourced.ml-topics-test-") as f:
            self.model.save(f.name)
            new = Topics().load(f.name)
            self.assertEqual(self.model.tokens, new.tokens)
            self.assertEqual((self.model.matrix != new.matrix).getnnz(), 0)

    def test_bigartm2asdf(self):
        with tempfile.NamedTemporaryFile(prefix="sourced.ml-topics-test-") as f:
            args = argparse.Namespace(
                input=os.path.join(os.path.dirname(__file__), paths.TOPICS_SRC),
                output=f.name)
            bigartm2asdf(args)
            model = Topics().load(f.name)
            self.assertEqual(len(model), 320)
            self.assertEqual(len(model.tokens), 1000)


if __name__ == "__main__":
    unittest.main()

As a result, we notice several tokens without position information:

0|argparse|import a|i
0|os|im|i
0|tempfile|import a|i
0|unittest|import a|i
0|sourced.ml.tests.models|import argparse
import |i
0|paths|impor|i
0|sourced.ml.models|import argparse
i|i
0|Topics|import|i
0|sourced.ml.cmd|import argpars|i
0|bigartm2asdf|import argpa|i
0|source|import|i
0|!=|im|i
0|prefix|import|i
0|input|impor|i
0|output|import|i
0|prefix|import|i
0|==|im|i
184|TopicsTests|TopicsTests|TopicsTests

Some of them are imports, like:

0|argparse|import a|i
0|os|im|i

some are operators:

0|==|im|i
0|!=|im|i

and some are arguments:

0|source|import|i
0|prefix|import|i
0|input|impor|i
0|output|import|i
0|prefix|import|i
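Until the driver reports real positions for these nodes, one possible workaround (a sketch, not part of the client API; the `Node`/`Position` stubs below stand in for the protobuf objects) is to keep only nodes whose token actually appears at the reported offset:

```python
from collections import namedtuple

# Minimal stand-ins for the relevant UAST node fields; the real client
# returns protobuf Node objects with the same attribute names.
Position = namedtuple("Position", "offset")
Node = namedtuple("Node", "token start_position")

def trusted_nodes(nodes, content):
    """Keep only nodes whose token is really found at the reported offset."""
    result = []
    for n in nodes:
        off = n.start_position.offset
        if content[off:off + len(n.token)] == n.token:
            result.append(n)
    return result

content = "import argparse\n\nTopicsTests = 1\n"
nodes = [
    Node("argparse", Position(0)),      # bogus: offset 0, token is elsewhere
    Node("import", Position(0)),        # good: "import" really starts at 0
    Node("TopicsTests", Position(17)),  # good: matches the source slice
]
print([n.token for n in trusted_nodes(nodes, content)])  # ['import', 'TopicsTests']
```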

ERROR:root:Exception deserializing message!

How to reproduce:

wget https://raw.githubusercontent.com/lumoslabs/aleph/master/vendor/assets/javascripts/ace/mode-pgsql.js

in python:

import bblfsh

client = bblfsh.BblfshClient("0.0.0.0:9432")
client.parse("mode-pgsql.js")

And you will get:

ERROR:root:Exception deserializing message!
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/grpc/_common.py", line 87, in _transform
    return transformer(message)
google.protobuf.message.DecodeError: Error parsing message
Traceback (most recent call last):
  File "/home/k/.local/lib/python3.5/site-packages/IPython/core/interactiveshell.py", line 2910, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-33-6f0fa647fcd6>", line 1, in <module>
    resp = bf.parse("/home/k/sourced/workdir/mode-pgsql.js")
  File "/usr/local/lib/python3.5/dist-packages/bblfsh/client.py", line 74, in parse
    return self._stub.Parse(request, timeout=timeout)
  File "/usr/local/lib/python3.5/dist-packages/grpc/_channel.py", line 487, in call
    return _end_unary_response_blocking(state, call, False, deadline)
  File "/usr/local/lib/python3.5/dist-packages/grpc/_channel.py", line 437, in _end_unary_response_blocking
    raise _Rendezvous(state, None, None, deadline)
grpc._channel._Rendezvous: <_Rendezvous of RPC that terminated with (StatusCode.INTERNAL, Exception deserializing response!)>

versions:

  • python-client: 2.9.6
  • bblfsh container:
REPOSITORY                       TAG                 IMAGE ID                                                                  CREATED             SIZE
bblfsh/bblfshd                   latest              sha256:fb5eb6936c67c20f451daddc09a4648cc26da978041af36f9f70b246f71d804d   13 days ago         173MB
  • javascript driver:
| javascript | docker://bblfsh/javascript-driver:latest | dev-adcd1b4 | beta      | 11 days  | alpine | 1.9 | 8.9.3-r0    |

I am not sure whether it is related to client-python or to the JavaScript driver...

bblfsh client-python fails to install on macOS since the grpcio 1.10 release

the release was today: https://github.com/grpc/grpc/releases/tag/v1.10.0
issue: grpc/grpc#14573

run:

mkdir test
cd test
virtualenv -p python3 .venv-py3
source .venv-py3/bin/activate
pip install bblfsh

you will get an error from grpcio installation:

Command "/Users/k/test/.venv-py3/bin/python3.6 -u -c "import setuptools, tokenize;__file__='/private/var/folders/v5/3pvllllj4l1dvx7s_2qkq0_40000gn/T/pip-build-y50usx77/grpcio/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /var/folders/v5/3pvllllj4l1dvx7s_2qkq0_40000gn/T/pip-5owjmdjl-record/install-record.txt --single-version-externally-managed --compile --install-headers /Users/k/test/.venv-py3/bin/../include/site/python3.6/grpcio" failed with error code 1 in /private/var/folders/v5/3pvllllj4l1dvx7s_2qkq0_40000gn/T/pip-build-y50usx77/grpcio/

full log file:
grpcio.logs.zip

I suggest pinning the grpcio package version. I am also fine with any other solution; let's discuss.
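For example, a pin in requirements.txt could look like this (the exact bounds are illustrative, to be adjusted once a fixed grpcio release is out):

```
# requirements.txt: keep grpcio below the broken 1.10 release
grpcio>=1.9,<1.10
bblfsh
```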

Numeric role with a specific snippet

With newlines included:

class Repo2nBOW(Repo2Base):
    @property
    def id2vec(self):
        return self._id2vec











This produces two instances of role 141, but only with client-python; it works in the integration tests and with other ways of producing the UAST, so the bug is probably in this project.
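To make such discrepancies easier to spot, a small role counter can be run over the returned UAST. It is sketched here against stub nodes, but the same walk works on the protobuf Node objects the client returns:

```python
from collections import Counter, namedtuple

# Stub with the two attributes the walk needs; real UAST nodes have them too.
Node = namedtuple("Node", "roles children")

def count_roles(root):
    """Count every role id in the tree rooted at `root` (iterative walk)."""
    counts = Counter()
    stack = [root]
    while stack:
        node = stack.pop()
        counts.update(node.roles)
        stack.extend(node.children)
    return counts

# Stub tree: role 141 appears twice, as in the reported snippet.
tree = Node([1], [Node([141], []), Node([141, 2], [])])
print(count_roles(tree)[141])  # 2
```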

Makefile: Download versioned protofiles

Currently the sdk/generated.proto and uast/generated.proto files are included in the repository. To correctly support SDK versioning, the build should download them from the gopkg.in redirect URL.

Text improvement on clients documentation

In the clients guide:

The client API's differ to adapt to their language specific idioms, the following [codes] [shows] several simple examples with the Go, Python and Scala clients that [parsers] a file and [applies] a filter to return all the simple identifiers.

  • I don't know if using the plural codes is correct in English. Maybe it could be replaced with code snippets, for example.
  • shows should also be show.
  • parsers -> parse
  • applies -> apply

As I'm not a native English speaker, please double-check anything non-trivial I mention, in case I am wrong.

Issues with `version()` and `supported_languages()`

I run the following code and get:

import bblfsh
client = bblfsh.BblfshClient("0.0.0.0:9432")

print(client.version())
elapsed {
}
version: "v2.11.0"
build {
  seconds: -62135596800
}

print(client.supported_languages())
[name: "JavaScript"
language: "javascript"
version: "dev-adcd1b4"
status: "beta"
features: "ast"
features: "uast"
features: "roles"
]

The version response has a negative number of seconds in build and an empty elapsed.

The JavaScript driver version is reported as "dev-adcd1b4", which does not help to verify that it is 1.2.0 (which we strictly require, since newer versions do not work for us).

Queries are slow (v1)

Hi,
I measured the time for the same query that I used in #100; I find it very suspicious that it is so slow.
Measurements:

egor@egor-sourced:~/workspace/ml$ time python3 test_filter_libuast.py 100 sourced/ml/__main__.py 
Memory increased by: 2099%

real	0m32.773s
user	0m25.209s
sys	0m0.425s
egor@egor-sourced:~/workspace/ml$ time python3 test_filter_libuast.py 200 sourced/ml/__main__.py 
Memory increased by: 4130%

real	1m22.348s
user	1m5.700s
sys	0m0.776s
egor@egor-sourced:~/workspace/ml$ time python3 test_filter_libuast.py 400 sourced/ml/__main__.py 
Memory increased by: 8275%

real	2m49.173s
user	2m19.193s
sys	0m1.528s
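A quick look at the per-iteration cost (numbers copied from the real runs above) shows it is not even constant, and the reported memory growth roughly doubles with the iteration count, which hints at a leak:

```python
# "real" wall-clock times in seconds, keyed by iteration count,
# copied from the three runs above
runs = {100: 32.773, 200: 82.348, 400: 169.173}
for n, t in sorted(runs.items()):
    # seconds per iteration keeps growing: 0.328, 0.412, 0.423
    print(n, round(t / n, 3))
```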

Convert a UAST into Python AST

Hi. I am new to bblfsh and was wondering if I could convert a UAST into Python's AST structure defined by the ast module. Thanks!

bblfsh.filter(uast, "//*[@roleAssignment]") does not work.

If you extract the UAST for a=None and run this command:

bblfsh.filter(uast, "//*[@roleAssignment]") 

you will get an empty list.
The same happens for the "//*[@roleNull]" query.

P.S. UAST structure:

#  Token    Internal Role  Roles Tree                          
                                                               
   ||       Module         FILE                                
1  ||       Assign         ┣ BINARY, ASSIGNMENT, EXPRESSION    
1  |a|      Name           ┃ ┣ LEFT, IDENTIFIER, EXPRESSION    
1  |<nil>|  NoneLiteral    ┗ ┗ LITERAL, NULL, EXPRESSION, RIGHT
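As a workaround until the role predicates work, roles can be matched manually while walking the tree. The sketch below uses stub nodes and an illustrative role id; with the real client, the integer id would come from something like bblfsh.role_id, and the nodes from the parse response:

```python
from collections import namedtuple

# Stub with the attributes the walk needs; real UAST nodes carry the same ones.
Node = namedtuple("Node", "internal_type roles children")

def filter_by_role(root, role_id):
    """Yield every node in the tree that carries the given role id."""
    stack = [root]
    while stack:
        node = stack.pop()
        if role_id in node.roles:
            yield node
        stack.extend(node.children)

ASSIGNMENT = 18  # illustrative id; resolve the real one via bblfsh.role_id
tree = Node("Module", [0], [
    Node("Assign", [ASSIGNMENT], [
        Node("Name", [1], []),
        Node("NoneLiteral", [2], []),
    ]),
])
print([n.internal_type for n in filter_by_role(tree, ASSIGNMENT)])  # ['Assign']
```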

Python driver causes error during deserialization

This might be linked to issue 89. When I looked for this kind of error on Slack, I found a thread from a couple of days ago where Egor had this issue, and @smola referenced it. However, after looking at it, I do not think it is the same, since it seems to concern JavaScript files.

Context

I was trying to run apollo bags on 4 GB of files and specified the following languages: Bash, Java, Python, Ruby. During deserialization I got this error log:

org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/opt/spark/python/lib/pyspark.zip/pyspark/worker.py", line 177, in main
    process()
  File "/opt/spark/python/lib/pyspark.zip/pyspark/worker.py", line 172, in process
    serializer.dump_stream(func(split_index, iterator), outfile)
  File "/opt/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 268, in dump_stream
    vs = list(itertools.islice(iterator, batch))
  File "/usr/local/lib/python3.4/dist-packages/sourced/ml/transformers/basic.py", line 186, in deserialize_uast
  File "/usr/local/lib/python3.4/dist-packages/sourced/ml/transformers/basic.py", line 186, in <listcomp>
google.protobuf.message.DecodeError: Error parsing message

	at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRDD.scala:193)
	at org.apache.spark.api.python.PythonRunner$$anon$1.next(PythonRDD.scala:156)
	at org.apache.spark.api.python.PythonRunner$$anon$1.next(PythonRDD.scala:152)
	at org.apache.spark.InterruptibleIterator.next(InterruptibleIterator.scala:40)
	at org.apache.spark.storage.memory.MemoryStore.putIteratorAsBytes(MemoryStore.scala:372)
	at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1055)
	at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1029)
	at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:969)
	at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1029)
	at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:760)
	at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:334)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:285)
	at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:63)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
	at org.apache.spark.api.python.PairwiseRDD.compute(PythonRDD.scala:395)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
	at org.apache.spark.scheduler.Task.run(Task.scala:108)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)

The error popped up at this line; as you can see, a fix has already been implemented to circumvent this for now, but still.

Since I had not seen this error before, I tried specifying only one language at a time to see if it was a driver problem. After some trials, I was surprised to see it was the Python driver causing this (surprised because the error did not show up when testing on another batch of siva files where I know there were Python files). Looking in the executor logs, I found these kinds of error entries:

18/04/04 14:11:56 WARN Bblfsh: FATAL os/src/shell/micropython/tests/pyboard.py: Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/python_driver/requestprocessor.py", line 139, in process_request
    '\n------ Python3 errors:\n%s' % codeinfo['py3_ast_errors']
Exception: Errors produced trying to get an AST for both Python versions
------ Python2 errors:
[b'Traceback (most recent call last):\n  File "<string>", line 1, in <module>\n  File "/usr/lib/python2.7/site-packages/pydetector/ast2dict.py", line 19, in ast2dict\n    return visitor.parse()\n  File "/usr/lib/python2.7/site-packages/pydetector/ast2dict.py", line 45, in parse\n    tree = ast.parse(self.codestr, mode=\'exec\')\n  File "/usr/lib/python2.7/ast.py", line 37, in parse\n    return compile(source, filename, mode, PyCF_ONLY_AST)\n  File "<unknown>", line 1\n    ../tools/pyboard.py\n    ^\nSyntaxError: invalid syntax\n']
------ Python3 errors:
['Traceback (most recent call last):\n  File "/usr/lib/python3.6/site-packages/pydetector/ast_checks.py", line 53, in check_ast\n    current_ast = ast2dict(code)\n  File "/usr/lib/python3.6/site-packages/pydetector/ast2dict.py", line 19, in ast2dict\n    return visitor.parse()\n  File "/usr/lib/python3.6/site-packages/pydetector/ast2dict.py", line 45, in parse\n    tree = ast.parse(self.codestr, mode=\'exec\')\n  File "/usr/lib/python3.6/ast.py", line 35, in parse\n    return compile(source, filename, mode, PyCF_ONLY_AST)\n  File "<unknown>", line 1\n    ../tools/pyboard.py\n    ^\nSyntaxError: invalid syntax\n']

So yeah, I don't have much more info than this. It is not insanely urgent, since we have a workaround at the moment, but it is still pretty annoying.

Function `supported_languages` raises unimplemented exception

Hi,
I tried to call https://github.com/bblfsh/client-python/blob/master/bblfsh/client.py#L103:

client.supported_languages()

and it gives me

_Rendezvous                               Traceback (most recent call last)
<ipython-input-11-50b543ce587c> in <module>()
----> 1 client.supported_languages()

/usr/local/lib/python3.5/dist-packages/bblfsh/client.py in supported_languages(self)
    102 
    103     def supported_languages(self):
--> 104         sup_response = self._stub.SupportedLanguages(SupportedLanguagesRequest())
    105         return sup_response.languages
    106 

/usr/local/lib/python3.5/dist-packages/grpc/_channel.py in __call__(self, request, timeout, metadata, credentials)
    512     def __call__(self, request, timeout=None, metadata=None, credentials=None):
    513         state, call, = self._blocking(request, timeout, metadata, credentials)
--> 514         return _end_unary_response_blocking(state, call, False, None)
    515 
    516     def with_call(self, request, timeout=None, metadata=None, credentials=None):

/usr/local/lib/python3.5/dist-packages/grpc/_channel.py in _end_unary_response_blocking(state, call, with_call, deadline)
    446             return state.response
    447     else:
--> 448         raise _Rendezvous(state, None, None, deadline)
    449 
    450 

_Rendezvous: <_Rendezvous of RPC that terminated with:
	status = StatusCode.UNIMPLEMENTED
	details = "unknown method SupportedLanguages"
	debug_error_string = "{"created":"@1536337572.354363571","description":"Error received from peer","file":"src/core/lib/surface/call.cc","file_line":1083,"grpc_message":"unknown method SupportedLanguages","grpc_status":12}"
>

I updated the bblfsh client version to bblfsh-2.12.1, but in Python it still shows this version:

elapsed {
}
version: "v2.4.1"
build {
  seconds: -62135596800
}
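Note that client.version() reports the bblfshd server version, not the pip package version; the locally installed client package can be checked separately (a sketch using the standard library metadata API):

```python
from importlib import metadata  # Python 3.8+

def installed_version(dist_name):
    """Return the installed version of a pip distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None

# client.version() asks the server; the local package is a separate thing:
print(installed_version("bblfsh"))
```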
