saltudelft / type4py

Type4Py: Deep Similarity Learning-Based Type Inference for Python

License: Apache License 2.0

Python 98.16% Shell 1.10% CSS 0.07% HTML 0.19% Dockerfile 0.47%
typeinference deeplearning python type4py similarity-learning machinelearning ml4se

type4py's Issues

Error in variable initialisation

When using the preprocess command with only the -o argument, the code crashes with the following error:

UnboundLocalError: local variable 'train_files_vars' referenced before assignment

This is because in the following extract

if all(processed_proj_fns['set'].isin(['train', 'valid', 'test'])) and \
        all(processed_proj_vars['set'].isin(['train', 'valid', 'test'])):
    logger.info("Found the sets split in the input dataset")
    train_files = processed_proj_fns['file'][processed_proj_fns['set'] == 'train']
    valid_files = processed_proj_fns['file'][processed_proj_fns['set'] == 'valid']
    test_files = processed_proj_fns['file'][processed_proj_fns['set'] == 'test']
    train_files_vars = processed_proj_vars['file'][processed_proj_vars['set'] == 'train']
    valid_files_vars = processed_proj_vars['file'][processed_proj_vars['set'] == 'valid']
    test_files_vars = processed_proj_vars['file'][processed_proj_vars['set'] == 'test']
else:
    logger.info("Splitting sets randomly")
    train_files, test_files = train_test_split(pd.DataFrame(processed_proj_fns['file'].unique(), columns=['file']),
                                               test_size=0.2)
    train_files, valid_files = train_test_split(pd.DataFrame(processed_proj_fns[processed_proj_fns['file'].isin(train_files.to_numpy().flatten())]['file'].unique(),
                                                columns=['file']), test_size=0.1)

the train_files_vars variable is only initialised in the if branch, so it is still unbound when the else branch runs instead.
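
One possible way to avoid the crash is to bind the variable splits on both paths. The sketch below is a simplified stand-in that uses plain lists of dicts instead of the pandas DataFrames in the extract. The helper names split_files and assign_sets are hypothetical, and reusing the file split for variables in the random branch is an assumption, not necessarily how the project resolves it.

```python
import random


def split_files(files, test_size=0.2, valid_size=0.1, seed=0):
    """Randomly split unique file names into (train, valid, test) lists."""
    files = sorted(set(files))
    random.Random(seed).shuffle(files)
    n_test = int(len(files) * test_size)
    test, rest = files[:n_test], files[n_test:]
    n_valid = int(len(rest) * valid_size)
    valid, train = rest[:n_valid], rest[n_valid:]
    return train, valid, test


def assign_sets(fn_rows, var_rows):
    """Return (function splits, variable splits); both are bound on every path."""
    known = {"train", "valid", "test"}

    def pick(rows, name):
        return [r["file"] for r in rows if r.get("set") == name]

    if all(r.get("set") in known for r in fn_rows) and \
            all(r.get("set") in known for r in var_rows):
        # The input already carries a split: read it for both tables.
        fns = tuple(pick(fn_rows, s) for s in ("train", "valid", "test"))
        vars_ = tuple(pick(var_rows, s) for s in ("train", "valid", "test"))
    else:
        # Random split over files seen in either table.
        # Assumption: variables simply reuse the same file split.
        all_files = [r["file"] for r in fn_rows] + [r["file"] for r in var_rows]
        fns = split_files(all_files)
        vars_ = fns
    return fns, vars_
```

The point is only that every name read after the if/else is assigned in both branches; the split ratios mirror the 0.2 and 0.1 values in the extract.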

Crash when trying to infer single file with freshly trained model using ManyTypes4Py

Hello, thank you for creating and providing this great project! I plan to use this project for my bachelor thesis. Therefore, I am mainly interested in the inference functionality provided with infer.py on branch server (branch infer seems to be outdated).
I am aware of the VS Code extension and the public JSON API. I, however, prefer to use this project locally.

Since infer.py takes a pre-trained model as a program argument, I followed all the steps in the README to train such a model.
Unfortunately, the script crashes with the following message (excerpt):

onnxruntime.capi.onnxruntime_pybind11_state.InvalidArgument: [ONNXRuntimeError] : 2 : INVALID_ARGUMENT : Got invalid dimensions for input: tok for the following indices
 index: 0 Got: 7 Expected: 1
 Please fix either the inputs or the model.

Below you can find a link to a Google Colab notebook with all the steps from start (downloading the ManyTypes4Py dataset, pip-installing type4py, preprocessing) to finish (training a model, trying to infer the types of a single file) and the corresponding output from when I ran it the last time (including the full error backtrace on the bottom):

https://colab.research.google.com/drive/1kRIffMlgGCeW55wXelksGrXfSd0WjhKQ?usp=sharing

It should be relatively self-explanatory. Note that I use a fork of this project rather than the project itself, but the differences are minor: in learn.py, I only uncommented the .to(DEVICE) calls again (c42144d), as the notebook otherwise crashes because vectors end up on different devices. The remaining changes don't affect Python files and are not relevant to this issue.
Further, I am using venv, although I doubt this has any negative influence on the execution of this project.


My question is, how can I successfully use infer.py? How can I obtain a proper compatible model for it?
Are any of those steps in the linked notebook incorrect?

JSON output file not JSON conformant

The JSON output file is not JSON conformant in two aspects:

  1. Single quotes (') are used instead of double quotes (")
  2. Some words such as None, True or False are not wrapped in any quotes at all

This may affect some simpler JSON parsers; more lenient parsers can handle these minor errors just fine.

'error': None
#should be
"error": "None"

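Both symptoms match what Python produces when a dict is written out with str() or print() instead of being serialised. Assuming that is the cause (the issue does not show the code that writes the file), the sketch below contrasts the two. The result dict is hypothetical. Note that JSON represents Python's None, True and False as null, true and false, not as quoted strings.

```python
import json

# Hypothetical result dict, standing in for the tool's real output.
result = {"error": None, "ok": True, "types": ["int", "str"]}

as_repr = str(result)         # Python repr: single quotes, None/True
as_json = json.dumps(result)  # JSON: double quotes, null/true

print(as_repr)
print(as_json)

# The JSON form round-trips; the repr form is rejected by json.loads.
assert json.loads(as_json) == result
```
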
Cannot preprocess ManyTypes4Py dataset

Hey there,

I am currently trying to get this project up and running and was following the instructions to train the model using the ManyTypes4Py dataset. Unfortunately, the preprocess command just skips the dataset (or rather, does not find any relevant information). I solved this issue by removing the files all_fns.csv and all_vars.csv and symlinking processed_projects_complete to processed_projects.

Did I miss anything during the setup? Are those steps expected, and should they be added to the documentation?
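
The workaround described above could be scripted roughly as follows. This is a sketch under two assumptions that the issue does not confirm: that all_fns.csv, all_vars.csv and processed_projects_complete sit directly in the dataset directory, and that "symlinking" means processed_projects becomes a link pointing at processed_projects_complete. The function name apply_workaround is hypothetical.

```python
from pathlib import Path


def apply_workaround(dataset_dir):
    """Remove the two CSV files and link processed_projects to processed_projects_complete."""
    root = Path(dataset_dir)
    for name in ("all_fns.csv", "all_vars.csv"):
        # missing_ok requires Python 3.8 or newer.
        (root / name).unlink(missing_ok=True)

    link = root / "processed_projects"
    target = root / "processed_projects_complete"
    # Only create the link if nothing is already in its place.
    if not link.exists() and not link.is_symlink():
        # Relative target, resolved from the link's own directory.
        link.symlink_to(target.name, target_is_directory=True)
    return link
```

Back up the CSV files first if there is any chance they are still needed, since the function deletes them.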

Integrate with pyre incremental and adapt the TypeWriter search strategy

It would be interesting to see how well the TypeWriter algorithm (https://software-lab.org/publications/TypeWriter_arXiv_1912.03768.pdf) for searching type annotation suggestions works against type4py. We might get dramatically better results for two reasons:

  • type4py's ML model seems to perform quite a bit better
  • today's pyre incremental is orders of magnitude faster than pyre was when the TypeWriter paper was written, so we may be able to try many more combinations and get correspondingly better results

At one point we'd considered hacking this very quickly as an internal project in my company, but we ran out of time. I think it would be better done open-source anyway because then

  • it would be easier to try out against external projects
  • we could publish our results with code if they are interesting enough to be worth a paper
  • the entire OSS community could benefit

I'm unsure if I can find time to prioritize this in the next 6 months at work but it's a little more likely if I treat it as a side project, which would also open the door to an informal weekend hackathon as a way to kick it off :)

I could do this in a separate repository or inside of type4py. What do you think @mir-am ? And does this sound interesting to you?

Return fixed amount of type predictions

I experimented with the type prediction (http://localhost:5001/api/predict?tc=0) using the provided docker image.
I noticed that, depending on the analysed source code, I get a different number of type predictions per parameter, return, or variable type.
Is it possible to retrieve a fixed number of predicted types?
For example, I would like to retrieve the Top-10 type predictions for each parameter and return type.

Best regards
Florian
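
Whether the server can be asked for a fixed number of predictions is not stated here. As a client-side stopgap, the results can at least be capped. The sketch below assumes each prediction list consists of [type, score] pairs, which is an assumption about the response shape and not taken from this issue. The helper top_k is hypothetical.

```python
def top_k(predictions, k=10):
    """Return the k highest-scoring [type, score] pairs; fewer if fewer were predicted."""
    ranked = sorted(predictions, key=lambda p: p[1], reverse=True)
    return ranked[:k]


# Hypothetical predictions for one parameter.
param_predictions = [["str", 0.12], ["int", 0.81], ["float", 0.05]]
print(top_k(param_predictions, k=2))
```

This can only truncate: if the server returns three candidates, asking for the top 10 still yields three.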
