
gitmahsa / pythia-ai-code-completion


This project forked from motykatomasz/pythia-ai-code-completion


This project reproduces the paper "Pythia: AI-assisted Code Completion System". It was carried out for the course "Machine Learning for Software Engineering" at TU Delft.

Python 93.15% Jupyter Notebook 6.85%


Group 3: Code Completion

Paper replicated: Pythia: AI-assisted Code Completion System
Dataset: 150k Python Dataset

Dataset statistics

Python150 designates 100k files as training files and 50k files as evaluation/test files. After the original files were deduplicated with a deduplication tool, 84,728 training files and 42,372 test files remained.

In the preprocessing phase, files that could not be parsed into an AST (e.g. because they were written for an older Python version) were removed, which reduced the training set to 75,183 ASTs. From these ASTs, a token vocabulary is built with a frequency threshold of 20: only tokens that occur more than 20 times in the training set are added to the vocabulary. The resulting vocabulary contains 43,853 tokens.
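The filtering and vocabulary steps above can be sketched in Python as follows. This is a minimal illustration, not the repository's code: the helper names (`parseable`, `ast_tokens`, `build_vocab`) and the `<unk>` token are hypothetical, and `ast_tokens` uses a simplified flattening (node type names plus identifier and attribute names in `ast.walk` order) that may differ from the project's own AST serialization. The threshold is applied as a strict "more than 20" comparison, matching the description.

```python
import ast
from collections import Counter
from typing import Dict, Iterable, List


def parseable(sources: Iterable[str]) -> List[ast.AST]:
    """Parse each source file, dropping files that Python 3 cannot parse
    (e.g. Python 2 `print "x"` statements raise SyntaxError)."""
    trees = []
    for src in sources:
        try:
            trees.append(ast.parse(src))
        except (SyntaxError, ValueError):  # ValueError: null bytes on older Pythons
            continue
    return trees


def ast_tokens(tree: ast.AST) -> List[str]:
    """Hypothetical flattening: node type names plus identifier and
    attribute names, in ast.walk (breadth-first) order."""
    tokens = []
    for node in ast.walk(tree):
        tokens.append(type(node).__name__)
        if isinstance(node, ast.Name):
            tokens.append(node.id)
        elif isinstance(node, ast.Attribute):
            tokens.append(node.attr)
    return tokens


def build_vocab(trees: Iterable[ast.AST], threshold: int = 20,
                unk: str = "<unk>") -> Dict[str, int]:
    """Map tokens seen more than `threshold` times to ids; id 0 is reserved for `unk`."""
    counts = Counter(tok for tree in trees for tok in ast_tokens(tree))
    kept = sorted(tok for tok, count in counts.items() if count > threshold)
    return {tok: idx for idx, tok in enumerate([unk] + kept)}
```

For example, `build_vocab(parseable(["x = 1\n"] * 25 + ['print "old"\n']))` drops the Python 2 `print` statement during parsing and keeps every remaining token, since each one occurs 25 times.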

Experiments

Shared parameters (used by every experiment unless stated otherwise):

batch size: 64
embedding dimension: 150
hidden dimension: 500
number of LSTM layers: 2
lookback tokens: 100
gradient norm clipping: 10
initial learning rate: 2e-3
learning rate schedule: decay by a factor of 0.97 every epoch
epochs: 15
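A compact way to keep these shared settings in one place, together with the per-epoch decay and the gradient norm clipping they describe, is sketched below. The field and function names are hypothetical, and the schedule assumes epochs are counted from 0, so the initial rate applies to the first epoch. If the training code uses PyTorch, `torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.97)` and `torch.nn.utils.clip_grad_norm_(parameters, max_norm=10)` provide equivalent behavior; whether the repository uses them is not stated here.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class TrainConfig:
    # Values from the shared parameter list; field names are hypothetical.
    batch_size: int = 64
    embed_dim: int = 150
    hidden_dim: int = 500
    num_lstm_layers: int = 2
    lookback_tokens: int = 100
    clip_norm: float = 10.0
    initial_lr: float = 2e-3
    lr_decay: float = 0.97
    epochs: int = 15


def learning_rate(cfg: TrainConfig, epoch: int) -> float:
    """Exponential decay: initial_lr * lr_decay ** epoch, with epoch counted from 0."""
    return cfg.initial_lr * cfg.lr_decay ** epoch


def clip_by_global_norm(grads: Sequence[float], max_norm: float) -> List[float]:
    """Rescale a flat list of gradient values so their L2 norm is at most max_norm."""
    total = math.sqrt(sum(g * g for g in grads))
    if total <= max_norm:
        return list(grads)
    scale = max_norm / total
    return [g * scale for g in grads]
```

With these defaults, `learning_rate(TrainConfig(), 14)` (the last of the 15 epochs) is about 1.31e-3, and `clip_by_global_norm([30.0, 40.0], 10.0)` rescales a gradient of norm 50 to `[6.0, 8.0]`.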

Deployed code: experiments release

Data used:
Training set: 1,970,000 items
Validation set: 227,255 items
Evaluation set: 911,213 items
Vocabulary: 43,853 tokens
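Assuming each item pairs a context of at most 100 preceding tokens (the lookback) with the token to predict, items could be built as sketched below. Which positions serve as prediction targets is not specified in this README (Pythia itself targets method-call completions), so the hypothetical `make_items` helper takes them as an argument and left-pads short contexts with an assumed `<pad>` token.

```python
from typing import List, Sequence, Tuple

PAD = "<pad>"  # hypothetical padding token


def make_items(tokens: Sequence[str], target_positions: Sequence[int],
               lookback: int = 100) -> List[Tuple[List[str], str]]:
    """For each target position i, pair the `lookback` tokens before i
    (left-padded with PAD when fewer exist) with the token at i."""
    items = []
    for i in target_positions:
        context = list(tokens[max(0, i - lookback):i])
        context = [PAD] * (lookback - len(context)) + context
        items.append((context, tokens[i]))
    return items
```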

Experiment 1 | Regularization - L2 regularizer

L2 parameter of 1e-6 (also done here).
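L2 regularization adds the scaled sum of squared weights to the training loss. The sketch below shows that penalty with plain Python lists standing in for parameter tensors; the function names are hypothetical. Whether the repository adds the term to the loss explicitly or passes 1e-6 as an optimizer `weight_decay` argument is not stated; the two differ by a factor of 2 in the effective coefficient, because the gradient of `lam * w**2` is `2 * lam * w`.

```python
from typing import Iterable, Sequence


def l2_penalty(params: Iterable[Sequence[float]], lam: float = 1e-6) -> float:
    """lam times the sum of squared weights over every parameter group."""
    return lam * sum(w * w for group in params for w in group)


def regularized_loss(data_loss: float, params: Iterable[Sequence[float]],
                     lam: float = 1e-6) -> float:
    """Training objective: data loss plus the L2 penalty."""
    return data_loss + l2_penalty(params, lam)
```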

Top-1 accuracy Top-5 accuracy
Validation set 46.61% 71.67%
Evaluation set 47.89% 69.76%
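Top-1 and top-5 accuracy count a prediction as correct when the true next token is the highest-ranked candidate, or among the five highest-ranked candidates, respectively. A minimal sketch of the metric over per-example score lists follows; the function name is hypothetical, and how the project treats out-of-vocabulary targets is not stated, so the sketch simply compares vocabulary indices.

```python
from typing import Sequence


def top_k_accuracy(scores: Sequence[Sequence[float]], targets: Sequence[int],
                   k: int) -> float:
    """Fraction of examples whose target index is among the k highest-scoring indices."""
    hits = 0
    for row, target in zip(scores, targets):
        top_k = sorted(range(len(row)), key=lambda j: row[j], reverse=True)[:k]
        hits += int(target in top_k)
    return hits / len(targets)
```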


Resulting model: final_model_experiment_1

Experiment 2 | Regularization - Dropout

Dropout parameter of 0.8 (based on Pythia).
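Dropout randomly zeroes activations during training. Frameworks use opposite conventions for its parameter: PyTorch's `nn.Dropout(p)` takes the probability of dropping a unit, while TensorFlow 1.x's `tf.nn.dropout` took a `keep_prob`. The README does not say which convention the 0.8 follows, so the sketch below (a hypothetical helper implementing inverted dropout on a plain list) makes the drop probability explicit.

```python
import random
from typing import List, Optional


def inverted_dropout(xs: List[float], p_drop: float, training: bool = True,
                     rng: Optional[random.Random] = None) -> List[float]:
    """Zero each activation with probability p_drop and scale the survivors
    by 1 / (1 - p_drop) so the expected value is unchanged.
    At evaluation time (training=False) the input is returned unchanged."""
    if not training or p_drop == 0.0:
        return list(xs)
    if not 0.0 <= p_drop < 1.0:
        raise ValueError("p_drop must be in [0, 1)")
    rng = rng or random.Random()
    keep = 1.0 - p_drop
    return [x / keep if rng.random() < keep else 0.0 for x in xs]
```

With `p_drop=0.8`, each surviving value is scaled by 5, so the expected activation stays the same as without dropout.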

Top-1 accuracy Top-5 accuracy
Validation set 38.53% 63.31%
Evaluation set 39.37% 61.15%


Resulting model: final_model_experiment_2

Experiment 3 | No regularization

No L2, dropout or weighted loss.

Top-1 accuracy Top-5 accuracy
Validation set 43.03% 67.37%
Evaluation set 46.63% 68.67%


Resulting model: final_model_experiment_3

Experiment 4 | Regularization - Weighted loss + L2

Uses a weighted loss together with L2 regularization (1e-6).
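A weighted loss scales each example's cross-entropy by a weight attached to its target class, for example to reduce the dominance of very frequent tokens. The README does not say how the weights were chosen, so the sketch below takes them as input. The function names are hypothetical, and the normalization by the sum of weights mirrors what PyTorch's `CrossEntropyLoss(weight=...)` does with the default `reduction='mean'`.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax using the log-sum-exp trick."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]


def weighted_cross_entropy(logits_batch: Sequence[Sequence[float]],
                           targets: Sequence[int],
                           class_weights: Sequence[float]) -> float:
    """Sum of w[y] * -log p(y) over the batch, divided by the sum of w[y]."""
    num = 0.0
    den = 0.0
    for logits, y in zip(logits_batch, targets):
        w = class_weights[y]
        num += w * -log_softmax(logits)[y]
        den += w
    return num / den
```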

Top-1 accuracy Top-5 accuracy
Validation set 40.53% 63.62%
Evaluation set 41.31% 60.84%


Resulting model: final_model_experiment_4

Experiment 5 | Bidirectional LSTM

Uses a bidirectional LSTM instead of a unidirectional one. Also includes L2 regularization (1e-6).

Top-1 accuracy Top-5 accuracy
Validation set 48.60% 71.49%
Evaluation set 49.87% 70.11%


Resulting model: final_model_experiment_5

Experiment 6 | Attention

Uses an attention mechanism. Also includes L2 regularization.
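The README does not specify the attention variant. As one common option, the sketch below scores each LSTM hidden state in the lookback window against the last hidden state with a dot product, normalizes the scores with a softmax, and returns the weighted sum of hidden states as a context vector. The choice of query, the dot-product scoring, and the function names are assumptions, not the project's implementation.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def softmax(xs: Sequence[float]) -> Vector:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot_attention(hidden_states: Sequence[Vector]) -> Tuple[Vector, Vector]:
    """Query = last hidden state; keys = values = all hidden states.
    Returns (attention weights, context vector)."""
    query = hidden_states[-1]
    scores = [sum(q * k for q, k in zip(query, h)) for h in hidden_states]
    weights = softmax(scores)
    dim = len(query)
    context = [sum(w * h[d] for w, h in zip(weights, hidden_states))
               for d in range(dim)]
    return weights, context
```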

Top-1 accuracy Top-5 accuracy
Validation set 51.10% 73.95%
Evaluation set 52.90% 72.86%


Resulting model: final_model_experiment_6

Experiment 7 | Attention

Uses an attention mechanism. Also includes L2 regularization (1e-6) and a lower dropout of 0.4.

Top-1 accuracy Top-5 accuracy
Validation set 51.51% 74.17%
Evaluation set 53.70% 73.22%


Resulting model: final_model_experiment_7

Experiment 8 | Regularization - Low dropout

Includes L2 regularization (1e-6) and a lower dropout of 0.4 (compared with 0.8 in Experiment 2).

Top-1 accuracy Top-5 accuracy
Validation set 47.31% 71.89%
Evaluation set 48.56% 70.71%

Resulting model: final_model_experiment_8

Experiment 9 | Attention (different variant)

Includes L2 regularization (1e-6) and a lower dropout of 0.4.

Top-1 accuracy Top-5 accuracy
Validation set 50.67% 73.38%
Evaluation set 52.69% 72.36%

Resulting model: final_model_experiment_9

Experiment 10 | Attention (final)

Includes L2 regularization (1e-6) and a lower dropout of 0.4. Trained for 30 epochs instead of 15.

Top-1 accuracy Top-5 accuracy
Validation set 52.20% 75.12%
Evaluation set 54.80% 74.51%

Resulting model: final_model_experiment_10

Experiment 11 | Best regularizer

Includes L2 regularization (1e-6) and a lower dropout of 0.4.

Top-1 accuracy Top-5 accuracy
Validation set 47.26% 71.96%
Evaluation set 48.51% 70.60%

Resulting model: final_model_experiment_11

Contributors

motykatomasz
