This project forked from motykatomasz/pythia-ai-code-completion

Project reproducing paper: "Pythia, AI-assisted code completion system". The project was done for the course "Machine Learning for Software Engineering" at TU Delft.

pythia-ai-code-completion's Introduction

Group 3: Code Completion

Paper replicating: Pythia: AI-assisted Code Completion System
Dataset: 150k Python Dataset

Dataset statistics

Python150 marks 100k files as training files and 50k files as evaluation/test files. After running a deduplication tool over the original files, 84728 training files and 42372 test files remained.

In the preprocessing phase, files that could not be parsed to an AST (e.g. because the Python version was too old) were removed. This reduced the training set to 75183 ASTs. From these ASTs a vocabulary of tokens is built using a threshold of 20: tokens that occur more than 20 times in the training set are added to the vocabulary. This resulted in a vocabulary of size 43853.
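The thresholding step can be sketched roughly as follows. This is a minimal illustration, not the project's preprocessing code: the function names, the `<UNK>` placeholder for rare tokens, and the alphabetical id assignment are all assumptions.

```python
from collections import Counter

UNK = "<UNK>"  # hypothetical placeholder for tokens below the threshold


def build_vocabulary(token_sequences, threshold=20):
    """Map every token that occurs more than `threshold` times to an integer id."""
    counts = Counter(tok for seq in token_sequences for tok in seq)
    kept = sorted(tok for tok, n in counts.items() if n > threshold)
    vocab = {UNK: 0}
    for tok in kept:
        vocab[tok] = len(vocab)
    return vocab


def encode(sequence, vocab):
    """Replace each token by its id, falling back to the UNK id."""
    return [vocab.get(tok, vocab[UNK]) for tok in sequence]
```

A token seen exactly 20 times is dropped here, matching the "more than 20 times" wording. Whether the reported vocabulary size of 43853 counts a placeholder entry is not stated.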

Experiments

Shared parameters:

batch size: 64
embed dimension: 150
hidden dimension: 500
num LSTM layers: 2
lookback tokens: 100
norm clipping: 10
initial learning rate: 2e-3
learning rate schedule: decay of 0.97 every epoch
epochs: 15
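
To make the shared parameters concrete, here is a small framework-free sketch. The field and function names are hypothetical. It also assumes the decay is applied once after every epoch, that "norm clipping" means clipping the global gradient L2 norm, and that "lookback tokens" is the number of preceding tokens fed to the model.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class SharedConfig:
    """Shared parameters as listed above; field names are hypothetical."""
    batch_size: int = 64
    embed_dim: int = 150
    hidden_dim: int = 500
    num_lstm_layers: int = 2
    lookback_tokens: int = 100
    clip_norm: float = 10.0
    initial_lr: float = 2e-3
    lr_decay: float = 0.97
    epochs: int = 15


def learning_rate(cfg, epoch):
    """Learning rate during 0-indexed `epoch`, assuming one decay step per epoch."""
    return cfg.initial_lr * cfg.lr_decay ** epoch


def clip_by_global_norm(grads, max_norm):
    """Rescale a flat list of gradients so their L2 norm is at most `max_norm`."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm:
        return list(grads)
    return [g * max_norm / norm for g in grads]


def context_window(tokens, position, lookback):
    """Return up to `lookback` tokens immediately preceding `position`."""
    return tokens[max(0, position - lookback):position]


cfg = SharedConfig()
for epoch in (0, 1, cfg.epochs - 1):
    print(epoch, learning_rate(cfg, epoch))
```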

Deployed code: experiments release

Data used:
Training set (1970000 items): download here
Validation set (227255 items): download here
Evaluation set (911213 items): download here
Vocabulary (size: 43853): download here

Experiment 1 | Regularization - L2 regularizer

L2 parameter of 1e-6 (also done here).
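
As a rough illustration of what an L2 parameter of 1e-6 adds to the loss, assuming the penalty is the plain sum of squared weights scaled by that parameter (some frameworks include an extra factor of 1/2; which convention the experiments use is not stated here):

```python
def l2_penalty(parameter_groups, weight=1e-6):
    """Return weight * sum of squared parameters over all groups."""
    return weight * sum(p * p for group in parameter_groups for p in group)


def regularized_loss(data_loss, parameter_groups, weight=1e-6):
    """Add the L2 penalty to an already computed data loss."""
    return data_loss + l2_penalty(parameter_groups, weight)
```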

	Top-1 accuracy	Top-5 accuracy
Validation set	46.61%	71.67%
Evaluation set	47.89%	69.76%
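
Top-1 and top-5 accuracy in these tables follow the usual definition: a prediction counts as correct when the true token is among the k highest-scoring candidates. A minimal sketch, assuming `scores` holds one list of per-token scores per example (the function name is illustrative, not taken from the project code):

```python
def top_k_accuracy(scores, targets, k):
    """Fraction of examples whose target index is among the k highest scores."""
    if not targets:
        raise ValueError("need at least one example")
    hits = 0
    for row, target in zip(scores, targets):
        # Indices of the row sorted by descending score.
        ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
        if target in ranked[:k]:
            hits += 1
    return hits / len(targets)
```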

Resulting model: final_model_experiment_1

Experiment 2 | Regularization - Dropout

Dropout parameter of 0.8 (based on Pythia).
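
For reference, a minimal sketch of inverted dropout on a list of activations. It treats the parameter as a drop rate. Whether 0.8 in this experiment is a drop rate or a keep probability is not stated, so that reading is an assumption, as are the function and argument names.

```python
import random


def inverted_dropout(values, drop_rate, rng):
    """Zero each value with probability `drop_rate`; scale survivors by 1 / (1 - drop_rate)."""
    if not 0.0 <= drop_rate < 1.0:
        raise ValueError("drop_rate must be in [0, 1)")
    keep = 1.0 - drop_rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]
```

The rescaling keeps the expected activation unchanged, so nothing has to be adjusted at evaluation time, when dropout is switched off.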

	Top-1 accuracy	Top-5 accuracy
Validation set	38.53%	63.31%
Evaluation set	39.37%	61.15%

Resulting model: final_model_experiment_2

Experiment 3 | No regularization

No L2, dropout or weighted loss.

	Top-1 accuracy	Top-5 accuracy
Validation set	43.03%	67.37%
Evaluation set	46.63%	68.67%

Resulting model: final_model_experiment_3

Experiment 4 | Regularization - Weighted loss + L2

Includes a weighted loss + L2 (1e-6).
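
How the weights are chosen is not described here. One common form of weighted loss is a class-weighted cross-entropy, sketched below under that assumption; `class_weights` is a hypothetical per-token weight table, and normalising by the total weight is one of several possible conventions.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_cross_entropy(batch_logits, targets, class_weights):
    """Cross-entropy where each example is weighted by the weight of its target class."""
    total_loss = 0.0
    total_weight = 0.0
    for logits, target in zip(batch_logits, targets):
        prob = softmax(logits)[target]
        weight = class_weights[target]
        total_loss += -weight * math.log(prob)
        total_weight += weight
    return total_loss / total_weight
```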

	Top-1 accuracy	Top-5 accuracy
Validation set	40.53%	63.62%
Evaluation set	41.31%	60.84%

Resulting model: final_model_experiment_4

Experiment 5 | bi-directional LSTM

Using a bi-directional LSTM instead of uni-directional. Also includes L2 regularizer (1e-6).

	Top-1 accuracy	Top-5 accuracy
Validation set	48.60%	71.49%
Evaluation set	49.87%	70.11%

Resulting model: final_model_experiment_5

Experiment 6 | attention

Using an attention mechanism. Also includes L2 regularizer.
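
The exact attention variant is not specified here. As a rough illustration only, the sketch below shows plain dot-product attention over a sequence of hidden states, with a query vector (for example the last hidden state) deciding how much each earlier position contributes. The names are hypothetical, and the experiments may use a different scoring function.

```python
import math


def attention_context(hidden_states, query):
    """Return (weights, context) for dot-product attention over `hidden_states`."""
    # Score every position by its dot product with the query.
    scores = [sum(h_d * q_d for h_d, q_d in zip(h, query)) for h in hidden_states]
    # Softmax over positions, shifted by the max score for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    # The context vector is the attention-weighted sum of the hidden states.
    dim = len(query)
    context = [sum(w * h[d] for w, h in zip(weights, hidden_states)) for d in range(dim)]
    return weights, context
```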

	Top-1 accuracy	Top-5 accuracy
Validation set	51.10%	73.95%
Evaluation set	52.90%	72.86%

Resulting model: final_model_experiment_6

Experiment 7 | attention

Using an attention mechanism. Also includes L2 regularizer (1e-6) and a (lower) dropout (0.4).

	Top-1 accuracy	Top-5 accuracy
Validation set	51.51%	74.17%
Evaluation set	53.70%	73.22%

Resulting model: final_model_experiment_7

Experiment 8 | Regularizer - (low) dropout

Includes L2 regularizer (1e-6) and a (lower) dropout (0.4).

	Top-1 accuracy	Top-5 accuracy
Validation set	47.31%	71.89%
Evaluation set	48.56%	70.71%

Resulting model: final_model_experiment_8

Experiment 9 | Attention different

Includes L2 regularizer (1e-6) and a (lower) dropout (0.4).

	Top-1 accuracy	Top-5 accuracy
Validation set	50.67%	73.38%
Evaluation set	52.69%	72.36%

Resulting model: final_model_experiment_9

Experiment 10 | Attention final

Includes L2 regularizer (1e-6) and a (lower) dropout (0.4). Runs for 30 epochs.

	Top-1 accuracy	Top-5 accuracy
Validation set	52.20%	75.12%
Evaluation set	54.80%	74.51%

Resulting model: final_model_experiment_10

Experiment 11 | Best regularizer

Includes L2 regularizer (1e-6) and a (lower) dropout (0.4).

	Top-1 accuracy	Top-5 accuracy
Validation set	47.26%	71.96%
Evaluation set	48.51%	70.60%

Resulting model: final_model_experiment_11
