bhavyagera10 / code_summarization_public

This project forked from wanyao1992/code_summarization_public

Source code for 'Improving automatic source code summarization via deep reinforcement learning'

Python 60.09% ANTLR 2.05% Java 37.39% Perl 0.48%

code_summarization_public's Introduction

Requirements

This repo was developed in the following environment:

  • Python 2.7
  • PyTorch 0.2

Data folder structure

/media/BACKUP/ghproj_d/code_summarization/github-python/ is the folder where all the data for this project is saved; replace it with your own folder. On my computer, the data files are organized as follows:

|- /media/BACKUP/ghproj_d/code_summarization/github-python

|--original (used to save the raw data)

|----data_ps.declbodies data_ps.descriptions

|--processed (used to save the preprocessed data)

|----all.code all.comment

|--result (used to save the results)

|--train (get the data files before training)

You need to get these files before you start training the model. The original folder is included in the dataset folder of this project; copy its contents to your own data folder.
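As a minimal sketch of setting up the layout described above, the helper below creates the four subfolders and optionally copies the bundled raw files into `original`. The function name `prepare_data_root` and its parameters are hypothetical and are not part of this repo; pass whatever paths apply on your machine.

```python
import os
import shutil


def prepare_data_root(data_root, bundled_original=None):
    """Create original/processed/result/train under data_root.

    Hypothetical helper, not part of this repo. If bundled_original is
    given, its files are copied into data_root/original.
    """
    for name in ("original", "processed", "result", "train"):
        path = os.path.join(data_root, name)
        if not os.path.isdir(path):
            os.makedirs(path)

    if bundled_original is not None:
        target = os.path.join(data_root, "original")
        for filename in os.listdir(bundled_original):
            src = os.path.join(bundled_original, filename)
            if os.path.isfile(src):
                shutil.copy(src, os.path.join(target, filename))
    return data_root
```

The scripts in this repo may still contain the hard-coded path shown above, so the path inside them has to be replaced as well.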

Data preprocessing

cd script/github
python python_process.py -train_portion 0.6 -dev_portion 0.2 > log.python_process
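The two flags suggest a 60/20 split into training and development data. The sketch below illustrates one way such portions could partition a dataset, assuming the remaining 20% becomes the test set and the split is contiguous; how python_process.py actually splits (for example, whether it shuffles) is not stated here, and `split_indices` is a hypothetical name.

```python
def split_indices(num_examples, train_portion=0.6, dev_portion=0.2):
    """Return (train, dev, test) index lists for a contiguous split.

    Assumption: whatever is left after the train and dev portions is
    used as the test set.
    """
    if not 0.0 <= train_portion + dev_portion <= 1.0:
        raise ValueError("train_portion + dev_portion must be in [0, 1]")
    train_end = int(round(num_examples * train_portion))
    dev_end = train_end + int(round(num_examples * dev_portion))
    indices = list(range(num_examples))
    return indices[:train_end], indices[train_end:dev_end], indices[dev_end:]


train_idx, dev_idx, test_idx = split_indices(10)
```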

Training

Go back to the project folder

cd ../..

Get the data for training

python run.py preprocess

Training

python run.py train_a2c 10 30 10 hybrid 1 0

Testing

python run.py test_a2c hybrid 1 0

TODO

  • To build the AST, data preprocessing serializes the AST to JSON, and training then parses the JSON back into an AST. This approach is not elegant.
  • During training, I don't know how to batch the ASTs, so I put them into a list and encode them one by one. This is inefficient: training one epoch takes about 2-3 hours. Please let me know if you have a better way to speed this up.
  • On the encoder side, I am working on applying Tree-CNN and GraphCNN to represent the code better.
  • On the decoder side, a GAN will also be considered for the code summarization task.
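The AST-to-JSON round trip mentioned in the first item can be illustrated roughly as follows. This is a sketch only: it uses Python's built-in `ast` module as a stand-in parser and keeps just the node types, whereas the format this repo actually stores is not shown here. `ast_to_dict`, `TreeNode`, `dict_to_tree`, and `count_nodes` are hypothetical names.

```python
import ast
import json


def ast_to_dict(node):
    """Convert an ast.AST node into a JSON-serializable nested dict."""
    children = [ast_to_dict(child) for child in ast.iter_child_nodes(node)]
    return {"type": type(node).__name__, "children": children}


class TreeNode(object):
    """Minimal in-memory tree rebuilt from the JSON form (hypothetical)."""

    def __init__(self, node_type, children):
        self.node_type = node_type
        self.children = children


def dict_to_tree(data):
    """Rebuild a TreeNode tree from the dict produced by ast_to_dict."""
    return TreeNode(data["type"], [dict_to_tree(c) for c in data["children"]])


def count_nodes(tree):
    """Count all nodes in a TreeNode tree, including the root."""
    return 1 + sum(count_nodes(child) for child in tree.children)


source = "def add(a, b):\n    return a + b\n"
serialized = json.dumps(ast_to_dict(ast.parse(source)))  # preprocessing side
tree = dict_to_tree(json.loads(serialized))              # training side
```

Because each tree is rebuilt as its own Python object, encoding still proceeds one tree at a time, which is the bottleneck the second item describes.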

Acknowledgement

This repo is based on https://github.com/khanhptnk/bandit-nmt

Please cite our paper if you use this repo.

Bibtex:
@Inproceedings{wan2018improving,
title={Improving automatic source code summarization via deep reinforcement learning},
author={Wan, Yao and Zhao, Zhou and Yang, Min and Xu, Guandong and Ying, Haochao and Wu, Jian and Yu, Philip S},
booktitle={Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering},
pages={397--407},
year={2018},
organization={ACM}
}
