MultiFix: Learning to Repair Multiple Errors by Optimal Alignment Learning

Overview

This project is a PyTorch implementation of MultiFix, which learns to repair multiple errors in a program by optimal alignment learning.

Hardware

The models were trained using the following hardware:

Ubuntu 18.04.5 LTS
NVIDIA TITAN Xp 24GB * 4
Intel(R) Xeon(R) W-2145 CPU @ 3.70GHz
64GB RAM

Dependencies

Python version is 3.6.7.

We use the following versions of PyTorch:

GPU support (CUDA==10.1): torch==1.1.0
GPU support (CUDA>10.1): torch==1.5.0

Other dependencies (included in "requirements.txt"):

torchtext==0.3.1
numpy==1.16.1
tqdm
matplotlib
regex
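
To confirm that an environment matches the pinned versions, a small check like the following may help. It is a sketch, not part of the repository: the `PINNED` dictionary and the `check_pins` helper are hypothetical names, and torch is left out because the right version depends on the local CUDA version.

```python
import sys

try:  # Python 3.8 and newer
    from importlib.metadata import version, PackageNotFoundError
except ImportError:  # older interpreters, such as the 3.6.7 named above
    import pkg_resources

    PackageNotFoundError = pkg_resources.DistributionNotFound

    def version(name):
        return pkg_resources.get_distribution(name).version

# Pins copied from the dependency list (assumption: exact matches are wanted).
PINNED = {"torchtext": "0.3.1", "numpy": "1.16.1"}


def check_pins(pins, lookup=version):
    """Return {package: (expected, found)} for every pin that does not match."""
    mismatches = {}
    for name, expected in pins.items():
        try:
            found = lookup(name)
        except PackageNotFoundError:
            found = None  # the package is not installed at all
        if found != expected:
            mismatches[name] = (expected, found)
    return mismatches


if __name__ == "__main__":
    print("Python", ".".join(map(str, sys.version_info[:3])))
    for name, (expected, found) in check_pins(PINNED).items():
        print(f"{name}: expected {expected}, found {found}")
```

An empty result from `check_pins` means every pinned package is installed at the expected version.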

Prerequisite

Use virtualenv

    sudo apt-get install build-essential libssl-dev libffi-dev python3-dev
    sudo apt install python3-pip
    sudo pip3 install virtualenv
    virtualenv -p python3 venv
    . venv/bin/activate
    # code your stuff
    deactivate

Datasets

Our dataset is based on the dataset provided by DeepFix. https://www.cse.iitk.ac.in/users/karkare/prutor/prutor-deepfix-09-12-2017.zip

HOW TO EXECUTE OUR MODEL?

Data Processing

Generate training data based on the DeepFix and DrRepair datasets.

    bash data_processing.sh

Model training

Train our model on the generated data.

    bash model_training.sh

Training takes a significant amount of time, so we provide two pre-trained models in the following location:

log/pth
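
To see which saved models are available before running the evaluation, something like the sketch below could be used. It assumes that `log/pth` is a directory and that the checkpoints use a `.pth` extension; neither is confirmed here, and `list_checkpoints` and `load_checkpoint` are hypothetical helpers rather than functions from this repository.

```python
from pathlib import Path


def list_checkpoints(directory="log/pth", pattern="*.pth"):
    """Return checkpoint files under `directory`, sorted by name."""
    root = Path(directory)
    if not root.is_dir():
        return []
    return sorted(p for p in root.glob(pattern) if p.is_file())


def load_checkpoint(path):
    """Load one checkpoint onto the CPU; requires torch to be installed."""
    import torch  # imported lazily so that listing works without torch

    # map_location="cpu" avoids needing the GPU the model was trained on.
    return torch.load(str(path), map_location="cpu")


if __name__ == "__main__":
    for checkpoint in list_checkpoints():
        print(checkpoint)
```

Whether the loaded object is a full model or a state dictionary depends on how the training script saved it, so inspect the result before using it.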

Evaluation

You can check the repair results using a saved model.

    bash evaluation.sh

Known issues

With a beam size of 100, evaluation takes a significant amount of time.
We did not fix the random seed, so training results may differ slightly between runs. The results we report are the average of three training runs.
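
For readers who want reproducible runs, the usual way to fix seeds in a PyTorch project is sketched below, together with averaging a score over three seeds. This is not code from the repository: `set_seed`, `average_over_seeds`, and the `run` callback (standing in for one training-plus-evaluation run that returns a score) are hypothetical names.

```python
import random
import statistics


def set_seed(seed):
    """Seed every random number generator that is available."""
    random.seed(seed)
    try:
        import numpy as np

        np.random.seed(seed)
    except ImportError:
        pass  # numpy is not installed; nothing to seed
    try:
        import torch

        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)  # silently ignored without CUDA
        # Ask cuDNN for deterministic kernels; this can slow training down.
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False
    except ImportError:
        pass  # torch is not installed; nothing to seed


def average_over_seeds(run, seeds=(0, 1, 2)):
    """Call run() once per seed and return the mean of the scores."""
    scores = []
    for seed in seeds:
        set_seed(seed)
        scores.append(run())
    return statistics.mean(scores)
```

Even with fixed seeds, some GPU operations are non-deterministic, so small differences between runs can remain.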

multifix's Issues

Some questions about the test dataset

Hi, thanks for your impressive work. I have some questions about the test dataset you used in your paper.

In the abstract, the paper says "On a set of 6,975 erroneous C programs from the DeepFix dataset, our approach achieves the state-of-the-art result in terms of full repair rate on the DeepFix dataset". However, Section 4.1 says "The dataset contains 37,415 correct programs (compiled without error) and 6,971 erroneous programs." The two numbers differ. Is it just a typo?
Based on the description above and the released code, does this mean that you test on all erroneous programs, including those that solve the same problems as the programs in the training set? For example, the correct programs in Prob10 are used for training while the erroneous programs in Prob10 are also used for testing. If so, I think the comparison with DrRepair may be unfair, since they trained their model on the correct programs in bin0/1/2/3 and tested it on the erroneous programs in bin4.

Looking forward to your reply and thanks for your contribution!
