Cerebro: Static Subsuming Mutant Selection

This repo contains the code, dataset, and trained models for the paper "Cerebro: Static Subsuming Mutant Selection", published in IEEE Transactions on Software Engineering (TSE).

The paper is available here:

The bib entry for citing the paper is available here:

The dataset is composed of the following:

Source code of 48 GNU Coreutils [1] programs written in C and 10 Java projects from Apache Commons Proper [2], Joda-Time [3], and Jsoup [4];
Mutant information in JSON format for every program/project, with Mutant ID, Source Code File Name, Mutation Type, and Line #;
Subsuming mutant labels in JSON format for every program/project, mapped to each mutant by ID;
Abstracted code for every original source code file and mutant of every program/project; and
Mutant annotation sequences as pairs of lhs (input) and rhs (expected output) for all mutants in every program/project, with mappings from sequence file indexes to mutant IDs and from sequences to original code file indexes.
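The mutant information and the subsuming labels are linked by mutant ID. The sketch below shows that join in memory: it keeps only the mutants labelled as subsuming. The record and field names are hypothetical (check the actual JSON keys in the dataset), the sample mutants are made up, and JSON parsing is left out. Records require Java 16 or later.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical in-memory view of one program's mutant data. Field names are
// illustrative only; they are not taken from the dataset's JSON schema.
public class MutantDataSketch {

    // One entry from the mutant information file.
    record Mutant(int id, String sourceFile, String mutationType, int line) {}

    // Keeps the mutants whose subsuming label is true, joining on mutant ID.
    // Mutants without a label are treated as non-subsuming.
    static List<Mutant> subsumingMutants(List<Mutant> mutants, Map<Integer, Boolean> labels) {
        return mutants.stream()
                .filter(m -> labels.getOrDefault(m.id(), false))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Made-up sample data for illustration.
        List<Mutant> mutants = List.of(
                new Mutant(1, "Foo.java", "ROR", 12),
                new Mutant(2, "Foo.java", "AOR", 15),
                new Mutant(3, "Bar.java", "ROR", 7));
        Map<Integer, Boolean> labels = Map.of(1, true, 2, false, 3, true);
        System.out.println(subsumingMutants(mutants, labels).size()); // prints 2
    }
}
```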

Tools/dependencies required before executing the code:

Apache Maven ( available here: https://maven.apache.org/download.cgi )
srcML ( available here: https://www.srcml.org/ )

NOTE: before building, modify the variable below in the data.java file to specify your desired repository location and/or dependencies:

static String dirDataset = "D:/ag/github/Cerebro/dataset";
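If you prefer not to edit the hard-coded path for every machine, one option is to read it from an environment variable and fall back to the default. This is only a sketch: CEREBRO_DATASET is a hypothetical variable name, and the shipped data.java does not read it.

```java
// Sketch of an alternative way to set dirDataset. CEREBRO_DATASET is a
// hypothetical environment variable; Cerebro itself does not read it.
public class DatasetLocationSketch {

    static final String DEFAULT_DATASET = "D:/ag/github/Cerebro/dataset";

    // Returns envValue when it is set and non-blank, otherwise the default path.
    static String resolveDatasetDir(String envValue) {
        return (envValue == null || envValue.isBlank()) ? DEFAULT_DATASET : envValue;
    }

    public static void main(String[] args) {
        String dirDataset = resolveDatasetDir(System.getenv("CEREBRO_DATASET"));
        System.out.println("Using dataset directory: " + dirDataset);
    }
}
```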

Commands to execute:

mvn clean package

java -jar D:/ag/github/Cerebro/code/target/cerebro-1.0.jar [arguments]

Options by task:

To prepare the dataset for model training:

java -jar D:/ag/github/Cerebro/code/target/cerebro-1.0.jar prep [language] [sequence-length] [abstraction-level]

where,

Available options for [language] are c and java.

[sequence-length] is the desired number of tokens in a sequence (a numeric value), e.g. 25, 50, or 100.

Available options for [abstraction-level] are full (code abstraction applied) and partial (no abstraction; only code comments removed).

So, to create the dataset for Java projects with sequence length 100 and full abstraction, execute:

java -jar D:/ag/github/Cerebro/code/target/cerebro-1.0.jar prep java 100 full

To create the dataset for C projects with sequence length 50 and no abstraction (only code comments removed), execute:

java -jar D:/ag/github/Cerebro/code/target/cerebro-1.0.jar prep c 50 partial

To test the performance of a model by evaluating the sequences it generates:

java -jar D:/ag/github/Cerebro/code/target/cerebro-1.0.jar test [language] [sequence-length] [abstraction-level]

Values for [language], [sequence-length], and [abstraction-level] are the same as described above.
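This README does not say which metric the test task reports. Purely as an illustration, and not as Cerebro's actual evaluation, the sketch below computes the line-by-line exact-match rate between expected rhs sequences and model-generated sequences. It assumes one sequence per line, with the i-th generated line corresponding to the i-th expected line; the sample sequences are made up.

```java
import java.util.List;

// Illustrative exact-match comparison; not the evaluation implemented by Cerebro.
public class ExactMatchSketch {

    // Fraction of aligned positions where the generated sequence equals the
    // expected one, ignoring leading/trailing whitespace.
    static double exactMatch(List<String> expected, List<String> generated) {
        if (expected.size() != generated.size()) {
            throw new IllegalArgumentException("sequence counts differ");
        }
        if (expected.isEmpty()) {
            return 0.0;
        }
        int hits = 0;
        for (int i = 0; i < expected.size(); i++) {
            if (expected.get(i).trim().equals(generated.get(i).trim())) {
                hits++;
            }
        }
        return (double) hits / expected.size();
    }

    public static void main(String[] args) {
        // In practice both lists could be loaded with Files.readAllLines(...),
        // e.g. from a generated file such as genrhs-smp-java-50-01.txt.
        List<String> expected = List.of("a b c", "d e f", "g h i");
        List<String> generated = List.of("a b c", "d e x", "g h i");
        System.out.println(exactMatch(expected, generated));
    }
}
```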

To generate XML files as input for the simulation:

java -jar D:/ag/github/Cerebro/code/target/cerebro-1.0.jar combinetosimulate [language] [sequence-length] [abstraction-level]

Values for [language], [sequence-length], and [abstraction-level] are the same as described above.
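All three tasks (prep, test, combinetosimulate) take the same three arguments. The sketch below validates them according to the rules documented above before a run. It is a hypothetical helper, not code from the Cerebro source, and it does not reproduce how cerebro-1.0.jar itself parses its arguments. Records require Java 16 or later.

```java
import java.util.Set;

// Hypothetical validator for: <task> <language> <sequence-length> <abstraction-level>
public class ArgsSketch {

    record Options(String task, String language, int sequenceLength, String abstraction) {}

    static final Set<String> TASKS = Set.of("prep", "test", "combinetosimulate");
    static final Set<String> LANGUAGES = Set.of("c", "java");
    static final Set<String> LEVELS = Set.of("full", "partial");

    // Throws IllegalArgumentException when any argument violates the documented rules.
    static Options parse(String[] args) {
        if (args.length != 4) {
            throw new IllegalArgumentException(
                    "usage: <task> <language> <sequence-length> <abstraction-level>");
        }
        if (!TASKS.contains(args[0])) {
            throw new IllegalArgumentException("unknown task: " + args[0]);
        }
        if (!LANGUAGES.contains(args[1])) {
            throw new IllegalArgumentException("language must be c or java");
        }
        int length;
        try {
            length = Integer.parseInt(args[2]);
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("sequence-length must be numeric", e);
        }
        if (length <= 0) {
            throw new IllegalArgumentException("sequence-length must be positive");
        }
        if (!LEVELS.contains(args[3])) {
            throw new IllegalArgumentException("abstraction-level must be full or partial");
        }
        return new Options(args[0], args[1], length, args[3]);
    }

    public static void main(String[] args) {
        System.out.println(parse(new String[] {"prep", "java", "100", "full"}));
    }
}
```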

Where to find trained models in the repo?

The trained models are located at:

dataset/subsuming-mutant-prediction-[language]/smp/smp-[language]-[sequence-length]-[fold#]/model

For example, the model trained for Java projects with abstracted sequences of length 100 (fold 01) is located at:

dataset/subsuming-mutant-prediction-java/smp/smp-java-100-01/model
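The path pattern above can be filled in programmatically. In this sketch the two-digit, zero-padded fold number is inferred from the example "smp-java-100-01". The split-model paths listed further down use a "pa-" prefix (pa-smp-java-50-01), which this helper does not handle.

```java
// Builds a model directory path from the documented pattern
// dataset/subsuming-mutant-prediction-[language]/smp/smp-[language]-[sequence-length]-[fold#]/model.
// Zero-padding the fold to two digits is inferred from the example path.
public class ModelPathSketch {

    static String modelPath(String language, int sequenceLength, int fold) {
        return String.format("dataset/subsuming-mutant-prediction-%s/smp/smp-%s-%d-%02d/model",
                language, language, sequenceLength, fold);
    }

    public static void main(String[] args) {
        // prints dataset/subsuming-mutant-prediction-java/smp/smp-java-100-01/model
        System.out.println(modelPath("java", 100, 1));
    }
}
```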

Tools/dependencies required to train/test the models:

seq2seq ( available here: https://google.github.io/seq2seq/getting_started/#download-setup )
Tkinter ( available here: https://docs.python.org/3.8/library/tkinter.html )
TensorFlow ( available here: https://www.tensorflow.org/install/pip )
PyYAML ( available here: https://pyyaml.org/wiki/LibYAML )
Perl ( available here: https://www.cpan.org/modules/INSTALL.html )

For model training:

Use the script train.sh located at Cerebro/dataset/subsuming-mutant-prediction-java/smp/seq2seq/train.sh:

./train.sh [dirpath] [training-samples-num * epoch-num] [dirpath]/model [config] 1 [training-samples-num] [training-samples-num] 0

The following example trains a model for 10 epochs on Java projects with sequence length 50 and 135,903 training samples (so the second argument is 135,903 × 10 = 1,359,030):

./train.sh ../smp-java-50-01 1359030 ../smp-java-50-01/model length_51-g-1-2 1 135903 135903 0

Training configurations are available in the directory Cerebro/dataset/subsuming-mutant-prediction-java/smp/seq2seq/configs.

For sequence lengths 25, 50, and 100, use length_26-g-1-2, length_51-g-1-2, and length_101-g-1-2, respectively.
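The train.sh arguments can be derived from the directory, sequence length, sample count, and epoch count. This sketch reproduces the documented order and the config mapping; the "+1" rule is inferred from the three listed pairs (25, 50, 100), and the fixed values 1 and 0 are copied from the usage line because their meaning is not documented here.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of assembling train.sh arguments in the documented order:
// [dirpath] [samples*epochs] [dirpath]/model [config] 1 [samples] [samples] 0
public class TrainArgsSketch {

    // Inferred mapping: 25 -> length_26-g-1-2, 50 -> length_51-g-1-2, 100 -> length_101-g-1-2.
    static String configFor(int sequenceLength) {
        return "length_" + (sequenceLength + 1) + "-g-1-2";
    }

    static List<String> trainArgs(String dirPath, int sequenceLength, long samples, int epochs) {
        List<String> args = new ArrayList<>();
        args.add(dirPath);
        args.add(Long.toString(samples * epochs)); // total training steps
        args.add(dirPath + "/model");
        args.add(configFor(sequenceLength));
        args.add("1");                              // copied as-is from the usage line
        args.add(Long.toString(samples));
        args.add(Long.toString(samples));
        args.add("0");                              // copied as-is from the usage line
        return args;
    }

    public static void main(String[] args) {
        List<String> a = trainArgs("../smp-java-50-01", 50, 135_903L, 10);
        // prints ./train.sh ../smp-java-50-01 1359030 ../smp-java-50-01/model length_51-g-1-2 1 135903 135903 0
        System.out.println("./train.sh " + String.join(" ", a));
    }
}
```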

For model testing:

Use the script test.sh located at Cerebro/dataset/subsuming-mutant-prediction-java/smp/seq2seq/test.sh:

./test.sh [dirpath]/test [dirpath]/model [desired-generated-sequences-file-name]

The following example uses the trained model at ../smp-java-50-01/model and the test set at ../smp-java-50-01/test to generate sequences in the file genrhs-smp-java-50-01.txt:

./test.sh ../smp-java-50-01/test ../smp-java-50-01/model genrhs-smp-java-50-01.txt
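The test.sh arguments follow directly from the model directory. This sketch mirrors the example above, including the "genrhs-<folder>.txt" output name, which is only the convention used in that example; any file name can be passed.

```java
import java.nio.file.Paths;
import java.util.List;

// Sketch of assembling test.sh arguments: [dirpath]/test [dirpath]/model [output-file].
// The "genrhs-<folder>.txt" naming is taken from the example, not required by the script.
public class TestArgsSketch {

    static List<String> testArgs(String dirPath) {
        // Last path element, e.g. "smp-java-50-01" for "../smp-java-50-01".
        String folder = Paths.get(dirPath).getFileName().toString();
        return List.of(dirPath + "/test", dirPath + "/model", "genrhs-" + folder + ".txt");
    }

    public static void main(String[] args) {
        // prints ./test.sh ../smp-java-50-01/test ../smp-java-50-01/model genrhs-smp-java-50-01.txt
        System.out.println("./test.sh " + String.join(" ", testArgs("../smp-java-50-01")));
    }
}
```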

Note:

A few models were larger than 100 MB, so each of them was split into two files so that it could be checked in. These models are:

dataset/subsuming-mutant-prediction-java/smp/pa-smp-java-50-01/model/model.ckpt.data-00000-of-00001

dataset/subsuming-mutant-prediction-java/smp/pa-smp-java-50-02/model/model.ckpt.data-00000-of-00001

dataset/subsuming-mutant-prediction-java/smp/pa-smp-java-50-03/model/model.ckpt.data-00000-of-00001

dataset/subsuming-mutant-prediction-java/smp/pa-smp-java-50-04/model/model.ckpt.data-00000-of-00001

dataset/subsuming-mutant-prediction-java/smp/pa-smp-java-50-05/model/model.ckpt.data-00000-of-00001

In each of these cases, model.ckpt.data-00000-of-00001 was split into model.ckpt.data-00000-of-00001.001 and model.ckpt.data-00000-of-00001.002.
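The README does not say which tool produced the two parts. Assuming they are plain byte-level splits (as produced by tools such as 7-Zip's split option or the Unix split command), concatenating .001 then .002 should restore the original checkpoint file. A minimal sketch under that assumption:

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Rebuilds <target> by concatenating <target>.001 and <target>.002 in order.
// Assumes the parts are raw byte splits; this is not stated in the README.
public class JoinPartsSketch {

    static void join(Path target) throws IOException {
        Path part1 = target.resolveSibling(target.getFileName() + ".001");
        Path part2 = target.resolveSibling(target.getFileName() + ".002");
        // Default options create the file or truncate an existing one.
        try (OutputStream out = Files.newOutputStream(target)) {
            Files.copy(part1, out);
            Files.copy(part2, out);
        }
    }

    public static void main(String[] args) throws IOException {
        // e.g. dataset/subsuming-mutant-prediction-java/smp/pa-smp-java-50-01/model/model.ckpt.data-00000-of-00001
        join(Paths.get(args[0]));
    }
}
```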

References

[1] GNU Coreutils. https://www.gnu.org/software/coreutils/, (last accessed April 24, 2021).

[2] Apache Commons Proper. https://commons.apache.org, (last accessed April 24, 2021).

[3] Joda-Time. https://github.com/JodaOrg/joda-time/, (last accessed April 24, 2021).

[4] Jsoup. https://github.com/jhy/jsoup, (last accessed April 24, 2021).
