The replication kit of MAT

Title: How far have we progressed in identifying self-admitted technical debts? A comprehensive empirical study

This repository stores the source code of four state-of-the-art SATD comment detection approaches, together with 20 Java projects whose comments were manually labeled (10 by Maldonado et al. and 10 by ourselves).

1. Folder Structure

  • MAT/dataset/ This folder stores the comment data of the 20 Java projects in 40 files: 20 comment files (e.g., data--Ant.txt) and 20 label files (e.g., label--Ant).

  • MAT/src/ This folder stores the Java source code of Pattern, NLP, TM, and MAT.

  • MAT/CNN_Code/ This folder stores the Python source code of CNN. It was provided by Ren et al.; we modified parts of it to support cross-project prediction.

  • MAT/exp_data/ This folder stores the experimental data, including the configuration files (exp_data/dic) and the comment data (exp_data/origin).

  • MAT/result/ This folder stores all classification results of each approach. In particular, MAT/result/predictions/ stores the detailed classification result for each comment of each project.

  • Tip 1: The implementation of Jitterbug can be found in its original repository.

  • Tip 2: All results of the CNN approach are taken from Ren et al.'s paper, Neural network based detection of self-admitted technical debt: From performance to explainability [1].
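To inspect the dataset programmatically, a loader along the following lines may help. It is a minimal sketch that assumes each data--<Project>.txt holds one comment per line and the matching label--<Project> file holds one label per line in the same order; this README does not document either format, so check the files before relying on it. The names DatasetLoader and LabeledComment are hypothetical, and the record requires Java 16+.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Sketch: pair each comment in data--<Project>.txt with its label in label--<Project>.
// ASSUMPTION (not documented in the README): both files hold one entry per line, same order.
class DatasetLoader {

    // One comment together with its (unparsed) label string.
    record LabeledComment(String comment, String label) {}

    // Zip comments and labels line by line; a length mismatch means the assumed format is wrong.
    static List<LabeledComment> pair(List<String> comments, List<String> labels) {
        if (comments.size() != labels.size()) {
            throw new IllegalArgumentException(
                    "comment/label count mismatch: " + comments.size() + " vs " + labels.size());
        }
        List<LabeledComment> rows = new ArrayList<>();
        for (int i = 0; i < comments.size(); i++) {
            rows.add(new LabeledComment(comments.get(i), labels.get(i).trim()));
        }
        return rows;
    }

    // Read both files of one project from the dataset folder (e.g. MAT/dataset/).
    static List<LabeledComment> load(Path datasetDir, String project) throws IOException {
        List<String> comments = Files.readAllLines(
                datasetDir.resolve("data--" + project + ".txt"), StandardCharsets.UTF_8);
        List<String> labels = Files.readAllLines(
                datasetDir.resolve("label--" + project), StandardCharsets.UTF_8);
        return pair(comments, labels);
    }

    public static void main(String[] args) throws IOException {
        Path datasetDir = Path.of(args.length > 0 ? args[0] : "MAT/dataset");
        if (!Files.isRegularFile(datasetDir.resolve("data--Ant.txt"))) {
            System.out.println("data--Ant.txt not found under " + datasetDir);
            return;
        }
        System.out.println(load(datasetDir, "Ant").size() + " labeled comments for Ant");
    }
}
```

A count mismatch raises an exception rather than silently misaligning comments and labels, which is usually the first symptom that the assumed line-per-entry format does not hold.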

2. Runnable jar archive

To make the classification results easy to reproduce, all the Java source code has been packaged into a runnable JAR file, MAT.jar, which can be run with the following command:

java -jar MAT.jar -p exp_data_folder_path -o result_path -m model -s scenario

In the above command:

  • -p is the experimental data folder path. It must contain the two sub-folders dic and origin, which can be found in MAT/exp_data/: dic stores configuration files, and origin stores the comment data of each project. The simplest way is to copy the MAT/exp_data/ folder to your machine and pass -p <local path>/MAT/exp_data;
  • -o is the result folder path, where the classification results of each approach are written. An empty folder works;
  • -m is the SATD identification model: Pattern, NLP, TM, or MAT;
  • -s is the prediction scenario: MTO or OTO.
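Before launching a long run, a quick pre-flight check of the arguments can save time. The sketch below (hypothetical class ExpDataCheck) verifies that the -p folder contains dic and origin and that -m and -s use one of the values listed above. It does not call MAT.jar, and whether MAT.jar itself accepts other spellings (e.g., lower-case model names) is not documented here.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class ExpDataCheck {

    // Values listed in this README for -m and -s (same spelling).
    static final Set<String> MODELS = Set.of("Pattern", "NLP", "TM", "MAT");
    static final Set<String> SCENARIOS = Set.of("MTO", "OTO");

    // Collect problems with a planned run; an empty list means the arguments look usable.
    static List<String> problems(Path expData, String model, String scenario) {
        List<String> problems = new ArrayList<>();
        for (String sub : List.of("dic", "origin")) {
            if (!Files.isDirectory(expData.resolve(sub))) {
                problems.add("missing sub-folder: " + sub);
            }
        }
        if (!MODELS.contains(model)) {
            problems.add("unknown model: " + model);
        }
        if (!SCENARIOS.contains(scenario)) {
            problems.add("unknown scenario: " + scenario);
        }
        return problems;
    }

    public static void main(String[] args) {
        Path expData = Path.of(args.length > 0 ? args[0] : "MAT/exp_data");
        List<String> found = problems(expData, "NLP", "MTO");
        System.out.println(found.isEmpty() ? "ready: " + expData : "problems: " + found);
    }
}
```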

Here are some usage examples:

java -jar MAT.jar -p D:/exp_data/ -o D:/Result/ -m Pattern -s MTO

java -jar MAT.jar -p D:/exp_data/ -o D:/Result/ -m NLP -s MTO

java -jar MAT.jar -p D:/exp_data/ -o D:/Result/ -m TM -s MTO

java -jar MAT.jar -p D:/exp_data/ -o D:/Result/ -m MAT -s MTO

java -jar MAT.jar -p D:/exp_data/ -o D:/Result/ -m NLP -s OTO

java -jar MAT.jar -p D:/exp_data/ -o D:/Result/ -m TM -s OTO
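The six examples can also be replayed from a small Java driver. This is a sketch, not part of the replication kit: it assumes MAT.jar is in the working directory and java is on the PATH, and it reuses the example paths D:/exp_data/ and D:/Result/. The --run switch belongs to this hypothetical RunExamples class only; without it the commands are printed but not executed.

```java
import java.io.IOException;
import java.util.List;

// Sketch: replay the six example invocations of MAT.jar.
// Pass --run to actually start MAT.jar; otherwise the commands are only printed.
class RunExamples {

    // Build the argument vector for one run; each option value is a separate element,
    // so paths containing spaces need no shell quoting.
    static List<String> command(String expData, String resultDir, String model, String scenario) {
        return List.of("java", "-jar", "MAT.jar",
                "-p", expData, "-o", resultDir, "-m", model, "-s", scenario);
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        boolean execute = args.length > 0 && args[0].equals("--run");
        String[][] runs = {
            {"Pattern", "MTO"}, {"NLP", "MTO"}, {"TM", "MTO"}, {"MAT", "MTO"},
            {"NLP", "OTO"}, {"TM", "OTO"},
        };
        for (String[] run : runs) {
            List<String> cmd = command("D:/exp_data/", "D:/Result/", run[0], run[1]);
            System.out.println("> " + String.join(" ", cmd));
            if (!execute) {
                continue;
            }
            // inheritIO() sends MAT.jar's output straight to this console
            int exit = new ProcessBuilder(cmd).inheritIO().start().waitFor();
            if (exit != 0) {
                System.err.println("exit code " + exit + "; stopping");
                return;
            }
        }
    }
}
```

Building the command as a list of separate arguments, rather than one string, lets ProcessBuilder pass paths containing spaces without shell quoting. The loop stops at the first non-zero exit code so a failed run is not buried under later output.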

3. Studied Approaches

Year Authors Approach isSupervised Description
2015 Potdar et al. Pattern No Pattern (keyword) matching
2017 Maldonado et al. NLP Yes Natural language processing
2018 Huang et al. TM Yes Text mining
2019 Ren et al. CNN Yes Convolutional Neural Network
2020 Yu et al. Jitterbug Yes Pattern matching & Human effort
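As an illustration of the unsupervised Pattern approach, keyword matching reduces to a case-insensitive substring search over each comment. The keyword list below is a small hypothetical sample chosen for illustration; it is not the pattern set of Potdar et al., and the implementation in MAT/src/ may match differently.

```java
import java.util.List;
import java.util.Locale;

// Sketch of keyword-based SATD detection in the spirit of the Pattern approach.
// KEYWORDS is a HYPOTHETICAL sample, not Potdar et al.'s actual pattern list.
class KeywordMatcher {

    static final List<String> KEYWORDS =
            List.of("hack", "fixme", "workaround", "kludge", "temporary solution", "ugly");

    // A comment is flagged as SATD if any keyword occurs in it, ignoring case.
    static boolean isSatd(String comment) {
        String text = comment.toLowerCase(Locale.ROOT);
        for (String keyword : KEYWORDS) {
            if (text.contains(keyword)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(isSatd("// FIXME: breaks on empty input")); // true
        System.out.println(isSatd("// Returns the number of rows"));   // false
    }
}
```

Plain substring matching is simple but coarse: "hack" also matches "hacky" (arguably desirable) as well as unrelated words that merely contain it.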

4. Dataset Summary

4.1 Projects labeled by Maldonado et al.

Project Release Contributors #Classes #Comments #After filtering SATD % of SATD
Ant 1.7.0 74 1,475 21,587 3,052 102 0.47%
ArgoUML 0.34 87 2,609 67,716 5,426 969 1.43%
Columba 1.4 9 1,711 33,895 4,090 128 0.38%
EMF 2.4.1 30 1,458 25,229 2,585 74 0.29%
Hibernate 3.3.2 226 1,356 11,630 2,492 377 3.24%
JEdit 4.2 57 800 16,991 4,644 195 1.15%
JFreeChart 1.0.19 19 1,065 23,474 2,494 101 0.43%
JMeter 2.10 33 1,181 20,084 4,148 282 1.40%
JRuby 1.4.0 328 1,486 11,149 3,652 383 3.44%
SQuirrel 3.0.3 46 3,108 27,474 4,473 201 0.73%
Total ----- - 16,249 259,229 37,056 2,812 1.08%
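In both tables, % of SATD is SATD divided by #Comments (all comments, not the #After filtering count): for Ant, 102 / 21,587 ≈ 0.47%. The snippet below recomputes a few rows of the table above as a sanity check; the class and record names are illustrative.

```java
import java.util.List;
import java.util.Locale;

class SatdRatio {

    record Row(String project, int comments, int satd) {}

    // SATD share of all extracted comments, formatted like the table (two decimals).
    static String percent(int satd, int comments) {
        return String.format(Locale.ROOT, "%.2f%%", 100.0 * satd / comments);
    }

    public static void main(String[] args) {
        // (#Comments, SATD) pairs copied from the table
        List<Row> rows = List.of(
                new Row("Ant", 21_587, 102),
                new Row("Hibernate", 11_630, 377),
                new Row("Total", 259_229, 2_812));
        for (Row r : rows) {
            System.out.println(r.project() + ": " + percent(r.satd(), r.comments()));
        }
    }
}
```

Running it prints Ant: 0.47%, Hibernate: 3.24%, and Total: 1.08%, matching the table.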

4.2 Projects labeled by us

Project Release Contributors #Files #Comments #After filtering SATD % of SATD
Dubbo 2.7.4 255 1,493 5,875 1,649 85 1.45%
Gradle 5.6.3 409 7,965 15,901 3,324 321 2.02%
Groovy 2.5.8 284 1,526 14,199 4,435 249 1.75%
Hive 3.1.2 192 5,817 81,127 29,340 1,046 1.29%
Maven 3.6.2 87 886 5,448 1,219 136 2.50%
Poi 4.1.1 12 3,477 45,666 15,033 618 1.35%
SpringFramework 5.2.0 401 6,355 42,574 7,712 98 0.23%
Storm 2.1.0 304 2,267 12,258 3,639 92 0.75%
Tomcat 9.0.27 31 2,343 37,038 12,218 287 0.77%
Zookeeper 3.5.6 93 677 6,894 2,691 63 0.91%
Total ------ - 32,806 266,980 81,260 2,995 1.12%
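Each Total row is the column-wise sum of its ten project rows (the Contributors total is left as -). The sketch below checks this for the table above; the arrays are copied from it in row order, and the class name TotalsCheck is illustrative.

```java
import java.util.Arrays;

class TotalsCheck {

    // Column-wise sum of the per-project rows.
    static int sum(int[] column) {
        return Arrays.stream(column).sum();
    }

    public static void main(String[] args) {
        // Columns from the table, Dubbo through Zookeeper
        int[] comments = {5_875, 15_901, 14_199, 81_127, 5_448, 45_666, 42_574, 12_258, 37_038, 6_894};
        int[] afterFiltering = {1_649, 3_324, 4_435, 29_340, 1_219, 15_033, 7_712, 3_639, 12_218, 2_691};
        int[] satd = {85, 321, 249, 1_046, 136, 618, 98, 92, 287, 63};
        System.out.println("#Comments: " + sum(comments));             // 266980
        System.out.println("#After filtering: " + sum(afterFiltering)); // 81260
        System.out.println("SATD: " + sum(satd));                       // 2995
    }
}
```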

5. Contact us

Mail: [email protected]

[1] X. Ren, Z. Xing, X. Xia, D. Lo, X. Wang, J. Grundy. Neural network based detection of self-admitted technical debt: From performance to explainability. ACM Transactions on Software Engineering and Methodology, 28(3), 2019: 1-45.
