Commit Type Detection

Classify Git commits with deep learning

Introduction

Following this paper, we assume that software maintenance activities fall into three main categories:

Corrective: fixing faults (functional and non-functional)

Perfective: improving the system and its design

Adaptive: introducing new features into the system

In this work, we aim to build a commit classification model that detects these three commit types with high accuracy.

The dataset used can be found here.

Method

The paper mentioned above compares three algorithms: J48, GBM (gradient boosting machine), and RF (random forest). Of these, RF performed best.

Instead of these algorithms, we implemented a deep learning approach. The neural network architecture is shown below (copied from the network.py file):

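import torch
import torch.nn as nn
import torch.nn.functional as F

# NETWORK_INPUT_SIZE and NETWORK_OUTPUT_SIZE are constants defined elsewhere in the repository.
class Network(nn.Module):
    def __init__(self, input_size=NETWORK_INPUT_SIZE, output_size=NETWORK_OUTPUT_SIZE):
        super(Network, self).__init__()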
        self.fc1 = nn.Linear(input_size, 80)
        self.fc2 = nn.Linear(80, 60)
        self.dropout1 = nn.Dropout(0.01)
        self.fc3 = nn.Linear(60, 40)
        self.fc4 = nn.Linear(40, 20)
        self.fc5 = nn.Linear(20, output_size)

    def forward(self, x):
        x = self.fc1(x)
        x = F.relu(x)
        x = self.dropout1(x)
        x = self.fc2(x)
        x = F.relu(x)
        x = self.dropout1(x)
        x = self.fc3(x)
        x = F.relu(x)
        x = self.dropout1(x)
        x = self.fc4(x)
        x = F.relu(x)
        x = self.dropout1(x)
        x = self.fc5(x)
        x = torch.tanh(x)
        return x

As shown above, this is a fully connected neural network implemented in the PyTorch deep learning framework.
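To get a feel for how small this network is, the layer widths can be turned into a parameter count. The sketch below assumes NETWORK_INPUT_SIZE is 100 (the tf-idf vector size mentioned in this README) and NETWORK_OUTPUT_SIZE is 3 (one output per commit type). Both values are assumptions, since the constants are defined elsewhere in the repository.

```python
# Layer widths taken from the Network class. The first and last entries are
# assumed values for NETWORK_INPUT_SIZE and NETWORK_OUTPUT_SIZE.
LAYER_SIZES = [100, 80, 60, 40, 20, 3]


def count_parameters(layer_sizes):
    """Count the weights and biases of a stack of fully connected layers."""
    total = 0
    for fan_in, fan_out in zip(layer_sizes, layer_sizes[1:]):
        total += fan_in * fan_out + fan_out  # weight matrix + bias vector
    return total


print(count_parameters(LAYER_SIZES))  # 16263
```

Under these assumptions the model has roughly 16 thousand trainable parameters. Dropout layers add none.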

In our dataset, each commit has a message, a project name, and 68 other features. Applying the tf-idf algorithm to the commit messages lets us convert each commit into a vector of size 100, so the input to this network is a vector of size 100.
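As a rough illustration of the tf-idf step, here is a minimal plain-Python sketch that maps commit messages to fixed-size vectors. It is not the repository's code: the function name `tfidf_vectors`, the whitespace tokenizer, the vocabulary selection rule, and the sample messages are all hypothetical, and the actual project may rely on a library with a different weighting scheme.

```python
import math
from collections import Counter


def tfidf_vectors(messages, max_features=100):
    """Turn commit messages into fixed-size tf-idf vectors (illustrative only)."""
    docs = [m.lower().split() for m in messages]
    # Document frequency: in how many messages each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    # Keep the terms that appear in the most messages (ties broken alphabetically).
    vocab = sorted(df, key=lambda t: (-df[t], t))[:max_features]
    n_docs = len(docs)
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        length = max(len(doc), 1)
        row = []
        for term in vocab:
            tf = counts[term] / length          # term frequency in this message
            idf = math.log(n_docs / df[term])   # rarer terms get a larger weight
            row.append(tf * idf)
        # Pad with zeros so every vector has exactly max_features entries.
        row += [0.0] * (max_features - len(row))
        vectors.append(row)
    return vocab, vectors


# Hypothetical commit messages, one per maintenance category.
messages = [
    "fix null pointer crash in parser",
    "add export feature to report page",
    "refactor parser for readability",
]
vocab, vectors = tfidf_vectors(messages)
print(len(vectors), len(vectors[0]))  # 3 100
```

Each row can then be fed to the network as a float tensor of length 100.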

As in the paper, our models were trained on 85% of the dataset, and the remaining 15% was held out as a test set.
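The 85/15 split can be sketched as follows. This is a minimal example that assumes a plain random shuffle with a fixed seed. The helper name `split_dataset` is hypothetical, and the repository may split the data differently (for example, stratified by class).

```python
import random


def split_dataset(samples, train_fraction=0.85, seed=0):
    """Shuffle the samples and split them into a training and a test list."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Stand-in data: integers in place of real commit feature vectors.
train_set, test_set = split_dataset(range(1000))
print(len(train_set), len(test_set))  # 850 150
```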

Result

A confusion matrix is printed after training. You can compare these results with Table 8 of the paper mentioned above. In this run, our method reached about 74.6% overall accuracy.

Predict  a        c        p
Actual
a        17       4        10
c        5        74       6
p        3        16       38

Overall Statistics:

Kappa                                                      0.57912
NIR                                                        0.49133
Overall Accuracy                                           0.74566
P-Value [Accuracy > NIR]                                   0.0

Class Statistics:

Classes                                                    Adaptive    Corrective  Perfective
ACC(Accuracy)                                              0.87283     0.82081     0.79769
ERR(Error rate)                                            0.12717     0.17919     0.20231
FN(False negative/miss/type 2 error)                       14          11          19
FP(False positive/type 1 error/false alarm)                8           20          16
FPR(Fall-out or false positive rate)                       0.05634     0.22727     0.13793
PPV(Precision or positive predictive value)                0.68        0.78723     0.7037
TN(True negative/correct rejection)                        134         68          100
TNR(Specificity or true negative rate)                     0.94366     0.77273     0.86207
TP(True positive/hit)                                      17          74          38
TPR(Sensitivity, recall, hit rate, or true positive rate)  0.54839     0.87059     0.66667
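The headline numbers above can be recomputed directly from the confusion matrix. The following sketch uses only the matrix values reported in this README. The function names are illustrative and are not part of the project.

```python
# Confusion matrix from this README: rows are actual classes, columns are predictions.
LABELS = ["a", "c", "p"]  # adaptive, corrective, perfective
MATRIX = [
    [17, 4, 10],
    [5, 74, 6],
    [3, 16, 38],
]


def overall_accuracy(matrix):
    """Fraction of samples on the diagonal (correctly classified)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def cohen_kappa(matrix):
    """Agreement between actual and predicted labels, corrected for chance."""
    total = sum(sum(row) for row in matrix)
    observed = overall_accuracy(matrix)
    # Expected chance agreement, from the row and column totals.
    expected = sum(
        sum(matrix[i]) * sum(row[i] for row in matrix)
        for i in range(len(matrix))
    ) / total ** 2
    return (observed - expected) / (1 - expected)


def precision_recall(matrix, i):
    """Precision (PPV) and recall (TPR) for class index i."""
    tp = matrix[i][i]
    predicted = sum(row[i] for row in matrix)  # TP + FP
    actual = sum(matrix[i])                    # TP + FN
    return tp / predicted, tp / actual


print(round(overall_accuracy(MATRIX), 5))  # 0.74566
print(round(cohen_kappa(MATRIX), 5))       # 0.57912
for i, label in enumerate(LABELS):
    precision, recall = precision_recall(MATRIX, i)
    print(label, round(precision, 5), round(recall, 5))
```

The no-information rate (NIR) of 0.49133 is the share of the largest actual class: 85 corrective commits out of 173 test samples.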

Usage

Use Python version 3.

First of all, install the required Python packages:

pip install -r requirements.txt

And then run the Python program:

python main.py
