Evaluating-CMG

Implementation of the research paper "Evaluating Commit Message Generation: To BLEU Or Not To BLEU?"

Implementation Environment

Please install the necessary libraries before running the code:

python==3.6.9
nltk==3.4.5
numpy==1.16.5
scikit-learn==0.22.1

Data & Models:

Our data is extracted from the MSR dataset. The data used for our experiments can be found in the "Dataset" folder.

The "human_annotations.csv" file contains the human-annotated scores for 100 pairs of reference and predicted commit messages.
A subset of the original MCMD dataset is used for our experiments. The .csv files containing pairs of reference and predicted sentences are named "model_MCMD(Number).csv", where "model" is any of the CMG models listed below and "(Number)" takes one of the values 1-5 according to the programming language (PL):

Number	PL
1	C++ (C plus plus)
2	C# (C sharp)
3	Java (Java)
4	JS (Javascript)
5	Py (Python)
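
The naming scheme above can be sketched as a small lookup. This is only an illustration: the helper name mcmd_csv_name and the placeholder model name "mymodel" are hypothetical, and the sketch assumes the parentheses in "(Number)" are part of the placeholder rather than of the actual file name.

```python
# Hypothetical helper (not part of the repository): maps a PL label from the
# table above to its number and builds the "model_MCMD(Number).csv" file name.
PL_TO_NUMBER = {"C++": 1, "C#": 2, "Java": 3, "JS": 4, "Py": 5}


def mcmd_csv_name(model, pl):
    """Return the assumed .csv file name for a model and a PL label."""
    if pl not in PL_TO_NUMBER:
        raise ValueError("Unknown PL %r; expected one of %s"
                         % (pl, sorted(PL_TO_NUMBER)))
    # Assumption: "(Number)" is replaced by the bare digit, without parentheses.
    return "%s_MCMD%d.csv" % (model, PL_TO_NUMBER[pl])


# "mymodel" is a placeholder, not one of the CMG models from the paper.
print(mcmd_csv_name("mymodel", "Java"))
```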

The Commit Message Generation (CMG) models considered in our experiments are:

Evaluation Metrics

The Machine Translation (MT) metrics considered in our experiments are: BLEU4, BLEUNorm, BLEUCC, METEOR, METEOR-NEXT, ROUGE-1, ROUGE-2, ROUGE-L, TER.

Research Questions Asked and Answered

RQ1: Which factors affect commit message quality?
RQ2: Which metric is best suited to evaluate commit messages?
RQ3: How do the CMG tools perform on the new metric?

Experimentation and Implementation

1. Effect of various factors on the MT metrics

The potential factors included in our study are Length, Word Alignment, Semantic Scoring, Case Folding, Punctuation Removal and Smoothing.
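
To make two of these factors concrete, here is a minimal sketch of case folding and punctuation removal applied to a commit message before scoring. It is an illustration only and not necessarily the preprocessing used in the repository scripts; the function names and the example messages are made up.

```python
import string


def case_fold(text):
    """Case folding: lower-case the whole message."""
    return text.lower()


def remove_punctuation(text):
    """Drop ASCII punctuation, then collapse repeated whitespace."""
    table = str.maketrans("", "", string.punctuation)
    return " ".join(text.translate(table).split())


def normalize(text):
    """Apply both factors; a metric would then be computed on the result."""
    return remove_punctuation(case_fold(text))


# Made-up example pair: the two messages differ only in case and punctuation,
# so they become identical once both factors are applied.
reference = "Fix typo in README."
predicted = "fix typo in readme"
print(normalize(reference) == normalize(predicted))
```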
To replicate the RQ1 section of the paper, run the "Effect of {Factor_name}.py" file under the "Experimental Results" folder to observe the effect of the {Factor_name} factor on the metrics.
To run the code on your own human-annotated dataset, replace human_annotations.csv with the path to your own human-annotated .csv file in the following code snippet of the "Effect of {Factor_name}.py" file under the "Experimental Results" folder. If your dataset uses a different number of human annotators (here, #annotators = 3), minor changes to the code are also needed.

with open('human_annotations.csv') as csvfile:
    ader = csv.reader(csvfile)
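
When adapting the snippet above to a different number of annotators, something along the following lines may help. This is a sketch under stated assumptions, not the repository's code: it assumes each row holds the reference, the predicted message, and then one score per annotator, with no header row, and it averages the annotator scores, which may differ from how the paper aggregates them. The function name read_annotations and the sample rows are hypothetical.

```python
import csv
import io

N_ANNOTATORS = 3  # matches the setup described above; change for your data


def read_annotations(csvfile, n_annotators=N_ANNOTATORS):
    """Return (refs, preds, mean_scores) read from an open CSV file.

    ASSUMED row layout: reference, predicted, score_1, ..., score_n.
    """
    refs, preds, mean_scores = [], [], []
    for row in csv.reader(csvfile):
        if not row:
            continue  # skip blank lines
        scores = [float(s) for s in row[2:2 + n_annotators]]
        if len(scores) != n_annotators:
            raise ValueError("expected %d scores, got %d"
                             % (n_annotators, len(scores)))
        refs.append(row[0])
        preds.append(row[1])
        mean_scores.append(sum(scores) / n_annotators)
    return refs, preds, mean_scores


# Demo on made-up in-memory rows instead of a real annotation file.
sample = io.StringIO(
    "fix typo in readme,fix readme typo,4,3,5\n"
    "add unit tests,update tests,2,2,3\n"
)
refs, preds, mean_scores = read_annotations(sample)
print(mean_scores)
```

With a real file, the same function can be called inside a `with open(...) as csvfile:` block in place of the in-memory sample.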

2. The Log-MNEXT metric

To replicate the RQ2 section of the paper, run the "The Log-MNEXT metric.py" file to obtain the metric's correlation with human evaluation scores.
To score any given pair of reference and predicted sentences, run the "The Log-MNEXT metric.py" file and then call the function log_mnext_score([reference], predicted). Note that reference and predicted must both be of type string; the reference is passed wrapped in a list.
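
Because the file name contains spaces, it cannot be imported with a plain import statement. One possible way to run it and get hold of the function from another script is sketched below. The helper names load_log_mnext_score and score_pair are hypothetical; only log_mnext_score and its call form come from the description above. Executing the file also runs its top-level code, so whatever data files it reads are assumed to be reachable from the working directory.

```python
def load_log_mnext_score(script_path="The Log-MNEXT metric.py"):
    """Execute the script and return the log_mnext_score function it defines."""
    # The script is run as if it were the main program; its globals are kept
    # alive in `namespace`, which the returned function keeps referencing.
    namespace = {"__name__": "__main__"}
    with open(script_path, encoding="utf-8") as f:
        source = f.read()
    exec(compile(source, script_path, "exec"), namespace)
    return namespace["log_mnext_score"]


def score_pair(score_fn, reference, predicted):
    """Score one pair: the reference string goes in a list, the prediction does not."""
    if not isinstance(reference, str) or not isinstance(predicted, str):
        raise TypeError("reference and predicted must both be strings")
    return score_fn([reference], predicted)


# Usage (requires the repository file in the working directory):
# log_mnext_score = load_log_mnext_score()
# print(score_pair(log_mnext_score, "fix typo in readme", "fix readme typo"))
```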

3. Performance of Log-MNEXT on CMG models

To replicate the RQ3 section of the paper, i.e., to observe the performance of the Log-MNEXT metric on a specific model for a particular PL of the MCMD dataset, update the model_MCMD(Number).csv file name in the code snippet of the "Log-MNEXT performance on the models.py" file under the "Experimental Results" folder: put the model name in place of model and the number 1, 2, 3, 4 or 5 in place of (Number).
To observe the performance of the metric on any other CMG model, replace model_MCMD(Number).csv in the code snippet below with the .csv file containing the reference sentences and the predicted sentences generated by your model.

refs=[]
preds=[]
with open('model_MCMD(Number).csv') as csvfile:
    ader = csv.reader(csvfile)
    for row in ader:
        refs.append(row[0])
        preds.append(row[1])
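
Once refs and preds are loaded as above, a per-model summary could be computed roughly as follows. This is a sketch, not the repository's code: it assumes the summary of interest is the mean score over all pairs, which may differ from what the script reports, and it takes the scoring function as a parameter. The exact_match scorer is only a stand-in for the demo and is not the Log-MNEXT metric.

```python
def corpus_mean_score(refs, preds, score_fn):
    """Average score_fn([reference], predicted) over all sentence pairs."""
    if len(refs) != len(preds):
        raise ValueError("refs and preds must have the same length")
    if not refs:
        raise ValueError("no sentence pairs to score")
    scores = [score_fn([ref], pred) for ref, pred in zip(refs, preds)]
    return sum(scores) / len(scores)


def exact_match(references, predicted):
    """Stand-in scorer for the demo only: 1.0 on an exact match, else 0.0."""
    return 1.0 if predicted in references else 0.0


# Made-up pairs; with the real metric, pass log_mnext_score instead.
demo_refs = ["fix typo in readme", "add unit tests"]
demo_preds = ["fix typo in readme", "update tests"]
print(corpus_mean_score(demo_refs, demo_preds, exact_match))
```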
