Evaluating-CMG

Implementation of the research paper "Evaluating Commit Message Generation: To BLEU Or Not To BLEU?"

Implementation Environment

Please install the necessary libraries before running the code:

  • python==3.6.9
  • nltk==3.4.5
  • numpy==1.16.5
  • scikit-learn==0.22.1

Data & Models:

Our data is extracted from the MSR dataset. The data used for our experiments can be found in the "Dataset" folder.

  • The "human_annotations.csv" file contains the human annotated scores for 100 pairs of reference and predicted commit messages.
  • A subset of the original MCMD dataset has been used for our experiments. The .csv files containing pairs of reference and predicted sentences are named in the general form "model_MCMD(Number).csv", where "model" can be any of the CMG models listed below and "(Number)" takes one of the values 1-5 according to the programming language (PL):
Number  PL
1       C++ (C plus plus)
2       C# (C sharp)
3       Java (Java)
4       JS (JavaScript)
5       Py (Python)
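
As a small illustration of the naming scheme above, the following sketch builds a file name from a model name and a language. It assumes that "(Number)" is replaced by the bare digit; the helper mcmd_filename, the PL_NUMBER mapping and the placeholder model name "mymodel" are hypothetical and not part of the repository.

```python
# Mapping taken from the table above: programming language -> file number.
PL_NUMBER = {"C++": 1, "C#": 2, "Java": 3, "JS": 4, "Py": 5}


def mcmd_filename(model, pl):
    """Return the .csv name for a model/language pair (hypothetical helper).

    Assumption: "(Number)" in "model_MCMD(Number).csv" stands for the bare
    digit, so the parentheses do not appear in the real file name.
    """
    if pl not in PL_NUMBER:
        raise ValueError("unknown programming language: " + repr(pl))
    return "{}_MCMD{}.csv".format(model, PL_NUMBER[pl])


# "mymodel" is a placeholder, not one of the CMG models of the paper.
print(mcmd_filename("mymodel", "Java"))
```

If the files in the "Dataset" folder follow a different pattern, only the format string needs to change.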

The Commit Message Generation (CMG) models considered in our experiments are:

Evaluation Metrics

The Machine Translation (MT) metrics considered in our experiments are: BLEU4, BLEUNorm, BLEUCC, METEOR, METEOR-NEXT, ROUGE-1, ROUGE-2, ROUGE-L and TER.
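
To give a feel for what such a metric computes on a reference and predicted pair, here is a minimal pure-Python sketch of ROUGE-L as an F1 score over the longest common subsequence of whitespace tokens. It is not the implementation used in this repository, which may tokenize differently or weight precision and recall differently; the function names lcs_length and rouge_l_f1 are hypothetical.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference, predicted):
    """ROUGE-L F1 on whitespace tokens (assumption: equal weight on P and R)."""
    ref_tokens = reference.split()
    pred_tokens = predicted.split()
    if not ref_tokens or not pred_tokens:
        return 0.0
    lcs = lcs_length(ref_tokens, pred_tokens)
    if lcs == 0:
        return 0.0
    precision = lcs / len(pred_tokens)
    recall = lcs / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


score = rouge_l_f1("fix typo in readme", "fix readme typo")
```

In this example the longest common subsequence has two tokens, so precision is 2/3 and recall is 2/4.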

Research Questions Asked and Answered

  • RQ1: Which factors affect commit message quality?
  • RQ2: Which metric is best suited to evaluate commit messages?
  • RQ3: How do the CMG tools perform on the new metric?

Experimentation and Implementation

1. Effect of various factors on the MT metrics

  • The potential factors included in our study are Length, Word Alignment, Semantic Scoring, Case Folding, Punctuation Removal and Smoothing.
  • To replicate the RQ1 section of the paper, run the "Effect of {Factor_name}.py" file under the "Experimental Results" folder to observe the effect of the {Factor_name} factor on the metrics.
  • To run the code on your own human-annotated dataset, replace human_annotations.csv with your own human-annotated .csv file in the following code snippet of the "Effect of {Factor_name}.py" file under the "Experimental Results" folder. If your number of human annotators differs (here, #annotators = 3), minor alterations to the code are also needed.
with open('human_annotations.csv') as csvfile:
    ader = csv.reader(csvfile)
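
The kind of alteration needed for a different number of annotators can be sketched as follows. The column layout is an assumption, not confirmed by the repository: column 0 holds the reference, column 1 the prediction, and the next n_annotators columns hold one numeric score each, with no header row. The helper load_annotations and the choice of averaging the annotator scores are likewise hypothetical.

```python
import csv
import io


def load_annotations(csvfile, n_annotators=3):
    """Read (reference, predicted, mean human score) triples.

    Assumed layout: reference, prediction, then n_annotators score columns.
    """
    rows = []
    for row in csv.reader(csvfile):
        if not row:
            continue  # skip blank lines
        scores = [float(s) for s in row[2:2 + n_annotators]]
        if len(scores) != n_annotators:
            raise ValueError(
                "expected {} scores, got {}".format(n_annotators, len(scores))
            )
        rows.append((row[0], row[1], sum(scores) / n_annotators))
    return rows


# Tiny in-memory example with two annotators instead of three.
sample = io.StringIO("fix typo,fix a typo,4,3\nadd test,remove test,1,2\n")
annotations = load_annotations(sample, n_annotators=2)
```

With a real file, pass the object returned by open('your_annotations.csv', newline='') in place of the in-memory sample.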

2. The Log-MNEXT metric

  • To replicate the RQ2 section of the paper, run the "The Log-MNEXT metric.py" file to obtain the correlation of the metric with human evaluation scores.
  • To score any given reference and predicted sentence pair, run the "The Log-MNEXT metric.py" file and then call the function log_mnext_score([reference], predicted). Both reference and predicted must be of type string; note that reference is passed inside a list.
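
Because the file name contains spaces, it cannot be loaded with a plain import statement. One possible way to get hold of the function from another script is sketched below using the standard importlib machinery. The helper load_log_mnext and the module name "log_mnext_metric" are hypothetical; the sketch assumes the file defines log_mnext_score at module level, as described above.

```python
import importlib.util


def load_log_mnext(script_path="The Log-MNEXT metric.py"):
    """Execute the script as a module and return its log_mnext_score function.

    The whole file is executed, so any replication code at module level
    (for example reading the human annotation scores) runs as well.
    """
    spec = importlib.util.spec_from_file_location("log_mnext_metric", script_path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module.log_mnext_score


# Usage (needs the repository file and its data in the working directory):
# log_mnext_score = load_log_mnext()
# score = log_mnext_score(["fix null pointer in parser"], "fix parser crash")
```

The example sentences in the usage comment are made up for illustration.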

3. Performance of Log-MNEXT on CMG models

  • To replicate the RQ3 section of the paper, i.e., to observe the performance of the Log-MNEXT metric on a specific model for a particular PL of the MCMD dataset, update the model_MCMD(Number).csv part of the code snippet in the "Log-MNEXT performance on the models.py" file under the "Experimental Results" folder: put the model name in place of model and the number 1, 2, 3, 4 or 5 in place of (Number).

  • To observe the performance of the metric on any other CMG model, replace the model_MCMD(Number).csv part of the code snippet below with the .csv file containing the reference sentences and the predicted sentences generated by your model.

import csv

refs = []   # reference commit messages (column 0)
preds = []  # predicted commit messages (column 1)
with open('model_MCMD(Number).csv') as csvfile:
    ader = csv.reader(csvfile)
    for row in ader:
        refs.append(row[0])
        preds.append(row[1])
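
Once refs and preds are filled, one plausible way to summarize a model is to average a sentence-level score over all pairs. The sketch below does this for any scoring function that follows the log_mnext_score([reference], predicted) calling convention. Whether the repository reports a plain mean is an assumption, and both mean_pair_score and the stand-in scorer exact_match are hypothetical.

```python
def mean_pair_score(refs, preds, score_fn):
    """Average score_fn([reference], predicted) over all sentence pairs."""
    if len(refs) != len(preds):
        raise ValueError("refs and preds must have the same length")
    if not refs:
        raise ValueError("no sentence pairs to score")
    scores = [score_fn([ref], pred) for ref, pred in zip(refs, preds)]
    return sum(scores) / len(scores)


# Stand-in scorer for illustration only: 1.0 on exact match, else 0.0.
def exact_match(references, predicted):
    return 1.0 if predicted in references else 0.0


example_refs = ["fix typo", "add test"]
example_preds = ["fix typo", "remove test"]
average = mean_pair_score(example_refs, example_preds, exact_match)
```

To use the real metric, pass log_mnext_score in place of exact_match.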
