cptanalatriste / code-review-reciprocity


Analysing reciprocity in code review using multivariate time-series analysis.

Home Page: https://ieeexplore.ieee.org/abstract/document/9796163

License: MIT License

Analysing Reciprocity in Code Review

This is Team Balloon's project for the MSR Virtual Hackathon 2022.

We analysed reciprocity in the code review process. Using vector autoregressive (VAR) models over GitHub's pull-request data, we explored whether there is a causal relationship between: 1) the reviews performed by a maintainer and 2) the acceptance of their own code contributions.
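As a reminder of what a VAR model looks like, here is the generic textbook form for two series. The symbols are notation introduced here for illustration: r_t stands for the reviews performed by a developer in period t, a_t for their accepted contributions in the same period, and p for the model order. The exact specification used in the study is described in the paper linked above.

```latex
\begin{aligned}
r_t &= c_1 + \sum_{i=1}^{p} \alpha_{1,i}\, r_{t-i} + \sum_{i=1}^{p} \beta_{1,i}\, a_{t-i} + \varepsilon_{1,t} \\
a_t &= c_2 + \sum_{i=1}^{p} \alpha_{2,i}\, r_{t-i} + \sum_{i=1}^{p} \beta_{2,i}\, a_{t-i} + \varepsilon_{2,t}
\end{aligned}
```

In this setting, a causal link from reviewing to acceptance is commonly assessed with a Granger-causality test, whose null hypothesis is that \alpha_{2,1} = \dots = \alpha_{2,p} = 0 (past reviews do not help predict accepted contributions). Whether the scripts apply exactly this test is an assumption made here.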

This repository contains the Python scripts for performing such analysis.

Installation

The code was tested on macOS using Anaconda Python 3.9 and Elasticsearch 7.15.2. Elasticsearch must be installed on your system. Before running the scripts, start Elasticsearch and set the global variable ELASTICSEARCH_HOST in the config.py file to your Elasticsearch URL.

To install Python dependencies, execute the following:

pip install -r requirements.txt

Finally, we use GitHub's API, via Perceval, to obtain pull-request data. For this to work, you need to obtain a personal access token from your GitHub account and assign it to the global variable GITHUB_API_TOKEN in the config.py file.
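As a rough guide, the two settings in config.py could look like the sketch below. Only the variable names ELASTICSEARCH_HOST and GITHUB_API_TOKEN come from this README. The local URL is an assumed placeholder, and reading the token from an environment variable is a suggestion made here, not something the repository is known to do.

```python
# config.py (sketch): variable names are from the README, values are placeholders.
import os

# URL of your running Elasticsearch instance.
# Assumption: a local instance on the default port; replace with your own URL.
ELASTICSEARCH_HOST = "http://localhost:9200"

# GitHub personal access token used by Perceval.
# Assumption: the token is read from an environment variable of the same name
# so that it does not end up in version control.
GITHUB_API_TOKEN = os.environ.get("GITHUB_API_TOKEN", "")
```

Assigning the token directly as a string literal also works, provided the file is never committed with the real value in it.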

Usage

The file dataloading.py contains functions for retrieving pull requests from GitHub and storing them in Elasticsearch. The following snippet retrieves the pull requests of the Kubernetes repository created over the last 10 years:

from dataloading import get_and_store


def start_extraction(owner: str, repository: str) -> None:
    # First call: factor=0, with new_index=True to start a new index.
    get_and_store(owner, repository, factor=0, new_index=True)
    # Remaining calls: factors 1 to 9, reusing the existing index.
    for year_factor in range(1, 10):
        get_and_store(owner, repository, factor=year_factor, new_index=False)


if __name__ == "__main__":
    start_extraction(owner="kubernetes", repository="kubernetes")

This code stores the data in an Elasticsearch index called kubernetes-kubernetes. To perform the reciprocity analysis on this project, use the functions from the devanalysis.py file:

from typing import Tuple

from elasticsearch import Elasticsearch

from aggregation import PRS_REVIEWED_AND_MERGED, PRS_AUTHORED_AND_MERGED
from config import ELASTICSEARCH_HOST
from devanalysis import analyse_project

if __name__ == "__main__":
    # Connect to the Elasticsearch instance configured in config.py.
    elastic_search: Elasticsearch = Elasticsearch(ELASTICSEARCH_HOST)
    # The two variables to model together.
    variables: Tuple = (PRS_REVIEWED_AND_MERGED, PRS_AUTHORED_AND_MERGED)
    # Analyse the index at monthly granularity, selecting the VAR order by AIC.
    project_prs, project_analysis = analyse_project(
        elastic_search, "kubernetes-kubernetes", "month", variables, "aic"
    )

This code generates monthly time series from the data stored in the kubernetes-kubernetes index. For the order of the VAR model, it chooses the value that minimises the Akaike Information Criterion (AIC). For each developer, plots are stored in the img folder and the output of the statistical tests in the txt folder.
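To make the idea of a monthly time series concrete, the sketch below counts events per calendar month using only the standard library. It is an illustration, not the repository's aggregation code: the function name monthly_counts and the sample timestamps are hypothetical, and the timestamp format assumes the ISO 8601 UTC strings returned by GitHub's API.

```python
from collections import Counter
from datetime import datetime
from typing import Dict, Iterable, List


def monthly_counts(timestamps: Iterable[str]) -> Dict[str, int]:
    """Count events per calendar month, filling months without events with zero."""
    dates: List[datetime] = sorted(
        datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ") for ts in timestamps
    )
    if not dates:
        return {}
    counts = Counter((d.year, d.month) for d in dates)
    series: Dict[str, int] = {}
    year, month = dates[0].year, dates[0].month
    last = (dates[-1].year, dates[-1].month)
    # Walk month by month from the first to the last observed month.
    while (year, month) <= last:
        series[f"{year:04d}-{month:02d}"] = counts.get((year, month), 0)
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return series


# Hypothetical merge timestamps for one developer.
merged_at = ["2021-11-03T10:00:00Z", "2021-11-20T08:30:00Z", "2022-01-05T17:45:00Z"]
print(monthly_counts(merged_at))
# prints {'2021-11': 2, '2021-12': 0, '2022-01': 1}
```

Filling empty months with zero matters for this kind of analysis, because a VAR model expects observations at regular intervals.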

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

License

MIT
