

Home Page: https://reproducibilityproject.github.io/effortly/

License: MIT License


effortly's Introduction

effortly

A preliminary analysis of the effort required for reproducing computational science scholarly articles.

About

The idea of estimating the underlying effort involved in reproducing scholarly articles is relatively new, and the NIH article serves as a good starting point for historical perspective on the topic. In an attempt to estimate the "Effort of Reproducibility", we collected replication reports from the Machine Learning Reproducibility Challenge (2020, 2021). The primary goal of the ML Reproducibility Challenge was to have a community of researchers investigate the claims made in scholarly articles published at top conferences. The community selected papers and attempted to verify each paper's claims by reproducing its computational experiments. The reports published on ReScience were a by-product outlining the effort behind reproducing the papers. We believe these reports are a good starting point for understanding the operational framework of reproducibility: each report details the scope of reproducibility, along with what was easy and what was difficult for the researchers while replicating the original article.

Overview

Repository structure

├── data
│   ├── original-pdfs
│   ├── pdfs
│   │   ├── 10_5281-zenodo_1003214.pdf
│   │   ├── ...
│   │   ├── 10_5281-zenodo_890884.pdf
│   ├── ReScience.csv
│   ├── ReScience_JCDL-23.csv
│   ├── ReScience_ML_repro_challenge_alpha.csv
│   └── sciparse_outputs
│       ├── 10_5281-zenodo_1003214.json
│       ├── .........
│       ├── 10_5281-zenodo_1289889.json
├── LICENSE
├── media
│   ├── inductive_analysis.png
│   └── quantitative_analysis.png
├── notebooks
│   └── JCDL-23_Effort_of_Reproducibility.ipynb
├── README.md
├── slides
│   ├── JCDL'23 _ Effort of Reproducibility.pdf
│   └── JCDL'23 _ Effort of Reproducibility.pptx
└── src
    └── util.py
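
The layout above can be loaded directly with standard tooling. A minimal sketch, assuming only the paths shown in the tree (nothing about the file contents beyond their formats):

from pathlib import Path

import pandas as pd

# Load the main ReScience CSV and list the science-parse outputs,
# following the repository layout shown above.
reports = pd.read_csv("data/ReScience.csv")
parsed_reports = sorted(Path("data/sciparse_outputs").glob("*.json"))

print(reports.shape)
print(parsed_reports[0].name)  # e.g. 10_5281-zenodo_1003214.json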

Notebook

Open In Colab

Data Collection

In an effort to build a consolidated repository of datasets pertaining to the reproducibility of scholarly articles, we initiated reproducibility/datasets. The central idea is to work towards studying "All things Reproducibility in Science". Collecting the ReScience data was one part of that effort, and the data were gathered using the methods in src/util.py.
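
The helpers in src/util.py are not reproduced here. As a rough sketch of what such collection can look like, the replication reports are archived on Zenodo, whose public REST API can list records; the community query parameter ("rescience") and the response fields read below are assumptions, not a description of what util.py actually does:

import requests

# Hypothetical sketch: list ReScience records via the public Zenodo API.
# The "rescience" community parameter and the fields read from the response
# are assumptions, not a description of src/util.py.
resp = requests.get(
    "https://zenodo.org/api/records",
    params={"communities": "rescience", "size": 20},
    timeout=30,
)
resp.raise_for_status()

for hit in resp.json().get("hits", {}).get("hits", []):
    print(hit.get("doi"), "-", hit.get("metadata", {}).get("title"))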

Data Description

The Machine Learning Reproducibility Challenge (2020, 2021) yielded a total of 87 articles. Of these, 15 were removed because they did not belong to the discipline of machine learning, and two more were removed because they were editorials. The final dataset comprised 70 articles, and the analysis was performed on these articles.
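
For illustration, the filtering described above can be expressed as a pair of boolean masks over the dataset. The column names used here (discipline, article_type) are hypothetical placeholders, not the actual headers of ReScience_ML_repro_challenge_alpha.csv:

import pandas as pd

# Hypothetical sketch of the filtering step; column names are placeholders.
reports = pd.read_csv("data/ReScience_ML_repro_challenge_alpha.csv")

# Drop the 15 non-ML articles and the 2 editorials, leaving 70 of the 87.
is_ml = reports["discipline"] == "machine learning"
is_paper = reports["article_type"] != "editorial"
final_dataset = reports[is_ml & is_paper]

print(len(final_dataset))  # expected: 70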

Authors

Akhil Pandey

PI and Co-PI

Hamed Alhoori, David Koop

Acknowledgement

This work is supported in part by NSF Grant No. 2022443.

Citation

If you find this work useful, please cite our paper:

@inproceedings{akella2023JCDL,
  title={{Laying foundations to quantify the "Effort of Reproducibility"}},
  author={Akhil Pandey Akella and David Koop and Hamed Alhoori},
  booktitle={Proceedings of the 22nd ACM/IEEE Joint Conference on Digital Libraries (JCDL)},
  year={2023}
}


effortly's Issues

Add info about original articles

  • Include science-parse outputs for the original articles inside src/original-article-sciparse-outputs.
  • Add PDFs of the original articles, with the same file names as the reproduced articles, in a directory src/original-pdfs.
  • Include full-text information about the original articles inside the CSV.
  • Utilize the full-text information to build structural features from the original articles (a sketch of such features is shown below).
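
One way such structural features could be derived from the science-parse outputs is sketched below; the JSON keys used ("sections", "references", "abstractText") are assumptions about the science-parse schema, and the feature set is only illustrative:

import json
from pathlib import Path

def structural_features(path):
    """Hypothetical structural features from a science-parse JSON output."""
    with Path(path).open(encoding="utf-8") as fh:
        parsed = json.load(fh)
    # The key names below are assumptions about the science-parse schema.
    return {
        "num_sections": len(parsed.get("sections") or []),
        "num_references": len(parsed.get("references") or []),
        "has_abstract": bool(parsed.get("abstractText")),
    }

print(structural_features("data/sciparse_outputs/10_5281-zenodo_1003214.json"))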

Update the ReScience ML dataset

  • Encode the sections "Scope of Reproducibility", "What was easy", and "What was difficult" (a sketch of one way to extract them is shown below).
  • Include a README.md inside src to outline the thought process behind encoding the above sections.
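
A minimal sketch of pulling the three report sections out of a report's full text so they can be encoded later; heading spellings vary across reports, so the patterns and the sample text below are assumptions:

import re

# Headings to extract; spellings are assumed to match the report text exactly.
SECTION_HEADINGS = ["Scope of Reproducibility", "What was easy", "What was difficult"]

def extract_sections(full_text):
    """Map each target heading to the text that follows it."""
    pattern = "|".join(re.escape(h) for h in SECTION_HEADINGS)
    # Split on the headings while keeping them, then pair heading -> body.
    parts = re.split(f"({pattern})", full_text)
    return {parts[i]: parts[i + 1].strip() for i in range(1, len(parts) - 1, 2)}

sample = (
    "Scope of Reproducibility We verify claims 1 and 2. "
    "What was easy The code was public. "
    "What was difficult Hyperparameters were missing."
)
print(extract_sections(sample))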
