han-tun / orderly

This project forked from vimc/orderly

Lightweight Reproducible Reporting for R

Home Page: https://vimc.github.io/orderly

License: Other

orderly

Project Status: Active. The project has reached a stable, usable state and is being actively developed.

  1. an attendant in a hospital responsible for the non-medical care of patients and the maintenance of order and cleanliness.
  2. a soldier who carries orders or performs minor tasks for an officer.

orderly is a package designed to help make analysis more reproducible. Its principal aim is to automate a series of basic steps in the process of writing analyses, making it easy to:

  • track all inputs into an analysis (packages, code, and data resources)
  • store multiple versions of an analysis where it is repeated
  • track outputs of an analysis
  • create analyses that depend on the outputs of previous analyses

With orderly we have two main hopes:

  • analysts can write code that will straightforwardly run on someone else's machine (or a remote machine)
  • when an analysis that is run several times starts behaving differently, it will be easy to see when the outputs started changing and which inputs started changing at the same time

orderly requires a few conventions around the organisation of a project, and after that tries to keep out of your way. These conventions are designed to make collaborative development with git easier by minimising conflicts, and to make backup easier by using an append-only storage system.

The problem

One often-touted advantage of R over point-and-click analysis packages is that a scripted analysis is more reproducible. However, essentially all analyses depend on external resources - packages, data, code, and R itself; any change in these external resources might change the results. Preventing such changes is not always possible, but tracking them should be straightforward - all we need to know is what is being used.

For example, while reproducible research has become synonymous with literate programming, this approach often increases the number of external resources. A typical knitr document will depend on:

  • the source file (.Rmd or .Rnw)
  • templates used for styling
  • data that is read in for the analysis
  • code that is directly read in with source

The orderly package helps by

  • collecting external resources before an analysis
  • ensuring that all required external resources are identified
  • removing any manual work in tracking information about these external resources
  • allowing reports to be run multiple times and making it easy to see what changed and why

The core problem is that analyses have no general interface. Consider in contrast the role that functions take in programming. All functions have a set of arguments (inputs) and a return value (outputs). With orderly, we borrow this idea, and each piece of analysis will require that the user describes what is needed and what will be produced.

The process

The user describes the inputs of their analysis, including:

  • SQL queries (if using databases)
  • Required R sources
  • External resource files (e.g., csv data files, Rmd files, templates)
  • Packages required to run the analysis
  • Dependencies on previously run analyses

The user also provides a list of "artefacts" (file-based results) that they will produce.

Then orderly:

  1. creates a new empty directory
  2. copies over only the declared file resources
  3. loads only the declared packages
  4. loads the declared R sources
  5. evaluates any SQL queries to create R objects
  6. then runs the analysis
  7. verifies that the declared artefacts are produced
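
The steps above can be sketched in base R. This is a minimal illustration under stated assumptions, not orderly's implementation: the function name run_isolated and its arguments are hypothetical, and steps 4 and 5 (loading R sources and evaluating SQL queries) are left out for brevity.

```r
# Hypothetical helper (not part of orderly): run a script in a fresh
# directory that contains only the declared resources, then check that
# the declared artefacts exist.
run_isolated <- function(script, resources, packages, artefacts) {
  # 1. create a new empty directory
  workdir <- tempfile("report_")
  dir.create(workdir)

  # 2. copy over only the declared file resources (plus the script itself)
  files <- c(script, resources)
  ok <- file.copy(files, workdir)
  if (!all(ok)) {
    stop("failed to copy: ", paste(files[!ok], collapse = ", "))
  }

  # 3. load only the declared packages
  for (p in packages) {
    library(p, character.only = TRUE)
  }

  # 6. run the analysis from inside the new directory
  owd <- setwd(workdir)
  on.exit(setwd(owd), add = TRUE)
  source(basename(script), local = new.env(parent = globalenv()))

  # 7. verify that the declared artefacts were produced
  missing <- artefacts[!file.exists(artefacts)]
  if (length(missing) > 0) {
    stop("artefacts not produced: ", paste(missing, collapse = ", "))
  }
  workdir
}

# Usage with throwaway files (all file names are illustrative)
src <- tempfile("src_")
dir.create(src)
writeLines(c("x,y", "1,2", "3,4"), file.path(src, "data.csv"))
writeLines(c('d <- read.csv("data.csv")',
             'write.csv(colMeans(d), "summary.csv")'),
           file.path(src, "script.R"))
out <- run_isolated(file.path(src, "script.R"),
                    resources = file.path(src, "data.csv"),
                    packages = "stats",
                    artefacts = "summary.csv")
list.files(out)
```

Because the script runs in a directory holding only what was declared, an undeclared file dependency fails immediately rather than silently working on one machine and not another.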

It then stores metadata alongside the analysis including md5 hashes of all inputs and outputs, copies of data extracted from the database, a record of all R packages loaded at the end of the session, and (if using git) information about the git state (hash, branch and status).
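
As a rough sketch of the kind of metadata described here, the base R snippet below hashes input and output files with tools::md5sum and records the loaded packages. The function collect_metadata and the field names are hypothetical and do not reflect orderly's actual storage format; database extracts and git state are omitted.

```r
# Hypothetical helper (not part of orderly): collect metadata about a
# finished run.
collect_metadata <- function(inputs, outputs) {
  hash_files <- function(files) {
    # md5 hash of each file, named by file name rather than full path
    stats::setNames(unname(tools::md5sum(files)), basename(files))
  }
  list(
    hash_inputs = hash_files(inputs),
    hash_outputs = hash_files(outputs),
    # namespaces loaded at the time of the call, i.e. the end of the run
    packages = sort(loadedNamespaces()),
    r_version = R.version.string,
    time = Sys.time()
  )
}

# Usage with throwaway files (file names are illustrative)
dir <- tempfile("meta_")
dir.create(dir)
input <- file.path(dir, "data.csv")
output <- file.path(dir, "summary.csv")
writeLines(c("x,y", "1,2"), input)
writeLines(c("mean_x,mean_y", "1,2"), output)
meta <- collect_metadata(inputs = input, outputs = output)
str(meta)
```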

Then if one of the dependencies of a report changes (the used data, code, etc), we have metadata that can be queried to identify the likely source of the change.
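
To illustrate the idea of querying metadata for the source of a change, the sketch below compares named vectors of file hashes from two runs. The function diff_hashes and the placeholder hash values are hypothetical; orderly's own metadata and query interface may look quite different.

```r
# Hypothetical helper (not part of orderly): given named hash vectors from
# two runs, report which files were added, removed, or changed.
diff_hashes <- function(old, new) {
  common <- intersect(names(old), names(new))
  list(
    added = setdiff(names(new), names(old)),
    removed = setdiff(names(old), names(new)),
    changed = common[old[common] != new[common]]
  )
}

# Placeholder hashes for two runs of the same report
run1 <- c("script.R" = "aaa", "data.csv" = "bbb")
run2 <- c("script.R" = "aaa", "data.csv" = "ccc", "extra.csv" = "ddd")
res <- diff_hashes(run1, run2)
res$changed  # "data.csv"
res$added    # "extra.csv"
```

Here the script is unchanged between runs, so a change in outputs would most plausibly be traced to the modified data file or the newly added one.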

Workflows with orderly

In the MRC Centre for Global Infectious Disease Analysis we use orderly on two major projects.

The workflows we have developed here are oriented towards collaborative groups of researchers - other workflows are possible (indeed orderly is also designed to support a decentralised workflow, though this has not been used in practice yet).

In these projects we have a group of researchers who develop and test analyses locally. These are developed on a branch in git and then run on a centralised staging environment (a duplicate of our production environment). The code and outputs are reviewed with the help of GitHub's "Pull requests" and then the reports are run on our production environment.

Interaction with the remote environments is achieved using an HTTP API which orderly itself transparently uses, so that reports can be run remotely, directly from R. The remote systems also include an interactive web interface that can be used to explore and download versions of analyses, as well as run new ones.

Internal database schema

orderly has a database, which should be the preferred way of querying the report archive from other programs. The schema is programmatically described at inst/database/schema.yml and automatically generated database documentation is available here.

Testing

There is a set of regression tests that require reference data. Enable these by running the script ./scripts/copy_reference, which creates the data in tests/testthat/reference.

Installation

Install orderly from CRAN with

install.packages("orderly")

To install our internally released version (which might be ahead of CRAN) via drat, use

# install.packages("drat")
drat:::add("vimc")
install.packages("orderly")

License

MIT © Imperial College of Science, Technology and Medicine
