Prepare electronic medical record data from the UK Biobank for time-to-event analyses

License: MIT License

UKBBcleanR: Prepare electronic medical record data from the UK Biobank for time-to-event analyses

Date repository last updated: January 26, 2023

Overview

The UKBBcleanR package contains an R function that prepares time-to-event data from raw UK Biobank electronic medical record data. The prepared data can be used for cancer outcomes, but could be modified for other health outcomes. This package is not available on CRAN.

Installation

To install the development version from GitHub:

devtools::install_github("machiela-lab/UKBBcleanR")

Available function(s)

Function    Description
tte         Prepares time-to-event data from raw UK Biobank electronic medical record data.

The repository also includes the resources and code to create the project hex sticker.

Authors

  • Alexander Depaulis - Integrative Tumor Epidemiology Branch (ITEB), Division of Cancer Epidemiology and Genetics (DCEG), National Cancer Institute (NCI), National Institutes of Health (NIH), Rockville, Maryland (MD), USA - GitHub

  • Derek W. Brown - ITEB, DCEG, NCI, NIH, Rockville, MD, USA (original) - GitHub - ORCID

  • Aubrey K. Hubbard - ITEB, DCEG, NCI, NIH, Rockville, MD, USA - ORCID

See also the list of contributors who participated in this package, including:

  • Ian D. Buller - Social & Scientific Systems, Inc., a division of DLH Corporation, Silver Spring, Maryland (current) - Occupational and Environmental Epidemiology Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Rockville, Maryland (original) - GitHub - ORCID

  • Mitchell J. Machiela - ITEB, DCEG, NCI, NIH, Rockville, MD, USA - GitHub - ORCID

Getting Started

The tte function requires several raw UK Biobank variables to run correctly. A detailed list of the required variables is provided in the README_required_variables.txt file.

Data can be loaded in the tte function in two ways:

  • The user can use setwd() to set the working directory to the folder where the individual data sets are stored.

    • NOTE: These individual data sets must contain the required variables and have names that match those in the README_required_variables.txt file. Example data is available within the package.
  • The user can generate a single data set containing all the variables of interest. This data set can then be loaded into the tte function using the combined_data argument. Example data is available within the package.
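
For the second option, one way to assemble a single data set is to merge the individual data sets on a shared participant identifier. The following is a minimal sketch in base R, not the package's own procedure: the data frame names, the column names (eid, sex, diag_icd10, assess_date), and the toy values are all hypothetical placeholders. The variables actually required are those listed in README_required_variables.txt.

```r
# Hypothetical toy data sets; real UK Biobank extracts will have different
# names and columns (see README_required_variables.txt).
baseline <- data.frame(eid = c(1, 2, 3), sex = c("F", "M", "F"))
diagnoses <- data.frame(eid = c(1, 3), diag_icd10 = c("C911", "D47"))
assessment <- data.frame(
  eid = c(1, 2, 3),
  assess_date = as.Date(c("2008-01-15", "2009-06-30", "2010-03-02"))
)

# Merge all data sets on the shared identifier. all = TRUE performs a full
# outer join, so participants missing from one data set are kept with NA.
combined <- Reduce(
  function(x, y) merge(x, y, by = "eid", all = TRUE),
  list(baseline, diagnoses, assessment)
)

str(combined)
```

A data frame built this way could then be passed to tte through the combined_data argument, provided it contains every required variable.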

Usage

# ------------------ #
# Necessary packages #
# ------------------ #

library(UKBBcleanR)

# -------- #
# Settings #
# -------- #

##### Input UKBBcleanR sample data

# Use combined data set
testdata <- as.data.frame(combined_data)

# Set ICD-10 outcome of interest
cancer_outcome <- c("C911")

# Set prevalent cancers to identify in data cleaning
prevalent_cancers <- c("D37", "D38", "D39", "D40", "D41", "D42",
                       "D43", "D44", "D45", "D46", "D47", "D48")

# Set incident cancers to identify in data cleaning
incident_cancers <- c("C900")

# ------- #
# Run tte #
# ------- #

# Run without removing prevalent cancers from analysis
test1 <- tte(combined_data = testdata, 
             cancer_of_interest_ICD10 = cancer_outcome,
             prevalent_cancer_list = prevalent_cancers, 
             prevalent_C_cancers = TRUE, 
             incident_cancer_list = incident_cancers, 
             remove_prevalent_cancer = FALSE, 
             remove_self_reported_cancer = FALSE)
            
table(test1$case_control_cancer_ignore)  # tte outcome ignoring other incident cancers
table(test1$case_control_cancer_control) # tte outcome controlling for other incident cancers


# Run with removing prevalent cancers from analysis
test2 <- tte(combined_data = testdata, 
             cancer_of_interest_ICD10 = cancer_outcome,
             prevalent_cancer_list = prevalent_cancers, 
             prevalent_C_cancers = TRUE, 
             incident_cancer_list = incident_cancers, 
             remove_prevalent_cancer = TRUE, 
             remove_self_reported_cancer = TRUE)
table(test2$case_control_cancer_ignore)  # tte outcome ignoring other incident cancers
table(test2$case_control_cancer_control) # tte outcome controlling for other incident cancers
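
The prevalent_cancers vector in the usage example covers the consecutive ICD-10 categories D37 through D48, so it can also be generated rather than typed out. A small sketch using base R only; the name prevalent_cancers_alt is introduced here purely for illustration:

```r
# Build "D37", "D38", ..., "D48" from the numeric range
prevalent_cancers_alt <- paste0("D", 37:48)

# The same vector as written out by hand in the usage example
prevalent_cancers <- c("D37", "D38", "D39", "D40", "D41", "D42",
                       "D43", "D44", "D45", "D46", "D47", "D48")

# Check that both approaches give exactly the same character vector
identical(prevalent_cancers_alt, prevalent_cancers)
```

Generating the codes this way only works for unbroken ranges; a list with gaps is easier to read when written out explicitly.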

Vignette

We provide a vignette with a practical example and a walk-through of the provided example data.

Funding

This package was developed while the first author was a participant in the 2022 National Institutes of Health Summer Internship Program in Biomedical Research, the second author was a postdoctoral fellow supported by the Cancer Prevention Fellowship Program at the National Cancer Institute (NCI), and the third author was a postdoctoral fellow in the NCI Division of Cancer Epidemiology and Genetics.

Acknowledgments

When citing this package for publication, please cite it as follows:

citation("UKBBcleanR")

Questions? Feedback?

For questions about the package, please contact the maintainer, Dr. Derek Brown, or submit a new issue.
