
amylei96 / unsupervised-learning-breast-cancer-subtypes


Unsupervised clustering of transcriptomic and proteomic data for breast cancer patients


Unsupervised Learning of Breast Cancer Subtypes

Overview

Breast cancer is the most common type of cancer in women regardless of age, ethnicity, or race. It is a highly heterogeneous disease that is commonly divided into four major molecular subtypes: Basal-like, HER2-enriched, Luminal A, and Luminal B. Each subtype calls for different treatment, depending on whether three biomarkers are expressed (positive status) or not expressed (negative status): estrogen receptor (ER), progesterone receptor (PR), and human epidermal growth factor receptor 2 (HER2).

A panel of 50 genes known as the PAM50 signature is currently used to classify subtypes at the molecular level from transcriptomic data; however, studies have shown that this classifier is not perfect. In addition, transcriptomic data does not capture the important role of proteins in the cell signaling pathways that promote cell proliferation and growth in breast cancer. This project uses an unsupervised learning approach to assess whether proteomic data adds information beyond transcriptomic data for subtype clustering, using transcriptomic data for 17,607 genes and proteomic data for 7,853 proteins from 77 breast cancer patients.

Raw Data

Raw -omics and clinical data for the breast cancer patients are available from Mertins et al. (2016). For this project, The Cancer Genome Atlas (TCGA) sample identifiers have been stripped of their 'TCGA-' prefix.
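Applying the same identifier convention to another TCGA-labelled table is a one-line string operation (in R, `sub("^TCGA-", "", ids)`). A minimal Python sketch, assuming identifiers are plain strings; the function names and example identifiers are hypothetical:

```python
from typing import List

TCGA_PREFIX = "TCGA-"


def strip_tcga_prefix(sample_id: str) -> str:
    """Remove a leading 'TCGA-' from one identifier; IDs without it are returned unchanged."""
    if sample_id.startswith(TCGA_PREFIX):
        return sample_id[len(TCGA_PREFIX):]
    return sample_id


def strip_all(sample_ids: List[str]) -> List[str]:
    """Apply strip_tcga_prefix to every identifier in a list."""
    return [strip_tcga_prefix(s) for s in sample_ids]


# Hypothetical identifiers, for illustration only.
print(strip_all(["TCGA-AB-1234", "AB-5678"]))  # ['AB-1234', 'AB-5678']
```

Only a leading prefix is removed, so identifiers that were already shortened pass through untouched and the function can safely be applied twice.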

Directories

data

  • rna.csv: global transcriptomic data
  • rna_filtered.csv: transcriptomic data filtered for genes present in at least 90% of the samples
  • rna_pam50.csv: transcriptomic data for PAM50 genes
  • rna_protein_pam50_norm.csv: normalized transcriptomic and proteomic data for PAM50 genes and proteins
  • rna_pam50_mofa_lf6_n47.csv: transcriptomic data for the top 47 genes in MOFA latent factor 6 (LF6)
  • protein.csv: global proteomic data
  • protein_filtered.csv: proteomic data filtered for proteins present in at least 90% of the samples
  • protein_pam50.csv: proteomic data for PAM50 proteins
  • mofa_trained_model.RData: trained Multi-Omics Factor Analysis (MOFA) model fitted on the filtered transcriptomic and proteomic data (see MOFA for a complete guide to training a MOFA model)
  • samples.csv: vital status, PAM50 subtype, ER, PR, and HER2 marker status for each patient
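The `_filtered` files keep only features present in at least 90% of the samples. A minimal sketch of that rule, assuming a matrix stored as a dict that maps each feature name to its per-sample values, with `None` marking a missing measurement. The representation, function name, and feature names are hypothetical, and the repository's own scripts may define "present" differently (for example, as non-zero expression):

```python
from typing import Dict, List, Optional

# feature name -> one value per sample; None marks a missing measurement.
Matrix = Dict[str, List[Optional[float]]]


def filter_by_presence(matrix: Matrix, min_fraction: float = 0.9) -> Matrix:
    """Keep features observed (non-missing) in at least min_fraction of samples."""
    kept: Matrix = {}
    for feature, values in matrix.items():
        if not values:
            continue  # no samples at all: nothing to keep
        present = sum(v is not None for v in values)
        # Compare the observed fraction directly instead of rounding a threshold count.
        if present / len(values) >= min_fraction:
            kept[feature] = values
    return kept


# Hypothetical 10-sample matrix.
toy = {
    "GENE_A": [1.0] * 10,                # 10/10 observed -> kept
    "GENE_B": [1.0] * 9 + [None],        # 9/10 observed  -> kept
    "GENE_C": [1.0] * 8 + [None, None],  # 8/10 observed  -> dropped
}
print(sorted(filter_by_presence(toy)))  # ['GENE_A', 'GENE_B']
```

With 77 patients, the 90% threshold means a feature must be observed in at least 70 samples, since 69/77 is only about 89.6%.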

src

  • heatmap.R: function for plotting heatmaps using the pheatmap package
  • hierarchicalclustering.R: functions for hierarchical clustering of transcriptomic and proteomic data
  • normalization.R: functions for row-median centering, log-transformation of transcriptomic data and imputation of missing values in proteomic data
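The three normalization steps can be sketched on a single row (one gene or protein across samples). This is a minimal illustration, not a port of normalization.R: the log base, the pseudocount of 1, and imputing with the row minimum are all assumptions chosen for the example, and the function names are hypothetical:

```python
import math
import statistics
from typing import List, Optional

# One feature (gene or protein) across samples; None marks a missing value.
Row = List[Optional[float]]


def log2_transform(row: Row, pseudocount: float = 1.0) -> Row:
    """Return log2(x + pseudocount) for observed values; missing values stay None."""
    return [None if x is None else math.log2(x + pseudocount) for x in row]


def median_center(row: Row) -> Row:
    """Subtract the median of the observed values so the row is centered at 0."""
    observed = [x for x in row if x is not None]
    if not observed:
        return list(row)  # nothing to center
    med = statistics.median(observed)
    return [None if x is None else x - med for x in row]


def impute_row_min(row: Row) -> List[float]:
    """Fill missing values with the row's smallest observed value (hypothetical choice)."""
    observed = [x for x in row if x is not None]
    if not observed:
        raise ValueError("cannot impute a row with no observed values")
    fill = min(observed)
    return [fill if x is None else x for x in row]


# Transcriptomic row: log-transform, then median-center.
print(median_center(log2_transform([0.0, 1.0, 3.0, 7.0])))  # [-1.5, -0.5, 0.5, 1.5]

# Proteomic row with one missing value: median-center, then impute.
print(impute_row_min(median_center([0.4, None, -0.2, 1.0])))
```

Centering before imputing means the median is computed from real measurements only, so the filled-in values do not shift the center of the row.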

output

  • output files generated by the analysis scripts

Analysis

  • hierarchicalclustering_analysis.R: hierarchical clustering of transcriptomic and proteomic data
  • hierarchicalclustering_quantify.R: quantitative analysis of heterogeneity in the clusters produced by hierarchicalclustering_analysis.R and stored in output/hierarchicalclustering_clusters.xlsx
  • clustering_results.xlsx: manual mapping of cluster assignments in hierarchical_clusters.xlsx to PAM50 subtype names; includes columns for patient identifiers and original PAM50 subtype assignment
  • mofa.R: MOFA of filtered transcriptomic and proteomic data
  • gsea.R: gene set enrichment analysis of top 47 genes in MOFA LF6
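The README does not spell out how cluster heterogeneity is quantified. As a hypothetical illustration only, one common approach compares each cluster's membership against the original PAM50 labels using per-cluster Shannon entropy and overall purity; these metrics, the function names, and the toy data below are assumptions, not results from this project:

```python
import math
from collections import Counter, defaultdict
from typing import Dict


def cluster_composition(clusters: Dict[str, int], pam50: Dict[str, str]) -> Dict[int, Counter]:
    """Count PAM50 labels among the patients assigned to each cluster."""
    composition: Dict[int, Counter] = defaultdict(Counter)
    for patient, cluster in clusters.items():
        composition[cluster][pam50[patient]] += 1
    return dict(composition)


def shannon_entropy(counts: Counter) -> float:
    """Entropy in bits of a cluster's label distribution; 0 means a single subtype."""
    total = sum(counts.values())
    return sum((c / total) * math.log2(total / c) for c in counts.values() if c > 0)


def purity(composition: Dict[int, Counter]) -> float:
    """Fraction of patients whose PAM50 label matches their cluster's majority label."""
    total = sum(sum(c.values()) for c in composition.values())
    return sum(max(c.values()) for c in composition.values()) / total


def majority_labels(composition: Dict[int, Counter]) -> Dict[int, str]:
    """Most common PAM50 label per cluster (ties go to the label counted first)."""
    return {k: c.most_common(1)[0][0] for k, c in composition.items()}


# Hypothetical toy assignments, not results from this project.
clusters = {"P1": 1, "P2": 1, "P3": 1, "P4": 2, "P5": 2, "P6": 2}
pam50 = {"P1": "Basal", "P2": "Basal", "P3": "Basal",
         "P4": "LumA", "P5": "LumA", "P6": "LumB"}

comp = cluster_composition(clusters, pam50)
for k in sorted(comp):
    print(k, dict(comp[k]), round(shannon_entropy(comp[k]), 3))
print("purity:", round(purity(comp), 3))
print(majority_labels(comp))
```

A majority-label mapping like `majority_labels` is one automated counterpart to the manual cluster-to-subtype mapping kept in clustering_results.xlsx.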

Acknowledgements

  • Vogel Lab (New York University) for sponsoring this project
  • Hyungwon Choi (National University of Singapore) for collaboration on this project
