raqmejtru / bch339n_predicting_tumor_phenotypes_from_gene_expression

We use gene expression data to perform gene set enrichments and random forest predictions to distinguish between four tumor types.

R 100.00%
brca deseq2 gsea meso random-forest umap lgg skcm

BCH339N: Predicting Tumor Phenotypes from Gene Expression



slide_01 The objective of our research is to predict various tumor phenotypes from gene expression data.

slide_02 Our presentation today will start out by outlining our two project objectives.

After that, we’ll visualize how our patient samples cluster, then perform a differential expression analysis and gene set enrichment analysis.

Next, we’ll use random forest predictions to assign tumor phenotypes.

And lastly we’ll discuss the broader impacts of our findings.

slide_03 Let’s start out by outlining our objectives and explaining where we sourced our data.

slide_04 We started out by searching The Cancer Genome Atlas for transcriptomic data of tumors that we predicted would be biologically diverse.

Our analyses are based on gene expression data for breast cancer samples, skin melanoma samples, low grade glioma samples (a type of tumor that arises from glial cells, the non-neuronal cells of the brain and spinal cord), and lastly, mesothelioma samples (a tumor type that is usually caused by asbestos exposure).

slide_05 Our first objective was to determine which underlying biological processes characterize each of the four tumor types.

Our second objective was to determine whether gene expression profiles could be used to predict the identity of a tumor.

slide_06 I’ll now go through the steps that we took to address the first objective. The first step in working with high dimensional data like transcriptomic profiles is to reduce the data to a small number of interpretable dimensions. For this, I used the UMAP method for dimensionality reduction.

slide_07 UMAP is a non-linear algorithm that projects high dimensional data into a low dimensional space, here two dimensions. So for example, our gene expression matrix with 60k genes and 1200 samples was reduced to two dimensions using this algorithm.

slide_08 Why is dimensionality reduction useful? It allows us to verify that the overarching differences between patient samples are driven by biological groups like tumor type, as opposed to other underlying variables such as age or sex.

Verifying that clusters correspond to biological groups makes the design of our differential gene expression experiment more robust.

slide_09 Here is the two dimensional projection of expression counts. We can see that for the most part, samples cluster based on their tumor identity. It’s important to note that the distances between clusters should not be interpreted quantitatively, since UMAP is non-linear and mainly preserves local neighborhood structure.
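One way to put a number on "samples cluster based on their tumor identity" is to check how often a sample's nearest neighbor in the two dimensional projection has the same tumor type. The sketch below is only an illustration under assumptions: the coordinates are made up, the function name `nearest_neighbor_agreement` is hypothetical, and it is not part of the original analysis. It looks only at nearest neighbors, which is consistent with not reading meaning into distances between clusters.

```python
import math


def nearest_neighbor_agreement(points, labels):
    """Fraction of samples whose nearest neighbor (Euclidean distance in
    the 2D embedding) carries the same label as the sample itself."""
    matches = 0
    for i, p in enumerate(points):
        best_j, best_d = None, math.inf
        for j, q in enumerate(points):
            if i == j:
                continue  # a sample is not its own neighbor
            d = math.dist(p, q)
            if d < best_d:
                best_j, best_d = j, d
        if labels[best_j] == labels[i]:
            matches += 1
    return matches / len(points)


# Hypothetical 2D coordinates (NOT real data), two samples per tumor type.
embedding = [(0.0, 0.1), (0.2, 0.0), (5.0, 5.1), (5.2, 4.9),
             (-4.0, 3.0), (-4.1, 3.2), (3.0, -6.0), (3.1, -6.2)]
tumor_type = ["BRCA", "BRCA", "SKCM", "SKCM", "LGG", "LGG", "MESO", "MESO"]

agreement = nearest_neighbor_agreement(embedding, tumor_type)
print(f"nearest-neighbor agreement: {agreement:.2f}")
```

A value close to 1 would suggest that tumor type explains the local structure of the projection, while a value near the chance level would suggest that some other variable drives the clustering.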

slide_10 Having validated that tumor identity is a justifiable way to group our samples, we performed a differential expression analysis followed by a gene set enrichment analysis.

slide_11 Within this portion of the analysis, the first goal was to determine which genes were over-expressed in each tumor type.

Four DESeq experiments were designed, one per tumor type, so that each tumor type was compared against all remaining samples to obtain log2 fold changes.
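The one-versus-rest design can be illustrated with a toy calculation. This is a deliberately simplified sketch, not what DESeq does internally: DESeq fits a statistical model with normalization and dispersion estimates, whereas the code below only takes the log2 ratio of group means with a pseudocount. The counts, gene names, and the function `one_vs_rest_log2fc` are hypothetical. In this sketch a positive value means higher expression in the target tumor type; the sign in a real analysis depends on how the contrast is specified.

```python
import math
from statistics import mean


def one_vs_rest_log2fc(counts, labels, target, pseudocount=1.0):
    """Per-gene log2(mean of target samples / mean of all other samples).

    counts: dict mapping gene name -> list of counts, one per sample.
    labels: list of tumor type labels, one per sample.
    The pseudocount avoids taking the log of zero.
    """
    in_group = [i for i, lab in enumerate(labels) if lab == target]
    rest = [i for i, lab in enumerate(labels) if lab != target]
    result = {}
    for gene, row in counts.items():
        mean_target = mean(row[i] for i in in_group) + pseudocount
        mean_rest = mean(row[i] for i in rest) + pseudocount
        result[gene] = math.log2(mean_target / mean_rest)
    return result


# Hypothetical toy counts (NOT real data): five samples, three genes.
labels = ["BRCA", "BRCA", "SKCM", "LGG", "MESO"]
counts = {
    "gene_a": [31, 31, 7, 7, 7],      # higher in BRCA
    "gene_b": [3, 3, 15, 15, 15],     # lower in BRCA
    "gene_c": [10, 10, 10, 10, 10],   # unchanged
}

# One comparison per tumor type, mirroring the four experiments.
all_contrasts = {t: one_vs_rest_log2fc(counts, labels, t)
                 for t in sorted(set(labels))}
fold_changes = all_contrasts["BRCA"]
print(fold_changes)
```

For the BRCA comparison, `gene_a` comes out at 2.0 (a four-fold increase), `gene_b` at -2.0, and `gene_c` at 0.0.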

Once DESeq provided log2Fold changes and test statistics for each gene, a gene set enrichment analysis was performed to determine which biological pathways were over-expressed in each tumor type.

We used the Hallmark collection of 50 well-defined biological pathways for this analysis.

Test statistics from DESeq were used to rank the genes, and the ranked list was then scored against each of the biological pathways.
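To make the ranking step concrete, here is a minimal sketch of the running-sum enrichment score used in classic gene set enrichment analysis. It is written under assumptions: the gene names, statistics, and gene sets are invented, the function `enrichment_score` is hypothetical, and the package actually used in the project may differ in its details. The sketch computes only the raw enrichment score; the normalized enrichment score additionally requires comparison against permuted or randomized gene sets, which is omitted here.

```python
def enrichment_score(ranked_genes, stats, gene_set, weight=1.0):
    """Walk down the ranked gene list, stepping up at genes in the set
    (weighted by |statistic| ** weight) and down at genes outside it.
    Returns the running-sum value with the largest absolute deviation
    from zero."""
    hits = [g in gene_set for g in ranked_genes]
    n_miss = len(ranked_genes) - sum(hits)
    hit_total = sum(abs(stats[g]) ** weight
                    for g, h in zip(ranked_genes, hits) if h)
    if n_miss == 0 or hit_total == 0:
        raise ValueError("gene set must cover some, but not all, ranked genes")

    running, best = 0.0, 0.0
    for g, h in zip(ranked_genes, hits):
        if h:
            running += abs(stats[g]) ** weight / hit_total
        else:
            running -= 1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


# Hypothetical per-gene test statistics (NOT real data).
stats = {"g1": 4.0, "g2": 3.0, "g3": 1.0, "g4": -1.0, "g5": -2.0, "g6": -4.0}
ranked = sorted(stats, key=stats.get, reverse=True)  # highest statistic first

es_top = enrichment_score(ranked, stats, {"g1", "g2"})     # set at the top
es_bottom = enrichment_score(ranked, stats, {"g5", "g6"})  # set at the bottom
print(es_top, es_bottom)
```

A gene set concentrated at the top of the ranking gives a positive score and one concentrated at the bottom gives a negative score, so which sign corresponds to over-expression in a tumor type depends on the direction of the comparison that produced the statistics.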

slide_12 Here are the gene set enrichment results for breast cancer tumors.

The y axis lists the pathways tested, and the x axis shows the normalized enrichment scores. Negative enrichment scores indicate that the pathway was over-expressed in breast cancer samples.

Our results indicate that the most over-expressed genes belong to estrogen response pathways.

slide_13 Based on our data, we characterize breast cancer samples by their expression of estrogen response pathways.

This is a plausible result, since estrogen drives the development of female sex characteristics, including breast tissue.

slide_14 Next, we look at the gene set enrichment results for skin cancer tumors.

Again, negative enrichment scores indicate that the pathway was over-expressed in skin cancer samples.

Our results indicate that the most over-expressed genes belong to MYC pathways.

slide_15 Based on our data, we characterize skin cancer samples by their expression of MYC pathways, which are driven by the oncogenic transcription factor MYC.

slide_16 Next, we look at the gene set enrichment results for low grade glioma tumors.

Again, negative enrichment scores indicate that the pathway was over-expressed in glioma samples.

Our results indicate that the most over-expressed genes belong to hedgehog signaling pathways.

slide_17 Based on our data, we characterize low grade glioma samples by their expression of hedgehog signaling pathways, which play important roles in stem cell regulation.

This is a plausible result, since low grade gliomas arise in the brain and spinal cord, which house neural stem cells.

slide_18 Next, we look at the gene set enrichment results for mesothelioma tumors.

Again, negative enrichment scores indicate that the pathway was over-expressed in mesothelioma samples.

Our results indicate that the most over-expressed genes belong to interferon response pathways.

slide_19 Based on our data, we characterize mesothelioma samples by their expression of interferon response pathways, which play important roles in the cellular immune response.

This is a plausible result, since asbestos exposure, a frequent cause of mesothelioma, triggers chronic inflammation and an accompanying immune response.

slide_20 slide_21 slide_22 slide_23 slide_24 slide_25 slide_26 slide_27 slide_28 slide_29 slide_30 slide_31 slide_32 slide_33 slide_34
