oubounyt / sc-networks

This project forked from l1w1/sc-networks

sc-Networks

This repository documents my internship project: working with single-cell RNA data and building networks to find genes that are expressed differently between conditions, e.g. healthy and diseased cells.

Goal:

Create a workflow that goes from scRNA data to gene interaction networks representing the interactions that differ most between e.g. healthy and diseased cells (two different trajectories resulting from pseudotime analysis). Afterwards, verify the results by computing the correlation between the gene expression values, and perform a permutation test to obtain p-values.

Data:

For this project I used different datasets from scanpy (https://scanpy.readthedocs.io/en/latest/index.html).

Files:

01_trajectories_processing.ipynb

Basic preprocessing of the scRNA data, followed by visualization and pseudotime analysis. The two trajectories are marked and each cell is assigned to either trajectory A or trajectory B. To understand what happens along those trajectories, they are split into windows, so that each cell belongs to exactly one trajectory and one window. For each window, the adjacencies between genes are then inferred and regulons representing the interactions are created, using grnboost from pySCENIC (https://pyscenic.readthedocs.io/en/latest/). Most of the steps here are taken from the scanpy tutorial "Preprocessing and clustering 3k PBMCs" (https://scanpy-tutorials.readthedocs.io/en/latest/pbmc3k.html).
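The windowing step can be sketched as follows. This is only an illustration under assumptions: the function name `assign_windows`, the toy pseudotime values, and the choice of equally sized windows (by pseudotime rank) are hypothetical and not taken from the notebook, which may split the trajectories differently. The network inference itself is not shown.

```python
import numpy as np

def assign_windows(pseudotime, n_windows):
    """Return a window index (0 .. n_windows-1) per cell, ordered by pseudotime."""
    pseudotime = np.asarray(pseudotime, dtype=float)
    order = np.argsort(pseudotime, kind="stable")
    windows = np.empty(len(pseudotime), dtype=int)
    # np.array_split gives n_windows chunks of (nearly) equal size
    for w, idx in enumerate(np.array_split(order, n_windows)):
        windows[idx] = w
    return windows

# Hypothetical toy data: pseudotime and trajectory label for 8 cells
pseudotime = np.array([0.05, 0.90, 0.40, 0.10, 0.75, 0.60, 0.20, 0.95])
trajectory = np.array(["A", "A", "A", "A", "B", "B", "B", "B"])

# Windows are assigned separately within each trajectory,
# so every cell ends up with one trajectory and one window
window = np.empty(len(pseudotime), dtype=int)
for label in ("A", "B"):
    mask = trajectory == label
    window[mask] = assign_windows(pseudotime[mask], n_windows=2)

print(list(zip(trajectory.tolist(), window.tolist())))
```

Equal-width windows over the pseudotime range would be an alternative; they keep the time span per window constant but can leave some windows with very few cells.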

02_abs_matrix.ipynb

For each transcription factor (TF)-gene interaction, the absolute value of the difference between its weight in trajectory A and its weight in trajectory B is taken. A TF-gene interaction that differs strongly between A and B therefore gets a high absolute value. These values are stored in one matrix per window.
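A minimal sketch of this step for a single window, assuming the interaction weights are available as dictionaries keyed by `(tf, gene)`. The function name, the toy weights, and the choice to treat an interaction missing from one trajectory as weight 0 are assumptions made for illustration.

```python
import numpy as np

def abs_difference_matrix(edges_a, edges_b):
    """Build a TF x gene matrix of |weight_A - weight_B| for one window."""
    tfs = sorted({tf for tf, _ in edges_a} | {tf for tf, _ in edges_b})
    genes = sorted({g for _, g in edges_a} | {g for _, g in edges_b})
    matrix = np.zeros((len(tfs), len(genes)))
    for i, tf in enumerate(tfs):
        for j, gene in enumerate(genes):
            # Assumption: an interaction absent in one trajectory has weight 0
            a = edges_a.get((tf, gene), 0.0)
            b = edges_b.get((tf, gene), 0.0)
            matrix[i, j] = abs(a - b)
    return tfs, genes, matrix

# Hypothetical weights for one window
edges_a = {("TF1", "g1"): 0.9, ("TF1", "g2"): 0.1, ("TF2", "g2"): 0.5}
edges_b = {("TF1", "g1"): 0.1, ("TF1", "g2"): 0.1, ("TF2", "g3"): 0.3}

tfs, genes, diff = abs_difference_matrix(edges_a, edges_b)
print(tfs, genes)
print(diff)
```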

03_window_networks.ipynb

This notebook takes the matrices of absolute values and creates a network for each window that shows the TF-gene interactions with the highest values. These networks therefore show the interactions that vary the most between the two trajectories.
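Selecting the strongest interactions from one window matrix could look like the sketch below. Keeping a fixed number `k` of edges is an assumption; the notebook may use a threshold or another criterion. The function name and the example matrix are hypothetical.

```python
import numpy as np

def top_edges(tfs, genes, diff, k):
    """Return the k (tf, gene, value) edges with the largest values in diff."""
    # Sort the flattened matrix in descending order and keep the first k positions
    flat = np.argsort(diff, axis=None)[::-1][:k]
    rows, cols = np.unravel_index(flat, diff.shape)
    return [(tfs[r], genes[c], float(diff[r, c])) for r, c in zip(rows, cols)]

# Hypothetical absolute-difference matrix for one window
tfs = ["TF1", "TF2"]
genes = ["g1", "g2", "g3"]
diff = np.array([[0.8, 0.0, 0.4],
                 [0.0, 0.5, 0.3]])

edges = top_edges(tfs, genes, diff, k=2)
print(edges)
```

The resulting edge list can be passed to any graph library for drawing; each tuple is one directed TF-to-gene edge with the difference as its weight.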

04_combined_network.ipynb

Here all window networks are combined into one large network that contains, for each window, the interactions with the largest differences between the two trajectories. Some of the genes are significant in more than one window.
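Combining the per-window edge lists can be sketched with plain dictionaries. The data layout (a dict from window index to a list of `(tf, gene, weight)` tuples) and both function names are assumptions for illustration only.

```python
from collections import defaultdict

def combine_windows(window_edges):
    """Merge per-window edge lists; each edge remembers its weight per window."""
    combined = defaultdict(dict)
    for window, edges in window_edges.items():
        for tf, gene, weight in edges:
            combined[(tf, gene)][window] = weight
    return dict(combined)

def nodes_in_multiple_windows(window_edges):
    """Return the genes/TFs that occur in the networks of more than one window."""
    seen = defaultdict(set)
    for window, edges in window_edges.items():
        for tf, gene, _ in edges:
            seen[tf].add(window)
            seen[gene].add(window)
    return sorted(node for node, windows in seen.items() if len(windows) > 1)

# Hypothetical top edges of two windows
window_edges = {
    0: [("TF1", "g1", 0.8), ("TF2", "g2", 0.5)],
    1: [("TF1", "g3", 0.7), ("TF3", "g2", 0.6)],
}

combined = combine_windows(window_edges)
shared = nodes_in_multiple_windows(window_edges)
print(len(combined), shared)
```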

05_correlation_and_testing

Compute the correlation between two vectors of the expression matrix and compute the absolute value between them. This is done first for each pair in a window and then for each pair in every window. To choose a suitable correlation method, I tested two different methods and decided to go with Pearson's correlation coefficient.
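As an illustration of how two correlation methods can be compared on the same pair of expression vectors, the sketch below computes Pearson's coefficient and, as an assumed second method, Spearman's rank correlation. The text above does not say which second method was tested, so Spearman is only a stand-in here. The rank computation assumes there are no tied values.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation coefficient of two equally long vectors."""
    return float(np.corrcoef(x, y)[0, 1])

def spearman(x, y):
    """Spearman correlation as Pearson on ranks (assumes no tied values)."""
    def rank(v):
        return np.argsort(np.argsort(v))
    return pearson(rank(x), rank(y))

# Hypothetical expression of a TF and a gene across five cells:
# the relationship is monotonic but not linear
tf_expr = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
gene_expr = tf_expr ** 2

print(round(pearson(tf_expr, gene_expr), 3))   # below 1: not perfectly linear
print(round(spearman(tf_expr, gene_expr), 3))  # 1.0: perfectly monotonic
```

Single-cell expression vectors usually contain many zeros, so ties are common in practice; `scipy.stats.spearmanr` handles ties and would be the safer choice for real data.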

After that I tried to evaluate the significance of the correlation results with a permutation test for a p-value. For this, the expression matrices were randomized and the correlation was computed for each TF-gene pair. This shows how significant the correlation computed from the real expression of TF and gene is. For example, if the correlation for the real expression is very high, the randomized matrices will not reach a similar value, so no permuted value is higher than the real correlation, which means it is significant.
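A minimal sketch of such a permutation test for a single TF-gene pair. The function name, the number of permutations, the use of the absolute Pearson correlation as test statistic, and the `+1` correction (which avoids a p-value of exactly 0) are assumptions and not necessarily what the notebook does.

```python
import numpy as np

def permutation_p_value(tf_expr, gene_expr, n_permutations=1000, seed=0):
    """Return (observed |r|, p-value) from shuffling the gene expression."""
    rng = np.random.default_rng(seed)
    tf_expr = np.asarray(tf_expr, dtype=float)
    gene_expr = np.asarray(gene_expr, dtype=float)
    observed = abs(np.corrcoef(tf_expr, gene_expr)[0, 1])
    exceed = 0
    for _ in range(n_permutations):
        # Shuffling breaks the pairing between TF and gene values per cell
        shuffled = rng.permutation(gene_expr)
        if abs(np.corrcoef(tf_expr, shuffled)[0, 1]) >= observed:
            exceed += 1
    # Assumption: count the observed value as one of the permutations
    return observed, (exceed + 1) / (n_permutations + 1)

# Hypothetical, strongly correlated expression across 50 cells
rng = np.random.default_rng(1)
tf_expr = np.linspace(0.0, 1.0, 50)
gene_expr = 2.0 * tf_expr + rng.normal(0.0, 0.05, size=50)

observed, p_value = permutation_p_value(tf_expr, gene_expr)
print(observed, p_value)
```

With `n_permutations` shuffles the smallest reachable p-value is `1 / (n_permutations + 1)`, so the number of permutations limits how small a p-value can be reported. Vectors with zero variance give an undefined correlation and need to be filtered out beforehand.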

I had, and still have, a lot of problems with efficiency, because computing the correlation for each TF-gene pair across many randomized matrices takes a long time. Multiprocessing reduced the runtime, but it is still not satisfactory.
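One possible way to reduce the runtime, shown here only as a suggestion and not as what the notebook does, is to compute the Pearson correlation of all TF-gene pairs at once with a matrix product instead of looping over pairs. The function name and the random toy matrices are hypothetical; rows are cells and columns are TFs or genes.

```python
import numpy as np

def pairwise_pearson(tf_matrix, gene_matrix):
    """Pearson correlation of every TF column with every gene column.

    tf_matrix: cells x n_tfs, gene_matrix: cells x n_genes.
    Returns an n_tfs x n_genes matrix.
    """
    def standardize(m):
        m = np.asarray(m, dtype=float)
        m = m - m.mean(axis=0)
        # Columns with zero variance would cause a division by zero here
        return m / np.linalg.norm(m, axis=0)
    return standardize(tf_matrix).T @ standardize(gene_matrix)

# Hypothetical expression data: 200 cells, 3 TFs, 5 genes
rng = np.random.default_rng(0)
tf_matrix = rng.normal(size=(200, 3))
gene_matrix = rng.normal(size=(200, 5))

corr = pairwise_pearson(tf_matrix, gene_matrix)

# One permutation: shuffle the cells (rows) of the gene matrix once,
# then all TF-gene pairs are recomputed in a single matrix product
perm_corr = pairwise_pearson(tf_matrix, gene_matrix[rng.permutation(200)])
print(corr.shape, perm_corr.shape)
```

Shuffling whole rows keeps the correlations among the genes intact within a permutation; shuffling each gene column independently is also possible but costs more per permutation.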

Summary

I learned a lot during this internship, especially how to work with single-cell RNA data, and I got to know many tools. There is much to improve in this project, especially confirming the correctness of my workflow and speeding up the correlation computation. Still, I had fun trying out different methods and experimenting with various packages and tools. Thanks to my supervisor for helping me through this and for being so kind and patient with me!

Contributors

l1w1
