Coder Social home page Coder Social logo

paralog_essentiality's Introduction

Paralog Essentiality

Source code for Paralog buffering contributes to the variable essentiality of genes in cancer cell lines

Cite as: De Kegel & Ryan (2019) Paralog buffering contributes to the variable essentiality of genes in cancer cell lines. PLoS Genet

Data analysis notebooks overview:

All of the main and most of the supplementary figures for the manuscript are generated in these notebooks.

Notebook Figures Brief description
01_paralog_multitarget.ipynb Fig. S1A, S1B Investigate paralogs affected by multi-targeting sgRNAs
02_paralogs_vs_essentiality.ipynb Fig. 2, 3 Compare essentiality of paralogs and singletons
03_WGD_vs_SSD.ipynb Fig. 4 Compare essentiality of WGD vs. SSD paralogs
04_regression_model.ipynb Compare regression models for essentiality with different paralog features
05_A1_essentiality_vs_A2_expression.ipynb Fig. 5, S2B Identify putative SL pairs using essentiality (CERES scores) and gene expression
06_A2_loss_reasons.ipynb Identify potential explanations (copy number, mutation) for gene loss
07_putative_SL_features.ipynb Fig. 6, S2A, S2C, S2D Identify features of putative SL paralog pairs

NOTE: Figures S3 and S4 consist of Figs 2, 3B, 3C and 4B re-drawn using different fitness score thresholds

Data processing notebooks overview:

These notebooks process raw/third party data for use in the analysis.

Notebook Figures Brief description
01_process_protein_complexes.ipynb Create map of Entrez ID to CORUM protein complex ID (protein complex membership)
02_process_ensembl_paralogs Generate list of protein-coding paralog pairs with min. 20% sequence identity + some feat. annotations
03_generate_gene_summary.ipynb Fig. S1C For each gene, summarize essentiality and paralog features
04_process_depmap_expr.ipynb Filter CCLE gene expression for use in data analysis nb 5

Running CERES overview:

These notebooks + R scripts are used to re-process logfold change data from DepMap CRISPR screens with CERES, to remove multi-targetting sgRNAs. The output from running all scripts in this folder is local_data\processed\depmap19Q1\gene_scores_11_07_19.csv. The notebooks can run in the conda environment, but the R scripts should be run independently (e.g. in R studio). Code to install the necessary Bioconductor packages + CERES is included in the R scripts. The R scripts also require the bowtie and samtools command line tools which are only available for Linux/macOS.

Order Notebook or R scripts Brief description
1 01_preprocess_depmap_data Extract sgRNAs to align and format files for CERES
2 R_scripts/align_guides.R Align all sgRNAs to the reference genome (hg19), allowing up to two mismatches
3 02_filter_multi_target_guides Filter out sgRNAs that align to readthrough genes and/or to multiple genes (incl. w/ mismatch)
4 R_scripts/run_CERES.R Run CERES with the filtered logfold changes data
5 03_ceres_post_processing Process CERES output to drop genes targeted by too few guides + normalize scores to reference (non-)essential genes
6 04_essential_cutoff.ipynb Calculate a cutoff fitness score for cell line specific essentiality

To run Jupyter notebooks:

  • Obtain 3rd party data - sources are listed in data_sources
    • The folder for running CERES has its own data_sources file with the third party files that are only needed there
    • Note: this is primarily needed for running notebooks in the 0_running_ceres and 1_data_processing folders
  • Set path to 3rd party data directory in environment.yml (3rd_party_dir)
    • Note: this can also be set in the activate environment with: conda env config vars set 3rd_party_data=my_dir
  • Create conda environment from the environment.yml file: conda env create -f environment.yml
  • Activate conda environment: conda activate paralogSL
  • Start notebooks: jupyter notebook

paralog_essentiality's People

Contributors

dekegel avatar

Watchers

 avatar

Forkers

cancergenetics

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.