deib-geco / gis-weigthed_lasso

Integrative approach to feature selection combining weighted LASSO and prior biological knowledge

classification feature-selection gene-expression-profiles lasso prior-knowledge

GIS-weighted LASSO

Enhancing functional interpretability in gene expression data analysis by prior-knowledge incorporation

Description

We developed an integrative feature selection approach that combines weighted LASSO with prior biological knowledge in a single step, using a novel score of biological relevance that summarizes information extracted from widely used biological knowledge bases.
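A weighted LASSO replaces the single penalty lam * ||b||_1 with per-feature penalties lam * sum_j w_j * |b_j|, so features with smaller weights are shrunk less and are more likely to be selected. The NumPy sketch below minimizes (1/(2n)) * ||y - X b||^2 + lam * sum_j w_j * |b_j| by cyclic coordinate descent. It is an illustration, not the repository's modified scikit-learn code: the squared-error loss (the use cases below are classification tasks), the mapping w_j = 1 - s_j + eps from a relevance score s_j in [0, 1] to a penalty weight, and the helper names are all assumptions.

```python
import numpy as np


def soft_threshold(z, t):
    """Proximal operator of t*|x|: shrink z toward zero by t."""
    return np.sign(z) * max(abs(z) - t, 0.0)


def weighted_lasso(X, y, lam, weights, n_iter=200, tol=1e-8):
    """Minimize (1/(2n))*||y - X b||^2 + lam * sum_j weights[j] * |b[j]|
    by cyclic coordinate descent (assumes no all-zero columns in X)."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n  # curvature of the loss along each feature
    residual = np.asarray(y, dtype=float).copy()  # y - X @ b, with b = 0 initially
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            old = b[j]
            # correlation of feature j with the residual that excludes feature j
            rho = X[:, j] @ residual / n + col_sq[j] * old
            b[j] = soft_threshold(rho, lam * weights[j]) / col_sq[j]
            if b[j] != old:
                residual -= X[:, j] * (b[j] - old)
                max_change = max(max_change, abs(b[j] - old))
        if max_change < tol:
            break
    return b


def score_to_weight(score, eps=1e-3):
    """Hypothetical mapping: a higher relevance score gives a smaller penalty."""
    return 1.0 - np.asarray(score, dtype=float) + eps


rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
y = X[:, 0] + X[:, 1] + 0.1 * rng.standard_normal(200)  # features 0 and 1 equally informative
scores = [1.0, 0.0, 0.0, 0.0, 0.0]  # hypothetical: only feature 0 has prior support
print(weighted_lasso(X, y, lam=1.5, weights=score_to_weight(scores)))
print(weighted_lasso(X, y, lam=0.5, weights=np.ones(5)))  # unweighted (standard) LASSO
```

With lam = 1.5, the feature given prior support should keep a nonzero coefficient while its equally informative but unsupported counterpart is shrunk to zero; the unweighted fit at lam = 0.5 should keep both.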

Application Use Cases

We compared the performance of the standard regularized LASSO model with that of our proposed approach on two use cases of cancer subtype prediction from gene expression data: the classification of Breast Invasive Carcinoma (BRCA) patients and of Colorectal Cancer (CRC) patients into their corresponding cancer subtypes. We also performed two distinct sensitivity analyses to evaluate the impact of incorporating our score of biological relevance into LASSO regularization. For these analyses we used a controlled dataset with limited correlation among the features, built from publicly available RNA-seq profiles of Kidney Renal Clear Cell Carcinoma patients from The Cancer Genome Atlas (TCGA) project. The preprocessed dataset is available here, along with the list of features included in the controlled dataset. For all datasets analysed, the data are preprocessed as described in the notebooks found here.
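One simple way to probe the effect of the score, sketched below under stated assumptions, is to sweep the score assigned to a single truly informative feature and record whether that feature is selected. The sketch uses independent synthetic features in place of the TCGA controlled dataset, and relies on a standard equivalence: a weighted LASSO with weights w_j equals a plain LASSO fitted on rescaled columns X_j / w_j, with coefficients mapped back as b_j = g_j / w_j. The baseline score of 0.5, the score-to-weight mapping, the helper names and all parameter values are hypothetical; this is not the repository's sensitivity-analysis code.

```python
import numpy as np


def lasso_cd(X, y, lam, n_iter=500, tol=1e-10):
    """Plain LASSO: minimize (1/(2n))*||y - X g||^2 + lam * ||g||_1."""
    n, p = X.shape
    g = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = np.asarray(y, dtype=float).copy()  # residual y - X @ g
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            old = g[j]
            rho = X[:, j] @ r / n + col_sq[j] * old
            g[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            if g[j] != old:
                r -= X[:, j] * (g[j] - old)
                max_change = max(max_change, abs(g[j] - old))
        if max_change < tol:
            break
    return g


def weighted_lasso_via_rescaling(X, y, lam, weights):
    """Weighted LASSO through column rescaling: fit on X / w, then divide by w."""
    w = np.asarray(weights, dtype=float)
    g = lasso_cd(X / w, y, lam)
    return g / w


def sensitivity_sweep(X, y, lam, feature, scores, eps=1e-3):
    """For each candidate score of `feature` (all others fixed at 0.5),
    report whether that feature ends up with a nonzero coefficient."""
    selected = []
    for s in scores:
        rel = np.full(X.shape[1], 0.5)  # hypothetical baseline score
        rel[feature] = s
        w = 1.0 - rel + eps  # hypothetical score-to-weight mapping
        b = weighted_lasso_via_rescaling(X, y, lam, w)
        selected.append(bool(b[feature] != 0))
    return selected


rng = np.random.default_rng(1)
X = rng.standard_normal((2000, 8))
y = 0.4 * X[:, 3] + 0.5 * rng.standard_normal(2000)  # feature 3 is weakly informative
print(sensitivity_sweep(X, y, lam=0.6, feature=3, scores=[0.0, 0.25, 0.5, 0.75, 1.0]))
```

Rescaling lets an unmodified LASSO solver emulate per-feature penalties, which is convenient for quick experiments; as the score of the informative feature rises, its penalty falls and it should switch from excluded to selected.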

Implementation

To run GIS-weighted LASSO in Python with the scikit-learn library, we modified the corresponding functions in the development version of scikit-learn. The modified package is available here. After cloning the repository, create a dedicated environment with:

conda create -n sklearn-env -c conda-forge python=3.9 numpy scipy cython=0.29.33 
conda activate sklearn-env

Then, build the scikit-learn package with:

cd scikit-learn-lasso 
pip install -v --no-use-pep517 --no-build-isolation -e . 

Lastly, install the required packages listed in requirements.txt (e.g. with pip install -r requirements.txt).

We computed the score of biological relevance using specific versions of the knowledge bases, which can be found here. These versions are the following:

  • GO (format-version 1.2, release date: 2023-03-06)
  • Reactome (version V85)
  • HPO (format-version 1.2, release date: 2023-09-01)
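How the score of biological relevance aggregates these three sources is not specified in this section. Purely to show the kind of aggregation involved, the sketch below assigns each gene a toy score: the fraction of knowledge bases (here GO, Reactome and HPO) in which the gene has at least one annotation term matching a keyword list. The input format, the keyword matching, the equal weighting of sources and the example annotations are all hypothetical and are not the published score definition.

```python
from typing import Dict, Iterable, List


def toy_relevance_score(
    annotations: Dict[str, Dict[str, List[str]]],
    keywords: Iterable[str],
) -> Dict[str, float]:
    """annotations[source][gene] -> list of term descriptions (hypothetical format).
    Returns, per gene, the fraction of sources with at least one matching term."""
    kws = [k.lower() for k in keywords]
    sources = list(annotations)
    genes = sorted({gene for per_source in annotations.values() for gene in per_source})
    scores = {}
    for gene in genes:
        hits = 0
        for src in sources:
            terms = annotations[src].get(gene, [])
            # a source counts once if any of its terms contains any keyword
            if any(k in t.lower() for t in terms for k in kws):
                hits += 1
        scores[gene] = hits / len(sources) if sources else 0.0
    return scores


# Illustrative toy annotations, not extracted from the releases listed above.
example = {
    "GO": {"ESR1": ["nuclear estrogen receptor activity"], "ACTB": ["cytoskeleton"]},
    "Reactome": {"ESR1": ["ESR-mediated signaling"], "ACTB": ["RHO GTPase cycle"]},
    "HPO": {"ESR1": ["Breast carcinoma"]},
}
print(toy_relevance_score(example, ["estrogen", "breast"]))
```

Here ESR1 matches in two of the three sources and ACTB in none, so ESR1 receives the higher toy score.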

To download and use updated versions, run this notebook. We used the biomaRt R package (version 2.56.1) to extract all available GO annotation terms; to obtain updated GO annotation terms for each gene, run the R script here.
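If the per-gene GO annotations extracted with biomaRt are exported to a tab-separated file for use on the Python side, a small loader such as the one below could group GO IDs by gene. The column names hgnc_symbol and go_id mirror common Ensembl BioMart attribute names, but the export format itself, the function name and the sample rows are assumptions, not necessarily what the repository's R script writes.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, Set, TextIO


def load_gene_to_go(
    handle: TextIO, gene_col: str = "hgnc_symbol", go_col: str = "go_id"
) -> Dict[str, Set[str]]:
    """Group GO term IDs by gene from a tab-separated table, skipping empty fields."""
    mapping: Dict[str, Set[str]] = defaultdict(set)
    for row in csv.DictReader(handle, delimiter="\t"):
        gene = (row.get(gene_col) or "").strip()
        go_id = (row.get(go_col) or "").strip()
        if gene and go_id:
            mapping[gene].add(go_id)
    return dict(mapping)


sample = io.StringIO(
    "hgnc_symbol\tgo_id\n"
    "TP53\tGO:0006915\n"
    "TP53\tGO:0003700\n"
    "BRCA1\tGO:0006281\n"
    "BRCA1\t\n"  # a gene row with an empty GO field is skipped
)
print(load_gene_to_go(sample))
```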

Additional Information

All the code is available here. To replicate the experiments, run the following scripts:

  • experiments_BRCA.py for experiments on the BRCA dataset
  • experiments_CRC.py for experiments on the CRC dataset

The code to replicate the two sensitivity analyses is available here. All the results from the experiments we performed can be found here.
