Deep learning methods for feature selection in gene expression autism data.

License: MIT License

Deep learning methods for gene expression

Description

This project implements several feature selection algorithms for finding the most significant subset of genes and gene sequences in a gene expression microarray dataset.

The current version provides the following feature selection algorithms:

  • Fisher discriminant analysis
  • two-sample t-test
  • feature correlation with a class

More implementation details of the above methods can be found here:

Data mining for feature selection in gene expression autism data

Feature selection methods in application to gene expression: autism data
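
As a rough illustration of the three scoring criteria listed above, the sketch below ranks genes with plain Python. It is not the repository's implementation: the function names are hypothetical, the Fisher score is written in one common form (squared difference of class means over the sum of class variances), and the t-test uses Welch's unequal-variance statistic. The papers referenced above may define these measures differently.

```python
import math
from statistics import mean, variance


def fisher_score(a, b):
    """One common Fisher criterion: squared mean gap over summed class variances."""
    return (mean(a) - mean(b)) ** 2 / (variance(a) + variance(b))


def t_statistic(a, b):
    """Welch's two-sample t statistic (unequal variances; an assumption)."""
    return (mean(a) - mean(b)) / math.sqrt(variance(a) / len(a) + variance(b) / len(b))


def class_correlation(values, labels):
    """Pearson correlation between one gene's expression and 0/1 class labels."""
    mv, ml = mean(values), mean(labels)
    cov = sum((v - mv) * (y - ml) for v, y in zip(values, labels))
    norm = math.sqrt(
        sum((v - mv) ** 2 for v in values) * sum((y - ml) ** 2 for y in labels)
    )
    return cov / norm


def top_k_genes(matrix, labels, k, method="fisher"):
    """Score every gene (column of `matrix`) and return the indices of the k best.

    `matrix` is a list of samples, each a list of expression values;
    `labels` holds 1 for one class and 0 for the other.
    """
    scores = []
    for j in range(len(matrix[0])):
        column = [row[j] for row in matrix]
        a = [v for v, y in zip(column, labels) if y == 1]
        b = [v for v, y in zip(column, labels) if y == 0]
        if method == "fisher":
            score = fisher_score(a, b)
        elif method == "ttest":
            score = abs(t_statistic(a, b))
        elif method == "corr":
            score = abs(class_correlation(column, labels))
        else:
            raise ValueError(f"unknown method: {method}")
        scores.append(score)
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


# Toy data: 6 samples, 3 genes; only the first gene separates the classes well.
toy_matrix = [
    [5.0, 1.0, 3.0],
    [6.0, 1.2, 2.0],
    [5.5, 0.9, 3.5],
    [1.0, 1.1, 2.5],
    [1.5, 1.0, 3.0],
    [0.5, 1.3, 2.0],
]
toy_labels = [1, 1, 1, 0, 0, 0]
print(top_k_genes(toy_matrix, toy_labels, k=1, method="fisher"))
```

Each criterion scores genes one at a time, so none of them accounts for redundancy between selected genes. Genes with zero variance in both classes would need to be filtered out before scoring to avoid division by zero.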

The output of the feature selection stage is consumed by a fully connected feedforward neural network. The following hyperparameters of this network can be configured:

  • number of layers,
  • number of hidden units in each layer,
  • activation function: sigmoid, tanh or ReLU,
  • L2 regularization parameter (lambda),
  • batch size,
  • number of epochs.
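
To make the role of these hyperparameters concrete, here is a minimal pure-Python sketch of a forward pass and an L2 penalty. It is an illustration under stated assumptions, not the project's network: the helper names are hypothetical, the weight initialization is arbitrary, the sigmoid on the output unit is assumed because the task is binary classification, and the penalty is written as lambda times the sum of squared weights (some formulations use lambda/2).

```python
import math
import random

ACTIVATIONS = {
    "sigmoid": lambda z: 1.0 / (1.0 + math.exp(-z)),
    "tanh": math.tanh,
    "relu": lambda z: max(0.0, z),
}


def init_network(input_size, hidden_sizes, output_size, seed=0):
    """Create one (weights, biases) pair per layer with small random weights."""
    rng = random.Random(seed)
    sizes = [input_size, *hidden_sizes, output_size]
    layers = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
        layers.append((weights, [0.0] * n_out))
    return layers


def forward(layers, x, activation="tanh"):
    """Hidden layers use `activation`; the output layer uses a sigmoid (assumed)."""
    hidden_act = ACTIVATIONS[activation]
    for i, (weights, biases) in enumerate(layers):
        is_output = i == len(layers) - 1
        f = ACTIVATIONS["sigmoid"] if is_output else hidden_act
        x = [f(sum(w * v for w, v in zip(row, x)) + b) for row, b in zip(weights, biases)]
    return x


def l2_penalty(layers, lambda_reg):
    """L2 term added to the loss: lambda times the sum of all squared weights."""
    return lambda_reg * sum(w * w for weights, _ in layers for row in weights for w in row)


# 100 selected genes in, one hidden layer of 80 units, one output unit.
network = init_network(input_size=100, hidden_sizes=[80], output_size=1)
prediction = forward(network, [0.5] * 100, activation="tanh")
penalty = l2_penalty(network, lambda_reg=0.8)
```

The number of layers follows from the length of `hidden_sizes`, so adding a second hidden layer is a matter of passing, for example, `[80, 40]`. Batch size and number of epochs affect only the training loop, which is omitted here.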

Model Flow

The diagram below depicts the training and testing procedures:

Dataset

The dataset is publicly available and was downloaded from the GEO (NCBI) repository. The data file in this repository has been cleaned up and contains only the raw data together with gene and gene sequence annotations.

Dataset details

The dataset contains 146 observations and 54613 genes. It consists of two classes: children with autism (n=82) and healthy control children (n=64). Blood draws for all subjects were done between the spring and summer of 2004. Total RNA was extracted for microarray experiments with Affymetrix Human U133 Plus 2.0 3' Expression Arrays.

Run the pipeline locally

Installation (Ubuntu)

To install all requirements, run the install.sh script (add the 'execute' permission to it first if needed):

chmod a+x bin/install.sh
./bin/install.sh

Then activate the Virtual Environment (if needed):

source .venv/bin/activate

To run the pipeline, execute:

python pipeline.py

Run the pipeline on Google Colab

To run the pipeline on Google Colab, use the following notebook: Deep Learning Gene Expression in Google Colab

Pipeline configuration

The pipeline lets you tweak the feature selection and training parameters. To modify them, edit the configuration file located at ./config/experiment_setup.yml. The default configuration is shown below:

selection_methods:
  - method: fisher
    num_features: 100
  - method: ttest
    num_features: 100
  - method: corr
    num_features: 100
  - method: random
    num_features: 100
hyperparameters:
  learning_rate: 0.001
  input_size: 100
  hidden_sizes: [80]
  output_size: 1
  num_features: 100
  activation_function: 'tanh'
  lambda_reg: 0.8
  norm_data: True
  data_file: 'data/data.tsv'
training:
  num_epochs: 10000
  cross_validation_folds: 10
  batch_size: 20  # online learning when batch_size=1
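
To see how the `training` values interact with the dataset size, the following sketch splits 146 samples into 10 cross-validation folds and cuts each training set into batches of 20. The function names are hypothetical, and the shuffled, non-stratified split is an assumption; the pipeline itself may build its folds differently.

```python
import random


def k_fold_indices(n_samples, n_folds, seed=0):
    """Shuffle sample indices and deal them into n_folds nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::n_folds] for i in range(n_folds)]


def mini_batches(indices, batch_size):
    """Split indices into consecutive batches; the last one may be smaller."""
    return [indices[i:i + batch_size] for i in range(0, len(indices), batch_size)]


folds = k_fold_indices(n_samples=146, n_folds=10)
batches_per_epoch = []
for test_fold in folds:
    # Every fold except the held-out one contributes to the training set.
    train = [i for fold in folds if fold is not test_fold for i in fold]
    batches_per_epoch.append(len(mini_batches(train, batch_size=20)))

print([len(fold) for fold in folds])
print(batches_per_epoch)
```

With these settings each fold holds 14 or 15 samples, so every training set has 131 or 132 samples and each epoch performs 7 weight updates, the last on a partial batch.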
