sriq's Introduction

Abstract

This is an implementation of Systematic Random forest Integrative Qualitative threshold (SRIQ) clustering. SRIQ clusters by finding a small number of highly correlated observations, then spiralling out from them to build larger clusters. SRIQ evaluates the stability of clustering solutions on its own, so the user does not need to specify how many cluster solutions to evaluate. Performance-wise, SRIQ places no limit on the number of features, and it can be run on an ordinary home computer.

For more information about SRIQ, see our publication

Installation

To install, clone the repository and install the Python dependencies:

git clone https://github.com/Fattigman/SRIQ
cd SRIQ
pip install -r requirements.txt

Usage

To run the pipeline, open the Jupyter notebook and follow the instructions in it.

jupyter notebook analysis_pipeline

To run SRIQ, navigate to the folder containing the VRLA.jar file and run the following command:

java -jar VRLA.jar

Features

The project provides the following features:

  • SRIQ-clustering
  • Pre-clustering data normalization
  • Silhouette plot analysis
  • UMAP
  • Differential gene expression with SAM
  • Gene enrichment analysis with EnrichR against customizable databases
  • Visualization of single or multiple genes across clusters

Data format requirements

For SRIQ to accept the data to be clustered, it must be a tab-separated .txt file in the following format:

Gene Sample1 Sample2 ... SampleN
Genename 1 val1 val2 ... valN
... ... ... ... ...
Genename M val1 val2 ... valN
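The layout above can be checked before handing a file to SRIQ. The following is a minimal sketch using only the Python standard library; the function name read_expression_table and the example gene and sample names are made up for illustration and are not part of the SRIQ code.

```python
import csv
import io


def read_expression_table(handle):
    """Read a tab-separated expression table.

    Expects a header row (gene column name, then sample names) followed by
    one row per gene with exactly one numeric value per sample.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    samples = header[1:]
    if not samples:
        raise ValueError("header needs a gene column and at least one sample")

    genes, values = [], []
    for line_no, row in enumerate(reader, start=2):
        if not row:
            continue  # tolerate blank lines
        if len(row) != len(header):
            raise ValueError(
                f"line {line_no}: expected {len(header)} fields, got {len(row)}"
            )
        genes.append(row[0])
        # float() raises ValueError on non-numeric cells
        values.append([float(cell) for cell in row[1:]])
    return samples, genes, values


# Hypothetical two-gene, two-sample table
example = "Gene\tSample1\tSample2\nGeneA\t5.0\t7.5\nGeneB\t0.0\t12.25\n"
samples, genes, values = read_expression_table(io.StringIO(example))
print(samples, genes, values)
```

For a file on disk, the same function can be called with open(path, newline="") as the handle.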

For clustering, expression data should be:

  1. Offset by 0.1 OR all values < 1 set to 1
  2. Log2-transformed
  3. Median-centered
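The three steps above can be sketched as follows. This is an illustration rather than the pipeline's own normalization code: the function name preprocess and its parameters are hypothetical, the input is assumed to be non-negative expression values with one row per gene, and median-centering is assumed to be applied per gene (per row), which the text above does not specify.

```python
import math
import statistics


def preprocess(values, method="offset", median_center=True):
    """Apply offset/floor, log2 transform and optional median-centering.

    values: list of rows, one row of non-negative numbers per gene.
    method: "offset" adds 0.1 to every value,
            "floor" sets every value below 1 to 1.
    median_center: subtract each row's median after the log2 transform
                   (per-gene centering is an assumption of this sketch).
    """
    result = []
    for row in values:
        if method == "offset":
            shifted = [v + 0.1 for v in row]
        elif method == "floor":
            shifted = [max(v, 1.0) for v in row]
        else:
            raise ValueError("method must be 'offset' or 'floor'")

        logged = [math.log2(v) for v in shifted]

        if median_center:
            row_median = statistics.median(logged)
            logged = [v - row_median for v in logged]
        result.append(logged)
    return result


# One hypothetical gene measured in three samples
print(preprocess([[0.5, 2.0, 8.0]], method="floor"))
```

Calling preprocess with median_center=False stops after the log2 step, which gives input that has only been offset (or floored) and log2-transformed.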

For SAM analysis:

  1. Offset by 0.1 OR all values < 1 set to 1
  2. Log2-transformed

Before running SRIQ, the test.properties file needs to be correctly configured. The following entries must be set correctly, otherwise SRIQ won't start.

studyName: Desired output name for the project
studyPath: Folder path to the expression data
inFileName: Name of the expression file, without the '.txt' extension
outPath: Folder path for SRIQ output
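A quick sanity check of these four entries before launching SRIQ might look like the sketch below. It is a deliberately simplified reader for "key: value" or "key=value" lines and does not handle the escapes or line continuations of the full Java properties format; the function names and the sample values are hypothetical.

```python
REQUIRED_KEYS = ("studyName", "studyPath", "inFileName", "outPath")


def parse_properties(text):
    """Parse simple 'key: value' or 'key=value' lines into a dict."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue  # skip blank lines and comments
        # Split at whichever separator comes first on the line
        positions = [i for i in (line.find(":"), line.find("=")) if i != -1]
        if not positions:
            continue
        cut = min(positions)
        props[line[:cut].strip()] = line[cut + 1:].strip()
    return props


def check_config(props):
    """Return a list of human-readable problems (empty if none found)."""
    problems = [f"missing or empty: {key}"
                for key in REQUIRED_KEYS if not props.get(key)]
    if props.get("inFileName", "").endswith(".txt"):
        problems.append("inFileName should not include the '.txt' extension")
    return problems


# Hypothetical configuration with two mistakes
sample = "studyName: demo\nstudyPath=/data/in\ninFileName: expr.txt\n"
for problem in check_config(parse_properties(sample)):
    print(problem)
```

To check a real file, pass the contents of test.properties to parse_properties.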

The enrichR module assumes that gene names are given as gene symbols. The mygene API is integrated for other identifier formats: set the variable 'scopes' to the format of your gene names.
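As an illustration of what 'scopes' controls, the sketch below maps identifiers to gene symbols through mygene's querymany call. It is not the pipeline's own code: the helper name to_gene_symbols is hypothetical, the species default of "human" is an assumption, and the client is passed in so the mapping logic can be read independently of the network call.

```python
def to_gene_symbols(gene_ids, scopes, client, species="human"):
    """Map gene identifiers to gene symbols.

    gene_ids: list of identifiers in the format named by `scopes`.
    scopes:   mygene scope string, e.g. "ensembl.gene" or "entrezgene".
    client:   object with a mygene-style querymany() method,
              normally mygene.MyGeneInfo().
    species:  assumed to be human in this sketch.
    """
    hits = client.querymany(
        gene_ids, scopes=scopes, fields="symbol", species=species
    )
    symbols = {}
    for hit in hits:
        # Unmatched queries are flagged with "notfound"
        if hit.get("notfound") or "symbol" not in hit:
            continue
        # Keep the first hit if a query matches more than once
        symbols.setdefault(hit["query"], hit["symbol"])
    return symbols


# Usage (requires the mygene package and network access):
#   import mygene
#   to_gene_symbols(["ENSG00000141510"], "ensembl.gene", mygene.MyGeneInfo())
```

Identifiers that mygene cannot resolve are left out of the returned dictionary, so they can be reported or dropped before enrichment analysis.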

Package requirements

To install the required Python packages, run the following command:

pip install -r requirements.txt

To run SRIQ and the SAM analysis, Java must be installed on your system.

License

Copyright (C) 2021 Srinivas Veerla

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see https://www.gnu.org/licenses/.
