FastRNA

FastRNA is a scalable framework for single-cell RNA sequencing (scRNA-seq) analysis.

Dependencies and Installation

Dependencies can be installed using the following command.

conda install -c conda-forge numpy scipy mkl mkl-include cython pandas

There is currently an issue with setuptools>=60.0.0, so please install an older version. For example,

conda install -c conda-forge setuptools=58.0.4

The package can be installed by

pip install git+https://github.com/hanbin973/FastRNA.git

Note that the current implementation does not work on ARM-based Mac systems because MKL only supports x86 processors.

Getting started

FastRNA requires two inputs: a gene x cell matrix mtx and a numpy array of batch labels batch_label. Both the columns (cells) of mtx and batch_label must be sorted in ascending order of batch_label, and batch_label must be in integer format. This can be done with the following commands.

import pandas as pd

batch_label = pd.factorize(batch_label)[0] # convert batch labels to integer codes
idx_sorted = batch_label.argsort() # indices that sort the labels in ascending order

mtx = mtx[:,idx_sorted] # reorder the columns (cells) of mtx by the sorted index
batch_label = batch_label[idx_sorted] # reorder batch_label by the same index
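As a self-contained illustration of the preparation step, the sketch below runs the same commands on toy data. The matrix values, the string batch names, the CSC storage format, and the use of a stable sort (which keeps the original cell order within each batch) are all assumptions made for the example, not requirements stated by FastRNA.

```python
import numpy as np
import pandas as pd
from scipy import sparse

# Toy data (hypothetical): 3 genes x 5 cells, one batch name per cell.
mtx = sparse.csc_matrix(np.array([
    [1, 0, 2, 0, 3],
    [0, 4, 0, 5, 0],
    [6, 0, 0, 7, 8],
]))
batch_label = np.array(["b", "a", "b", "c", "a"])

# Convert batch names to integer codes (numbered in order of first appearance).
batch_label = pd.factorize(batch_label)[0]

# A stable sort keeps the original cell order within each batch.
idx_sorted = batch_label.argsort(kind="stable")

# Apply the same permutation to the columns (cells) and to the labels.
mtx = mtx[:, idx_sorted]
batch_label = batch_label[idx_sorted]
```

After this step, batch_label is non-decreasing and column j of mtx still corresponds to entry j of batch_label.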

Functions

Two functions, fastrna_hvg and fastrna_pca, perform feature selection and principal component analysis (PCA), respectively. For feature selection, gene_var = fastrna_hvg(mtx, batch_label) returns an array of length n_gene (the number of genes, which equals the number of rows of mtx) containing the variance of each gene. These variances can be used for feature selection (e.g. keeping the top 1000 genes with the highest variance).
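One way to turn the variances into a gene selection is sketched below. Here gene_var is a made-up stand-in for the output of fastrna_hvg, and n_top and the variable names top_idx and hvg_idx are hypothetical choices for the example.

```python
import numpy as np

# Stand-in for gene_var = fastrna_hvg(mtx, batch_label):
# one variance per gene (values are made up for illustration).
gene_var = np.array([0.2, 3.1, 0.9, 5.4, 1.7, 0.1])

n_top = 3  # hypothetical cutoff; e.g. 1000 for real data

# Indices of the n_top genes with the largest variance, highest first.
top_idx = np.argsort(gene_var)[::-1][:n_top]

# Ascending row order, so the selected genes keep their original order.
hvg_idx = np.sort(top_idx)
```

The selected rows could then be taken with mtx[hvg_idx, :], followed by the mtx.sort_indices() call described in the Caution section.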

For PCA, eig_val, eig_vec, pca_coord, cov_mat = fastrna_pca(mtx, numi, batch_label) returns four objects: the eigenvalues, the eigenvectors, the PCA coordinates, and the covariance matrix. numi is the user-specified size factor. A typical choice is the sum of all UMI counts in each cell, that is, numi = np.asarray(mtx.sum(axis=0)).ravel().
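The size-factor expression can be checked on a small matrix. The sketch below uses made-up counts and assumes mtx is a scipy.sparse CSC matrix; it only shows how numi is computed, not the call to fastrna_pca itself.

```python
import numpy as np
from scipy import sparse

# Toy gene x cell count matrix (hypothetical): 3 genes, 4 cells.
mtx = sparse.csc_matrix(np.array([
    [1, 0, 2, 0],
    [0, 4, 0, 0],
    [6, 0, 3, 5],
]))

# Summing over axis 0 (genes) gives one total per cell as a 1 x n_cell matrix;
# np.asarray(...).ravel() flattens it to a 1-D array of length n_cell.
numi = np.asarray(mtx.sum(axis=0)).ravel()
```

The result has one entry per column of mtx, so it lines up with batch_label cell by cell.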

Usage example

Create a fastrna folder inside your project folder and download the .so file from this repository. A usage example can be found here.

Caution

SciPy sparse matrices do not currently enforce index sorting. Therefore, whenever you take a row subset using mtx[some_index,:], run mtx.sort_indices() before calling any FastRNA function. This takes less than a second even for very large matrices.
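A minimal sketch of this precaution follows, assuming the matrix is stored in CSC format and using made-up values. The names some_index and sub are illustrative only.

```python
import numpy as np
from scipy import sparse

# Toy gene x cell matrix (hypothetical), stored in CSC format as an assumption.
mtx = sparse.csc_matrix(np.array([
    [1, 0, 2],
    [0, 4, 0],
    [6, 0, 3],
    [0, 7, 8],
]))

# Take a row (gene) subset; the requested order is deliberately not ascending.
some_index = np.array([3, 0, 2])
sub = mtx[some_index, :]

# Sort the stored indices in place before passing the matrix to FastRNA.
sub.sort_indices()
```

sort_indices() works in place and changes only the internal storage order, not the values of the matrix.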

License

The FastRNA Software is freely available for non-commercial academic research use. For other usage, one must contact Buhm Han (BH) at [email protected] (patent pending). WE (Hanbin Lee and BH) MAKE NO REPRESENTATIONS OR WARRANTIES WHATSOEVER, EITHER EXPRESS OR IMPLIED, WITH RESPECT TO THE CODE PROVIDED HERE UNDER. IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE WITH RESPECT TO CODE ARE EXPRESSLY DISCLAIMED. THE CODE IS FURNISHED "AS IS" AND "WITH ALL FAULTS" AND DOWNLOADING OR USING THE CODE IS UNDERTAKEN AT YOUR OWN RISK. TO THE FULLEST EXTENT ALLOWED BY APPLICABLE LAW, IN NO EVENT SHALL WE BE LIABLE, WHETHER IN CONTRACT, TORT, WARRANTY, OR UNDER ANY STATUTE OR ON ANY OTHER BASIS FOR SPECIAL, INCIDENTAL, INDIRECT, PUNITIVE, MULTIPLE OR CONSEQUENTIAL DAMAGES SUSTAINED BY YOU OR ANY OTHER PERSON OR ENTITY ON ACCOUNT OF USE OR POSSESSION OF THE CODE, WHETHER OR NOT FORESEEABLE AND WHETHER OR NOT WE HAVE BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES, INCLUDING WITHOUT LIMITATION DAMAGES ARISING FROM OR RELATED TO LOSS OF USE, LOSS OF DATA, DOWNTIME, OR FOR LOSS OF REVENUE, PROFITS, GOODWILL, BUSINESS OR OTHER FINANCIAL LOSS.

Commercialization

To commercialize this software/algorithm, please contact Genealogy Inc. Use of this algorithm for commercial purposes, including implementing and recoding the same algorithm yourself to circumvent license protection, without permission is prohibited as the algorithm is patented. In addition, it is forbidden to insert this code or algorithm into other software packages without permission.

Citation

This work has been accepted at the American Journal of Human Genetics. Please cite as

H Lee, and B Han (2022). FastRNA: an efficient solution for PCA of single-cell RNA sequencing data based on a batch-accounting count model. Am J Hum Genet, in press
