velorama's Introduction

Velorama - Gene regulatory network inference for RNA velocity and pseudotime data

http://cb.csail.mit.edu/cb/velorama/velorama_v5.png

Velorama is a Python library for inferring gene regulatory networks from single-cell RNA-seq data.

It is designed for the case where RNA velocity or pseudotime data is available. Here are some of the analyses that you can do with Velorama:

  • Infer temporally causal regulator-target links from RNA velocity cell-to-cell transition matrices.
  • Infer interactions over branching/merging trajectories using just pseudotime data, without having to separate them manually.
  • Estimate the relative speed of various regulators (i.e., how quickly they act on their targets).

Velorama offers support for both pseudotime and RNA velocity data.

Velorama is based on a Granger causal approach and models the differentiation landscape as a directed acyclic graph (DAG) of cells, rather than the linear total ordering required by previous approaches.

API Example Usage

Velorama is currently offered as a command-line tool that operates on AnnData objects. [Ed. Note: We are working on a clean API compatible with the scanpy ecosystem.] First, prepare an AnnData object of the dataset to be analyzed with Velorama. If you have RNA velocity data, make sure it is stored in the layers required by CellRank and scVelo, so that transition probabilities can be computed. We recommend performing standard single-cell normalization (i.e., normalize counts to the median per-cell transcript count and log-transform the normalized counts plus a pseudocount). Next, annotate the candidate regulators and targets in the var DataFrame of the AnnData object as follows.

# Boolean masks over adata.var marking candidate regulators and targets
adata.var['is_reg'] = [n in regulator_genes for n in adata.var.index.values]
adata.var['is_target'] = [n in target_genes for n in adata.var.index.values]

Here, regulator_genes is the set of gene symbols or IDs for the candidate regulators, and target_genes is the set of gene symbols or IDs for the candidate target genes. This AnnData object should be saved as {dataset}.h5ad.
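The preparation steps above can be sketched with plain numpy/pandas standing in for a real AnnData object (the gene names and counts below are made up for illustration; in practice you would operate on adata.X and adata.var):

```python
import numpy as np
import pandas as pd

# Toy counts matrix: 3 cells x 4 genes (made-up data)
counts = np.array([[4., 0., 2., 2.],
                   [10., 5., 0., 5.],
                   [1., 1., 1., 1.]])
var = pd.DataFrame(index=["Neurog3", "Pdx1", "Ins1", "Gcg"])

# Normalize each cell to the median per-cell transcript count, then log1p
per_cell = counts.sum(axis=1)
norm = counts / per_cell[:, None] * np.median(per_cell)
logged = np.log1p(norm)

# Annotate candidate regulators and targets, as in the snippet above
regulator_genes = {"Neurog3", "Pdx1"}
target_genes = {"Ins1", "Gcg"}
var['is_reg'] = [n in regulator_genes for n in var.index.values]
var['is_target'] = [n in target_genes for n in var.index.values]

print(int(var['is_reg'].sum()), int(var['is_target'].sum()))  # 2 2
```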

We provide an example dataset here: mouse endocrinogenesis. This dataset is from the scVelo vignette and is based on the study by Bergen et al. (2020).

The command below runs Velorama, saving the inferred Granger causal interactions and interaction speeds to a given directory.

velorama -ds $dataset -dyn $dynamics -dev $device -l $L -hd $hidden -rd $rd

Here:

  • $dataset: the name of the dataset associated with the saved AnnData object.
  • $dynamics: either "rna_velocity" or "pseudotime", depending on which data is used to construct the DAG.
  • $device: either "cuda" or "cpu".
  • $rd: the root directory that contains the saved AnnData object and where the outputs will be saved.
  • $L (optional): the maximum number of lags to consider (default=5).
  • $hidden (optional): the dimensionality of the hidden layers (default=32).

We encourage you to report issues at our GitHub page; you can also open pull requests there to contribute your enhancements. If Velorama is useful for your research, please consider citing our bioRxiv preprint (2022).

velorama's People

Contributors

alexw16, amudide, rs239


velorama's Issues

utils module not found

hi there,

thanks for the package which I will test soon. Regarding the installation I'd have the following feedback:

utils.py is not found, so either the packaging is not linking it properly or the import statements need to be specified differently. One workaround is to add the package path to the Python path (sys.path), or to simply run velorama inside the site-packages/velorama.....egg/velorama folder
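That sys.path workaround can be simulated in isolation; here a temporary directory stands in for the installed velorama package directory, and the module contents are made up:

```python
import importlib
import os
import sys
import tempfile

# Stand-in for the installed package directory that contains utils.py
pkg_dir = tempfile.mkdtemp()
with open(os.path.join(pkg_dir, "utils.py"), "w") as f:
    f.write("ANSWER = 42\n")

# The workaround: put that directory on sys.path so `import utils` resolves
sys.path.insert(0, pkg_dir)
utils = importlib.import_module("utils")
print(utils.ANSWER)  # 42
```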

cheers
Daniel

feature request

Hi there,

I have another feature request: I computed pseudotimes with tools other than scVelo, so I already have everything in my adata object, including an integrated embedding after batch correction that I would like to use.

It would therefore be great to be able to specify pre-existing PCAs or other embeddings, as well as to use non-scVelo pseudotimes (set iroot to None), as otherwise an error is thrown.

kind regards
Daniel

running velorama

Hi,

I've encountered several issues running velorama

  1. The precomputed pseudotime has to be saved in adata.obs['pseudotime'], though scVelo saves it as either dpt_pseudotime or velocity_pseudotime.
  2. The ray.init call in run.py kills the process on a cluster, because it tries to spawn as many processes as CPUs are detected; if this differs from the number of cores that were requested, the worker process is killed. An additional parameter to define the number of cores to use would be helpful. The same is true for memory: if the user has been allotted less than 10 GB, this should be another parameter.
  3. When creating is_target and is_reg as suggested by the code in the GitHub repo, each is an adata.var entry of the same length with logical values; the code simply takes the shape, so the status message prints the dimension of adata.var and not the actual number of regulators and targets. I hope this is just the print statement and does not affect downstream analysis. For a proper print statement, use X.sum() and Y.sum() instead.
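Points 1 and 3 above can be worked around on the user side for now; a minimal sketch, with pandas DataFrames standing in for adata.obs and adata.var (toy values):

```python
import pandas as pd

# Stand-ins for adata.obs and adata.var (made-up data)
obs = pd.DataFrame({"dpt_pseudotime": [0.0, 0.3, 0.9]})
var = pd.DataFrame({"is_reg": [True, True, False],
                    "is_target": [False, True, True]})

# Point 1: copy the scVelo pseudotime into the key Velorama expects
obs["pseudotime"] = obs["dpt_pseudotime"]

# Point 3: count regulators/targets with .sum(), not .shape
n_reg = int(var["is_reg"].sum())        # 2, not len(var)
n_target = int(var["is_target"].sum())  # 2
print(n_reg, n_target)  # 2 2
```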

best
Daniel

setup.py calls sklearn instead of scikit-learn and prevents install

Hi,

I am reporting a potential issue with the install/setup script. Velorama will not install natively on Linux, failing with the following error. Does setup.py depend on a deprecated version of sklearn?

Collecting sklearn
  Using cached sklearn-0.0.post10.tar.gz (3.6 kB)
  Preparing metadata (setup.py) ... error
  error: subprocess-exited-with-error
  
  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [18 lines of output]
      The 'sklearn' PyPI package is deprecated, use 'scikit-learn'
      rather than 'sklearn' for pip commands.
      
      Here is how to fix this error in the main use cases:
      - use 'pip install scikit-learn' rather than 'pip install sklearn'
      - replace 'sklearn' by 'scikit-learn' in your pip requirements files
        (requirements.txt, setup.py, setup.cfg, Pipfile, etc ...)
      - if the 'sklearn' package is used by one of your dependencies,
        it would be great if you take some time to track which package uses
        'sklearn' instead of 'scikit-learn' and report it to their issue tracker
      - as a last resort, set the environment variable
        SKLEARN_ALLOW_DEPRECATED_SKLEARN_PACKAGE_INSTALL=True to avoid this error
      
      More information is available at
      https://github.com/scikit-learn/sklearn-pypi-package
      
      If the previous advice does not cover your use case, feel free to report it at
      https://github.com/scikit-learn/sklearn-pypi-package/issues/new
      [end of output]

subprocess-exited-with-error

Thank you for this package. I tried installing Velorama using
pip install velorama

But I get this error:
error: subprocess-exited-with-error

python setup.py egg_info did not run successfully.
exit code: 1

[8 lines of output]
Traceback (most recent call last):
  File "<string>", line 2, in <module>
  File "<pip-setuptools-caller>", line 34, in <module>
  File "C:\Users\ahmed\AppData\Local\Temp\pip-install-sgrrhble\sklearn_fa38950c8574492da0c4cb7e252dd14d\setup.py", line 10, in <module>
    LONG_DESCRIPTION = f.read()
  File "C:\Users\ahmed\miniconda3\envs\my_project\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8f in position 7: character maps to <undefined>
[end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

Encountered error while generating package metadata.

See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.

I tried upgrading pip, wheel, and setuptools, but it didn't work.
Any help will be appreciated
Thank you

SERGIO simulation data with technical noise or not?

Hi,
Very nice work! I noticed that you used SERGIO to generate simulated data for benchmarking; does the simulated data have technical noise added to it?

If you added technical noise, could you please provide specific parameters?
If you did not add technical noise, I would like to know whether this would have a big impact on the benchmark results.
