
marvinquiet / cellcano

Stars: 10, Watchers: 4, Forks: 2, Size: 1.51 MB

Supervised cell type identification for scATAC-seq data

Home Page: https://marvinquiet.github.io/Cellcano/

License: MIT License

Languages: R 44.74%, Python 53.32%, Shell 1.95%
Topics: bioinformatics, cell-type-classification, cell-type-identification, celltype-annotation, single-cell, supervised-classification-methods

cellcano's Introduction

Cellcano



Cellcano is open-source software for supervised cell type identification (celltyping) in scATAC-seq data, published in Nature Communications. The motivations for developing Cellcano are:

  1. Supervised methods are more accurate, robust and efficient than unsupervised clustering methods in scATAC-seq data
  2. As more high-quality scATAC-seq datasets are generated, methods that use scATAC-seq data as references can achieve better prediction performance and are in high demand

More details and a tutorial are available at https://marvinquiet.github.io/Cellcano/.


System Requirements

Hardware requirements

The Cellcano package requires only a standard computer with enough RAM to support in-memory operations. Cellcano can use a GPU if one is available, but a GPU is not required.
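Because Cellcano's models run on TensorFlow, one way to see whether a GPU would actually be used is to ask TensorFlow which GPUs it can see. The sketch below is illustrative, not part of Cellcano; it assumes TensorFlow 2.x, where `tf.config.list_physical_devices` is available, and the helper name `tensorflow_gpus` is hypothetical.

```python
import importlib.util


def tensorflow_gpus():
    """Return the names of GPUs visible to TensorFlow, or [] if none or not installed."""
    if importlib.util.find_spec("tensorflow") is None:
        # TensorFlow is not installed, so no GPU can be used.
        return []
    import tensorflow as tf

    # list_physical_devices is available in TensorFlow 2.x (including 2.7.1).
    return [device.name for device in tf.config.list_physical_devices("GPU")]


if __name__ == "__main__":
    gpus = tensorflow_gpus()
    if gpus:
        print("TensorFlow can use:", ", ".join(gpus))
    else:
        print("No GPU visible to TensorFlow; computation falls back to the CPU.")
```

An empty list is not an error: it only means training and prediction would run on the CPU.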

Software requirements

OS requirements

Cellcano supports macOS, Linux, and Windows, and has been tested on all three. (However, Cellcano has not been tested on Apple M1 machines because I do not have access to such a test environment. Thanks to @nleroy917, who helped with installation on M1; see Issue #6 for details.)

Dependencies

Cellcano requires the following:

  • Python (3.8 recommended)
  • R
  • tensorflow (2.7.1)
  • anndata (0.7.4)
  • scanpy (1.8.2)
  • numpy (1.19.2)
  • h5py (2.10.0)
  • keras (a version compatible with TensorFlow)
  • rpy2 (a version compatible with both Python and R)
  • CUDA Toolkit and NVIDIA cuDNN (only needed when using a GPU)
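Since several of these dependencies are pinned to specific versions, a quick way to spot mismatches in an existing environment is to compare installed distribution versions against the pins. This is a minimal sketch using the standard library's `importlib.metadata` (Python 3.8+); the helper name `check_pins` is hypothetical, and keras and rpy2 are left out because the list above does not pin them.

```python
from importlib import metadata

# Versions pinned in the dependency list above (keras and rpy2 are unpinned).
PINNED = {
    "tensorflow": "2.7.1",
    "anndata": "0.7.4",
    "scanpy": "1.8.2",
    "numpy": "1.19.2",
    "h5py": "2.10.0",
}


def check_pins(pins=None):
    """Return {package: (required, installed_or_None, matches)}."""
    pins = PINNED if pins is None else pins
    report = {}
    for name, required in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # distribution not installed under this name
        report[name] = (required, installed, installed == required)
    return report


if __name__ == "__main__":
    for name, (req, inst, ok) in check_pins().items():
        status = "OK" if ok else "MISMATCH"
        print(f"{name:12s} required {req:8s} installed {inst or 'missing':10s} {status}")
```

A mismatch is not necessarily fatal, but it is a useful first thing to rule out when installation or training fails.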

If the input is raw scATAC-seq data (i.e., a fragment file or BAM file), the ArchR R package must also be installed.
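Because Cellcano reaches R through rpy2, one way to confirm that ArchR is visible from Python before preprocessing raw data is rpy2's `isinstalled` helper. This is a hedged sketch, not Cellcano's own check; the wrapper name `archr_installed` is hypothetical, and it treats any failure to start an embedded R session as "not available".

```python
import importlib.util


def archr_installed():
    """Return True if the ArchR R package can be found through rpy2."""
    if importlib.util.find_spec("rpy2") is None:
        return False  # rpy2 itself is missing
    try:
        from rpy2.robjects.packages import isinstalled
    except Exception:
        # rpy2 is present but could not start an embedded R session
        # (for example, R is not installed or R_HOME is misconfigured).
        return False
    return bool(isinstalled("ArchR"))


if __name__ == "__main__":
    print("ArchR available via rpy2:", archr_installed())
```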

Installation

The most convenient way to install Cellcano is with pip:

pip install Cellcano

To upgrade to a newer release, use the --upgrade flag:

pip install --upgrade Cellcano

We have a detailed tutorial on installation in our documentation.

License

This project is covered under the MIT license.

Citing Our Work

If you use the package or refer to the associated manuscript, please cite:

@article{ma23cellcano,
  title   = {Cellcano: supervised cell type identification for single cell ATAC-seq data},
  author  = {Ma, Wenjing and Lu, Jiaying and Wu, Hao},
  journal = {Nature Communications},
  year    = {2023},
  month   = {Apr.},
  day     = {03},
  volume  = {14},
  number  = {1},
  pages   = {1864},
  issn    = {2041-1723},
  doi     = {10.1038/s41467-023-37439-3},
  url     = {https://doi.org/10.1038/s41467-023-37439-3}
}

cellcano's People

Contributors

lujiaying, marvinquiet


Forkers

nleroy917, mang30

cellcano's Issues

Data preprocessing

This was inspired by the newly downloaded mouse brain datasets.

I use ArchR to process the raw input. For the data provided by SnapATAC, I preprocess the data into a 5 kb tile matrix and generate an ArchR gene score matrix.

  • Raw Input format:
    • *.fragment.tsv.gz: fragment file (mostly from the 10X platform)
    • *.bam: aligned BAM file
  • Processed Output:
    • *.barcodes.tsv.gz: cell barcode information
    • *.genescore.mtx.gz: ArchR best gene score matrix
    • *.genes.tsv.gz: genes from gene score matrix
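The three processed outputs listed above share a common prefix, so they can be read back together. The sketch below is a dependency-free illustration under stated assumptions: the matrix is in real-valued Matrix Market coordinate format, and because the note does not say whether it is genes x cells or cells x genes, the orientation is inferred from the dimensions. The helper names are hypothetical; in practice `scipy.io.mmread` or `scanpy.read_mtx` would usually do the parsing.

```python
import gzip


def read_names(path):
    """Read the first tab-separated column of a gzipped TSV (one name per line)."""
    with gzip.open(path, "rt") as fh:
        return [line.rstrip("\n").split("\t")[0] for line in fh if line.strip()]


def read_mtx_gz(path):
    """Parse a gzipped Matrix Market file in real/integer coordinate format."""
    with gzip.open(path, "rt") as fh:
        header = fh.readline()
        if not header.startswith("%%MatrixMarket matrix coordinate"):
            raise ValueError(f"{path}: expected a coordinate-format Matrix Market file")
        line = fh.readline()
        while line.startswith("%"):  # skip comment lines
            line = fh.readline()
        n_rows, n_cols, _nnz = (int(x) for x in line.split())
        entries = []
        for line in fh:
            if line.strip():
                i, j, value = line.split()[:3]
                # Matrix Market indices are 1-based; convert to 0-based.
                entries.append((int(i) - 1, int(j) - 1, float(value)))
    return (n_rows, n_cols), entries


def load_genescore(prefix):
    """Return {barcode: {gene: score}} from PREFIX.barcodes.tsv.gz, PREFIX.genes.tsv.gz
    and PREFIX.genescore.mtx.gz."""
    barcodes = read_names(f"{prefix}.barcodes.tsv.gz")
    genes = read_names(f"{prefix}.genes.tsv.gz")
    shape, entries = read_mtx_gz(f"{prefix}.genescore.mtx.gz")
    # Orientation is inferred from the shape; a square matrix is treated as genes x cells.
    if shape == (len(genes), len(barcodes)):
        genes_by_cells = True
    elif shape == (len(barcodes), len(genes)):
        genes_by_cells = False
    else:
        raise ValueError(f"matrix shape {shape} matches neither genes x cells nor cells x genes")
    scores = {bc: {} for bc in barcodes}
    for i, j, value in entries:
        cell, gene = (barcodes[j], genes[i]) if genes_by_cells else (barcodes[i], genes[j])
        scores[cell][gene] = value
    return scores
```

Only nonzero entries are stored, so genes absent from a cell's dictionary have a score of zero.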

Provide a genome option (e.g., mm9, mm10, hg19, hg38); the corresponding genomic annotation is then used for ArchR preprocessing.
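One way to expose such a genome selection is to restrict the value at the command-line level, so an unsupported build is rejected before ArchR runs. This is a hypothetical argparse sketch, not Cellcano's actual parser: the short flags `-i`, `-o`, `-g` and `--threads` mirror the `Cellcano preprocess` command shown in an issue below, while the `dest` names and the default thread count are assumptions. ArchR provides built-in annotations for these four builds via `addArchRGenome()`.

```python
import argparse

# Genome builds listed in the note above.
SUPPORTED_GENOMES = ("mm9", "mm10", "hg19", "hg38")


def build_preprocess_parser():
    """Hypothetical parser mirroring `Cellcano preprocess -i ... -o ... -g ... --threads ...`."""
    parser = argparse.ArgumentParser(prog="preprocess")
    parser.add_argument("-i", dest="input_dir", required=True,
                        help="directory containing fragment or BAM files")
    parser.add_argument("-o", dest="output_dir", required=True,
                        help="directory for the processed output")
    parser.add_argument("-g", dest="genome", choices=SUPPORTED_GENOMES, required=True,
                        help="genome build; selects the matching ArchR annotation")
    parser.add_argument("--threads", type=int, default=1,  # default is an assumption
                        help="number of threads for preprocessing")
    return parser
```

With `choices`, argparse prints a usage error and exits for a value such as `-g hg20` instead of passing it on to R.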

02-01

Finished ArchR preprocessing on fragment file

04-07

  • Finished the ArchR preprocessing sub-command
  • Finished training the model using the KD model
  • Figured out how to use the logger to write console and file logs
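Writing the same log records to both the console and a file is a standard pattern with Python's built-in `logging` module: one logger with a `StreamHandler` and a `FileHandler` that share a formatter. This is a generic sketch, not Cellcano's logging code; the function name `get_logger` and the format string are illustrative choices.

```python
import logging


def get_logger(name, log_file, level=logging.INFO):
    """Return a logger that writes each record to both the console and log_file."""
    logger = logging.getLogger(name)
    if logger.handlers:  # already configured; avoid duplicated output
        return logger
    logger.setLevel(level)
    formatter = logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
    # StreamHandler writes to stderr; FileHandler appends to log_file.
    for handler in (logging.StreamHandler(), logging.FileHandler(log_file)):
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    log = get_logger("cellcano_sketch", "cellcano_sketch.log")
    log.info("Finished ArchR preprocessing")
```

The early return matters because `logging.getLogger` returns the same object for the same name; without it, calling the function twice would attach a second pair of handlers and print every message twice.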

LinkError occurs when installing Cellcano

I followed the installation tutorial on the Cellcano website, but when I try to install Cellcano with the command below, I get some strange errors about glibc.

pip install Cellcano

I received the following error message.

  Installing build dependencies ... done
  Getting requirements to build wheel ... error
  error: subprocess-exited-with-error

  × Getting requirements to build wheel did not run successfully.
  │ exit code: 1
  ╰─> [37 lines of output]
      /home/wangzian/anaconda3/envs/Cellcano/bin/../lib/gcc/x86_64-conda-linux-gnu/12.1.0/../../../../x86_64-conda-linux-gnu/bin/ld: /home/wangzian/anaconda3/envs/Cellcano/lib/R/lib/../../libreadline.so.8: undefined reference to `__fdelt_chk@GLIBC_2.15'
      /home/wangzian/anaconda3/envs/Cellcano/bin/../lib/gcc/x86_64-conda-linux-gnu/12.1.0/../../../../x86_64-conda-linux-gnu/bin/ld: /home/wangzian/anaconda3/envs/Cellcano/lib/R/lib/../.././libtinfow.so.6: undefined reference to `__poll_chk@GLIBC_2.16'
      /home/wangzian/anaconda3/envs/Cellcano/bin/../lib/gcc/x86_64-conda-linux-gnu/12.1.0/../../../../x86_64-conda-linux-gnu/bin/ld: /home/wangzian/anaconda3/envs/Cellcano/lib/liblzma.so: undefined reference to `memcpy@GLIBC_2.14'
      /home/wangzian/anaconda3/envs/Cellcano/bin/../lib/gcc/x86_64-conda-linux-gnu/12.1.0/../../../../x86_64-conda-linux-gnu/bin/ld: /home/wangzian/anaconda3/envs/Cellcano/lib/liblzma.so: undefined reference to `clock_gettime@GLIBC_2.17'
      collect2: error: ld returned 1 exit status
      Traceback (most recent call last):
        File "/tmp/pip-build-env-_f48gmvh/overlay/lib/python3.8/site-packages/setuptools/_distutils/unixccompiler.py", line 269, in link
          self.spawn(linker + ld_args)
        File "/tmp/pip-build-env-_f48gmvh/overlay/lib/python3.8/site-packages/setuptools/_distutils/ccompiler.py", line 1041, in spawn
          spawn(cmd, dry_run=self.dry_run, **kwargs)
        File "/tmp/pip-build-env-_f48gmvh/overlay/lib/python3.8/site-packages/setuptools/_distutils/spawn.py", line 68, in spawn
          raise DistutilsExecError(f"command {cmd!r} failed with exit code {exitcode}")
      distutils.errors.DistutilsExecError: command '/home/wangzian/anaconda3/envs/Cellcano/bin/x86_64-conda-linux-gnu-cc' failed with exit code 1
      
      During handling of the above exception, another exception occurred:
      
      Traceback (most recent call last):
        File "/home/wangzian/anaconda3/envs/Cellcano/lib/python3.8/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in <module>
          main()
        File "/home/wangzian/anaconda3/envs/Cellcano/lib/python3.8/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 335, in main
          json_out['return_val'] = hook(**hook_input['kwargs'])
        File "/home/wangzian/anaconda3/envs/Cellcano/lib/python3.8/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 118, in get_requires_for_build_wheel
          return hook(config_settings)
        File "/tmp/pip-build-env-_f48gmvh/overlay/lib/python3.8/site-packages/setuptools/build_meta.py", line 325, in get_requires_for_build_wheel
          return self._get_build_requires(config_settings, requirements=['wheel'])
        File "/tmp/pip-build-env-_f48gmvh/overlay/lib/python3.8/site-packages/setuptools/build_meta.py", line 295, in _get_build_requires
          self.run_setup()
        File "/tmp/pip-build-env-_f48gmvh/overlay/lib/python3.8/site-packages/setuptools/build_meta.py", line 311, in run_setup
          exec(code, locals())
        File "<string>", line 141, in <module>
        File "<string>", line 121, in get_r_c_extension_status
        File "<string>", line 82, in get_c_extension_status
        File "/tmp/pip-build-env-_f48gmvh/overlay/lib/python3.8/site-packages/setuptools/_distutils/ccompiler.py", line 781, in link_executable
          self.link(
        File "/tmp/pip-build-env-_f48gmvh/overlay/lib/python3.8/site-packages/setuptools/_distutils/unixccompiler.py", line 271, in link
          raise LinkError(msg)
      distutils.errors.LinkError: command '/home/wangzian/anaconda3/envs/Cellcano/bin/x86_64-conda-linux-gnu-cc' failed with exit code 1
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error

× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.

At first I thought the glibc version might be wrong, so I checked it with the command ldd --version and got the following message:

ldd (Ubuntu GLIBC 2.35-0ubuntu3.6) 2.35

I tried many approaches but could not figure it out. I also tried installing Cellcano on another machine with glibc 2.27 and got the same problem. Can anyone help me or give me some useful advice? Thanks!
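The linker errors above each name a versioned glibc symbol (for example `memcpy@GLIBC_2.14`), so one way to test the "wrong glibc version" hypothesis is to compare those requested versions with the glibc the system actually runs. This is a minimal sketch; the helper names are hypothetical, and it relies only on `re` and `platform.libc_ver()` from the standard library.

```python
import platform
import re

# Matches linker lines such as: undefined reference to `memcpy@GLIBC_2.14'
UNDEFINED_REF = re.compile(r"undefined reference to `([^@`']+)@GLIBC_(\d+(?:\.\d+)+)'")


def parse_version(text):
    return tuple(int(part) for part in text.split("."))


def fmt(version):
    return ".".join(str(part) for part in version) if version else "unknown"


def required_glibc_symbols(log_text):
    """Map each undefined symbol in a linker log to the glibc version it requests."""
    return {sym: parse_version(ver) for sym, ver in UNDEFINED_REF.findall(log_text)}


def system_glibc():
    """Return the running glibc version as a tuple, or None on non-glibc systems."""
    lib, version = platform.libc_ver()
    return parse_version(version) if lib == "glibc" and version else None


if __name__ == "__main__":
    import sys

    have = system_glibc()
    for sym, needed in sorted(required_glibc_symbols(sys.stdin.read()).items()):
        status = "ok" if have is not None and needed <= have else "check"
        print(f"{sym}: requests GLIBC_{fmt(needed)}, system glibc {fmt(have)} ({status})")
```

Applied to the four linker lines in the log, the highest requested version is 2.17, which is below both 2.35 and 2.27, so the installed glibc version alone does not explain the failure.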

Add Conda Install Option

  • Add conda one-shot installation
  • Add documentation on the website (it seems Wenjing already did this)
  • Add a test workflow
  • Manually test dataset download and script training.
  • Publish to Bioconda

Cannot create ArchR file for test data

Hello, thanks for the great tool. I recently tried to generate input files for 10X test data. I ran the following command on the test data folder, which contains a fragments.tsv file:

Cellcano preprocess -i test_data -o test_data -g hg19 --threads 4

However, I met the following issues:

2023-07-07 20:13:02 : Detected 2 or less cells pass filter (Non-Zero median TSS = 1.43, median Frags = 8567.5) in file!
Check inputs such as 'filterFrags' or 'filterTSS' to keep cells! Exiting!

2023-07-07 20:13:02 : createArrowFiles has encountered an error, checking if any ArrowFiles completed..

No new files were produced in the test_data folder. Can you help with this? Thanks.
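The ArchR message reports how many cells pass its quality filters on TSS enrichment and fragment count. The sketch below reproduces that kind of filter in plain Python to make the thresholds concrete. It assumes ArchR's documented `createArrowFiles` defaults (`minTSS = 4`, `minFrags = 1000`, `maxFrags = 1e5`; check these against your ArchR version) and the standard 10X fragments layout (chrom, start, end, barcode, read count). The helper names are hypothetical, and TSS enrichment itself is taken as precomputed input, not calculated here.

```python
import gzip
from collections import Counter


def fragments_per_barcode(path):
    """Count fragments per cell barcode in a 10X fragments file (.tsv or .tsv.gz)."""
    opener = gzip.open if path.endswith(".gz") else open
    counts = Counter()
    with opener(path, "rt") as fh:
        for line in fh:
            if line.startswith("#") or not line.strip():
                continue  # skip header/comment lines
            # Columns: chrom, start, end, barcode, read count
            counts[line.split("\t")[3]] += 1
    return counts


def passing_cells(tss, frags, min_tss=4.0, min_frags=1000, max_frags=100_000):
    """Barcodes whose TSS enrichment and fragment count fall within the thresholds.

    tss maps barcode -> TSS enrichment score; frags maps barcode -> fragment count.
    """
    return sorted(
        bc for bc, n in frags.items()
        if min_frags <= n <= max_frags and tss.get(bc, 0.0) >= min_tss
    )
```

With a median TSS enrichment of 1.43, at least half of the cells in the log above fall below a minimum of 4 regardless of their fragment counts, which is consistent with ArchR's "2 or less cells pass filter" message.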

Can't install Cellcano on an M1 Mac

Hi! Thanks for providing this package. I really appreciate the clear documentation.

I had an issue installing Cellcano:
I can't seem to install Cellcano in a fresh Python virtual environment on my M1 MacBook. I first tried to install it using the instructions in the documentation:

pip install Cellcano

This gives a really long and odd stack trace:

pip install Cellcano
Collecting Cellcano
  Using cached Cellcano-1.0.4.tar.gz (15 kB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
INFO: pip is looking at multiple versions of cellcano to determine which version is compatible with other requirements. This could take a while.
  Using cached Cellcano-1.0.3.tar.gz (15 kB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
  Using cached Cellcano-1.0.2.tar.gz (15 kB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
  Using cached Cellcano-1.0.1.tar.gz (15 kB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
  Using cached Cellcano-1.0.0.tar.gz (15 kB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... error
  error: subprocess-exited-with-error
  
  × Getting requirements to build wheel did not run successfully.
  │ exit code: 1
  ╰─> [17 lines of output]
      Traceback (most recent call last):
        File "/Users/nathanleroy/sandbox/cellcano-test/venv/lib/python3.10/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in <module>
          main()
        File "/Users/nathanleroy/sandbox/cellcano-test/venv/lib/python3.10/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 335, in main
          json_out['return_val'] = hook(**hook_input['kwargs'])
        File "/Users/nathanleroy/sandbox/cellcano-test/venv/lib/python3.10/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 118, in get_requires_for_build_wheel
          return hook(config_settings)
        File "/private/var/folders/_w/yrr0qqbd52gc6jj0fnqyk8640000gn/T/pip-build-env-fhqejdy_/overlay/lib/python3.10/site-packages/setuptools/build_meta.py", line 341, in get_requires_for_build_wheel
          return self._get_build_requires(config_settings, requirements=['wheel'])
        File "/private/var/folders/_w/yrr0qqbd52gc6jj0fnqyk8640000gn/T/pip-build-env-fhqejdy_/overlay/lib/python3.10/site-packages/setuptools/build_meta.py", line 323, in _get_build_requires
          self.run_setup()
        File "/private/var/folders/_w/yrr0qqbd52gc6jj0fnqyk8640000gn/T/pip-build-env-fhqejdy_/overlay/lib/python3.10/site-packages/setuptools/build_meta.py", line 487, in run_setup
          super(_BuildMetaLegacyBackend,
        File "/private/var/folders/_w/yrr0qqbd52gc6jj0fnqyk8640000gn/T/pip-build-env-fhqejdy_/overlay/lib/python3.10/site-packages/setuptools/build_meta.py", line 338, in run_setup
          exec(code, locals())
        File "<string>", line 4, in <module>
      FileNotFoundError: [Errno 2] No such file or directory: 'DESCRIPTION'
      [end of output]

It seemed to be getting stuck looking through PyPI versions, so I then cloned the repository and tried to install from source:

git clone https://github.com/marvinquiet/Cellcano.git
cd Cellcano
pip install .

This got me further but gave me a weird FileNotFoundError for the rpy2 dependency:

FileNotFoundError: [Errno 2] No such file or directory: '/private/var/folders/_w/yrr0qqbd52gc6jj0fnqyk8640000gn/T/pip-install-5_qkniwu/rpy2_6697f4de8fe84bab886138ed845b2719/requirements.txt'

Finally, I just manually removed all pinned dependencies from setup.py and specified the tensorflow-macos requirement instead of tensorflow, and then everything was installed nicely. I haven't tested it fully, but I thought I'd open the issue to let you know there might be install issues on an M1.
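The workaround amounts to requiring a different TensorFlow distribution on Apple Silicon. A hypothetical sketch of that choice at runtime is shown below; in setup.py the same rule could be written declaratively with PEP 508 environment markers, e.g. `tensorflow-macos; sys_platform == "darwin" and platform_machine == "arm64"`. The function name is illustrative, and keeping the 2.7.1 pin elsewhere follows the Dependencies list, not this issue (which dropped all pins).

```python
import platform

# Pin from the Dependencies list; the M1 workaround in this issue dropped pins entirely.
DEFAULT_TF = "tensorflow==2.7.1"
APPLE_SILICON_TF = "tensorflow-macos"


def tensorflow_requirement(system=None, machine=None):
    """Pick the TensorFlow distribution to require for the given platform."""
    system = system or platform.system()
    machine = machine or platform.machine()
    if system == "Darwin" and machine == "arm64":
        # Apple Silicon (M1) machines use Apple's tensorflow-macos build.
        return APPLE_SILICON_TF
    return DEFAULT_TF


if __name__ == "__main__":
    print(tensorflow_requirement())
```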

A question about multi-sample training

Hello, Wenjing. I have a question about training. When predicting cell types, I want to use training data from multiple different samples. How should I handle this? Running the Cellcano train command multiple times does not seem to work, because a single trained model has to be selected for the second round of training. How can I make predictions with a multi-sample training set? Thank you.

Sincerely,
Jingya

Version that uses SnapATAC2

Hi guys,

A while back I opened an issue about installation problems (#6) and a PR (#7). I even tried Dockerizing everything, but that was not possible because of the balancing act of supporting Python, R, and ArchR in one container.

I'm not sure if you have seen it, but SnapATAC2 was recently released, and it looks like a fantastic Python-native substitute for ArchR when analyzing scATAC-seq data from within Python. There's even a tutorial on constructing a gene activity matrix. In addition to being a native Python library, it:

  1. doesn't pollute your directory with Arrow files, logs, and tmp dirs,
  2. is built on Rust, so it's faster, uses less memory, and is more efficient overall, and
  3. supports AnnData natively, an extremely common data structure for scATAC-seq data.

If the only purpose of ArchR within this library is to preprocess the data, perhaps it would be worth transitioning to SnapATAC2 to make installation easier, enable dockerization, and make it faster overall.

Can't obtain the example data

Hi, Wenjing. Thank you for developing such a great tool.

I am currently testing with the human immune cell dataset provided in the tutorial, but the following three files cannot be downloaded; the web page indicates that they have been deleted. Is there a new demo file? Could you provide a copy? Thank you.

# Can't download the provided data:
wget https://www.dropbox.com/s/7mdevmylonlh4w4/ref_genescore.csv
wget https://www.dropbox.com/s/18n5wabgg8g2gob/ref_meta.csv
wget https://www.dropbox.com/s/e7g9vem3oxt096l/target_genescore.csv

Sincerely,
Jingya

A question about multi-sample training

Hi, Wenjing. Sorry to bother you again. When I use Cellcano preprocess to create a gene matrix from the data, an error is reported; the details are below. I don't know whether it's a problem with the file path or something else. Could you check it for me? Thank you.

R[write to console]: Saving ArchRProject...

R[write to console]: ArchR logging to : ArchRLogs/ArchR-getMatrixFromProject-12b345949cca7-Date-2023-08-25_Time-11-08-57.03071.log
If there is an issue, please report to github with logFile!

R[write to console]: Error in h5checktypeOrOpenLoc(file, readonly = TRUE, fapl = NULL, native = native) : 
  Error in h5checktypeOrOpenLoc(). Cannot open file. File 'NA' does not exist.

Traceback (most recent call last):
  File "/data/houjy/anaconda3/envs/cellcano/bin/Cellcano", line 8, in <module>
    sys.exit(main())
  File "/data/houjy/anaconda3/envs/cellcano/lib/python3.8/site-packages/Cellcano/main.py", line 107, in main
    preprocess.preprocess(args)
  File "/data/houjy/anaconda3/envs/cellcano/lib/python3.8/site-packages/Cellcano/preprocess.py", line 156, in preprocess
    _run_ArchR(input_files, sample_names, output_dir=output_dir)
  File "/data/houjy/anaconda3/envs/cellcano/lib/python3.8/site-packages/Cellcano/preprocess.py", line 69, in _run_ArchR
    ArchR_genescore_mat = ArchR_genescore_func(ArchR_proj, useMatrix="GeneScoreMatrix")
  File "/data/houjy/anaconda3/envs/cellcano/lib/python3.8/site-packages/rpy2/robjects/functions.py", line 208, in __call__
    return (super(SignatureTranslatedFunction, self)
  File "/data/houjy/anaconda3/envs/cellcano/lib/python3.8/site-packages/rpy2/robjects/functions.py", line 131, in __call__
    res = super(Function, self).__call__(*new_args, **new_kwargs)
  File "/data/houjy/anaconda3/envs/cellcano/lib/python3.8/site-packages/rpy2/rinterface_lib/conversion.py", line 45, in _
    cdata = function(*args, **kwargs)
  File "/data/houjy/anaconda3/envs/cellcano/lib/python3.8/site-packages/rpy2/rinterface.py", line 817, in __call__
    raise embedded.RRuntimeError(_rinterface._geterrmessage())
rpy2.rinterface_lib.embedded.RRuntimeError: Error in h5checktypeOrOpenLoc(file, readonly = TRUE, fapl = NULL, native = native) : 
  Error in h5checktypeOrOpenLoc(). Cannot open file. File 'NA' does not exist.
