
scverse / squidpy


Spatial Single Cell Analysis in Python

Home Page: https://squidpy.readthedocs.io/en/stable/

License: BSD 3-Clause "New" or "Revised" License

Python 99.92% Shell 0.08%
single-cell-rna-seq single-cell-genomics spatial-transcriptomics spatial-analysis image-analysis data-visualization squidpy

squidpy's Introduction


Squidpy - Spatial Single Cell Analysis in Python

Squidpy is a tool for the analysis and visualization of spatial molecular data. It builds on top of scanpy and anndata, from which it inherits modularity and scalability. It provides analysis tools that leverage the spatial coordinates of the data, as well as tissue images if available.

Visit our documentation for installation, tutorials, examples and more.

Squidpy is part of the scverse project (website, governance) and is fiscally sponsored by NumFOCUS. Please consider making a tax-deductible donation to help the project pay for developer time, professional services, travel, workshops, and a variety of other needs.


Manuscript

Please see our manuscript Palla, Spitzer et al. (2022) in Nature Methods to learn more.

Squidpy's key applications

  • Build and analyze the neighborhood graph from spatial coordinates.
  • Compute spatial statistics for cell-types and genes.
  • Efficiently store, analyze and visualize large tissue images, leveraging skimage.
  • Interactively explore anndata and large tissue images in napari.

Installation

Install Squidpy via PyPI by running:

pip install squidpy
# or with napari included
pip install 'squidpy[interactive]'

or via Conda as:

conda install -c conda-forge squidpy

Contributing to Squidpy

We are happy about any contributions! Before you start, check out our contributing guide.

squidpy's People

Contributors

annachristina, chaichontat, cornhundred, davidsebfischer, dfhannum, dineshpalli, djlee1, dplemonade, francescadr, ghar1821, giovp, gottfrid91, grst, hspitzer, ilan-gold, ilibarra, ivirshup, jo-mueller, koncopd, linearparadox, llehner, louisk92, m0hammadl, marcovarrone, michalk8, mikelkou, mxmstrmn, pre-commit-ci[bot], sabrinarichter, timtreis


squidpy's Issues

Function to build graph from spatial coordinates

Core function to compute the adjacency matrix from an array of spatial coordinates.

  • It should account for the type of spatial data (hex Visium spots or general spatial coordinates like FISH/IMC).
  • It should also let the user choose the neighborhood size (e.g. for Visium, the number of rings of spots surrounding a spot; for other technologies, the total number of neighbors).

You can look at an initial prototype from Isaac here: scverse/scanpy@c117508

However, we might want to incorporate a notion of radius (in pixel coordinates) that would better express distances in space.
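The radius idea can be sketched without any dependencies. This is only an illustration, not the prototype linked above: the function name `radius_adjacency` and the toy coordinates are made up for this example, and a real implementation would likely use a spatial index and a sparse matrix instead of a dense nested list.

```python
import math


def radius_adjacency(coords, radius):
    """Return a symmetric 0/1 adjacency matrix linking points within `radius`."""
    n = len(coords)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Euclidean distance in the same units as the coordinates (e.g. pixels).
            if math.dist(coords[i], coords[j]) <= radius:
                adj[i][j] = adj[j][i] = 1
    return adj


# Hypothetical layout: three spots 100 pixels apart plus one distant spot.
coords = [(0.0, 0.0), (100.0, 0.0), (200.0, 0.0), (1000.0, 0.0)]
adj = radius_adjacency(coords, radius=150.0)
```

With a radius of 150 pixels only directly adjacent spots are linked, and the distant spot stays isolated.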

package API

Should we define an API of operations that can be run on adata objects? At least for images, there are mostly array-manipulating functions right now; these could be wrapped for this API, or they could have their own.

Typing

Yeah, we should probably do it.

feature_evaluation grid search

Description

The cropping functions in manipulate.py and get_image_feature.py now take **kwargs, so multiple cropping parameters can easily be passed for a grid search. What remains is the evaluation script in hs_feature_evaluation.ipynb.

  1. Add a function to compute a cluster quality score for grid search comparisons.
  2. Finish the evaluation loop in hs_feature_evaluation.ipynb, computing features, clusters and the cluster quality score for each crop setting. See the notebook for more detail.

NOTE: multithreading will most likely be needed to make computing all features feasible.

Permutation based test

Permutation-based test, as in histoCAT, to compute neighborhood enrichment based on clusters in gene expression space. For reference, see paper
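A generic label-shuffling version of such a test could look like the sketch below. It is not claimed to reproduce histoCAT's exact procedure: the helper names, the chain graph and the number of permutations are all assumptions chosen for illustration.

```python
import random
import statistics


def count_pair_edges(edges, labels, a, b):
    """Count edges whose endpoints carry cluster labels a and b (either order)."""
    return sum(1 for i, j in edges if (labels[i], labels[j]) in ((a, b), (b, a)))


def enrichment_zscore(edges, labels, a, b, n_permutations=1000, seed=0):
    """Compare the observed a-b edge count with counts under shuffled labels."""
    rng = random.Random(seed)
    observed = count_pair_edges(edges, labels, a, b)
    shuffled = list(labels)
    null_counts = []
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # keeps the graph fixed, permutes the labels
        null_counts.append(count_pair_edges(edges, shuffled, a, b))
    mean = statistics.fmean(null_counts)
    std = statistics.pstdev(null_counts)
    zscore = (observed - mean) / std if std > 0 else float("nan")
    return observed, zscore


# Hypothetical chain of six nodes with alternating cluster labels.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
labels = ["A", "B", "A", "B", "A", "B"]
observed, zscore = enrichment_zscore(edges, labels, "A", "B")
```

In this toy graph every edge links an A node to a B node, so the observed count sits at the maximum and the z-score is positive.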

tutorial / analysis of image features

  • write a clean tutorial on how to use the implemented features
  • analysis: combine genes + features / compare clusters, overlap, etc. Look if we can find interesting things in the data with the new image features!

Agenda 24-09

Points to be discussed today with the group @hspitzer :

  • package organisation
    • new package api - @davidsebfischer comment in the API-Issue #39
    • cleanup / merging of scripts (_utils.py vs tools.py) + notebook organisation (naming convention + separate folders for devel and finished tutorials)
  • programming best practices
    • settle on one way of doing docstrings
    • write tests
    • develop in feature branches + create PRs.
  • make sure everybody knows what they can/should be working on.

Immediate next steps (for everyone):

  • adapt to new api, cleanup files
  • document + test

image plotting functions

which plotting functions do we need for the images?

  • plot feature (from adata.obsm) on image
    • for downscaled pngs this already works with scanpy
  • plot overlays on image (e.g. segmentation)
  • plot image crops (with features / overlays)
    • useful to speed up plotting or view details

for interactive plotting we could use napari, but I think that some static plotting functions would be useful as well. Then this would behave in a very similar way to scanpy, making it easy for the users to adopt.

Agenda 1-10

Points to be discussed

  • images:
    • features?
  • spatial graph:
    • Integrate CellPhoneDB/OmniPath into neighborhood enrichment to provide ligand-receptor pairs in a straightforward way (Giovanni)
    • The Ripley function relies on Astropy: include it as a requirement or re-implement from scratch?
    • Clustering with spatial coordinates: remove or adapt?
  • plotting:
    • move outside of both graph and image?
  • General:
    • Include other people?

Multi threading of feature calculation

Implement a multi-threaded/processed version of the feature calculation code.

Description:

  • Current function "get_features_abt.py" in ./spatial_tools/image/tool.py accumulates feature results from multiple spot ids into one data frame by looping through the spot_ids and adding them sequentially.

  • For certain features, e.g. HOG, this takes too long. The notebook hs_extract_image_features_table-multi-threading.ipynb currently contains a multi-threaded version, but it does not improve speed much.

Possible reasons why it does not help:

  1. Currently, all threads/processes access the same image to extract crops and features in parallel. This might cause queuing. A possible solution would be to pre-compute the crops and then parallelize the feature computation.
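The proposed fix (crop first, then parallelize) can be sketched as follows. Everything here is a stand-in: the image is a small nested list, `mean_intensity` replaces an expensive feature such as HOG, and the spot ids and crop size are invented for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def crop(image, row, col, size):
    """Copy a size x size window whose top-left corner is (row, col)."""
    return [r[col:col + size] for r in image[row:row + size]]


def mean_intensity(patch):
    """Toy stand-in for an expensive per-crop feature."""
    values = [v for r in patch for v in r]
    return sum(values) / len(values)


# Hypothetical 8 x 8 image and two spots given by their top-left corners.
image = [[r * 8 + c for c in range(8)] for r in range(8)]
spots = {"spot_a": (0, 0), "spot_b": (4, 4)}

# Step 1: extract all crops sequentially, so workers never touch the shared image.
crops = {spot_id: crop(image, r, c, 2) for spot_id, (r, c) in spots.items()}

# Step 2: compute features on the independent crops in parallel.
with ThreadPoolExecutor(max_workers=2) as pool:
    features = dict(zip(crops, pool.map(mean_intensity, crops.values())))
```

Threads are used only to keep the sketch short. For CPU-bound features written in pure Python, a process pool may be the better fit, since threads share the interpreter lock.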

statistic on node type connections in graph

when looking further into graph measures and metrics, I saw this networkx function:
https://networkx.github.io/documentation/stable/reference/algorithms/generated/networkx.algorithms.assortativity.node_attribute_xy.html#networkx.algorithms.assortativity.node_attribute_xy

It would be quite easy to extract the clusters, i.e. node types, that are often connected in the graph, and maybe conclude something about cell-cell interactions.

E.g. you could also use this information to check whether specific genes show higher expression when those cell types are close to each other.

Logging

Should we have a dedicated logging class like scanpy, or would standard logging do the job?

Agenda 8-10

roadmap

  • we now have a roadmap, check it out in projects

development practices:

  • is it a good time for typing?
  • remember to format with black and use pylint
  • how does counting contributions in the repo work? It seems that whoever squashes and merges is counted as the contributor, not the person who actually wrote the code. Does that make sense? Should the author of the PR always be the one to merge?
  • need to discuss push rights
  • need to discuss a common dataset to be stored in the repo for testing and CI

next steps

  • discuss next tasks for everyone

relate image features to discrete annotations (e.g. clusters) or continuous annotations (genes)

@LuckyMD's point during GM: provide ways to relate image features to discrete annotations, such as clusters found in either gene expression space or image feature space, or to continuous annotations, such as genes. Ways that this could be done:

  • correlate e.g. marker genes with image features @LouisK92
  • regression framework: Y = AX + B, where X are image features and Y are discrete or continuous annotations.
  • others
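For the regression idea, the simplest possible case is one image feature and one continuous annotation. The sketch below fits y = a * x + b by ordinary least squares; the function name and the per-spot values are hypothetical, and a real analysis would use many features and a proper regression library.

```python
def fit_line(x, y):
    """Ordinary least squares for y = a * x + b with a single feature x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    a = sxy / sxx  # slope
    b = mean_y - a * mean_x  # intercept
    return a, b


# Hypothetical per-spot values: one image feature and one marker gene.
feature = [0.0, 1.0, 2.0, 3.0]
expression = [1.0, 3.0, 5.0, 7.0]
a, b = fit_line(feature, expression)
```

Here the toy data lie exactly on a line, so the fit recovers its slope and intercept. For discrete annotations such as clusters, a classifier (for example logistic regression) would take the place of the linear fit.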

Account for cavities in crops and spatial graph

Basically, if you extract features on a crop larger than the spot size, how do you account for parts of the crop that no longer contain tissue?
The same applies to the spatial graph: sometimes the cavity may be a blood vessel, and you want to keep the spots on either side connected as neighbors.

functional API cropping

I think cropping should be possible on the adata object, i.e. creating new img entries, so that cropping can be varied in functional API workflows, e.g. load->crop->segment->uncrop->plot. I already have an uncrop function on the segmentation branch; we just need to wrap these in functions that operate on adata.

define setup file

would like to know which packages are optional and should go into functions and which are not

Counting by edges inflating z-scores for high number of n_rings.

The function permtest_leiden_pairs works well for nodes so far. However, it reports an inflated number of observed edges between leiden pairs when n_rings is 2 or higher. This can be seen visually in Visium cases where the top selected leiden pairs are actually separated by many nodes.

spatial_connectivity(adata, n_rings=2)
permtest_leiden_pairs(adata, count_option='edges',
                              print_log_each=25, n_permutations=n_permutations)

I can attach images for this issue. However, it needs to be tackled before the edges mode can be accepted.

implement methods for segmenting cells / nuclei from histopathological tissue images / fluorescence images

Benefit: knowledge of the cell count in each spot is very important for simulation for deconvolution, and for analysing data in general. On top of the cell segmentations, we could calculate shape / size statistics as additional features.

test_get_image_features is failing

The error occurs in this line: spot_diameter = adata.uns['spatial'][dataset_name]['scalefactors']['spot_diameter_fullres']
It fails because the dummy adata does not have `.uns['spatial']` containing the spot diameters.
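One way to fix the test is to give the dummy object the nested entry that the failing line reads. The sketch below uses a plain dict as a stand-in for adata.uns; the helper name `make_dummy_spatial_uns`, the dataset name and the diameter value are made up for illustration.

```python
def make_dummy_spatial_uns(dataset_name, spot_diameter_fullres):
    """Build the nested mapping that the failing line reads from."""
    return {
        "spatial": {
            dataset_name: {
                "scalefactors": {"spot_diameter_fullres": spot_diameter_fullres}
            }
        }
    }


# A plain dict stands in for adata.uns here; in the real test the same mapping
# would be stored on the dummy AnnData object's .uns attribute.
dataset_name = "dummy_dataset"
uns = make_dummy_spatial_uns(dataset_name, 89.5)
spot_diameter = uns["spatial"][dataset_name]["scalefactors"]["spot_diameter_fullres"]
```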

Additional features for pl.spatial function

  • Additional function parameters / changed functionality / changed defaults?
  • New analysis tool: A simple analysis tool you have been using and are missing in sc.tools?
  • New plotting function: A kind of plot you would like to see in sc.pl?
  • External tools: Do you know an existing package that should go into sc.external.*?
  • Other?

Here are some features that we should add after the scatterplot module is refactored:

  • Add a scalebar on the tissue image, as suggested on Twitter.
  • Change the default to “plot hires if available, else lowres if available, else nothing”.
  • Add alpha smoothing for continuous features, as in Seurat.

I'll modify this issue if other ideas pop up.

scaling of image features

Problem: Distributions of different features extracted from images might have very different value magnitudes. This will introduce biases etc. in downstream analyses (e.g. clustering).

Ideally we can scale all features between 0 and 1. This might be tricky: if a feature has a natural and reasonable min and max, it's easily done. For other features we could set 0 and 1 according to the observed min and max values (not ideal imo). Happy to discuss further.
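Both options can be covered by one small helper, sketched here under the assumption that a feature is a flat list of numbers. The name `min_max_scale` and the example values are illustrative only.

```python
def min_max_scale(values, lo=None, hi=None):
    """Scale values to [0, 1] using given bounds, or the observed min and max."""
    lo = min(values) if lo is None else lo
    hi = max(values) if hi is None else hi
    if hi == lo:
        # A constant feature carries no information; map it to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Feature with a natural range, e.g. 8-bit pixel intensities in 0..255.
scaled_natural = min_max_scale([0, 51, 255], lo=0, hi=255)

# Feature without a natural range: fall back to the observed min and max.
scaled_observed = min_max_scale([10.0, 20.0, 30.0])
```

Scaling by observed bounds makes the result depend on the dataset, and on outliers in particular, which is one reason it is the less attractive option.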

evaluation of different features

Some ideas:

  • Without ground truth: look at how good a clustering based on the features is (eg. silhouette score?)
  • With ground truth: Use gene expression space as GT (e.g. different cell types in brain have different morphology), and compare feature clustering to cell type clusters
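For the option without ground truth, the silhouette score can be computed directly from its definition. The sketch below is a dependency-free version using Euclidean distance; it assumes at least two clusters, and the points and labelings are toy values. In practice an existing library implementation would be the more sensible choice.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient with Euclidean distance (needs >= 2 clusters)."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


# Two well-separated pairs of points, labelled sensibly and badly.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = silhouette_score(points, [0, 0, 1, 1])
bad = silhouette_score(points, [0, 1, 0, 1])
```

The labeling that follows the spatial separation scores close to 1, while the one that mixes the two groups scores below 0.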
