Coder Social home page Coder Social logo

giacomos97 / phenographproject Goto Github PK

View Code? Open in Web Editor NEW
0.0 1.0 0.0 134 KB

Repository dedicated to my project on PhenoGraph.

License: GNU General Public License v3.0

Python 68.41% Jupyter Notebook 31.59%
single-cell-analysis phenograph cell-clusters clustering-algorithm louvain umap

phenographproject's Introduction

PhenoGraph clustering project

This project implements PhenoGraph as clustering method for Single-Cell measurements.
Once provided both the measurements dataset and the spatial coordinates of the cells, this script will perform the necessary steps to generate a spatial representation of the cells clusters. The image below summarizes the steps that make up the process. workflow_representation

Pre-requisites

To run this code, install in your environment the following libraries.
anndata scanpy numpy pandas pathlib plotly datetime configparser uuid

Testing

To perform the testing routine, when in the main directory of this project run pytest testing.py from your python console. The following tests will be performed:

  • test_source_data_reading: verify proper functionality of I/O functions
  • test_clustering_reproducibility: verify whether two identical executions of the clustering process provide the same result
  • test_clustering_reliability: verify proper functionality of clustering process by comparing results with the expected outcome

On an average laptop, this testing routine might require approximately 2 minutes.

How to run the code

Following these instructions you may run the code without performing any adaptation. All user parameters can be customized inside an external configuration file.

First, verify that your input data is structured as expected, and tune the configuration.txt file according to your needs. Once done,you will have two options:

Both implementations contain the same procedure, thus providing equivalent results.
Check messages and results appearing in the console during execution. All plots shown will be saved inside the figures directory you selected.

Structure of input data

The program expects two .csv files:

  • One file containing data on single-cell measurements.
    The document should contain comma separated columns and be organized as: first column as cell number, one row for each cell, one more column for each feature.
    For example:
CD14C01CD16C02HL CD14C01CD16C02HLA CD1aC01SynCAMC02RB ...
1 0.145321181994838 -0.7427901168535 0.8510099217044
2 0.009652348004606 0.14785165558968 -1.0977624202569
3 1.254862485568716 0.25684126698538 -0.0489567128639
...
  • One file containing the spatial coordinates of each cell.
    The document should contain three comma-separated columns (without any header) describing: cell number, x coordinate, y coordinate.
    For example:
1 5 7.5
2 8 7
3 5.2 7
...

Cell number must have exact correspondence in both files (i.e. cell number 1 in measurements file should correspond to cell number 1 in coordinates file). Hence both files should contain the same amount of cells.

Parameters in configuration.txt

[paths]
data_source -> Path of .csv file containing single-cell measurements.

coordinates_source -> Path of .csv file containing cells' coordinates.

figures_dir -> Path where output figures will be stored (automatically created if not existing).

temp_dir -> Path where output temporary files will be stored (automatically created if not existing).

[scanpy_figures]
Parameters listed in this category will be provided to the scanpy.set_figure_params method.

[scanpy_PCA]
n_comps -> Number of components obtained from PCA. Possible values: 0 (mantain all the components available - no reduction), any n>0 (reduce to n components), any n<0 (reduce by n dimensions from the maximum number of available components). Indeed, this value must not exceed the effective number of components available.

init_seed -> Initial seed for the computation.

[scanpy_NEIGHBORS]
n_comps -> Number of components obtained from neighborhood graph. Possible values: 0 (mantain all the components available - no reduction), any n>0 (reduce to n components), any n<0 (reduce by n dimensions from the maximum number of available components). Indeed, this value must not exceed the effective number of components available.

n_neighbors -> Number of nearest neighbors.

init_seed -> Initial seed for the computation.

[scanpy_UMAP]
n_comps -> Number of components obtained from UMAP. Possible values: 0 (mantain all the components available - no reduction), any n>0 (reduce to n components), any n<0 (reduce by n dimensions from the maximum number of available components). Indeed, this value must not exceed the effective number of components available.

init_seed -> Initial seed for the computation.

[scanpy_PHENOGRAPH]
clustering_algo -> Clustering algorithm, whether 'louvain' or 'leiden'.

n_neighbors -> Number of nearest neighbors.

n_jobs -> Number of threads (multiprocessing).

[scanpy_PCA_plot]
Parameters listed in this category will be provided to the sc.pl.pca method.

[scanpy_UMAP_plot]
Parameters listed in this category will be provided to the sc.pl.umap method.

[scanpy_CLUSTERS_plot]
Parameters listed in this category will be provided to the fig.update_layout method.

Tips

Foresee backing up results at intermediate stages of the process

With the current implementation, results obtained at each intermediate step might get lost in case of accidental/unexpected interruption before the complete execution of the process. To mitigate that risk, after each step you may add the line (currently present only at the end):

filename = data_IO.WRITE_ADATA(adata, filename, temp_dir)

This would allow you to store newly obtained results by overwriting the work dataset inside temp directory. At any time, you will be able to retrieve such file and continue the execution of the process from where you left off. To re-import a specific work dataset (check its filename) run:

adata = data_IO.READ_ADATA(filename, temp_dir)

Possible warning messages

References

Check the following references for more details on the parameters available in each function.

Principal component analysis (PCA):
https://scanpy.readthedocs.io/en/stable/generated/scanpy.tl.pca.html
Uniform Manifold Approximation and Projection for Dimension Reduction (UMAP):
https://scanpy.readthedocs.io/en/stable/generated/scanpy.tl.umap.html
PhenoGraph clustering method:
https://scanpy.readthedocs.io/en/stable/generated/scanpy.external.tl.phenograph.html

Plotting PCA results:
https://scanpy.readthedocs.io/en/stable/generated/scanpy.pl.pca.html
Plotting UMAP results:
https://scanpy.readthedocs.io/en/stable/generated/scanpy.pl.umap.html

phenographproject's People

Contributors

giacomos97 avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.