theislab / campa

Conditional Autoencoders for Multiplexed Pixel Analysis

Home Page: https://campa.readthedocs.io

License: BSD 3-Clause "New" or "Revised" License

Languages: Jupyter Notebook 98.26%, Python 1.74%

campa's Introduction

CAMPA - Conditional Autoencoder for Multiplexed Pixel Analysis

CAMPA is a framework for quantitative analysis of subcellular multi-channel imaging data. It consists of a workflow that generates consistent subcellular landmarks (CSLs) using conditional variational autoencoders (cVAE). The output of the CAMPA workflow is an AnnData object that contains interpretable per-cell features summarizing the molecular composition and spatial arrangement of CSLs inside each cell.

CAMPA title figure

Visit our documentation for installation and usage examples.
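Since the output of the workflow is an AnnData object (see above), here is a minimal sketch of how one might inspect it, assuming the features were written to disk as an .h5ad file (the file name below is a placeholder, not a path the workflow is guaranteed to use):

import anndata as ad

# Load the per-cell feature table produced by the CAMPA workflow
# ("features.h5ad" is a placeholder; use the path your workflow wrote to).
adata = ad.read_h5ad("features.h5ad")

print(adata)             # overview: n_obs x n_vars plus obs/var/obsm keys
print(adata.obs.head())  # per-cell metadata
print(adata.var.head())  # feature annotations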

Manuscript

Please see our preprint "Learning consistent subcellular landmarks to quantify changes in multiplexed protein maps" (Spitzer, Berry et al. (2022)) to learn more.

Installation

CAMPA was developed for Python 3.9 and can be installed by running:

pip install campa
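CAMPA targets Python 3.9, so a quick sanity check of the interpreter and the installed package can catch environment problems early (assuming the package exposes a standard __version__ attribute; if not, the import alone still verifies the installation):

import sys

# Warn if the interpreter is not the Python version CAMPA was developed for.
if sys.version_info[:2] != (3, 9):
    print(f"Warning: running Python {sys.version_info.major}.{sys.version_info.minor}, "
          "but CAMPA was developed for Python 3.9.")

import campa  # raises ImportError if "pip install campa" did not succeed
print(getattr(campa, "__version__", "unknown"))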

Contributing

We welcome contributions! Before you start, check out our contributing guide.

campa's People

Contributors

hspitzer, nhorlava, scottberry


campa's Issues

Minor changes to plotting functions

Update legend title of dotplots

Dot size is -log(p-value); edit the legend to reflect that.

Co-occ grid: add labels on the y-axis of the grid

It's hard to understand which comparison you are looking at otherwise.
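A minimal matplotlib sketch of both requests (illustrative only, not CAMPA's actual plotting code): a dot-size legend explicitly labelled as -log10(p-value), and row labels on the y-axis of a plot grid.

import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)

# Dotplot with the size legend labelled as -log10(p-value)
neglog_p = rng.uniform(1, 10, size=20)
fig, ax = plt.subplots()
sc = ax.scatter(rng.normal(size=20), rng.normal(size=20), s=neglog_p * 20)
handles, labels = sc.legend_elements(prop="sizes", num=4, func=lambda s: s / 20)
ax.legend(handles, labels, title="-log10(p-value)")

# Grid of co-occurrence-style panels with a label on the y-axis of each row
rows, cols = ["cluster A", "cluster B"], ["cluster A", "cluster B", "cluster C"]
fig, axes = plt.subplots(len(rows), len(cols), sharex=True, sharey=True, squeeze=False)
for i, row_name in enumerate(rows):
    axes[i, 0].set_ylabel(row_name)  # the missing row label
    for j, col_name in enumerate(cols):
        axes[i, j].plot(rng.random(10))
        if i == 0:
            axes[i, j].set_title(col_name)
plt.show()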

Can I use this tool on other data?

Dear Professor,
I am a first-year Ph.D. student and I have read your paper "Learning consistent subcellular landmarks to quantify changes in multiplexed protein maps" carefully. However, my research focus is not proteins but spatial transcriptomics with subcellular resolution, and I wonder whether I can use your tool on my data.

Spatial transcriptomics with subcellular resolution, such as Stereo-seq, uses barcodes to represent spatial location information and also contains expression levels.

Looking forward to your reply! : )

install errors - potential conflict with tensorflow

Dear Developers,

I am trying to install campa on my Apple M1 Max machine.
Please note that I get the same error regardless of whether tensorflow is installed in my conda environment.
[screenshots of the error]

Normal installation of tensorflow does not work on my machine, so I had to do the following to install it:
conda install -c apple tensorflow-deps==2.7.0
python -m pip install tensorflow-macos==2.7.0
python -m pip install tensorflow-metal

After installing tensorflow, I see the following: [screenshot]
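A quick sanity check that the Metal build imports and finds the GPU (standard TensorFlow calls, independent of CAMPA):

import tensorflow as tf

print(tf.__version__)
# On Apple Silicon with tensorflow-metal installed, this should list a GPU device.
print(tf.config.list_physical_devices("GPU"))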

More details from pip freeze:

absl-py==1.1.0
appnope==0.1.3
argon2-cffi==21.3.0
argon2-cffi-bindings==21.2.0
asttokens==2.0.5
astunparse==1.6.3
attrs==21.4.0
backcall==0.2.0
beautifulsoup4==4.11.1
bleach==5.0.0
brotlipy @ file:///Users/runner/miniforge3/conda-bld/brotlipy_1648854242877/work
cached-property @ file:///home/conda/feedstock_root/build_artifacts/cached_property_1615209429212/work
cachetools==5.2.0
certifi==2022.6.15
cffi @ file:///Users/runner/miniforge3/conda-bld/cffi_1636046173594/work
charset-normalizer @ file:///home/conda/feedstock_root/build_artifacts/charset-normalizer_1644853463426/work
colorama @ file:///home/conda/feedstock_root/build_artifacts/colorama_1602866480661/work
conda-package-handling @ file:///Users/runner/miniforge3/conda-bld/conda-package-handling_1649385125392/work
cryptography @ file:///Users/runner/miniforge3/conda-bld/cryptography_1652967108255/work
cycler==0.11.0
czifile==2019.7.2
debugpy==1.6.0
decorator==5.1.1
defusedxml==0.7.1
entrypoints==0.4
executing==0.8.3
fastjsonschema==2.15.3
flatbuffers==1.12
fonttools==4.33.3
gast==0.4.0
google-auth==2.8.0
google-auth-oauthlib==0.4.6
google-pasta==0.2.0
grpcio @ file:///Users/runner/miniforge3/conda-bld/grpc-split_1655728648515/work
h5py @ file:///Users/runner/miniforge3/conda-bld/h5py_1637964045648/work
idna @ file:///home/conda/feedstock_root/build_artifacts/idna_1642433548627/work
imageio==2.19.3
importlib-metadata==4.11.4
ipykernel==6.15.0
ipython==8.4.0
ipython-genutils==0.2.0
ipywidgets==7.7.1
jedi==0.18.1
Jinja2==3.1.2
joblib @ file:///home/conda/feedstock_root/build_artifacts/joblib_1633637554808/work
jsonschema==4.6.0
jupyter==1.0.0
jupyter-client==7.3.4
jupyter-console==6.4.4
jupyter-core==4.10.0
jupyterlab-pygments==0.2.2
jupyterlab-widgets==1.1.1
keras==2.9.0
Keras-Preprocessing==1.1.2
kiwisolver==1.4.3
libclang==14.0.1
libmambapy @ file:///Users/runner/miniforge3/conda-bld/mamba-split_1649138467459/work/libmambapy
llvmlite==0.38.1
Markdown==3.3.7
MarkupSafe==2.1.1
matplotlib==3.5.2
matplotlib-inline==0.1.3
mistune==0.8.4
nbclient==0.6.4
nbconvert==6.5.0
nbformat==5.4.0
nest-asyncio==1.5.5
networkx==2.8.4
notebook==6.4.12
numba==0.55.2
numpy @ file:///Users/runner/miniforge3/conda-bld/numpy_1653325964689/work
oauthlib==3.2.0
opencv-python==4.6.0.66
opt-einsum==3.3.0
packaging==21.3
pandas==1.4.2
pandocfilters==1.5.0
parso==0.8.3
pexpect==4.8.0
pickleshare==0.7.5
Pillow==9.1.1
prometheus-client==0.14.1
prompt-toolkit==3.0.29
protobuf==3.19.4
psutil==5.9.1
ptyprocess==0.7.0
pure-eval==0.2.2
pyasn1==0.4.8
pyasn1-modules==0.2.8
pycosat @ file:///Users/runner/miniforge3/conda-bld/pycosat_1649384941891/work
pycparser @ file:///home/conda/feedstock_root/build_artifacts/pycparser_1636257122734/work
Pygments==2.12.0
pynndescent==0.5.7
pyOpenSSL @ file:///home/conda/feedstock_root/build_artifacts/pyopenssl_1643496850550/work
pyparsing==3.0.9
pyrsistent==0.18.1
PySocks @ file:///Users/runner/miniforge3/conda-bld/pysocks_1648857374584/work
python-dateutil @ file:///home/conda/feedstock_root/build_artifacts/python-dateutil_1626286286081/work
pytz @ file:///home/conda/feedstock_root/build_artifacts/pytz_1647961439546/work
PyWavelets==1.3.0
pyzmq==23.2.0
qtconsole==5.3.1
QtPy==2.1.0
requests @ file:///home/conda/feedstock_root/build_artifacts/requests_1641580202195/work
requests-oauthlib==1.3.1
rsa==4.8
ruamel-yaml-conda @ file:///Users/runner/miniforge3/conda-bld/ruamel_yaml_1653464548430/work
scikit-image==0.19.3
scikit-learn @ file:///Users/runner/miniforge3/conda-bld/scikit-learn_1652976950430/work
scipy @ file:///Users/runner/miniforge3/conda-bld/scipy_1653074075583/work
seaborn==0.11.2
Send2Trash==1.8.0
six @ file:///home/conda/feedstock_root/build_artifacts/six_1620240208055/work
soupsieve==2.3.2.post1
stack-data==0.3.0
tensorboard==2.9.1
tensorboard-data-server==0.6.1
tensorboard-plugin-wit==1.8.1
tensorflow-estimator==2.9.0
tensorflow-macos==2.9.2
tensorflow-metal==0.5.0
termcolor==1.1.0
terminado==0.15.0
threadpoolctl @ file:///home/conda/feedstock_root/build_artifacts/threadpoolctl_1643647933166/work
tifffile==2022.5.4
tinycss2==1.1.1
tornado==6.1
tqdm @ file:///home/conda/feedstock_root/build_artifacts/tqdm_1649051611147/work
traitlets==5.3.0
typing_extensions==4.2.0
umap-learn==0.5.3
urllib3 @ file:///home/conda/feedstock_root/build_artifacts/urllib3_1647489083693/work
wcwidth==0.2.5
webencodings==0.5.1
Werkzeug==2.1.2
widgetsnbextension==3.6.1
wrapt==1.14.1
zipp==3.8.0

Documentation review

Here are some comments after looking at the tutorials. What I did was: 1) follow the installation instructions in the CONTRIBUTING file, 2) work through the tutorials, 3) quickly look through the other documentation. I did this all pretty quickly so I probably missed some things and I didn't cover any of the command line stuff other than campa setup. Some of the comments are just ideas/suggestions/questions so don't feel like you have to do/respond to everything. Overall I think it's already really good but hopefully the comments are helpful.

General comments

  • In the tutorials it would be good to explain the output a bit more. You do a pretty good job of covering the steps and parameters but sometimes I was unsure what the results were. Doing things like showing what/where results have been added to an object or what the columns in a table mean can be really helpful for new starters.
  • I wonder if it would be worth having another section in the docs for the CLI? I didn't try using this but I'm not sure how easy it would be currently.
  • There were some steps in the notebooks where I felt like I was working on "magic" variables (things that weren't defined previously). I would try to avoid this as much as possible or at least explain when this happens.
  • I wasn't sure when things should always be done on subsets or when that was just to make things faster for the examples. It would be good to make that clearer and explain why when things should always be subsetted.
  • There were some typos/formatting issues in a few places. Nothing major so I didn't mention them here but if you want me to look at them and maybe open a PR with suggestions let me know.

Installation/setup

  • Some of the linting and tests failed for me (good chance that is something to do with my environment but thought I should mention anyway).
  • I got this error from campa setup. I was able to run things anyway so I think it still worked, but it's probably worth looking at.
ERROR: None of [PosixPath('/Users/luke.zappia/Documents/Code/GitHub-theislab/campa/campa.ini'), PosixPath('/Users/luke.zappia/Documents/Code/GitHub-theislab/campa/config.ini'), PosixPath('/Users/luke.zappia/.config/campa/campa.ini')] exists. Please create a config with "python cli/setup.py"
experiment_dir not defined in [PosixPath('/Users/luke.zappia/Documents/Code/GitHub-theislab/campa/campa.ini'), PosixPath('/Users/luke.zappia/Documents/Code/GitHub-theislab/campa/config.ini'), PosixPath('/Users/luke.zappia/.config/campa/campa.ini')]
data_dir not defined in [PosixPath('/Users/luke.zappia/Documents/Code/GitHub-theislab/campa/campa.ini'), PosixPath('/Users/luke.zappia/Documents/Code/GitHub-theislab/campa/config.ini'), PosixPath('/Users/luke.zappia/.config/campa/campa.ini')]
co_occ_chunk_size not defined in [PosixPath('/Users/luke.zappia/Documents/Code/GitHub-theislab/campa/campa.ini'), PosixPath('/Users/luke.zappia/Documents/Code/GitHub-theislab/campa/config.ini'), PosixPath('/Users/luke.zappia/.config/campa/campa.ini')]
/Users/luke.zappia/Documents/Code/GitHub-theislab/campa/campa/campa.ini.example
No campa.ini found in /Users/luke.zappia/.config/campa/campa.ini. Creating default config file.
setting up ExampleData config
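For reference, a sketch of writing a config that covers the fields named in the error above. The field names come from the error message; the section layout and values are assumptions for illustration, and "campa setup" remains the authoritative way to create the file.

from configparser import ConfigParser
from pathlib import Path

# Fields taken from the error message above; values are placeholders.
config = ConfigParser()
config["DEFAULT"] = {
    "experiment_dir": "/path/to/experiments",
    "data_dir": "/path/to/data",
    "co_occ_chunk_size": "10000",
}

target = Path.home() / ".config" / "campa" / "campa.ini"
target.parent.mkdir(parents=True, exist_ok=True)
with open(target, "w") as f:
    config.write(f)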

Introduction

  • It would be good to have more of an introduction/getting started/overview document. This could cover how to create a CAMPA project but also things like terminology. There were some terms used in the tutorials that I wasn't familiar with and that sometimes make it hard to understand what was happening.

Config file(s)

  • I was a bit unclear which config file was being used and the search order (the docs made it seem like the global file is preferred over any local one)
  • I wonder if it would be better to use a more common format like YAML or JSON rather than INI? People are more likely to be familiar with those, which could lead to fewer config issues.
    • I also think one of these formats would be better for the params files than a Python script. If this is just a dictionary there shouldn't be any issues converting it, and it would be easier for people who aren't familiar with Python syntax (see the sketch after this list). Generally, I think that when using a command line tool you shouldn't have to know anything about the language it was written in (or any programming language).
  • It would be good to have some documentation for the fields of the config files. I feel like there are probably things that aren't included in the examples.
  • Does the example main config file contain all the possible fields or can it be more complex than this? If this is everything I wonder if needing this could be avoided.
  • If possible I would avoid using config files in the interactive notebooks and just manually set parameters. I think this will be less confusing to people and give you an opportunity to explain what they do.
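A sketch of the params-file suggestion above, assuming the params file is essentially a flat dictionary (the parameter names below are invented for illustration and are not CAMPA's actual parameters):

import json

# Hypothetical parameters, stand-ins for whatever the Python params script defines.
params = {
    "experiment_name": "VAE",
    "epochs": 20,
    "learning_rate": 0.001,
}

# Write/read as JSON instead of a Python script; YAML would work the same way.
with open("params.json", "w") as f:
    json.dump(params, f, indent=2)

with open("params.json") as f:
    assert json.load(f) == params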

Tutorials

  • It would be good to make it clear when things are using the files on disk and when we are working in memory. I wasn't sure which objects/functions did which.
  • Related to this it would be good to say when things should usually be done on the command line (because of runtime, memory etc.) and when they are done interactively. You show the CLI commands sometimes, but I wasn't sure whether they just can be used or whether they should be used.

MPPData

  • Loading the data here didn't work for me, I had to change the data_config argument from "TestData" to "exampledata"
  • Could you explain more about what is being loaded? I wasn't sure what the input was or if it had already been processed in some way for the examples.
  • It would be good to explain more about the object, what it contains, how to access stuff etc. There were times when I wanted to check what a function had done but I didn't know how to access things (this applies to your custom objects more generally).
  • In subset_channels() maybe it would be good to have an argument that lets you exclude things rather than having to list everything you want to keep?
    • It would also be good to have an explanation of why we exclude the "00_EU" channel in this example
  • In the conditions section I think you need to explain the string format a bit more. The print output does this but it wasn't in the text.
  • When I ran the normalize() step the output list was empty, unlike what was in the docs (possibly my fault)
  • It would be good to explain the difference between the subset and subsample functions (and when each should be used)
  • Is there a reason to have a special case of the get_object_img() function for three channels?
  • Could the code that loops over and plots all channels be made into a function? I feel like people will want to do this fairly frequently.
    • It would also be good if this function labelled the plots; I don't think you can tell which channel is which at the moment (a sketch of such a helper follows this list).
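A hypothetical helper along these lines, using plain numpy/matplotlib (plot_all_channels and its signature are invented for illustration and are not part of the CAMPA API):

import math
import matplotlib.pyplot as plt
import numpy as np

def plot_all_channels(img, channel_names, ncols=4, cmap="viridis"):
    """Plot every channel of an (H, W, C) image stack, titled with its channel name."""
    n = img.shape[-1]
    nrows = math.ceil(n / ncols)
    fig, axes = plt.subplots(nrows, ncols, figsize=(3 * ncols, 3 * nrows), squeeze=False)
    for ax, i in zip(axes.ravel(), range(n)):
        ax.imshow(img[..., i], cmap=cmap)
        ax.set_title(channel_names[i])
        ax.axis("off")
    for ax in axes.ravel()[n:]:  # hide any unused panels
        ax.axis("off")
    return fig

# Example with random data standing in for a cell image returned by the package:
fig = plot_all_channels(np.random.rand(64, 64, 6), [f"ch{i}" for i in range(6)], ncols=3)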

Model

  • I had to modify the path to the params file. This is probably due to where I was running things but maybe worth mentioning in the text.
  • I think this had a note that it wasn't finished so you probably know this already but it could do with more detail/explanation.
  • There was a commented code chunk somewhere, I would delete this if it's not needed
  • The ModelComparator class is cool! It's great to have inbuilt functionality for this. My output plots were different to what was in the docs though.
  • I had an error with plot_cluster_images() ('test/VAE/results_epoch020/val/clustering_annotation.csv' missing). Maybe I missed something?

Clustering

  • Can the exploration functions you define in the notebook be part of the package?
  • I had an error when loading cells for plotting clusters (TypeError: expected str, bytes or os.PathLike object, not NoneType)
  • Is the legend bar plot correct? All the values are 1. If you just want it to show the colours I would remove the scale on the x-axis.

Features

  • Needed to modify the path to the params file again
  • You might need a smaller example for this notebook. The extract_features() step took 70 mins on my laptop. Possibly that has something to do with my setup but my laptop is more powerful than average. Everything else was really fast so would be good if this bit didn't take too long.
  • I wasn't sure where the output of extract_intensity_csv() was created?
  • In the foldchange plot are the circle sizes -log(p-value) rather than just p-values? If so I would update the legend title.
  • In the co-occurrences distance plot I think you need labels on the y-axis of the grid as well, otherwise it's a bit hard to tell which comparison you are looking at

Function docs

  • It would be good to have examples for functions, particularly for things like plot to see what they do
  • There are a few TODO notes in places (I'm sure you know that already but just a reminder 😸)

I hope that helps. Please let me know if anything isn't clear. Congrats again on the package 🎉!

Tests and Documentation

List of things to test & document

Tests

  • add notebooks to pytest
  • tests for extracting features (shapes correct? values correct?)
  • test for nn_dataset: correct size, shuffling working? (a sketch follows this list)
  • test for cVAE model creation: do experiment parameters result in correct architecture?
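A pytest-style sketch of the dataset check above; the shuffle here is a stand-in for the real nn_dataset creation call, whose API may differ:

import numpy as np

def test_dataset_size_and_shuffling():
    rng = np.random.default_rng(0)
    data = rng.normal(size=(100, 5)).astype(np.float32)

    # Stand-in for the real dataset creation with shuffling enabled.
    shuffled = data[rng.permutation(len(data))]

    # Correct size ...
    assert shuffled.shape == data.shape
    # ... and shuffling changed the order but not the contents.
    assert not np.array_equal(shuffled, data)
    assert np.array_equal(np.sort(shuffled, axis=0), np.sort(data, axis=0))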

Documentation

  • extend extract_features notebook
  • add more description to notebooks
  • add readthedocs + polish docstrings
