Please read the documentation.
You can install pertpy via pip from PyPI:
pip install pertpy
if you want to use scCODA please install it as:
pip install pertpy[coda]
Perturbation Analysis in the scverse ecosystem.
Home Page: https://pertpy.readthedocs.io/en/latest/
License: MIT License
Please read the documentation.
You can install pertpy via pip from PyPI:
pip install pertpy
if you want to use scCODA please install it as:
pip install pertpy[coda]
Title
Noting down issues to fix and potential improvements to milopy, encountered while writing tutorial (theislab/pertpy-tutorials#4) - issues entirely inherited from my code ofc :)
milo.add_covariate_to_nhoods_var
add_covariate_to_nhoods_obs
and update docscategory
to object
and then nhood_counts_by_cond
complainspt.pl.milo.nhood_counts_by_cond
add_covariate_to_nhoods_var
)milo.da_nhoods
:
subset_samples
and subset_nhoods
is buggy, in some edge cases this messes up the SpatialFDR calculation. While the bug is easy enough to catch with diagnostic plots (randomized SpatialFDR) and to fix (emdann/milopy#30), I'm wondering whether these parameters should be dropped completely in the new implementation in pertpy. The use case of subset_samples
is essentially covered by specifying contrasts (+ adding a new column in adata.obs if needed to specify groups). Using subset_nhoods
is somewhat of a pathological hack (I don't think I've ever used it and I can't even remember the original use case for adding this parameter). In practice if one wants to test on a specific subset of cell phenotypes the best solution is to subset cells and restart from KNN graph building.Happy to open a PR to implement these changes.
Would be great to have support for Perturb-ATAC in pertpy.
https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE116249 has the ATAC data. Not sure where we get the information on the perturbations from?
The first row of this data are the cells (remove the suffixes) and the chromosomes are the var. Rest are counts.
Will come up with a plan to work with this.
Currently not showing up
Write dataloaders for all of
https://zenodo.org/record/7278143
that we are not covering yet.
Source: http://projects.sanderlab.org/scperturb/
I installed the latest version of pertpy from GitHub pull but getting an error at the step below:
> milo = pt.tl.Milopy()
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Cell In[47], line 1
----> 1 milo = pt.tl.Milopy()
AttributeError: module 'pertpy.tools' has no attribute 'Milopy'
However this works-
> milo = pt.tl.Milo()
The tutorial needs to be updated.
No response
Currently folders in tl and pl. Not totally happy about it and the resulting docs.
I shall think about it. Likely has to be a class.
We received the request to add support for guide RNA assignment.
I am trying to using code from the "Compositional Analysis" tutorial from Single Cell Best Practices to analyze compositional differences in cell types along two dimensions (two covariates). When I run sccoda_model.load()
using my two covariates ('age' and 'region', see below):
sccoda_model = pt.tl.Sccoda() sccoda_data = sccoda_model.load( adata, type="cell_level", generate_sample_level=True, cell_type_identifier="cell_type", sample_identifier="replicate", covariate_obs=["age","region"], )
I get the following error:
"Covariate region has non-unique values! Skipping...". However, the categories in "region" are unique. The same results when I run the above with "region" as the sole covariate, but not when I run with "age" as the sole covariate. Both are categorical types in pandas series. Any ideas on the source of this issue would be greatly appreciated, thanks!
No response
The link to the usage in readme doesn't work. If this issue is not closed in one of the PRs, I suggest changing it from
https://pertpy.readthedocs.io/en/latest/usage.html
to
https://pertpy.readthedocs.io/en/latest/usage/usage.html
Just like I did for ehrapy
To be able to validate our approach and to compare against mixscape we need to run the complete mixscape vignette on our data (and ideally their data if available).
We should benchmark the speed of all tools to highlight that our versions are faster and scale better.
Let's use our cluster and keep the runtime environment consistent -> always the same pertpy version (please start with the current main branch because I made lots of updates...) and the same number of CPUs/RAM on our cluster.
The tutorials can be found here: https://pertpy.readthedocs.io/en/latest/tutorials/index.html . The respective tutorials of the tools define the pipeline that we want to benchmark.
We always use the datasets that are defined in the pertpy dataloaders. They should also always exist for the R versions and the other versions. Always collect the runtime.
The intelex patch could make the random forest and logistic regression computations several times faster.
For reference: https://github.com/intel/scikit-learn-intelex
Example implementation:
https://github.com/theislab/ehrapy/blob/development/ehrapy/preprocessing/_data_imputation.py#L217
python:
version: 3.8
install:
- method: pip
path: .
- requirements: docs/requirements.txt
in .readthedocs.yml
Following the mixscape tutorial
pt.pl.ms.barplot(mdata['rna'])
results in
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
File ~/miniconda3/envs/bp_perturbation_modeling/lib/python3.8/site-packages/pandas/core/indexes/base.py:3621, in Index.get_loc(self, key, method, tolerance)
3620 try:
-> 3621 return self._engine.get_loc(casted_key)
3622 except KeyError as err:
File ~/miniconda3/envs/bp_perturbation_modeling/lib/python3.8/site-packages/pandas/_libs/index.pyx:136, in pandas._libs.index.IndexEngine.get_loc()
File ~/miniconda3/envs/bp_perturbation_modeling/lib/python3.8/site-packages/pandas/_libs/index.pyx:163, in pandas._libs.index.IndexEngine.get_loc()
File pandas/_libs/hashtable_class_helper.pxi:5198, in pandas._libs.hashtable.PyObjectHashTable.get_item()
File pandas/_libs/hashtable_class_helper.pxi:5206, in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 'NT'
The above exception was the direct cause of the following exception:
KeyError Traceback (most recent call last)
/home/zeth/PycharmProjects/extended-single-cell-best-practices/jupyter-book/conditions/perturbation_modeling.ipynb Cell 139 in <cell line: 1>()
----> [1](vscode-notebook-cell:/home/zeth/PycharmProjects/extended-single-cell-best-practices/jupyter-book/conditions/perturbation_modeling.ipynb#ch0000137?line=0) pt.pl.ms.barplot(mdata['rna'])
File ~/miniconda3/envs/bp_perturbation_modeling/lib/python3.8/site-packages/pertpy/plot/_mixscape.py:71, in MixscapePlot.barplot(adata, control, mixscape_class_global, axis_text_x_size, axis_text_y_size, axis_title_size, strip_text_size, panel_spacing_x, panel_spacing_y, legend_title_size, legend_text_size, show, save)
69 if mixscape_class_global not in adata.obs:
70 raise ValueError("Please run `pt.tl.mixscape` first.")
---> 71 count = pd.crosstab(index=adata.obs[mixscape_class_global], columns=adata.obs[control])
72 all_cells_percentage = pd.melt(count / count.sum(), ignore_index=False).reset_index()
73 KO_cells_percentage = all_cells_percentage[all_cells_percentage[mixscape_class_global] == "KO"]
File ~/miniconda3/envs/bp_perturbation_modeling/lib/python3.8/site-packages/pandas/core/frame.py:3505, in DataFrame.__getitem__(self, key)
3503 if self.columns.nlevels > 1:
3504 return self._getitem_multilevel(key)
-> 3505 indexer = self.columns.get_loc(key)
3506 if is_integer(indexer):
3507 indexer = [indexer]
File ~/miniconda3/envs/bp_perturbation_modeling/lib/python3.8/site-packages/pandas/core/indexes/base.py:3623, in Index.get_loc(self, key, method, tolerance)
3621 return self._engine.get_loc(casted_key)
3622 except KeyError as err:
-> 3623 raise KeyError(key) from err
3624 except TypeError:
3625 # If we have a listlike key, _check_indexing_error will raise
3626 # InvalidIndexError. Otherwise we fall through and re-raise
3627 # the TypeError.
3628 self._check_indexing_error(key)
KeyError: 'NT'
compare to: mixscape perturbed vs not-perturbed states
Thinks about how to visualize:
How to visualize :
Distances to implement
Tests to implement
Implementation plan
_distances.py
to tools in pertpyAlso in tutorials + book
pt.tl.Dialogue().calculate_multifactor_PMD()
adds MCPs to .obs
with int
keys instead of str
leading to error on adata.write_h5ad()
:
Traceback (most recent call last):
File "/Users/eliaskahl/miniconda3/envs/pertpy-apple-silicon/lib/python3.10/site-packages/anndata/_io/utils.py", line 214, in func_wrapper
return func(elem, key, val, *args, **kwargs)
File "/Users/eliaskahl/miniconda3/envs/pertpy-apple-silicon/lib/python3.10/site-packages/anndata/_io/specs/registry.py", line 175, in write_elem
_REGISTRY.get_writer(dest_type, t, modifiers)(f, k, elem, *args, **kwargs)
File "/Users/eliaskahl/miniconda3/envs/pertpy-apple-silicon/lib/python3.10/site-packages/anndata/_io/specs/registry.py", line 24, in wrapper
result = func(g, k, *args, **kwargs)
File "/Users/eliaskahl/miniconda3/envs/pertpy-apple-silicon/lib/python3.10/site-packages/anndata/_io/specs/methods.py", line 499, in write_dataframe
col_names = [check_key(c) for c in df.columns]
File "/Users/eliaskahl/miniconda3/envs/pertpy-apple-silicon/lib/python3.10/site-packages/anndata/_io/specs/methods.py", line 499, in <listcomp>
col_names = [check_key(c) for c in df.columns]
File "/Users/eliaskahl/miniconda3/envs/pertpy-apple-silicon/lib/python3.10/site-packages/anndata/_io/utils.py", line 109, in check_key
raise TypeError(f"{key} of type {typ} is an invalid key. Should be str.")
TypeError: 0 of type <class 'int'> is an invalid key. Should be str.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/Users/eliaskahl/workspace/github.com/theislab-gobi-sc-dialogue/dialogue-experiments-elias/mre-mcp-str-keys.py", line 19, in <module>
adata.write_h5ad("data/dialogue-example_pmd.h5ad")
File "/Users/eliaskahl/miniconda3/envs/pertpy-apple-silicon/lib/python3.10/site-packages/anndata/_core/anndata.py", line 1918, in write_h5ad
_write_h5ad(
File "/Users/eliaskahl/miniconda3/envs/pertpy-apple-silicon/lib/python3.10/site-packages/anndata/_io/h5ad.py", line 98, in write_h5ad
write_elem(f, "obs", adata.obs, dataset_kwargs=dataset_kwargs)
File "/Users/eliaskahl/miniconda3/envs/pertpy-apple-silicon/lib/python3.10/site-packages/anndata/_io/utils.py", line 220, in func_wrapper
raise type(e)(
TypeError: 0 of type <class 'int'> is an invalid key. Should be str.
Minimal reproducible example:
import pertpy as pt
import scanpy as sc
import session_info
session_info.show(html=False, dependencies=True)
adata = pt.dt.dialogue_example()
sc.pp.pca(adata)
dl = pt.tl.Dialogue()
adata, mcps, ws, ct_subs = dl.calculate_multifactor_PMD(
adata,
sample_id="clinical.status",
celltype_key="cell.subtypes",
n_counts_key="nCount_RNA",
normalize=True,
)
adata.write_h5ad("data/dialogue-example_pmd.h5ad")
Session information updated at 2023-03-01 13:34
Hello! As a python/scanpy user, I am happy to see that you are working on a scverse impelementation of mixscape. I have attempted to follow along with this notebook : https://github.com/theislab/pertpy-tutorials/blob/main/mixscape.ipynb using a third-party dataset (this. I have been using pertpy 0.3
I have had difficulty, however, because the names of some columns/features are occasionally required to have a certain value. For example, pt.tl.Mixscape.pert_sign
apparently requires that the supplied AnnData
object has a column in its .obs
table of the format <gene_target>g<#>
that has the same name as the nontargeting control. As far as I can tell, this is not documented.
Similarly, in pt.tl.Mixscape.lda
value of the labels
keyword argument must be 'gene_target' for proper function, since that is hard-coded elsewhere in the function.
These are just the functions that I have been able to make work. pt.pl.ms.barplot
fails to work, and I cannot decipher the traceback to understand how to reconfigure my AnnData.obs
table properly. I also cannot find how to make pt.pl.ms.lda
work properly, since some array is getting reshaped during function runtime in ways that I cannot understand.
If this pipeline is working as designed, I would greatly appreciate a tool or guide to make sure that my AnnData.obs
table has all required columns formatted properly.
The goal of this module should be around answering questions about high-level biological changes and enrichment analysis. For example which terms will appear after a certain perturbation?
@ilibarra what would be ideas to implement here?
https://github.com/theislab/jaxpert/tree/main/jaxscgen/jaxscgen
Will need to figure out how to make the dependencies optional. Jax is not really installable on Windows.
A reimplementation of https://github.com/livnatje/DIALOGUE could be awesome.
Then we could also talk to @davidsebfischer et al. to see how far their MCP method is.
We have multiple datasets that we need to prepare:
https://www.nature.com/articles/s41588-021-00778-2 (mixscape)
Christians Bocks dataset (we already have it)
Hanifta large scale cite-se datasets with covid 19 (Chris has the data, large-scale CITE-seq, maybe goo to show scale)
We want to integrate https://github.com/emdann/milopy into pertpy with an updated version
Tasks (roughly in order) are:
_milopy.py
file.Hello!
When I'm trying to reproduce the tutorial with a different dataset, I receive the following error in the function Mixscape.mixscape
.
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Cell In[70], line 1
----> 1 mixscape_identifier.mixscape(adata = adata, control = 'control', labels='perturbation', layer='X_pert')
File ~/miniconda3/envs/pertpy/lib/python3.9/site-packages/pertpy/tools/_mixscape.py:244, in Mixscape.mixscape(self, adata, labels, control, new_class_name, min_de_genes, layer, logfc_threshold, iter_num, split_by, pval_cutoff, perturbation_type, copy)
240 vec = np.mean(X[guide_cells][:, de_genes_indices], axis=0) - np.mean(
241 X[nt_cells][:, de_genes_indices], axis=0
242 )
243 # project cells onto the perturbation vector
--> 244 pvec = np.sum(np.multiply(dat.toarray(), vec), axis=1) / np.sum(np.multiply(vec, vec))
245 pvec = pd.Series(np.asarray(pvec).flatten(), index=list(all_cells.index[all_cells]))
246 if n_iter == 0:
AttributeError: 'numpy.ndarray' object has no attribute 'toarray'
Nevertheless, it doesn't happen when running with the dataset of the tutorial. I think that the problem is that Mixscape.pert_sign
does not create .layers['X_pert']
as a sparse matrix, but as a normal numpy array.
Thanks!
No response
Likely pertpy-tutorials as a submodule with the new docstyle that we're using at ehrapy.
As discussed, we should not use test data is input for the tests, but rather design artificial AnnData objects with clear distribution shifts that we can test for.
Our current milo implementation uses edgeR from R to make a few calculations here: https://github.com/theislab/pertpy/blob/development/pertpy/tools/_milo.py#L310 . However, we want pertpy to be a pure Python experience. Hence, we want to offer an alternative that does NOT use edgeR but instead uses statsmodels to do a very similar calculations.
https://pertpy.readthedocs.io/en/latest/usage/usage.html#data should not just be scperturb
Describe the bug
Unable to install using pip on M1 mac (Tessa and @stefanpeidli)
To Reproduce
mamba create -n pertpy_env python=3.9 conda activate pertpy_env pip install pertpy
(pertpy_env) stefanpeidli@Stefans-MBP ~ % pip install pertpy Collecting pertpy Using cached pertpy-0.3.0-py3-none-any.whl (82 kB) Collecting muon>=0.1.2 Using cached muon-0.1.2-py3-none-any.whl (287 kB) Collecting pypi-latest>=0.1.1 Using cached pypi_latest-0.1.2-py3-none-any.whl (10 kB) Collecting scipy<2.0.0,>=1.9.3 Downloading scipy-1.10.0-cp39-cp39-macosx_12_0_arm64.whl (28.9 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
28.9/28.9 MB 12.3 MB/s eta 0:00:00
Collecting PyYAML>=5.4.1
Downloading PyYAML-6.0-cp39-cp39-macosx_11_0_arm64.whl (173 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
174.0/174.0 kB 8.3 MB/s eta 0:00:00
Collecting arviz<0.15.0,>=0.14.0
Using cached arviz-0.14.0-py3-none-any.whl (1.7 MB)
Collecting ipywidgets>=7.6.5
Downloading ipywidgets-8.0.4-py3-none-any.whl (137 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 137.8/137.8 kB 8.5 MB/s eta 0:00:00
Collecting scanpy>=1.8.1
Downloading scanpy-1.9.1-py3-none-any.whl (2.0 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.0/2.0 MB 12.1 MB/s eta 0:00:00 Collecting pyqt5<6.0.0,>=5.15.7 Using cached PyQt5-5.15.9.tar.gz (3.2 MB) Installing build dependencies ... done Getting requirements to build wheel ... done Preparing metadata (pyproject.toml) ... error error: subprocess-exited-with-error
× Preparing metadata (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> [29 lines of output]
Traceback (most recent call last):
File "/Users/stefanpeidli/miniconda3/envs/pertpy_env/lib/python3.9/site-packages/pip/_vendor/pep517/in_process/_in_process.py", line 144, in prepare_metadata_for_build_wheel
hook = backend.prepare_metadata_for_build_wheel
AttributeError: module 'sipbuild.api' has no attribute 'prepare_metadata_for_build_wheel'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/stefanpeidli/miniconda3/envs/pertpy_env/lib/python3.9/site-packages/pip/_vendor/pep517/in_process/_in_process.py", line 351, in <module>
main()
File "/Users/stefanpeidli/miniconda3/envs/pertpy_env/lib/python3.9/site-packages/pip/_vendor/pep517/in_process/_in_process.py", line 333, in main
json_out['return_val'] = hook(**hook_input['kwargs'])
File "/Users/stefanpeidli/miniconda3/envs/pertpy_env/lib/python3.9/site-packages/pip/_vendor/pep517/in_process/_in_process.py", line 148, in prepare_metadata_for_build_wheel
whl_basename = backend.build_wheel(metadata_directory, config_settings)
File "/private/var/folders/fr/z238tpm931s466144gcdpdj40000gn/T/pip-build-env-hvo51gpt/overlay/lib/python3.9/site-packages/sipbuild/api.py", line 46, in build_wheel
project = AbstractProject.bootstrap('wheel',
File "/private/var/folders/fr/z238tpm931s466144gcdpdj40000gn/T/pip-build-env-hvo51gpt/overlay/lib/python3.9/site-packages/sipbuild/abstract_project.py", line 87, in bootstrap
project.setup(pyproject, tool, tool_description)
File "/private/var/folders/fr/z238tpm931s466144gcdpdj40000gn/T/pip-build-env-hvo51gpt/overlay/lib/python3.9/site-packages/sipbuild/project.py", line 585, in setup
self.apply_user_defaults(tool)
File "project.py", line 68, in apply_user_defaults
super().apply_user_defaults(tool)
File "/private/var/folders/fr/z238tpm931s466144gcdpdj40000gn/T/pip-build-env-hvo51gpt/overlay/lib/python3.9/site-packages/pyqtbuild/project.py", line 70, in apply_user_defaults
super().apply_user_defaults(tool)
File "/private/var/folders/fr/z238tpm931s466144gcdpdj40000gn/T/pip-build-env-hvo51gpt/overlay/lib/python3.9/site-packages/sipbuild/project.py", line 236, in apply_user_defaults
self.builder.apply_user_defaults(tool)
File "/private/var/folders/fr/z238tpm931s466144gcdpdj40000gn/T/pip-build-env-hvo51gpt/overlay/lib/python3.9/site-packages/pyqtbuild/builder.py", line 69, in apply_user_defaults
raise PyProjectOptionException('qmake',
sipbuild.pyproject.PyProjectOptionException
[end of output]
` note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed
× Encountered error while generating package metadata.
╰─> See above for output.
note: This is an issue with the package mentioned above, not pip.
hint: See above for details.`
Expected behavior
Installs and loads
System [please complete the following information]:
Additional context
brew install of Qt5 seems like it might be necessary, but we're still having trouble with it freezing on the license step for pyqt5 install even after that
We should provide tooling for a "perturbation dictionary" to allow more metadata to be easily fetched and added to perturbations. To facilitate this we query a couple of databases to annotate things. The actual API and design needs to be discussed.
Things that we have on the table but never wrote code for:
https://github.com/theislab/cpa/tree/main/docs/tutorials
Useful for the book
Is it possible to install with version compatible with latest python? Otherwise it's hard to work with updated pacakges like scvi-tools that require python >3.10.
> pip install pertpy
ERROR: Ignored the following versions that require a different python version: 0.1.0 Requires-Python >=3.8.0,<3.10; 0.2.0 Requires-Python >=3.8.0,<3.10; 0.3.0 Requires-Python >=3.8.0,<3.10
ERROR: Could not find a version that satisfies the requirement pertpy (from versions: none)
ERROR: No matching distribution found for pertpy
No response
Bug is occurring on server and on my laptop (M1). Both use conda. Python version 3.10.6.
I create a new environment following the directions in the contributor guide. poetry install doesn't install pyomo, so I get an error when running poetry run pertpy
from that, then conda install pyomo, run poetry install
again, which seems like it works, and then when I try to actually import and use the package I get:
``
Installed version 0.4.0 of pertpy is newer than the latest release 0.3.0! You are running a nightly version and features may break! ryp2 is not installed. Install with pip install rpy2 to run tools with R support. To use sccoda or tasccoda please install ete3 with pip install ete3 Traceback (most recent call last): File "<string>", line 1, in <module> File "/n/groups/marks/software/anaconda_o2/envs/pertpy-dev/lib/python3.10/importlib/__init__.py", line 126, in import_module return _bootstrap._gcd_import(name[level:], package, level) File "<frozen importlib._bootstrap>", line 1050, in _gcd_import File "<frozen importlib._bootstrap>", line 1027, in _find_and_load File "<frozen importlib._bootstrap>", line 1004, in _find_and_load_unlocked ModuleNotFoundError: No module named 'pertpy.__main__'
I tried installing from github via pip, which will load into python but won't let me run nox without erroring (" The Poetry configuration is invalid" error when running tests or attempting make poetry install again
No response
It would be great to have support in pertpy for complex mixed effect models and drug dose response prediction. Some things are in CPA, but I'd like to see CPA being published first before we consider integration a fast JAX based implementation.
Other options:
-> @moinfar please talk to @MO-L about this. He had an idea here once. As soon as we know exactly what this is supposed to look like we'll discuss it in this issue and plan the implementation.
We need to have a function like perp.composition(adata, method, cell_type:...)
. I have already talked to Beni, it seems scCODA already has good wrappers for statistical methods (some in R but they have wrapper for them) and has already 12 test statistics implemented
We need to include them in here and for some of them reimplement (fastest ones).
For now, we can use the ones that are already there to reimplement stuff we have to talk to Johannes who maintains them.
<Processing data... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0% -:--:--
extracting highly variable genes
finished (0:00:00)
--> added
'highly_variable', boolean vector (adata.var)
'means', float vector (adata.var)
'dispersions', float vector (adata.var)
'dispersions_norm', float vector (adata.var)
RemoteTraceback Traceback (most recent call last)
RemoteTraceback:
"""
Traceback (most recent call last):
File "/home/niklas/miniconda3/envs/niche_fibrosis_env_pertpy/lib/python3.8/site-packages/joblib/externals/loky/process_executor.py", line 436, in _process_worker
r = call_item()
File "/home/niklas/miniconda3/envs/niche_fibrosis_env_pertpy/lib/python3.8/site-packages/joblib/externals/loky/process_executor.py", line 288, in __call
return self.fn(self.args, **self.kwargs)
File "/home/niklas/miniconda3/envs/niche_fibrosis_env_pertpy/lib/python3.8/site-packages/joblib/parallel_backends.py", line 595, in _call
return self.func(args, kwargs)
File "/home/niklas/miniconda3/envs/niche_fibrosis_env_pertpy/lib/python3.8/site-packages/joblib/parallel.py", line 262, in call
return [func(*args, kwargs)
File "/home/niklas/miniconda3/envs/niche_fibrosis_env_pertpy/lib/python3.8/site-packages/joblib/parallel.py", line 262, in <listcomp>
return [func(args, *kwargs)
File "/home/niklas/miniconda3/envs/niche_fibrosis_env_pertpy/lib/python3.8/site-packages/pertpy/tools/_augurpy.py", line 318, in cross_validate_subsample
subsample = self.draw_subsample(
File "/home/niklas/miniconda3/envs/niche_fibrosis_env_pertpy/lib/python3.8/site-packages/pertpy/tools/_augurpy.py", line 281, in draw_subsample
return self.sample(
File "/home/niklas/miniconda3/envs/niche_fibrosis_env_pertpy/lib/python3.8/site-packages/pertpy/tools/_augurpy.py", line 234, in sample
subsample.var["var"] = np.ravel(subsample.X.power(2).mean(axis=0) - np.power(subsample.X.mean(axis=0), 2))
AttributeError: 'numpy.ndarray' object has no attribute 'power'
"""
The above exception was the direct cause of the following exception:
AttributeError Traceback (most recent call last)
Input In [24], in <cell line: 1>()
----> 1 ag_rfc.predict(loaded_data1,
2 n_subsamples=20, # number of random subsamples
3 subsample_size=10, # number of cells to subsample
4 folds=3,
5 min_cells=200)
File ~/miniconda3/envs/niche_fibrosis_env_pertpy/lib/python3.8/site-packages/pertpy/tools/_augurpy.py:703, in Augurpy.predict(self, adata, n_subsamples, subsample_size, folds, min_cells, feature_perc, var_quantile, span, filter_negative_residuals, n_threads, augur_mode, select_variance_features, key_added, random_state, zero_division)
698 print(
699 f"[bold red]Skipping {cell_type} cell type - the number of samples for at least one class type is less than "
700 f"subsample size {subsample_size}."
701 )
702 else:
--> 703 results[cell_type] = Parallel(n_jobs=n_threads)(
704 delayed(self.cross_validate_subsample)(
705 adata=cell_type_subsample,
706 augur_mode=augur_mode,
707 subsample_size=subsample_size,
708 folds=folds,
709 feature_perc=feature_perc,
710 subsample_idx=i,
711 random_state=random_state,
712 zero_division=zero_division,
713 )
714 for i in range(n_subsamples)
715 )
716 # summarize scores for cell type
717 results["summary_metrics"][cell_type] = self.average_metrics(results[cell_type])
File ~/miniconda3/envs/niche_fibrosis_env_pertpy/lib/python3.8/site-packages/joblib/parallel.py:1056, in Parallel.call(self, iterable)
1053 self._iterating = False
1055 with self._backend.retrieval_context():
-> 1056 self.retrieve()
1057 # Make sure that we get a last message telling us we are done
1058 elapsed_time = time.time() - self._start_time
File ~/miniconda3/envs/niche_fibrosis_env_pertpy/lib/python3.8/site-packages/joblib/parallel.py:935, in Parallel.retrieve(self)
933 try:
934 if getattr(self._backend, 'supports_timeout', False):
--> 935 self._output.extend(job.get(timeout=self.timeout))
936 else:
937 self._output.extend(job.get())
File ~/miniconda3/envs/niche_fibrosis_env_pertpy/lib/python3.8/site-packages/joblib/_parallel_backends.py:542, in LokyBackend.wrap_future_result(future, timeout)
539 """Wrapper for Future.result to implement the same behaviour as
540 AsyncResults.get from multiprocessing."""
541 try:
--> 542 return future.result(timeout=timeout)
543 except CfTimeoutError as e:
544 raise TimeoutError from e
File ~/miniconda3/envs/niche_fibrosis_env_pertpy/lib/python3.8/concurrent/futures/_base.py:444, in Future.result(self, timeout)
442 raise CancelledError()
443 elif self._state == FINISHED:
--> 444 return self.__get_result()
445 else:
446 raise TimeoutError()
File ~/miniconda3/envs/niche_fibrosis_env_pertpy/lib/python3.8/concurrent/futures/_base.py:389, in Future.__get_result(self)
387 if self._exception:
388 try:
--> 389 raise self._exception
390 finally:
391 # Break a reference cycle with the exception in self._exception
392 self = None
AttributeError: 'numpy.ndarray' object has no attribute 'power'
>
Details to reproduce on MM.
CC @niklaslang
I'm running the composition analysis per the tutorial and encountering an error at the tree generation step as below:
> tasccoda_model = pt.tl.Tasccoda()
> tasccoda_data = tasccoda_model.load(
adata,
type="cell_level",
cell_type_identifier="nsbm_level_1",
sample_identifier=batch_key,
covariate_obs=["genotype"],
levels_orig=["nsbm_level_4", "nsbm_level_3", "nsbm_level_2", "nsbm_level_1"],
add_level_name=True,
)
> tasccoda_data
MuData object with n_obs × n_vars = 23039 × 18297
var: 'n_cells'
2 modalities
rna: 23034 x 18227
obs: 'genotype', 'replicate', 'batch', 'scDblFinder_class', 'n_genes_by_counts', 'log1p_n_genes_by_counts', 'total_counts', 'log1p_total_counts', 'pct_counts_in_top_20_genes', 'total_counts_mt', 'log1p_total_counts_mt', 'pct_counts_mt', 'total_counts_ribo', 'log1p_total_counts_ribo', 'pct_counts_ribo', 'total_counts_hb', 'log1p_total_counts_hb', 'pct_counts_hb', 'outlier', 'mt_outlier', 'size_factors', 'leiden', 'leiden_res0_25', 'leiden_res0_5', 'leiden_res1', 'S_score', 'G2M_score', 'phase', 'manual_celltype_annotation', 'gt_rep', 'scCODA_sample_id', 'nsbm_level_0', 'nsbm_level_1', 'nsbm_level_2', 'nsbm_level_3', 'nsbm_level_4', 'nsbm_level_5', 'nsbm_level_6'
var: 'gene_ids', 'feature_types', 'genome', 'mt', 'ribo', 'hb', 'n_cells_by_counts', 'mean_counts', 'log1p_mean_counts', 'pct_dropout_by_counts', 'total_counts', 'log1p_total_counts', 'n_cells', 'highly_deviant', 'highly_variable', 'means', 'dispersions', 'dispersions_norm', 'highly_variable_nbatches', 'highly_variable_intersection'
uns: 'genotype_colors', 'gt_rep_colors', 'hvg', 'leiden', 'leiden_res0_25_colors', 'leiden_res0_5_colors', 'leiden_res1_colors', 'manual_celltype_annotation_colors', 'neighbors', 'pca', 'phase_colors', 'replicate_colors', 'tsne', 'umap', 'schist', 'nsbm_level_1_colors', 'nsbm_level_2_colors'
obsm: 'X_pca', 'X_tsne', 'X_umap', 'CM_nsbm_level_0', 'CM_nsbm_level_1', 'CM_nsbm_level_2', 'CM_nsbm_level_3', 'CM_nsbm_level_4', 'CM_nsbm_level_5', 'CM_nsbm_level_6'
varm: 'PCs'
layers: 'PFlog1pPF_normalization', 'counts', 'log1pPF_normalization', 'raw_counts', 'scTransform_normalization', 'scran_normalization', 'soupX_counts'
obsp: 'connectivities', 'distances'
coda: 5 x 70
obs: 'genotype', 'gt_rep'
var: 'n_cells'
uns: 'tree'
> pt.pl.coda.draw_tree(tasccoda_data)
File ~/opt/anaconda3/envs/scanpy/lib/python3.10/site-packages/ete3/coretype/tree.py:1392, in TreeNode.render(self, file_name, layout, w, h, tree_style, units, dpi)
1388 return drawer.get_img(self, w=w, h=h,
1389 layout=layout, tree_style=tree_style,
1390 units=units, dpi=dpi, return_format=file_name)
1391 else:
-> 1392 return drawer.render_tree(self, file_name, w=w, h=h,
1393 layout=layout, tree_style=tree_style,
1394 units=units, dpi=dpi)
File ~/opt/anaconda3/envs/scanpy/lib/python3.10/site-packages/ete3/treeview/drawer.py:111, in render_tree(t, imgName, w, h, layout, tree_style, header, units, dpi)
109 scene.addItem(scene.master_item)
110 if imgName.startswith("%%inline"):
--> 111 imgmap = save(scene, imgName, w=w, h=h, units=units, dpi=dpi)
112 else:
113 x_scale, y_scale = save(scene, imgName, w=w, h=h, units=units, dpi=dpi)
File ~/opt/anaconda3/envs/scanpy/lib/python3.10/site-packages/ete3/treeview/main.py:752, in save(scene, imgName, w, h, dpi, take_region, units)
750 else:
751 targetRect = QRectF(0, 0, w, h)
--> 752 ii= QImage(w, h, QImage.Format_ARGB32)
753 ii.fill(QColor(Qt.white).rgb())
754 ii.setDotsPerMeterX(dpi / 0.0254) # Convert inches to meters
TypeError: arguments did not match any overloaded call:
QImage(): too many arguments
QImage(size: QSize, format: QImage.Format): argument 1 has unexpected type 'float'
QImage(width: int, height: int, format: QImage.Format): argument 1 has unexpected type 'float'
QImage(data: bytes, width: int, height: int, format: QImage.Format): argument 1 has unexpected type 'float'
QImage(data: PyQt5.sip.voidptr, width: int, height: int, format: QImage.Format): argument 1 has unexpected type 'float'
QImage(data: bytes, width: int, height: int, bytesPerLine: int, format: QImage.Format): argument 1 has unexpected type 'float'
QImage(data: PyQt5.sip.voidptr, width: int, height: int, bytesPerLine: int, format: QImage.Format): argument 1 has unexpected type 'float'
QImage(xpm: List[str]): argument 1 has unexpected type 'float'
QImage(fileName: str, format: typing.Optional[str] = None): argument 1 has unexpected type 'float'
QImage(a0: QImage): argument 1 has unexpected type 'float'
QImage(variant: Any): too many arguments
Can you please guide me how to fix this?
Session information updated at 2023-03-01 21:14
https://depmap.org/portal/download/all/?releasename=DepMap+Public+22Q2&filename=CCLE_expression.csv
Instead of one-hot encoding cell lines we could throw in the whole bulk expression as input into classification models. It might be required to PCA those though.
Maybe also offer PCAed versions of those in the first place.
Describe the bug
To Reproduce
Steps to reproduce the behavior:
pyqt5
, ete3
and pertpy
import ete3
works fineimport pertpy as pt
-> shows warning for ete3Actual behaviour
Warning appears:
To use sccoda or tasccoda please install ete3 with `pip install ete3`
Expected behavior
No warning when ete3
available.
System [please complete the following information]:
# environment.yaml
name: pertpy-apple-silicon
channels:
- conda-forge
dependencies:
- python=3.9
- rpy2
brew install qt5
# make sure to add all the environment variables to your .zshrc
# as described at the end of the output of the brew install command
# then reload your shell:
zsh
create environment with mamba (conda should work as well, but I haven't tested it)
mamba create -f environment.yaml -n pertpy-apple-silicon
mamba activate pertpy-apple-silicon
pip install pyqt5 --config-settings --confirm-license= --verbose
pip install ete3
pip install git+https://github.com/theislab/pertpy.git
Additional context
Bug in pertpy/tools/init.py#L10
There are faulty import paths getting masked by the catching of the ImportError for the ete3 warning.
from pertpy.tools.coda._sccoda import Sccoda
works fine.
Follow up on #201
Data not normalized. Normalizing now using scanpy log1p normalize.
not the best message.uns
Add new implementation of tascCODA together with an associated new tutorial.
The idea is the central approach to first visual analysis, similar to mixescape where they perform 20 nn analysis and subtract the mean of top n non-perturbed analysis and then visualize cells.
I would suggest a more sophisticated method compared to PCA representation here :
in addition to that we can ->
We can come up with a asbtract vector for each of those embeddings for each pertrubation conditioned on cell-types not cell type meaning: for each pertrubation in the latent space we average representation for each pertrbation and plot a latent pertrbation space.
Option 1 : use a super light version of scGen which takes less than 5 mins to run. @Koncopd and then benchmark against approach 2, I think we can have it there any way
Option 2: very light CVAE with embeddings as conditions
option 3: very light CPA
all options in approach 2 would allow to have latent space representation per cell, embeddings for cell-types, pertrubations.
pertpy/scanpy related note (not sure what to do with it): in https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8822455/ under scBasal and scClassical programs, they use a score (like from sc.tl.score_genes ) and then infer the markers most representative of that score using plain correlations and FDR. Is there something like this already and if not, can we trivially add it to something? I think it’s super useful and also a much better way of identifying gene programs.
I shall look into this
Thank you for opening the script!
I used the following jupyter-notebook as a reference to filter my data at hand and enter the following command:
https://github.com/theislab/pertpy-tutorials/blob/main/mixscape.ipynb
mixscape_identifier = pt.tl.Mixscape()
mixscape_identifier.pert_sign(adata, 'perturbation', 'NT')
mixscape_identifier.mixscape(adata = adata, control = 'NT', labels='grna')
I got the mislabeling result that KO labels were reversed in some gRNAs (see RPL11, LDB1).
Since RACK1 is labeled correctly, I do not believe it is an alphabetical issue.
I am wondering if this result is due to the labeling of KO on the higher cell number of gRNA-transfected cells.
Could you please tell us why this is happening?
The ECCITE-seq data consists of transcriptomics and surface protein data. Both types of data require quality control and preprocessing to ensure that the data is ready for downstream tasks.
- QC and normalization of transcriptomics data:
We don't need to do much just similar prep from Scanpy. Basically, we just need to ensure that our example notebook has a quick run down of scRNA-seq QC and normalization.
- QC and normalization of protein data:
Continuation of a discussion in theislab/single-cell-best-practices#141
Implementation of useful helper methods for common DE tools. We don't plan to reimplement several DE methods. Other people are trying this already.
We could implement wrapper functions that make it streamlined to run DE tools within the scverse ecosystem. The output of the tools should be in the right slots of the AnnData object.
Most important
Other options
Disclaimer: while the rpy2 code snippets should work in principle, I haven't used them extensively and am not sure if they follow the best practices of the respective tool.
I envisage something like
def de_analysis(
adata: AnnData,
groupby: str,
method: Literal["t-test", "wilcoxon", "pydeseq2"],
*,
formula: Optional[str],
contrast: Optional[Any],
inplace: bool = True,
key_addeed: Optional[str] = None,
) -> pd.DataFrame:
"""
Perform differential expression analysis
Parameters
----------
adata
single-cell or pseudobulk AnnData object
groupby
column in adata.obs that contains the factor to test, e.g. `treatment`.
For simple statistical tests (t-test, wilcoxon), it is sufficient to specify groupby. Linear models require to specify
a formula. In that case, the `groupby` column is used to compute the contrast.
method
Which method to use to perform the DE test
formula
model specification for linear models. E.g. `~ treatment + sex + age`.
MUST contain the factor specified in `groupby`.
contrast
TODO
See e.g. https://www.statsmodels.org/devel/contrasts.html for more information.
inplace
if True, save the result in `adata.varm[key_added]`
key_added
Key under which the result is saved in `adata.varm` if inplace is True.
If set to None this defaults to `de_{method}_{groupby}`.
"""
Clearly, the challenge here is to find an interface that works with both simple methods like t-test and the more flexible linear models. It will also always be a tradeoff between simplicity and the possibility to use all features of the underlying methods (e.g. do we want to support arbitrary contrasts? Do we want to support multiple contrasts for a single model fit?).
Some of these cases could require splitting the function up into a fit
and a compute_contrast
step. In that case an OOP interface might be more appropriate.
Each DE tool should output a data frame with at least the following columns
and optionally other columns (may depend on the method).
Add volcano plots from decoupler/wrap them (plot_volcano
)
Note: an (optionally) interactive version based on plotly/altair could be really nice -- hover over the dots to view the gene name.
Add paired or unpaired scatter+boxplots are nice ways of visualizing individual genes (each dot is a pseudobulk-sample)
[code snipped based on seaborn]
Add bar charts. Scatterplot on top for paired observations. Each dot represents the fold change for one paired observation.
[code snipped based on altair]
heatmap of coefficients with FDR correction
[code snippet based on altair]
A highly requested feature was support for genetics data. Currently there are already toolkits for genetics data out there: namely https://github.com/pystatgen/sgkit and https://github.com/hail-is/hail
We don't want to reimplement any of the genetics data tools. However, we want to enable bridges to their ecosystem by providing functions that convert the data structures from one into another.
The tasks are roughly:
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.