
MuDataSeurat's Introduction

MuDataSeurat


Documentation | Preprint | Discord

MuDataSeurat is a package that provides I/O functionality for .h5mu files and Seurat objects.

You can learn more about multimodal data containers in the mudata documentation.

Installation

remotes::install_github("pmbio/MuDataSeurat")

Quick start

MuDataSeurat provides a set of I/O operations for multimodal data.

MuDataSeurat implements WriteH5MU() that saves Seurat objects to .h5mu files that can be further integrated into workflows in multiple programming languages, including the muon Python library and the Muon.jl Julia library. ReadH5MU() reads .h5mu files into Seurat objects.

MuDataSeurat currently works for Seurat objects of v3 and above.

Writing files

Start with an existing dataset, e.g. a Seurat object with CITE-seq data:

library(SeuratData)
InstallData("bmcite")
bm <- LoadData(ds = "bmcite")

WriteH5MU() saves the object to a .h5mu file:

library(MuDataSeurat)
WriteH5MU(bm, "bmcite.h5mu")

Please note that only standardised parts of the object are written to the file; method-specific extras stored in the Seurat object may be omitted when writing.
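To see exactly what ended up in the file, you can inspect the HDF5 structure directly. A minimal sketch using hdf5r (which MuDataSeurat builds on); the group names follow the mudata on-disk layout:

library(hdf5r)

h5 <- H5File$new("bmcite.h5mu", mode = "r")
h5$ls(recursive = FALSE)$name   # top-level groups, e.g. mod, obs, obsm, ...
h5[["mod"]]$ls()$name           # one group per modality, e.g. RNA and ADT
h5$close_all()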

Reading files

bm <- ReadH5MU("bmcite.h5mu")

Please note that only the intersection of cells across modalities is currently loaded into the Seurat object, due to limitations of the Seurat object structure. Multimodal embeddings (the global .obsm slot) are loaded with the assay.used field set to the default assay. Embedding names are changed to comply with R and Seurat naming requirements and conventions.
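For example, after reading you can inspect which reductions were created and how they were renamed (a sketch; the exact names depend on the file):

Reductions(bm)    # embedding names may differ from the original .obsm keys
DefaultAssay(bm)  # multimodal (global .obsm) embeddings use this as assay.used
head(Embeddings(bm, reduction = Reductions(bm)[1]))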

Relevant projects

Other R packages for multimodal I/O include:

MuDataSeurat's People

Contributors

gtca, ilia-kats


MuDataSeurat's Issues

Error in WriteH5ADHelper(object, assay, h5, global = TRUE): no slot of name "meta.features" for this object of class "Assay5"

Hello, author. I got an error when converting a Seurat v5 object to .h5ad:
MuDataSeurat::WriteH5AD(a, "a.h5ad",assay="RNA")

Error in WriteH5ADHelper(object, assay, h5, global = TRUE): no slot of name "meta.features" for this object of class "Assay5"
Traceback:

1. MuDataSeurat::WriteH5AD(a, "a.h5ad", assay = "RNA")
2. MuDataSeurat::WriteH5AD(a, "a.h5ad", assay = "RNA")
3. WriteH5ADHelper(object, assay, h5, global = TRUE)

Actions fail because of SeuratData

When configuring the dependencies, SeuratData seems to be installed before Seurat, which fails.

While this should be handled by the build system, it is not for some reason...

I've tested the patch but I still see the same error. Strangely, the error disappears upon forcing my ADT matrix to a sparse matrix using `Seurat::as.sparse`. Now I can load my `mudata` file.


Originally posted by @mdmanurung in #2 (comment)

I have the same issue on the bone marrow data used to illustrate the package functionality:

library(SeuratData)
InstallData("bmcite")
bm <- LoadData(ds = "bmcite")

library(MuDataSeurat)
WriteH5MU(bm, "bmcite.h5mu")


test<- ReadH5MU("bmcite.h5mu")

Error in `.rowNamesDF<-`(x, value = value) : 
  duplicate 'row.names' are not allowed
In addition: Warning message:
non-unique values when setting 'row.names': ‘CD14’, ‘CD19’, ‘CD27’, ‘CD28’, ‘CD34’, ‘CD38’, ‘CD4’, ‘CD69’ 

Also, reading the same object in Python fails (with or without using Seurat::as.sparse on the ADT):

import muon as mu

mu.read_h5mu("bmcite.h5mu")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/fabiola.curion/Documents/devel/miniconda3/envs/R405py39/lib/python3.9/site-packages/mudata/_core/io.py", line 380, in read_h5mu
    ad = _read_h5mu_mod(gmods[m], manager, backed not in (None, False))
  File "/Users/fabiola.curion/Documents/devel/miniconda3/envs/R405py39/lib/python3.9/site-packages/mudata/_core/io.py", line 513, in _read_h5mu_mod
    ad = AnnData(**d)
  File "/Users/fabiola.curion/Documents/devel/miniconda3/envs/R405py39/lib/python3.9/site-packages/anndata/_core/anndata.py", line 291, in __init__
    self._init_as_actual(
  File "/Users/fabiola.curion/Documents/devel/miniconda3/envs/R405py39/lib/python3.9/site-packages/anndata/_core/anndata.py", line 521, in _init_as_actual
    self._check_dimensions()
  File "/Users/fabiola.curion/Documents/devel/miniconda3/envs/R405py39/lib/python3.9/site-packages/anndata/_core/anndata.py", line 1843, in _check_dimensions
    raise ValueError(
ValueError: Observations annot. `obs` must have number of rows of `X` (25), but has 30672 rows.
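The duplicated names ('CD14', 'CD19', ...) are ADT antibodies that share names with RNA genes, which is a plausible trigger for the duplicate row.names error when the per-modality feature tables are combined. A quick check of the overlap (a sketch using the bm object from above; the interpretation is an assumption, not a confirmed diagnosis):

shared <- intersect(rownames(bm[["RNA"]]), rownames(bm[["ADT"]]))
shared   # feature names present in both modalities, e.g. "CD14" "CD19" "CD27" ...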

Null categorical codes written incorrectly

Writing

suppressWarnings(SeuratData::InstallData("pbmc3k", force.reinstall = F))
suppressWarnings(data("pbmc3k"))
seuratObj <- suppressWarnings(pbmc3k)

WriteH5AD(seuratObj, "mudata_seurat.h5ad")

Reading

import anndata as ad

ad.read_h5ad("./mudata_seurat.h5ad")
File ~/miniconda3/envs/seurat-conversion/lib/python3.10/site-packages/pandas/core/arrays/categorical.py:709, in Categorical.from_codes(cls, codes, categories, ordered, dtype)
    706     raise ValueError("codes need to be array-like integers")
    708 if len(codes) and (codes.max() >= len(dtype.categories) or codes.min() < -1):
--> 709     raise ValueError("codes need to be between -1 and len(categories)-1")
    711 return cls(codes, dtype=dtype, fastpath=True)

ValueError: codes need to be between -1 and len(categories)-1
Full Traceback
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Input In [17], in <cell line: 1>()
----> 1 ad.read_h5ad("./mudata_seurat.h5ad")

File ~/miniconda3/envs/seurat-conversion/lib/python3.10/site-packages/anndata/_io/h5ad.py:236, in read_h5ad(filename, backed, as_sparse, as_sparse_fmt, chunk_size)
    233     assert False, "unexpected raw format"
    234 elif k in {"obs", "var"}:
    235     # Backwards compat
--> 236     d[k] = read_dataframe(f[k])
    237 else:  # Base case
    238     d[k] = read_elem(f[k])

File ~/miniconda3/envs/seurat-conversion/lib/python3.10/site-packages/anndata/_io/h5ad.py:301, in read_dataframe(group)
    299     return read_dataframe_legacy(group)
    300 else:
--> 301     return read_elem(group)

File ~/miniconda3/envs/seurat-conversion/lib/python3.10/site-packages/anndata/_io/specs/registry.py:183, in read_elem(elem, modifiers)
    178 def read_elem(
    179     elem: Union[H5Array, H5Group, ZarrGroup, ZarrArray],
    180     modifiers: frozenset(str) = frozenset(),
    181 ) -> Any:
    182     """Read an element from an on disk store."""
--> 183     return _REGISTRY.get_reader(type(elem), get_spec(elem), frozenset(modifiers))(elem)

File ~/miniconda3/envs/seurat-conversion/lib/python3.10/site-packages/anndata/_io/specs/methods.py:564, in read_dataframe_0_1_0(elem)
    561 columns = _read_attr(elem.attrs, "column-order")
    562 idx_key = _read_attr(elem.attrs, "_index")
    563 df = pd.DataFrame(
--> 564     {k: read_series(elem[k]) for k in columns},
    565     index=read_series(elem[idx_key]),
    566     columns=list(columns),
    567 )
    568 if idx_key != "_index":
    569     df.index.name = idx_key

File ~/miniconda3/envs/seurat-conversion/lib/python3.10/site-packages/anndata/_io/specs/methods.py:564, in <dictcomp>(.0)
    561 columns = _read_attr(elem.attrs, "column-order")
    562 idx_key = _read_attr(elem.attrs, "_index")
    563 df = pd.DataFrame(
--> 564     {k: read_series(elem[k]) for k in columns},
    565     index=read_series(elem[idx_key]),
    566     columns=list(columns),
    567 )
    568 if idx_key != "_index":
    569     df.index.name = idx_key

File ~/miniconda3/envs/seurat-conversion/lib/python3.10/site-packages/anndata/_io/specs/methods.py:586, in read_series(dataset)
    584     categories = read_elem(categories_dset)
    585     ordered = bool(_read_attr(categories_dset.attrs, "ordered", False))
--> 586     return pd.Categorical.from_codes(
    587         read_elem(dataset), categories, ordered=ordered
    588     )
    589 else:
    590     return read_elem(dataset)

File ~/miniconda3/envs/seurat-conversion/lib/python3.10/site-packages/pandas/core/arrays/categorical.py:709, in Categorical.from_codes(cls, codes, categories, ordered, dtype)
    706     raise ValueError("codes need to be array-like integers")
    708 if len(codes) and (codes.max() >= len(dtype.categories) or codes.min() < -1):
--> 709     raise ValueError("codes need to be between -1 and len(categories)-1")
    711 return cls(codes, dtype=dtype, fastpath=True)

ValueError: codes need to be between -1 and len(categories)-1

Checking out the file:

import h5py
import pandas as pd

f = h5py.File("./mudata_seurat.h5ad")

pd.value_counts(f["obs"]["seurat_annotations"][:])
 0             697
 1             483
 2             480
 3             344
 4             271
 5             162
 6             155
-2147483648     62
 7              32
 8              14
dtype: int64

It looks like R and pandas encode categorical missing values quite differently: R stores NA_integer_ as the smallest 32-bit integer (-2147483648, which is what shows up above), while pandas (and anndata) expect null values to have a code of -1.
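For reference, a minimal base-R sketch of the remapping pandas expects (illustrative only, not the package's actual writer code): 0-based codes with missing values written as -1.

x <- factor(c("B", "NK", NA, "B"))
codes <- as.integer(x) - 1L     # 0-based category codes; NA stays NA
codes[is.na(codes)] <- -1L      # pandas/anndata convention for missing values
levels(x)                       # category labels stored alongside the codes
codes
# [1]  0  1 -1  0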

`mu.read` ValueError

Dear author,

I encountered the following issue upon reading a mudata object that was converted from Seurat:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Input In [6], in <module>
----> 1 mdata2 = mu.read("data/processed/mudata.h5mu/ADT")

File /exports/para-lipg-hpc/mdmanurung/conda/envs/scanpy/lib/python3.8/site-packages/mudata/_core/io.py:409, in read(filename, **kwargs)
    406     return read_h5mu(filepath, **kwargs)
    407 elif m[3] == "":
    408     # .h5mu/<modality>
--> 409     return read_h5ad(filepath, m[2], **kwargs)
    410 elif m[2] == "mod":
    411     # .h5mu/mod/<modality>
    412     return read_h5ad(filepath, m[3], **kwargs)

File /exports/para-lipg-hpc/mdmanurung/conda/envs/scanpy/lib/python3.8/site-packages/mudata/_core/io.py:372, in read_h5ad(filename, mod, backed)
    370 with h5py.File(filename, hdf5_mode) as f_root:
    371     f = f_root["mod"][mod]
--> 372     return _read_h5mu_mod(f, manager, backed)

File /exports/para-lipg-hpc/mdmanurung/conda/envs/scanpy/lib/python3.8/site-packages/mudata/_core/io.py:320, in _read_h5mu_mod(g, manager, backed)
    318     elif k != "raw":
    319         d[k] = read_attribute(g[k])
--> 320 ad = AnnData(**d)
    321 if manager is not None:
    322     ad.file = AnnDataFileManager(ad, os.path.basename(g.name), manager)

File /exports/para-lipg-hpc/mdmanurung/conda/envs/scanpy/lib/python3.8/site-packages/anndata/_core/anndata.py:308, in AnnData.__init__(self, X, obs, var, uns, obsm, varm, layers, raw, dtype, shape, filename, filemode, asview, obsp, varp, oidx, vidx)
    306     self._init_as_view(X, oidx, vidx)
    307 else:
--> 308     self._init_as_actual(
    309         X=X,
    310         obs=obs,
    311         var=var,
    312         uns=uns,
    313         obsm=obsm,
    314         varm=varm,
    315         raw=raw,
    316         layers=layers,
    317         dtype=dtype,
    318         shape=shape,
    319         obsp=obsp,
    320         varp=varp,
    321         filename=filename,
    322         filemode=filemode,
    323     )

File /exports/para-lipg-hpc/mdmanurung/conda/envs/scanpy/lib/python3.8/site-packages/anndata/_core/anndata.py:526, in AnnData._init_as_actual(self, X, obs, var, uns, obsm, varm, varp, obsp, raw, layers, dtype, shape, filename, filemode)
    523 # Backwards compat for connectivities matrices in uns["neighbors"]
    524 _move_adj_mtx({"uns": self._uns, "obsp": self._obsp})
--> 526 self._check_dimensions()
    527 self._check_uniqueness()
    529 if self.filename:

File /exports/para-lipg-hpc/mdmanurung/conda/envs/scanpy/lib/python3.8/site-packages/anndata/_core/anndata.py:1837, in AnnData._check_dimensions(self, key)
   1835     key = {key}
   1836 if "obs" in key and len(self._obs) != self._n_obs:
-> 1837     raise ValueError(
   1838         "Observations annot. `obs` must have number of rows of `X`"
   1839         f" ({self._n_obs}), but has {self._obs.shape[0]} rows."
   1840     )
   1841 if "var" in key and len(self._var) != self._n_vars:
   1842     raise ValueError(
   1843         "Variables annot. `var` must have number of columns of `X`"
   1844         f" ({self._n_vars}), but has {self._var.shape[0]} rows."
   1845     )

ValueError: Observations annot. `obs` must have number of rows of `X` (163), but has 62773 rows.

I then tried to load each modality one by one. I could load my RNA data, but not my ADT. My ADT data has 163 features in it. For both modalities, I have 62773 observations.

Considering that, I am a bit confused by the error. Why would obs of my ADT data expect 163 rows, which should be the number of features?

Thanks for taking the time.

Regards,
Mikhael

dgCMatrix

I pulled down two h5ad files from https://developmental.cellatlas.io/fetal-bone-marrow.
(1) the Human fetal BM 10x dataset and (2) the Human fetal BM Down syndrome 10x dataset.

The Down syndrome dataset loads fine using MuDataSeurat::ReadH5AD(). The other dataset gives me this error:

d21 <- MuDataSeurat::ReadH5AD("/tmp/fig1b_fbm_scaled_gex_updated_dr_20210104.h5ad")
Warning: Feature names cannot have underscores ('_'), replacing with dashes ('-')
Error in (function (cl, name, valueClass) :
assignment of an object of class “dgCMatrix” is not valid for @‘scale.data’ in an object of class “Assay”; is(value, "matrix") is not TRUE
In addition: Warning message:
In read_layers_to_assay(h5) :
The var_names from modality have been renamed as feature names cannot contain '_'. E.g. RP11-442N24__B.1 -> RP11-442N24--B.1.
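The error reflects a general Seurat constraint rather than something specific to this file: the scale.data slot of a v3/v4 Assay must be a dense base matrix. A minimal sketch of that constraint on toy data (not the package's reader code; behaviour may differ for Seurat v5 Assay5 objects):

library(Seurat)
library(Matrix)

counts <- matrix(rpois(20, 5), nrow = 4,
                 dimnames = list(paste0("gene", 1:4), paste0("cell", 1:5)))
obj <- CreateSeuratObject(counts = Matrix(counts, sparse = TRUE))

scaled_sparse <- Matrix(scale(counts), sparse = TRUE)   # dgCMatrix

# With a v3/v4 Assay, assigning a sparse matrix to scale.data hits the same class check:
# obj <- SetAssayData(obj, slot = "scale.data", new.data = scaled_sparse)

# Densifying first satisfies it:
obj <- SetAssayData(obj, slot = "scale.data", new.data = as.matrix(scaled_sparse))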

String categories written by MuDataSeurat are read in as bytes by anndata

Using the same setup in #5, with the fix that closed it:

suppressWarnings(SeuratData::InstallData("pbmc3k", force.reinstall = F))
suppressWarnings(data("pbmc3k"))
seuratObj <- suppressWarnings(pbmc3k)

WriteH5AD(seuratObj, "mudata_seurat.h5ad")
import anndata as ad

a = ad.read_h5ad("./mudata_seurat.h5ad")
a.obs
              orig.ident  nCount_RNA  nFeature_RNA seurat_annotations
AAACATACAACCAC  b'pbmc3k'      2419.0           779    b'Memory CD4 T'
AAACATTGAGCTAC  b'pbmc3k'      4903.0          1352               b'B'
AAACATTGATCAGC  b'pbmc3k'      3147.0          1129    b'Memory CD4 T'
AAACCGTGCTTCCG  b'pbmc3k'      2639.0           960      b'CD14+ Mono'
AAACCGTGTATGCG  b'pbmc3k'       980.0           521              b'NK'
...                   ...         ...           ...                ...
TTTCGAACTCTCAT  b'pbmc3k'      3459.0          1153      b'CD14+ Mono'
TTTCTACTGAGGCA  b'pbmc3k'      3443.0          1224               b'B'
TTTCTACTTCCTCG  b'pbmc3k'      1684.0           622               b'B'
TTTGCATGAGAGGC  b'pbmc3k'      1022.0           452               b'B'
TTTGCATGCCTCAC  b'pbmc3k'      1984.0           723     b'Naive CD4 T'

The categoricals should be read in as strings. I would also suggest writing the more recent dataframe and categorical encodings, where everything is more self-contained and annotated, while you're at it.
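On the writer side, one way to avoid the bytes issue is to store strings with a variable-length UTF-8 datatype. A hedged hdf5r sketch (file and dataset names are illustrative, this is not MuDataSeurat's actual writer, and the exact H5T_STRING calls reflect my reading of the hdf5r API):

library(hdf5r)

h5 <- H5File$new("utf8_demo.h5", mode = "w")
str_type <- H5T_STRING$new(size = Inf)   # variable-length string type
str_type$set_cset("UTF-8")               # mark the character set as UTF-8
h5$create_dataset("annotations",
                  robj = c("Memory CD4 T", "B", "NK"),
                  dtype = str_type)
h5$close_all()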

can't read mudata created with muon (python)

Hello, thanks for working on interoperability between seurat and mudata!

I can't use ReadH5MU() to read a MuData file that I created following your multimodal tutorial:

test<-ReadH5MU("data_test.dir/pbmc_w3_teaseq.h5mu")
Error in dataset[[name]]$read() : attempt to apply non-function

I have no problems loading the object with muon in Python:

import muon as mu
mu.read_h5mu("data_test.dir/pbmc_w3_teaseq.h5mu")
MuData object with n_obs × n_vars = 5805 × 113187
  obs:  'sample', 'well', 'leiden_multiplex', 'leiden_mofa', 'leiden_wnn'
  var:  'highly_variable', 'gene_ids', 'feature_types', 'genome', 'interval'
  obsm: 'X_mofa', 'X_umap', 'X_wnn_umap'
  varm: 'LFs'
  obsp: 'mofa_connectivities', 'mofa_distances', 'wnn_connectivities', 'wnn_distances'
  3 modalities
    rna:        5805 x 16381
      obs:      'n_genes_by_counts', 'total_counts', 'total_counts_mt', 'pct_counts_mt', 'leiden'
      var:      'gene_ids', 'feature_types', 'genome', 'interval', 'mt', 'n_cells_by_counts', 'mean_counts', 'pct_dropout_by_counts', 'total_counts', 'highly_variable', 'means', 'dispersions', 'dispersions_norm', 'mean', 'std'
      uns:      'hvg', 'leiden', 'leiden_colors', 'log1p', 'neighbors', 'pca', 'umap'
      obsm:     'X_pca', 'X_umap'
      varm:     'PCs'
      layers:   'lognorm'
      obsp:     'connectivities', 'distances'
    atac:       5805 x 96760
      obs:      'n_fragments', 'n_duplicate', 'n_mito', 'n_unique', 'altius_count', 'altius_frac', 'gene_bodies_count', 'gene_bodies_frac', 'peaks_count', 'peaks_frac', 'tss_count', 'tss_frac', 'barcodes', 'cell_name', 'well_id', 'chip_id', 'batch_id', 'pbmc_sample_id', 'DoubletScore', 'DoubletEnrichment', 'TSSEnrichment', 'n_genes_by_counts', 'total_counts', 'n_counts', 'leiden'
      var:      'gene_ids', 'feature_types', 'genome', 'interval', 'n_cells_by_counts', 'mean_counts', 'pct_dropout_by_counts', 'total_counts', 'highly_variable', 'means', 'dispersions', 'dispersions_norm', 'mean', 'std'
      uns:      'hvg', 'leiden', 'leiden_colors', 'log1p', 'neighbors', 'pca', 'umap'
      obsm:     'X_pca', 'X_umap'
      varm:     'PCs'
      layers:   'counts', 'lognorm'
      obsp:     'connectivities', 'distances'
    prot:       5805 x 46
      obs:      'total_counts'
      var:      'highly_variable'
      uns:      'neighbors', 'pca', 'umap'
      obsm:     'X_pca', 'X_umap'
      varm:     'PCs'
      layers:   'counts'
      obsp:     'connectivities', 'distances'

I can explore the HDF5 file, but it breaks where the error says. It also seems to expect some attributes that are not present in my MuData file:

h5 <- open_and_check_mudata("~/Documents/devel/data_test.dir/pbmc_w3_teaseq.h5mu")
metadata <- read_with_index(h5[["obs"]])
dataset <- h5[["obs"]]
dataset_attr <- tryCatch({
  h5attributes(dataset)
}, error = function(e) {
  list("_index" = "_index")
})
indexcol <- "_index"
if ("_index" %in% names(dataset_attr)) {
  indexcol <- dataset_attr$`_index`
}
dataset_attr
columns <- names(dataset)
columns <- columns[columns != "__categories"]
columns

dataset[["sample"]]$read()

Error in dataset[[name]]$read() : attempt to apply non-function

values_attr <- h5attributes(dataset)
values_attr
$`column-order`
[1] "sample" "well"  

$`_index`
[1] "_index"

$`encoding-type`
[1] "dataframe"

$`encoding-version`
[1] "0.2.0"

# so the following line will be NULL
# values_attr$categories
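The encoding-version 0.2.0 attribute suggests the file uses the newer anndata on-disk format, where each categorical column is an HDF5 group with categories and codes datasets rather than a plain dataset plus a shared __categories group; that would also explain why $read() is not available on dataset[["sample"]]. A hedged sketch of reading such a column manually (names follow the anndata spec, not ReadH5MU's actual code path):

col <- dataset[["sample"]]
if (inherits(col, "H5Group")) {
  categories <- col[["categories"]]$read()
  codes <- col[["codes"]]$read()        # 0-based codes; -1 marks missing values
  values <- rep(NA_character_, length(codes))
  values[codes >= 0] <- categories[codes[codes >= 0] + 1]
  values <- factor(values, levels = categories)
}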

Any suggestions?

Thanks!

R version 4.1.2 (2021-11-01)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur 11.3.1

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] bmcite.SeuratData_0.3.0 pbmc3k.SeuratData_3.1.4 SeuratData_0.2.2        hdf5r_1.3.5             MuDataSeurat_0.0.0.9000 magrittr_2.0.3          datapasta_3.1.0        
 [8] forcats_0.5.1           stringr_1.4.0           dplyr_1.0.8             purrr_0.3.4             readr_2.1.2             tidyr_1.2.0             tibble_3.1.6           
[15] ggplot2_3.3.5           tidyverse_1.3.1        

loaded via a namespace (and not attached):
  [1] readxl_1.4.0          backports_1.4.1       plyr_1.8.7            igraph_1.3.0          lazyeval_0.2.2        splines_4.1.2         listenv_0.8.0         scattermore_0.8      
  [9] digest_0.6.29         htmltools_0.5.2       fansi_1.0.3           tensor_1.5            cluster_2.1.3         ROCR_1.0-11           tzdb_0.3.0            remotes_2.4.2        
 [17] globals_0.14.0        modelr_0.1.8          matrixStats_0.62.0    spatstat.sparse_2.1-0 prettyunits_1.1.1     colorspace_2.0-3      rappdirs_0.3.3        rvest_1.0.2          
 [25] ggrepel_0.9.1         haven_2.4.3           callr_3.7.0           crayon_1.5.1          jsonlite_1.8.0        spatstat.data_2.1-4   survival_3.3-1        zoo_1.8-9            
 [33] glue_1.6.2            polyclip_1.10-0       gtable_0.3.0          leiden_0.3.9          clipr_0.8.0           pkgbuild_1.3.1        future.apply_1.8.1    abind_1.4-5          
 [41] scales_1.1.1          DBI_1.1.2             spatstat.random_2.2-0 miniUI_0.1.1.1        Rcpp_1.0.8.3          viridisLite_0.4.0     xtable_1.8-4          reticulate_1.24      
 [49] spatstat.core_2.4-2   bit_4.0.4             htmlwidgets_1.5.4     httr_1.4.2            anndata_0.7.5.3       RColorBrewer_1.1-3    ellipsis_0.3.2        Seurat_4.1.0         
 [57] ica_1.0-2             pkgconfig_2.0.3       uwot_0.1.11           dbplyr_2.1.1          deldir_1.0-6          utf8_1.2.2            tidyselect_1.1.2      rlang_1.0.2          
 [65] reshape2_1.4.4        later_1.3.0           munsell_0.5.0         cellranger_1.1.0      tools_4.1.2           cli_3.3.0             generics_0.1.2        broom_0.7.12         
 [73] ggridges_0.5.3        fastmap_1.1.0         goftest_1.2-3         processx_3.5.3        bit64_4.0.5           fs_1.5.2              fitdistrplus_1.1-8    RANN_2.6.1           
 [81] pbapply_1.5-0         future_1.24.0         nlme_3.1-157          mime_0.12             formatR_1.12          xml2_1.3.3            compiler_4.1.2        rstudioapi_0.13      
 [89] plotly_4.10.0         curl_4.3.2            png_0.1-7             spatstat.utils_2.3-0  reprex_2.0.1          stringi_1.7.6         ps_1.6.0              lattice_0.20-45      
 [97] Matrix_1.4-1          SeuratDisk_0.0.0.9019 vctrs_0.3.8           pillar_1.7.0          lifecycle_1.0.1       spatstat.geom_2.4-0   lmtest_0.9-40         RcppAnnoy_0.0.19     
[105] addinexamples_0.1.0   data.table_1.14.2     cowplot_1.1.1         irlba_2.3.5           httpuv_1.6.5          patchwork_1.1.1       R6_2.5.1              promises_1.2.0.1     
[113] KernSmooth_2.23-20    gridExtra_2.3         parallelly_1.31.0     codetools_0.2-18      MASS_7.3-56           assertthat_0.2.1      rprojroot_2.0.3       withr_2.5.0          
[121] SeuratObject_4.0.4    sctransform_0.3.3     mgcv_1.8-40           parallel_4.1.2        hms_1.1.1             grid_4.1.2            rpart_4.1.16          Rtsne_0.15           
[129] shiny_1.7.1           lubridate_1.8.0      

WriteH5MU fails

Hi, thanks for working on interoperability between seurat and mudata!

I am failing to save the Seurat object offered in this tutorial:

reference <- LoadH5Seurat("../data/pbmc_multimodal.h5seurat")
WriteH5MU(reference, "tea.h5mu")

Defining highly variable features...
Defining highly variable features...
Error in self$exists(name) : 
STRING_ELT() can only be applied to a 'character vector', not a 'NULL'

I tried different things, like removing some of the reductions, but still no luck.
The file is created, but the process breaks somewhere along the way; I still haven't figured out where exactly (and of course it can't be read with ReadH5MU()).
Any idea what I should check next?
Thank you!

> sessionInfo()
R version 4.1.2 (2021-11-01)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur 11.3.1

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] bmcite.SeuratData_0.3.0 pbmc3k.SeuratData_3.1.4 SeuratData_0.2.2        hdf5r_1.3.5             MuDataSeurat_0.0.0.9000 magrittr_2.0.3          datapasta_3.1.0        
 [8] forcats_0.5.1           stringr_1.4.0           dplyr_1.0.8             purrr_0.3.4             readr_2.1.2             tidyr_1.2.0             tibble_3.1.6           
[15] ggplot2_3.3.5           tidyverse_1.3.1        

loaded via a namespace (and not attached):
  [1] readxl_1.4.0          backports_1.4.1       plyr_1.8.7            igraph_1.3.0          lazyeval_0.2.2        splines_4.1.2         listenv_0.8.0        
  [8] scattermore_0.8       digest_0.6.29         htmltools_0.5.2       fansi_1.0.3           tensor_1.5            cluster_2.1.3         ROCR_1.0-11          
 [15] tzdb_0.3.0            remotes_2.4.2         globals_0.14.0        modelr_0.1.8          matrixStats_0.62.0    spatstat.sparse_2.1-0 prettyunits_1.1.1    
 [22] colorspace_2.0-3      rappdirs_0.3.3        rvest_1.0.2           ggrepel_0.9.1         haven_2.4.3           callr_3.7.0           crayon_1.5.1         
 [29] jsonlite_1.8.0        spatstat.data_2.1-4   survival_3.3-1        zoo_1.8-9             glue_1.6.2            polyclip_1.10-0       gtable_0.3.0         
 [36] leiden_0.3.9          clipr_0.8.0           pkgbuild_1.3.1        future.apply_1.8.1    abind_1.4-5           scales_1.1.1          DBI_1.1.2            
 [43] spatstat.random_2.2-0 miniUI_0.1.1.1        Rcpp_1.0.8.3          viridisLite_0.4.0     xtable_1.8-4          reticulate_1.24       spatstat.core_2.4-2  
 [50] bit_4.0.4             htmlwidgets_1.5.4     httr_1.4.2            anndata_0.7.5.3       RColorBrewer_1.1-3    ellipsis_0.3.2        Seurat_4.1.0         
 [57] ica_1.0-2             pkgconfig_2.0.3       uwot_0.1.11           dbplyr_2.1.1          deldir_1.0-6          utf8_1.2.2            tidyselect_1.1.2     
 [64] rlang_1.0.2           reshape2_1.4.4        later_1.3.0           munsell_0.5.0         cellranger_1.1.0      tools_4.1.2           cli_3.3.0            
 [71] generics_0.1.2        broom_0.7.12          ggridges_0.5.3        fastmap_1.1.0         goftest_1.2-3         processx_3.5.3        bit64_4.0.5          
 [78] fs_1.5.2              fitdistrplus_1.1-8    RANN_2.6.1            pbapply_1.5-0         future_1.24.0         nlme_3.1-157          mime_0.12            
 [85] formatR_1.12          xml2_1.3.3            compiler_4.1.2        rstudioapi_0.13       plotly_4.10.0         curl_4.3.2            png_0.1-7            
 [92] spatstat.utils_2.3-0  reprex_2.0.1          stringi_1.7.6         ps_1.6.0              lattice_0.20-45       Matrix_1.4-1          SeuratDisk_0.0.0.9019
 [99] vctrs_0.3.8           pillar_1.7.0          lifecycle_1.0.1       spatstat.geom_2.4-0   lmtest_0.9-40         RcppAnnoy_0.0.19      addinexamples_0.1.0  
[106] data.table_1.14.2     cowplot_1.1.1         irlba_2.3.5           httpuv_1.6.5          patchwork_1.1.1       R6_2.5.1              promises_1.2.0.1     
[113] KernSmooth_2.23-20    gridExtra_2.3         parallelly_1.31.0     codetools_0.2-18      MASS_7.3-56           assertthat_0.2.1      rprojroot_2.0.3      
[120] withr_2.5.0           SeuratObject_4.0.4    sctransform_0.3.3     mgcv_1.8-40           parallel_4.1.2        hms_1.1.1             grid_4.1.2           
[127] rpart_4.1.16          Rtsne_0.15            shiny_1.7.1           lubridate_1.8.0      

invalid class “DimReduc” object

Hi,

ReadH5MU was giving me the following error:

Warning: Feature names cannot have underscores ('_'), replacing with dashes ('-')
Warning: No columnames present in cell embeddings, setting to 'atacpca_1:50'
Error in validObject(.Object) : 
  invalid class “DimReduc” object: invalid object for slot "feature.loadings" in class "DimReduc": got class "NULL", should be or extend class "matrix"
In addition: Warning messages:
1: In missing_on_read("/var", "global variables metadata") :
  Missing on read: /var. Seurat does not support global variables metadata.
2: In missing_on_read("/varp", "pairwise annotation of variables") :
  Missing on read: /varp. Seurat does not support pairwise annotation of variables.

I managed to fix it by computing the PCA embeddings explicitly; before that I was plotting UMAPs without computing PCA first.

Putting it here in case someone encounters the same error. Does the conversion to Seurat require PCA embeddings? Or maybe only when a UMAP is present?

Anyway thanks for this compatibility tool!

Best,

GJ

Error when converting a Seurat v5 object to an .h5ad file

When I try to convert a Seurat object to an .h5ad file, I get an error:

WriteH5AD(seu.filtered, 'data/allcell/allcell_filtered.h5ad')
Error in WriteH5ADHelper(object, assay, h5, global = TRUE) : 
  no slot of name "meta.features" for this object of class "Assay5"

Here is my environment:

R version 4.3.1 (2023-06-16)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: Ubuntu 22.04.2 LTS

Matrix products: default
BLAS/LAPACK: /home/software_install/miniconda3/envs/r_envs/lib/libopenblasp-r0.3.21.so;  LAPACK version 3.9.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8      
 [8] LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

time zone: Asia/Shanghai
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] MuDataSeurat_0.0.0.9000 lubridate_1.9.3         forcats_1.0.0           stringr_1.5.0           dplyr_1.1.3             purrr_1.0.2             readr_2.1.4             tidyr_1.3.0            
 [9] tibble_3.2.1            ggplot2_3.4.4           tidyverse_2.0.0         scCustomize_1.1.3       Seurat_5.0.0            SeuratObject_5.0.0      sp_2.1-1               

loaded via a namespace (and not attached):
  [1] RColorBrewer_1.1-3     rstudioapi_0.15.0      jsonlite_1.8.7         shape_1.4.6            magrittr_2.0.3         spatstat.utils_3.0-4   ggbeeswarm_0.7.2       GlobalOptions_0.1.2    vctrs_0.6.4           
 [10] ROCR_1.0-11            spatstat.explore_3.2-5 paletteer_1.5.0        janitor_2.2.0          htmltools_0.5.7        sctransform_0.4.1      parallelly_1.36.0      KernSmooth_2.23-22     htmlwidgets_1.6.2     
 [19] ica_1.0-3              plyr_1.8.9             plotly_4.10.3          zoo_1.8-12             igraph_1.5.1           mime_0.12              lifecycle_1.0.4        pkgconfig_2.0.3        Matrix_1.6-2          
 [28] R6_2.5.1               fastmap_1.1.1          snakecase_0.11.1       fitdistrplus_1.1-11    future_1.33.0          shiny_1.7.5.1          digest_0.6.33          colorspace_2.1-0       rematch2_2.1.2        
 [37] patchwork_1.1.3        tensor_1.5             RSpectra_0.16-1        irlba_2.3.5.1          progressr_0.14.0       timechange_0.2.0       fansi_1.0.5            spatstat.sparse_3.0-3  httr_1.4.7            
 [46] polyclip_1.10-6        abind_1.4-5            compiler_4.3.1         withr_2.5.2            bit64_4.0.5            fastDummies_1.7.3      MASS_7.3-60            tools_4.3.1            vipor_0.4.5           
 [55] lmtest_0.9-40          beeswarm_0.4.0         httpuv_1.6.11          future.apply_1.11.0    goftest_1.2-3          glue_1.6.2             nlme_3.1-163           promises_1.2.1         grid_4.3.1            
 [64] Rtsne_0.16             cluster_2.1.4          reshape2_1.4.4         generics_0.1.3         hdf5r_1.3.8            gtable_0.3.4           spatstat.data_3.0-3    tzdb_0.4.0             hms_1.1.3             
 [73] data.table_1.14.8      utf8_1.2.4             spatstat.geom_3.2-7    RcppAnnoy_0.0.21       ggrepel_0.9.4          RANN_2.6.1             pillar_1.9.0           spam_2.10-0            RcppHNSW_0.5.0        
 [82] ggprism_1.0.4          later_1.3.1            circlize_0.4.15        splines_4.3.1          lattice_0.22-5         survival_3.5-7         bit_4.0.5              deldir_1.0-9           tidyselect_1.2.0      
 [91] miniUI_0.1.1.1         pbapply_1.7-2          gridExtra_2.3          scattermore_1.2        matrixStats_1.1.0      stringi_1.8.1          lazyeval_0.2.2         codetools_0.2-19       cli_3.6.1             
[100] uwot_0.1.16            xtable_1.8-4           reticulate_1.34.0      munsell_0.5.0          Rcpp_1.0.11            globals_0.16.2         spatstat.random_3.2-1  png_0.1-8              ggrastr_1.0.2         
[109] parallel_4.3.1         ellipsis_0.3.2         dotCall64_1.1-0        listenv_0.9.0          viridisLite_0.4.2      scales_1.2.1           ggridges_0.5.4         crayon_1.5.2           leiden_0.4.3          
[118] rlang_1.1.2            cowplot_1.1.1    

Is this problem caused by the Seurat upgrade? How can I convert a Seurat v5 object into AnnData?
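One possible workaround (an assumption on my part, not an official fix): convert the v5 Assay5 back to the classic Assay class, which has the meta.features slot MuDataSeurat expects, before writing:

library(Seurat)
library(MuDataSeurat)

# seu.filtered is the object from the report above
# seu.filtered[["RNA"]] <- JoinLayers(seu.filtered[["RNA"]])  # if counts/data are split into layers
seu.filtered[["RNA"]] <- as(seu.filtered[["RNA"]], "Assay")
WriteH5AD(seu.filtered, "data/allcell/allcell_filtered.h5ad", assay = "RNA")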

Columns of type character converted into byte strings

Dear author,

I noticed that character-type variables of my cell metadata were converted into byte strings (with b'' surrounding the entries). I could fix this by applying .str.decode('utf-8') to the affected columns. Leaving the columns as bytes caused scanpy to fail to detect them (e.g. when passing them as the color argument to plotting functions).

Regards,
Mikhael

Remote install fails

Remote install fails with

Error: Failed to install 'MuDataSeurat' from GitHub:
  Unknown remote type: SeuratData=github
  object 'seuratdata=github_remote' of mode 'function' was not found
Traceback:

1. remotes::install_github("PMBio/MuDataSeurat")
2. install_remotes(remotes, auth_token = auth_token, host = host, 
 .     dependencies = dependencies, upgrade = upgrade, force = force, 
 .     quiet = quiet, build = build, build_opts = build_opts, build_manual = build_manual, 
 .     build_vignettes = build_vignettes, repos = repos, type = type, 
 .     ...)
3. tryCatch(res[[i]] <- install_remote(remotes[[i]], ...), error = function(e) {
 .     stop(remote_install_error(remotes[[i]], e))
 . })
4. tryCatchList(expr, classes, parentenv, handlers)
5. tryCatchOne(expr, names, parentenv, handlers[[1L]])
6. value[[3L]](cond)

R version 4.1.1 (2021-08-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.2 LTS
other attached packages:
[1] remotes_2.4.1

This was fixed by changing

Remotes: SeuratData=github::satijalab/seurat-data

to

Remotes: github::satijalab/seurat-data

WriteH5MU fails with NULL as varm_key after running Seurat workflow

Saving a Seurat object after a normal Seurat workflow fails due to a NULL value of varm_key.
I'm not sure why, or whether it occurs here or here, but wrapping these lines in if (!is.null(varm_key)) {} solved it for me, and no keys were obviously missing in the resulting .h5mu object.
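A minimal, self-contained sketch of the guard described above (the file, group, and variable names are illustrative, not the package's actual internals):

library(hdf5r)

h5 <- H5File$new("guard_demo.h5", mode = "w")
varm_group <- h5$create_group("varm")
varm_key <- NULL                      # what a standard Seurat workflow can leave behind
loadings <- matrix(rnorm(6), nrow = 3)

if (!is.null(varm_key)) {
  # only write the loadings when the reduction actually defines a varm key
  varm_group[[varm_key]] <- loadings
}
h5$close_all()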

R version 4.1.1 (2021-08-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.2 LTS

Packages:
hdf5r_1.3.4
SeuratObject_4.0.2
Seurat_4.0.4
MuDataSeurat_0.0.0.9000

Thanks for developing this package. It helps a lot in switching between Seurat and scanpy!

HDF5-API Errors: error #000: H5D.c in H5Dvlen_reclaim(): line 732: invalid argument

Hi, I'm trying to convert a MuData file to a Seurat object. I'm getting the following error while using ReadH5MU():

seurat <- ReadH5MU("str.h5mu")
Error in self$read_low_level(file_space = self_space_id, mem_space = mem_space_id, :
HDF5-API Errors:
error #000: H5D.c in H5Dvlen_reclaim(): line 732: invalid argument
class: HDF5
major: Invalid arguments to routine
minor: Bad value

Kindly help, thank you
