Hi there, I'm an analyst in the Luo lab (associated with one of the single-cell methylome datasets used in the preprint!). Thanks for developing this software. I love the user-friendliness and the VMR concept in scbs, and running scbs on CpG methylation is quite smooth.
However, I'm running into some issues because I frequently need to work with non-CpG methylation (CH methylation), which can involve ~20-fold more loci. With 64 GB of memory and ~2,000 cells, larger chromosomes fail at the end of the scbs prepare step, in what looks like the .coo to .npz conversion.
While I'm currently attempting to re-run with more memory, this is a relatively low cell count dataset for us. It'd be great to somehow merge sets of cells (so maybe I could run on 1,000 cells of the dataset at a time?) or somehow read/write each chromosome in blocks to use less memory, though I'm not sure either is possible given the details of the sparse format. Is there any way to "recover" a run and convert the 1.coo file (which is successfully created) without re-running prepare, or do you have any other recommendations?
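The read-in-blocks idea could look roughly like the sketch below. It is not scbs code: the function name `load_csr_in_chunks` and the chunk size are made up for illustration. It assumes the .coo file is a header-less CSV with three integer columns (row, column, value), which is what the (3, 2313401180) int64 array in the traceback suggests, and that the values fit into int8. Narrow dtypes replace int64 to shrink the intermediate arrays; this is untested at the scale of chromosome 1, so it may still not fit in 64 GB.

```python
import numpy as np
import pandas as pd
from scipy import sparse


def load_csr_in_chunks(coo_path, n_rows, n_cols, chunksize=10_000_000):
    """Read a 3-column COO text file in blocks and return a CSR matrix.

    Hypothetical helper, not part of scbs. Assumes columns are
    (row index, column index, value) with no header line.
    """
    rows, cols, vals = [], [], []
    reader = pd.read_csv(
        coo_path,
        header=None,
        names=["row", "col", "val"],
        # Assumption: row indices fit in int32, cell indices in int16,
        # and values in int8. Adjust if the real data differ.
        dtype={"row": np.int32, "col": np.int16, "val": np.int8},
        chunksize=chunksize,
    )
    for chunk in reader:
        # Keep only the compact NumPy arrays from each block.
        rows.append(chunk["row"].to_numpy())
        cols.append(chunk["col"].to_numpy())
        vals.append(chunk["val"].to_numpy())

    row = np.concatenate(rows)
    col = np.concatenate(cols)
    val = np.concatenate(vals)
    del rows, cols, vals  # drop the per-block copies before converting

    coo = sparse.coo_matrix((val, (row, col)), shape=(n_rows, n_cols))
    return coo.tocsr()
```

If something like this worked, the result could presumably be written with `scipy.sparse.save_npz`, though whether the resulting file matches what scbs expects in its .npz output is an open question.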
Populating 57334996 x 2165 matrix for chromosome 19...
Converting from COO to CSR...
Writing to scbs_out_CH/19.npz ...
Populating 260521582 x 2165 matrix for chromosome 1...
Traceback (most recent call last):
File "/u/home/c/lib/python3.8/site-packages/scbs/prepare.py", line 156, in _load_csr_from_coo
coo = pd.read_csv(coo_path, delimiter=",", header=None).values
File "/u/home/c/lib/python3.8/site-packages/pandas/util/_decorators.py", line 311, in wrapper
return func(*args, **kwargs)
File "/u/home/c/lib/python3.8/site-packages/pandas/io/parsers/readers.py", line 680, in read_csv
return _read(filepath_or_buffer, kwds)
File "/u/home/c/lib/python3.8/site-packages/pandas/io/parsers/readers.py", line 581, in _read
return parser.read(nrows)
File "/u/home/c/lib/python3.8/site-packages/pandas/io/parsers/readers.py", line 1269, in read
df = DataFrame(col_dict, columns=columns, index=index)
File "/u/home/c/lib/python3.8/site-packages/pandas/core/frame.py", line 636, in __init__
mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy, typ=manager)
File "/u/home/c/lib/python3.8/site-packages/pandas/core/internals/construction.py", line 502, in dict_to_mgr
return arrays_to_mgr(arrays, columns, index, dtype=dtype, typ=typ, consolidate=copy)
File "/u/home/c/lib/python3.8/site-packages/pandas/core/internals/construction.py", line 156, in arrays_to_mgr
return create_block_manager_from_column_arrays(
File "/u/home/c/lib/python3.8/site-packages/pandas/core/internals/managers.py", line 1954, in create_block_manager_from_column_arrays
blocks = _form_blocks(arrays, consolidate)
File "/u/home/c/lib/python3.8/site-packages/pandas/core/internals/managers.py", line 2028, in _form_blocks
values, placement = _stack_arrays(list(tup_block), dtype)
File "/u/home/c/lib/python3.8/site-packages/pandas/core/internals/managers.py", line 2067, in _stack_arrays
stacked = np.empty(shape, dtype=dtype)
numpy.core._exceptions.MemoryError: Unable to allocate 51.7 GiB for an array with shape (3, 2313401180) and data type int64
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/u/home/c/bin/scbs", line 8, in <module>
sys.exit(cli())
File "/u/home/c/lib/python3.8/site-packages/click/core.py", line 1128, in __call__
return self.main(*args, **kwargs)
File "/u/home/c/lib/python3.8/site-packages/click/core.py", line 1053, in main
rv = self.invoke(ctx)
File "/u/home/c/lib/python3.8/site-packages/click/core.py", line 1659, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/u/home/c/lib/python3.8/site-packages/click/core.py", line 1395, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/u/home/c/lib/python3.8/site-packages/click/core.py", line 754, in invoke
return __callback(*args, **kwargs)
File "/u/home/c/lib/python3.8/site-packages/scbs/cli.py", line 157, in prepare_cli
prepare(**kwargs)
File "/u/home/c/lib/python3.8/site-packages/scbs/prepare.py", line 44, in prepare
mat = _load_csr_from_coo(coo_path, chrom_size, n_cells)
File "/u/home/c/lib/python3.8/site-packages/scbs/prepare.py", line 165, in _load_csr_from_coo
raise type(exc)(f"{exc} (problematic file: {coo_path})").with_traceback(
TypeError: __init__() missing 1 required positional argument: 'dtype'
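The final TypeError looks like a side effect rather than the real problem: the handler in prepare.py rebuilds the exception with `type(exc)(message)`, but the NumPy memory error apparently needs more than one constructor argument, so the re-raise itself fails and hides the MemoryError. The sketch below reproduces that pattern with a stand-in class. `TwoArgMemoryError`, `reraise_same_type` and `reraise_chained` are hypothetical names, and the chained variant is only one possible alternative, not necessarily how scbs should handle it.

```python
class TwoArgMemoryError(MemoryError):
    """Hypothetical stand-in for an exception that requires two arguments."""

    def __init__(self, shape, dtype):
        super().__init__(shape, dtype)
        self.shape = shape
        self.dtype = dtype


def reraise_same_type(exc, path):
    # Same pattern as in the traceback: rebuild the exception from one string.
    # This raises TypeError when the class needs more than one argument.
    raise type(exc)(f"{exc} (problematic file: {path})").with_traceback(
        exc.__traceback__
    )


def reraise_chained(exc, path):
    # Alternative: wrap in a plain exception and chain the original,
    # so the underlying error stays visible in the traceback.
    raise RuntimeError(f"{exc} (problematic file: {path})") from exc
```

With the chained form, the file name is still reported and the original exception remains available as `__cause__`.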