Could you add something to the readme and the GitHub Pages saying which versions of Ma

System requirements? about neuropixel-utils HOT 6 CLOSED

apjanke commented on August 23, 2024

System requirements?

from neuropixel-utils.

Comments (6)

apjanke commented on August 23, 2024

A couple of other questions about neuropixel-utils' own requirements:

I see that you're using an adapted copy of kwikteam/npy-matlab to do NPY file I/O. What requirements do you have there?

How big a NPY file do you need to handle? Small enough that they can fit in memory, or potentially larger?
Are you going to need to support different endiannesses and both row-major (C) and column-major (Fortran/Matlab) dimension ordering?
Will you need to do partial, random-access file reads to just sections of the files, or is just reading a whole file in at once adequate for you?
What are your I/O speed needs?

djoshea commented on August 23, 2024

I'm actually not sure which versions of Matlab are supported; I've just been using the latest (R2020b). Did Vijay mention a tool for determining this automatically?

I'm mostly using readNPY and writeNPY to interact with data written by Kilosort 2.
Most of the loading happens here. The files are all .npy, because Kilosort writes datasets that can be loaded into a Python graphical tool called Phy for reviewing the sorting results and merging / splitting clusters. There are a couple of scattered locations where I write to npy files as well, but none of them is essential.

* How big a NPY file do you need to handle? Small enough that they can fit in memory, or potentially larger?

The largest npy file I have among my datasets is 6 GB. I think they would need to fit in memory for neuropixel-utils, because the loading code simply loads all the npy files for a given dataset into memory anyway. These npy files contain only per-spike data, so they are considerably smaller than the raw binary data files (Imec datasets).

* Are you going to need to support different endiannesses and both row-major (C) and column-major (Fortran/Matlab) dimension ordering?

All of the outputs of Kilosort are themselves written by npy-matlab's writeNPY(), which, according to its docs, "Only writes little endian, fortran (column-major) ordering".
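
Since every file comes from writeNPY(), a reader only has to handle one layout, but that is cheap to check per file because both facts live in the NPY header: the `descr` dtype string starts with `<` for little-endian, and `fortran_order` is a boolean. Below is a minimal standard-library Python sketch of the version 1.0/2.0 header layout described in the NumPy format docs (6-byte magic, 2 version bytes, a little-endian header-length field, then a Python-literal dict padded to a multiple of 64 bytes). The helpers `write_npy_header`, `read_npy_header` and `is_little_endian_fortran` are hypothetical names for illustration, not part of npy-matlab or NumPy.

```python
import ast
import io
import struct

MAGIC = b"\x93NUMPY"  # first 6 bytes of every NPY file


def write_npy_header(buf, descr, fortran_order, shape):
    """Write a version 1.0 NPY preamble + header (no data) to a binary stream."""
    header = repr({"descr": descr, "fortran_order": fortran_order, "shape": shape})
    preamble_len = len(MAGIC) + 2 + 2  # magic + version bytes + uint16 length
    total = preamble_len + len(header) + 1  # +1 for the terminating newline
    header = header + " " * ((-total) % 64) + "\n"  # pad so data is 64-byte aligned
    buf.write(MAGIC + bytes([1, 0]) + struct.pack("<H", len(header)))
    buf.write(header.encode("latin1"))


def read_npy_header(buf):
    """Parse a v1.0/v2.0 NPY header; return (descr, fortran_order, shape, data_offset)."""
    if buf.read(6) != MAGIC:
        raise ValueError("not an NPY file")
    major, minor = buf.read(2)
    if major == 1:
        (hlen,) = struct.unpack("<H", buf.read(2))  # v1.0: 2-byte length
    elif major == 2:
        (hlen,) = struct.unpack("<I", buf.read(4))  # v2.0: 4-byte length
    else:
        raise ValueError(f"unsupported NPY version {major}.{minor}")
    header = ast.literal_eval(buf.read(hlen).decode("latin1"))
    return header["descr"], header["fortran_order"], header["shape"], buf.tell()


def is_little_endian_fortran(descr, fortran_order):
    """True for the layout npy-matlab's writeNPY() says it produces."""
    # '<' = little-endian; '|' = byte order irrelevant (1-byte types)
    return descr[0] in "<|" and fortran_order


buf = io.BytesIO()
write_npy_header(buf, "<f4", True, (1000, 3))
buf.seek(0)
descr, fortran, shape, offset = read_npy_header(buf)
print(descr, fortran, shape, offset % 64)  # <f4 True (1000, 3) 0
```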

* Will you need to do partial, random-access file reads to just sections of the files, or is just reading a whole file in at once adequate for you?

I've chosen to simply load everything from the npy files into memory upfront. The gist of building the KilosortDataset class was simply to load the results into memory to facilitate subsequent processing. I imagine this could create issues for someone working on a laptop with low RAM. I imagine there may be a way to do something more sophisticated if numpy has its own random-access system for npy files, and Matlab could wrap it through the Python engine. But I went for simple. I do use random access for the raw Imec binary files, though, through Matlab's memory-mapped file infrastructure, inside ImecDataset.
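
For what it's worth, numpy does have this built in: `numpy.load(path, mmap_mode="r")` returns a memory-mapped array instead of reading everything. Even without numpy, partial reads are just a seek, because in a column-major (Fortran-order) file of shape (n_rows, n_cols), element (i, j) sits at `data_offset + (i + j * n_rows) * itemsize`, so each column is one contiguous run of bytes. A minimal standard-library Python sketch, assuming a 2-D little-endian float64 array in Fortran order; `write_npy_f64_fortran` and `read_column` are hypothetical helpers, not an npy-matlab or neuropixel-utils API.

```python
import ast
import os
import struct
import tempfile

MAGIC = b"\x93NUMPY"


def write_npy_f64_fortran(path, columns):
    """Write a 2-D little-endian float64 NPY v1.0 file in column-major order.

    `columns` is a list of equal-length lists; each column is stored contiguously.
    """
    n_rows, n_cols = len(columns[0]), len(columns)
    header = repr({"descr": "<f8", "fortran_order": True, "shape": (n_rows, n_cols)})
    total = len(MAGIC) + 4 + len(header) + 1
    header += " " * ((-total) % 64) + "\n"  # 64-byte alignment, newline-terminated
    with open(path, "wb") as f:
        f.write(MAGIC + bytes([1, 0]) + struct.pack("<H", len(header)))
        f.write(header.encode("latin1"))
        for col in columns:
            f.write(struct.pack(f"<{n_rows}d", *col))


def read_column(path, j):
    """Read only column j, seeking past the other columns instead of loading them."""
    with open(path, "rb") as f:
        if f.read(6) != MAGIC:
            raise ValueError("not an NPY file")
        major, _minor = f.read(2)
        fmt = "<H" if major == 1 else "<I"  # header-length field width by version
        (hlen,) = struct.unpack(fmt, f.read(struct.calcsize(fmt)))
        header = ast.literal_eval(f.read(hlen).decode("latin1"))
        if header["descr"] != "<f8" or not header["fortran_order"]:
            raise ValueError("sketch only handles little-endian float64, Fortran order")
        n_rows, n_cols = header["shape"]
        if not 0 <= j < n_cols:
            raise IndexError(j)
        # We are now at the start of the data; skip the first j columns.
        f.seek(j * n_rows * 8, os.SEEK_CUR)
        return list(struct.unpack(f"<{n_rows}d", f.read(n_rows * 8)))


with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "example.npy")
    write_npy_f64_fortran(path, [[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]])
    print(read_column(path, 1))  # [10.0, 20.0, 30.0]
```

The same offset arithmetic is what a memory map relies on; the seek-based version just makes the layout explicit.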

* What are your I/O speed needs?

For npy files, the loading only happens once. Generally the loading of a Kilosort Dataset takes seconds to tens of seconds, so the current performance of npy-matlab seems sufficient?

apjanke commented on August 23, 2024

Cool, thanks for all the info!

I've just been using the latest (R2020b). Did Vijay mention a tool for determining this automatically?

There are tools for determining whether a codebase is subject to compatibility issues from upgrading to a new version of Matlab, but these tools are pretty new, so I don't think they'd work on older versions. And I don't know if they'd recognize new features from future versions anyway.

The largest npy file I have among my datasets is 6 GB.

Wha? From reading https://numpy.org/devdocs/reference/generated/numpy.lib.format.html I gathered that the max NPY file size was 4 GB?

I imagine there may be a way to do something more sophisticated if numpy has its own random-access system for npy files, and Matlab could wrap it through the Python engine.

Are you willing to take a dependency on Python here? In my experience, setting up a Python environment and getting it working with Matlab is kind of a pain, especially on Windows and Mac.

What operating systems do your users use?

Generally the loading of a Kilosort Dataset takes seconds to tens of seconds, so the current performance of npy-matlab seems sufficient?

Cool, just wondering.

djoshea commented on August 23, 2024

My best guess is R2018b since I use strings in a few places, though not everywhere, and I'm guessing you'd then push it to R2019b if you substitute in arguments blocks. I don't recall if there's anything more recent than that I'm utilizing.

I'm not super familiar with the NPY format, but my quick read of that page is that the array header is limited to 4 GB, not the total file size. I don't think there's any real need to have random access of NPY files within neuropixel-utils though since loading everything into memory upfront is simple enough.
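
That reading matches the format description: in version 1.0 the header length is stored as a 2-byte little-endian unsigned integer (max 65535 bytes), and version 2.0 widens that field to 4 bytes, which is where the ~4 GiB figure comes from. The data section's size is not stored as a length at all; it follows from the `shape` and dtype written as text in the header, so a multi-GB array still has a tiny header. A small standard-library Python sketch of that arithmetic; the helper names and the example shape are hypothetical, and `itemsize` is passed in by hand because the standard library cannot decode dtype strings.

```python
import math
import struct

MAGIC = b"\x93NUMPY"
HEADER_LEN_FORMAT = {1: "<H", 2: "<I"}  # uint16 (v1.0) and uint32 (v2.0)


def max_header_bytes(major):
    """Largest header length the given NPY version's length field can express."""
    return 2 ** (8 * struct.calcsize(HEADER_LEN_FORMAT[major])) - 1


def npy_v1_sizes(descr, itemsize, shape, fortran_order=True):
    """Return (bytes before the data, data bytes) for a v1.0 NPY file."""
    header = repr({"descr": descr, "fortran_order": fortran_order, "shape": shape})
    preamble = len(MAGIC) + 2 + struct.calcsize(HEADER_LEN_FORMAT[1])
    unpadded = preamble + len(header) + 1  # +1 for the trailing newline
    prefix_bytes = unpadded + (-unpadded) % 64  # padded to a multiple of 64
    return prefix_bytes, math.prod(shape) * itemsize


print(max_header_bytes(1))  # 65535
print(max_header_bytes(2))  # 4294967295
# A 6 GB float32 array still needs only a 128-byte prefix:
print(npy_v1_sizes("<f4", 4, (500_000_000, 3)))  # (128, 6000000000)
```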

apjanke commented on August 23, 2024

Yeah, I'd like you to consider letting me bump it to R2019b: the new arguments block syntax is really nice, and I'm going to try to convince you to use it. :)

djoshea commented on August 23, 2024

Yeah, go for it. I have started using it elsewhere. It's definitely better than inputParser.
