Comments (6)
A couple of other questions about neuropixel-utils' own requirements:
I see that you're using an adapted kwikteam/npy-matlab to do NPY file I/O. What requirements do you have there?
- How big a NPY file do you need to handle? Small enough that they can fit in memory, or potentially larger?
- Are you going to need to support different endiannesses and both row-major (C) and column-major (Fortran/Matlab) dimension ordering?
- Will you need to do partial, random-access file reads to just sections of the files, or is just reading a whole file in at once adequate for you?
- What are your I/O speed needs?
from neuropixel-utils.
I'm not sure actually what versions of Matlab are supported; I've just been using the latest (R2020b). Did Vijay mention a tool for determining this automatically?
I'm mostly using readNPY and writeNPY to interact with data written by Kilosort 2.
Most of the loading happens here. The files are all .npy, because Kilosort writes datasets that can be loaded into a Python graphical tool called Phy for reviewing the sorting results and merging / splitting clusters. There are a couple of scattered locations where I write to npy files as well, but none of them is essential.
* How big a NPY file do you need to handle? Small enough that they can fit in memory, or potentially larger?
The largest npy file I have among my datasets is 6 GB. I think they would need to fit in memory for neuropixel-utils, because the loading code simply loads all the npy files for a given dataset into memory anyway. These npy files only contain per-spike data, so they are considerably smaller than the raw binary data files (Imec datasets).
* Are you going to need to support different endiannesses and both row-major (C) and column-major (Fortran/Matlab) dimension ordering?
All of the outputs of Kilosort are themselves written by npy-matlab's writeNPY(), which, according to its docs, "Only writes little endian, fortran (column-major) ordering", so that is the only combination I expect to encounter.
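For anyone wanting to confirm that combination on a given file, the NPY header records both the byte order and the dimension ordering. A minimal Python sketch, assuming numpy is available; `describe_npy` is a hypothetical helper name, and the in-memory array only stands in for a real Kilosort output:

```python
import io

import numpy as np
from numpy.lib import format as npy_format


def describe_npy(fileobj):
    """Return (shape, fortran_order, dtype string) read from an NPY header.

    Hypothetical helper; handles format versions 1.0 and 2.0 only.
    """
    version = npy_format.read_magic(fileobj)
    if version == (1, 0):
        shape, fortran_order, dtype = npy_format.read_array_header_1_0(fileobj)
    elif version == (2, 0):
        shape, fortran_order, dtype = npy_format.read_array_header_2_0(fileobj)
    else:
        raise ValueError(f"unhandled NPY version: {version}")
    # dtype.str spells out the byte order explicitly ('<' little, '>' big)
    return shape, fortran_order, dtype.str


# Stand-in for a file on disk: little-endian int32, column-major layout
buf = io.BytesIO()
np.save(buf, np.asfortranarray(np.arange(6, dtype="<i4").reshape(2, 3)))
buf.seek(0)

shape, fortran_order, dtype_str = describe_npy(buf)
print(shape, fortran_order, dtype_str)
```

With a real file, the same function can be called on `open(path, "rb")`.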
* Will you need to do partial, random-access file reads to just sections of the files, or is just reading a whole file in at once adequate for you?
I've chosen to simply load everything from the npy files into memory upfront. The gist of building the KilosortDataset class was simply to load the results into memory to facilitate subsequent processing. I imagine this could create issues for someone working on a laptop with low RAM. I imagine there may be a way to do something more sophisticated if numpy has its own random access system for npy files, and Matlab could wrap it through the Python engine. But I went for simple. I do random access for raw imec binary files though, through Matlab's memory-mapped file infrastructure, inside ImecDataset.
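On the numpy side, a random-access mechanism of this kind does exist: `np.load` accepts a `mmap_mode` argument and returns a memory-mapped array instead of reading the whole file. A minimal sketch in Python, assuming numpy is installed; the file name and array contents are made up for illustration, and wrapping this from Matlab is not shown:

```python
import os
import tempfile

import numpy as np

# Write a made-up per-spike array to a temporary .npy file
path = os.path.join(tempfile.mkdtemp(), "example_spike_times.npy")
np.save(path, np.arange(1_000_000, dtype=np.uint64))

# mmap_mode="r" returns a read-only np.memmap; the data stays on disk
spikes = np.load(path, mmap_mode="r")

# Slicing touches only the requested region; np.array copies it into RAM
chunk = np.array(spikes[500_000:500_010])
print(type(spikes).__name__, chunk[0], chunk.shape)
```

Whether this is worth a Python dependency is a separate question; the sketch only shows that partial reads are possible without loading the full array.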
* What are your I/O speed needs?
For npy files, the loading only happens once. Generally the loading of a Kilosort Dataset takes seconds to tens of seconds, so the current performance of npy-matlab seems sufficient?
from neuropixel-utils.
Cool, thanks for all the info!
> I've just been using the latest (R2020b). Did Vijay mention a tool for determining this automatically?
There are tools for determining whether a codebase is subject to compatibility issues from upgrading to a new version of Matlab, but these tools are pretty new, so I don't think they'd work on older versions. And I don't know if they'd recognize new features from future versions anyway.
> The largest npy file I have among my datasets is 6 GB.
Wha? From reading https://numpy.org/devdocs/reference/generated/numpy.lib.format.html I gathered that the max NPY file size was 4 GB?
> I imagine there may be a way to do something more sophisticated if numpy has its own random access system for npy files, and Matlab could wrap it through the Python engine.
Are you willing to take a dependency on Python here? In my experience, setting up a Python environment and getting it working with Matlab is kind of a pain, especially on Windows and Mac.
What operating systems do your users use?
> Generally the loading of a Kilosort Dataset takes seconds to tens of seconds, so the current performance of npy-matlab seems sufficient?
Cool, just wondering.
from neuropixel-utils.
My best guess is R2018b since I use strings in a few places, though not everywhere, and I'm guessing you'd then push it to R2019b if you substitute in arguments blocks. I don't recall if there's anything more recent than that I'm utilizing.
I'm not super familiar with the NPY format, but my quick read of that page is that the array header is limited to 4 GB, not the total file size. I don't think there's any real need to have random access of NPY files within neuropixel-utils though since loading everything into memory upfront is simple enough.
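That reading can be checked directly against a file: after the 6-byte magic string and two version bytes, format version 1.0 stores the header length as a 2-byte little-endian integer, and versions 2.0 and 3.0 widen that field to 4 bytes. The field sizes the header dictionary only, not the array data. A minimal sketch in Python, assuming numpy is installed; `npy_header_len` is a hypothetical helper:

```python
import io
import struct

import numpy as np


def npy_header_len(fileobj):
    """Return ((major, minor), header_len, data_offset) for an NPY stream.

    Hypothetical helper that parses only the fixed-size preamble.
    """
    if fileobj.read(6) != b"\x93NUMPY":
        raise ValueError("not an NPY file")
    major, minor = fileobj.read(2)
    if major == 1:
        # Version 1.0: 2-byte unsigned length, so the header is under 64 KiB
        (header_len,) = struct.unpack("<H", fileobj.read(2))
        data_offset = 6 + 2 + 2 + header_len
    elif major in (2, 3):
        # Versions 2.0 and 3.0: 4-byte unsigned length, so up to 4 GiB of header
        (header_len,) = struct.unpack("<I", fileobj.read(4))
        data_offset = 6 + 2 + 4 + header_len
    else:
        raise ValueError(f"unknown NPY major version: {major}")
    return (major, minor), header_len, data_offset


# Three float64 values: 24 bytes of array data after the header
buf = io.BytesIO()
np.save(buf, np.zeros(3))
total_size = buf.getbuffer().nbytes
buf.seek(0)

version, header_len, data_offset = npy_header_len(buf)
print(version, header_len, data_offset, total_size)
```

Everything past `data_offset` is raw array data, whose size is set by the shape and dtype rather than by the length field.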
from neuropixel-utils.
Yeah, I'd like you to consider letting me bump it to R2019b: the new arguments block syntax is really nice, and I'm going to try to convince you to use it. :)
from neuropixel-utils.
Yeah, go for it. I have started using it elsewhere. It's definitely better than inputParser.
from neuropixel-utils.