
System requirements? (neuropixel-utils issue, closed)

apjanke commented on August 23, 2024
System requirements?

from neuropixel-utils.

Comments (6)

apjanke commented on August 23, 2024

A couple of other questions about neuropixel-utils' own requirements:

I see that you're using an adapted version of kwikteam/npy-matlab to do NPY file I/O. What requirements do you have there?

  • How big a NPY file do you need to handle? Small enough that they can fit in memory, or potentially larger?
  • Are you going to need to support different endiannesses and both row-major (C) and column-major (Fortran/Matlab) dimension ordering?
  • Will you need to do partial, random-access file reads to just sections of the files, or is just reading a whole file in at once adequate for you?
  • What are your I/O speed needs?


djoshea commented on August 23, 2024

I'm not sure actually what versions of Matlab are supported; I've just been using the latest (R2020b). Did Vijay mention a tool for determining this automatically?

I'm mostly using readNPY and writeNPY to interact with data written by Kilosort 2.
Most of the loading happens here. The files are all .npy, because Kilosort writes datasets that can be loaded into a Python graphical tool called Phy for reviewing the sorting results and merging / splitting clusters. There are a couple of scattered locations where I write to npy files as well, but none of them is essential.

* How big a NPY file do you need to handle? Small enough that they can fit in memory, or potentially larger?

The largest npy file I have among my datasets is 6 GB. I think they would need to fit in memory for neuropixel-utils, because the loading code simply loads all the npy files for a given dataset into memory anyway. These npy files only contain per-spike data, so they are considerably smaller than the raw binary data files (Imec datasets).

* Are you going to need to support different endiannesses and both row-major (C) and column-major (Fortran/Matlab) dimension ordering?

All of the outputs of Kilosort are themselves written by npy-matlab's writeNPY(), which according to its docs: "Only writes little endian, fortran (column-major) ordering"
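A reader could verify that constraint on a given file by inspecting its NPY header. The following is a minimal sketch using only the Python standard library, assuming NPY format versions 1.0 and 2.0 as described in the NumPy format documentation. The helper names (`read_npy_header`, `matches_npy_matlab_layout`, `build_npy_v1_header`) are hypothetical and are not part of npy-matlab or neuropixel-utils.

```python
import ast
import io
import struct

MAGIC = b"\x93NUMPY"  # first 6 bytes of every NPY file


def read_npy_header(fh):
    """Parse an NPY v1.0 or v2.0 header; return ((major, minor), header_dict)."""
    if fh.read(6) != MAGIC:
        raise ValueError("not an NPY stream")
    major, minor = fh.read(2)  # iterating over bytes yields two ints
    if major == 1:
        (header_len,) = struct.unpack("<H", fh.read(2))  # uint16, little-endian
    elif major == 2:
        (header_len,) = struct.unpack("<I", fh.read(4))  # uint32, little-endian
    else:
        raise ValueError("this sketch only handles NPY v1.0 and v2.0")
    # The header is a Python dict literal padded with spaces and a newline.
    header = ast.literal_eval(fh.read(header_len).decode("latin1"))
    return (major, minor), header


def matches_npy_matlab_layout(header):
    """True if the array is little-endian (or single-byte) and column-major."""
    byte_order_ok = header["descr"][0] in "<|"  # '|' means byte order is irrelevant
    return byte_order_ok and header["fortran_order"]


def build_npy_v1_header(descr, fortran_order, shape):
    """Hypothetical helper: build v1.0 header bytes for the demo below."""
    text = "{'descr': %r, 'fortran_order': %r, 'shape': %r, }" % (
        descr, fortran_order, shape)
    # Pad so that magic + version + length field + header is a multiple of 64.
    pad = -(len(MAGIC) + 2 + 2 + len(text) + 1) % 64
    text += " " * pad + "\n"
    return MAGIC + bytes([1, 0]) + struct.pack("<H", len(text)) + text.encode("latin1")


# Demo: a little-endian, column-major int32 array header, as writeNPY would emit.
version, header = read_npy_header(io.BytesIO(build_npy_v1_header("<i4", True, (3, 2))))
# Demo: a big-endian, row-major header, which writeNPY would not produce.
_, other = read_npy_header(io.BytesIO(build_npy_v1_header(">f8", False, (5,))))
print(version, header, matches_npy_matlab_layout(header))
print(other, matches_npy_matlab_layout(other))
```

A reader that only ever sees Kilosort output could reject anything for which `matches_npy_matlab_layout` is false instead of implementing byte swapping and row-major reordering.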

* Will you need to do partial, random-access file reads to just sections of the files, or is just reading a whole file in at once adequate for you?

I've chosen to simply load everything from the npy files into memory upfront. The point of building the KilosortDataset class was simply to load the results into memory to facilitate subsequent processing. I imagine this could create issues for someone working on a laptop with low RAM. I imagine there may be a way to do something more sophisticated if numpy has its own random access system for npy files, and Matlab could wrap it through the Python engine. But I went for simple. I do random access for raw imec binary files though, through Matlab's memory-mapped file infrastructure, inside ImecDataset.
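For illustration, the memory-mapped random-access pattern mentioned for the raw binary files can be sketched with Python's standard `mmap` module. This is not the ImecDataset implementation (which uses Matlab's memory-mapped files). The file layout here (little-endian int16, channels interleaved per time sample), the channel count, and the function name `read_samples` are assumptions made for the example.

```python
import mmap
import os
import struct
import tempfile


def read_samples(path, n_channels, start, count):
    """Read `count` time samples beginning at sample index `start`.

    Assumes the file holds little-endian int16 values with all channels of
    one time sample stored contiguously. Returns a list of per-sample tuples.
    """
    frame = struct.Struct("<%dh" % n_channels)  # one time sample, all channels
    with open(path, "rb") as fh:
        with mmap.mmap(fh.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            begin = start * frame.size
            end = begin + count * frame.size
            if start < 0 or end > len(mm):
                raise IndexError("requested samples fall outside the file")
            # Only the pages backing [begin, end) need to be read from disk.
            return [frame.unpack_from(mm, off) for off in range(begin, end, frame.size)]


# Demo with a tiny synthetic file: 10 time samples x 4 channels,
# where the value at (sample t, channel c) is t * 10 + c.
fd, demo_path = tempfile.mkstemp(suffix=".bin")
with os.fdopen(fd, "wb") as out:
    for t in range(10):
        out.write(struct.pack("<4h", *(t * 10 + c for c in range(4))))

chunk = read_samples(demo_path, n_channels=4, start=7, count=2)
print(chunk)
os.remove(demo_path)
```

The same offset arithmetic would apply to the data section of an NPY file once the header length is known, which is why partial reads of NPY files are possible in principle even though loading everything upfront is simpler.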

* What are your I/O speed needs?

For npy files, the loading only happens once. Generally the loading of a Kilosort Dataset takes seconds to tens of seconds, so the current performance of npy-matlab seems sufficient?


apjanke commented on August 23, 2024

Cool, thanks for all the info!

> I've just been using the latest (R2020b). Did Vijay mention a tool for determining this automatically?

There are tools for determining whether a codebase is subject to compatibility issues from upgrading to a new version of Matlab, but these tools are pretty new, so I don't think they'd work on older versions. And I don't know if they'd recognize new features from future versions anyway.

> The largest npy file I have among my datasets is 6 GB.

Wha? From reading https://numpy.org/devdocs/reference/generated/numpy.lib.format.html I gathered that the max NPY file size was 4 GB?

> I imagine there may be a way to do something more sophisticated if numpy has its own random access system for npy files, and Matlab could wrap it through the Python engine.

Are you willing to take a dependency on Python here? In my experience, setting up a Python environment and getting it working with Matlab is kind of a pain, especially on Windows and Mac.

What operating systems do your users use?

> Generally the loading of a Kilosort Dataset takes seconds to tens of seconds, so the current performance of npy-matlab seems sufficient?

Cool, just wondering.


djoshea commented on August 23, 2024

My best guess is R2018b since I use strings in a few places, though not everywhere, and I'm guessing you'd then push it to R2019b if you substitute in arguments blocks. I don't recall if there's anything more recent than that I'm utilizing.

I'm not super familiar with the NPY format, but my quick read of that page is that the array header is limited to 4 GB, not the total file size. I don't think there's any real need to have random access of NPY files within neuropixel-utils though since loading everything into memory upfront is simple enough.
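The distinction between the header limit and the file size can be checked with a little arithmetic. The sketch below assumes the header-length field widths given in the NumPy format documentation (an unsigned 16-bit field in version 1.0 and an unsigned 32-bit field in version 2.0). The example array shape and the function name `max_header_bytes` are hypothetical, chosen only to produce a payload larger than 4 GB.

```python
import struct

# struct format of the HEADER_LEN field for each NPY major version
HEADER_LEN_FORMAT = {1: "<H", 2: "<I"}


def max_header_bytes(major):
    """Largest header length the fixed-width HEADER_LEN field can express."""
    width = struct.calcsize(HEADER_LEN_FORMAT[major])  # 2 or 4 bytes
    return 2 ** (8 * width) - 1


print(max_header_bytes(1))  # 65535
print(max_header_bytes(2))  # 4294967295, the "4 GB" figure

# The data section has no such fixed-width size field: its size follows from
# the shape and dtype recorded as text inside the header.
shape = (800_000_000,)  # hypothetical per-spike array
itemsize = 8            # bytes per element for an 8-byte dtype such as '<u8'
data_bytes = itemsize
for dim in shape:
    data_bytes *= dim
print(data_bytes)       # 6400000000, well past 4 GB
```

Under these assumptions a 6 GB NPY file is consistent with the format, since only the textual header is bounded by the length field.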


apjanke commented on August 23, 2024

Yeah, I'd like you to consider letting me bump it to R2019b: the new arguments block syntax is really nice, and I'm going to try to convince you to use it. :)


djoshea commented on August 23, 2024

Yeah, go for it. I have started using it elsewhere. It's definitely better than inputParser.

