

mountainlab-js's Introduction


MountainLab

MountainLab is data processing, sharing and visualization software for scientists. It was built to support MountainSort, a spike sorting package, but is designed to be more generally applicable.

Spike Sorting

This page documents MountainLab only. If you would like to use the MountainSort spike sorting software, please see the MountainSort documentation for installation and usage instructions.

Installation

MountainLab and associated plugins and helper code are available for Linux and MacOS. At some point, this may run on Windows.

The easiest way to install MountainLab is using conda:

conda create -n mountainlab
conda activate mountainlab
conda install -c flatiron -c conda-forge mountainlab mountainlab_pytools

You should regularly update the installation via:

conda install -c flatiron -c conda-forge mountainlab mountainlab_pytools

If you are not familiar with conda or do not have it installed, then you should read this conda guide.

Alternative installation

Alternatively, you can install MountainLab using npm (you must first install a recent version of Node.js):

npm install -g mountainlab

and mountainlab_pytools can be installed using pip (use Python 3.6 or later):

pip install mountainlab_pytools

Developer installation

Developers should install MountainLab and mountainlab_pytools from source:

git clone [this-repo]
cd [this-repo-name]
npm install .

Then add [this-repo-name]/bin to your PATH environment variable.

See the mountainlab_pytools repository for information on installing that package from source.

A note about prior versions of MountainLab

If you have a prior (non-js) version of MountainLab installed, you may want to uninstall it for sanity's sake (either via apt-get remove or by removing the mountainlab binaries from your path), although the two can co-exist since their command-line utilities have different names. Note that processor plugin libraries work equally well, and simultaneously, with both versions (we have not changed the .mp spec system). The default package search path has changed, however, so you will need to copy or link your processor packages to the new location (see below).

Test and configure your installation

Test the installation by running

ml-config

The output of this command will explain how to configure MountainLab on your system (it simply involves setting environment variables by editing a .env file).

Note that, when installed using conda, MountainLab will default to searching a configuration directory within the current conda env. Otherwise, the default location will be in ~/.mountainlab. You can always determine this location by running ml-config.

Further test the installation by running

ml-list-processors

This should list the names of all the available processors. If you have not yet installed any processor packages, then it will just list a single hello.world processor distributed with MountainLab. To see the specification for this processor in human-readable format, enter

ml-spec hello.world -p

This will show the inputs, outputs, and parameters for this processor. Since it is a minimalist processor, it doesn't have any of these, so the output of this command will be very unexciting.

Processors

MountainLab is not useful if it can't do more than hello.world. The main functionality of MountainLab is to wrap well-defined, deterministic compute operations into processors. Each processor has a specification (spec) defining the inputs, outputs, and parameters it operates on, and a processor can encapsulate a program written in any language.

In order to run a processor, it must either be installed locally or be available in a Singularity container.
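
To make this concrete, here is a minimal sketch (ours, not part of MountainLab) of the kind of program a processor wraps: a self-contained executable taking named inputs, outputs, and parameters on its command line. The file and flag names are hypothetical; registering such a program with MountainLab additionally requires an .mp spec file, described later on this page.

#!/usr/bin/env python3
# Illustrative processor body: one input, one output, one parameter.
# A processor should be deterministic: same inputs + parameters => same outputs.
import argparse

def main():
    p = argparse.ArgumentParser()
    p.add_argument('--input')                            # input file path
    p.add_argument('--output')                           # output file path
    p.add_argument('--scale', type=float, default=1.0)   # a parameter
    args = p.parse_args()
    with open(args.input) as f_in, open(args.output, 'w') as f_out:
        for line in f_in:
            f_out.write(str(float(line) * args.scale) + '\n')

if __name__ == '__main__':
    main()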

Installing processors

The easiest way to install a MountainLab processor package is using conda. For example, the ml_ephys package contains processors that are useful for electrophysiology and spike sorting. It can be installed via:

conda install -c flatiron -c conda-forge ml_ephys

Assuming that the conda package is configured properly, this will make a symbolic link in a packages/ directory within the current conda environment. To verify that it was installed properly, try ml-list-processors once again. This time, in addition to hello.world, you should see a collection of processors that start with the ephys. prefix. Now we can get something more useful from ml-spec:

ml-spec ephys.bandpass_filter -p

Alternatively, if you are not using conda, or if a MountainLab package is not available in conda, then you can control which processor packages are registered by manually creating symbolic links in the packages/ directory, as follows:

pip install ml_ephys
ml-link-python-module ml_ephys `ml-config package_directory`/ml_ephys

Here, ml-link-python-module is a convenience command distributed with MountainLab that creates symbolic links based on installed python modules. The ml-config package_directory command returns the directory where MountainLab looks for processor packages.

Developers of processor packages should use the following method for installing packages from source:

git clone https://github.com/magland/ml_ephys
ln -s "$PWD"/ml_ephys `ml-config package_directory`/ml_ephys

Note that in this last case the link target must be an absolute path (hence $PWD; a relative target would dangle), and you should make sure that all Python dependencies of ml_ephys are installed.

In general, MountainLab finds registered processors by recursively searching the packages directory for any executable files with a .mp extension. More details on creating plugin processor packages can be found elsewhere in the documentation.

Running processors

Once installed, processors can be run either directly from the command line (as shown in this section) or from a Python script (see github.com/mountainsort_examples for example Python pipelines; a minimal sketch also appears at the end of this section).

From the command line (or within a bash script), processor jobs can be executed by issuing the ml-run-process command:

ml-run-process [processor_name] \
    --inputs \
        [ikey1]:[ifile1] \
        [ikey2]:[ifile2] \
        ... \
    --outputs \
        [okey1]:[ofile1] \
        ... \
    --parameters \
        [pkey1]:[pval1] \
        ... \
    [other options]

(Note that -i, -o, and -p can be used in place of --inputs, --outputs, and --parameters)

For example, to run the hello.world processor:

ml-run-process hello.world

MountainLab maintains a database/cache of all of the processor jobs that have executed. If the same processor command is issued at a later time, with the same input files, output files, and parameters, then the system recognizes this and does not actually run the job. To force it to re-execute, use the --force_run flag as follows:

ml-run-process hello.world --force_run
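
Conceptually, the cache key is a deterministic signature computed from the processor and its arguments. The following is a hedged sketch of how such a signature can be computed, not MountainLab's actual implementation:

import hashlib
import json

def process_signature(processor_name, input_checksums, parameters):
    # A stable serialization of (processor, inputs-by-content, parameters)
    # serves as the cache key; sort_keys keeps it deterministic.
    blob = json.dumps({
        'processor': processor_name,
        'inputs': input_checksums,    # e.g. {'timeseries': '<sha1>'}
        'parameters': parameters,
    }, sort_keys=True)
    return hashlib.sha1(blob.encode('utf-8')).hexdigest()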

To get help on the other options of ml-run-process, use the following, or look elsewhere in the documentation:

ml-run-process --help

Non-hello-world examples can be found in the MountainSort examples repository.
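
As a minimal sketch of the Python route, one can simply shell out to ml-run-process (the mountainsort_examples repository linked above shows the richer mountainlab_pytools-based approach):

import subprocess

def run_process(processor, inputs=None, outputs=None, parameters=None):
    # Build and run an ml-run-process command from keyword dictionaries.
    cmd = ['ml-run-process', processor]
    for flag, kv in (('--inputs', inputs),
                     ('--outputs', outputs),
                     ('--parameters', parameters)):
        if kv:
            cmd.append(flag)
            cmd += ['{}:{}'.format(k, v) for k, v in kv.items()]
    subprocess.run(cmd, check=True)

run_process('hello.world')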

Configuration

As you will learn from running the ml-config command, MountainLab can be configured by setting environment variables. Ideally these should be specified in the mountainlab.env file (see the output of ml-config to determine its location), but those values can also be overridden by setting the variables on the command line. You can always check whether they have been set successfully for the current instance of MountainLab by running ml-config after making changes.

The following are some of the configuration variables (they each have a default value if left empty):

  • ML_TEMPORARY_DIRECTORY -- the location where temporary data files are stored (default: /tmp/mountainlab-tmp)
  • ML_PACKAGE_SEARCH_DIRECTORY -- the primary location for ML processing packages
  • ML_ADDITIONAL_PACKAGE_SEARCH_DIRECTORIES -- optional additional directories to search for packages (colon separated list)
  • ML_ADDITIONAL_PRV_SEARCH_DIRECTORIES -- optional additional directories to search for files pointed to by .prv objects

Command reference

The following commands are available from any terminal. Use the --help flag on any of these to get more detailed information.

  • mda-info Get information about a .mda file
  • ml-config Show the current configuration (i.e., environment variables)
  • ml-exec-process Run a processor job without involving the process cache for completed processes
  • ml-link-python-module Register a processor package from an installed python module, as described above
  • ml-list-processors List all registered processors on the local machine
  • ml-prv-create Create a new .prv file based on an existing data file (computes the sha1sum, etc)
  • ml-prv-download Download a file corresponding to a .prv file or object
  • ml-prv-locate Locate a file on the local machine (or remotely) based on a .prv file
  • ml-prv-sha1sum Compute the sha1sum of a data file (uses a cache for efficiency)
  • ml-prv-stat Compute the prv object for a data file (uses a cache for efficiency)
  • ml-read-dir Read a directory, which could be a kbucket path, returning a JSON object
  • ml-run-process Run a processor job
  • ml-spec Retrieve the spec object for a particular registered processor

PRV files

A .prv file is a small JSON text file that stands in for a (typically large) data file: it records information such as the original file's sha1 checksum and size, which MountainLab uses to locate and verify the underlying data (see the ml-prv-create and ml-prv-stat commands above).

Note that .prv files can be substituted for both inputs and outputs. When an input file has a .prv extension, MountainLab will search the local machine for the corresponding data file and substitute it in before running the processor (see the ml-prv-locate command). When an output file has a .prv extension, MountainLab will store the output in a temporary file (in ML_TEMPORARY_DIRECTORY) and then create a corresponding .prv file at the output location specified in the command (see the ml-prv-create command).

Thus one can do command-line processing purely using .prv files, as in the following example, which creates a synthetic electrophysiology dataset (and requires the ml_ephys processor library to be installed):

ml-run-process ephys.synthesize_random_waveforms --outputs waveforms_out:data/waveforms_true.mda.prv geometry_out:data/geom.csv --parameters upsamplefac:13 M:4
ml-run-process ephys.synthesize_random_firings --outputs firings_out:data/firings_true.mda.prv --parameters duration:600
ml-run-process ephys.synthesize_timeseries --inputs firings:data/firings_true.mda.prv waveforms:data/waveforms_true.mda.prv --outputs timeseries_out:data/raw.mda.prv --parameters duration:600 waveform_upsamplefac:13

The output files will be stored in temporary locations, which can be retrieved using the ml-prv-locate command as follows:

ml-prv-locate data/raw.mda.prv
/tmp/mountainlab-tmp/output_184a04c2877517f8996fd992b6f923bee8c6bbd2_timeseries_out
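
To illustrate what these commands compute, here is a hedged sketch of building a prv-like record for a file; the field names shown are indicative only and should be checked against the output of ml-prv-stat:

import hashlib
import json
import os

def prv_record(path):
    # The sha1 of the file contents is what allows ml-prv-locate to find
    # the underlying data wherever it happens to be stored.
    h = hashlib.sha1()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(1 << 20), b''):
            h.update(chunk)
    return {
        'original_path': os.path.abspath(path),   # field names indicative only
        'original_checksum': h.hexdigest(),
        'original_size': os.path.getsize(path),
    }

print(json.dumps(prv_record('data/geom.csv'), indent=2))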

Creating custom python processors

While MountainLab processor packages can be created in any language, the easiest way to contribute code is to use our mlprocessors framework. There you will find a step-by-step guide and links to several examples.

Custom processor libraries

Here is a list of user-contributed processor packages that we know of. You may git clone each of these into a working directory and then link them into your MountainLab packages directory as described above.

  • Identity processors: A set of "hello world" processors, to show how to make a simple processor and do file I/O.

  • ddms: Tools for converting to/from neurosuite format, by Alex Morley.

  • ironclust: A CPU-only Octave implementation of the JRCLUST algorithm, wrapped as a processor, by James Jun.

  • Loren Frank's lab processors:

    • franklab_msdrift: Modified drift processors that compare both neighbor and non-neighbor epochs for drift tracking, by Mari Sosa.

    • franklab_mstaggedcuration: Tagged curation processors that preserve "rejected" clusters for accurate metrics recalculation, by Anna Gillespie.

You can also create your own MountainLab processor libraries using any language (Python, C/C++, MATLAB, etc.). Processor libraries are simply represented by executable .mp files that provide the specifications (spec) for a collection of processors, together with command strings telling MountainLab how to execute those processors using system calls. For details, see the ml_identity processors above and the documentation on creating custom processor libraries.
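
As a hedged sketch of what such an .mp file can contain, here is an executable Python script that prints a spec for one processor. The processor name and script are hypothetical, the field names mirror the ml-spec output shown in the issues below, and the exact top-level structure and invocation convention should be checked against the ml_identity examples:

#!/usr/bin/env python3
# Sketch of an executable .mp file: emit a JSON spec describing one
# processor and the command string MountainLab should use to run it.
import json

spec = {
    'processors': [{
        'name': 'example.scale',        # hypothetical processor name
        'version': '0.1',
        'description': 'Scale each line of a text file',
        'inputs': [{'name': 'input', 'optional': 0}],
        'outputs': [{'name': 'output'}],
        'parameters': [{'name': 'scale', 'optional': 1}],
        # $name$ placeholders are substituted with the actual arguments:
        'exe_command': 'python3 scale.py --input $input$ --output $output$ --scale $scale$',
    }],
}

print(json.dumps(spec, indent=4))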

Credits and acknowledgements

The framework was conceived and primarily implemented by Jeremy Magland at the Flatiron Institute and is released under the Apache license v2.

The project is currently being developed and maintained by:

  • Jeremy Magland
  • Tom Davidson
  • Alex Morley
  • Witold Wysota

Other key collaborators at the Flatiron Institute include Alex Barnett, Dylan Simon, Leslie Greengard, Joakim Anden, and James Jun.

Jason Chung, Loren Frank, Leslie Greengard and Alex Barnett are direct collaborators in our spike sorting efforts and have therefore contributed to MountainLab, which has a broader scope. Other MountainSort users have contributed invaluable feedback, particularly investigators at UCSF (Mari Sosa has contributed code to the project).

MountainLab will also play a central role in the implementation of a website for comparing spike sorting algorithms on a standard set of ground-truth datasets. Alex Barnett, Jeremy Magland, and James Jun are leading this effort, but it is a community project, and we hope to have a lot of involvement from other players in the field of electrophysiology and spike sorting.

Alex Morley has a project and vision for applying continuous integration principles to research which will most likely use MountainLab as a core part of its implementation.

(If I have neglected to acknowledge your contribution, please remind me.)

Related Projects / Components

KBucket & kbclient - Distributed Data Access
MountainView & EPhys-Viz (WIP) - Visualisation
MountainLab PyTools - Python Tools

mountainlab-js's People

Contributors

ahbarnett, alexmorley, jamesjun, magland, tjd2002, wysota


mountainlab-js's Issues

Proposal: standalone installer

A standalone, single-step installer that does not require installing, configuring, or using Conda.

Conda constructor is the tool used to create custom Conda distributions (like miniconda, Anaconda, etc). It creates single-file .sh, .pkg, or .exe installers that include a pre-selected set of conda packages.

They are really easy to build with a single config file (see, e.g., https://github.com/flatironinstitute/mountainlab-conda/tree/master/constructor.ms4), but they don't currently function as a 'click-and-run' software installer. In particular, if you put the /bin directory on the path, you get all the binaries from all the dependencies in the conda env and clobber much of the user's other software (python, qmake, npm, ...), which is not what is expected of an installer.

I have proposed an enhancement to constructor at this issue which would provide specified 'entry points' for specified apps. I'm not sure if it will get any uptake from the conda devs, but there's probably a way to implement it ourselves. Using either conda-run or exec-wrappers, one can run an executable in a conda env without having to activate the env in your shell.

Even though we would hide the 'conda'-ness of this install method, it would still be a full-fledged conda environment under the hood, which means the install could be updated (with, e.g., conda update -p /path/to/mountainsort --all) or extended (conda install -p /path/to/mountainsort -c flatiron fancy_new_plugin). We could even provide this functionality with wrapper commands like ml-update or ml-get-package <packagename>.

I could also see this being a nice way to distribute things like lari/kbucket (which need to run persistently), since it manages dependencies, integrates with the way we distribute other components, and doesn't require the end user to use any package manager.

One downside of this is that you ship EVERYTHING in one big bundle. For Qt or Electron apps this adds up to hundreds of MB fairly quickly. For a first-time install this is no penalty, since you'd need to download those dependencies regardless; but it could add up and be unwieldy if there are multiple versions kicking around. Installing via conda directly gets around this, since dependencies are shared across envs; maybe there's a way to get a 'constructor'-built installer to take advantage of that.

Test for unimodality

Hello!

Sorry, it's a repost from flatironinstitute/mountainsort#48.

I want to understand your test for unimodality (based on Hartigan and Hartigan, 1985), but I can't find it in the code. Perhaps I am looking in the wrong place, or I don't understand the C++. Would you help me please? Where can I find the test? If I could see the algorithm separately from the C++ code, that would be perfect for understanding the test.

Thank you!

Guada

Package mountainlab (+plugins) using conda

Once we figure out how to run mountainlab in an isolated env (#14), consider whether we can package it using conda.

Goal is to be able to run:
conda install -c flatironinstitute mountainlab-js mountainlab-ephys-plugins
...or similar, and have all dependencies pulled in and configured, binaries on path, etc.

Nodejs and mongo are both already provided as conda packages, and conda claims to support distribution of js apps, so I think this might be doable without heroics.

error when there are no clusters found on a particular channel

Reposting error from Issue: ms4alg hangs after lots of thread-related log output #24

Clustering for channel 80 (phase1)...
Found 0 clusters for channel 80 (phase1)...
Computing templates for channel 80 (phase1)...

multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/data/zaworaca/anaconda3/lib/python3.6/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/data/zaworaca/anaconda3/lib/python3.6/multiprocessing/pool.py", line 44, in mapstar
    return list(map(*args))
  File "/gpfs/gsfs7/users/zaworaca/mountainlab-js/packages/ml_ms4alg/ms4alg.py", line 513, in run_phase1_sort
   neighborhood_sorter.runPhase1Sort()
  File "/gpfs/gsfs7/users/zaworaca/mountainlab-js/packages/ml_ms4alg/ms4alg.py", line 345, in runPhase1Sort
    self.runSort(mode='phase1')
  File "/gpfs/gsfs7/users/zaworaca/mountainlab-js/packages/ml_ms4alg/ms4alg.py", line 395, in runSort
    templates=compute_templates_from_timeseries_model(X,times,labels,nbhd_channels=nbhd_channels,clip_size=clip_size,chunk_infos=chunk_infos)
  File "/gpfs/gsfs7/users/zaworaca/mountainlab-js/packages/ml_ms4alg/ms4alg.py", line 269, in compute_templates_from_timeseries_model
    K=np.max(labels)
  File "/data/zaworaca/anaconda3/lib/python3.6/site-packages/numpy/core/fromnumeric.py", line 2320, in amax
    out=out, **kwargs)
  File "/data/zaworaca/anaconda3/lib/python3.6/site-packages/numpy/core/_methods.py", line 26, in _amax
    return umr_maximum(a, axis, None, out, keepdims)
ValueError: zero-size array to reduction operation maximum which has no identity
"""

The above exception was the direct cause of the following exception:


Traceback (most recent call last):
  File "/data/zaworaca/mountainlab-js/packages/ml_ms4alg/ms4alg_spec.py", line 11, in <module>
    if not PM.run(sys.argv):
  File "/data/zaworaca/anaconda3/lib/python3.6/site-packages/mltools/processormanager/processormanager_impl.py", line 37, in run
    return P(**args)
  File "/gpfs/gsfs7/users/zaworaca/mountainlab-js/packages/ml_ms4alg/p_ms4alg.py", line 72, in sort
    MS4.sort()
  File "/gpfs/gsfs7/users/zaworaca/mountainlab-js/packages/ml_ms4alg/ms4alg.py", line 591, in sort
    pool.map(run_phase1_sort, neighborhood_sorters)
  File "/data/zaworaca/anaconda3/lib/python3.6/multiprocessing/pool.py", line 266, in map
   return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/data/zaworaca/anaconda3/lib/python3.6/multiprocessing/pool.py", line 644, in get
    raise self._value
ValueError: zero-size array to reduction operation maximum which has no identity

[ Removing temporary directory ... ]
Process returned with non-zero exit code.

This error only appears when I sort an hour-long session (96 channels X 107919471 time points). Sorting only a small subset of that session (96 channels X 18000000 time points) runs successfully with your fix above.

_sort_array_err.log
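
For reference, a minimal sketch (not an official fix) of the kind of guard that would avoid this empty reduction, assuming labels is a NumPy array of cluster labels:

import numpy as np

def num_clusters(labels):
    # np.max on an empty array raises "zero-size array to reduction
    # operation maximum which has no identity"; treat no events as K=0.
    if labels.size == 0:
        return 0
    return int(np.max(labels))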

Visualization of noise cluster

If I understand it right, noise metrics are computed from a noise cluster consisting of snippets from random time points.
It would be useful to have that noise cluster (or newly generated ones) visible in mountainview (or its successor) - e.g. spike spray, PCA features, templates.

Enable running mountainlab in isolated environments (e.g. conda)

Working towards multiple installed instances of mountainlab on a single machine. There are 2 main cases to consider:

  • Multiple users logged into same machine, using mountainlab simultaneously or asynchronously
    • Do they want to share .prv files?
    • How to manage resources (CPU, memory, disk). Queueing shared across users? Or use a non-mountainlab, systemwide solution (e.g. https://en.wikipedia.org/wiki/Cgroups)
  • A single user with multiple versions of mountainlab installed in different isolated environments (e.g. conda env)

This is a placeholder issue; I'll expand each of the points below into a separate issue later.

  • Override hardcoded .mountainlab/mountainlab.env search path (with an environment variable). (string appears in mlproc/common.js , lari/laricommon.js and the examples ms4_sort.sh shell script)
  • Create per-user (or per-user+per-env) databases in system-wide mongo database. (Could also just use per-user mongo instances, writing to files in user's home dir, in which case may need to deal with port conflicts).
  • Support npm install --global: currently it hangs, though it works if you manually do global installs of all the subpackages. Global installs work nicely within the env--they make the binaries available within the env.
  • Figure out how this all interacts with lari, and ml-queue-process. (For starters, just run without lari)

ms4alg hangs after lots of thread-related log output

My run of the sorting algorithm produces the following output and seems to hang during the re-assigning phase (attaching output file). It will stay at this spot for over 15 minutes, even while running with 60 CPUs and 100GB of RAM (processing a 40GB file). Is it normal for the re-assigning phase to take that long?

Could it be related to all the OpenBLAS outputs I get? How should I interpret the OpenBLAS outputs:

OpenBLAS blas_thread_init: RLIMIT_NPROC 4096 current, 6190469 max
OpenBLAS blas_thread_init: pthread_create: Resource temporarily unavailable

_sort_hang.log

Make safedatabase module...

... because right now safedatabase.js is duplicated code between kbclient and mountainlab-js

TODO:

  • Think of a better name?
  • Publish on npm
  • Update mountainlab-js and kbclient to use the new package

ml-run-process ms4alg.sort, Warning: unable to rename file

Hi, I just installed mountainlab-js through conda and am trying out your algorithm, but I keep getting the following error:

RUNNING: ml-run-process ephys.bandpass_filter --inputs timeseries:/home/bl202/Documents/sample_data/raw.mda --parameters freq_max:6000 freq_min:300 samplerate:30000 --outputs timeseries_out:/home/bl202/Documents/sample_data/out/filt.mda.prv
[ Getting processor spec... ]
[ Checking inputs and substituting prvs ... ]
[ Computing process signature ... ]
[ Checking outputs... ]
[ Checking process cache ... ]
[ Process ephys.bandpass_filter already completed. ]
[ Creating output prv for timeseries_out ... ]
[ Done. ]
RUNNING: ml-run-process ephys.whiten --inputs timeseries:/home/bl202/Documents/sample_data/out/filt.mda.prv --parameters --outputs timeseries_out:/home/bl202/Documents/sample_data/out/pre.mda.prv
[ Getting processor spec... ]
[ Checking inputs and substituting prvs ... ]
Locating /home/bl202/Documents/sample_data/out/filt.mda.prv ...
[ Computing process signature ... ]
[ Checking outputs... ]
[ Checking process cache ... ]
[ Process ephys.whiten already completed. ]
[ Creating output prv for timeseries_out ... ]
[ Done. ]
RUNNING: ml-run-process ms4alg.sort --inputs geom:/home/bl202/Documents/sample_data/geom.csv timeseries:/home/bl202/Documents/sample_data/out/pre.mda.prv --parameters adjacency_radius:100 detect_sign:-1 detect_threshold:-1 --outputs firings_out:output/firings.mda
[ Getting processor spec... ]
Warning: unable to rename file /home/bl202/Documents/tmp-mlab/tempdir_d5c04fe186_mR0sjC/output_firings_out.mda -> /home/bl202/Documents/output/firings.mda . Perhaps temporary directory is not on the same device as the output file directory.
[ Checking inputs and substituting prvs ... ]
Error renaming file /home/bl202/Documents/tmp-mlab/tempdir_d5c04fe186_mR0sjC/output_firings_out.mda -> /home/bl202/Documents/output/firings.mda: ENOENT: no such file or directory, copyfile '/home/bl202/Documents/tmp-mlab/tempdir_d5c04fe186_mR0sjC/output_firings_out.mda' -> '/home/bl202/Documents/output/firings.mda'
Locating /home/bl202/Documents/sample_data/out/pre.mda.prv ...

[ Computing process signature ... ]
[ Checking outputs... ]
[ Checking process cache ... ]
[ Creating temporary directory ... ]
[ Preparing temporary outputs... ]
[ Initializing process ... ]
[ Running ... ] /home/bl202/anaconda3/envs/mlab/bin/python3 /home/bl202/anaconda3/envs/mlab/etc/mountainlab/packages/ml_ms4alg/ms4alg_spec.py.mp ms4alg.sort --geom=/home/bl202/Documents/sample_data/geom.csv --timeseries=/home/bl202/Documents/tmp-mlab/output_fc01c6308d08b94c019ddc06dc9ff6e6debf8494_timeseries_out.mda --firings_out=/home/bl202/Documents/tmp-mlab/tempdir_d5c04fe186_mR0sjC/output_firings_out.mda --adjacency_radius=100 --detect_sign=-1 --detect_threshold=-1 --_tempdir=/home/bl202/Documents/tmp-mlab/tempdir_d5c04fe186_mR0sjC
Using tempdir=/home/bl202/Documents/tmp-mlab/tempdir_d5c04fe186_mR0sjC
Preparing /home/bl202/Documents/tmp-mlab/tempdir_d5c04fe186_mR0sjC/timeseries.hdf5...
Preparing neighborhood sorters...

Detecting events on channel 1 (phase1)...
Detecting events on channel 2 (phase1)...

Detecting events on channel 4 (phase1)...
Detecting events on channel 3 (phase1)...

Computing PCA features for channel 3 (phase1)...

Computing PCA features for channel 4 (phase1)...

Computing PCA features for channel 1 (phase1)...

Computing PCA features for channel 2 (phase1)...

Clustering for channel 3 (phase1)...

Clustering for channel 2 (phase1)...

Clustering for channel 1 (phase1)...

Clustering for channel 4 (phase1)...

Found 3 clusters for channel 1 (phase1)...
Computing templates for channel 1 (phase1)...

Found 2 clusters for channel 4 (phase1)...
Computing templates for channel 4 (phase1)...

Found 2 clusters for channel 3 (phase1)...
Computing templates for channel 3 (phase1)...

Found 2 clusters for channel 2 (phase1)...
Computing templates for channel 2 (phase1)...

Re-assigning events for channel 1 (phase1)...

Re-assigning events for channel 4 (phase1)...

Re-assigning events for channel 3 (phase1)...

Re-assigning events for channel 2 (phase1)...

Computing PCA features for channel 3 (phase2)...

Computing PCA features for channel 1 (phase2)...

Computing PCA features for channel 4 (phase2)...

Computing PCA features for channel 2 (phase2)...

Clustering for channel 3 (phase2)...

Clustering for channel 1 (phase2)...

Clustering for channel 4 (phase2)...

Clustering for channel 2 (phase2)...

Found 4 clusters for channel 3 (phase2)...

Found 6 clusters for channel 1 (phase2)...

Found 12 clusters for channel 4 (phase2)...

Found 9 clusters for channel 2 (phase2)...

Preparing output...

Writing firings file...

Done.

Elapsed time for processor ms4alg.sort: 99.194 sec
Finalizing output firings_out
[ Removing temporary directory ... ]
Traceback (most recent call last):

File "", line 1, in
runfile('/home/bl202/Documents/temp.py', wdir='/home/bl202/Documents')

File "/home/bl202/anaconda3/envs/mlab/lib/python3.6/site-packages/spyder_kernels/customize/spydercustomize.py", line 678, in runfile
execfile(filename, namespace)

File "/home/bl202/anaconda3/envs/mlab/lib/python3.6/site-packages/spyder_kernels/customize/spydercustomize.py", line 106, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)

File "/home/bl202/Documents/temp.py", line 259, in
detect_threshold=-1,

File "/home/bl202/Documents/temp.py", line 54, in sort_dataset
opts=opts

File "/home/bl202/Documents/temp.py", line 125, in ms4alg_sort
opts

File "/home/bl202/anaconda3/envs/mlab/lib/python3.6/site-packages/mountainlab_pytools/mlproc/mlproc_impl.py", line 235, in runProcess
return P.run(inputs,outputs,parameters,opts)

File "/home/bl202/anaconda3/envs/mlab/lib/python3.6/site-packages/mountainlab_pytools/mlproc/mlproc_impl.py", line 189, in run
raise Exception('Non-zero exit code for {}'.format(self.name()))

Exception: Non-zero exit code for ms4alg.sort


As you can see, the temporary directory is in fact on the same device as the output file directory. Do you have any pointers on what I'm doing wrong?

Thanks in advance

"Unexpected input" error for input listed by ml-spec

The input file_in is listed for processor kepecs.duplifile, yet ml-run-process throws an "Unexpected input: file_in" error.

I might be doing something wrong (e.g. the exe_command is possibly flawed - see issue #16) - help appreciated.

Spec file created with jsonlab.
Original code here.

(mycondaenv) computername:~ username$ ml-spec kepecs.duplifile
{
    "name": "kepecs.duplifile",
    "version": "0.0",
    "description": "Duplicates input file",
    "inputs": {
        "name": "file_in",
        "optional": 0
    },
    "outputs": [],
    "parameters": [],
    "opts": {
        "force_run": 1
    },
    "exe_command": "/Users/username/.mountainlab/packages/minim/run_mfile.sh \"cd('processors'), duplifile($file_in$)\""
}
(mycondaenv) computername:~ username$ ml-run-process kepecs.duplifile --inputs file_in:"/Users/username/Downloads/geom.csv"
[ Getting processor spec... ]
Error: Unexpected input: file_in
    at check_iop (/Users/username/Programs/mountainlab-js/mlproc/run_process.js:906:10)
    at run_process_2 (/Users/username/Programs/mountainlab-js/mlproc/run_process.js:47:3)
    at /Users/username/Programs/mountainlab-js/mlproc/run_process.js:29:3
    at /Users/username/Programs/mountainlab-js/mlproc/common.js:72:7
    at /Users/username/Programs/mountainlab-js/mlproc/common.js:226:6
    at /Users/username/Programs/mountainlab-js/mlproc/db_utils.js:65:4
    at result (/Users/username/Programs/mountainlab-js/mlproc/node_modules/mongodb/lib/utils.js:414:17)
    at session.endSession (/Users/username/Programs/mountainlab-js/mlproc/node_modules/mongodb/lib/utils.js:401:11)
    at ClientSession.endSession (/Users/username/Programs/mountainlab-js/mlproc/node_modules/mongodb/node_modules/mongodb-core/lib/sessions.js:104:41)
    at executeCallback (/Users/username/Programs/mountainlab-js/mlproc/node_modules/mongodb/lib/utils.js:397:17)
Unexpected input: file_in
(mycondaenv) computername:~ username$ 

example 002 could not connect to lari server

Hi Jeremy -
the example 001 sort and view bash scripts work.
However, 002 doesn't. I haven't started a local server, since it's not clear I had to - do I?
Assuming I do, ml-lari-start isn't found, even if I'm in the lm-env virtualenv.
Thanks, Alex

Documentation + Web Server

Tried to add some "higher level" docs.

  • Need to lay out exactly what each part of the framework is responsible for
  • Need to work more on explaining the motivation

Webpage

  • Delete Any Unnecessary Files

See PR #7

Include example mountainlab.env file in source

It would be nice to include a sample mountainlab.env file that users can refer to when setting up envs. This could also serve as documentation of the various config options. A common pattern on Unix is to include config files like this with the default values filled, but the lines commented out. (A little tricky in our case since we want everything to be relocatable; we also don't want this config to suffer from documentation skew if we forget to update it...).

## *** Run 'ml-config' to inspect the current and default values
## *** Separate multiple directory entries with ':'

## Directory where Mountainlab will keep temp files
# ML_TEMPORARY_DIRECTORY=/tmp/mountainlab-tmp

## Directory to search for processor packages:
# ML_PACKAGE_SEARCH_DIRECTORY=

make safedatabase package

Because right now the safedatabase.js code is duplicated between mountainlab-js and kbclient.

TODO:

  • Choose a better name?
  • Separate it out into a package and publish on npm.
  • Update mountainlab-js and kbclient to use this package

Error in phase 2 of clustering

When I run some of my datasets, I get a repeated warning that splitting the data is generating empty parcels and that this could be due to duplicate points. Eventually, the program crashes. I've checked, and there aren't any duplicates; also, I can run the same data multiple times and only sometimes get a crash, without changing anything. How would you recommend troubleshooting this? I didn't attach the entire error log, but it's a repetition of

Warning in isosplit5: new parcel has no points -- perhaps dataset contains duplicate points? -- original size = 19.
Warning: Size did not change after splitting parcel.
Warning in isosplit5: new parcel has no points -- perhaps dataset contains duplicate points? -- original size = 19.
Warning in isosplit5: new parcel has no points -- perhaps dataset contains duplicate points? -- original size = 19.

finally followed by

RangeError: Invalid string length
    at Socket.<anonymous> (/home/anaconda3/envs/ml-env/lib/node_modules/mountainlab-js/mlproc/systemprocess.js:52:24)
    at Socket.emit (events.js:180:13)
    at addChunk (_stream_readable.js:274:12)
    at readableAddChunk (_stream_readable.js:261:11)
    at Socket.Readable.push (_stream_readable.js:218:10)
    at Pipe.onread (net.js:581:20)

ms4alg error with intermediate files

Hello mountainlab-js developers,

I'm trying to use ms4alg with my data and I get this error:
KeyError: "Unable to open object (file read failed: time = Tue Jun 19 16:04:17 2018\n, filename = '/media/C/Projects/SPIKE_SORT/temp/tempdir_a072851555_xIrCyD/timeseries.hdf5', file descriptor = 4, errno = 22, error message = 'Invalid argument', buf = 0x2f0a168, total read size = 544, bytes this sub-read = 544, bytes actually read = 18446744073709551615, offset = 5405935000)"

Here is the full output: https://pastebin.com/LP8qf5wC

Best,
Ioannis

Errors when tmp dir is on another device

I get errors like this on linux:

[ Removing temporary directory ... ]
Error renaming file /tmp/mountainlab-tmp/tempdir_854c703b10_9sIHUg/output_geometry_out.csv -> /home/tjd/Src/mountainlab-js/examples/spike_sorting/001_ms4_bash_example/dataset/geom.csv: EXDEV: cross-device link not permitted, rename '/tmp/mountainlab-tmp/tempdir_854c703b10_9sIHUg/output_geometry_out.csv' -> '/home/tjd/Src/mountainlab-js/examples/spike_sorting/001_ms4_bash_example/dataset/geom.csv'

The problem is an attempt to use a js 'rename' to move a file from /tmp (which in this case is on a local HD) to /home/tjd/... (which in this case is an NFS share):

$ df -h / /home
Filesystem      Size  Used Avail Use% Mounted on
/dev/sdc1        55G   44G  8.7G  84% /
hippo:/home     5.4T  1.8T  3.4T  35% /home

I could see this coming up a lot. This Stack Overflow post suggests the solution is to use something like fs.copyFile + unlink instead of fs.rename. We should probably root out all calls to rename that could cross devices.

Test example pipelines

We should extend test/test.sh to include running some basic pipelines.

Running the example scripts seems like a good start.

ml-run-process option: --lari_out

Specify an output .json file for LARI information... such as lari_id and (more importantly) lari_job_id. This is important for jupyter lab integration... and integration with kbucketgui viewing status of a job.

when to use whitening step?

If I intend to sort all my channels independently, does it still make sense to perform the whitening step on all the channels?

Multiple Calls to MLC.addProcess in for loop -> error

e.g.

async function convert_firings(MLC, dataset_dirname, output_dirname, params, temp_dir) {
  // Queue one dd.convert_firings job per tetrode, then run them all.
  for (let tet = 1; tet < params.tetrodes.length + 1; tet++) {
    MLC.addProcess({
      processor_name: 'dd.convert_firings',
      inputs: {
        firings: output_dirname + '/firings.mda',
        params: temp_dir + 'params.json'
      },
      outputs: {
        res_fname: output_dirname + '/res.' + tet,
        clu_fname: output_dirname + '/clu.' + tet
      },
      parameters: {
        tetrode: tet
      }
    });
  }
  await MLC.run();
}

Errors like

[...]
Error in find: Unexpected token { in JSON at position 10746 
[...]

I think it's a race condition. Maybe two processes are trying to write to the database at the same time? Or is this not the intended use?

Is there currently a lock on the database?

re-implement ephys.* and ms4alg.* processors using new processor manager

Use a slightly different namespace so we can test the equivalence, and then we'll replace the existing processors with the new ones.

These include the following:

ephys.bandpass_filter
ephys.compare_ground_truth
ephys.compute_cluster_metrics
ephys.compute_cross_correlograms
ephys.compute_templates
ephys.convert_array
ephys.synthesize_random_firings
ephys.synthesize_random_waveforms
ephys.synthesize_timeseries
ephys.whiten

ms4alg.apply_label_map
ms4alg.create_label_map
ms4alg.sort

Updates

What's the best way to go about updating mountainlab and all available processors/packages? This can be annoying to do manually once there are a few processors installed, especially as the project is pretty fast-moving at the moment.

I can think of a few options:

  • Simple bash script.
    Would have to be edited by users who have different packages.

  • npm script
    No way of knowing how to update a given package

  • Either of the above + an executable "update" script in the root of each package
    Meh...

  • Add an entry onto the spec file called "update"
    This is my favorite but could still be fragile to different installation commands depending on the host environment. And spec-based things are expected to be robust.

  • Orrrr is this beyond the scope of the platform?

command line tools should report version on request

Can we add a -V or --version flag (or both) to the following tools? This would be handy for debugging user installs, path issues, etc.

  • mlproc (and derivatives: ml-list-processors, etc.). ml-queue-process --version would result in a call of mlproc run-process --mode=queue --version, so ideally this would work too. These should return the version of mountainlab-js.
    • Exception: ml-spec --version should work, but ml-spec <processor-name> --version should probably fail, as it's ambiguous whether the request is for the version of the processor or of mountainlab.
  • mda-info
  • ephys-viz tools
  • qt-mountainview
  • lari-* kb-* kbucket-* ?

Processor plugins already report their version through ml-spec. Is 'version' a required field there?
Python packages should already expose __version__

Meta-Issue: "Soft" ToDo's before Beta Release

Docs

  • Audit Docs, have naive user install/test/explain
  • Resolve #37 (Install Instructions)

Other

  • Go through error messages and make sure at least most of them are helpful
  • Add a CHANGELOG.md

Community

  • Add contributing.md
  • Add Issue Template
  • ?Add PR Template

Processing Server Manager - WIP

See alexmorley:ProcessorManager

Rough Logic:

MLSTUDY -> "I have this pool ID", "Give me a container"

LARI(Hub)
if !CONTAINER_ID & POOL_ID then

  • get all CONTAINER_IDs associated with POOL_ID
  • check their stats and/or in use flag
  • pick best
    then return CONTAINER_ID to MLstudy

MLSTUDY then has a container ID and can proceed as before.

Details

  • Pool IDs for now are just simple strings. In the future they could be JWTs or we add a POOL_AUTH token variable.

To Do

  • Add Pool_ID to lariserver
  • Add Pool_ID as a filter parameter to get-available-containers from lari API
  • Add interface to explore in mlstudy.
  • MLStudy "Central Hub" Filter Containers by POOL_IDs accessible to a given user (--> thus user's pool access for a given hub either needs to be stored somewhere)
  • List all current (and past) processes for a child lari
  • IN_USE Flag - Flag for child lari to say that no new clients should be able to connect.
  • "Spawners"
    IF POOL_ID == some kubernetes cluster then spin me up a new pod in that cluster with mountainlab installed
    • Similar logic for mlworkspace containers
    • Similar logic for mountainlab containers, but with local spawners (e.g. exec(docker ...))

64bit firings.mda write error with big files

Hey,

I have just begun using mountainlab-js for spike sorting. I followed the installation steps and have mountainlab-js installed along with the following packages:

banjoview.cross_correlograms ephys.bandpass_filter ephys.compare_ground_truth ephys.compute_cluster_metrics ephys.compute_cross_correlograms ephys.compute_templates ephys.convert_array ephys.synthesize_random_firings ephys.synthesize_random_waveforms ephys.synthesize_timeseries ephys.whiten kbucket.download kbucket.upload mountainsortalg.ms3 mountainsortalg.ms3alg ms3.apply_timestamp_offset ms3.apply_whitening_matrix ms3.bandpass_filter ms3.cluster_metrics ms3.combine_cluster_metrics ms3.combine_firing_segments ms3.combine_firings ms3.compute_amplitudes ms3.compute_templates ms3.compute_whitening_matrix ms3.concat_event_times ms3.concat_firings ms3.concat_timeseries ms3.confusion_matrix ms3.create_firings ms3.create_multiscale_timeseries ms3.extract_clips ms3.extract_firings ms3.isolation_metrics ms3.link_segments ms3.load_test ms3.mask_out_artifacts ms3.mv_compute_amplitudes ms3.mv_compute_templates ms3.mv_discrimhist ms3.mv_extract_clips ms3.mv_extract_clips_features ms3.mv_subfirings ms3.reorder_labels ms3.run_metrics_script ms3.split_firings ms3.whiten ms3.whiten_clips ms4alg.sort mv.compute_templates mv.create_multiscale_timeseries mv.extract_clips mv.mv_compute_amplitudes mv.mv_compute_templates mv.mv_discrimhist mv.mv_extract_clips mv.mv_extract_clips_features mv.mv_subfirings pyms.apply_label_map pyms.compute_accuracies pyms.compute_templates pyms.create_label_map pyms.extract_clips pyms.extract_geom pyms.extract_timeseries pyms.normalize_channels pyms.synthesize_drifting_timeseries pyms.synthesize_random_firings pyms.synthesize_random_waveforms pyms.synthesize_timeseries spikeview.metrics1 spikeview.templates

As you can see, I have packages from the old implementation of mountainsort linked to the new installation as well. I run ephys.bandpass_filter followed by ms4alg.sort on my data using the following command:

ml-run-process ephys.bandpass_filter --inputs timeseries:i140703-001.mda --outputs timeseries_out:i140703-001-filt.mda --parameters samplerate:30000 freq_min:300 freq_max:6000 && ml-run-process ms4alg.sort --inputs timeseries:i140703-001-filt.mda geom:utah_geom.csv --outputs firings_out:i140710-001-firings.mda --parameters adjacency_radius:0 detect_sign:-1 detect_threshold:2.5 clip_size:40

The filtering works just fine and creates a filtered data file. I know it works, because I use the ev-view-timeseries package to visualize the data. The second step, however, seems to run all the way to the end but doesn't really work. Here's why.

Visualizing it using qt-mountainview simply doesn't work. I use the following command to initiate it:

qt-mountainview --raw=i140703-001.mda --filt=i140703-001-filt.mda --geom=utah_geom.csv --firings=i140703-001-firings.mda

It appears to do a lot of computation, but the command-line output starts so:

** qt-mountainview ; origin: https://github.com/flatironinstitute/qt-mountainview.git ; commit: tags/standalone-r1-0-g9a3acac
** Compiled using Qt version: 5.5.1 (/usr on host: 'inm6058'
Setting up object registry...
Parsing command-line parameters...
Creating MVContext...
Setting up context...
Creating prv object for: i140703-001-filt.mda
Creating prv object for: i140703-001.mda
Creating prv object for: i140703-001-firings.mda
Setting up main window...
Adding controls to main window...
Opening initial views...
Starting event loop...

and then thousands of lines of:

Warning problem reading chunk in diskreadmda: 0<>100000

and the qt-mountainview window shows nothing. This might be a qt-mountainview error, so I'd be happy to file another report on the qt-mountainview repository.

I then used the readmda function to load the firings file, and it throws this error:

cannot reshape array of size 0 into shape (3,109122729)

so then I used numpy to load the same file using numpy.fromfile() with count=-1 and the output is just:

array([1.90979621e-313, 6.36598737e-314])

So, finally, I checked the file size of the firings file, and it's just 20 bytes. Something is obviously wrong.

I am running this in a pip environment on Ubuntu 16.04. Any ideas what might be wrong here?
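
For anyone hitting the same symptom, here is a hedged sketch for inspecting an .mda header to see whether any data follows it. The field layout follows the MDA format as implemented in mountainlab_pytools' mdaio (the negative-dimension-count convention for 64-bit dimension fields is an assumption to verify against that module):

import os
import numpy as np

def inspect_mda(path):
    with open(path, 'rb') as f:
        # Header: int32 data-type code, bytes per entry, number of dims.
        dt_code, nbytes_per_entry, ndims = np.fromfile(f, dtype=np.int32, count=3)
        use64 = ndims < 0                 # assumed 64-bit-dims convention
        ndims = abs(int(ndims))
        dims = np.fromfile(f, dtype=np.int64 if use64 else np.int32, count=ndims)
    header = 12 + ndims * (8 if use64 else 4)
    expected = header + int(np.prod(dims)) * int(nbytes_per_entry)
    print('dims:', dims.tolist(), '| expected size:', expected,
          '| actual size:', os.path.getsize(path))

Under that layout, a 20-byte firings.mda would be header-only (12 bytes plus two 4-byte dimensions), with no sample data written after it.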
