
Parmap is a minimalistic library that lets OCaml programs exploit multicore architectures with minimal modifications.

Home Page: http://rdicosmo.github.io/parmap/

License: Other

OCaml 92.67% C 4.76% Makefile 0.50% TeX 2.08%

parmap's Introduction

Parmap in a nutshell

Parmap is a minimalistic library that lets OCaml programs exploit multicore architectures with minimal modifications: if you want to use your many cores to accelerate an operation that happens to be a map, fold or map/fold (map-reduce), just use Parmap's parmap, parfold and parmapfold primitives in place of the standard List.map and friends, and specify the number of subprocesses to use with the optional parameter ~ncores.
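As a minimal sketch of this substitution: the example assumes the parmap package is installed and linked, and that a list is passed wrapped in the Parmap.L constructor, as in the library's interface. The function square and the input list are made up for illustration.

```ocaml
(* Hypothetical workload: any pure function will do. *)
let square x = x * x

let input = [1; 2; 3; 4; 5; 6; 7; 8]

(* Sequential version. *)
let seq_result = List.map square input

(* Parallel version: same function, list wrapped in Parmap.L,
   work spread over 4 subprocesses. *)
let par_result = Parmap.parmap ~ncores:4 square (Parmap.L input)
```

Since no chunksize is given here, the two results should be equal element by element.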

See the example directory for a couple of running programs.

DOs and DON'Ts

Parmap is not meant to be a replacement for a full-fledged implementation of parallelism skeletons (map, reduce, pipe, and the many others described in the scientific literature since the end of the 1980s, much earlier than the specific implementation by Google engineers that popularised them). It is meant, instead, to allow you to quickly leverage the idle processing power of your extra cores when handling some heavy computational load.

The principle of parmap is very simple: when you call one of the three available primitives (map, fold, and mapfold), your sequential OCaml program forks into n subprocesses (you choose n), and each subprocess performs the computation on 1/n of the data, in chunks of a size you can choose. The results are returned through a shared memory area to the parent process, which resumes execution once all the children have terminated and the data has been collected.
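The fork-and-collect principle can be illustrated with a toy version. This is not Parmap's code: it is a sketch that assumes the unix library is linked, sends results back through pipes with Marshal instead of a shared memory area, and ignores chunking and error handling. The names split_into and toy_parmap are hypothetical.

```ocaml
(* Split [l] into [n] contiguous slices of nearly equal length. *)
let split_into n l =
  let arr = Array.of_list l in
  let len = Array.length arr in
  List.init n (fun i ->
    let lo = i * len / n and hi = (i + 1) * len / n in
    Array.to_list (Array.sub arr lo (hi - lo)))

(* Fork one child per slice; each child maps [f] over its slice and
   marshals the result list back to the parent through a pipe. *)
let toy_parmap ~ncores (f : 'a -> 'b) (l : 'a list) : 'b list =
  let spawn slice =
    let rd, wr = Unix.pipe () in
    match Unix.fork () with
    | 0 ->
      (* Child: compute, send the result, terminate. *)
      Unix.close rd;
      let oc = Unix.out_channel_of_descr wr in
      Marshal.to_channel oc (List.map f slice) [];
      close_out oc;
      exit 0
    | pid ->
      (* Parent: keep only the read end of the pipe. *)
      Unix.close wr;
      (pid, rd)
  in
  let children = List.map spawn (split_into ncores l) in
  (* Read the results back in slice order, then reap each child. *)
  let collect (pid, rd) =
    let ic = Unix.in_channel_of_descr rd in
    let (res : 'b list) = Marshal.from_channel ic in
    close_in ic;
    ignore (Unix.waitpid [] pid);
    res
  in
  List.concat (List.map collect children)
```

For example, toy_parmap ~ncores:3 (fun x -> x * x) [1; 2; 3; 4; 5; 6; 7] gives the same list as List.map with the same function. Results must be marshalable, so this toy version cannot return closures.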

You need to run your program on a single multicore machine; repeat after me: Parmap is not meant to run on a cluster, see one of the many available (re)implementations of the map-reduce schema for that.

By forking the parent process on a single machine, the children get access, for free, to all the data structures already built, even the imperative ones. As long as your computation inside the map/fold does not produce side effects that need to be preserved, the final result will be the same as performing the sequential operation; the only difference is that you might get it faster.
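The point about side effects can be seen with a plain fork, without Parmap at all. The sketch below assumes the unix library is linked; the counter is a made-up example of imperative state.

```ocaml
(* Imperative state built by the parent before forking. *)
let counter = ref 0

let () =
  match Unix.fork () with
  | 0 ->
    (* Child: sees the parent's data, but mutates only its own copy. *)
    counter := 42;
    exit 0
  | pid ->
    ignore (Unix.waitpid [] pid);
    (* Parent: the child's assignment is not visible here. *)
    assert (!counter = 0)
```

Any update made inside the function passed to a Parmap primitive is lost in the same way, so results have to come back through the return value.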

The OCaml code is reasonably simple and only marginally relies on external C libraries: most of the magic is done by your operating system's fork and memory mapping mechanisms. One could gain some speed by implementing a marshal/unmarshal operation directly on bigarrays, but we have not done this yet.

Of course, if you happen to have open channels, or files, or other connections that should only be used by the parent process, your program may behave in a very weird way: as an example, do not open a graphic window before calling a Parmap primitive, and do not use this library if your program is multi-threaded!

Pinning processes to physical CPUs

To obtain maximum speed, Parmap tries to pin the worker processes to a CPU, using the scheduler affinity interface that is available in recent Linux kernels. Similar functionality may be obtained on other platforms using slightly different APIs. Contributions to support those other APIs are welcome; just make sure that you use autoconf properly.

Using Parmap with Ocamlnat

You can use Parmap in a native toplevel (it may be quite useful if you use the native toplevel to perform fast interactive computations), but remember that you need to load the .cmxs modules in it; an example is given in example/topnat.ml

Preservation of output order in Parmap

If the number of chunks is equal to the number of cores, it is easy to preserve the order of the elements of the sequence passed to the map/fold operations, so the result will be a list with the same order as if the sequential function had been applied to the input. This is what the parmap, parmapfold and parfold functions do when the chunksize argument is not used.

If the user specifies a chunksize that is different from the number of cores, the current implementation of parmap, parmapi, array_parmap and array_parmapi by default does not guarantee the preservation of the order of the results. If the keeporder parameter is set to true, an alternative implementation is used that tags the chunks and reorders them at the end, so the result of calling Parmap.parmap f l is the same as List.map f l. Depending on the nature of your workload (in particular, number of chunks and size of the results), this may be far more efficient than implementing a sorting mechanism yourself, but may also end up using up to twice the space and time of the default implementation: there is a tradeoff, and it is up to the user to choose the solution that suits them best.
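A short sketch of the two behaviours, assuming the parmap package is installed and that chunksize and keeporder are passed as the optional labelled arguments ~chunksize and ~keeporder. The function f and the input list are made up for illustration.

```ocaml
let f x = 2 * x + 1
let l = List.init 1000 (fun i -> i)

(* Explicit chunksize, default behaviour: the results are all there,
   but their order is not guaranteed. *)
let unordered = Parmap.parmap ~ncores:4 ~chunksize:10 f (Parmap.L l)

(* Explicit chunksize with keeporder: chunks are tagged and reordered,
   so the result matches List.map f l. *)
let ordered =
  Parmap.parmap ~ncores:4 ~chunksize:10 ~keeporder:true f (Parmap.L l)
```

If only the set of results matters, the first form avoids the tagging and reordering cost.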

No reordering logic is implemented for parmapfold, parfold and their variants, as performing these operations in parallel only makes sense if the order is irrelevant.

In general, using a small chunksize helps in balancing the load among the workers, and provides better speed, but incurs a little overhead for tagging and reordering the chunks: there is a tradeoff, and it is up to the user to choose the solution that suits them best.
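Integer addition is a typical order-insensitive case. The sketch below assumes the parmap package is installed and that parmapfold takes the map function, the sequence, the fold operator, the base value and a function merging partial results, in that order (parfold is the same without the map function); check parmap.mli for your version.

```ocaml
let l = List.init 100 (fun i -> i + 1)   (* 1 .. 100 *)

(* Sum of squares: map with (fun x -> x * x), fold with (+) from 0,
   and merge the per-worker partial sums with (+) again. *)
let sum_sq =
  Parmap.parmapfold ~ncores:4 (fun x -> x * x) (Parmap.L l) (+) 0 (+)

(* Plain sum with parfold. *)
let total = Parmap.parfold ~ncores:4 (+) (Parmap.L l) 0 (+)
```

Because (+) is associative and commutative and 0 is its neutral element, the way the workers split the list does not affect the result.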

Fast map on arrays and on float arrays

Visiting an array is much faster than visiting a list, and converting an array to and from a list is expensive on large data structures, so we provide a specialised version of map on arrays that behaves exactly like parmap.
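A minimal sketch of the array version, assuming the parmap package is installed; array_parmap takes the array directly and returns an array. The input is made up for illustration.

```ocaml
let a = Array.init 1000 (fun i -> i)

(* Same call shape as parmap, but no list conversion on either side. *)
let doubled = Parmap.array_parmap ~ncores:4 (fun x -> 2 * x) a
```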

We also provide a highly optimised specialised parmap version that is targeted to float arrays, array_float_parmap, that allows you to perform parallel computation on very large float arrays efficiently, without the boxing/unboxing overhead introduced by the other primitives, including array_parmap.

To understand the efficiency issues involved in the case of large arrays of float, here is a short summary of the steps that any implementation of a parallel map function must perform.

1. create a float array to hold the result of the computation. This operation is expensive: on an Intel i7, creating a 10M float array takes 50 milliseconds

    ocamlnat
         Objective Caml version 3.12.0 - native toplevel

    # #load "unix.cmxs";;
    # let d = Unix.gettimeofday() in ignore(Array.create 10000000 0.); Unix.gettimeofday() -. d;;
    - : float = 0.0501301288604736328

2. create a shared memory area,
3. possibly copy the result array to the shared memory area,
4. perform the computation in the children writing the result in the shared memory area,
5. possibly copy the result back to the OCaml array.

All implementations need to do 1, 2 and 4; steps 3 and/or 5 may be omitted depending on what the user wants to do with the result.

The array_float_parmap performs steps 1, 2, 4 and 5. It is possible to share steps 1 and 2 among subsequent calls to the parallel function by preallocating the result array and the shared memory buffer, and passing them as optional parameters to the array_float_parmap function: this may save a significant amount of time if the array is very large.
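Preallocation could look like the sketch below. It assumes the parmap package is installed, that the optional parameters are named ~result and ~sharedbuffer, and that Parmap.init_shared_buffer builds a buffer sized for a given float array; check parmap.mli for your version. The array size and the helper run are made up for illustration.

```ocaml
let n = 1_000_000
let input = Array.init n float_of_int

(* Steps 1 and 2, done once: result array and shared memory buffer. *)
let result = Array.make n 0.
let buffer = Parmap.init_shared_buffer result

(* Each call reuses the preallocated array and buffer. *)
let run f =
  Parmap.array_float_parmap ~ncores:4 ~result ~sharedbuffer:buffer f input

let r1 = run sqrt
let r2 = run (fun x -> x *. x)
```

Since the same result array is handed to both calls, do not rely on the contents of r1 after the second call.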

Install

With opam

opam install parmap

From source

make
make install
make test

parmap's People

Contributors


unixjunkie rixed nilsbecker remicardona abate jeffmahoney celestial-intellect thierry-martinez taskset naereen madroach drup peterfrey raphael-proust eduardorfs iagoabal moyodiallo kit-ty-kate gasche

parmap's Issues

linking fails on 4.03.0 where 4.02.3 builds fine

one project of mine compiles fine with 4.02.3 but when i opam switch to 4.03.0+flambda or to 4.03.0, (and eval $(opam config env)) i get this:

+ ocamlfind ocamlopt -c -g -bin-annot -principal -w A -package 'containers, sequence, parmap, csv, gsl' -I facs -o facs_run.cmx facs_run.ml
File "_none_", line 1:
Warning 58: no cmx file was found in path for module Parmap, and its interface was not compiled with -opaque
+ ocamlfind ocamlopt -linkpkg -g -package 'containers, sequence, parmap, csv, gsl' facs/ad.cmx rng.cmx langevin.cmx facs/pars.cmx linalg.cmx facs/fields.cmx facs/run_pars.cmx facs/motion.cmx facs_run.cmx -o facs_run.native
File "_none_", line 1:
Error: No implementations provided for the following modules:
         Setcore referenced from /Users/nbecker/.opam/4.03.0+flambda/lib/parmap/parmap.cmxa(Parmap)

this is on os x, using ocamlbuild to build.

Parmap.parmap problem when process uses too much memory.

Hi
I am trying to use Parmap at many points in my code.
My program processes a lot of data loaded in memory. It uses up to 5GB out the 6GB of my system.

It looks like parmap always gets stuck at the same point.
My interpretation is the following: when the parent process uses, for example, 2GB and I use parmap with 4 cores, the 4 forked processes would then occupy 8GB, which causes the fork calls to fail and my program to get stuck.

Is this plausible/possible ?
If it is indeed the cause of my problem, is there any solution to this ?

Thanks for your time.
Regards.
Johan

Is there a way to identify the current process?

For example, do we have an integer that can be read that says
"you were the first worker process the master created by fork" (that would be 0 probably).
Also, I might need the total number of working processes to be accessible by a worker.
In MPI there is such thing (though I don't remember how they are called).
I would need this in order to read a file in parallel during a Parmap.pariter.
Reading a file in parallel will put more work in the parallel section of my program so will
increase the scalability.

Cooperation with job submission systems

There are some task execution environments available to which data processing can be delegated.
What do you think about extending your library so that more processors and also remote computing resources can be used?
Would you like to improve job submission possibilities?

Fatal error: exception End_of_file

Hi,

I'm using Mac OS X 10.7.5, ocaml 4.00.1 from Macports, and I just git-cloned & compiled parmap. It compiles and installs fine. Then, I compile and try the mandels example:

$ ./mandels.native
Computing...Got task...
Fatal error: exception End_of_file
Fatal error: exception End_of_file

On my own program, the same error happens. During its compilation, I get:
findlib: [WARNING] Interface myocamlbuild.cmi occurs in several directories: ., /opt/local/lib/ocaml/site-lib/parmap
(which I don't think is related to the crash, but just in case...)
Many thanks in advance for any advice on how to solve this crash.
sam

Parmap does not install with OCaml 4.00.01

I tried:

opam -switch 4.00.01

Then resource the opam config.

Then I could not install parmap.

It does not happen with 4.00.0

Regards,
F.

we depend on automake

aclocal is provided by automake.
So parmap (the opam package) fails to install in case we don't have automake installed.
So we need a conf-automake package first ...

mandel_sdl compilation on os x

hi, i had to do a little work to get the sdl example to compile on os x. brew install sdl sdl_gfx sdl_image (although i don't know if the latter are necessary). then for the compilation, after some googling, i had to add

-ccopt "-framework CoreFoundation -framework Cocoa"

to the mandels_sdl.native target in the Makefile. it does run, and a lot faster than the pure Ocaml version. maybe this could be added?

Parmap MPI ?

Hello,

I wonder if it is possible to do a Parmap version using MPI ?

Then, it would be Dismap (distributed map).

With the exact same interface as Parmap I was thinking.

Regards,
F.

Parmap drops to half # cores half way through processing

Hi
I've been using parmap for over a year, but I noticed now that for very large memory tasks (e.g. >100GB) sometimes half the cores stop half way through processing.

I'm running on a dual-xeon workstation (8 physical cores x 2 processors).

Any suggestions for why this might happen? Any ways to diagnose such a problem?

Thanks!

************************ CALLING CODE **********************
spike_trains = parmap.map(deconvolve_parallel_new, zip(idx_list,proc_indexes), params, processes=n_processors)

************************** OUTPUT *******************************
(Cores 0,1,4,6 eventually stop running)
...
('# of chunks: ', 8, ' # cores: ', 8)
('Start time for deconvolution: ', 1520724614.860805)
(0, 47000, 122)
('Processor: ', 0, ' Loading : ', 96.256, 'MB, chunk: ', 0, ' of ', 250)
(11750000, 11797000, 122)
('Processor: ', 1, ' Loading : ', 96.256, 'MB, chunk: ', 0, ' of ', 250)
(82250000, 82297000, 122)
('Processor: ', 7, ' Loading : ', 96.256, 'MB, chunk: ', 0, ' of ', 250)
(23500000, 23547000, 122)
('Processor: ', 2, ' Loading : ', 96.256, 'MB, chunk: ', 0, ' of ', 250)
(58750000, 58797000, 122)
('Processor: ', 5, ' Loading : ', 96.256, 'MB, chunk: ', 0, ' of ', 250)
('Processor: ', 1, 'chunk: ', 0, ' in dot product loop ...')
('Processor: ', 0, 'chunk: ', 0, ' in dot product loop ...')
('Processor: ', 7, 'chunk: ', 0, ' in dot product loop ...')
(35250000, 35297000, 122)
('Processor: ', 3, ' Loading : ', 96.256, 'MB, chunk: ', 0, ' of ', 250)
('Processor: ', 2, 'chunk: ', 0, ' in dot product loop ...')
(47000000, 47047000, 122)
('Processor: ', 4, ' Loading : ', 96.256, 'MB, chunk: ', 0, ' of ', 250)
(70500000, 70547000, 122)
('Processor: ', 6, ' Loading : ', 96.256, 'MB, chunk: ', 0, ' of ', 250)
('Processor: ', 6, 'chunk: ', 0, ' in dot product loop ...')
('Processor: ', 5, 'chunk: ', 0, ' in dot product loop ...')
('Processor: ', 3, 'chunk: ', 0, ' in dot product loop ...')
('Processor: ', 0, 'chunk: ', 0, ' in thresholding loop ...')
('Processor: ', 1, 'chunk: ', 0, ' in thresholding loop ...')
('Processor: ', 4, 'chunk: ', 0, ' in dot product loop ...')
('Processor: ', 2, 'chunk: ', 0, ' in thresholding loop ...')
('Processor: ', 7, 'chunk: ', 0, ' in thresholding loop ...')
('Processor: ', 4, 'chunk: ', 0, ' in thresholding loop ...')
('Processor: ', 6, 'chunk: ', 0, ' in thresholding loop ...')
('Processor: ', 5, 'chunk: ', 0, ' in thresholding loop ...')
('Processor: ', 3, 'chunk: ', 0, ' in thresholding loop ...')
('Processor: ', 1, 'cleaned spikes: ', 28143)
(11797000, 11844000, 122)
('Processor: ', 1, ' Loading : ', 96.256, 'MB, chunk: ', 1, ' of ', 250)
('Processor: ', 1, 'chunk: ', 1, ' in dot product loop ...')
('Processor: ', 0, 'cleaned spikes: ', 30809)
(47000, 94000, 122)
('Processor: ', 0, ' Loading : ', 96.256, 'MB, chunk: ', 1, ' of ', 250)
('Processor: ', 0, 'chunk: ', 1, ' in dot product loop ...')
('Processor: ', 2, 'cleaned spikes: ', 33734)
(23547000, 23594000, 122)
('Processor: ', 2, ' Loading : ', 96.256, 'MB, chunk: ', 1, ' of ', 250)
('Processor: ', 2, 'chunk: ', 1, ' in dot product loop ...')
('Processor: ', 1, 'chunk: ', 1, ' in thresholding loop ...')
('Processor: ', 0, 'chunk: ', 1, ' in thresholding loop ...')
('Processor: ', 2, 'chunk: ', 1, ' in thresholding loop ...')
('Processor: ', 7, 'cleaned spikes: ', 44277)
(82297000, 82344000, 122)
('Processor: ', 7, ' Loading : ', 96.256, 'MB, chunk: ', 1, ' of ', 250)
('Processor: ', 4, 'cleaned spikes: ', 33821)
(47047000, 47094000, 122)
('Processor: ', 4, ' Loading : ', 96.256, 'MB, chunk: ', 1, ' of ', 250)
('Processor: ', 7, 'chunk: ', 1, ' in dot product loop ...')
('Processor: ', 4, 'chunk: ', 1, ' in dot product loop ...')
('Processor: ', 7, 'chunk: ', 1, ' in thresholding loop ...')
('Processor: ', 1, 'cleaned spikes: ', 28213)
(11844000, 11891000, 122)
('Processor: ', 1, ' Loading : ', 96.256, 'MB, chunk: ', 2, ' of ', 250)
('Processor: ', 1, 'chunk: ', 2, ' in dot product loop ...')
('Processor: ', 3, 'cleaned spikes: ', 37354)
(35297000, 35344000, 122)
('Processor: ', 3, ' Loading : ', 96.256, 'MB, chunk: ', 1, ' of ', 250)
('Processor: ', 1, 'chunk: ', 2, ' in thresholding loop ...')
('Processor: ', 4, 'chunk: ', 1, ' in thresholding loop ...')
('Processor: ', 3, 'chunk: ', 1, ' in dot product loop ...')
('Processor: ', 0, 'cleaned spikes: ', 34794)
(94000, 141000, 122)
('Processor: ', 0, ' Loading : ', 96.256, 'MB, chunk: ', 2, ' of ', 250)
('Processor: ', 5, 'cleaned spikes: ', 38211)
(58797000, 58844000, 122)
('Processor: ', 5, ' Loading : ', 96.256, 'MB, chunk: ', 1, ' of ', 250)
('Processor: ', 0, 'chunk: ', 2, ' in dot product loop ...')
('Processor: ', 2, 'cleaned spikes: ', 33733)
(23594000, 23641000, 122)
('Processor: ', 2, ' Loading : ', 96.256, 'MB, chunk: ', 2, ' of ', 250)
('Processor: ', 2, 'chunk: ', 2, ' in dot product loop ...')
('Processor: ', 5, 'chunk: ', 1, ' in dot product loop ...')
('Processor: ', 0, 'chunk: ', 2, ' in thresholding loop ...')
('Processor: ', 6, 'cleaned spikes: ', 41285)
(70547000, 70594000, 122)
('Processor: ', 6, ' Loading : ', 96.256, 'MB, chunk: ', 1, ' of ', 250)
('Processor: ', 2, 'chunk: ', 2, ' in thresholding loop ...')
('Processor: ', 3, 'chunk: ', 1, ' in thresholding loop ...')
('Processor: ', 6, 'chunk: ', 1, ' in dot product loop ...')
('Processor: ', 1, 'cleaned spikes: ', 28059)
(11891000, 11938000, 122)
('Processor: ', 1, ' Loading : ', 96.256, 'MB, chunk: ', 3, ' of ', 250)
('Processor: ', 7, 'cleaned spikes: ', 34761)
(82344000, 82391000, 122)
('Processor: ', 7, ' Loading : ', 96.256, 'MB, chunk: ', 2, ' of ', 250)
('Processor: ', 1, 'chunk: ', 3, ' in dot product loop ...')
('Processor: ', 7, 'chunk: ', 2, ' in dot product loop ...')
('Processor: ', 5, 'chunk: ', 1, ' in thresholding loop ...')
('Processor: ', 1, 'chunk: ', 3, ' in thresholding loop ...')
('Processor: ', 7, 'chunk: ', 2, ' in thresholding loop ...')
('Processor: ', 0, 'cleaned spikes: ', 30164)
(141000, 188000, 122)
('Processor: ', 0, ' Loading : ', 96.256, 'MB, chunk: ', 3, ' of ', 250)
('Processor: ', 0, 'chunk: ', 3, ' in dot product loop ...')
('Processor: ', 6, 'chunk: ', 1, ' in thresholding loop ...')
('Processor: ', 2, 'cleaned spikes: ', 33902)
(23641000, 23688000, 122)
('Processor: ', 2, ' Loading : ', 96.256, 'MB, chunk: ', 3, ' of ', 250)
('Processor: ', 2, 'chunk: ', 3, ' in dot product loop ...')
('Processor: ', 0, 'chunk: ', 3, ' in thresholding loop ...')
('Processor: ', 2, 'chunk: ', 3, ' in thresholding loop ...')
('Processor: ', 1, 'cleaned spikes: ', 28371)
(11938000, 11985000, 122)
('Processor: ', 1, ' Loading : ', 96.256, 'MB, chunk: ', 4, ' of ', 250)
('Processor: ', 1, 'chunk: ', 4, ' in dot product loop ...')
('Processor: ', 0, 'cleaned spikes: ', 28222)
(188000, 235000, 122)
('Processor: ', 0, ' Loading : ', 96.256, 'MB, chunk: ', 4, ' of ', 250)
('Processor: ', 0, 'chunk: ', 4, ' in dot product loop ...')
('Processor: ', 7, 'cleaned spikes: ', 41564)
(82391000, 82438000, 122)
('Processor: ', 7, ' Loading : ', 96.256, 'MB, chunk: ', 3, ' of ', 250)
('Processor: ', 7, 'chunk: ', 3, ' in dot product loop ...')
('Processor: ', 0, 'chunk: ', 4, ' in thresholding loop ...')
('Processor: ', 2, 'cleaned spikes: ', 33347)
(23688000, 23735000, 122)
('Processor: ', 2, ' Loading : ', 96.256, 'MB, chunk: ', 4, ' of ', 250)
('Processor: ', 2, 'chunk: ', 4, ' in dot product loop ...')
('Processor: ', 7, 'chunk: ', 3, ' in thresholding loop ...')
('Processor: ', 2, 'chunk: ', 4, ' in thresholding loop ...')
('Processor: ', 0, 'cleaned spikes: ', 26848)
(235000, 282000, 122)
('Processor: ', 0, ' Loading : ', 96.256, 'MB, chunk: ', 5, ' of ', 250)
('Processor: ', 0, 'chunk: ', 5, ' in dot product loop ...')
('Processor: ', 0, 'chunk: ', 5, ' in thresholding loop ...')
('Processor: ', 2, 'cleaned spikes: ', 33229)
(23735000, 23782000, 122)
('Processor: ', 2, ' Loading : ', 96.256, 'MB, chunk: ', 5, ' of ', 250)
('Processor: ', 2, 'chunk: ', 5, ' in dot product loop ...')
('Processor: ', 7, 'cleaned spikes: ', 39721)
(82438000, 82485000, 122)
('Processor: ', 7, ' Loading : ', 96.256, 'MB, chunk: ', 4, ' of ', 250)
('Processor: ', 7, 'chunk: ', 4, ' in dot product loop ...')
('Processor: ', 0, 'cleaned spikes: ', 26138)
(282000, 329000, 122)
('Processor: ', 0, ' Loading : ', 96.256, 'MB, chunk: ', 6, ' of ', 250)
('Processor: ', 0, 'chunk: ', 6, ' in dot product loop ...')
('Processor: ', 2, 'chunk: ', 5, ' in thresholding loop ...')
('Processor: ', 7, 'chunk: ', 4, ' in thresholding loop ...')
('Processor: ', 0, 'chunk: ', 6, ' in thresholding loop ...')
('Processor: ', 4, 'cleaned spikes: ', 32354)
(47094000, 47141000, 122)
('Processor: ', 4, ' Loading : ', 96.256, 'MB, chunk: ', 2, ' of ', 250)
('Processor: ', 4, 'chunk: ', 2, ' in dot product loop ...')
('Processor: ', 0, 'cleaned spikes: ', 25733)
(329000, 376000, 122)
('Processor: ', 0, ' Loading : ', 96.256, 'MB, chunk: ', 7, ' of ', 250)
('Processor: ', 0, 'chunk: ', 7, ' in dot product loop ...')
('Processor: ', 2, 'cleaned spikes: ', 33537)
(23782000, 23829000, 122)
('Processor: ', 2, ' Loading : ', 96.256, 'MB, chunk: ', 6, ' of ', 250)
('Processor: ', 2, 'chunk: ', 6, ' in dot product loop ...')
('Processor: ', 3, 'cleaned spikes: ', 37381)
(35344000, 35391000, 122)
('Processor: ', 3, ' Loading : ', 96.256, 'MB, chunk: ', 2, ' of ', 250)
('Processor: ', 4, 'chunk: ', 2, ' in thresholding loop ...')
('Processor: ', 3, 'chunk: ', 2, ' in dot product loop ...')
('Processor: ', 0, 'chunk: ', 7, ' in thresholding loop ...')
('Processor: ', 2, 'chunk: ', 6, ' in thresholding loop ...')
('Processor: ', 7, 'cleaned spikes: ', 42143)
(82485000, 82532000, 122)
('Processor: ', 7, ' Loading : ', 96.256, 'MB, chunk: ', 5, ' of ', 250)
('Processor: ', 7, 'chunk: ', 5, ' in dot product loop ...')
('Processor: ', 3, 'chunk: ', 2, ' in thresholding loop ...')
('Processor: ', 6, 'cleaned spikes: ', 39024)
(70594000, 70641000, 122)
('Processor: ', 6, ' Loading : ', 96.256, 'MB, chunk: ', 2, ' of ', 250)
('Processor: ', 7, 'chunk: ', 5, ' in thresholding loop ...')
('Processor: ', 5, 'cleaned spikes: ', 43441)
(58844000, 58891000, 122)
('Processor: ', 5, ' Loading : ', 96.256, 'MB, chunk: ', 2, ' of ', 250)
('Processor: ', 0, 'cleaned spikes: ', 24679)
(376000, 423000, 122)
('Processor: ', 0, ' Loading : ', 96.256, 'MB, chunk: ', 8, ' of ', 250)
('Processor: ', 6, 'chunk: ', 2, ' in dot product loop ...')
('Processor: ', 0, 'chunk: ', 8, ' in dot product loop ...')
('Processor: ', 5, 'chunk: ', 2, ' in dot product loop ...')
('Processor: ', 0, 'chunk: ', 8, ' in thresholding loop ...')
('Processor: ', 6, 'chunk: ', 2, ' in thresholding loop ...')
('Processor: ', 2, 'cleaned spikes: ', 33029)
(23829000, 23876000, 122)
('Processor: ', 2, ' Loading : ', 96.256, 'MB, chunk: ', 7, ' of ', 250)
('Processor: ', 2, 'chunk: ', 7, ' in dot product loop ...')
('Processor: ', 5, 'chunk: ', 2, ' in thresholding loop ...')
('Processor: ', 2, 'chunk: ', 7, ' in thresholding loop ...')
('Processor: ', 3, 'cleaned spikes: ', 36239)
(35391000, 35438000, 122)
('Processor: ', 3, ' Loading : ', 96.256, 'MB, chunk: ', 3, ' of ', 250)
('Processor: ', 0, 'cleaned spikes: ', 24633)
(423000, 470000, 122)
('Processor: ', 0, ' Loading : ', 96.256, 'MB, chunk: ', 9, ' of ', 250)
('Processor: ', 0, 'chunk: ', 9, ' in dot product loop ...')
('Processor: ', 3, 'chunk: ', 3, ' in dot product loop ...')
('Processor: ', 7, 'cleaned spikes: ', 35683)
(82532000, 82579000, 122)
('Processor: ', 7, ' Loading : ', 96.256, 'MB, chunk: ', 6, ' of ', 250)
('Processor: ', 7, 'chunk: ', 6, ' in dot product loop ...')
('Processor: ', 4, 'cleaned spikes: ', 36150)
(47141000, 47188000, 122)
('Processor: ', 4, ' Loading : ', 96.256, 'MB, chunk: ', 3, ' of ', 250)
('Processor: ', 0, 'chunk: ', 9, ' in thresholding loop ...')
('Processor: ', 3, 'chunk: ', 3, ' in thresholding loop ...')
('Processor: ', 4, 'chunk: ', 3, ' in dot product loop ...')
('Processor: ', 7, 'chunk: ', 6, ' in thresholding loop ...')
('Processor: ', 4, 'chunk: ', 3, ' in thresholding loop ...')
('Processor: ', 2, 'cleaned spikes: ', 33347)
(23876000, 23923000, 122)
('Processor: ', 2, ' Loading : ', 96.256, 'MB, chunk: ', 8, ' of ', 250)
('Processor: ', 2, 'chunk: ', 8, ' in dot product loop ...')
('Processor: ', 0, 'cleaned spikes: ', 24346)
(470000, 517000, 122)
('Processor: ', 0, ' Loading : ', 96.256, 'MB, chunk: ', 10, ' of ', 250)
('Processor: ', 0, 'chunk: ', 10, ' in dot product loop ...')
('Processor: ', 2, 'chunk: ', 8, ' in thresholding loop ...')
('Processor: ', 6, 'cleaned spikes: ', 36645)
(70641000, 70688000, 122)
('Processor: ', 6, ' Loading : ', 96.256, 'MB, chunk: ', 3, ' of ', 250)
('Processor: ', 0, 'chunk: ', 10, ' in thresholding loop ...')
('Processor: ', 6, 'chunk: ', 3, ' in dot product loop ...')
('Processor: ', 3, 'cleaned spikes: ', 36694)
(35438000, 35485000, 122)
('Processor: ', 3, ' Loading : ', 96.256, 'MB, chunk: ', 4, ' of ', 250)
('Processor: ', 3, 'chunk: ', 4, ' in dot product loop ...')
('Processor: ', 5, 'cleaned spikes: ', 39495)
(58891000, 58938000, 122)
('Processor: ', 5, ' Loading : ', 96.256, 'MB, chunk: ', 3, ' of ', 250)
('Processor: ', 6, 'chunk: ', 3, ' in thresholding loop ...')
('Processor: ', 5, 'chunk: ', 3, ' in dot product loop ...')
('Processor: ', 7, 'cleaned spikes: ', 42581)
(82579000, 82626000, 122)
('Processor: ', 7, ' Loading : ', 96.256, 'MB, chunk: ', 7, ' of ', 250)
('Processor: ', 7, 'chunk: ', 7, ' in dot product loop ...')
('Processor: ', 3, 'chunk: ', 4, ' in thresholding loop ...')
('Processor: ', 0, 'cleaned spikes: ', 23995)
(517000, 564000, 122)
('Processor: ', 0, ' Loading : ', 96.256, 'MB, chunk: ', 11, ' of ', 250)
('Processor: ', 0, 'chunk: ', 11, ' in dot product loop ...')
('Processor: ', 2, 'cleaned spikes: ', 33645)
(23923000, 23970000, 122)
('Processor: ', 2, ' Loading : ', 96.256, 'MB, chunk: ', 9, ' of ', 250)
('Processor: ', 2, 'chunk: ', 9, ' in dot product loop ...')
('Processor: ', 7, 'chunk: ', 7, ' in thresholding loop ...')
('Processor: ', 5, 'chunk: ', 3, ' in thresholding loop ...')
('Processor: ', 0, 'chunk: ', 11, ' in thresholding loop ...')
('Processor: ', 2, 'chunk: ', 9, ' in thresholding loop ...')
('Processor: ', 0, 'cleaned spikes: ', 23882)
(564000, 611000, 122)
('Processor: ', 0, ' Loading : ', 96.256, 'MB, chunk: ', 12, ' of ', 250)
('Processor: ', 0, 'chunk: ', 12, ' in dot product loop ...')
('Processor: ', 3, 'cleaned spikes: ', 37543)
(35485000, 35532000, 122)
('Processor: ', 3, ' Loading : ', 96.256, 'MB, chunk: ', 5, ' of ', 250)
('Processor: ', 4, 'cleaned spikes: ', 37592)
(47188000, 47235000, 122)
('Processor: ', 4, ' Loading : ', 96.256, 'MB, chunk: ', 4, ' of ', 250)
('Processor: ', 3, 'chunk: ', 5, ' in dot product loop ...')
('Processor: ', 0, 'chunk: ', 12, ' in thresholding loop ...')
('Processor: ', 4, 'chunk: ', 4, ' in dot product loop ...')
('Processor: ', 2, 'cleaned spikes: ', 34124)
(23970000, 24017000, 122)
('Processor: ', 2, ' Loading : ', 96.256, 'MB, chunk: ', 10, ' of ', 250)
('Processor: ', 3, 'chunk: ', 5, ' in thresholding loop ...')
('Processor: ', 2, 'chunk: ', 10, ' in dot product loop ...')
('Processor: ', 4, 'chunk: ', 4, ' in thresholding loop ...')
('Processor: ', 7, 'cleaned spikes: ', 39159)
(82626000, 82673000, 122)
('Processor: ', 7, ' Loading : ', 96.256, 'MB, chunk: ', 8, ' of ', 250)
('Processor: ', 7, 'chunk: ', 8, ' in dot product loop ...')
('Processor: ', 2, 'chunk: ', 10, ' in thresholding loop ...')
('Processor: ', 0, 'cleaned spikes: ', 24011)
(611000, 658000, 122)
('Processor: ', 0, ' Loading : ', 96.256, 'MB, chunk: ', 13, ' of ', 250)
('Processor: ', 0, 'chunk: ', 13, ' in dot product loop ...')
('Processor: ', 7, 'chunk: ', 8, ' in thresholding loop ...')
('Processor: ', 0, 'chunk: ', 13, ' in thresholding loop ...')
('Processor: ', 5, 'cleaned spikes: ', 40356)
(58938000, 58985000, 122)
('Processor: ', 5, ' Loading : ', 96.256, 'MB, chunk: ', 4, ' of ', 250)
('Processor: ', 5, 'chunk: ', 4, ' in dot product loop ...')
('Processor: ', 6, 'cleaned spikes: ', 50675)
(70688000, 70735000, 122)
('Processor: ', 6, ' Loading : ', 96.256, 'MB, chunk: ', 4, ' of ', 250)
('Processor: ', 3, 'cleaned spikes: ', 37562)
(35532000, 35579000, 122)
('Processor: ', 3, ' Loading : ', 96.256, 'MB, chunk: ', 6, ' of ', 250)
('Processor: ', 6, 'chunk: ', 4, ' in dot product loop ...')
('Processor: ', 0, 'cleaned spikes: ', 23655)
(658000, 705000, 122)
('Processor: ', 0, ' Loading : ', 96.256, 'MB, chunk: ', 14, ' of ', 250)
('Processor: ', 3, 'chunk: ', 6, ' in dot product loop ...')
('Processor: ', 5, 'chunk: ', 4, ' in thresholding loop ...')
('Processor: ', 0, 'chunk: ', 14, ' in dot product loop ...')
('Processor: ', 2, 'cleaned spikes: ', 33782)
(24017000, 24064000, 122)
('Processor: ', 2, ' Loading : ', 96.256, 'MB, chunk: ', 11, ' of ', 250)
('Processor: ', 2, 'chunk: ', 11, ' in dot product loop ...')
('Processor: ', 0, 'chunk: ', 14, ' in thresholding loop ...')
('Processor: ', 3, 'chunk: ', 6, ' in thresholding loop ...')
('Processor: ', 6, 'chunk: ', 4, ' in thresholding loop ...')
('Processor: ', 2, 'chunk: ', 11, ' in thresholding loop ...')
('Processor: ', 7, 'cleaned spikes: ', 41785)
(82673000, 82720000, 122)
('Processor: ', 7, ' Loading : ', 96.256, 'MB, chunk: ', 9, ' of ', 250)
('Processor: ', 7, 'chunk: ', 9, ' in dot product loop ...')
('Processor: ', 4, 'cleaned spikes: ', 35244)
(47235000, 47282000, 122)
('Processor: ', 4, ' Loading : ', 96.256, 'MB, chunk: ', 5, ' of ', 250)
('Processor: ', 7, 'chunk: ', 9, ' in thresholding loop ...')
('Processor: ', 0, 'cleaned spikes: ', 23572)
(705000, 752000, 122)
('Processor: ', 0, ' Loading : ', 96.256, 'MB, chunk: ', 15, ' of ', 250)
('Processor: ', 0, 'chunk: ', 15, ' in dot product loop ...')
('Processor: ', 4, 'chunk: ', 5, ' in dot product loop ...')
('Processor: ', 0, 'chunk: ', 15, ' in thresholding loop ...')
('Processor: ', 4, 'chunk: ', 5, ' in thresholding loop ...')
('Processor: ', 2, 'cleaned spikes: ', 33803)
(24064000, 24111000, 122)
('Processor: ', 2, ' Loading : ', 96.256, 'MB, chunk: ', 12, ' of ', 250)
('Processor: ', 2, 'chunk: ', 12, ' in dot product loop ...')
('Processor: ', 3, 'cleaned spikes: ', 36564)
(35579000, 35626000, 122)
('Processor: ', 3, ' Loading : ', 96.256, 'MB, chunk: ', 7, ' of ', 250)
('Processor: ', 3, 'chunk: ', 7, ' in dot product loop ...')
('Processor: ', 2, 'chunk: ', 12, ' in thresholding loop ...')
('Processor: ', 3, 'chunk: ', 7, ' in thresholding loop ...')
('Processor: ', 0, 'cleaned spikes: ', 23916)
(752000, 799000, 122)
('Processor: ', 0, ' Loading : ', 96.256, 'MB, chunk: ', 16, ' of ', 250)
('Processor: ', 0, 'chunk: ', 16, ' in dot product loop ...')
('Processor: ', 0, 'chunk: ', 16, ' in thresholding loop ...')
('Processor: ', 7, 'cleaned spikes: ', 41868)
(82720000, 82767000, 122)
('Processor: ', 7, ' Loading : ', 96.256, 'MB, chunk: ', 10, ' of ', 250)
('Processor: ', 7, 'chunk: ', 10, ' in dot product loop ...')
('Processor: ', 7, 'chunk: ', 10, ' in thresholding loop ...')
('Processor: ', 2, 'cleaned spikes: ', 33892)
(24111000, 24158000, 122)
('Processor: ', 2, ' Loading : ', 96.256, 'MB, chunk: ', 13, ' of ', 250)
('Processor: ', 2, 'chunk: ', 13, ' in dot product loop ...')
('Processor: ', 6, 'cleaned spikes: ', 38188)
(70735000, 70782000, 122)
('Processor: ', 6, ' Loading : ', 96.256, 'MB, chunk: ', 5, ' of ', 250)
('Processor: ', 0, 'cleaned spikes: ', 23726)
(799000, 846000, 122)
('Processor: ', 0, ' Loading : ', 96.256, 'MB, chunk: ', 17, ' of ', 250)
('Processor: ', 3, 'cleaned spikes: ', 36734)
(35626000, 35673000, 122)
('Processor: ', 3, ' Loading : ', 96.256, 'MB, chunk: ', 8, ' of ', 250)
('Processor: ', 0, 'chunk: ', 17, ' in dot product loop ...')
('Processor: ', 3, 'chunk: ', 8, ' in dot product loop ...')
('Processor: ', 6, 'chunk: ', 5, ' in dot product loop ...')
('Processor: ', 5, 'cleaned spikes: ', 42797)
(58985000, 59032000, 122)
('Processor: ', 5, ' Loading : ', 96.256, 'MB, chunk: ', 5, ' of ', 250)
('Processor: ', 2, 'chunk: ', 13, ' in thresholding loop ...')
('Processor: ', 0, 'chunk: ', 17, ' in thresholding loop ...')
('Processor: ', 5, 'chunk: ', 5, ' in dot product loop ...')
('Processor: ', 3, 'chunk: ', 8, ' in thresholding loop ...')
('Processor: ', 6, 'chunk: ', 5, ' in thresholding loop ...')
('Processor: ', 5, 'chunk: ', 5, ' in thresholding loop ...')
('Processor: ', 4, 'cleaned spikes: ', 37038)
(47282000, 47329000, 122)
('Processor: ', 4, ' Loading : ', 96.256, 'MB, chunk: ', 6, ' of ', 250)
('Processor: ', 0, 'cleaned spikes: ', 23638)
(846000, 893000, 122)
('Processor: ', 0, ' Loading : ', 96.256, 'MB, chunk: ', 18, ' of ', 250)
('Processor: ', 0, 'chunk: ', 18, ' in dot product loop ...')
('Processor: ', 4, 'chunk: ', 6, ' in dot product loop ...')
('Processor: ', 0, 'chunk: ', 18, ' in thresholding loop ...')
('Processor: ', 7, 'cleaned spikes: ', 43500)
(82767000, 82814000, 122)
('Processor: ', 7, ' Loading : ', 96.256, 'MB, chunk: ', 11, ' of ', 250)
('Processor: ', 4, 'chunk: ', 6, ' in thresholding loop ...')
('Processor: ', 7, 'chunk: ', 11, ' in dot product loop ...')
('Processor: ', 2, 'cleaned spikes: ', 33747)
(24158000, 24205000, 122)
('Processor: ', 2, ' Loading : ', 96.256, 'MB, chunk: ', 14, ' of ', 250)
('Processor: ', 2, 'chunk: ', 14, ' in dot product loop ...')
('Processor: ', 3, 'cleaned spikes: ', 36545)
(35673000, 35720000, 122)
('Processor: ', 3, ' Loading : ', 96.256, 'MB, chunk: ', 9, ' of ', 250)
('Processor: ', 3, 'chunk: ', 9, ' in dot product loop ...')
('Processor: ', 7, 'chunk: ', 11, ' in thresholding loop ...')
('Processor: ', 2, 'chunk: ', 14, ' in thresholding loop ...')
('Processor: ', 0, 'cleaned spikes: ', 22953)
(893000, 940000, 122)
('Processor: ', 0, ' Loading : ', 96.256, 'MB, chunk: ', 19, ' of ', 250)
('Processor: ', 3, 'chunk: ', 9, ' in thresholding loop ...')
('Processor: ', 0, 'chunk: ', 19, ' in dot product loop ...')
('Processor: ', 0, 'chunk: ', 19, ' in thresholding loop ...')
('Processor: ', 2, 'cleaned spikes: ', 33721)
(24205000, 24252000, 122)
('Processor: ', 2, ' Loading : ', 96.256, 'MB, chunk: ', 15, ' of ', 250)
('Processor: ', 2, 'chunk: ', 15, ' in dot product loop ...')
('Processor: ', 6, 'cleaned spikes: ', 36094)
(70782000, 70829000, 122)
('Processor: ', 6, ' Loading : ', 96.256, 'MB, chunk: ', 6, ' of ', 250)
('Processor: ', 7, 'cleaned spikes: ', 35595)
(82814000, 82861000, 122)
('Processor: ', 7, ' Loading : ', 96.256, 'MB, chunk: ', 12, ' of ', 250)
('Processor: ', 0, 'cleaned spikes: ', 23794)
(940000, 987000, 122)
('Processor: ', 0, ' Loading : ', 96.256, 'MB, chunk: ', 20, ' of ', 250)
('Processor: ', 7, 'chunk: ', 12, ' in dot product loop ...')
('Processor: ', 0, 'chunk: ', 20, ' in dot product loop ...')
('Processor: ', 2, 'chunk: ', 15, ' in thresholding loop ...')
('Processor: ', 6, 'chunk: ', 6, ' in dot product loop ...')
('Processor: ', 0, 'chunk: ', 20, ' in thresholding loop ...')
('Processor: ', 5, 'cleaned spikes: ', 36168)
(59032000, 59079000, 122)
('Processor: ', 5, ' Loading : ', 96.256, 'MB, chunk: ', 6, ' of ', 250)
('Processor: ', 3, 'cleaned spikes: ', 37143)
(35720000, 35767000, 122)
('Processor: ', 3, ' Loading : ', 96.256, 'MB, chunk: ', 10, ' of ', 250)
('Processor: ', 7, 'chunk: ', 12, ' in thresholding loop ...')
('Processor: ', 3, 'chunk: ', 10, ' in dot product loop ...')
('Processor: ', 5, 'chunk: ', 6, ' in dot product loop ...')
('Processor: ', 6, 'chunk: ', 6, ' in thresholding loop ...')
('Processor: ', 5, 'chunk: ', 6, ' in thresholding loop ...')
('Processor: ', 3, 'chunk: ', 10, ' in thresholding loop ...')
('Processor: ', 0, 'cleaned spikes: ', 23689)
(987000, 1034000, 122)
('Processor: ', 0, ' Loading : ', 96.256, 'MB, chunk: ', 21, ' of ', 250)
('Processor: ', 0, 'chunk: ', 21, ' in dot product loop ...')
('Processor: ', 4, 'cleaned spikes: ', 40212)
(47329000, 47376000, 122)
('Processor: ', 4, ' Loading : ', 96.256, 'MB, chunk: ', 7, ' of ', 250)
('Processor: ', 2, 'cleaned spikes: ', 33371)
(24252000, 24299000, 122)
('Processor: ', 2, ' Loading : ', 96.256, 'MB, chunk: ', 16, ' of ', 250)
('Processor: ', 2, 'chunk: ', 16, ' in dot product loop ...')
('Processor: ', 4, 'chunk: ', 7, ' in dot product loop ...')
('Processor: ', 0, 'chunk: ', 21, ' in thresholding loop ...')
('Processor: ', 2, 'chunk: ', 16, ' in thresholding loop ...')
('Processor: ', 4, 'chunk: ', 7, ' in thresholding loop ...')
('Processor: ', 7, 'cleaned spikes: ', 44336)
(82861000, 82908000, 122)
('Processor: ', 7, ' Loading : ', 96.256, 'MB, chunk: ', 13, ' of ', 250)
('Processor: ', 3, 'cleaned spikes: ', 37154)
(35767000, 35814000, 122)
('Processor: ', 3, ' Loading : ', 96.256, 'MB, chunk: ', 11, ' of ', 250)
('Processor: ', 7, 'chunk: ', 13, ' in dot product loop ...')
('Processor: ', 0, 'cleaned spikes: ', 23613)
(1034000, 1081000, 122)
('Processor: ', 0, ' Loading : ', 96.256, 'MB, chunk: ', 22, ' of ', 250)
('Processor: ', 3, 'chunk: ', 11, ' in dot product loop ...')
('Processor: ', 0, 'chunk: ', 22, ' in dot product loop ...')
('Processor: ', 0, 'chunk: ', 22, ' in thresholding loop ...')
('Processor: ', 7, 'chunk: ', 13, ' in thresholding loop ...')
('Processor: ', 3, 'chunk: ', 11, ' in thresholding loop ...')
('Processor: ', 6, 'cleaned spikes: ', 40275)
(70829000, 70876000, 122)
('Processor: ', 6, ' Loading : ', 96.256, 'MB, chunk: ', 7, ' of ', 250)
('Processor: ', 6, 'chunk: ', 7, ' in dot product loop ...')
('Processor: ', 2, 'cleaned spikes: ', 33745)
(24299000, 24346000, 122)
('Processor: ', 2, ' Loading : ', 96.256, 'MB, chunk: ', 17, ' of ', 250)
('Processor: ', 2, 'chunk: ', 17, ' in dot product loop ...')
('Processor: ', 5, 'cleaned spikes: ', 39902)
(59079000, 59126000, 122)
('Processor: ', 5, ' Loading : ', 96.256, 'MB, chunk: ', 7, ' of ', 250)
('Processor: ', 0, 'cleaned spikes: ', 23536)
(1081000, 1128000, 122)
('Processor: ', 0, ' Loading : ', 96.256, 'MB, chunk: ', 23, ' of ', 250)
('Processor: ', 0, 'chunk: ', 23, ' in dot product loop ...')
('Processor: ', 5, 'chunk: ', 7, ' in dot product loop ...')
('Processor: ', 2, 'chunk: ', 17, ' in thresholding loop ...')
('Processor: ', 0, 'chunk: ', 23, ' in thresholding loop ...')
('Processor: ', 4, 'cleaned spikes: ', 35044)
(47376000, 47423000, 122)
('Processor: ', 4, ' Loading : ', 96.256, 'MB, chunk: ', 8, ' of ', 250)
('Processor: ', 5, 'chunk: ', 7, ' in thresholding loop ...')
('Processor: ', 4, 'chunk: ', 8, ' in dot product loop ...')
('Processor: ', 4, 'chunk: ', 8, ' in thresholding loop ...')
('Processor: ', 7, 'cleaned spikes: ', 42459)
(82908000, 82955000, 122)
('Processor: ', 7, ' Loading : ', 96.256, 'MB, chunk: ', 14, ' of ', 250)
('Processor: ', 7, 'chunk: ', 14, ' in dot product loop ...')
('Processor: ', 0, 'cleaned spikes: ', 23798)
(1128000, 1175000, 122)
('Processor: ', 0, ' Loading : ', 96.256, 'MB, chunk: ', 24, ' of ', 250)
('Processor: ', 0, 'chunk: ', 24, ' in dot product loop ...')
('Processor: ', 2, 'cleaned spikes: ', 33405)
(24346000, 24393000, 122)
('Processor: ', 2, ' Loading : ', 96.256, 'MB, chunk: ', 18, ' of ', 250)
('Processor: ', 2, 'chunk: ', 18, ' in dot product loop ...')
('Processor: ', 3, 'cleaned spikes: ', 37963)
(35814000, 35861000, 122)
('Processor: ', 3, ' Loading : ', 96.256, 'MB, chunk: ', 12, ' of ', 250)
('Processor: ', 0, 'chunk: ', 24, ' in thresholding loop ...')
('Processor: ', 7, 'chunk: ', 14, ' in thresholding loop ...')
('Processor: ', 3, 'chunk: ', 12, ' in dot product loop ...')
('Processor: ', 2, 'chunk: ', 18, ' in thresholding loop ...')
('Processor: ', 3, 'chunk: ', 12, ' in thresholding loop ...')
('Processor: ', 0, 'cleaned spikes: ', 23804)
(1175000, 1222000, 122)
('Processor: ', 0, ' Loading : ', 96.256, 'MB, chunk: ', 25, ' of ', 250)
('Processor: ', 0, 'chunk: ', 25, ' in dot product loop ...')
('Processor: ', 0, 'chunk: ', 25, ' in thresholding loop ...')
('Processor: ', 2, 'cleaned spikes: ', 33620)
(24393000, 24440000, 122)
('Processor: ', 2, ' Loading : ', 96.256, 'MB, chunk: ', 19, ' of ', 250)
('Processor: ', 2, 'chunk: ', 19, ' in dot product loop ...')
('Processor: ', 7, 'cleaned spikes: ', 38697)
(82955000, 83002000, 122)
('Processor: ', 7, ' Loading : ', 96.256, 'MB, chunk: ', 15, ' of ', 250)
('Processor: ', 7, 'chunk: ', 15, ' in dot product loop ...')
('Processor: ', 2, 'chunk: ', 19, ' in thresholding loop ...')
('Processor: ', 0, 'cleaned spikes: ', 24238)
(1222000, 1269000, 122)
('Processor: ', 0, ' Loading : ', 96.256, 'MB, chunk: ', 26, ' of ', 250)
('Processor: ', 0, 'chunk: ', 26, ' in dot product loop ...')
('Processor: ', 7, 'chunk: ', 15, ' in thresholding loop ...')
('Processor: ', 0, 'chunk: ', 26, ' in thresholding loop ...')
('Processor: ', 0, 'cleaned spikes: ', 23951)
(1269000, 1316000, 122)
('Processor: ', 0, ' Loading : ', 96.256, 'MB, chunk: ', 27, ' of ', 250)
('Processor: ', 2, 'cleaned spikes: ', 33508)
(24440000, 24487000, 122)
('Processor: ', 2, ' Loading : ', 96.256, 'MB, chunk: ', 20, ' of ', 250)
('Processor: ', 0, 'chunk: ', 27, ' in dot product loop ...')
('Processor: ', 2, 'chunk: ', 20, ' in dot product loop ...')
('Processor: ', 4, 'cleaned spikes: ', 37434)
(47423000, 47470000, 122)
('Processor: ', 4, ' Loading : ', 96.256, 'MB, chunk: ', 9, ' of ', 250)
('Processor: ', 2, 'chunk: ', 20, ' in thresholding loop ...')
('Processor: ', 4, 'chunk: ', 9, ' in dot product loop ...')
('Processor: ', 5, 'cleaned spikes: ', 42054)
(59126000, 59173000, 122)
('Processor: ', 5, ' Loading : ', 96.256, 'MB, chunk: ', 8, ' of ', 250)
('Processor: ', 7, 'cleaned spikes: ', 42837)
(83002000, 83049000, 122)
('Processor: ', 7, ' Loading : ', 96.256, 'MB, chunk: ', 16, ' of ', 250)
('Processor: ', 7, 'chunk: ', 16, ' in dot product loop ...')
('Processor: ', 5, 'chunk: ', 8, ' in dot product loop ...')
('Processor: ', 4, 'chunk: ', 9, ' in thresholding loop ...')
('Processor: ', 7, 'chunk: ', 16, ' in thresholding loop ...')
('Processor: ', 5, 'chunk: ', 8, ' in thresholding loop ...')
('Processor: ', 3, 'cleaned spikes: ', 37090)
(35861000, 35908000, 122)
('Processor: ', 3, ' Loading : ', 96.256, 'MB, chunk: ', 13, ' of ', 250)
('Processor: ', 3, 'chunk: ', 13, ' in dot product loop ...')
('Processor: ', 2, 'cleaned spikes: ', 33851)
(24487000, 24534000, 122)
('Processor: ', 2, ' Loading : ', 96.256, 'MB, chunk: ', 21, ' of ', 250)
('Processor: ', 2, 'chunk: ', 21, ' in dot product loop ...')
('Processor: ', 3, 'chunk: ', 13, ' in thresholding loop ...')
('Processor: ', 2, 'chunk: ', 21, ' in thresholding loop ...')
('Processor: ', 7, 'cleaned spikes: ', 40524)
(83049000, 83096000, 122)
('Processor: ', 7, ' Loading : ', 96.256, 'MB, chunk: ', 17, ' of ', 250)
('Processor: ', 7, 'chunk: ', 17, ' in dot product loop ...')
('Processor: ', 7, 'chunk: ', 17, ' in thresholding loop ...')
('Processor: ', 4, 'cleaned spikes: ', 37956)
(47470000, 47517000, 122)
('Processor: ', 4, ' Loading : ', 96.256, 'MB, chunk: ', 10, ' of ', 250)
('Processor: ', 2, 'cleaned spikes: ', 34146)
(24534000, 24581000, 122)
('Processor: ', 2, ' Loading : ', 96.256, 'MB, chunk: ', 22, ' of ', 250)
('Processor: ', 2, 'chunk: ', 22, ' in dot product loop ...')
('Processor: ', 4, 'chunk: ', 10, ' in dot product loop ...')
('Processor: ', 2, 'chunk: ', 22, ' in thresholding loop ...')
('Processor: ', 5, 'cleaned spikes: ', 41225)
(59173000, 59220000, 122)
('Processor: ', 5, ' Loading : ', 96.256, 'MB, chunk: ', 9, ' of ', 250)
('Processor: ', 5, 'chunk: ', 9, ' in dot product loop ...')
('Processor: ', 3, 'cleaned spikes: ', 37287)
(35908000, 35955000, 122)
('Processor: ', 3, ' Loading : ', 96.256, 'MB, chunk: ', 14, ' of ', 250)
('Processor: ', 3, 'chunk: ', 14, ' in dot product loop ...')
('Processor: ', 5, 'chunk: ', 9, ' in thresholding loop ...')
('Processor: ', 3, 'chunk: ', 14, ' in thresholding loop ...')
('Processor: ', 7, 'cleaned spikes: ', 40758)
(83096000, 83143000, 122)
('Processor: ', 7, ' Loading : ', 96.256, 'MB, chunk: ', 18, ' of ', 250)
('Processor: ', 7, 'chunk: ', 18, ' in dot product loop ...')
('Processor: ', 2, 'cleaned spikes: ', 34011)
(24581000, 24628000, 122)
('Processor: ', 2, ' Loading : ', 96.256, 'MB, chunk: ', 23, ' of ', 250)
('Processor: ', 2, 'chunk: ', 23, ' in dot product loop ...')
('Processor: ', 7, 'chunk: ', 18, ' in thresholding loop ...')
('Processor: ', 2, 'chunk: ', 23, ' in thresholding loop ...')
('Processor: ', 5, 'cleaned spikes: ', 39093)
(59220000, 59267000, 122)
('Processor: ', 5, ' Loading : ', 96.256, 'MB, chunk: ', 10, ' of ', 250)
('Processor: ', 5, 'chunk: ', 10, ' in dot product loop ...')
('Processor: ', 3, 'cleaned spikes: ', 37200)
(35955000, 36002000, 122)
('Processor: ', 3, ' Loading : ', 96.256, 'MB, chunk: ', 15, ' of ', 250)
('Processor: ', 3, 'chunk: ', 15, ' in dot product loop ...')
('Processor: ', 5, 'chunk: ', 10, ' in thresholding loop ...')
('Processor: ', 2, 'cleaned spikes: ', 33736)
(24628000, 24675000, 122)
('Processor: ', 2, ' Loading : ', 96.256, 'MB, chunk: ', 24, ' of ', 250)
('Processor: ', 2, 'chunk: ', 24, ' in dot product loop ...')
('Processor: ', 3, 'chunk: ', 15, ' in thresholding loop ...')
('Processor: ', 7, 'cleaned spikes: ', 42525)
(83143000, 83190000, 122)
('Processor: ', 7, ' Loading : ', 96.256, 'MB, chunk: ', 19, ' of ', 250)
('Processor: ', 2, 'chunk: ', 24, ' in thresholding loop ...')
('Processor: ', 7, 'chunk: ', 19, ' in dot product loop ...')
('Processor: ', 7, 'chunk: ', 19, ' in thresholding loop ...')
('Processor: ', 3, 'cleaned spikes: ', 36865)
(36002000, 36049000, 122)
('Processor: ', 3, ' Loading : ', 96.256, 'MB, chunk: ', 16, ' of ', 250)
('Processor: ', 3, 'chunk: ', 16, ' in dot product loop ...')
('Processor: ', 2, 'cleaned spikes: ', 33531)
(24675000, 24722000, 122)
('Processor: ', 2, ' Loading : ', 96.256, 'MB, chunk: ', 25, ' of ', 250)
('Processor: ', 5, 'cleaned spikes: ', 40783)
(59267000, 59314000, 122)
('Processor: ', 5, ' Loading : ', 96.256, 'MB, chunk: ', 11, ' of ', 250)
('Processor: ', 2, 'chunk: ', 25, ' in dot product loop ...')
('Processor: ', 5, 'chunk: ', 11, ' in dot product loop ...')
('Processor: ', 3, 'chunk: ', 16, ' in thresholding loop ...')
('Processor: ', 5, 'chunk: ', 11, ' in thresholding loop ...')
('Processor: ', 2, 'chunk: ', 25, ' in thresholding loop ...')
('Processor: ', 7, 'cleaned spikes: ', 47975)
(83190000, 83237000, 122)
('Processor: ', 7, ' Loading : ', 96.256, 'MB, chunk: ', 20, ' of ', 250)
('Processor: ', 7, 'chunk: ', 20, ' in dot product loop ...')
('Processor: ', 2, 'cleaned spikes: ', 33197)
(24722000, 24769000, 122)
('Processor: ', 2, ' Loading : ', 96.256, 'MB, chunk: ', 26, ' of ', 250)
('Processor: ', 2, 'chunk: ', 26, ' in dot product loop ...')
('Processor: ', 7, 'chunk: ', 20, ' in thresholding loop ...')
('Processor: ', 5, 'cleaned spikes: ', 35812)
(59314000, 59361000, 122)
('Processor: ', 5, ' Loading : ', 96.256, 'MB, chunk: ', 12, ' of ', 250)
('Processor: ', 5, 'chunk: ', 12, ' in dot product loop ...')
('Processor: ', 2, 'chunk: ', 26, ' in thresholding loop ...')
('Processor: ', 3, 'cleaned spikes: ', 45069)
(36049000, 36096000, 122)
('Processor: ', 3, ' Loading : ', 96.256, 'MB, chunk: ', 17, ' of ', 250)
('Processor: ', 3, 'chunk: ', 17, ' in dot product loop ...')
('Processor: ', 5, 'chunk: ', 12, ' in thresholding loop ...')
('Processor: ', 3, 'chunk: ', 17, ' in thresholding loop ...')
('Processor: ', 7, 'cleaned spikes: ', 38380)
(83237000, 83284000, 122)
('Processor: ', 7, ' Loading : ', 96.256, 'MB, chunk: ', 21, ' of ', 250)
('Processor: ', 2, 'cleaned spikes: ', 33467)
(24769000, 24816000, 122)
('Processor: ', 2, ' Loading : ', 96.256, 'MB, chunk: ', 27, ' of ', 250)
('Processor: ', 7, 'chunk: ', 21, ' in dot product loop ...')
('Processor: ', 2, 'chunk: ', 27, ' in dot product loop ...')
('Processor: ', 7, 'chunk: ', 21, ' in thresholding loop ...')
('Processor: ', 2, 'chunk: ', 27, ' in thresholding loop ...')
('Processor: ', 5, 'cleaned spikes: ', 41816)
(59361000, 59408000, 122)
('Processor: ', 5, ' Loading : ', 96.256, 'MB, chunk: ', 13, ' of ', 250)
('Processor: ', 3, 'cleaned spikes: ', 36041)
(36096000, 36143000, 122)
('Processor: ', 3, ' Loading : ', 96.256, 'MB, chunk: ', 18, ' of ', 250)
('Processor: ', 5, 'chunk: ', 13, ' in dot product loop ...')
('Processor: ', 3, 'chunk: ', 18, ' in dot product loop ...')
('Processor: ', 3, 'chunk: ', 18, ' in thresholding loop ...')
('Processor: ', 5, 'chunk: ', 13, ' in thresholding loop ...')
('Processor: ', 2, 'cleaned spikes: ', 34213)
(24816000, 24863000, 122)
('Processor: ', 2, ' Loading : ', 96.256, 'MB, chunk: ', 28, ' of ', 250)
('Processor: ', 2, 'chunk: ', 28, ' in dot product loop ...')
('Processor: ', 7, 'cleaned spikes: ', 34783)
(83284000, 83331000, 122)
('Processor: ', 7, ' Loading : ', 96.256, 'MB, chunk: ', 22, ' of ', 250)
('Processor: ', 7, 'chunk: ', 22, ' in dot product loop ...')
('Processor: ', 2, 'chunk: ', 28, ' in thresholding loop ...')
('Processor: ', 7, 'chunk: ', 22, ' in thresholding loop ...')
('Processor: ', 3, 'cleaned spikes: ', 36062)
(36143000, 36190000, 122)
('Processor: ', 3, ' Loading : ', 96.256, 'MB, chunk: ', 19, ' of ', 250)
('Processor: ', 3, 'chunk: ', 19, ' in dot product loop ...')
('Processor: ', 5, 'cleaned spikes: ', 38813)
(59408000, 59455000, 122)
('Processor: ', 5, ' Loading : ', 96.256, 'MB, chunk: ', 14, ' of ', 250)
('Processor: ', 5, 'chunk: ', 14, ' in dot product loop ...')
('Processor: ', 3, 'chunk: ', 19, ' in thresholding loop ...')
('Processor: ', 5, 'chunk: ', 14, ' in thresholding loop ...')
('Processor: ', 2, 'cleaned spikes: ', 34197)
(24863000, 24910000, 122)
('Processor: ', 2, ' Loading : ', 96.256, 'MB, chunk: ', 29, ' of ', 250)
('Processor: ', 2, 'chunk: ', 29, ' in dot product loop ...')
('Processor: ', 2, 'chunk: ', 29, ' in thresholding loop ...')
('Processor: ', 7, 'cleaned spikes: ', 47942)
(83331000, 83378000, 122)
('Processor: ', 7, ' Loading : ', 96.256, 'MB, chunk: ', 23, ' of ', 250)
('Processor: ', 7, 'chunk: ', 23, ' in dot product loop ...')
('Processor: ', 3, 'cleaned spikes: ', 34400)
(36190000, 36237000, 122)
('Processor: ', 3, ' Loading : ', 96.256, 'MB, chunk: ', 20, ' of ', 250)
('Processor: ', 3, 'chunk: ', 20, ' in dot product loop ...')
('Processor: ', 7, 'chunk: ', 23, ' in thresholding loop ...')
('Processor: ', 3, 'chunk: ', 20, ' in thresholding loop ...')
('Processor: ', 5, 'cleaned spikes: ', 39521)
(59455000, 59502000, 122)
('Processor: ', 5, ' Loading : ', 96.256, 'MB, chunk: ', 15, ' of ', 250)
('Processor: ', 5, 'chunk: ', 15, ' in dot product loop ...')
('Processor: ', 2, 'cleaned spikes: ', 34706)
(24910000, 24957000, 122)
('Processor: ', 2, ' Loading : ', 96.256, 'MB, chunk: ', 30, ' of ', 250)
('Processor: ', 2, 'chunk: ', 30, ' in dot product loop ...')
('Processor: ', 5, 'chunk: ', 15, ' in thresholding loop ...')
('Processor: ', 2, 'chunk: ', 30, ' in thresholding loop ...')
('Processor: ', 7, 'cleaned spikes: ', 35234)
(83378000, 83425000, 122)
('Processor: ', 7, ' Loading : ', 96.256, 'MB, chunk: ', 24, ' of ', 250)
('Processor: ', 7, 'chunk: ', 24, ' in dot product loop ...')
('Processor: ', 3, 'cleaned spikes: ', 35514)
(36237000, 36284000, 122)
('Processor: ', 3, ' Loading : ', 96.256, 'MB, chunk: ', 21, ' of ', 250)
('Processor: ', 3, 'chunk: ', 21, ' in dot product loop ...')
('Processor: ', 7, 'chunk: ', 24, ' in thresholding loop ...')
('Processor: ', 2, 'cleaned spikes: ', 34095)
(24957000, 25004000, 122)
('Processor: ', 2, ' Loading : ', 96.256, 'MB, chunk: ', 31, ' of ', 250)
('Processor: ', 3, 'chunk: ', 21, ' in thresholding loop ...')
('Processor: ', 2, 'chunk: ', 31, ' in dot product loop ...')
('Processor: ', 5, 'cleaned spikes: ', 41911)
(59502000, 59549000, 122)
('Processor: ', 5, ' Loading : ', 96.256, 'MB, chunk: ', 16, ' of ', 250)
('Processor: ', 5, 'chunk: ', 16, ' in dot product loop ...')
('Processor: ', 2, 'chunk: ', 31, ' in thresholding loop ...')
('Processor: ', 5, 'chunk: ', 16, ' in thresholding loop ...')
('Processor: ', 3, 'cleaned spikes: ', 37764)
(36284000, 36331000, 122)
('Processor: ', 3, ' Loading : ', 96.256, 'MB, chunk: ', 22, ' of ', 250)
('Processor: ', 3, 'chunk: ', 22, ' in dot product loop ...')
('Processor: ', 2, 'cleaned spikes: ', 34068)
(25004000, 25051000, 122)
('Processor: ', 2, ' Loading : ', 96.256, 'MB, chunk: ', 32, ' of ', 250)
('Processor: ', 2, 'chunk: ', 32, ' in dot product loop ...')
('Processor: ', 7, 'cleaned spikes: ', 45370)
(83425000, 83472000, 122)
('Processor: ', 7, ' Loading : ', 96.256, 'MB, chunk: ', 25, ' of ', 250)
('Processor: ', 7, 'chunk: ', 25, ' in dot product loop ...')
('Processor: ', 3, 'chunk: ', 22, ' in thresholding loop ...')
('Processor: ', 2, 'chunk: ', 32, ' in thresholding loop ...')
('Processor: ', 7, 'chunk: ', 25, ' in thresholding loop ...')
('Processor: ', 5, 'cleaned spikes: ', 43003)
(59549000, 59596000, 122)
('Processor: ', 5, ' Loading : ', 96.256, 'MB, chunk: ', 17, ' of ', 250)
('Processor: ', 5, 'chunk: ', 17, ' in dot product loop ...')
('Processor: ', 5, 'chunk: ', 17, ' in thresholding loop ...')
('Processor: ', 3, 'cleaned spikes: ', 31876)
(36331000, 36378000, 122)
('Processor: ', 3, ' Loading : ', 96.256, 'MB, chunk: ', 23, ' of ', 250)
('Processor: ', 2, 'cleaned spikes: ', 34136)
(25051000, 25098000, 122)
('Processor: ', 2, ' Loading : ', 96.256, 'MB, chunk: ', 33, ' of ', 250)
('Processor: ', 3, 'chunk: ', 23, ' in dot product loop ...')
('Processor: ', 2, 'chunk: ', 33, ' in dot product loop ...')
('Processor: ', 3, 'chunk: ', 23, ' in thresholding loop ...')
('Processor: ', 2, 'chunk: ', 33, ' in thresholding loop ...')
('Processor: ', 7, 'cleaned spikes: ', 44092)
(83472000, 83519000, 122)
('Processor: ', 7, ' Loading : ', 96.256, 'MB, chunk: ', 26, ' of ', 250)
('Processor: ', 7, 'chunk: ', 26, ' in dot product loop ...')
('Processor: ', 7, 'chunk: ', 26, ' in thresholding loop ...')
('Processor: ', 2, 'cleaned spikes: ', 33309)
(25098000, 25145000, 122)
('Processor: ', 2, ' Loading : ', 96.256, 'MB, chunk: ', 34, ' of ', 250)
('Processor: ', 2, 'chunk: ', 34, ' in dot product loop ...')
('Processor: ', 5, 'cleaned spikes: ', 44921)
(59596000, 59643000, 122)
('Processor: ', 5, ' Loading : ', 96.256, 'MB, chunk: ', 18, ' of ', 250)
('Processor: ', 5, 'chunk: ', 18, ' in dot product loop ...')
('Processor: ', 3, 'cleaned spikes: ', 38187)
(36378000, 36425000, 122)
('Processor: ', 3, ' Loading : ', 96.256, 'MB, chunk: ', 24, ' of ', 250)
('Processor: ', 3, 'chunk: ', 24, ' in dot product loop ...')
('Processor: ', 2, 'chunk: ', 34, ' in thresholding loop ...')
('Processor: ', 5, 'chunk: ', 18, ' in thresholding loop ...')
('Processor: ', 3, 'chunk: ', 24, ' in thresholding loop ...')
('Processor: ', 7, 'cleaned spikes: ', 43008)
(83519000, 83566000, 122)
('Processor: ', 7, ' Loading : ', 96.256, 'MB, chunk: ', 27, ' of ', 250)
('Processor: ', 7, 'chunk: ', 27, ' in dot product loop ...')
('Processor: ', 2, 'cleaned spikes: ', 34192)
(25145000, 25192000, 122)
('Processor: ', 2, ' Loading : ', 96.256, 'MB, chunk: ', 35, ' of ', 250)
('Processor: ', 2, 'chunk: ', 35, ' in dot product loop ...')
('Processor: ', 7, 'chunk: ', 27, ' in thresholding loop ...')
('Processor: ', 2, 'chunk: ', 35, ' in thresholding loop ...')
('Processor: ', 3, 'cleaned spikes: ', 36091)
(36425000, 36472000, 122)
('Processor: ', 3, ' Loading : ', 96.256, 'MB, chunk: ', 25, ' of ', 250)
('Processor: ', 3, 'chunk: ', 25, ' in dot product loop ...')
('Processor: ', 5, 'cleaned spikes: ', 43376)
(59643000, 59690000, 122)
('Processor: ', 5, ' Loading : ', 96.256, 'MB, chunk: ', 19, ' of ', 250)
('Processor: ', 5, 'chunk: ', 19, ' in dot product loop ...')
('Processor: ', 3, 'chunk: ', 25, ' in thresholding loop ...')
('Processor: ', 5, 'chunk: ', 19, ' in thresholding loop ...')
('Processor: ', 7, 'cleaned spikes: ', 31721)
(83566000, 83613000, 122)
('Processor: ', 7, ' Loading : ', 96.256, 'MB, chunk: ', 28, ' of ', 250)
('Processor: ', 7, 'chunk: ', 28, ' in dot product loop ...')

('Processor: ', 2, 'cleaned spikes: ', 34201)
(25192000, 25239000, 122)
('Processor: ', 2, ' Loading : ', 96.256, 'MB, chunk: ', 36, ' of ', 250)
('Processor: ', 7, 'chunk: ', 28, ' in thresholding loop ...')
('Processor: ', 2, 'chunk: ', 36, ' in dot product loop ...')
('Processor: ', 3, 'cleaned spikes: ', 30913)
(36472000, 36519000, 122)
('Processor: ', 3, ' Loading : ', 96.256, 'MB, chunk: ', 26, ' of ', 250)
('Processor: ', 3, 'chunk: ', 26, ' in dot product loop ...')
('Processor: ', 2, 'chunk: ', 36, ' in thresholding loop ...')
('Processor: ', 3, 'chunk: ', 26, ' in thresholding loop ...')
('Processor: ', 5, 'cleaned spikes: ', 38935)
(59690000, 59737000, 122)
('Processor: ', 5, ' Loading : ', 96.256, 'MB, chunk: ', 20, ' of ', 250)
('Processor: ', 5, 'chunk: ', 20, ' in dot product loop ...')
('Processor: ', 5, 'chunk: ', 20, ' in thresholding loop ...')
('Processor: ', 2, 'cleaned spikes: ', 33964)
(25239000, 25286000, 122)
('Processor: ', 2, ' Loading : ', 96.256, 'MB, chunk: ', 37, ' of ', 250)
('Processor: ', 2, 'chunk: ', 37, ' in dot product loop ...')
('Processor: ', 7, 'cleaned spikes: ', 42433)
(83613000, 83660000, 122)
('Processor: ', 7, ' Loading : ', 96.256, 'MB, chunk: ', 29, ' of ', 250)
('Processor: ', 7, 'chunk: ', 29, ' in dot product loop ...')
('Processor: ', 2, 'chunk: ', 37, ' in thresholding loop ...')
('Processor: ', 7, 'chunk: ', 29, ' in thresholding loop ...')
('Processor: ', 3, 'cleaned spikes: ', 35592)
(36519000, 36566000, 122)
('Processor: ', 3, ' Loading : ', 96.256, 'MB, chunk: ', 27, ' of ', 250)
('Processor: ', 3, 'chunk: ', 27, ' in dot product loop ...')
('Processor: ', 3, 'chunk: ', 27, ' in thresholding loop ...')
('Processor: ', 5, 'cleaned spikes: ', 36710)
(59737000, 59784000, 122)
('Processor: ', 5, ' Loading : ', 96.256, 'MB, chunk: ', 21, ' of ', 250)
('Processor: ', 5, 'chunk: ', 21, ' in dot product loop ...')
('Processor: ', 5, 'chunk: ', 21, ' in thresholding loop ...')
('Processor: ', 2, 'cleaned spikes: ', 34275)
(25286000, 25333000, 122)
('Processor: ', 2, ' Loading : ', 96.256, 'MB, chunk: ', 38, ' of ', 250)
('Processor: ', 2, 'chunk: ', 38, ' in dot product loop ...')
('Processor: ', 7, 'cleaned spikes: ', 38651)
(83660000, 83707000, 122)
('Processor: ', 7, ' Loading : ', 96.256, 'MB, chunk: ', 30, ' of ', 250)
('Processor: ', 7, 'chunk: ', 30, ' in dot product loop ...')
('Processor: ', 2, 'chunk: ', 38, ' in thresholding loop ...')
('Processor: ', 3, 'cleaned spikes: ', 33573)
(36566000, 36613000, 122)
('Processor: ', 3, ' Loading : ', 96.256, 'MB, chunk: ', 28, ' of ', 250)
('Processor: ', 3, 'chunk: ', 28, ' in dot product loop ...')
('Processor: ', 7, 'chunk: ', 30, ' in thresholding loop ...')
('Processor: ', 3, 'chunk: ', 28, ' in thresholding loop ...')

question about the array_float_parmapi performance hack

Hello,

In this function, is the bigarray used as a shared memory between several
processes?

Does it allow the worker processes to communicate with the parent
process without having to marshal/unmarshal their messages?

Thanks,
F.

make uninstall target

Hello,

Does it also remove the .so file generated
at compile time and installed by make install?

Thanks,
F.

chunksize

It seems that passing a chunksize together with a work list longer than the number of cores requested allocates a file descriptor that is never freed.

I believe that I am using the following version:

.opam/4.06.1/lib/parmap

bug: gross error in Parmap.array_parmapi

 #require "parmap";;
Parmap.array_parmapi ~ncores:2 ~chunksize:1 (fun i x -> (i, x)) [|0;1;2;3;4;5;6;7;8;9|];;
- : (int * int) array =
[|(0, 8); (0, 6); (0, 4); (0, 3); (0, 1); (0, 0); (0, 9); (0, 7); (0, 5);
  (0, 2)|]
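
For comparison, here is a small sketch of what the sequential counterpart returns for the same input, assuming the parallel version is meant to agree with Array.mapi. The helper indices_ok is hypothetical and only checks that each pair carries the index of the slot it sits in.

```ocaml
(* Sequential reference: Array.mapi passes each element its own index. *)
let expected : (int * int) array =
  Array.mapi (fun i x -> (i, x)) [| 0; 1; 2; 3; 4; 5; 6; 7; 8; 9 |]

(* Hypothetical helper: true when every pair holds the index of its slot. *)
let indices_ok (a : (int * 'a) array) : bool =
  let ok = ref true in
  Array.iteri (fun pos (i, _) -> if i <> pos then ok := false) a;
  !ok

let () =
  Printf.printf "sequential result is well indexed: %b\n" (indices_ok expected)
```

Applied to the array printed in the report above, indices_ok would return false, since every pair carries index 0.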

parmap throws Out of memory exception on 32 bit architectures

Hi,

I'm using parmap to parallelize parts of the creation of Debian dependency graphs. Recently I noticed that given the same input, my program would quit with:

Fatal error: exception Out of memory

on all 32 bit architectures. Here is an example build log of the problem on i386:

https://buildd.debian.org/status/fetch.php?pkg=botch&arch=i386&ver=0.18-1&stamp=1473844641

The problem does not occur in the exact same situation on a 64 bit architecture like amd64. I can also confirm that the problem is not a lack of physical memory because I get the same error inside a 32 bit i386 chroot on an amd64 system where my program using parmap runs without problems.

Furthermore, the problem goes away if I increase the parallelism of parmap. In my specific setup the problem occurs when I split the job into two but disappears when I split the job into four.

I rebuilt my software with debugging enabled and ran it with OCAMLRUNPARAM=b and was able to confirm that the exception gets thrown by this line in my code:

Parmap.parmap ~ncores:num_cores worker (Parmap.L todo)

I have not yet rebuilt parmap with debugging enabled to follow the error deeper into the parmap code.

Since this problem is 32 bit specific, I guess this is related to OCaml string or buffer size constraints?
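
To explore the guess about size constraints, the limits reported by the running OCaml runtime can be printed. This is only a sketch for inspection: it shows the limits (the maximum string length is roughly 16 MB on 32 bit platforms) and does not establish that they cause the Out of memory exception. The function name describe_limits is made up for this example.

```ocaml
(* Report the word size and the size limits of the current OCaml runtime. *)
let describe_limits () : string =
  Printf.sprintf
    "word size: %d bits, max string length: %d bytes, max array length: %d"
    Sys.word_size Sys.max_string_length Sys.max_array_length

let () = print_endline (describe_limits ())
```

Running this once in the i386 chroot and once on amd64 makes the difference between the two environments visible.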

exception handling in child processes

Parmap would be more useful for server-side and daemon applications if it had better handling of exceptions raised in child processes. In such applications, errors must be handled promptly, and the show must go on!

An exception in a worker should cause the parallel computation to stop ASAP, and an informative exception should be raised to the caller. This is understandably tricky due to the lack of exception marshalling, but there are better things to do than panic and exit.

Some example semantics that might be worth evaluating are in my ForkWork library. See particularly the paragraph starting "If a child process ends with an exception...", the fail_fast option, and the ChildExn discussion.
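
One way to work around the lack of exception marshalling, sketched here under the assumption that the caller controls the worker function, is to turn exceptions into plain data before they leave the worker. The names guard and first_error are hypothetical, and List.map stands in for the parallel map. This does not give the fail-fast behaviour asked for above; it only makes the failure visible to the caller.

```ocaml
(* Wrap a worker so that an exception becomes a string inside a result,
   which is ordinary data and can be marshalled like any other value. *)
let guard (f : 'a -> 'b) (x : 'a) : ('b, string) result =
  try Ok (f x) with e -> Error (Printexc.to_string e)

(* Return the message of the first failed worker, if any. *)
let rec first_error = function
  | [] -> None
  | Error msg :: _ -> Some msg
  | Ok _ :: rest -> first_error rest

let () =
  (* List.map stands in for the parallel map in this sketch. *)
  let results = List.map (guard (fun x -> 10 / x)) [ 1; 0; 5 ] in
  match first_error results with
  | Some msg -> Printf.printf "a worker failed: %s\n" msg
  | None -> print_endline "all workers succeeded"
```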

parmap doesn't compile with OCaml 4.06.0

#=== ERROR while installing parmap.1.0-rc8 ====================================#
# opam-version 1.2.2
# os           linux
# command      make DESTDIR=/home/travis/.opam/4.06.0 OCAMLLIBDIR=lib
# path         /home/travis/.opam/4.06.0/build/parmap.1.0-rc8
# compiler     4.06.0
# exit-code    2
# env-file     /home/travis/.opam/4.06.0/build/parmap.1.0-rc8/parmap-15223-c1bafb.env
# stdout-file  /home/travis/.opam/4.06.0/build/parmap.1.0-rc8/parmap-15223-c1bafb.out
# stderr-file  /home/travis/.opam/4.06.0/build/parmap.1.0-rc8/parmap-15223-c1bafb.err
### stdout ###
# [...]
# Warning 3: deprecated: [@@noalloc] should be used instead of "noalloc"
# File "bytearray.ml", line 41, characters 0-116:
# Warning 3: deprecated: [@@noalloc] should be used instead of "noalloc"
# File "bytearray.ml", line 47, characters 10-23:
# Warning 3: deprecated: String.create
# Use Bytes.create instead.
# File "bytearray.ml", line 48, characters 28-29:
# Error: This expression has type bytes but an expression was expected of type
#          string
# Command exited with code 2.

Data marshaling problem

I am currently getting the error "Fatal error: exception Failure("output_value: object too big")".
I am not absolutely sure about this, but I strongly suspect that the Marshal.to_string call in the function marshal at line 114 of the parmap.ml file is the cause.

Do you have any idea how to fix this?
Can bin_prot/biniou help in this case? If so, it would be nice to be able to use these alternative marshaling methods.
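
To narrow this down outside of parmap, the size of the marshalled value can be measured directly, and the Failure can be caught rather than left to abort the program. This is a diagnostic sketch only; marshalled_size and try_marshal are names invented here, and the sample payload is arbitrary.

```ocaml
(* Size in bytes of the marshalled form of [v], header included. *)
let marshalled_size v = String.length (Marshal.to_string v [])

(* Marshal [v], turning a Failure such as "output_value: object too big"
   into an Error instead of letting it escape. *)
let try_marshal v =
  try Ok (Marshal.to_string v []) with Failure msg -> Error msg

let () =
  let payload = List.init 1000 (fun i -> i) in
  Printf.printf "marshalled size: %d bytes (max string length: %d)\n"
    (marshalled_size payload) Sys.max_string_length;
  match try_marshal payload with
  | Ok s ->
    let back : int list = Marshal.from_string s 0 in
    Printf.printf "round trip preserved the value: %b\n" (back = payload)
  | Error msg -> Printf.printf "marshalling failed: %s\n" msg
```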

Number of cores

For code portability, it would be nice if Parmap could figure out the number of available cores by itself.

automatic detection of the maximum number of local cores

This was discussed previously. Maybe we should first write
an OCaml library wrapping some of the C code used in distcc
in order to do that.
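
Until something along those lines exists, a caller can approximate the core count in pure OCaml. The sketch below is Linux-only and an assumption of this note rather than anything Parmap provides: it counts the lines of /proc/cpuinfo that start with "processor" and falls back to 1 when the file is missing or has no such lines. Both function names are hypothetical.

```ocaml
(* Count the "processor" entries in a cpuinfo-style file, if it is readable. *)
let count_cpuinfo_processors (path : string) : int option =
  match open_in path with
  | exception Sys_error _ -> None
  | ic ->
    let rec loop n =
      match input_line ic with
      | line ->
        let is_proc =
          String.length line >= 9 && String.sub line 0 9 = "processor"
        in
        loop (if is_proc then n + 1 else n)
      | exception End_of_file -> n
    in
    let n = loop 0 in
    close_in ic;
    if n > 0 then Some n else None

(* Fall back to a single core when detection is not possible. *)
let detect_ncores () : int =
  match count_cpuinfo_processors "/proc/cpuinfo" with
  | Some n -> n
  | None -> 1

let () = Printf.printf "detected cores: %d\n" (detect_ncores ())
```

The result can then be passed as ~ncores. On OCaml 5, Domain.recommended_domain_count from the standard library offers a portable alternative.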

redirect

Is there anything that flushes or closes stdout when using Parmap.redirect? I have the impression that my stderr files are complete, but my stdout files are not.
Are there examples of using redirect properly?
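
One possible explanation, not confirmed here, is buffering: data written to an OCaml output channel stays in the channel's buffer until it is flushed. The sketch below shows the effect with a temporary file standing in for a redirected stdout; file_size and sizes_around_flush are names made up for the demonstration.

```ocaml
(* Size of a file on disk, in bytes. *)
let file_size (path : string) : int =
  let ic = open_in_bin path in
  let n = in_channel_length ic in
  close_in ic;
  n

(* Write a short line to a channel and measure the file before and after
   flushing: the data only reaches the file once the buffer is flushed. *)
let sizes_around_flush () : int * int =
  let path = Filename.temp_file "redirect_demo" ".out" in
  let oc = open_out path in
  output_string oc "hello from worker\n";
  let before = file_size path in
  flush oc;
  let after = file_size path in
  close_out oc;
  Sys.remove path;
  (before, after)

let () =
  let before, after = sizes_around_flush () in
  Printf.printf "bytes on disk before flush: %d, after flush: %d\n" before after
```

If this is the cause, calling flush stdout at the end of the function given to Parmap should make the stdout files complete.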

when ~ncores=1, Parmap should not fork

We don't want Parmap to be invoked in vain.
Also, this would ease writing tests for Parmap.
List.map is equivalent to Parmap.parmap ~ncores:1 and has almost no overhead.
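
Until the library does this itself, the behaviour can be sketched on the caller's side. The wrapper map_with below is hypothetical; it takes the parallel map as a parameter so that the sketch stays self-contained, and in real use that parameter would be something like fun ~ncores f l -> Parmap.parmap ~ncores f (Parmap.L l).

```ocaml
(* Use plain List.map when a single core is requested, and only call the
   parallel implementation (which forks) otherwise. *)
let map_with ~parallel ~ncores f l =
  if ncores <= 1 then List.map f l else parallel ~ncores f l

let () =
  (* Stand-in for the parallel map: it records that it was called. *)
  let forked = ref false in
  let parallel ~ncores:_ f l = forked := true; List.map f l in
  let result = map_with ~parallel ~ncores:1 (fun x -> x * 2) [ 1; 2; 3 ] in
  Printf.printf "result length: %d, parallel path used: %b\n"
    (List.length result) !forked
```

This also makes tests easier to write, because the single-core path never leaves the calling process.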

META linkopts

Ocamlbuild single-quotes the value of linkopts in the META file, which makes subsequent calls to ocamlmktop fail: ocamlmktop '-cclib ...'

linkopts(byte) = "-cclib -lparmap_stubs"

should be replaced by:

linkopts(byte) += "-cclib"
linkopts(byte) += "-lparmap_stubs"

stray myocamlbuild.cmi

Here is a smaller issue. Running make; make install in the parmap directory triggers a findlib installation in .../site-lib/parmap/
One file is copied there that should not be: myocamlbuild.cmi
findlib then warns that it finds several interface files with the same name:

 findlib: [WARNING] Interface myocamlbuild.cmi occurs in several directories: ., /opt/local/lib/ocaml/site-lib/parmap

I guess the installation configuration should be revised so that this .cmi is not copied.

ocaml 4.08

Apparently parmap doesn't work with ocaml 4.08. Apparently Bigarray.Array1.map_file has disappeared.

lines are too long in many files

lines longer than 80 chars hinder the readability of the code

configure needlessly checks for camlp4

I didn't find any place where the build of Parmap needs camlp4, but configure checks for it. Is there a reason for this?

redirect error

My code is:

	Printf.eprintf
	  "prefix: %s %b\n" prefix (Sys.file_exists prefix);
	flush stderr;
	Parmap.parfold
	  ~init:(fun id ->
	    Printf.eprintf "init: %s %d\n" prefix id; flush stderr;
	    Parmap.redirect ~path:prefix ~id)

I'm getting:

prefix: nothing_17_10_2017:8:36:16_9316 false
init: nothing_17_10_2017:8:36:16_9316 0
init: nothing_17_10_2017:8:36:16_9316 1
[Pid 9338]: Error creating nothing_17_10_2017:8:36:16_9316 : File exists; proceeding without stdout/stderr redirection
init: nothing_17_10_2017:8:36:16_9316 2
init: nothing_17_10_2017:8:36:16_9316 3

What is going wrong?

Cores use for concurrent programs

When I run two or more copies of the same program, the subprocesses (say 4) for the different program copies always share the first 4 cores although there are other cores free. Is this the expected behaviour, that all programs request the first (4) cores?
Thanks in advance,

ocamldoc doesn't show up in ocp-browser

we might need to install the .cmt and .cmti files

I suspect the current implementations of parmapfold and parfold are far from optimal

One example: parallel folding over a list will first convert it into an array.
I think this is unnecessary.
I may send some propositions (as code) soon.

Running tests on 32bit arches

While tests work really fine on amd64, I'm trying to run them on arm (I've also seen similar reports for x86), and they fail:

 # ./floatscale.native 
Fatal error: exception Invalid_argument("Array.make")

This one looks like a bug in ocaml itself:

# nData="100000" ./floatscale.native 
Test: normal parmap
Testing scalability with 1 iterations on 8 to 8 cores, step 1
Sequential execution takes 0.087256 seconds
Parmap failure: result mismatch!
Speedup with 8 cores (average on 1 iterations): 0.333858 (tseq=0.087256, tpar=0.261357)
Test: specialised array parmap
Testing scalability with 1 iterations on 8 to 8 cores, step 1
Sequential execution takes 0.083198 seconds
Parmap warning: result order is not preserved (it was not expected to be).
Speedup with 8 cores (average on 1 iterations): 0.885123 (tseq=0.083198, tpar=0.093996)
Test: specialised float array parmap
Testing scalability with 1 iterations on 8 to 8 cores, step 1
Sequential execution takes 0.028458 seconds
Segmentation fault

Reducing nData even more makes this one pass:

# nData="10000" ./floatscale.native 
Test: normal parmap
Testing scalability with 1 iterations on 8 to 8 cores, step 1
Sequential execution takes 0.011752 seconds
Parmap failure: result mismatch!
Speedup with 8 cores (average on 1 iterations): 0.162444 (tseq=0.011752, tpar=0.072344)
Test: specialised array parmap
Testing scalability with 1 iterations on 8 to 8 cores, step 1
Sequential execution takes 0.001987 seconds
Parmap warning: result order is not preserved (it was not expected to be).
Speedup with 8 cores (average on 1 iterations): 0.074017 (tseq=0.001987, tpar=0.026845)
Test: specialised float array parmap
Testing scalability with 1 iterations on 8 to 8 cores, step 1
Sequential execution takes 0.002054 seconds
Parmap warning: result order is not preserved (it was not expected to be).
Speedup with 8 cores (average on 1 iterations): 0.152427 (tseq=0.002054, tpar=0.013475)

This one seems to work but is quite heavy (1h 20 mins to run):

# time ./simplescale_array.native 
Testing scalability with 2 iterations on 1 to 10 cores, step 1
Sequential execution takes 275.374505 seconds
Speedup with 1 cores (average on 2 iterations): 1.004610 (tseq=275.374505, tpar=274.110981)
Speedup with 2 cores (average on 2 iterations): 2.017963 (tseq=275.374505, tpar=136.461628)
Speedup with 3 cores (average on 2 iterations): 3.014285 (tseq=275.374505, tpar=91.356502)
Speedup with 4 cores (average on 2 iterations): 3.925457 (tseq=275.374505, tpar=70.150940)
Speedup with 5 cores (average on 2 iterations): 3.883548 (tseq=275.374505, tpar=70.907973)
Speedup with 6 cores (average on 2 iterations): 3.967147 (tseq=275.374505, tpar=69.413733)
Speedup with 7 cores (average on 2 iterations): 3.961677 (tseq=275.374505, tpar=69.509584)
Speedup with 8 cores (average on 2 iterations): 3.995084 (tseq=275.374505, tpar=68.928342)
Speedup with 9 cores (average on 2 iterations): 3.998568 (tseq=275.374505, tpar=68.868276)
Speedup with 10 cores (average on 2 iterations): 4.006729 (tseq=275.374505, tpar=68.728008)
Testing scalability with 2 iterations on 1 to 10 cores, step 1
Sequential execution takes 271.168359 seconds
Parmap failure: result order was expected to be preserved, and is not.
Parmap failure: result order was expected to be preserved, and is not.
Speedup with 1 cores (average on 2 iterations): 1.003275 (tseq=271.168359, tpar=270.283260)
Parmap failure: result order was expected to be preserved, and is not.
Parmap failure: result order was expected to be preserved, and is not.
Speedup with 2 cores (average on 2 iterations): 1.974649 (tseq=271.168359, tpar=137.324849)
Parmap failure: result order was expected to be preserved, and is not.
Parmap failure: result order was expected to be preserved, and is not.
Speedup with 3 cores (average on 2 iterations): 2.948170 (tseq=271.168359, tpar=91.978551)
Parmap failure: result order was expected to be preserved, and is not.
Parmap failure: result order was expected to be preserved, and is not.
Speedup with 4 cores (average on 2 iterations): 3.832768 (tseq=271.168359, tpar=70.750011)
Parmap failure: result order was expected to be preserved, and is not.
Parmap failure: result order was expected to be preserved, and is not.
Speedup with 5 cores (average on 2 iterations): 3.853020 (tseq=271.168359, tpar=70.378129)
Parmap failure: result order was expected to be preserved, and is not.
Parmap failure: result order was expected to be preserved, and is not.
Speedup with 6 cores (average on 2 iterations): 3.891766 (tseq=271.168359, tpar=69.677466)
Parmap failure: result order was expected to be preserved, and is not.
Parmap failure: result order was expected to be preserved, and is not.
Speedup with 7 cores (average on 2 iterations): 3.926030 (tseq=271.168359, tpar=69.069350)
Parmap failure: result order was expected to be preserved, and is not.
Parmap failure: result order was expected to be preserved, and is not.
Speedup with 8 cores (average on 2 iterations): 3.934487 (tseq=271.168359, tpar=68.920886)
Parmap failure: result order was expected to be preserved, and is not.
Parmap failure: result order was expected to be preserved, and is not.
Speedup with 9 cores (average on 2 iterations): 3.945327 (tseq=271.168359, tpar=68.731533)
Parmap failure: result order was expected to be preserved, and is not.
Parmap failure: result order was expected to be preserved, and is not.
Speedup with 10 cores (average on 2 iterations): 3.930145 (tseq=271.168359, tpar=68.997032)
Testing scalability with 2 iterations on 1 to 10 cores, step 1
Sequential execution takes 11.076638 seconds
Parmap warning: result order is not preserved (it was not expected to be).
Parmap warning: result order is not preserved (it was not expected to be).
Speedup with 1 cores (average on 2 iterations): 0.983674 (tseq=11.076638, tpar=11.260477)
Parmap warning: result order is not preserved (it was not expected to be).
Parmap warning: result order is not preserved (it was not expected to be).
Speedup with 2 cores (average on 2 iterations): 1.968036 (tseq=11.076638, tpar=5.628270)
Parmap warning: result order is not preserved (it was not expected to be).
Parmap warning: result order is not preserved (it was not expected to be).
Speedup with 3 cores (average on 2 iterations): 2.932193 (tseq=11.076638, tpar=3.777596)
Parmap warning: result order is not preserved (it was not expected to be).
Parmap warning: result order is not preserved (it was not expected to be).
Speedup with 4 cores (average on 2 iterations): 3.877352 (tseq=11.076638, tpar=2.856753)
Parmap warning: result order is not preserved (it was not expected to be).
Parmap warning: result order is not preserved (it was not expected to be).
Speedup with 5 cores (average on 2 iterations): 3.866129 (tseq=11.076638, tpar=2.865046)
Parmap warning: result order is not preserved (it was not expected to be).
Parmap warning: result order is not preserved (it was not expected to be).
Speedup with 6 cores (average on 2 iterations): 3.840747 (tseq=11.076638, tpar=2.883980)
Parmap warning: result order is not preserved (it was not expected to be).
Parmap warning: result order is not preserved (it was not expected to be).
Speedup with 7 cores (average on 2 iterations): 3.849689 (tseq=11.076638, tpar=2.877281)
Parmap warning: result order is not preserved (it was not expected to be).
Parmap warning: result order is not preserved (it was not expected to be).
Speedup with 8 cores (average on 2 iterations): 3.854308 (tseq=11.076638, tpar=2.873833)
Parmap warning: result order is not preserved (it was not expected to be).
Parmap warning: result order is not preserved (it was not expected to be).
Speedup with 9 cores (average on 2 iterations): 3.847798 (tseq=11.076638, tpar=2.878695)
Parmap warning: result order is not preserved (it was not expected to be).
Parmap warning: result order is not preserved (it was not expected to be).
Speedup with 10 cores (average on 2 iterations): 3.818525 (tseq=11.076638, tpar=2.900764)
Testing scalability with 2 iterations on 1 to 10 cores, step 1
Sequential execution takes 11.086917 seconds
Speedup with 1 cores (average on 2 iterations): 0.990678 (tseq=11.086917, tpar=11.191242)
Speedup with 2 cores (average on 2 iterations): 1.966918 (tseq=11.086917, tpar=5.636696)
Speedup with 3 cores (average on 2 iterations): 2.941194 (tseq=11.086917, tpar=3.769529)
Speedup with 4 cores (average on 2 iterations): 3.848697 (tseq=11.086917, tpar=2.880694)
Speedup with 5 cores (average on 2 iterations): 3.449896 (tseq=11.086917, tpar=3.213696)
Speedup with 6 cores (average on 2 iterations): 3.789007 (tseq=11.086917, tpar=2.926074)
Speedup with 7 cores (average on 2 iterations): 3.824533 (tseq=11.086917, tpar=2.898895)
Speedup with 8 cores (average on 2 iterations): 3.865556 (tseq=11.086917, tpar=2.868130)
Speedup with 9 cores (average on 2 iterations): 3.891372 (tseq=11.086917, tpar=2.849102)
Speedup with 10 cores (average on 2 iterations): 3.871551 (tseq=11.086917, tpar=2.863689)

real	78m5.130s
user	197m51.870s
sys	0m3.320s

And this one fails:

# ./simplescale.native 
*** Checking corner cases: call on empty lists and arrays must not raise an exception
*   parmap []
*   parmap [| |]
*   pariter []
*   pariter [| |]
*** Checking the code for non tail recursive calls: an exception here indicates there are some left
Testing scalability with 1 iterations on 2 to 2 cores, step 1
Sequential execution takes 5.213010 seconds
Fatal error: exception Invalid_argument("Array.make")

If I reduce the amount of data a bit by changing the line to:

scale_test (fun x -> x) (L (initsegm 1000000)) 1 2 2;;

(remove one zero)

then it works but is also rather long (3 hours):

# time ./simplescale.native 
*** Checking corner cases: call on empty lists and arrays must not raise an exception
*   parmap []
*   parmap [| |]
*   pariter []
*   pariter [| |]
*** Checking the code for non tail recursive calls: an exception here indicates there are some left
Testing scalability with 1 iterations on 2 to 2 cores, step 1
Sequential execution takes 0.318109 seconds
Speedup with 2 cores (average on 1 iterations): 0.170100 (tseq=0.318109, tpar=1.870130)
*** Checking that we properly parallelise execution if we have less tasks than cores: if you do not see 5 processes, there is a problem
*   Simplemapper 8 cores, 5 elements
[Parmap]: mapper on 5 elements, on 5 cores
[Parmap]: simplemapper on 5 elements, on 5 cores, chunksize = 1
*   Simpleiter 8 cores, 5 elements
[Parmap]: geniter on 5 elements, on 5 cores
[Parmap]: simplemapper on 5 elements, on 5 cores, chunksize = 1
*** Checking that we properly handle bogus core numbers
*   Simplemapper 0 cores
[Parmap]: mapper on 5 elements, on 1 cores
[Parmap]: simplemapper on 5 elements, on 1 cores, chunksize = 5
*   Simpleiter 0 cores
[Parmap]: geniter on 5 elements, on 1 cores
[Parmap]: simplemapper on 5 elements, on 1 cores, chunksize = 5
*** Computations on integer lists
Testing scalability with 2 iterations on 1 to 10 cores, step 1
Sequential execution takes 11.071200 seconds
Speedup with 1 cores (average on 2 iterations): 0.985225 (tseq=11.071200, tpar=11.237235)
Speedup with 2 cores (average on 2 iterations): 1.969637 (tseq=11.071200, tpar=5.620935)
Speedup with 3 cores (average on 2 iterations): 2.903535 (tseq=11.071200, tpar=3.813007)
Speedup with 4 cores (average on 2 iterations): 3.828603 (tseq=11.071200, tpar=2.891708)
Speedup with 5 cores (average on 2 iterations): 3.534386 (tseq=11.071200, tpar=3.132426)
Speedup with 6 cores (average on 2 iterations): 3.614958 (tseq=11.071200, tpar=3.062608)
Speedup with 7 cores (average on 2 iterations): 3.780064 (tseq=11.071200, tpar=2.928840)
Speedup with 8 cores (average on 2 iterations): 3.868722 (tseq=11.071200, tpar=2.861720)
Speedup with 9 cores (average on 2 iterations): 3.739794 (tseq=11.071200, tpar=2.960377)
Speedup with 10 cores (average on 2 iterations): 3.818951 (tseq=11.071200, tpar=2.899016)
*** Computations on integer lists (chunksize=100)
Testing scalability with 2 iterations on 1 to 10 cores, step 1
Sequential execution takes 11.144213 seconds
Parmap warning: result order is not preserved (it was not expected to be).
Parmap warning: result order is not preserved (it was not expected to be).
Speedup with 1 cores (average on 2 iterations): 0.996611 (tseq=11.144213, tpar=11.182109)
Parmap warning: result order is not preserved (it was not expected to be).
Parmap warning: result order is not preserved (it was not expected to be).
Speedup with 2 cores (average on 2 iterations): 1.985630 (tseq=11.144213, tpar=5.612432)
Parmap warning: result order is not preserved (it was not expected to be).
Parmap warning: result order is not preserved (it was not expected to be).
Speedup with 3 cores (average on 2 iterations): 2.942269 (tseq=11.144213, tpar=3.787625)
Parmap warning: result order is not preserved (it was not expected to be).
Parmap warning: result order is not preserved (it was not expected to be).
Speedup with 4 cores (average on 2 iterations): 3.900890 (tseq=11.144213, tpar=2.856838)
Parmap warning: result order is not preserved (it was not expected to be).
Parmap warning: result order is not preserved (it was not expected to be).
Speedup with 5 cores (average on 2 iterations): 3.883706 (tseq=11.144213, tpar=2.869479)
Parmap warning: result order is not preserved (it was not expected to be).
Parmap warning: result order is not preserved (it was not expected to be).
Speedup with 6 cores (average on 2 iterations): 3.832544 (tseq=11.144213, tpar=2.907785)
Parmap warning: result order is not preserved (it was not expected to be).
Parmap warning: result order is not preserved (it was not expected to be).
Speedup with 7 cores (average on 2 iterations): 3.868699 (tseq=11.144213, tpar=2.880610)
Parmap warning: result order is not preserved (it was not expected to be).
Parmap warning: result order is not preserved (it was not expected to be).
Speedup with 8 cores (average on 2 iterations): 3.879886 (tseq=11.144213, tpar=2.872304)
Parmap warning: result order is not preserved (it was not expected to be).
Parmap warning: result order is not preserved (it was not expected to be).
Speedup with 9 cores (average on 2 iterations): 3.823246 (tseq=11.144213, tpar=2.914856)
Parmap warning: result order is not preserved (it was not expected to be).
Parmap warning: result order is not preserved (it was not expected to be).
Speedup with 10 cores (average on 2 iterations): 3.841000 (tseq=11.144213, tpar=2.901384)
*** Computations on integer arrays
Testing scalability with 2 iterations on 1 to 10 cores, step 1
Sequential execution takes 11.231302 seconds
Speedup with 1 cores (average on 2 iterations): 1.003011 (tseq=11.231302, tpar=11.197590)
Speedup with 2 cores (average on 2 iterations): 1.995717 (tseq=11.231302, tpar=5.627703)
Speedup with 3 cores (average on 2 iterations): 2.980688 (tseq=11.231302, tpar=3.768024)
Speedup with 4 cores (average on 2 iterations): 3.942644 (tseq=11.231302, tpar=2.848673)
Speedup with 5 cores (average on 2 iterations): 3.613408 (tseq=11.231302, tpar=3.108230)
Speedup with 6 cores (average on 2 iterations): 3.796506 (tseq=11.231302, tpar=2.958326)
Speedup with 7 cores (average on 2 iterations): 3.766651 (tseq=11.231302, tpar=2.981774)
Speedup with 8 cores (average on 2 iterations): 3.901223 (tseq=11.231302, tpar=2.878918)
Speedup with 9 cores (average on 2 iterations): 3.833667 (tseq=11.231302, tpar=2.929650)
Speedup with 10 cores (average on 2 iterations): 3.860892 (tseq=11.231302, tpar=2.908991)
*** Computations on integer arrays (chunksize-100)
Testing scalability with 2 iterations on 1 to 10 cores, step 1
Sequential execution takes 11.145155 seconds
Parmap warning: result order is not preserved (it was not expected to be).
Parmap warning: result order is not preserved (it was not expected to be).
Speedup with 1 cores (average on 2 iterations): 0.990616 (tseq=11.145155, tpar=11.250728)
Parmap warning: result order is not preserved (it was not expected to be).
Parmap warning: result order is not preserved (it was not expected to be).
Speedup with 2 cores (average on 2 iterations): 1.990085 (tseq=11.145155, tpar=5.600341)
Parmap warning: result order is not preserved (it was not expected to be).
Parmap warning: result order is not preserved (it was not expected to be).
Speedup with 3 cores (average on 2 iterations): 2.920895 (tseq=11.145155, tpar=3.815665)
Parmap warning: result order is not preserved (it was not expected to be).
Parmap warning: result order is not preserved (it was not expected to be).
Speedup with 4 cores (average on 2 iterations): 3.900096 (tseq=11.145155, tpar=2.857661)
Parmap warning: result order is not preserved (it was not expected to be).
Parmap warning: result order is not preserved (it was not expected to be).
Speedup with 5 cores (average on 2 iterations): 3.844258 (tseq=11.145155, tpar=2.899170)
Parmap warning: result order is not preserved (it was not expected to be).
Parmap warning: result order is not preserved (it was not expected to be).
Speedup with 6 cores (average on 2 iterations): 3.863045 (tseq=11.145155, tpar=2.885070)
Parmap warning: result order is not preserved (it was not expected to be).
Parmap warning: result order is not preserved (it was not expected to be).
Speedup with 7 cores (average on 2 iterations): 3.838845 (tseq=11.145155, tpar=2.903257)
Parmap warning: result order is not preserved (it was not expected to be).
Parmap warning: result order is not preserved (it was not expected to be).
Speedup with 8 cores (average on 2 iterations): 3.829774 (tseq=11.145155, tpar=2.910134)
Parmap warning: result order is not preserved (it was not expected to be).
Parmap warning: result order is not preserved (it was not expected to be).
Speedup with 9 cores (average on 2 iterations): 3.817382 (tseq=11.145155, tpar=2.919581)
Parmap warning: result order is not preserved (it was not expected to be).
Parmap warning: result order is not preserved (it was not expected to be).
Speedup with 10 cores (average on 2 iterations): 3.852729 (tseq=11.145155, tpar=2.892795)
*** Computations on lists of floats
Testing scalability with 2 iterations on 1 to 10 cores, step 1
Sequential execution takes 266.002211 seconds
Speedup with 1 cores (average on 2 iterations): 0.996843 (tseq=266.002211, tpar=266.844535)
Speedup with 2 cores (average on 2 iterations): 1.947518 (tseq=266.002211, tpar=136.585236)
Speedup with 3 cores (average on 2 iterations): 2.912077 (tseq=266.002211, tpar=91.344489)
Speedup with 4 cores (average on 2 iterations): 3.812942 (tseq=266.002211, tpar=69.762977)
Speedup with 5 cores (average on 2 iterations): 3.852758 (tseq=266.002211, tpar=69.042026)
Speedup with 6 cores (average on 2 iterations): 3.871261 (tseq=266.002211, tpar=68.712037)
Speedup with 7 cores (average on 2 iterations): 3.869209 (tseq=266.002211, tpar=68.748471)
Speedup with 8 cores (average on 2 iterations): 3.921493 (tseq=266.002211, tpar=67.831877)
Speedup with 9 cores (average on 2 iterations): 3.895838 (tseq=266.002211, tpar=68.278555)
Speedup with 10 cores (average on 2 iterations): 3.907914 (tseq=266.002211, tpar=68.067569)
*** Computations on lists of floats (chunksize=100)
Testing scalability with 2 iterations on 1 to 10 cores, step 1
Sequential execution takes 267.217013 seconds
Parmap warning: result order is not preserved (it was not expected to be).
Parmap warning: result order is not preserved (it was not expected to be).
Speedup with 1 cores (average on 2 iterations): 0.998549 (tseq=267.217013, tpar=267.605340)
Parmap warning: result order is not preserved (it was not expected to be).
Parmap warning: result order is not preserved (it was not expected to be).
Speedup with 2 cores (average on 2 iterations): 1.990431 (tseq=267.217013, tpar=134.250836)
Parmap warning: result order is not preserved (it was not expected to be).
Parmap warning: result order is not preserved (it was not expected to be).
Speedup with 3 cores (average on 2 iterations): 2.965334 (tseq=267.217013, tpar=90.113622)
Parmap warning: result order is not preserved (it was not expected to be).
Parmap warning: result order is not preserved (it was not expected to be).
Speedup with 4 cores (average on 2 iterations): 3.922953 (tseq=267.217013, tpar=68.116284)
Parmap warning: result order is not preserved (it was not expected to be).
Parmap warning: result order is not preserved (it was not expected to be).
Speedup with 5 cores (average on 2 iterations): 3.356925 (tseq=267.217013, tpar=79.601733)
Parmap warning: result order is not preserved (it was not expected to be).
Parmap warning: result order is not preserved (it was not expected to be).
Speedup with 6 cores (average on 2 iterations): 3.687274 (tseq=267.217013, tpar=72.470067)
Parmap warning: result order is not preserved (it was not expected to be).
Parmap warning: result order is not preserved (it was not expected to be).
Speedup with 7 cores (average on 2 iterations): 3.522946 (tseq=267.217013, tpar=75.850436)
Parmap warning: result order is not preserved (it was not expected to be).
Parmap warning: result order is not preserved (it was not expected to be).
Speedup with 8 cores (average on 2 iterations): 3.912715 (tseq=267.217013, tpar=68.294516)
Parmap warning: result order is not preserved (it was not expected to be).
Parmap warning: result order is not preserved (it was not expected to be).
Speedup with 9 cores (average on 2 iterations): 3.460438 (tseq=267.217013, tpar=77.220567)
Parmap warning: result order is not preserved (it was not expected to be).
Parmap warning: result order is not preserved (it was not expected to be).
Speedup with 10 cores (average on 2 iterations): 3.761283 (tseq=267.217013, tpar=71.044107)
*** Computations on arrays of floats
Testing scalability with 2 iterations on 1 to 10 cores, step 1
Sequential execution takes 265.433520 seconds
Speedup with 1 cores (average on 2 iterations): 0.994272 (tseq=265.433520, tpar=266.962715)
Speedup with 2 cores (average on 2 iterations): 1.949320 (tseq=265.433520, tpar=136.167258)
Speedup with 3 cores (average on 2 iterations): 2.912681 (tseq=265.433520, tpar=91.130309)
Speedup with 4 cores (average on 2 iterations): 3.797807 (tseq=265.433520, tpar=69.891270)
Speedup with 5 cores (average on 2 iterations): 3.323223 (tseq=265.433520, tpar=79.872306)
Speedup with 6 cores (average on 2 iterations): 3.768237 (tseq=265.433520, tpar=70.439718)
Speedup with 7 cores (average on 2 iterations): 3.865783 (tseq=265.433520, tpar=68.662289)
Speedup with 8 cores (average on 2 iterations): 3.818341 (tseq=265.433520, tpar=69.515413)
Speedup with 9 cores (average on 2 iterations): 3.427148 (tseq=265.433520, tpar=77.450255)
Speedup with 10 cores (average on 2 iterations): 3.925661 (tseq=265.433520, tpar=67.614982)
*** Computations on arrays of floats (chunksize=100)
Testing scalability with 2 iterations on 1 to 10 cores, step 1
Sequential execution takes 269.548036 seconds
Parmap warning: result order is not preserved (it was not expected to be).
Parmap warning: result order is not preserved (it was not expected to be).
Speedup with 1 cores (average on 2 iterations): 1.008679 (tseq=269.548036, tpar=267.228713)
Parmap warning: result order is not preserved (it was not expected to be).
Parmap warning: result order is not preserved (it was not expected to be).
Speedup with 2 cores (average on 2 iterations): 2.009827 (tseq=269.548036, tpar=134.115069)
Parmap warning: result order is not preserved (it was not expected to be).
Parmap warning: result order is not preserved (it was not expected to be).
Speedup with 3 cores (average on 2 iterations): 2.994604 (tseq=269.548036, tpar=90.011240)
Parmap warning: result order is not preserved (it was not expected to be).
Parmap warning: result order is not preserved (it was not expected to be).
Speedup with 4 cores (average on 2 iterations): 3.963357 (tseq=269.548036, tpar=68.010027)
Parmap warning: result order is not preserved (it was not expected to be).
Parmap warning: result order is not preserved (it was not expected to be).
Speedup with 5 cores (average on 2 iterations): 3.909074 (tseq=269.548036, tpar=68.954439)
Parmap warning: result order is not preserved (it was not expected to be).
Parmap warning: result order is not preserved (it was not expected to be).
Speedup with 6 cores (average on 2 iterations): 3.409723 (tseq=269.548036, tpar=79.052765)
Parmap warning: result order is not preserved (it was not expected to be).
Parmap warning: result order is not preserved (it was not expected to be).
Speedup with 7 cores (average on 2 iterations): 3.970276 (tseq=269.548036, tpar=67.891507)
Parmap warning: result order is not preserved (it was not expected to be).
Parmap warning: result order is not preserved (it was not expected to be).
Speedup with 8 cores (average on 2 iterations): 3.961305 (tseq=269.548036, tpar=68.045254)
Parmap warning: result order is not preserved (it was not expected to be).
Parmap warning: result order is not preserved (it was not expected to be).
Speedup with 9 cores (average on 2 iterations): 3.936089 (tseq=269.548036, tpar=68.481176)
Parmap warning: result order is not preserved (it was not expected to be).
Parmap warning: result order is not preserved (it was not expected to be).
Speedup with 10 cores (average on 2 iterations): 3.914925 (tseq=269.548036, tpar=68.851392)

real	156m9.672s
user	390m24.870s
sys	0m10.400s

All the other tests not mentioned here pass properly.

Now, my question is: is it possible to reduce the amount of data used by the tests in this way? Do they remain meaningful?

My goal in running the tests is to check that they give good coverage of parmap, so that if they pass, parmap can be expected to work properly. It seems to me that these tests are more about getting accurate speedup results. Can I skip some of them and keep the same level of safety with respect to my goal? If yes, which ones?

Fatal error: exception Failure("input_value_from_block: bad object")

Has anyone seen this before?

I have some code using Parmap that crashes when run in parallel but not when run sequentially (without Parmap).

ocamlfind: [WARNING] You have installed DLLs but the directory /home/xtal/.opam/4.01.0/lib/stublibs is not mentioned in ld.conf

How can I correct this warning?

Fatal error: exception End_of_file

Hi Roberto,

Thanks a lot for parmap which I was using successfully until recently. With v1.0-rc5 array_float_parmap returns the above exception for a source array of ~90K elements, even with ncores = 1. There does not seem to be any memory issue as around half of the computer's memory is free when the exception is raised. This is on Ubuntu 14.04 64bits btw.

Matt

Some code lines are > 80 chars

I like to edit several files at the same time
on my screens. ;)

Parmap.pariter and side effects

The following program:

let i = ref 0
let _ =
  Parmap.pariter ~ncores:2 (fun k ->
    incr i;
    Printf.printf "loop %i : i:%i\n%!" k !i;
  ) (Parmap.L [1;2;3;4]);
  Printf.printf "end : i:%i\n%!" !i;

prints:
./pb_parmap.native
loop 1 : i:1
loop 2 : i:2
loop 3 : i:1
loop 4 : i:2
end : i:0

which is not what I expected after reading the presentation...

The paragraph "By forking the parent process on a single machine, the children get access, for free, to all the data structures already built, even the imperative ones, and as far as your computation inside the map/fold does not produce side effects that need to be preserved, the final result will be the same as performing the sequential operation, the only difference is that you might get it faster" is missing a warning: updates that a child makes to imperative values defined outside the Parmap.par* call are not visible to the parent or to the other processes.

Thanks for this interesting library

Some "progress bar"

Hello,

It's true that we should find some optional way for the master process to get feedback on the progress of the tasks run by the worker processes.

The right solution should not hinder performance.

There should be a parameter controlling how many tasks must be completed before feedback is sent, so that the overhead can be kept under control.

Regards,
F.
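
The "feedback every n tasks" idea can be sketched in plain OCaml as follows. This is only an illustration under stated assumptions: with_progress, every and report are hypothetical names, not Parmap parameters. Because each forked worker has its own copy of the counter, the count is per worker, and getting it back to the master would need some channel (for instance a pipe) that is not shown here.

```ocaml
(* Hypothetical helper: wrap [f] so that [report] is called once every
   [every] completed tasks, with the number of tasks done so far. *)
let with_progress ~(every : int) ~(report : int -> unit) (f : 'a -> 'b)
  : 'a -> 'b =
  if every <= 0 then invalid_arg "with_progress: every must be positive";
  let completed = ref 0 in
  fun x ->
    let y = f x in
    incr completed;
    (* Reporting only every [every] tasks keeps the overhead bounded. *)
    if !completed mod every = 0 then report !completed;
    y

let () =
  let f =
    with_progress ~every:3
      ~report:(Printf.printf "%d tasks done\n")
      (fun x -> x * x)
  in
  ignore (List.map f [1; 2; 3; 4; 5; 6; 7])
```

A larger value of every means fewer reports and therefore less overhead, which is the trade-off the request asks to expose.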

findlib: [WARNING] Interface myocamlbuild.cmi occurs in several directories: ., /home/berenger/.opam/4.00.1/lib/parmap

opam list parmap

Available packages for 4.00.1:
parmap 1.0-rc2

A stray .cmi file is left behind after installing parmap via OPAM.

That's a little annoying, as it produces compiler warnings.

obey DESTDIR

cf. failed trial #37

build error: No rule to make target 'setcore.cmi', needed by 'parmap.cma'

bash-4.3# git remote -v
origin https://github.com/rdicosmo/parmap.git (fetch)
origin https://github.com/rdicosmo/parmap.git (push)
bash-4.3#

bash-4.3# git pull
Already up-to-date.

bash-4.3# git status
On branch master
Your branch is up-to-date with 'origin/master'.
nothing to commit, working tree clean

bash-4.3# make
ocamlfind ocamlc -package "unix bigarray " -c bytearray.mli
ocamlfind ocamlc -package "unix bigarray " -c parmap_utils.mli
ocamlfind ocamlc -package "unix bigarray " -c parmap.mli
make: *** No rule to make target 'setcore.cmi', needed by 'parmap.cma'. Stop.

A workaround for this problem is to manually compile setcore.ml:
bash-4.3# ocamlc -g -c setcore.ml
bash-4.3# make
This time we can compile successfully.

windows

A question rather than an issue: Does Parmap work on Windows?

round up

It seems that parmap rounds down when statically distributing tasks to cores. For example, I have 284 tasks and 48 cores, which gives 47 cores with 5 tasks each and one core with 49. This represents a pretty massive hit on the benefits of parallelism. Giving 6 tasks to most cores would be a big performance improvement. Everything is fine with ~chunksize:1, but the need for this may not be obvious to most users.
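
The arithmetic can be checked with a small model. This is a sketch of the split as described in this report, not Parmap's actual code; split_floor, split_balanced and max_load are hypothetical names.

```ocaml
(* Model of the static split described above; not Parmap's actual code.
   Every worker but the last gets tasks / cores; the last takes the rest. *)
let split_floor ~tasks ~cores =
  let per_core = tasks / cores in
  Array.init cores (fun i ->
    if i < cores - 1 then per_core else tasks - per_core * (cores - 1))

(* Alternative: spread the remainder, so loads differ by at most one task. *)
let split_balanced ~tasks ~cores =
  let per_core = tasks / cores and extra = tasks mod cores in
  Array.init cores (fun i -> if i < extra then per_core + 1 else per_core)

(* The slowest worker determines the total running time. *)
let max_load loads = Array.fold_left max 0 loads

let () =
  let tasks = 284 and cores = 48 in
  Printf.printf "floor: max load %d\n" (max_load (split_floor ~tasks ~cores));
  Printf.printf "balanced: max load %d\n"
    (max_load (split_balanced ~tasks ~cores))
```

For 284 tasks on 48 cores, the first split leaves one worker with 49 tasks, while the balanced one gives 44 workers 6 tasks and 4 workers 5 tasks, so the largest load drops from 49 to 6.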

process initialization and finalization

For complex things, it would be very handy to be able to register an init function and a finalize function that would be run by each worker process:

The init function would be called only once by each child process, just after the process is created.
The finalize function would be called only once by each child process, just before it exits.

This would make it possible, for example, to set up and clean up per-process output files for workers of Array.iteri or List.iteri.
Maybe these functions should be called process_setup and process_cleanup, or some better name.

I cannot install properly parmap from the sources anymore

Hello,

If I install from the sources, I get:

aclocal -I m4
autoconf
autoheader
./configure --prefix ~/.opam/4.01.0
make
make install

That worked OK, but:

# ocamlfind -query parmap
ocamlfind: Package `parmap' not found
# ocamlfind -query batteries
/home/xtal/.opam/4.01.0/lib/batteries

# opam --version
1.1.1
# ocaml -version
The OCaml toplevel, version 4.01.0
# cat /etc/issue
CentOS release 6.5 (Final)

# make tests
ocamlbuild -j 10 -use-ocamlfind  tests/simplescale.native tests/floatscale.native tests/simplescale_array.native tests/simplescalefold.native tests/simplescalemapfold.native
Finished, 0 targets (0 cached) in 00:00:00.
+ ocamlfind ocamldep -package unix -package bigarray -package parmap -modules tests/simplescale.ml > tests/simplescale.ml.depends
ocamlfind: Package `parmap' not found
Command exited with code 2.
Compilation unsuccessful after building 1 target (0 cached) in 00:00:00.
make: *** [tests] Error 10

I think the documentation is misleading on this matter.

Best regards

do you want parallel file operations?

I have this one currently:

let parmap_on_file (ncores: int) (fn: string) (f: 'a -> 'b) (read_one: in_channel -> 'a): 'b list = ...
