libmir / mir


Mir (backports): Sparse tensors, Hoffman

Home Page: http://mir.libmir.org

License: Boost Software License 1.0

D 96.11% Meson 2.47% Makefile 1.42%

mir glas blas mir-glas numeric math llvm

mir's Introduction

❗️ ndslice was reworked and moved to Mir-Algorithm.

The last Mir version with old ndslice is v0.22.1.

❗️ Mir GLAS was moved to https://github.com/libmir/mir-glas.

Mir

Generic Numerical Library for Science and Machine Learning.

Separated Mir Projects

Mir Algorithm - Multidimensional arrays (ndslice), iterators, algorithms.
Mir Random - Professional Random Number Generators
Mir GLAS - Linear Algebra Library (Experimental, not supported for now)
Mir BLAS - Bindings to libraries with CBLAS API like OpenBLAS and Intel MKL.
Mir LAPACK - Bindings to libraries with LAPACK API like OpenBLAS and Intel MKL.
Mir Optim - Nonlinear Solvers.
Mir CPUID - CPU Identification routines (less buggy than Phobos).

Documentation

The API documentation can be found here.

mir.glas - Generic Linear Algebra Subroutines
mir.sparse - Sparse Tensors
Sparse - DOK format
Different ranges for COO format
CompressedTensor - CSR/CSC formats
mir.sparse.blas - Sparse BLAS for CompressedTensor
mir.model.lda.hoffman - Online variational Bayes for latent Dirichlet allocation (Online VB LDA) for sparse documents. LDA is used for topic modeling.
mir.combinatorics - Combinations, combinations with repeats, cartesian power, permutations.

Compatibility

	Linux	Mac OS X	Windows
64-bit
32-bit		N/A	N/A

Example

/+dub.sdl:
dependency "mir" version="~>3.1.0"
+/
import std.stdio;
import mir.combinatorics;
void main(string[] args)
{
    writeln([1, 2].combinations);
}

Fast setup with the dub package manager

Dub is D's package manager. You can create a new project with:

dub init <project-name>

Now edit dub.json to add mir as a dependency.

{
	...
	"dependencies": {
		"mir": "~><current-version>"
	},
	"dflags-ldc": ["-mcpu=native"]
}

Now you can create an app.d file in the source folder and run your code with

dub --compiler=ldmd2

The flag --build=release can be added for a performance boost:

dub --compiler=ldmd2 --build=release

ldmd2 is a wrapper around LDC (LLVM D Compiler).

"dflags-ldc": ["-mcpu=native"] allows LDC to optimize Mir for your CPU.

Contributing

See our TODO List. Mir is very young and we welcome contributions to source code, documentation, examples and benchmarks.

mir's People

Contributors

transformersprimeabcxyz wilzbach ljubobratovicrelja john-colvin haraldzealot petarkirov rjmcguire martinnowak 5632741 henrygouk shigekikarita awesome-ml wangyx0055 bausshf n8sh geod24 rjkilpatrick dut3062796s 00mjk

mir's Issues

fromArray

It would be nice to have a way to convert a multi-dimensional array to a slice, e.g.

[[1,2], [3, 4]].fromArray

mir.las.sum: doesn't compile on x86

Fails in line 1827, as nextDown and nextUp aren't defined:

assert(nextDown(s) <= r && r <= nextUp(s) || s.isNaN && r.isNaN);

Squeeze

Yet another nice method that NumPy has.

Remove single-dimensional entries from the shape of an array.

>>> x = np.array([[[0], [1], [2]]])
>>> x.shape
(1, 3, 1)
>>> np.squeeze(x).shape
(3,)
>>> np.squeeze(x, axis=(2,)).shape
(1, 3)

https://docs.scipy.org/doc/numpy-1.10.1/reference/generated/numpy.squeeze.html#numpy.squeeze
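
To make the requested behaviour concrete, here is a minimal sketch in plain Python (chosen to match the NumPy sessions quoted in these issues). It only computes the resulting shape. The name `squeeze_shape` is hypothetical and is not part of Mir or NumPy. For a strided view such as an ndslice, squeezing should presumably only need to drop the matching lengths and strides, without touching the data.

```python
def squeeze_shape(shape, axis=None):
    """Return `shape` with length-1 dimensions removed (hypothetical helper).

    With axis=None every length-1 dimension is dropped; otherwise only the
    listed axes are dropped, and each of them must have length 1.
    """
    if axis is None:
        return tuple(n for n in shape if n != 1)
    axes = {a % len(shape) for a in axis}  # also accept negative axes
    for a in axes:
        if shape[a] != 1:
            raise ValueError("cannot squeeze an axis whose length is not 1")
    return tuple(n for i, n in enumerate(shape) if i not in axes)


print(squeeze_shape((1, 3, 1)))             # (3,)
print(squeeze_shape((1, 3, 1), axis=(2,)))  # (1, 3)
```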

expose createSlice

I found the following pattern hidden in a unittest - why is this not public?

This would be very useful for array creation in general, because it would cover NumPy's popular zeros and ones, and, with createSlice(3,3).diag[] = 1, also identity.

Moreover, I would suggest using a default type or adding some convenience wrappers like the NumPy ones mentioned above.

auto createSlice(T, Lengths...)(Lengths lengths)
{
    return createSlice2!(T, Lengths.length)(cast(size_t[Lengths.length])[lengths]);
}

///ditto
auto createSlice2(T, size_t N)(auto ref size_t[N] lengths)
{
    size_t length = lengths[0];
    foreach (len; lengths[1 .. N])
        length *= len;
    return new T[length].sliced(lengths);
}


pure nothrow unittest
{
    auto slice = createSlice!int(5, 6, 7);
    assert(slice.length == 5);
    assert(slice.elementsCount == 5 * 6 * 7);
    static assert(is(typeof(slice) == Slice!(3, int*)));
}

For what it is worth I also saw a makeSlice which is based on the new allocator.

import std.experimental.allocator;

auto makeSlice(T, Allocator, Lengths...)(auto ref Allocator alloc, Lengths lengths)
{
    enum N = Lengths.length;
    struct Result { T[] array; Slice!(N, T*) slice; }
    size_t length = lengths[0];
    foreach (len; lengths[1 .. N])
        length *= len;
    T[] a = alloc.makeArray!T(length);
    return Result(a, a.sliced(lengths));
}

unittest
{
    auto tup = makeSlice!int(theAllocator, 2, 3, 4);

    static assert(is(typeof(tup.array) == int[]));
    static assert(is(typeof(tup.slice) == Slice!(3, int*)));

    assert(tup.array.length           == 24);
    assert(tup.slice.elementsCount    == 24);
    assert(tup.array.ptr == &tup.slice[0, 0, 0]);

    theAllocator.dispose(tup.array);
}

Github name space - d-science and d-mir

Hey, just a short heads-up that I reserved both d-science and d-mir, in case we want to switch to one of them.
You should receive invitations soon.

mir.combinatorics: behavior with tuples

It doesn't compile yet with tuples (which we should fix) - and only returns a result in the same type:

import std.range: only;
auto projection = 2.permutations.indexedRoR(only('a', 1));
projection.front; // [97, 1]

mir.glas: linear algebra subroutines

Common linear algebra functions. Probably this is part of the Blas integration?

See the wiki for NumPy's capabilities: https://github.com/DlangScience/mir/wiki/NumPy:-Linear-algebra


mir.probcounting: hyperloglog algorithm implementation

https://en.wikipedia.org/wiki/HyperLogLog
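
As a rough idea of what such a module would have to implement, here is a compact HyperLogLog sketch in plain Python. It follows the basic estimator described on the linked Wikipedia page, with the small-range correction but without a large-range one. The class name, the choice of SHA-1 as the hash, and the default precision `p=12` are assumptions made for this illustration and say nothing about how a Mir implementation would look.

```python
import hashlib
import math


class HyperLogLog:
    """Approximate distinct counter with 2**p registers (illustrative only)."""

    def __init__(self, p=12):
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, item):
        digest = hashlib.sha1(str(item).encode("utf-8")).digest()
        x = int.from_bytes(digest[:8], "big")   # use 64 bits of the hash
        width = 64 - self.p
        index = x >> width                      # first p bits pick a register
        rest = x & ((1 << width) - 1)           # remaining bits
        # rank = 1-based position of the leftmost 1-bit in `rest`
        rank = width - rest.bit_length() + 1
        if rank > self.registers[index]:
            self.registers[index] = rank

    def estimate(self):
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)        # bias constant for m >= 128
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros:
            return m * math.log(m / zeros)      # small-range correction
        return raw


hll = HyperLogLog()
for i in range(50000):
    hll.add(i)
print(round(hll.estimate()))  # close to 50000, typically within a few percent
```

The memory footprint is fixed by `p` (here 4096 small registers), independent of how many items are added, which is the main appeal of the algorithm.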

indexSlice documentation

What does indexSlice actually do? It's not clear to me from the documentation or the rather small unittests.

better access to allowDownsize

It seems that the default for the allowDownsize flag has recently been changed (at least in the latest version in Phobos).

This makes sense, but we should have handier access to the flag; the current syntax is annoying:

iota(10).sliced!(Yes.replaceArrayWithPointer, Yes.allowDownsize)(4);

indexing a slice with an array causes an error inside ndslice

e.g. mySlice[[1,2,3]] gives

/Users/john/.dub/packages/mir-0.10.2/source/mir/ndslice/slice.d(1556): Error: no property 'i' for type 'int[]'
/Users/john/.dub/packages/mir-0.10.2/source/mir/ndslice/slice.d(1557): Error: no property 'j' for type 'int[]'
/Users/john/.dub/packages/mir-0.10.2/source/mir/ndslice/slice.d(1557): Error: no property 'i' for type 'int[]'
test.d(8): Error: template instance mir.ndslice.slice.Slice!(3LU, ulong*).Slice.opIndex!(int[]) error instantiating

it should either work or should error at the API level, not internally.

allow strided slice indexing: m[0, 0..5, _, R(0, $, 2), R($-1, 0, -2) ]

discussed here: https://docs.google.com/document/d/1cEf8AynZEZxlTENJx1i4w1GTB481bbc4iUbUeJT00-U/edit#heading=h.kv2inl5g6n2a

Also note that i..j only works within opIndex, not within opCall (math index), so R(i,j) is needed for opCall (and should also be allowed for opIndex for uniformity)

Also this syntax sugar (built on top of R) is convenient:

enum _=R();
R(...).reverse
inclusive(a,b,-2)

also discussed in the doc


mir logo

As using the "official D logo" doesn't make some parts of the community happy, I propose a couple of ideas for a custom mir logo. This round is just about finding 3-5 nice fonts that we want to try in the future.

So this round is not about criticizing the rocket (next round!), however don't hesitate to send me ideas for symbol(s) that could potentially represent mir.

You can browse their rendered png version here and find the svg source here.

Artwork

Atoms: https://commons.wikimedia.org/wiki/File:TBBTAtmBlack.svg (CC3 with attribution - there's a whole bunch of atom vector graphics, so there is probably one which doesn't require republishing under CC. Alternatively we could draw this ourselves)
Rocket: https://pixabay.com/en/rocket-space-shuttle-ship-black-303886 (CC0 Public Domain: Free for commercial use, no attribution required)
Fonts: We probably have to check the licenses once we have narrowed down our selection - however, fonts are usually pretty liberal.

Update: attached ideas.zip for archiving purposes and mirror from Google Drive.

Remove non `v` git tags

I just looked through the git tags and saw that a couple of them didn't use the v format:

0.0.10
0.0.14
0.0.15
0.6.14
0.8.7

You probably want to remove them as long as mir isn't that often forked.

mir.stat: statistical functions

TODO

Probabilistic counting. See https://en.wikipedia.org/wiki/HyperLogLog

Statistical functions and algorithms.

NumPy's capabilities:
https://github.com/DlangScience/mir/wiki/NumPy:-Statistics


zero property functions should be const

If the @property attribute is provided, Dscanner warns that a zero-argument property function should be const. After all, D supports separate getter and setter overloads (see below), and a const getter yields a clearer API and avoids mistakes if the user tries to assign something to the property function.

size_t a()
{
 return _a;
}

size_t a(size_t a)
{
  return _a = a;
}

There is a bug in dscanner that prevents the warning from being triggered if the @property attribute is provided before the function name - so I suggest we keep this here as a reminder and try to tackle it once we have a bit of time and no open PRs at Phobos?

Maybe you have a "code quality" or "code style" tag?


mir.combinatorics: uint on windows x86

Seems like I should add some casts - will prepare a PR.

source\mir\combinatorics\package.d(129,27): Error: safe function 'mir.combinatorics.__unittestL122_2' cannot call system function 'mir.combinatorics.binomial!(BigInt, int).binomial' 
source\mir\combinatorics\package.d(700,35): Error: cannot implicitly convert expression (binomial(n, repeatLen)) of type ulong to uint
source\mir\combinatorics\package.d(983,34): Error: cannot implicitly convert expression (binomial(n + repeatLen - 1u, repeatLen)) of type ulong to uint

allow upSize for sliced

It would be quite nice to be able to fit an existing slice into a larger one (for sliced).
Have a look at this example from NumPy:

>>> b = np.array([[0, 1], [2, 3]])
>>> b.resize(2, 3) # new_shape parameter doesn't have to be a tuple
>>> b
array([[0, 1, 2],
       [3, 0, 0]])

I imagine something like this:

iota(4).sliced!(Yes.allowUpsize)(2, 3)

(but be aware of the flag "hell" - see #18)
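
The semantics of the NumPy call above can be spelled out in a few lines of plain Python: the data is read in row-major order, then truncated or padded with zeros to fit the new shape. This is only a sketch of the behaviour being asked for. The helper names `resized` and `_nest` are hypothetical, and an ndslice version would additionally have to decide who owns the newly allocated elements.

```python
def _nest(data, shape):
    """Fold a flat row-major list into nested lists of the given shape."""
    if len(shape) == 1:
        return list(data)
    step = 1
    for n in shape[1:]:
        step *= n
    return [_nest(data[i * step:(i + 1) * step], shape[1:])
            for i in range(shape[0])]


def resized(flat, shape):
    """Fit `flat` into `shape`, truncating or zero-padding as needed."""
    total = 1
    for n in shape:
        total *= n
    data = list(flat[:total]) + [0] * max(0, total - len(flat))
    return _nest(data, shape)


print(resized([0, 1, 2, 3], (2, 3)))  # [[0, 1, 2], [3, 0, 0]]
```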

merging back into phobos

Just out of interest, how do you plan to accomplish this, given the following statement?

mir.ndslice is a development version of the std.experimental.ndslice package.

At least with the changed paths it sounds like a lot of work to get new commits from here to phobos without messing up the history ...

Allow down and upSize for reshape

(related to #19)

Moreover, having downsize and upsize for reshape makes sense too, otherwise
people will end up with something like this:

auto s = iota(4).sliced(2, 2);
assert(s[0..1,0..2] == [[0, 1]]);
assert(s[0..2,0..1] == [[0], [2]]);

This works nicely if we just "cut" a slice. However, once we want something else, we have to use an ugly pattern like this:

auto s2 = iota(9).sliced(3, 3);
assert(s2.byElement.sliced!(Yes.replaceArrayWithPointer,
       Yes.allowDownsize)(4, 1) == [[0], [1], [2], [3]]);

NumPy users migration guide

Starting with #27, there are already a couple of wiki entries about NumPy's API. The main idea of this issue is to complete this "cheatsheet" and create a guide for an easy migration from NumPy to mir.

This will probably be pending for a long time (as some functionality is still missing), but we shouldn't lose sight of it. Creating a happy user base is important :)

mir.example: use-cases and common examples

As mentioned briefly in another issue we might want to put down a couple of examples in a separate package.

It could also contain a "mapping/guide" for NumPy users (see #77)


Row/column major ordering

It is sometimes useful to apply an operation in "Fortran mode"
(=column-major, everted)

Have a look at this example from NumPy:

>>> a = np.arange(6).reshape((3, 2))
>>> a.reshape(2,3, order='F')
array([[0, 4, 3],
       [2, 1, 5]])

You can think of this as "look at the array in column mode",
apply the operation and "write it back in column mode".

So in Python one can actually "simulate" it with the following

>>> a.T.reshape(3,2).T
array([[0, 4, 3],
       [2, 1, 5]])

If one tries the example with ndslice we get

b.everted.reshape(2, 3).everted.writeln;
// [[0, 1], [2, 3], [4, 5]]

which makes a bit of sense, because everted just reverses the order of the dimensions.
It is possible to do the same as above correctly with byElement and array.
However it feels wrong to have to copy the array.

b.everted.byElement.array.sliced(3,2).everted.writeln;
// [[0, 4, 3], [2, 1, 5]]
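
For reference, the column-major reshape itself is small enough to write out by hand. The sketch below, in plain Python for 2-D nested lists only, reads the source column by column and fills the target column by column, and then checks that the transpose trick from the NumPy session gives the same answer. The helper names are hypothetical. Like the byElement/array workaround, it copies the data, so it illustrates the semantics rather than a copy-free solution.

```python
def transpose(rows):
    return [list(col) for col in zip(*rows)]


def reshape_c(rows, new_shape):
    """Row-major ("C") reshape of a 2-D nested list."""
    flat = [v for row in rows for v in row]
    n_rows, n_cols = new_shape
    return [flat[r * n_cols:(r + 1) * n_cols] for r in range(n_rows)]


def reshape_f(rows, new_shape):
    """Column-major ("Fortran") reshape of a 2-D nested list."""
    n_rows, n_cols = len(rows), len(rows[0])
    new_rows, new_cols = new_shape
    if n_rows * n_cols != new_rows * new_cols:
        raise ValueError("element count must not change")
    # Read the source column by column ...
    flat = [rows[r][c] for c in range(n_cols) for r in range(n_rows)]
    # ... and fill the target column by column as well.
    return [[flat[c * new_rows + r] for c in range(new_cols)]
            for r in range(new_rows)]


a = [[0, 1], [2, 3], [4, 5]]
print(reshape_f(a, (2, 3)))                        # [[0, 4, 3], [2, 1, 5]]
print(transpose(reshape_c(transpose(a), (3, 2))))  # [[0, 4, 3], [2, 1, 5]]
```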

More convenient Iota wrapper

Looking through the unittests, I see this pattern coming up in more than 80% of all tests.

auto b = 4.iota.sliced(2, 2);
auto tensor = 60.iota.array.sliced(3, 4, 5);
auto a = 240000.iota.sliced(10, 20, 30, 40);

I do like this pattern, but seeing 1) how often it occurs and 2) that you need to know the size of the iota now that downsizing is disabled by default, it seems to me that a convenience wrapper for this, like Iota, would be worthwhile.

Iota.sliced(10, 20, 30, 40);

It looks nicer and avoids the problem of mistyping the number of elements, which can only be checked at runtime - e.g. 24000.iota.sliced(10, 20, 30, 40); will result in a hard-to-see runtime error.

I haven't thought much about how we could combine this with the start and step size of the usual iota, but how about:

Iota!10(10,20,30,40) // starts at ten
Iota!(10, 2)(10,20,30,40) // starts at ten, with steps of 2
Iota!(10, 0.5)(10,20,30,40) // starts at ten, with steps of 0.5

Would be happy to send a PR for this ;-)

Swap axis

I am still looking at NumPy and ndslice ;-)
Here's something nice that is still missing.

Interchange two axes of an array.

I know there is everted, which reverses the order of all axes, but
sometimes it is useful to swap just two specific ones.

>>> x = np.array([[[0, 1], [2, 3]], [[4, 5], [6, 7]]])
>>> np.swapaxes(x,0,2)
array([[[0, 4],
        [2, 6]],
       [[1, 5],
        [3, 7]]])

https://docs.scipy.org/doc/numpy-1.10.1/reference/generated/numpy.swapaxes.html
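
What the operation does to the indices can be shown with a small plain-Python sketch for the 3-D case. The input `x` below is the 2x2x2 array implied by the output in the NumPy session, and `swapaxes3` is a hypothetical helper written for this illustration. On a strided view, the same effect should be achievable by swapping two entries of the lengths and strides arrays, without moving any data.

```python
def swapaxes3(x, a, b):
    """Swap axes `a` and `b` of a 3-D nested list (copies the data)."""
    shape = [len(x), len(x[0]), len(x[0][0])]
    new_shape = list(shape)
    new_shape[a], new_shape[b] = shape[b], shape[a]

    def source(i, j, k):
        # Map an index of the result back to an index of the input.
        idx = [i, j, k]
        idx[a], idx[b] = idx[b], idx[a]
        return x[idx[0]][idx[1]][idx[2]]

    return [[[source(i, j, k) for k in range(new_shape[2])]
             for j in range(new_shape[1])]
            for i in range(new_shape[0])]


x = [[[0, 1], [2, 3]], [[4, 5], [6, 7]]]
print(swapaxes3(x, 0, 2))  # [[[0, 4], [2, 6]], [[1, 5], [3, 7]]]
```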

mir.dlang.io - create a new web presence

I own dlang.io, so we can use science.dlang.io or mir.dlang.io for a new web presence :)


slice and makeSlice should accept ranges

Reason: When one uses an input range like iota or iotaSlice, which doesn't allow modifications (i.e. doesn't implement lvalue indexing), the following two don't work:

auto a = 6.iotaSlice(2, 3);
a[0, 0] = 1; // not an lvalue
auto b = 6.iota.sliced(2, 3);
b[0, 0] = 1; // not an lvalue

However when using the new allocation APIs, we can allocate memory:

auto a = 6.iotaSlice(2, 3).slice; // we create a modifiable copy
a[0, 0] = 1; 
auto b = 6.iota.array.sliced(2, 3); // or duplicate directly
b[0, 0] = 1;

Thus imho it would be nice if slice directly accepted a range that fills the allocated array.
Basically a short wrapper around:

auto a = slice!int(2, 2);
a[] = iota(4).sliced(2, 2);

What do you think?

reshape should be nogc

[came up after reading comment https://github.com//issues/1#issuecomment-170299410]

I think sliced and reshape should be nogc, eg:
auto slice = 1000.iota.sliced([5, 6, 7], 9);
=> should be completely gc free.

The only thing that comes to my mind is formatted error messages, for which #1 (comment) provides a nogc solution [to be generalized to more than 1 integer to format, if need be]

Windows tests

See https://ci.appveyor.com

aim: 100% code coverage

I know it is quite silly to say that templated D code has 100% coverage. Nevertheless, we should use the compiler's coverage feature to detect unchecked branches and aim for 100% coverage soon.

My idea is that we then can enforce this requirement for all further code additions.

support nogc concatenation in the outermost dimension

algo:

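// needs: import std.range : chain;
auto cat(size_t dim : 0, S0, S1)(S0 a0, S1 a1)
    if (isSlice!S0 && isSlice!S1)
{
    auto shape0 = a0.shape;
    auto shape1 = a1.shape;
    // all dimensions except the outermost one must agree
    assert(shape0[1 .. $] == shape1[1 .. $]);
    shape0[0] += shape1[0];
    // elements are visited lazily, so no allocation is needed
    return chain(a0.byElement, a1.byElement).sliced(shape0);
}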

generalizes trivially to variadic arguments
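
The variadic generalization can be sketched as follows, in plain Python rather than D. The sketch relies on the observation behind the algorithm above: in row-major order, concatenating along the outermost dimension is the same as chaining the flattened element streams and adding up the first lengths. The function name `cat0` and the representation of a slice as an `(elements, shape)` pair are assumptions made only for this illustration.

```python
from itertools import chain


def cat0(*slices):
    """Lazily concatenate (elements, shape) pairs along dimension 0."""
    if not slices:
        raise ValueError("need at least one slice")
    tail = tuple(slices[0][1][1:])
    rows = 0
    for _, shape in slices:
        # All dimensions except the outermost one must agree.
        if tuple(shape[1:]) != tail:
            raise ValueError("inner dimensions do not match")
        rows += shape[0]
    # chain.from_iterable walks the inputs lazily, without copying them.
    elements = chain.from_iterable(elems for elems, _ in slices)
    return elements, (rows,) + tail


elems, shape = cat0((range(6), (2, 3)), (range(6, 9), (1, 3)))
print(shape)        # (3, 3)
print(list(elems))  # [0, 1, 2, 3, 4, 5, 6, 7, 8]
```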

documentation build

I guess in the near future we want to provide generated online documentation, preferably automatically (e.g. a Travis build uploaded to S3).

Maybe Adam's new documentation generator is worth looking at.

fast byElement iteration

Reminder: As proposed in #82 - it should be possible to write a faster byElement version if RandomAccess is not needed.

lazy operations

@9il You have suggested that adding lazy element-wise operations to ndslice isn't a good idea at the moment, but I don't fully understand why. Could you try to explain more?

mir.spatial - spatial algorithms

I happen to use the distance functions from scipy quite a lot

http://docs.scipy.org/doc/scipy/reference/spatial.distance.html

mir.random: non-uniform random generators

Non-uniform random generators.

mir.algorithm - Sorting, searching and counting

Ideally this would seamlessly integrate with the existing algorithms in Phobos from std.algorithm.

I don't know what we do with new algorithms, but mir.algorithm sounds like a good place?
At least sorting, partitioning, counting and searching should be checked.

mir.sparse multidimensional sparse arrays

See #79

byElement seems to be missing some slicing primitives

e.g.

    int[] a = [1,2,3,4];
    auto s = a.sliced(2,2);
    s.byElement[0 .. 3] = 3; // Error
    s.byElement[] = [3,3,3,3]; // Error

[ndslice.selection] rangeHasMutableElements is not defined

mir.fft: multidimensional FFT

See http://www.fftw.org/


mir.data: sci data formats

Use the new allocation API to support import from and export to common file formats, e.g. CSV or MATLAB.


Detect performance regressions

Performance is crucial, so we should have a way to detect regressions or evaluate improvements.

Here is an idea that I have:

We add some more complicated "performance" unittests and exclude them from compilation by default.
They probably have to use a special API or mixin to report a unique name and their runtime.
On a PR, a CI checks out both the new version and the master branch, runs the "performance" tests for both several times in random order, and then calculates the average for every test (that's why we need a unique name for mapping) and the difference between master and the PR / feature branch.
Probably some variance due to different loads has to be tolerated and shouldn't be reported
The CI could complain via git bot (like coverage), email or the CI status icon
Maybe we then want to use a different CI, so that is just additional info and doesn't block Travis
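
The comparison step of this idea could look roughly like the following plain-Python sketch. Everything in it is an assumption made for illustration: the function names, the keying of benchmarks by `(branch, name)`, and the 10% tolerance are not taken from any existing tool, and real benchmarks would be separate builds of each branch rather than Python callables.

```python
import random
import statistics
import time


def collect(benchmarks, rounds=5, seed=0):
    """benchmarks: {(branch, name): callable} -> {(branch, name): mean seconds}."""
    runs = [key for key in benchmarks for _ in range(rounds)]
    random.Random(seed).shuffle(runs)  # random order spreads out load spikes
    samples = {key: [] for key in benchmarks}
    for key in runs:
        start = time.perf_counter()
        benchmarks[key]()
        samples[key].append(time.perf_counter() - start)
    return {key: statistics.mean(vals) for key, vals in samples.items()}


def regressions(means, tolerance=0.10):
    """Return {name: relative slowdown} for tests slower on "pr" than on "master"."""
    report = {}
    for (branch, name), pr_mean in means.items():
        if branch != "pr":
            continue
        master_mean = means.get(("master", name))
        if not master_mean:
            continue  # no baseline to compare against
        change = (pr_mean - master_mean) / master_mean
        if change > tolerance:  # smaller differences are treated as noise
            report[name] = change
    return report


means = {("master", "sum"): 1.0, ("pr", "sum"): 1.3,
         ("master", "sort"): 2.0, ("pr", "sort"): 2.05}
print(sorted(regressions(means)))  # ['sum']
```

The unique test name is what lets the two branches be matched up, and the tolerance is the knob for the load-related variance mentioned above.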

Btw this is also a topic that often comes up in Phobos, but afaik currently it always depends on manual benchmarking.

std.algorithm.sort performance: dlang/phobos#3922
std.regex JIT compiling: dlang/phobos#4120
Faster pairwise summation: dlang/phobos#4069
Faster topN: dlang/phobos#3934


Convenient aliases: `T` and `I`

Coming from NumPy, I would propose reserving the following two short aliases:

T as a property alias for transposed
I as a property alias for inverted

This is often quite handy

better error messages, eg: report indexes and shape instead of static text

These error messages are not very helpful for debugging:
https://github.com/DlangScience/mir/blob/master/source/mir/ndslice/slice.d#L854 [+ elsewhere similar]

assert(_indexes[0][i] < _lengths[i], "indexStride: index must be less than lengths");

how about instead:

version(assert)
  enforce(_indexes[0][i] < _lengths[i], text("indexStride: index must be less than lengths: ", _indexes[0][i], " ", _lengths[i], " ", i));

NOTE: not sure if there's a better way to do it, but I want the check to be compiled in only when asserts are enabled, i.e. not in -release mode (hence version(assert)), and I want to avoid the runtime cost of forming the text(..) expression when the check passes, hence the enforce(bool, lazy exp)


mir.bignum: hex/decimal integer/FP implementation

https://github.com/andersonpd/eris
http://speleotrove.com/decimal/decarith.html
