
libmir / mir


Mir (backports): Sparse tensors, Hoffman

Home Page: http://mir.libmir.org

License: Boost Software License 1.0

D 96.11% Meson 2.47% Makefile 1.42%
mir glas blas mir-glas numeric math llvm

mir's Introduction

❗️ ndslice was reworked and moved to Mir-Algorithm.

The last Mir version with old ndslice is v0.22.1.

❗️ Mir GLAS was moved to https://github.com/libmir/mir-glas.


Mir

Generic Numerical Library for Science and Machine Learning.

Separated Mir Projects
  • Mir Algorithm - Multidimensional arrays (ndslice), iterators, algorithms.
  • Mir Random - Professional Random Number Generators
  • Mir GLAS - Linear Algebra Library (Experimental, not supported for now)
  • Mir BLAS - Bindings to libraries with CBLAS API like OpenBLAS and Intel MKL.
  • Mir LAPACK - Bindings to libraries with LAPACK API like OpenBLAS and Intel MKL.
  • Mir Optim - Nonlinear Solvers.
  • Mir CPUID - CPU Identification routines (less buggy than Phobos).

Documentation

API documentation can be found here.

Contents

  • mir.glas - Generic Linear Algebra Subroutines
  • mir.sparse - Sparse Tensors
  • Sparse - DOK format
  • Different ranges for COO format
  • CompressedTensor - CSR/CSC formats
  • mir.sparse.blas - Sparse BLAS for CompressedTensor
  • mir.model.lda.hoffman - Online variational Bayes for latent Dirichlet allocation (Online VB LDA) for sparse documents. LDA is used for topic modeling.
  • mir.combinatorics - Combinations, combinations with repeats, cartesian power, permutations.

Compatibility

         Linux         Mac OS X      Windows
64-bit   Build Status  Build Status  Build Status
32-bit   Build Status  N/A           N/A

Example

/+dub.sdl:
dependency "mir" version="~>3.1.0"
+/
import std.stdio;
import mir.combinatorics;
void main(string[] args)
{
    writeln([1, 2].combinations);
}

Fast setup with the dub package manager

Latest version

Dub is D's package manager. You can create a new project with:

dub init <project-name>

Now edit dub.json to add mir as a dependency.

{
	...
	"dependencies": {
		"mir": "~><current-version>"
	},
	"dflags-ldc": ["-mcpu=native"]
}

Now you can create an app.d file in the source folder and run your code with

dub --compiler=ldmd2

The flag --build=release can be added for a performance boost:

dub --compiler=ldmd2 --build=release

ldmd2 is a shell on top of LDC (LLVM D Compiler).

"dflags-ldc": ["-mcpu=native"] allows LDC to optimize Mir for your CPU.

Contributing

See our TODO List. Mir is very young and we are open to contributions to source code, documentation, examples and benchmarks.

mir's People

Contributors

9il, haraldzealot, john-colvin, ljubobratovicrelja, martinnowak, n8sh, petarkirov, rjkilpatrick, shigekikarita, trikko, wilzbach


mir's Issues

fromArray

It would be nice to have a way to convert a multi-dimensional array to a slice, e.g.

[[1,2], [3, 4]].fromArray

expose createSlice

I found the following pattern hidden in a unittest - why is this not public?

This would be very useful for array creation, because it would cover NumPy's popular zeros and ones, and with createSlice(3,3).diag[] = 1 also identity.

Moreover, I would suggest using a default type or adding some convenience wrappers like the ones mentioned from NumPy.

auto createSlice(T, Lengths...)(Lengths lengths)
{
    return createSlice2!(T, Lengths.length)(cast(size_t[Lengths.length])[lengths]);
}

///ditto
auto createSlice2(T, size_t N)(auto ref size_t[N] lengths)
{
    size_t length = lengths[0];
    foreach (len; lengths[1 .. N])
        length *= len;
    return new T[length].sliced(lengths);
}


pure nothrow unittest
{
    auto slice = createSlice!int(5, 6, 7);
    assert(slice.length == 5);
    assert(slice.elementsCount == 5 * 6 * 7);
    static assert(is(typeof(slice) == Slice!(3, int*)));
}

For what it is worth I also saw a makeSlice which is based on the new allocator.

import std.experimental.allocator;

auto makeSlice(T, Allocator, Lengths...)(auto ref Allocator alloc, Lengths lengths)
{
    enum N = Lengths.length;
    struct Result { T[] array; Slice!(N, T*) slice; }
    size_t length = lengths[0];
    foreach (len; lengths[1 .. N])
        length *= len;
    T[] a = alloc.makeArray!T(length);
    return Result(a, a.sliced(lengths));
}

unittest
{
    auto tup = makeSlice!int(theAllocator, 2, 3, 4);

    static assert(is(typeof(tup.array) == int[]));
    static assert(is(typeof(tup.slice) == Slice!(3, int*)));

    assert(tup.array.length           == 24);
    assert(tup.slice.elementsCount    == 24);
    assert(tup.array.ptr == &tup.slice[0, 0, 0]);

    theAllocator.dispose(tup.array);
}

mir.combinatorics: behavior with tuples

It doesn't compile yet with tuples (which we should fix) - and only returns a result in the same type:

import std.range: only;
auto projection = 2.permutations.indexedRoR(only('a', 1));
projection.front; // [97, 1]

mir.glas: linear algebra subroutines

Common linear algebra functions. Is this probably part of the BLAS integration?

See the wiki for NumPy's capabilities: https://github.com/DlangScience/mir/wiki/NumPy:-Linear-algebra


indexSlice documentation

What does indexSlice actually do? It's not clear to me from the documentation or the rather small unittests.

better access to allowDownsize

It seems that the default for the allowDownsize flag has recently been changed (at least in the latest version in Phobos).

This makes sense, but we should have handy access to the flag. Having to write the following is annoying:

iota(10).sliced!(Yes.replaceArrayWithPointer, Yes.allowDownsize)(4);

indexing a slice with an array causes an error inside ndslice

e.g. mySlice[[1,2,3]] gives

/Users/john/.dub/packages/mir-0.10.2/source/mir/ndslice/slice.d(1556): Error: no property 'i' for type 'int[]'
/Users/john/.dub/packages/mir-0.10.2/source/mir/ndslice/slice.d(1557): Error: no property 'j' for type 'int[]'
/Users/john/.dub/packages/mir-0.10.2/source/mir/ndslice/slice.d(1557): Error: no property 'i' for type 'int[]'
test.d(8): Error: template instance mir.ndslice.slice.Slice!(3LU, ulong*).Slice.opIndex!(int[]) error instantiating

It should either work or fail at the API level, not internally.

allow strided slice indexing: m[0, 0..5, _, R(0, $, 2), R($-1, 0, -2) ]

discussed here: https://docs.google.com/document/d/1cEf8AynZEZxlTENJx1i4w1GTB481bbc4iUbUeJT00-U/edit#heading=h.kv2inl5g6n2a

Also note that i..j only works within opIndex, not within opCall (math index), so R(i,j) is needed for opCall (and should also be allowed for opIndex for uniformity)

Also this syntax sugar (built on top of R) is convenient:

enum _=R();
R(...).reverse
inclusive(a,b,-2)

also discussed in the doc


mir logo

As using the "official D logo" doesn't make some parts of the community happy, I propose a couple of ideas for a custom mir logo. This round is just about finding 3-5 nice fonts that we want to try in the future.

So this round is not about criticizing the rocket (next round!), however don't hesitate to send me ideas for symbol(s) that could potentially represent mir.

You can browse their rendered png version here and find the svg source here.

Artwork

Atoms: https://commons.wikimedia.org/wiki/File:TBBTAtmBlack.svg (CC3 with attribution - there's a whole bunch of atom vector graphics, so there is probably one which doesn't require republishing under CC. Alternatively we could draw this ourselves)
Rocket: https://pixabay.com/en/rocket-space-shuttle-ship-black-303886 (CC0 Public Domain: Free for commercial use, no attribution required)
Fonts: We probably have to check the licenses once we have narrowed down our selection - however, font licenses are usually pretty liberal.

Update: attached ideas.zip for archiving purposes and mirror from Google Drive.

Remove non `v` git tags

I just looked through the git tags and saw that a couple of them didn't use the v format:

0.0.10
0.0.14
0.0.15
0.6.14
0.8.7

You probably want to remove them while mir isn't forked that often yet.

zero property functions should be const

If the @property attribute is provided, Dscanner warns that a zero-argument property function should be const. After all, D supports the getter/setter pattern shown below; making the getter const yields a clearer API and avoids mistakes if the user tries to assign something to the property function.

size_t a()
{
 return _a;
}

size_t a(size_t a)
{
  return _a = a;
}

There is a bug in Dscanner: it doesn't trigger warnings if the @property attribute is provided before the function name. So I suggest we keep this here as a reminder and try to tackle it once we have a bit of time and no open PRs at Phobos?

Maybe you have a "code quality" or "code style" tag?


mir.combinatorics: uint on windows x86

Seems like I should add some casts - I will prepare a PR.

source\mir\combinatorics\package.d(129,27): Error: safe function 'mir.combinatorics.__unittestL122_2' cannot call system function 'mir.combinatorics.binomial!(BigInt, int).binomial' 
source\mir\combinatorics\package.d(700,35): Error: cannot implicitly convert expression (binomial(n, repeatLen)) of type ulong to uint
source\mir\combinatorics\package.d(983,34): Error: cannot implicitly convert expression (binomial(n + repeatLen - 1u, repeatLen)) of type ulong to uint

allow upSize for sliced

It would be quite nice to be able to fit an existing slice into a larger one (for sliced).
Have a look at this example from NumPy:

>>> b = np.array([[0, 1], [2, 3]])
>>> b.resize(2, 3) # new_shape parameter doesn't have to be a tuple
>>> b
array([[0, 1, 2],
       [3, 0, 0]])

I imagine something like this:

iota(4).sliced!Yes.allowUpsize(2, 3)

(but be aware of the flag "hell" - see #18)
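
To make the requested semantics concrete, here is a minimal sketch in plain Python (not mir or NumPy API). The helper name resize_flat and its fill parameter are hypothetical; it assumes row-major order and zero padding, as in the NumPy example above.

```python
from math import prod

def resize_flat(values, shape, fill=0):
    """Fit a flat sequence into nested lists of `shape` (row-major).

    Extra elements are dropped ("downsize"); missing ones are
    padded with `fill` ("upsize").
    """
    total = prod(shape)
    data = list(values)[:total]
    data += [fill] * (total - len(data))
    # Group the flat data from the innermost dimension outward.
    for dim in reversed(shape[1:]):
        data = [data[i:i + dim] for i in range(0, len(data), dim)]
    return data

print(resize_flat(range(4), (2, 3)))  # [[0, 1, 2], [3, 0, 0]]
```

The same helper also covers the downsize case, for example resize_flat(range(9), (2, 2)) keeps only the first four elements.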

merging back into phobos

Just out of interest, how do you plan to accomplish the following?

mir.ndslice is a development version of the std.experimental.ndslice package.

At least with the changed paths it sounds like a lot of work to get new commits from here to phobos without messing up the history ...

Allow down and upSize for reshape

(related to #19)

Moreover, having downsize and upsize for reshape makes sense too; otherwise people will end up with something like this:

auto s = iota(4).sliced(2, 2);
assert(s[0..1,0..2] == [[0, 1]]);
assert(s[0..2,0..1] == [[0], [2]]);

This works nicely if we just "cut" a slice. However, once we don't, we have to resort to an ugly pattern like this:

auto s2 = iota(9).sliced(3, 3);
assert(s2.byElement.sliced!(Yes.replaceArrayWithPointer,
       Yes.allowDownsize)(4, 1) == [[0], [1], [2], [3]]);

NumPy users migration guide

Starting with #27, there are already a couple of wiki entries about NumPy's API. The main idea of this issue is to complete this "cheatsheet" and create a guide for an easy migration from NumPy to mir.

This will probably be pending for a long time (as some functionality is still missing), but we shouldn't lose focus on it. Creating a happy user base is important :)

Row/column major ordering

See also: https://en.wikipedia.org/wiki/Row-major_order

It is sometimes useful to apply an operation in "Fortran mode" (= column-major, everted).

Have a look at this example from NumPy:

>>> a = np.arange(6).reshape((3, 2))
>>> a.reshape(2,3, order='F')
array([[0, 4, 3],
       [2, 1, 5]])

You can think of this as "look at the array in column mode", apply the operation and "write it back in column mode".

So in Python one can actually "simulate" it with the following

>>> a.T.reshape(3,2).T
array([[0, 4, 3],
       [2, 1, 5]])

If one tries the example with ndslice (with b holding the same 3x2 data as a), we get

b.everted.reshape(2, 3).everted.writeln;
/// [[0, 1], [2, 3], [4, 5]]

This makes a bit of sense, because everted just reverses the order of the dimensions.
It is possible to get the same result as above with byElement and array.
However, it feels wrong to have to copy the array.

b.everted.byElement.array.sliced(3,2).everted.writeln;
// [[0, 4, 3], [2, 1, 5]]
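
The index arithmetic behind a column-major reshape can be sketched in plain Python without NumPy. The function name reshape_fortran is hypothetical and only handles the 2-D case; it is meant to illustrate the "read in column mode, write back in column mode" idea, not any mir API.

```python
def reshape_fortran(rows, new_shape):
    """Reshape a 2-D nested list in column-major ("Fortran") order."""
    n_rows, n_cols = len(rows), len(rows[0])
    new_rows, new_cols = new_shape
    assert n_rows * n_cols == new_rows * new_cols, "element count must match"
    # Read the source column by column.
    flat = [rows[r][c] for c in range(n_cols) for r in range(n_rows)]
    # Element k of that stream lands at row k % new_rows, column k // new_rows.
    return [[flat[c * new_rows + r] for c in range(new_cols)]
            for r in range(new_rows)]

a = [[0, 1], [2, 3], [4, 5]]
print(reshape_fortran(a, (2, 3)))  # [[0, 4, 3], [2, 1, 5]]
```

Note that this sketch materializes the intermediate flat list, so it has the same copying cost that the ndslice workaround above has; a copy-free version would need stride-aware indexing.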

More convenient Iota wrapper

Looking through the unittests, I see this pattern coming up in more than 80% of all tests.

auto b = 4.iota.sliced(2, 2);
auto tensor = 60.iota.array.sliced(3, 4, 5);
auto a = 240000.iota.sliced(10, 20, 30, 40);

I do like this pattern, but seeing 1) how often it occurs and 2) that you need to know the size of iota now that downsizing is disabled by default, it seems to me that having a convenience wrapper for this, like Iota, would be worthwhile.

Iota.sliced(10, 20, 30, 40);

It looks nicer and avoids the problem of mistyping the number of elements, which can only be checked at runtime - e.g. 24000.iota.sliced(10, 20, 30, 40); will result in a hard-to-see runtime error.

I haven't thought much about how we could combine this with the start and step size of the usual iota, but how about:

Iota!10(10,20,30,40) // starts at ten
Iota!(10, 2)(10,20,30,40) // starts at ten, with steps of 2
Iota!(10, 0.5)(10,20,30,40) // starts at ten, with steps of 0.5

Would be happy to send a PR for this ;-)
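
The core of the proposal is that the element count is derived from the shape instead of being typed by hand. A minimal sketch of that idea in plain Python (the name iota_sliced and its start/step keywords are hypothetical, not an existing mir or NumPy function):

```python
from itertools import count, islice
from math import prod

def iota_sliced(*shape, start=0, step=1):
    """Build nested lists of `shape` filled with start, start + step, ...

    The number of elements is computed from the shape, so it cannot
    disagree with the requested dimensions.
    """
    total = prod(shape)
    data = list(islice(count(start, step), total))
    # Group the flat data from the innermost dimension outward.
    for dim in reversed(shape[1:]):
        data = [data[i:i + dim] for i in range(0, len(data), dim)]
    return data

print(iota_sliced(2, 3))                    # [[0, 1, 2], [3, 4, 5]]
print(iota_sliced(2, 2, start=10, step=2))  # [[10, 12], [14, 16]]
```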

mir.dlang.io - create a new web presence

I own dlang.io, so we can use science.dlang.io or mir.dlang.io for a new web presence :)


slice and makeSlice should accept ranges

Reason: When one uses an input range like iota or iotaSlice, which doesn't allow modifications (= doesn't implement lvalue indexing), the following two don't work:

auto a = 6.iotaSlice(2, 3);
a[0, 0] = 1; // not an lvalue
auto b = 6.iota.sliced(2, 3);
b[0, 0] = 1; // not an lvalue

However when using the new allocation APIs, we can allocate memory:

auto a = 6.iotaSlice(2, 3).slice; // we create a modifiable copy
a[0, 0] = 1; 
auto b = 6.iota.array.sliced(2, 3); // or duplicate directly
b[0, 0] = 1;

Thus, imho, it would be nice if slice directly accepted a range that fills the allocated array.
Basically a short wrapper around:

auto a = slice!int(2, 2);
a[] = iota(4).sliced(2, 2);

What do you think?

reshape should be nogc

[came up after reading comment https://github.com//issues/1#issuecomment-170299410]

I think sliced and reshape should be nogc, eg:
auto slice = 1000.iota.sliced([5, 6, 7], 9);
=> should be completely gc free.

The only thing that comes to my mind is formatted error messages, for which #1 (comment) provides a nogc solution [to be generalized to more than 1 integer to format, if need be]

aim: 100% code coverage

I know it is quite silly to say that templated D code has 100% coverage. Nevertheless, we should use the compiler's coverage feature to detect untested branches and aim for 100% coverage soon.

My idea is that we then can enforce this requirement for all further code additions.

support nogc concatenation in the outermost dimension

algo:

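// Pseudocode: concatenate two slices along dimension 0 without GC allocation.
auto  cat!0(S0 a0, S1 a1) if ( isSlice!S0 && isSlice!S1) {
  auto shape0=a0.shape;
  auto shape1=a1.shape;
  // All dimensions except the outermost must match.
  assert(shape0[1..$]==shape1[1..$]);
  // The outermost length of the result is the sum of both inputs.
  shape0[0]+=shape1[0];
  return chain(a0.byElement, a1.byElement).sliced(shape0);
}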

generalizes trivially to variadic arguments
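
The algorithm above boils down to summing the outermost lengths and lazily chaining the element streams. A plain-Python sketch of that idea follows; the function cat0 and its (shape, elements) calling convention are hypothetical, and it assumes both inputs yield their elements in row-major order.

```python
from itertools import chain

def cat0(shape_a, elems_a, shape_b, elems_b):
    """Lazily concatenate two row-major element streams along dimension 0.

    Returns the combined shape and an iterator; no element is copied.
    """
    if tuple(shape_a[1:]) != tuple(shape_b[1:]):
        raise ValueError("all dimensions except the first must match")
    shape = (shape_a[0] + shape_b[0],) + tuple(shape_a[1:])
    return shape, chain(elems_a, elems_b)

shape, elems = cat0((1, 2), iter([0, 1]), (2, 2), iter([2, 3, 4, 5]))
print(shape)        # (3, 2)
print(list(elems))  # [0, 1, 2, 3, 4, 5]
```

Chaining only works this simply for the outermost dimension, because in row-major order the elements of the second input follow all elements of the first.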

documentation build

I guess in the near future we want to provide generated online documentation, preferably built automatically (e.g. a Travis build uploaded to S3).

Maybe Adam's new documentation generator is worth looking at.

fast byElement iteration

Reminder: As proposed in #82 - it should be possible to write a faster byElement version if RandomAccess is not needed.

lazy operations

@9il You have suggested that adding lazy element-wise operations to ndslice isn't a good idea at the moment, but I don't fully understand why. Could you try to explain more?

mir.fft: multidimensional FFT

See http://www.fftw.org/


mir.data: sci data formats

Use the new allocation API to support import and export of common file formats, e.g. CSV or Matlab.


Detect performance regressions

Performance is crucial, so we should have a way to detect regressions or evaluate improvements.

Here is an idea that I have:

  • We add some more complicated "performance" unittests and exclude them from compilation by default
  • They probably have to use a special API or mixin to report a unique name and their runtime.
  • On a PR, a CI checks out the new version and the master branch, runs the "performance" tests for both several times in random order, and then calculates the average for every test (that's why we need a unique name for mapping) and the difference between master and the PR / feature branch.
  • Probably some variance due to different loads has to be tolerated and shouldn't be reported
  • The CI could complain via git bot (like coverage), email or the CI status icon
  • Maybe we then want to use a different CI, so that is just additional info and doesn't block Travis
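
The comparison step described in the list above could look roughly like this in Python. Everything here is an assumption for illustration: the function name find_regressions, the dictionary layout (unique test name mapped to a list of measured runtimes) and the 10% tolerance.

```python
from statistics import mean

def find_regressions(master, branch, tolerance=0.10):
    """Return {test name: relative slowdown} for tests that got slower.

    `master` and `branch` map unique test names to lists of runtimes.
    Slowdowns up to `tolerance` are treated as noise and not reported.
    """
    regressions = {}
    # Only tests present on both sides can be compared.
    for name in sorted(master.keys() & branch.keys()):
        base = mean(master[name])
        new = mean(branch[name])
        change = (new - base) / base
        if change > tolerance:
            regressions[name] = change
    return regressions

master_runs = {"sum_rows": [1.00, 1.02, 0.98], "transpose": [2.0, 2.1, 1.9]}
branch_runs = {"sum_rows": [1.01, 1.03, 0.99], "transpose": [2.6, 2.5, 2.7]}
for name, change in find_regressions(master_runs, branch_runs).items():
    print(f"{name}: {change:.0%} slower")
```

With the sample data only transpose is reported, since the roughly 1% change in sum_rows falls inside the tolerance. A real setup would likely prefer a more robust statistic than the mean, such as the median or minimum.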

Btw this is also a topic that often comes up in Phobos, but afaik it currently always depends on manual benchmarking.

std.algorithm.sort performance: dlang/phobos#3922
std.regex JIT compiling: dlang/phobos#4120
Faster pairwise summation: dlang/phobos#4069
Faster topN: dlang/phobos#3934


Convenient aliases: `T` and `I`

Coming from NumPy, I would propose reserving the following two short aliases:

  • T as a property alias for transposed
  • I as a property alias for inverted

These are often quite handy.

better error messages, eg: report indexes and shape instead of static text

These error messages are not very helpful for debugging:
https://github.com/DlangScience/mir/blob/master/source/mir/ndslice/slice.d#L854 [+ elsewhere similar]

assert(_indexes[0][i] < _lengths[i], "indexStride: index must be less than lengths");

How about this instead:

version(assert)
  enforce(_indexes[0][i] < _lengths[i], text("indexStride: index must be less than lengths", _indexes[0][i], " ", _lengths[i], " ", i));

NOTE: not sure if there's a better way to do it, but I want the check only when asserts are enabled, i.e. not in -release mode (hence version(assert)), and I want to avoid the runtime cost of forming a text(..) expression when the check passes, hence enforce(bool, lazy exp).


mir.bignum: hex/decimal integer/FP implementation

https://github.com/andersonpd/eris
http://speleotrove.com/decimal/decarith.html


#dscience-mir

Hi, as you might know by now, I went ahead and created #dscience and #dscience-mir on freenode.

I don't want to push IRC hard (I am fine with communicating purely through GitHub), but I just wanted to know what your preferred communication channel is.
