
Efficient Haskell Arrays featuring Parallel computation

License: BSD 3-Clause "New" or "Revised" License

Haskell 99.96% C 0.04%
haskell array arrays stencil multidimensional-arrays convolution parallel-computing parallel-processing haskell-arrays massiv

massiv's Introduction

massiv

massiv is a Haskell library for array manipulation. Performance is one of its main goals, so it is capable of seamlessly parallelizing most of the operations provided by the library.

The name for this library comes from the Russian word Massiv (Масси́в), which means an Array.

Status

Badges: GitHub top language, GitHub Actions CI, Coveralls coverage and Gitter chat, plus Hackage / Stackage Nightly / Stackage LTS badges for the massiv, massiv-test and haskell-scheduler packages.

Introduction

Everything in the library revolves around an Array r ix e - a data family for anything that can be thought of as an array. The type variables, from the end, are:

  • e - element of an array.
  • ix - an index that will map to an actual element. The index must be an instance of the Index class, with the default being the Ix n type family and tuples of Ints available as an alternative.
  • r - underlying representation. There are two main categories of representations described below.

Manifest

These are your classical arrays that are located in memory and allow constant-time lookup of elements. Another main property they share is a mutable interface: an Array with a manifest representation can be thawed into a mutable MArray and then frozen back into its immutable counterpart after some destructive operation is applied to the mutable copy (a small sketch follows the list below). The difference between the representations below is in the way the elements are accessed in memory:

  • P - Array with elements that are an instance of Prim type class, i.e. common Haskell primitive types: Int, Word, Char, etc. It is backed by unpinned memory and based on ByteArray.
  • U - Unboxed arrays. The elements are instances of the Unbox type class. Usually just as fast as P, but has a slightly wider range of data types that it can work with. Notable data types that can be stored as elements are Bool, tuples and Ix n.
  • S - Storable arrays. Backed by pinned memory and based on ForeignPtr, while elements are instances of the Storable type class.
  • B - Boxed arrays that don't have restrictions on their elements, since they are represented as pointers to elements, thus making them the slowest type of array, but also the most general. Arrays of this representation are element strict, in other words its elements are kept in Weak-Head Normal Form (WHNF).
  • BN - Also boxed arrays, but unlike the other representation B, its elements are in Normal Form, i.e. in a fully evaluated state and no thunks or memory leaks are possible. It does require an NFData instance for the elements though.
  • BL - Boxed lazy array. Just like B and BN, except values are evaluated on demand.
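
Here is a rough sketch of the thaw/freeze round trip described above. The function names used (thaw, writeM, freeze) and their exact signatures vary slightly between massiv versions, so treat this as illustrative rather than definitive:

import Data.Massiv.Array

-- Thaw an immutable manifest array (a copy is made, the original stays
-- untouched), destructively update one element, then freeze it back.
setFirstTo42 :: Array P Ix1 Int -> IO (Array P Ix1 Int)
setFirstTo42 arr = do
  marr <- thaw arr      -- Array P Ix1 Int -> MArray RealWorld P Ix1 Int
  writeM marr 0 42      -- destructive write at index 0
  freeze Seq marr       -- back to an immutable Array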

Delayed

The main trait of delayed arrays is that they do not exist in memory; instead they describe the contents of an array as a function or a composition of functions. In fact, all of the fusion capabilities in massiv can be attributed to delayed arrays.

  • D - Delayed "pull" array is just a function from an index to an element: (ix -> e). Therefore indexing into this type of array is not possible; instead elements are evaluated with the evaluateM function each time it is applied to an index. This gives us the ability to compose functions applied to an array, and possibly even fold over it, without ever allocating intermediate manifest arrays.
  • DW - Delayed windowed array is very similar to the version above, except it has two functions that describe it, one for the near border elements and one for the interior, aka. the window. This is used for Stencil computation and things that derive from it, such as convolution, for instance.
  • DL - Delayed "push" array contains a monadic action that describes how an array can be loaded into memory. This is most useful for composing arrays together.
  • DS - Delayed stream array is a sequence of elements, possibly even an infinite one. This is most useful when we don't know the size of the resulting array ahead of time, which is common in operations such as filter, mapMaybe, unfold, etc. Naturally, in the end we can only load such an array into a flat vector (see the sketch after this list).
  • DI - Is just like D, except loading is interleaved and is useful for parallel loading arrays with unbalanced computation, such as Mandelbrot set or ray tracing, for example.
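
For example, filtering is a typical source of a DS array, since the size of the result isn't known up front. A minimal sketch, assuming the sfilter function from Data.Massiv.Vector (available in more recent massiv versions); the output shown is illustrative:

λ> import Data.Massiv.Vector as V
λ> computeAs P $ V.sfilter even (Ix1 0 ... 10)
Array P Seq (Sz1 6)
  [ 0, 2, 4, 6, 8, 10 ]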

Construct

Creating a delayed type of array allows us to fuse any future operations we decide to perform on it. Let's look at this example:

λ> import Data.Massiv.Array as A
λ> makeVectorR D Seq 10 id
Array D Seq (Sz1 10)
  [ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 ]

Here we created a delayed vector of size 10, which is in reality just the id function from an index to an element (see the Computation section for the meaning of Seq). So let's go ahead and square its elements:

λ> vec = makeVectorR D Seq 10 id
λ> evaluateM vec 4
4
λ> vec2 = A.map (^ (2 :: Int)) vec
λ> evaluateM vec2 4
16

It's not that exciting, since every time we call evaluateM it will recompute the element, so this function should be avoided at all costs! Instead we can use all of the functions that take Source-like arrays and fuse that computation together by calling compute, or the handy computeAs function, and only afterwards apply the indexM function or its partial synonym (!). Any delayed array can also be reduced using one of the folding functions, thus completely avoiding any memory allocation, or converted to a list, if that's what you need:

λ> vec2U = computeAs U vec2
λ> vec2U
Array U Seq (Sz1 10)
  [ 0, 1, 4, 9, 16, 25, 36, 49, 64, 81 ]
λ> vec2U ! 4
16
λ> toList vec2U
[0,1,4,9,16,25,36,49,64,81]
λ> A.sum vec2U
285

There is a whole multitude of ways to construct arrays:

  • by using one of many helper functions: makeArray, range, rangeStepFrom, enumFromN, etc.
  • through conversion: from lists, from Vectors in vector library, from ByteStrings in bytestring;
  • with a mutable interface in PrimMonad (IO, ST, etc.), e.g. makeMArray, generateArray, unfoldrPrim, etc. (see the sketch below).
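
As a small illustration of the last route, here is a hedged sketch that uses generateArray in IO; the randomRIO helper comes from the random package and the exact constraints on generateArray depend on the massiv version:

import Data.Massiv.Array
import System.Random (randomRIO)

-- Build a manifest vector by running an IO action for every index.
randomVector :: IO (Array P Ix1 Double)
randomVector = generateArray Seq (Sz1 10) (\_ -> randomRIO (0, 1))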

It's worth noting that, in the next example, nested lists will be loaded into an unboxed manifest array and the sum of its elements will be computed in parallel on all available cores.

λ> A.sum (fromLists' Par [[0,0,0,0,0],[0,1,2,3,4],[0,2,4,6,8]] :: Array U Ix2 Double)
30.0

The above wouldn't run in parallel in ghci of course, as the program would have to be compiled with ghc using -threaded -with-rtsopts=-N flags in order to use all available cores. Alternatively we could compile with the -threaded flag and then pass the number of capabilities directly to the runtime with +RTS -N<n>, where <n> is the number of cores you'd like to utilize.

Index

The main Ix n closed type family can be somewhat confusing, but there is no need to fully understand how it works in order to start using it. GHC might ask you for the DataKinds language extension if IxN n is used in a type signature, but there are type and pattern synonyms for the first five dimensions: Ix1, Ix2, Ix3, Ix4 and Ix5.

There are three distinguishable constructors for the index:

  • The first one is simply an Int: Ix1 = Ix 1 = Int, therefore vectors can be indexed in the usual way without an extra wrapping data type, just as was demonstrated in the previous section.
  • The second one is Ix2 for operating on 2-dimensional arrays and has a constructor :.
λ> makeArrayR D Seq (Sz (3 :. 5)) (\ (i :. j) -> i * j)
Array D Seq (Sz (3 :. 5))
  [ [ 0, 0, 0, 0, 0 ]
  , [ 0, 1, 2, 3, 4 ]
  , [ 0, 2, 4, 6, 8 ]
  ]
  • The third one is IxN n and is designed for working with N-dimensional arrays, and has a similar looking constructor :>, except that it can be chained indefinitely on top of :.
λ> arr3 = makeArrayR P Seq (Sz (3 :> 2 :. 5)) (\ (i :> j :. k) -> i * j + k)
λ> :t arr3
arr3 :: Array P (IxN 3) Int
λ> arr3
Array P Seq (Sz (3 :> 2 :. 5))
  [ [ [ 0, 1, 2, 3, 4 ]
    , [ 0, 1, 2, 3, 4 ]
    ]
  , [ [ 0, 1, 2, 3, 4 ]
    , [ 1, 2, 3, 4, 5 ]
    ]
  , [ [ 0, 1, 2, 3, 4 ]
    , [ 2, 3, 4, 5, 6 ]
    ]
  ]
λ> arr3 ! (2 :> 1 :. 4)
6
λ> ix10 = 10 :> 9 :> 8 :> 7 :> 6 :> 5 :> 4 :> 3 :> 2 :. 1
λ> :t ix10
ix10 :: IxN 10
λ> ix10 -- 10-dimensional index
10 :> 9 :> 8 :> 7 :> 6 :> 5 :> 4 :> 3 :> 2 :. 1

Here is how we can construct a 4-dimensional array and sum its elements in constant memory:

λ> arr = makeArrayR D Seq (Sz (10 :> 20 :> 30 :. 40)) $ \ (i :> j :> k :. l) -> (i * j + k) * k + l
λ> :t arr -- a 4-dimensional array
arr :: Array D (IxN 4) Int
λ> A.sum arr
221890000

There are quite a few helper functions that can operate on indices, but these are only needed when writing functions that work for arrays of arbitrary dimension, as such they are scarcely used:

λ> pullOutDim' ix10 5
(5,10 :> 9 :> 8 :> 7 :> 6 :> 4 :> 3 :> 2 :. 1)
λ> unconsDim ix10
(10,9 :> 8 :> 7 :> 6 :> 5 :> 4 :> 3 :> 2 :. 1)
λ> unsnocDim ix10
(10 :> 9 :> 8 :> 7 :> 6 :> 5 :> 4 :> 3 :. 2,1)

All of the Ix n indices are instances of Num so basic numeric operations are made easier:

λ> (1 :> 2 :. 3) + (4 :> 5 :. 6)
5 :> 7 :. 9
λ> 5 :: Ix4
5 :> 5 :> 5 :. 5

It is important to note that the size type is kept distinct from the index by the newtype wrapper Sz ix. There is a constructor Sz, which will make sure that none of the dimensions are negative:

λ> Sz (2 :> 3 :. 4)
Sz (2 :> 3 :. 4)
λ> Sz (10 :> 2 :> -30 :. 4)
Sz (10 :> 2 :> 0 :. 4)

Same as with indices, there are helper pattern synonyms: Sz1, Sz2, Sz3, Sz4 and Sz5.

λ> Sz3 2 3 4
Sz (2 :> 3 :. 4)
λ> Sz4 10 2 (-30) 4
Sz (10 :> 2 :> 0 :. 4)

As well as the Num instance:

λ> 4 :: Sz5
Sz (4 :> 4 :> 4 :> 4 :. 4)
λ> (Sz2 1 2) + 3
Sz (4 :. 5)
λ> (Sz2 1 2) - 3
Sz (0 :. 0)

Alternatively tuples of Ints can be used for working with arrays, up to and including 5-tuples (type synonyms: Ix2T .. Ix5T), but since tuples are polymorphic it is necessary to restrict the resulting array type. Not all operations in the library support tuples, so it is advised to avoid them for indexing.

λ> makeArray Seq (4, 20) (uncurry (*)) :: Array P Ix2T Int
(Array P Seq ((4,20))
  [ [ 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 ]
  , [ 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19 ]
  , [ 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38 ]
  , [ 0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45,48,51,54,57 ]
  ])
λ> :i Ix2T
type Ix2T = (Int, Int)

There are helper functions that can go back and forth between tuples and Ix n indices.

λ> fromIx4 (3 :> 4 :> 5 :. 6)
(3,4,5,6)
λ> toIx5 (3, 4, 5, 6, 7)
3 :> 4 :> 5 :> 6 :. 7

Slicing

In order to get a subsection of an array there is no need to recompute it, unless we want to free up the no longer needed memory, of course. So, there are a few slicing, resizing and extraction operators that can do it all in constant time, modulo the index manipulation:

λ> arr = makeArrayR U Seq (Sz (4 :> 2 :. 6)) fromIx3
λ> arr !> 3 !> 1
Array M Seq (Sz1 6)
  [ (3,1,0), (3,1,1), (3,1,2), (3,1,3), (3,1,4), (3,1,5) ]

As you might suspect, all of the slicing, indexing, extracting and resizing operations are partial, and those are frowned upon in Haskell. So there are matching functions that can do the same operations safely by using MonadThrow, thus returning Nothing, Left SomeException, or throwing an exception in case of IO on failure, for example:

λ> arr !?> 3 ??> 1
Array M Seq (Sz1 6)
  [ (3,1,0), (3,1,1), (3,1,2), (3,1,3), (3,1,4), (3,1,5) ]
λ> arr !?> 3 ??> 1 ?? 0 :: Maybe (Int, Int, Int)
Just (3,1,0)

In the above examples we first take a slice at the 4th page (index 3, since we start at 0), then another one at the 2nd row (index 1). In the last example we also take the 1st element, at position 0. Pretty neat, huh? Naturally, by doing a slice we always reduce the dimension by one. We can do slicing from the outside as well as from the inside:

λ> Ix1 1 ... 9
Array D Seq (Sz1 9)
  [ 1, 2, 3, 4, 5, 6, 7, 8, 9 ]
λ> a <- resizeM (Sz (3 :> 2 :. 4)) $ Ix1 11 ... 34
λ> a
Array D Seq (Sz (3 :> 2 :. 4))
  [ [ [ 11, 12, 13, 14 ]
    , [ 15, 16, 17, 18 ]
    ]
  , [ [ 19, 20, 21, 22 ]
    , [ 23, 24, 25, 26 ]
    ]
  , [ [ 27, 28, 29, 30 ]
    , [ 31, 32, 33, 34 ]
    ]
  ]
λ> a !> 0
Array D Seq (Sz (2 :. 4))
  [ [ 11, 12, 13, 14 ]
  , [ 15, 16, 17, 18 ]
  ]
λ> a <! 0
Array D Seq (Sz (3 :. 2))
  [ [ 11, 15 ]
  , [ 19, 23 ]
  , [ 27, 31 ]
  ]

Or we can slice along any other available dimension:

λ> a <!> (Dim 2, 0)
Array D Seq (Sz (3 :. 4))
  [ [ 11, 12, 13, 14 ]
  , [ 19, 20, 21, 22 ]
  , [ 27, 28, 29, 30 ]
  ]

In order to extract a sub-array while preserving dimensionality we can use extractM or extractFromToM.

λ> extractM (0 :> 1 :. 1) (Sz (3 :> 1 :. 2)) a
Array D Seq (Sz (3 :> 1 :. 2))
  [ [ [ 16, 17 ]
    ]
  , [ [ 24, 25 ]
    ]
  , [ [ 32, 33 ]
    ]
  ]
λ> extractFromToM (1 :> 0 :. 1) (3 :> 2 :. 4) a
Array D Seq (Sz (2 :> 2 :. 3))
  [ [ [ 20, 21, 22 ]
    , [ 24, 25, 26 ]
    ]
  , [ [ 28, 29, 30 ]
    , [ 32, 33, 34 ]
    ]
  ]

Computation and parallelism

There is a data type Comp that controls how elements will be computed when calling the compute function. It has a few constructors, although most of the time either Seq or Par will be sufficient:

  • Seq - computation will be done sequentially on one core (capability in ghc).
  • ParOn [Int] - perform computation in parallel while pinning the workers to particular cores. Providing an empty list will result in the computation being distributed over all available cores, or better known in Haskell as capabilities.
  • ParN Word16 - similar to ParOn, except it simply specifies the number of cores to use, with 0 meaning all cores.
  • Par - isn't really a constructor but a pattern for constructing ParOn [], which will result in Scheduler using all cores, thus should be used instead of ParOn.
  • Par' - similar to Par, except it uses ParN 0 underneath.
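
The strategy is stored in the array itself and can be swapped with setComp before computing or folding. A small sketch reusing vec2 from the Construct section (the output is illustrative):

λ> A.sum (setComp Par vec2) -- same fused pipeline, now folded in parallel
285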

Just to prevent a simple novice mistake which I have seen in the past: make sure your source code is compiled with ghc -O2 -threaded -with-rtsopts=-N, otherwise no parallelization and poor performance are waiting for you. Also, a bit later you might notice the {-# INLINE funcName #-} pragma being used; oftentimes it is a good idea to do that, but it is not always required. It is worthwhile to benchmark and experiment.

Stencil

Instead of manually iterating over a multi-dimensional array and applying a function to each element while reading its neighboring elements (as you would do in an imperative language), in a functional language it is much more efficient to apply a stencil function and let the library take care of all the bounds checking and iterating in a cache-friendly manner.

What's a stencil? It is a declarative way of specifying a pattern for how elements of an array in a neighborhood will be used in order to update each element of the newly created array. In massiv a Stencil is a function that can read the neighboring elements of the stencil's center (the zero index), and only those, and then outputs a new value for the center element.


Let's create a simple, but somewhat meaningful array and create an averaging stencil. There is nothing special about the array itself, but the averaging filter is a stencil that sums the elements in a Moore neighborhood and divides the result by 9, i.e. finds the average of a 3 by 3 square.

arrLightIx2 :: Comp -> Sz Ix2 -> Array D Ix2 Double
arrLightIx2 comp arrSz = makeArray comp arrSz $ \ (i :. j) -> sin (fromIntegral (i * i + j * j))
{-# INLINE arrLightIx2 #-}

average3x3Filter :: Fractional a => Stencil Ix2 a a
average3x3Filter = makeStencil (Sz (3 :. 3)) (1 :. 1) $ \ get ->
  (  get (-1 :. -1) + get (-1 :. 0) + get (-1 :. 1) +
     get ( 0 :. -1) + get ( 0 :. 0) + get ( 0 :. 1) +
     get ( 1 :. -1) + get ( 1 :. 0) + get ( 1 :. 1)   ) / 9
{-# INLINE average3x3Filter #-}

Here is what it would look like in GHCi. We create a delayed array with some funky periodic function, and make sure it is computed prior to mapping an average stencil over it:

λ> arr = computeAs U $ arrLightIx2 Par (Sz (600 :. 800))
λ> :t arr
arr :: Array U Ix2 Double
λ> :t mapStencil Edge average3x3Filter arr
mapStencil Edge average3x3Filter arr :: Array DW Ix2 Double

As you can see, that operation produced an array with the earlier mentioned Delayed Windowed (DW) representation. In its essence DW is an array type that does no bounds checking in order to gain performance, except near the border, where it uses a border resolution technique supplied by the user (Edge in the example above). Currently it is used only in stencils, and not much else can be done with an array of this type besides computing it into a manifest representation.

This example will be continued in the next section, but before that I would like to mention that some might notice that it looks very much like convolution, and in fact convolution can be implemented with a stencil. There is a helper function makeConvolutionStencil that lets you do just that. For the sake of example we'll do a sum of all neighbors by hand instead:

sum3x3Filter :: Fractional a => Stencil Ix2 a a
sum3x3Filter = makeConvolutionStencil (Sz (3 :. 3)) (1 :. 1) $ \ get ->
  get (-1 :. -1) 1 . get (-1 :. 0) 1 . get (-1 :. 1) 1 .
  get ( 0 :. -1) 1 . get ( 0 :. 0) 1 . get ( 0 :. 1) 1 .
  get ( 1 :. -1) 1 . get ( 1 :. 0) 1 . get ( 1 :. 1) 1
{-# INLINE sum3x3Filter #-}

There is not a single plus or multiplication sign; that is because convolution is actually a summation of elements multiplied by kernel elements, so instead we have a composition of functions applied to an offset index and a multiplier. After we map that stencil, we can further divide each element of the array by 9 in order to get the average. Yeah, I lied a bit: Array DW ix is an instance of the Functor class, so we can map functions over it, and they will be fused just as with a regular Delayed array:

computeAs U $ fmap (/9) $ mapStencil Edge sum3x3Filter arr

If you are still confused about what a stencil is, but you are familiar with Conway's Game of Life, this should hopefully clarify it a bit more. The function life below is a single iteration of the Game of Life:

lifeRules :: Word8 -> Word8 -> Word8
lifeRules 0 3 = 1
lifeRules 1 2 = 1
lifeRules 1 3 = 1
lifeRules _ _ = 0

lifeStencil :: Stencil Ix2 Word8 Word8
lifeStencil = makeStencil (Sz (3 :. 3)) (1 :. 1) $ \ get ->
  lifeRules (get (0 :. 0)) $ get (-1 :. -1) + get (-1 :. 0) + get (-1 :. 1) +
                             get ( 0 :. -1)         +         get ( 0 :. 1) +
                             get ( 1 :. -1) + get ( 1 :. 0) + get ( 1 :. 1)

life :: Array S Ix2 Word8 -> Array S Ix2 Word8
life = compute . mapStencil Wrap lifeStencil

The full working example that uses GLUT and OpenGL is located in GameOfLife. You can run it if you have the GLUT dependencies installed:

$ cd massiv-examples && stack run GameOfLife

massiv-io

In order to do anything useful with arrays we often need to be able to read some data from a file. Considering that most common array-like files are images, massiv-io provides an interface to read, write and display images in common formats using Haskell native JuicyPixels and Netpbm packages.

The Color package provides a variety of color spaces and conversions between them, which are used by the massiv-io package as pixels during reading and writing of images.

An earlier example wasn't particularly interesting, since we couldn't visualize what is actually going on, so let's expand on it:

import Data.Massiv.Array
import Data.Massiv.Array.IO

main :: IO ()
main = do
  let arr = computeAs S $ arrLightIx2 Par (600 :. 800)
      toImage ::
           (Functor (Array r Ix2), Load r Ix2 (Pixel (Y' SRGB) Word8))
        => Array r Ix2 Double
        -> Image S (Y' SRGB) Word8
      toImage = computeAs S . fmap (PixelY' . toWord8)
      lightPath = "files/light.png"
      lightImage = toImage $ delay arr
      lightAvgPath = "files/light_avg.png"
      lightAvgImage = toImage $ mapStencil Edge (avgStencil 3) arr
      lightSumPath = "files/light_sum.png"
      lightSumImage = toImage $ mapStencil Edge (sumStencil 3) arr
  writeImage lightPath lightImage
  putStrLn $ "written: " ++ lightPath
  writeImage lightAvgPath lightAvgImage
  putStrLn $ "written: " ++ lightAvgPath
  writeImage lightSumPath lightSumImage
  putStrLn $ "written: " ++ lightSumPath
  displayImageUsing defaultViewer True . computeAs S
    =<< concatM 1 [lightAvgImage, lightImage, lightSumImage]

massiv-examples/vision/files/light.png:

Light

massiv-examples/vision/files/light_avg.png:

Light Average

The full example is in the example vision package and if you have stack installed you can run it as:

$ cd massiv-examples && stack run avg-sum

Other libraries

A natural question might come to mind: why even bother with a new array library when we already have a few really good ones in the Haskell world? The main reasons for me are performance and usability. I personally felt there was much room for improvement before I even started working on this package, and it seems that turned out to be true. For example, the most common go-to library for dealing with multidimensional arrays and parallel computation used to be Repa, which I personally was a big fan of for quite some time, to the point that I even wrote a Haskell Image Processing library on top of it.

Here is a quick summary of how massiv is better than Repa:

  • It is actively maintained.
  • Much more sophisticated scheduler. It is resumable and is capable of handling nested parallel computation.
  • Improved indexing data types.
  • Safe stencils for arbitrary dimensions, not only 2D convolution. Stencils are composable.
  • Improved performance on almost all operations.
  • Structural parallel folds (i.e. left/right - direction is preserved).
  • Super easy slicing.
  • Extensive mutable interface.
  • More fusion capabilities with delayed stream and push array representations.
  • Delayed arrays aren't indexable, only Manifest ones are (saving the user from a common pitfall in Repa of trying to read elements of a delayed array).

As far as usability of the library goes, that is very subjective, so I'll let you be the judge of it. When talking about performance, it is the facts that matter. So, rather than continuing this discussion in purely abstract terms, below is a glimpse into benchmarks against the Repa library, running with GHC 8.8.4 on an Intel® Core™ i7-3740QM CPU @ 2.70GHz × 8.

Matrix multiplication:

benchmarking Repa/MxM U Double - (500x800 X 800x500)/Par
time                 120.5 ms   (115.0 ms .. 127.2 ms)
                     0.998 R²   (0.996 R² .. 1.000 R²)
mean                 124.1 ms   (121.2 ms .. 127.3 ms)
std dev              5.212 ms   (2.422 ms .. 6.620 ms)
variance introduced by outliers: 11% (moderately inflated)

benchmarking Massiv/MxM U Double - (500x800 X 800x500)/Par
time                 41.46 ms   (40.67 ms .. 42.45 ms)
                     0.998 R²   (0.994 R² .. 0.999 R²)
mean                 38.45 ms   (37.22 ms .. 39.68 ms)
std dev              2.342 ms   (1.769 ms .. 3.010 ms)
variance introduced by outliers: 19% (moderately inflated)

Sobel operator:

benchmarking Sobel/Par/Operator - Repa
time                 17.82 ms   (17.30 ms .. 18.32 ms)
                     0.997 R²   (0.994 R² .. 0.998 R²)
mean                 17.42 ms   (17.21 ms .. 17.69 ms)
std dev              593.0 μs   (478.1 μs .. 767.5 μs)
variance introduced by outliers: 12% (moderately inflated)

benchmarking Sobel/Par/Operator - Massiv
time                 7.421 ms   (7.230 ms .. 7.619 ms)
                     0.994 R²   (0.991 R² .. 0.997 R²)
mean                 7.537 ms   (7.422 ms .. 7.635 ms)
std dev              334.3 μs   (281.3 μs .. 389.9 μs)
variance introduced by outliers: 20% (moderately inflated)

Sum all elements of a 2D array:

benchmarking Sum/Seq/Repa
time                 539.7 ms   (523.2 ms .. 547.9 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 540.1 ms   (535.7 ms .. 543.2 ms)
std dev              4.727 ms   (2.208 ms .. 6.609 ms)
variance introduced by outliers: 19% (moderately inflated)

benchmarking Sum/Seq/Vector
time                 16.95 ms   (16.78 ms .. 17.07 ms)
                     0.999 R²   (0.998 R² .. 1.000 R²)
mean                 17.23 ms   (17.13 ms .. 17.43 ms)
std dev              331.4 μs   (174.1 μs .. 490.0 μs)

benchmarking Sum/Seq/Massiv
time                 16.78 ms   (16.71 ms .. 16.85 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 16.80 ms   (16.76 ms .. 16.88 ms)
std dev              127.8 μs   (89.95 μs .. 186.2 μs)

benchmarking Sum/Par/Repa
time                 81.76 ms   (78.52 ms .. 84.37 ms)
                     0.997 R²   (0.990 R² .. 1.000 R²)
mean                 79.20 ms   (78.03 ms .. 80.91 ms)
std dev              2.613 ms   (1.565 ms .. 3.736 ms)

benchmarking Sum/Par/Massiv
time                 8.102 ms   (7.971 ms .. 8.216 ms)
                     0.999 R²   (0.998 R² .. 1.000 R²)
mean                 7.967 ms   (7.852 ms .. 8.028 ms)
std dev              236.4 μs   (168.4 μs .. 343.2 μs)
variance introduced by outliers: 11% (moderately inflated)

Here is also a blog post that compares Performance of Haskell Array libraries through Canny edge detection

Further resources on learning massiv:

massiv's People

Contributors

aherrmann, cocreature, fosskers, gitter-badger, jrp2014, larskuhtz, lehins, masterdezign, mightybyte, mkawalec, nmattia, ocramz, pactuser, snoyberg, sullyj3, tristancacqueray, twhitehead, vaibhavsagar, vertexcite


massiv's Issues

Introduce github templates with contribution guidelines (both PRs and issues)

I'd like to contribute to massiv, specifically regarding #60, #56, #48 and some other things which I've found useful. Are there any requirements for code submitted as a PR here? For example:

  • HLint
  • Specific versions of GHC we should be compatible with
  • Code formatting requirements, such as use of a formatter

I'd rather not create extra work for you if I can avoid it.

Building with cabal alone

To build with cabal alone I just added a cabal.project file at the top level with the following content

packages:   */*.cabal
optimization: 2

package    massiv-bench
  benchmarks: True

at the top level, and then you can cabal v2-build all, cabal v2-bench massiv-bench, cabal v2-test massiv, etc.

The examples seem to need at least one extra external library

sudo apt install freeglut3 freeglut3-dev

The examples still don't build completely:

[12 of 16] Compiling Data.Massiv.Array.IO.Base ( src/Data/Massiv/Array/IO/Base.hs, /home/jrp/Projects/massiv/dist-newstyle/build/x86_64-linux/ghc-8.6.4/massiv-io-0.1.5.0/opt/build/Data/Massiv/Array/IO/Base.o )
[13 of 16] Compiling Data.Massiv.Array.IO.Image.JuicyPixels ( src/Data/Massiv/Array/IO/Image/JuicyPixels.hs, /home/jrp/Projects/massiv/dist-newstyle/build/x86_64-linux/ghc-8.6.4/massiv-io-0.1.5.0/opt/build/Data/Massiv/Array/IO/Image/JuicyPixels.o )

src/Data/Massiv/Array/IO/Image/JuicyPixels.hs:775:3: warning: [-Wincomplete-patterns]
    Pattern match(es) are non-exhaustive
    In a case alternative: Patterns not matched: (JP.ImageY32 _)
    |
775 |   case jpDynImg of
    |   ^^^^^^^^^^^^^^^^...

src/Data/Massiv/Array/IO/Image/JuicyPixels.hs:821:3: warning: [-Wincomplete-patterns]
    Pattern match(es) are non-exhaustive
    In a case alternative: Patterns not matched: (JP.ImageY32 _)
    |
821 |   case jpDynImg of
    |   ^^^^^^^^^^^^^^^^...

src/Data/Massiv/Array/IO/Image/JuicyPixels.hs:852:17: warning: [-Wdeprecations]
    In the use of ‘computeSource’
    (imported from Data.Massiv.Array, but defined in massiv-0.3.0.0:Data.Massiv.Array.Manifest.Internal):
    Deprecated: "In favor of less restrictive `convert`"
    |
852 |     [ (\Refl -> computeSource img) <$>
    |                 ^^^^^^^^^^^^^

src/Data/Massiv/Array/IO/Image/JuicyPixels.hs:881:1: warning: [-Wincomplete-patterns]
    Pattern match(es) are non-exhaustive
    In an equation for ‘showJP’: Patterns not matched: (JP.ImageY32 _)
    |
881 | showJP (JP.ImageY8     _) = "Image S Y Word8"
    | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^...

src/Data/Massiv/Array/IO/Image/JuicyPixels.hs:906:11: warning: [-Wdeprecations]
    In the use of ‘computeSource’
    (imported from Data.Massiv.Array, but defined in massiv-0.3.0.0:Data.Massiv.Array.Manifest.Internal):
    Deprecated: "In favor of less restrictive `convert`"
    |
906 |   !arrS = computeSource img :: Image S cs (JP.PixelBaseComponent a)
    |           ^^^^^^^^^^^^^

src/Data/Massiv/Array/IO/Image/JuicyPixels.hs:972:12: warning: [-Wdeprecations]
    In the use of ‘fromVector’
    (imported from Data.Massiv.Array.Manifest.Vector):
    Deprecated: "In favor of safer `fromVectorM`"
    |
972 |   return $ fromVector Par (Sz (m :. n)) $ V.unsafeCast v
    |            ^^^^^^^^^^
[14 of 16] Compiling Data.Massiv.Array.IO.Image.Netpbm ( src/Data/Massiv/Array/IO/Image/Netpbm.hs, /home/jrp/Projects/massiv/dist-newstyle/build/x86_64-linux/ghc-8.6.4/massiv-io-0.1.5.0/opt/build/Data/Massiv/Array/IO/Image/Netpbm.o )

src/Data/Massiv/Array/IO/Image/Netpbm.hs:169:12: warning: [-Wdeprecations]
    In the use of ‘fromVector’
    (imported from Data.Massiv.Array.Manifest.Vector):
    Deprecated: "In favor of safer `fromVectorM`"
    |
169 |   return $ fromVector Par (Sz (m :. n)) $ V.unsafeCast v
    |            ^^^^^^^^^^
[15 of 16] Compiling Data.Massiv.Array.IO.Image ( src/Data/Massiv/Array/IO/Image.hs, /home/jrp/Projects/massiv/dist-newstyle/build/x86_64-linux/ghc-8.6.4/massiv-io-0.1.5.0/opt/build/Data/Massiv/Array/IO/Image.o )
[16 of 16] Compiling Data.Massiv.Array.IO ( src/Data/Massiv/Array/IO.hs, /home/jrp/Projects/massiv/dist-newstyle/build/x86_64-linux/ghc-8.6.4/massiv-io-0.1.5.0/opt/build/Data/Massiv/Array/IO.o )
Configuring library for examples-0.1.0.0..
Preprocessing library for examples-0.1.0.0..
Building library for examples-0.1.0.0..
[1 of 2] Compiling Examples.Convolution ( src/Examples/Convolution.hs, /home/jrp/Projects/massiv/dist-newstyle/build/x86_64-linux/ghc-8.6.4/examples-0.1.0.0/opt/build/Examples/Convolution.o )

src/Examples/Convolution.hs:14:41: error:
    • Couldn't match expected type ‘Sz Ix2’ with actual type ‘Ix2’
    • In the second argument of ‘makeArray’, namely ‘arrSz’
      In the expression: makeArray comp arrSz lightFunc
      In an equation for ‘arrLightIx2’:
          arrLightIx2 comp arrSz
            = makeArray comp arrSz lightFunc
            where
                lightFunc (i :. j)
                  = sin (fromIntegral (i ^ (2 :: Int) + j ^ (2 :: Int)) :: Double)
   |
14 | arrLightIx2 comp arrSz = makeArray comp arrSz lightFunc
   |                                         ^^^^^

src/Examples/Convolution.hs:20:33: error:
    • Couldn't match expected type ‘Sz Ix2’ with actual type ‘Ix2’
    • In the first argument of ‘makeStencil’, namely ‘(3 :. 3)’
      In the expression: makeStencil (3 :. 3) (1 :. 1)
      In the expression:
        makeStencil (3 :. 3) (1 :. 1)
          $ \ get
              -> (get (- 1 :. - 1) + get (- 1 :. 0) + get (- 1 :. 1)
                    + get (0 :. - 1)
                    + get (0 :. 0)
                    + get (0 :. 1)
                    + get (1 :. - 1)
                    + get (1 :. 0)
                    + get (1 :. 1))
                   / 9
   |
20 | average3x3Filter = makeStencil (3 :. 3) (1 :. 1) $ \ get ->
   |                                 ^^^^^^

src/Examples/Convolution.hs:28:40: error:
    • Couldn't match expected type ‘Sz Ix2’ with actual type ‘Ix2’
    • In the first argument of ‘makeConvolutionStencil’, namely
        ‘(3 :. 3)’
      In the expression: makeConvolutionStencil (3 :. 3) (1 :. 1)
      In the expression:
        makeConvolutionStencil (3 :. 3) (1 :. 1)
          $ \ get
              -> get (- 1 :. - 1) 1
                   . get (- 1 :. 0) 1
                       . get (- 1 :. 1) 1
                           . get (0 :. - 1) 1
                               . get (0 :. 0) 1
                                   . get (0 :. 1) 1
                                       . get (1 :. - 1) 1 . get (1 :. 0) 1 . get (1 :. 1) 1
   |
28 | sum3x3Filter = makeConvolutionStencil (3 :. 3) (1 :. 1) $ \ get ->
   |                                        ^^^^^^
Installing   OpenGLRaw-3.3.2.0 (lib)
Completed    OpenGLRaw-3.3.2.0 (lib)
cabal: Failed to build examples-0.1.0.0 (which is required by exe:mkfilter
from examples-0.1.0.0, exe:life from examples-0.1.0.0 and others).

These are fixable by adding a few Sz annotations. The examples also seem to need massiv-scheduler as a dependency.

Monadic maps that keep the result

Currently the massiv API seems to be lacking monadic map operations, in particular mapM and imapM. These operations are tricky to implement since they can’t be delayed so I guess they might be intentionally omitted? Since these operations are quite common in Haskell, I think it makes sense to either implement them in massiv and maybe add a warning that they will create a new array and can thereby be expensive or add a comment to the docs explaining why these operations are not provided. I don’t have a particularly strong preference regarding the two options.

Keep up the good work, I really like the lib so far!

[request] LLVM Benchmarks

Repa warns that performance can go out the window if we're not careful. One thing it advertises to help with this is to compile with -fllvm. In my old repa-based benchmarks for mapalgebra, I generally saw a 2x speedup. Do you know how LLVM affects Massiv?

Regardless of the answer here, I'll be testing both for mapalgebra. I plan to write up a long blog post about it with all sorts of benchmarks, and turning on LLVM would be included with that.

``resizeDefault`` allowing a 'fill' element if the dimensions are short

Is there a way to do something like

resizeDefault :: (Index ix', Size r ix e) => e -> ix' -> Array r ix e -> Array r ix' e

This will use the supplied e to 'fill in' any elements needed to resize that don't exist in the argument array. For example:

resizeDefault 0 (4 :. 3) (makeArrayR U Par 10 ((+ 1) :: Int -> Int))

should produce

Array U Par (4 :. 3)
  [  [ 1,2,3 ]
  ,  [ 4,5,6 ]
  ,  [ 7,8,9 ]
  ,  [ 10,0,0 ]
  ]

[question] Escape from DW

This is half a question and half me trying to work out where one can go after getting an Array DW.

So, DW doesn't have a Source instance, meaning we can't zipWith two Array DW. This makes doing Local Operations (binary ops on two Rasters) harder with two Array DW (the output of a Focal op), since you'd have to force the memory once to get something Source, then continue.

We can't writeImage on an Array DW for the same reason, but that makes sense to me. computeAs S beforehand makes a lot of sense.

We can't delay an Array DW either (as we discussed before), but this also makes sense.
You'd lose all the goodness that DW provides.

We can displayImage an Array DW. How does this work? Is the Array silently forced to some concrete Representation before being fed to the image viewer?

Am I missing anything? Where else can an Array DW go? By contrast, it seems like Array D can go just about anywhere.

[request] Array Interpolation via Stencils

I assume that it's currently possible to "expand" an image, say from 256x256 -> 512x512, via some manual traverse calls and simple interpolation.

Both for mapalgebra and for implementing a Haskell version of the hqx algorithm, it would be very handy if images could be upsampled/downsampled (I can never remember the difference) using stencils.

For instance, hqx wants to know the neighbourhood around each pixel. Based on the "shape" of the surrounding colours, it replaces the current pixel with 4 (a 2x2 grid) new pixels, effectively ballooning the image up by double. Currently, stencil operations assume "read some neighbourhood, write 1 pixel". Is this an unchangeable assumption?

Cheers,
Colin

Rename Rank -> Dimension

The term Rank is used incorrectly in massiv and should be renamed to Dimension. This goes for both the function rank and the type family Rank.

Proper exceptions instead of `error`

Calling error is a pretty poor way to fail. Now that common failures have been mapped out, it is time to switch to custom exceptions, e.g. IndexOutOfBounds, UninitializedElement, etc.

[massiv] Odd stencil indexing?

The stencil section of the README implies that neighbourhood indexing works like this:

| (-1, -1) | (-1, 0) | (-1, 1) |
--------------------------------
| (0, -1)  | (0, 0)  | (0, 1)  |
--------------------------------
| (1, -1)  | (1, 0)  | (1, 1)  |

and yet:

> makeArrayR U Seq (3 :. 3) (\(r :. c) -> (r * 3) + c + 1)
(Array U Seq (3 :. 3)
  [ [ 1,2,3 ]
  , [ 4,5,6 ]
  , [ 7,8,9 ]
  ])

> computeAs U . zag $ makeArrayR U Seq (3 :. 3) (\(r :. c) -> (r * 3) + c + 1)
(Array U Seq (3 :. 3)
  [ [ 0,1,2 ]
  , [ 0,4,5 ]
  , [ 0,7,8 ]
  ])

where:

fidSten :: Default a => Stencil Ix2 a a
fidSten = makeStencil (Fill def) (3 :. 3) (1 :. 1) $ \f -> f (0 :. 1)

zag :: (Default a, Manifest r Ix2 a) => Array r Ix2 a -> Array DW Ix2 a
zag a = mapStencil fidSten a

It looks like f (0 :. 1) is grabbing the cell to the left, not the right as I had expected.

Any idea what's going on? (I found this deep down the debugging rabbit-hole)

Implementation for `deleteM`

Similar to deleteRegionM, but a more general function. It will also need some benchmarks.

deleteM ::
     (MonadThrow m, Extract r ix e, Source (EltRepr r ix) ix e, Foldable f)
  => Dim
  -> f Ix1
  -> Array r ix e
  -> m (Array DL ix e)

[massiv-io] Automatic computation strategy detection

I discovered experimentally that readImage will always produce an Array whose Comp is Seq.

Not-unreasonably-expected-behaviour could be:

  • Image reading itself is done in serial, regardless of CPU core count
  • If getNumCapabilities is greater than 1 and/or the Array dimensions are high enough
    to make Par overhead worth it:
    • ColorSpace / cell type conversion (if necessary) could be in Par
    • The final Array S could be automatically set to Par

Questions:

  • If getNumCapabilities is greater than 1, should everything be Par? Sure there might be a bit of unneeded overhead if the image is too small, but that might not be a typical use case. For GIS, we expect 256x256 rasters at minimum, with 512x512 and 1024x1024 also being typical. Giant ones like 65k x 65k are also out there, and are a use-case I'd like to support in mapalgebra.

Summary of planned breaking changes that will happen in v0.4 release

Check list of planned breaking changes:

  • EltRepr r ix :: * will be replaced with much simpler R r :: *
  • withMArray action should get Scheduler as argument, not the numWorkers and the scheduling action.
  • forPrimM_ should not modify each element, but simply iterate over each (same for iforPrimM_ and iforLinearPrimM_). Add functions that do indeed modify these elements: forPrimM, iforPrimM, iforLinearPrimM.
  • modify should return the old element instead of simply performing modification. (add modifyM and modifyM_, which will accept an action rather than a pure function).
  • swap should return a tuple with both elements instead of simply performing the swap (add swapM and swapM_).
  • makeLoadArray - safe version that is parallelizable is also possible, needs to be implemented as a replacement for currently deprecated function.
  • stop accepting computation strategy for all functions that can be performed only sequentially (i.e. stop asking for computation strategy as an argument and default it to Seq). Here is the list of all those functions (not yet comprehensive):
    • iterateN
    • iiterateN
    • unfoldrS_
    • iunfoldrS_
    • unfoldlS_
    • iunfoldlS_
    • makeArrayA
    • makeArrayAR
    • generateArrayLinearS
    • generateArrayS
  • unsafeLinearSet - length argument should be Sz1, not just Int
  • iterLinearM and iterLinearM_ suffer from over-functionalization. The end argument is not required, as it is immediately applied to the continuation condition. This is more of an internal function, so changing it shouldn't really affect anybody. Going to leave it as is until better times.
  • for the ST monad add an orphan instance of MonadThrow with some CPP, since the new version of exceptions will have that instance. Introduce readM, writeM and others restricted to MonadThrow.
  • deprecate read', write', swap', modify'
  • numeric infix operators, such as (.+) :: e -> Array r ix e -> Array r ix e, will take a single element on the right, instead of another array, and will not necessarily return a delayed representation.

``ifoldMono`` and friends?

It would be good to have functions with signatures like:

ifoldMono :: (Source r ix e, Monoid m) => (ix -> e -> m) -> Array r ix e -> m

Is this actually doable without forcing a sequential fold, or is there some inherent limitation that makes this impossible?

Remove deprecated functions

Here is the list of all deprecated functions to be removed in the next major release:

massiv:

  • mapP_ - in favor of imapIO
  • imapP_ - in favor of imapIO_
  • sequenceOnP and sequenceP - in favor of sequenceIO
  • getIndex
  • setIndex
  • setIndex'
  • getIndex'
  • loadS and loadP per #41
  • outerLength - no longer useful, use headDim . size instead.
  • isSafeSz and unsafeMakeArray - due to #65

massiv-io:

  • liftPx
  • liftPx2
  • promote
  • foldlPx
  • foldrPx
  • foldl1Px

``foldMono, ifoldMono`` over the outermost dimension

By this, I mean functions with signatures like:

foldMonoOuter :: (OuterSlice r ix e, Source r ix e, Monoid m) => (Elt r ix e -> m) -> Array r ix e -> m
ifoldMonoOuter :: (OuterSlice r ix e, Source r ix e, Monoid m) => (Int -> Elt r ix e -> m) -> Array r ix e -> m

This would be very convenient for the work I'm trying to do at the moment.

generateM is broken for monads encoding nondeterminism

Illustration is quite simple:

λ> import qualified Data.Massiv.Array as A
λ> xs = (A.generateM A.Seq 3 (\i -> [0..i]) :: [A.Array A.U Int Int])
λ> xs
[(Array U Seq (3)
  [ 0,0,0 ]),(Array U Seq (3)
  [ 0,0,1 ]),(Array U Seq (3)
  [ 0,0,2 ]),(Array U Seq (3)
  [ 0,1,0 ]),(Array U Seq (3)
  [ 0,1,1 ]),(Array U Seq (3)
  [ 0,1,2 ])]
λ> xs
[(Array U Seq (3)
  [ 0,1,2 ]),(Array U Seq (3)
  [ 0,1,2 ]),(Array U Seq (3)
  [ 0,1,2 ]),(Array U Seq (3)
  [ 0,1,2 ]),(Array U Seq (3)
  [ 0,1,2 ]),(Array U Seq (3)
  [ 0,1,2 ])]

As can be seen, all vectors share the same buffer, so it gets overwritten. I think the only possible fix is to strengthen the Monad constraint to PrimMonad and use it to thread the state token.

Terrible stencil performance with GHC 8.0.2

I've known this for a while, but it would be good to have an official record of it. There has been a regression at some point when there was work on indexing, and stencil computation performs terribly on GHC 8.0, but not with the newer GHC 8.2 or GHC 8.4.

It would be worth investigating, but for anybody affected, for now I can only recommend upgrading your ghc or using a newer LTS snapshot.

`loadP` has no need to be in `IO`

There is a way to abstract parallel loading, as far as the Load class is concerned, so that loadP works in an arbitrary Monad, just as loadS already does. This would enforce safer instances for Loadable arrays. Here is the proposed function to be added to Load as a replacement for loadP:

  loadWithWorkers
    :: Monad m =>
       Int -- ^ Total number of workers
    -> (m () -> m ()) -- ^ A monadic action that will schedule work for the workers
    -> Array r ix e -- ^ Array that is being loaded
    -> (Int -> m e) -- ^ Function that reads an element from target array
    -> (Int -> e -> m ()) -- ^ Function that writes an element into target array
    -> m ()

And the function that does the actual loading:

loadMutableWithScheduler :: (Load r' ix e, Mutable r ix e) =>
                            [Int] -> Array r' ix e -> IO (Array r ix e)
loadMutableWithScheduler wIds !arr = do
  mArr <- unsafeNew (size arr)
  withScheduler_ wIds $ \scheduler ->
    loadWithWorkers
      (numWorkers scheduler)
      (scheduleWork scheduler)
      arr
      (unsafeLinearRead mArr)
      (unsafeLinearWrite mArr)
  unsafeFreeze (ParOn wIds) mArr

This is a pure safety and clarity improvement, as such it is not urgent. Moreover it would be considered a breaking change, therefore this implementation will be postponed to v0.3, whenever that will happen :)

Make range functions work with all dimensions

range et al. functions are unnecessarily restricted to Ix1; they can be switched to work with any Index:

range :: Index ix => Comp -> ix -> ix -> Array D ix ix
range comp ixFrom ixTo =
  makeArray comp (liftIndex2 (-) ixTo ixFrom) (liftIndex2 (+) ixFrom)

Then it would work for any dimension:

λ> range Seq (Ix1 2) 10
(Array D Seq (8)
  [ 2,3,4,5,6,7,8,9 ])
λ> range Seq (2 :. 3) (4 :. 5)
(Array D Seq (2 :. 2)
  [ [ 2 :. 3,2 :. 4 ]
  , [ 3 :. 3,3 :. 4 ]
  ])

It's practically backwards compatible, but without Ix1 the result type is ambiguous, so this would be a type inference error: range Seq 2 10. Therefore this change will be introduced in the next major release; until then the general implementation included in this issue can be used if necessary.

Conflict with Prelude's map; update README with clarification?

Probably a Haskell-newbie issue, but I'm wondering what the recommended way is to deal with the conflict between massiv's and Prelude's map; performance aside, it seems like the massiv map may be needed at times, unless I'm interpreting my errors incorrectly. My current solution, which works for my small example, is:

import               Prelude hiding (map)
import               Data.Massiv.Array as AM

But I don't know if this is ideal. Thanks!

Monadic version of ``makeArray`` and friends?

I might be missing it, but I can't seem to find anything like a monadic version of a construction function. Something of the form:

makeArrayM :: (Monad m, Construct r ix e) => Comp -> ix -> (ix -> m e) -> m (Array r ix e)

I understand that this might not be workable with every representation type, but D should be possible, right? As it stands, the only way I see to do this is to go via fromList and friends. Is there something I'm missing here?

[massiv-io - bug?] `readImageAuto` assumes conversion?

I just hit a funny error in my tests. I have a function fromGray which is supposed to read any image type and produce a grayscale Raster. It uses readImageAuto for this under the hood.

When reading an already grayscale image just now, I got a runtime error:

ConvertError "Cannot decode Auto TIF image <Image S Y Word8> as <Image S Y Word8>"

Is this expected, or should readImageAuto attempt no conversion when it detects that the desired type and actual type are the same?

Type safety for size vs index

Currently it is way too easy to mix up an index with a size, since they are of the same type. For example, the function isSafeIndex :: ix -> ix -> Bool does not give any type-level guarantee that the first argument is a size while the second one is an index, which can lead to nasty bugs.

This ticket proposes a newtype wrapper:

newtype Sz ix = Sz ix

Which will not have any runtime overhead but will promote library safety.

This change will affect many functions in the library, for example core function makeArray will now have a type signature:

makeArray :: Construct r ix e => Comp -> Sz ix -> (ix -> e) -> Array r ix e

but the change, IMHO, is totally worth it. Especially since transitioning current code is trivial.

Together with the wrapper should be added helper pattern synonyms:

pattern Sz1 :: Int -> Sz Ix1
pattern Sz1 i = Sz i

pattern Sz2 :: Int -> Int -> Sz Ix2
pattern Sz2 m n = Sz (m :. n)

...
-- so forth until Sz5

Bonus.

It would be possible to prevent a negative size from being constructed altogether, similar to how Stride is currently implemented. It is unlikely to have any overhead, but the drawback is that on ghc < 8.2 pattern matching will yield a non-exhaustive warning. That is still up for debate though. Moreover, it can be added in the future, since it would be backwards compatible.

Switch to MonadIO

It will be more convenient to have a MonadIO restriction, rather than the plain IO monad, on all operations that live in IO.

Some places, like the scheduler for instance, might not be possible to switch to MonadIO, and something like MonadUnliftIO will be required. Therefore there is a possibility of an extra new dependency: unliftio-core.

`zipWith (+)` vs `(.+)`

From an email exchange we had:

Hey Alexey, I was doing a bit of benchmarking this morning and noticed that zipWith (+) and friends are about twice as fast as their counterparts in Data.Massiv.Array.Numeric(like (.+)). Looking at the code, I see those operators using liftArray2, while zipWith does its own work.


Great find, and also an unfortunate one. zipWith (+) and liftArray2 (+) are almost identical, except the latter errors out on mismatched sizes and looks at two extra cases when either one of the arguments is an array with a single element. Those two extra cases cause GHC to fail to properly apply some optimizations, thus causing a significant slowdown (it's even more drastic in my benchmark, a factor of 4).
The good thing for all of the binary operators like (.+) and (.*) is that the cases that cause the slowdown aren't that important, but using zipWith (+) would not be correct, since it will silently discard elements of mismatched-size arrays (it doesn't affect mapalgebra, since you check size equality at the type level).

[request] Matrix Inverse

massiv has transpose already - it would be handy for mathematical libraries/applications to have inverse as well.

DW instance functionality can be expanded

As per #16 it should be possible to save arrays with the DW representation, so a Writable instance is needed.

This also brings up another point: there is no reason that DW isn't an instance of Show, which should be added as well.

Remove current `traverse` and implement a more general `transform`

A bit of historical background. The Repa library has a slightly confusing function traverse:

traverse :: (Source r aShape sh) => 
  Array r sh a -> (sh -> sh') -> ((sh -> a) -> sh' -> b) -> Array D sh' b

which was also implemented in massiv, with the slight adjustments that the size is supplied directly instead of being modified with a function, and that the order of arguments is different:

traverse :: (Source r1 ix1 e1, Index ix) =>
  ix -> ((ix1 -> e1) -> ix -> e) -> Array r1 ix1 e1 -> Array D ix e

Regardless of the differences, the function is too confusing (see #57), mostly because of the name, since it clashes with Prelude's traverse, which has nothing in common with the above functions. In order to disambiguate, it needs to be renamed, especially considering that there will be an actual traverse coming as part of #52 and that name will no longer be available.

Moreover, the concept of the old traverse is easily replaceable with a simple call to makeArray. For that reason it will be expanded into a more advanced function transform (using new types due to #65 and for clarity):

transform :: (Source r' ix' e', Index ix)  =>
  (Sz ix' -> (Sz ix, a)) -> (a -> (ix' -> e') -> ix -> e) -> Array r' ix' e' -> Array D ix e

An actual explanation of its functionality will be in the haddock, but in essence it allows the element-producing function to depend not only on the new index and the old elements, but also on the sizes of the source and target arrays.

Task: Also add unsafeTransform, and possibly (unsafe)transformX up to some X

PS. Believe it or not I have a really good use case for transform function, but it is a bit too involved to go into details here.

Stencil itself shouldn't care about border resolution

Currently makeStencil et al. take the Border resolution technique as the first argument, which is the wrong approach; it's the mapping function mapStencil that should be the one aware of the Border instead.

In a single-Stencil setup it doesn't make a difference, but considering the fact that we can compose Stencils, the Border argument should be moved over to mapStencil.

This is a breaking change, thus will be postponed to v0.2

Eq and Ord instances for Value

Stencil creation goes to great trouble in order to guarantee safety, so that a created stencil can be applied to any array and there will be no silent memory reads that aren't allowed. This is the whole reason for the existence of the Value data type in the first place, which prevents the indices used by stencils from depending on the elements of the array it is being applied to, or so I thought. Apparently the Ord and Eq instances for Value can cause havoc for stencil validation and should be removed. Consider this example:

λ> let s = makeStencil (Fill 200) 3 1 $ \f -> if f 0 == f 1 then f (-1) else f 100
λ> let arr = mapStencil s $ computeAs P $ range Seq 0 10
λ> computeAs P arr
(Array P Seq (10)
  [ 200,140192713198744,8589934594,140192651898908,283614252664,1,140192717508584,1,200,200 ])

This is a breaking change and will be postponed to v0.2

Taking a 1D slice of a 3D array

As I understand it, all the slicing functions produce slices of a 'dimensionality' exactly one lower than that of their first argument: for example, if I slice a 3D array, I will end up with a 2D array. If I want to slice 'lower' than this, does this mean I have to slice twice, or is there some other function that can do that slicing that I'm missing?

error: • No instance for (Ragged DI Ix1 Double) arising from a use of ‘empty’

I am trying to generate a vector using specific values; for this I read the documentation on Hackage. First, I tried to generate a singleton array and add values, in the same way given by the example, but the following error appears:

λ> import Data.Massiv.Array as A
λ> singleton 7 :: Array D Ix4 Double

<interactive>:218:1-11: error:
    • Couldn't match expected type ‘Array D Ix4 Double’
                  with actual type ‘e0 -> Array r0 ix0 e0’
    • In the expression: singleton 7 :: Array D Ix4 Double
      In an equation for ‘it’: it = singleton 7 :: Array D Ix4 Double
λ> 

So, I tried to generate an empty array, but I get a scope error and an instance error.

λ>  import Data.Massiv.Array as A
λ> :set -XTypeApplications
λ> xs = empty @DL @Ix1 @Double

<interactive>:212:13-14: error:
    Not in scope: type constructor or class ‘DL’
    Perhaps you meant one of these:
      ‘D’ (imported from Data.Massiv.Array),
      ‘DI’ (imported from Data.Massiv.Array),
      ‘DW’ (imported from Data.Massiv.Array)
λ> xs = empty @D @Ix1 @Double

<interactive>:213:6-26: error:
    • No instance for (Ragged D Ix1 Double)
        arising from a use of ‘empty’
    • In the expression: empty @D @Ix1 @Double
      In an equation for ‘xs’: xs = empty @D @Ix1 @Double

I am using the massiv 0.3.0.1 package version and ghc 8.6.4

|*| from master branch slower than in massiv-0.2.8.0

Hi!

First of all, I'm really impressed by your work!
I've noticed something when running some benchmarks. I usually set the packages to be fetched directly from the last github commit that I know of, but I had forgotten to do so for massiv, as I had only recently added it to the benchmarks.
When I ended up switching to the last commit, |*| suddenly became slower.
On my machine, for 100*100 matrices, I get:

  • for 2.8.0 (the massiv version packaged with lts13.22) : ~1.5 ms
  • for master: ~2.4 ms

(60% slowdown)


EDIT:
After running some more tests the change seems to happen between 2.8 and 3.0. I couldn't test for 2.8.1 since github doesn't seem to recognize this tag

Bad performance with delayed values

I've built a simple raytracer with massiv and noticed it had abysmal performance. I created it using delayed arrays all the way, apart from the top level where they are computed to get the result. After some investigation I found that this code

a = dot direction direction
b = dot oc direction

computes direction, and all of its computation subtree, three times, even though all the arrays involved are delayed.

This is most likely just an issue with my mental model of how delayed arrays should work in that scenario (i.e. compute direction once and reuse it in all the places), but I think it should be more visible in the docs that delayed arrays should effectively never be used except where values really are coming from a generator function. If it's not an issue with my mental model and is indeed a bug, I am happy to provide the repro code ;)
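
For what it's worth, the usual workaround looks roughly like the sketch below (the dot, direction and oc bindings are hypothetical stand-ins for the raytracer's own code, not the actual repro): a delayed array is just a function from index to element, so every consumer re-evaluates it, while computing it once into a manifest representation turns later reads into constant-time lookups.

import Data.Massiv.Array as A

-- Hypothetical dot product over manifest vectors.
dot :: Array P Ix1 Double -> Array P Ix1 Double -> Double
dot u v = A.sum (A.zipWith (*) u v)

quadraticTerms :: Array D Ix1 Double -> Array D Ix1 Double -> (Double, Double)
quadraticTerms directionD ocD =
  let direction = computeAs P directionD  -- evaluated once, reused below
      oc        = computeAs P ocD
  in (dot direction direction, dot oc direction)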

MArray usage

Hello,

I was looking at the MArray docs on Hackage and am somewhat confused, probably partly because the docs aren't yet as extensive as those for immutable Arrays (especially with regard to examples), and partly because I'm still a beginner with Haskell.

So for something more concrete, I'm looking at new.

I'll try to write up some docs at some point, if that helps, once I get a bit further along (even if it is just restating what is in this issue). I frequently see PrimMonad m, but what are some example m types? How does one call new in a way analogous to the example makeArray Seq (3 :. 4) (\ (i :. j) -> if i == j then i else 0) :: Array D Ix2 Int?

My goal is to create a large matrix, which I can then modify by doing some form of slice-based indexing to interact with submatrices.

Thanks!
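
In case it is useful to anyone else, here is a minimal sketch of the mutable interface (my own example, not from the docs, assuming a reasonably recent massiv): IO and ST s are the typical choices for the PrimMonad m that keeps showing up.

import Data.Massiv.Array as A

-- Build the matrix from the immutable example, thaw it into an MArray,
-- destructively overwrite a single cell and freeze it back.
mutableExample :: IO (Array P Ix2 Int)
mutableExample = do
  let arr = makeArray Seq (Sz (3 :. 4)) (\(i :. j) -> if i == j then i else 0) :: Array P Ix2 Int
  marr <- thaw arr                -- MArray RealWorld P Ix2 Int
  _ <- write marr (1 :. 2) 42     -- returns False if the index is out of bounds
  freeze Seq marr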

`Construct` => `Size` => `Load` hierarchy isn't optimal

  • The Size type class should only care about the size; the resize and extract functionality should become their own type class(es). Also, the Construct constraint is irrelevant for size, resize or extract.
  • As for the Load type class, all it really needs is information about the size of the array in order to be able to create a new mutable array and fill it up.

Proposed hierarchy:

Size class should only care about the size. Example implementation:

class Size array r ix where
  size :: array r ix e -> ix
instance Size Array P ix where
  size (PArray _ sz _) = sz
instance Size (MArray s) P ix where
  size (MPArray sz _) = sz

A possible hierarchy:

class Size Array r ix => Construct r ix e where
-- ... all else is the same

class Resize array r ix where
  unsafeResize :: Index ix' => ix' -> array r ix e -> array r ix' e

instance Size Array P ix => Resize Array P ix where
  unsafeResize ...

instance Size (MArray s) P ix => Resize MArray P ix where
  unsafeResize ...

class Size Array r ix => Extract r ix e where
  unsafeExtract ...

class Size Array r ix => Load r ix e where
  loadArray ...
  loadArrayWithStride ...

The above approach has quite a few benefits:

  • LN and L can potentially be made instances of Load, since Resize and Construct are no longer superclasses
  • The constraint hierarchy is less strict
  • Ability to add an MArray type family to Size, making msize redundant.
  • Resize will work for mutable arrays as well.
  • Works well with the changes proposed in #41

It is a breaking change, thus it is postponed until v0.3

Build failure with ghc 8.4

Add a Semigroup instance.

massiv: BuildFailureException Process exited with ExitFailure 1: ./Setup build

Building library for massiv-0.1.1.0..
[ 1 of 34] Compiling Data.Massiv.Core.Computation ( src/Data/Massiv/Core/Computation.hs, dist/build/Data/Massiv/Core/Computation.o )

src/Data/Massiv/Core/Computation.hs:46:10: error:
    • No instance for (Semigroup Comp)
        arising from the superclasses of an instance declaration
    • In the instance declaration for ‘Monoid Comp’
   |
46 | instance Monoid Comp where
   |          ^^^^^^^^^^^
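
The usual shape of the fix is sketched below (an assumption on my part, not the actual massiv patch; in particular the Comp constructors and the combining logic are guesses): since GHC 8.4, Semigroup is a superclass of Monoid, so Comp needs its own Semigroup instance.

-- Assuming Comp is roughly: data Comp = Seq | ParOn [Int]
instance Semigroup Comp where
  Seq       <> c         = c
  c         <> Seq       = c
  ParOn ws1 <> ParOn ws2 = ParOn (ws1 <> ws2)

instance Monoid Comp where
  mempty  = Seq
  mappend = (<>)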

How to get the average from the non-convolution stencil example?

In the README, there's an example of an averaging stencil, which is first described without convolution and then with it. However, only the convolution example explains how to get the average out of it; the non-convolution one doesn't. What would be the way to get the average from the non-convolution example once mapStencil has been applied?
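
One way to do it (a sketch of what I assume is the intended answer, written against the current makeStencil signature and not taken verbatim from the README) is to fold the division by the neighbourhood size into the stencil itself, so that mapStencil produces the averages directly:

import Data.Massiv.Array as A
import Data.Massiv.Array.Stencil

-- Sum of the 3x3 neighbourhood divided by its size, i.e. the average.
average3x3 :: Fractional a => Stencil Ix2 a a
average3x3 = makeStencil (Sz (3 :. 3)) (1 :. 1) $ \ get ->
  ( get (-1 :. -1) + get (-1 :. 0) + get (-1 :. 1)
  + get ( 0 :. -1) + get ( 0 :.  0) + get ( 0 :. 1)
  + get ( 1 :. -1) + get ( 1 :.  0) + get ( 1 :. 1) ) / 9

-- e.g. computeAs P (mapStencil Edge average3x3 arr)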

``zipInner``

The goal is to zip together an array of dimensionality one smaller (but otherwise matching) with another array, essentially zipping along the inner dimensions only. At the moment, I can write what I want by doing something like

zipInner :: (Manifest r (Lower ix) a, Manifest r' ix b) => Array r (Lower ix) a -> Array r' ix b -> Array D ix (a,b)
zipInner a1 a2 = makeArray (getComp a1) (size a2) go
    where go i = (index' a1 (tailDim i), index' a2 i)

However, this requires a Manifest constraint on both 'input' arrays, while zip and friends require only Source. Would it be possible to provide a function similar to this one, but with a signature like

zipInner :: (Source r (Lower ix) a, Source r' ix b) => Array r (Lower ix) a -> Array r' ix b -> Array D ix (a,b)

This extends to other similar functions, like zip3, zipWith, and so forth.
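
For what it's worth, a sketch of a body that would fit that signature (assuming evaluate', which needs only a Source constraint, is available in the massiv version in use; otherwise the idea is the same with whatever Source-based indexing function exists):

zipInner :: (Source r (Lower ix) a, Source r' ix b) => Array r (Lower ix) a -> Array r' ix b -> Array D ix (a,b)
zipInner a1 a2 = makeArray (getComp a1) (size a2) go
    where go i = (evaluate' a1 (tailDim i), evaluate' a2 i)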

Confused by ``traverse``

I'm not sure I understand how this function would actually be used. I vaguely get that it is designed to be a version of makeArray augmented with access to another, auxiliary array, but I have no idea how you are meant to make use of that array. I'm particularly confused by the second argument. Would you be able to explain, or perhaps give an example?

Summary of breaking changes for v0.2

Before bumping the version and introducing breaking changes, here is the live list of all of them:

  • Remove Eq and Ord instances for Value #19
  • Stencil itself shouldn't care about border resolution #17
  • Rename rank -> dimensions #25

Bug in the convolution functions?

Here is a simple example:

    1 module Massiv where
    2 
    3 import Data.Massiv.Array as A
    4 
    5 xs :: A.Array A.U A.Ix1 Int
    6 xs = A.fromList A.Seq [1, 2, 3, 4] 
    7 
    8 ys :: A.Array A.U A.Ix1 Int
    9 ys = A.fromList A.Seq [1, 2, 3, 4, 5] 
   10 
   11 zs :: A.Array A.D A.Ix1 Int
   12 zs = A.range A.Seq 20001 30000
   13 
   14 stencil :: A.Stencil A.Ix1 Int Int
   15 stencil = makeConvolutionStencilFromKernel xs
   16 
   17 result :: Array U Ix1 Int
   18 result = A.computeAs A.U (A.mapStencil (A.Fill 0) stencil ys)

result comes out as (Array U Seq (5) [ 10,20,30,4510889970,31 ]). If I swap xs and ys on lines 15 and 18, I get (Array U Seq (4) [ 10,20,30,34 ]).

Perhaps with appropriate zero padding, I was expecting [1, 4, 10, 20, 30, 34, 31, 20]. Some of this is clearly down to the fact that the built-in stencil-creation functions are centred with image filters in mind rather than, e.g., digital signals / FIR. But the 4510889970 looks odd either way.

Performance degradation with Storable

https://github.com/mkawalec/raytracer/blob/master/src/Lib.hs#L105 is a very hot array, so I hoped that changing it to S would bring a significant speedup. However, after making it Storable, I noticed in Core that GHC stopped unpacking virtually all the parameters, and because of constant boxing and unboxing at every call boundary the code is slightly slower than with the Boxed representation. I've played with the SpecConstr switches, but I didn't notice any change whatsoever from tuning them. Do you know what could cause the optimizer to trip up when Storable is used?

In Core, the code goes from

$w$scolor
  :: Double#
     -> Double#
     -> Double#
     -> Double#
     -> Double#
     -> Double#
     -> Int#
     -> Array# Sphere
     -> PCG32
     -> Int
     -> (# Double#, Double#, Double# #)
$w$scolor
  = \ (ww :: Double#)
      (ww1 :: Double#)
      (ww2 :: Double#)
      (ww3 :: Double#)
      (ww4 :: Double#)
      (ww5 :: Double#)
      (ww6 :: Int#)
      (ww7 :: Array# Sphere)
      (w :: PCG32)
      (w1 :: Int) ->
      let {
        a :: Double#
        a = +## (+## (*## ww3 ww3) (*## ww4 ww4)) (*## ww5 ww5) } in

to

$scolor :: Ray -> World Sphere -> PCG32 -> Int -> V3
$scolor
  = \ (eta :: Ray)
      (eta1 :: World Sphere)
      (eta2 :: PCG32)
      (eta3 :: Int) ->
      case eta of { Ray dt1 dt2 dt3 dt4 dt5 dt6 ->
      case eta1 `cast` <Co:5> of nt12 { SArray ipv ipv1 ipv2 ->
      case ipv1 of { I# y ->
      case ipv2 of { Vector dt7 dt8 dt9 ->

after switching the Array to Storable

Support for Sparse matrices

I believe support for sparse matrices is almost a requirement for an array library, so they need to be implemented in massiv as well.

With the little bit of research that I have done in this area, so far I envision it as another array representation:

data CSR r = CSR r

data instance Array (CSR r) Ix2 e = CSRArray
  { csrSize :: Ix2
  , csrIA :: ByteArray
  , csrJA :: ByteArray
  , csrData :: Array r Ix1 e }

Or something along these lines. A CSC representation could be described similarly.
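
To make the intended layout concrete, here is a hypothetical lookup for such a CSRArray (not an existing massiv function, written against the current two-parameter Manifest class; the field semantics are my assumptions): csrIA holds one Int offset per row plus a final end offset, csrJA holds the column index of every stored element, and csrData holds the values themselves.

import Data.Massiv.Array as A
import Data.Primitive.ByteArray (indexByteArray)

csrIndex :: (Manifest r e, Num e) => Array (CSR r) Ix2 e -> Ix2 -> e
csrIndex (CSRArray _ ia ja vals) (i :. j) = go (indexByteArray ia i)
  where
    end = indexByteArray ia (i + 1) :: Int
    go k
      | k >= end                          = 0         -- entry not stored, so it is zero
      | indexByteArray ja k == (j :: Int) = vals ! k  -- column matches, return the value
      | otherwise                         = go (k + 1)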

Performance regression in ghc-8.6

There seems to be a 2x-4x regression in performance when programs using massiv are compiled with ghc-8.6. It looks like this issue has to do with a bug in ghc #16004.

I'd like to record it here just in case anyone stumbles upon it, like I did. Two ways to avoid it are:

  • use another version of ghc :)
  • compile with -fno-full-laziness for ghc-8.6 (a per-module sketch of this follows below)
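
The per-module form of the second workaround might look like this (assuming, on my part, that applying it only to the hot modules is enough):

{-# OPTIONS_GHC -fno-full-laziness #-}

-- Hypothetical hot module compiled with full laziness disabled to dodge the
-- ghc-8.6 regression; the rest of the project keeps its default flags.
module HotPath where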
