feldspar / feldspar-language Goto Github PK

The goal of the Feldspar project is to define a high-level language that allows description of high-performance digital signal processing algorithms.

License: Other

Haskell 84.63% C 15.37%

feldspar-language's Issues

Introduce a table combinator

Implement a function for constructing dynamic tables:

table :: Syntax a => [a] -> Data [Internal a]

This is needed for making lookup tables, and it is also needed by the Repa library.

zipWithk implementations on vectors should fold left instead of right

minmintest :: Vector1 IntN -> Vector1 IntN -> Vector1 IntN
minmintest lx lz = zipWith3 (\x y z -> 7) lforward lforward lx
where
lforward = replicate (i2n (lx ! 0)) (zipWith (+) lx lz)

printExpr minmintest
(\var0 -> (\var1 -> (letBind (i2n (var0 ! 0)) (\var3 -> (parallel (min var3 (min var3 (getLength var0))) (\var2 -> 7))))))

Changing Vector to use Prelude.foldl1 instead of Prelude.foldr1 gives:

printExpr minmintest
(\var0 -> (\var1 -> (parallel (min (i2n (var0 ! 0)) (getLength var0)) (\var2 -> 7))))

System monad

As seen in #122, I'm working on adding support for system calls to Feldspar.

There are some issues related to this that I would like to discuss.

Which monad?

I'm currently (ab)using the M monad for system calls which is not so good given that M is for mutable data structures and that it has a run function. A better solution seems to be to define a new Sys monad as a newtype wrapper. The core language would use the same monad for both Sys and M, but the interfaces would be different:

Sys calls are only allowed in Sys
runMutable only works for M

The only problem I see with this is that one would have to redefine the mutable data structures for the Sys monad, since we need references and arrays also at the system level.

Another option might be to just have a single monad, and have a static analysis that checks for sys calls under runMutable.

Semantics

foreignCall requires a semantic function to be provided. But I think it will be hard to model all sys calls as Haskell functions. So I suggest that we simply leave the semantics undefined for sys calls (when there is no sensible semantics to give).

Here is my motivation: Sys calls will be used to implement various system abstractions in Feldspar. These abstractions should be implemented as shallow or deep EDSLs on top of Feldspar, but their implementation should not rely on sys calls. Instead, we will provide one or more compilation functions that translate the EDSL to Feldspar programs with sys calls. But when evaluating the EDSL, we should not use the compilation function but rather a specialized evaluator for the EDSL (that calls Feldspar's eval).

As an example, the implementation of Stream does not use sys calls, but one can imagine a compilation function that translates a stream into a loop in the Sys monad that uses sys calls write out the output stream.

Missing optimizations for min of same variable modulo arithmetic

I used splitAt and got

(min var3 (shiftR var3 1))

in the output from printExpr. I was about to add an rewrite rule but I get a cyclic import problem:

Module imports form a cycle:
         module `Feldspar.Core.Constructs.Logic' (work/feldspar-language/src/Feldspar/Core/Constructs/Logic.hs)
        imports `Feldspar.Core.Constructs.Ord' (work/feldspar-language/src/Feldspar/Core/Constructs/Ord.hs)
  which imports `Feldspar.Core.Constructs.Bits' (work/feldspar-language/src/Feldspar/Core/Constructs/Bits.hs)
  which imports `Feldspar.Core.Constructs.Logic' (work/feldspar-language/src/Feldspar/Core/Constructs/Logic.hs)

Can the problem be avoided easily or is a major refactoring required?

Introduce a select combinator

Introduce a select combinator which is rendered as a multiple choice in the backend (e.g switch statement in C)

-- | Multiple choice. The Index in the first argument is used to select a result from the
--    key-value list in the second argument. The third argument provides a default result if
--    the key is nor present.
select :: Syntax a => Data Index -> [(Index, a)] -> a -> a

Implement virtual tuples

See this proposal.

See also the issue for feldspar-compiler.

Speed up compilation of programs with a lot of sharing

Currently we do optimizations before running code motion, which means that optimizations are not aware of any sharing in the program. The syntax tree without sharing can be exponentially larger than the tree with sharing, which causes a blow-up in compilation time for certain programs.

Here is an example of a program that takes quite a long time to process despite being quite small after code motion:

f as = force (bs ++ cs)
  where
    (bs,cs) = splitAt (length as `div` 2) as

prog1 :: Vector1 Index -> Vector1 Index
prog1 as = Prelude.iterate f as Prelude.!! 3

test1 = drawAST prog1

Here is the same program with explicit sharing:

iterate :: Syntax a => (a -> a) -> a -> [a]
iterate f a = a : iterate f (share a f)

prog2 :: Vector1 Index -> Vector1 Index
prog2 as = iterate f as Prelude.!! 3

test2 = drawAST prog2

test2 terminates instantly.

One possibility would be to run optimizations after code motion, but that wouldn't work because our optimizations don't work across let bindings.

We need some kind of graph representation of the syntax (possibly using a special kind of let binding), and adapt the optimizations to work on this representation.

A related issue is that code motion itself is slow because it doesn't detect any sharing in the original program. This should be fixed in the Syntactic package.

Deprecate Feldspar.Matrix?

Are we ready to deprecate Feldspar.Matrix in favour of Feldspar.Repa?

(The Mul class etc. may still be useful.)

(@josefs)

Add LANGUAGE pragmas as necessary to modules

It is not possible to build the modules outside cabal since they do not have the language pragmas

Add a README.md to the repo

Add a `String` argument to `save`

This could be used in the back end to identify the saved value (e.g. used as the variable name).

Bug in rangeRem

QuickCheck reports the following bug:
prop_rangeRem Int8: FAIL
*** Failed! Falsifiable (after 60 tests and 1 shrink):
Range {lowerBound = 124, upperBound = 127}
Range {lowerBound = -128, upperBound = -1}
127
-128

It is indeed a bug in rangeRem.

User's guide by examples

Make a collection of commented example files to serve as a Feldspar user's guide.

Each file should be relatively self-contained and focus on a single topic.

Topic suggestions:

Basics (tiny examples + evaluation + syntax tree rendering)
Loops
Basics of the vector library
Size constraints
Advanced examples

Unexpected undefined

> eval $ case Indexed 0 (\i -> err "sdf") Empty :: Matrix Index of Indexed l ixf Empty -> (ixf 0) 
*** Exception: Prelude.undefined

Expected result

*** Exception: sdf

Due to constant folding?

How to generate system calls?

In the system-core branch I've added a construct to do foreign import of system calls. For example:

alert :: Data WordN -> M ()
alert = foreignCall "alert" print

How should I represent these calls in the middle end? @pjonsson

Call seems like a good candidate, but it seems to have to do with forking, so I'm not sure if that would be abuse.

It seems wrong to use ForeignImport, because it's handled like an expression in the back end, and the only way to distinguish a system call from a pure function call would be to look at the return type.

Another option is of course to add a new construct in the middle end.

Minor note: foreignCall currently allows any monadic return type, but it could easily be restricted to M () if that's more appropriate for a sys call.

Generalize types of polymorphic functions

For example:

getIx :: Syntax a => Data [Internal a] -> Data Index -> a
setIx :: Syntax a => Data [Internal a] -> Data Index -> a -> Data [Internal a]

Is this a good idea?

Constant folding prevents fusion

*Main> printExpr $ \a -> parallel 10 (*2) ! a
(\var0 -> ([0,2,4,6,8,10,12,14,16,18] ! var0))

Expected result:

*Main> printExpr $ \a -> parallel 10 (*2) ! a
(\var0 -> (var0 * 2))

Make div a primitive again

div is currently defined in terms of /. When Feldspar/feldspar-compiler#16 has been fixed, it can be made a primitive again.

Cyclic imports when trying to optimize index calculations

Multidimensional vectors leave index calculations in the code, and some of them are not necessary. Here's one example:

divmodtest :: Data Index -> Data Index -> Data Index
divmodtest x y = x `rem` y + y * (x `quot` y)

I have the rule for simplification implemented, but Num.hs can not import Integral.hs without creating a cyclic import. The offender is the second rule in Integral.hs, which optimizes x quot (y^2k).

Short term: is that an important rule, or can we disable it until later?

Longer term: Can we solve the cyclic import issues for numbers by simply putting the number-related things, except for maybe complex numbers into the same module?

`store` doesn't

I have a program which tries to store an intermediate vector to memory. But the vector is still being fused.

f :: Manifest DIM1 (Data Double) -> Manifest DIM1 (Data Double)
f = store . s1 . store. s1
s1 vs = zipWith avg vs (tail vs)
avg a b = (a + b) / 2

The generated code for f is the same (modulo variable names) as if the middle store in between the two calls to s1 wasn't there.

Improve simplification of arithmetic

The current implementation is quite incomplete and it only deals with merging literals, so it can't cancel out variables; e.g.

(x+x)-x  ===>  x

It would be better to optimize a whole arithmetic expression at once. Gather all variables in one list, all literals in one list and all non-arithmetic sub-terms in one list. Then make a new optimized expression by combining the three lists.

However, doing this compositionally will probably lead to a lot of re-traversals of the same sub-terms, so the optimization framework will probably have to be modified so that arithmetic optimization only happens at feasible places (i.e. arithmetic sub-terms whose parents are not arithmetic expressions).

We should also think about how to deal with examples like this one (reported by Gergely):

> icompile (\(n::Data Word8) -> (n-2)+1)
...
void test(uint8_t v0, uint8_t * out)
{
    (* out) = (v0 + 255);
}

The result is correct, but not ideal from a readability point of view.

Proper handling of DefaultWord/DefaultInt

Right now, these are effectively 32 bit numbers. They should really be target-dependent.

Overwrapping (orphan) instances

I tried to use feldspar today, but it was not cabal install-able because of the following problem.

src/Feldspar/Core/Types.hs:257:10:
    Duplicate instance declarations:
      instance Show (IORef a)
        -- Defined at src/Feldspar/Core/Types.hs:257:10
      instance Show (IORef a)
        -- Defined in `monad-par-0.3.4.1:Control.Monad.Par.Scheds.DirectInternal'
Failed to install feldspar-language-0.6.0.2

Best,

Takayuki

Internal representation of MultiDim vectors is too general

When a MultiDim vector is converted to its internal representation, type information is lost.

Internal (Pull DIM2) a

becomes

([Length],[Internal a])

The number of dimensions is converted into a runtime property and the information is lost at the type level.

Among other things, this makes it impossible to write a good Arbitrary instance that will generate only arrays with the correct number of dimensions.

Can we make the internal representation an explicit (new)type instead of the pair and encode the dimensions so that the number of dimensions is still available in the type.

cc @josefs

Reenable fixed-point iterations in optimizations

See the TODO in the optimization of ForLoop.

Should constant folding respect `save`?

If so, this could be done by adding a field to Info saying whether or not a node contains save.

Optimize remainder by a constant

From the old trac:

We currently optimize division by a constant, but not remainder by a constant.

Generate asserts for undefined behavior

Feldspar has a few corners with undefined semantics. These are the ones I can think of:

undef and err
getIx and setIx
setLength can be used to make an array longer, which semantically amounts to appending undefined elements.
Size constraints are guarantees provided by the user. If the guarantees are not respected, the result is undefined.

Some undefined programs result in errors in eval, but invalid size annotations will instead result in a bogus value. The C code usually produces a bogus value for undefined programs, but it can probably also crash with a segfault.

It would be nice to have a safe compilation mode where all undefined behavior would be caught by assert/abort in the C code.

Question: Is it an error to use setLength to make an array longer? If yes, this error could easily be caught by an assert. If no, I don't think it's possible to catch errors from indexing uninitialized elements. It would require keeping track of exactly which elements of an array are initialized and which are not. This ultimately requires storing an extra Boolean for each element.

A related problem is when eval has well-defined semantics, but the C code has not. This is the case for e.g. [http://blog.llvm.org/2011/05/what-every-c-programmer-should-know.html signed integer overflow]. Should this be caught by an assert as well?

Optimize division by a constant which is not a power of two

From the old trac:

Currently we only optimize division by a constant when it is a power of two. This is the simple case; division can be replaced by a right shift. But it is also possible to optimize the general case.

Remove target-awareness

Interpretation.hs currently has hard coded compilation targets. This is non-modular. A better solution would be to parameterize optimization on the set of rewrite rules and let the back end decide which set to use.

Add support for rewrites in `targetSpecialization`

Add support for rewriting compiler known functions before optimizations occur, see Feldspar/feldspar-compiler#101.

(?) should be curried

...to make nesting easier, see this page.

Any objections?

Re-optimize after the call to codeMotion

I've been trying to optimize doubly-nested let-bindings and my rewrite rules did not fire. Here's one test case:

stestL2 :: Data Index -> Data Length -> Data [[Index]]
stestL2 m x = parallel x (\x1 -> let z = let y = x `mod` m in (y, y) in parallel 2 (\x -> fst z))

Gives the result:

*Main> printExpr stestL2
(\var0 -> (\var1 -> (letBind (letBind (rem var1 var0) (\var4 -> (parallel 2 (\var3 -> var4)))) (\var5 -> (parallel var1 (\var2 -> var5))))))

Turns out that the let-bindings are introduced by codeMotion and never passed through the rewrite rules which explains why my rules never matched.

I would prefer to get the nesting of the let-bindings flattened (and the let-bindings floated after that), but there's currently no hook after codeMotion is performed in feldspar-language. Could we re-run the optimize function after codeMotion, or alternatively add another pass that I can hook into?

Keep size of function argument

See this issue in the compiler.

Code motion makes expressions travel in tame

Synopsis

The code motion strives to push shared expressions as high up as possible in the syntax tree. When combined with the FUTURE feature some expressions will "travel in time" and execute now instead of in the future.

Motivating example

import qualified Prelude

import Feldspar
import Feldspar.Vector
import Feldspar.Algorithm.FFT

pmap f = map await . force . map (future . f)

p0 = pmap (fft . ifft)

In p0, we expect the syntax tree to consist of two major parts; the fused application of fft . ifft wrapped in a future, and a loop await-ing the results.
However, we end up with a tree where the ifft is performed before spawning any future expressions.

Problem

The code motion tries to move (hoist) shared expressions as high up as possible in the syntax tree to maximize sharing. An expression is eligible for hoisting if the outermost symbol allows sharing and the expression is used at least twice.

In our example the resulting vector of the ifft is used several times inside the fft (both for its length and its values) and thus it is selected for sharing.

Now, the code motion does not know about the future construct and thus the expression is lifted above the future in the syntax tree. This lifting makes it difficult for a programmer to predict which code will run in parallel and which will be computed serially before the parallel parts of the program.

Solution

A fix is provided in Feldspar/feldspar-language@ 7bac736.
The fix depends on unreleased changes to the syntactic library.

With the fix it is possible to disallow sharing through a symbol, i.e. make hoistOver return False.

Update tutorial

Some files in the tutorial don't build anymore (due to the multidimensional arrays). Reenable the tutorial test suite once this has been fixed.

Break out Range into a separate package

The Range module is usable outside Feldspar

Semantics of `Let`

Let is intended to be strict (at least that's how it's compiled to C), but eval gives non-strict semantics.

Refactor vector library

Break Vector into SimpleVector and Vector. SimpleVector would be single-segment vectors, and a Vector could be defined as

type Vector a = [SimpleVector a]

Improve precision of range propagation through 'xor' with signed operands

Range propagation defaults to universal for signed operands. Universal is of course a correct upper bound of the range, but it is not tight.

Function for weakening size

Add a function like cap that has "union" semantics instead of "intersection". This can be used e.g. to increase the size of the initial state of a for loop to make sure that the fixed-point is reached quickly.

Not meaningful until #15 is fixed.

Monotonic/Crease-handling

cc @emwap @emilaxelsson

The history of this ticket is that #38 exposed some expressions that should be optimized, for example min var3 (var3 >> 1). That optimization was checked in with the class Monotonic. Commit a23d9a1 renamed Monotonic to Crease, but that is not a great name. The optimization rule before that commit reads:

 constructFeatOpt _ (C' Min) (a :* b :* Nil)
        | as <- viewMonotonicDec a
        , any (alphaEq b) as = return a

I am convinced about the correctness of the optimization suggested in #38. Revisiting this generalization makes me a bit worried that the API is suggestive for making incorrect rewrite rules that will show upp in transitive uses, but I don't have a counter example at hand. I will try to construct a counter example this week.

The second issue is the name of the class. Aren't we really trying to Ord functions and their output?

f x < x

foreignCall and then-statements

I had a closer look at the problem mentioned in #122 and that needs a bit of deeper surgery. Tracking progress in this ticket.

Reject invalid uses of `runMutable`

It is possible to use runMutable in an unsafe way. We decided no to use the "ST trick" to enforce safe use in order to keep the types simple.

It should be possible to make a static analysis that discovers any unsafe use of runMutable. The downside compared to a type-based approach is that a static analysis won't be able to point to a line number when a program is rejected.

Add test for range propagation over shift operators

There is no test case for range propagation over shift operators. Also, the implementation should be updated to propagate over signed operands.

This blocks #13.

enumFromTo in MultiDim is faulty

The first pattern match in enumFromTo in MultiDim is faulty:

Main> eval $ length $ (0...(7::Data Index))
6
*Main> eval $ length $ (1...(8::Data Index))
8
*Main> eval $ (0...(7::Data Index))
([6],[0,1,2,3,4,5])
*Main> eval $ (1...(8::Data Index))
([8],[1,2,3,4,5,6,7,8])

printExprUnOpt hangs on parallel 10 (*2)

cc @emilaxelsson

Both printExpr and printExprUnOpt hangs on the following for me:

*Feldspar> printExprUnOpt $ parallel 10 (*2)

Tried going back a couple of months in the repository to find a working point but they all seem to hang on printExpr. Anything obvious that pops out?

Reject programs with over-constrained sizes

Optimization should throw an error when the size of an expression is over-constrained. It can only happen if there's a bug in the general size inference, or if the user has stated invalid size constraints. In both cases it may lead to incorrect optimizations, so throwing an error seems preferable.

Singleton ranges and literals

optimizeM should turn any expression with a singleton range into a literal. That way, optimizations that match on literals won't have to check for singleton ranges as well.