
bodigrim / chimera


Lazy infinite compact streams with cache-friendly O(1) indexing and applications for memoization

Home Page: http://hackage.haskell.org/package/chimera

License: BSD 3-Clause "New" or "Revised" License

Languages: Haskell 99.49%, C 0.51%
Topics: memoize, recursive-functions, lazy-streams, infinite-stream, memoization, dynamic-programming

chimera

Introduction

Imagine having a function f :: Word -> a, which is expensive to evaluate. We would like to memoize it, obtaining g :: Word -> a, which computes the same values but transparently caches results to speed up repeated evaluation.

There are plenty of memoizing libraries on Hackage, but they usually fall into two categories:

  • Store the cache as a flat array, enabling us to obtain cached values in O(1) time, which is nice. The drawback is that one must specify the size of the array beforehand, which limits the range of inputs, and allocate it all at once.

  • Store the cache as a lazy binary tree. Thanks to laziness, one can freely use the full range of inputs. The drawback is that obtaining values from a tree takes logarithmic time and is unfriendly to the CPU cache, which rather defeats the purpose.
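To make the first category concrete, here is a minimal sketch of a flat-array memoizer. It assumes only the array package that ships with GHC; memoFixArray is a hypothetical name, not an API of chimera or of any package mentioned here. The table covers the inputs 0 .. bound, which must be fixed before any input is seen and is allocated in one go; inputs above the bound are still answered, but only the recursive calls that land inside the table are served from the cache.

```haskell
import Data.Array (listArray, (!))

-- Hypothetical helper (not part of chimera): memoize a function given in
-- unfixed form, caching only the inputs 0 .. bound in a flat array.
memoFixArray :: Word -> ((Word -> a) -> Word -> a) -> Word -> a
memoFixArray bound f = g
  where
    -- The whole table is allocated up front; each cell is a lazy thunk
    -- that is computed at most once, on first access.
    table = listArray (0, bound) [f g i | i <- [0 .. bound]]
    -- Inside the bound we get O(1) cached lookups; outside it the value
    -- is recomputed on every call, which is the drawback described above.
    g n
      | n <= bound = table ! n
      | otherwise  = f g n
```

For instance, memoFixArray 200 applied to a Fibonacci step function answers fib 100 instantly, while fib 201 is computed without caching at its own level (its recursive calls at 200 and 199 still hit the table).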

This package intends to tackle both issues, providing a data type Chimera for lazy infinite compact streams with cache-friendly O(1) indexing.

Additional features include:

  • memoization of recursive functions and recurrent sequences,
  • memoization of functions of several, possibly signed arguments,
  • efficient memoization of boolean predicates.

Example 1

Consider the following predicate:

isOdd :: Word -> Bool
isOdd n = if n == 0 then False else not (isOdd (n - 1))

Its computation is expensive, so we'd like to memoize it:

isOdd' :: Word -> Bool
isOdd' = memoize isOdd

This is enough to avoid re-evaluation for the same arguments. But isOdd does not use this cache internally: each call still recurses all the way down to n = 0. We can do better if we rewrite isOdd as a fixed point of isOddF:

isOddF :: (Word -> Bool) -> Word -> Bool
isOddF f n = if n == 0 then False else not (f (n - 1))

and invoke memoizeFix to pass the cache into recursive calls as well:

isOdd' :: Word -> Bool
isOdd' = memoizeFix isOddF
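To see what passing the cache into recursive calls means, here is a base-only sketch in which a lazy infinite list plays the role of the cache. memoFixList is a hypothetical stand-in, not chimera's implementation: indexing a list costs O(n), precisely the cost Chimera is designed to avoid. The knot-tying idea is the point: the step function isOddF receives the memoized g, so its recursive calls consult the cache as well, while fix isOddF recovers the original unmemoized isOdd.

```haskell
import Data.Function (fix)

-- The unfixed step function from the text.
isOddF :: (Word -> Bool) -> Word -> Bool
isOddF f n = if n == 0 then False else not (f (n - 1))

-- Tying the knot with 'fix' gives back the plain recursive isOdd.
isOdd :: Word -> Bool
isOdd = fix isOddF

-- Hypothetical stand-in for memoizeFix: cache results in a lazy list and
-- hand the cached function g back to the step function f.
memoFixList :: ((Word -> a) -> Word -> a) -> Word -> a
memoFixList f = g
  where
    cache = map (f g) [0 ..]          -- element n is f g n, computed at most once
    g n = cache !! fromIntegral n     -- O(n) lookup, unlike Chimera's O(1)

isOdd' :: Word -> Bool
isOdd' = memoFixList isOddF
```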

Example 2

Define a predicate that checks whether its argument is a prime number, using trial division:

isPrime :: Word -> Bool
isPrime n = n > 1 && and [ n `rem` d /= 0 | d <- [2 .. floor (sqrt (fromIntegral n))], isPrime d]

This is certainly an expensive recursive computation, and we would like to speed up its evaluation by wrapping it in a caching layer. Convert the predicate to an unfixed form such that isPrime = fix isPrimeF:

isPrimeF :: (Word -> Bool) -> Word -> Bool
isPrimeF f n = n > 1 && and [ n `rem` d /= 0 | d <- [2 .. floor (sqrt (fromIntegral n))], f d]

Now create its memoized version for rapid evaluation:

isPrime' :: Word -> Bool
isPrime' = memoizeFix isPrimeF
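The identity isPrime = fix isPrimeF can be checked directly with Data.Function.fix, before any memoization is involved. This sketch introduces a hypothetical isqrt helper that pins the floating-point type to Double (the original expression relies on type defaulting); it is an illustration, not part of chimera.

```haskell
import Data.Function (fix)

-- Hypothetical helper: floor of the square root, computed via Double.
isqrt :: Word -> Word
isqrt n = floor (sqrt (fromIntegral n :: Double))

-- Direct recursive definition: trial division by primes up to sqrt n.
isPrime :: Word -> Bool
isPrime n = n > 1 && and [n `rem` d /= 0 | d <- [2 .. isqrt n], isPrime d]

-- Unfixed form: the recursive primality test goes through the parameter f.
isPrimeF :: (Word -> Bool) -> Word -> Bool
isPrimeF f n = n > 1 && and [n `rem` d /= 0 | d <- [2 .. isqrt n], f d]
```

Since fix isPrimeF and isPrime agree, swapping fix for memoizeFix is the only change needed to add caching.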

Example 3

No manual on memoization is complete without Fibonacci numbers:

fibo :: Word -> Integer
fibo = memoizeFix $ \f n -> if n < 2 then toInteger n else f (n - 1) + f (n - 2)

No cleverness involved: just write a recursive function and let memoizeFix take care of everything else:

> fibo 100
354224848179261915075

What about non-Word arguments?

Chimera itself can memoize only Word -> a functions, which sounds restrictive. This is because we decided to outsource the enumeration of user datatypes to other packages, e.g., cantor-pairing. Use fromInteger . fromCantor to convert data to Word and toCantor . toInteger to go back.

Also, Data.Chimera.ContinuousMapping covers several simple cases, such as Int, pairs and triples.
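The general idea of enumerating a datatype into Word can be sketched with two classic bijections: a zigzag encoding of Int (0, -1, 1, -2, 2, ... map to 0, 1, 2, 3, 4, ...) and Cantor pairing of two naturals. All names below are hypothetical re-implementations for illustration; they are not the ContinuousMapping or cantor-pairing APIs and may use a different ordering. The pairing works on Integer to avoid overflow.

```haskell
-- Zigzag: non-negative i goes to 2i, negative i goes to 2(-i - 1) + 1.
intToWordZZ :: Int -> Word
intToWordZZ i
  | i >= 0    = 2 * fromIntegral i
  | otherwise = 2 * fromIntegral (negate (i + 1)) + 1

-- Inverse of intToWordZZ; even codes are non-negative, odd codes negative.
wordToIntZZ :: Word -> Int
wordToIntZZ w
  | even w    = fromIntegral (w `quot` 2)
  | otherwise = negate (fromIntegral (w `quot` 2)) - 1

-- Cantor pairing: walk the diagonals x + y = 0, 1, 2, ...
cantorPair :: Integer -> Integer -> Integer
cantorPair x y = (x + y) * (x + y + 1) `div` 2 + y

-- Inverse: recover the diagonal index w, then the offset y along it.
cantorUnpair :: Integer -> (Integer, Integer)
cantorUnpair z = (w - y, y)
  where
    w = (isqrtI (8 * z + 1) - 1) `div` 2
    y = z - w * (w + 1) `div` 2

-- Exact integer square root (floor) by Newton's method, for n >= 0.
isqrtI :: Integer -> Integer
isqrtI 0 = 0
isqrtI n = go n
  where
    go x = let x' = (x + n `div` x) `div` 2 in if x' >= x then x else go x'
```

To memoize h :: Int -> a under these assumptions, one would memoize h . wordToIntZZ over Word and precompose the result with intToWordZZ.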

Benchmarks

How important is it to store cached data as a flat array instead of a lazy binary tree? Let us find the number up to 1000000 with the longest Collatz sequence, memoizing the sequence length with chimera and with memoize, and compare the timings.

#!/usr/bin/env cabal
{- cabal:
build-depends: base, chimera, memoize, time
-}
{-# LANGUAGE TypeApplications #-}
import Data.Chimera
import Data.Function.Memoize
import Data.Ord
import Data.List
import Data.Time.Clock

collatzF :: Integral a => (a -> a) -> (a -> a)
collatzF f n = if n <= 1 then 0 else 1 + f (if even n then n `quot` 2 else 3 * n + 1)

measure :: (Integral a, Show a) => String -> (((a -> a) -> (a -> a)) -> (a -> a)) -> IO ()
measure name memo = do
  t0 <- getCurrentTime
  print $ maximumBy (comparing (memo collatzF)) [0..1000000]
  t1 <- getCurrentTime
  putStrLn $ name ++ " " ++ show (diffUTCTime t1 t0)

main :: IO ()
main = do
  measure "chimera" Data.Chimera.memoizeFix
  measure "memoize" (Data.Function.Memoize.memoFix @Int)

Here chimera appears to be 20x faster than memoize:

837799
chimera 0.428015s
837799
memoize 8.955953s

Magic and its exposure

Internally, Chimera is represented as a boxed vector of growing (possibly unboxed) vectors v a:

newtype Chimera v a = Chimera (Data.Vector.Vector (v a))

Assuming a 64-bit architecture, the outer vector consists of 65 inner vectors of sizes 1, 1, 2, 2², ..., 2⁶³, which add up to 2⁶⁴. Since the outer vector is boxed, inner vectors are allocated on demand only: quite fortunately, there is no need to allocate all 2⁶⁴ elements at once.

To access an element by its index, it is enough to find out which inner vector it belongs to. Thanks to the doubling pattern of sizes, this amounts to computing the bit length of the index, which a single count-leading-zeros instruction does instantly. The caveat is that accessing an inner vector for the first time causes its allocation, taking O(n) time. So to restore amortized O(1) time we must assume dense access: Chimera is no good for sparse access over a thin set of indices.
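The index arithmetic can be sketched as follows, assuming the segment layout just described (inner vector 0 holds index 0, and inner vector k ≥ 1 holds the 2^(k-1) indices 2^(k-1) .. 2^k - 1). locate is a hypothetical name and this is a model of the idea, not chimera's source.

```haskell
import Data.Bits (countLeadingZeros, finiteBitSize, shiftL)

-- Map a global index to (inner vector number, offset inside that vector).
locate :: Word -> (Int, Word)
locate i = (k, offset)
  where
    -- k is the bit length of i: 0 for i = 0, 64 for the upper half of Word.
    k = finiteBitSize i - countLeadingZeros i
    -- Subtract the first index held by inner vector k (nothing for k = 0).
    offset = if k == 0 then 0 else i - (1 `shiftL` (k - 1))
```

For example, indices 4 .. 7 all land in inner vector 3 at offsets 0 .. 3, and maxBound lands in inner vector 64, the last of the 65.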

One can argue that this structure is not infinite, because it cannot handle more than 2⁶⁴ elements. I believe that it is infinite enough and no one will be able to exhaust its finiteness any time soon. Strictly speaking, to cope with indices out of Word range and memoize the Ackermann function, one could use more layers of indirection, raising access time to O(log* n). I still think that it is morally correct to claim O(1) access, because all asymptotic estimates of data structures are usually made under the assumption that they contain fewer than maxBound :: Word elements (otherwise you cannot even treat pointers as fixed-size data).

Additional resources

Contributors

bodigrim, felixonmars, fybe, jberryman, pgujjula

chimera's Issues

iterate function

Implement an iterate function with semantics similar to Data.List.iterate, and its monadic analogue iterateM, for both Data.Chimera and Data.Chimera.Unboxed.

iterate :: U.Unbox a => (a -> a) -> a -> Data.Chimera.Unboxed.Chimera a
iterate f = runIdentity . iterateM (return . f)

iterateM :: (Monad m, U.Unbox a) => (a -> m a) -> a -> m (Data.Chimera.Unboxed.Chimera a)
iterateM = undefined

Recursive calls don't get memoized with memoizeFix when the recursive call uses higher numbers than the input.

The following script calculates the 50th Fibonacci number by recursing upwards instead of downwards, with the base cases being 49 and 50. I have added traceShow calls to see when recursive calls are not memoized.

import Data.Chimera (memoizeFix)
import Debug.Trace (traceShow)

fib50F :: (Word -> Word) -> Word -> Word
fib50F _ 50 = traceShow 50 0
fib50F _ 49 = traceShow 49 1
fib50F f n  = traceShow n $ f (n+1) + f (n+2)

fib50 :: Word -> Word
fib50 = memoizeFix fib50F

main :: IO ()
main = do
  print $ fib50 0

Ideally, each number between 0 and 50 would only be printed once, but they are printed many times. In fact, the above code never finishes on my machine, whereas the usual memoized implementation finishes immediately.

Benchmark Collatz sequence

Benchmark the Collatz sequence computation with chimera, memoize and other memoization packages, and include the results in the README.

Tighten bounds on primitive on Hackage

Building fails with:

[3 of 4] Compiling Data.Chimera     ( Data/Chimera.hs, /tmp/chimera-0.3.3.0/dist-newstyle/build/x86_64-linux/ghc-9.6.3/chimera-0.3.3.0/build/Data/Chimera.o, /tmp/chimera-0.3.3.0/dist-newstyle/build/x86_64-linux/ghc-9.6.3/chimera-0.3.3.0/build/Data/Chimera.dyn_o ) [Data.Primitive.Array package changed]

Data/Chimera.hs:386:20: error: [GHC-76037]
    Not in scope: ‘A.fromListN’
    NB: the module ‘Data.Primitive.Array’ does not export ‘fromListN’.
    Suggested fix:
      Perhaps use one of these:
        ‘G.fromListN’ (imported from Data.Vector.Generic),
        ‘U.fromListN’ (imported from Data.Vector.Unboxed),
        ‘V.fromListN’ (imported from Data.Vector)
    |
386 |   pure $ Chimera $ A.fromListN (bits + 1) (z : zs)
    |                    ^^^^^^^^^^^

Data/Chimera.hs:430:20: error: [GHC-76037]
    Not in scope: ‘A.fromListN’
    NB: the module ‘Data.Primitive.Array’ does not export ‘fromListN’.
    Suggested fix:
      Perhaps use one of these:
        ‘G.fromListN’ (imported from Data.Vector.Generic),
        ‘U.fromListN’ (imported from Data.Vector.Unboxed),
        ‘V.fromListN’ (imported from Data.Vector)
    |
430 |   pure $ Chimera $ A.fromListN (bits + 1) (z : zs)
    |                    ^^^^^^^^^^^

Data/Chimera.hs:504:31: error: [GHC-76037]
    Not in scope: ‘A.fromListN’
    NB: the module ‘Data.Primitive.Array’ does not export ‘fromListN’.
    Suggested fix:
      Perhaps use one of these:
        ‘G.fromListN’ (imported from Data.Vector.Generic),
        ‘U.fromListN’ (imported from Data.Vector.Unboxed),
        ‘V.fromListN’ (imported from Data.Vector)
    |
504 | fromListWithDef a = Chimera . A.fromListN (bits + 1) . go0
    |                               ^^^^^^^^^^^

Data/Chimera.hs:526:33: error: [GHC-76037]
    Not in scope: ‘A.fromListN’
    NB: the module ‘Data.Primitive.Array’ does not export ‘fromListN’.
    Suggested fix:
      Perhaps use one of these:
        ‘G.fromListN’ (imported from Data.Vector.Generic),
        ‘U.fromListN’ (imported from Data.Vector.Unboxed),
        ‘V.fromListN’ (imported from Data.Vector)
    |
526 | fromVectorWithDef a = Chimera . A.fromListN (bits + 1) . go0
    |                                 ^^^^^^^^^^^
Error: cabal: Failed to build chimera-0.3.3.0.

but everything works fine with

cabal build --constraint='primitive<0.9'

I think it needs a bounds revision!

Explore useful monads for tabulateM

Explore, and reflect in the documentation, which monads are useful (i.e. do not hang) with tabulateM, tabulateFixM, etc.

  • Identity obviously works.
  • Reader is fine.
  • A lazy Writer should be fine as well.
  • What about lazy State?
  • Gen from QuickCheck is, surprisingly enough, productive.

Is there a connection with MonadInterleave?

Lift vector actions

liftVecUnOpM
  :: (Monad m, G.Vector v a, G.Vector v b)
  => (Int -> v a -> m (v b))
  -> Chimera v a
  -> m (Chimera v b)
liftVecUnOpM = undefined

liftVecBinOpM
  :: (Monad m, G.Vector v a, G.Vector v b, G.Vector v c)
  => (Int -> v a -> v b -> m (v c))
  -> Chimera v a
  -> Chimera v b
  -> m (Chimera v c)
liftVecBinOpM = undefined

`fromInfinite` errors

Running the following example:

naturals :: UChimera Int
naturals = fromInfinite (0...)

main :: IO ()
main = print (index naturals 0)

results in the error

Data.Primitive.Array.fromListN: list length greater than specified size
CallStack (from HasCallStack):
  error, called at ./Data/Primitive/Array.hs:373:19 in primitive-0.8.0.0-EBSr4O8W5US1YYODUpLUQc:Data.Primitive.Array

Can you confirm whether you can reproduce this as well, @Bodigrim? I can prepare a PR for this one too. Thanks!

Remove instance Foldable

Or is it a bad idea, given that most of the folds diverge? On the other hand, we already provide instance Foldable.

Add iterateWithIndex

Equivalent to

iterateWithIndex :: G.Vector v a => (Word -> a -> a) -> a -> Ch.Chimera v a
iterateWithIndex f seed = Ch.unfoldr (\(ix, a) -> let a' = f (ix + 1) a in (a, (ix + 1, a'))) (0, seed)
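The intended semantics can be checked on ordinary lists. The sketch below is a hypothetical list analogue (iterateWithIndexList is not a chimera function) that mirrors the unfoldr-based definition above, using Data.List.unfoldr, which wraps each step in Just: element 0 is the seed, and element k + 1 is f (k + 1) applied to element k.

```haskell
import Data.List (unfoldr)

-- Hypothetical list analogue of the proposed iterateWithIndex.
iterateWithIndexList :: (Word -> a -> a) -> a -> [a]
iterateWithIndexList f seed =
  -- The state is (index of the current element, current element);
  -- emit the current element and advance both parts of the state.
  unfoldr (\(ix, a) -> let a' = f (ix + 1) a in Just (a, (ix + 1, a'))) (0, seed)
```

For example, iterateWithIndexList (+) 0 yields the running sums 0, 1, 3, 6, 10, ...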

Benchmark build failure

Fails on Stackage; log below:

Building benchmark 'bench' for bit-stream-0.1.0.1..
[1 of 1] Compiling Main             ( bench/Bench.hs, dist/build/bench/bench-tmp/Main.o )

bench/Bench.hs:26:23: error:
    Variable not in scope: toIdx :: Word -> Word
   |
26 |   , doBench "toIdx" $ toIdx
   |                       ^^^^^

cycle function

Implement a cycle function with semantics similar to Data.List.cycle for both Data.Chimera and Data.Chimera.Unboxed. A wrapper around tabulate should suffice.

cycle :: Vector a -> Chimera a 
cycle = undefined
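The function such a tabulate wrapper would need can be modelled on its own: element i of the result is element (i mod n) of an input of length n. The sketch below is a hypothetical model (cycleIndex is not a chimera function) that uses a list for simplicity, so its lookup is O(n); a real implementation would index a vector in O(1). Like Data.List.cycle, it fails on empty input.

```haskell
-- Hypothetical model of the proposed cycle: the i-th element of the
-- infinite cycle of xs, as a plain function of the index.
cycleIndex :: [a] -> Word -> a
cycleIndex [] _ = error "cycleIndex: empty input has no cycle"
cycleIndex xs i = xs !! fromIntegral (i `rem` fromIntegral (length xs))
```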

`fromListWithDef` diverges

The following example diverges with ghc-9.8.2 and ghc-9.6.4:

naturals :: UChimera Int
naturals = fromListWithDef 0 [0..]

main :: IO ()
main = print (index naturals 0)

If you can reproduce this, @Bodigrim, I can investigate and prepare a PR.

drop function

Drop the first n elements of a Chimera, effectively offsetting indices.
