gregorycollins / hashtables Goto Github PK

View Code? Open in Web Editor NEW

125.0 125.0 36.0 313 KB

Mutable hash tables for Haskell, in the ST monad

License: BSD 3-Clause "New" or "Revised" License

Haskell 90.21% Makefile 0.12% C 9.60% Shell 0.07%

hashtables's People

Contributors

Stargazers

Watchers

hashtables's Issues

Add support for checking the table to be empty

Basic.lookup hangs

seems like 'forwardSearch2' is looping infinitely, both the default C version and in portable mode.

Test system is Mac OS X 10.7.2, Core 2 Duo, ghc-7.2.1 (64-bit).
I can no longer confirm, but I think I had the same problem under 10.6.7 and ghc-7.0.4 (64-bit) as well.

Any ideas what could trigger this? What sort of information will be useful for debugging?

Support for hashable-1.3.0.0

hashable-1.3.0.0 is released. Could you update hashtable to support that?

Why is performance a third of hash tables in other languages?

Some years ago I note HashTables based solutions in Haskell didn't compare well, performance wise, with the same solution in Python. For example, a trivial insert of millions of elements:

{-# LANGUAGE GeneralizedNewtypeDeriving #-}
import Data.HashTable.IO as H
import qualified Data.ByteString as BS
import Data.Hashable

newtype Bad = B { unB :: Int }  deriving (Show, Enum, Ord, Eq)
instance Hashable Bad where
    hash (B i) = i
    hashWithSalt _ (B i) = i

main ::  IO ()
main = do
    ht <- H.new :: IO (H.BasicHashTable Bad BS.ByteString)
    let go (B 63000001) = pure ()
        go n = H.insert ht n BS.empty >> go (B $ unB n + 1)
    go (B 1)

In the above we're just inserting the pairs [ (i, empty) | i <- [1 .. 63000000] ]

Notice I've made the "hash" match python's integer hash (mostly, ignoring what they do with negatives) though this didn't have a meaninful performance impact.

The performance difference between this snippet and the matching python is about 3x (30 seconds for Haskell with ghc 8.4.1 -O2 -fllvm -optl-march=native, 8 or 10 seconds for python or pypy). Has there been an exploration on this front? Is this mostly an issue with creating (and collecting) intermediate values such as HashTable_?

Performance gap between 1.2.4.2 and 1.3

For the Agda code, @nad observed a performance increase from hashtables 1.2.4.2 to 1.3, see:

agda/agda#5966 (comment)

He reports that this fixes a performance regression for him, but does not pin down where the regression happened.

Also, it seems that vector-hashtables outperforms hashtables by a slight margin, so we are switching.

I thought I let you know, possible actions could be:

Document more clearly the quite substantial performance gap.
Investigate its reasons.
Point to vector-hashtables.

Missing instance Eq (HashTable s k v)

You can compare STRefs for equality, so you can have an Eq instance that checks to see if two HashTable s k v are the same hashtable. I wound up needing this, but can't write the isntance from outside of the package.

The instance does the only thing such an instance could do. It obviously can't compare the contents, and works just like when you compare STRefs or IORefs for equality.

The performance degrades as the table grows

Following is a benchmark that outlines the problem boldly: inserting 100 000 rows into a single table takes at least twice as long as inserting the same amount of rows into 100 tables. This means that a table with 100 000 rows performs two times worse than a table with 1000 rows. This must mean that something is leaking.

import Control.Monad
import Criterion.Main
import qualified Data.HashTable.IO as H

main =
  defaultMain
  [
    insertingBench 100000 1,
    insertingBench 1000 100
  ]

type RowsAmount = Int
type TablesAmount = Int

insertingBench :: RowsAmount -> TablesAmount -> Benchmark
insertingBench rows tables =
  bench ("Inserting " ++ show rows ++ " rows into " ++ show tables ++ " tables")
        io
  where
    io = 
      replicateM tables $ do
        t <- H.new :: IO (H.BasicHashTable Int ())
        forM_ [0..rows] $ \i -> H.insert t i ()

Results:

benchmarking Inserting 100000 rows into 1 tables
mean: 22.11892 ms, lb 22.09413 ms, ub 22.14364 ms, ci 0.950
std dev: 126.8588 us, lb 113.2326 us, ub 146.2977 us, ci 0.950

benchmarking Inserting 1000 rows into 100 tables
mean: 10.05312 ms, lb 10.05143 ms, ub 10.05483 ms, ci 0.950
std dev: 8.703506 us, lb 7.309154 us, ub 11.17319 us, ci 0.950

[perf.] CuckooHashTable vs BasicHashTable

I stumbled upon this SO thread. It's quite dated, so I wanted to test it on GHC 8.0.2.

First, I've used CuckooHashTable (code attached below) — as can be easily verified, I'm using hashtables-1.2.1.0 there. The results on OSX are dissatisfying: ~4.4s for OCaml, ~16s for Haskell.

Then I've changed CuckooHashTable to BasicHashTable and automagically the Haskell got into ~4.4s range.

Docs state that:

Randomized testing shows this implementation of cuckoo hashing to be slightly faster on insert and slightly slower on lookup than Data.Hashtable.ST.Basic, while being more space efficient by about a half-word per key-value mapping.

While [1..10000000] is certainly not randomized, the workload suggest that Cuckoo should be quicker than Basic but isn't. What's happening here?

Maybe the docs shall be slightly corrected?

-- stack --resolver nightly-2017-02-08 ghc --package hashtables -- -O2 harrop_ghc.hs
import           Control.Monad
import qualified Data.HashTable.IO as H

type HashTable k v = H.CuckooHashTable k v

main = do
  m <- H.new :: IO (HashTable Int Int)
  forM_ [1..10000000] $ \n -> H.insert m n n
  v <- H.lookup m 100
  print v

(* ocamlopt -o harrop_ocaml harrop_ocaml.ml *)
let n = 10000000

let () =
  let m = Hashtbl.create n in
  for n=1 to n do
    Hashtbl.add m n n
  done;
  Printf.printf "%d\n%!" (Hashtbl.find m 100)

Performance compared to unordered-containers

This might be related to @nikita-volkov's #14

@gregorycollins I was curious about the performance between this and unordered-containers, thinking this would blow it out of the water. But I was kinda shocked to see that unordered-containers was performing much faster.

I put together some reproducible benchmarks in https://github.com/zmoazeni/testing-hashtable-performance but I was wondering if I'm missing something big that is swaying the results (very possible).

# A test file with 10,000 inserts and 500 lookups

benchmarking hashmap
mean: 38.75604 ms, lb 38.25674 ms, ub 39.32312 ms, ci 0.950
std dev: 2.724852 ms, lb 2.382821 ms, ub 3.149132 ms, ci 0.950
variance introduced by outliers: 64.623%
variance is severely inflated by outliers

benchmarking hashtable cuckoo
collecting 100 samples, 1 iterations each, in estimated 5.484390 s
mean: 50.78706 ms, lb 50.17328 ms, ub 51.44667 ms, ci 0.950
std dev: 3.252150 ms, lb 2.837077 ms, ub 3.812114 ms, ci 0.950
found 3 outliers among 100 samples (3.0%)
  3 (3.0%) high mild
variance introduced by outliers: 60.535%
variance is severely inflated by outliers

benchmarking hashtable basic
collecting 100 samples, 1 iterations each, in estimated 5.088615 s
mean: 52.46744 ms, lb 51.90874 ms, ub 53.06552 ms, ci 0.950
std dev: 2.986507 ms, lb 2.611857 ms, ub 3.462132 ms, ci 0.950
found 2 outliers among 100 samples (2.0%)
  2 (2.0%) high mild
variance introduced by outliers: 54.488%
variance is severely inflated by outliers

cbits/util.c is missing

I just tried to build the package with -funsafe-tricks -fsse42, and the build failed since this file does not exist.

cuckoo segfaults

Running the test suite, cuckoo segfaults. Enabling bounds checking yields no further insight. Works fine in portable mode. Throwing in a few extra debug messages, I got as far as readArray at Cuckoo.hs:261.

This is ghc-7.2.1 (64-bit) on Mac OS 10.7.2, gcc-4.2. However, works fine on a ubuntu box also with ghc-7.2.1 (64-bit) and gcc-4.4.3, so unless you also have an OS X box that displays this problem, I might be stuck reading about how cuckoo hashing works...

$ ./dist/build/testsuite/testsuite
cuckoo:
creating cuckoo hash table with 19 buckets having 152 total slots
creating cuckoo hash table with 19 buckets having 152 total slots
insert': begin
updateOrFail: begin: sz = 19
   h1=1, h2=7470097589608349705, b1=8, b2=80
delete' b1=8 b2=80 h1=1 h2=7470097589608349705
Segmentation fault: 11

Support fromVector

I have a use-case to create a hash table from a Data.Vector (k, v) and I was wondering if you wanted a patch for this for your code?

Compile failure with GHC HEAD

Just a warning that the package fails to compile due to a regression in HEAD.

The ticket tracking the issue is https://ghc.haskell.org/trac/ghc/ticket/11246

Hopefully it gets fixed before RC1.

How to merge two hashtables?

Consider using hashabler instead of hashable?

Would you be interested in merging changes that moved from hashable to my hashabler project? Here's a blog post I wrote that introduces an older version.

This would leave the API more or less unchanged but would

allow the data structures here to be safely public facing, by using SipHash
permit extending the API to make the choice of hash function user-configurable
be generally less sketchy
support consistent cross platform hashing, and in turn makes possible consistent serialization of these structures

The downsides might include:

hashable may have greater backwards compatibility
hashable is more widely used and users would need to redefine any custom data type instances for hashabler's Hashable
we don't yet support deriving instances with TH

The last two downsides will get mostly solved together when I define a TH deriving routine that's a drop-in replacement for hashable's.

I'm not sure when I'd be able to get to a PR, but I wanted to float the idea first. Let me know if you're interested and would like more details about what that might look like. Also if this idea sounds appealing but you have changes you'd need to see made in hashabler first. Thanks!

Hackage package does not have documentation

The hackage package index page for version 1.2.1.0 does not have links to documentation. Earlier versions do contain documentation, so I presume that the issue is some part of the upload process.

Orphan Monoid Instance

Hi! Would just like to mention that hashtables can form a monoid, it would be nice to not have this implemented as an orphan. Also I think that there might be a faster implementation, maybe someone else might be able to suggest something better?

instance (HashTable h, Eq a, Hashable a, Monoid v) => Monoid (IO (h RealWorld a v)) where
 mempty = H.new
 mappend a b = do
  a' <- a
  b' <- b
  H.mapM_ (uncurry $ update a') b'
  return a'
 mconcat = foldl' mappend mempty

update h k w = do
 l <- H.lookup h k
 case l of
  Nothing -> H.insert h k mempty
  Just v -> H.insert h k $ v `mappend` w

Feature request: modifyAll (internal but exposed helper)

I am implementing a cache using HashTable. It will occasionally scan the entire table looking for out-of-date values (based on a field of v) and expunge them. I would love to be able to implement this without doing multiple scans, hashing, lookups, and more.

Any chance you'd be willing to expose something like the below in an ".Internal.[Cuckoo|Basic|Linear]" or similar module, or accept a pull request for the same?

modifyAll :: ((k,v) -> Either Bool v)
          -> HashTable_ s k v
          -> ST s ()
modifyAll f (HashTable sz _ hashes keys values _) = go 0
  where
    totSz = numElemsInCacheLine * sz

    go !i | i >= totSz = return ()
          | otherwise  = do
        h <- U.readArray hashes i
        if h /= emptyMarker
          then do
            k <- readArray keys i
            v <- readArray values i
            case f (k,v) of
              Left False -> do
                U.writeArray hashes i emptyMarker
                writeArray keys i undefined
                writeArray values i undefined
              Left True  -> return ()
              Right nv   -> writeArray values i nv
            go (i+1)
          else
            go (i+1)

For my purposes I don't actually need k or the ability to write a new v, but this seems more generally applicable.

hashtables 1.2.0.0 update breaks builds with base-4.3 and before

A complete build log is at http://hydra.cryp.to/build/287347/log/raw.

Compilation failure using GHC 7.8.4

I got the following error using GHC 7.8.4:

$ cabal get hashtables
$ cd hashtables-1.2.2.0
$ cabal sandbox init
$ cabal install
Building library for hashtables-1.2.2.0..
[ 8 of 12] Compiling Data.HashTable.Class ( src/Data/HashTable/Class.hs, dist/dist-sandbox-21f4a9e2/build/Data/HashTable/Class.o )

src/Data/HashTable/Class.hs:99:70:
    Not in scope: type constructor or class ‘Word’

src/Data/HashTable/Class.hs:104:31:
    Not in scope: type constructor or class ‘Word’

src/Data/HashTable/Class.hs:104:52:
    Not in scope: type constructor or class ‘Word’
Failed to install hashtables-1.2.2.0

SSE4_2 Broken For Cuckoo HashTable

Hello, there seems to be a bug in the Cuckoo hash table when using sse4 instructions. The table will actually drop keys (at least i can't find them on lookup). Its possible you can find it if you enumerate entire table (did not try this). But its a big enough issue that look up is returning false. I can confirm that using the linear table with the same dataset and operations does not lose the key (with sse4 flag true). If the sse4 flag is false Cuckoo works as expected. I'm attaching a test setup for reproducing the issue. The test creates a hash table with UUID keys and then perform a series of adds/removes. Then a target key is looked up (that was not removed) and you will find it is not there. I've left code in place for toggling table type.

This was a really nasty thing to find lol.

hash-test.tar.gz
Note: uuids.txt has an initial set of uuids to load into table. hash_ops.txt contains ops that when replayed eventually corrupt table.

Support for freezing of containers

Would it be possible to add support to freeze the container - creating a read only copy which can be queried outside monad.

Compile failure with GHC 9.2

Compilation fails with GHC 9.2 (see below).
There is a fix by @RyanGIScott on head.hackage: https://gitlab.haskell.org/ghc/head.hackage/-/blob/master/patches/hashtables-1.2.4.1.patch

$ cabal install -w ghc-9.2.0 hashtables

Build profile: -w ghc-9.2.0.20210806 -O1
In order, the following will be built (use -v for more details):
 - hashtables-1.2.4.1 (lib) (requires build)
...
[ 3 of 12] Compiling Data.HashTable.Internal.IntArray ( src/Data/HashTable/Internal/IntArray.hs, dist/build/Data/HashTable/Internal/IntArray.o, dist/build/Data/HashTable/Internal/IntArray.dyn_o )

src/Data/HashTable/Internal/IntArray.hs:55:18: error:
    • Couldn't match type ‘Word16#’ with ‘Word#’
      Expected: Word# -> Elem
        Actual: Word16# -> Word16
    • In the expression: W16#
      In an equation for ‘primWordToElem’: primWordToElem = W16#
   |
55 | primWordToElem = W16#
   |                  ^^^^

src/Data/HashTable/Internal/IntArray.hs:66:34: error:
    • Couldn't match expected type ‘Word#’ with actual type ‘Word16#’
    • In the first argument of ‘word2Int#’, namely ‘w#’
      In the expression: word2Int# w#
      In an equation for ‘elemToInt#’:
          elemToInt# (W16# w#) = word2Int# w#
   |
66 | elemToInt# (W16# w#) = word2Int# w#
   |                                  ^^
Error: cabal: Failed to build hashtables-1.2.4.1.

Blocking:

Agda agda/agda#5442

Poor performance of Basic compared to Cuckoo

This ticket is based on GaloisInc/saw-script#674.

We recently switched one of our projects from from Data.HashTable.ST.Cuckoo to Data.HashTable.ST.Basic, to avoid a bug in Cuckoo (related to #55, I believe). As a result, we noticed a surprisingly large performance hit. Here are some numbers from profiling a part of our test suite that makes heavy use of hash table operations, where I compared runs with Cuckoo and Basic hash table libraries:

Each version did 4.8 million hash table insertions:
- cuckoo: 4.3% of 285.39s runtime = 12.3s
- basic: 32.1% of 1011.48 runtime = 325s
- 26x slowdown
Each version did 8.5 million hash table lookups:
- cuckoo: 1.4% of 285.39s runtime = 4.0s
- basic: 40.4% of 1011.48s runtime = 408s
- 102x slowdown

This was especially surprising considering that the hashtables documentation claims Basic should have the fastest lookups. Profiling showed that resizing of the Basic hash table happened 42 times, taking up about 10% of total runtime, so resizing only accounts for a small portion of the slowdown.

To see whether profiling itself had anything to do with the numbers, I recorded a trace of the keys for all insert and lookup operations we were doing, and then ran a testbench that only performed the hash table operations. Timing (without profiling) shows that after parsing, Cuckoo takes an additional ten seconds or so, while Basic takes approximately an additional 8 minutes. So the slowdown ratio indicated by profiling is not far off.

These tests were done on MacOS with ghc-8.8.3; I haven't done thorough testing with other compiler versions.

undefined, called at src/Data/HashTable/ST/Cuckoo.hs:406:31

Hi, I am seeing this error

...  Prelude.undefined                                                                                      
CallStack (from HasCallStack):                             
  error, called at libraries/base/GHC/Err.hs:79:14 in base:GHC.Err
  undefined, called at src/Data/HashTable/ST/Cuckoo.hs:406:31 in hashtables-1.2.4.1-6MJy5UcaJATEc9hXD9mdTw:Data.HashTa
ble.ST.Cuckoo

on x86_64, with +RTS -A400m -M100g. I am only using insert and lookup. The code runs in the ST monad. The error is happening when re-sizing?

is it worth investigating? I cannot easily release the code as-is. I could debug on my machine (how?), or try to extract a reproducer.

Return Bool from insert indicating whether the value was inserted or not

Currently, the insert function has () as return type. However, it's often useful to know whether the value was actually inserted (didn't exist yet) or not. I think this should be easy to determine while inserting.

Does not build on powerpc

Hi,

hashtables does not build on a powerpc machine, for a build log see
https://buildd.debian.org/status/fetch.php?pkg=haskell-hashtables&arch=powerpc&ver=1.0.0.0-2&stamp=1330002314

Do you think you can fix that?

Thanks,
Joachim

Adding Hopscotch Hashing

Hi!
I has been thinking about implementing a Hopscotch hashing to extend the library.
Hopscotch is said to provide good performance at very high table load factors, even ones exceeding 0.9.
However, I am not sure about how fast it could be on haskell due to bit manipulation.
Have you ever try to implement it? @gregorycollins

Cuckoo loses data when inserting 65th element via mutate

edit: see next comment for a much better and more minimal reproduction:

Update for compatibility with the STT monad transformer

Would be great if the hashtable implementations could be used with the STT transformer (https://hackage.haskell.org/package/STMonadTrans-0.3.3). Probably more usefully still would be to have an interface for parameterising how the state accesses are performed, assuming this can be written so that inlining avoids any performance penalty, so that they could be used with any monad that implements mutable state.

1.2.3.0 breaks ghc pre 7.10

Can it be, that 1.2.3.0 silently breaks compatibility with ghc < 7.10, e.g. without applicative => monad proposal?

gtk2hs has no version constraint on hastables and I just run into by building svgcairo, see here.

That might be a good thing after all ...

Noncanonical mappend

These warnings should be addressed at some point:

[ 9 of 12] Compiling Data.HashTable.ST.Basic ( src/Data/HashTable/ST/Basic.hs, dist/build/Data/HashTable/ST/Basic.o, dist/build/Data/HashTable/ST/Basic.dyn_o )

src/Data/HashTable/ST/Basic.hs:522:2: warning: [-Wnoncanonical-monoid-instances]
    Noncanonical ‘(<>) = mappend’ definition detected
    in the instance declaration for ‘Semigroup Slot’.
    Move definition from ‘mappend’ to ‘(<>)’
    See also: https://gitlab.haskell.org/ghc/ghc/-/wikis/proposal/semigroup-monoid
    |
522 |  (<>) = mappend
    |  ^^^^^^^^^^^^^^

src/Data/HashTable/ST/Basic.hs:527:5: warning: [-Wnoncanonical-monoid-instances]
    Noncanonical ‘mappend’ definition detected
    in the instance declaration for ‘Monoid Slot’.
    ‘mappend’ will eventually be removed in favour of ‘(<>)’
    Either remove definition for ‘mappend’ (recommended) or define as ‘mappend = (<>)’
    See also: https://gitlab.haskell.org/ghc/ghc/-/wikis/proposal/semigroup-monoid
    |
527 |     (Slot x1) `mappend` (Slot x2) =
    |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^...

How to control load factor?

Hi,

I don't see a way to set load factor for new/existing hashtable.
When hashtable is doubled?

Alarming profiling results

I've been experiencing multiple issues with different implementations of "hashtables" while benchmarking insert operations, so I made a simple profiling script:

import Prelude
import Control.Monad
import qualified Data.HashTable.IO as H


main = do
  c <- H.newSized (10^6) :: IO (H.LinearHashTable Int ())
  forM_ [0..500000] $ \i -> H.insert c i ()

Profiling it produces daunting results to say at very least. The program spends most of its time collecting garbage. What's worse is that the Productivity percent drops under a progression, i.e. the more rows you insert the larger part of execution time the program spends on GC.

Here are the results:

   1,521,188,672 bytes allocated in the heap
   3,599,454,896 bytes copied during GC
      37,996,112 bytes maximum residency (13 sample(s))
       1,404,680 bytes maximum slop
              76 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0      2999 colls,     0 par    4.13s    4.23s     0.0014s    0.0026s
  Gen  1        13 colls,     0 par    0.23s    0.26s     0.0198s    0.0341s

  TASKS: 3 (1 bound, 2 peak workers (2 total), using -N1)

  SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time    0.76s  (  0.87s elapsed)
  GC      time    4.28s  (  4.41s elapsed)
  RP      time    0.00s  (  0.00s elapsed)
  PROF    time    0.08s  (  0.08s elapsed)
  EXIT    time    0.00s  (  0.01s elapsed)
  Total   time    5.13s  (  5.28s elapsed)

  Alloc rate    1,992,503,395 bytes per MUT second

  Productivity  14.9% of total user, 14.5% of total elapsed

Data.HashTable.Internal.UnsafeTricks doesn't compile with GHC 6.x

Building hashtables-1.2.0.1...
[ 1 of 12] Compiling Data.HashTable.Internal.UnsafeTricks ( src/Data/HashTable/Internal/UnsafeTricks.hs, dist/build/Data/HashTable/Internal/UnsafeTricks.o )

src/Data/HashTable/Internal/UnsafeTricks.hs:29:8:
    parse error on input `::'

The complete build log is at http://hydra.cryp.to/build/287890/log/raw.

Please add a method to update a value in place

I would like to update a value stored in a hashtable. The unordered-containers package provides methods like insertWith to do this, but the hashtable package unfortunately doesn't. The only way to update a value in place seems to be to do a lookup followed by an insert, but this strikes me as unnecessarily inefficient. Or am I missing something terribly obvious?

Support vector-0.12

I think that you can just bump the dependency.

Add support for checking the size of the table

Segfault in BasicHashTable with hashtables >= 1.2.0.0

I ran into a segfault with the newer releases of hashtables under both ghc 7.6.3 and 7.8.4. My workload looks up keys in the hash table, inserting them with some value if they are not already present. The order is basically random. Here is a test program:

{-# LANGUAGE ViewPatterns #-}
module Main ( main ) where

import qualified Data.HashTable.IO as IOH

import qualified Data.Time.Clock as C
import qualified System.Environment as E
import qualified System.Random as R
import Text.Printf ( printf )

main :: IO ()
main = do
  (read -> nActions) : args <- E.getArgs
  rgen <- case args of
    [] -> do
      now <- C.getCurrentTime
      let seed = fromIntegral ((floor (C.utctDayTime now)) :: Int)
      _ <- printf "Seed: %d\n" seed
      return $ R.mkStdGen seed
    seed : _ -> return $ R.mkStdGen (read seed)
  ht <- IOH.newSized 8192
  stress ht (take nActions (R.randoms rgen))
  return ()

stress :: IOH.BasicHashTable Int Int -> [Int] -> IO ()
stress _ [] = return ()
stress ht (val:rest) = do
  res <- IOH.lookup ht val
  case res of
    Nothing -> IOH.insert ht val val
    Just _ -> return ()
  stress ht rest

and a cabal file to build it

name:                ht-test
version:             0.1.0.0
synopsis:            hashtables stress test
license:             BSD3
license-file:        LICENSE
author:              Tristan Ravitch
maintainer:          [email protected]
category:            Data
build-type:          Simple
cabal-version:       >=1.10

executable ht-test
  main-is:             Main.hs
  build-depends:       base == 4.*,
                       random,
                       time,
--                       hashtables == 1.2.0.1
--                       hashtables == 1.2.0.0
                       hashtables < 1.2
  default-language:    Haskell2010
  ghc-options:         -Wall

The program takes two parameters: the number of (random) values to look up/insert and the seed to initialize the random number generator with. I have two interesting inputs to report:

Crash

ht-test 100000 1862 causes a segfault with hashtables >= 1.2. With 1.1, it acts as expected. That is 100,000 elements starting with a seed of 1862.

Infinite loop (or maybe just a huge slowdown?)

This is probably an infinite loop, and probably also just another manifestation of the same underlying problem as the segfault: ht-test 1000000 12 This completes in about 6 seconds with hashtables < 1.2. I never saw it finish for 1.2+. I did some profiling on my real application and it looked like execution was looping forever in delete'.go in the BasicHashTable code.

I'll try to look for some smaller test cases that trigger these issues. Finding the larger ones was a bit easier.

Link error on clang re: improper use of "inline" in C99

Hi @gregorycollins,

This isn't really an issue for hashtables, as it is an issue involving hashtables. I'm currently working on a program that depends on hashtables, and when I try to profile it, my build simply doesn't want to work. It complains about missing symbols from hashtables:

∴ cabal build
Building profiling-with-hashtables-0.1.0.0...
Preprocessing executable 'profiling-with-hashtables' for
profiling-with-hashtables-0.1.0.0...
[1 of 1] Compiling Main             ( Main.hs, dist/build/profiling-with-hashtables/profiling-with-hashtables-tmp/Main.p_o )
Linking dist/build/profiling-with-hashtables/profiling-with-hashtables ...
Undefined symbols for architecture x86_64:
  "_line_mask", referenced from:
      _line_search in libHShashtables-1.2.0.0_p.a(default.p_o)
  "_line_mask_2", referenced from:
      _line_search_2 in libHShashtables-1.2.0.0_p.a(default.p_o)
  "_line_mask_3", referenced from:
      _line_search_3 in libHShashtables-1.2.0.0_p.a(default.p_o)
ld: symbol(s) not found for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)

I have a reproducible test case here https://github.com/zmoazeni/profiling-with-hashtables

I think it's a local only issue (or maybe clang is to blame?) because I can start up a vagrant-haskell container and it will build/run/profile just fine.

Does this sound familiar to you? Do you have any ideas of what I can investigate on my side to compile with profiling enabled?

I'm on OSX Mavericks using ghc 7.8.3 and cabal 1.20.0.3

Update benchmarks to criterion 1.5

As it stands it seems like:

The benchmarks don't build with new versions of criterion.
New versions of GHC don't build the old versions of criterion.

But old enough versions of criterion also don't build with new GHC versions (8.4+)

So there is currently no way to run the benchmarks using recent GHC versions.

Support insertWith to class HashTable

Firstly, thank you for such brilliant code.

I would be interested to contribute a 'insertWith' function with identical semantics to the one in Data.Map

Would this be welcomed?

Support for hashable-1.4

hashable-1.4 has been released. Can hashtables support it?

Build error with GHC 8.4.1-alpha1

GHC 8.4.1-alpha1 was announced. While testing Agda with this version of GHC, I got the following error on hashtables (master branch):

$ cabal install
...
src/Data/HashTable/ST/Basic.hs:491:10: error:
    • No instance for (Semigroup Slot)
        arising from the superclasses of an instance declaration
    • In the instance declaration for ‘Monoid Slot’
    |
491 | instance Monoid Slot where
    |          ^^^^^^^^^^^
Failed to install hashtables-1.2.2.1

Blocking agda/agda#2878.

Out of bounds errors with GHC 7.8.2

So, this is an issue we discovered in Accelerate (see AccelerateHS/accelerate#162).

It can be triggered by building hashtables in GHC 7.8.2 with the flags set as below.

cabal configure -f-unsafe-tricks -fbounds-checking -fportable -f-debug

Then running the testsuite

dist/build/testsuite/testsuite -a 10000 --maximum-unsuitable-generated-tests=10000

This produces the following output.

basic:
  basic/fromListToList: [OK, passed 10000 tests]
  basic/insert: [Failed]
*** Failed! Exception: './Data/Vector/Generic/Mutable.hs:596 (write): index out of bounds (19,19)' (after 4882 tests): 
(used seed -8742506372111056024)
  basic/insert2: [Failed]
*** Failed! Exception: './Data/Vector/Generic/Mutable.hs:596 (write): index out of bounds (53,53)' (after 7987 tests): 
(used seed -254882056523583964)
  basic/newAndInsert: [OK, passed 10000 tests]
  basic/growTable: [OK, passed 10000 tests]
  basic/delete: [OK, passed 10000 tests]
  basic/nastyFullLookup: [OK]
  basic/forwardSearch3: [OK]

In Accelerate we found this failure was occurring entirely non-deterministically. Running tests 1000's of times, like above, was the only way to guarantee its occurrence.

Build error using primitive 0.7.0.0

I'm getting the following error installing hashtables 1.2.3.2 using using GHC 8.6.5 and primtive 0.7.0.0:

src/Data/HashTable/Internal/IntArray.hs:23:44: error:
    Module ‘Data.Primitive.Types’ does not export ‘Addr(..)’
   |
23 | import           Data.Primitive.Types     (Addr (..))
   |                                            ^^^^^^^^^
Failed to install hashtables-1.2.3.2

Blocking agda/agda#3800.

Unboxed variants of hash tables

There are uses cases where compactness is quite important and we're working with unboxable types like Ints (or newtyped wrapped ints) where it'd be nice to have hash table that uses an unboxed vector for the keys and values. (These cases also tend to want inplace updates of the values, issue #6)

For such an application I hacked up the implementation to add a variant of Basic that uses unboxed vectors rather than boxed ones. I had to make some adjustments to the type class to allow this, but it doesn't affect the existing implementations too much. It just changes how we declare the instances, and it affects generic code in the IO wrapper.

So this is an RFC. In particular, @gregorycollins would you be interested in unboxed variants of the existing implementations? If so I can try and tidy up what I hacked up quickly and we can look at different ways to adjust the class to allow for both boxed and unboxed variants.

Hashtables 1.2.2.0 should be marked as depending on base >= 4.7

The dependencies on hashtables-1.2.2.0 as listed on hackage are incorrect. It is listed as requiring base (==4.*) but should list the dependencies as base (>= 4.7.0.0 && < 5)

The code will not compile on ghc 7.6.3:

Downloading hashtables-1.2.2.0...
Configuring hashtables-1.2.2.0...
Building hashtables-1.2.2.0...
Preprocessing library hashtables-1.2.2.0...
[ 1 of 12] Compiling Data.HashTable.Internal.UnsafeTricks ( src/Data/HashTable/Internal/UnsafeTricks.hs, dist/build/Data/HashTable/Internal/UnsafeTricks.o )
[ 2 of 12] Compiling Data.HashTable.Internal.IntArray ( src/Data/HashTable/Internal/IntArray.hs, dist/build/Data/HashTable/Internal/IntArray.o )

src/Data/HashTable/Internal/IntArray.hs:74:19:
    Not in scope: `finiteBitSize'
Failed to install hashtables-1.2.2.0

Literals in CheapPseudoRandomBitStream overflow

Judging from the class name, maybe the implementation was never intended to be correct. Nevertheless, in GHC 7.8, we get these warnings for integer overflows. Perhaps these were intended to be words?

src/Data/HashTable/Internal/CheapPseudoRandomBitStream.hs:25:26: Warning:
    Literal 3801334549 of type Int32 overflows

src/Data/HashTable/Internal/CheapPseudoRandomBitStream.hs:26:26: Warning:
    Literal 2195914594 of type Int32 overflows

src/Data/HashTable/Internal/CheapPseudoRandomBitStream.hs:27:26: Warning:
    Literal 3407804846 of type Int32 overflows

src/Data/HashTable/Internal/CheapPseudoRandomBitStream.hs:31:26: Warning:
    Literal 3071798340 of type Int32 overflows

src/Data/HashTable/Internal/CheapPseudoRandomBitStream.hs:32:26: Warning:
    Literal 3886219854 of type Int32 overflows

src/Data/HashTable/Internal/CheapPseudoRandomBitStream.hs:34:26: Warning:
    Literal 2279799119 of type Int32 overflows

src/Data/HashTable/Internal/CheapPseudoRandomBitStream.hs:35:26: Warning:
    Literal 4195705714 of type Int32 overflows

src/Data/HashTable/Internal/CheapPseudoRandomBitStream.hs:36:26: Warning:
    Literal 3778173719 of type Int32 overflows

src/Data/HashTable/Internal/CheapPseudoRandomBitStream.hs:37:26: Warning:
    Literal 3758449706 of type Int32 overflows

src/Data/HashTable/Internal/CheapPseudoRandomBitStream.hs:38:26: Warning:
    Literal 3369086643 of type Int32 overflows

src/Data/HashTable/Internal/CheapPseudoRandomBitStream.hs:39:26: Warning:
    Literal 3909880455 of type Int32 overflows

src/Data/HashTable/Internal/CheapPseudoRandomBitStream.hs:48:26: Warning:
    Literal 14589171272287321080 of type Int64 overflows

src/Data/HashTable/Internal/CheapPseudoRandomBitStream.hs:54:26: Warning:
    Literal 14059785151234048019 of type Int64 overflows

src/Data/HashTable/Internal/CheapPseudoRandomBitStream.hs:55:26: Warning:
    Literal 16300700092804310893 of type Int64 overflows

src/Data/HashTable/Internal/CheapPseudoRandomBitStream.hs:60:26: Warning:
    Literal 11590662408288112295 of type Int64 overflows

src/Data/HashTable/Internal/CheapPseudoRandomBitStream.hs:61:26: Warning:
    Literal 16966951249769150579 of type Int64 overflows

gregorycollins / hashtables Goto Github PK

hashtables's People

Contributors

Stargazers

Watchers

Forkers

hashtables's Issues

Crash

Infinite loop (or maybe just a huge slowdown?)

Recommend Projects

Recommend Topics

Recommend Org