gregorycollins / hashtables Goto Github PK
View Code? Open in Web Editor NEWMutable hash tables for Haskell, in the ST monad
License: BSD 3-Clause "New" or "Revised" License
Mutable hash tables for Haskell, in the ST monad
License: BSD 3-Clause "New" or "Revised" License
seems like 'forwardSearch2' is looping infinitely, both the default C version and in portable mode.
Test system is Mac OS X 10.7.2, Core 2 Duo, ghc-7.2.1 (64-bit).
I can no longer confirm, but I think I had the same problem under 10.6.7 and ghc-7.0.4 (64-bit) as well.
Any ideas what could trigger this? What sort of information will be useful for debugging?
hashable-1.3.0.0 is released. Could you update hashtable to support that?
Some years ago I note HashTables based solutions in Haskell didn't compare well, performance wise, with the same solution in Python. For example, a trivial insert of millions of elements:
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
import Data.HashTable.IO as H
import qualified Data.ByteString as BS
import Data.Hashable
newtype Bad = B { unB :: Int } deriving (Show, Enum, Ord, Eq)
instance Hashable Bad where
hash (B i) = i
hashWithSalt _ (B i) = i
main :: IO ()
main = do
ht <- H.new :: IO (H.BasicHashTable Bad BS.ByteString)
let go (B 63000001) = pure ()
go n = H.insert ht n BS.empty >> go (B $ unB n + 1)
go (B 1)
In the above we're just inserting the pairs [ (i, empty) | i <- [1 .. 63000000] ]
Notice I've made the "hash" match python's integer hash (mostly, ignoring what they do with negatives) though this didn't have a meaninful performance impact.
The performance difference between this snippet and the matching python is about 3x (30 seconds for Haskell with ghc 8.4.1 -O2 -fllvm -optl-march=native
, 8 or 10 seconds for python or pypy). Has there been an exploration on this front? Is this mostly an issue with creating (and collecting) intermediate values such as HashTable_
?
For the Agda code, @nad observed a performance increase from hashtables 1.2.4.2 to 1.3, see:
He reports that this fixes a performance regression for him, but does not pin down where the regression happened.
Also, it seems that vector-hashtables outperforms hashtables by a slight margin, so we are switching.
I thought I let you know, possible actions could be:
You can compare STRef
s for equality, so you can have an Eq
instance that checks to see if two HashTable s k v
are the same hashtable. I wound up needing this, but can't write the isntance from outside of the package.
The instance does the only thing such an instance could do. It obviously can't compare the contents, and works just like when you compare STRefs or IORefs for equality.
Following is a benchmark that outlines the problem boldly: inserting 100 000 rows into a single table takes at least twice as long as inserting the same amount of rows into 100 tables. This means that a table with 100 000 rows performs two times worse than a table with 1000 rows. This must mean that something is leaking.
import Control.Monad
import Criterion.Main
import qualified Data.HashTable.IO as H
main =
defaultMain
[
insertingBench 100000 1,
insertingBench 1000 100
]
type RowsAmount = Int
type TablesAmount = Int
insertingBench :: RowsAmount -> TablesAmount -> Benchmark
insertingBench rows tables =
bench ("Inserting " ++ show rows ++ " rows into " ++ show tables ++ " tables")
io
where
io =
replicateM tables $ do
t <- H.new :: IO (H.BasicHashTable Int ())
forM_ [0..rows] $ \i -> H.insert t i ()
Results:
benchmarking Inserting 100000 rows into 1 tables
mean: 22.11892 ms, lb 22.09413 ms, ub 22.14364 ms, ci 0.950
std dev: 126.8588 us, lb 113.2326 us, ub 146.2977 us, ci 0.950
benchmarking Inserting 1000 rows into 100 tables
mean: 10.05312 ms, lb 10.05143 ms, ub 10.05483 ms, ci 0.950
std dev: 8.703506 us, lb 7.309154 us, ub 11.17319 us, ci 0.950
I stumbled upon this SO thread. It's quite dated, so I wanted to test it on GHC 8.0.2.
First, I've used CuckooHashTable (code attached below) — as can be easily verified, I'm using hashtables-1.2.1.0
there. The results on OSX are dissatisfying: ~4.4s for OCaml, ~16s for Haskell.
Then I've changed CuckooHashTable
to BasicHashTable
and automagically the Haskell got into ~4.4s range.
Docs state that:
Randomized testing shows this implementation of cuckoo hashing to be slightly faster on insert and slightly slower on lookup than Data.Hashtable.ST.Basic, while being more space efficient by about a half-word per key-value mapping.
While [1..10000000]
is certainly not randomized, the workload suggest that Cuckoo should be quicker than Basic but isn't. What's happening here?
Maybe the docs shall be slightly corrected?
-- stack --resolver nightly-2017-02-08 ghc --package hashtables -- -O2 harrop_ghc.hs
import Control.Monad
import qualified Data.HashTable.IO as H
type HashTable k v = H.CuckooHashTable k v
main = do
m <- H.new :: IO (HashTable Int Int)
forM_ [1..10000000] $ \n -> H.insert m n n
v <- H.lookup m 100
print v
(* ocamlopt -o harrop_ocaml harrop_ocaml.ml *)
let n = 10000000
let () =
let m = Hashtbl.create n in
for n=1 to n do
Hashtbl.add m n n
done;
Printf.printf "%d\n%!" (Hashtbl.find m 100)
This might be related to @nikita-volkov's #14
@gregorycollins I was curious about the performance between this and unordered-containers, thinking this would blow it out of the water. But I was kinda shocked to see that unordered-containers was performing much faster.
I put together some reproducible benchmarks in https://github.com/zmoazeni/testing-hashtable-performance but I was wondering if I'm missing something big that is swaying the results (very possible).
# A test file with 10,000 inserts and 500 lookups
benchmarking hashmap
mean: 38.75604 ms, lb 38.25674 ms, ub 39.32312 ms, ci 0.950
std dev: 2.724852 ms, lb 2.382821 ms, ub 3.149132 ms, ci 0.950
variance introduced by outliers: 64.623%
variance is severely inflated by outliers
benchmarking hashtable cuckoo
collecting 100 samples, 1 iterations each, in estimated 5.484390 s
mean: 50.78706 ms, lb 50.17328 ms, ub 51.44667 ms, ci 0.950
std dev: 3.252150 ms, lb 2.837077 ms, ub 3.812114 ms, ci 0.950
found 3 outliers among 100 samples (3.0%)
3 (3.0%) high mild
variance introduced by outliers: 60.535%
variance is severely inflated by outliers
benchmarking hashtable basic
collecting 100 samples, 1 iterations each, in estimated 5.088615 s
mean: 52.46744 ms, lb 51.90874 ms, ub 53.06552 ms, ci 0.950
std dev: 2.986507 ms, lb 2.611857 ms, ub 3.462132 ms, ci 0.950
found 2 outliers among 100 samples (2.0%)
2 (2.0%) high mild
variance introduced by outliers: 54.488%
variance is severely inflated by outliers
I just tried to build the package with -funsafe-tricks -fsse42
, and the build failed since this file does not exist.
Running the test suite, cuckoo segfaults. Enabling bounds checking yields no further insight. Works fine in portable mode. Throwing in a few extra debug messages, I got as far as readArray
at Cuckoo.hs:261.
This is ghc-7.2.1 (64-bit) on Mac OS 10.7.2, gcc-4.2. However, works fine on a ubuntu box also with ghc-7.2.1 (64-bit) and gcc-4.4.3, so unless you also have an OS X box that displays this problem, I might be stuck reading about how cuckoo hashing works...
$ ./dist/build/testsuite/testsuite
cuckoo:
creating cuckoo hash table with 19 buckets having 152 total slots
creating cuckoo hash table with 19 buckets having 152 total slots
insert': begin
updateOrFail: begin: sz = 19
h1=1, h2=7470097589608349705, b1=8, b2=80
delete' b1=8 b2=80 h1=1 h2=7470097589608349705
Segmentation fault: 11
I have a use-case to create a hash table from a Data.Vector (k, v) and I was wondering if you wanted a patch for this for your code?
Just a warning that the package fails to compile due to a regression in HEAD.
The ticket tracking the issue is https://ghc.haskell.org/trac/ghc/ticket/11246
Hopefully it gets fixed before RC1.
Would you be interested in merging changes that moved from hashable
to my hashabler
project? Here's a blog post I wrote that introduces an older version.
This would leave the API more or less unchanged but would
The downsides might include:
hashable
may have greater backwards compatibilityhashable
is more widely used and users would need to redefine any custom data type instances for hashabler
's Hashable
The last two downsides will get mostly solved together when I define a TH deriving routine that's a drop-in replacement for hashable
's.
I'm not sure when I'd be able to get to a PR, but I wanted to float the idea first. Let me know if you're interested and would like more details about what that might look like. Also if this idea sounds appealing but you have changes you'd need to see made in hashabler
first. Thanks!
The hackage package index page for version 1.2.1.0 does not have links to documentation. Earlier versions do contain documentation, so I presume that the issue is some part of the upload process.
Hi! Would just like to mention that hashtables can form a monoid, it would be nice to not have this implemented as an orphan. Also I think that there might be a faster implementation, maybe someone else might be able to suggest something better?
instance (HashTable h, Eq a, Hashable a, Monoid v) => Monoid (IO (h RealWorld a v)) where
mempty = H.new
mappend a b = do
a' <- a
b' <- b
H.mapM_ (uncurry $ update a') b'
return a'
mconcat = foldl' mappend mempty
update h k w = do
l <- H.lookup h k
case l of
Nothing -> H.insert h k mempty
Just v -> H.insert h k $ v `mappend` w
I am implementing a cache using HashTable. It will occasionally scan the entire table looking for out-of-date values (based on a field of v
) and expunge them. I would love to be able to implement this without doing multiple scans, hashing, lookups, and more.
Any chance you'd be willing to expose something like the below in an ".Internal.[Cuckoo|Basic|Linear]" or similar module, or accept a pull request for the same?
modifyAll :: ((k,v) -> Either Bool v)
-> HashTable_ s k v
-> ST s ()
modifyAll f (HashTable sz _ hashes keys values _) = go 0
where
totSz = numElemsInCacheLine * sz
go !i | i >= totSz = return ()
| otherwise = do
h <- U.readArray hashes i
if h /= emptyMarker
then do
k <- readArray keys i
v <- readArray values i
case f (k,v) of
Left False -> do
U.writeArray hashes i emptyMarker
writeArray keys i undefined
writeArray values i undefined
Left True -> return ()
Right nv -> writeArray values i nv
go (i+1)
else
go (i+1)
For my purposes I don't actually need k
or the ability to write a new v
, but this seems more generally applicable.
A complete build log is at http://hydra.cryp.to/build/287347/log/raw.
I got the following error using GHC 7.8.4:
$ cabal get hashtables
$ cd hashtables-1.2.2.0
$ cabal sandbox init
$ cabal install
Building library for hashtables-1.2.2.0..
[ 8 of 12] Compiling Data.HashTable.Class ( src/Data/HashTable/Class.hs, dist/dist-sandbox-21f4a9e2/build/Data/HashTable/Class.o )
src/Data/HashTable/Class.hs:99:70:
Not in scope: type constructor or class ‘Word’
src/Data/HashTable/Class.hs:104:31:
Not in scope: type constructor or class ‘Word’
src/Data/HashTable/Class.hs:104:52:
Not in scope: type constructor or class ‘Word’
Failed to install hashtables-1.2.2.0
Hello, there seems to be a bug in the Cuckoo hash table when using sse4 instructions. The table will actually drop keys (at least i can't find them on lookup). Its possible you can find it if you enumerate entire table (did not try this). But its a big enough issue that look up is returning false. I can confirm that using the linear table with the same dataset and operations does not lose the key (with sse4 flag true). If the sse4 flag is false Cuckoo works as expected. I'm attaching a test setup for reproducing the issue. The test creates a hash table with UUID keys and then perform a series of adds/removes. Then a target key is looked up (that was not removed) and you will find it is not there. I've left code in place for toggling table type.
This was a really nasty thing to find lol.
hash-test.tar.gz
Note: uuids.txt has an initial set of uuids to load into table. hash_ops.txt contains ops that when replayed eventually corrupt table.
Would it be possible to add support to freeze the container - creating a read only copy which can be queried outside monad.
Compilation fails with GHC 9.2 (see below).
There is a fix by @RyanGIScott on head.hackage: https://gitlab.haskell.org/ghc/head.hackage/-/blob/master/patches/hashtables-1.2.4.1.patch
$ cabal install -w ghc-9.2.0 hashtables
Build profile: -w ghc-9.2.0.20210806 -O1
In order, the following will be built (use -v for more details):
- hashtables-1.2.4.1 (lib) (requires build)
...
[ 3 of 12] Compiling Data.HashTable.Internal.IntArray ( src/Data/HashTable/Internal/IntArray.hs, dist/build/Data/HashTable/Internal/IntArray.o, dist/build/Data/HashTable/Internal/IntArray.dyn_o )
src/Data/HashTable/Internal/IntArray.hs:55:18: error:
• Couldn't match type ‘Word16#’ with ‘Word#’
Expected: Word# -> Elem
Actual: Word16# -> Word16
• In the expression: W16#
In an equation for ‘primWordToElem’: primWordToElem = W16#
|
55 | primWordToElem = W16#
| ^^^^
src/Data/HashTable/Internal/IntArray.hs:66:34: error:
• Couldn't match expected type ‘Word#’ with actual type ‘Word16#’
• In the first argument of ‘word2Int#’, namely ‘w#’
In the expression: word2Int# w#
In an equation for ‘elemToInt#’:
elemToInt# (W16# w#) = word2Int# w#
|
66 | elemToInt# (W16# w#) = word2Int# w#
| ^^
Error: cabal: Failed to build hashtables-1.2.4.1.
Blocking:
This ticket is based on GaloisInc/saw-script#674.
We recently switched one of our projects from from Data.HashTable.ST.Cuckoo
to Data.HashTable.ST.Basic
, to avoid a bug in Cuckoo (related to #55, I believe). As a result, we noticed a surprisingly large performance hit. Here are some numbers from profiling a part of our test suite that makes heavy use of hash table operations, where I compared runs with Cuckoo
and Basic
hash table libraries:
This was especially surprising considering that the hashtables
documentation claims Basic
should have the fastest lookups. Profiling showed that resizing of the Basic
hash table happened 42 times, taking up about 10% of total runtime, so resizing only accounts for a small portion of the slowdown.
To see whether profiling itself had anything to do with the numbers, I recorded a trace of the keys for all insert and lookup operations we were doing, and then ran a testbench that only performed the hash table operations. Timing (without profiling) shows that after parsing, Cuckoo
takes an additional ten seconds or so, while Basic
takes approximately an additional 8 minutes. So the slowdown ratio indicated by profiling is not far off.
These tests were done on MacOS with ghc-8.8.3; I haven't done thorough testing with other compiler versions.
Hi, I am seeing this error
... Prelude.undefined
CallStack (from HasCallStack):
error, called at libraries/base/GHC/Err.hs:79:14 in base:GHC.Err
undefined, called at src/Data/HashTable/ST/Cuckoo.hs:406:31 in hashtables-1.2.4.1-6MJy5UcaJATEc9hXD9mdTw:Data.HashTa
ble.ST.Cuckoo
on x86_64, with +RTS -A400m -M100g
. I am only using insert
and lookup
. The code runs in the ST
monad. The error is happening when re-sizing?
is it worth investigating? I cannot easily release the code as-is. I could debug on my machine (how?), or try to extract a reproducer.
Currently, the insert
function has () as return type. However, it's often useful to know whether the value was actually inserted (didn't exist yet) or not. I think this should be easy to determine while inserting.
Hi,
hashtables does not build on a powerpc machine, for a build log see
https://buildd.debian.org/status/fetch.php?pkg=haskell-hashtables&arch=powerpc&ver=1.0.0.0-2&stamp=1330002314
Do you think you can fix that?
Thanks,
Joachim
Hi!
I has been thinking about implementing a Hopscotch hashing to extend the library.
Hopscotch is said to provide good performance at very high table load factors, even ones exceeding 0.9.
However, I am not sure about how fast it could be on haskell due to bit manipulation.
Have you ever try to implement it? @gregorycollins
edit: see next comment for a much better and more minimal reproduction:
Would be great if the hashtable implementations could be used with the STT transformer (https://hackage.haskell.org/package/STMonadTrans-0.3.3). Probably more usefully still would be to have an interface for parameterising how the state accesses are performed, assuming this can be written so that inlining avoids any performance penalty, so that they could be used with any monad that implements mutable state.
Can it be, that 1.2.3.0 silently breaks compatibility with ghc < 7.10, e.g. without applicative => monad proposal?
gtk2hs has no version constraint on hastables and I just run into by building svgcairo, see here.
That might be a good thing after all ...
These warnings should be addressed at some point:
[ 9 of 12] Compiling Data.HashTable.ST.Basic ( src/Data/HashTable/ST/Basic.hs, dist/build/Data/HashTable/ST/Basic.o, dist/build/Data/HashTable/ST/Basic.dyn_o )
src/Data/HashTable/ST/Basic.hs:522:2: warning: [-Wnoncanonical-monoid-instances]
Noncanonical ‘(<>) = mappend’ definition detected
in the instance declaration for ‘Semigroup Slot’.
Move definition from ‘mappend’ to ‘(<>)’
See also: https://gitlab.haskell.org/ghc/ghc/-/wikis/proposal/semigroup-monoid
|
522 | (<>) = mappend
| ^^^^^^^^^^^^^^
src/Data/HashTable/ST/Basic.hs:527:5: warning: [-Wnoncanonical-monoid-instances]
Noncanonical ‘mappend’ definition detected
in the instance declaration for ‘Monoid Slot’.
‘mappend’ will eventually be removed in favour of ‘(<>)’
Either remove definition for ‘mappend’ (recommended) or define as ‘mappend = (<>)’
See also: https://gitlab.haskell.org/ghc/ghc/-/wikis/proposal/semigroup-monoid
|
527 | (Slot x1) `mappend` (Slot x2) =
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^...
Hi,
I don't see a way to set load factor for new/existing hashtable.
When hashtable is doubled?
I've been experiencing multiple issues with different implementations of "hashtables" while benchmarking insert operations, so I made a simple profiling script:
import Prelude
import Control.Monad
import qualified Data.HashTable.IO as H
main = do
c <- H.newSized (10^6) :: IO (H.LinearHashTable Int ())
forM_ [0..500000] $ \i -> H.insert c i ()
Profiling it produces daunting results to say at very least. The program spends most of its time collecting garbage. What's worse is that the Productivity
percent drops under a progression, i.e. the more rows you insert the larger part of execution time the program spends on GC.
Here are the results:
1,521,188,672 bytes allocated in the heap
3,599,454,896 bytes copied during GC
37,996,112 bytes maximum residency (13 sample(s))
1,404,680 bytes maximum slop
76 MB total memory in use (0 MB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 2999 colls, 0 par 4.13s 4.23s 0.0014s 0.0026s
Gen 1 13 colls, 0 par 0.23s 0.26s 0.0198s 0.0341s
TASKS: 3 (1 bound, 2 peak workers (2 total), using -N1)
SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)
INIT time 0.00s ( 0.00s elapsed)
MUT time 0.76s ( 0.87s elapsed)
GC time 4.28s ( 4.41s elapsed)
RP time 0.00s ( 0.00s elapsed)
PROF time 0.08s ( 0.08s elapsed)
EXIT time 0.00s ( 0.01s elapsed)
Total time 5.13s ( 5.28s elapsed)
Alloc rate 1,992,503,395 bytes per MUT second
Productivity 14.9% of total user, 14.5% of total elapsed
Building hashtables-1.2.0.1...
[ 1 of 12] Compiling Data.HashTable.Internal.UnsafeTricks ( src/Data/HashTable/Internal/UnsafeTricks.hs, dist/build/Data/HashTable/Internal/UnsafeTricks.o )
src/Data/HashTable/Internal/UnsafeTricks.hs:29:8:
parse error on input `::'
The complete build log is at http://hydra.cryp.to/build/287890/log/raw.
I would like to update a value stored in a hashtable. The unordered-containers
package provides methods like insertWith
to do this, but the hashtable package unfortunately doesn't. The only way to update a value in place seems to be to do a lookup followed by an insert, but this strikes me as unnecessarily inefficient. Or am I missing something terribly obvious?
I think that you can just bump the dependency.
I ran into a segfault with the newer releases of hashtables under both ghc 7.6.3 and 7.8.4. My workload looks up keys in the hash table, inserting them with some value if they are not already present. The order is basically random. Here is a test program:
{-# LANGUAGE ViewPatterns #-}
module Main ( main ) where
import qualified Data.HashTable.IO as IOH
import qualified Data.Time.Clock as C
import qualified System.Environment as E
import qualified System.Random as R
import Text.Printf ( printf )
main :: IO ()
main = do
(read -> nActions) : args <- E.getArgs
rgen <- case args of
[] -> do
now <- C.getCurrentTime
let seed = fromIntegral ((floor (C.utctDayTime now)) :: Int)
_ <- printf "Seed: %d\n" seed
return $ R.mkStdGen seed
seed : _ -> return $ R.mkStdGen (read seed)
ht <- IOH.newSized 8192
stress ht (take nActions (R.randoms rgen))
return ()
stress :: IOH.BasicHashTable Int Int -> [Int] -> IO ()
stress _ [] = return ()
stress ht (val:rest) = do
res <- IOH.lookup ht val
case res of
Nothing -> IOH.insert ht val val
Just _ -> return ()
stress ht rest
and a cabal file to build it
name: ht-test
version: 0.1.0.0
synopsis: hashtables stress test
license: BSD3
license-file: LICENSE
author: Tristan Ravitch
maintainer: [email protected]
category: Data
build-type: Simple
cabal-version: >=1.10
executable ht-test
main-is: Main.hs
build-depends: base == 4.*,
random,
time,
-- hashtables == 1.2.0.1
-- hashtables == 1.2.0.0
hashtables < 1.2
default-language: Haskell2010
ghc-options: -Wall
The program takes two parameters: the number of (random) values to look up/insert and the seed to initialize the random number generator with. I have two interesting inputs to report:
ht-test 100000 1862
causes a segfault with hashtables >= 1.2
. With 1.1, it acts as expected. That is 100,000 elements starting with a seed of 1862.
This is probably an infinite loop, and probably also just another manifestation of the same underlying problem as the segfault: ht-test 1000000 12
This completes in about 6 seconds with hashtables < 1.2
. I never saw it finish for 1.2+. I did some profiling on my real application and it looked like execution was looping forever in delete'.go
in the BasicHashTable code.
I'll try to look for some smaller test cases that trigger these issues. Finding the larger ones was a bit easier.
Hi @gregorycollins,
This isn't really an issue for hashtables, as it is an issue involving hashtables. I'm currently working on a program that depends on hashtables, and when I try to profile it, my build simply doesn't want to work. It complains about missing symbols from hashtables:
∴ cabal build
Building profiling-with-hashtables-0.1.0.0...
Preprocessing executable 'profiling-with-hashtables' for
profiling-with-hashtables-0.1.0.0...
[1 of 1] Compiling Main ( Main.hs, dist/build/profiling-with-hashtables/profiling-with-hashtables-tmp/Main.p_o )
Linking dist/build/profiling-with-hashtables/profiling-with-hashtables ...
Undefined symbols for architecture x86_64:
"_line_mask", referenced from:
_line_search in libHShashtables-1.2.0.0_p.a(default.p_o)
"_line_mask_2", referenced from:
_line_search_2 in libHShashtables-1.2.0.0_p.a(default.p_o)
"_line_mask_3", referenced from:
_line_search_3 in libHShashtables-1.2.0.0_p.a(default.p_o)
ld: symbol(s) not found for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)
I have a reproducible test case here https://github.com/zmoazeni/profiling-with-hashtables
I think it's a local only issue (or maybe clang is to blame?) because I can start up a vagrant-haskell container and it will build/run/profile just fine.
Does this sound familiar to you? Do you have any ideas of what I can investigate on my side to compile with profiling enabled?
I'm on OSX Mavericks using ghc 7.8.3 and cabal 1.20.0.3
As it stands it seems like:
But old enough versions of criterion also don't build with new GHC versions (8.4+)
So there is currently no way to run the benchmarks using recent GHC versions.
Firstly, thank you for such brilliant code.
I would be interested to contribute a 'insertWith' function with identical semantics to the one in Data.Map
Would this be welcomed?
hashable-1.4
has been released. Can hashtables support it?
GHC 8.4.1-alpha1 was announced. While testing Agda with this version of GHC, I got the following error on hashtables (master branch):
$ cabal install
...
src/Data/HashTable/ST/Basic.hs:491:10: error:
• No instance for (Semigroup Slot)
arising from the superclasses of an instance declaration
• In the instance declaration for ‘Monoid Slot’
|
491 | instance Monoid Slot where
| ^^^^^^^^^^^
Failed to install hashtables-1.2.2.1
Blocking agda/agda#2878.
So, this is an issue we discovered in Accelerate (see AccelerateHS/accelerate#162).
It can be triggered by building hashtables in GHC 7.8.2 with the flags set as below.
cabal configure -f-unsafe-tricks -fbounds-checking -fportable -f-debug
Then running the testsuite
dist/build/testsuite/testsuite -a 10000 --maximum-unsuitable-generated-tests=10000
This produces the following output.
basic:
basic/fromListToList: [OK, passed 10000 tests]
basic/insert: [Failed]
*** Failed! Exception: './Data/Vector/Generic/Mutable.hs:596 (write): index out of bounds (19,19)' (after 4882 tests):
(used seed -8742506372111056024)
basic/insert2: [Failed]
*** Failed! Exception: './Data/Vector/Generic/Mutable.hs:596 (write): index out of bounds (53,53)' (after 7987 tests):
(used seed -254882056523583964)
basic/newAndInsert: [OK, passed 10000 tests]
basic/growTable: [OK, passed 10000 tests]
basic/delete: [OK, passed 10000 tests]
basic/nastyFullLookup: [OK]
basic/forwardSearch3: [OK]
In Accelerate we found this failure was occurring entirely non-deterministically. Running tests 1000's of times, like above, was the only way to guarantee its occurrence.
I'm getting the following error installing hashtables 1.2.3.2
using using GHC 8.6.5 and primtive 0.7.0.0
:
src/Data/HashTable/Internal/IntArray.hs:23:44: error:
Module ‘Data.Primitive.Types’ does not export ‘Addr(..)’
|
23 | import Data.Primitive.Types (Addr (..))
| ^^^^^^^^^
Failed to install hashtables-1.2.3.2
Blocking agda/agda#3800.
There are uses cases where compactness is quite important and we're working with unboxable types like Ints (or newtyped wrapped ints) where it'd be nice to have hash table that uses an unboxed vector for the keys and values. (These cases also tend to want inplace updates of the values, issue #6)
For such an application I hacked up the implementation to add a variant of Basic that uses unboxed vectors rather than boxed ones. I had to make some adjustments to the type class to allow this, but it doesn't affect the existing implementations too much. It just changes how we declare the instances, and it affects generic code in the IO wrapper.
So this is an RFC. In particular, @gregorycollins would you be interested in unboxed variants of the existing implementations? If so I can try and tidy up what I hacked up quickly and we can look at different ways to adjust the class to allow for both boxed and unboxed variants.
The dependencies on hashtables-1.2.2.0 as listed on hackage are incorrect. It is listed as requiring base (==4.*)
but should list the dependencies as base (>= 4.7.0.0 && < 5)
The code will not compile on ghc 7.6.3:
Downloading hashtables-1.2.2.0...
Configuring hashtables-1.2.2.0...
Building hashtables-1.2.2.0...
Preprocessing library hashtables-1.2.2.0...
[ 1 of 12] Compiling Data.HashTable.Internal.UnsafeTricks ( src/Data/HashTable/Internal/UnsafeTricks.hs, dist/build/Data/HashTable/Internal/UnsafeTricks.o )
[ 2 of 12] Compiling Data.HashTable.Internal.IntArray ( src/Data/HashTable/Internal/IntArray.hs, dist/build/Data/HashTable/Internal/IntArray.o )
src/Data/HashTable/Internal/IntArray.hs:74:19:
Not in scope: `finiteBitSize'
Failed to install hashtables-1.2.2.0
Judging from the class name, maybe the implementation was never intended to be correct. Nevertheless, in GHC 7.8, we get these warnings for integer overflows. Perhaps these were intended to be words?
src/Data/HashTable/Internal/CheapPseudoRandomBitStream.hs:25:26: Warning:
Literal 3801334549 of type Int32 overflows
src/Data/HashTable/Internal/CheapPseudoRandomBitStream.hs:26:26: Warning:
Literal 2195914594 of type Int32 overflows
src/Data/HashTable/Internal/CheapPseudoRandomBitStream.hs:27:26: Warning:
Literal 3407804846 of type Int32 overflows
src/Data/HashTable/Internal/CheapPseudoRandomBitStream.hs:31:26: Warning:
Literal 3071798340 of type Int32 overflows
src/Data/HashTable/Internal/CheapPseudoRandomBitStream.hs:32:26: Warning:
Literal 3886219854 of type Int32 overflows
src/Data/HashTable/Internal/CheapPseudoRandomBitStream.hs:34:26: Warning:
Literal 2279799119 of type Int32 overflows
src/Data/HashTable/Internal/CheapPseudoRandomBitStream.hs:35:26: Warning:
Literal 4195705714 of type Int32 overflows
src/Data/HashTable/Internal/CheapPseudoRandomBitStream.hs:36:26: Warning:
Literal 3778173719 of type Int32 overflows
src/Data/HashTable/Internal/CheapPseudoRandomBitStream.hs:37:26: Warning:
Literal 3758449706 of type Int32 overflows
src/Data/HashTable/Internal/CheapPseudoRandomBitStream.hs:38:26: Warning:
Literal 3369086643 of type Int32 overflows
src/Data/HashTable/Internal/CheapPseudoRandomBitStream.hs:39:26: Warning:
Literal 3909880455 of type Int32 overflows
src/Data/HashTable/Internal/CheapPseudoRandomBitStream.hs:48:26: Warning:
Literal 14589171272287321080 of type Int64 overflows
src/Data/HashTable/Internal/CheapPseudoRandomBitStream.hs:54:26: Warning:
Literal 14059785151234048019 of type Int64 overflows
src/Data/HashTable/Internal/CheapPseudoRandomBitStream.hs:55:26: Warning:
Literal 16300700092804310893 of type Int64 overflows
src/Data/HashTable/Internal/CheapPseudoRandomBitStream.hs:60:26: Warning:
Literal 11590662408288112295 of type Int64 overflows
src/Data/HashTable/Internal/CheapPseudoRandomBitStream.hs:61:26: Warning:
Literal 16966951249769150579 of type Int64 overflows
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.