Comments (9)
I've run a few benchmarks with lowbias32 and found that better hashing indeed improves performance. These are the results for single-threaded Add:
BenchmarkDotNet=v0.13.5, OS=Windows 11 (10.0.22621.1635/22H2/2022Update/SunValley2)
12th Gen Intel Core i7-1260P, 1 CPU, 16 logical and 12 physical cores
.NET SDK=8.0.100-preview.2.23157.25
[Host] : .NET 7.0.5 (7.0.523.17405), X64 RyuJIT AVX2
Job-LBSDVQ : .NET 7.0.5 (7.0.523.17405), X64 RyuJIT AVX2
InvocationCount=1 UnrollFactor=1
Method | N | Mean | Error | StdDev | Median |
---|---|---|---|---|---|
Add_Concurrent | 1 | 1,104.0 ns | 118.56 ns | 349.6 ns | 900.0 ns |
Add_NonBlocking | 1 | 493.0 ns | 61.59 ns | 181.6 ns | 400.0 ns |
Add_Experimental | 1 | 592.0 ns | 68.71 ns | 202.6 ns | 450.0 ns |
Add_Concurrent | 10 | 1,655.1 ns | 117.56 ns | 342.9 ns | 1,500.0 ns |
Add_NonBlocking | 10 | 3,576.3 ns | 222.32 ns | 645.0 ns | 3,400.0 ns |
Add_Experimental | 10 | 3,440.6 ns | 201.95 ns | 582.7 ns | 3,300.0 ns |
Add_Concurrent | 100 | 9,233.7 ns | 187.97 ns | 348.4 ns | 9,150.0 ns |
Add_NonBlocking | 100 | 22,891.7 ns | 457.90 ns | 903.9 ns | 22,650.0 ns |
Add_Experimental | 100 | 16,071.9 ns | 327.02 ns | 757.9 ns | 15,900.0 ns |
Add_Concurrent | 1000 | 74,595.0 ns | 1,418.06 ns | 1,633.0 ns | 74,750.0 ns |
Add_NonBlocking | 1000 | 97,635.7 ns | 1,854.58 ns | 1,644.0 ns | 97,800.0 ns |
Add_Experimental | 1000 | 92,435.7 ns | 1,659.60 ns | 1,471.2 ns | 92,400.0 ns |
Add_Concurrent | 10000 | 820,619.4 ns | 24,486.37 ns | 71,427.9 ns | 801,100.0 ns |
Add_NonBlocking | 10000 | 1,236,969.8 ns | 24,408.56 ns | 45,242.8 ns | 1,228,100.0 ns |
Add_Experimental | 10000 | 1,213,119.2 ns | 24,226.23 ns | 33,161.1 ns | 1,202,950.0 ns |
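For reference, lowbias32 is one of the small 32-bit mixing functions from Chris Wellons' hash-prospector project. A C# sketch, assuming the constants of the originally published variant:

```csharp
// lowbias32: a 32-bit finalizer/mixer found by exhaustive search
// (hash-prospector). It is bijective, so distinct hashes stay distinct.
static uint Lowbias32(uint x)
{
    x ^= x >> 16;
    x *= 0x7feb352d;
    x ^= x >> 15;
    x *= 0x846ca68b;
    x ^= x >> 16;
    return x;
}
```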
---
Some clustering is a bit of an intentional trade-off, as hinted at previously. Small clusters are OK though.
Some background on this:
In theory, a good hash function maps any given set of keys to a uniformly distributed set of buckets.
Also, the hashtable is not really in the business of hashing the keys. Ideally it could assume that GetHashCode is already as good as it gets, and the table would only need to reduce the full range of the int32 hash to the actual, smaller number of buckets.
Ideally, masking N bits of the hash would be all that is needed... Ideally.
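A minimal sketch of that ideal reduction, assuming a power-of-2 bucket count:

```csharp
// Ideal case: reducing a hash to a bucket index is a single AND,
// keeping only the low N bits. Requires bucketCount to be a power of 2.
static int GetBucket(int hash, int bucketCount)
    => hash & (bucketCount - 1);
```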
In practice, GetHashCode is not always good. Degenerate cases of GetHashCode, like always returning 1, can't be helped, but there are also cases of uneven distribution of hashes. These are common and can cause clustering, so an additional reshuffle of the hash before reducing it is helpful.
A mod-prime, bucket-chaining ConcurrentDictionary cares much less about this, but we have a power-of-2 open-addressing dictionary, which has many advantages but is also more sensitive to poor hashing, so we need to shuffle.
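Combining the two ideas, the reduction becomes mix-then-mask. A sketch reusing the Lowbias32 mixer from the earlier snippet (the library's actual shuffle is different):

```csharp
// Shuffle first to break up poor hash distributions, then mask down
// to the power-of-2 bucket count. Not the library's actual mixer.
static int GetBucketMixed(int hash, int bucketCount)
    => (int)(Lowbias32((uint)hash) & (uint)(bucketCount - 1));
```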
There is also one common scenario where keys are integers: 42, 43, 44, 45, ... Someone will certainly use the dictionary as a lock-free arraylist. The keys could also be phone numbers, postal codes, memory addresses, or some other numbers.
Not only do such keys present a poor hash function (mapping an int to itself is far from random), but they also often have locality of access: someone will access the hashtable in key order (1, 2, 3, 4, ...), use it in for loops, use just a few keys that are close to each other, and the like.
In such cases, preserving some clustering is useful. Keys with a "good" GetHashCode, like strings, will not lose much either way: as long as the shuffle is a reversible function, we map random to random. But for keys that naturally have locality of access it is a trade-off. Do we want to completely randomize 1, 2, 3, 4, ..., or do we want to keep small runs together?
Right now the shuffle will preserve short runs of consecutive hashes, up to 8, if I recall correctly. After that it should scatter the hashes. The picture that the shader map shows is expected.
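One hypothetical shape of such a run-preserving shuffle (not the library's actual code): pass the low 3 bits through unchanged, so runs of up to 8 consecutive hashes stay adjacent, and scatter everything above them:

```csharp
// Hypothetical sketch: hashes within the same aligned run of 8 keep their
// relative order, while different runs are scattered by the mixer.
static uint MixPreservingRuns(uint h)
    => (Lowbias32(h >> 3) << 3) | (h & 7);
```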
> erroneous resizing even moderately empty tables.
I assume this happens only when the dictionary is small and the keys are integers. Once the dictionary gets much bigger than 8, the fact that we preserve runs of 8 in the shuffle will no longer matter.
So far I think the tradeoffs are reasonable, but let me think more about the alternatives.
---
I've removed the mixing method altogether and used the worst-case scenario for the hash values: consecutive hash codes from 0 to N. Considering this issue, there is a hash conflict on every other insertion.
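For concreteness, the worst-case insert pattern looks roughly like this (a sketch; the NonBlocking.ConcurrentDictionary type is the library's, but the benchmark body is paraphrased):

```csharp
// Consecutive int keys produce consecutive hash codes,
// since int.GetHashCode() returns the value itself.
static void AddConsecutive(int n)
{
    var dict = new NonBlocking.ConcurrentDictionary<int, int>();
    for (int i = 0; i < n; i++)
        dict.TryAdd(i, i);
}
```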
These are the benchmarks:
BenchmarkDotNet=v0.13.5, OS=Windows 11 (10.0.22621.1702/22H2/2022Update/SunValley2)
12th Gen Intel Core i7-1260P, 1 CPU, 16 logical and 12 physical cores
.NET SDK=7.0.300-preview.23179.2
[Host] : .NET 6.0.16 (6.0.1623.17311), X64 RyuJIT AVX2
Job-XVWKVQ : .NET 6.0.16 (6.0.1623.17311), X64 RyuJIT AVX2
InvocationCount=1 UnrollFactor=1
Method | N | Mean | Error | StdDev | Median |
---|---|---|---|---|---|
Add_Concurrent | 1 | 1,136.4 ns | 95.26 ns | 279.4 ns | 1,100.0 ns |
Add_NonBlocking | 1 | 223.2 ns | 52.20 ns | 153.1 ns | 200.0 ns |
Add_Experimental | 1 | 199.0 ns | 53.69 ns | 157.5 ns | 200.0 ns |
Add_Concurrent | 10 | 1,527.6 ns | 91.93 ns | 268.2 ns | 1,500.0 ns |
Add_NonBlocking | 10 | 2,738.4 ns | 236.67 ns | 694.1 ns | 2,400.0 ns |
Add_Experimental | 10 | 2,470.2 ns | 220.29 ns | 628.5 ns | 2,200.0 ns |
Add_Concurrent | 100 | 9,119.8 ns | 186.31 ns | 522.4 ns | 9,000.0 ns |
Add_NonBlocking | 100 | 18,237.2 ns | 367.65 ns | 717.1 ns | 18,050.0 ns |
Add_Experimental | 100 | 13,853.6 ns | 282.75 ns | 820.3 ns | 13,700.0 ns |
Add_Concurrent | 1000 | 71,986.7 ns | 1,384.40 ns | 1,295.0 ns | 71,900.0 ns |
Add_NonBlocking | 1000 | 75,135.5 ns | 1,467.62 ns | 2,241.2 ns | 74,300.0 ns |
Add_Experimental | 1000 | 71,261.5 ns | 1,130.33 ns | 943.9 ns | 71,300.0 ns |
Add_Concurrent | 10000 | 794,375.0 ns | 11,790.12 ns | 9,205.0 ns | 796,700.0 ns |
Add_NonBlocking | 10000 | 1,029,300.0 ns | 20,234.87 ns | 24,850.2 ns | 1,023,650.0 ns |
Add_Experimental | 10000 | 909,333.3 ns | 8,171.22 ns | 7,643.4 ns | 907,200.0 ns |
The point I am attempting to make is this: mixing should probably be either improved or removed as redundant. As it is right now, it creates overhead without improving performance. Personally, I'd vote for removing it and enjoying an instant bump in performance.
---
Sorry for not getting to this earlier. It has been a pretty busy time.
My current thinking is that we still need the mixing. It is correct that mixing is redundant when keys are already random or otherwise well-behaved. However, a general-purpose hashtable must also avoid perf traps when keys are not well-behaved.
For example, keys that are multiples of large powers of 2, like {1 * 0x100000, 2 * 0x100000, 3 * 0x100000, 4 * 0x100000, ...}, may hash poorly into a power-of-2 table without some mixing. The possibility that someone will use such keys is pretty real.
As a result, we need to balance: we do some mixing to avoid degenerate behaviors, but not too much, so that we can still benefit from some sequential locality. Since the typical CPU cache line is 64 to 128 bytes, preserving runs of 8 seems reasonable.
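A quick illustration of the degenerate case above: without mixing, masking such keys down to any bucket count of 0x100000 or less sends every key to bucket 0, since their low 20 bits are all zero:

```csharp
// Keys that are multiples of 0x100000 (2^20) collide completely
// in an unmixed power-of-2 table of 2^20 buckets or fewer.
const int bucketCount = 1024;
for (int i = 1; i <= 4; i++)
{
    int key = i * 0x100000;
    Console.WriteLine($"key=0x{key:X}, bucket={key & (bucketCount - 1)}"); // bucket=0
}
```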
I understand the desire to be able to say "my keys are well-behaved, please do not mix" through some API or flag when the hashtable is created and enjoy the performance boost, but I can't think of an efficient way to support that.
---
@ENikS Are you planning to contribute the change to this? If not, I was thinking about looking into this too :)
---
I can add the code to my PR if the author has no objections. I personally like lowbias32, which performs consistently well according to this.
---
Hmm, I was thinking along the lines of first matching the hashing behavior to what the standard library's ConcurrentDictionary<K, V> does, and then evaluating the general and resizing perf characteristics. Arguably, you're more qualified on this topic, but it might be a good start?
See:
- https://github.com/dotnet/runtime/blob/main/src/libraries/System.Private.CoreLib/src/System/Collections/HashHelpers.cs
- https://github.com/dotnet/runtime/blob/main/src/libraries/System.Collections.Concurrent/src/System/Collections/Concurrent/ConcurrentDictionary.cs#L251
- https://github.com/dotnet/runtime/blob/main/src/libraries/System.Collections.Concurrent/src/System/Collections/Concurrent/ConcurrentDictionary.cs#L2222
---
The method you are referencing uses prime-sized arrays and MOD to calculate buckets. Normally that is less sensitive to inconsistent hashing, but slower than power-of-two tables. With FastMod it might be different, though.
I would be interested to see benchmarks.
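For context, FastMod is Lemire's division-free modulo: a multiplier is precomputed once per table size and `%` becomes a pair of multiplications. A sketch following the approach in the HashHelpers.cs linked above (valid for divisors up to int.MaxValue):

```csharp
// Precompute once per (prime) table size.
static ulong GetFastModMultiplier(uint divisor)
    => ulong.MaxValue / divisor + 1;

// Computes value % divisor without a hardware divide.
// Requires divisor <= int.MaxValue.
static uint FastMod(uint value, uint divisor, ulong multiplier)
    => (uint)(((((multiplier * value) >> 32) + 1) * divisor) >> 32);
```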
---
I started researching this implementation because I see it as a promising algorithm for Unity Container's new storage method. I trust Dr. Click that it is faster than MOD-based hashes and scales better. Unfortunately, all the benchmarks I am running are quite disappointing. In theory, it should be much faster, but as you can see from the benchmarks, that is not the case.