I just did a short test on an Intel Skylake i7 with gcc 5.4, modifying your example.c slightly like this:
#include <assert.h>
#include <stdint.h> /* for uint64_t */
#include <stdlib.h> /* for free() */
#include "clhash.h"

int main() {
    void *random = get_random_key_for_clhash(UINT64_C(0x23a23cf5033c3c81), UINT64_C(0xb3816f6a2c68e530));
    int i;
    for (i = 0; i < 10000000; i++) {
        uint64_t hashvalue1 = clhash(random, "my dog0123456789", 12);
        uint64_t hashvalue2 = clhash(random, "my cat0123456789", 12);
        uint64_t hashvalue3 = clhash(random, "my dog0123456789", 12);
    }
    // assert(hashvalue1 == hashvalue3);
    // assert(hashvalue1 != hashvalue2); // very likely to be true
    free(random);
    return 0;
}
Tested with string sizes 6, 7, 8, and 12. For size 8 I get this:
make example
time ./example
real 0m0.152s
user 0m0.148s
sys 0m0.000s
For the other sizes I get something like:
$ time ./example
real 0m0.493s
user 0m0.492s
sys 0m0.000s
So for really high performance, are we supposed to pad our data and use a size that is a multiple of 8? From Figure 1 in your paper I had the impression that it would be uniformly fast for sizes >= 8 at least:
https://arxiv.org/abs/1503.03465
Might there be a bug in the recent code?