What OS are you using (uname -a
, or Windows version)?
Linux hzscn008 5.10.27-051027-generic #202103310028 SMP Thu Apr 1 02:16:48 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
What programming language are you using (C/C++/Go/Rust)?
C++
g++ --version
g++ (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
What did you expect to see and what you saw instead?
I've tested your murmurhash.c, which implements the Murmur Hash algorithm. My testing approach involves converting IPv4 addresses to 32-bit integers. After applying the hash function and modulo operator, I place them into separate buckets. Subsequently, I perform statistical analysis on the bucket counts and check for deviations.
Here's my test code,save it as test_murmur.cpp,
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include "murmurhash.h"
#include <sys/socket.h>
#include <arpa/inet.h>
#include <map>
#include <iostream>
#include <numeric>
#include <algorithm>
int main (int argc, char **argv)
{
if (argc != 2)
{
printf("Usage: %s buckets_num\n", argv[0]);
return 1;
}
int port_num = atoi(argv[1]);
struct in_addr netAddr;
const char *ip_address_str = "192.168.8.1" ;
if (inet_pton(AF_INET, ip_address_str, &netAddr) <= 0) {
perror("inet_pton");
return 1;
}
uint32_t start = ntohl(netAddr.s_addr);
uint32_t hash_value = 0;
char *ip = NULL;
std::map<uint32_t, uint32_t> myMap;
uint32_t port = 0;
for (uint32_t i = start; i< start+100000;i++)
{
hash_value = murmurhash((const char *)&i, sizeof(i), 0x8edd);
port = hash_value % port_num;
myMap[port]++;
}
//get max
auto it_max = std::max_element(myMap.begin(), myMap.end(), [](const auto& lhs, const auto& rhs) {
return lhs.second < rhs.second;
});
//get min
auto it_min = std::min_element(myMap.begin(), myMap.end(), [](const auto& lhs, const auto& rhs) {
return lhs.second < rhs.second;
});
// get total
int sum = std::accumulate(myMap.begin(), myMap.end(), 0, [](int acc, const auto& p) {
return acc + p.second;
});
size_t size = myMap.size();
// get average
double average = sum / size;
std::cout <<"positive deviation: " << (it_max->second- average)/average*100 << "%" <<std::endl;
std::cout <<"negative deviation: " << (it_min->second- average)/average*100 << "%" <<std::endl;
return 0;
}
To compile:
g++ -o test_murmur test_murmur.cpp murmurhash.c -std=c++17
To run:
./test_murmur 512 # (buckets number)
Since the Murmur algorithm has passed chi-squared and avalanche tests, I initially assumed that the counts of each bucket would be almost the same, resulting in a deviation close to zero. However, as the total number of buckets increases, the deviation becomes significantly larger. For instance, when the number of buckets is set to 512, the deviation is as follows:
Positive deviation: 18.9744%
Negative deviation: -21.0256%
Do you have any insights on my test results? I don't believe the deviation is acceptable. What do you think? Thanks in advance .