Coder Social home page Coder Social logo

smhasher's People

Contributors

aappleby avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

smhasher's Issues

[Bug] in SHA1_Transform - const uint8_t buffer's data is modified

Hi,

When calculating SHA-1 hashes of a block of data, SHA1_Update is called with a pointer to const uint8_t data.

void SHA1_Update(SHA1_CTX* context, const uint8_t* data, const size_t len)

Unfortunately, the constness of data is casted away, and the contents of data are modified during the SHA1_Transform function.

An example, using sha1.h, and also sha1.cpp's digest_to_hex:

#include <array>
#include <string>
#include <iostream>
#include "sha1.h"
void digest_to_hex(const uint8_t digest[SHA1_DIGEST_SIZE], char *output);

std::string getHash(const std::string &buffer) {
	using namespace std;
	SHA1_CTX context;
	array<uint8_t, 20> digest;
	array<char, 80> outputHash;
	SHA1_Init(&context);
	SHA1_Update(&context, reinterpret_cast<const uint8_t*>(buffer.data()), buffer.size());
	SHA1_Final(&context, digest.data());
	digest_to_hex(digest.data(), outputHash.data());
	string sHash;
	sHash += outputHash.data();
	return sHash;
}
int main(void) {
	using namespace std;
	string test_buffer(1000, 'x');
	auto hash1 = getHash(test_buffer);
	auto hash2 = getHash(test_buffer);
	auto hash3 = getHash(test_buffer);
	cout << hash1 << endl << hash2 << endl << hash3 <<endl;
}

hash1!=hash2!=hash3, and test_buffer is changed every time its SHA-1 hash is calculated.

This is fixed in other repositories, based off of the same original code (ex.: http://download.redis.io/redis-stable/src/sha1.c).

Hash chain cycle length patterning of 32 bit murmur3 hash

This is a very very simple test, but I have determined the maximum length one can start from a seed value, and hash the seed, then hash the hash of the seed, etc, and in a small sampling so far (about 25 runs) I have found that 3432278346 repetitions of this consistently produces a cycle. Occasionally it finds a cycle of 10, and one time out of 5 or so occurrances of 10 cycles I found a 9 cycle.

I am currently exploring the idea of a hashchain cycle search as a PoW algorithm, and I will be implementing a simple search that simply searches for the first hash in the chain repeating in the chain, and I'll get statistics for the cycle length frequency that shows different numbers. I'm not sure if this means the hash is weak, or anything, but I think the repetition patterns based on cryptographically random seeds is useful information. I will update once I have written the repetition cycle search and have some numbers for you.

Wrote a constexpr MurmurHash3 function

If of any interest I wrote a 32bit constexpr MurmurHash3 implementation that 
calculates the hash value at compile time:

https://gist.github.com/mattyclarkson/5318077

Original issue reported on code.google.com by [email protected] on 20 May 2013 at 9:21

murmurhash3 128bit gives wrong result on big-endian platform

string "str_hash_mmh"

should give mmh3_128 value

EBC36FB5EAEF57B81E12217B6E10C8B1 (x64 Linux/Solaris/Windows/etc)

while for big-endian machines it gives

B857EFEAB56FC3EBB1C8106E7B21121E (SPARC Solaris)


Original issue reported on code.google.com by alexandr.ustinov on 5 Dec 2014 at 4:35

missing license

The murmurhash code is open-source, but lacking a proper license.
"Public domain is not a license", as they say. 
<http://www.linuxjournal.com/article/6225>

Original issue reported on code.google.com by [email protected] on 14 Mar 2014 at 6:21

Murmur3A on CentOS 6 x86_64 fails Window Tests

SMHasher is reporting that Murmur3A is failing on CentOS 6 x86_64.

The standard development environment was used
(gcc 4.4.6, cmake 2.6-patch 4, GNU make 3.81)

Expected output is successful window tests, similar to 
http://code.google.com/p/smhasher/wiki/MurmurHash3_x86_32 

Instead, 'SMHasher Murmur3A' is reporting that window tests are failing.
Keyset 'Windowed' -  64-bit key,  20-bit window - 64 tests, 1048576 keys per 
test
Window at   0 - Testing collisions   - Expected   128.00, actual 1048575.00 
(8192.00x) !!!!! 
Window at   1 - Testing collisions   - Expected   128.00, actual 1048575.00 
(8192.00x) !!!!! 
Window at   2 - Testing collisions   - Expected   128.00, actual 1048575.00 
(8192.00x) !!!!! 
[...]
Window at  64 - Testing collisions   - Expected   128.00, actual 1048575.00 
(8192.00x) !!!!! 
*********FAIL*********

Repeatable on multiple systems, using SVN version 145, CentOS 6.2, 
2.6.32-220.13.1.el6.x86_64

SMHasher results attached.

Original issue reported on code.google.com by [email protected] on 2 May 2012 at 3:23

Attachments:

MurmurHash3_x64_128 reads past end of key buffer

o it calculates the buffer size in quadwords, rounded down
  nblocks = len/16;  

o then the body loop iterates i up to nblocks-1:
  for(int i = 0; i < nblocks; i++)

o Inside the loop, these accesses happen:
  uint64_t k1 = getblock(blocks,i*2+0);
  uint64_t k2 = getblock(blocks,i*2+1);

o "blocks" is a qword pointer and the getblock calls are equivalent to
  blocks[i*2]  and   blocks[i*2+1]

o So these qword accesses will occur up to qword offset:
  (nblocks-1)*2+1
  ...almost twice the caller's buffer size.





Original issue reported on code.google.com by [email protected] on 2 Apr 2013 at 6:02

Probably bug in KeysetTest.cpp

Trying to port C++ implementation to C# I found a funny bug that confused me 
when I compared the hashes between both implementations.

The hard coded value for the MurMurHash3_x86_32 (end probably for the rest of 
methods) is not correct because of the bug in KeysetTest.cpp.

The problem is in the following piece of code (lines 28-33):

 for(int i = 0; i < 256; i++)
  {
    key[i] = (uint8_t)i;

    hash(key,i,256-i,&hashes[i*hashbytes]);
  }

At first iteration the first element of key array is set to "0" and the length 
of the key array is 1, but as far as i is sent to hash function as len 
parameter, it is set to 0.

The correct code should look like:

    hash(key,i+1,256-i,&hashes[i*hashbytes]);

and the resulting hash is 0xDBFB92BE.

Original issue reported on code.google.com by [email protected] on 28 Feb 2013 at 4:09

Minor changes to allow compiling MurmurHash3.cpp as C code.

Compiling MurmurHash3.cpp as C code instead of C++ outputs some errors that are 
trivially fixed.

The attached diff makes two changes.

Added the suffixes 32 and 64 to getblock and fmix since C does not do function 
overloading. Updated function calls sites to use the correct one.

Changed "uint64_t(tail[14])" style casts to "((uint64_t)tail[14])" in the 
function MurmurHash3_x64_128.

Original issue reported on code.google.com by [email protected] on 9 Jul 2012 at 2:53

Attachments:

New Fork - demerphq/smhasher

My apologies -- I was unaware of this version of the code, and I have produced a version of SMHasher forked from the google code repo version, with many changes and improvements (like using g-test's for stats), and properly testing the seeding of hash functions with seeds larger than 32 bits.

You can find it at https://github.com/demerphq/smhasher

I am not issuing a pull request as my version has MANY changes compared to yours at this point.

Should there be relevant changes I would like to merge in any changes from your version to mine if that is ok. Obviously that will have to be done manually.

Please reach out to me if you are interested.

Thanks,
yves

Is there a MurmurHash3_x64_64?

Hi,

The benchmarks: http://code.google.com/p/smhasher/wiki/MurmurHash3 show a 
MurmurHash2_x64_64 which is fast but not recommended. But i wonder, is there a 
MurmurHash3_x64_64? I'm guessing such a function would be about the same as 
MurmurHash3_x64_128 only faster since it just needs 1 64 bit int.

Kind regards,
Mark

Original issue reported on code.google.com by [email protected] on 16 Feb 2012 at 3:12

[Flaw/Bug in MurmurHash3_x86_128] Collisions within component hashes for small keys - related to shared seed

I noticed something alarming when generating hashes for keys under 16 bytes using the x86_128 MurmurHash3 function that uses 32-bit operations. h2, h3 and h4 have collisions at a 100% rate.

This doesn't affect MurmurHash3_x64_128 but in my case I can't do 64-bit operations.

Look:

seed = 1, data = {}
h1 = 2294590956, h2 = 1423049145 h3 = 1423049145, h4 = 1423049145
BAD.. h2 == h3 == h4

seed = 0, data = {0}
h1 = 2294590956, h2 = 1423049145 h3 = 1423049145, h4 = 1423049145
BAD.. h2 == h3 == h4

seed = 0, data = {0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0}
h1 = 1514173910, h2 = 1798126887, h3 = 3549588527, h4 = 3944909319
OK

seed = 0, data = {0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0}
h1 = 894685359, h2 = 2425853539, h3 = 2425853539, h4 = 2425853539
BAD.. h2 == h3 == h4

seed = 253, data = {254,254,254,254,254,254,254,254}
h1 = 3510856102, h2 = 544881857, h3 = 3491786221, h4 = 3491786221
BAD.. h3 == h4

It seems this happens because h1-h4 is initialized exactly the same. The issue is mitigated by XORing the seed like so:

  uint32_t h1 = seed ^ c1;
  uint32_t h2 = seed ^ c2;
  uint32_t h3 = seed ^ c3;
  uint32_t h4 = seed ^ c4;

But I am not a mathematician, so don't know if that modification compromises the hash quality.

MurmurHash3_x86_32 Algorithm Does Not match the one in Wikipedia

Your MurmurHash3 implementation uses the following tail code:

  switch(len & 3)
  {
  case 3: k1 ^= tail[2] << 16;
  case 2: k1 ^= tail[1] << 8;
  case 1: k1 ^= tail[0];
          k1 *= c1; k1 = ROTL32(k1,15); k1 *= c2; h1 ^= k1;
  };


But this does match the (otherwise identical) MurmurHash algorithm in Wikipedia:


    with any remainingBytesInKey
        remainingBytes <- SwapEndianOrderOf(remainingBytesInKey)
        // Note: Endian swapping is only necessary on big-endian machines.
        //       The purpose is to place the meaningful digits towards the low end of the value,
        //       so that these digits have the greatest potential to affect the low range digits
        //       in the subsequent multiplication.  Consider that locating the meaningful digits
        //       in the high range would produce a greater effect upon the high digits of the
        //       multiplication, and notably, that such high digits are likely to be discarded
        //       by the modulo arithmetic under overflow.  We don't want that.

        remainingBytes <- remainingBytes * c1
        remainingBytes <- (remainingBytes << r1) OR (remainingBytes >> (32 - r1))
        remainingBytes <- remainingBytes * c2

        hash <- hash XOR remainingBytes


To make the two versions match, you'd have to change the bitwise-xor in your 
implementation to bitwise-or, as follows:

  switch(len & 3)
  {
  case 3: k1 |= tail[2] << 16;
  case 2: k1 |= tail[1] << 8;
  case 1: k1 |= tail[0];
          k1 *= c1; k1 = ROTL32(k1,15); k1 *= c2; h1 ^= k1;
  };



I'm just wondering which implementation is correct.  I can't figure out where 
the canonical implementation of the algorithm is.


Original issue reported on code.google.com by [email protected] on 17 Aug 2013 at 7:07

Stable release with tag?

Currently there seems to be only the main branch, could we have some stable release tags as well?

SHA1 Sanity does not agree with Node.js SHA1.

Perhaps a bug in my code. I'm not able to create a sanity test that will 
generate the SHA1 result from the SHA1 in SMHasher using the sha1 implemented 
by Node.js. I'll create a test, but offhand I'm wondering if you're aware of 
any discrepancies between your SHA1 and the OpenSSL SHA1?


Original issue reported on code.google.com by [email protected] on 16 Aug 2013 at 3:47

Quality Score

While researching various hash algorithms, I constantly see Q-score or Quality Score -- mostly 10/10.

After downloading, compiling and running SMHasher, I cannot find a reference to this score or how to compute such. I am also failing to find any description online.

Can you point me at such a description?

City.h case

In City.cpp, change #include "city.h" to #include "City.h"

Otherwise it doesn't compile in *nix.

Original issue reported on code.google.com by [email protected] on 12 Apr 2011 at 6:14

Clarify the units in MH3 wiki performance section

The Wiki page indicates that the performance for MurmurHash3_x64_128 is 5058 mb/sec. This is less than one byte per second. In units, case matters. The prefix m is 10^-3 and a lowercase b is bits.

I'm not just being pedantic: I don't know what those numbers actually mean. Is that supposed to be megabytes (MB = 8 * 10^6 bits) or megabits (Mb = 10^6 bits)? Both are pretty common when talking about streams. Or are those actually mebibytes (MiB = 8 * 2^20 bits) or mebibits (Mib = 2^20 bits)?

I would guess that the correct number is actually 5058 MB/s, 5058 Mb/s or 5058 MiB/s.

There are some other places where the units are slightly off but still clear, but it's important to differentiate upper– and lowercase with M/m and B/b, and it's important to use the IEEE prefixes Mi, ki, Gi if you're using binary units instead of decimal. I would be happy to make a pull request that corrects the units if you let me know which ones they actually are. πŸ˜„

ppc64le port of smhasher.

Hi,

I am working on porting smhasher to Power8/LE (ppc64le) platform. I have done necessary code changes and carried out tests to ensure that there are no regressions on ppc64le. I will create a PR soon for these changes. Requesting smhasher community to review the changes.

Thanks,
Atul.

SmHasher breaks without -fno-strict-aliasing

A change in revision 145 caused MurmurHash3 to start failing the "keyset 
'Window' tests" with:

  Keyset 'Windowed' -  64-bit key,  20-bit window - 64 tests, 1048576 keys per test
  Window at   0 - Testing collisions   - Expected   128.00, actual 1048575.00 (8192.00x) !!!!!
  Window at   1 - Testing collisions   - Expected   128.00, actual 1048575.00 (8192.00x) !!!!!
  ... etc ...

I applies the r145 changes one by one to r144 and found that the removal of the 
following line from CMakeLists.txt is the cause.

  set(CMAKE_CXX_FLAGS "-g -fno-strict-aliasing -Wall")

After about an hour I've come to the conclusion that strict-aliasing is complex 
:). GCC has __attribute__((__may_alias__)) that I believe should allow 
strict-aliasing to be enabled project wide but disabled in the places that 
cause the problem, but I can't get it to work.

Adding the line back into CMakeLists.txt fixes the problem, but the logs say it 
was removed to stop Visual Studio from complaining. Perhaps that problem can be 
solved by some other means.

(I'm using Debian-6 with GCC 4.4.5)

Shane Day

Original issue reported on code.google.com by [email protected] on 16 Jul 2012 at 4:33

Lots of collisions with triplets of integers

I was interested in using a hash for integer arrays with few numbers in them.
So I searched for any collisions in a very sample case: unique triplets of 
numbers of the form (i,j,k) , where 1<= i< j < k <=255. The hash is calculated 
from the simple uint32_t array made with those 3 numbers. There are dozens of 
repetitions!

What steps will reproduce the problem?
-Run the enclosed C++ file
See the attached "sample output" for identified collisions.

Am I doing something wrong?

Original issue reported on code.google.com by [email protected] on 10 Mar 2012 at 11:10

Attachments:

Fixes to City.h to to allow inclusion in third-party sources

City.h shouldn't include Platform.h, as that can lead to compile errors when 
City.h is included in other third-party software (such as Chromium: 
https://code.google.com/p/chromium/issues/detail?id=353157).

The attached patch should fix the issue, in a way analogous to what was done in 
r151 for the MurmurHash functions.  If this looks good to you could you please 
apply it?

Original issue reported on code.google.com by [email protected] on 24 Mar 2014 at 11:23

Attachments:

Low 32-bits of seed data is discarded for h2 in MurmurHash64B [MurmurHash2_x86_64]

The MurmurHash64B function of MurmurHash2 specifies a 64-bit seed like so:

uint64_t MurmurHash64B ( const void * key, int len, uint64_t seed )

Then proceeds to shift the high 32-bits over to initialize h2:

uint32_t h2 = uint32_t(seed >> 32);

This means that no matter what the seed is, if seed < 0x100000000, then h2 will always be initialized to 0. Effectively making h1 and h2 use two different 32-bit seed components split from a 64-bit integer.

Is there a sound reason that h2 cannot use the same seed as h1 (even a transformation)?

To implement this function without support for 64-bit integer shifts, the arguments must be equivalent to uint32_t seed1 and uint32_t seed2 instead of just one uint32_t seed.

Otherwise (seed >> 32) === seed and the hash value will not match, making it not fully 32-bit compatible (using JavaScript port which uses 32-bit shifts).


If I had to choose how h2 was initialized, it would be like so:

uint32_t h2 = uint32_t(seed) ^ len ^ m;

Then change the seed size to 32-bit in the arguments because it is no longer needing to be 64-bits:

uint64_t MurmurHash64B ( const void * key, int len, uint32_t seed )

Also, seems this 64-bit hash has collision probability of 32-bit hash in some cases - can't that be fixed by simply XORing h1 ^= h2 and h2 ^= h1 between each read in the mixing function? Seems fast in JS, but is that too expensive in 32-bit machines?

[Question]How can i use murmur3 to reshuffle a same dataset for many times

Can i use different seed for every time reshuffling to do that?
Is the every time reshuffling expected to be well-distributed(or maybe depends on the seed)?

If i can:
Comparing to the solution that appending different salt string to the key to get a reshuffling,
the way that changing the seed has any advantage(disadvantage)?

Very slow with gcc 4.4.4

It's slow in my machine. A Slackware 13.1 box with a Core i5 650 3.2 GHz.

It's only 500 MB/s, where OpenSSL MD5 is 600 MB/s.

The reason is the (broken) optimizer of gcc 4.4.4 that does things like:

g++ -g -c -O2 MurmurHash3.cpp -o MurmurHash3.o

-> .cpp
    k1 *= c1;
-> .asm
 152:   8d 14 80                lea    (%eax,%eax,4),%edx
 155:   8d 14 90                lea    (%eax,%edx,4),%edx
 158:   c1 e2 03                shl    $0x3,%edx
 15b:   29 c2                   sub    %eax,%edx
 15d:   8d 14 d2                lea    (%edx,%edx,8),%edx
 160:   8d 14 90                lea    (%eax,%edx,4),%edx
 163:   8d 14 d0                lea    (%eax,%edx,8),%edx
 166:   8d 14 90                lea    (%eax,%edx,4),%edx
 169:   8d 14 50                lea    (%eax,%edx,2),%edx
 16c:   8d 14 90                lea    (%eax,%edx,4),%edx
 16f:   8d 14 92                lea    (%edx,%edx,4),%edx
 172:   8d 14 50                lea    (%eax,%edx,2),%edx
 175:   8d 04 d0                lea    (%eax,%edx,8),%eax
 178:   8d 14 c5 00 00 00 00    lea    0x0(,%eax,8),%edx
 17f:   29 d0                   sub    %edx,%eax

changing the c1,c2,c3,c4 constants as global variables, fixes that, and I get 
3000 MB/s.



Original issue reported on code.google.com by [email protected] on 12 Apr 2011 at 6:31

More hashes

Adds:
* One-At-A-Time
* Lookup2
* Python's hash function

Also renames FNV to FNV-1a because the implemented FNV hash is the 1a variant.  
Adds FNV-1 so it can be compared to the 1a variant.

Original issue reported on code.google.com by [email protected] on 6 Jul 2014 at 6:41

Attachments:

Make inline Functions static

What steps will reproduce the problem?
1. Rename Murmurhash3.cpp to Murmurhash3.c
2. Import Murmurhash3.h and Murmurhash3.c into Xcode
3. Attempt to compile it into a project.

What is the expected output? What do you see instead?

Should work, but to errors:

Undefined symbols for architecture i386:
  "_rotl32", referenced from:
      _MurmurHash3_x86_32 in MurmurHash3.o
      _MurmurHash3_x86_128 in MurmurHash3.o
  "_rotl64", referenced from:
      _MurmurHash3_x64_128 in MurmurHash3.o
ld: symbol(s) not found for architecture i386
clang: error: linker command failed with exit code 1 (use -v to see invocation)

What version of the product are you using? On what operating system?

Copied from Subversion today (2014-05-05). Xcode 5.1.1.

Please provide any additional information below.

Was able to get it to work by making those functions static:

Index: MurmurHash3.cpp
===================================================================
--- MurmurHash3.cpp (revision 152)
+++ MurmurHash3.cpp (working copy)
@@ -31,12 +31,12 @@

 #define    FORCE_INLINE inline __attribute__((always_inline))

-inline uint32_t rotl32 ( uint32_t x, int8_t r )
+static inline uint32_t rotl32 ( uint32_t x, int8_t r )
 {
   return (x << r) | (x >> (32 - r));
 }

-inline uint64_t rotl64 ( uint64_t x, int8_t r )
+static inline uint64_t rotl64 ( uint64_t x, int8_t r )
 {
   return (x << r) | (x >> (64 - r));
 }

Original issue reported on code.google.com by [email protected] on 5 May 2014 at 10:22

MSVC2010 compilation.

Compilation of MurmurHash3.cpp under MSVC2010 failed with error "redefinition 
of uint32_t...".
I changed line "#if defined(_MSC_VER)" to "#if defined(_MSC_VER) && (_MSC_VER 
<= 1500)" and it worked for me.
Thank you!

Original issue reported on code.google.com by [email protected] on 20 Dec 2012 at 10:03

[PATCH] Add missing `inline' keyword to FORCE_INLINE

More recent versions of gcc (such as 4.7.0) have started warning that functions 
using the FORCE_INLINE attribute could not always be inlined.

In fact, __attribute__((always_inline)) is supposed to be used together with 
`inline' instead of replace it.

The attached patch (verified to work with gcc 4.8.0 and clang 3.2) adds the 
missing keyword.

Original issue reported on code.google.com by [email protected] on 18 Apr 2013 at 12:05

Attachments:

MurmurHash3 collision property

Hi,

I have to choose a hash function for a Bloom Filter in my Bachelor's thesis.
As recommended in some tutorials I used a version of the MurmurHash3.
My supervisor wants me to find the value of the collision property but I cannot find the place of the documentation.
I have read that it is low but not a real value.
If someone could help to find the collision property, I'd be very greatful.

Thanks,
Julia

Class similar to CMurmurHash2A, using the MurmurHash3 algorithm?

Is there a class similar to CMurmurHash2A, using the MurmurHash3 algorithm, or 
can you explain breifly how it might be produced? I imagine it needs 3 state 
fields, namely h1, c1, and c2. I previously converted CMurmurHash2A to a struct 
in C#, and plan to do the same for the new algorithm. 

Thank you, Frank Hileman

Original issue reported on code.google.com by [email protected] on 8 Jan 2011 at 5:46

How do you compile this thing

The readme doesn't explain how to use/compile this. I am not intimately familiar with compilers, only know how to compile with MinGW using one or more files like so:

g++ -o out file1 file2 file3

I even tried grabbing CMake but I have no clue how to use it:
image

Help a brother out. I need to test hashes.

Should MurmurHash3_x64_128 take a uint64_t seed parameter instead?

The 32-bit seed parameter is used to initialize the two 64-bit h1 and h2 variables. Would it make sense (and would it make any qualitative difference) to allow a 64-bit seed value instead?

void MurmurHash3_x64_128 ( const void * key, const int len, const uint32_t seed, void * out )

[...]

uint64_t h1 = seed;
uint64_t h2 = seed;

[...]

Thank you!

Improper handling of negative seeds

What steps will reproduce the problem?
1. Supply a negative seed value when constructing an instance of MurmurHash3
2. Compute a hash
3. Get a different hash value than the reference implementation provides for 
the same input and seed



Please provide any additional information below.

Below is the current constructor for when a seed is supplied.

   public Murmur3F(int seed)
   {
      this.seed = seed & (0xffffffffL); // unsigned
      h1 = seed;
      h2 = seed;
   }

Here is the code from the reference implementation:

void MurmurHash3_x64_128 ( const void * key, const int len,
                           const uint32_t seed, void * out )
{
  const uint8_t * data = (const uint8_t*)key;
  const int nblocks = len / 16;

  uint64_t h1 = seed;
  uint64_t h2 = seed;


Note that h1 and h2 are unsigned 64-bit integers. Because they are unsigned, 
they will not inherit the sign of the supplied seed. However, in the 
corresponding java code, h1 and h2 will become negative if the seed is negative 
(e.g. a value greater than Integer.MAX_VALUE).

A minor modification to the constructor resolves the issue, and makes the 
output congrument with the reference implementation for negative seed values:

   public Murmur3F(int seed)
   {
      this.seed = seed & (0xffffffffL); // unsigned
      h1 = this.seed;
      h2 = this.seed;
   }

Original issue reported on code.google.com by [email protected] on 31 Dec 2014 at 3:39

PMurHash.c on Solaris

On Solaris, with native cc, _BIG_ENDIAN or _LITTLE_ENDIAN is #define'd, but 
just #define'd; there is no numeric value. This causes tests like 
(_BIG_ENDIAN==1) (PMurHash.c:96) to fail with a "missing operand".

Perhaps choose BIG if _BIG_ENDIAN is defined and _LITTLE_ENDIAN is not defined, 
and vice versa. Using (_BIG_ENDIAN+0==1) should get around the syntax problem 
for that part of the test.

Original issue reported on code.google.com by [email protected] on 15 Oct 2014 at 8:48

Lacking 3 MurmurHash3 variations

What steps will reproduce the problem?
Read the code.

What is the expected output? What do you see instead?
In MurmurHash3.h there are all 6 variations of MurmurHash3. However in 
MurmurHash3.cpp, there are only 3 implementations.

What version of the product are you using? On what operating system?
SVN repository by 4/6/2011. Revision 125.

Please provide any additional information below.
In project homepage, update on 4/2/2011 states that "MurmurHash3, all versions, 
is final." Is this suggesting that the other 3 variations are on the way? Will 
be the package that contains all 6 variation released for downloading?

Original issue reported on code.google.com by [email protected] on 8 Apr 2011 at 3:34

Please provided basic test vectors

Would it be possible to have some basic test vectors for testing alternate 
implementations ?

Even a simple single one, like the hash of "The quick brown fox jumps over the 
lazy dog" for all the algorithm flavor would be enough.

I found a lot of wrong implementations of Murmur3 and a such simple test vector 
from the official site will help a lot the developers.


Original issue reported on code.google.com by [email protected] on 30 Apr 2011 at 2:06

License File Missing ;)

It would be nice if a license file would be included, for MurmurHash3 and Co, 
that we can package with our binary distribution of our software. I intend to 
use the Hash function, which seems to be ok, given the MIT license.

Original issue reported on code.google.com by [email protected] on 26 Jan 2011 at 12:54

[Suggest]Why not use git submodule?

Currently we included several hash source code into our project. This is difficult to maintain. Sometimes the author updated the hash project to get a better performance, but we didn't notice that.

Why don't we turn this project into a main test suit with several git submodule for other hash functions?

16-bit version?

Hello,

Is it possible to create a version of MurmurHash where the result and intermediate values are 16 bits?

My intended application would be small, typically 8-bit, microcontrollers using small keysets.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    πŸ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. πŸ“ŠπŸ“ˆπŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❀️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.