Coder Social home page Coder Social logo

Secure hash lookups about highwayhash HOT 22 CLOSED

google avatar google commented on May 14, 2024
Secure hash lookups

from highwayhash.

Comments (22)

dgryski avatar dgryski commented on May 14, 2024 2

@rurban Let's keep in mind that this repository is for a single hash function, not a hash table implementation claiming to be sure against all attacks. The best you can hope for in this repo is a doc fix to the readme to clarify what security using highway hash provides.

Second, your threat model appears to include "attacker has the seed". Very few cryptographic systems are secure against a known-key attack, so that's a bit of a silly claim.

Please submit a PR with your proposed documentation fix for the Highway Hash README.

Otherwise, I think you're attacking the wrong project.

from highwayhash.

funny-falcon avatar funny-falcon commented on May 14, 2024 2

+1 for README changes.

Note for my suggestion for "unbalanced tree":

  • I meant, chained hash table could use unbalanced-tree instead of simple chaining. So, it is not switching from "hash-table" to "tree", but use of "unbalanced-tree" for collision resolution in a hash table.

from highwayhash.

rurban avatar rurban commented on May 14, 2024 1
  • How can you still claim that the hash function has anything to do with seed detection? It has nothing. There exist bad hash function, for which collisions can be found independent of the seed, but for normal hash functions not.

You can use a very slow hash function, like scrypt, which makes naive seed detection impractical, but also makes a hash table impracticle.
You can use a cryptographically strong hash function, like AES, and still get at the seed and still can brute force collisions at ease.

  • Strong hash functions do not exist for hash tables. Strong hashing starts with 128 bit minimum. Hash tables use 32bit , and some 64bit. An attack needs 10-16bit typically, max 20bit which is still easily brute-forcable.

  • Simple seed detection is outlined in #28. The old timing attack is primitive and not practical. But practical seed exposure exists as outlined. And advanced attacks exists also, not published, but hinted at with examples.

  • A hash function can never be part of the security of a hash table. The reverse is true, an insecure hash function, like fast xor-add or mult schemes or CRC32, can lead to insecure hash tables. But secure hash functions do not lead to secure hash tables.

First of all you want to protect your seed, i.e. don't print it to the user, randomize the ordering of the collisions or iterator, harden the memory location of the seed if the target is a dynamic language.

But more important is to protect your collision resolution scheme from constant to becoming linear (was: quadratic). Chaining via linked lists is outdated 90ies technology and cache unfriendly. It is the slowest hash table. e.g. even keeping arrays for buckets is faster. Knuth mentioned ordered tables, but this never got traction, even if normal collisions for fast functions gather around max 5-8, and this is trivially insertion sortable. Trees neither.
For security simply count your collisions. Since the max collisions are on average log n and the max size is 32bit (sometimes 64bit), the max collisions can be hardcoded to 100. Not a single hash table usage will exceed that besides an attack. So either kill it like djb did in his dns server or add a small sleep and log the attack as I did in cperl so that other services can respond. And maybe add a fail2ban rule.
Also for modern hash tables with open addressing and either linear probing, quadratic probing, cuckoo hashing, robin hood, hopscotch or preshing's leapfrog just count it. Double hashing with two independent random functions is secure by default, but not as fast as linear, cuckoo or robin hood.

With a proper collision resolution scheme and without any siphash security theatre a hash table can become a fast hash table again. The worst case, an attack, is trivially detectable, and you should not harm your average or fast case for the unlikely event of the worst case by using a slow, big hash function which does not add any security.

It should also be noted that writing a secure hash table is not easy, many mistakes had been made over the decades, and there rarely exist good hash tables other than in the top league: linux or glibc.
Most implementations are insecure. Most implementations are overly slow. Most optimizations are unsound. Many simple mistakes have been made, only detected years after.
Why are they not attacked? Because a DoS attack on a hash table is not so easy anymore as it was a few years ago, it's just a stupid denial of service, and there exist much easier and more trivial methods to perform that.

from highwayhash.

funny-falcon avatar funny-falcon commented on May 14, 2024 1

You're right, resolution scheme matters:

  • double hashing could be done with single calculation of SipHash: use low 32bits as first hash and upper 32 bits as second.
  • bucketized double hashing could be as efficient as linear probing in average (i did it already, i know what I'm talking about).
  • cuckoo hash could also use sibngle SipHash calculation for both position, and bucketized cuckoo hash allows huge guaranteed fillrate (>90% for buckets of 4 entries). ( i did it already too, it is also quite efficient ).

So, it is quite simple to build safe hash table if you have safe hash function. It doesn't really cost all those noise you produce, @rurban.

One the other hand, I still believe, that "hash flood resilent" function could be much simpler and faster than SipHash.

For example, Microsoft has Marvel32 which operates with 2*32bit state instead of 4*64bit, and has comparable performance and provides comparable protection. 2*64 bit version will be certainly faster than SipHash, cause it will consume 8bytes at round instead of 4. But, unfortunately, Microsoft covers it with patent. So one can safely use it only as a part of .Net and .Net Core. ( @dgryski be aware of it ).

from highwayhash.

dgryski avatar dgryski commented on May 14, 2024 1

Thanks for the heads up. I'll add a disclaimer to my repository.

from highwayhash.

jan-wassenberg avatar jan-wassenberg commented on May 14, 2024 1

Please see the now edited OP for the newly proposed section. In particular, there are concrete proposals how to "avoid linear search", going beyond "kill it" or "sleep and log" (which might be counterproductive). Also note that the most theoretically sound collision resolution scheme mentioned so far, augmented cuckoo hashing, has a strong hash function as a prerequisite.

@dgryski and others, looking forward to any feedback you might have.
@rurban, I wanted to thank you for starting this discussion, which has added much useful information to the (proposed) README.

I am far more interested in PRF than hash tables, so we will definitely retain the "cryptographically strong" (not secure) description. However, I'll also edit the README to make clear that it refers to the hash function, not any hash table.

from highwayhash.

jan-wassenberg avatar jan-wassenberg commented on May 14, 2024 1

Looks like there are no remaining concerns, but please feel free to reopen if you have any further comments.

from highwayhash.

rurban avatar rurban commented on May 14, 2024

@funny-falcon

So, it is quite simple to build safe hash table if you have safe hash function. It doesn't really cost all those noise you produce, @rurban.

No it's not. I see that you still don't get it.
A safe hash function does not exist for hash tables. Such a thing cannot exist in hash tables. It's a wrong myth which needs to be destroyed. This myth originated with the SipHash paper and was widely adopted amongst some not so experienced developers.
There only exist fast/slow hash functions, and unsafe/bad/good hash functions. But never safe. safe starts with 128bit. hash tables use 6-16bit, max 32-64.

This hash here, which seems to be a really good and practical improvement over SipHash, copied these false claims. Otherwise it's fine.
A hash function should not be recommended to counter DoS attacks. This is a wrong and dangerous claim.

from highwayhash.

jan-wassenberg avatar jan-wassenberg commented on May 14, 2024

@funny-falcon:

I still believe, that "hash flood resilent" function could be much simpler and faster than SipHash.

Agreed :) HighwayHash is somewhat simpler/faster, would be great if it could be simpler still.
The ideas of PCG random are intriguing but apparently require some additional tricks to achieve good distribution and avalanching. We can gladly discuss this in another issue if someone is interested.

Strong hashing starts with 128 bit minimum.

The minimum size depends on how long a hash must resist attack. 64-bit hashes are very useful for securely authenticating short-lived messages such as RPCs.

You can use a cryptographically strong hash function, like AES, and still get at the seed and still can brute force collisions at ease.

Are you claiming the existence of an "easy" remote attack for recovering the seed of an AES-based hash?

practical seed exposure exists as outlined. And advanced attacks exists also, not published, but hinted at with examples.

These are interesting claims, but we have yet to see any actual evidence.

Chaining via linked lists is outdated 90ies technology and cache unfriendly.

Is anyone here advocating chaining?

either kill it like djb did in his dns server

Yes, that seems fine in the simple case of a cache that can just discard some requests.

or add a small sleep and log the attack

This sounds like it might worsen the effects of an attack.

secure hash functions do not lead to secure hash tables

You are repeating the first part of this statement above: "a strong hash function will not magically save a chained hash table from flooding attacks, but it is an essential part of a robust solution."

Let’s also repeat the second part: the augmented cuckoo hash table linked above requires a true pseudorandom hash function in order to meet its worst-case guarantees. Using a reversible/bad hash function would break the security of that hash table.

A hash function should not be recommended to counter DoS attacks. This is a wrong and dangerous claim.

This was never our claim. Hopefully the text above will make clear that a strong hash function is necessary or at least helpful, but not (by itself) sufficient :)

from highwayhash.

rurban avatar rurban commented on May 14, 2024

@jan-wassenberg

You can use a cryptographically strong hash function, like AES, and still get at the seed and still can brute force collisions at ease.
Are you claiming the existence of an "easy" remote attack for recovering the seed of an AES-based hash?

I was claiming that many hash table implementations are bad, severe errors had been continuously found over decades of false security claims. I do not trust anybody other than linux and glibc on hash table security. Esp. not anyone who is arguing for SipHash as Anti-DoS measure.
Seeds do get exposed. Get over that. And then you brute force it.

AES (if mangled into a hash function) or SipHash and most likely HighWayHash are fine, the problems are exposure via other means.
I outlined the perl problems, as I rewrote the perl hash parts to become faster, with less security theatre, but it still is horrible.
Similar problems do exist in similar languages and implementations. We just heard of rust, we know that python does not care, we know that ruby devs come up with wrong security claims all the time.
Both python and ruby just rewrote their hash tables. python looks fine at a first glance but who knows. php still thinks that MAX_LIMIT is enough. It's a good idea (better than ruby), but via JSON still vulnerable. Nobody other than perl implemented anything against order-exposure, even if perl still has the worst of all hash table implementations.

Of course I haven't implemented a solver based remote attack for AES (which AES? AES is a block cypher. The go variant? It needs to be a crippled variant when used as simple hash table function. Remember, only the last 10 bits, not all 256), but it might be doable given we need to unsat just 10 bits, and https://jheusser.github.io/2013/02/03/satcoin.html
solves unsat sha(sha(block)) in 1m for 1000 seeds. I'm too lazy to find a AES hash in C (google for crippled Rijndael in C?), given that we nowadays use the intrinsics which the solver does not understand. For SipHash and variants it's doable given the many exposures the current implementations offer for attackers.
And as long as even AES as hash table function is brute-forcable it's a non-issue.

So far no hardened hash tables do exist in most implementations. I know only of 4 out of hundreds. I don't consider hardening by SipHash or random seed or re-seeding on resize or seed per table secure. I only consider secure avoidance of linear search through all the collisions. Because attackers will get to that.

BTW: I'm not a fanatic on hash table security. I have the same lax POV as perl, python and PHP on this. I just don't like false security claims. I rather use fast hash functions and secure it where it matters. I.e. not in the hash function.

PS-PS: Given the conspiracy theory brought up by the wikileaks anon on 8chan, who violated his gag order, that Assange's dead men switch was avoided by a huge DoS attack on the US nameservers, so that the automatic publication of the dead men switch failed, and now they control all the wikileaks servers, hash table security does have some global impact. But this is just a theory. Thanksfully critical services do have a different stand on lax hash table security.

from highwayhash.

funny-falcon avatar funny-falcon commented on May 14, 2024

About chaining:
If you have strong hash function, you need not RBTree with ordering by key. You may use just unbalanced binary search tree with ordering by hashsum, cause balance is provided by randomness of hash function.
If you have no strong hash function, then you cannot use this trick.

It is quite simple to build "double hashing" which consumes most of hash bits:

   /* for table of power 2 */
   unsigned h, d, mix;
   h = strong_hash(key);
   d = 1; mix = h;
   for (;; h+=d; d+=1+(mix>>=8)) {
     pos = h & table_mask;
     ... /* body */
   }

And this will be very strong hash-table implementation if strong hash function is used.
And it will be comparable in performance to linear hashing if bucketized (ie position is not for one slot, but for several).
(in fact, i've already put a trick into code above for second lookup to next neighbor :-) )

@rurban , you make a lot of noise instead of looking for simple solutions and popularizing them.

from highwayhash.

funny-falcon avatar funny-falcon commented on May 14, 2024

@rurban , btw even SipHash should be used (if used) in SipHash13 configuration instead of SipHash24, cause it is already provides desired "strong level" for hash table and it is 25% faster on short strings and 50% faster on long strings.

from highwayhash.

dgryski avatar dgryski commented on May 14, 2024

@funny-falcon The source code to Marvin32 is now available under the MIT license: https://github.com/dotnet/coreclr/blob/master/src/vm/marvin32.cpp

from highwayhash.

funny-falcon avatar funny-falcon commented on May 14, 2024

@dgryski , it is very delicate situation in legal sense:

So, do one have right to reimplement Marvin32? without copying parts of file? in other language?
Even if one may reimplement Marvin32, may one use it in cases covered by patent and not within .NET application?
Cite:

a checksum component that computes a checksum value derived from the received input data and determined seed value using a Marvin32 checksum function;
a data output component that provides the computed checksum value as output data to other components of the system or external components; and
a hash table component that manages a hash table that uses the checksum value as a hash key.

Patent is not expired https://www.google.ch/patents/US20130262421 .

I'm not expert in license and patent laws.
So, if you more confident in this questions, I'd be glad to hear competent conclusion.

from highwayhash.

jan-wassenberg avatar jan-wassenberg commented on May 14, 2024

I'm back :)

@funny-falcon: thanks for proposing an unbalanced tree of hashes - we'll edit that into the original post as another alternative.

Seeds do get exposed. Get over that. And then you brute force it.

Even if the seed is exposed, the above cuckoo hash scheme comes with worst-case guarantees for insertion/lookup cost, which solves the flooding problem.

I do not trust anybody other than linux and glibc on hash table security. Esp. not anyone who is arguing for SipHash as Anti-DoS measure.

You have not yet contradicted the statements that strong hashes are required for the augmented cuckoo hashing security proof, and/or helpful for avoiding similar worst-case behavior in unbalanced trees. Therefore, strong hashes are indeed part of Anti-DoS measures.

I rather use fast hash functions and secure it where it matters. I.e. not in the hash function.

This entire issue is about security measures outside of the hash function.
A per-bin counter and some unspecified logging/fallback mechanism is not sufficient for all applications and may even make attacks worse.

The only two secure schemes proposed so far, augmented cuckoo hashing and hash-keyed trees, both require or benefit from a strong hash function. Your "fast hash functions" will likely undermine the security of these approaches.

@funny-falcon: I have some experience with licensing issues and agree with your interpretation: the patent grant is only for .Net code. MIT does not grant any patent rights (by contrast, Apache 2 does), so I believe this code is not safe to use nor even re-implement. Let’s move any further discussion of it into another issue to keep this one on-topic.

from highwayhash.

rurban avatar rurban commented on May 14, 2024

@rurban , you make a lot of noise instead of looking for simple solutions and popularizing them.

Nonsense. Your solution is not secure, only more secure and slower for the average case.
And my solution is simple, fast and secure.

from highwayhash.

rurban avatar rurban commented on May 14, 2024

@dgryski Exactly. I wanted only 3 introductory claims to be improved. This secure hash table discussion was only added to close the original issue, and keep hanging on the false and dangerous claims from SipHash. Which were mostly dangerous in the interpretation of the claims downstream.

Second, your threat model appears to include "attacker has the seed". Very few cryptographic systems are secure against a known-key attack, so that's a bit of a silly claim.

Since so many systems make it trivial to expose the seed, you cannot claim security based on the secrecy of it. But you can trivially claim security by avoiding linear search for the worst case.

I know that I was "attacking" the wrong project. I said explicitly so. But still 3rd parties shouldn't claim false security based in a high-profile other project, based on it. siphash has no repo and issue tracker. They solely base everything on their high crypto profile. And these claims spread like wildfire in the dynamic language community. And since highway hash is the superior variant of siphash it is a valid argument to make here.

Just don't announce it as siphash did. It doesn't make your hash table secure.
This is the main argument repeated in the communities. Avoid the misleading "cryptographically secure" label, which is for a PRF, but not for hash function in a hash table.
And don't recommend it for hash table usage, and DoS countermeasure. Hash tables should use fast, small hash functions, not strong hash functions. Security comes from the collision resolution, not from a strong hash function.
Even quality doesn't come from a strong hash function as I measured.
Strong hash functions in hash tables help in hiding the seed and avoid too stupid mistakes, but not much else.

Don't take part in the theatre others are celebrating. You are google at last and not OpenBSD or such.

from highwayhash.

jan-wassenberg avatar jan-wassenberg commented on May 14, 2024

The proposed README changes are in. I will keep this issue open for a while and will be happy to incorporate additional feedback/suggestions.

from highwayhash.

rurban avatar rurban commented on May 14, 2024

Thanks

from highwayhash.

jan-wassenberg avatar jan-wassenberg commented on May 14, 2024

Thanks for helping me understand your proposal. I have revised the README and also mentioned prime hash table sizes do not require expensive division/modulo.

from highwayhash.

rurban avatar rurban commented on May 14, 2024

@jan-wassenberg

also mentioned prime hash table sizes do not require expensive division/modulo.

This is actually the other way round.

  • prime hash table sizes are a good security counter measure, because there all the bits are used for the index, but require an expensive modulo.

  • power of 2 size tables can by modulo'd much faster, usually via the ctz (count trailing zeros) builtin. But there only the last bits are used for the index, so brute-forcing such a table is much faster. 4min vs < 0.1s.

good hash tables use primes, dynamic languages use power of 2.

from highwayhash.

jan-wassenberg avatar jan-wassenberg commented on May 14, 2024

@rurban : the reason I mention this is that it is surprising and counter-intuitive. Lemire's multiply-and-shift mapping (see link in README) combines the benefits of bitwise AND and division-by-prime - it uses all hash bits, gives a fair distribution and still avoids the expensive division.

from highwayhash.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.