howardhinnant / hash_append
Implementation of hash_append proposal
I would like to propose adding an identity_hash that stores its input bytes as its result.
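#include <cstddef>
#include <cstring>

class identity_hash
{
    std::size_t m_state = 0;
public:
    static constexpr xstd::endian endian = xstd::endian::native;
    using result_type = std::size_t;

    // Copy at most sizeof(m_state) bytes of the key into the state.
    // memcpy avoids the misaligned read of a pointer cast and never reads
    // past len; it is not constexpr, so neither is this operator.
    void operator()(void const* key, std::size_t len) noexcept
    {
        std::memcpy(&m_state, key, len < sizeof(m_state) ? len : sizeof(m_state));
    }

    explicit constexpr operator result_type() const noexcept
    {
        return m_state;
    }
};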
The use cases for this simple hash function are thin wrapper classes around regular value types that cache their hash keys. Below is a sketch of such a wrapper class that holds a value and its hash key. The key can be computed in the wrapper's constructor by xstd::uhash using any full-fledged hash function (fnv1a here, but also Murmur, City or even std::hash).
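template<class T, class InternalHash = acme::fnv1a>
class wrap
{
    T m_value;
    typename InternalHash::result_type m_hash;  // cached hash key
public:
    // Compute the key once, with the full-fledged internal hash.
    explicit wrap(T const& u) noexcept
        : m_value{u}
        , m_hash{xstd::uhash<InternalHash>{}(m_value)}
    {}

    // Only the cached key is forwarded, so the external hasher must be
    // the identity hash.
    template<class ExternalHash>
    friend void hash_append(ExternalHash& h, wrap const& w)
    {
        static_assert(std::is_same<ExternalHash, acme::identity_hash>{},
                      "wrap must be hashed with acme::identity_hash");
        using xstd::hash_append;
        hash_append(h, w.m_hash);
    }
};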
Once the key has been computed and wrapped, the wrapper object itself can be stored inside a std::unordered_set<wrap<T>, xstd::uhash<acme::identity_hash>>. This avoids recomputing the hash key.
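The idea can be tried without the hash_append headers. The following is only a sketch under stated assumptions: std::hash stands in for xstd::uhash<acme::fnv1a>, and the names cached, cached_hash and string_set are hypothetical, with cached_hash playing the role that xstd::uhash<acme::identity_hash> has in the text.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_set>

// Hypothetical stand-in for the wrapper in the text: the hash key is
// computed once in the constructor and stored next to the value.
template <class T, class InternalHash = std::hash<T>>
class cached
{
    T m_value;
    std::size_t m_hash;
public:
    explicit cached(T const& u)
        : m_value(u)
        , m_hash(InternalHash{}(m_value))
    {}

    T const& value() const noexcept { return m_value; }
    std::size_t hash() const noexcept { return m_hash; }

    // unordered_set also needs equality; comparing keys first is cheap.
    friend bool operator==(cached const& a, cached const& b)
    {
        return a.m_hash == b.m_hash && a.m_value == b.m_value;
    }
};

// Assumed replacement for xstd::uhash<acme::identity_hash>: it hands back
// the cached key and never touches the wrapped value.
struct cached_hash
{
    template <class T, class H>
    std::size_t operator()(cached<T, H> const& c) const noexcept
    {
        return c.hash();
    }
};

using string_set = std::unordered_set<cached<std::string>, cached_hash>;

// Usage: inserting the same value twice keeps one element.
inline void usage_example()
{
    string_set s;
    s.insert(cached<std::string>{"e2e4"});
    s.insert(cached<std::string>{"e2e4"});
    assert(s.size() == 1);
}
```

The container calls cached_hash on every insert and lookup, but that call only reads a stored integer, which is the saving the proposal is after.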
The proposed identity_hash would enable the hash_append framework for this type of data structure. This also paves the way for (but does not depend on) using a tabulation hashing algorithm, where the key is incrementally updated when the wrapped value is mutated; caching the key is required in order to achieve this.
Applications include chess engines, where the wrapped type represents the full chess Position object. Storing the hash key and incrementally updating it as the underlying position changes is standard practice in every top board game program (chess, checkers, Go). Similar optimizations are used in protein design and other backtracking search applications, where small incremental changes and their inverses are applied to the data structure.
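As an illustration of the incremental update described above, here is a minimal tabulation (Zobrist-style) hashing sketch. Everything in it is an assumption made for the example: the toy board class, its 64 squares and 12 piece kinds, and the fixed seed for the random table. It is not taken from any real engine.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <random>

// Hypothetical toy board: 64 squares, each empty (kind 0) or holding one
// of 12 piece kinds (1..12).
class board
{
    static constexpr int squares = 64;
    static constexpr int kinds = 13;

    std::array<std::uint8_t, squares> m_piece{};
    std::uint64_t m_key = 0;  // cached hash key

    // One random 64-bit number per (square, kind) pair; the fixed seed
    // makes the table reproducible.
    static std::array<std::array<std::uint64_t, kinds>, squares> const& table()
    {
        static auto const t = [] {
            std::array<std::array<std::uint64_t, kinds>, squares> r{};
            std::mt19937_64 gen(12345);
            for (auto& row : r)
                for (int k = 1; k < kinds; ++k)  // kind 0 (empty) stays 0
                    row[k] = gen();
            return r;
        }();
        return t;
    }
public:
    // Replace the piece on a square; two xors update the cached key.
    void put(int square, std::uint8_t kind)
    {
        assert(square >= 0 && square < squares && kind < kinds);
        m_key ^= table()[square][m_piece[square]];  // remove old contribution
        m_piece[square] = kind;
        m_key ^= table()[square][kind];             // add new contribution
    }

    std::uint64_t key() const noexcept { return m_key; }

    // Recompute the key from scratch, for comparison with the cached one.
    std::uint64_t full_key() const
    {
        std::uint64_t k = 0;
        for (int s = 0; s < squares; ++s)
            k ^= table()[s][m_piece[s]];
        return k;
    }
};
```

Because xor is its own inverse, undoing a move restores the previous key exactly, which is what makes the scheme attractive for backtracking search.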
I can send a PR if you would welcome identity_hash (I'm obviously not proposing the wrapper class).
uhash should have a constructor that takes a seed. One possible (performance-oriented) implementation could then xor the seed with the result of h before returning it from operator(). Combined with initializing the seed to 0 in the default constructor, this preserves existing behavior without introducing a branch and an additional call to hash_append.
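A rough sketch of that suggestion follows. It is not the library's uhash: seeded_uhash is a hypothetical name, the fnv1a class is a minimal 64-bit FNV-1a written for this example, and hash_append is reduced to a single overload for std::string.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Minimal 64-bit FNV-1a hasher in the shape the proposal uses: bytes go in
// through operator(), the result comes out through an explicit conversion.
class fnv1a
{
    std::uint64_t m_state = 14695981039346656037ULL;
public:
    using result_type = std::uint64_t;

    void operator()(void const* key, std::size_t len) noexcept
    {
        auto p = static_cast<unsigned char const*>(key);
        for (std::size_t i = 0; i < len; ++i)
            m_state = (m_state ^ p[i]) * 1099511628211ULL;
    }

    explicit operator result_type() const noexcept { return m_state; }
};

// Simplified hash_append covering only std::string, enough for the sketch.
template <class Hasher>
void hash_append(Hasher& h, std::string const& s) noexcept
{
    h(s.data(), s.size());
}

// Hypothetical seeded variant of uhash: the seed is xored into the result,
// and a default seed of 0 leaves the unseeded result unchanged.
template <class Hasher = fnv1a>
class seeded_uhash
{
    typename Hasher::result_type m_seed;
public:
    using result_type = typename Hasher::result_type;

    seeded_uhash() noexcept : m_seed(0) {}
    explicit seeded_uhash(result_type seed) noexcept : m_seed(seed) {}

    template <class T>
    result_type operator()(T const& t) const noexcept
    {
        Hasher h;
        hash_append(h, t);
        return static_cast<result_type>(h) ^ m_seed;  // no branch, no extra append
    }
};

// Usage: the default-constructed functor matches the plain hasher.
inline void usage_example()
{
    std::string s = "hash_append";
    fnv1a h;
    hash_append(h, s);
    assert(seeded_uhash<>{}(s) == static_cast<std::uint64_t>(h));
}
```

One property of this design is worth keeping in mind: xoring after the fact leaves the set of colliding keys unchanged, since two keys with equal unseeded results still have equal seeded results. Feeding the seed into the hasher's state would change that, at the cost of the extra work the suggestion tries to avoid.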
It's of course possible to seed in a hash adapter, as explained, but this makes seeded hashing second-class, something that the user needs to work for, and it needs to be easy.
Seeding hash functions is all the rage nowadays among the security-savvy, and without explicit support, the standard library may well decide to do it on the container level, which makes it impossible for the user to influence it or supply a seed. It would be better for those standard library implementations to have the option to seed process-wide in the default constructor of uhash instead, in which case the user would be able to override it.
The current code seems to be taken from Boost, but it is likely 32-bit oriented. Shouldn't a different constant be used for a 64-bit size_t?
Line 34 in bd892bf
When hashing floats, you seem to handle plus and minus zero as the same value (as you should).
Among floats there are many different non-finite values, which generally fall into three classes: -infinity, +infinity and "the rest" (the NaNs). Do you not agree that all floats in "the rest" class should hash to the same value? In Boost they switch over the result of std::fpclassify when hashing floats.
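To make the question concrete, here is one way such a normalization could look. This is a sketch only, not the code in this repository or in Boost: canonical_bits is a hypothetical helper that returns the bit pattern one might feed to a hasher, and it assumes a 64-bit double.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

// Hypothetical helper: map a double to a canonical bit pattern so that
// both zeros share one pattern and all NaNs share another.
inline std::uint64_t canonical_bits(double d) noexcept
{
    switch (std::fpclassify(d))
    {
    case FP_ZERO:
        d = 0.0;  // -0.0 and +0.0 compare equal, so hash them alike
        break;
    case FP_NAN:
        d = std::numeric_limits<double>::quiet_NaN();  // one NaN for all
        break;
    default:      // normal, subnormal and infinite values keep their bits
        break;
    }
    std::uint64_t bits;
    static_assert(sizeof bits == sizeof d, "sketch assumes a 64-bit double");
    std::memcpy(&bits, &d, sizeof bits);
    return bits;
}

// Usage: the two zeros map to the same pattern.
inline void usage_example()
{
    assert(canonical_bits(0.0) == canonical_bits(-0.0));
}
```

Whether collapsing the NaNs is desirable is exactly the open question: NaNs never compare equal to each other, so equal hashes are not required for correctness, only possibly convenient.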
Hi!
Well, this is not a feature request, it's just that I'm not sure what's the preferred place for discussions about hash_append.
I was wondering if you had considered using a stream-like syntax for hash_append. We are already familiar with it.
Thanks!
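For reference, a stream-like spelling could be layered on top of hash_append rather than replacing it. The sketch below is purely illustrative: toy_hasher is a made-up byte-wise hasher, hash_append is reduced to integral types, and the operator<< overload is a hypothetical addition, not part of the proposal.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Toy byte-wise hasher (FNV-1a constants), standing in for any hasher.
struct toy_hasher
{
    std::uint64_t state = 14695981039346656037ULL;

    void operator()(void const* key, std::size_t len) noexcept
    {
        auto p = static_cast<unsigned char const*>(key);
        for (std::size_t i = 0; i < len; ++i)
            state = (state ^ p[i]) * 1099511628211ULL;
    }
};

// Simplified hash_append for integral types only.
template <class Hasher, class T>
typename std::enable_if<std::is_integral<T>::value>::type
hash_append(Hasher& h, T const& t) noexcept
{
    h(&t, sizeof t);
}

// Hypothetical stream-style spelling: forwards to hash_append and returns
// the hasher so that calls can be chained.
template <class T>
toy_hasher& operator<<(toy_hasher& h, T const& t) noexcept
{
    hash_append(h, t);
    return h;
}

// Usage: chaining with << gives the same state as two hash_append calls.
inline void usage_example()
{
    int x = 7;
    long y = 42;
    toy_hasher a, b;
    a << x << y;
    hash_append(b, x);
    hash_append(b, y);
    assert(a.state == b.state);
}
```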