Comments (5)
Hello, and thank you for the kind words!
A Rust port is something I have considered writing, but I first have to learn Rust, so it will not appear soon unless someone else does it.
We can work on a C API, though, yes, assuming you only need the function signatures to be in pure C, while each function itself then calls the "read" C++ function as written in PTHash.
Best,
-Giulio
from pthash.
Thank you! There are several people (including me, of course) who are interested in porting PTHash to Rust.
I can connect you with them, if you are interested, to work together.
Best,
-Giulio
The construction algorithm is actually pretty simple. I recently rewrote it from scratch for my (abandoned) project. Instead of a C API, it's better to just write the whole algorithm in Rust. I've made this for Apache Arrow C++ and found it is simple enough to fit into a few steps:
- Hash key columns into 2 vectors of size N: `hash1(k)`, `hash2(k)`;
- Compute a load factor for each bucket: `count[skew_bucketer(hash1_i)]++`;
- Group buckets by descending count, i.e. make a nested list of indices `[[b1,b2,b3], [b4], ... [bm]]` where `count[b1]=count[b2]` and so on (you can encode a nested list as 2 flat vectors);
- Reorder the `hash2` vector according to bucket bins: `[[hash2 elements for b1], [hash2 elements for b2], ..., [hash2 elements for bm]]`. This is another `List[List[UInt64]]` that could be encoded as 2 flat vectors;
- Now iterate over those bins and pick a pilot value for each of them. That's it.
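The steps above can be sketched in Rust roughly as follows. This is a minimal illustration, not PTHash's actual code. Several things here are assumptions made for the example: the names `mix64`, `Mphf`, `build` and `index` are hypothetical, the hash is a SplitMix64-style mixer rather than the one PTHash uses, the bucketer sends about 60% of the hash range to about 30% of the buckets, the bucket count uses `c = 5.0`, and the table has exactly `n` slots so the result is directly minimal. Buckets are sorted by descending size instead of being grouped into an explicit nested list, and the reordered `hash2` values are stored as two flat vectors (`offsets` and `grouped`).

```rust
/// SplitMix64-style mixer, used as a stand-in hash (an assumption of this sketch).
fn mix64(mut x: u64) -> u64 {
    x = x.wrapping_add(0x9E37_79B9_7F4A_7C15);
    x = (x ^ (x >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
    x = (x ^ (x >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
    x ^ (x >> 31)
}

/// Hypothetical container for the built function: one pilot per bucket.
pub struct Mphf {
    seed: u64,
    num_keys: u64,
    dense: u64,  // number of "dense" buckets (about 30% of all buckets)
    sparse: u64, // number of remaining buckets
    pilots: Vec<u64>,
}

impl Mphf {
    /// Skewed bucketer: about 60% of the hash range goes to the dense buckets.
    fn bucket(&self, h1: u64) -> usize {
        const THRESHOLD: u64 = u64::MAX / 5 * 3;
        let b = if h1 < THRESHOLD { h1 % self.dense } else { self.dense + h1 % self.sparse };
        b as usize
    }

    /// Position of `key` in `0..num_keys`.
    pub fn index(&self, key: u64) -> usize {
        let h1 = mix64(key ^ self.seed);
        let h2 = mix64(h1);
        let pilot = self.pilots[self.bucket(h1)];
        ((h2 ^ mix64(pilot)) % self.num_keys) as usize
    }
}

/// Keys must be distinct, otherwise the pilot search below never terminates.
pub fn build(keys: &[u64], seed: u64) -> Mphf {
    let n = keys.len();
    assert!(n > 0, "need at least one key");
    let c = 5.0; // assumed tuning constant for this sketch
    let m = ((c * n as f64 / (n as f64).log2().max(1.0)).ceil() as usize).max(2);
    let dense = ((0.3 * m as f64) as usize).max(1);
    let mut f = Mphf {
        seed,
        num_keys: n as u64,
        dense: dense as u64,
        sparse: (m - dense) as u64,
        pilots: vec![0; m],
    };

    // Step 1: hash every key into two vectors of size n.
    let h1: Vec<u64> = keys.iter().map(|&k| mix64(k ^ seed)).collect();
    let h2: Vec<u64> = h1.iter().map(|&h| mix64(h)).collect();

    // Step 2: count keys per bucket, then prefix-sum the counts into offsets.
    let mut offsets = vec![0usize; m + 1];
    for &h in &h1 {
        offsets[f.bucket(h) + 1] += 1;
    }
    for b in 0..m {
        offsets[b + 1] += offsets[b];
    }

    // Step 4: reorder hash2 by bucket; `offsets` + `grouped` encode the nested list.
    let mut cursor = offsets.clone();
    let mut grouped = vec![0u64; n];
    for i in 0..n {
        let b = f.bucket(h1[i]);
        grouped[cursor[b]] = h2[i];
        cursor[b] += 1;
    }

    // Step 3: visit buckets in order of descending size.
    let mut order: Vec<usize> = (0..m).collect();
    order.sort_by_key(|&b| std::cmp::Reverse(offsets[b + 1] - offsets[b]));

    // Step 5: for each bucket, try pilots until all its keys land in free, distinct slots.
    let mut taken = vec![false; n];
    let mut positions: Vec<usize> = Vec::new();
    for &b in &order {
        let members = &grouped[offsets[b]..offsets[b + 1]];
        if members.is_empty() {
            break; // every remaining bucket is empty too
        }
        let mut pilot = 0u64;
        loop {
            let hp = mix64(pilot);
            positions.clear();
            let ok = members.iter().all(|&h| {
                let p = ((h ^ hp) % n as u64) as usize;
                if taken[p] || positions.contains(&p) {
                    false
                } else {
                    positions.push(p);
                    true
                }
            });
            if ok {
                break;
            }
            pilot += 1;
        }
        for &p in &positions {
            taken[p] = true;
        }
        f.pilots[b] = pilot;
    }
    f
}
```

With distinct `u64` keys, calling `build(&keys, seed)` and then `f.index(k)` for each key should give every key its own slot in `0..keys.len()`. A real implementation would also compress the pilots and would likely use a table slightly larger than `n` to speed up the search.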
Makes sense, I am trying to add a C header file. I also want to write a Rust port.
@snizovtsev I'm glad you found PTHash useful and easy to implement! I would be grateful if you could tell me what application you used PTHash for. Thanks!
-Giulio