C API about pthash HOT 5 CLOSED

jermp commented on September 28, 2024
C API

from pthash.

Comments (5)

jermp commented on September 28, 2024

Hello, and thank you for the kind words!

A Rust port is something I have considered writing, but I first have to learn Rust, so it will not appear soon unless someone else does it.

We can work on a C API, though, assuming you only need the function signatures to be in pure C; the functions themselves would then call the C++ "read" function as written in PTHash.

Best,
-Giulio

jermp commented on September 28, 2024

Thank you! There are several people (including me, of course) who are interested in porting PTHash to Rust.
I can connect you with them, if you are interested, so that you can work together.

Best,
-Giulio

snizovtsev commented on September 28, 2024

The construction algorithm is actually pretty simple. I recently rewrote it from scratch for my (abandoned) project. Instead of a C API, it's better to just write the whole algorithm in Rust. I did this for Apache Arrow C++ and found it simple enough to fit into a few steps:

  1. Hash key columns into 2 vectors of size N: hash1(k), hash2(k);
  2. Compute the load (number of keys) of each bucket: count[skew_bucketer(hash1_i)]++;
  3. Group buckets by descending count, i.e. make a nested list of indices [[b1,b2,b3], [b4], ... [bm]] where count[b1]=count[b2] and so on (you can encode a nested list as 2 flat vectors);
  4. Reorder the hash2 vector according to the bucket bins: [[hash2 elements for b1], [hash2 elements for b2], ..., [hash2 elements for bm]]. This is another List[List[UInt64]] that could be encoded as 2 flat vectors;
  5. Now iterate over those bins and pick a pilot value for each of them. That's it.
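
For readers who want to see the shape of it, here is a rough Python sketch of the five steps above. It is not the PTHash or Arrow code, only an illustration under several assumptions: `hash_pair`, `mix64`, `skew_bucket`, `build` and `lookup` are made-up names, the 60%/30% skew split and the constant `c` are illustrative choices, the table has exactly N slots, and steps 2 to 4 use plain nested lists instead of the two-flat-vector encoding.

```python
import hashlib
import math

MASK64 = (1 << 64) - 1
SKEW_THRESHOLD = (6 * (1 << 64)) // 10  # assumed: ~60% of hash1 values are "dense"

def mix64(x):
    """splitmix64-style mixer, used here to hash a pilot (illustrative choice)."""
    x = (x + 0x9E3779B97F4A7C15) & MASK64
    x = ((x ^ (x >> 30)) * 0xBF58476D1CE4E5B9) & MASK64
    x = ((x ^ (x >> 27)) * 0x94D049BB133111EB) & MASK64
    return x ^ (x >> 31)

def hash_pair(key: bytes):
    """Step 1: derive (hash1, hash2) from one 128-bit digest."""
    d = hashlib.blake2b(key, digest_size=16).digest()
    return int.from_bytes(d[:8], "little"), int.from_bytes(d[8:], "little")

def skew_bucket(h1, num_buckets):
    """Hypothetical skew bucketer: dense keys share the first ~30% of buckets."""
    dense = max(1, (3 * num_buckets) // 10)
    sparse = num_buckets - dense
    if sparse == 0 or h1 < SKEW_THRESHOLD:
        return h1 % dense
    return dense + h1 % sparse

def build(keys, c=5.0, max_pilot=1_000_000):
    """Return one pilot per bucket; keys must be distinct byte strings."""
    n = len(keys)
    if n == 0:
        raise ValueError("need at least one key")
    num_buckets = max(1, math.ceil(c * n / math.log2(max(n, 2))))
    # Steps 2 and 4: collect the hash2 values of every bucket.
    buckets = [[] for _ in range(num_buckets)]
    for key in keys:
        h1, h2 = hash_pair(key)
        buckets[skew_bucket(h1, num_buckets)].append(h2)
    # Step 3: process buckets from largest to smallest.
    order = sorted(range(num_buckets), key=lambda b: -len(buckets[b]))
    taken = [False] * n
    pilots = [0] * num_buckets
    # Step 5: try pilots until every key of the bucket lands on a free slot.
    for b in order:
        if not buckets[b]:
            break  # only empty buckets remain
        for pilot in range(max_pilot):
            hp = mix64(pilot)
            slots = [(h2 ^ hp) % n for h2 in buckets[b]]
            if len(set(slots)) == len(slots) and not any(taken[s] for s in slots):
                for s in slots:
                    taken[s] = True
                pilots[b] = pilot
                break
        else:
            raise RuntimeError("no pilot found; try a larger c or max_pilot")
    return pilots

def lookup(key, pilots, n):
    """Evaluate the function: the bucket's pilot selects the slot for hash2."""
    h1, h2 = hash_pair(key)
    return (h2 ^ mix64(pilots[skew_bucket(h1, len(pilots))])) % n
```

Calling `pilots = build(keys)` and then `lookup(k, pilots, len(keys))` for each key should give every key its own slot in `range(len(keys))`. Handling the largest buckets first matters because they are the hardest to place and the table is still mostly empty at that point.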

ozgrakkurt commented on September 28, 2024

Makes sense, I'm trying to add a C header file. I also want to write a Rust port.

jermp commented on September 28, 2024

@snizovtsev I'm glad you found PTHash useful and easy to implement! I would be grateful if you could tell me which application you used PTHash for. Thanks!
-Giulio

