Comments (6)
Hi, thanks for reaching out. We could extend them whenever someone needs them. Is speed a concern? If not, there's already the simple C version in the c/ directory which includes cat.
Otherwise, if you're willing to extend the bindings, I'd be happy to review and integrate.
from highwayhash.
Hi Jan,
Sorry for the slow response.
Speed is indeed a concern, we originally used exactly the ones you pointed out.
Since the original question, we have made some hack-ey C bindings wrapped around the C++, but so far they are drastically under-performing. We're definitely hitting the AVX instructions (confirmed with valgrind/callgrind) and getting correct hash values back, so I'm not sure what we're doing wrong. We've got aligned allocations in the buffers, and I've seen similarly bad performance when I just use the native CPP class, so I'm clearly doing something wrong.
Any common "gotchas" that you might know of for us to look at would be much appreciated!
Thanks again.
Ian
Hi Ian, no worries!
> I've seen similarly bad performance when I just use the native CPP class, so I'm clearly doing something wrong.
I'm not sure I understand the problem - are you saying both the C and C++ versions are equally slow? What's the basis for comparison - the published benchmark results?
Some ideas: is the InstructionSets dispatcher involved? Especially with Cat, the app-specific code generating data and the calls to Cat really should be inlined together with only a single call to the dispatcher (instead of once per Append).
Somewhat related: compilers didn't seem able to keep the hash state in registers across calls to Append, so we're loading 128 bytes for every call. Might be even worse on an older/non-Clang compiler.
Also, depending on how the C++ code and its wrapper are compiled, we might get VZEROUPPER after every function call. Does it help to enable -mavx2 in all translation units?
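To illustrate the dispatcher point above, here is a minimal, self-contained sketch. It does not use the highwayhash API: `Target`, `ChooseTarget`, `ToyState`, `Append`, `HashAll` and the two `HashDispatch...` functions are all hypothetical stand-ins, and the hash is a toy FNV-1a loop rather than HighwayHash. It only shows the structural difference between consulting a dispatcher once per Append and consulting it once around the whole loop, so that the per-target Append can be inlined into the caller.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-ins; none of these names come from highwayhash.
enum class Target { kPortable, kAVX2 };

// Counts how often the simulated CPU-feature dispatcher is consulted.
static int g_dispatch_calls = 0;

// Simulated dispatcher: a real one would query CPU features here.
inline Target ChooseTarget() {
  ++g_dispatch_calls;
  return Target::kPortable;
}

// Toy streaming state (FNV-1a offset basis), standing in for a SIMD hash state.
struct ToyState {
  std::uint64_t h = 14695981039346656037ULL;
};

// Per-target append; a real library would use different instructions per target.
template <Target T>
inline void Append(ToyState& s, const char* bytes, std::size_t size) {
  assert(bytes != nullptr || size == 0);
  for (std::size_t i = 0; i < size; ++i) {
    s.h ^= static_cast<unsigned char>(bytes[i]);
    s.h *= 1099511628211ULL;
  }
}

// Pattern to avoid: the dispatcher runs for every chunk.
inline std::uint64_t HashDispatchPerAppend(const std::vector<std::string>& chunks) {
  ToyState s;
  for (const std::string& c : chunks) {
    switch (ChooseTarget()) {
      case Target::kAVX2: Append<Target::kAVX2>(s, c.data(), c.size()); break;
      case Target::kPortable: Append<Target::kPortable>(s, c.data(), c.size()); break;
    }
  }
  return s.h;
}

// The whole loop is instantiated per target, so Append can inline into it.
template <Target T>
inline std::uint64_t HashAll(const std::vector<std::string>& chunks) {
  ToyState s;
  for (const std::string& c : chunks) Append<T>(s, c.data(), c.size());
  return s.h;
}

// Preferred pattern: one dispatcher call around the entire loop.
inline std::uint64_t HashDispatchOnce(const std::vector<std::string>& chunks) {
  switch (ChooseTarget()) {
    case Target::kAVX2: return HashAll<Target::kAVX2>(chunks);
    case Target::kPortable: return HashAll<Target::kPortable>(chunks);
  }
  return 0;  // Not reached; keeps compilers quiet about missing returns.
}
```

Both functions return the same hash; the difference is that `HashDispatchOnce` pays for target selection once per stream rather than once per chunk, which matters most when chunks are small.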
Hi Jan,
Right now, the C++ version with the -mavx2 flag set is running significantly slower than the pure C implementation when we stream file data through them. Our measurement is just a crude timer for MBPS throughput, and we're getting something like <100 MBPS using the C++ libraries. We're pulling from the page cache, so file throughput is >5 GBPS, so that's not the issue.
That said, the included benchmark is a bit slower than the results published in the README, but not by so much that it can explain the gap. I can provide detailed numbers if that's helpful.
We'll take a look at the stuff you suggested.
Thanks again!
Ian
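For reference, a crude throughput timer of the kind described above might look like the following sketch. It assumes MBPS means megabytes per second with 1 MB = 10^6 bytes; `MegabytesPerSecond` and `MeasureThroughput` are hypothetical helper names, and `hash_fn` stands for whatever hashing call is being measured, not a highwayhash function.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>

// Converts a byte count and elapsed seconds to MB/s (assuming 1 MB = 1e6 bytes).
inline double MegabytesPerSecond(std::uint64_t bytes, double seconds) {
  return seconds > 0.0 ? static_cast<double>(bytes) / 1e6 / seconds : 0.0;
}

// Times hash_fn(data, size) over `repeats` passes and returns the best MB/s.
// The result of each call is folded into *sink so the compiler cannot
// discard the work being timed.
template <typename HashFn>
double MeasureThroughput(HashFn&& hash_fn, const std::vector<char>& buf,
                         int repeats, std::uint64_t* sink) {
  assert(repeats > 0 && sink != nullptr);
  using Clock = std::chrono::steady_clock;
  double best = 0.0;
  for (int i = 0; i < repeats; ++i) {
    const auto t0 = Clock::now();
    *sink ^= hash_fn(buf.data(), buf.size());
    const auto t1 = Clock::now();
    const double secs = std::chrono::duration<double>(t1 - t0).count();
    const double mbps = MegabytesPerSecond(buf.size(), secs);
    if (mbps > best) best = mbps;
  }
  return best;
}
```

Hashing an in-memory buffer this way takes file I/O out of the picture entirely, and taking the best of several passes reduces noise from cold caches or scheduling.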
Oh, that's a surprise. We aren't using Cat in time-critical apps yet, and I did have trouble with the state not being kept in vector registers, but I'm still surprised it's slower than C.
Would you be able to share disassembly of the relevant parts to see where we're running into trouble?
Hi Jan,
As usual, we get sidetracked before circling around, apologies. I'll see if we can do that. Not sure we can (heavy-weight bureaucracy on our end even in innocuous cases) but worth a shot.