lukechampine / fastxor
The fastest way to xor bytes in Go
License: MIT License
For some reason, fastxor seems to crash:
Exception 0xc000001d 0x0 0x0 0x13d301c
PC=0x13d301c
github.com/lukechampine/fastxor.xorBytesAVX(0xc0004a9600, 0x1aee, 0x1aee, 0xc0004a6000, 0x1aee, 0x1aee, 0xc0004a7b00, 0x1aee, 0x1aee, 0x1aee, ...)
github.com/lukechampine/[email protected]/xor_amd64.s:94 +0x5c
github.com/lukechampine/fastxor.Bytes(0xc0004a9600, 0x1aee, 0x1aee, 0xc0004a6000, 0x1aee, 0x1aee, 0xc0004a7b00, 0x1aee, 0x1aee, 0x0)
github.com/lukechampine/[email protected]/xor_amd64.go:36 +0x1ae
This is on Windows 10 Pro (10.0.19042) with an Intel(R) Core(TM) i7-3770 CPU @ 3.40 GHz.
Specs for this CPU are here: https://ark.intel.com/content/www/us/en/ark/products/65719/intel-core-i7-3770-processor-8m-cache-up-to-3-90-ghz.html
I'm completely out of my depth here (sadly not an expert on assembly), but if I read it correctly, the problematic code uses 256-bit AVX instructions. Exception 0xc000001d is the Windows status code for an illegal instruction, which would fit. This page (https://www.felixcloutier.com/x86/pxor) leads me to believe that the 256-bit form of this instruction requires AVX2, which (again, if I read it correctly) this CPU doesn't support.
Is it possible the check needs to be extended?
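To make the question concrete, here is a minimal sketch of what an extended check could look like. It is not fastxor's actual dispatch code: the `cpuFeatures` struct and `pickKernel` function are hypothetical names made up for illustration, and the kernel names are just labels. In real code the flags might come from a feature-detection package such as golang.org/x/sys/cpu (for example `cpu.X86.HasAVX2`). The idea is simply that the 256-bit path is only chosen when AVX2 is reported, not when plain AVX is.

```go
package main

import "fmt"

// cpuFeatures is a hypothetical stand-in for real CPU feature detection.
// It is not a type from fastxor.
type cpuFeatures struct {
	HasAVX  bool
	HasAVX2 bool
}

// pickKernel returns a label for the widest XOR kernel that is safe to run.
// The 256-bit integer XOR needs AVX2, so AVX alone falls back to SSE2,
// which every amd64 CPU supports.
func pickKernel(f cpuFeatures) string {
	if f.HasAVX2 {
		return "avx2"
	}
	return "sse2"
}

func main() {
	// An i7-3770 reports AVX but not AVX2.
	ivyBridge := cpuFeatures{HasAVX: true, HasAVX2: false}
	fmt.Println(pickKernel(ivyBridge)) // prints "sse2"
}
```

With a check shaped like this, a CPU that has AVX but not AVX2 would never reach the 256-bit code path.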
Thanks for making this.
I noticed that you're putting the comparison at the top of the assembly functions.
If you put the comparisons at the end, it would be a bit faster. You can assume that there is at least one element.
It's like using 'for ()' when you could instead use 'do {} while ()'.
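For readers who don't write assembly, the two loop shapes can be sketched in Go. This is only an illustration of the idea, not fastxor's code and not a claim about what the Go compiler emits; both function names are made up. The bottom-tested version relies on the assumption stated above that there is at least one element.

```go
package main

import "fmt"

// xorTopTested checks the bound before every iteration (the 'for' shape).
// It is safe for empty input.
func xorTopTested(dst, a, b []byte) {
	for i := 0; i < len(dst); i++ {
		dst[i] = a[i] ^ b[i]
	}
}

// xorBottomTested does the work first and checks the bound at the end
// (the 'do {} while ()' shape). It assumes len(dst) >= 1 and will panic
// on empty input.
func xorBottomTested(dst, a, b []byte) {
	i := 0
	for {
		dst[i] = a[i] ^ b[i]
		i++
		if i >= len(dst) {
			break
		}
	}
}

func main() {
	a := []byte{1, 2, 3}
	b := []byte{4, 5, 6}
	d1 := make([]byte, 3)
	d2 := make([]byte, 3)
	xorTopTested(d1, a, b)
	xorBottomTested(d2, a, b)
	fmt.Println(d1, d2)
}
```

Both functions produce the same result for non-empty input; the difference is only where the comparison sits relative to the loop body.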
A 16-byte width is common because it is the AES block size, but the state size of the newer ChaCha20, as well as the SHA3-512 output length, is 64 bytes. So, is there any plan for a 32- or 64-byte version of block()?
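To illustrate the request, here is a rough sketch of what a fixed 64-byte variant could look like in plain Go. The name `xorBlock64` and its signature are assumptions made up for this example; nothing like it is confirmed to exist in fastxor, and a real version would presumably be written in assembly.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// xorBlock64 is a hypothetical helper, not part of fastxor.
// It XORs two 64-byte blocks into dst, eight bytes at a time.
func xorBlock64(dst, a, b *[64]byte) {
	for i := 0; i < 64; i += 8 {
		x := binary.LittleEndian.Uint64(a[i:])
		y := binary.LittleEndian.Uint64(b[i:])
		binary.LittleEndian.PutUint64(dst[i:], x^y)
	}
}

func main() {
	var a, b, dst [64]byte
	for i := range a {
		a[i] = 0xFF
		b[i] = byte(i)
	}
	xorBlock64(&dst, &a, &b)
	fmt.Println(dst[:4])
}
```

Taking pointers to fixed-size arrays lets the compiler know the length up front, which is the same reason a dedicated 16-byte block function can skip length handling.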
I tested fastxor for use with some Keccak code I've been writing, and it's no faster on aligned 64-byte blocks than the plain Go code I've been using:
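import "unsafe"

const wordSize = int(unsafe.Sizeof(int(0)))

// fastXORWords XORs a and b into dst one machine word at a time
// (4 or 8 bytes, depending on architecture).
// The arguments must be of equal length; trailing bytes beyond a
// multiple of wordSize are not processed.
func fastXORWords(dst, a, b []byte) {
	// Reinterpret the byte slices as word slices. The slice headers keep
	// their byte lengths, so only the first n words are valid to index.
	dw := *(*[]uintptr)(unsafe.Pointer(&dst))
	aw := *(*[]uintptr)(unsafe.Pointer(&a))
	bw := *(*[]uintptr)(unsafe.Pointer(&b))
	n := len(b) / wordSize
	for i := 0; i < n; i++ {
		dw[i] = aw[i] ^ bw[i]
	}
}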
The profiling results show that my code gets inlined, which probably makes up for the slower 64-bit operations vs the 128-bit SSE instructions used by fastxor.
FWIW, when I tried partially unrolling the loop, I did not see a material change in performance. With 2 xor ops per loop iteration, it was still inlined, and with 4 ops per loop it was no longer inlined.
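For reference, a 2x partially unrolled loop of the kind described above might look roughly like the sketch below. This is not the code that was benchmarked: the function name is made up, and it takes `[]uint64` directly to avoid the unsafe casts. Whether a given variant is inlined can be checked with `go build -gcflags=-m`.

```go
package main

import "fmt"

// xorWordsUnrolled2 is an illustrative sketch, not the benchmarked code.
// It XORs a and b into dst with two XOR operations per loop iteration.
// All three slices are assumed to have the same length.
func xorWordsUnrolled2(dst, a, b []uint64) {
	n := len(dst)
	i := 0
	for ; i+2 <= n; i += 2 {
		dst[i] = a[i] ^ b[i]
		dst[i+1] = a[i+1] ^ b[i+1]
	}
	// Handle the final word when the length is odd.
	if i < n {
		dst[i] = a[i] ^ b[i]
	}
}

func main() {
	a := []uint64{1, 2, 3}
	b := []uint64{4, 5, 6}
	dst := make([]uint64, 3)
	xorWordsUnrolled2(dst, a, b)
	fmt.Println(dst)
}
```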
P.S. Here are the profile results from one of the fastxor test runs:
Showing top 10 nodes out of 13
      flat  flat%   sum%        cum   cum%
    2530ms 77.13% 77.13%     2530ms 77.13%  github.com/nerdralph/crypto/sha3.keccakF1600
     290ms  8.84% 85.98%      290ms  8.84%  github.com/lukechampine/fastxor.xorBytesSSE
     250ms  7.62% 93.60%     3180ms 96.95%  main.makeCacheFast
     100ms  3.05% 96.65%      100ms  3.05%  runtime.cgocall