fastxor's People

Contributors

lukechampine, rubenv

fastxor's Issues

Exception 0xc000001d

For some reason, fastxor seems to crash:

Exception 0xc000001d 0x0 0x0 0x13d301c
PC=0x13d301c

github.com/lukechampine/fastxor.xorBytesAVX(0xc0004a9600, 0x1aee, 0x1aee, 0xc0004a6000, 0x1aee, 0x1aee, 0xc0004a7b00, 0x1aee, 0x1aee, 0x1aee, ...)
	github.com/lukechampine/[email protected]/xor_amd64.s:94 +0x5c
github.com/lukechampine/fastxor.Bytes(0xc0004a9600, 0x1aee, 0x1aee, 0xc0004a6000, 0x1aee, 0x1aee, 0xc0004a7b00, 0x1aee, 0x1aee, 0x0)
	github.com/lukechampine/[email protected]/xor_amd64.go:36 +0x1ae

This is on Windows 10 Pro (10.0.19042) with an Intel(R) Core(TM) i7-3770 CPU @ 3.40 GHz.

Specs for this CPU are here: https://ark.intel.com/content/www/us/en/ark/products/65719/intel-core-i7-3770-processor-8m-cache-up-to-3-90-ghz.html

I'm completely out of my depth here (sadly not an expert on assembly), but if I read it correctly, the problematic code uses 256-bit AVX instructions. This page (https://www.felixcloutier.com/x86/pxor) leads me to believe that the 256-bit form requires AVX2, which this CPU doesn't appear to support.

Is it possible the check needs to be extended?
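
To make the question concrete, here is a minimal sketch of what an extended check could look like. It is not fastxor's actual dispatch code: the `cpuFeatures` struct, the `pickKernel` function, and the kernel names are hypothetical, and the sketch simply assumes that the 256-bit path needs AVX2 rather than plain AVX.

```go
package main

import "fmt"

// cpuFeatures is a hypothetical stand-in for whatever feature
// detection the library really uses.
type cpuFeatures struct {
	HasSSE2 bool
	HasAVX  bool
	HasAVX2 bool
}

// pickKernel chooses the widest kernel the CPU can run.
// The 256-bit path is gated on AVX2, not AVX, so a CPU that
// reports AVX without AVX2 falls back to the 128-bit SSE2 path.
func pickKernel(f cpuFeatures) string {
	switch {
	case f.HasAVX2:
		return "avx2"
	case f.HasSSE2:
		return "sse2"
	default:
		return "generic"
	}
}

func main() {
	// Flags as described in the report: AVX present, AVX2 absent.
	reported := cpuFeatures{HasSSE2: true, HasAVX: true, HasAVX2: false}
	fmt.Println(pickKernel(reported))
}
```

In a real program the flags could be filled in from a detection package such as golang.org/x/sys/cpu, which exposes `cpu.X86.HasAVX2`.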

way to make it a bit faster

Thanks for making this.

I noticed that you're putting the comparison at the top of the loops in the assembly functions.
If you put the comparison at the end instead, it would be a bit faster. You can assume that there is at least one element.

It's like using 'for ()' when you could instead use 'do {} while ()'.
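
The difference between the two loop shapes can be sketched in Go, although the suggestion itself is about the hand-written assembly. The function names below are made up for illustration, and a compiler may already rotate loops like this on its own, so treat it as a picture of the idea rather than a measured optimization.

```go
package main

import "fmt"

// xorTopTested checks the loop condition before every iteration,
// including the first one (the "for" shape).
func xorTopTested(dst, a, b []byte) {
	for i := 0; i < len(dst); i++ {
		dst[i] = a[i] ^ b[i]
	}
}

// xorBottomTested runs the body first and checks afterwards
// (the "do ... while" shape). It assumes len(dst) >= 1 and
// panics with an index-out-of-range error on empty input.
func xorBottomTested(dst, a, b []byte) {
	i := 0
	for {
		dst[i] = a[i] ^ b[i]
		i++
		if i >= len(dst) {
			break
		}
	}
}

func main() {
	a := []byte{1, 2, 3}
	b := []byte{4, 5, 6}
	top := make([]byte, 3)
	bottom := make([]byte, 3)
	xorTopTested(top, a, b)
	xorBottomTested(bottom, a, b)
	fmt.Println(top, bottom) // both slices hold the same result
}
```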

Implement 32- and 64-byte block

A 16-byte width is common because it is the AES block size, but the state size of the newer ChaCha20 and the output length of SHA3-512 are both 64 bytes. So, is there any plan for 32- and 64-byte versions of block()?
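
For reference, a fixed 64-byte variant could look roughly like the pure-Go sketch below. The name `xorBlock64` and its array-pointer signature are assumptions made for illustration; they are not part of fastxor, and an assembly version would presumably use wider registers instead of 64-bit words.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// xorBlock64 is a hypothetical 64-byte counterpart to a 16-byte
// block XOR. It writes a XOR b into dst, eight bytes at a time.
// Fixed-size array pointers let the compiler drop bounds checks.
func xorBlock64(dst, a, b *[64]byte) {
	for i := 0; i < 64; i += 8 {
		x := binary.LittleEndian.Uint64(a[i:]) ^ binary.LittleEndian.Uint64(b[i:])
		binary.LittleEndian.PutUint64(dst[i:], x)
	}
}

func main() {
	var dst, a, b [64]byte
	for i := range a {
		a[i] = 0xFF
		b[i] = byte(i)
	}
	xorBlock64(&dst, &a, &b)
	fmt.Printf("%x\n", dst[:8])
}
```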

not faster on Core i5 with 64-byte blocks and golang 1.16

I tested fastxor for use with some Keccak code I've been writing, and it's no faster on aligned 64-byte blocks than the plain Go code I've been using:

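import "unsafe"

// wordSize is the size of a machine word in bytes (4 or 8).
const wordSize = int(unsafe.Sizeof(int(0)))

// fastXORWords XORs a and b into dst one machine word at a time.
// It handles multiples of 4 or 8 bytes (depending on architecture);
// any trailing len(b) % wordSize bytes are left untouched.
// The arguments must be of equal length and word-aligned.
func fastXORWords(dst, a, b []byte) {
    // Reinterpret the slice headers as []uintptr. Their len and cap
    // still count bytes, so only the first n elements may be accessed.
    dw := *(*[]uintptr)(unsafe.Pointer(&dst))
    aw := *(*[]uintptr)(unsafe.Pointer(&a))
    bw := *(*[]uintptr)(unsafe.Pointer(&b))
    n := len(b) / wordSize
    for i := 0; i < n; i++ {
        dw[i] = aw[i] ^ bw[i]
    }
}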

The profiling results show that my code gets inlined, which probably makes up for the slower 64-bit operations vs the 128-bit SSE instructions used by fastxor.
FWIW, when I tried partially unrolling the loop, I did not see a material change in performance. With 2 xor ops per loop iteration, it was still inlined, and with 4 ops per loop it was no longer inlined.
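
A comparison like this can be reproduced with a small standalone benchmark. The sketch below is not the benchmark used above: `xorBytewise` and `xorWords` are illustrative names, and `xorWords` is an unsafe-free stand-in for the word-at-a-time loop. Each kernel is called directly inside its benchmark closure so the compiler is free to inline it; building with `-gcflags=-m` prints the inlining decisions.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"testing"
)

// xorBytewise is the one-byte-at-a-time baseline.
func xorBytewise(dst, a, b []byte) {
	for i := range dst {
		dst[i] = a[i] ^ b[i]
	}
}

// xorWords XORs eight bytes per iteration without unsafe.
// Any tail shorter than eight bytes is left untouched.
func xorWords(dst, a, b []byte) {
	for i := 0; i+8 <= len(dst); i += 8 {
		x := binary.LittleEndian.Uint64(a[i:]) ^ binary.LittleEndian.Uint64(b[i:])
		binary.LittleEndian.PutUint64(dst[i:], x)
	}
}

func main() {
	dst, a, b := make([]byte, 64), make([]byte, 64), make([]byte, 64)

	// testing.Benchmark runs the closure with increasing b.N
	// until the timing is stable, outside of "go test".
	byteRes := testing.Benchmark(func(tb *testing.B) {
		for i := 0; i < tb.N; i++ {
			xorBytewise(dst, a, b)
		}
	})
	wordRes := testing.Benchmark(func(tb *testing.B) {
		for i := 0; i < tb.N; i++ {
			xorWords(dst, a, b)
		}
	})

	fmt.Println("bytewise:", byteRes)
	fmt.Println("words:   ", wordRes)
}
```

The numbers depend heavily on the CPU, the Go version, and the block size, so they say little beyond the machine they were measured on.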

P.S. Here are the profile results from one of the fastxor test runs:

Showing top 10 nodes out of 13
      flat  flat%   sum%        cum   cum%
    2530ms 77.13% 77.13%     2530ms 77.13%  github.com/nerdralph/crypto/sha3.keccakF1600
     290ms  8.84% 85.98%      290ms  8.84%  github.com/lukechampine/fastxor.xorBytesSSE
     250ms  7.62% 93.60%     3180ms 96.95%  main.makeCacheFast
     100ms  3.05% 96.65%      100ms  3.05%  runtime.cgocall
