simd_test

SIMD instruction benchmark

Created to learn how to use SIMD instructions in C with compiler intrinsics and x64 assembly language (as a Windows program built with Visual Studio).

This program executes some very basic linear algebra (vector) operations. It also contains logic to determine whether the CPU supports a given SIMD instruction set (the program does not use it, though). If you are interested, just look at the source files. Note that there are no x86 (Win32) versions of the .asm files.
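To give a feel for the kind of vector operation being benchmarked, here is a minimal sketch of an element-wise add in plain C and with SSE intrinsics. The names add_c and add_sse are taken from the benchmark output, but the signatures, the unaligned loads, and the scalar tail loop are assumptions made for illustration, not the code in the source files.

```c
#include <stddef.h>
#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_add_ps, ... */

/* Plain C reference: dst[i] = a[i] + b[i]. Signature is an assumption. */
void add_c(float *dst, const float *a, const float *b, size_t n)
{
    size_t i;
    for (i = 0; i < n; ++i)
        dst[i] = a[i] + b[i];
}

/* SSE version: processes four floats per iteration. */
void add_sse(float *dst, const float *a, const float *b, size_t n)
{
    size_t i = 0;

    /* Unaligned loads/stores so the sketch works for any pointer. */
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(dst + i, _mm_add_ps(va, vb));
    }

    /* Scalar tail for lengths that are not a multiple of four. */
    for (; i < n; ++i)
        dst[i] = a[i] + b[i];
}
```

Both functions should produce the same output for the same input, which makes the plain C version a convenient correctness check for the SIMD one.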

Example Results

An example output from the program built with Visual Studio 2015 (on a laptop with Core i7 6600U, Skylake architecture):

Number of times each function ran in 0.5 secs:
add_c               :  471470 times
add_sse             :  496611 times
add_avx             :  830946 times
mul_c               :  486972 times
mul_sse             :  497136 times
mul_avx             :  813090 times
div_c               :  498424 times
div_sse             :  492239 times
div_avx             :  595721 times
mul_add_c           :  384463 times
mul_add_sse         :  281115 times
mul_add_avx         :  506863 times
dot_c               :   96564 times
dot_sse             :  377050 times
dot_sse41_dp        :  254332 times
dot_sse_asm         :  387838 times
dot_avx             :  306822 times
dot_avx_dp          :  437111 times
dot_avx_asm         :  384199 times
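The "number of times each function ran in 0.5 secs" figure can be produced by a loop of roughly the following shape. This is only a sketch: bench_fn, count_runs, and sample_workload are hypothetical names, and the use of clock() is an assumption. The actual program may use a different timer.

```c
#include <time.h>

/* Hypothetical signature for a benchmarked routine. */
typedef void (*bench_fn)(void);

/* Call fn repeatedly until about `seconds` have elapsed; return the count. */
unsigned long count_runs(bench_fn fn, double seconds)
{
    const clock_t budget = (clock_t)(seconds * CLOCKS_PER_SEC);
    const clock_t start = clock();
    unsigned long runs = 0;

    while (clock() - start < budget) {
        fn();
        ++runs;
    }
    return runs;
}

/* Hypothetical stand-in workload so the sketch can be tried on its own. */
static volatile float sink;

void sample_workload(void)
{
    float sum = 0.0f;
    int i;
    for (i = 0; i < 1024; ++i)
        sum += (float)i * 0.5f;
    sink = sum;  /* volatile store keeps the loop from being optimized away */
}
```

Calling count_runs(sample_workload, 0.5) gives a count comparable in spirit to the numbers above. Note that the timer is read on every iteration, so the timer call itself is part of what is measured when the workload is very short.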

An example output from the program built with Visual Studio 2008:

Number of times each function ran in 0.5 secs:
add_c               :  222883 times
add_sse             :  642514 times
mul_c               :  233686 times
mul_sse             :  667243 times
div_c               :  127245 times
div_sse             :  508287 times
mul_add_c           :  157523 times
mul_add_sse         :  345454 times
dot_c               :   96752 times
dot_sse             :  380795 times
dot_sse41_dp        :  251552 times
dot_sse_asm         :  379998 times

Notes

Some notes for readers of this readme:

  • The results above are not a general comparison of C, SSE, and AVX; they only reflect my implementation.
  • The benchmark results are very unstable, and the speed ranking changed almost every time I ran the program, so test it yourself.

Some things I learned by writing this:

  • Since every x64 CPU supports SSE, SSE checking logic is not needed in a 64-bit program.
  • As one would expect, VS2015 emits much better optimized code than VS2008.
  • In terms of implementation cost, compiler intrinsics are better than hand-written assembly: significantly easier to write, safer because of type checking, no need to reimplement for each architecture, etc.
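As a rough illustration of the first note, a feature check based on CPUID leaf 1 could look like the sketch below. The struct and function names are hypothetical, and this is not necessarily how the checking logic in this program is written. It only reports what the CPU advertises. A complete AVX check must also confirm that the operating system saves the YMM registers (OSXSAVE plus XGETBV), which is left out here.

```c
#if defined(_MSC_VER)
#include <intrin.h>   /* __cpuid */
#else
#include <cpuid.h>    /* __get_cpuid (GCC/Clang) */
#endif

/* Hypothetical result type: 1 if the CPU advertises the feature, else 0. */
typedef struct {
    int sse;
    int sse2;
    int sse41;
    int avx;   /* CPU flag only; OS support is NOT verified here */
} simd_features;

/* Read ECX and EDX of CPUID leaf 1 (feature information). */
static void cpuid_leaf1(unsigned int *ecx, unsigned int *edx)
{
#if defined(_MSC_VER)
    int regs[4];  /* EAX, EBX, ECX, EDX */
    __cpuid(regs, 1);
    *ecx = (unsigned int)regs[2];
    *edx = (unsigned int)regs[3];
#else
    unsigned int eax = 0, ebx = 0;
    *ecx = 0;
    *edx = 0;
    __get_cpuid(1, &eax, &ebx, ecx, edx);
#endif
}

simd_features query_simd_features(void)
{
    unsigned int ecx, edx;
    simd_features f;

    cpuid_leaf1(&ecx, &edx);
    f.sse   = (edx >> 25) & 1;  /* EDX bit 25: SSE    */
    f.sse2  = (edx >> 26) & 1;  /* EDX bit 26: SSE2   */
    f.sse41 = (ecx >> 19) & 1;  /* ECX bit 19: SSE4.1 */
    f.avx   = (ecx >> 28) & 1;  /* ECX bit 28: AVX    */
    return f;
}
```

On any x64 CPU the sse and sse2 fields come back as 1, which is why the check is redundant in a 64-bit build. The newer sets (SSE4.1, AVX) still need to be tested before use.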

One important thing to note: even though code written with compiler intrinsics is generally as fast as fine-tuned hand-written assembly, it CAN be slower. In fact, I couldn't write efficient code with intrinsics on my first try. I checked how my code was translated in the Disassembly window in Visual Studio, found which part was inefficient, and changed my code to be more compiler friendly so that part became efficient. The problem in my case was that the intrinsic function I chose was not optimal for the situation. Since a compiler intrinsic directs the compiler to use a very specific instruction, choosing an unsuitable intrinsic forces the compiler to emit that unsuitable instruction and to take a detour around it. To avoid such pitfalls, and to troubleshoot them when they happen, I felt that it's better to know basic assembly language so you can analyze what's going on behind the scenes.
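The following sketch shows how the choice of intrinsics shapes the generated code. It is a hypothetical example, not the specific problem I ran into: both functions compute the same dot product with SSE, but the first one performs a horizontal reduction (shuffles plus adds) inside the loop, while the second keeps four partial sums in a register and reduces only once at the end. All names here are made up for illustration.

```c
#include <stddef.h>
#include <xmmintrin.h>

/* Sum the four lanes of v using SSE1 shuffles. */
static float hsum_ps(__m128 v)
{
    __m128 shuf = _mm_shuffle_ps(v, v, _MM_SHUFFLE(2, 3, 0, 1)); /* swap neighbours */
    __m128 sums = _mm_add_ps(v, shuf);      /* lane0 = v0+v1, lane2 = v2+v3 */
    shuf = _mm_movehl_ps(shuf, sums);       /* bring lane2 down to lane0    */
    sums = _mm_add_ss(sums, shuf);          /* lane0 = v0+v1+v2+v3          */
    return _mm_cvtss_f32(sums);
}

/* Less suitable: forces shuffle/add instructions on every iteration. */
float dot_reduce_each_step(const float *a, const float *b, size_t n)
{
    float sum = 0.0f;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 prod = _mm_mul_ps(_mm_loadu_ps(a + i), _mm_loadu_ps(b + i));
        sum += hsum_ps(prod);
    }
    for (; i < n; ++i)
        sum += a[i] * b[i];
    return sum;
}

/* More suitable: only a multiply and an add per iteration, one reduction. */
float dot_reduce_once(const float *a, const float *b, size_t n)
{
    __m128 acc = _mm_setzero_ps();
    float sum;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 prod = _mm_mul_ps(_mm_loadu_ps(a + i), _mm_loadu_ps(b + i));
        acc = _mm_add_ps(acc, prod);
    }
    sum = hsum_ps(acc);
    for (; i < n; ++i)
        sum += a[i] * b[i];
    return sum;
}
```

Comparing the disassembly of the two loops makes the difference visible: the compiler emits the shuffles exactly where the source asks for them, so it cannot move the reduction out of the loop on its own. Whether the second form is actually faster on a given CPU is something to measure rather than assume.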

Once I understood the basics of x64 assembly, there seemed to be no reason to write assembly code by hand any more.
