Comments (5)

eschnett commented on August 15, 2024

I believe you are misunderstanding the "length" argument of a pack. You should probably set it to 1 (not 4) to create a pack that has the natural size for your architecture. Choosing a larger value allows setting up SIMD vectors with a length that is a multiple of the natural hardware vector length. In other words, your nsimd examples are working on 16 elements, not just 4, and the code is thus much longer than expected. You are also seeing a lot of code related to loading and storing function arguments from and to the stack; these instructions will go away if the function is inlined, or is used inside a larger function.

I usually just use pack<double>, without specifying the length. The SIMD length is chosen automatically by nsimd.

As a test, you can evaluate sizeof(pack<float, 4>).
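To make the arithmetic behind that test concrete, here is a minimal sketch of how the length argument multiplies the native register size. It is a toy model, not nsimd code: `toy_pack` and `kNativeBytes` are hypothetical names, and the 128-bit register width is an assumption (as with SSE or NEON).

```cpp
#include <cassert>
#include <cstddef>

// Assumption: the native SIMD register is 128 bits (16 bytes) wide.
constexpr std::size_t kNativeBytes = 16;

// Hypothetical stand-in for a pack: N native registers holding elements of type T.
// This only models the size arithmetic; it is not how nsimd is implemented.
template <typename T, std::size_t N = 1>
struct toy_pack {
    static constexpr std::size_t lanes = N * (kNativeBytes / sizeof(T));
    T data[lanes];
};

// With a length of 1, one register holds 4 floats.
static_assert(toy_pack<float>::lanes == 4, "1 register x 4 floats");
// With a length of 4, the pack spans 4 registers, i.e. 16 floats and 64 bytes.
static_assert(toy_pack<float, 4>::lanes == 16, "4 registers x 4 floats");
static_assert(sizeof(toy_pack<float, 4>) == 64, "4 registers x 16 bytes");
```

Under these assumptions a length of 4 yields 16 float elements, which is consistent with the "16 elements, not just 4" observation above.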

from nsimd.

raphaelthegreat commented on August 15, 2024

Thanks for pointing it out. The code looks much cleaner now. But if the size of the vector depends on the target architecture, is there a cross-platform way of defining the size? In addition, I have been facing numerous build errors on compilers other than GCC when using the nsimd::cvt function. For example:

[image: GCC]

[image: Clang]

[image: MSVC]

eschnett commented on August 15, 2024

I am not aware of a way to define the SIMD vector size. The suggested approach is to let nsimd choose the optimal vector length, and then to query the actual vector length at compile time. This allows efficient loop vectorization when traversing arrays in loops.
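The loop pattern this describes can be sketched in plain C++ as follows. This is only an illustration under stated assumptions: `kLanes` is a hypothetical constant fixed at 4 here, whereas with a SIMD library it would be obtained from the library at compile time, and the inner loop stands in for a single vector operation. `add_in_place` is likewise a made-up helper name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumption: 4 lanes per vector. In real code this value would be queried
// from the SIMD library at compile time rather than hard-coded.
constexpr std::size_t kLanes = 4;

// Adds b into a element by element: full blocks of kLanes elements first,
// then a scalar loop for the leftover tail.
inline void add_in_place(std::vector<float>& a, const std::vector<float>& b) {
    const std::size_t n = a.size() < b.size() ? a.size() : b.size();
    const std::size_t blocked = n - n % kLanes;  // largest multiple of kLanes <= n
    std::size_t i = 0;
    for (; i < blocked; i += kLanes) {
        // Stand-in for one vector load, add, and store over kLanes elements.
        for (std::size_t lane = 0; lane < kLanes; ++lane) {
            a[i + lane] += b[i + lane];
        }
    }
    // Scalar tail for the remaining n % kLanes elements.
    for (; i < n; ++i) {
        a[i] += b[i];
    }
}
```

Because the block size is a compile-time constant, the same source adapts to whatever vector length the target provides, and the tail loop handles array sizes that are not a multiple of it.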

I am not encountering any such errors. I use nsimd regularly, on various systems. I usually build with GCC 11.2 (as you do), and I specify -std=c++17 as well.

Most SIMD hardware works on a fixed number of bits. Thus converting from 4xi32 to 4xi16 is not a natural operation. The natural operation would generate an 8xi16 vector, presumably from two 4xi32 inputs. Maybe you need to use downcvt instead of cvt?
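As a rough illustration of why the bit count stays fixed, the following plain C++ sketch narrows two 4xi32 inputs into one 8xi16 result. It is not nsimd's implementation of downcvt: `vec4_i32`, `vec8_i16`, `saturate_i16`, and `narrow_pair` are hypothetical names, and clamping out-of-range values (saturation) is an assumption made for this example.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>

// Models of 128-bit vectors: 4 x 32 bits and 8 x 16 bits.
using vec4_i32 = std::array<std::int32_t, 4>;
using vec8_i16 = std::array<std::int16_t, 8>;
static_assert(sizeof(vec4_i32) == sizeof(vec8_i16), "both occupy 128 bits");

// Clamps a 32-bit value into the int16 range (an assumed policy for this sketch).
inline std::int16_t saturate_i16(std::int32_t v) {
    constexpr std::int32_t lo = std::numeric_limits<std::int16_t>::min();
    constexpr std::int32_t hi = std::numeric_limits<std::int16_t>::max();
    return static_cast<std::int16_t>(v < lo ? lo : (v > hi ? hi : v));
}

// Two 4xi32 inputs fill exactly one 8xi16 output: low half first, then high half.
inline vec8_i16 narrow_pair(const vec4_i32& low, const vec4_i32& high) {
    vec8_i16 out{};
    for (std::size_t i = 0; i < 4; ++i) {
        out[i] = saturate_i16(low[i]);
        out[i + 4] = saturate_i16(high[i]);
    }
    return out;
}
```

A single 4xi32 input would fill only half of the 16-bit vector, which is why the narrowing operation naturally takes two inputs.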

raphaelthegreat commented on August 15, 2024

What I want to do is take each element of the int32 vector and cast it down to int16 without reducing the number of elements in the vector. With intrinsics this can be done, to my knowledge, with _mm_cvtps_pi16, so I assumed that nsimd::cvt would work based on the similar name.

raphaelthegreat commented on August 15, 2024

Closing this now as resolved.
