I recently found the need for a cross platform simd library and nsimd seems to be the

NSIMD generates too much assembly

raphaelthegreat commented on August 15, 2024

Comments (5)

eschnett commented on August 15, 2024

I believe you are misunderstanding the "length" argument of a pack. You should probably set it to 1 (not 4) to create a pack that has the natural size for your architecture. Choosing a larger value allows setting up SIMD vectors with a length that is a multiple of the natural hardware vector length. In other words, your nsimd examples are working on 16 elements, not just 4, and the code is thus much longer than expected. You are also seeing a lot of code related to loading and storing function arguments from and to the stack; these instructions will go away if the function is inlined, or is used inside a larger function.

I usually just use pack<double>, without specifying the length. The SIMD length is chosen automatically by nsimd.

As a test, you can evaluate sizeof(pack<float, 4>).
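To make the size arithmetic concrete, here is a minimal sketch of the idea in plain C++. It does not use nsimd: `NativeVec` and `ModelPack` are hypothetical stand-ins, and the 128-bit register width is an assumption chosen to match the 16-element figure mentioned above. It only models how a length argument of N multiplies the number of elements a pack covers.

```cpp
#include <cstddef>

// Hypothetical stand-in for one native SIMD register (NOT an nsimd type).
// Assumption: the hardware register is 128 bits (16 bytes) wide.
template <typename T>
struct NativeVec {
    static constexpr std::size_t kBytes = 16;
    static constexpr std::size_t kLanes = kBytes / sizeof(T);
    alignas(16) T lanes[kLanes];
};

// Hypothetical model of a pack whose length argument N means
// "N native registers", not "N elements".
template <typename T, std::size_t N = 1>
struct ModelPack {
    NativeVec<T> regs[N];
    static constexpr std::size_t kElements = N * NativeVec<T>::kLanes;
};

// With N = 1 the pack is exactly one register: 4 floats in 16 bytes.
static_assert(ModelPack<float, 1>::kElements == 4, "one register of floats");
static_assert(sizeof(ModelPack<float, 1>) == 16, "one 128-bit register");

// With N = 4 the pack spans four registers: 16 floats in 64 bytes.
static_assert(ModelPack<float, 4>::kElements == 16, "four registers of floats");
static_assert(sizeof(ModelPack<float, 4>) == 64, "four 128-bit registers");
```

Under this model, every operation on a length-4 pack expands to four register-level operations, which is consistent with the longer assembly seen in the original examples.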

raphaelthegreat commented on August 15, 2024

Thanks for pointing it out. The code looks much cleaner now. But if the size of the vector depends on the target architecture, is there a cross-platform way of defining the size? In addition, I have been getting numerous build errors on compilers other than GCC when using the nsimd::cvt function. For example:

GCC

Clang

MSVC

eschnett commented on August 15, 2024

I am not aware of a way to define the SIMD vector size. The suggested approach is to let nsimd choose the optimal vector length, and then to query the actual vector length at compile time. This allows efficient loop vectorization when traversing arrays in loops.
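The loop pattern described here can be sketched without nsimd as follows. The constant `kLanes` is a hypothetical placeholder for the lane count that the library would report at compile time, and `scale_array` is an illustrative name, not an nsimd function. The point is only the structure: a main loop over whole vectors plus a scalar tail for the leftover elements.

```cpp
#include <cstddef>

// Hypothetical compile-time lane count. With a SIMD library this value
// would be queried from the library instead of being hard-coded.
constexpr std::size_t kLanes = 4;

// Multiply every element of `in` by `factor` and write the result to `out`.
void scale_array(const float* in, float* out, std::size_t n, float factor) {
    std::size_t i = 0;

    // Main loop: process whole blocks of kLanes elements. The fixed trip
    // count of the inner loop corresponds to one vector operation.
    for (; i + kLanes <= n; i += kLanes) {
        for (std::size_t lane = 0; lane < kLanes; ++lane) {
            out[i + lane] = in[i + lane] * factor;
        }
    }

    // Scalar tail: handle the n % kLanes elements that do not fill a vector.
    for (; i < n; ++i) {
        out[i] = in[i] * factor;
    }
}
```

Because the code depends only on `kLanes`, the same source works for any vector width, which is the portability the thread is asking about.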

I am not encountering any such errors. I use nsimd regularly, on various systems. I usually build with GCC 11.2 (as you do), and I specify -std=c++17 as well.

Most SIMD hardware works on a fixed number of bits. Thus converting from 4xi32 to 4xi16 is not a natural operation. The natural operation would generate an 8xi16 vector, presumably from two 4xi32 inputs. Maybe you need to use downcvt instead of cvt?
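As an illustration of the "natural" narrowing described here, the following scalar model takes two 4 x i32 inputs and produces one 8 x i16 result of the same total width. It is a sketch of the semantics only, not nsimd's downcvt implementation; `narrow_two_to_one` is a hypothetical name, and the plain `static_cast` assumes the inputs already fit in 16 bits.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Both shapes occupy the same number of bytes, which is why the
// two-inputs-to-one-output form maps well onto fixed-width registers.
static_assert(4 * sizeof(std::int32_t) == 8 * sizeof(std::int16_t),
              "4 x i32 and 8 x i16 are both 128 bits");

// Scalar model: narrow two 4 x i32 vectors into one 8 x i16 vector.
// Assumption: every input value fits in int16_t.
std::array<std::int16_t, 8> narrow_two_to_one(
    const std::array<std::int32_t, 4>& lo,
    const std::array<std::int32_t, 4>& hi) {
    std::array<std::int16_t, 8> out{};
    for (std::size_t i = 0; i < 4; ++i) {
        out[i] = static_cast<std::int16_t>(lo[i]);      // lower half
        out[i + 4] = static_cast<std::int16_t>(hi[i]);  // upper half
    }
    return out;
}
```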

raphaelthegreat commented on August 15, 2024

What I want to do is take each element of the int32 vector and cast it down to int16 without reducing the vector size (in elements). With intrinsics, this can be done, to my knowledge, with _mm_cvtps_pi16, so I assumed that nsimd::cvt would work based on the similar name.
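For reference, here is one way the element-count-preserving narrowing could look with raw intrinsics. Note that `_mm_cvtps_pi16` takes packed floats rather than 32-bit integers, so this sketch swaps in the SSE2 intrinsic `_mm_packs_epi32`, which narrows with signed saturation. The function name `narrow4_i32_to_i16` and the `SKETCH_HAVE_SSE2` macro are hypothetical; a scalar fallback with the same saturating behaviour is used on targets without SSE2.

```cpp
#include <cstddef>
#include <cstdint>

#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>
#define SKETCH_HAVE_SSE2 1
#endif

// Narrow 4 x i32 to 4 x i16, keeping the element count.
// The result fills only 64 bits, i.e. half of a 128-bit register.
void narrow4_i32_to_i16(const std::int32_t* in, std::int16_t* out) {
#ifdef SKETCH_HAVE_SSE2
    // Load four 32-bit integers (unaligned load).
    __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(in));
    // Saturating pack: the low four 16-bit lanes hold the narrowed input.
    __m128i packed = _mm_packs_epi32(v, v);
    // Store only the low 64 bits (four int16_t values).
    _mm_storel_epi64(reinterpret_cast<__m128i*>(out), packed);
#else
    // Portable fallback with the same signed-saturation semantics.
    for (std::size_t i = 0; i < 4; ++i) {
        std::int32_t x = in[i];
        if (x > INT16_MAX) x = INT16_MAX;
        if (x < INT16_MIN) x = INT16_MIN;
        out[i] = static_cast<std::int16_t>(x);
    }
#endif
}
```

The half-empty output register is the reason this shape is less natural for SIMD hardware than the two-inputs-to-one-output form.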

raphaelthegreat commented on August 15, 2024

Closing this now as resolved.
