Comments (5)
I believe you are misunderstanding the "length" argument of a pack. You should probably set it to `1` (not `4`) to create a pack that has the natural size for your architecture. Choosing a larger value sets up SIMD vectors whose length is a multiple of the natural hardware vector length. In other words, your nsimd examples are working on 16 elements, not just 4, and the code is thus much longer than expected. You are also seeing a lot of code related to loading and storing function arguments from and to the stack; these instructions will go away if the function is inlined, or is used inside a larger function.
I usually just use `pack<double>`, without specifying the length. The SIMD length is chosen automatically by nsimd.
As a test, you can evaluate `sizeof(pack<float, 4>)`.
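
To make the arithmetic behind "16 elements, not just 4" concrete, here is a minimal sketch. It assumes 128-bit hardware vectors (for example SSE or NEON), which the thread does not state explicitly, and `pack_elements` is a hypothetical helper written for illustration, not part of nsimd.

```cpp
#include <cstddef>

// Hypothetical helper (not an nsimd function): number of elements held by a
// pack whose length argument is `n`, on hardware whose natural vector
// registers are `register_bits` wide.
constexpr std::size_t pack_elements(std::size_t register_bits,
                                    std::size_t element_bytes,
                                    std::size_t n) {
    // Natural lane count = register width / element width; the length
    // argument multiplies that natural lane count.
    return n * (register_bits / (8 * element_bytes));
}

// Assumption: 128-bit registers and 4-byte floats.
static_assert(pack_elements(128, 4, 1) == 4,
              "length 1 gives the natural 4 float lanes");
static_assert(pack_elements(128, 4, 4) == 16,
              "length 4 gives four registers' worth, i.e. 16 floats");
```

Under the same assumption, `sizeof(pack<float, 4>)` would be expected to come out at 64 bytes rather than 16, which is what the suggested test is meant to reveal.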
Thanks for pointing it out. The code looks much cleaner now. But if the size of the vector depends on the architecture, is there any cross-platform way of defining the size? In addition, I have been facing numerous build errors on compilers other than GCC when using the nsimd::cvt function. For example:
I am not aware of a way to define the SIMD vector size. The suggested approach is to let nsimd choose the optimal vector length, and then to inquire the actual vector length at compile time. This allows efficient loop vectorization when traversing arrays in loops.
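
The loop pattern this enables might look roughly like the sketch below. It is plain C++ and does not call nsimd: the template parameter `Lanes` is a hypothetical stand-in for the vector length that the library reports at compile time, and `scale_in_place` is an illustrative name.

```cpp
#include <cassert>
#include <cstddef>

// `Lanes` stands in for the compile-time vector length reported by the SIMD
// library (an assumption for this sketch; it is not read from nsimd here).
template <std::size_t Lanes>
void scale_in_place(float* data, std::size_t n, float factor) {
    static_assert(Lanes > 0, "lane count must be positive");
    assert(data != nullptr || n == 0);

    // Main loop: whole blocks of `Lanes` elements. Because `Lanes` is a
    // compile-time constant, the inner loop has a fixed trip count and can
    // be mapped onto a single vector load, multiply, and store.
    std::size_t i = 0;
    for (; i + Lanes <= n; i += Lanes) {
        for (std::size_t lane = 0; lane < Lanes; ++lane) {
            data[i + lane] *= factor;
        }
    }

    // Tail loop: the remaining n % Lanes elements, handled one at a time.
    for (; i < n; ++i) {
        data[i] *= factor;
    }
}
```

The code never hard-codes 4, 8, or 16. The same source adapts to whatever width the target provides, which is the portable alternative to fixing the vector size by hand.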
I am not encountering any such errors. I use nsimd regularly, on various systems. I usually build with GCC 11.2 (as you do), and I specify `-std=c++17` as well.
Most SIMD hardware works on a fixed number of bits. Thus converting from `4xi32` to `4xi16` is not a natural operation. The natural operation would generate an `8xi16` vector, presumably from two `4xi32` inputs. Maybe you need to use `downcvt` instead of `cvt`?
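
A scalar model of that "natural" narrowing operation, as a minimal sketch: the types `I32x4` and `I16x8` and the function `narrow_pair` are hypothetical names for illustration, not nsimd's API. Saturating out-of-range values is also an assumption here (it mirrors what the SSE2 intrinsic `_mm_packs_epi32` does); a given library may truncate instead.

```cpp
#include <cstdint>

// Both types occupy the same 128 bits: 4 x 32 = 8 x 16.
struct I32x4 { std::int32_t lane[4]; };
struct I16x8 { std::int16_t lane[8]; };
static_assert(sizeof(I32x4) == sizeof(I16x8), "same register width");

// Clamp a 32-bit value into the int16 range (assumed saturating behaviour).
constexpr std::int16_t saturate_to_i16(std::int32_t x) {
    return static_cast<std::int16_t>(
        x > 32767 ? 32767 : (x < -32768 ? -32768 : x));
}

// Two 4xi32 inputs fill exactly one 8xi16 result: `lo` goes to lanes 0..3,
// `hi` goes to lanes 4..7.
constexpr I16x8 narrow_pair(I32x4 lo, I32x4 hi) {
    return I16x8{{saturate_to_i16(lo.lane[0]), saturate_to_i16(lo.lane[1]),
                  saturate_to_i16(lo.lane[2]), saturate_to_i16(lo.lane[3]),
                  saturate_to_i16(hi.lane[0]), saturate_to_i16(hi.lane[1]),
                  saturate_to_i16(hi.lane[2]), saturate_to_i16(hi.lane[3])}};
}
```

A single `4xi32` input would fill only half of the 16-bit result register, which is why a same-element-count conversion is not the shape the hardware offers directly.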
What I want to do is take each element of the int32 vector and cast it down to int16 without reducing the vector size (in elements). With intrinsics this can be done, to my knowledge, with `_mm_cvtps_pi16`, so I assumed that `nsimd::cvt` would work based on the similar name.
Closing this now as resolved.