Comments (4)
Hi ustcsq,
I am not sure how you get the segfault --> https://godbolt.org/z/z1se7oGe3
To help you, I will need a fully functional code snippet along with its compilation command line.
My advice
Do not write code like this. It won't even compile for certain SIMD extensions such as Arm SVE. You have to separate concerns: data structures on one side, computations on the other. You should store your data in a std::vector<float> and write your computation code along these lines:
#include <nsimd/nsimd-all.hpp>
#include <vector>
template <typename T>
void computations(std::vector<T> &tmp) {
  typedef nsimd::pack<T> vec; // shortcut for the pack type
  int s = nsimd::len<vec>(); // number of elements per pack
  int n = (int)tmp.size(); // element count as a signed int
  T *ptr = tmp.data(); // always work with raw pointers
  // Only full packs are processed; the n % s leftover elements are not handled here.
  for (int i = 0; i + s <= n; i += s) {
    vec v = nsimd::loadu<vec>(ptr + i);
    // ... do some work with v
    nsimd::storeu(ptr + i, v);
  }
}
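The loop above only covers full packs, so when the vector size is not a multiple of the pack length the last few elements still need handling. Here is a minimal sketch of that main-loop-plus-tail pattern in plain C++, without nsimd. The names kChunk and scale_by_two are hypothetical, and the inner loop merely stands in for a real SIMD load/compute/store.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the pack length (e.g. 8 floats with AVX2).
// With nsimd this value would come from the library, not a constant.
constexpr std::size_t kChunk = 8;

// Doubles every element: full chunks first, then a scalar tail.
template <typename T>
void scale_by_two(std::vector<T> &data) {
  T *ptr = data.data();             // raw pointer, as advised above
  const std::size_t n = data.size();
  std::size_t i = 0;
  // Main loop: runs only while a full chunk of kChunk elements remains.
  for (; i + kChunk <= n; i += kChunk) {
    for (std::size_t j = 0; j < kChunk; ++j) {
      ptr[i + j] *= T(2);           // placeholder for the SIMD work
    }
  }
  // Tail loop: the n % kChunk leftover elements, one at a time.
  for (; i < n; ++i) {
    ptr[i] *= T(2);
  }
  assert(i == n);                   // every element was visited
}
```

Calling scale_by_two on a std::vector<float> of 11 elements would run the main loop once (8 elements) and the tail loop three times.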
from nsimd.
https://godbolt.org/z/z9fMd3o8n
I got the segfault with AVX2 on GCC versions <= 10.3.
Thanks.
from nsimd.
It seems to be a bug in those versions of GCC. You can see the problem by disassembling the resulting binary and running it under GDB. A lot of people think that using a std::vector<nsimd::pack<float>> or a std::vector<__m256> will be better optimized because no loads/stores are needed. This is wrong: writing tmp[0] still makes the compiler generate a load (or store) instruction, and this is where the problem lies. You cannot assume that the data is properly aligned, and here it is not, yet the compiler wrongly generated an aligned movaps (vmovaps in its AVX encoding), as you can see below:
friend std::ostream &operator<<(std::ostream &os, pack const &a0) {
12b0: 4c 8d 54 24 08 lea 0x8(%rsp),%r10
12b5: 48 83 e4 e0 and $0xffffffffffffffe0,%rsp
__ostream_insert(__out, __s,
12b9: ba 02 00 00 00 mov $0x2,%edx
12be: 41 ff 72 f8 pushq -0x8(%r10)
12c2: 55 push %rbp
12c3: 48 89 e5 mov %rsp,%rbp
12c6: 41 56 push %r14
12c8: 41 55 push %r13
12ca: 41 54 push %r12
12cc: 49 89 fc mov %rdi,%r12
12cf: 41 52 push %r10
12d1: 53 push %rbx
12d2: 48 81 ec 28 01 00 00 sub $0x128,%rsp
T buf[max_len_t<T>::value];
storeu(buf, a0.car, T(), SimdExt());
12d9: c5 fc 28 06 vmovaps (%rsi),%ymm0
12dd: 48 8d 35 20 0d 00 00 lea 0xd20(%rip),%rsi # 2004 <_IO_stdin_used+0x4>
}
extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__))
_mm256_storeu_ps (float *__P, __m256 __A)
{
*(__m256_u *)__P = __A;
12e4: c5 fc 29 85 d0 fe ff vmovaps %ymm0,-0x130(%rbp)
12eb: ff
12ec: c5 fc 29 85 b0 fe ff vmovaps %ymm0,-0x150(%rbp)
12f3: ff
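An aligned move such as vmovaps with a ymm register faults when its memory operand is not 32-byte aligned, which is why the instruction at 12d9 can crash. As a debugging aid, a pointer's alignment can be checked at run time. The helper below is a minimal sketch; is_aligned is a hypothetical name and not part of nsimd.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Returns true if ptr is a multiple of `alignment` bytes.
// `alignment` must be a non-zero power of two (e.g. 32 for AVX2).
inline bool is_aligned(const void *ptr, std::size_t alignment) {
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  return (reinterpret_cast<std::uintptr_t>(ptr) & (alignment - 1)) == 0;
}
```

For instance, is_aligned(&tmp[0], 32) could be printed just before the failing line to confirm whether the vector storage is suitably aligned.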
To work around this GCC bug, you can do the following:
#include <nsimd/nsimd-all.hpp>
#include <iostream>
#include <vector>
int main() {
  // nsimd::allocator gives the vector storage a SIMD-friendly alignment
  std::vector<nsimd::pack<float>, nsimd::allocator<float>> tmp(8);
  std::cout << "tmp : " << tmp.size() << ", " << tmp.capacity() << std::endl;
  std::cout << tmp[0] << std::endl;
  return 0;
}
The nsimd::allocator forces proper alignment of the data, so the movaps that GCC wrongly generates is given aligned pointers and works.
But again, please, do not write code like this.
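To illustrate what an aligning allocator does, here is a minimal sketch in standard C++17. It is not nsimd::allocator's actual implementation; aligned_allocator and make_aligned_floats are hypothetical names, and the 32-byte default is an assumption chosen to match AVX2.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#include <vector>

// Hypothetical minimal allocator returning Alignment-byte aligned storage.
// Requires C++17 (aligned operator new/delete).
template <typename T, std::size_t Alignment = 32>
struct aligned_allocator {
  using value_type = T;
  static_assert((Alignment & (Alignment - 1)) == 0,
                "Alignment must be a power of two");

  aligned_allocator() noexcept = default;
  template <typename U>
  aligned_allocator(const aligned_allocator<U, Alignment> &) noexcept {}

  // Explicit rebind is needed because of the non-type template parameter.
  template <typename U>
  struct rebind { using other = aligned_allocator<U, Alignment>; };

  T *allocate(std::size_t n) {
    return static_cast<T *>(
        ::operator new(n * sizeof(T), std::align_val_t(Alignment)));
  }
  void deallocate(T *p, std::size_t) noexcept {
    ::operator delete(p, std::align_val_t(Alignment));
  }
};

// All instances are interchangeable: the allocator holds no state.
template <typename T, typename U, std::size_t A>
bool operator==(const aligned_allocator<T, A> &,
                const aligned_allocator<U, A> &) noexcept { return true; }
template <typename T, typename U, std::size_t A>
bool operator!=(const aligned_allocator<T, A> &,
                const aligned_allocator<U, A> &) noexcept { return false; }

// Convenience helper: a float vector whose storage is 32-byte aligned.
inline std::vector<float, aligned_allocator<float>>
make_aligned_floats(std::size_t n) {
  std::vector<float, aligned_allocator<float>> v(n);
  assert(n == 0 || reinterpret_cast<std::uintptr_t>(v.data()) % 32 == 0);
  return v;
}
```

This keeps plain floats in the container, in line with the advice above to separate data from computation, while making it safe to use aligned loads and stores on data().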
from nsimd.
Thanks.
from nsimd.