The way swizzles are implemented on vectors and matrices is UB due to the strict alias

HI <a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="

<a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="/us

Undefined behavior when accessing vector elements with operator[] about hlslpp HOT 10 CLOSED

mlangerak commented on September 26, 2024

Undefined behavior when accessing vector elements with operator[]

from hlslpp.

Comments (10)

redorav commented on September 26, 2024

HI @mlangerak, is this something you have come across in practice? I tend to be quite practical with regards to UB, and unions are something compilers tend to make exceptions with even with strict aliasing on. In this instance in fact, a float points to another float, so I can't see where that would be an issue. I make an explicit mention of this in the readme:

The f32 members of float4 and the [ ] operators make use of the union directly, so the generated code is up to the compiler. Use with care

The compiler needs to issue the SIMD instructions for this, and it is up to it to generate whatever code it feels is appropriate. It's not something that is necessarily good practice all the time, but if you're just setting up matrices or whatever or quickly fiddling about, it can be useful.

from hlslpp.

mlangerak commented on September 26, 2024

The union consists of a float[] and an n128 which when, vector intrinsics are enabled, are distinct types e.g. float and __m128. The compiler is only required to track the most recently written union member and accessing the other one is UB. Whether or not that is a problem in practice is compiler dependent as you mention. I haven't directly encountered an issue with this in using hlsl++ but it is a concern, esp. with the gcc compiler which can be perversely aggressive with exploiting UB in optimizations.

Also there might be a performance concern with this -- does the optimizer recognize the float access through operator[] can be achieved with permute on the n128? I haven't checked the assembly output, but I can imagine the compiler will throw up its hands in that case and do a round trip through memory instead.

from hlslpp.

redorav commented on September 26, 2024

I added this feature for someone who asked for it as a quick way to access floats, in the understanding that it is up to the compiler. I don't even know how I would do this in some other way, having to return a float&. There is a performance concern, that's why there is a caveat that it's up to the compiler and that's why the recommended way to do it is either via the swizzles or the constructors.

In case you're curious, here's a quick godbolt example that you may want to play with, you can see that the compilers know exactly what I'm doing, GCC is quite clever when you set the vectors as constants. https://godbolt.org/z/qvooMG9EK

Anyway, I would rather we come up with a case where it breaks and check the assembly, there's no point trying to fix something without hard evidence. I'll leave this open for the reply.

from hlslpp.

mlangerak commented on September 26, 2024

The reason why I started digging into the hlsl++ implementation, and I noticed the UB in the use of unions, is that I get incorrect results from hlsl++ in release builds. This occurs only on Apple Clang 15 at optimization level -O3. Apple Clang -O2 and MSVC at max optimization level give correct results (I have not tried GCC.) Under Apple Clang -O3 the code either produces garbage results or crashes, depending on minor changes in my code that uses hlsl++. Interestingly, when I compile with HLSLPP_SCALAR the code works correctly, even with -O3.

This is a fairly large codebase, and narrowing down to a minimal repro would be very time consuming. I cannot claim that the errors I am seeing at -O3 is due to UB in the use of unions, but nothing else in the hlsl++ code stands out as problematic so it seems the most likely culprit? Unfortunately, I noticed that the union float/vector aliasing is quite pervasive in hlsl++, e.g. it is used in swizzle1 too, and I gave up on my attempts at hacking at hlsl++ to remove this source of UB to prove/disprove whether or not it is causing the errors I am seeing at -O3.

from hlslpp.

redorav commented on September 26, 2024

I see what you mean, it might have been more useful to lead with that so I can understand your problem better :)

Just to reply to a few points, the swizzles rely on the union, there's nothing that can be done about that, it's the only way I could think of to enable code like a = b.xzyx, so it's essentially a core feature. That said, I can't imagine why simply having that there would be an issue if you don't use swizzles.

I don't remember if the unit tests are run with -O3 in appveyor, I could try that as someone else reported a Mac-specific issue that I wasn't able to reproduce. I don't have a Mac proper (I test in a virtual machine + appveyor) so that's a difficulty for me in that sense as well. Is this apple silicon or intel? What version of Mac and the Clang compiler? That's important as in one case you'd use NEON and in the other SSE, just so we can narrow down the issue.

In terms of the problem itself, are you able to provide a snippet of code of where it goes wrong to look at it? I'm happy to try a bit of guesswork if that might narrow it down.

from hlslpp.

mlangerak commented on September 26, 2024

I see what you mean, it might have been more useful to lead with that so I can understand your problem better :)

That was somewhat hot of the press, before I could figure out if the problem was in my code or in hlsl++. Actually, it is still unclear, since even with HLSLPP_SCALAR I notice there are subtle differences in output between my GPU and CPU with hlsl++ codepaths (the differences are greater than what can be explained with fp roundoff error or indeterminism due to GPU threading execution order.)

This is on Apple Silicon, M2, Apple Clang 1500.0.40.1, Sonoma 14.1.1.

I do use swizzles in my code quite a bit, so I am dependent on the union trick doing the right thing. I did eliminate all uses of operator[] in my client code, so the only uses are within hlsl++ itself, but that made no difference. I'll work on narrowing down the problem on my end, I don't know if I'll be successful to reduce it to a snippet I can post but I'll try.

from hlslpp.

mlangerak commented on September 26, 2024

The subtle difference with HLSLPP_SCALAR was due to a bug in hlsl++, I opened a new issue for this.

from hlslpp.

mlangerak commented on September 26, 2024

OK so maybe this is a compiler bug in (Apple) clang? Whilst hacking my code trying to narrow down the problem, I got into a state where it would crash in the NEON implementation of vpermq_s32 with -O3 but it would work fine with -O2.

Comparing the disassembly, with -O2:

mov.d  v2[1], v1[0]
tbl.8b v1, { v2 }, v8
mov.d  v0[1], v1[0]
ldr    q2, [sp]
mul.4s v23, v0, v2
fmov   w9, s1

With -O3:

mov.d  v24[1], v23[0]
.long  0x0e033717  ; unknown opcode
mov.d  v22[1], v23[0]
mul.4s v22, v22, v1
fmov   w9, s23

The compiler generates an unknown opcode with -O3...?! Never seen that before :O

from hlslpp.

redorav commented on September 26, 2024

That does sound pretty strange. Maybe I'm doing something weird, have you tried vshufq_s32 or vpermq_f32? Those are all relatively similar

from hlslpp.

redorav commented on September 26, 2024

@mlangerak I'll close this off on the assumption that it's fixed, let me know if I can help any further

from hlslpp.

Undefined behavior when accessing vector elements with operator[] about hlslpp HOT 10 CLOSED

Comments (10)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent