Comments (10)
HI @mlangerak, is this something you have come across in practice? I tend to be quite practical with regards to UB, and unions are something compilers tend to make exceptions with even with strict aliasing on. In this instance in fact, a float points to another float, so I can't see where that would be an issue. I make an explicit mention of this in the readme:
The f32 members of float4 and the [ ] operators make use of the union directly, so the generated code is up to the compiler. Use with care
The compiler needs to issue the SIMD instructions for this, and it is up to it to generate whatever code it feels is appropriate. It's not something that is necessarily good practice all the time, but if you're just setting up matrices or whatever or quickly fiddling about, it can be useful.
from hlslpp.
The union consists of a float[] and an n128 which when, vector intrinsics are enabled, are distinct types e.g. float and __m128. The compiler is only required to track the most recently written union member and accessing the other one is UB. Whether or not that is a problem in practice is compiler dependent as you mention. I haven't directly encountered an issue with this in using hlsl++ but it is a concern, esp. with the gcc compiler which can be perversely aggressive with exploiting UB in optimizations.
Also there might be a performance concern with this -- does the optimizer recognize the float access through operator[] can be achieved with permute on the n128? I haven't checked the assembly output, but I can imagine the compiler will throw up its hands in that case and do a round trip through memory instead.
from hlslpp.
I added this feature for someone who asked for it as a quick way to access floats, in the understanding that it is up to the compiler. I don't even know how I would do this in some other way, having to return a float&. There is a performance concern, that's why there is a caveat that it's up to the compiler and that's why the recommended way to do it is either via the swizzles or the constructors.
In case you're curious, here's a quick godbolt example that you may want to play with, you can see that the compilers know exactly what I'm doing, GCC is quite clever when you set the vectors as constants. https://godbolt.org/z/qvooMG9EK
Anyway, I would rather we come up with a case where it breaks and check the assembly, there's no point trying to fix something without hard evidence. I'll leave this open for the reply.
from hlslpp.
The reason why I started digging into the hlsl++ implementation, and I noticed the UB in the use of unions, is that I get incorrect results from hlsl++ in release builds. This occurs only on Apple Clang 15 at optimization level -O3. Apple Clang -O2 and MSVC at max optimization level give correct results (I have not tried GCC.) Under Apple Clang -O3 the code either produces garbage results or crashes, depending on minor changes in my code that uses hlsl++. Interestingly, when I compile with HLSLPP_SCALAR the code works correctly, even with -O3.
This is a fairly large codebase, and narrowing down to a minimal repro would be very time consuming. I cannot claim that the errors I am seeing at -O3 is due to UB in the use of unions, but nothing else in the hlsl++ code stands out as problematic so it seems the most likely culprit? Unfortunately, I noticed that the union float/vector aliasing is quite pervasive in hlsl++, e.g. it is used in swizzle1 too, and I gave up on my attempts at hacking at hlsl++ to remove this source of UB to prove/disprove whether or not it is causing the errors I am seeing at -O3.
from hlslpp.
I see what you mean, it might have been more useful to lead with that so I can understand your problem better :)
Just to reply to a few points, the swizzles rely on the union, there's nothing that can be done about that, it's the only way I could think of to enable code like a = b.xzyx, so it's essentially a core feature. That said, I can't imagine why simply having that there would be an issue if you don't use swizzles.
I don't remember if the unit tests are run with -O3 in appveyor, I could try that as someone else reported a Mac-specific issue that I wasn't able to reproduce. I don't have a Mac proper (I test in a virtual machine + appveyor) so that's a difficulty for me in that sense as well. Is this apple silicon or intel? What version of Mac and the Clang compiler? That's important as in one case you'd use NEON and in the other SSE, just so we can narrow down the issue.
In terms of the problem itself, are you able to provide a snippet of code of where it goes wrong to look at it? I'm happy to try a bit of guesswork if that might narrow it down.
from hlslpp.
I see what you mean, it might have been more useful to lead with that so I can understand your problem better :)
That was somewhat hot of the press, before I could figure out if the problem was in my code or in hlsl++. Actually, it is still unclear, since even with HLSLPP_SCALAR I notice there are subtle differences in output between my GPU and CPU with hlsl++ codepaths (the differences are greater than what can be explained with fp roundoff error or indeterminism due to GPU threading execution order.)
This is on Apple Silicon, M2, Apple Clang 1500.0.40.1, Sonoma 14.1.1.
I do use swizzles in my code quite a bit, so I am dependent on the union trick doing the right thing. I did eliminate all uses of operator[] in my client code, so the only uses are within hlsl++ itself, but that made no difference. I'll work on narrowing down the problem on my end, I don't know if I'll be successful to reduce it to a snippet I can post but I'll try.
from hlslpp.
The subtle difference with HLSLPP_SCALAR was due to a bug in hlsl++, I opened a new issue for this.
from hlslpp.
OK so maybe this is a compiler bug in (Apple) clang? Whilst hacking my code trying to narrow down the problem, I got into a state where it would crash in the NEON implementation of vpermq_s32
with -O3 but it would work fine with -O2.
Comparing the disassembly, with -O2:
mov.d v2[1], v1[0]
tbl.8b v1, { v2 }, v8
mov.d v0[1], v1[0]
ldr q2, [sp]
mul.4s v23, v0, v2
fmov w9, s1
With -O3:
mov.d v24[1], v23[0]
.long 0x0e033717 ; unknown opcode
mov.d v22[1], v23[0]
mul.4s v22, v22, v1
fmov w9, s23
The compiler generates an unknown opcode with -O3...?! Never seen that before :O
from hlslpp.
That does sound pretty strange. Maybe I'm doing something weird, have you tried vshufq_s32 or vpermq_f32? Those are all relatively similar
from hlslpp.
@mlangerak I'll close this off on the assumption that it's fixed, let me know if I can help any further
from hlslpp.
Related Issues (20)
- Integer vector comparison is working incorrectly on Intel based Macs with MacOS >= 11 HOT 3
- Improve HLSL++ unit tests by using full-featured testing framework HOT 2
- Matrix accessor operator HOT 6
- operator / (float1, float3) looks inverted HOT 8
- How well does this cover HLSL202x? HOT 11
- * operator brake mult function in quaternion HOT 7
- Quaternion slerp returning nan values HOT 2
- Clang error: unused variable 't4' in hlsl++_quaternion.h HOT 6
- Upgrade warnings to errors HOT 2
- round() on ARM Neon uses wrong rounding mode HOT 2
- mul() intrinsic is missing some overloads. HOT 2
- clamp() intrinsic is missing overloads for intN HOT 1
- Some non square matrix definitions are incorrect HOT 6
- Missing matrix constructors HOT 2
- Some vector double functions missing HOT 5
- internal::round_float gives incorrect results HOT 4
- Build failure on G++13.2 HOT 11
- Errors on building Linux ARM64 HOT 2
- vcpkg support? HOT 11
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from hlslpp.