Comments (6)
Hi @mlangerak, this is expected and by design. The reason is that even if the physical layout changes, the logical layout still works. Are you finding inconsistencies when using these?
from hlslpp.
Conceptually, a float4x3 consists of 4 rows (or columns, for column-major) of float3s, so you'd expect the matrix to internally hold 4 SIMD vector elements, since that would be the most natural and efficient mapping for matrix operations like constructing from 4 rows (or columns), accessing a row (or column), etc.
from hlslpp.
That is true; the reason I went with the other model is that it is more space-efficient. Would you apply the same rule to a 4x2 matrix, making it 4 SIMD vectors as well? Changing it now to work the way you need would be a huge amount of work, even if I understand the rationale for it. All the matrix operations are written with this in mind.
from hlslpp.
I've been looking at the code. There seems to be no "right" solution for representing non-4x4 matrices with fixed-width vectors. The advantages and disadvantages I can see in a hypothetical change are:
Advantages
· Matrices become "castable" to and from the original matrices; for example, a float4x4 can become a float4x3 without issues (it can already be cast to a float3x4)
· You can poke at matrix rows via the [] operator and assign a float3 or float4
Disadvantages
· Matrices take up a lot more space. In the extreme case, a float4x1 takes up four SIMD vectors if we want everything to be consistent
· It biases a physical row layout over columns
· float2x2 becomes bloated as well (it's a single SIMD vector at the moment)
This of course ignores the large disadvantage of changing the inner workings of all these matrices, verifying results, etc. I don't have unit test coverage as large as I would like; testing was originally more manual.
I can see both sides of this. It took me a long time to get the matrix code working and to verify results (even if there are bugs), and changing it now would cause more issues than it solves. If you have other ideas or suggestions, I'm open to hearing your perspective or how you'd solve it.
from hlslpp.
There are pros and cons to either layout, but I would argue for the layout that causes the least surprise, even if it is less optimal in some respects. By least surprise I mean whichever layout is more common in practice. For instance, I think (I did not double-check this) that when sharing constant data with the GPU, the DXC compiler uses 4 float3s for a float4x3 in row-major mode. In that case, it would be "least surprise" if you could memcpy an hlslpp::float4x3 directly when pushing constant data to the GPU.
The float2x2 case is fortunate, since DXC uses the same packing into a single float4 as hlslpp already does, so that case is already consistent with DXC.
[edit]
Supporting both row- and column-major does seem to make this more complicated. If you really need that flexibility, you could separate the declaration of the matrix type from the matrix layout, e.g. something like:
struct Matrix4x3 { ... };   // storage for 4 float3's, implicitly row major
struct Matrix3x4 { ... };   // storage for 3 float4's, implicitly row major
using float4x3 = Matrix4x3; // when compiled with row major matrices
using float4x3 = Matrix3x4; // when compiled with column major matrices
It would get confusing really quickly, though, so unless there is a good reason to support both, I would default to row-major, since it is more natural IMO; indeed, I am using hlslpp and DXC in row-major mode. Incidentally, I also need to support the Metal shading language, which is sadly column-major, and there seems to be no shader compiler switch to change it. To compensate, I implemented a mul intrinsic for MSL that hides the column/row-major distinction:
template<typename T, int RowN, int ColN, int N>
inline decltype(auto) mul(matrix<T, ColN, RowN> m, vec<T, N> v)
{
    return v * m; // Note reversed order of matrix-vector product; Metal has column-major matrix layout.
}
This works well to hide the row/column major confusion, and so far I've been able to write all my MSL pretending it is row-major.
from hlslpp.
There is a common misunderstanding that this library is meant to match HLSL's memory layout when uploading data to the GPU; I'll refer you to #58 for some more discussion.
While that would be convenient, there are all sorts of cases where the layouts don't match. As a few examples, consider:
cbuffer ExampleCB
{
    float3   a; // Offset: 0
    float    b; // Offset: 12
    float2x2 m; // Offset: 16
    float4   v; // Offset: 48
}
- HLSL considers float3 to be 12 bytes, and if a float3 is followed by a float, it packs both into the same 16-byte line. HLSL++ can't do that, since all its SIMD vectors are 16 bytes, i.e.:
  HLSL:   [ a.x ][ a.y ][ a.z ][ b ]
  HLSL++: [ a.x ][ a.y ][ a.z ][ a.w ]
          [ b ][ padding ]
The same applies to float1 and float2, with their respective paddings.
- HLSL considers float2x2 to be two float4s, contrary to what you say above: the next variable v sits 32 bytes after m, not 16 as one would expect from a single float4.
  HLSL:   [ m_00 ][ m_01 ][ padding ]
          [ m_10 ][ m_11 ][ padding ]
  HLSL++: [ m_00 ][ m_01 ][ m_10 ][ m_11 ]
- float3x4 and float4x3 in HLSL vary depending on whether you specify row_major or column_major. The size changes with this parameter between 48 and 64 bytes, as you'd expect, but there is no fixed meaning to float3x4 or float4x3: it can mean 3 rows and 4 columns or the other way around. This isn't something HLSL++ can really express. A row_major float3x4 maps to float3x4, and a column_major float4x3 maps to float4x3, but the other combinations will fail.
The main takeaway is that there is no least-surprise behavior; these things are surprising no matter what you do. Even without SIMD vectors, C++'s packing rules can step in and do unexpected things. It is not HLSL++'s aim to solve these problems.
That said, I use a little interface that caters to this use case in my codebase. As it is still incomplete it hasn't made its way here yet, but it is possible to do without much work using the store() family of functions. The other approach I have seen in lots of codebases is to just declare aligned types in general, i.e. don't use anything other than float4, and maybe row_major float3x4 and float4x4.
Hopefully that helps
from hlslpp.