redorav / hlslpp

Math library using hlsl syntax with SSE/NEON support

License: MIT License

C++ 64.62% C 33.06% Batchfile 0.02% Lua 2.28% Shell 0.01%

hlsl math cpp shaders game-development sse sse41 neon vector matrix

hlslpp's Introduction

HLSL++

Small header-only math library for C++ with the same syntax as the hlsl shading language. It supports any SSE (x86/x64 devices like PC, Mac, PS4/5, Xbox One/Series) and NEON (ARM devices like Android, iOS, Switch) platforms. It features swizzling and all the operators and functions from the hlsl documentation. The library is aimed mainly at game developers as it's meant to ease the C++ to shader bridge by providing common syntax, but can be used for any application requiring fast, portable math. It also adds some functionality that hlsl doesn't natively provide, such as convenient matrix functions, quaternions and extended vectors such as float8 (8-component float) that take advantage of wide SSE registers.

Example

hlsl++ allows you to be as expressive in C++ as when programming in the shader language. Constructs such as the following are possible.

float4 foo4 = float4(1, 2, 3, 4);
float3 bar3 = foo4.xzy;
float2 logFoo2 = log(bar3.xz);
foo4.wx = logFoo2.yx;
float4 baz4 = float4(logFoo2, foo4.zz);
float4x4 fooMatrix4x4 = float4x4( 1, 2, 3, 4,
                                  5, 6, 7, 8,
                                  8, 7, 6, 5,
                                  4, 3, 2, 1);
float4 myTransformedVector = mul(fooMatrix4x4, baz4);
int2 ifoo2 = int2(1, 2);
int4 ifoo4 = int4(1, 2, 3, 4) + ifoo2.xyxy;
float4 fooCast4 = ifoo4.wwyx;

float8 foo8 = float8(1, 2, 3, 4, 5, 6, 7, 8);
float8 bar8 = float8(1, 2, 3, 4, 5, 6, 7, 8);
float8 add8 = foo8 + bar8;

The natvis files provided for Visual Studio debugging allow you to see both vectors and the result of the swizzling in the debugging window in a programmer-friendly way.

Requirements

The only required features are a C++ compiler supporting anonymous unions, and SSE or NEON depending on your target platform. If your target platform does not have SIMD support, it can also fall back to a scalar implementation. As a curiosity it also includes an Xbox 360 implementation.

How to use

// The quickest way, expensive in compile times but good for fast iteration
#include "hlsl++.h"

// If you care about your compile times in your cpp files
#include "hlsl++_vector_float.h"
#include "hlsl++_matrix_float.h"

// If you only need type information (e.g. in header files) and don't use any functions
#include "hlsl++_vector_float_type.h"
#include "hlsl++_quaternion_type.h"

Remember to add an include path to "hlslpp/include"
Windows has defines for min and max so if you're using this library and the <windows.h> header remember to #define NOMINMAX before including it
To force the scalar version of the library, define HLSLPP_SCALAR globally. The scalar library is only different from the SIMD version in its use of regular floats to represent vectors. It should only be used if your platform (e.g. embedded) does not have native SIMD support. It can also be used to compare performance
To enable the transforms feature, define HLSLPP_FEATURE_TRANSFORM globally
The f32 members of float4 and the [ ] operators make use of the union directly, so the generated code is up to the compiler. Use with care

Features

SSE/AVX/AVX2, NEON, Xbox360, and scalar versions
float1, float2, float3, float4, float8
int1, int2, int3, int4
uint1, uint2, uint3, uint4
double1, double2, double3, double4
floatNxM
quaternion
Conversion construction and assignment, e.g. float4(float2, float2) and int4(float2, int2)
Efficient swizzling for all vector types
Basic operators +, *, -, / for all vector and matrix types
Per-component comparison operators ==, !=, >, <, >=, <= (no ternary operator as overloading is disallowed in C++)
hlsl vector functions: abs, acos, all, any, asin, atan, atan2, ceil, clamp, cos, cosh, cross, degrees, distance, dot, floor, fmod, frac, exp, exp2, isfinite, isinf, isnan, length, lerp, log, log2, log10, max, mad, min, modf, normalize, pow, radians, reflect, refract, round, rsqrt, saturate, sign, sin, sincos, sinh, smoothstep, sqrt, step, trunc, tan, tanh
Additional matrix functions: determinant, transpose, inverse (not in hlsl but very useful)
Matrix multiplication for all NxM matrix combinations
Transformation matrices for scale, rotation and translation, as well as world-to-view look_at and view-to-projection orthographic/perspective coordinate transformations. These static functions are optionally available for matrix types float2x2, float3x3, float4x4 when hlsl++.h is compiled with HLSLPP_FEATURE_TRANSFORM definition.
Native visualizers for Visual Studio (.natvis files) which correctly parse with both MSVC and Clang in Windows

Missing/planned:

boolN types

hlslpp's Issues

HLSL++ structs do not support move-semantics

HLSL++ vector and matrix structs have a user-defined copy constructor, which breaks the "rule of zero", but they do not define the copy assignment, move constructor and move assignment operators, so they do not follow the "rule of five" either. The result is missing support for move semantics and lower performance when used in STL containers like std::vector.

It seems that HLSL++ types do not need a user-defined copy constructor. Removing the user-defined copy constructors will let the compiler generate correct noexcept copy/move constructors and noexcept assignment operators, enabling efficient memory management in modern C++.
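The effect described above can be checked with type traits. The following is a minimal sketch, not HLSL++ code: WithUserCopy and RuleOfZero are hypothetical stand-ins for a vector type with and without a user-provided copy constructor.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for a vector type with a user-provided copy constructor.
struct WithUserCopy {
    float v[4];
    WithUserCopy() : v{0.0f, 0.0f, 0.0f, 0.0f} {}
    WithUserCopy(const WithUserCopy& other) {
        for (int i = 0; i < 4; ++i) v[i] = other.v[i];
    }
};

// Same data, but every special member is left to the compiler (rule of zero).
struct RuleOfZero {
    float v[4];
};

// The user-provided copy constructor suppresses the implicit move constructor,
// so a "move" falls back to the copy constructor, which is not noexcept.
static_assert(!std::is_nothrow_move_constructible<WithUserCopy>::value, "");
static_assert(!std::is_trivially_copyable<WithUserCopy>::value, "");

// With no user-declared special members, everything is trivial and noexcept.
static_assert(std::is_nothrow_move_constructible<RuleOfZero>::value, "");
static_assert(std::is_nothrow_move_assignable<RuleOfZero>::value, "");
static_assert(std::is_trivially_copyable<RuleOfZero>::value, "");
```

Containers such as std::vector can relocate trivially copyable elements cheaply, and they only move elements during reallocation when the move constructor is noexcept, which is why the second form is the friendlier one.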

Add shift operators to intN

Add integer operators (bitwise, modulo)

~, <<, >>, &, |, ^, <<=, >>=, &=, |=, ^=

Also modulo %

Integer vectors do not define /= operator

Floating point vector types have /= operator, but integer vectors do not.

Add uintN type

Make unit tests run as part of the AppVeyor builds

Unit tests should output a performance file

Synthetic tests for the various configurations would be able to create a comparison table for all the different functions, and find problematic areas.

Add mad function

Apparently hlsl actually has a mad function

Add nodiscard to all relevant functions

Add preincrement/postincrement operators

++, --

Add functions to create projection matrices

Add options like different ndcs, handedness, etc.

vceilq_f32

hlslpp_inline float32x4_t vceilq_f32(float32x4_t x)
{
	float32x4_t trnc = vcvtq_f32_s32(vcvtq_s32_f32(x));				// Truncate
	float32x4_t gt = vcgtq_f32(trnc, x);							// Check if truncation was greater or smaller (i.e. was negative or positive number)
	uint32x4_t shr = vshrq_n_u32(vreinterpretq_u32_f32(gt), 31);	// Shift to leave a 1 or a 0
	float32x4_t result = vaddq_f32(trnc, vcvtq_f32_u32(shr));		// Add to truncated value
	return result;
}
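To make the comparison direction easy to check, here is a scalar model of a single lane of the routine above. It is only a sketch: both function names are hypothetical, and it assumes the input fits in a 32-bit integer, as the truncation step requires.

```cpp
#include <cassert>
#include <cstdint>

// Ceil built from truncation: add 1 only when x is greater than its truncation.
inline float ceil_via_truncation(float x) {
    float trnc = static_cast<float>(static_cast<std::int32_t>(x)); // truncate toward zero
    float add = (x > trnc) ? 1.0f : 0.0f;                          // 1 for positive non-integers
    return trnc + add;
}

// Same steps with the operands of the comparison swapped (trnc > x).
inline float ceil_swapped_comparison(float x) {
    float trnc = static_cast<float>(static_cast<std::int32_t>(x));
    float add = (trnc > x) ? 1.0f : 0.0f;                          // 1 for negative non-integers
    return trnc + add;
}
```

With x = 1.5 the first function returns 2 while the second returns 1, and with x = -1.5 the first returns -1 while the second returns 0.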

"float32x4_t gt = vcgtq_f32(trnc, x);" should be modified to "float32x4_t gt = vcgtq_f32(x, trnc);"

Matrix comparison operators

Matrices currently do not have any comparison operators defined for them. I can get around it by writing my own operators manually but it would be nice if these were built-in in hlslpp.

Build output

1>C:\Personal\ElectronicJonaJoy\src\EngineTests\math.tests.cpp(67,1): error C2678: binary '==': no operator found which takes a left-hand operand of type 'hlslpp::float4x4' (or there is no acceptable conversion)
1>c:\personal\electronicjonajoy\external\hlslpp\include\hlsl++_vector_float.h(1122,23): message : could be 'hlslpp::float1 hlslpp::operator ==(const hlslpp::float1 &,const hlslpp::float1 &)' [found using argument-dependent lookup]
1>c:\personal\electronicjonajoy\external\hlslpp\include\hlsl++_vector_float.h(1123,23): message : or       'hlslpp::float2 hlslpp::operator ==(const hlslpp::float2 &,const hlslpp::float2 &)' [found using argument-dependent lookup]
1>c:\personal\electronicjonajoy\external\hlslpp\include\hlsl++_vector_float.h(1124,23): message : or       'hlslpp::float3 hlslpp::operator ==(const hlslpp::float3 &,const hlslpp::float3 &)' [found using argument-dependent lookup]
1>c:\personal\electronicjonajoy\external\hlslpp\include\hlsl++_vector_float.h(1125,23): message : or       'hlslpp::float4 hlslpp::operator ==(const hlslpp::float4 &,const hlslpp::float4 &)' [found using argument-dependent lookup]
1>c:\personal\electronicjonajoy\external\hlslpp\include\hlsl++_vector_int.h(568,21): message : or       'hlslpp::int1 hlslpp::operator ==(const hlslpp::int1 &,const hlslpp::int1 &)' [found using argument-dependent lookup]
1>c:\personal\electronicjonajoy\external\hlslpp\include\hlsl++_vector_int.h(569,21): message : or       'hlslpp::int2 hlslpp::operator ==(const hlslpp::int2 &,const hlslpp::int2 &)' [found using argument-dependent lookup]
1>c:\personal\electronicjonajoy\external\hlslpp\include\hlsl++_vector_int.h(570,21): message : or       'hlslpp::int3 hlslpp::operator ==(const hlslpp::int3 &,const hlslpp::int3 &)' [found using argument-dependent lookup]
1>c:\personal\electronicjonajoy\external\hlslpp\include\hlsl++_vector_int.h(571,21): message : or       'hlslpp::int4 hlslpp::operator ==(const hlslpp::int4 &,const hlslpp::int4 &)' [found using argument-dependent lookup]
1>c:\personal\electronicjonajoy\external\hlslpp\include\hlsl++_vector_uint.h(573,22): message : or       'hlslpp::uint1 hlslpp::operator ==(const hlslpp::uint1 &,const hlslpp::uint1 &)' [found using argument-dependent lookup]
1>c:\personal\electronicjonajoy\external\hlslpp\include\hlsl++_vector_uint.h(574,22): message : or       'hlslpp::uint2 hlslpp::operator ==(const hlslpp::uint2 &,const hlslpp::uint2 &)' [found using argument-dependent lookup]
1>c:\personal\electronicjonajoy\external\hlslpp\include\hlsl++_vector_uint.h(575,22): message : or       'hlslpp::uint3 hlslpp::operator ==(const hlslpp::uint3 &,const hlslpp::uint3 &)' [found using argument-dependent lookup]
1>c:\personal\electronicjonajoy\external\hlslpp\include\hlsl++_vector_uint.h(576,22): message : or       'hlslpp::uint4 hlslpp::operator ==(const hlslpp::uint4 &,const hlslpp::uint4 &)' [found using argument-dependent lookup]
1>c:\personal\electronicjonajoy\external\hlslpp\include\hlsl++_vector_double.h(1079,24): message : or       'hlslpp::double1 hlslpp::operator ==(const hlslpp::double1 &,const hlslpp::double1 &)' [found using argument-dependent lookup]
1>c:\personal\electronicjonajoy\external\hlslpp\include\hlsl++_vector_double.h(1080,24): message : or       'hlslpp::double2 hlslpp::operator ==(const hlslpp::double2 &,const hlslpp::double2 &)' [found using argument-dependent lookup]
1>c:\personal\electronicjonajoy\external\hlslpp\include\hlsl++_vector_double.h(1081,24): message : or       'hlslpp::double3 hlslpp::operator ==(const hlslpp::double3 &,const hlslpp::double3 &)' [found using argument-dependent lookup]
1>c:\personal\electronicjonajoy\external\hlslpp\include\hlsl++_vector_double.h(1090,24): message : or       'hlslpp::double4 hlslpp::operator ==(const hlslpp::double4 &,const hlslpp::double4 &)' [found using argument-dependent lookup]
1>c:\personal\electronicjonajoy\external\hlslpp\include\hlsl++_quaternion.h(227,24): message : or       'hlslpp::float4 hlslpp::operator ==(const hlslpp::quaternion &,const hlslpp::quaternion &)' [found using argument-dependent lookup]
1>C:\Personal\ElectronicJonaJoy\src\EngineTests\math.tests.cpp(67,1): message : while trying to match the argument list '(hlslpp::float4x4, hlslpp::float4x4)'

Compatibility with OpenGL?

I tried replacing handmademath with this library in my opengl application but it broke.

Wrong value when flooring Y Component.

Below code snippet is the current behaviour for me.

float1 y{ -0.01f };
float uf = hlslpp::floor(y); // returns -1 : ok
			
float3 broken{ -11.15f,-0.1f,-15.0f };
// Accessing the Y component is correct
float yVal = broken.y;
// next statement returns -12.0f -> Floor of x component
float actualf = hlslpp::floor(broken.y);

The floor function seems to be flooring my x component and returning that value instead of the Y component.

Add rcp()

Add refract

I think the function refract is missing. The following code is stolen from here. I is the incident vector, N is the normal vector, and eta is the ratio of indices of refraction.

k = 1.0 - eta * eta * (1.0 - dot(N, I) * dot(N, I));
if (k < 0.0)
    R = floatN(0.0);
else
    R = eta * I - (eta * dot(N, I) + sqrt(k)) * N;

Please add it thank you.
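For reference, the formula above can be written as a small scalar C++ sketch. Vec3 and the helper functions are hypothetical stand-ins rather than HLSL++ types, and I and N are assumed to be normalized.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; }; // hypothetical stand-in for float3

inline float dot3(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
inline Vec3 scale3(Vec3 a, float s) { return Vec3{a.x * s, a.y * s, a.z * s}; }
inline Vec3 sub3(Vec3 a, Vec3 b) { return Vec3{a.x - b.x, a.y - b.y, a.z - b.z}; }

// I: incident vector, N: normal, eta: ratio of indices of refraction.
inline Vec3 refract_sketch(Vec3 I, Vec3 N, float eta) {
    float d = dot3(N, I);
    float k = 1.0f - eta * eta * (1.0f - d * d);
    if (k < 0.0f) {
        return Vec3{0.0f, 0.0f, 0.0f}; // total internal reflection
    }
    return sub3(scale3(I, eta), scale3(N, eta * d + std::sqrt(k)));
}
```

With eta equal to 1 the incident vector passes through unchanged, which is a convenient sanity check for any implementation.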

Add boolN types

Better inclusion manual for usage in existing project

It would be nice if the documentation contained a manual on how to incorporate the library into an existing Visual Studio solution, so it can be used quickly. For example, which settings have to be checked for a successful compilation, or what else to pay attention to, because simply including the headers doesn't work.

I think that would be great, because it would let users who are inexperienced with C++ or the Visual Studio pipeline quickly try and use this awesome library.

Double precision matrix support

Would you consider adding support for double precision matrix such as double4x4?

Add modulo operator

Apparently float also accepts the modulo operator so it needs to be added to every type

Integer division needs better implementation in SSE

SSE doesn't have native division instructions for integer vectors. One possibility is to extract the scalars, divide, then put them back. Another alternative is to take a look at this website, which seems to have alternatives and claims to be fast

http://libdivide.com/

Optimize double vectors using AVX

This is already halfway done, but here for keeping track. Takes advantage of AVX support to pack double3 and double4 into __m256d instead of two __m128d

Add function ternary() or select() to mimic the missing ternary operator

Add atan2

Double vectors are not initialized with zeroes in default constructors

Other vector types have proper initialization of the internal storage with zeroes, but double vectors do not.

Add overloads for float

It is ambiguous to do things like hlslpp::radians(0.3f) because float can be implicitly converted to floatN. Even if it's not the purpose of hlsl++ to provide scalar versions of these functions it's probably not hard and makes it more complete
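The ambiguity and its fix can be reproduced without the library. This is a sketch under assumptions: the float1 and float2 types in the sketch namespace are hypothetical stand-ins that only share the implicit conversion from float with the real types.

```cpp
#include <cassert>
#include <cmath>

namespace sketch {

// Hypothetical stand-ins, each implicitly constructible from a float.
struct float1 { float x; float1(float v) : x(v) {} };
struct float2 { float x, y; float2(float v) : x(v), y(v) {} };

const float kDegToRad = 3.14159265358979f / 180.0f;

inline float1 radians(const float1& d) { return float1(d.x * kDegToRad); }

inline float2 radians(const float2& d) {
    float2 r(d.x * kDegToRad);
    r.y = d.y * kDegToRad;
    return r;
}

// Without this overload, radians(0.3f) would be ambiguous: both vector
// overloads need one user-defined conversion, so neither is a better match.
// An exact match on float wins overload resolution.
inline float radians(float d) { return d * kDegToRad; }

} // namespace sketch
```

Removing the last overload makes a call such as sketch::radians(0.3f) fail to compile, which mirrors the problem described above.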

Add bit manipulation functions

countbits
reversebits
firstbithigh
firstbitlow
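As a reference for the expected per-component behaviour, here is a scalar sketch for unsigned 32-bit values. The _ref names are hypothetical, and the conventions used are assumptions to be checked against the HLSL documentation: bit positions are counted from the least significant bit, and -1 is returned when no bit is set.

```cpp
#include <cassert>
#include <cstdint>

// Number of set bits.
inline std::uint32_t countbits_ref(std::uint32_t v) {
    std::uint32_t n = 0;
    while (v != 0) {
        v &= v - 1; // clear the lowest set bit
        ++n;
    }
    return n;
}

// Reverse the order of the 32 bits.
inline std::uint32_t reversebits_ref(std::uint32_t v) {
    std::uint32_t r = 0;
    for (int i = 0; i < 32; ++i) {
        r = (r << 1) | (v & 1u);
        v >>= 1;
    }
    return r;
}

// Position of the lowest set bit, or -1 if v is zero (assumed convention).
inline std::int32_t firstbitlow_ref(std::uint32_t v) {
    for (int i = 0; i < 32; ++i) {
        if ((v >> i) & 1u) return i;
    }
    return -1;
}

// Position of the highest set bit, or -1 if v is zero (assumed convention).
inline std::int32_t firstbithigh_ref(std::uint32_t v) {
    for (int i = 31; i >= 0; --i) {
        if ((v >> i) & 1u) return i;
    }
    return -1;
}
```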

Vector comparison operators are not available for doubles and uints

The operators ==, !=, <, <=, >, >= are not implemented for the double1, double2, double3, double4 types, and the implementation is commented out for uint1, uint2, uint3, uint4. Meanwhile these operators are properly implemented for floats and ints.

I'm trying to use vector types in my template wrapper class Point<T, N>, which is used with floats, ints, uints and doubles, and it is currently failing to compile for T=uint32_t and T=double because of this asymmetry in the underlying vector type implementations.

Would it be possible to implement these comparison operators for all vector types?

Broken lerp

Lerp seems to be broken. I had to revert to the path marked as slower in _hlslpp_lerp_ps in order to get it working.

uint4 components stored in reverse (w, z, y, x) order ? (SSE)

Considering the following declarations :

float4 f(1, 2, 3, 4);
uint4 u(1, 2, 3, 4);

float4 components are in expected order (x, y, z, w) in memory, but uint4 components are in reverse order (w, z, y, x)

Add "any" and 'all' syntax to branch according to vector comparison result

Hi,
I would find it very useful to add the 'any'/'all' HLSL syntax to branch according to a vector comparison result.

ie.

void CommandList::setViewport(const uint4 & _viewport)
{
    if ( any( _viewport != m_viewport ) )
    {
        bindViewport(_viewport);
        m_viewport = _viewport;
    }
}

Operators like intN operator != could return boolN to make it even clearer to use.

Thanks,
Benoît.

Add matrix swizzling

Add doubleN types

Add AVX/AVX2 version of functions

_hlslpp_sel_ps NEON

I think there is a mistake in the NEON definition of _hlslpp_sel_ps
The SSE definition is
#define _hlslpp_sel_ps(x, y, mask) _mm_blendv_ps((x), (y), (mask))
which is correct: when the mask is set, y is selected, otherwise x
in NEON
#define _hlslpp_sel_ps(x, y, mask) vbslq_f32((mask), (x), (y))
which should be
#define _hlslpp_sel_ps(x, y, mask) vbslq_f32((mask), (y), (x))
in vbslq_f32, when a mask bit is set the second argument is selected, otherwise the third
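The argument order can be checked with a scalar model of one 32-bit lane. This is a sketch with hypothetical function names: blendv_model mimics _mm_blendv_ps, which looks at the sign bit of the mask lane, and vbsl_model mimics vbslq_f32, which selects bit by bit.

```cpp
#include <cassert>
#include <cstdint>

// One lane of _mm_blendv_ps(x, y, mask): y if the mask sign bit is set, else x.
inline std::uint32_t blendv_model(std::uint32_t x, std::uint32_t y, std::uint32_t mask) {
    return (mask & 0x80000000u) ? y : x;
}

// One lane of vbslq_f32(mask, a, b): bits of a where mask is 1, bits of b elsewhere.
inline std::uint32_t vbsl_model(std::uint32_t mask, std::uint32_t a, std::uint32_t b) {
    return (mask & a) | (~mask & b);
}
```

For comparison masks, which are all ones or all zeros per lane, blendv_model(x, y, mask) equals vbsl_model(mask, y, x), so the NEON macro needs y before x.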

GCC build error and MSVC warning on invalid cast of an rvalue

The latest version of HLSL++ does not build with GCC and MSVC at maximum warning level:

GCC error example:
hlsl++_sse.h:792:41: error: invalid cast of an rvalue expression of type ‘__m128’ {aka ‘__vector(4) float’} to type ‘const n128i&’ {aka ‘const __vector(2) long long int&’}
  792 | x = (const n128i&)_mm_load_ss((float*)p);
MSVC warning example:
hlsl++_sse.h(792,20): warning C4238: nonstandard extension used: class rvalue used as lvalue

Add operator [ ]

For vectors and matrices. Vectors return a float1, matrices a float4

Add scalar version of library

Add a non-vectorized version of the library. This would make it possible to mix and match on platforms (like 32-bit NEON) that don't have vectorized double types but may still want to use the math library. It can also help in future comparisons between vectorized and scalar code.

Rename vecN in the natvis to rowN

For clarity, as the physical layout doesn't necessarily match (e.g. float4x3 and float3x4 have same physical layout)

Optimize NEON shuffles

They're too generic currently and inefficient. We can probably specialize most combinations using constructs such as

vcombine_f32(vget_high_f32(x), vget_low_f32(y))
vrev64q_f32(x)

etc.

Add Appveyor jobs for Linux and MacOS

Add load

I noticed there's store() but no load(). There is a section specified as "Float Store/Load" but load is missing. Just making sure it's not forgotten. Would be handy.

Add SSE2 fallbacks

Seems simple enough, it's these functions:

// Float
_mm_blend_ps
_mm_blendv_ps
_mm_trunc_ps
_mm_round_ps
_mm_ceil_ps

// Int
_mm_blend_epi16
_mm_mullo_epi32
_mm_mul_epi32
_mm_max_epi32
_mm_min_epi32

// Double
_mm_blend_pd

Compare with similar libraries

It would be reasonable to compare this library with similar libraries, such as glm and hlml

It would be interesting to see something like a table of hlslpp vs glm vs hlml features

type size and alignment

hi,

hlslpp looks great; the only thing that prevents me from using it is the size and alignment of each type.
float1/2/3 are 16 bytes, and every floatN(xM) in hlslpp has an alignment of 16 bytes (rather than 4).
That's very different from hlsl, so we can't share some code between C++ and hlsl, such as some buffer struct definitions.

any thoughts about it ? thanks.
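The difference can be illustrated with two hypothetical structs, which are sketches and not the library's actual definitions: SimdFloat3 models a float3 backed by a 16-byte SIMD register, and PackedFloat3 models the tightly packed layout a shared buffer struct would expect. The sizes assume a 4-byte float.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a float3 stored in one 16-byte SIMD register.
struct alignas(16) SimdFloat3 { float f32[4]; };

// Tightly packed float3, as a buffer struct shared with a shader would expect.
struct PackedFloat3 { float x, y, z; };

static_assert(sizeof(SimdFloat3) == 16, "one full register");
static_assert(alignof(SimdFloat3) == 16, "register alignment");
static_assert(sizeof(PackedFloat3) == 12, "three floats, no padding");
static_assert(alignof(PackedFloat3) == 4, "float alignment");

// One possible bridge: copy the used lanes into the packed layout before upload.
inline PackedFloat3 pack_for_buffer(const SimdFloat3& v) {
    return PackedFloat3{v.f32[0], v.f32[1], v.f32[2]};
}
```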

Add sincos

hlslpp has been great so far. Great job.

I only had one issue when porting some hlsl code, the lack of sincos():
https://docs.microsoft.com/en-us/windows/win32/direct3dhlsl/dx-graphics-hlsl-sincos

Move constructors are not auto-generated for matrix types anymore

Hi @redorav,
I've noticed that in one of the latest commits you've added manual implementations of the copy constructors and copy assignment operators to the matrix types. As a result, the C++ compiler no longer generates move constructors and move assignment operators automatically for these types, and I received a bunch of issues from my static analysis system regarding std::move(matrix) calls and other std::move(...) calls for types that have matrix fields in Methane Kit. This can be fixed either by removing the manual implementations of the copy constructors and assignment operators, to let C++ do the magic of auto-generating them properly, or by implementing both copy and move constructors and assignment operators (according to the rule of five). Also be sure to make the move constructors and assignment operators noexcept, according to the standard. I suggested this before in issue #40, which was fixed by removing the manual implementations. Is there any reason to keep these manual implementations? Are they different from the auto-generated ones?

Add modf

Fast affine inverse support

There are lots of cases in animation that require computing the inverse of affine matrices, and there are many assumptions one can make when a 4x4 matrix is an affine transformation. Any chance something like that would be considered?
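As an illustration of the kind of shortcut meant here, the sketch below inverts a rigid transform (rotation plus translation only), which is the most restrictive assumption and therefore the cheapest case. Mat4 and the function names are hypothetical, not HLSL++ API, and the layout assumes row-major storage with column vectors, so the translation sits in the last column.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical 4x4 matrix: row-major, column-vector convention.
struct Mat4 { float m[4][4]; };

// Inverse of [R t; 0 1] with orthonormal R is [R^T  -R^T t; 0 1].
inline Mat4 inverse_rigid(const Mat4& a) {
    Mat4 r{}; // value-initialized to all zeros
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
            r.m[i][j] = a.m[j][i]; // transpose the rotation block
    for (int i = 0; i < 3; ++i)
        r.m[i][3] = -(r.m[i][0] * a.m[0][3] + r.m[i][1] * a.m[1][3] + r.m[i][2] * a.m[2][3]);
    r.m[3][3] = 1.0f;
    return r;
}

inline Mat4 mul4(const Mat4& a, const Mat4& b) {
    Mat4 r{};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            for (int k = 0; k < 4; ++k)
                r.m[i][j] += a.m[i][k] * b.m[k][j];
    return r;
}

inline bool is_identity(const Mat4& a, float eps) {
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j) {
            float expected = (i == j) ? 1.0f : 0.0f;
            if (std::fabs(a.m[i][j] - expected) > eps) return false;
        }
    return true;
}
```

A general affine matrix with scale or shear would instead need the inverse of the upper-left 3x3 block in place of the transpose, which is still cheaper than a full 4x4 inverse.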

(p.s. this is Bryan from PG ;))