Coder Social home page Coder Social logo

Comments (14)

syoyo avatar syoyo commented on June 10, 2024

All done by this commit: 4433618

from embree-aarch64.

maikschulze avatar maikschulze commented on June 10, 2024

Hi,

I've made a comparitive test between the older Embree 3.5 / 3.6 state and the current commit 4433618 for Embree 3.8.

I obtain the following results for the different platforms:
Measurements.pdf

I could not identify any quality regression over previous aarch64 versions. The timings look acceptable to me. I refrain from arguing that any particular feature is x percent faster, because of the very high speed variation in the x64 results. I would not expect such major performance changes in the matured code basis. I'm unsure whether this is caused by new compiler settings, new compiler versions, lower ambient temperature on the mobile test hardware, or improvements in the code. Because of these high variations, I don't put much trust into my aarch64 timings.

I would argue that we require better testing, as we discussed in #17 to reduce the uncertainty here.

I did run into one compilation issue that is specific to IOS:
The BUILD_IOS-enabled code in bvh_intersector_hybrid.cpp contains an ambiguity for vbool<4> conversion which resulted in a compiler error. I avoided this by def-disabling the respective two areas in the cpp file.

The above-linked PDF file contains two remarks. I have noticed that some results look worse than on x64 and would argue we should investigate a bit. Please note that these issues have occurred in earlier Embree aarch64 already. I have just not noticed them earlier.

  1. There's a hole in the displacement geometry test only on aarch64.

Windows x64:
win_x64_OK

Android aarch64:
android_aarch64_NOK

iOS aarch64:
ios_aarch64_NOK

Comparing Windows (left) to iOS (right):
comparison_win_vs_ios

  1. There's a hole in the curve geometry only on aarch64.

Windows x64:
win_x64_OK

Android aarch64:
android_aarch64_NOK

iOS aarch64:
ios_aarch64_NOK

Comparing Windows (left) to iOS (right):
comparison_win_vs_ios

Since these two artifacts are not located in the axis center and do not seem to occur on x64, I think it's worth to take a closer look. I will try to repro this in the original Embree tutorials with a particular camera setup. This should allow us to debug the problematic pixels.

from embree-aarch64.

syoyo avatar syoyo commented on June 10, 2024

@maikschulze Awesome!

As you may already know, I have added Windows CI build support for Github Actions: https://github.com/lighttransport/embree-aarch64/actions/runs/52064026

So its ready to run regression tests once your code is available.

from embree-aarch64.

maikschulze avatar maikschulze commented on June 10, 2024

Hi,
I've dug a bit deeper into the hole in the displacement test. The test can be reproduced with Embree's 'displacement_geometry' tutorial when setting the camera the following:

Tutorial()
      : TutorialApplication("displacement_geometry",FEATURE_RTCORE) 
    {
      /* set default camera */
      camera.from = Vec3fa(0.811794400, 1.00000000, 1.95984364);
      camera.to   = Vec3fa(0.0f,0.0f,0.0f);
    }

I've tried different approaches at rcp, sqrt and rsqrt precision but could not get rid off the artifact this way. I've then created higher resoluted images, up to 16K with a high contrast likelihood against the occluded backfase using color = abs(Ng).

It turns out that these holes / artifacts also appear in x64. In both x64 and arm they are consistently appearing on the primitive edges. It's best to zoom in closely. The first x64 artifact appears in the second level (top row) whereas the arm artifact appear in the first level (bottom row):
comparison_zoom

My data is available here:
https://drive.google.com/open?id=1Wy1lBMk_WAZClEYvsmQA9p-fv3MRO-yf

This makes me wonder whether we are seeing the following known issue:
https://software.intel.com/en-us/forums/embree-photo-realistic-ray-tracing-kernels/topic/805343

from embree-aarch64.

maikschulze avatar maikschulze commented on June 10, 2024

Thank you very much for the CI setup, @syoyo ! I will take a look at it and hopefully soon find the time to set up some public tests.

from embree-aarch64.

syoyo avatar syoyo commented on June 10, 2024

I've dug a bit deeper into the hole in the displacement test.

@maikschulze The hole may be a bug in Embree, but may be inherent to displacement algorithm itself(e.g. the normal may face opposite direction due to insufficient sampling rate).

In the latter case, we could implement reference displacement algorithm in NanoRT (https://github.com/lighttransport/nanort/tree/master/examples/vdisp ) and verify how it goes.

For the former case, I'd recommend the following step to debug(what I did for debugging curve intersector of Embree)

  1. Dump BVH : https://github.com/lighttransport/embree-aarch64/blob/master/test/curve/main.cc#L311
  2. Add conditional breakpoint when the program enters a pixel where artifact happens, then print ray's intersection state during step exectution(or add debug printf to suspicious place)

This is a brute force way and time consuming, but should work well.

from embree-aarch64.

maikschulze avatar maikschulze commented on June 10, 2024

Hi Syoyo,

I've looked at the noise function and its effect on the surface curvature. It does not seem to create any overlap. The sampling density is sufficient, normals are not rotated inwards. The artifacts appear even when lowering the noise displacement to a very small factor, effectively resulting in a sphere-like subdivision.

On x64 such a hole is reproducible with this ray:


RTCIntersectContext context;
rtcInitIntersectContext(&context);

RTCRayHit rayhit;
rayhit.ray.org_x = 0.811794400;
rayhit.ray.org_y = 1.00000000;
rayhit.ray.org_z = 1.95984340;

rayhit.ray.dir_x = -0.186146364;
rayhit.ray.dir_y = -0.291051656;
rayhit.ray.dir_z = -0.938423395;

rayhit.ray.time = 0.0;
rayhit.ray.tnear = 0.0;
rayhit.ray.tfar = 100000.0;
rayhit.ray.mask = -1;

rtcIntersect1(g_scene,&context,&rayhit); // misses prim 4 at tfar=1.5 and hits prim 0 at tfar=3 instead.

The surface's front face that should be hit is prim 4, which is indeed hit for any slight variation in ray origin or direction. Instead, for this particular ray it's missed and the back face prim 0 is hit.

This setup enters:
GridSOAIntersector1::intersect

which then enters:
PlueckerIntersector1::intersect

Here, prim 4 is discarded because it fails to pass this test:

const vfloat<M> eps = float(ulp)*abs(UVW);
vbool<M> valid = (min(U,V,W) >= -eps) | (max(U,V,W) <= eps);
if (unlikely(none(valid))) return false;

I can imagine that Loader::gather is the cause here (called from GridSOAIntersector1::intersect) and may create a tiny gap between neighboring grid patches. I don't believe there's much I can do here with my limited understanding of the algorithm's numerics. Increasing the PlueckerIntersector1::intersect::eps if fed by GridSOAIntersector1 seems necessary but hacky to me if the factor is not properly determined.

from embree-aarch64.

maikschulze avatar maikschulze commented on June 10, 2024

Hi,

for the hole in the curve I have better news. While it is quite prevalent on arm, I fail to find holes on x64 curves. Moreover, I was able to fix the holes quickly by highly improving the rcp and rsqrt precision as a sanity check.

Firstly, for arm the hole is reproducible in the 'interpolation' scene with this ray:

RTCIntersectContext context;
rtcInitIntersectContext(&context);

RTCRayHit rayhit;
rayhit.ray.org_x = 6.00000000;
rayhit.ray.org_y = 0.00000000;
rayhit.ray.org_z = -4.50000000;

rayhit.ray.dir_x = -0.651344419;
rayhit.ray.dir_y = 0.682293713;
rayhit.ray.dir_z = 0.332002729;

rayhit.ray.time = 0.0;
rayhit.ray.tnear = 0.0;
rayhit.ray.tfar = 100000.0;
rayhit.ray.mask = -1;

rtcIntersect1(g_scene, &context, &rayhit);

When rendering with this camera:

camera.from = Vec3fa(6, 0.00000000, -4.5);
camera.to   = Vec3fa(0.0f,0.0f,1.0f);
width = 1024;
height = 1024;

and removing everything but the curve with this shading: color = (Vec3fa(1.0) + Ng) / Vec3fa(2.0);

I get these results:
aarch64:
color_arm_1k

x64:
color_x64_1k

Comparison of x64 (left) and aarch64 (right) yields:
image

aarch64 stabilized:
color_arm_1k_stabilized

Comparing x64 (left) to a 'stabilized' aarch64 (right) yields:
image

The stabilized version contains the change of rcp and rsqrt functions like this:

Vec3fa rcp  ( const Vec3fa& a )
{
	const Vec3fa r = _mm_rcp_ps(a.m128);
	const Vec3fa res = _mm_mul_ps(r,_mm_sub_ps(vfloat4(2.0f), _mm_mul_ps(r, a)));
	return res;
}

Vec3fa rsqrt( const Vec3fa& a )
{
	__m128 r = _mm_rsqrt_ps(a.m128);
	return _mm_add_ps(_mm_mul_ps(_mm_set1_ps(1.5f),r), _mm_mul_ps(_mm_mul_ps(_mm_mul_ps(a, _mm_set1_ps(-0.5f)), r), _mm_mul_ps(r, r)));
}

and four Newton-Raphson iterations inside _mm_rcp_ps and _mm_rsqrt_ps for sanity. I will try to narrow it down a bit more.

from embree-aarch64.

maikschulze avatar maikschulze commented on June 10, 2024

The issue is related to the five rsqrt functions (color, vec2, vec3, vfloat4, math):

Vec3fa rsqrt( const Vec3fa& a )
{
	__m128 r = _mm_rsqrt_ps(a.m128);
	return _mm_add_ps(_mm_mul_ps(_mm_set1_ps(1.5f),r), _mm_mul_ps(_mm_mul_ps(_mm_mul_ps(a, _mm_set1_ps(-0.5f)), r), _mm_mul_ps(r, r)));
}

The issue appears when returning r right away and skipping the second line. @syoyo , do you happen to know what this formula does?

Two Newton-Raphson iterations suffice luckily. I will create a submit with the stabilizations in my fork. If we're not perfectly confident that we can skip the similar epilogue in the rcp functions, I argue we should re-add this there as well. While we still cannot guarantee water tightness without digging deep into numerical differences between SSE and NEON, I'm sure it will help with robustness.
I guess it's worth to re-evaluate performance as well.

from embree-aarch64.

syoyo avatar syoyo commented on June 10, 2024

@maikschulze Good catch! > curve

It looks ARM NEON's rsqrte and rcpe has less accuracy(6~8bits?) than SSE2(~12bits?), so we'll need 2x ~ 3x more rounds of Newton-Raphon iteration to get the same level of accuracy of SSE2.

https://qiita.com/sanmanyannyan/items/4d06b00078dd4abc4225

I'm not sure > rsqrt equation.
We'll be enough to use standard Newton-Raphson iterations for rsqrt and rcp

http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0802a/CIHDIACI.html

I think using improved rsqrt and rcp also solves the displacement issue.

from embree-aarch64.

syoyo avatar syoyo commented on June 10, 2024
Vec3fa rsqrt( const Vec3fa& a )
{
	__m128 r = _mm_rsqrt_ps(a.m128);
	return _mm_add_ps(_mm_mul_ps(_mm_set1_ps(1.5f),r), _mm_mul_ps(_mm_mul_ps(_mm_mul_ps(a, _mm_set1_ps(-0.5f)), r), _mm_mul_ps(r, r)));
}

The second line is exactly the equation of one round of Newton-Raphson iteration https://en.wikipedia.org/wiki/Fast_inverse_square_root#Newton's_method

from embree-aarch64.

syoyo avatar syoyo commented on June 10, 2024

Filed an rcp/rsqrt issue to #24

from embree-aarch64.

syoyo avatar syoyo commented on June 10, 2024

@maikschulze Synched embree-aarch64 master with Intel embree v3.9.0. This would solve some issue.

from embree-aarch64.

syoyo avatar syoyo commented on June 10, 2024

We are now v3.11.0 and v3.9.0 become old. So close the issue

from embree-aarch64.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.