Comments (14)
All done by this commit: 4433618
from embree-aarch64.
Hi,
I've made a comparitive test between the older Embree 3.5 / 3.6 state and the current commit 4433618 for Embree 3.8.
I obtain the following results for the different platforms:
Measurements.pdf
I could not identify any quality regression over previous aarch64 versions. The timings look acceptable to me. I refrain from arguing that any particular feature is x percent faster, because of the very high speed variation in the x64 results. I would not expect such major performance changes in the matured code basis. I'm unsure whether this is caused by new compiler settings, new compiler versions, lower ambient temperature on the mobile test hardware, or improvements in the code. Because of these high variations, I don't put much trust into my aarch64 timings.
I would argue that we require better testing, as we discussed in #17 to reduce the uncertainty here.
I did run into one compilation issue that is specific to IOS:
The BUILD_IOS-enabled code in bvh_intersector_hybrid.cpp contains an ambiguity for vbool<4> conversion which resulted in a compiler error. I avoided this by def-disabling the respective two areas in the cpp file.
The above-linked PDF file contains two remarks. I have noticed that some results look worse than on x64 and would argue we should investigate a bit. Please note that these issues have occurred in earlier Embree aarch64 already. I have just not noticed them earlier.
- There's a hole in the displacement geometry test only on aarch64.
Comparing Windows (left) to iOS (right):
- There's a hole in the curve geometry only on aarch64.
Comparing Windows (left) to iOS (right):
Since these two artifacts are not located in the axis center and do not seem to occur on x64, I think it's worth to take a closer look. I will try to repro this in the original Embree tutorials with a particular camera setup. This should allow us to debug the problematic pixels.
from embree-aarch64.
@maikschulze Awesome!
As you may already know, I have added Windows CI build support for Github Actions: https://github.com/lighttransport/embree-aarch64/actions/runs/52064026
So its ready to run regression tests once your code is available.
from embree-aarch64.
Hi,
I've dug a bit deeper into the hole in the displacement test. The test can be reproduced with Embree's 'displacement_geometry' tutorial when setting the camera the following:
Tutorial()
: TutorialApplication("displacement_geometry",FEATURE_RTCORE)
{
/* set default camera */
camera.from = Vec3fa(0.811794400, 1.00000000, 1.95984364);
camera.to = Vec3fa(0.0f,0.0f,0.0f);
}
I've tried different approaches at rcp, sqrt and rsqrt precision but could not get rid off the artifact this way. I've then created higher resoluted images, up to 16K with a high contrast likelihood against the occluded backfase using color = abs(Ng).
It turns out that these holes / artifacts also appear in x64. In both x64 and arm they are consistently appearing on the primitive edges. It's best to zoom in closely. The first x64 artifact appears in the second level (top row) whereas the arm artifact appear in the first level (bottom row):
My data is available here:
https://drive.google.com/open?id=1Wy1lBMk_WAZClEYvsmQA9p-fv3MRO-yf
This makes me wonder whether we are seeing the following known issue:
https://software.intel.com/en-us/forums/embree-photo-realistic-ray-tracing-kernels/topic/805343
from embree-aarch64.
Thank you very much for the CI setup, @syoyo ! I will take a look at it and hopefully soon find the time to set up some public tests.
from embree-aarch64.
I've dug a bit deeper into the hole in the displacement test.
@maikschulze The hole may be a bug in Embree, but may be inherent to displacement algorithm itself(e.g. the normal may face opposite direction due to insufficient sampling rate).
In the latter case, we could implement reference displacement algorithm in NanoRT (https://github.com/lighttransport/nanort/tree/master/examples/vdisp ) and verify how it goes.
For the former case, I'd recommend the following step to debug(what I did for debugging curve intersector of Embree)
- Dump BVH : https://github.com/lighttransport/embree-aarch64/blob/master/test/curve/main.cc#L311
- Add conditional breakpoint when the program enters a pixel where artifact happens, then print ray's intersection state during step exectution(or add debug printf to suspicious place)
This is a brute force way and time consuming, but should work well.
from embree-aarch64.
Hi Syoyo,
I've looked at the noise function and its effect on the surface curvature. It does not seem to create any overlap. The sampling density is sufficient, normals are not rotated inwards. The artifacts appear even when lowering the noise displacement to a very small factor, effectively resulting in a sphere-like subdivision.
On x64 such a hole is reproducible with this ray:
RTCIntersectContext context;
rtcInitIntersectContext(&context);
RTCRayHit rayhit;
rayhit.ray.org_x = 0.811794400;
rayhit.ray.org_y = 1.00000000;
rayhit.ray.org_z = 1.95984340;
rayhit.ray.dir_x = -0.186146364;
rayhit.ray.dir_y = -0.291051656;
rayhit.ray.dir_z = -0.938423395;
rayhit.ray.time = 0.0;
rayhit.ray.tnear = 0.0;
rayhit.ray.tfar = 100000.0;
rayhit.ray.mask = -1;
rtcIntersect1(g_scene,&context,&rayhit); // misses prim 4 at tfar=1.5 and hits prim 0 at tfar=3 instead.
The surface's front face that should be hit is prim 4, which is indeed hit for any slight variation in ray origin or direction. Instead, for this particular ray it's missed and the back face prim 0 is hit.
This setup enters:
GridSOAIntersector1::intersect
which then enters:
PlueckerIntersector1::intersect
Here, prim 4 is discarded because it fails to pass this test:
const vfloat<M> eps = float(ulp)*abs(UVW);
vbool<M> valid = (min(U,V,W) >= -eps) | (max(U,V,W) <= eps);
if (unlikely(none(valid))) return false;
I can imagine that Loader::gather is the cause here (called from GridSOAIntersector1::intersect) and may create a tiny gap between neighboring grid patches. I don't believe there's much I can do here with my limited understanding of the algorithm's numerics. Increasing the PlueckerIntersector1::intersect::eps if fed by GridSOAIntersector1 seems necessary but hacky to me if the factor is not properly determined.
from embree-aarch64.
Hi,
for the hole in the curve I have better news. While it is quite prevalent on arm, I fail to find holes on x64 curves. Moreover, I was able to fix the holes quickly by highly improving the rcp and rsqrt precision as a sanity check.
Firstly, for arm the hole is reproducible in the 'interpolation' scene with this ray:
RTCIntersectContext context;
rtcInitIntersectContext(&context);
RTCRayHit rayhit;
rayhit.ray.org_x = 6.00000000;
rayhit.ray.org_y = 0.00000000;
rayhit.ray.org_z = -4.50000000;
rayhit.ray.dir_x = -0.651344419;
rayhit.ray.dir_y = 0.682293713;
rayhit.ray.dir_z = 0.332002729;
rayhit.ray.time = 0.0;
rayhit.ray.tnear = 0.0;
rayhit.ray.tfar = 100000.0;
rayhit.ray.mask = -1;
rtcIntersect1(g_scene, &context, &rayhit);
When rendering with this camera:
camera.from = Vec3fa(6, 0.00000000, -4.5);
camera.to = Vec3fa(0.0f,0.0f,1.0f);
width = 1024;
height = 1024;
and removing everything but the curve with this shading: color = (Vec3fa(1.0) + Ng) / Vec3fa(2.0);
Comparison of x64 (left) and aarch64 (right) yields:
Comparing x64 (left) to a 'stabilized' aarch64 (right) yields:
The stabilized version contains the change of rcp and rsqrt functions like this:
Vec3fa rcp ( const Vec3fa& a )
{
const Vec3fa r = _mm_rcp_ps(a.m128);
const Vec3fa res = _mm_mul_ps(r,_mm_sub_ps(vfloat4(2.0f), _mm_mul_ps(r, a)));
return res;
}
Vec3fa rsqrt( const Vec3fa& a )
{
__m128 r = _mm_rsqrt_ps(a.m128);
return _mm_add_ps(_mm_mul_ps(_mm_set1_ps(1.5f),r), _mm_mul_ps(_mm_mul_ps(_mm_mul_ps(a, _mm_set1_ps(-0.5f)), r), _mm_mul_ps(r, r)));
}
and four Newton-Raphson iterations inside _mm_rcp_ps
and _mm_rsqrt_ps
for sanity. I will try to narrow it down a bit more.
from embree-aarch64.
The issue is related to the five rsqrt functions (color, vec2, vec3, vfloat4, math):
Vec3fa rsqrt( const Vec3fa& a )
{
__m128 r = _mm_rsqrt_ps(a.m128);
return _mm_add_ps(_mm_mul_ps(_mm_set1_ps(1.5f),r), _mm_mul_ps(_mm_mul_ps(_mm_mul_ps(a, _mm_set1_ps(-0.5f)), r), _mm_mul_ps(r, r)));
}
The issue appears when returning r
right away and skipping the second line. @syoyo , do you happen to know what this formula does?
Two Newton-Raphson iterations suffice luckily. I will create a submit with the stabilizations in my fork. If we're not perfectly confident that we can skip the similar epilogue in the rcp functions, I argue we should re-add this there as well. While we still cannot guarantee water tightness without digging deep into numerical differences between SSE and NEON, I'm sure it will help with robustness.
I guess it's worth to re-evaluate performance as well.
from embree-aarch64.
@maikschulze Good catch! > curve
It looks ARM NEON's rsqrte and rcpe has less accuracy(6~8bits?) than SSE2(~12bits?), so we'll need 2x ~ 3x more rounds of Newton-Raphon iteration to get the same level of accuracy of SSE2.
https://qiita.com/sanmanyannyan/items/4d06b00078dd4abc4225
I'm not sure > rsqrt equation.
We'll be enough to use standard Newton-Raphson iterations for rsqrt and rcp
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0802a/CIHDIACI.html
I think using improved rsqrt and rcp also solves the displacement issue.
from embree-aarch64.
Vec3fa rsqrt( const Vec3fa& a )
{
__m128 r = _mm_rsqrt_ps(a.m128);
return _mm_add_ps(_mm_mul_ps(_mm_set1_ps(1.5f),r), _mm_mul_ps(_mm_mul_ps(_mm_mul_ps(a, _mm_set1_ps(-0.5f)), r), _mm_mul_ps(r, r)));
}
The second line is exactly the equation of one round of Newton-Raphson iteration https://en.wikipedia.org/wiki/Fast_inverse_square_root#Newton's_method
from embree-aarch64.
Filed an rcp/rsqrt issue to #24
from embree-aarch64.
@maikschulze Synched embree-aarch64 master
with Intel embree v3.9.0
. This would solve some issue.
from embree-aarch64.
We are now v3.11.0
and v3.9.0 become old. So close the issue
from embree-aarch64.
Related Issues (20)
- curve_intersector_virtual.cpp takes too much time to compile(aarch64-v3.8.0) HOT 6
- verify NEON seg faults HOT 2
- `verify` watertight test fails to pass HOT 1
- Improve the accuracy of minps, maxps, rcp, rsqrt in NEON HOT 16
- Use newer SSE2NEON HOT 1
- SubD GridSOA seg faults om aarch64 (v3.11.0) HOT 2
- Watertight test fails in `verify` HOT 2
- SSE2NEON mapping for _mm_sqrt_ps(0) does not return 0 HOT 3
- support libc++ build on aarch64 linux + clang
- Code Changes that impact x64 HOT 3
- verify NEON.update failure in recent commit(at least from v3.11.0) HOT 7
- Lots of `verify` test failures in v3.12.0 merge HOT 2
- Internal compilation error(implicit_conv_expr) when using gcc8 ~ 10. HOT 1
- verify crashes for AVX2/NEON2X with ray packets HOT 1
- Native aarch64 build on Travis CI HOT 4
- create releases HOT 5
- verify fails to execute on M1 macOS HOT 5
- FPS degradation in tutorial code with GUI on M1 macOS
- Possible assertion at checkPadding16 HOT 1
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from embree-aarch64.