
lighttransport / embree-aarch64

AARCH64 port of Embree ray tracing library

License: Apache License 2.0

CMake 1.46% C++ 82.98% Shell 0.27% C 14.40% Python 0.55% Batchfile 0.03% Makefile 0.16% Java 0.03% Objective-C 0.11% mIRC Script 0.02%

embree-aarch64's People

Contributors

atafra, betajippity, cbenthin, developer-ecosystem-engineering, freibold, gkyriazis, gregmund, heinrich26, ingowald, jeffamstutz, johguenther, jomeng, kraszkow, louisfeng, maikschulze, pchang0414, pnav, sambler, scmcduffee, selcott, skwerner, svenwoop, syoyo, timrowley, twinklebear, vikramambrose


embree-aarch64's Issues

FPS degradation in tutorial code with GUI on M1 macOS

Originated from: #49

With `pathtracer`:

  • 50 fps (120 Mrays/s) on Threadripper 1950X
  • 4 fps (37 Mrays/s) on M1 macOS

With `pathtracer --nodisplay` (or benchmark mode, e.g. `--benchmark 10 100`):

  • 60 fps on TR1950X
  • 16 fps on M1 macOS

On M1, the GUI run is 4 times slower than the headless run (4 fps vs. 16 fps), whereas on the TR1950X the GUI costs only about 17% (50 fps vs. 60 fps). The cause may be thread affinity or the windowing component (GLFW).

verify crashes for AVX2/NEON2X with ray packets

When compiling Embree with

-DEMBREE_MAX_ISA=AVX2
-DEMBREE_NEON_AVX2_EMULATION=ON
-DEMBREE_RAY_PACKETS:BOOL=ON

the `verify` test suite crashes:

2020-10-13 06:36:46.158 16740-16740/? A/DEBUG: Build fingerprint: 'OnePlus/OnePlus3/OnePlus3T:9/PKQ1.181203.001/1911042108:user/release-keys'
2020-10-13 06:36:46.158 16740-16740/? A/DEBUG: Revision: '0'
2020-10-13 06:36:46.158 16740-16740/? A/DEBUG: ABI: 'arm64'
2020-10-13 06:36:46.158 16740-16740/? A/DEBUG: pid: 16363, tid: 16447, name: Thread-2  >>> com.example.embreetest <<<
2020-10-13 06:36:46.158 16740-16740/? A/DEBUG: signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x0
2020-10-13 06:36:46.158 16740-16740/? A/DEBUG: Cause: null pointer dereference
2020-10-13 06:36:46.158 16740-16740/? A/DEBUG:     x0  000000702e1fa600  x1  000000704b7d1a58  x2  000000702e1fa380  x3  000000702e1fa6d0
2020-10-13 06:36:46.158 16740-16740/? A/DEBUG:     x4  0000000000000140  x5  000000000000021c  x6  00000000000001c0  x7  0000000000000180
2020-10-13 06:36:46.158 16740-16740/? A/DEBUG:     x8  0000000000000000  x9  000000704b60da80  x10 000000704b7d1a00  x11 000000702e1faa40
2020-10-13 06:36:46.158 16740-16740/? A/DEBUG:     x12 000000702e1faa40  x13 00000000ffffffff  x14 00000000ffffffff  x15 00000000ffffffff
2020-10-13 06:36:46.158 16740-16740/? A/DEBUG:     x16 000000702e1faa00  x17 0000000000000040  x18 00000000000000d7  x19 000000702e1fa6d0
2020-10-13 06:36:46.158 16740-16740/? A/DEBUG:     x20 000000702e1fa380  x21 000000702e1fa600  x22 000000704b79b000  x23 0000000000000000
2020-10-13 06:36:46.159 16740-16740/? A/DEBUG:     x24 000000007f800000  x25 000000702e1fa780  x26 0000000000000010  x27 0000000000000040
2020-10-13 06:36:46.159 16740-16740/? A/DEBUG:     x28 0000000000000080  x29 000000702e1f9e30
2020-10-13 06:36:46.159 16740-16740/? A/DEBUG:     sp  000000702e1f9e00  lr  000000703814c2f8  pc  0000000000000000
2020-10-13 06:36:46.177 16740-16740/? A/DEBUG: backtrace:
2020-10-13 06:36:46.177 16740-16740/? A/DEBUG:     #00 pc 0000000000000000  <unknown>
2020-10-13 06:36:46.177 16740-16740/? A/DEBUG:     #01 pc 00000000000592f4  /data/app/com.example.embreetest-UVLKVq2jp6Zb3lP9EXgong==/base.apk (offset 0xee000) (embree::AccelN::intersect8(void const*, embree::Accel::Intersectors*, RTCRayHit8&, embree::IntersectContext*)+116)
2020-10-13 06:36:46.177 16740-16740/? A/DEBUG:     #02 pc 0000000000831200  /data/app/com.example.embreetest-UVLKVq2jp6Zb3lP9EXgong==/base.apk (offset 0xee000) (_ZN6embree4avx215RayStreamFilter9filterSOAILi8ELb1EEEvPNS_5SceneEPcmmmPNS_16IntersectContextE+2416)
2020-10-13 06:36:46.177 16740-16740/? A/DEBUG:     #03 pc 00000000000727d8  /data/app/com.example.embreetest-UVLKVq2jp6Zb3lP9EXgong==/base.apk (offset 0xee000) (rtcIntersect16+104)
2020-10-13 06:36:46.177 16740-16740/? A/DEBUG:     #04 pc 000000000005d564  /data/app/com.example.embreetest-UVLKVq2jp6Zb3lP9EXgong==/base.apk (offset 0x138b000) (embree::IntersectWithModeInternal(embree::IntersectMode, embree::IntersectVariant, RTCSceneTy*, RTCRayHit*, unsigned int, RTCIntersectContext*)+3188)
2020-10-13 06:36:46.177 16740-16740/? A/DEBUG:     #05 pc 000000000005dbcc  /data/app/com.example.embreetest-UVLKVq2jp6Zb3lP9EXgong==/base.apk (offset 0x138b000) (embree::IntersectWithMode(embree::IntersectMode, embree::IntersectVariant, RTCSceneTy*, RTCRayHit*, unsigned int, RTCIntersectContext*)+348)
2020-10-13 06:36:46.177 16740-16740/? A/DEBUG:     #06 pc 000000000009b47c  /data/app/com.example.embreetest-UVLKVq2jp6Zb3lP9EXgong==/base.apk (offset 0x138b000) (embree::UpdateTest::run(embree::VerifyApplication*, bool)+1996)
2020-10-13 06:36:46.177 16740-16740/? A/DEBUG:     #07 pc 000000000005f338  /data/app/com.example.embreetest-UVLKVq2jp6Zb3lP9EXgong==/base.apk (offset 0x138b000) (embree::VerifyApplication::Test::execute(embree::VerifyApplication*, bool)+152)
2020-10-13 06:36:46.177 16740-16740/? A/DEBUG:     #08 pc 0000000000084318  /data/app/com.example.embreetest-UVLKVq2jp6Zb3lP9EXgong==/base.apk (offset 0x138b000) (_ZN6embree13TaskScheduler19ClosureTaskFunctionIZNS0_5spawnImZNS_12parallel_forImZNS_17VerifyApplication9TestGroup7executeEPS4_bE3$_0EEvT_RKT0_EUlRKNS_5rangeImEEE_EEvS8_S8_S8_SB_EUlvE_E7executeEv+136)
2020-10-13 06:36:46.177 16740-16740/? A/DEBUG:     #09 pc 0000000000a88fec  /data/app/com.example.embreetest-UVLKVq2jp6Zb3lP9EXgong==/base.apk (offset 0xee000) (embree::TaskScheduler::Task::run_internal(embree::TaskScheduler::Thread&)+124)
2020-10-13 06:36:46.177 16740-16740/? A/DEBUG:     #10 pc 0000000000a8927c  /data/app/com.example.embreetest-UVLKVq2jp6Zb3lP9EXgong==/base.apk (offset 0xee000) (embree::TaskScheduler::TaskQueue::execute_local_internal(embree::TaskScheduler::Thread&, embree::TaskScheduler::Task*)+76)
2020-10-13 06:36:46.179 16740-16740/? A/DEBUG:     #11 pc 0000000000a8a610  /data/app/com.example.embreetest-UVLKVq2jp6Zb3lP9EXgong==/base.apk (offset 0xee000) (embree::TaskScheduler::wait()+64)
2020-10-13 06:36:46.180 16740-16740/? A/DEBUG:     #12 pc 0000000000a88fec  /data/app/com.example.embreetest-UVLKVq2jp6Zb3lP9EXgong==/base.apk (offset 0xee000) (embree::TaskScheduler::Task::run_internal(embree::TaskScheduler::Thread&)+124)
2020-10-13 06:36:46.180 16740-16740/? A/DEBUG:     #13 pc 0000000000a8927c  /data/app/com.example.embreetest-UVLKVq2jp6Zb3lP9EXgong==/base.apk (offset 0xee000) (embree::TaskScheduler::TaskQueue::execute_local_internal(embree::TaskScheduler::Thread&, embree::TaskScheduler::Task*)+76)
2020-10-13 06:36:46.180 16740-16740/? A/DEBUG:     #14 pc 0000000000a8a610  /data/app/com.example.embreetest-UVLKVq2jp6Zb3lP9EXgong==/base.apk (offset 0xee000) (embree::TaskScheduler::wait()+64)
2020-10-13 06:36:46.180 16740-16740/? A/DEBUG:     #15 pc 0000000000a88fec  /data/app/com.example.embreetest-UVLKVq2jp6Zb3lP9EXgong==/base.apk (offset 0xee000) (embree::TaskScheduler::Task::run_internal(embree::TaskScheduler::Thread&)+124)
2020-10-13 06:36:46.180 16740-16740/? A/DEBUG:     #16 pc 0000000000a8927c  /data/app/com.example.embreetest-UVLKVq2jp6Zb3lP9EXgong==/base.apk (offset 0xee000) (embree::TaskScheduler::TaskQueue::execute_local_internal(embree::TaskScheduler::Thread&, embree::TaskScheduler::Task*)+76)
2020-10-13 06:36:46.180 16740-16740/? A/DEBUG:     #17 pc 0000000000a8a610  /data/app/com.example.embreetest-UVLKVq2jp6Zb3lP9EXgong==/base.apk (offset 0xee000) (embree::TaskScheduler::wait()+64)
2020-10-13 06:36:46.180 16740-16740/? A/DEBUG:     #18 pc 0000000000a88fec  /data/app/com.example.embreetest-UVLKVq2jp6Zb3lP9EXgong==/base.apk (offset 0xee000) (embree::TaskScheduler::Task::run_internal(embree::TaskScheduler::Thread&)+124)
2020-10-13 06:36:46.180 16740-16740/? A/DEBUG:     #19 pc 0000000000a8927c  /data/app/com.example.embreetest-UVLKVq2jp6Zb3lP9EXgong==/base.apk (offset 0xee000) (embree::TaskScheduler::TaskQueue::execute_local_internal(embree::TaskScheduler::Thread&, embree::TaskScheduler::Task*)+76)
2020-10-13 06:36:46.180 16740-16740/? A/DEBUG:     #20 pc 0000000000a8a610  /data/app/com.example.embreetest-UVLKVq2jp6Zb3lP9EXgong==/base.apk (offset 0xee000) (embree::TaskScheduler::wait()+64)
2020-10-13 06:36:46.180 16740-16740/? A/DEBUG:     #21 pc 0000000000a88fec  /data/app/com.example.embreetest-UVLKVq2jp6Zb3lP9EXgong==/base.apk (offset 0xee000) (embree::TaskScheduler::Task::run_internal(embree::TaskScheduler::Thread&)+124)
2020-10-13 06:36:46.180 16740-16740/? A/DEBUG:     #22 pc 0000000000a8927c  /data/app/com.example.embreetest-UVLKVq2jp6Zb3lP9EXgong==/base.apk (offset 0xee000) (embree::TaskScheduler::TaskQueue::execute_local_internal(embree::TaskScheduler::Thread&, embree::TaskScheduler::Task*)+76)
2020-10-13 06:36:46.180 16740-16740/? A/DEBUG:     #23 pc 0000000000a8a610  /data/app/com.example.embreetest-UVLKVq2jp6Zb3lP9EXgong==/base.apk (offset 0xee000) (embree::TaskScheduler::wait()+64)
2020-10-13 06:36:46.180 16740-16740/? A/DEBUG:     #24 pc 0000000000a88fec  /data/app/com.example.embreetest-UVLKVq2jp6Zb3lP9EXgong==/base.apk (offset 0xee000) (embree::TaskScheduler::Task::run_internal(embree::TaskScheduler::Thread&)+124)
2020-10-13 06:36:46.180 16740-16740/? A/DEBUG:     #25 pc 0000000000a8927c  /data/app/com.example.embreetest-UVLKVq2jp6Zb3lP9EXgong==/base.apk (offset 0xee000) (embree::TaskScheduler::TaskQueue::execute_local_internal(embree::TaskScheduler::Thread&, embree::TaskScheduler::Task*)+76)
2020-10-13 06:36:46.180 16740-16740/? A/DEBUG:     #26 pc 0000000000a8a610  /data/app/com.example.embreetest-UVLKVq2jp6Zb3lP9EXgong==/base.apk (offset 0xee000) (embree::TaskScheduler::wait()+64)
2020-10-13 06:36:46.180 16740-16740/? A/DEBUG:     #27 pc 0000000000a88fec  /data/app/com.example.embreetest-UVLKVq2jp6Zb3lP9EXgong==/base.apk (offset 0xee000) (embree::TaskScheduler::Task::run_internal(embree::TaskScheduler::Thread&)+124)
2020-10-13 06:36:46.180 16740-16740/? A/DEBUG:     #28 pc 0000000000a8927c  /data/app/com.example.embreetest-UVLKVq2jp6Zb3lP9EXgong==/base.apk (offset 0xee000) (embree::TaskScheduler::TaskQueue::execute_local_internal(embree::TaskScheduler::Thread&, embree::TaskScheduler::Task*)+76)
2020-10-13 06:36:46.180 16740-16740/? A/DEBUG:     #29 pc 0000000000a8a610  /data/app/com.example.embreetest-UVLKVq2jp6Zb3lP9EXgong==/base.apk (offset 0xee000) (embree::TaskScheduler::wait()+64)
2020-10-13 06:36:46.180 16740-16740/? A/DEBUG:     #30 pc 0000000000a88fec  /data/app/com.example.embreetest-UVLKVq2jp6Zb3lP9EXgong==/base.apk (offset 0xee000) (embree::TaskScheduler::Task::run_internal(embree::TaskScheduler::Thread&)+124)
2020-10-13 06:36:46.180 16740-16740/? A/DEBUG:     #31 pc 0000000000a8927c  /data/app/com.example.embreetest-UVLKVq2jp6Zb3lP9EXgong==/base.apk (offset 0xee000) (embree::TaskScheduler::TaskQueue::execute_local_internal(embree::TaskScheduler::Thread&, embree::TaskScheduler::Task*)+76)
2020-10-13 06:36:46.180 16740-16740/? A/DEBUG:     #32 pc 0000000000a89de8  /data/app/com.example.embreetest-UVLKVq2jp6Zb3lP9EXgong==/base.apk (offset 0xee000) (embree::TaskScheduler::thread_loop(unsigned long)+632)
2020-10-13 06:36:46.180 16740-16740/? A/DEBUG:     #33 pc 0000000000a895a8  /data/app/com.example.embreetest-UVLKVq2jp6Zb3lP9EXgong==/base.apk (offset 0xee000) (embree::TaskScheduler::ThreadPool::thread_loop(unsigned long)+264)
2020-10-13 06:36:46.180 16740-16740/? A/DEBUG:     #34 pc 0000000000a84684  /data/app/com.example.embreetest-UVLKVq2jp6Zb3lP9EXgong==/base.apk (offset 0xee000) (embree::threadStartup(embree::ThreadStartupData*)+20)
2020-10-13 06:36:46.180 16740-16740/? A/DEBUG:     #35 pc 0000000000099508  /system/lib64/libc.so (__pthread_start(void*)+36)
2020-10-13 06:36:46.180 16740-16740/? A/DEBUG:     #36 pc 0000000000023e18  /system/lib64/libc.so (__start_thread+68)

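This is likely caused by dispatching to functions of the AVX code path that are not present in the build: frame #00 has pc 0x0, i.e. `AccelN::intersect8` calls through a null function pointer.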

Status: A fix is being worked on.
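The backtrace is consistent with a call through an empty entry in a per-ISA function table. The sketch below is a hypothetical illustration only: the `Intersectors`, `RayPacket8`, `safeIntersect8` and `dummyIntersect8` names are invented here and do not reflect Embree's actual `Accel::Intersectors` layout. It shows how an unset dispatch entry leads to a jump to address 0, and how a null check turns that into a reportable error instead of a SIGSEGV.

```cpp
#include <cstdio>

// Hypothetical 8-wide ray packet; only the field needed for the demo.
struct RayPacket8 { float tfar[8]; };

using Intersect8Fn = void (*)(RayPacket8&);

// Hypothetical per-ISA dispatch table: an entry stays null when no kernel
// for that packet width was compiled in for the selected ISA.
struct Intersectors {
  Intersect8Fn intersect8 = nullptr;
};

// Calls the packet kernel only if one was registered. Calling a null
// pointer directly would jump to address 0 ("pc 0000000000000000").
bool safeIntersect8(const Intersectors& isect, RayPacket8& rays) {
  if (isect.intersect8 == nullptr) {
    std::fprintf(stderr, "intersect8 is not available for the selected ISA\n");
    return false;
  }
  isect.intersect8(rays);
  return true;
}

// Example kernel used only to demonstrate the registered case.
static void dummyIntersect8(RayPacket8& rays) {
  for (float& t : rays.tfar) t = 1.0f;
}
```

A guard like this only makes the failure visible; the actual fix is presumably to register (or not advertise) the 8-wide packet kernels consistently with the ISAs that were compiled in.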

Noise(wrong intersection result) in curve test

It happens at least since v3.5.0.

The aarch64 result of the curve test contains some noise, probably due to a NEON/FP math issue.

[Render: x86_64 v3.7.0]

[Render: aarch64 v3.5.0]

[Render: aarch64 master (v3.6.1 + iOS patch)]

Internal compiler error (implicit_conv_expr) when using gcc 8 to 10

The issue was found in the discussion at #34 (comment)

On ARM Linux with gcc 8 to gcc 10, an internal compiler error occurs at the following line
(FYI, gcc 7.5 works fine):

/home/syoyo/work/embree-aarch64-neon2x/common/tasking/taskschedulerinternal.h:141:27: internal compiler error: unexpected expression ‘(std::__atomic_base<long unsigned int>::__int_type)((embree::TaskScheduler::TaskQueue*)this)->embree::TaskScheduler::TaskQueue::right’ of kind implicit_conv_expr
  141 |         new (&(tasks[right])) Task(func,thread.task,oldStackPtr,size);
      |                           ^
Please submit a full bug report,
with preprocessed source if appropriate.
See <file:///usr/share/doc/gcc-10/README.Bugs> for instructions.

The problem is that `std::atomic<size_t> right` is used directly as an array index.

The code is tricky: if we want the atomic operation to be guaranteed when creating a Task, it would be better to take a lock instead of using the std::atomic variable directly.

I can at least confirm that calling load() explicitly, e.g. `tasks[right.load()]`, avoids the error, but I am not sure this is the right way to suppress the internal compiler error.
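A minimal sketch of the `right.load()` workaround, using hypothetical, simplified `Task` and `TaskQueue` types (not Embree's actual classes). Reading the atomic into a plain `size_t` first means the subscript no longer contains an implicit `std::atomic` to `size_t` conversion, which is the expression kind GCC 8 to 10 reported. The sketch assumes that only the owning thread pushes to its queue; it does not address the locking question raised above.

```cpp
#include <atomic>
#include <cstddef>
#include <new>

// Hypothetical stand-in for a task record (trivially destructible).
struct Task {
  int payload = 0;
  Task() = default;
  explicit Task(int p) : payload(p) {}
};

// Hypothetical fixed-size task stack with an atomic top index.
struct TaskQueue {
  static constexpr std::size_t kCapacity = 64;
  Task tasks[kCapacity];
  std::atomic<std::size_t> right{0};

  // Assumes a single producer (the owning thread).
  bool push(int payload) {
    const std::size_t idx = right.load();  // explicit load: idx is a plain size_t
    if (idx >= kCapacity) return false;    // stack full
    new (&tasks[idx]) Task(payload);       // subscript uses the plain value
    right.store(idx + 1);                  // publish the new top
    return true;
  }
};
```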

SSE2NEON mapping for _mm_sqrt_ps(0) does not return 0

The NEON-mapped function `_mm_sqrt_ps` does not handle zero inputs properly: instead of returning 0, it returns NaN. This is caused by the underlying function `vrsqrteq_f32`, which returns +inf for an input of 0.

This affects `CylinderN::intersect`: `const vfloat<N> Q = sqrt(D);` does not work properly when D = 0, resulting in a false miss.

This can be reproduced on arm64 (not on x64) with the following ray in the interpolation scene:

  {
    RTCIntersectContext context;
    rtcInitIntersectContext(&context);

    RTCRayHit rayhitdebug;
    rayhitdebug.ray.org_x = -0.000000476837158;
    rayhitdebug.ray.org_y = 4.99999952;
    rayhitdebug.ray.org_z = -7.07106781;

    rayhitdebug.ray.dir_x = 0.645600736;
    rayhitdebug.ray.dir_y = -0.512669444;
    rayhitdebug.ray.dir_z = 0.566012084;

    rayhitdebug.ray.time = 0.0;
    rayhitdebug.ray.tnear = 0.0;
    rayhitdebug.ray.tfar = 100000.0;
    rayhitdebug.ray.mask = -1;
    rayhitdebug.ray.id = 0;
    rayhitdebug.ray.flags = 0;

    Vec3fa xx(-1, 0, 1);
    Vec3fa yy = rsqrt(xx);
    Vec3fa zz = sqrt(xx);

    rtcIntersect1(g_scene, &context, &rayhitdebug);
  }
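The failure mode can be modeled in portable C++. This sketch assumes the mapped `sqrt` is formed as `x * rsqrt(x)`; that formula, the `float4` alias, and the `rsqrtEstimate`, `sqrtNaive` and `sqrtZeroSafe` helpers are assumptions for illustration, not embree-aarch64's code. With `rsqrt(0) = +inf`, the product is `0 * +inf = NaN`. The patched variant forces zero lanes back to 0, which a NEON implementation could express as a compare-and-select (`vceqq_f32` + `vbslq_f32`). On AArch64, `vsqrtq_f32` computes an exact vector square root and avoids the problem entirely; which route the actual fix takes is not stated here.

```cpp
#include <array>
#include <cmath>
#include <cstddef>

using float4 = std::array<float, 4>;

// Scalar stand-in for vrsqrteq_f32: like the NEON estimate, it yields +inf
// for an input of 0 (computed exactly here rather than estimated).
inline float rsqrtEstimate(float x) { return 1.0f / std::sqrt(x); }

// Assumed naive mapping: sqrt(x) = x * rsqrt(x). For x == 0 this is
// 0 * +inf, which is NaN.
float4 sqrtNaive(const float4& x) {
  float4 r{};
  for (std::size_t i = 0; i < 4; ++i) r[i] = x[i] * rsqrtEstimate(x[i]);
  return r;
}

// Patched mapping: lanes that are exactly zero are forced to 0, mirroring
// a lane-wise compare-and-select in SIMD code.
float4 sqrtZeroSafe(const float4& x) {
  float4 r{};
  for (std::size_t i = 0; i < 4; ++i) {
    const float v = x[i] * rsqrtEstimate(x[i]);
    r[i] = (x[i] == 0.0f) ? 0.0f : v;
  }
  return r;
}
```

In the cylinder test this corresponds to `D = 0`: the naive path yields NaN for `Q`, every subsequent comparison is false, and the ray is reported as a miss.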

SubD GridSOA segfaults on aarch64 (v3.11.0)

branch: aarch64-v3.11.0
os: Linux aarch64
compiler: clang 9

`verify` segfaults at NEON.enable_disable_geometry:

 NEON.enable_disable_geometry ...AddressSanitizer:DEADLYSIGNAL
=================================================================
==19473==ERROR: AddressSanitizer: SEGV on unknown address 0x0000a6d6feec (pc 0x007fabf124a0 bp 0x007ffe6d2e70 sp 0x007ffe6d1420 T0)
AddressSanitizer:DEADLYSIGNAL
==19473==The signal is caused by a READ memory access.
AddressSanitizer:DEADLYSIGNAL
AddressSanitizer:DEADLYSIGNAL
AddressSanitizer:DEADLYSIGNAL
    #0 0x7fabf1249c in embree::sse2::GridSOA::root(unsigned long) const /home/syoyo/work/embree-aarch64/kernels/bvh/../geometry/grid_soa.h:62:97
    #1 0x7fabf1249c in embree::sse2::SubdivPatch1Intersector1::processLazyNode(embree::sse2::SubdivPatch1Precalculations<embree::sse2::GridSOAIntersector1::Precalculations>&, embree::IntersectContext*, embree::sse2::GridSOA const*, unsigned long&) /home/syoyo/work/embree-aarch64/kernels/bvh/../geometry/subdivpatch1_intersector.h:40:27
    #2 0x7fabf1249c in void embree::sse2::SubdivPatch1Intersector1::intersect<4, 4, true>(embree::Accel::Intersectors const*, embree::sse2::SubdivPatch1Precalculations<embree::sse2::GridSOAIntersector1::Precalculations>&, embree::RayHitK<1>&, embree::IntersectContext*, embree::sse2::GridSOA const*, unsigned long, embree::sse2::TravRay<4, 4, true> const&, unsigned long&) /home/syoyo/work/embree-aarch64/kernels/bvh/../geometry/subdivpatch1_intersector.h:50:30
    #3 0x7fabf1249c in embree::sse2::BVHNIntersector1<4, 1, true, embree::sse2::SubdivPatch1Intersector1>::intersect(embree::Accel::Intersectors const*, embree::RayHitK<1>&, embree::IntersectContext*) /home/syoyo/work/embree-aarch64/kernels/bvh/bvh_intersector1.cpp:109:9
    #4 0x7fab66fd5c in embree::Accel::Intersectors::intersect(RTCRayHit&, embree::IntersectContext*) /home/syoyo/work/embree-aarch64/kernels/common/accel.h:307:9
    #5 0x7fab66fd5c in rtcIntersect1 /home/syoyo/work/embree-aarch64/kernels/common/rtcore.cpp:491:25
    #6 0x56de88 in embree::EnableDisableGeometryTest::run(embree::VerifyApplication*, bool) /home/syoyo/work/embree-aarch64/tutorials/verify/verify.cpp:1695:11
    #7 0x4d5150 in embree::VerifyApplication::Test::execute(embree::VerifyApplication*, bool) /home/syoyo/work/embree-aarch64/tutorials/verify/verify.cpp:420:11
    #8 0x528a7c in embree::VerifyApplication::TestGroup::execute(embree::VerifyApplication*, bool)::$_0::operator()(unsigned long) const /home/syoyo/work/embree-aarch64/tutorials/verify/verify.cpp:673:38

verify fails to execute on M1 macOS

I got an M1 Mac and compiled embree-aarch64, but `verify` is killed immediately without any error message.

OS: macOS 11.0.1
branch: main (v3.12.1-aarch64)
CMake 3.19 (x86-64)

% ./scripts/bootstrap-arm64-macos.sh 
-- The C compiler identification is AppleClang 12.0.0.12000032
-- The CXX compiler identification is AppleClang 12.0.0.12000032
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found Git: /usr/bin/git (found version "2.24.3 (Apple Git-128)") 
CMake Deprecation Warning at CMakeLists.txt:74 (cmake_policy):
  The OLD behavior for policy CMP0072 will be removed from a future version
  of CMake.

  The cmake-policies(7) manual explains that the OLD behaviors of all
  policies are deprecated and that a policy should be set to OLD only under
  specific short-term circumstances.  Projects should be ported to the NEW
  behavior and not rely on setting a policy to OLD.


-- building for Apple Silicon Mac
-- Emulation of AVX2 by 2xNEON
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE  
-- OpenImageIO not found in your environment. You can 1) install
                              via your OS package manager, or 2) install it
                              somewhere on your machine and point OPENIMAGEIO_ROOT to it. (missing: OPENIMAGEIO_INCLUDE_DIR OPENIMAGEIO_LIBRARY) 
-- Could NOT find JPEG (missing: JPEG_LIBRARY) (found version "80")
-- Could NOT find PNG (missing: PNG_LIBRARIES) 
-- Configuring done
-- Generating done
-- Build files have been written to: /Users/syoyo/work/embree-aarch64/build-arm64-macos
...
cd /Users/syoyo/work/embree-aarch64/build-arm64-macos/tutorials/verify && /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/c++ -DBUILD_IOS -DEMBREE_TARGET_AVX -DEMBREE_TARGET_AVX2 -DEMBREE_TARGET_SSE2 -DEMBREE_TARGET_SSE42 -DNEON_AVX2_EMULATION -DTASKING_INTERNAL -I/Users/syoyo/work/embree-aarch64/kernels/../include -fsigned-char -Wall -Wformat -Wformat-security -fPIC -std=c++17 -fvisibility=hidden -fvisibility-inlines-hidden -fno-strict-aliasing -fno-tree-vectorize -D_FORTIFY_SOURCE=2 -mmacosx-version-min=10.7 -stdlib=libc++  -fsigned-char -g -DNDEBUG -O3 -arch arm64 -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX11.0.sdk  -D__SSE4_2__  -D__SSE4_1__   -o CMakeFiles/verify.dir/verify.cpp.o -c /Users/syoyo/work/embree-aarch64/tutorials/verify/verify.cpp
% file verify 
verify: Mach-O 64-bit executable arm64

% ./verify 
zsh: killed     ./verify

% lldb verify
(lldb) target create "verify"
zsh: killed     lldb verify

@Developer-Ecosystem-Engineering, have you succeeded in running verify on your ARM macOS system?

Native aarch64 build on Travis CI

Currently embree-aarch64 does not have a native ARM build on CI, which prevents running various tests (e.g. verify) on CI.
(We currently use a qemu aarch64 environment in an x86 build job, which takes too much time to build.)

GitHub Actions does not provide native ARM machines on its CI servers.

Fortunately, Travis CI has started to provide the ARM architecture for OSS repositories as a beta program:

https://docs.travis-ci.com/user/multi-cpu-architectures/

So we should write a Travis build script for arm64 to get a better ARM build and testing cycle.

verify NEON.update failure in recent commits (at least since v3.11.0)

NEON.update in verify started to fail at some point (reproducible at least from v3.11.0).

OS: aarch64 Linux (Jetson AGX)
Build config: ./scripts/build-aarch64-linux.sh

Embree Ray Tracing Kernels 3.11.0 (4a0f1750c6848437bf0d2f83c10863990caadda1)
  Compiler  : GCC 7.5.0
  Build     : Release 
  Platform  : Linux (32bit)
  CPU       : Unknown CPU ( MRA MRA MRA)
   Threads  : 8
   ISA      : SSE SSE2 NEON 
   Targets  : 
   MXCSR    : FTZ=0, DAZ=0
  Config
    Threads : default
    ISA     : SSE SSE2 NEON 
    Targets :  (supported)
              SSE2  (compile time enabled)
    Features: intersection_filter 
    Tasking : internal_tasking_system 

================================================================================
  WARNING: "Flush to Zero" or "Denormals are Zero" mode not enabled 
           in the MXCSR control and status register. This can have a severe 
           performance impact. Please enable these modes for each application 
           thread the following way:

           #include "xmmintrin.h"
           #include "pmmintrin.h"

           _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
           _MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON);
================================================================================


 [PASSED]
                                                       fast_allocator_regression_test ... [PASSED]
                                                         motion_derivative_regression ... [PASSED]
                                                            collision_regression_test ... [PASSED]
                                                                cache_regression_test ... [PASSED]
                                                         parallel_for_regression_test ... [PASSED]
                                                      parallel_reduce_regression_test ... [PASSED]
                                                       parallel_prefix_sum_regression ... [PASSED]
                                                     parallel_for_for_regression_test ... [PASSED]
                                          parallel_for_for_prefix_sum_regression_test ... [PASSED]
                                                   parallel_partition_regression_test ... [PASSED]
                                                           RadixSortRegressionTestU32 ... [PASSED]
                                                           RadixSortRegressionTestU64 ... [PASSED]
                                                         parallel_set_regression_test ... [PASSED]
                                                         parallel_map_regression_test ... [PASSED]
                                                           parallel_filter_regression ... [PASSED]
                                                          barrier_sys_regression_test ... [PASSED]
                                                                NEON.multiple_devices ... [PASSED]
                                                                      NEON.types_test ... [PASSED]
                                                                      NEON.get_bounds ...++++++ [PASSED]
                                                               NEON.get_linear_bounds ...++++++ [PASSED]
                                                                   NEON.get_user_data ... [PASSED]
                                                                   NEON.buffer_stride ...++++++++ [PASSED]
                                                                     NEON.empty_scene ...++++++++++ [PASSED]
                                                                  NEON.empty_geometry ...++++++++++ [PASSED]
                                                                           NEON.build ...++++++++++ [PASSED]
                                                          NEON.overlapping_primitives ...++++++++++ [PASSED]
                                                             NEON.new_delete_geometry ................................................................................................................................+++++ [PASSED]
                                                                NEON.user_geometry_id ...+++++ [PASSED]
                                                         NEON.enable_disable_geometry ...+++++ [PASSED]
                                                                          NEON.update ...-----+++++-++-+++++-++-+++--+-+-+---+-++-+-+--+++-+-+---++++---+-+-+-++---+-+-++-+--+-++++++++++++++++++++++++++-++-++++++++++++++++++++++++++-++--++++-+++++++++++-+++-++++++++++++++++++++++++++-++-++-+-++++++++++++++-+++---+++++++++++++---++++--+-+-+-++++++++++++++-+--++-++++++++++++++++++++++++++++++++++++++-+--+++--+++----+----+-+---+---++--++--+-+++++++++++++++++++++++++++++++++++++++++++++++-+---+++++-+++-+---+++--++++---+-+---+--+-+-+++-++++++++++++++++++++++++-+++++-+-++++++++++++++++++++++++++++++++++++++++ [FAILED]
                                                              NEON.build_garbage_geom ..................................................... [PASSED]
...
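The log above also prints Embree's FTZ/DAZ warning, whose advice targets the x86 MXCSR register. AArch64 has no MXCSR; the corresponding control is the FZ bit (bit 24) of FPCR, which flushes both denormal inputs and outputs to zero, so it covers FTZ and DAZ together. The sketch below (the `enableFlushToZero` name is hypothetical, and GCC/Clang inline-asm syntax is assumed) sets that bit on AArch64 and falls back to the macros from the warning on x86-64. Like the MXCSR setting, it applies only to the calling thread. Whether embree-aarch64 maps the MXCSR macros onto FPCR is not shown in this log.

```cpp
#include <cstdint>

#if defined(__aarch64__)
// AArch64: set FPCR.FZ (bit 24) so denormal inputs/outputs become zero.
inline void enableFlushToZero() {
  std::uint64_t fpcr;
  asm volatile("mrs %0, fpcr" : "=r"(fpcr));
  fpcr |= (std::uint64_t{1} << 24);
  asm volatile("msr fpcr, %0" : : "r"(fpcr));
}
#elif defined(__x86_64__) || defined(_M_X64)
#include <xmmintrin.h>
#include <pmmintrin.h>
// x86-64: the MXCSR settings recommended by the warning.
inline void enableFlushToZero() {
  _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
  _MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON);
}
#else
#error "enableFlushToZero: unsupported architecture in this sketch"
#endif
```

Call it at the start of each thread that traces rays; with the internal tasking system that would mean each worker thread, not just the main thread.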

curve_intersector_virtual.cpp takes too much time to compile (aarch64-v3.8.0)

branch: aarch64-v3.8.0 (port of Intel Embree v3.8.0)

geometry/curve_intersector_virtual.cpp takes too long to compile with gcc at -O2 (5 minutes or more, even when cross compiling on a TR 1950X). I can confirm the issue at least with gcc 7.4 and 8.0.

clang (clang-9) also takes some time (a couple of minutes) to compile geometry/curve_intersector_virtual.cpp.

We recommend using clang for the aarch64 target for the time being.

Watertight test fails in `verify`

branch: aarch64-v3.11.0
flags: EMBREE_RAY_PACKETS=On
compiler: clang-9
OS: Ubuntu 18.04(Jetson)

 [PASSED]
                                                       fast_allocator_regression_test ... [PASSED]
                                                         motion_derivative_regression ... [PASSED]
                                                            collision_regression_test ... [PASSED]
                                                                cache_regression_test ... [PASSED]
                                                         parallel_for_regression_test ... [PASSED]
                                                      parallel_reduce_regression_test ... [PASSED]
                                                       parallel_prefix_sum_regression ... [PASSED]
                                                     parallel_for_for_regression_test ... [PASSED]
                                          parallel_for_for_prefix_sum_regression_test ... [PASSED]
                                                   parallel_partition_regression_test ... [PASSED]
                                                           RadixSortRegressionTestU32 ... [PASSED]
                                                           RadixSortRegressionTestU64 ... [PASSED]
                                                         parallel_set_regression_test ... [PASSED]
                                                         parallel_map_regression_test ... [PASSED]
                                                           parallel_filter_regression ... [PASSED]
                                                          barrier_sys_regression_test ... [PASSED]
                                                                NEON.multiple_devices ... [PASSED]
                                                                      NEON.types_test ... [PASSED]
                                                                      NEON.get_bounds ...++++++ [PASSED]
                                                               NEON.get_linear_bounds ...++++++ [PASSED]
                                                                   NEON.get_user_data ... [PASSED]
                                                                   NEON.buffer_stride ...++++++++ [PASSED]
                                                                     NEON.empty_scene ...++++++++++ [PASSED]
                                                                  NEON.empty_geometry ...++++++++++ [PASSED]
                                                                           NEON.build ...++++++++++ [PASSED]
                                                          NEON.overlapping_primitives ...++++++++++ [PASSED]
                                                             NEON.new_delete_geometry ................................................................................................................................+++++ [PASSED]
                                                                NEON.user_geometry_id ...+++++ [PASSED]
                                                         NEON.enable_disable_geometry ...+++++ [PASSED]
                                                                          NEON.update ...++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ [PASSED]
                                                              NEON.build_garbage_geom ..................................................... [PASSED]
                                                           NEON.interpolate.triangles ...++++++ [PASSED]
                                                                NEON.interpolate.grid ...++++++ [PASSED]
                                                              NEON.interpolate.subdiv ...++++++ [PASSED]
                                                                NEON.interpolate.hair ...++++++ [PASSED]
                                                                    NEON.triangle_hit ...++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ [PASSED]
                                                                        NEON.quad_hit ...++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ [PASSED]
                                                             NEON.intersection_filter ...++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ [PASSED]
                                                                      NEON.instancing ...++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ [PASSED]
                                                                   NEON.inactive_rays ...++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ [PASSED]
                                                            NEON.watertight_triangles ...++++++++--------++++++++--------++++++++--------++++++++--------++++++++--------++++++++-------- [FAILED]
                                                         NEON.watertight_triangles_mb ...++++++++-+-------+++++++-----+++++---+++--++----++++---++---++-+---++++--+-++----+-+--+-+++----- [FAILED]
                                                                NEON.watertight_quads ...++++++++--------++++++++--------++++++++--------+++++++-------+++-++++-+------+++-++++--------++ [FAILED]
                                                             NEON.watertight_quads_mb ...++++++++--++------++++++-+---+++-----+++++-+------+++++-+----+++-+---+++++----+--++++--+--+----- [FAILED]
                                                                NEON.watertight_grids ...++++++++--------++++++++--------++++++++--------++++++---+-----+++++++-++--------++++-++++------ [FAILED]
                                                             NEON.watertight_grids_mb ...++++++++--------++++++++--------++++++++------+--+++++++-----+-+--++++++------+-+-++++++-------- [FAILED]
                                                               NEON.watertight_subdiv ...!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! [PASSED]
                                                              NEON.ray_alignment_test ...++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ [PASSED]
                                                                     NEON.point_query ...+++++++++++++++++++++++ [PASSED]
                                                               NEON.regression_static ................................. [PASSED]
                                                              NEON.regression_dynamic ................................. [PASSED]
                                                    NEON.regression_static_build_join ................................. [PASSED]
                                                   NEON.regression_dynamic_build_join ................................. [PASSED]
                                                NEON.regression_static_memory_monitor ............................................................... [PASSED]
                                               NEON.regression_dynamic_memory_monitor ............................................................... [PASSED]
                                                            NEON.geometry_state_tests ... [PASSED]
                                                   NEON.scene_modified_geometry_tests ... [PASSED]
                                                   NEON.sphere_filter_multi_hit_tests ... [PASSED]

                                                                         Tests passed: 4782
                                                                         Tests failed: 288
                                                             Tests failed and ignored: 96


real	9m49.698s
user	67m11.132s
sys	2m55.296s

FAILED: NEON.watertight_* tests
Suspicious: NEON.watertight_subdiv

support libc++ build on aarch64 linux + clang

Support the libc++ build for aarch64 Linux (native and cross-compiling).

The libc++ build works fine for Android and llvm-mingw (clang) builds, so it should also work well in an aarch64 Linux environment.
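When an aarch64 Linux build is switched to libc++ (for example with `clang++ -stdlib=libc++`), it helps to confirm which standard library a translation unit actually picked up. This sketch relies on the configuration macros each library defines in its headers: `_LIBCPP_VERSION` for libc++ and `__GLIBCXX__` for libstdc++. The function names are hypothetical and are not part of Embree or its build scripts.

```cpp
#include <cstddef>  // any standard header pulls in the library's config macros
#include <cstring>

// Hypothetical helper: names the C++ standard library this TU was built against.
const char* cxx_stdlib_name() {
#if defined(_LIBCPP_VERSION)
  return "libc++";
#elif defined(__GLIBCXX__)
  return "libstdc++";
#else
  return "other";
#endif
}

// Hypothetical helper: true when this translation unit uses libc++.
bool using_libcxx() { return std::strcmp(cxx_stdlib_name(), "libc++") == 0; }
```

Compiled with `clang++ -stdlib=libc++` it should report `libc++`; with the default GCC toolchain on Linux it typically reports `libstdc++`.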

Improve the accuracy of minps, maxps, rcp, rsqrt in NEON

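ARM NEON's reciprocal estimate and reciprocal square root estimate (vrecpeq_f32, vrsqrteq_f32) are less accurate than the corresponding SSE2 ops (_mm_rcp_ps, _mm_rsqrt_ps): roughly 8 bits versus roughly 12 bits of precision.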

https://qiita.com/sanmanyannyan/items/62bb5ce6ada975a7106a

We need 2 or 3 times more Newton-Raphson iterations to reach the same accuracy as rcp and rsqrt in the SSE2 code path (estimate + one round of Newton-Raphson).

Currently we use 2 iterations in the NEON code path (vrecpsq_f32, vrsqrtsq_f32).

float32x4_t reciprocal = vrecpeq_f32(a);

Related to the hole issue in #20.
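To illustrate why the NEON path needs extra refinement, this portable C++ sketch models the estimate-plus-Newton-Raphson scheme in scalar code. The 8-bit mantissa truncation standing in for vrecpeq_f32/vrsqrteq_f32 is an assumption (the real estimates are table-based), and all helper names are hypothetical. `recip_step` and `rsqrt_step` compute the same expressions as one lane of vrecpsq_f32 (2 - a*x) and vrsqrtsq_f32 ((3 - a*b) / 2). Each iteration roughly squares the relative error, so about 8 correct bits become about 16 after one step and reach float precision after two.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for a hardware estimate: keep only the top `bits`
// mantissa bits of x, giving a relative error below 2^-bits.
float truncate_mantissa(float x, int bits) {
  assert(bits >= 0 && bits <= 23);
  std::uint32_t u;
  std::memcpy(&u, &x, sizeof u);
  u &= ~((std::uint32_t(1) << (23 - bits)) - 1u);
  std::memcpy(&x, &u, sizeof u);
  return x;
}

// Same expression as one lane of vrecpsq_f32(a, x).
float recip_step(float a, float x) { return 2.0f - a * x; }

// Same expression as one lane of vrsqrtsq_f32(a, b).
float rsqrt_step(float a, float b) { return (3.0f - a * b) * 0.5f; }

// Reciprocal: ~8-bit estimate, then `iterations` steps of x <- x * (2 - a*x).
float rcp_nr(float a, int iterations) {
  assert(a != 0.0f);
  float x = truncate_mantissa(1.0f / a, 8);
  for (int i = 0; i < iterations; ++i) x = x * recip_step(a, x);
  return x;
}

// Reciprocal square root: ~8-bit estimate, then x <- x * (3 - a*x*x) / 2.
float rsqrt_nr(float a, int iterations) {
  assert(a > 0.0f);
  float x = truncate_mantissa(1.0f / std::sqrt(a), 8);
  for (int i = 0; i < iterations; ++i) x = x * rsqrt_step(a * x, x);
  return x;
}
```

A NEON version would apply the same updates lane-wise, e.g. `x = vmulq_f32(x, vrecpsq_f32(a, x))` for the reciprocal and `x = vmulq_f32(x, vrsqrtsq_f32(vmulq_f32(a, x), x))` for the reciprocal square root.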

`verify` segfault on aarch64 linux

OS: Ubuntu 18.04.3 Linux
CPU: ARM a72(aarch64)
Build config : Use scripts/bootstrap-aarch64-linux.sh
Branch : master v3.6.1

./verify 
Cylinder test 3 failed: cylinder = Cylinder { p0 = (0, 0, 0), p1 = (1, 0, 0), r = 1}, ray = { 
  org = (0, 0, 0)
  dir = (1, 0, 0)
  near = 0
  far = inf
  time = 0
  mask = -1
  id = 0
  flags = 0
  Ng = (-3.60566e+18, 1.77965e-43, -1.69402e-24)  u = 1.77965e-43
  v = -8.17928e-33
  primID = 85
  geomID = 4294967295
  instID = 0
}, hit = 1, t = [-295.603; 295.603]
Cylinder test 4 failed: cylinder = Cylinder { p0 = (0, 0, 0), p1 = (1, 0, 0), r = 1}, ray = { 
  org = (0, 0, 0)
  dir = (-1, 0, 0)
  near = 0
  far = inf
  time = 0
  mask = -1
  id = 0
  flags = 0
  Ng = (-3.60566e+18, 1.77965e-43, -1.69402e-24)  u = 1.77965e-43
  v = -8.17928e-33
  primID = 85
  geomID = 4294967295
  instID = 0
}, hit = 1, t = [-295.603; 295.603]
verify: /mnt/data/work/embree-aarch64/kernels/common/device.cpp:67: embree::Device::Device(const char*): Assertion `isa::Cylinder::verify()' failed.
Aborted (core dumped)
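Cylinder tests 3 and 4 above shoot a ray from a point on the axis of the cylinder p0 = (0, 0, 0), p1 = (1, 0, 0), r = 1 straight along that axis, and report t = [-295.603; 295.603]. For such a ray the leading coefficient of the intersection quadratic is exactly zero, so a code path that divides by it (or multiplies by an approximate reciprocal of it) may be fragile. The sketch below is our own ray/infinite-cylinder overlap test, not Embree's isa::Cylinder code; treating the cylinder as infinite and the names `V3` and `intersect_infinite_cylinder` are assumptions for illustration. It handles the parallel case explicitly: a line lying inside the cylinder overlaps it over the whole interval [-inf, inf].

```cpp
#include <cassert>
#include <cmath>
#include <limits>

struct V3 { float x, y, z; };

static V3 sub(V3 a, V3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static float dot(V3 a, V3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Removes the component of v along axis (axis_len2 = dot(axis, axis)).
static V3 reject(V3 v, V3 axis, float axis_len2) {
  const float s = dot(v, axis) / axis_len2;
  return {v.x - s * axis.x, v.y - s * axis.y, v.z - s * axis.z};
}

// Hypothetical helper: overlap of the line org + t*dir with the infinite
// cylinder of radius r around the axis through p0 and p1.
// Returns false if there is no overlap, otherwise writes [t0, t1].
bool intersect_infinite_cylinder(V3 org, V3 dir, V3 p0, V3 p1, float r,
                                 float& t0, float& t1) {
  const V3 axis = sub(p1, p0);
  const float axis_len2 = dot(axis, axis);
  assert(axis_len2 > 0.0f);
  // Work in the plane perpendicular to the axis: a 2D circle test.
  const V3 o = reject(sub(org, p0), axis, axis_len2);
  const V3 d = reject(dir, axis, axis_len2);
  const float A = dot(d, d);
  const float B = 2.0f * dot(o, d);
  const float C = dot(o, o) - r * r;
  if (A == 0.0f) {
    // Direction parallel to the axis: inside for every t, or never.
    if (C > 0.0f) return false;
    t0 = -std::numeric_limits<float>::infinity();
    t1 = std::numeric_limits<float>::infinity();
    return true;
  }
  const float D = B * B - 4.0f * A * C;
  if (D < 0.0f) return false;
  const float Q = std::sqrt(D);
  t0 = (-B - Q) / (2.0f * A);
  t1 = (-B + Q) / (2.0f * A);
  return true;
}
```

With org = (0, 0, 0), dir = (1, 0, 0) and the cylinder from the log, this sketch yields [-inf, inf]; with dir = (0, 1, 0) it yields [-1, 1]. An exact `A == 0.0f` check is the simplest choice here; nearly parallel rays would still produce a tiny A and very large but finite t values.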

Lots of `verify` test failures in v3.12.0 merge

OS: Jetson AGX(aarch64 linux)
Build config: scripts/bootstrap-aarch64-linux.sh

After merging Intel's v3.12.0, many verify tests fail.

It looks like something changed between v3.11.0 and v3.12.0.

Embree Ray Tracing Kernels 3.12.0 (b2e495336782d9ff20e59bf1f45cc7633bda7c5f)
  Compiler  : GCC 7.5.0
  Build     : Release 
  Platform  : Linux (32bit)
  CPU       : Unknown CPU ( MRA MRA MRA)
   Threads  : 8
   ISA      : SSE SSE2 NEON 
   Targets  : 
   MXCSR    : FTZ=0, DAZ=0
  Config
    Threads : default
    ISA     : SSE SSE2 NEON 
    Targets :  (supported)
              SSE2  (compile time enabled)
    Features: intersection_filter 
    Tasking : internal_tasking_system 

================================================================================
  WARNING: "Flush to Zero" or "Denormals are Zero" mode not enabled 
           in the MXCSR control and status register. This can have a severe 
           performance impact. Please enable these modes for each application 
           thread the following way:

           #include "xmmintrin.h"
           #include "pmmintrin.h"

           _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
           _MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON);
================================================================================


 [PASSED]
                                                       fast_allocator_regression_test ... [PASSED]
                                                         motion_derivative_regression ... [PASSED]
                                                            collision_regression_test ... [PASSED]
                                                                cache_regression_test ... [PASSED]
                                                         parallel_for_regression_test ... [PASSED]
                                                      parallel_reduce_regression_test ... [PASSED]
                                                       parallel_prefix_sum_regression ... [PASSED]
                                                     parallel_for_for_regression_test ... [PASSED]
                                          parallel_for_for_prefix_sum_regression_test ... [PASSED]
                                                   parallel_partition_regression_test ... [PASSED]
                                                           RadixSortRegressionTestU32 ... [PASSED]
                                                           RadixSortRegressionTestU64 ... [PASSED]
                                                         parallel_set_regression_test ... [PASSED]
                                                         parallel_map_regression_test ... [PASSED]
                                                           parallel_filter_regression ... [PASSED]
                                                          barrier_sys_regression_test ... [PASSED]
                                                                NEON.multiple_devices ... [FAILED]
                                                                      NEON.types_test ... [PASSED]
                                                                      NEON.get_bounds ...---------------- [FAILED]
                                                               NEON.get_linear_bounds ...---------------- [FAILED]
                                                                   NEON.get_user_data ... [FAILED]
                                                                   NEON.buffer_stride ...-------- [FAILED]
                                                                     NEON.empty_scene ...---------- [FAILED]
                                                                  NEON.empty_geometry ...---------- [FAILED]
                                                                           NEON.build ...---------- [FAILED]
                                                          NEON.overlapping_primitives ...---------- [FAILED]
                                                             NEON.new_delete_geometry ...----- [FAILED]
                                                                NEON.user_geometry_id ...----- [FAILED]
                                                         NEON.enable_disable_geometry ...----- [FAILED]
                                                                          NEON.update ...---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- [FAILED]
                                                              NEON.build_garbage_geom ... [FAILED]
                                                           NEON.interpolate.triangles ...------ [FAILED]
                                                                NEON.interpolate.grid ...------ [FAILED]
                                                              NEON.interpolate.subdiv ...------ [FAILED]
                                                                NEON.interpolate.hair ...------ [FAILED]
                                                                    NEON.triangle_hit ...---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- [FAILED]
                                                                        NEON.quad_hit ...---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- [FAILED]
                                                             NEON.intersection_filter ...-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- [FAILED]
                                                                      NEON.instancing ...-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- [FAILED]
                                                                   NEON.inactive_rays ...---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- [FAILED]
                                                            NEON.watertight_triangles ...------------------------------------------------------------------------------------------------ [FAILED]
                                                         NEON.watertight_triangles_mb ...------------------------------------------------------------------------------------------------ [FAILED]
                                                                NEON.watertight_quads ...------------------------------------------------------------------------------------------------ [FAILED]
                                                             NEON.watertight_quads_mb ...------------------------------------------------------------------------------------------------ [FAILED]
                                                                NEON.watertight_grids ...------------------------------------------------------------------------------------------------ [FAILED]
                                                             NEON.watertight_grids_mb ...------------------------------------------------------------------------------------------------ [FAILED]
                                                               NEON.watertight_subdiv ...!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! [PASSED]
                                                              NEON.ray_alignment_test ...------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ [FAILED]
                                                                     NEON.point_query ...----------------------- [FAILED]
                                                               NEON.regression_static ... [FAILED]
                                                              NEON.regression_dynamic ... [FAILED]
                                                    NEON.regression_static_build_join ... [FAILED]
                                                   NEON.regression_dynamic_build_join ... [FAILED]
                                                NEON.regression_static_memory_monitor ... [FAILED]
                                               NEON.regression_dynamic_memory_monitor ... [FAILED]
                                                            NEON.geometry_state_tests ... [FAILED]
                                                   NEON.scene_modified_geometry_tests ... [FAILED]
                                                   NEON.sphere_filter_multi_hit_tests ... [PASSED]

                                                                         Tests passed: 19
                                                                         Tests failed: 5051
                                                             Tests failed and ignored: 96

The situation is the same when using this PR: #35
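The FTZ/DAZ warning in the v3.12.0 log above was printed on aarch64, which has no MXCSR register; the AArch64 counterpart is the FZ bit (bit 24) of the FPCR, which flushes both denormal inputs and outputs to zero. This sketch enables flush-to-zero for the calling thread on either architecture. It assumes GCC or Clang inline assembly on AArch64; whether embree-aarch64's MXCSR emulation reads or writes the FPCR is not shown in the log, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#if defined(__x86_64__) || defined(_M_X64)
#include <xmmintrin.h>  // _MM_SET_FLUSH_ZERO_MODE
#include <pmmintrin.h>  // _MM_SET_DENORMALS_ZERO_MODE
#endif

// Hypothetical helper: enables flush-to-zero for the calling thread.
// Returns false on targets this sketch does not handle.
bool enable_flush_to_zero() {
#if defined(__aarch64__) && (defined(__GNUC__) || defined(__clang__))
  // AArch64: set the FZ bit (bit 24) of the FPCR register.
  std::uint64_t fpcr;
  __asm__ __volatile__("mrs %0, fpcr" : "=r"(fpcr));
  fpcr |= (std::uint64_t(1) << 24);
  __asm__ __volatile__("msr fpcr, %0" : : "r"(fpcr));
  return true;
#elif defined(__x86_64__) || defined(_M_X64)
  // x86-64: the two MXCSR modes named in the Embree warning.
  _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
  _MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON);
  assert(_MM_GET_FLUSH_ZERO_MODE() == _MM_FLUSH_ZERO_ON);
  return true;
#else
  return false;
#endif
}

// Multiplies a denormal by 1.0f at run time; with FTZ/DAZ active the result is 0.
bool denormals_flushed() {
  volatile float tiny = 1e-39f;  // below FLT_MIN (~1.18e-38), i.e. a denormal
  volatile float one = 1.0f;
  const float product = tiny * one;
  return product == 0.0f;
}
```

Like the MXCSR modes, the FPCR setting is per thread, so it would have to be applied in every thread that calls into Embree.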

create releases

This Embree fork is very useful. I was able to use it with Blender on arm64 with very little effort.

To be able to use it in other software and installation scripts, it would be helpful if embree-aarch64 had official releases that could be downloaded as tar.gz archives by version number.

Thank you!

Code Changes that impact x64

Hi,

I would like to report the findings of a code review we have performed to assess the impact of this fork on x64. Please understand that the listing below should not be treated as a list of defects, but rather as the differences from the original code base. I would like to raise awareness of these changes and have a discussion about them.

We ran one elaborate performance test on the 3.11 state, which showed a low single-digit performance drop in comparison to the original code.

The following changes were identified. The screenshots show the Intel code (left) in comparison with the master branch of this repository (right).

1. sign in vec3fa.h

2. comparison intrinsics in vec3fa.h and vfloat_sse2.h

3. Change of mask computation in intersectNode in node_intersector1.h

4. Replacement of vreduce_min/max with reduce_min/max in vfloat8_avx.h

This code change was introduced in pull request #34.

5. Change to rcp_safe in vec3fa.h
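The screenshots for item 2 are not reproduced here. As far as we know, Intel's vfloat4_sse2.h implements `operator>` with the negated comparison `_mm_cmpnle_ps`; if the change replaces such negated comparisons with ordered ones like `_mm_cmpgt_ps`, the results differ only when an operand is NaN. This scalar model of a single lane (function names are hypothetical) illustrates that difference; whether it matters depends on whether NaNs can reach these comparisons in the traversal code.

```cpp
#include <cassert>
#include <limits>

// One lane of _mm_cmpnle_ps(a, b): "not (a <= b)". If either input is NaN,
// a <= b is false, so this lane reports true.
bool cmpnle_lane(float a, float b) { return !(a <= b); }

// One lane of _mm_cmpgt_ps(a, b): an ordered "a > b". If either input is
// NaN, this lane reports false.
bool cmpgt_lane(float a, float b) { return a > b; }

// Both agree on ordinary values and disagree as soon as a NaN is involved.
void check_nan_divergence() {
  const float nan = std::numeric_limits<float>::quiet_NaN();
  assert(cmpnle_lane(2.0f, 1.0f) == cmpgt_lane(2.0f, 1.0f));
  assert(cmpnle_lane(1.0f, 2.0f) == cmpgt_lane(1.0f, 2.0f));
  assert(cmpnle_lane(nan, 1.0f) != cmpgt_lane(nan, 1.0f));
}
```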

Has there been a strong reason for any of these code changes, and do we have confidence that they are all acceptable? Please let me know whether you regard any of these code changes as risky or avoidable. I will gladly prepare a pull request. Personally, I would favor reverting or ifdef-ing these points to be on the safe side.
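Regarding item 4 in the list above: our understanding of Intel's vfloat8_avx.h is that `vreduce_min` returns a vfloat8 with the horizontal minimum broadcast to all lanes (built from neighbour-pair, quad and half swaps), whereas `reduce_min` returns that minimum as a scalar. This scalar model uses `std::array<float, 8>` as a stand-in for vfloat8; the names `vreduce_min_sketch` and `reduce_min_sketch` are hypothetical. Both give the same value, so swapping one for the other preserves behavior only where the caller needs a scalar; if a vector is needed afterwards, the scalar has to be broadcast again.

```cpp
#include <algorithm>
#include <array>
#include <cassert>

using Lanes8 = std::array<float, 8>;

// Lane-wise min of v and a permutation of v where lane i reads lane (i ^ mask).
// mask = 1 swaps neighbouring pairs, 2 swaps pairs within each 4-lane half,
// 4 swaps the two 128-bit halves.
static Lanes8 min_with_xor_shuffle(const Lanes8& v, int mask) {
  assert(mask == 1 || mask == 2 || mask == 4);
  Lanes8 r{};
  for (int i = 0; i < 8; ++i) r[i] = std::min(v[i], v[i ^ mask]);
  return r;
}

// Vector-result reduction: after three butterfly stages every lane holds the minimum.
Lanes8 vreduce_min_sketch(const Lanes8& v) {
  return min_with_xor_shuffle(min_with_xor_shuffle(min_with_xor_shuffle(v, 1), 2), 4);
}

// Scalar-result reduction: take lane 0 of the broadcast result.
float reduce_min_sketch(const Lanes8& v) { return vreduce_min_sketch(v)[0]; }
```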

verify NEON seg faults

Branch: verify-neon https://github.com/lighttransport/embree-aarch64/tree/verify-neon

Running the NEON tests (which reuse the SSE2 tests) fails on aarch64 Linux.

...
                                                            collision_regression_test ... [PASSED]
                                                                cache_regression_test ... [PASSED]
                                                         parallel_for_regression_test ... [PASSED]
                                                      parallel_reduce_regression_test ... [PASSED]
                                                       parallel_prefix_sum_regression ... [PASSED]
                                                     parallel_for_for_regression_test ... [PASSED]
                                          parallel_for_for_prefix_sum_regression_test ... [PASSED]
                                                   parallel_partition_regression_test ... [PASSED]
                                                           RadixSortRegressionTestU32 ... [PASSED]
                                                           RadixSortRegressionTestU64 ... [PASSED]
                                                         parallel_set_regression_test ... [PASSED]
                                                         parallel_map_regression_test ... [PASSED]
                                                           parallel_filter_regression ... [PASSED]
                                                          barrier_sys_regression_test ... [PASSED]
                                                                NEON.multiple_devices ... [PASSED]
                                                                      NEON.types_test ... [PASSED]
                                                                      NEON.get_bounds ...++++++ [PASSED]
                                                               NEON.get_linear_bounds ...++++++ [PASSED]
                                                                   NEON.get_user_data ... [PASSED]
                                                                   NEON.buffer_stride ...++++++++ [PASSED]
                                                                     NEON.empty_scene ...++++++++++ [PASSED]
                                                                  NEON.empty_geometry ...++++++++++ [PASSED]
                                                                           NEON.build ...++++++++++ [PASSED]
                                                          NEON.overlapping_primitives ...++++++++++ [PASSED]
                                                             NEON.new_delete_geometry ................................................................................................................................+++++ [PASSED]
                                                                NEON.user_geometry_id ...+++++ [PASSED]
                                                         NEON.enable_disable_geometry ...+++++ [PASSED]
                                                                          NEON.update ...Segmentation fault (core dumped)

Configuration

CMAKE_BIN=cmake

rm -rf build

$CMAKE_BIN \
  -DCMAKE_BUILD_TYPE=RelWithDebInfo \
  -DEMBREE_ARM=On \
  -DEMBREE_ADDRESS_SANITIZER=Off \
  -DCMAKE_INSTALL_PREFIX=$HOME/local/embree3 \
  -DCMAKE_C_COMPILER=clang \
  -DCMAKE_CXX_COMPILER=clang++ \
  -DEMBREE_ISPC_SUPPORT=Off \
  -DEMBREE_TASKING_SYSTEM=Internal \
  -DEMBREE_TUTORIALS=Off \
  -DEMBREE_MAX_ISA=SSE2 \
  -DEMBREE_RAY_PACKETS=On \
  -Bbuild -H.

Road to intel embree v3.9.0 and beyond

  • Merge neon-fix into embree-aarch64 master (v3.6.1)
  • Create the aarch64-v3.8.0 branch from Intel Embree v3.8.0 (the most recent version as of writing this issue, 2nd March 2020)
  • Merge neon-fix into aarch64-v3.8.0
  • Merge @maikschulze's embree/embree#275 and embree/embree#276
  • Fix verify segfault: #22
  • Run tests, then merge aarch64-v3.8.0 into master
  • Merged v3.9.0 from Intel Embree

Note

Some work is still required to improve the NEON code path, so the neon-fix branch will stay alive even after syncing embree-aarch64 with Intel Embree v3.8.0.

`verify` fails to pass with internal tasking system

verify fails when built without ray packets and with -DEMBREE_TASKING_SYSTEM=INTERNAL, even on the x86-64 platform.

$ ./verify 
                                                                        create_device ...
Embree Ray Tracing Kernels 3.7.0 (72275957d467242edd283d206261f2410c5eabae)
  Compiler  : CLANG 8.0.0 (tags/RELEASE_800/final)
  Build     : Debug 
  Platform  : Linux (64bit)
  CPU       : Unknown CPU (AuthenticAMD)
   Threads  : 32
   ISA      : XMM YMM SSE SSE2 SSE3 SSSE3 SSE4.1 SSE4.2 POPCNT AVX F16C RDRAND AVX2 FMA3 LZCNT BMI1 BMI2 
   Targets  : SSE SSE2 SSE3 SSSE3 SSE4.1 SSE4.2 AVX AVXI AVX2 
   MXCSR    : FTZ=1, DAZ=1
  Config
    Threads : default
    ISA     : XMM YMM SSE SSE2 SSE3 SSSE3 SSE4.1 SSE4.2 POPCNT AVX F16C RDRAND AVX2 FMA3 LZCNT BMI1 BMI2 
    Targets : SSE SSE2 SSE3 SSSE3 SSE4.1 SSE4.2 AVX AVXI AVX2  (supported)
              SSE2 SSE4.2 AVX AVX2 AVX512SKX  (compile time enabled)
    Features: intersection_filter 
    Tasking : internal_tasking_system 

 [PASSED]
                                                       fast_allocator_regression_test ... [PASSED]
                                                                cache_regression_test ... [PASSED]
                                                         parallel_for_regression_test ... [PASSED]
                                                      parallel_reduce_regression_test ... [PASSED]
                                                       parallel_prefix_sum_regression ... [PASSED]
                                                     parallel_for_for_regression_test ... [PASSED]
                                          parallel_for_for_prefix_sum_regression_test ... [PASSED]
                                                   parallel_partition_regression_test ... [PASSED]
                                                           RadixSortRegressionTestU32 ... [PASSED]
                                                           RadixSortRegressionTestU64 ... [PASSED]
                                                         parallel_set_regression_test ... [PASSED]
                                                         parallel_map_regression_test ... [PASSED]
                                                           parallel_filter_regression ... [PASSED]
                                                          barrier_sys_regression_test ... [PASSED]
                                                                SSE2.multiple_devices ... [PASSED]
                                                                      SSE2.types_test ... [PASSED]
                                                                      SSE2.get_bounds ...++++++ [PASSED]
                                                               SSE2.get_linear_bounds ...++++++ [PASSED]
                                                                   SSE2.get_user_data ... [PASSED]
                                                                   SSE2.buffer_stride ...++++++++ [PASSED]
                                                                     SSE2.empty_scene ...++++++++++ [PASSED]
                                                                  SSE2.empty_geometry ...++++++++++ [PASSED]
                                                                           SSE2.build ...++++++++++ [PASSED]
                                                          SSE2.overlapping_primitives ...++++++++++ [PASSED]
                                                             SSE2.new_delete_geometry ...............................................................................................................................+++.++ [PASSED]
                                                                SSE2.user_geometry_id ...+++++ [PASSED]
                                                         SSE2.enable_disable_geometry ...+++++ [PASSED]
                                                                          SSE2.update ...-----------------------------------------------------+--+++++---++---++---+++++++-+++-++----++++++++++++++++++++++++++++ [FAILED]
                                                              SSE2.build_garbage_geom ..................................................... [PASSED]
                                                           SSE2.interpolate.triangles ...++++++ [PASSED]
                                                                SSE2.interpolate.grid ...++++++ [PASSED]
                                                              SSE2.interpolate.subdiv ...++++++ [PASSED]
                                                                SSE2.interpolate.hair ...++++++ [PASSED]
                                                                    SSE2.triangle_hit ...---+----++--+-+-----++-++-+--+--+-+-++++---++-+-+-+-++-----++--++++--++--+++--+-----+-+-+---++++-----+----+-+---+++-+--- [FAILED]
                                                                        SSE2.quad_hit ...----------++--+-+-+---++++-+----++-+-++---+++++----+++----++-+-+--+-++----+-+-+----++++--+-++--+-+++--+-+-+-+---+-----+- [FAILED]
                                                             SSE2.intersection_filter ...-+-----+---+--+++----+--+------++--+-+---++---+++++-+-+--+---+---++-+------++--++++-++-----+++-+---+-+----+++--+++-+--++----+--+---+---+++---++-----+------++---+------+--+++---++--++---+-+++-----+++--+-++++++-++--+-+++---+--++++-++--+++-+-- [FAILED]
                                                                      SSE2.instancing ...---+verify: /home/syoyo/work/embree/tutorials/verify/verify.cpp:2813: virtual VerifyApplication::TestReturnValue embree::InstancingTest::run(embree::VerifyApplication *, bool): Assertion `passed' failed.
