lighttransport / embree-aarch64
AARCH64 port of Embree ray tracing library
License: Apache License 2.0
Hair intersection(FLAT_BEZIER) is broken on aarch64.
The current common/math/SSE2NEON.h was derived from an older version of SSE2NEON. Meanwhile, DLTcollab/sse2neon is actively maintained and has more collaborating developers.
I would like to propose switching to DLTcollab/sse2neon and working on pending issues such as #24.
Originated from: #49
Between pathtracer and pathtracer --nodisplay (or benchmark mode, e.g. --benchmark 10 100), there is a 4x difference on M1. The reason may be thread affinity or the windowing component (GLFW).
When compiling Embree with
-DEMBREE_MAX_ISA=AVX2
-DEMBREE_NEON_AVX2_EMULATION=ON
-DEMBREE_RAY_PACKETS:BOOL=ON
the verify test suite crashes
2020-10-13 06:36:46.158 16740-16740/? A/DEBUG: Build fingerprint: 'OnePlus/OnePlus3/OnePlus3T:9/PKQ1.181203.001/1911042108:user/release-keys'
2020-10-13 06:36:46.158 16740-16740/? A/DEBUG: Revision: '0'
2020-10-13 06:36:46.158 16740-16740/? A/DEBUG: ABI: 'arm64'
2020-10-13 06:36:46.158 16740-16740/? A/DEBUG: pid: 16363, tid: 16447, name: Thread-2 >>> com.example.embreetest <<<
2020-10-13 06:36:46.158 16740-16740/? A/DEBUG: signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x0
2020-10-13 06:36:46.158 16740-16740/? A/DEBUG: Cause: null pointer dereference
2020-10-13 06:36:46.158 16740-16740/? A/DEBUG: x0 000000702e1fa600 x1 000000704b7d1a58 x2 000000702e1fa380 x3 000000702e1fa6d0
2020-10-13 06:36:46.158 16740-16740/? A/DEBUG: x4 0000000000000140 x5 000000000000021c x6 00000000000001c0 x7 0000000000000180
2020-10-13 06:36:46.158 16740-16740/? A/DEBUG: x8 0000000000000000 x9 000000704b60da80 x10 000000704b7d1a00 x11 000000702e1faa40
2020-10-13 06:36:46.158 16740-16740/? A/DEBUG: x12 000000702e1faa40 x13 00000000ffffffff x14 00000000ffffffff x15 00000000ffffffff
2020-10-13 06:36:46.158 16740-16740/? A/DEBUG: x16 000000702e1faa00 x17 0000000000000040 x18 00000000000000d7 x19 000000702e1fa6d0
2020-10-13 06:36:46.158 16740-16740/? A/DEBUG: x20 000000702e1fa380 x21 000000702e1fa600 x22 000000704b79b000 x23 0000000000000000
2020-10-13 06:36:46.159 16740-16740/? A/DEBUG: x24 000000007f800000 x25 000000702e1fa780 x26 0000000000000010 x27 0000000000000040
2020-10-13 06:36:46.159 16740-16740/? A/DEBUG: x28 0000000000000080 x29 000000702e1f9e30
2020-10-13 06:36:46.159 16740-16740/? A/DEBUG: sp 000000702e1f9e00 lr 000000703814c2f8 pc 0000000000000000
2020-10-13 06:36:46.177 16740-16740/? A/DEBUG: backtrace:
2020-10-13 06:36:46.177 16740-16740/? A/DEBUG: #00 pc 0000000000000000 <unknown>
2020-10-13 06:36:46.177 16740-16740/? A/DEBUG: #01 pc 00000000000592f4 /data/app/com.example.embreetest-UVLKVq2jp6Zb3lP9EXgong==/base.apk (offset 0xee000) (embree::AccelN::intersect8(void const*, embree::Accel::Intersectors*, RTCRayHit8&, embree::IntersectContext*)+116)
2020-10-13 06:36:46.177 16740-16740/? A/DEBUG: #02 pc 0000000000831200 /data/app/com.example.embreetest-UVLKVq2jp6Zb3lP9EXgong==/base.apk (offset 0xee000) (_ZN6embree4avx215RayStreamFilter9filterSOAILi8ELb1EEEvPNS_5SceneEPcmmmPNS_16IntersectContextE+2416)
2020-10-13 06:36:46.177 16740-16740/? A/DEBUG: #03 pc 00000000000727d8 /data/app/com.example.embreetest-UVLKVq2jp6Zb3lP9EXgong==/base.apk (offset 0xee000) (rtcIntersect16+104)
2020-10-13 06:36:46.177 16740-16740/? A/DEBUG: #04 pc 000000000005d564 /data/app/com.example.embreetest-UVLKVq2jp6Zb3lP9EXgong==/base.apk (offset 0x138b000) (embree::IntersectWithModeInternal(embree::IntersectMode, embree::IntersectVariant, RTCSceneTy*, RTCRayHit*, unsigned int, RTCIntersectContext*)+3188)
2020-10-13 06:36:46.177 16740-16740/? A/DEBUG: #05 pc 000000000005dbcc /data/app/com.example.embreetest-UVLKVq2jp6Zb3lP9EXgong==/base.apk (offset 0x138b000) (embree::IntersectWithMode(embree::IntersectMode, embree::IntersectVariant, RTCSceneTy*, RTCRayHit*, unsigned int, RTCIntersectContext*)+348)
2020-10-13 06:36:46.177 16740-16740/? A/DEBUG: #06 pc 000000000009b47c /data/app/com.example.embreetest-UVLKVq2jp6Zb3lP9EXgong==/base.apk (offset 0x138b000) (embree::UpdateTest::run(embree::VerifyApplication*, bool)+1996)
2020-10-13 06:36:46.177 16740-16740/? A/DEBUG: #07 pc 000000000005f338 /data/app/com.example.embreetest-UVLKVq2jp6Zb3lP9EXgong==/base.apk (offset 0x138b000) (embree::VerifyApplication::Test::execute(embree::VerifyApplication*, bool)+152)
2020-10-13 06:36:46.177 16740-16740/? A/DEBUG: #08 pc 0000000000084318 /data/app/com.example.embreetest-UVLKVq2jp6Zb3lP9EXgong==/base.apk (offset 0x138b000) (_ZN6embree13TaskScheduler19ClosureTaskFunctionIZNS0_5spawnImZNS_12parallel_forImZNS_17VerifyApplication9TestGroup7executeEPS4_bE3$_0EEvT_RKT0_EUlRKNS_5rangeImEEE_EEvS8_S8_S8_SB_EUlvE_E7executeEv+136)
2020-10-13 06:36:46.177 16740-16740/? A/DEBUG: #09 pc 0000000000a88fec /data/app/com.example.embreetest-UVLKVq2jp6Zb3lP9EXgong==/base.apk (offset 0xee000) (embree::TaskScheduler::Task::run_internal(embree::TaskScheduler::Thread&)+124)
2020-10-13 06:36:46.177 16740-16740/? A/DEBUG: #10 pc 0000000000a8927c /data/app/com.example.embreetest-UVLKVq2jp6Zb3lP9EXgong==/base.apk (offset 0xee000) (embree::TaskScheduler::TaskQueue::execute_local_internal(embree::TaskScheduler::Thread&, embree::TaskScheduler::Task*)+76)
2020-10-13 06:36:46.179 16740-16740/? A/DEBUG: #11 pc 0000000000a8a610 /data/app/com.example.embreetest-UVLKVq2jp6Zb3lP9EXgong==/base.apk (offset 0xee000) (embree::TaskScheduler::wait()+64)
2020-10-13 06:36:46.180 16740-16740/? A/DEBUG: #12 pc 0000000000a88fec /data/app/com.example.embreetest-UVLKVq2jp6Zb3lP9EXgong==/base.apk (offset 0xee000) (embree::TaskScheduler::Task::run_internal(embree::TaskScheduler::Thread&)+124)
2020-10-13 06:36:46.180 16740-16740/? A/DEBUG: #13 pc 0000000000a8927c /data/app/com.example.embreetest-UVLKVq2jp6Zb3lP9EXgong==/base.apk (offset 0xee000) (embree::TaskScheduler::TaskQueue::execute_local_internal(embree::TaskScheduler::Thread&, embree::TaskScheduler::Task*)+76)
2020-10-13 06:36:46.180 16740-16740/? A/DEBUG: #14 pc 0000000000a8a610 /data/app/com.example.embreetest-UVLKVq2jp6Zb3lP9EXgong==/base.apk (offset 0xee000) (embree::TaskScheduler::wait()+64)
2020-10-13 06:36:46.180 16740-16740/? A/DEBUG: #15 pc 0000000000a88fec /data/app/com.example.embreetest-UVLKVq2jp6Zb3lP9EXgong==/base.apk (offset 0xee000) (embree::TaskScheduler::Task::run_internal(embree::TaskScheduler::Thread&)+124)
2020-10-13 06:36:46.180 16740-16740/? A/DEBUG: #16 pc 0000000000a8927c /data/app/com.example.embreetest-UVLKVq2jp6Zb3lP9EXgong==/base.apk (offset 0xee000) (embree::TaskScheduler::TaskQueue::execute_local_internal(embree::TaskScheduler::Thread&, embree::TaskScheduler::Task*)+76)
2020-10-13 06:36:46.180 16740-16740/? A/DEBUG: #17 pc 0000000000a8a610 /data/app/com.example.embreetest-UVLKVq2jp6Zb3lP9EXgong==/base.apk (offset 0xee000) (embree::TaskScheduler::wait()+64)
2020-10-13 06:36:46.180 16740-16740/? A/DEBUG: #18 pc 0000000000a88fec /data/app/com.example.embreetest-UVLKVq2jp6Zb3lP9EXgong==/base.apk (offset 0xee000) (embree::TaskScheduler::Task::run_internal(embree::TaskScheduler::Thread&)+124)
2020-10-13 06:36:46.180 16740-16740/? A/DEBUG: #19 pc 0000000000a8927c /data/app/com.example.embreetest-UVLKVq2jp6Zb3lP9EXgong==/base.apk (offset 0xee000) (embree::TaskScheduler::TaskQueue::execute_local_internal(embree::TaskScheduler::Thread&, embree::TaskScheduler::Task*)+76)
2020-10-13 06:36:46.180 16740-16740/? A/DEBUG: #20 pc 0000000000a8a610 /data/app/com.example.embreetest-UVLKVq2jp6Zb3lP9EXgong==/base.apk (offset 0xee000) (embree::TaskScheduler::wait()+64)
2020-10-13 06:36:46.180 16740-16740/? A/DEBUG: #21 pc 0000000000a88fec /data/app/com.example.embreetest-UVLKVq2jp6Zb3lP9EXgong==/base.apk (offset 0xee000) (embree::TaskScheduler::Task::run_internal(embree::TaskScheduler::Thread&)+124)
2020-10-13 06:36:46.180 16740-16740/? A/DEBUG: #22 pc 0000000000a8927c /data/app/com.example.embreetest-UVLKVq2jp6Zb3lP9EXgong==/base.apk (offset 0xee000) (embree::TaskScheduler::TaskQueue::execute_local_internal(embree::TaskScheduler::Thread&, embree::TaskScheduler::Task*)+76)
2020-10-13 06:36:46.180 16740-16740/? A/DEBUG: #23 pc 0000000000a8a610 /data/app/com.example.embreetest-UVLKVq2jp6Zb3lP9EXgong==/base.apk (offset 0xee000) (embree::TaskScheduler::wait()+64)
2020-10-13 06:36:46.180 16740-16740/? A/DEBUG: #24 pc 0000000000a88fec /data/app/com.example.embreetest-UVLKVq2jp6Zb3lP9EXgong==/base.apk (offset 0xee000) (embree::TaskScheduler::Task::run_internal(embree::TaskScheduler::Thread&)+124)
2020-10-13 06:36:46.180 16740-16740/? A/DEBUG: #25 pc 0000000000a8927c /data/app/com.example.embreetest-UVLKVq2jp6Zb3lP9EXgong==/base.apk (offset 0xee000) (embree::TaskScheduler::TaskQueue::execute_local_internal(embree::TaskScheduler::Thread&, embree::TaskScheduler::Task*)+76)
2020-10-13 06:36:46.180 16740-16740/? A/DEBUG: #26 pc 0000000000a8a610 /data/app/com.example.embreetest-UVLKVq2jp6Zb3lP9EXgong==/base.apk (offset 0xee000) (embree::TaskScheduler::wait()+64)
2020-10-13 06:36:46.180 16740-16740/? A/DEBUG: #27 pc 0000000000a88fec /data/app/com.example.embreetest-UVLKVq2jp6Zb3lP9EXgong==/base.apk (offset 0xee000) (embree::TaskScheduler::Task::run_internal(embree::TaskScheduler::Thread&)+124)
2020-10-13 06:36:46.180 16740-16740/? A/DEBUG: #28 pc 0000000000a8927c /data/app/com.example.embreetest-UVLKVq2jp6Zb3lP9EXgong==/base.apk (offset 0xee000) (embree::TaskScheduler::TaskQueue::execute_local_internal(embree::TaskScheduler::Thread&, embree::TaskScheduler::Task*)+76)
2020-10-13 06:36:46.180 16740-16740/? A/DEBUG: #29 pc 0000000000a8a610 /data/app/com.example.embreetest-UVLKVq2jp6Zb3lP9EXgong==/base.apk (offset 0xee000) (embree::TaskScheduler::wait()+64)
2020-10-13 06:36:46.180 16740-16740/? A/DEBUG: #30 pc 0000000000a88fec /data/app/com.example.embreetest-UVLKVq2jp6Zb3lP9EXgong==/base.apk (offset 0xee000) (embree::TaskScheduler::Task::run_internal(embree::TaskScheduler::Thread&)+124)
2020-10-13 06:36:46.180 16740-16740/? A/DEBUG: #31 pc 0000000000a8927c /data/app/com.example.embreetest-UVLKVq2jp6Zb3lP9EXgong==/base.apk (offset 0xee000) (embree::TaskScheduler::TaskQueue::execute_local_internal(embree::TaskScheduler::Thread&, embree::TaskScheduler::Task*)+76)
2020-10-13 06:36:46.180 16740-16740/? A/DEBUG: #32 pc 0000000000a89de8 /data/app/com.example.embreetest-UVLKVq2jp6Zb3lP9EXgong==/base.apk (offset 0xee000) (embree::TaskScheduler::thread_loop(unsigned long)+632)
2020-10-13 06:36:46.180 16740-16740/? A/DEBUG: #33 pc 0000000000a895a8 /data/app/com.example.embreetest-UVLKVq2jp6Zb3lP9EXgong==/base.apk (offset 0xee000) (embree::TaskScheduler::ThreadPool::thread_loop(unsigned long)+264)
2020-10-13 06:36:46.180 16740-16740/? A/DEBUG: #34 pc 0000000000a84684 /data/app/com.example.embreetest-UVLKVq2jp6Zb3lP9EXgong==/base.apk (offset 0xee000) (embree::threadStartup(embree::ThreadStartupData*)+20)
2020-10-13 06:36:46.180 16740-16740/? A/DEBUG: #35 pc 0000000000099508 /system/lib64/libc.so (__pthread_start(void*)+36)
2020-10-13 06:36:46.180 16740-16740/? A/DEBUG: #36 pc 0000000000023e18 /system/lib64/libc.so (__start_thread+68)
This is likely caused by dispatch to missing functions of the non-present AVX code path.
Status: A fix is being worked on.
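The backtrace ends at pc 0, which is consistent with a call through a null function pointer. A minimal sketch of a guarded dispatch, assuming a table of per-width kernels; KernelTable, PacketKernel, kernel4, kernel8 and select_kernel are hypothetical names for illustration and are not Embree's actual dispatch code:

```cpp
#include <cassert>

// Hypothetical kernel signature; real intersect kernels take ray and scene pointers.
using PacketKernel = int (*)(int);

// Hypothetical dispatch table: a slot stays null when that code path was not built.
struct KernelTable {
    PacketKernel width4 = nullptr;
    PacketKernel width8 = nullptr;
};

inline int kernel4(int n) { return 4 * n; }
inline int kernel8(int n) { return 8 * n; }

// Prefer the 8-wide kernel, fall back to the 4-wide one, and never return null.
inline PacketKernel select_kernel(const KernelTable& table) {
    PacketKernel k = table.width8 ? table.width8 : table.width4;
    assert(k != nullptr && "no packet kernel registered");
    return k;
}
```

With a check like this, a missing 8-wide entry falls back to the 4-wide kernel instead of jumping to address 0.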
Test displacement map on aarch64
The issue was found in the discussion at #34 (comment)
On ARM Linux with gcc 8 through gcc 10, an internal compiler error occurs at the following line.
(FYI, gcc 7.5 works fine.)
/home/syoyo/work/embree-aarch64-neon2x/common/tasking/taskschedulerinternal.h:141:27: internal compiler error: unexpected expression ‘(std::__atomic_base<long unsigned int>::__int_type)((embree::TaskScheduler::TaskQueue*)this)->embree::TaskScheduler::TaskQueue::right’ of kind implicit_conv_expr
141 | new (&(tasks[right])) Task(func,thread.task,oldStackPtr,size);
| ^
Please submit a full bug report,
with preprocessed source if appropriate.
See <file:///usr/share/doc/gcc-10/README.Bugs> for instructions.
The trigger is using std::atomic<size_t> right directly as an array index.
The code is tricky; if we want to guarantee atomicity when creating a Task, it would be better to take a lock instead of using the std::atomic variable directly.
At least I can confirm that explicitly calling load(), for example tasks[right.load()], avoids the issue, but I am not sure this is the right way to suppress the internal compiler error.
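A minimal sketch of the explicit load() workaround, assuming a simplified queue. This Task and TaskQueue are stand-ins for illustration, not Embree's real classes, and the sketch is only safe when a single thread pushes:

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <new>

// Simplified stand-in for a task; trivially destructible on purpose.
struct Task {
    int id;
    explicit Task(int i) : id(i) {}
};

struct TaskQueue {
    static constexpr std::size_t kCapacity = 8;
    alignas(Task) unsigned char storage[kCapacity * sizeof(Task)];
    std::atomic<std::size_t> right{0};

    // Read the atomic once into a plain index, then use that index for the
    // placement new, instead of relying on the implicit conversion in tasks[right].
    Task* push(int id) {
        const std::size_t index = right.load();
        assert(index < kCapacity);
        Task* slot = reinterpret_cast<Task*>(storage) + index;
        Task* task = new (slot) Task(id);
        right.store(index + 1);
        return task;
    }
};
```

Reading the index into a local also makes it explicit that the slot address and the later increment use the same value.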
From PR #11
I've noticed a performance issue that is only present on ARM devices: LOW_QUALITY BVH construction is slower than expected. For example, 1,000,000 triangles take:
LOW_QUALITY: 0.74 seconds, 1.34 Mprims/s, 266 SAH build quality
MEDIUM_QUALITY: 0.57 seconds, 1.75 Mprims/s, 249 SAH build quality
HIGH_QUALITY: 1.47 seconds, 0.68 Mprims/s, 249 SAH build quality
The NEON-mapped function _mm_sqrt_ps does not handle zero inputs properly. Instead of returning 0, it returns NaN. This is caused by the underlying function vrsqrteq_f32, which returns +inf for a zero input.
This affects CylinderN::intersect such that
const vfloat<N> Q = sqrt(D);
would not work properly if D=0, resulting in a false miss.
This can be reproduced on arm64 (not on x64) with the following ray in the interpolation scene:
{
RTCIntersectContext context;
rtcInitIntersectContext(&context);
RTCRayHit rayhitdebug;
rayhitdebug.ray.org_x = -0.000000476837158;
rayhitdebug.ray.org_y = 4.99999952;
rayhitdebug.ray.org_z = -7.07106781;
rayhitdebug.ray.dir_x = 0.645600736;
rayhitdebug.ray.dir_y = -0.512669444;
rayhitdebug.ray.dir_z = 0.566012084;
rayhitdebug.ray.time = 0.0;
rayhitdebug.ray.tnear = 0.0;
rayhitdebug.ray.tfar = 100000.0;
rayhitdebug.ray.mask = -1;
rayhitdebug.ray.id = 0;
rayhitdebug.ray.flags = 0;
Vec3fa xx(-1, 0, 1);
Vec3fa yy = rsqrt(xx);
Vec3fa zz = sqrt(xx);
rtcIntersect1(g_scene, &context, &rayhitdebug);
}
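A scalar sketch of how a zero input can turn into NaN and how a guard avoids it, assuming sqrt is formed as x * rsqrt(x). That formulation is an assumption made for illustration, and rsqrt_estimate, naive_sqrt and guarded_sqrt are hypothetical helpers, not the functions in SSE2NEON.h:

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Stand-in for a reciprocal square root estimate that returns +inf for 0,
// as reported above for vrsqrteq_f32.
inline float rsqrt_estimate(float x) {
    return x == 0.0f ? std::numeric_limits<float>::infinity()
                     : 1.0f / std::sqrt(x);
}

// Naive sqrt built from the estimate: for x = 0 this is 0 * inf, which is NaN.
inline float naive_sqrt(float x) {
    return x * rsqrt_estimate(x);
}

// Guarded variant: return 0 for a zero input and use the estimate otherwise.
inline float guarded_sqrt(float x) {
    assert(x >= 0.0f);
    return x == 0.0f ? 0.0f : x * rsqrt_estimate(x);
}
```

In a vectorized version the same guard would be a per-lane select on a compare-with-zero mask rather than a branch.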
branch: aarch64-v3.11.0
os: Linux aarch64
compiler: clang 9
verify segfaults at NEON.enable_disable_geometry
NEON.enable_disable_geometry ...AddressSanitizer:DEADLYSIGNAL
=================================================================
==19473==ERROR: AddressSanitizer: SEGV on unknown address 0x0000a6d6feec (pc 0x007fabf124a0 bp 0x007ffe6d2e70 sp 0x007ffe6d1420 T0)
AddressSanitizer:DEADLYSIGNAL
==19473==The signal is caused by a READ memory access.
AddressSanitizer:DEADLYSIGNAL
AddressSanitizer:DEADLYSIGNAL
AddressSanitizer:DEADLYSIGNAL
#0 0x7fabf1249c in embree::sse2::GridSOA::root(unsigned long) const /home/syoyo/work/embree-aarch64/kernels/bvh/../geometry/grid_soa.h:62:97
#1 0x7fabf1249c in embree::sse2::SubdivPatch1Intersector1::processLazyNode(embree::sse2::SubdivPatch1Precalculations<embree::sse2::GridSOAIntersector1::Precalculations>&, embree::IntersectContext*, embree::sse2::GridSOA const*, unsigned long&) /home/syoyo/work/embree-aarch64/kernels/bvh/../geometry/subdivpatch1_intersector.h:40:27
#2 0x7fabf1249c in void embree::sse2::SubdivPatch1Intersector1::intersect<4, 4, true>(embree::Accel::Intersectors const*, embree::sse2::SubdivPatch1Precalculations<embree::sse2::GridSOAIntersector1::Precalculations>&, embree::RayHitK<1>&, embree::IntersectContext*, embree::sse2::GridSOA const*, unsigned long, embree::sse2::TravRay<4, 4, true> const&, unsigned long&) /home/syoyo/work/embree-aarch64/kernels/bvh/../geometry/subdivpatch1_intersector.h:50:30
#3 0x7fabf1249c in embree::sse2::BVHNIntersector1<4, 1, true, embree::sse2::SubdivPatch1Intersector1>::intersect(embree::Accel::Intersectors const*, embree::RayHitK<1>&, embree::IntersectContext*) /home/syoyo/work/embree-aarch64/kernels/bvh/bvh_intersector1.cpp:109:9
#4 0x7fab66fd5c in embree::Accel::Intersectors::intersect(RTCRayHit&, embree::IntersectContext*) /home/syoyo/work/embree-aarch64/kernels/common/accel.h:307:9
#5 0x7fab66fd5c in rtcIntersect1 /home/syoyo/work/embree-aarch64/kernels/common/rtcore.cpp:491:25
#6 0x56de88 in embree::EnableDisableGeometryTest::run(embree::VerifyApplication*, bool) /home/syoyo/work/embree-aarch64/tutorials/verify/verify.cpp:1695:11
#7 0x4d5150 in embree::VerifyApplication::Test::execute(embree::VerifyApplication*, bool) /home/syoyo/work/embree-aarch64/tutorials/verify/verify.cpp:420:11
#8 0x528a7c in embree::VerifyApplication::TestGroup::execute(embree::VerifyApplication*, bool)::$_0::operator()(unsigned long) const /home/syoyo/work/embree-aarch64/tutorials/verify/verify.cpp:673:38
It happens with a Release build + MSVC (VS2022) + Windows 11 in some situations.
It seems Embree fails to allocate a buffer on a 16-byte boundary and then tries to load from the misaligned address.
Running the program multiple times sometimes succeeds.
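One way to narrow this down is to check the alignment of a buffer before it reaches SIMD loads. A minimal sketch; is_aligned is a hypothetical helper, not part of Embree:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Returns true when p is a multiple of alignment (alignment must be a power of two).
inline bool is_aligned(const void* p, std::size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (reinterpret_cast<std::uintptr_t>(p) & (alignment - 1)) == 0;
}
```

Asserting is_aligned(ptr, 16) right after each allocation would show whether the failure comes from the allocator or from a later pointer offset.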
I got an M1 Mac and compiled embree-aarch64, but verify is killed without reporting any reason.
OS: macOS 11.0.1
branch: main
(v3.12.1-aarch64)
cmake 3.19(x86-64)
% ./scripts/bootstrap-arm64-macos.sh
-- The C compiler identification is AppleClang 12.0.0.12000032
-- The CXX compiler identification is AppleClang 12.0.0.12000032
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found Git: /usr/bin/git (found version "2.24.3 (Apple Git-128)")
CMake Deprecation Warning at CMakeLists.txt:74 (cmake_policy):
The OLD behavior for policy CMP0072 will be removed from a future version
of CMake.
The cmake-policies(7) manual explains that the OLD behaviors of all
policies are deprecated and that a policy should be set to OLD only under
specific short-term circumstances. Projects should be ported to the NEW
behavior and not rely on setting a policy to OLD.
-- building for Apple Silicon Mac
-- Emulation of AVX2 by 2xNEON
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- OpenImageIO not found in your environment. You can 1) install
via your OS package manager, or 2) install it
somewhere on your machine and point OPENIMAGEIO_ROOT to it. (missing: OPENIMAGEIO_INCLUDE_DIR OPENIMAGEIO_LIBRARY)
-- Could NOT find JPEG (missing: JPEG_LIBRARY) (found version "80")
-- Could NOT find PNG (missing: PNG_LIBRARIES)
-- Configuring done
-- Generating done
-- Build files have been written to: /Users/syoyo/work/embree-aarch64/build-arm64-macos
...
cd /Users/syoyo/work/embree-aarch64/build-arm64-macos/tutorials/verify && /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/c++ -DBUILD_IOS -DEMBREE_TARGET_AVX -DEMBREE_TARGET_AVX2 -DEMBREE_TARGET_SSE2 -DEMBREE_TARGET_SSE42 -DNEON_AVX2_EMULATION -DTASKING_INTERNAL -I/Users/syoyo/work/embree-aarch64/kernels/../include -fsigned-char -Wall -Wformat -Wformat-security -fPIC -std=c++17 -fvisibility=hidden -fvisibility-inlines-hidden -fno-strict-aliasing -fno-tree-vectorize -D_FORTIFY_SOURCE=2 -mmacosx-version-min=10.7 -stdlib=libc++ -fsigned-char -g -DNDEBUG -O3 -arch arm64 -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX11.0.sdk -D__SSE4_2__ -D__SSE4_1__ -o CMakeFiles/verify.dir/verify.cpp.o -c /Users/syoyo/work/embree-aarch64/tutorials/verify/verify.cpp
% file verify
verify: Mach-O 64-bit executable arm64
% ./verify
zsh: killed ./verify
% lldb verify
(lldb) target create "verify"
zsh: killed lldb verify
@Developer-Ecosystem-Engineering have you managed to run verify on your ARM macOS system?
Currently embree-aarch64 does not have a native ARM build in CI, which prevents running various tests (e.g. verify) on the CI.
(We currently use a qemu aarch64 environment on an x86 build job, which takes too long to build.)
GitHub Actions does not provide a native ARM architecture on its CI servers.
Fortunately, Travis CI has started to provide the ARM architecture for OSS repositories as a beta program:
https://docs.travis-ci.com/user/multi-cpu-architectures/
So we should write a Travis build script for arm64 to get a better ARM build and test cycle.
Test subdivision surface on aarch64
NEON.update in verify started to fail at some point (but may be reproducible from v3.11.0).
OS: aarch64 linux(Jetson AGX)
Build config: ./scripts/build-aarch64-linux.sh
Embree Ray Tracing Kernels 3.11.0 (4a0f1750c6848437bf0d2f83c10863990caadda1)
Compiler : GCC 7.5.0
Build : Release
Platform : Linux (32bit)
CPU : Unknown CPU ( MRA MRA MRA)
Threads : 8
ISA : SSE SSE2 NEON
Targets :
MXCSR : FTZ=0, DAZ=0
Config
Threads : default
ISA : SSE SSE2 NEON
Targets : (supported)
SSE2 (compile time enabled)
Features: intersection_filter
Tasking : internal_tasking_system
================================================================================
WARNING: "Flush to Zero" or "Denormals are Zero" mode not enabled
in the MXCSR control and status register. This can have a severe
performance impact. Please enable these modes for each application
thread the following way:
#include "xmmintrin.h"
#include "pmmintrin.h"
_MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
_MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON);
================================================================================
[PASSED]
fast_allocator_regression_test ... [PASSED]
motion_derivative_regression ... [PASSED]
collision_regression_test ... [PASSED]
cache_regression_test ... [PASSED]
parallel_for_regression_test ... [PASSED]
parallel_reduce_regression_test ... [PASSED]
parallel_prefix_sum_regression ... [PASSED]
parallel_for_for_regression_test ... [PASSED]
parallel_for_for_prefix_sum_regression_test ... [PASSED]
parallel_partition_regression_test ... [PASSED]
RadixSortRegressionTestU32 ... [PASSED]
RadixSortRegressionTestU64 ... [PASSED]
parallel_set_regression_test ... [PASSED]
parallel_map_regression_test ... [PASSED]
parallel_filter_regression ... [PASSED]
barrier_sys_regression_test ... [PASSED]
NEON.multiple_devices ... [PASSED]
NEON.types_test ... [PASSED]
NEON.get_bounds ...++++++ [PASSED]
NEON.get_linear_bounds ...++++++ [PASSED]
NEON.get_user_data ... [PASSED]
NEON.buffer_stride ...++++++++ [PASSED]
NEON.empty_scene ...++++++++++ [PASSED]
NEON.empty_geometry ...++++++++++ [PASSED]
NEON.build ...++++++++++ [PASSED]
NEON.overlapping_primitives ...++++++++++ [PASSED]
NEON.new_delete_geometry ................................................................................................................................+++++ [PASSED]
NEON.user_geometry_id ...+++++ [PASSED]
NEON.enable_disable_geometry ...+++++ [PASSED]
NEON.update ...-----+++++-++-+++++-++-+++--+-+-+---+-++-+-+--+++-+-+---++++---+-+-+-++---+-+-++-+--+-++++++++++++++++++++++++++-++-++++++++++++++++++++++++++-++--++++-+++++++++++-+++-++++++++++++++++++++++++++-++-++-+-++++++++++++++-+++---+++++++++++++---++++--+-+-+-++++++++++++++-+--++-++++++++++++++++++++++++++++++++++++++-+--+++--+++----+----+-+---+---++--++--+-+++++++++++++++++++++++++++++++++++++++++++++++-+---+++++-+++-+---+++--++++---+-+---+--+-+-+++-++++++++++++++++++++++++-+++++-+-++++++++++++++++++++++++++++++++++++++++ [FAILED]
NEON.build_garbage_geom ..................................................... [PASSED]
...
branch: aarch64-v3.8.0 (port of Intel Embree v3.8.0)
geometry/curve_intersector_virtual.cpp takes too long to compile (with -O2) on gcc (5 minutes or more, even when cross-compiling on a TR 1950X). I can confirm the issue with at least gcc 7.4 and 8.0.
clang (clang-9) also takes some time (a couple of minutes) to compile geometry/curve_intersector_virtual.cpp.
We recommend using clang for the aarch64 target for the time being.
branch: aarch64-v3.11.0
flags: EMBREE_RAY_PACKETS=On
compiler: clang-9
OS: Ubuntu 18.04(Jetson)
[PASSED]
fast_allocator_regression_test ... [PASSED]
motion_derivative_regression ... [PASSED]
collision_regression_test ... [PASSED]
cache_regression_test ... [PASSED]
parallel_for_regression_test ... [PASSED]
parallel_reduce_regression_test ... [PASSED]
parallel_prefix_sum_regression ... [PASSED]
parallel_for_for_regression_test ... [PASSED]
parallel_for_for_prefix_sum_regression_test ... [PASSED]
parallel_partition_regression_test ... [PASSED]
RadixSortRegressionTestU32 ... [PASSED]
RadixSortRegressionTestU64 ... [PASSED]
parallel_set_regression_test ... [PASSED]
parallel_map_regression_test ... [PASSED]
parallel_filter_regression ... [PASSED]
barrier_sys_regression_test ... [PASSED]
NEON.multiple_devices ... [PASSED]
NEON.types_test ... [PASSED]
NEON.get_bounds ...++++++ [PASSED]
NEON.get_linear_bounds ...++++++ [PASSED]
NEON.get_user_data ... [PASSED]
NEON.buffer_stride ...++++++++ [PASSED]
NEON.empty_scene ...++++++++++ [PASSED]
NEON.empty_geometry ...++++++++++ [PASSED]
NEON.build ...++++++++++ [PASSED]
NEON.overlapping_primitives ...++++++++++ [PASSED]
NEON.new_delete_geometry ................................................................................................................................+++++ [PASSED]
NEON.user_geometry_id ...+++++ [PASSED]
NEON.enable_disable_geometry ...+++++ [PASSED]
NEON.update ...++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ [PASSED]
NEON.build_garbage_geom ..................................................... [PASSED]
NEON.interpolate.triangles ...++++++ [PASSED]
NEON.interpolate.grid ...++++++ [PASSED]
NEON.interpolate.subdiv ...++++++ [PASSED]
NEON.interpolate.hair ...++++++ [PASSED]
NEON.triangle_hit ...++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ [PASSED]
NEON.quad_hit ...++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ [PASSED]
NEON.intersection_filter ...++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ [PASSED]
NEON.instancing ...++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ [PASSED]
NEON.inactive_rays ...++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ [PASSED]
NEON.watertight_triangles ...++++++++--------++++++++--------++++++++--------++++++++--------++++++++--------++++++++-------- [FAILED]
NEON.watertight_triangles_mb ...++++++++-+-------+++++++-----+++++---+++--++----++++---++---++-+---++++--+-++----+-+--+-+++----- [FAILED]
NEON.watertight_quads ...++++++++--------++++++++--------++++++++--------+++++++-------+++-++++-+------+++-++++--------++ [FAILED]
NEON.watertight_quads_mb ...++++++++--++------++++++-+---+++-----+++++-+------+++++-+----+++-+---+++++----+--++++--+--+----- [FAILED]
NEON.watertight_grids ...++++++++--------++++++++--------++++++++--------++++++---+-----+++++++-++--------++++-++++------ [FAILED]
NEON.watertight_grids_mb ...++++++++--------++++++++--------++++++++------+--+++++++-----+-+--++++++------+-+-++++++-------- [FAILED]
NEON.watertight_subdiv ...!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! [PASSED]
NEON.ray_alignment_test ...++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ [PASSED]
NEON.point_query ...+++++++++++++++++++++++ [PASSED]
NEON.regression_static ................................. [PASSED]
NEON.regression_dynamic ................................. [PASSED]
NEON.regression_static_build_join ................................. [PASSED]
NEON.regression_dynamic_build_join ................................. [PASSED]
NEON.regression_static_memory_monitor ............................................................... [PASSED]
NEON.regression_dynamic_memory_monitor ............................................................... [PASSED]
NEON.geometry_state_tests ... [PASSED]
NEON.scene_modified_geometry_tests ... [PASSED]
NEON.sphere_filter_multi_hit_tests ... [PASSED]
Tests passed: 4782
Tests failed: 288
Tests failed and ignored: 96
real 9m49.698s
user 67m11.132s
sys 2m55.296s
FAILED: watertight_*** tests
Suspicious: NEON.watertight_subdiv
Support libc++ builds for aarch64 Linux (native and cross-compiling).
The libc++ build works fine for Android and llvm-mingw (clang) builds, so it should also work well in an aarch64 Linux environment.
ARM NEON's rcp estimate and rsqt estimate has less accuracy than corresponding SSE2 ops.
https://qiita.com/sanmanyannyan/items/62bb5ce6ada975a7106a
We need more Newton-Raphson steps (2 or 3 times as many iterations) to reach the same accuracy as rcp and rsqrt in the SSE2 code path (estimate + one round of Newton-Raphson).
Currently we use 2 iterations for the NEON code path (vrcpsq_f32, vrsqrtsq_f32).
embree-aarch64/common/math/math.h
Line 73 in 3f75f8c
Related to the hole issue in #20.
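The effect of extra Newton-Raphson steps on a coarse reciprocal estimate can be sketched in portable scalar C++. This is only an illustration under stated assumptions: `coarse_rcp` is a hypothetical stand-in that truncates the mantissa to mimic a low-precision hardware estimate, and it is not the NEON intrinsic nor the code in common/math/math.h. The helper names `nr_step`, `rcp_refined` and `rel_error` are likewise made up for this sketch.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for a hardware reciprocal estimate:
// compute 1/x, then keep only `bits` mantissa bits (assumes 1 <= bits <= 22).
static float coarse_rcp(float x, int bits) {
    float r = 1.0f / x;
    std::uint32_t u;
    std::memcpy(&u, &r, sizeof u);
    u &= ~((1u << (23 - bits)) - 1u);  // clear the low mantissa bits
    std::memcpy(&r, &u, sizeof r);
    return r;
}

// One Newton-Raphson step for the reciprocal: r' = r * (2 - x * r).
static float nr_step(float x, float r) {
    return r * (2.0f - x * r);
}

// Coarse estimate followed by `iterations` refinement steps.
static float rcp_refined(float x, int bits, int iterations) {
    float r = coarse_rcp(x, bits);
    for (int i = 0; i < iterations; ++i) {
        r = nr_step(x, r);
    }
    return r;
}

// Relative error of r as an approximation of 1/x, evaluated in double.
static double rel_error(float x, float r) {
    return std::fabs(static_cast<double>(x) * static_cast<double>(r) - 1.0);
}
```

Comparing `rel_error(x, rcp_refined(x, 8, n))` for n = 0, 1, 2 shows the expected pattern: each step roughly squares the relative error until single-precision rounding dominates, which is why a less accurate starting estimate needs more steps to reach the same final accuracy.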
OS: Ubuntu 18.04.3 Linux
CPU: ARM a72(aarch64)
Build config : Use scripts/bootstrap-aarch64-linux.sh
Branch : master
v3.6.1
./verify
Cylinder test 3 failed: cylinder = Cylinder { p0 = (0, 0, 0), p1 = (1, 0, 0), r = 1}, ray = {
org = (0, 0, 0)
dir = (1, 0, 0)
near = 0
far = inf
time = 0
mask = -1
id = 0
flags = 0
Ng = (-3.60566e+18, 1.77965e-43, -1.69402e-24) u = 1.77965e-43
v = -8.17928e-33
primID = 85
geomID = 4294967295
instID = 0
}, hit = 1, t = [-295.603; 295.603]
Cylinder test 4 failed: cylinder = Cylinder { p0 = (0, 0, 0), p1 = (1, 0, 0), r = 1}, ray = {
org = (0, 0, 0)
dir = (-1, 0, 0)
near = 0
far = inf
time = 0
mask = -1
id = 0
flags = 0
Ng = (-3.60566e+18, 1.77965e-43, -1.69402e-24) u = 1.77965e-43
v = -8.17928e-33
primID = 85
geomID = 4294967295
instID = 0
}, hit = 1, t = [-295.603; 295.603]
verify: /mnt/data/work/embree-aarch64/kernels/common/device.cpp:67: embree::Device::Device(const char*): Assertion `isa::Cylinder::verify()' failed.
Aborted (core dumped)
OS: Jetson AGX(aarch64 linux)
Build config: scripts/bootstrap-aarch64-linux.sh
With the Intel v3.12.0 merge, lots of verify tests fail.
It looks like something changed from v3.11.0 to v3.12.0.
Embree Ray Tracing Kernels 3.12.0 (b2e495336782d9ff20e59bf1f45cc7633bda7c5f)
Compiler : GCC 7.5.0
Build : Release
Platform : Linux (32bit)
CPU : Unknown CPU ( MRA MRA MRA)
Threads : 8
ISA : SSE SSE2 NEON
Targets :
MXCSR : FTZ=0, DAZ=0
Config
Threads : default
ISA : SSE SSE2 NEON
Targets : (supported)
SSE2 (compile time enabled)
Features: intersection_filter
Tasking : internal_tasking_system
================================================================================
WARNING: "Flush to Zero" or "Denormals are Zero" mode not enabled
in the MXCSR control and status register. This can have a severe
performance impact. Please enable these modes for each application
thread the following way:
#include "xmmintrin.h"
#include "pmmintrin.h"
_MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
_MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON);
================================================================================
[PASSED]
fast_allocator_regression_test ... [PASSED]
motion_derivative_regression ... [PASSED]
collision_regression_test ... [PASSED]
cache_regression_test ... [PASSED]
parallel_for_regression_test ... [PASSED]
parallel_reduce_regression_test ... [PASSED]
parallel_prefix_sum_regression ... [PASSED]
parallel_for_for_regression_test ... [PASSED]
parallel_for_for_prefix_sum_regression_test ... [PASSED]
parallel_partition_regression_test ... [PASSED]
RadixSortRegressionTestU32 ... [PASSED]
RadixSortRegressionTestU64 ... [PASSED]
parallel_set_regression_test ... [PASSED]
parallel_map_regression_test ... [PASSED]
parallel_filter_regression ... [PASSED]
barrier_sys_regression_test ... [PASSED]
NEON.multiple_devices ... [FAILED]
NEON.types_test ... [PASSED]
NEON.get_bounds ...---------------- [FAILED]
NEON.get_linear_bounds ...---------------- [FAILED]
NEON.get_user_data ... [FAILED]
NEON.buffer_stride ...-------- [FAILED]
NEON.empty_scene ...---------- [FAILED]
NEON.empty_geometry ...---------- [FAILED]
NEON.build ...---------- [FAILED]
NEON.overlapping_primitives ...---------- [FAILED]
NEON.new_delete_geometry ...----- [FAILED]
NEON.user_geometry_id ...----- [FAILED]
NEON.enable_disable_geometry ...----- [FAILED]
NEON.update ...---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- [FAILED]
NEON.build_garbage_geom ... [FAILED]
NEON.interpolate.triangles ...------ [FAILED]
NEON.interpolate.grid ...------ [FAILED]
NEON.interpolate.subdiv ...------ [FAILED]
NEON.interpolate.hair ...------ [FAILED]
NEON.triangle_hit ...---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- [FAILED]
NEON.quad_hit ...---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- [FAILED]
NEON.intersection_filter ...-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- [FAILED]
NEON.instancing ...-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- [FAILED]
NEON.inactive_rays ...---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- [FAILED]
NEON.watertight_triangles ...------------------------------------------------------------------------------------------------ [FAILED]
NEON.watertight_triangles_mb ...------------------------------------------------------------------------------------------------ [FAILED]
NEON.watertight_quads ...------------------------------------------------------------------------------------------------ [FAILED]
NEON.watertight_quads_mb ...------------------------------------------------------------------------------------------------ [FAILED]
NEON.watertight_grids ...------------------------------------------------------------------------------------------------ [FAILED]
NEON.watertight_grids_mb ...------------------------------------------------------------------------------------------------ [FAILED]
NEON.watertight_subdiv ...!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! [PASSED]
NEON.ray_alignment_test ...------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ [FAILED]
NEON.point_query ...----------------------- [FAILED]
NEON.regression_static ... [FAILED]
NEON.regression_dynamic ... [FAILED]
NEON.regression_static_build_join ... [FAILED]
NEON.regression_dynamic_build_join ... [FAILED]
NEON.regression_static_memory_monitor ... [FAILED]
NEON.regression_dynamic_memory_monitor ... [FAILED]
NEON.geometry_state_tests ... [FAILED]
NEON.scene_modified_geometry_tests ... [FAILED]
NEON.sphere_filter_multi_hit_tests ... [PASSED]
Tests passed: 19
Tests failed: 5051
Tests failed and ignored: 96
The situation is the same when using PR #35.
This Embree fork is very useful. I was able to use it with Blender on arm64 with very little effort.
To make it usable in other software and installation scripts, it would be helpful if embree-aarch64 had official releases that could be downloaded as tar.gz archives by version number.
Thank you!
Hi,
I would like to report the findings of a code comparison we performed to assess the impact of this 'fork' on x64. Please understand that the listing below should not be treated as a list of defects; it presents the differences from the original code base. I would like to raise awareness of these changes and have a discussion about them.
We ran one elaborate performance test of the 3.11 state, which showed a low single-digit percentage performance drop compared with the original code.
The following changes were identified. The screenshots show the Intel code (left) in comparison with the master branch of this repository (right).
This code change was introduced with pull request #34.
Has there been a strong reason for any of these code changes, and are we confident they are all acceptable? Please let me know whether you regard any of these code changes as risky or avoidable. I will gladly prepare a pull request. Personally, I would prefer to revert or ifdef these points to be on the safe side.
branch: verify-neon
https://github.com/lighttransport/embree-aarch64/tree/verify-neon
Running the NEON tests (which reuse the SSE2 tests) fails on aarch64 Linux.
...
collision_regression_test ... [PASSED]
cache_regression_test ... [PASSED]
parallel_for_regression_test ... [PASSED]
parallel_reduce_regression_test ... [PASSED]
parallel_prefix_sum_regression ... [PASSED]
parallel_for_for_regression_test ... [PASSED]
parallel_for_for_prefix_sum_regression_test ... [PASSED]
parallel_partition_regression_test ... [PASSED]
RadixSortRegressionTestU32 ... [PASSED]
RadixSortRegressionTestU64 ... [PASSED]
parallel_set_regression_test ... [PASSED]
parallel_map_regression_test ... [PASSED]
parallel_filter_regression ... [PASSED]
barrier_sys_regression_test ... [PASSED]
NEON.multiple_devices ... [PASSED]
NEON.types_test ... [PASSED]
NEON.get_bounds ...++++++ [PASSED]
NEON.get_linear_bounds ...++++++ [PASSED]
NEON.get_user_data ... [PASSED]
NEON.buffer_stride ...++++++++ [PASSED]
NEON.empty_scene ...++++++++++ [PASSED]
NEON.empty_geometry ...++++++++++ [PASSED]
NEON.build ...++++++++++ [PASSED]
NEON.overlapping_primitives ...++++++++++ [PASSED]
NEON.new_delete_geometry ................................................................................................................................+++++ [PASSED]
NEON.user_geometry_id ...+++++ [PASSED]
NEON.enable_disable_geometry ...+++++ [PASSED]
NEON.update ...Segmentation fault (core dumped)
Configuration
CMAKE_BIN=cmake
rm -rf build
$CMAKE_BIN \
-DCMAKE_BUILD_TYPE=RelWithDebInfo \
-DEMBREE_ARM=On \
-DEMBREE_ADDRESS_SANITIZER=Off \
-DCMAKE_INSTALL_PREFIX=$HOME/local/embree3 \
-DCMAKE_C_COMPILER=clang \
-DCMAKE_CXX_COMPILER=clang++ \
-DEMBREE_ISPC_SUPPORT=Off \
-DEMBREE_TASKING_SYSTEM=Internal \
-DEMBREE_TUTORIALS=Off \
-DEMBREE_MAX_ISA=SSE2 \
-DEMBREE_RAY_PACKETS=On \
-Bbuild -H.
branch: verify-neon
After fixing the segfault in #22, verify still fails.
In verify, the NEON watertight tests fail to pass (e.g. NEON.watertight_triangles).
neon-fix to embree-aarch64 master (v3.6.1)
aarch64-v3.8.0 branch from Intel Embree v3.8.0 (the most recent version as of writing this issue, 2nd March 2020)
neon-fix to aarch64-v3.8.0
aarch64-v3.8.0 to master
v3.9.0 from Intel Embree
There is still some work required to improve the NEON code path, so the neon-fix branch will stay alive even after syncing embree-aarch64 with Intel Embree v3.8.0.
verify fails to pass when built without ray packets (-DEMBREE_TASKING_SYSTEM=INTERNAL), even on the x86-64 platform.
$ ./verify
create_device ...
Embree Ray Tracing Kernels 3.7.0 (72275957d467242edd283d206261f2410c5eabae)
Compiler : CLANG 8.0.0 (tags/RELEASE_800/final)
Build : Debug
Platform : Linux (64bit)
CPU : Unknown CPU (AuthenticAMD)
Threads : 32
ISA : XMM YMM SSE SSE2 SSE3 SSSE3 SSE4.1 SSE4.2 POPCNT AVX F16C RDRAND AVX2 FMA3 LZCNT BMI1 BMI2
Targets : SSE SSE2 SSE3 SSSE3 SSE4.1 SSE4.2 AVX AVXI AVX2
MXCSR : FTZ=1, DAZ=1
Config
Threads : default
ISA : XMM YMM SSE SSE2 SSE3 SSSE3 SSE4.1 SSE4.2 POPCNT AVX F16C RDRAND AVX2 FMA3 LZCNT BMI1 BMI2
Targets : SSE SSE2 SSE3 SSSE3 SSE4.1 SSE4.2 AVX AVXI AVX2 (supported)
SSE2 SSE4.2 AVX AVX2 AVX512SKX (compile time enabled)
Features: intersection_filter
Tasking : internal_tasking_system
[PASSED]
fast_allocator_regression_test ... [PASSED]
cache_regression_test ... [PASSED]
parallel_for_regression_test ... [PASSED]
parallel_reduce_regression_test ... [PASSED]
parallel_prefix_sum_regression ... [PASSED]
parallel_for_for_regression_test ... [PASSED]
parallel_for_for_prefix_sum_regression_test ... [PASSED]
parallel_partition_regression_test ... [PASSED]
RadixSortRegressionTestU32 ... [PASSED]
RadixSortRegressionTestU64 ... [PASSED]
parallel_set_regression_test ... [PASSED]
parallel_map_regression_test ... [PASSED]
parallel_filter_regression ... [PASSED]
barrier_sys_regression_test ... [PASSED]
SSE2.multiple_devices ... [PASSED]
SSE2.types_test ... [PASSED]
SSE2.get_bounds ...++++++ [PASSED]
SSE2.get_linear_bounds ...++++++ [PASSED]
SSE2.get_user_data ... [PASSED]
SSE2.buffer_stride ...++++++++ [PASSED]
SSE2.empty_scene ...++++++++++ [PASSED]
SSE2.empty_geometry ...++++++++++ [PASSED]
SSE2.build ...++++++++++ [PASSED]
SSE2.overlapping_primitives ...++++++++++ [PASSED]
SSE2.new_delete_geometry ...............................................................................................................................+++.++ [PASSED]
SSE2.user_geometry_id ...+++++ [PASSED]
SSE2.enable_disable_geometry ...+++++ [PASSED]
SSE2.update ...-----------------------------------------------------+--+++++---++---++---+++++++-+++-++----++++++++++++++++++++++++++++ [FAILED]
SSE2.build_garbage_geom ..................................................... [PASSED]
SSE2.interpolate.triangles ...++++++ [PASSED]
SSE2.interpolate.grid ...++++++ [PASSED]
SSE2.interpolate.subdiv ...++++++ [PASSED]
SSE2.interpolate.hair ...++++++ [PASSED]
SSE2.triangle_hit ...---+----++--+-+-----++-++-+--+--+-+-++++---++-+-+-+-++-----++--++++--++--+++--+-----+-+-+---++++-----+----+-+---+++-+--- [FAILED]
SSE2.quad_hit ...----------++--+-+-+---++++-+----++-+-++---+++++----+++----++-+-+--+-++----+-+-+----++++--+-++--+-+++--+-+-+-+---+-----+- [FAILED]
SSE2.intersection_filter ...-+-----+---+--+++----+--+------++--+-+---++---+++++-+-+--+---+---++-+------++--++++-++-----+++-+---+-+----+++--+++-+--++----+--+---+---+++---++-----+------++---+------+--+++---++--++---+-+++-----+++--+-++++++-++--+-+++---+--++++-++--+++-+-- [FAILED]
SSE2.instancing ...---+verify: /home/syoyo/work/embree/tutorials/verify/verify.cpp:2813: virtual VerifyApplication::TestReturnValue embree::InstancingTest::run(embree::VerifyApplication *, bool): Assertion `passed' failed.
A base app framework, based on the bitmap-plasma example from the Android NDK samples, is here:
https://github.com/lighttransport/embree-android/tree/android/tutorials/android/bitmap-render
/data/local/tmp
rtcIntersect1 fails to execute on AARCH64 when NEW_SORTING_CODE is enabled.
Debugging with gdb and AddressSanitizer shows that BVH node data (the ptr value) becomes corrupted around cur = toSizeT(s3);
https://github.com/embree/embree/blob/master/kernels/bvh/bvh_traverser1.h#L627
Workaround: NEW_SORTING_CODE is disabled for the AARCH64 target.
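The workaround amounts to a compile-time guard. A minimal sketch, assuming NEW_SORTING_CODE is a plain on/off macro; where the real definition lives and what it looks like are not shown in this report, and `kNewSortingEnabled` is a hypothetical name used only for illustration:

```cpp
// Sketch only: enable the new sorting path everywhere except AArch64.
// The macro name comes from the report; its definition site is assumed.
#if defined(__aarch64__) || defined(_M_ARM64)
  // AArch64: leave NEW_SORTING_CODE undefined so the older sorting path is used.
#else
  #define NEW_SORTING_CODE 1
#endif

// Hypothetical helper that exposes the choice to ordinary code and tests.
constexpr bool kNewSortingEnabled =
#if defined(NEW_SORTING_CODE)
    true;
#else
    false;
#endif
```

Guarding at the definition keeps the traversal code itself unchanged, so the new path can be re-enabled for AArch64 in one place once the corruption is understood.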