Comments (12)

LAKostis commented on August 16, 2024

> @LAKostis this is weird, because the blender-3.2 splashscreen renders fine for me on all the devices I've tested. Are you sure your Blender is not using your cycles cache? Maybe try clearing ~/.cache/cycles before every run and install the fatbin files into /usr/share/blender/4.1/scripts/addons/cycles/lib for consistency?
>
> I also see that you seem to be using the Blender provided by your distro. For consistency, can you download the official binaries from Blender and try testing with that instead?

Yes, I specifically tested this before the rocm-6.0.x migration, and this demo started failing only with rocm-6.0.x. Clearing the cache doesn't help. It may be a device-specific issue: this demo crashes on my RX 6700 XT (gfx1031) but works on gfx900 (with rendering artifacts, and only after setting HSA_ENABLE_SDMA=0).

Regarding the Blender build: I'm the Blender package maintainer in this distro, so I know which build options were used there :) For the sake of clarity, the official Blender builds crash in exactly the same way. I can provide additional information or logs if you need them.

from llvm-project.

LAKostis commented on August 16, 2024

> HSA_OVERRIDE_GFX_VERSION

No, with HSA_OVERRIDE_GFX_VERSION=gfx1030 Blender doesn't start; it fails with the error "HIP hipInit: Invalid device".

Interestingly, though, it works when the -O level is lowered:

Read prefs: "/home/lakostis/.config/blender/4.1/config/userpref.blend"
Read blend: "/home/lakostis/Downloads/Blender 3.blend"
Compiling HIP kernel ...
hipcc -Wno-parentheses-equality -Wno-unused-value --hipcc-func-supp -O3 -ffast-math --amdgpu-target=gfx1031 -I /usr/share/blender/4.1/scripts/addons/cycles/source --genco /usr/share/blender/4.1/scripts/addons/cycles/source/kernel/device/hip/kernel.cpp -o "/home/lakostis/.cache/cycles/kernels/cycles_kernel_gfx1031_F68CBA054A76B5B26A931C269B2CF458"
Warning: The --hipcc-func-supp option has been deprecated and will be removed in the future.
Warning: The --amdgpu-target option has been deprecated and will be removed in the future.  Use --offload-arch instead.
Kernel compilation finished in 50.93s.
Memory access fault by GPU node-1 (Agent handle: 0x7fa997041200) on address 0x7fa9f1b5f000. Reason: Page not present or supervisor privilege.
Aborted
...
❯ blender
Read prefs: "/home/lakostis/.config/blender/4.1/config/userpref.blend"
Read blend: "/home/lakostis/Downloads/Blender 3.blend"
❯ hipcc -Wno-parentheses-equality -Wno-unused-value --hipcc-func-supp -O1 -ffast-math --amdgpu-target=gfx1031 -I /usr/share/blender/4.1/scripts/addons/cycles/source --genco /usr/share/blender/4.1/scripts/addons/cycles/source/kernel/device/hip/kernel.cpp -o "/home/lakostis/.cache/cycles/kernels/cycles_kernel_gfx1031_F68CBA054A76B5B26A931C269B2CF458"
Warning: The --hipcc-func-supp option has been deprecated and will be removed in the future.
Warning: The --amdgpu-target option has been deprecated and will be removed in the future.  Use --offload-arch instead.
...

With -O1, rendering works, and it starts crashing at -O2 and above. So something is not right with optimization.
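To narrow this down, one kernel can be built per optimization level and tested in turn. The following is only a sketch: it prints the hipcc commands instead of running them, it reuses the flags from the log above, and the `CYCLES_SRC`, `OUT_DIR` and `ARCH` variables as well as the `.O<level>` output suffix are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch: print one hipcc command per optimization level so that each
# kernel can be built to its own file and tested separately.
# CYCLES_SRC, OUT_DIR and ARCH are assumptions; adjust them to the local install.
CYCLES_SRC="${CYCLES_SRC:-/usr/share/blender/4.1/scripts/addons/cycles/source}"
OUT_DIR="${OUT_DIR:-$HOME/.cache/cycles/kernels}"
ARCH="${ARCH:-gfx1031}"

# kernel_cmd LEVEL: print the compile command for -O<LEVEL>.
# The output name with the .O<LEVEL> suffix is hypothetical.
kernel_cmd() {
    printf 'hipcc -Wno-parentheses-equality -Wno-unused-value --hipcc-func-supp -O%s -ffast-math --amdgpu-target=%s -I %s --genco %s/kernel/device/hip/kernel.cpp -o %s/cycles_kernel_%s.O%s\n' \
        "$1" "$ARCH" "$CYCLES_SRC" "$CYCLES_SRC" "$OUT_DIR" "$ARCH" "$1"
}

# Print the commands for the three levels discussed in this thread.
for level in 1 2 3; do
    kernel_cmd "$level"
done
```

Piping a printed command to `sh` (for example `kernel_cmd 1 | sh`) would run it, provided the paths contain no spaces. Blender appears to load the kernel by its cache file name, so the build under test has to be copied over that name before launching.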

> Also what kernel version are you on?

I'm using the 6.6.25 kernel plus patches up to v6.5-2638-gbf901afac5d5f from amd-staging-drm-next.

LAKostis commented on August 16, 2024

UPDATE: more odd behavior from the compiler:

hipcc -Wno-parentheses-equality -Wno-unused-value --hipcc-func-supp -O1 -ffast-math --amdgpu-target=gfx1031 -I /usr/share/blender/4.1/scripts/addons/cycles/source --genco /usr/share/blender/4.1/scripts/addons/cycles/source/kernel/device/hip/kernel.cpp -o "/home/lakostis/.cache/cycles/kernels/cycles_kernel_gfx1031_F68CBA054A76B5B26A931C269B2CF458"

This command produces a working kernel despite the warnings about deprecated options. But this command:

hipcc -Wno-parentheses-equality -Wno-unused-value -O1 -ffast-math --offload-arch=gfx1031 -I /usr/share/blender/4.1/scripts/addons/cycles/source --genco /usr/share/blender/4.1/scripts/addons/cycles/source/kernel/device/hip/kernel.cpp -o "/home/lakostis/.cache/cycles/kernels/cycles_kernel_gfx1031_F68CBA054A76B5B26A931C269B2CF458"

produces a kernel that crashes. The two kernels are also not identical:

-rw-r--r-- 1 lakostis lakostis 4286552 Apr  7 13:10 cycles_kernel_gfx1031_F68CBA054A76B5B26A931C269B2CF458.crash_O1
-rw-r--r-- 1 lakostis lakostis 3213824 Apr  7 13:09 cycles_kernel_gfx1031_F68CBA054A76B5B26A931C269B2CF458.works_O1
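A quick way to confirm that two builds really differ, beyond the sizes in the listing, is a byte comparison. This is a minimal sketch; `compare_kernels` is a hypothetical helper name and it only relies on `cmp` and `wc`.

```shell
#!/bin/sh
# Sketch: report whether two compiled kernels are byte-identical and,
# if they are not, print both sizes in bytes.
compare_kernels() {
    if cmp -s "$1" "$2"; then
        echo "identical"
    else
        # %d normalizes any padding that some wc implementations add.
        printf 'different: %d vs %d bytes\n' "$(wc -c < "$1")" "$(wc -c < "$2")"
    fi
}
```

With the files above it would be called as `compare_kernels cycles_kernel_gfx1031_F68CBA054A76B5B26A931C269B2CF458.crash_O1 cycles_kernel_gfx1031_F68CBA054A76B5B26A931C269B2CF458.works_O1`.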

LAKostis commented on August 16, 2024

I can confirm that reverting that commit (30a3adf) fixes the crash with other scenes such as classroom, but not with blender-3.2 (https://cloud.blender.org/p/gallery/629f23f908e12d4ff15241d3), which still crashes with a similar error (this happens only with rocm-6.x):

Compiling HIP kernel ...
hipcc -Wno-parentheses-equality -Wno-unused-value --hipcc-func-supp -O3 -ffast-math --amdgpu-target=gfx1031 -I /usr/share/blender/4.1/scripts/addons/cycles/source --genco /usr/share/blender/4.1/scripts/addons/cycles/source/kernel/device/hip/kernel.cpp -o "/home/lakostis/.cache/cycles/kernels/cycles_kernel_gfx1031_F68CBA054A76B5B26A931C269B2CF458"
Warning: The --hipcc-func-supp option has been deprecated and will be removed in the future.
Warning: The --amdgpu-target option has been deprecated and will be removed in the future.  Use --offload-arch instead.
Kernel compilation finished in 50.97s.
Read blend: "/home/lakostis/Downloads/Blender 3.blend"
Memory access fault by GPU node-1 (Agent handle: 0x7fd727132600) on address 0x7fd7932ca000. Reason: Page not present or supervisor privilege.
Aborted

GZGavinZhao commented on August 16, 2024

@LAKostis this is weird, because the blender-3.2 splashscreen renders fine for me on all the devices I've tested. Are you sure your Blender is not using your cycles cache? Maybe try clearing ~/.cache/cycles before every run and install the fatbin files into /usr/share/blender/4.1/scripts/addons/cycles/lib for consistency?

I also see that you seem to be using the Blender provided by your distro. For consistency, can you download the official binaries from Blender and try testing with that instead?
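Clearing the cache before every run could be wrapped in a small helper. This is a sketch only: `clear_cycles_cache` is a hypothetical name, and the default path is the ~/.cache/cycles directory mentioned above.

```shell
#!/bin/sh
# Sketch: remove the Cycles kernel cache so the next run cannot reuse
# a previously compiled kernel.
# The default path is the one mentioned in the thread; pass another
# directory as the first argument to override it.
clear_cycles_cache() {
    cache_dir="${1:-$HOME/.cache/cycles}"
    rm -rf -- "$cache_dir"
}
```

A test run would then look like `clear_cycles_cache && blender`.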

GZGavinZhao commented on August 16, 2024

Thanks for the quick response!

> It can be device specific issue - this demo crashes on my RX 6700 XT (gfx1031) but works on gfx900 (with rendering artifacts and only after setting HSA_ENABLE_SDMA=0)

Just curious, does compiling against gfx1030 and running Blender through HSA_OVERRIDE_GFX_VERSION on your RX 6700 XT change anything? Also, what kernel version are you on?
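For reference, a sketch of how the override value could be derived. To my knowledge the ROCm runtime reads HSA_OVERRIDE_GFX_VERSION as a dotted major.minor.stepping value (10.3.0 for gfx1030) rather than as the gfx target name; treat that as an assumption to verify against the ROCm documentation. `gfx_to_override` is a hypothetical helper and only handles all-decimal target names.

```shell
#!/bin/sh
# Sketch: turn an all-decimal gfx target name into the dotted form
# major.minor.stepping (assumed format of HSA_OVERRIDE_GFX_VERSION).
# The last digit is the stepping, the one before it the minor version,
# and whatever remains is the major version.
gfx_to_override() {
    digits="${1#gfx}"
    case "$digits" in
        # Reject names with non-digits (e.g. gfx90a) or fewer than 3 digits.
        *[!0-9]*|""|?|??) echo "unsupported target: $1" >&2; return 1 ;;
    esac
    stepping="${digits#"${digits%?}"}"
    rest="${digits%?}"
    minor="${rest#"${rest%?}"}"
    major="${rest%?}"
    echo "$major.$minor.$stepping"
}
```

The experiment would then be roughly `HSA_OVERRIDE_GFX_VERSION="$(gfx_to_override gfx1030)" blender`, with the kernel compiled for gfx1030.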

LAKostis commented on August 16, 2024

UPDATE: checked with a recent ROCm and Blender 4.2.0

kernel 6.9.10

❯ rpm -qa|fgrep 6.1.2-alt0
fgrep: warning: fgrep is obsolescent; using grep -F
rocm-comgr-devel-6.1.2-alt0.2.x86_64
llvm-rocm-6.1.2-alt0.2.x86_64
clang-rocm-6.1.2-alt0.2.x86_64
hip-devel-6.1.2-alt0.2.x86_64
librocm-smi1-6.1.2-alt0.2.x86_64
rocminfo-6.1.2-alt0.1.x86_64
clang-rocm-tools-6.1.2-alt0.2.x86_64
rocm-smi-6.1.2-alt0.2.x86_64
rocm-opencl-runtime-6.1.2-alt0.2.x86_64
llvm-rocm-filesystem-6.1.2-alt0.2.x86_64
libhsakmt1-6.1.2-alt0.1.x86_64
clang-rocm-libs-support-6.1.2-alt0.2.x86_64
libhsa-runtime1-6.1.2-alt0.1.x86_64
hip-runtime-amd-6.1.2-alt0.2.x86_64
clang-rocm-libs-6.1.2-alt0.2.x86_64
lld-rocm-6.1.2-alt0.2.x86_64
rocm-device-libs-6.1.2-alt0.2.x86_64
libamd_comgr2-6.1.2-alt0.2.x86_64
hipcc-6.1.2-alt0.2.x86_64

❯ rpm -q blender
blender-4.2.0-alt0.1.x86_64

If I compile the gfx1031 kernel with the previous workaround (--hipcc-func-supp -O1), Blender crashes on every render with this error:

❯ blender
register_class(...):
Info: Registering key-config preferences class: 'Prefs', bl_idname 'Blender' has been registered before, unregistering previous
register_class(...):
Info: Registering key-config preferences class: 'Prefs', bl_idname 'Blender' has been registered before, unregistering previous
Read blend: "/home/lakostis/Downloads/bmw27_gpu.blend"
Warning: region type 4 missing in space type "Info" (id: 7) - removing region
:0:rocdevice.cpp            :2895: 420951542923 us: [pid:4113852 tid:0x7f5723200000] Callback: Queue 0x7f5600700000 aborting with error : HSA_STATUS_ERROR_MEMORY_APERTURE_VIOLATION: The agent attempted to access memory beyond the largest legal address. code: 0x29
Aborted

If I compile the kernel with the default upstream options (-O3), it renders most scenes (bmw/classroom) but still crashes on the Blender 3 scene:

Read blend: "/home/lakostis/Downloads/bmw27_gpu.blend"
Warning: region type 4 missing in space type "Info" (id: 7) - removing region
Compiling HIP kernel ...
hipcc -Wno-parentheses-equality -Wno-unused-value -O3 -ffast-math --offload-arch=gfx1031 -I /usr/share/blender/4.2/scripts/addons_core/cycles/source --genco /usr/share/blender/4.2/scripts/addons_core/cycles/source/kernel/device/hip/kernel.cpp -o "/home/lakostis/.cache/cycles/kernels/cycles_kernel_gfx1031_210C856BB7ABA617B857E9D03ED272C1"
Kernel compilation finished in 110.52s.
...
Read blend: "/home/lakostis/Downloads/Blender 3.blend"
Memory access fault by GPU node-1 (Agent handle: 0x7f115fbc1800) on address 0x7f0ffe842000. Reason: Page not present or supervisor privilege.
Aborted

If I compile with -O2, the behavior is the same: the Blender 3 scene crashes.

But everything works with -O1:

❯ hipcc -Wno-parentheses-equality -Wno-unused-value -O1 -ffast-math --offload-arch=gfx1031 -I /usr/share/blender/4.2/scripts/addons_core/cycles/source --genco /usr/share/blender/4.2/scripts/addons_core/cycles/source/kernel/device/hip/kernel.cpp -o "/home/lakostis/.cache/cycles/kernels/cycles_kernel_gfx1031_210C856BB7ABA617B857E9D03ED272C1"

❯ blender
Read blend: "/home/lakostis/Downloads/Blender 3.blend"
Saved session recovery to "/tmp/.private/lakostis/quit.blend"
Writing userprefs: "/home/lakostis/.config/blender/4.2/config/userpref.blend" ok
Info: Preferences saved
                                                                                                                                             
Blender quit

GZGavinZhao commented on August 16, 2024

Does reverting 30a3adf still help? I'm considering just reverting this patch for Solus's ROCm 6.1.2.

LAKostis commented on August 16, 2024

pravinjagtap commented on August 16, 2024

Can you try after applying the following patches:
llvm@c86a1e6
llvm@9ff7181
llvm@56af0e9

LAKostis commented on August 16, 2024

I tested with the commit reverted; I can try again without reverting.

Without reverting that commit, everything works with -O1 but fails with -O2 and above.

LAKostis commented on August 16, 2024

> Can try after applying following patches: llvm@c86a1e6 llvm@9ff7181 llvm@56af0e9

Hey! Those patches are already applied in rocm-llvm, somewhere in this bulk commit (1ce2523):

❯ patch -p1 --dry-run < ../c86a1e6903e9935b808c1406f480c769279b69fa.patch
checking file llvm/lib/Transforms/Scalar/GVN.cpp
Hunk #1 succeeded at 487 (offset 14 lines).
Hunk #2 FAILED at 2789.
1 out of 2 hunks FAILED
checking file llvm/lib/Transforms/Scalar/NewGVN.cpp
Reversed (or previously applied) patch detected!  Assume -R? [n] 
Apply anyway? [n] 
Skipping patch.
1 out of 1 hunk ignored
checking file llvm/test/Transforms/GVN/convergent.ll
Reversed (or previously applied) patch detected!  Assume -R? [n] 
Apply anyway? [n] 
Skipping patch.
1 out of 1 hunk ignored
checking file llvm/test/Transforms/NewGVN/convergent.ll
Reversed (or previously applied) patch detected!  Assume -R? [n] 
Apply anyway? [n] 
Skipping patch.
1 out of 1 hunk ignored

❯ patch -p1 --dry-run < ../9ff71814cb5d71e907feaa0b3165e866b882f9aa.patch
checking file llvm/lib/Transforms/Scalar/EarlyCSE.cpp
Reversed (or previously applied) patch detected!  Assume -R? [n] 
Apply anyway? [n] 
Skipping patch.
3 out of 3 hunks ignored
checking file llvm/test/Transforms/EarlyCSE/AMDGPU/convergent-call.ll
Reversed (or previously applied) patch detected!  Assume -R? [n] 
Apply anyway? [n] 
Skipping patch.
2 out of 2 hunks ignored

❯ patch -p1 --dry-run < ../56af0e913ce7ec29690cc7295d75fc5573153bbf.patch
checking file llvm/lib/Transforms/Scalar/EarlyCSE.cpp
Hunk #1 succeeded at 336 with fuzz 2 (offset 18 lines).
Hunk #2 FAILED at 352.
1 out of 2 hunks FAILED
checking file llvm/test/CodeGen/AMDGPU/cse-convergent.ll
Reversed (or previously applied) patch detected!  Assume -R? [n] 
Apply anyway? [n] 
Skipping patch.
1 out of 1 hunk ignored
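The same check can be scripted so it never prompts. This is a sketch assuming GNU patch and -p1 style paths, run from the top of the source tree; `patch_status` is a hypothetical helper. A patch counts as applied only if it reverses cleanly, so mixed results like the ones above would be reported as partial or conflicting.

```shell
#!/bin/sh
# Sketch: classify a patch against the current tree without modifying it.
# --dry-run keeps the tree untouched, -f suppresses the interactive
# questions and -s silences the per-file output.
patch_status() {
    if patch -p1 -R -f -s --dry-run < "$1" >/dev/null 2>&1; then
        # The reverse patch applies cleanly, so the change is present.
        echo "already applied"
    elif patch -p1 -f -s --dry-run < "$1" >/dev/null 2>&1; then
        # The forward patch applies cleanly, so the change is missing.
        echo "not applied"
    else
        echo "partial or conflicting"
    fi
}
```

For example: `for p in ../*.patch; do echo "$p: $(patch_status "$p")"; done`.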
