Comments (9)

Ricks-Lab commented on August 29, 2024

Which distro are you using? The driver files are normally world readable.
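One way to check the "world readable" claim on a given system is to look at the mode bits of the driver files directly. The following is a minimal sketch, not part of gpu-utils; the function name `world_readable_report` is made up for illustration, and it only inspects permission bits, so it cannot tell whether the driver will actually allow the read.

```python
import stat
from pathlib import Path


def world_readable_report(directory):
    """Map each regular file in `directory` to True if 'other' has read permission."""
    report = {}
    for entry in sorted(Path(directory).iterdir()):
        if entry.is_file():
            # S_IROTH is the read bit for users who are neither owner nor group.
            report[entry.name] = bool(entry.stat().st_mode & stat.S_IROTH)
    return report
```

For example, `world_readable_report("/sys/class/drm/card0/device")` would list the files under the card path, assuming the GPU is card0 on your system.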

from gpu-utils.

Ricks-Lab commented on August 29, 2024

Also, I do not recommend running with sudo. Only the execution of scripts that write to driver files needs root permissions, and gpu-pac is the only utility that writes to these files. By default, it creates bash files that you can execute yourself with sudo. If you use the --execute_pac option, it will execute the bash script with sudo, which will prompt you for credentials at the command line.

It would also be helpful to run with the --debug option and post the log file contents here. Feel free to delete any details from the log file that you do not want to make public.

kcsf commented on August 29, 2024

Hi Rick!
Thank you so much for your prompt response. I got busy and neglected to follow up. Now, of course, it's rather urgent that I knock the power usage down on these GPUs from 100 watts to 80.

Here's some info:

cg@gpu-13-23:~$ pip list | grep rickslab-gpu-utils
rickslab-gpu-utils 3.6.0
cg@gpu-13-23:~$ pip3 list | grep rickslab-gpu-utils
rickslab-gpu-utils 3.6.0
cg@gpu-13-23:~$ dpkg -l | grep gpu-utils
ii rickslab-gpu-utils 3.6.0-2 all AMD GPU performance adjustment and monitoring
cg@gpu-13-23:~$ gpu-ls --debug
Error: Invalid icon path
Ubuntu: Validated
Traceback (most recent call last):
  File "/usr/bin/gpu-ls", line 154, in <module>
    main()
  File "/usr/bin/gpu-ls", line 102, in main
    gpu_list.set_gpu_list(clinfo_flag=True)
  File "/usr/lib/python3/dist-packages/GPUmodules/GPUmodule.py", line 1885, in set_gpu_list
    pp_od_file_details = file_ptr.read()
PermissionError: [Errno 1] Operation not permitted
cg@gpu-13-23:~$ gpu-ls
Error: Invalid icon path
Detected GPUs: AMD: 1
AMD: amdgpu version: 1:6.0.60002-1718217.22.04
AMD: Wattman features enabled: 0xfffd7fff
Warning: Can not read parameter: loading, disabling for this GPU: 0
Warning: Can not read parameter: mem_loading, disabling for this GPU: 0
Warning: Can not read parameter: sclk_ps, disabling for this GPU: 0
Warning: Can not read parameter: mclk_ps, disabling for this GPU: 0
Warning: Can not read parameter: ppm, disabling for this GPU: 0
Warning: Can not read parameter: power_dpm_force, disabling for this GPU: 0
Warning: Can not read parameter: power_cap_range, disabling for this GPU: 0
Warning: Can not read parameter: power, disabling for this GPU: 0
Warning: Can not read parameter: power_cap, disabling for this GPU: 0
Warning: Can not read parameter: temperatures, disabling for this GPU: 0
Warning: Can not read parameter: voltages, disabling for this GPU: 0
Warning: Can not read parameter: frequencies, disabling for this GPU: 0
Warning: Can not read parameter: fan_speed_range, disabling for this GPU: 0
Warning: Can not read parameter: fan_pwm_range, disabling for this GPU: 0
Warning: Can not read parameter: fan_enable, disabling for this GPU: 0
Warning: Can not read parameter: fan_target, disabling for this GPU: 0
Warning: Can not read parameter: fan_speed, disabling for this GPU: 0
Warning: Can not read parameter: pwm_mode, disabling for this GPU: 0
Warning: Can not read parameter: fan_pwm, disabling for this GPU: 0
1 total GPUs, 1 rw, 0 r-only, 0 w-only

Traceback (most recent call last):
  File "/usr/bin/gpu-ls", line 154, in <module>
    main()
  File "/usr/bin/gpu-ls", line 138, in main
    gpu_list.read_gpu_pstates()
  File "/usr/lib/python3/dist-packages/GPUmodules/GPUmodule.py", line 2136, in read_gpu_pstates
    gpu.read_gpu_pstates()
  File "/usr/lib/python3/dist-packages/GPUmodules/GPUmodule.py", line 1061, in read_gpu_pstates
    for line in card_file:
PermissionError: [Errno 1] Operation not permitted
cg@gpu-13-23:~$ sudo gpu-ls --debug
Error: Invalid icon path
Ubuntu: Validated
Traceback (most recent call last):
  File "/usr/bin/gpu-ls", line 154, in <module>
    main()
  File "/usr/bin/gpu-ls", line 102, in main
    gpu_list.set_gpu_list(clinfo_flag=True)
  File "/usr/lib/python3/dist-packages/GPUmodules/GPUmodule.py", line 1885, in set_gpu_list
    pp_od_file_details = file_ptr.read()
PermissionError: [Errno 1] Operation not permitted
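The tracebacks above report errno 1 (EPERM, "Operation not permitted") rather than errno 13 (EACCES, "Permission denied"). The exception is raised while reading the file, not while opening it, and it persists under sudo. That combination may mean the driver itself is refusing the read, rather than the file mode bits blocking it. A small probe along these lines could help tell the two cases apart for a single file. It is a sketch only, `probe_read` is a hypothetical helper and not a gpu-utils function.

```python
import errno


def probe_read(path):
    """Open and read `path`, returning a short status string instead of raising."""
    try:
        # Binary mode avoids decode errors on files that are not plain text.
        with open(path, "rb") as file_ptr:
            file_ptr.read()
    except OSError as err:
        if err.errno == errno.EPERM:
            return "EPERM: operation not permitted"
        if err.errno == errno.EACCES:
            return "EACCES: permission denied"
        return f"other error: {errno.errorcode.get(err.errno, err.errno)}"
    return "ok"
```

Running it against the file that fails in the traceback (the one read into `pp_od_file_details`) and against a file that reads fine would show whether the failure is specific to certain driver files.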

Ricks-Lab commented on August 29, 2024

When the --debug option is used, a log file should be produced. Can you paste its contents here?

Also, can you upgrade to the latest version? I recently released 3.9.0 to PyPI.
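Since the earlier output shows the package registered with both pip and dpkg, and the traceback paths point at /usr/lib/python3/dist-packages, it may be worth confirming which version Python actually resolves after an upgrade. A minimal sketch using the standard library follows. The distribution name is taken from the pip output above, and `installed_version` is just an illustrative helper.

```python
from importlib import metadata


def installed_version(dist_name="rickslab-gpu-utils"):
    """Return the version string Python resolves for `dist_name`, or None if not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None
```

Calling `installed_version()` with the same interpreter that runs gpu-ls should report the upgraded version if the upgrade took effect for that interpreter.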

kcsf commented on August 29, 2024

OK, I upgraded to 3.9.

Now I'm getting this:
`cg@gpu-24-34:~$ gpu-ls --debug
Ubuntu: Validated
HW Exception by GPU node-1 (Agent handle: 0x5e41c0b8f730) reason :GPU Hang
Error: system support issue for 01:00.0: [[Errno 1] Operation not permitted]
Detected GPUs: AMD: 1
AMD: amdgpu version: 1:6.0.60002-1718217.22.04
AMD: Wattman features enabled: 0xfffd7fff
Error: System support issue for GPU [01:00.0]
Error: System support issue for GPU [01:00.0]
Error: System support issue for GPU [01:00.0]
Error: System support issue for GPU [01:00.0]
Error: System support issue for GPU [01:00.0]
Error: System support issue for GPU [01:00.0]
Error: System support issue for GPU [01:00.0]
Error: System support issue for GPU [01:00.0]
Error: System support issue for GPU [01:00.0]
Error: System support issue for GPU [01:00.0]

read_time_val: 10-Jun-2024 13:59:15

model_display: True: Cyan Skillfish
loading: True: None
mem_loading: True: None
mem_vram_usage: True: 0.06260871887207031
mem_gtt_usage: True: 0.2832306048274743
power: True: None
power_cap: True: None
energy: True: 0.0
temp_val: True: None
vddgfx_val: True: nan
fan_pwm: True: None
sclk_f_val: True: None
sclk_ps_val: True:
mclk_f_val: True: None
mclk_ps_val: True:
ppm: True:

Total of 1 GPU: 0 are rw, 1 is r-only, and 0 are w-only

Card Number: 0
Vendor: AMD
Readable: True
Writable: False
Compute: False
Device ID: {'device': '0x13fe', 'subsystem_device': '0x0000', 'subsystem_vendor': '0x1022', 'vendor': '0x1002'}
Decoded Device ID: Cyan Skillfish
Card Model: Advanced Micro Devices, Inc. [AMD/ATI] Cyan Skillfish
Display Card Model: Cyan Skillfish
PCIe ID: 01:00.0
Link Speed: 16.0 GT/s PCIe
Link Width: 16
##################################################
Driver: amdgpu
vBIOS Version: 113-AMDRBN-003
Compute Platform: None
GPU Type: Modern
HWmon: /sys/class/drm/card0/device/hwmon/hwmon0
Card Path: /sys/class/drm/card0/device
System Card Path: /sys/devices/pci0000:00/0000:00:08.1/0000:01:00.0
##################################################
##################################################
Current GTT Memory Usage (%): 0.283
Current GTT Memory Used (GB): 0.011
Total GTT Memory (GB): 3.738
Current VRAM Usage (%): 0.063
Current VRAM Used (GB): 0.005
Total VRAM (GB): 8.000
Critical Temps (C): {}
Vddgfx Offset (mV): 0
Vddgfx Offset Range (mV): [-25, 25]
##################################################
Disabled Parameters: pp_od_clk_voltage, sclk_f_range, mclk_f_range, vddc_range,
pp_features, unique_id, loading, mem_loading,
sclk_ps, mclk_ps, pstates, ppm,
power_dpm_force, power_dpm_state, power_cap_range, power,
power_cap, temperatures, voltages, frequencies,
fan_speed_range, fan_pwm_range, fan_enable, fan_target,
fan_speed, pwm_mode, fan_pwm

`

kcsf commented on August 29, 2024

gpu-utils_debug-log.txt

Am I able to control the gpu speed and/or power use yet, or is there more troubleshooting to do?
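For context on what a power cap change would involve, here is a read-only sketch of how an 80 W target could be checked against the range the driver advertises. It assumes the amdgpu hwmon directory exposes `power1_cap_min` and `power1_cap_max` in microwatts; the helper names are hypothetical. The gpu-ls output above lists power_cap and power_cap_range among the disabled parameters, so on this card the function would likely return None.

```python
from pathlib import Path

MICROWATTS_PER_WATT = 1_000_000


def read_microwatts(hwmon_dir, name):
    """Read an integer microwatt value from a hwmon file, or None if unavailable."""
    try:
        return int((Path(hwmon_dir) / name).read_text().strip())
    except (OSError, ValueError):
        return None


def plan_power_cap(hwmon_dir, target_watts):
    """Return the microwatt value to request, or None if the range is missing or exceeded."""
    cap_min = read_microwatts(hwmon_dir, "power1_cap_min")  # assumed file name
    cap_max = read_microwatts(hwmon_dir, "power1_cap_max")  # assumed file name
    if cap_min is None or cap_max is None:
        return None
    target = int(target_watts * MICROWATTS_PER_WATT)
    if not cap_min <= target <= cap_max:
        return None
    return target
```

The sketch never writes to the driver. With the HWmon path from the output above, the call would be `plan_power_cap("/sys/class/drm/card0/device/hwmon/hwmon0", 80)`.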

Ricks-Lab commented on August 29, 2024

I am running Ubuntu 22.04 on two systems and do not see the issue of driver files not being readable. This is possibly a driver or hardware issue, or the feature definition may be different for newer GPUs. I suggest updating the value reported as:
AMD: Wattman features enabled: 0xfffd7fff to 0xffffffff
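To see what the suggested change actually toggles, the two masks can be compared bit by bit. This is plain integer arithmetic and makes no claim about which driver feature each bit controls, since that mapping depends on the driver version. `differing_bits` is an illustrative name.

```python
def differing_bits(current_mask, target_mask):
    """Return the bit positions (0 = least significant) where the two masks differ."""
    diff = current_mask ^ target_mask
    return [bit for bit in range(diff.bit_length()) if (diff >> bit) & 1]


print(differing_bits(0xFFFD7FFF, 0xFFFFFFFF))  # → [15, 17]
```

So moving from 0xfffd7fff to 0xffffffff would turn on two additional bits, 15 and 17, and leave the rest unchanged.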

Here is what ChatGPT has to say:

The error message you're encountering indicates a hardware exception caused by a GPU hang. This can be due to several factors, including hardware failures, driver issues, or system configuration problems. Here's a step-by-step guide to troubleshoot and address this issue:

1. Check System Logs:
   - Look into system logs for more detailed error messages. On Linux, you can use dmesg or check /var/log/syslog or /var/log/messages.

2. Update GPU Drivers:
   - Ensure that your GPU drivers are up to date. You can download the latest drivers from the GPU manufacturer's website (NVIDIA, AMD, etc.).

3. Check Hardware:
   - Ensure that the GPU is properly seated in its slot and that all power connectors are securely attached.
   - Monitor the GPU temperature to ensure it is not overheating. You can use tools like nvidia-smi for NVIDIA GPUs or radeontop for AMD GPUs.

4. Test GPU on Another System:
   - If possible, test the GPU on a different system to rule out hardware failure.

5. Verify System Configuration:
   - Ensure that your system's power supply is adequate for the GPU.
   - Check for BIOS/UEFI updates for your motherboard and apply them if necessary.
   - Disable any overclocking settings and see if the problem persists.

6. Check Permissions:
   - The error message "Operation not permitted" suggests there might be a permissions issue. Make sure you have the necessary permissions to access the GPU. Running the operation as root or with sudo might help.

7. Consult Documentation:
   - Refer to the documentation for your specific GPU and system for any known issues or configuration tips.

8. Contact Support:
   - If the problem persists, consider reaching out to the GPU manufacturer's support or your system's support service for further assistance.

By systematically going through these steps, you should be able to identify and resolve the issue causing the GPU hang.
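For the log-checking step, a small filter can cut kernel log output down to the lines that mention the GPU driver. This is a sketch under assumptions: the default keywords are guesses at what might appear in relevant lines, and `filter_log_lines` is a hypothetical helper rather than anything shipped with gpu-utils.

```python
def filter_log_lines(lines, keywords=("amdgpu", "gpu hang")):
    """Return the lines containing any keyword, compared case-insensitively."""
    lowered = tuple(keyword.lower() for keyword in keywords)
    return [
        line.rstrip("\n")
        for line in lines
        if any(keyword in line.lower() for keyword in lowered)
    ]
```

It accepts any iterable of lines, so it could be fed `sys.stdin` with dmesg output piped in, or an open handle on /var/log/syslog. Reading either source may itself need elevated privileges on some systems.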

kcsf commented on August 29, 2024

Dang. I've tried most of that. It's a BC-250 (re-purposed PS5 card).
There are no BIOS updates for it. The only thing I can think to try is updating the kernel and OS to 24.04, but it took me a long time to find an old kernel that worked in the first place.

Any ideas or suggestions would be much appreciated.

Ricks-Lab commented on August 29, 2024

I really doubt that any of this would be enabled for PS5 hardware.
