Ricks-Lab commented on August 29, 2024

Thanks for raising a bug report. I think you provided enough details to figure it out. I will let you know when I have pushed an update.

Ricks-Lab commented on August 29, 2024

I pushed an update that should fix it, but I could not test it. Let me know what you observe.

DanaGoyette commented on August 29, 2024

Thanks, now gpu-ls says this (after rebooting to set ppfeaturemask).

Since I haven't really used these tools before, I don't know what to expect, but it makes sense that there's no WattMan: the same is true on Windows, where you can't really tune anything.

Detected GPUs: AMD: 1
amdgpu/rocm version: UNKNOWN
AMD: Wattman features not enabled: 0xfff7bfff, See README file.
1 total GPUs, 0 rw, 0 r-only, 0 w-only

Card Number: None
   Vendor: AMD
   Readable: False
   Writable: False
   Compute: False
   Device ID: {'device': '0x67e3', 'subsystem_device': '0x0b0d', 'subsystem_vendor': '0x1002', 'vendor': '0x1002'}
   Decoded Device ID: Baffin [Radeon Pro WX 4100]
   Card Model: Advanced Micro Devices, Inc. [AMD/ATI] Baffin [Radeon Pro WX 4100]
   PCIe ID: 0004:01:00.0
   Driver: amdgpu
   GPU Type: Unsupported
   HWmon: None
   Card Path: None
   System Card Path: /sys/devices/pci0004:01/0004:01:00.0

Full debug log:

DEBUG:gpu-utils:env.set_args:Install type: debian
DEBUG:gpu-utils:env.set_args:Command line arguments:
  Namespace(about=False, short=False, table=False, pstates=False, ppm=False, clinfo=False, no_fan=False, debug=True)
DEBUG:gpu-utils:env.set_args:Local TZ: PDT
DEBUG:gpu-utils:env.set_args:pciid path set to: /usr/share/misc/pci.ids
DEBUG:gpu-utils:env.set_args:Icon path set to: /usr/share/rickslab-gpu-utils/icons
DEBUG:gpu-utils:gpu-ls.main:########## gpu-ls 3.6.2
DEBUG:gpu-utils:env.check_env:Using python: 3.9.7
DEBUG:gpu-utils:env.check_env:Using Linux Kernel: 5.15.28-cex7
DEBUG:gpu-utils:env.check_env:Using Linux Distro: Ubuntu
DEBUG:gpu-utils:env.check_env:Linux Distro Description: Ubuntu 21.10
DEBUG:gpu-utils:env.check_env:Distro: Ubuntu, Ubuntu 21.10
DEBUG:gpu-utils:env.check_env:lspci path: /usr/bin/lspci
DEBUG:gpu-utils:env.check_env:clinfo path: /usr/bin/clinfo
DEBUG:gpu-utils:env.check_env:Ubuntu package query tool: /usr/bin/dpkg
DEBUG:gpu-utils:GPUmodule.read_gpu_opencl_data:openCL map CL_DEVICE_NAME: [AMD Radeon (TM) Pro WX 4100 (POLARIS11, DRM 3.42.0, 5.15.28-cex7, LLVM 12.0.1)]
DEBUG:gpu-utils:GPUmodule.read_gpu_opencl_data:openCL map CL_DEVICE_VERSION: [OpenCL 1.1 Mesa 21.2.6]
DEBUG:gpu-utils:GPUmodule.read_gpu_opencl_data:openCL map CL_DRIVER_VERSION: [21.2.6]
DEBUG:gpu-utils:GPUmodule.read_gpu_opencl_data:openCL map CL_DEVICE_OPENCL_C_VERSION: [OpenCL C 1.1]
DEBUG:gpu-utils:GPUmodule.read_gpu_opencl_data:openCL map CL_DEVICE_MAX_COMPUTE_UNITS: [16]
DEBUG:gpu-utils:GPUmodule.read_gpu_opencl_data:openCL map CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS: [3]
DEBUG:gpu-utils:GPUmodule.read_gpu_opencl_data:openCL map CL_DEVICE_MAX_WORK_ITEM_SIZES: [256 256 256]
DEBUG:gpu-utils:GPUmodule.read_gpu_opencl_data:openCL map CL_DEVICE_MAX_WORK_GROUP_SIZE: [256]
DEBUG:gpu-utils:GPUmodule.read_gpu_opencl_data:openCL map CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE: [64]
DEBUG:gpu-utils:GPUmodule.read_gpu_opencl_data:openCL map CL_DEVICE_MAX_MEM_ALLOC_SIZE: [3435973836]
DEBUG:gpu-utils:GPUmodule.set_gpu_list:OpenCL map: {None: {'prf_wg_multiple': '64', 'max_wg_size': '256', 'prf_wg_size': None, 'max_wi_sizes': '256 256 256', 'max_wi_dim': '3', 'max_mem_allocation': '3435973836', 'simd_ins_width': None, 'simd_width': None, 'simd_per_cu': None, 'max_cu': '16', 'device_name': 'AMD Radeon (TM) Pro WX 4100 (POLARIS11, DRM 3.42.0, 5.15.28-cex7, LLVM 12.0.1)', 'opencl_version': 'OpenCL C 1.1', 'driver_version': '21.2.6', 'device_version': 'OpenCL 1.1 Mesa 21.2.6'}}
DEBUG:gpu-utils:env.read_amdfeaturemask:Raw Featuremask string: [0xfff7bfff]
DEBUG:gpu-utils:env.read_amdfeaturemask:AMD featuremask: 0xfff7bfff
DEBUG:gpu-utils:GPUmodule.get_gpu_pci_list:Found GPU pci: 0004:01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Baffin [Radeon Pro WX 4100]
DEBUG:gpu-utils:GPUmodule.set_gpu_list:Found 1 GPUs
DEBUG:gpu-utils:GPUmodule.add:Added GPU Item e8020b1d36c540ccb5aa3eeedb97fe8e to GPU List
DEBUG:gpu-utils:GPUmodule.set_gpu_list:GPU: 0004:01:00.0
DEBUG:gpu-utils:GPUmodule.set_gpu_list:lspci output items:
 ['0004:01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Baffin [Radeon Pro WX 4100]', '\tSubsystem: Advanced Micro Devices, Inc. [AMD/ATI] Baffin [Radeon Pro WX 4100]', '\tKernel driver in use: amdgpu', '\tKernel modules: amdgpu', '']
DEBUG:gpu-utils:GPUmodule.set_gpu_list:gpu_name: [Advanced Micro Devices, Inc. [AMD/ATI] Baffin [Radeon Pro WX 4100]]
DEBUG:gpu-utils:GPUmodule.set_gpu_list:sysfpath: /sys/devices/pci0004:01/0004:01:00.0
device_dir: /sys/class/drm/card0/device
DEBUG:gpu-utils:GPUmodule.set_gpu_list:card_path not set for: 0004:01:00.0
DEBUG:gpu-utils:GPUmodule.set_gpu_list:GPU[e8020b1d36c540ccb5aa3eeedb97fe8e] type set to Unsupported
DEBUG:gpu-utils:GPUmodule.set_gpu_list:/sys/device file search found match to pcie_id 0004:01:00.0:
['/sys/devices/pci0004:01/0004:01:00.0']
DEBUG:gpu-utils:GPUmodule.populate_prm_from_dict:prm dict:
{'pcie_id': '0004:01:00.0', 'model': 'Advanced Micro Devices, Inc. [AMD/ATI] Baffin [Radeon Pro WX 4100]', 'vendor': <vendor.AMD: 3>, 'driver': 'amdgpu', 'card_path': '', 'sys_card_path': '/sys/devices/pci0004:01/0004:01:00.0', 'gpu_type': <type.Unsupported: 2>, 'hwmon_path': '', 'readable': False, 'writable': False, 'compute': False, 'compute_platform': None}
DEBUG:gpu-utils:GPUmodule.set_gpu_list:Card flags: readable: False, writable: False, type: Unsupported
DEBUG:gpu-utils:GPUmodule.read_gpu_sensor_generic:sensor path set to [/sys/devices/pci0004:01/0004:01:00.0]
DEBUG:gpu-utils:GPUmodule.set_params_value:Set param value: [['0x1002', '0x67e3', '0x1002', '0x0b0d']], type: [<class 'list'>]
DEBUG:gpu-utils:GPUmodule.wattman_status:AMD featuremask: 0xfff7bfff
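The `Wattman features not enabled: 0xfff7bfff` message hinges on the amdgpu `ppfeaturemask` module parameter, which the log shows being read. As a minimal sketch, assuming the relevant flag is the overdrive bit `PP_OVERDRIVE_MASK` (0x4000 in the kernel's `amd_shared.h`), a check could look like the Python below. The helper names are hypothetical and this is not necessarily the exact test gpu-utils performs. In 0xfff7bfff that bit is cleared, so overdrive reports as not enabled.

```python
from pathlib import Path
from typing import Optional

# Overdrive (manual clock/voltage tuning) bit, as defined in the kernel's
# enum PP_FEATURE_MASK in amd_shared.h.
PP_OVERDRIVE_MASK = 0x4000

# amdgpu exposes its module parameters read-only under /sys/module.
FEATUREMASK_PATH = Path("/sys/module/amdgpu/parameters/ppfeaturemask")


def parse_featuremask(raw: str) -> int:
    """Parse '0xfff7bfff' (hex) or '4294426623' (decimal) into an int."""
    # Base 0 lets int() accept either form, whichever the kernel prints.
    return int(raw.strip(), 0)


def overdrive_enabled(mask: int) -> bool:
    """True if the overdrive bit is set in the feature mask."""
    return bool(mask & PP_OVERDRIVE_MASK)


def read_featuremask(path: Path = FEATUREMASK_PATH) -> Optional[int]:
    """Return the running ppfeaturemask, or None if it cannot be read."""
    try:
        return parse_featuremask(path.read_text())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    mask = read_featuremask()
    if mask is None:
        print("amdgpu ppfeaturemask not readable")
    else:
        state = "enabled" if overdrive_enabled(mask) else "not enabled"
        print(f"ppfeaturemask {mask:#010x}: overdrive {state}")
```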

Ricks-Lab commented on August 29, 2024

Maybe there is also a difference in the way your system defines or uses what I am calling the card_path, which typically contains a link to the system device file. Can you check the contents of "/sys/class/drm/"? The contents of the directory gpu-ls reports as the "System Card Path" would also be useful.

AMD GPUs before Fiji are not well supported by the Linux drivers, so the available capability for Baffin may be limited. But I am motivated to figure out how to handle the card path and device path for this type of installation.
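Assuming each `/sys/class/drm/cardN` entry is a symlink into the PCI device tree, one way to find the card path is to resolve `cardN/device` and compare its last path component with the full PCIe ID. A minimal sketch under that assumption (`find_card_path` is a hypothetical helper, not the gpu-utils implementation); because it compares the whole domain-qualified address, a GPU in domain 0004 matches exactly like one in domain 0000.

```python
from pathlib import Path
from typing import Optional

DRM_CLASS = Path("/sys/class/drm")


def find_card_path(pcie_id: str, drm_class: Path = DRM_CLASS) -> Optional[Path]:
    """Return <drm_class>/cardN/device for the GPU at pcie_id, or None.

    pcie_id must be fully qualified, domain included, e.g. '0004:01:00.0'.
    """
    if not drm_class.is_dir():
        return None
    for entry in sorted(drm_class.iterdir()):
        # Skip connector entries (card0-DP-1), render nodes and 'version'.
        if not entry.name.startswith("card") or "-" in entry.name:
            continue
        device = entry / "device"
        # device resolves to e.g. /sys/devices/pci0004:01/0004:01:00.0, whose
        # final component is the domain-qualified PCI address.
        if device.exists() and device.resolve().name == pcie_id:
            return device
    return None


if __name__ == "__main__":
    print(find_card_path("0004:01:00.0"))
```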

DanaGoyette commented on August 29, 2024

/sys/class/drm:

lrwxrwxrwx  1 root root    0 Mar 21 17:16 card0 -> ../../devices/pci0004:01/0004:01:00.0/drm/card0
lrwxrwxrwx  1 root root    0 Mar 21 17:16 card0-DP-1 -> ../../devices/pci0004:01/0004:01:00.0/drm/card0/card0-DP-1
lrwxrwxrwx  1 root root    0 Mar 21 17:16 card0-DP-2 -> ../../devices/pci0004:01/0004:01:00.0/drm/card0/card0-DP-2
lrwxrwxrwx  1 root root    0 Mar 21 17:16 card0-DP-3 -> ../../devices/pci0004:01/0004:01:00.0/drm/card0/card0-DP-3
lrwxrwxrwx  1 root root    0 Mar 21 17:16 card0-DP-4 -> ../../devices/pci0004:01/0004:01:00.0/drm/card0/card0-DP-4
lrwxrwxrwx  1 root root    0 Mar 21 17:16 renderD128 -> ../../devices/pci0004:01/0004:01:00.0/drm/renderD128
-r--r--r--  1 root root 4096 Mar 21 17:16 version

/sys/class/hwmon:

lrwxrwxrwx  1 root root 0 Mar 21 17:16 hwmon0 -> ../../devices/virtual/thermal/thermal_zone0/hwmon0
lrwxrwxrwx  1 root root 0 Mar 21 17:16 hwmon1 -> ../../devices/pci0004:01/0004:01:00.0/hwmon/hwmon1

/sys/class/hwmon/hwmon1/:

lrwxrwxrwx 1 root root    0 Mar 21 17:16 device -> ../../../0004:01:00.0
-rw-r--r-- 1 root root 4096 Mar 21 17:45 fan1_enable
-r--r--r-- 1 root root 4096 Mar 21 17:16 fan1_input
-r--r--r-- 1 root root 4096 Mar 21 17:16 fan1_max
-r--r--r-- 1 root root 4096 Mar 21 17:16 fan1_min
-rw-r--r-- 1 root root 4096 Mar 21 17:45 fan1_target
-r--r--r-- 1 root root 4096 Mar 21 17:45 freq1_input
-r--r--r-- 1 root root 4096 Mar 21 17:45 freq1_label
-r--r--r-- 1 root root 4096 Mar 21 17:45 freq2_input
-r--r--r-- 1 root root 4096 Mar 21 17:45 freq2_label
-r--r--r-- 1 root root 4096 Mar 21 17:16 in0_input
-r--r--r-- 1 root root 4096 Mar 21 17:16 in0_label
-r--r--r-- 1 root root 4096 Mar 21 17:16 name
drwxr-xr-x 2 root root    0 Mar 21 17:45 power
-r--r--r-- 1 root root 4096 Mar 21 17:16 power1_average
-rw-r--r-- 1 root root 4096 Mar 21 17:16 power1_cap
-r--r--r-- 1 root root 4096 Mar 21 17:45 power1_cap_default
-r--r--r-- 1 root root 4096 Mar 21 17:45 power1_cap_max
-r--r--r-- 1 root root 4096 Mar 21 17:45 power1_cap_min
-r--r--r-- 1 root root 4096 Mar 21 17:16 power1_label
-rw-r--r-- 1 root root 4096 Mar 21 17:45 pwm1
-rw-r--r-- 1 root root 4096 Mar 21 17:45 pwm1_enable
-r--r--r-- 1 root root 4096 Mar 21 17:45 pwm1_max
-r--r--r-- 1 root root 4096 Mar 21 17:45 pwm1_min
lrwxrwxrwx 1 root root    0 Mar 21 17:16 subsystem -> ../../../../../class/hwmon
-r--r--r-- 1 root root 4096 Mar 21 17:16 temp1_crit
-r--r--r-- 1 root root 4096 Mar 21 17:16 temp1_crit_hyst
-r--r--r-- 1 root root 4096 Mar 21 17:16 temp1_input
-r--r--r-- 1 root root 4096 Mar 21 17:16 temp1_label
-rw-r--r-- 1 root root 4096 Mar 21 17:16 uevent

Speaking of PCIe domains, the other place I've seen them is on multi-socket boards, but those are a different kind of expensive.

Ricks-Lab commented on August 29, 2024

Are PCI domains unique to multi-socket boards? This is the first case of it I have seen in this project. It would be cool to have a multi-socket system up and running, but with 64-core single-socket systems available, I had not considered the cost of a dual-socket one.

I just pushed a quick update that adds the capability to handle the PCI domain when setting the card path. Let me know if it works. Once we get this working, it would be best if I refactored this section of code.
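Handling the domain mostly comes down to never assuming it is `0000`. A minimal sketch of one way to normalize addresses so that `01:00.0` and `0004:01:00.0` can never be confused; `PciAddress` and `parse_pci_address` are hypothetical names, not part of gpu-utils. The comment about lspci reflects its documented `-D` behavior.

```python
import re
from typing import NamedTuple

# Domain-qualified form DDDD:BB:DD.F. lspci omits the domain unless run
# with -D or the machine has PCI domains other than 0000.
PCI_ADDR = re.compile(
    r"^(?:(?P<domain>[0-9a-fA-F]{4}):)?"
    r"(?P<bus>[0-9a-fA-F]{2}):(?P<dev>[0-9a-fA-F]{2})\.(?P<fn>[0-7])$"
)


class PciAddress(NamedTuple):
    domain: int
    bus: int
    device: int
    function: int

    def __str__(self) -> str:
        # Always emit the domain, matching the /sys/devices naming.
        return f"{self.domain:04x}:{self.bus:02x}:{self.device:02x}.{self.function:x}"


def parse_pci_address(text: str) -> PciAddress:
    """Parse '01:00.0' or '0004:01:00.0', defaulting the domain to 0000."""
    m = PCI_ADDR.match(text.strip())
    if m is None:
        raise ValueError(f"not a PCI address: {text!r}")
    return PciAddress(
        domain=int(m["domain"] or "0", 16),
        bus=int(m["bus"], 16),
        device=int(m["dev"], 16),
        function=int(m["fn"]),
    )
```

Comparing parsed addresses (or their normalized strings) instead of raw lspci text keeps a card in domain 0004 from being matched against domain 0000.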

DanaGoyette commented on August 29, 2024

My ARM board isn't really multi-socket; it just has the PCIe root hidden in firmware because of quirks.

Thanks for the additional fix; now it sees plenty of info. I'll paste the output, but not the (now larger) debug log.
Note that at the moment, I'm booted with amdgpu.bapm=0, as an attempt to work around odd hangs.

Ubuntu: Validated
Detected GPUs: AMD: 1
amdgpu/rocm version: UNKNOWN
AMD: Wattman features not enabled: 0xfff7bfff, See README file.
1 total GPUs, 0 rw, 1 r-only, 0 w-only

Card Number: 0
   Vendor: AMD
   Readable: True
   Writable: False
   Compute: False
   GPU UID: None
   Device ID: {'device': '0x67e3', 'subsystem_device': '0x0b0d', 'subsystem_vendor': '0x1002', 'vendor': '0x1002'}
   Decoded Device ID: Baffin [Radeon Pro WX 4100]
   Card Model: Advanced Micro Devices, Inc. [AMD/ATI] Baffin [Radeon Pro WX 4100]
   Display Card Model:  Baffin Pro WX 4100
   PCIe ID: 0004:01:00.0
      Link Speed: 8.0 GT/s PCIe
      Link Width: 8
   ##################################################
   Driver: amdgpu
   vBIOS Version: 113-D0150600-103
   Compute Platform: None
   GPU Type: Modern
   HWmon: /sys/class/drm/card0/device/hwmon/hwmon1
   Card Path: /sys/class/drm/card0/device
   System Card Path: /sys/devices/pci0004:01/0004:01:00.0
   ##################################################
   Current Power (W): 6.146
   Power Cap (W): 35.000
      Power Cap Range (W): [0, 35]
   Fan Enable: 0
   Fan PWM Mode: [2, 'Dynamic']
   Fan Target Speed (rpm): 2035
   Current Fan Speed (rpm): 2035
   Current Fan PWM (%): 19
      Fan Speed Range (rpm): [1600, 6000]
      Fan PWM Range (%): [0, 100]
   ##################################################
   Current GPU Loading (%): 0
   Current Memory Loading (%): 1
   Current GTT Memory Usage (%): 0.603
      Current GTT Memory Used (GB): 0.024
      Total GTT Memory (GB): 4.000
   Current VRAM Usage (%): 0.895
      Current VRAM Used (GB): 0.036
      Total VRAM (GB): 4.000
   Current  Temps (C): {'edge': 25.0}
   Critical Temps (C): {'edge': 99.0}
   Current Voltages (V): {'vddgfx': 718}
   Current Clk Frequencies (MHz): {'mclk': 300.0, 'sclk': 214.0}
   Current SCLK P-State: [0, '214Mhz']
   Current MCLK P-State: [0, '300Mhz']
   Power Profile Mode: 1-3D_FULL_SCREEN
   Power DPM Force Performance Level: auto
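The power, fan and temperature figures in this output correspond to files in the hwmon directory listed earlier (`/sys/class/hwmon/hwmon1`), whose raw values use the kernel's hwmon units. A minimal sketch of reading a few of them and converting units; `read_hwmon` is a hypothetical helper, the file names are taken from that listing, and the unit conversions follow the kernel's hwmon sysfs interface documentation rather than gpu-utils' own code.

```python
from pathlib import Path
from typing import Dict, Optional

# Divisors converting raw hwmon values to display units. Per the kernel's
# hwmon sysfs interface: temperatures are in millidegrees Celsius, power is
# in microwatts, fan speed is in RPM.
HWMON_SCALE = {
    "temp1_input": 1000.0,     # -> degrees C
    "temp1_crit": 1000.0,      # -> degrees C
    "power1_average": 1e6,     # -> W
    "power1_cap": 1e6,         # -> W
    "fan1_input": 1.0,         # already RPM
}


def _read_int(path: Path) -> Optional[int]:
    """Read one integer sysfs value, or None if missing or unparsable."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


def read_hwmon(hwmon: Path) -> Dict[str, float]:
    """Read selected sensors from an hwmon directory such as hwmon1."""
    values: Dict[str, float] = {}
    for name, divisor in HWMON_SCALE.items():
        raw = _read_int(hwmon / name)
        if raw is not None:
            values[name] = raw / divisor
    # pwm1 ranges 0-255; report it as a rounded percentage of full scale.
    pwm = _read_int(hwmon / "pwm1")
    if pwm is not None:
        values["pwm1_percent"] = round(pwm * 100 / 255)
    return values


if __name__ == "__main__":
    print(read_hwmon(Path("/sys/class/drm/card0/device/hwmon/hwmon1")))
```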

Ricks-Lab commented on August 29, 2024

Can you check whether the file pp_od_clk_voltage exists in the card path directory? I just want to verify whether there are other issues with writing to the card. This is the driver file that is written to when under- or overclocking the GPU. On older cards, I expect writing is not supported and the file doesn't exist.
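A minimal Python sketch for answering this check on any card; `od_clk_voltage_status` is a hypothetical helper, not part of gpu-utils. It reports whether `pp_od_clk_voltage` is missing, present but not writable by the current user, or writable. It only inspects file permissions, so a "writable" result does not guarantee the driver will accept a new clock table.

```python
import os
from pathlib import Path


def od_clk_voltage_status(card_path: Path) -> str:
    """Return 'missing', 'read-only' or 'writable' for pp_od_clk_voltage."""
    od_file = card_path / "pp_od_clk_voltage"
    if not od_file.is_file():
        return "missing"
    # os.access only checks permissions for the current user; the driver
    # may still reject a write.
    return "writable" if os.access(od_file, os.W_OK) else "read-only"


if __name__ == "__main__":
    # Card path as reported by gpu-ls earlier in this thread.
    print(od_clk_voltage_status(Path("/sys/class/drm/card0/device")))
```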

Ricks-Lab commented on August 29, 2024

3.6.3 released with this fix.
