amdgpu-plot erasing plot (gpu-utils, CLOSED)

csecht commented on August 28, 2024
amdgpu-plot erasing plot

from gpu-utils.

Comments (23)

csecht commented on August 28, 2024

Just another view of amdgpu-plot after it ran for a couple hours...
[Screenshot: amdgpu-plot_shrinkage]

Ricks-Lab commented on August 28, 2024

I have never noticed this behavior in the past. I wonder if there is an old package dependency. I have a new project I am working on where I am leveraging a requirements file to make sure all dependencies are met. I will dig into this one over the weekend.

Ricks-Lab commented on August 28, 2024

Can you verify which version of matplotlib you are using with the following command:

./amdgpu-plot --about

csecht commented on August 28, 2024

Version: v2.5.2
Maintainer: RueiKe
Status: Stable Release
matplotlib version: 2.1.1
pandas version: 0.24.2
numpy version: 1.16.2

csecht commented on August 28, 2024

I just ran amdgpu-plot and it's graphing fine now. Over the past few days, I've done a system update and a few restarts, but am not sure what fixed it or why it was buggy. Next time I'll record the --about data when it, or any other module, starts acting up.
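
Recording library versions whenever a problem shows up could also be scripted. The following is only a sketch of that idea and not how amdgpu-plot --about is implemented; the helper name report_versions is hypothetical, and importlib.metadata assumes Python 3.8 or later (newer than the 3.6 interpreter seen elsewhere in this thread).

```python
from importlib import metadata  # standard library from Python 3.8 onward


def report_versions(packages):
    """Map each distribution name to its installed version string."""
    report = {}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Keep the entry so a missing package is visible in the report
            report[name] = "not installed"
    return report


if __name__ == "__main__":
    for name, version in report_versions(["matplotlib", "pandas", "numpy"]).items():
        print(f"{name} version: {version}")
```

Saving this output alongside a bug report makes it easier to compare a working host with a failing one.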

Ricks-Lab commented on August 28, 2024

There is a known issue where the way I am using matplotlib eventually stops working correctly. It gets corrupted after about an hour. My approach is probably flawed, as I am redrawing the entire plot every update. There is a way to add to the plot, but I have not figured out how to implement it yet. There is a comment in the README indicating this.

I have started another private project, and I have found that aligning Python environments among collaborators is critical. I am using python venv to accomplish it. I am not sure of the best way to make it available to casual users, but here is how I implement it in the other project:
First, install venv:

sudo apt install -y python3-venv

Then create and activate the environment while in the project root directory:

python3 -m venv amdgpu-env
source amdgpu-env/bin/activate

The first time, or anytime the requirements file changes, you will need to execute this:

pip install --no-cache-dir -r requirements.txt

To exit the venv, execute the deactivate command.

Maybe there is a way to make all of this happen without the user having to be aware of all of the details. I will continue to research. Let me know your thoughts.

Ricks-Lab commented on August 28, 2024

I was thinking about modifying amdgpu-chk to check for the existence of the virtual env, create it if needed, and then run the pip install command.
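
A rough sketch of that check, using only the standard library, might look like the following. It is not the amdgpu-chk implementation; the function names are hypothetical, and the bin/python path assumes a POSIX layout.

```python
import subprocess
import sys
import venv
from pathlib import Path


def in_virtualenv() -> bool:
    """True when the running interpreter belongs to a virtual environment."""
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def ensure_venv(env_dir: Path, with_pip: bool = True) -> bool:
    """Create env_dir as a venv if it is missing; return True if it was created."""
    if (env_dir / "pyvenv.cfg").is_file():
        return False  # an environment already exists here
    venv.create(env_dir, with_pip=with_pip)
    return True


def install_requirements(env_dir: Path, requirements: Path) -> None:
    """Run pip from inside the venv against the requirements file."""
    python = env_dir / "bin" / "python"  # assumption: POSIX venv layout
    subprocess.run(
        [str(python), "-m", "pip", "install", "--no-cache-dir",
         "-r", str(requirements)],
        check=True,
    )
```

A checker could call ensure_venv(Path("amdgpu-env")) and, when it returns True or the requirements file has changed, follow up with install_requirements. Creating the environment with pip still requires the python3-venv package on Ubuntu.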

Ricks-Lab commented on August 28, 2024

@csecht
I have made significant code updates: one minor bug fix, no new features, and lots of PEP8 style updates. I have also added a requirements.txt file for pip install and enabled the use of venv, which I find quite useful. It is the latest on master. Let me know if you find any issues.

csecht commented on August 28, 2024

All seems fine on my local Linux host, but amdgpu-plot doesn't work on my remote host that has the same system, Ubuntu 18.04.3. This is what I get:

~/Desktop/amdgpu-utils-master$ ./amdgpu-plot
Traceback (most recent call last):
  File "./amdgpu-plot", line 61, in <module>
    import pandas as pd
  File "/usr/lib/python3/dist-packages/pandas/__init__.py", line 58, in <module>
    from pandas.io.api import *
  File "/usr/lib/python3/dist-packages/pandas/io/api.py", line 19, in <module>
    from pandas.io.packers import read_msgpack, to_msgpack
  File "/usr/lib/python3/dist-packages/pandas/io/packers.py", line 68, in <module>
    from pandas.util._move import (
ValueError: module functions cannot set METH_CLASS or METH_STATIC

(But -plot hadn't been working on the remote host prior to the recent master, either, because I was never able to get pandas installed.)
All other amdgpu-utils seem okay on the remote host.
I tried installing the requirements.txt file on the remote host and got an error:

~/Desktop/amdgpu-utils-master$ sudo -H pip3 install --no-cache-dir -r requirements.txt
[sudo] password for craig: 
Requirement already satisfied: cycler==0.10.0 in /usr/lib/python3/dist-packages (from -r requirements.txt (line 1))
Collecting kiwisolver==1.1.0 (from -r requirements.txt (line 2))
  Downloading https://files.pythonhosted.org/packages/f8/a1/5742b56282449b1c0968197f63eae486eca2c35dcd334bab75ad524e0de1/kiwisolver-1.1.0-cp36-cp36m-manylinux1_x86_64.whl (90kB)
    100% |████████████████████████████████| 92kB 606kB/s 
Collecting matplotlib==3.1.1 (from -r requirements.txt (line 3))
  Downloading https://files.pythonhosted.org/packages/57/4f/dd381ecf6c6ab9bcdaa8ea912e866dedc6e696756156d8ecc087e20817e2/matplotlib-3.1.1-cp36-cp36m-manylinux1_x86_64.whl (13.1MB)
    100% |████████████████████████████████| 13.1MB 32.0MB/s 
Collecting numpy==1.17.1 (from -r requirements.txt (line 4))
  Downloading https://files.pythonhosted.org/packages/75/92/57179ed45307ec6179e344231c47da7f3f3da9e2eee5c8ab506bd279ce4e/numpy-1.17.1-cp36-cp36m-manylinux1_x86_64.whl (20.4MB)
    100% |████████████████████████████████| 20.4MB 1.1MB/s 
Collecting pandas==0.25.1 (from -r requirements.txt (line 5))
  Downloading https://files.pythonhosted.org/packages/73/9b/52e228545d14f14bb2a1622e225f38463c8726645165e1cb7dde95bfe6d4/pandas-0.25.1-cp36-cp36m-manylinux1_x86_64.whl (10.5MB)
    100% |████████████████████████████████| 10.5MB 1.3MB/s 
Collecting pkg-resources==0.0.0 (from -r requirements.txt (line 6))
  Could not find a version that satisfies the requirement pkg-resources==0.0.0 (from -r requirements.txt (line 6)) (from versions: )
No matching distribution found for pkg-resources==0.0.0 (from -r requirements.txt (line 6))
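
The pkg-resources==0.0.0 pin that pip rejects above is widely reported as a spurious line that pip freeze emits inside virtual environments on Debian/Ubuntu systems, and it does not correspond to a package on PyPI. Assuming that is the cause here, one workaround is to filter the line out before installing. The helper below is a hypothetical sketch, not part of gpu-utils.

```python
def drop_spurious_pins(lines, spurious=("pkg-resources", "pkg_resources")):
    """Return requirement lines with the known-spurious pins removed."""
    kept = []
    for line in lines:
        # The distribution name is everything before the '==' pin
        name = line.split("==", 1)[0].strip().lower()
        if name in spurious:
            continue
        kept.append(line)
    return kept


if __name__ == "__main__":
    sample = ["pandas==0.25.1", "pkg-resources==0.0.0", "six==1.12.0"]
    print("\n".join(drop_spurious_pins(sample)))
```

Deleting the line from requirements.txt by hand has the same effect.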

The basics check out, however:

 ~/Desktop/amdgpu-utils-master$ ./amdgpu-chk
Using python 3.6.8
           Python version OK. 
Using Linux Kernel 5.0.0-27-generic
           OS kernel OK. 
AMD GPU driver is driver=amdgpu latency=0
           AMD driver OK. 

The requirements installation worked fine on my local host.

~/Desktop/amdgpu-utils$ sudo -H pip3 install --no-cache-dir -r requirements.txt
Requirement already satisfied: cycler==0.10.0 in /usr/lib/python3/dist-packages (from -r requirements.txt (line 1))
Collecting kiwisolver==1.1.0 (from -r requirements.txt (line 2))
  Downloading https://files.pythonhosted.org/packages/f8/a1/5742b56282449b1c0968197f63eae486eca2c35dcd334bab75ad524e0de1/kiwisolver-1.1.0-cp36-cp36m-manylinux1_x86_64.whl (90kB)
    100% |████████████████████████████████| 92kB 1.9MB/s 
Collecting matplotlib==3.0.3 (from -r requirements.txt (line 3))
  Downloading https://files.pythonhosted.org/packages/e9/69/f5e05f578585ed9935247be3788b374f90701296a70c8871bcd6d21edb00/matplotlib-3.0.3-cp36-cp36m-manylinux1_x86_64.whl (13.0MB)
    100% |████████████████████████████████| 13.0MB 7.4MB/s 
Collecting numpy==1.16.3 (from -r requirements.txt (line 4))
  Downloading https://files.pythonhosted.org/packages/c1/e2/4db8df8f6cddc98e7d7c537245ef2f4e41a1ed17bf0c3177ab3cc6beac7f/numpy-1.16.3-cp36-cp36m-manylinux1_x86_64.whl (17.3MB)
    100% |████████████████████████████████| 17.3MB 2.7MB/s 
Collecting pandas==0.24.2 (from -r requirements.txt (line 5))
  Downloading https://files.pythonhosted.org/packages/19/74/e50234bc82c553fecdbd566d8650801e3fe2d6d8c8d940638e3d8a7c5522/pandas-0.24.2-cp36-cp36m-manylinux1_x86_64.whl (10.1MB)
    100% |████████████████████████████████| 10.1MB 2.3MB/s 
Collecting pyparsing==2.4.0 (from -r requirements.txt (line 6))
  Downloading https://files.pythonhosted.org/packages/dd/d9/3ec19e966301a6e25769976999bd7bbe552016f0d32b577dc9d63d2e0c49/pyparsing-2.4.0-py2.py3-none-any.whl (62kB)
    100% |████████████████████████████████| 71kB 11.1MB/s 
Collecting python-dateutil==2.8.0 (from -r requirements.txt (line 7))
  Downloading https://files.pythonhosted.org/packages/41/17/c62faccbfbd163c7f57f3844689e3a78bae1f403648a6afb1d0866d87fbb/python_dateutil-2.8.0-py2.py3-none-any.whl (226kB)
    100% |████████████████████████████████| 235kB 8.5MB/s 
Collecting pytz==2019.1 (from -r requirements.txt (line 8))
  Downloading https://files.pythonhosted.org/packages/3d/73/fe30c2daaaa0713420d0382b16fbb761409f532c56bdcc514bf7b6262bb6/pytz-2019.1-py2.py3-none-any.whl (510kB)
    100% |████████████████████████████████| 512kB 88.0MB/s 
Collecting ruamel.yaml==0.16.5 (from -r requirements.txt (line 9))
  Downloading https://files.pythonhosted.org/packages/fa/90/ecff85a2e9c497e2fa7142496e10233556b5137db5bd46f3f3b006935ca8/ruamel.yaml-0.16.5-py2.py3-none-any.whl (123kB)
    100% |████████████████████████████████| 133kB 7.8MB/s 
Collecting ruamel.yaml.clib==0.1.2 (from -r requirements.txt (line 10))
  Downloading https://files.pythonhosted.org/packages/96/62/ed93cb8ae7e2ad8c5fe874e8027306aeee0c6a02c04fa015b5f99d14b3db/ruamel.yaml.clib-0.1.2-cp36-cp36m-manylinux1_x86_64.whl (549kB)
    100% |████████████████████████████████| 552kB 18.7MB/s 
Collecting six==1.12.0 (from -r requirements.txt (line 11))
  Downloading https://files.pythonhosted.org/packages/73/fb/00a976f728d0d1fecfe898238ce23f502a721c0ac0ecfedb80e0d88c64e9/six-1.12.0-py2.py3-none-any.whl
Requirement already satisfied: setuptools in /usr/lib/python3/dist-packages (from kiwisolver==1.1.0->-r requirements.txt (line 2))
Installing collected packages: kiwisolver, six, python-dateutil, numpy, pyparsing, matplotlib, pytz, pandas, ruamel.yaml.clib, ruamel.yaml
  Found existing installation: six 1.11.0
    Not uninstalling six at /usr/lib/python3/dist-packages, outside environment /usr
  Found existing installation: python-dateutil 2.6.1
    Not uninstalling python-dateutil at /usr/lib/python3/dist-packages, outside environment /usr
  Found existing installation: numpy 1.13.3
    Not uninstalling numpy at /usr/lib/python3/dist-packages, outside environment /usr
  Found existing installation: pyparsing 2.2.0
    Not uninstalling pyparsing at /usr/lib/python3/dist-packages, outside environment /usr
  Found existing installation: matplotlib 2.1.1
    Not uninstalling matplotlib at /usr/lib/python3/dist-packages, outside environment /usr
  Found existing installation: pytz 2018.3
    Not uninstalling pytz at /usr/lib/python3/dist-packages, outside environment /usr
Successfully installed kiwisolver-1.1.0 matplotlib-3.0.3 numpy-1.16.3 pandas-0.24.2 pyparsing-2.4.0 python-dateutil-2.8.0 pytz-2019.1 ruamel.yaml-0.16.5 ruamel.yaml.clib-0.1.2 six-1.12.0

Ricks-Lab commented on August 28, 2024

From the output of amdgpu-chk, it looks like you are not running the latest version. You should get warnings concerning venv. Also, have you tried to run in a venv? The latest User Guide has details on how to set it up.

csecht commented on August 28, 2024

csecht commented on August 28, 2024

Even with venv running, amdgpu-plot still compresses plots on long runs. This screenshot was after about 1.5 hr. Is it an X-axis scaling issue?
[Screenshot: amdgpu-plot_longrun]

Ricks-Lab commented on August 28, 2024

This is a known issue that I don't know how to solve yet. I was hoping to come back to it after I gained more matplotlib experience in my next project, but unfortunately, the finance module of matplotlib has been deprecated, so I can not use it in my new project.

The current implementation has the plot utilities redraw the entire plot on every update: I truncate the dataframe and then re-plot it each time. The preferred approach is to add to the current plot. I need to spend some time researching to figure out how to do that.
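
For reference, one common way to add to an existing matplotlib plot is to keep a single Line2D object and update its data in place rather than calling plot() again. The sketch below illustrates that technique only; it is not the fix that later landed in amdgpu-plot, and the window size, label, and add_sample helper are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from collections import deque

MAX_POINTS = 5  # hypothetical rolling-window size

fig, ax = plt.subplots()
(line,) = ax.plot([], [], label="temp (C)")  # created once, reused afterwards
xs = deque(maxlen=MAX_POINTS)  # oldest samples fall off automatically
ys = deque(maxlen=MAX_POINTS)


def add_sample(x, y):
    """Append one sample and refresh the existing line instead of re-plotting."""
    xs.append(x)
    ys.append(y)
    line.set_data(list(xs), list(ys))
    ax.relim()            # recompute data limits from the updated line
    ax.autoscale_view()   # rescale the axes to those limits
    fig.canvas.draw_idle()


for step in range(8):
    add_sample(step, 40 + step)
```

Because the axes only ever hold one line, the number of artists stays constant no matter how long the tool runs. In an interactive window, a call such as plt.pause() between updates would be needed to let the GUI event loop redraw.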

Ricks-Lab commented on August 28, 2024

@csecht
I spent a full day digging into the amdgpu-plot code and found several issues and opportunities for improvement. I was able to run the new version overnight without issues on one of my systems. Please give it a try and let me know of any issues.

csecht commented on August 28, 2024

I downloaded the most recent commit, have amdgpu-plot running now, and will let you know.
But the self-check at execution of -plot (in amdgpu-utils-env) and of -ls (while not in amdgpu-utils-env) can no longer report the amdgpu version:

AMD Wattman features enabled: 0xffff7fff
amdgpu version: UNKNOWN
2 AMD GPUs detected, 2 may be compatible, checking...
2 are confirmed compatible.

Ricks-Lab commented on August 28, 2024

I had some debug mods still in place. I removed them and uploaded a fixed version. Thanks!

Ricks-Lab commented on August 28, 2024

I am planning a release soon. Can you review the utility descriptions in the latest README.md?

csecht commented on August 28, 2024

Yes, after an overnight run, amdgpu-plot is working perfectly.

csecht commented on August 28, 2024

Here are README.md edits for consideration:

amdgpu-monitor: May want to add at the end of the section, or in User Guide, that monitor is shut down with ^C.
Is the --plot option necessary, given amdgpu-plot? Also given that the --plot option opens both --gui and --plot windows; in the amdgpu-plot section of the User Guide it is recommended to not have a monitor and a plot function running at the same time because of excess system overhead or something to that effect.

amdgpu-plot: It says, "The --stdin option causes amdgpu-plot to read GPU data from stdin. This is how amdgpu-monitor produces the plot. The benefit of using it in this mode is that both the table and plots are updated with a single read from the driver files." Does this mean that both amdgpu-monitor and amdgpu-plot can be run simultaneously, but from different terminal windows? See above.

In any event, amdgpu-plot --stdin isn't working because it stalls on

amdgpu-plot waiting for initial data.........

unless I'm misunderstanding something about that option.
amdgpu-plot --stdin also executes without displaying the initial system check, as seen with -monitor, -plot, -ls, etc.

amdgpu-pac: Edit "If you have confidence, the --execute_pac option can be used to execute the bash file when saved and then delete it." to, "If you have confidence, the --execute_pac option can be used to execute the bash file when saved; once executed the file is automatically deleted."

amdgpu-pciid: All looks good here. I just want to crow that I added a PCI ID database entry for a "RX 560D OEM OC 2 GB" card, which is in one of my hosts. The name that the PCI ID moderator decided on is longer than what I proposed (and the "GB" part is truncated in the amdgpu-monitor window), but the important bits are displayed.

Ricks-Lab commented on August 28, 2024

> Here are README.md edits for consideration:
>
> amdgpu-monitor: May want to add at the end of the section, or in User Guide, that monitor is shut down with ^C.
> Is the --plot option necessary, given amdgpu-plot? Also given that the --plot option opens both --gui and --plot windows; in the amdgpu-plot section of the User Guide it is recommended to not have a monitor and a plot function running at the same time because of excess system overhead or something to that effect.

If you run amdgpu-monitor with the --plot option, a single read of the GPU status is used to update both the plot and monitor. If you run them separately, then both tools will query the GPU, resulting in twice as many reads.

> amdgpu-plot: It says, "The --stdin option causes amdgpu-plot to read GPU data from stdin. This is how amdgpu-monitor produces the plot. The benefit of using it in this mode is that both the table and plots are updated with a single read from the driver files." Does this mean that both amdgpu-monitor and amdgpu-plot can be run simultaneously, but from different terminal windows? See above.
>
> In any event, amdgpu-plot --stdin isn't working because it stalls on
>
> amdgpu-plot waiting for initial data.........
>
> unless I'm misunderstanding something about that option.
> amdgpu-plot --stdin also executes without displaying the initial system check, as seen with -monitor, -plot, -ls, etc.

When using the --stdin option, you must pipe data into the process:

cat logfile | ./amdgpu-plot --stdin --simlog

I have modified both the plot and monitor tools to make things clearer.

> amdgpu-pac: Edit "If you have confidence, the --execute_pac option can be used to execute the bash file when saved and then delete it." to, "If you have confidence, the --execute_pac option can be used to execute the bash file when saved; once executed the file is automatically deleted."

Good catch. I have modified it.

> amdgpu-pciid: All looks good here. I just want to crow that I added a PCI ID database entry for a "RX 560D OEM OC 2 GB" card, which is in one of my hosts. The name that the PCI ID moderator decided on is longer than what I proposed (and the "GB" part is truncated in the amdgpu-monitor window), but the important bits are displayed.

When I suggested a change, it was accepted as is. I guess it depends on which moderator checks your input.

I have modified the README.md and the docstrings of all utilities.

Ricks-Lab commented on August 28, 2024

I do have two other major changes in the plot and monitor utilities:

  • I found that on my 5 GPU system, there was a significant chance that the GPUs were being read when the close window was selected, which would cause an error. I found and implemented an easy fix for this.
  • I also found that the monitor window would update sporadically when --plot was used on my 5 GPU system. I fixed this by buffering data writes to the plot process and calling flush after writing all GPUs. This should also improve performance on systems with fewer GPUs.
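
The buffering change in the second item can be illustrated with a small sketch: the parent writes one record per GPU to the child's stdin and flushes once per update cycle rather than once per GPU. This is an assumption-laden illustration, not the gpu-utils code; the record format and the stand-in child process are hypothetical.

```python
import subprocess
import sys

# Stand-in for the plot process: echoes every line it receives on stdin.
CHILD = "import sys\nfor line in sys.stdin:\n    sys.stdout.write('got ' + line)\n"


def send_batches(batches):
    """Write each batch of per-GPU records, flushing once per batch."""
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
    )
    for batch in batches:
        for record in batch:
            proc.stdin.write(record + "\n")  # stays in the pipe buffer for now
        proc.stdin.flush()  # one flush after all GPUs in this update
    proc.stdin.close()  # signal end of input so the child can exit
    received = proc.stdout.read().splitlines()
    proc.stdout.close()
    proc.wait()
    return received


if __name__ == "__main__":
    print(send_batches([["gpu0,35", "gpu1,40"], ["gpu0,36", "gpu1,41"]]))
```

Flushing per batch means the reader sees a complete update for all GPUs at once, which fits the description of the monitor window no longer updating sporadically.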

Let me know if you see any issues with the latest on master.

csecht commented on August 28, 2024

Ricks-Lab commented on August 28, 2024

Thanks for checking it out.

I think the systemd approach would be a good addition to the user guide. I have started my travels to the US, so I won’t do the release for at least a week.
