relay-bench's People

Contributors

ad1024, dooblad, joshpoll, jroesch, marisakirisame, slyubomirsky, weberlo

relay-bench's Issues

Modular API for Telemetry

We have a basic telemetry system included in the dashboard now but it makes extensive assumptions about the system the dashboard runs on (e.g., using sensors and nvidia-smi to gather data and parsing records of the outputs of these commands). In the long term, we should separate the source of telemetry data and its parsing mechanism from the dashboard infrastructure and make it possible for users to write their own data-gathering and parsing files that can be specified by configuration.

@AD1024 this will be a longer-term goal
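To make the long-term goal concrete, here is one possible sketch (in Python, with entirely hypothetical class and method names — nothing here exists in the dashboard today) of a pluggable telemetry interface: each user-written source bundles its own data gathering and its own parsing, and the dashboard only ever touches the common interface, so it no longer assumes `sensors` or `nvidia-smi` are available.

```python
# Hypothetical sketch of a pluggable telemetry interface; class and
# method names are illustrative, not part of the current dashboard.
import subprocess
from abc import ABC, abstractmethod


class TelemetrySource(ABC):
    """A user-defined source of telemetry data, named in a config file."""

    @abstractmethod
    def gather(self) -> str:
        """Run the underlying command and return its raw output."""

    @abstractmethod
    def parse(self, raw: str) -> dict:
        """Turn raw output into a record the dashboard can store."""


class CpuSensors(TelemetrySource):
    """Example source wrapping the `sensors` command."""

    def gather(self) -> str:
        return subprocess.run(["sensors"], capture_output=True,
                              text=True).stdout

    def parse(self, raw: str) -> dict:
        # Keep only the lines that report a temperature,
        # e.g. "Core 0:  +41.0°C"
        temps = [line.strip() for line in raw.splitlines() if "°C" in line]
        return {"cpu_temps": temps}
```

The dashboard would then instantiate whichever sources a config names and call `gather`/`parse` on a timer, without knowing anything about the machine it runs on.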

RFC: Global Dependency System for the Dashboard

The dependency management via setup.sh files is cumbersome and unable to handle the common case of multiple experiments all requiring a single dependency. The current dashboard, for example, manually pulls in and builds TVM and the Relay AoT compiler. It would be ideal to have a principled way to deal with global dependencies like these, which would allow the dashboard to stand alone and be reused more easily.

I think one way to accomplish this would be a new dependency system that works as follows: the dashboard takes a dependency directory, containing a subdirectory for each global dependency. Each subdirectory contains a setup.sh file that performs first-time setup and a register.sh file that does any environment setup the dependency needs.

The dashboard will provide an empty folder in the configured setup folder to serve as space for each dependency to pull in whatever files they need and store that information. Experiments will list the dependencies they need in their configs and the dashboard will ensure that any dependencies listed by experiments will be set up first. Before an experiment runs, the register.sh file for each included dependency will run and the environment after running register.sh will be used for running the experiment's run.sh (and possibly analyze.sh as well). The same could be used for subsystems as well.
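A minimal sketch of the resolution step described above, assuming each experiment's config has a `deps` key and the dependency directory has the proposed layout (the function name and config key are assumptions, not existing code):

```python
# Illustrative sketch: collect the setup.sh scripts to run, in order,
# for every global dependency an experiment's config lists.
# Directory layout and the "deps" config key follow the proposal above.
from pathlib import Path


def setup_dependencies(dep_dir: str, experiment_config: dict) -> list:
    """Return the setup.sh scripts for the experiment's dependencies,
    failing loudly if a listed dependency has no subdirectory."""
    scripts = []
    for dep in experiment_config.get("deps", []):
        setup = Path(dep_dir) / dep / "setup.sh"
        if not setup.exists():
            raise FileNotFoundError(f"no setup.sh for dependency {dep!r}")
        scripts.append(str(setup))
    return scripts
```

Running the scripts (and sourcing each dependency's register.sh into the experiment's environment before run.sh) would sit on top of this.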

Another cumbersome aspect of the existing setup.sh system is knowing when to rerun the setup because the experiment or the dependency has changed. For example, the current dashboard requires a command-line argument to decide whether it should update its local TVM install. Dependencies could perhaps include an update.sh file that determines whether setup should be rerun (e.g., by checking whether the latest upstream commit has changed). Another possibility would be a system for configuring policies in the dashboard itself, such as rerunning the setup after a specified interval.
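The "rerun setup when the commit changes" policy can be sketched quite simply: record the HEAD commit after a successful setup, and compare against it next time. This is a sketch only — the marker-file name and function names are hypothetical choices, not part of the dashboard:

```python
# Sketch of a staleness check for one dependency checkout: setup is
# stale if HEAD no longer matches the commit recorded at the last
# successful setup. Marker-file name is an arbitrary choice.
import subprocess
from pathlib import Path

MARKER = ".last_setup_commit"


def _head(repo_dir: str) -> str:
    """Current HEAD commit hash of the checkout."""
    out = subprocess.run(["git", "-C", repo_dir, "rev-parse", "HEAD"],
                         capture_output=True, text=True, check=True)
    return out.stdout.strip()


def setup_is_stale(repo_dir: str) -> bool:
    """True if setup has never run, or the checkout has moved since."""
    marker = Path(repo_dir) / MARKER
    if not marker.exists():
        return True
    return marker.read_text().strip() != _head(repo_dir)


def record_setup(repo_dir: str) -> None:
    """Call after a successful setup to remember the current commit."""
    (Path(repo_dir) / MARKER).write_text(_head(repo_dir) + "\n")
```

An interval-based policy would be analogous, comparing a recorded timestamp instead of a commit hash.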

A tough case would be multiple experiments/subsystems requiring different versions of the same dependency (e.g., different TVM commits). The simplest way to handle it would be to have "multiple" dependencies that are each fixed to the versions required; perhaps it might be worth having a way to pass version information to dependencies and build all the separate required versions automatically, but that is unclear.

It is easy to add telemetry/logging for CPU and GPU

CPU:

sensors

then extract the fields you care about (e.g., core temperatures) with sed.

GPU:

nvidia-smi --format=csv --query-gpu="clocks.gr,clocks.current.memory,utilization.gpu,utilization.memory,pstate,power.draw,power.limit,temperature.gpu,temperature.memory,fan.speed"

See the nvidia-smi documentation for the full list of queryable properties, but that query should cover most of the things you would care about logging.

My advice: set up a process that gets some numbers you care about every 5 or 10 seconds, and then logs them. With your infrastructure it should be easy to plot the logs and see if anything funny is going on. The queries themselves shouldn't have too much impact on performance, and I don't think you'll need a granularity finer than about 5 or 10 seconds.
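The advice above can be sketched as a small polling loop: query nvidia-smi every N seconds and append a timestamped CSV line to a log file. The query mirrors the one shown earlier; the log file name, interval, and function names are arbitrary choices, and this obviously only runs on a machine with an NVIDIA GPU and driver installed.

```python
# Sketch of a telemetry poller: one timestamped CSV line per interval.
# Only runs where nvidia-smi is available; names/paths are illustrative.
import subprocess
import time

QUERY = ("clocks.gr,clocks.current.memory,utilization.gpu,"
         "utilization.memory,pstate,power.draw,power.limit,"
         "temperature.gpu,temperature.memory,fan.speed")


def read_gpu_csv() -> str:
    """One CSV line of GPU telemetry from nvidia-smi (no header)."""
    out = subprocess.run(
        ["nvidia-smi", "--format=csv,noheader", f"--query-gpu={QUERY}"],
        capture_output=True, text=True, check=True)
    return out.stdout.strip()


def format_record(timestamp: float, csv_line: str) -> str:
    """One log line: unix timestamp followed by the CSV fields."""
    return f"{timestamp},{csv_line}"


def log_forever(path: str = "gpu_telemetry.log",
                interval: float = 10.0) -> None:
    """Append one record every `interval` seconds, indefinitely."""
    while True:
        record = format_record(time.time(), read_gpu_csv())
        with open(path, "a") as f:
            f.write(record + "\n")
        time.sleep(interval)
```

Run it in the background (or under a process supervisor) for the duration of an experiment, then plot the resulting log with the existing dashboard infrastructure.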

Including telemetry graphs in the webpage

Now that we have the vis_telemetry subsystem, we should include the generated graphs in the generated webpage. @AD1024, can you try modifying the website generator to include the last run's telemetry graphs in the generated web page (say, under a header for each experiment that has a graph)? It would be valuable for usability.
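One possible shape for the change, as a sketch only — the directory layout, file-naming convention, and function name are assumptions about the generator, not the current code: emit an HTML fragment per experiment that embeds that experiment's latest telemetry graphs under its own header.

```python
# Hypothetical sketch: an HTML fragment embedding the last run's
# telemetry graphs for one experiment. Assumes graphs are PNGs named
# after the experiment; returns "" if the experiment produced none.
from pathlib import Path


def telemetry_section(exp_name: str, graph_dir: str) -> str:
    """HTML for one experiment's telemetry graphs, or '' if absent."""
    graphs = sorted(Path(graph_dir).glob(f"{exp_name}*.png"))
    if not graphs:
        return ""
    imgs = "\n".join(f'<img src="{g.name}" alt="{g.stem}">' for g in graphs)
    return f"<h3>Telemetry: {exp_name}</h3>\n{imgs}"
```

The generator would call this once per experiment and concatenate the non-empty fragments into the page, so experiments without graphs get no header.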
