relay-bench's People

Contributors

ad1024, dooblad, joshpoll, jroesch, marisakirisame, slyubomirsky, weberlo

relay-bench's Issues

Modular API for Telemetry

We have a basic telemetry system included in the dashboard now but it makes extensive assumptions about the system the dashboard runs on (e.g., using sensors and nvidia-smi to gather data and parsing records of the outputs of these commands). In the long term, we should separate the source of telemetry data and its parsing mechanism from the dashboard infrastructure and make it possible for users to write their own data-gathering and parsing files that can be specified by configuration.

@AD1024 this will be a longer-term goal
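To make the long-term goal concrete, here is one possible sketch (in Python, with entirely hypothetical class and method names — nothing here exists in the dashboard today) of a pluggable telemetry interface: each user-written source bundles its own data gathering and its own parsing, and the dashboard only ever touches the common interface, so it no longer assumes `sensors` or `nvidia-smi` are available.

```python
# Hypothetical sketch of a pluggable telemetry interface; class and
# method names are illustrative, not part of the current dashboard.
import subprocess
from abc import ABC, abstractmethod


class TelemetrySource(ABC):
    """A user-defined source of telemetry data, named in a config file."""

    @abstractmethod
    def gather(self) -> str:
        """Run the underlying command and return its raw output."""

    @abstractmethod
    def parse(self, raw: str) -> dict:
        """Turn raw output into a record the dashboard can store."""


class CpuSensors(TelemetrySource):
    """Example source wrapping the `sensors` command."""

    def gather(self) -> str:
        return subprocess.run(["sensors"], capture_output=True,
                              text=True).stdout

    def parse(self, raw: str) -> dict:
        # Keep only the lines that report a temperature,
        # e.g. "Core 0:  +41.0°C"
        temps = [line.strip() for line in raw.splitlines() if "°C" in line]
        return {"cpu_temps": temps}
```

The dashboard would then instantiate whichever sources a config names and call `gather`/`parse` on a timer, without knowing anything about the machine it runs on.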

RFC: Global Dependency System for the Dashboard

The dependency management via setup.sh files is cumbersome and unable to handle the common case of multiple experiments all requiring a single dependency. The current dashboard, for example, manually pulls in and builds TVM and the Relay AoT compiler. It would be ideal to have a principled way to deal with global dependencies like these, which would allow the dashboard to stand alone and be reused more easily.

I think one way to accomplish this would be a new dependency system that works as follows: the dashboard takes a dependency directory, containing a subdirectory for each global dependency. Each subdirectory contains a setup.sh file that performs first-time setup and a register.sh file that does any environment setup the dependency needs.

The dashboard will provide an empty folder in the configured setup folder to serve as space for each dependency to pull in whatever files they need and store that information. Experiments will list the dependencies they need in their configs and the dashboard will ensure that any dependencies listed by experiments will be set up first. Before an experiment runs, the register.sh file for each included dependency will run and the environment after running register.sh will be used for running the experiment's run.sh (and possibly analyze.sh as well). The same could be used for subsystems as well.
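A minimal sketch of the resolution step described above, assuming each experiment's config has a `deps` key and the dependency directory has the proposed layout (the function name and config key are assumptions, not existing code):

```python
# Illustrative sketch: collect the setup.sh scripts to run, in order,
# for every global dependency an experiment's config lists.
# Directory layout and the "deps" config key follow the proposal above.
from pathlib import Path


def setup_dependencies(dep_dir: str, experiment_config: dict) -> list:
    """Return the setup.sh scripts for the experiment's dependencies,
    failing loudly if a listed dependency has no subdirectory."""
    scripts = []
    for dep in experiment_config.get("deps", []):
        setup = Path(dep_dir) / dep / "setup.sh"
        if not setup.exists():
            raise FileNotFoundError(f"no setup.sh for dependency {dep!r}")
        scripts.append(str(setup))
    return scripts
```

Running the scripts (and sourcing each dependency's register.sh into the experiment's environment before run.sh) would sit on top of this.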

Another cumbersome aspect of the existing setup.sh system is knowing when to rerun the setup because the experiment or the dependency has changed. For example, the current dashboard requires a command-line argument to decide whether it should update its local TVM install. Dependencies could perhaps include an update.sh file that determines whether setup should be rerun (e.g., by checking whether the latest upstream commit has changed). Another possibility would be a system for configuring policies in the dashboard itself, such as rerunning the setup after a specified interval.
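The "rerun setup when the commit changes" policy can be sketched quite simply: record the HEAD commit after a successful setup, and compare against it next time. This is a sketch only — the marker-file name and function names are hypothetical choices, not part of the dashboard:

```python
# Sketch of a staleness check for one dependency checkout: setup is
# stale if HEAD no longer matches the commit recorded at the last
# successful setup. Marker-file name is an arbitrary choice.
import subprocess
from pathlib import Path

MARKER = ".last_setup_commit"


def _head(repo_dir: str) -> str:
    """Current HEAD commit hash of the checkout."""
    out = subprocess.run(["git", "-C", repo_dir, "rev-parse", "HEAD"],
                         capture_output=True, text=True, check=True)
    return out.stdout.strip()


def setup_is_stale(repo_dir: str) -> bool:
    """True if setup has never run, or the checkout has moved since."""
    marker = Path(repo_dir) / MARKER
    if not marker.exists():
        return True
    return marker.read_text().strip() != _head(repo_dir)


def record_setup(repo_dir: str) -> None:
    """Call after a successful setup to remember the current commit."""
    (Path(repo_dir) / MARKER).write_text(_head(repo_dir) + "\n")
```

An interval-based policy would be analogous, comparing a recorded timestamp instead of a commit hash.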

A tough case would be multiple experiments/subsystems requiring different versions of the same dependency (e.g., different TVM commits). The simplest way to handle it would be to have "multiple" dependencies that are each fixed to the versions required; perhaps it might be worth having a way to pass version information to dependencies and build all the separate required versions automatically, but that is unclear.

It is easy to add telemetry/logging for CPU and GPU

CPU:

sensors

then extract the fields you care about (e.g., core temperatures) with sed.

GPU:

nvidia-smi --format=csv --query-gpu="clocks.gr,clocks.current.memory,utilization.gpu,utilization.memory,pstate,power.draw,power.limit,temperature.gpu,temperature.memory,fan.speed"

See the nvidia-smi documentation for the full list of queryable properties, but that query should cover most of the things you would care about logging.

My advice: set up a process that gets some numbers you care about every 5 or 10 seconds, and then logs them. With your infrastructure it should be easy to plot the logs and see if anything funny is going on. The queries themselves shouldn't have too much impact on performance, and I don't think you'll need a granularity finer than about 5 or 10 seconds.
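The advice above can be sketched as a small polling loop: query nvidia-smi every N seconds and append a timestamped CSV line to a log file. The query mirrors the one shown earlier; the log file name, interval, and function names are arbitrary choices, and this obviously only runs on a machine with an NVIDIA GPU and driver installed.

```python
# Sketch of a telemetry poller: one timestamped CSV line per interval.
# Only runs where nvidia-smi is available; names/paths are illustrative.
import subprocess
import time

QUERY = ("clocks.gr,clocks.current.memory,utilization.gpu,"
         "utilization.memory,pstate,power.draw,power.limit,"
         "temperature.gpu,temperature.memory,fan.speed")


def read_gpu_csv() -> str:
    """One CSV line of GPU telemetry from nvidia-smi (no header)."""
    out = subprocess.run(
        ["nvidia-smi", "--format=csv,noheader", f"--query-gpu={QUERY}"],
        capture_output=True, text=True, check=True)
    return out.stdout.strip()


def format_record(timestamp: float, csv_line: str) -> str:
    """One log line: unix timestamp followed by the CSV fields."""
    return f"{timestamp},{csv_line}"


def log_forever(path: str = "gpu_telemetry.log",
                interval: float = 10.0) -> None:
    """Append one record every `interval` seconds, indefinitely."""
    while True:
        record = format_record(time.time(), read_gpu_csv())
        with open(path, "a") as f:
            f.write(record + "\n")
        time.sleep(interval)
```

Run it in the background (or under a process supervisor) for the duration of an experiment, then plot the resulting log with the existing dashboard infrastructure.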

Including telemetry graphs in the webpage

Now that we have the vis_telemetry subsystem, we should include the generated graphs in the generated webpage. @AD1024, can you try modifying the website generator to include the last run's telemetry graphs in the generated web page (say, under a header for each experiment that has a graph)? It would be valuable for usability.
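One possible shape for the change, as a sketch only — the directory layout, file-naming convention, and function name are assumptions about the generator, not the current code: emit an HTML fragment per experiment that embeds that experiment's latest telemetry graphs under its own header.

```python
# Hypothetical sketch: an HTML fragment embedding the last run's
# telemetry graphs for one experiment. Assumes graphs are PNGs named
# after the experiment; returns "" if the experiment produced none.
from pathlib import Path


def telemetry_section(exp_name: str, graph_dir: str) -> str:
    """HTML for one experiment's telemetry graphs, or '' if absent."""
    graphs = sorted(Path(graph_dir).glob(f"{exp_name}*.png"))
    if not graphs:
        return ""
    imgs = "\n".join(f'<img src="{g.name}" alt="{g.stem}">' for g in graphs)
    return f"<h3>Telemetry: {exp_name}</h3>\n{imgs}"
```

The generator would call this once per experiment and concatenate the non-empty fragments into the page, so experiments without graphs get no header.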
