[RFC00101] Energy Draw Tracker

Energy Draw Tracker

Track and monitor the energy draw of experiments involving model training and model inference on GPUs and CPUs.

Summary

The carbon footprint caused by the energy consumption of GPUs and CPUs during model training and model inference can be reduced if that consumption is properly tracked and measures are taken to lower it. This tool will monitor and log GPU/CPU usage for model training and model inference.

Technical Overview

The Energy Draw Tool provides the following features:

  • Easy-to-use plugin for all your experiments, requiring very few lines of code.
  • Can be used as a callback in your TensorFlow/PyTorch/Keras experiments.
  • Usage monitoring with support for multiple devices:
    • GPU
    • CPU:
      • Intel/Mac chips
  • Track the combined energy draw for experiments run on distributed machines.
  • Do tracking for:
    • Model training
    • Model inference
    • Hyperparameter tuning
  • Save emission details to a database or CSV files.
  • Use visualisation to view the emission statistics.
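
The callback-style usage listed above could look roughly like the following. This is a minimal sketch, not the tool's actual API: the class name `EnergyDrawCallback`, its hook names, and the `read_power_watts` and `clock` parameters are all hypothetical. It samples a power reading at the start of training, at the end of each epoch, and at the end of training, then integrates the samples with the trapezoidal rule to estimate energy in watt-hours.

```python
import time
from typing import Callable, List, Tuple


class EnergyDrawCallback:
    """Hypothetical, framework-agnostic callback that samples power draw."""

    def __init__(
        self,
        read_power_watts: Callable[[], float],
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        # read_power_watts is an assumed hook returning instantaneous power in watts.
        self.read_power_watts = read_power_watts
        self.clock = clock
        self.samples: List[Tuple[float, float]] = []  # (seconds, watts)

    def _sample(self) -> None:
        self.samples.append((self.clock(), self.read_power_watts()))

    def on_train_begin(self) -> None:
        self._sample()

    def on_epoch_end(self, epoch: int) -> None:
        self._sample()

    def on_train_end(self) -> None:
        self._sample()

    def energy_wh(self) -> float:
        """Integrate power over time (trapezoidal rule) and convert joules to Wh."""
        joules = 0.0
        for (t0, p0), (t1, p1) in zip(self.samples, self.samples[1:]):
            joules += 0.5 * (p0 + p1) * (t1 - t0)
        return joules / 3600.0
```

To plug this into Keras, the class would need to subclass `keras.callbacks.Callback` and accept the `logs` argument on each hook; the sampling and integration logic would stay the same. Sampling only at epoch boundaries is coarse, so a real tracker would probably sample on a timer instead.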

Alternatives

Rationale

The proposed design was chosen over other designs for several reasons:

  • Some existing designs do not support the new Apple M series Mac processors.
  • Existing designs do not monitor an experiment that runs across several machines at once. The proposed design will monitor the combined output from multiple machines.
  • It is easier to use as a callback in your experiments.
  • It can unify multiple solutions, so that more categories of devices can be supported.

One of the best alternatives to this design is CodeCarbon, but the following issues arise when running with CodeCarbon:

  • CodeCarbon does not support running on Apple M series Mac processors.
  • Running experiments on distributed machines requires the CodeCarbon API to be called on every single one of them.

Drawbacks

  • Implementation would require testing on multiple machines, so the cost of testing would be higher.

Useful References

  • What similar work have we already successfully completed?

  • Is this something that has already been built by others?: No

  • Is there useful academic literature, or are there other articles, related to this topic? (provide links)

  • Have we built a relevant prototype previously? : No

  • Do we have a rough mock for the UI/UX? : No

  • Do we have a schematic for the system? : No

Unresolved Questions

  • What is there that is unresolved (and will be resolved as part of fulfilling this request)?
    • The unresolved question is the impact of machine learning experiments on climate change. This will be resolved as part of fulfilling this request.
  • Are there other requests with the same or similar problems to solve? : No

Parts of the System Affected

  • Which parts of the current system are affected by this request? : None
  • What other open requests are closely related with this request? : None
  • Does this request depend on fulfillment of any other request? : None
  • Does any other request depend on the fulfillment of this request? : None

Future possibilities

  • The API could be extended to adopt reduction strategies for energy consumption.
  • The system could be used widely alongside many other ML tools to help track energy.
  • It could provide energy reduction strategies aligned with other ML tools and their implementation methodology.

Infrastructure

  • Detect machine details:

    • Check if CPU or GPU.
    • Check the processor (Intel/Mac).
    • Check if distributed machines are being used.
  • Run energy tracking

    • Build separate APIs that together cover all machines and processors.
    • If distributed machines are being used, automatically add the tracker callback to the scripts run on those machines.
  • Logging the output

    • The output could be logged and viewed as
      • CSV files
      • Databases
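
The "detect machine details" step above can be sketched with the Python standard library alone. This is an illustrative sketch under stated assumptions, not the tool's implementation: the function names `classify_cpu` and `detect_machine` are hypothetical, and the presence of an `nvidia-smi` binary on `PATH` is used as a stand-in for "an Nvidia GPU is available". Detection of distributed runs is not covered here.

```python
import platform
import shutil
from typing import Dict


def classify_cpu(system: str, machine: str) -> str:
    """Map OS and architecture strings to a coarse processor family."""
    machine = machine.lower()
    if system == "Darwin" and machine == "arm64":
        return "apple-m-series"
    if machine in ("x86_64", "amd64", "i386", "i686"):
        # Covers Intel; AMD chips report the same architecture strings.
        return "x86"
    return "unknown"


def detect_machine() -> Dict[str, object]:
    """Collect the details a tracker would need before choosing a backend."""
    system = platform.system()
    machine = platform.machine()
    return {
        "system": system,
        "machine": machine,
        "cpu_family": classify_cpu(system, machine),
        # Assumption: nvidia-smi on PATH implies a usable Nvidia GPU.
        "has_nvidia_gpu": shutil.which("nvidia-smi") is not None,
    }
```

Note that a Python interpreter running under Rosetta on an M series Mac reports `x86_64`, so a real implementation would need an extra check for that case.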

Testing

The testing procedure can be done in the following steps:

  • A TensorFlow example for model training:

    • Run the API for tracking with GPU, log the monitored output into a csv file.
    • CPU tracking API:
      • Run the API for tracking with Intel based processors, log the monitored output into a csv file.
      • Run the API for tracking with M series Mac based processors, log the monitored output into a csv file.
  • A TensorFlow example for model inference:

    • Run the API for tracking with GPU, log the monitored output into a csv file.
    • CPU Tracking API:
      • Run the API for tracking with Intel based processors, log the monitored output into a csv file.
      • Run the API for tracking with M series Mac based processors, log the monitored output into a csv file.
  • A Talos example for hyperparameter tuning:

    • Run the API for tracking with GPU, log the monitored output into a csv file.
    • CPU tracking API:
      • Run the API for tracking with Intel based processors, log the monitored output into a csv file.
      • Run the API for tracking with M series Mac based processors, log the monitored output into a csv file.

Documentation

Describe the level of documentation fulfilling this request involves. Consider both end-user documentation and developer documentation.

  • End User Documentation:

    • Introduction
      • Mission
      • Summary
      • Frequently Asked Questions
    • Getting Started
      • Installation
      • Quickstart
      • Examples
    • Logging
      • Output logging
      • Visualisation
  • Developer Documentation

    • Tracker
      • API reference for tracking with GPU
      • API reference for tracking with CPU:
        • API reference for tracking with Intel based processors
        • API reference for tracking with M series Mac processors
    • Logging
      • API reference for logging to csv files.
      • API reference for logging to databases
      • API reference for visualisation tools

Version History.

Version 0.0.1

Recordings.

Work Phases.

Non-Coding.

  • Planning
  • Documentation
  • Prototype Release
  • Testing

Implementation.

API

  • Build an API for tracking with Nvidia GPU devices, using the features of the nvidia-smi command.
    • Build Callbacks
    • Build API plugin to use with python.

References :
* power draw callback
* GpuStat
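
As a rough illustration of the nvidia-smi approach mentioned above, the sketch below queries the current power draw of each GPU and parses the result. The function names are hypothetical; the `--query-gpu=power.draw` and `--format=csv,noheader,nounits` options are standard nvidia-smi query options, which print one reading in watts per GPU. Readings that are not numeric (some GPUs report a placeholder when power reporting is unsupported) are skipped.

```python
import subprocess
from typing import List

# One line per GPU, value in watts, no header row and no unit suffix.
QUERY = ["nvidia-smi", "--query-gpu=power.draw", "--format=csv,noheader,nounits"]


def parse_power_draw(output: str) -> List[float]:
    """Parse nvidia-smi output into a list of per-GPU power readings (watts)."""
    readings: List[float] = []
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            readings.append(float(line))
        except ValueError:
            # Non-numeric placeholder for an unsupported GPU; skip it.
            continue
    return readings


def read_gpu_power_watts() -> List[float]:
    """Run nvidia-smi once and return the current power draw per GPU."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_power_draw(result.stdout)
```

Keeping the parser separate from the subprocess call makes it possible to test the parsing on machines without an Nvidia GPU.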

  • Build an API for tracking with CPU (Intel/Mac).
    • Build Callbacks
    • Build API plugin to use with python.

References :
* PyRAPL
* EnergyUsage
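
For Intel CPUs on Linux, RAPL energy counters are exposed through the powercap sysfs interface as cumulative values in microjoules. The sketch below is an assumption about how a tracker could use them directly, and is not taken from PyRAPL or EnergyUsage: it reads the counter before and after a workload and converts the difference to watt-hours, allowing for one counter wraparound. The function names and the default path for the first CPU package are illustrative. This interface does not exist on M series Macs, which would need a separate backend.

```python
from pathlib import Path

# Assumed location of the first CPU package's RAPL domain on Linux.
RAPL_DIR = Path("/sys/class/powercap/intel-rapl:0")


def read_counter_uj(name: str = "energy_uj", base: Path = RAPL_DIR) -> int:
    """Read a RAPL counter file; values are in microjoules."""
    return int((base / name).read_text().strip())


def delta_wh(start_uj: int, end_uj: int, max_range_uj: int) -> float:
    """Convert the difference between two counter readings to watt-hours."""
    delta = end_uj - start_uj
    if delta < 0:
        # The counter wrapped past its maximum between the two readings.
        delta += max_range_uj
    joules = delta / 1_000_000
    return joules / 3600
```

A caller would read `energy_uj` before and after the experiment and pass `max_energy_range_uj` as the third argument. On recent kernels reading `energy_uj` typically requires elevated permissions.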

Docker

  • Write Dockerfile and upload the image to docker hub

Distributed Run

  • Track scripts running on distributed machines.
    • Add support for energy tracking for hyperparameter tuning using Jako

Logging

  • Write an API to log energy output to a CSV file. Columns include Timestamp, start time, end time, Energy in watt-hours (Wh), Device Type, Processor Type.
  • Write an API to log output to a Postgres database. Columns include Timestamp, start time, end time, Energy in watt-hours (Wh), Device Type, Processor Type.
  • Add a Hasura API to manage the Postgres database.
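
The CSV logging described above could be sketched as follows. The column identifiers mirror the columns listed in this section, but the exact names, the `write_header` and `log_energy` functions, and the ISO 8601 timestamp format are assumptions made for illustration.

```python
import csv
from datetime import datetime, timezone
from typing import TextIO

COLUMNS = [
    "timestamp",
    "start_time",
    "end_time",
    "energy_wh",
    "device_type",
    "processor_type",
]


def write_header(stream: TextIO) -> None:
    """Write the column header row once, before any records."""
    csv.DictWriter(stream, fieldnames=COLUMNS).writeheader()


def log_energy(
    stream: TextIO,
    start: datetime,
    end: datetime,
    energy_wh: float,
    device_type: str,
    processor_type: str,
) -> None:
    """Append one energy record; timestamp is the time of logging (UTC)."""
    writer = csv.DictWriter(stream, fieldnames=COLUMNS)
    writer.writerow(
        {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "start_time": start.isoformat(),
            "end_time": end.isoformat(),
            "energy_wh": energy_wh,
            "device_type": device_type,
            "processor_type": processor_type,
        }
    )
```

When writing to a real file, open it with `newline=""` as the `csv` module documentation recommends, for example `open("energy.csv", "a", newline="")`.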

Visualisation

  • Write APIs for visualisation with Plotly/Dash and/or Metabase, using the logged output from the CSV files or the database.

Documentation.

Write End User documentation, as well as Developer documentation.

  • End User Documentation:

    • Introduction
      • Mission
      • Summary
      • Frequently Asked Questions
    • Getting Started
      • Installation
      • Quickstart
      • Examples
    • Logging
      • Output logging
      • Visualisation
  • Developer Documentation

    • Tracker
      • API reference for tracking with GPU
      • API reference for tracking with CPU:
        • API reference for tracking with Intel based processors
        • API reference for tracking with M series Mac processors
    • Logging
      • API reference for logging to csv files
      • API reference for logging to databases
      • API reference for visualisation tools

Testing

All the testing can use the Bitcoin price prediction example

  • For model training:

    • Run the API for tracking with GPU, log the monitored output into a csv file.
    • CPU Tracking:
      • Run the API for tracking with Intel based processors, log the monitored output into a csv file.
      • Run the API for tracking with M series Mac based processors, log the monitored output into a csv file.
  • For model inference :

    • Run the API for tracking with GPU, log the monitored output into a csv file.
    • CPU Tracking:
      • Run the API for tracking with Intel based processors, log the monitored output into a csv file.
      • Run the API for tracking with M series Mac based processors, log the monitored output into a csv file.
  • For hyperparameter tuning (Using Talos for hyperparameter tuning) :

    • Run the API for tracking with GPU, log the monitored output into a csv file.
    • CPU Tracking:
      • Run the API for tracking with Intel based processors, log the monitored output into a csv file.
      • Run the API for tracking with M series Mac based processors, log the monitored output into a csv file.
