sandaar / adaptor

This project forked from mkpankov/adaptor

Framework for statistical modeling of computer performance

Contains the 'Adaptor' computer performance modeling framework.

Author: Michael K. Pankov, graduate of Bauman Moscow State Technical University.

Installation

$ means a superuser console (use sudo on Ubuntu). # means a regular user console.

  • Python 2.7.*

    $ apt-get install python2.7

    • easy_install

      # wget https://bitbucket.org/pypa/setuptools/raw/bootstrap/ez_setup.py -O - | sudo python

    • pip

      $ easy_install pip

    • ipython

      $ apt-get install ipython

    • recordtype

      $ pip install recordtype

    • parse

      $ pip install parse

    • matplotlib

      $ pip install matplotlib

    • numpy

      $ pip install numpy

  • CouchDB

    $ apt-get install couchdb

    • CouchApp

      $ pip install couchapp

    • CouchDB Kit

      $ pip install couchdbkit

  • Tools

    $ pip install ipdb

  • Orange

    Refer to the Orange download page, section "Building from source", subsection "setup.py".
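After installation, a minimal sketch like the following can confirm that the Python packages listed above import correctly. It assumes each package's top-level import name (numpy, matplotlib, recordtype, parse, couchdbkit, couchapp, ipdb). It does not check the CouchDB server or Orange. The helper name missing_modules is hypothetical and is not part of Adaptor.

```python
import importlib

# Top-level import names assumed for the packages installed above.
# CouchDB (the server) and Orange are not checked here.
REQUIRED_MODULES = [
    "numpy",
    "matplotlib",
    "recordtype",
    "parse",
    "couchdbkit",
    "couchapp",
    "ipdb",
]


def missing_modules(names):
    """Return the names from `names` that fail to import."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except Exception:
            # A missing package raises ImportError; a broken install may
            # raise something else, so treat any failure as "missing".
            missing.append(name)
    return missing


if __name__ == "__main__":
    absent = missing_modules(REQUIRED_MODULES)
    if absent:
        print("Missing modules: " + ", ".join(absent))
    else:
        print("All required modules are importable.")
```

The code only uses importlib.import_module, so it runs unchanged on Python 2.7 as well as Python 3.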

Running

  • ipython

    • import system
    • import scenarios
    • cpdh_main(...)

Useful links

  1. Orange tutorial.
  2. Scikit-learn. Seems to have what we need. Average documentation.
    • Documentation turned out to be quite good (it explains the models). Has many regression models, notably isotonic regression, which is possibly what is useful for us. Has an Ubuntu package.
    • The tutorial showed it's a decent package, although it lacks easy visualization, which Orange provides in many forms.
  3. mlpy. Seems to have what we need. Best documentation.
    • Has a lot of regression models and decent Python-style documentation with examples (!). Has Ubuntu package.
  4. PyML. Seems to have what we need. Somewhat documented.
  5. Orange. Has graphical interface. Maybe has what we need. Average documentation.
    • Current option.
    • Orange turned out to be laggy and buggy (especially on Linux) and very poorly documented. Apart from that, its name makes it impossible to Google for. Its graphical interactive version is barely usable. Maybe it's better for scripting, however. We will now go with another option.
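Isotonic regression, noted above as possibly useful, finds the least-squares non-decreasing sequence closest to the data. The following is a minimal, dependency-free sketch of the technique using the pool-adjacent-violators algorithm. It assumes the measurements are already sorted by the explanatory variable, for example runtimes ordered by increasing dataset size. That ordering is an illustrative assumption; the notes above do not state it. This is not scikit-learn's implementation, which provides the technique as sklearn.isotonic.IsotonicRegression. The name isotonic_fit is hypothetical.

```python
def isotonic_fit(y, weights=None):
    """Least-squares non-decreasing fit to `y` (pool-adjacent-violators).

    `y` must already be ordered by the explanatory variable
    (e.g. dataset size). Returns one fitted value per input point.
    """
    if weights is None:
        weights = [1.0] * len(y)
    # Each block holds [weighted mean, total weight, number of points].
    blocks = []
    for value, w in zip(y, weights):
        blocks.append([float(value), float(w), 1])
        # Pool adjacent blocks while the newest one breaks monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            mean2, w2, n2 = blocks.pop()
            mean1, w1, n1 = blocks.pop()
            total = w1 + w2
            blocks.append([(mean1 * w1 + mean2 * w2) / total, total, n1 + n2])
    # Expand each pooled block back to its original points.
    fitted = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)
    return fitted
```

For example, isotonic_fit([1, 3, 2, 4]) pools the violating pair 3, 2 into their mean and returns [1.0, 2.5, 2.5, 4.0].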

TODO

  1. [ ] Fix the system setup changing the current directory.
  2. [ ] Add Windows support.
  3. [ ] Add support for Polybench/GPU.
  4. [ ] Perform experiments on GPU.
  5. [-] Think over the workflow. It is as follows.
    • Overall, it's postponed until we have at least a locally working system.
    1. [ ] Data is collected until a certain number of experiments has been performed.
    2. [ ] A model is learned on these experiments. It's kept as simple as possible. Source code features and optimization flags together make up a very large number of features, which will possibly lead to overfitting. To avoid that, we should consider using aggregated features (like the optimization level instead of individual optimizations). The model is one of two kinds.
      • This model should take into account the hardware-software platform, dataset size and guess good compiler parameters to reach optimal performance.
      • This model should take into account the hardware-software platform, dataset size and make a prediction of performance given some fixed compiler settings.
    3. [ ] Search is directed using feature ranking: top-ranked features are explored first. However, whether a search is needed at all should be reconsidered; perhaps just normal program launches should happen instead. In any case, we then assume that some experiments were conducted the specified number of times. If we're lucky, we get new points in the interesting area. The system could tune settings automatically without notifying the user. That could annoy the user, but it could be disabled at will. It would improve the search by focusing it on the interesting area.
    4. [ ] A new model is learned. Basically, it's a loop of experimenting and learning.
    • The scenario itself is an attempt to build a regression model based on feature selection. Feature selection will be implemented to account for the possible need for different models on different platforms, which is not obviously required per se.
    • Maybe building the regression model offline is not so useful. We should aim for online learning.
      • In general, the system should behave as a cloud service.
      • One thought is that we should periodically detect outliers for the current model and re-learn it. When re-learning fails (as it will, given observations the current model does not expect), we add a new model, which is used for new examples. Outliers are removed from the current model's data, and the new model is learned on them. The weakness of this approach is telling apart outliers that are genuinely unpredictable data from those that are just noise.
  6. [-] Add automatic building of a dummy program.
  7. [-] Add dependency checking: numpy, recordtype, couchdb, couchdbkit, couchapp.
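Item 5.2 above suggests replacing individual optimization flags with aggregated features to reduce overfitting. The following is a minimal sketch, assuming GCC-style flags. It collapses a flag list into three numbers: the -O level, the count of enabled -f flags, and the count of disabled -fno- flags. The feature names and the numeric placement of -Og, -Os, and -Ofast are hypothetical choices for illustration. The function name aggregate_flags is also hypothetical.

```python
# Hypothetical mapping of GCC -O options to a single numeric level;
# -Og, -Os and -Ofast are placed on the nearest level for illustration.
OPT_LEVELS = {
    "-O0": 0, "-O": 1, "-O1": 1, "-Og": 1,
    "-O2": 2, "-Os": 2, "-O3": 3, "-Ofast": 3,
}


def aggregate_flags(flags):
    """Collapse a compiler flag list into a few aggregated features."""
    level = 0  # GCC behaves as -O0 when no -O option is given
    enabled = disabled = 0
    for flag in flags:
        if flag in OPT_LEVELS:
            level = OPT_LEVELS[flag]  # GCC uses the last -O option given
        elif flag.startswith("-fno-"):
            disabled += 1
        elif flag.startswith("-f"):
            enabled += 1
    return {"opt_level": level, "f_enabled": enabled, "f_disabled": disabled}
```

With this sketch, each experiment contributes three flag features instead of one per flag. Whether these three keep enough information to predict performance would still need to be checked on real data.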
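The outlier-driven re-learning idea under item 5 can be sketched as follows. The sketch makes several assumptions:

- The model is a one-feature linear model fitted by ordinary least squares.
- A point is an outlier when its residual exceeds k times the root-mean-square residual, with k = 2.0.
- A second model is learned only if at least two outliers remain.

The names fit_line, split_outliers, and relearn, the threshold, and the single-feature model are all hypothetical choices; the notes above leave the model and the outlier criterion open. The sketch covers only detection and re-learning. It does not route new examples between models, and it does not solve the noise-versus-unpredictable-data problem.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / float(n)
    mean_y = sum(ys) / float(n)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx else 0.0
    return mean_y - b * mean_x, b


def split_outliers(xs, ys, k=2.0):
    """Split points into inliers and outliers of the current model.

    A point is an outlier if its residual exceeds k times the
    root-mean-square residual of the fit on all points.
    """
    a, b = fit_line(xs, ys)
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
    rms = math.sqrt(sum(r * r for r in residuals) / len(residuals))
    inliers, outliers = [], []
    for x, y, r in zip(xs, ys, residuals):
        if rms > 0 and abs(r) > k * rms:
            outliers.append((x, y))
        else:
            inliers.append((x, y))
    return inliers, outliers


def relearn(xs, ys, k=2.0):
    """Re-learn the current model on its inliers and, if enough outliers
    remain, learn an additional model on them. Returns [(a, b), ...]."""
    inliers, outliers = split_outliers(xs, ys, k)
    in_x, in_y = zip(*inliers)
    models = [fit_line(in_x, in_y)]
    if len(outliers) >= 2:
        out_x, out_y = zip(*outliers)
        models.append(fit_line(out_x, out_y))
    return models
```

For instance, take ten points lying on y = x, except for one measurement at (9, 50). The single outlier is flagged and the current model is re-learned as y = x. No second model is created, because fewer than two outliers remain.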

Ideas

Currently none.
