
reimplement's Introduction

Reimplement

reimplement's People

Contributors

vivekaxl

reimplement's Issues

Comparisons of dumb learner with progressive and projective sampling

Hypothesis

To find the most efficient configuration for a given workload, an accurate model of the system is not required. Rather, a dumb model is 'good enough'.

How to build this 'good enough' model?

  • Split the data into 40-20-40 partitions: the training, validation, and testing datasets
  • Progressively sample the configurations and build a CART model (from Guo, 2013)
  • Use a new measure called rank difference, which is the mean of (actual rank - predicted rank) over the validation data. Please note that the ranks (relative ranks) of the validation set are known
  • Stopping criterion: if the mean rank difference doesn't decrease in three contiguous generations, then stop
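The rank-difference measure from the list above can be sketched as follows. This is a minimal illustration, not the repository's code: the helper names `ranks` and `mean_rank_difference` are hypothetical, and the sketch assumes the absolute value of each difference is taken so that over- and under-estimates do not cancel.

```python
def ranks(values):
    """Return the 1-based rank of each value (rank 1 = smallest value)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def mean_rank_difference(actual, predicted):
    """Mean of |actual rank - predicted rank| over a validation set.

    Assumption: absolute differences are used, so errors in opposite
    directions do not cancel each other out.
    """
    actual_ranks = ranks(actual)
    predicted_ranks = ranks(predicted)
    diffs = [abs(a - p) for a, p in zip(actual_ranks, predicted_ranks)]
    return sum(diffs) / float(len(diffs))


# Predictions that are far off in value can still rank almost perfectly.
actual = [10, 20, 30, 40]
predicted = [100, 250, 200, 900]
print(mean_rank_difference(actual, predicted))
```

In this example the predicted values are roughly ten times too large, yet only two neighbouring configurations swap places, which is the sense in which a dumb model can be 'good enough' for ranking.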

What is the minimum rank found?

  • Given a software system and a representative workload, the task of a practitioner is to find the best possible configuration for the system. This can be rephrased as follows: if we sort the list of configurations by their performance values, we need to find the configuration with the lowest performance score, aka rank 1. The minimum rank is the lowest actual rank achieved by the models on the testing data.
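One way to read this definition is sketched below. The function name is hypothetical, and the sketch assumes that the model's choice is the single configuration with the lowest predicted score and that lower scores are better; the repository may select candidates differently.

```python
def actual_rank_of_predicted_best(actual, predicted):
    """Actual rank (1 = lowest true score) of the configuration that the
    model predicts to be the best.

    Assumption: the model's pick is the configuration with the lowest
    predicted score.
    """
    best_index = min(range(len(predicted)), key=lambda i: predicted[i])
    # Count the configurations that are strictly better in reality.
    better = sum(1 for value in actual if value < actual[best_index])
    return better + 1


# The model picks the third configuration, which is truly the second best.
print(actual_rank_of_predicted_best([30, 10, 20, 40], [5, 7, 1, 9]))
```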

Comparisons

  • Progressive Sampling:
    Progressively increase the size of the training dataset until the prediction error (MMRE) falls below a particular threshold (10% in this case).
  • Projective Sampling:
    Using a heuristic called feature-frequencies, an initial population is generated and used to estimate the learning curve. Once the learning curve is found, the optimal size of the training set is also known.
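For reference, the accuracy-based stopping rule of progressive sampling can be sketched like this. MMRE is the standard mean magnitude of relative error; `progressive_sample_size` and its `error_at_size` callback are hypothetical names standing in for "train a model on this many configurations and measure its MMRE".

```python
def mmre(actual, predicted):
    """Mean magnitude of relative error: mean(|actual - predicted| / |actual|)."""
    errors = [abs(a - p) / float(abs(a)) for a, p in zip(actual, predicted)]
    return sum(errors) / len(errors)


def progressive_sample_size(sizes, error_at_size, threshold=0.10):
    """Return the first training-set size whose MMRE is at or below the
    threshold, or None if no size in `sizes` is accurate enough.

    `error_at_size` is a hypothetical callback: it trains a model on
    `size` configurations and returns the resulting MMRE.
    """
    for size in sizes:
        if error_at_size(size) <= threshold:
            return size
    return None


# Illustrative, made-up learning curve: error shrinks as the sample grows.
made_up_errors = {10: 0.30, 20: 0.12, 30: 0.08}
print(progressive_sample_size([10, 20, 30], made_up_errors.get))
```

Unlike the rank-based criterion, this rule keeps sampling until the predicted values themselves are accurate, which is why it can need many more configurations.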

Summary of Results

  • Wins (15/22), Not so much wins (3/22), Loss (4/22)

Not so much wins?

The rank-based method uses far fewer configurations than the other two methods, but the minimum ranks it finds are statistically significantly worse than those of the competing methods:

  • BDBC - 35 < 44/203
  • rs-6d-c3-obj1 - 25 < 452/653
  • sol-6d-c2-obj2 - 26 < 323/1143

Even so, look at the minimum ranks actually found: BDBC 2/2561 (a median rank of 2 out of 2561 possible configurations), rs-6d-c3-obj1 2/3840, and sol-6d-c2-obj2 5/2862.

Losses

The rank-based method uses more configurations here. This is because, for these datasets (e.g. SQL), accurate models can be built from only a few configurations, so the accuracy-based methods terminate much faster. Our counter to these results would be that in the real world it is difficult to find systems that can be modeled so accurately. I don't have a reference for this statement, but I am looking for one.

Confusion about policies.py

Here the author defined the function policy() in policies.py to control the stopping criterion of the rank-based and progressive methods.

def policy(scores, lives=3):
    """
    no improvement in last 3 runs
    """
    temp_lives = lives
    last = scores[0]
    for i,score in enumerate(scores):
        if i > 0:
            if temp_lives == 0:
                return i
            elif score >= last:
                temp_lives -= 1
                last = score
            else:
                temp_lives = lives  # Line 18
                last = score
    return -1

But shouldn't the variable lives in Line 18 be temp_lives? I.e., Line 18 should be

temp_lives = temp_lives  # or you can simply remove Line 18

The paper "Using bad learners to find good configurations" does not say that the failures to improve have to occur in three consecutive runs.
I think the stopping criterion in the code is therefore stricter than the one in your paper.
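To make the difference concrete, here is a small sketch that runs both readings on the same scores. `policy_with_reset` mirrors the quoted policy(); `policy_without_reset` is the variant proposed above with Line 18 removed. Both function names and the example scores are made up for illustration.

```python
def policy_with_reset(scores, lives=3):
    """Mirrors the quoted policy(): lives are restored on every improvement."""
    remaining = lives
    last = scores[0]
    for i, score in enumerate(scores):
        if i > 0:
            if remaining == 0:
                return i
            elif score >= last:
                remaining -= 1       # no improvement: lose a life
                last = score
            else:
                remaining = lives    # improvement: restore all lives
                last = score
    return -1


def policy_without_reset(scores, lives=3):
    """Proposed variant: lives are never restored after an improvement."""
    remaining = lives
    last = scores[0]
    for i, score in enumerate(scores):
        if i > 0:
            if remaining == 0:
                return i
            elif score >= last:
                remaining -= 1       # no improvement: lose a life
            last = score
    return -1


# Made-up mean rank differences; lower is better.
scores = [10, 11, 12, 9, 10, 11, 12, 8]
print(policy_with_reset(scores))     # stops at index 7
print(policy_without_reset(scores))  # stops at index 5
```

With the reset, the improvement at index 3 restores all three lives, so sampling continues for two more generations than in the variant without the reset.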
