reimplement's Issues
Compare with old results
Compare z-score with minmax
Try K-Means
Try Hierarchical clustering
Active Learning vs Random Sampling
Active Learning: make it progressive
add active learning
Use Median Rank rather than minimum rank
Comparisons of dumb learner with progressive and projective sampling
Hypothesis
To find the most efficient configuration for a given workload, an accurate model of the system is not required. Rather, a "dumb" model is good enough.
How to build this 'good enough' model?
- Split the data into 40-20-40 partitions: training, validation, and testing datasets.
- Progressively sample the configurations and build a CART model (from Guo, 2013).
- Use a new measure called rank difference, the mean of (actual rank - predicted rank) over the validation data. Note that the (relative) ranks of the validation set are known.
- Stopping criterion: if the mean rank difference doesn't decrease for three consecutive generations, stop.
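The rank-difference measure described above can be sketched as follows (illustrative code, not the repo's; it assumes lower performance scores are better and uses the absolute difference between ranks):

```python
def rank_difference(actual_perf, predicted_perf):
    """Mean |actual rank - predicted rank| over the validation set."""
    def ranks(values):
        # rank 1 = best (smallest) performance value
        order = sorted(range(len(values)), key=lambda i: values[i])
        r = [0] * len(values)
        for rank, idx in enumerate(order, start=1):
            r[idx] = rank
        return r

    actual = ranks(actual_perf)
    predicted = ranks(predicted_perf)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Example: the model swaps the ranks of the two worst configurations.
print(rank_difference([10, 20, 30, 40], [12, 18, 45, 33]))  # 0.5
```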
What is minimum rank found?
- Given a software system and a representative workload, the task of a practitioner is to find the best possible configuration for the system. This can be rephrased as: if we sort the list of configurations by their performance values, we need to find the configuration with the least performance score, aka rank 1. Minimum rank is the lowest actual rank achieved by the models on the testing data.
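The idea of "minimum rank found" can be sketched as follows (illustrative code, not from the repo): rank the test configurations by their actual performance, then report the best actual rank among the configurations the model predicts to be best.

```python
def minimum_rank(actual_perf, predicted_perf, top_n=1):
    """Best actual rank among the model's top_n predicted configurations."""
    # Actual rank of each configuration (rank 1 = smallest performance score).
    order = sorted(range(len(actual_perf)), key=lambda i: actual_perf[i])
    actual_rank = {idx: r for r, idx in enumerate(order, start=1)}
    # Configurations the model believes are best.
    predicted_best = sorted(range(len(predicted_perf)),
                            key=lambda i: predicted_perf[i])[:top_n]
    return min(actual_rank[i] for i in predicted_best)

# The model picks index 1 as best; its true rank is 2 out of 4.
print(minimum_rank([5.0, 3.0, 9.0, 1.0], [4.0, 2.0, 8.0, 3.0]))  # 2
```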
Comparisons
- Progressive Sampling:
Progressively increase the size of the training dataset until the prediction accuracy (MMRE) reaches a particular threshold (10% in this case).
- Projective Sampling:
Using a heuristic called feature frequencies, an initial population is generated and used to estimate the learning curve. Once the learning curve is found, the optimal size of the training set is also known.
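MMRE, the accuracy measure used as the progressive-sampling stopping threshold above, is commonly defined as the mean magnitude of relative error; a minimal sketch (names are illustrative):

```python
def mmre(actual, predicted):
    """Mean Magnitude of Relative Error: mean(|actual - predicted| / actual)."""
    return sum(abs(a - p) / a for a, p in zip(actual, predicted)) / len(actual)

# Relative errors are 10%, 10%, and 0%, so the mean is ~6.7%.
print(mmre([100, 200, 400], [90, 220, 400]))
```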
Summary of Results
- Wins (15/22), not-so-much wins (3/22), losses (4/22)
Not so much wins?
These runs use far fewer configurations than the other two methods:
- BDBC - 35 < 44/203
- rs-6d-c3-obj1 - 25 < 452/653
- sol-6d-c2-obj2 - 26 < 323/1143
whereas the minimum ranks found are statistically significantly worse than those of the competing methods. But if you look at the minimum ranks found: BDBC 2/2561 (median rank 2 out of 2561 possible configurations), rs-6d-c3-obj1 2/3840, and sol-6d-c2-obj2 5/2862.
Losses
The rank-based method uses more configurations. This is because some of these datasets (e.g., SQL) can be used to build accurate models with only a few configurations, so the competing methods terminate much faster. Our counter to these results would be that, in the real world, it is difficult to find systems that can be modeled so accurately. I don't have a reference for this statement, but I am looking for one.
minmax normalization
How does WHAT perform when compared to Random Sampling (Guo)?
Is an accurate model even required?
Confusion about policies.py
Here the author defined the function policy() in policies.py to control the stopping criterion of the rank-based and progressive methods.
```python
def policy(scores, lives=3):
    """
    no improvement in last 3 runs
    """
    temp_lives = lives
    last = scores[0]
    for i, score in enumerate(scores):
        if i > 0:
            if temp_lives == 0:
                return i
            elif score >= last:
                temp_lives -= 1
                last = score
            else:
                temp_lives = lives  # Line 18
                last = score
    return -1
```
But do you think that the variable lives in Line 18 should be temp_lives? I.e., Line 18 should be
temp_lives = temp_lives  # or you can remove Line 18
The paper "Using bad learners to find good configurations" didn't say that the scores should improve in the last 3 consecutive runs.
I think this stopping criterion is stricter than the one in your paper.
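To make the difference concrete, here is a minimal sketch (illustrative, not from the repo) contrasting the two readings of Line 18. The current code restores the lives budget whenever the score improves, so an alternating score sequence never triggers a stop; the proposed change never restores the budget, so three non-improvements anywhere are enough:

```python
def policy_reset(scores, lives=3):
    """Current code: the lives budget is restored on every improvement."""
    temp_lives = lives
    last = scores[0]
    for i, score in enumerate(scores):
        if i > 0:
            if temp_lives == 0:
                return i
            elif score >= last:   # no improvement: spend one life
                temp_lives -= 1
            else:                 # improvement: restore the full budget
                temp_lives = lives
            last = score
    return -1

def policy_no_reset(scores, lives=3):
    """Proposed change: the lives budget is never restored."""
    temp_lives = lives
    last = scores[0]
    for i, score in enumerate(scores):
        if i > 0:
            if temp_lives == 0:
                return i
            elif score >= last:   # no improvement: spend one life
                temp_lives -= 1
            last = score
    return -1

scores = [5, 4, 5, 4, 5, 4, 5, 4]   # improvement on every other step
print(policy_reset(scores))     # -1 (never stops)
print(policy_no_reset(scores))  # 7  (stops after 3 total non-improvements)
```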
Find difficult datasets
Bring all the data together
add feature weighting
None of this really makes much difference.
Look at PCA components and do predictions
Related Work
- Chen, Haifeng, et al. "A cooperative sampling approach to discovering optimal configurations in large scale computing systems." Proceedings of the 29th IEEE Symposium on Reliable Distributed Systems. IEEE, 2010.
Comparing Dumb Learner with random progressive sampling
from #19
z-score normalization
add random progressive sampling
Stats test to show how different it is from random sampling