reimplement's Issues
Compare with old results
Compare z-score with minmax
Try K-Means
Try Hierarchical clustering
Active Learning vs Random Sampling
Active Learning: make it progressive
add active learning
Use Median Rank rather than minimum rank
Comparisons of dumb learner with progressive and projective sampling
Hypothesis
To find the most efficient configuration for a given workload, an accurate model of the system is not required. Rather, a "dumb" model is good enough.
How to build this 'good enough' model?
- Split the data into 40-20-40 partitions: training, validation, and testing datasets.
- Progressively sample the configurations and build a CART model (from Guo, 2013).
- Use a new measure called rank difference, the mean of (actual rank - predicted rank) over the validation data. Note that the (relative) ranks of the validation set are known.
- Stopping criterion: if the mean rank difference doesn't decrease for three consecutive generations, stop.
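The rank-difference measure described above can be sketched as follows (illustrative code, not the repo's; it assumes lower performance scores are better and uses the absolute difference between ranks):

```python
def rank_difference(actual_perf, predicted_perf):
    """Mean |actual rank - predicted rank| over the validation set."""
    def ranks(values):
        # rank 1 = best (smallest) performance value
        order = sorted(range(len(values)), key=lambda i: values[i])
        r = [0] * len(values)
        for rank, idx in enumerate(order, start=1):
            r[idx] = rank
        return r

    actual = ranks(actual_perf)
    predicted = ranks(predicted_perf)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Example: the model swaps the ranks of the two worst configurations.
print(rank_difference([10, 20, 30, 40], [12, 18, 45, 33]))  # 0.5
```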
What is minimum rank found?
- Given a software system and a representative workload, the task of a practitioner is to find the best possible configuration for the system. This can be rephrased as: if we sort the list of configurations by their performance values, we need to find the configuration with the least performance score, aka rank 1. Minimum rank is the lowest actual rank achieved by the models on the testing data.
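The idea of "minimum rank found" can be sketched as follows (illustrative code, not from the repo): rank the test configurations by their actual performance, then report the best actual rank among the configurations the model predicts to be best.

```python
def minimum_rank(actual_perf, predicted_perf, top_n=1):
    """Best actual rank among the model's top_n predicted configurations."""
    # Actual rank of each configuration (rank 1 = smallest performance score).
    order = sorted(range(len(actual_perf)), key=lambda i: actual_perf[i])
    actual_rank = {idx: r for r, idx in enumerate(order, start=1)}
    # Configurations the model believes are best.
    predicted_best = sorted(range(len(predicted_perf)),
                            key=lambda i: predicted_perf[i])[:top_n]
    return min(actual_rank[i] for i in predicted_best)

# The model picks index 1 as best; its true rank is 2 out of 4.
print(minimum_rank([5.0, 3.0, 9.0, 1.0], [4.0, 2.0, 8.0, 3.0]))  # 2
```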
Comparisons
- Progressive Sampling:
Progressively increase the size of the training dataset until the prediction accuracy (MMRE) reaches a particular threshold (10% in this case).
- Projective Sampling:
Using a heuristic called feature frequencies, an initial population is generated and used to estimate the learning curve. Once the learning curve is found, the optimal size of the training set is also known.
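MMRE, the accuracy measure used as the progressive-sampling stopping threshold above, is commonly defined as the mean magnitude of relative error; a minimal sketch (names are illustrative):

```python
def mmre(actual, predicted):
    """Mean Magnitude of Relative Error: mean(|actual - predicted| / actual)."""
    return sum(abs(a - p) / a for a, p in zip(actual, predicted)) / len(actual)

# Relative errors are 10%, 10%, and 0%, so the mean is ~6.7%.
print(mmre([100, 200, 400], [90, 220, 400]))
```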
Summary of Results
- Wins (15/22), not-so-much wins (3/22), losses (4/22)
Not so much wins?
These runs use far fewer configurations than the other two methods:
- BDBC - 35 < 44/203
- rs-6d-c3-obj1 - 25 < 452/653
- sol-6d-c2-obj2 - 26 < 323/1143
whereas the minimum ranks found are statistically significantly worse than those of the competing methods. But if you look at the minimum ranks found: BDBC 2/2561 (median rank 2 out of 2561 possible configurations), rs-6d-c3-obj1 2/3840, and sol-6d-c2-obj2 5/2862.
Losses
The rank-based method uses more configurations. This is because some of these datasets (e.g., SQL) can be used to build accurate models with only a few configurations, so the competing methods terminate much faster. Our counter to these results would be that, in the real world, it is difficult to find systems that can be modeled so accurately. I don't have a reference for this statement, but I am looking for one.
minmax normalization
How does WHAT perform when compared to Random Sampling (Guo)?
Is an accurate model even required?
Confusion about policies.py
Here the author defined the function policy() in policies.py to control the stopping criterion of the rank-based and progressive methods.
```python
def policy(scores, lives=3):
    """
    no improvement in last 3 runs
    """
    temp_lives = lives
    last = scores[0]
    for i, score in enumerate(scores):
        if i > 0:
            if temp_lives == 0:
                return i
            elif score >= last:
                temp_lives -= 1
                last = score
            else:
                temp_lives = lives  # Line 18
                last = score
    return -1
```
But do you think that the variable lives in Line 18 should be temp_lives? I.e., Line 18 should be
temp_lives = temp_lives  # or you can remove Line 18
The paper "Using bad learners to find good configurations" didn't say that the scores should improve in the last 3 consecutive runs.
I think this stopping criterion is stricter than the one in your paper.
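To make the difference concrete, here is a minimal sketch (illustrative, not from the repo) contrasting the two readings of Line 18. The current code restores the lives budget whenever the score improves, so an alternating score sequence never triggers a stop; the proposed change never restores the budget, so three non-improvements anywhere are enough:

```python
def policy_reset(scores, lives=3):
    """Current code: the lives budget is restored on every improvement."""
    temp_lives = lives
    last = scores[0]
    for i, score in enumerate(scores):
        if i > 0:
            if temp_lives == 0:
                return i
            elif score >= last:   # no improvement: spend one life
                temp_lives -= 1
            else:                 # improvement: restore the full budget
                temp_lives = lives
            last = score
    return -1

def policy_no_reset(scores, lives=3):
    """Proposed change: the lives budget is never restored."""
    temp_lives = lives
    last = scores[0]
    for i, score in enumerate(scores):
        if i > 0:
            if temp_lives == 0:
                return i
            elif score >= last:   # no improvement: spend one life
                temp_lives -= 1
            last = score
    return -1

scores = [5, 4, 5, 4, 5, 4, 5, 4]   # improvement on every other step
print(policy_reset(scores))     # -1 (never stops)
print(policy_no_reset(scores))  # 7  (stops after 3 total non-improvements)
```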
Find difficult datasets
Bring all the data together
add feature weighting
None of this really makes much difference.
Look at PCA components and do predictions
Related Work
- Chen, Haifeng, et al. "A cooperative sampling approach to discovering optimal configurations in large scale computing systems." Proceedings of the 29th IEEE Symposium on Reliable Distributed Systems. IEEE, 2010.
Comparing Dumb Learner with random progressive sampling
from #19
z-score normalization
add random progressive sampling
Stats test to show how different it is from random sampling