reconfigsrc's Introduction

ReConfigSRC

1. Introduction

The ReConfigSRC project implements the experiments of the ReConfig approach.

2. Environment & Dependency

ReConfigSRC is written in Python, so make sure a Python environment is available on your machine. In addition, three widely used Python libraries are required: numpy, pandas, and sklearn.

3. Experimental Steps

Step 1: prepare the raw datasets

The input datasets of ReConfigSRC (in ".csv" format) should be saved in the folder raw_data/. Note that each instance in an input dataset consists of a set of options and a performance value.

Here are 3 example instances from the dataset "Noc-obj1.csv"; each instance has 4 options (width, complexity, fifo, multiplier) and a performance value ($<energy).

width  complexity  fifo  multiplier  $<energy
3.0    1           4.0   1           7.8351029794899985
3.0    1           1.0   1           7.836833049419999
3.0    1           2.0   100         9.965784284660002
...    ...         ...   ...         ...
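
For reference, here is a minimal sketch (not part of the project code) of loading such a dataset with pandas and separating the option columns from the performance column; the file name and column layout follow the example above.

import pandas as pd

# Load one raw dataset; the last column ("$<energy" here) holds the measured
# performance, and all preceding columns are configuration options.
raw = pd.read_csv("raw_data/Noc-obj1.csv")
options = raw.iloc[:, :-1]        # width, complexity, fifo, multiplier
performance = raw.iloc[:, -1]     # $<energy

print(options.shape, performance.name)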

Step 2: obtain the results of the rank-based approach

Run the rank-based approach (i.e., src/rank_based.py) to obtain the preliminary prediction results, which are written to the folder experiment/rank_based/. Note that src/rank_based.py must be executed first.

>> python src/rank_based.py

Step 3: obtain the results of the other approaches

Run the other approaches (src/classfication_exd.py, src/random_rank.py, src/reconfig.py, etc.) to obtain the corresponding ranking results. The prediction results are written to the folder experiment/${approach_name}/.

>> python src/classfication_exd.py
>> python src/random_rank.py
>> python src/reconfig.py
>> ...

Step 4: analyze the ranking results

Run src/experiment.py with a command argument to analyze the results of each approach (in the folder experiment/results/).

>> python src/experiment.py calRDTie

The other commands of src/experiment.py are as follows:

Command              Description
projInfo             Showing the basic information (e.g., options and dataset size) of each dataset.
projDistr {$index}   Drawing the performance distribution of a specific dataset.
tiedNums             Drawing the number of tied configurations in each dataset under the rank-based method.
calRDTie             Calculating the RDTie of each approach using different methods.
vsRankBased          RQ-1: Can ReConfig find better configurations than the rank-based approach?
vsOthers             RQ-2: Can the learning-to-rank method in ReConfig outperform comparative methods in finding configurations?
removeRatio          RQ-3: How many tied configurations should be filtered out in ReConfig?
vsRD                 RQ-4: Is RDTie stable for evaluating the tied prediction?

Note: The newly submitted src/execute.py provides another user interface for Steps 2 to 4; that is, you can run src/execute.py once instead of running the Python files of Steps 2 to 4 one by one, as sketched below.
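
For reference, the sequence of Steps 2 to 4 can also be expressed as a small driver script (an illustration only; the actual interface of src/execute.py may differ):

import subprocess

# Step 2: the rank-based approach must be executed first, since its
# preliminary predictions are reused by the other approaches.
subprocess.run(["python", "src/rank_based.py"], check=True)

# Step 3: run the remaining approaches.
for script in ["src/classfication_exd.py", "src/random_rank.py", "src/reconfig.py"]:
    subprocess.run(["python", script], check=True)

# Step 4: analyze the ranking results with RDTie.
subprocess.run(["python", "src/experiment.py", "calRDTie"], check=True)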

reconfigsrc's Issues

[ERROR]:Error in AdaRank::load():

The last commit (a62edaf) solved issue #4 successfully, but it also triggered another error when running reconfig.py with the AdaRank ranker. See the following failure trace:

Discard orig. features
Model file:     trainer.txt
Feature normalization: No
Model:          AdaRank
Exception in thread "main" ciir.umass.edu.utilities.RankLibError: Error in AdaRank::load():
        at ciir.umass.edu.utilities.RankLibError.create(RankLibError.java:34)
        at ciir.umass.edu.learning.boosting.AdaRank.loadFromString(AdaRank.java:347)
        at ciir.umass.edu.learning.RankerFactory.loadRankerFromString(RankerFactory.java:109)
        at ciir.umass.edu.learning.RankerFactory.loadRankerFromFile(RankerFactory.java:99)
        at ciir.umass.edu.eval.Evaluator.rank(Evaluator.java:1258)
        at ciir.umass.edu.eval.Evaluator.main(Evaluator.java:527)
Caused by: java.lang.NullPointerException
        at ciir.umass.edu.learning.boosting.AdaRank.loadFromString(AdaRank.java:333)
        ... 4 more

It is possible that some parameters of AdaRank are assigned incorrectly. Here are the commands we used to invoke the AdaRank ranker; the error occurred when running Command 2.

Command 1: Training a model trainer.txt based on trainset_txt15.txt.

>> java -jar RankLib.jar -train trainset_txt15.txt -ranker 3 -metric2t ERR@10 -round 500 -tolerance 0.002 -max 5 -save trainer.txt

Command 2: Re-ranking testset_txt15.txt based on trainer.txt and saving the results to result.txt.

>> java -jar RankLib.jar -rank testset_txt15.txt -load trainer.txt -indri result.txt
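
Since the NullPointerException is raised inside AdaRank.loadFromString(), one way to narrow the problem down (a diagnostic sketch only, not a fix) is to inspect the saved model file before running Command 2; if training wrote an empty or truncated trainer.txt, the loader has nothing to parse.

# Diagnostic sketch: print the saved AdaRank model before re-ranking.
with open("trainer.txt", encoding="utf-8") as f:
    model_text = f.read()

print("model size:", len(model_text), "bytes")
print(model_text[:500])   # header and first weak rankers, if any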

[ValueError]: Number of features of the model must match the input.

When running reconfig.py on large datasets such as JavaGC and VP9 (more than 100,000 samples), an unexpected ValueError occurs. Here is the error trace:

Traceback (most recent call last):
  File "reconfig.py", line 882, in <module>
    reconfig()
  File "reconfig.py", line 856, in reconfig
    predict_on_validation_set()
  File "reconfig.py", line 142, in predict_on_validation_set
    cart_predicted = carts(sub_train_set_rank, dataset_to_test)
  File "reconfig.py", line 58, in carts
    predicted = model.predict(test)
  File "C:\Users\yongfeng\AppData\Local\Programs\Python\Python37\lib\site-packages\sklearn\tree\tree.py", line 430, in predict
    X = self._validate_X_predict(X, check_input)
  File "C:\Users\yongfeng\AppData\Local\Programs\Python\Python37\lib\site-packages\sklearn\tree\tree.py", line 402, in _validate_X_predict
    % (self.n_features_, n_features))
ValueError: Number of features of the model must match the input. Model n_features is 44 and input n_features is 39
  • function predict_on_validation_set().
def predict_on_validation_set():
    """
    Note: use the sub_train set to predict on the validation set,
          save the results into "../temp_data/ltr_trainset/${project}/ltr_trainset_XX.csv"
    """
    datafolder = "../raw_data/"
    trainfolder = "../parse_data/sub_train/"
    split_datafolder = "../parse_data/data_split/"
    ...
    sub_train_set_rank_raw = pd.read_csv(sub_train_data[fileindex])
    sub_train_set_rank = read_data(sub_train_set_rank_raw)

    validation = update_data(validation_set)
    dataset_to_test = validation
    print("sub_train:", sub_train_data[fileindex], " validation:", csvfile)
    cart_predicted = carts(sub_train_set_rank, dataset_to_test)   ### Line 142
  • function carts(train, test).
def carts(train, test):
    """
    Note: use CART to predict performance in the test set
    """
    train_independent = [t.decision for t in train]
    train_dependent = [t.objective[-1] for t in train]
    test = test[test.columns[:-1]]
    model = DecisionTreeRegressor()
    model.fit(train_independent, train_dependent)
    print("sub_train features:", len(train[0].decision))
    print("validation features:", len(test.columns[:-1]))  
    predicted = model.predict(test)       ### Line 58 
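
The trace shows that the CART model was fitted on 44 features while predict() received 39. A guard along the following lines (a sketch only, not the project's fix) would at least make the mismatch explicit inside carts() before calling predict():

# Sketch: fail early with a readable message when the validation set does not
# have the same number of option columns as the sub_train set the model saw.
n_train_features = len(train[0].decision)
n_test_features = len(test.columns)
if n_train_features != n_test_features:
    raise ValueError("feature mismatch: model expects %d options, "
                     "validation set provides %d" % (n_train_features, n_test_features))
predicted = model.predict(test)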

Poor results of Learning to Rank model

ReConfig uses the library RankLib.jar to re-rank the original predicted ranking list produced by the rank-based method.
However, the results show that the learning-to-rank model does not improve the accuracy at all.

General Usage of RankLib
https://sourceforge.net/p/lemur/wiki/RankLib%20How%20to%20use/#ranking.

The corresponding code snippet is:

def build_l2r_model(ranker):    ### Line 736 in reconfig.py ###
    ...
    cmd_line = "java -jar RankLib.jar -train " + txtfile + " -ranker " + str(ranker) + \
               " -save " + folderpath + "/mymodel_" + name_index + ".txt"
    os.system(cmd_line)
    ...

Here are the rankers provided by RankLib.jar:

Ranker ID   Ranker Name
0           MART
1           RankNet
2           RankBoost
3           AdaRank
4           Coordinate Ascent
6           LambdaMART
7           ListNet
8           Random Forests

Should we tune some parameters or try different learning-to-rank rankers? For example, we could change the variable ranker from 0 to 6 (the default is 2), or add an argument such as -metric2t to set the metric to optimize on the training data (the default is ERR@10).
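
For example, the command line built in build_l2r_model() could be extended along these lines (a sketch only; LambdaMART with NDCG@10 is just one possible choice, and the placeholder values below stand in for the real txtfile, folderpath, and name_index used in reconfig.py):

import os

# Placeholder values for illustration; in reconfig.py these come from the
# surrounding code.
txtfile = "trainset_example.txt"   # hypothetical training file
folderpath = "models"              # hypothetical output folder
name_index = "0"                   # hypothetical model index

# Sketch: train with LambdaMART (-ranker 6) and optimize NDCG@10 instead of
# the default ERR@10.
ranker = 6
cmd_line = ("java -jar RankLib.jar -train " + txtfile +
            " -ranker " + str(ranker) +
            " -metric2t NDCG@10" +
            " -save " + folderpath + "/mymodel_" + name_index + ".txt")
os.system(cmd_line)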

[ERROR]:ciir.umass.edu.utilities.RankLibError

The newly submitted change in reconfig.py added more tunable parameters, such as -ranker, but it can also induce an error. The failure trace when running reconfig.py is as follows:

STEP-5: build the L2R model to re-rank the prediction of test set ...

../temp_data/txt/trainset_txt/Apache_AllMeasurements
Exception in thread "main" ciir.umass.edu.utilities.RankLibError: Unknown command-line parameter: unspecified
        at ciir.umass.edu.utilities.RankLibError.create(RankLibError.java:26)
        at ciir.umass.edu.eval.Evaluator.main(Evaluator.java:409)

How did this happen? Is any parameter set to a wrong value?
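
The message suggests that a placeholder value such as "unspecified" reaches the RankLib command line when an optional parameter is left unset. Here is a hedged sketch of how the command could be assembled defensively (the function and variable names are assumptions for illustration, not the actual reconfig.py code):

# Sketch: only append optional RankLib flags when they carry a real value, so
# that a placeholder like "unspecified" never ends up on the command line.
def build_ranklib_cmd(txtfile, modelfile, ranker=None, metric=None):
    cmd = ["java", "-jar", "RankLib.jar", "-train", txtfile, "-save", modelfile]
    if ranker is not None and str(ranker) != "unspecified":
        cmd += ["-ranker", str(ranker)]
    if metric is not None and metric != "unspecified":
        cmd += ["-metric2t", metric]
    return " ".join(cmd)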
