reconfigsrc's Introduction

ReConfigSRC

1. Introduction

The ReConfigSRC project implements the experiments of the ReConfig approach.

2. Environment & Dependency

ReConfigSRC is written in Python, so make sure a Python environment is available on your machine. In addition, three widely used Python libraries are required: numpy, pandas, and sklearn.

3. Experimental Steps

Step 1: prepare the raw datasets

The input datasets of ReConfigSRC (in ".csv" format) should be saved in the folder raw_data/. Note that each instance in an input dataset consists of a set of options and a performance value.

Here are 3 example instances from the dataset "Noc-obj1.csv"; each instance has 4 options (width, complexity, fifo, multiplier) and a performance value ($<energy).

width  complexity  fifo  multiplier  $<energy
3.0    1           4.0   1           7.8351029794899985
3.0    1           1.0   1           7.836833049419999
3.0    1           2.0   100         9.965784284660002
...    ...         ...   ...         ...
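
For reference, here is a minimal sketch (not part of the project code) of loading such a dataset with pandas and separating the option columns from the performance column; the file name and column layout follow the example above.

import pandas as pd

# Load one raw dataset; the last column ("$<energy" here) holds the measured
# performance, and all preceding columns are configuration options.
raw = pd.read_csv("raw_data/Noc-obj1.csv")
options = raw.iloc[:, :-1]        # width, complexity, fifo, multiplier
performance = raw.iloc[:, -1]     # $<energy

print(options.shape, performance.name)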

Step 2: obtain the results of the rank-based approach

Run the rank-based approach (i.e., src/rank_based.py) to obtain the preliminary prediction results, which are written to the folder experiment/rank_based/. Note that src/rank_based.py must be executed first.

>> python src/rank_based.py

Step 3: obtain the results of the other approaches

Run the other approaches (src/classfication_exd.py, src/random_rank.py, src/reconfig.py, etc.) to obtain the corresponding ranking results. The prediction results are written to the folder experiment/${approach_name}/.

>> python src/classfication_exd.py
>> python src/random_rank.py
>> python src/reconfig.py
>> ...

Step 4: analyze the ranking results

Run src/experiment.py with a command argument to analyze the results of each approach (in the folder experiment/results/).

>> python src/experiment.py calRDTie

The other commands of src/experiment.py are as follows:

Command              Description
projInfo             Showing the basic information (e.g., options and dataset size) of each dataset.
projDistr {$index}   Drawing the performance distribution of a specific dataset.
tiedNums             Drawing the number of tied configurations in each dataset under the rank-based method.
calRDTie             Calculating the RDTie of each approach using different methods.
vsRankBased          RQ-1: Can ReConfig find better configurations than the rank-based approach?
vsOthers             RQ-2: Can the learning-to-rank method in ReConfig outperform comparative methods in finding configurations?
removeRatio          RQ-3: How many tied configurations should be filtered out in ReConfig?
vsRD                 RQ-4: Is RDTie stable for evaluating the tied prediction?

Note: The newly submitted src/execute.py provides another user interface for Steps 2 to 4; that is, you can run src/execute.py once instead of running the Python files of Steps 2 to 4 one by one, as sketched below.
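
For reference, the sequence of Steps 2 to 4 can also be expressed as a small driver script (an illustration only; the actual interface of src/execute.py may differ):

import subprocess

# Step 2: the rank-based approach must be executed first, since its
# preliminary predictions are reused by the other approaches.
subprocess.run(["python", "src/rank_based.py"], check=True)

# Step 3: run the remaining approaches.
for script in ["src/classfication_exd.py", "src/random_rank.py", "src/reconfig.py"]:
    subprocess.run(["python", script], check=True)

# Step 4: analyze the ranking results with RDTie.
subprocess.run(["python", "src/experiment.py", "calRDTie"], check=True)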

reconfigsrc's Issues

[ERROR]:Error in AdaRank::load():

The last commit (a62edaf) solved issue #4 successfully, but it also triggered another error when running reconfig.py with the AdaRank ranker. See the following failure trace:

Discard orig. features
Model file:     trainer.txt
Feature normalization: No
Model:          AdaRank
Exception in thread "main" ciir.umass.edu.utilities.RankLibError: Error in AdaRank::load():
        at ciir.umass.edu.utilities.RankLibError.create(RankLibError.java:34)
        at ciir.umass.edu.learning.boosting.AdaRank.loadFromString(AdaRank.java:347)
        at ciir.umass.edu.learning.RankerFactory.loadRankerFromString(RankerFactory.java:109)
        at ciir.umass.edu.learning.RankerFactory.loadRankerFromFile(RankerFactory.java:99)
        at ciir.umass.edu.eval.Evaluator.rank(Evaluator.java:1258)
        at ciir.umass.edu.eval.Evaluator.main(Evaluator.java:527)
Caused by: java.lang.NullPointerException
        at ciir.umass.edu.learning.boosting.AdaRank.loadFromString(AdaRank.java:333)
        ... 4 more

It is possible that some parameters of AdaRank are assigned incorrectly. Here are the commands we used to invoke the AdaRank ranker; the error occurred when running Command 2.

Command 1: Training a model trainer.txt based on trainset_txt15.txt.

>> java -jar RankLib.jar -train trainset_txt15.txt -ranker 3 -metric2t ERR@10 -round 500 -tolerance 0.002 -max 5 -save trainer.txt

Command 2: Re-ranking testset_txt15.txt based on trainer.txt and saving the results to result.txt.

>> java -jar RankLib.jar -rank testset_txt15.txt -load trainer.txt -indri result.txt
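
Since the NullPointerException is raised inside AdaRank.loadFromString(), one way to narrow the problem down (a diagnostic sketch only, not a fix) is to inspect the saved model file before running Command 2; if training wrote an empty or truncated trainer.txt, the loader has nothing to parse.

# Diagnostic sketch: print the saved AdaRank model before re-ranking.
with open("trainer.txt", encoding="utf-8") as f:
    model_text = f.read()

print("model size:", len(model_text), "bytes")
print(model_text[:500])   # header and first weak rankers, if any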

[ValueError]: Number of features of the model must match the input.

When running reconfig.py on large datasets such as JavaGC and VP9 (more than 100,000 samples), an unexpected ValueError occurs. Here is the error trace:

Traceback (most recent call last):
  File "reconfig.py", line 882, in <module>
    reconfig()
  File "reconfig.py", line 856, in reconfig
    predict_on_validation_set()
  File "reconfig.py", line 142, in predict_on_validation_set
    cart_predicted = carts(sub_train_set_rank, dataset_to_test)
  File "reconfig.py", line 58, in carts
    predicted = model.predict(test)
  File "C:\Users\yongfeng\AppData\Local\Programs\Python\Python37\lib\site-packages\sklearn\tree\tree.py", line 430, in predict
    X = self._validate_X_predict(X, check_input)
  File "C:\Users\yongfeng\AppData\Local\Programs\Python\Python37\lib\site-packages\sklearn\tree\tree.py", line 402, in _validate_X_predict
    % (self.n_features_, n_features))
ValueError: Number of features of the model must match the input. Model n_features is 44 and input n_features is 39
  • function predict_on_validation_set().
def predict_on_validation_set():
    """
    Note: use the sub_train set to predict on the validation set,
          save the results into "../temp_data/ltr_trainset/${project}/ltr_trainset_XX.csv"
    """
    datafolder = "../raw_data/"
    trainfolder = "../parse_data/sub_train/"
    split_datafolder = "../parse_data/data_split/"
    ...
    sub_train_set_rank_raw = pd.read_csv(sub_train_data[fileindex])
    sub_train_set_rank = read_data(sub_train_set_rank_raw)

    validation = update_data(validation_set)
    dataset_to_test = validation
    print("sub_train:", sub_train_data[fileindex], " validation:", csvfile)
    cart_predicted = carts(sub_train_set_rank, dataset_to_test)   ### Line 142
  • function carts(train, test).
def carts(train, test):
    """
    Note: use CART to predict performance in the test set
    """
    train_independent = [t.decision for t in train]
    train_dependent = [t.objective[-1] for t in train]
    test = test[test.columns[:-1]]
    model = DecisionTreeRegressor()
    model.fit(train_independent, train_dependent)
    print("sub_train features:", len(train[0].decision))
    print("validation features:", len(test.columns[:-1]))  
    predicted = model.predict(test)       ### Line 58 
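
The trace shows that the CART model was fitted on 44 features while predict() received 39. A guard along the following lines (a sketch only, not the project's fix) would at least make the mismatch explicit inside carts() before calling predict():

# Sketch: fail early with a readable message when the validation set does not
# have the same number of option columns as the sub_train set the model saw.
n_train_features = len(train[0].decision)
n_test_features = len(test.columns)
if n_train_features != n_test_features:
    raise ValueError("feature mismatch: model expects %d options, "
                     "validation set provides %d" % (n_train_features, n_test_features))
predicted = model.predict(test)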

Poor results of Learning to Rank model

ReConfig uses the library RankLib.jar to re-rank the original predicted ranking list produced by the rank-based method.
However, the results show that the learning-to-rank model does not improve the accuracy at all.

General Usage of RankLib
https://sourceforge.net/p/lemur/wiki/RankLib%20How%20to%20use/#ranking.

The corresponding code snippet is:

def build_l2r_model(ranker):    ### Line 736 in reconfig.py ###
    ...
    cmd_line = "java -jar RankLib.jar -train " + txtfile + " -ranker " + str(ranker) + \
               " -save " + folderpath + "/mymodel_" + name_index + ".txt"
    os.system(cmd_line)
    ...

Here are the rankers provided by RankLib.jar:

Ranker ID   Ranker Name
0           MART
1           RankNet
2           RankBoost
3           AdaRank
4           Coordinate Ascent
6           LambdaMART
7           ListNet
8           Random Forests

Should we tune some parameters or try different learning-to-rank rankers? For example, we could change the variable ranker from 0 to 6 (the default is 2), or add an argument such as -metric2t to set the metric to optimize on the training data (the default is ERR@10).
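
For example, the command line built in build_l2r_model() could be extended along these lines (a sketch only; LambdaMART with NDCG@10 is just one possible choice, and the placeholder values below stand in for the real txtfile, folderpath, and name_index used in reconfig.py):

import os

# Placeholder values for illustration; in reconfig.py these come from the
# surrounding code.
txtfile = "trainset_example.txt"   # hypothetical training file
folderpath = "models"              # hypothetical output folder
name_index = "0"                   # hypothetical model index

# Sketch: train with LambdaMART (-ranker 6) and optimize NDCG@10 instead of
# the default ERR@10.
ranker = 6
cmd_line = ("java -jar RankLib.jar -train " + txtfile +
            " -ranker " + str(ranker) +
            " -metric2t NDCG@10" +
            " -save " + folderpath + "/mymodel_" + name_index + ".txt")
os.system(cmd_line)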

[ERROR]:ciir.umass.edu.utilities.RankLibError

The newly submitted change in reconfig.py added more tunable parameters, such as -ranker, but it can also induce an error. The failure trace when running reconfig.py is as follows:

STEP-5: build the L2R model to re-rank the prediction of test set ...

../temp_data/txt/trainset_txt/Apache_AllMeasurements
Exception in thread "main" ciir.umass.edu.utilities.RankLibError: Unknown command-line parameter: unspecified
        at ciir.umass.edu.utilities.RankLibError.create(RankLibError.java:26)
        at ciir.umass.edu.eval.Evaluator.main(Evaluator.java:409)

How did this happen? Is any parameter set to a wrong value?
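
The message suggests that a placeholder value such as "unspecified" reaches the RankLib command line when an optional parameter is left unset. Here is a hedged sketch of how the command could be assembled defensively (the function and variable names are assumptions for illustration, not the actual reconfig.py code):

# Sketch: only append optional RankLib flags when they carry a real value, so
# that a placeholder like "unspecified" never ends up on the command line.
def build_ranklib_cmd(txtfile, modelfile, ranker=None, metric=None):
    cmd = ["java", "-jar", "RankLib.jar", "-train", txtfile, "-save", modelfile]
    if ranker is not None and str(ranker) != "unspecified":
        cmd += ["-ranker", str(ranker)]
    if metric is not None and metric != "unspecified":
        cmd += ["-metric2t", metric]
    return " ".join(cmd)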
