
acbull / unbiased_lambdamart


Code for WWW'19 "Unbiased LambdaMART: An Unbiased Pairwise Learning-to-Rank Algorithm", which is based on LightGBM

License: MIT License

learning-to-rank lightgbm bias

unbiased_lambdamart's Introduction

Unbiased LambdaMart

Unbiased LambdaMART is an unbiased version of the traditional LambdaMART algorithm. It jointly estimates the biases at click positions and at unclick positions, and learns an unbiased ranker using a pairwise loss function.

The repository contains two parts: first, an implementation of Unbiased LambdaMART based on LightGBM; second, a simulated click dataset with its generation scripts for evaluation.

See our WWW 2019 (also known as The Web Conference) paper Unbiased LambdaMART: An Unbiased Pairwise Learning-to-Rank Algorithm for more details.

Overview

  • Unbiased_LambdaMart:

    An implementation of Unbiased LambdaMART based on LightGBM. Note that LightGBM supports a wide variety of applications of gradient-boosted decision trees; our modification is mainly in src/objective/rank_objective.hpp, the LambdaMART ranking objective file (a schematic sketch of the debiased gradient follows this list).

  • evaluation:

    Contains the synthetic click dataset generated using click models. This part of the code is mainly forked from https://github.com/QingyaoAi/Unbiased-Learning-to-Rank-with-Unbiased-Propensity-Estimation. We also add the config files needed to run our Unbiased LambdaMART on this synthetic dataset.
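
For intuition, here is a minimal Python sketch of the pairwise-debiasing idea: each pairwise lambda is divided by the estimated click propensity t+ of the clicked position and unclick propensity t- of the unclicked position, and the propensities are then re-estimated from the accumulated pairwise losses. All names, and the regularization exponent p, are our own illustration; the actual update lives in rank_objective.hpp.

import numpy as np

def debiased_lambdas(pairs, t_plus, t_minus, sigma=2.0, p=0.0):
    # pairs: (clicked_position, unclicked_position, score_difference) per pair
    # t_plus / t_minus: current propensity estimates indexed by position
    lambdas = []
    cost_plus = np.zeros_like(t_plus)    # accumulates evidence for t_plus
    cost_minus = np.zeros_like(t_minus)  # accumulates evidence for t_minus
    for i, j, s_diff in pairs:
        raw = -sigma / (1.0 + np.exp(sigma * s_diff))   # RankNet-style gradient
        lambdas.append(raw / (t_plus[i] * t_minus[j]))  # inverse-propensity weighting
        loss = np.log1p(np.exp(-sigma * s_diff))        # pairwise logistic loss
        cost_plus[i] += loss / t_minus[j]
        cost_minus[j] += loss / t_plus[i]
    # Re-estimate propensities: normalize to position 0, smooth with exponent 1/(p+1)
    t_plus = (cost_plus / max(cost_plus[0], 1e-12)) ** (1.0 / (p + 1.0))
    t_minus = (cost_minus / max(cost_minus[0], 1e-12)) ** (1.0 / (p + 1.0))
    return lambdas, t_plus, t_minus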

Setup

First, compile Unbias_LightGBM (the original LightGBM plus the Unbiased LambdaMART implementation).

On Linux, LightGBM can be built with CMake and gcc or Clang.

Install CMake with sudo apt install cmake.

Run the following commands:

cd Unbias_LightGBM/
mkdir build ; cd build
cmake ..
make -j4

Note: glibc >= 2.14 is required. After compilation, you will get a "lightgbm" executable file in the folder.

Example

We modified the original example file to give an illustration.

Compile, then run the following commands:

cd Unbias_LightGBM
cp ./lightgbm ./examples/lambdarank/
cd ./examples/lambdarank/
./lightgbm config="train.conf"

In addition to the original XXX.train file (which provides the features) and XXX.train.query file (which records which query each document belongs to), our modified LambdaMART requires an XXX.train.rank file that provides the position information used for debiasing. For your own data, remember to add this file.
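
Assuming the .rank file follows the same one-value-per-line layout as the other files (our reading of the example data, not an official spec), it can be produced with a few lines of Python; the positions below are purely illustrative:

# Write the logged display position of each document, one integer per line,
# aligned with the rows of XXX.train (values below are made up).
positions = [0, 1, 2, 0, 1]  # e.g. two queries returning 3 and 2 documents
with open("rank.train.rank", "w") as f:
    for pos in positions:
        f.write(f"{pos}\n")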

Evaluation

Firstly, download the dataset ranked by an initial SVM ranker from HERE and unzip it into the evaluation directory. Alternatively, you can generate it from scratch yourself by referring to the procedure of Qingyao Ai, et al.

Next, generate the synthetic dataset from click models by:

cd evaluation
mkdir test_data
cd scripts
python generate_data.py ../click_model/user_browsing_model_0.1_1_4_1.json

There are also other click-model configurations in evaluation/click_model/; you can use any of them.
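
Schematically, these click models turn relevance judgments into position-biased clicks: a document is clicked only if its position is examined and the document is attractive. A minimal sketch under a simple examination hypothesis (the decay and probabilities are illustrative, not the repo's settings):

import numpy as np

rng = np.random.default_rng(0)
examine_prob = 1.0 / np.arange(1, 11, dtype=float)  # examination decays with position

def simulate_clicks(relevance):
    # relevance: per-position click probability given examination, in [0, 1]
    relevance = np.asarray(relevance, dtype=float)
    examined = rng.random(relevance.size) < examine_prob[:relevance.size]
    return (examined & (rng.random(relevance.size) < relevance)).astype(int)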

Finally, move the compiled lightgbm file into evaluation/configs, and then run:

./lightgbm config='train.conf'
./lightgbm config='test.conf'

This generates the test results (LightGBM_predict_result.txt) based on the synthetic click data. Next, we evaluate them against the real data:

cd ../scripts
python eval.py ../configs/LightGBM_predict_result.txt  #or any other model output.
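
eval.py scores the model output against the true relevance labels. For reference, NDCG@10 (the number quoted in the issues below) can be computed as follows; whether eval.py uses exactly this exponential-gain variant is an assumption on our part:

import numpy as np

def ndcg_at_k(relevance_in_predicted_order, k=10):
    rel = np.asarray(relevance_in_predicted_order, dtype=float)[:k]
    discounts = 1.0 / np.log2(np.arange(2, rel.size + 2))
    dcg = np.sum((2.0 ** rel - 1.0) * discounts)
    ideal = np.sort(np.asarray(relevance_in_predicted_order, dtype=float))[::-1][:k]
    idcg = np.sum((2.0 ** ideal - 1.0) * discounts)
    return dcg / idcg if idcg > 0 else 0.0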

Citation

Please consider citing the following paper when using our code for your application.

@inproceedings{unbias_lambdamart,
  title={Unbiased LambdaMART: An Unbiased Pairwise Learning-to-Rank Algorithm},
  author={Hu, Ziniu and Wang, Yang and Peng, Qu and Li, Hang},
  booktitle={Proceedings of the 2019 World Wide Web Conference},
  year={2019}
}

unbiased_lambdamart's People

Contributors

acbull, hbghhy, paddy74


unbiased_lambdamart's Issues

Recommend using a submodule + fork for Unbias_LightGBM

Because Unbias_LightGBM is effectively a fork of LightGBM, it would be sensible to create a fork of LightGBM with the necessary changes, renamed to Unbias_LightGBM, and add that fork as a submodule to this project. The new fork would initially be pinned to the commit of LightGBM that Unbias_LightGBM is based upon.

This would allow updates and bug fixes to LightGBM to be easily incorporated into this project, and would make it clearer that Unbias_LightGBM is LightGBM with modifications.

Add LICENSE.md to project root

As a publicly viewable project, Unbiased_LambdaMart should include a LICENSE.md file in order to convey explicitly how this project may be used.
As an unlicensed repository, the only rights granted to other users are to view and fork the repository. This is not in line with any desire for this work to be used in other projects.

From GitHub's licensing help page:

You're under no obligation to choose a license. It's your right not to
include one with your code or project, but please be aware of the
implications. Generally speaking, the absence of a license means that
the default copyright laws apply. This means that you retain all
rights to your source code and that nobody else may reproduce,
distribute, or create derivative works from your work. This might not
be what you intend.

Even if this is what you intend, if you publish your source code in a
public repository on GitHub, you have accepted the Terms of Service
which do allow other GitHub users some rights. Specifically, you allow
others to view and fork your repository.

If you want to share your work with others, we strongly encourage you
to include an open source license.

I would strongly recommend the MIT license to give this project the widest availability to other researchers; the BSD 3-clause license if you seek protections regarding promotion and advertising material; or Apache 2.0 if you want MIT with more words.

It also appears at first glance that the project is licensed under MIT, as that is the license included in the Unbias_LightGBM directory. However, this is not entirely clear, as that is also the license that ships with LightGBM.

Question about `position_bins`

Hi,

I'm trying to understand the implementation differences between this repo and lightgbm. Does

position_bins = 12       : this denotes the maximum positions taken into account.

effectively serve the same purpose as lambdarank_truncation_level in newer releases of LightGBM? It looks like each caps the number of results considered for a given query. I wanted to confirm.
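
Schematically, both settings bound how deep in the ranked list pairwise updates are formed; whether the two parameters have exactly the same semantics is the open question here. An illustrative reading in Python (labels and the cap value are hypothetical):

# Pairs are only formed when the better-ranked document sits above the cap.
labels = [2, 0, 1, 0, 0, 1]   # relevance labels for one query, in ranked order
cap = 12                      # cf. position_bins / lambdarank_truncation_level
pairs = [(i, j)
         for i in range(min(cap, len(labels)))
         for j in range(len(labels))
         if labels[i] > labels[j]]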

segmentation fault

I am getting a segmentation fault when running your version of lightgbm with the default train/test sets as provided at: https://github.com/Microsoft/LightGBM/tree/master/examples/lambdarank

 ./lightgbm config=train.conf
[LightGBM] [Info] Finished loading parameters
num_threads_: 8
[LightGBM] [Info] Loading query boundaries...
[LightGBM] [Info] Loading query boundaries...
[LightGBM] [Info] Finished loading data in 0.038245 seconds

  position         bias_i         bias_j         i_cost         j_cost
         0              1              1              0              0
         1              1              1              0              0
         2              1              1              0              0
         3              1              1              0              0
         4              1              1              0              0
         5              1              1              0              0
         6              1              1              0              0
         7              1              1              0              0
         8              1              1              0              0
         9              1              1              0              0
        10              1              1              0              0
        11              1              1              0              0
[LightGBM] [Info] Total Bins 6177
[LightGBM] [Info] Number of data: 3005, number of used features: 211
[LightGBM] [Info] Finished initializing training
[LightGBM] [Info] Started training...
[1]    6611 segmentation fault (core dumped)  ./lightgbm config=train.conf

How to tune hyperparameter when use the lambdamart example

I have read the paper and the train.conf file in the lambdarank example. There seem to be hyperparameters such as p and M in the paper, but I cannot find them in train.conf. Am I missing something?

I also want to use the library on a large-scale dataset. Training LightGBM M times will cost a lot of time. Could we instead train a few weak models at the beginning to estimate the position-bias parameters, and then re-train once with a more complex model? Did you try that?
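
For context, the paper's procedure alternates between fitting the ranker with debiased lambdas and re-estimating the position-bias factors; M counts the outer iterations and p regularizes the bias update. A hypothetical, heavily simplified sketch of that loop (in this repo both steps run inside the modified C++ objective, which may be why neither surfaces as a train.conf key):

import numpy as np

rng = np.random.default_rng(0)

def fit_weak_ranker(t_plus, t_minus):
    # Placeholder for a few boosting rounds with propensity-weighted lambdas
    return rng.random(10)

def estimate_propensities(scores, p):
    cost = np.abs(scores) + 1e-12
    return (cost / cost[0]) ** (1.0 / (p + 1.0))  # normalize to position 0

M, p = 5, 0.0                     # the paper's M (outer iterations) and p
t_plus = t_minus = np.ones(10)
for _ in range(M):
    scores = fit_weak_ranker(t_plus, t_minus)
    t_plus = estimate_propensities(scores, p)
    t_minus = estimate_propensities(scores, p)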

Unbiased LambdaMART NDCG is lower than original LambdaMART

I did the following steps:

  1. Download the generated dataset.
  2. Run python evaluation/scripts/generate_data.py evaluation/click_model/user_browsing_model_0.1_1_4_1.json to generate the train and test data.
  3. Run ./lightgbm config="train.conf" and get test NDCG@10 = 0.546817.
  4. Build the original LightGBM, version 2.1.1, the same version Unbiased_LambdaMart is based on.
  5. Run ./lightgbm config="train.conf" and get test NDCG@10 = 0.556632.
  6. Why is the Unbiased LambdaMART NDCG lower than the original LambdaMART's? The paper says Unbiased LambdaMART is better than the original.

Want to use PythonAPI to train and predict.

Hi,

I wanted to use the code in a similar way to standard LightGBM, where I just import lightgbm and use LGBMRanker to train, make predictions, etc.

I am currently using the Unbias_LightGBM/examples/lambdarank/train.conf file to train.
Can you please guide me on how I can do the same with this repo?
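
For reference, upstream LightGBM's Python API trains a ranker as in the sketch below; to reach the unbiased objective this way, you would presumably have to build LightGBM's python-package from this repo's modified sources, which we have not verified:

import numpy as np
import lightgbm as lgb

X = np.random.rand(100, 5)               # toy features
y = np.random.randint(0, 3, size=100)    # toy relevance labels
group = [10] * 10                        # ten queries of ten documents each

ranker = lgb.LGBMRanker(objective="lambdarank", n_estimators=50)
ranker.fit(X, y, group=group)
scores = ranker.predict(X)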

Question about reading "XXX.train" file

Hello!
I am now trying to train on a dataset containing NaN values. However, the provided samples are in libSVM format, and libSVM only allows numerical values, not NaN. I therefore converted the data to a ".npy" file to try. Does the provided code support reading other file formats, such as ".npy" or ".csv"? If it does, I would like some details about it.
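
One possible route (our suggestion, not a documented feature of this repo) is to impute the NaNs and dump the arrays back to the libSVM format that the example configs expect, e.g. with scikit-learn; the file names below are hypothetical:

import numpy as np
from sklearn.datasets import dump_svmlight_file

X = np.load("features.npy")        # hypothetical input files
y = np.load("labels.npy")
X = np.nan_to_num(X, nan=0.0)      # libSVM cannot represent NaN; impute first
dump_svmlight_file(X, y, "converted.train", zero_based=False)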

Why is sigma = 2?

Just a question about the implementation of Unbiased LambdaMART: why is the sigma coefficient 2? Is this related to numerical stability? The paper states "sigma is a constant with default value of 2" (Section 4.3) but doesn't give a reason. Most implementations of the LambdaRank gradient default to sigma = 1, including LightGBM. I'm wondering what benefits drove you to pick sigma = 2 rather than 1.

Thank you!
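
For reference, sigma enters the standard RankNet/LambdaRank pairwise gradient as a steepness constant on the score difference, so changing it rescales the gradients rather than changing what is optimal:

import numpy as np

def pairwise_gradient(s_i, s_j, sigma=2.0):
    # lambda_ij for a pair where document i should rank above document j
    return -sigma / (1.0 + np.exp(sigma * (s_i - s_j)))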

AppleClang not supported -- Setup on Mac.

I am trying to set up this repo on macOS Mojave 10.14.4. When I make the build directory and run 'cmake ..' from it, it shows me the following error:

CMake Error at CMakeLists.txt:27 (message):
AppleClang wasn't supported. Please see
https://github.com/Microsoft/LightGBM/blob/master/docs/Installation-Guide.rst#macos

-- Configuring incomplete, errors occurred!

I went to the above link and installed cmake and libomp through brew. I copied the cmake command given at that link, and it still showed me the same error as before.

When I try to install LightGBM through the same process, it works seamlessly.

How do I solve this issue and set up this repo?

Jupyter notebook example

Could you let me know how to use it in a Jupyter notebook?

Should I add something to the lightgbm package?

Can you add an example?
