
acbull / unbiased_lambdamart

222 stars · 9 watchers · 50 forks · 37.64 MB

Code for WWW'19 "Unbiased LambdaMART: An Unbiased Pairwise Learning-to-Rank Algorithm", which is based on LightGBM

License: MIT License

Python 16.91% Shell 0.44% CMake 0.39% R 12.49% Makefile 0.04% Batchfile 0.05% C++ 62.21% C 6.63% Jupyter Notebook 0.84%
learning-to-rank lightgbm bias

unbiased_lambdamart's Issues

Recommend using a submodule+fork for Unbias_LightGBM

Since Unbias_LightGBM is effectively a fork of LightGBM, it would be sensible to create a fork of LightGBM with the necessary changes, rename it Unbias_LightGBM, and add that fork as a submodule to this project. The new fork would initially be pinned to the LightGBM commit that Unbias_LightGBM is based upon.

This would allow updates and bug fixes to LightGBM to be easily incorporated into this project, and would make it clearer that Unbias_LightGBM is LightGBM with modifications.

segmentation fault

I am getting a segmentation fault when running your version of lightgbm with the default train/test sets as provided at: https://github.com/Microsoft/LightGBM/tree/master/examples/lambdarank

 ./lightgbm config=train.conf
[LightGBM] [Info] Finished loading parameters
num_threads_: 8
[LightGBM] [Info] Loading query boundaries...
[LightGBM] [Info] Loading query boundaries...
[LightGBM] [Info] Finished loading data in 0.038245 seconds

  position         bias_i         bias_j         i_cost         j_cost
         0              1              1              0              0
         1              1              1              0              0
         2              1              1              0              0
         3              1              1              0              0
         4              1              1              0              0
         5              1              1              0              0
         6              1              1              0              0
         7              1              1              0              0
         8              1              1              0              0
         9              1              1              0              0
        10              1              1              0              0
        11              1              1              0              0
[LightGBM] [Info] Total Bins 6177
[LightGBM] [Info] Number of data: 3005, number of used features: 211
[LightGBM] [Info] Finished initializing training
[LightGBM] [Info] Started training...
[1]    6611 segmentation fault (core dumped)  ./lightgbm config=train.conf

Why is sigma = 2?

Just a question about the implementation of unbiased LambdaMART: why is the sigma coefficient 2? Is this related to numerical stability? The paper states "sigma is a constant with default value of 2" (section 4.3) but doesn't give a reason. Most implementations of the lambdarank gradient default to sigma = 1, including LightGBM. I'm just wondering what benefits drove you to pick sigma = 2 as opposed to 1.

Thank you!
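For intuition, here is a minimal sketch of the RankNet-style pairwise gradient that lambdarank builds on (a plain-Python illustration, not this repo's C++ code), showing that sigma both scales the gradient and sharpens the logistic:

```python
import math

def ranknet_lambda(s_i, s_j, sigma=1.0):
    """RankNet-style pairwise gradient for a pair where document i should
    rank above document j; sigma controls the steepness of the logistic."""
    return -sigma / (1.0 + math.exp(sigma * (s_i - s_j)))

# With the same score gap, raising sigma from 1 to 2 more than doubles
# the gradient magnitude, because the logistic also gets steeper:
g1 = ranknet_lambda(0.2, 0.5, sigma=1.0)   # roughly -0.574
g2 = ranknet_lambda(0.2, 0.5, sigma=2.0)   # roughly -1.291
```

Whether that extra steepness is the actual motivation for sigma = 2 here is the open question.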

Jupyter notebook example

Could you let me know how to use this in a Jupyter notebook?

Should I add something to the lightgbm package?

Can you add an example?

Want to use the Python API to train and predict.

Hi,

I wanted to use the code in a similar way as used in LightGBM where I just import LightGBM and use LGBMRanker to train, make predictions, etc.

I am currently using the Unbias_LightGBM/examples/lambdarank/train.conf file to train.
Can you please guide me to how I can do the same for this repo?

Question about reading "XXX.train" file

Hello!
I am now trying to train on a dataset containing NaN values. However, the provided sample is in libSVM format, which only allows numerical values, not NaN. Therefore, I converted the data to a ".npy" file to try. Does the provided code support reading other file types, such as ".npy" or ".csv"? If it does, I would like some details about it.

Unbiased_LambdaMart NDCG is lower than original LambdaMART

I did the following steps:

  1. download the generate_dataset
  2. python evaluation/scripts/generate_data.py evaluation/click_model/user_browsing_model_0.1_1_4_1.json to generate train and test data
  3. run ./lightgbm config="train.conf" and get test data ndcg@10=0.546817
  4. build the original LightGBM, version 2.1.1, the same version Unbiased_LambdaMart is based on
  5. run ./lightgbm config="train.conf" and get test data ndcg@10=0.556632
  6. Why is Unbiased_LambdaMart's NDCG lower than the original LambdaMART's? The paper says Unbiased_LambdaMart is better than the original.

AppleClang not supported -- Setup on Mac.

I am trying to set up this repo on macOS Mojave 10.14.4. When I make the build directory and run 'cmake ..' from it, I get the following error:

CMake Error at CMakeLists.txt:27 (message):
AppleClang wasn't supported. Please see
https://github.com/Microsoft/LightGBM/blob/master/docs/Installation-Guide.rst#macos

-- Configuring incomplete, errors occurred!

I went to the above link and installed cmake and libomp through brew. I copied the cmake command given there, but it still showed the same error as before.

When I try to install LightGBM through the same process, it works seamlessly.

How do I solve this issue and setup this repo?
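For context, the guard producing the error likely looks roughly like the following (paraphrased from the error message and upstream LightGBM's install guide, not copied from this fork's source), and the usual workaround is to configure with Homebrew GCC instead of AppleClang:

```cmake
# Sketch of the CMakeLists.txt guard behind the error above (exact code
# may differ). One workaround, an assumption based on upstream LightGBM's
# macOS guide: install GCC via Homebrew and point CMake at it, e.g.
#   brew install gcc cmake libomp
#   cmake -DCMAKE_C_COMPILER=gcc-13 -DCMAKE_CXX_COMPILER=g++-13 ..
if(APPLE AND "${CMAKE_CXX_COMPILER_ID}" STREQUAL "AppleClang")
    message(FATAL_ERROR "AppleClang wasn't supported. Please see "
        "https://github.com/Microsoft/LightGBM/blob/master/docs/Installation-Guide.rst#macos")
endif()
```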

Add LICENSE.md to project root

As a publicly viewable project, Unbiased_LambdaMart should include a LICENSE.md file in order to explicitly convey how this project may be used.
As an unlicensed repository, the only rights provided to other users are to view and fork the repository. This is not in line with any desire for this work to be utilized in other projects.

Coming from GitHub's licensing help page:

You're under no obligation to choose a license. It's your right not to
include one with your code or project, but please be aware of the
implications. Generally speaking, the absence of a license means that
the default copyright laws apply. This means that you retain all
rights to your source code and that nobody else may reproduce,
distribute, or create derivative works from your work. This might not
be what you intend.

Even if this is what you intend, if you publish your source code in a
public repository on GitHub, you have accepted the Terms of Service
which do allow other GitHub users some rights. Specifically, you allow
others to view and fork your repository.

If you want to share your work with others, we strongly encourage you
to include an open source license.

I would strongly recommend the MIT license to encourage the widest availability of this project to other researchers; the BSD 3-clause if you seek protections regarding promotion and advertising material; or Apache 2.0 if you want MIT with more words.

It also appears at first glance that the project is licensed under MIT, as that is the license included in the Unbias_LightGBM directory. However, it is not entirely clear as that is also the license provided with LightGBM.

How to tune hyperparameters when using the lambdarank example

I have read the paper and the train.conf file in the lambdarank example. It seems there are some hyperparameters, such as p and M in the paper, but I cannot find them in train.conf. Am I missing something?

And I want to use the library on a large-scale dataset. Training LightGBM M times will cost a lot of time. Can we instead train some weak models at the beginning to estimate the position-bias parameters, and then re-train once with a more complex model? Did you try this?

Question about `position_bins`

Hi,

I'm trying to understand the implementation differences between this repo and lightgbm. Does

position_bins = 12       : this denotes the maximum positions taken into account.

effectively serve the same as lambdarank_truncation_level in the newer releases of lightgbm? Looks like they each cap the number of results from a given query we look at. Wanted to confirm.
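Side by side, the two settings look like this (semantics paraphrased from this repo's train.conf comment and upstream LightGBM's parameter docs; whether they are truly equivalent is the open question):

```
# Unbias_LightGBM (this repo), examples/lambdarank/train.conf:
position_bins = 12                # maximum positions taken into account

# Upstream LightGBM (newer releases):
lambdarank_truncation_level = 30  # only pairs involving the top-k results
                                  # of a query contribute lambda gradients
```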
