
acbull / unbiased_lambdamart

222 stars · 9 watchers · 50 forks · 37.64 MB

Code for WWW'19 "Unbiased LambdaMART: An Unbiased Pairwise Learning-to-Rank Algorithm", which is based on LightGBM

License: MIT License

Python 16.91% Shell 0.44% CMake 0.39% R 12.49% Makefile 0.04% Batchfile 0.05% C++ 62.21% C 6.63% Jupyter Notebook 0.84%
learning-to-rank lightgbm bias

unbiased_lambdamart's Issues

Recommend using a submodule+fork for Unbias_LightGBM

Since Unbias_LightGBM is effectively a fork of LightGBM, it would be sensible to create a fork of LightGBM with the necessary changes, rename it Unbias_LightGBM, and add that fork as a submodule to this project. The new fork would initially be pinned to the LightGBM commit that Unbias_LightGBM is based upon.

This would allow updates and bug fixes to LightGBM to be easily incorporated into this project, and would make it clearer that Unbias_LightGBM is LightGBM with modifications.

segmentation fault

I am getting a segmentation fault when running your version of lightgbm with the default train/test sets as provided at: https://github.com/Microsoft/LightGBM/tree/master/examples/lambdarank

 ./lightgbm config=train.conf
[LightGBM] [Info] Finished loading parameters
num_threads_: 8
[LightGBM] [Info] Loading query boundaries...
[LightGBM] [Info] Loading query boundaries...
[LightGBM] [Info] Finished loading data in 0.038245 seconds

  position         bias_i         bias_j         i_cost         j_cost
         0              1              1              0              0
         1              1              1              0              0
         2              1              1              0              0
         3              1              1              0              0
         4              1              1              0              0
         5              1              1              0              0
         6              1              1              0              0
         7              1              1              0              0
         8              1              1              0              0
         9              1              1              0              0
        10              1              1              0              0
        11              1              1              0              0
[LightGBM] [Info] Total Bins 6177
[LightGBM] [Info] Number of data: 3005, number of used features: 211
[LightGBM] [Info] Finished initializing training
[LightGBM] [Info] Started training...
[1]    6611 segmentation fault (core dumped)  ./lightgbm config=train.conf

Why is sigma = 2?

Just a question about the implementation of unbiased LambdaMART: why is the sigma coefficient 2? Is this related to numerical stability? The paper states "sigma is a constant with default value of 2" (section 4.3) but doesn't give a reason. Most implementations of the lambdarank gradient default to sigma = 1, including LightGBM. I'm just wondering what benefits drove you to pick sigma = 2 as opposed to 1.

Thank you!
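For intuition, here is a minimal sketch of the RankNet-style pairwise gradient that lambdarank builds on (a plain-Python illustration, not this repo's C++ code), showing that sigma both scales the gradient and sharpens the logistic:

```python
import math

def ranknet_lambda(s_i, s_j, sigma=1.0):
    """RankNet-style pairwise gradient for a pair where document i should
    rank above document j; sigma controls the steepness of the logistic."""
    return -sigma / (1.0 + math.exp(sigma * (s_i - s_j)))

# With the same score gap, raising sigma from 1 to 2 more than doubles
# the gradient magnitude, because the logistic also gets steeper:
g1 = ranknet_lambda(0.2, 0.5, sigma=1.0)   # roughly -0.574
g2 = ranknet_lambda(0.2, 0.5, sigma=2.0)   # roughly -1.291
```

Whether that extra steepness is the actual motivation for sigma = 2 here is the open question.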

Jupyter notebook example

Could you let me know how to use this in a Jupyter notebook?

Should I add something to the lightgbm package?

Can you add an example?

Want to use the Python API to train and predict.

Hi,

I wanted to use the code in a similar way as used in LightGBM where I just import LightGBM and use LGBMRanker to train, make predictions, etc.

I am currently using the Unbias_LightGBM/examples/lambdarank/train.conf file to train.
Can you please guide me to how I can do the same for this repo?

Question about reading "XXX.train" file

Hello!
I am now trying to train on a dataset containing NaN values. However, the provided sample is in libSVM format, which only allows numerical values, not NaN. Therefore, I converted the data to a ".npy" file to try. Does the provided code support reading other file types, such as ".npy" or ".csv"? If it does, I would like some details about it.

Unbiased_LambdaMart NDCG is lower than original LambdaMART

I did the following steps:

  1. download the generate_dataset
  2. python evaluation/scripts/generate_data.py evaluation/click_model/user_browsing_model_0.1_1_4_1.json to generate train and test data
  3. run ./lightgbm config="train.conf" and get test data ndcg@10=0.546817
  4. build the original LightGBM, version 2.1.1, the same version Unbiased_LambdaMart is based on
  5. run ./lightgbm config="train.conf" and get test data ndcg@10=0.556632
  6. Why is Unbiased_LambdaMart's NDCG lower than the original LambdaMART's? The paper says Unbiased_LambdaMart is better than the original.

AppleClang not supported -- Setup on Mac.

I am trying to set up this repo on macOS Mojave 10.14.4. When I make the build directory and run 'cmake ..' from it, I get the following error:

CMake Error at CMakeLists.txt:27 (message):
AppleClang wasn't supported. Please see
https://github.com/Microsoft/LightGBM/blob/master/docs/Installation-Guide.rst#macos

-- Configuring incomplete, errors occurred!

I went to the above link and installed cmake and libomp through brew. I copied the cmake command given there, but it still showed the same error as before.

When I try to install LightGBM through the same process, it works seamlessly.

How do I solve this issue and setup this repo?
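For context, the guard producing the error likely looks roughly like the following (paraphrased from the error message and upstream LightGBM's install guide, not copied from this fork's source), and the usual workaround is to configure with Homebrew GCC instead of AppleClang:

```cmake
# Sketch of the CMakeLists.txt guard behind the error above (exact code
# may differ). One workaround, an assumption based on upstream LightGBM's
# macOS guide: install GCC via Homebrew and point CMake at it, e.g.
#   brew install gcc cmake libomp
#   cmake -DCMAKE_C_COMPILER=gcc-13 -DCMAKE_CXX_COMPILER=g++-13 ..
if(APPLE AND "${CMAKE_CXX_COMPILER_ID}" STREQUAL "AppleClang")
    message(FATAL_ERROR "AppleClang wasn't supported. Please see "
        "https://github.com/Microsoft/LightGBM/blob/master/docs/Installation-Guide.rst#macos")
endif()
```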

Add LICENSE.md to project root

As a publicly viewable project, Unbiased_LambdaMart should include a LICENSE.md file in order to explicitly convey how this project may be used.
As an unlicensed repository, the only rights provided to other users are to view and fork the repository. This is not in line with any desire for this work to be utilized in other projects.

Coming from GitHub's licensing help page:

You're under no obligation to choose a license. It's your right not to
include one with your code or project, but please be aware of the
implications. Generally speaking, the absence of a license means that
the default copyright laws apply. This means that you retain all
rights to your source code and that nobody else may reproduce,
distribute, or create derivative works from your work. This might not
be what you intend.

Even if this is what you intend, if you publish your source code in a
public repository on GitHub, you have accepted the Terms of Service
which do allow other GitHub users some rights. Specifically, you allow
others to view and fork your repository.

If you want to share your work with others, we strongly encourage you
to include an open source license.

I would strongly recommend the MIT license to encourage the widest availability of this project to other researchers; the BSD 3-clause if you seek protections regarding promotion and advertising material; or Apache 2.0 if you want MIT with more words.

It also appears at first glance that the project is licensed under MIT, as that is the license included in the Unbias_LightGBM directory. However, it is not entirely clear as that is also the license provided with LightGBM.

How to tune hyperparameters when using the lambdarank example

I have read the paper and the train.conf file in the lambdarank example. It seems there are some hyperparameters, such as p and M in the paper, but I cannot find them in train.conf. Am I missing something?

And I want to use the library on a large-scale dataset. Training LightGBM M times will cost a lot of time. Can we instead train some weak models at the beginning to estimate the position-bias parameters, and then re-train once with a more complex model? Did you try this?

Question about `position_bins`

Hi,

I'm trying to understand the implementation differences between this repo and lightgbm. Does

position_bins = 12       : this denotes the maximum positions taken into account.

effectively serve the same as lambdarank_truncation_level in the newer releases of lightgbm? Looks like they each cap the number of results from a given query we look at. Wanted to confirm.
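Side by side, the two settings look like this (semantics paraphrased from this repo's train.conf comment and upstream LightGBM's parameter docs; whether they are truly equivalent is the open question):

```
# Unbias_LightGBM (this repo), examples/lambdarank/train.conf:
position_bins = 12                # maximum positions taken into account

# Upstream LightGBM (newer releases):
lambdarank_truncation_level = 30  # only pairs involving the top-k results
                                  # of a query contribute lambda gradients
```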
