This project forked from dmitryikh/leaves

Pure Go implementation of the prediction part of GBRT (Gradient Boosting Regression Trees) models from popular frameworks

License: MIT License

leaves

Introduction

leaves is a library that implements prediction code for GBRT (Gradient Boosting Regression Trees) models in pure Go. The goal of the project is to make it possible to use models from popular GBRT frameworks in Go programs without C API bindings.

Features

  • Support for LightGBM models:
    • reading models from text format
    • numerical and categorical features
    • configurable parallel predictions for batches
    • additional optimizations for categorical features (for example, a one-hot decision rule)
    • additional optimizations that exploit prediction-only usage
  • Support for XGBoost models:
    • reading models from binary format
    • missing values (NaN)
    • configurable parallel predictions for batches
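
To illustrate what parallel prediction over a batch involves, here is a minimal sketch that splits the rows of a dense, row-major batch across goroutines. It is not the library's implementation: PredictFunc and PredictBatch are hypothetical names introduced for this example, and the toy predictor simply sums the features where a real model would walk its trees.

```go
package main

import (
	"fmt"
	"sync"
)

// PredictFunc scores one row of feature values. It is a hypothetical
// stand-in for a single-row prediction call on a loaded model.
type PredictFunc func(fvals []float64) float64

// PredictBatch scores nrows rows stored row-major in vals (ncols values per
// row) using up to nThreads goroutines and returns one score per row.
func PredictBatch(predict PredictFunc, vals []float64, nrows, ncols, nThreads int) []float64 {
	out := make([]float64, nrows)
	if nrows == 0 {
		return out
	}
	if nThreads < 1 {
		nThreads = 1
	}
	if nThreads > nrows {
		nThreads = nrows
	}
	// Ceiling division so that every row belongs to exactly one chunk.
	chunk := (nrows + nThreads - 1) / nThreads
	var wg sync.WaitGroup
	for start := 0; start < nrows; start += chunk {
		end := start + chunk
		if end > nrows {
			end = nrows
		}
		wg.Add(1)
		go func(start, end int) {
			defer wg.Done()
			// Each goroutine writes to a disjoint range of out,
			// so no locking is needed.
			for i := start; i < end; i++ {
				out[i] = predict(vals[i*ncols : (i+1)*ncols])
			}
		}(start, end)
	}
	wg.Wait()
	return out
}

func main() {
	// Toy predictor (assumption): the score is the sum of the features.
	sum := func(fvals []float64) float64 {
		s := 0.0
		for _, v := range fvals {
			s += v
		}
		return s
	}
	vals := []float64{1, 2, 3, 4, 5, 6} // 3 rows x 2 columns
	fmt.Println(PredictBatch(sum, vals, 3, 2, 2)) // prints [3 7 11]
}
```

Chunking by contiguous row ranges keeps each goroutine's reads and writes local, which is a common choice for this kind of workload.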

Usage examples

To get started, go get this repository:

go get github.com/dmitryikh/leaves

Minimal example:

package main

import (
	"fmt"

	"github.com/dmitryikh/leaves"
)

func main() {
	// 1. Read model
	model, err := leaves.LGEnsembleFromFile("lightgbm_model.txt")
	if err != nil {
		panic(err)
	}

	// 2. Do predictions!
	fvals := []float64{1.0, 2.0, 3.0}
	p := model.Predict(fvals, 0)
	fmt.Printf("Prediction for %v: %f\n", fvals, p)
}

To use an XGBoost model, just change leaves.LGEnsembleFromFile to leaves.XGEnsembleFromFile. For more usage examples, see leaves_test.go.

Benchmark

Below are comparisons of prediction speed on batches (~1000 objects in one API call). Hardware: MacBook Pro (15-inch, 2017), 2.9 GHz Intel Core i7, 16 GB 2133 MHz LPDDR3. The C API implementations were called from Python bindings, but the large batch size should make the overhead of the bindings negligible. The leaves benchmarks were run with the Go test framework: go test -bench. See benchmark for more details on the measurements. See testdata/README.md for the data preparation pipelines.

Single thread:

Test Case        Features  Trees  Batch size  C API  leaves
LightGBM MS LTR  137       500    1000        49ms   51ms
LightGBM Higgs   28        500    1000        50ms   50ms
XGBoost Higgs    28        500    1000        44ms   50ms

4 threads:

Test Case        Features  Trees  Batch size  C API  leaves
LightGBM MS LTR  137       500    1000        14ms   14ms
LightGBM Higgs   28        500    1000        14ms   14ms
XGBoost Higgs    28        500    1000        ?      14ms

? - currently I am unable to use multithreading for XGBoost predictions through the Python bindings
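
For readers who want to time their own predictions in the same spirit, the sketch below shows the general shape of a Go benchmark over a 1000-row batch. It is not the project's benchmark code: predict is a hypothetical stand-in for a model call, and the batch is filled with zeros. In a _test.go file the same BenchmarkPredictBatch function would be picked up by go test -bench; here it is run through testing.Benchmark so the example works as a standalone program.

```go
package main

import (
	"fmt"
	"testing"
)

// predict is a hypothetical stand-in for a single-row model prediction.
func predict(fvals []float64) float64 {
	s := 0.0
	for _, v := range fvals {
		s += v
	}
	return s
}

// sink receives every result so the compiler cannot discard the loop.
var sink float64

// BenchmarkPredictBatch times scoring one batch of 1000 rows with
// 28 features per row; one benchmark iteration is one full batch.
func BenchmarkPredictBatch(b *testing.B) {
	const nrows, ncols = 1000, 28
	vals := make([]float64, nrows*ncols)
	b.ResetTimer() // exclude batch allocation from the timing
	for i := 0; i < b.N; i++ {
		for r := 0; r < nrows; r++ {
			sink += predict(vals[r*ncols : (r+1)*ncols])
		}
	}
}

// RunBenchmark runs the benchmark outside of "go test" and returns the
// number of iterations the testing package settled on.
func RunBenchmark() int {
	res := testing.Benchmark(BenchmarkPredictBatch)
	fmt.Println(res) // iterations and ns/op; values depend on the machine
	return res.N
}

func main() {
	RunBenchmark()
}
```

The reported ns/op is the time per batch, so it is directly comparable to per-batch timings like those in the tables above only when the batch size and model are the same.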

Limitations

  • LightGBM models:
    • transformation functions (sigmoid, lambdarank, etc.) are not supported; output scores are raw scores
  • XGBoost models:
    • transformation functions are not supported; output scores are raw scores
    • only gbtree models (the most common type) are supported
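
Because the output is a raw score, any transformation has to be applied by the caller. As a minimal sketch, assuming the model was trained with a binary logistic objective (for example binary:logistic in XGBoost, or binary in LightGBM with its sigmoid parameter left at 1.0), the raw score is a log-odds value and the logistic function turns it into a probability. Sigmoid is a helper introduced for this example, not part of the library, and the raw scores below are made up; the result should be checked against the training framework's own predictions.

```go
package main

import (
	"fmt"
	"math"
)

// Sigmoid maps a raw score (log-odds) to a probability in (0, 1).
func Sigmoid(raw float64) float64 {
	return 1.0 / (1.0 + math.Exp(-raw))
}

func main() {
	// Hypothetical raw scores, standing in for values returned by a model.
	rawScores := []float64{-2.0, 0.0, 2.0}
	for _, raw := range rawScores {
		fmt.Printf("raw=%+.1f probability=%.4f\n", raw, Sigmoid(raw))
	}
}
```

Other objectives need other transformations (or none, as with plain regression), so this applies only under the stated assumption.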

Contacts

If you are interested in the project or have questions, please contact me by email: khdmitryi at gmail.com
