arboretum_benchmark
Results
Criteo dataset
40M dataset - 5000 trees
python3 src/criteo_speed_test.py xgboost ; python3 src/criteo_speed_test.py lightgbm; python3 src/criteo_speed_test.py arboretum
reading data....
startring benchmark xgboost
2388.4275090694427
roc auc train:0.8894494708973515 cv:0.782368838905777
reading data....
startring benchmark lightgbm
[LightGBM] [Warning] Starting from the 2.1.2 version, default value for the "boost_from_average" parameter in "binary" objective is true.
This may cause significantly different results comparing to the previous versions of LightGBM.
Try to set boost_from_average=false, if your old models produce bad results
[LightGBM] [Info] Number of positive: 1031045, number of negative: 30968955
[LightGBM] [Info] This is the GPU trainer!!
[LightGBM] [Info] Total Bins 3872
[LightGBM] [Info] Number of data: 32000000, number of used features: 39
[LightGBM] [Info] Using GPU Device: GeForce GTX 1070, Vendor: NVIDIA Corporation
[LightGBM] [Info] Compiling OpenCL Kernel with 256 bins...
[LightGBM] [Info] GPU programs have been built
[LightGBM] [Info] Size of histogram bin entry: 12
[LightGBM] [Info] 38 dense feature groups (1220.70 MB) transfered to GPU in 0.967707 secs. 1 sparse feature groups
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.032220 -> initscore=-3.402412
[LightGBM] [Info] Start training from score -3.402412
2522.5609214305878
roc auc train:0.8973754627179997 cv:0.7764168713748687
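The `pavg` and `initscore` values in the LightGBM log above can be reproduced from the reported class counts. The sketch below assumes the initial score is the log-odds of the positive rate, which is consistent with the logged numbers; the function name `init_score` is only illustrative.

```python
import math

def init_score(n_pos, n_neg):
    """Return (positive rate, log-odds of the positive rate)."""
    pavg = n_pos / (n_pos + n_neg)
    return pavg, math.log(pavg / (1.0 - pavg))

# Class counts copied from the "Number of positive / negative" log line.
pavg, score = init_score(1031045, 30968955)
print(f"pavg={pavg:.6f} initscore={score:.6f}")
```

Small differences in the last digits relative to the log are possible, since the library may use a different floating-point precision.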
reading data....
startring benchmark arboretum
feature 0 has been reduced to 15 bits
feature 1 has been reduced to 13 bits
feature 2 has been reduced to 10 bits
feature 3 has been reduced to 15 bits
feature 4 has been reduced to 12 bits
feature 5 has been reduced to 10 bits
feature 6 has been reduced to 9 bits
feature 7 has been reduced to 14 bits
feature 8 has been reduced to 10 bits
feature 9 has been reduced to 4 bits
feature 10 has been reduced to 8 bits
feature 11 has been reduced to 19 bits
feature 12 has been reduced to 11 bits
max feature size 19
Total bytes 8513978368 available 2080964608
Memory usage estimation 220 per record 7040000000 in total
copied features data 13 from 13
copied category features 1 from 26
roc auc train:0.8108035778617945 cv:0.7864887547868098
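The arboretum memory lines appear to be related by simple arithmetic: the total estimate equals the per-record estimate times the number of training rows. The sketch below checks this for both runs, taking the row counts (32,000,000 and 8,000,000) from the LightGBM logs and assuming arboretum trained on the same rows. In the 40M run the estimate exceeds the reported available bytes, which may explain why only 1 of 26 category features was copied; that reading is a guess from the logs, not documented behaviour.

```python
# Numbers copied from the arboretum logs; "rows" comes from the LightGBM
# logs and is assumed to apply to the arboretum runs as well.
runs = {
    "40M": {"rows": 32_000_000, "per_record": 220, "available": 2_080_964_608},
    "10M": {"rows": 8_000_000, "per_record": 180, "available": 7_124_615_168},
}

def estimate_total(rows, per_record):
    """Total estimated bytes for the given row count."""
    return rows * per_record

for name, run in runs.items():
    total = estimate_total(run["rows"], run["per_record"])
    fits = total <= run["available"]
    print(f"{name}: estimate={total} available={run['available']} fits={fits}")
```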
10M dataset - 5000 trees
reading data....
startring benchmark xgboost
662.0523405075073
roc auc train:0.965875942403954 cv:0.756494298707683
reading data....
startring benchmark lightgbm
[LightGBM] [Warning] Starting from the 2.1.2 version, default value for the "boost_from_average" parameter in "binary" objective is true.
This may cause significantly different results comparing to the previous versions of LightGBM.
Try to set boost_from_average=false, if your old models produce bad results
[LightGBM] [Info] Number of positive: 247552, number of negative: 7752448
[LightGBM] [Info] This is the GPU trainer!!
[LightGBM] [Info] Total Bins 3844
[LightGBM] [Info] Number of data: 8000000, number of used features: 39
[LightGBM] [Info] Using GPU Device: GeForce GTX 1070, Vendor: NVIDIA Corporation
[LightGBM] [Info] Compiling OpenCL Kernel with 256 bins...
[LightGBM] [Info] GPU programs have been built
[LightGBM] [Info] Size of histogram bin entry: 12
[LightGBM] [Info] 38 dense feature groups (305.18 MB) transfered to GPU in 0.311981 secs. 1 sparse feature groups
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.030944 -> initscore=-3.444143
[LightGBM] [Info] Start training from score -3.444143
805.6181969642639
roc auc train:0.9904928384666543 cv:0.7458449429346047
reading data....
startring benchmark arboretum
feature 0 has been reduced to 14 bits
feature 1 has been reduced to 13 bits
feature 2 has been reduced to 9 bits
feature 3 has been reduced to 14 bits
feature 4 has been reduced to 12 bits
feature 5 has been reduced to 9 bits
feature 6 has been reduced to 9 bits
feature 7 has been reduced to 13 bits
feature 8 has been reduced to 9 bits
feature 9 has been reduced to 4 bits
feature 10 has been reduced to 8 bits
feature 11 has been reduced to 19 bits
feature 12 has been reduced to 10 bits
max feature size 19
Total bytes 8513978368 available 7124615168
Memory usage estimation 180 per record 1440000000 in total
copied features data 13 from 13
copied category features 26 from 26
roc auc train:0.8339605258327059 cv:0.7800383186995146
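To compare the runs at a glance, the AUC values from the logs can be collected into a small table. The sketch below uses the logged scores rounded to four decimals; the `summarize` helper is hypothetical and simply reports the train/CV gap per library and the library with the highest CV score.

```python
# (train AUC, CV AUC) copied from the logs above, rounded to 4 decimals.
results = {
    "40M": {
        "xgboost": (0.8894, 0.7824),
        "lightgbm": (0.8974, 0.7764),
        "arboretum": (0.8108, 0.7865),
    },
    "10M": {
        "xgboost": (0.9659, 0.7565),
        "lightgbm": (0.9905, 0.7458),
        "arboretum": (0.8340, 0.7800),
    },
}

def summarize(by_lib):
    """Return (train-minus-CV gap per library, library with best CV AUC)."""
    gaps = {lib: round(train - cv, 4) for lib, (train, cv) in by_lib.items()}
    best = max(by_lib, key=lambda lib: by_lib[lib][1])
    return gaps, best

for dataset, by_lib in results.items():
    gaps, best = summarize(by_lib)
    print(dataset, "gaps:", gaps, "best cv:", best)
```

A smaller gap between train and CV scores suggests less overfitting, though these are single runs and the differences have not been tested for significance.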