Comments (13)
@Laurae2 I'm not sure what the "numba profiler" is, could you clarify? Numba has a built-in parallel diagnostics tool that tracks the transforms made to its own IR of the Python source as it converts serial code to parallel code, but that is a compile-time diagnostic tool, not a performance profiler.
Further, as of Numba 0.41.0, JIT profiling works with Intel VTune: set the NUMBA_ENABLE_PROFILING environment variable to a non-zero value and Numba will register the LLVM JIT event listener for Intel VTune.
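As a sketch, a VTune run over the benchmark could look like the following. This assumes the VTune command-line tool (`amplxe-cl` in that era's releases) is on PATH; the benchmark path comes from the pygbm repository, and flags may need adjusting for your VTune version.

```shell
# Enable Numba's LLVM JIT event listener so VTune can attribute
# samples to jitted functions (non-zero value enables it).
export NUMBA_ENABLE_PROFILING=1

# Collect a hotspots profile of the benchmark under VTune.
amplxe-cl -collect hotspots -- \
    python benchmarks/bench_higgs_boson.py \
    --n-trees 100 --n-leaf-nodes 255 --learning-rate=0.5
```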
from pygbm.
Recent changes in master parallelize more things and scalability is no longer as bad as reported: pygbm tends to stay within 1.5x of LightGBM's runtime at worst.
@ogrisel would it be worth taking a look with the new parallel diagnostics output http://numba.pydata.org/numba-doc/latest/user/parallel.html#diagnostics to check what is/isn't parallelized?
Thanks for the feedback @stuartarchibald. This code does many calls to several jitted functions. Is there a way to get all the diagnostic reports for all the functions jitted by numba at the end of the benchmark script?
Would setting the NUMBA_PARALLEL_DIAGNOSTICS environment variable work for that purpose?
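For reference, that environment variable prints the parallel-transform diagnostics at compile time for every `@njit(parallel=True)` function compiled during the run, which covers the "all jitted functions at the end of the benchmark" case without code changes. A sketch against the pygbm benchmark script:

```shell
# Level runs from 1 (terse) to 4 (most verbose); diagnostics are
# emitted once per parallel function as it is compiled.
NUMBA_PARALLEL_DIAGNOSTICS=4 python benchmarks/bench_higgs_boson.py \
    --n-trees 100 --n-leaf-nodes 255 --learning-rate=0.5
```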
Thanks, this is exactly what I was looking for. Sorry for not reading the doc carefully enough.
Actually, what we really need is to do two runs under a profiler, one with NUMBA_NUM_THREADS=1 and one with NUMBA_NUM_THREADS=8 (for instance), then for each numba function in the critical path compute the speed-up ratio, spot the functions that benefit least from parallel=True, and look at the detailed parallel diagnostics for those.
It's also possible that we have a function in the critical path that is not parallelized at all for some reason.
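The per-function comparison described above could be scripted once the two profiles are reduced to per-function timings. A minimal sketch, where the function names and timings are entirely hypothetical stand-ins for what the profiler would report:

```python
# Hypothetical per-function wall times (seconds) from the two runs.
t1 = {"find_split": 120.0, "build_histogram": 90.0, "predict": 30.0}  # NUMBA_NUM_THREADS=1
t8 = {"find_split": 70.0, "build_histogram": 12.5, "predict": 28.0}   # NUMBA_NUM_THREADS=8

def speedups(serial, parallel):
    """Return {function: serial_time / parallel_time}, worst scaling first."""
    ratios = {f: serial[f] / parallel[f] for f in serial if f in parallel}
    return dict(sorted(ratios.items(), key=lambda kv: kv[1]))

for func, ratio in speedups(t1, t8).items():
    print(f"{func}: {ratio:.1f}x")
```

Functions at the top of the list (ratio near 1x) are the ones whose parallel diagnostics deserve a closer look.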
I think this is related to numba/numba#3438, as setting the thread count to one is not the same as switching parallelism off (the parallel transforms and scheduling still take place). There are potentially cases where adding more than one thread makes the code slower (parallel kernels with negligible per-thread work but all the overhead of scheduling), and further cases where a kernel costs more to schedule and execute on a worker thread than to simply run on the executing thread.
I did a quick bench on the current master on a machine with many cores (without profiling for now):
NUMBA_NUM_THREADS=1 OMP_NUM_THREADS=1 python benchmarks/bench_higgs_boson.py --n-trees 100 --n-leaf-nodes 255 --learning-rate=0.5
Model | Time | AUC | Speed up |
---|---|---|---|
LightGBM | 431s | 0.7519 | 1x |
pygbm | 460s | 0.7522 | 1x |
NUMBA_NUM_THREADS=8 OMP_NUM_THREADS=8 python benchmarks/bench_higgs_boson.py --n-trees 100 --n-leaf-nodes 255 --learning-rate=0.5
Model | Time | AUC | Speed up |
---|---|---|---|
LightGBM | 83s | 0.7519 | 5.2x |
pygbm | 146s | 0.7536 | 2.9x |
Intel(R) Xeon(R) CPU E5-2660 v2 @ 2.20GHz with 2x10 physical cores.
This is not the same machine as previously, but scalability is still sub-optimal, so the profiling effort is required to understand where the scalability bottlenecks are.
@ogrisel You can apply for a free Intel VTune license for profiling your code if you do research.
It will be much better than the numba profiler.
@stuartarchibald You can use the numba profiler here: https://github.com/numba/data_profiler (in reality it just adds the signatures). It incurs an overhead penalty.
Still better to use Intel VTune for real profiling though (far more detail, and easier to pinpoint the issues).
@Laurae2 Ah, so that's what you are referring to, thanks. Yes, indeed, they have different purposes...
@ogrisel Note that LightGBM's thread scaling depends on the number of columns. The Higgs dataset does not have enough columns to keep 48 threads busy, so it will underestimate scalability and give you a lower scaling target.