continuing from <a class="issue-link js-issue-link" data-error-text="Failed to load ti

<a class="user-mention notranslate" data-hovercard-type="user" data-hover

There are some parallel in place partition algorithms: <a href="http://www.lsi.upc.edu

Optionally use left/right indices buffer about pygbm HOT 7 OPEN

maartenbreddels commented on June 11, 2024

Optionally use left/right indices buffer

from pygbm.

Comments (7)

ogrisel commented on June 11, 2024

In splitting.py, the left/right_indices_buffer will use up 8GB for 10^9 rows. If that causes swapping, the performance benefit of multithreading (which requires these buffers) are most probably not worth it. Would it be an option to disable this?

I don't see how we could have multithreading at that level anymore. You suggest disabling thread-based parallelism for the split_indices operation? Maybe that could be an option. @NicolasHug might know better how LightGBM does for this part of the code.

from pygbm.

NicolasHug commented on June 11, 2024

Would it be an option to disable this?

Technically yes... I suppose we could use a single-threaded quick-sort like partitioning scheme.

Is there also a method that could without the buffer

I don't think so, or at least no with the current strategy. those arrays are used to that sample indices don't overwrite each other

from pygbm.

NicolasHug commented on June 11, 2024

@NicolasHug might know better how LightGBM does for this part of the code

I haven't checked again but I don't think they have an option to disable parallel splitting.

@maartenbreddels could you check if you have the same issue on LightGBM? Note that they are reusing allocated data like we plan to do in #81 so we need to take this into account

from pygbm.

maartenbreddels commented on June 11, 2024

There are some parallel in place partition algorithms: http://www.lsi.upc.edu/~lfrias/research/parpar/wea08.pdf
they don't appear super trivial, not something I'd do in 1 evening.

But I think one of the buffers could be avoided, that would already save a bit. Would you be interested in a PR that does either a single threaded split, or uses 1 buffer, or both? I can't promise I can do it, but if the 1 buffer PR makes the code less readable, and that is a reason not to merge, I won't bother.

could you check if you have the same issue on LightGBM?

I cannot use LightGBM as it is now, the current implementation makes at least 2 memory copies, my vaex-ml hack avoids 1 copy, but still the memory usage is excessive.

My plan is to see what is possible with pygbm (much easier to understand, and easier to edit), and possible see how they can be translated to lightgbm.

from pygbm.

NicolasHug commented on June 11, 2024

I think a single-threaded version would be welcome and should not be too complicated to add.

I'd be curious to know how to avoid using one of the two arrays though!

from pygbm.

maartenbreddels commented on June 11, 2024

I think a single-threaded version would be welcome and should not be too complicated to add.

I'll open an PR for that, any guidelines for how this should be configurable?

I'd be curious to know how to avoid using one of the two arrays though!

I thought of using the sample_indices for the 'left' indices, and a scratchpad/buffer for the 'right' indices. Basically, sample_indices takes over the role of left_indices_buffer. That should work right?

from pygbm.

NicolasHug commented on June 11, 2024

If you can make sure that no entry in samples_indices gets overwritten before it's written into the other buffer then I guess so. But there's the "if" ^^

any guidelines for how this should be configurable?

Let go simple for now, you can try passing a parameter e.g. parallel_splitting from BaseGradientBoosing all the way down to SplittingContext.__init__, and make split_indices dispatch to either split_indices_parallel or split_indices_single_thread. We can work out the details later.

from pygbm.

Optionally use left/right indices buffer about pygbm HOT 7 OPEN

Comments (7)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent