Coder Social home page Coder Social logo

Comments (12)

38 avatar 38 commented on June 21, 2024 1

Hi,

That's a really nice suggestion and I think it's really good to have it.

Particularly, I think splitting the buffer based on drawing region may have the following issue:

  1. As you mentioned. The memory region may not be consecutive, for example we have a region that handles from (100, 100) to (200, 200), Since the image buffer is stored as a 2D array, it's not easy to have a reference like that.
  2. The drawing area may overlaps, so there's no thread safety guarantee.

Also if we synchronizing the access per pixel, that will be very expensive.

So that I am thinking is, for sure we need synchronizing threads to manipulate the image buffer. What we can improve it is to use a buffer that caches the modification in drawing area and later we acquire the lock once and put all pixels to the same one.

In fact, I am thinking about could we have an bitmap element.
Basically each thread just render a different in-memory image. And after that one thread convert all the in-memory image into bitmap elements and paste it on the main buffer ?

As long as it reduce the synchronized operations, it might have OK performance.

Ideally, we should have a pixel granularity lock, but it seems hard to have one based on the image crate.

So any thoughts?

from plotters.

ralfbiedert avatar ralfbiedert commented on June 21, 2024 1

Alright, I did a small experiment:

  • Rendering the x^2 curve in 640x480 to a BitMapBackend
  • Splitting the canvas into two horizontal subgraphs and rendering the curves individually
  • Simulating blitting a horizontal half-buffer line-by-line into a larger buffer.

With grid

test sine_640_480                                  ... bench:   4,505,700 ns/iter (+/- 200,020)
test sine_640_480_2_horizontal_subgraphs_draw_both ... bench:   5,087,750 ns/iter (+/- 184,736)
test sine_640_480_2_horizontal_subgraphs_draw_one  ... bench:   2,454,890 ns/iter (+/- 85,392)
test sine_640_480_2_horizontal_subgraphs_fake_blit ... bench:     331,725 ns/iter (+/- 6,994)

Thoughts:

  • These are ballpark numbers, esp. the blitting has allocation cost mixed in
  • Drawing two subgraphs at the given resolution takes 5ms on one thread. Parallel drawing and blitting would rather be in the 2.4 + 0.3 ms = 2.7ms area
  • For a graph as complex as these ones (I'd guess drawing the background grid is the most expensive part), there would be a clear benefit of parallelizing and then copying.
  • Also, I just realize the name sine_ is wrong, should be called x_square_ :)

Same experiment without the grid

test sine_640_480                                  ... bench:   2,973,837 ns/iter (+/- 134,352)
test sine_640_480_2_horizontal_subgraphs_draw_both ... bench:   2,998,700 ns/iter (+/- 475,909)
test sine_640_480_2_horizontal_subgraphs_draw_one  ... bench:   1,436,110 ns/iter (+/- 69,206)
test sine_640_480_2_horizontal_subgraphs_fake_blit ... bench:     333,900 ns/iter (+/- 29,314)

Without the grid, 320x240

test sine_320_240                                  ... bench:     925,335 ns/iter (+/- 8,680)
test sine_320_240_2_horizontal_subgraphs_draw_both ... bench:   1,063,720 ns/iter (+/- 85,561)
test sine_320_240_2_horizontal_subgraphs_draw_one  ... bench:     904,630 ns/iter (+/- 17,451)
test sine_320_240_2_horizontal_subgraphs_fake_blit ... bench:      36,906 ns/iter (+/- 1,035)

Even then, this drawing into separate buffers and blitting once done seems totally worth it. However, it would be nice not paying the extra cost of blitting + allocation, unless I actually want to pay it, using the threading library of my choice.

from plotters.

38 avatar 38 commented on June 21, 2024 1

Hello @ralfbiedert ,

After working for a while on this, I just want to let you know some updates:

  1. As I previously mentioned, the draw and blit is currently supported.
  2. Worked on the benchmark for a while and finally makes the bitmap backend 8 times faster.
  3. After think for a while, I don't think I am going to add inplace parallel rendering to the crate. But this can be done in this way: https://github.com/38/plotters/blob/3a17f8b7a3e6d1c2ae1d2c40f584313f491251c6/benches/benches/parallel.rs#L106

For the last update, I think the major problem is partially mutable borrow a vector seems to be unsafe anyway. So as a crate maintainer, I think it's important to not leaking the unsafe behavior to the safe rust code.
Since image crate requires the container to be anything that impls DerefMut. So we have to do make it mutably partial borrowed, but the problem is what if the thread code later moved one part of the buffer?
Plus the limit of trait bound, borrow a mutable reference is the only way we can feed it into image crate.

So what I am thinking is like the code in my benchmark:

            let mut buffer = vec![0u8; (W * H * 3) as usize];
            let (upper, lower) = unsafe {
                let upper_addr = &mut buffer[0] as *mut u8;
                let lower_addr = &mut buffer[(W * H * 3 / 2) as usize] as *mut u8;
                (
                    std::slice::from_raw_parts_mut(upper_addr, (W * H * 3 / 2) as usize),
                    std::slice::from_raw_parts_mut(lower_addr, (W * H * 3 / 2) as usize),
                )
            };

            [upper, lower]
                .par_iter_mut()
                .for_each(|b| draw_plot(&BitMapBackend::with_buffer(*b, (W, H / 2)).into_drawing_area(), 2.0));

So would you mind share you idea on this?

Thanks!

from plotters.

ralfbiedert avatar ralfbiedert commented on June 21, 2024

Ideally this feature should work with #54.

Now that I think about it, this might be really hard to implement efficiently, since it would probably require either:

  • splitting a mutable borrow of the backing array between multiple threads (which might work for vertical stacking but probably not for horizontal ones as the arrays are not continuous anymore)
  • synchronizing access to the backing array, which costs performance.

...

Alright, with both issues in mind, it might be worthwhile to do a performance analysis if having multiple owned backing arrays (and copying / blitting when merging) is "on average better", or avoiding multi-threaded access altogether (but reducing allocations & copying).

from plotters.

serzhiio avatar serzhiio commented on June 21, 2024

from plotters.

38 avatar 38 commented on June 21, 2024

I believe Rayon create has some of needed functionality.

It really depends on what we want to do. If we just do the bitmap element, which allows background rendering and then paste to foreground, then nothing much needs to be changed.

However, if we want to make a real thread safety guarantee, we may need many things to think about.

from plotters.

ralfbiedert avatar ralfbiedert commented on June 21, 2024

I think how to implement this depends a bit on the overarching values to address. In this scope mine would be, in order of priority:

  • control (I can precisely draw what I want, when and how I want it)
  • lightweight (few dependencies, low compile time, does not interfere with my own dependencies)
  • single-threaded speed (more common use case than multi-threaded)
  • multi-threaded speed

From that angle I would want:

  • generally speed, but not at the expense of being lightweight (e.g., I'd prefer not to have a default Rayon integration as that might interfere with being lightweight, and even control if there is a version e.g., with rayon-core)
  • multi-threaded speed, but not at the expense of making single-threaded speed significantly lower. I'd be willing to trade a bit, but not very much.

Basically each thread just render a different in-memory image. And after that one thread convert all the in-memory image into bitmap elements and paste it on the main buffer ?

As long as it reduce the synchronized operations, it might have OK performance.

This sounds like an interesting idea to hack with different image sizes / graphs, to see how much time plotting usually takes today, and how long image assembly would be, to get a ballpark figure.

from plotters.

38 avatar 38 commented on June 21, 2024

Open a new issue tracking that. But your benchmark seems to be very interesting. #58

Do you know any details about this ?

==

And yes, we can actually do something for parallel rendering just for BitMapBackend.

We probably would have a API for BitMapBackend allowing split a backend obect to a few different one, each one can draw independently. (But only vertically split?)

Thoughts?

from plotters.

38 avatar 38 commented on June 21, 2024

Hi @ralfbiedert ,

Just implemented the BitMapElement and try your benchmark quickly, sees a speedup. But lower than estimated. My guess is the thread synchronization overhead.

test sine_640_480                                      ... bench:   4,069,858 ns/iter (+/- 176,661)
test sine_640_480_2_horizontal_subgraphs_draw_and_blit ... bench:   3,227,221 ns/iter (+/- 536,819)
test sine_640_480_2_horizontal_subgraphs_draw_both     ... bench:   4,561,579 ns/iter (+/- 276,693)
test sine_640_480_2_horizontal_subgraphs_draw_one      ... bench:   2,291,869 ns/iter (+/- 94,406)
test sine_640_480_2_horizontal_subgraphs_fake_blit     ... bench:     260,334 ns/iter (+/- 22,001)
test sine_640_480_2_vertical_subgraphs_draw_and_blit   ... bench:   3,217,974 ns/iter (+/- 924,244)

Check the dev branch if you want to try this.

And this is one of the new benchmark I added

#[bench]
fn sine_640_480_2_horizontal_subgraphs_draw_and_blit(b: &mut Bencher) {
    let dim = (640 as u32, 360 as u32);
    let mut buffer = vec![0; (dim.0 * dim.1 * 3) as usize];

    let mut root = BitMapBackend::with_buffer(&mut buffer, dim).into_drawing_area();
    let (mut left, mut right) = root.split_horizentally(320);

    b.iter(|| {
        let plots:Vec<_> = [left.dim_in_pixel(), right.dim_in_pixel()]
            .par_iter()
            .map(|(w,h)| {
                let mut element = plotters::element::BitMapElement::new((0,0), (*w,*h));
                {
                    let left = element.as_bitmap_backend().into_drawing_area();
                    left.fill(&WHITE).unwrap();
                    let mut chart = ChartBuilder::on(&left)
                        .caption("y=x^2", ("Arial", 50).into_font())
                        .margin(5)
                        .x_label_area_size(30)
                        .y_label_area_size(30)
                        .build_ranged(-1f32..1f32, -0.1f32..1f32)
                        .unwrap();

                    chart.configure_mesh().draw().unwrap();

                    chart
                        .draw_series(LineSeries::new(
                            (-50..=50).map(|x| x as f32 / 50.0).map(|x| (x, x * x)),
                            &RED,
                        ))
                        .unwrap()
                        .label("y = x^2")
                        .legend(|(x, y)| Path::new(vec![(x, y), (x + 20, y)], &RED));

                    chart
                        .configure_series_labels()
                        .background_style(&WHITE.mix(0.8))
                        .border_style(&BLACK)
                        .draw()
                        .unwrap();

                }
                element
            })
        .collect();

        left.draw(&plots[0]).unwrap();
        right.draw(&plots[1]).unwrap();

        root.get_base_pixel()
    }); // `black_box` prevents `f` from being optimized away.
}

from plotters.

ralfbiedert avatar ralfbiedert commented on June 21, 2024

Worked on the benchmark for a while and finally makes the bitmap backend 8 times faster.

I just measured with my own benchmarks again, and can confirm drawing the graphs went from:

test sine_640_480                                  ... bench:   4,505,700 ns/iter (+/- 200,020)
test sine_640_480_2_horizontal_subgraphs_draw_both ... bench:   5,087,750 ns/iter (+/- 184,736)
test sine_640_480_2_horizontal_subgraphs_draw_one  ... bench:   2,454,890 ns/iter (+/- 85,392)

to

test sine_640_480                                  ... bench:     887,485 ns/iter (+/- 17,703)
test sine_640_480_2_horizontal_subgraphs_draw_both ... bench:   1,513,180 ns/iter (+/- 85,506)
test sine_640_480_2_horizontal_subgraphs_draw_one  ... bench:     773,900 ns/iter (+/- 22,090)

My actual app went from 3% to 1% CPU usage! This is awesome!

After think for a while, I don't think I am going to add inplace parallel rendering to the crate.

Although I think a "any layout" parallel story would be nice, I can see the overhead this might bring. Thanks for investigating!

So what I am thinking is like the code in my benchmark:

I submitted a PR, I think this can be simplified and made safe as:

let (upper, lower) = buffer.split_at_mut((W * H * 3 / 2) as usize);

from plotters.

38 avatar 38 commented on June 21, 2024

Glad you confirmed the performance improvement.

Although I think a "any layout" parallel story would be nice, I can see the overhead this might bring. Thanks for investigating!

Thanks for the suggestion. You PR just remainders me. I wasn't realize there's split_at_mut function which is safe. I just realized I was previously wrong that a mutable reference can be dropped by the worker thread. As long as the guest thread is holding just a mutable reference, the worst thing it can do is just swap the content but cannot simply move the content and invalidate the buffer. So it should be still safe.

So I think my argument on introducing any unsafe behavior into safe code isn't the case.

If this is the case, I am really happy to have the feature.


Update: I worked on this. And if you would like to try, just pull the latest dev branch. Code looks like this:

            let mut back = BitMapBackend::with_buffer(&mut buffer, (W, H));
            back.split(&[H / 2])
                .into_par_iter()
                .for_each(|b| draw_plot(&b.into_drawing_area(), 2.0));

from plotters.

38 avatar 38 commented on June 21, 2024

Seems we have investigave and impled things suggested under this one. Closing the issue for now. Feel free to reopen it if there's anything else. Thanks

from plotters.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.