
wgpu-profiler


Simple profiler scopes for wgpu using timer queries

Features

  • Easy to use profiler scopes
    • Allows nesting!
    • Can be disabled by runtime flag
    • Additionally generates debug markers
  • Thread-safe - can profile several command encoders/buffers in parallel
  • Internally creates pools of timer queries automatically
    • Does not need to know in advance how many queries/profiling scopes are needed
    • Caches profiler frames until results are available
      • No stalling of the device at any time!
  • Many profiler instances can live side by side
  • Chrome trace (flamegraph) JSON export
  • Tracy integration (behind tracy feature flag)

How to use

Create a new profiler object:

use wgpu_profiler::{wgpu_profiler, GpuProfiler, GpuProfilerSettings};
// ...
let mut profiler = GpuProfiler::new(GpuProfilerSettings::default());

Now you can start creating profiler scopes:

// You can now open profiling scopes on any encoder or pass:
let mut scope = profiler.scope("name of your scope", &mut encoder, &device);

// Scopes can be nested arbitrarily!
let mut nested_scope = scope.scope("nested!", &device);

// Scopes on encoders can be used to easily create profiled passes!
let mut compute_pass = nested_scope.scoped_compute_pass("profiled compute", &device);

// Scopes expose the underlying encoder or pass they wrap:
compute_pass.set_pipeline(&pipeline);
// ...

// Scopes created this way are automatically closed when dropped.

GpuProfiler reads the device features on first use:

  • wgpu::Features::TIMESTAMP_QUERY is required to emit any timer queries.
    • Alone, this allows you to use timestamp writes on pass definition as done by Scope::scoped_compute_pass/Scope::scoped_render_pass
  • wgpu::Features::TIMESTAMP_QUERY_INSIDE_ENCODERS is required to issue queries at any point within encoders.
  • wgpu::Features::TIMESTAMP_QUERY_INSIDE_PASSES is required to issue queries at any point within passes.

wgpu-profiler needs to insert buffer copy commands, so when you're done with an encoder and won't open any more profiling scopes on it, you need to resolve the queries:

profiler.resolve_queries(&mut encoder);

And finally, to end a profiling frame, call end_frame. This does a few checks and will let you know if something is off!

profiler.end_frame().unwrap();

Retrieve the oldest available frame and write it out to a Chrome trace file:

if let Some(profiling_data) = profiler.process_finished_frame(queue.get_timestamp_period()) {
    wgpu_profiler::chrometrace::write_chrometrace(std::path::Path::new("mytrace.json"), &profiling_data);
}

To see it in action, check out the example project!

License

Licensed under either of

  • Apache License, Version 2.0
  • MIT license

at your option.

Contribution

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.

wgpu-profiler's People

Contributors

cwfitzgerald, dasetwas, davidster, icandivideby0, imberflur, jcapucho, vini-fda, waywardmonkeys, wumpf, xstrom, zoxc


wgpu-profiler's Issues

Scope guard types

The way the project I work on handles state transitions (e.g. between pipelines) while drawing with wgpu makes it infeasible to put things inside a scope macro. So I made the scope types below, which wrap the CommandEncoder/RenderPass and utilize Drop. Would something like these be useful in this crate, make more sense in a separate crate, or simply be created by users as needed?

pub struct Scope<'a, W: ProfilerCommandRecorder> {
    profiler: &'a mut GpuProfiler,
    wgpu_thing: &'a mut W,
}

pub struct OwningScope<'a, W: ProfilerCommandRecorder> {
    profiler: &'a mut GpuProfiler,
    wgpu_thing: W,
}

// Separate type since we can't destructure types that impl Drop :/
pub struct ManualOwningScope<'a, W: ProfilerCommandRecorder> {
    profiler: &'a mut GpuProfiler,
    wgpu_thing: W,
}

impl<'a, W: ProfilerCommandRecorder> Scope<'a, W> {
    pub fn start(
        profiler: &'a mut GpuProfiler,
        wgpu_thing: &'a mut W,
        device: &wgpu::Device,
        label: &str,
    ) -> Self {
        profiler.begin_scope(label, wgpu_thing, device);
        Self {
            profiler,
            wgpu_thing,
        }
    }

    /// Starts a scope nested within this one
    pub fn scope(&mut self, device: &wgpu::Device, label: &str) -> Scope<'_, W> {
        Scope::start(self.profiler, self.wgpu_thing, device, label)
    }
}

impl<'a, W: ProfilerCommandRecorder> OwningScope<'a, W> {
    pub fn start(
        profiler: &'a mut GpuProfiler,
        mut wgpu_thing: W,
        device: &wgpu::Device,
        label: &str,
    ) -> Self {
        profiler.begin_scope(label, &mut wgpu_thing, device);
        Self {
            profiler,
            wgpu_thing,
        }
    }

    /// Starts a scope nested within this one
    pub fn scope(&mut self, device: &wgpu::Device, label: &str) -> Scope<'_, W> {
        Scope::start(self.profiler, &mut self.wgpu_thing, device, label)
    }
}

impl<'a, W: ProfilerCommandRecorder> ManualOwningScope<'a, W> {
    pub fn start(
        profiler: &'a mut GpuProfiler,
        mut wgpu_thing: W,
        device: &wgpu::Device,
        label: &str,
    ) -> Self {
        profiler.begin_scope(label, &mut wgpu_thing, device);
        Self {
            profiler,
            wgpu_thing,
        }
    }

    /// Starts a scope nested within this one
    pub fn scope(&mut self, device: &wgpu::Device, label: &str) -> Scope<'_, W> {
        Scope::start(self.profiler, &mut self.wgpu_thing, device, label)
    }

    /// Ends the scope, allowing extraction of the owned wgpu thing
    /// and the mutable reference to the GpuProfiler
    pub fn end_scope(mut self) -> (W, &'a mut GpuProfiler) {
        self.profiler.end_scope(&mut self.wgpu_thing);
        (self.wgpu_thing, self.profiler)
    }
}
impl<'a> Scope<'a, wgpu::CommandEncoder> {
    /// Start a render pass wrapped in an OwnedScope
    pub fn scoped_render_pass<'b>(
        &'b mut self,
        device: &wgpu::Device,
        label: &str,
        pass_descriptor: &wgpu::RenderPassDescriptor<'b, '_>,
    ) -> OwningScope<'b, wgpu::RenderPass> {
        let render_pass = self.wgpu_thing.begin_render_pass(pass_descriptor);
        OwningScope::start(self.profiler, render_pass, device, label)
    }
}

impl<'a> OwningScope<'a, wgpu::CommandEncoder> {
    /// Start a render pass wrapped in an OwnedScope
    pub fn scoped_render_pass<'b>(
        &'b mut self,
        device: &wgpu::Device,
        label: &str,
        pass_descriptor: &wgpu::RenderPassDescriptor<'b, '_>,
    ) -> OwningScope<'b, wgpu::RenderPass> {
        let render_pass = self.wgpu_thing.begin_render_pass(pass_descriptor);
        OwningScope::start(self.profiler, render_pass, device, label)
    }
}

impl<'a> ManualOwningScope<'a, wgpu::CommandEncoder> {
    /// Start a render pass wrapped in an OwnedScope
    pub fn scoped_render_pass<'b>(
        &'b mut self,
        device: &wgpu::Device,
        label: &str,
        pass_descriptor: &wgpu::RenderPassDescriptor<'b, '_>,
    ) -> OwningScope<'b, wgpu::RenderPass> {
        let render_pass = self.wgpu_thing.begin_render_pass(pass_descriptor);
        OwningScope::start(self.profiler, render_pass, device, label)
    }
}

// Scope
impl<'a, W: ProfilerCommandRecorder> std::ops::Deref for Scope<'a, W> {
    type Target = W;

    fn deref(&self) -> &Self::Target { self.wgpu_thing }
}

impl<'a, W: ProfilerCommandRecorder> std::ops::DerefMut for Scope<'a, W> {
    fn deref_mut(&mut self) -> &mut Self::Target { self.wgpu_thing }
}

impl<'a, W: ProfilerCommandRecorder> Drop for Scope<'a, W> {
    fn drop(&mut self) { self.profiler.end_scope(self.wgpu_thing); }
}

// OwningScope
impl<'a, W: ProfilerCommandRecorder> std::ops::Deref for OwningScope<'a, W> {
    type Target = W;

    fn deref(&self) -> &Self::Target { &self.wgpu_thing }
}

impl<'a, W: ProfilerCommandRecorder> std::ops::DerefMut for OwningScope<'a, W> {
    fn deref_mut(&mut self) -> &mut Self::Target { &mut self.wgpu_thing }
}

impl<'a, W: ProfilerCommandRecorder> Drop for OwningScope<'a, W> {
    fn drop(&mut self) { self.profiler.end_scope(&mut self.wgpu_thing); }
}

// ManualOwningScope
impl<'a, W: ProfilerCommandRecorder> std::ops::Deref for ManualOwningScope<'a, W> {
    type Target = W;

    fn deref(&self) -> &Self::Target { &self.wgpu_thing }
}

impl<'a, W: ProfilerCommandRecorder> std::ops::DerefMut for ManualOwningScope<'a, W> {
    fn deref_mut(&mut self) -> &mut Self::Target { &mut self.wgpu_thing }
}

Add an easy way to print a profiler frame

Users have to do that manually right now if they don't use chrome trace.

Also, the readme could really use some of that; people coming in have no idea what data to expect!

Race condition when dropping frames

There's currently a problem with the code for dropping frames that can cause wgpu to crash. It happens when the user application submits a frame that stalls long enough for a new frame to also be queued; if the new frame uses more pools than the previous one, the previous frame's pools get dropped.

wgpu-profiler/src/lib.rs

Lines 322 to 332 in 637877c

fn reset_and_cache_unused_query_pools(&mut self, mut query_pools: Vec<QueryPool>) {
    // If a pool was less than half of the size of the max frame, then we don't keep it.
    // This way we're going to need less pools in upcoming frames and thus have less overhead in the long run.
    let capacity_threshold = self.size_for_new_query_pools / 2;
    for mut pool in query_pools.drain(..) {
        pool.reset();
        if pool.capacity >= capacity_threshold {
            self.unused_pools.push(pool);
        }
    }
}
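To make the failure mode concrete, here is a minimal model of that caching policy (all names hypothetical, not the crate's actual code), showing how a raised threshold evicts pools that a still-in-flight frame may reference:

```rust
/// Simplified model of a query pool; `in_flight` marks pools that a
/// submitted-but-unfinished frame still references on the GPU.
struct Pool {
    capacity: u32,
    in_flight: bool,
}

/// Mirrors the policy quoted above: keep pools whose capacity is at least
/// half the current new-pool size, destroy the rest -- regardless of
/// whether the GPU is still using them.
fn cache_unused_pools(pools: Vec<Pool>, size_for_new_query_pools: u32) -> (Vec<Pool>, Vec<Pool>) {
    let capacity_threshold = size_for_new_query_pools / 2;
    pools.into_iter().partition(|p| p.capacity >= capacity_threshold)
}

fn main() {
    // 1st frame's pools were sized when the threshold was 24...
    let frame1_pools = vec![Pool { capacity: 24, in_flight: true }];
    // ...but the 2nd frame raised size_for_new_query_pools, so the
    // threshold is now 38 and the in-flight pool is destroyed.
    let (kept, destroyed) = cache_unused_pools(frame1_pools, 76);
    assert!(kept.is_empty());
    assert!(destroyed[0].in_flight); // destroyed while the GPU still uses it
}
```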

So the timeline is something like:

1st frame submitted
2nd frame submitted
1st frame pools dropped
1st frame tries to finish

this results in the following error from wgpu:

PanicInfo: panicked at 'assertion failed: `(left == right)`
  left: `2`,
 right: `3`: Buffer[6][2] is no longer alive (ID: (6, 2, Vulkan))', wgpu/wgpu-core/src/hub.rs:371:9

I've also added some prints to the profiler code to better show the problem:

frame finished # The frame before the problematic frame finished without a problem
# We process the finished frame which causes the cache code to be invoked
# but the threshold isn't high enough to drop pools
Threshold: 24
# Queuing the new frame
# Buffer of pool 1 which has problems
Resolving to: Buffer { context: Context { type: "Native" }, id: ObjectId { id: Some(2305843017803628550) }, data: Any { .. }, map_context: Mutex { data: MapContext { total_size: 256, initial_range: 0..0, sub_ranges: [] } }, size: 256, usage: MAP_READ | COPY_DST }
# Buffer of pool 2 which has problems
Resolving to: Buffer { context: Context { type: "Native" }, id: ObjectId { id: Some(2305843013508661271) }, data: Any { .. }, map_context: Mutex { data: MapContext { total_size: 256, initial_range: 0..0, sub_ranges: [] } }, size: 256, usage: MAP_READ | COPY_DST }
Ending frame # The 1st frame is submitted and `end_frame` is called
# We start the 2nd frame
Resolving to: Buffer { context: Context { type: "Native" }, id: ObjectId { id: Some(2305843017803628574) }, data: Any { .. }, map_context: Mutex { data: MapContext { total_size: 384, initial_range: 0..0, sub_ranges: [] } }, size: 384, usage: MAP_READ | COPY_DST }
Resolving to: Buffer { context: Context { type: "Native" }, id: ObjectId { id: Some(2305843017803628577) }, data: Any { .. }, map_context: Mutex { data: MapContext { total_size: 608, initial_range: 0..0, sub_ranges: [] } }, size: 608, usage: MAP_READ | COPY_DST }
Ending frame # The 2nd frame is submitted
Dropping frame # Since the profiler was configured with only 1 max pending frame and the 1st frame hasn't finished the 1st frame is dropped
# This causes the cache code to run
Threshold: 38 # The 2nd frame uses more query pools causing `capacity_threshold` to increase
Destroying: Buffer { context: Context { type: "Native" }, id: ObjectId { id: Some(2305843017803628550) }, data: Any { .. }, map_context: Mutex { data: MapContext { total_size: 256, initial_range: 0..0, sub_ranges: [] } }, size: 256, usage: MAP_READ | COPY_DST } # Buffer of pool 1 doesn't pass the threshold
Destroying: Buffer { context: Context { type: "Native" }, id: ObjectId { id: Some(2305843013508661271) }, data: Any { .. }, map_context: Mutex { data: MapContext { total_size: 256, initial_range: 0..0, sub_ranges: [] } }, size: 256, usage: MAP_READ | COPY_DST } # Buffer of pool 2 doesn't pass the threshold
# Somewhere after the 1st frame finishes which causes wgpu to try transition the buffers

I also tried to come up with a small reproduction case based on the other dropped-frames test, but I couldn't find a way to simulate a stalled frame:

async fn handle_dropped_frames_pool_increase_gracefully_async() {
    let instance = wgpu::Instance::new(wgpu::InstanceDescriptor::default());
    let adapter = instance.request_adapter(&wgpu::RequestAdapterOptions::default()).await.unwrap();
    let (device, queue) = adapter
        .request_device(
            &wgpu::DeviceDescriptor {
                features: wgpu::Features::TIMESTAMP_QUERY,
                ..Default::default()
            },
            None,
        )
        .await
        .unwrap();

    // max_num_pending_frames is one!
    let mut profiler = wgpu_profiler::GpuProfiler::new(1, queue.get_timestamp_period(), device.features());

    // Two frames without device poll, causing the profiler to drop a frame on the second round.
    {
        let mut encoder = device.create_command_encoder(&wgpu::CommandEncoderDescriptor::default());
        {
            let _ = wgpu_profiler::scope::Scope::start("testscope", &mut profiler, &mut encoder, &device);
        }
        profiler.resolve_queries(&mut encoder);

        queue.submit(std::iter::once(encoder.finish()));
        queue.on_submitted_work_done(|| {
            println!("done1");
        });

        profiler.end_frame().unwrap();

        // We haven't done a device poll, so there can't be a result!
        assert!(profiler.process_finished_frame().is_none());
    }
    {
        let mut encoder = device.create_command_encoder(&wgpu::CommandEncoderDescriptor::default());
        {
            let mut root = wgpu_profiler::scope::Scope::start("rootscope", &mut profiler, &mut encoder, &device);
            for _ in 0..32 {
                let _ = root.scope("nestedscope", &device);
            }
        }
        profiler.resolve_queries(&mut encoder);

        queue.submit(std::iter::once(encoder.finish()));
        queue.on_submitted_work_done(|| {
            println!("done2");
        });

        profiler.end_frame().unwrap();

        // We haven't done a device poll, so there can't be a result!
        assert!(profiler.process_finished_frame().is_none());
    }

    // Poll to explicitly trigger mapping callbacks.
    device.poll(wgpu::Maintain::Wait);

    // A single (!) frame should now be available.
    assert!(profiler.process_finished_frame().is_some());
    assert!(profiler.process_finished_frame().is_none());
}

Make use of pass timer queries

wgpu 0.18 supports timer queries directly on passes which don't need the INSIDE_PASSES timer feature. wgpu-profiler should have first class support for this!

scoped_render_pass and scoped_compute_pass don't need `#[must_use]`

It's hard to forget to use a render pass if you need it for something, and some passes (e.g. for clearing a texture) can be immediately dropped. Since these functions don't wrap a borrow of an existing value, it isn't possible to accidentally drop them and use the original wrapped value.

Broken wasm build due to std::process::id function call, is this a bug?

The WASM build is broken in my project since upgrading wgpu-profiler. I guess it's been 'broken' since #30.

I'm not sure how you would like to proceed with this. On the one hand, I could probably find a way to never call wgpu-profiler functions from wasm builds. On the other hand, it was pretty nice being able to compile to web with no extra work required.

What do you think?

Show current thread id and process id

Currently, the PID and TID are hardcoded to the constant "1":

fn write_results_recursive(file: &mut File, result: &GpuTimerScopeResult, last: bool) -> std::io::Result<()> {
    write!(
        file,
        r#"{{ "pid":1, "tid":1, "ts":{}, "dur":{}, "ph":"X", "name":"{}" }}{}"#,
        result.time.start * 1000.0 * 1000.0,
        (result.time.end - result.time.start) * 1000.0 * 1000.0,
        result.label,
        if last && result.nested_scopes.is_empty() { "\n" } else { ",\n" }
    )?;
    if result.nested_scopes.is_empty() {
        return Ok(());
    }

    for child in result.nested_scopes.iter().take(result.nested_scopes.len() - 1) {
        write_results_recursive(file, child, false)?;
    }
    write_results_recursive(file, result.nested_scopes.last().unwrap(), last)?;

    Ok(())
    // { "pid":1, "tid":1, "ts":546867, "dur":121564, "ph":"X", "name":"DoThings"
}

They should be changed to output the proper process and thread id.

Suggestion: use std::process::id() for the PID and std::thread::current().id() for the TID. The only problem is that the ThreadId type in the standard library cannot be trivially converted to a primitive type (int/uint/usize) without hacks. Also, ThreadIds are under the control of Rust's standard library, and there may not be any relationship between a ThreadId and the underlying platform's notion of a thread identifier (see the official docs). There's a proposal for stabilizing a u64 conversion, though (see this tracking issue).
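As a hedged sketch of that suggestion (the helper name is hypothetical), the PID can come straight from std::process::id(), while the TID can be derived by hashing the ThreadId until a stable integer conversion lands. The hashed value is stable within a run but has no OS-level meaning:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical helper: numeric (pid, tid) for chrome trace events.
/// Hashing `ThreadId` is a workaround until `ThreadId::as_u64` stabilizes.
fn trace_ids() -> (u32, u64) {
    let pid = std::process::id();
    let mut hasher = DefaultHasher::new();
    std::thread::current().id().hash(&mut hasher);
    (pid, hasher.finish())
}

fn main() {
    let (pid, tid) = trace_ids();
    // Same thread, same run: the derived tid is stable.
    let (_, tid_again) = trace_ids();
    assert!(pid > 0);
    assert_eq!(tid, tid_again);
}
```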

Add puffin integration

It should be possible to have both CPU & GPU traces be shown in sync in Puffin 🤔

(Add an example screenshot of that to readme if it works out!)

Integration with tracy_client

Now that Tracy has proper C API support for GPU timestamps, it would be a great extension to this library to offer a tracy feature where the timestamps collected by wgpu-profiler are reported to Tracy.

I am planning on working on this in a while but I wanted to clear it with you before I did.

Profiler can't be used across threads or with interleaved scopes

By design, the user is currently forced to use one profiler per thread. In fact, even working with two command encoders on a single thread in an interleaved fashion is not possible right now, since every call to end_scope needs its corresponding call to begin_scope to be in the correct order.

begin_scope needs to return a handle that end_scope works with instead. Also, both functions need to work with interior mutability, each requiring &GpuProfiler instead of &mut GpuProfiler.
I believe forcing &mut GpuProfiler for all other methods (query resolve, frame processing, etc.) still makes sense, though, since this greatly simplifies things for both the implementation and potential error cases.
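A minimal sketch of what such a handle-based design could look like (all names hypothetical, not the crate's actual API), using a Mutex for interior mutability so scopes on different encoders can interleave freely:

```rust
use std::sync::Mutex;

/// Handle returned by `begin_scope`, consumed by `end_scope`.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct ScopeHandle(usize);

#[derive(Default)]
pub struct Profiler {
    // (label, closed) per opened scope; guarded so `&self` suffices.
    scopes: Mutex<Vec<(String, bool)>>,
}

impl Profiler {
    /// Note `&self`: interior mutability instead of `&mut self`.
    pub fn begin_scope(&self, label: &str) -> ScopeHandle {
        let mut scopes = self.scopes.lock().unwrap();
        scopes.push((label.to_string(), false));
        ScopeHandle(scopes.len() - 1)
    }

    /// Closing by handle means scopes may end in any order.
    pub fn end_scope(&self, handle: ScopeHandle) {
        self.scopes.lock().unwrap()[handle.0].1 = true;
    }

    pub fn all_closed(&self) -> bool {
        self.scopes.lock().unwrap().iter().all(|(_, closed)| *closed)
    }
}

fn main() {
    let profiler = Profiler::default();
    let a = profiler.begin_scope("encoder_a");
    let b = profiler.begin_scope("encoder_b");
    profiler.end_scope(a); // interleaved: a closes before b, no ordering constraint
    profiler.end_scope(b);
    assert!(profiler.all_closed());
}
```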

Allow custom payloads for Profiler Frame and Scopes

See also #49

Making the profiler generic over two payload types doesn't seem too outrageous (with both defaulting to ()). Need to figure out how to keep things ergonomic for users that don't want to supply payloads to their scopes & frames, though!
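A rough sketch of the generics involved (hypothetical and heavily simplified): default type parameters keep the type unchanged for users who don't care about payloads:

```rust
/// Hypothetical shape: both payload types default to `()`, so existing
/// users keep writing `GpuProfiler` with no extra annotations.
#[derive(Default)]
struct GpuProfiler<FramePayload = (), ScopePayload = ()> {
    finished_frames: Vec<(FramePayload, Vec<ScopePayload>)>,
}

impl<F, S> GpuProfiler<F, S> {
    fn push_frame(&mut self, frame: F, scopes: Vec<S>) {
        self.finished_frames.push((frame, scopes));
    }
}

fn main() {
    // Existing users: no payloads, defaults kick in.
    let mut plain: GpuProfiler = GpuProfiler::default();
    plain.push_frame((), vec![(), ()]);

    // Opt-in payloads, e.g. a frame index and per-scope byte counts.
    let mut rich: GpuProfiler<u64, usize> = GpuProfiler::default();
    rich.push_frame(42, vec![128, 256]);
    assert_eq!(rich.finished_frames[0].0, 42);
}
```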

Negative profile measurements on MacOS

I am seeing that sometimes query_result.time.end - query_result.time.start gives me a negative value on mac. Have you seen this before? I could create a mini repo that reproduces the problem if you'd like :).
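As a hedged workaround sketch on the consumer side (not a fix for the underlying driver behavior), durations can be clamped so a non-monotonic timestamp pair doesn't produce a negative value:

```rust
/// Hypothetical guard: treat a non-monotonic timestamp pair as zero
/// duration instead of reporting a negative time.
fn scope_duration(start: f64, end: f64) -> f64 {
    (end - start).max(0.0)
}

fn main() {
    assert_eq!(scope_duration(3.0, 5.0), 2.0); // normal case
    assert_eq!(scope_duration(5.0, 3.0), 0.0); // clamped, not negative
}
```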

wgpu 14?

What's the time frame for this getting updated to wgpu 14?
