I realize that the vku_framework.hpp was intended as a way to easily get started with

<a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="/us

Make the vku::framework and vku::Window more flexible about vookoo HOT 7 OPEN

andy-thomason commented on September 26, 2024

Make the vku::framework and vku::Window more flexible

from vookoo.

Comments (7)

andy-thomason commented on September 26, 2024

I'm happy for you to customise it as you wish. Real use cases will always drive software development. Best of all, develop a game engine or 3D application.

from vookoo.

FunMiles commented on September 26, 2024

@andy-thomason I am definitely using the framework for my new code. I am discovering quite a few things to adapt and generalize. By the way, I had stated that I had a working threading but eventually discovered it was buggy. This led me to stackoverflow where a patient soul helped figure out the issue. I will be submitting a pull request with a threaded demo and will update all the demos to cache the queue, rather than asking for it on each draw call. It will take me some time though, as I need to address the framework singleton aspect. I am also working on generalizing the buffer system to easily adapt/use the popular Vulkan Memory Allocator without forcing to use it as I'm sure @lhog wouldn't like that! ;P .

from vookoo.

FunMiles commented on September 26, 2024

After discovering the issue for the queue and threading, I am left wondering what approach to take. Here are my notes:

Any queue submission must be under a lock if you have threading.
If you are not going to have more than one thread doing threading, locking is not necessary
Locking when no contention exists is actually cheap

So the question becomes: (:heavy_check_mark: indicates current choices)

Should only a locking version be provided?
Should a non-locking version be provided?
- via templating the queue argument?
- via an abstract interface class wrapping the queue and providing two versions, one locking, one not-locking?

Comments/opinions are welcome.

Something totally off-topic but that probably will benefit from multi-threaded rendering: I discovered ImGui and have started exploring it with Vulkan for the rendering. I think I may do an open-source demo app for image processing using Vookoo for rendering and ImGui on top for the GUI.

from vookoo.

FunMiles commented on September 26, 2024

I have a fairly complete system for synchronized queue access at https://github.com/FunMiles/Vookoo/tree/lock_guard_queues
There is one example opening two windows: 06-parallelTriangles.
I am ready to do a PR. I have two pending PRs that have not been merged into the mainline though. Should merge them into my code and have a single full PR?

from vookoo.

andy-thomason commented on September 26, 2024

That looks very good. The example is relatively short, which is very good. It is worth timing the calls with rdtsc as mutexes can be expensive if implemented badly. At best they are a spin lock on an atomic boolean - at worst an O/S call. One of the downfalls of OpenGL was the need for a mutex on every call.

…

On Thu, Apr 23, 2020 at 4:04 PM FunMiles ***@***.***> wrote: I have a fairly complete system for synchronized queue access at https://github.com/FunMiles/Vookoo/tree/lock_guard_queues There is one example opening two windows: 00-parallelTriangles. There is no cleanup for closing the windows. I need to do that before submitting a PR. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#30 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAL36XEWG44BF227WG7R7NLROBKHTANCNFSM4MKEOL5A> .

from vookoo.

FunMiles commented on September 26, 2024

I do not have experience measuring mutex costs on Windows yet, but I've dealt with them a lot on Unix based system and they are cheap. The important thing to remember is that they should cost only if there is contention. And a good implementation should spin-lock for a bit before going to the OS.
The Vulkan layers for example actually use mutexes to synchronize access to maps that record access to the Vulkan objects from various threads. They did two things to optimize:
- They have more than one map to try to minimize the likelihood of contention
- They made sure mutexes end up in individual cache lines to avoid false sharing
I am using one mutex for each queue but can also make sure they end up in individual cache lines. I will add the timing and try it on Windows/Linux/Mac over the weekend.

One thing you may notice in the last update I made is that rather than submit the static and dynamic command buffers to the queue separately, I've merged them into a single submission to reduce the number of lock/unlock of the queue mutex.

On my Mac, there are three queue families, all capable of doing graphics/compute/transfer. Each only has one queue at most. One could try to better pick queue families/queues and try to create multiple queue to assign to various cases to minimize contention likelihood. A round-robin distribution maybe?

from vookoo.

FunMiles commented on September 26, 2024

I have started doing rdtsc timings and on the Mac.
Preliminary timings. They are a measurement of the whole timing for a call to the queue submit routine. Not a timing of the cost of just the mutex itself as this is something fairly ill-defined since a mutex can only be acquired when it has been released by any other user.

In windows:
- In debug mode, the average times for 2 windows being opened is 0.5 ms drops to 0.3ms with only one window.
- In release mode, the average time is independent of the number of windows opened and is about 0.26ms
In linux:
- In debug, independently of the number of windows, 0.064ms
- In release 0.06ms
On Mac:
- Two windows open: 16ms!!!! one window: 0.28 ms

The Mac timings are very strange. I may have to raise that with the MoltenVK development team.
Some other messages about swapchain image request misuse is puzzling me when i replace the argument to submitKHR with a pointer rather than the reference, I get:

00000008 debugCallback: [ VUID-vkAcquireNextImageKHR-swapchain-01802 ] Object: 0x40000000004 (Type = 27) | vkAcquireNextImageKHR: Application has already previously acquired 3 images from swapchain. Only 2 are available to be acquired using a timeout of UINT64_MAX (given the swapchain has 3, and VkSurfaceCapabilitiesKHR::minImageCount is 2). The Vulkan spec states: If the number of currently acquired images is greater than the difference between the number of images in swapchain and the value of VkSurfaceCapabilitiesKHR::minImageCount as returned by a call to vkGetPhysicalDeviceSurfaceCapabilities2KHR with the surface used to create swapchain, timeout must not be UINT64_MAX (https://www.khronos.org/registry/vulkan/specs/1.1-extensions/html/vkspec.html#VUID-vkAcquireNextImageKHR-swapchain-01802)

from vookoo.

Make the vku::framework and vku::Window more flexible about vookoo HOT 7 OPEN

Comments (7)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent