cg-tuwien / auto-vk-toolkit Goto Github PK

Getting serious about Vulkan development with this modern C++ framework, battle-tested in rapid prototyping, research, and teaching. Includes support for real-time ray tracing (RTX), serialization, and meshlets.

License: Other

C++ 69.44% C# 28.08% PowerShell 0.35% CMake 2.13%

vulkan rendering engine cpp visual-studio cereal framework mesh-shader mesh-shaders real-time-ray-tracing rtx serialization meshlets

auto-vk-toolkit's People

Stargazers

Watchers

auto-vk-toolkit's Issues

C++20: fmt::format -> std::format

Once std::format is fully supported by MSVC (which will be when C++20 is completely supported entirely), remove the dependency to fmtlib and refactor the code to use std::format instead of fmt::format!

Currently (October 6th, 2020), std::format is not fully supported in Microsoft's C++ Standard Library. But implementation it is being labeled as "work in progress".

Check those resources for further information:

Definition of done:

All calls to fmt::format have been replaced with std::format calls
fmtlib has been completely removed from the project (see external/universal/include/ and external/universal/src/)

Add ORCA loader example application

Add an example application that provides a [Load ORCA scene] button and lets the user load an ORCA scene from file and render it.

In order to accomplish this, the DDS loader from Issue #36 must be implemented, the shader from Issue #37 shall be used for rendering, and optimally, also the editor component from Issue #38 could be opened to modify stuff.

Fix MIP-mapping for compressed images

Loading of compressed images (like e.g. DDS) has been added, but their MIP-maps are not generated correctly yet. The following error message appears:

ERR:  Debug utils callback with Id[-2034488712|VUID-vkCmdBlitImage-dstImage-02000] and Message[Validation Error: [ VUID-vkCmdBlitImage-dstImage-02000 ] Object 0: handle = 0x69ed1f0000000842, type = VK_OBJECT_TYPE_IMAGE; | MessageID = 0x86bc2a78 | In vkCmdBlitImage(), VkFormatFeatureFlags (0x0001D401) does not support required feature VK_FORMAT_FEATURE_BLIT_DST_BIT for format 141 used by VkImage 0x69ed1f0000000842[] with tiling VK_IMAGE_TILING_OPTIMAL. The Vulkan spec states: The format features of dstImage must contain VK_FORMAT_FEATURE_BLIT_DST_BIT (https://vulkan.lunarg.com/doc/view/1.2.148.0/windows/1.2-extensions/vkspec.html#VUID-vkCmdBlitImage-dstImage-02000)]

=> fix it!

Restructure code in cgb_post_build_helper

The code architectur in cgb_post_build_helper is suboptimal. While functionality-wise it seems to do okay, it should be refactored a bit.

For a clearer approach in handling asset files, it is probably a good idea to follow the approach of handling ORCA (.fscene) files, which is implemented in WpfApplicationcs, around line 629.

Post Build Helper: Rebuild shaders also if an #included file changed

Currently, only the shader files themselves are watched for changes. If they #include other files, and there is a change in that included file, changes go unnoticed.

Analyze shader files for #includes and add watches also for these files => update (redeploy) the including file whenever something in an #included file changed.

Revisit and refactor create_image_from_file implementation

The implementations of gvk::create_image_from_file are somewhat strange. Depending on the input format, either gli or stbi is used to load the image.

Review the way these methods are implemented and possibly refactor them. Maybe even come up with an interface that makes more sense (e.g. inferring the format completely automatically instead of passing the desired format to the create_image_from_file(const std::string& aPath, vk::Format aFormat, avk::memory_usage aMemoryUsage = avk::memory_usage::device, avk::image_usage aImageUsage = avk::image_usage::general_texture, avk::sync aSyncHandler = avk::sync::wait_idle()) overload).

It would make sense to work on this issue in conjunction with working on issue #59.

Definition of done:

Implementations of gvk::create_image_from_file functions have been evaluated and improved in terms of readability, functionality, and code-reuse.
The implementation is less confusing than in the current state --- or at least better documented.
gvk::create_image_from_file functions are well documented and the Contribution Guidelines have been followed.

MIP-mapping for images

Currently not implemented => implement!

Fix stack overflow in convert_for_gpu_usage

create_image_from_file creates one or several staging buffers and appends them as "custom deleter" to a command buffer (command_buffer_t::set_custom_deleter) in order to have their resources cleaned up later.

create_image_from_file is also used by convert_for_gpu_usage and IF there is a high number of images that are loaded, the "custom deleter"-stack of the single used command buffer grows accordingly. It can lead to a stack overflow.

Think about how to fix this best and fix it!

Possible approaches would be:

Modify command_buffer_t::set_custom_deleter so that it is no longer prone to stack overflows
The number of create_image_from_file invocations -- and thereby, the stack size -- is known in advance. Maybe create multiple command buffers if there's the danger of a stack overflow

Definition of done:

One of the above solutions has been implemented.
A test has been implemented to verify that the stack overflow no longer happens. That could either be a scene that loads a huge amount of textures, or it could be something artificial. In the unfixed version, a stack overflow should occur for the UE4 Sun Temple scene when the code around material_image_helpers.hpp#L213 is modified so that not a std::vector of staging buffers is set as custom deleter, but instead there is one call to command_buffer_t::set_custom_deleter for each staging buffer sb in line material_image_helpers.hpp#L196.
The functions/methods create_image_from_file, convert_for_gpu_usage, and command_buffer_t::set_custom_deleter are well documented and the Contribution Guidelines have been followed.

Use condition_variable instead of atomic_bools for main thread <-> render thread sync

Sometimes it freezes. And probably because of some CPU synchronization issues (although it could also result from GPU-sync, that appears less likely).

Rewrite the main thread <-> render thread sync using ONE std::condition_variable instead of the two std::atomic_bools:

std::atomic_bool mShouldSwapInputBuffers;
std::atomic_bool mInputBufferGoodToGo;

Also, change the methods that use them accordingly.

Post Build Helper (C#): What happens to .h files in the "shader" Visual Studio filter? + add documentation

It appears that no .spv is generated for .h files but for .glsl files it is.

To be investigated:

What happens with .h files? Are they just not converted to .spv?
Based on which logic are they not converted to .spv? Does it depend on the file ending or does SPIR-V compilation fail (silently?)?

Definition of done:

A sound logic has been implemented that handles .h files well. (That means: Does not SPIR-V compile them)
Create user-documentation (probably into visual_studio/README.md) and state that files that should not be turned into SPIR-V shall be added to a different Visual Studio filter, i.e. other than "shaders".

Add a way to store that a non-owning reference to an `owning_resource` is in use.

...useful for cg_element::submit_command_buffer_ref and cg_element::present_image.

Create an example application that uses multiple queues.

Currently, all example applications are using one single queue.
There should be an example application that makes use of multiple queues in a somewhat meaningful way.

Some ideas for example applications:

"Asynchronous Compute Example" which performs some demanding operation on an asynchronous (compute?!) queue. Such a queue can be created via xk::context().create_queue(vk::QueueFlagBits::eCompute, ak::queue_selection_preference::specialized_queue);
"Multiple Queues Example" which renders n concurrent frames, but each one of them on a different queue --- i.e. for n frames in flight, there would be n queues. Optimally, this example would allow n to be configured.

It would be nice if there was a bit of a load on the GPU or some (at least somewhat realistic) situation where a real benefit of using multiple queues would be measurable. Maybe dependencies to previous passes (e.g. multiple blur passes?) and using a low rendering resolution (so that not all SMs are under load all the time) would be a situation where multiple queues would allow a GPU to schedule more work in parallel.

Points to consider:

First of all, it has to be tested if the framework correctly produces multiple different queues.
Double-check whether window::add_queue_family_ownership and window::set_present_queue do the right things w.r.t. multiple queues. And -- probably most importantly -- check whether the ownership of swapchain images is assigned correctly in vulkan::create_swap_chain_for_window around line 831 (look out for switch (queueFamilyIndices.size())!).
Queue ownership transfer: VkSharingMode. The ownership will have to be transferred between queues. I.e. the queue which performs the computations/rendering is not the same queue which presents the image => transfer ownership!

Add EXR loader

Suggestion: https://github.com/syoyo/tinyexr

See issue #59 and only implement a separate EXR loader if there are some severe problems with the path proposed in issue #59.

Post Build Helper (C#): Add deployment of source code for Publish builds

There are three build types:

Debug_Vulkan
Release_Vulkan
Publish_Vulkan which is the same as Release_Vulkan, but always copies all the assets into the target directory (instead of making symbolic links, if configured)

The Publish configuration puts the resulting build into the target directory's executable/ subdirectory.
In addition to that, there shall be a source/ subdirectory where the source code is deployed into. Deploying the source code would mean to also deploy the Visual Studio project AND all the dependent assets and shaders (managed through Visual Studio filters) are deployed into source/ as well.
This also means that the deployed Visual Studio project file must be altered, referencing the assets and shaders from their new location (i.e. within the source/ directory).

The point is that the contents of the source/ directory shall be copy&pasteable to any other PC and should run there without any additional configuration. (Think about making a source code submission for a programming assignment.)

Definition of done:

When a project is built in Publish_Vulkan configuration, its source code is deployed to the target directory into the source/ folder
The deployed source code in the source/ folder can be opened and built without any changes.
The built program deployed to executable/ executes fine, even when deployed to another PC.

Viewport Matrix

Think about how to handle the Vieport Matrix. This is especially important for supporting Vulkan and OpenGL gracefully next to each other.

Further helpful information: http://glasnost.itcarlow.ie/~powerk/GeneralGraphicsNotes/projection/viewport_transformation.html

Implement parallel_invoker

Currently, there is only one invoker: class sequential_invoker which invokes invokee::initialize(), invokee::update(), invokee::render(), etc. sequentially. Vulkan, however, allows to build command buffers in parallel to decrease the CPU-time of a frame by utilizing multiple CPU cores.

Implement class parallel_invoker : public invoker_interface and invoke all of the invokee's callback methods from multiple threads in parallel.
Attention: Only the invokees that have the same execution order may be invoked in parallel (=> evaluate invokee::execution_order()!). invokees with different execution orders must be invoked sequentially in order to maintain correctness.

A parallel_invoker can be used by providing it to the call of gvk::start as a parameter:

gvk::start( gvk::parallel_invoker(), ... );

Furthermore, create a new example application which can potentially benefit from a parallel_invoker. That can be, for example, an application which animates or transforms a lot of objects on the GPU and/or needs to create a lot of command buffers. Maybe also look into the creation of secondary command buffers for this example application!

It would be a nice-to-have if the same invokees would always be invoked from the same thread::id (i.e. across subsequent frames). In the general case, this is probably infeasible => care must be taken about the way how class descriptor_cache assigns descriptor_pools to specific thread::ids. It is unclear what happens if a descriptor_pool was created from thread 1, but in a subsequent frame, is used from thread 2, for example. Think about possible problematic cases!

Definition of done:

gvk::parallel_invoker has been implemented and can be used as a replacement to gvk::sequential_invoker wherever the latter has been in use.
All the example applications still work when a gvk::parallel_invoker is used with them instead of a gvk::sequential_invoker.
An additional example application has been implemented which benefits from a gvk::parallel_invoker performance-wise.
The new class gvk::parallel_invoker and all its methods are well documented and the Contribution Guidelines have been followed.

Support further shader compilers (DXC most notably)

Definitely support dxc.exe!

Currently, glslc is used (in the post_build_helper via glslangValidator.exe). Would it make any sense to support shaderc as well?

Post Build Helper (C#): Properly handle #included files in Shader

Properly react to errors in files which are #included in shader files.

If there are errors in included files, make sure that the buttons

-> File
-> VS
-> Dir

do the right thing!

Implement OpenGL in disguise

Attention: this is probably a very laborious task

The idea would be to keep the interface, but swap out the rendering API with OpenGL. Keeping the interface would mean that the Vulkan-SDK remains as a requirement since some of the interfaces/methods take Vulkan-types as parameters. But those Vulkan-types would internally be translated to OpenGL counterparts.

The idea of this Issue is not actually to cut the dependency from Vulkan completely but instead, to use OpenGL as rendering API instead of Vulkan so that one can verify that the Vulkan implementation performance is okay. This shall help to spot performance issues resulting from suboptimal Vulkan implementation. Examples include:

Bad usage of synchronization (barriers, semaphores, fences)
Bad usage of image layout transitions (too many of them; or using a bad-performing layout)
...

The master branch shall not have any dependencies on OpenGL, but it shall require Vulkan only.
Therefore, it would probably be best to implement this on a separate branch that adds OpenGL support and keep it there (constantly rebasing on master).

Definition of done:

All the examples can be run on OpenGL (instead of Vulkan) without any modifications
All API-calls (originally to Vulkan) are translated to OpenGL calls under the hood.
A performance-comparison (example application?!) has been implemented that compares the performance of running on Vulkan and running on OpenGL.

Add DDS loader

stb_image does not support the DDS file format => must be loaded manually.

If code is added, make sure that it complies license-wise
Implement in gvk::create_image_from_file
Can the ORCA scenes "Sun Temple" and "Bistro" be loaded after this addition?

Add a nice standard-shader that can be used to render different types of materials

It would be nice to have a standard shader that can be used to render ORCA scenes out of the box nicely. The materials are loaded anyways (see struct material and struct material_config, and also struct material_gpu_data) and that standard shader would optimally regard many of these material settings.

Optimally, this would be a physically-based shader.

The shader should probably be stored under /shaders (i.e. create a new directory). To use it, it must be added to Visual Studio's filters for a certain project --- just as any other shader.

Add this shader to the orca_loader example and render a loaded ORCA scene with that shader! In order to get a complete illumination model that does not lose energy, probably a skybox will have to be added (which will contribute the global illumination part of the shading model).

Definition of done:

A physically-based shader has been implemented and added to the repository
A nice skybox has been added to the repository
The orca loader example application has been modified to render a skybox and use the new shader
Arbitrary models/ORCA scenes loaded through the orca loader example application's UI render nicely out of the box.
Newly added features are well documented and the Contribution Guidelines have been followed.

Integrate WaitDstStageMask into Semaphores

It would probably make sense, to integrate the WaitDstStageMask into the cgb::semaphore_t class. Depending on where the semaphore is created, one might be able to set the WaitDstStageMask to a meaningful value. The default value for it would be vk::PipelineStageFlagBits::eAllCommands, i.e. the unoptimized default.

When submitting a command buffer, a WaitDstStageMask has to be specified for each semaphore to wait on via the pWaitDstStageMask field of VkSubmitInfo. The right place for WaitDstStageMask seems to be inside of cgb::semaphore_t.

Colors as hex-code

Let colors be specified as hex-codes (e.g. 0x030303FF), not only as RGB-triples.

Swap chain recreation

Although it is somewhat prepared in the code already, it's not fully implemented. => Implement!

Something to read: https://vulkan-tutorial.com/Drawing_a_triangle/Swap_chain_recreation

Implementation:
Probably in window::sync_before_render() (which is invoked after all invokee::update()s and before all invokee::render()s). Maybe also some handling in window::render_frame() is required (which is invoked after all invokee::render()s). The exception-handler is already prepared in both cases: catch (vk::OutOfDateKHRError omg), but not implemented yet.

Updating the window's framebuffers should be straight forward.

Care must be taken when updating dependent objects like graphics_pipeline!
(Attention: A graphics_pipeline shall not always receive updates from swapchain recration, but only if it opts in... or maybe if it references a window's backbuffers?! Anyways, there must be a manual opt-in variant.)

Further care must be taken because an updated graphics_pipeline might lead to invalidated (pre-recorded and re-used) command_buffers!

In `cgb::sync`, get rid of the `read_memory_access` restrictions for the destination parameters.

The only thing that makes no sense according to 1 is:

_READ flags are passed into srcAccessMask

However, both _READ and _WRITE accesses can be subject to incoherent caches (e.g. COLOR_OUTPUT distributed across all streaming multiprocessors ... hmm, that will be handled by the ROP somehow, probably... but anyways, it makes sense to state the memory dependency!)

Good example by 1:

Let’s say that we’re allocating a fresh image, and we’re going to use it in a compute shader as a storage image. The pipeline barrier looks like:

srcStageMask = TOP_OF_PIPE – Wait for nothing
dstStageMask = COMPUTE – Unblock compute after the layout transition is done
srcAccessMask = 0 – This is key, there are no pending writes to flush out. This is the only way to use TOP_OF_PIPE in a memory barrier. It’s important to note that freshly allocated memory in Vulkan is always considered available and visible to all stages and access types. You cannot have stale caches when the memory was never accessed … What about recycled/aliased memory you ask? Excellent question, we’ll cover that too later.
oldLayout = UNDEFINED – Input is garbage
newLayout = GENERAL – Storage image compatible layout
dstAccessMask = SHADER_READ | SHADER_WRITE

Note the dstAccessMask = SHADER_READ | SHADER_WRITE after the layout transition!

Another example from 1:

vkCmdPipelineBarrier(image = image1, oldLayout = UNDEFINED, newLayout = COLOR_ATTACHMENT_OPTIMAL, srcStageMask = COLOR_ATTACHMENT_OUTPUT, srcAccessMask = COLOR_ATTACHMENT_WRITE, dstStageMask = COLOR_ATTACHMENT_OUTPUT, dstAccessMask = COLOR_ATTACHMENT_WRITE|READ)

This one is referring to aliased memory, but I think, for this issue, it can serve as a general example: WRITE accesses must be made available before caches are made coherent ("visible"!) to subsequent READs or WRITES.

Add an editor component that allows to change all the parameters of all the different objects loaded from an ORCA file!

Use ImGui to create a ready-to-use editor component that allows changing the parameters of a loaded ORCA scene file. That is:

Changing the position of 3D model instances
Changing the colors and intensities of light sources
Changing camera paths
...?

Add this editor to the orca_loader example.

C++20 Feature: fmt

Use C++20 format instead of fmtlib

Create an example application that does not require a window

Vulkan does not need windows to access the GPU. Create an example application that computes something on the GPU (using compute shaders?!) and outputs the result to the console.

The way to do this would simply be not to xk::context().create_window and not to pass any window to xk::execute.

However, there is a known issue: Initialization code of the context enters an endless loop of event handler invocations (generic_glfw::work_off_event_handlers). The event handlers do not reach a certain state if there are no windows --- or more precisely, if no surface is being created. That must be fixed first.

Why don't we have more than 1000 FPS for the "Hello World" Example?

Sometimes, even numbers of 6000 FPS can be seen for simple Vulkan examples on a not too high-end GPU (RTX 2070), like here for example: Quick Introduction to Mesh Shaders (OpenGL and Vulkan). With a UI (probably ImGui) there are still more than 4000 FPS. I think, Gears-Vk has never rendered more than ~800 FPS.

It must be somehow related to GLFW, input handling or the double-threaded architecture of Gears-Vk.

Update: There is now an option for running Gears-Vk in single-threaded mode. It can be enabled by setting the macro SINGLE_THREADED to 1 in composition.hpp: #define SINGLE_THREADED 1.
Attention: This has not been merged into master yet (October 6th, 2020).

Edit: I think that these low frame rates only occur in debug mode. In release mode, there should be several thousand FPS. However => investigate!

Try the following:

Make a single-threaded version of the framework (probably just using a compiler switch) where glfwPollEvents is used
If the first point does not help, try to disable input processing (big loop over all keys)
If that doesn't help, can it be related to Vulkan somehow? Do we have any strange barriers? Disable some parts of the code just to investigate how
If that doesn't help... maybe the problem is GLFW itself? Swapping out GLFW, however, would be a tedious endeavour most likely.

Definition of done:

Update GLFW with the latest version
Print the FPS to the window title and don't rely on ImGui
Use a profiler (one is built into Visual Studio) to check if there are any suspicious delays on the CPU side
Use a GPU profiler (such as RenderDoc or NVIDIA Nsight) to check if there are any suspicious delays on the GPU side.
Test both versions: #define SINGLE_THREADED 0 and #define SINGLE_THREADED 1 and compare the results!
Report if you were able to find any suspicious measurements and plan further actions.

Possible bug: subpass dependency that would allow writing indirect command arguments

During debugging, RenderDoc issues a warning:

Creating renderpass "Render Pass 129" contains a subpass dependency that would allow writing indirect command arguments. Indirect command contents are read at the end of the render pass, so write-after-read overwrites will cause incorrect display of indirect arguments.

To be investigated:

Is that is a real problem or only a side-effect of debugging with RenderDoc?
Does it also occur when debugging with NVIDIA Nsight? (Nsight Graphics is recommended)
Does it happen with the model_loader and/or(?) orca_loader examples?
How can it be fixed?

Definition of done:

The warning message no longer occurs OR the problem can be identified as a side effect of debugging with RenderDoc.

Combine buffers

A buffer should be usable, e.g., as both index_buffer and uniform_texel_buffer. => Just differentiate via metadata, but do not make them different types.

Maybe it would make for a nice interface to act as if a buffer would support multiple sets of metadata, but internally, it would all be combined into one metadata representation and the Vk-flags etc. would be handled accordingly.

Fix access violation which occures irregularly with the compute_image_processing example

SOMETIMES, but not always, an access violation occurs in the compute_image_processing example inside the nvogl64.dll. This happens for (currently) unknown reasons.

It can be provoked by hammering on one of the [1], [2], or [3] keys during startup. It appears to occur less frequently (or never?) after the first few frames.

Add support for Timeline Semaphores

Timeline Semaphores have been added with Vulkan 1.2. They offer several advantages over "the old way" which means: binary semaphores + fences (and the status-quo of how swapchain handling is implemented in Auto-Vk-Toolkit right now).

The following resources should be helpful for getting into timeline semaphores:

This task requires changes in both, Auto-Vk and Auto-Vk-Toolkit:

Add support for Timeline Semaphores to Auto-Vk, probably into avk::semaphore_t
Implement an alternative version of swapchain handling in Auto-Vk-Toolkit which uses Timeline Semaphores! The user shall be given the option to choose between using Timeline Semaphores or "the old way" (which is binary semaphores + fences).
Add an example that uses Timeline Semaphores to Auto-Vk-Toolkit!

This task would go well with issue #47.

Definition of done Auto-Vk:

Timeline Semaphores have been added to Auto-Vk and are abstracted nicely.
All relevant Timeline Semaphores-related functionality can be accessed through Auto-Vk.
Timeline Semaphores functionality has either been added to the existing class avk::semaphore_t or a new class avk::timeline_semaphore_t has been introduced.
- If a new class has been introduced, it supports all the "extra features" that avk::semaphore_t also supports, namely set_custom_deleter and using timeline_semaphore = avk::owning_resource<timeline_semaphore_t>;
The newly added functionality is well documented and the Contribution Guidelines have been followed.

Definition of done Auto-Vk-Toolkit:

The usage of Timeline Semaphores can be opted-in for swapchain handling in Auto-Vk-Toolkit.
A way to switch between 1) Timeline Semaphores and 2) "the old way" has been implemented. It might be the best option to just have a compiler switch (i.e. a #define) to switch between the two since there's no point in switching at runtime.
Depending on the switch, the code in window::update_concurrent_frame_synchronization has been altered to create the right resources (timeline semaphores for each concurrent frame OR binary semaphores and fences for each concurrent frame).
Depending on the switch, the code in window::sync_before_render has been altered.
Depending on the switch, the code in window::render_frame has been altered.
All the existing examples work with both: timeline semaphores and "the old way".
Somewhat optional but reaaaaally nice to have: A new example has been added that shows the strengths of timeline semaphores:
- Timeline semaphores are most useful in situations where we do not know the number of synchronization events beforehand, which would be the case for a situation situation with a fixed simulation frame rate (e.g., 60Hz physical simulation) and a variable render frame rate (FPS to the max). Such a scenario is described in the video linked above, approximately here.
The new swapchain handling functionality and the way how to switch are well documented and the Contribution Guidelines have been followed.

sRGB image and STORAGE_IMAGE_BIT

What's up with that? It doesn't seem to be supported.

=> enable sRGB textures in the compute_image_processing example, take a look at the error message and try to fix it!

Make `geometry_instance` independent of `bottom_level_acceleration_structure`

It should be more obvious from the software design, that different top_level_acceleration_structures can reference the same bottom_level_acceleration_structures with individual instance data.

Swap stb_image and GLI for a "big" image loader

Currently, two image loaders are used: stb_image and GLI. In the function gvk::create_image_from_file, first it is tried to load an image via GLI. If that fails, it is tried to load an image via stb_image. GLI can load image formats DDS and KTX (whatever that is).

In the case of loading via GLI, further actions are required: The correct Vulkan-Format must be chosen (see material_image_helpers.hpp#L239 and MIP-maps can not be generated automatically, but must be loaded from file as well (see material_image_helpers.hpp#L171ff).

When loaded via stb_image, always "ordinary" RGBA-formats are created and MIP-maps are created via image_t::generate_mip_maps.

With GLI, there is a problem: Depending on the image format, not all images can be flipped due to some limitations of GLI. That's why there is an if at material_image_helpers.hpp#L231. This turns into a real problem, when loading the UE4 Sun Temple scene or the NVIDIA Emerald Square City Scene from the ORCA scene repository because they contain such textures that can not be flipped.

Another problem is that some not too uncommon image formats are not supported as well, like EXR and TIFF.

This might be a good point in time to swap out stb_image and GLI for one of the "big" image loaders:

DevIL, or
FreeImage

To be checked before integrating one of these:

Is the project still "alive"?
Does the license comply? Gears-Vk is licensed under MIT, i.e. the license of the image loader library must comply with MIT. I believe that there were some discrepancies between GPL and MIT, s.t. MIT software may not include GPL code? But I could be wrong about that => to be checked!
Can DDS textures of different formats be loaded (and also uploaded onto the GPU... i.e. not that DDS formats are loaded in an RGB8 format into main memory)?
Can HDR images be loaded?
Can EXR images be loaded?
Can TIFF images be loaded?

This issue should optimally be implemented in conjunction with issue #50.

Definition of done:

Switch from cmdow.exe to a PowerShell script and compile cgb_post_build_helper on first usage

Switching to PowerShell would have some great advantages:

On first usage ever, one could build the cgb_post_build_helper on first usage and get rid of several of the problems (Known Issues and Troubleshooting w.r.t. CGB Post Build Helper) and of the security warning!
Whenever the target directory is completely empty, the script could launch the build step in a synchronous way. Visual Studio is stuck in that case, but resource loading won't fail.
At subsequent builds (i.e. when the target directory is not empty), cgb_post_build_helper can be invoked in an asynchronous way to save time (or maybe that should never happen?)

Shader Hot Reloading

The post_build_helper already monitors and copies files into the output directory. What's still missing is a file watcher in C++ which detects changed files and updates pipelines.

Support a "small vector" type

At many places within Auto-Vk and Gears-Vk, small std::vectors are passed around --- meaning that only few elements are contained. A small vector type would probably be very beneficial to be used instead of std::vector especially when passing something to arguments or returning results from functions/methods.

But not only for argument/result passing could such a type be useful, but also for holding data like, e.g., the swapchain images, etc. Especially for all the frames in flight, at many places there will be exactly #frames-in-flight (e.g., three) resources to be held. Would be a waste to have a memory redirection for each of these given that there will be only very few elements in each case.

There are a few implementations, but I think that they generally have too many dependencies:

LLVM's llvm::SmallVector< T, N >
Boost's boost::container::small_vector (additional description)
Facebook's folly::small_vector<T,Int=1,...>

Note: Looks like llvm::SmallVector does not satisfy the noexcept requirement (see below). And looks like folly:small_vector requires Boost, which is hell of a dependency. ^^

Possible route to go:

It might even be the best idea to implement it by ourselves following a similar strategy as described here: https://easyperf.net/blog/2016/11/25/Small_size_optimization (with corresponding GitHub repository here: https://github.com/dendibakh/prep/blob/master/SmallVector.cpp), meaning that we have both, a std::array and a std::vector as members and if the std::array gets too small, std::moves are performed from the std::array into the std::vector.

I think, the best option would be to have a std::variant<std::array<T, N>, std::vector<T>> and when that "fateful" move from array->vector has to happen, it would probably be best to: 1st move the std::array into a std::vector<T> tmp, and 2nd move that "back" into the std::variant.

We define this type to serve a specific purpose, namely: To serve the purpose of passing parameters or results. That means, it shall offer convenient addition of elements (push_back) and once constructed, convenient usage of those elements, either by turning it into a std::span or by getting begin/end iterators. But the type does not have to pay attention to stuff like graceful handling of iterator invalidation, etc. and does not even have to support removal of elements. It shall just be a useful (rather temporarily used) container for passing around information and compared to a std::vector, it shall have the advantage that it does not perform a heap allocation for small numbers of elements.

Some required properties of the small vector type:

The small vector type must have noexcept specifiers for the move constructor (T(T&&)) and the move assignment operator (T& operator=(T&&)). If it does not have it, the frameworks will break.
The small vector type should be easily convertible to/from std::vector. I.e. it would probably be great if it were implicitly assignable to a std::vector or could implicitly be constructed from a std::vector.
The small vector type should be convertible to a std::span

Definition of done:

A small vector type has been added to Gears-Vk
The small vector type satisfies the requirements stated above
The license of the small vector complies with MIT
The required dependencies have been added to the repository and have been configured within the Visual Studio project (optimally, there are no additional dependencies)
The small vector type is used wherever it makes sense (which also means that it is not used when a std::span would be more appropriate).
The code is well documented and the Contribution Guidelines have been followed.

Is the maximum number of bones-parameter properly handled?

When a bone animation is created via one of the gvk::prepare_animation_* methods, and the aMaxBoneMatrices parameter is set to a value which is less than the actual number of bone matrices within a mesh, is that properly handled?

To be investigated:

Do the gvk::prepare_animation_* methods not write beyond memory boundaries on the C++ side?
Does the algorithm stop in time?
Are the resulting selected bone matrices deterministic? (I.e. after several runs, are always the same aMaxBoneMatrices matrices selected?)

Definition of done:

gvk::prepare_animation_* methods not write beyond memory boundaries on the C++ side.
The algorithm stops asap.
The algorithm always selects the same bone matrices.
gvk::prepare_animation_* methods are well documented and the Contribution Guidelines have been followed.

Synchronization issues depending on the presentation mode and the number of concurrent frames

Depending on the synchronization mode and number of concurrent frame settings, some visual artefacts can occur. The reason for this has to be investigated.

focus_rt may serve as a good test project since the visual artefacts can be observed well in that project.

For the following configuration:

mainWnd->set_number_of_concurrent_frames(3);
mainWnd->set_presentaton_mode(cgb::presentation_mode::triple_buffering);

very obvious visual artefacts at moving objects can be observed.

TODOs:

Investigate the immediate presentation mode
Investigate the double_buffering presentation mode
Investigate the vsync presentation mode
Investigate the triple_buffering presentation mode
Why are there no visual artefacts with 1 concurrent frame?
Why are there tendentially more visual artefacts with higher numbers of concurrent frames?
Does the previous point only apply to double_buffering and triple_buffering or to all presentation modes?
Why can the visual artefacts be solved by waiting on the fence of one frame earlier as the previous frame with the same in flight index? Or can it?

SFINAE/C++20 concept for convert_for_gpu_usage functions

Some of the convert_for_gpu_usage functions are expecting a const std::vector<....>& as input argument. It is not cool to dictate the specific container to the user => support arbitrary containers of the desired type as input.

Use the SFINAE class has_size_and_iterators to ensure that it is a collection and in addition another SFINAE to check for the desired type.

Update: Or even better, switch from SFINAE to C++20 Concepts!

Definition of done:

Investigate how to best refactor these functions/methods. Either use templates (or variadic templates) or evaluate if std::span can be used, which is a C++20 feature.
Refactor extern std::tuple<std::vector<material_gpu_data>, std::vector<avk::image_sampler>> convert_for_gpu_usage(const std::vector<gvk::material_config>& aMaterialConfigs, ...
Refactor typename std::enable_if<avk::has_size_and_iterators<In>::value, Out>::type convert_for_gpu_usage(const In& aLightsourceData, glm::mat4 aTransformationMatrix = glm::mat4{1.0f})
Refactor typename std::enable_if<avk::has_resize<Out>::value, Out>::type convert_for_gpu_usage(const In& aLightsourceData, const size_t aNumElements, glm::mat4 aTransformationMatrix = glm::mat4{1.0f})
Refactor Out convert_for_gpu_usage(const In& aLightsourceData, const size_t aNumElements, glm::mat4 aTransformationMatrix = glm::mat4{1.0f})
Refactor void convert_for_gpu_usage(const In& aLightsourceData, const size_t aNumElements, glm::mat4 aTransformationMatrix, Out& aDestination)
Search in the source code if there are further functions/methods where this can be applied.
All of the refactored functions/methods are well documented and the Contribution Guidelines have been followed.

W.r.t. C++20 feature support, check the following resources:

Furthermore, ensure that you have installed the latest Visual Studio 2019 non-preview version (which is 16.7.5 on October 6th, 2020), so that you can use as many C++20 features as currently supported by MSVC.

Further resources about relevant C++ topics:

GoingNative 2012 Variadic Templates are Funadic
C++20: An (Almost) Complete Overview - Marc Gregoire - CppCon 2020 about std::span at time 45:29

There seems to be some lag introduced into GLFW which somehow depends on mouse input

On a PC with AMD x370, Ryzen 3900X, and RTX 2080, lags can be observed constantly when moving the mouse. It appears that there are no lags when there is just keyboard input.

On a Microsoft Surface (Core i5, Intel HD 620), lags can be observed when a bluetooth mouse is being connected during runtime. After it has connected, the lags seem to be gone.

It could be worthwhile trying to not use GLFW callbacks but query mouse-related data once per frame. I.e. not use the callbacks to:

glfw_cursor_pos_callback
glfw_mouse_button_callback
glfw_scroll_callback

create_new_project.exe does not set the right relative path

When copying a project from cg_base/visual_studio/examples/orca_loader/orca_loader.vcxproj to ., the asset paths were not correctly adapted. sponza.fscene did not point to inside the cg_base folder.

Figure out how to generically handle swap chain image's layout transitions

Either handle via
window_vulkan::mImageLayoutTransitionBeginningOfFrame and window_vulkan::mImageLayoutTransitionPresent

via a renderpass? I don't know. Maybe try to use only the former?!

create_new_project.exe creates invalid .vcxproj files

When copying a project with create_new_project.exe the resulting .vcxproj files are corrupted when precompiled headers are used (maybe also in other cases).

The resulting XML in the copy-target's .vcxproj looks like follows:

  <ItemGroup>

      <PrecompiledHeader Condition="'$(Configuration)|$(Platform)'=='Release_OpenGL|x64'">Create</PrecompiledHeader>
      <PrecompiledHeader Condition="'$(Configuration)|$(Platform)'=='Publish_Vulkan|x64'">Create</PrecompiledHeader>
      <PrecompiledHeader Condition="'$(Configuration)|$(Platform)'=='Publish_OpenGL|x64'">Create</PrecompiledHeader>
      <PrecompiledHeader Condition="'$(Configuration)|$(Platform)'=='Debug_Vulkan|x64'">Create</PrecompiledHeader>
      <PrecompiledHeader Condition="'$(Configuration)|$(Platform)'=='Release_Vulkan|x64'">Create</PrecompiledHeader>
      <PrecompiledHeader Condition="'$(Configuration)|$(Platform)'=='Debug_OpenGL|x64'">Create</PrecompiledHeader>
    </ClCompile>
  </ItemGroup>

I.e. the problem is that the opening <ClCompile> is missing.

Follow Vulkan best practices according to the "best practices" validation layer

For each example application, enable the best practices validation layer feature and try to get rid of all the warnings (errors?) it produces!

The validation layer feature can be enabled like follows:

gvk::start(
	...
	[](gvk::validation_layers& aValLayerConfig){
		aValLayerConfig.enable_feature(vk::ValidationFeatureEnableEXT::eBestPractices);
	},
	...
);

Modify whatever is necessary to get rid of all the warnings or errors emitted from this "best practices" validation layer feature!

Definition of done:

Integrate Vulkan Memory Allocator

Vulkan Memory Allocator (VMA) shall be added and supported throughout all the classes which allocate memory.

First step is to investigate if it shall be added to Gears-Vk or to Auto-Vk. Depending on the path chosen, Auto-Vk has to be adapted accordingly:

If VMA is added to Auto-Vk, the interface to code using Auto-Vk can stay probably roughly the same. However, it might be desirable to allow the user to alter VMA-config in some way---either through the existing (extended?) interface or through additional functionality.
If VMA is not added to Auto-Vk but to Gears-Vk, the interfaces of Auto-Vk need to be changed so that they support custom memory allocators (which must comply to some interface) wherever memory is to be allocated.

License-wise, everything should be fine, since VMA is licensed under MIT.

Useful video: DD2018: Adam Sawicki - Porting your engine to Vulkan or DX12

Something isn't right with the render<->swapchain-image synchronization

Implement the following animations as an additional example:

An object moves on a straight or circular path, camera is fixed, but sees the object all the time
The camera moves on a straight or circular path, object is fixed, but always seen by camera
Object and camera move on the same path, camera takes a fixed look on the object

Watch out for stutters like, for example, the object "pops" forth and back, or a streak is visible, coming from not frame-perfect synchronization. What could cause this?
Most likely, it has something to do with how synchronization is handled in window::sync_before_render() and window::render_frame().

Try all the different presentation modes:

presentation_mode::immediate
presentation_mode::relaxed_fifo
presentation_mode::fifo
presentation_mode::mailbox

While presentation_mode::immediate can cause tearing, actually this popping forth and back should NOT happen with it. Frames are rendered and presented in strict order (at least that is what SHOULD happen), i.e. I don't think that it can happen that an image that is handed-over to the presentation engine later, can "overtake" an earlier submitted image.

With the other presentation modes, neither tearing nor popping forth and back may happen. If it does, it indicates a problem.

What could mayyybe happen: The animation could stutter a bit with presentation_mode::mailbox. But with presentation_mode::fifo, everything must be perfectly smooth at least. => To test this, implementing the animation based on the delta time would be a good idea... hmm, not sure if these artifacts can be provoked easily.

Read more about presentation modes here: API without Secrets: Introduction to Vulkan* Part 2: Swap Chain -> navigate to section "Selecting Presentation Mode".

This task would go well with issue #45.

Definition of done:

An additional example application is implemented that has all of the features described above
No stuttering can be seen in this example application that comes from the (semaphore/fence-based) swapchain synchronization.
No stuttering can be seen in any of the other example applications that comes from the swapchain synchronization.
Both, debug mode and release mode have been tested.
The new example application is well documented and the Contribution Guidelines have been followed.

Sustainable solution for displaying images with ImGui that is compatible with ImGui's master

In ImGui, images to be rendered are abstracted by the means of ImTextureID which is to be implemented in a API-specific and/or application-specific manner. Currently, Gears-Vk uses the approach from the pull request here: ocornut/imgui#914

However, that is not a great solution because it does not look like this pull request is going to be merged into master. Also, it is probably not a great or sustainable solution because either descriptor pools are deleted/recreated way too often (every frame) or the pool might go out memory some time in the future because it is never freed. (not sure about that, though)

Also, martty/imgui is several hundred commits behind ImGui's master.

First step: Investigate, how exactly descriptors and descriptor pools are handled by the above mentioned pull request's code.

Then: Implement a sustainable solution that integrates well with Gears-Vk and -- most importantly -- compiles against and works with ImGui's master branch, so that it only requires "pure ImGui" and does not depend on a 3rd party's pull request.

Definition of done:

We can pull in the latest changes from ImGui's master branch and it still works.
The functionality has been nicely integrated in a future-proof way so that we do not expect future ImGui changes to break our code.
Images with alpha channel are rendered transparently
Images without alpha channel are rendered opaquely
A nice to have would be to render images with an alpha channel in front of an opaque background
The code added/modified and also the UI code added to at least one of the example applications has been documented and the Contribution Guidelines have been followed.

cg-tuwien / auto-vk-toolkit Goto Github PK

auto-vk-toolkit's People

Stargazers

Watchers

Forkers

auto-vk-toolkit's Issues

Recommend Projects

Recommend Topics

Recommend Org