Discussions from daveh355's clustered-shading repository

Deferred lighting cluster question [LWJGL]

Hello, thank you very much for the tutorial! I am currently looking for a way to implement cheap multiple lights for my LWJGL-based engine and you described everything very well!

However, I ran into one issue: lights are assigned only to some clusters and have very sharp borders. After some testing, I found that they fail the AABB intersection check. If I increase the radius from 5.0 to a value near 50.0, the lights cover the scene properly, but there are still a lot of visual glitches. (In the screenshot, the objects are located at z ~ -30, the camera is at the world origin facing negative z, and positive x is to the left.)

For the scene, I use 3 point lights with radius 5.0:
pointLights.add(new PointLight(new Vector3f(-10, 5, -60), new Vector3f(1, 0, 0), 0.3f, 5.0f));
pointLights.add(new PointLight(new Vector3f(20, 5, -60), new Vector3f(0, 1, 0), 0.3f, 5.0f));
pointLights.add(new PointLight(new Vector3f(-30, 5, -60), new Vector3f(0, 0, 1), 0.3f, 5.0f));
(The first value is position, the second is color, the third is intensity, the fourth is radius.)

This is my clustered lighting class, which contains shaders, deferred scene texture and methods to update lights and draw deferred scene:
` public class TestCluster {

private DeferredClusterShader shader;
private DeferredLightAABBCullingShader lightCullingShader;
private DeferredTestShader lightingShader;

private Texture deferredSceneTexture;

private int ssbo;
private int lightSSBO;

private int gridSizeX = 12;
private int gridSizeY = 12;
private int gridSizeZ = 24;
private int numClusters = gridSizeX * gridSizeY * gridSizeZ;

private final int clusterAlignedSize = 448; // (436 + 16 - 1) & ~(16 - 1) = 448 bytes
private final int lightAlignedSize = 48; // std430: vec4 (16) + vec4 (16) + float (4) + float (4) = 40 bytes, rounded up to a multiple of 16 = 48 bytes

private int width;
private int height;

private Mesh mesh;
private RenderParameter config;

private boolean isFirstPass = true;

public TestCluster(int width,int height) {
	this.width = width;
	this.height = height;
	this.shader = DeferredClusterShader.getInstance();
	this.lightCullingShader = DeferredLightAABBCullingShader.getInstance();
	this.lightingShader = DeferredTestShader.getInstance();
	
	deferredSceneTexture = new Texture2D(width, height, 
			ImageFormat.RGBA16FLOAT, SamplerFilter.Nearest, TextureWrapMode.ClampToEdge);
	
	createSSBO();
	
}

private void createSSBO() {
	this.ssbo = glGenBuffers();
	glBindBuffer(GL_SHADER_STORAGE_BUFFER, ssbo);
	

	glBufferData(GL_SHADER_STORAGE_BUFFER, clusterAlignedSize * numClusters, GL_STATIC_COPY);
	
	glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 1, ssbo);
}

public void cullLightsCompute() {
	shader.bind();
	shader.updateUniforms(width, height, gridSizeX, gridSizeY, gridSizeZ);
	glDispatchCompute(gridSizeX, gridSizeY, gridSizeZ);
	glMemoryBarrier(GL_SHADER_STORAGE_BARRIER_BIT);
}

public void updateLightSSBO(List<PointLight> pointLights) {
	ByteBuffer lightData = BufferUtils.createByteBuffer(lightAlignedSize * pointLights.size());
	FloatBuffer fv = lightData.asFloatBuffer();
	
	for(PointLight light: pointLights) {
		Vector3f p = light.getPosition();
		Vector3f c = light.getColor();
		float i = light.getIntensity();
		float r = light.getRadius();
		
		fv.put(p.x).put(p.y).put(p.z).put(1.0f)
		.put(c.x).put(c.y).put(c.z).put(1.0f)
		.put(i)
		.put(r)
		.put(0.0f).put(0.0f); // 4 + 4 bytes padding
	}
	
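	fv.flip(); // only flips the float view; lightData keeps position 0 and limit = capacity, so the whole buffer is uploaded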
	
	if(isFirstPass) {
		this.lightSSBO = glGenBuffers();
		isFirstPass = false;
	}
	
	glBindBuffer(GL_SHADER_STORAGE_BUFFER, lightSSBO);
	glBufferData(GL_SHADER_STORAGE_BUFFER, lightData, GL_DYNAMIC_DRAW); // LWJGL takes the size from the buffer's remaining bytes, so no explicit size is needed
	glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 2, lightSSBO);
	
}

public void lightAABBIntersection() {
	lightCullingShader.bind();
	lightCullingShader.updateUniforms(GLContext.getMainCamera().getViewMatrix());
	glDispatchCompute(27,1,1); // 12 * 12 * 24 = 3456 clusters, one thread each; 27 work groups cover them with a local size of 128 (27 * 128 = 3456)
	glMemoryBarrier(GL_SHADER_STORAGE_BARRIER_BIT);
}

public void render(Texture albedo, Texture position, Texture normal, Texture specularEmissionDiffuseSSAOBloom, Texture depth) {
	lightingShader.bind();
	glBindImageTexture(0, deferredSceneTexture.getId(), 0, false, 0, GL_WRITE_ONLY, GL_RGBA16F);
	glBindImageTexture(2, albedo.getId(), 0, false, 0, GL_READ_ONLY, GL_RGBA16F);
	glBindImageTexture(3, position.getId(), 0, false, 0, GL_READ_ONLY, GL_RGBA32F);
	glBindImageTexture(4, normal.getId(), 0, false, 0, GL_READ_ONLY, GL_RGBA16F);
	glBindImageTexture(5, specularEmissionDiffuseSSAOBloom.getId(), 0, false, 0, GL_READ_ONLY, GL_RGBA16F);
	
	lightingShader.updateUniforms(width, height, gridSizeX, gridSizeY, gridSizeZ);
	
	glDispatchCompute(width/2, height/2,1);
}

public Texture getDeferredSceneTexture() {
	return deferredSceneTexture;
}

}`

The shader classes do nothing special: they just compile the program and update the uniforms. Since Java has no structs like the C++ in your example, I use a byte buffer to load the light data into the SSBO. I take the data from each light object and write it to the buffer, followed by 8 bytes of padding. I have tested this and the shader receives the correct values.
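
To make the layout explicit, here is a minimal sketch of the packing in plain `java.nio`, without LWJGL. The class name `LightPacking`, the helper `alignUp` and the flat `float[]` light representation are made up for this illustration; the byte offsets assume the std430 layout of the `PointLight` struct shown in the lighting shader (two `vec4`s followed by two `float`s).

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class LightPacking {
    // std430 stride of PointLight: vec4 + vec4 + float + float = 40 bytes,
    // rounded up to the struct's 16-byte alignment (48 bytes).
    static final int LIGHT_STRIDE = alignUp(16 + 16 + 4 + 4, 16);

    // Rounds size up to the next multiple of alignment (alignment must be a power of two).
    static int alignUp(int size, int alignment) {
        return (size + alignment - 1) & ~(alignment - 1);
    }

    // Each light is {x, y, z, r, g, b, intensity, radius} (hypothetical flat form).
    static ByteBuffer pack(float[][] lights) {
        ByteBuffer buf = ByteBuffer.allocateDirect(LIGHT_STRIDE * lights.length)
                .order(ByteOrder.nativeOrder());
        for (int n = 0; n < lights.length; n++) {
            float[] l = lights[n];
            int base = n * LIGHT_STRIDE;
            // vec4 position at offset 0
            buf.putFloat(base, l[0]).putFloat(base + 4, l[1])
               .putFloat(base + 8, l[2]).putFloat(base + 12, 1.0f);
            // vec4 color at offset 16
            buf.putFloat(base + 16, l[3]).putFloat(base + 20, l[4])
               .putFloat(base + 24, l[5]).putFloat(base + 28, 1.0f);
            // float intensity at offset 32, float radius at offset 36
            buf.putFloat(base + 32, l[6]).putFloat(base + 36, l[7]);
            // offsets 40..47 stay zero: the 8 padding bytes
        }
        // Absolute puts leave position at 0 and limit at capacity,
        // so the buffer can be handed to an upload call as-is.
        return buf;
    }

    public static void main(String[] args) {
        ByteBuffer buf = pack(new float[][] {
            {-10, 5, -60, 1, 0, 0, 0.3f, 5.0f},
            { 20, 5, -60, 0, 1, 0, 0.3f, 5.0f},
        });
        System.out.println("stride = " + LIGHT_STRIDE + ", bytes = " + buf.remaining());
    }
}
```

Writing with absolute offsets keeps the position of every field visible, which makes it easier to compare against the struct declaration in the shader than a chain of relative `put` calls.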

For lighting I use a compute shader, as seen in the code, and get the fragment position from the global invocation ID, which gives a precise result with screenSize / 2 work groups and a local size of 2:

`#version 430 core

layout (local_size_x = 2, local_size_y = 2) in;

layout (binding = 0, rgba16f) uniform writeonly image2D defferedSceneImage;

layout (binding = 2, rgba16f) uniform readonly image2DMS albedoSampler;
layout (binding = 3, rgba32f) uniform readonly image2DMS worldPositionSampler;
layout (binding = 4, rgba16f) uniform readonly image2DMS normalSampler;
layout (binding = 5, rgba16f) uniform readonly image2DMS specular_emission_diffuse_ssao_bloom_Sampler;

struct PointLight {
vec4 position;
vec4 color;
float intensity;
float radius;
};

struct Cluster {
vec4 minPoint;
vec4 maxPoint;
uint count;
uint lightIndices[100];
};

layout(std430, binding = 1) restrict buffer clusterSSBO
{
Cluster clusters[];
};

layout(std430, binding = 2) restrict buffer lightSSBO
{
PointLight pointLight[];
};

uniform float zNear;
uniform float zFar;
uniform uvec3 gridSize;
uniform uvec2 screenDimensions;

uniform mat4 viewMatrix;

vec3 diffuse(vec3 albedo, vec3 normal, vec3 toLightDirection, vec3 lightColor,
	float lightIntensity) {
float diffuseFactor = max(dot(normal, toLightDirection), 0.0);
return albedo * lightColor * diffuseFactor * lightIntensity;
}

vec3 specular(vec3 normal, vec3 position, vec3 toLightDir, vec3 lightColor,
	float reflectance, float emission, float lightIntensity) {
vec3 cameraDirection = normalize(-position);
vec3 fromLightDir = -toLightDir;
vec3 reflectedLight = normalize(reflect(fromLightDir, normal));
float specularFactor = max(dot(cameraDirection, reflectedLight), 0.0);
specularFactor = pow(specularFactor, emission);
return lightIntensity * specularFactor * reflectance * lightColor;
}

vec3 calculateLight(vec3 albedo, vec3 position, vec3 normal, PointLight light,
	float reflectance, float emission) {
vec3 lightDirection = light.position.xyz - position;
vec3 toLightDir = normalize(lightDirection);

vec3 diffuse = diffuse(albedo, normal, toLightDir, light.color.rgb,
		light.intensity);
vec3 specular = specular(normal, position, toLightDir, light.color.rgb,
		reflectance, emission, light.intensity);

// Attenuation
float distance = length(light.position.xyz - position);
float attenuation = 1.0 / (1.0 + (distance / light.radius));

diffuse *= attenuation;
specular *= attenuation;

return diffuse + specular;
}

void main(void) {
    // compute coord represents coordinates in screen space
ivec2 computeCoord = ivec2(gl_GlobalInvocationID.x,
		gl_GlobalInvocationID.y);

vec3 finalColor = vec3(0);
vec3 albedo = vec3(0);
vec3 position = vec3(0);
vec4 normal = vec4(0);
vec4 specular_emission_diffuse_ssao_bloom = vec4(0);
vec4 depth = vec4(0);

albedo = imageLoad(albedoSampler, computeCoord, 0).rgb;
normal = imageLoad(normalSampler, computeCoord, 0).rbga;

position = imageLoad(worldPositionSampler, computeCoord, 0).rgb;
specular_emission_diffuse_ssao_bloom = imageLoad(
		specular_emission_diffuse_ssao_bloom_Sampler, computeCoord, 0).rgba;
		
    // we get fragment position from sampled world pos multiplied by view matrix
uint zTile = uint(
		(log(abs(vec3(viewMatrix * vec4(position, 1.0)).z) / zNear)
				* gridSize.z) / log(zFar / zNear));
vec2 tileSize = screenDimensions / gridSize.xy;

uvec3 tile = uvec3(computeCoord.xy / tileSize, zTile);
uint tileIndex = tile.x + (tile.y * gridSize.x)
		+ (tile.z * gridSize.x * gridSize.y);

uint lightCount = clusters[tileIndex].count;

for (int i = 0; i < lightCount; ++i) {
	uint lightIndex = clusters[tileIndex].lightIndices[i];
	PointLight light = pointLight[lightIndex];

	// Lighting
	finalColor += calculateLight(albedo, position, normalize(normal.xyz),
			light, specular_emission_diffuse_ssao_bloom.r,
			specular_emission_diffuse_ssao_bloom.g);

}

imageStore(defferedSceneImage, computeCoord, vec4(finalColor, 1.0));
}`
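
The cluster lookup in `main` can be mirrored on the CPU to check individual pixels by hand. The following is only a sketch under my own assumptions: the class `ClusterLookup` and its method names are invented for this illustration, and it assumes `screenDimensions / gridSize.xy` is an integer division, because both operands are unsigned integer vectors.

```java
final class ClusterLookup {
    // Depth slice for a view-space z, mirroring the zTile expression in the shader.
    static int zTile(double viewZ, double zNear, double zFar, int gridZ) {
        double slice = Math.log(Math.abs(viewZ) / zNear) * gridZ / Math.log(zFar / zNear);
        return (int) slice; // truncates toward zero, like uint(...) for non-negative values
    }

    // Tile column (or row) of a pixel; the tile size is truncated by integer division.
    static int tileCoord(int pixel, int screenSize, int gridSize) {
        int tileSize = screenSize / gridSize;
        return pixel / tileSize;
    }

    // Flattened cluster index, same formula as tileIndex in the shader.
    static int tileIndex(int tx, int ty, int tz, int gridX, int gridY) {
        return tx + ty * gridX + tz * gridX * gridY;
    }

    public static void main(String[] args) {
        double zNear = 0.1, zFar = 10000.0;
        int width = 1280, height = 720, gx = 12, gy = 12, gz = 24;

        int tz = zTile(-30.0, zNear, zFar, gz);   // depth of the objects
        int tx = tileCoord(640, width, gx);       // screen center
        int ty = tileCoord(360, height, gy);
        System.out.println("tile = (" + tx + ", " + ty + ", " + tz + "), index = "
                + tileIndex(tx, ty, tz, gx, gy));
        System.out.println("light slice at z = -60: " + zTile(-60.0, zNear, zFar, gz));
        System.out.println("column of x = 1279: " + tileCoord(1279, width, gx));
    }
}
```

With the specs listed further down (1280x720, zNear 0.1, zFar 10000, grid 12x12x24), objects at view-space z = -30 fall into slice 11 and the lights at z = -60 into slice 13. Under the integer-division assumption, 1280 / 12 truncates to 106, so the last few pixel columns map to tile column 12, which lies outside the 0..11 range. I have not confirmed that this is related to the glitches, but it may be worth clamping.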

The shaders for clusters and AABB intersection are the same as in your tutorial.
This screenshot shows the result of rendering tile.xyz (multiplied by 0.01):

And this is the depth of the tiles (zTile * 0.01):

In the render loop, things happen in the following order:

1. Init the clustered renderer class, run the cluster shader to build the grid and fill the cluster SSBO.
2. Add the lights to the SSBO and do the first AABB intersection check for them (3 lights for now).
3. Update loop:
   - update the camera and scene
   - forward render the scene to the G-buffer
   - test the lights again for AABB intersection using the shader from the tutorial, updating the light indices in the cluster SSBO
   - use the compute shader to do lighting to a texture
   - run the post-processing stage on the deferred scene texture
   - draw the final texture on screen

So far, the clusters are defined correctly. In view space they all have negative z positions, which also seems fine (or not?). The issue starts with the AABB-sphere intersection test. After some testing I found that if I do not upload the view matrix (so the uniform is all zeros rather than identity), the light position becomes zero and the check that distance squared <= radius squared is always true. However, when I do upload a view matrix (even an identity matrix), the check returns false in most cases. It seems to be true only when the light position is actually inside the cluster (I am not sure about this; I judged it from the colored clusters in the screenshot above). Just in case, this is my view matrix calculation (a very basic view matrix definition):
`
private Matrix4f updateViewMatrix() {

	viewMatrix.identity();
	
	viewMatrix.rotate((float)Math.toRadians(pitch), new Vector3f(1,0,0));
	viewMatrix.rotate((float)Math.toRadians(yaw), new Vector3f(0,1,0));
	viewMatrix.rotate((float)Math.toRadians(roll), new Vector3f(0,0,1));
	
	Vector3f negativePos = new Vector3f(-position.x, -position.y, -position.z);
	viewMatrix.translate(negativePos);
	
	this.invViewMatirx.set(viewMatrix).invert();
	
	return viewMatrix;
}`
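
For reference, the sphere/AABB check described above (distance squared <= radius squared) can be reproduced on the CPU with a few lines of plain Java. This is a sketch of the standard closest-point test, not a copy of the tutorial's shader; the class `SphereAabb` and the example boxes are made up for this illustration.

```java
final class SphereAabb {
    // Returns true if a sphere (center c, radius r) touches the box [min, max].
    // All arrays hold {x, y, z} in the same space, e.g. view space.
    static boolean intersects(float[] c, float r, float[] min, float[] max) {
        float distSq = 0.0f;
        for (int axis = 0; axis < 3; axis++) {
            // Closest point on the box to the center, per axis (a clamp).
            float closest = Math.max(min[axis], Math.min(c[axis], max[axis]));
            float d = closest - c[axis];
            distSq += d * d;
        }
        return distSq <= r * r;
    }

    public static void main(String[] args) {
        float[] light = {-10f, 5f, -60f}; // first light, identity view matrix assumed
        float[] nearMin = {0f, 0f, -30f}, nearMax = {5f, 5f, -20f};     // hypothetical cluster
        float[] aroundMin = {-12f, 0f, -70f}, aroundMax = {-8f, 10f, -55f}; // hypothetical cluster

        System.out.println(intersects(light, 5f, aroundMin, aroundMax));
        System.out.println(intersects(light, 5f, nearMin, nearMax));
        System.out.println(intersects(light, 50f, nearMin, nearMax));
    }
}
```

The clamp only works if `min` is less than or equal to `max` on every axis. Since the clusters have negative z in view space, it may be worth checking that the near and far planes did not end up swapped in `minPoint.z` and `maxPoint.z`.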

So, at this moment I don't know where exactly the mistake is, and I'm still trying to fix it.
I know that your example is not in LWJGL, though the structure is very similar. I would really appreciate any suggestions, because I don't know where else to look for information (I've asked many people in the GPU programming department at my university, but clustered lighting seems to be a niche concept that not many people are aware of).

Thank you for the tutorial and any possible help! :)

Specs:
width: 1280
height: 720
zNear: 0.1
zFar: 10000
number of lights: 3
grid size: 12x12x24
OpenGL: 4.5

EDIT:
To make this easier to see, I output the albedo texture for all pixels where the light count is 0:

I have now also tried doing the lighting in a forward pass, to check whether deriving the screen-space position from the global invocation ID is wrong. Apparently, it's not: the same issue occurs when rendering in a fragment shader in the forward pass.

Great tutorial, only one detail

Hi Dave, this is one of the best tutorials I've ever read about clustered shading. One clarification (tell me if I am wrong): in the README, where you specify numClusters, you mean gridSizeX * gridSizeY * gridSizeZ, right? If so, the rest looks good, and I've successfully implemented it in my engine.
Thanks, and yes, this really should be integrated into learnopengl.com
