Light

scoutydren / cuda-flocking-boids Goto Github PK

View Code? Open in Web Editor NEW

This project forked from cis565-fall-2021/project1-cuda-flocking

0.0 0.0 0.0 5.69 MB

CMake 17.05% Makefile 0.99% C++ 38.73% GLSL 0.95% Cuda 42.28%

cuda-flocking-boids's Introduction

University of Pennsylvania, CIS 565: GPU Programming and Architecture, Project 1 - Flocking

Beini Gu
- LinkedIn, personal website, twitter, etc.
Tested on: Windows 10, AMD Ryzen 7 3700X 8-Core Processor 3.59 GHz 16GB, GTX 2060

CUDA Boids Flocking

Visualization results

visualization with 8000 boids

Animated

Performance Analysis

Variant: Changing number of boids

naive boids implementation
- with visualization (vis fps can exceed 60fps, vertical sync off)
- simulation only
- **Results: ** FPS decreases logarithmically as number of boids increases. Same decreasing profile with and without visualization. Having visualization results in extra computing costs which in general slows down the performance. One possible reason for explaining this performance slowdown with number of boids was in kernUpdateVelocityBruteForce and computeVelocityChange, we are looping through all the boids that exists in the entire scene non-parallel checking if each one of them is within a "rule distance".
scattered uniform grid implementation
- with visualization
- simulation only
- Results: FPS decreases as number of boids increases, but the rate of decreasing, compared to naive boids implementation, is much slower, which indicates much stronger performance compared to the naive boids method as we are only checking boids in neighboring cells instead of all the boids. The reason which the program still slows down after the number of boids reaches really big is that in kernUpdateNeighborSearchScattered, we are looping from the start index to the end index, this loop will gets bigger if the number of boids gets bigger.
- Same as naive boids, visualization slows down the program with extra draw calls.
coherent uniform grid implementation
- with visualization
- simulation only
- **Results: **FPS decreases as number of boids increases, comparing with scattered grid(showed in orange dots), coherent grid implementation has stronger performance when the number of boids gets larger. For small number of grids, coherent grid does not seem to have clear advantage. The reason is because in coherent method, we hold an extra buffer to store the reshuffled position and velocity to avoid indexing from the indices array each time. For small number of boids, this may not be a real big advantage, but for larger numbers, it has better performance.

Variant: Changing block size

No visualization
- For the naive implementation, the fps does varies with respect to the blockSize, but for uniform grid implementations, blockSize does not indicate a correlation with the performance. The performance about the best at around 64 - 256. Smaller block size results in smaller total number of threads and lower occupancy so we should avoid small block sizes.

Variant: Changing width of each cell

Decreasing the cell width would slow down performance, as measured using the coherent grid approach with 50,000 boids, if we divide the cell width by 4, the fps goes down from around 1400fps to 1100 fps, and dividing by 8 results in only around 500 fps.
This would be the usual case, but if the grid has very high density and very large cell width, which implies the cell count is small. Then it defeats the purpose of dividing the area using uniform grids, causing the performance to slow down just like the way we did in the naive implementation when we are checking too many neighbors. So we should decrease the cell width.
Result: the cell width should be a carefully chosen value which has similar scale with the rule distances.

cuda-flocking-boids's People

Recommend Projects

React

A declarative, efficient, and flexible JavaScript library for building user interfaces.
Vue.js

🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
Typescript

TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
TensorFlow

An Open Source Machine Learning Framework for Everyone
Django

The Web framework for perfectionists with deadlines.
Laravel

A PHP framework for web artisans
D3

Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

javascript

JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
web

Some thing interesting about web. New door for the world.
server

A server is a program made to process requests and deliver data to clients.
Machine learning

Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Visualization

Some thing interesting about visualization, use data art
Game

Some thing interesting about game, make everyone happy.

Recommend Org

Facebook

We are working to build community through open source technology. NB: members must have two-factor auth.
Microsoft

Open source projects and samples from Microsoft.
Google

Google ❤️ Open Source for everyone.
Alibaba

Alibaba Open Source for everyone
D3

Data-Driven Documents codes.
Tencent

China tencent open source team.