On some architectures, all members of a :abbr:`warp` have to execute the
same instruction, so-called "lock-step" execution. This is done to achieve
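higher performance, but there are some drawbacks. If an **if** statement
whose condition is evaluated at run time sends threads of the same warp down
different branches, the warp executes each taken branch in turn, with the
threads that did not take that branch masked off. A branch resolved at compile
time, or one on which all threads of a warp agree, causes no divergence. On
architectures without strict lock-step execution, such as NVIDIA Volta or
newer (e.g., the Turing-based GeForce 16xx series), warp divergence is less costly.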
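
The difference between a divergent branch and a warp-uniform branch can be
sketched in CUDA as follows. This is a minimal illustration, not a benchmark:
the kernel names ``divergent`` and ``warp_uniform`` are made up for this
example, the block size is assumed to be a multiple of the warp size, and the
compiler may replace such short branches with predicated instructions, in
which case no real divergence occurs.

.. code-block:: cuda

   #include <cstdio>
   #include <cuda_runtime.h>

   // Even and odd lanes of the same warp take different branches, so the
   // warp may have to run both branches, masking off inactive lanes each time.
   __global__ void divergent(float *out, int n)
   {
       int i = blockIdx.x * blockDim.x + threadIdx.x;
       if (i >= n) return;
       if (threadIdx.x % 2 == 0)
           out[i] = 2.0f * i;
       else
           out[i] = 0.5f * i;
   }

   // All lanes of a warp evaluate the condition identically (assuming a 1-D
   // block whose size is a multiple of warpSize), so each warp runs one branch.
   __global__ void warp_uniform(float *out, int n)
   {
       int i = blockIdx.x * blockDim.x + threadIdx.x;
       if (i >= n) return;
       if ((threadIdx.x / warpSize) % 2 == 0)
           out[i] = 2.0f * i;
       else
           out[i] = 0.5f * i;
   }

   int main()
   {
       const int n = 1 << 20;      // multiple of the block size
       const int threads = 256;    // assumed multiple of the warp size
       const int blocks = (n + threads - 1) / threads;

       float *d_out = nullptr;
       if (cudaMalloc(&d_out, n * sizeof(float)) != cudaSuccess) {
           std::fprintf(stderr, "cudaMalloc failed\n");
           return 1;
       }

       divergent<<<blocks, threads>>>(d_out, n);
       warp_uniform<<<blocks, threads>>>(d_out, n);

       // Catch launch errors first, then errors raised while the kernels ran.
       cudaError_t err = cudaGetLastError();
       if (err == cudaSuccess) err = cudaDeviceSynchronize();
       cudaFree(d_out);

       if (err != cudaSuccess) {
           std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
           return 1;
       }
       std::printf("both kernels finished\n");
       return 0;
   }

Comparing the two kernels in a profiler is one way to check whether the
branch in ``divergent`` actually costs anything on a given GPU and compiler
version.
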

To my understanding, the GeForce 16xx series is not an example of an NVIDIA Volta or newer. This might need to be verified and potentially modified. I would also maybe clarify the claim about the if-statement; from what I've understood, there is branch divergence only if the if-statement is evaluated at run time (not a templated branch) and multiple threads within a single warp actually execute different branches of the if-statement.