Ray Tracing on AMD RX 6000: Why Does It Perform Worse?

One topic that has come up in recent days is the ray tracing performance of the AMD RX 6000 compared with the NVIDIA RTX 3000, where NVIDIA seems to hold an advantage over its direct rival. But is there another reason apart from those already known?

Ray tracing has become one of the major innovations in graphics, especially since NVIDIA added hardware to accelerate real-time ray tracing in the RTX 2000 family, a trend that AMD has recently joined with its RX 6000 range and that NVIDIA has continued with its RTX 3000.

But the NVIDIA RTX 3000 and the AMD Radeon RX 6000 are not on par when it comes to ray tracing. This can be explained in part by the greater number of FP32 ALUs in the cores of NVIDIA GPUs, but that is only part of the story.

What is the Ray Tracing problem on AMD RX 6000?

One of the keys to accelerating ray tracing is the use of acceleration data structures, which store a map of the positions of the objects in the scene.

How useful are they? Simple: in ray tracing they prevent rays from being launched and tested against parts of the scene where there is nothing. That saves a lot of time, hence the name acceleration structures. There is not a single type, but several different ones.
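As a rough illustration of how a bounding volume lets a ray skip empty space, here is a minimal Python sketch of the classic "slab" test between a ray and an axis-aligned box. The scene layout, the object counts, and the names `ray_hits_box`, `regions` and `objects_to_test` are assumptions made up for this example; they do not come from any vendor's implementation.

```python
def ray_hits_box(origin, direction, box_min, box_max):
    """Slab test: True if the ray origin + t * direction (t >= 0) crosses the box."""
    t_near, t_far = 0.0, float("inf")
    for o, d, lo, hi in zip(origin, direction, box_min, box_max):
        if d == 0:
            # Ray runs parallel to this pair of planes: it must start between them.
            if o < lo or o > hi:
                return False
            continue
        t0, t1 = (lo - o) / d, (hi - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_near, t_far = max(t_near, t0), min(t_far, t1)
        if t_near > t_far:
            return False
    return True


# Two hypothetical regions of a scene, each enclosing 1000 objects.
regions = [
    {"box": ((0, 0, 5), (2, 2, 7)), "objects": 1000},
    {"box": ((50, 50, 5), (52, 52, 7)), "objects": 1000},
]


def objects_to_test(origin, direction):
    """Count the objects that still need an exact intersection test."""
    return sum(r["objects"] for r in regions
               if ray_hits_box(origin, direction, *r["box"]))


print(objects_to_test((1, 1, 0), (0, 0, 1)))
```

In this toy scene, one cheap box test is enough to discard a whole region, so a ray aimed at the first box never touches the objects in the second one.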

NVIDIA decided to add to its RT Cores a unit capable of traversing one type of data structure, the BVH tree. This means that if a game uses this data structure for ray tracing, it does not have to invoke a compute shader program to do the traversal.

AMD, by contrast, has decided not to give preference to any type of acceleration structure, which means that the traversal has to be controlled by a compute shader program when using the classic tree data structures such as octrees, BVHs, KD-trees, etc.
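To give an idea of what "traversal controlled by a program" looks like, the following Python sketch walks a small bounding hierarchy with an explicit stack, the usual pattern in shader code, where recursion is generally unavailable. To keep the focus on the loop, the ray is reduced to a single x coordinate and each bounding volume to an interval on that axis. The `Node` class, the `traverse` function and the scene are hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    lo: float                      # bounding interval of everything below this node
    hi: float
    children: list = field(default_factory=list)
    primitive: str = None          # set only on leaves


def traverse(root, x):
    """Return (primitives reached, nodes visited) for a ray at coordinate x."""
    hits, visited = [], 0
    stack = [root]                 # explicit stack instead of recursion
    while stack:
        node = stack.pop()
        visited += 1
        if not (node.lo <= x <= node.hi):
            continue               # ray misses this volume: skip the whole subtree
        if node.primitive is not None:
            hits.append(node.primitive)
        else:
            stack.extend(node.children)
    return hits, visited


scene = Node(0, 8, [
    Node(0, 4, [Node(0, 2, primitive="A"), Node(2.5, 4, primitive="B")]),
    Node(5, 8, [Node(5, 6, primitive="C"), Node(7, 8, primitive="D")]),
])

print(traverse(scene, 1.0))
```

Every iteration of the `while` loop is work that, in this model, the general-purpose shader units have to do; a dedicated traversal unit would take that loop off their hands.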

A simple explanation of what a tree is

In computing, trees are not flat, sequential data structures but hierarchical ones, which means that traversing them takes the processor several iterations.

The node where the tree begins is called the root.
Any node that has one or more nodes below it is called a parent.
Every node that has a node above it in the hierarchy is called a child.
Every node that is at the end of the hierarchy is called a leaf.
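The four terms above can be checked against a tiny tree. The following Python sketch uses a made-up tree, stored as a dictionary from each parent to its children, and labels every node with the roles it plays; the node names and the `roles` helper are purely illustrative.

```python
# Hypothetical tree: each key is a parent, each value the list of its children.
tree = {
    "scene": ["left", "right"],
    "left": ["a", "b"],
    "right": ["c"],
}


def roles(tree):
    """Map every node to the list of roles it plays in the hierarchy."""
    children = {c for kids in tree.values() for c in kids}
    nodes = set(tree) | children
    result = {}
    for name in sorted(nodes):
        found = []
        if name not in children:
            found.append("root")       # nothing above it
        if tree.get(name):
            found.append("parent")     # has nodes below it
        if name in children:
            found.append("child")      # has a node above it
        if not tree.get(name):
            found.append("leaf")       # nothing below it
        result[name] = found
    return result


for name, found in roles(tree).items():
    print(name, found)
```

Note that a node can play two roles at once: "left" is both a child of "scene" and a parent of "a" and "b".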

Trees should not be confused with conditional jumps in code, which jump to one line or another when a condition is met. With trees, the assumption is that when there are several nodes, it is best for them to be handled by several different threads of execution at each iteration.

Contemporary GPUs usually have shader units made up of 4 SIMD ALUs, each of which executes a thread of execution, so they can process trees with up to 4 nodes without problems. Of course, as the traversal descends from a node it encounters more and more sub-nodes, so the number of threads to be executed becomes very high.
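A quick back-of-the-envelope calculation shows how fast that growth is. The Python sketch below follows the simplified model used in this article (4 lanes per shader unit, one node per lane per pass) and assumes a full tree with 4 children per node in which every node is visited, which is an upper bound rather than a typical case. The function names are hypothetical.

```python
import math


def nodes_per_level(branching, depth):
    """Number of nodes at each level of a full tree, with the root at level 0."""
    return [branching ** level for level in range(depth + 1)]


def passes_needed(nodes, lanes=4):
    """Passes required if each lane handles one node per pass."""
    return math.ceil(nodes / lanes)


for level, count in enumerate(nodes_per_level(4, 5)):
    print(level, count, passes_needed(count))
```

In practice a good acceleration structure discards most branches early, so real traversals visit far fewer nodes than this worst case.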

That is why NVIDIA added hardware specialized in traversing BVH trees to its RT Cores: to avoid having to use the shader units for this. The unit only works for that type of data structure, but in exchange it can traverse it very quickly.

But there is another way to present the data of a tree: lay out its different paths linearly. This allows the data to be sent as a one-dimensional array, which is what a 1D texture is, and which is the best way to send data to AMD GPUs.
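As a sketch of what "laying a tree out in a line" can mean, the Python example below flattens a small nested hierarchy into a one-dimensional list of rows, where each row stores the index of its first child instead of a pointer. This is only one possible layout, chosen for illustration; it is not AMD's actual node format, and the names `flatten` and `traverse_flat` are assumptions. As a simplification, bounding volumes are intervals on one axis and the ray is a single x coordinate.

```python
from collections import deque

# Nested form: (lo, hi, payload); payload is a list of children or a leaf label.
tree = (0, 8, [
    (0, 4, [(0, 2, "A"), (2.5, 4, "B")]),
    (5, 8, [(5, 6, "C"), (7, 8, "D")]),
])


def flatten(root):
    """Breadth-first layout: each row is [lo, hi, first_child, child_count, label]."""
    rows, queue = [], deque([root])
    next_free = 1                      # index where the next group of children lands
    while queue:
        lo, hi, payload = queue.popleft()
        if isinstance(payload, list):
            rows.append([lo, hi, next_free, len(payload), None])
            next_free += len(payload)
            queue.extend(payload)
        else:
            rows.append([lo, hi, -1, 0, payload])
    return rows


def traverse_flat(rows, x):
    """Walk the flat array using indices only, with no pointers or recursion."""
    hits, stack = [], [0]
    while stack:
        lo, hi, first, count, label = rows[stack.pop()]
        if not (lo <= x <= hi):
            continue
        if label is not None:
            hits.append(label)
        stack.extend(range(first, first + count))
    return hits


flat = flatten(tree)
print(len(flat), traverse_flat(flat, 5.5))
```

Because the children of a node sit next to each other in the array, fetching them is a matter of reading consecutive entries, which is the kind of access a texture unit handles well.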

AMD's solution is for developers to take care of presenting the acceleration structure in the form of a texture. This follows from its decision not to add specialized hardware for traversing one specific type of tree structure, favoring greater versatility over greater speed.

This means that, for the moment, developers have to adopt specific measures for each brand of graphics card when implementing ray tracing.

Where does the discrepancy come from?

Some of you may wonder why AMD decided not to include hardware to traverse the data structure. The answer is very simple: it is not part of the minimum specification for DirectX Raytracing (DXR).

What's more, in DXR we can perform ray tracing by replacing the intersection units with shader units that execute an Intersection Shader, but the specialized intersection units that AMD and NVIDIA have included are much more efficient, because they do the job several times faster while occupying only a fraction of the space in comparison.
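To show the kind of arithmetic an intersection test involves when it runs as ordinary code, here is a minimal Python sketch of a ray-sphere test, solving the quadratic for the hit distance. In DXR, intersection shaders are typically used for procedural primitives like this one; the function name `ray_sphere` and the sample values are assumptions for the example, and real shaders would be written in HLSL rather than Python.

```python
import math


def ray_sphere(origin, direction, center, radius):
    """Return the nearest hit distance t >= 0 along the ray, or None on a miss.

    Assumes direction is not the zero vector.
    """
    oc = [o - c for o, c in zip(origin, center)]
    a = sum(d * d for d in direction)
    b = 2.0 * sum(d * e for d, e in zip(direction, oc))
    c = sum(e * e for e in oc) - radius * radius
    disc = b * b - 4.0 * a * c
    if disc < 0.0:
        return None                    # no real solution: the ray misses the sphere
    root = math.sqrt(disc)
    for t in ((-b - root) / (2.0 * a), (-b + root) / (2.0 * a)):
        if t >= 0.0:
            return t                   # first non-negative solution is the nearest hit
    return None                        # sphere is entirely behind the ray


print(ray_sphere((0, 0, 0), (0, 0, 1), (0, 0, 5), 1))
print(ray_sphere((0, 0, 0), (0, 0, 1), (3, 0, 5), 1))
```

Each call costs a handful of multiplications, a square root and two divisions per ray and per primitive, which is the kind of fixed, repetitive work that dedicated units are built to absorb.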

The point is that Microsoft, when creating its API, did not dictate how the hardware had to work, and this has given AMD room to dispense with specialized hardware for traversing tree data structures, which has affected the performance of its graphics cards.

Although AMD's ray tracing patent described a unit capable of traversing 4-node trees (BVH-4), it also warned that the unit was optional. Judging by the recently published RDNA 2 ISA, there are no references to a unit in charge of traversing the trees, only to the intersection instructions.