Ray Tracing on AMD Radeon RX 6000 Graphics Cards, How Does it Work?

With the presentation of AMD’s RX 6000 series, it is clear that ray tracing is not a passing whim of NVIDIA but has come to stay as a standard feature of graphics hardware, and that it will become the standard way of rendering in the future. Even so, for now a transition period awaits us, during which both leading companies will put forward their own proposals. So how does AMD approach it with the RX 6000?

When AMD presented its new RDNA graphics architecture a year ago, it brought mixed news. The good news was a new graphics architecture after more than five years of GCN; the bad news was the lack of dedicated hardware for real-time ray tracing. A few months ago, however, AMD confirmed that the RDNA 2 architecture would include this type of unit, allowing it to compete with NVIDIA in this area, although it works somewhat differently from NVIDIA’s approach.

The intersection unit in the RX 6000: the key to ray tracing

If we look at the ray tracing pipeline, we see that it is the same regardless of the hardware: a process repeated over and over in which the intersection between a ray and an object is calculated an enormous number of times. This repetitive calculation is far more expensive to perform on the shader units themselves than on specialized units, which is why dedicated hardware is used for it.
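To give an idea of what “calculating the intersection” involves, here is a minimal Python sketch of the classic slab test between a ray and an axis-aligned bounding box, the kind of ray-box test that is repeated many times per ray while walking a scene. It is a textbook software algorithm shown for illustration only; it is not a description of how AMD or NVIDIA implement the test in silicon, and the function name is hypothetical.

```python
import math


def ray_aabb_intersect(origin, direction, box_min, box_max):
    """Slab test: True if the ray (t >= 0) hits the axis-aligned box.

    All arguments are (x, y, z) tuples. Illustrative software version only.
    """
    t_near, t_far = 0.0, math.inf  # only hits in front of the origin count
    for o, d, lo, hi in zip(origin, direction, box_min, box_max):
        if d == 0.0:
            # Ray parallel to this pair of planes: it must start between them.
            if o < lo or o > hi:
                return False
            continue
        t0, t1 = (lo - o) / d, (hi - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_near = max(t_near, t0)  # latest entry into a slab
        t_far = min(t_far, t1)    # earliest exit from a slab
        if t_near > t_far:
            return False          # the slab intervals do not overlap
    return True


# A ray travelling along +z towards a unit box centred at the origin hits it.
print(ray_aabb_intersect((0, 0, -5), (0, 0, 1), (-1, -1, -1), (1, 1, 1)))  # True
```

Each test is just a handful of subtractions, divisions and comparisons, which is why it maps well to a small fixed-function unit that can run it every clock.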

Since the AMD and NVIDIA units are very similar, we recommend reading the tutorial on this website entitled “What are RT Cores for Ray Tracing and how do they work?”, which describes NVIDIA’s solution and complements this one, so that you get a complete picture of the differences between the two approaches.

Each Compute Unit contains its own intersection unit, for the following reasons:

They need access to the BVH tree in memory, so they must be able to go through the GPU’s cache system; like the SIMD units that run shader programs, they need access to the entire cache hierarchy.
They have to be close to the SIMD units, because the SIMD units depend on the result of the intersection unit to know which shader to apply to the objects in the ray traced scene.

Unlike NVIDIA, AMD has opted to integrate the intersection unit into the texture filtering unit, or at least to have both share access to the data cache. We know this from two different sources. The first is Microsoft’s presentation at Hot Chips 2020 on the SoC of its Xbox Series X, whose integrated GPU uses the RDNA 2 architecture, the same as AMD’s RX 6000 graphics cards.

Let’s not forget that AMD itself confirmed that the ray tracing solution is exactly the same in the next-generation consoles that use its GPUs and on PC.

The second source is an AMD patent stating that the ray tracing intersection unit is located in the texture unit. This has caused some confusion, with the concern that the texture unit cannot compute ray intersections and perform texturing at the same time. In practice, however, texturing is applied in only one stage of the graphics pipeline, the one in which pixel shaders texture the scene, so outside that stage these units are rarely needed.

The texture unit simply applies bilinear filtering: it takes 4 neighboring samples (texels) per pixel and interpolates between them. Contemporary GPUs typically have 4 texture units per Compute Unit (or SM), accompanied by 16 Load/Store units through which they access the data cache of the Compute Unit or SM.
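As a rough illustration of what the texture unit computes, the following Python sketch performs bilinear filtering on a small grayscale texture: it fetches the 4 neighboring texels around a sample point and blends them by the fractional distances. It assumes texel centers at integer coordinates and clamp-to-edge addressing; real hardware uses fixed-point weights and other addressing modes, and the function name is hypothetical.

```python
def bilinear_sample(texture, u, v):
    """Bilinearly filter a 2D texture (list of rows) at texel coordinates (u, v).

    Assumes texel centers at integer coordinates and clamps to the edges.
    """
    h, w = len(texture), len(texture[0])
    u = min(max(u, 0.0), w - 1.0)
    v = min(max(v, 0.0), h - 1.0)
    x0, y0 = int(u), int(v)                       # top-left texel of the 2x2 footprint
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = u - x0, v - y0                       # fractional blend weights
    # The 4 neighboring samples
    t00, t10 = texture[y0][x0], texture[y0][x1]
    t01, t11 = texture[y1][x0], texture[y1][x1]
    top = t00 * (1 - fx) + t10 * fx               # interpolate horizontally...
    bottom = t01 * (1 - fx) + t11 * fx
    return top * (1 - fy) + bottom * fy           # ...then vertically


tex = [[0, 10],
       [20, 30]]
print(bilinear_sample(tex, 0.5, 0.5))  # 15.0, the average of the 4 texels
```

Every one of those 4 texel reads is a memory access, which is why the texture units sit right next to the Load/Store path into the data cache.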

The only difference from NVIDIA’s solution for calculating ray tracing intersections is that, on AMD’s RX 6000, access to the data cache through the Load/Store units is switched between the texture filtering units and the intersection unit.
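The consequence of this switching can be pictured with a toy model: one data-cache path serving two clients, one request per cycle. The Python sketch below assumes a simple round-robin policy in which an idle client yields its slot; this policy is a hypothetical assumption for illustration, since neither AMD nor Microsoft has described how the arbitration actually works.

```python
from collections import deque
from typing import List, Tuple


def serve_shared_port(texture_reqs: List[str], intersect_reqs: List[str]) -> List[Tuple[str, str]]:
    """Toy model of one data-cache path shared by two clients.

    Each cycle serves exactly one request, from either the texture filtering
    unit ("tex") or the intersection unit ("rt"). The round-robin policy is
    a hypothetical assumption.
    """
    queues = {"tex": deque(texture_reqs), "rt": deque(intersect_reqs)}
    turn = "tex"
    served = []
    while queues["tex"] or queues["rt"]:
        other = "rt" if turn == "tex" else "tex"
        client = turn if queues[turn] else other  # an idle client yields its slot
        served.append((client, queues[client].popleft()))
        turn = "rt" if client == "tex" else "tex"  # alternate after each grant
    return served


for cycle, (client, req) in enumerate(serve_shared_port(["t0", "t1"], ["r0"])):
    print(cycle, client, req)
```

The total number of cycles always equals the total number of requests: the two clients share the path rather than use it in parallel, which only matters when texturing and intersection work overlap.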

Why is the intersection unit for Ray Tracing in the CU?

The execution units within a GPU usually work with register-to-register instructions, so they lack complex mechanisms for accessing the memory hierarchy; this makes them simpler than CPU cores and allows more of them to fit on each chip. The SIMD units in a Compute Unit access the memory hierarchy, made up of the GPU’s internal caches and the VRAM, through the Load/Store units.

Almost all types of shader programs operate on registers, but pixel shaders do require access to the memory hierarchy, since they work with the huge amounts of data contained in textures. Hence the texture units, like the SIMD units, have access to the memory hierarchy.

In the specific case of ray tracing, we need to store the positions of the objects in the scene in a spatial data structure called a BVH (Bounding Volume Hierarchy). This data structure does not fit in the GPU’s on-chip memory, so the intersection unit must use the memory hierarchy, which means it is also connected to the caches and the VRAM.
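To show why the BVH forces the intersection hardware onto the memory hierarchy, here is a minimal Python sketch of a BVH and a stack-based traversal: each node visited stands for a memory fetch, and a ray-box test decides whether the whole subtree under that node can be skipped. The node layout, names and two-leaf example are hypothetical simplifications, not AMD’s actual BVH format, and the sketch does not say which unit runs the traversal loop on real hardware.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class BVHNode:
    box_min: Vec3
    box_max: Vec3
    children: List["BVHNode"] = field(default_factory=list)  # empty for a leaf
    triangles: List[int] = field(default_factory=list)       # triangle ids stored in a leaf


def ray_hits_box(origin: Vec3, direction: Vec3, box_min: Vec3, box_max: Vec3) -> bool:
    """Slab test between a ray (t >= 0) and an axis-aligned box."""
    t_near, t_far = 0.0, float("inf")
    for o, d, lo, hi in zip(origin, direction, box_min, box_max):
        if d == 0.0:
            if o < lo or o > hi:
                return False
            continue
        t0, t1 = sorted(((lo - o) / d, (hi - o) / d))
        t_near, t_far = max(t_near, t0), min(t_far, t1)
        if t_near > t_far:
            return False
    return True


def traverse(root: BVHNode, origin: Vec3, direction: Vec3) -> List[int]:
    """Collect the triangle ids of every leaf whose box the ray enters.

    Each node popped from the stack stands for a read from the caches or
    VRAM, since the tree is too large for on-chip storage.
    """
    candidates: List[int] = []
    stack = [root]
    while stack:
        node = stack.pop()
        if not ray_hits_box(origin, direction, node.box_min, node.box_max):
            continue  # prune the whole subtree with a single box test
        if node.children:
            stack.extend(node.children)
        else:
            candidates.extend(node.triangles)  # only these need ray-triangle tests
    return candidates


left = BVHNode((-3, -1, -1), (-1, 1, 1), triangles=[0])
right = BVHNode((1, -1, -1), (3, 1, 1), triangles=[1])
root = BVHNode((-3, -1, -1), (3, 1, 1), children=[left, right])
print(traverse(root, (2, 0, -5), (0, 0, 1)))  # [1]
```

The design point is that most of the work is cheap box tests that discard large parts of the scene, leaving only a few triangles to test exactly, but every step needs data fetched from memory.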

The RX 6000 is more geared towards DirectX 12 Ultimate requirements

Ray tracing still has a long way to go before it replaces rasterization; even the most optimistic predictions speak of at least three more years. The reason is that ray tracing requires enormous computing power, and there are scenes where even the most powerful GPU would completely choke trying to reach adequate performance.

In traditional ray tracing, a ray bounces off various objects until it runs out of energy or simply leaves the scene. To understand this energy, bear in mind that each object has a reflectance coefficient between 0 and 1 that indicates how much of the incoming light it reflects rather than absorbs. An object with a coefficient of 0 absorbs all the light and reflects none of it, while an object with a coefficient of 1 reflects all the light that reaches it.
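The energy rule described above can be sketched in a few lines of Python: each hit multiplies the energy the ray carries by the surface’s coefficient (the fraction of light it reflects), and the ray stops once that energy falls below a cutoff. The cutoff of 0.01 and the cap of 64 bounces are illustrative assumptions, and the function name is hypothetical.

```python
def bounces_until_exhausted(reflectance: float, threshold: float = 0.01, max_bounces: int = 64) -> int:
    """Count bounces before a ray's energy drops below `threshold`.

    Each hit keeps only the `reflectance` fraction of the energy
    (0 = absorbs everything, 1 = reflects everything). The threshold and
    the max_bounces cap are illustrative assumptions.
    """
    energy = 1.0
    bounces = 0
    while energy >= threshold and bounces < max_bounces:
        energy *= reflectance  # part of the light is absorbed at each hit
        bounces += 1
    return bounces


print(bounces_until_exhausted(0.5))  # 7: 0.5**7 = 0.0078 is the first value under 0.01
print(bounces_until_exhausted(0.0))  # 1: a perfect absorber ends the ray at the first hit
print(bounces_until_exhausted(1.0))  # 64: a perfect reflector never runs out, so the cap applies
```

The last case shows why a hard cap is needed in practice: with highly reflective materials, energy alone would let a ray bounce almost indefinitely.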

Every time a ray hits an object it creates new indirect rays, and so on, as long as its remaining energy is not too low. Clearly, this means a huge number of intersections to calculate, more than the intersection units can handle.
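A quick calculation shows how fast this grows. Assuming, hypothetically, one primary ray per pixel at 1920×1080, and that every hit spawns 2 new rays down to 4 bounces, each primary ray turns into a tree of 1 + 2 + 4 + 8 + 16 rays:

```python
def rays_per_primary(branching: int, depth: int) -> int:
    """Rays traced for one primary ray if every hit spawns `branching` new rays,
    down to `depth` bounces: 1 + k + k^2 + ... + k^depth."""
    return sum(branching ** level for level in range(depth + 1))


def rays_per_frame(width: int, height: int, branching: int, depth: int) -> int:
    """One primary ray per pixel, each expanding into a full ray tree."""
    return width * height * rays_per_primary(branching, depth)


# Hypothetical example: 1080p, 2 new rays per hit, 4 bounces.
print(rays_per_primary(2, 4))            # 31
print(rays_per_frame(1920, 1080, 2, 4))  # 64281600
```

At 60 frames per second that is more than 3.8 billion rays per second, and each ray needs many box tests plus some triangle tests, so the intersection count quickly runs into the tens of billions.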

To avoid this, APIs such as Microsoft’s DirectX 12 Ultimate (DXR) and Vulkan have added a new type of shader program, the Ray Generation Shader, so that new rays are not generated automatically but must be explicitly launched by the code. This means that in the first few years we will see objects in games that do not reflect or refract rays, in order to reduce the number of rays in the scene and achieve stable frame rates.

In other words, when a ray hits an object and must continue its path by generating new rays, the intersection unit has to ask the shader program that coordinates the path what to do.
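The division of labor described in the last two paragraphs can be modeled in a small Python toy: an `intersect` step stands in for the fixed-function hardware and only reports what was hit, while a `closest_hit` step stands in for the shader, which decides whether to launch a new ray. In DXR the real counterparts are the HLSL `TraceRay` intrinsic and a recursion limit set through `MaxTraceRecursionDepth`; everything in this sketch (class, method names, the scene as a list of materials hit on successive bounces) is a hypothetical simplification, not the DXR or Vulkan API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Material:
    color: float      # toy scalar "radiance" of the surface
    reflective: bool  # whether the hit shader chooses to spawn a new ray


class ToyPipeline:
    """Toy model of an explicit ray tracing pipeline (hypothetical names)."""

    def __init__(self, hits_along_path: List[Material], max_depth: int):
        self.hits = hits_along_path  # material hit on each successive bounce
        self.max_depth = max_depth   # loosely analogous to a recursion limit
        self.rays_traced = 0

    def intersect(self, depth: int) -> Optional[Material]:
        # "Hardware" step: only reports what the ray hit, or None for a miss.
        return self.hits[depth] if depth < len(self.hits) else None

    def trace_ray(self, depth: int = 0) -> float:
        self.rays_traced += 1
        material = self.intersect(depth)
        if material is None:
            return 0.0  # miss: the background contributes nothing
        return self.closest_hit(material, depth)

    def closest_hit(self, material: Material, depth: int) -> float:
        # "Shader" step: a new ray exists only if the code explicitly asks for it.
        if material.reflective and depth + 1 < self.max_depth:
            return material.color + 0.5 * self.trace_ray(depth + 1)
        return material.color

    def ray_generation(self) -> float:
        # Entry point: launches the primary ray.
        return self.trace_ray()


mirror, matte = Material(1.0, True), Material(1.0, False)
p = ToyPipeline([mirror, mirror, mirror], max_depth=2)
print(p.ray_generation(), p.rays_traced)  # 1.5 2: the depth cap stops the third ray
```

Making a material non-reflective (`matte`) means its hit shader never requests a secondary ray, which is exactly the lever developers can use to keep the ray count under control.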

Is the RX 6000 Ray Tracing solution better than the RTX 3000?

Well, we do not know for sure, since for the moment each company has chosen to publish different metrics. In AMD’s case, the information we have comes indirectly from Microsoft: the intersection units can perform 4 Ray Ops per cycle, but we do not know exactly what those Ray Ops are. The only other thing Microsoft has told us is that the intersection units of its console’s GPU are equivalent to 25 TFLOPS, but we do not know the context of this figure.
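To put the 4 Ray Ops per cycle in perspective, here is a back-of-the-envelope calculation in Python. It rests on a hypothetical interpretation: that one Ray Op is one ray-box intersection test per Compute Unit per clock, applied to the Xbox Series X GPU (52 Compute Units at 1.825 GHz), plus, as a further assumption, one ray-triangle test per Compute Unit per clock. The results are theoretical peaks, not measured performance.

```python
# Hypothetical interpretation: one "Ray Op" = one ray-box test per CU per clock.
# Xbox Series X GPU: 52 Compute Units at 1.825 GHz.
def peak_tests_per_second(compute_units: int, clock_hz: float, tests_per_cu_per_clock: int) -> float:
    """Theoretical peak: every CU completes its tests on every single clock."""
    return compute_units * clock_hz * tests_per_cu_per_clock


box_rate = peak_tests_per_second(52, 1.825e9, 4)  # 4 ray-box tests per CU per clock
tri_rate = peak_tests_per_second(52, 1.825e9, 1)  # assumed 1 ray-triangle test per CU per clock
print(f"ray-box:      {box_rate / 1e9:.1f} billion tests/s")  # 379.6
print(f"ray-triangle: {tri_rate / 1e9:.1f} billion tests/s")  # 94.9
```

Real throughput will be lower, since it depends on how quickly the BVH data arrives from the caches and VRAM.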

In NVIDIA’s case, the company claims that the RT Cores of the RTX 3080 deliver a combined 58 RT-TFLOPS, but we do not know whether that is the computing power of the RT Cores themselves or the computing power the CUDA cores would need to match their performance.

Be that as it may, we can only go by what both companies tell us about their architectures and the information we have. It seems that the RX 6000 units are more similar to those of the RTX 2000, with 4 ray-box intersection units and 1 ray-triangle unit, whereas NVIDIA has doubled the latter in the RTX 3000, so its capacity for calculating intersections is somewhat greater.
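For comparison with the ray-box test, this is what a ray-triangle intersection involves. The Python sketch below uses the Möller–Trumbore algorithm, a common software method; neither AMD nor NVIDIA has stated that their hardware uses it, and the function name is hypothetical. It is noticeably more arithmetic than a box test (two cross products and several dot products), which helps explain why there are fewer ray-triangle units than ray-box units.

```python
def ray_triangle_intersect(origin, direction, v0, v1, v2, eps=1e-9):
    """Möller–Trumbore ray-triangle test.

    Returns the distance t along the ray to the hit point, or None on a miss.
    All arguments are (x, y, z) tuples.
    """
    def sub(a, b):
        return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

    def dot(a, b):
        return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

    def cross(a, b):
        return (a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0])

    e1, e2 = sub(v1, v0), sub(v2, v0)  # triangle edges from v0
    p = cross(direction, e2)
    det = dot(e1, p)
    if abs(det) < eps:
        return None  # ray parallel to the triangle's plane
    inv_det = 1.0 / det
    s = sub(origin, v0)
    u = dot(s, p) * inv_det          # first barycentric coordinate
    if u < 0.0 or u > 1.0:
        return None
    q = cross(s, e1)
    v = dot(direction, q) * inv_det  # second barycentric coordinate
    if v < 0.0 or u + v > 1.0:
        return None
    t = dot(e2, q) * inv_det
    return t if t > eps else None    # ignore hits behind the origin


tri = ((-1, -1, 0), (1, -1, 0), (0, 1, 0))
print(ray_triangle_intersect((0, 0, -5), (0, 0, 1), *tri))  # 5.0
```
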

How this translates into each game depends on a number of factors, but in any case AMD’s ray tracing solution on the RX 6000 seems good and efficient enough to have been chosen for the next-generation consoles as well.