GPUs from AMD and NVIDIA: Is the Future of Ray Tracing without RT Cores?

Is it possible that RT Cores will disappear from future NVIDIA, Intel and/or AMD GPUs? Can shader units, with their enormous computing power, grow to the point of making the inclusion of this type of unit completely dispensable?

RT Cores, Ray Accelerator units or intersection units are specialized GPU units dedicated to a single task. They first appeared with NVIDIA's first RTX graphics cards.

In this article we will not explain what they are for. For that, we recommend the HardZone article entitled "What are RT Cores for Ray Tracing and how do they work?", in which we explain how these units operate in a simple but detailed way.

What are RT Cores or intersection units?

RT Cores in NVIDIA GPUs, or Ray Accelerator units in AMD GPUs, are in charge of calculating the intersections between rays and the different elements of the scene. To understand why the new graphics cards need this type of unit in hardware, we first have to understand how the simplest version of the ray tracing algorithm works:

For each pixel, a ray is traced into the scene and checked against each object. If the ray intersects an object, the color value of that pixel on the screen changes.
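Below is a minimal Python sketch of that loop, assuming a pinhole camera at the origin. For brevity it uses spheres instead of triangles. The sphere test is only a stand-in and is not what RT Cores execute. The names `hit_sphere` and `render` are hypothetical. The loop shows where the cost comes from: the work grows with the number of pixels multiplied by the number of objects each ray is tested against.

```python
import math
from typing import List, Optional, Sequence, Tuple

Vec = Tuple[float, float, float]
Sphere = Tuple[Vec, float, int]  # (center, radius, color)


def hit_sphere(origin: Vec, direction: Vec, center: Vec, radius: float) -> Optional[float]:
    """Distance t to the closest intersection in front of the ray origin, or None."""
    oc = [origin[i] - center[i] for i in range(3)]
    a = sum(d * d for d in direction)
    b = 2.0 * sum(oc[i] * direction[i] for i in range(3))
    c = sum(x * x for x in oc) - radius * radius
    disc = b * b - 4.0 * a * c
    if disc < 0.0:
        return None  # the ray misses the sphere
    root = math.sqrt(disc)
    for t in ((-b - root) / (2.0 * a), (-b + root) / (2.0 * a)):
        if t > 1e-6:
            return t
    return None


def render(width: int, height: int, spheres: Sequence[Sphere], background: int = 0) -> List[List[int]]:
    image = []
    for y in range(height):
        row = []
        for x in range(width):
            # Pinhole camera at the origin looking down -z: the pixel centre is
            # mapped onto the [-1, 1] x [-1, 1] image plane at z = -1.
            direction = ((x + 0.5) / width * 2.0 - 1.0,
                         1.0 - (y + 0.5) / height * 2.0,
                         -1.0)
            color, nearest = background, math.inf
            # Test this pixel's ray against every object in the scene.
            for center, radius, obj_color in spheres:
                t = hit_sphere((0.0, 0.0, 0.0), direction, center, radius)
                if t is not None and t < nearest:
                    nearest, color = t, obj_color  # keep the closest hit
            row.append(color)
        image.append(row)
    return image
```

For example, `render(5, 5, [((0.0, 0.0, -3.0), 1.0, 7)])` returns a 5×5 grid. Its centre pixel is 7, and its corner pixels keep the background value 0.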

This is repeated over and over in every frame the GPU renders with the ray tracing algorithm or one of its variants. That holds whether the whole image is ray traced or ray tracing is used only partially, to solve the indirect lighting problems that rasterization cannot solve on its own.

The Möller–Trumbore algorithm for ray-triangle intersection

Ray intersection units are fixed-function units that perform a ray-triangle intersection test, such as the Möller–Trumbore algorithm. Keep in mind that a fixed-function unit always applies the same program to its input data. That program is hard-wired: the unit's transistors are laid out so that they can only run that program and no other.
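For reference, here is a minimal Python sketch of the textbook Möller–Trumbore test. The helper names are hypothetical, and the sketch is written for readability only. It makes no claim about how any vendor's hardware implements the test. Note that the only division is the reciprocal of the determinant. Everything else reduces to additions, subtractions and multiplications.

```python
from typing import Optional, Tuple

Vec = Tuple[float, float, float]
EPS = 1e-8


def sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def moller_trumbore(origin: Vec, direction: Vec, v0: Vec, v1: Vec, v2: Vec) -> Optional[float]:
    """Distance t along the ray to triangle (v0, v1, v2), or None on a miss."""
    e1 = sub(v1, v0)
    e2 = sub(v2, v0)
    p = cross(direction, e2)
    det = dot(e1, p)
    if abs(det) < EPS:
        return None  # ray is parallel to the triangle's plane
    inv_det = 1.0 / det  # the single division of the whole test
    s = sub(origin, v0)
    u = dot(s, p) * inv_det  # first barycentric coordinate
    if u < 0.0 or u > 1.0:
        return None
    q = cross(s, e1)
    v = dot(direction, q) * inv_det  # second barycentric coordinate
    if v < 0.0 or u + v > 1.0:
        return None
    t = dot(e2, q) * inv_det
    return t if t > EPS else None  # reject hits behind the ray origin
```

A ray from (0.25, 0.25, 1) pointing down −z hits the unit triangle (0,0,0), (1,0,0), (0,1,0) at t = 1.0. Moving the origin to (2, 2, 1) returns None.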

The advantage of fixed-function units is that they need fewer transistors than programmable units, which are much more complex. However, in hardware dominated by programmable units, a fixed-function unit only makes sense if it performs its task at a speed and cost that the programmable part cannot match.

Obviously, like any algorithm, it can be executed on the shader units. For that to be a real alternative, though, those units would need to be fast enough to make the fixed-function units unnecessary.

The cost of the Möller–Trumbore algorithm

Although there are other algorithms, this is the best known and most widely used, which is why we have chosen it as an example. Believe us, its cost is far from cheap: in total it takes 27 floating-point operations per pixel. In addition, on some architectures the division is harder to implement in shaders. It is then handled not by the conventional SIMD units but by the SFUs (special function units), which can perform much more complex arithmetic operations but are slower than the units that do additions and multiplications.

In other words, we would need 27 floating-point operations not just per pixel but per pixel and per intersection test. Now think about the number of intersections and pixels in a scene, and you will get a rough idea of why intersection units, or RT Cores, are so necessary.
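That multiplication can be made concrete with a short helper, sketched here in Python. The 27-operation figure is the one used in this article. Resolution, rays per pixel and tests per ray are inputs you choose, and the values below are hypothetical. In real engines an acceleration structure such as a BVH keeps the tests per ray far below the scene's triangle count, and traversing it costs extra operations that are not counted here.

```python
def intersection_flops_per_frame(width: int, height: int, rays_per_pixel: int,
                                 tests_per_ray: int, flops_per_test: int = 27) -> int:
    """Floating-point operations spent only on ray/triangle tests in one frame."""
    rays = width * height * rays_per_pixel
    return rays * tests_per_ray * flops_per_test


def intersection_flops_per_second(fps: int, width: int, height: int,
                                  rays_per_pixel: int, tests_per_ray: int,
                                  flops_per_test: int = 27) -> int:
    """The same cost sustained at `fps` frames per second."""
    return fps * intersection_flops_per_frame(width, height, rays_per_pixel,
                                              tests_per_ray, flops_per_test)


if __name__ == "__main__":
    # Hypothetical workload: 1080p, one ray per pixel, 100 triangle tests per ray.
    per_frame = intersection_flops_per_frame(1920, 1080, rays_per_pixel=1, tests_per_ray=100)
    per_second = intersection_flops_per_second(60, 1920, 1080, rays_per_pixel=1, tests_per_ray=100)
    print(f"{per_frame:,} per frame, {per_second:,} per second")
```

With those hypothetical inputs, the triangle tests alone come to 5,598,720,000 operations per frame. At 60 fps that is 335,923,200,000 per second.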

The type of shader program that replaces RT Cores

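The APIs for real-time ray tracing, both DXR within DirectX 12 Ultimate and the ray tracing extensions for Vulkan, include a type of shader called the Intersection Shader. It lets the intersection test be written as a shader program instead of being left to the intersection units. In current practice it is used mainly for custom, non-triangle geometry. In principle, though, it is the kind of program that could take over the work of those units on hardware that lacks them.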

Keep in mind that a shader is nothing more than a program. Having programmers write their own intersection routine for every game would be tedious, which is why both APIs include example intersection shaders. The trade-off? Many developers may find both the intersection algorithm supplied with the APIs and the one built into the fixed-function units unsuitable for their needs.
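In DXR and Vulkan, custom geometry is described by axis-aligned bounding boxes. The implementation finds the boxes a ray enters and only then calls the application's intersection shader, which reports a hit distance. The Python sketch below mimics that division of labour for a flat list of primitives, using a sphere as the custom geometry. `ProceduralPrimitive`, `trace` and `make_sphere` are hypothetical names that are not part of either API, and a real implementation walks a BVH rather than a list.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Vec = Tuple[float, float, float]
# Stand-in for an intersection shader: (origin, direction) -> hit distance or None.
IntersectionFn = Callable[[Vec, Vec], Optional[float]]


@dataclass
class ProceduralPrimitive:
    box_min: Vec
    box_max: Vec
    intersect: IntersectionFn  # user-written test, like an intersection shader


def ray_hits_box(o: Vec, d: Vec, lo: Vec, hi: Vec) -> bool:
    """Slab test: does the ray (t >= 0) pass through the axis-aligned box?"""
    t_near, t_far = 0.0, math.inf
    for i in range(3):
        if d[i] == 0.0:
            if not (lo[i] <= o[i] <= hi[i]):
                return False  # parallel to this slab and outside it
            continue
        t1 = (lo[i] - o[i]) / d[i]
        t2 = (hi[i] - o[i]) / d[i]
        t_near = max(t_near, min(t1, t2))
        t_far = min(t_far, max(t1, t2))
    return t_near <= t_far


def trace(o: Vec, d: Vec, prims: List[ProceduralPrimitive]) -> Optional[float]:
    closest = None
    for prim in prims:
        if not ray_hits_box(o, d, prim.box_min, prim.box_max):
            continue  # box rejected: the user's routine is never called
        t = prim.intersect(o, d)
        if t is not None and (closest is None or t < closest):
            closest = t
    return closest


def make_sphere(center: Vec, radius: float) -> ProceduralPrimitive:
    def intersect(o: Vec, d: Vec) -> Optional[float]:
        # Nearest root only; returns None if the origin is inside the sphere.
        oc = [o[i] - center[i] for i in range(3)]
        a = sum(x * x for x in d)
        b = 2.0 * sum(oc[i] * d[i] for i in range(3))
        c = sum(x * x for x in oc) - radius * radius
        disc = b * b - 4.0 * a * c
        if disc < 0.0:
            return None
        t = (-b - math.sqrt(disc)) / (2.0 * a)
        return t if t > 0.0 else None

    lo = tuple(c - radius for c in center)
    hi = tuple(c + radius for c in center)
    return ProceduralPrimitive(lo, hi, intersect)
```

With spheres of radius 1 centred at z = −5 and z = −3, a ray from the origin along −z reports the closer hit, t = 2.0. A ray starting at x = 5 never enters either box, so neither custom routine runs.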

In hardware design, fixed-function units that act as accelerators are rarely eliminated. Instead, their capabilities are usually expanded, and they may even be made programmable. So the next step in the evolution of intersection units, if it has not already happened, is for them to become domain-specific units running microcode that can be updated.

We may therefore see new, better-performing intersection algorithms that end up written into the internal memory of each unit through a firmware update.
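To illustrate the difference between a hard-wired unit and a microcoded one, here is a hypothetical Python sketch. `MicrocodedUnit`, `FIRMWARE_V1` and `FIRMWARE_V2` are invented names and do not describe any real GPU. The unit's behaviour lives in a list of micro-operations held in its "internal memory", so a "firmware update" changes what it computes without touching the hardware. The example microprogram computes the distance t at which a ray reaches the plane x = p, a step that appears in ray-box slab tests. Version 2 replaces the division with a multiplication by a precomputed reciprocal.

```python
from typing import Dict, List, Tuple

# One micro-operation: (opcode, destination register, source A, source B).
MicroOp = Tuple[str, str, str, str]


class MicrocodedUnit:
    """Hypothetical unit whose program is data in internal memory, not wiring."""

    ALU = {
        "add": lambda a, b: a + b,
        "sub": lambda a, b: a - b,
        "mul": lambda a, b: a * b,
        "div": lambda a, b: a / b,
    }

    def __init__(self, firmware: List[MicroOp]) -> None:
        self.firmware = list(firmware)

    def update_firmware(self, firmware: List[MicroOp]) -> None:
        # Same "hardware" (ALU), new microprogram.
        self.firmware = list(firmware)

    def run(self, regs: Dict[str, float]) -> Dict[str, float]:
        regs = dict(regs)  # leave the caller's registers untouched
        for op, dst, a, b in self.firmware:
            regs[dst] = self.ALU[op](regs[a], regs[b])
        return regs


# t = (p - o) / d : distance along the ray to the plane x = p.
FIRMWARE_V1: List[MicroOp] = [("sub", "tmp", "p", "o"), ("div", "t", "tmp", "d")]
# Same result using a precomputed reciprocal inv_d = 1 / d, avoiding the division.
FIRMWARE_V2: List[MicroOp] = [("sub", "tmp", "p", "o"), ("mul", "t", "tmp", "inv_d")]
```

With p = 4, o = 1 and d = 2, version 1 yields t = 1.5. After `update_firmware(FIRMWARE_V2)`, the same unit yields t = 1.5 from inv_d = 0.5.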

Fixed function units have never been removed from a GPU

A GPU has a series of fixed-function units for rendering 3D graphics. Like the intersection units, they perform highly repetitive tasks in every frame. We are referring to units such as the texture units, the units in charge of rasterizing geometry, and so on.

These units have never been eliminated just because shader units could carry out their tasks. What's more, if we took a GPU without these fixed-function units and made it render a 3D scene, it would be an order of magnitude less efficient than a GPU with fewer shader units but with those units included.

The trend is always the same. Suppose a task repeats over and over in every frame and would take up a good part of the time and resources of the units that execute shaders. Sooner or later, a specialized unit is created that not only offloads that task from those units but also performs it faster and at a fraction of the cost.