Why does AMD have worse performance in Ray Tracing?

One of the most discussed topics is the poor Ray Tracing performance of AMD graphics cards, especially compared to NVIDIA‘s. However, many are up in arms when we say that the Radeon Technologies Group's implementation of the necessary hardware is so poor that it literally looks like a boycott of the technology's adoption. Let’s remember that Ray Tracing is ideal for solving certain visual problems in computer graphics, and that it was not invented by the GeForce manufacturer either.

For us, the main function of a graphics card is to let us play our games smoothly and with good performance, while also being more than enough for more professional tasks such as video editing or the creation of 3D models. When we say that AMD performs poorly in Ray Tracing, we are not putting NVIDIA on a pedestal; rather, as users ourselves, we are saddened to see that something that could be much better on Radeons is not.

The ray tracing algorithm

To understand the poor Ray Tracing performance of AMD cards, we first have to understand that ray tracing is a recursive algorithm for generating a complete scene, which in its simplest version can be summarized as follows:

For each pixel in the scene:
- Calculate the view ray.
  - If the ray hits an object, evaluate the color of the object.
  - If not, the pixel takes the background color.
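
The loop above can be sketched in a few lines of Python. This is only a minimal illustration under our own assumptions, not how any GPU or game engine does it: the scene is a single hypothetical sphere, the camera sits at the origin outside it, and the "colors" are just two text characters. The names `hit_sphere` and `render` are ours.

```python
import math

def hit_sphere(origin, direction, center, radius):
    """True if the ray origin + t*direction hits the sphere in front of the origin.

    Assumes the origin lies outside the sphere, so only the nearest root is tested.
    """
    oc = tuple(o - c for o, c in zip(origin, center))
    a = sum(d * d for d in direction)
    b = 2.0 * sum(d * o for d, o in zip(direction, oc))
    c = sum(o * o for o in oc) - radius * radius
    disc = b * b - 4.0 * a * c
    if disc < 0.0:
        return False  # the ray misses the sphere entirely
    t = (-b - math.sqrt(disc)) / (2.0 * a)
    return t > 0.0

def render(width, height):
    """For each pixel: build the view ray, then pick object or background color."""
    camera = (0.0, 0.0, 0.0)
    center, radius = (0.0, 0.0, -3.0), 1.0  # hypothetical one-object scene
    rows = []
    for y in range(height):
        row = ""
        for x in range(width):
            # View ray through the pixel center on an image plane at z = -1.
            u = (x + 0.5) / width * 2.0 - 1.0
            v = 1.0 - (y + 0.5) / height * 2.0
            direction = (u, v, -1.0)
            row += "#" if hit_sphere(camera, direction, center, radius) else "."
        rows.append(row)
    return rows

for line in render(21, 11):
    print(line)
```

A real ray tracer would also spawn new rays from every hit point (reflections, shadows), which is where the recursion comes from.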

The ray is nothing more than a vector that starts at the camera that “records” the scene and passes through a grid of points, each of which is a pixel. For every one of them, an intersection check has to be performed against the scene. So, for a scene in Full HD, about 2 million checks have to be carried out per frame; if the game runs at 60 FPS, that is about 120 million checks per second.
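
The arithmetic behind those figures, as a quick sketch. It only counts one view ray per pixel; the function name is ours, and any extra rays for reflections or shadows would come on top.

```python
def primary_rays_per_second(width, height, fps):
    """One view ray per pixel, per frame (secondary rays are not counted)."""
    return width * height * fps

per_frame = 1920 * 1080                               # Full HD
per_second = primary_rays_per_second(1920, 1080, 60)  # at 60 FPS

print(per_frame)   # 2073600
print(per_second)  # 124416000
```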

Mathematically, the most common formula for this check is not a simple operation but a fairly complex vector equation that requires considerable computing power. So much so that simply lacking a dedicated unit that performs this task in parallel can cut performance to a single-digit percentage of what it would otherwise be.
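
To give an idea of the work involved, here is a sketch of the Möller–Trumbore ray-triangle test, a widely used formulation of this check. We are assuming it purely as an illustration: neither NVIDIA nor AMD documents exactly what their units compute internally, and the helper names are ours.

```python
def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def ray_triangle(origin, direction, v0, v1, v2, eps=1e-9):
    """Return the distance t along the ray to the triangle, or None on a miss."""
    edge1 = sub(v1, v0)
    edge2 = sub(v2, v0)
    pvec = cross(direction, edge2)
    det = dot(edge1, pvec)
    if abs(det) < eps:
        return None  # ray is parallel to the triangle's plane
    inv_det = 1.0 / det
    tvec = sub(origin, v0)
    u = dot(tvec, pvec) * inv_det  # first barycentric coordinate
    if u < 0.0 or u > 1.0:
        return None
    qvec = cross(tvec, edge1)
    v = dot(direction, qvec) * inv_det  # second barycentric coordinate
    if v < 0.0 or u + v > 1.0:
        return None
    t = dot(edge2, qvec) * inv_det
    return t if t > eps else None  # hits behind the origin do not count

tri = ((-1.0, -1.0, -2.0), (1.0, -1.0, -2.0), (0.0, 1.0, -2.0))
print(ray_triangle((0.0, 0.0, 0.0), (0.0, 0.0, -1.0), *tri))  # 2.0
print(ray_triangle((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), *tri))   # None
```

That is two cross products, four dot products and a division for a single ray against a single triangle, repeated many millions of times per second.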

Hardware intersection units

That is why NVIDIA has the RT Cores and AMD has the Ray Accelerator Units: they are the same type of unit and are used for the same task. However, in the previous generation, the RX 6000, AMD's unit had a limitation that RTG has fortunately solved in RDNA 3 and, consequently, in the RX 7000 range.

What is the problem, then?

The good news is that what was missing in RDNA 2 has now been included in RDNA 3.
The bad news, and what causes the poor Ray Tracing performance on AMD, is the number of ray-triangle intersections it can calculate. A jump of only 50% is very poor when your rival has doubled its performance from one generation to the next.

Let’s not forget that the first 3D cards on the market were all about accelerating triangle rasterization ever faster, since it is the most common operation in that pipeline. The ray-triangle intersection plays the same role in ray tracing. So the fact that AMD has made such a small leap here is disappointing.

How does it affect overall performance?

Although ray intersection is only one part of the whole, it is an essential element common to every scene. Let’s not forget that rendering proceeds in stages, and when one stage runs slower than it should, it ends up affecting the performance of the subsequent ones.

Therefore, if we manage to speed up a stage, the same frame takes less time to generate; that is, it takes fewer milliseconds, which means more frames per second. What has to be clear is that the intersection process is recursive and continuous in Ray Tracing, so this part needs to perform well.
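
A toy calculation makes the relationship between stage time and frame rate concrete. The stage names and millisecond values below are invented for illustration only, and the model simply adds the stages up, whereas a real GPU overlaps part of the work.

```python
def frames_per_second(stage_ms):
    """Frame rate if the stages run one after another (a simplification)."""
    frame_ms = sum(stage_ms.values())
    return 1000.0 / frame_ms

# Hypothetical per-frame timings in milliseconds.
baseline = {"intersection": 8.0, "shading": 6.0, "post_processing": 2.0}
# Same frame with the intersection stage running twice as fast.
faster = dict(baseline, intersection=4.0)

print(frames_per_second(baseline))            # 62.5
print(round(frames_per_second(faster), 1))    # 83.3
```

In this made-up case, halving the time of a single stage takes the frame from 16 ms to 12 ms without touching anything else.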

The other problem: floating point performance

GPUs typically work on blocks of data in lockstep, applying the same instruction to all of them. That is why their quintessential type of unit is the SIMD unit, which, as its name suggests, applies a single instruction to multiple data items at the same time. With the RTX 30, NVIDIA made a rather curious improvement that allows it to calculate twice as many 32-bit floating point operations per clock cycle and core.

The trick was to add a second 16-element SIMD unit to each of the four sub-cores, for a total of 64 additional operations per core inside the GPU. However, they did not increase the number of registers or access ports, since the new unit shares its path with the integer unit. What does this translate to? Both the RTX 30 and the RTX 40 achieve double the floating point performance only under certain conditions, not always.

AMD, on the other hand, has sought another solution, which it calls Dual Issue. According to its technical specifications, the number of floating point units has not increased; instead, under certain conditions two instructions can be issued at the same time. The number of units per core, or Compute Unit, is still a maximum of 64, instead of 128 as in NVIDIA's case.

What does AMD mean by “Dual Issue” in RDNA 3?

The floating point figures that AMD publishes are, as usual, a theoretical maximum: they assume that the FMA instruction (fused multiply-add, a floating point multiplication followed by an addition) is executed 100% of the time. This is unrealistic, since it ignores memory accesses and the fact that programs do not always use that instruction, although it is the most used one when generating graphics. The key point is that each FMA instruction counts as 2 operations.

Well, what AMD has done is allow certain instructions to be packed in pairs in the calculation units, which under certain conditions yields twice the floating point throughput of RDNA 2. It is the same situation as with NVIDIA GPUs: the floating point power is not doubled in general, only under certain conditions. So it is a shared problem. In any case, the measurement in TFLOPS is still a marketing trick today.
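
To see where such headline numbers come from, here is a rough sketch of the usual theoretical-peak calculation. The function name and the `issue_factor` parameter are ours; the core counts, clocks and units per core are the ones quoted in this article, and the result is a best case that assumes every unit executes an FMA on every cycle.

```python
def peak_tflops(cores, lanes_per_core, mhz, issue_factor=1):
    """Theoretical FP32 peak: every lane retires one FMA (2 operations) per clock."""
    ops_per_clock = cores * lanes_per_core * 2 * issue_factor
    return ops_per_clock * mhz / 1e6  # MHz -> millions of ops/s -> TFLOPS

# RX 7900 XTX: 96 Compute Units, 64 units each, 2500 MHz.
print(peak_tflops(96, 64, 2500))                  # 30.72 without Dual Issue
print(peak_tflops(96, 64, 2500, issue_factor=2))  # 61.44 if every pair can be packed

# RTX 3090 Ti: 84 cores, 128 units each, 1860 MHz.
print(round(peak_tflops(84, 128, 1860), 2))       # 40.0
```

The gap between the two RX 7900 XTX lines is exactly the "under certain conditions" caveat: the higher figure only holds when Dual Issue applies to every instruction.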

So why does this matter for AMD’s Ray Tracing performance? Because it helps us gauge the computing power of the units used in the stages of ray tracing other than ray intersection. In any case, AMD itself claims that the generational improvement is 18% at the same clock speed.

AMD GPU Performance in Ray Tracing: The Numbers

If we compare the performance of the intersection units across the different generations of graphics cards from NVIDIA and AMD, we will see what the problem is.

GPU	Intersections/s (millions)	Cores (SM or CU)	Clock (MHz)	Intersections per core per clock
RTX 2080 Ti	105600	68	1545	1
RTX 3090 Ti	312480	84	1860	2
RTX 4090	1290240	128	2520	4
RX 6950 XT	184800	80	2310	1
RX 7900 XTX	360000	96	2500	1.5

At first glance, going by the second column, the raw power of the RX 7900 XTX in this aspect is higher than that of an RTX 3090 Ti. However, it is the last column that matters, as it tells us how many intersections each core computes per clock cycle. And the disappointment comes from the fact that, although nobody asks AMD to match the RTX 40's figure of 4, it should at least reach the RTX 30's 2. This is the main reason for the poor performance of AMD graphics cards in Ray Tracing, and the reason why we think they could have done much better.
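
The last column can be reproduced from the other three. The short sketch below does so for the two AMD cards and for the RTX 3090 Ti that we use as a yardstick; the function and variable names are ours, and the inputs are the table values.

```python
def per_core_per_clock(millions_per_second, cores, mhz):
    """Intersections each core completes per clock cycle.

    Millions of intersections per second divided by (cores * MHz):
    the two factors of one million cancel out.
    """
    return millions_per_second / (cores * mhz)

# (intersections/s in millions, cores, MHz), taken from the table.
TABLE = {
    "RTX 3090 Ti": (312_480, 84, 1860),
    "RX 6950 XT": (184_800, 80, 2310),
    "RX 7900 XTX": (360_000, 96, 2500),
}

rates = {name: per_core_per_clock(*row) for name, row in TABLE.items()}
for name, rate in rates.items():
    print(name, rate)

# Generational gain per core and clock on the AMD side.
amd_gain = rates["RX 7900 XTX"] / rates["RX 6950 XT"] - 1.0
print(f"{amd_gain:.0%}")  # 50%
```

The RX 7900 XTX wins on the raw total only because it has more cores running at a higher clock; per core and clock it stays at 1.5 against the RTX 3090 Ti's 2.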

What is more, and to wrap up, the Ray Accelerator Unit is a black box in itself that can be replaced without affecting the rest of the architecture. AMD could well release an RX 7x50 range next year that keeps everything good about the current RDNA 3 but with an improved RAU, and see gaming frame rates increase by double-digit percentages.

How does RDNA 3 perform in games with Ray Tracing?

Now, as the cherry on the cake, let’s talk about how it performs in games. Since AMD publicly claimed a 50% improvement, we should expect an equally large jump. However, we later discovered that the figure refers to performance per watt, at a particular power level and in one specific game that has not been named. So the important thing is to find out what the real improvement over the previous generation is, especially since AMD starts from the rather poor ray tracing performance of the RX 6000.