AMD RDNA 2, the Graphics Architecture of the RX 6000, XSX, XSS and PS5

The RDNA 2 architecture has already reached the market: it is found inside the SoCs of the new-generation PlayStation 5 and Xbox Series X/S consoles. But what secrets does this AMD graphics architecture hold? What does it bring that is new compared to its predecessor?

The RDNA 2 architecture, also known as Big Navi, is a leap in quality for AMD graphics cards: it is the first AMD architecture with dedicated hardware for ray tracing acceleration, and it adds Variable Rate Shading support and many other changes that we will go into in detail.

RDNA 2, an evolved version of RDNA

RDNA 2 is an incremental improvement on RDNA: AMD has taken the existing architecture as a base and made a series of changes to it in order to obtain greater efficiency and performance, while adding some new features to keep the architecture competitive.

Machine Learning on AMD RDNA 2

In the RX 5700 and RX 5600, based on Navi 10, AMD did not add support for Int8 and Int4 instructions, despite having done so in the 7nm Vega. Curiously, AMD did add Int8 and Int4 instructions to the RX 5500 and RX 5300, along with BF16, all of which are widely used for artificial intelligence.

In other words, the low-end RDNA-based RX 5000 cards could work at the precisions best suited to artificial intelligence algorithms, while the high-end cards could not. AMD has solved this in RDNA 2, where the entire range supports data at these precisions, which gives full support for DirectML.
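To make the appeal of low-precision integers concrete, here is a minimal Python sketch of an Int8 dot product, the kind of operation these instructions are meant to speed up. The function names and the single symmetric scale per vector are illustrative assumptions; this is not AMD's instruction set or the DirectML API.

```python
def quantize_int8(values):
    """Map floats to the range [-127, 127] using one symmetric scale factor."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def int8_dot(a_q, b_q):
    """Integer dot product. Python ints are unbounded, standing in for a wide accumulator."""
    return sum(x * y for x, y in zip(a_q, b_q))


def approx_dot(a, b):
    """Approximate a float dot product by doing the multiply-adds on Int8 values."""
    a_q, a_scale = quantize_int8(a)
    b_q, b_scale = quantize_int8(b)
    return int8_dot(a_q, b_q) * a_scale * b_scale
```

The result is only an approximation of the floating-point dot product, which is usually acceptable for inference workloads and is the trade-off that makes narrow integer formats attractive.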

Ray Tracing in AMD RDNA 2

AMD has added a ray intersection unit for ray tracing, called the Ray Accelerator Unit, within each of the Compute Units. This type of unit plays the same role as the NVIDIA RT Core: it takes care of a task that would require enormous power if done on the Compute Units themselves.
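To give an idea of the arithmetic behind a single ray intersection, here is a minimal software sketch of one common test in ray tracing, the "slab" test of a ray against an axis-aligned bounding box. It is an illustration of the kind of work involved, not a description of how AMD's hardware is built; the function name and parameters are assumptions.

```python
import math


def ray_hits_aabb(origin, direction, box_min, box_max):
    """Slab test: True if origin + t * direction, with t >= 0, touches the box."""
    t_near, t_far = 0.0, math.inf
    for o, d, lo, hi in zip(origin, direction, box_min, box_max):
        if d == 0.0:
            # Ray is parallel to this pair of planes: it must start between them.
            if o < lo or o > hi:
                return False
            continue
        t0 = (lo - o) / d
        t1 = (hi - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        # Shrink the interval of t for which the ray is inside every slab so far.
        t_near = max(t_near, t0)
        t_far = min(t_far, t1)
        if t_near > t_far:
            return False
    return True
```

A single frame can require a very large number of such tests, which is why running them on general-purpose shader ALUs is so costly.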

AMD has not given an equivalent compute figure, but Microsoft's engineers stated at Hot Chips that matching the performance of the Ray Accelerator Units, counting only the ray intersection calculations, would require a GPU of 25 TFLOPS, more than double the power of the Xbox Series X. Without units of this type, ray tracing becomes so slow that it is unworkable.

For example, on a GPU of the RX 6000 family running the “Procedural Geometry Sample application” benchmark from the DXR SDK, the result with the Ray Accelerator Units is 471 FPS, while with them disabled and going through the “DXR Fallback Software layer” it reaches only 34 FPS, nearly 14 times slower.

New scheduler on the Compute Unit

Another improvement AMD has made is a new scheduler in the Compute Units. RDNA still used the GCN scheduler, which supported about 40 wavefronts; RDNA 2 replaces it with one that supports 32 wavefronts. Although the figure is lower, the ALUs are used much more effectively when executing instructions, so performance ends up being much higher.

Variable Rate Shading in AMD RDNA 2

Typically, the ratio of pixel (or fragment) shader invocations to pixels on screen is 1:1. The idea of Variable Rate Shading is to apply a single invocation to a group of pixels, so that the rate of “shading” varies. One reason to do this is that the pixels in the group would end up with the same value, so repeating the same instruction for each of them is a waste of resources, since it will always give the same result. It is important to highlight that the resolution, the density of pixels on screen, does not vary.
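The idea can be sketched in a few lines of Python. This is a toy software model, assuming a hypothetical `shade(x, y)` callback and a fixed block size; it only illustrates that the shader runs fewer times while the output keeps its full resolution.

```python
def shade_with_rate(width, height, shade, block_w=1, block_h=1):
    """Run `shade` once per block and copy the result to every pixel in the block.

    Returns the full-resolution image and the number of shader invocations.
    """
    image = [[None] * width for _ in range(height)]
    invocations = 0
    for by in range(0, height, block_h):
        for bx in range(0, width, block_w):
            color = shade(bx, by)  # one invocation covers the whole block
            invocations += 1
            for y in range(by, min(by + block_h, height)):
                for x in range(bx, min(bx + block_w, width)):
                    image[y][x] = color
    return image, invocations
```

With a 2x2 block on an area of uniform color, the image is identical to the 1:1 version but the shader runs a quarter as many times.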

If we take a look at AMD's patents on VRS, we can deduce how it is done inside the GPU:

Supporting VRS requires that the GPU's raster units and ROPs (Render Backends in AMD terminology) be modified to support this rendering technique, which avoids repeating the same calculation for groups of pixels with the same attributes.

Infinity Cache

Exclusive to the RX 6000 and totally absent on consoles, at least on Xbox Series X, the Infinity Cache is a last-level cache and at the same time a victim cache: it rescues the cache lines discarded by the L2 cache, so that in most cases the GPU does not have to go to VRAM to recover the data, which saves both access time and energy. The Infinity Cache is key to the RX 6000 achieving higher clock speeds than RDNA.
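How a victim cache behaves can be shown with a toy Python model. The class name, the LRU replacement policy and the tiny capacities are assumptions chosen for illustration; AMD has not published the real policy in this level of detail.

```python
from collections import OrderedDict


class TwoLevelCache:
    """Toy model: an LRU L2 backed by an LRU victim cache holding L2's evicted lines."""

    def __init__(self, l2_lines, victim_lines):
        self.l2 = OrderedDict()
        self.victim = OrderedDict()
        self.l2_lines = l2_lines
        self.victim_lines = victim_lines
        self.vram_reads = 0

    def _insert_l2(self, addr):
        self.l2[addr] = True
        if len(self.l2) > self.l2_lines:
            evicted, _ = self.l2.popitem(last=False)  # drop least recently used line
            self.victim[evicted] = True               # rescued by the victim cache
            if len(self.victim) > self.victim_lines:
                self.victim.popitem(last=False)

    def read(self, addr):
        """Return which level served the read: 'l2', 'victim' or 'vram'."""
        if addr in self.l2:
            self.l2.move_to_end(addr)
            return "l2"
        if addr in self.victim:
            del self.victim[addr]  # the line moves back up into L2
            self._insert_l2(addr)
            return "victim"
        self.vram_reads += 1
        self._insert_l2(addr)
        return "vram"
```

In this model, a line pushed out of a small L2 is still served from the victim cache on its next access, so only first-time reads reach VRAM.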