The 3D Pipeline: This Is How All GPUs Render Graphics

The way GPUs generate graphics in real time can seem like magic, but they all follow the same procedure and the same stages. That is why we have decided to organize a trip through the 3D pipeline, which is common to all APIs and all 3D architectures.

In computing, a pipeline is an ordered series of steps that is repeated over and over to perform a task. The 3D pipeline has stayed essentially the same for decades, and the changes to it have been small optimizations.

We will make this trip in a completely generic way, without focusing on any particular GPU architecture or graphics card.

Shader programs in the graphical pipeline

A shader is nothing more than a small program, written in a high-level language, that is used to modify a graphics element, be it a vertex or a pixel. These programs are executed many times during the graphics pipeline by units that are processors in their own right, which we call shader units.

During the various stages of the pipeline, data passes from the shader units to other shader units and to fixed-function units. To date, there is no specialized shader unit for each type of shader; a generic processor is used instead. At the end of the day, although we distinguish a vertex from a pixel, to a processor they are nothing more than binary data to process.

The first stage of the 3D pipeline occurs on the CPU

A GPU is not a CPU. That is obvious, but it needs to be stated because the two do not work in the same way. A GPU does not run an operating system or applications of its own; its job is to read a display list that the CPU writes, which is nothing more than a list of instructions on how to draw the next frame.

The CPU writes this list to a region of main RAM, and the GPU accesses that region through a DMA unit and copies the list. The list is always found in the same part of RAM, and the graphics card consults it continuously to generate each of the frames on the screen.

The information contained in the list allows the GPU to compose a 3D scene, and it also includes instructions for manipulating the different elements. The list is processed by the GPU's command processor, which coordinates the rest of the components of the graphics chip during the different stages.
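As a rough illustration of this arrangement, the sketch below models the list as a plain Python list that a CPU-side function writes and a GPU-side command processor reads in order. The opcodes (CLEAR, SET_TRANSFORM, DRAW, PRESENT) and the function names are hypothetical, invented for this example; real command formats are specific to each vendor and driver.

```python
from collections import namedtuple

# Hypothetical command format: an opcode plus its arguments.
Command = namedtuple("Command", ["opcode", "args"])

def build_frame_commands(objects):
    """CPU side: write the list of instructions describing the next frame."""
    commands = [Command("CLEAR", {"color": (0, 0, 0)})]
    for obj in objects:
        commands.append(Command("SET_TRANSFORM", {"position": obj["position"]}))
        commands.append(Command("DRAW", {"mesh": obj["mesh"]}))
    commands.append(Command("PRESENT", {}))
    return commands

def command_processor(commands):
    """GPU side: read the list in order and dispatch each command."""
    state = {"position": None}
    log = []
    for cmd in commands:
        if cmd.opcode == "CLEAR":
            log.append("clear framebuffer")
        elif cmd.opcode == "SET_TRANSFORM":
            state["position"] = cmd.args["position"]
        elif cmd.opcode == "DRAW":
            log.append(f"draw {cmd.args['mesh']} at {state['position']}")
        elif cmd.opcode == "PRESENT":
            log.append("present frame")
        else:
            raise ValueError(f"unknown opcode: {cmd.opcode}")
    return log

scene = [
    {"mesh": "cube", "position": (0, 0, -5)},
    {"mesh": "sphere", "position": (2, 0, -8)},
]
frame_log = command_processor(build_frame_commands(scene))
```

The point of the design is the separation of roles: the CPU only describes the frame, while the command processor turns that description into work for the rest of the chip.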

World Space Pipeline

The World Space Pipeline is the first half of the part of the pipeline that runs on the GPU when creating real-time 3D scenes. It is so called because this is where the elements of the world are arranged and drawn before they are projected. In this part we work with vectors in three-dimensional space, while in the second half of the 3D pipeline we work with pixels.

Today it is the lightest part in terms of computation, which is curious because, at the dawn of 3D graphics, calculating the geometry of the scene was what gave engineers the biggest headaches when creating hardware capable of displaying 3D graphics in real time.

Second to fifth stage of the 3D pipeline: Transformation matrices

The matrices used during the geometric pipeline represent a series of successive arithmetic operations, which are carried out one after another and correspond to various stages of the geometric pipeline. We are not going to go into the mathematics behind them, because we want to keep things as simple as possible.

They are applied in the following order to each of the objects in a 3D scene.

Model Matrix: The first matrix transforms the local coordinates of each object into common world coordinates.
View Matrix: The second matrix rotates and moves each object to position it according to the camera's point of view.
Projection Matrix: The third matrix transforms the objects according to their distance from the camera, making the close ones look bigger and the far ones smaller.
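The three matrices above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not how any particular GPU or API implements it: the helper names (mat_mul, transform, translation, perspective, project) are made up for the example, the view matrix is reduced to a simple translation with no rotation, and the projection follows the common OpenGL-style convention in which the camera looks down the negative Z axis.

```python
import math

def mat_mul(a, b):
    """Multiply two 4x4 matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def transform(m, v):
    """Apply a 4x4 matrix to a 4-component column vector."""
    return [sum(m[i][k] * v[k] for k in range(4)) for i in range(4)]

def translation(tx, ty, tz):
    """Matrix that moves a point by (tx, ty, tz)."""
    return [[1, 0, 0, tx],
            [0, 1, 0, ty],
            [0, 0, 1, tz],
            [0, 0, 0, 1]]

def perspective(fov_y_deg, aspect, near, far):
    """OpenGL-style perspective matrix; the camera looks down -Z."""
    f = 1.0 / math.tan(math.radians(fov_y_deg) / 2.0)
    return [[f / aspect, 0, 0, 0],
            [0, f, 0, 0],
            [0, 0, (far + near) / (near - far), 2 * far * near / (near - far)],
            [0, 0, -1, 0]]

def project(vertex, model, view, projection):
    """Model -> view -> projection, followed by the perspective divide."""
    mvp = mat_mul(projection, mat_mul(view, model))
    x, y, z, w = transform(mvp, [vertex[0], vertex[1], vertex[2], 1.0])
    return (x / w, y / w, z / w)

view = translation(0, 0, -3)  # camera sits at z = +3, so the world shifts by -3
proj = perspective(90.0, 1.0, 0.1, 100.0)
local_vertex = (1.0, 1.0, 0.0)  # same vertex of the same model, placed twice

near_obj = project(local_vertex, translation(0, 0, 0), view, proj)
far_obj = project(local_vertex, translation(0, 0, -7), view, proj)
```

With these values the vertex of the nearby object lands at roughly one third of the way to the edge of the screen, while the same vertex on the distant object lands at roughly one tenth, which is the "close objects look bigger" effect of the projection matrix.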

Today GPUs perform all the computations for the transformation matrices in the shader units, because these are powerful enough to do so. In the past, fixed-function units and even combinations of fixed-function and programmable hardware were used, but this is no longer the case and the entire World Space Pipeline runs on shaders.

Shader types in the World Space Pipeline

The first of them is the Vertex Shader, which is the most common and most widely used. It allows a program to modify the values of each vertex, such as its color, position, length and even the texture it is associated with.

The second and third types are the Hull Shader and the Domain Shader, which are used only for tessellating geometry. Tessellation consists of creating new vertices for a three-dimensional object without losing its external shape.

The fourth type is the Geometry Shader, which was introduced with DirectX 10 and runs right at the end of the geometric pipeline. It takes a group of vertices that form a single primitive and, through a shader program, transforms them to create a new primitive by modifying the original shape.

As of DirectX 12 Ultimate, these four shaders have been grouped into two different shaders called the Amplification Shader and the Mesh Shader. The first one does not operate on any primitives; it decides how many Mesh Shader groups are to be executed, and it is completely optional. The Mesh Shader, in turn, replaces all the shaders of the World Space Pipeline with a single type of shader.

Screen Space Pipeline

The Screen Space Pipeline is the second half of the 3D pipeline, where objects are transformed into two-dimensional space on the screen and manipulated. At this stage, the GPU gives color and texture to the different elements that make up the scene to later send the final result to the image buffer.

Sixth stage of the 3D pipeline: Scan Conversion or Rasterization

At the end of the World Space Pipeline all the objects are positioned correctly according to their distance and position with respect to the camera, so it is time to convert the scene into a 2D image. The first step is to separate out the depth value of each object with respect to the camera and store it in an image buffer called the Z-buffer, where each position holds the distance of the corresponding pixel from the camera.

This process is carried out by specialized units inside the GPU, which have been an integral part of every graphics chip dedicated to rendering this type of graphics since the first 3D accelerators. In other words, it is the hardware itself that performs this work.
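To make the idea concrete, here is a minimal software sketch of scan conversion with a Z-buffer. It is only an illustration: real GPUs do this in fixed-function hardware, and the function names, the brute-force loop over every pixel, the edge-function coverage test and the "smaller depth wins" rule are all assumptions chosen for the example.

```python
def edge(ax, ay, bx, by, px, py):
    """Signed area test: which side of edge A->B the point P lies on."""
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)

def rasterize(triangle, width, height, z_buffer, color_buffer, color):
    """Scan-convert one triangle given as three (x, y, z) screen-space vertices."""
    (x0, y0, z0), (x1, y1, z1), (x2, y2, z2) = triangle
    area = edge(x0, y0, x1, y1, x2, y2)
    if area == 0:
        return 0  # degenerate triangle covers no pixels
    covered = 0
    for y in range(height):
        for x in range(width):
            px, py = x + 0.5, y + 0.5  # sample at the pixel center
            # Barycentric weights; dividing by area handles both windings.
            w0 = edge(x1, y1, x2, y2, px, py) / area
            w1 = edge(x2, y2, x0, y0, px, py) / area
            w2 = edge(x0, y0, x1, y1, px, py) / area
            if w0 < 0 or w1 < 0 or w2 < 0:
                continue  # pixel center is outside the triangle
            z = w0 * z0 + w1 * z1 + w2 * z2  # interpolated depth
            if z < z_buffer[y][x]:  # closer than what is already stored
                z_buffer[y][x] = z
                color_buffer[y][x] = color
                covered += 1
    return covered

WIDTH, HEIGHT = 8, 8
z_buffer = [[float("inf")] * WIDTH for _ in range(HEIGHT)]
color_buffer = [[None] * WIDTH for _ in range(HEIGHT)]

far_tri = [(0, 0, 0.9), (8, 0, 0.9), (0, 8, 0.9)]
near_tri = [(0, 0, 0.2), (4, 0, 0.2), (0, 4, 0.2)]
far_count = rasterize(far_tri, WIDTH, HEIGHT, z_buffer, color_buffer, "red")
near_count = rasterize(near_tri, WIDTH, HEIGHT, z_buffer, color_buffer, "blue")
```

Even on this tiny 8 × 8 grid, the three vertices of the larger triangle turn into dozens of covered pixels, and the Z-buffer ensures that the nearer triangle ends up on top wherever the two overlap.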

Once rasterization is finished, the texturing process begins: the fragments are sent back to the shader units for texturing and for the application of Pixel Shaders.

Seventh stage of the 3D pipeline: texturing

The next step is to give texture to the different fragments. To visualize the concept, imagine that on the one hand we have a series of surfaces, which are the fragments, and on the other a series of stickers that we have to stick onto each fragment.

From the beginning of the pipeline, each surface is assigned a series of parameters, including the memory address where its textures are located, so that the textures can be applied correctly and each pixel ends up in its correct position.

The placement of the textures is carried out by specialized units inside the GPU. Today they sit within the units in charge of executing the shader programs and are connected to the first-level data cache.

The data / texture cache and texture filtering

In diagrams of any GPU you will have noticed that the texture filtering unit sits very close to the compute units that run the shaders, as does a data cache that is also called the texture cache.

The ALUs of the shader units always operate on the data that is in their registers, but there are times when they need data that lies further away. That data is fetched through a cache system, and it includes, of course, the textures themselves or pieces of them.

These textures arrive unfiltered, which causes a pixelated effect that is corrected through interpolation. The most basic method is bilinear interpolation, which consists of interpolating the values of 4 neighboring pixels. This is why the texture units in GPUs are grouped in fours within the shader units.
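Bilinear interpolation itself is short enough to sketch. The example below is a minimal illustration rather than the behavior of any specific texture unit: the function name bilinear_sample, the single-channel texture stored as a list of rows, the texel centers at half-integer positions and the clamp-to-edge handling at the borders are all assumptions made for the example.

```python
def bilinear_sample(texture, u, v):
    """Sample a grayscale texture (list of rows) at normalized (u, v) in [0, 1]."""
    height, width = len(texture), len(texture[0])
    # Map normalized coordinates to texel space, with texel centers at +0.5,
    # and clamp so that samples near the border reuse the edge texels.
    x = min(max(u * width - 0.5, 0.0), width - 1.0)
    y = min(max(v * height - 0.5, 0.0), height - 1.0)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, width - 1), min(y0 + 1, height - 1)
    fx, fy = x - x0, y - y0  # fractional position between the 4 neighbors
    top = texture[y0][x0] * (1 - fx) + texture[y0][x1] * fx
    bottom = texture[y1][x0] * (1 - fx) + texture[y1][x1] * fx
    return top * (1 - fy) + bottom * fy

checker = [[0.0, 1.0],
           [1.0, 0.0]]
center = bilinear_sample(checker, 0.5, 0.5)   # halfway between all 4 texels
corner = bilinear_sample(checker, 0.25, 0.25)  # exactly on the first texel
```

Sampling in the middle of the four texels returns their average, while sampling exactly on a texel center returns that texel unchanged, which is what smooths out the pixelated look between texels.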

Pixel Shaders

Only one type of shader is used in the Screen Space Pipeline: the Pixel Shader. It manipulates the values of the pixels through a series of programs and, together with the Vertex Shader, is one of the two essential shader types.

The Pixel Shader is the most computationally demanding part of the entire graphics pipeline, because a triangle has only three vertices but can cover a large number of pixels. In addition, since GPUs work internally with 2 × 2 blocks of pixels in each shader unit, its operation is different from that of the rest of the shaders, and it is the only type that works with data outside the registers.

Last stage of the 3D pipeline: Render Output

The final part is the Render Output, a series of units that copy the final result of each pixel into the final image buffer. This buffer is usually found in VRAM, but more modern systems write to the GPU's own L2 cache to speed up post-processing.

The units in charge of this stage are called Render Output units, abbreviated ROP (plural ROPs). They trace their origins to the Blitter units of the Commodore Amiga and the Atari ST, which transfer a block of data from one memory location to another and apply an operation to it along the way.
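The blitter idea of "a block transfer with an operation in the middle" can be sketched as follows. This is a toy model only: the function names blit and replace, the two-dimensional lists standing in for memory and the choice of a bitwise OR as the combining operation are assumptions for illustration, not a description of how any real blitter or ROP is programmed.

```python
def replace(source, destination):
    """Default operation: the source value simply overwrites the destination."""
    return source

def blit(src, dst, src_x, src_y, dst_x, dst_y, width, height, op=replace):
    """Copy a width x height block from src into dst, combining values with op."""
    for row in range(height):
        for col in range(width):
            s = src[src_y + row][src_x + col]
            d = dst[dst_y + row][dst_x + col]
            dst[dst_y + row][dst_x + col] = op(s, d)

sprite = [[1, 2],
          [3, 4]]
screen = [[0] * 4 for _ in range(3)]

# Plain copy of the sprite to position (1, 1) on the screen.
blit(sprite, screen, 0, 0, 1, 1, 2, 2)
# Second copy at (2, 1), OR-ing the sprite with what is already there.
blit(sprite, screen, 0, 0, 2, 1, 2, 2, op=lambda s, d: s | d)
```

In the same spirit, a modern ROP does more than copy: the operation applied on the way to the image buffer is where steps such as blending the new pixel with the stored one take place.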

After this stage, the image is in the image buffer and is ready to be displayed on the user's screen. Post-processing effects may also be applied, but these fall outside the 3D pipeline and are performed as if a 2D image were being manipulated.