RV32X: the RISC-V Variant that Will Allow Open Source Graphics Cards

The market for graphics processors (GPUs), whether as discrete graphics cards or integrated GPUs, is currently dominated by NVIDIA, AMD and, to a lesser extent, Intel. Does this mean it is impossible to build a GPU without proprietary technology? A solution may come from RISC-V, a completely free and open ISA, in a variant designed for graphics called RV32X.

The team at Pixilica has put forward a proposal called RV32X, which aims to build GPUs on the RISC-V ISA. Given the completely free and open nature of RISC-V, it could change the future of GPUs.

RV32X, applying RISC-V to build a GPU

The RISC-V standard is a completely free and open instruction set architecture: a set of registers and instructions built from a base specification plus a series of optional extensions, so that CPUs can be tailored to different uses. But can RISC-V be used to build shader units like NVIDIA’s SMs or AMD’s Compute Units? Not out of the box: a new extension to the ISA would be needed to build a shader unit with it.

Pixilica has proposed a GPU whose shader units are based on the RISC-V ISA. To that end it has proposed an additional extension with instructions for manipulating graphics primitives: pixels, vertices and the other types of data used in real-time rendering.

To do this, they propose taking the V extension, which adds vector instructions to RISC-V, and expanding its instruction list in order to create the shader unit.

What are the components of a Shader unit?

Every shader unit, regardless of the brand we talk about, has the following components:

A decoder and scheduler that takes the waves of execution threads and organizes them.
A register file, into which the scheduler places, in order, the threads to be executed by the ALUs or execution units.
ALUs or execution units able to perform arithmetic and logical operations on the data. These can be vector (SIMD) or scalar.
Texture units to perform interpolation of texture pixels (texels). These units are fixed function.
A first-level data cache, the one closest to the ALUs.
The ability to export data from the data cache to the cache levels further away from the shader unit, and also to the ROPs.
An instruction cache that can be internal to each shader unit or shared among several.

Once the common basic elements are known, let’s take a look at Pixilica’s proposal.

The RV32X, a RISC-V based Shader unit

The first thing to keep in mind is that the proposal works with 128-bit vectors, which can be split as follows: one 128-bit operation, two 64-bit, four 32-bit, eight 16-bit or sixteen 8-bit. This ability to subdivide an ALU's width for lower precision is known as SIMD within a register. In the case at hand, though, the proposal is to use four 32-bit floating-point ALUs.
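To make the lane arithmetic concrete, here is a minimal Python sketch that splits a 128-bit value into equal lanes. The helper names (`lane_counts`, `split_lanes`) and the choice of putting lane 0 in the lowest bits are assumptions made for illustration; they are not part of the RV32X proposal.

```python
VECTOR_BITS = 128  # vector width quoted in the text


def lane_counts():
    """Number of lanes available for each element width."""
    return {width: VECTOR_BITS // width for width in (128, 64, 32, 16, 8)}


def split_lanes(value, lane_bits):
    """Split a 128-bit integer into equal lanes, lowest lane first (assumed order)."""
    if VECTOR_BITS % lane_bits != 0:
        raise ValueError("lane width must divide the vector width")
    mask = (1 << lane_bits) - 1
    lanes = VECTOR_BITS // lane_bits
    return [(value >> (i * lane_bits)) & mask for i in range(lanes)]


# Four 32-bit lanes holding 1, 2, 3 and 4, packed into one 128-bit value.
packed = (4 << 96) | (3 << 64) | (2 << 32) | 1

print(lane_counts())            # {128: 1, 64: 2, 32: 4, 16: 8, 8: 16}
print(split_lanes(packed, 32))  # [1, 2, 3, 4]
```

The same 128 bits can be reinterpreted at any of the listed widths, which is why a single register can serve four 32-bit ALUs or sixteen 8-bit operations.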

The other type of unit is the special function unit, which in this design is scalar, with 16-bit integer precision. These units would execute the transcendental operations: sine, cosine, tangent, powers, logarithms and roots of various degrees, which are costly in transistors to implement in a SIMD unit.

One difference from the SIMD units used in a CPU is that the registers would be 136 bits wide rather than 128, with the first 8 bits defining the type of graphics primitive or data being processed. Eight bits allow up to 256 data types, which leaves ample room to define them. Some of the data types allow interaction with the fixed-function units in the GPU, as well as with VRAM.
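A tagged register of this kind can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tag is placed in the top 8 bits, and the tag values `TAG_PIXEL_RGBA` and `TAG_VERTEX_XYZW` are invented for the example, since the text does not list the actual type codes.

```python
TAG_BITS = 8        # type field described in the text
PAYLOAD_BITS = 128  # vector data
PAYLOAD_MASK = (1 << PAYLOAD_BITS) - 1

# Hypothetical tag values, invented for this example only.
TAG_PIXEL_RGBA = 0x01
TAG_VERTEX_XYZW = 0x02


def pack_register(tag, payload):
    """Build a 136-bit register value: 8-bit tag above a 128-bit payload (assumed layout)."""
    if not 0 <= tag < (1 << TAG_BITS):
        raise ValueError("tag must fit in 8 bits")
    if not 0 <= payload <= PAYLOAD_MASK:
        raise ValueError("payload must fit in 128 bits")
    return (tag << PAYLOAD_BITS) | payload


def unpack_register(reg):
    """Return (tag, payload) from a 136-bit register value."""
    return reg >> PAYLOAD_BITS, reg & PAYLOAD_MASK


reg = pack_register(TAG_VERTEX_XYZW, 0xDEADBEEF)
tag, payload = unpack_register(reg)
print(tag, hex(payload))  # 2 0xdeadbeef
```

Because the tag travels with the data, hardware could in principle pick the right interpretation of the 128 payload bits without a separate instruction for every data type.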

In total there are four register files, each holding 1024 entries of 136 bits. The figure of 1024 matches the current maximum size of a block of execution threads in APIs like Direct3D within DirectX. Each register file is connected to a SIMD unit of four 32-bit floating-point ALUs. This translates into 16 ALUs per register file, which is consistent with the instruction support for 4 x 4 element matrices.
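As a quick back-of-the-envelope check, the storage implied by these figures can be worked out directly. The sketch below uses only the numbers quoted above (four register files, 1024 entries, 136 bits per entry); the variable names are our own.

```python
REGISTER_FILES = 4       # register files per shader unit, from the text
ENTRIES_PER_FILE = 1024  # entries per register file, from the text
BITS_PER_ENTRY = 136     # 8-bit type tag + 128-bit vector

bits_per_file = ENTRIES_PER_FILE * BITS_PER_ENTRY
bytes_per_file = bits_per_file // 8
total_bytes = REGISTER_FILES * bytes_per_file

print(bits_per_file)       # 139264
print(bytes_per_file)      # 17408
print(total_bytes / 1024)  # 68.0
```

In other words, each register file holds 17 KiB and the four together hold 68 KiB, before counting any implementation overhead that the text does not describe.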

Decoding Instructions on the RV32X

Instructions would be decoded by a fixed-function decoder. Keep in mind that the instructions are 32 bits long and therefore of fixed size. They would arrive through the instruction cache, while the data for each instruction would sit in the data cache within the Compute Unit. A microcoded instruction decoder is also possible.
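A fixed 32-bit instruction size means the decoder can pull every field out with constant shifts and masks. The sketch below decodes the standard R-type format of the base RISC-V ISA; the encodings of the RV32X graphics instructions themselves are not described here, so this only illustrates the general idea, and `decode_r_type` is a name chosen for the example.

```python
def decode_r_type(word):
    """Extract the fields of a base RISC-V R-type instruction word."""
    if not 0 <= word <= 0xFFFFFFFF:
        raise ValueError("instruction must be a 32-bit value")
    return {
        "opcode": word & 0x7F,          # bits 6..0
        "rd": (word >> 7) & 0x1F,       # bits 11..7
        "funct3": (word >> 12) & 0x7,   # bits 14..12
        "rs1": (word >> 15) & 0x1F,     # bits 19..15
        "rs2": (word >> 20) & 0x1F,     # bits 24..20
        "funct7": (word >> 25) & 0x7F,  # bits 31..25
    }


# 0x002081B3 encodes `add x3, x1, x2` in the base integer ISA.
fields = decode_r_type(0x002081B3)
print(fields)
```

Since every field sits at a fixed position, this kind of decoding maps naturally onto wiring rather than a lookup through microcode.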

The proposal includes a standard RISC-V core alongside the RV32X. This does not turn it into a typical CPU; rather, the core would be used to implement the round-robin execution typical of GPUs, which gives each instruction a slot of execution time. If the instruction does not run in its slot, usually because its data is not yet available, it moves back down the list.

Keep in mind that the ALUs of the shader units execute the instructions of the different execution threads in sequence as they arrive from the registers. Once the scheduler has filled the registers with instructions and with data from the data caches, the ALUs work through them as if they were a stack: when one group of instructions has been resolved, the ALUs read the next group, until all the registers have been traversed and there are no threads left to run.
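The round-robin idea can be illustrated with a toy scheduler. This is a deliberately simplified model, not the proposal's actual scheduler: each turn lasts one cycle, a thread finishes in a single turn once its data is available, and the `ready_at` table (the cycle at which each thread's data arrives) is a hypothetical input invented for the example.

```python
from collections import deque


def round_robin(threads, ready_at):
    """Return the order in which threads execute under a toy round-robin policy.

    threads:  thread names in their initial order.
    ready_at: maps each name to the cycle at which its data becomes available.
    """
    queue = deque(threads)
    order = []
    cycle = 0
    while queue:
        thread = queue.popleft()
        if ready_at[thread] <= cycle:
            order.append(thread)  # data present: the thread runs in this slot
        else:
            queue.append(thread)  # data missing: back to the end of the list
        cycle += 1
    return order


# Thread B waits on data until cycle 3, so C overtakes it.
print(round_robin(["A", "B", "C"], {"A": 0, "B": 3, "C": 0}))  # ['A', 'C', 'B']
```

The point of the policy is that a stalled thread never blocks the ALUs: the slot simply passes to the next thread whose data is ready.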

Fixed function on RV32X

Like the rest of the accelerators, the fixed-function units are not defined in RV32X, but the instructions that let the shader units interact with them are. These units always perform the same function, since it is hardwired or driven by fixed microcode, and in any case they would for the most part be completely separate from the RV32X.

Each manufacturer would therefore have its own fixed-function units and its own implementation of them, and could create proprietary extensions if its units implemented functions beyond the common behavior. Keep in mind that RV32X does not define a complete GPU but a shader unit that forms part of one. It is an important part, because it executes the different types of shaders, but it is not the entire GPU.

As a note, there is one texture unit per register file. Keep in mind that the usual ratio of SIMD ALUs to texture units is 16:1 in most GPUs. Each shader unit typically has four texture units, which brings the average to 64 32-bit floating-point ALUs per shader unit.

Does this open a new branch in the GPU market?

Leaving the technical side aside, if we look at the market for post-PC devices and the SoCs that power them, we see that a large number of integrated GPUs are fully proprietary. Until now the most widely used in SoCs made in China has been the Mali architecture, but NVIDIA's announced purchase of ARM limits the possibilities. Qualcomm owns the Adreno architecture, Apple has its own too, and even Imagination's PowerVR means licensing the technology from a third party.

The existence of a proposal for a RISC-V-based shader unit is key, since it would allow new GPUs to be built on a standard that is completely free and therefore non-proprietary. New units and improvements could then be developed that solve common problems once, instead of different vendors pursuing the same solution separately at the same time.

If we look for a world power that could benefit from such a technology, the first that comes to mind is China. But we could also see AMD, NVIDIA or Intel evolve toward using RISC-V to build their shader units and benefit from common progress, in the same way that happens in software and the world of open source.