Arithmetic Intensity and Bandwidth of RAM or VRAM

The relationship between RAM and a processor is fully symbiotic: RAM is useless without a processor, and a processor cannot work without memory, wherever that memory sits. Each feeds the other, so the performance of one depends on the other, and arithmetic intensity is the measure that captures this relationship. We explain what it consists of.

A processor, whether a CPU or a GPU, does nothing but process data, so it needs memory to feed it. Unfortunately, the gap between memory speed and processor speed has grown over time, which has led to techniques such as cache memory. We cannot forget latency either, which appears when the interface between the RAM and the processor cannot deliver or modify data fast enough.

However, performance cannot be measured in a general way, since each program, or rather each algorithm within each program, has a different computational load. This is where the term arithmetic intensity comes in. Let's see what it is and what it consists of, along with other elements that affect performance on a computer.

What is arithmetic intensity?

Arithmetic intensity is a performance measure that relates the number of floating-point operations a processor executes in a given section of code to the memory traffic that code generates. To obtain it, divide the number of floating-point operations by the number of bytes the algorithm moves to and from memory while executing.
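As a minimal sketch of that division, the snippet below computes the arithmetic intensity of a SAXPY-style loop (`y[i] = a * x[i] + y[i]`). The kernel, the function names, and the assumption that every element is read from or written to memory exactly once (no cache reuse) are illustrative choices, not something the article specifies.

```python
def arithmetic_intensity(flops: float, bytes_moved: float) -> float:
    """Floating-point operations per byte of memory traffic."""
    if bytes_moved <= 0:
        raise ValueError("bytes_moved must be positive")
    return flops / bytes_moved


def saxpy_intensity(n: int, bytes_per_element: int = 4) -> float:
    """Arithmetic intensity of y[i] = a * x[i] + y[i] over n elements.

    Assumption: 32-bit floats by default and no cache reuse.
    """
    # One multiply and one add per element.
    flops = 2 * n
    # Per element: read x[i], read y[i], write y[i].
    bytes_moved = 3 * n * bytes_per_element
    return arithmetic_intensity(flops, bytes_moved)


print(saxpy_intensity(1_000_000))
```

Under these assumptions the result is about 0.17 floating-point operations per byte, whatever the value of `n`, which is why this kind of loop tends to be limited by memory rather than by the processor.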

How useful is it? In fields of computing where very powerful machines are built for specific tasks, it helps to choose the hardware best suited to running the algorithms under the best conditions. This model is used mainly in scientific computing, although it also serves to optimize performance in closed systems such as video game consoles.

A highly parallel hardware architecture needs code with high arithmetic intensity, because in such processors the ratio of computing capacity to available memory bandwidth is high, that is, there is little bandwidth per unit of compute. This suits many applications, graphics in particular, where the same data is processed several times and therefore far more computation than memory traffic is required.
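One common way to reason about this balance is a roofline-style bound, which the article does not name and is used here only as an illustrative assumption: attainable throughput is the smaller of the processor's peak compute rate and the arithmetic intensity multiplied by memory bandwidth. The peak and bandwidth figures below are hypothetical, not taken from any real device.

```python
def machine_balance(peak_gflops: float, bandwidth_gbs: float) -> float:
    """FLOP per byte a kernel needs before compute, not memory, is the limit."""
    return peak_gflops / bandwidth_gbs


def attainable_gflops(intensity: float, peak_gflops: float,
                      bandwidth_gbs: float) -> float:
    """Roofline-style upper bound on throughput for a given intensity."""
    return min(peak_gflops, intensity * bandwidth_gbs)


# Hypothetical parallel processor: 10,000 GFLOP/s peak, 500 GB/s of bandwidth.
PEAK, BANDWIDTH = 10_000.0, 500.0

print(machine_balance(PEAK, BANDWIDTH))          # FLOP/byte needed to saturate compute
print(attainable_gflops(0.25, PEAK, BANDWIDTH))  # low intensity: bandwidth-bound
print(attainable_gflops(40.0, PEAK, BANDWIDTH))  # high intensity: compute-bound
```

With these made-up figures, a kernel needs at least 20 floating-point operations per byte to keep the arithmetic units busy; anything below that is capped by memory bandwidth.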

Algorithm performance and relationship with arithmetic intensity

When writing an algorithm, programmers take its performance into account, which is described using Big O notation: how the number of operations grows with the size of the input data. Big O is not measured with a benchmark; programmers work it out by hand to get a rough idea of the workload of their programs:

  • O(1): the algorithm does not depend on the size of the data to be processed. An algorithm with O(1) performance is considered ideal and cannot be improved upon.
  • O(n): execution time is directly proportional to data size, so the workload grows linearly. It may also be that a
  • O(log n): occurs in algorithms that repeatedly split a problem and discard part of it at each step, such as binary search.
  • O(n log n): an evolution of the previous one, in which the problem is divided and every part still has to be processed, as in efficient sorting algorithms.
  • O(n²): these algorithms perform nested iterations because they have to query the data multiple times. They are usually highly repetitive and have a quadratic computational load.
  • O(n!): an algorithm with this complexity is impractical for all but the smallest inputs and should be rewritten where possible.
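To make the difference between two of these classes concrete, the sketch below counts loop iterations for a linear search, O(n), and a binary search, O(log n), over the same sorted data. The function names and the choice of searching for the last element are illustrative assumptions.

```python
def linear_search_steps(data, target):
    """Count elements inspected by a left-to-right scan: O(n) in the worst case."""
    steps = 0
    for value in data:
        steps += 1
        if value == target:
            break
    return steps


def binary_search_steps(data, target):
    """Count probes made by binary search on sorted data: O(log n)."""
    steps = 0
    lo, hi = 0, len(data) - 1
    while lo <= hi:
        steps += 1
        mid = (lo + hi) // 2
        if data[mid] == target:
            break
        if data[mid] < target:
            lo = mid + 1   # target can only be in the upper half
        else:
            hi = mid - 1   # target can only be in the lower half
    return steps


for n in (1_000, 1_000_000):
    data = list(range(n))
    # Searching for the last element is the worst case for the linear scan.
    print(n, linear_search_steps(data, n - 1), binary_search_steps(data, n - 1))
```

Growing the input a thousandfold multiplies the linear scan's work by a thousand, while the binary search needs only about ten more probes.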

Not all algorithms can reach O(1) complexity, and some of them perform much better on one type of hardware than on another. That is why domain-specific accelerators and processors have been developed in recent years to speed up one type of algorithm over others. The general idea is to divide the algorithms into parts and run each part on the processing unit best suited to its arithmetic intensity.

Ratio between communication and computing


The inverse measure is the communication-to-computation ratio, which is obtained by dividing the number of bytes by the number of floating-point operations. It is used to estimate the bandwidth required to execute that part of the code. The difficulty in measuring it is that the data is not always in the same place, so RAM bandwidth is used as the reference.
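As a rough sketch of how this ratio translates into a bandwidth requirement, the snippet below multiplies bytes per floating-point operation by a target compute rate. The function names, the example loop (2 operations and 12 bytes per element, assuming 32-bit floats with two reads and one write and no cache reuse), and the 100 GFLOP/s target are all hypothetical.

```python
def comm_to_compute_ratio(bytes_moved: float, flops: float) -> float:
    """Bytes of memory traffic per floating-point operation."""
    if flops <= 0:
        raise ValueError("flops must be positive")
    return bytes_moved / flops


def required_bandwidth_gbs(bytes_moved: float, flops: float,
                           target_gflops: float) -> float:
    """Bandwidth (GB/s) needed to sustain target_gflops for this code."""
    return comm_to_compute_ratio(bytes_moved, flops) * target_gflops


# Hypothetical loop body: 2 operations and 12 bytes of traffic per element.
print(comm_to_compute_ratio(12, 2))
print(required_bandwidth_gbs(12, 2, target_gflops=100.0))
```

With these figures the loop moves 6 bytes per operation, so sustaining 100 GFLOP/s would need 600 GB/s of memory bandwidth.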

Bear in mind that it is not a totally reliable measure, not only because the cache system brings data closer to the processor, but also because of latency: each type of RAM has its own advantages and disadvantages, and the result can vary depending on the type of memory used.

Today, when choosing memory for a system, energy consumption is considered as well as bandwidth, since the energy cost of moving data is overtaking the cost of processing it. Designers therefore opt for specific types of memory in certain applications, always within the cost constraints of building the system, which are not the same for a supercomputer as for a home PC.