Cache Memory in CPU and GPU: What It Is For

When we look at the specifications of a processor, one of the things that stands out is the cache memory, which is found not only in every CPU but also in every GPU. In this article we will explain what cache memory is in plain, accessible language, so that you know what this type of memory is for in your PC.

Cache memory first appeared on-chip in x86 processors with the Intel 80486, but its origins go back to the IBM S/360 family, where the idea of a cache was implemented for the first time. Today, because of the growing speed gap between CPUs, GPUs and other processors on one side and memory on the other, it has become an indispensable part of every processor.

Why is the cache necessary?

Cache memory is necessary because RAM is too slow for the CPU to execute its instructions at full speed, and we cannot make RAM much faster. The solution? Add a small memory inside the processor itself that keeps the most recently used data and instructions close to the cores.

The problem is that managing this memory explicitly would be extremely complex, since it would force the programs themselves to do it, wasting CPU cycles. The solution? Give that memory a mechanism that automatically copies the data and instructions closest to what is currently being executed.

Because the cache is inside the processor, when the CPU finds the data there it can work with it much faster than if it had to fetch it from RAM.
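The effect of keeping data close to the core is easy to observe. The sketch below is a minimal C program with sizes chosen purely for illustration: it sums the same array twice, once sequentially, so that each cache line is fully consumed once it has been fetched, and once with a stride of one whole cache line, so that only 4 of every 64 fetched bytes are used before the line is evicted. The total arithmetic work is identical in both cases.

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N (16 * 1024 * 1024)   /* 16 Mi ints (64 MiB), larger than any cache */

/* Sum every element of the array, visiting it with a given stride. With
 * stride 1 each 64-byte cache line is fully used as soon as it is fetched;
 * with stride 16 (one line per access) a line is fetched, only 4 of its
 * 64 bytes are used, and it is evicted long before the next pass returns. */
static long long sum_with_stride(const int *a, size_t stride) {
    long long s = 0;
    for (size_t start = 0; start < stride; start++)
        for (size_t i = start; i < N; i += stride)
            s += a[i];
    return s;
}

int main(void) {
    int *a = malloc(N * sizeof *a);
    if (!a) return 1;
    for (size_t i = 0; i < N; i++) a[i] = 1;

    size_t strides[] = { 1, 16 };
    for (int k = 0; k < 2; k++) {
        clock_t t0 = clock();
        long long s = sum_with_stride(a, strides[k]);
        double ms = 1000.0 * (clock() - t0) / CLOCKS_PER_SEC;
        printf("stride %2zu: sum=%lld, %.1f ms\n", strides[k], s, ms);
    }
    free(a);
    return 0;
}
```

On a typical desktop machine the strided version ends up several times slower even though it performs exactly the same additions, simply because far more of its accesses have to be served from RAM instead of the cache.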

How does the cache work?

First of all, we have to bear in mind that the cache is not part of RAM and does not behave like it; nor can it be managed like RAM, where programs can allocate and free memory as they please. The reason? The cache operates completely separately from RAM, under the control of the hardware itself.

The job of the cache is to move data from memory closer to the processor. A program normally executes its code in sequence: if the current instruction sits at address 1000, the next one will be at 1001 unless it is a jump instruction. The idea behind caches? Transfer the data and instructions around the current point of execution into a memory internal to the processor.

When the CPU or GPU looks for a piece of data or an instruction, the first thing it does is check the cache closest to the core, the lowest level, and then climb through the higher levels until the data turns up. The goal is to avoid having to access main memory at all.
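That search order can be sketched with a toy model. The code below is not how real hardware is built; it is only a simplified direct-mapped simulation with made-up sizes, but it shows the essential behaviour: every access is tried level by level, and a miss at one level installs the line there before continuing downwards.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define LINE_BITS 6                      /* 64-byte cache lines */

/* One cache level, modelled as a direct-mapped cache that remembers a
 * single line per set. The sizes used in main() are illustrative only. */
struct level {
    const char *name;
    unsigned    sets;
    uint64_t   *tags;
    bool       *valid;
};

/* Look an address up level by level, starting with the cache closest to the
 * core; on a miss, install the line at that level and keep descending. */
static const char *lookup(struct level *lv, int nlevels, uint64_t addr) {
    uint64_t line = addr >> LINE_BITS;
    for (int i = 0; i < nlevels; i++) {
        unsigned set = line % lv[i].sets;
        uint64_t tag = line / lv[i].sets;
        if (lv[i].valid[set] && lv[i].tags[set] == tag)
            return lv[i].name;           /* hit: served from this level */
        lv[i].valid[set] = true;         /* miss: bring the line in     */
        lv[i].tags[set]  = tag;
    }
    return "RAM";                        /* missed every cache level    */
}

int main(void) {
    static uint64_t t1[64], t2[512];
    static bool     v1[64], v2[512];
    struct level lv[] = { { "L1", 64, t1, v1 }, { "L2", 512, t2, v2 } };

    uint64_t addrs[] = { 0x1000, 0x1004, 0x40000, 0x1000 };
    for (int i = 0; i < 4; i++)
        printf("access 0x%llx -> served from %s\n",
               (unsigned long long)addrs[i], lookup(lv, 2, addrs[i]));
    return 0;
}
```

The second access hits in L1 because it falls on the same line as the first; the last one has already been evicted from the tiny L1 by a conflicting line, but it is still present in the larger L2, so it never has to go out to RAM.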

Cache levels on CPU and GPU

In a multicore system with two or more cores, all of them access the same pool of RAM: there is a single memory interface and several cores fighting for access to it. This is the point at which an additional, shared cache level becomes necessary, one that sits between the memory controller on one side and the caches closer to the cores on the other.

Multicore CPUs normally have two levels of cache, but some designs are built from clusters: groups of several cores that share an L2 cache, with several such clusters in turn sharing the memory interface, which sometimes forces the inclusion of a third-level cache.

Although not universal, level 3 caches make an appearance as soon as the memory interface becomes a big enough bottleneck that adding one more level to the hierarchy improves performance.
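On Linux you can see how the cache levels of a real chip are laid out, including which cores share each level, through sysfs. The following sketch assumes the usual /sys/devices/system/cpu/cpu0/cache/indexN layout; the exact set of files exposed there can vary between kernels and architectures.

```c
#include <stdio.h>

/* Print the cache hierarchy of core 0 as Linux exposes it through sysfs. */
static void print_file(const char *label, const char *path) {
    char buf[128];
    FILE *f = fopen(path, "r");
    if (!f) return;
    if (fgets(buf, sizeof buf, f))
        printf("  %s: %s", label, buf);   /* sysfs strings end with '\n' */
    fclose(f);
}

int main(void) {
    char path[256];
    for (int idx = 0; idx < 8; idx++) {
        snprintf(path, sizeof path,
                 "/sys/devices/system/cpu/cpu0/cache/index%d/level", idx);
        FILE *f = fopen(path, "r");
        if (!f) break;                    /* no more cache levels */
        fclose(f);

        printf("cache index%d\n", idx);
        const char *files[][2] = {
            { "level", "level" }, { "type", "type" },
            { "size",  "size"  }, { "shared with", "shared_cpu_list" },
        };
        for (int i = 0; i < 4; i++) {
            snprintf(path, sizeof path,
                     "/sys/devices/system/cpu/cpu0/cache/index%d/%s",
                     idx, files[i][1]);
            print_file(files[i][0], path);
        }
    }
    return 0;
}
```

On many current multicore CPUs this prints a private L1 and L2 for each core (shared_cpu_list listing only the core and its sibling thread) and an L3 shared by all cores or by a whole cluster.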

The memory hierarchy

The rules of the memory hierarchy are very clear: it starts at the processor's registers, ends at its slowest memory, and always follows the same rules:

  • Each level of the hierarchy has more capacity than the one before it, but less than the one after it.
  • As we move away from the CPU, the latency of accesses increases.
  • As we move away from the CPU, the available bandwidth decreases.

In the specific case of the cache levels, each level stores less information than the one below it, but its contents are always a fragment of that next level. The data in the L1 cache is a subset of the data in the L2 cache, which in turn is a subset of the data in the L3 cache, if there is one.

However, the last-level cache, the one closest to memory, does not mirror the whole of RAM; it only holds a copy of the memory page, or set of pages, that the processor is working with at that moment.
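The rules above can be observed directly by measuring memory latency for increasing working-set sizes. The sketch below uses the classic pointer-chasing trick; buffer sizes, iteration counts and the use of clock() and rand() are illustrative choices, and the exact numbers depend entirely on the machine.

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Chase a random cycle of indices inside a buffer of the given size. Every
 * load depends on the previous one, so the time per access approximates the
 * latency of whichever level of the hierarchy the working set fits in. */
static double chase_ns(size_t bytes, size_t iters) {
    size_t n = bytes / sizeof(size_t);
    size_t *next = malloc(n * sizeof *next);
    if (!next) return -1.0;

    /* Sattolo's algorithm: build a single random cycle over all slots.
     * rand() is good enough for an illustration. */
    for (size_t i = 0; i < n; i++) next[i] = i;
    for (size_t i = n - 1; i > 0; i--) {
        size_t j = (size_t)rand() % i;
        size_t tmp = next[i]; next[i] = next[j]; next[j] = tmp;
    }

    size_t p = 0;
    clock_t t0 = clock();
    for (size_t i = 0; i < iters; i++) p = next[p];
    double ns = 1e9 * (double)(clock() - t0) / CLOCKS_PER_SEC / (double)iters;

    volatile size_t sink = p;   /* keep the chase from being optimized away */
    (void)sink;
    free(next);
    return ns;
}

int main(void) {
    for (size_t kib = 16; kib <= 64 * 1024; kib *= 4)
        printf("%6zu KiB working set: %.1f ns per access\n",
               kib, chase_ns(kib * 1024, 20u * 1000 * 1000));
    return 0;
}
```

The measured latency typically stays flat while the working set fits in L1, steps up when it spills into L2 and again at L3, and jumps dramatically once it only fits in RAM.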

Cache Miss or when data is not found

One of the biggest performance problems is a cache miss, which occurs when data is not found at a given cache level. This is extremely damaging to the performance of an in-order CPU, since the consequence is a great many lost processor cycles, but it is no less dangerous for an out-of-order CPU.

For a CPU design it is considered a failure if the time spent searching every cache level after successive misses, added to the final memory access, ends up being longer than simply fetching the data from RAM directly. More than one CPU design has had to go back to the drawing board because its total lookup time turned out to be longer than a plain RAM access.

That is why architects are very reluctant to add extra cache levels to an architecture just because: every new level has to justify itself with a clear performance improvement.
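The trade-off described in the last two paragraphs is usually reasoned about with the average memory access time: each level's latency is only paid by the fraction of accesses that had to descend that far. All the latencies and miss rates below are hypothetical placeholders, not figures from any real CPU.

```c
#include <stdio.h>

int main(void) {
    /* Hypothetical latencies (CPU cycles) and miss rates. */
    double l1_lat = 4,  l1_miss = 0.10;   /* 90 % of accesses hit in L1         */
    double l2_lat = 12, l2_miss = 0.40;   /* 60 % of L1 misses are caught by L2 */
    double l3_lat = 40, l3_miss = 0.50;   /* half of L2 misses are caught by L3 */
    double ram_lat = 200;

    /* Average memory access time (AMAT): each deeper level's cost is
     * weighted by the probability of having missed all the levels above. */
    double amat = l1_lat
                + l1_miss * (l2_lat
                + l2_miss * (l3_lat
                + l3_miss * ram_lat));

    printf("With caches    : %.1f cycles per access on average\n", amat);
    printf("Straight to RAM: %.0f cycles per access\n", ram_lat);
    return 0;
}
```

With these made-up numbers the hierarchy pays off handsomely (about 11 cycles versus 200), but if the miss rates were high enough, all those intermediate lookups would only add latency on top of the unavoidable RAM access, which is exactly the failure case described above.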

Coherence with memory

Because the cache holds copies of data from RAM but is not RAM itself, there is a danger that the copies will stop matching, not only between the cache and RAM but also between the different cache levels, some of which are private to each core.

That is why mechanisms are needed whose job is to keep the data coherent across all levels. This means implementing an extremely complex system, one whose complexity grows with the number of processor cores.
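The cost of keeping caches coherent is easy to provoke on purpose. The sketch below is for POSIX systems (compile with -pthread): two threads increment two logically independent counters, and the only difference between the fast and the slow case is whether those counters share a cache line. The 64-byte line size is an assumption about typical current hardware.

```c
#include <pthread.h>
#include <stdio.h>
#include <time.h>

#define ITERS 100000000UL

/* Two counters that are logically independent. With the padding in place
 * each one lives on its own cache line; remove the padding and they share a
 * line, so every increment forces the coherence protocol to bounce that
 * line between the two cores' caches (false sharing). */
struct counters {
    volatile unsigned long a;
    char pad[64];
    volatile unsigned long b;
};

static struct counters c;

static void *inc_a(void *arg) { (void)arg; for (unsigned long i = 0; i < ITERS; i++) c.a++; return NULL; }
static void *inc_b(void *arg) { (void)arg; for (unsigned long i = 0; i < ITERS; i++) c.b++; return NULL; }

int main(void) {
    pthread_t ta, tb;
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    pthread_create(&ta, NULL, inc_a, NULL);
    pthread_create(&tb, NULL, inc_b, NULL);
    pthread_join(ta, NULL);
    pthread_join(tb, NULL);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double s = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    printf("a=%lu b=%lu in %.2f s\n", c.a, c.b, s);
    return 0;
}
```

Neither thread ever reads the other's counter, yet without the padding the run is usually several times slower: the slowdown comes purely from the coherence traffic needed to keep both cores' copies of the shared line consistent.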