What is Memory Consistency and How Does it Work?

Memory consistency is one of the key elements of multicore systems that share the same memory and use a hierarchy of caches. Thanks to it, all cores have consistent access to memory and a common view of the memory they use to run programs. We explain how it works and why it is useful.

Imagine for a moment a system with several cores, none of which has a cache hierarchy. Every core reads and writes RAM directly, so all of them automatically share the same view of memory.

The problem is that, because of the speed gap between RAM and processors, cache memory has come into use. A cache keeps copies of the memory lines closest to the code and data that the processor is working on at any given moment.

Caches prevent performance bottlenecks, but they end up creating coherence problems: processors do not execute instructions against RAM directly but against their caches, so a coherence system is needed to ensure that the information stored in the caches is correct.

Data consistency and memory coherence

Data consistency refers to the contract between programs and RAM, that is, how they are going to use the memory and, above all, how the different threads of execution are going to communicate with each other. For example, thread 1 needs data B produced by thread 2, and thread 2 needs data A produced by thread 1. The program only works correctly if each thread sees the correct value of the data it needs.
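The A/B exchange described above can be sketched in Python. This is only an illustration of the contract, not of any particular hardware memory model: `threading.Event` stands in for whatever synchronization the platform provides, and the names `run_exchange`, `a_ready` and `b_ready` are assumptions made for this example.

```python
import threading

def run_exchange():
    """Two threads each publish one value, then read the other's value."""
    shared = {}                      # memory visible to both threads
    a_ready = threading.Event()      # set once A has been written
    b_ready = threading.Event()      # set once B has been written
    results = {}

    def thread_1():
        shared["A"] = 10             # produce A
        a_ready.set()                # publish: A is now safe to read
        b_ready.wait()               # block until thread 2 has published B
        results["t1_saw_B"] = shared["B"]

    def thread_2():
        shared["B"] = 20             # produce B
        b_ready.set()                # publish: B is now safe to read
        a_ready.wait()               # block until thread 1 has published A
        results["t2_saw_A"] = shared["A"]

    t1 = threading.Thread(target=thread_1)
    t2 = threading.Thread(target=thread_2)
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    return results

exchange = run_exchange()
```

Each thread publishes its own value before waiting for the other, so neither can read a value that has not been written yet and the two threads cannot deadlock.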

Memory coherence, by contrast, is the hardware-level mechanism that allows threads of execution to communicate correctly. It is completely transparent to the software and works automatically in the background, ensuring that the different cores share a common, fully consistent view of memory.

The Cache Information Problem

As we said before, caches are what make a complex coherence system necessary. The reason is simple and has to do with the fact that there are two types of caches:

Private caches: Close to the cores, much smaller, and private to each core. Without a cache coherence system, any change made to a copy of a memory address in one private cache will not be reflected in the other copies.
Shared caches: Further away from the cores and much larger than the private caches.
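The stale-copy problem with private caches can be shown with a toy model. The sketch below assumes a deliberately simplified cache with no coherence protocol at all. The class `PrivateCache` and the address `0x40` are hypothetical names chosen for the example.

```python
class PrivateCache:
    """A toy private cache with no coherence protocol (illustrative only)."""

    def __init__(self, ram):
        self.ram = ram
        self.lines = {}                  # address -> locally cached value

    def read(self, addr):
        if addr not in self.lines:       # miss: copy the value from RAM
            self.lines[addr] = self.ram[addr]
        return self.lines[addr]

    def write(self, addr, value):
        self.lines[addr] = value         # the write stays in this cache only

ram = {0x40: 1}
core0 = PrivateCache(ram)
core1 = PrivateCache(ram)

core0.read(0x40)                         # both cores cache the same line
core1.read(0x40)
core0.write(0x40, 99)                    # core 0 changes its private copy

stale = core1.read(0x40)                 # core 1 still sees the old value
fresh = core0.read(0x40)                 # core 0 sees its own write
```

After the write, the two cores disagree about the contents of the same address, which is exactly the situation a coherence system exists to prevent.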

The easiest thing for hardware engineers would be to connect the different private caches directly to the RAM. The problem is that RAM typically has few memory channels, so while one core was accessing the RAM the others would have to wait. That is why a shared cache is normally used at the last level, and it is this cache that the RAM is connected to.

In the cache hierarchy, each cache level communicates only with the levels it is directly connected to. The private caches are therefore not connected to RAM directly, but only to the shared cache or to the adjacent levels of private cache, if any.
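The chain of connections can be sketched as a lookup that is passed from one level to the next. The layout used here (two private levels per core, one shared last level, labelled L1, L2 and L3) is an assumption for illustration, as are the class names `CacheLevel` and `Ram`.

```python
class Ram:
    name = "RAM"

    def __init__(self, contents):
        self.contents = contents

    def read(self, addr, path):
        path.append(self.name)
        return self.contents[addr]

class CacheLevel:
    """One cache level that can only talk to the next level down."""

    def __init__(self, name, next_level):
        self.name = name
        self.next_level = next_level     # the only level this one reaches
        self.lines = {}

    def read(self, addr, path):
        path.append(self.name)           # record which levels were visited
        if addr not in self.lines:       # miss: ask the next level
            self.lines[addr] = self.next_level.read(addr, path)
        return self.lines[addr]

ram = Ram({0x80: 7})
l3 = CacheLevel("L3 (shared)", ram)      # the only cache connected to RAM
core0_l1 = CacheLevel("core0 L1", CacheLevel("core0 L2", l3))
core1_l1 = CacheLevel("core1 L1", CacheLevel("core1 L2", l3))

first_path = []
core0_l1.read(0x80, first_path)          # misses everywhere, reaches RAM

second_path = []
core1_l1.read(0x80, second_path)         # stops at the shared cache
```

The second core never reaches RAM because the shared cache already holds the line, which is the reason for placing a shared cache in front of the few available memory channels.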

Snooping mechanism for memory consistency

Cache coherence is not implemented through a coprocessor but through a series of mechanisms integrated into the hardware. The two most widely used coherence protocols are the following:

Write-invalidate: When a write is performed, all the copies of that memory line in the other caches are invalidated, which forces the other cores to fetch the updated data.
Write-update: When a processor writes to a cache line, all the copies of that line in the other caches are updated.
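The difference between the two policies can be seen in a small sketch. Each cache is modelled as a plain dictionary from address to value. The function names `write_invalidate` and `write_update` and the sample address are assumptions for this example, and real protocols track per-line states that are omitted here.

```python
def write_invalidate(caches, writer, addr, value):
    """The writer keeps the new value; every other copy is dropped."""
    for i, cache in enumerate(caches):
        if i != writer:
            cache.pop(addr, None)        # invalidate: next read must refetch
    caches[writer][addr] = value

def write_update(caches, writer, addr, value):
    """The writer stores the new value and pushes it into existing copies."""
    for i, cache in enumerate(caches):
        if i == writer or addr in cache:
            cache[addr] = value          # caches without a copy stay empty

# Three caches; the first two hold a copy of address 0x10.
inv = [{0x10: 1}, {0x10: 1}, {}]
write_invalidate(inv, writer=0, addr=0x10, value=5)

upd = [{0x10: 1}, {0x10: 1}, {}]
write_update(upd, writer=0, addr=0x10, value=5)
```

With invalidation the second cache loses its copy and has to fetch the line again on its next read, whereas with updating it already holds the new value.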

To find out which cache lines have been changed, a "snooper" (sometimes translated as "sniffer") is used. This is a hardware mechanism that monitors the accesses that the different cores make to memory and to the caches, so that the coherence system knows at every moment which memory addresses have changed. The problem with this method is that it does not scale well to a large number of cores, because every cache has to observe every access.
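The scaling problem can be made concrete by counting how many caches have to inspect each write. The sketch below is a simplified model that assumes an invalidation policy and write-through to RAM. The class `SnoopingBus` and the helper `snoops_for_one_write` are hypothetical names for this example.

```python
class SnoopingBus:
    """Toy bus on which every cache observes every write (illustrative)."""

    def __init__(self, num_cores):
        self.caches = [{} for _ in range(num_cores)]
        self.ram = {}
        self.snoops = 0                  # number of cache inspections

    def read(self, core, addr):
        cache = self.caches[core]
        if addr not in cache:            # miss: fetch from RAM
            cache[addr] = self.ram.get(addr, 0)
        return cache[addr]

    def write(self, core, addr, value):
        for other, cache in enumerate(self.caches):
            if other != core:
                self.snoops += 1         # every other cache checks the write
                cache.pop(addr, None)    # and drops its copy, if it has one
        self.caches[core][addr] = value
        self.ram[addr] = value           # write-through keeps the sketch short

def snoops_for_one_write(num_cores):
    bus = SnoopingBus(num_cores)
    bus.write(0, 0x20, 1)
    return bus.snoops

bus = SnoopingBus(2)
before = bus.read(1, 0x20)               # core 1 caches the initial value
bus.write(0, 0x20, 9)                    # core 0 writes; core 1 is invalidated
after = bus.read(1, 0x20)                # core 1 refetches the new value
```

In this model a single write costs one inspection per other core, even when none of them holds the line, so the traffic grows with the number of cores.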

Directory-based memory consistency mechanism

This coherence system is based on maintaining a directory that records in which caches the different lines of memory are located. It is loosely comparable to the directory organization of an operating system, implemented at the hardware level. The directory does not store the data of the cache lines, but rather which caches hold a copy of each memory line.

Its main advantage is that it scales much better with the number of cores than the snooping (sniffing) mechanism. It can be thought of as a more advanced form of the same idea: instead of having every cache watch every access, the directory records the accesses that have been made to memory, so that the system knows which processors have previously accessed the data in a given line and only those need to be notified.
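A directory can be sketched as a table that maps each address to the set of cores holding a copy. The model below is a simplification that assumes an invalidation policy and write-through to RAM. The class name `Directory` and its fields `sharers` and `messages` are assumptions for this example.

```python
class Directory:
    """Toy directory that tracks which cores hold each line (illustrative)."""

    def __init__(self, num_cores):
        self.caches = [{} for _ in range(num_cores)]
        self.ram = {}
        self.sharers = {}                # address -> set of core ids
        self.messages = 0                # invalidation messages sent

    def read(self, core, addr):
        cache = self.caches[core]
        if addr not in cache:            # miss: fetch and register as sharer
            cache[addr] = self.ram.get(addr, 0)
            self.sharers.setdefault(addr, set()).add(core)
        return cache[addr]

    def write(self, core, addr, value):
        # Only the cores listed in the directory are contacted.
        for other in self.sharers.get(addr, set()) - {core}:
            self.messages += 1
            del self.caches[other][addr]
        self.sharers[addr] = {core}      # the writer is now the only holder
        self.caches[core][addr] = value
        self.ram[addr] = value           # write-through keeps the sketch short

d = Directory(64)
d.read(1, 0x20)                          # cores 1 and 2 cache the line
d.read(2, 0x20)
d.write(0, 0x20, 5)                      # only cores 1 and 2 are notified
sent = d.messages
reread = d.read(1, 0x20)                 # core 1 refetches the new value
```

With 64 cores, the write triggers two messages instead of one per core, because only the two recorded sharers are contacted.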