Much of the performance of our PCs comes from dividing work among several execution units that each run part of a program. But conflicts in access to data and instructions can sometimes become a performance problem. One way to address them is transactional memory.
One of the biggest problems with the multicore CPUs in our PCs is that they follow the Von Neumann model, in which there is a single shared memory. As the number of execution units, cores, threads and other elements working in parallel in a CPU increases, more and more conflicts arise between them, not only in accessing the data but also over the contents of the different memory addresses, and therefore over the values of the variables that programs use. There are many methods to avoid these conflicts. One of them is transactional memory, which we describe in this article.
An introduction to the problem
When a program is written, it is encoded as a series of instructions that appear to execute sequentially. But even within a single core, instruction-level parallelism means that several execution units can be working at the same time. On top of that, out-of-order execution adds the complexity that memory and data accesses at runtime do not happen in program order.
When there are many requests, they end up contending for access to the same memory. Requests are delayed longer and longer, which increases the memory latency the CPU sees on certain instructions and reduces effective bandwidth. To deal with this, there are mechanisms that avoid these memory access conflicts as far as possible by making processes access memory in an orderly way. This prevents conflicts when data is modified anywhere in the memory hierarchy, and it also reduces contention and, consequently, access latency.
The simplest way to achieve this is through locks, which mark sections of code that must not be executed simultaneously by different CPU threads. That is, only one core at a time can execute that part of the code. The rest of the cores are locked out and can only enter once the instruction that releases the lock is reached, which happens when the core holding the lock has finished the isolated section.
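As a rough illustration of the locking idea, here is a minimal Python sketch in which several threads increment a shared counter. The names `LockedCounter` and `run_workers` are made up for this example. The lock ensures that only one thread at a time executes the protected section, while the others wait until it is released.

```python
import threading


class LockedCounter:
    """Shared counter whose update is protected by a lock (illustrative name)."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # Only one thread at a time may run the body of this "with" block.
        # The lock is released automatically when the block ends.
        with self._lock:
            self.value += 1


def run_workers(num_threads=4, increments=10_000):
    """Start several threads that all increment the same counter."""
    counter = LockedCounter()

    def worker():
        for _ in range(increments):
            counter.increment()

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counter.value


if __name__ == "__main__":
    print(run_workers())
```

The cost of this approach is that every thread has to wait its turn, even when the threads would not actually have interfered with each other.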
What is transactional memory?
One method of avoiding the problems described in the previous section is transactional memory. Despite its name, it is not a type of memory or storage, so we are not talking about a separate piece of hardware. Its origin lies in database transactions, and in a CPU it takes the form of a type of instructions executed in the load-store units.
The transaction system in a processor works as follows:
- A copy of the part of memory that multiple cores want to access is created, one for each instance.
- Each instance modifies its private copy independently of the rest of the private copies.
- If a piece of data has been modified in one private copy and not in the rest, the modification is also copied to the rest of the private copies.
- If two instances change the same data at the same time and this creates an inconsistency, both private copies are discarded and replaced with the contents of the remaining private copies.
The fourth point is important, since it is where it becomes clear that this part of the code has to be serialized. The rest of the instances stop modifying their private copies and the modifications are made by only one of them. When it finishes, its modifications are copied to the rest of the private copies. Once the code marked as transactional has executed and all the private copies contain the same information, the result is written to the corresponding cache lines and memory addresses.
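The steps above can be approximated in software. The following Python sketch is a toy model and not how a CPU implements it: each transaction works on a private copy, and a version counter detects whether another instance committed in the meantime. If so, the private copy is discarded and the work is retried. It simplifies the scheme described above by detecting conflicts only at commit time. The names `TransactionalStore`, `atomically` and `transfer_many` are assumptions made for this example.

```python
import threading


class TransactionalStore:
    """Toy shared memory: a dict plus a version bumped on every commit."""

    def __init__(self, initial):
        self._data = dict(initial)
        self._version = 0
        self._commit_lock = threading.Lock()
        self.aborts = 0  # how many private copies had to be discarded

    def _snapshot(self):
        # Create the private copy and remember which version it is based on.
        with self._commit_lock:
            return dict(self._data), self._version

    def _try_commit(self, private_copy, start_version):
        # Commits are serialized: only one instance publishes at a time.
        with self._commit_lock:
            if self._version != start_version:
                self.aborts += 1  # someone else committed first: conflict
                return False
            self._data = private_copy
            self._version += 1
            return True

    def atomically(self, update):
        """Run update(private_copy) and retry until it commits cleanly."""
        while True:
            private_copy, start_version = self._snapshot()
            update(private_copy)  # modifies only the private copy
            if self._try_commit(private_copy, start_version):
                return

    def read(self, key):
        with self._commit_lock:
            return self._data[key]


def transfer_many(store, src, dst, amount, times):
    """Move `amount` from src to dst, one transaction per transfer."""
    def move(copy):
        copy[src] -= amount
        copy[dst] += amount

    for _ in range(times):
        store.atomically(move)


if __name__ == "__main__":
    store = TransactionalStore({"a": 1000, "b": 0})
    workers = [
        threading.Thread(target=transfer_many, args=(store, "a", "b", 1, 250))
        for _ in range(4)
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    print(store.read("a"), store.read("b"), "aborts:", store.aborts)
```

Unlike the lock-based approach, the update itself runs without holding the lock. Only the short snapshot and commit steps are serialized, and work is repeated only when a real conflict occurs.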
Transactional memory systems: Intel TSX
The acronym TSX, Transactional Synchronization Extensions, refers to a set of additional x86 instructions intended to add transactional memory support to Intel CPUs. It is a set of instructions, and mechanisms associated with them, that lets programmers delimit specific sections of code as transactional so that the Intel CPU carries out the process discussed in the previous section. The Intel implementation is a bit more complex, however, since, as we have seen, if two instances conflict over the same data, the whole transaction is aborted for one of them.
In hardware, the original transactional memory proposals add a new type of cache, called a transactional cache, in which the operations on the data are performed. Intel's first TSX implementation, as far as its documentation describes it, instead tracks the data read and written by a transaction in the existing first-level data cache. Keep in mind that what transactional memory seeks is to reduce conflicts when accessing memory. Although caches generally handle more requests than RAM, they are also limited, especially at the levels furthest from the cores. All this is combined with internal buffers and private registers that support the private copies used by the different cores.
The Intel TSX instructions are not a complex set. The XBEGIN instruction marks where a transactional section begins, XEND marks where it ends, and XABORT forces an exit from the transaction when an exceptional situation occurs.
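To make the roles of the three instructions more concrete, here is a Python analogy. It is not Intel's API and it executes no TSX instructions. The `Transaction` class and the `withdraw` function are hypothetical names for this sketch: entering the block plays the role of XBEGIN, leaving it normally plays the role of XEND, and calling `abort` plays the role of XABORT, after which execution continues in a fallback path and the changes made inside the transaction are discarded.

```python
class TransactionAborted(Exception):
    """Raised when a transaction is explicitly aborted (analogy for XABORT)."""

    def __init__(self, code):
        super().__init__(f"transaction aborted with code {code}")
        self.code = code


class Transaction:
    """Context manager that works on a private copy of a dict."""

    def __init__(self, memory):
        self._memory = memory
        self.copy = None

    def __enter__(self):
        # Analogy for XBEGIN: start working on a private copy.
        self.copy = dict(self._memory)
        return self

    def abort(self, code):
        # Analogy for XABORT: leave the transaction, keeping an abort code.
        raise TransactionAborted(code)

    def __exit__(self, exc_type, exc, tb):
        if exc_type is None:
            # Analogy for XEND: publish the private copy in one step.
            self._memory.clear()
            self._memory.update(self.copy)
        # On any exception the private copy is simply dropped.
        return False  # let the exception reach the fallback path


def withdraw(memory, amount):
    try:
        with Transaction(memory) as tx:
            tx.copy["balance"] -= amount
            if tx.copy["balance"] < 0:
                tx.abort(code=1)  # exceptional situation: not enough funds
        return "committed"
    except TransactionAborted as err:
        # Fallback path: shared memory still holds its old contents.
        return f"aborted with code {err.code}"


if __name__ == "__main__":
    account = {"balance": 100}
    print(withdraw(account, 30), account)
    print(withdraw(account, 100), account)
```

On real hardware a transaction can also abort for reasons the program did not request, such as a conflict with another core, so code that uses XBEGIN is normally written with a fallback path, typically an ordinary lock.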
The end of Intel TSX instructions?
Today’s CPU control units are effectively full-blown microcontrollers, which means that the way they decode instructions, and the list of instructions itself, can be updated. Intel first implemented TSX on the Haswell architecture and it remained in Intel CPUs until recently, when it was disabled via a microcode update on Intel’s sixth, seventh and eighth generation Core processors.
From time to time Intel releases microcode updates for its CPUs, which are delivered through BIOS/UEFI firmware updates or loaded by the operating system at boot, usually without the user noticing. They are not frequent, but they can include optimizations to the execution of certain instructions or even remove support for others. The stated reason for removing Intel TSX from these Intel Core processors is that, with the latest changes to the control unit's microcode, it can conflict with the operation of software, so that the CPU does not behave as it should.
But the real reason is that Intel TSX allows malicious code to run under the radar of classic security systems, especially code that targets the operating system, since the private copies belong to neither the user’s environment nor the operating system. It is therefore a problem similar to that of speculative execution.