Translation from one ISA to Another: How to Do It through Hardware?

Most of us use a PC and a PostPC device side by side, yet the two types of devices run incompatible software. Wouldn’t it be ideal if a PC or PostPC device could translate from one ISA (instruction set architecture) to another? This article covers the topic in as much detail, and as accessibly, as possible.

The world of computing is currently divided into two halves: on one hand, the programs that run on the x86 ISA, and on the other, the programs that run on the ARM ISA. In some cases, though, it is important for a PC or PostPC device to run both x86 and ARM programs.

Software for translation from one ISA to another

There are three main software solutions that let a CPU with one set of registers and instructions understand another. None of them requires dedicated translation hardware in the system, but none is efficient enough to guarantee the same performance as the hardware-based solutions we will see later in this article.

Emulation of the hardware the software is intended to run on

The simplest is emulation: a program that translates, in real time, binary code written for one architecture into code that the host machine can understand. The trade-off? Since the translation is done in real time, there is a loss of performance that grows with the difficulty of the emulation.
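As a rough illustration of that real-time translation, here is a minimal interpreter sketch. The three-instruction guest ISA (`LOADI`, `ADD`, `JNZ`) and the tuple encoding are invented for this example and do not correspond to any real processor; a real emulator would decode actual binary opcodes.

```python
# Hypothetical guest ISA: opcodes and encoding are invented for illustration.
def emulate(program, registers=None):
    """Interpret guest instructions one at a time on the host."""
    regs = dict(registers or {})
    pc = 0  # guest program counter
    while pc < len(program):
        op, *args = program[pc]           # fetch and decode
        if op == "LOADI":                 # LOADI rd, imm  -> rd = imm
            rd, imm = args
            regs[rd] = imm
        elif op == "ADD":                 # ADD rd, ra, rb -> rd = ra + rb
            rd, ra, rb = args
            regs[rd] = regs[ra] + regs[rb]
        elif op == "JNZ":                 # JNZ ra, target -> jump if ra != 0
            ra, target = args
            if regs[ra] != 0:
                pc = target
                continue
        else:
            raise ValueError(f"unknown guest opcode: {op}")
        pc += 1
    return regs

guest_program = [
    ("LOADI", "r0", 0),           # accumulator
    ("LOADI", "r1", 3),           # loop counter
    ("LOADI", "r2", -1),          # decrement step
    ("ADD", "r0", "r0", "r1"),    # r0 += r1
    ("ADD", "r1", "r1", "r2"),    # r1 -= 1
    ("JNZ", "r1", 3),             # loop back while r1 != 0
]
result = emulate(guest_program)
```

Every guest instruction costs several host operations (fetch, decode, dispatch, execute), which is where the performance loss comes from.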

In addition, some emulators reproduce not only the CPU but all of the accompanying hardware. This adds an extra level of difficulty and requires host hardware that is usually a couple of orders of magnitude more powerful than the system being emulated.

That is why, despite the PC being technically superior to video game consoles, it can take a long time, sometimes more than a decade, before a newly launched console can be emulated correctly.

Translation from one ISA to another through compilation

The most direct way to make one ISA stand in for another is to translate all of an application’s instructions from the source ISA into instructions of the target ISA while the application is being installed.

And how is this achieved? By unpacking the instructions: a program running in the background during installation walks through the program code one instruction at a time, analyzes each one, and translates it from the source ISA into one or more instructions of the target ISA.
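The instruction-by-instruction pass described above can be sketched as a table of translation rules. Both the source mnemonics (`MOV`, `ADDM`, `INC`) and the target assembly strings are hypothetical, chosen only to show that one source instruction may expand into one or more target instructions.

```python
# Hypothetical source and target ISAs; all mnemonics are invented.
TRANSLATION_RULES = {
    # source mnemonic -> function returning a list of target instructions
    "MOV": lambda dst, src: [f"mov {dst}, {src}"],
    "ADDM": lambda dst, addr: [          # add a memory operand to a register
        f"ldr tmp, [{addr}]",
        f"add {dst}, {dst}, tmp",
    ],
    "INC": lambda dst: [f"add {dst}, {dst}, #1"],
}

def translate_program(source_instructions):
    """Translate the whole program once, ahead of execution."""
    target = []
    for mnemonic, *operands in source_instructions:
        rule = TRANSLATION_RULES.get(mnemonic)
        if rule is None:
            raise NotImplementedError(f"no rule for {mnemonic}")
        target.extend(rule(*operands))
    return target

source = [("MOV", "r0", "r1"), ("ADDM", "r0", "r2"), ("INC", "r0")]
translated = translate_program(source)
```

Because the work happens once at installation, the translated program then runs without the per-instruction overhead of an emulator.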

This is the method Apple uses on its M1 to execute x86 code. However, the resulting code is not as efficient as a native binary compiled directly from the application’s source code for the new set of registers and instructions.

Fat Binaries

A Fat Binary is an already compiled program that contains separate sections for different architectures. When the program starts, it first checks which ISA the processor implements and then jumps to the memory address that holds the binary code for that set of registers and instructions.

The consequence is that the program file is much larger than a binary for a single architecture. At the same time, it is the easiest way to ensure a transition to a new ISA on a platform, since Fat Binaries let the application reach the new market without many complications.
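A minimal sketch of the idea, assuming a made-up container layout: a header maps each architecture name to the offset and size of its code inside one shared payload. Real fat binary formats have their own headers, and the byte strings below are placeholders rather than working programs.

```python
import platform

# Hypothetical in-memory "fat" container: one header entry per architecture,
# each giving (offset, size) of that architecture's code inside the payload.
def build_fat_binary(slices):
    header, payload = {}, b""
    for arch, code in slices.items():
        header[arch] = (len(payload), len(code))
        payload += code
    return header, payload

def select_slice(header, payload, arch):
    """Return only the code that matches the requested ISA."""
    if arch not in header:
        raise RuntimeError(f"no slice for architecture {arch!r}")
    offset, size = header[arch]
    return payload[offset:offset + size]

header, payload = build_fat_binary({
    "x86_64": b"\x90\x90\xc3",         # placeholder bytes
    "arm64": b"\x1f\x20\x03\xd5",      # placeholder bytes
})
arm_code = select_slice(header, payload, "arm64")

# On a real machine the architecture would come from the system, for example:
host_arch = platform.machine()  # value differs from machine to machine
```

The payload holds the code for every architecture, which is why the file grows with each ISA it supports.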

Hardware for translation from one ISA to another

But what really interests us are the options at the hardware level. We want to avoid the slowness of emulation and the less efficient code produced by install-time translators, and we do not want code for an ISA that our CPU cannot execute taking up memory.

Is it possible to do this? Yes, but to understand how, we first have to cover a few basic concepts.

CISC vs RISC, two ways to design the instruction set of a processor

To understand the difference between CISC and RISC, first imagine that a program is nothing more than a list of instructions. Suppose you write such a list on a sheet of paper and are free to use any words in the language you like.

Now imagine you are told that you can only use a limited set of verbs. The consequence? Your list becomes much longer, because you have to compose complex actions out of simpler ones. That is the difference between CISC and RISC instructions. What matters in a processor, be it a CPU, a GPU or any other type, is that each instruction must be implemented in hardware and has its own data path in the decode and execute stages of the instruction cycle.

The consequence is that processors with a RISC instruction set are much simpler in construction than CISC processors. The RISC vs CISC debate has been out of date for years, but the general rule still holds: a CISC program occupies less memory because it needs fewer instructions, while a RISC processor is easier to implement.
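To make the trade-off concrete, the sketch below performs the same addition of two memory values on two toy machines. The memory-to-memory `ADDM` instruction and the `LOAD`/`ADD`/`STORE` sequence are hypothetical and stand in for CISC-style and RISC-style instruction sets in general.

```python
# Toy machines: memory is a dict of address -> value. All opcodes are invented.
def run_cisc(memory):
    # One hypothetical memory-to-memory instruction: ADDM dst, src_a, src_b
    program = [("ADDM", 0x20, 0x10, 0x14)]
    for _, dst, a, b in program:
        memory[dst] = memory[a] + memory[b]
    return len(program)

def run_risc(memory):
    # The same work on a hypothetical load/store ISA
    program = [
        ("LOAD", "r1", 0x10),
        ("LOAD", "r2", 0x14),
        ("ADD", "r3", "r1", "r2"),
        ("STORE", "r3", 0x20),
    ]
    regs = {}
    for op, *args in program:
        if op == "LOAD":
            regs[args[0]] = memory[args[1]]
        elif op == "ADD":
            regs[args[0]] = regs[args[1]] + regs[args[2]]
        elif op == "STORE":
            memory[args[1]] = regs[args[0]]
    return len(program)

cisc_mem = {0x10: 7, 0x14: 5}
risc_mem = {0x10: 7, 0x14: 5}
cisc_count = run_cisc(cisc_mem)   # number of instructions in the CISC version
risc_count = run_risc(risc_mem)   # number of instructions in the RISC version
```

Both versions leave the same result in memory; the CISC version needs fewer instructions, while each RISC instruction does one simple thing that is easier to build in hardware.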

The importance of micro-instructions

At the end of the 80s the market was dominated by two processor families: on one hand the Intel x86, where the 80286 and the 80386 were the kings, and on the other the Motorola 68000. What did they have in common? They were CISC processors, and pipelining them to increase speed was difficult.

But what is pipelining? It means dividing the execution of an instruction into several stages, each taking one clock cycle, so that when an instruction is in stage n, the instruction that comes after it is in stage n-1 and the previous one is in stage n+1. This allowed clock speeds to increase considerably at first, but it then became a problem: designers saw that ISAs as originally created, whether RISC or CISC, could not scale much further.
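The stage overlap described above can be sketched with a small simulation. The four stage names are an assumption made for the example; real processors use different and usually longer pipelines.

```python
STAGES = ["fetch", "decode", "execute", "writeback"]  # assumed 4-stage pipeline

def pipeline_timeline(instructions, stages=STAGES):
    """Return, for each clock cycle, which instruction occupies each stage."""
    total_cycles = len(instructions) + len(stages) - 1
    timeline = []
    for cycle in range(total_cycles):
        row = {}
        for s, stage in enumerate(stages):
            i = cycle - s  # instruction i entered the pipeline at cycle i
            if 0 <= i < len(instructions):
                row[stage] = instructions[i]
        timeline.append(row)
    return timeline

timeline = pipeline_timeline(["I0", "I1", "I2"])
for cycle, row in enumerate(timeline):
    print(cycle, row)
```

In cycle 2, for instance, I0 is in the execute stage while I1, the instruction after it, is one stage behind in decode, and I2 is being fetched.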

The solution was to use micro-instructions: very basic instructions that are combined, in different orders, to build more complex instructions. As for clock speed, the idea is that if you subdivide execution into a greater number of shorter stages, the clock period you can achieve is set by how long the slowest of those stages takes to execute.
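The relationship between stage length and clock speed can be worked through with a few lines of arithmetic. The delays below are illustrative numbers rather than measurements, and the sketch ignores the overhead that the registers between stages add in real hardware.

```python
def max_clock_mhz(stage_delays_ns):
    """The clock period is bounded by the slowest stage; frequency is its inverse."""
    period_ns = max(stage_delays_ns)
    return 1000.0 / period_ns  # a period of 1 ns corresponds to 1000 MHz

unpipelined = max_clock_mhz([20.0])                 # all work in one 20 ns step
four_stage = max_clock_mhz([5.0, 5.0, 5.0, 5.0])    # same work split evenly
unbalanced = max_clock_mhz([2.0, 8.0, 5.0, 5.0])    # same work split unevenly
```

With these assumed delays, the single-step design tops out at 50 MHz, the balanced four-stage split at 200 MHz, and the unbalanced split at 125 MHz, because its 8 ns stage sets the pace for all the others.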

As a result, most processors since the mid-90s appear to software to implement a specific ISA, but that ISA is really a disguise: internally they add an extra decoding phase in which a special unit translates the instructions into micro-instructions.

Micro-instructions are also important because they give hardware engineers two advantages: they can reuse the design patterns of some instructions for others, and, when the budget is limited, they can share parts of one instruction’s implementation with another. So when it comes to moving from one ISA to another in real time in hardware, the important thing is to translate straight to micro-instructions, since that avoids a double translation.

Fixed-function hardware for real-time translation from one ISA to another

The image above corresponds to a VIA Technologies patent whose title reads roughly as follows:

Microprocessor that executes program instructions for the ARM ISA and the x86 ISA through hardware translation into micro-instructions executed in a common execution pipeline.

We are not going to explain the patent point by point; the takeaway is that this type of unit is not impossible at the hardware level. But wouldn’t it be better to translate code directly from one ISA to the other in hardware? No, and the reason is to keep the translation hardware simple: an ISA can have dozens of different instructions, and a 1:1 mapping between two ISAs can lead to extremely complex hardware, while translation into micro-instructions is much simpler.

The translation hardware has an internal table in which each instruction is mapped to a series of micro-instructions. The advantage is that it is not even necessary to support the entire instruction set in a 1:1 ratio, and new instructions for the source ISA can be added through future firmware updates to the translation hardware.
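A software model of such a table might look like the sketch below. The instruction names, the micro-op names (`uLOAD`, `uADD`, `uMUL`) and the `add_instruction` helper are all invented for illustration; none of them are taken from the VIA patent or from any real microarchitecture.

```python
# Hypothetical translation tables: two front-end ISAs, one shared micro-op set.
MICRO_OP_TABLES = {
    "x86": {
        "ADD_REG_MEM": ["uLOAD tmp, [addr]", "uADD dst, dst, tmp"],
        "INC_REG": ["uADD dst, dst, 1"],
    },
    "arm": {
        "LDR": ["uLOAD dst, [addr]"],
        "ADD": ["uADD dst, src1, src2"],
    },
}

def translate_to_micro_ops(isa, instruction):
    """Look up the micro-op sequence for one instruction of the given ISA."""
    table = MICRO_OP_TABLES[isa]
    if instruction not in table:
        raise NotImplementedError(f"{isa}:{instruction} not in translation table")
    return table[instruction]

def add_instruction(isa, instruction, micro_ops):
    """Model a firmware update that extends the table with a new instruction."""
    MICRO_OP_TABLES[isa][instruction] = list(micro_ops)

x86_uops = translate_to_micro_ops("x86", "ADD_REG_MEM")
arm_uops = translate_to_micro_ops("arm", "LDR")

# A later update adds support for an instruction the table did not cover.
add_instruction("arm", "MLA", ["uMUL tmp, src1, src2", "uADD dst, tmp, src3"])
mla_uops = translate_to_micro_ops("arm", "MLA")
```

Both ISAs end up as sequences drawn from the same small micro-op vocabulary, so a single execution pipeline can serve them, and supporting a new instruction only requires a new table entry.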

Artificial Intelligence for the translation of instructions in real time

The problem with hardware translators is that, although processors of different architectures can share an external ISA, each architecture within that ISA can have variations in its internal set of micro-instructions, which forces a new translator to be created for each new architecture launched on the market.

The solution to avoid this headache is to turn to artificial intelligence: training an AI to learn the patterns for translating one ISA into another, which is essentially the same as training an AI to translate one natural language into another. This is more of a mixed solution, however, since it combines software compilation or emulation with AI hardware that learns the patterns it will then apply to generate the code.

We are already seeing hardware units added to processors to accelerate artificial intelligence algorithms, so the cost in terms of hardware is zero: no additional hardware needs to be created. Using artificial intelligence algorithms alongside the hardware that accelerates them will be key to moving from one ISA to another in real time.