CPI and IPC, the Performance Values of Every Processor

When measuring the performance of a processor, many metrics are usually used, but the most common are the CPI and the IPC. What do these acronyms refer to and how do they affect the design of new

When we say that one processor has better performance than another processor, what exactly do we mean? Well, a processor executes a program at a higher speed than another, but what are the parameters and conditions used to improve the performance of a processor compared to its successor?

Marketing departments when it comes to selling a technology as complex as a CPU or a GPU have to pull simplifications in performance, since these have an increasing complexity and precisely one of the fallacies they have created has been the term IPC as a term to measure the performance of a processor.

CPI or Cycles per Instruction, the true measure of performance

The reality in processors is that not all instructions take the same number of cycles to resolve , that is, the instruction cycle of each instruction varies according to its complexity and the number of steps it has to go through.

One way that architects improve the performance of the processors is by reducing the number of steps a CPU needs to solve certain specific instructions, in such a way that in what is the performance when executing a thread or a program You end up increasing the speed of program execution by speeding up those instructions.

This phenomenon occurs both in CPUs and GPUs, and it is the most common way to increase the performance of a processor, since the set of registers and instructions is what the instruction does, the memory addressing it uses and the records, but does not indicate how.

For example. In this comparison table you can see the evolution of the GCN GPUs architecture from AMD to RDNA from the same company, as you can see the same shader program takes fewer cycles in the RDNA architecture, since a good part of its instructions have a Lower CPI.

IPC or instructions per cycle

In computer architecture , the concept of instructions per cycle refers to the number of instructions that a processor executes simultaneously, so it is mainly limited to the number of execution units in the processor, but it is not the only limiting factor. since there may be instructions that use elements of the processor in common and that together give worse performance.

But this term has a confusing variant, based on taking the time that two processors take to execute the same program, and then measuring the difference in time, passing it to clock cycles and measuring the variation between the two. What comes out is an average of instructions per cycle, but it is not what in computer architecture is usually called IPC.

The IPC as we have said at the beginning is the amount of simultaneous instructions that the processor is executing, and when we say executing we mean what it is solving at that moment. It is not the amount of instructions resolved since very few instructions, to say almost none, are resolved today in a single clock cycle.

Benchmarks only tell part of the story

One way to measure the performance of a processor is through benchmarks, which are nothing more than programs that execute a series of instructions and it is in the selection of the instructions used where the key to everything is. Manufacturers rely heavily on performance tests to sell the product and need to know which instructions the newer versions of these will use the most.

That is why there are benchmarks that are more oriented to take advantage of the instructions whose CPI has been improved in a processor or to give an advantage to a specific brand over the others, at the same time the manufacturers also give preference to improving the CPI and IPC of the most used instructions for performance tests.

The software is the one that commands at the end of the day

What a processor does is execute a program and the important thing is that the most used programs on the market end up gaining more and more performance, that is why not only the instructions of the processors are optimized to get more performance in the benchmarks but also to that in the most used programs and functions make a difference.

In some cases, new variants of instruction sets and execution units are created, as happened with the advent of SIMD units in the late 1990s for multimedia content or this is happening with instructions to accelerate artificial intelligence algorithms.

Performance improvement doesn’t always have to come with a reduction in the number of cycles per instruction, it can come in the form of new execution units and even supporting co-processors.

Architecture versus architecture where CPI and IPC are concerned

One of the most common fallacies is to compare two architectures under the same ISA but from different manufacturers , which is a wrong comparison if it is done without taking into account a series of conditions.

Since we are completely unaware of the historical changes that each manufacturer has made in the CPI and IPC of the instructions, when measuring a program even under the same ISA we are going to find disparate results if we compare the performance with that of another program. The reason? They both use the same instructions but the time per instruction varies.

But if we talk about implementations of the same manufacturer, then we find that both CPUs and GPUs evolve progressively, from one iteration to another a good part of the instructions of the previous one are maintained and the CPI of a few selected in the processor design period.

What is the relationship between FLOPS and CPI and CPI?

The FLOPS is the number of operations in comma or floating point that a unit of execution can do, an operation only lasts one cycle and should not be confused with an instruction. These are also used in marketing instead of CPI and IPC, especially when it comes to GPUs.

For example, we can have a processor that has a much lower FLOPS rate than another and get better performance due to the fact that the number of cycles per instruction is shorter in one architecture than another . When it comes to measuring performance in FLOPS, marketing departments take the instruction with the lowest CPI of all, as long as they present very high numbers. But in reality we all know that a program is made up of a multitude of disparate instructions.

In the same way that it is a mistake to compare different architectures with a disparate CPI, it is even more a mistake to compare different architectures in terms of the FLOPS rate.