TFLOPS: Why They Are Not Useful for Comparing Performance

If there is one figure everyone looks at when comparing the performance of CPUs, GPUs or consoles, it is TFLOPS. Yet the vast majority do not know what TFLOPS are, what they measure and, above all, why they are not really that important. Today we are going to set the record straight by working out, step by step, how they are calculated and why they are so overrated.

It is the eternal fight between those who try to make the average user aware that TFLOPS are not the main thing to look at when evaluating the performance of a component or system, and those who take the figure as gospel.

TFLOPS: Why They Are Not Useful

How can we see that this measure is not really representative? By getting to know it more thoroughly first.

Strictly speaking, we should talk about FLOPS. Why?

Basically because TFLOPS is nothing more than a larger multiple of the base unit on which it intrinsically depends: FLOPS, or Floating Point Operations per Second. As its name indicates, it is the unit used to measure a computer's floating-point performance, and several standards in the PC world, such as LINPACK, are defined around it.

With this understood, the controversy arises, since there are several ways to measure FLOPS and therefore TFLOPS, the latter being nothing more than a larger unit reflecting trillions of operations per second; specifically, 1 TFLOPS equals 10^12 FLOPS.
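
As a quick illustration of the unit ladder, here is a minimal Python sketch; the prefix table is simply the standard powers of ten:

    # SI prefixes for floating-point throughput: 1 TFLOPS = 10^12 FLOPS.
    FLOPS_UNITS = {
        "FLOPS": 10**0,
        "MFLOPS": 10**6,   # mega
        "GFLOPS": 10**9,   # giga
        "TFLOPS": 10**12,  # tera, the unit spec sheets like to quote
    }

    def to_tflops(value: float, unit: str) -> float:
        """Convert a throughput figure expressed in `unit` to TFLOPS."""
        return value * FLOPS_UNITS[unit] / FLOPS_UNITS["TFLOPS"]

    print(to_tflops(9_753_600, "MFLOPS"))  # 9.7536, the RX 5700 XT figure used below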

The standards reflect two different measurements: peak (sometimes called "real-time") and sustained, and as a general rule no manufacturer specifies which of the two appears in its data. Normally they quote the peak figure, since it coincides with the theoretical maximum of each component, which makes the measurement somewhat biased.

Why is comparing with TFLOPS not correct?

Basically because TFLOPS is a measure that takes nothing about the architecture into account, only the compute units and their clock frequency. It therefore leaves aside every parameter that does influence performance, such as inputs and outputs, the arrangement of the caches and their latencies, the ALUs, the buses and more.

To give an easy and clear example: the RX 5700 XT reaches 9.754 TFLOPS while the RTX 2070 reaches 7.465 TFLOPS, which would suggest a 30.66% difference between the two; in real life, however, their performance is practically identical.

How do you arrive at these numbers? Very easily (a code sketch follows the list):

  • TFLOPS -> shaders x 2 x Boost frequency (MHz) / 10^6 (the 2 reflects one fused multiply-add, i.e. two operations, per shader per clock)
  • RTX 2070 -> 2304 x 2 x 1620 MHz -> 7,464,960 MFLOPS -> ~7.465 TFLOPS
  • RX 5700 XT -> 2560 x 2 x 1905 MHz -> 9,753,600 MFLOPS -> ~9.754 TFLOPS
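
Translating that formula into code makes the arithmetic explicit. A minimal sketch, assuming the factor of 2 stands for one fused multiply-add (two floating-point operations) per shader per cycle, with the shader counts and Boost Clocks taken from the official spec sheets:

    def peak_tflops(shaders: int, clock_mhz: float, ops_per_cycle: int = 2) -> float:
        """Theoretical peak FP32 throughput.

        shaders * ops_per_cycle operations per clock, at clock_mhz million
        cycles per second, gives MFLOPS; dividing by 10^6 yields TFLOPS.
        """
        return shaders * ops_per_cycle * clock_mhz / 1_000_000

    print(peak_tflops(2304, 1620))  # RTX 2070 at Boost:   7.46496 -> ~7.465 TFLOPS
    print(peak_tflops(2560, 1905))  # RX 5700 XT at Boost: 9.7536  -> ~9.754 TFLOPS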

Since AMD does its calculation at the Boost Clock, but in practice the frequency never actually reaches that level and sits somewhere between the Game Clock and the Base Clock, it is more realistic for performance comparisons to take the Base Clock (1605 MHz) instead, which would give the RX 5700 XT about 8.218 TFLOPS.
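
Reusing the same sketch with the 1605 MHz Base Clock shows how much the headline number moves:

    print(peak_tflops(2560, 1605))  # RX 5700 XT at Base: 8.2176 -> ~8.218 TFLOPS
    # The on-paper gap to the RTX 2070 shrinks from ~30.7% to about 10%:
    print(peak_tflops(2560, 1605) / peak_tflops(2304, 1620) - 1)  # ~0.101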

Beyond this, a fair comparison requires knowing at least the performance per watt and the architecture, with all the variations that implies. This applies to CPUs, GPUs, consoles and any other component worth its salt; in the case of SoCs, the power of the CPU and GPU is often added together, which muddies the yardstick even further.
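
Performance per watt, for instance, can be estimated from the TFLOPS figure and the board power. A rough sketch, using the nominal board-power ratings (225 W for the RX 5700 XT, 175 W for the RTX 2070) purely as approximations, since real consumption varies by workload:

    def tflops_per_watt(tflops: float, board_power_w: float) -> float:
        """Peak throughput per watt of rated board power."""
        return tflops / board_power_w

    print(tflops_per_watt(9.754, 225))  # RX 5700 XT: ~0.0434 TFLOPS/W
    print(tflops_per_watt(7.465, 175))  # RTX 2070:   ~0.0427 TFLOPS/W

By this crude metric the two cards come out nearly identical, which reinforces the point: the headline TFLOPS gap on its own tells you very little.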