Hardware MTBF: What Is It For and How Is It Calculated?

Most hardware manufacturers, especially makers of storage devices and PC fans, list an MTBF figure expressed in hours (often millions of hours). In this article we explain what this parameter is, what it is for and how it is calculated, so that you can understand its importance and impact.

“This SSD has an MTBF of 1.5 million hours.” This is a very high figure (1.5 million hours is more than 171 years) that relates to a component's reliability, but it is so high that it raises doubts: after all, nobody imagines a component running for that long, right?
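The hours-to-years conversion quoted above is simple arithmetic. A minimal Python sketch, assuming 8,760 hours per year (365 days, ignoring leap years); the helper name `hours_to_years` is just for illustration:

```python
# Assumption: 365-day years, leap years ignored
HOURS_PER_YEAR = 24 * 365  # 8,760 hours


def hours_to_years(hours: float) -> float:
    """Convert a duration in hours to (non-leap) years."""
    return hours / HOURS_PER_YEAR


mtbf_hours = 1_500_000  # the SSD figure quoted above
print(f"{hours_to_years(mtbf_hours):.1f} years")  # 171.2 years
```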


What is MTBF and what is it for?

MTBF stands for Mean Time Between Failures. It denotes the expected time between failures of a mechanical or electronic system during normal operation. The value is typically calculated as an arithmetic mean (average) and is used for repairable systems. This distinction matters, because there is also the term MTTF (Mean Time To Failure), which is the equivalent figure for systems that cannot be repaired.


The MTBF figure therefore depends on what the manufacturer considers a failure, which makes it somewhat subjective. For complex, repairable systems, failures are events outside the design conditions that take the system out of service and require repair. Faults that occur without taking the system out of service are not counted.

In any case, this parameter is taken to be the average time a component can operate before it fails, calculated as the arithmetic mean of what the manufacturer expects, no more, no less. It is therefore not an exact or reliable value.

Let’s take an example: suppose three identical systems start working correctly at the same time (time 0) and run until each one fails. The first fails after 100 hours, the second after 120 hours and the third after 130 hours. The MTBF of these systems is the average of the three failure times: (100 + 120 + 130) / 3 ≈ 116.67 hours. If the systems could not be repaired, we would call this figure the MTTF.
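The same calculation as a minimal Python sketch, using the standard library's `statistics.mean`; the list simply holds the three failure times from the example:

```python
from statistics import mean

# Hours each of the three identical systems ran before failing (from the example)
failure_times_h = [100, 120, 130]

# MTBF as the arithmetic mean of the observed failure times
mtbf_h = mean(failure_times_h)
print(f"MTBF = {mtbf_h:.2f} hours")  # MTBF = 116.67 hours
```

The mean here is the same as total operating hours (350) divided by the number of failures (3).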


In short, this parameter tells us what the manufacturer “calculates” to be the average operating time of a device before an electronic or mechanical failure. Be careful, though: it does not cover other ways a device can reach the end of its life, such as the cells of an SSD reaching their write limit (that is not considered a failure, but the end of the drive's useful life).

What is this parameter for, then? Is it reliable?

The MTBF value can be used as a measure of system reliability or to compare different systems or designs. It should be read only loosely as a kind of “half-life”, not as the point at which the number of working units equals the number of failed ones.

Because MTBF can be described as the average life expectancy, many engineers assume that 50% of the components will have failed by the time they reach their MTBF. This is inaccurate and can lead to poor design decisions: under the constant-failure-rate assumption, roughly 63% of units will in fact have failed by then. Moreover, that failure-probability prediction assumes there are no systematic failures at all (that is, a constant failure rate with only intrinsic, random failures), which is not easy to verify.
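To see why the 50% assumption is off, here is a minimal sketch assuming the constant-failure-rate (exponential) model described above, in which the probability that a unit is still working at time t is R(t) = exp(-t / MTBF). The function name `reliability` is illustrative, not from any library:

```python
import math


def reliability(t_hours: float, mtbf_hours: float) -> float:
    """Probability a unit is still working at time t, assuming a constant
    failure rate of 1 / MTBF (exponential model)."""
    return math.exp(-t_hours / mtbf_hours)


mtbf = 1_500_000  # hours, the SSD figure quoted earlier

# Fraction of units that have failed once they reach t = MTBF
failed_at_mtbf = 1 - reliability(mtbf, mtbf)

# Time at which exactly half the units have failed (the true "half-life")
median_life = math.log(2) * mtbf

print(f"Failed by t = MTBF: {failed_at_mtbf:.1%}")       # 63.2%
print(f"Median life: {median_life / mtbf:.3f} x MTBF")   # 0.693 x MTBF
```

Under this model the 50% point arrives at about 0.693 × MTBF, well before the MTBF itself. Real components with early-life or wear-out failures will not follow this curve exactly.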

All that said, you should know that MTBF, although it is a reliability figure, is really more of an indicative value than a guarantee. It does not ensure that a component will work for a given time, because only electronic and mechanical failures are considered when calculating it, and many other circumstantial variables are not taken into account, such as type of use, environmental conditions and a long etcetera.

In other words, although MTBF can indicate that a product is reliable, it guarantees absolutely nothing (after all, it is a prediction and a calculation... nobody has tested a fan for 171 years to confirm that its mean time between failures is 1.5 million hours).
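A figure like 1.5 million hours does not require waiting 171 years. One common approach is to run many units in parallel for a shorter time and divide the total accumulated unit-hours by the number of failures observed; manufacturers may also rely on accelerated tests or model-based predictions. The sketch below assumes the simple unit-hours method plus a constant failure rate, and the test numbers are purely hypothetical. It also converts the MTBF into an annualized failure rate (AFR), the fraction of units expected to fail in one year of continuous use:

```python
import math


def estimate_mtbf(units: int, hours_per_unit: float, failures: int) -> float:
    """Point estimate: total accumulated unit-hours divided by observed failures."""
    if failures == 0:
        raise ValueError("no failures observed; the point estimate is undefined")
    return units * hours_per_unit / failures


def annualized_failure_rate(mtbf_hours: float, hours_per_year: float = 8760) -> float:
    """Fraction of units expected to fail within one year of continuous
    operation, assuming a constant failure rate (exponential model)."""
    return 1 - math.exp(-hours_per_year / mtbf_hours)


# Hypothetical test: 1,000 units run for 3,000 hours each, 2 failures observed
mtbf = estimate_mtbf(units=1_000, hours_per_unit=3_000, failures=2)
print(f"Estimated MTBF: {mtbf:,.0f} hours")          # 1,500,000 hours
print(f"AFR: {annualized_failure_rate(mtbf):.2%}")   # 0.58%
```

Read this way, 1.5 million hours says that roughly 0.6% of a large population of such drives might fail per year of continuous use, not that any single drive will last 171 years.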