Big Data: What Does it Have to Do with Hardware?

The term Big Data, also known as macrodata, has been around the modern computing scene for quite some time. However, like "the cloud", it is a term that is sometimes difficult to explain because it is rather abstract. So, in this article we are going to explain what Big Data is, what it consists of and, more importantly, how hardware influences it.

It is not surprising that conspiracy theorists have built plenty of theories around this term, but we can assure you right away that there is no link between Big Data and world domination, so you can rest easy. So what is this Big Data that is talked about so much in modern computing? Let's take a look.


What is Big Data?

Essentially, it means "a massive volume of data", but we said earlier that the concept is somewhat abstract because that is not all: it also encompasses the study of that data in order to look for patterns in it. It is a costly and complicated form of information processing that tries to discover something useful in the data.

To give you an example, imagine a supercomputer running tests to investigate a disease, generating millions upon millions of data points. Big Data includes not only that data, but also the way it is managed, classified and analyzed in order to find the answers being sought.

Thus, Big Data has five characteristics that define its use and philosophy:

  1. Volume – of course we are talking about massive volumes of data, so if their size is not significant they cannot be considered Big Data. Volume is therefore the primary characteristic of the concept.
  2. Variety – this attribute addresses the nature and type of the data to be analyzed.
  3. Speed – the data must be analyzed in real time, which means that even huge volumes of data must all be available at the same time. This is where hardware comes into play, both in the capacity to house the data and in the power to manage it.
  4. Variability – the consistency of the data sets determines the extent to which they fit the concept.
  5. Accuracy – the quality of the data used for the analysis. Only quality data can produce patterns; otherwise the analysis is a waste of time. In other words, if you are analyzing data from a disease investigation, you cannot mix in a Formula 1 driver's lap times because they would be inconsistent.

How much data is generated and stored?

In total, there are an estimated 2.7 Zettabytes of data in the digital universe. How much is that? Let's look at the units:

  • A Terabyte is 1024 Gigabytes
  • A Petabyte is 1024 Terabytes
  • An Exabyte is 1024 Petabytes
  • A Zettabyte is 1024 Exabytes

So 2.7 Zettabytes is about 2,968,681,394,995 Gigabytes. If we wanted to store it all on 4 TB hard drives, we would need almost 725 million of them. Unthinkable, right? Actually, not so much, considering that every minute more than 150,000 emails are sent, 3.3 million Facebook posts are generated and 3.8 million Google searches are performed.
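As a quick sanity check on those figures, here is a small, purely illustrative Python sketch that uses the 1024-based steps from the list above to convert 2.7 Zettabytes into Gigabytes and estimate how many 4 TB drives it would take:

```python
# Rough unit arithmetic for the figures quoted above (each step up is 1024).
GB_PER_ZB = 1024 ** 4              # Gigabytes in one Zettabyte (GB -> TB -> PB -> EB -> ZB)

total_zb = 2.7                     # estimated size of the digital universe, in Zettabytes
total_gb = total_zb * GB_PER_ZB    # ~2,968,681,394,995 GB

drive_gb = 4 * 1024                # one 4 TB hard drive, expressed in Gigabytes
drives_needed = total_gb / drive_gb

print(f"2.7 ZB is about {total_gb:,.0f} GB")
print(f"That is roughly {drives_needed / 1e6:.0f} million 4 TB drives")
```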

Furthermore, these figures grow day after day as more and more information is generated. To put that in perspective, in 2020 roughly 44 times more data was produced than in 2010, and the figures we have given you are expected to double within five years.

Big Data management and how hardware influences it

In reality, Big Data management is not too complicated to understand. We will try to explain it in a simple way (reality is somewhat more complex, but to keep things clear we will simplify as much as possible), with a minimal code sketch after the list:

  1. The data is captured.
  2. The captured data is sorted and separated into smaller units by an algorithm to make it easier to analyze.
  3. An index of the data is created, since otherwise the time needed to find any given piece of data would multiply.
  4. The data is stored.
  5. The data is analyzed using a large number of algorithms in order to search for the data that interests us, as we explained before.
  6. The results are displayed.
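To make those six steps a little more concrete, here is a deliberately tiny Python sketch of the same pipeline. The records, the token-based index and the "fever" pattern search are all invented for illustration; real Big Data systems rely on distributed frameworks and far more sophisticated algorithms, but the flow is the same.

```python
from collections import defaultdict

# 1. Capture: in a real system this would stream in from sensors, logs, users...
captured = [
    "patient 17 fever high",
    "patient 23 fever low",
    "patient 17 pressure high",
]

# 2. Sort/split: break each record into smaller units (here, simple tokens).
records = [line.split() for line in captured]

# 3. Index: map each token to the records that contain it, so a lookup
#    does not have to scan the whole data set every time.
index = defaultdict(list)
for position, tokens in enumerate(records):
    for token in tokens:
        index[token].append(position)

# 4. Store: a real system would persist this to disk or a distributed store;
#    here it simply stays in memory.
storage = {"records": records, "index": index}

# 5. Analyze: use the index to look for a pattern we care about,
#    for example every record that mentions "fever".
fever_hits = [storage["records"][i] for i in storage["index"]["fever"]]

# 6. Display the results.
print(f"Records mentioning 'fever': {fever_hits}")
```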

Let's go back to the example of the supercomputer being used to analyze a disease and try to find a cure. This supercomputer generates a massive volume of data, with many inputs and calculations every second, so a huge amount of storage space is needed to save and classify it all for further analysis.

This is where hardware comes in. You need a lot of storage space, but it also has to be as fast as possible in order to manage this data in the shortest possible time. You also need a lot of RAM and a great deal of computing power to run the algorithms that analyze this data.

In summary, Big Data management is only possible because the hardware industry keeps advancing: if processors, hard drives and RAM did not improve at the same rate as the data we generate grows, analyzing it would not be possible.