For years, people have been asking all-knowing Google how big data can help businesses succeed, which big data technologies are the best, and a wide range of other important questions. A lot has already been written and said about big data, but the term itself remains unexplained. To be fair, we do not count the widespread definition “big data is big.” That definition only raises another question: what is the measure of “big” – 1 terabyte, 1 petabyte, 1 exabyte or more?
Our big data consulting team favors a consistent approach, and here we would like to share the fundamentals and define what big data is through its key features.
We call data big data if it meets the following two criteria.
Informationally: In contrast to traditional data, which may change at any moment (e.g., bank accounts, quantity of goods in a warehouse), big data represents a log of records, each of which describes some event (e.g., a purchase in a store, a web page view, a sensor value at a given moment, a comment on a social network). Due to its very nature, event data does not change.
Besides, big data may contain omissions and errors, which makes it a bad choice for tasks where absolute accuracy is crucial. So, it doesn’t make much sense to use big data for bookkeeping. However, big data is statistically correct and can give a clear understanding of the overall picture, trends and dependencies. Another example from finance: big data can help identify and measure market risks based on the analysis of customer behavior, industry benchmarks, product portfolio performance, interest rate history, commodity price changes, etc.
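To make the difference between changeable traditional data and unchangeable event data more tangible, here is a minimal Python sketch. The `Event` class, the `record` helper and the sample stock figures are our illustrative assumptions, not part of any particular product.

```python
from dataclasses import dataclass, FrozenInstanceError
from datetime import datetime


@dataclass(frozen=True)  # frozen: an event cannot be modified once created
class Event:
    occurred_at: datetime
    kind: str
    payload: tuple  # e.g. (("qty", 2), ("sku", "A-17"))


event_log = []  # big data style: records are only ever appended


def record(kind, **payload):
    """Append a new event to the log and return it (hypothetical helper)."""
    event = Event(datetime.now(), kind, tuple(sorted(payload.items())))
    event_log.append(event)
    return event


# Traditional data: the current stock level is overwritten in place.
stock = {"A-17": 10}

# Event data: the purchase itself is logged and never changes afterwards.
purchase = record("purchase", sku="A-17", qty=2)
stock["A-17"] -= 2
```

Trying to assign `purchase.kind = "refund"` raises `FrozenInstanceError`, while the `stock` dictionary can be overwritten at any moment.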
Technically: Big data has a volume that requires parallel processing and a special approach to storage: one computer (or one node as IT gurus call it) is not sufficient to perform these tasks – we need many, typically from 10 to 100.
Besides, a big data solution needs scalability. To cope with ever-growing data volume, we should not have to change the software each time the amount of data increases. When that happens, we just add more nodes, and the data is redistributed among them automatically.
Let’s go beyond the definition and look at some illustrative examples to better understand what big data is. We classified these examples to show practical applications of big data in different industries.
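One common way to achieve such automatic redistribution is consistent hashing. The sketch below is a toy illustration under our own assumptions (the `HashRing` class, the node names and the number of ring points per node are hypothetical). Production systems use more elaborate placement schemes.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Map a string to a stable integer position on the ring."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Toy consistent-hash ring: each node owns several points on the ring."""

    def __init__(self, nodes, replicas=100):
        self.replicas = replicas
        self._points = []  # sorted ring positions
        self._owner = {}   # ring position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        for i in range(self.replicas):
            point = _hash(f"{node}#{i}")
            bisect.insort(self._points, point)
            self._owner[point] = node

    def node_for(self, key: str) -> str:
        """Return the node owning the first ring point at or after the key."""
        idx = bisect.bisect_left(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[idx]]


keys = [f"event-{i}" for i in range(10_000)]
ring = HashRing(["node-1", "node-2", "node-3"])
before = {k: ring.node_for(k) for k in keys}

ring.add_node("node-4")  # scale out: the application code stays the same
after = {k: ring.node_for(k) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
```

Only part of the keys ends up in `moved`, and every moved key now lives on `node-4`: the existing nodes hand over a share of their data and nothing else is reshuffled.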
To create a 360-degree customer view, retailers need to collect, store and analyze a plethora of data. The more data sources they use, the more complete the picture they will get. Say, for each of their 10+ million customers they can analyze:
Customer analytics is equally beneficial for retailers and customers. The former can adjust their product portfolio to better satisfy customer needs and organize efficient marketing activities. The latter can enjoy favorite products, relevant promotions and personalized communication.
To avoid expensive downtime that affects all the related processes, manufacturers can use sensor data to foster proactive maintenance. Imagine that the analytical system has been collecting and analyzing sensor data for several months to form a history of observations. Based on this historical data, the system has identified a set of patterns that are likely to end in a machine breakdown. For instance, the system recognizes that the picture formed by temperature and load sensors is similar to pre-failure situation #3 and alerts the maintenance team to check the machinery.
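A simplified way to picture this matching step is a nearest-pattern check. In the sketch below, the pattern values, the distance threshold and the function names are all our assumptions for illustration. A real system would learn its patterns from months of sensor history and likely use many more signals.

```python
import math

# Hypothetical pre-failure patterns learned from history:
# (temperature in degrees Celsius, load in % of rated capacity)
PRE_FAILURE_PATTERNS = {
    1: (95.0, 40.0),
    2: (70.0, 98.0),
    3: (88.0, 92.0),
}
ALERT_DISTANCE = 5.0  # assumed similarity threshold


def closest_pattern(temperature, load):
    """Return (pattern_id, distance) of the nearest known pre-failure pattern."""
    reading = (temperature, load)
    return min(
        ((pid, math.dist(reading, p)) for pid, p in PRE_FAILURE_PATTERNS.items()),
        key=lambda item: item[1],
    )


def check_reading(temperature, load):
    """Return an alert message if the reading resembles a pre-failure pattern."""
    pattern_id, distance = closest_pattern(temperature, load)
    if distance <= ALERT_DISTANCE:
        return f"Alert: reading resembles pre-failure situation #{pattern_id}"
    return None
```

With these sample values, `check_reading(87.0, 90.0)` reports pre-failure situation #3, while a reading such as `check_reading(60.0, 50.0)` is far from every pattern and returns `None`.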
Companies also use big data analytics to monitor the performance of their remote employees and improve the efficiency of their processes. Let’s take transportation as an example. Companies can collect and store the telemetry data that comes from each truck in real time to identify the typical behavior of each driver. Once the pattern is defined, the system analyzes real-time data, compares it with the pattern and signals if there is a mismatch. Thus, the company can ensure safe working conditions (drivers are supposed to swap so that each can rest, but they sometimes neglect the rule).
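The compare-with-the-pattern step can be sketched as a simple statistical baseline. Here we assume that the driver's pattern is just the mean and standard deviation of a single telemetry value (speed in km/h), and that a mismatch is anything more than three standard deviations away. The function names and figures are hypothetical.

```python
from statistics import mean, stdev


def build_profile(history):
    """Summarize a driver's historical readings as (mean, standard deviation)."""
    return mean(history), stdev(history)


def is_mismatch(value, profile, max_sigmas=3.0):
    """Flag a real-time reading that falls too far from the driver's profile."""
    average, deviation = profile
    return abs(value - average) > max_sigmas * deviation


# Hypothetical historical speeds (km/h) for one driver
history = [82, 85, 80, 84, 83, 81, 86, 82]
profile = build_profile(history)

typical = is_mismatch(84, profile)    # within the usual range
unusual = is_mismatch(120, profile)   # far outside the usual range
```

Here `typical` is `False` and `unusual` is `True`. The same idea extends to other telemetry, such as hours driven without a break.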
Banks can detect unusual card behavior in real time (for example, when somebody other than the owner is using the card) and block suspicious activities or at least postpone them to notify the owner. For example, if the user is trying to withdraw money in Spain while they reside in Texas, then before declining the transaction, the bank can check the user’s info on social networks: maybe they are simply on vacation. Besides, the bank can verify whether this user has any links to fraud-related accounts or activities across all other channels.
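The decision logic described here could look roughly like the rule-based sketch below. Everything in it is an assumption made for illustration: the `Transaction` fields, the `known_travel_countries` set (standing in for whatever external signal suggests the owner is traveling) and the `fraud_linked_cards` set. Real anti-fraud systems combine far more signals.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    card_id: str
    country: str
    amount: float


def decide(tx, home_country, known_travel_countries, fraud_linked_cards):
    """Return 'approve', 'block' or 'postpone_and_notify_owner'."""
    if tx.card_id in fraud_linked_cards:
        return "block"  # linked to known fraud-related activity
    if tx.country == home_country:
        return "approve"
    if tx.country in known_travel_countries:
        return "approve"  # unusual location, but the owner appears to be there
    return "postpone_and_notify_owner"


withdrawal = Transaction(card_id="card-1", country="ES", amount=200.0)

on_vacation = decide(withdrawal, "US", {"ES"}, set())
unexplained = decide(withdrawal, "US", set(), set())
linked_to_fraud = decide(withdrawal, "US", {"ES"}, {"card-1"})
```

In this sketch, the same withdrawal in Spain is approved when there is evidence of travel, postponed when there is none, and blocked when the card is linked to fraud.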
There are two categories of big data sources: internal and external ones. Let’s have a closer look at them.
When a company generates data, owns and controls it, this data is internal. External data is public data or the data generated outside the company; correspondingly, the company neither owns nor controls it. Let’s look at some self-explanatory examples of data sources.
Big data can be used both as a part of traditional BI and in an independent system. Let’s turn to examples again. A company analyzes big data to identify the behavior patterns of every customer. Based on these insights, it assigns customers with similar behavior patterns to the same segment. Finally, a traditional BI system uses customer segments as another attribute for reporting. For instance, users can create reports that show the sales per customer segment or each segment’s response to a recent promotion.
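The hand-off from big data analysis to BI reporting can be sketched in a few lines. The customer features, the segmentation rule and the sales figures below are hypothetical. In practice, segments usually come from clustering or similar techniques rather than a single threshold.

```python
from collections import defaultdict

# Hypothetical per-customer behavior features derived from event data
customers = {
    "c1": {"orders_per_month": 6, "avg_basket": 20.0},
    "c2": {"orders_per_month": 1, "avg_basket": 150.0},
    "c3": {"orders_per_month": 5, "avg_basket": 25.0},
}

# Hypothetical sales records: (customer id, amount)
sales = [("c1", 120.0), ("c2", 150.0), ("c3", 125.0), ("c1", 30.0)]


def segment_of(features):
    """Assumed rule-based segmentation by purchase frequency."""
    return "frequent" if features["orders_per_month"] >= 4 else "occasional"


def sales_per_segment(customers, sales):
    """Aggregate sales using the segment as a reporting attribute."""
    totals = defaultdict(float)
    for customer_id, amount in sales:
        totals[segment_of(customers[customer_id])] += amount
    return dict(totals)


report = sales_per_segment(customers, sales)
```

With this sample data, `report` contains 275.0 for the "frequent" segment and 150.0 for the "occasional" one, which is exactly the kind of attribute a BI report can slice by.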
Another example: imagine an ecommerce website supported by an analytical system that identifies the preferences of each user by monitoring the products they buy or are interested in (judging by the time spent on a product page). Based on this information, the system recommends “you-may-also-like” products. This is an independent system.
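As a rough sketch of how time spent on product pages can drive “you-may-also-like” suggestions, the snippet below picks the category a user lingered on the longest and proposes products from it that they have not viewed yet. The catalogue, the view events and the `recommend` function are hypothetical, and real recommendation engines rely on much richer models.

```python
from collections import defaultdict

# Hypothetical catalogue: product -> category
CATEGORY = {
    "tent": "camping",
    "sleeping bag": "camping",
    "stove": "camping",
    "novel": "books",
    "atlas": "books",
}

# Hypothetical page-view events: (user, product, seconds spent on the page)
views = [("u1", "tent", 90), ("u1", "novel", 5), ("u1", "sleeping bag", 40)]


def recommend(user, views, limit=2):
    """Suggest unseen products from the category the user viewed the longest."""
    seconds_by_category = defaultdict(int)
    seen = set()
    for viewer, product, seconds in views:
        if viewer == user:
            seconds_by_category[CATEGORY[product]] += seconds
            seen.add(product)
    if not seconds_by_category:
        return []  # no history for this user yet
    favorite = max(seconds_by_category, key=seconds_by_category.get)
    return [p for p, c in CATEGORY.items() if c == favorite and p not in seen][:limit]


suggestions = recommend("u1", views)
```

For the sample events, user `u1` spent the most time on camping products, so `suggestions` contains the one camping item they have not viewed yet, the stove.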
The world of big data speaks its own language. Let’s look at some good-to-know names and terms:
Our big data consultants created a short quiz. There are five questions for you to check how much you’ve learned about big data:
Well done! We hope that the article was helpful to you and that after reading it you’ve found the quiz easy.
Big data is another step to your business success. We will help you to adopt an advanced approach to big data to unleash its full potential.