The ‘Scary’ Seven: big data challenges and ways to solve them
Before going to battle, each general needs to study his opponents: how big their army is, what their weapons are, how many battles they’ve had and what primary tactics they use. This knowledge can enable the general to craft the right strategy and be ready for battle.
Just like that, before going big data, each decision maker has to know what they are dealing with. Here, we cover 7 major big data challenges and offer their solutions. Using this ‘insider info’, you will be able to tame the scary big data creatures without letting them defeat you in the battle for building a data-driven business.
Here are these 7 big data challenges:
- Insufficient understanding and acceptance of big data
- Confusing variety of big data technologies
- Paying loads of money
- Complexity of managing data quality
- Dangerous big data security holes
- Tricky process of converting big data into valuable insights
- Troubles of upscaling
Oftentimes, companies fail to know even the basics: what big data actually is, what its benefits are, what infrastructure is needed, etc. Without a clear understanding, a big data adoption project risks to be doomed to failure. Companies may waste lots of time and resources on things they don’t even know how to use.
And if employees don’t understand big data’s value and/or don’t want to change the existing processes for the sake of its adoption, they can resist it and impede the company’s progress.
Big data, being a huge change for a company, should be accepted by top management first and then down the ladder. To ensure big data understanding and acceptance at all levels, IT departments need to organize numerous trainings and workshops.
To see to big data acceptance even more, the implementation and use of the new big data solution need to be monitored and controlled. However, top management should not overdo with control because it may have an adverse effect.
It can be easy to get lost in the variety of big data technologies now available on the market. Do you need Spark or would the speeds of Hadoop MapReduce be enough? Is it better to store data in Cassandra or HBase? Finding the answers can be tricky. And it’s even easier to choose poorly, if you are exploring the ocean of technological opportunities without a clear view of what you need.
If you are new to the world of big data, trying to seek professional help would be the right way to go. You could hire an expert or turn to a vendor for big data consulting. In both cases, with joint efforts, you’ll be able to work out a strategy and, based on that, choose the needed technology stack.
Big data adoption projects entail lots of expenses. If you opt for an on-premises solution, you’ll have to mind the costs of new hardware, new hires (administrators and developers), electricity and so on. Plus: although the needed frameworks are open-source, you’ll still need to pay for the development, setup, configuration and maintenance of new software.
If you decide on a cloud-based big data solution, you’ll still need to hire staff (as above) and pay for cloud services, big data solution development as well as setup and maintenance of needed frameworks.
Moreover, in both cases, you’ll need to allow for future expansions to avoid big data growth getting out of hand and costing you a fortune.
The particular salvation of your company’s wallet will depend on your company’s specific technological needs and business goals. For instance, companies who want flexibility benefit from cloud. While companies with extremely harsh security requirements go on-premises.
There are also hybrid solutions when parts of data are stored and processed in cloud and parts – on-premises, which can also be cost-effective. And resorting to data lakes or algorithm optimizations (if done properly) can also save money:
- Data lakes can provide cheap storage opportunities for the data you don’t need to analyze at the moment.
- Optimized algorithms, in their turn, can reduce computing power consumption by 5 to 100 times. Or even more.
All in all, the key to solving this challenge is properly analyzing your needs and choosing a corresponding course of action.
Data from diverse sources
Sooner or later, you’ll run into the problem of data integration, since the data you need to analyze comes from diverse sources in a variety of different formats. For instance, ecommerce companies need to analyze data from website logs, call-centers, competitors’ website ‘scans’ and social media. Data formats will obviously differ, and matching them can be problematic. For example, your solution has to know that skis named SALOMON QST 92 17/18, Salomon QST 92 2017-18 and Salomon QST 92 Skis 2018 are the same thing, while companies ScienceSoft and Sciencesoft are not.
Nobody is hiding the fact that big data isn’t 100% accurate. And all in all, it’s not that critical. But it doesn’t mean that you shouldn’t at all control how reliable your data is. Not only can it contain wrong information, but also duplicate itself, as well as contain contradictions. And it’s unlikely that data of extremely inferior quality can bring any useful insights or shiny opportunities to your precision-demanding business tasks.
There is a whole bunch of techniques dedicated to cleansing data. But first things first. Your big data needs to have a proper model. Only after creating that, you can go ahead and do other things, like:
- Compare data to the single point of truth (for instance, compare variants of addresses to their spellings in the postal system database).
- Match records and merge them, if they relate to the same entity.
But mind that big data is never 100% accurate. You have to know it and deal with it, which is something this article on big data quality can help you with.
Security challenges of big data are quite a vast issue that deserves a whole other article dedicated to the topic. But let’s look at the problem on a larger scale.
Quite often, big data adoption projects put security off till later stages. And, frankly speaking, this is not too much of a smart move. Big data technologies do evolve, but their security features are still neglected, since it’s hoped that security will be granted on the application level. And what do we get? Both times (with technology advancement and project implementation) big data security just gets cast aside.
The precaution against your possible big data security challenges is putting security first. It is particularly important at the stage of designing your solution’s architecture. Because if you don’t get along with big data security from the very start, it’ll bite you when you least expect it.
Here’s an example: your super-cool big data analytics looks at what item pairs people buy (say, a needle and thread) solely based on your historical data about customer behavior. Meanwhile, on Instagram, a certain soccer player posts his new look, and the two characteristic things he’s wearing are white Nike sneakers and a beige cap. He looks good in them, and people who see that want to look this way too. Thus, they rush to buy a similar pair of sneakers and a similar cap. But in your store, you have only the sneakers. As a result, you lose revenue and maybe some loyal customers.
The reason that you failed to have the needed items in stock is that your big data tool doesn’t analyze data from social networks or competitor’s web stores. While your rival’s big data among other things does note trends in social media in near-real time. And their shop has both items and even offers a 15% discount if you buy both.
The idea here is that you need to create a proper system of factors and data sources, whose analysis will bring the needed insights, and ensure that nothing falls out of scope. Such a system should often include external sources, even if it may be difficult to obtain and analyze external data.
The most typical feature of big data is its dramatic ability to grow. And one of the most serious challenges of big data is associated exactly with this.
Your solution’s design may be thought through and adjusted to upscaling with no extra efforts. But the real problem isn’t the actual process of introducing new processing and storing capacities. It lies in the complexity of scaling up so, that your system’s performance doesn’t decline and you stay within budget.
The first and foremost precaution for challenges like this is a decent architecture of your big data solution. As long as your big data solution can boast such a thing, less problems are likely to occur later. Another highly important thing to do is designing your big data algorithms while keeping future upscaling in mind.
But besides that, you also need to plan for your system’s maintenance and support so that any changes related to data growth are properly attended to. And on top of that, holding systematic performance audits can help identify weak spots and timely address them.
Win or lose?
As you could have noticed, most of the reviewed challenges can be foreseen and dealt with, if your big data solution has a decent, well-organized and thought-through architecture. And this means that companies should undertake a systematic approach to it. But besides that, companies should:
- Hold workshops for employees to ensure big data adoption.
- Carefully select technology stack.
- Mind costs and plan for future upscaling.
- Remember that data isn’t 100% accurate but still manage its quality.
- Dig deep and wide for actionable insights.
- Never neglect big data security.
If your company follows these tips, it has a fair chance to defeat the Scary Seven.