Data Warehouse Pricing: Things To Be Aware Of

Head of Data Analytics Department, ScienceSoft


Editor's note: In this article, Alex discusses the factors that influence data warehouse pricing, drawing on his 25 years of experience in data analytics.

I bet you’ve already spent quite some time trying to find out how much a DWH can cost you, and spent it in vain. Little wonder, as DWH costs depend directly on the DWH scope. Even if you are building a data warehouse from scratch and know the approximate data volume, that input alone is not enough for an accurate calculation: the price ranges are too wide because scope varies so much. However, if you know the factors that make up the DWH price, you’ll be able to calculate a ballpark figure once you have defined your DWH objectives.

Since ScienceSoft has no interest in selling hardware or providing cloud storage, I have prepared this unbiased overview of price ranges for the DWH cost components. To keep the overview digestible, let me divide the overall costs into three groups: costs associated with software, hardware/virtual machines, and personnel.

Costs associated with software

Carefully chosen software enables DWH scalability and efficiency. For example, in one of ScienceSoft’s projects, AWS Application Load Balancers allowed a telecom company to scale the data volume up and down with no impact on the performance of their data warehouse.

When building a data warehouse, I recommend that you start by analyzing data sources and data volumes to gather functional and non-functional software requirements. I suggest entrusting this task to professionals to ensure that the chosen software fully meets your business needs.

When choosing database/metadata management software and extraction, transformation, and loading (ETL) tools, nothing prevents companies from using proprietary solutions (Oracle, IBM DB2, Microsoft SQL Server, Vertica, etc.). In ScienceSoft’s practice, however, we often use open-source solutions (PostgreSQL, Greenplum, Cassandra, ClickHouse, Talend, etc.), which are known for their stability and reliability in production and have massive user communities.

If you opt for free database management software and open-source ETL tools, ETL pipeline creation becomes the major item of software expenditure. The development of each pipeline can cost from $140 to $2,800, depending on the project complexity (the number of data sources and data formats, data quality and consistency requirements, DWH configurations, etc.). Thus, once you know the complexity of the data preparation requirements for your DWH project, you will be able to calculate your software costs.
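As a rough illustration of that calculation, the sketch below multiplies a pipeline count by the per-pipeline range quoted above. The function name and the example count of 12 pipelines are assumptions made for illustration only; where your project falls within the range depends on the complexity factors listed above.

```python
from __future__ import annotations

# Per-pipeline development cost range quoted in the article (USD).
COST_PER_PIPELINE_LOW = 140
COST_PER_PIPELINE_HIGH = 2_800


def etl_cost_range(pipeline_count: int) -> tuple[int, int]:
    """Return the (low, high) ballpark for building `pipeline_count` pipelines."""
    if pipeline_count < 0:
        raise ValueError("pipeline_count must be non-negative")
    return (
        pipeline_count * COST_PER_PIPELINE_LOW,
        pipeline_count * COST_PER_PIPELINE_HIGH,
    )


# Hypothetical project: the count of 12 pipelines is an assumption.
low, high = etl_cost_range(12)
print(f"ETL development: ${low:,} to ${high:,}")
```

The width of the resulting interval shows why the pipeline count alone is not enough: the complexity of each pipeline determines where in the range the real figure lands.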

Costs associated with hardware/virtual machines

Choosing how to store the massive amount of data is the second item on the agenda. Here you face two options: an on-premises or a cloud storage system. Even if you know the volume of data you need to keep and draw actionable insights from, the prices may vary significantly. Let’s see how the price varies for storing 1 TB of data (growing by 0.5 TB annually) depending on the storage type. I’ve selected hardware configurations and cloud clusters to best accommodate this data volume.

[Table: Data warehouse hardware/virtual machine costs]
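When sizing either option, it helps to project the stored volume over the planning horizon. The minimal sketch below assumes the linear growth described above (1 TB initially, plus 0.5 TB per year). The function name and the three-year horizon are assumptions for illustration, and no prices are included because they depend on the configuration you choose.

```python
def projected_volume_tb(
    years: int,
    initial_tb: float = 1.0,
    growth_tb_per_year: float = 0.5,
) -> float:
    """Return the data volume in TB after `years` of linear growth."""
    if years < 0:
        raise ValueError("years must be non-negative")
    return initial_tb + growth_tb_per_year * years


# Hypothetical three-year planning horizon (an assumption).
for year in range(4):
    print(f"Year {year}: {projected_volume_tb(year):.1f} TB")
```

Sizing on-premises hardware for the end-of-horizon volume, versus paying for cloud capacity as it grows, is one reason the two options can differ so much in cost.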

Personnel costs

To get the full picture of DWH expenses, you have to carefully calculate the cost of development and implementation services so that staffing does not become the most expensive part of your DWH project. To give you an estimate, I have taken the average market hourly rates of DWH consultants, developers, and support specialists and the amount of time required for such activities as DWH consulting, configuration, data migration, data cleaning, user training, ongoing troubleshooting, and performance tuning. But again, the costs associated with personnel depend on your DWH needs and constraints, and any accurate calculations are only possible after analyzing your particular case.

[Table: Data warehouse personnel costs]
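The personnel estimate described above comes down to multiplying hours by an hourly rate for each activity and summing the results. The sketch below shows that arithmetic only. The `Activity` type and every hour and rate figure in it are placeholders invented for illustration, not market data or ScienceSoft's numbers.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Activity:
    name: str
    hours: float
    hourly_rate: float  # USD per hour


def personnel_cost(activities: Iterable[Activity]) -> float:
    """Sum hours multiplied by hourly rate across all activities."""
    return sum(a.hours * a.hourly_rate for a in activities)


# PLACEHOLDER figures: replace with rates and hours from your own case.
plan = [
    Activity("DWH consulting", hours=40, hourly_rate=100),
    Activity("Data migration", hours=80, hourly_rate=50),
    Activity("User training", hours=16, hourly_rate=50),
]
print(f"Personnel: ${personnel_cost(plan):,.0f}")
```

Keeping each activity as a separate line item makes it easy to see which one dominates the total when you adjust the plan.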

How to balance the books?

An inaccurate estimation of DWH costs puts your project at risk of overspending and falling behind schedule. The cost factors I outlined here can be your first step toward avoiding such an outcome. For a more precise calculation, many more aspects should be taken into account, for example, whether your company will need real-time data processing. If you’re not sure you have all the factors needed to estimate your DWH project cost, ScienceSoft’s data analytics team can help you define them and provide you with an estimate. Just let me know.

