Data Warehouse Design: How To Structure Your Data Assets

Head of Data Analytics Department, ScienceSoft

4 min. read

Editor’s note: In this article, ScienceSoft’s data analytics experts share their 15 years of experience in data warehouse services to help you understand the architectural differences, as well as strong and weak sides of the two data warehouse design types.

If your company is developing a strategy for implementing a data warehouse (DWH), one of the first things you’ll need to settle on is whether to build your DWH in the cloud or on premises.


On-premises DWH

The architecture of a traditional on-premises DWH is represented by two components – the staging area and the storage area.

The staging area

The staging area is architected with the process of extraction, transformation, and loading. This process may be of two kinds - traditional ETL and modern ELT- depending of the place, where the transformation step is performed. With the traditional ETL process, data transformation into the target data structure is performed before loading into the storage area. With the ELT process, raw data is retrieved and loaded into the storage area with the transformation step performed in the storage area.

The storage area

ScienceSoft employs two approaches to building a storage area - top-down and bottom-up. According to both approaches, a data warehouse involves two structural elements – a centralized repository (here all company’s data is kept) and data marts (a subject-oriented database for storing the data related to specific business areas, for example, data belonging to certain units – marketing, finance, etc.).

According to the top-down approach, a central repository is designed first. Only after that, data marts are created to further segment enterprise data. The bottom-up approach considers data marts as main elements for storing data and the centralized repository is viewed as a combination of data marts to facilitate enterprise-wide analysis and reporting.

Strong sides of the on-premises DWHs

Opting for the on-premises data warehouse, you get:

  • Full control of your DWH

You get complete control over hardware and software, so in case of a failure, your IT staff has direct access to the problem area. Your specialists can adjust or replace hardware and software any time avoiding reliance on third parties. Moreover, data security remains strictly under your IT team’s control.

  • Full compliance

Data compliance is easier to achieve with on-premises DWHs. For example, to meet some regulatory requirements, you need to know exactly where the data is located.

  • Availability

Business users from a facility where DWH is located can effectively access all the data stored in a DWH without the dependence on the internet connection.

Weak sides of the on-premises DWHs

  • Full responsibility

Together with the full control of your on-premises DWH, you get the total responsibility for its implementation and maintenance. You will have to handle hardware and software upfront and ongoing costs, as well as pay for a team in charge of the deployment, administration and maintenance of your DWH solution.

  • Lack of scalability

On-premises DWHs cannot deal with the requirements for increased memory or compute power as effectively as in-cloud DWHs. Upscaling inevitably leads to purchasing new hardware, which may result in the need to tune or replace current software.

Cloud DWH

Going cloud is one of the DWH implementation trends. Each cloud vendor offers a unique DWH architecture. Here, we will have a look at the architecture of Amazon Redshift data warehouse, which is a cloud-based representation of a traditional data warehouse.

The core infrastructure component is a cluster. A cluster is composed of one or more compute nodes. If a cluster consists of more than one compute node, a leader node is added with the purpose to manage communication between them and client applications. Each compute node has its own dedicated CPU, memory, and attached disk storage which allows parallel query processing – that is why queries are processed quickly.

Strong sides of cloud DWH

  • Scalability

The inherent agility of cloud data warehouses allows upscaling and downscaling with no impact to DWH performance.

  • Reduced costs

Data warehouse costs vary significantly depending on numerous factors - a cloud vendor, software technology stack, etc. (Here, you may have a look at how ScienceSoft’s best practices DWH implementation helped to find the optimal technological combination for the customer based on their requirements to storage and compute.) But the one thing you don’t have to worry in case of implementing a data warehouse in the cloud, is your hardware. All costs connected with it (hardware acquisition, deployment, maintenance, administration, etc.) are eliminated. In case of choosing the PaaS model for your cloud DWH, you eliminate software acquisition and maintenance costs as well.

Hand over your DWH implementation and management to us

Turn to ScienceSoft to employ data warehouse as a service (DWaaS) to eliminate all issues connected with DWH implementation and management.

Weak sides of cloud DWH

  • Compliance

Although most cloud providers have security features difficult for attackers to penetrate, some industry standards and regulations still require sensitive data to be stored on premises.

  • The risk of budget overruns

Cloud flexibility can result in overspending, for example, due to unexpectedly increased query volumes. Thus, you will have to set query threshold alerts not to end up spending too much.

Do you want the best of both worlds? Go for a hybrid DWH.

A hybrid DWH is a data repository that coexists on premises and in the cloud environment – be it public, private or both. That way it combines the flexibility of the cloud, data availability and compliance granted by the on-premises deployment.

Do you want your best DWH solution?

ScienceSoft can help you weigh all the pros and cons to choose the best DWH implementation option for your company.


I want a DWH solution

A DWH vendor with 14 years of experience, we can develop, migrate, and support your data warehouse or consult on any issue concerning your DWH.