A Full Guide to Data Warehouse Design

Data Warehouse Design Guide - ScienceSoft

ScienceSoft has been rendering data warehousing services since 2005.

Data warehouse design: the essence

A data warehouse provides for the integration of data from heterogeneous and distributed data sources and its structuring and storing for analytical querying and reporting. Data warehouse design is the first step in implementing a DWH solution, and it focuses on creating the architecture of a DWH system.

Data warehouse solution architecture

A typical DWH architecture includes:

Data source layer

– internal and external data sources (ERP, CRM, sensor devices, social media, public databases, etc.) providing data fed into the DWH.

Staging area

– a temporary repository where records from data source systems undergo consolidation and processing before loading into the storage area. The staging area may be absent when data transformation goes in the target database (DWH/data marts).

Data storage layer

– hosting a DWH database permanent data storage that keeps slightly and highly structured data, and data marts – data warehouse subsets providing information for reporting and analysis for a company’s specific business line, department, or team.

Analytics and BI

– the data in the DWH database and data marts can be queried via OLAP tools, data mining tools, reporting and visualization tools.

Sample DWH solution architecture

DWH design plan

A DWH design process and its duration depend on:

  • Source system complexity and quality.
  • Data analytics complexity.
  • Data security complexity, etc.

Below, we list core steps needed to design a data warehouse solution.

Note: The timeframes below are highly approximate, as, for example, the architecture design project for an enterprise-level DWH may last up to 3-6 months and even more because of the project scale and specificity.

1

DWH requirements engineering

2

Discovery

3

DWH conceptualization

4

DWH design project planning

5

DWH technologies selection

6

DWH system analysis and data governance design

7

DWH data modeling and ETL/ELT design

Note: The next steps would be DWH development and launch, which are not addressed within the framework of this guide. In case you are interested in the end-to-end data warehouse implementation process, explore our structured overview of the DWH implementation process.

Talents required to design a data warehouse

Project manager

End-to-end DWH design project management:

  • Defines DWH design project scope, goals and deliverables.
  • Develops the DWH design project plan and communication approach.
  • Communicates DWH design project purpose and expectations to stakeholders.
  • Estimates and coordinates the efforts of DWH design project team members.
  • Ensures timelines and quality of the DWH design project deliverables within the set budget frames.

Business analyst

  • Analyzes the needs of key stakeholders and end users and translates the needs into the DWH requirements affecting design (e.g., the DWH solution should support operational analytics).
  • Describes the scope of the DWH system, its modules, and integrations with other software.

Solution architect

  • Designs a DWH architecture based on business and technology requirements.
  • Ensures the architectural requirements (availability, scalability, performance, reliability, etc.) are implemented in the DWH design.
  • Suggests a technology stack.

DWH system analyst

  • Examines data sources and data analytics software (if any) to be integrated into the DWH solution.
  • Draws up a system requirements specification for creating data models, designing ETL/ELT processes, etc.
  • Defines data integrity and data cleansing rules, etc.

Data engineer

  • Designs a data model and its structures and draws up the data flows.
  • Designs ETL/ELT processes.

Sourcing models

All in-house

Pros: The company has full control over the DWH design project.

Caution: Risk of project delays/failure due to the shortage of resources.

Outsourcing of technical resources

The company owns the DWH design project management while relying on outsourced resources to perform DWH platform selection, DWH solution architecture design and data modeling, etc.

Pros: No risk of the technical resources overprovisioning after the project completion.

Caution: The model requires constant cooperation of all team members. High requirements for in-house PM and BA competencies.

Complete outsourcing (in-house project sponsor, everything else is outsourced)

The company communicates its DWH-related needs to a vendor, who takes on detailed DWH requirements engineering, business planning, systems analysis, DWH design, etc.

Pros: No DWH project delays or failures due to resource unavailability.

Caution: Increased vendor dependency.

Get Your DWH Well-Designed!

ScienceSoft’s DWH team is ready to design a cost-effective and high-performing DWH solution within the set time and budget frames, applying DWH design best practices.

Data warehouse software we recommend

Below, we list full-scale data warehousing platforms recognized as leaders of Gartner Magic Quadrant and Forrester Wave reports that offer a comprehensive set of technologies to design scalable and high-performing cloud DWHs.

Amazon Redshift

Best for: petabyte-scale analytics

DESCRIPTION

  • Integration of all data types (structured, semi-structured, unstructured).
  • SQL data querying (including big data).
  • Automatic infrastructure provisioning, database backups and cluster health monitoring.
  • Federated query capability.
  • Deep integration with the AWS services (including S3, AWS Glue, Amazon EMR).
  • Integration with third-party tools (Power BI, Tableau, Informatica, Qlik, Talend Cloud).
  • Materialized views and ML-optimized performance.
  • End-to-end data encryption, granular access control and network isolation.
  • Separate billing for compute and storage resources.
  • On-demand pricing with no upfront costs.

Pricing

  • On-demand pricing: $0.25/hour (dc2.large) - $13.04/hour (ra3.16xlarge).
  • Reserved instance pricing can save up to 75% over the on-demand option (in a 3-year term).
  • Data storage (RA3 node types): $0.024/GB/month.

Azure Synapse Analytics

Best for: advanced data management

DESCRIPTION

  • SQL querying of structured, semi-structured, unstructured data, including big data.
  • Support for T-SQL, Python, Scala, Spark SQL, and .Net.
  • Native integration with Azure services, including Apache Spark, Power BI, Azure ML, Azure Stream Analytics, Azure Cosmos DB, etc.
  • Integration with third-party BI services (Tableau, SAS, Qlik, etc.).
  • Speeding up queries with result-set caching and workload isolation.
  • Automatic restore points and backups.
  • Always-on data encryption, dynamic data masking and fine-grained access control.
  • Separate billing for storage and compute resources.
  • Cost optimization with the pay-as-you-go/reserved capacity pricing models.

Pricing

  • Compute on-demand pricing: $1.20/hour (DW100c) - $360/hour (DW30000c).
  • Compute reserved instance pricing can save up to 65% over the on-demand option (in a 3-year term).
  • Data storage: $122.88/TB/month.

Oracle Autonomous Data Warehouse

Best for: hybrid DWH

Description

  • Deployment flexibility (Oracle public cloud (shared/dedicated infrastructure) or a customer’s data center).
  • Integration of all data types (structured, semi-structured, unstructured).
  • Automated DWH provisioning, scaling, tuning, and securing.
  • Native integration with Oracle Analytics Desktop.
  • Connectivity to Oracle Cloud Infrastructure Object Storage, Azure Blob Storage, Amazon S3.
  • Connection with custom applications and third-party products via SQL*Net, JDBC, ODBC.
  • Always-on data encryption, multifactor authentication, data classification and discovery.
  • Independent scaling of storage and compute resources.

Pricing

  • Compute: $1.3441/CPU/hour.
  • Data storage: $118.40/TB/month (in the public cloud).

Chose Optimal Techs to Design a Reliable DWH

We are ready to assist you with selecting the right DWH technology stack to design a scalable and effective DWH solution to address your short-and long-term data storage and processing needs and reduce DWH implementation and maintenance costs.

DWH design cost

Designing a 10GB data warehouse solution, which involves data transformation and data cleansing processes, may cost from $40,000 depending on the initial data quality, data transformation complexity, etc.

Among the major DWH design cost drivers are:

  • Number of data sources (ERP, CRM, SCM, etc.), data disparity across different sources (e.g., the difference in the data structure, format), data source complexity.
  • Data volume to be processed and stored.
  • Source data quality (low-quality data requires sophisticated data cleansing procedures).
  • Required data security level.
  • DWH velocity, scalability, and fault tolerance requirements.

Consider Professional DWH Design and Implementation Services

With 15 years in data warehousing services, ScienceSoft helps you design and implement a cost-effective DWH solution meeting your tactical and strategic business needs.

DWH design

  • DWH requirements engineering.
  • DWH design project planning.
  • DWH solution conceptualization and architecture design.
  • DWH software selection.
  • DWH system analysis and data governance design.
  • Design of data models and ETL/ELT process.

DWH implementation

  • DWH requirements engineering.
  • DWH solution conceptualization and platform selection.
  • DWH architecture design.
  • DWH solution development.
  • DWH quality assurance and launch.
  • DWH support and evolution.

About ScienceSoft

ScienceSoft is a global IT consulting and software development company headquartered in McKinney, TX, US. Since 2005, we’ve been providing DWH services, including DWH consulting, to help our customers build robust analytics with scalable and effective DWH solutions designed in accordance with their particular business needs.