Data Warehouse Implementation in 2021

Data Warehouse Implementation Steps, Costs, Trends - ScienceSoft

ScienceSoft has been providing data warehousing services since 2005.

The gist of data warehouse (DWH) implementation

DWH implementation implies developing and deploying a data warehouse to gather and structure company’s data from diverse sources for analytical querying and reporting. DWH is a core component of a business intelligence (BI) solution, a solution to derive historical, situational and predictive insights into business processes from collected and analyzed data.

DWH solutions

Learn what an enterprise data warehouse (EDW) is and explore its typical architecture, features, and integrations. Find out what software we recommend for building it.

Learn about a typical architecture for a healthcare DWH, its features, valuable integrations, factors that model healthcare DWH success, major financial outcomes and platforms for building a robust healthcare DWH solution.

DWH implementation trends

Trend 1. Moving to the cloud

  • Scalability and flexibility of a cloud DWH.

The inherent scalability of a cloud data warehouse allows for easy adaptation to the changing amount of data and the required processing capacity. Thus, scaling the data volume up and down does not affect the performance of the data warehouse.

  • Flexible pricing options.

Cloud providers offer flexible pricing models (e.g., pay-as-you-go) and discount opportunities for provisioned resources to meet their clients’ technical needs and budgets.

  • Data availability.

Nearly all cloud DWHs perform consistent backups automatically, which results in 99.99% data availability and fault tolerance.

Trend 2. Turning to Data Warehouse as a Service (DWaaS)

Minimizing data administration efforts.

When opting for DWaaS, you eliminate hardware and software acquisition, configuration and maintenance costs. As a DWaaS provider performs DWH administration and management, there is no need to hire an in-house team for managing the data storage infrastructure.

Trend 3. Big data integration into DWH

Combining historical business data with less structured data from big data sources (machine data, transactional data, public data, etc.) provides for uncovering hidden data patterns and correlations and getting insights that can drive business-improving actions, which is a huge step towards accurate forecasting and boosted profit.

DWH implementation plan

The process of implementing a DWH is closely bound to particular business needs and objectives, so the data warehouse implementation steps may differ or merge depending on the project specificity and scale.

1

DWH feasibility study

2

Discovery

3

DWH conceptualization and platform selection

4

Business planning

5

DWH system analysis and architecture design

6

Development and stabilization

7

DWH launch

8

DWH support and evolution

Talents required for DWH implementation

Project manager

Defines and communicates DWH implementation project objectives, manages project scope, costs, timing and quality.

Business analyst

Elicits and documents DWH solution’s functional and non-functional requirements (including DWH solution’s building blocks, integrations with data source systems, etc.), technical limitations (if any).

DWH system analyst

Analyses data sources (and their dependencies) and data analytics software (if any) to be integrated with the DWH solution. Reviews data loaded into the data warehouse for accuracy.

DWH solution architect

Draws up DWH architecture requirements. Designs DWH architecture that supports high availability, performance, scalability, and security of the DWH solution.

Data engineer

Develops a data model and its structures, draws up data flows (based on the system analyst’s input). Develops, tests and maintains a data pipeline routing source data to the DWH. Builds the ETL/ELT process.

Quality assurance engineer

Conducts DWH solution’s requirements analysis, defines a test strategy, and designs an optimal test environment to simulate real-time DWH scenarios. Executes test cases to evaluate functional and non-functional aspects of the DWH system.

DevOps engineer

Sets up the DWH software development infrastructure, automates and streamlines development and release processes by introducing CI/CD pipelines, monitors DWH performance, availability, and security.

Sourcing models

In-house DWH implementation

The company has full control over a DWH implementation project.

Caution: Not to delay or compromise the project, there should be the sufficient amount of resources and expertise.

Technical resources are partially outsourced

Augmenting the in-house tech team with a vendor’s resources to perform such activities as DWH design, implementation or support. The company has substantial control over the implementation project.

Caution: High requirements to in-house competencies. Additionally, there should be effective communication between all stakeholders to avoid project delays.

Technical resources are fully outsourced

Minimized risk of the resource overprovisioning after the project completion.

Caution: High requirements to in-house PM and BA competencies.

In-house project sponsor, everything else is outsourced

Minimized risk of DWH implementation project delays or failures due to resource unavailability. A vendor takes on full responsibility for the DWH implementation project and all related risks.

Caution: Increased vendor-related risks due to high vendor dependency.

Need Help to Implement a DWH?

With 15+ years in data warehousing, ScienceSoft is ready to advise on, implement and support your DWH to help you benefit from a cost-effective and high-performing data warehouse fully meeting your data storage, analytical and reporting needs.

Data warehouse software we recommend

Amazon Redshift

Best for: petabyte-scale analytics

DESCRIPTION

  • Integration of structured, semi-structured, unstructured data types.
  • SQL data querying (including big data).
  • Integrations with the AWS ecosystem (including S3, AWS Glue, Amazon EMR) and third-party tools (Power BI, Tableau, Informatica, Qlik, Talend Cloud).
  • Automated infrastructure provisioning, backups and cluster health monitoring.
  • Federated query support and result caching.
  • ML-optimized performance under varying workloads.
  • Data encryption in transit and at rest and fine-grained access control.
  • Separate scaling of compute and storage.

Pricing

  • On-demand pricing: $0.25/hour (dc2.large) - $13.04/hour (ra3.16xlarge).
  • Reserved instance pricing can save up to 75% over the on-demand option (in a 3-year term).
  • Data storage (RA3 node types): $0.024/GB/month.

Azure Synapse Analytics

Best for: advanced data management

DESCRIPTION

  • SQL querying of structured, semi-structured, unstructured data types.
  • Multilanguage support (T-SQL, Python, Scala, Spark SQL, .Net).
  • Native integrations with Apache Spark, Power BI, Azure ML, Azure Stream Analytics, Azure Cosmos DB, etc.
  • Integration with third-party BI tools, including Tableau, SAS, Qlik, etc.
  • Result-set caching.
  • Automatic restore points and backups.
  • End-to-end data encryption, dynamic data masking, granular access control.

Pricing

  • Compute on-demand pricing: $1.20/hour (DW100c) - $360/hour (DW30000c).
  • Compute reserved instance pricing can save up to 65% over the on-demand option (in a 3-year term).
  • Data storage: $122.88/TB/month.

Oracle Autonomous Data Warehouse

Best for: high-speed query processing

Description

  • Querying across structured, semi-structured, unstructured data types.
  • Connection with custom applications and third-party products via SQL*Net, JDBC, ODBC.
  • Connectivity to Oracle Cloud Infrastructure Object Storage, Azure Blob Storage, Amazon S3.
  • Native integration with Oracle Analytics Desktop.
  • Deployment flexibility (Oracle public cloud (shared/dedicated infrastructure) or a customer’s data center).
  • Automated scaling, performance tuning, patching and upgrades, backups and recovery.
  • Independent storage and compute scaling.
  • Data encryption at rest and in transit.
  • Multifactor authentication.

Pricing

  • Compute: $1.3441/CPU/hour.
  • Data storage: $118.40/TB/month (in the public cloud).

Get Advice on Optimal DWH Software

ScienceSoft is ready to help you choose optimal DWH technologies to reduce DWH implementation and maintenance costs and maximize ROI.

DWH implementation costs

A DWH implementation project, which involves developing a 10GB data warehouse with data integration and data cleansing processes, may vary from $225,000 to $485,000 (excluding software licensing and other regular fees).

The major factors that influence DWH implementation costs are:

  • Number of data sources (ERP, CRM, SCM, etc.) and their complexity (whether a data model description exists or not; if the data model differs from the new DWH data model).
  • Data volume.
  • Complexity of data cleansing.
  • Number of data tables and columns used for analysis.
  • Required security level and policies.
  • DWH velocity, scalability, and fault tolerance.

Consider Professional Services for DWH Implementation

ScienceSoft has been providing DWH services since 2005 and can help you build a DWH solution fully aligned with your business objectives with optimized investments involved.

DWH implementation consulting

  • DWH implementation feasibility study.
  • DWH solution conceptualization and platform selection.
  • DWH system analysis and architecture design.
  • DWH solution implementation strategy.
  • Optimal DWH implementation sourcing model.

DWH implementation outsourcing

  • DWH implementation feasibility study.
  • DWH solution conceptualization and platform selection.
  • DWH system analysis and architecture design.
  • DWH solution development.
  • DWH quality assurance and launch.
  • DWH support and evolution.

About ScienceSoft

ScienceSoft is a global IT consulting and software development company headquartered in McKinney, TX, US. Since 2005, we’ve been helping companies handle and benefit from data with a full range of DWH services, including DWH consulting, DWH implementation, DWH migration and support, and Data Warehouse as a Service (DWaaS).