Enterprise Data Warehouse
Architecture, Types, Features, Benefits
Since 2005, ScienceSoft provides businesses from 30+ industries with scalable EDW solutions for efficient data consolidation and enterprise-wide analytics.
An enterprise data warehouse (EDW) is a data management solution that centralizes company-wide data in a highly structured format ready for analytics querying and reporting. Its implementation costs can range from $70,000 for the most basic solution to $400,000 to cover the needs of a midsize enterprise. The costs of an EDWH for a large enterprise comprising multiple data sources and advanced analytics capabilities will be higher. An enterprise data warehouse integrates with a data lake, ML and BI software.
To enable answering both enterprise-level and department-specific questions, the enterprise data warehouse ingests data from all corporate business-critical software and external data sources, including:
Data Warehouse vs Enterprise Data Warehouse
- Stores data for particular business units
- Answers department-specific questions
- Consolidates and stores data for all business units
- Answers enterprise-level and department-specific questions
An enterprise data warehouse is a core element of a BI solution, which is structured in accordance with the particular data management and analytics needs a company pursues. Here, you may see common architectural components of the solutions that ScienceSoft delivers:
- A data source layer - an enterprise data warehouse pulls data from apps’ databases, enterprise systems (e.g., CRM, ERP, document management software, HRM) and external sources (e.g., social media, government reports, chosen stock market trackers).
- A staging area - an intermediate storage area of temporary nature for data processing under the extract, transform and load (ETL) process. ETL consolidates data from multiple sources and transforms it into a modeled format suitable for storing in the enterprise DWH. Cloud-based enterprise data warehouses, due to their scalability, use ELT (extract, load, transform), which means that the transformation step is performed after data loading into an enterprise data warehouse.
- Data storage layer - centralized storage where data is made accessible for analytics (querying, reporting) and sharing.
- Analytics and BI - data in the enterprise data warehouse can be queried via OLAP tools, data mining tools, reporting and visualization tools.
Enterprise Data Warehouse Types
There are three deployment environment types for data warehousing solutions:
- On-premises – a company purchases all required hardware and software to build and deploy an enterprise data warehouse and maintains it further on.
- Cloud-hosted – a company deploys an enterprise data warehouse in the cloud, eliminating the need to purchase and maintain hardware and software.
- Hybrid – a company augments an on-premises enterprise data warehouse with a cloud-hosted repository.
- Full control over the enterprise data warehouse. In case of a failure, an in-house IT team has direct access to the DWH’s problem area for hardware and software tuning. Moreover, data security remains strictly under the in-house IT team’s control.
- Full compliance with the required data standards. Data security compliance is easier to achieve with on-premises enterprise DWHs.
- Availability. Business users from a facility where the enterprise data warehouse is located can effectively access all the data stored in the data warehouse without the dependence on the internet connection.
See the points of caution
- Full responsibility. Together with the control of the on-premises enterprise data warehouse, a company is fully responsible for its implementation and maintenance.
- Complexity of agile scaling. To comply with the increased storage or compute requirements, you need to purchase new hardware, which may result in the need to tune or replace current software.
- Scalability. The inherent agility of cloud data warehouses allows upscaling and downscaling with no impact on the enterprise data warehouse performance.
- Reduced costs. There are no hardware-related costs (hardware acquisition, deployment, maintenance, administration, etc.). And if you opt for Enterprise Data Warehouse as a Service, all software acquisition and maintenance costs are eliminated either.
See the points of caution
- Data compliance. Although most cloud providers have security features difficult for attackers to penetrate, some industry standards and regulations (FDA, HIPAA, etc.) still require sensitive data to be stored on-premises.
- The risk of budget overruns. Unexpectedly increased query volumes, which require additional compute/storage resources, lead to overspending if no controlling or limiting the cloud resources is set up.
- Cloud flexibility. Meeting storage and compute requirements with the near-unlimited cloud resources.
- Data compliance. Ensuring sensitive data is stored within the environment which fully meets data compliance standards.
See the points of caution
- DWH costs. The company has to cover the maintenance costs and operating expenses of the on-premises DWH system while still paying the subscription fee for cloud DWH services.
For each project, ScienceSoft creates a tailored feature set that perfectly suits our customers’ business needs. From our experience, the following functionality is common for secure and efficient enterprise data warehouses.
Data integration and management
- Data integration with ETL/ELT.
- Full and incremental data extraction/load.
- Structured, semi-structured, unstructured data ingestion.
- Big data ingestion.
- Streaming data ingestion.
- Data loading and querying using SQL.
- Subject-oriented data repository.
- Time-variant (data from the historical point of view) data repository.
- Nonvolatile (read-only) data repository.
- Granular data storage.
- Metadata storage.
- Storage in multiple environments (cloud, on-premises, hybrid).
- Automated DWH maintenance tasks – backups, replication, patching, etc.
- Advanced data searching (materialized view support, data indexes, result-caching, etc.).
Security and compliance
- Data encryption.
- Securing data access with user authentication and authorization.
- Granular access control (row- and column-level).
- Compliance with national, regional, and industry-specific regulations (for example, GDPR, HIPAA, PCI DSS).
To build a future-proof EDW with all the necessary features, it’s crucial to create data models that not only reflect current business operations but can be easily modified in response to your business growth. Keeping data models’ flexibility in mind, ScienceSoft’s solution architects always work together with seasoned industry experts to envisage potential business changes and efficiently organize enterprise data access, collection, storage, analysis, security, etc.
To store massive volumes of heterogeneous data imported from multiple sources in its original format. In data lakes, the captured raw data remains secure and accessible for further analytics querying and reporting with EDW.
Self-service analytics software
To enable business users to make decisions based on timely and relevant reports, queries and analysis customized and conducted according to their own needs.
Machine learning software
To enable data scientists to build machine learning models with processed and cleaned data from the enterprise data warehouse to predict a company’s revenue, assess financial risks, forecast market trends and the company’s performance, etc.
Relying on 17-year experience in data warehousing, ScienceSoft builds secure and scalable EDW solutions that help our customers achieve their business goals.
ScienceSoft as a Reliable EDW Implementation Partner
When we first contacted ScienceSoft, we needed expert advice on the creation of the centralized analytical solution to achieve company-wide transparent analytics and reporting. After a series of interviews, ScienceSoft’s consultants analyzed our workloads, documentation, and the existing infrastructure and provided us with a clear project roadmap.
They stayed in daily contact with us, which allowed us to adjust the scope of works promptly and implement new requirements on the fly. Additionally, the team delivered demos every other week so that we could be sure that the system aligned with our business needs.
Heather Owen Nigl, Chief Financial Officer, Alta Resources
Strong cross-department collaboration and increased productivity of business users that have access to a single point of truth with highly structured data.
Saved time of IT staff and data analysts due to automated data management procedures (data collection, transformation, cleansing, structuring, modeling, etc.).
A 360-degree view of your business over the yearstime with comprehensive data consolidated from the key business apps.
Improved data quality (data consistency, accuracy, completeness, auditability, etc.) due to the holistic enterprise data management approach.
EDW Implementation Success Stories by ScienceSoft
End-to-end BI Solution Development for the Producer of Phytotherapy Products
- Design and implementation of a cloud EDW based on Azure SQL Server to integrate data from multiple enterprise apps (e.g., ERP, CRM, SCM, manufacturing execution system).
- Near real-time analytics with Power BI; 20+ different report templates for the sales, production, product quality management, and other departments.
Result: Efficient cCompany-wide reporting that brings transparency to the entire product life cycle and helps the company client to improve its its internal processes and operations.
Retail Analytics Solution for a Multinational FMCG Corporation
- A three-module BI system that enables advanced sales analytics on 100 SKUs being sold via marketing channel of 10 large retail chains and 10,500 stores.
- Analytics engine for querying sales data from an OLAP cube and calculating specific KPIs (e.g., sales growth in a particular store over a time span).
Result: Boosted sales analytics helping the company identify its sales trends, find out the most promising SKUs, estimate growth potential and optimize sales and marketing activities.
BI Solution for 200 Healthcare Centers
- Development of Microsoft SQL Server enterprise data warehouse that uses a Transact-SQL script for loading data from 200 databases.
- JReport tool to build multiple reports on medication classification, missed medications, hospital readmissions, employee performance reporting, community marketing activity summary.
Result: Prompt analytics and reporting allowing healthcare centers to boost streamline their management processes.
Financial Reporting with BI for an International Real Estate Developer
- ETL pipelines and a data warehouse aggregating data from around 40 sources.
- Analytical cube with 15 dimensions and 30 measures enabling to analyze enterprise’s income, net operating expenses, net profit, operating cash flow, and other indicators.
- Integrated financial and analytical reports and dashboards based on Power BI.
Result: Comprehensive financial analysis that helped the real estate developer improve its decision-making. better understand its business.
Development of an EDW and an Analytics Solution for a Multibusiness Corporation
- Over 90 reports for different business directions and user roles.
Result: A 360-degree customer view, optimized stock management, and easy assessment of the employees’ performance.
BI Platform for a Regulatory Authority
- BI solution for financial market analysis that can support 200+ concurrent business users over 500 reports at once.
- Elaborate user access control, permission matrix based on row and column level for advanced security of sensitive data.
Result: Easy monitoring and management of enterprise data for the company to conduct in-depth financial analysis, automate data flows, build custom analytics models.
Development Costs and Timelines
The major factors that influencinge the cost of the enterprise data warehouse implementation project are:
- The number and diversity of data sources, the complexity of data flows.
- The volume of the data to be ingested and stored by the solution.
- An EDW’s performance, availability, scalability requirements.
- The target data quality thresholds (consistency, accuracy, completeness, etc.) to be achieved within the data transformation processes.
- The data sensitivity and data security requirements.
Based on ScienceSoft’s experience in EDW software implementation, the approximate timeframes for the EDW implementation project are from 3 to 12 months and the cost of an enterprise data warehouse implementation project may vary as follows:
$70,000 – $200,000*
For companies with 200 – 500 employees.
$200,000 – $400,000*
For companies with 500 – 1,000 employees.
$400,000 – $1,000,000*
For companies with more than 1,000 employees.
*Monthly software license fee and other regular fees are NOT included.
Ballpark timelines for each stage of EDW implementation
A typical ScienceSoft's project on EDW software implementation covers the following stages and timelines:
- EDW goals elicitation: 3-20 days.
- EDW solution conceptualization and tech stack selection: 2-15 days.
- Business case and project roadmap creation: 2-15 days.
- System analysis and EDW architecture design: from 15 days.
- EDW solution development and stabilization: from 2 months.
- EDW solution launch: from 2 days.
- After-launch support, maintenance, and evolution: as requested.
The selected platforms are recognized leaders in enterprise data warehousing solutions (The Forrester Wave, Gartner Magic Quadrant), which are fully compliant with the key criteria for an enterprise-scale DWH: almost instant scalability of compute and storage resources (due to the cloud-based nature), high performance and availability (up to 99.99% uptime), advanced security, etc.
Azure Synapse Analytics
A scalable data warehousing solution with a node-based architecture, which employs parallel query processing to achieve fast query response time and high query throughput. Azure Synapse unifies the Azure Data Lake storage and the SQL data warehouse to allow direct querying of raw data and combining relational and non-relational data for deeper analytics insight.
Data storage - $23 per TB/month ($0.04/1 TB/hour).The data storage encompasses the size of your DWH and 7 days of incremental snapshot storage.
Learn Azure Synapse costs for your case with Pricing Calculator
Dynamic data masking, built-in authentication, authorization, data encryption, etc.
A scalable data warehousing service, which achieves great performance due to such features as massively parallel processing, columnar data storage, query optimizer, result caching, etc. With the Redshift Spectrum feature it is possible to query data directly from Amazon to enable data lake analytics.
The price is charged according to the amount of stored data and the number of nodes. The on-demand pricing option starts from $0.25/hour (hourly rate based on the type and number of nodes in the cluster).
Learn Amazon Redshift costs for your case with Pricing Calculator
End-to-end encryption, granular access controls, network isolation, etc.
A scalable data warehousing solution backed up with the Dremel technology designed to instantly run queries on massive structured datasets.
Storage costs: $0.02/GB/mo ($0.01/GB/month for long-term storage).
Streaming inserts: $0.01/200 MB.
For query performance, 2 subscription options are available:
- Pay-as-you-go ($5/TB, 1st TB/month is free).
- Flat-rate pricing (from $10,000/ month for a dedicated reservation of 500 processing units).
Data encryption, Google’s virtual private cloud policy controls, etc.
ScienceSoft is a global IT consulting and IT service company headquartered in McKinney, TX, US. Since 2005, we render data warehouse consulting services to support our clients’ agile and data-based decision-making. Being ISO 27001-certified, ScienceSoft guarantees cooperation with us does not pose any risks to our customers' data security.