Enterprise Data Warehouse: Overview
ScienceSoft has been providing a full range of data warehousing services since 2005.
- Stores data for particular business units
- Answers department-specific questions
- Consolidates and stores data for all business units
- Answers enterprise-level and department-specific questions
- Data source layer: data from internal and external data sources (ERP, CRM, accounting and financial software, IoT devices, social media, etc.).
- Staging area: a temporary, intermediate storage area where data is processed during extract, transform and load (ETL). ETL consolidates data from multiple sources and transforms it into a modeled format suitable for storing in the enterprise DWH. Cloud-based EDWs, due to their scalability, commonly use ELT (extract, load, transform) instead, which means that the transformation step is performed after the data is loaded into the enterprise data warehouse.
- Data storage layer: centralized storage where data is made accessible for analytics (querying, reporting) and sharing.
- Analytics and BI: analytics, data mining, data reporting and visualization tools.
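The difference between ETL and ELT described for the staging area can be sketched in a few lines. This is a minimal illustration only: it uses Python's built-in `sqlite3` as a stand-in for a warehouse, and the table names (`sales_etl`, `raw_sales`, `sales_elt`), the sample rows and the cleaning rules are all hypothetical.

```python
import sqlite3

# Hypothetical source rows, e.g. exported from a CRM (values are made up).
SOURCE_ROWS = [
    ("  Alice ", "2023-01-05", "120.50"),
    ("BOB", "2023-01-06", "80"),
]


def clean(row):
    """Transform step: trim and normalise the name, cast the amount to float."""
    name, day, amount = row
    return (name.strip().title(), day, float(amount))


def run_etl(conn):
    """ETL: transform in application code, then load only the modeled rows."""
    conn.execute("CREATE TABLE sales_etl (customer TEXT, day TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales_etl VALUES (?, ?, ?)", [clean(r) for r in SOURCE_ROWS]
    )


def run_elt(conn):
    """ELT: load the raw rows first, then transform inside the database with SQL."""
    conn.execute("CREATE TABLE raw_sales (customer TEXT, day TEXT, amount TEXT)")
    conn.executemany("INSERT INTO raw_sales VALUES (?, ?, ?)", SOURCE_ROWS)
    conn.execute(
        """
        CREATE TABLE sales_elt AS
        SELECT upper(substr(trim(customer), 1, 1))
                   || lower(substr(trim(customer), 2)) AS customer,
               day,
               CAST(amount AS REAL) AS amount
        FROM raw_sales
        """
    )


def load_both():
    """Run both pipelines against one in-memory database and return their results."""
    conn = sqlite3.connect(":memory:")
    run_etl(conn)
    run_elt(conn)
    etl = conn.execute("SELECT * FROM sales_etl ORDER BY day").fetchall()
    elt = conn.execute("SELECT * FROM sales_elt ORDER BY day").fetchall()
    conn.close()
    return etl, elt
```

Both paths end with the same modeled rows. The practical difference is where the transformation runs and whether the raw data (`raw_sales` here) stays available in the warehouse afterwards.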
Figure: architecture of an enterprise data warehouse with a staging area.
- Enterprise-wide data integration from internal and external data sources.
- Controlled data loading and data management procedures.
- Storing of historical and non-volatile data.
- Subject-oriented data repository.
- Integration with analytics and reporting software.
- Big data integration.
- Real-time data warehousing (integration of sensor data, log file data, social media data, etc.).
- Storing raw data.
- Storage in multiple environments (cloud, on-premises, hybrid).
- Instant scalability.
- Automation of EDW maintenance tasks – backups, replication, patching, etc.
- Granular access control.
- Federated queries.
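As a rough local analogy for the federated queries item in the list above, the sketch below joins tables that live in two separate SQLite databases within a single SQL statement. Real EDW platforms expose their own mechanisms for querying external systems; the `crm` database, the table names and the sample data here are hypothetical and only illustrate the idea.

```python
import sqlite3


def federated_revenue_by_customer():
    # The main in-memory database stands in for the warehouse itself.
    conn = sqlite3.connect(":memory:")

    # A second, separate database stands in for an external system (a hypothetical CRM).
    # It is attached first because SQLite does not allow ATTACH inside an open transaction.
    conn.execute("ATTACH DATABASE ':memory:' AS crm")

    conn.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
    conn.execute("CREATE TABLE crm.customers (id INTEGER, name TEXT)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)", [(1, 100.0), (1, 50.0), (2, 70.0)]
    )
    conn.executemany(
        "INSERT INTO crm.customers VALUES (?, ?)", [(1, "Acme"), (2, "Globex")]
    )

    # One SQL statement joins data stored in two different databases.
    rows = conn.execute(
        """
        SELECT c.name, SUM(o.amount) AS revenue
        FROM orders AS o
        JOIN crm.customers AS c ON c.id = o.customer_id
        GROUP BY c.name
        ORDER BY c.name
        """
    ).fetchall()
    conn.close()
    return rows
```

The point of the analogy is that the query author writes one statement and does not copy the external data into the warehouse first.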
To achieve maximum effectiveness, an EDW should integrate data from the company’s business-critical systems and external data sources.
To see the value a company gets from integrating comprehensive data from various sources, have a look at one of ScienceSoft’s projects. We helped a producer of phytotherapy products consolidate their disparate data sources into unified central storage to enable company-wide reporting, performance benchmarking, etc.
A clear link between the EDW solution and business objectives, with economic justification of EDW capabilities in terms of their business value.
Architecture flexibility that allows the EDW to evolve without compromising its performance.
Automation of EDW maintenance and administration tasks (ETL monitoring, managing data quality and data security, etc.) to decrease operational costs.
EDW stability and availability, so that business-critical data can be accessed quickly in a centralized location for analysis and reporting, which reduces time-to-insight and helps spread data literacy across the enterprise.
High EDW security and data protection standards.
Out-of-the-box integrations with data sources and SDKs in the most common programming languages to reduce development costs.
The solutions we selected are recognized as leaders in enterprise data warehousing solutions (Forrester Wave, Gartner Magic Quadrant) and are in full compliance with the key criteria for an enterprise-scale DWH: almost instant scalability of compute and storage resources (due to the cloud-based nature), high performance and availability (up to 99.99% uptime), advanced security, etc.
A scalable data warehousing solution with a node-based architecture, which employs parallel query processing to achieve fast query response time and high query throughput. Azure Synapse unifies the Azure Data Lake storage and the SQL data warehouse to allow direct querying of raw data and combining relational and non-relational data for deeper analytics insight.
Dynamic data masking, built-in authentication, authorization, data encryption, etc.
- Data storage – $122.88 per TB/month ($0.17/TB/hour). The data storage size includes your DWH data and 7 days of incremental snapshot storage.
- Query performance pricing depends on the service level and region.
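The storage figures above can be turned into a quick estimate. The sketch below uses only the quoted price of $122.88 per TB/month. The 30-day billing month and the function names are assumptions made for illustration, and query performance costs are not included.

```python
PRICE_PER_TB_MONTH = 122.88   # USD, storage price quoted above
HOURS_PER_MONTH = 30 * 24     # assumption: a 30-day billing month


def monthly_storage_cost(stored_tb: float) -> float:
    """Estimated monthly storage cost in USD for the given volume in TB."""
    return round(stored_tb * PRICE_PER_TB_MONTH, 2)


def hourly_rate_per_tb() -> float:
    """Monthly price converted to an hourly rate per TB, rounded to cents."""
    return round(PRICE_PER_TB_MONTH / HOURS_PER_MONTH, 2)
```

Under these assumptions, `hourly_rate_per_tb()` reproduces the quoted $0.17/TB/hour after rounding, and `monthly_storage_cost(5)` estimates the storage bill for 5 TB of DWH data plus snapshots.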
A scalable data warehousing service that achieves high performance through features such as massively parallel processing, columnar data storage, a query optimizer and result caching. The Redshift Spectrum feature makes it possible to query data directly in Amazon S3 to enable data lake analytics.
End-to-end encryption, granular access controls, network isolation, etc.
The price is charged according to the amount of stored data and the number of nodes. The on-demand pricing option starts from $0.25/hour (hourly rate based on the type and number of nodes in the cluster).
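To illustrate why the columnar data storage mentioned in the description above helps analytical queries, here is a simplified in-memory sketch. It is not how Redshift is implemented; the sample records and function names are hypothetical and only contrast the two layouts.

```python
# Hypothetical row-oriented records, as a transactional system would store them.
ROWS = [
    {"order_id": 1, "region": "EU", "amount": 120.0},
    {"order_id": 2, "region": "US", "amount": 80.0},
    {"order_id": 3, "region": "EU", "amount": 50.0},
]


def to_columnar(rows):
    """Pivot row-oriented records into one list of values per column."""
    columns = {key: [] for key in rows[0]}
    for row in rows:
        for key, value in row.items():
            columns[key].append(value)
    return columns


def total_amount_row_store(rows):
    # Row layout: every record is visited as a whole to pick out one field.
    return sum(row["amount"] for row in rows)


def total_amount_column_store(columns):
    # Columnar layout: the aggregate reads only the single column it needs.
    return sum(columns["amount"])
```

Both functions return the same total. In a real column store the benefit comes from reading and decompressing far less data from disk when a query touches only a few columns of a wide table.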
A scalable data warehousing solution built on the Dremel technology, which is designed to run fast queries on massive structured datasets.
Data encryption, Google’s virtual private cloud policy controls, etc.
Storage costs: $0.02/GB/month ($0.01/GB/month for long-term storage).
Streaming inserts: $0.01/200 MB.
For query performance, two pricing options are available:
- Pay-as-you-go ($5/TB, 1st TB/month is free).
- Flat-rate pricing (from $10,000/month for a dedicated reservation of 500 processing units).
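The two query pricing options above can be compared with simple arithmetic. The sketch below uses only the prices quoted in this section ($5/TB with the first TB per month free, and a flat rate from $10,000/month). The function names are hypothetical, and the flat rate is taken at its entry-level price.

```python
ON_DEMAND_PER_TB = 5.0            # USD per TB scanned, as quoted above
FREE_TB_PER_MONTH = 1.0           # first TB per month is free
FLAT_RATE_PER_MONTH = 10_000.0    # USD, entry-level flat-rate price quoted above


def on_demand_cost(tb_scanned: float) -> float:
    """Monthly pay-as-you-go cost in USD for the given query volume in TB."""
    billable_tb = max(0.0, tb_scanned - FREE_TB_PER_MONTH)
    return billable_tb * ON_DEMAND_PER_TB


def break_even_tb() -> float:
    """Monthly query volume in TB at which both options cost the same."""
    return FLAT_RATE_PER_MONTH / ON_DEMAND_PER_TB + FREE_TB_PER_MONTH


def cheaper_option(tb_scanned: float) -> str:
    """Name of the cheaper pricing option for the given monthly volume."""
    if on_demand_cost(tb_scanned) <= FLAT_RATE_PER_MONTH:
        return "pay-as-you-go"
    return "flat-rate"
```

With these figures the break-even point is about 2,001 TB scanned per month. Below that volume, pay-as-you-go is cheaper on query costs alone, although a flat rate may still be preferred for predictable budgeting.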
With 15+ years of hands-on experience in delivering DWH solutions and partnerships with global technology leaders (including Microsoft, Amazon and Oracle), we know how to deliver tailored EDW solutions that help our clients meet their tactical and strategic business objectives.
EDW consulting and implementation
To help you establish an EDW solution, we cover:
- Business needs analysis and requirements elicitation.
- EDW implementation strategy design.
- EDW configuration and development.
- EDW integration.
- Data management procedures.
- User training.
- EDW support and administration (if required).
EDW as a Service
To spare you EDW development, implementation and management, we customize an enterprise data warehouse and provide it to you on a subscription basis.