How to Set Up Data Consolidation: Plan, Skills, Software, Sourcing Models and Costs
Since 1989, ScienceSoft has been providing data analytics services to help companies design and implement cost-effective data consolidation solutions.
Data consolidation: summary
Data consolidation is aimed at combining data from multiple sources to provide a unified view on data and enable company-wide analytics.
Key project steps: Data consolidation requirements determination, data consolidation solution conceptualization, architecture design, development, and launch.
Team: A project manager, a business analyst, a data analyst, a solution architect, a data engineer, QA and DevOps engineers.
Step 1. Business goals determination
Duration: 3-10 days (including 1-2 Q&A sessions with stakeholders), depends on the number of business units involved.
- Gathering business objectives that the data consolidation solution needs to meet, for example consolidate data in a central storage for analytics and reporting, enhance data quality and security, set up master data management, etc. These goals are further prioritized and classified into two groups – core and optional.
- Outlining a high-level project management scenario, including deliverables, skills required, time and budget, potential risks, etc.
Step 2. Discovery
Duration: 10-20 days, depends on the number and types of data source systems, etc.
Outlining business requirements and determining the high-level scope of a data consolidation solution by:
- Reviewing data sources to be integrated.
- Mapping out the current data flow.
- Discovering and describing connectivity between systems.
- Discovering and describing established data security practices.
- Conducting preliminary analysis of data in the source systems (defining data type and structure, volume, etc.) and data quality across them.
Step 3. Conceptualization and data consolidation tech selection
Duration: 10-20 days, depends on the number and types of data source systems, integration points, and complexity of the solution.
- Defining the optimal approach (ETL vs ELT) for replicating data from source systems to the final destination.
Tip: Use the ETL approach for consolidating smaller datasets of structured data, and the ELT approach to consolidate high data volumes of various structure.
- Forming solution architecture vision and selecting the preliminary tech stack, taking into account:
- Source systems (how many systems, what data types they contain, how often they are updated, etc.).
- Data volume to be consolidated.
- Data flows to be implemented.
- Data security requirements, etc.
Tip: Not to overcomplicate the architecture, set up close cooperation of business users with a business analyst and a solution architect for the accurate definition of the solution’s requirements.
Step 4. Project planning
Duration: 5-10 days, depends on the results of previous steps.
- Defining data consolidation project scope and timeline, estimating efforts for the data consolidation project, TCO and ROI.
- Outlining project risks and developing a risk management plan.
- Drawing up data consolidation project documentation (scope statement, deployment strategy, testing strategy, project implementation roadmap, etc.).
Step 5. Data consolidation architecture design
Duration: 10-40 days.
- Detailed data profiling and master data management:
- Tagging data with keywords, descriptions, or categories.
- Assessing data quality against the metrics set by business users.
- Discovering metadata and assessing its accuracy.
- Assessing risks of data consolidation, etc.
Note 1: If the source data is locked within custom-built legacy systems or is entered manually, data profiling will require much more effort.
- Designing each data layer, its internal architecture (landing, staging, processing, storing, analytical) and ETL/ELT processes for data consolidation and data flow control.
- Designing a data quality management framework (data cleansing, enrichment, deduplication, etc.).
- Designing a data security framework (data access policies and policies for data access monitoring, data encryption policies, etc.).
Note 2: Together with data consolidation architecture design, data stores (data warehouses, data marts, etc.) are designed and data objects from source systems are mapped to destination stores.
Note 3: During the data consolidation architecture design step, the final tech stack for each component of the data consolidation solution is defined.
Step 6. Development and stabilization
Duration: 10-80+ days, depends on target solution and its complexity.
- Data consolidation architecture development and testing.
- Developing ETL/ELT pipelines and ETL/ELT testing.
- Implementing data quality management, data quality validation.
- Implementing data security policies.
- Developing data models and structures.
- Data consolidation solution testing and stabilization.
Step 7. Launch and after-launch support
Duration: 5-15 – for launch, 10-60 days – for after-launch support.
- Deploying the data consolidation solution.
- ETL/ELT performance tuning.
- Adjusting data consolidation solution performance and availability, etc.
- After-launch support of the solution and end users.
- Determines data consolidation project scope, prepares budget estimations, develops a project schedule.
- Monitors project performance and costs, makes ongoing adjustments.
- Manages work and communication with vendors and suppliers.
- Communicates project updates (adjustments, progress, backlog) to project stakeholders.
- Analyzes business and user needs and expectations (enhanced data consistency and accuracy, streamlined data access, etc.).
- Analyzes data flows, their purposes, complexity and dependencies.
- Defines the requirements for a data consolidation solution.
- Coordinates the creation of project documentation (data consolidation solution scope, its components, etc.)
- Identifies data sources for data consolidation.
- Profiles data (assesses data quality, tags data, discovers master and metadata, etc.).
- Defines a requirements specification for creating data models, designing ETL/ELT processes.
- Develops and maintains a data pipeline to route source data to the destination data store.
- Builds the ETL/ELT process.
- Develops data models and their structures.
- Audits the quality of data loaded into the destination data store.
- General data consolidation solution architecture elaboration, description, and justification
- Selects and justifies the data consolidation tech stack.
- Orchestrates and controls the tech team.
Quality assurance engineer
- Designs a test strategy.
- Creates tests to evaluate the developed data consolidation solution.
- Analyzes bugs and errors found during the quality assurance activities, documents test results.
- Provides recommendations on software improvements.
- Sets up the data consolidation software development infrastructure.
- Introduces continuous integration/continuous deployment (CI/CD) pipelines to streamline data consolidation.
In-house data consolidation solution development
- Maximum control over the data consolidation project
- Possible lack of expertise/resources leading to project delays, budget overruns, etc.
- Full responsibility for the hiring and managerial efforts are on the customer’s side.
Technical resources are partially/fully outsourced
- Maintaining substantial control over the data consolidation project (technical aspects included).
- Quick project ramp-up due to high resource availability and scalability.
- Cost-effectiveness of the data consolidation project due to minimized risks of resource overprovisioning.
- Challenges in the coordination of in-house and outsourced resources.
In-house team with outsourced consultancy
- Deep understanding of the existing data flows and data to be consolidated within the internal team.
- Outsourced consultancy provides expert guidance over the data consolidation planning and execution, as well as fills in the gaps in specific tech skills.
- Risks related to consultancy vendor selection.
Requires time and expertise for establishing smooth cooperation between the in-house team and the outsourced consultancy.
Full outsourcing of the data consolidation development project
- A vendor has full responsibility for the data consolidation project and all related risks.
- Implementation of data consolidation best practices.
- High vendor risks.
Microsoft SQL Server Integration Services (SSIS)
Best for: enterprise-level on-premises data integration and transformation
- A broad range of data management capabilities (data mining, cleansing, master data management, etc.)
- Using connection managers to connect to a variety of data sources, both relational and non-relational.
- ETL/ELT support.
- Intuitive UI and the drag and drop feature for creating data integration workflows code-free.
- Alerting in case of a job failure.
Comprehensive documentation and digital training resources.
SQL Server Integration Services (SSIS) is available under the SQL Server licensing.
- Express and Developer editions – free
- Standard – $3,586/core
- Enterprise - $13,748/core
- SSIS integration runtime on Microsoft Azure - $0.84 - $35.372/hour
Talend Open Studio
Best for: consolidating data from a wide variety of sources
Gartner named Talend a Leader in the 2020 Magic Quadrant for Data Integration Tools.
- 1,000+ pre-built connectors, including:
- RDBMS connectors: Oracle, Teradata, Microsoft SQL server.
- SaaS connectors: Marketo, Salesforce, NetSuite.
- Packaged apps: SAP, Microsoft Dynamics 365, Sugar CRM.
- Support for ETL/ELT.
- Support for custom code writing for enhanced flexibility.
- Exporting and executing standalone jobs in runtime environment.
- Additional features are available through other open-source Talend extensions, including Data Streams Free Edition, Open Studio for Big Data, Open Studio for Data Quality, Open Studio for ESB and Open Studio for MDM.
Free version is available.
Pricing for commercial versions: upon request to a vendor.
Oracle Data Integrator
Best for: for bulk/batch data consolidation scenarios
Gartner named Oracle a Leader in the 2020 Magic Quadrant for Data Integration Tools.
- Based on the extract, load and transform architecture.
- Fully integrated with Oracle Database, Oracle GoldenGate, Oracle Cloud, Oracle Autonomous Data Warehouse, Oracle Fusion Middleware, Oracle Big Data Appliance, etc.
- Native big data support for Apache Spark, Apache Pig, etc.
Declarative flow-based user interface for enhanced user experience and productivity.
- Named User Plus - $900/named user plus, $198 for a software update license and support.
- Processor License - $30,000/processor, $6,600 for a software update license and support.
Estimating the cost of the data consolidation project is only possible after a careful analysis of the project scope. Below, you may find some ballpark estimates for consolidating data into the data warehouse for further analytics (software license fees are NOT included):
Data consolidation cost factors:
Consider professional services for data consolidation
Since 1989, ScienceSoft has been providing a wide range of data management services to help businesses consolidate disparate data under one roof in the most efficient and cost-effective way.
Get a consultation on data consolidation
- Data consolidation requirements engineering.
- Business case creation, recommendations on data consolidation cost optimization.
- Data consolidation conceptualization and software selection.
- Data consolidation framework design:
- Data source analysis
- Data quality framework design
- Data security framework design
- Designing each data layer
- ETL/ELT design, etc.
- Data consolidation requirements engineering.
- Business case creation, recommendations on data consolidation price optimization.
- Data consolidation conceptualization and software selection.
- Data consolidation framework design.
- Data consolidation development and stabilization.
- Data consolidation solution launch and after-launch support.
ScienceSoft is an IT consulting and software development company headquartered in McKinney, Texas. We provide consulting assistance and implement data consolidation solutions to help companies maximize data value and enhance the decision-making process. Being ISO 9001 and ISO 27001 certified, we provide data management services relying on a mature quality management system and guarantee cooperation with us does not pose any risks to our customers’ data security.