Healthcare Data Warehousing on Amazon Web Services (AWS)

ScienceSoft has been providing a full range of data warehousing and healthcare IT services since 2005.

Healthcare Data Warehouse: the Essence

A healthcare data warehouse (DWH) collects healthcare data from disparate sources into a centralized repository, then processes and structures it for analytical querying and reporting.

A healthcare DWH ingests large volumes of data from a wide variety of data sources, including:

  • Electronic health record system.
  • Sensors and wearables for patients.
  • Medical equipment telemetry.
  • Patient applications.
  • Enterprise resource planning system.
  • Human resource management system.
  • Supply chain management system.
  • Medical billing and claims management system.

Why Build a Healthcare DWH?

By integrating disparate data sources into a DWH, a healthcare organization:

  • Achieves high healthcare data accuracy and consistency.
  • Improves healthcare data trustworthiness and timeliness.
  • Reduces the complexity of the healthcare data environment while meeting security and compliance needs.

The healthcare DWH becomes a solid foundation for further analytics initiatives targeted to achieve the following goals:

  • Personalized care delivery.
  • Improved health outcomes and reduced medical errors.
  • Increased healthcare operational efficiency.
  • Decreased healthcare operating costs.
  • Optimized healthcare resource management.
  • Improved risk management and reduced fraudulent activity.

Healthcare DWH Features ScienceSoft Provides You With

ScienceSoft delivers functionality for AWS medical data warehouse solutions based on the customer's needs. Below, we’ve described the features often requested by healthcare organizations:

Data management and integration

  • Integration of all healthcare data types (structured, semi-structured, unstructured).
  • Big data ingestion.
  • Streaming data ingestion.
  • Integration of healthcare data with ETL/ELT processes.
  • Full and incremental healthcare data extraction/load.
  • Healthcare data transformation of varying complexity (data type conversion, summarization, etc.).
  • SQL-based healthcare data querying.
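
The difference between full and incremental extraction listed above can be illustrated with a minimal sketch. It assumes that each source record carries an `updated_at` timestamp and that the pipeline stores a watermark from its previous run; the record layout and function names are hypothetical and do not belong to any AWS API.

```python
from datetime import datetime, timezone

# Hypothetical source rows; a real pipeline would read them from an EHR or billing database.
SOURCE_ROWS = [
    {"patient_id": 1, "updated_at": datetime(2024, 1, 5, tzinfo=timezone.utc)},
    {"patient_id": 2, "updated_at": datetime(2024, 2, 10, tzinfo=timezone.utc)},
    {"patient_id": 3, "updated_at": datetime(2024, 3, 15, tzinfo=timezone.utc)},
]


def full_extract(rows):
    """Full extraction: return every row regardless of when it changed."""
    return list(rows)


def incremental_extract(rows, watermark):
    """Incremental extraction: return the rows changed after the watermark,
    plus the new watermark to store for the next run."""
    changed = [r for r in rows if r["updated_at"] > watermark]
    new_watermark = max((r["updated_at"] for r in changed), default=watermark)
    return changed, new_watermark


last_run = datetime(2024, 2, 1, tzinfo=timezone.utc)
changed_rows, next_watermark = incremental_extract(SOURCE_ROWS, last_run)
print([r["patient_id"] for r in changed_rows])  # [2, 3]
```

A full load rereads all three rows on every run, while the incremental load moves only the two rows changed since the stored watermark, which is why it is usually preferred for large, frequently updated sources.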

Data storage

  • Integrated, summarized, subject-oriented healthcare data storage.
  • Storing current and historical healthcare data.
  • Metadata storage.
  • Storing data oriented to a specific administrative (HR, accounting, etc.) or medical department (radiology, intensive care, pediatrics, etc.) in data marts.
  • Protected health information (PHI) storage.

Database performance

  • Elastic scaling of storage and compute resources.
  • High performance query processing.
  • Materialized view support.
  • Result caching.
  • Dynamic management of performance and concurrency.

Database reliability

  • Automated infrastructure provisioning.
  • Automatic backup of healthcare data.
  • Fault tolerance.

Security and compliance

  • Healthcare data encryption at rest and in transit.
  • Granular row- and column-level security control.
  • Dynamic healthcare data masking.
  • External healthcare data tokenization.
  • Network isolation.
  • Compliance with healthcare regulations and requirements (HIPAA, HITECH, FDA requirements, etc.).
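
To make the difference between dynamic masking and tokenization more concrete, here is a simplified sketch. It is a local stand-in, not the mechanism of any particular AWS service or external tokenization provider: the role policy, column names, and key below are all hypothetical, and a real key would come from a secrets manager.

```python
import hashlib
import hmac

# Assumption: demo key only; in practice the key is never kept in source code.
TOKEN_KEY = b"demo-key-not-for-production"

# Hypothetical policy: the columns each role may read unmasked.
UNMASKED_COLUMNS = {
    "clinician": {"patient_name", "ssn", "diagnosis"},
    "analyst": {"diagnosis"},
}


def tokenize(value):
    """Replace a value with a deterministic keyed token (HMAC-SHA256).
    Equal inputs give equal tokens, so tokenized columns can still be joined."""
    return hmac.new(TOKEN_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def mask(value, visible=4):
    """Hide everything except the last `visible` characters."""
    keep = min(visible, len(value))
    return "*" * (len(value) - keep) + value[len(value) - keep:]


def read_row(row, role):
    """Dynamic masking: applied at read time by role; the stored row is unchanged."""
    allowed = UNMASKED_COLUMNS.get(role, set())
    return {col: (val if col in allowed else mask(val)) for col, val in row.items()}


row = {"patient_name": "Jane Doe", "ssn": "123-45-6789", "diagnosis": "J45.909"}
print(read_row(row, "analyst"))
print(tokenize(row["ssn"]))
```

Masking changes only what a given user sees at query time, whereas tokenization replaces the sensitive value itself before it is stored or shared.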

Common Integrations for a Healthcare DWH

To increase the value of the AWS healthcare data warehouse, ScienceSoft recommends implementing the following integrations:

A data lake – Amazon Simple Storage Service (Amazon S3)

Amazon S3 can serve as economical long-term storage for structured (e.g., claims data, EHRs) and unstructured (radiology images, audio/video recordings, streaming healthcare data from wearables and devices, etc.) data at any scale before the data is queried by the DWH.

Self-service BI software – Amazon QuickSight

A self-service, ML-powered BI service that helps quickly deliver business and medical insights to decision-makers in the form of immersive reports and interactive dashboards.

Machine learning (ML) software – Amazon SageMaker

Amazon SageMaker is used to train ML models on structured healthcare data from the data warehouse and enable advanced analytics to predict clinical outcomes, deliver proactive patient care, improve healthcare scheduling, monitor the supply chain, make grounded decisions about hospital spending, etc.

How ScienceSoft Ensures Medical DWH Success

To implement an efficient medical DWH solution on AWS, ScienceSoft's experts recommend that you focus on the following aspects:

Proof of Concept

We help you validate your healthcare DWH solution with a tailored PoC to better understand its potential and get real-life user feedback.

Healthcare DWH scalability and flexibility

With a custom healthcare DWH solution design, we ensure your ability to instantly upload any type (structured, semi-structured, unstructured) and amount of healthcare-related data to efficiently address changing data analytics objectives.

Security and healthcare data protection measures

To let you store and process healthcare data in a secure, HIPAA-compliant environment and safeguard sensitive patient data, we select and customize technologies that enable continuous data encryption and dynamic data masking, design data access policies, set up multi-factor authentication, etc.

AWS Services* to Build a Fully Fledged Healthcare DWH Solution

Leveraging AWS technologies, ScienceSoft helps healthcare organizations build fast, highly scalable, and cost-effective DWH solutions that ingest all types (structured, semi-structured, unstructured) of healthcare data, then instantly process and store it in a highly secure, HIPAA-compliant environment.

Amazon Redshift


Data warehousing service.


  • Integration of all healthcare data types (structured, semi-structured, unstructured).
  • SQL-querying over structured and semi-structured healthcare data in the DWH, over a data lake (S3) and operational databases.
  • Instant scaling of storage and compute resources with the possibility of separate billing.
  • Federated queries support.
  • Optimized healthcare data storage (columnar storage, data compression, etc.) for high-performance queries.
  • Materialized views support and result caching.
  • Database automation (automated provisioning, backups, table design, etc.).
  • ML capabilities for optimized performance under varying workloads.
  • Native integrations with the AWS ecosystem (including S3, AWS Glue, Amazon EMR, Amazon Kinesis Data Firehose, Amazon QuickSight, etc.).
  • Healthcare data encryption (at rest and in transit) and fine-grained access control.
  • HIPAA-eligible.


  • On-demand pricing: from $0.25 to $13.04/hour.
  • Reserved instance pricing offers savings of up to 57% over the on-demand option (with a 3-year term).
  • Data storage (RA3 node types): $0.024/GB/month.
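
As a rough illustration of how these figures combine, the sketch below estimates a monthly bill. The 57% discount and the $0.024/GB/month storage rate come from the list above; the cluster size, the $1.00 hourly rate (a made-up value inside the listed range), the stored volume, and the 730-hour month are assumptions for the example only.

```python
HOURS_PER_MONTH = 730  # assumption: average number of hours in a month


def monthly_compute_cost(hourly_rate, nodes, reserved_discount=0.0):
    """Compute cost per month; reserved_discount is a fraction such as 0.57."""
    return hourly_rate * nodes * HOURS_PER_MONTH * (1.0 - reserved_discount)


def monthly_storage_cost(stored_gb, rate_per_gb=0.024):
    """Storage cost per month at the RA3 rate quoted above."""
    return stored_gb * rate_per_gb


# Hypothetical cluster: 2 nodes at a made-up $1.00/hour rate, 5,000 GB stored.
on_demand = monthly_compute_cost(1.00, nodes=2)
reserved = monthly_compute_cost(1.00, nodes=2, reserved_discount=0.57)
storage = monthly_storage_cost(5_000)
print(f"on-demand: ${on_demand:.2f}, reserved: ${reserved:.2f}, storage: ${storage:.2f}")
```

The 57% figure is an upper bound, so the reserved estimate should be read as a best case rather than a quote.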

Amazon Relational Database Service (RDS)


Relational database service.


  • Support for Amazon Aurora, MySQL, Microsoft SQL Server, PostgreSQL, Oracle Database, and MariaDB.
  • Automatic configuration setup (provisioning, patching, backup, recovery, etc.) for the selected database and recommendations on database engine version, storage, etc.
  • Fast and consistent database performance backed by General Purpose SSD and Provisioned IOPS SSD storage.
  • High database availability with the Multi-AZ capability (keeping a redundant copy of healthcare data in a separate location).
  • Easy scaling of compute and storage resources with near-zero downtime.
  • Point-in-time recovery capability.
  • Healthcare data encryption in transit and at rest with keys managed by default.
  • Fine-grained access control.
  • HIPAA-eligible.


  • On-demand pricing: from $0.016 to $195.2062/hour.
  • Amazon RDS reserved instances offer savings of up to 66% over the on-demand option (when used in a steady state).
  • Data storage: from $0.10 to $0.375/GB-month.

AWS Glue


Data integration service.


  • Centralized repository (Data Catalog) for maintaining metadata for all healthcare data assets located on AWS.
  • Automatic data schema discovery for managing the ETL jobs.
  • Executing ETL jobs on demand, on schedule or based on event-driven triggers with the job scheduler.
  • Automatic code generation (in Scala or Python) to extract, transform, and load healthcare data for high-performance ETL jobs.
  • Development endpoints to develop and test ETL scripts.
  • ETL operations on streaming data with the streaming ETL jobs capability.
  • Combining and replicating healthcare data across multiple data stores (Amazon Redshift, Amazon S3, Amazon DynamoDB, etc.) using SQL.
  • ML-driven healthcare data processing with the FindMatches capability (for example, for finding the matching records within a single database and across different ones).
  • HIPAA-eligible.
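
The record-matching task that FindMatches addresses can be pictured with the toy sketch below. It is not the Glue API and does not use ML: it swaps in plain string similarity from Python's standard library as a stand-in, and the records, field names, and 0.85 threshold are hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical patient records from two source systems.
EHR = [
    {"id": "E1", "name": "John  Smith ", "dob": "1980-01-02"},
    {"id": "E2", "name": "Jane Doe", "dob": "1975-05-05"},
]
BILLING = [
    {"id": "B7", "name": "jon smith", "dob": "1980-01-02"},
    {"id": "B9", "name": "Jane Doe", "dob": "1990-09-09"},
]


def normalize(name):
    """Lowercase and collapse whitespace so formatting differences are ignored."""
    return " ".join(name.lower().split())


def is_match(a, b, threshold=0.85):
    """Treat two records as the same patient when the dates of birth are equal
    and the normalized names are similar enough."""
    if a["dob"] != b["dob"]:
        return False
    score = SequenceMatcher(None, normalize(a["name"]), normalize(b["name"])).ratio()
    return score >= threshold


matches = [(a["id"], b["id"]) for a in EHR for b in BILLING if is_match(a, b)]
print(matches)
```

Here the misspelled "jon smith" is linked to "John Smith" because the birth dates agree, while the two "Jane Doe" records stay separate because their birth dates differ.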


  • ETL jobs and development endpoints (an hourly rate based on the number of data processing units used to run an ETL job): $0.44 per data processing unit (4 vCPU and 16 GB of memory) per hour, billed per second.
  • Data Catalog storage: the first million objects stored – free, $1.00/100,000 objects stored above 1 million/month.
  • Data Catalog requests: the first million requests – free, $1.00/million requests above 1 million/month.
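
The Glue rates above translate into a simple calculation, sketched here under stated assumptions: the job size and runtime are made-up inputs, any minimum billing duration or rounding rule is ignored, and usage above the free tier is assumed to be prorated rather than charged in whole blocks.

```python
DPU_HOUR_RATE = 0.44        # $ per data processing unit per hour (from the list above)
FREE_CATALOG_OBJECTS = 1_000_000
FREE_CATALOG_REQUESTS = 1_000_000


def etl_job_cost(dpus, runtime_seconds):
    """Per-second billing: DPUs x hours x hourly rate."""
    return dpus * (runtime_seconds / 3600) * DPU_HOUR_RATE


def catalog_storage_cost(objects_stored):
    """$1.00 per 100,000 objects above the first million (monthly)."""
    billable = max(objects_stored - FREE_CATALOG_OBJECTS, 0)
    return billable / 100_000 * 1.00


def catalog_request_cost(requests):
    """$1.00 per million requests above the first million (monthly)."""
    billable = max(requests - FREE_CATALOG_REQUESTS, 0)
    return billable / 1_000_000 * 1.00


# Hypothetical nightly job: 10 DPUs running for 15 minutes.
print(f"${etl_job_cost(10, 15 * 60):.2f} per run")
```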

Amazon Simple Storage Service (Amazon S3)


Object storage service.


  • Storing objects in S3 buckets (objects can be tagged with metadata for efficient storage management).
  • A variety of S3 storage classes for different healthcare data access patterns.
  • S3 Storage Lens solution for monitoring and analyzing the stored objects.
  • Analytical querying of healthcare data without data movement.
  • Integration with the Amazon Athena and Amazon Redshift Spectrum analytics services for SQL querying.
  • Replicating healthcare data with the S3 Replication feature for reduced latency, compliance and security needs, etc.
  • Integration with Amazon Macie to pinpoint and protect sensitive healthcare data.
  • 99.999999999% healthcare data durability.
  • Granting/restricting access to the S3 resources with such features as AWS Identity and Access Management, Access Control Lists, S3 Access Points, etc.
  • Restricting public access to all the objects in the bucket with the S3 Block Public Access capability.
  • Multi-Factor Authentication (MFA) Delete and S3 Versioning features to prevent accidental healthcare data deletions and enable restoration.
  • Maintains HIPAA/HITECH compliance program.
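
As an illustration of the access-protection features above, the sketch below builds the request parameters that the boto3 S3 client accepts for `put_public_access_block` and `put_bucket_versioning`. The bucket name is hypothetical, the functions only construct the arguments and do not call AWS, and MFA Delete is left out because it additionally requires an MFA device.

```python
def block_public_access_request(bucket):
    """Keyword arguments for the S3 client's put_public_access_block call,
    with all four public-access restrictions switched on."""
    return {
        "Bucket": bucket,
        "PublicAccessBlockConfiguration": {
            "BlockPublicAcls": True,
            "IgnorePublicAcls": True,
            "BlockPublicPolicy": True,
            "RestrictPublicBuckets": True,
        },
    }


def versioning_request(bucket):
    """Keyword arguments for the S3 client's put_bucket_versioning call."""
    return {"Bucket": bucket, "VersioningConfiguration": {"Status": "Enabled"}}


# Hypothetical bucket name; with boto3 installed and credentials configured:
#   s3 = boto3.client("s3")
#   s3.put_public_access_block(**block_public_access_request("example-phi-bucket"))
#   s3.put_bucket_versioning(**versioning_request("example-phi-bucket"))
print(block_public_access_request("example-phi-bucket"))
```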


General purpose healthcare data storage (S3 Standard):

  • First 50 TB/month – $0.023/GB.
  • Next 450 TB/month – $0.022/GB.
  • Over 500 TB/month – $0.021/GB.
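
Because the S3 Standard rates are tiered, each rate applies only to the portion of data that falls within its tier. The sketch below works through that arithmetic; it assumes 1 TB = 1,024 GB, and the 60 TB volume is a made-up example.

```python
GB_PER_TB = 1024  # assumption: binary terabytes

# (tier size in GB, $ per GB per month), taken from the list above.
TIERS = [
    (50 * GB_PER_TB, 0.023),
    (450 * GB_PER_TB, 0.022),
    (float("inf"), 0.021),
]


def s3_standard_monthly_cost(stored_gb):
    """Charge each tier's rate only on the data that falls inside that tier."""
    remaining = stored_gb
    total = 0.0
    for tier_size, rate in TIERS:
        used = min(remaining, tier_size)
        total += used * rate
        remaining -= used
        if remaining <= 0:
            break
    return total


# Hypothetical 60 TB archive: 50 TB at $0.023/GB plus 10 TB at $0.022/GB.
print(f"${s3_standard_monthly_cost(60 * GB_PER_TB):.2f} per month")
```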

Amazon SageMaker


Machine learning (ML) platform.


  • Integrated environment for building, training and testing ML models (SageMaker Studio).
  • Preparing healthcare data and engineering ML model features with Amazon SageMaker Data Wrangler from a wide variety of healthcare data sources (Amazon S3, Amazon Redshift, Amazon Athena, Amazon SageMaker Feature Store, etc.).
  • Building training datasets with automatic data labeling (SageMaker Ground Truth).
  • Support for 150+ open-source ML models (NLP, image classification models, etc.).
  • Automatic building, training and tuning of ML models with Amazon SageMaker Autopilot.
  • Support for supervised, unsupervised and reinforcement learning.
  • Optimized for working with major ML frameworks (TensorFlow, Apache MXNet, PyTorch, Chainer, Keras, etc.).
  • Built-in ML model debugger for solving performance bottlenecks.
  • Automatic model tuning and continuous performance monitoring.
  • HIPAA-eligible.


  • Building – ML compute instance: from $0.05 to $28.50/hour/instance + Storage: GB-month or Amazon Elastic File System (EFS) storage.
  • Training – ML compute instance: from $0.23 to $113.068/hour/instance + Storage: $0.14 per GB-month of provisioned storage.
  • Real-Time Inference – ML compute instance: from $0.056 to $113.068/hour/instance + Storage: $0.14 per GB-month of provisioned storage + data processing: $0.016/GB.
  • Batch Transform – ML compute instance: from $0.115 to $28.152/hour/instance.

Consulting and Implementation Services for a Healthcare DWH Solution on AWS

With solid experience in data warehousing services, ScienceSoft, a recognized AWS Select Services Partner, can help you design, implement, or upgrade your AWS-based healthcare DWH solution to maximize business value and optimize investments.

AWS-based Healthcare DWH consulting

  • Healthcare DWH solution design:
    • Healthcare DWH requirements engineering.
    • Business case creation.
    • Outline of optimal AWS technologies for the healthcare DWH solution.
    • Healthcare data governance strategy design.
    • Healthcare DWH solution architecture design.
  • Healthcare DWH implementation/migration/optimization strategy and plan creation.
  • Consulting support during healthcare DWH implementation/migration/optimization.
  • Healthcare DWH solution complete project management.

AWS-based Healthcare DWH implementation

  • Healthcare DWH solution feasibility study.
  • Business case creation.
  • Healthcare DWH solution conceptualization and AWS technology selection.
  • Healthcare DWH system analysis and architecture design.
  • Healthcare DWH solution development.
  • Healthcare DWH quality assurance and launch.
  • Healthcare DWH support and evolution.

About ScienceSoft

ScienceSoft is a global IT consulting and IT service vendor headquartered in McKinney, TX, US. As an AWS Select Services partner, we help healthcare organizations around the globe design and implement healthcare data warehousing solutions with AWS services to consolidate disparate healthcare data and enable transparent data analysis and reporting. Being ISO 13485-certified, we design, develop and test high-quality medical software according to the requirements of the FDA and the Council of the European Union.

* Amazon Web Services, Amazon Redshift, Amazon QuickSight, Amazon Relational Database Service (RDS), AWS Glue, Amazon S3, and Amazon SageMaker are trademarks of Amazon.com, Inc. or its affiliates in the United States and/or other countries.