Healthcare Data Warehousing on Amazon Web Services (AWS)
Healthcare data warehouse: the essence
A healthcare data warehouse (DWH) helps collect healthcare data from disparate sources into a centralized repository and then process and structure it for analytical querying and reporting.
A healthcare DWH ingests large volumes of data from a wide variety of data sources. In doing so, it:
- Achieves high healthcare data accuracy and consistency.
- Improves healthcare data trustworthiness and timeliness.
- Reduces the complexity of the healthcare data environment while meeting security and compliance needs.
The healthcare DWH becomes a solid foundation for further analytics initiatives, such as business intelligence (BI) and machine learning (ML).
Healthcare data warehouse: key features
Data management and integration
- Integration of all healthcare data types (structured, semi-structured, unstructured).
- Big data ingestion.
- Streaming data ingestion.
- Integration of healthcare data with ETL/ELT processes.
- Full and incremental healthcare data extraction/load.
- Healthcare data transformation of varying complexity (data type conversion, summarization, etc.).
- SQL-based healthcare data querying.
- Integrated, summarized, subject-oriented healthcare data storage.
- Storing current and historical healthcare data.
- Metadata storage.
- Storing data for specific administrative departments (HR, accounting, etc.) or medical departments (radiology, intensive care, pediatrics, etc.) in data marts.
- Protected health information (PHI) storage.
- Elastic scaling of storage and compute resources.
- High performance query processing.
- Materialized view support.
- Result caching.
- Dynamic management of performance and concurrency.
- Automated infrastructure provisioning.
- Automatic backup of healthcare data.
- Fault tolerance.
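Full and incremental extraction, listed above, is often implemented with a high watermark: the pipeline remembers the latest modification timestamp it has loaded and, on the next run, extracts only rows changed after it. The minimal sketch below assumes each source row carries an `updated_at` timestamp; `SourceRow` and `extract` are hypothetical names, and in a real pipeline the filter would usually run as a `WHERE` clause in the source query rather than in Python.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class SourceRow:
    """A hypothetical source record with a last-modified timestamp."""
    record_id: str
    updated_at: datetime


def extract(rows: Iterable[SourceRow],
            watermark: Optional[datetime]) -> Tuple[List[SourceRow], Optional[datetime]]:
    """Full extraction when watermark is None, otherwise incremental.

    Returns the extracted rows (oldest first) and the watermark to persist
    for the next run.
    """
    if watermark is None:
        batch = list(rows)                                      # full load
    else:
        batch = [r for r in rows if r.updated_at > watermark]   # changed rows only
    batch.sort(key=lambda r: r.updated_at)
    # Keep the old watermark if nothing new arrived.
    new_watermark = batch[-1].updated_at if batch else watermark
    return batch, new_watermark


source = [
    SourceRow("enc-1", datetime(2024, 1, 1, 8, 0)),
    SourceRow("enc-2", datetime(2024, 1, 2, 9, 30)),
    SourceRow("enc-3", datetime(2024, 1, 3, 14, 15)),
]
full, wm = extract(source, None)
incremental, wm2 = extract(source, datetime(2024, 1, 2, 0, 0))
```

The strict `>` comparison can miss rows that arrive late with an already-seen timestamp; pipelines commonly re-read a small overlap window and deduplicate on the record key to guard against that.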
Security and compliance
- Healthcare data encryption at rest and in transit.
- Granular row- and column-level security control.
- Dynamic healthcare data masking.
- External healthcare data tokenization.
- Network isolation.
- Compliance with healthcare regulations (HIPAA, FDA, HITECH, etc.).
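Dynamic data masking returns the same stored value either in clear text or in masked form, depending on who runs the query. The minimal sketch below illustrates the idea in plain Python; the role-to-column policy (`UNMASKED_ROLES`), the column names, and the "keep the last four SSN digits" rule are hypothetical. A production DWH would enforce this with the database's native masking policies rather than application code.

```python
from typing import Dict, Mapping, Set

# Hypothetical policy: roles allowed to see each PHI column in clear text.
# Columns not listed here are treated as non-sensitive.
UNMASKED_ROLES: Dict[str, Set[str]] = {
    "ssn": {"compliance_officer"},
    "patient_name": {"physician", "compliance_officer"},
    "diagnosis_code": {"physician", "analyst", "compliance_officer"},
}
# Hypothetical rule: how many trailing characters stay visible when masked.
KEEP_LAST = {"ssn": 4}


def mask_value(value: str, keep_last: int = 0) -> str:
    """Replace characters with '*', optionally leaving the last few visible."""
    if keep_last <= 0 or keep_last >= len(value):
        return "*" * len(value)
    return "*" * (len(value) - keep_last) + value[-keep_last:]


def mask_row(row: Mapping[str, str], role: str) -> Dict[str, str]:
    """Return a copy of row with PHI masked for roles not authorized to see it."""
    result: Dict[str, str] = {}
    for column, value in row.items():
        allowed = UNMASKED_ROLES.get(column)
        if allowed is None or role in allowed:
            result[column] = value          # non-PHI column or authorized role
        else:
            result[column] = mask_value(value, KEEP_LAST.get(column, 0))
    return result


patient = {"visit_id": "V-1001", "patient_name": "Jane Doe",
           "ssn": "123-45-6789", "diagnosis_code": "E11.9"}
analyst_view = mask_row(patient, "analyst")
physician_view = mask_row(patient, "physician")
```

The stored row is never modified; masking is applied only to the copy returned to the caller, which is what distinguishes dynamic masking from static de-identification.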
Healthcare data warehouse: key integrations
A data lake
A data lake can store structured (e.g., claims data, EHRs) and unstructured (radiology images, audio/video recordings, streaming healthcare data from wearables and devices, etc.) data at any scale before the data is further integrated into the DWH. A data lake can also serve as economical long-term storage for healthcare data that is accessed less frequently but must be retained for compliance.
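One common way to organize a data lake in Amazon S3 is Hive-style partitioning (`key=value` path segments), which query engines such as Amazon Athena and AWS Glue crawlers can recognize as partitions. The minimal sketch below builds such object keys; the zone names (`raw`, `curated`, `archive`) and the `lake_object_key` function are hypothetical.

```python
from datetime import date

# Hypothetical zone names for a layered data lake.
ZONES = {"raw", "curated", "archive"}


def lake_object_key(zone: str, source: str, day: date, file_name: str) -> str:
    """Build an S3 object key with Hive-style partitions (key=value segments)."""
    if zone not in ZONES:
        raise ValueError(f"unknown zone: {zone}")
    return (f"{zone}/source={source}/"
            f"year={day.year:04d}/month={day.month:02d}/day={day.day:02d}/"
            f"{file_name}")


key = lake_object_key("raw", "ehr", date(2024, 3, 7), "batch-001.json")
```

Partitioning by source and date lets queries skip irrelevant prefixes, and a separate archive zone makes it easy to scope an S3 Lifecycle rule by prefix so that rarely accessed compliance data moves to a cheaper storage class.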
Business intelligence (BI) tools
To analyze and visualize processed healthcare data. Self-service BI tools help analyze data and quickly deliver business and medical insights to decision-makers in the form of immersive reports and interactive dashboards.
Machine learning (ML) software
To train ML models on structured healthcare data from the data warehouse and enable advanced analytics to predict clinical outcomes, deliver proactive patient care, improve healthcare scheduling, monitor supply chain, make grounded decisions about hospital spending, etc.
Proof of Concept
We help you validate your healthcare DWH solution with a tailored PoC to better understand its potential and get real-life user feedback.
Healthcare DWH scalability and flexibility
With a custom healthcare DWH solution design, we make sure you can quickly load healthcare data of any type (structured, semi-structured, unstructured) and volume to efficiently address changing data analytics objectives.
Focus on security and healthcare data protection measures
To let you store and process healthcare data in a secure, HIPAA-compliant environment and safeguard sensitive patient data, we select and customize technologies that provide continuous data encryption (at rest and in transit) and dynamic data masking, design data access policies, set up multi-factor authentication, etc.
Leveraging AWS technologies, ScienceSoft helps healthcare organizations build fast, highly scalable, and cost-effective DWH solutions that ingest all types of healthcare data (structured, semi-structured, unstructured) and quickly process and store it in a highly secure, HIPAA-compliant environment.
Amazon Redshift
Data warehousing service.
- Integration of all healthcare data types (structured, semi-structured, unstructured).
- SQL querying of structured and semi-structured healthcare data in the DWH, in an Amazon S3 data lake, and in operational databases.
- Fast, independent scaling of storage and compute resources, which are billed separately.
- Support for federated queries.
- Optimized healthcare data storage (columnar storage, data compression, etc.) for high-performance queries.
- Support for materialized views and result caching.
- Database automation (automated provisioning, backups, table design, etc.).
- ML capabilities for optimized performance under varying workloads.
- Native integrations with the AWS ecosystem (including S3, AWS Glue, Amazon EMR, Amazon Kinesis Data Firehose, Amazon QuickSight, etc.).
- Healthcare data encryption (at rest and in transit) and fine-grained access control.
- On-demand pricing: $0.25 – $13.04/hour.
- Reserved instance pricing: savings of up to 75% over on-demand pricing (with a 3-year term).
- Data storage (RA3 node types): $0.024/GB/month.
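Because Amazon Redshift bills compute and RA3 managed storage separately, a monthly estimate has two parts. The minimal sketch below assumes 730 hours per month (a common convention for always-on resources), a hypothetical node rate of $1.00/hour, 2,048 GB of stored data, and the maximum 75% reserved-instance discount quoted above; actual discounts depend on the term and payment option.

```python
HOURS_PER_MONTH = 730           # assumption: always-on cluster
RA3_STORAGE_USD_PER_GB = 0.024  # RA3 managed storage rate quoted above


def redshift_monthly_cost(node_hourly_usd: float, nodes: int, storage_gb: float,
                          reserved_discount: float = 0.0) -> float:
    """Estimate a monthly bill; the discount applies to compute only.

    reserved_discount is a fraction (0.75 means 75% off on-demand compute).
    """
    if not 0.0 <= reserved_discount < 1.0:
        raise ValueError("reserved_discount must be in [0, 1)")
    compute = node_hourly_usd * nodes * HOURS_PER_MONTH * (1 - reserved_discount)
    storage = storage_gb * RA3_STORAGE_USD_PER_GB   # billed separately from compute
    return round(compute + storage, 2)


on_demand = redshift_monthly_cost(1.00, nodes=2, storage_gb=2048)
reserved = redshift_monthly_cost(1.00, nodes=2, storage_gb=2048, reserved_discount=0.75)
print(f"on-demand: ${on_demand:,.2f}, reserved: ${reserved:,.2f}")
```

With these assumptions, storage stays at about $49 in both cases, so the reserved discount shrinks only the compute share of the bill.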
Amazon Relational Database Service (RDS)
Relational database service.
- Support for Amazon Aurora, MySQL, Microsoft SQL Server, PostgreSQL, Oracle Database, and MariaDB.
- Automated administration tasks (provisioning, patching, backup, recovery, etc.) for the selected database engine, plus recommendations on the engine version, storage, etc.
- Fast and consistent database performance backed by General Purpose SSD and Provisioned IOPS SSD storage.
- High database availability with Multi-AZ deployments (a synchronous standby copy of healthcare data in a separate Availability Zone).
- Easy scaling of compute and storage resources with near-zero downtime.
- Point-in-time recovery capability.
- Healthcare data encryption in transit (SSL/TLS) and at rest, with AWS-managed encryption keys used by default.
- Fine-grained access control.
- On-demand pricing: $0.017 – $187.859/hour.
- Amazon RDS reserved instances: savings of up to 69% over on-demand pricing (for steady-state workloads).
- Data storage: $0.10 – $0.25/GB-month
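The RDS capabilities listed above map to parameters of the `create_db_instance` operation in the AWS SDK for Python (boto3). The helper below only builds the request; the identifier, engine, instance class, storage size, and master username are illustrative assumptions. `MultiAZ` provides the standby copy, `StorageEncrypted` enables encryption at rest, and a `BackupRetentionPeriod` above zero turns on the automated backups that point-in-time recovery relies on.

```python
from typing import Any, Dict


def rds_instance_params(identifier: str,
                        engine: str = "postgres",
                        instance_class: str = "db.m6g.large",   # assumption
                        storage_gib: int = 100,
                        backup_days: int = 7) -> Dict[str, Any]:
    """Build keyword arguments for boto3's rds create_db_instance call."""
    # Automated backups (required for point-in-time recovery) need 1-35 days.
    if not 1 <= backup_days <= 35:
        raise ValueError("backup_days must be between 1 and 35")
    return {
        "DBInstanceIdentifier": identifier,
        "Engine": engine,
        "DBInstanceClass": instance_class,
        "AllocatedStorage": storage_gib,
        "StorageType": "gp3",              # General Purpose SSD
        "MultiAZ": True,                   # standby in another Availability Zone
        "StorageEncrypted": True,          # encryption at rest
        "BackupRetentionPeriod": backup_days,
        "MasterUsername": "dwh_admin",     # hypothetical user name
        "ManageMasterUserPassword": True,  # let RDS keep the password in Secrets Manager
        "PubliclyAccessible": False,       # keep the instance off the public internet
    }


params = rds_instance_params("phi-staging-db")
```

To create the instance, you would pass the dictionary to `boto3.client("rds").create_db_instance(**params)`; the engine and instance class must be a combination that RDS supports in your region.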
AWS Glue
Data integration service.
- Centralized repository (Data Catalog) for maintaining metadata for all healthcare data assets located on AWS.
- Automatic schema discovery (crawlers) that populates the Data Catalog for use in ETL jobs.
- Running ETL jobs on demand, on a schedule, or in response to events, using job triggers.
- Automatic code generation (in Scala or Python) to extract, transform, and load healthcare data for high-performance ETL jobs.
- Development endpoints to develop and test ETL scripts.
- ETL operations on streaming data with the streaming ETL jobs capability.
- Combining and replicating healthcare data across multiple data stores (Amazon Redshift, Amazon S3, Amazon DynamoDB, etc.) using SQL.
- ML-driven healthcare data processing with the FindMatches capability (for example, for finding the matching records within a single database and across different ones).
- ETL jobs and development endpoints: $0.44 per data processing unit (DPU) hour, billed per second; 1 DPU provides 4 vCPUs and 16 GB of memory.
- Data Catalog storage: the first million objects stored are free; $1.00 per 100,000 objects per month above 1 million.
- Data Catalog requests: the first million requests per month are free; $1.00 per million requests above 1 million.
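The Glue pricing above can be turned into a quick estimate. The minimal sketch below assumes the $0.44 per DPU-hour rate quoted above and a 1-minute minimum billing duration (AWS documents this minimum for Glue 2.0 and later jobs; check current pricing for your Glue version and region). Data Catalog charges are prorated linearly here, which is a simplification.

```python
GLUE_DPU_HOUR_USD = 0.44
CATALOG_FREE_OBJECTS = 1_000_000
CATALOG_USD_PER_100K_OBJECTS = 1.00
CATALOG_FREE_REQUESTS = 1_000_000
CATALOG_USD_PER_MILLION_REQUESTS = 1.00


def glue_job_cost(dpus: float, runtime_seconds: float,
                  minimum_seconds: float = 60) -> float:
    """Per-second billing with an assumed minimum billed duration."""
    billed_seconds = max(runtime_seconds, minimum_seconds)
    return round(dpus * billed_seconds / 3600 * GLUE_DPU_HOUR_USD, 2)


def catalog_monthly_cost(objects: int, requests: int) -> float:
    """Only usage above the free tiers is charged (prorated linearly)."""
    storage = max(objects - CATALOG_FREE_OBJECTS, 0) / 100_000 * CATALOG_USD_PER_100K_OBJECTS
    calls = max(requests - CATALOG_FREE_REQUESTS, 0) / 1_000_000 * CATALOG_USD_PER_MILLION_REQUESTS
    return round(storage + calls, 2)


job = glue_job_cost(dpus=10, runtime_seconds=900)   # 10 DPUs for 15 minutes
catalog = catalog_monthly_cost(objects=1_250_000, requests=3_000_000)
```

Here a 15-minute job on 10 DPUs consumes 2.5 DPU-hours (about $1.10), and a catalog with 250,000 billable objects and 2 million billable requests comes to about $4.50 per month.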
Amazon Simple Storage Service (Amazon S3)
Object storage service.
- Storing objects in S3 buckets (objects can be labeled with object tags for efficient storage management).
- A variety of S3 storage classes for different healthcare data access patterns.
- S3 Storage Lens solution for monitoring and analyzing the stored objects.
- Analytical querying of healthcare data without data movement.
- Integration with the Amazon Athena and Amazon Redshift Spectrum analytics services for SQL querying.
- Replicating healthcare data with the S3 Replication feature for reduced latency, compliance and security needs, etc.
- Integration with Amazon Macie to pinpoint and protect sensitive healthcare data.
- Designed for 99.999999999% (11 nines) data durability.
- Granting/restricting access to S3 resources with AWS Identity and Access Management (IAM), access control lists (ACLs), S3 Access Points, etc.
- Blocking public access to all objects in a bucket with S3 Block Public Access.
- Multi-Factor Authentication (MFA) Delete and S3 Versioning to prevent accidental healthcare data deletions and enable restoration.
- HIPAA-eligible service covered by the AWS HIPAA/HITECH compliance program.
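Block Public Access and Versioning from the list above are configured with the `put_public_access_block` and `put_bucket_versioning` operations of the boto3 S3 client. The helper below only builds the request arguments for a hypothetical bucket name. MFA Delete is deliberately left out: enabling it requires the bucket owner's root credentials and an `MFA` argument on the versioning call.

```python
from typing import Any, Dict


def secure_bucket_requests(bucket: str) -> Dict[str, Dict[str, Any]]:
    """Build boto3 S3 request arguments, keyed by client method name."""
    return {
        # Block every form of public access (ACLs and bucket policies).
        "put_public_access_block": {
            "Bucket": bucket,
            "PublicAccessBlockConfiguration": {
                "BlockPublicAcls": True,
                "IgnorePublicAcls": True,
                "BlockPublicPolicy": True,
                "RestrictPublicBuckets": True,
            },
        },
        # Keep prior object versions so deleted or overwritten data can be restored.
        "put_bucket_versioning": {
            "Bucket": bucket,
            "VersioningConfiguration": {"Status": "Enabled"},
        },
    }


bucket_requests = secure_bucket_requests("example-phi-landing-bucket")
```

With credentials configured, each entry could be applied with `s3 = boto3.client("s3")` followed by `getattr(s3, name)(**kwargs)`.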
General purpose healthcare data storage (S3 Standard):
- First 50 TB/month – $0.023/GB.
- Next 450 TB/month – $0.022/GB.
- Over 500 TB/month – $0.021/GB.
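S3 Standard pricing is tiered: each rate applies only to the gigabytes that fall inside its tier. The minimal sketch below uses the three rates listed above and assumes 1 TB = 1,024 GB; region-specific rates may differ.

```python
GB_PER_TB = 1024  # assumption: binary terabytes

# (tier size in GB, USD per GB-month), applied in order.
S3_STANDARD_TIERS = [
    (50 * GB_PER_TB, 0.023),
    (450 * GB_PER_TB, 0.022),
    (float("inf"), 0.021),
]


def s3_standard_monthly_cost(stored_gb: float) -> float:
    """Charge each slice of storage at its own tier's rate."""
    remaining = stored_gb
    total = 0.0
    for tier_size, price in S3_STANDARD_TIERS:
        in_tier = min(remaining, tier_size)
        total += in_tier * price
        remaining -= in_tier
        if remaining <= 0:
            break
    return round(total, 2)


cost_100_tb = s3_standard_monthly_cost(100 * GB_PER_TB)
cost_600_tb = s3_standard_monthly_cost(600 * GB_PER_TB)
```

For 100 TB, the first 50 TB cost about $1,177.60 and the next 50 TB about $1,126.40, for roughly $2,304 per month in total.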
Amazon SageMaker
Machine learning (ML) platform.
- Integrated environment for building, training and testing ML models (SageMaker Studio).
- Preparing healthcare data and engineering ML model features with Amazon SageMaker Data Wrangler, using a wide variety of healthcare data sources (Amazon S3, Amazon Redshift, Amazon Athena, Amazon SageMaker Feature Store, etc.).
- Building training datasets with automatic data labeling (SageMaker Ground Truth).
- Support for 150+ open-source ML models (NLP, image classification models, etc.)
- Automatic building, training and tuning of ML models with Amazon SageMaker Autopilot.
- Support for supervised, unsupervised and reinforcement learning.
- Optimized for working with major ML frameworks (TensorFlow, Apache MXNet, PyTorch, Chainer, Keras, etc.).
- Built-in ML model debugger for solving performance bottlenecks.
- Automatic model tuning and continuous performance monitoring.
- Building: ML compute instance from $0.0582 to $28.152/hour/instance + storage (GB-month) or Amazon Elastic File System (EFS) storage.
- Training: ML compute instance from $0.134 to $35.894/hour/instance + storage (GB-month).
- Real-time inference: ML compute instance from $0.065 to $28.152/hour/instance + storage (GB-month) + data processing ($0.016/GB).
- Batch transform: ML compute instance from $0.134 to $28.152/hour/instance.
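A real-time inference estimate combines instance hours with the per-GB data processing charge. The minimal sketch below uses the lowest instance rate and the $0.016/GB processing rate listed above, assumes an always-on endpoint (730 hours/month), and treats `processed_gb` as all data processed in and out; the storage rate is left as a parameter because the list above does not state it.

```python
HOURS_PER_MONTH = 730           # assumption: endpoint runs around the clock
PROCESSING_USD_PER_GB = 0.016   # data processing rate listed above


def inference_monthly_cost(instance_hourly_usd: float, instances: int,
                           processed_gb: float, storage_gb: float = 0.0,
                           storage_usd_per_gb_month: float = 0.0) -> float:
    """Instance hours + data processing + (optional) storage."""
    compute = instance_hourly_usd * instances * HOURS_PER_MONTH
    processing = processed_gb * PROCESSING_USD_PER_GB
    storage = storage_gb * storage_usd_per_gb_month  # supply your own rate
    return round(compute + processing + storage, 2)


estimate = inference_monthly_cost(0.065, instances=1, processed_gb=100)
```

In this example, compute dominates (about $47.45) while processing 100 GB adds about $1.60, so right-sizing the instance matters far more than data volume at this scale.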
Consulting and Implementation Services for a Healthcare DWH Solution on AWS
With solid experience in data warehousing services, ScienceSoft, a recognized AWS Select Consulting Partner, can help you design, implement, or upgrade your AWS-based healthcare DWH solution to maximize business value while optimizing your investment.
AWS-based Healthcare DWH consulting
- Healthcare DWH solution design:
  - Healthcare DWH requirements engineering.
  - Business case creation.
  - Outline of optimal AWS technologies for the healthcare DWH solution.
  - Healthcare data governance strategy design.
  - Healthcare DWH solution architecture design.
  - Healthcare DWH implementation/migration/optimization strategy and plan creation.
- Consulting support during healthcare DWH implementation/migration/optimization.
- Healthcare DWH solution complete project management.
AWS-based Healthcare DWH implementation
- Healthcare DWH solution feasibility study.
- Business case creation.
- Healthcare DWH solution conceptualization and AWS technology selection.
- Healthcare DWH system analysis and architecture design.
- Healthcare DWH solution development.
- Healthcare DWH quality assurance and launch.
- Healthcare DWH support and evolution.
ScienceSoft is a global IT consulting and IT services vendor headquartered in McKinney, TX, US. As an AWS Select Consulting Partner, we help healthcare organizations around the globe design and implement healthcare data warehousing solutions with AWS services to consolidate disparate healthcare data and enable transparent data analysis and reporting. To learn more, check out our DWH service offering.