5 Best Big Data Databases
With 7 years of experience in big data services, ScienceSoft helps companies select and implement the right software for their big data initiatives.
Big data databases: the essence
Big data databases are flexible repositories for storing big data. Most of them are NoSQL databases with a horizontally scalable architecture and no rigid schema, which enables quick and cost-effective processing of huge volumes of structured, semi-structured and unstructured data.
Features of big data databases
Data storage
- Storing petabytes of data.
- Storing unstructured, semi-structured and structured data.
- Distributed schema-agnostic big data storage.
Data model options
- Key-value.
- Document-oriented.
- Graph.
- Wide-column store.
- Multi-model.
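To make the difference between these models more tangible, here is a minimal sketch that expresses one hypothetical sensor reading in each of them. It uses plain Python structures only; the keys, field names and the `neighbors` helper are illustrative assumptions, not the API of any particular database. A multi-model database would expose several of these views over the same data.

```python
# Hypothetical sensor reading expressed in four data models, using plain
# Python structures only (no database client is involved).

# Key-value: the store sees an opaque value behind a single key.
key_value = {"sensor:42": '{"site": "plant-a", "temp": 21.5}'}

# Document: a nested, self-describing record that can be queried by field.
document = {"_id": "sensor:42", "site": {"name": "plant-a"}, "temp": 21.5}

# Wide-column: rows live under a partition key; each row may carry
# a different set of columns.
wide_column = {
    "sensor:42": {
        "2024-01-01T00:00": {"temp": 21.5},
        "2024-01-01T00:01": {"temp": 21.7, "humidity": 40},
    }
}

# Graph: entities are nodes, relationships are typed edges.
edges = [("sensor:42", "LOCATED_IN", "plant-a"), ("plant-a", "OWNED_BY", "acme")]


def neighbors(node: str, relation: str) -> list[str]:
    """Return the nodes reachable from `node` over edges of type `relation`."""
    return [dst for src, rel, dst in edges if src == node and rel == relation]


print(document["site"]["name"])               # field lookup in a document
print(sorted(wide_column["sensor:42"]))       # row keys within one partition
print(neighbors("sensor:42", "LOCATED_IN"))   # one-hop graph traversal
```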
Data querying
- Support for multiple concurrent queries.
- Batch and streaming/real-time big data loading/processing.
- Support for analytical workloads.
Database performance
- Horizontal scaling for elastic resource setup and provisioning.
- Automatic big data replication across multiple servers for minimized latency and high availability (up to 99.99%).
- On-demand and provisioned capacity modes.
- Automatic deletion of expired data from tables.
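Horizontal scaling and replication both come down to deciding which servers hold which keys. The sketch below shows one simple way to do that, assuming a fixed list of node names and a hypothetical `replicas_for` helper. It uses plain hash-modulo placement for brevity; production databases typically rely on consistent hashing or managed partitioning so that adding a node does not move most keys.

```python
import hashlib


def replicas_for(key: str, nodes: list[str], replication_factor: int = 3) -> list[str]:
    """Pick the nodes that store `key`.

    The key is hashed to a starting node, and the following nodes in the
    list hold the additional copies (illustrative placement only).
    """
    if not nodes:
        raise ValueError("at least one node is required")
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    start = int.from_bytes(digest[:8], "big") % len(nodes)
    copies = min(replication_factor, len(nodes))
    return [nodes[(start + i) % len(nodes)] for i in range(copies)]


# Hypothetical five-node cluster; every key lands on three distinct nodes.
cluster = ["node-a", "node-b", "node-c", "node-d", "node-e"]
for key in ("user:1", "user:2", "order:17"):
    print(key, "->", replicas_for(key, cluster))
```

Because the placement depends only on the key and the node list, any client can compute it without a central lookup, which is what lets reads and writes spread across servers.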
Database security and reliability
- Big data encryption in transit and at rest.
- User authorization and authentication.
- Continuous and on-demand backup and restore.
- Point-in-time restore.
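Point-in-time restore can be pictured as a backup snapshot plus a log of later changes that is replayed up to the chosen moment. The following is a conceptual sketch under that assumption; the `Change` record and the `restore_to` function are hypothetical and do not reflect how any specific database implements the feature.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class Change:
    timestamp: float        # when the write happened (seconds)
    key: str
    value: Optional[Any]    # None marks a delete


def restore_to(snapshot: dict, log: list[Change], target_time: float) -> dict:
    """Rebuild the table state as of `target_time`.

    Starts from a copy of the snapshot and replays, in time order, every
    logged change made at or before the target time.
    """
    state = dict(snapshot)
    for change in sorted(log, key=lambda c: c.timestamp):
        if change.timestamp > target_time:
            break
        if change.value is None:
            state.pop(change.key, None)
        else:
            state[change.key] = change.value
    return state


snapshot = {"a": 1}
log = [Change(10, "a", 2), Change(20, "b", 3), Change(30, "a", None)]
print(restore_to(snapshot, log, target_time=25))
```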
Best big data databases for comparison
Amazon DynamoDB
A leader among Big Data NoSQL databases in the Forrester Wave Report.
- Support for key-value and document data models.
- ACID (atomicity, consistency, isolation, durability) transactions.
- Integrations with Amazon S3, Amazon EMR, Amazon Redshift.
- Microsecond latency with DynamoDB Accelerator.
- Real-time data processing with DynamoDB Streams.
- On-demand and provisioned read/write capacity modes.
- End-to-end big data encryption.
- Point-in-time recovery and on-demand backup and restore.
Operational workloads, IoT, social media, gaming, ecommerce apps.
Database operations:
- On-demand request units (RU): $1.25/million write RU and $0.25/million read RU.
- Provisioned capacity units (CU): $0.00065/hour per write CU and $0.00013/hour per read CU.
Storage: first 25 GB/month – free, $0.25/GB/month thereafter.
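As a rough illustration of how the on-demand rates and the storage free tier add up, here is a small estimate in Python. The prices are copied from the list above and may differ by region or change over time. The function name and the sample workload are hypothetical, and the sketch assumes each request consumes exactly one request unit, which is a simplification since larger items consume more.

```python
# Prices in USD, taken from the list above; treat them as a snapshot.
WRITE_PER_MILLION_RU = 1.25
READ_PER_MILLION_RU = 0.25
FREE_STORAGE_GB = 25
STORAGE_PER_GB_MONTH = 0.25


def on_demand_monthly_cost(write_units: int, read_units: int, storage_gb: float) -> float:
    """Estimate a monthly on-demand bill from request units and stored data."""
    request_cost = (
        write_units / 1_000_000 * WRITE_PER_MILLION_RU
        + read_units / 1_000_000 * READ_PER_MILLION_RU
    )
    # Only the storage above the free allowance is billed.
    billable_gb = max(0.0, storage_gb - FREE_STORAGE_GB)
    return request_cost + billable_gb * STORAGE_PER_GB_MONTH


# Hypothetical workload: 10M writes, 40M reads, 125 GB stored.
estimate = on_demand_monthly_cost(10_000_000, 40_000_000, 125)
print(f"Estimated monthly cost: ${estimate:.2f}")
```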
Azure Cosmos DB
A leader among Big Data NoSQL databases in the Forrester Wave Report.
- Support for multiple data models (multi-model).
- Open-source APIs for SQL, MongoDB, Cassandra, Gremlin, etc.
- Integration with Azure Synapse Analytics for real-time no-ETL analytics on operational data.
- Support for ACID transactions.
- On-demand and provisioned capacity modes.
- Big data encryption (in transit and at rest) and access control.
- 99.999% availability.
Operations management, ecommerce, gaming, IoT apps.
Database operations:
- Provisioned throughput (per 100 request units/second, single-region write account): $0.012/hour (autoscale) or $0.008/hour (manual).
- Provisioned throughput reserved capacity: up to 65% savings.
- Serverless (billed for the request units (RU) consumed by each database operation): $0.25 per 1,000,000 RU.
Storage: consumed transactional storage (row-oriented): $0.25/GB/month.
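Whether serverless or provisioned throughput is cheaper depends on how steadily the database is used. The sketch below compares the two using the manual provisioned rate and the serverless rate listed above. It assumes a 730-hour month and billing in whole 100 RU/s increments; the function names are hypothetical, and reserved capacity discounts, autoscale and multi-region writes are left out.

```python
import math

# Rates in USD, taken from the list above; treat them as a snapshot.
MANUAL_PER_100_RUS_HOUR = 0.008
SERVERLESS_PER_MILLION_RU = 0.25
HOURS_PER_MONTH = 730  # assumption: average month length in hours


def provisioned_monthly_cost(ru_per_second: int) -> float:
    """Cost of keeping `ru_per_second` provisioned (manual) for a full month."""
    increments = math.ceil(ru_per_second / 100)  # assumed 100 RU/s billing steps
    return increments * MANUAL_PER_100_RUS_HOUR * HOURS_PER_MONTH


def serverless_monthly_cost(total_ru: float) -> float:
    """Cost of consuming `total_ru` request units in a month on serverless."""
    return total_ru / 1_000_000 * SERVERLESS_PER_MILLION_RU


def cheaper_mode(ru_per_second: int, total_ru: float) -> str:
    """Name the cheaper option for a given peak rate and monthly consumption."""
    provisioned = provisioned_monthly_cost(ru_per_second)
    serverless = serverless_monthly_cost(total_ru)
    return "serverless" if serverless < provisioned else "provisioned"


# Hypothetical workload: peaks at 400 RU/s but consumes only 50M RU per month.
print(cheaper_mode(400, 50_000_000))
```

Spiky or low-volume workloads tend to favor serverless under these assumptions, while workloads that use most of their provisioned rate around the clock tend to favor provisioned throughput.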
Amazon Keyspaces (for Apache Cassandra)
- Support for Cassandra Query Language (CQL) API code, Apache 2.0-licensed Cassandra drivers, and developer tools for running Cassandra workloads.
- Big data encryption at rest and in transit.
- On-demand and provisioned capacity modes.
- Integration with Amazon CloudWatch for performance monitoring.
- Continuous backup of table data with point-in-time recovery.
- 99.99% availability within AWS Regions.
- Integration with AWS Identity and Access Management for database access control.
Fleet management, industrial maintenance apps.
Database operations:
- On-demand throughput: $1.45/million write RU, $0.29/million read RU.
- Provisioned throughput: write RUs - $0.00075/hour, read RUs - $0.00015/hour.
Storage: $0.30/GB/month.
Amazon DocumentDB
- MongoDB compatibility.
- Support for ACID transactions.
- Migration support (e.g., MongoDB databases on-premises to Amazon DocumentDB) with AWS Database Migration Service.
- Support for role-based access with built-in roles.
- Network isolation.
- Instance monitoring and repair.
- Cluster snapshots.
User profiles, catalogs, and content management.
- On-demand instances: $0.277 to $8.864 per instance-hour consumed (memory-optimized instances, current generation).
- Database I/O: $0.20 per 1 million requests.
- Database storage: $0.10/GB/month.
- Backup storage: $0.021/GB/month.
Amazon Redshift
- Flexible database management platform for big data querying with SQL.
- Automated infrastructure provisioning.
- On-demand and provisioned capacity modes.
- Amazon Redshift Spectrum to query big data in the data lake (Amazon S3).
- Federated queries support for operational data querying.
- Big data encryption (in transit and at rest).
- Network isolation.
- Row- and column-level security.
BI and real-time operational analytics on business events.
Not suitable for online transaction processing (OLTP) that requires millisecond response times.
- On-demand pricing: from $0.25/hour (dc2.large) to $13.04/hour (ra3.16xlarge).
- Reserved instance pricing allows saving up to 75% over the on-demand option.
- Managed storage pricing (for RA3 node types): $0.024/GB/month.
Big data database implementation
Big data consulting
We offer:
- Big data storage, processing, and analytics needs analysis.
- Big data solution architecture.
- An outline of the optimal big data solution technology stack.
- Recommendations on big data quality management and big data security.
- Big data database administration training.
- Proof of concept (for complex projects).
Big data database implementation
Our team takes on:
- Big data storage and processing needs analysis.
- Big data solution architecture.
- Big data database integration (integration with big data source systems, a data lake, DWH, ML software, big data analysis and reporting software, etc.).
- Big data governance procedures setup (big data quality, security, etc.).
- Admin and user training.
- Big data database support (if required).