The Customer is a U.S. company operating in 18 states. Having vast experience in telecommunications, the Customer decided to launch a new service allowing pet owners to monitor their pets’ locations with the help of wearable trackers managed via a mobile app.
The Customer wanted a big data solution that would keep users up to date on their pets’ locations at all times, send real-time notifications about critical events, and provide reports on their pets’ presence.
The solution was to enable media content transfer (audio, video and photos) so that pet owners could speak to their pets or see where their pets were at a particular moment.
Because the Customer expected the number of users to keep growing, the solution had to scale easily to store and process an increasing amount of data.
To make the solution highly scalable, ScienceSoft’s big data team delivered it in the cloud and built it around Apache Kafka, Apache Spark and MongoDB.
How the solution works:
- Multiple GPS trackers transmit real-time data about pet location, as well as about events (e.g., low battery or leaving a safe territory), to the message broker using the MQTT protocol. The protocol was chosen because it helps to ensure a device-friendly interface and save mobile phone battery life.
- A stream data processor based on Apache Kafka consumes data from multiple MQTT topics, processes it in real time and checks data quality. Its component, Kafka Streams, makes push notifications possible and ensures safe data transfer.
- A data aggregator implemented on Apache Spark processes data in memory, aggregates it by hour, day, week and month, and transfers it to a data warehouse. For the warehouse, ScienceSoft’s team suggested MongoDB because it allows storing time series events as a single document (per hour, per day or per week). In addition, its document-oriented design allows in-place updates, which yield a major performance gain.
- An operational database on PostgreSQL RDS stores user profiles, accounts and configuration data.
- RESTful services separate the user interface from the data storage and ensure reliability, scalability and independence from the platform type or programming language.
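The single-document-per-period storage idea mentioned for the data warehouse can be illustrated with a minimal sketch. It is not the project's actual code: the field names (`device_id`, `hour`, `count`, `points`), the helper names and the plain dictionary standing in for a MongoDB collection are all assumptions made for illustration. In MongoDB itself, the same pattern would typically be an upsert that appends to an array and increments a counter in place.

```python
from datetime import datetime, timezone


def hour_bucket(ts: datetime) -> datetime:
    """Truncate a timezone-aware timestamp to the start of its hour in UTC."""
    return ts.astimezone(timezone.utc).replace(minute=0, second=0, microsecond=0)


def add_event(store: dict, device_id: str, ts: datetime, lat: float, lon: float) -> dict:
    """Append a location fix to the (device, hour) document, creating it if needed.

    `store` is a plain dict standing in for a document collection (assumption).
    """
    key = (device_id, hour_bucket(ts))
    doc = store.setdefault(
        key,
        {"device_id": device_id, "hour": key[1], "count": 0, "points": []},
    )
    # In-place update: the existing document grows instead of a new one being written.
    doc["points"].append({"ts": ts, "lat": lat, "lon": lon})
    doc["count"] += 1
    return doc


# Hypothetical sample data: three fixes from one tracker spanning two hours.
store = {}
for ts in (
    datetime(2024, 1, 1, 10, 5, tzinfo=timezone.utc),
    datetime(2024, 1, 1, 10, 40, tzinfo=timezone.utc),
    datetime(2024, 1, 1, 11, 2, tzinfo=timezone.utc),
):
    add_event(store, "tracker-1", ts, 40.0, -75.0)
```

With this layout, a report for a given hour reads one document per device rather than scanning every individual event, which is the kind of saving the bucketing approach aims for.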
The Customer received an easily scalable big data solution that can process 30,000+ events per second from 1 million devices. As a result, users can track their pets’ locations in real time, as well as send and receive photos, videos and voice messages. If a critical event happens (e.g., a pet crosses a geofence set by the pet owner or the pet’s wearable tracker goes out of communication), the user receives a push notification. Pet owners can also access automatically generated hourly, weekly or monthly reports, or manually adjust the reporting period if needed.
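As an illustration of the geofence event described above, the check could be sketched as follows. The case study does not say how geofences are modeled, so the circular zone, the haversine distance and the function names here are assumptions chosen for simplicity.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def left_safe_zone(fix: tuple, center: tuple, radius_m: float) -> bool:
    """Return True if a (lat, lon) fix lies outside a circular safe zone (assumed shape)."""
    return haversine_m(fix[0], fix[1], center[0], center[1]) > radius_m


# Hypothetical safe zone: 100 m around an owner-defined home position.
home = (40.0, -75.0)
inside = left_safe_zone((40.0005, -75.0), home, 100)   # roughly 56 m away
outside = left_safe_zone((40.002, -75.0), home, 100)   # roughly 222 m away
```

In a streaming setup, a check like this would run on each incoming location fix, and a transition from inside to outside would trigger the notification.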
Technologies and Tools
Amazon Web Services, MQTT, Apache Kafka (stream data processor), Apache Spark (data aggregator), MongoDB (data warehouse), PostgreSQL RDS (operational database), RESTful web services.