Data Analytics System Enabling Cross Analysis of 30,000 Attributes and 100x Faster Reporting

Industry

Marketing & Advertisement, Media & Entertainment

Technologies

Hadoop, Python, Scala, Spark, AWS, Big data, Cloud, Azure, .NET, C#, WPF, XAML, MVVM

About Our Client

The Client is a leading market research company.

Challenge

Though having a robust analytical system, the Client believed that it would not be able to satisfy the company’s future needs. Acknowledging this situation, the Client was keeping their eyes open for a future-focused, innovative solution. The system-to-be was to cope with the continuously growing amount of data, analyze big data faster, and enable comprehensive advertising channel analysis.

After deciding on the system’s architecture, the Client was searching for a highly qualified team experienced in big data implementation. Satisfied with a long-lasting cooperation with ScienceSoft, the Client addressed our consultants to do the entire migration from the old analytical system to a new one.

Solution

During the project, the Client’s business intelligence architects were cooperating closely with ScienceSoft’s big data team. The former designed an idea, and the latter was responsible for its implementation.

For the new analytical system, the Client’s architects selected the following frameworks:

Apache Hadoop – for data storage.
Apache Hive – for data aggregation, querying, and analysis.
Apache Spark – for data processing.

Amazon Web Services and Microsoft Azure were selected as cloud computing platforms.

Upon the Client’s request, during the migration, the old system and the new one were operating in parallel.

Overall, the solution included five main modules:

Data preparation
Staging
Data warehouse 1
Data warehouse 2
Desktop application

Data preparation

The system has been supplied with raw data taken from multiple sources, such as TV views, mobile devices' browsing history, website visits data, and surveys. To enable the system to process more than 1,000 different types of raw data (archives, XLS, TXT, etc.), data preparation included the following stages coded in Python:

Data transformation
Data parsing
Data merging
Data loading into the system.

Staging

Apache Hive formed the core of that module. At that stage, the data structure was similar to the raw data structure and had no established connections between respondents from different sources, for example, TV and the internet.

Data warehouse 1

Similar to the previous block, this one was also based on Apache Hive. There, data mapping took place. For example, the system processed the respondents’ data for radio, TV, internet, and newspaper sources and linked users’ IDs from different data sources according to the mapping rules. The ETL for that block was written in Python.

Data warehouse 2

With Apache Hive and Spark as a core, the block guaranteed data processing on the fly according to the business logic: it calculated sums, averages, probabilities, etc. Spark’s DataFrames were used to process SQL queries from the desktop app. The ETL was coded in Scala. Besides, Spark allowed filtering query results according to access rights granted to the system’s users.

Desktop application

Utilizing the power of WPF and C#, our team developed a comprehensive desktop application that facilitated an in-depth cross-analysis of nearly 30,000 attributes, creating detailed intersection matrices for diverse market analytics. By adopting the MVVM architectural pattern, we ensured a clean separation of concerns that simplified future app enhancements and maintenance. Users benefited from a variety of standard and ad hoc reports, such as Reach Ranking and Share of Time, with the flexibility to define custom parameters for personalized insights. The application's responsive design, crafted with XAML, allowed for the presentation of complex data through simple, interactive charts. Moreover, the system offered forecasting features; for instance, it could predict revenue based on expected reach and advertising budgets.

Results

At the project closing stage, the new system was able to process several queries up to 100 times faster than the outdated solution. With the valuable insights that the analysis of almost 30,000 attributes brought, the Client was able to carry out comprehensive advertising channel analysis for different markets.

Technologies and Tools

Apache Hadoop, Apache Hive, Apache Spark, Python (ETL), Scala (Spark, ETL), SQL (ETL), Amazon Web Services (Cloud storage), Microsoft Azure (Cloud storage), .NET, WPF, C#.

Apache Hadoop Case Studies Python Case Studies Apache Spark Case Studies AWS Case Studies Big Data Case Studies Cloud Case Studies Azure Case Studies .NET Case Studies Marketing & Advertising IT Case Studies Entertainment Case Studies Software Development Case Studies Migration Case Studies BI Case Studies Data Analytics Case Studies

Get in touch instantly

Call us Email us WhatsApp Live chat

More Case Studies

AcuFlex: First-of-Its-Kind Clinician‑Ready Movement Assessment Platform for S.A.E. Orthopedics

ScienceSoft delivered an MVP of AcuFlex, an innovative slant board-connected movement assessment platform, for S.A.E. Orthopedics. The solution enables real-time joint mobility tracking and historical analytics of clinician-guided stretching sessions.

Project details

HIPAA-Compliant Provider-to-Provider Telehealth Platform for Scalable Behavioral Care Delivery

ScienceSoft is building a Microsoft-based telehealth platform for a leading behavioral health provider. Using low-code tools and pre-built integrations, core workflows were transitioned in 4 months. Full rollout is projected to cut manual coordination by 40%.

Project details

You've viewed 2 of 265 success stories