Managed IT Support for Apache NiFi
The Customer is an American biotechnology corporation with 10,000+ employees.
The Customer had been running a big data solution for laboratory data, supported by an in-house team. However, the team lacked the resources to support Apache NiFi, a data processing tool in the Customer’s big data ecosystem. When the Customer turned to ScienceSoft, they had a large backlog of Apache NiFi configuration, support, and enhancement tasks, and the whole big data ecosystem was suffering from delays in data transfer and processing caused by bugs in the code.
To solve the Customer’s challenge, ScienceSoft dedicated a managed IT support team consisting of a Project Manager, a Senior Open Source Engineer, and a Data Engineer.
ScienceSoft started by analyzing the Customer’s IT infrastructure in general and the Apache NiFi data pipelines in particular. During discovery, our team found that IT infrastructure monitoring was missing. To recommend a fitting monitoring tool to the Customer, we analyzed the strengths and weaknesses of several options. Based on the analysis, our team prepared a comprehensive monitoring solution consisting of Netdata, Prometheus, and Grafana, and tuned Apache NiFi so that it could transfer data to the monitoring solution.
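As an illustration only, the kind of check such a monitoring setup enables can be sketched in Python. The sketch parses metrics in the Prometheus text exposition format and lists the series that exceed a threshold. The metric name, labels, sample values, and threshold are assumptions made for the example and are not taken from the Customer’s environment.

```python
from typing import Dict, List

# Hypothetical scrape output; the metric name and labels are assumptions.
SAMPLE_SCRAPE = """\
# HELP nifi_amount_items_queued Items queued in a connection
# TYPE nifi_amount_items_queued gauge
nifi_amount_items_queued{component_name="ingest"} 1200
nifi_amount_items_queued{component_name="publish"} 15
"""


def parse_prometheus_text(text: str) -> Dict[str, float]:
    """Map each series (name plus labels) to its sample value."""
    samples: Dict[str, float] = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and HELP/TYPE comment lines.
        if not line or line.startswith("#"):
            continue
        if "}" in line:
            # Labels may contain spaces, so split after the closing brace.
            end = line.rindex("}") + 1
            series, rest = line[:end], line[end:]
        else:
            series, _, rest = line.partition(" ")
        fields = rest.split()
        if fields:
            # The first field is the value; an optional timestamp is ignored.
            samples[series] = float(fields[0])
    return samples


def series_over_threshold(
    samples: Dict[str, float], metric_name: str, threshold: float
) -> List[str]:
    """Return the series of one metric whose value exceeds the threshold."""
    return sorted(
        series
        for series, value in samples.items()
        if (series == metric_name or series.startswith(metric_name + "{"))
        and value > threshold
    )


if __name__ == "__main__":
    parsed = parse_prometheus_text(SAMPLE_SCRAPE)
    print(series_over_threshold(parsed, "nifi_amount_items_queued", 1000))
```

In a real deployment, this kind of rule would more likely live in Prometheus alerting rules or Grafana alerts than in a standalone script.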
To deal with the backlog of Apache NiFi configuration, support, and enhancement tasks, ScienceSoft’s team established effective collaboration with the Customer’s in-house team. The in-house team allocated the majority of tasks across sprints, while ScienceSoft’s team took part in prioritization and could add high-priority tasks to a sprint. Every two weeks, we met with the Customer to present the results of our work within the sprint.
To ensure fast delivery, ScienceSoft’s team introduced CI/CD pipelines for the NiFi system using the NiFi Registry API.
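A minimal sketch of one step such a pipeline might include is shown below: asking NiFi Registry for the latest version of a versioned flow and deciding whether a deployment is needed. It assumes the Registry REST API is served under /nifi-registry-api without authentication. The host, bucket ID, and flow ID are placeholders, and the helper names are hypothetical. This is not the pipeline the team built.

```python
import json
from typing import Any, Dict, Optional
from urllib.request import urlopen


def latest_version_url(base_url: str, bucket_id: str, flow_id: str) -> str:
    """Build the Registry URL for the latest snapshot of a versioned flow."""
    return (
        f"{base_url.rstrip('/')}/nifi-registry-api"
        f"/buckets/{bucket_id}/flows/{flow_id}/versions/latest"
    )


def fetch_json(url: str) -> Dict[str, Any]:
    """GET a URL and decode the JSON body (no auth; an assumption)."""
    with urlopen(url, timeout=10) as response:
        return json.load(response)


def needs_deploy(snapshot: Dict[str, Any], deployed_version: Optional[int]) -> bool:
    """Compare the Registry's latest version with the deployed one."""
    latest = snapshot["snapshotMetadata"]["version"]
    return deployed_version is None or latest > deployed_version


if __name__ == "__main__":
    # Placeholder values; replace with a real Registry host and IDs.
    url = latest_version_url("http://registry.example:18080", "BUCKET_ID", "FLOW_ID")
    snapshot = fetch_json(url)
    print(needs_deploy(snapshot, deployed_version=2))
```

Keeping the URL building and the version comparison separate from the network call lets a CI job unit-test the decision logic without a running Registry.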
Overall, about 50% of the effort went to development and code enhancement, and the other 50% went to support tasks. Now that the development tasks are finished, the Customer wants to shift entirely to support and maintenance.
After observing how the Customer uses Apache NiFi, ScienceSoft pointed out an option better suited to their needs: Apache Airflow. The Customer accepted this recommendation and added the transition from Apache NiFi to Apache Airflow to the backlog.
At the project closing stage, the stabilized big data solution was able to process some queries up to 10 times faster than before. The system became more stable, and the share of successfully processed data increased from 50% to 99% thanks to the enhancements implemented by our data engineers.
Technologies and Tools
Communication and collaboration tools: Slack, Google Meet.
Big data: Apache NiFi, Apache Kafka, Apache ZooKeeper.
Monitoring tools: Netdata, Prometheus, Grafana.
Ticketing system: Jira.