Data Management Platform to Securely Consolidate Data across 15 Subsidiaries of a Biotech Company

Industry: Science
Technologies: AWS, Other

About our Customer

The Customer is a biopharmaceutical company that combines natural science and cutting-edge technologies to develop alternative treatments for mental disorders. It leverages data-driven drug development and digital therapeutics to innovate and personalize mental healthcare globally.

Decentralized Scientific Data Was Holding Back Research

The Customer’s company comprises 15 subsidiaries that operate across the globe. Since its start in 2018, the Customer has accumulated a vast body of scientific data, including research findings, preclinical reports, trial protocols, and clinical trial reports. However, because the data was stored and managed in a decentralized way, the subsidiaries’ clinical scientists and managers found it hard to leverage its full value.

Defining the Scope of a Data Consolidation Solution

Initially, the Customer turned to ScienceSoft to build a solution that would consolidate the research data from globally dispersed sources into a data warehouse (DWH) with analytics capabilities. Relying on 17 years of experience in data warehousing and first-hand expertise from 20+ healthcare data analytics projects, ScienceSoft’s team analyzed the Customer’s goals and the overall maturity of its IT ecosystem. The experts concluded that most of the subsidiaries’ current systems were too technically and organizationally isolated to be integrated with a DWH within the required deadline.

With that in mind, the Customer decided to start by implementing a data management platform that would centralize the subsidiaries’ data and enable its easy navigation and presentation. Since all of the Customer’s subsidiaries are legally independent entities, strictly defined user access rights were also highly important to the Customer.

To meet the Customer’s priorities and tight deadlines, ScienceSoft suggested dividing the project into two steps:

  • Developing an MVP that would enable the most critical capabilities: centralized data storage, role-based access, and ML-powered data search.
  • Investigating the subsidiaries’ systems and data to prepare the solution for evolution into a full-scale DWH that would enable the ingestion of unstructured data types, ML-based analytics, and direct integrations with the subsidiaries’ data sources.

An AWS Data Management Platform as a Foundation for a Data Warehouse

ScienceSoft’s team delivered the MVP design in 6 weeks. The designed solution is a data management platform that stores each subsidiary’s data in a dedicated folder, enables keyword-based search across all folders, and allows data access with restrictions defined by the data owner.

Data ingestion

The proposed MVP can ingest three data formats: DOCX, PDF, and CSV. Users manually prepare and upload files to the cloud data storage (Amazon S3) via the Secure File Transfer Protocol (SFTP). Each of the 15 subsidiaries has a dedicated folder in the centralized repository. The storage also supports data versioning, enabling users to upload newer versions of files while still having access to their previous iterations.
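
The folder-per-subsidiary layout and the three accepted formats can be illustrated with a minimal sketch. It is not taken from the delivered design: the `subsidiary/filename` key layout, the folder name `subsidiary-03`, and the function `build_object_key` are assumptions made for illustration only.

```python
from pathlib import PurePosixPath

# Formats the MVP design accepts, as listed in the text above.
ALLOWED_EXTENSIONS = {".docx", ".pdf", ".csv"}


def build_object_key(subsidiary: str, filename: str) -> str:
    """Return a storage key such as 'subsidiary-03/report.pdf'.

    The 'subsidiary/filename' layout is a hypothetical convention.
    """
    if not subsidiary or "/" in subsidiary:
        raise ValueError(f"invalid subsidiary folder: {subsidiary!r}")
    # Drop any directory components so uploads stay inside their own folder.
    name = PurePosixPath(filename).name
    extension = PurePosixPath(name).suffix.lower()
    if extension not in ALLOWED_EXTENSIONS:
        raise ValueError(f"unsupported format: {extension or 'none'}")
    return f"{subsidiary}/{name}"
```

For example, `build_object_key("subsidiary-03", "trial_protocol.pdf")` yields `subsidiary-03/trial_protocol.pdf`. When versioning is enabled on an S3 bucket, uploading a file again under the same key keeps the earlier versions retrievable, which fits the versioning behavior described above.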

Data search

Users can perform keyword-based searches with filters across all folders in the storage. The system runs an AWS Lambda function to find documents that contain the given keyword. Amazon QuickSight provides a dashboard table that features the keyword-containing files and links to them.
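
A keyword search of this kind could look roughly like the sketch below. It only illustrates the matching and filtering logic: the in-memory `SAMPLE_DOCUMENTS` corpus stands in for text extracted from the stored files, and the event fields `keyword` and `folder` are assumptions, not part of the documented design. Reading objects from Amazon S3 and extracting text from DOCX or PDF files are left out.

```python
from typing import Dict, List, Optional

# Hypothetical corpus: storage key -> text already extracted from the file.
SAMPLE_DOCUMENTS: Dict[str, str] = {
    "subsidiary-01/preclinical_report.pdf": "Preclinical findings on compound dosing in animal models.",
    "subsidiary-02/trial_protocol.docx": "Phase 1 trial protocol: dosing schedule and endpoints.",
    "subsidiary-02/site_list.csv": "site_id,country\n101,DE",
}


def find_matching_keys(
    documents: Dict[str, str], keyword: str, folder: Optional[str] = None
) -> List[str]:
    """Return sorted keys of documents whose text contains the keyword.

    Matching is case-insensitive; 'folder' optionally limits the search
    to one subsidiary's folder.
    """
    needle = keyword.strip().lower()
    if not needle:
        return []
    prefix = f"{folder}/" if folder else ""
    return sorted(
        key
        for key, text in documents.items()
        if key.startswith(prefix) and needle in text.lower()
    )


def lambda_handler(event, context):
    """Lambda-style entry point; the event shape is an assumption."""
    keyword = event.get("keyword", "")
    matches = find_matching_keys(SAMPLE_DOCUMENTS, keyword, event.get("folder"))
    return {"keyword": keyword, "matches": matches}
```

The returned list of keys is the kind of result a dashboard table could then present as links to the matching files.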

Security

Data access is restricted by row-level security policies defining which records are revealed to any user or user group. Each folder’s owner determines the access rights.
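
The idea behind row-level restrictions can be sketched in a few lines. This is a generic illustration under stated assumptions, not the platform's actual policy format: the group names, the `ACCESS_RULES` mapping, and the `folder` field on each row are all hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical rules set by folder owners: user group -> folders it may see.
ACCESS_RULES: Dict[str, Set[str]] = {
    "subsidiary-01-scientists": {"subsidiary-01"},
    "group-managers": {"subsidiary-01", "subsidiary-02"},
}


def visible_rows(rows: List[dict], user_groups: List[str]) -> List[dict]:
    """Keep only rows whose folder is granted to one of the user's groups.

    Groups without a rule grant nothing, so access is denied by default.
    """
    allowed: Set[str] = set()
    for group in user_groups:
        allowed |= ACCESS_RULES.get(group, set())
    return [row for row in rows if row["folder"] in allowed]
```

Denying by default means that a user from one legally independent subsidiary sees another subsidiary's records only after that folder's owner has explicitly granted access.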

The stored data is encrypted at rest and in transit. Encryption is also applied to the administrative, system, and user action logs.

System availability and fault tolerance

The system has recovery and backup mechanisms to enable fast recovery from an unplanned event or primary data failure. The solution can recover from any single point of failure automatically.

Evolution capabilities

Data platform layers are built with regionally distributed processing and encryption to support future GDPR and HIPAA compliance.

ScienceSoft also provided the Customer with an architecture design to upgrade the data management platform to a data warehouse in the future. The proposed DWH architecture enables support for unstructured data (e.g., images), data processing (e.g., slicing and dicing or data segmentation), advanced data visualization techniques (e.g., three-dimensional graphs), and ML-powered analytics capabilities (e.g., forecasting, anomaly alerting).

Cross-Subsidiary Data Availability with Granular Security

Within six weeks, the Customer received a comprehensive MVP design for a data management platform. Once implemented, it will allow scientists and managers from geographically dispersed subsidiary companies to get immediate access to valuable clinical and non-clinical data, which will streamline new drug development and improve the treatment of mental disorders.

ScienceSoft provided a detailed roadmap for MVP implementation, complete with guidelines on how to set up and manage the required tools and systems.

The delivered solution design enables data access control across 15 legally independent entities and is ready to be expanded with advanced data processing, visualization, and analytics capabilities.

Technologies and Tools

SFTP, Amazon S3, AWS Lambda Functions, Amazon QuickSight
