In our efforts to accelerate drug development and advance the use of AI in pharma, Saama’s product team has been training machine learning models on case report form (CRF) data and queries. Our goal is to help data managers and medical monitors identify data quality issues more efficiently and enable continuous improvement as the machine learning models get smarter.

The main benefit of an AI-powered clinical data review solution is the way it minimizes manual work. People in the clinical trial industry know all too well how tedious it is to navigate multiple data listing reports and dashboards to find data discrepancies. Manual review is labor-intensive, time-consuming, and prone to error, and even then it doesn’t fully cover patient safety data.

Getting Out of the Query Quagmire

Current data review processes generate queries that require further investigation, but only about three percent of those queries lead to changes in critical data points. A typical study involves 30–50 raw CRF datasets with 40–60 variables each, and subject data collected across multiple visits can add up to millions of data points, so finding the few discrepancies that matter is like looking for a needle in a haystack.
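To get a feel for that scale, here is a rough back-of-the-envelope sketch. The dataset and variable ranges come from the figures above; the subject and visit counts are purely hypothetical assumptions, and the calculation treats every variable as collected at every visit, so the result is an upper-bound estimate rather than a measurement from any real study.

```python
def estimate_data_points(datasets, variables_per_dataset, subjects, visits):
    """Rough upper bound: assumes every variable is collected at every visit."""
    return datasets * variables_per_dataset * subjects * visits


# Hypothetical study size, chosen only for illustration.
SUBJECTS = 500
VISITS = 10

# Low and high ends of the ranges quoted in the text.
low = estimate_data_points(30, 40, SUBJECTS, VISITS)
high = estimate_data_points(50, 60, SUBJECTS, VISITS)

print(f"{low:,} to {high:,} data points")  # 6,000,000 to 15,000,000 data points
```

Even under these modest assumptions, a reviewer would be facing millions of values.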

Machine learning models, trained to predict data discrepancies and manage queries using historical clinical data, can offer a significant process improvement that reduces time to database lock. Sponsors and CROs can use current clinical and scientific data from electronic data capture (EDC) systems and third-party sources (labs, biomarkers, PK/PD) to almost instantly identify discrepancies and generate query text related to: 

  • Baseline characteristics (e.g., inclusion/exclusion criteria violations)
  • Randomization/stratification (e.g., multiple enrollments for the same patient)
  • Compliance (e.g., missing study visits)
  • Disposition (e.g., discontinuation reasons)
  • Efficacy and safety (e.g., data outliers) 
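
To make a few of these categories concrete, here is a minimal sketch of what such checks might look like. It uses simple hand-written rules rather than trained machine learning models, so it is not how any particular product works; the function names, input shapes, and outlier threshold are all assumptions made for illustration.

```python
from collections import Counter
from statistics import median


def find_duplicate_enrollments(enrollments):
    """Randomization check: patients enrolled more than once.

    enrollments: list of (patient_id, subject_id) pairs (hypothetical shape).
    """
    counts = Counter(patient_id for patient_id, _ in enrollments)
    return sorted(pid for pid, n in counts.items() if n > 1)


def find_missing_visits(expected_visits, recorded_visits):
    """Compliance check: expected visits with no recorded data.

    recorded_visits: dict mapping subject_id to a set of visit names.
    """
    missing = {}
    for subject_id, visits in recorded_visits.items():
        gaps = [v for v in expected_visits if v not in visits]
        if gaps:
            missing[subject_id] = gaps
    return missing


def find_outliers(values, threshold=3.5):
    """Efficacy/safety check: flag values with a large modified z-score.

    Uses the median absolute deviation (MAD); the 3.5 cutoff is a
    commonly used default, assumed here rather than taken from the article.
    """
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return []  # no spread, so nothing can be ranked as an outlier
    return [v for v in values if 0.6745 * abs(v - med) / mad > threshold]
```

For example, `find_outliers([120, 118, 122, 121, 119, 250])` returns `[250]`, which a reviewer could then turn into a query for the site.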

Working with Machine Learning Models

Because machine learning systems learn continuously, it’s important to keep a human in the loop to provide the feedback used for model training and deployment. Users need an interactive interface for selecting models, viewing prediction details for data questions and filters, and accepting or overriding system recommendations.
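
As an illustration of that accept-or-override loop, here is a minimal sketch of how reviewer decisions might be captured and turned into labeled examples for a later training run. The class names, fields, and helper function are hypothetical and do not reflect any real product's data model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Prediction:
    record_id: str
    is_discrepancy: bool  # what the model recommends
    confidence: float


@dataclass
class Review:
    prediction: Prediction
    reviewer_label: Optional[bool] = None  # None means not yet reviewed

    @property
    def final_label(self) -> bool:
        # A reviewer's decision always wins over the model's recommendation.
        if self.reviewer_label is None:
            return self.prediction.is_discrepancy
        return self.reviewer_label

    @property
    def overridden(self) -> bool:
        return (
            self.reviewer_label is not None
            and self.reviewer_label != self.prediction.is_discrepancy
        )


def training_examples(reviews):
    """Keep only human-reviewed records as labeled feedback."""
    return [
        (r.prediction.record_id, r.final_label)
        for r in reviews
        if r.reviewer_label is not None
    ]
```

In this sketch, accepted and overridden recommendations both become training labels, while unreviewed predictions are left out so the model is never retrained on its own guesses.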

Saama puts it all together in a solution called Smart Data Query (SDQ). This exciting new solution delivers a unified experience for query tracking, data review, and workflow. Using bi-directional EDC integration for query actions, the solution provides machine learning-based standardization, analysis, and submission deliverables.

Manage Data More Effectively with SDQ