Sample clinical data is often scarce, yet it is essential for running analytics during development and testing. Rajeev Gangal discusses how they tackled the need to generate simulated data for a project. Read on to know more.

To paraphrase a recent statement by The Economist, ‘Data is the New Oil.’ Unfortunately, companies often have limited access to it, and the real data available is frequently not enough to run typical analytics and data science algorithms and obtain KPI-based insights.

Simulation is a methodology that addresses this gap between data and actionable insights by generating data whose properties are similar to those of the original sample data.

Clinical data, especially lab data, is generally unavailable in the public domain, as it is critical intellectual property for the sponsor and sensitive data from a participating patient’s viewpoint. Sometimes, a very limited anonymized sample (50 subjects) may be available, and simulating data for more subjects from it helps the development and testing of Patient (Clinical) Data repositories. It may be argued that data can also be curated manually; however, that can only be done for small datasets, which may miss the statistical nature of actual data.
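To make the idea concrete, here is a minimal sketch of scaling a small sample up to many simulated subjects. It is an illustration under simple assumptions, not the method used in the project: each lab analyte is treated as an independent normal distribution whose mean and standard deviation are estimated from the seed sample. The analyte names (`hemoglobin_g_dl`, `glucose_mg_dl`), the helper functions, and the seed values themselves are hypothetical placeholders, not real patient data.

```python
import random
import statistics


def fit_normal(values):
    """Return the sample mean and standard deviation of one lab analyte."""
    return statistics.mean(values), statistics.stdev(values)


def simulate_subjects(seed_sample, n_subjects, rng):
    """Draw n_subjects synthetic records, one independent normal per analyte.

    Assumption: analytes are independent and roughly normal. Real lab data
    is often correlated and skewed, so this is only a starting point.
    """
    params = {name: fit_normal(vals) for name, vals in seed_sample.items()}
    return [
        {name: rng.gauss(mu, sigma) for name, (mu, sigma) in params.items()}
        for _ in range(n_subjects)
    ]


rng = random.Random(42)  # fixed seed so the run is reproducible

# Hypothetical stand-in for a 50-subject anonymized sample (not real data).
seed_sample = {
    "hemoglobin_g_dl": [rng.gauss(14.0, 1.2) for _ in range(50)],
    "glucose_mg_dl": [rng.gauss(95.0, 12.0) for _ in range(50)],
}

# Scale the 50-subject sample up to 1,000 simulated subjects.
simulated = simulate_subjects(seed_sample, 1000, rng)
```

The simulated records follow the per-analyte mean and spread of the seed sample, which is the kind of statistical resemblance that hand-curated datasets tend to miss. A more faithful simulation would also preserve correlations between analytes and respect physiological limits, which this sketch does not attempt.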