Innovative Pathways: How AI Reshapes Market Access
This is the first of a series of articles dedicated to current, exciting developments in Artificial Intelligence (AI) and their impact on Market Access. In this article, I will focus on the emergence of synthetic data, an approach to generating data that has the potential to revolutionise evidence generation practices, especially for orphan drugs.
The emergence of synthetic data
Introduction
In recent years, AI has attracted considerable attention in the health sector, as it continues to unlock new possibilities in personalised medicine and disease prevention. The market for AI-driven diagnostic, prognostic and management programmes is booming; the AI diagnostics market alone is projected to grow from $910.69m in 2022 to $11.3b by 2031, a staggering compound annual growth rate (CAGR) of 32.30% (Straits Research, 2022). In this rapidly evolving field, a new AI application has emerged: synthetic data generation. Put simply, synthetic data are AI-generated data that resemble a reference real-world dataset with high accuracy. Importantly, some statistical properties of the original dataset, such as correlations or the proportions of categorical variables, are preserved, meaning that the synthetic data “behave” in the same way the reference real-world data would (Kokosi & Harron, 2022). As such, synthetic data offer a range of benefits to clinical development but also carry new risks. In the following sections, I will examine the environment in which synthetic data evolved and discuss the benefits and risks that come with their use.
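To make the idea of "preserved statistical properties" more concrete, here is a deliberately simple sketch. It is a basic statistical stand-in, not one of the AI models used in practice. It records the category proportions of a reference dataset, fits a multivariate normal distribution to the continuous variables within each category, and then draws new, artificial records from those fitted distributions. The reference data (sex, age and systolic blood pressure for 2,000 patients) are simulated and entirely hypothetical, as is the helper name `fit_and_sample`.

```python
import numpy as np

def fit_and_sample(categories, values, n_synth, rng):
    """Hypothetical toy synthesiser (illustrative only): keep the category
    proportions and, within each category, the mean vector and covariance
    matrix of the continuous variables, then draw brand-new records."""
    labels, counts = np.unique(categories, return_counts=True)
    proportions = counts / counts.sum()
    # Assign each synthetic record a category using the observed proportions.
    synth_cat = rng.choice(labels, size=n_synth, p=proportions)
    synth_vals = np.empty((n_synth, values.shape[1]))
    for label in labels:
        group = values[categories == label]
        mean = group.mean(axis=0)
        cov = np.cov(group, rowvar=False)  # carries over variances and correlations
        mask = synth_cat == label
        synth_vals[mask] = rng.multivariate_normal(mean, cov, size=int(mask.sum()))
    return synth_cat, synth_vals

# Hypothetical reference data for 2,000 simulated patients.
rng = np.random.default_rng(42)
n_real = 2000
sex = rng.choice(np.array(["F", "M"]), size=n_real, p=[0.6, 0.4])
age = rng.normal(60, 10, size=n_real)
systolic_bp = 100 + 0.6 * age + rng.normal(0, 8, size=n_real)
real_vals = np.column_stack([age, systolic_bp])

# Generate a synthetic dataset ten times larger than the reference data.
synth_sex, synth_vals = fit_and_sample(sex, real_vals, n_synth=20000, rng=rng)

# Compare a categorical proportion and a correlation between the two datasets.
prop_real = (sex == "F").mean()
prop_synth = (synth_sex == "F").mean()
corr_real = np.corrcoef(real_vals[:, 0], real_vals[:, 1])[0, 1]
corr_synth = np.corrcoef(synth_vals[:, 0], synth_vals[:, 1])[0, 1]
print(f"share female:     real {prop_real:.3f}  synthetic {prop_synth:.3f}")
print(f"age/BP corr.:     real {corr_real:.3f}  synthetic {corr_synth:.3f}")
```

The synthetic records are drawn from the fitted distributions rather than copied, yet the share of women and the correlation between age and blood pressure closely track the reference data. Such a simple model only preserves the properties it explicitly encodes; anything it does not model, such as non-linear relationships, is lost.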
Recruitment: The first hurdle
Clinical research aims to shed light on the unknown by generating high-quality data that allow meaningful, accurate inferences about a wider population. However, this requires a diverse study population, which can be challenging to recruit. Strikingly, patient recruitment accounts for 32% of overall clinical trial costs, making it the number one cost driver of clinical trials (Deloitte, 2020). Many aspects of trial recruitment contribute to these costs, including screening, patient outreach and diagnoses to determine trial eligibility, as well as the staff and facilities needed to run these processes (Dattani et al, 2013). Conventional data collection, most notably through randomised controlled trials, is therefore often expensive and time-consuming: between 2016 and 2019, recruitment for industry-sponsored Phase 3 trials lasted 13 months on average (Brøgger-Mikkelsen, 2022). Moreover, recruitment targets are often not met within the defined time frame, leading to delays of up to 6 months, with losses across the pharmaceutical industry of US$600,000 to US$8 million per day of delay (Chaudhari et al, 2020). As a result, there is a trade-off between cost and quality, which can lead to concessions such as conducting a trial with a small or skewed study population. A further factor contributing to the challenge of trial recruitment is the fear of being allocated to the placebo arm of a study, which is among the top reasons why patients do not take part in clinical trials (Deloitte, 2020).
It is noteworthy that these recruitment challenges are exacerbated in studies of conditions with small patient populations, such as orphan diseases. Firstly, the pool of potential participants is restricted, making it difficult to enrol large numbers of patients and reducing the chances of recruiting a diverse study population. Secondly, many rare diseases lack validated diagnostic tests, and few trial centres are equipped to make a diagnosis and enrol participants (Clinical Trials Arena, 2018). Finally, due to a lack of treatment options, patients with rare diseases are often especially keen to be in the treatment arm of a study. Recruitment for rare disease trials can therefore be particularly challenging.
Synthetic data can ease these recruitment burdens by helping to deliver the required sample size, provided that suitable data can be obtained from health records to train the AI programme. As most healthcare systems are undergoing a technological transition in which data are increasingly stored electronically, collecting inputs to train an AI system is becoming easier; data from across the globe can also be made available, removing geographical barriers. One of the most promising applications is a hybrid trial model that employs a synthetic control arm, meaning that the entire placebo group is artificially generated by AI. Consequently, in a trial that would otherwise allocate patients equally to both arms, the number of patients to be recruited is roughly halved, as patients are needed only for the treatment arm (Deloitte, 2020). This is a particularly attractive opportunity for studies on rare diseases, given the considerable recruitment challenges discussed above. Some argue that patients may be more inclined to participate in hybrid clinical trials because they would be certain to receive active treatment, which would address the fear of placebo, a key barrier to trial recruitment (Goldsack, 2019). However, if patients know which group they are assigned to, the study is effectively unblinded, making it vulnerable to bias. In cases where a study population is matched with historical data, a synthetic control arm may be appropriate, for example when limited historical data are available. Therefore, while a synthetic control arm can be advantageous for some clinical research, it is important to reflect on the trial design and its implications for bias when planning a study. Overall, synthetic data offer significant benefits to trial recruitment, including cost and time savings, that could support the development of orphan drugs in particular.
Data protection
Another challenge in conventional study practices is data protection and privacy. Once patient data have been collected, considerable diligence is required to anonymise and securely store information to ensure continued compliance with safety and privacy regulations, such as the EU General Data Protection Regulation (GDPR). While the protection of trial participants and their data is of utmost importance, the interplay of different rules applying to a study can present a significant challenge to researchers (Negrouk et al, 2018).
Synthetic datasets may bypass this hurdle, as they do not contain data from actual people (Kokosi & Harron, 2022). In fact, it can be argued that synthetic data offer greater privacy than ‘normal’ data, as they are produced by transforming models of the original population into new, artificial ones, thereby rendering the original data fully unrecognisable (Zerdick, 2021). However, there is an ongoing debate about how synthetic data should be understood in the context of data protection. It is unclear whether synthetic data should be considered personal data, what threat scenarios would look like, and whether the potential privacy benefits are meaningful (Zerdick, 2021). Many questions therefore remain, and it is still uncertain whether synthetic data offer any real benefits in terms of data protection.
Rubbish in, rubbish out
The key characteristic of synthetic data is that they can preserve some of the statistical properties of the original dataset, which allows a dataset to be enlarged while maintaining its characteristics (Kokosi & Harron, 2022). The benefit is that biases or imbalances in the data can be corrected by extending the dataset to a size and diversity at which more accurate deductions and predictions can be made (Kokosi & Harron, 2022; Van den Goorbergh et al, 2022).
However, there are two issues. Firstly, correcting imbalances is a delicate task and can harm the model if not handled correctly: inappropriate corrections can result in poorly calibrated risk predictions or incorrect absolute risks, leading to results that misrepresent the course of a disease, the effect of a therapy or the differences between a placebo and a treatment arm (Kokosi & Harron, 2022; Van den Goorbergh et al, 2022). Secondly, like any AI-generated output, synthetic data are only as good as the data the AI was trained on. If the reference data are unknowingly biased, the synthetic data will be equally flawed, and their use therefore highly problematic. AI bias particularly affects marginalised communities and ethnic minorities, as they are under-represented in AI training and testing datasets, a phenomenon also known as “health data poverty” (RHC, 2022). Caution and careful examination of training datasets are therefore required to ensure the safe, effective use of AI and the generation of meaningful study outputs. However, while some biases are easy to recognise, others may be less evident: you don’t know what you don’t know. For example, when investigating a rare or poorly documented disease, it may be unclear which statistical characteristics of a dataset are accurate and which should be ascribed to bias. It is important to highlight that AI technologies are intended for use at scale, for example in nationwide health screening (RHC, 2022). Failure to detect and rectify bias therefore has severe implications and could cause harm at a population level that may only be recognised after months or years, if at all (RHC, 2022). This is one of the most significant risks attached to AI in healthcare generally, if not the most significant, and it highlights the need for rigorous quality and safety regulation around the use of AI, which is currently lacking in most countries.
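The calibration problem described by Van den Goorbergh et al (2022) can be illustrated with a small simulation. The sketch below uses simulated data and hypothetical helper names (`fit_logistic`, `predict`, `random_oversample`). It fits a logistic regression risk model twice: once on the original data, in which the outcome occurs in roughly 10% of patients, and once after random oversampling, i.e. duplicating records of the rarer outcome until both outcome classes are equally frequent. Random oversampling is used here as a simple stand-in for an imbalance correction; it is not itself a synthetic data generator.

```python
import numpy as np

def fit_logistic(X, y, n_iter=25):
    """Unregularised logistic regression fitted by Newton-Raphson."""
    X1 = np.column_stack([np.ones(len(X)), X])  # add an intercept column
    beta = np.zeros(X1.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X1 @ beta))
        grad = X1.T @ (y - p)                        # score vector
        hess = X1.T @ (X1 * (p * (1.0 - p))[:, None])  # observed information
        beta += np.linalg.solve(hess, grad)
    return beta

def predict(beta, X):
    """Predicted risk for each row of X."""
    X1 = np.column_stack([np.ones(len(X)), X])
    return 1.0 / (1.0 + np.exp(-X1 @ beta))

def random_oversample(X, y, rng):
    """Duplicate randomly chosen event records until both classes are equal in size."""
    events = np.flatnonzero(y == 1)
    non_events = np.flatnonzero(y == 0)
    extra = rng.choice(events, size=len(non_events) - len(events), replace=True)
    idx = np.concatenate([non_events, events, extra])
    return X[idx], y[idx]

# Simulated cohort: one risk factor, event rate of roughly 10%.
rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)
true_risk = 1.0 / (1.0 + np.exp(-(-2.5 + 1.0 * x)))
y = (rng.random(n) < true_risk).astype(float)
X = x[:, None]

beta_raw = fit_logistic(X, y)
X_bal, y_bal = random_oversample(X, y, rng)
beta_bal = fit_logistic(X_bal, y_bal)

# Average predicted risk in the original patients versus the observed event rate.
observed = y.mean()
mean_raw = predict(beta_raw, X).mean()
mean_bal = predict(beta_bal, X).mean()
print(f"observed event rate:              {observed:.3f}")
print(f"mean predicted risk (original):   {mean_raw:.3f}")
print(f"mean predicted risk (rebalanced): {mean_bal:.3f}")
```

For a logistic regression with an intercept, the average predicted risk on the training data equals the observed event rate, so the model fitted on the original data is well calibrated on average. The rebalanced model is fitted to data in which half of the records are events, and it therefore substantially overestimates the absolute risk for the real patients. This is the kind of miscalibration that can distort estimates of disease course or treatment effect.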
Conclusion
This article has offered a glimpse into the emergence of synthetic data and their benefits and risks. While the use of synthetic data can tackle key challenges of trial recruitment, including small population sizes and costs, it also poses significant new risks. Accurate training datasets are of utmost importance, as is a deep understanding of the data, their potential biases and how to correct these appropriately. The implications for data privacy are yet to be fully understood, which warrants caution when employing synthetic data.
Next Steps
If this article has sparked your interest, stay tuned for the second one in this series. The next article will discuss synthetic data from a regulatory and payer perspective, looking at how synthetic data have been used in the past to support pricing decisions, and critically examining how synthetic data may or may not shape regulatory processes.
Get in touch with Decisive Consulting Ltd to explore how we can help navigate your journey through Market Access.
Written by Sophia Naegeli
Bibliography
Brøgger-Mikkelsen M. (2022). “Changes in key recruitment performance metrics from 2008-2019 in industry-sponsored phase III clinical trials registered at ClinicalTrials.gov”, PLoS One, 17(7): e0271819. doi: 10.1371/journal.pone.0271819.
Chaudhari N, Ravi R, Gogtay N J, Thatte U M (2020). “Recruitment and retention of the participants in clinical trials: Challenges and solutions”, Perspect Clin Res, 11(2):64-69.
Clinical Trials Arena (2018). Patient Recruitment Strategies for Rare Diseases – Part 1. Link: https://www.clinicaltrialsarena.com/comment/patient-recruitment-solutions-for-rare-diseases-6068129-2/?cf-view [Last accessed: 22.11.23].
Dattani N, Hardelid P, Davey J, et al. (2013). “Accessing electronic administrative health data for research takes time”, Arch Dis Child, 98:391–2. doi:10.1136/archdischild-2013-303730.
Deloitte (2020). Intelligent clinical trials. Transforming through AI-enabled engagement. Link: https://www2.deloitte.com/content/dam/insights/us/articles/22934_intelligent-clinical-trials/DI_Intelligent-clinical-trials.pdf [Last accessed: 31.10.23].
Goldsack J (2019). Synthetic control arms can save time and money in clinical trials. STAT (statnews.com). [Last accessed: 1.11.23].
Kokosi T and Harron K (2022). “Synthetic data in medical research”, BMJMED, 1:e000167. doi:10.1136/bmjmed-2022-000167.
Negrouk A, Lacombe D, Meunier F. (2018). “Diverging EU health regulations: the urgent need for coordination and convergence”, J Cancer Policy, 17:24–9. doi: 10.1016/j.jcpo.2017.05.007.
Regulatory Horizons Council (RHC) (2022). The Regulation of Artificial Intelligence as a Medical Device. Link: https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/1120503/RHC_regulation_of_AI_as_a_Medical_Device_report.pdf [Last accessed: 22.11.23]
Straits Research (2022). Artificial Intelligence (AI) in Diagnostics Market. Link: https://straitsresearch.com/report/artificial-intelligence-in-diagnostics-market [Last accessed: 3.7.23].
Van den Goorbergh R, van Smeden M, Timmerman D, et al. (2022). “The harm of class imbalance corrections for risk prediction models: illustration and simulation using logistic regression”, arXiv:2202.09101. Link: https://arxiv.org/abs/2202.09101 [Last accessed: 22.11.23].
Zerdick T (2021). Is the future of privacy synthetic? European Data Protection Supervisor (europa.eu). [Last accessed: 1.11.23].