The different models described above present several challenges, as detailed below.
Related to the database content:
Differences in the underlying health care systems,
Different mechanisms of data generation and collection as well as data availability,
Mapping of different drug and disease dictionaries (e.g., SNOMED, the International Classification of Diseases, 10th Revision (ICD-10), Read codes),
Free text medical notes in different languages,
Differences in the validation of study variables and access to source documents for validation,
Differences in the type and quality of information contained within each database.
Related to the organisation of the network:
Different ethical and governance requirements in each country regarding the processing of anonymised or pseudonymised healthcare data,
Issues linked to intellectual property and authorship,
Implementing quality control procedures at each partner and across the entire network,
Sustainability and funding mechanisms,
The tendency of networks to become highly topic-specific over time and isolated in ‘silos’.
Each model has strengths and weaknesses in facing the above challenges, as illustrated in Data Extraction and Management in Networks of Observational Health Care Databases for Scientific Research: A Comparison of EU-ADR, OMOP, Mini-Sentinel and MATRICE Strategies (eGEMs 2016;4(1):2). In particular, a central analysis or a CDM provides protection against variation in how individual analysts implement a protocol, as described in Quantifying how small variations in design elements affect risk in an incident cohort study in claims (Pharmacoepidemiol Drug Saf. 2020;29(1):84-93). Several networks, such as OHDSI, DARWIN EU®, Sentinel and VAC4EU, have made their code, common data models and analytics software publicly available. This is one of the potential solutions to minimise reproducibility issues in multi-database studies.
The speed at which studies can be run is important for meeting short regulatory timelines in circumstances where prompt decision-making is needed. Solutions therefore need to be further developed and introduced so that multi-database studies can be run with shorter timelines. Regardless of the model used, a critical factor in speeding up studies is completing in advance the tasks that are independent of any particular study. This includes all activities associated with governance, such as having prespecified agreements on data access, processes for protocol development and study management, and the identification and characterisation of a large set of databases. It also includes some activities related to the analysis, such as creating common definitions for frequently used variables and common analytical systems for the most typical and routine analyses. The latter is made easier by CDMs with standardised analytics and tools that can be re-used to support faster analysis, as demonstrated in DARWIN EU®, where analytical pipelines are being developed to fulfil the needs of EMA-commissioned studies based on pre-specified analysis plans (see Catalogue of Standard Analyses).
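As an illustration of how a common variable definition can be prepared once and then re-used across databases that share a CDM, the following minimal Python sketch applies a single definition, unchanged, to two databases. It is not taken from any of the networks mentioned above: the table layout, the class and function names, the database names and the codes are all hypothetical and serve only to show the principle.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class VariableDefinition:
    """A study-independent definition: a name plus the codes that identify it."""
    name: str
    codes: frozenset  # codes in the common vocabulary (placeholder values below)


@dataclass(frozen=True)
class Record:
    """One row of a simplified, hypothetical CDM event table."""
    person_id: int
    code: str
    event_date: date


def first_occurrence(definition, records):
    """Return {person_id: earliest date on which a matching code was recorded}."""
    first = {}
    for r in records:
        if r.code in definition.codes:
            if r.person_id not in first or r.event_date < first[r.person_id]:
                first[r.person_id] = r.event_date
    return first


def run_everywhere(definition, databases):
    """Apply the same definition, unchanged, to every database in the network."""
    return {db_name: first_occurrence(definition, records)
            for db_name, records in databases.items()}


# Defined once, ahead of any particular study (codes are placeholders).
DIABETES = VariableDefinition("type_2_diabetes", frozenset({"C001", "C002"}))

# Two hypothetical databases already converted to the same table layout.
databases = {
    "db_a": [Record(1, "C001", date(2020, 5, 1)),
             Record(1, "C002", date(2019, 3, 1)),
             Record(2, "X999", date(2021, 1, 1))],
    "db_b": [Record(7, "C002", date(2022, 7, 15))],
}

results = run_everywhere(DIABETES, databases)
```

Because the definition and the analysis function are written against the common layout rather than against any single database, neither has to be re-implemented by each analyst when a new study starts.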