Five models of studies are presented, classified according to specific choices in the steps needed to execute a study: protocol development and agreement (whether separate or common); where the data are extracted and analysed (locally or centrally); how the data are extracted and analysed (using individual or common programs); and use of a CDM and which type (study specific or general). The key characteristics of the steps needed to execute each study model are presented in the following Figure and explained in this section.
In the traditional model for combining data from multiple data sources, data extraction and analysis are performed independently at each centre based on separate protocols. This is usually followed by meta-analysis of the different estimates obtained (see Chapter 9 and Annex 1).
This type of model, when viewed as a prospective decision to combine results from multiple data sources on the same topic, may be considered as a baseline situation which a research network will try to improve. Moreover, since meta-analysis facilitates the evaluation of heterogeneity of results across different independent studies, it should be used retrospectively regardless of the model of studies used. If all the data sources can be accessed, explaining such variation should also be attempted.
This is consistent with the recommendations from Multi-centre, multi-database studies with common protocols: lessons learnt from the IMI PROTECT project (Pharmacoepidemiol Drug Saf. 2016;25(S1):156-165), stating that investigating heterogeneity may provide useful information on the issue under investigation. This approach can ultimately increase consistency in findings from observational drug effect studies or reveal causes of differential drug effects.
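As an illustration of how site-level estimates can be combined and their heterogeneity quantified, the following is a minimal sketch of inverse-variance fixed-effect pooling with Cochran's Q and the I² statistic. It is not taken from any of the cited studies: the function name `pool_fixed_effect` and the three log hazard ratios and standard errors are hypothetical, and a random-effects model may be more appropriate when heterogeneity is substantial.

```python
import math


def pool_fixed_effect(log_estimates, standard_errors):
    """Inverse-variance fixed-effect pooling with Cochran's Q and I^2 (in %)."""
    # Each source is weighted by the inverse of its variance.
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, log_estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, log_estimates))
    df = len(log_estimates) - 1
    # I^2: share of total variation attributed to heterogeneity, floored at 0.
    i_squared = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return {"pooled": pooled, "se": pooled_se, "q": q, "df": df,
            "i_squared": i_squared}


# Hypothetical log hazard ratios and standard errors from three data sources.
site_log_hr = [math.log(1.20), math.log(1.35), math.log(0.95)]
site_se = [0.10, 0.15, 0.12]

result = pool_fixed_effect(site_log_hr, site_se)
print("Pooled HR:", round(math.exp(result["pooled"]), 3))
print("Cochran's Q:", round(result["q"], 3), "on", result["df"], "df")
print("I^2 (%):", round(result["i_squared"], 1))
```

A high I² would prompt the kind of investigation described above, for example by comparing population characteristics or coding practices across the data sources.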
In this model, data are extracted and analysed locally, with site-specific programs that are developed by each centre, on the basis of a common protocol agreed by study partners that defines and standardises exposures, outcomes and covariates, analytical programmes and reporting formats. The results of each analysis, either at the subject level or in an aggregated format depending on the governance of the network, are shared and can be pooled together using meta-analysis.
This approach allows the assessment of database or population characteristics and their impact on estimates, while reducing the variability of results caused by differences in design. An example of a research network that uses the common protocol approach is PROTECT (as described in Improving Consistency and Understanding of Discrepancies of Findings from Pharmacoepidemiological Studies: the IMI PROTECT Project, Pharmacoepidemiol Drug Saf. 2016;25(S1):1-165), which has implemented this approach in collaboration with CNODES (Major bleeding in users of direct oral anticoagulants in atrial fibrillation: A pooled analysis of results from multiple population-based cohort studies, Pharmacoepidemiol Drug Saf. 2021 Oct;30(10):1339-52).
This approach requires very detailed common protocols and data specifications that reduce variability in interpretation by researchers.
In this approach, a common protocol is agreed by the study partners. Data intended to be used for the study are locally extracted with site-specific programs, transferred without analysis or conversion to a CDM, and pooled and analysed at the central partner receiving them. Data received at the central partner can be reformatted to a common structure to facilitate the analysis.
This approach is feasible when databases are very similar in structure and content, as is the case for some Nordic registries or the Italian regional databases. Examples of such models are Risks and benefits of psychotropic medication in pregnancy: cohort studies based on UK electronic primary care health records (Health Technol Assess. 2016;20(23):1–176) and All-cause mortality and antipsychotic use among elderly persons with high baseline cardiovascular and cerebrovascular risk: a multi-center retrospective cohort study in Italy (Expert Opin. Drug Metab. Toxicol. 2019;15(2):179-88).
The central analysis allows for assessment of pooled data, adjusting for covariates at the individual patient level and removing an additional source of variability linked to the statistical programming and analysis. However, this model has become more difficult to implement, especially in Europe, due to the stronger privacy requirements that apply when sharing patient-level data.
In this approach, a common protocol is agreed by the study partners. Data intended to be used for the study are locally extracted and transformed into an agreed CDM; data in the CDM are then processed locally in all the sites with one common program. The output of the common program is transferred to a specific partner. The output to be shared may be an analytical dataset or study estimates, depending on the governance of the network.
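The workflow described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the code of any network: the CDM fields (`person_id`, `exposed`, `event`), the two site-specific conversion functions and the sample records are all hypothetical. Each site converts its own records into the agreed structure, runs the same common program, and shares only the program's output.

```python
from collections import Counter


def site_a_to_cdm(rows):
    """Site-specific conversion: hypothetical site A stores (id, drug_flag, outcome_flag)."""
    return [{"person_id": r[0], "exposed": bool(r[1]), "event": bool(r[2])}
            for r in rows]


def site_b_to_cdm(rows):
    """Site-specific conversion: hypothetical site B uses local field names and Y/N flags."""
    return [{"person_id": r["pid"], "exposed": r["rx"] == "Y",
             "event": r["dx"] == "Y"} for r in rows]


def common_program(cdm_rows):
    """Common analysis run unchanged at every site; returns aggregated counts only."""
    counts = Counter((row["exposed"], row["event"]) for row in cdm_rows)
    return {
        "exposed_events": counts[(True, True)],
        "exposed_total": counts[(True, True)] + counts[(True, False)],
        "unexposed_events": counts[(False, True)],
        "unexposed_total": counts[(False, True)] + counts[(False, False)],
    }


# Made-up local records in each site's own format.
site_a_raw = [(1, 1, 1), (2, 1, 0), (3, 0, 0), (4, 0, 1)]
site_b_raw = [{"pid": "b1", "rx": "Y", "dx": "N"},
              {"pid": "b2", "rx": "N", "dx": "N"},
              {"pid": "b3", "rx": "Y", "dx": "Y"}]

# Local steps: transform to the CDM, then run the common program.
out_a = common_program(site_a_to_cdm(site_a_raw))
out_b = common_program(site_b_to_cdm(site_b_raw))

# Receiving partner: combine the shared outputs (aggregated counts in this sketch).
combined = {key: out_a[key] + out_b[key] for key in out_a}
print(combined)
```

In this sketch only aggregated counts leave each site; a network whose governance permits it could instead share an analytical dataset produced by the same common program.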
Examples of research networks that used this approach by employing a study-specific CDM with transmission of anonymised patient-level data (allowing a detailed characterisation of each database) are EU-ADR (as explained in Combining multiple healthcare databases for postmarketing drug and vaccine safety surveillance: why and how?, J Intern Med 2014;275(6):551-61), SOS, ARITMO, SAFEGUARD, GRIP, EMIF, EUROmediCAT, ADVANCE, VAC4EU and ConcePTION. In all these projects, a CDM was utilised and R, SAS, Stata or Jerboa scripts were used to create and share common analytics. Diagnosis codes for case finding can be mapped across terminologies by using CodeMapper, developed in the ADVANCE project, as explained in CodeMapper: semiautomatic coding of case definitions (Pharmacoepidemiol Drug Saf. 2017;26(8):998-1005).
An example of a study performed using this model is Background rates of Adverse Events of Special Interest for monitoring COVID-19 vaccines, an ACCESS study.
In this approach, the local databases are transformed into a CDM prior to and independently of any study protocol. When a study is required, a common protocol is developed and an analysis program is created centrally and then run locally on each database to extract and analyse the data. The shared output of the common program may be an analytical dataset or study estimates, depending on the governance of the network.
Three examples of research networks which use a generalised CDM are the Sentinel Initiative (as described in The U.S. Food and Drug Administration's Mini-Sentinel Program, Pharmacoepidemiol Drug Saf 2012;21(S1):1–303), OHDSI – Observational Health Data Sciences and Informatics and the Canadian Network for Observational Drug Effect Studies (CNODES). The latter previously relied on the second model described in this chapter but has since moved to a CDM, with six provinces having already completed the transformation of their data, as explained in Building a framework for the evaluation of knowledge translation for the Canadian Network for Observational Drug Effect Studies (Pharmacoepidemiol Drug Saf. 2020;29(S1):8-25).
The main advantage of a general CDM is that it can be used for virtually any study involving that database. OHDSI is based on the Observational Medical Outcomes Partnership (OMOP) CDM, which is now used by many organisations and has been tested for its suitability for safety studies (see, for example, Validation of a common data model for active safety surveillance research, J Am Med Inform Assoc. 2012;19(1):54–60, and Can We Rely on Results From IQVIA Medical Research Data UK Converted to the Observational Medical Outcome Partnership Common Data Model?: A Validation Study Based on Prescribing Codeine in Children, Clin Pharmacol Ther. 2020;107(4):915-25). Conversion into the OMOP CDM requires formal mapping of database items to standardised concepts. This is resource intensive and the mapping will need to be updated every time the databases are refreshed. Examples of studies performed with the OMOP CDM in Europe are Large-scale evidence generation and evaluation across a network of databases (LEGEND): assessing validity using hypertension as a case study (J Am Med Inform Assoc. 2020;27(8):1268-77) and Safety of hydroxychloroquine, alone and in combination with azithromycin, in light of rapid wide-spread use for COVID-19: a multinational, network cohort and self-controlled case series study (Lancet Rheumatol. 2020;2:e698–711).
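To illustrate why the mapping of database items to standardised concepts must be revisited at each refresh, the following minimal sketch applies a mapping table to source records and reports the codes it cannot map. The local codes, the concept identifiers and the function `map_to_standard` are hypothetical and are not taken from the OMOP vocabularies or tooling; the value 0 is used here simply as a placeholder for unmapped codes.

```python
# Hypothetical mapping table from local source codes to standard concept identifiers.
LOCAL_TO_STANDARD = {"LOC-001": 1001, "LOC-002": 1002}

UNMAPPED = 0  # placeholder concept identifier for codes with no mapping (assumption)


def map_to_standard(records, mapping):
    """Attach a standard concept to each record and list the codes left unmapped."""
    mapped = []
    unmapped_codes = set()
    for rec in records:
        concept = mapping.get(rec["source_code"], UNMAPPED)
        if concept == UNMAPPED:
            unmapped_codes.add(rec["source_code"])
        # Keep the original source code alongside the standard concept.
        mapped.append({**rec, "concept_id": concept})
    return mapped, sorted(unmapped_codes)


# Initial data cut: every local code is covered by the mapping table.
initial = [{"person_id": 1, "source_code": "LOC-001"},
           {"person_id": 2, "source_code": "LOC-002"}]
_, missing_initial = map_to_standard(initial, LOCAL_TO_STANDARD)

# Refreshed data cut: a new local code has appeared and needs to be mapped.
refreshed = initial + [{"person_id": 3, "source_code": "LOC-003"}]
mapped_refreshed, missing_refreshed = map_to_standard(refreshed, LOCAL_TO_STANDARD)

print("Unmapped after initial load:", missing_initial)
print("Unmapped after refresh:", missing_refreshed)
```

Reviewing the list of unmapped codes after each refresh is one simple way to detect where the mapping table has fallen behind the source data.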
In A Comparative Assessment of Observational Medical Outcomes Partnership and Mini-Sentinel Common Data Models and Analytics: Implications for Active Drug Safety Surveillance (Drug Saf. 2015;38(8):749-65), it is suggested that slight conceptual differences between the Sentinel and the OMOP models do not significantly affect the identification of known safety associations. Differences in risk estimations can be primarily attributed to the choices and implementation of the analytic approach.
A future development that has been investigated and could be applied across all models is federated learning. Federated learning is a machine learning technique that trains an algorithm across multiple independent data sources, without exchanging patient-level data. This approach stands in contrast to traditional centralised machine learning techniques where all the local datasets are uploaded to one server. Federated learning enables multiple actors to build a common, robust machine learning model without sharing data, which helps to address critical issues such as data privacy, data security, data access rights and access to distributed data. Although federated learning is promising, challenges remain, as discussed in The future of digital health with federated learning (NPJ Digit Med. 2020;3:119).
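The principle can be sketched with federated averaging, one commonly used scheme; the text above does not prescribe a specific algorithm, so this choice is an assumption. In the toy example below, each hypothetical site fits a simple linear model on its own made-up data and shares only the model parameters and its record count with a coordinator, which averages the parameters weighted by site size. All function names, data and settings are illustrative.

```python
def local_update(params, data, lr=0.05, epochs=20):
    """Gradient descent on one site's data, starting from the global parameters."""
    w, b = params
    n = len(data)
    for _ in range(epochs):
        # Gradients of the mean squared error for the model y = w * x + b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    # Only parameters and the record count leave the site, never the records.
    return (w, b), n


def federated_average(updates):
    """Average site parameters, weighting each site by its number of records."""
    total = sum(n for _, n in updates)
    w = sum(p[0] * n for p, n in updates) / total
    b = sum(p[1] * n for p, n in updates) / total
    return (w, b)


def train_federated(sites, rounds=50):
    """Alternate local training and central averaging for a fixed number of rounds."""
    params = (0.0, 0.0)
    for _ in range(rounds):
        updates = [local_update(params, data) for data in sites]
        params = federated_average(updates)
    return params


# Made-up data held at two sites, both generated from y = 2x + 1.
site_1 = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]
site_2 = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0), (4.0, 9.0)]

w, b = train_federated([site_1, site_2])
print("Estimated slope and intercept:", round(w, 3), round(b, 3))
```

Real deployments add components that this sketch omits, such as secure aggregation and handling of sites whose data distributions differ, which relate to the challenges mentioned above.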