Studies may be classified into five categories according to specific choices in the steps needed for their execution, i.e., protocol and statistical analysis plan (SAP) development, location of data extraction and analysis (locally or centrally), methods for data extraction and analysis (using individual or common programs, use of a CDM, and which type of CDM: study-specific or general CDM). The key steps needed to execute each study model are presented in the following Figure and explained in this section.
9.2.1. Independent analyses: separate protocols, local and individual data extraction and analysis, no CDM
The traditional model to combine data from multiple data sources consists of data extraction and analysis performed independently at each centre, based on separate protocols. This is usually followed by a meta-analysis of the different estimates obtained (see Chapter 10 and Annex 1).
Viewed as a means of combining results from multiple data sources on the same research question, this model may be considered a baseline that a research network should try to improve on through study design. Meta-analyses also facilitate the evaluation of heterogeneity of results across different independent studies and can be performed retrospectively regardless of the study model used, in line with the recommendations from Multi-centre, multi-database studies with common protocols: lessons learnt from the IMI PROTECT project (Pharmacoepidemiol Drug Saf. 2016;25(S1):156-65). Investigating heterogeneity may provide useful information on the issue under investigation, and explaining such variation should also be attempted if the data sources can be accessed. An example of such an investigation is Assessing heterogeneity of electronic health-care databases: A case study of background incidence rates of venous thromboembolism (Pharmacoepidemiol Drug Saf. 2023 Apr 17. doi: 10.1002/pds.5631). Such an investigation can increase consistency in findings from observational drug effect studies or reveal causes of differential drug effects.
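As an illustration of how independently obtained estimates might be combined and their heterogeneity quantified, the following minimal sketch applies inverse-variance pooling, Cochran's Q, the I-squared statistic and the DerSimonian-Laird estimate of between-study variance. These particular estimators are an assumption made for the example and are not prescribed by this section; the function name, the dictionary keys and the input values are hypothetical.

```python
import math


def pool_estimates(log_estimates, standard_errors):
    """Pool site-level log effect estimates and summarise heterogeneity.

    Inputs are log-scale estimates (e.g. log hazard ratios) and their
    standard errors, one pair per data source.
    """
    if len(log_estimates) != len(standard_errors) or len(log_estimates) < 2:
        raise ValueError("need at least two estimates with matching standard errors")

    # Fixed-effect (inverse-variance) pooled estimate.
    weights = [1.0 / se ** 2 for se in standard_errors]
    sum_w = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, log_estimates)) / sum_w

    # Cochran's Q and I-squared (share of variation beyond chance, in percent).
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, log_estimates))
    df = len(log_estimates) - 1
    i_squared = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0

    # DerSimonian-Laird between-study variance, truncated at zero.
    c = sum_w - sum(w ** 2 for w in weights) / sum_w
    tau_squared = max(0.0, (q - df) / c)

    # Random-effects pooled estimate using the inflated variances.
    re_weights = [1.0 / (se ** 2 + tau_squared) for se in standard_errors]
    sum_re = sum(re_weights)
    random_effects = sum(w * y for w, y in zip(re_weights, log_estimates)) / sum_re

    return {
        "fixed_effect": fixed,
        "cochran_q": q,
        "i_squared_percent": i_squared,
        "tau_squared": tau_squared,
        "random_effects": random_effects,
        "random_effects_se": math.sqrt(1.0 / sum_re),
    }


if __name__ == "__main__":
    # Hypothetical log hazard ratios from three data sources (illustrative only).
    summary = pool_estimates([0.10, 0.35, 0.22], [0.08, 0.12, 0.10])
    for name, value in summary.items():
        print(f"{name}: {value:.4f}")
```

A large I-squared value in such a summary would only signal that the estimates differ by more than chance; explaining why still requires examining the data sources themselves.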
9.2.2. Local analysis: common protocol, local and individual data extraction and analysis, no CDM
In this model, data are extracted and analysed locally, with site-specific programs developed by each centre, on the basis of a common protocol and a common SAP agreed by all study partners. The common SAP defines and standardises exposures, outcomes and covariates, analytical programmes and reporting formats. The results of each analysis, either at the subject level or in an aggregated format depending on the governance of the network, are shared and can be pooled together using meta-analysis.
This approach allows the assessment of database or population characteristics and their impact on estimates, while reducing the variability of results determined by differences in design. Examples of research networks that use the common protocol approach are PROTECT (as described in Improving Consistency and Understanding of Discrepancies of Findings from Pharmacoepidemiological Studies: the IMI PROTECT Project, Pharmacoepidemiol Drug Saf. 2016;25(S1): 1-165), which has implemented this approach in collaboration with CNODES (see Major bleeding in users of direct oral anticoagulants in atrial fibrillation: A pooled analysis of results from multiple population-based cohort studies, Pharmacoepidemiol Drug Saf. 2021;30(10):1339-52).
This approach requires very detailed common protocols and data specifications that reduce variability in interpretation by researchers.
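One way to picture the role of a common reporting format is as a shared specification against which each centre's locally produced result is checked before pooling. The sketch below is a hypothetical illustration only: the field names, their types and the function validate_site_result are assumptions made for the example and do not come from any actual SAP.

```python
from __future__ import annotations

# Hypothetical reporting format agreed in a common SAP: field name -> type.
REQUIRED_FIELDS = {
    "site": str,
    "exposure": str,
    "outcome": str,
    "events": int,
    "person_years": float,
    "log_hazard_ratio": float,
    "standard_error": float,
}


def validate_site_result(result: dict) -> list[str]:
    """Return a list of deviations from the agreed format (empty if none)."""
    problems = []
    for name, expected_type in REQUIRED_FIELDS.items():
        if name not in result:
            problems.append(f"missing field: {name}")
        elif not isinstance(result[name], expected_type):
            problems.append(
                f"{name}: expected {expected_type.__name__}, "
                f"got {type(result[name]).__name__}"
            )
    # Fields outside the agreed format are flagged rather than silently kept.
    for name in sorted(set(result) - set(REQUIRED_FIELDS)):
        problems.append(f"unexpected field: {name}")
    # A simple plausibility check on top of the structural checks.
    se = result.get("standard_error")
    if isinstance(se, float) and se <= 0:
        problems.append("standard_error must be positive")
    return problems


if __name__ == "__main__":
    # Hypothetical result submitted by one centre (illustrative values).
    submitted = {
        "site": "centre_A",
        "exposure": "drug_X",
        "outcome": "major_bleeding",
        "events": 42,
        "person_years": 1530.5,
        "log_hazard_ratio": 0.18,
        "standard_error": 0.11,
    }
    print(validate_site_result(submitted))
```

Checks of this kind only cover the format of what is reported; consistent definitions of exposures, outcomes and covariates still depend on the level of detail in the protocol and data specifications.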
9.2.3. Sharing of data: common protocol, local and individual data extraction, central analysis
In this approach, a common protocol is agreed by the study partners. Data intended to be used for the study are locally extracted with site-specific programs, transferred without analysis or conversion to a CDM, and pooled and analysed at the central partner receiving them. Data received at the central partner can be reformatted to a common structure to facilitate the analysis.
This approach applies when databases are very similar in structure and content, as for some Nordic registries and the Italian regional databases. Examples of such models are Protocol: Methodology of the brodalumab assessment of hazards: a multicentre observational safety (BRAHMS) study (BMJ Open 2023;13(2):e066057) and All‐cause mortality and antipsychotic use among elderly persons with high baseline cardiovascular and cerebrovascular risk: a multi‐center retrospective cohort study in Italy (Expert Opin. Drug Metab. Toxicol. 2019;15(2):179-88).
The central analysis allows for assessment of pooled data adjusting for covariates on an individual patient level and removing an additional source of variability linked to the statistical programming and analysis. However, this model becomes more difficult to implement due to the stronger privacy requirements for sharing patient level data.
9.2.4. Study-specific CDM: common protocol, local and individual data extraction, local and common analysis, study-specific CDM
In this approach, a common protocol is agreed by the study partners. Data intended to be used for the study are locally extracted and transformed into an agreed CDM. The data in the CDM are then processed locally in every site with one common program. The output of the common program is transferred to a specific partner. The output to be shared may be an analytical dataset or study estimates, depending on the governance of the network. This model is explained in From Inception to ConcePTION: Genesis of a Network to Support Better Monitoring and Communication of Medication Safety During Pregnancy and Breastfeeding (Clin Pharmacol Ther. 2022;111(1):321-31).
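The flow described above can be sketched as follows: each site applies its own transformation into the agreed CDM, and a single common program then runs unchanged on every site's CDM data, producing the output to be shared. This is a minimal sketch under stated assumptions; the local record layouts, the CDM field names and all function names are hypothetical and are not taken from any of the cited networks.

```python
def site_a_to_cdm(rows):
    """Site-specific ETL for hypothetical site A (flags as Y/N, follow-up in days)."""
    return [
        {
            "person_id": r["pid"],
            "exposed": r["drug_flag"] == "Y",
            "event": r["outcome_flag"] == "Y",
            "follow_up_years": r["fu_days"] / 365.25,
        }
        for r in rows
    ]


def site_b_to_cdm(rows):
    """Site-specific ETL for hypothetical site B (flags as 0/1, follow-up in years)."""
    return [
        {
            "person_id": r["id"],
            "exposed": r["exposed"] == 1,
            "event": r["event"] == 1,
            "follow_up_years": float(r["years"]),
        }
        for r in rows
    ]


def common_program(cdm_rows):
    """Common analysis run locally at every site on data already in the CDM.

    Returns aggregated counts per exposure group, i.e. the output that
    would be transferred to the coordinating partner in this sketch.
    """
    summary = {
        "exposed": {"persons": 0, "events": 0, "person_years": 0.0},
        "unexposed": {"persons": 0, "events": 0, "person_years": 0.0},
    }
    for row in cdm_rows:
        group = summary["exposed" if row["exposed"] else "unexposed"]
        group["persons"] += 1
        group["events"] += int(row["event"])
        group["person_years"] += row["follow_up_years"]
    return summary


if __name__ == "__main__":
    # Hypothetical local extracts (illustrative values only).
    site_a_rows = [
        {"pid": 1, "drug_flag": "Y", "outcome_flag": "N", "fu_days": 400},
        {"pid": 2, "drug_flag": "N", "outcome_flag": "Y", "fu_days": 120},
    ]
    site_b_rows = [
        {"id": "b-1", "exposed": 1, "event": 1, "years": 2.5},
        {"id": "b-2", "exposed": 0, "event": 0, "years": 3.0},
    ]
    print(common_program(site_a_to_cdm(site_a_rows)))
    print(common_program(site_b_to_cdm(site_b_rows)))
```

Only the two transformation functions differ between sites; the analysis itself is written once, which is what removes programming differences as a source of variability. Whether the shared output is aggregated, as here, or an analytical dataset depends on the governance of the network.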
Examples of research networks that used this approach by employing a study-specific CDM with transmission of anonymised patient-level data (allowing a detailed characterisation of each database) are EU-ADR (as explained in Combining multiple healthcare databases for postmarketing drug and vaccine safety surveillance: why and how?, J Intern Med 2014;275(6):551-61), SOS, ARITMO, SAFEGUARD, GRIP, EMIF, EUROmediCAT, ADVANCE, VAC4EU and ConcePTION. In all these projects, a CDM was utilised, and R, SAS, STATA or Jerboa scripts were used to create and share common analytics. Diagnosis codes for case finding can be mapped across terminologies by using the CodeMapper developed in ADVANCE (see CodeMapper: semiautomatic coding of case definitions, Pharmacoepidemiol Drug Saf. 2017;26(8):998-1005). An example of a study performed using this model is Background rates of 41 adverse events of special interest for COVID-19 vaccines in 10 European healthcare databases - an ACCESS cohort study (Vaccine. 2023;41(1):251-262).
9.2.5. General CDM: common protocol, local and common data extraction and analysis, general CDM
In this approach, the local databases are transformed into a CDM prior to, and independently of, any study protocol. When a study is required, a common protocol is developed and an analysis program is created centrally that runs locally on each database to extract and analyse the data. The output of the common program to be shared may be an analytical dataset or study estimates, depending on the governance of the network.
Examples of research networks which use a generalised CDM are the Sentinel Initiative (as described in The US Food and Drug Administration Sentinel System: a national resource for a learning health system, Journal of the American Medical Informatics Association, Volume 29, December 2022, Pages 2191–2200), OHDSI – Observational Health Data Sciences and Informatics, the Canadian Network for Observational Drug Effect Studies (CNODES), and EMA’s Data Analysis and Real World Interrogation Network (DARWIN EU®). The latter uses the same CDM as OHDSI, and combines previously existing analytical pipelines with bespoke newly developed ones, based on an EMA-endorsed catalogue of Standardised Analytics.
The main advantage of a general CDM is that it can be used for nearly any study involving the same database converted into the CDM. OHDSI and DARWIN EU® are based on the Observational Medical Outcomes Partnership (OMOP) CDM which is now used by many organisations and has been tested for its suitability for safety studies (see, for example, Validation of a common data model for active safety surveillance research, J Am Med Inform Assoc. 2012;19(1):54–60; and Can We Rely on Results From IQVIA Medical Research Data UK Converted to the Observational Medical Outcome Partnership Common Data Model?: A Validation Study Based on Prescribing Codeine in Children, Clin Pharmacol Ther. 2020;107(4):915-25). Conversion into the OMOP CDM requires formal mapping of database items to standardised concepts, which is a resource-intensive and iterative process. Iterations on the same databases usually lead to gains in efficiency. Mapping expertise and software are also constantly developed to support and accelerate the conversion process. Examples of studies performed with the OMOP CDM in Europe are Large-scale evidence generation and evaluation across a network of databases (LEGEND): assessing validity using hypertension as a case study (J Am Med Inform Assoc. 2020;27(8):1268-77); Safety of hydroxychloroquine, alone and in combination with azithromycin, in light of rapid wide-spread use for COVID-19: a multinational, network cohort and self-controlled case series study (Lancet Rheumatol. 2020;2(11):e698–711); Characterising the background incidence rates of adverse events of special interest for covid-19 vaccines in eight countries: multinational network cohort study (BMJ. 2021;373:n1435); Venous or arterial thrombosis and deaths among COVID-19 cases: a European network cohort study (Lancet Infectious Diseases 2022;22(8):P1142-52); and Comparative risk of thrombosis with thrombocytopenia syndrome or thromboembolic events associated with different covid-19 vaccines: international network cohort study from five European countries and the US (BMJ. 2022;379:e071594).
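The iterative nature of mapping database items to standardised concepts can be illustrated with a small sketch that applies a mapping table to source codes and reports what remains unmapped, so that the next iteration can start with the most frequent gaps. This is not the OMOP tooling itself: the function name, the local vocabulary and its code, and the concept labels are all hypothetical placeholders rather than real concept identifiers.

```python
from collections import Counter


def map_to_standard(source_records, mapping):
    """Map (vocabulary, code) pairs to standard concepts.

    Returns the mapped concepts, a Counter of unmapped source codes and
    the share of records that could be mapped.
    """
    mapped = []
    unmapped = Counter()
    for vocabulary, code in source_records:
        concept = mapping.get((vocabulary, code))
        if concept is None:
            unmapped[(vocabulary, code)] += 1
        else:
            mapped.append(concept)
    coverage = len(mapped) / len(source_records) if source_records else 0.0
    return mapped, unmapped, coverage


if __name__ == "__main__":
    # Hypothetical mapping table; the concept labels are placeholders.
    mapping = {
        ("ICD10", "I26.9"): "CONCEPT_PULMONARY_EMBOLISM",
    }
    # Hypothetical source records, including a local code with no mapping yet.
    records = [
        ("ICD10", "I26.9"),
        ("ICD10", "I26.9"),
        ("LOCAL", "X123"),
    ]
    mapped, unmapped, coverage = map_to_standard(records, mapping)
    print(f"coverage: {coverage:.0%}")
    # Most frequent unmapped codes first, as candidates for the next iteration.
    for source_code, count in unmapped.most_common():
        print(source_code, count)
```

Each pass over the unmapped list extends the mapping table, which is one reason repeated conversions of the same database tend to become more efficient.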
In A Comparative Assessment of Observational Medical Outcomes Partnership and Mini-Sentinel Common Data Models and Analytics: Implications for Active Drug Safety Surveillance (Drug Saf. 2015;38(8):749-65), it is suggested that slight conceptual differences between the Sentinel and the OMOP models do not significantly affect the identification of known safety associations. Differences in risk estimates can be attributed primarily to the choice and implementation of the analytic approach.
A review of IT-architecture, legal considerations, and statistical methods for federated analyses is presented in Federated analyses of multiple data sources in drug safety studies (Pharmacoepidemiol Drug Saf. 2023 Mar;32(3):279-286).