

ENCePP Guide on Methodological Standards in Pharmacoepidemiology


4.6. Research networks for multi-database studies


4.6.1. General considerations


Pooling data across different databases affords insight into the generalisability of the results and may improve precision. A growing number of studies use data from networks of databases, often from different countries. Some of these networks are based on long-term contracts with selected partners and are very well structured (such as Sentinel, the Vaccine Safety Datalink (VSD) or the Canadian Network for Observational Drug Effect Studies (CNODES)), but others are looser collaborations based on an open community principle (e.g. Observational Health Data Sciences and Informatics (OHDSI)). In Europe, collaborations for multi-database studies have been strongly encouraged by the drug safety research funded by the European Commission (EC) and by public-private partnerships such as the Innovative Medicines Initiative (IMI). This funding resulted in the groundwork necessary to overcome the hurdles of data sharing across countries for specific projects (e.g. PROTECT, ADVANCE, EMIF, EHDEN) or for specific post-authorisation studies.


In this chapter, networking is used to mean collaboration between investigators for sharing expertise and resources. The ENCePP Database of Research Resources may facilitate such networking by providing an inventory of research centres and data sources that can collaborate on specific pharmacoepidemiology and pharmacovigilance studies in Europe. It allows the identification of centres and data sets by country, type of research and other relevant fields.


The use of research networks in drug safety analyses is well established and a significant body of practical experience exists. By contrast, no consensus exists on the use of such networks, or indeed of single sources of observational data, in estimating effectiveness. In particular, the use in support of licensing applications will require evaluations of the reliability of results and the verifiability of research processes that are currently at an early stage. Specific advice on effectiveness can only be given once this work has been done and incorporated into regulatory guidelines. Hence this discussion currently relates only to product safety (see Assessing strength of evidence for regulatory decision making in licensing: What proof do we need for observational studies of effectiveness?; Pharmacoepidemiol. Drug Saf. 2020 Apr 16).

From a methodological point of view, research networks have many advantages over single database studies:

  • In the case of primary data collection, shorten the time needed to obtain the desired sample size and speed up the investigation of drug safety issues or other outcomes.


  • Benefit from the heterogeneity of treatment options across countries, which allows studying the effect of different drugs used for the same indication or specific patterns of utilisation.


  • May provide additional knowledge on the generalisability of results and on the consistency of information, for instance whether a safety issue exists in several countries. Possible inconsistencies might be caused by different biases or truly different effects in the databases revealing causes of differential drug effects, and these might be investigated.


  • Involve experts from various countries who address case definitions, terminologies, coding in databases and research practices, which provides opportunities to increase the consistency of results of observational studies.


  • Allow pooling of data or results, increasing the amount of information gathered for a specific issue addressed in different databases.


The article Different strategies to execute multi-database studies for medicines surveillance in real world setting: a reflection on the European model (Clin. Pharmacol. Ther. 2020 Apr 3) describes different models applied for combining data or results from multiple databases. A common characteristic of all models is the fact that data partners maintain physical and operational control over electronic data in their existing environment and therefore the data extraction is always done locally. Differences however exist in the following areas: use of a common protocol; use of a common data model (CDM); and where and how the data analysis is done.

Use of a common data model (CDM) implies that local formats are translated into a predefined, common data structure, which allows launching a similar data extraction and analysis script across several databases. Sometimes the CDM imposes a common terminology as well, as in the case of the OMOP CDM. The CDM can be systematically applied on the entire database (generalised CDM) or on the subset of data needed for a specific study (study specific CDM). In the EU, study specific CDMs have generated results in several projects and studies and initial steps have been taken to create generalised CDMs, but experience based on real-life studies is still limited. An example is the study Safety of hydroxychloroquine, alone and in combination with azithromycin, in light of rapid wide-spread use for COVID-19: a multinational, network cohort and self-controlled case series study.
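The translation step that a CDM implies can be sketched as follows. This is a minimal, hypothetical illustration, not the structure of any real CDM: the field names, the common concept label and the mapping table are all invented for the example. It shows why, once local formats have been translated, a single extraction and analysis script can run unchanged across databases.

```python
# Hypothetical sketch: translating a local database extract into a minimal,
# study-specific common data model (CDM). Field names, concept labels and
# mappings are illustrative only.

def to_cdm(local_rows, field_map, code_map):
    """Rename local fields and recode local terminology into the CDM."""
    cdm_rows = []
    for row in local_rows:
        # Rename each local column to its common (CDM) name.
        cdm_row = {cdm_field: row[local_field]
                   for cdm_field, local_field in field_map.items()}
        # Translate the local diagnosis code into the common terminology.
        cdm_row["diagnosis"] = code_map.get(cdm_row["diagnosis"], "UNMAPPED")
        cdm_rows.append(cdm_row)
    return cdm_rows

# Database A stores diagnoses as ICD-10 codes under its own column names.
local_extract = [{"pid": 1, "dx_code": "I21.9", "dx_date": "2023-05-01"}]
field_map = {"person_id": "pid", "diagnosis": "dx_code", "event_date": "dx_date"}
code_map = {"I21.9": "ACUTE_MI"}  # invented common concept label

print(to_cdm(local_extract, field_map, code_map))
```

Each participating database would supply its own `field_map` and `code_map`, while the downstream analysis script sees only the common structure.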

4.6.2. Models of studies using multiple data sources


Five models of studies are presented, classified according to specific choices in the steps needed to execute a study: protocol development and agreement (whether separate or common); where the data are extracted and analysed (locally or centrally); how the data are extracted and analysed (using individual or common programs); and use of a CDM and which type (study specific or general) (see Table 1).

Meta-analysis: separate protocols, local and individual data extraction and analysis, no CDM


The traditional mode to combine data from multiple data sources is when data extraction and analysis are performed independently at each centre based on separate protocols. This is usually followed by meta-analysis of the different estimates obtained (see Chapter 5.7).
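The pooling step can be sketched with a fixed-effect inverse-variance meta-analysis, a standard way to combine per-database estimates on the log scale. The per-database figures below are invented for illustration; real analyses would typically also fit a random-effects model and examine heterogeneity (see Chapter 5.7).

```python
import math

# Minimal sketch of fixed-effect inverse-variance meta-analysis: each
# database contributes a log relative risk and its standard error.
# The numbers below are invented for illustration.

def pool_fixed_effect(estimates):
    """estimates: list of (log_rr, se) tuples, one per database."""
    weights = [1.0 / se ** 2 for _, se in estimates]   # inverse-variance weights
    pooled = sum(w * est for (est, _), w in zip(estimates, weights)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled, pooled_se

per_database = [(0.30, 0.10), (0.10, 0.20), (0.25, 0.15)]  # invented results
log_rr, se = pool_fixed_effect(per_database)
print(f"pooled RR = {math.exp(log_rr):.2f}, 95% CI "
      f"({math.exp(log_rr - 1.96 * se):.2f}, {math.exp(log_rr + 1.96 * se):.2f})")
```

Databases with smaller standard errors receive larger weights, so a large database dominates the pooled estimate; this is one reason heterogeneity should be inspected before pooling.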

This type of model may be viewed as a baseline situation which a research network will try to improve upon. Moreover, meta-analysis should be used in all the models of studies presented: there is always the possibility that different data sources provide different results, and explicitly looking for such variation should always be considered. If all the data sources can be accessed, explaining variation in terms of covariates should also be attempted. This is coherent with the recommendations of Multi-centre, multi-database studies with common protocols: lessons learnt from the IMI PROTECT project (Pharmacoepidemiol. Drug Saf. 2016;25(S1):156-165), which states that a priori pooling of data from several databases may disguise heterogeneity that may provide useful information on the safety issue under investigation. On the other hand, parallel analysis of databases allows exploring the reasons for heterogeneity through extensive sensitivity analyses. This approach eventually increases the consistency of findings from observational drug effect studies or reveals causes of differential drug effects.

Local analysis: common protocol, local and individual data extraction and analysis, no CDM


In this option, data are extracted and analysed locally, with site-specific programs that are developed by each centre, on the basis of a common protocol. Definitions of exposure, outcomes and covariates, analytical programmes and reporting formats are standardised according to a common protocol and the results of each analysis, either at a patient level or in an aggregated format depending on the governance of the network, are shared and pooled together through meta-analysis.

This approach allows assessment of database or population characteristics and their impact on estimates but reduces variability of results determined by differences in design. Examples of research networks that use the common protocol approach are PROTECT (as described in Improving Consistency and Understanding of Discrepancies of Findings from Pharmacoepidemiological Studies: the IMI PROTECT Project. (Pharmacoepidemiol Drug Saf 2016;25(S1): 1-165) and the Canadian Network for Observational Drug Effect Studies (CNODES). The latter is experimenting with a CDM as explained in Building a framework for the evaluation of knowledge translation for the Canadian Network for Observational Drug Effect Studies (Pharmacoepidemiol. Drug Saf. 2020;29 (S1),8-25)

This approach requires very detailed common protocols and data specifications that reduce variability in interpretations by researchers.

Sharing of raw data: common protocol, local and individual data extraction, central analysis, no CDM


In this approach, a common protocol is agreed by the study partners. Data intended to be used for the study are locally extracted with site-specific programs, transferred without analysis or conversion to a CDM, and pooled and analysed centrally by the partner receiving them.

Examples of this approach arise when databases are very similar in structure and content, as is the case for some Nordic registries or for the Italian regional databases. Examples of such models are Selective serotonin reuptake inhibitors during pregnancy and risk of persistent pulmonary hypertension in the newborn: population based cohort study from the five Nordic countries (BMJ 2012;344:d8012) and All-cause mortality and antipsychotic use among elderly persons with high baseline cardiovascular and cerebrovascular risk: a multi-center retrospective cohort study in Italy (Expert Opin. Drug Metab. Toxicol. 2019;15:179-88).

The central analysis removes an additional source of variability linked to the statistical programming and analysis.

Study specific CDM: common protocol, local and individual data extraction, local and common analysis, study specific CDM


In this approach, a common protocol is agreed by the study partners and data intended to be used for the study are locally extracted and loaded into a CDM; the data in the CDM are then processed locally at all sites with one common program. The output of the common program is transferred to a specific partner. The output to be shared may be an analytical dataset or study estimates, depending on the governance of the network.

Examples of research networks that used this approach by employing a study-specific CDM with transmission of anonymised patient-level data (allowing a detailed characterisation of each database) are EU-ADR (as explained in Combining multiple healthcare databases for postmarketing drug and vaccine safety surveillance: why and how?, J Intern Med 2014;275(6):551-61), SOS, ARITMO, SAFEGUARD, GRIP, EMIF, EUROmediCAT and ADVANCE. In all these projects, a basic and simple CDM was utilised and R, SAS, STATA or Jerboa scripts have been used to create and share common analytics. Diagnosis codes for case finding can be mapped across terminologies by using the Codemapper, developed in the ADVANCE project, as explained in CodeMapper: semiautomatic coding of case definitions (Pharmacoepidemiol Drug Saf 2017;26(8):998-1005).
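The "one common program run locally" step can be sketched as follows. This is a hypothetical illustration of the governance model in which only aggregated counts leave each site; the table layout and code labels are invented, not those of any of the projects named above.

```python
# Hypothetical sketch of the common-program step in the study-specific CDM
# model: the identical script runs locally at each site against
# CDM-formatted rows and emits only aggregated counts for sharing with the
# coordinating partner. Field names and codes are illustrative.

def local_aggregate(cdm_rows, exposure_code, outcome_code):
    """Count exposed persons and exposed persons with the outcome."""
    exposed = {r["person_id"] for r in cdm_rows if r["code"] == exposure_code}
    with_outcome = {r["person_id"] for r in cdm_rows
                    if r["code"] == outcome_code and r["person_id"] in exposed}
    return {"n_exposed": len(exposed), "n_outcomes": len(with_outcome)}

# Each site runs the identical script on its own local CDM extract...
site_rows = [
    {"person_id": 1, "code": "DRUG_X"},
    {"person_id": 1, "code": "EVENT_Y"},
    {"person_id": 2, "code": "DRUG_X"},
]
shared_output = local_aggregate(site_rows, "DRUG_X", "EVENT_Y")
print(shared_output)  # ...and only these aggregates leave the site
```

Because the program is identical across sites, the shared outputs are directly comparable; whether patient-level analytical datasets may be shared instead depends on the governance of the network.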

An approach to quantify the impact of different case finding algorithms, called the component strategy, was developed in the EMIF and ADVANCE projects and could also be compatible with the simple and generalised common data model (see Identifying Cases of Type 2 Diabetes in Heterogeneous Data Sources: Strategy from the EMIF Project. PLoS One 2016;11(8):e0160648).

General CDM: common protocol, local and common data extraction and analysis, general CDM


In this approach, the local databases are transformed into a CDM prior to and independent of any study protocol. When a study is required, a protocol is agreed by the study partners and a centrally developed analysis program is created that runs locally on each database to extract and analyse the data. The output of the common programs shared may be an analytical dataset or study estimates, depending on the governance of the network.


Two examples of research networks which use a generalised CDM are the Sentinel Initiative (as described in The U.S. Food and Drug Administration's Mini-Sentinel Program, Pharmacoepidemiol Drug Saf 2012;21(S1):1–303) and OHDSI. The main advantage of a general CDM is that it can be used for virtually any study involving that database. OHDSI is based on the Observational Medical Outcomes Partnership (OMOP) CDM, which is now used by many organisations and has been tested for its suitability for safety studies (see for example Validation of a common data model for active safety surveillance research, J Am Med Inform Assoc 2012;19(1):54–60, and Can We Rely on Results From IQVIA Medical Research Data UK Converted to the Observational Medical Outcome Partnership Common Data Model?: A Validation Study Based on Prescribing Codeine in Children, Clin Pharmacol Ther 2020;107(4):915-25). Conversion into the OMOP CDM requires formal mapping of database items to standardised concepts. This is resource intensive and needs to be updated every time the database is refreshed. An example of a study performed with the OMOP CDM in Europe is Safety of hydroxychloroquine, alone and in combination with azithromycin, in light of rapid wide-spread use for COVID-19: a multinational, network cohort and self-controlled case series study.
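The formal mapping of source codes to standardised concepts can be sketched as below. This is an illustrative toy, not the real OMOP vocabulary: the mapping entries, the concept identifier and the Read code are invented for the example. It shows the key property of a generalised CDM — codes from different source terminologies resolve to one standard concept — and why unmapped codes must be surfaced rather than silently dropped.

```python
# Illustrative sketch of the concept-mapping step needed when converting a
# source database into a generalised CDM such as OMOP. The entries below are
# invented examples, not the real OMOP vocabulary.

source_to_standard = {
    ("ICD10", "I21.9"): 900001,   # invented standard concept id
    ("Read", "G30..00"): 900001,  # two source vocabularies, one concept
}

def map_code(vocabulary, code):
    """Resolve a (vocabulary, code) pair to its standard concept id."""
    concept_id = source_to_standard.get((vocabulary, code))
    if concept_id is None:
        # Unmapped codes must be flagged for review, not discarded.
        raise KeyError(f"unmapped source code: {vocabulary} {code}")
    return concept_id

# Codes from two different terminologies resolve to the same concept.
assert map_code("ICD10", "I21.9") == map_code("Read", "G30..00")
```

Maintaining such a mapping for an entire database is the resource-intensive part noted above, since every refresh of the source data may introduce codes that are not yet mapped.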


In a Comparative Assessment of Observational Medical Outcomes Partnership and Mini-Sentinel Common Data Models and Analytics: Implications for Active Drug Safety Surveillance (Drug Saf 2015;38(8):749-65), it is suggested that slight conceptual differences between the Sentinel and the OMOP models do not significantly impact on identifying known safety associations. Differences in risk estimations can be primarily attributed to the choices and implementation of the analytic approach.

Table 1: Models of studies using multiple data sources: key characteristics following the steps needed to execute a study




4.6.3. Challenges of different models


The models presented above face several challenges:


Related to the scientific content

  • Differences in the underlying health care systems

  • Different mechanisms of data generation and collection

  • Mapping of differing disease coding systems (e.g., the International Classification of Disease, 10th Revision (ICD-10), Read codes, the International Classification of Primary Care (ICPC-2)) and narrative medical information in different languages

  • Validation of study variables and access to source documents for validation

Related to the organisation of the network

  • Differences in culture and experience between academia, public institutions and private partners

  • Differences in the type and quality of information contained within each mapped database

  • Different ethical and governance requirements in each country regarding processing of anonymised or pseudonymised healthcare data

  • Choice of data sharing model and access rights of partners

  • Issues linked to intellectual property and authorship.

  • Sustainability and funding mechanisms.

Each model has strengths and weaknesses in facing the above challenges, as illustrated in Data Extraction and Management in Networks of Observational Health Care Databases for Scientific Research: A Comparison of EU-ADR, OMOP, Mini-Sentinel and MATRICE Strategies (eGEMs 2016;4(1):2). In particular, a central analysis or a CDM provides protection from problems related to variation in how protocols are implemented, as individual analysts might implement protocols differently (as described in Quantifying how small variations in design elements affect risk in an incident cohort study in claims; Pharmacoepidemiol. Drug Saf. 2020;29(1):84-93). Experience has shown that many of these difficulties can be overcome by full involvement of and good communication between partners, and by a project agreement between network members defining roles and responsibilities and addressing issues of intellectual property and authorship. Several of the networks, such as OHDSI, Sentinel and ADVANCE, have made their code, data models and analytics software publicly available.

Timeliness in running studies is important in order to meet short regulatory timelines in circumstances where prompt decisions are needed. Solutions therefore need to be further developed and introduced to run multi-database studies within shorter timelines. Independently of the model used, a major factor in speeding up studies is completing, in advance, the work that is independent of any particular study. This includes prespecified agreements on data access and on processes for protocol development and study management; identification and characterisation of a large set of databases; creation of common definitions for variables likely to occur in studies; and common analytical systems in which the most typical and routine analyses are already defined (the latter is made easier by the use of CDMs, especially general ones, with standardised analytics and tools that can be re-used to support faster analysis).



