Pooling data across different databases increases precision and generalisability of the results. A growing number of studies use data from networks of databases, often from different countries. Some of these networks are based on long-term contracts with selected partners and are very well structured (such as Sentinel, the Vaccine Safety Datalink (VSD) or the Canadian Network for Observational Drug Effect Studies (CNODES)), but others are looser collaborations based on an open community principle (e.g. Observational Health Data Sciences and Informatics (OHDSI)). In Europe, collaborations for multi-database studies have been strongly encouraged by the drug safety research funded by the European Commission (EC) and public-private partnerships such as the Innovative Medicines Initiative (IMI). This funding supported the groundwork necessary to overcome the hurdles of data sharing across countries for specific projects (e.g. PROTECT, ADVANCE, EMIF) or for specific post-authorisation studies.
Networking implies collaboration between investigators for sharing expertise and resources. The ENCePP Database of Research Resources may facilitate such networking by providing an inventory of research centres and data sources that can collaborate on specific pharmacoepidemiology and pharmacovigilance studies in Europe. It allows the identification of centres and data sets by country, type of research and other relevant fields.
From a methodological point of view, research networks have many advantages over single database studies:
Research networks increase the size of study populations and shorten the time needed for obtaining the desired sample size. Hence, they can facilitate research on rare events and speed up investigation of drug safety issues.
The heterogeneity of treatment options across countries allows studying the effect of different drugs used for the same indication.
Research networks may provide additional knowledge on whether a drug safety issue exists in several countries (and thereby reveal causes of differential drug effects), on the generalisability of results, on the consistency of information and on the impact of biases on estimates.
Involvement of experts from various countries addressing case definitions, terminologies, coding in databases and research practices provides opportunities to increase consistency of results of observational studies.
Sharing of data sources facilitates harmonisation of data elaboration and transparency in analyses and benchmarking of data management.
The potential for pooling data or results maximises the amount of information gathered for a specific issue addressed in different databases.
Different models have been applied for combining data or results from multiple databases. A common characteristic of all models is the fact that data partners maintain physical and operational control over electronic data in their existing environment. Differences however exist in the following areas: use of a common protocol; use of a common data model; and use of common data transformation analytics.
Use of a common data model (CDM) implies that local formats are translated into a predefined, common data structure, which allows launching a similar data transformation script across several databases. The CDM can be systematically applied on the entire database (generalised CDM) or on the subset of data needed for a specific study (study-specific CDM). In the EU, study-specific CDMs have generated results in several projects and studies. Initial steps have been taken to create generalised CDMs, but experience based on real-life studies is lacking.
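As an illustration of the principle described above, the following minimal sketch translates prescription records from two local formats into one common structure, so that the same analysis function can run on both. Everything in it is an assumption made for illustration: the field names, the `CdmPrescription` structure, the local product code `P001` and the lookup table `LOCAL_TO_ATC` are hypothetical and do not correspond to the CDM of any network named in this chapter.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical lookup from local product codes to ATC codes (assumed common vocabulary).
LOCAL_TO_ATC = {"P001": "N06AB06"}


@dataclass(frozen=True)
class CdmPrescription:
    """One prescription record in the (hypothetical) common structure."""
    person_id: str
    drug_code: str   # ATC code
    start_date: date


def from_db_a(row: dict) -> CdmPrescription:
    # Hypothetical database A: ISO dates, drugs already coded in ATC.
    return CdmPrescription(
        person_id=f"A-{row['patid']}",
        drug_code=row["atc"],
        start_date=date.fromisoformat(row["rx_date"]),
    )


def from_db_b(row: dict) -> CdmPrescription:
    # Hypothetical database B: day/month/year dates, local product codes.
    day, month, year = (int(part) for part in row["dispensed"].split("/"))
    return CdmPrescription(
        person_id=f"B-{row['id']}",
        drug_code=LOCAL_TO_ATC[row["product"]],
        start_date=date(year, month, day),
    )


def count_users(records, atc_prefix: str) -> int:
    """Same analysis step for every database: distinct persons exposed to a drug class."""
    return len({r.person_id for r in records if r.drug_code.startswith(atc_prefix)})


records = [
    from_db_a({"patid": 1, "atc": "N06AB03", "rx_date": "2012-01-15"}),
    from_db_b({"id": 7, "product": "P001", "dispensed": "03/02/2012"}),
]
print(count_users(records, "N06AB"))
```

Only the two `from_db_*` functions are database-specific; `count_users` is written once against the common structure, which is what makes a shared script possible.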
The traditional way to combine data from multiple data sources is to perform data extraction and analysis independently at each centre, based on separate protocols. This is usually followed by meta-analysis of the different estimates obtained (see Chapter 5.7).
A second option is to extract and analyse data locally on the basis of a common protocol. Definitions of exposure, outcomes and covariates, analytical programmes and reporting formats are standardised according to this protocol, and the results of each analysis are shared in an aggregated format and pooled through meta-analysis. This approach allows assessment of database or population characteristics and their impact on estimates, while reducing the variability of results caused by differences in design. Examples of research networks that use the common protocol approach are PROTECT (as described in Improving Consistency and Understanding of Discrepancies of Findings from Pharmacoepidemiological Studies: the IMI PROTECT Project, Pharmacoepidemiol Drug Saf 2016;25(S1):1-165) and the Canadian Network for Observational Drug Effect Studies (CNODES).
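The pooling of aggregated, database-specific results can be sketched as follows. The text does not specify a meta-analytic method, so this example assumes one common choice, a fixed-effect inverse-variance model applied to log relative risks; a random-effects model may be more appropriate when databases differ substantially. The function names are hypothetical.

```python
import math


def pool_fixed_effect(site_results):
    """Pool (log_estimate, standard_error) pairs with inverse-variance weights.

    Returns the pooled log estimate and its standard error.
    """
    weights = [1.0 / se ** 2 for _, se in site_results]
    total_weight = sum(weights)
    pooled = sum(w * est for w, (est, _) in zip(weights, site_results)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se


def to_ratio_with_ci(log_estimate, se, z=1.96):
    """Back-transform a log estimate to a ratio with an approximate 95% interval."""
    return (
        math.exp(log_estimate),
        math.exp(log_estimate - z * se),
        math.exp(log_estimate + z * se),
    )
```

Each database contributes only its estimate and standard error, so no patient-level data leave the local environment. Databases with more precise estimates (smaller standard errors) receive more weight.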
This approach requires very detailed common protocols and data specifications that reduce variability in interpretations by researchers. Multi-centre, multi-database studies with common protocols: lessons learnt from the IMI PROTECT project (Pharmacoepidemiol Drug Saf 2016;25(S1):156-165) states that a priori pooling of data from several databases may disguise heterogeneity that may provide useful information on the safety issue under investigation. On the other hand, parallel analysis of databases allows exploring reasons for heterogeneity through extensive sensitivity analyses. This approach may ultimately increase consistency in findings from observational drug effect studies or reveal causes of differential drug effects.
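Before exploring reasons for heterogeneity, it can help to quantify how much the database-specific estimates disagree. The sketch below assumes the standard Cochran's Q statistic and the derived I² measure; these are not prescribed by the text or by the PROTECT publication, and the function name is hypothetical.

```python
def heterogeneity(site_results):
    """Cochran's Q, its degrees of freedom and I^2 for (log_estimate, standard_error) pairs."""
    weights = [1.0 / se ** 2 for _, se in site_results]
    pooled = sum(w * est for w, (est, _) in zip(weights, site_results)) / sum(weights)
    # Q: weighted squared deviations of each database estimate from the pooled estimate.
    q = sum(w * (est - pooled) ** 2 for w, (est, _) in zip(weights, site_results))
    df = len(site_results) - 1
    # I^2: share of total variation beyond what chance alone would explain (floored at 0).
    i_squared = max(0.0, (q - df) / q) if q > 0 else 0.0
    return q, df, i_squared
```

A high I² is a prompt to examine differences in populations, coding or data capture across databases, not a reason in itself to discard any estimate.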
Data can also be extracted from local databases using a study-specific, database-tailored extraction into a CDM and pre-processed locally. The resulting data can be transmitted to a central data warehouse as patient-level data or aggregated data for further analysis. Examples of research networks that used this approach by employing a study-specific CDM with transmission of anonymised patient-level data (allowing a detailed characterisation of each database) are EU-ADR (as explained in Combining multiple healthcare databases for postmarketing drug and vaccine safety surveillance: why and how?, J Intern Med 2014;275(6):551-61), SOS, ARITMO, SAFEGUARD, GRIP, EMIF, EUROmediCAT and ADVANCE. In all these projects, a basic and simple common data model was utilised, and R, SAS, Stata or Jerboa scripts have been used to create and share common analytics. Diagnosis codes for case finding can be mapped across terminologies by using CodeMapper, developed in the ADVANCE project, as explained in CodeMapper: semiautomatic coding of case definitions (Pharmacoepidemiol Drug Saf 2017;26(8):998-1005).
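The idea of mapping one case definition onto several coding systems can be illustrated with a small hand-written lookup. This is not how CodeMapper works internally and it does not use its interface; the table `CONCEPT_CODES`, the function names and the code lists are illustrative assumptions, and the lists are not validated case definitions.

```python
# Hypothetical mapping of one clinical concept to code prefixes per terminology.
# The code lists are illustrative only and would need clinical review before use.
CONCEPT_CODES = {
    "type_2_diabetes": {
        "ICD10": ("E11",),
        "ICPC2": ("T90",),
    },
}


def normalise(code: str) -> str:
    """Remove dots and upper-case so that 'e11.9' and 'E119' compare equal."""
    return code.replace(".", "").upper()


def matches_concept(concept: str, terminology: str, code: str) -> bool:
    """True if a local diagnosis code falls under the concept in that terminology."""
    prefixes = CONCEPT_CODES[concept].get(terminology, ())
    return any(normalise(code).startswith(prefix) for prefix in prefixes)


print(matches_concept("type_2_diabetes", "ICD10", "E11.9"))
```

Keeping the mapping in a single shared table means each database applies the same definition of a case, whatever terminology it records diagnoses in.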
An approach to quantify the impact of different case finding algorithms, called the component strategy, was developed in the EMIF and ADVANCE projects and could also be compatible with the simple and generalised common data model (see Identifying Cases of Type 2 Diabetes in Heterogeneous Data Sources: Strategy from the EMIF Project. PLoS One 2016;11(8):e0160648).
Two examples of research networks which use a generalised CDM are the Sentinel Initiative (as described in The U.S. Food and Drug Administration's Mini-Sentinel Program, Pharmacoepidemiol Drug Saf 2012;21(S1):1–303) and OHDSI. The main advantage of a generalised CDM is that it can be used for virtually any study involving that database. OHDSI is based on the Observational Medical Outcomes Partnership (OMOP) CDM, which is now used by many organisations and has been tested for its suitability for safety studies (see for example Validation of a common data model for active safety surveillance research. J Am Med Inform Assoc 2012;19(1):54–60). Conversion into the OMOP CDM requires formal mapping of database items to standardised concepts. This is resource intensive and needs to be repeated every time the database is updated.
In A Comparative Assessment of Observational Medical Outcomes Partnership and Mini-Sentinel Common Data Models and Analytics: Implications for Active Drug Safety Surveillance (Drug Saf 2015;38(8):749-65), it is suggested that slight conceptual differences between the Sentinel and the OMOP models do not significantly affect the identification of known safety associations. Differences in risk estimations can be primarily attributed to the choices and implementation of the analytic approach.
For some studies, it has been possible to centrally analyse patient-level data extracted based on a common protocol, such as in Selective serotonin reuptake inhibitors during pregnancy and risk of persistent pulmonary hypertension in the newborn: population based cohort study from the five Nordic Countries (BMJ 2012;344:d8012). If databases are very similar in structure and content, as is the case for some Nordic registries, a CDM might not be required for data extraction. Central analysis removes an additional source of variability linked to statistical programming and analysis.
The models presented above face many challenges:
Related to the scientific content
Differences in the underlying health care systems and mechanisms of data generation and collection.
Mapping of differing disease coding systems (e.g. the International Classification of Diseases, 10th Revision (ICD-10), Read codes, the International Classification of Primary Care (ICPC-2)) and narrative medical information in different languages.
Validation of study variables and access to source documents for validation.
Related to the organisation of the network
Differences in culture and experience between academia, public institutions and private partners.
Differences in the type and quality of information contained within each mapped database.
Different ethical and governance requirements in each country regarding processing of anonymised or pseudo-anonymised healthcare data.
Choice of data sharing model and access rights of partners.
Issues linked to intellectual property and authorship.
Sustainability and funding mechanisms.
Each model has strengths and weaknesses in facing the above challenges, as illustrated in Data Extraction and Management in Networks of Observational Health Care Databases for Scientific Research: A Comparison of EU-ADR, OMOP, Mini-Sentinel and MATRICE Strategies (EGEMS 2016 Feb). Experience has shown that many of these difficulties can be overcome by full involvement and good communication between partners, and by a project agreement between network members that defines roles and responsibilities and addresses issues of intellectual property and authorship. Several networks, such as OMOP, Sentinel and ADVANCE, have made their code, products and data models publicly available.