Secondary use of data refers to the utilisation of data already gathered for other purposes. These data can be further linked to prospectively collected medical and non-medical data. Electronic healthcare databases (e.g., claims databases, electronic health records) and patient registries are examples of data sources that can be leveraged as secondary data for pharmacoepidemiological studies.
The last decades have witnessed the development of key data resources, expertise and methodology that have allowed use of such data for pharmacoepidemiology. The ENCePP Inventory of Data Sources contains information on existing European and worldwide databases that may be used for pharmacoepidemiological research. However, this field is continuously evolving.
A description of the main features and applications of frequently used electronic healthcare databases for pharmacoepidemiology research in the United States and in Europe is presented in the textbook Pharmacoepidemiology (B. Strom, S.E. Kimmel, S. Hennessy. 6th Edition, Wiley, 2019, Chapters 11-14). In general, the limitations of using electronic healthcare databases should be acknowledged, as detailed in A review of uses of healthcare utilisation databases for epidemiologic research on therapeutics (J Clin Epidemiol. 2005; 58(4): 323-37).
In order to assist in the selection and appropriate use of data sources for pharmacoepidemiological research, the ISPE-endorsed Guidelines for Good Database Selection and use in Pharmacoepidemiology Research (Pharmacoepidemiol Drug Saf. 2012;21(1):1-10) highlights potential limitations of secondary databases containing routinely collected healthcare information, such as electronic medical records (from either primary or secondary care) and claims databases, and recommends procedures for data analysis and interpretation. A section of the guideline is dedicated to multi-database studies which may be defined as “studies using at least two healthcare databases, which are not linked with each other at an individual person level, either because they insist on different populations, or because, even if populations overlap, local regulations forbid record linkage”. The document also contains references to data quality and validation procedures, data processing/transformation, and data privacy and security (see also Chapter 12.2, Data quality frameworks). In Different Strategies to Execute Multi-Database Studies for Medicines Surveillance in Real-World Setting: A Reflection on the European Model (Clin Pharmacol Ther. 2020;108(2):228-235), four strategies to conduct multi-database studies are discussed (see also Chapter 8, Research networks for multi-database studies). Algorithms have also been proposed to identify fit-for-purpose data sources to address research questions. For example, The Structured Process to Identify Fit-For-Purpose Data: A Data Feasibility Assessment Framework (Clin Pharmacol Ther. 2022;111(1):122-34) provides a structured and detailed stepwise approach for the identification and feasibility assessment of candidate data sources for a specific study. 
To help signpost regulators, researchers and industry to the data sources relevant to a research question, the joint EMA-EU Heads of Medicines Agencies Big Data Steering Group has also published a list of metadata (2022) describing data sources and studies, defined following extensive consultation of interested parties. This list will be used in the rebuilding and enhancement of the ENCePP Inventory of Data Sources. Experience will show how such initiatives can support the validity and transparency of study results and, ultimately, the level of confidence in the evidence provided. It should also be acknowledged that many investigators naturally use the data source(s) they can directly access and are familiar with.
The US FDA’s Best Practices for Conducting and Reporting Pharmacoepidemiologic Safety Studies Using Electronic Health Care Data Sets (2013) provides criteria for best practice that apply to the study design, analysis, conduct and documentation. It emphasises that investigators should understand the potential limitations of electronic healthcare data systems, make provisions for their appropriate use and refer to validation studies of the outcomes of interest for the proposed study as captured in the database. This is also covered in the UK MHRA guidance on the use of real-world data in clinical studies to support regulatory decisions (2021). Guidance for conducting studies within electronic healthcare databases can also be found in the International Society for Pharmacoepidemiology Guidelines for Good Pharmacoepidemiology Practices (ISPE GPP, 2015), in particular section IV-B (Study conduct, Data collection). This guidance emphasises the importance of patient data protection.
The use of “Real-world data” (RWD) for the generation of “Real-world evidence” (RWE) for regulatory decision-making has been addressed by guidelines issued by regulatory agencies. The article Real-World Data for Regulatory Decision Making: Challenges and Possible Solutions for Europe (Clin Pharmacol Ther. 2019;106(1):36-9) describes the operational, technical and methodological challenges for the acceptability of real-world data for regulatory purposes and presents possible solutions to address these challenges. The draft US FDA guidance Real-World Data: Assessing Electronic Health Records and Medical Claims Data To Support Regulatory Decision-Making for Drug and Biological Products (2021) provides recommendations focused on regulatory studies using electronic health records and medical claims data, and a more general draft guidance provides Considerations for the Use of Real-World Data and Real-World Evidence To Support Regulatory Decision-Making for Drug and Biological Products (December 2021). More information on RWD and RWE is available in Chapter 15.7, Real-world evidence and pharmacoepidemiology.
The Joint ISPE-ISPOR Special Task Force Report on Good Practices for Real‐World Data Studies of Treatment and/or Comparative Effectiveness (2017) recommends good research practices for designing and analysing studies based on retrospective databases for comparative effectiveness research (CER) and reviews methodological issues and possible solutions for CER studies based on secondary data analysis (see also Chapter 15.1 on comparative effectiveness research). Many of the principles are applicable to studies with objectives other than CER, but some aspects of pharmacoepidemiological studies based on secondary use of data, such as data quality, ethical issues, data ownership and privacy, are not covered.
The majority of the examples and methods covered in Chapter 4 are based on studies and methodological developments in secondary use of healthcare databases, since this is one of the most frequent approaches used in pharmacoepidemiology. Several potential issues need to be considered in the use of electronic healthcare data for pharmacoepidemiological studies, as they may affect the validity of the results. These include completeness of data capture; bias in the assessment of exposure, outcome and covariates; variability between data sources; the impact of changes over time in the data (as has been noted in the pre- vs. post-COVID-19 period); access methodology; and the healthcare system of the country or region covered by the database.