Secondary use of data refers to the utilisation of data already collected for other purposes. These data can be further linked to prospectively collected medical and non-medical data. Electronic healthcare databases (e.g., claims databases, electronic health records) and patient registries are examples of data sources that can be leveraged as secondary data for pharmacoepidemiological studies.
Recent decades have witnessed the development of key data resources, expertise and methodology that have allowed the use of such data for pharmacoepidemiology. The ENCePP Inventory of Data Sources contains information on existing European and worldwide databases that may be used for pharmacoepidemiological research. However, this field is continuously evolving.
A description of the main features, applications and limitations of frequently used electronic healthcare databases for pharmacoepidemiology research in the United States and in Europe is presented in the textbook Pharmacoepidemiology (B. Strom, S.E. Kimmel, S. Hennessy. 6th Edition, Wiley, 2019, Chapters 11-14).
To assist in the selection and appropriate use of data sources for pharmacoepidemiological research, including the assessment of their strengths and limitations, the ISPE-endorsed Guidelines for Good Database Selection and use in Pharmacoepidemiology Research (Pharmacoepidemiol Drug Saf. 2012;21(1):1-10) highlights potential limitations of data sources for secondary use containing routinely collected healthcare information, such as electronic health records (from either primary or secondary care) and claims databases, and recommends procedures for data analysis and interpretation. A section of the guideline is dedicated to multi-database studies, which may be defined as “studies using at least two healthcare databases, which are not linked with each other at an individual person level, either because they insist on different populations, or because, even if populations overlap, local regulations forbid record linkage” (see Chapter 9). References to data quality and validation procedures, data processing/transformation, and data privacy and security (see Chapter 12.2) are also provided. Different Strategies to Execute Multi-Database Studies for Medicines Surveillance in Real-World Setting: A Reflection on the European Model (Clin Pharmacol Ther. 2020;108(2):228-235) discusses four strategies to conduct multi-database studies (see also Chapter 9). Specific processes have also been proposed to identify fit-for-purpose data sources to address research questions. For example, The Structured Process to Identify Fit-For-Purpose Data: A Data Feasibility Assessment Framework (Clin Pharmacol Ther. 2022;111(1):122-34) provides a detailed stepwise approach for the identification and feasibility assessment of candidate data sources for a specific study.
To help signpost regulators, researchers, industry and evidence reviewers to the data sources relevant to a research question, the joint EMA-EU Heads of Medicines Agencies Big Data Steering Group has also published a list of metadata (2022) describing data sources and studies, defined following extensive consultation with interested parties. This list will be used in the rebuilding and enhancement of the ENCePP Inventory of Data Sources. Experience will show how such initiatives can support the validity and transparency of study results and, ultimately, the level of confidence in the evidence provided. It should also be acknowledged that many investigators naturally use the data source(s) they can directly access and are familiar with.
The FDA Best Practices for Conducting and Reporting Pharmacoepidemiologic Safety Studies Using Electronic Health Care Data Sets (2013) provides criteria for best practice that apply to study design, analysis, conduct and documentation. It emphasises that investigators should understand the potential limitations of electronic healthcare data systems, make provisions for their appropriate use, and refer to validation studies of the outcomes that are of interest in the proposed study and captured in the database. This is also covered in the UK MHRA guidance on the use of real-world data in clinical studies to support regulatory decisions (2021). Guidance for conducting studies within electronic healthcare databases can also be found in the International Society for Pharmacoepidemiology Guidelines for Good Pharmacoepidemiology Practices (ISPE GPP, 2015), in particular section IV-B (Study conduct, Data collection). This guidance emphasises the importance of patient data protection.
The use of real-world data (RWD) for the generation of real-world evidence (RWE) for regulatory decision-making has been addressed by guidelines issued by regulatory agencies. The article Real-World Data for Regulatory Decision Making: Challenges and Possible Solutions for Europe (Clin Pharmacol Ther. 2019;106(1):36-9) describes the operational, technical and methodological challenges for the acceptability of RWD for regulatory purposes and presents possible solutions to address these challenges. The draft FDA guidance Real-World Data: Assessing Electronic Health Records and Medical Claims Data To Support Regulatory Decision-Making for Drug and Biological Products (2021) provides recommendations focused on regulatory studies using electronic health records and claims databases, and a more general draft guidance provides Considerations for the Use of Real-World Data and Real-World Evidence To Support Regulatory Decision-Making for Drug and Biological Products (December 2021). More information on RWD and RWE is available in Chapter 16.7, Real-world evidence and pharmacoepidemiology.
The Joint ISPE-ISPOR Special Task Force Report on Good Practices for Real-World Data Studies of Treatment and/or Comparative Effectiveness (2017) recommends good research practices for designing and analysing retrospective databases for comparative effectiveness research (CER) and reviews methodological issues and possible solutions for CER studies based on secondary data analysis (see also Chapter 16.1). Many of the principles are applicable to studies with objectives other than CER, but some aspects of pharmacoepidemiological studies based on secondary use of data, such as data quality, ethical issues, data ownership and privacy, are not covered.
Most of the examples and methods covered in Chapter 4 are based on studies and methodological developments concerning secondary use of healthcare databases, since this is one of the most frequent approaches used in pharmacoepidemiology.