16.1.1. Introduction
16.1.2. Methods for comparative effectiveness research
16.1.3. Methods for REA
16.1.4. Specific aspects
16.1.1. Introduction
Comparative effectiveness research (CER) is designed to inform healthcare decisions for the prevention, diagnosis and treatment of a given health condition. CER therefore compares the potential benefits and harms of therapeutic strategies available in routine practice. The compared interventions may be similar treatments, such as competing medicines within the same class or with different mechanisms of action, or different therapeutic approaches, such as surgical procedures and drug therapies. The comparison may focus only on the relative medical benefits and risks of the different options, or it may weigh both their costs and their benefits. The Methods of Comparative Effectiveness Research (Annu Rev Public Health 2012;33:425-45) defines the key elements of CER as a) a head-to-head comparison of active treatments, b) a study population typical of day-to-day clinical practice, and c) evidence focussed on informing healthcare and tailored to the characteristics of individual patients. CER is often discussed in the regulatory context of real-world evidence (RWE) generated by clinical trials or non-interventional (observational) studies using real-world data (RWD) (see Chapter 16.6).
The term ‘Relative effectiveness assessment (REA)’ is also used when comparing multiple technologies or a new technology against standard of care, while ‘rapid’ REA refers to performing an assessment within a limited timeframe in the case of a new marketing authorisation or a new indication granted for an approved medicine (see What is a rapid review? A methodological exploration of rapid reviews in Health Technology Assessments, Int J Evid Based Healthc. 2012;10(4):397-410).
16.1.2. Methods for comparative effectiveness research
CER may use a variety of data sources and methods. Methods to generate evidence for CER are divided below into four categories according to the data source: randomised clinical trials (RCTs), observational data, synthesis of published RCTs and cross-design synthesis.
16.1.2.1. CER based on randomised clinical trials
RCTs are considered the gold standard for demonstrating the efficacy of medicinal products but they rarely measure the benefits, risks or comparative effectiveness of an intervention in post-authorisation clinical practice. Moreover, relatively few RCTs are designed with an alternative therapeutic strategy as a comparator, which limits the utility of the resulting data in establishing recommendations for treatment choices. For these reasons, other methodologies such as pragmatic trials and large simple trials may be used to complement traditional confirmatory RCTs in CER. These trials are discussed in Chapter 4.2.7. The estimand framework described in the ICH E9-R1 Addendum on Estimands and Sensitivity Analysis in Clinical Trials to the Guideline on Statistical Principles for Clinical Trials (2019) should be considered in the planning of comparative effectiveness trials as it provides coherence and transparency on important elements of CER, namely definitions of exposures, endpoints, intercurrent events (ICEs), strategies to manage ICEs, approach to missing data and sensitivity analyses.
In order to facilitate comparison of CER results between clinical trials, the COMET (Core Outcome Measures in Effectiveness Trials) Initiative aims at developing agreed standardised minimum sets of outcomes (‘core outcome sets’, COS) to be assessed and reported in effectiveness trials of a specific condition. Choosing Important Health Outcomes for Comparative Effectiveness Research: An Updated Review and User Survey (PLoS One 2016;11(1):e0146444) provides an updated review of studies that have addressed the development of COS for measurement and reporting in clinical trials. It is also worth noting that regulatory disease guidelines establish outcomes of clinical interest for assessing whether a new therapeutic intervention works. Use of the same endpoints across RCTs thus facilitates comparisons.
16.1.2.2. CER using observational data
Use of observational data in CER
Although observational data from Phase IV trials, post-authorisation safety studies (PASS), or other RWD sources can be used to assess comparative effectiveness (and safety), it is generally inappropriate to use such data as a replacement for randomised evidence, especially in a confirmatory setting. Emulation of Randomized Clinical Trials With Nonrandomized Database Analyses: Results of 32 Clinical Trials (JAMA 2023;329(16):1376-85) concludes that RWE studies can reach similar conclusions as RCTs when design and measurements can be closely emulated, but this may be difficult to achieve. Concordance in results varied depending on the agreement metric. Emulation differences, chance, and residual confounding can contribute to divergence in results and are difficult to disentangle. When and How Can Real World Data Analyses Substitute for Randomized Controlled Trials? (Clin Pharmacol. Ther. 2017;102(6):924-33) suggests that RWE may be preferred over randomised evidence when studying a highly promising treatment for a disease with no other available treatment and where ethical considerations may preclude randomising patients to placebo, particularly if the disease is likely to result in severely compromised quality of life or mortality. In these cases, RWE could support medicines regulation by providing evidence on the safety and effectiveness of the therapy against the typical disease progression observed in the absence of treatment. This comparator disease trajectory may be assessed from historical controls that were diagnosed prior to the availability of the new treatment, or other sources.
When Can We Rely on Real-World Evidence to Evaluate New Medical Treatments? (Clin Pharmacol Ther. 2021; 111(1): 30–4) recommends that decisions regarding use of RWE in the evaluation of new treatments should depend on the specific research question, characteristics of the potential study settings and characteristics of the settings where study results would be applied, and take into account three dimensions in which RWE studies might differ from traditional clinical trials: use of RWD, delivery of real-world treatment and real-world treatment assignment. Observational data have, for instance, been used in proof-of-concept studies on anaplastic lymphoma kinase-positive non-small cell lung cancer, in pivotal trials on acute lymphoblastic leukaemia, thalassemia syndrome and haemophilia A, and in studies aimed at label expansion for epilepsy (see Characteristics of non-randomised studies using comparisons with external controls submitted for regulatory approval in the USA and Europe: a systematic review, BMJ Open. 2019;1;9(2):e024895; The Use of External Controls in FDA Regulatory Decision Making, Ther Innov Regul Sci. 2021;55(5):1019–35; and Application of Real-World Data to External Control Groups in Oncology Clinical Trial Drug Development, Front Oncol. 2022;11:695936).
Outside of specific circumstances, observational data and clinical trials are considered complementary in generating comprehensive evidence. For example, clinical trials may include historical controls from observational studies, or identify eligible study participants from disease registries. In defense of pharmacoepidemiology--embracing the yin and yang of drug research (N Engl J Med 2007;357(22):2219-21) shows that the strengths and weaknesses of RCTs and observational studies may make both designs necessary in the study of drug effects. Hybrid approaches for CER allow clinical trials to be enriched with observational data, for example:
Use of historical data to partially replace concurrent controls in randomised trials (see A roadmap to using historical controls in clinical trials - by Drug Information Association Adaptive Design Scientific Working Group (DIA-ADSWG), Orphanet J Rare Dis. 2020;15:69);
Use of historical data as prior evidence for relative treatment effects (see Prior Elicitation for Use in Clinical Trial Design and Analysis: A Literature Review, Int J Environ Res Public Health 2021;18(4):1833);
Construction of external control groups in single arm studies and Phase IV trials (see the draft FDA guidance Considerations for the Design and Conduct of Externally Controlled Trials for Drug and Biological Products (2023), A Review of Causal Inference for External Comparator Arm Studies (Drug Saf. 2022;45(8):815-37) and Methods for external control groups for single arm trials or long-term uncontrolled extensions to randomized clinical trials, Pharmacoepidemiol Drug Saf. 2020; 29(11):1382–92).
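As a simple illustration of the construction of an external control group mentioned in the last point above, the following Python sketch matches each patient of a single-arm trial to one external (RWD) control by 1:1 nearest-neighbour propensity score matching without replacement. The column names, covariate list and caliper value are illustrative assumptions, not prescribed by the cited guidance; a real study would add covariate balance diagnostics and sensitivity analyses.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

def match_external_controls(trial: pd.DataFrame, external: pd.DataFrame,
                            covariates: list, caliper: float = 0.05) -> pd.DataFrame:
    """Select one external control per trial patient, matched without
    replacement on the propensity of belonging to the trial cohort."""
    pooled = pd.concat([trial.assign(in_trial=1), external.assign(in_trial=0)],
                       ignore_index=True)
    ps = LogisticRegression(max_iter=1000).fit(pooled[covariates], pooled["in_trial"])
    pooled["ps"] = ps.predict_proba(pooled[covariates])[:, 1]
    pool = pooled[pooled["in_trial"] == 0].copy()
    matches = []
    for _, case in pooled[pooled["in_trial"] == 1].iterrows():
        dist = (pool["ps"] - case["ps"]).abs()
        if dist.empty or dist.min() > caliper:
            continue  # no acceptable control within the caliper
        best = dist.idxmin()
        matches.append(pool.loc[best])
        pool = pool.drop(index=best)  # match without replacement
    return pd.DataFrame(matches)
```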
Methods for CER using observational data
The use of non-randomised data for causal inference is notoriously prone to various sources of bias. For this reason, it is strongly recommended to carefully design or select the source of RWD and to adopt statistical methods that acknowledge and adjust for major risks of bias (e.g. confounding, missing data).
A framework to address these challenges adopts counterfactual theory to treat the observational study as an emulation of a randomised trial. Target trial emulation (described in Chapter 4.2.6.) is a strategy that uses existing tools and methods to formalise the design and analysis of observational studies. It stimulates investigators to identify potential sources of concern and to develop a design that best addresses these concerns and the risk of bias.
Target trial emulation consists of first designing a hypothetical ideal randomised trial (“target trial”) that would answer the research question. A second step identifies how to best emulate the design elements of the target trial (including its eligibility criteria, treatment strategies, assignment procedure, follow-up, outcome, causal contrasts and pre-specified analysis plan) using the available observational data source, and which analytic approaches to apply given the trade-offs of an observational setting. This approach may prevent some common biases, such as immortal time bias or prevalent user bias, while also identifying situations where adequate emulation may not be possible using the data at hand.
Emulating a Target Trial of Interventions Initiated During Pregnancy with Healthcare Databases: The Example of COVID-19 Vaccination (Epidemiology 2023;34(2):238-46) describes a step-by-step specification of the protocol components of a target trial and their emulation, including sensitivity analyses using negative controls to evaluate the presence of confounding and, as an alternative to a cohort design, a case-crossover or case-time-control design to eliminate confounding by unmeasured time-fixed factors. Comparative Effectiveness of BNT162b2 and mRNA-1273 Vaccines in U.S. Veterans (N Engl J Med. 2022;386(2):105-15) used target trial emulation to design a study where recipients of each vaccine were matched in a 1:1 ratio according to their baseline risk factors. This design could not be applied where baseline measurements are not collected at treatment start, which may be the case in some patient registries. Use of the estimand framework of the ICH E9 (R1) Addendum to design the target trial may increase transparency on the choices and assumptions needed in the observational study to emulate key trial protocol components, such as the estimand, exposure, intercurrent events (and the strategies to manage them), missing data and sensitivity analyses, and may therefore help evaluate the extent to which the observational study addresses the same question as the target trial. Studies on the effect of treatment duration are also often impaired by selection bias: How to estimate the effect of treatment duration on survival outcomes using observational data (BMJ. 2018;360:k182) proposes a 3-step approach (cloning, censoring, weighting) that could be used with target trial emulation to achieve better comparability with the treatment assignment performed in the trial and overcome bias in the observational study.
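The cloning and artificial censoring steps of this clone-censor-weight approach can be sketched as follows. This is a deliberately simplified illustration with assumed column names (id, t, event, disc_time), comparing the strategies “discontinue treatment within 6 months” versus “continue beyond 6 months”; the inverse probability of censoring weighting step, which corrects for the informative artificial censoring, is omitted for brevity.

```python
import numpy as np
import pandas as pd

def clone_censor(df: pd.DataFrame, grace_end: float = 6.0) -> pd.DataFrame:
    """df: one row per patient with follow-up time `t` (months), event
    indicator `event`, and `disc_time` (treatment discontinuation time,
    NaN if treatment was never discontinued). Returns two clones per
    patient, one per strategy arm, artificially censored on deviation."""
    rows = []
    for _, p in df.iterrows():
        stopped = (not np.isnan(p["disc_time"])) and p["disc_time"] <= grace_end
        # Clone assigned to "discontinue within 6 months": censored at the
        # end of the grace period if the patient in fact continued.
        rows.append({"id": p["id"], "arm": "discontinue",
                     "t": p["t"] if stopped else min(p["t"], grace_end),
                     "event": p["event"] if (stopped or p["t"] <= grace_end) else 0})
        # Clone assigned to "continue beyond 6 months": censored at the
        # moment the patient deviates by discontinuing early.
        rows.append({"id": p["id"], "arm": "continue",
                     "t": min(p["t"], p["disc_time"]) if stopped else p["t"],
                     "event": 0 if (stopped and p["disc_time"] < p["t"]) else p["event"]})
    return pd.DataFrame(rows)
```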
Statistical inference methods that can be used for conducting causal inference in non-interventional studies are described in Chapter 6.2.3 and include multivariable regression (to adjust for confounding, missing data, measurement error and other sources of bias), propensity score methods (to adjust for confounding bias), prognostic or disease risk score methods (to adjust for confounding), G-methods and marginal structural models (to adjust for time-dependent confounding), and imputation methods (to adjust for missing data). In some situations, instrumental variable methods or the prior event rate ratio approach can also be used to address unmeasured confounding. Causal Inference in Oncology Comparative Effectiveness Research Using Observational Data: Are Instrumental Variables Underutilized? (J Clin Oncol. 2023;41(13):2319-22) summarises the key assumptions, advantages and disadvantages of methods of causal inference in CER to adjust for confounding, including regression adjustment, propensity scores, difference-in-differences, regression discontinuity and instrumental variables, highlighting that different methods can be combined. In some cases, observational studies may substantially benefit from collecting instrumental variables, and this should be considered early on when designing the study. For example, Dealing with missing data using the Heckman selection model: methods primer for epidemiologists (Int J Epidemiol. 2023;52(1):5-13) illustrates the use of instrumental variables to address data that are missing not at random. Another example is discussed in Association of Osteoporosis Medication Use After Hip Fracture With Prevention of Subsequent Nonvertebral Fractures: An Instrumental Variable Analysis (JAMA Netw Open. 2018;1(3):e180826), where instrumental variables are used to adjust for unobserved confounders.
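As an illustration of one of the propensity score methods listed above, the sketch below estimates a marginal risk difference using stabilised inverse probability of treatment weighting. The confounders (age, severity) and column names are assumptions chosen for the example; in practice, robust standard errors and balance diagnostics would be added.

```python
import pandas as pd
import statsmodels.formula.api as smf

def iptw_risk_difference(df: pd.DataFrame) -> float:
    """Marginal effect of binary `treated` on binary `outcome`,
    confounded by `age` and `severity` (assumed column names)."""
    # Step 1: propensity model for treatment assignment.
    ps_model = smf.logit("treated ~ age + severity", data=df).fit(disp=0)
    df = df.assign(ps=ps_model.predict(df))
    # Step 2: stabilised weights - marginal treatment prevalence over the
    # propensity of the treatment actually received.
    p_t = df["treated"].mean()
    df["w"] = df["treated"] * p_t / df["ps"] + \
              (1 - df["treated"]) * (1 - p_t) / (1 - df["ps"])
    # Step 3: weighted outcome model; the `treated` coefficient is the
    # weighted risk difference (robust SEs would be used in practice).
    fit = smf.wls("outcome ~ treated", data=df, weights=df["w"]).fit()
    return fit.params["treated"]
```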
The Agency for Healthcare Research and Quality (AHRQ)’s Developing a Protocol for Observational Comparative Effectiveness Research: A User’s Guide (2013) identifies minimal standards and best practices for observational CER. It provides principles on a wide range of topics for designing research and developing protocols, with relevant questions to be addressed and checklists of key elements to be considered. The RWE Navigator website discusses methods using observational RWD with a focus on effectiveness research, such as sources of RWD, study designs, approaches to summarising and synthesising the evidence, modelling of effectiveness, methods to adjust for bias, and governance aspects. It also presents a glossary of terms and case studies.
A roadmap to using historical controls in clinical trials - by Drug Information Association Adaptive Design Scientific Working Group (DIA-ADSWG) (Orphanet J Rare Dis. 2020;15:69) describes methods to minimise disadvantages of using historical controls in clinical trials, i.e. frequentist methods (e.g. propensity score methods and meta-analytical approach) or Bayesian methods (e.g. power prior method, adaptive designs and the meta-analytic combined [MAC] and meta-analytic predictive [MAP] approaches for meta-analysis). It also provides recommendations on approaches to apply historical controls when they are needed while maximising scientific validity to the extent feasible.
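To make the power prior idea mentioned above concrete, the sketch below shows the conjugate beta-binomial case for borrowing a historical control response rate. The discount factor a0 and the event counts are illustrative assumptions, not values from the cited article; in practice a0 is often chosen adaptively or given its own prior.

```python
from scipy import stats

def power_prior_posterior(hist_events, hist_n, cur_events, cur_n, a0=0.5):
    """Posterior for a control response rate under a vague Beta(1, 1)
    initial prior, with the historical binomial likelihood raised to the
    power a0 in [0, 1] (a0=0 ignores history, a0=1 pools fully)."""
    alpha = 1 + a0 * hist_events + cur_events
    beta = 1 + a0 * (hist_n - hist_events) + (cur_n - cur_events)
    return stats.beta(alpha, beta)

post = power_prior_posterior(hist_events=30, hist_n=100, cur_events=12, cur_n=40)
print(post.mean(), post.interval(0.95))  # posterior mean and 95% credible interval
```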
In the context of hybrid studies, key methodological issues to be considered when combining RWD and RCT data include:
Differences between the RWD and RCT in terms of data quality and applicability,
Differences between available RWD sources (e.g., due to heterogeneity in studied populations, differences in study design, etc.),
Risk of bias (particularly for RWD),
Generalisability (especially for RCT findings beyond the overall treatment effect).
Methods for systematic reviews and meta-analyses of observational studies are presented in Chapter 10 and Annex 1 of this Guide. They are also addressed in the Cochrane Handbook for Systematic Reviews of Interventions and the Methods Guide for Effectiveness and Comparative Effectiveness Reviews presented in section 16.1.2.3 of this Chapter.
Assessment of observational studies used in CER
Given the potential for bias and confounding in CER based on observational non-randomised studies, the design and results of such studies need to be adequately assessed. The Good ReseArch for Comparative Effectiveness (GRACE) principles (IQVIA, 2016) provide guidance to enhance the quality of observational CER studies and support their evaluation for decision-making using the provided checklist. How well can we assess the validity of non-randomised studies of medications? A systematic review of assessment tools (BMJ Open 2021;11:e043961) examined whether assessment tools for non-randomised studies address critical elements that influence the validity of findings from non-randomised studies for CER. It concludes that major design-specific sources of bias (e.g., lack of new-user design, lack of active comparator design, time-related bias, depletion of susceptibles, reverse causation) and statistical assessment of internal and external validity are not sufficiently addressed in most of the tools evaluated, although these critical elements should be integrated to systematically investigate the validity of non-randomised studies on comparative safety and effectiveness of medications. The article also provides a glossary of terms, a description of the characteristics of the tools and a description of the methodological challenges they address.
Comparison of results of observational studies and RCTs
Even if observational studies are not appropriate to replace RCTs for many CER topics and cannot answer exactly the same research question, comparison of their results for the same objective is currently a domain of interest. The underlying assumption is that if observational studies consistently match the results of published trials and predict the results of ongoing trials, this may increase confidence in the validity of future RWD analyses performed in the absence of randomised trial evidence. In a review of five interventions, Randomized, controlled trials, observational studies, and the hierarchy of research designs (N Engl J Med 2000;342(25):1887-92) found that the results of well-designed observational studies (with either a cohort or case-control design) did not systematically overestimate the magnitude of treatment effects. Interim results from the first 10 emulations reported in Emulating Randomized Clinical Trials With Nonrandomized Real-World Evidence Studies: First Results From the RCT DUPLICATE Initiative (Circulation 2021;143(10):1002-13) found that differences between the RCT and corresponding RWE study populations remained, but the RWE emulations achieved a hazard ratio estimate within the 95% CI of the corresponding RCT in 8 of 10 studies. Selection of active comparator therapies with similar indications and use patterns enhanced the validity of RWE. Final results of this project are discussed in the presentation Lessons Learned from Trial Replication Analyses: Findings from the DUPLICATE Demonstration Project (Duke-Margolis Center for Health Policy Workshop, 10 May 2022). Emulation Differences vs. Biases When Calibrating Real-World Evidence Findings Against Randomized Controlled Trials (Clin Pharmacol Ther. 2020;107(4):735-7) provides guidance on how to investigate and interpret differences in treatment effect estimates from the two study types.
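For illustration, the sketch below computes two agreement metrics of the kind used in such calibration exercises: “estimate agreement” (whether the RWE hazard ratio falls within the RCT 95% CI) and a z-test on the difference of the two log hazard ratios. The numerical inputs are invented for the example, and the exact metric definitions of the cited initiative may differ.

```python
import numpy as np
from scipy import stats

def agreement(hr_rwe, ci_rwe, hr_rct, ci_rct):
    # Estimate agreement: the RWE point estimate lies in the RCT 95% CI.
    estimate_agreement = ci_rct[0] <= hr_rwe <= ci_rct[1]
    # Recover standard errors of the log HRs from the 95% CI bounds.
    se_rwe = (np.log(ci_rwe[1]) - np.log(ci_rwe[0])) / (2 * 1.96)
    se_rct = (np.log(ci_rct[1]) - np.log(ci_rct[0])) / (2 * 1.96)
    # Two-sided z-test for a difference between the two log HRs.
    z = (np.log(hr_rwe) - np.log(hr_rct)) / np.hypot(se_rwe, se_rct)
    return estimate_agreement, 2 * stats.norm.sf(abs(z))

print(agreement(hr_rwe=0.85, ci_rwe=(0.70, 1.03), hr_rct=0.79, ci_rct=(0.66, 0.94)))
```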
An important source of selection bias leading to discrepancies between results of observational studies and RCTs may be the use of prevalent drug users in the former. Evaluating medication effects outside of clinical trials: new-user designs (Am J Epidemiol 2003;158(9):915-20) explains the biases introduced by the use of prevalent drug users and how a new-user (or incident user) design eliminates these biases by restricting analyses to persons under observation at the start of the current course of treatment. The incident user design in comparative effectiveness research (Pharmacoepidemiol Drug Saf. 2013;22(1):1-6) reviews published CER case studies in which investigators used the incident user design and discusses its strengths (reduced bias) and weaknesses (reduced precision of comparative effectiveness estimates). Unless otherwise justified, the incident user design should always be used.
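A minimal sketch of incident user selection from a dispensing table is shown below. It assumes one row per dispensing of the drug class of interest with columns patient_id, disp_date and observation start date obs_start (all assumed names), and applies a 365-day washout before the first observed dispensing.

```python
import pandas as pd

def incident_users(disp: pd.DataFrame, washout_days: int = 365) -> pd.DataFrame:
    """Return one row per incident user: patients whose first recorded
    dispensing is preceded by a fully observable washout window."""
    disp = disp.sort_values(["patient_id", "disp_date"])
    first = disp.groupby("patient_id", as_index=False).first()
    # Require the full washout window to be observable before first use,
    # so that no earlier use of the drug class can have been missed.
    observable = (first["disp_date"] - first["obs_start"]).dt.days >= washout_days
    return first[observable]
```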
16.1.2.3. CER based on evidence synthesis of published RCTs
The Cochrane Handbook for Systematic Reviews of Interventions (version 6.2, 2022) describes in detail the process of preparing and maintaining systematic reviews on the effects of healthcare interventions. Although its scope is focused on Cochrane reviews, it has a much wider applicability. It includes guidance on the standard methods applicable to every review (planning a review, searching and selecting studies, data collection, risk of bias assessment, statistical analysis, GRADE and interpreting results), as well as more specialised topics. The GRADE (Grading of Recommendations Assessment, Development and Evaluation) working group offers a structured process for rating quality of evidence and grading strength of recommendations in systematic reviews, health technology assessment and clinical practice guidelines. The Methods Guide for Effectiveness and Comparative Effectiveness Reviews (AHRQ, 2018) provides resources supporting comparative effectiveness reviews. They are focused on the US Effective Health Care (EHC) programme and may therefore have limitations as regards their generalisability.
A pairwise meta-analysis of RCT results is used when the primary aim is to estimate the relative effect of two interventions. Network meta-analysis for indirect treatment comparisons (Stat Med. 2002;21:2313-24) introduces methods for assessing the relative effectiveness of two treatments when they have not been compared directly in a randomised trial but have each been compared to other treatments. Overview of evidence synthesis and network meta-analysis – RWE Navigator discusses methods and best practices and gives access to published articles on this topic. A prominent issue that has been overlooked by some systematic literature reviews and network meta-analyses is that the RCTs included in a network meta-analysis are usually not fully comparable with each other, even when they all used placebo as the comparator. Different screening and inclusion/exclusion criteria often create different patient groups, and these differences are rarely discussed in indirect comparisons. Before indirect comparisons are performed, researchers should therefore check the similarities and differences between the RCTs.
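The simplest anchored indirect comparison, often attributed to Bucher et al., can be sketched as follows: the A-versus-B effect is derived from trials that each compared their treatment against a common comparator C (e.g., placebo). The inputs below are illustrative log hazard ratios and standard errors, not data from any cited study.

```python
import numpy as np

def bucher(log_hr_ac, se_ac, log_hr_bc, se_bc):
    """Anchored indirect A-vs-B effect from A-vs-C and B-vs-C trials."""
    log_hr_ab = log_hr_ac - log_hr_bc
    se_ab = np.sqrt(se_ac**2 + se_bc**2)  # variances add: independent trials
    ci = np.exp(log_hr_ab + np.array([-1.96, 1.96]) * se_ab)
    return np.exp(log_hr_ab), ci

# Illustrative inputs: HR(A vs C) = 0.70, HR(B vs C) = 0.85.
print(bucher(np.log(0.70), 0.10, np.log(0.85), 0.12))
```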
16.1.2.4. CER based on cross-design synthesis
Decision-making should ideally be based on all available evidence, including both randomised and non-randomised studies, and on both individual patient data and published aggregated data. Clinical trials are well suited to investigating efficacy but less practical for studying long-term outcomes or rare diseases. On the other hand, observational data offer important insights about treatment populations, long-term outcomes (e.g., safety), patient-reported outcomes, prescription patterns and active comparators. Combining evidence from these two sources could therefore be helpful to reach certain effectiveness/safety conclusions earlier or to address more complex questions. Several methods have been proposed but are still experimental. The article Framework for the synthesis of non-randomised studies and randomised controlled trials: a guidance on conducting a systematic review and meta-analysis for healthcare decision making (BMJ Evid Based Med. 2022;27(2):109-19) uses a 7-step mixed methods approach to develop guidance on when and how to best combine evidence from non-randomised studies and RCTs to improve transparency and build confidence in summary effect estimates. It provides recommendations on the most appropriate statistical approaches based on analytical scenarios in healthcare decision making and highlights potential challenges for the implementation of this approach.
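As a schematic illustration only (the cited framework recommends selecting the statistical approach per analytical scenario), the sketch below pools randomised and non-randomised estimates by fixed-effect inverse-variance weighting while down-weighting the non-randomised ones through an assumed variance inflation factor k; all numbers are invented for the example.

```python
import numpy as np

def pooled_effect(estimates, ses, is_rct, k=2.0):
    """Fixed-effect inverse-variance pooling with the variance of
    non-randomised estimates inflated by an assumed bias factor k."""
    est = np.asarray(estimates, dtype=float)
    var = np.asarray(ses, dtype=float) ** 2
    var = np.where(np.asarray(is_rct, dtype=bool), var, k * var)
    w = 1.0 / var
    pooled = np.sum(w * est) / np.sum(w)
    return pooled, np.sqrt(1.0 / np.sum(w))  # pooled log HR and its SE

# Two RCTs and one observational study, on the log hazard ratio scale.
print(pooled_effect([-0.25, -0.30, -0.20], [0.10, 0.12, 0.08], [True, True, False]))
```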
16.1.3. Methods for REA
Methodological Guidelines for Rapid REA of Pharmaceuticals (EUnetHTA, 2013) cover a broad spectrum of issues on REA. They address methodological challenges that are encountered by health technology assessors while performing rapid REA and provide and discuss practical recommendations on definitions to be used and how to extract, assess and present relevant information in assessment reports. Specific topics covered include the choice of comparators, strengths and limitations of various data sources and methods, internal and external validity of studies, the selection and assessment of endpoints and the evaluation of relative safety.
16.1.4. Specific aspects
16.1.4.1. Secondary use of data for CER
Electronic healthcare records, patient registries and other data sources are increasingly used in clinical effectiveness studies as they capture real clinical encounters and may document reasons for treatment decisions that are relevant for the general patient population. As they are primarily designed for clinical care and not research, information on relevant covariates and in particular on confounding factors may not be available or adequately measured. These aspects are presented in other chapters of this Guide (see Chapter 6, Methods to address bias and confounding; Chapter 8.2, Secondary use of data; and other chapters for secondary use of data in other contexts) but they need to be specifically considered in the context of CER. For example, the Drug Information Association Adaptive Design Scientific Working Group (DIA-ADSWG) Roadmap to using historical controls in clinical trials (Orphanet J Rare Dis. 2020;15:69) describes the main sources of RWD to be used as historical controls, with an Appendix providing guidance on factors to be evaluated in the assessment of the relevance of RWD sources and resultant analyses.
16.1.4.2. Data quality
Data quality is essential to ensure the rigour of CER, and secondary use of data requires special attention. Comparative Effectiveness Research Using Electronic Health Records Data: Ensure Data Quality (SAGE Research Methods, 2020) discusses challenges and shares experiences encountered in the process of transforming electronic health record data into a research-quality dataset for CER. This aspect and other quality issues are also discussed in Chapter 13 on Quality management.
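The kind of checks applied when curating EHR data into a research-quality dataset can be sketched as follows; the column names and rules (completeness, plausibility and temporal consistency checks) are assumptions chosen for illustration rather than a fixed standard.

```python
import pandas as pd

def quality_report(df: pd.DataFrame) -> pd.DataFrame:
    """Fraction of rows flagged by a few basic data quality rules."""
    checks = {
        "missing_birth_date": df["birth_date"].isna().mean(),
        "missing_diagnosis_code": df["diagnosis_code"].isna().mean(),
        "implausible_age": ((df["age"] < 0) | (df["age"] > 120)).mean(),
        "event_before_index_date": (df["event_date"] < df["index_date"]).mean(),
        "duplicate_patient_rows": df.duplicated(subset="patient_id").mean(),
    }
    return pd.DataFrame({"check": list(checks), "fraction_flagged": list(checks.values())})
```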
In order to address missing information, some CER studies have attempted to integrate information from healthcare databases with information collected ad hoc from study subjects. Enhancing electronic health record measurement of depression severity and suicide ideation: a Distributed Ambulatory Research in Therapeutics Network (DARTNet) study (J Am Board Fam Med. 2012;25(5):582-93) shows the value of linking direct measurements and pharmacy claims data to data from electronic healthcare records. Assessing medication exposures and outcomes in the frail elderly: assessing research challenges in nursing home pharmacotherapy (Med Care 2010;48(6 Suppl):S23-31) describes how merging longitudinal electronic clinical and functional data from nursing home sources with Medicare and Medicaid claims data can support unique study designs in CER but poses many challenging design and analytic issues.
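A toy example of the kind of deterministic linkage described above is sketched below; the identifiers, column names and 30-day window are assumptions, and real linkages typically also involve probabilistic matching and extensive validation.

```python
import pandas as pd

def link_ehr_claims(ehr: pd.DataFrame, claims: pd.DataFrame) -> pd.DataFrame:
    """Attach pharmacy claims to EHR encounters of the same patient when
    the dispensing falls within 30 days of the encounter date."""
    merged = ehr.merge(claims, on="patient_id", how="left", suffixes=("", "_claim"))
    in_window = (merged["dispense_date"] - merged["encounter_date"]).dt.days.abs() <= 30
    # Keep matched rows, plus encounters with no claim at all.
    return merged[in_window | merged["dispense_date"].isna()]
```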
16.1.4.3. Transparency and reproducibility
Clear and transparent study protocols for observational CER should be used to support the evaluation, interpretation and reproducibility of results. Use of the HARPER protocol template (HARmonized Protocol Template to Enhance Reproducibility of hypothesis evaluating real-world evidence studies on treatment effects: A good practices report of a joint ISPE/ISPOR task force, Pharmacoepidemiol Drug Saf. 2023;32(1):44-55) is recommended to facilitate protocol development and to address important design components. Public registration and posting of the protocol, disease and drug code lists, and statistical programming are strongly recommended to ensure that results from comparative effectiveness studies can be replicated using the same data and/or design, as emphasised in Journal of Comparative Effectiveness Research welcoming the submission of study design protocols to foster transparency and trust in real-world evidence (J Comp Eff Res. 2023;12(1):e220197). The EU PAS Register and ClinicalTrials.gov should be used for this purpose.