ENCePP Guide on Methodological Standards in Pharmacoepidemiology


4.2. Types of study design



4.2.1. Cohort studies

4.2.2. Case-control studies

4.2.3. Case-only designs

4.2.4. Cross-sectional studies

4.2.5. Ecological studies and case-population studies

4.2.6. Target trial emulation

4.2.7. Pragmatic trials and large simple trials



This chapter briefly describes the main types of study designs. Specific aspects or applications of these designs are presented in Chapter 4.4. These designs are fully described in several textbooks cited in the Introduction, for example, Modern Epidemiology 4th ed. (T. Lash, T.J. VanderWeele, S. Haneuse, K. Rothman. Wolters Kluwer, 2020).


The choice of the study design should be primarily driven by the need to obtain valid evidence regarding the objective(s) of the study by mitigating the risk of selection bias, information bias and confounding (see Chapter 6). Using Big Data to Emulate a Target Trial When a Randomized Trial Is Not Available (Am J Epidemiol. 2016;183(8):758-64) has proposed target trial emulation as a strategy that uses existing tools and methods to formalise the design and analysis of observational studies. It stimulates investigators to identify potential sources of concerns and develop a design that best addresses these concerns and the risk of bias. Target trial emulation is described in Chapter 4.2.6. The increasing ability to use electronic data from routine healthcare systems has opened up new opportunities for investigators to conduct studies. Many investigators use the data source(s) they have access to and are familiar with in terms of potential bias, confounding and missing data.


4.2.1. Cohort studies


In a cohort study, the investigator identifies a population from which the study subjects will be drawn, defines two or more groups of subjects (referred to as study cohorts) who are at risk for the outcome of interest and differ according to their exposure, and follows them over time to observe the occurrence of the outcome of interest in the exposed and unexposed cohorts. A cohort study may also include a single cohort that is heterogeneous with respect to exposure history, and occurrence of the outcome is measured and compared between exposure groups within the cohort. The amount of follow-up of each subject in the cohorts is counted and the total person-time experience serves as the denominator for the calculation of the incidence rate of the outcome of interest. Cohorts are called fixed when individuals may not move from one exposure group to the other. They are called closed when entry is not allowed after the cohort’s inception. The population of a cohort may also be called dynamic (or open) if it can gain and lose members who contribute to the person-time experience for the duration of their presence in the cohort. The main advantages of a cohort study are the ability to calculate directly interpretable incidence rates of an outcome and to investigate multiple outcomes for a given exposure. The cohort design is also well suited to studies using large electronic records (such as electronic healthcare records and administrative claims data) where individual data are collected over long periods of time, making it possible to study the effect of drug exposures on outcomes occurring later. Disadvantages are the need for a large sample size and possibly a long study duration to study rare outcomes, although the use of existing electronic healthcare databases allows large cohorts to be observed and analysed retrospectively (see Chapters 8.2 and 9).
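As a minimal illustration of how person-time serves as the denominator described above, the following sketch computes incidence rates and a crude rate ratio for two cohorts. The function names and all counts are hypothetical and chosen only for illustration; no adjustment for confounding is attempted.

```python
def incidence_rate(events, person_years, per=1000.0):
    """Number of events per `per` person-years of follow-up."""
    if person_years <= 0:
        raise ValueError("person-time must be positive")
    return events * per / person_years


def rate_ratio(events_exp, py_exp, events_unexp, py_unexp):
    """Crude incidence rate ratio, exposed vs. unexposed cohort."""
    return incidence_rate(events_exp, py_exp) / incidence_rate(events_unexp, py_unexp)


# Hypothetical counts (assumed for illustration only)
ir_exposed = incidence_rate(30, 10_000)    # 3.0 per 1,000 person-years
ir_unexposed = incidence_rate(20, 20_000)  # 1.0 per 1,000 person-years
irr = rate_ratio(30, 10_000, 20, 20_000)   # 3.0
```

In a dynamic cohort, each subject would contribute person-time only for the duration of their presence in the cohort, so the denominators would be summed over individual follow-up periods rather than supplied as totals.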


Cohort studies are commonly used in pharmacoepidemiology to study the utilisation and effects of medicinal products. At the beginning of the COVID-19 pandemic, it was the design of choice to compare the risk and severity of SARS-CoV-2 infection in persons using or not using certain types of medicines. An example is Renin-angiotensin system blockers and susceptibility to COVID-19: an international, open science, cohort analysis (Lancet Digit Health 2021;3(2):e98-e114) where electronic health records were used to identify and follow patients aged 18 years or older with at least one prescription for RAS blockers, calcium channel blockers, thiazide or thiazide-like diuretics. Four outcomes were assessed: COVID-19 diagnosis, hospital admission with COVID-19, hospital admission with pneumonia, and hospital admission with pneumonia, acute respiratory distress syndrome, acute kidney injury, or sepsis.


4.2.2. Case-control studies


In a case-control study, the investigator first identifies cases of the outcome of interest and establishes their exposure status, but the denominators (person-time of observation) to calculate their incidence rates are not measured. A referent (traditionally called “control”) group without the outcome of interest is then sampled to estimate the relative distribution of the exposed and unexposed denominators in the source population from which the cases originate. Only the relative size of the incidence rates can therefore be calculated. Advantages of a case-control study include a computational efficiency far superior to the cohort design, the possibility of initiating a study based on a set of cases already identified (e.g., in a hospital) and the possibility of studying rare outcomes and their association with multiple exposures or risk factors. One of the main difficulties of case-control studies is the appropriate selection of controls independently of exposure or other relevant risk factors in order to ensure that the distribution of exposure categories among controls is a valid representation of the distribution in the source population. Another disadvantage is the difficulty of studying rare exposures, as a large sample of cases and controls would be needed to identify exposed groups large enough for the planned statistical analysis.
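Because only the relative size of the incidence rates can be estimated, the usual measure of association is the exposure odds ratio. The sketch below computes a crude odds ratio with a Woolf (log-based) 95% confidence interval from a 2x2 table. The function name and the counts are hypothetical, and the simple estimator shown here assumes unmatched data with no empty cells.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Crude exposure odds ratio with a Woolf confidence interval.

    a: exposed cases      b: unexposed cases
    c: exposed controls   d: unexposed controls
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for this estimator")
    estimate = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(estimate) - z * se_log)
    upper = math.exp(math.log(estimate) + z * se_log)
    return estimate, lower, upper


# Hypothetical counts (assumed for illustration only)
estimate, lower, upper = odds_ratio_ci(a=40, b=60, c=20, d=80)
```

A matched design would instead call for a conditional analysis, which this sketch does not cover.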


In order to increase the efficiency of exposure assessment in case-control studies, an alternative approach is a design in which the source population is a cohort. The nested case-control design includes all cases occurring in the cohort and a pre-specified number of controls randomly chosen from the population at risk at each time a case (or other relevant event) occurs. A case-cohort study includes all cases and a randomly selected sub-cohort from the population at risk. The advantages of such designs are that they allow a set of case-control studies to be conducted from a single cohort and make efficient use of electronic healthcare record databases where data on exposures and outcomes are already available.
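The selection of controls from the population at risk at the time each case occurs (risk-set sampling) can be sketched as follows. This is a simplified illustration: the function name and data layout are hypothetical, all subjects are assumed to enter the cohort at time zero, and no matching factors are applied.

```python
import random


def sample_risk_sets(cohort, n_controls, seed=0):
    """Risk-set sampling for a nested case-control study.

    cohort: dict mapping subject id -> (exit_time, is_case), where exit_time
    is the event time for cases and the end of follow-up otherwise.
    Returns a list of (case_id, event_time, [control_ids]).
    """
    rng = random.Random(seed)
    cases = sorted(
        (exit_time, sid) for sid, (exit_time, is_case) in cohort.items() if is_case
    )
    matched_sets = []
    for event_time, case_id in cases:
        # Anyone still under follow-up at the event time is at risk,
        # including subjects who become cases later on.
        at_risk = [
            sid
            for sid, (exit_time, _) in cohort.items()
            if sid != case_id and exit_time >= event_time
        ]
        k = min(n_controls, len(at_risk))
        matched_sets.append((case_id, event_time, rng.sample(at_risk, k)))
    return matched_sets


# Hypothetical cohort (assumed for illustration only)
cohort = {
    "A": (5, True),
    "B": (8, False),
    "C": (3, False),
    "D": (10, True),
    "E": (12, False),
}
matched_sets = sample_risk_sets(cohort, n_controls=2)
```

Subject C left follow-up before the first case occurred and is therefore never eligible as a control, whereas subject D may serve as a control for case A before becoming a case.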


The study Impact of vaccination on household transmission of SARS-COV-2 in England (N Engl J Med. 2021;385(8):759-60) is a nested case-control study where the cohort was defined by the occurrence of a laboratory-confirmed COVID-19 case in a household between 4 January 2021 and 28 February 2021. A ‘case’ was defined as a secondary case occurring in the same household as a COVID-19 case and a ‘control’ was defined as a person without infection. Exposure was defined by the presence of a vaccinated COVID-19 case vs. an unvaccinated COVID-19 case in the same household, with the restriction that the vaccinated COVID-19 case had to be vaccinated at least 21 days prior to being diagnosed. The statistical analysis calculated the odds ratios and 95% confidence intervals for household members becoming ‘cases’ if the COVID-19 case was vaccinated 21 days or more before testing positive, vs. household members where the COVID-19 case was not vaccinated.


4.2.3. Case-only designs


General considerations


Although case-only designs are not considered traditional study designs, they are increasingly used, and have been the topic of a large amount of methodological research. Case-only designs are designs in which cases are the only subjects; they reduce confounding by using the exposure and outcome history of each case as its own control, thereby eliminating confounding by characteristics that are constant over time, such as sex, socio-economic factors, genetic factors or chronic diseases. They are also best suited to studying transient exposures in relation to acute outcomes. Control yourself: ISPE-endorsed guidance in the application of self-controlled study designs in pharmacoepidemiology (Pharmacoepidemiol Drug Saf. 2021;30(6):671–84) proposes a common terminology to facilitate critical thinking in the design, analysis and review of studies, called by the authors ‘Self-controlled Crossover Observational PharmacoEpidemiologic (SCOPE)’ studies. These are split into outcome-anchored designs (case-crossover, case-time-control and case-case-time-control) and exposure-anchored designs (self-controlled case series and self-controlled risk interval), which are suitable for slightly different research questions.


A simple form of a self-controlled design is the sequence symmetry analysis (initially described as prescription sequence symmetry analysis), introduced as a screening tool in Evidence of depression provoked by cardiovascular medication: a prescription sequence symmetry analysis (Epidemiology 1996;7(5):478-84). Hypothesis-free screening of large administrative databases for unsuspected drug-outcome associations (Eur J Epidemiol 2018;33(6):545-55) demonstrates how the sequence symmetry analysis can screen across a very wide range of exposures and outcomes.


Case-crossover design


The case-crossover (CCO) design compares the risk of exposure in a time period prior to an outcome with that in an earlier reference time period, or set of time periods, to examine the effect of transient exposures on acute events (see The Case-Crossover Design: A Method for Studying Transient Effects on the Risk of Acute Events, Am J Epidemiol 1991;133(2):144-53). The case-time-control design is a modification of the case-crossover design which uses exposure history data from a traditional control group to estimate and adjust for the bias from temporal changes in prescribing (The case-time-control design, Epidemiology 1995;6(3):248-53). However, if not well matched, the case-time-control group may reintroduce selection bias (see Confounding and exposure trends in case-crossover and case-time-control designs, Epidemiology 1996;7(3):231-9). Methods have been suggested to overcome the exposure-trend bias while controlling for time-invariant confounders (see Future cases as present controls to adjust for exposure trend bias in case-only studies, Epidemiology 2011;22(4):568-74). Persistent User Bias in Case-Crossover Studies in Pharmacoepidemiology (Am J Epidemiol. 2016;184(10):761-9) demonstrates that case-crossover studies of medicines that may be used indefinitely are biased upward. This bias is alleviated, but not removed completely, by using a control group. Evaluation of the Case-Crossover (CCO) Study Design for Adverse Drug Event Detection (Drug Saf. 2017;40(9):789-98) showed that the CCO design performs adequately in studies of acute outcomes with abrupt onsets and exposures characterised as transient with immediate effects.
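In its simplest form, with one hazard window and one reference window per case, the case-crossover comparison reduces to a matched-pair odds ratio based on cases whose exposure differs between the two windows. The sketch below illustrates this under those assumptions; the function name and the counts are hypothetical, and no adjustment for exposure time trends is made.

```python
def case_crossover_odds_ratio(windows):
    """Matched-pair odds ratio for a 1:1 case-crossover analysis.

    windows: iterable of (exposed_in_hazard_window, exposed_in_reference_window)
    booleans, one pair per case. Only discordant cases are informative.
    """
    windows = list(windows)
    hazard_only = sum(1 for hazard, reference in windows if hazard and not reference)
    reference_only = sum(1 for hazard, reference in windows if reference and not hazard)
    if reference_only == 0:
        raise ValueError("no case was exposed in the reference window only")
    return hazard_only / reference_only


# Hypothetical data (assumed for illustration only): 30 cases in total
windows = (
    [(True, False)] * 12    # exposed in the hazard window only
    + [(False, True)] * 4   # exposed in the reference window only
    + [(True, True)] * 5    # exposed in both windows (uninformative)
    + [(False, False)] * 9  # exposed in neither window (uninformative)
)
odds_ratio = case_crossover_odds_ratio(windows)  # 12 / 4 = 3.0
```

Cases with the same exposure status in both windows do not contribute, which is why the design is informative mainly for transient exposures.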


The self-controlled case-series design (SCCS) and the self-controlled risk interval (SCRI) method were initially developed more specifically for vaccine studies and include only cases with an exposure history, with the observation period for each case and each exposure divided into risk window(s) (e.g., number of days immediately following each exposure) and a control window (observed time outside this risk window).


Self-controlled case series


A good overview of the self-controlled case series (SCCS) is provided in Tutorial in biostatistics: the self-controlled case series method (Stat Med. 2006;25(10):1768-97), Self-controlled case series methods: an alternative to standard epidemiological study designs (BMJ. 2016; 354) and Investigating the assumptions of the self-controlled case series method (Stat Med. 2018;37(4):643-58). 


The SCCS design estimates a relative incidence, that is, incidence rates within the risk window(s) after exposure relative to incidence rates within the control window(s). The SCCS design inherently controls for time-invariant and between-individual confounding, but potential confounders that vary over time within the same person (e.g., confounding by indication) still need to be controlled for.
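The estimation of a relative incidence from cases only can be sketched with the simplest conditional likelihood: a single risk window per case, no age or season adjustment, and conditioning on each case's total number of events. The function name and data are hypothetical; the sketch solves the score equation by bisection and is not a substitute for the SCCS software referred to in this chapter.

```python
import math


def sccs_relative_incidence(cases, tol=1e-10):
    """Maximum likelihood relative incidence for a basic SCCS model.

    cases: list of (events_in_risk, events_in_control, risk_days, control_days),
    one tuple per case. Assumes a single risk window and no time-varying
    covariates (a strong simplification).
    """
    risk_events = sum(x for x, _, _, _ in cases)
    all_events = sum(x + y for x, y, _, _ in cases)
    if not 0 < risk_events < all_events:
        raise ValueError("events are needed in both risk and control time")

    def score(log_rho):
        # Observed minus expected number of events in risk windows,
        # conditional on each case's total number of events.
        rho = math.exp(log_rho)
        expected = sum((x + y) * rho * r / (rho * r + c) for x, y, r, c in cases)
        return risk_events - expected

    low, high = -20.0, 20.0  # search interval on the log scale
    while high - low > tol:
        mid = (low + high) / 2
        if score(mid) > 0:
            low = mid
        else:
            high = mid
    return math.exp((low + high) / 2)


# Hypothetical cases with differing window lengths (illustration only)
cases = [(1, 0, 30, 335), (0, 1, 14, 351), (1, 0, 30, 335), (0, 2, 30, 700)]
relative_incidence = sccs_relative_incidence(cases)
```

When every case has the same risk and control window lengths, the estimate reduces to the ratio of event counts multiplied by the ratio of control to risk window lengths.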


Three assumptions of the SCCS are that 1) events arise independently within individuals (e.g., fractures do not affect the occurrence of a subsequent fracture), 2) events do not influence subsequent follow-up, and 3) the event itself does not affect the chance of being exposed. However, SCCS studies can be adapted to circumvent these assumptions in specific situations. The third assumption is generally the most limiting one, but where the event only temporarily affects the chance of exposure, additional ‘pre-exposure’ windows can be included; otherwise Case series analysis for censored, perturbed, or curtailed post-event exposures (Biostatistics 2009;10(1):3-16) describes an extended SCCS method that can address permanent changes to the chance of exposure post-event where exposure windows are short, and is suitable where the event of interest is death.


Tutorial in biostatistics: the self-controlled case series method (Stat Med. 2006;25(10):1768-97) details how to fit SCCS models using standard statistical packages. The book Self-Controlled Case Series Studies: A Modelling Guide with R (P. Farrington, H. Whitaker, Y. G. Weldeselassie, 1st Edition, Chapman and Hall/CRC, 2021) provides a more detailed account. Examples from the tutorial and book are available from


An illustrative example of an SCCS study is Opioids and the Risk of Fracture: a Self-Controlled Case Series Study in the Clinical Practice Research Datalink (Am J Epidemiol. 2021;190(7):1324-31) where the relative incidence of fracture was estimated by comparing time windows when cases were exposed following an opioid prescription and unexposed to opioids. Multiple contiguous risk windows were included to capture changes in risk from new use through to long-term use. A washout window was included after prescriptions stopped, and a pre-exposure window was included to address potential bias from event-dependent exposure. Age, season and exposure to fracture risk–increasing drugs were adjusted for. SCCS assumptions were checked using sensitivity analyses, including taking first fractures only to address independence of events, and excluding individuals who died to address events influencing follow-up.


Use of the self-controlled case-series method in vaccine safety studies: review and recommendations for best practice (Epidemiol Infect. 2011;139(12):1805-17) assesses how the SCCS method has been used across 40 vaccine studies, highlights good practices, and provides guidance on how the method should be used and reported. Using several analytical approaches is recommended, as it can reinforce conclusions or shed light on possible sources of bias when these differ for different study designs. When should case-only designs be used for safety monitoring of medical products? (Pharmacoepidemiol Drug Saf 2012;21(Suppl. 1):50-61) compares the SCCS and case-crossover methods as to their use, strengths, and major differences (directionality). It concludes that case-only analyses of intermittent users complement the cohort analyses of prolonged users because their different biases compensate for one another. It also provides recommendations on when case-only designs should, and should not, be used for drug safety monitoring. Empirical performance of the self-controlled case series design: lessons for developing a risk identification and analysis system (Drug Saf. 2013;36(Suppl. 1):S83-S93) evaluates the performance of the SCCS design using 399 drug-health outcome pairs in 5 observational databases and 6 simulated datasets to assess four outcomes and five design choices. The Use of active Comparators in self-controlled Designs (Am J Epidemiol. 2021;190(10):2181-7) showed that presence of confounding by indication can be mitigated by using an active comparator, using an empirical example of a study of the association between penicillin and venous thromboembolism (VTE), with roxithromycin, a macrolide antibiotic, as the comparator, and upper respiratory infection, a transient risk factor for VTE, representing time-dependent confounding by indication.


Self-controlled risk interval design


The self-controlled risk interval (SCRI) design is a restricted SCCS design suitable when exposure risk windows are short. Rather than using all follow-up time available, short control windows before and/or after risk windows are selected; gaps between risk and control windows may be included, e.g., to allow for a washout period. Power may be reduced as compared with the SCCS, but will often suffice for use with large databases where events are not very rare. Since each individual’s observation period is short, age and time effects often do not require control. In Use of FDA's Sentinel System to Quantify Seizure Risk Immediately Following New Ranolazine Exposure (Drug Saf. 2019;42(7):897-906), new users were restricted to patients with 32 days of continuous exposure to ranolazine (i.e., capturing individuals that typically would have a 30-day dispensing). The observation period began the day after the start of the incident ranolazine dispensing and ended on the 32nd day after the index date, with two risk windows covering days 1-10 and 11-20, and the control window days 21-32. The relative incidence is calculated, using cases only, as the ratio of the number of events in the risk interval to the number of events in the control interval, multiplied by the ratio of the length of the control interval to the length of the risk interval.
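The relative incidence calculation described above can be written out as a short sketch. The window lengths follow the ranolazine example (a 10-day risk window and a 12-day control window covering days 21-32); the function name and the event counts are hypothetical and do not come from the cited study.

```python
def scri_relative_incidence(events_risk, events_control, risk_days, control_days):
    """(events in risk / events in control) x (control length / risk length)."""
    if events_control <= 0 or risk_days <= 0:
        raise ValueError("control events and risk window length must be positive")
    return (events_risk * control_days) / (events_control * risk_days)


# Hypothetical event counts (assumed for illustration only)
relative_incidence = scri_relative_incidence(
    events_risk=15,     # events on days 1-10
    events_control=9,   # events on days 21-32
    risk_days=10,
    control_days=12,
)  # (15 / 9) * (12 / 10) = 2.0
```

The second factor rescales the event counts so that windows of unequal length can be compared on a per-day basis.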


According to the Master Protocol: Assessment of Risk of Safety Outcomes Following COVID-19 Vaccination (2021), the standard SCCS design is more adaptable and is thus preferred when risk or control windows may be less well-defined, when there is a need to increase statistical power, or when unmeasured time-varying confounding is a lesser concern. The SCCS design can also be more easily used to assess multiple occurrences of independent events within an individual. The SCRI design is preferred when it is feasible to have strictly defined risk and control windows for outcomes of interest, or when time-varying confounding is a concern. Despite the short observation periods, SCRI may be vulnerable to time-varying confounders; a means of adjustment in SCRI studies, e.g., for steep age effects sometimes seen in studies of childhood vaccine safety, is provided in Quantifying the impact of time-varying baseline risk adjustment in the self-controlled risk interval design (Pharmacoepidemiol Drug Saf. 2015;24(12):1304-12).


4.2.4. Cross-sectional studies 


Cross-sectional studies are studies that seek to collect information on a study population at a specified time point without considering the relative timing of putative outcomes and exposures. Cross-Sectional Studies: Strengths, Weaknesses, and Recommendations (Chest 2020;158(1S):S65-S71) provides recommendations for the conduct of such studies, as well as use cases.


The data collected at the time point may include both exposure and outcome data. In studies looking at the association between drug use and a clinical outcome, use of prevalent drug users (i.e., patients already treated for some time before study follow-up begins) can introduce two types of bias. Firstly, prevalent drug users are “survivors” of the early period of treatment, which can introduce substantial (selection) bias if the risk varies with time. Secondly, covariates relevant for drug use at the time of the entry (e.g., disease severity) may be affected by previous drug utilisation, or patients may differ regarding health-related behaviours (healthy user effect). No firm inference on a causal relationship can therefore be made from the results.


The study The incidence of cerebral venous thrombosis: a cross-sectional study (Stroke 2012;43(12):3375-7) was used to provide an estimate of the background incidence of cerebral sinus venous thrombosis (CSVT) in the context of the safety assessment of COVID-19 vaccines. Patients were identified from all 19 hospitals from two Dutch provinces using specific code lists. Review of medical records and case ascertainment were conducted to include only confirmed cases. Incidence was calculated using population figures from census data as the denominator.


4.2.5. Ecological studies and case-population studies


In ecological studies, populations are the unit of analysis, for example, comparing measures of a drug’s utilisation across countries and correlating it with these countries’ aggregate incidence rate of an outcome. Fundamentals of the ecological design are described in Ecologic studies in epidemiology: concepts, principles, and methods (Annu Rev Public Health 1995;16:61-81) and a ‘tool box’ is presented in Study design VI - Ecological studies (Evid Based Dent. 2006;7(4):108).


As illustrated in Control without separate controls: evaluation of vaccine safety using case-only methods (Vaccine 2004;22(15-16):2064-70), ecological analyses assume that a strong correlation between the trend in an indicator of an exposure (vaccine coverage in this example) and the trend in incidence of a disease (trends calculated over time or across geographical regions) is consistent with a causal relationship. Such comparisons at the population level may only generate hypotheses as they do not allow controlling for time-related confounding variables, such as age and seasonal factors. Moreover, they do not establish whether the outcome primarily occurred in the exposed individuals.
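The population-level correlation that underlies an ecological analysis can be sketched as a Pearson correlation between an exposure indicator and an incidence measure across regions. The function name and all figures below are hypothetical; as noted above, such a correlation can only generate hypotheses and says nothing about whether the outcome occurred in exposed individuals.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical regional data (assumed for illustration only)
vaccine_coverage = [0.55, 0.62, 0.70, 0.78, 0.85]  # proportion vaccinated
incidence_per_100k = [4.1, 3.9, 4.4, 4.0, 4.3]     # outcome incidence
r = pearson_r(vaccine_coverage, incidence_per_100k)
```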


Case-population studies are a form of ecological studies where cases are compared to an aggregated comparator consisting of population data. The case-population study design: an analysis of its application in pharmacovigilance (Drug Saf. 2011;34(10):861-8) explains this design and its application in pharmacovigilance for signal generation and drug surveillance. The design is also explained in Chapter 2: Study designs in drug utilization research of the textbook Drug Utilization Research - Methods and Applications (M Elseviers, B Wettermark, AB Almarsdóttir, et al. Editors. Wiley Blackwell, 2016). An example is a multinational case-population study aiming to estimate population rates of a suspected adverse event using national sales data in Transplantation for Acute Liver Failure in Patients Exposed to NSAIDs or Paracetamol, Drug Saf. 2013;36(2):135–44. Based on the same study, Choice of the denominator in case population studies: event rates for registration for liver transplantation after exposure to NSAIDs in the SALT study in France (Pharmacoepidemiol Drug Saf. 2013;22(2):160-7) compared sales data and healthcare insurance data as denominators to estimate population exposure and found large differences in the event rates. Choosing the wrong denominator in case-population studies might generate erroneous results. The choice of the right denominator depends not only on a valid data source but will also depend on the hazard function of the adverse event.


The case-population approach has also been adapted for vaccine safety surveillance, in particular for prospective investigation of urgent vaccine safety concerns or for the prospective generation of vaccine safety signals (see Vaccine Case-Population: A New Method for Vaccine Safety Surveillance, Drug Saf. 2016 Dec;39(12):1197-1209).


A pragmatic approach towards case-population studies is recommended: in situations where nation-wide or region-wide electronic health records (EHRs) are available and allow assessing the outcomes and confounders with sufficient validity, a case-population approach is neither necessary nor desirable, as one can perform a population-based cohort or case-control study with adequate control for confounding. In situations where outcomes are difficult to ascertain in EHRs, or where such databases do not exist, the case-population design might give an approximation of the absolute and relative risk when both events and exposures are rare. This is limited by the ecological nature of the reference data that restricts the ability to control for confounding.


Other forms of ecological studies include interrupted time-series analyses (see Chapter 4.3.3) and the case-coverage (ecological) design mainly used for vaccine monitoring (see Chapter 16.2).


4.2.6. Target trial emulation


General principles


Using Big Data to Emulate a Target Trial When a Randomized Trial Is Not Available (Am J Epidemiol. 2016;183(8):758-64) introduced target trial emulation in pharmacoepidemiology as a conceptual framework helping researchers to identify and avoid potential biases in observational studies. Target trial emulation is a strategy that uses existing tools and methods to formalise the design and analysis of such studies. It stimulates investigators to identify potential sources of concerns and develop a design that best addresses these concerns and minimises the risk of bias. The first step of the strategy is to design a hypothetical ideal randomised trial (“target trial”) that would answer the research question. The target trial is described with regards to all design elements: the eligibility criteria, the treatment strategies, the assignment procedure, the follow-up, the outcome, the causal contrasts, and the analysis plan. In the second step, the researcher specifies how best to emulate the design elements of the target trial using the available observational data and considering analytic approaches given the trade-offs in an observational setting.


The target trial paradigm has been shown to prevent some common biases, such as immortal time bias or prevalent user bias, while also identifying situations where adequate emulation may not be possible using the data at hand (see Specifying a target trial prevents immortal time bias and other self-inflicted injuries in observational analyses, J Clin Epidemiol. 2016;79:70-5). Target Trial Emulation: A Framework for Causal Inference From Observational Data (JAMA. 2022;328(24):2446-7) stresses, however, that the lack of randomisation and blinding still requires high attention to the prevention and/or control of selection bias, information bias and confounding, as described in Chapter 6. Successful emulation of a target trial also requires proper definition of time zero, i.e., the start of follow-up in the observational data. Using Big Data to Emulate a Target Trial When a Randomized Trial Is Not Available (Am J Epidemiol. 2016;183(8):758-64) describes two unbiased choices of time zero when eligibility criteria can be met at multiple times.


The need to explicitly describe the design elements that emulate the clinical trial provides transparency on the study design, the assumptions needed to emulate the trial, and the definition of causal effects, which also increases replicability of the study. The design of both the target trial and its emulation should be compared in a table, following the example of Emulating a Target Trial of Interventions Initiated During Pregnancy with Healthcare Databases: The Example of COVID-19 Vaccination (Epidemiology 2023;34(2):238-46).
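One way to keep the specification of the target trial and of its emulation aligned is to record the same design elements for both and derive the comparison table from them. The sketch below uses the seven design elements listed earlier in this section; the class name, function name and all example entries are hypothetical and are not taken from any of the cited studies.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ProtocolSpec:
    """Design elements shared by a target trial and its emulation."""
    eligibility_criteria: str
    treatment_strategies: str
    assignment_procedure: str
    follow_up: str
    outcome: str
    causal_contrasts: str
    analysis_plan: str


def comparison_rows(target, emulation):
    """One (element, target trial, emulation) row per design element."""
    return [
        (f.name.replace("_", " "), getattr(target, f.name), getattr(emulation, f.name))
        for f in fields(ProtocolSpec)
    ]


# Hypothetical entries (assumed for illustration only)
target = ProtocolSpec(
    eligibility_criteria="Adults with no use of drug A in the past year",
    treatment_strategies="Initiate drug A vs. initiate drug B",
    assignment_procedure="Randomisation, unblinded",
    follow_up="From assignment until outcome, death or 2 years",
    outcome="Hospitalisation for outcome X",
    causal_contrasts="Intention-to-treat and per-protocol effects",
    analysis_plan="Hazard ratio from a Cox model",
)
emulation = ProtocolSpec(
    eligibility_criteria="Same, assessed from database records",
    treatment_strategies="First dispensing of drug A vs. drug B",
    assignment_procedure="Assumed random given measured confounders",
    follow_up="From first dispensing until outcome, death or 2 years",
    outcome="Hospital discharge code for outcome X",
    causal_contrasts="Observational analogues of both effects",
    analysis_plan="Same, with adjustment for baseline confounders",
)
for element, trial_text, emulation_text in comparison_rows(target, emulation):
    print(f"{element}: {trial_text} | {emulation_text}")
```

Keeping both specifications in one structure makes it harder to omit an element from the emulation without noticing.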


Statistical aspects of target trials are discussed in Chapters 3.6 (The target trial) and 22 (Target trial emulation) of the Causal Inference Book (Hernán MA, Robins JM (2020). Causal Inference: What If. Boca Raton: Chapman & Hall/CRC).


Extensions of the approach


Specifying a target trial prevents immortal time bias and other self-inflicted injuries in observational analyses (J Clin Epidemiol. 2016;79:70-5) gives recommendations on how to deal with more complex scenarios in target trial emulation. The problem of patients having multiple eligible time zeros can be resolved either by selecting one of them at random or by using them all, emulating a sequence of nested trials with increasing time zero. Inverse probability weighting is proposed to estimate the per-protocol effect of sustained treatment, accounting for potential selection bias due to informative censoring.
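The two ways of handling multiple eligible time zeros can be sketched as follows. The data layout (one record per person and month, with an eligibility flag and a treatment initiation flag) and the function names are hypothetical; the eligibility flag is assumed to already encode all criteria, including absence of prior treatment.

```python
import random


def sequential_trials(records):
    """Enrol each person in one emulated trial per eligible month."""
    return [
        {
            "id": rec["id"],
            "time_zero": rec["month"],
            "arm": "initiate" if rec["initiates"] else "do_not_initiate",
        }
        for rec in records
        if rec["eligible"]
    ]


def random_time_zero(records, seed=0):
    """Alternative: keep a single randomly chosen eligible time per person."""
    rng = random.Random(seed)
    by_person = {}
    for trial in sequential_trials(records):
        by_person.setdefault(trial["id"], []).append(trial)
    return [rng.choice(trials) for trials in by_person.values()]


# Hypothetical person-month records (assumed for illustration only)
records = [
    {"id": 1, "month": 0, "eligible": True, "initiates": False},
    {"id": 1, "month": 1, "eligible": True, "initiates": False},
    {"id": 1, "month": 2, "eligible": True, "initiates": True},
    {"id": 1, "month": 3, "eligible": False, "initiates": False},
    {"id": 2, "month": 0, "eligible": True, "initiates": False},
    {"id": 2, "month": 1, "eligible": True, "initiates": False},
]
trials = sequential_trials(records)   # 5 person-trials
single = random_time_zero(records)    # 1 person-trial per person
```

Because the same person can appear in several emulated trials, the variance estimation has to account for repeated use of individuals.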


A three-step method (cloning, censoring, weighting) has been proposed in How to estimate the effect of treatment duration on survival outcomes using observational data (BMJ. 2018;360:k182) to overcome bias in studies on the effect of treatment duration (and cumulative dose), which are often impaired by selection bias, and to achieve better comparability with the treatment assignment performed in clinical trials. A clone-censor-weight approach is also recommended to deal with situations where individuals’ data are consistent with several strategies.
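The cloning and censoring steps can be sketched for a simple hypothetical pair of strategies, "initiate treatment within a grace period" versus "never initiate". The strategies, function name and data layout are assumptions made for illustration and differ from the treatment duration example in the cited paper. The weighting step, which requires a model for the probability of remaining uncensored, is deliberately omitted.

```python
def clone_and_censor(person, grace):
    """Create one clone per strategy and censor it when it deviates.

    person: dict with 'id', 'treatment_start' (time or None),
    'end_of_follow_up' and 'event' (True if follow-up ended with the outcome).
    """
    start, end = person["treatment_start"], person["end_of_follow_up"]
    # Strategy 1: deviation occurs at the end of the grace period
    # if treatment has not been initiated by then.
    deviates_initiate = None if (start is not None and start <= grace) else grace
    # Strategy 2: deviation occurs at treatment initiation, if any.
    deviates_never = start
    clones = []
    for arm, deviation in (
        ("initiate_within_grace", deviates_initiate),
        ("never_initiate", deviates_never),
    ):
        censored = deviation is not None and deviation < end
        clones.append(
            {
                "id": person["id"],
                "arm": arm,
                "follow_up_end": deviation if censored else end,
                "event": False if censored else person["event"],
                "artificially_censored": censored,
            }
        )
    return clones


# Hypothetical person (assumed for illustration only)
person = {"id": 1, "treatment_start": 2, "end_of_follow_up": 10, "event": True}
clones = clone_and_censor(person, grace=3)
```

A person whose follow-up ends during the grace period without initiating treatment is compatible with both strategies, so both clones remain uncensored and the outcome is counted in both arms.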


Emulating a target trial in case-control designs: an application to statins and colorectal cancer (Int J Epidemiol. 2020;49(5):1637–46) describes how to emulate a target trial using case-control data and demonstrates that better emulation reduces the discrepancies between observational and randomised trial evidence.


Target trial emulation in causal inference studies


A causal inference study is a study designed to investigate, at the individual patient level, the causal effect of an exposure in comparison to non-exposure or to another exposure. In the context of pharmacoepidemiology, the exposure is generally a medical treatment, and the outcome of interest is generally a measure of its safety or effectiveness.


ENCePP recommends that, unless an alternative strategy is justified, target trial emulation should be considered for non-interventional causal inference studies to improve internal validity and increase transparency on definitions and assumptions.


Consideration of the estimand framework (as described in the ICH Addendum on Estimands and Sensitivity Analysis in Clinical Trials to the Guideline on Statistical Principles for Clinical Trials, 2019) for the design of the hypothetical trial may provide additional coherence and transparency on definitions of exposures, endpoints, intercurrent events (ICEs), strategies to manage ICEs, approach to missing data and sensitivity analyses to be emulated in the observational study. In particular, the observational study may benefit from the formalised identification of the ICEs in the hypothetical trial.


Examples


In the context of the COVID-19 pandemic, several observational studies on vaccine effectiveness used target trial emulation. The observational study BNT162b2 mRNA Covid-19 Vaccine in a Nationwide Mass Vaccination Setting (N Engl J Med. 2021;384(15):1412-23) emulated a target trial of the effect of the BNT162b2 vaccine on COVID-19 outcomes by matching vaccine recipients and controls on a daily basis on a wide range of potential confounding factors. The large population size of four large healthcare organisations led to nearly perfect matching, resulting in a consistent pattern of similarity between the groups in the days just before day 12 after the first dose, the anticipated onset of the vaccine effect. A similar target trial emulation design was used in Comparative Effectiveness of BNT162b2 and mRNA-1273 Vaccines in U.S. Veterans (N Engl J Med. 2022;386(2):105-15).


In the field of pregnancy epidemiology, Emulating a Target Trial of Interventions Initiated During Pregnancy with Healthcare Databases: The Example of COVID-19 Vaccination (Epidemiology 2023;34(2):238-46) describes a step-by-step specification of the protocol components of a target trial and their emulation including sensitivity analyses using negative controls to evaluate the presence of confounding and, alternatively to a cohort design, a case-crossover or case-time-control design to eliminate confounding by unmeasured time-fixed factors.


In oncology, The value of explicitly emulating a target trial when using real world evidence: an application to colorectal cancer screening (Eur J Epidemiol. 2017;32(6):495-500) compared an observational analysis that explicitly emulated a target trial of screening colonoscopy with simpler observational analyses that do not synchronise treatment assignment and eligibility determination at time zero and/or do not allow for repeated eligibility. This comparison suggests that the lack of an explicit emulation of the target trial leads to biased estimates and shows that allowing for repeated eligibility increases the statistical efficiency of the estimates.


Target trial emulation vs. replication of an existing RCT


It is important to distinguish between target trial emulation, i.e., the emulation of a hypothetical ideal RCT, and the replication of existing RCTs, which is sometimes also called emulation. The aim of target trial emulation is to use a framework to conduct a study that avoids common biases and to transparently describe its underlying assumptions and limitations. Replication studies of existing RCTs, however, try to come as close as possible to the results of the existing, non-ideal RCT, to prove the validity of the data and the study design.


Emulation of Randomized Clinical Trials With Nonrandomized Database Analyses: Results of 32 Clinical Trials (JAMA 2023;329(16):1376-85) concludes that real-world evidence studies can reach similar conclusions as RCTs when design and measurements can be closely emulated, but this may be difficult to achieve. Concordance in results varied depending on the agreement metric. Emulation differences, chance, and residual confounding can contribute to divergence in results and are difficult to disentangle.


Several studies have compared the results of randomised clinical trials and of observational target trial emulations designed to ask similar questions. Comparing Effect Estimates in Randomized Trials and Observational Studies From the Same Population: An Application to Percutaneous Coronary Intervention (J Am Heart Assoc. 2021;10(11):e020357) highlighted differences between the two study designs that may affect the results and be generalisable to other types of interventions: the observational study conducted in the same registry as the registry used to recruit clinical trial patients needed to be performed in a period that preceded the clinical trial; eligibility criteria differed as not all the necessary data were available for the observational study and no exclusion was based on informed consent; some outcomes could not be defined similarly and some potential confounding factors could not be measured in the observational study.


Emulation differences versus biases when calibrating RWE findings against RCTs (Clin Pharmacol Ther. 2020;107(4):735-7) provides guidance on how to investigate and interpret differences in the estimates of treatment effect in the two study types. It is also emphasised that observational effectiveness studies should not aim at emulating RCTs but at investigating questions that cannot be answered by RCTs, as in cases where randomisation would be difficult or unethical.


4.2.7. Pragmatic trials and large simple trials


Pragmatic trials


Randomised controlled trials (RCTs) are considered the gold standard for demonstrating the efficacy of medicinal products and for obtaining an initial estimate of the risk of adverse outcomes. However, they are not necessarily indicative of the benefits, risks or comparative effectiveness of an intervention when used in clinical practice. The ADAPT-SMART Glossary defines pragmatic clinical trials (PCTs) as ‘trials [that] examine interventions under circumstances that approach real-world practice, with more heterogeneous patient populations, possibly less-standardised treatment protocols, and delivery in routine clinical settings as opposed to a research environment’.


The GetReal Trial Tool: design, assess and discuss clinical drug trials in light of Real World Evidence generation (J Clin Epidemiol. 2022;149:244-253) more broadly defines PCTs as ‘methodologies which incorporate real-world elements into clinical trial design, maintaining randomisation’ and describes the GetReal Trial Tool, designed to assess the impact of design choices on generalisability to routine clinical practice, while taking into account risk of bias, precision, acceptability and operational feasibility.


The book Pragmatic Randomized Clinical Trials Using Primary Data Collection and Electronic Health Records (1st Edition - April 8, 2021, Eds: Cynthia Girman, Mary Ritchey) addresses practical aspects and challenges of the design, implementation, and dissemination of PCTs. The publication Series: Pragmatic trials and real world evidence: Paper 1. Introduction (J Clin Epidemiol. 2017;88:7-13) describes the main characteristics of this design and the complex interplay between design options, feasibility, acceptability, validity, precision, and generalisability of the results, and the review Pragmatic Trials (N Engl J Med. 2016;375(5):454-63) discusses the context in which a pragmatic design is relevant, and its strengths and limitations based on examples. Pragmatic trials revisited: applicability is about individualization (J Clin Epidemiol. 2018;99:164-166) advocates for more patient-oriented pragmatic trials and suggests 1) developing new study designs that focus on a single person, 2) incorporating patients’ perspectives on their care, and 3) integrating clinical research and medical care.


PCTs are focused on evaluating benefits and risks of treatments in patient populations and settings that are more representative of routine clinical practice. To ensure generalisability, PCTs should represent the patients to whom the treatment will be applied; for instance, inclusion criteria may be broader (e.g., allowing co-morbidity, co-medication, wider age range) and the follow-up may be minimised and allow for treatment switching. Real-World Data and Randomised Controlled Trials: The Salford Lung Study (Adv Ther. 2020;37(3):977-997) and Monitoring safety in a phase III real-world effectiveness trial: use of novel methodology in the Salford Lung Study (Pharmacoepidemiol Drug Saf. 2017;26(3):344-352) describe the model of a phase III PCT where patients were enrolled through primary care practices using minimal exclusion criteria and without extensive diagnostic testing, and where potential safety events were captured through patients’ electronic health records and triggered review by the specialist safety team.


Pragmatic explanatory continuum summary (PRECIS): a tool to help trial designers (CMAJ. 2009;180(10): E45-E57) is a tool to support pragmatic trial designs and help define and evaluate the degree of pragmatism. The Pragmatic–Explanatory Continuum Indicator Summary (PRECIS) tool has been further refined and now comprises nine domains, each scored on a 5-point Likert scale ranging from very explanatory to very pragmatic, with an exclusive focus on the issue of applicability (The PRECIS-2 tool: designing trials that are fit for purpose. BMJ. 2015;350: h2147). A checklist and additional guidance are provided in Improving the reporting of pragmatic trials: an extension of the CONSORT statement (BMJ. 2008;337 (a2390):1-8), and Good Clinical Practice Guidance and Pragmatic Clinical Trials: Balancing the Best of Both Worlds (Circulation 2016;133(9):872-80) discusses the application of Good Clinical Practice to pragmatic trials, and the use of additional data sources such as registries and electronic health records for “EHR-facilitated” PCTs.
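As an illustration of the nine-domain, 5-point scoring of PRECIS-2 mentioned above, the following minimal Python sketch checks a set of domain scores and groups the domains by where they fall on the explanatory-pragmatic continuum. The domain names follow the PRECIS-2 publication, but the identifiers, the function name and the grouping thresholds (1-2 explanatory, 3 intermediate, 4-5 pragmatic) are assumptions made for this example and are not part of the tool itself.

```python
# Domain names follow the PRECIS-2 publication; the identifiers are illustrative.
PRECIS2_DOMAINS = (
    "eligibility", "recruitment", "setting", "organisation",
    "flexibility_delivery", "flexibility_adherence",
    "follow_up", "primary_outcome", "primary_analysis",
)


def group_precis2_scores(scores):
    """Group domain scores (1 = very explanatory, 5 = very pragmatic).

    The three groups and their cut-offs are assumptions made for this sketch.
    """
    if set(scores) != set(PRECIS2_DOMAINS):
        raise ValueError("scores must cover exactly the nine PRECIS-2 domains")
    if any(type(s) is not int or not 1 <= s <= 5 for s in scores.values()):
        raise ValueError("each score must be an integer from 1 to 5")

    groups = {"explanatory": [], "intermediate": [], "pragmatic": []}
    for domain in PRECIS2_DOMAINS:
        score = scores[domain]
        if score <= 2:
            groups["explanatory"].append(domain)
        elif score >= 4:
            groups["pragmatic"].append(domain)
        else:
            groups["intermediate"].append(domain)
    return groups
```

The sketch deliberately reports the domains individually rather than a single total, since the degree of pragmatism may differ between domains of the same trial.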


Based on the evidence that costs and complexity of conducting randomised trials lead to more restrictive eligibility criteria and shorter durations of trials, and therefore reduce the generalisability and reliability of the evidence about the efficacy and safety of interventions, the article The Magic of Randomization versus the Myth of Real-World Evidence (N Engl J Med. 2020;382(7):674-678) proposes measures to remove practical obstacles to the conduct of randomised trials of appropriate size.


The BRACE CORONA study (Effect of Discontinuing vs Continuing Angiotensin-Converting Enzyme Inhibitors and Angiotensin II Receptor Blockers on Days Alive and Out of the Hospital in Patients Admitted With COVID-19: A Randomized Clinical Trial, JAMA. 2021;325(3):254-64) is a registry-based pragmatic trial that included patients hospitalised with COVID-19 who were taking ACEIs or ARBs prior to hospital admission, to determine whether discontinuation vs. continuation of these drugs affects the number of days alive and out of the hospital. Patients with a suspected COVID-19 diagnosis were included in the registry and followed up until diagnosis confirmation and randomised to either discontinue or continue ACEI or ARB therapy for 30 days. There was no specific treatment modification beyond discontinuing or continuing use of ACEIs or ARBs; the study team provided oversight on drug replacement based on current treatment guidelines. Treatment adherence was assessed based on medical prescriptions recorded in electronic health records after discharge.


Large simple trials


Large simple trials are pragmatic clinical trials with minimal data collection narrowly focused on clearly defined outcomes important to patients as well as clinicians. Their large sample size provides adequate statistical power to detect even small differences in effects, the clinical relevance of which can subsequently be assessed. Additionally, large simple trials include a follow-up time that mimics routine clinical practice.
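To illustrate why detecting small differences in effects requires a large sample size, the following minimal Python sketch applies the standard normal-approximation formula for comparing two proportions with a two-sided test. The function name, the default significance level and power, and the event risks used in the usage note are assumptions chosen for illustration and do not come from any of the trials cited in this chapter.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_arm(p_control, p_treated, alpha=0.05, power=0.80):
    """Approximate number of participants per arm (1:1 allocation).

    Uses the normal approximation for a two-sided test of two proportions.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value of the test
    z_beta = NormalDist().inv_cdf(power)           # quantile for the target power
    p_bar = (p_control + p_treated) / 2            # pooled risk under the null
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p_control * (1 - p_control) + p_treated * (1 - p_treated))
    ) ** 2
    return ceil(numerator / (p_control - p_treated) ** 2)
```

Under these assumptions, calling sample_size_per_arm(0.010, 0.008) gives a result in the tens of thousands of participants per arm, whereas sample_size_per_arm(0.10, 0.08) gives a few thousand: the same relative reduction requires roughly ten times more participants when the outcome is ten times rarer.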


Large simple trials are particularly suited when an adverse event is very rare or has a delayed latency (with a large expected attrition rate), when the population exposed to the risk is heterogeneous (e.g., different indications and age groups), when several risks need to be assessed in the same trial or when many confounding factors need to be balanced between treatment groups. In these circumstances, the cost and complexity of a traditional RCT may outweigh its advantages and large simple trials can help keep the volume and complexity of data collection to a minimum.


Outcomes that are simple and objective can also be measured from the routine process of care using epidemiological follow-up methods, for example by using questionnaires or hospital discharge records. Classical examples of published large simple trials are An assessment of the safety of paediatric ibuprofen: a practitioner based randomised clinical trial (JAMA. 1995;273:929-33) and Comparative mortality associated with ziprasidone and olanzapine in real-world use among 18,154 patients with schizophrenia: The Zodiac Observational Study of Cardiac Outcomes (ZODIAC) (Am J Psychiatry 2011;168(2):193-201).


Note that the use of the term ‘simple’ in the expression ‘Large simple trials’ refers to data structure and not to data collection. It is used in relation to situations in which a small number of outcomes are measured. The term may therefore not adequately reflect the complexity of the studies undertaken; see for example Methods for the Watch the Spot Trial. A Pragmatic Trial of More- versus Less-Intensive Strategies for Active Surveillance of Small Pulmonary Nodules (Ann Am Thorac Soc 2019;16(12): 1567–1576).


Randomised database studies


Randomised database studies can be considered a special form of a large simple trial where patients included in the trial are enrolled from a healthcare system with electronic records. Eligible patients may be identified and flagged automatically by the software, with the opportunity of allowing comparison of included and non-included patients with respect to demographic characteristics and clinical history. Database screening or record linkage can be used to collect outcomes of interest otherwise assessed through the normal process of care. Patient recruitment, informed consent and proper documentation of patient information are hurdles that still need to be addressed in accordance with the applicable legislation for RCTs.
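The automatic flagging of eligible patients, and the comparison of included and non-included patients that it makes possible, can be sketched as follows. This is a minimal Python illustration only: the record fields, the eligibility rule (minimum age, indication present, no contraindication) and the example data are all hypothetical and would in practice be defined by the trial protocol and the structure of the electronic record system.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class PatientRecord:
    # Hypothetical fields; a real system would map these from the database.
    patient_id: str
    age: int
    has_indication: bool
    has_contraindication: bool
    consented: bool


def is_eligible(record, min_age=18):
    """Flag a patient as eligible under the assumed protocol criteria."""
    return (
        record.age >= min_age
        and record.has_indication
        and not record.has_contraindication
    )


def compare_included(records):
    """Compare eligible patients who were included with those who were not."""
    eligible = [r for r in records if is_eligible(r)]
    included = [r for r in eligible if r.consented]
    not_included = [r for r in eligible if not r.consented]

    def mean_age(group):
        return mean(r.age for r in group) if group else None

    return {
        "n_eligible": len(eligible),
        "n_included": len(included),
        "n_not_included": len(not_included),
        "mean_age_included": mean_age(included),
        "mean_age_not_included": mean_age(not_included),
    }


# Invented example records for illustration.
records = [
    PatientRecord("A", 70, True, False, True),
    PatientRecord("B", 50, True, False, False),
    PatientRecord("C", 16, True, False, True),
    PatientRecord("D", 60, True, True, True),
    PatientRecord("E", 40, False, False, True),
]
summary = compare_included(records)
```

Only age is compared here; the same pattern could be extended to other demographic characteristics or items of clinical history available in the records.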


Randomised database studies attempt to combine the advantages of randomisation and observational database studies. These and other aspects of randomised database studies are discussed in The opportunities and challenges of pragmatic point-of-care randomised trials using routinely collected electronic records: evaluations of two exemplar trials (Health Technol Assess. 2014;18(43):1-146), which illustrates the practical implementation of randomised studies in general practice databases. More recent work has been conducted to extend quality standards in the Consolidated Standards of Reporting Trials (CONSORT) to also include database studies: CONSORT extension for the reporting of randomised controlled trials conducted using cohorts and routinely collected data (CONSORT-ROUTINE): checklist with explanation and elaboration (BMJ. 2021;373:n857). These quality standards for reporting also have implications for trial design and conduct.


Published examples of randomised database studies are still scarce; however, this design is becoming more common with the increasing use of electronic health records. Pragmatic randomised trials using routine electronic health records: putting them to the test (BMJ. 2012;344:e55) describes a project to implement randomised trials in the everyday clinical work of general practitioners, comparing treatments that are already in common use, and using routinely collected electronic healthcare records both to identify participants and to gather results. The above-mentioned Salford Lung Study and the study described in Design of a pragmatic clinical trial embedded in the Electronic Health Record: The VA's Diuretic Comparison Project (Contemp Clin Trials 2022, 116:106754) belong to this category.


A particular form of randomised database studies is the registry-based randomised trial, which uses an existing registry as a source for the identification of cases, their randomisation and their follow-up. The editorial The randomized registry trial - the next disruptive technology in clinical research? (N Engl J Med. 2013;369(17):1579-81) introduces this concept. This hybrid design aims at achieving both internal and external validity by performing an RCT in a data source with higher generalisability (such as registries). Other examples are the TASTE trial, which followed patients in the long term using data from a Scandinavian registry (Thrombus aspiration during ST-segment elevation myocardial infarction, N Engl J Med. 2013;369:1587-97), and A registry-based randomized trial comparing radial and femoral approaches in women undergoing percutaneous coronary intervention: the SAFE-PCI for Women (Study of Access Site for Enhancement of PCI for Women) trial (JACC Cardiovasc Interv. 2014;7:857-67).


The importance of large simple trials has been highlighted by their role in evaluating well-established products that were repurposed for the treatment of COVID-19. The PRINCIPLE Trial platform (for trials in primary care) and the RECOVERY Trial platform (for trials in hospitals) have been recruiting large numbers of study participants and sites within short periods of time. In addition to brief case report forms, important clinical outcomes such as death, intensive care admission and ventilation were ascertained through data linkage to existing data streams. The study Lopinavir-ritonavir in patients admitted to hospital with COVID-19 (RECOVERY): a randomised, controlled, open-label, platform trial (Lancet 2020;396:1345–52) found that in patients admitted to hospital with COVID-19, lopinavir–ritonavir was not associated with reductions in 28-day mortality, duration of hospital stay, or risk of progressing to invasive mechanical ventilation or death. On the other hand, in Dexamethasone in Hospitalized Patients with Covid-19 (N Engl J Med. 2021;384(8):693-704), the RECOVERY trial also reported that the use of dexamethasone resulted in lower 28-day mortality in patients who were receiving either invasive mechanical ventilation or oxygen alone at randomisation. Inhaled budesonide for COVID-19 in people at high risk of complications in the community in the UK (PRINCIPLE): a randomised, controlled, open-label, adaptive platform trial (Lancet 2021;398:843-55) reported on the effectiveness of an inhaled corticosteroid for COVID-19 community patients. The streamlined and reusable approaches to data collection in these still-recruiting platform trials were clearly essential to enrolling large numbers of trial participants and evaluating multiple treatments rapidly.

