ENCePP Guide on Methodological Standards in Pharmacoepidemiology


Chapter 4: Study design


4.1. Overview

4.2. Types of study design

4.2.1. Cohort studies

4.2.2. Case-control studies

4.2.3. Case-only designs

4.2.4. Cross-sectional studies

4.2.5. Ecological studies

4.3. Definition and validation of drug exposure, outcomes and covariates

4.3.1. Assessment of exposure

4.3.2. Assessment of outcomes

4.3.3. Assessment of covariates

4.3.4. Misclassification and validation

4.4. Specific aspects of study design

4.4.1. Pragmatic trials and large simple trials

4.4.2. Target trial emulation

4.4.3. Self-controlled case series and self-controlled risk interval designs

4.4.4. Positive and negative control exposures and outcomes

4.4.5. Use of an active comparator

4.4.6. Interrupted time series analyses and Difference-in-Differences method

4.4.7. Case-population studies



4.1. Overview


An epidemiological study measures a parameter of occurrence (generally incidence, prevalence or risk or rate ratio) of a health phenomenon (e.g., a disease) in a specified population and with a specified time reference (time point or time period). Epidemiological studies may be descriptive or analytic. Descriptive studies do not aim to evaluate a causal relationship between a population characteristic and the occurrence parameter and generally do not include formal comparisons between population groups. Analytic studies, in contrast, use study populations assembled by the investigators to assess relationships that may be interpreted in causal terms. In pharmacoepidemiology, analytic studies generally aim to quantify the association between a drug exposure and a health phenomenon and test the hypothesis of a causal relationship. They are comparative by nature, e.g., comparing the occurrence of an outcome between subjects being drug users or being non-users or users of a different medicinal product.


Studies can be interventional or non-interventional (observational). Observational Studies: Cohort and Case-Control Studies (Plast Reconstr Surg. 2010;126(6):2234-42) provides a simple and clear explanation of the different types of observational studies and of their advantages and disadvantages. In interventional studies, the subjects are randomly assigned by the investigator to be either exposed or unexposed. These studies, known as randomised clinical trials (RCTs), are typically conducted to test the efficacy of treatments such as new medications. In RCTs, randomisation is used with the intention that the only difference between the exposed and unexposed groups will be the treatment itself. Thus, any differences in the outcome can be attributed to the effect of such treatment. In contrast to experimental studies where exposure is assigned by the investigator, in observational studies the investigator plays no role with regard to which subjects are exposed and which are unexposed. The exposures are either chosen by, or are characteristics of, the subjects themselves.


In order to obtain valid estimates of the effect of a determinant on a parameter of disease occurrence, analytic studies must address three types of epidemiological errors: random error (chance), systematic error (bias) and confounding.


  • Random error (chance): the observed effect estimate is a numerical value obtained from the study data which may be explained by random error because of the underlying variation in the population. The confidence interval (CI) allows the investigator to estimate the range of values within which the actual effect is likely to fall. 
  • Systematic error (bias): the observed effect estimate may be due to systematic error in the measurement of the exposure or disease, or in the selection of the study population. Systematic errors are often predictable. For example, mothers of children with congenital malformations will recall more instances of drug use during pregnancy than mothers of healthy children. This is known in epidemiology as “recall bias”, a type of information bias. Two main types of bias generally need to be considered: information bias and selection bias. Information biases can occur whenever there are errors in the measurement of subject characteristics, for example a lack of pathology results leading to outcome misclassification of certain types of tumours, or a lack of validation of exposure leading to misclassification of the exposed and non-exposed status of some study participants. The consequences of these errors depend on whether the distribution of errors for the exposure or disease depends on the value of other variables (differential misclassification) or not (nondifferential misclassification). Selection biases result from procedures used to select subjects and from factors that influence study participation. For example, a case-control study may include non-case subjects with a higher prevalence of one category of the exposure of interest than in the source population for the cases. External factors such as media attention to safety issues may also influence health-seeking behaviours and measurement of the incidence of a given outcome.
  • Confounding: Confounding results from the presence of an additional factor, known as a confounder or confounding factor, that is associated with both the exposure of interest and the outcome. As a result, the exposed and unexposed groups will likely differ not only with regard to the exposure of interest, but also with regard to a number of other characteristics, some of which are themselves related to the likelihood of developing the outcome. Confounding distorts the observed effect estimate for the outcome and the exposure under study. As there is not always a firm distinction between bias and confounding, confounding is also often classified as a type of bias.
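
As an illustration of how a confidence interval expresses random error, the following minimal Python sketch computes a risk ratio with a Wald-type 95% confidence interval on the log scale. The counts and the function name risk_ratio_ci are hypothetical and chosen for illustration only. The interval reflects random error alone and gives no information on bias or confounding.

```python
import math

def risk_ratio_ci(events_exp, n_exp, events_unexp, n_unexp, z=1.96):
    """Risk ratio with a Wald-type confidence interval on the log scale."""
    rr = (events_exp / n_exp) / (events_unexp / n_unexp)
    # Standard error of log(RR) for cumulative incidence data
    se_log_rr = math.sqrt(
        1 / events_exp - 1 / n_exp + 1 / events_unexp - 1 / n_unexp
    )
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper

# Hypothetical counts: same risk ratio, different study sizes
large = risk_ratio_ci(30, 1000, 15, 1000)
small = risk_ratio_ci(6, 200, 3, 200)
print("large study: RR=%.2f (95%% CI %.2f-%.2f)" % large)
print("small study: RR=%.2f (95%% CI %.2f-%.2f)" % small)
```

Both hypothetical studies yield the same point estimate, but the smaller study yields a wider interval, i.e., a less precise estimate.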


There are many different situations where bias may occur, and some authors attribute a name to each of them. The number of such situations is in theory unlimited. ENCePP recommends that, rather than being able to name each of them, it is preferable to understand the underlying mechanisms of information bias, selection bias and confounding, be alert to their presence and likelihood of occurrence in a study and recognise methods for their prevention, detection, and control at the analytical stage if possible, such as restriction, stratification, matching, regression and sensitivity analyses. Chapter 5.1 on methods to address bias nevertheless treats time-related bias (a type of information bias with misclassification of person-time) separately as it may have important consequences on the result of a study and may be dealt with by design and time-dependent analyses.


The interpretation of evidence in epidemiology has often relied on whether the p-value is below a certain threshold and/or the confidence interval excludes some reference value. The ASA statement on P values: context, process, and purpose (Am Statistician 2016;70(2):129-33) of the American Statistical Association emphasised that a p-value, or statistical significance, does not provide a good measure of evidence regarding a model or hypothesis, nor does it measure the size of an effect or the importance of a result. It is therefore recommended to avoid relying only on statistical significance, such as p-values, to interpret study results (see, for example, Statistical tests, P values, confidence intervals, and power: a guide to misinterpretations, Eur J Epidemiol. 2016;31(4):337-50; Scientists rise up against statistical significance, Nature 2019;567(7748):305-7; It’s time to talk about ditching statistical significance, Nature 2019;567(7748):283; Chapter 15. Precision and Study size in Modern epidemiology, Lash TL, VanderWeele TJ, Haneuse S, Rothman KJ, 4th edition, Philadelphia, PA, Wolters Kluwer, 2021). This series of articles led to substantial changes in the guidelines for reporting study results in manuscripts submitted to medical journals, as discussed in Preparing a manuscript for submission to a medical journal (International Committee for Medical Journal Editors, 2021). Causal analyses of existing databases: no power calculations required (J Clin Epidemiol. 2022;144:203-5) encourages researchers to use large healthcare databases to estimate measures of association as opposed to systematically attempting to test hypotheses (with sufficient power).
The ENCePP also recommends that, instead of a dichotomous interpretation based on whether a p-value is below a certain threshold, or a confidence interval excludes some reference value, researchers should rely on a more comprehensive quantitative interpretation that considers the magnitude, precision, and possible bias in the estimates, in addition to a qualitative assessment of the relevance of the selected study design. This is considered a more appropriate approach than one that ascribes to chance any result that does not meet conventional criteria for statistical significance.


The large number of observational studies performed urgently with existing data, sometimes in difficult conditions, in the early phase of the COVID-19 pandemic has raised concerns about the validity of many studies published without peer review. Considerations for pharmacoepidemiological analyses in the SARS-CoV-2 pandemic (Pharmacoepidemiol Drug Saf. 2020;29(8):825-83) provides recommendations across eight domains: (1) timeliness of evidence generation; (2) the need to align observational and interventional research on efficacy; (3) the specific challenges related to “real‐time epidemiology” during an ongoing pandemic; (4) which design to use to answer a specific question; (5) considerations on the definition of exposures and outcomes and what covariates to collect; (6) the need for transparent reporting; (7) temporal and geographical aspects to be considered when ascertaining outcomes in COVID-19 patients; and (8) the need for rapid assessment. The article Biases in evaluating the safety and effectiveness of drugs for covid-19: designing real-world evidence studies (Am J Epidemiol. 2021;190(8):1452-6) reviews and illustrates how immortal time bias and selection bias were present in several studies evaluating the effects of drugs on SARS-CoV-2 infection, and how they can be addressed. Although these two examples specifically refer to COVID-19 studies, such considerations are applicable to research questions with other types of exposures and outcomes.


4.2. Types of study design


This chapter briefly describes the main types of study design. Specific aspects or applications of these designs are presented in Chapter 4.4. These designs are fully described in several textbooks cited in the Introduction, for example, Modern Epidemiology 4th ed. (T. Lash, T.J. VanderWeele, S. Haneuse, K. Rothman. Wolters Kluwer, 2020).


4.2.1. Cohort studies


In a cohort study, the investigator identifies a population at risk for the outcome of interest, defines two or more groups of subjects (referred to as study cohorts) who are free of disease and differ according to their extent of exposure, and follows them over time to observe the occurrence of the disease in the exposed and unexposed cohorts. A cohort study may also include a single cohort that is heterogeneous with respect to exposure history, and occurrence of disease is measured and compared between exposure groups within the cohort. The person-time of observation of each member of the cohorts is counted and the total person-time experience serves as the denominator for the calculation of the incidence rate of the outcome of interest. Cohorts are called fixed when individuals may not move from one exposure group to the other. They are called closed when no loss to follow-up is allowed. The population of a cohort may also be called dynamic (or open) if it can gain and lose members who contribute to the person-time experience for the duration of their presence in the cohort. The main advantages of a cohort study are the possibility to calculate directly interpretable incidence rates of an outcome and to investigate multiple outcomes for a given exposure. Disadvantages are the need for a large sample size and possibly a long study duration to study rare outcomes, although the use of existing electronic health record databases allows large cohorts to be observed and analysed retrospectively (see Chapters 7.2 and 8).
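
The person-time calculation described above can be sketched as follows. This is a minimal illustration with hypothetical follow-up data; the field names (followup_days, event) and the helper incidence_rate are assumptions made for this example and do not refer to any particular database.

```python
DAYS_PER_YEAR = 365.25

def incidence_rate(subjects):
    """Return (events, person-years, rate per person-year) for one cohort."""
    events = sum(s["event"] for s in subjects)
    person_years = sum(s["followup_days"] for s in subjects) / DAYS_PER_YEAR
    return events, person_years, events / person_years

# Hypothetical cohorts: follow-up ends at the event, loss to follow-up or study end
exposed = [
    {"followup_days": 730.5, "event": 1},
    {"followup_days": 365.25, "event": 0},
    {"followup_days": 365.25, "event": 1},
]
unexposed = [
    {"followup_days": 730.5, "event": 0},
    {"followup_days": 730.5, "event": 0},
    {"followup_days": 365.25, "event": 1},
    {"followup_days": 1095.75, "event": 0},
]

events_exp, py_exp, rate_exp = incidence_rate(exposed)
events_unexp, py_unexp, rate_unexp = incidence_rate(unexposed)
rate_ratio = rate_exp / rate_unexp
print(f"exposed: {events_exp} events / {py_exp:.1f} person-years")
print(f"unexposed: {events_unexp} events / {py_unexp:.1f} person-years")
print(f"incidence rate ratio: {rate_ratio:.2f}")
```

Each subject contributes only the time during which he or she was under observation, which is what allows dynamic cohorts to be analysed.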


Cohort studies are commonly used in pharmacoepidemiology to study the utilisation and effects of medicinal products. At the beginning of the COVID-19 pandemic, it was the design of choice to compare the risk and severity of SARS-CoV-2 infection in persons using or not using certain types of medicines. An example is Renin-angiotensin system blockers and susceptibility to COVID-19: an international, open science, cohort analysis (Lancet Digit Health 2021;3(2):e98-e114) where electronic health records were used to identify and follow patients aged 18 years or older with at least one prescription for RAS blockers, calcium channel blockers, thiazide or thiazide-like diuretics. Four outcomes were assessed: COVID-19 diagnosis, hospital admission with COVID-19, hospital admission with pneumonia, and hospital admission with pneumonia, acute respiratory distress syndrome, acute kidney injury, or sepsis.


4.2.2. Case-control studies


In a case-control study, the investigator first identifies cases of the outcome of interest and their exposure status, but the denominators (person-time of observation) to calculate their incidence rates are not measured. A referent (traditionally called “control”) group is then sampled to estimate the relative distribution of the exposed and unexposed denominators in the source population from which the cases originate. Only the relative size of the incidence rates can therefore be calculated. Advantages of a case-control study include the possibility to initiate a study based on a set of cases already identified (e.g., in a hospital) and the possibility to study rare outcomes and their association with multiple exposures or risk factors. One of the main difficulties of case-control studies is the appropriate selection of controls independently of exposure or other relevant risk factors in order to ensure that the distribution of exposure categories among controls is a valid representation of the distribution in the source population. Another disadvantage is the difficulty of studying rare exposures, as a large sample of cases and controls would be needed to identify exposed groups large enough for the planned statistical analysis.
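
The role of the control group as an estimate of the exposure distribution of the denominators can be illustrated with a small numerical sketch. All figures are hypothetical, and the source population person-time, which is normally not observed in a case-control study, is shown here only to make the comparison possible.

```python
def odds_ratio(exp_cases, unexp_cases, exp_controls, unexp_controls):
    """Exposure odds ratio from a 2x2 case-control table."""
    return (exp_cases / unexp_cases) / (exp_controls / unexp_controls)

# Hypothetical source population (unobserved in practice)
person_time = {"exposed": 10_000, "unexposed": 40_000}
cases = {"exposed": 40, "unexposed": 60}
rate_ratio = (cases["exposed"] / person_time["exposed"]) / (
    cases["unexposed"] / person_time["unexposed"]
)

# 100 controls whose exposure split mirrors the person-time split (20% exposed)
valid_controls = {"exposed": 20, "unexposed": 80}
or_valid = odds_ratio(
    cases["exposed"], cases["unexposed"],
    valid_controls["exposed"], valid_controls["unexposed"],
)

# 100 controls selected in a way that over-represents exposure (30% exposed)
biased_controls = {"exposed": 30, "unexposed": 70}
or_biased = odds_ratio(
    cases["exposed"], cases["unexposed"],
    biased_controls["exposed"], biased_controls["unexposed"],
)

print(f"rate ratio in source population: {rate_ratio:.2f}")
print(f"odds ratio, valid controls:      {or_valid:.2f}")
print(f"odds ratio, biased controls:     {or_biased:.2f}")
```

When the controls reproduce the exposure distribution of the source population, the odds ratio equals the rate ratio in this example; when control selection is related to exposure, the estimate is distorted.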


In order to increase the efficiency of exposure assessment in case-control studies, an alternative approach is a design in which the source population is a cohort. The nested case-control design includes all cases occurring in the cohort and a pre-specified number of controls randomly chosen from the population at risk at each time a case (or other relevant event) occurs. A case-cohort study includes all cases and a randomly selected sub-cohort from the population at risk. An advantage of such designs is that they allow a set of case-control studies to be conducted from a single cohort and make efficient use of electronic healthcare record databases where data on exposures and outcomes are already available.
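
The selection of controls from the population at risk at the time each case occurs (often called risk-set sampling) can be sketched as follows. This is a simplified illustration under stated assumptions: the cohort is hypothetical, time is expressed in days since a common origin, no matching factors are applied, and the function name risk_set_sample is an assumption of this example. Subjects who become cases later remain eligible as controls for earlier cases.

```python
import random

def risk_set_sample(cohort, n_controls, seed=0):
    """For each case, sample controls among subjects still at risk at the event time."""
    rng = random.Random(seed)  # fixed seed so the sampling is reproducible
    matched_sets = []
    for case in (s for s in cohort if s["event"]):
        t = case["exit"]  # for cases, exit time is the event time
        risk_set = [
            s for s in cohort
            if s["id"] != case["id"] and s["entry"] <= t < s["exit"]
        ]
        controls = rng.sample(risk_set, min(n_controls, len(risk_set)))
        matched_sets.append(
            {"case": case["id"], "time": t, "controls": [c["id"] for c in controls]}
        )
    return matched_sets

# Hypothetical cohort: entry and exit in days, event=True if exit is an outcome
cohort = [
    {"id": 1, "entry": 0, "exit": 120, "event": True},
    {"id": 2, "entry": 0, "exit": 400, "event": False},
    {"id": 3, "entry": 30, "exit": 300, "event": True},
    {"id": 4, "entry": 50, "exit": 365, "event": False},
    {"id": 5, "entry": 200, "exit": 500, "event": False},
    {"id": 6, "entry": 0, "exit": 90, "event": False},
]
sets = risk_set_sample(cohort, n_controls=2)
for s in sets:
    print(s)
```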


The study Impact of vaccination on household transmission of SARS-COV-2 in England (N Engl J Med. 2021;385(8):759-60) is a nested case-control study where the cohort was defined by occurrence of a laboratory-confirmed COVID-19 case occurring in a household between 4 January 2021 and 28 February 2021. A “case” was defined as a secondary case occurring in the same household as a COVID-19 case and a “control” was identified as a person without infection. Exposure was defined by the presence of a vaccinated COVID-19 case vs. an unvaccinated COVID-19 case in the same household with the restriction that the vaccinated COVID-19 case had to be vaccinated 21 days prior to being diagnosed. The statistical analysis calculated the odds ratios and 95% confidence intervals for household members becoming “cases” if the COVID-19 case was vaccinated 21 days or more before testing positive, vs. household members where the COVID-19 case was not vaccinated.


In A plea to stop using the case-control design in retrospective database studies (Stat Med. 2019;38(22):4199-208), the authors argue, based on examples, that the case-control design may lead to bias due to residual confounding that stems from unadjusted differences between exposure groups or from accidental inclusion of intermediary variables in propensity scores or disease-risk scores. It is therefore recommended to use negative control exposures (see Chapter 4.4.4) to evaluate presence of confounding, or alternative designs such as a cohort or a self-controlled design. This is illustrated in the nested case-control study First-dose ChAdOx1 and BNT162b2 COVID-19 vaccines and thrombocytopenic, thromboembolic and hemorrhagic events in Scotland (Nat Med. 2021; 27(7):1290-7), where the authors highlight the possibility of residual confounding by indication and performed a post-hoc self-controlled case series (SCCS, see below) analysis to adjust for time-invariant confounders.


4.2.3. Case-only designs


Although case-only designs are not considered as traditional study designs, they are increasingly used, and a large amount of methodological research has been published over the last decade. They are therefore presented separately.


Case-only designs are designs in which cases are the only subjects. This design reduces confounding by using the exposure and outcome history of each case as its own control and thereby eliminates confounding by characteristics that are constant over time, such as sex, socio-economic factors, genetic factors or chronic diseases. Control yourself: ISPE-endorsed guidance in the application of self-controlled study designs in pharmacoepidemiology (Pharmacoepidemiol Drug Saf. 2021;30(6):671–84) proposes a common terminology to facilitate critical thinking in the design, analysis and review of studies called by the authors Self-controlled Crossover Observational PharmacoEpidemiologic (SCOPE) studies. These are split into outcome-anchored designs (case-crossover, case-time-control and case-case-time-control) and exposure-anchored designs (self-controlled case series and self-controlled risk interval), which are suitable for slightly different research questions. The article concludes that these designs are best suited to studying transient exposures in relation to acute outcomes.


A simple form of a self-controlled design is the sequence symmetry analysis (initially described as prescription sequence symmetry analysis), introduced as a screening tool in Evidence of depression provoked by cardiovascular medication: a prescription sequence symmetry analysis (Epidemiology 1996;7(5):478-84).
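
The core calculation of a sequence symmetry analysis can be sketched as a crude sequence ratio, i.e., the number of patients who started the index drug before the marker drug divided by the number who started them in the reverse order. The sketch below is a simplified illustration with hypothetical data; the 365-day window and the function name are assumptions of this example, and the adjustment for temporal trends in prescribing that is usually applied in practice is deliberately omitted.

```python
def crude_sequence_ratio(first_days, window_days=365):
    """Count prescription sequences and return the crude sequence ratio.

    first_days: list of (index_drug_day, marker_drug_day), each the day of the
    first prescription of that drug for one patient.
    """
    index_first = marker_first = 0
    for index_day, marker_day in first_days:
        gap = marker_day - index_day
        if gap == 0 or abs(gap) > window_days:
            continue  # same-day starts and starts too far apart are not counted
        if gap > 0:
            index_first += 1
        else:
            marker_first += 1
    ratio = index_first / marker_first if marker_first else None
    return index_first, marker_first, ratio

# Hypothetical patients who received both drugs
patients = [(10, 40), (100, 50), (5, 200), (30, 30), (0, 500), (20, 60)]
n_index_first, n_marker_first, sequence_ratio = crude_sequence_ratio(patients)
print(n_index_first, n_marker_first, sequence_ratio)
```

A ratio above 1 indicates that the marker drug is started more often after the index drug than before it, which may be explored further as a possible signal.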


The case-crossover (CCO) design compares the risk of exposure in a time period prior to an outcome with that in an earlier reference time-period, or set of time periods, to examine the effect of transient exposures on acute events (see The Case-Crossover Design: A Method for Studying Transient Effects on the Risk of Acute Events, Am J Epidemiol 1991;133(2):144-53). The case-time-control designs are a modification of the case-crossover design which use exposure history data from a traditional control group to estimate and adjust for the bias from temporal changes in prescribing (The case-time-control design, Epidemiology 1995;6(3):248-53). However, if not well matched, the case-time-control group may reintroduce selection bias (see Confounding and exposure trends in case-crossover and case-time-control designs, Epidemiology 1996;7(3):231-9). Methods have been suggested to overcome the exposure-trend bias while controlling for time-invariant confounders (see Future cases as present controls to adjust for exposure trend bias in case-only studies, Epidemiology 2011;22(4):568-74). Persistent User Bias in Case-Crossover Studies in Pharmacoepidemiology (Am J Epidemiol. 2016;184(10):761-9) demonstrates that case-crossover studies of drugs that may be used indefinitely are biased upward. This bias is alleviated, but not removed completely, by using a control group. Evaluation of the Case-Crossover (CCO) Study Design for Adverse Drug Event Detection (Drug Saf. 2017;40(9):789-98) showed that the CCO design performs adequately in studies of acute outcomes with abrupt onsets and exposures characterised as transient with immediate effects.
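
In its simplest form, with one hazard window and one reference window per case, the case-crossover odds ratio reduces to the ratio of discordant cases, as in a 1:1 matched-pair analysis. The following sketch uses hypothetical data and assumed field names (hazard, reference) and does not address exposure time trends or persistent-user bias discussed above.

```python
def case_crossover_or(cases):
    """Matched-pair odds ratio from one hazard and one reference window per case."""
    hazard_only = sum(1 for c in cases if c["hazard"] and not c["reference"])
    reference_only = sum(1 for c in cases if c["reference"] and not c["hazard"])
    # Cases exposed in both windows or in neither are uninformative
    return hazard_only, reference_only, hazard_only / reference_only

# Hypothetical exposure status of 41 cases in the two windows
cases = (
    [{"hazard": True, "reference": False}] * 12
    + [{"hazard": False, "reference": True}] * 4
    + [{"hazard": True, "reference": True}] * 5
    + [{"hazard": False, "reference": False}] * 20
)
n_hazard_only, n_reference_only, cco_or = case_crossover_or(cases)
print(n_hazard_only, n_reference_only, cco_or)
```

Only cases whose exposure status differs between the two windows contribute to the estimate, which is why the design is informative mainly for transient exposures.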


The self-controlled case-series design (SCCS) and the self-controlled risk interval (SCRI) method were initially developed more specifically for vaccine studies and include only exposed cases. The observation period for each exposure for each case is divided into risk window(s) (e.g., number of days immediately following each exposure) and a control window (observed time outside this risk window). A good overview is provided in Tutorial in biostatistics: the self-controlled case series method (Stat Med. 2006;25(10):1768-97) and Investigating the assumptions of the self-controlled case series method (Stat Med. 2018;37(4):643-58). These designs are further discussed in Chapter 4.4.3, and their application to vaccine safety studies is presented in Chapter 15.2.1.
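
The comparison between a risk window and a control window can be illustrated with a simplified self-controlled risk interval calculation. The sketch assumes that all cases share the same windows relative to a single exposure (risk window on days 1 to 21 and control window on days 22 to 63, both hypothetical choices), in which case the relative incidence is the ratio of event counts scaled by the window lengths. Confidence intervals, age or seasonal adjustment and multiple exposures are not covered.

```python
RISK_WINDOW = (1, 21)      # days after exposure, inclusive (hypothetical)
CONTROL_WINDOW = (22, 63)  # days after exposure, inclusive (hypothetical)

def window_length(window):
    return window[1] - window[0] + 1

def scri_relative_incidence(event_days):
    """Relative incidence from event days counted since exposure, one per case."""
    in_risk = sum(1 for d in event_days if RISK_WINDOW[0] <= d <= RISK_WINDOW[1])
    in_control = sum(
        1 for d in event_days if CONTROL_WINDOW[0] <= d <= CONTROL_WINDOW[1]
    )
    rate_risk = in_risk / window_length(RISK_WINDOW)
    rate_control = in_control / window_length(CONTROL_WINDOW)
    return in_risk, in_control, rate_risk / rate_control

# Hypothetical event days; day 70 falls outside both windows and is ignored
event_days = [3, 10, 15, 20, 30, 45, 60, 70]
n_risk, n_control, relative_incidence = scri_relative_incidence(event_days)
print(n_risk, n_control, round(relative_incidence, 2))
```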


4.2.4. Cross-sectional studies


Cross-sectional studies are descriptive studies that seek to collect information on a study population at a specified time point. Cross-Sectional Studies: Strengths, Weaknesses, and Recommendations (Chest 2020;158(1S):S65-S71) provides recommendations for the conduct of cross-sectional studies as well as use cases.


The data collected at the time point may include both exposure and outcome data. In studies looking at the association between drug use and a clinical outcome, use of prevalent drug users (i.e., patients already treated for some time before study follow-up begins) can introduce two types of bias. Firstly, prevalent drug users are “survivors” of the early period of treatment, which can introduce substantial (selection) bias if the risk varies with time. Secondly, covariates relevant for drug use at the time of the entry (e.g., disease severity) may be affected by previous drug utilisation or patients may differ regarding health-related behaviours (healthy user effect). No firm inference on a causal relationship can therefore be made from the results.


The study The incidence of cerebral venous thrombosis: a cross-sectional study (Stroke 2012;43(12):3375-7) was used to provide an estimate of the background incidence of cerebral sinus venous thrombosis (CSVT) in the context of the safety assessment of COVID-19 vaccines. Patients were identified from all 19 hospitals from two Dutch provinces using specific code lists. Review of medical records and case ascertainment were conducted to include only confirmed cases. Incidence was calculated using population figures from census data as the denominator.


4.2.5. Ecological studies


Ecological analyses are not hypothesis testing but hypothesis generating studies. Fundamentals of the ecological design are described in Ecologic studies in epidemiology: concepts, principles, and methods (Annu Rev Public Health 1995;16:61-81) and a ‘tool box’ is presented in Study design VI - Ecological studies (Evid Based Dent. 2006;7(4):108).


As illustrated in Control without separate controls: evaluation of vaccine safety using case-only methods (Vaccine 2004;22(15-16):2064-70), ecological analyses assume that a strong correlation between the trend in an indicator of an exposure (vaccine coverage in this example) and the trend in incidence of a disease (trends calculated over time or across geographical regions) is consistent with a causal relationship. Such comparisons at the population level may only generate hypotheses as they do not allow controlling for time-related confounding variables, such as age and seasonal factors. Moreover, they do not establish that the effect occurred in the exposed individuals.


Case-population studies and interrupted time series analyses are forms of ecological studies and are discussed in Chapters 4.4.7 and 4.4.6, respectively. The case-coverage (ecological) design is mainly used for vaccine monitoring and is presented in Chapters, and


4.3. Definition and validation of drug exposure, outcomes and covariates


Historically, pharmacoepidemiological studies relied on patient-reported information or paper-based health records. The rapid increase in access to electronic healthcare records and large administrative databases has changed the way exposures and outcomes are defined, measured and validated. All variables in secondary data sources should be defined with care, taking into account the fact that information is often recorded for purposes other than pharmacoepidemiology. Secondary data originate mainly from the following types of data sources: prescription data (e.g., UK CPRD primary care data), data on dispensing (e.g., PHARMO outpatient pharmacy database), data on payment for medication (namely claims data, e.g., IMS LifeLink PharMetrics Plus), data collected in surveys, and data from specific means of data collection (e.g., pregnancy registries, vaccine registries). Misclassification of exposure, outcome or any covariate, or incorrect categorisation of these variables, may lead to information bias, i.e., a distortion of the value of the point estimate (see Chapter 5).


4.3.1. Assessment of exposure


Exposure definitions can include simple dichotomous variables (e.g., ever vs. never exposed) or be more granular, including estimates of duration, exposure windows (e.g., current vs. past exposure) also referred to as risk periods, or dosage (e.g., current dosage, cumulative dosage over time). Consideration should be given to both the requirements of the study design and the availability of variables. Assumptions made when preparing drug exposure data for analysis have an impact on results: an unreported step in pharmacoepidemiological studies (Pharmacoepidemiol Drug Saf. 2018;27(7):781-8) demonstrates the effect of certain exposure assumptions on findings and provides a framework to report preparation of exposure data. The Methodology chapter of the book Drug Utilization Research. Methods and Applications (M. Elseviers, B. Wettermark, A.B. Almarsdottir et al. Ed. Wiley Blackwell, 2016) discusses different methods for data collection on drug utilisation.


The population included in these data sources follows a process of attrition: drugs that are prescribed are not necessarily dispensed, and drugs that are dispensed are not necessarily ingested. In Primary non-adherence in general practice: a Danish register study (Eur J Clin Pharmacol 2014;70(6):757-63), 9.3% of all prescriptions for new therapies were never redeemed at the pharmacy, with different percentages per therapeutic and patient groups. The attrition from dispensing to ingestion is even more difficult to measure, as it is compounded by uncertainties about which dispensed drugs are actually taken by the patients and the patients’ ability to provide an accurate account of their intake.


4.3.2. Assessment of outcomes


A case definition compatible with the data source should be developed for each outcome of a study at the design stage. This description should include how events will be identified and classified as cases, whether cases will include prevalent as well as incident cases, exacerbations and second episodes (as differentiated from repeat codes) and all other inclusion or exclusion criteria. The reason for the data collection and the nature of the healthcare system that generated the data should also be described as they can impact on the quality of the available information and the presence of potential biases. Published case definitions of outcomes, such as those developed by the Brighton Collaboration in the context of vaccine studies, are useful but not necessarily compatible with the information available in observational data sources. For example, information on the onset or duration of symptoms, or clinical diagnostic procedures, may not be available.


Search criteria to identify outcomes should be defined, and the list of codes and any case-finding algorithm used should be provided. Generation of code lists requires expertise in both the coding system and the disease area. Researchers should consult clinicians who are familiar with the coding practice within the studied field. Suggested methodologies are available for some coding systems, as described in Creating medical and drug code lists to identify cases in primary care databases (Pharmacoepidemiol Drug Saf. 2009;18(8):704-7). Advances in Electronic Phenotyping: From Rule-Based Definitions to Machine Learning Models (Annu Rev Biomed Data Sci. 2018;1:53-68) reports on methods for phenotyping (finding subjects with specific conditions or outcomes) which are becoming more commonly used, particularly in multi-database studies (see Chapters 8 and 15.6). Care should be taken when re-using a code list from another study, as code lists depend on the study objective and methods. Public repositories of codes are available, and researchers are also encouraged to make their own code lists available.


In some circumstances, chart review or free text entries in electronic format linked to coded entries can be useful for outcome identification or confirmation. Such identification may involve an algorithm with use of multiple code lists (for example disease plus therapy codes) or an endpoint committee to adjudicate available information against a case definition. In some cases, initial plausibility checks or subsequent medical chart review will be necessary. When databases contain prescription data only, drug exposure may be used as a proxy for an outcome, or linkage to different databases is required. The accurate date of onset is particularly important for studies relying upon timing of exposure and outcome such as in the self-controlled designs (see Chapters 4.2.3 and 4.4.3).


4.3.3. Assessment of covariates


In pharmacoepidemiological studies, covariates use includes selecting and matching study subjects, comparing characteristics of the cohorts, developing propensity scores, creating stratification variables, evaluating effect modifiers and adjusting for confounders. Reliable assessment of covariates is therefore essential for the validity of results. A given database may or may not be suitable for studying a research question depending on the availability of information on these covariates.


Some patient characteristics and covariates vary with time and accurate assessment is therefore time dependent. The timing of assessment of the covariates is an important factor for the correct classification of the subjects and should be clearly reported. Capturing covariates can be done at one or multiple points during the study period. In the latter scenario, the variable will be modelled as a time-dependent variable (see section 4.4.6).


Assessment of covariates can be performed using different periods of time (look-back periods or run-in periods). Fixed look-back periods (for example 6 months or 1 year) can be appropriate when there are changes in coding methods or in practices or when using the entire medical history of a patient is not feasible. Estimation using all available covariates information versus a fixed look-back window for dichotomous covariates (Pharmacoepidemiol Drug Saf. 2013; 22(5):542-50) establishes that defining covariates based on all available historical data, rather than on data observed over a commonly shared fixed historical window will result in estimates with less bias. However, this approach may not always be applicable, for example when data from paediatric and adult periods are combined because covariates may significantly differ between paediatric and adult populations (e.g., height and weight).
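
The difference between a fixed look-back period and the use of all available history can be sketched as follows. The records, the code labels (DIABETES, HYPERTENSION), the index date and the function has_covariate are hypothetical and serve only to illustrate how the choice of window may change the classification of a dichotomous covariate.

```python
from datetime import date, timedelta

def has_covariate(records, code, index_date, lookback_days=None):
    """True if `code` is recorded before the index date.

    If lookback_days is given, only records within that window are considered;
    otherwise all available history before the index date is used.
    """
    for record_date, record_code in records:
        if record_code != code or record_date >= index_date:
            continue
        if lookback_days is None:
            return True
        if record_date >= index_date - timedelta(days=lookback_days):
            return True
    return False

# Hypothetical patient history
records = [
    (date(2015, 3, 1), "DIABETES"),
    (date(2019, 11, 20), "HYPERTENSION"),
]
index_date = date(2020, 1, 15)

diabetes_fixed = has_covariate(records, "DIABETES", index_date, lookback_days=365)
diabetes_all = has_covariate(records, "DIABETES", index_date)
hypertension_fixed = has_covariate(records, "HYPERTENSION", index_date, lookback_days=365)
print(diabetes_fixed, diabetes_all, hypertension_fixed)
```

In this example, a diagnosis recorded several years before the index date is missed by the one-year window but captured when all available history is used.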


4.3.4. Misclassification and validation




The validity of pharmacoepidemiological studies depends on the correct assessment of exposure, outcomes and confounders. Measurement errors, i.e., misclassification of binary or categorical variables or mismeasurement of continuous variables, result in information bias. The effect of misclassification in the presence of covariates (Am J Epidemiol. 1980;112(4):564–9) shows that non-differential misclassification of a confounder results in incomplete control for confounding.


Misclassification of exposure is non-differential if the assessment of exposure does not depend on the true outcome status and misclassification of outcome is non-differential if the assessment of the outcome does not depend on exposure status. Misclassification of exposure and outcome is considered dependent if the factors that predict misclassification of exposure are expected to also predict misclassification of outcome.


Misconceptions About Misclassification: Non-Differential Misclassification Does Not Always Bias Results Toward the Null (Am J Epidemiol. 2022; kwac03) emphasises that bias towards the null is not always ‘conservative’ but might mask important safety signals, and discusses seven exceptions to the epidemiologic ‘mantra’ that non-differential misclassification biases estimates towards the null. One important exception is outcome measurement with perfect specificity, which results in unbiased estimates of the risk ratio.


The influence of misclassification on the point estimate should be quantified or, if this is not possible, its impact on the interpretation of the results should be discussed. FDA’s Quantitative Bias Analysis Methodology Development: Sequential Bias Adjustment for Outcome Misclassification (2017) proposes a method of adjustment when validation of the variable is complete. Use of the Positive Predictive Value to Correct for Disease Misclassification in Epidemiologic Studies (Am J Epidemiol. 1993;138(11):1007–15) proposes a method based on estimates of the positive predictive value, which requires validation of a sample of patients with the outcome only, while assuming that sensitivity is non-differential. This method has been used in a web application (Outcome misclassification: Impact, usual practice in pharmacoepidemiological database studies and an online aid to correct biased estimates of risk ratio or cumulative incidence; Pharmacoepidemiol Drug Saf. 2020;29(11):1450-5) which allows correction of risk ratio or cumulative incidence point estimates and confidence intervals for bias due to outcome misclassification. The article Basic methods for sensitivity analysis of biases (Int J Epidemiol. 1996;25(6):1107-16) provides different examples of methods for examining the sensitivity of study results to biases, with a focus on methods that can be implemented without computer programming. Good practices for quantitative bias analysis (Int J Epidemiol. 2014;43(6):1969-85) advocates explicit and quantitative assessment of misclassification bias, including guidance on which biases to assess in each situation, what level of sophistication to use, and how to present the results.
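As a rough illustration of a correction based on the positive predictive value (PPV), the sketch below adjusts a risk ratio point estimate only. It relies on the assumption stated above that sensitivity is non-differential, in which case the sensitivity cancels out and the corrected risk ratio equals the observed risk ratio multiplied by the ratio of the exposure-specific PPVs. The function name and all numbers are hypothetical, and the sketch is not the implementation of the cited web application (it does not, for example, produce confidence intervals).

```python
def ppv_corrected_risk_ratio(cases_exposed, n_exposed,
                             cases_unexposed, n_unexposed,
                             ppv_exposed, ppv_unexposed):
    """Correct an observed risk ratio using exposure-specific PPVs.

    True cases in each group are approximated by
    observed cases * PPV / sensitivity; with non-differential
    sensitivity the sensitivity term cancels in the ratio.
    """
    observed_rr = (cases_exposed / n_exposed) / (cases_unexposed / n_unexposed)
    return observed_rr * ppv_exposed / ppv_unexposed

# Hypothetical study: 120 algorithm-identified cases among 10,000 exposed,
# 80 among 10,000 unexposed; validation of identified cases gave a PPV of
# 0.90 in the exposed and 0.75 in the unexposed.
corrected_rr = ppv_corrected_risk_ratio(120, 10_000, 80, 10_000, 0.90, 0.75)
print(round(corrected_rr, 2))
```

Here the observed risk ratio of 1.5 moves away from the null after correction because false positives were more frequent among the unexposed.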




Common misconceptions about validation studies (Int J Epidemiol. 2020;49(4): 1392-6) discusses important aspects on the design of validation studies. It stresses the importance of stratification on key variables (e.g., exposure in outcome validation) and shows that by sampling conditionally on the imperfectly classified measure (e.g., case as identified by the study algorithm), only the positive and negative predictive values can be validly estimated.


Most database studies will be subject to outcome misclassification to some degree, although case adjudication against an established case definition or a reference standard can remove false positives, while false negatives can be mitigated if a broad search algorithm is used. Validity of diagnostic coding within the General Practice Research Database: a systematic review (Br J Gen Pract. 2010;60:e128-36), the book Pharmacoepidemiology (B. Strom, S.E. Kimmel, S. Hennessy. 6th Edition, Wiley, 2012) and Mini-Sentinel's systematic reviews of validated methods for identifying health outcomes using administrative and claims data: methods and lessons learned (Pharmacoepidemiol Drug Saf. 2012;Suppl 1:82-9) provide examples of validation. External validation against chart review or physician/patient questionnaire is possible in some instances but the questionnaires cannot always be considered as ‘gold standard’. Misclassification of exposure should also be measured based on validation, as feasible.


Linkage validation can be used when another database is available for validation through linkage methods (see Using linked electronic data to validate algorithms for health outcomes in administrative databases, J Comp Eff Res. 2015;4:359-66). In some situations, there is no access to a resource to provide data for comparison. In this case, indirect validation may be an option, as explained in the textbook Applying quantitative bias analysis to epidemiologic data (Lash T, Fox MP, Fink AK. Springer-Verlag, New York, 2009).


Structural validation of the database with internal logic checks should also be performed to verify the completeness and accuracy of variables. For example, one can investigate whether an outcome was followed (or preceded) by appropriate exposure or procedures, or whether a certain variable has values within a known reasonable range.


While the positive predictive value is more easily measured than the negative predictive value, a low specificity is more damaging than a low sensitivity when considering bias in relative risk estimates (see A review of uses of health care utilization databases for epidemiologic research on therapeutics; J Clin Epidemiol. 2005;58(4):323-37).
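The asymmetry between specificity and sensitivity can be checked with simple arithmetic. The sketch below assumes non-differential outcome misclassification, so that the observed risk in each group equals sensitivity × true risk + (1 − specificity) × (1 − true risk). The function names and the risks used are hypothetical.

```python
def observed_risk(true_risk, sensitivity, specificity):
    """Expected observed risk under outcome misclassification."""
    true_positives = sensitivity * true_risk
    false_positives = (1 - specificity) * (1 - true_risk)
    return true_positives + false_positives

def observed_risk_ratio(risk_exposed, risk_unexposed, sensitivity, specificity):
    """Expected observed risk ratio when the same sensitivity and
    specificity apply to both exposure groups (non-differential)."""
    return (observed_risk(risk_exposed, sensitivity, specificity)
            / observed_risk(risk_unexposed, sensitivity, specificity))

# Hypothetical true risks: 2% in the exposed, 1% in the unexposed (true RR = 2).
rr_low_sensitivity = observed_risk_ratio(0.02, 0.01, sensitivity=0.70, specificity=1.00)
rr_low_specificity = observed_risk_ratio(0.02, 0.01, sensitivity=1.00, specificity=0.98)
print(round(rr_low_sensitivity, 2), round(rr_low_specificity, 2))
```

With perfect specificity, a sensitivity of only 70% leaves the risk ratio at 2, whereas a specificity of 98% with perfect sensitivity pulls it down to about 1.33, because false positives from the large non-diseased group dilute both risks when the outcome is rare.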


For databases routinely used in research, documented validation of key variables may have been done previously by the data provider or other researchers. Any extrapolation of a previous validation study should, however, consider the effect of any differences in prevalence and inclusion and exclusion criteria, the distribution and analysis of risk factors as well as subsequent changes to health care, procedures and coding, as illustrated in Basic Methods for Sensitivity Analysis of Biases (Int J Epidemiol. 1996;25(6):1107-16).


4.4. Specific aspects of study design


4.4.1. Pragmatic trials and large simple trials


Pragmatic trials


RCTs are considered the gold standard for demonstrating the efficacy of medicinal products and for obtaining an initial estimate of the risk of adverse outcomes. However, they are not necessarily indicative of the benefits, risks or comparative effectiveness of an intervention when used in clinical practice. The IMI GetReal Glossary defines a pragmatic clinical trial (PCT) as ‘a study comparing several health interventions among a randomised, diverse population representing clinical practice, and measuring a broad range of health outcomes’. The publication Series: Pragmatic trials and real world evidence: Paper 1. Introduction (J Clin Epidemiol. 2017;88:7-13) describes the main characteristics of this design and the complex interplay between design options, feasibility, acceptability, validity, precision, and generalisability of the results, and the review Pragmatic Trials (N Engl J Med. 2016;375(5):454-63) discusses the context in which a pragmatic design is relevant, and its strengths and limitations based on examples.


PCTs are focused on evaluating benefits and risks of treatments in patient populations and settings that are more representative of routine clinical practice. To ensure generalisability, PCTs should represent the patients to whom the treatment will be applied; for instance, inclusion criteria may be broader (e.g., allowing co-morbidity, co-medication, wider age range) and the follow-up may be minimised and allow for treatment switching. Real-World Data and Randomised Controlled Trials: The Salford Lung Study (Adv Ther. 2020;37(3):977-997) and Monitoring safety in a phase III real-world effectiveness trial: use of novel methodology in the Salford Lung Study (Pharmacoepidemiol Drug Saf. 2017;26(3):344-352) describe the model of a phase III PCT where patients were enrolled through primary care practices using minimal exclusion criteria and without extensive diagnostic testing, and where potential safety events were captured through patients’ electronic health records and triggered review by the specialist safety team.


Pragmatic explanatory continuum summary (PRECIS): a tool to help trial designers (CMAJ. 2009;180(10): E45-E57) is a tool to support pragmatic trial designs and help define and evaluate the degree of pragmatism. The Pragmatic–Explanatory Continuum Indicator Summary (PRECIS) tool has been further refined and now comprises nine domains, each scored on a 5-point Likert scale ranging from very explanatory to very pragmatic, with an exclusive focus on the issue of applicability (The PRECIS-2 tool: designing trials that are fit for purpose. BMJ. 2015;350: h2147). A checklist and additional guidance are provided in Improving the reporting of pragmatic trials: an extension of the CONSORT statement (BMJ. 2008;337 (a2390):1-8), and Good Clinical Practice Guidance and Pragmatic Clinical Trials: Balancing the Best of Both Worlds (Circulation 2016;133(9):872-80) discusses the application of Good Clinical Practice to pragmatic trials, and the use of additional data sources such as registries and electronic health records for “EHR-facilitated” PCTs.


Based on the evidence that the current costs and complexity of conducting randomised trials lead to more restrictive eligibility criteria and shorter durations of trials, and therefore reduce the generalisability and reliability of the evidence about the efficacy and safety of interventions, the article The Magic of Randomization versus the Myth of Real-World Evidence (N Engl J Med. 2020;382(7):674-678) proposes measures to remove practical obstacles to the conduct of randomised trials of appropriate size.


The BRACE CORONA study (Effect of Discontinuing vs Continuing Angiotensin-Converting Enzyme Inhibitors and Angiotensin II Receptor Blockers on Days Alive and Out of the Hospital in Patients Admitted With COVID-19: A Randomized Clinical Trial, JAMA. 2021;325(3):254-64) is a registry-based pragmatic trial that included patients hospitalised with COVID-19 who were taking ACEIs or ARBs prior to hospital admission, to determine whether discontinuation vs. continuation of these drugs affects the number of days alive and out of the hospital. Patients with a suspected COVID-19 diagnosis were included in the registry and followed up until diagnosis confirmation and randomised to either discontinue or continue ACEI or ARB therapy for 30 days. There was no specific treatment modification beyond discontinuing or continuing use of ACEIs or ARBs; the study team provided oversight on drug replacement based on current treatment guidelines. Treatment adherence was assessed based on medical prescriptions recorded in electronic health records after discharge.


Large simple trials


Large simple trials are pragmatic clinical trials with minimal data collection narrowly focused on clearly defined outcomes important to patients as well as clinicians. Their large sample size provides adequate statistical power to detect even small differences in effects, the clinical relevance of which can subsequently be assessed. Additionally, large simple trials include a follow-up time that mimics routine clinical practice.


Large simple trials are particularly suited when an adverse event is very rare or has a delayed latency (with a large expected attrition rate), when the population exposed to the risk is heterogeneous (e.g., different indications and age groups), when several risks need to be assessed in the same trial or when many confounding factors need to be balanced between treatment groups. In these circumstances, the cost and complexity of a traditional RCT may outweigh its advantages and large simple trials can help keep the volume and complexity of data collection to a minimum.


Outcomes that are simple and objective can also be measured from the routine process of care using epidemiological follow-up methods, for example by using questionnaires or hospital discharge records. Classical examples of published large simple trials are An assessment of the safety of pediatric ibuprofen: a practitioner-based randomized clinical trial (JAMA. 1995;273(12):929-33) and Comparative mortality associated with ziprasidone and olanzapine in real-world use among 18,154 patients with schizophrenia: The Ziprasidone Observational Study of Cardiac Outcomes (ZODIAC) (Am J Psychiatry 2011;168(2):193-201).


Note that the use of the term ‘simple’ in the expression ‘Large simple trials’ refers to data structure and not to data collection. It is used in relation to situations in which a small number of outcomes are measured. The term may therefore not adequately reflect the complexity of the studies undertaken.


Randomised database studies


Randomised database studies can be considered a special form of a large simple trial where patients included in the trial are enrolled from a healthcare system with electronic records. Eligible patients may be identified and flagged automatically by the software, with the opportunity of allowing comparison of included and non-included patients with respect to demographic characteristics and clinical history. Database screening or record linkage can be used to collect outcomes of interest otherwise assessed through the normal process of care. Patient recruitment, informed consent and proper documentation of patient information are hurdles that still need to be addressed in accordance with the applicable legislation for RCTs.


Randomised database studies attempt to combine the advantages of randomisation and observational database studies. These and other aspects of randomised database studies are discussed in The opportunities and challenges of pragmatic point-of-care randomised trials using routinely collected electronic records: evaluations of two exemplar trials (Health Technol Assess. 2014;18(43):1-146) which illustrates the practical implementation of randomised studies in general practice databases. More recent work has been conducted to extend quality standards in the Consolidated Standards of Reporting Trials (CONSORT) to also include database studies: CONSORT extension for the reporting of randomised controlled trials conducted using cohorts and routinely collected data (CONSORT-ROUTINE): checklist with explanation and elaboration (BMJ. 2021;373:n857). These quality standards for reporting also have implications on trial design and conduct.


Published examples of randomised database studies are still scarce; however, this design is becoming more common with the increasing use of electronic health records. Pragmatic randomised trials using routine electronic health records: putting them to the test (BMJ. 2012;344:e55) describes a project to implement randomised trials in the everyday clinical work of general practitioners, comparing treatments that are already in common use, and using routinely collected electronic healthcare records both to identify participants and to gather results. The above-mentioned Salford Lung Study and the study described in Design of a pragmatic clinical trial embedded in the Electronic Health Record: The VA's Diuretic Comparison Project (Contemp Clin Trials 2022;116:106754) belong to this category.


A particular form of randomised database studies is the registry-based randomised trial, which uses an existing registry as a source for the identification of cases, their randomisation and their follow-up. The editorial The randomized registry trial - the next disruptive technology in clinical research? (N Engl J Med. 2013;369(17):1579-81) introduces this concept. This hybrid design aims at achieving both internal and external validity by performing an RCT in a data source with higher generalisability (such as registries). Other examples are the TASTE trial, which followed patients in the long term using data from a Scandinavian registry (Thrombus aspiration during ST-segment elevation myocardial infarction, N Engl J Med. 2013;369:1587-97), and A registry-based randomized trial comparing radial and femoral approaches in women undergoing percutaneous coronary intervention: the SAFE-PCI for Women (Study of Access Site for Enhancement of PCI for Women) trial (JACC Cardiovasc Interv. 2014;7:857-67).


The importance of large simple trials has been highlighted by their role in evaluating well-established products that were repurposed for the treatment of COVID-19. The PRINCIPLE Trial platform (for trials in primary care) and the RECOVERY Trial platform (for trials in hospitals) have been recruiting large numbers of study participants and sites within short periods of time. In addition to brief case report forms, important clinical outcomes such as death, intensive care admission and ventilation were ascertained through data linkage to existing data streams. The study Lopinavir-ritonavir in patients admitted to hospital with COVID-19 (RECOVERY): a randomised, controlled, open-label, platform trial (Lancet 2020;396:1345–52) found that in patients admitted to hospital with COVID-19, lopinavir–ritonavir was not associated with reductions in 28-day mortality, duration of hospital stay, or risk of progressing to invasive mechanical ventilation or death. On the other hand, in Dexamethasone in Hospitalized Patients with Covid-19 (N Engl J Med. 2021;384(8):693-704), the RECOVERY trial also reported that the use of dexamethasone resulted in lower 28-day mortality in patients who were receiving either invasive mechanical ventilation or oxygen alone at randomisation. Inhaled budesonide for COVID-19 in people at high risk of complications in the community in the UK (PRINCIPLE): a randomised, controlled, open-label, adaptive platform trial (Lancet 2021;398:843-55) reported on the effectiveness of an inhaled corticosteroid for COVID-19 community patients. The streamlined and reusable approaches to data collection in these still-recruiting platform trials were clearly essential for enrolling large numbers of trial participants and evaluating multiple treatments rapidly.


4.4.2. Target trial emulation


Observational emulation of a clinical trial was initially introduced in The clinical trial as a paradigm for epidemiologic research (J Clin Epidemiol. 1989;42(6):491-6). It was later extended to pharmacoepidemiology as a conceptual framework helping researchers to identify and avoid potential biases, as described in Using Big Data to Emulate a Target Trial When a Randomized Trial Is Not Available (Am J Epidemiol. 2016;183(8):758-64). The number of target trial emulations using observational data published in the scientific literature is now rapidly growing.


The underlying idea is to design a hypothetical ideal randomised trial (“target trial”) that would answer the research question. The target trial is described with regard to all design elements: the eligibility criteria, the treatment strategies, the assignment procedure, the follow-up, the outcome, the causal contrasts and the analysis plan. In the second step, the researcher specifies how best to emulate the design elements of the target trial using the available observational data source and what analytic approaches to take given the trade-offs in an observational setting. The target trial paradigm aims to prevent some common biases, such as immortal time bias or prevalent user bias, while also identifying situations where adequate emulation may not be possible using the data at hand. It also facilitates a systematic methodological evaluation and comparison of observational studies (Specifying a target trial prevents immortal time bias and other self-inflicted injuries in observational analyses. J Clin Epidemiol. 2016;79:70-5). The framework can also be used to help describe the randomised trial which the available observational data can most closely emulate.


Several studies have compared the results of randomised clinical trials and of observational target trial emulations designed to ask similar questions. Comparing Effect Estimates in Randomized Trials and Observational Studies From the Same Population: An Application to Percutaneous Coronary Intervention (J Am Heart Assoc. 2021;10:e020357) highlighted differences between the two study designs that may affect the results and be generalisable to other types of interventions: the observational study conducted in the same registry used to recruit clinical trial patients needed to be performed in a period that precedes the clinical trial; eligibility criteria differed as not all the necessary data were available for the study and no exclusion was based on informed consent; some outcomes could not be defined similarly; and some potential confounding factors could not be measured. Emulating a target trial in case-control designs: an application to statins and colorectal cancer (Int J Epidemiol. 2020;49(5):1637–46) describes how to emulate a target trial using case-control data and demonstrates that better emulation reduces the discrepancies between observational and randomised trial evidence. Interim results from the first 10 emulations reported in Emulating Randomized Clinical Trials With Nonrandomized Real-World Evidence Studies: First Results From the RCT DUPLICATE Initiative (Circulation 2021;143(10):1002-13) found that selection of active comparator therapies with similar indications and use patterns enhances the validity of real-world evidence. Emulation differences versus biases when calibrating RWE findings against RCTs (Clin Pharmacol Ther. 2020;107(4):735-7) provided guidance on how to investigate and interpret differences in treatment effect estimates from the two study types. The authors of these articles also emphasise that emulation of clinical trials is not the purpose of observational studies. Their strength is the ability to answer questions that cannot be answered by RCTs, as in cases where randomisation would be difficult or unethical, and synergies between the two designs should be further explored to support faster translation of trial results into clinical practice.


Successful emulation of a target trial requires proper definition of time points, including time zero of follow-up in the observational data. Using Big Data to Emulate a Target Trial When a Randomized Trial Is Not Available (Am J Epidemiol. 2016;183(8):758-64) describes two unbiased choices of time zero when eligibility criteria can be met at multiple times. Studies on the effect of treatment duration are also often impaired by selection bias, and How to estimate the effect of treatment duration on survival outcomes using observational data (BMJ. 2018;360: k182) proposes a 3-step method (cloning, censoring, weighting) for overcoming bias in these types of studies.


In the context of the COVID-19 pandemic, several observational studies on vaccine effectiveness used target trial emulation. The observational study BNT162b2 mRNA Covid-19 Vaccine in a Nationwide Mass Vaccination Setting (N Engl J Med. 2021;384(15):1412-23) emulated a target trial of the effect of the BNT162b2 vaccine on COVID-19 outcomes by matching vaccine recipients and controls on a daily basis on a wide range of potential confounding factors. The large population size of four large health care organisations allowed nearly perfect matching, which produced a consistent pattern of similarity between the groups in the days just before day 12 after the first dose, the anticipated onset of the vaccine effect. A similar target trial emulation design was used in Comparative Effectiveness of BNT162b2 and mRNA-1273 Vaccines in U.S. Veterans (N Engl J Med. 2022;386(2):105-15).


ROBINS-I: a tool for assessing risk of bias in non-randomised studies of interventions (BMJ. 2016;355:i4919) supports the evaluation of bias in estimates of the comparative effectiveness (harm or benefit) of interventions from studies that did not use randomisation and can be applied to target trials and to systematic reviews that include non-randomised studies.


Statistical aspects of target trials are discussed in Chapters 3.6 (The target trial) and 22 (Target trial emulation) of the Causal Inference Book (Hernán MA, Robins JM (2020). Causal Inference: What If. Boca Raton: Chapman & Hall/CRC).


4.4.3. Self-controlled case series and self-controlled risk interval designs


The self-controlled case series (SCCS) design was initially developed for vaccines (see Chapter 15.2). It is a case-only design where the observation period for each case is divided into risk window(s) (e.g., number of days following a vaccine or prescription exposure) and control window(s) (observed time before and after risk windows). SCCS estimates a relative incidence, that is, incidence rates within the risk window(s) after exposure relative to incidence rates within the control window(s). The SCCS design inherently controls for time-invariant and between-individual confounding, but potential confounders that vary over time within the same persons still need to be controlled for.


Three assumptions of the SCCS are that 1) events arise independently within individuals (e.g., fractures do not affect the occurrence of a subsequent fracture), 2) events do not influence subsequent follow-up, and 3) the event itself does not affect the chance of being exposed. However, SCCS studies can be adapted to circumvent these assumptions in specific situations. The third assumption is generally the most limiting, but where the event only temporarily affects the chance of exposure, additional ‘pre-exposure’ windows can be included; otherwise, Case series analysis for censored, perturbed, or curtailed post-event exposures (Biostatistics 2009;10(1):3-16) describes an extended SCCS method that can address permanent changes to the chance of exposure post-event where exposure windows are short, and is suitable where the event of interest is death.


A general introduction is given in Self controlled case series methods: an alternative to standard epidemiological study designs (BMJ. 2016; 354), and in Tutorial in biostatistics: the self-controlled case series method (Stat Med. 2006;25(10):1768-97), which further explains how to fit SCCS models using standard statistical packages. The book Self-Controlled Case Series Studies: A Modelling Guide with R (P. Farrington, H. Whitaker, Y. G. Weldeselassie, 1st Edition, Chapman and Hall/CRC, 2021) provides a more detailed account. Examples from the tutorial and book are available from


An illustrative example of an SCCS study is Opioids and the Risk of Fracture: a Self-Controlled Case Series Study in the Clinical Practice Research Datalink (Am J Epidemiol. 2021;190(7):1324-31) where the relative incidence of fracture was estimated by comparing time windows when cases were exposed following an opioid prescription and unexposed to opioids. Multiple contiguous risk windows were included to capture changes in risk from new use through to long-term use. A washout window was included after prescriptions stopped, and a pre-exposure window was included to address potential bias from event-dependent exposure. Age, season and exposure to fracture risk–increasing drugs were adjusted for. SCCS assumptions were checked using sensitivity analyses, including taking first fractures only to address independence of events, and excluding individuals who died to address events influencing follow-up.


Use of the self-controlled case-series method in vaccine safety studies: review and recommendations for best practice (Epidemiol Infect. 2011;139(12):1805-17) assesses how the SCCS method has been used across 40 vaccine studies, highlights good practice and gives guidance on how the method should be used and reported. Using several methods of analysis is recommended, as it can reinforce conclusions or shed light on possible sources of bias when these differ for different study designs. When should case-only designs be used for safety monitoring of medical products? (Pharmacoepidemiol Drug Saf 2012;21(Suppl. 1):50-61) compares the SCCS and case-crossover methods as to their use, strengths, and major differences (directionality). It concludes that case-only analyses of intermittent users complement the cohort analyses of prolonged users because their different biases compensate for one another. It also provides recommendations on when case-only designs should and should not be used for drug safety monitoring. Empirical performance of the self-controlled case series design: lessons for developing a risk identification and analysis system (Drug Saf. 2013;36(Suppl. 1):S83-S93) evaluates the performance of the SCCS design using 399 drug-health outcome pairs in 5 observational databases and 6 simulated datasets. Four outcomes and five design choices were assessed. The Use of active Comparators in self-controlled Designs (Am J Epidemiol. 2021;190(10):2181-7) showed that presence of confounding by indication can be mitigated by using an active comparator, using an empirical example of a study of the association between penicillin and venous thromboembolism (VTE), with roxithromycin, a macrolide antibiotic, as the comparator, and upper respiratory infection, a transient risk factor for VTE, representing time-dependent confounding by indication.


The self-controlled risk interval design (SCRI) has been mostly used in vaccine safety studies. It is a restricted SCCS design suitable when exposure risk windows are short. Rather than using all follow-up time available, short control windows before and/or after risk windows are selected; gaps between risk and control windows may be included e.g., to allow for washout. Power is reduced as compared with the SCCS, but will often suffice for use with large databases where events are not very rare. Since each individual’s observation period is short, age and time effects often do not require control. In Use of FDA's Sentinel System to Quantify Seizure Risk Immediately Following New Ranolazine Exposure (Drug Saf. 2019;42(7):897-906), new users were restricted to patients with 32 days of continuous exposure to ranolazine (i.e., capturing individuals that typically would have a 30-day dispensing). The observation period began the day after the start of the incident ranolazine dispensing and ended on the 32nd day after the index date, with two risk windows covering days 1-10 and 11-20, and the control window days 21-32. The relative incidence is calculated as a ratio of the number of events in the risk interval to the number of events in the control interval multiplied by the ratio of the length of control interval to length of risk interval from only cases.
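The relative incidence calculation described above can be written as a short sketch. The window lengths follow the ranolazine example (a 10-day risk window covering days 1-10 and a 12-day control window covering days 21-32), but the event counts and the function name are hypothetical and are not taken from the cited study. The sketch gives a crude point estimate only, without confidence intervals or adjustment for time-varying confounders.

```python
def scri_relative_incidence(events_risk, events_control, days_risk, days_control):
    """Crude SCRI relative incidence among cases only.

    (events in risk window / events in control window)
    multiplied by (length of control window / length of risk window).
    """
    if events_control == 0 or days_risk == 0:
        raise ValueError("control events and risk window length must be non-zero")
    return (events_risk / events_control) * (days_control / days_risk)

# Window lengths from the example in the text: days 1-10 and days 21-32.
days_risk = 10 - 1 + 1       # 10 days
days_control = 32 - 21 + 1   # 12 days

# Hypothetical event counts among cases.
relative_incidence = scri_relative_incidence(
    events_risk=15, events_control=9,
    days_risk=days_risk, days_control=days_control,
)
print(round(relative_incidence, 2))
```

With these illustrative counts, the event rate in the first risk window is twice the rate in the control window.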


According to the Master Protocol: Assessment of Risk of Safety Outcomes Following COVID-19 Vaccination, the standard SCCS design is more adaptable and is thus preferred when risk or control windows may be less well-defined, when there is a need to increase statistical power, or when unmeasured time-varying confounding is a lesser concern. The SCCS design can also be more easily used to assess multiple occurrences of independent events within an individual. The SCRI design is preferred when it is feasible to have strictly defined risk and control windows for outcomes of interest, or when time-varying confounding is a concern. Despite the short observation periods, SCRI may be vulnerable to time-varying confounders; a means of adjustment in SCRI studies, e.g., for steep age effects sometimes seen in studies of childhood vaccine safety, is provided in Quantifying the impact of time-varying baseline risk adjustment in the self-controlled risk interval design (Pharmacoepidemiol Drug Saf. 2015;24(12):1304-12).


4.4.4. Positive and negative control exposures and outcomes


The validity of causal associations may be tested by using control exposures or outcomes. A negative control outcome is a variable known not to be causally affected by the treatment of interest. Likewise, a negative control exposure is a variable known not to causally affect the outcome of interest. Conversely, a positive control outcome is a variable that is understood to be positively associated with the exposure of interest and a positive control exposure is one which is known to increase the risk of the outcome of interest.


Well-selected positive and negative controls support decision-making on whether the data at hand correctly support the study results for known associations or correctly demonstrate lack of association. Positive controls with negative findings and negative controls with positive findings may signal the presence of bias, as illustrated in a study showing that adherence to statins was associated with a decreased risk of biologically implausible outcomes (Statin adherence and risk of accidents: a cautionary tale, Circulation 2009;119(15):2051-7) and in Utilization of Positive and Negative Controls to Examine Comorbid Associations in Observational Database Studies (Med Care 2017;55(3):244-51). This general principle, with additional examples, is described in Control Outcomes and Exposures for Improving Internal Validity of Nonrandomized Studies (Health Serv Res. 2015;50(5):1432-51) and Negative Controls: A Tool for Detecting Confounding and Bias in Observational Studies (Epidemiology 2010 May; 21(3): 383–388.). Negative controls have also been used to identify other sources of bias including selection bias and measurement bias in Brief Report: Negative Controls to Detect Selection Bias and Measurement Bias in Epidemiologic Studies (Epidemiology. 2016 Sep; 27(5): 637–641) and in Negative control exposure studies in the presence of measurement error: implications for attempted effect estimate calibration (Int J Epidemiol. 2018 Apr; 47(2): 587–596). Chapter 18. Method Validity of The Book of OHDSI (2021) recommends use of negative and positive controls as a diagnostic test to evaluate whether the study design produced valid results and proposes practical considerations for their selection. 
Selecting drug-event combinations as reliable controls nevertheless poses important challenges: for negative controls, it is difficult to establish proof of absence of an association, and it is still more problematic to select positive controls because it is desirable not only to detect an association but also to have an accurate estimate of the effect size. This has led to attempts to establish libraries of controls that can be used to characterise the performance of different observational datasets in detecting various types of associations using a number of different study designs. Although the methods used to identify negative and positive controls may be questioned according to Evidence of Misclassification of Drug-Event Associations Classified as Gold Standard 'Negative Controls' by the Observational Medical Outcomes Partnership (OMOP) (Drug Saf. 2016;39(5):421-32), this approach may allow random and systematic errors in epidemiological studies to be separated, providing a context for evaluating the uncertainty surrounding effect estimates.


Beyond the detection of bias, positive and negative controls can be used to correct for unmeasured confounding as described in Interpreting observational studies: Why empirical calibration is needed to correct p-values (Stat Med. 2014;33(2):209-18), Robust empirical calibration of p-values using observational data (Stat Med. 2016;35(22):3883-8), Empirical confidence interval calibration for population-level effect estimation studies in observational healthcare data (Proc Natl Acad Sci. USA 2018;115(11):2571-7), Empirical assessment of case-based methods for identification of drugs associated with acute liver injury in the French National Healthcare System database (SNDS) (Pharmacoepidemiol Drug Saf. 2021;30(3):320-33), and Risk of depression, suicide and psychosis with hydroxychloroquine treatment for rheumatoid arthritis: a multinational network cohort study (Rheumatology (Oxford) 2021;60:3222-34). However, Limitations of empirical calibration of p-values using observational data (Stat Med. 2016;35(22):3869-82) concludes that, although the method may reduce the number of false positive results, it may also reduce the ability to detect a true safety or efficacy signal.
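The general idea behind empirical p-value calibration can be sketched as below. This is a deliberately simplified illustration and not the published method: it assumes the systematic error is normally distributed and estimates its mean and standard deviation directly from the negative control estimates, ignoring the sampling error of each control. All function names and numbers are hypothetical.

```python
import math
from statistics import mean, stdev


def two_sided_p(z):
    """Two-sided p-value for a standard normal z statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def traditional_p(log_rr, se):
    """Conventional p-value: tests the estimate against a null centred on zero."""
    return two_sided_p(log_rr / se)


def calibrated_p(log_rr, se, negative_control_log_rrs):
    """Simplified calibrated p-value against an empirical null.

    Assumption: the empirical null is a normal distribution whose mean and
    standard deviation are taken from the negative control estimates.
    """
    null_mean = mean(negative_control_log_rrs)
    null_sd = stdev(negative_control_log_rrs)
    z = (log_rr - null_mean) / math.sqrt(null_sd ** 2 + se ** 2)
    return two_sided_p(z)


# Hypothetical negative controls whose estimates are biased upwards.
negative_controls = [0.20, 0.30, 0.10, 0.40, 0.25]
log_rr, se = math.log(1.5), 0.10

p_traditional = traditional_p(log_rr, se)
p_calibrated = calibrated_p(log_rr, se, negative_controls)
print(p_traditional < 0.05, p_calibrated < 0.05)
```

When the negative controls suggest residual bias in the same direction as the estimate of interest, the calibrated p-value is larger than the traditional one. This is consistent with the trade-off noted above: fewer false positives, but possibly less ability to detect a true signal.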


4.4.5. Use of an active comparator


The main purpose of using an active comparator is to reduce confounding by indication or by severity. Its use is optimal in the context of the new user design (see Chapter 5.1.1), whereby comparison is between patients with the same indication initiating different treatments as described in The active comparator, new user study design in pharmacoepidemiology: historical foundations and contemporary application, Curr Epidemiol Rep. 2015;2(4):221-8. For example, the study Risk of skin cancer in new users of thiazides and thiazide-like diuretics: a cohort study using an active comparator group (Br J Dermatol. 2021;185:343-52) used a cohort design with stratification on the propensity score to control for baseline covariates to estimate incidence rates and incidence rate ratios in short-term (<20 prescriptions) and long-term (≥20 prescriptions) drug users. Active-comparator design and new-user design in observational studies (Nat Rev Rheumatol. 2015;11:437-41) summarises the three main advantages of active comparator design: to increase the similarity in measured patient characteristics between treatment groups; to reduce potential for unmeasured confounding; and possibly to improve the clinical relevance of the research question.

Ideally, an active comparator should be chosen to represent the counterfactual risk of a given outcome with a different treatment, i.e., it should have a known and positive safety profile with respect to the event(s) of interest and ideally represent the background risk in the diseased (for example, safety of antiepileptics in pregnancy in relation to risk of congenital malformations could be compared against that of lamotrigine, which is known not to be teratogenic).


With newly marketed medicines, an active comparator with ideal comparability of patients’ characteristics may be unavailable because prescribing of newly marketed medicines may be driven to a greater extent by patients’ prognostic characteristics (early users may be either sicker or healthier than all patients with the indication) and by reimbursement considerations compared to prescribing of established medicines. This is described for comparative effectiveness studies in Assessing the comparative effectiveness of newly marketed medications: methodological challenges and implications for drug development (Clin Pharmacol Ther. 2011;90(6):777-90) and in Newly marketed medications present unique challenges for nonrandomized comparative effectiveness analyses. (J Comp Eff Res. 2012;1(2):109-11). Other challenges include treatment effect heterogeneity as patient characteristics of users evolve over time, and low precision owing to slow drug uptake.


4.4.6. Interrupted time series analyses and Difference-in-Differences method


In evaluating the effectiveness of population-level interventions that are implemented at a specific point in time (with clearly defined before-after periods, such as a policy effective date or a regulatory action date), interrupted time series (ITS) studies are becoming the standard approach. ITS is a quasi-experimental design for evaluating the longitudinal effects of interventions: regression modelling is used to establish the expected pre-intervention trend for an outcome of interest. This expected trend represents the counterfactual scenario in the absence of the intervention and serves as the comparator against which the impact of the intervention is evaluated, by examining any change occurring after the intervention (Interrupted time series regression for the evaluation of public health interventions: a tutorial, Int J Epidemiol. 2017;46:348-55).
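The counterfactual logic of ITS can be sketched as follows. This is a minimal illustration under strong assumptions: a linear pre-intervention trend, no seasonality and no autocorrelation. The function names and the data series are hypothetical, and a real analysis would typically use segmented regression with appropriate error modelling.

```python
def fit_linear_trend(times, values):
    """Ordinary least squares fit of values = intercept + slope * time."""
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_t
    return intercept, slope


def its_gaps(series, intervention_index):
    """Observed minus expected (counterfactual) values after the intervention.

    The counterfactual is the pre-intervention trend projected forward.
    """
    pre_times = list(range(intervention_index))
    intercept, slope = fit_linear_trend(pre_times, series[:intervention_index])
    return [
        series[t] - (intercept + slope * t)
        for t in range(intervention_index, len(series))
    ]


# Hypothetical monthly counts: rising by 2 per month, then a drop of 10
# from month 6 onwards (the intervention takes effect at index 6).
series = [100, 102, 104, 106, 108, 110, 102, 104, 106, 108]
gaps = its_gaps(series, intervention_index=6)
mean_gap = sum(gaps) / len(gaps)
print(mean_gap)
```

A negative mean gap indicates that post-intervention values fall below what the projected pre-intervention trend would predict.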


ITS analysis requires that several assumptions are met, and its implementation is technically sophisticated, as explained in Regression based quasi-experimental approach when randomisation is not an option: Interrupted time series analysis (BMJ. 2015;350:h2750). The use of ITS regression in impact research is illustrated in Chapter 15.4, Methods for pharmacovigilance impact research.


When data on both exposed and control populations are available, Difference-in-Differences (DiD) methods are sometimes preferable. These methods compare the outcome mean or trend for exposed and control groups before and after a certain time point, providing insight into the change in the outcome for the exposed population relative to the change in the control group. By comparing the exposed group to a control group subject to the same time-varying factors, DiD can be a more robust approach to causal inference than ITS. First, DiD takes the difference in the outcome before and after the intervention for each group. Then it subtracts the difference in the control group from the difference in the exposed group to control for time-varying factors, thus estimating the net impact of the intervention.
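The two steps described above can be sketched for the simplest case of two groups and two periods. The function name and all values are hypothetical. The estimate is only interpretable as an intervention effect if both groups would have followed the same trend in the absence of the intervention.

```python
def difference_in_differences(exposed_pre, exposed_post, control_pre, control_post):
    """Two-group, two-period DiD estimate based on group means."""

    def mean(values):
        return sum(values) / len(values)

    # Step 1: before-after difference within each group.
    exposed_change = mean(exposed_post) - mean(exposed_pre)
    control_change = mean(control_post) - mean(control_pre)
    # Step 2: subtract the control group's change from the exposed group's change.
    return exposed_change - control_change


# Hypothetical outcome rates (e.g., prescriptions per 1,000 patients per month).
did_estimate = difference_in_differences(
    exposed_pre=[20, 22, 21],
    exposed_post=[15, 16, 14],
    control_pre=[18, 19, 20],
    control_post=[17, 18, 16],
)
print(did_estimate)
```

In this example the exposed group changes by -6 and the control group by -2, so the DiD estimate attributes a change of -4 to the intervention.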


A basic introduction to the method can be found in Impact evaluation using Difference-in-Differences (RAUSP Management Journal 2019;54:519-532). Further extensions can be found in the literature, for example to account for variation in treatment timing, as in Difference-in-differences with variation in treatment timing (Journal of Econometrics 2021;225:254-77). A good overview of the method applied to public health policy research is available in Designing Difference in Difference Studies: Best Practices for Public Health Policy Research (Annu Rev Public Health 2018;39:453-469).


4.4.7. Case-population studies


Note: Chapter 4.4.7. has not been updated for Revision 10


Case-population studies are a form of ecological studies where cases are compared to an aggregated comparator consisting of population data. The case-population study design: an analysis of its application in pharmacovigilance (Drug Saf. 2011;34(10):861-8) explains its design and its application in pharmacovigilance for signal generation and drug surveillance. The design is also explained in Chapter 2: Study designs in drug utilization research of the textbook Drug Utilization Research - Methods and Applications (M Elseviers, B Wettermark, AB Almarsdóttir, et al. Editors. Wiley Blackwell, 2016). An example is a multinational case-population study aiming to estimate population rates of a suspected adverse event using national sales data in Transplantation for Acute Liver Failure in Patients Exposed to NSAIDs or Paracetamol, Drug Saf. 2013;36(2):135–44. Based on the same study, Choice of the denominator in case population studies: event rates for registration for liver transplantation after exposure to NSAIDs in the SALT study in France (Pharmacoepidemiol Drug Saf. 2013;22(2):160-7) compared sales data and healthcare insurance data as denominators to estimate population exposure and found large differences in the event rates. Choosing the wrong denominator in case-population studies might generate erroneous results. The choice of the right denominator depends not only on a valid data source but will also depend on the hazard function of the adverse event.


The case-population approach has also been adapted for vaccine safety surveillance, in particular for prospective investigation of urgent vaccine safety concerns or for the prospective generation of vaccine safety signals (see Vaccine Case-Population: A New Method for Vaccine Safety Surveillance, Drug Saf. 2016 Dec;39(12):1197-1209).


Use of the case-population design for fast investigation is illustrated in Use of renin-angiotensin-aldosterone system inhibitors and risk of COVID-19 requiring admission to hospital: a case-population study (Lancet 2020;395(10238):1705-14), in which the authors consecutively selected patients aged 18 years or older with a PCR-confirmed diagnosis of COVID-19 requiring admission to hospital from seven hospitals between March 1 and March 24, 2020. As a reference group, ten patients per case were randomly sampled, individually matched for age, sex, region and date of admission to hospital from a primary health-care database (available year: 2018). Information was extracted on comorbidities and prescriptions up to the month before index date from electronic clinical records of both cases and controls. Although the cases and controls originated from different data sources in different years, it was assumed that the primary health-care database of controls represented the source population of the cases and that a random sample of controls from that database would provide a valid estimate of the prevalence of the exposure and covariates in the source population, approaching the primary base paradigm of case-control studies.


A pragmatic attitude towards case-population studies is recommended: in situations where nation-wide or region-wide electronic health records (EHRs) are available and allow assessing the outcomes and confounders with sufficient validity, a case-population approach is neither necessary nor desirable, as one can perform a population-based cohort or case-control study with adequate control for confounding. In situations where outcomes are difficult to ascertain in EHRs or where such databases do not exist, the case-population design might give an approximation of the absolute and relative risk when both events and exposures are rare. This is limited by the ecological nature of the reference data that restricts the ability to control for confounding.

