Print page Resize text Change font-size Change font-size Change font-size High contrast

Home > Standards & Guidances > Methodological Guide

ENCePP Guide on Methodological Standards in Pharmacoepidemiology


5.3. Methods to handle bias and confounding


5.3.1. New-user designs


New user (incident user) designs can avoid prevalence bias by restricting the analysis to persons under observation at the start of the current course of treatment, therefore with the same baseline risk. Evaluating medication effects outside of clinical trials: new-user designs (Am J Epidemiol 2003;158 (9):915–20).  In addition to defining new-user designs, the article explains how they can be implemented as case-control studies and describes the logistical and sample size limitations involved.


5.3.2. Case-only designs


Case-only designs reduce confounding by using the exposure history of each case as its own control and thereby eliminate confounding by characteristics that are constant over time, as demographics, socio-economic factors, genetics and chronic diseases.


A simple form of a case-only design is the symmetry analysis (initially described as prescription sequence symmetry analysis), introduced as a screening tool in Evidence of depression provoked by cardiovascular medication: a prescription sequence symmetry analysis (Epidemiology 1996;7(5):478-84). In this study, the risk of depression associated with cardiovascular drugs was estimated by analysing the non-symmetrical distribution of prescription orders for cardiovascular drugs and antidepressants.


The case-crossover design studies transient exposures with acute effects (The Case-Crossover Design: A Method for Studying Transient Effects on the Risk of Acute Events. Am J Epidemiol 1991;133:144-53) and The case-time-control design (Epidemiology 1995;6(3):248-53). It uses exposure history data from a traditional control group to estimate and adjust for the bias from temporal changes in prescribing (Case-crossover and Case-Time-Control Designs as Alternatives in Pharmacoepidemiologic Research. Pharmacoepidemiol Drug Saf 1997; Suppl 3. S51-S59). However, if not well matched, the control group may reintroduce selection bias as discussed in Confounding and exposure trends in case-crossover and case-time-control designs(Epidemiology. 1996;7:231-9). In this situation, a ‘case-time-control’ method may be helpful as explained in Future cases as present controls to adjust for exposure trend bias in case-only studies (Epidemiology 2011;22:568–74).


The self-controlled case series (SCCS) design was primarily developed to investigate the association between a vaccine and an adverse event but is increasingly used to study drug exposure. In this design, the observation period following each exposure for each case is divided into risk period(s) (e.g. number(s) of days immediately following each exposure) and a control period (e.g. the remaining observation period). Incidence rates within the risk period after exposure are compared with incidence rates within the control period.


The Tutorial in biostatistics: the self-controlled case series method (Stat Med 2006; 25(10):1768-97) and the associated website explain how to fit SCCS models using standard statistical packages.


Like cohort or case-control studies, the SCCS method remains, however, susceptible to confounding by indication, at least if the indication varies over time. Relevant time intervals for the risk and control periods need also to be defined and this may become complex, e.g. with primary vaccination with several doses. The bias introduced by inaccurate specification of the risk window is discussed and a data-based approach for identifying the optimal risk windows is proposed in Identifying optimal risk windows for self-controlled case series studies of vaccine safety (Stat Med 2011; 30(7):742-52).


The SCCS also assumes that the event itself does not affect the chance of being exposed. The pseudolikelihood method developed to address this possible issue is described in Cases series analysis for censored, perturbed, or curtailed post-event exposures (Biostatistics 2009;10(1):3-16). Based on a review of 40 vaccine studies, Use of the self-controlled case-series method in vaccine safety studies: review and recommendations for best practice (Epidemiol Infect 2011;139(12):1805-17) assesses how the SCCS method has been used, highlights good practice and gives guidance on how the method should be used and reported. Using several methods of analysis is recommended, as it can reinforce conclusions or shed light on possible sources of bias when these differ for different study designs.


Within-person study designs had lower precision and greater susceptibility to bias because of trends in exposure than cohort and nested case-control designs (J Clin Epidemiol 2012;65(4):384-93) compares cohort, case-control, case-cross-over and SCCS designs to explore the association between thiazolidinediones and the risks of heart failure and fracture and anticonvulsants and the risk of fracture. The self-controlled case-series and case-cross over designs were more susceptible to bias, but this bias was removed when follow-up was sampled both before and after the outcome, or when a case-time-control design was used.


When should case-only designs be used for safety monitoring of medicinal products? (Pharmacoepidemiol Drug Saf 2012;21(Suppl. 1):50-61) compares the SCCS and case-crossover methods as to their use, strength and major difference (directionality). It concludes that case-only analyses of intermittent users complement the cohort analyses of prolonged users because their different biases compensate for one another. It also provides recommendations on when case-only designs should and should not be used for drug safety monitoring. Empirical performance of the self-controlled case series design: lessons for developing a risk identification and analysis system (Drug Saf 2013;36(Suppl. 1):S83-S93) evaluates the performance of the SCCS design using 399 drug-health outcome pairs in 5 observational databases and 6 simulated datasets. Four outcomes and five design choices were assessed.


In Persistent User Bias in Case-Crossover Studies in Pharmacoepidemiology (Am J Epidemiol. 2016 Oct 25., Epub ahead of print) it was demonstrated that case-crossover studies of drugs that may be used indefinitely are biased upward. This bias is alleviated, but not removed completely, by using a control group.


5.3.3. Disease risk scores


An approach to controlling for a large number of confounding variables is to summarise them in a single multivariable confounder score. Stratification by a multivariate confounder score (Am J Epidemiol 1976;104:609-20) shows how control for confounding may be based on stratification by the score. An example is a disease risk score (DRS) that estimates the probability or rate of disease occurrence conditional on being unexposed. The association between exposure and disease is then estimated with adjustment for the disease risk score in place of the individual covariates.


DRSs are however difficult to estimate if outcomes are rare. Use of disease risk scores in pharmacoepidemiologic studies (Stat Methods Med Res 2009;18:67-80) includes a detailed description of their construction and use, a summary of simulation studies comparing their performance to traditional models, a comparison of their utility with that of propensity scores, and some further topics for future research. Disease risk score as a confounder summary method: systematic review and recommendations (Pharmacoepidemiol Drug Saf 2013;22(2);122-29), examines trends in the use and application of DRS as a confounder summary method and shows large variation exists with differences in terminology and methods used.


In Role of disease risk scores in comparative effectiveness research with emerging therapies (Pharmacoepidemiol Drug Saf. 2012 May;21 Suppl 2:138–47) it is argued that DRS may have its place when studying drugs that are recently introduced to the market. In such situations, as characteristics of users change rapidly, exposure propensity scores (see below) may prove highly unstable. DRSs based mostly on biological associations would be more stable. However, DRS models are still sensitive to misspecification as discussed in Adjusting for Confounding in Early Postlaunch Settings: Going Beyond Logistic Regression Models (Epidemiology. 2016;27:133-42).


5.3.4. Propensity scores


Databases used in pharmacoepidemiological studies often include records of prescribed medications and encounters with medical care providers, from which one can construct surrogate measures for both drug exposure and covariates that are potential confounders. It is often possible to track day-by-day changes in these variables. However, while this information can be critical for study success, its volume can pose challenges for statistical analysis.


A propensity score (PS) is analogous to the disease risk score in that it combines a large number of possible confounders into a single variable (the score). The exposure propensity score (EPS) is the conditional probability of exposure to a treatment given observed covariates. In a cohort study, matching or stratifying treated and comparison subjects on EPS tends to balance all of the observed covariates. However, unlike random assignment of treatments, the propensity score may not balance unobserved covariates. Invited Commentary: Propensity Scores (Am J Epidemiol 1999;150:327–33) reviews the uses and limitations of propensity scores and provide a brief outline of the associated statistical theory. The authors present results of adjustment by matching or stratification on the propensity score.


High-dimensional Propensity Score Adjustment in Studies of Treatment Effects Using Healthcare Claims Data (Epidemiol 2009; 20(4):512-22) discusses the high dimensional propensity score (hd-PS) model approach. It attempts to empirically identify large numbers of potential confounders in healthcare databases and, by doing so, to extract more information on confounders and proxies. Covariate selection in high-dimensional propensity score analyses of treatment effects in small samples (Am J Epidemiol 2011;173:1404-13) evaluates the relative performance of hd-PS in smaller samples. Confounding adjustment via a semi-automated high-dimensional propensity score algorithm: an application to electronic medical records (Pharmacoepidemiol Drug Saf 2012;20:849-57) evaluates the use of hd-PS in a primary care electronic medical record database. In addition, the article Using high-dimensional propensity score to automate confounding control in a distributed medical product safety surveillance system (Pharmacoepidemiol Drug Saf 2012;21(S1):41-9) summarises the application of this method for automating confounding control in sequential cohort studies as applied to safety monitoring systems using healthcare databases and also discusses the strengths and limitations of hd-PS.


Most cohort studies match patients 1:1 on the propensity score. Increasing the matching ratio may increase precision but also bias. One-to-many propensity score matching in cohort studies (Pharmacoepidemiol Drug Saf. 2012;21(S2):69-80) tests several methods for 1:n propensity score matching in simulation and empirical studies and recommends using a variable ratio that increases precision at a small cost of bias. Matching by propensity score in cohort studies with three treatment groups (Epidemiology 2013;24(3):401-9) develops and tests a 1:1:1 propensity score matching approach offering a way to compare three treatment options.

The use of several measures of balance for developing an optimal propensity score model is described in Measuring balance and model selection in propensity score methods (Pharmacoepidemiol Drug Saf 2011;20:1115-29) and further evaluated in Propensity score balance measures in pharmacoepidemiology: a simulation study (Pharmacoepidemiol Drug Saf 2014; Epub 2014 Jan 29). In most situations, the standardised difference performs best and is easy to calculate (see Balance measures for propensity score methods: a clinical example on beta-agonist use and the risk of myocardial infarction (Pharmacoepidemiol Drug Saf 2011;20(11):1130-7) and Reporting of covariate selection and balance assessment in propensity score analysis is suboptimal: a systematic review (J Clin Epidemiol. 2015;68(2):112-21)). Metrics for covariate balance in cohort studies of causal effects (Stat Med 2013;33:1685-99) shows in a simulation study that the c-statistics of the PS model after matching and the general weighted difference perform as well as the standardized difference and are preferred when an overall summary measure of balance is requested.


Performance of propensity score calibration – a simulation study (Am J Epidemiol 2007;165(10):1110-8) introduces ‘propensity score calibration’ (PSC). This technique combines propensity score matching methods with measurement error regression models to address confounding by variables unobserved in the main study. This is done by using additional covariate measurements observed in a validation study, which is often a subset of the main study.


Treatment effects in the presence of unmeasured confounding: dealing with observations in the tails of the propensity score distribution--a simulation study (Am J Epidemiol. 2010; 1;172(7):843–54) demonstrates how ‘trimming’ of the propensity score eliminates subjects who are treated contrary to prediction and their exposed/unexposed counterparts, thereby reducing bias by unmeasured confounders.


Although in most situations propensity score models, with the exception of hd-PS, do not have any advantages over conventional multivariate modelling in terms of adjustment for identified confounders, several other benefits may be derived. Propensity score methods may help to gain insight into determinants of treatment including age, frailty and comorbidity and to identify individuals treated against expectation. A statistical advantage of PS analyses is that if exposure is not infrequent it is possible to adjust for a large number of covariates even if outcomes are rare, a situation often encountered in drug safety research. Furthermore, assessment of the PS distribution may reveal non-positivity. An important limitation of PS is that it is not directly amenable for case-control studies.


5.3.5. Instrumental variables


Instrumental variable (IV) methods were invented over 70 years ago but were used by epidemiologists only recently. Over the past decade or so, non-parametric versions of IV methods have appeared that connect IV methods to causal and measurement-error models important in epidemiological applications. An introduction to instrumental variables for epidemiologists (Int J Epidemiol 2000;29:722-9) presents those developments, illustrated by an application of IV methods to non-parametric adjustment for non-compliance in randomised trials. The author mentions a number of caveats but concludes that IV corrections can be valuable in many situations. Where IV assumptions are questionable, the corrections can still serve as part of the sensitivity analysis or external adjustment. Where the assumptions are more defensible, as in field trials and in studies that obtain validation or reliability data, IV methods can form an integral part of the analysis. A review of IV analysis for observational comparative effectiveness studies suggested that in the large majority of studies, in which IV analysis was applied, one of the assumption could be violated (Potential bias of instrumental variable analyses for observational comparative effectiveness research, Ann Intern Med. 2014;161(2):131-8).


A proposal for reporting instrumental variable analyses has been suggested in Commentary: how to report instrumental variable analyses (suggestions welcome) (Epidemiology. 2013;24(3):370-4). In particular the type of treatment effect (average treatment effect/homogeneity condition or local average treatment effect/monotonicity condition) and the testing of critical assumptions for valid IV analyses should be reported. In support of these guidelines, the standardized difference has been proposed to falsify the assumption that confounders are not related to the instrumental variable (Quantitative falsification of instrumental variables assumption using balance measures, Epidemiology. 2014;25(5):770-2).


The complexity of the issues associated with confounding by indication, channelling and selective prescribing is explored in Evaluating short-term drug effects using a physician-specific prescribing preference as an instrumental variable (Epidemiology 2006;17(3):268-75). A conventional, adjusted multivariable analysis, the author showed a higher risk of gastrointestinal toxicity for selective COX-2-inhibitors than for traditional NSAIDs, which is at odds with results from clinical trials.. However, a physician-level instrumental variable approach (a time-varying estimate of a physician’s relative preference for a given drug, where at least two therapeutic alternatives exist) yielded evidence of a protective effect due to COX-2 exposure, particularly for shorter term drug exposures. Despite the potential benefits of physician-level IVs their performance can vary across databases and strongly depends on the definition of IV used as discussed in Evaluating different physician's prescribing preference based instrumental variables in two primary care databases: a study of inhaled long-acting beta2-agonist use and the risk of myocardial infarction (Pharmacoepidemiol Drug Saf. 2016;25 Suppl 1:132-41).


Instrumental variable methods in comparative safety and effectiveness research (Pharmacoepidemiol Drug Saf 2010;19:537–54) is a practical guidance on IV analyses in pharmacoepidemiology. Instrumental variable methods for causal inference (Stat Med. 2014;33(13):2297-340) is a tutorial, including statistical code for performing IV analysis.


An important limitation of IV analysis is that weak instruments (small association between IV and exposure) lead to decreased statistical efficiency and biased IV estimates as detailed in Instrumental variables: application and limitations (Epidemiology 2006;17:260-7). For example, in the above mentioned study on non-selective NSAIDs and COX-2-inhibitors, the confidence intervals for IV estimates were in the order of five times wider than with conventional analysis. Performance of instrumental variable methods in cohort and nested case-control studies: a simulation study (Pharmacoepidemiol Drug Saf 2014; 2014;23(2):165-77) demonstrated that a stronger IV-exposure association is needed in nested case-control studies compared to cohort studies in order to achieve the same bias reduction. Increasing the number of controls reduces this bias from IV analysis with relatively weak instruments.


Selecting on treatment: a pervasive form of bias in instrumental variable analyses (Am J Epidemiol. 2015;181(3):191-7) warns against bias in IV analysis by including only a subset of possible treatment options.


5.3.6. Prior event rate ratios


Another method proposed to control for unmeasured confounding is the Prior Event Rate Ratio (PERR) adjustment method, in which the effect of exposure is estimated using the ratio of rate ratios (RRs) from periods before and after initiation of a drug exposure as discussed in Replicated studies of two randomized trials of angiotensin converting enzyme inhibitors: further empiric validation of the ‘prior event rate ratio’ to adjust for unmeasured confounding by indication (Pharmacoepidemiol Drug Saf. 2008;17:671-685).  For example, when a new drug is launched, direct estimation of the drugs effect observed in the period after launch is potentially confounded. Differences in event rates in the period before the launch between future users and future non-users may provide a measure of the amount of confounding present. By dividing the effect estimate from the period after launch by the effect obtained in the period before launch, the confounding in the second period can be adjusted for. This method requires that confounding effects are constant over time, that there is no confounder-by-treatment interaction, and outcomes are non-lethal events.


Performance of prior event rate ratio adjustment method in pharmacoepidemiology: a simulation study (Pharmacoepidemiol Drug Saf. 2015;24:468-477) discusses that the PERR adjustment method can help to reduce bias as a result of unmeasured confounding in certain situations but that theoretical justification of assumptions should be provided.


5.3.7. Handling time-dependent confounding in the analysis


Methods for dealing with time-dependent confounding (Stat Med. 2013;32(9):1584-618) provides an overview of how time-dependent confounding can be handled in the analysis of a study. It provides an in-depth discussion of marginal structural models and g-computation. G-estimation

G-estimation is a method for estimating the joint effects of time-varying treatments using ideas from instrumental variables methods. G-estimation of Causal Effects: Isolated Systolic Hypertension and Cardiovascular Death in the Framingham Heart Study (Am J Epidemiol 1998;148(4):390-401) demonstrates how the G-estimation procedure allows for appropriate adjustment of the effect of a time-varying exposure in the presence of time-dependent confounders that are themselves influenced by the exposure. Marginal Structural Models (MSM)


The use of Marginal Structural Models can be an alternative to G-estimation. Marginal Structural Models and Causal Inference in Epidemiology (Epidemiology 2000;11:550-60) introduces MSM, a class of causal models that allow for improved adjustment for confounding in situations of time-dependent confounding.


MSMs have two major advantages over G-estimation. Even if it is useful for survival time outcomes, continuous measured outcomes and Poisson count outcomes, logistic G-estimation cannot be conveniently used to estimate the effect of treatment on dichotomous outcomes unless the outcome is rare. The second major advantage of MSMs is that they resemble standard models, whereas G-estimation does not (see Marginal Structural Models to Estimate the Causal Effect of Zidovudine on the Survival of HIV-Positive Men. Epidemiology 2000;11:561–70).


Effect of highly active antiretroviral therapy on time to acquired immunodeficiency syndrome or death using marginal structural models (Am J Epidemiol 2003;158:687-94) provides a clear example in which standard Cox analysis failed to detect a clinically meaningful net benefit of treatment because it does not appropriately adjust for time-dependent covariates that are simultaneously confounders and intermediate variables. This net benefit was shown using a marginal structural survival model. In Time-dependent propensity score and collider-stratification bias: an example of beta(2)-agonist use and the risk of coronary heart disease (Eur J Epidemiol 2013;28(4):291-9), various methods to control for time-dependent confounding are compared in an empirical study on the association between inhaled beta-2-agonists and the risk of coronary heart disease. MSMs resulted in slightly reduced associations compared to standard Cox-regression.

Beyond the approaches proposed above, traditional and efficient approaches to deal with time dependent variables should be considered in the design of the study, such as nested case control studies with assessment of time varying exposure windows.



Individual Chapters:


1. Introduction

2. Formulating the research question

3. Development of the study protocol

4. Approaches to data collection

4.1. Primary data collection

4.1.1. Surveys

4.1.2. Randomised clinical trials

4.2. Secondary data collection

4.3. Patient registries

4.3.1. Definition

4.3.2. Conceptual differences between a registry and a study

4.3.3. Methodological guidance

4.3.4. Registries which capture special populations

4.3.5. Disease registries in regulatory practice and health technology assessment

4.4. Spontaneous report database

4.5. Social media and electronic devices

4.6. Research networks

4.6.1. General considerations

4.6.2. Models of studies using multiple data sources

4.6.3. Challenges of different models

5. Study design and methods

5.1. Definition and validation of drug exposure, outcomes and covariates

5.1.1. Assessment of exposure

5.1.2. Assessment of outcomes

5.1.3. Assessment of covariates

5.1.4. Validation

5.2. Bias and confounding

5.2.1. Selection bias

5.2.2. Information bias

5.2.3. Confounding

5.3. Methods to handle bias and confounding

5.3.1. New-user designs

5.3.2. Case-only designs

5.3.3. Disease risk scores

5.3.4. Propensity scores

5.3.5. Instrumental variables

5.3.6. Prior event rate ratios

5.3.7. Handling time-dependent confounding in the analysis

5.4. Effect measure modification and interaction

5.5. Ecological analyses and case-population studies

5.6. Pragmatic trials and large simple trials

5.6.1. Pragmatic trials

5.6.2. Large simple trials

5.6.3. Randomised database studies

5.7. Systematic reviews and meta-analysis

5.8. Signal detection methodology and application

6. The statistical analysis plan

6.1. General considerations

6.2. Statistical analysis plan structure

6.3. Handling of missing data

7. Quality management

8. Dissemination and reporting

8.1. Principles of communication

8.2. Communication of study results

9. Data protection and ethical aspects

9.1. Patient and data protection

9.2. Scientific integrity and ethical conduct

10. Specific topics

10.1. Comparative effectiveness research

10.1.1. Introduction

10.1.2. General aspects

10.1.3. Prominent issues in CER

10.2. Vaccine safety and effectiveness

10.2.1. Vaccine safety

10.2.2. Vaccine effectiveness

10.3. Design and analysis of pharmacogenetic studies

10.3.1. Introduction

10.3.2. Identification of generic variants

10.3.3. Study designs

10.3.4. Data collection

10.3.5. Data analysis

10.3.6. Reporting

10.3.7. Clinical practice guidelines

10.3.8. Resources

Annex 1. Guidance on conducting systematic revies and meta-analyses of completed comparative pharmacoepidemiological studies of safety outcomes