ENCePP Guide on Methodological Standards in Pharmacoepidemiology


5.2. Bias and confounding


5.2.1. Selection bias


Selection bias entails the selective recruitment into the study of subjects that are not representative of the exposure or outcome pattern in the source population. Examples of selection bias are referral bias, self-selection bias, prevalence bias or protopathic bias (Strom BL, Kimmel SE, Hennessy S. Pharmacoepidemiology, 5th Edition, Wiley, 2012).


Protopathic bias


Protopathic bias arises when the initiation of a drug (exposure) occurs in response to a symptom of the (at this point undiagnosed) disease under study (outcome). For example, use of analgesics in response to pain caused by an undiagnosed tumour might lead to the erroneous conclusion that the analgesic caused the tumour. Protopathic bias thus reflects a reversal of cause and effect (Bias: Considerations for research practice. Am J Health Syst Pharm 2008;65:2159-68). This is particularly a problem in studies of drug-cancer associations and other outcomes with long latencies. It may be handled by including a time-lag, i.e. by disregarding all exposure during a specified period of time before the index date.
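The time-lag approach described above can be sketched as follows. This is a minimal illustration, not a prescribed implementation: the function name `lagged_exposure`, the dates and the one-year lag are all hypothetical, and the appropriate lag length depends on the outcome under study.

```python
from datetime import date, timedelta

def lagged_exposure(prescription_dates, index_date, lag_days):
    """Return True if any prescription was issued before the lag window.

    Prescriptions issued within `lag_days` before the index date are
    disregarded, because they may have been prompted by early symptoms
    of the not-yet-diagnosed outcome.
    """
    cutoff = index_date - timedelta(days=lag_days)
    return any(d < cutoff for d in prescription_dates)

# Hypothetical patient: analgesic first prescribed three months
# before the date of tumour diagnosis (the index date).
index_date = date(2020, 6, 1)
prescriptions = [date(2020, 3, 1), date(2020, 4, 15)]

exposed_no_lag = lagged_exposure(prescriptions, index_date, lag_days=0)
exposed_1y_lag = lagged_exposure(prescriptions, index_date, lag_days=365)
print(exposed_no_lag, exposed_1y_lag)
```

Without a lag the patient is classified as exposed; with a one-year lag the same prescriptions are disregarded and the patient is classified as unexposed.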


Prevalence bias


The practice of including prevalent users in observational studies, i.e. patients who have been taking a therapy for some time before study follow-up began, can cause two types of bias. Firstly, prevalent users are ‘survivors’ (healthy users) of the early period of pharmacotherapy, which can introduce substantial selection bias if risk varies with time, as seen in the association between oral contraceptive use and venous thrombosis, which was initially overestimated due to the healthy-user bias (The Transnational Study on Oral Contraceptives and the Health of Young Women. Methods, results, new analyses and the healthy user effect, Hum Reprod Update 1999 Nov-Dec;5(6)). Secondly, covariates for drug users at study entry are often plausibly affected by the drug itself.


5.2.2. Information bias


Information bias arises when incorrect information about exposure, outcome or any covariate is collected in the study. It can be either non-differential, when it occurs randomly across exposed and non-exposed participants, or differential, when it is influenced by the disease or exposure status.

Non-differential misclassification bias generally drives the risk estimate towards the null value, while differential bias can drive the risk estimate in either direction. Examples of differential misclassification bias are recall bias (e.g. in case-control studies, cases and controls can have different recall of their past exposures) and surveillance or detection bias.
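The contrast between non-differential and differential misclassification of a binary exposure can be illustrated with expected counts. The sketch below uses an invented cohort (true risk ratio of 2.0) and invented sensitivities and specificities of exposure classification; the function name `misclassified_risk_ratio` is hypothetical.

```python
def misclassified_risk_ratio(a, b, c, d, sens_case, spec_case,
                             sens_noncase, spec_noncase):
    """Expected risk ratio after misclassifying a binary exposure.

    a, b: truly exposed cases / non-cases
    c, d: truly unexposed cases / non-cases
    Sensitivity and specificity of exposure classification may differ
    between cases and non-cases (differential misclassification).
    """
    a_obs = sens_case * a + (1 - spec_case) * c        # cases classified exposed
    c_obs = (1 - sens_case) * a + spec_case * c        # cases classified unexposed
    b_obs = sens_noncase * b + (1 - spec_noncase) * d  # non-cases classified exposed
    d_obs = (1 - sens_noncase) * b + spec_noncase * d  # non-cases classified unexposed
    risk_exposed = a_obs / (a_obs + b_obs)
    risk_unexposed = c_obs / (c_obs + d_obs)
    return risk_exposed / risk_unexposed

# Hypothetical cohort: risk 0.20 in the exposed and 0.10 in the unexposed.
true_table = dict(a=200, b=800, c=100, d=900)

# Same error rates in cases and non-cases: non-differential.
rr_nondiff = misclassified_risk_ratio(**true_table, sens_case=0.8, spec_case=0.9,
                                      sens_noncase=0.8, spec_noncase=0.9)

# Cases report exposure more completely than non-cases: differential.
rr_diff = misclassified_risk_ratio(**true_table, sens_case=1.0, spec_case=0.9,
                                   sens_noncase=0.6, spec_noncase=0.9)

print(round(rr_nondiff, 2), round(rr_diff, 2))
```

With these assumed values the non-differential estimate lies between 1 and the true value of 2.0, whereas the differential estimate exceeds 2.0. Other choices of error rates could push the differential estimate below the null instead.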


Surveillance bias (or detection bias)


Surveillance or detection bias arises when patients in one exposure group have a higher probability of having the study outcome detected, due to increased surveillance, screening or testing of the outcome itself, or an associated symptom. For example, post-menopausal exposure to estrogen is associated with an increased risk of bleeding that can trigger screening for endometrial cancers, leading to a higher probability of early stage endometrial cancers being detected. Any association between estrogen exposure and endometrial cancer potentially overestimates risk, because unexposed patients with sub-clinical cancers would have a lower probability of their cancer being diagnosed or recorded. This is discussed in Alternative analytic methods for case-control studies of estrogens and endometrial cancer (N Engl J Med 1978;299(20):1089-94).


This non-random type of misclassification bias can be reduced by selecting an unexposed comparator group with a similar likelihood of screening or testing, by selecting outcomes that are likely to be diagnosed equally in both exposure groups, or by adjusting for the surveillance rate in the analysis. The issues and recommendations are outlined in Surveillance Bias in Outcomes Reporting (JAMA 2011;305(23):2462-3).


Time-related bias


Time-related bias is most often a form of differential misclassification bias and is triggered by inappropriate accounting of follow-up time and exposure status in the study design and analysis. 


The choice of the exposure risk window can influence risk comparisons due to misclassification of drug exposure possibly associated with risks that vary over time. A study of the effects of exposure misclassification due to the time-window design in pharmacoepidemiologic studies (J Clin Epidemiol 1994;47(2):183-9) considers the impact of the time-window design on the validity of risk estimates in record linkage studies. In adverse drug reaction studies, an exposure risk window is the number of exposure days assigned to each prescription. Ideally, each exposure risk window would cover only the period of potential excess risk. Estimating the period of drug-related risk is however complex, as it depends on the duration of drug use, the timing of ingestion and the onset and persistence of drug toxicity. With longer windows, a substantive attenuation of incidence rates may be observed. Risk windows should be validated or sensitivity analyses should be conducted.
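The attenuation seen with longer risk windows can be shown with simple arithmetic. The sketch below is a deliberately simplified model, not taken from the cited study: it assumes a constant baseline rate, an excess risk confined to the first `true_risk_days` after each prescription, and a reference rate that is unaffected by the choice of window. All names and values are hypothetical.

```python
def observed_rate_ratio(window_days, true_risk_days, true_rate_ratio):
    """Rate ratio observed inside an assigned risk window (simplified).

    Days of the window beyond the true risk period contribute person-time
    at the baseline rate and therefore dilute the observed rate.
    """
    if window_days <= true_risk_days:
        # The window contains only days with excess risk.
        return true_rate_ratio
    elevated = true_risk_days * true_rate_ratio   # rate-weighted days with excess risk
    baseline = window_days - true_risk_days       # days at the baseline rate
    return (elevated + baseline) / window_days

# Hypothetical drug: rate ratio of 5 during the first 30 days only.
for window in (30, 60, 120):
    print(window, observed_rate_ratio(window, true_risk_days=30, true_rate_ratio=5.0))
```

Under these assumptions, doubling the window from 30 to 60 days reduces the observed rate ratio from 5.0 to 3.0, and a 120-day window reduces it to 2.0. Repeating the analysis over several window lengths in this way is one form of the sensitivity analysis recommended above.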


Immortal time bias


Immortal time refers to a period of cohort follow-up time during which death (or an outcome that determines end of follow-up) cannot occur (K. Rothman, S. Greenland, T. Lash. Modern Epidemiology, 3rd Edition, Lippincott Williams & Wilkins, 2008, p. 106-7).


Immortal time bias can arise when the period between cohort entry and date of first exposure to a drug, during which the event of interest has not occurred, is either misclassified or simply excluded and not accounted for in the analysis. Immortal time bias in observational studies of drug effects (Pharmacoepidemiol Drug Saf 2007;16:241-9) demonstrates how several observational studies used a flawed approach to design and data analysis, leading to immortal time bias, which can generate an illusion of treatment effectiveness. This is frequently found in studies that compare groups of ‘users’ against ‘non-users’. Observational studies with surprisingly beneficial drug effects should therefore be re-assessed to account for this bias.
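The mechanism can be made concrete with a small person-time calculation. The five patients below are invented so that the drug has no effect, and the function name `rate_ratio` is hypothetical. The sketch compares a flawed analysis, in which all follow-up of ever-users is counted as exposed, with a time-dependent allocation in which the time before the first prescription is counted as unexposed.

```python
def rate_ratio(patients, time_dependent):
    """Mortality rate ratio (exposed vs unexposed) from person-time.

    patients: iterable of (first_rx_time, end_time, died), times in years
              since cohort entry; first_rx_time is None for never-users.
    time_dependent: if True, time before the first prescription is
              allocated to the unexposed group; if False, it is
              misclassified as exposed (the flawed approach).
    """
    person_time = {"exposed": 0.0, "unexposed": 0.0}
    deaths = {"exposed": 0, "unexposed": 0}
    for first_rx, end, died in patients:
        if first_rx is None:
            person_time["unexposed"] += end
            group = "unexposed"
        else:
            if time_dependent:
                person_time["unexposed"] += first_rx      # immortal time
                person_time["exposed"] += end - first_rx
            else:
                person_time["exposed"] += end
            group = "exposed"  # users can only die after their first prescription
        deaths[group] += int(died)
    rate_exposed = deaths["exposed"] / person_time["exposed"]
    rate_unexposed = deaths["unexposed"] / person_time["unexposed"]
    return rate_exposed / rate_unexposed

# Hypothetical cohort constructed so that the true rate ratio is 1.0.
cohort = [
    (1.0, 3.0, True),
    (1.0, 4.0, False),
    (2.0, 4.0, True),
    (None, 1.0, True),
    (None, 2.0, True),
]

rr_naive = rate_ratio(cohort, time_dependent=False)
rr_time_dependent = rate_ratio(cohort, time_dependent=True)
print(round(rr_naive, 2), round(rr_time_dependent, 2))
```

In this constructed example the flawed analysis suggests a strongly protective effect, while allocating the immortal time to the unexposed group recovers the null association.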


Immortal time bias in Pharmacoepidemiology (Am J Epidemiol 2008;167:492-9) describes various cohort study designs leading to this bias, quantifies its magnitude under different survival distributions and illustrates it with data from a cohort of lung cancer patients. For time-based, event-based and exposure-based cohort definitions, the bias in the rate ratio resulting from misclassified or excluded immortal time increases proportionately to the duration of immortal time.


Survival bias associated with time-to-treatment initiation in drug effectiveness evaluation: a comparison of methods (Am J Epidemiol 2005;162:1016-23) describes five different approaches to deal with immortal time bias. The use of a time-dependent approach had several advantages: no subjects were excluded from the analysis and the study allowed effect estimation at any point in time after discharge. However, changes of exposure might be predictive of the study endpoint and need adjustment for time-varying confounders using complex methods. Problem of immortal time bias in cohort studies: example using statins for preventing progression of diabetes (BMJ 2010;340:b5087) describes how immortal time in observational studies can bias the results in favour of the treatment group and how it can be identified and avoided. It is recommended that all cohort studies should be assessed for the presence of immortal time bias using appropriate validity criteria. However, Re. ‘Immortal time bias in pharmacoepidemiology’ (Am J Epidemiol 2009;170:667-8) argues that sound efforts at minimising the influence of more common biases should not be sacrificed to that of avoiding immortal time bias.


Other forms of time-related bias


Time-window Bias in Case-control Studies. Statins and Lung Cancer (Epidemiology 2011; 22 (2):228-31) describes a case-control study which reported a 45% reduction in the rate of lung cancer with any statin use. A differential misclassification bias arose from the methods used to select controls and measure their exposure, which resulted in exposure assessment to statins being based on a shorter time-span for cases than controls and an over-representation of unexposed cases. Properly accounting for time produced a null association.


In many database studies, exposure status during hospitalisations is unknown. Exposure misclassification bias may occur, with a direction that depends on whether drugs prescribed before a hospitalisation are continued or discontinued during it and on whether days of hospitalisation are treated as gaps in exposure, especially when several exposure categories are assigned, such as current, recent and past. The differential bias arising from the lack of information on (or lack of consideration of) hospitalisations that occur during the observation period, called ‘immeasurable time bias’ and described in Immeasurable time bias in observational studies of drug effects on mortality (Am J Epidemiol 2008;168(3):329-35), can be particularly problematic when studying serious chronic diseases that require extensive medication use and multiple hospitalisations.


In case-control studies assessing chronic diseases with multiple hospitalisations and in-patient treatment (such as the use of inhaled corticosteroids and death in chronic obstructive pulmonary disease patients), no clearly valid approach to data analysis can fully circumvent this bias. However, sensitivity analyses such as restricting the analysis to non-hospitalised patients or providing estimates weighted by exposable time may provide additional information on the potential impact of this bias (Immeasurable time bias in observational studies of drug effects on mortality. Am J Epidemiol 2008;168(3):329-35).


In cohort studies where a first-line therapy (such as metformin) has been compared with second- or third-line therapies, patients are unlikely to be at the same stage of the disease (e.g. diabetes), which can induce confounding of the association with an outcome (e.g. cancer incidence) by disease duration. An outcome related to the first-line therapy may also be attributed to the second-line therapy if it occurs after a long period of exposure. Such a situation requires matching on disease duration and consideration of latency time windows in the analysis (example drawn from Metformin and the Risk of Cancer. Time-related biases in observational studies. Diabetes Care 2012;35(12):2665-73).


5.2.3. Confounding

Confounding occurs when the estimate of a measure of association is distorted by the presence of another risk factor. For a variable to be a confounder, it must be associated with both the exposure and the outcome, without being in the causal pathway.
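The distortion can be seen by comparing a crude estimate with one stratified on the confounder. The sketch below uses invented counts in which disease severity is associated with both exposure and outcome, and pools the strata with the Mantel-Haenszel risk ratio. The data and function names are hypothetical.

```python
def crude_risk_ratio(strata):
    """Risk ratio ignoring stratification.

    Each stratum is (exposed_cases, exposed_total,
                     unexposed_cases, unexposed_total).
    """
    a = sum(s[0] for s in strata)
    n1 = sum(s[1] for s in strata)
    c = sum(s[2] for s in strata)
    n0 = sum(s[3] for s in strata)
    return (a / n1) / (c / n0)

def mantel_haenszel_risk_ratio(strata):
    """Mantel-Haenszel pooled risk ratio across strata."""
    numerator = sum(a * n0 / (n1 + n0) for a, n1, c, n0 in strata)
    denominator = sum(c * n1 / (n1 + n0) for a, n1, c, n0 in strata)
    return numerator / denominator

# Hypothetical data: severe patients are more often treated and have
# a higher risk of the outcome, but within each stratum the risk is
# identical in exposed and unexposed patients.
strata = [
    (80, 400, 20, 100),   # severe disease: risk 0.20 in both groups
    (5, 100, 20, 400),    # mild disease: risk 0.05 in both groups
]

rr_crude = crude_risk_ratio(strata)
rr_mh = mantel_haenszel_risk_ratio(strata)
print(rr_crude, rr_mh)
```

Here the crude risk ratio of 2.125 is entirely due to the uneven distribution of severity between the exposure groups; the stratified estimate is 1.0.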


Confounding by indication


Confounding by indication refers to a determinant of the outcome parameter that is present in people at perceived high risk or poor prognosis and is an indication for intervention. This means that differences in care, for example, between cases and controls may partly originate from differences in indication for medical intervention such as the presence of risk factors for particular health problems. Other names for this type of confounding are ‘channelling’ or ‘confounding by severity’.


This type of confounding has frequently been reported in studies evaluating the efficacy of pharmaceutical interventions and is almost always encountered to varying extents in pharmacoepidemiological studies. A good example can be found in Confounding and indication for treatment in evaluation of drug treatment for hypertension (BMJ 1997;315:1151-4).


The article Confounding by indication: the case of the calcium channel blockers (Pharmacoepidemiol Drug Saf 2000;9:37-41) demonstrates that studies with potential confounding by indication can benefit from appropriate analytic methods, including separating the effects of a drug taken at different times, sensitivity analysis for unmeasured confounders, instrumental variables and G-estimation.


With the more recent application of pharmacoepidemiological methods to assess effectiveness, confounding by indication is a greater challenge, and the article Approaches to combat with confounding by indication in observational studies of intended drug effects (Pharmacoepidemiol Drug Saf 2003;12:551-8) focusses on its possible reduction in studies of intended effects. An extensive review of these and other methodological approaches, discussing their strengths and limitations, is provided in Methods to assess intended effects of drug treatment in observational studies are reviewed (J Clin Epidemiol 2004;57:1223-31).


Unmeasured confounding


Complete adjustment for confounders would require detailed information on clinical parameters, lifestyle or over-the-counter medications, which are often not measured in electronic healthcare records, causing residual confounding bias. Using directed acyclic graphs to detect limitations of traditional regression in longitudinal studies (Int J Public Health 2010;55:701-3) reviews confounding and intermediate effects in longitudinal data and introduces causal graphs to understand the relationships between the variables in an epidemiological study.


Unmeasured confounding can be adjusted for only through randomisation. When this is not possible, as is most often the case in pharmacoepidemiological studies, the potential impact of residual confounding on the results should be estimated and considered in the discussion.


Sensitivity analysis and external adjustment for unmeasured confounders in epidemiologic database studies of therapeutics (Pharmacoepidemiol Drug Saf 2006;15(5):291-303) provides a systematic approach to sensitivity analyses to investigate the impact of residual confounding in pharmacoepidemiological studies that use healthcare utilisation databases. In the article, four basic approaches to sensitivity analysis were identified: (1) sensitivity analyses based on an array of informed assumptions; (2) analyses to identify the strength of residual confounding that would be necessary to explain an observed drug-outcome association; (3) external adjustment of a drug-outcome association given additional information on single binary confounders from survey data using algebraic solutions; (4) external adjustment considering the joint distribution of multiple confounders of any distribution from external sources of information using propensity score calibration. The paper concludes that sensitivity analyses and external adjustments can improve our understanding of the effects of drugs in epidemiological database studies. With the availability of easy-to-apply spreadsheets, sensitivity analyses should be used more frequently, replacing purely qualitative discussions of residual confounding.
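External adjustment for a single binary unmeasured confounder, approach (3) above, can be sketched with the standard algebraic bias formula, combined with a small array of assumptions in the spirit of approach (1). The observed risk ratio, confounder prevalences and confounder-outcome associations below are invented for illustration, and the function name `externally_adjusted_rr` is hypothetical; in practice the prevalences would come from an external source such as survey data.

```python
def externally_adjusted_rr(rr_observed, prev_exposed, prev_unexposed, rr_confounder):
    """Adjust an observed risk ratio for one binary unmeasured confounder.

    prev_exposed / prev_unexposed: prevalence of the confounder among
        exposed and unexposed patients.
    rr_confounder: risk ratio for the confounder-outcome association.
    Assumes no effect modification between drug and confounder.
    """
    bias = ((prev_exposed * (rr_confounder - 1) + 1)
            / (prev_unexposed * (rr_confounder - 1) + 1))
    return rr_observed / bias

# Hypothetical scenario: observed RR of 1.5; the confounder (e.g. smoking)
# is present in 40% of exposed and 20% of unexposed patients.
rr_adjusted = externally_adjusted_rr(1.5, prev_exposed=0.4,
                                     prev_unexposed=0.2, rr_confounder=2.0)
print(round(rr_adjusted, 3))

# Array of assumptions: how the adjusted estimate changes as the assumed
# confounder-outcome association becomes stronger.
for rr_cd in (1.5, 2.0, 3.0, 5.0):
    adjusted = externally_adjusted_rr(1.5, 0.4, 0.2, rr_cd)
    print(rr_cd, round(adjusted, 3))
```

When the confounder is equally prevalent in both exposure groups, or is unrelated to the outcome, the bias factor is 1 and the observed estimate is returned unchanged.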


The amount of bias in exposure-effect estimates that can plausibly occur due to residual or unmeasured confounding has been debated. The impact of residual and unmeasured confounding in epidemiologic studies: a simulation study (Am J Epidemiol 2007;166:646–55) considers the extent and patterns of bias in estimates of exposure-outcome associations that can result from residual or unmeasured confounding, when there is no true association between the exposure and the outcome. With plausible assumptions about residual and unmeasured confounding, effect sizes of the magnitude frequently reported in observational epidemiological studies can be generated. This study also highlights the need to perform sensitivity analyses to assess whether unmeasured and residual confounding are likely problems. Another important finding of this study was that when confounding factors (measured or unmeasured) are interrelated (e.g. in situations of confounding by indication), adjustment for a few factors can almost completely eliminate confounding.

