New user (incident user) designs restrict the study population to persons who are observed at the start of treatment. The new user design prevents ‘depletion of susceptibles’ (the unwanted exclusion from a safety assessment of persons who discontinue treatment following early adverse reactions) and helps alleviate healthy user bias for preventive treatments in some circumstances. The article Evaluating medication effects outside of clinical trials: new-user designs (Am J Epidemiol 2003;158(9):915–20) defines new-user designs and explains how they can be implemented as case-control studies. The new user design also helps mitigate confounding by indication, severity or frailty, as described in The active comparator, new user study design in pharmacoepidemiology: historical foundations and contemporary application (Curr Epidemiol Rep 2015;2(4):221-8).
The use of case only study designs can reduce selection bias where the statistical assumptions of the method are fulfilled (see Chapter 18.104.22.168).
The active comparator approach includes only populations who have received treatment (see Chapter 22.214.171.124). These comparisons are less likely to be biased by unmeasured patient characteristics than studies where one group received no therapy at all (see Healthy User and Related Biases in Observational Studies of Preventive Interventions: A Primer for Physicians. J Gen Intern Med 2011, 26(5):546-50).
Misclassification can occur in exposure, outcome, or covariate variables. Outcome misclassification occurs when a non-case is classified as a case (false positive error) or a case is classified as a non-case (false negative error). Errors are quantified as estimates of positive predictive value (PPV), negative predictive value, sensitivity and specificity. Most database studies will be subject to outcome misclassification to some degree, unless cases have been adjudicated against a case definition, so its likely impact on the effect estimate should be assessed. One should avoid the epidemiologic ‘mantra’ about non-differential misclassification of exposure producing conservative estimates, because the logic does not necessarily apply. Good practices for quantitative bias analysis (Int J Epidemiol 2014;43(6):1969-85) advocates explicit and quantitative assessment of misclassification bias, including decision guidance on which biases to assess in a given situation, what level of sophistication to use, and how to present the results. Use of the Positive Predictive Value to Correct for Disease Misclassification in Epidemiologic Studies (Am J Epidemiol 1993;138(11):1007-15) proposes a method based on estimates of the PPV, which requires validation of only a sample of those with the outcome. Addressing misclassification of confounding variables, for example by external adjustment, alleviates residual confounding (see Adjustments for unmeasured confounders in pharmacoepidemiologic database studies using external information. Med Care 2007;45(10 Suppl 2):S158-65).
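To make the idea of a quantitative assessment concrete, the following is a minimal sketch of a simple bias analysis for non-differential outcome misclassification in a cohort study. It assumes that sensitivity and specificity are known (for example from a validation study) and apply equally to both exposure groups. The function names and all counts are hypothetical and are not taken from the cited papers.

```python
def corrected_cases(observed_cases, group_size, sensitivity, specificity):
    """Expected number of true cases in one exposure group.

    Solves: observed = Se * true + (1 - Sp) * (group_size - true) for `true`.
    """
    denominator = sensitivity + specificity - 1.0
    if denominator <= 0:
        raise ValueError("sensitivity + specificity must exceed 1")
    # Observed cases expected if the group contained no true cases at all.
    false_positives_if_no_cases = (1.0 - specificity) * group_size
    true_cases = (observed_cases - false_positives_if_no_cases) / denominator
    if not 0.0 < true_cases <= group_size:
        raise ValueError("bias parameters are incompatible with the observed counts")
    return true_cases


def risk_ratio(cases_exposed, n_exposed, cases_unexposed, n_unexposed):
    """Plain cumulative-incidence ratio."""
    return (cases_exposed / n_exposed) / (cases_unexposed / n_unexposed)


# Hypothetical cohort: 150/10,000 observed cases among the exposed
# and 100/10,000 among the unexposed.
SENSITIVITY, SPECIFICITY = 0.80, 0.995

observed_rr = risk_ratio(150, 10_000, 100, 10_000)
corrected_rr = risk_ratio(
    corrected_cases(150, 10_000, SENSITIVITY, SPECIFICITY), 10_000,
    corrected_cases(100, 10_000, SENSITIVITY, SPECIFICITY), 10_000,
)
print(f"observed RR = {observed_rr:.2f}, corrected RR = {corrected_rr:.2f}")
```

With these assumed values the observed risk ratio of 1.5 becomes 2.0 after correction, because false positives arising from imperfect specificity dilute the contrast between groups. A fuller analysis would also vary the bias parameters over plausible ranges rather than rely on single values.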
Case-only designs reduce confounding by using the exposure history of each case as its own control and thereby eliminate confounding by characteristics that are constant over time, such as sex, socio-economic factors, genetics and chronic diseases. A review of case only designs is available in Use of self-controlled designs in pharmacoepidemiology (J Intern Med 2014; 275(6): 581-9).
A simple form of a case-only design is the symmetry analysis (initially described as prescription sequence symmetry analysis), introduced as a screening tool in Evidence of depression provoked by cardiovascular medication: a prescription sequence symmetry analysis (Epidemiology 1996;7(5):478-84).
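As an illustration only, the core quantity of a symmetry analysis, the crude sequence ratio, can be sketched as follows. The sketch assumes that each patient contributes the date of the first prescription of the index drug and of the marker drug (the drug taken as a proxy for the outcome), and that only sequences falling within a fixed window are counted. The 365-day window, the function name and the dates are hypothetical, and the adjustment for underlying prescribing trends used in practice is deliberately omitted.

```python
from datetime import date


def crude_sequence_ratio(first_dates, max_gap_days=365):
    """Ratio of 'index drug first' to 'marker drug first' sequences.

    first_dates: iterable of (index_drug_date, marker_drug_date) per patient.
    Same-day initiations and gaps longer than max_gap_days are ignored.
    """
    index_first = 0
    marker_first = 0
    for index_date, marker_date in first_dates:
        gap = (marker_date - index_date).days
        if gap == 0 or abs(gap) > max_gap_days:
            continue
        if gap > 0:
            index_first += 1
        else:
            marker_first += 1
    if marker_first == 0:
        raise ValueError("no 'marker drug first' sequences; ratio undefined")
    return index_first / marker_first


# Hypothetical first-prescription dates for six patients.
patients = [
    (date(2020, 1, 1), date(2020, 3, 1)),  # index drug first
    (date(2020, 1, 1), date(2020, 6, 1)),  # index drug first
    (date(2020, 5, 1), date(2020, 2, 1)),  # marker drug first
    (date(2020, 1, 1), date(2020, 1, 1)),  # same day: excluded
    (date(2018, 1, 1), date(2020, 1, 1)),  # gap too long: excluded
    (date(2020, 2, 1), date(2020, 4, 1)),  # index drug first
]
sequence_ratio = crude_sequence_ratio(patients)
```

A ratio clearly above 1 would suggest that the marker drug is started more often after than before the index drug, which is the signal the screening tool looks for.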
The case-crossover design compares the probability of exposure in a time period prior to an outcome with that in an earlier reference time period, or set of time periods, to examine the effect of transient exposures on acute events (see The Case-Crossover Design: A Method for Studying Transient Effects on the Risk of Acute Events, Am J Epidemiol 1991;133(2):144-53). The case-time-control design is a modification of the case-crossover design that uses exposure history data from a traditional control group to estimate and adjust for the bias from temporal changes in prescribing (The case-time-control design, Epidemiology 1995;6(3):248-53). However, if not well matched, the control group may reintroduce selection bias (Confounding and exposure trends in case-crossover and case-time-control designs, Epidemiology 1996;7(3):231-9). Methods have been suggested to overcome the exposure-trend bias while controlling for time-invariant confounders (see Future cases as present controls to adjust for exposure trend bias in case-only studies, Epidemiology 2011;22(4):568-74, and "First-wave" bias when conducting active safety monitoring of newly marketed medications with outcome-indexed self-controlled designs, Am J Epidemiol 2014;180(6):636-44). Persistent User Bias in Case-Crossover Studies in Pharmacoepidemiology (Am J Epidemiol 2016;184(10):761-9) demonstrates that case-crossover studies of drugs that may be used indefinitely are biased upward. This bias is alleviated, but not removed completely, by using a control group.
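The logic of these two designs can be sketched in a few lines. The example assumes the simplest setting, with one hazard window and one reference window per person and a binary exposure indicator in each; under that assumption the conditional odds ratio reduces to the ratio of discordant counts. Function names and counts are hypothetical.

```python
def discordant_pair_odds_ratio(windows):
    """Case-crossover odds ratio for one hazard and one reference window.

    windows: iterable of (exposed_in_hazard_window, exposed_in_reference_window).
    Persons exposed in both or in neither window carry no information.
    """
    windows = list(windows)
    hazard_only = sum(1 for hazard, reference in windows if hazard and not reference)
    reference_only = sum(1 for hazard, reference in windows if reference and not hazard)
    if reference_only == 0:
        raise ValueError("no persons exposed in the reference window only")
    return hazard_only / reference_only


def case_time_control_odds_ratio(case_windows, control_windows):
    """Divide the case-crossover OR by the exposure-trend OR seen in controls."""
    return (discordant_pair_odds_ratio(case_windows)
            / discordant_pair_odds_ratio(control_windows))


# Hypothetical data: (hazard window exposed, reference window exposed).
case_windows = ([(True, False)] * 30 + [(False, True)] * 10
                + [(True, True)] * 5 + [(False, False)] * 55)
control_windows = ([(True, False)] * 15 + [(False, True)] * 10
                   + [(False, False)] * 75)

case_crossover_or = discordant_pair_odds_ratio(case_windows)
trend_adjusted_or = case_time_control_odds_ratio(case_windows, control_windows)
```

Here the case-crossover odds ratio of 3.0 falls to 2.0 once the upward exposure trend observed in the controls (odds ratio 1.5) is divided out. Analyses with several reference windows would normally use conditional logistic regression instead.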
In the self-controlled case series (SCCS) design, the observation period for each case is divided into risk period(s) (e.g. a number of days immediately following each exposure) and a control period (observed time outside the risk periods). Incidence rates within the risk period after exposure are compared with incidence rates within the control period. The Tutorial in biostatistics: the self-controlled case series method (Stat Med 2006;25(10):1768-97) and the associated website http://statistics.open.ac.uk/sccs explain how to fit SCCS models using standard statistical packages. The bias introduced by inaccurate specification of the risk window is discussed, and a data-based approach for identifying the optimal risk windows is proposed, in Identifying optimal risk windows for self-controlled case series studies of vaccine safety (Stat Med 2011;30(7):742-52). The SCCS also assumes that the event itself does not affect the chance of being exposed. The pseudo-likelihood method developed to address this possible issue is described in Case series analysis for censored, perturbed, or curtailed post-event exposures (Biostatistics 2009;10(1):3-16). Use of the self-controlled case-series method in vaccine safety studies: review and recommendations for best practice (Epidemiol Infect 2011;139(12):1805-17) assesses how the SCCS method has been used across 40 vaccine studies, highlights good practice and gives guidance on how the method should be used and reported. Using several methods of analysis is recommended, as it can reinforce conclusions or shed light on possible sources of bias when results differ between study designs.
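The comparison of risk and control periods can be illustrated with a deliberately simplified sketch: a single risk window per case, no age or season effects, and a relative incidence estimated by maximising the conditional likelihood in which each case's events are distributed between risk and control time. These simplifications, the function name and the data are assumptions made for illustration; the cited tutorial describes the full method.

```python
import math


def sccs_relative_incidence(cases, tolerance=1e-10):
    """Maximum conditional likelihood estimate of the relative incidence.

    cases: list of (events_in_risk, events_in_control, days_in_risk, days_in_control),
    one tuple per case, with positive risk and control time.
    The estimate is assumed to lie between 1e-6 and 1e6.
    """
    risk_events = sum(x for x, _, _, _ in cases)
    control_events = sum(y for _, y, _, _ in cases)
    if risk_events == 0 or control_events == 0:
        raise ValueError("events are needed in both risk and control periods")

    def score(log_rho):
        # Derivative of the conditional log-likelihood with respect to log(rho):
        # observed risk-period events minus those expected given rho.
        rho = math.exp(log_rho)
        expected = sum((x + y) * rho * r / (rho * r + c) for x, y, r, c in cases)
        return risk_events - expected

    # The score decreases monotonically in log(rho), so bisection finds its root.
    low, high = math.log(1e-6), math.log(1e6)
    while high - low > tolerance:
        middle = (low + high) / 2.0
        if score(middle) > 0:
            low = middle
        else:
            high = middle
    return math.exp((low + high) / 2.0)


# Hypothetical data: 60 cases with one event each, a 14-day risk window
# and 351 days of control time; 10 events fall in the risk window.
cases = [(1, 0, 14, 351)] * 10 + [(0, 1, 14, 351)] * 50
relative_incidence = sccs_relative_incidence(cases)
```

When all cases share the same window lengths, as here, the estimate coincides with the simple ratio of incidence rates, (10/14)/(50/351). With unequal windows the iterative solution is needed.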
When should case-only designs be used for safety monitoring of medical products? (Pharmacoepidemiol Drug Saf 2012;21(Suppl. 1):50-61) compares the SCCS and case-crossover methods as to their use, strengths and major difference (directionality). It concludes that case-only analyses of intermittent users complement the cohort analyses of prolonged users because their different biases compensate for one another. It also provides recommendations on when case-only designs should and should not be used for drug safety monitoring. Empirical performance of the self-controlled case series design: lessons for developing a risk identification and analysis system (Drug Saf 2013;36(Suppl. 1):S83-S93) evaluates the performance of the SCCS design using 399 drug-health outcome pairs in 5 observational databases and 6 simulated datasets. Four outcomes and five design choices were assessed. Within-person study designs had lower precision and greater susceptibility to bias because of trends in exposure than cohort and nested case-control designs (J Clin Epidemiol 2012;65(4):384-93) compares cohort, case-control, case-crossover and SCCS designs to explore the association between thiazolidinediones and the risks of heart failure and fracture, and between anticonvulsants and the risk of fracture. Bias was removed when follow-up was sampled both before and after the outcome, or when a case-time-control design was used.
The main purpose of using an active comparator is to reduce confounding by indication or by severity, at least in relation to the contrasts “treated diseased vs. untreated undiseased” or “treated diseased vs. untreated diseased”. It is optimal to use the active comparator in the context of the new user design, whereby comparison is between patients with the same indication initiating different treatments (see The active comparator, new user study design in pharmacoepidemiology: historical foundations and contemporary application, Curr Epidemiol Rep 2015;2(4):221-8). An active comparator should be chosen to represent the counterfactual risk of a given outcome in the absence of the treatment of interest, i.e., it should have a known and positive safety profile with respect to the events of interest and ideally represent the background risk in the diseased but untreated (for example, the safety of newer antibiotics in pregnancy in relation to the risk of congenital malformations could be compared against that of penicillin, which is not known to be teratogenic). Especially with newly marketed medicines, no active comparator with ideal comparability may be available, because prescribing of newly marketed medicines may be driven to a greater extent by patients' prognostic characteristics than prescribing of established medicines (early users may be either sicker or healthier than all patients with the indication). This also applies to comparative effectiveness studies, as described in Assessing the comparative effectiveness of newly marketed medications: methodological challenges and implications for drug development (Clin Pharmacol Ther 2011;90(6):777-90) and in Newly marketed medications present unique challenges for nonrandomized comparative effectiveness analyses (J Comp Eff Res 2012;1(2):109-11). Other challenges include treatment effect heterogeneity as patient characteristics of users evolve over time, and low precision owing to slow drug uptake.
An approach to controlling for a large number of confounding variables is to summarise them in a single multivariable confounder score. Stratification by a multivariate confounder score (Am J Epidemiol 1976;104(6):609-20) shows how control for confounding may be based on stratification by the score. An example is a disease risk score (DRS) that estimates the probability or rate of disease occurrence conditional on being unexposed. The association between exposure and disease is then estimated with adjustment for the disease risk score in place of the individual covariates.
DRSs are, however, difficult to estimate if outcomes are rare. Use of disease risk scores in pharmacoepidemiologic studies (Stat Methods Med Res 2009;18(1):67-80) includes a detailed description of their construction and use, a summary of simulation studies comparing their performance to traditional models, a comparison of their utility with that of propensity scores, and some further topics for future research. Disease risk score as a confounder summary method: systematic review and recommendations (Pharmacoepidemiol Drug Saf 2013;22(2):122-29) examines trends in the use and application of DRS as a confounder summary method and shows that large variation exists, with differences in terminology and methods used.
In Role of disease risk scores in comparative effectiveness research with emerging therapies (Pharmacoepidemiol Drug Saf 2012;21 Suppl 2:138–47) it is argued that DRS may have a place when studying drugs that are recently introduced to the market. In such situations, as characteristics of users change rapidly, exposure propensity scores may prove highly unstable. DRSs based mostly on biological associations would be more stable. However, DRS models are still sensitive to misspecification as discussed in Adjusting for Confounding in Early Postlaunch Settings: Going Beyond Logistic Regression Models (Epidemiology 2016;27(1):133-42).
Databases used in pharmacoepidemiological studies often include records of prescribed medications and encounters with medical care providers, from which one can construct surrogate measures for both drug exposure and covariates that are potential confounders. It is often possible to track day-by-day changes in these variables. However, while this information can be critical for study success, its volume can pose challenges for statistical analysis.
A propensity score (PS) is analogous to the disease risk score in that it combines a large number of possible confounders into a single variable (the score). The exposure propensity score (EPS) is the conditional probability of exposure to a treatment given observed covariates. In a cohort study, matching or stratifying treated and comparison subjects on EPS tends to balance all of the observed covariates. However, unlike random assignment of treatments, the propensity score may not balance unobserved covariates. Invited Commentary: Propensity Scores (Am J Epidemiol 1999;150(4):327–33) reviews the uses and limitations of propensity scores and provides a brief outline of the associated statistical theory. The authors present results of adjustment by matching or stratification on the propensity score.
High-dimensional Propensity Score Adjustment in Studies of Treatment Effects Using Healthcare Claims Data (Epidemiology 2009;20(4):512-22) discusses the high-dimensional propensity score (hd-PS) model approach. It attempts to empirically identify large numbers of potential confounders in healthcare databases and, by doing so, to extract more information on confounders and proxies. Covariate selection in high-dimensional propensity score analyses of treatment effects in small samples (Am J Epidemiol 2011;173(12):1404-13) evaluates the relative performance of hd-PS in smaller samples. Confounding adjustment via a semi-automated high-dimensional propensity score algorithm: an application to electronic medical records (Pharmacoepidemiol Drug Saf 2011;20(8):849-57) evaluates the use of hd-PS in a primary care electronic medical record database. In addition, the article Using high-dimensional propensity scores to automate confounding control in a distributed medical product safety surveillance system (Pharmacoepidemiol Drug Saf 2012;21(S1):41-9) summarises the application of this method for automating confounding control in sequential cohort studies as applied to safety monitoring systems using healthcare databases, and also discusses the strengths and limitations of hd-PS.
Most cohort studies match patients 1:1 on the propensity score. Increasing the matching ratio may increase precision but also bias. One-to-many propensity score matching in cohort studies (Pharmacoepidemiol Drug Saf 2012;21(S2):69-80) tests several methods for 1:n propensity score matching in simulation and empirical studies and recommends using a variable ratio that increases precision at a small cost of bias. Matching by propensity score in cohort studies with three treatment groups (Epidemiology 2013;24(3):401-9) develops and tests a 1:1:1 propensity score matching approach offering a way to compare three treatment options.
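The mechanics of propensity score matching can be sketched as below. This is a simple greedy nearest-neighbour procedure with a caliper, run in rounds so that each treated patient receives at most one additional comparator per round. It is an illustrative assumption rather than the specific algorithms evaluated in the cited papers, and the caliper of 0.05, the function name and the scores are hypothetical.

```python
def greedy_match(treated_scores, comparator_scores, caliper=0.05, max_ratio=1):
    """Greedy nearest-neighbour matching on the propensity score.

    Returns a dict mapping each treated index to the list of matched
    comparator indices (possibly empty). Each comparator is used at most once.
    With max_ratio > 1 the realised ratio varies between treated patients.
    """
    available = set(range(len(comparator_scores)))
    matches = {t_index: [] for t_index in range(len(treated_scores))}
    for _ in range(max_ratio):
        for t_index, t_score in enumerate(treated_scores):
            within_caliper = [
                (abs(t_score - comparator_scores[c_index]), c_index)
                for c_index in available
                if abs(t_score - comparator_scores[c_index]) <= caliper
            ]
            if not within_caliper:
                continue
            # Closest comparator wins; ties go to the lowest index.
            _, best = min(within_caliper)
            matches[t_index].append(best)
            available.remove(best)
    return matches


# Hypothetical propensity scores.
treated = [0.30, 0.60, 0.90]
comparators = [0.32, 0.58, 0.31, 0.10]

one_to_one = greedy_match(treated, comparators)
up_to_two = greedy_match(treated, comparators, max_ratio=2)
```

In this toy example the third treated patient has no comparator within the caliper and is left unmatched, while allowing up to two matches gives the first patient a second, slightly more distant comparator. This is the precision-versus-bias trade-off the paragraph above refers to.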
The use of several measures of balance for developing an optimal propensity score model is described in Measuring balance and model selection in propensity score methods (Pharmacoepidemiol Drug Saf 2011;20(11):1115-29) and further evaluated in Propensity score balance measures in pharmacoepidemiology: a simulation study (Pharmacoepidemiol Drug Saf 2014;23(8):802-11). In most situations, the standardised difference performs best and is easy to calculate (see Balance measures for propensity score methods: a clinical example on beta-agonist use and the risk of myocardial infarction (Pharmacoepidemiol Drug Saf 2011;20(11):1130-7) and Reporting of covariate selection and balance assessment in propensity score analysis is suboptimal: a systematic review (J Clin Epidemiol 2015;68(2):112-21)). Metrics for covariate balance in cohort studies of causal effects (Stat Med 2014;33(10):1685-99) shows in a simulation study that the c-statistic of the PS model after matching and the general weighted difference perform as well as the standardised difference and are preferred when an overall summary measure of balance is requested. Treatment effects in the presence of unmeasured confounding: dealing with observations in the tails of the propensity score distribution--a simulation study (Am J Epidemiol 2010;172(7):843-54) demonstrates how ‘trimming’ of the propensity score eliminates subjects who are treated contrary to prediction and their exposed/unexposed counterparts, thereby reducing bias from unmeasured confounders.
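Because the standardised difference is singled out as easy to calculate, a short sketch of the usual formulas may help. It shows the version for a continuous covariate (difference in means divided by the pooled standard deviation) and for a binary covariate (difference in proportions divided by the pooled binomial standard deviation). Function names and data are hypothetical.

```python
import math
from statistics import mean, variance


def standardised_difference(treated_values, comparator_values):
    """Standardised difference for a continuous covariate."""
    pooled_sd = math.sqrt(
        (variance(treated_values) + variance(comparator_values)) / 2.0
    )
    if pooled_sd == 0:
        raise ValueError("covariate has no variation in either group")
    return (mean(treated_values) - mean(comparator_values)) / pooled_sd


def standardised_difference_binary(p_treated, p_comparator):
    """Standardised difference for a binary covariate, from the two proportions."""
    pooled_sd = math.sqrt(
        (p_treated * (1 - p_treated) + p_comparator * (1 - p_comparator)) / 2.0
    )
    if pooled_sd == 0:
        raise ValueError("covariate has no variation in either group")
    return (p_treated - p_comparator) / pooled_sd


# Hypothetical covariate values in treated and comparator patients.
age_difference = standardised_difference([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
diabetes_difference = standardised_difference_binary(0.6, 0.4)
```

Unlike a significance test, the measure does not depend on sample size, which is one reason it is convenient for comparing balance before and after matching. An absolute value below about 0.1 is a commonly used, though conventional, threshold for acceptable balance.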
Performance of propensity score calibration--a simulation study (Am J Epidemiol 2007;165(10):1110-8) introduces ‘propensity score calibration’ (PSC). This technique combines propensity score matching methods with measurement error regression models to address confounding by variables unobserved in the main study. This is done by using additional covariate measurements observed in a validation study, which is often a subset of the main study.
Although in most situations propensity score models, with the exception of hd-PS, do not have any advantages over conventional multivariate modelling in terms of adjustment for identified confounders, several other benefits may be derived. Propensity score methods may help to gain insight into determinants of treatment, including age, frailty and comorbidity, and to identify individuals treated against expectation. A statistical advantage of PS analyses is that, if exposure is not infrequent, it is possible to adjust for a large number of covariates even if outcomes are rare, a situation often encountered in drug safety research. Furthermore, assessment of the PS distribution may reveal non-positivity. An important limitation of PS is that it is not directly amenable to case-control studies. A critical assessment of propensity scores is provided in Propensity scores: from naive enthusiasm to intuitive understanding (Stat Methods Med Res 2012;21(3):273-93).
Instrumental variable (IV) analysis is an approach to address uncontrolled confounding in comparative studies. An introduction to instrumental variables for epidemiologists (Int J Epidemiol 2000;29(4):722-9) introduces IV methods, illustrated by an application to non-parametric adjustment for non-compliance in randomised trials. The author mentions a number of caveats but concludes that IV corrections can be valuable in many situations. IV analysis in comparative safety and effectiveness research is reviewed in Instrumental variable methods in comparative safety and effectiveness research (Pharmacoepidemiol Drug Saf 2010;19(6):537-54). A review of IV analysis for observational comparative effectiveness studies suggested that, in the large majority of studies in which IV analysis was applied, at least one of the assumptions could be violated (Potential bias of instrumental variable analyses for observational comparative effectiveness research, Ann Intern Med 2014;161(2):131-8).
A proposal for reporting instrumental variable analyses has been suggested in Commentary: how to report instrumental variable analyses (suggestions welcome) (Epidemiology 2013;24(3):370-4). In particular the type of treatment effect (average treatment effect/homogeneity condition or local average treatment effect/monotonicity condition) and the testing of critical assumptions for valid IV analyses should be reported. In support of these guidelines, the standardized difference has been proposed to falsify the assumption that confounders are not related to the instrumental variable (Quantitative falsification of instrumental variables assumption using balance measures, Epidemiology 2014;25(5):770-2).
The complexity of the issues associated with confounding by indication, channelling and selective prescribing is explored in Evaluating short-term drug effects using a physician-specific prescribing preference as an instrumental variable (Epidemiology 2006;17(3):268-75). A conventional, adjusted multivariable analysis showed a higher risk of gastrointestinal toxicity for selective COX-2-inhibitors than for traditional NSAIDs, which was at odds with results from clinical trials. However, a physician-level instrumental variable approach (a time-varying estimate of a physician’s relative preference for a given drug, where at least two therapeutic alternatives exist) yielded evidence of a protective effect due to COX-2 exposure, particularly for shorter term drug exposures. Despite the potential benefits of physician-level IVs their performance can vary across databases and strongly depends on the definition of IV used as discussed in Evaluating different physician's prescribing preference based instrumental variables in two primary care databases: a study of inhaled long-acting beta2-agonist use and the risk of myocardial infarction (Pharmacoepidemiol Drug Saf 2016;25 Suppl 1:132-41).
Instrumental variable methods in comparative safety and effectiveness research (Pharmacoepidemiol Drug Saf 2010;19(6):537–54) provides practical guidance on IV analyses in pharmacoepidemiology. Instrumental variable methods for causal inference (Stat Med 2014;33(13):2297-340) is a tutorial that includes statistical code for performing IV analysis.
An important limitation of IV analysis is that weak instruments (small association between IV and exposure) lead to decreased statistical efficiency and biased IV estimates, as detailed in Instrumental variables: application and limitations (Epidemiology 2006;17:260-7). For example, in the above-mentioned study on non-selective NSAIDs and COX-2-inhibitors, the confidence intervals for IV estimates were in the order of five times wider than with conventional analysis. Performance of instrumental variable methods in cohort and nested case-control studies: a simulation study (Pharmacoepidemiol Drug Saf 2014;23(2):165-77) demonstrated that a stronger IV-exposure association is needed in nested case-control studies compared to cohort studies in order to achieve the same bias reduction. Increasing the number of controls reduces this bias from IV analysis with relatively weak instruments.
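Why instrument strength matters can be seen from the simplest IV estimator, the Wald (ratio) estimator for a binary instrument: the difference in outcome risk between instrument levels divided by the difference in exposure prevalence between instrument levels. The sketch below is illustrative only; the binary coding, the function name and the data are assumptions, and the estimate is on the risk difference scale.

```python
def wald_iv_estimate(records):
    """Wald estimator for a binary instrument.

    records: iterable of (instrument, exposure, outcome), each coded 0 or 1.
    Returns (iv_estimate, instrument_strength), where instrument_strength is
    the difference in exposure prevalence between instrument levels.
    """
    records = list(records)
    high = [(x, y) for z, x, y in records if z == 1]
    low = [(x, y) for z, x, y in records if z == 0]
    if not high or not low:
        raise ValueError("both instrument levels must be represented")

    def proportion(values):
        return sum(values) / len(values)

    exposure_difference = (proportion([x for x, _ in high])
                           - proportion([x for x, _ in low]))
    outcome_difference = (proportion([y for _, y in high])
                          - proportion([y for _, y in low]))
    if exposure_difference == 0:
        raise ValueError("instrument is unrelated to exposure")
    return outcome_difference / exposure_difference, exposure_difference


# Hypothetical data: 10 patients per instrument level, e.g. physicians who
# prefer the drug (1) versus those who prefer the alternative (0).
records = (
    [(1, 1, 1)] * 4 + [(1, 1, 0)] * 4 + [(1, 0, 0)] * 2
    + [(0, 1, 1)] * 2 + [(0, 1, 0)] * 1 + [(0, 0, 1)] * 1 + [(0, 0, 0)] * 6
)
iv_estimate, instrument_strength = wald_iv_estimate(records)
```

Because the outcome difference is divided by the exposure difference, a weak instrument (a denominator close to zero) inflates both the sampling variability and any small violation of the IV assumptions, which is the limitation described above.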
Selecting on treatment: a pervasive form of bias in instrumental variable analyses (Am J Epidemiol 2015;181(3):191-7) warns against bias in IV analysis by including only a subset of possible treatment options.
Another method proposed to control for unmeasured confounding is the Prior Event Rate Ratio (PERR) adjustment method, in which the effect of exposure is estimated using the ratio of rate ratios (RRs) from periods before and after initiation of a drug exposure, as discussed in Replicated studies of two randomized trials of angiotensin converting enzyme inhibitors: further empiric validation of the ‘prior event rate ratio’ to adjust for unmeasured confounding by indication (Pharmacoepidemiol Drug Saf 2008;17(7):671-685). For example, when a new drug is launched, direct estimation of the drug's effect observed in the period after launch is potentially confounded. Differences in event rates in the period before the launch between future users and future non-users may provide a measure of the amount of confounding present. By dividing the effect estimate from the period after launch by the effect estimate obtained in the period before launch, the confounding in the second period can be adjusted for. This method requires that confounding effects are constant over time, that there is no confounder-by-treatment interaction, and that outcomes are non-lethal events.
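The division described above amounts to very little arithmetic, as the following sketch shows. The function names and the event counts and person-years are hypothetical, and the sketch gives only the point estimate, without a confidence interval.

```python
def rate_ratio(events_exposed, time_exposed, events_unexposed, time_unexposed):
    """Incidence rate ratio from events and person-time in two groups."""
    return (events_exposed / time_exposed) / (events_unexposed / time_unexposed)


def perr_adjusted_rate_ratio(prior_period, post_period):
    """PERR-adjusted estimate: post-initiation RR divided by pre-initiation RR.

    Each period is a tuple of
    (events_exposed, time_exposed, events_unexposed, time_unexposed),
    where 'exposed' in the prior period means future users of the drug.
    """
    return rate_ratio(*post_period) / rate_ratio(*prior_period)


# Hypothetical counts and person-years.
prior_period = (20, 1000.0, 10, 1000.0)  # before initiation: RR = 2.0
post_period = (30, 1000.0, 10, 1000.0)   # after initiation:  RR = 3.0

adjusted_rr = perr_adjusted_rate_ratio(prior_period, post_period)
```

In this example future users already had twice the event rate of future non-users before treatment started, so the crude post-initiation rate ratio of 3.0 is reduced to an adjusted estimate of 1.5.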
Performance of prior event rate ratio adjustment method in pharmacoepidemiology: a simulation study (Pharmacoepidemiol Drug Saf 2015;24(5):468-77) shows that the PERR adjustment method can help to reduce bias from unmeasured confounding in certain situations, but that theoretical justification of its assumptions should be provided.
Methods for dealing with time-dependent confounding (Stat Med. 2013;32(9):1584-618) provides an overview of how time-dependent confounding can be handled in the analysis of a study. It provides an in-depth discussion of marginal structural models and g-computation.
Beyond the G-estimation and the Marginal Structural Model (MSM) described below, traditional and efficient approaches to deal with time dependent variables should be considered in the design of the study, such as nested case control studies with assessment of time varying exposure windows.
G-estimation is a method for estimating the joint effects of time-varying treatments using ideas from instrumental variables methods. G-estimation of Causal Effects: Isolated Systolic Hypertension and Cardiovascular Death in the Framingham Heart Study (Am J Epidemiol 1998;148(4):390-401) demonstrates how the G-estimation procedure allows for appropriate adjustment of the effect of a time-varying exposure in the presence of time-dependent confounders that are themselves influenced by the exposure.
The use of Marginal Structural Models can be an alternative to G-estimation. Marginal Structural Models and Causal Inference in Epidemiology (Epidemiology 2000;11(5):550-60) introduces a class of causal models that allow for improved adjustment for confounding in situations of time-dependent confounding.
MSMs have two major advantages over G-estimation. First, although G-estimation is useful for survival time outcomes, continuous measured outcomes and Poisson count outcomes, logistic G-estimation cannot be conveniently used to estimate the effect of treatment on dichotomous outcomes unless the outcome is rare. Second, MSMs resemble standard models, whereas G-estimation does not (see Marginal Structural Models to Estimate the Causal Effect of Zidovudine on the Survival of HIV-Positive Men. Epidemiology 2000;11(5):561-70).
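MSMs are commonly fitted by weighting each person-visit with an inverse probability of treatment weight and then fitting an otherwise standard model to the weighted data. As a minimal sketch of the weighting step only, the code below computes stabilised weights for one patient. It assumes that the two probabilities at each visit have already been obtained from fitted treatment models, which are not shown; the function name and the values are hypothetical.

```python
def stabilised_weights(visits):
    """Cumulative stabilised inverse probability of treatment weights.

    visits: list of (p_numerator, p_denominator), one tuple per visit, where
      p_numerator   = probability of the treatment actually received given
                      past treatment and baseline covariates, and
      p_denominator = the same probability additionally conditioned on
                      time-dependent confounders.
    Returns the weight applicable at each visit (running product of ratios).
    """
    weights = []
    cumulative = 1.0
    for p_numerator, p_denominator in visits:
        if not (0 < p_numerator <= 1 and 0 < p_denominator <= 1):
            raise ValueError("probabilities must lie in (0, 1]")
        cumulative *= p_numerator / p_denominator
        weights.append(cumulative)
    return weights


# Hypothetical patient with three visits. At the first visit the patient
# received a treatment that was unlikely given the confounder history.
patient_visits = [(0.5, 0.25), (0.5, 0.5), (0.4, 0.8)]
patient_weights = stabilised_weights(patient_visits)
```

Person-visits in which the observed treatment was unlikely given the confounder history receive weights above 1, so that in the weighted population treatment is no longer predicted by the time-dependent confounders. Very large weights usually signal near-violations of positivity and deserve inspection.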
Effect of highly active antiretroviral therapy on time to acquired immunodeficiency syndrome or death using marginal structural models (Am J Epidemiol 2003;158(7):687-94) provides a clear example in which standard Cox analysis failed to detect a clinically meaningful net benefit of treatment because it does not appropriately adjust for time-dependent covariates that are simultaneously confounders and intermediate variables. This net benefit was shown using a marginal structural survival model. In Time-dependent propensity score and collider-stratification bias: an example of beta2-agonist use and the risk of coronary heart disease (Eur J Epidemiol 2013;28(4):291-9), various methods to control for time-dependent confounding are compared in an empirical study on the association between inhaled beta-2-agonists and the risk of coronary heart disease. MSMs resulted in slightly reduced associations compared to standard Cox-regression.
One may test the validity of putative causal associations by using control exposures or outcomes. Well-chosen positive and negative controls help convince the investigator that the data at hand correctly detect existing associations or correctly demonstrate lack of association when none is expected. Positive controls that turn out negative, or negative controls that turn out positive, may signal the presence of a bias, as illustrated in a study demonstrating healthy adherer bias by showing that adherence to statins was associated with decreased risks of biologically implausible outcomes (Statin adherence and risk of accidents: a cautionary tale, Circulation 2009;119(15):2051-7). The general principle, with additional examples, is described in Control Outcomes and Exposures for Improving Internal Validity of Nonrandomized Studies (Health Serv Res 2015;50(5):1432-51).
Selecting drug-event combinations as reliable controls poses a challenge: for negative controls it is difficult to establish proof of absence of an association, and it is still more problematic to select positive controls, because it is desirable to establish not only an association but also an accurate estimate of the effect size. This has led to attempts to establish libraries of controls that can be used to characterise the performance of different observational datasets in detecting various types of association using a number of different study designs. Although such controls may be questioned, according to Evidence of Misclassification of Drug-Event Associations Classified as Gold Standard 'Negative Controls' by the Observational Medical Outcomes Partnership (OMOP) (Drug Saf 2016;39(5):421-32), the approach of calibrating the performance of epidemiological methods prior to performing a study holds the promise of providing a trustworthy framework for interpretation of the results, as shown by Interpreting observational studies: Why empirical calibration is needed to correct p-values (Stat Med 2014;33(2):209-18), Robust empirical calibration of p-values using observational data (Stat Med 2016;35(22):3883-8) and Empirical confidence interval calibration for population-level effect estimation studies in observational healthcare data (Proc Natl Acad Sci USA 2018;115(11):2571-7).
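The idea behind calibrating p-values against negative controls can be conveyed by a deliberately simplified sketch. It assumes that the log effect estimates obtained for the negative controls follow a normal ‘empirical null’ distribution whose mean and standard deviation are taken directly from those estimates, ignoring their individual sampling errors. The cited papers describe a more careful estimation procedure; the function names and numbers below are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def fit_empirical_null(negative_control_log_rrs):
    """Mean and SD of the log estimates observed for negative controls.

    Simplification: the sampling error of each estimate is ignored.
    """
    return mean(negative_control_log_rrs), stdev(negative_control_log_rrs)


def two_sided_p(z_value):
    """Two-sided p-value for a standard normal test statistic."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(z_value)))


def traditional_p(log_rr, standard_error):
    """Usual p-value, which assumes the estimate is unbiased under the null."""
    return two_sided_p(log_rr / standard_error)


def calibrated_p(log_rr, standard_error, null_mean, null_sd):
    """p-value relative to the empirical null, allowing for systematic error."""
    total_sd = math.sqrt(null_sd ** 2 + standard_error ** 2)
    return two_sided_p((log_rr - null_mean) / total_sd)


# Hypothetical log rate ratios estimated for three negative controls,
# and a hypothetical estimate for the drug-event pair of interest.
null_mean, null_sd = fit_empirical_null([0.0, 0.2, 0.4])
p_traditional = traditional_p(0.2, 0.1)
p_calibrated = calibrated_p(0.2, 0.1, null_mean, null_sd)
```

In this toy example an estimate that looks statistically significant by the traditional p-value is no different from what the negative controls themselves produce, so its calibrated p-value is close to 1. In practice far more than three negative controls would be needed to characterise the empirical null.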
Triangulation is not a separate methodological approach, but rather a framework, formally described in Triangulation in aetiological epidemiology (Int J Epidemiol 2016;45(6):1866-86). Triangulation is defined as “the practice of obtaining more reliable answers to research questions through integrating results from several different approaches, where each approach has different key sources of potential bias that are unrelated to each other.” In some ways, the paper formalises approaches already used in many nonrandomised pharmacoepidemiologic studies, including control exposures and outcomes, sensitivity analyses, and comparison of results from different populations and different study designs, all within the same study and while explicitly specifying the direction of bias in each approach. Triangulation was used (without using the explicit term) in Associations of maternal antidepressant use during the first trimester of pregnancy with preterm birth, small for gestational age, autism spectrum disorder, and attention-deficit/hyperactivity disorder in offspring (JAMA 2017;317(15):1553-62), whereby, within the same study, the authors used negative controls (paternal exposure to antidepressants) and assessed the association using a different study design and study population (sibling design).