5.4.1. Misclassification
The validity of pharmacoepidemiological studies depends on the correct assessment of exposure, outcomes and confounders. Measurement error, i.e., misclassification of binary or categorical variables or mismeasurement of continuous variables, results in information bias. The effect of misclassification in the presence of covariates (Am J Epidemiol. 1980;112(4):564–9) shows that misclassification of a confounder results in incomplete control for confounding.
Misclassification of exposure is non-differential if the assessment of exposure does not depend on the true outcome status and misclassification of outcome is non-differential if the assessment of the outcome does not depend on exposure status. Misclassification of exposure and outcome is considered dependent if the factors that predict misclassification of exposure are expected to also predict misclassification of outcome.
Misconceptions About Misclassification: Non-Differential Misclassification Does Not Always Bias Results Toward the Null (Am J Epidemiol. 2022; kwac03) emphasises that bias towards the null is not always “conservative”, as it might mask important safety signals, and discusses seven exceptions to the epidemiologic ‘mantra’ that non-differential misclassification biases estimates towards the null. One important exception is outcome measurement with perfect specificity, which results in unbiased estimates of the risk ratio provided that sensitivity is non-differential.
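The perfect-specificity exception can be illustrated with a short derivation. This is a sketch under the assumption that outcome sensitivity is the same in exposed and unexposed groups; the symbols R_1 and R_0 (true risks in the exposed and unexposed), R_1^* and R_0^* (observed risks), Se and Sp are notation introduced here for illustration and are not taken from the cited article.

```latex
\begin{align*}
R_i^{*} &= Se \cdot R_i + (1 - Sp)(1 - R_i), \qquad i \in \{0, 1\} \\
Sp = 1 \;\Rightarrow\; R_i^{*} &= Se \cdot R_i \\
RR^{*} &= \frac{R_1^{*}}{R_0^{*}} = \frac{Se \cdot R_1}{Se \cdot R_0} = \frac{R_1}{R_0} = RR \\
RD^{*} &= R_1^{*} - R_0^{*} = Se \,(R_1 - R_0) = Se \cdot RD
\end{align*}
```

Under these assumptions the sensitivity cancels in the ratio, so the risk ratio is unbiased, whereas the risk difference is still attenuated by the factor Se.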
The influence of misclassification on the point estimate should be quantified or, if this is not possible, its impact on the interpretation of the results should be discussed. FDA’s Quantitative Bias Analysis Methodology Development: Sequential Bias Adjustment for Outcome Misclassification (2017) proposes a method of adjustment when validation of the variable is complete. Use of the Positive Predictive Value to Correct for Disease Misclassification in Epidemiologic Studies (Am J Epidemiol. 1993;138(11):1007–15) proposes a method based on estimates of the positive predictive value, which requires validation of only a sample of patients with the outcome and assumes that sensitivity is non-differential. This methodology has been implemented in a web application (Outcome misclassification: Impact, usual practice in pharmacoepidemiological database studies and an online aid to correct biased estimates of risk ratio or cumulative incidence; Pharmacoepidemiol Drug Saf. 2020;29(11):1450-5) that allows correction of risk ratio or cumulative incidence point estimates and confidence intervals for bias due to outcome misclassification. The article Basic methods for sensitivity analysis of biases (Int J Epidemiol. 1996;25(6):1107-16) provides examples of methods for examining the sensitivity of study results to biases, with a focus on methods that can be implemented without computer programming. Good practices for quantitative bias analysis (Int J Epidemiol. 2014;43(6):1969-85) advocates explicit and quantitative assessment of misclassification bias, including guidance on which biases to assess in each situation, what level of sophistication to use, and how to present the results.
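As a minimal sketch of a positive-predictive-value-based correction of the kind described above, the following Python function rescales an observed risk ratio by the ratio of exposure-specific positive predictive values. It assumes that sensitivity is non-differential with respect to exposure, corrects the point estimate only (no confidence interval), and uses hypothetical function names and input figures chosen purely for illustration; it is not the implementation used in the cited web application.

```python
def ppv_corrected_risk_ratio(cases_exposed, n_exposed,
                             cases_unexposed, n_unexposed,
                             ppv_exposed, ppv_unexposed):
    """Return (observed RR, PPV-corrected RR).

    Assumes outcome sensitivity is the same in exposed and unexposed
    groups, so that it cancels out of the corrected ratio.
    """
    for ppv in (ppv_exposed, ppv_unexposed):
        if not (0.0 < ppv <= 1.0):
            raise ValueError("PPV must lie in the interval (0, 1]")
    if min(cases_exposed, cases_unexposed, n_exposed, n_unexposed) <= 0:
        raise ValueError("counts must be positive")

    # Risk ratio based on algorithm-identified (possibly false-positive) cases
    observed_rr = (cases_exposed / n_exposed) / (cases_unexposed / n_unexposed)

    # Each observed risk is multiplied by its PPV to remove false positives;
    # the unknown (non-differential) sensitivity cancels in the ratio.
    corrected_rr = observed_rr * ppv_exposed / ppv_unexposed
    return observed_rr, corrected_rr


if __name__ == "__main__":
    # Hypothetical figures, for illustration only
    obs, corr = ppv_corrected_risk_ratio(
        cases_exposed=120, n_exposed=10_000,
        cases_unexposed=60, n_unexposed=10_000,
        ppv_exposed=0.90, ppv_unexposed=0.75,
    )
    print(f"observed RR = {obs:.2f}, PPV-corrected RR = {corr:.2f}")
```

In this hypothetical example the positive predictive value is higher among the exposed, so the corrected risk ratio is further from the null than the observed one; the direction of the correction depends entirely on which group has the higher positive predictive value.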
5.4.2. Validation
Common misconceptions about validation studies (Int J Epidemiol. 2020;49(4):1392-6) discusses important aspects of the design of validation studies. It stresses the importance of stratification on key variables (e.g., exposure in outcome validation) and shows that by sampling conditionally on the imperfectly classified measure (e.g., cases as identified by the study algorithm), only the positive and negative predictive values can be validly estimated.
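The following Python sketch illustrates how positive predictive values might be estimated separately by exposure stratum from a validation sample drawn among algorithm-identified cases. The function names, the dictionary layout and the counts are hypothetical, and the Wilson score interval is one common choice of confidence interval rather than a method prescribed by the cited article.

```python
from math import sqrt


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives about 95%)."""
    if n <= 0 or not (0 <= successes <= n):
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1.0 + z ** 2 / n
    centre = (p + z ** 2 / (2.0 * n)) / denom
    half_width = z * sqrt(p * (1.0 - p) / n + z ** 2 / (4.0 * n ** 2)) / denom
    return centre - half_width, centre + half_width


def ppv_by_stratum(validation_counts):
    """Estimate the PPV with a confidence interval in each stratum.

    validation_counts maps a stratum label to a pair
    (confirmed_cases, sampled_algorithm_positive_cases).
    """
    results = {}
    for stratum, (confirmed, sampled) in validation_counts.items():
        lower, upper = wilson_interval(confirmed, sampled)
        results[stratum] = {
            "ppv": confirmed / sampled,
            "ci_lower": lower,
            "ci_upper": upper,
        }
    return results


if __name__ == "__main__":
    # Hypothetical chart-review results among algorithm-identified cases
    counts = {"exposed": (45, 50), "unexposed": (38, 50)}
    for stratum, est in ppv_by_stratum(counts).items():
        print(f"{stratum}: PPV = {est['ppv']:.2f} "
              f"({est['ci_lower']:.2f} to {est['ci_upper']:.2f})")
```

Because the sample is drawn conditionally on the algorithm classifying a patient as a case, these counts support estimation of the positive predictive value only; sensitivity and specificity cannot be validly derived from them.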
Most database studies will be subject to outcome misclassification to some degree. Case adjudication against an established case definition or a reference standard can remove false positives, and false negatives can be mitigated if a broad search algorithm is used. Validity of diagnostic coding within the General Practice Research Database: a systematic review (Br J Gen Pract. 2010;60:e128-36), the book Pharmacoepidemiology (B. Strom, S.E. Kimmel, S. Hennessy. 6th Edition, Wiley, 2012) and Mini-Sentinel's systematic reviews of validated methods for identifying health outcomes using administrative and claims data: methods and lessons learned (Pharmacoepidemiol Drug Saf. 2012;supp1:82-9) provide examples of validation. External validation against chart review or physician/patient questionnaire is possible in some instances, but the questionnaires cannot always be considered a ‘gold standard’. Misclassification of exposure should also be measured based on validation, as feasible.
Linkage validation can be used when another database is available for validation through linkage methods (see Using linked electronic data to validate algorithms for health outcomes in administrative databases, J Comp Eff Res. 2015;4:359-66). In some situations, there is no access to a resource that can provide data for comparison. In this case, indirect validation may be an option, as explained in the textbook Applying quantitative bias analysis to epidemiologic data (Lash T, Fox MP, Fink AK. Springer-Verlag, New York, 2009).
Structural validation of the database with internal logic checks should also be performed to verify the completeness and accuracy of variables. For example, one can investigate whether an outcome was followed by (or proceeded from) appropriate exposure or procedures or if a certain variable has values within a known reasonable range.
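Internal logic checks of this kind could be sketched as follows in Python. The record fields (`age`, `outcome_date`, `procedure_date`), the plausible age range and the 30-day window are all hypothetical choices made for illustration; appropriate checks depend on the database and the study question.

```python
from datetime import date, timedelta


def logic_check(record, age_range=(0, 120), max_days_to_procedure=30):
    """Return a list of logic-check problems found in one patient record.

    The record is a dict with hypothetical keys 'age', 'outcome_date'
    and 'procedure_date' (dates may be None when not recorded).
    """
    problems = []

    # Range check: the variable must lie within a known reasonable range
    age = record.get("age")
    if age is None or not (age_range[0] <= age <= age_range[1]):
        problems.append("age missing or outside plausible range")

    # Sequence check: a recorded outcome is expected to be followed by
    # an appropriate procedure within a defined time window
    outcome_date = record.get("outcome_date")
    procedure_date = record.get("procedure_date")
    if outcome_date is not None:
        if procedure_date is None:
            problems.append("outcome without a recorded procedure")
        else:
            delay = procedure_date - outcome_date
            if not (timedelta(0) <= delay <= timedelta(days=max_days_to_procedure)):
                problems.append("procedure outside expected window after outcome")

    return problems


if __name__ == "__main__":
    record = {
        "age": 134,
        "outcome_date": date(2020, 3, 1),
        "procedure_date": date(2020, 6, 15),
    }
    print(logic_check(record))
```

Records flagged by such checks are candidates for review rather than automatic exclusion, since a flag may reflect a coding convention and not a true error.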
While the positive predictive value is more easily measured than the negative predictive value, a low specificity is more damaging than a low sensitivity when considering bias in relative risk estimates (see A review of uses of health care utilization databases for epidemiologic research on therapeutics; J Clin Epidemiol. 2005;58(4):323-37).
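The relative impact of low specificity and low sensitivity can be illustrated numerically. The Python sketch below computes the risk ratio that would be expected under non-differential outcome misclassification for two hypothetical scenarios in which the values of sensitivity and specificity are swapped; the true risks (2% and 1%) and the function names are assumptions chosen for illustration, not values from the cited review.

```python
def observed_risk(true_risk, sensitivity, specificity):
    """Expected observed risk: true positives plus false positives."""
    return sensitivity * true_risk + (1.0 - specificity) * (1.0 - true_risk)


def observed_risk_ratio(risk_exposed, risk_unexposed, sensitivity, specificity):
    """Expected risk ratio when the same sensitivity and specificity
    apply to exposed and unexposed (non-differential misclassification)."""
    return (observed_risk(risk_exposed, sensitivity, specificity)
            / observed_risk(risk_unexposed, sensitivity, specificity))


if __name__ == "__main__":
    # Hypothetical true risks, giving a true risk ratio of 2.0
    risk_exposed, risk_unexposed = 0.02, 0.01

    scenarios = {
        "low sensitivity (Se=0.60, Sp=0.99)": (0.60, 0.99),
        "low specificity (Se=0.99, Sp=0.60)": (0.99, 0.60),
    }
    for label, (se, sp) in scenarios.items():
        rr = observed_risk_ratio(risk_exposed, risk_unexposed, se, sp)
        print(f"{label}: expected observed RR = {rr:.2f}")
```

With these hypothetical inputs, the expected observed risk ratio is about 1.37 in the low-sensitivity scenario and about 1.01 in the low-specificity scenario. When the outcome is rare, false positives arising from the large number of non-cases swamp the true cases in both exposure groups, which is why imperfect specificity attenuates the risk ratio so strongly.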
For databases routinely used in research, documented validation of key variables may have been done previously by the data provider or other researchers. Any extrapolation of a previous validation study should, however, consider the effect of any differences in prevalence and in inclusion and exclusion criteria, the distribution and analysis of risk factors, as well as subsequent changes to health care, procedures and coding, as illustrated in Basic Methods for Sensitivity Analysis of Biases (Int J Epidemiol. 1996;25(6):1107-16).