In healthcare databases, the validity of research depends on the correct assessment of drug exposure, outcomes and covariates. Electronic information on each of these should therefore be validated, and the definitions used should be included in the technical handbook of every database, ideally with estimates of sensitivity, specificity, and positive and negative predictive values. Examples can be found in Validity of diagnostic coding within the General Practice Research Database: a systematic review (Br J Gen Pract 2010;60:e128-36), the book Pharmacoepidemiology (B. Strom, S.E. Kimmel, S. Hennessy. 5th Edition, Wiley, 2012) and Mini-Sentinel's systematic reviews of validated methods for identifying health outcomes using administrative and claims data: methods and lessons learned (Pharmacoepidemiol Drug Saf. 2012 Jan;21 Suppl 1:82-9).
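The four measures named above can be derived from a 2x2 table that cross-classifies the database algorithm against a reference standard such as chart review. The following is a minimal sketch; the names `ValidationCounts` and `validity_measures` and the counts in the usage line are illustrative assumptions, not taken from any database handbook.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ValidationCounts:
    """2x2 table of database algorithm vs. reference standard (hypothetical layout)."""
    true_pos: int   # flagged by the algorithm and confirmed by the reference
    false_pos: int  # flagged by the algorithm but not confirmed
    false_neg: int  # missed by the algorithm but present in the reference
    true_neg: int   # absent in both sources


def validity_measures(c: ValidationCounts) -> dict:
    """Return sensitivity, specificity, PPV and NPV as proportions."""
    return {
        "sensitivity": c.true_pos / (c.true_pos + c.false_neg),
        "specificity": c.true_neg / (c.true_neg + c.false_pos),
        "ppv": c.true_pos / (c.true_pos + c.false_pos),
        "npv": c.true_neg / (c.true_neg + c.false_neg),
    }


# Illustrative counts only; they do not come from a real validation study.
example = validity_measures(ValidationCounts(true_pos=80, false_pos=20,
                                             false_neg=10, true_neg=890))
```

Note that the predictive values depend on the prevalence of the outcome in the validation sample, so they may not transfer to a population with a different prevalence, whereas sensitivity and specificity are less affected by it.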
Completeness and validity of all variables used as exposures, outcomes, potential confounders and effect modifiers should be considered. Assumptions included in case definitions or other algorithms may need to be confirmed. For databases routinely used in research, documented validation of key variables may have been done previously by the data provider or other researchers. Any extrapolation of previous validation should, however, consider the effect of any differences in variables or analyses and of subsequent changes to health care, procedures and coding. A full understanding of both the health care system and the procedures that generated the data is required. This is particularly important for studies that rely on accurate timing of exposure, outcome and covariate recording, such as the self-controlled case series. External validation against chart review or physician/patient questionnaires is possible with some resources. However, questionnaires cannot always be considered a ‘gold standard’.
Review of records against a case definition by experts may also be possible. While false positives are more easily measured than false negatives, specificity of an outcome is more important than sensitivity when considering bias in relative risk estimates (see A review of uses of health care utilization databases for epidemiologic research on therapeutics. J Clin Epidemiol 2005;58(4):323-37). Alternatively, internal logic checks can test for completeness and accuracy of variables. For example, one can investigate whether an outcome was followed by (or preceded by) appropriate exposure or procedures.
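The point about specificity and relative risk estimates can be illustrated with a short calculation. The sketch below assumes non-differential outcome misclassification, that is, the same sensitivity and specificity in exposed and unexposed patients; the function names and the risks used are hypothetical and chosen only for illustration.

```python
def observed_risk(true_risk: float, sensitivity: float, specificity: float) -> float:
    """Risk as recorded in the database: detected true cases plus false positives."""
    return sensitivity * true_risk + (1.0 - specificity) * (1.0 - true_risk)


def observed_relative_risk(risk_exposed: float, risk_unexposed: float,
                           sensitivity: float, specificity: float) -> float:
    """Relative risk computed from misclassified outcomes.

    Assumes the same sensitivity and specificity in both exposure groups
    (non-differential misclassification).
    """
    return (observed_risk(risk_exposed, sensitivity, specificity)
            / observed_risk(risk_unexposed, sensitivity, specificity))


# Hypothetical true risks: 2% in exposed, 1% in unexposed (true relative risk 2.0).
rr_low_sensitivity = observed_relative_risk(0.02, 0.01, sensitivity=0.5, specificity=1.0)
rr_low_specificity = observed_relative_risk(0.02, 0.01, sensitivity=1.0, specificity=0.99)
```

Under these assumptions, perfect specificity leaves the relative risk at 2.0 even when half of the true cases are missed, because the missed fraction cancels in the ratio. A specificity of 0.99 with perfect sensitivity pulls the estimate down to about 1.5, because false positives are added in similar absolute numbers to both groups.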
Concordance between datasets, such as a comparison of cancer or death registries with clinical or administrative records, can validate individual records or overall incidence or prevalence rates.
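One way to summarise record-level concordance between two sources is raw agreement together with Cohen's kappa, which corrects for agreement expected by chance. This is a sketch of one possible measure, not a method prescribed by the text; the function names and the patient identifiers in the usage lines are hypothetical.

```python
def cohen_kappa(both: int, only_a: int, only_b: int, neither: int) -> float:
    """Chance-corrected agreement between two binary classifications."""
    n = both + only_a + only_b + neither
    observed = (both + neither) / n
    p_a = (both + only_a) / n  # proportion flagged in source A
    p_b = (both + only_b) / n  # proportion flagged in source B
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


def concordance(ids_a: set, ids_b: set, population: set) -> dict:
    """Compare the patients flagged in two sources within a shared population."""
    a = ids_a & population
    b = ids_b & population
    both = len(a & b)
    only_a = len(a - b)
    only_b = len(b - a)
    neither = len(population) - both - only_a - only_b
    return {
        "agreement": (both + neither) / len(population),
        "kappa": cohen_kappa(both, only_a, only_b, neither),
    }


# Hypothetical identifiers: 100 patients, 10 flagged in each source, 8 in common.
population = set(range(1, 101))
registry_cases = set(range(1, 11))
record_cases = set(range(1, 9)) | {11, 12}
result = concordance(registry_cases, record_cases, population)
```

Kappa treats neither source as the reference standard, so it describes agreement only; where one source is accepted as the reference, sensitivity and predictive values are more informative.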
Linkage validation can also be used, in which another database is linked to the current one in order to validate it (Using linked electronic data to validate algorithms for health outcomes in administrative databases. J Comp Eff Res. 2015 Aug;4(4):359-66).