

ENCePP Guide on Methodological Standards in Pharmacoepidemiology


5.1.4. Validation


In healthcare databases, the correct assessment of drug exposure, outcomes and covariates is crucial to the validity of research. Validation of the electronic information on drug exposure, outcomes and covariates is therefore essential for database studies. The definitions used should be documented in the technical handbook of every database, ideally together with estimates of their sensitivity, specificity, and positive and negative predictive values. Examples can be found in Validity of diagnostic coding within the General Practice Research Database: a systematic review (Br J Gen Pract 2010;60:e128-36), in the book Pharmacoepidemiology (B. Strom, S.E. Kimmel, S. Hennessy. 5th Edition, Wiley, 2012) and in Mini-Sentinel's systematic reviews of validated methods for identifying health outcomes using administrative and claims data: methods and lessons learned (Pharmacoepidemiol Drug Saf. 2012 Jan;21 Suppl 1:82-9).
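To illustrate the four measures of validity mentioned above, the following minimal Python sketch computes them from a 2×2 table that cross-classifies the result of a database algorithm against a reference standard such as chart review. The class name, function name and counts are hypothetical and do not come from any of the cited studies.

```python
from dataclasses import dataclass


@dataclass
class ValidationTable:
    """Hypothetical 2x2 counts: algorithm result vs. reference standard.

    tp: algorithm positive, reference positive
    fp: algorithm positive, reference negative
    fn: algorithm negative, reference positive
    tn: algorithm negative, reference negative
    """
    tp: int
    fp: int
    fn: int
    tn: int


def validity_measures(t: ValidationTable) -> dict:
    """Return sensitivity, specificity, PPV and NPV as proportions."""
    return {
        "sensitivity": t.tp / (t.tp + t.fn),  # true cases captured by the algorithm
        "specificity": t.tn / (t.tn + t.fp),  # true non-cases correctly left out
        "ppv": t.tp / (t.tp + t.fp),          # algorithm positives that are true cases
        "npv": t.tn / (t.tn + t.fn),          # algorithm negatives that are true non-cases
    }


# Hypothetical counts, for illustration only.
table = ValidationTable(tp=90, fp=10, fn=30, tn=870)
for name, value in validity_measures(table).items():
    print(f"{name}: {value:.3f}")
```

Sensitivity and NPV can only be estimated when a sample of algorithm-negative records is also checked against the reference standard; a validation that reviews only algorithm-positive records yields the PPV alone.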


The completeness and validity of all variables used as exposures, outcomes, potential confounders and effect modifiers should be considered. Assumptions built into case definitions or other algorithms may need to be confirmed. For databases routinely used in research, key variables may already have been validated and documented by the data provider or by other researchers. Any extrapolation from a previous validation should, however, take into account differences in the variables or analyses, as well as subsequent changes in health care, procedures and coding. A full understanding of both the health care system and the procedures that generated the data is required. This is particularly important for studies that rely on the accurate timing of exposure, outcome and covariate recording, such as the self-controlled case series. External validation against chart review or physician/patient questionnaires is possible for some data sources. However, questionnaires cannot always be considered a 'gold standard'.
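One reason why a previous validation cannot simply be carried over to another database or period is that predictive values depend directly on the prevalence of the condition, whereas sensitivity and specificity are usually treated as properties of the algorithm itself. The minimal sketch below applies Bayes' theorem to hypothetical values to show how the PPV of an algorithm changes with prevalence. It assumes that sensitivity and specificity are themselves transportable, which may not hold if coding practices differ between settings.

```python
def ppv_from_prevalence(sensitivity: float, specificity: float, prevalence: float) -> float:
    """PPV via Bayes' theorem: P(true case | algorithm positive).

    Assumes sensitivity and specificity are the same in the new population
    (a hypothetical simplification, not a property of real algorithms).
    """
    true_pos = sensitivity * prevalence                # expected true positives
    false_pos = (1 - specificity) * (1 - prevalence)   # expected false positives
    return true_pos / (true_pos + false_pos)


# Hypothetical algorithm with sensitivity 0.80 and specificity 0.99,
# applied in populations with decreasing prevalence of the outcome.
for prevalence in (0.10, 0.01, 0.001):
    ppv = ppv_from_prevalence(0.80, 0.99, prevalence)
    print(f"prevalence {prevalence:.3f}: PPV = {ppv:.2f}")
```

With these inputs the PPV falls from about 0.90 at a prevalence of 10% to about 0.45 at 1% and about 0.07 at 0.1%, even though the algorithm itself is unchanged.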


Review of records against a case definition by experts may also be possible. Although false positives are more easily measured than false negatives, the specificity of an outcome definition is more important than its sensitivity when considering bias in relative risk estimates (see A review of uses of health care utilization databases for epidemiologic research on therapeutics. J Clin Epidemiol 2005;58(4):323-37). Alternatively, internal logic checks can test the completeness and accuracy of variables. For example, one can investigate whether an outcome was followed (or preceded) by appropriate exposures or procedures.
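The point about specificity can be illustrated with a small calculation. Assuming non-differential outcome misclassification (the same sensitivity Se and specificity Sp in exposed and unexposed) and hypothetical true risks r, the recorded risk in each group is Se·r + (1 − Sp)·(1 − r). In this sketch, a loss of sensitivity alone leaves the risk ratio unchanged, whereas even a small loss of specificity biases it towards the null when the outcome is rare.

```python
def observed_risk(true_risk: float, sensitivity: float, specificity: float) -> float:
    """Risk of a *recorded* outcome given the true risk, assuming the same
    sensitivity and specificity in exposed and unexposed (non-differential)."""
    return sensitivity * true_risk + (1 - specificity) * (1 - true_risk)


def observed_risk_ratio(risk_exposed: float, risk_unexposed: float,
                        sensitivity: float, specificity: float) -> float:
    """Risk ratio that would be estimated from the misclassified outcome."""
    return (observed_risk(risk_exposed, sensitivity, specificity)
            / observed_risk(risk_unexposed, sensitivity, specificity))


# Hypothetical true risks for a rare outcome: true risk ratio = 2.0.
r1, r0 = 0.002, 0.001
rr_low_sens = observed_risk_ratio(r1, r0, sensitivity=0.6, specificity=1.0)
rr_low_spec = observed_risk_ratio(r1, r0, sensitivity=1.0, specificity=0.999)
print(f"sensitivity 0.6, specificity 1.0:   RR = {rr_low_sens:.2f}")
print(f"sensitivity 1.0, specificity 0.999: RR = {rr_low_spec:.2f}")
```

With these inputs the first risk ratio stays at 2.00 while the second drops to about 1.50. The result holds for the risk ratio under the stated assumptions; differential misclassification can bias estimates in either direction.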


Concordance between datasets, such as a comparison of cancer or death registries with clinical or administrative records, can be used to validate individual records or overall incidence or prevalence rates.
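A minimal sketch of both levels of concordance checking is shown below. It assumes that the two sources cover the same population and share a common (for example, pseudonymised) patient identifier, and that each source supplies the set of patients recorded with the condition. The function name, identifiers and population size are hypothetical.

```python
def concordance(ids_source_a: set, ids_source_b: set, population_size: int) -> dict:
    """Compare case ascertainment in two data sources covering the same population.

    Record level: cases found in both sources or in only one of them.
    Aggregate level: prevalence estimated from each source separately.
    """
    both = ids_source_a & ids_source_b
    return {
        "in_both": len(both),
        "only_in_a": len(ids_source_a - ids_source_b),
        "only_in_b": len(ids_source_b - ids_source_a),
        "prevalence_a": len(ids_source_a) / population_size,
        "prevalence_b": len(ids_source_b) / population_size,
    }


# Hypothetical example: cancer cases in clinical records (A) vs. a cancer registry (B).
clinical = {"p01", "p02", "p03", "p04", "p07"}
registry = {"p01", "p02", "p03", "p05"}
print(concordance(clinical, registry, population_size=1000))
```

Similar overall prevalences in the two sources can conceal poor agreement at the level of individual records, which is why both levels are worth examining.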

Linkage validation can also be used, whereby another database is linked to the current one and serves as the reference for validation (Using linked electronic data to validate algorithms for health outcomes in administrative databases. J Comp Eff Res. 2015 Aug;4(4):359-66).
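As a sketch of linkage validation, assume that each record carries a linkage key (for example, a pseudonymised identifier) and that the linked database is treated as the reference standard for the outcome. The code restricts the comparison to individuals present in both sources and then estimates the sensitivity and PPV of the algorithm. The function name and record structure are hypothetical, and real linkage is often probabilistic rather than an exact key match.

```python
def linkage_validation(algorithm_flags: dict, reference_flags: dict) -> dict:
    """Validate an outcome algorithm against a linked reference database.

    Both arguments map a linkage key to True/False (outcome present/absent).
    Only keys present in both sources (i.e. successfully linked) are compared.
    """
    linked = algorithm_flags.keys() & reference_flags.keys()
    tp = sum(algorithm_flags[k] and reference_flags[k] for k in linked)
    fp = sum(algorithm_flags[k] and not reference_flags[k] for k in linked)
    fn = sum(reference_flags[k] and not algorithm_flags[k] for k in linked)
    return {
        "n_linked": len(linked),
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "ppv": tp / (tp + fp) if tp + fp else float("nan"),
    }


# Hypothetical claims-based algorithm vs. linked hospital discharge data.
claims = {"a": True, "b": True, "c": False, "d": True, "e": False}
hospital = {"a": True, "b": False, "c": True, "d": True, "f": True}
print(linkage_validation(claims, hospital))
```

Individuals who cannot be linked drop out of the comparison, so the estimates describe the linked subset only; the number of linked records is therefore returned alongside them.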




Individual Chapters:


1. Introduction

2. Formulating the research question

3. Development of the study protocol

4. Approaches to data collection

4.1. Primary data collection

4.1.1. Surveys

4.1.2. Randomised clinical trials

4.2. Secondary data collection

4.3. Patient registries

4.3.1. Definition

4.3.2. Conceptual differences between a registry and a study

4.3.3. Methodological guidance

4.3.4. Registries which capture special populations

4.3.5. Disease registries in regulatory practice and health technology assessment

4.4. Spontaneous report database

4.5. Social media and electronic devices

4.6. Research networks

4.6.1. General considerations

4.6.2. Models of studies using multiple data sources

4.6.3. Challenges of different models

5. Study design and methods

5.1. Definition and validation of drug exposure, outcomes and covariates

5.1.1. Assessment of exposure

5.1.2. Assessment of outcomes

5.1.3. Assessment of covariates

5.1.4. Validation

5.2. Bias and confounding

5.2.1. Selection bias

5.2.2. Information bias

5.2.3. Confounding

5.3. Methods to handle bias and confounding

5.3.1. New-user designs

5.3.2. Case-only designs

5.3.3. Disease risk scores

5.3.4. Propensity scores

5.3.5. Instrumental variables

5.3.6. Prior event rate ratios

5.3.7. Handling time-dependent confounding in the analysis

5.4. Effect measure modification and interaction

5.5. Ecological analyses and case-population studies

5.6. Pragmatic trials and large simple trials

5.6.1. Pragmatic trials

5.6.2. Large simple trials

5.6.3. Randomised database studies

5.7. Systematic reviews and meta-analysis

5.8. Signal detection methodology and application

6. The statistical analysis plan

6.1. General considerations

6.2. Statistical analysis plan structure

6.3. Handling of missing data

7. Quality management

8. Dissemination and reporting

8.1. Principles of communication

8.2. Communication of study results

9. Data protection and ethical aspects

9.1. Patient and data protection

9.2. Scientific integrity and ethical conduct

10. Specific topics

10.1. Comparative effectiveness research

10.1.1. Introduction

10.1.2. General aspects

10.1.3. Prominent issues in CER

10.2. Vaccine safety and effectiveness

10.2.1. Vaccine safety

10.2.2. Vaccine effectiveness

10.3. Design and analysis of pharmacogenetic studies

10.3.1. Introduction

10.3.2. Identification of genetic variants

10.3.3. Study designs

10.3.4. Data collection

10.3.5. Data analysis

10.3.6. Reporting

10.3.7. Clinical practice guidelines

10.3.8. Resources

Annex 1. Guidance on conducting systematic reviews and meta-analyses of completed comparative pharmacoepidemiological studies of safety outcomes