ENCePP Guide on Methodological Standards in Pharmacoepidemiology


Section 3.2. Secondary use of data

The use of existing electronic patient healthcare data has had a marked impact on pharmacoepidemiology research. Over the last two decades, key data resources, expertise and methodology have been developed that have allowed landmark studies to be conducted in the field. The two main types of databases are electronic medical records and linked administrative health records. Examples are, respectively, the CPRD in the UK and the national or regional registries and databases in the Nordic countries, Italy, the Netherlands and other countries. The ENCePP Inventory of Databases contains key information on the databases registered in ENCePP. Section 3.3 of this Guide also describes databases and healthcare records used by research networks.
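Record linkage, as used for the administrative databases mentioned above, joins records from separate registers that refer to the same person. The sketch below shows only the simplest case, a deterministic linkage on a shared pseudonymised identifier. The field names (`pid`, `atc`, `icd10`), the records and the helper `link_by_id` are hypothetical; real linkage procedures, including probabilistic linkage and the handling of identifiers, are defined by each data holder.

```python
from collections import defaultdict

# Hypothetical records from two separate registers. 'pid' stands in for a
# pseudonymised personal identifier assumed to be shared by both sources.
dispensings = [
    {"pid": "A01", "date": "2020-01-10", "atc": "N02BE01"},
    {"pid": "A02", "date": "2020-02-03", "atc": "C10AA01"},
    {"pid": "A01", "date": "2020-03-15", "atc": "N02BE01"},
]
hospitalisations = [
    {"pid": "A01", "date": "2020-04-01", "icd10": "K92.2"},
    {"pid": "A03", "date": "2020-05-20", "icd10": "I21.9"},
]


def link_by_id(left, right, key="pid"):
    """Deterministic linkage: pair each left record with every right record
    that has the same identifier value (an inner join)."""
    index = defaultdict(list)
    for rec in right:
        index[rec[key]].append(rec)
    linked = []
    for rec in left:
        for match in index.get(rec[key], []):
            linked.append((rec, match))
    return linked


pairs = link_by_id(dispensings, hospitalisations)
linked_ids = sorted({d["pid"] for d, _ in pairs})
print(len(pairs), linked_ids)
```

An inner join silently drops persons found in only one register (here A02 and A03); whether that is acceptable depends on the study's eligibility and follow-up rules.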


A comprehensive description of the main features and applications of frequently used databases for pharmacoepidemiology research in the United States and in Europe appears in the book Pharmacoepidemiology (B. Strom, S.E. Kimmel, S. Hennessy. 5th Edition, Wiley, 2012, Chapters 11-18). The limitations of electronic healthcare databases are detailed in A review of uses of healthcare utilisation databases for epidemiologic research on therapeutics (J Clin Epidemiol 2005;58:323-37).


The primary purpose of the ISPE-endorsed Guidelines for Good Database Selection and use in Pharmacoepidemiology Research (Pharmacoepidemiol Drug Saf 2012;21:1-10) is to assist in the selection and use of data resources in pharmacoepidemiology by highlighting potential limitations and recommending tested procedures. Although its title and objective refer to data resources or databases, it mainly addresses databases of routinely collected healthcare information and does not cover spontaneous report databases. It is a simple, well-structured guideline that helps investigators select databases for their research and helps database custodians describe their databases in a useful manner. One section is dedicated to multi-site studies. References to data quality and data processing/transformation issues appear throughout the document, and dedicated sections cover quality and validation procedures as well as privacy and security.


The Working Group for the Survey and Utilisation of Secondary Data (AGENS), with representatives from the German Society for Social Medicine and Prevention (DGSMP) and the German Society for Epidemiology (DGEpi), developed the Good Practice in Secondary Data Analysis, Version 2, which aims to establish a standard for planning, conducting and analysing studies based on secondary data. It is also intended to serve as the basis for contracts between data owners (so-called primary users) and secondary users. Its 11 sections address, among other aspects, the study protocol, quality assurance and data protection.


The FDA’s Best Practices for Conducting and Reporting Pharmacoepidemiologic Safety Studies Using Electronic Health Care Data Sets provides best-practice criteria for study design, analysis, conduct and documentation. It emphasises that investigators should understand the potential limitations of electronic healthcare data systems, make provisions for their appropriate use, and refer to validation studies of the safety outcomes that are of interest in the proposed study and captured in the database.


General guidance for studies, including those conducted with electronic healthcare databases, can also be found in the ISPE GPP, in particular section IV-B (Study conduct, Data collection). This guidance emphasises the paramount importance of patient data protection.


The International Society for Pharmacoeconomics and Outcomes Research (ISPOR) established a task force to recommend good research practices for designing and analysing retrospective databases for comparative effectiveness research (CER). The task force subsequently published three articles (Part I, Part II and Part III) that review methodological issues and possible solutions for CER studies based on secondary data analysis (see also section 9.1 on comparative effectiveness research). Many of the principles apply to studies with objectives other than CER, but some aspects of pharmacoepidemiological studies based on secondary use of data, such as data quality, ethical issues, data ownership and privacy, are not covered.


Particular issues of note in the use of electronic patient healthcare data for pharmacoepidemiological research include the following:

  • Completeness of data capture, i.e. does the database reliably capture all of the patient’s healthcare interactions, or are there known gaps in coverage, capture, longitudinality or eligibility? Researchers using claims data rarely have the opportunity to carry out quality assurance on the whole data set. Descriptive analyses of the integrity of a US Medicaid Claims Database (Pharmacoepidemiol Drug Saf 2003;12:103–11) concludes that such analyses can reveal important limitations of the data and that, whenever possible, researchers should examine the ‘parent’ data set for apparent irregularities.

  • Bias in the assessment of drug exposure from an administrative database. The relevance of these biases for quality control in more clinical databases is explored in European Surveillance of Antimicrobial Consumption (ESAC): Data Collection Performance and Methodological Approach (Br J Clin Pharmacol 2004;58:419-28). This article describes a retrospective data collection effort (1997–2001) through an international network of surveillance systems, aimed at collecting publicly available, comparable and reliable data on antibiotic use in Europe. The data collected were screened for bias using a checklist focusing on detection bias in sample and census data, errors in assigning medicinal product packages to the Anatomical Therapeutic Chemical (ATC) Classification System, errors in calculating Defined Daily Doses (DDD) per package, bias due to over-the-counter sales and parallel trade, and bias in the ambulatory/hospital care mix. The authors describe the methodological rigour needed to ensure data validity and reliable cross-national comparison.

  • Validity of the data and of the definitions used, which is not simply a matter of source record validation of a particular endpoint. There are many possible ways to define an endpoint, and researchers may seek to validate only their chosen definition. Validation and validity of diagnoses in the General Practice Research Database (GPRD): a systematic review (Br J Clin Pharmacol 2010;69:4-14) investigated the range of methods used to validate diagnoses in a primary care database. It found that a variety of methods had been used and that, overall, estimates of validity were high. However, the validations were often reported too poorly to permit a clear interpretation. Not all methods provided a quantitative estimate of validity, and most considered only the positive predictive value of a set of diagnostic codes in a highly selected group of cases.

  • Discordance between data sources. For example, Discordance of databases designed for claims payment versus clinical information systems: implications for outcomes research (Ann Intern Med 1993;119:844-50) compared a clinical database with an insurance claims database for predictors of prognosis in patients with ischaemic heart disease. Compared with the clinical information system, the claims data failed to identify more than half of the patients with conditions important for prognosis.
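The completeness point above recommends examining the ‘parent’ data set for apparent irregularities. One simple descriptive check is to compare record counts per calendar month and flag months with an abrupt drop, which may signal a gap in capture or eligibility. The sketch below assumes hypothetical monthly counts, a hypothetical helper `flag_irregular_months`, and an arbitrary cut-off of half the median month; it is an illustration, not a validated procedure.

```python
from statistics import median

# Hypothetical monthly counts of claims records in a 'parent' data set.
monthly_counts = {
    "2002-01": 10450, "2002-02": 10120, "2002-03": 10610,
    "2002-04": 3980,  # an abrupt drop worth investigating
    "2002-05": 10390, "2002-06": 10275,
}


def flag_irregular_months(counts, threshold=0.5):
    """Return months whose record count is below `threshold` times the
    median monthly count (0.5 is an arbitrary illustrative cut-off)."""
    ref = median(counts.values())
    return [month for month, n in sorted(counts.items()) if n < threshold * ref]


print(flag_irregular_months(monthly_counts))
```

A flagged month is only a prompt for investigation: the drop may reflect a real change in coverage, a delayed data feed or an extraction error.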
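The ESAC checklist in the exposure-bias point above mentions errors in calculating Defined Daily Doses per package. The arithmetic is sketched below: DDDs per package are the total amount of active substance in the package divided by the DDD, and population-level use is then commonly expressed as DDDs per 1000 inhabitants per day. The product, the DDD value of 1 g and the sales and population figures are assumptions for illustration only; actual DDD values must be taken from the current WHO ATC/DDD Index.

```python
def ddd_per_package(strength_mg, units_per_package, ddd_g):
    """DDDs in one package: total active substance (g) divided by the DDD (g)."""
    total_g = strength_mg * units_per_package / 1000.0
    return total_g / ddd_g


def ddd_per_1000_inhabitants_per_day(total_ddd, population, days):
    """DDDs per 1000 inhabitants per day over a period of `days` days."""
    return total_ddd * 1000.0 / (population * days)


# Illustrative figures only: a pack of 20 x 500 mg tablets, an assumed DDD
# of 1 g, and 30,000 packs dispensed in one year in a population of 100,000.
per_pack = ddd_per_package(strength_mg=500, units_per_package=20, ddd_g=1.0)
total = per_pack * 30_000
did = ddd_per_1000_inhabitants_per_day(total, population=100_000, days=365)
print(per_pack, round(did, 2))
```

Because every package contributes through `per_pack`, a single unit error (for example mg entered as g) or a package assigned to the wrong ATC code propagates to every consumption figure derived from it, which is why the checklist targets these steps.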
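The validity point above notes that most validation methods estimated only the positive predictive value (PPV) of a set of codes. The sketch below shows why PPV alone is limited: PPV needs review of code-positive patients only, whereas sensitivity also needs the true cases missed by the codes, which can only be found among code-negative patients. The counts are hypothetical and not taken from the cited review.

```python
def ppv(tp, fp):
    """Positive predictive value: share of code-positive patients confirmed
    as true cases. Requires review of code-positive patients only."""
    return tp / (tp + fp)


def sensitivity(tp, fn):
    """Share of true cases identified by the codes. Requires the number of
    true cases among code-negative patients (fn)."""
    return tp / (tp + fn)


# Hypothetical validation: 100 code-positive records reviewed, 90 confirmed.
print(ppv(tp=90, fp=10))  # 0.9
# If review of code-negative patients had found 60 further true cases,
# the same code list would capture only 60% of true cases.
print(sensitivity(tp=90, fn=60))  # 0.6
```

A high PPV is therefore compatible with substantial under-ascertainment, which matters when the endpoint definition affects case counts or incidence estimates.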
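Discordance between data sources, as in the claims versus clinical information system comparison above, can be quantified from patient-level flags recorded in both sources. The sketch computes the share of reference-positive patients that the other source also identifies and Cohen's kappa, a common chance-corrected agreement statistic that the source does not itself mention. The helper `agreement` and the ten-patient data are hypothetical.

```python
def agreement(test, reference):
    """Compare patient-level flags (1 = condition recorded) from two sources
    for the same patients.

    Returns (captured, kappa): the share of reference-positive patients that
    the test source also flags, and Cohen's kappa.
    """
    if len(test) != len(reference):
        raise ValueError("both sources must describe the same patients")
    n = len(test)
    both = sum(1 for t, r in zip(test, reference) if t and r)
    test_only = sum(1 for t, r in zip(test, reference) if t and not r)
    ref_only = sum(1 for t, r in zip(test, reference) if r and not t)
    neither = n - both - test_only - ref_only

    captured = both / (both + ref_only)
    p_observed = (both + neither) / n
    p_test = (both + test_only) / n
    p_ref = (both + ref_only) / n
    # Agreement expected by chance, given each source's marginal prevalence.
    p_expected = p_test * p_ref + (1 - p_test) * (1 - p_ref)
    kappa = (p_observed - p_expected) / (1 - p_expected)
    return captured, kappa


# Hypothetical flags for 10 patients: the clinical system records the
# condition in 6 patients, the claims database in only 2 of those 6.
clinical = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
claims = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
captured, kappa = agreement(claims, clinical)
print(round(captured, 2), round(kappa, 2))  # 0.33 0.29
```

The function assumes at least one reference-positive patient and that chance agreement is below 1; otherwise the divisions are undefined.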

Another example of the hazards of using large linked databases is provided in Vaccine safety surveillance using large linked databases: opportunities, hazards and proposed guidelines (Expert Rev Vaccines 2003; 2(1):21-9).


Quality management is further addressed in section 6 of the Guide.

