

ENCePP Guide on Methodological Standards in Pharmacoepidemiology


4.2. Secondary data collection


Secondary data collection refers to the use of data already gathered for another purpose (e.g. electronic healthcare records, claims or prescription data), which can also be linked to non-medical data. Over the last two decades, key data resources, expertise and methodology have developed that allow such data to be used in pharmacoepidemiology. The ENCePP Inventory of Data Sources contains key information on the databases registered in ENCePP, and Section 4.6 of this Guide describes existing research networks.


A comprehensive description of the main features and applications of frequently used databases for pharmacoepidemiology research in the United States and in Europe appears in the book Pharmacoepidemiology (B. Strom, S.E. Kimmel, S. Hennessy. 5th Edition, Wiley, 2012, Chapters 11 - 18). The limitations of electronic healthcare databases should be acknowledged, as detailed in A review of uses of healthcare utilisation databases for epidemiologic research on therapeutics (J Clin Epidemiol 2005;58:323-37).


The primary purpose of the ISPE-endorsed Guidelines for Good Database Selection and use in Pharmacoepidemiology Research (Pharmacoepidemiol Drug Saf 2012;21:1-10) is to assist in the selection and use of data resources in pharmacoepidemiology by highlighting potential limitations and recommending tested procedures. The guideline mainly addresses databases of routinely collected healthcare information and does not cover spontaneous report databases. It is simple and well structured. It helps investigators to select databases for their research, and it helps database custodians to describe their databases in a useful manner. One section is dedicated to multi-site studies. Data quality and data processing/transformation issues are referenced throughout the document. Dedicated sections cover quality and validation procedures, and separate sections cover privacy and security.


The Working Group for the Survey and Utilisation of Secondary Data (AGENS), with representatives from the German Society for Social Medicine and Prevention (DGSMP) and the German Society for Epidemiology (DGEpi), developed Good Practice in Secondary Data Analysis Version 2. It aims to establish a standard for planning, conducting and analysing studies based on secondary data. It is also intended to serve as the basis for contracts between data owners (so-called primary users) and secondary users. It is divided into 11 sections addressing, among other aspects, the study protocol, quality assurance and data protection.

The FDA’s Best Practices for Conducting and Reporting Pharmacoepidemiologic Safety Studies Using Electronic Health Care Data Sets provides best-practice criteria covering design, analysis, conduct and documentation. It emphasises that investigators should understand the potential limitations of electronic healthcare data systems and make provisions for their appropriate use. Investigators should also cite validation studies of the safety outcomes that are of interest in the proposed study and captured in the database.


General guidance for studies, including those conducted with electronic healthcare databases, can also be found in the ISPE GPP, in particular section IV-B (Study conduct, Data collection). This guidance emphasises the paramount importance of patient data protection.


The International Society for Pharmacoeconomics and Outcomes Research (ISPOR) established a task force to recommend good research practices for designing and analysing retrospective database studies for comparative effectiveness research (CER). The Task Force subsequently published three articles (Part I, Part II and Part III) that review methodological issues and possible solutions for CER studies based on secondary data analysis (see also section 10.1 on comparative effectiveness research). Many of the principles apply to studies with objectives other than CER. However, some aspects of pharmacoepidemiological studies based on secondary use of data are not covered, such as data quality, ethical issues, data ownership and privacy.

Particular issues to be considered in the use of electronic healthcare data for pharmacoepidemiological research include the following:

  • Completeness of data capture: does the database reliably capture all of the patient’s healthcare interactions, or are there known gaps in coverage, capture, longitudinality or eligibility? Researchers using claims data rarely have the opportunity to carry out quality assurance on the whole data set. Descriptive analyses of the integrity of a US Medicaid Claims Database (Pharmacoepidemiol Drug Saf 2003;12:103–11) concludes that such analyses can reveal important limitations of the data. It adds that, whenever possible, researchers should examine the ‘parent’ data set for apparent irregularities.
  • Bias in the assessment of drug exposure and quality control of clinical databases: European Surveillance of Antimicrobial Consumption (ESAC): Data Collection Performance and Methodological Approach (Br J Clin Pharmacol 2004;58:419-28) describes a retrospective data collection effort (1997–2001) through an international network of surveillance systems. The effort aimed to collect publicly available, comparable and reliable data on antibiotic use in Europe. The collected data were screened for bias using a checklist focusing on:
    • detection bias in sample and census data;
    • errors in assigning medicinal product packages to the Anatomical Therapeutic Chemical (ATC) Classification System;
    • errors in calculating Defined Daily Doses (DDD) per package;
    • bias from over-the-counter sales and parallel trade;
    • bias in the ambulatory/hospital care mix.
    The authors describe the methodological rigour needed to ensure data validity and reliable cross-national comparison.


  • Validity of diagnoses: Validation and validity of diagnoses in the General Practice Research Database (GPRD): a systematic review (Br J Clin Pharmacol 2010;69:4-14) investigated the range of methods used to validate diagnoses in a primary care database. It concluded that a variety of methods had been used to assess validity and that, overall, estimates of validity were high. However, the validations were often reported too poorly to permit a clear interpretation. Not all methods provided a quantitative estimate of validity, and most considered only the positive predictive value of a set of diagnostic codes in a highly selected group of cases.



  • The impact of changes over time in data, access methodology and the environment: Evidence generation from healthcare databases: recommendations for managing change (Pharmacoepidemiol Drug Saf 2016;25(7):749-754) proposes aspects to consider in order to minimise problems of validity, reproducibility and comparability arising from changes in the data or systems. One section addresses issues that may arise when common data models and associated tools are introduced.
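The descriptive integrity analyses recommended under completeness of data capture can start simply. One approach is to tabulate records per calendar month and flag months that fall well below the usual volume. The sketch below assumes that each record carries a service date. The function names and the 50%-of-median cut-off are hypothetical choices made for illustration, not a published standard. A flagged month is only a prompt to investigate the parent data set, not proof that data are missing.

```python
from collections import Counter
from datetime import date
from statistics import median


def monthly_counts(record_dates):
    """Count records per calendar month, keyed by (year, month)."""
    return Counter((d.year, d.month) for d in record_dates)


def month_span(first, last):
    """List every (year, month) from first to last inclusive, so empty months are not skipped."""
    year, month = first
    span = []
    while (year, month) <= last:
        span.append((year, month))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return span


def flag_irregular_months(record_dates, threshold=0.5):
    """Flag months whose record count is below `threshold` times the median monthly count.

    The 0.5 default is a hypothetical screening cut-off chosen for illustration.
    """
    counts = monthly_counts(record_dates)
    if not counts:
        return []
    span = month_span(min(counts), max(counts))
    typical = median(counts.get(m, 0) for m in span)
    return [m for m in span if counts.get(m, 0) < threshold * typical]


# Hypothetical service dates: March has no records and May has very few.
dates = (
    [date(2020, 1, d) for d in range(1, 11)]    # 10 records
    + [date(2020, 2, d) for d in range(1, 13)]  # 12 records
    + [date(2020, 4, d) for d in range(1, 12)]  # 11 records
    + [date(2020, 5, 1), date(2020, 5, 2)]      # 2 records
)
print(flag_irregular_months(dates))  # [(2020, 3), (2020, 5)]
```

Filling in empty months with zero matters here. A month with no records at all is exactly the kind of gap a naive tabulation of observed months would hide.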
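The ESAC checklist includes errors in calculating Defined Daily Doses (DDD) per package. Under the WHO ATC/DDD methodology, the number of DDDs in a package equals the total amount of active substance in the package divided by the DDD assigned to that substance. National consumption is then commonly expressed as DDD per 1000 inhabitants per day. The sketch below shows that arithmetic. The product label, strength, sales figures and DDD value are hypothetical placeholders. Real analyses would take the DDD from the current WHO ATC/DDD Index for the relevant ATC code and route of administration.

```python
from dataclasses import dataclass


@dataclass
class PackageSales:
    product: str             # hypothetical product label
    strength_mg: float       # active substance per unit (tablet, capsule), in mg
    units_per_package: int
    packages_sold: int


def ddd_per_package(pkg: PackageSales, ddd_mg: float) -> float:
    """DDDs contained in one package: total mg of active substance divided by the DDD in mg."""
    return pkg.strength_mg * pkg.units_per_package / ddd_mg


def ddd_per_1000_inhabitants_per_day(sales, ddd_mg_by_product, population, days):
    """Total DDDs sold, expressed per 1000 inhabitants per day."""
    total_ddd = sum(
        ddd_per_package(p, ddd_mg_by_product[p.product]) * p.packages_sold for p in sales
    )
    return total_ddd * 1000 / (population * days)


# Illustrative figures only; the DDD below is a placeholder, not a WHO value.
sales = [PackageSales("antibiotic_A", strength_mg=500, units_per_package=20, packages_sold=73_000)]
ddd_mg = {"antibiotic_A": 1000.0}

print(ddd_per_package(sales[0], ddd_mg["antibiotic_A"]))                                # 10.0
print(ddd_per_1000_inhabitants_per_day(sales, ddd_mg, population=1_000_000, days=365))  # 2.0
```

Keeping strength and DDD in the same unit (here mg) is essential. Mixing mg and g in either value changes the result by a factor of 1000.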
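The GPRD systematic review notes that most validations estimated only the positive predictive value (PPV) of a set of diagnostic codes. PPV is the proportion of code-identified cases whose diagnosis is confirmed on review. Reporting it with a confidence interval helps readers judge its precision. The sketch below uses the Wilson score interval for a binomial proportion. This interval is a choice made here for illustration rather than a method prescribed by the review, and the counts are hypothetical.

```python
import math


def ppv_with_wilson_ci(confirmed: int, reviewed: int, z: float = 1.96):
    """PPV of a code list with a Wilson score confidence interval (z=1.96 gives ~95%).

    confirmed: code-identified cases whose diagnosis was confirmed on review
    reviewed:  code-identified cases that were reviewed
    """
    if reviewed <= 0 or not 0 <= confirmed <= reviewed:
        raise ValueError("need reviewed > 0 and 0 <= confirmed <= reviewed")
    p = confirmed / reviewed
    z2 = z * z
    denom = 1 + z2 / reviewed
    centre = (p + z2 / (2 * reviewed)) / denom
    half_width = z / denom * math.sqrt(p * (1 - p) / reviewed + z2 / (4 * reviewed**2))
    return p, centre - half_width, centre + half_width


# Hypothetical validation: 90 of 100 code-identified cases confirmed on review.
ppv, lower, upper = ppv_with_wilson_ci(90, 100)
print(f"PPV {ppv:.2f} (95% CI {lower:.3f} to {upper:.3f})")
```

This design reviews only code-positive cases, so it estimates PPV but says nothing about sensitivity. In other words, it does not reveal how many true cases the code list misses.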


An example of the hazards of using large linked databases is provided in Vaccine safety surveillance using large linked databases: opportunities, hazards and proposed guidelines (Expert Rev Vaccines 2003; 2(1):21-9).

Quality management is further addressed in section 7 of the Guide.


