13.2. Data quality frameworks
Quality in research is a measure of excellence that impacts medicines development and public health. What is quality management system (QMS)? (American Society for Quality, 2022) defines a QMS as a formalised system that documents processes, procedures, and responsibilities for achieving quality policies and objectives. The quality management principles described in ISO Quality management principles are applicable to pharmacoepidemiological research. ISO 9000:2015 describes the fundamental concepts and principles of quality management, which are universally applicable to organisations, and specifies the terms and definitions that apply to quality management and quality management system standards. The book Total Quality Management-Key Concepts and Case Studies (D.R. Kiran, BSP Books, Elsevier, 2016) deals with the management principles and practices that govern the quality function and presents all the aspects of quality control and management in practice.
The Commission Implementing Regulation (EU) No 520/2012 and the Good pharmacovigilance practices (GVP) Module I provide a framework for the quality management of pharmacovigilance and safety studies of authorised medicinal products.
Measurable quality can be achieved by:
Quality planning: establishing structures (including validated computerised systems) and planning integrated and consistent processes;
Quality assurance and control: monitoring and evaluating how effectively the structures and processes have been established and how effectively the processes are being carried out;
Quality improvement: correcting and improving the structures and processes where necessary.
Pharmacoepidemiological research may be based on primary data collection or on secondary use of data collected for other purposes (see Chapter 8). Primary data collection is a controlled process to which all steps of quality management should apply. Secondary use of data requires two levels of quality control. The first addresses a posteriori data quality irrespective of its use, e.g., detection of missing information, errors made during a transfer or conversion, outliers, duplicates and implausible values (this is also part of the concept of reliability mentioned in the next section). The second addresses data quality in the context of its use for a specific study (also named relevance).
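The a posteriori checks mentioned above (missing information, duplicates, outliers, implausible values) can be illustrated with a minimal Python sketch. The record layout, the field names (`patient_id`, `birth_year`, `weight_kg`), the plausibility range and the outlier threshold are all hypothetical and chosen only for illustration. In practice such checks would be defined for each data source and documented in the quality plan of the study.

```python
from collections import Counter
from statistics import median

# Hypothetical extract from a secondary data source (illustrative values only).
records = [
    {"patient_id": "P1", "birth_year": 1950, "weight_kg": 72.0},
    {"patient_id": "P2", "birth_year": None, "weight_kg": 65.5},
    {"patient_id": "P2", "birth_year": None, "weight_kg": 65.5},   # duplicate row
    {"patient_id": "P3", "birth_year": 2150, "weight_kg": 80.0},   # implausible year
    {"patient_id": "P4", "birth_year": 1985, "weight_kg": 700.0},  # likely entry error
    {"patient_id": "P5", "birth_year": 1978, "weight_kg": 68.0},
]
FIELDS = ("patient_id", "birth_year", "weight_kg")


def missing_counts(rows, fields):
    """Count missing (None) values per field."""
    return {f: sum(1 for r in rows if r[f] is None) for f in fields}


def duplicate_keys(rows, fields):
    """Return the field-value tuples that occur more than once."""
    counts = Counter(tuple(r[f] for f in fields) for r in rows)
    return [key for key, n in counts.items() if n > 1]


def mad_outliers(values, threshold=3.5):
    """Flag values whose modified z-score (median/MAD based) exceeds the threshold."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread: the rule cannot discriminate
    return [v for v in values if 0.6745 * abs(v - med) / mad > threshold]


def implausible_birth_years(rows, earliest=1900, latest=2025):
    """Return patient identifiers whose birth year lies outside an assumed range."""
    return [
        r["patient_id"]
        for r in rows
        if r["birth_year"] is not None
        and not earliest <= r["birth_year"] <= latest
    ]


weights = [r["weight_kg"] for r in records if r["weight_kg"] is not None]
report = {
    "missing": missing_counts(records, FIELDS),
    "duplicates": duplicate_keys(records, FIELDS),
    "weight_outliers": mad_outliers(weights),
    "implausible_birth_year": implausible_birth_years(records),
}
print(report)
```

The outlier rule shown here (a median-based modified z-score) is one possible choice among many. It is used because it is less sensitive to the extreme values it is trying to detect than a rule based on the mean and standard deviation.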
Pharmacoepidemiological research is also becoming more complex and may use very large amounts of data. In such situations, managing quality implies a risk-based approach. Risk-based quality management is incorporated as a Good Clinical Practice expectation in ICH E8 (R1) and addressed in the European Commission’s Risk proportionate approaches in clinical trials (2017), EMA’s Reflection paper on risk-based quality management in clinical trials (2013) and GVP Module III on Pharmacovigilance inspections (2014).
The considerations and recommendations in Chapter 5 regarding the definition and validation of exposure, outcomes and covariates are essential aspects to be addressed for quality management. Adequate information on data sources is needed in order to identify real-world data sources and to assess their suitability for specific research questions. The HMA-EMA Good Practice Guide for the use of the Metadata Catalogue of Real-World Data Sources (2022) provides recommendations on the identifiability of data sources based on the data elements described in the List of metadata for Real World Data catalogues (2022). This catalogue will also help in assessing the quality of data sources proposed to be used in a study protocol or referred to in a study report.
Large electronic data sources such as electronic healthcare records, insurance claims and other administrative data have opened up new opportunities for investigators to rapidly conduct pharmacoepidemiological studies and clinical trials in real-world settings, with a large number of subjects. A concern is that these data have not been collected systematically for research on the utilisation, safety or effectiveness of medicinal products, which could affect the validity, reliability and reproducibility of the investigation. Several data quality frameworks have been developed to understand the strengths and limitations of the data to answer a research question, the impact they may have on the study results, and the decisions to be made to complement available data. The dimensions covered by these frameworks overlap, with different levels of detail. Quality Control Systems for Secondary Use Data (2022) lists the domains addressed in several of them.
The following non-exhaustive list provides links to published data quality frameworks generally applicable to data sources, with a short description of their content.
The draft HMA-EMA Data Quality Framework for EU medicines regulation (2022) provides general considerations on data quality that are relevant for regulatory decision-making, definitions for data dimensions and sub-dimensions, as well as ideas for their characterisation and related metrics. It also provides an analysis of what data quality actions and metrics can be put in place in different scenarios and introduces a maturity model to drive the evolution of automation to support data-driven regulatory decision making. The proposed data dimensions include Reliability (with sub-dimensions of Precision, Accuracy and Plausibility), Extensiveness (with sub-dimensions of Completeness and Coverage), Coherence (with the sub-dimensions of formal, structural and semantic coherence, Uniqueness, Conformance and Validity), Timeliness and Relevance.
The European Health Data Space Data Quality Framework (2022) of the Joint Action Towards the European Health Data Space (TEHDAS) project has defined six dimensions deemed the most important ones at data source level: reliability, relevance, timeliness, coherence, coverage and completeness.
Kahn’s A Harmonized Data Quality Assessment Terminology and Framework for the Secondary Use of Electronic Health Record Data (eGEMs. 2016;4(1):1244) describes a framework with three data quality categories: Conformance (with sub-categories of Value, Relational Conformance and Computational Conformance), Completeness, and Plausibility (with sub-categories of Uniqueness, Atemporal Plausibility and Temporal Plausibility). These categories are applied in two contexts: Verification and Validation. This framework is used by the US National Patient-Centered Clinical Research Network (PCORnet), with an additional component, Persistence, and by the Observational Health Data Sciences and Informatics (OHDSI) network. Based on this framework, the Data Analytics chapter of the Book of OHDSI (2021) provides an automated tool performing the data quality checks in databases conforming to the OMOP common data model. Increasing Trust in Real-World Evidence Through Evaluation of Observational Data Quality (J Am Med Inform Assoc. 2021;28(10):2251-7) describes an open source R package that executes and summarises over 3,300 data quality checks in databases conforming to the OMOP common data model.
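To make the categories of this framework more concrete, the following minimal Python sketch labels each failed check with a category and sub-category. It is a toy illustration only: the tables, field names and allowed value set are hypothetical, it does not follow the OMOP common data model, and it does not reproduce the checks or the interface of any OHDSI or PCORnet tool. Because the checks rely only on expectations internal to the dataset, they are closest to the Verification context; Validation would require comparison with an external reference.

```python
from collections import Counter
from datetime import date

# Hypothetical tables; names and fields are illustrative only.
persons = [
    {"person_id": 1, "sex": "F", "birth_date": date(1960, 5, 1)},
    {"person_id": 2, "sex": "X9", "birth_date": date(1975, 3, 12)},  # value not allowed
    {"person_id": 3, "sex": "M", "birth_date": None},                # missing birth date
]
visits = [
    {"visit_id": 10, "person_id": 1, "visit_date": date(2020, 1, 15)},
    {"visit_id": 11, "person_id": 2, "visit_date": date(1970, 6, 1)},  # before birth
    {"visit_id": 12, "person_id": 9, "visit_date": date(2021, 2, 2)},  # unknown person
]

ALLOWED_SEX = {"F", "M", "U"}  # assumed value set for this example
birth_by_id = {p["person_id"]: p["birth_date"] for p in persons}


def failed_checks():
    """Return (category, sub_category, record_id) for every failed check."""
    failures = []
    for p in persons:
        # Value conformance: the coded value must belong to the allowed set.
        if p["sex"] not in ALLOWED_SEX:
            failures.append(("Conformance", "Value", p["person_id"]))
        # Completeness: a required field must be present.
        if p["birth_date"] is None:
            failures.append(("Completeness", "", p["person_id"]))
    for v in visits:
        # Relational conformance: each visit must refer to an existing person.
        if v["person_id"] not in birth_by_id:
            failures.append(("Conformance", "Relational", v["visit_id"]))
            continue
        # Temporal plausibility: a visit cannot precede the date of birth.
        birth = birth_by_id[v["person_id"]]
        if birth is not None and v["visit_date"] < birth:
            failures.append(("Plausibility", "Temporal", v["visit_id"]))
    return failures


failures = failed_checks()
summary = Counter(category for category, _, _ in failures)
print(failures)
print(dict(summary))
```

Grouping failures by category, as in `summary`, is one simple way to report results at the level of the framework rather than at the level of individual checks.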
Duke-Margolis Center for Health Policy’s Characterizing RWD Quality and Relevancy for Regulatory Purposes (2018) and Determining Real-World Data’s Fitness for Use and the Role of Reliability (2019) specify that determining if a real-world dataset is fit-for-regulatory-purpose is a contextual exercise, as a data source that is appropriate for one purpose may not be suitable for other evaluations. A real-world dataset should be evaluated as fit-for-purpose if, within the given clinical and regulatory context, it fulfils two dimensions: Data Relevancy (including Availability of key data elements, Representativeness, Sufficient subjects and Longitudinality) and Data Reliability with two aspects: Data Quality (Validity, Plausibility, Consistency, Conformance and Completeness) and Accrual.
Data quality frameworks have been described for specific types of data sources and specific objectives. For example, the EMA’s Guideline on Registry-based studies (2021) describes four quality components for use of patient registries (mainly disease registries) for regulatory purposes: Consistency, Completeness, Accuracy and Timeliness. A roadmap to using historical controls in clinical trials – by Drug Information Association Adaptive Design Scientific Working Group (DIA-ADSWG) (Orphanet J Rare Dis. 2020;15:69) describes the main sources of RWD to be used as historical controls, with an Appendix providing guidance on factors to be evaluated in the assessment of the relevance of RWD sources and resultant analyses.
Algorithms have been proposed to identify fit-for-purpose data to address research questions. For example, The Structured Process to Identify Fit-For-Purpose Data: A Data Feasibility Assessment Framework (Clin Pharmacol Ther. 2022;111(1):122-34) and its update, A Structured Process to Identify Fit-for Purpose Study Design and Data to Generate Valid and Transparent Real-World Evidence for Regulatory uses (Clin Pharmacol Ther. 2023;113(6):1235-1239), aim to complement FDA’s framework for RWE with a structured and detailed stepwise approach for the identification and feasibility assessment of candidate data sources for a specific study. The update emphasises the importance of initial study design, including designing a hypothetical target trial as a benchmark for the real-world study design before proceeding to data feasibility assessment. Whilst data feasibility assessment should be recommended as an approach, the complexity of some of the algorithms may discourage their use in practice. Experience will show to what extent they can support the validity and transparency of study results and ultimately the level of confidence in the evidence provided. It is also acknowledged that many investigators simply use the data source(s) they have access to and are familiar with in terms of potential bias, confounding and missing data.
Rules, procedures, roles and responsibilities of quality assurance and quality control for clinical trials and biomedical research are well defined and described in many documents, such as the ICH E6 (R2) Good clinical practice, the European Forum for Good Clinical Practice (EFGCP) Guidelines, the Imperial College Academic Health Science Centre (AHSC)’s Quality Control and Quality Assurance SOP or the article Quality by Design in Clinical Trials: A Collaborative Pilot With FDA (Ther Innov Regul Sci. 2013; 47(2):161-6).
Quality management principles applicable to observational studies with primary data collection or secondary use of data are described in the Commission Implementing Regulation (EU) No 520/2012, GVP Module I, FDA’s Best Practices for Conducting and Reporting Pharmacoepidemiologic Safety Studies Using Electronic Health Care Data Sets, in recommendations from scientific societies such as the ISPE Guidelines for Good Pharmacoepidemiology Practices or the Guidelines and recommendations for ensuring Good Epidemiological Practice (GEP): a guideline developed by the German Society for Epidemiology (Eur J Epidemiol. 2019;34(3):301-17), and in general epidemiology textbooks cited in the Introduction of this Guide. The Strengthening the Reporting of Observational studies in Epidemiology (STROBE) Statement Guidelines for reporting observational studies has established recommendations for improving the quality of reporting of observational studies and seeks to ensure a clear presentation of what was planned, done, and found.
The following articles are practical examples of quality aspects implementation or assessment in different settings:
Poor reporting quality of observational clinical studies comparing treatments of COVID-19 - a retrospective cross-sectional study (BMC Med Res Methodol. 2022;22(1):2) found poor reporting quality in observational studies on the treatment of COVID-19 published throughout the year 2020, with a mean adherence of 45.6% to the STROBE checklist items in 147 observational studies.
Quality of observational studies in prestigious journals of occupational medicine and health based on Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) Statement: a cross‑sectional study (BMC Res Notes 2018;11:266) found that all sub-items of the STROBE statement were reported in 63.7%, not reported in 29.7% and not applicable in 6.6% of the 60 studies evaluated.
Chapter 11 ‘Data Collection and Quality Assurance’ of the Agency for Healthcare Research and Quality (AHRQ)’s Registries for Evaluating Patient Outcomes: A User's Guide, 4th Edition (2020) reviews key areas of data collection, cleaning, storage, and quality assurance for registries, with practical examples.
Validation and validity of diagnoses in the General Practice Research Database (GPRD): a systematic review (Br J Clin Pharmacol. 2010;69:4-14) assesses the quality of the methods used to validate diagnoses in the former GPRD.
Quality assurance in non-interventional studies (Ger Med Sci. 2009;7:Doc 29: 1-14) proposes measures of quality assurance that can be applied at different stages of non-interventional studies without compromising the character of non-intervention.