Chapter 11: Signal detection methodology and application

11.1. General aspects of signal detection

A general overview of methods for signal detection and recommendations for their application are provided in the report of the CIOMS Working Group VIII Practical aspects of signal detection in pharmacovigilance; empirical results on various aspects of signal detection obtained from the IMI PROTECT project have been summarised in Good Signal Detection Practices: Evidence from IMI PROTECT (Drug Saf. 2016;39(6):469-90).

The EU Guideline on good pharmacovigilance practices (GVP) Module IX (Rev 1) - Signal Management defines signal management as the set of activities performed to determine whether, based on an examination of individual case safety reports (ICSRs), aggregated data from active surveillance systems, studies, literature information or other data sources, there are new risks associated with an active substance or a medicinal product or whether risks have changed. Signal management covers all steps from detecting signals (signal detection), through their validation and confirmation, analysis, prioritisation and assessment to recommending action, as well as the tracking of the steps taken and of any recommendations made. The Guideline on good pharmacovigilance practices (GVP) - Module IX Addendum I – Methodological aspects of signal detection from spontaneous reports of suspected adverse reactions describes the components of an effective signal detection system and lists some of the methodological aspects that have proved effective and should be considered. Implementation details of such a system are not provided as they may be database-dependent.

The FDA’s Guidance for Industry-Good Pharmacovigilance Practices and Pharmacoepidemiologic Assessment provides best practice for documenting, assessing and reporting individual case safety reports and case series and for identifying, evaluating, investigating and interpreting safety signals, including recommendations on data mining techniques and use of pharmacoepidemiological studies.

11.2. Methods of statistical signal detection

Quantitative analysis of spontaneous adverse drug reaction reports is routinely used in drug safety research. Several articles have been published on statistical signal detection. Quantitative signal detection using spontaneous ADR reporting (Pharmacoepidemiol Drug Saf. 2009;18(6):427-36) describes the core concepts behind the most common methods, the proportional reporting ratio (PRR), reporting odds ratio (ROR), information component (IC) and empirical Bayes geometric mean (EBGM). The authors also discuss the role of Bayesian shrinkage in screening spontaneous reports and the importance of changes over time in screening the properties of the measures.

Additionally, they discuss major areas of controversy (such as stratification and evaluation and implementation of methods) and give some suggestions as to where emerging research is likely to lead. Data mining for signals in spontaneous reporting databases: proceed with caution (Pharmacoepidemiol Drug Saf. 2007;16(4):359–65) reviews data mining methodologies and their limitations and provides useful points to consider before incorporating data mining as a routine component of any pharmacovigilance program. Disproportionality Analysis for Pharmacovigilance Signal Detection in Small Databases or Subsets: Recommendations for Limiting False-Positive Associations (Drug Saf. 2020;43(5):479-87) evaluates the impact of database size on the performance of disproportionality analysis, with regard to limiting spurious associations.
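As an illustration of the disproportionality measures named above, the following minimal Python sketch computes the PRR, the ROR (with an approximate 95% confidence interval) and an IC with additive shrinkage from a single 2x2 table of report counts. The function name, the cell labels a, b, c and d and the example counts are assumptions made for this sketch; the EBGM is omitted because it requires fitting a prior distribution to the whole database.

```python
import math


def disproportionality(a, b, c, d):
    """Compute PRR, ROR and a shrunk IC from a 2x2 table of report counts.

    a: reports with the drug of interest and the event of interest
    b: reports with the drug of interest and any other event
    c: reports with any other drug and the event of interest
    d: reports with any other drug and any other event
    Assumption: all four cells are non-zero (no continuity correction).
    """
    n = a + b + c + d
    prr = (a / (a + b)) / (c / (c + d))
    ror = (a * d) / (b * c)
    # Approximate 95% confidence interval for the ROR on the log scale.
    se_log_ror = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    ror_ci = (
        math.exp(math.log(ror) - 1.96 * se_log_ror),
        math.exp(math.log(ror) + 1.96 * se_log_ror),
    )
    # Expected count if drug and event were reported independently.
    expected = (a + b) * (a + c) / n
    # Adding 0.5 to observed and expected counts shrinks the IC towards 0
    # when counts are small.
    ic = math.log2((a + 0.5) / (expected + 0.5))
    return {"prr": prr, "ror": ror, "ror_ci": ror_ci, "ic": ic}


# Hypothetical counts, for illustration only.
result = disproportionality(a=20, b=80, c=100, d=9800)
print(result)
```

In routine screening, such point estimates are usually combined with thresholds on the lower confidence bound and on the number of reports; the appropriate thresholds depend on the database.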

Methods such as multiple logistic regression have the theoretical capability to reduce masking and confounding by co-medication and underlying disease. Regression-Adjusted GPS Algorithm describes the use of regression to increase the discriminatory power of the Gamma Poisson Shrinkage (GPS) algorithm. Data-Driven Prediction of Drug Effects and Interactions (Sci Transl Med. 2012;4(125):125ra31) describes the application of regression methods to correct for synthetic associations caused by hidden, or unmeasured, covariates as well as those from indication and concomitant drug use. The letter Logistic regression in signal detection: another piece added to the puzzle (Clin Pharmacol Ther. 2013;94(3):312) highlights the variability of results obtained in different studies based on this method and the daunting computational task it requires. More work is needed on its value for pharmacovigilance in the real-world setting.

A more recent proposal involves a broadening of the basis for computational screening of individual case safety reports, by considering multiple aspects of the strength of evidence in a predictive model. This approach combines disproportionality analysis with features such as the number of well-documented reports, the number of recent reports and geographical spread of the case series (Improved statistical signal detection in pharmacovigilance by combining multiple strength-of-evidence aspects in vigiRank, Drug Saf. 2014;37(8):617–28). In a similar spirit, logistic regression has been proposed to combine a disproportionality measure with a measure of unexpectedness for the time-to-onset distribution (Use of logistic regression to combine two causality criteria for signal detection in vaccine spontaneous report data, Drug Saf. 2014;37(12):1047-57). In A prediction model‐based algorithm for computer‐assisted database screening of adverse drug reactions in the Netherlands (Pharmacoepidemiol Drug Saf. 2018;27(2):199-205), five relevant characteristics (number of reports, disproportionality, Naranjo score, proportion of MAH reports, proportion of HCP reports) were chosen as potential predictors in the model and tested against the presence in the Summary of Product Characteristics (SmPC) of each unique drug‐ADR association at the time of the analysis. All candidate predictors were included in the final model, which showed increased screening efficiency. The authors comment that the choice of candidate predictors may depend on each spontaneous reporting database but that the method of generating a prediction model‐based priority list of signals could be useful in other databases.

Methods for statistical signal detection tend to classify reports based on reported adverse event terms considered one at a time. Broader categories such as High-Level Terms or Standardized MedDRA Queries are sometimes used to group similar adverse events and improve sensitivity. However, this may be at the expense of specificity. Consensus clustering for case series identification and adverse event profiles in pharmacovigilance (Artif Intell Med. 2021;122:1-9) proposes a different approach where cluster analysis attempts to identify case series describing similar clinical conditions, accounting for the complete sets of signs, symptoms, and diagnoses on each report.

Disproportionality methods are usually calculated on the cumulative data and therefore do not provide a direct insight into temporal changes in frequency of reports. Methodologies to monitor changes in the frequency of reporting over time have been developed with a focus on enhancing pharmacovigilance when databases are small, when drugs have established safety profiles and/or when product quality defects, medication errors and cases of abuse or misuse are of concern. Automated method for detecting increases in frequency of spontaneous adverse event reports over time (J Biopharm Stat. 2013;23(1):161-77) presents a regression method with both smooth trend and seasonal components, while An algorithm to detect unexpected increases in frequency of reports of adverse events in EudraVigilance (Pharmacoepidemiol Drug Saf. 2018;27(1):38-45) tests a negative binomial time-series regression model on thirteen historical concerns. Additionally, a modification of the Information Component to screen for spatial-temporal disproportionality is described in Using VigiBase to Identify Substandard Medicines: Detection Capacity and Key Prerequisites (Drug Saf. 2015;38(4):373–82). Although these methods are theoretically appealing and have shown promising results, limited work has been performed to assess their effectiveness.
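The idea of monitoring report frequency over time can be illustrated with a deliberately simplified Python sketch. It is not the regression method with trend and seasonal components nor the negative binomial time-series model cited above: it assumes that counts per period follow a Poisson distribution whose mean is the average of the earlier periods, and it ignores trend, seasonality and overdispersion. The function names, the significance level and the example counts are hypothetical.

```python
import math


def poisson_upper_tail(k, mean):
    """Return P(X >= k) for X ~ Poisson(mean)."""
    if k <= 0:
        return 1.0
    term = math.exp(-mean)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= mean / i    # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def flag_increase(counts, alpha=0.01):
    """Flag the latest period if its count is unexpectedly high.

    counts: report counts per period, oldest first; the last entry is
    the period under test. The baseline is the mean of earlier periods.
    """
    if len(counts) < 2:
        raise ValueError("need at least one historical period")
    history, latest = counts[:-1], counts[-1]
    baseline = sum(history) / len(history)
    p_value = poisson_upper_tail(latest, baseline)
    return p_value < alpha, p_value


# Hypothetical monthly counts for one drug-event combination.
print(flag_increase([4, 5, 3, 6, 4, 5, 15]))  # sharp rise in the last month
print(flag_increase([4, 5, 3, 6, 4, 5, 5]))   # last month in line with history
```

Because spontaneous report counts are typically overdispersed, a Poisson baseline of this kind would be expected to raise more alerts than a negative binomial model on the same data.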

11.3. Triage of statistical safety signals

The revised guidance on Screening for adverse reactions in EudraVigilance describes methods for screening adverse drug reactions used by the European Medicines Agency and national competent authorities. The proposed methods complement the classical disproportionality analysis with additional data summaries, based on both statistical and clinical considerations. This approach is based on the fact that, although disproportionality methods have been shown to detect many adverse reactions before other currently used methods of signal detection, this is not true for all types of adverse reactions.

Hence a comprehensive and efficient routine signal detection system will seek to integrate a number of different methods to prioritise the drug-event combinations for further evaluation. For the methods recommended, the guidance addresses elements of their interpretation, their potential advantages and limitations and the evidence behind them. Areas of uncertainty that require resolution before firm recommendations can be made are also mentioned.

As understanding of the molecular mechanisms involved in adverse effects of drugs increases, this information is expected to inform efforts to predict and detect drug safety problems. Such modelling is presented in Data-driven prediction of drug effects and interactions (Sci Transl Med. 2012;4(125):125ra31) and should be a major focus of drug safety research activities. An example of an application of this concept is illustrated in the paper Cheminformatics-aided pharmacovigilance: application to Stevens-Johnson Syndrome (J Am Med Inform Assoc. 2016;23(5):968–78) where the authors apply a Quantitative Structure-Activity Relationship (QSAR) model to predict the drugs associated with Stevens-Johnson syndrome in a pharmacovigilance database. In Target Adverse Event Profiles for Predictive Safety in the Postmarket Setting (Clin Pharmacol Ther. 2021;109(5):1232-43), the authors identify drugs that share pharmacological targets with the drug of interest and use information from these drugs to predict post-marketing adverse drug reactions of the drug of interest. Machine learning on data from the FDA Adverse Event Reporting System, peer-reviewed literature and FDA drug labels is used for the prediction. In Role of serotonin and norepinephrine transporters in antidepressant-induced arterial hypertension: a pharmacoepidemiological-pharmacodynamic study (Eur J Clin Pharmacol. 2020;76(9):1321-7), disproportionality analysis on VigiBase was combined with a pharmacodynamic study to examine the relationship between SRIs and SNRIs and arterial hypertension, taking into account the affinity for noradrenergic and serotonergic receptors.

With pharmacovigilance databases increasing in size, manual review of all cases becomes a non-scalable process, both because of the increasing number of cases to review in each potential signal and because it is difficult to summarise hundreds of case reports in a narrative format. To address some of these issues there has been recent experimentation with machine learning and natural language processing techniques. Towards Automating Adverse Event Review: A Prediction Model for Case Report Utility (Drug Saf. 2020;43(4):329-38) notes the need to develop modernised pharmacovigilance practices and shows the feasibility of developing a tool predictive of ICSR utility. Feature engineering and machine learning for causality assessment in pharmacovigilance: Lessons learned from application to the FDA Adverse Event Reporting System (Comput Biol Med. 2021;135:104517) describes the use of machine learning techniques to quickly eliminate non-assessable reports.

11.4. Performance comparison of signal detection methods

The role of data mining in pharmacovigilance (Expert Opin Drug Saf. 2005;4(5):929-48) explains how signal detection algorithms work and addresses questions regarding their validation, comparative performance, limitations and potential for use and misuse in pharmacovigilance.

An empirical evaluation of several disproportionality methods in a number of different spontaneous reporting databases is given in Comparison of statistical detection methods within and across spontaneous reporting databases (Drug Saf. 2015;38(6):577-87).

Performance of pharmacovigilance signal detection algorithms for the FDA adverse event reporting system (Clin Pharmacol Ther. 2013;93(6):539-46) describes the performance of signal-detection algorithms for spontaneous reports in the US FDA adverse event reporting system against a benchmark constructed by the Observational Health Data Sciences and Informatics community. It concludes that logistic regression performs better than traditional disproportionality analysis. Other studies have addressed similar or related questions, for example Large-scale regression-based pattern discovery: The example of screening the WHO global Drug Safety database (Stat Anal Data Min. 2010;3(4):197–208), Are all quantitative postmarketing signal detection methods equal? Performance characteristics of logistic regression and Multi-item Gamma Poisson Shrinker (Pharmacoepidemiol Drug Saf. 2012;21(6):622–30) and Data-driven prediction of drug effects and interactions (Sci Transl Med. 2012;4(125):125ra31).
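Comparisons of this kind typically score each drug-event combination with a detection method and evaluate the scores against a reference set of positive and negative controls. The following Python sketch shows two common summaries, sensitivity and specificity at a chosen threshold and the area under the ROC curve (AUC) computed from pairwise comparisons. The function names, scores, labels and threshold are hypothetical and are not taken from any of the cited studies.

```python
def sensitivity_specificity(scores, labels, threshold):
    """Sensitivity and specificity when score >= threshold counts as a signal.

    labels: 1 for positive controls, 0 for negative controls.
    """
    tp = sum(1 for s, y in zip(scores, labels) if y and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if not y and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if not y and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def auc(scores, labels):
    """Probability that a random positive control outscores a random
    negative control (ties count as one half)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical disproportionality scores and reference-set labels.
scores = [3.0, 0.5, 1.2, 2.0, 0.8, 1.5]
labels = [1, 0, 1, 0, 0, 1]
print(sensitivity_specificity(scores, labels, threshold=1.0))
print(auc(scores, labels))
```

Results of such evaluations depend strongly on how the reference set was constructed, which is one reason why findings differ between studies.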

11.5. Stratification and sub-group analyses

Many statistical signal detection algorithms disregard the underlying diversity and give equal weight to reports on all patients when computing the expected number of reports for a drug-event pair. This may make them vulnerable to confounding and distortions due to effect modification, and could result in true signals being masked or false associations being flagged as potential signals. Stratification and/or subgroup analyses might address these issues, and whereas stratification is implemented in some standard software packages, routine use of subgroup analyses is less common. Performance of stratified and subgrouped disproportionality analyses in spontaneous databases (Drug Saf. 2016;39(4):355-64) performed a comparison across a range of spontaneous report databases and covariates and found subgroup analyses to improve first-pass signal detection, whereas stratification did not; subgroup analyses by patient age and country of origin were found to bring the greatest value.
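The distinction between the two approaches can be sketched in Python under the following assumptions: a subgroup analysis computes a separate ROR within each stratum, whereas a stratified analysis pools the strata into a single adjusted estimate, here using the Mantel-Haenszel odds ratio as one possible choice. The stratum labels and counts are hypothetical, and each 2x2 table uses the cell order (a, b, c, d) = (drug and event, drug and other events, other drugs and event, other drugs and other events).

```python
def ror(a, b, c, d):
    """Reporting odds ratio from one 2x2 table of report counts."""
    return (a * d) / (b * c)


def mantel_haenszel_or(tables):
    """Stratum-adjusted (pooled) odds ratio across several 2x2 tables."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return num / den


# Hypothetical counts by age group, for illustration only.
strata = {
    "<65": (5, 95, 100, 9800),
    ">=65": (30, 70, 200, 4700),
}

# Subgroup analysis: one ROR per stratum.
subgroup_rors = {name: ror(*cells) for name, cells in strata.items()}

# Crude analysis: collapse the strata, then compute a single ROR.
crude_cells = [sum(cells[i] for cells in strata.values()) for i in range(4)]
crude_ror = ror(*crude_cells)

# Stratified analysis: a single estimate adjusted for the stratifying variable.
adjusted_ror = mantel_haenszel_or(list(strata.values()))

print(subgroup_rors, crude_ror, adjusted_ror)
```

In this invented example the subgroup RORs differ markedly between age groups, a pattern that a single pooled estimate cannot show.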

11.6. Masking

Masking is a statistical issue by which true signals of disproportionate reporting are hidden by the presence of other products in the database. It is a phenomenon often observed when external factors, such as solicited schemes of reporting of adverse drug reactions or media attention, affect the reporting dynamics leading to a relative increase in the reporting rate for a specific medicinal product. As the change in reporting dynamics can be restricted in time and location, masking is not fully understood, but can be highly impactful if the reporting dynamics change dramatically over a long period and across multiple countries, as seen in the worldwide COVID-19 vaccination campaigns.

Publications have described methods assessing the extent and impact of the masking effect of measures of disproportionality. They include A conceptual approach to the masking effect of measures of disproportionality (Pharmacoepidemiol Drug Saf. 2014;23(2):208-17), with an application described in Assessing the extent and impact of the masking effect of disproportionality analyses on two spontaneous reporting systems databases (Pharmacoepidemiol Drug Saf. 2014;23(2):195-207), Outlier removal to uncover patterns in adverse drug reaction surveillance - a simple unmasking strategy (Pharmacoepidemiol Drug Saf. 2013;22(10):1119-29) and A potential event-competition bias in safety signal detection: results from a spontaneous reporting research database in France (Drug Saf. 2013;36(7):565-72). The value of these methods in practice needs to be further investigated.
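The mechanism of masking can be shown with a small numerical sketch in Python. It does not reproduce any of the published unmasking algorithms cited above; it simply recomputes the PRR for one product after removing a suspected masking product from the comparator background. Product names and counts are hypothetical.

```python
def prr(drug, counts, exclude=()):
    """PRR for `drug`, using all other products as background.

    counts: {product: (reports with the event, total reports)}
    exclude: products to remove from the background (suspected maskers)
    """
    event_drug, total_drug = counts[drug]
    background = [
        cells for name, cells in counts.items()
        if name != drug and name not in exclude
    ]
    event_bg = sum(e for e, _ in background)
    total_bg = sum(t for _, t in background)
    return (event_drug / total_drug) / (event_bg / total_bg)


# Hypothetical database: drug_m attracts heavy reporting of the event.
counts = {
    "drug_x": (10, 1000),
    "drug_m": (900, 3000),
    "others": (100, 20000),
}

print(prr("drug_x", counts))                      # background includes drug_m
print(prr("drug_x", counts, exclude={"drug_m"}))  # drug_m removed
```

With drug_m in the background the PRR for drug_x is about 0.23, whereas it is about 2.0 once drug_m is removed, showing how a single heavily reported product can hide a disproportionality for another.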

11.7. Complementary role of databases

A time-consuming step in signal detection of adverse reactions is the determination of whether an effect is already recorded in the product information. A database which can be searched for this information allows filtering or flagging reaction monitoring reports for signals related to unlisted reactions, thus considerably improving the efficiency of the signal detection process by restricting attention to those drug-event pairs not already considered causally related. In research, it permits an evaluation of the effect of background restriction on the performance of statistical signal detection. An example of such a database is the PROTECT Database of adverse drug reactions (EU SPC ADR database), a structured Excel database of all adverse drug reactions (ADRs) listed in Section 4.8 of the SmPC of medicinal products authorised in the European Union (EU) according to the centralised procedure, based exclusively on the Medical Dictionary for Regulatory Activities (MedDRA) terminology. Efforts to identify adverse drug reactions in regulatory documents using natural language processing are being explored and could help build and maintain such databases in the future. ADE Eval: An Evaluation of Text Processing Systems for Adverse Event Extraction from Drug Labels for Pharmacovigilance (Drug Saf. 2021;44(1):83-94) presents a systematic evaluation of different such approaches.
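The filtering step described above amounts to a lookup of each candidate drug-event pair in the set of listed reactions. The Python sketch below assumes that both sources have already been coded to the same terminology level; the product names, terms and function names are hypothetical, and the sketch does not reflect the structure of the PROTECT database.

```python
def normalise(drug, event):
    """Normalise a drug-event pair for case-insensitive matching."""
    return drug.strip().lower(), event.strip().lower()


def unlisted_pairs(candidates, listed):
    """Keep candidate pairs that are not among the listed reactions."""
    listed_keys = {normalise(drug, event) for drug, event in listed}
    return [pair for pair in candidates if normalise(*pair) not in listed_keys]


# Hypothetical listed reactions (e.g. taken from product information).
listed = [("drug_a", "Nausea"), ("drug_a", "Headache"), ("drug_b", "Rash")]

# Hypothetical drug-event pairs highlighted by statistical screening.
candidates = [("Drug_A", "nausea"), ("drug_a", "Pancreatitis"), ("drug_b", "Rash")]

print(unlisted_pairs(candidates, listed))
```

Exact matching of this kind would miss listed reactions expressed with a different but clinically equivalent term, so in practice some grouping of terms is likely to be needed.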

Other large observational databases such as claims and electronic health records databases are potentially useful as part of a larger signal detection and refinement strategy. Modern methods of pharmacovigilance: detecting adverse effects of drugs (Clin Med. 2009;9(5):486-9) describes the strengths and weaknesses of different data sources for signal detection (spontaneous reports, electronic patient records and cohort-event monitoring). A number of studies have considered the use of observational data in electronic systems that complement existing methods of safety surveillance, e.g. the PROTECT, OHDSI and Sentinel projects. Useful Interplay Between Spontaneous ADR Reports and Electronic Healthcare Records in Signal Detection (Drug Saf. 2015;38(12):1201-10) investigates the potential of using electronic health records alongside spontaneous reporting systems to improve signal detection, concluding that the former may have additional value for adverse events with a high background incidence. Toward multimodal signal detection of adverse drug reactions (J Biomed Inform. 2017;76:41-9) concludes that utilising and jointly analysing multiple data sources may lead to improved signal detection, but that development of this approach requires a deeper understanding of the data sources used, additional benchmarks and further research on methods to generate and synthesise signals.