

ENCePP Guide on Methodological Standards in Pharmacoepidemiology


15.5. Artificial intelligence in pharmacoepidemiology


15.5.1. Introduction


Artificial intelligence (AI) is a catch-all term for a set of tools and techniques that allow machines to perform activities commonly described as requiring human-level intelligence. No consensus definition of AI exists, but definitions commonly draw an analogy to human intelligence. This analogy is unhelpful because it suggests Artificial General Intelligence, whereas current techniques and tools are dedicated to assisting with specific tasks, i.e., Artificial Narrow Intelligence.


Machine Learning (ML) is considered a subset of AI and reflects the ability of computers to identify and extract rules from data rather than having those rules explicitly coded by a human. Deep Learning (DL) is a subtype of ML with increased complexity in how it parses and analyses data. The rules identified by ML or DL applications constitute an algorithm, and the outputs are often said to be data-driven, as opposed to rules explicitly coded by a human, which form knowledge-based algorithms.


Natural language processing (NLP) sits at the interface of linguistics, computer science and AI and is concerned with providing machines with the ability to understand text and spoken words. NLP can be subdivided into statistical NLP, which uses machine learning or deep learning approaches, and symbolic NLP, which uses a semantic rule-based methodology.


Applications of AI in pharmacoepidemiology can be broadly classified into those that extract and structure data and those that produce insights.


15.5.2. Data extraction


AI techniques can be used to extract text data from unstructured documents, transforming it into structured, research-ready information to which statistical techniques can be applied. A potential application being explored is the extraction of data from medical notes. This usually includes named-entity recognition, i.e., discovering mentions of entities of a specific class or group such as medications or diseases, and relation extraction, which links sets of entities, e.g., a medicine and an indication.
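
The two steps described above can be illustrated with a deliberately simple symbolic (rule-based) sketch. It is not the method of any system cited in this chapter: the lexicon, the function names `extract_entities` and `extract_relations`, the relation label `possible_indication` and the example note are all hypothetical, and pairing entities that share a sentence is only a naive heuristic.

```python
import re

# Hypothetical mini-lexicon; a real system would use a curated vocabulary.
LEXICON = {
    "medication": ["metformin", "warfarin", "amoxicillin"],
    "disease": ["type 2 diabetes", "atrial fibrillation", "pneumonia"],
}


def extract_entities(text):
    """Named-entity recognition: return (start, end, label, term) per lexicon match."""
    entities = []
    for label, terms in LEXICON.items():
        for term in terms:
            pattern = r"\b" + re.escape(term) + r"\b"
            for match in re.finditer(pattern, text, flags=re.IGNORECASE):
                entities.append((match.start(), match.end(), label, match.group(0).lower()))
    return sorted(entities)


def extract_relations(text):
    """Relation extraction: pair each medication with each disease in the same sentence."""
    relations = []
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        entities = extract_entities(sentence)
        medications = [e[3] for e in entities if e[2] == "medication"]
        diseases = [e[3] for e in entities if e[2] == "disease"]
        for medication in medications:
            for disease in diseases:
                relations.append((medication, "possible_indication", disease))
    return relations


# Invented example note, for illustration only.
note = (
    "Patient started on Metformin for type 2 diabetes. "
    "Warfarin continued for atrial fibrillation."
)
entities = extract_entities(note)
relations = extract_relations(note)
```

Statistical NLP approaches replace the fixed lexicon and co-occurrence rule with models learned from annotated text, which is what the challenges and systems discussed in this section evaluate.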


The 2010 i2b2/VA challenge on concepts, assertions, and relations in clinical text (J Am Med Inform Assoc. 2011;18(5):552-6) presents three tasks: a concept extraction task focused on extracting medical concepts from patient reports; a classification task focused on assigning assertion types for medical problem concepts; and a relation classification task focused on assigning relation types that hold between medical problems, tests, and treatments. Multiple algorithms were compared, showing promising results for concept extraction. In NEAR: Named entity and attribute recognition of clinical concepts (J Biomed Inform. 2022;130:104092), three deep learning models were created for the same data used in the 2010 i2b2 challenge and showed an improvement in performance.


Some of the first applications of machine learning and NLP to extract information from clinical notes focused on the identification of adverse drug events in medical notes, as illustrated in papers such as A method for systematic discovery of adverse drug events from clinical notes (J Am Med Inform Assoc. 2015;22(6):1196-204), Detecting Adverse Drug Events with Rapidly Trained Classification Models (Drug Saf. 2019;42(1):147-56) and MADEx: A System for Detecting Medications, Adverse Drug Events, and Their Relations from Clinical Notes (Drug Saf. 2019;42(1):123-33).


Another common application of medical concept extraction from clinical text is the identification of a relevant set of patients, often referred to as computable phenotyping, as exemplified in Mayo clinical Text Analysis and Knowledge Extraction System (cTAKES): architecture, component evaluation and applications (J Am Med Inform Assoc. 2010;17(5):507-13). Combining deep learning with token selection for patient phenotyping from electronic health records (Sci Rep. 2020;10(1):1432) describes the development of deep learning models to construct a computable phenotype directly from the medical notes.


A large body of research has focused on extracting information from clinical notes in electronic health records. The approach can also be applied with some adjustment to other sets of unstructured data, including spontaneous reporting systems, as reflected in Identifying risks areas related to medication administrations - text mining analysis using free-text descriptions of incident reports (BMC Health Serv Res. 2019;19(1):791), product information documentation such as presented in Machine learning-based identification and rule-based normalization of adverse drug reactions in drug labels (BMC Bioinformatics. 2019;20(Suppl. 21):707) or even literature screening for systematic reviews as explored in Technology-assisted title and abstract screening for systematic reviews: a retrospective evaluation of the Abstrackr machine learning tool (Syst Rev. 2018 Mar 12;7(1):45).


In the systematic review Use of unstructured text in prognostic clinical prediction models: a systematic review (J Am Med Inform Assoc. 2022 Apr 27;ocac058), data extraction from unstructured text was shown to be beneficial in most studies. However, data extraction from unstructured text is not perfectly accurate, and model performance may vary widely for the same extraction task, as shown in ADE Eval: An Evaluation of Text Processing Systems for Adverse Event Extraction from Drug Labels for Pharmacovigilance (Drug Saf. 2021;44(1):83-94). Thus, the application of these techniques should consider whether the objective calls for precision or recall. For instance, a model that identifies medical concepts in a patient's spontaneous report of an adverse drug reaction and maps them to a medical vocabulary might preferably aim for high recall, as false positives can be picked up in the manual review of the potential signal, whereas models with high precision and low recall may introduce irretrievable loss of information. In other words, machine learning models that extract data are likely to introduce some error, so the error tolerance of the specific application needs to be considered.
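
The trade-off between precision and recall can be made concrete with a small sketch. The reference annotations and the two sets of model outputs below are invented for illustration and do not come from any cited study; `precision_recall` is a hypothetical helper name.

```python
def precision_recall(predicted, gold):
    """Compute precision and recall of extracted terms against a reference set."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Invented reference annotations for one spontaneous report.
gold = {"nausea", "rash", "headache", "dizziness"}

# Hypothetical model A: finds every true term but adds two false positives.
high_recall_model = {"nausea", "rash", "headache", "dizziness", "fatigue", "cough"}

# Hypothetical model B: every extracted term is correct, but half are missed.
high_precision_model = {"nausea", "rash"}

recall_oriented = precision_recall(high_recall_model, gold)
precision_oriented = precision_recall(high_precision_model, gold)
```

In this toy case, the two false positives from model A can still be discarded during manual review, whereas the two terms missed by model B never reach the reviewer.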


15.5.3. Data insights


In pharmacoepidemiology, data insights extracted with machine learning models are typically one of three categories: confounding control, clinical prediction models and probabilistic phenotyping.


Propensity score methods are a predominant technique for confounding control. In practice, the propensity score is most often estimated using a logistic regression model, in which treatment status is regressed on observed baseline characteristics. In Evaluating large-scale propensity score performance through real-world and synthetic data experiments (Int J Epidemiol. 2018;47(6):2005-14) and A comparison of machine learning algorithms and covariate balance measures for propensity score matching and weighting (Biom J. 2019;61(4):1049-72), machine learning models were explored as alternatives to traditional logistic regression with a view to improving propensity score estimation. The theoretical advantages of using machine learning models include a simpler model parametrisation, by dispensing with the need for investigator-defined covariate selection, and better modelling of non-linear effects and interactions. However, most studies in this field use synthetic or plasmode data, and applications in real-world data need to be further explored.
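
As a point of reference for the conventional approach, the sketch below estimates a propensity score by regressing treatment status on two baseline covariates with a logistic model. It is a minimal illustration under stated assumptions: the cohort is simulated, the covariates and coefficients are invented, the model is fitted with a hand-written gradient descent to keep the example self-contained, and the function names are hypothetical. An established statistical package would normally be used instead.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def propensity(weights, covariates):
    """Predicted probability of treatment given baseline covariates."""
    linear = weights[0] + sum(w * x for w, x in zip(weights[1:], covariates))
    return sigmoid(linear)


def fit_logistic(X, treated, learning_rate=0.5, epochs=500):
    """Fit P(treated | x) by batch gradient descent; returns [intercept, coefficients...]."""
    n = len(X)
    weights = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        gradient = [0.0] * len(weights)
        for covariates, t in zip(X, treated):
            error = propensity(weights, covariates) - t
            gradient[0] += error
            for j, x in enumerate(covariates):
                gradient[j + 1] += error * x
        weights = [w - learning_rate * g / n for w, g in zip(weights, gradient)]
    return weights


# Simulated cohort (illustrative only): older patients and patients with a
# comorbidity are more likely to receive the treatment.
random.seed(0)
X, treated = [], []
for _ in range(500):
    age = random.gauss(0, 1)                  # standardised age
    comorbidity = float(random.random() < 0.3)
    p_treat = sigmoid(-0.5 + 0.8 * age + 1.0 * comorbidity)
    X.append([age, comorbidity])
    treated.append(1 if random.random() < p_treat else 0)

weights = fit_logistic(X, treated)
scores = [propensity(weights, covariates) for covariates in X]

# Inverse probability of treatment weights derived from the scores.
iptw = [t / p + (1 - t) / (1 - p) for t, p in zip(treated, scores)]
```

A machine learning alternative would replace `fit_logistic` with a more flexible learner while keeping the downstream matching or weighting step unchanged; covariate balance after weighting would still need to be checked.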


The concepts of rule-based, knowledge-based algorithms and risk-based stratification are not new to medicine and healthcare; the Framingham risk score is one of the best-known examples. Trends in the conduct and reporting of clinical prediction model development and validation: a systematic review (J Am Med Inform Assoc. 2022;29(5):983-9) shows that there is a growing trend to develop data-driven clinical prediction models. However, problem definition is often not clearly reported, and the final model is often not completely presented. This trend intensified during the COVID-19 pandemic, when over two hundred papers on clinical prediction models were published, as mentioned in Prediction models for diagnosis and prognosis of covid-19: systematic review and critical appraisal (BMJ. 2020;369:m1328). The authors also suggest that prediction models are poorly reported and at high risk of bias, such that their reported predictive performance is probably optimistic. Clinical prediction models have also been applied to safety signal detection with some degree of success, as exemplified in A supervised adverse drug reaction signalling framework imitating Bradford Hill's causality considerations (J Biomed Inform. 2015;56:356-68).


Probabilistic phenotyping is another potential use of machine learning in pharmacoepidemiology. It refers to the development of a case definition by training a model on a set of labelled examples so that it outputs the probability of a phenotype as a continuous trait. It differs from the machine learning-based computable phenotyping mentioned earlier: probabilistic phenotyping takes a set of features and estimates the probability of a phenotype, whereas in computable phenotyping the machine learning technique merely extracts information that identifies a relevant case.
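
The idea of training on labelled examples and returning a probability rather than a yes/no case flag can be sketched as follows. A Bernoulli naive Bayes classifier is used here only because it is compact; it is not the model used in the studies cited in this section, and the features, labelled examples and function names are hypothetical.

```python
import math


def train_bernoulli_nb(rows, labels):
    """Estimate class priors and per-feature probabilities with Laplace smoothing."""
    model = {}
    n_features = len(rows[0])
    for phenotype in (0, 1):
        members = [row for row, label in zip(rows, labels) if label == phenotype]
        prior = (len(members) + 1) / (len(rows) + 2)
        feature_probs = [
            (sum(row[j] for row in members) + 1) / (len(members) + 2)
            for j in range(n_features)
        ]
        model[phenotype] = (prior, feature_probs)
    return model


def phenotype_probability(model, row):
    """Posterior probability that a patient with these features has the phenotype."""
    log_posterior = {}
    for phenotype, (prior, feature_probs) in model.items():
        log_posterior[phenotype] = math.log(prior) + sum(
            math.log(p if present else 1 - p)
            for p, present in zip(feature_probs, row)
        )
    largest = max(log_posterior.values())
    unnormalised = {k: math.exp(v - largest) for k, v in log_posterior.items()}
    return unnormalised[1] / (unnormalised[0] + unnormalised[1])


# Hypothetical binary features per patient:
# [diagnosis code present, relevant medicine dispensed, abnormal laboratory result]
rows = [
    [1, 1, 1], [1, 1, 0], [1, 0, 1], [0, 1, 1],   # labelled as cases
    [0, 0, 0], [0, 0, 1], [1, 0, 0], [0, 0, 0],   # labelled as non-cases
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

model = train_bernoulli_nb(rows, labels)
p_all_features = phenotype_probability(model, [1, 1, 1])
p_no_features = phenotype_probability(model, [0, 0, 0])
p_mixed = phenotype_probability(model, [1, 0, 0])
```

Because the output is continuous, the threshold used to define a case can be chosen per study, or the probability can be carried forward into the analysis directly.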


Identifying who has long COVID in the USA: a machine learning approach using N3C data (Lancet Digit Health. 2022;S2589-7500(22)00048-6) describes the development of a probabilistic phenotype of patients with long COVID using machine learning models and reports high accuracy. Probabilistic phenotyping can be applied in wider contexts. In An Application of Machine Learning in Pharmacovigilance: Estimating Likely Patient Genotype From Phenotypical Manifestations of Fluoropyrimidine Toxicity (Clin Pharmacol Ther. 2020; 107(4): 944–7), a machine learning model using clinical manifestations of adverse reactions is used to estimate the probability of having a specific genotype known to be correlated with severe but varied outcomes.


As the development of probabilistic phenotypes is likely to increase, tools to assess their performance characteristics, such as PheValuator: Development and evaluation of a phenotype algorithm evaluator (J Biomed Inform. 2019;97:103258), become more relevant.


