An epidemiological study measures a parameter of occurrence of a health phenomenon (e.g., a disease), generally incidence or prevalence, or a measure of association such as a risk or rate ratio, in a specified population and with a specified time reference (time point or time period). Epidemiological studies may be descriptive or analytic. Descriptive studies do not aim to evaluate a causal relationship between a population characteristic and the occurrence parameter and generally do not include formal comparisons between population groups. Analytic studies (also called causal inference studies), in contrast, use study populations assembled by the investigators to assess relationships that may be interpreted in causal terms. In pharmacoepidemiology, analytic studies generally aim to quantify the association between exposure to a medicine and a health phenomenon, and to test the hypothesis of a causal relationship. They are comparative by nature, e.g., comparing the occurrence of an outcome between users of the medicine and either non-users or users of a different medicinal product.
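To make these quantities concrete, the Python sketch below computes risk (cumulative incidence), incidence rate and prevalence, and the risk and rate ratios that a comparative study would report. All counts are hypothetical, and the helper names (`GroupCounts`, `risk`, `incidence_rate`, `prevalence`) are illustrative only, not part of any library.

```python
from dataclasses import dataclass


@dataclass
class GroupCounts:
    """Hypothetical follow-up data for one comparison group."""
    n_subjects: int      # subjects at risk at the start of follow-up
    n_new_cases: int     # incident cases during follow-up
    person_years: float  # total person-time at risk


def risk(g: GroupCounts) -> float:
    # Cumulative incidence (risk): new cases / subjects at risk
    return g.n_new_cases / g.n_subjects


def incidence_rate(g: GroupCounts) -> float:
    # Incidence rate: new cases / person-time at risk
    return g.n_new_cases / g.person_years


def prevalence(n_existing_cases: int, n_population: int) -> float:
    # Prevalence: existing cases / population at a given time point
    return n_existing_cases / n_population


users = GroupCounts(n_subjects=1000, n_new_cases=30, person_years=2500.0)
non_users = GroupCounts(n_subjects=4000, n_new_cases=60, person_years=10000.0)

risk_ratio = risk(users) / risk(non_users)                      # 0.03 / 0.015
rate_ratio = incidence_rate(users) / incidence_rate(non_users)  # 0.012 / 0.006
print(f"risk ratio = {risk_ratio:.2f}, rate ratio = {rate_ratio:.2f}")
print(f"prevalence = {prevalence(150, 5000):.3f}")
```

In this made-up example the two ratios coincide only because the average follow-up per subject (2.5 years) is the same in both groups; in general the risk ratio and the rate ratio differ.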
Studies can be interventional or non-interventional (observational). In interventional studies, the investigator assigns subjects to be either exposed or unexposed. Most often, exposure is assigned randomly; such studies are known as randomised clinical trials (RCTs) and are typically conducted to test the efficacy of treatments such as new medications. In RCTs, randomisation is used with the intention that the only systematic difference between the exposed and unexposed groups will be the treatment itself. Thus, apart from chance, any difference in the outcome can be attributed to the effect of the treatment. In contrast to interventional studies, in observational studies the investigator plays no role in determining which subjects are exposed and which are unexposed: the exposures are either chosen by, or are characteristics of, the subjects themselves. Observational Studies: Cohort and Case-Control Studies (Plast Reconstr Surg. 2010;126(6):2234-42) provides a simple and clear explanation of the different types of observational studies and of their advantages and disadvantages (see also Chapter 4.2. Study designs).
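The intention behind randomisation can be illustrated with a small simulation. The Python sketch below assumes a single binary prognostic factor (for example, older age) with a hypothetical prevalence of 30% and allocates each subject to treatment or control by a coin flip that ignores the factor; the function name and parameter values are illustrative only. With many subjects the factor ends up distributed similarly in both arms, although any single trial, especially a small one, can still show chance imbalances.

```python
import random


def randomise_and_compare(n_subjects: int = 10_000, p_factor: float = 0.3,
                          seed: int = 2024) -> dict:
    """Randomly allocate subjects 1:1 and return the prevalence of a
    hypothetical prognostic factor in each arm."""
    rng = random.Random(seed)  # seeded so the simulation is reproducible
    with_factor = {"treated": 0, "control": 0}
    totals = {"treated": 0, "control": 0}
    for _ in range(n_subjects):
        has_factor = rng.random() < p_factor                   # baseline characteristic
        arm = "treated" if rng.random() < 0.5 else "control"  # coin-flip allocation
        with_factor[arm] += int(has_factor)
        totals[arm] += 1
    return {arm: with_factor[arm] / totals[arm] for arm in totals}


prevalence_by_arm = randomise_and_compare()
print(prevalence_by_arm)  # both values should be close to 0.30
```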
In order to obtain valid estimates of the effect of a determinant on a parameter of disease occurrence, analytic studies must address three factors: random error (chance), systematic error (bias) and confounding. Here, error is defined as the difference between the measured value and the true value of a particular observation.
Random error (chance): the observed effect estimate is a numerical value that may partly reflect random error arising from the underlying variation in the population. The confidence interval (CI) conveys the precision of the estimate by giving a range of effect values that are reasonably compatible with the data, under the assumptions of the statistical model used.
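As an illustration of how a CI is obtained in a simple case, the Python sketch below computes a risk ratio and its approximate 95% CI from a cohort 2×2 table, using the common large-sample (Wald-type) interval on the log scale with standard error sqrt(1/a - 1/n1 + 1/c - 1/n0). The counts are hypothetical, the function name is illustrative, and the method assumes reasonably large cell counts; other interval methods exist.

```python
import math
from statistics import NormalDist


def risk_ratio_ci(a: int, n1: int, c: int, n0: int, level: float = 0.95):
    """Risk ratio with a Wald-type confidence interval on the log scale.

    a, n1: cases and total subjects among the exposed
    c, n0: cases and total subjects among the unexposed
    """
    rr = (a / n1) / (c / n0)
    se_log_rr = math.sqrt(1 / a - 1 / n1 + 1 / c - 1 / n0)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% CI
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


rr, lower, upper = risk_ratio_ci(a=30, n1=1000, c=60, n0=4000)
print(f"RR = {rr:.2f} (95% CI {lower:.2f} to {upper:.2f})")
# prints: RR = 2.00 (95% CI 1.30 to 3.08)
```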
Systematic error (bias): the observed effect estimate may be due to systematic error in the selection of the study population or in the measurement of the exposure or disease. Two main types of bias need to be considered: selection bias and information bias. Selection bias results from the procedures used to select subjects and from factors that influence study participation. For example, a case-control study may include non-cases with a higher prevalence of one category of the exposure of interest than in the source population of the cases. External factors such as media attention to safety issues may also influence healthcare-seeking behaviour and thus the measured incidence of a given outcome. Information bias can occur whenever there are errors in the measurement of subject characteristics, for example a lack of pathology results leading to misclassification of the outcome for certain types of tumours, or a lack of validation of exposure leading to misclassification of the exposure status of some study participants. As another example, mothers of children with congenital malformations may recall more instances of medicine use during pregnancy than mothers of healthy children. This is known in epidemiology as “recall bias”, a type of information bias. The consequences of these errors generally depend on whether the distribution of errors for the exposure or disease depends on the value of other variables (differential misclassification) or not (nondifferential misclassification).
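The direction of bias from exposure misclassification can be explored by computing the expected observed counts for a given sensitivity and specificity of exposure classification. The Python sketch below uses a hypothetical cohort with a true risk ratio of 2.0 and illustrative sensitivity and specificity values; the function name is hypothetical and random error is ignored. With nondifferential misclassification of a binary exposure (assuming sensitivity + specificity > 1), the estimate moves towards the null. When exposure is under-reported only among non-cases, a pattern resembling recall bias, the differential misclassification in this example pushes the estimate away from the null.

```python
def observed_risk_ratio(a, n1, c, n0, se_cases, sp_cases, se_noncases, sp_noncases):
    """Expected risk ratio after misclassification of a binary exposure.

    a, c       : true exposed and unexposed cases
    n1, n0     : true exposed and unexposed subjects
    se_*, sp_* : sensitivity and specificity of exposure classification,
                 which may differ between cases and non-cases
    """
    b, d = n1 - a, n0 - c  # true exposed and unexposed non-cases
    # Expected counts classified as exposed (a_obs, b_obs) or unexposed (c_obs, d_obs)
    a_obs = se_cases * a + (1 - sp_cases) * c
    c_obs = (1 - se_cases) * a + sp_cases * c
    b_obs = se_noncases * b + (1 - sp_noncases) * d
    d_obs = (1 - se_noncases) * b + sp_noncases * d
    return (a_obs / (a_obs + b_obs)) / (c_obs / (c_obs + d_obs))


true_counts = dict(a=30, n1=1000, c=60, n0=4000)  # true risk ratio = 2.0

# Nondifferential: same sensitivity and specificity in cases and non-cases
nondiff = observed_risk_ratio(**true_counts, se_cases=0.8, sp_cases=0.9,
                              se_noncases=0.8, sp_noncases=0.9)
# Differential: exposure under-reported only among non-cases
diff = observed_risk_ratio(**true_counts, se_cases=1.0, sp_cases=1.0,
                           se_noncases=0.7, sp_noncases=1.0)
print(f"nondifferential: {nondiff:.2f}, differential: {diff:.2f}")  # 1.58, 3.03
```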
Confounding: Confounding results from the presence of an additional factor, known as a confounder or confounding factor, which is associated with both the exposure of interest and the outcome. As a result, the exposed and unexposed groups will likely differ not only with regard to the exposure of interest, but also with regard to a number of other characteristics, some of which are themselves related to the likelihood of developing the outcome. Confounding distorts the observed estimate of the association between the exposure and the outcome under study. As there is not always a firm distinction between bias and confounding, confounding is also often classified as a type of bias.
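A hypothetical numerical example shows how a confounder can distort a crude estimate and how stratification, summarised here with a Mantel-Haenszel risk ratio, can remove the distortion. In the made-up data below, older age is associated with both exposure and outcome, while within each age stratum the exposed and unexposed have the same risk. The class and function names are illustrative only.

```python
from typing import List, NamedTuple


class Stratum(NamedTuple):
    a: int   # exposed cases
    n1: int  # exposed subjects
    c: int   # unexposed cases
    n0: int  # unexposed subjects


def crude_rr(strata: List[Stratum]) -> float:
    # Collapse the strata, i.e. ignore the confounder
    a = sum(s.a for s in strata)
    n1 = sum(s.n1 for s in strata)
    c = sum(s.c for s in strata)
    n0 = sum(s.n0 for s in strata)
    return (a / n1) / (c / n0)


def mantel_haenszel_rr(strata: List[Stratum]) -> float:
    # Weighted summary risk ratio across strata of the confounder
    num = sum(s.a * s.n0 / (s.n1 + s.n0) for s in strata)
    den = sum(s.c * s.n1 / (s.n1 + s.n0) for s in strata)
    return num / den


# Hypothetical data: older subjects are more often exposed and at higher risk
younger = Stratum(a=2, n1=200, c=8, n0=800)   # risk 1% in both groups
older = Stratum(a=80, n1=800, c=20, n0=200)   # risk 10% in both groups

print(f"crude RR = {crude_rr([younger, older]):.2f}")                      # 2.93
print(f"Mantel-Haenszel RR = {mantel_haenszel_rr([younger, older]):.2f}")  # 1.00
```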
There are many different situations where bias may occur, and some authors give a name to each of them; in theory, the number of such situations is unlimited. Rather than trying to name each of them, ENCePP recommends understanding the underlying mechanisms of information bias, selection bias and confounding, being alert to their presence and likelihood of occurrence in a study, and knowing the methods available to prevent and detect them and, where possible, to control for them at the design or analysis stage, such as restriction, stratification, matching, regression and sensitivity analyses. Chapter 6 on methods to address bias nevertheless treats time-related bias (a type of information bias involving misclassification of person-time) separately, as it may have important consequences for the results of a study and can be dealt with through design and time-dependent analyses.
The interpretation of the role of chance (random error) in epidemiological evidence has often relied on whether the p-value is below a given threshold and/or the confidence interval excludes some reference value. The ASA statement on P values: context, process, and purpose (Am Statistician 2016;70(2):129-33) of the American Statistical Association emphasised that a p-value, or statistical significance, does not provide a good measure of evidence regarding a model or hypothesis, nor does it measure the size of an effect or the importance of a result. It is therefore recommended to avoid relying only on statistical significance, such as p-values, to interpret study results (see, for example, Statistical tests, P values, confidence intervals, and power: a guide to misinterpretations, Eur J Epidemiol. 2016;31(4):337-50; Scientists rise up against statistical significance, Nature 2019;567(7748):305-7; It’s time to talk about ditching statistical significance, Nature 2019;567(7748):283; Chapter 15. Precision and Study size in Modern epidemiology, Lash TL, VanderWeele TJ, Haneuse S, Rothman KJ, 4th edition, Philadelphia, PA, Wolters Kluwer, 2021). This series of articles led to substantial changes in the guidelines for reporting study results in manuscripts submitted to medical journals, as discussed in Preparing a manuscript for submission to a medical journal (International Committee of Medical Journal Editors, 2021). Causal analyses of existing databases: no power calculations required (J Clin Epidemiol. 2022;144:203-5) encourages researchers to use large healthcare databases to estimate measures of association, rather than systematically attempting to test hypotheses with sufficient power.
The ENCePP also recommends that, instead of a dichotomous interpretation based on whether a p-value is below a certain threshold, or a confidence interval excludes some reference value, researchers should rely on a more comprehensive quantitative interpretation that considers the magnitude, precision, and possible bias in the estimates, in addition to a qualitative assessment of the relevance of the selected study design. This is considered a more appropriate approach than one that ascribes to chance any result that does not meet conventional criteria for statistical significance.
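To illustrate why magnitude and precision carry more information than a significance label, the Python sketch below summarises two hypothetical studies with the same Wald-type method: a very large study with a small risk ratio and a small study with a larger but imprecise one. It reports the estimate, the 95% CI, the ratio of the upper to the lower confidence limit (a simple measure of precision) and the p-value as just one additional piece of information. All counts and names are illustrative.

```python
import math
from statistics import NormalDist


def summarise_rr(a: int, n1: int, c: int, n0: int) -> dict:
    """Risk ratio, 95% CI, confidence limit ratio (CLR) and two-sided
    p-value, all from a Wald-type approximation on the log scale."""
    log_rr = math.log((a / n1) / (c / n0))
    se = math.sqrt(1 / a - 1 / n1 + 1 / c - 1 / n0)
    z975 = NormalDist().inv_cdf(0.975)
    lower = math.exp(log_rr - z975 * se)
    upper = math.exp(log_rr + z975 * se)
    p = 2 * (1 - NormalDist().cdf(abs(log_rr) / se))
    return {"rr": math.exp(log_rr), "ci": (lower, upper),
            "clr": upper / lower, "p": p}


large_study = summarise_rr(a=2200, n1=100_000, c=2000, n0=100_000)
small_study = summarise_rr(a=10, n1=100, c=5, n0=100)

for name, s in [("large", large_study), ("small", small_study)]:
    lo, hi = s["ci"]
    print(f"{name}: RR {s['rr']:.2f} (95% CI {lo:.2f} to {hi:.2f}), "
          f"CLR {s['clr']:.1f}, p = {s['p']:.3f}")
```

A purely dichotomous reading would call the large study "positive" and the small one "negative". The fuller picture is that the large study precisely estimates an effect close to the null, whereas the small study is compatible with a wide range of effects, from a reduction to a more than fivefold increase in risk.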
The large number of observational studies performed urgently with existing data, sometimes in difficult conditions, in the early phase of the COVID-19 pandemic has raised concerns about the validity of many studies published without peer review. We therefore recommend balancing urgency against the use of appropriate methodology. Considerations for pharmacoepidemiological analyses in the SARS-CoV-2 pandemic (Pharmacoepidemiol Drug Saf. 2020;29(8):825-83) provides recommendations across eight domains: (1) timeliness of evidence generation; (2) the need to align observational and interventional research on efficacy; (3) the specific challenges related to “real-time epidemiology” during an ongoing pandemic; (4) which design to use to answer a specific question; (5) considerations on the definition of exposures and outcomes and on which covariates to collect; (6) the need for transparent reporting; (7) temporal and geographical aspects to be considered when ascertaining outcomes in COVID-19 patients; and (8) the need for rapid assessment. The article Biases in evaluating the safety and effectiveness of drugs for covid-19: designing real-world evidence studies (Am J Epidemiol. 2021;190(8):1452-6) reviews and illustrates how immortal time bias and selection bias were present in several studies evaluating the effects of drugs on SARS-CoV-2 infection, and how they can be addressed. Although these two examples specifically refer to COVID-19 studies, such considerations are applicable to research questions with other types of exposures and outcomes.
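Immortal time bias can be reproduced with a tiny hypothetical cohort constructed so that mortality rates in treated and untreated person-time are equal, i.e. treatment has no effect. In the Python sketch below, treated patients start treatment on day 10, so they must survive until then to be classified as treated. A naive analysis that counts all follow-up of ever-treated patients as exposed attributes these "immortal" pre-treatment days to the treated group and produces a spurious protective effect; classifying person-time by treatment status at each point in time (a time-dependent exposure definition) removes the bias in this example. The data, class and function names are illustrative only.

```python
from typing import List, NamedTuple, Optional


class Patient(NamedTuple):
    end_day: float                    # day of death or end of follow-up
    died: bool                        # True if follow-up ended with death
    treatment_start: Optional[float]  # day treatment started, None if never treated


def rate_ratio(deaths: dict, person_time: dict) -> float:
    # Mortality rate in exposed person-time / rate in unexposed person-time
    return (deaths["exposed"] / person_time["exposed"]) / (
        deaths["unexposed"] / person_time["unexposed"])


def naive_rate_ratio(patients: List[Patient]) -> float:
    # Ever-treated patients are classified as exposed from day 0,
    # so their pre-treatment (immortal) days are counted as exposed time.
    deaths = {"exposed": 0, "unexposed": 0}
    person_time = {"exposed": 0.0, "unexposed": 0.0}
    for p in patients:
        group = "exposed" if p.treatment_start is not None else "unexposed"
        person_time[group] += p.end_day
        deaths[group] += int(p.died)
    return rate_ratio(deaths, person_time)


def time_dependent_rate_ratio(patients: List[Patient]) -> float:
    # Person-time before treatment start is counted as unexposed.
    deaths = {"exposed": 0, "unexposed": 0}
    person_time = {"exposed": 0.0, "unexposed": 0.0}
    for p in patients:
        if p.treatment_start is None or p.treatment_start >= p.end_day:
            person_time["unexposed"] += p.end_day
            deaths["unexposed"] += int(p.died)
        else:
            person_time["unexposed"] += p.treatment_start
            person_time["exposed"] += p.end_day - p.treatment_start
            deaths["exposed"] += int(p.died)  # death occurred after treatment start
    return rate_ratio(deaths, person_time)


cohort = [
    Patient(5, True, None), Patient(5, True, None), Patient(30, False, None),
    Patient(30, True, 10), Patient(30, True, 10),
    Patient(30, False, 10), Patient(30, False, 10),
]
print(f"naive rate ratio = {naive_rate_ratio(cohort):.2f}")                   # 0.33
print(f"time-dependent rate ratio = {time_dependent_rate_ratio(cohort):.2f}")  # 1.00
```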
COVID-19 pandemic-related disruptions in healthcare are likely to have affected the design of current as well as future non-interventional, real-world studies. Changes in access to healthcare and in healthcare-seeking behaviour during the pandemic will create new challenges, and exacerbate those inherent to observational studies, when real-world data from this period are used. The article Noninterventional studies in the COVID-19 era: methodological considerations for study design and analysis (J Clin Epidemiol. 2023;153:91-101) presents a general framework to support the design of non-interventional studies using real-world data from the COVID-19 era.
Finally, graphical frameworks for presenting study designs are increasingly recommended, to foster transparency, enhance understanding of the design, and support the evaluation of study protocols and the interpretation of study results, as illustrated in A Framework for Visualizing Study Designs and Data Observability in Electronic Health Record Data (Clin Epidemiol. 2022;14:601-8) and Visualizations throughout pharmacoepidemiology study planning, implementation, and reporting (Pharmacoepidemiol Drug Saf. 2022;31(11):1140-52).