There is a considerable body of literature explaining statistical methods for observational studies, but very little addressing the statistical analysis plan. A clear guide to the general principles of, and the need for, a plan is given in Design of Observational Studies (P.R. Rosenbaum, Springer Series in Statistics, 2010, Chapter 18), which also gives useful advice on how to test complex hypotheses in a way that minimises the chance of drawing incorrect conclusions.
Planning analyses for randomised clinical trials is covered in a number of publications. These often give checklists of the component parts of an analysis plan, and much of this applies equally to non-randomised designs. A good reference in this respect is the guideline of the International Conference on Harmonisation of Technical Requirements for Registration of Pharmaceuticals for Human Use (ICH), ICH E9 ‘Statistical Principles for Clinical Trials’, together with its addendum on estimands and sensitivity analysis in clinical trials (ICH E9(R1)).
While specific guidance on the statistical analysis plan for epidemiological studies is sparse, the following principles apply to most studies.
A study is generally designed to address a set of research questions. However, the initial product of a study is a set of numerical and categorical observations that do not usually answer those questions directly. The statistical analysis plan details the mathematical transformations that will be performed on the observed data and the patterns of results that will be interpreted as supporting answers to the questions. An important part of the statistical analysis plan explains how problems in the data, such as missing or partial data, will be handled in these calculations.
The statistical analysis plan should be sufficiently detailed that any competent analyst can follow it and reproduce the results. It should therefore provide clear and complete templates for each analysis.
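One way to make each analysis template complete and checkable is to record it as a structured object rather than as free text. The sketch below is a minimal illustration in Python, assuming a hypothetical AnalysisTemplate record whose fields (research question, exposure, outcome, population, model, covariates, missing-data rule and interpretation rule) mirror the elements described above. The field names and example values are invented for illustration; this is not a standard SAP format.

```python
from dataclasses import dataclass, asdict, replace, FrozenInstanceError
from typing import List, Tuple


@dataclass(frozen=True)
class AnalysisTemplate:
    """Hypothetical record of one pre-specified analysis (not a standard format)."""
    analysis_id: str
    question: str              # research question the analysis addresses
    exposure: str
    outcome: str
    population: str
    model: str                 # statistical model to be fitted
    covariates: Tuple[str, ...]
    missing_data_rule: str     # how missing or partial data will be handled
    interpretation_rule: str   # pattern of results read as supporting an answer

    def missing_fields(self) -> List[str]:
        """Names of fields left empty, i.e. where the template is incomplete."""
        return [name for name, value in asdict(self).items() if not value]


# Illustrative values only; every name below is hypothetical.
primary = AnalysisTemplate(
    analysis_id="A1",
    question="Is new use of drug A associated with hip fracture?",
    exposure="new_user_drug_a",
    outcome="hip_fracture",
    population="adults aged 50+ with at least 1 year of prior data",
    model="Cox proportional hazards",
    covariates=("age", "sex", "prior_fracture"),
    missing_data_rule="multiple imputation by chained equations, 20 imputations",
    interpretation_rule="hazard ratio 95% CI excluding 1",
)

print(primary.missing_fields())   # []
draft = replace(primary, missing_data_rule="")
print(draft.missing_fields())     # ['missing_data_rule']
```

Freezing the record means a template cannot be silently altered once written: a revised analysis has to be created as a new object (here with `replace`), which makes changes to the plan explicit rather than hidden.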
Pre-specification of statistical and epidemiological analyses can be challenging for data that were not collected specifically to answer the study questions. This is often the case in observational studies that use secondary data. However, careful specification of how missing values will be handled, or the use of a small part of the data as a pilot set to guide the analysis, can help to overcome such problems. A feature common to most studies is that some analyses that were not pre-specified will be performed in response to observations in the data, to help interpret the results. It is important to distinguish these data-driven analyses from the pre-specified ones. Post-hoc modifications to the analysis strategy should be noted and explained. The statistical analysis plan provides the record against which this distinction can be confirmed.
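For a pilot set to be reproducible by another analyst, the rule that selects it should itself be pre-specified. A minimal sketch, assuming records have a stable identifier and that the plan states a fixed salt string and pilot fraction (both hypothetical parameters here), is to assign each record by hashing its identifier, so the same records always land in the pilot set regardless of file order or software version:

```python
import hashlib
from typing import Iterable, List, Tuple


def in_pilot_set(record_id: str, salt: str, fraction: float) -> bool:
    """Deterministically decide whether a record belongs to the pilot set.

    The same (salt, record_id) pair always gives the same answer, so the
    split can be reproduced by anyone who knows the salt stated in the plan.
    """
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must be strictly between 0 and 1")
    digest = hashlib.sha256(f"{salt}:{record_id}".encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a number in [0, 1).
    u = int.from_bytes(digest[:8], "big") / 2**64
    return u < fraction


def split_pilot(record_ids: Iterable[str], salt: str,
                fraction: float = 0.1) -> Tuple[List[str], List[str]]:
    """Return (pilot, main) lists of record identifiers."""
    pilot, main = [], []
    for rid in record_ids:
        (pilot if in_pilot_set(rid, salt, fraction) else main).append(rid)
    return pilot, main


# Hypothetical identifiers and salt, for illustration only.
ids = [f"P{i:05d}" for i in range(10_000)]
pilot, main = split_pilot(ids, salt="SAP-v1.0", fraction=0.1)
print(len(pilot), len(main))  # roughly 1,000 and 9,000
```

Hashing is used instead of a seeded random shuffle because the assignment of each record depends only on its own identifier, so adding or reordering records does not change which existing records are in the pilot set.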
In studies using observational data, strong emphasis should be placed on the measures taken to control and quantify bias. Part of the analysis plan will therefore be devoted to translating scientific understanding of the causal relationships among the exposures and outcomes that are the primary focus of the study, and the other variables available in the dataset, into a credible mathematical model. It is also advisable to include appropriate negative controls in the analysis, that is, (exposure, outcome) pairs that are strongly believed not to be causally related and for which a similar model is considered reasonable, because an association found for such a pair may indicate uncontrolled confounding.
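The logic of a negative control can be illustrated with a small simulation. The sketch below assumes a hypothetical binary confounder U that raises both the probability of a negative-control exposure and the risk of the outcome, while the exposure itself has no effect on the outcome. All probabilities are invented for illustration. The crude risk ratio for the negative-control pair departs from 1 because of U, which is the kind of signal that would point to uncontrolled confounding in a real analysis.

```python
import random
from typing import List, Tuple

Row = Tuple[bool, bool, bool]  # (confounder U, exposure E, outcome Y)


def simulate(n: int, seed: int = 2024) -> List[Row]:
    """Simulate data in which E has no causal effect on Y (hypothetical values)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        u = rng.random() < 0.30                    # confounder present in 30%
        e = rng.random() < (0.60 if u else 0.20)   # exposure depends on U only
        y = rng.random() < (0.15 if u else 0.05)   # outcome depends on U only
        rows.append((u, e, y))
    return rows


def risk_ratio(rows: List[Row]) -> float:
    """Risk of Y among exposed divided by risk of Y among unexposed."""
    exposed = [y for _, e, y in rows if e]
    unexposed = [y for _, e, y in rows if not e]
    return (sum(exposed) / len(exposed)) / (sum(unexposed) / len(unexposed))


rows = simulate(100_000)
crude = risk_ratio(rows)                          # expected to be about 1.57
rr_u1 = risk_ratio([r for r in rows if r[0]])     # expected to be near 1
rr_u0 = risk_ratio([r for r in rows if not r[0]]) # expected to be near 1
print(f"crude RR = {crude:.2f}, within U=1: {rr_u1:.2f}, within U=0: {rr_u0:.2f}")
```

In a real study U would not be observed, so only the crude estimate for the negative-control pair would be available; the stratified estimates can be computed here only because the data are simulated, and they confirm that the apparent association is produced entirely by U.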
A particular concern in retrospective studies is ensuring that decisions about the analysis are made blind to any knowledge of the results. This should be considered in the study design, particularly when feasibility studies are to be performed to inform the design phase. Feasibility studies should therefore be conducted independently of the main study results.