Compared with the protocol, which includes a section outlining the analyses, the SAP is a more technical, stand-alone document that describes in detail the planned analyses, population definitions and methodology.
Given the influence of statistical decisions on study conclusions, a well-documented and transparent statistical plan is essential. Developing a SAP forces researchers to think about which data to collect and in which format. This may in turn guide decisions on, for example, measurement instruments and the timing of (repeated) measurements.
Further guidance on general principles and justification for the need for a SAP are provided in Design of Observational Studies (P.R. Rosenbaum, Springer Series in Statistics, 2020).
The following objectives of a SAP apply to most studies, including observational studies:
Transparency as to how the analysis will proceed, by specifying in advance the methodology that will be applied. A SAP should always be completed prior to the start of data analysis. Revisions after the start of the analysis may be possible, provided the changes are noted and justified in a revised SAP.
Communication with the study team, especially the statisticians involved in the study. The SAP also promotes good planning and efficiency for other stakeholders, such as reviewers and the target audience of the study. Readers of observational research might dismiss important findings if the analyses behind them were not pre-specified.
Reproducibility so that in the future, for similar studies, the same analytical steps can be performed. The SAP should be sufficiently detailed so that it can be followed and reproduced by any statistician. Thus, it should provide clear and complete templates for each analysis.
Validity of study outcomes, as the SAP enables the researcher to separate the pre-planned analyses that address the research question from data-driven analyses performed to understand and interpret the data.
Pre-specification of statistical and epidemiological analyses can be challenging for data that are not collected specifically to answer the research question. This is often the case in observational studies, where secondary use of data is frequent (see Chapter 8.2). Nevertheless, The Value of Statistical Analysis Plans in Observational Research: Defining High-Quality Research From the Start (JAMA 2012;308(8):773-4) provides arguments for producing a SAP for observational research, which is more vulnerable to issues of reproducibility. A main component of an observational study is an initial raw dataset including a set of variables that do not usually provide a direct answer to the research question. The SAP details the statistical calculations that will be performed on these observed data and the patterns of results that will in turn be interpreted.
Specific to observational studies, strong emphasis needs to be given to measures applied to control and possibly quantify bias. Avoiding bias in observational studies: part 8 in a series of articles on evaluation of scientific publications (Dtsch Arztebl Int. 2009;106(41):664-8) explains how these methodological issues can be avoided by careful planning. Factors that may bias the results of observational studies are described in Chapter 6.1. In this context, thoughtful specification of the way missing values will be handled and the use of a small part of the data as a pilot set to guide the analysis can be useful approaches. Handling of missing data is discussed in Chapter 6.3.
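The idea of reserving a small part of the data as a pilot set can be illustrated with a minimal sketch. It assumes that each record carries a stable identifier and that the pilot fraction is fixed in the SAP before any data are inspected. The function names, the 10% default and the salt string are hypothetical and chosen only for illustration. Whether pilot records are later excluded from the main analysis is a decision the SAP itself would need to state.

```python
import hashlib


def in_pilot_set(record_id: str, pilot_fraction: float = 0.1,
                 salt: str = "sap-v1") -> bool:
    """Return True if a record is assigned to the pilot set.

    The assignment depends only on the record identifier and the salt,
    so it is reproducible and cannot be influenced by outcome data.
    """
    if not 0.0 < pilot_fraction < 1.0:
        raise ValueError("pilot_fraction must be strictly between 0 and 1")
    digest = hashlib.sha256(f"{salt}:{record_id}".encode("utf-8")).hexdigest()
    # Map the first 8 hex digits of the hash to a number in [0, 1).
    u = int(digest[:8], 16) / 16 ** 8
    return u < pilot_fraction


def split_pilot(record_ids, pilot_fraction: float = 0.1, salt: str = "sap-v1"):
    """Split identifiers into (pilot, main) lists, preserving input order."""
    pilot, main = [], []
    for rid in record_ids:
        target = pilot if in_pilot_set(rid, pilot_fraction, salt) else main
        target.append(rid)
    return pilot, main
```

Hashing the identifier rather than drawing random numbers means the split can be re-derived by any statistician from the SAP alone, without storing a seed or an assignment table.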
In some studies, analyses that are not pre-specified will be performed in response to observations in the data, in order to support interpretation of the results. It is important to distinguish between the findings of such data-driven analyses and those of pre-specified analyses. Post-hoc modifications to the analytical strategy should be duly noted and justified in the revision history of the SAP.
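One possible way to keep this distinction explicit is to record each analysis together with its status, and to refuse post-hoc additions that lack a justification. The sketch below is only an illustration under that assumption. The class names, fields and example analysis names are hypothetical and do not represent a standard SAP format.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class Analysis:
    name: str
    pre_specified: bool
    justification: str = ""


@dataclass
class SapLog:
    analyses: list = field(default_factory=list)
    revisions: list = field(default_factory=list)  # (ISO date, note) pairs

    def pre_specify(self, name: str) -> None:
        """Register an analysis planned before the start of data analysis."""
        self.analyses.append(Analysis(name, pre_specified=True))

    def add_post_hoc(self, name: str, justification: str, when: date) -> None:
        """Register a data-driven analysis and note it in the revision history."""
        if not justification.strip():
            raise ValueError("a post-hoc analysis needs a documented justification")
        self.analyses.append(Analysis(name, False, justification))
        note = f"Added post-hoc analysis '{name}': {justification}"
        self.revisions.append((when.isoformat(), note))

    def label(self, name: str) -> str:
        """Return the status to report alongside the results of an analysis."""
        for analysis in self.analyses:
            if analysis.name == name:
                if analysis.pre_specified:
                    return "pre-specified"
                return "post-hoc (data-driven)"
        raise KeyError(name)
```

With such a log, every reported result can carry its label, and the revision history shows when and why each post-hoc analysis was introduced.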