There is a considerable body of literature explaining statistical methods for observational studies, but very little addressing the statistical analysis plan. A clear guide to general principles and to the need for a plan is given in Design of Observational Studies (P.R. Rosenbaum, Springer Series in Statistics, 2010, Chapter 18), which also gives useful advice on how to test complex hypotheses in a way that controls the chances of drawing incorrect conclusions.
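As a generic illustration of controlling the chance of false-positive conclusions when several hypotheses are tested, the sketch below implements the Holm step-down procedure, which controls the familywise error rate. This is a swapped-in, widely used method and not necessarily the approach described by Rosenbaum; the function name and the p-values are hypothetical.

```python
def holm_reject(p_values, alpha=0.05):
    """Holm step-down procedure.

    Returns a list of booleans (True = reject the corresponding hypothesis)
    that controls the familywise error rate at level alpha.
    """
    m = len(p_values)
    # Indices of hypotheses ordered from smallest to largest p-value
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # Compare the (rank + 1)-th smallest p-value with alpha / (m - rank)
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            # Stop at the first non-rejection; all larger p-values are retained
            break
    return reject


# Hypothetical p-values for four pre-specified hypotheses
print(holm_reject([0.01, 0.04, 0.03, 0.005]))
# prints [True, False, False, True]
```

Writing the procedure and its alpha level into the plan before any results are seen is what makes the error control meaningful.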
Planning analyses for randomised clinical trials is covered in a number of publications. These often give checklists of the component parts of an analysis plan, and much of this applies equally to non-randomised designs. A good reference in this respect is ICH E9 ‘Statistical Principles for Clinical Trials’, issued by the International Conference on Harmonisation of Technical Requirements for Registration of Pharmaceuticals for Human Use (ICH). While specific guidance on the statistical analysis plan for epidemiological studies is sparse, the following principles will apply to most studies.
A particular concern in retrospective studies is that decisions about the analysis should be made blinded to any knowledge of the results. This should be considered when designing the study, particularly when feasibility studies are performed to inform the design. Feasibility studies should be independent of the main study results.
The statistical and epidemiological analysis plan is usually structured to reflect the protocol and will address, where relevant, the following points:
8.1. Which confounders will be considered and how they will be defined
8.2. Adjustment for confounders in statistical models
8.3. Restriction in analysis
8.4. Matching, including PS matching
8.5. Self-controlled study designs
8.6. Statistical approach for any selection of a subset of confounders
8.7. Methods for assessing the level of confounding adjustment achieved
8.8. Sensitivity analyses for residual confounding
9.1. How missing data will be reported;
9.2. Methods of imputation;
9.3. Sensitivity analyses for handling missing data;
9.4. How censored data will be treated, with rationale.
10.1. Criteria for assessing fit;
10.2. Alternative models in the event of clear lack of fit.
11.1. Criteria, circumstances and possible drawbacks for performing an interim analysis and possible actions (including stopping rules) that can be taken on the basis of such an analysis
12.1. Description of target population;
12.2. Description of the analysis population if different, e.g. after PS matching or in IV analyses.
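Items 8.4 and 12.2 interact: matching can drop subjects, so the analysis population may differ from the target population. The sketch below shows greedy 1:1 nearest-neighbour matching without replacement, assuming propensity scores have already been estimated and using a hypothetical caliper of 0.05 on the probability scale (in practice calipers are often specified on the logit of the propensity score). The function name, scores and caliper are illustrative assumptions.

```python
def greedy_caliper_match(treated_ps, control_ps, caliper):
    """Greedy 1:1 nearest-neighbour propensity score matching without replacement.

    treated_ps, control_ps: dicts mapping subject id -> estimated propensity score.
    Returns (pairs, unmatched_treated).
    """
    available = dict(control_ps)  # controls not yet used
    pairs, unmatched = [], []
    # Process treated subjects from highest to lowest score (one common convention)
    for t_id, t_ps in sorted(treated_ps.items(), key=lambda kv: kv[1], reverse=True):
        if available:
            c_id = min(available, key=lambda c: abs(available[c] - t_ps))
            if abs(available[c_id] - t_ps) <= caliper:
                pairs.append((t_id, c_id))
                del available[c_id]  # matching without replacement
                continue
        unmatched.append(t_id)  # no control within the caliper
    return pairs, unmatched


# Hypothetical scores: t1 has no comparable control and is dropped
treated = {"t1": 0.90, "t2": 0.50, "t3": 0.20}
controls = {"c1": 0.48, "c2": 0.22, "c3": 0.10}
pairs, unmatched = greedy_caliper_match(treated, controls, caliper=0.05)
print(pairs)      # [('t2', 'c1'), ('t3', 'c2')]
print(unmatched)  # ['t1']
```

Here the matched analysis population excludes the treated subject with the highest score, which is the kind of difference from the target population that item 12.2 asks the plan to describe. Greedy results depend on processing order; optimal matching is an alternative.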
Missing data, or missing values, occur when no data value is stored for a variable in an observation. Missing data are common and can have a significant effect on the conclusions that can be drawn from the data.
The book Statistical Analysis with Missing Data (Little RJA, Rubin DB. 2nd ed., Wiley, 2002) describes many aspects of the handling of missing data. The section ‘Handling of missing values’ in Rothman’s Modern Epidemiology, 3rd ed. (K. Rothman, S. Greenland, T. Lash. Lippincott Williams & Wilkins, 2008) summarises the state of the art, focusing on practical issues for epidemiologists. Ways of dealing with missing data include complete subject analysis (subjects with missing values are excluded from the analyses) and imputation methods (missing values are predicted from the observed values and the pattern of missingness). A method commonly used in epidemiology is to create a separate category of the variable, or an indicator variable, for the missing values. This practice can give biased estimates even if the data are missing completely at random and should be avoided (Jones MP. Indicator and Stratification Methods for Missing Explanatory Variables in Multiple Linear Regression. J Am Stat Assoc 1996;91(433):222–230).
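The bias of the missing-indicator method under missing completely at random (MCAR) can be seen in a small simulation. This is a sketch under hypothetical assumptions: a linear model with a true exposure effect of 1.0, a confounder correlated with exposure, and the confounder missing completely at random in about half of the subjects. Complete subject analysis recovers the true effect (at a cost in precision), whereas filling missing confounder values with 0 and adding a missingness indicator does not, because the confounder is effectively omitted from the model for subjects in the missing category.

```python
import numpy as np


def ols_coefs(design, y):
    """Ordinary least squares coefficients via numpy's least-squares solver."""
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coefs


def simulate(n=100_000, p_missing=0.5, seed=2024):
    """Hypothetical data: exposure x, confounder z, outcome y; z is MCAR."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n)                       # confounder
    x = 0.7 * z + rng.normal(size=n)             # exposure, correlated with z
    y = 1.0 * x + 2.0 * z + rng.normal(size=n)   # true exposure effect = 1.0
    missing = rng.random(n) < p_missing          # z missing completely at random
    return x, y, z, missing


def complete_case_estimate(x, y, z, missing):
    """Exposure coefficient using only subjects with z observed."""
    obs = ~missing
    design = np.column_stack([np.ones(obs.sum()), x[obs], z[obs]])
    return ols_coefs(design, y[obs])[1]


def missing_indicator_estimate(x, y, z, missing):
    """Exposure coefficient with z set to 0 when missing plus a missingness indicator."""
    z_filled = np.where(missing, 0.0, z)
    indicator = missing.astype(float)
    design = np.column_stack([np.ones(len(x)), x, z_filled, indicator])
    return ols_coefs(design, y)[1]


x, y, z, missing = simulate()
print(f"complete subject estimate:  {complete_case_estimate(x, y, z, missing):.3f}")
print(f"missing-indicator estimate: {missing_indicator_estimate(x, y, z, missing):.3f}")
```

With these settings the complete subject estimate is close to 1.0, while the missing-indicator estimate is pulled well away from it.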
A concise review of methods to handle missing data is also provided in the section ‘Missing data’ of the Encyclopedia of Epidemiologic Methods (Gail MH, Benichou J, Editors. Wiley, 2000). Identifying the pattern of missing data is important, as some methods for handling missing data assume a particular pattern of missingness (for example, a monotone pattern). Biased results may be obtained if data are incorrectly assumed to be missing at random. In general, it is desirable to show that the conclusions drawn from the data are not sensitive to the particular strategy used to handle missing values; repeating the analysis with a variety of approaches can help to investigate this.
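A minimal sketch of tabulating missingness patterns before choosing a method, assuming records are Python dicts with None for missing values and that variables are listed in time order. A pattern is monotone when, once a variable is missing, every later variable is also missing (as with dropout in longitudinal data). The function names, variable names and records are hypothetical.

```python
from collections import Counter


def missingness_patterns(records, variables):
    """Count patterns of observed (1) / missing (0) values across `variables`."""
    counts = Counter()
    for rec in records:
        counts[tuple(0 if rec.get(v) is None else 1 for v in variables)] += 1
    return counts


def is_monotone(patterns):
    """True if no pattern has an observed value after a missing one."""
    for pattern in patterns:
        seen_missing = False
        for observed in pattern:
            if not observed:
                seen_missing = True
            elif seen_missing:
                return False  # intermittent missingness breaks monotonicity
    return True


# Hypothetical repeated BMI measurements, in time order
variables = ["bmi_0m", "bmi_6m", "bmi_12m"]
records = [
    {"bmi_0m": 27.1, "bmi_6m": 26.8, "bmi_12m": 26.0},
    {"bmi_0m": 31.4, "bmi_6m": 30.9, "bmi_12m": None},
    {"bmi_0m": 24.0, "bmi_6m": None, "bmi_12m": None},
    {"bmi_0m": 29.5, "bmi_6m": None, "bmi_12m": 28.7},
]
patterns = missingness_patterns(records, variables)
for pattern, n in patterns.most_common():
    print(pattern, n)
print("monotone:", is_monotone(patterns))  # False because of the last record
```

Reporting such a table in the study report also addresses item 9.1 on how missing data will be reported.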
Other useful references on handling of missing data include the books Multiple Imputation for Nonresponse in Surveys (Rubin DB, Wiley, 1987) and Analysis of Incomplete Multivariate Data (Schafer JL, Chapman & Hall/CRC, 1997), and the articles Using the outcome for imputation of missing predictor values was preferred (J Clin Epidemiol 2006;59(10):1092–101), Recovery of information from multiple imputation: a simulation study (Emerg Themes Epidemiol 2012;9(1):3) and Evaluation of two-fold fully conditional specification multiple imputation for longitudinal electronic health record data (Stat Med 2014;33:3725–37).
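After multiple imputation, each completed dataset is analysed separately and the results are combined using the rules set out by Rubin (1987). A minimal sketch for a single scalar parameter, assuming per-imputation point estimates and their squared standard errors are already available; the function name and input values are hypothetical.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Combine m per-imputation estimates of one parameter with Rubin's rules.

    estimates: point estimates from each imputed dataset
    variances: their squared standard errors (must be positive)
    Returns (pooled estimate, total variance, degrees of freedom).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m or min(variances) <= 0:
        raise ValueError("need >= 2 imputations and one positive variance per estimate")
    q_bar = mean(estimates)           # pooled point estimate
    w = mean(variances)               # within-imputation variance
    b = variance(estimates)           # between-imputation variance (divisor m - 1)
    t = w + (1 + 1 / m) * b           # total variance
    if b == 0:
        df = math.inf                 # no between-imputation variability
    else:
        r = (1 + 1 / m) * b / w       # relative increase in variance due to missingness
        df = (m - 1) * (1 + 1 / r) ** 2  # Rubin's (1987) degrees of freedom
    return q_bar, t, df


# Hypothetical log hazard ratios and variances from m = 3 imputed datasets
q, t, df = pool_rubin([1.0, 1.2, 0.8], [0.04, 0.04, 0.04])
print(f"estimate {q:.3f}, SE {math.sqrt(t):.3f}, df {df:.2f}")
```

The total variance adds the between-imputation spread to the average within-imputation variance, so the pooled standard error reflects the uncertainty caused by the missing values. For small samples, the Barnard-Rubin adjustment to the degrees of freedom is often preferred.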