A case definition compatible with the data source should be developed for each outcome of a study at the design stage. This description should include how events will be identified and classified as cases, whether cases will include prevalent as well as incident cases, exacerbations and second episodes (as differentiated from repeat codes), and all other inclusion or exclusion criteria. If feasible, prevalent cases should not be included. The reason for the data collection and the nature of the healthcare system that generated the data should also be described, as they can affect the quality of the available information and introduce potential biases. Published case definitions of outcomes, such as those developed by the Brighton Collaboration in the context of vaccine studies, are useful but not necessarily compatible with the information available in observational data sources. For example, information on the onset or duration of symptoms, or on clinical diagnostic procedures, may not be available.
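The distinction between incident and prevalent cases, and between second episodes and repeat codes, can be made explicit in code. The following is a minimal sketch in Python, not a validated algorithm. The function names, the 365-day lookback period and the 30-day episode gap are assumptions chosen for illustration; suitable values depend on the outcome and the data source.

```python
from datetime import date, timedelta

LOOKBACK_DAYS = 365    # assumed code-free period required to call a case incident
EPISODE_GAP_DAYS = 30  # assumed gap separating a new episode from a repeat code


def group_into_episodes(code_dates, gap_days=EPISODE_GAP_DAYS):
    """Collapse dated outcome codes into episodes.

    A code recorded within gap_days of the previous code is treated as a
    repeat code of the same episode; otherwise it starts a new episode.
    Returns the start date of each episode.
    """
    starts = []
    previous = None
    for d in sorted(code_dates):
        if previous is None or (d - previous).days > gap_days:
            starts.append(d)
        previous = d
    return starts


def classify_case(code_dates, registration_date, study_start,
                  lookback_days=LOOKBACK_DAYS):
    """Classify a patient's first outcome code on or after study_start."""
    in_study = [d for d in code_dates if d >= study_start]
    if not in_study:
        return "no case"
    first = min(in_study)
    window_start = first - timedelta(days=lookback_days)
    # An earlier code inside the lookback window suggests a prevalent case.
    if any(window_start <= d < first for d in code_dates):
        return "prevalent"
    # Without a fully observed lookback window, incidence cannot be established.
    if registration_date > window_start:
        return "undetermined"
    return "incident"
```

Returning "undetermined" for patients whose lookback window is not fully observed keeps them from being silently counted as incident cases; how such patients are handled is a design decision to be stated in the protocol.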
Search criteria to identify outcomes should be defined, and the list of codes and any case-finding algorithm used should be provided. Generation of code lists requires expertise in both the coding system and the disease area. Researchers should consult clinicians who are familiar with coding practice in the field under study. Suggested methodologies are available for some coding systems, as described in Creating medical and drug code lists to identify cases in primary care databases (Pharmacoepidemiol Drug Saf. 2009;18(8):704-7). Advances in Electronic Phenotyping: From Rule-Based Definitions to Machine Learning Models (Annu Rev Biomed Data Sci. 2018;1:53-68) reports on methods for phenotyping (finding subjects with specific conditions or outcomes), which are becoming more commonly used, particularly in multi-database studies (see Chapters 9 and 16.6). Care should be taken when re-using a code list from another study, as code lists depend on the study objective and methods. Public repositories of codes, such as Clinicalcodes.org, are available, and researchers are encouraged to make their own code lists available.
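A code list can be represented as a small, shareable data structure that records the coding system alongside the codes, so that the search criteria are explicit and reproducible. The sketch below is an illustration only. The CodeList class, the record layout (dictionaries with patient_id, coding_system and code keys) and the prefix-matching rule are assumptions, and the single ICD-10 prefix I21 (acute myocardial infarction) is used as an example, not as a validated code list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodeList:
    """A named list of code prefixes tied to one coding system (hypothetical layout)."""
    name: str
    coding_system: str
    prefixes: tuple  # codes are matched by prefix after normalisation

    def matches(self, coding_system, code):
        # Normalise formatting differences such as "I21.4" versus "i214".
        normalized = code.replace(".", "").strip().upper()
        return (coding_system == self.coding_system
                and normalized.startswith(self.prefixes))


def find_outcome_records(records, code_list):
    """Return the records whose code belongs to the code list."""
    return [r for r in records
            if code_list.matches(r["coding_system"], r["code"])]


# Illustrative code list: a single ICD-10 prefix, not a validated definition.
acute_mi = CodeList(name="acute_mi_example",
                    coding_system="ICD-10",
                    prefixes=("I21",))
```

Storing the coding system with the codes prevents a list built for one system from being applied by accident to records coded in another, which is one of the risks when a code list is re-used across studies or databases.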
In some circumstances, chart review or free text entries in electronic format linked to coded entries can be useful for outcome identification or confirmation. Such identification may involve an algorithm that uses multiple code lists (for example, disease plus therapy codes) or an endpoint committee to adjudicate available information against a case definition. In some cases, initial plausibility checks or subsequent medical chart review will be necessary. When databases contain prescription data only, drug exposure may be used as a proxy for an outcome; otherwise, linkage to other databases is required. An accurate date of onset is particularly important for studies that rely on the timing of exposure and outcome, such as self-controlled designs (see Chapter 4.2.3).
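An algorithm combining disease and therapy codes, as mentioned above, could take the following form. This is a hedged sketch under stated assumptions: the function name, the input layout (pairs of patient identifier and date, already filtered by the respective code lists) and the 30-day confirmation window are hypothetical, and the rule that the onset date is the earliest confirmed diagnosis date is one possible choice among several.

```python
from datetime import date


def identify_cases(diagnoses, prescriptions, window_days=30):
    """Identify cases as a disease code followed by a therapy code.

    diagnoses, prescriptions: iterables of (patient_id, date) pairs that
    have already been selected with the disease and therapy code lists.
    A diagnosis is confirmed when a therapy record falls on the same day
    or within window_days after it (assumed rule).
    Returns a dict mapping patient_id to the assumed onset date, taken as
    the earliest confirmed diagnosis date.
    """
    rx_by_patient = {}
    for pid, rx_date in prescriptions:
        rx_by_patient.setdefault(pid, []).append(rx_date)

    cases = {}
    # Process diagnoses in date order so the earliest confirmed one wins.
    for pid, dx_date in sorted(diagnoses, key=lambda record: record[1]):
        if pid in cases:
            continue
        for rx_date in rx_by_patient.get(pid, []):
            if 0 <= (rx_date - dx_date).days <= window_days:
                cases[pid] = dx_date
                break
    return cases
```

Because the onset date drives the analysis in designs that depend on the timing of exposure and outcome, the choice between the diagnosis date, the therapy date or a date confirmed by chart review should be justified and, where possible, tested in sensitivity analyses.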