ENCePP Guide on Methodological Standards in Pharmacoepidemiology


10.3. Design and analysis of pharmacogenetic studies

10.3.1. Introduction

Pharmacogenetics is defined as the study of genetic variation as a determinant of drug response. It can complement information on clinical factors and disease sub-phenotypes to optimise the prediction of treatment response.


Individual variation in the response to drugs is an important clinical issue and may range from a lack of therapeutic effect to serious adverse drug reactions. This heterogeneity of response has important policy implications if individual patients who do not respond to conventional agents are denied access to other agents on the basis of clinical trial evidence and systematic reviews that show no overall benefit. While clinical variables such as disease severity, age, concomitant drug use and illnesses are potentially important determinants of the response to drugs, heterogeneity in drug disposition (absorption, distribution, metabolism and excretion) and in drug targets (such as receptors and signal transduction modulators) may be an important cause of inter-individual variability in the therapeutic effects of drugs (see Pharmacogenomics: translating functional genomics into rational therapeutics. Science 1999;286(5439):487-91). Identifying genetic variants that modify the response to drugs provides the opportunity to optimise the safety and effectiveness of currently available drugs and to develop new drugs for paediatric and adult populations (see Drug discovery: a historical perspective. Science 2000;287(5460):1960-4).


10.3.2. Identification of genetic variants

Identification of genetic variation associated with important drug or therapy-related outcomes can follow two main approaches.


The first is the candidate gene approach, in which anything from dozens to thousands of genetic variants within one or several genes, in both coding and non-coding sequences, are genotyped; a common form of such variation is the single nucleotide polymorphism (SNP). Candidate variants are generally chosen on the grounds of biological plausibility, which may have been established in previous studies, or of knowledge that the genes are functionally involved in pharmacokinetic and pharmacodynamic pathways or are related to the disease or an intermediate phenotype. Methodological and statistical issues in pharmacogenomics (J Pharm Pharmacol 2010;62(2):161-6) discusses the pros and cons of the candidate gene approach and of the genome-wide approach (see below), and A tutorial on statistical methods for population association studies (Nat Rev Genet 2006;7(10):781-91) outlines key methods that can be used. The advantages of the candidate gene approach are that resources can be directed to several important genetic polymorphisms and that the a priori chance of finding relevant drug-gene interactions is higher. This approach, however, requires a priori information about the likelihood that the polymorphism, gene or gene product interacts with a drug or drug pathway. Moving towards individualized medicine with pharmacogenomics (Nature 2004;429:464-8) explains that missing or incomplete information on genes from previous studies may result in failure to identify every important genetic determinant in the genome.


The second approach is hypothesis-generating or hypothesis-agnostic. Known as the genome-wide approach, it identifies genetic variants across the whole genome: important genetic determinants are identified by comparing the frequency of genetic (e.g. SNP) markers between drug responders and non-responders, or between patients with and without drug toxicity. No prior information or specific gene or variant hypothesis is needed. Because of linkage disequilibrium, whereby alleles at nearby loci tend to be co-inherited, a genetic association identified through a genome-wide approach may not involve a truly functional polymorphism; the associated marker may instead simply be linked to another variant that is the true biologically relevant determinant. This approach is therefore discovery-oriented in nature. It may detect SNPs in genes that were not previously considered candidate genes, or even SNPs outside genes. Nonetheless, failure to cover all relevant genetic risk factors can still be a problem, though less so than with the candidate gene approach. It is therefore important to conduct replication and validation studies (in vivo and in vitro) to ascertain the generalisability of findings to patient populations, to characterise the mechanistic basis of the effect of these genes on drug action, and to identify the true biological genetic determinants. This approach is useful for studying complex diseases to which multiple genetic variants contribute, and is applicable to both disease and treatment outcomes.
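As an illustration of the frequency comparison described above, the following minimal Python sketch performs a per-SNP allelic test, i.e. a 2x2 comparison of allele counts between two groups such as responders and non-responders. The function name `allelic_test`, the genotype-count input format and the example counts are hypothetical. The allelic test assumes Hardy-Weinberg proportions and reasonably large cell counts; a genome-wide analysis would repeat a test of this kind for every SNP, usually within a regression model that allows adjustment for covariates.

```python
import math


def allelic_test(case_genotypes, control_genotypes):
    """Allelic 2x2 test comparing allele frequencies between two groups.

    Each argument is a tuple of genotype counts (AA, Aa, aa), where 'a' is
    the allele being tested. Returns (allelic odds ratio, chi-square, p-value).
    Zero allele counts would need a continuity correction (not handled here).
    """
    def allele_counts(genotypes):
        n_AA, n_Aa, n_aa = genotypes
        # Each person carries two alleles; heterozygotes contribute one of each.
        return 2 * n_aa + n_Aa, 2 * n_AA + n_Aa   # (tested allele, other allele)

    a, b = allele_counts(case_genotypes)       # group 1: tested, other
    c, d = allele_counts(control_genotypes)    # group 2: tested, other
    odds_ratio = (a * d) / (b * c)

    # Pearson chi-square (1 df) on the 2x2 allele table, no continuity correction.
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return odds_ratio, chi2, p_value
```

For example, with hypothetical genotype counts (40, 45, 15) in one group and (60, 35, 5) in the other, `allelic_test` gives an allelic odds ratio of 11625/5625, about 2.07.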


Various genome-wide approaches are currently available, including genome and exome sequencing and genotyping arrays ("chips") that type hundreds of thousands to millions of SNPs (e.g. the exome chip). Statistical power is usually sufficient to detect only common variants with a large effect; large sample sizes should therefore be considered, e.g. through pooling of biobanks.
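To make the power limitation concrete, the sketch below approximates the power of a two-sided allelic test at the conventional genome-wide significance level of 5 x 10^-8, given the allele frequency, an assumed allelic odds ratio and the numbers of cases and controls. `allelic_power` is a hypothetical helper based on a simple normal approximation; it ignores covariates, genotyping error and imperfect linkage disequilibrium between the typed marker and the causal variant, so dedicated power software should be preferred for actual study planning.

```python
from statistics import NormalDist


def allelic_power(maf_controls, odds_ratio, n_cases, n_controls, alpha=5e-8):
    """Approximate power of a two-sided allelic test (normal approximation).

    maf_controls: frequency of the tested allele among controls.
    odds_ratio:   allelic odds ratio assumed under the alternative hypothesis.
    alpha:        significance level; 5e-8 is the conventional genome-wide threshold.
    """
    p0 = maf_controls
    odds_cases = odds_ratio * p0 / (1 - p0)
    p1 = odds_cases / (1 + odds_cases)          # implied allele frequency in cases
    # Each subject contributes two alleles, hence 2 * n in the variances.
    se = (p1 * (1 - p1) / (2 * n_cases) + p0 * (1 - p0) / (2 * n_controls)) ** 0.5
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # The (negligible) chance of rejecting in the wrong direction is ignored.
    return NormalDist().cdf(abs(p1 - p0) / se - z_crit)
```

Comparing, for example, `allelic_power(0.30, 1.5, 1000, 1000)` with `allelic_power(0.01, 1.2, 1000, 1000)` shows how quickly power falls for rare variants with modest effects at a fixed sample size.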


10.3.3. Study designs

Several options are available for the design of pharmacogenetic studies. Firstly, RCTs, both pre- and post-authorisation, provide the opportunity to address several pharmacogenetic questions. Pharmacogenetics in randomized controlled trials: considerations for trial design (Pharmacogenomics 2011;12(10):1485-92) describes three trial designs that differ in the timing of randomisation and genotyping, and Promises and challenges of pharmacogenetics: an overview of study design, methodological and statistical issues (JRSM Cardiovasc Dis 2012;1(1)) discusses outstanding methodological and statistical issues that may lead to heterogeneity among reported pharmacogenetic studies and how they may be addressed. Pharmacogenetic trials can be designed (or analysed post hoc) to study whether a subgroup of patients defined by certain genetic characteristics responds differently to the treatment under study. Alternatively, a trial can verify whether genotype-guided treatment is beneficial compared with standard care. Obvious limitations for the assessment of rare adverse drug events are the large sample size required and the related high costs. To make a trial as efficient as possible in terms of time, money and/or sample size, an adaptive trial design can be chosen, which allows prospectively planned modifications to the design after patients have been enrolled in the study. Such a design uses accumulating data to decide how to modify aspects of the study as it progresses, without undermining the validity and integrity of the trial. An additional benefit is that the expected number of patients exposed to an inferior or harmful treatment can be reduced (see Potential of adaptive clinical trial designs in pharmacogenetic research. Pharmacogenomics 2012;13(5):571-8).


Observational studies are an alternative and can be family-based (using twins or siblings) or population-based (using unrelated individuals). The main advantage of family-based studies is the avoidance of bias due to population stratification. A clear practical disadvantage for pharmacogenetic studies is the requirement to study families in which patients have been treated with the same drugs (see Methodological quality of pharmacogenetic studies: issues of concern. Stat Med 2008;27(30):6547-69).


Population-based studies may be designed to assess drug-gene interactions as cohort (including exposure-only), case-cohort and case-control studies (including case-only, as described in Nontraditional epidemiologic approaches in the analysis of gene-environment interaction: case-control studies with no controls! Am J Epidemiol 1996;144(3):207-13). Sound pharmacoepidemiological principles as described in the current Guide also apply to observational pharmacogenetic studies. A specific type of confounding, due to population stratification, needs to be considered in pharmacogenetic studies and, if present, dealt with. It arises when the study population consists of subpopulations that differ both in the frequency of the genetic variant and in the frequency of the outcome (or of drug exposure). Its presence may be obvious where the study population includes more than one immediately recognisable ethnic group; in other studies, however, stratification may be more subtle. Population stratification can be detected by Pritchard and Rosenberg’s method, which involves genotyping additional SNPs in other areas of the genome and testing for association between these markers and the outcome. In genome-wide association studies, the data contained within the many SNPs typed can be used to assess population stratification without the need for further genotyping. Several methods have been suggested to control for population stratification, such as genomic control, structured association and EIGENSTRAT.

These methods are discussed in Methodological quality of pharmacogenetic studies: issues of concern (Stat Med 2008;27(30):6547-69) and Softwares and methods for estimating genetic ancestry in human populations (Hum Genomics 2013;7:1).
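Of the methods listed above, genomic control is the simplest to illustrate. It assumes that most SNPs in a genome-wide scan are not associated with the outcome, so the median of the observed 1-df chi-square statistics should be close to the theoretical median of about 0.455. An inflation factor lambda above 1 suggests residual stratification (or another systematic bias), and all statistics are divided by it. The sketch below is a minimal Python version under those assumptions, with a hypothetical function name; structured association and EIGENSTRAT (which uses principal components of the genotype data as covariates) are not shown.

```python
from statistics import NormalDist, median

# Median of the chi-square distribution with 1 df, i.e. (Phi^-1(0.75))^2, about 0.455.
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def genomic_control(chi2_stats):
    """Genomic control for a list of 1-df association chi-square statistics.

    Returns (lambda, deflated statistics). Lambda is truncated at 1 so that
    statistics are never inflated, which is a common convention.
    """
    lam = max(1.0, median(chi2_stats) / CHI2_1DF_MEDIAN)
    return lam, [x / lam for x in chi2_stats]
```

In practice lambda would be computed over all SNPs in the scan; a value close to 1 indicates little inflation of the test statistics.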


The main advantage of exposure-only and case-only designs is the smaller sample size required, at the cost of not being able to study the main effects of drug exposure (case-only) or of the genetic variant (exposure-only) on the outcome. Furthermore, interaction can be assessed only on a multiplicative scale, whereas from a public health perspective additive interaction is highly relevant. An important condition for case-only studies is that the exposure is independent of the genetic variant, e.g. prescribers are not aware of the patient's genotype and do not take it into account, directly or indirectly (by observing clinical characteristics associated with the genetic variant). In the exposure-only design, the genetic variant should not be associated with the outcome (for example, variants of genes coding for cytochrome P450 enzymes). When these conditions are fulfilled and the main interest is in the drug-gene interaction, these designs may be an efficient option. In practice, case-control and case-only studies usually yield the same interaction estimate, as empirically assessed in Bias in the case-only design applied to studies of gene-environment and gene-gene interaction: a systematic review and meta-analysis (Int J Epidemiol 2011;40(5):1329-41). The assumption of independence between genetic and exposure factors can be verified among controls before proceeding to the case-only analysis. Further development of the case-only design for assessing gene-environment interaction: evaluation of and adjustment for bias (Int J Epidemiol 2004;33(5):1014-24) conducted sensitivity analyses to describe the circumstances in which controls can be used as a proxy for the source population when evaluating gene-environment independence.
The gene-environment association in controls will be a reasonably accurate reflection of that in the source population if the baseline risk of disease is small (<1%) and the interaction and independent effects are moderate (i.e. risk ratio <2), or if the disease risk is low (e.g. <5%) in all strata of genotype and exposure. Furthermore, gene-environment dependence can be adjusted for in multivariable models if it can be measured in controls.
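The relationship between the case-only and the full case-control analysis can be illustrated numerically. In the minimal sketch below (hypothetical function names and counts), the case-only estimate is the gene-exposure odds ratio among cases. Algebraically it equals the case-control ratio of odds ratios multiplied by the gene-exposure odds ratio among controls, so the two coincide when gene and exposure are independent in controls. The sketch also computes the relative excess risk due to interaction (RERI), a measure of additive interaction that needs the separate odds ratios for gene and exposure and therefore cannot be obtained from cases alone; RERI computed from odds ratios approximates the risk-ratio-based measure only when the outcome is rare.

```python
def gene_exposure_or(counts):
    """Odds ratio between gene (g) and exposure (e) within one group.

    counts: dict mapping (g, e), with g and e in {0, 1}, to numbers of subjects.
    """
    return (counts[1, 1] * counts[0, 0]) / (counts[1, 0] * counts[0, 1])


def interaction_measures(cases, controls):
    def or_vs_ref(g, e):
        # Odds ratio for the outcome in stratum (g, e) versus the (0, 0) reference.
        return (cases[g, e] / controls[g, e]) / (cases[0, 0] / controls[0, 0])

    or10, or01, or11 = or_vs_ref(1, 0), or_vs_ref(0, 1), or_vs_ref(1, 1)
    return {
        # Multiplicative interaction estimated from the full case-control data.
        "ratio_of_ors": or11 / (or10 * or01),
        # Case-only estimate: gene-exposure odds ratio among cases alone.
        "case_only_or": gene_exposure_or(cases),
        # Independence check: close to 1 if gene and exposure are independent.
        "control_ge_or": gene_exposure_or(controls),
        # Additive interaction (RERI); needs or10 and or01, so not case-only.
        "reri": or11 - or10 - or01 + 1,
    }


# Hypothetical counts indexed by (variant carrier, drug exposed).
cases = {(0, 0): 100, (1, 0): 50, (0, 1): 80, (1, 1): 120}
controls = {(0, 0): 400, (1, 0): 200, (0, 1): 300, (1, 1): 150}
result = interaction_measures(cases, controls)
```

With these hypothetical counts the controls show no gene-exposure association, so the case-only odds ratio and the case-control ratio of odds ratios are both 3.0.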


10.3.4. Data collection


The same principles and approaches to data collection as for other pharmacoepidemiological studies can be followed (see section 4 of this Guide on Approaches to Data Collection). An efficient approach to data collection for pharmacogenetic studies is to combine secondary use of electronic health records with primary data collection (e.g. collection of biological samples from which DNA is extracted).


Examples are given in SLCO1B1 genetic variant associated with statin-induced myopathy: a proof-of-concept study using the clinical practice research datalink (Clin Pharmacol Ther 2013;94(6):695-701), Diuretic therapy, the alpha-adducin gene variant, and the risk of myocardial infarction or stroke in persons with treated hypertension (JAMA 2002;287(13):1680-9) and Interaction between the Gly460Trp alpha-adducin gene variant and diuretics on the risk of myocardial infarction (J Hypertens 2009;27(1):61-8). Another approach to enriching electronic health records with biological samples is record linkage to biobanks, as illustrated in Genetic variation in the renin-angiotensin system modifies the beneficial effects of ACE inhibitors on the risk of diabetes mellitus among hypertensives (J Hum Hypertens 2008;22(11):774-80). A third approach is to use active surveillance methods to fully characterise drug effects, so that a rigorous phenotype can be developed prior to genetic analysis. This approach was followed in Adverse drug reaction active surveillance: developing a national network in Canada's children's hospitals (Pharmacoepidemiol Drug Saf 2009;18(8):713-21) and EUDRAGENE: European collaboration to establish a case-control DNA collection for studying the genetic basis of adverse drug reactions (Pharmacogenomics 2006;7(4):633-8).


10.3.5. Data analysis


The focus of data analysis should be on the measure of effect modification (see section 5.4 of this Guide on Effect Measure Modification and Interaction). Attention should be given to whether the mode of inheritance (e.g. dominant, recessive or additive) is defined a priori based on prior knowledge from functional studies. However, investigators usually do not know the underlying mode of inheritance. One solution is to undertake several analyses, each under a different assumption, although this raises the problem of multiple testing (see Methodological quality of pharmacogenetic studies: issues of concern. Stat Med 2008;27(30):6547-69). Multiple testing, with its increased risk of type I error, is a general problem in pharmacogenetic studies evaluating multiple SNPs, multiple exposures and multiple interactions. The most common approach to correct for multiple testing is the Bonferroni correction. This correction may be considered too conservative and runs the risk of producing many pharmacogenetic studies with a null result. Other, less conservative approaches to adjust for multiple testing include permutation testing and false discovery rate (FDR) control. The FDR approach described in Statistical significance for genomewide studies (Proc Natl Acad Sci USA 2003;100(16):9440-5) estimates the expected proportion of false positives among associations declared significant; for each association this is expressed as a q-value, the minimum FDR at which that association would be declared significant.
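The sketch below illustrates two points from this paragraph under stated assumptions. The first function codes a genotype (the number of copies of the minor allele) under each assumed mode of inheritance; each additional model analysed adds a test. The other two functions apply the Bonferroni correction and the Benjamini-Hochberg step-up procedure for FDR control. The Benjamini-Hochberg procedure is used here as a simpler, related method: its adjusted p-values correspond to q-values computed with the proportion of true null hypotheses set to 1, so it is more conservative than the q-value approach of Storey and Tibshirani cited above. All function names are illustrative.

```python
def code_genotype(n_minor, model):
    """Code a genotype (0, 1 or 2 copies of the minor allele) under an assumed
    mode of inheritance."""
    if model == "additive":
        return n_minor                 # per-allele (trend) coding: 0, 1, 2
    if model == "dominant":
        return int(n_minor >= 1)       # carriers vs non-carriers
    if model == "recessive":
        return int(n_minor == 2)       # homozygous for the minor allele vs the rest
    raise ValueError(f"unknown inheritance model: {model}")


def bonferroni(p_values):
    """Bonferroni-adjusted p-values: multiply by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values (step-up FDR procedure)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])   # smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, keeping adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

An association would then be declared significant at a chosen FDR level (e.g. 0.05) if its Benjamini-Hochberg adjusted value is at most that level.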


Alternative innovative methods are under development and may be used in the future, such as the systems biology approach, a Bayesian approach, or data mining (see Methodological and statistical issues in pharmacogenomics. J Pharm Pharmacol 2010;62(2):161-6).


Important complementary approaches include individual patient data meta-analyses and/or replication studies, which reduce the risk of false-positive findings.
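When a discovery finding and one or more replication studies are available only as summary estimates, a simple way to combine them is inverse-variance fixed-effect pooling of log odds ratios, with Cochran's Q as a check of consistency between studies. This aggregate-data sketch is not an individual patient data meta-analysis, which pools patient-level data. The function name and the assumption that each study reports a 95% confidence interval that is symmetric on the log scale are illustrative.

```python
import math
from statistics import NormalDist

Z975 = NormalDist().inv_cdf(0.975)   # about 1.96


def fixed_effect_pool(estimates):
    """Inverse-variance fixed-effect pooling of odds ratios.

    estimates: list of (odds_ratio, lower_95, upper_95), one tuple per study.
    Returns (pooled OR, lower 95% limit, upper 95% limit, Cochran's Q).
    """
    log_ors, weights = [], []
    for or_, lo, hi in estimates:
        # Standard error of the log OR recovered from the width of the 95% CI.
        se = (math.log(hi) - math.log(lo)) / (2 * Z975)
        log_ors.append(math.log(or_))
        weights.append(1 / se ** 2)
    pooled = sum(w * b for w, b in zip(weights, log_ors)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    # Cochran's Q: weighted squared deviations from the pooled estimate,
    # to be compared with a chi-square distribution on (k - 1) df.
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, log_ors))
    return (math.exp(pooled),
            math.exp(pooled - Z975 * pooled_se),
            math.exp(pooled + Z975 * pooled_se),
            q)
```

A large Q relative to its degrees of freedom would suggest that the discovery and replication estimates are inconsistent, in which case a single pooled estimate may be misleading.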


An important step in the analysis of genome-wide association study data is the conduct of rigorous quality control procedures before the final association analyses. Relevant guidelines include Guideline for data analysis of genomewide association studies (Cancer Genomics Proteomics 2007;4(1):27-34) and Statistical Optimization of Pharmacogenomics Association Studies: Key Considerations from Study Design to Analysis (Curr Pharmacogenomics Person Med 2011;9(1):41-66).
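Typical SNP-level quality control steps include filters on genotype call rate, minor allele frequency (MAF) and deviation from Hardy-Weinberg equilibrium (HWE), the last usually assessed in controls. The sketch below applies these three filters to a single SNP. The function names and thresholds (call rate of at least 0.98, MAF of at least 0.01, HWE p-value of at least 10^-6) are illustrative assumptions rather than values prescribed by the guidelines cited above, and an exact HWE test is generally preferable to the chi-square approximation used here when genotype counts are small. Sample-level checks are not shown.

```python
import math


def hwe_p_value(n_AA, n_Aa, n_aa):
    """Chi-square (1 df) test of Hardy-Weinberg equilibrium from genotype counts.

    Not defined for monomorphic SNPs (an expected count of zero).
    """
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)             # frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_AA, n_Aa, n_aa)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return math.erfc(math.sqrt(chi2 / 2))       # upper tail, chi-square with 1 df


def passes_snp_qc(n_AA, n_Aa, n_aa, n_missing,
                  min_call_rate=0.98, min_maf=0.01, min_hwe_p=1e-6):
    """Illustrative SNP-level filter; all thresholds are assumptions."""
    n_called = n_AA + n_Aa + n_aa
    call_rate = n_called / (n_called + n_missing)
    p = (2 * n_AA + n_Aa) / (2 * n_called)
    maf = min(p, 1 - p)
    # 'and' short-circuits, so monomorphic SNPs fail on MAF before the HWE test.
    return (call_rate >= min_call_rate
            and maf >= min_maf
            and hwe_p_value(n_AA, n_Aa, n_aa) >= min_hwe_p)
```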


10.3.6. Reporting


The guideline STrengthening the REporting of Genetic Association studies (STREGA)--an extension of the STROBE statement (Eur J Clin Invest 2009;39(4):247-66) should be followed for reporting findings of genetic studies.


10.3.7. Clinical practice guidelines


An important step towards using genotype information to guide pharmacotherapy is the development of clinical practice guidelines. Several initiatives provide such guidelines, for example the Clinical Pharmacogenetics Implementation Consortium. Furthermore, several clinical practice recommendations have been published, for example Recommendations for HLA-B*15:02 and HLA-A*31:01 genetic testing to reduce the risk of carbamazepine-induced hypersensitivity reactions (Epilepsia 2014;55(4):496-506) or Clinical practice guideline: CYP2D6 genotyping for safe and efficacious codeine therapy (J Popul Ther Clin Pharmacol 2013;20(3):e369-96).


10.3.8. Resources


PharmGKB is an important pharmacogenomics knowledge resource that encompasses clinical information, including dosing guidelines and drug labels, potentially clinically actionable gene-drug associations and genotype-phenotype relationships. PharmGKB collects, curates and disseminates knowledge about the impact of human genetic variation on drug responses.


