Pharmacogenomics is the co-development of a drug that targets a subgroup of patients
and a device that predicts whether a patient belongs to the subgroup of responders to the
drug. It is a two-stage process consisting of a training stage and a validation stage. The purpose
of the training stage is to identify a biomarker positive (G+) subgroup of patients and
its complement, the biomarker negative (G-) subgroup. Typically, subgroups are discovered
by comparing the genetic profiles of the responders to the drug with the non-responders.
Microarrays can be used to develop such a diagnostic device for identifying the subgroups.
The purpose of the validation stage is then to prove that the biomarker found in the
training stage has sufficient sensitivity and specificity for clinical use, and to independently
validate the efficacy and safety of the drug for the target G+ subgroup.
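
As an illustration of the two quantities the validation stage must establish, the following minimal sketch computes sensitivity and specificity of a G+/G- call against observed drug response. The function name and the toy data are hypothetical and are not taken from the dissertation.

```python
def sensitivity_specificity(predicted_positive, responded):
    """Return (sensitivity, specificity) of a G+/G- call against observed response.

    predicted_positive[i] is True if the device calls patient i biomarker positive (G+);
    responded[i] is True if patient i actually responded to the drug.
    """
    pairs = list(zip(predicted_positive, responded))
    tp = sum(1 for p, r in pairs if p and r)          # responders called G+
    fn = sum(1 for p, r in pairs if not p and r)      # responders called G-
    tn = sum(1 for p, r in pairs if not p and not r)  # non-responders called G-
    fp = sum(1 for p, r in pairs if p and not r)      # non-responders called G+
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity


# Hypothetical toy data: 3 responders and 5 non-responders.
calls = [True, True, False, False, True, False, False, False]
response = [True, True, True, False, False, False, False, False]
sens, spec = sensitivity_specificity(calls, response)
```

Here sensitivity is the fraction of true responders that the device places in G+, and specificity is the fraction of non-responders that it places in G-.
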
Major statistical problems in the analysis of microarray experiments in pharmacogenomics
include normalization of gene expression data, biomarker selection in the training stage,
and determination of sample sizes for a validation study. Before any formal analysis
of gene expression data, it is important to normalize the data to reduce variation between
arrays that arises from non-biological sources. For biomarker selection in
the training stage, a resampling-based multiple testing procedure is proposed that follows
the generalized partitioning principles. This procedure asymptotically controls the generalized
familywise error rate (gFWER). To plan a validation study, sample sizes for microarray
experiments are determined so that pre-specified sensitivity and specificity
requirements are met.
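
To make the normalization step concrete, the sketch below implements quantile normalization, one widely used technique for removing array-to-array variation. It is shown only as an example; the abstract does not say which techniques the dissertation compares, and the function name and toy matrix are assumptions.

```python
import numpy as np


def quantile_normalize(x):
    """Quantile-normalize a genes-by-arrays matrix.

    After normalization every column (array) has the same empirical
    distribution. Ties within a column are broken arbitrarily by argsort.
    """
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, axis=0)                # per-array ordering of genes
    ranks = np.argsort(order, axis=0)            # rank of each gene within its array
    reference = np.sort(x, axis=0).mean(axis=1)  # mean of the k-th smallest values across arrays
    return reference[ranks]                      # replace each value by the reference at its rank


# Hypothetical toy data: 3 genes measured on 2 arrays with different scales.
raw = np.array([[2.0, 4.0],
                [1.0, 8.0],
                [3.0, 6.0]])
normalized = quantile_normalize(raw)
```

Each gene keeps its rank within its own array, while the values themselves are replaced by a common reference distribution, so differences in overall scale between arrays are removed.
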
This dissertation is arranged as follows. Chapter 1 introduces the motivation for pharmacogenomics
and design considerations for microarray experiments in pharmacogenomics.
Chapter 2 compares different normalization techniques for microarray experiments. Chapter
3 focuses on strong control of the gFWER in multiple hypothesis testing. Resampling-based
multiple testing procedures are applied to select differentially expressed
genes in the training stage. Chapter 4 formulates sample size determination procedures for
validation studies that take a change of platform into account. Chapter 5 discusses future
research.