Date of Award


Document Type

Open Access Dissertation

Degree Name

Doctor of Philosophy in Mathematical Sciences (PhD)

Administrative Home Department

Department of Mathematical Sciences

Advisor 1

Qiuying Sha

Committee Member 1

Shuanglin Zhang

Committee Member 2

Kui Zhang

Committee Member 3

Jingfeng Jiang


This dissertation comprises two papers, one per chapter. To date, genome-wide association studies (GWAS) have successfully identified a large number of common variants associated with complex diseases. However, the common variants identified by GWAS account for only a small proportion of trait heritability. Many studies have shown that rare variants could explain part of the missing heritability. Because well-developed methods for detecting common variants are underpowered for rare variant association tests unless sample sizes or effect sizes are very large, investigating the roles of rare variants in complex diseases presents substantial challenges. In chapter 1, we proposed novel statistical tests, based on cross-validation prediction error, of the association between the rare and common variants in a genomic region and a complex trait of interest. We first proposed a prediction error method (PE) based on Ridge regression. Building on PE, we proposed two further tests, PE-WS and PE-TOW, which test a weighted combination of variants under two different weighting schemes. Using extensive simulation studies, we showed that PE-TOW and PE-WS are consistently more powerful than TOW and WS, respectively, and that PE is the most powerful test when the causal variants include both common and rare variants.
In GWAS, the joint analysis of multiple phenotypes can have greater power than analyzing each phenotype individually. With this motivation, several methods that jointly analyze multiple phenotypes have been developed, such as O’Brien’s method, the Trait-based Association Test that uses Extended Simes procedure (TATES), MANOVA, and MultiPhen. However, the performance of these methods is not consistent across a wide range of scenarios: a test may be powerful in some situations but not in others. Thus, one challenge in the joint analysis of multiple phenotypes is to construct a test that maintains good performance across different scenarios. In chapter 2, we developed a novel statistical method, also based on cross-validation prediction error, to test the association between a genetic variant and multiple phenotypes. Extensive simulations were conducted to evaluate the type I error rates and to compare the power of the PE method with that of various existing methods. The simulation studies showed that the PE method controls the type I error rates very well and has consistently higher power than the tests it was compared with in all scenarios.
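
As a rough illustration of the idea behind a test based on cross-validation prediction error, the sketch below fits a Ridge regression of a trait on the variants in a region, measures the k-fold cross-validated mean squared prediction error, and compares it with the errors obtained after permuting the trait. This is a minimal sketch under stated assumptions, not the dissertation's procedure: the function names, the fixed penalty `lam`, the number of folds, and the use of a permutation p-value are all assumptions made here for illustration, and the PE-WS and PE-TOW weighting schemes are not reproduced.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Closed-form Ridge fit on centered data; the intercept is not penalized."""
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean
    yc = y - y_mean
    p = X.shape[1]
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ yc)
    intercept = y_mean - x_mean @ beta
    return intercept, beta


def cv_prediction_error(X, y, lam=1.0, k=5, seed=0):
    """Mean squared prediction error over k cross-validation folds."""
    rng = np.random.default_rng(seed)
    n = len(y)
    folds = np.array_split(rng.permutation(n), k)
    sq_err = 0.0
    for test_idx in folds:
        train = np.ones(n, dtype=bool)
        train[test_idx] = False
        b0, b = ridge_fit(X[train], y[train], lam)
        pred = b0 + X[test_idx] @ b
        sq_err += np.sum((y[test_idx] - pred) ** 2)
    return sq_err / n


def permutation_pvalue(X, y, lam=1.0, k=5, n_perm=199, seed=0):
    """Assumed testing scheme: a small prediction error suggests association,
    so the p-value is the share of permuted errors at or below the observed one."""
    rng = np.random.default_rng(seed)
    observed = cv_prediction_error(X, y, lam, k, seed)
    count = 0
    for _ in range(n_perm):
        y_perm = rng.permutation(y)  # breaks any genotype-trait link
        if cv_prediction_error(X, y_perm, lam, k, seed) <= observed:
            count += 1
    return (count + 1) / (n_perm + 1)


# Hypothetical simulated data: 300 subjects, 12 variants with mixed allele
# frequencies, and the first three variants affecting the trait.
sim_rng = np.random.default_rng(42)
maf = np.array([0.30, 0.20, 0.10] + [0.02] * 5 + [0.25] * 4)
genotypes = sim_rng.binomial(2, maf, size=(300, 12)).astype(float)
trait = genotypes[:, :3] @ np.array([0.8, 0.8, 0.8]) + sim_rng.normal(size=300)

print("CV prediction error:", cv_prediction_error(genotypes, trait))
print("permutation p-value:", permutation_pvalue(genotypes, trait))
```

In this sketch the same fold assignment is reused for the observed and permuted traits, so the comparison differs only in the trait labels. In practice the penalty would normally be tuned rather than fixed.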

Included in

Biostatistics Commons