Date of Award


Document Type

Open Access Dissertation

Degree Name

Doctor of Philosophy in Mathematical Sciences (PhD)

Administrative Home Department

Department of Mathematical Sciences

Advisor 1

Qiuying Sha

Advisor 2

Shuanglin Zhang

Committee Member 1

Kui Zhang

Committee Member 2

Laura E. Brown


This dissertation comprises four papers, each presented in its own chapter.

In chapter 1, I compared the performance of eight multivariate phenotype association tests. The motivation for this power comparison is as follows. For nearly 15 years, genome-wide association studies (GWAS) have been widely used to identify genetic variants associated with human diseases and traits. GWAS typically test genetic variants against a single predefined phenotype and thus can miss weak but important effects. In recent years, many multivariate association tests have been developed; however, a comprehensive comparison of these approaches is lacking. To fill this gap, I carried out this power comparison. The results show that none of the methods is consistently more powerful than the others, so more powerful methods are still in great demand.

In chapter 2, I proposed a Weighted Combination of multiple Phenotypes approach (WCmulP) for testing the association between multiple correlated phenotypes and one genetic variant of interest. WCmulP linearly combines the multiple phenotypes with optimal weights chosen to maximize the score test statistic. I compared WCmulP with other widely used tests through extensive simulation studies and a real data analysis. The results show that WCmulP outperforms the compared methods in most simulation scenarios and in the real data analysis.
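To illustrate what "optimal weights that maximize the score test statistic" can mean, here is a minimal sketch under simplifying assumptions that are not taken from the dissertation: continuous phenotypes, no covariates, and a score statistic for regressing the combined phenotype z = Yw on a single genotype g of the form T(w) = (g_c'Y_c w)^2 / (w'Σw · g_c'g_c), where Σ is the sample covariance of the phenotypes. By the Cauchy-Schwarz inequality this is maximized at w ∝ Σ^{-1}Y_c'g_c. The function names `score_stat` and `optimal_weights` are hypothetical, and this is not necessarily the exact weighting or inference procedure used by WCmulP.

```python
import numpy as np

def score_stat(Y, g, w):
    """Score statistic for regressing the combined phenotype Y @ w on genotype g.
    Assumes continuous phenotypes and no covariates (illustrative assumption)."""
    Yc = Y - Y.mean(axis=0)          # center each phenotype
    gc = g - g.mean()                # center the genotype
    z = Yc @ w                       # combined phenotype
    var_z = z @ z / (len(z) - 1)     # equals w' Sigma w with Sigma = np.cov(Y, rowvar=False)
    return (gc @ z) ** 2 / (var_z * (gc @ gc))

def optimal_weights(Y, g):
    """Weights maximizing score_stat: w proportional to Sigma^{-1} Yc' gc."""
    Yc = Y - Y.mean(axis=0)
    gc = g - g.mean()
    sigma = np.cov(Y, rowvar=False)  # K x K sample covariance (ddof=1)
    u = Yc.T @ gc                    # unscaled covariance of each phenotype with g
    w = np.linalg.solve(sigma, u)
    return w / np.linalg.norm(w)     # rescaling w does not change score_stat

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n, K = 1000, 4
    g = rng.binomial(2, 0.3, size=n).astype(float)  # additive genotype coding
    cov = 0.5 * np.ones((K, K)) + 0.5 * np.eye(K)    # exchangeable phenotype correlation
    Y = rng.multivariate_normal(np.zeros(K), cov, size=n)
    Y[:, 0] += 0.1 * g                               # small simulated effect on one phenotype
    w = optimal_weights(Y, g)
    print("weights:", np.round(w, 3))
    print("T(optimal):", score_stat(Y, g, w))
    print("T(equal weights):", score_stat(Y, g, np.ones(K)))
```

Because the weights are estimated from the same data used for testing, the maximized statistic no longer follows a 1-degree-of-freedom chi-square distribution under the null, so in a sketch like this a p-value would need permutation or another calibrated reference distribution.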

With the availability of electronic health records (EHRs), thousands of clinical phenotypes can now be measured and collected systematically. As a result, phenome-wide association studies (PheWAS) have emerged to detect variants associated with a broad spectrum of phenotypes. However, current PheWAS are intrinsically univariate: they test one phenotype at a time. Genuine PheWAS methods that test a wide range of phenotypes simultaneously are needed. In chapter 3, I proposed a novel PheWAS approach, referred to as PheCLC (PheWAS using clustering linear combination), to examine genetic variation associated with up to thousands of phenotypes. PheCLC jointly analyzes a wide spectrum of human phenotypes and classifies them into categories based on International Classification of Diseases (ICD) codes. The simulation results show that PheCLC controls type I error rates and is much more powerful than traditional multivariate approaches.
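The idea of combining phenotypes within categories can be sketched as follows, assuming a clustering-linear-combination-style statistic: given univariate Z-scores z for K phenotypes with null correlation matrix Σ, and a K x L indicator matrix B assigning each phenotype to one of L groups, T = (B'Σ^{-1}z)'(B'Σ^{-1}B)^{-1}(B'Σ^{-1}z), which is chi-square with L degrees of freedom when z ~ N(0, Σ). This is an illustration only: the group labels below stand in for hypothetical ICD-based categories, the function name is hypothetical, and how PheCLC actually forms clusters and combines results across different numbers of clusters is not reproduced here.

```python
import numpy as np

def grouped_linear_combination_stat(z, sigma, groups):
    """Quadratic statistic for Z-scores combined within phenotype groups.

    z      : length-K vector of per-phenotype Z-scores
    sigma  : K x K null correlation matrix of z
    groups : length-K sequence of group labels (hypothetical ICD-style categories)
    Returns (statistic, df); under H0 the statistic is chi-square with
    df = number of distinct groups when z ~ N(0, sigma).
    """
    labels = sorted(set(groups))
    # K x L indicator matrix: B[k, l] = 1 if phenotype k belongs to group l
    B = np.array([[1.0 if g == lab else 0.0 for lab in labels] for g in groups])
    a = B.T @ np.linalg.solve(sigma, z)   # L-vector  B' Sigma^{-1} z
    M = B.T @ np.linalg.solve(sigma, B)   # L x L     B' Sigma^{-1} B (null covariance of a)
    stat = a @ np.linalg.solve(M, a)
    return float(stat), len(labels)

if __name__ == "__main__":
    z = np.array([2.1, 1.8, 0.3, -0.5])
    sigma = np.array([[1.0, 0.5, 0.1, 0.1],
                      [0.5, 1.0, 0.1, 0.1],
                      [0.1, 0.1, 1.0, 0.4],
                      [0.1, 0.1, 0.4, 1.0]])
    groups = ["circulatory", "circulatory", "respiratory", "respiratory"]
    print(grouped_linear_combination_stat(z, sigma, groups))
```

Pooling phenotypes within a group reduces the degrees of freedom from K to L, which can improve power when effects within a category point in the same direction; with every phenotype in its own group the statistic reduces to the usual z'Σ^{-1}z.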

To date, GWAS have identified thousands of common variants associated with human diseases. However, these common variants explain only a small proportion of the phenotypic variance. Many studies suggest that rare variants could account for a substantial part of the missing heritability. In chapter 4, I developed a rare variant association test for family-based designs, in which rare variants can be enriched compared to population-based designs. I applied the proposed method and two other family-based tests to the Genetic Analysis Workshop 19 (GAW19) dataset; the results show that our method identifies more genes with power greater than 40% than the other two methods.

Available for download on Friday, May 01, 2020